Hi John Erik,

Thanks for all your work on the Resource Resolver and the cdx-cli. I tried them both successfully. I noticed a few things, but nothing major.
For the Resource Resolver, I basically just followed the README: I queried both /resource and /resourcelist, used both old-style CDX and CDXJ, tried the various parameters listed, and sent requests with the different Accept header values. Here are the issues I encountered (all were easily overcome):

- When I first tried to start it with openwayback-resource-resolver-3.0.0-SNAPSHOT/bin/warr, I got:

    Exception in thread "main" java.lang.UnsupportedClassVersionError: org/netpreserve/resource/resolver/Main : Unsupported major.minor version 52.0

  Class file version 52.0 corresponds to Java 8, so I set $JAVA_HOME to Java 8 instead of the Java 7 that was the default on my machine.

- When I tried to start it again, I got:

    java.lang.IllegalArgumentException: /tmp/warr/openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx/index.cdx is not a recognized CDX format

  I remembered that OpenWayback 3 requires SURT-formatted CDX files, so I grabbed a SURT-formatted file and tried to start again:

    10:51:48.707 [main] INFO org.netpreserve.commons.cdx.CdxSourceFactory - Loaded CDX Source Factory for scheme 'cdxfile'
    10:51:48.712 [main] INFO org.netpreserve.commons.cdx.cdxsource.CdxFileSourceFactory - Adding all files in '/tmp/warr/openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx' as cdx sources
    10:51:48.713 [main] INFO org.netpreserve.commons.cdx.cdxsource.CdxFileSourceFactory - Adding file '/tmp/warr/openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx/index_IA_surt.cdx' as a cdx source
    Exception in thread "main" java.lang.IllegalArgumentException: Negative position
        at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:670)
        at org.netpreserve.commons.cdx.cdxsource.CdxFileDescriptor.<init>(CdxFileDescriptor.java:70)
        at org.netpreserve.commons.cdx.cdxsource.CdxFileDescriptor.<init>(CdxFileDescriptor.java:55)
        at org.netpreserve.commons.cdx.cdxsource.CdxFileSourceFactory.createCdxSource(CdxFileSourceFactory.java:71)
        at org.netpreserve.commons.cdx.CdxSourceFactory.getCdxSource(CdxSourceFactory.java:62)
        at org.netpreserve.resource.resolver.settings.SettingsUtil.lambda$createCdxSource$0(SettingsUtil.java:38)
        at org.netpreserve.resource.resolver.settings.SettingsUtil$$Lambda$1/1279271200.apply(Unknown Source)
        at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
        at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1359)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:512)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:502)
        at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
        at org.netpreserve.resource.resolver.settings.SettingsUtil.createCdxSource(SettingsUtil.java:40)
        at org.netpreserve.resource.resolver.ResourceResolverServer.<init>(ResourceResolverServer.java:69)
        at org.netpreserve.resource.resolver.Main.main(Main.java:33)

  It turned out that the first SURT-formatted CDX file I grabbed was 30GB, which seemed to be too big to handle. I wrote the first 1,000,000 lines to a new CDX file (359MB), and then it worked:

    Resource Resolver (v. 3.0.0-SNAPSHOT) started.

- I ran some searches, stopped the Resource Resolver, and when I tried to restart it I got:

    Exception in thread "main" java.lang.IllegalArgumentException: /tmp/warr/openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx/.out.cdx.swp is not a recognized CDX format

  At some point my system had created .out.cdx.swp (my CDX file was called out.cdx). I'm not sure whether the Resource Resolver should ignore dot files or whether handling this sort of issue should be left to the user.

- For a date range query, the Resource Resolver did not include the record whose timestamp exactly matched the start time when the start was given to the second, even though the README says the start date is inclusive.
  For example:

    http://localhost:8080/resourcelist/http%3A%2F%2Fwww.basketball.com%2Frobots.txt?date=2012-10-14T03:18:37,2013

  does not return the entry in my CDX file with the exact timestamp 2012-10-14T03:18:37.

- Also relating to timestamps, though perhaps not a problem with the application itself: the README says "The time stamp can be in either WARC-format (e.g. 2016-02-05T45:42:00Z)..." In my initial testing I copied and pasted that timestamp into my request URL without thinking and got a 500 error before realizing that the example is not a valid time (there is no hour 45). My mistake, but perhaps the example timestamps in the README should be changed. Also, should an invalid time be handled so that it doesn't cause a 500 error?

I also tried out cdx-cli to produce a CDXJ-formatted index, using both the reformat and extract commands. I very much appreciate the thorough usage instructions it prints at the command line. I did have one issue when converting an existing CDX file: URLs containing a | (pipe character) outside the query string (the CDX status codes for these URLs were 404s) caused an error, and the reformatting process stopped:

    $ cdxcli-1.0.0-SNAPSHOT/bin/cdxcli reformat -o ../openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx/ -f cdxj -s -i out.cdx
    Reformatting: out.cdx into: ../openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx/out.cdxj
    Illegal path: http://youtu.be/csorZustZbo|

That is what I found in initial testing. Overall it worked well.

Thanks again!

Lauren Ko
UNT Libraries

On Wed, Sep 14, 2016 at 9:20 AM, John Erik Halse <johnerikha...@gmail.com> wrote:
> Hi all,
>
> A very early version of the Resource Resolver (aka CDX server) is ready
> for testing and feedback.
>
> Have a look here for the details: https://github.com/iipc/openwayback/tree/3.0.0-DEV/openwayback-resource-resolver
>
> Since the Resource Resolver also supports the current CDX file format, you
> can test it right away, but if you want to use the new format, a tool is
> available here:
> https://github.com/iipc/cdx-cli
>
> Best,
>
> John Erik Halse
>
> --
> You received this message because you are subscribed to the Google Groups
> "openwayback-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to openwayback-dev+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.