Hi John Erik,

I tried things out with the changes you made, and it looks like the issues I mentioned are all addressed. I also noticed a small logging issue when starting up WARR.
When there are multiple CDX sources, the log message written as each one is added at startup always shows the first file name:
https://github.com/iipc/webarchive-commons/blob/2.0.0-DEV/webarchive-commons-cdx/src/main/java/org/netpreserve/commons/cdx/cdxsource/CdxFileSourceFactory.java#L84

Thanks again!
Lauren

On Thu, Sep 22, 2016 at 7:58 AM, John Erik Halse <johnerikha...@gmail.com> wrote:

> Hi Lauren,
>
> Thanks for testing and finding bugs. I'm sure there are a lot more :-)
>
> For the cdx-cli tool: I found the bug with the pipe symbol, it should work
> now.
>
> I think I also have fixed the bugs in the Resource resolver. I also
> updated the readme to reflect that Java 8 is needed.
>
> I totally agree that error messages from the Resource Resolver with
> incorrect input, needs better handling. It's on my todo list.
>
> For the problem with invalid cdx-files, it will now log a warning and skip
> the invalid file instead of aborting.
>
> Thanks,
>
> John Erik
>
>
> On Friday, September 16, 2016 at 21:18:58 UTC+2, Lauren Ko wrote:
>
>> Hi John Erik,
>> Thanks for all your work on the Resource Resolver and the cdx-cli. I
>> tried them both successfully. I noticed a few things, but nothing major.
>>
>> For the Resource Resolver I basically just did what was documented in the
>> README: queried both /resource and /resourcelist, used old-style CDX and
>> CDXJ, tried the various parameters listed, sent request headers for the
>> different Accept values. Here are the issues I encountered (all were easily
>> overcome).
>>
>> When first trying to start up with
>> openwayback-resource-resolver-3.0.0-SNAPSHOT/bin/warr
>> I got:
>> Exception in thread "main" java.lang.UnsupportedClassVersionError:
>> org/netpreserve/resource/resolver/Main : Unsupported major.minor version
>> 52.0
>>
>> - I set $JAVA_HOME to Java 8 instead of the Java 7 that I had set by
>> default on my machine.
>>
>>
>> Then trying to start again I got:
>> java.lang.IllegalArgumentException:
>> /tmp/warr/openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx/index.cdx
>> is not a recognized CDX format
>>
>> - I remembered OpenWayback 3 requires SURT-formatted CDX files, so I
>> grabbed a SURT-formatted file.
>>
>>
>> Tried to start again:
>> 10:51:48.707 [main] INFO org.netpreserve.commons.cdx.CdxSourceFactory - Loaded CDX Source Factory for scheme 'cdxfile'
>> 10:51:48.712 [main] INFO org.netpreserve.commons.cdx.cdxsource.CdxFileSourceFactory - Adding all files in '/tmp/warr/openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx' as cdx sources
>> 10:51:48.713 [main] INFO org.netpreserve.commons.cdx.cdxsource.CdxFileSourceFactory - Adding file '/tmp/warr/openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx/index_IA_surt.cdx' as a cdx source
>> Exception in thread "main" java.lang.IllegalArgumentException: Negative position
>>     at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:670)
>>     at org.netpreserve.commons.cdx.cdxsource.CdxFileDescriptor.<init>(CdxFileDescriptor.java:70)
>>     at org.netpreserve.commons.cdx.cdxsource.CdxFileDescriptor.<init>(CdxFileDescriptor.java:55)
>>     at org.netpreserve.commons.cdx.cdxsource.CdxFileSourceFactory.createCdxSource(CdxFileSourceFactory.java:71)
>>     at org.netpreserve.commons.cdx.CdxSourceFactory.getCdxSource(CdxSourceFactory.java:62)
>>     at org.netpreserve.resource.resolver.settings.SettingsUtil.lambda$createCdxSource$0(SettingsUtil.java:38)
>>     at org.netpreserve.resource.resolver.settings.SettingsUtil$$Lambda$1/1279271200.apply(Unknown Source)
>>     at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>>     at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1359)
>>     at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:512)
>>     at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:502)
>>     at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>>     at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>>     at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>>     at org.netpreserve.resource.resolver.settings.SettingsUtil.createCdxSource(SettingsUtil.java:40)
>>     at org.netpreserve.resource.resolver.ResourceResolverServer.<init>(ResourceResolverServer.java:69)
>>     at org.netpreserve.resource.resolver.Main.main(Main.java:33)
>>
>> - Turns out the first SURT-formatted CDX file I grabbed was 30GB and
>> seemed to be too big to handle. I fed the first 1,000,000 lines to a new
>> CDX file (359MB) and then it worked: Resource Resolver (v. 3.0.0-SNAPSHOT)
>> started.
>>
>>
>> I tried doing some searches, stopped the Resource Resolver, and upon
>> trying to restart it I got:
>> Exception in thread "main" java.lang.IllegalArgumentException:
>> /tmp/warr/openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx/.out.cdx.swp
>> is not a recognized CDX format
>>
>> - At some point my system had created a .out.cdx.swp (my cdx file was
>> called out.cdx). Not sure if Resource Resolver should ignore dot files or
>> if it should just be up to the user to handle this sort of issue.
>>
>>
>> - For a date range query, Resource Resolver did not include the exact
>> start time match (README says start date is inclusive) when precision is
>> down to the second.
>> For example:
>> http://localhost:8080/resourcelist/http%3A%2F%2Fwww.basketball.com%2Frobots.txt?date=2012-10-14T03:18:37,2013
>>
>> - Does not give me the entry in my CDX file with exact timestamp
>> 2012-10-14T03:18:37.
>>
>>
>> - Also relating to timestamp, but maybe not a problem with the
>> application itself, in the README, it says "The time stamp can be in either
>> WARC-format (e.g. 2016-02-05T45:42:00Z)..." In my initial testing of things
>> I copy and pasted that timestamp without thinking to my request URL and got
>> a 500 error before realizing that example is not a valid time. My mistake,
>> but perhaps the example timestamp formats should be changed in the README.
>> Also, should the invalid time be handled so it doesn't throw a 500?
>>
>>
>> I also tried out cdx-cli to get a CDXJ formatted index. I used both the
>> reformat and extract commands. I very much appreciate the thorough usage
>> instructions that will print at the command line. I did have one issue in
>> trying to convert an existing CDX file:
>>
>> | (pipe character) in URLs (but not in the query string) in the CDX file
>> I was trying to convert (status codes in CDX were 404s for these URLs)
>> would error and the reformatting process would stop.
>>
>> $ cdxcli-1.0.0-SNAPSHOT/bin/cdxcli reformat -o ../openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx/ -f cdxj -s -i out.cdx
>> Reformatting: out.cdx into: ../openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx/out.cdxj
>> Illegal path: http://youtu.be/csorZustZbo|
>>
>> That is what I found in initial testing. Overall it worked well. Thanks
>> again!
>>
>> Lauren Ko
>> UNT Libraries
>>
>> On Wed, Sep 14, 2016 at 9:20 AM, John Erik Halse <johner...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> A very early version of the Resource Resolver (aka CDX server) is ready
>>> for testing and feedback.
>>> Have a look here for the details:
>>> https://github.com/iipc/openwayback/tree/3.0.0-DEV/openwayback-resource-resolver
>>>
>>> Since the Resource Resolver also supports the current CDX file format,
>>> you can test it right away, but if you want to use the new format, a tool
>>> is available here:
>>> https://github.com/iipc/cdx-cli
>>>
>>> Best,
>>>
>>> John Erik Halse
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "openwayback-dev" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to openwayback-d...@googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
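
On the invalid-timestamp question raised earlier in the thread (the README example 2016-02-05T45:42:00Z produced a 500), here is a minimal sketch of how a full WARC-style timestamp could be validated up front so that a bad value is reported as a client error. It is not taken from the Resource Resolver code. The class and method names (TimestampCheck, parseWarcTimestamp, statusFor) are hypothetical, the 400 status is only an assumption about the desired behaviour, and partial dates such as "2013" are deliberately not covered. It uses nothing beyond java.time from the JDK.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;
import java.util.Optional;

// Hypothetical helper, for illustration only; not part of the Resource Resolver.
public class TimestampCheck {

    /**
     * Tries to parse a full WARC-style (ISO 8601, UTC) timestamp.
     * Returns an empty Optional instead of throwing, so the caller can
     * decide how to report the problem.
     */
    public static Optional<Instant> parseWarcTimestamp(String value) {
        try {
            return Optional.of(Instant.parse(value));
        } catch (DateTimeParseException e) {
            // Out-of-range fields (e.g. hour 45) and malformed input end up here.
            return Optional.empty();
        }
    }

    /**
     * Assumed mapping from a date parameter to an HTTP status code:
     * 200 when the timestamp parses, 400 (Bad Request) when it does not.
     */
    public static int statusFor(String dateParam) {
        return parseWarcTimestamp(dateParam).isPresent() ? 200 : 400;
    }

    public static void main(String[] args) {
        System.out.println(statusFor("2016-02-05T15:42:00Z")); // prints 200
        System.out.println(statusFor("2016-02-05T45:42:00Z")); // prints 400
    }
}
```

With a check along these lines, the README's example value would be rejected as a bad request before it reaches the CDX lookup, rather than surfacing as an internal server error.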