Thanks for testing and finding bugs. I'm sure there are a lot more :-)
For the cdx-cli tool: I found the bug with the pipe symbol, it should work now.
I think I have also fixed the bugs in the Resource Resolver. I also updated
the readme to reflect that Java 8 is needed.
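
To illustrate the kind of failure the pipe symbol triggers, here is a minimal sketch. It uses plain java.net.URI as a stand-in and is not the actual cdx-cli code; the class and method names are made up for the example. The single-argument URI constructor rejects a raw '|' in the path, while the multi-argument constructor percent-encodes it (as %7C).

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of cdx-cli: shows how java.net.URI treats '|'.
class PipeInPathSketch {

    /** Returns true if the strict single-argument constructor accepts the string as-is. */
    static boolean parsesStrictly(String url) {
        try {
            new URI(url);
            return true;
        } catch (URISyntaxException e) {
            // A raw '|' is not a legal URI character, so parsing fails here.
            return false;
        }
    }

    /** Builds the URI from its parts; this constructor quotes illegal characters. */
    static String encodeFromParts(String scheme, String host, String path) {
        try {
            return new URI(scheme, host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URI from parts", e);
        }
    }

    public static void main(String[] args) {
        // The URL from the reported error message.
        System.out.println(parsesStrictly("http://youtu.be/csorZustZbo|"));
        System.out.println(encodeFromParts("http", "youtu.be", "/csorZustZbo|"));
    }
}
```

Whether a converter should encode such a URL, or log it and move on to the next line, is a design choice; either way one odd URL should not stop the whole reformatting run.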
I totally agree that the error messages from the Resource Resolver for
incorrect input need better handling. It's on my todo list.
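
As a rough idea of what better handling could look like for the invalid timestamp case, here is a sketch only. It assumes the WARC-style timestamp can be checked with java.time.Instant; the class and method names are hypothetical and do not come from the Resource Resolver. An unparsable value is mapped to a 400 (client error) instead of surfacing as a 500.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;
import java.util.Optional;

// Hypothetical validation helper, not the Resource Resolver's own code.
class TimestampCheckSketch {

    /** Parses a WARC-style timestamp such as 2016-02-05T15:42:00Z, or returns empty if invalid. */
    static Optional<Instant> parseWarcTimestamp(String value) {
        if (value == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(Instant.parse(value));
        } catch (DateTimeParseException e) {
            // Out-of-range fields (for example hour 45) and malformed strings end up here.
            return Optional.empty();
        }
    }

    /** Status code a request handler could return for the given timestamp parameter. */
    static int statusFor(String value) {
        return parseWarcTimestamp(value).isPresent() ? 200 : 400;
    }

    public static void main(String[] args) {
        System.out.println(statusFor("2016-02-05T45:42:00Z")); // the invalid README example
        System.out.println(statusFor("2016-02-05T15:42:00Z")); // a valid time
    }
}
```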
For the problem with invalid cdx-files, it will now log a warning and skip
the invalid file instead of aborting.
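
The warn-and-skip behaviour can be sketched roughly as follows. This is an illustration under assumptions, not the code in the repository: the class name, the decision to ignore dot files silently, and the format check (looking for the " CDX" marker that starts a legacy CDX header line) are all placeholders for the example.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.logging.Logger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical directory scan, not the Resource Resolver's own code.
class CdxDirScanSketch {
    private static final Logger LOG = Logger.getLogger(CdxDirScanSketch.class.getName());

    /** Returns the files in dir that pass the check; dot files and invalid files are skipped. */
    static List<Path> usableSources(Path dir, Predicate<Path> isRecognized) {
        List<Path> candidates;
        try (Stream<Path> files = Files.list(dir)) {
            candidates = files.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<Path> accepted = new ArrayList<>();
        for (Path p : candidates) {
            if (p.getFileName().toString().startsWith(".")) {
                continue; // dot file, e.g. an editor swap file: ignored silently (assumption)
            }
            if (!isRecognized.test(p)) {
                LOG.warning(p + " is not a recognized CDX format, skipping");
                continue; // warn and carry on instead of aborting start-up
            }
            accepted.add(p);
        }
        return accepted;
    }

    /** Placeholder check: true if the file starts with the " CDX" legacy header marker. */
    static boolean startsWithCdxMarker(Path p) {
        byte[] head = new byte[4];
        try (DataInputStream in = new DataInputStream(Files.newInputStream(p))) {
            in.readFully(head); // throws EOFException (an IOException) on files shorter than 4 bytes
            return new String(head, StandardCharsets.US_ASCII).equals(" CDX");
        } catch (IOException e) {
            return false;
        }
    }

    /** Small demonstration in a temporary directory; returns the accepted file names. */
    static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("cdx-demo");
            Files.write(dir.resolve("out.cdx"), " CDX N b a\n".getBytes(StandardCharsets.UTF_8));
            Files.write(dir.resolve(".out.cdx.swp"), new byte[] {0, 1, 2});
            Files.write(dir.resolve("notes.txt"), "hello\n".getBytes(StandardCharsets.UTF_8));
            return usableSources(dir, CdxDirScanSketch::startsWithCdxMarker).stream()
                    .map(p -> p.getFileName().toString())
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In the demo, only out.cdx is accepted: the swap file is ignored without a message, and notes.txt is skipped with a warning.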
On Friday, 16 September 2016 at 21:18:58 UTC+2, Lauren Ko wrote:
> Hi John Erik,
> Thanks for all your work on the Resource Resolver and the cdx-cli. I tried
> them both successfully. I noticed a few things, but nothing major.
> For the Resource Resolver I basically just did what was documented in the
> README: queried both /resource and /resourcelist, used old-style CDX and
> CDXJ, tried the various parameters listed, sent request headers for the
> different Accept values. Here are the issues I encountered (all were easily
> When first trying to start up with
> openwayback-resource-resolver-3.0.0-SNAPSHOT/bin/warr I got:
> Exception in thread "main" java.lang.UnsupportedClassVersionError:
> org/netpreserve/resource/resolver/Main : Unsupported major.minor version
> - I set $JAVA_HOME to Java 8 instead of the Java 7 that I had set by
> default on my machine.
> Then trying to start again I got:
> /tmp/warr/openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx/index.cdx is not
> a recognized CDX format
> - I remembered OpenWayback 3 requires SURT-formatted CDX files, so I
> grabbed a SURT-formatted file.
> Tried to start again:
> 10:51:48.707 [main] INFO org.netpreserve.commons.cdx.CdxSourceFactory -
> Loaded CDX Source Factory for scheme 'cdxfile'
> 10:51:48.712 [main] INFO
> org.netpreserve.commons.cdx.cdxsource.CdxFileSourceFactory - Adding all
> files in '/tmp/warr/openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx' as
> cdx sources
> 10:51:48.713 [main] INFO
> org.netpreserve.commons.cdx.cdxsource.CdxFileSourceFactory - Adding file
> as a cdx source
> Exception in thread "main" java.lang.IllegalArgumentException: Negative
> at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:670)
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:512)
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
> at org.netpreserve.resource.resolver.Main.main(Main.java:33)
> - Turns out the first SURT-formatted CDX file I grabbed was 30GB and
> seemed to be too big to handle. I fed the first 1,000,000 lines to a new
> CDX file (359MB) and then it worked: Resource Resolver (v. 3.0.0-SNAPSHOT)
> I tried doing some searches, stopped the Resource Resolver, and upon
> trying to restart it I got:
> Exception in thread "main" java.lang.IllegalArgumentException:
> /tmp/warr/openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx/.out.cdx.swp is
> not a recognized CDX format
> - At some point my system had created a .out.cdx.swp (my cdx file was
> called out.cdx). Not sure if Resource Resolver should ignore dot files or
> if it should just be up to the user to handle this sort of issue.
> - For a date range query, Resource Resolver did not include the exact
> start time match (README says start date is inclusive) when precision is
> down to the second. For example:
> - Does not give me the entry in my CDX file with exact timestamp
> - Also relating to timestamp, but maybe not a problem with the application
> itself, in the README, it says "The time stamp can be in either WARC-format
> (e.g. 2016-02-05T45:42:00Z)..." In my initial testing I copied and pasted
> that timestamp into my request URL without thinking and got a 500 error
> before realizing that the example is not a valid time. My mistake, but
> perhaps the example timestamp formats should be changed in the README.
> Also, should the invalid time be handled so it doesn't throw a 500?
> I also tried out cdx-cli to get a CDXJ formatted index. I used both the
> reformat and extract commands. I very much appreciate the thorough usage
> instructions that will print at the command line. I did have one issue in
> trying to convert an existing CDX file:
> | (pipe character) in URLs (but not in the query string) in the CDX file I
> was trying to convert (status codes in CDX were 404s for these URLs) would
> error and the reformatting process would stop.
> $ cdxcli-1.0.0-SNAPSHOT/bin/cdxcli reformat -o
> ../openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx/ -f cdxj -s -i out.cdx
> Reformatting: out.cdx into:
> Illegal path: http://youtu.be/csorZustZbo|
> That is what I found in initial testing. Overall it worked well. Thanks
> Lauren Ko
> UNT Libraries
> On Wed, Sep 14, 2016 at 9:20 AM, John Erik Halse <johner...@gmail.com
>> Hi all,
>> A very early version of the Resource Resolver (aka CDX server) is ready
>> for testing and feedback.
>> Have a look here for the details:
>> Since the Resource Resolver also supports the current CDX file format,
>> you can test it right away, but if you want to use the new format, a tool
>> is available here:
>> John Erik Halse
>> You received this message because you are subscribed to the Google Groups
>> "openwayback-dev" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> For more options, visit https://groups.google.com/d/optout.