Hi John Erik,
Thanks for all your work on the Resource Resolver and the cdx-cli. I tried
them both successfully. I noticed a few things, but nothing major.

For the Resource Resolver I basically just did what was documented in the
README: queried both /resource and /resourcelist, used old-style CDX and
CDXJ, tried the various parameters listed, sent request headers for the
different Accept values. Here are the issues I encountered (all were easily
overcome).

When first trying to start up with
openwayback-resource-resolver-3.0.0-SNAPSHOT/bin/warr I got:
 Exception in thread "main" java.lang.UnsupportedClassVersionError: org/netpreserve/resource/resolver/Main : Unsupported major.minor version 52.0

- I set $JAVA_HOME to Java 8 instead of the Java 7 that I had set by
default on my machine.
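
For anyone hitting the same error: class file major version 52 corresponds to Java 8, and 51 to Java 7. Below is a minimal sketch of reading the major version from a class file header. The class and method names are hypothetical and are not part of the Resource Resolver.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, for illustration only.
final class ClassVersion {
    // Class-file header layout: 4-byte magic, 2-byte minor, 2-byte major.
    static int majorVersion(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != 0xCAFEBABE) {
            throw new IOException("not a class file");
        }
        data.readUnsignedShort(); // minor version, ignored here
        return data.readUnsignedShort();
    }

    // For Java 5 and later the release number is major - 44 (49 -> 5, 52 -> 8).
    static int javaRelease(int major) {
        return major - 44;
    }

    public static void main(String[] args) throws IOException {
        // A fabricated 8-byte header standing in for a real .class file.
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        int major = majorVersion(new ByteArrayInputStream(header));
        System.out.println("major " + major + " needs Java " + javaRelease(major));
    }
}
```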


Then trying to start again I got:
 java.lang.IllegalArgumentException: /tmp/warr/openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx/index.cdx is not a recognized CDX format

 - I remembered OpenWayback 3 requires SURT-formatted CDX files, so I
grabbed a SURT-formatted file.


Tried to start again:
 10:51:48.707 [main] INFO  org.netpreserve.commons.cdx.CdxSourceFactory - Loaded CDX Source Factory for scheme 'cdxfile'
 10:51:48.712 [main] INFO  org.netpreserve.commons.cdx.cdxsource.CdxFileSourceFactory - Adding all files in '/tmp/warr/openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx' as cdx sources
 10:51:48.713 [main] INFO  org.netpreserve.commons.cdx.cdxsource.CdxFileSourceFactory - Adding file '/tmp/warr/openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx/index_IA_surt.cdx' as a cdx source
 Exception in thread "main" java.lang.IllegalArgumentException: Negative position
     at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:670)
     at org.netpreserve.commons.cdx.cdxsource.CdxFileDescriptor.<init>(CdxFileDescriptor.java:70)
     at org.netpreserve.commons.cdx.cdxsource.CdxFileDescriptor.<init>(CdxFileDescriptor.java:55)
     at org.netpreserve.commons.cdx.cdxsource.CdxFileSourceFactory.createCdxSource(CdxFileSourceFactory.java:71)
     at org.netpreserve.commons.cdx.CdxSourceFactory.getCdxSource(CdxSourceFactory.java:62)
     at org.netpreserve.resource.resolver.settings.SettingsUtil.lambda$createCdxSource$0(SettingsUtil.java:38)
     at org.netpreserve.resource.resolver.settings.SettingsUtil$$Lambda$1/1279271200.apply(Unknown Source)
     at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
     at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1359)
     at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:512)
     at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:502)
     at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
     at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
     at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
     at org.netpreserve.resource.resolver.settings.SettingsUtil.createCdxSource(SettingsUtil.java:40)
     at org.netpreserve.resource.resolver.ResourceResolverServer.<init>(ResourceResolverServer.java:69)
     at org.netpreserve.resource.resolver.Main.main(Main.java:33)

 - Turns out the first SURT-formatted CDX file I grabbed was 30GB and
seemed to be too big to handle. I fed the first 1,000,000 lines to a new
CDX file (359MB) and then it worked: Resource Resolver (v. 3.0.0-SNAPSHOT)
started.
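
The truncation workaround above can be sketched in Java as follows. This is only an illustration of copying the first N lines, which keeps any header line and the sort order of the original file. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical helper, for illustration only.
final class CdxHead {
    // Copies at most maxLines lines from in to out and returns the number copied.
    static long copyFirstLines(BufferedReader in, Writer out, long maxLines) {
        try {
            long copied = 0;
            String line;
            // The limit is checked first, so no line beyond maxLines is read.
            while (copied < maxLines && (line = in.readLine()) != null) {
                out.write(line);
                out.write('\n');
                copied++;
            }
            out.flush();
            return copied;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // In-memory stand-in for a CDX file; real use would wrap file readers/writers.
        StringWriter out = new StringWriter();
        long n = copyFirstLines(new BufferedReader(new StringReader("a\nb\nc\n")), out, 2);
        System.out.println(n + " lines copied");
    }
}
```

For real files, the reader and writer could come from Files.newBufferedReader and Files.newBufferedWriter, with maxLines set to 1000000.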


I tried doing some searches, stopped the Resource Resolver, and upon trying
to restart it I got:
 Exception in thread "main" java.lang.IllegalArgumentException: /tmp/warr/openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx/.out.cdx.swp is not a recognized CDX format

 - At some point my system had created a .out.cdx.swp (my cdx file was
called out.cdx). Not sure if Resource Resolver should ignore dot files or
if it should just be up to the user to handle this sort of issue.
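
If ignoring dot files turns out to be the preferred behavior, one possible shape for it is sketched below. The class and method names are hypothetical. This is not how the Resource Resolver currently scans its cdx directory, only an illustration of the filter.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
final class CdxDirectoryScan {
    // True for names such as ".out.cdx.swp" (editor swap files and other hidden files).
    static boolean isDotFile(Path file) {
        Path name = file.getFileName();
        return name != null && name.toString().startsWith(".");
    }

    // Lists regular files in dir, skipping dot files.
    static List<Path> listCdxCandidates(Path dir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry) && !isDotFile(entry)) {
                    result.add(entry);
                }
            }
        }
        return result;
    }
}
```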


- For a date range query, Resource Resolver did not include the exact start
time match (README says start date is inclusive) when precision is down to
the second. For example:

http://localhost:8080/resourcelist/http%3A%2F%2Fwww.basketball.com%2Frobots.txt?date=2012-10-14T03:18:37,2013

- This does not return the entry in my CDX file with the exact timestamp
2012-10-14T03:18:37.
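
To make the expected behavior concrete, here is a minimal sketch of an inclusive-start comparison. The names are hypothetical. Treating the end of the range as exclusive, and expanding "2013" to 2013-01-01T00:00:00, are assumptions made for the example, not something taken from the README.

```java
import java.time.LocalDateTime;

// Hypothetical helper, for illustration only.
final class DateRange {
    // Start is inclusive (as the README states); the exclusive end is an assumption.
    static boolean contains(LocalDateTime start, LocalDateTime end, LocalDateTime ts) {
        return !ts.isBefore(start) && ts.isBefore(end);
    }

    public static void main(String[] args) {
        LocalDateTime start = LocalDateTime.parse("2012-10-14T03:18:37");
        LocalDateTime end = LocalDateTime.parse("2013-01-01T00:00:00"); // assumed expansion of "2013"
        // An entry stamped exactly at the start should be part of the result.
        System.out.println(contains(start, end, start));
    }
}
```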


- Also relating to timestamps, though maybe not a problem with the
application itself: the README says "The time stamp can be in either
WARC-format (e.g. 2016-02-05T45:42:00Z)..." In my initial testing I copied
and pasted that timestamp into my request URL without thinking and got a
500 error before realizing that the example is not a valid time (the hour
is 45). My mistake, but perhaps the example timestamps in the README should
be changed. Also, should an invalid time be handled so that it doesn't
produce a 500?
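
As a rough idea of what handling it could look like, the sketch below validates the timestamp up front and maps a parse failure to a 400 instead of letting it surface as a 500. The class and method names are hypothetical. It only covers the WARC-style format via java.time.Instant and is not based on the Resource Resolver's actual request handling.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;

// Hypothetical helper, for illustration only.
final class TimestampCheck {
    // Returns the HTTP status a handler might use for the given timestamp parameter.
    static int statusFor(String timestamp) {
        try {
            Instant.parse(timestamp); // accepts e.g. 2016-02-05T15:42:00Z
            return 200;
        } catch (DateTimeParseException e) {
            return 400; // client error: malformed or impossible time
        }
    }

    public static void main(String[] args) {
        System.out.println(statusFor("2016-02-05T45:42:00Z")); // the README example, hour 45
        System.out.println(statusFor("2016-02-05T15:42:00Z"));
    }
}
```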


I also tried out cdx-cli to get a CDXJ formatted index. I used both the
reformat and extract commands. I very much appreciate the thorough usage
instructions that will print at the command line. I did have one issue in
trying to convert an existing CDX file:

A | (pipe character) in the path of a URL (not in the query string) in the
CDX file I was trying to convert caused an error, and the reformatting
process stopped. The status codes in the CDX for these URLs were 404s.

 $ cdxcli-1.0.0-SNAPSHOT/bin/cdxcli reformat -o ../openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx/ -f cdxj -s -i out.cdx
 Reformatting: out.cdx into: ../openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx/out.cdxj
 Illegal path: http://youtu.be/csorZustZbo|
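
For comparison, java.net.URI also rejects an unescaped pipe in the path but accepts it once percent-encoded as %7C. The sketch below shows that behavior with hypothetical helper names. Whether cdx-cli should escape such characters, skip the line, or do something else is left open. This is not its actual parsing code.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, for illustration only.
final class PipeInUrl {
    // True if java.net.URI accepts the string as-is.
    static boolean parses(String url) {
        try {
            new URI(url);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Percent-encodes pipe characters so the URL becomes parseable.
    static String escapePipes(String url) {
        return url.replace("|", "%7C");
    }

    public static void main(String[] args) {
        String url = "http://youtu.be/csorZustZbo|";
        System.out.println(parses(url));
        System.out.println(parses(escapePipes(url)));
    }
}
```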

That is what I found in initial testing. Overall it worked well. Thanks
again!

Lauren Ko
UNT Libraries

On Wed, Sep 14, 2016 at 9:20 AM, John Erik Halse <johnerikha...@gmail.com>
wrote:

> Hi all,
>
> A very early version of the Resource Resolver (aka CDX server) is ready
> for testing and feedback.
> Have a look here for the details:
> https://github.com/iipc/openwayback/tree/3.0.0-DEV/openwayback-resource-resolver
>
> Since the Resource Resolver also supports the current CDX file format, you
> can test it right away, but if you want to use the new format, a tool is
> available here:
> https://github.com/iipc/cdx-cli
>
> Best,
>
> John Erik Halse
>
> --
> You received this message because you are subscribed to the Google Groups
> "openwayback-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to openwayback-dev+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
