Hi Lauren,

Thanks for testing and finding bugs. I'm sure there are a lot more :-)

For the cdx-cli tool: I found the bug with the pipe symbol; it should work 
now.
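
For anyone curious why a trailing pipe can trip up URL parsing, here is a small illustration. It uses the JDK's java.net.URI as a stand-in for a strict parser and is not the actual cdx-cli fix; the class and method names (PipeInUrl, parseLenient) are hypothetical, and percent-encoding the pipe is only one possible way to handle it.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical illustration, not code from cdx-cli.
final class PipeInUrl {
    // java.net.URI rejects a raw '|' in the path, so percent-encode it
    // before handing the string to the parser.
    static URI parseLenient(String url) throws URISyntaxException {
        return new URI(url.replace("|", "%7C"));
    }

    public static void main(String[] args) throws URISyntaxException {
        String url = "http://youtu.be/csorZustZbo|";
        try {
            new URI(url); // strict parse of the raw string
        } catch (URISyntaxException e) {
            System.out.println("Strict parse failed: " + e.getMessage());
        }
        System.out.println("Lenient parse: " + parseLenient(url));
    }
}
```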

I think I have also fixed the bugs in the Resource Resolver. I also updated 
the README to reflect that Java 8 is needed.

I totally agree that the Resource Resolver needs better error handling for 
incorrect input. It's on my todo list.
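
As a rough idea of what that could look like for the invalid timestamp case, here is a minimal sketch. It is not the Resource Resolver's code: the class and method names (TimestampCheck, errorFor) are hypothetical, and it only checks full WARC-format timestamps with java.time, so partial dates such as 2013 would need extra patterns. The caller could turn a returned message into a 400 response instead of letting the exception surface as a 500.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeParseException;
import java.util.Optional;

// Hypothetical helper, not part of the Resource Resolver.
final class TimestampCheck {
    // Returns an error message if the timestamp is not a valid full
    // WARC-format (ISO-8601 with offset) value, or empty if it parses.
    static Optional<String> errorFor(String timestamp) {
        try {
            OffsetDateTime.parse(timestamp);
            return Optional.empty();
        } catch (DateTimeParseException e) {
            return Optional.of("Invalid timestamp '" + timestamp + "': " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        // Hour 45 does not exist, so this reports an error instead of throwing.
        System.out.println(errorFor("2016-02-05T45:42:00Z").orElse("ok"));
        System.out.println(errorFor("2012-10-14T03:18:37Z").orElse("ok"));
    }
}
```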

For the problem with invalid CDX files, the Resource Resolver will now log a 
warning and skip the invalid file instead of aborting.
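
The skip-and-warn behaviour can be sketched roughly as below. This is a simplified illustration, not the code in the repository: LenientCdxLoader and the opener parameter are hypothetical, and the opener is assumed to throw IllegalArgumentException for a file it does not recognize (as in the ".out.cdx.swp is not a recognized CDX format" case).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.logging.Logger;

// Hypothetical helper, not part of OpenWayback.
final class LenientCdxLoader {
    private static final Logger LOG = Logger.getLogger(LenientCdxLoader.class.getName());

    // 'opener' stands in for whatever turns a file into a CDX source; it is
    // assumed to throw IllegalArgumentException for an unrecognized format.
    static <T> List<T> loadAll(Path dir, Function<Path, T> opener) throws IOException {
        List<T> sources = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // ignore subdirectories and other non-files
                }
                try {
                    sources.add(opener.apply(file));
                } catch (IllegalArgumentException e) {
                    // Warn and carry on instead of aborting startup.
                    LOG.warning("Skipping invalid CDX file " + file + ": " + e.getMessage());
                }
            }
        }
        return sources;
    }
}
```

With an opener that rejects the swap file, a directory containing out.cdx and .out.cdx.swp would yield one source and one logged warning.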

Thanks,

John Erik

On Friday, 16 September 2016 at 21:18:58 UTC+2, Lauren Ko wrote:
>
> Hi John Erik,
> Thanks for all your work on the Resource Resolver and the cdx-cli. I tried 
> them both successfully. I noticed a few things, but nothing major.
>
> For the Resource Resolver I basically just did what was documented in the 
> README: queried both /resource and /resourcelist, used old-style CDX and 
> CDXJ, tried the various parameters listed, sent request headers for the 
> different Accept values. Here are the issues I encountered (all were easily 
> overcome).
>
> When first trying to start up with 
> openwayback-resource-resolver-3.0.0-SNAPSHOT/bin/warr I got:
>  Exception in thread "main" java.lang.UnsupportedClassVersionError: 
> org/netpreserve/resource/resolver/Main : Unsupported major.minor version 
> 52.0
>  
> - I set $JAVA_HOME to Java 8 instead of the Java 7 that I had set by 
> default on my machine.
>
>
> Then trying to start again I got:
>  java.lang.IllegalArgumentException: 
> /tmp/warr/openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx/index.cdx is not 
> a recognized CDX format
>
>  - I remembered OpenWayback 3 requires SURT-formatted CDX files, so I 
> grabbed a SURT-formatted file.
>
>
> Tried to start again:
>  10:51:48.707 [main] INFO  org.netpreserve.commons.cdx.CdxSourceFactory - Loaded CDX Source Factory for scheme 'cdxfile'
>  10:51:48.712 [main] INFO  org.netpreserve.commons.cdx.cdxsource.CdxFileSourceFactory - Adding all files in '/tmp/warr/openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx' as cdx sources
>  10:51:48.713 [main] INFO  org.netpreserve.commons.cdx.cdxsource.CdxFileSourceFactory - Adding file '/tmp/warr/openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx/index_IA_surt.cdx' as a cdx source
>  Exception in thread "main" java.lang.IllegalArgumentException: Negative position
>     at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:670)
>     at org.netpreserve.commons.cdx.cdxsource.CdxFileDescriptor.<init>(CdxFileDescriptor.java:70)
>     at org.netpreserve.commons.cdx.cdxsource.CdxFileDescriptor.<init>(CdxFileDescriptor.java:55)
>     at org.netpreserve.commons.cdx.cdxsource.CdxFileSourceFactory.createCdxSource(CdxFileSourceFactory.java:71)
>     at org.netpreserve.commons.cdx.CdxSourceFactory.getCdxSource(CdxSourceFactory.java:62)
>     at org.netpreserve.resource.resolver.settings.SettingsUtil.lambda$createCdxSource$0(SettingsUtil.java:38)
>     at org.netpreserve.resource.resolver.settings.SettingsUtil$$Lambda$1/1279271200.apply(Unknown Source)
>     at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>     at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1359)
>     at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:512)
>     at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:502)
>     at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>     at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>     at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>     at org.netpreserve.resource.resolver.settings.SettingsUtil.createCdxSource(SettingsUtil.java:40)
>     at org.netpreserve.resource.resolver.ResourceResolverServer.<init>(ResourceResolverServer.java:69)
>     at org.netpreserve.resource.resolver.Main.main(Main.java:33)
>
>  - Turns out the first SURT-formatted CDX file I grabbed was 30GB and 
> seemed to be too big to handle. I fed the first 1,000,000 lines to a new 
> CDX file (359MB) and then it worked: Resource Resolver (v. 3.0.0-SNAPSHOT) 
> started.
>
>
> I tried doing some searches, stopped the Resource Resolver, and upon 
> trying to restart it I got:
>  Exception in thread "main" java.lang.IllegalArgumentException: 
> /tmp/warr/openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx/.out.cdx.swp is 
> not a recognized CDX format
>
>  - At some point my system had created a .out.cdx.swp (my cdx file was 
> called out.cdx). Not sure if Resource Resolver should ignore dot files or 
> if it should just be up to the user to handle this sort of issue.
>
>  
> - For a date range query, Resource Resolver did not include the exact 
> start time match (README says start date is inclusive) when precision is 
> down to the second. For example:
>  
> http://localhost:8080/resourcelist/http%3A%2F%2Fwww.basketball.com%2Frobots.txt?date=2012-10-14T03:18:37,2013
>  
> - Does not give me the entry in my CDX file with exact timestamp 
> 2012-10-14T03:18:37.
>
>
> - Also relating to timestamps, but maybe not a problem with the application 
> itself: the README says "The time stamp can be in either WARC-format 
> (e.g. 2016-02-05T45:42:00Z)..." In my initial testing I copied and pasted 
> that timestamp into my request URL without thinking and got a 500 error 
> before realizing that the example is not a valid time. My mistake, but 
> perhaps the example timestamp formats should be changed in the README. 
> Also, should the invalid time be handled so it doesn't throw a 500?
>
>
> I also tried out cdx-cli to get a CDXJ formatted index. I used both the 
> reformat and extract commands. I very much appreciate the thorough usage 
> instructions that will print at the command line. I did have one issue in 
> trying to convert an existing CDX file:
>
> | (pipe character) in URLs (but not in the query string) in the CDX file I 
> was trying to convert (status codes in CDX were 404s for these URLs) would 
> error and the reformatting process would stop.
>
>  $ cdxcli-1.0.0-SNAPSHOT/bin/cdxcli reformat -o 
> ../openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx/ -f cdxj -s -i out.cdx
> Reformatting: out.cdx into: 
> ../openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx/out.cdxj
> Illegal path: http://youtu.be/csorZustZbo|
>
> That is what I found in initial testing. Overall it worked well. Thanks 
> again!
>
> Lauren Ko
> UNT Libraries
>
> On Wed, Sep 14, 2016 at 9:20 AM, John Erik Halse <johner...@gmail.com> 
> wrote:
>
>> Hi all,
>>
>> A very early version of the Resource Resolver (aka CDX server) is ready 
>> for testing and feedback.
>> Have a look here for the details: 
>> https://github.com/iipc/openwayback/tree/3.0.0-DEV/openwayback-resource-resolver
>>
>> Since the Resource Resolver also supports the current CDX file format, 
>> you can test it right away, but if you want to use the new format, a tool 
>> is available here:
>> https://github.com/iipc/cdx-cli
>>
>> Best,
>>
>> John Erik Halse
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "openwayback-dev" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to openwayback-d...@googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
