Hi John Erik,
I tried things out with the changes you made, and it looks like the issues
I mentioned are all addressed. I did also notice a small logging issue when
starting up WARR.

With multiple CDX sources, the log message written as each one is added at
startup always shows the first file name:
https://github.com/iipc/webarchive-commons/blob/2.0.0-DEV/webarchive-commons-cdx/src/main/java/org/netpreserve/commons/cdx/cdxsource/CdxFileSourceFactory.java#L84
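I haven't reproduced the linked line here, but this symptom usually comes from a loop that logs a fixed element (for example the first entry of the file list) instead of the loop variable. Below is a minimal sketch of the intended behaviour. It uses java.util.logging only to stay self-contained, and `CdxSourceLogging` and `logAddedSources` are hypothetical names, not the webarchive-commons API:

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

class CdxSourceLogging {
    private static final Logger LOG = Logger.getLogger(CdxSourceLogging.class.getName());

    /**
     * Hypothetical helper: logs one "Adding file" line per CDX file and
     * returns the messages so they can be checked. Each message is built
     * from the loop variable {@code file}, not a fixed element such as files.get(0).
     */
    static List<String> logAddedSources(List<Path> files) {
        List<String> messages = new ArrayList<>();
        for (Path file : files) {
            String msg = "Adding file '" + file + "' as a cdx source";
            LOG.info(msg);
            messages.add(msg);
        }
        return messages;
    }
}
```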

Thanks again!
Lauren

On Thu, Sep 22, 2016 at 7:58 AM, John Erik Halse <johnerikha...@gmail.com> wrote:

> Hi Lauren,
>
> Thanks for testing and finding bugs. I'm sure there are a lot more :-)
>
> For the cdx-cli tool: I found the bug with the pipe symbol; it should work
> now.
>
> I think I have also fixed the bugs in the Resource Resolver. I also
> updated the README to state that Java 8 is required.
>
> I totally agree that the Resource Resolver's error messages for incorrect
> input need better handling. It's on my todo list.
>
> As for invalid CDX files, the Resource Resolver will now log a warning and
> skip the invalid file instead of aborting.
>
> Thanks,
>
> John Erik
>
>
> On Friday, 16 September 2016 21:18:58 UTC+2, Lauren Ko wrote:
>
>> Hi John Erik,
>> Thanks for all your work on the Resource Resolver and the cdx-cli. I
>> tried them both successfully. I noticed a few things, but nothing major.
>>
>> For the Resource Resolver I basically just did what was documented in the
>> README: queried both /resource and /resourcelist, used old-style CDX and
>> CDXJ, tried the various parameters listed, sent request headers for the
>> different Accept values. Here are the issues I encountered (all were easily
>> overcome).
>>
>> When first trying to start up with
>> openwayback-resource-resolver-3.0.0-SNAPSHOT/bin/warr
>> I got:
>>  Exception in thread "main" java.lang.UnsupportedClassVersionError: org/netpreserve/resource/resolver/Main : Unsupported major.minor version 52.0
>>
>> - Class file version 52.0 corresponds to Java 8, so I set $JAVA_HOME to
>> Java 8 instead of the Java 7 that I had set by default on my machine.
>>
>>
>> Then, trying to start again, I got:
>>  java.lang.IllegalArgumentException: /tmp/warr/openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx/index.cdx is not a recognized CDX format
>>
>>  - I remembered OpenWayback 3 requires SURT-formatted CDX files, so I
>> grabbed a SURT-formatted file.
>>
>>
>> Tried to start again:
>>  10:51:48.707 [main] INFO  org.netpreserve.commons.cdx.CdxSourceFactory - Loaded CDX Source Factory for scheme 'cdxfile'
>>  10:51:48.712 [main] INFO  org.netpreserve.commons.cdx.cdxsource.CdxFileSourceFactory - Adding all files in '/tmp/warr/openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx' as cdx sources
>>  10:51:48.713 [main] INFO  org.netpreserve.commons.cdx.cdxsource.CdxFileSourceFactory - Adding file '/tmp/warr/openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx/index_IA_surt.cdx' as a cdx source
>>  Exception in thread "main" java.lang.IllegalArgumentException: Negative position
>>      at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:670)
>>      at org.netpreserve.commons.cdx.cdxsource.CdxFileDescriptor.<init>(CdxFileDescriptor.java:70)
>>      at org.netpreserve.commons.cdx.cdxsource.CdxFileDescriptor.<init>(CdxFileDescriptor.java:55)
>>      at org.netpreserve.commons.cdx.cdxsource.CdxFileSourceFactory.createCdxSource(CdxFileSourceFactory.java:71)
>>      at org.netpreserve.commons.cdx.CdxSourceFactory.getCdxSource(CdxSourceFactory.java:62)
>>      at org.netpreserve.resource.resolver.settings.SettingsUtil.lambda$createCdxSource$0(SettingsUtil.java:38)
>>      at org.netpreserve.resource.resolver.settings.SettingsUtil$$Lambda$1/1279271200.apply(Unknown Source)
>>      at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>>      at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1359)
>>      at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:512)
>>      at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:502)
>>      at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>>      at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>>      at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>>      at org.netpreserve.resource.resolver.settings.SettingsUtil.createCdxSource(SettingsUtil.java:40)
>>      at org.netpreserve.resource.resolver.ResourceResolverServer.<init>(ResourceResolverServer.java:69)
>>      at org.netpreserve.resource.resolver.Main.main(Main.java:33)
>>
>>  - It turns out the first SURT-formatted CDX file I grabbed was 30GB and
>> seemed to be too big to handle. I copied the first 1,000,000 lines into a
>> new CDX file (359MB), and then it worked: Resource Resolver (v. 3.0.0-SNAPSHOT)
>> started.
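The thread doesn't identify the cause of the "Negative position" error. `FileChannel.read(ByteBuffer, long)` throws `IllegalArgumentException` when the position is negative. On files over 2 GiB, a common way to end up with a negative position is narrowing a `long` offset to `int` somewhere before the read. The sketch below shows that failure mode. It assumes, without having checked the CdxFileDescriptor source, that the position is derived from the file size, and `CdxOffsets` and its methods are hypothetical:

```java
class CdxOffsets {
    /**
     * Hypothetical: start of the last block, computed after narrowing the
     * file size to int. (int) keeps only the low 32 bits, so sizes of
     * 2 GiB and above can come out negative.
     */
    static long lastBlockPositionNarrowed(long fileSize, int blockSize) {
        int size = (int) fileSize;
        return size - blockSize;
    }

    /** Same computation kept in long arithmetic, clamped at the file start. */
    static long lastBlockPosition(long fileSize, int blockSize) {
        return Math.max(0L, fileSize - blockSize);
    }
}
```

With a 3 GiB size (`3L << 30`), the narrowed version returns a negative value. Passing that value to `FileChannel.read` would raise the same exception. The `long` version returns `(3L << 30) - 4096`.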
>>
>>
>> I tried doing some searches, stopped the Resource Resolver, and upon
>> trying to restart it I got:
>>  Exception in thread "main" java.lang.IllegalArgumentException: /tmp/warr/openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx/.out.cdx.swp is not a recognized CDX format
>>
>>  - At some point my system had created a .out.cdx.swp (my CDX file was
>> called out.cdx). I'm not sure whether the Resource Resolver should ignore
>> dot files or whether it should be up to the user to handle this sort of issue.
>>
>>
>> - For a date range query with precision down to the second, the Resource
>> Resolver did not include an entry exactly matching the start time (the
>> README says the start date is inclusive). For example:
>>  http://localhost:8080/resourcelist/http%3A%2F%2Fwww.basketball.com%2Frobots.txt?date=2012-10-14T03:18:37,2013
>>
>> - does not return the entry in my CDX file with the exact timestamp
>> 2012-10-14T03:18:37.
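One plausible cause, not confirmed in the thread, is a strict comparison against the start of the range. With `java.time.Instant`, the difference between an exclusive and an inclusive lower bound looks like this. `DateRangeFilter` and its methods are hypothetical:

```java
import java.time.Instant;

class DateRangeFilter {
    /** Hypothetical strict comparison: drops a capture whose timestamp equals the start. */
    static boolean afterStartExclusive(Instant capture, Instant start) {
        return capture.isAfter(start);
    }

    /** Inclusive lower bound, matching the README's description of the start date. */
    static boolean afterStartInclusive(Instant capture, Instant start) {
        return !capture.isBefore(start);
    }
}
```

For a capture at exactly 2012-10-14T03:18:37Z, only the inclusive check accepts it.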
>>
>>
>> - Also relating to timestamps, though perhaps not a problem with the
>> application itself: the README says "The time stamp can be in either
>> WARC-format (e.g. 2016-02-05T45:42:00Z)..." In my initial testing I
>> copied and pasted that timestamp into my request URL without thinking and
>> got a 500 error before realizing that the example is not a valid time (the
>> hour is 45). My mistake, but perhaps the example timestamps in the README
>> should be changed. Also, should an invalid time be handled so that it
>> doesn't cause a 500?
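An invalid timestamp is a client error, so a 400 Bad Request seems more appropriate than a 500. Here is a minimal sketch. It assumes the WARC-style timestamp is parsed with `java.time.Instant.parse`, and it doesn't cover the README's other accepted formats. `TimestampParam.statusFor` is a hypothetical name:

```java
import java.time.DateTimeException;
import java.time.Instant;

class TimestampParam {
    /**
     * Hypothetical: returns the HTTP status a request with this timestamp
     * should get. Instant.parse reports any parse failure, including an
     * out-of-range field such as hour 45, as a DateTimeParseException
     * (a subclass of DateTimeException).
     */
    static int statusFor(String timestamp) {
        try {
            Instant.parse(timestamp);
            return 200;
        } catch (DateTimeException e) {
            return 400; // Bad Request instead of an unhandled exception (500)
        }
    }
}
```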
>>
>>
>> I also tried out cdx-cli to get a CDXJ-formatted index. I used both the
>> reformat and extract commands. I very much appreciate the thorough usage
>> instructions printed at the command line. I did have one issue when
>> trying to convert an existing CDX file:
>>
>> A | (pipe character) in URLs (outside the query string) in the CDX file I
>> was converting caused an error, and the reformatting process stopped. The
>> status codes in the CDX for these URLs were 404s.
>>
>>  $ cdxcli-1.0.0-SNAPSHOT/bin/cdxcli reformat -o ../openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx/ -f cdxj -s -i out.cdx
>> Reformatting: out.cdx into: ../openwayback-resource-resolver-3.0.0-SNAPSHOT/cdx/out.cdxj
>> Illegal path: http://youtu.be/csorZustZbo|
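The "Illegal path" message suggests the URL went through a strict parser. `java.net.URI`, for example, rejects an unencoded `|` in the path. Percent-encoding it as `%7C` first is one workaround. This sketch assumes a `java.net.URI`-style parser, and cdx-cli's actual parser may differ. `PipeInUrl` and its methods are hypothetical:

```java
import java.net.URI;
import java.net.URISyntaxException;

class PipeInUrl {
    /** Hypothetical check: true if java.net.URI accepts the string unchanged. */
    static boolean parsesAsIs(String url) {
        try {
            new URI(url);
            return true;
        } catch (URISyntaxException e) {
            return false; // e.g. "Illegal character in path" for a trailing '|'
        }
    }

    /** Hypothetical workaround: percent-encode '|' (U+007C) before parsing. */
    static URI parseLeniently(String url) {
        return URI.create(url.replace("|", "%7C"));
    }
}
```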
>>
>> That is what I found in initial testing. Overall it worked well. Thanks
>> again!
>>
>> Lauren Ko
>> UNT Libraries
>>
>> On Wed, Sep 14, 2016 at 9:20 AM, John Erik Halse <johner...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> A very early version of the Resource Resolver (aka CDX server) is ready
>>> for testing and feedback.
>>> Have a look here for the details: https://github.com/iipc/openwayback/tree/3.0.0-DEV/openwayback-resource-resolver
>>>
>>> Since the Resource Resolver also supports the current CDX file format,
>>> you can test it right away, but if you want to use the new format, a tool
>>> is available here:
>>> https://github.com/iipc/cdx-cli
>>>
>>> Best,
>>>
>>> John Erik Halse
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "openwayback-dev" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to openwayback-d...@googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
