Where are we going to share the WARC dataset for testing OWB 3?

Thank you.

On Friday, August 19, 2016 at 12:06:30 PM UTC+2, Kristinn Sigurðsson wrote:
>
> Dear all, 
>
> An OWB call was held on 17/08/16 @ 15:00 UTC. No agenda was sent out. 
>
> The following topics were discussed. 
>
>
> John Erik expects the new CDX server tools to be ready for testing very 
> soon. This includes tools to generate the new CDXJ files as well as the CDX 
> server itself. The CDX server will not be feature complete but should 
> support the main use cases. 
>
>
> John Erik raised the question of whether the CDX server should be packaged
> as a servlet (WAR) that is deployed into a web server (e.g. Tomcat) or
> published as a stand-alone utility (effectively embedding the web
> server). The stand-alone option may reduce the complexity of setup and
> allow us to choose the most appropriate server. Currently John Erik is
> considering Grizzly for this (https://grizzly.java.net/dependencies.html ).
> Comments on this are most welcome!
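To make the stand-alone option concrete, here is a minimal sketch, assuming Java and using the JDK's built-in com.sun.net.httpserver package as a stand-in for Grizzly (Grizzly's own API is not shown). The class name StandaloneCdxServer, the /cdx path and the placeholder response are hypothetical; a real CDX server would answer queries from its index.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone launcher: the HTTP server runs inside the same
// process, so no external servlet container (e.g. Tomcat) is required.
// Uses the JDK's com.sun.net.httpserver as a stand-in for Grizzly.
public final class StandaloneCdxServer {

    private StandaloneCdxServer() {}

    /** Starts the embedded server on the given port (0 picks any free port). */
    public static HttpServer start(int port) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
            // "/cdx" and the placeholder body are hypothetical; a real CDX
            // server would look up the query in its index here.
            server.createContext("/cdx", exchange -> {
                byte[] body = "placeholder\n".getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 8080;
        HttpServer server = start(port);
        System.out.println("CDX server listening on port " + server.getAddress().getPort());
    }
}
```

The trade-off is that things a container like Tomcat would normally handle (TLS, access logging, thread pools) then have to be configured in the application itself.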
>
>
> With a major new piece needing testing, Sawood raised the idea of a
> standard WARC dataset for testing. This has been discussed before and
> is usually well received in principle, but (so far) no one has volunteered
> to put together a suitable dataset. We'd very much welcome such volunteers!
>
>
> There was some discussion about the practical differences between the 
> Memento API and the CDX server API. 
>
>
> Mohammed raised a question about input sanitization of URLs searched for
> in OWB. The general consensus was that the search JSP pages might benefit
> from preventing some obvious data-entry errors (a repeated protocol, for
> example) but that API-level interfaces should leave this to the caller.
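A minimal sketch of the kind of clean-up discussed for the search pages, assuming Java; the class SearchInputCleaner and its clean method are hypothetical names. It only collapses a run of leading http:// or https:// prefixes and leaves everything else alone, so API-level interfaces can keep passing the caller's URL through untouched.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for the search form only; API endpoints would pass
// the caller's URL through unchanged.
public final class SearchInputCleaner {

    // A run of one or more leading "http://" or "https://" prefixes, any case.
    private static final Pattern LEADING_SCHEMES = Pattern.compile("^(?:(?i:https?)://)+");

    private SearchInputCleaner() {}

    /** Trims whitespace and collapses a repeated scheme, keeping the last one typed. */
    public static String clean(String input) {
        String s = input.trim();
        Matcher m = LEADING_SCHEMES.matcher(s);
        if (!m.find()) {
            return s; // no scheme typed at all; leave as-is
        }
        // Start of the final "http"/"https" in the run, e.g. index 7 in "http://https://".
        int lastScheme = m.group().toLowerCase(Locale.ROOT).lastIndexOf("http");
        return s.substring(lastScheme);
    }
}
```

For example, `http://http://example.com/` becomes `http://example.com/`, while `http://httpbin.org/` is left untouched because only complete `scheme://` prefixes are matched.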
>
>
> Sawood advocated that issue https://github.com/iipc/openwayback/issues/285
> be considered for the CDX server, i.e. that the CDX server advertise its
> version number in HTTP response headers.
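A minimal sketch of what that could look like, assuming Java and the JDK's com.sun.net.httpserver.Headers type. The header choice (the standard Server header with a product/version token) and the values openwayback-cdx and 3.0.0-SNAPSHOT are assumptions for illustration; issue 285 may settle on a different header name.

```java
import com.sun.net.httpserver.Headers;

// Hypothetical helper: stamps every CDX server response with its version so
// clients can tell which server version answered.
public final class VersionHeader {

    // Hypothetical product name and version string.
    public static final String PRODUCT = "openwayback-cdx";
    public static final String VERSION = "3.0.0-SNAPSHOT";

    private VersionHeader() {}

    /** Sets "Server: <product>/<version>" on the given response headers. */
    public static void apply(Headers responseHeaders) {
        responseHeaders.set("Server", PRODUCT + "/" + VERSION);
    }
}
```

Inside an HttpHandler this would be called as `VersionHeader.apply(exchange.getResponseHeaders())` before `sendResponseHeaders`, since headers cannot be changed once they have been sent.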
>
>
> There was a brief discussion on URI canonicalization. Existing
> canonicalizers can be overly aggressive (e.g. lowercasing the entire URL,
> even though the path and query are case-sensitive). OWB 3 will include a
> new canonicalizer.
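As an illustration of a less aggressive approach, here is a minimal sketch assuming Java and java.net.URI. It lowercases only the scheme and host (the case-insensitive parts under RFC 3986), drops default ports, user-info and the fragment, and keeps the path and query exactly as given. The class name ConservativeCanonicalizer is hypothetical and this is not the OWB 3 canonicalizer.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical, deliberately conservative canonicalizer: only the parts of a
// URL that are case-insensitive (scheme and host) are lowercased.
public final class ConservativeCanonicalizer {

    private ConservativeCanonicalizer() {}

    public static String canonicalize(String url) {
        URI uri = URI.create(url.trim()); // throws IllegalArgumentException if malformed
        if (uri.getScheme() == null || uri.getHost() == null) {
            throw new IllegalArgumentException("absolute URL with a host expected: " + url);
        }
        String scheme = uri.getScheme().toLowerCase(Locale.ROOT);
        String host = uri.getHost().toLowerCase(Locale.ROOT);
        int port = uri.getPort();
        // Default ports carry no information, so drop them.
        if ((scheme.equals("http") && port == 80) || (scheme.equals("https") && port == 443)) {
            port = -1;
        }
        // Path and query are case-sensitive: keep them (percent-encoding included) as given.
        String path = uri.getRawPath().isEmpty() ? "/" : uri.getRawPath();
        StringBuilder sb = new StringBuilder(scheme).append("://").append(host);
        if (port != -1) {
            sb.append(':').append(port);
        }
        sb.append(path);
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        return sb.toString(); // user-info and fragment (#...) are not carried over
    }
}
```

For example, `HTTP://Example.COM:80/Path/Page.HTML?Q=A#top` becomes `http://example.com/Path/Page.HTML?Q=A`, whereas lowercasing the whole URL would turn the path into `/path/page.html`, which a case-sensitive server may treat as a different resource.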
>
>
> The next OWB call will be on September 7 @ 15:00 UTC. 
>
>
> Best, 
> Kristinn 
> ------------------------------------------------------------------------- 
> Landsbókasafn Íslands - Háskólabókasafn | Arngrímsgötu 3 - 107 Reykjavík 
> Sími/Tel: +354 5255600 | www.landsbokasafn.is 
> ------------------------------------------------------------------------- 
> fyrirvari/disclaimer - http://fyrirvari.landsbokasafn.is 
>

-- 
You received this message because you are subscribed to the Google Groups 
"openwayback-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.
