Where are we going to share the WARC dataset for testing OWB 3? Thank you.
On Friday, August 19, 2016 at 12:06:30 PM UTC+2, Kristinn Sigurðsson wrote:
> Dear all,
>
> An OWB call was held on 17/08/16 @ 15:00 UTC. No agenda was sent out.
>
> The following topics were discussed.
>
> John Erik expects the new CDX server tools to be ready for testing very
> soon. This includes tools to generate the new CDXJ files as well as the CDX
> server itself. The CDX server will not be feature complete but should
> support the main use cases.
>
> John Erik raised the question whether the CDX server should be packed as a
> servlet (WAR) that is deployed into a web server (e.g. Tomcat) or if it
> should be published as a stand-alone utility (effectively embedding the web
> server). Doing so may reduce the complexity of setup and allow us to choose
> the most appropriate server. Currently John Erik is considering Grizzly for
> this (https://grizzly.java.net/dependencies.html). Comments on this are
> most welcome!
>
> With a major new piece needing testing Sawood raised the idea of a
> standard WARC dataset for testing. This has been discussed before and
> usually is well received in principle but (so far) no one has volunteered
> to put together a suitable dataset. We'd very much welcome such volunteers!
>
> There was some discussion about the practical differences between the
> Memento API and the CDX server API.
>
> Mohammed raised a question about input sanitization on URLs searched for
> in OWB. The general consensus was that the search JSP pages might benefit
> from preventing some obvious data entry errors (repeated protocol for
> example) but that any API level interfaces should leave this to the caller.
>
> Sawood advocated that issue https://github.com/iipc/openwayback/issues/285
> be considered for the CDX server. I.e. that the cdx server advertise its
> version number in HTTP response headers.
>
> There was a brief discussion on URI canonicalization. Existing
> canonicalizers can be over aggressive (e.g. down casing the entire URL).
> OWB 3 will include a new canonicalizer.
>
> The next OWB call will be on September 7 @ 15:00 UTC.
>
> Best,
> Kristinn
> -------------------------------------------------------------------------
> Landsbókasafn Íslands - Háskólabókasafn | Arngrímsgötu 3 - 107 Reykjavík
> Sími/Tel: +354 5255600 | www.landsbokasafn.is
> -------------------------------------------------------------------------
> fyrirvari/disclaimer - http://fyrirvari.landsbokasafn.is
