Re: Adding a WARC parser to Tika

2017-07-11 Thread Jackson, Andy
of the metadata is coming from, but it's probably worth it. One final note - I've not put the test WARC files in that repo yet as I need to create some new ones from an Apache 2 source. I hope this is useful. Best, Andy

RE: Adding a WARC parser to Tika

2017-07-11 Thread Jackson, Andy
2017 19:45 To: user@tika.apache.org Subject: Re: Adding a WARC parser to Tika On Mon, 10 Jul 2017, Allison, Timothy B. wrote: > Sorry, I can't tell if this is tongue-in-cheek... No, I do think we should add a WARC parser to Tika Parsers. Once done, I'd suggest we figure out a way for Tika Ba

Re: Adding a WARC parser to Tika

2017-07-10 Thread Nick Burch
On Mon, 10 Jul 2017, Allison, Timothy B. wrote: Sorry, I can't tell if this is tongue-in-cheek... No, I do think we should add a WARC parser to Tika Parsers. Once done, I'd suggest we figure out a way for Tika Batch to run over a collection of WARC files just as it does for directories, to

Adding a WARC parser to Tika

2017-07-10 Thread Allison, Timothy B.
Nick, Sorry, I can't tell if this is tongue-in-cheek... Should we look into this? Perhaps for the -z option? -----Original Message----- From: Nick Burch [mailto:apa...@gagravarr.org] Sent: Friday, July 7, 2017 6:55 AM To: user@tika.apache.org Subject: RE: Tika content detection and crawled