of the
>>metadata is coming from, but it's probably worth it.
>>
>> One final note - I've not put the test WARC files in that repo yet as I
>>need to create some new ones from an Apache 2 source.
>>
>> I hope this is useful.
>>
>> Best,
>> Andy
>
2017 19:45
To: user@tika.apache.org
Subject: Re: Adding a WARC parser to Tika
On Mon, 10 Jul 2017, Allison, Timothy B. wrote:
> Sorry, I can't tell if this is tongue-in-cheek...
No, I do think we should add a WARC parser to Tika Parsers.
Once done, I'd suggest we figure out a way for Tika Batch to run over a
collection of WARC files just as it does for directories, to
Nick,
Sorry, I can't tell if this is tongue-in-cheek...
Should we look into this? Perhaps for the -z option?
-----Original Message-----
From: Nick Burch [mailto:apa...@gagravarr.org]
Sent: Friday, July 7, 2017 6:55 AM
To: user@tika.apache.org
Subject: RE: Tika content detection and crawled