Not sure what to do with this one.

The triggering document has a run of ~50 <div> start tags followed by ~50+ 
<font> start tags.  So, yes: Tika limits nested elements to 100, and this 
document exceeds that.

Tika's DefaultHtmlMapper only passes through a few handfuls of elements 
(SAFE_ELEMENTS), not including <font> or <div>. 

Solr's MostlyPassThroughHtmlMapper passes through, well, mostly everything.
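
For what it's worth, here is a rough way to check how deeply a given HTML 
document nests before sending it through. This is only a sketch in Python 
using the standard library; it is not Tika's or TagSoup's actual logic, and 
the names DepthCounter and max_nesting_depth are made up for illustration. 
The sample string just imitates the shape described above (50 <div> starts 
followed by 55 <font> starts); it is not the real document.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they must not add depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class DepthCounter(HTMLParser):
    """Hypothetical helper: tracks the deepest stack of open elements."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        self.stack.append(tag)
        self.max_depth = max(self.max_depth, len(self.stack))

    def handle_endtag(self, tag):
        # Ignore stray end tags; otherwise close everything up to and
        # including the nearest matching open element.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def max_nesting_depth(html):
    counter = DepthCounter()
    counter.feed(html)
    counter.close()
    return counter.max_depth


# Synthetic input imitating the shape of the triggering document.
sample = "<div>" * 50 + "<font>" * 55 + "text" + "</font>" * 55 + "</div>" * 50
print(max_nesting_depth(sample))  # 105, i.e. over a limit of 100
```

A mapper that drops <div> and <font> would never see this depth, which is 
why the choice of HtmlMapper matters here.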


-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Thursday, September 22, 2016 12:47 PM
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Disabling Zip bomb detection in Tika

So far a Tika JIRA seems like the right thing. Tim is "a well known entity"
in Solr though so I'm sure he'll move it over to Solr if appropriate.

Erick

On Thu, Sep 22, 2016 at 9:43 AM, Rodrigo Rosenfeld Rosas 
<rr_ro...@yahoo.com.br.invalid> wrote:
> Here it is. Not sure if it's clear enough though:
>
> https://issues.apache.org/jira/browse/TIKA-2091
>
> Or should I have created the ticket in the Solr project instead?
>
>
> On 22-09-2016 13:32, Rodrigo Rosenfeld Rosas wrote:
>>
>> This is one of the documents:
>>
>>
>> https://www.sec.gov/Archives/edgar/data/1472033/000119380513001310/e611133_f6ef-eutelsat.htm
>>
>> I'll try to create a ticket for this on Jira if I find its location 
>> but feel free to open it yourself if you prefer, just let me know.
>>
>> On 22-09-2016 12:33, Allison, Timothy B. wrote:
>>>>
>>>> I'll try to get a sample HTML file that triggers this problem and 
>>>> attach it to Jira.
>>>
>>> Great!  Tika 1.14 is around the corner...if this is an easy fix ... 
>>> :)
>>>
>>> Thank you.
>>>
>>
>>
>
