On Sat, Aug  1, 2015 at  8:07 AM, Nick Burch 
<apache-5Jw25rjQhWFrovVCs/[email protected]> wrote:
> On Sat, 1 Aug 2015, Nick Burch wrote:
>> If you need full control over the ordering, for now, you need to
>> write some code something like:
>>   DefaultDetector d1 = new DefaultDetector();
>>   MyCustomDetector d2 = new MyCustomDetector();
>>   CompositeDetector detector = new CompositeDetector(d1,d2);
>> Then use that composite detector everywhere
>>
>> It seems you can't use the Tika Config xml to set it up like that,
>> though you could for parsers, so I'll raise a JIRA for that
>
> Actually, you might be able to do that, but I just couldn't find any
> unit tests
>
> Try with
> <?xml version="1.0" encoding="UTF-8"?>
> <properties>
>  <detectors>
>   <detector class="org.apache.tika.detect.DefaultDetector"/>
>   <detector class="my.custom.detector"/>
>  </detectors>
> </properties>
>

Hey, Nick!

I have only now got back to this. I see the docs have been updated for
1.10 and this is documented now.

This approach only partly works for me, however. It occurred to me that
the *detect* method in *CompositeDetector* does not pass the current
*type* to the detectors it tries while testing for specialization[1].
So my detector has no way to know whether it is being run as a
specialization test or on a brand-new file, short of testing (e.g. for
an office file) a second time. Another option would be a "composite
parser", but that is not possible at the moment according to the Tika
web page[2].

I wonder whether the idea of updating the type in the metadata while
calling the various detectors looking for a specialization is worth a
JIRA.
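
To make the idea concrete, here is a rough sketch of what I have in
mind. It deliberately does not use the real Tika classes: SketchDetector,
TypeAwareCompositeSketch and the "X-Sketch-Current-Type" key are all
made-up names, types are plain strings, and the parent map is only a
stand-in for the media type registry. The loop is modelled on how I
read the composite's detect method, with one addition: the best type
found so far is written into the metadata before each detector runs.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a detector; NOT the real Tika interface.
interface SketchDetector {
    // Returns a media type name, or null when the detector has no opinion.
    String detect(byte[] prefix, Map<String, String> metadata);
}

// Hypothetical composite that exposes the current best type to each detector.
final class TypeAwareCompositeSketch implements SketchDetector {
    // Made-up metadata key; not an existing Tika property.
    static final String CURRENT_TYPE_KEY = "X-Sketch-Current-Type";
    static final String OCTET_STREAM = "application/octet-stream";

    private final List<SketchDetector> detectors;
    // child type -> parent type; assumed to be free of cycles.
    private final Map<String, String> parents;

    TypeAwareCompositeSketch(List<SketchDetector> detectors,
                             Map<String, String> parents) {
        this.detectors = detectors;
        this.parents = parents;
    }

    // True when candidate is a strict descendant of base.
    // Every other type is treated as a specialization of octet-stream.
    boolean isSpecializationOf(String candidate, String base) {
        if (candidate == null || candidate.equals(base)) {
            return false;
        }
        if (OCTET_STREAM.equals(base)) {
            return true;
        }
        for (String t = parents.get(candidate); t != null; t = parents.get(t)) {
            if (t.equals(base)) {
                return true;
            }
        }
        return false;
    }

    @Override
    public String detect(byte[] prefix, Map<String, String> metadata) {
        String type = OCTET_STREAM;
        for (SketchDetector detector : detectors) {
            // The proposed change: tell the detector what is known so far.
            metadata.put(CURRENT_TYPE_KEY, type);
            String detected = detector.detect(prefix, metadata);
            if (isSpecializationOf(detected, type)) {
                type = detected;
            }
        }
        // Do not leak the helper key to callers.
        metadata.remove(CURRENT_TYPE_KEY);
        return type;
    }
}
```

With something like this, a custom detector could check the helper key
and only do its expensive check when the current type is already, say,
an Excel type, instead of re-testing for an office file itself.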

On second thought, with most software out there (e.g. Alfresco) not
handling the concept of type specialization and fallbacks, my initial
goal seems somewhat futile, as I won't be able to edit my subset as I
could, e.g., with Excel if it were a separate type.


Footnotes: 
[1]  https://github.com/apache/tika/blob/trunk/tika-core/src/main/java/org/apache/tika/detect/CompositeDetector.java#L73

[2]  https://tika.apache.org/1.10/configuring.html#Configuring_Parsers

-- 
Mikhail
