On Sat, Aug 1, 2015 at 8:07 AM, Nick Burch <apache-5Jw25rjQhWFrovVCs/[email protected]> wrote:
> On Sat, 1 Aug 2015, Nick Burch wrote:
>> If you need full control over the ordering, for now, you need to
>> write some code something like:
>>   DefaultDetector d1 = new DefaultDetector();
>>   MyCustomDetector d2 = new MyCustomDetector();
>>   CompositeDetector detecter = new CompositeDetector(d1,d2);
>> Then use that composite detector everywhere
>>
>> It seems you can't use the Tika Config xml to set it up like that,
>> though you could for parsers, so I'll raise a JIRA for that
>
> Actually, you might be able to do that, but I just couldn't find any
> unit tests
>
> Try with
>  <?xml version="1.0" encoding="UTF-8"?>
>  <properties>
>    <detectors>
>      <detector class="org.apache.tika.detector.DefaultDetector"/>
>      <detector class="my.custom.detector"/>
>    </detectors>
>  </properties>
>

Hey, Nick!

Only now did I get back to this. I see that all the docs have been updated for 1.10 and that this is documented now.

This approach, however, only partly works for me. It turns out that the *detect* method in *CompositeDetector* does not pass in the current *type* while trying detectors and testing for specialization[1]. Thus my detector has no way to know whether it is being run as a specialization test or on a brand new file, short of running the earlier test (e.g. for an office file) a second time.

Another option would be a "composite parser", but this is not possible at the moment according to the Tika web page[2].

I wonder whether the idea of updating the type in the metadata while calling the various detectors in search of a specialization is worth a JIRA.

On second thought, with most software out there (e.g. Alfresco) not handling the concept of type specialization and fallbacks, my initial goal seems somewhat futile, as I won't be able to edit my subset with, e.g., Excel if it is a separate type.

Footnotes:
[1] https://github.com/apache/tika/blob/trunk/tika-core/src/main/java/org/apache/tika/detect/CompositeDetector.java#L73
[2] https://tika.apache.org/1.10/configuring.html#Configuring_Parsers

-- 
Mikhail
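
P.S. To make the metadata idea concrete, here is a rough, self-contained Java sketch. It is not Tika's code and does not use Tika's classes: SketchDetector, SpecializationSketch, the "sketch:current-type" key, the "application/x-my-subset" type and the hard-coded parent table are all hypothetical stand-ins, with plain strings for types and a plain map for metadata. It only illustrates a composite loop that records the best type found so far before calling each detector, so that a later detector can tell a specialization pass from a fresh file.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a detector; NOT Tika's Detector interface.
interface SketchDetector {
    // Returns a type name, or OCTET_STREAM when the detector has no opinion.
    String detect(byte[] prefix, Map<String, String> metadata);
}

final class SpecializationSketch {
    static final String OCTET_STREAM = "application/octet-stream";
    // Hypothetical metadata key holding the best type found so far.
    static final String CURRENT_TYPE_KEY = "sketch:current-type";

    // Hypothetical child -> parent table standing in for a type registry.
    static final Map<String, String> PARENT = new HashMap<>();
    static {
        PARENT.put("application/x-my-subset", "application/vnd.ms-excel");
    }

    // True if candidate is strictly more specific than base.
    static boolean isSpecializationOf(String candidate, String base) {
        if (candidate.equals(OCTET_STREAM)) return false;
        if (base.equals(OCTET_STREAM)) return true;
        for (String t = PARENT.get(candidate); t != null; t = PARENT.get(t)) {
            if (t.equals(base)) return true;
        }
        return false;
    }

    // Composite loop: publish the current best type, then ask the next detector.
    static String detect(List<SketchDetector> detectors, byte[] prefix,
                         Map<String, String> metadata) {
        String best = OCTET_STREAM;
        for (SketchDetector d : detectors) {
            metadata.put(CURRENT_TYPE_KEY, best);
            String candidate = d.detect(prefix, metadata);
            if (isSpecializationOf(candidate, best)) {
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Pretends every input is an Excel file.
        SketchDetector office = (prefix, md) -> "application/vnd.ms-excel";
        // Only claims its subset when an earlier detector already said "Excel".
        SketchDetector custom = (prefix, md) ->
            "application/vnd.ms-excel".equals(md.get(CURRENT_TYPE_KEY))
                ? "application/x-my-subset"
                : OCTET_STREAM;
        System.out.println(
            detect(Arrays.asList(office, custom), new byte[0], new HashMap<>()));
    }
}
```

With the detectors in the order (office, custom), the custom one sees "application/vnd.ms-excel" in the metadata and can skip re-testing for an office file; in the reverse order it sees only the octet-stream default and stays silent.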
