Direct Java calls; I am using the AutoDetectParser at the moment.



I found an online example buried in a test for another package, so I have
worked out how to do it now, but it seems that if I have many different
document types to support I will have to configure each parser separately. So
be it, but there seems to be a case for a subset of options that apply to all
parsers, such as "extract anything that qualifies as a 'macro'", which every
parser would obey unless it has been given a parser-specific setting.
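
To make the suggestion concrete, here is a minimal sketch of the lookup order I have in mind. It is not Tika code: CommonConfig, OfficeConfig, Context and shouldExtractMacros are all hypothetical names invented for illustration, and the Context class only imitates the class-keyed style of ParseContext.

```java
import java.util.HashMap;
import java.util.Map;

public class SharedParserDefaults {

    /** Hypothetical options shared by every parser; not a Tika class. */
    public static final class CommonConfig {
        public final boolean extractMacros;
        public CommonConfig(boolean extractMacros) {
            this.extractMacros = extractMacros;
        }
    }

    /** Hypothetical stand-in for a parser-specific config object. */
    public static final class OfficeConfig {
        public final boolean extractMacros;
        public OfficeConfig(boolean extractMacros) {
            this.extractMacros = extractMacros;
        }
    }

    /** Minimal class-keyed context, imitating the style of a ParseContext. */
    public static final class Context {
        private final Map<Class<?>, Object> entries = new HashMap<>();

        public <T> void set(Class<T> key, T value) {
            entries.put(key, value);
        }

        public <T> T get(Class<T> key) {
            // Class.cast returns null when no entry was stored for the key.
            return key.cast(entries.get(key));
        }
    }

    /**
     * Lookup order: a parser-specific setting wins, otherwise the shared
     * default applies, otherwise macros are not extracted.
     */
    public static boolean shouldExtractMacros(Context ctx) {
        OfficeConfig specific = ctx.get(OfficeConfig.class);
        if (specific != null) {
            return specific.extractMacros;
        }
        CommonConfig common = ctx.get(CommonConfig.class);
        return common != null && common.extractMacros;
    }

    public static void main(String[] args) {
        Context ctx = new Context();
        // Only the shared default is set, so a parser would fall back to it.
        ctx.set(CommonConfig.class, new CommonConfig(true));
        System.out.println("extract macros: " + shouldExtractMacros(ctx));
    }
}
```

With something like this, a caller would set the shared option once and only touch a parser-specific config when one document type needs different behaviour.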



It is my opinion (for what it's worth 😉) that all parsers should extract
everything they can unless told otherwise, but it is what it is, I guess, and I
am pleased to have Tika as an aid in analyzing all the myriad document types.



Jim



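        // Per-parse settings travel in the ParseContext, so this same pc
        // has to be handed to parser.parse(...) for the setting to apply.
        pc = new ParseContext();
        parser = new AutoDetectParser();

        // Turn macro extraction back on for the Office parsers.
        OfficeParserConfig officeParserConfig = new OfficeParserConfig();
        officeParserConfig.setExtractMacros(true);
        pc.set(OfficeParserConfig.class, officeParserConfig);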







> -----Original Message-----
> From: Nick Burch [mailto:apa...@gagravarr.org]
> Sent: Saturday, June 3, 2017 16:36
> To: user@tika.apache.org
> Subject: Re: Extracting macros in 1.15
>
> On Sat, 3 Jun 2017, Jim Idle wrote:
> > After being baffled why macros no longer show up in 1.15 I found:
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_TIKA-2D2302&d=DwIBAg&c=Vxt5e0Osvvt2gflwSlsJ5DmPGcPvTRKLJyp031rXjhg&r=LQ_Q8ZxvkO2zK857fAbj5MDtaB4Bvrpw3bihfO3Bhbw&m=o8gr8gP1-grepBVLNkl9r56fM6Jt6LIlRff8aub3bEA&s=8nhkO_W_dLX6R9XdCgmgqoEpbRlvVLiSwf4LrAFE1tA&e=
> >
> > Can anyone point me to an example of doing this? I am finding bits and
> > pieces but no example of turning macros back on. I basically want all
> > macros in all documents, office, pdf, anything really.
>
> How do you call Apache Tika? Tika App? Tika Server? Tika java class facade?
> Direct Java calls to TikaConfig / AutoDetectParser etc?
>
> The solution will differ depending on which one you use
>
> Nick
