Tim,

Thanks for that. It did show up in the release notes, but it was not immediately 
obvious to me where to make the config call. Once I found an external source, 
it fell into place easily enough. I suspect the difficulty was just that I had 
not needed to configure anything specifically before now, and that a similar 
change in the future would be obvious. Once I worked out which config container 
class to use, it was trivial.

Big projects like this are difficult to keep documented and full of useful 
examples, in part because most people want to code rather than document. We had 
similar issues with ANTLR for a while until a book was written (which was, of 
course, immediately out of date, but at least gave people a fighting chance!) :)

Personally, I think that a bunch of meaty code examples would be a lot better 
than trying to document everything explicitly, so long as they had some good 
comments.

Thanks for a worthy project and the continuous updates!

Jim



From: Allison, Timothy B. [mailto:talli...@mitre.org]
Sent: Tuesday, June 6, 2017 01:39
To: user@tika.apache.org
Subject: RE: Extracting macros in 1.15

Y, sorry about that surprise.  I tried to communicate it in the release notes, 
and you’re right, we could do a better job of documenting it…please let us know 
specifically how we can improve our documentation!



> that all parsers should extract everything they can unless told otherwise, 
> but it is what it is I guess.
It is, but we can make modifications based on user feedback.  The reason I 
chose to turn it off was my opinion (and no fellow devs objecting to the 
proposal) that enterprise search users probably wouldn’t want to get false 
positives on a macro in an excel sheet, and that folks who cared about those 
would figure it out and set Tika correctly.  That doesn’t mean my opinion is 
correct.

> So be it, but it seems like there is a case for a subset of options that may 
> apply to all such as "extract anything that qualifies as a 'macro'" that all 
> parsers would obey if they have not been told anything specifically.
If you feel strongly about this, please open an issue on our JIRA.  There may 
be an easy(ish) fix.  I can’t think of one at the moment, but we should look 
into it if there’s sufficient user need.

Cheers,

           Tim

From: Jim Idle [mailto:ji...@proofpoint.com]
Sent: Sunday, June 4, 2017 4:07 AM
To: user@tika.apache.org
Subject: RE: Extracting macros in 1.15


Direct Java calls; I am using the AutoDetectParser at the moment.



I found an online example buried in a test for another package, so I have worked 
out how to do it now, but it seems that if I have many different document 
types to support I will have to configure each parser separately. So be it, but 
it seems like there is a case for a subset of options that may apply to all, 
such as "extract anything that qualifies as a 'macro'", that all parsers would 
obey if they have not been told anything specifically.
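
To make that suggestion concrete, here is a minimal sketch of the fallback rule in plain Java. It is not Tika code: GlobalOptions, ParserOptions and shouldExtractMacros are hypothetical names invented for illustration, and the only assumption is that a parser-specific setting, when present, should win over the shared default.

```java
import java.util.Optional;

public class MacroDefaultSketch {

    /** Hypothetical shared defaults that every parser consults last. */
    static final class GlobalOptions {
        private final boolean extractMacros;

        GlobalOptions(boolean extractMacros) {
            this.extractMacros = extractMacros;
        }

        boolean extractMacros() {
            return extractMacros;
        }
    }

    /** Hypothetical per-parser options; an empty Optional means "not told anything". */
    static final class ParserOptions {
        private final Optional<Boolean> extractMacros;

        ParserOptions(Optional<Boolean> extractMacros) {
            this.extractMacros = extractMacros;
        }

        Optional<Boolean> extractMacros() {
            return extractMacros;
        }
    }

    /** A parser-specific setting wins; otherwise the global default applies. */
    static boolean shouldExtractMacros(ParserOptions specific, GlobalOptions global) {
        return specific.extractMacros().orElse(global.extractMacros());
    }

    public static void main(String[] args) {
        GlobalOptions global = new GlobalOptions(true);
        ParserOptions untouched = new ParserOptions(Optional.empty());
        ParserOptions optedOut = new ParserOptions(Optional.of(false));

        System.out.println(shouldExtractMacros(untouched, global)); // global default used
        System.out.println(shouldExtractMacros(optedOut, global));  // explicit setting used
    }
}
```

With a rule like this, a single global switch would cover every document type, and only the parsers that need different behaviour would have to be configured individually.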



It is my opinion (for what it's worth 😉) that all parsers should extract 
everything they can unless told otherwise, but it is what it is I guess, and I 
am pleased to have TIKA as an aid in analyzing all the myriad document types.



Jim



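        // pc and parser are declared elsewhere (e.g. as fields).
        pc = new ParseContext();
        parser = new AutoDetectParser();

        // Macro extraction is off by default in 1.15, so switch it back on.
        OfficeParserConfig officeParserConfig = new OfficeParserConfig();
        officeParserConfig.setExtractMacros(true);

        // Register the config under its class so the Office parsers can find it.
        pc.set(OfficeParserConfig.class, officeParserConfig);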
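
The reason the choice of config container class matters can be illustrated with a small stand-alone sketch of a context keyed by class. This is a simplified stand-in written for this note, not Tika's implementation: Context, OfficeOptions and macrosEnabled are hypothetical names, and the assumption is that a parser asks the context for its own config class and falls back to defaults when nothing was registered.

```java
import java.util.HashMap;
import java.util.Map;

public class TypedContextSketch {

    /** Hypothetical stand-in for a config container class. */
    static final class OfficeOptions {
        private boolean extractMacros = false; // off unless explicitly enabled

        void setExtractMacros(boolean extractMacros) {
            this.extractMacros = extractMacros;
        }

        boolean getExtractMacros() {
            return extractMacros;
        }
    }

    /** Minimal context that stores one object per class key. */
    static final class Context {
        private final Map<Class<?>, Object> entries = new HashMap<>();

        <T> void set(Class<T> key, T value) {
            entries.put(key, value);
        }

        <T> T get(Class<T> key, T fallback) {
            Object value = entries.get(key);
            return value == null ? fallback : key.cast(value);
        }
    }

    /** What a parser might do: look up its own config class, else use defaults. */
    static boolean macrosEnabled(Context context) {
        return context.get(OfficeOptions.class, new OfficeOptions()).getExtractMacros();
    }

    public static void main(String[] args) {
        Context context = new Context();
        System.out.println(macrosEnabled(context)); // nothing registered: defaults apply

        OfficeOptions options = new OfficeOptions();
        options.setExtractMacros(true);
        context.set(OfficeOptions.class, options);
        System.out.println(macrosEnabled(context)); // registered config is picked up
    }
}
```

In a design like this, a setting registered under the wrong class is silently ignored, which is why finding the right container class is the hard part and the rest is trivial.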







> -----Original Message-----
> From: Nick Burch [mailto:apa...@gagravarr.org]
> Sent: Saturday, June 3, 2017 16:36
> To: user@tika.apache.org
> Subject: Re: Extracting macros in 1.15
>
> On Sat, 3 Jun 2017, Jim Idle wrote:
> > After being baffled why macros no longer show up in 1.15 I found:
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_TIKA-2D2302&d=DwIBAg&c=Vxt5e0Osvvt2gflwSlsJ5DmPGcPvTRKLJyp031rXjhg&r=LQ_Q8ZxvkO2zK857fAbj5MDtaB4Bvrpw3bihfO3Bhbw&m=o8gr8gP1-grepBVLNkl9r56fM6Jt6LIlRff8aub3bEA&s=8nhkO_W_dLX6R9XdCgmgqoEpbRlvVLiSwf4LrAFE1tA&e=
> >
> > Can anyone point me to an example of doing this? I am finding bits and
> > pieces but no example of turning macros back on. I basically want all
> > macros in all documents, office, pdf, anything really.
>
> How do you call Apache Tika? Tika App? Tika Server? Tika java class facade?
> Direct Java calls to TikaConfig / AutoDetectParser etc?
>
> The solution will differ depending on which one you use
>
> Nick
