Tim,

Thanks for that. It did show up in the release notes, but it was not immediately obvious to me where to make the config call. Once I found an external source, it fell into place easily enough. However, I think that was only because I had not had to configure anything specifically before now, and a similar change in the future would be obvious. Once I worked out which config container class to use, it was trivial.

Big projects like this are difficult to keep documented and full of useful examples, in part because most people want to code rather than document. We had similar issues with ANTLR for a while until a book was written (which was of course immediately out of date, but at least gave a fighting chance! :) ). Personally, I think that a bunch of meaty code examples would be a lot better than trying to document everything explicitly, so long as they had some good comments.

Thanks for a worthy project and the continuous updates!

Jim

From: Allison, Timothy B. [mailto:talli...@mitre.org]
Sent: Tuesday, June 6, 2017 01:39
To: user@tika.apache.org
Subject: RE: Extracting macros in 1.15

Yes, sorry about that surprise. I tried to communicate it in the release notes, and you’re right, we could do a better job of documenting it…please let us know specifically how we can improve our documentation!

> that all parsers should extract everything they can unless told otherwise,
> but it is what it is I guess.

It is, but we can make modifications based on user feedback. The reason I chose to turn it off was my opinion (and no fellow devs objecting to the proposal) that enterprise search users probably wouldn’t want to get false positives on a macro in an Excel sheet, and that folks who cared about those would figure it out and set Tika correctly. That doesn’t mean my opinion is correct.

> So be it, but it seems like there is a case for a subset of options that may
> apply to all such as "extract anything that qualifies as a 'macro'" that all
> parsers would obey if they have not been told anything specifically.

If you feel strongly about this, please open an issue on our JIRA. There may be an easy(ish) fix. I can’t think of one at the moment, but we should look into it if there’s sufficient user need.

Cheers,
Tim

From: Jim Idle [mailto:ji...@proofpoint.com]
Sent: Sunday, June 4, 2017 4:07 AM
To: user@tika.apache.org
Subject: RE: Extracting macros in 1.15

Direct Java calls; I am using the AutoDetectParser at the moment.

I found an online example buried in a test for another package, so I have worked out how to do it now, but it seems that if I have many different document types to support, I will have to configure each parser separately. So be it, but it seems like there is a case for a subset of options that may apply to all, such as "extract anything that qualifies as a 'macro'", that all parsers would obey if they have not been told anything specifically.

It is my opinion (for what it's worth 😉) that all parsers should extract everything they can unless told otherwise, but it is what it is I guess, and I am pleased to have Tika as an aid in analyzing all the myriad document types.

Jim

    import org.apache.tika.parser.AutoDetectParser;
    import org.apache.tika.parser.ParseContext;
    import org.apache.tika.parser.microsoft.OfficeParserConfig;

    ParseContext pc = new ParseContext();
    AutoDetectParser parser = new AutoDetectParser();
    // Macro extraction is off by default in 1.15; turn it back on for the Office parsers.
    OfficeParserConfig officeParserConfig = new OfficeParserConfig();
    officeParserConfig.setExtractMacros(true);
    pc.set(OfficeParserConfig.class, officeParserConfig);

> -----Original Message-----
> From: Nick Burch [mailto:apa...@gagravarr.org]
> Sent: Saturday, June 3, 2017 16:36
> To: user@tika.apache.org
> Subject: Re: Extracting macros in 1.15
>
> On Sat, 3 Jun 2017, Jim Idle wrote:
> > After being baffled why macros no longer show up in 1.15 I found:
> > https://issues.apache.org/jira/browse/TIKA-2302
> >
> > Can anyone point me to an example of doing this? I am finding bits and
> > pieces but no example of turning macros back on. I basically want all
> > macros in all documents, office, pdf, anything really.
>
> How do you call Apache Tika? Tika App? Tika Server? Tika java class facade?
> Direct Java calls to TikaConfig / AutoDetectParser etc?
>
> The solution will differ depending on which one you use
>
> Nick
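
As a rough illustration of the suggestion discussed in this thread (a shared option that every parser obeys unless it has been told something specifically), the lookup could work along the lines below. This is a hypothetical, standard-library-only model and not Tika's API: `Context` merely stands in for a type-keyed container in the style of ParseContext, and `GlobalExtractionOptions`, `OfficeOptions`, and `shouldExtractMacros` are names invented for this sketch. The final fallback of `false` is an assumption that mirrors the macros-off default described above.

```java
import java.util.HashMap;
import java.util.Map;

class MacroOptionSketch {

    /** Hypothetical stand-in for a type-keyed context (one value per config class). */
    static final class Context {
        private final Map<Class<?>, Object> entries = new HashMap<>();

        <T> void set(Class<T> key, T value) {
            entries.put(key, value);
        }

        <T> T get(Class<T> key) {
            // Class.cast returns null when nothing was stored under this key.
            return key.cast(entries.get(key));
        }
    }

    /** Hypothetical global option that every parser would consult as a fallback. */
    static final class GlobalExtractionOptions {
        final boolean extractMacros;

        GlobalExtractionOptions(boolean extractMacros) {
            this.extractMacros = extractMacros;
        }
    }

    /** Hypothetical parser-specific config; a null field means "nothing specified". */
    static final class OfficeOptions {
        final Boolean extractMacros;

        OfficeOptions(Boolean extractMacros) {
            this.extractMacros = extractMacros;
        }
    }

    /** Parser-specific setting wins, then the global option, then the built-in default. */
    static boolean shouldExtractMacros(Context context) {
        OfficeOptions specific = context.get(OfficeOptions.class);
        if (specific != null && specific.extractMacros != null) {
            return specific.extractMacros;
        }
        GlobalExtractionOptions global = context.get(GlobalExtractionOptions.class);
        if (global != null) {
            return global.extractMacros;
        }
        return false; // assumed default: macros off
    }

    public static void main(String[] args) {
        Context context = new Context();
        System.out.println("nothing configured: " + shouldExtractMacros(context));

        context.set(GlobalExtractionOptions.class, new GlobalExtractionOptions(true));
        System.out.println("global option on:   " + shouldExtractMacros(context));

        context.set(OfficeOptions.class, new OfficeOptions(Boolean.FALSE));
        System.out.println("office override:    " + shouldExtractMacros(context));
    }
}
```

Running `main` reports `false`, then `true`, then `false` for the three cases. The design choice is that a parser-specific config always takes precedence, so existing per-parser settings would keep working if such a global option were ever added.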