[jira] [Commented] (TIKA-2069) Extract Macro text from Microsoft Office documents
[ https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15534262#comment-15534262 ]
Tim Allison commented on TIKA-2069:
Y, sorry. I opened TIKA-2104 to track this.
> Extract Macro text from Microsoft Office documents
> --
>
> Key: TIKA-2069
> URL: https://issues.apache.org/jira/browse/TIKA-2069
> Project: Tika
> Issue Type: Improvement
> Components: detector, parser
> Affects Versions: 1.13
> Environment: RHEL 5.x, Apache Tomcat
> Reporter: Jeff Swindle
> Labels: features
> Fix For: 2.0, 1.14
>
> Attachments: excel-macro.PNG, test-macro-doc.docm, test-macro-doc.docm-tika-app-output.txt, tika-app-1.14-20160928.19-109-test-macro-doc.docm.output, tika-app-1.14-20160928.19-109-xlsmacro.xlsm.output, word-macro.PNG, xlsmacro.xlsm, xlsmacro.xlsm.tika-app-output.txt
>
> Tika supports macro-enabled Microsoft Office documents by extracting metadata and contents, however, macros within the document are not in the metadata or content output.
> Desire is to have the macro text extracted also.
> Info regarding macro extraction: http://www.decalage.info/vba_tools
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TIKA-2069) Extract Macro text from Microsoft Office documents
[ https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15534259#comment-15534259 ] Jeff Swindle commented on TIKA-2069: Thanks.
[jira] [Commented] (TIKA-2069) Extract Macro text from Microsoft Office documents
[ https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15534253#comment-15534253 ]
Jeff Swindle commented on TIKA-2069:
[~talli...@apache.org] I tried a tika-app 1.14 snapshot and didn't get the expected output for the test-macro-doc.docm file. I also tried another internal file and didn't see macro output. Executing against xlsmacro.xlsm provided expected output of macro content. I've attached the output from tika-app against xlsmacro.xlsm and test-macro-doc.docm.
Here are the commands I used:
# java -jar tika-app-1.14-20160928.19-109.jar test-macro-doc.docm > tika-app-1.14-20160928.19-109-test-macro-doc.docm.output
# java -jar tika-app-1.14-20160928.19-109.jar xlsmacro.xlsm > tika-app-1.14-20160928.19-109-xlsmacro.xlsm.output
Is there something specific I need to add to the command to extract the macro in the docm?
[jira] [Commented] (TIKA-2069) Extract Macro text from Microsoft Office documents
[ https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15534244#comment-15534244 ] Tim Allison commented on TIKA-2069: --- Right. Sorry. Unfortunately, there's a bug in POI that prevents reading the macro in your docm file. See [above|https://issues.apache.org/jira/browse/TIKA-2069?focusedCommentId=15510207&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15510207]. There's still some work to do on the POI side.
[jira] [Commented] (TIKA-2069) Extract Macro text from Microsoft Office documents
[ https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15513354#comment-15513354 ]
Hudson commented on TIKA-2069:
SUCCESS: Integrated in Jenkins build Tika-trunk #1105 (See [https://builds.apache.org/job/Tika-trunk/1105/])
TIKA-2069 -- extract macros from MSOffice docs, fix tests to find target (tallison: rev 8a45f67a2e3641b08fcfb5e2283e4a43ff86f3cd)
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/WordParserTest.java
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/ooxml/OOXMLParserTest.java
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/PowerPointParserTest.java
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/ExcelParserTest.java
* (edit) tika-core/src/test/java/org/apache/tika/TikaTest.java
[jira] [Commented] (TIKA-2069) Extract Macro text from Microsoft Office documents
[ https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15513351#comment-15513351 ]
Hudson commented on TIKA-2069:
SUCCESS: Integrated in Jenkins build tika-2.x #147 (See [https://builds.apache.org/job/tika-2.x/147/])
TIKA-2069 -- extract macros from MSOffice docs, fix tests to find target (tallison: rev d543378a88aeca574d15ab31d13b6316fb938f7f)
* (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/WordParserTest.java
* (edit) tika-core/src/test/java/org/apache/tika/TikaTest.java
* (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/PowerPointParserTest.java
* (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/ooxml/OOXMLParserTest.java
* (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/ExcelParserTest.java
[jira] [Commented] (TIKA-2069) Extract Macro text from Microsoft Office documents
[ https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15513279#comment-15513279 ]
Hudson commented on TIKA-2069:
FAILURE: Integrated in Jenkins build tika-2.x-windows #51 (See [https://builds.apache.org/job/tika-2.x-windows/51/])
TIKA-2069 -- extract macros from MSOffice docs, fix tests to find target (tallison: rev d543378a88aeca574d15ab31d13b6316fb938f7f)
* (edit) tika-core/src/test/java/org/apache/tika/TikaTest.java
* (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/PowerPointParserTest.java
* (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/ooxml/OOXMLParserTest.java
* (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/ExcelParserTest.java
* (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/WordParserTest.java
[jira] [Commented] (TIKA-2069) Extract Macro text from Microsoft Office documents
[ https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15511987#comment-15511987 ]
Hudson commented on TIKA-2069:
ABORTED: Integrated in Jenkins build Tika-trunk #1104 (See [https://builds.apache.org/job/Tika-trunk/1104/])
TIKA-2069 -- extract macros from MSOffice docs (tallison: rev 2ae7206d9c99fb553314cff21bb155d4e6f06d12)
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/OfficeParser.java
* (edit) CHANGES.txt
* (edit) tika-core/src/main/java/org/apache/tika/metadata/TikaCoreProperties.java
* (add) tika-parsers/src/test/resources/test-documents/testWORD_macros.docm
* (add) tika-parsers/src/test/resources/test-documents/testPPT_macros.pptm
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/WordParserTest.java
* (add) tika-parsers/src/test/resources/test-documents/testWORD_macros.doc
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/AbstractOOXMLExtractor.java
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/ExcelParserTest.java
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/PowerPointParserTest.java
* (add) tika-parsers/src/test/resources/test-documents/testEXCEL_macro.xlsm
* (add) tika-parsers/src/test/resources/test-documents/testEXCEL_macro.xls
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/ooxml/OOXMLParserTest.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/XSSFExcelExtractorDecorator.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/XSLFPowerPointExtractorDecorator.java
* (add) tika-parsers/src/test/resources/test-documents/testPPT_macros.ppt
[jira] [Commented] (TIKA-2069) Extract Macro text from Microsoft Office documents
[ https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15511970#comment-15511970 ]
Hudson commented on TIKA-2069:
ABORTED: Integrated in Jenkins build tika-2.x #146 (See [https://builds.apache.org/job/tika-2.x/146/])
TIKA-2069 -- extract macros from MSOffice files. (tallison: rev 66f433471f59d5af931f0a49bf8bddd33a7f27a7)
* (edit) tika-core/src/main/java/org/apache/tika/metadata/TikaCoreProperties.java
* (add) tika-test-resources/src/test/resources/test-documents/testWORD_macros.docm
* (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/WordParserTest.java
* (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/OfficeParser.java
* (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/ooxml/OOXMLParserTest.java
* (add) tika-test-resources/src/test/resources/test-documents/testEXCEL_macro.xls
* (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/XSSFExcelExtractorDecorator.java
* (add) tika-test-resources/src/test/resources/test-documents/testWORD_macros.doc
* (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/ExcelParserTest.java
* (add) tika-test-resources/src/test/resources/test-documents/testPPT_macros.pptm
* (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/PowerPointParserTest.java
* (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/XSLFPowerPointExtractorDecorator.java
* (edit) CHANGES.txt
* (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/AbstractOOXMLExtractor.java
* (add) tika-test-resources/src/test/resources/test-documents/testEXCEL_macro.xlsm
* (add) tika-test-resources/src/test/resources/test-documents/testPPT_macros.ppt
[jira] [Commented] (TIKA-2069) Extract Macro text from Microsoft Office documents
[ https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15511744#comment-15511744 ]
Hudson commented on TIKA-2069:
FAILURE: Integrated in Jenkins build tika-2.x-windows #50 (See [https://builds.apache.org/job/tika-2.x-windows/50/])
TIKA-2069 -- extract macros from MSOffice files. (tallison: rev 66f433471f59d5af931f0a49bf8bddd33a7f27a7)
* (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/AbstractOOXMLExtractor.java
* (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/ExcelParserTest.java
* (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/XSLFPowerPointExtractorDecorator.java
* (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/XSSFExcelExtractorDecorator.java
* (add) tika-test-resources/src/test/resources/test-documents/testWORD_macros.doc
* (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/OfficeParser.java
* (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/PowerPointParserTest.java
* (add) tika-test-resources/src/test/resources/test-documents/testEXCEL_macro.xlsm
* (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/WordParserTest.java
* (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/ooxml/OOXMLParserTest.java
* (add) tika-test-resources/src/test/resources/test-documents/testPPT_macros.pptm
* (edit) CHANGES.txt
* (add) tika-test-resources/src/test/resources/test-documents/testEXCEL_macro.xls
* (add) tika-test-resources/src/test/resources/test-documents/testPPT_macros.ppt
* (edit) tika-core/src/main/java/org/apache/tika/metadata/TikaCoreProperties.java
* (add) tika-test-resources/src/test/resources/test-documents/testWORD_macros.docm
[jira] [Commented] (TIKA-2069) Extract Macro text from Microsoft Office documents
[ https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15510446#comment-15510446 ]
Tim Allison commented on TIKA-2069:
Currently, multiple macros are appended to one string in POI.
{noformat}
Attribute VB_Name = "NewMacros"
Sub Embolden()
Attribute Embolden.VB_Description = "This tests changing the selection to bold"
Attribute Embolden.VB_ProcData.VB_Invoke_Func = "Project.NewMacros.Embolden"
'
' Embolden Macro
'
'
    Selection.Font.Bold = wdToggle
    Selection.Font.BoldBi = wdToggle
End Sub
Sub Italicize()
Attribute Italicize.VB_Description = "This tests italicizing"
Attribute Italicize.VB_ProcData.VB_Invoke_Func = "Project.NewMacros.Italicize"
'
' Italicize Macro
'
'
    Selection.Font.Italic = wdToggle
    Selection.Font.ItalicBi = wdToggle
End Sub
{noformat}
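Because each module comes back as a single string, a consumer that wants the individual procedures has to split the text itself. A minimal sketch of one way to do that, assuming every procedure is a plain {{Sub ... End Sub}} block as in the output above. The class and method names are hypothetical; this is not part of POI or Tika, and {{Function}} or {{Property}} procedures are not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of POI or Tika.
public class MacroSplitter {

    // A line that starts with "Sub Name(", matched lazily up to the next
    // line that starts with "End Sub". Group 1 captures the procedure name.
    private static final Pattern SUB = Pattern.compile(
            "^[ \\t]*(?:(?:Public|Private)[ \\t]+)?Sub[ \\t]+(\\w+)[ \\t]*\\(.*?^[ \\t]*End[ \\t]+Sub\\b",
            Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL);

    /** Returns procedure name -> procedure source, in order of appearance. */
    public static Map<String, String> splitProcedures(String moduleSource) {
        Map<String, String> procedures = new LinkedHashMap<>();
        Matcher m = SUB.matcher(moduleSource);
        while (m.find()) {
            procedures.put(m.group(1), m.group().trim());
        }
        return procedures;
    }

    public static void main(String[] args) {
        String module = "Attribute VB_Name = \"NewMacros\"\n"
                + "Sub Embolden()\n"
                + "    Selection.Font.Bold = wdToggle\n"
                + "End Sub\n"
                + "Sub Italicize()\n"
                + "    Selection.Font.Italic = wdToggle\n"
                + "End Sub\n";
        // Print each procedure name with the number of lines it spans.
        splitProcedures(module).forEach((name, body) ->
                System.out.println(name + ": " + body.split("\\R").length + " lines"));
    }
}
```

The module-level {{Attribute VB_Name}} line falls outside every match, so it would have to be carried separately if the module name is needed.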
[jira] [Commented] (TIKA-2069) Extract Macro text from Microsoft Office documents
[ https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15510207#comment-15510207 ]
Tim Allison commented on TIKA-2069:
[~jeffswindle], I should point out that the VBAMacroReader is still relatively new in POI, and there are currently 3 open bugs, one triggered by the docm file that you submitted.
* [60158|https://bz.apache.org/bugzilla/show_bug.cgi?id=60158]
* [59830|https://bz.apache.org/bugzilla/show_bug.cgi?id=59830]
* [59858|https://bz.apache.org/bugzilla/show_bug.cgi?id=59858]
For now, we'll swallow the exceptions in Tika, but there's much more work to be done. Patches to POI would be welcomed! :)
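Swallowing the exceptions means a failure in the macro reader should not abort the parse of the rest of the document. A rough sketch of that pattern follows; {{MacroSource}} is a hypothetical stand-in for a reader such as POI's VBAMacroReader, and the code illustrates the idea rather than what Tika actually does.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.Map;

// Hypothetical illustration of "swallow and continue"; not Tika's actual code.
public class LenientMacroExtraction {

    /** Hypothetical stand-in for a macro reader (module name -> source text). */
    public interface MacroSource {
        Map<String, String> readMacros() throws IOException;
    }

    /**
     * Reads macros, returning an empty map if the reader fails, so the
     * caller can go on extracting the document's ordinary text and metadata.
     */
    public static Map<String, String> readMacrosQuietly(MacroSource source) {
        try {
            return source.readMacros();
        } catch (IOException | RuntimeException e) {
            // Deliberately swallowed: a macro-reading bug should not fail the whole parse.
            return Collections.emptyMap();
        }
    }

    public static void main(String[] args) {
        // A source that fails the way a reader might on an unsupported file.
        MacroSource broken = () -> { throw new IOException("unsupported VBA project"); };
        System.out.println("macros found: " + readMacrosQuietly(broken).size());
    }
}
```

The cost of this choice is that a document whose macros could not be read looks the same as a document without macros, which matches the missing docm output reported later in the thread.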
[jira] [Commented] (TIKA-2069) Extract Macro text from Microsoft Office documents
[ https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15510157#comment-15510157 ] Jeff Swindle commented on TIKA-2069: For my purposes, the output shown is good. I need the macro text content primarily. Thanks Tim!
[jira] [Commented] (TIKA-2069) Extract Macro text from Microsoft Office documents
[ https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15509832#comment-15509832 ] Tim Allison commented on TIKA-2069: --- This reminds me that I need to commit TIKA-2047 so that the mime-type isn't overwritten.
[jira] [Commented] (TIKA-2069) Extract Macro text from Microsoft Office documents
[ https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15509825#comment-15509825 ]
Tim Allison commented on TIKA-2069:
I think I get it. One challenge is that we're currently getting a {{Map<String, String>}} from POI, and there doesn't currently seem to be an obvious way to link metadata to the actual text.
On POI's test doc, with this code:
{noformat}
// fs, context and xhtml come from the enclosing parser method
VBAMacroReader reader = new VBAMacroReader(fs);
// readMacros() maps each module name to its source text
for (Map.Entry<String, String> e : reader.readMacros().entrySet()) {
    // mark each module as an embedded MACRO resource
    Metadata m = new Metadata();
    m.set(Metadata.EMBEDDED_RESOURCE_TYPE, TikaCoreProperties.EmbeddedResourceType.MACRO.toString());
    m.set(Metadata.CONTENT_TYPE, "text/x-vbasic");
    // use the caller's extractor if one was supplied, else the default
    EmbeddedDocumentExtractor ex = context.get(EmbeddedDocumentExtractor.class);
    if (ex == null) {
        ex = new ParsingEmbeddedDocumentExtractor(context);
    }
    if (ex.shouldParseEmbedded(m)) {
        ex.parseEmbedded(new ByteArrayInputStream(e.getValue().getBytes(StandardCharsets.UTF_8)), xhtml, m, true);
    }
}
{noformat}
we get:
{noformat}
1: X-Parsed-By : org.apache.tika.parser.DefaultParser
1: X-Parsed-By : org.apache.tika.parser.txt.TXTParser
1: embeddedResourceType : MACRO
1: Content-Encoding : windows-1252
1: X-TIKA:parse_time_millis : 27
1: X-TIKA:content : http://www.w3.org/1999/xhtml";>
Attribute VB_Name = "Module1"
Sub TestMacro()
'
' TestMacro Macro
' This is a test macro
'
'
    ActiveDocument.Paragraphs(1).Range.Text = "This is a macro word processing document"
End Sub
1: X-TIKA:embedded_resource_path : /embedded-1
1: Content-Type : text/plain; charset=windows-1252
2: X-Parsed-By : org.apache.tika.parser.DefaultParser
2: X-Parsed-By : org.apache.tika.parser.txt.TXTParser
2: embeddedResourceType : MACRO
2: Content-Encoding : windows-1252
2: X-TIKA:parse_time_millis : 4
2: X-TIKA:content : http://www.w3.org/1999/xhtml";>
Attribute VB_Name = "ThisDocument"
Attribute VB_Base = "1Normal.ThisDocument"
Attribute VB_GlobalNameSpace = False
Attribute VB_Creatable = False
Attribute VB_PredeclaredId = True
Attribute VB_Exposed = True
Attribute VB_TemplateDerived = True
Attribute VB_Customizable = True
2: X-TIKA:embedded_resource_path : /embedded-2
2: Content-Type : text/plain; charset=windows-1252
{noformat}
Is this good enough for now?
[jira] [Commented] (TIKA-2069) Extract Macro text from Microsoft Office documents
[ https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504643#comment-15504643 ] Tim Allison commented on TIKA-2069: --- Just realized that we might want to handle extraction of Actions and/or javascript from PDFs in a similar way? New+related ticket if anyone has an interest?
[jira] [Commented] (TIKA-2069) Extract Macro text from Microsoft Office documents
[ https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15493924#comment-15493924 ] Nick Burch commented on TIKA-2069: -- Yes! If you wrote a VB Script, and zipped it up, it'd be a {{text/x-vbasic}} with no extra metadata. When you add a macro to an office doc, you get the macro text but also some metadata. We wouldn't need a parser for {{text/x-vbasic}}, only for {{application/vnd.ms-office.vbaProject}}, which would expose the embedded script text + metadata.
[jira] [Commented] (TIKA-2069) Extract Macro text from Microsoft Office documents
[ https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15493780#comment-15493780 ] Tim Allison commented on TIKA-2069: --- Makes sense, although I'd prefer to write one parser rather than two. :) Would the {{application/vnd.ms-office.vbaProject}} ever have any content? Would its metadata be different from the vbscript? > Extract Macro text from Microsoft Office documents > -- > > Key: TIKA-2069 > URL: https://issues.apache.org/jira/browse/TIKA-2069 > Project: Tika > Issue Type: Improvement > Components: detector, parser >Affects Versions: 1.13 > Environment: RHEL 5.x, Apache Tomcat >Reporter: Jeff Swindle > Labels: features > Attachments: excel-macro.PNG, test-macro-doc.docm, > test-macro-doc.docm-tika-app-output.txt, word-macro.PNG, xlsmacro.xlsm, > xlsmacro.xlsm.tika-app-output.txt > > > Tika supports macro-enabled Microsoft Office documents by extracting metadata > and contents, however, macros within the document are not in the metadata or > content output. > Desire is to have the macro text extracted also. > Info regarding macro extraction: http://www.decalage.info/vba_tools -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TIKA-2069) Extract Macro text from Microsoft Office documents
[ https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15487662#comment-15487662 ] Nick Burch commented on TIKA-2069: -- I think the idea of a Macro is probably general enough across a range of file formats that we could add it as an embedded type. However, there are actually 2 levels to an OOXML macro. The OOXML file contains a binary VBA project bin file, and within that is the actual macro text + its properties. Maybe we should have the OOXML extractor first expose an {{application/vnd.ms-office.vbaProject}} embedded resource, then use a second parser which extracts the body of the macro VB script as {{text/x-vbasic}}, with the other macro properties/attributes (name, sid, various boolean flags) as metadata? e.g. {{application/vnd.ms-excel.sheet.macroenabled.12}} -> {{application/vnd.ms-office.vbaProject}} -> {{text/x-vbasic}} + metadata > Extract Macro text from Microsoft Office documents > -- > > Key: TIKA-2069 > URL: https://issues.apache.org/jira/browse/TIKA-2069 > Project: Tika > Issue Type: Improvement > Components: detector, parser >Affects Versions: 1.13 > Environment: RHEL 5.x, Apache Tomcat >Reporter: Jeff Swindle > Labels: features > Attachments: excel-macro.PNG, test-macro-doc.docm, > test-macro-doc.docm-tika-app-output.txt, word-macro.PNG, xlsmacro.xlsm, > xlsmacro.xlsm.tika-app-output.txt > > > Tika supports macro-enabled Microsoft Office documents by extracting metadata > and contents, however, macros within the document are not in the metadata or > content output. > Desire is to have the macro text extracted also. > Info regarding macro extraction: http://www.decalage.info/vba_tools -- This message was sent by Atlassian JIRA (v6.3.4#6332)
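A minimal sketch of the two-level nesting proposed in this comment, using only the JDK. The {{Resource}} class, the {{name}} metadata key and the macro body are hypothetical stand-ins for illustration; this is not Tika's embedded-document API, only a model of the container -> VBA project -> macro text chain.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MacroChainSketch {

    /** Hypothetical model of a parsed resource: media type, metadata, text body, embedded children. */
    public static final class Resource {
        public final String mediaType;
        public final Map<String, String> metadata = new LinkedHashMap<>();
        public final List<Resource> children = new ArrayList<>();
        public String body = "";

        public Resource(String mediaType) {
            this.mediaType = mediaType;
        }
    }

    /** Builds the three levels for one macro-enabled workbook holding one macro module. */
    public static Resource example() {
        Resource workbook = new Resource("application/vnd.ms-excel.sheet.macroenabled.12");
        Resource project = new Resource("application/vnd.ms-office.vbaProject");
        Resource module = new Resource("text/x-vbasic");
        module.body = "Sub Hello()\nEnd Sub";
        module.metadata.put("name", "Module1"); // hypothetical metadata key
        project.children.add(module);
        workbook.children.add(project);
        return workbook;
    }

    /** Walks the tree depth-first and returns the media types in visit order. */
    public static List<String> chain(Resource root) {
        List<String> types = new ArrayList<>();
        collect(root, types);
        return types;
    }

    private static void collect(Resource node, List<String> out) {
        out.add(node.mediaType);
        for (Resource child : node.children) {
            collect(child, out);
        }
    }

    public static void main(String[] args) {
        // One media type per line: the workbook, then the VBA project, then the macro text
        for (String type : chain(example())) {
            System.out.println(type);
        }
    }
}
```

In this model the first parser would only need to emit the middle node; the second parser would turn it into the {{text/x-vbasic}} leaf plus metadata.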
[jira] [Commented] (TIKA-2069) Extract Macro text from Microsoft Office documents
[ https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15484929#comment-15484929 ] Tim Allison commented on TIKA-2069: --- Sounds good. Thank you, [~gagravarr]. Do we want to distinguish between an attached vba/text file and a macro? Perhaps add {{MACRO}} to {{TikaCoreProperties.EmbeddedResourceType}}? Or, do we want to distinguish between the two by using a different mime type? I think I'd prefer the former. > Extract Macro text from Microsoft Office documents > -- > > Key: TIKA-2069 > URL: https://issues.apache.org/jira/browse/TIKA-2069 > Project: Tika > Issue Type: Improvement > Components: detector, parser >Affects Versions: 1.13 > Environment: RHEL 5.x, Apache Tomcat >Reporter: Jeff Swindle > Labels: features > Attachments: excel-macro.PNG, test-macro-doc.docm, > test-macro-doc.docm-tika-app-output.txt, word-macro.PNG, xlsmacro.xlsm, > xlsmacro.xlsm.tika-app-output.txt > > > Tika supports macro-enabled Microsoft Office documents by extracting metadata > and contents, however, macros within the document are not in the metadata or > content output. > Desire is to have the macro text extracted also. > Info regarding macro extraction: http://www.decalage.info/vba_tools -- This message was sent by Atlassian JIRA (v6.3.4#6332)
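To illustrate the first option (a {{MACRO}} resource type rather than a separate mime type), here is a small JDK-only sketch. The {{ResourceType}} enum and {{Embedded}} class are hypothetical stand-ins, not the real {{TikaCoreProperties.EmbeddedResourceType}}; the point is only that an attached script and a macro can share {{text/x-vbasic}} and still be told apart.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EmbeddedTypeSketch {

    /** Hypothetical stand-in for an embedded resource type enum with MACRO added. */
    public enum ResourceType { INLINE, ATTACHMENT, MACRO }

    /** Hypothetical embedded resource: a name, a media type and a resource type. */
    public static final class Embedded {
        public final String name;
        public final String mediaType;
        public final ResourceType type;

        public Embedded(String name, String mediaType, ResourceType type) {
            this.name = name;
            this.mediaType = mediaType;
            this.type = type;
        }
    }

    /** Returns the names of resources flagged as macros, ignoring their media type. */
    public static List<String> macroNames(List<Embedded> resources) {
        List<String> names = new ArrayList<>();
        for (Embedded resource : resources) {
            if (resource.type == ResourceType.MACRO) {
                names.add(resource.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<Embedded> resources = Arrays.asList(
                new Embedded("notes.vbs", "text/x-vbasic", ResourceType.ATTACHMENT),
                new Embedded("Module1", "text/x-vbasic", ResourceType.MACRO));
        System.out.println(macroNames(resources)); // prints [Module1]
    }
}
```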
[jira] [Commented] (TIKA-2069) Extract Macro text from Microsoft Office documents
[ https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15484834#comment-15484834 ] Nick Burch commented on TIKA-2069: -- I think that, given both how big macros can get and how they logically fit with the document, an embedded document might be best. Mimetype-wise, some people seem to use {{application/x-vba}}, but the office content types file uses {{application/vnd.ms-office.vbaProject}}. Our own tika mimetypes file defines {{text/x-vbasic}}. I'd lean towards one of the latter two. > Extract Macro text from Microsoft Office documents > -- > > Key: TIKA-2069 > URL: https://issues.apache.org/jira/browse/TIKA-2069 > Project: Tika > Issue Type: Improvement > Components: detector, parser >Affects Versions: 1.13 > Environment: RHEL 5.x, Apache Tomcat >Reporter: Jeff Swindle > Labels: features > Attachments: excel-macro.PNG, test-macro-doc.docm, > test-macro-doc.docm-tika-app-output.txt, word-macro.PNG, xlsmacro.xlsm, > xlsmacro.xlsm.tika-app-output.txt > > > Tika supports macro-enabled Microsoft Office documents by extracting metadata > and contents, however, macros within the document are not in the metadata or > content output. > Desire is to have the macro text extracted also. > Info regarding macro extraction: http://www.decalage.info/vba_tools -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TIKA-2069) Extract Macro text from Microsoft Office documents
[ https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15477438#comment-15477438 ] Tim Allison commented on TIKA-2069: --- Once we upgrade to POI 3.15-beta3, this _should_ be fairly straightforward, thanks to the work of others on POI. We may want to copy/modify the "find the vba.bin file" at the Tika level for OOXML files to pass an npoifs into VBAMacroReader from an open OOXML/zip file. > Extract Macro text from Microsoft Office documents > -- > > Key: TIKA-2069 > URL: https://issues.apache.org/jira/browse/TIKA-2069 > Project: Tika > Issue Type: Improvement > Components: detector, parser >Affects Versions: 1.13 > Environment: RHEL 5.x, Apache Tomcat >Reporter: Jeff Swindle > Labels: features > Attachments: excel-macro.PNG, test-macro-doc.docm, > test-macro-doc.docm-tika-app-output.txt, word-macro.PNG, xlsmacro.xlsm, > xlsmacro.xlsm.tika-app-output.txt > > > Tika supports macro-enabled Microsoft Office documents by extracting metadata > and contents, however, macros within the document are not in the metadata or > content output. > Desire is to have the macro text extracted also. > Info regarding macro extraction: http://www.decalage.info/vba_tools -- This message was sent by Atlassian JIRA (v6.3.4#6332)
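A rough sketch of the "find the VBA bin file" step for OOXML, using only {{java.util.zip}}. It assumes the part can be recognised by a name ending in {{vbaProject.bin}} (for example {{word/vbaProject.bin}}); a real implementation would more likely resolve the part through the package's content types or relationships. Handing the located bytes to POI's VBAMacroReader is not shown here, and the in-memory zip is a stand-in so the sketch runs without a real .docm.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class VbaPartFinder {

    /** Returns the names of zip entries that look like a VBA project part. */
    public static List<String> findVbaParts(byte[] ooxmlBytes) {
        List<String> hits = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(ooxmlBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // Assumption: the part is identified purely by its file name
                if (entry.getName().toLowerCase(Locale.ROOT).endsWith("vbaproject.bin")) {
                    hits.add(entry.getName());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hits;
    }

    /** Builds a tiny stand-in zip with a document part and a fake VBA project part. */
    public static byte[] fakeDocm() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write("<doc/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("word/vbaProject.bin"));
            zip.write(new byte[] {0, 1, 2}); // placeholder bytes, not a real OLE2 stream
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(findVbaParts(fakeDocm())); // prints [word/vbaProject.bin]
    }
}
```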
[jira] [Commented] (TIKA-2069) Extract Macro text from Microsoft Office documents
[ https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15477415#comment-15477415 ] Tim Allison commented on TIKA-2069: --- Thanks to [~blagerw...@gmail.com], [~gagravarr] and [~onealj] among others, it looks like this is all nicely handled by POI now as of [bug-52949|https://bz.apache.org/bugzilla/show_bug.cgi?id=52949]. > Extract Macro text from Microsoft Office documents > -- > > Key: TIKA-2069 > URL: https://issues.apache.org/jira/browse/TIKA-2069 > Project: Tika > Issue Type: Improvement > Components: detector, parser >Affects Versions: 1.13 > Environment: RHEL 5.x, Apache Tomcat >Reporter: Jeff Swindle > Labels: features > Attachments: excel-macro.PNG, test-macro-doc.docm, > test-macro-doc.docm-tika-app-output.txt, word-macro.PNG, xlsmacro.xlsm, > xlsmacro.xlsm.tika-app-output.txt > > > Tika supports macro-enabled Microsoft Office documents by extracting metadata > and contents, however, macros within the document are not in the metadata or > content output. > Desire is to have the macro text extracted also. > Info regarding macro extraction: http://www.decalage.info/vba_tools -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TIKA-2069) Extract Macro text from Microsoft Office documents
[ https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15477255#comment-15477255 ] Jeff Swindle commented on TIKA-2069: OOXML would be great. Not just limited to Word and Excel. Need PowerPoint also. > Extract Macro text from Microsoft Office documents > -- > > Key: TIKA-2069 > URL: https://issues.apache.org/jira/browse/TIKA-2069 > Project: Tika > Issue Type: Improvement > Components: detector, parser >Affects Versions: 1.13 > Environment: RHEL 5.x, Apache Tomcat >Reporter: Jeff Swindle > Labels: features > Attachments: excel-macro.PNG, test-macro-doc.docm, > test-macro-doc.docm-tika-app-output.txt, word-macro.PNG, xlsmacro.xlsm, > xlsmacro.xlsm.tika-app-output.txt > > > Tika supports macro-enabled Microsoft Office documents by extracting metadata > and contents, however, macros within the document are not in the metadata or > content output. > Desire is to have the macro text extracted also. > Info regarding macro extraction: http://www.decalage.info/vba_tools -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TIKA-2069) Extract Macro text from Microsoft Office documents
[ https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15477188#comment-15477188 ] Tim Allison commented on TIKA-2069: --- Thank you! This question is for [~jeffswindle] and fellow Tika devs (esp. [~rgauss]), should we add macros as metadata items or inline them in the content via elements? I'd prefer a metadata item for each macro, but could go either way. [~jeffswindle], the title of this issue is for msoffice...is it ok to limit this to ooxml? Do you need this for the older doc and xls? > Extract Macro text from Microsoft Office documents > -- > > Key: TIKA-2069 > URL: https://issues.apache.org/jira/browse/TIKA-2069 > Project: Tika > Issue Type: Improvement > Components: detector, parser >Affects Versions: 1.13 > Environment: RHEL 5.x, Apache Tomcat >Reporter: Jeff Swindle > Labels: features > Attachments: excel-macro.PNG, test-macro-doc.docm, > test-macro-doc.docm-tika-app-output.txt, word-macro.PNG, xlsmacro.xlsm, > xlsmacro.xlsm.tika-app-output.txt > > > Tika supports macro-enabled Microsoft Office documents by extracting metadata > and contents, however, macros within the document are not in the metadata or > content output. > Desire is to have the macro text extracted also. > Info regarding macro extraction: http://www.decalage.info/vba_tools -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TIKA-2069) Extract Macro text from Microsoft Office documents
[ https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15477176#comment-15477176 ] Jeff Swindle commented on TIKA-2069: Desire is for TIKA to extract macro text from Microsoft Office files as it does metadata and content. Need is to search for specific signatures that may be present in macros and if present should be removed prior to distributing document. TIKA would facilitate the search. > Extract Macro text from Microsoft Office documents > -- > > Key: TIKA-2069 > URL: https://issues.apache.org/jira/browse/TIKA-2069 > Project: Tika > Issue Type: Improvement > Components: detector, parser >Affects Versions: 1.13 > Environment: RHEL 5.x, Apache Tomcat >Reporter: Jeff Swindle > Labels: features > Attachments: excel-macro.PNG, test-macro-doc.docm, > test-macro-doc.docm-tika-app-output.txt, word-macro.PNG, xlsmacro.xlsm, > xlsmacro.xlsm.tika-app-output.txt > > > Tika supports macro-enabled Microsoft Office documents by extracting metadata > and contents, however, macros within the document are not in the metadata or > content output. > Desire is to have the macro text extracted also. > Info regarding macro extraction: http://www.decalage.info/vba_tools -- This message was sent by Atlassian JIRA (v6.3.4#6332)
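The search described in this comment could look roughly like the following once macro text is available. This is a JDK-only sketch: it assumes the extracted macros arrive as a map from module name to source text, and the module names, macro bodies and signature strings are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class MacroSignatureScan {

    /** Maps each macro name to the signatures found in its source (case-insensitive substring match). */
    public static Map<String, List<String>> scan(Map<String, String> macros, List<String> signatures) {
        Map<String, List<String>> hits = new LinkedHashMap<>();
        for (Map.Entry<String, String> macro : macros.entrySet()) {
            String source = macro.getValue().toLowerCase(Locale.ROOT);
            for (String signature : signatures) {
                if (source.contains(signature.toLowerCase(Locale.ROOT))) {
                    hits.computeIfAbsent(macro.getKey(), k -> new ArrayList<>()).add(signature);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical extracted macros: module name -> source text
        Map<String, String> macros = new LinkedHashMap<>();
        macros.put("Module1", "Sub AutoOpen()\n  ' INTERNAL-ONLY build step\nEnd Sub");
        macros.put("ThisDocument", "' nothing here");

        // Hypothetical signatures to look for before distributing the document
        List<String> signatures = Arrays.asList("AutoOpen", "INTERNAL-ONLY");

        System.out.println(scan(macros, signatures)); // prints {Module1=[AutoOpen, INTERNAL-ONLY]}
    }
}
```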
[jira] [Commented] (TIKA-2069) Extract Macro text from Microsoft Office documents
[ https://issues.apache.org/jira/browse/TIKA-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15475422#comment-15475422 ] Tim Allison commented on TIKA-2069: --- [~jeffswindle], thank you for opening this. Would you be able to share some example test documents and expected output? Bonus points for a unit test or two... :) > Extract Macro text from Microsoft Office documents > -- > > Key: TIKA-2069 > URL: https://issues.apache.org/jira/browse/TIKA-2069 > Project: Tika > Issue Type: Improvement > Components: detector, parser >Affects Versions: 1.13 > Environment: RHEL 5.x, Apache Tomcat >Reporter: Jeff Swindle > Labels: features > > Tika supports macro-enabled Microsoft Office documents by extracting metadata > and contents, however, macros within the document are not in the metadata or > content output. > Desire is to have the macro text extracted also. > Info regarding macro extraction: http://www.decalage.info/vba_tools -- This message was sent by Atlassian JIRA (v6.3.4#6332)