[jira] [Comment Edited] (TIKA-3686) CSS file detected as JavaScript (application/javascript)
[ https://issues.apache.org/jira/browse/TIKA-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500814#comment-17500814 ]

Vincent Massol edited comment on TIKA-3686 at 3/3/22, 2:57 PM:
---------------------------------------------------------------

[~nick] Thanks. Note that it seems to be some sort of regression since it worked fine before the upgrade to Tika 2.0.0. Was this change of behavior wanted?

Details at https://jira.xwiki.org/browse/XWIKI-19491

was (Author: vmassol):
[~nick] Thanks. Note that it seems to be some sort of regression since it worked fine before the upgrade to Tika 2.0.0. Was this change of behavior wanted?

> CSS file detected as JavaScript (application/javascript)
>
>                 Key: TIKA-3686
>                 URL: https://issues.apache.org/jira/browse/TIKA-3686
>             Project: Tika
>          Issue Type: Bug
>          Components: detector
>    Affects Versions: 2.0.0-ALPHA
>            Reporter: Marius Dumitru Florea
>            Priority: Major
>
> The following CSS file
> [https://github.com/techlab/jquery-smartwizard/blob/v5.1.1/dist/css/smart_wizard_all.min.css]
> is detected as {{application/javascript}} using:
> {noformat}
> TikaUtils.detect(InputStream stream, String name)
> {noformat}
> The reason seems to be that the CSS file starts with:
> {noformat}
> /*!
>  * jQuery
> {noformat}
> which matches the "jQuery" entry from
> [tika-mimetypes.xml|https://github.com/apache/tika/blob/2.3.0/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml#L348]
> used by Tika's {{MimeTypes}} detector.
> This is a regression introduced by
> https://github.com/apache/tika/commit/97699598f000139b1222b785d634b3c8a8e216c7
> in TIKA-1141 (2.0.0-ALPHA).
> The implications are serious if the mime type returned by Tika is used to set
> the content type on the HTTP request returning the CSS file to the browser:
> the browser ignores the CSS.
> FTR, in my case the CSS file is not served directly from the file system but
> from a WebJar (in this case
> https://search.maven.org/artifact/org.webjars.npm/smartwizard/5.1.1/jar ) and
> we're using Tika to determine the type of files requested from the WebJars.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
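For readers hitting the same misdetection, here is a minimal sketch of one possible workaround: trust the file extension for a few well-known text types before falling back to content sniffing. Everything in it is an assumption made for illustration. The class `ContentTypeGuard`, its methods, and the list of trusted extensions are hypothetical, and `sniff` is a deliberately crude stand-in for the "jQuery" magic match described in the report, not Tika's detector or its API. The sample bytes are a shortened stand-in for the start of the reported file.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;

public class ContentTypeGuard {
    // Hypothetical list of extensions whose type we trust over content sniffing.
    private static final Map<String, String> TRUSTED = Map.of(
            "css", "text/css",
            "js", "application/javascript");

    /** Crude stand-in for a magic-based match: looks for "jQuery" near the start. */
    public static String sniff(byte[] head) {
        int len = Math.min(head.length, 64);
        String prefix = new String(head, 0, len, StandardCharsets.US_ASCII);
        return prefix.contains("jQuery") ? "application/javascript" : "text/plain";
    }

    /** Prefers a trusted extension; only sniffs the content when there is none. */
    public static String resolve(byte[] head, String name) {
        int dot = name.lastIndexOf('.');
        if (dot >= 0) {
            String ext = name.substring(dot + 1).toLowerCase(Locale.ROOT);
            String trusted = TRUSTED.get(ext);
            if (trusted != null) {
                return trusted;
            }
        }
        return sniff(head);
    }

    public static void main(String[] args) {
        // Shortened stand-in for the beginning of the reported CSS file.
        byte[] css = "/*!\n * jQuery\n */\n.sw { display: block; }"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(sniff(css));                                // application/javascript
        System.out.println(resolve(css, "smart_wizard_all.min.css"));  // text/css
    }
}
```

The trade-off is that a mislabelled file keeps its wrong type, so a guard like this probably only makes sense when the file names come from a trusted source such as a packaged WebJar.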
[jira] [Commented] (TIKA-1143) Fails to parse some PPT file
[ https://issues.apache.org/jira/browse/TIKA-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362379#comment-14362379 ]

Vincent Massol commented on TIKA-1143:
--------------------------------------

It's been fixed in Tika 1.5.

> Fails to parse some PPT file
>
>                 Key: TIKA-1143
>                 URL: https://issues.apache.org/jira/browse/TIKA-1143
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.4
>            Reporter: Vincent Massol
>         Attachments: XWikiIExpoPresentation.ppt
>
> See also http://jira.xwiki.org/browse/XWIKI-9308
> Here's what I get with the attached file:
> {noformat}
> 2013-07-03 11:52:45,332 [XWiki Solr index thread] WARN a.t.p.m.AbstractPOIFSExtractor - Ignoring unexpected exception while parsing summary entry DocumentSummaryInformation
> java.lang.ClassCastException: [B cannot be cast to java.lang.String
>     at org.apache.poi.hpsf.DocumentSummaryInformation.getCategory(DocumentSummaryInformation.java:78) ~[poi-3.9.jar:3.9]
>     at org.apache.tika.parser.microsoft.SummaryExtractor.parse(SummaryExtractor.java:143) [tika-parsers-1.4.jar:na]
>     at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaryEntryIfExists(SummaryExtractor.java:88) [tika-parsers-1.4.jar:na]
>     at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaries(SummaryExtractor.java:73) [tika-parsers-1.4.jar:na]
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:170) [tika-parsers-1.4.jar:na]
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:161) [tika-parsers-1.4.jar:na]
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) [tika-core-1.4.jar:na]
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) [tika-core-1.4.jar:na]
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) [tika-core-1.4.jar:na]
>     at org.apache.tika.Tika.parseToString(Tika.java:380) [tika-core-1.4.jar:na]
>     at org.xwiki.search.solr.internal.metadata.AttachmentSolrMetadataExtractor.getContentAsText(AttachmentSolrMetadataExtractor.java:130) [xwiki-platform-search-solr-api-5.2-20130702.194010-10.jar:na]
>     at org.xwiki.search.solr.internal.metadata.AttachmentSolrMetadataExtractor.setLocaleAndContentFields(AttachmentSolrMetadataExtractor.java:97) [xwiki-platform-search-solr-api-5.2-20130702.194010-10.jar:na]
>     at org.xwiki.search.solr.internal.metadata.AttachmentSolrMetadataExtractor.setFieldsInternal(AttachmentSolrMetadataExtractor.java:79) [xwiki-platform-search-solr-api-5.2-20130702.194010-10.jar:na]
>     at org.xwiki.search.solr.internal.metadata.AbstractSolrMetadataExtractor.getSolrDocument(AbstractSolrMetadataExtractor.java:114) [xwiki-platform-search-solr-api-5.2-20130702.194010-10.jar:na]
>     at org.xwiki.search.solr.internal.DefaultSolrIndexer.getSolrDocument(DefaultSolrIndexer.java:465) [xwiki-platform-search-solr-api-5.2-20130702.194010-10.jar:na]
>     at org.xwiki.search.solr.internal.DefaultSolrIndexer.processBatch(DefaultSolrIndexer.java:378) [xwiki-platform-search-solr-api-5.2-20130702.194010-10.jar:na]
>     at org.xwiki.search.solr.internal.DefaultSolrIndexer.runInternal(DefaultSolrIndexer.java:353) [xwiki-platform-search-solr-api-5.2-20130702.194010-10.jar:na]
>     at com.xpn.xwiki.util.AbstractXWikiRunnable.run(AbstractXWikiRunnable.java:121) [xwiki-platform-oldcore-5.2-20130702.190754-22.jar:na]
>     at java.lang.Thread.run(Thread.java:680) [na:1.6.0_51]
> 2013-07-03 11:52:49,985 [Lucene Index Updater] WARN a.t.p.m.AbstractPOIFSExtractor - Ignoring unexpected exception while parsing summary entry DocumentSummaryInformation
> java.lang.ClassCastException: [B cannot be cast to java.lang.String
>     at org.apache.poi.hpsf.DocumentSummaryInformation.getCategory(DocumentSummaryInformation.java:78) ~[poi-3.9.jar:3.9]
>     at org.apache.tika.parser.microsoft.SummaryExtractor.parse(SummaryExtractor.java:143) [tika-parsers-1.4.jar:na]
>     at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaryEntryIfExists(SummaryExtractor.java:88) [tika-parsers-1.4.jar:na]
>     at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaries(SummaryExtractor.java:73) [tika-parsers-1.4.jar:na]
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:170) [tika-parsers-1.4.jar:na]
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:161) [tika-parsers-1.4.jar:na]
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) [tika-core-1.4.jar:na]
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java
[jira] [Commented] (TIKA-1143) Fails to parse some PPT file
[ https://issues.apache.org/jira/browse/TIKA-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362308#comment-14362308 ]

Vincent Massol commented on TIKA-1143:
--------------------------------------

Thanks Tyler. Could you set the "fix version" and "assignee" fields please? The fix version is especially important so that we can know which version of Tika we have to take that has the fix (ie the POI upgrade).

Thanks!
[jira] [Commented] (TIKA-1143) Fails to parse some PPT file
[ https://issues.apache.org/jira/browse/TIKA-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698944#comment-13698944 ]

Vincent Massol commented on TIKA-1143:
--------------------------------------

Sorry, my bad: I said earlier that my content didn't seem to be indexed. That's not correct. I confirm it is indexed (I was not searching correctly), so the stack trace is only a warning and doesn't affect the rest.

Thanks!
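To illustrate why a warning like the one in this issue need not stop indexing, here is a minimal sketch of the pattern the log message suggests: a failure while reading a summary property is caught and recorded, and the document text is still returned. The class `SummaryWarningSketch` and its methods are hypothetical and written only for this illustration; this is not Tika's or POI's actual code. The cast mirrors the reported `ClassCastException`, where `[B` is the JVM's internal name for `byte[]`.

```java
import java.util.ArrayList;
import java.util.List;

public class SummaryWarningSketch {
    // Collected warnings, standing in for a logger in this sketch.
    public static final List<String> warnings = new ArrayList<>();

    /** Assumes the property is a String; throws ClassCastException if it holds a byte[]. */
    public static String readCategory(Object rawProperty) {
        return (String) rawProperty;
    }

    /** Reads the summary property, tolerates a failure, and still returns the body text. */
    public static String extract(Object rawCategory, String bodyText) {
        try {
            readCategory(rawCategory);
        } catch (RuntimeException e) {
            // Metadata could not be read: record a warning and carry on.
            warnings.add("Ignoring unexpected exception while parsing summary entry: "
                    + e.getClass().getName());
        }
        return bodyText;
    }

    public static void main(String[] args) {
        // A byte[] where a String is expected triggers the cast failure.
        String text = extract(new byte[] {1, 2, 3}, "slide text");
        System.out.println(text);      // slide text
        System.out.println(warnings);  // one warning naming java.lang.ClassCastException
    }
}
```

The design choice shown here is to treat metadata as optional: a malformed property costs one field, not the whole document.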
[jira] [Commented] (TIKA-1143) Fails to parse some PPT file
[ https://issues.apache.org/jira/browse/TIKA-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698861#comment-13698861 ]

Vincent Massol commented on TIKA-1143:
--------------------------------------

You guys are awesome! Such a fast response time :) Way to go!
[jira] [Commented] (TIKA-1143) Fails to parse some PPT file
[ https://issues.apache.org/jira/browse/TIKA-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698860#comment-13698860 ]

Vincent Massol commented on TIKA-1143:
--------------------------------------

{quote}
Are you able to extract text from the rest of the document?
{quote}

No, it doesn't seem to be indexed since I can't find it in my search. However, I didn't write this code in XWiki, so maybe we stop some processing when we get an exception and this is why the rest isn't indexed. I will check with the developer who wrote this part.
[jira] [Commented] (TIKA-1143) Fails to parse some PPT file
[ https://issues.apache.org/jira/browse/TIKA-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698809#comment-13698809 ]

Vincent Massol commented on TIKA-1143:
--------------------------------------

Hi Nick. Unfortunately I don't know the origin. I can open the file fine with libreoffice or MS PPT. I've just attached the file to this issue.
[jira] [Updated] (TIKA-1143) Fails to parse some PPT file
[ https://issues.apache.org/jira/browse/TIKA-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent Massol updated TIKA-1143: - Attachment: XWikiIExpoPresentation.ppt > Fails to parse some PPT file > > > Key: TIKA-1143 > URL: https://issues.apache.org/jira/browse/TIKA-1143 > Project: Tika > Issue Type: Bug > Components: parser >Affects Versions: 1.4 >Reporter: Vincent Massol > Attachments: XWikiIExpoPresentation.ppt
[jira] [Created] (TIKA-1143) Fails to parse some PPT file
Vincent Massol created TIKA-1143:

Summary: Fails to parse some PPT file
Key: TIKA-1143
URL: https://issues.apache.org/jira/browse/TIKA-1143
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.4
Reporter: Vincent Massol

See also http://jira.xwiki.org/browse/XWIKI-9308

Here's what I get with the attached file:

{noformat}
2013-07-03 11:52:45,332 [XWiki Solr index thread] WARN a.t.p.m.AbstractPOIFSExtractor - Ignoring unexpected exception while parsing summary entry DocumentSummaryInformation
java.lang.ClassCastException: [B cannot be cast to java.lang.String
    at org.apache.poi.hpsf.DocumentSummaryInformation.getCategory(DocumentSummaryInformation.java:78) ~[poi-3.9.jar:3.9]
    at org.apache.tika.parser.microsoft.SummaryExtractor.parse(SummaryExtractor.java:143) [tika-parsers-1.4.jar:na]
    at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaryEntryIfExists(SummaryExtractor.java:88) [tika-parsers-1.4.jar:na]
    at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaries(SummaryExtractor.java:73) [tika-parsers-1.4.jar:na]
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:170) [tika-parsers-1.4.jar:na]
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:161) [tika-parsers-1.4.jar:na]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) [tika-core-1.4.jar:na]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) [tika-core-1.4.jar:na]
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) [tika-core-1.4.jar:na]
    at org.apache.tika.Tika.parseToString(Tika.java:380) [tika-core-1.4.jar:na]
    at org.xwiki.search.solr.internal.metadata.AttachmentSolrMetadataExtractor.getContentAsText(AttachmentSolrMetadataExtractor.java:130) [xwiki-platform-search-solr-api-5.2-20130702.194010-10.jar:na]
    at org.xwiki.search.solr.internal.metadata.AttachmentSolrMetadataExtractor.setLocaleAndContentFields(AttachmentSolrMetadataExtractor.java:97) [xwiki-platform-search-solr-api-5.2-20130702.194010-10.jar:na]
    at org.xwiki.search.solr.internal.metadata.AttachmentSolrMetadataExtractor.setFieldsInternal(AttachmentSolrMetadataExtractor.java:79) [xwiki-platform-search-solr-api-5.2-20130702.194010-10.jar:na]
    at org.xwiki.search.solr.internal.metadata.AbstractSolrMetadataExtractor.getSolrDocument(AbstractSolrMetadataExtractor.java:114) [xwiki-platform-search-solr-api-5.2-20130702.194010-10.jar:na]
    at org.xwiki.search.solr.internal.DefaultSolrIndexer.getSolrDocument(DefaultSolrIndexer.java:465) [xwiki-platform-search-solr-api-5.2-20130702.194010-10.jar:na]
    at org.xwiki.search.solr.internal.DefaultSolrIndexer.processBatch(DefaultSolrIndexer.java:378) [xwiki-platform-search-solr-api-5.2-20130702.194010-10.jar:na]
    at org.xwiki.search.solr.internal.DefaultSolrIndexer.runInternal(DefaultSolrIndexer.java:353) [xwiki-platform-search-solr-api-5.2-20130702.194010-10.jar:na]
    at com.xpn.xwiki.util.AbstractXWikiRunnable.run(AbstractXWikiRunnable.java:121) [xwiki-platform-oldcore-5.2-20130702.190754-22.jar:na]
    at java.lang.Thread.run(Thread.java:680) [na:1.6.0_51]
2013-07-03 11:52:49,985 [Lucene Index Updater] WARN a.t.p.m.AbstractPOIFSExtractor - Ignoring unexpected exception while parsing summary entry DocumentSummaryInformation
java.lang.ClassCastException: [B cannot be cast to java.lang.String
    at org.apache.poi.hpsf.DocumentSummaryInformation.getCategory(DocumentSummaryInformation.java:78) ~[poi-3.9.jar:3.9]
    at org.apache.tika.parser.microsoft.SummaryExtractor.parse(SummaryExtractor.java:143) [tika-parsers-1.4.jar:na]
    at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaryEntryIfExists(SummaryExtractor.java:88) [tika-parsers-1.4.jar:na]
    at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaries(SummaryExtractor.java:73) [tika-parsers-1.4.jar:na]
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:170) [tika-parsers-1.4.jar:na]
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:161) [tika-parsers-1.4.jar:na]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) [tika-core-1.4.jar:na]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) [tika-core-1.4.jar:na]
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) [tika-core-1.4.jar:na]
    at org.apache.tika.Tika.parseToString(Tika.java:380) [tika-core-1.4.jar:na]
    at com.xpn.xwiki.plugin.lucene.internal.AttachmentData.getContentAsText(AttachmentData.java:221) [xwiki-platform-search-lucene-api-5.2-20130702.191134-22.jar:na]
[jira] [Commented] (TIKA-1053) Upgrade Tika Parsers to use ASM 4.x
[ https://issues.apache.org/jira/browse/TIKA-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694677#comment-13694677 ] Vincent Massol commented on TIKA-1053: -- Cool, thanks Nick. > Upgrade Tika Parsers to use ASM 4.x > --- > > Key: TIKA-1053 > URL: https://issues.apache.org/jira/browse/TIKA-1053 > Project: Tika > Issue Type: Improvement > Components: parser >Affects Versions: 1.2 >Reporter: Vincent Massol >Assignee: Michael McCandless > Fix For: 1.4 > > Attachments: TIKA-1053.patch > > > Right now Tika 1.2 uses ASM 3.1. > However this is causing some issues for us on the XWiki project since we also > bundle other frameworks that use a more recent version of ASM (we use pegdown > which uses parboiled which pulls in ASM 4.0). > The problem is that ASM 3.x and 4.0 are not compatible... > See http://jira.xwiki.org/browse/XE-1269 for more details about the issue > we're facing. > Thanks for considering upgrading to ASM 4.x :) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (TIKA-1053) Upgrade Tika Parsers to use ASM 4.x
[ https://issues.apache.org/jira/browse/TIKA-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694624#comment-13694624 ] Vincent Massol commented on TIKA-1053: -- Thanks again for fixing this. Any idea when Tika 1.4 is going to be released? (I'm still waiting for this fix).
[jira] [Commented] (TIKA-1053) Upgrade Tika Parsers to use ASM 4.x
[ https://issues.apache.org/jira/browse/TIKA-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573623#comment-13573623 ] Vincent Massol commented on TIKA-1053: -- Thanks a lot guys. Let's hope Tika 1.4 will not be released in too long now ;)
[jira] [Created] (TIKA-1053) Upgrade Tika Parsers to use ASM 4.x
Vincent Massol created TIKA-1053:

Summary: Upgrade Tika Parsers to use ASM 4.x
Key: TIKA-1053
URL: https://issues.apache.org/jira/browse/TIKA-1053
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.2
Reporter: Vincent Massol

Right now Tika 1.2 uses ASM 3.1.

However this is causing some issues for us on the XWiki project since we also bundle other frameworks that use a more recent version of ASM (we use pegdown which uses parboiled which pulls in ASM 4.0).

The problem is that ASM 3.x and 4.0 are not compatible...

See http://jira.xwiki.org/browse/XE-1269 for more details about the issue we're facing.

Thanks for considering upgrading to ASM 4.x :)
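
One way to confirm a clash like the ASM 3.x / 4.0 one reported here is to list every classpath location that provides the same class. The following is only a rough sketch using the JDK's `ClassLoader.getResources`; the class `ClasspathDuplicates` and its two helper methods are hypothetical names made up for illustration, and `org.objectweb.asm.ClassVisitor` is used as the default probe on the assumption that both ASM versions ship a class with that name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper, not part of Tika or XWiki.
public class ClasspathDuplicates {

    /** Converts a fully qualified class name to its classpath resource path. */
    static String toResourcePath(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns every location visible to the current class loader that provides the resource. */
    static List<URL> locate(String resourcePath) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        try {
            return Collections.list(loader.getResources(resourcePath));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: this class name exists in both ASM 3.x and ASM 4.0 jars.
        String name = args.length > 0 ? args[0] : "org.objectweb.asm.ClassVisitor";
        List<URL> hits = locate(toResourcePath(name));
        System.out.println(name + " found in " + hits.size() + " location(s)");
        for (URL url : hits) {
            System.out.println("  " + url);
        }
    }
}
```

If more than one location is printed, two jars are supplying the same class and whichever comes first on the classpath wins, which would be consistent with the kind of incompatibility described in the issue. In a webapp this would need to run inside the application's own class loader to be meaningful.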