[jira] [Comment Edited] (TIKA-3686) CSS file detected as JavaScript (application/javascript)

2022-03-03 Thread Vincent Massol (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500814#comment-17500814
 ] 

Vincent Massol edited comment on TIKA-3686 at 3/3/22, 2:57 PM:
---

[~nick] Thanks. Note that it seems to be some sort of regression since it 
worked fine before the upgrade to Tika 2.0.0. Was this change of behavior 
wanted?

Details at https://jira.xwiki.org/browse/XWIKI-19491


was (Author: vmassol):
[~nick] Thanks. Note that it seems to be some sort of regression since it 
worked fine before the upgrade to Tika 2.0.0. Was this change of behavior 
wanted?

> CSS file detected as JavaScript (application/javascript)
> 
>
> Key: TIKA-3686
> URL: https://issues.apache.org/jira/browse/TIKA-3686
> Project: Tika
> Issue Type: Bug
> Components: detector
> Affects Versions: 2.0.0-ALPHA
> Reporter: Marius Dumitru Florea
> Priority: Major
>
> The following CSS file 
> [https://github.com/techlab/jquery-smartwizard/blob/v5.1.1/dist/css/smart_wizard_all.min.css]
>  is detected as {{application/javascript}} using:
> {noformat}
> TikaUtils.detect(InputStream stream, String name)
> {noformat}
> The reason seems to be that the CSS file starts with:
> {noformat}
> /*!
>  * jQuery
> {noformat}
> which matches the "jQuery" entry from 
> [tika-mimetypes.xml|https://github.com/apache/tika/blob/2.3.0/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml#L348]
>  used by Tika's {{MimeTypes}} detector.
> This is a regression introduced by 
> https://github.com/apache/tika/commit/97699598f000139b1222b785d634b3c8a8e216c7
>  in TIKA-1141 (2.0.0-ALPHA).
> The implications are serious if the MIME type returned by Tika is used to set 
> the Content-Type header on the HTTP response that returns the CSS file to the 
> browser: the browser ignores the CSS.
> FTR, in my case the CSS file is not served directly from the file system but 
> from a WebJar (in this case 
> https://search.maven.org/artifact/org.webjars.npm/smartwizard/5.1.1/jar ) and 
> we're using Tika to determine the type of files requested from the WebJars.
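
A caller could guard against this misdetection along the following lines. This is only a minimal sketch under stated assumptions: when the requested file name ends in `.css`, it prefers `text/css` over a content-sniffed `application/javascript`. The class `ContentTypeGuard` and its method `resolve` are hypothetical names, not part of Tika or XWiki, and the `detectedType` argument stands in for whatever string the detection call returned.

```java
import java.util.Locale;

/** Hypothetical helper for illustration; not part of Tika or XWiki. */
public class ContentTypeGuard {

    /**
     * Returns the content type to send to the browser.
     * detectedType is assumed to be the value reported by the detector;
     * fileName is the name of the resource requested from the WebJar.
     */
    public static String resolve(String detectedType, String fileName) {
        String lower = fileName == null ? "" : fileName.toLowerCase(Locale.ROOT);
        // A .css resource reported as JavaScript is treated as a misdetection.
        if (lower.endsWith(".css") && "application/javascript".equals(detectedType)) {
            return "text/css";
        }
        // In every other case the detected type is passed through unchanged.
        return detectedType;
    }

    public static void main(String[] args) {
        System.out.println(resolve("application/javascript", "smart_wizard_all.min.css"));
        System.out.println(resolve("application/javascript", "jquery.min.js"));
    }
}
```

The guard only overrides the one combination described in this issue, so genuine JavaScript files keep their detected type.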



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (TIKA-3686) CSS file detected as JavaScript (application/javascript)

2022-03-03 Thread Vincent Massol (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500814#comment-17500814
 ] 

Vincent Massol commented on TIKA-3686:
--

[~nick] Thanks. Note that it seems to be some sort of regression since it 
worked fine before the upgrade to Tika 2.0.0. Was this change of behavior 
wanted?


[jira] [Commented] (TIKA-1143) Fails to parse some PPT file

2015-03-15 Thread Vincent Massol (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362379#comment-14362379
 ] 

Vincent Massol commented on TIKA-1143:
--

It's been fixed in Tika 1.5.

> Fails to parse some PPT file
> 
>
> Key: TIKA-1143
> URL: https://issues.apache.org/jira/browse/TIKA-1143
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.4
> Reporter: Vincent Massol
> Attachments: XWikiIExpoPresentation.ppt
>
>
> See also http://jira.xwiki.org/browse/XWIKI-9308
> Here's what I get with the attached file:
> {noformat}
> 2013-07-03 11:52:45,332 [XWiki Solr index thread] WARN  
> a.t.p.m.AbstractPOIFSExtractor - Ignoring unexpected exception while parsing 
> summary entry DocumentSummaryInformation 
> java.lang.ClassCastException: [B cannot be cast to java.lang.String
>   at 
> org.apache.poi.hpsf.DocumentSummaryInformation.getCategory(DocumentSummaryInformation.java:78)
>  ~[poi-3.9.jar:3.9]
>   at 
> org.apache.tika.parser.microsoft.SummaryExtractor.parse(SummaryExtractor.java:143)
>  [tika-parsers-1.4.jar:na]
>   at 
> org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaryEntryIfExists(SummaryExtractor.java:88)
>  [tika-parsers-1.4.jar:na]
>   at 
> org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaries(SummaryExtractor.java:73)
>  [tika-parsers-1.4.jar:na]
>   at 
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:170) 
> [tika-parsers-1.4.jar:na]
>   at 
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:161) 
> [tika-parsers-1.4.jar:na]
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) 
> [tika-core-1.4.jar:na]
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) 
> [tika-core-1.4.jar:na]
>   at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) 
> [tika-core-1.4.jar:na]
>   at org.apache.tika.Tika.parseToString(Tika.java:380) 
> [tika-core-1.4.jar:na]
>   at 
> org.xwiki.search.solr.internal.metadata.AttachmentSolrMetadataExtractor.getContentAsText(AttachmentSolrMetadataExtractor.java:130)
>  [xwiki-platform-search-solr-api-5.2-20130702.194010-10.jar:na]
>   at 
> org.xwiki.search.solr.internal.metadata.AttachmentSolrMetadataExtractor.setLocaleAndContentFields(AttachmentSolrMetadataExtractor.java:97)
>  [xwiki-platform-search-solr-api-5.2-20130702.194010-10.jar:na]
>   at 
> org.xwiki.search.solr.internal.metadata.AttachmentSolrMetadataExtractor.setFieldsInternal(AttachmentSolrMetadataExtractor.java:79)
>  [xwiki-platform-search-solr-api-5.2-20130702.194010-10.jar:na]
>   at 
> org.xwiki.search.solr.internal.metadata.AbstractSolrMetadataExtractor.getSolrDocument(AbstractSolrMetadataExtractor.java:114)
>  [xwiki-platform-search-solr-api-5.2-20130702.194010-10.jar:na]
>   at 
> org.xwiki.search.solr.internal.DefaultSolrIndexer.getSolrDocument(DefaultSolrIndexer.java:465)
>  [xwiki-platform-search-solr-api-5.2-20130702.194010-10.jar:na]
>   at 
> org.xwiki.search.solr.internal.DefaultSolrIndexer.processBatch(DefaultSolrIndexer.java:378)
>  [xwiki-platform-search-solr-api-5.2-20130702.194010-10.jar:na]
>   at 
> org.xwiki.search.solr.internal.DefaultSolrIndexer.runInternal(DefaultSolrIndexer.java:353)
>  [xwiki-platform-search-solr-api-5.2-20130702.194010-10.jar:na]
>   at 
> com.xpn.xwiki.util.AbstractXWikiRunnable.run(AbstractXWikiRunnable.java:121) 
> [xwiki-platform-oldcore-5.2-20130702.190754-22.jar:na]
>   at java.lang.Thread.run(Thread.java:680) [na:1.6.0_51]
> 2013-07-03 11:52:49,985 [Lucene Index Updater] WARN  
> a.t.p.m.AbstractPOIFSExtractor - Ignoring unexpected exception while parsing 
> summary entry DocumentSummaryInformation 
> java.lang.ClassCastException: [B cannot be cast to java.lang.String
>   at 
> org.apache.poi.hpsf.DocumentSummaryInformation.getCategory(DocumentSummaryInformation.java:78)
>  ~[poi-3.9.jar:3.9]
>   at 
> org.apache.tika.parser.microsoft.SummaryExtractor.parse(SummaryExtractor.java:143)
>  [tika-parsers-1.4.jar:na]
>   at 
> org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaryEntryIfExists(SummaryExtractor.java:88)
>  [tika-parsers-1.4.jar:na]
>   at 
> org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaries(SummaryExtractor.java:73)
>  [tika-parsers-1.4.jar:na]
>   at 
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:170) 
> [tika-parsers-1.4.jar:na]
>   at 
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:161) 
> [tika-parsers-1.4.jar:na]
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) 
> [tika-core-1.4.jar:na]
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java

[jira] [Commented] (TIKA-1143) Fails to parse some PPT file

2015-03-15 Thread Vincent Massol (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362308#comment-14362308
 ] 

Vincent Massol commented on TIKA-1143:
--

Thanks Tyler. Could you set the "fix version" and "assignee" fields please? The 
fix version is especially important so that we know which version of Tika 
includes the fix (i.e. the POI upgrade). Thanks!


[jira] [Commented] (TIKA-1143) Fails to parse some PPT file

2013-07-03 Thread Vincent Massol (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698944#comment-13698944
 ] 

Vincent Massol commented on TIKA-1143:
--

Sorry, my bad. I said earlier that my content didn't seem to be indexed. That 
was incorrect: I confirm it is indexed (I was not searching correctly), so the 
stack trace is only a warning and doesn't affect the rest.

Thanks!
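
For illustration, the warning from the stack trace reported for this issue can be reproduced with a minimal sketch. It assumes, as the message `[B cannot be cast to java.lang.String` indicates, that a property value was stored as a byte array and then cast to `String` (`[B` is the JVM's name for `byte[]`). The class `SummaryCastDemo` and the method `readCategory` are hypothetical and are not POI or Tika code; catching the exception and carrying on only mirrors the "Ignoring unexpected exception" behavior seen in the log.

```java
/** Hypothetical demo for illustration; not POI or Tika code. */
public class SummaryCastDemo {

    /** Casts the raw property value to String, returning null when the cast fails. */
    public static String readCategory(Object rawValue) {
        try {
            // Throws ClassCastException when rawValue is actually a byte[].
            return (String) rawValue;
        } catch (ClassCastException e) {
            // Report the problem and continue, so the rest of the document is still processed.
            System.err.println("Ignoring unexpected exception: " + e.getClass().getName());
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(readCategory("Presentation"));
        System.out.println(readCategory(new byte[] {1, 2, 3}));
    }
}
```

Because the failure is confined to one summary property, the remaining text extraction is unaffected, which is consistent with the content still being indexed.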


[jira] [Commented] (TIKA-1143) Fails to parse some PPT file

2013-07-03 Thread Vincent Massol (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698861#comment-13698861
 ] 

Vincent Massol commented on TIKA-1143:
--

You guys are awesome! Such a fast response time :) Way to go!


[jira] [Commented] (TIKA-1143) Fails to parse some PPT file

2013-07-03 Thread Vincent Massol (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698860#comment-13698860
 ] 

Vincent Massol commented on TIKA-1143:
--

{quote}
Are you able to extract text from the rest of the document?
{quote}

No, it doesn't seem to be indexed since I can't find it in my search. However, 
I didn't write this code in XWiki, so maybe we stop some processing when we 
get an exception, and this is why the rest isn't indexed.

Will check with the developer who coded this part.


[jira] [Commented] (TIKA-1143) Fails to parse some PPT file

2013-07-03 Thread Vincent Massol (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698809#comment-13698809
 ] 

Vincent Massol commented on TIKA-1143:
--

Hi Nick. Unfortunately I don't know the origin.

I can open the file fine with LibreOffice or MS PowerPoint. I've just attached 
the file to this issue.


[jira] [Updated] (TIKA-1143) Fails to parse some PPT file

2013-07-03 Thread Vincent Massol (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent Massol updated TIKA-1143:
-

Attachment: XWikiIExpoPresentation.ppt

> Fails to parse some PPT file
> 
>
> Key: TIKA-1143
> URL: https://issues.apache.org/jira/browse/TIKA-1143
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.4
> Reporter: Vincent Massol
> Attachments: XWikiIExpoPresentation.ppt
>
>
> See also http://jira.xwiki.org/browse/XWIKI-9308

[jira] [Created] (TIKA-1143) Fails to parse some PPT file

2013-07-03 Thread Vincent Massol (JIRA)
Vincent Massol created TIKA-1143:


 Summary: Fails to parse some PPT file
 Key: TIKA-1143
 URL: https://issues.apache.org/jira/browse/TIKA-1143
 Project: Tika
  Issue Type: Bug
  Components: parser
Affects Versions: 1.4
Reporter: Vincent Massol


See also http://jira.xwiki.org/browse/XWIKI-9308

Here's what I get with the attached file:

{noformat}
2013-07-03 11:52:45,332 [XWiki Solr index thread] WARN  a.t.p.m.AbstractPOIFSExtractor - Ignoring unexpected exception while parsing summary entry DocumentSummaryInformation
java.lang.ClassCastException: [B cannot be cast to java.lang.String
    at org.apache.poi.hpsf.DocumentSummaryInformation.getCategory(DocumentSummaryInformation.java:78) ~[poi-3.9.jar:3.9]
    at org.apache.tika.parser.microsoft.SummaryExtractor.parse(SummaryExtractor.java:143) [tika-parsers-1.4.jar:na]
    at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaryEntryIfExists(SummaryExtractor.java:88) [tika-parsers-1.4.jar:na]
    at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaries(SummaryExtractor.java:73) [tika-parsers-1.4.jar:na]
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:170) [tika-parsers-1.4.jar:na]
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:161) [tika-parsers-1.4.jar:na]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) [tika-core-1.4.jar:na]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) [tika-core-1.4.jar:na]
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) [tika-core-1.4.jar:na]
    at org.apache.tika.Tika.parseToString(Tika.java:380) [tika-core-1.4.jar:na]
    at org.xwiki.search.solr.internal.metadata.AttachmentSolrMetadataExtractor.getContentAsText(AttachmentSolrMetadataExtractor.java:130) [xwiki-platform-search-solr-api-5.2-20130702.194010-10.jar:na]
    at org.xwiki.search.solr.internal.metadata.AttachmentSolrMetadataExtractor.setLocaleAndContentFields(AttachmentSolrMetadataExtractor.java:97) [xwiki-platform-search-solr-api-5.2-20130702.194010-10.jar:na]
    at org.xwiki.search.solr.internal.metadata.AttachmentSolrMetadataExtractor.setFieldsInternal(AttachmentSolrMetadataExtractor.java:79) [xwiki-platform-search-solr-api-5.2-20130702.194010-10.jar:na]
    at org.xwiki.search.solr.internal.metadata.AbstractSolrMetadataExtractor.getSolrDocument(AbstractSolrMetadataExtractor.java:114) [xwiki-platform-search-solr-api-5.2-20130702.194010-10.jar:na]
    at org.xwiki.search.solr.internal.DefaultSolrIndexer.getSolrDocument(DefaultSolrIndexer.java:465) [xwiki-platform-search-solr-api-5.2-20130702.194010-10.jar:na]
    at org.xwiki.search.solr.internal.DefaultSolrIndexer.processBatch(DefaultSolrIndexer.java:378) [xwiki-platform-search-solr-api-5.2-20130702.194010-10.jar:na]
    at org.xwiki.search.solr.internal.DefaultSolrIndexer.runInternal(DefaultSolrIndexer.java:353) [xwiki-platform-search-solr-api-5.2-20130702.194010-10.jar:na]
    at com.xpn.xwiki.util.AbstractXWikiRunnable.run(AbstractXWikiRunnable.java:121) [xwiki-platform-oldcore-5.2-20130702.190754-22.jar:na]
    at java.lang.Thread.run(Thread.java:680) [na:1.6.0_51]
2013-07-03 11:52:49,985 [Lucene Index Updater] WARN  a.t.p.m.AbstractPOIFSExtractor - Ignoring unexpected exception while parsing summary entry DocumentSummaryInformation
java.lang.ClassCastException: [B cannot be cast to java.lang.String
    at org.apache.poi.hpsf.DocumentSummaryInformation.getCategory(DocumentSummaryInformation.java:78) ~[poi-3.9.jar:3.9]
    at org.apache.tika.parser.microsoft.SummaryExtractor.parse(SummaryExtractor.java:143) [tika-parsers-1.4.jar:na]
    at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaryEntryIfExists(SummaryExtractor.java:88) [tika-parsers-1.4.jar:na]
    at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaries(SummaryExtractor.java:73) [tika-parsers-1.4.jar:na]
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:170) [tika-parsers-1.4.jar:na]
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:161) [tika-parsers-1.4.jar:na]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) [tika-core-1.4.jar:na]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) [tika-core-1.4.jar:na]
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) [tika-core-1.4.jar:na]
    at org.apache.tika.Tika.parseToString(Tika.java:380) [tika-core-1.4.jar:na]
    at com.xpn.xwiki.plugin.lucene.internal.AttachmentData.getContentAsText(AttachmentData.java:221) [xwiki-platform-search-lucene-api-5.2-20130702.191134-22.jar:na]
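
A note on reading the trace above: `[B` is the JVM's name for `byte[]`, so the message says that a value which was really a byte array got cast to `String`. The sketch below only illustrates that failure mode and one defensive alternative; it is not POI's or Tika's actual code. The class `SummaryValueSketch`, the method `asText`, and the choice of ISO-8859-1 as a fallback charset are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;

public class SummaryValueSketch {

    /**
     * Converts a loosely typed property value to text without an unchecked cast.
     * Hypothetical helper: the real property API may expose values differently.
     */
    public static String asText(Object value) {
        if (value == null) {
            return null;
        }
        if (value instanceof String) {
            return (String) value;
        }
        if (value instanceof byte[]) {
            // Assumption: the real encoding is unknown, so use ISO-8859-1,
            // which maps every byte to a character and never fails to decode.
            return new String((byte[]) value, StandardCharsets.ISO_8859_1);
        }
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        Object raw = "Slides".getBytes(StandardCharsets.ISO_8859_1);
        try {
            // Same kind of cast as in the trace: the runtime type is byte[], not String.
            String category = (String) raw;
            System.out.println(category);
        } catch (ClassCastException e) {
            System.out.println("unchecked cast failed: " + e.getMessage());
        }
        // The type-checked conversion handles both representations.
        System.out.println(asText(raw));
    }
}
```

Checking the runtime type before casting keeps one oddly typed summary property from surfacing as an exception, at the cost of guessing an encoding for the raw bytes.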
   

[jira] [Commented] (TIKA-1053) Upgrade Tika Parsers to use ASM 4.x

2013-06-27 Thread Vincent Massol (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694677#comment-13694677
 ] 

Vincent Massol commented on TIKA-1053:
--

Cool, thanks Nick. 

> Upgrade Tika Parsers to use ASM 4.x
> ---
>
> Key: TIKA-1053
> URL: https://issues.apache.org/jira/browse/TIKA-1053
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 1.2
> Reporter: Vincent Massol
> Assignee: Michael McCandless
> Fix For: 1.4
>
> Attachments: TIKA-1053.patch
>
>
> Right now Tika 1.2 uses ASM 3.1. 
> However, this is causing issues for us on the XWiki project, since we also 
> bundle other frameworks that use a more recent version of ASM (we use pegdown, 
> which uses parboiled, which pulls in ASM 4.0).
> The problem is that ASM 3.x and 4.0 are not compatible...
> See http://jira.xwiki.org/browse/XE-1269 for more details about the issue 
> we're facing.
> Thanks for considering upgrading to ASM 4.x :)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (TIKA-1053) Upgrade Tika Parsers to use ASM 4.x

2013-06-27 Thread Vincent Massol (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694624#comment-13694624
 ] 

Vincent Massol commented on TIKA-1053:
--

Thanks again for fixing this. Any idea when Tika 1.4 is going to be released? 
(I'm still waiting for this fix).




[jira] [Commented] (TIKA-1053) Upgrade Tika Parsers to use ASM 4.x

2013-02-07 Thread Vincent Massol (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573623#comment-13573623
 ] 

Vincent Massol commented on TIKA-1053:
--

Thanks a lot, guys. Let's hope it won't be too long before Tika 1.4 is released ;)




[jira] [Created] (TIKA-1053) Upgrade Tika Parsers to use ASM 4.x

2013-01-07 Thread Vincent Massol (JIRA)
Vincent Massol created TIKA-1053:


 Summary: Upgrade Tika Parsers to use ASM 4.x
 Key: TIKA-1053
 URL: https://issues.apache.org/jira/browse/TIKA-1053
 Project: Tika
  Issue Type: Improvement
  Components: parser
Affects Versions: 1.2
Reporter: Vincent Massol


Right now Tika 1.2 uses ASM 3.1. 

However, this is causing issues for us on the XWiki project, since we also 
bundle other frameworks that use a more recent version of ASM (we use pegdown, 
which uses parboiled, which pulls in ASM 4.0).

The problem is that ASM 3.x and 4.0 are not compatible...

See http://jira.xwiki.org/browse/XE-1269 for more details about the issue we're 
facing.

Thanks for considering upgrading to ASM 4.x :)
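
When two versions of a library such as ASM 3.x and 4.0 may both end up on the classpath, it can help to check which jar a given class is actually loaded from. The following is a minimal diagnostic sketch using only the JDK. The class `ClassOrigin` and its methods are hypothetical names for this example, and `org.objectweb.asm.ClassVisitor` is used only as a class name to probe; whether it resolves depends on your classpath.

```java
import java.security.CodeSource;

public class ClassOrigin {

    /** Returns the location a class was loaded from, or "bootstrap" if none is recorded. */
    public static String describe(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            // JDK classes loaded by the bootstrap loader have no code source.
            return "bootstrap";
        }
        return source.getLocation().toString();
    }

    /** Looks a class up by name without initializing it, then reports its origin. */
    public static String describeByName(String className) {
        try {
            Class<?> type = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            return describe(type);
        } catch (ClassNotFoundException | LinkageError e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        // Prints the jar URL if ASM is present, otherwise "not on classpath".
        System.out.println(describeByName("org.objectweb.asm.ClassVisitor"));
        System.out.println(describeByName("java.lang.String"));
    }
}
```

Run from inside the affected application, this shows which ASM jar wins at runtime, which is the first thing to establish before excluding or upgrading a transitive dependency.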
