[
https://issues.apache.org/jira/browse/OAK-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362306#comment-14362306
]
Chetan Mehrotra edited comment on OAK-2468 at 3/15/15 9:10 AM:
---------------------------------------------------------------
Done with http://svn.apache.org/r1666787
Now {{jcr:mimeType}} has to be non-null and supported by Tika for the binary
content to be indexed. Note that with this change it is now assumed that all
binary properties in a given node are of the same mimeType.
JR2 used to restrict indexing to binary content stored under {{jcr:data}}, and
only when {{jcr:mimeType}} was specified. With Oak we index binary properties
with any name but do enforce that {{jcr:mimeType}} is not null.
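To illustrate the rule described in this comment, here is a minimal, self-contained sketch. It is not the code committed for this issue: the class {{BinaryIndexGate}}, its method names, the map-based node model and the hard-coded {{SUPPORTED_TYPES}} set are all hypothetical stand-ins, and a real implementation would ask Tika which types its parsers support. Stripping mimeType parameters is likewise an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class BinaryIndexGate {

    // Hypothetical stand-in for the media types the installed Tika parsers
    // report as supported; a real setup would query Tika instead.
    public static final Set<String> SUPPORTED_TYPES = new HashSet<>(Arrays.asList(
            "text/plain", "application/pdf", "application/msword"));

    // True only if the node-level mimeType is present and in the supported set.
    public static boolean isIndexable(String mimeType, Set<String> supportedTypes) {
        if (mimeType == null) {
            return false;
        }
        // Assumption of this sketch: ignore parameters such as "; charset=UTF-8".
        int semi = mimeType.indexOf(';');
        String base = (semi >= 0 ? mimeType.substring(0, semi) : mimeType)
                .trim().toLowerCase(Locale.ROOT);
        return supportedTypes.contains(base);
    }

    // Returns the names of the binary properties that would be sent for text
    // extraction. The node is modelled as a plain map; byte[] values stand in
    // for binary properties. The single jcr:mimeType is applied to all of them.
    public static List<String> binaryPropertiesToIndex(Map<String, Object> node,
                                                       Set<String> supportedTypes) {
        Object value = node.get("jcr:mimeType");
        String mimeType = (value instanceof String) ? (String) value : null;
        List<String> names = new ArrayList<>();
        if (!isIndexable(mimeType, supportedTypes)) {
            return names; // missing or unsupported mimeType: index no binaries
        }
        for (Map.Entry<String, Object> e : node.entrySet()) {
            if (e.getValue() instanceof byte[]) {
                names.add(e.getKey()); // any property name, not only jcr:data
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Map<String, Object> pdfNode = new LinkedHashMap<>();
        pdfNode.put("jcr:mimeType", "application/pdf");
        pdfNode.put("jcr:data", new byte[] {1, 2, 3});
        pdfNode.put("thumbnail", new byte[] {4, 5});

        Map<String, Object> untypedNode = new LinkedHashMap<>();
        untypedNode.put("jcr:data", new byte[] {1, 2, 3});

        System.out.println(binaryPropertiesToIndex(pdfNode, SUPPORTED_TYPES));
        System.out.println(binaryPropertiesToIndex(untypedNode, SUPPORTED_TYPES));
    }
}
```

In this sketch the first node yields both of its binary properties, because the one {{jcr:mimeType}} is taken to describe every binary on the node, while the second node yields none because it has no {{jcr:mimeType}} at all.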
> Index binary only if some Tika parser can support the binaries mimeType
> -----------------------------------------------------------------------
>
> Key: OAK-2468
> URL: https://issues.apache.org/jira/browse/OAK-2468
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: oak-lucene
> Reporter: Chetan Mehrotra
> Assignee: Chetan Mehrotra
> Priority: Minor
> Fix For: 1.1.8
>
>
> Currently all binaries are passed to Tika for text extraction. However, Tika
> can only parse those for which it has a supported parser present. Therefore
> the extraction logic should parse a binary only if its mimeType is supported
> by Tika.
> With this change {{jcr:mimeType}} would become a mandatory property.
> JR2 had a similar check [1].
> [1]
> https://github.com/apache/jackrabbit/blob/trunk/jackrabbit-core/src/main/java/org/apache/jackrabbit/core/query/lucene/NodeIndexer.java#L932