[ https://issues.apache.org/jira/browse/JAMES-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Benoit Tellier closed JAMES-2910.
---------------------------------
    Resolution: Fixed

https://github.com/linagora/james-project/pull/2739 solved this

> HTML could be indexed directly in ElasticSearch
> -----------------------------------------------
>
>                 Key: JAMES-2910
>                 URL: https://issues.apache.org/jira/browse/JAMES-2910
>             Project: James Server
>          Issue Type: Improvement
>          Components: elasticsearch, guice
>    Affects Versions: 3.4.0
>            Reporter: Benoit Tellier
>            Priority: Major
>             Fix For: 3.5.0
>
>
> When Tika is disabled, the DefaultTextExtract is used, which does not perform
> HTML text extraction.
> This reduces search precision in that situation (the index is polluted by
> HTML markup) and inflates the index size.
> Proposal:
> CassandraGuice should default to JsoupTextExtractor when Tika is disabled.
> This will allow HTML text extraction to actually happen.
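
The proposal above can be illustrated with a minimal, dependency-free Java sketch. It is not the James implementation: the `TextExtractor` interface, both extractor classes and the `choose` helper are hypothetical names made up for this example, and the regex-based tag stripper is only a crude stand-in for what a Jsoup-based extractor would do with a real HTML parser.

```java
// Illustrative only: every type below is a hypothetical stand-in, not the James API.
class TextExtractorSelection {

    /** Hypothetical contract: turn a message part into the text that gets indexed. */
    interface TextExtractor {
        String extract(String content, String contentType);
    }

    /** Mimics the behaviour described in the issue: content is indexed as-is, markup included. */
    static class PassThroughExtractor implements TextExtractor {
        @Override
        public String extract(String content, String contentType) {
            return content;
        }
    }

    /** Naive tag stripper standing in for a Jsoup-based extractor (assumption, not real parsing). */
    static class HtmlStrippingExtractor implements TextExtractor {
        @Override
        public String extract(String content, String contentType) {
            if (contentType == null || !contentType.startsWith("text/html")) {
                return content; // non-HTML parts are left untouched
            }
            return content
                .replaceAll("<[^>]*>", " ") // replace each tag with a space
                .replaceAll("\\s+", " ")    // collapse runs of whitespace
                .trim();
        }
    }

    /** Proposed default: fall back to the HTML-aware extractor when Tika is disabled. */
    static TextExtractor choose(boolean tikaEnabled, TextExtractor tikaExtractor) {
        return tikaEnabled ? tikaExtractor : new HtmlStrippingExtractor();
    }
}
```

With Tika disabled, `TextExtractorSelection.choose(false, null)` returns the HTML-aware extractor, so only the visible text of an HTML part would reach the index, whereas the pass-through extractor would hand the raw markup to the indexer.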