[jira] [Commented] (SOLR-10308) Solr fails to work with Guava 21.0
[ https://issues.apache.org/jira/browse/SOLR-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357783#comment-16357783 ] Ahmet Arslan commented on SOLR-10308: - {quote}I don't know if it was correct to assume UTF-8 on the hashString usages {quote} I believe it should be as the following. I had confirmed it before in SOLR-11260 * HashFunction.hashString -> HashFunction.hashUnencodedChars > Solr fails to work with Guava 21.0 > -- > > Key: SOLR-10308 > URL: https://issues.apache.org/jira/browse/SOLR-10308 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: highlighter >Affects Versions: 6.4.2 >Reporter: Vincent Massol >Priority: Major > Attachments: SOLR-10308.patch > > > This is what we get: > {noformat} > Caused by: java.lang.NoSuchMethodError: > com.google.common.base.Objects.firstNonNull(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; > at > org.apache.solr.handler.component.HighlightComponent.prepare(HighlightComponent.java:118) > at > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:269) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:166) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2299) > at > org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:178) > at > org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:149) > at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:942) > at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:957) > at > org.xwiki.search.solr.internal.AbstractSolrInstance.query(AbstractSolrInstance.java:117) > at > org.xwiki.query.solr.internal.SolrQueryExecutor.execute(SolrQueryExecutor.java:122) > at > org.xwiki.query.internal.DefaultQueryExecutorManager.execute(DefaultQueryExecutorManager.java:72) > at > 
org.xwiki.query.internal.SecureQueryExecutorManager.execute(SecureQueryExecutorManager.java:67) > at org.xwiki.query.internal.DefaultQuery.execute(DefaultQuery.java:287) > at org.xwiki.query.internal.ScriptQuery.execute(ScriptQuery.java:237) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.velocity.util.introspection.UberspectImpl$VelMethodImpl.doInvoke(UberspectImpl.java:395) > at > org.apache.velocity.util.introspection.UberspectImpl$VelMethodImpl.invoke(UberspectImpl.java:384) > at > org.apache.velocity.runtime.parser.node.ASTMethod.execute(ASTMethod.java:173) > ... 183 more > {noformat} > Guava 21 has removed some signature that solr is currently using. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
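The question of whether hashString should assume UTF-8 matters because the two choices feed different bytes to the hash function. The following is a minimal sketch using only the JDK, with no Guava calls; the class name HashInputSketch, the sample id "doc-42", and the use of CRC32 as a stand-in hash are all assumptions made for illustration. It relies on the documented behaviour of Guava's unencoded-chars hashing, which consumes each char's low byte and then its high byte.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Stdlib-only illustration (hypothetical class); no Guava API is called here.
final class HashInputSketch {

    // Bytes a charset-based hash would consume: the UTF-8 encoding of the text.
    static byte[] utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Bytes an "unencoded chars" hash would consume: for each char, the low
    // byte followed by the high byte (the same layout as UTF-16LE, no BOM).
    static byte[] unencodedCharBytes(String s) {
        byte[] out = new byte[s.length() * 2];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            out[2 * i] = (byte) c;             // low byte
            out[2 * i + 1] = (byte) (c >>> 8); // high byte
        }
        return out;
    }

    // CRC32 stands in for a real hash function purely for demonstration.
    static long crc32(byte[] bytes) {
        CRC32 crc = new CRC32();
        crc.update(bytes, 0, bytes.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        String id = "doc-42"; // hypothetical document id
        System.out.println("utf-8 input length:     " + utf8Bytes(id).length);
        System.out.println("unencoded input length: " + unencodedCharBytes(id).length);
        System.out.println("crc32 over utf-8:       " + crc32(utf8Bytes(id)));
        System.out.println("crc32 over unencoded:   " + crc32(unencodedCharBytes(id)));
    }
}
```

Because the inputs differ, switching an existing call site to a UTF-8 based variant would generally change the hash values it produces, which is presumably why the comment prefers hashUnencodedChars as the like-for-like replacement.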
[jira] [Commented] (SOLR-10308) Solr fails to work with Guava 21.0
[ https://issues.apache.org/jira/browse/SOLR-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327479#comment-16327479 ] Ahmet Arslan commented on SOLR-10308: - Unfortunately, the patch affects the following HDFS test suites, which stall:
{noformat}
[junit4] HEARTBEAT J1 PID(26866@...): 2018-01-16T19:27:40, stalled for 7988s at: HdfsNNFailoverTest (suite)
[junit4] HEARTBEAT J2 PID(26860@...): 2018-01-16T19:27:40, stalled for 8591s at: StressHdfsTest (suite)
[junit4] HEARTBEAT J0 PID(26869@...): 2018-01-16T19:28:04, stalled for 7141s at: MoveReplicaHDFSFailoverTest (suite)
[junit4] HEARTBEAT J3 PID(26880@...): 2018-01-16T19:28:04, stalled for 8352s at: CheckHdfsIndexTest (suite)
[junit4] HEARTBEAT J1 PID(26866@...): 2018-01-16T19:28:40, stalled for 8048s at: HdfsNNFailoverTest (suite)
[junit4] HEARTBEAT J2 PID(26860@...): 2018-01-16T19:28:40, stalled for 8651s at: StressHdfsTest (suite)
[junit4] HEARTBEAT J0 PID(26869@...): 2018-01-16T19:29:04, stalled for 7201s at: MoveReplicaHDFSFailoverTest (suite)
[junit4] HEARTBEAT J3 PID(26880@...): 2018-01-16T19:29:04, stalled for 8412s at: CheckHdfsIndexTest (suite)
{noformat}
[jira] [Commented] (SOLR-10308) Solr fails to work with Guava 21.0
[ https://issues.apache.org/jira/browse/SOLR-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327071#comment-16327071 ] Ahmet Arslan commented on SOLR-10308: - {quote} relocate the version of guava used{quote} [~mdrob] Can you please provide some context or pointers? I am not familiar with the relocate concept.
[jira] [Commented] (SOLR-10308) Solr fails to work with Guava 21.0
[ https://issues.apache.org/jira/browse/SOLR-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327068#comment-16327068 ] Ahmet Arslan commented on SOLR-10308: - {{ant precommit}} passed. {{ant test}} is running now, but it appears to hang:
{noformat}
[junit4] HEARTBEAT J0 PID(2348@...): 2018-01-16T15:19:55, stalled for 28263s at: HdfsDirectoryTest (suite)
[junit4] HEARTBEAT J0 PID(2348@...): 2018-01-16T15:20:55, stalled for 28323s at: HdfsDirectoryTest (suite)
[junit4] HEARTBEAT J0 PID(2348@...): 2018-01-16T15:21:55, stalled for 28383s at: HdfsDirectoryTest (suite)
{noformat}
I am not sure whether this is due to the Guava update. I will try to figure it out.
[jira] [Commented] (SOLR-10308) Solr fails to work with Guava 21.0
[ https://issues.apache.org/jira/browse/SOLR-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16326657#comment-16326657 ] Ahmet Arslan commented on SOLR-10308: - {quote}Fixing the three guava usages in Solr code that are incompatible with version 21 should be pretty easy {quote} I have a patch in SOLR-11260 for this. I needed it for another reason: to be able to use a third-party NLP library for my [Turkish analysis plugin|https://github.com/iorixxx/lucene-solr-analysis-turkish]. A drop-in upgrade of Guava breaks highlighting. I patched Solr 6.6.0 and am using it in a production-like environment. Does this patch solve [~vmassol]'s problem? Does it break any existing functionality?
[jira] [Assigned] (SOLR-11260) Update Guava to 23.0
[ https://issues.apache.org/jira/browse/SOLR-11260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan reassigned SOLR-11260: --- Assignee: Ahmet Arslan > Update Guava to 23.0 > > > Key: SOLR-11260 > URL: https://issues.apache.org/jira/browse/SOLR-11260 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 6.6 >Reporter: Ahmet Arslan >Assignee: Ahmet Arslan >Priority: Minor > Fix For: master (8.0) > > Attachments: SOLR-11260.patch > > > Solr 6.6.0 depends on a pretty old version of guava. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-11260) Update Guava to 23.0
[ https://issues.apache.org/jira/browse/SOLR-11260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated SOLR-11260: Attachment: SOLR-11260.patch
Patch replaces two methods that are removed in Guava 23.0
* Objects.firstNonNull -> MoreObjects.firstNonNull
* HashFunction.hashString -> HashFunction.hashUnencodedChars
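For readers unfamiliar with the first replacement: the NoSuchMethodError reported in SOLR-10308 is what happens when code compiled against Objects.firstNonNull runs with a Guava version that no longer ships that method. The sketch below is not the patch itself (the patch switches to MoreObjects.firstNonNull); it is a hypothetical, Guava-free helper that only illustrates the contract being relied on: return the first argument if it is non-null, otherwise the second, and fail if both are null. The class and variable names are assumptions.

```java
import java.util.Objects;

// Hypothetical helper illustrating the firstNonNull contract with the JDK only.
final class FirstNonNullSketch {

    // Returns first when it is non-null, otherwise second.
    // Throws NullPointerException when both arguments are null.
    static <T> T firstNonNull(T first, T second) {
        return first != null
                ? first
                : Objects.requireNonNull(second, "both arguments are null");
    }

    public static void main(String[] args) {
        String configured = null;     // e.g. a request parameter that was not set
        String fallback = "default";  // hypothetical default value
        System.out.println(firstNonNull(configured, fallback));
    }
}
```

On Java 9 and later, java.util.Objects.requireNonNullElse offers the same behaviour without a helper, although that would not have been an option for a code base targeting Java 8.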
[jira] [Created] (SOLR-11260) Update Guava to 23.0
Ahmet Arslan created SOLR-11260: --- Summary: Update Guava to 23.0 Key: SOLR-11260 URL: https://issues.apache.org/jira/browse/SOLR-11260 Project: Solr Issue Type: Task Security Level: Public (Default Security Level. Issues are Public) Affects Versions: 6.6 Reporter: Ahmet Arslan Priority: Minor Fix For: master (8.0) Solr 6.6.0 depends on a pretty old version of guava. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-7585) Interface for common parameters used across analysis factories
[ https://issues.apache.org/jira/browse/LUCENE-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated LUCENE-7585: - Attachment: LUCENE-7585.patch Fix TestStopFilterFactory and TestSuggestStopFilterFactory failure > Interface for common parameters used across analysis factories > -- > > Key: LUCENE-7585 > URL: https://issues.apache.org/jira/browse/LUCENE-7585 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/analysis >Affects Versions: 6.3 >Reporter: Ahmet Arslan >Assignee: David Smiley >Priority: Minor > Fix For: 7.0 > > Attachments: LUCENE-7585.patch, LUCENE-7585.patch, LUCENE-7585.patch, > LUCENE-7585.patch, LUCENE-7585.patch, LUCENE-7585.patch, LUCENE-7585.patch, > LUCENE-7585.patch > > > Certain parameters (String constants) are same/common for multiple analysis > factories. Some examples are {{ignoreCase}}, {{dictionary}}, and > {{preserveOriginal}}. These string constants are handled inconsistently in > different factories. This is an effort to define most common constants in > ({{CommonAnalysisFactoryParams}}) interface and reuse them. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
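The idea described in LUCENE-7585 can be sketched roughly as follows. This is only an illustration under stated assumptions: the constant names (IGNORE_CASE and so on), the "Sketch" suffixes, and the example factory are hypothetical and are not taken from the attached patch; only the parameter strings ignoreCase, dictionary and preserveOriginal come from the issue description.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical shape of a shared-constants interface; the real one may differ.
interface CommonAnalysisFactoryParamsSketch {
    String IGNORE_CASE = "ignoreCase";
    String DICTIONARY = "dictionary";
    String PRESERVE_ORIGINAL = "preserveOriginal";
}

// Hypothetical factory that reads its arguments through the shared constants
// instead of repeating string literals in every factory class.
final class ExampleFilterFactorySketch implements CommonAnalysisFactoryParamsSketch {
    final boolean ignoreCase;
    final boolean preserveOriginal;
    final String dictionary;

    ExampleFilterFactorySketch(Map<String, String> args) {
        ignoreCase = Boolean.parseBoolean(args.getOrDefault(IGNORE_CASE, "false"));
        preserveOriginal = Boolean.parseBoolean(args.getOrDefault(PRESERVE_ORIGINAL, "false"));
        dictionary = args.get(DICTIONARY); // null when not configured
    }

    public static void main(String[] argv) {
        Map<String, String> args = new HashMap<>();
        args.put(IGNORE_CASE, "true");
        args.put(DICTIONARY, "stopwords.txt"); // hypothetical file name
        ExampleFilterFactorySketch f = new ExampleFilterFactorySketch(args);
        System.out.println(f.ignoreCase + " " + f.preserveOriginal + " " + f.dictionary);
    }
}
```

With a single definition per parameter name, a typo or a casing difference between two factories becomes a compile-time concern rather than a silent configuration mismatch.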
[jira] [Commented] (LUCENE-7773) Remove unused/deprecated token types from StandardTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-7773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16072994#comment-16072994 ] Ahmet Arslan commented on LUCENE-7773: -- Can someone please look into this issue? This issue addresses a [TODO|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/analysis/standard/StandardTokenizer.java#L43] introduced by [~rcmuir] in [this|https://github.com/apache/lucene-solr/commit/bc3a3dc5d47af0c00748468b1ae14b4a18854366] commit. > Remove unused/deprecated token types from StandardTokenizer > --- > > Key: LUCENE-7773 > URL: https://issues.apache.org/jira/browse/LUCENE-7773 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/analysis >Affects Versions: 6.5 >Reporter: Ahmet Arslan >Priority: Minor > Labels: analyzers > Fix For: 7.0 > > Attachments: LUCENE-7773.patch, LUCENE-7773.patch > > > StandardTokenizer does not recognize e-mail, company etc. This issue removes > those token types. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-7585) Interface for common parameters used across analysis factories
[ https://issues.apache.org/jira/browse/LUCENE-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated LUCENE-7585: - Attachment: LUCENE-7585.patch
Sorry for the huge delay. This patch addresses the issues raised by David.
* consumeAllTokens is used by LimitTokenOffset and LimitTokenPosition too.
* applies Yonik's concept
* improved javadoc. Some arguments are difficult since they have different meanings in different components.
* covers a few more overlooked analysis factories
* spotted a copy-paste mistake
Any feedback is appreciated.
[jira] [Updated] (LUCENE-7585) Interface for common parameters used across analysis factories
[ https://issues.apache.org/jira/browse/LUCENE-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated LUCENE-7585: - Attachment: LUCENE-7585.patch Finally {{ant precommit}} passes with this patch. It checks missing javadocs using *level=package* for icu, morfologik, phonetic, and suggest. > Interface for common parameters used across analysis factories > -- > > Key: LUCENE-7585 > URL: https://issues.apache.org/jira/browse/LUCENE-7585 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/analysis >Affects Versions: 6.3 >Reporter: Ahmet Arslan >Assignee: David Smiley >Priority: Minor > Fix For: master (7.0) > > Attachments: LUCENE-7585.patch, LUCENE-7585.patch, LUCENE-7585.patch, > LUCENE-7585.patch, LUCENE-7585.patch, LUCENE-7585.patch > > > Certain parameters (String constants) are same/common for multiple analysis > factories. Some examples are {{ignoreCase}}, {{dictionary}}, and > {{preserveOriginal}}. These string constants are handled inconsistently in > different factories. This is an effort to define most common constants in > ({{CommonAnalysisFactoryParams}}) interface and reuse them. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-7585) Interface for common parameters used across analysis factories
[ https://issues.apache.org/jira/browse/LUCENE-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated LUCENE-7585: - Attachment: LUCENE-7585.patch bring the patch to master. > Interface for common parameters used across analysis factories > -- > > Key: LUCENE-7585 > URL: https://issues.apache.org/jira/browse/LUCENE-7585 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/analysis >Affects Versions: 6.3 >Reporter: Ahmet Arslan >Assignee: David Smiley >Priority: Minor > Fix For: master (7.0) > > Attachments: LUCENE-7585.patch, LUCENE-7585.patch, LUCENE-7585.patch, > LUCENE-7585.patch, LUCENE-7585.patch > > > Certain parameters (String constants) are same/common for multiple analysis > factories. Some examples are {{ignoreCase}}, {{dictionary}}, and > {{preserveOriginal}}. These string constants are handled inconsistently in > different factories. This is an effort to define most common constants in > ({{CommonAnalysisFactoryParams}}) interface and reuse them. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-7773) Remove unused/deprecated token types from StandardTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-7773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated LUCENE-7773: - Attachment: LUCENE-7773.patch Make the {{TestAnalyzers}} compile again. > Remove unused/deprecated token types from StandardTokenizer > --- > > Key: LUCENE-7773 > URL: https://issues.apache.org/jira/browse/LUCENE-7773 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/analysis >Affects Versions: 6.5 >Reporter: Ahmet Arslan >Priority: Minor > Labels: analyzers > Fix For: master (7.0) > > Attachments: LUCENE-7773.patch, LUCENE-7773.patch > > > StandardTokenizer does not recognize e-mail, company etc. This issue removes > those token types. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-7773) Remove unused/deprecated token types from StandardTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-7773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated LUCENE-7773: - Attachment: LUCENE-7773.patch This patch removes old types. > Remove unused/deprecated token types from StandardTokenizer > --- > > Key: LUCENE-7773 > URL: https://issues.apache.org/jira/browse/LUCENE-7773 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/analysis >Affects Versions: 6.5 >Reporter: Ahmet Arslan >Priority: Minor > Labels: analyzers > Fix For: master (7.0) > > Attachments: LUCENE-7773.patch > > > StandardTokenizer does not recognize e-mail, company etc. This issue removes > those token types. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-7773) Remove unused/deprecated token types from StandardTokenizer
Ahmet Arslan created LUCENE-7773: Summary: Remove unused/deprecated token types from StandardTokenizer Key: LUCENE-7773 URL: https://issues.apache.org/jira/browse/LUCENE-7773 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Affects Versions: 6.5 Reporter: Ahmet Arslan Priority: Minor Fix For: master (7.0) StandardTokenizer does not recognize e-mail, company etc. This issue removes those token types. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7602) Fix compiler warnings for ant clean compile
[ https://issues.apache.org/jira/browse/LUCENE-7602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15780903#comment-15780903 ] Ahmet Arslan commented on LUCENE-7602: -- I think the current issue will clean up three previous issues. > Fix compiler warnings for ant clean compile > --- > > Key: LUCENE-7602 > URL: https://issues.apache.org/jira/browse/LUCENE-7602 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Paul Elschot >Priority: Minor > Labels: build > Fix For: trunk > > Attachments: LUCENE-7602-ContextMap-lucene.patch, > LUCENE-7602-ContextMap-solr.patch, LUCENE-7602.patch, LUCENE-7602.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7602) Fix compiler warnings for ant clean compile
[ https://issues.apache.org/jira/browse/LUCENE-7602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15777253#comment-15777253 ] Ahmet Arslan commented on LUCENE-7602: -- can't we just use Map
[jira] [Updated] (LUCENE-7585) Interface for common parameters used across analysis factories
[ https://issues.apache.org/jira/browse/LUCENE-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated LUCENE-7585: - Attachment: LUCENE-7585.patch Patch that adds javadocs. {{ant documentation-lint}} still fails for some reason that I cannot figure out. > Interface for common parameters used across analysis factories > -- > > Key: LUCENE-7585 > URL: https://issues.apache.org/jira/browse/LUCENE-7585 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/analysis >Affects Versions: 6.3 >Reporter: Ahmet Arslan >Assignee: David Smiley >Priority: Minor > Fix For: master (7.0) > > Attachments: LUCENE-7585.patch, LUCENE-7585.patch, LUCENE-7585.patch, > LUCENE-7585.patch > > > Certain parameters (String constants) are same/common for multiple analysis > factories. Some examples are {{ignoreCase}}, {{dictionary}}, and > {{preserveOriginal}}. These string constants are handled inconsistently in > different factories. This is an effort to define most common constants in > ({{CommonAnalysisFactoryParams}}) interface and reuse them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-7599) replace TestRandomChains.Predicate with java.util.function.Predicate
[ https://issues.apache.org/jira/browse/LUCENE-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated LUCENE-7599: - Attachment: LUCENE-7599.patch Patch that replaces ArgProducer with Function> replace TestRandomChains.Predicate with java.util.function.Predicate > > > Key: LUCENE-7599 > URL: https://issues.apache.org/jira/browse/LUCENE-7599 > Project: Lucene - Core > Issue Type: Improvement > Components: general/test >Affects Versions: 6.3 >Reporter: Ahmet Arslan >Priority: Trivial > Labels: test > Fix For: master (7.0) > > Attachments: LUCENE-7599.patch, LUCENE-7599.patch > > > {{TestRandomChains}} has its own Predicate interface which can be replaced > with {{java.util.function.Predicate}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-7599) replace TestRandomChains.Predicate with java.util.function.Predicate
[ https://issues.apache.org/jira/browse/LUCENE-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated LUCENE-7599: - Attachment: LUCENE-7599.patch Patch that removes {{TestRandomChains.Predicate}} in favour of {{java.util.function.Predicate}}. It simplifies the code with lambda expressions and method references. > replace TestRandomChains.Predicate with java.util.function.Predicate > > > Key: LUCENE-7599 > URL: https://issues.apache.org/jira/browse/LUCENE-7599 > Project: Lucene - Core > Issue Type: Improvement > Components: general/test >Affects Versions: 6.3 >Reporter: Ahmet Arslan >Priority: Trivial > Labels: test > Fix For: master (7.0) > > Attachments: LUCENE-7599.patch > > > {{TestRandomChains}} has its own Predicate interface which can be replaced > with {{java.util.function.Predicate}}
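To illustrate the kind of change the LUCENE-7599 patch describes, here is a minimal sketch of swapping a hand-rolled predicate interface for {{java.util.function.Predicate}}. The class, the {{LegacyPredicate}} interface, and the helper names are hypothetical and are not taken from {{TestRandomChains}}; only {{java.util.function.Predicate}} itself is the real JDK type.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch; none of these names come from the actual patch.
final class PredicateSketch {

    // Before: a custom single-method interface, similar in spirit to a test-local Predicate.
    interface LegacyPredicate<T> {
        boolean apply(T value);
    }

    // After: the same kind of check expressed with the JDK functional interface.
    static final Predicate<String> IS_EMPTY = String::isEmpty;   // method reference
    static final Predicate<String> IS_LONG = s -> s.length() > 5; // lambda expression

    // Bridges code that still produces the old interface during a migration.
    static <T> Predicate<T> adapt(LegacyPredicate<T> legacy) {
        return legacy::apply;
    }

    // Keeps only the values accepted by the predicate.
    static List<String> keep(List<String> values, Predicate<String> accept) {
        return values.stream().filter(accept).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("", "Keyword", "Stop", "ASCIIFolding");
        System.out.println(keep(names, IS_LONG));
        System.out.println(keep(names, IS_EMPTY.negate()));
    }
}
```

Besides removing a type, the JDK interface brings {{negate()}}, {{and()}}, and {{or()}} for free, which a custom interface would have to reimplement.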
[jira] [Created] (LUCENE-7599) replace TestRandomChains.Predicate with java.util.function.Predicate
Ahmet Arslan created LUCENE-7599: Summary: replace TestRandomChains.Predicate with java.util.function.Predicate Key: LUCENE-7599 URL: https://issues.apache.org/jira/browse/LUCENE-7599 Project: Lucene - Core Issue Type: Improvement Components: general/test Affects Versions: 6.3 Reporter: Ahmet Arslan Priority: Trivial Fix For: master (7.0) {{TestRandomChains}} has its own Predicate interface which can be replaced with {{java.util.function.Predicate}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7585) Interface for common parameters used across analysis factories
[ https://issues.apache.org/jira/browse/LUCENE-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15758938#comment-15758938 ] Ahmet Arslan commented on LUCENE-7585: -- I tried adding javadocs to the fields in the interface, but it did not solve the missing-javadocs problem. {{documentation-lint}} complains/fails for the lucene/analysis modules, which are explicitly configured with level="method" in [lucene/build.xml|https://github.com/apache/lucene-solr/blob/master/lucene/build.xml] {code:xml} {code} I figured out that this level=(method|class|none) setting is handled by [checkJavaDocs.py|https://github.com/apache/lucene-solr/blob/master/dev-tools/scripts/checkJavaDocs.py]. Any pointers on how to document interface fields so that level="method" passes in checkJavaDocs.py? Or can we remove the above XML fragment from build.xml? > Interface for common parameters used across analysis factories > -- > > Key: LUCENE-7585 > URL: https://issues.apache.org/jira/browse/LUCENE-7585 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/analysis >Affects Versions: 6.3 >Reporter: Ahmet Arslan >Assignee: David Smiley >Priority: Minor > Fix For: master (7.0) > > Attachments: LUCENE-7585.patch, LUCENE-7585.patch, LUCENE-7585.patch > > > Certain parameters (String constants) are same/common for multiple analysis > factories. Some examples are {{ignoreCase}}, {{dictionary}}, and > {{preserveOriginal}}. These string constants are handled inconsistently in > different factories. This is an effort to define most common constants in > ({{CommonAnalysisFactoryParams}}) interface and reuse them.
[jira] [Updated] (LUCENE-7585) Interface for common parameters used across analysis factories
[ https://issues.apache.org/jira/browse/LUCENE-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated LUCENE-7585: - Attachment: LUCENE-7585.patch A few more refactorings, including the overlooked code point filter factory. > Interface for common parameters used across analysis factories > -- > > Key: LUCENE-7585 > URL: https://issues.apache.org/jira/browse/LUCENE-7585 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/analysis >Affects Versions: 6.3 >Reporter: Ahmet Arslan >Assignee: David Smiley >Priority: Minor > Fix For: master (7.0) > > Attachments: LUCENE-7585.patch, LUCENE-7585.patch, LUCENE-7585.patch > > > Certain parameters (String constants) are same/common for multiple analysis > factories. Some examples are {{ignoreCase}}, {{dictionary}}, and > {{preserveOriginal}}. These string constants are handled inconsistently in > different factories. This is an effort to define most common constants in > ({{CommonAnalysisFactoryParams}}) interface and reuse them.
[jira] [Commented] (LUCENE-7585) Interface for common parameters used across analysis factories
[ https://issues.apache.org/jira/browse/LUCENE-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749577#comment-15749577 ] Ahmet Arslan commented on LUCENE-7585: -- By inconsistency, I mean the strategy used to retrieve those parameters from the args map. Some factories use an inline string constant, e.g. getBoolean(args, "reverse"); others define a private or public static final String for the key. > Interface for common parameters used across analysis factories > -- > > Key: LUCENE-7585 > URL: https://issues.apache.org/jira/browse/LUCENE-7585 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/analysis >Affects Versions: 6.3 >Reporter: Ahmet Arslan >Assignee: David Smiley >Priority: Minor > Fix For: master (7.0) > > Attachments: LUCENE-7585.patch, LUCENE-7585.patch > > > Certain parameters (String constants) are same/common for multiple analysis > factories. Some examples are {{ignoreCase}}, {{dictionary}}, and > {{preserveOriginal}}. These string constants are handled inconsistently in > different factories. This is an effort to define most common constants in > ({{CommonAnalysisFactoryParams}}) interface and reuse them.
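The two retrieval styles contrasted in the comment above, and the shared-constants approach the issue proposes, can be sketched roughly as follows. This is only an illustration under assumptions: {{CommonParamsSketch}}, {{FactorySketch}}, and the local {{getBoolean}} helper are hypothetical stand-ins written so the example runs without Lucene, and they are not the contents of the attached patch. Only the parameter names {{ignoreCase}} and {{preserveOriginal}} come from the issue description.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical miniature of a shared-constants interface (not the patch itself).
interface CommonParamsSketch {
    /** Key for the case-insensitivity switch. */
    String IGNORE_CASE = "ignoreCase";
    /** Key for keeping the original token alongside the modified one. */
    String PRESERVE_ORIGINAL = "preserveOriginal";
}

// Hypothetical factory that reads its arguments through the shared keys.
final class FactorySketch implements CommonParamsSketch {
    final boolean ignoreCase;
    final boolean preserveOriginal;

    FactorySketch(Map<String, String> args) {
        // Shared constants instead of inline literals such as getBoolean(args, "ignoreCase", false).
        ignoreCase = getBoolean(args, IGNORE_CASE, false);
        preserveOriginal = getBoolean(args, PRESERVE_ORIGINAL, false);
        if (!args.isEmpty()) {
            throw new IllegalArgumentException("Unknown parameters: " + args);
        }
    }

    // Simplified stand-in for a factory helper: consumes the key and parses its value.
    static boolean getBoolean(Map<String, String> args, String name, boolean defaultVal) {
        String value = args.remove(name);
        return value == null ? defaultVal : Boolean.parseBoolean(value);
    }

    public static void main(String[] argv) {
        Map<String, String> args = new HashMap<>();
        args.put("ignoreCase", "true");
        FactorySketch factory = new FactorySketch(args);
        System.out.println(factory.ignoreCase + " " + factory.preserveOriginal);
    }
}
```

With a single definition per key, a misspelled parameter name becomes a compile error in the factory rather than a silently ignored argument.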
[jira] [Commented] (LUCENE-7585) Interface for common parameters used across analysis factories
[ https://issues.apache.org/jira/browse/LUCENE-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15737593#comment-15737593 ] Ahmet Arslan commented on LUCENE-7585: -- Here is an excerpt from {{documentation-lint}} {code} [exec] build/docs/analyzers-icu/org/apache/lucene/analysis/icu/segmentation/ICUTokenizerFactory.html [exec] missing Fields: CONSUME_ALL_TOKENS [exec] missing Fields: DELIMITER [exec] missing Fields: DICTIONARY [exec] missing Fields: ENCODER [exec] missing Fields: FORMAT [exec] missing Fields: IGNORE_CASE [exec] missing Fields: LUCENE_MATCH_VERSION [exec] missing Fields: MAX [exec] missing Fields: MAX_TOKEN_LENGTH [exec] missing Fields: MIN [exec] missing Fields: PATTERN [exec] missing Fields: PRESERVE_ORIGINAL [exec] missing Fields: PROTECTED [exec] missing Fields: TYPES [exec] missing Fields: WORDS [exec] [exec] Missing javadocs were found! {code} > Interface for common parameters used across analysis factories > -- > > Key: LUCENE-7585 > URL: https://issues.apache.org/jira/browse/LUCENE-7585 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/analysis >Affects Versions: 6.3 >Reporter: Ahmet Arslan >Assignee: David Smiley >Priority: Minor > Fix For: master (7.0) > > Attachments: LUCENE-7585.patch, LUCENE-7585.patch > > > Certain parameters (String constants) are same/common for multiple analysis > factories. Some examples are {{ignoreCase}}, {{dictionary}}, and > {{preserveOriginal}}. These string constants are handled inconsistently in > different factories. This is an effort to define most common constants in > ({{CommonAnalysisFactoryParams}}) interface and reuse them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7585) Interface for common parameters used across analysis factories
[ https://issues.apache.org/jira/browse/LUCENE-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15736501#comment-15736501 ] Ahmet Arslan commented on LUCENE-7585: -- Thank you for looking into this. Initially, I was planning to move all existing parameters to a common interface. I then realized that the interface would grow very large, since certain factories have many factory-specific parameters, so I moved only the most common parameters to the interface. However, many parameters still remain in the codebase. For example, the ngram package has "minGramSize" and "maxGramSize" in common, and the phonetic module has "maxCodeLength" and "inject". What would be the preferred course of action here? * Handle packages and modules locally? If yes, how? * Move all parameters to the interface unconditionally. * Devise a rule: move a parameter if it is shared by at least two packages or modules. * ? > Interface for common parameters used across analysis factories > -- > > Key: LUCENE-7585 > URL: https://issues.apache.org/jira/browse/LUCENE-7585 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/analysis >Affects Versions: 6.3 >Reporter: Ahmet Arslan >Assignee: David Smiley >Priority: Minor > Fix For: master (7.0) > > Attachments: LUCENE-7585.patch, LUCENE-7585.patch > > > Certain parameters (String constants) are same/common for multiple analysis > factories. Some examples are {{ignoreCase}}, {{dictionary}}, and > {{preserveOriginal}}. These string constants are handled inconsistently in > different factories. This is an effort to define most common constants in > ({{CommonAnalysisFactoryParams}}) interface and reuse them.
[jira] [Updated] (LUCENE-7585) Interface for common parameters used across analysis factories
[ https://issues.apache.org/jira/browse/LUCENE-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated LUCENE-7585: - Attachment: LUCENE-7585.patch Properly created patch that includes the proposed changes (alphabetisation and lucene_match_version). {{ant documentation-lint}} complains about the ICU factories. Any pointers on how to fix it? > Interface for common parameters used across analysis factories > -- > > Key: LUCENE-7585 > URL: https://issues.apache.org/jira/browse/LUCENE-7585 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/analysis >Affects Versions: 6.3 >Reporter: Ahmet Arslan >Assignee: David Smiley >Priority: Minor > Fix For: master (7.0) > > Attachments: LUCENE-7585.patch, LUCENE-7585.patch > > > Certain parameters (String constants) are same/common for multiple analysis > factories. Some examples are {{ignoreCase}}, {{dictionary}}, and > {{preserveOriginal}}. These string constants are handled inconsistently in > different factories. This is an effort to define most common constants in > ({{CommonAnalysisFactoryParams}}) interface and reuse them.
[jira] [Updated] (LUCENE-7585) Interface for common parameters used across analysis factories
[ https://issues.apache.org/jira/browse/LUCENE-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated LUCENE-7585: - Attachment: LUCENE-7585.patch > Interface for common parameters used across analysis factories > -- > > Key: LUCENE-7585 > URL: https://issues.apache.org/jira/browse/LUCENE-7585 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/analysis >Affects Versions: 6.3 >Reporter: Ahmet Arslan >Priority: Minor > Fix For: master (7.0) > > Attachments: LUCENE-7585.patch > > > Certain parameters (String constants) are same/common for multiple analysis > factories. Some examples are {{ignoreCase}}, {{dictionary}}, and > {{preserveOriginal}}. These string constants are handled inconsistently in > different factories. This is an effort to define most common constants in > ({{CommonAnalysisFactoryParams}}) interface and reuse them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-7585) Interface for common parameters used across analysis factories
Ahmet Arslan created LUCENE-7585: Summary: Interface for common parameters used across analysis factories Key: LUCENE-7585 URL: https://issues.apache.org/jira/browse/LUCENE-7585 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Affects Versions: 6.3 Reporter: Ahmet Arslan Priority: Minor Fix For: master (7.0) Certain parameters (String constants) are same/common for multiple analysis factories. Some examples are {{ignoreCase}}, {{dictionary}}, and {{preserveOriginal}}. These string constants are handled inconsistently in different factories. This is an effort to define most common constants in ({{CommonAnalysisFactoryParams}}) interface and reuse them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7525) ASCIIFoldingFilter.foldToASCII performance issue due to large compiled method size
[ https://issues.apache.org/jira/browse/LUCENE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616922#comment-15616922 ] Ahmet Arslan commented on LUCENE-7525: -- Can workings of ICUFoldingFilter give any insight here? > ASCIIFoldingFilter.foldToASCII performance issue due to large compiled method > size > -- > > Key: LUCENE-7525 > URL: https://issues.apache.org/jira/browse/LUCENE-7525 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/analysis >Affects Versions: 6.2.1 >Reporter: Karl von Randow > Attachments: ASCIIFolding.java, ASCIIFoldingFilter.java, > TestASCIIFolding.java > > > The {{ASCIIFoldingFilter.foldToASCII}} method has an enormous switch > statement and is too large for the HotSpot compiler to compile; causing a > performance problem. > The method is about 13K compiled, versus the 8KB HotSpot limit. So splitting > the method in half works around the problem. > In my tests splitting the method in half resulted in a 5X performance > increase. > In the test code below you can see how slow the fold method is, even when it > is using the shortcut when the character is less than 0x80, compared to an > inline implementation of the same shortcut. > So a workaround is to split the method. I'm happy to provide a patch. It's a > hack, of course. Perhaps using the {{MappingCharFilterFactory}} with an input > file as per SOLR-2013 would be a better replacement for this method in this > class? 
> {code:java}
> public class ASCIIFoldingFilterPerformanceTest {
>     private static final int ITERATIONS = 1_000_000;
>
>     @Test
>     public void testFoldShortString() {
>         char[] input = "testing".toCharArray();
>         char[] output = new char[input.length * 4];
>         for (int i = 0; i < ITERATIONS; i++) {
>             ASCIIFoldingFilter.foldToASCII(input, 0, output, 0, input.length);
>         }
>     }
>
>     @Test
>     public void testFoldShortAccentedString() {
>         char[] input = "éúéúøßüäéúéúøßüä".toCharArray();
>         char[] output = new char[input.length * 4];
>         for (int i = 0; i < ITERATIONS; i++) {
>             ASCIIFoldingFilter.foldToASCII(input, 0, output, 0, input.length);
>         }
>     }
>
>     @Test
>     public void testManualFoldTinyString() {
>         char[] input = "t".toCharArray();
>         char[] output = new char[input.length * 4];
>         for (int i = 0; i < ITERATIONS; i++) {
>             int k = 0;
>             for (int j = 0; j < 1; ++j) {
>                 final char c = input[j];
>                 if (c < '\u0080') {
>                     output[k++] = c;
>                 } else {
>                     Assert.assertTrue(false);
>                 }
>             }
>         }
>     }
> }
> {code}
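The workaround proposed in LUCENE-7525 above (splitting one oversized switch so that each method stays below the HotSpot size limit the reporter mentions) can be sketched as follows. This is a deliberately tiny, hypothetical illustration: the class name, the split point at U+0100, and the handful of mappings are assumptions, and unlike the real {{foldToASCII}} it maps one char to one char only.

```java
// Hypothetical sketch of splitting one huge switch into a small dispatcher plus two halves.
final class SplitSwitchSketch {

    // The dispatcher stays tiny: ASCII fast path first, then a range check picks a half.
    static char fold(char c) {
        if (c < '\u0080') {
            return c;
        }
        return c < '\u0100' ? foldLatin1(c) : foldBeyondLatin1(c);
    }

    // First half of the original switch (a few Latin-1 letters as examples).
    private static char foldLatin1(char c) {
        switch (c) {
            case '\u00E4': return 'a'; // a with diaeresis
            case '\u00E9': return 'e'; // e with acute
            case '\u00FA': return 'u'; // u with acute
            case '\u00FC': return 'u'; // u with diaeresis
            default: return c;
        }
    }

    // Second half of the original switch (a few Latin Extended-A letters as examples).
    private static char foldBeyondLatin1(char c) {
        switch (c) {
            case '\u0101': return 'a'; // a with macron
            case '\u0113': return 'e'; // e with macron
            default: return c;
        }
    }

    public static void main(String[] args) {
        char[] input = {'t', '\u00E9', '\u0101'};
        StringBuilder out = new StringBuilder();
        for (char c : input) {
            out.append(fold(c));
        }
        System.out.println(out);
    }
}
```

Each half can then be compiled independently by the JIT, which is the effect the reporter attributes the 5X speedup to; whether a range-based split is the best cut for the real method is an open question in the issue.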
[jira] [Commented] (LUCENE-7377) Remove ClassicSimilarity?
[ https://issues.apache.org/jira/browse/LUCENE-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374882#comment-15374882 ] Ahmet Arslan commented on LUCENE-7377: -- I think an implementation of TFIDF should stay in Lucene, but it should extend SimilarityBase and have a simple, single-line implementation of the org.apache.lucene.search.similarities.SimilarityBase#score method, e.g., {code} return tf * log2(((double) stats.getNumberOfDocuments() / (double) stats.getDocFreq()) + 1); {code} The current TFIDFSimilarity and ClassicSimilarity are hard to understand. > Remove ClassicSimilarity? > - > > Key: LUCENE-7377 > URL: https://issues.apache.org/jira/browse/LUCENE-7377 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > > ClassicSimilarity was relying on coordination factors in order to produce > good scores. Now that coords are gone, it is quite a bad option compared to > eg. BM25Similarity. > Maybe we should remove ClassicSimilarity entirely in master and deprecated in > 6.x in order to encourage users to move to BM25Similarity rather than stay on > a Similarity impl of lesser quality?
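The one-line formula from the LUCENE-7377 comment above can be tried out in isolation. The sketch below is an assumption-laden stand-in: it replaces {{stats.getNumberOfDocuments()}} and {{stats.getDocFreq()}} with plain method parameters and defines its own {{log2}} so that it runs without Lucene on the classpath. Inside Lucene, the same expression would form the body of a {{SimilarityBase#score}} override.

```java
// Standalone sketch of tf * log2(N / df + 1); class and parameter names are hypothetical.
final class SimpleTfIdfSketch {

    // Base-2 logarithm, defined locally to keep the example dependency-free.
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    // tf: term frequency in the document; numberOfDocuments: collection size; docFreq: documents containing the term.
    static double score(double tf, long numberOfDocuments, long docFreq) {
        return tf * log2(((double) numberOfDocuments / (double) docFreq) + 1);
    }

    public static void main(String[] args) {
        // A term occurring twice in a document and present in 1 of 3 documents.
        System.out.println(score(2.0, 3, 1));
        // A rarer term scores higher than a common one at equal tf.
        System.out.println(score(1.0, 1000, 10) > score(1.0, 1000, 500));
    }
}
```

The {{+ 1}} inside the logarithm keeps the score positive even for a term that occurs in every document, where N / df equals 1.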
[jira] [Commented] (SOLR-9250) Search breaks with EU symbol € and wildcard *
[ https://issues.apache.org/jira/browse/SOLR-9250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15349836#comment-15349836 ] Ahmet Arslan commented on SOLR-9250: Yes, this is a known issue with wildcard queries. > Search breaks with EU symbol € and wildcard * > - > > Key: SOLR-9250 > URL: https://issues.apache.org/jira/browse/SOLR-9250 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Server >Affects Versions: 5.3.1 >Reporter: Tim Nolan > Attachments: contact-name-analyze.png, contact-name-field-type.png > > > While testing UTF-8 character searches, which worked, we have noticed a > combination that fails. Testing with the data {{Tùûüÿ€àâæçéèêëïîôœm}}, we > found the search worked, but by adding a wild-card (e.g. > {{Tùûüÿ€àâæçéèêëïîôœm*}}), the search fails. Adding the wildcard before the > {{€}} symbol worked (i.e. {{Tùûüÿ*}}). > Showing the logs for these queries: > {noformat:title=Full text without wildcard, hit=1} > 2016-06-25 13:16:34.361 [qtp237852351-21] INFO > org.apache.solr.core.SolrCore.Request – [core-name] webapp=/solr > path=/select > params={q=Tùûüÿ€àâæçéèêëïîôœm=true=type:CONTACT=12=json&_=1466860594348} > hits=1 status=0 QTime=0 > {noformat} > {noformat:title=Full text with wildcard, hit=0} > 2016-06-25 13:16:41.172 [qtp237852351-16] INFO > org.apache.solr.core.SolrCore.Request – [core-name] webapp=/solr > path=/select > params={q=Tùûüÿ€àâæçéèêëïîôœm*=true=type:CONTACT=12=json&_=1466860601160} > hits=0 status=0 QTime=0 > {noformat} > {noformat:title=Partial text before € with wildcard, hit=1} > 2016-06-25 13:16:52.135 [qtp237852351-18] INFO > org.apache.solr.core.SolrCore.Request – [core-name] webapp=/solr > path=/select > params={q=Tùûüÿ*=true=type:CONTACT=12=json&_=1466860612125} > hits=1 status=0 QTime=2 > {noformat}
[jira] [Commented] (SOLR-9250) Search breaks with EU symbol € and wildcard *
[ https://issues.apache.org/jira/browse/SOLR-9250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15349767#comment-15349767 ] Ahmet Arslan commented on SOLR-9250: Yes, this one, but you need to make the analyzer chains visible. It is the tag in schema. Anyway, the problem looks like your tokenizer splits your sample input at the EU char (€). Please use the analysis admin page to see how your example text is tokenized/indexed. Have you read https://wiki.apache.org/solr/MultitermQueryAnalysis ? > Search breaks with EU symbol € and wildcard * > - > > Key: SOLR-9250 > URL: https://issues.apache.org/jira/browse/SOLR-9250 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Server >Affects Versions: 5.3.1 >Reporter: Tim Nolan > Attachments: contact-name-analyze.png, contact-name-field-type.png > > > While testing UTF-8 character searches, which worked, we have noticed a > combination that fails. Testing with the data {{Tùûüÿ€àâæçéèêëïîôœm}}, we > found the search worked, but by adding a wild-card (e.g. > {{Tùûüÿ€àâæçéèêëïîôœm*}}), the search fails. Adding the wildcard before the > {{€}} symbol worked (i.e. {{Tùûüÿ*}}).
> Showing the logs for these queries: > {noformat:title=Full text without wildcard, hit=1} > 2016-06-25 13:16:34.361 [qtp237852351-21] INFO > org.apache.solr.core.SolrCore.Request – [core-name] webapp=/solr > path=/select > params={q=Tùûüÿ€àâæçéèêëïîôœm=true=type:CONTACT=12=json&_=1466860594348} > hits=1 status=0 QTime=0 > {noformat} > {noformat:title=Full text with wildcard, hit=0} > 2016-06-25 13:16:41.172 [qtp237852351-16] INFO > org.apache.solr.core.SolrCore.Request – [core-name] webapp=/solr > path=/select > params={q=Tùûüÿ€àâæçéèêëïîôœm*=true=type:CONTACT=12=json&_=1466860601160} > hits=0 status=0 QTime=0 > {noformat} > {noformat:title=Partial text before € with wildcard, hit=1} > 2016-06-25 13:16:52.135 [qtp237852351-18] INFO > org.apache.solr.core.SolrCore.Request – [core-name] webapp=/solr > path=/select > params={q=Tùûüÿ*=true=type:CONTACT=12=json&_=1466860612125} > hits=1 status=0 QTime=2 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
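The explanation given in the SOLR-9250 comment above (the tokenizer splits the text at the € character, while the wildcard query text is not run through the tokenizer) can be reproduced with a toy model. Everything here is an assumption made for illustration: the letter-only tokenizer is a stand-in for the reporter's real analyzer chain, which is not shown in the thread, the sample string is a shortened form of the reported one, and prefix matching is reduced to {{String.startsWith}} over the indexed terms.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of "index-time tokenization vs. unanalyzed wildcard prefix"; not Solr code.
final class WildcardSketch {

    // Hypothetical tokenizer: emits runs of letters and splits on everything else (including the euro sign).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // The prefix is compared with indexed terms verbatim; the query text is not tokenized.
    static boolean prefixMatches(List<String> indexedTerms, String rawPrefix) {
        return indexedTerms.stream().anyMatch(term -> term.startsWith(rawPrefix));
    }

    public static void main(String[] args) {
        // Shortened sample: "T" + accented letters + euro sign (U+20AC) + accented letters + "m".
        String sample = "T\u00f9\u00fb\u00fc\u00ff\u20ac\u00e0\u00e2m";
        List<String> indexed = tokenize(sample);
        System.out.println(indexed.size());                                          // two terms, split at the euro sign
        System.out.println(prefixMatches(indexed, sample));                          // whole string as prefix
        System.out.println(prefixMatches(indexed, "T\u00f9\u00fb\u00fc\u00ff"));     // prefix before the euro sign
        System.out.println(indexed.containsAll(tokenize(sample)));                   // analyzed (non-wildcard) query
    }
}
```

Under these assumptions the model shows the same pattern as the three logged queries: the analyzed full-text query and the short prefix find the document, while the prefix spanning the € does not, because no single indexed term contains that character.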
[jira] [Commented] (SOLR-9250) Search breaks with EU symbol € and wildcard *
[ https://issues.apache.org/jira/browse/SOLR-9250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15349738#comment-15349738 ] Ahmet Arslan commented on SOLR-9250: Please, we need to see the field *type* definition, where the analyzer elements are chained. > Search breaks with EU symbol € and wildcard * > - > > Key: SOLR-9250 > URL: https://issues.apache.org/jira/browse/SOLR-9250 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Server >Affects Versions: 5.3.1 >Reporter: Tim Nolan > > While testing UTF-8 character searches, which worked, we have noticed a > combination that fails. Testing with the data {{Tùûüÿ€àâæçéèêëïîôœm}}, we > found the search worked, but by adding a wild-card (e.g. > {{Tùûüÿ€àâæçéèêëïîôœm*}}), the search fails. Adding the wildcard before the > {{€}} symbol worked (i.e. {{Tùûüÿ*}}). > Showing the logs for these queries: > {noformat:title=Full text without wildcard, hit=1} > 2016-06-25 13:16:34.361 [qtp237852351-21] INFO > org.apache.solr.core.SolrCore.Request – [core-name] webapp=/solr > path=/select > params={q=Tùûüÿ€àâæçéèêëïîôœm=true=type:CONTACT=12=json&_=1466860594348} > hits=1 status=0 QTime=0 > {noformat} > {noformat:title=Full text with wildcard, hit=0} > 2016-06-25 13:16:41.172 [qtp237852351-16] INFO > org.apache.solr.core.SolrCore.Request – [core-name] webapp=/solr > path=/select > params={q=Tùûüÿ€àâæçéèêëïîôœm*=true=type:CONTACT=12=json&_=1466860601160} > hits=0 status=0 QTime=0 > {noformat} > {noformat:title=Partial text before € with wildcard, hit=1} > 2016-06-25 13:16:52.135 [qtp237852351-18] INFO > org.apache.solr.core.SolrCore.Request – [core-name] webapp=/solr > path=/select > params={q=Tùûüÿ*=true=type:CONTACT=12=json&_=1466860612125} > hits=1 status=0 QTime=2 > {noformat}
[jira] [Commented] (SOLR-9250) Search breaks with EU symbol € and wildcard *
[ https://issues.apache.org/jira/browse/SOLR-9250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15349677#comment-15349677 ] Ahmet Arslan commented on SOLR-9250: bq. I'm not sure what you mean by that statement Please see https://wiki.apache.org/solr/MultitermQueryAnalysis > Search breaks with EU symbol € and wildcard * > - > > Key: SOLR-9250 > URL: https://issues.apache.org/jira/browse/SOLR-9250 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Server >Affects Versions: 5.3.1 >Reporter: Tim Nolan > > While testing UTF-8 character searches, which worked, we have noticed a > combination that fails. Testing with the data {{Tùûüÿ€àâæçéèêëïîôœm}}, we > found the search worked, but by adding a wild-card (e.g. > {{Tùûüÿ€àâæçéèêëïîôœm*}}), the search fails. Adding the wildcard before the > {{€}} symbol worked (i.e. {{Tùûüÿ*}}). > Showing the logs for these queries: > {noformat:title=Full text without wildcard, hit=1} > 2016-06-25 13:16:34.361 [qtp237852351-21] INFO > org.apache.solr.core.SolrCore.Request – [core-name] webapp=/solr > path=/select > params={q=Tùûüÿ€àâæçéèêëïîôœm=true=type:CONTACT=12=json&_=1466860594348} > hits=1 status=0 QTime=0 > {noformat} > {noformat:title=Full text with wildcard, hit=0} > 2016-06-25 13:16:41.172 [qtp237852351-16] INFO > org.apache.solr.core.SolrCore.Request – [core-name] webapp=/solr > path=/select > params={q=Tùûüÿ€àâæçéèêëïîôœm*=true=type:CONTACT=12=json&_=1466860601160} > hits=0 status=0 QTime=0 > {noformat} > {noformat:title=Partial text before € with wildcard, hit=1} > 2016-06-25 13:16:52.135 [qtp237852351-18] INFO > org.apache.solr.core.SolrCore.Request – [core-name] webapp=/solr > path=/select > params={q=Tùûüÿ*=true=type:CONTACT=12=json&_=1466860612125} > hits=1 status=0 QTime=2 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: 
dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9250) Search breaks with EU symbol € and wildcard *
[ https://issues.apache.org/jira/browse/SOLR-9250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15349674#comment-15349674 ] Ahmet Arslan commented on SOLR-9250: We need to see the field type definition for that field. The index-time analyzer may be breaking words at the EU symbol or something similar. > Search breaks with EU symbol € and wildcard * > - > > Key: SOLR-9250 > URL: https://issues.apache.org/jira/browse/SOLR-9250 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Server >Affects Versions: 5.3.1 >Reporter: Tim Nolan > > While testing UTF-8 character searches, which worked, we have noticed a > combination that fails. Testing with the data {{Tùûüÿ€àâæçéèêëïîôœm}}, we > found the search worked, but by adding a wild-card (e.g. > {{Tùûüÿ€àâæçéèêëïîôœm*}}), the search fails. Adding the wildcard before the > {{€}} symbol worked (i.e. {{Tùûüÿ*}}). > Showing the logs for these queries: > {noformat:title=Full text without wildcard, hit=1} > 2016-06-25 13:16:34.361 [qtp237852351-21] INFO > org.apache.solr.core.SolrCore.Request – [core-name] webapp=/solr > path=/select > params={q=Tùûüÿ€àâæçéèêëïîôœm=true=type:CONTACT=12=json&_=1466860594348} > hits=1 status=0 QTime=0 > {noformat} > {noformat:title=Full text with wildcard, hit=0} > 2016-06-25 13:16:41.172 [qtp237852351-16] INFO > org.apache.solr.core.SolrCore.Request – [core-name] webapp=/solr > path=/select > params={q=Tùûüÿ€àâæçéèêëïîôœm*=true=type:CONTACT=12=json&_=1466860601160} > hits=0 status=0 QTime=0 > {noformat} > {noformat:title=Partial text before € with wildcard, hit=1} > 2016-06-25 13:16:52.135 [qtp237852351-18] INFO > org.apache.solr.core.SolrCore.Request – [core-name] webapp=/solr > path=/select > params={q=Tùûüÿ*=true=type:CONTACT=12=json&_=1466860612125} > hits=1 status=0 QTime=2 > {noformat}
[jira] [Commented] (SOLR-9250) Search breaks with EU symbol € and wildcard *
[ https://issues.apache.org/jira/browse/SOLR-9250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15349652#comment-15349652 ] Ahmet Arslan commented on SOLR-9250: What do you mean by saying the search fails? Does it throw an exception? Does it not return the expected results? Wildcard queries are not analyzed, by the way. Please ask questions of this type on the user mailing list. > Search breaks with EU symbol € and wildcard * > - > > Key: SOLR-9250 > URL: https://issues.apache.org/jira/browse/SOLR-9250 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Server >Affects Versions: 5.3.1 >Reporter: Tim Nolan > > While testing UTF-8 character searches, which worked, we have noticed a > combination that fails. Testing with the data {{Tùûüÿ€àâæçéèêëïîôœm}}, we > found the search worked, but by adding a wild-card (e.g. > {{Tùûüÿ€àâæçéèêëïîôœm*}}), the search fails. Adding the wildcard before the > {{€}} symbol worked (i.e. {{Tùûüÿ*}}).
> Showing the logs for these queries: > {noformat:title=Full text without wildcard, hit=1} > 2016-06-25 13:16:34.361 [qtp237852351-21] INFO > org.apache.solr.core.SolrCore.Request – [core-name] webapp=/solr > path=/select > params={q=Tùûüÿ€àâæçéèêëïîôœm=true=type:CONTACT=12=json&_=1466860594348} > hits=1 status=0 QTime=0 > {noformat} > {noformat:title=Full text with wildcard, hit=0} > 2016-06-25 13:16:41.172 [qtp237852351-16] INFO > org.apache.solr.core.SolrCore.Request – [core-name] webapp=/solr > path=/select > params={q=Tùûüÿ€àâæçéèêëïîôœm*=true=type:CONTACT=12=json&_=1466860601160} > hits=0 status=0 QTime=0 > {noformat} > {noformat:title=Partial text before € with wildcard, hit=1} > 2016-06-25 13:16:52.135 [qtp237852351-18] INFO > org.apache.solr.core.SolrCore.Request – [core-name] webapp=/solr > path=/select > params={q=Tùûüÿ*=true=type:CONTACT=12=json&_=1466860612125} > hits=1 status=0 QTime=2 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7287) New lemma-tizer plugin for ukrainian language.
[ https://issues.apache.org/jira/browse/LUCENE-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15346949#comment-15346949 ] Ahmet Arslan commented on LUCENE-7287: -- This is a new feature that has never been released, so a new ticket may not be needed. > New lemma-tizer plugin for ukrainian language. > -- > > Key: LUCENE-7287 > URL: https://issues.apache.org/jira/browse/LUCENE-7287 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Dmytro Hambal >Priority: Minor > Labels: analysis, language, plugin > Fix For: master (7.0), 6.2 > > Attachments: LUCENE-7287.patch, Screen Shot 2016-06-23 at 8.23.01 > PM.png, Screen Shot 2016-06-23 at 8.41.28 PM.png > > > Hi all, > I wonder whether you are interested in supporting a plugin which provides a > mapping between ukrainian word forms and their lemmas. Some tests and docs go > out-of-the-box =) . > https://github.com/mrgambal/elasticsearch-ukrainian-lemmatizer > It's really simple but still works and generates some value for its users. > More: https://github.com/elastic/elasticsearch/issues/18303
[jira] [Commented] (LUCENE-7287) New lemma-tizer plugin for ukrainian language.
[ https://issues.apache.org/jira/browse/LUCENE-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15346875#comment-15346875 ] Ahmet Arslan commented on LUCENE-7287: -- Hi, multiple tokens are OK, but multiple identical tokens look weird, no? Have you checked the screenshot that includes RemoveDuplicatesTokenFilterFactory (RDTF)? bq. Shall I create mappings_uk.txt so we can use it in solr? Let's ask Michael. We could either ship a separate file or just recommend using the mapping char filter with the recommended mappings. Maybe we can place the uk_mappings.txt file under https://github.com/apache/lucene-solr/tree/master/solr/server/solr/configsets/sample_techproducts_configs/conf/lang > New lemma-tizer plugin for ukrainian language. > -- > > Key: LUCENE-7287 > URL: https://issues.apache.org/jira/browse/LUCENE-7287 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Dmytro Hambal >Priority: Minor > Labels: analysis, language, plugin > Fix For: master (7.0), 6.2 > > Attachments: LUCENE-7287.patch, Screen Shot 2016-06-23 at 8.23.01 > PM.png, Screen Shot 2016-06-23 at 8.41.28 PM.png > > > Hi all, > I wonder whether you are interested in supporting a plugin which provides a > mapping between ukrainian word forms and their lemmas. Some tests and docs go > out-of-the-box =) . > https://github.com/mrgambal/elasticsearch-ukrainian-lemmatizer > It's really simple but still works and generates some value for its users. > More: https://github.com/elastic/elasticsearch/issues/18303 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7287) New lemma-tizer plugin for ukrainian language.
[ https://issues.apache.org/jira/browse/LUCENE-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15346857#comment-15346857 ] Ahmet Arslan commented on LUCENE-7287: -- Please see the screenshots in the attachments section at the beginning of the page and let me know what you think. > New lemma-tizer plugin for ukrainian language. > -- > > Key: LUCENE-7287 > URL: https://issues.apache.org/jira/browse/LUCENE-7287 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Dmytro Hambal >Priority: Minor > Labels: analysis, language, plugin > Fix For: master (7.0), 6.2 > > Attachments: LUCENE-7287.patch, Screen Shot 2016-06-23 at 8.23.01 > PM.png, Screen Shot 2016-06-23 at 8.41.28 PM.png > > > Hi all, > I wonder whether you are interested in supporting a plugin which provides a > mapping between ukrainian word forms and their lemmas. Some tests and docs go > out-of-the-box =) . > https://github.com/mrgambal/elasticsearch-ukrainian-lemmatizer > It's really simple but still works and generates some value for its users. > More: https://github.com/elastic/elasticsearch/issues/18303 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-7287) New lemma-tizer plugin for ukrainian language.
[ https://issues.apache.org/jira/browse/LUCENE-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated LUCENE-7287: - Attachment: Screen Shot 2016-06-23 at 8.41.28 PM.png Here is the screen shot of analysis admin page, with RemoveDuplicatesTokenFilter added. {code:xml} {code} > New lemma-tizer plugin for ukrainian language. > -- > > Key: LUCENE-7287 > URL: https://issues.apache.org/jira/browse/LUCENE-7287 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Dmytro Hambal >Priority: Minor > Labels: analysis, language, plugin > Fix For: master (7.0), 6.2 > > Attachments: LUCENE-7287.patch, Screen Shot 2016-06-23 at 8.23.01 > PM.png, Screen Shot 2016-06-23 at 8.41.28 PM.png > > > Hi all, > I wonder whether you are interested in supporting a plugin which provides a > mapping between ukrainian word forms and their lemmas. Some tests and docs go > out-of-the-box =) . > https://github.com/mrgambal/elasticsearch-ukrainian-lemmatizer > It's really simple but still works and generates some value for its users. > More: https://github.com/elastic/elasticsearch/issues/18303 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-7287) New lemma-tizer plugin for ukrainian language.
[ https://issues.apache.org/jira/browse/LUCENE-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated LUCENE-7287: - Attachment: Screen Shot 2016-06-23 at 8.23.01 PM.png {code:xml} {code} > New lemma-tizer plugin for ukrainian language. > -- > > Key: LUCENE-7287 > URL: https://issues.apache.org/jira/browse/LUCENE-7287 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Dmytro Hambal >Priority: Minor > Labels: analysis, language, plugin > Fix For: master (7.0), 6.2 > > Attachments: LUCENE-7287.patch, Screen Shot 2016-06-23 at 8.23.01 > PM.png > > > Hi all, > I wonder whether you are interested in supporting a plugin which provides a > mapping between ukrainian word forms and their lemmas. Some tests and docs go > out-of-the-box =) . > https://github.com/mrgambal/elasticsearch-ukrainian-lemmatizer > It's really simple but still works and generates some value for its users. > More: https://github.com/elastic/elasticsearch/issues/18303 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7287) New lemma-tizer plugin for ukrainian language.
[ https://issues.apache.org/jira/browse/LUCENE-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15346816#comment-15346816 ] Ahmet Arslan commented on LUCENE-7287: -- Hi, I was able to run the analyzer successfully without the mapping char filter, because the character mappings are hardcoded in the code. I am attaching an analysis screenshot. However, it looks like we need a remove-duplicates token filter at the end: the Morfologik filter appears to inject multiple tokens at the same position. > New lemma-tizer plugin for ukrainian language. > -- > > Key: LUCENE-7287 > URL: https://issues.apache.org/jira/browse/LUCENE-7287 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Dmytro Hambal >Priority: Minor > Labels: analysis, language, plugin > Fix For: master (7.0), 6.2 > > Attachments: LUCENE-7287.patch > > > Hi all, > I wonder whether you are interested in supporting a plugin which provides a > mapping between ukrainian word forms and their lemmas. Some tests and docs go > out-of-the-box =) . > https://github.com/mrgambal/elasticsearch-ukrainian-lemmatizer > It's really simple but still works and generates some value for its users. > More: https://github.com/elastic/elasticsearch/issues/18303 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
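The comment above asks for duplicate removal because identical tokens get stacked at one position. As a rough illustration of what such a filter is expected to do, here is a minimal sketch in plain Java. It does not use the Lucene API: the `Token` class and `removeDuplicates` method are hypothetical stand-ins, and the only assumption is that a position increment of 0 means "same position as the previous token".

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative stand-in for a token stream; none of these names are Lucene classes.
final class DedupSketch {

    // A token is its text plus a position increment (0 = same position as the previous token).
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    // Drops a token when an identical term was already emitted at the same position.
    static List<Token> removeDuplicates(List<Token> tokens) {
        List<Token> out = new ArrayList<>();
        Set<String> seenAtPosition = new HashSet<>();
        for (Token t : tokens) {
            if (t.positionIncrement > 0) {
                seenAtPosition.clear(); // the stream moved on to a new position
            }
            if (seenAtPosition.add(t.term)) {
                out.add(t); // first time this term appears at the current position
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> in = new ArrayList<>();
        in.add(new Token("lemma", 1));
        in.add(new Token("lemma", 0)); // duplicate stacked at the same position: dropped
        in.add(new Token("other", 0)); // different text at the same position: kept
        in.add(new Token("lemma", 1)); // same text at a new position: kept
        for (Token t : removeDuplicates(in)) {
            System.out.println(t.term + " (+" + t.positionIncrement + ")");
        }
    }
}
```

Only tokens with a zero increment are ever dropped, so the positions of the surviving tokens are unchanged.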
[jira] [Commented] (LUCENE-7287) New lemma-tizer plugin for ukrainian language.
[ https://issues.apache.org/jira/browse/LUCENE-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15344403#comment-15344403 ] Ahmet Arslan commented on LUCENE-7287: -- Only committers have rights to edit the Confluence wiki. Contributors post the proposed change/addition as a message at the end of the page. > New lemma-tizer plugin for ukrainian language. > -- > > Key: LUCENE-7287 > URL: https://issues.apache.org/jira/browse/LUCENE-7287 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Dmytro Hambal >Priority: Minor > Labels: analysis, language, plugin > Fix For: master (7.0), 6.2 > > Attachments: LUCENE-7287.patch > > > Hi all, > I wonder whether you are interested in supporting a plugin which provides a > mapping between ukrainian word forms and their lemmas. Some tests and docs go > out-of-the-box =) . > https://github.com/mrgambal/elasticsearch-ukrainian-lemmatizer > It's really simple but still works and generates some value for its users. > More: https://github.com/elastic/elasticsearch/issues/18303 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7287) New lemma-tizer plugin for ukrainian language.
[ https://issues.apache.org/jira/browse/LUCENE-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15344290#comment-15344290 ] Ahmet Arslan commented on LUCENE-7287: -- I think you, as the author of Ukrainian. Thanks! > New lemma-tizer plugin for ukrainian language. > -- > > Key: LUCENE-7287 > URL: https://issues.apache.org/jira/browse/LUCENE-7287 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Dmytro Hambal >Priority: Minor > Labels: analysis, language, plugin > Fix For: master (7.0), 6.2 > > Attachments: LUCENE-7287.patch > > > Hi all, > I wonder whether you are interested in supporting a plugin which provides a > mapping between ukrainian word forms and their lemmas. Some tests and docs go > out-of-the-box =) . > https://github.com/mrgambal/elasticsearch-ukrainian-lemmatizer > It's really simple but still works and generates some value for its users. > More: https://github.com/elastic/elasticsearch/issues/18303 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7287) New lemma-tizer plugin for ukrainian language.
[ https://issues.apache.org/jira/browse/LUCENE-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15343877#comment-15343877 ] Ahmet Arslan commented on LUCENE-7287: -- So, the Solr field type counterpart of this analyzer would be something like: {code:xml} {code} It would be nice to add an entry for Ukrainian to https://cwiki.apache.org/confluence/display/solr/Language+Analysis > New lemma-tizer plugin for ukrainian language. > -- > > Key: LUCENE-7287 > URL: https://issues.apache.org/jira/browse/LUCENE-7287 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Dmytro Hambal >Priority: Minor > Labels: analysis, language, plugin > Fix For: master (7.0), 6.2 > > Attachments: LUCENE-7287.patch > > > Hi all, > I wonder whether you are interested in supporting a plugin which provides a > mapping between ukrainian word forms and their lemmas. Some tests and docs go > out-of-the-box =) . > https://github.com/mrgambal/elasticsearch-ukrainian-lemmatizer > It's really simple but still works and generates some value for its users. > More: https://github.com/elastic/elasticsearch/issues/18303 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7287) New lemma-tizer plugin for ukrainian language.
[ https://issues.apache.org/jira/browse/LUCENE-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342944#comment-15342944 ] Ahmet Arslan commented on LUCENE-7287: -- Can we use this analyzer in solr? {code:xml} {code} > New lemma-tizer plugin for ukrainian language. > -- > > Key: LUCENE-7287 > URL: https://issues.apache.org/jira/browse/LUCENE-7287 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Dmytro Hambal >Priority: Minor > Labels: analysis, language, plugin > Fix For: master (7.0), 6.2 > > Attachments: LUCENE-7287.patch > > > Hi all, > I wonder whether you are interested in supporting a plugin which provides a > mapping between ukrainian word forms and their lemmas. Some tests and docs go > out-of-the-box =) . > https://github.com/mrgambal/elasticsearch-ukrainian-lemmatizer > It's really simple but still works and generates some value for its users. > More: https://github.com/elastic/elasticsearch/issues/18303 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7287) New lemma-tizer plugin for ukrainian language.
[ https://issues.apache.org/jira/browse/LUCENE-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15326628#comment-15326628 ] Ahmet Arslan commented on LUCENE-7287: -- Maybe MappingCharFilter could be used instead of a token filter? > New lemma-tizer plugin for ukrainian language. > -- > > Key: LUCENE-7287 > URL: https://issues.apache.org/jira/browse/LUCENE-7287 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Dmytro Hambal >Priority: Minor > Labels: analysis, language, plugin > > Hi all, > I wonder whether you are interested in supporting a plugin which provides a > mapping between ukrainian word forms and their lemmas. Some tests and docs go > out-of-the-box =) . > https://github.com/mrgambal/elasticsearch-ukrainian-lemmatizer > It's really simple but still works and generates some value for its users. > More: https://github.com/elastic/elasticsearch/issues/18303 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8676) It's not possible to use a different log4.properties file on windows
[ https://issues.apache.org/jira/browse/SOLR-8676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15314246#comment-15314246 ] Ahmet Arslan commented on SOLR-8676: [~mkhludnev] do you mind looking into SOLR-8445 too? > It's not possible to use a different log4.properties file on windows > > > Key: SOLR-8676 > URL: https://issues.apache.org/jira/browse/SOLR-8676 > Project: Solr > Issue Type: Bug >Affects Versions: 5.4.1 >Reporter: Kristine Jetzke >Assignee: Mikhail Khludnev > Attachments: SOLR-8676.patch, SOLR-8676.patch, verifying SOLR-8676.txt > > > It's currently not possible to change the location of the log4j.properties > file on windows. The value of {{LOG4J_CONFIG}} always gets replaced with the > default value {{server\resources\log4j.properties}}. Thus, this file inside > the server directory needs to be changed after every update. > See attached patch for a fix. Unfortunately, I couldn't figure out why > {{LOG4J_CONFIG}} was set to empty. I tested manually that logging still works > when running an example so I hope that this line is really just obsolete. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9174) After Solr 5.5, mm parameter doesn't work properly
[ https://issues.apache.org/jira/browse/SOLR-9174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308266#comment-15308266 ] Ahmet Arslan commented on SOLR-9174: Can someone explain why (e)dismax should honor/respect/care the {{q.op}} parameter? (e)dismax has its own parameter {{mm}} for the task. > After Solr 5.5, mm parameter doesn't work properly > -- > > Key: SOLR-9174 > URL: https://issues.apache.org/jira/browse/SOLR-9174 > Project: Solr > Issue Type: Bug > Components: query parsers, search >Affects Versions: 5.5, 6.0, 6.0.1 >Reporter: Issei Nishigata > > “mm" parameter does not work properly, when I set "q.op=AND” after Solr 5.5. > In Solr 5.4, mm parameter works expectedly with the following setting. > [schema] > {code:xml} > > > maxGramSize="2"/> > > > {code} > [request] > {quote} > http://localhost:8983/solr/collection1/select?defType=edismax=AND=2=solar > {quote} > After Solr 5.5, the result will not be the same as Solr 5.4. > [Solr 5.4] > {code:xml} > > ... > > 2 > solar > edismax > AND > > ... > > > 0 > > solr > > > > > solar > solar > > (+DisjunctionMaxQuerytext:so text:ol text:la text:ar)~2/no_coord > > +(((text:so text:ol text:la > text:ar)~2)) > ... > > {code} > [Solr 6.0.1] > {code:xml} > > ... > > 2 > solar > edismax > AND > > ... > > > solar > solar > > (+DisjunctionMaxQuery(((+text:so +text:ol +text:la +text:ar/no_coord > > +((+text:so +text:ol +text:la > +text:ar)) > ... > {code} > As shown above, parsedquery also differs from Solr 5.4 and Solr 6.0.1(after > Solr 5.5). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7287) New lemma-tizer plugin for ukrainian language.
[ https://issues.apache.org/jira/browse/LUCENE-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299381#comment-15299381 ] Ahmet Arslan commented on LUCENE-7287: -- This looks like a wrapper for a string-to-string mapping. There is no need to roll custom Lucene code for this: just replace the comma with a tab in the {{mapping_sorted.csv}} file and use the good old {{StemmerOverrideFilter}}, which has a fast lookup that does not require the {{termAtt.toString()}} conversion. > New lemma-tizer plugin for ukrainian language. > -- > > Key: LUCENE-7287 > URL: https://issues.apache.org/jira/browse/LUCENE-7287 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Dmytro Hambal >Priority: Minor > Labels: analysis, language, plugin > > Hi all, > I wonder whether you are interested in supporting a plugin which provides a > mapping between ukrainian word forms and their lemmas. Some tests and docs go > out-of-the-box =) . > https://github.com/mrgambal/elasticsearch-ukrainian-lemmatizer > It's really simple but still works and generates some value for its users. > More: https://github.com/elastic/elasticsearch/issues/18303 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
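The comma-to-tab conversion suggested above could be done with a few lines of plain Java. This is only a sketch under an assumption the thread does not confirm: that each line of {{mapping_sorted.csv}} holds exactly two columns (word form, then lemma) with no quoted or embedded commas. The class and method names are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: rewrites "wordform,lemma" lines as "wordform<TAB>lemma".
final class MappingConvertSketch {

    // Replaces only the first comma; lines without a comma are passed through unchanged.
    static String toTabSeparated(String csvLine) {
        int comma = csvLine.indexOf(',');
        if (comma < 0) {
            return csvLine;
        }
        return csvLine.substring(0, comma) + "\t" + csvLine.substring(comma + 1);
    }

    // Reads the CSV from standard input and writes the tab-separated form to standard output,
    // both as UTF-8 so Cyrillic entries survive regardless of the platform default encoding.
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        String line;
        while ((line = in.readLine()) != null) {
            out.println(toTabSeparated(line));
        }
    }
}
```

It would be run as a filter, for example with the CSV redirected to standard input and the result redirected to a new dictionary file.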
[jira] [Comment Edited] (LUCENE-7148) Support boolean subset matching
[ https://issues.apache.org/jira/browse/LUCENE-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227887#comment-15227887 ] Ahmet Arslan edited comment on LUCENE-7148 at 4/6/16 7:37 AM: -- bq. Perhaps you mean something like Solr's frange that filters based on the value? Exactly. Given that q=john smith, let's assume that we have a field titleLength that stores the number of words in the field. We can even extract that info from norm doc values later on. Something like: {noformat} fq={!frange l=0 u=0 cache=false cost=200} sub(titleLength, sum(termfreq(title,'smith'), termfreq(title,'john'))) {noformat} bq. That would be O(docs) as it evaluates per doc. Can't we make this filter query execute last, with cache=false cost=150? was (Author: iorixxx): bq. Perhaps you mean something like Solr's frange that filters based on the value? Exactly. Given that q=john smith, lets assume that we have a field titleLenght that stores the number of words in the field. We can even extract that info from norm doc values later on. Something like fq={!frange l=0 u=0} sub(titleLength, sum(termfreq(title,'smith'), termfreq(title,'john'))) bq. That would be O(docs) as it evaluates per doc. Cant we make this filter query executed last, with cache=false cost=150? > Support boolean subset matching > --- > > Key: LUCENE-7148 > URL: https://issues.apache.org/jira/browse/LUCENE-7148 > Project: Lucene - Core > Issue Type: New Feature > Components: core/search >Affects Versions: 5.x >Reporter: Otmar Caduff > Labels: newbie > > In Lucene, I know of the possibility of Occur.SHOULD, Occur.MUST and the > “minimum should match” setting on the boolean query. > Now, when querying, I want to > - (1) match the documents which either contain all the terms of the query > (Occur.MUST for all terms would do that) or, > - (2) if all terms for a given field of a document are a subset of the query > terms, that document should match as well. 
> Example: > Document d has field f with terms A, B, C > Query with the following terms should match that document: > A > B > A B > A B C > A B C D > Query with the following terms should not match: > D > A B D -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
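The frange expression in the comment above boils down to one subtraction per document: field length minus the summed term frequencies of the query terms, with a match when the result is 0. The plain-Java sketch below mirrors that arithmetic outside of Solr. The class and method names are hypothetical, and it assumes the query terms are distinct so that no token is counted twice.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of sub(titleLength, sum(termfreq(...))) for a single document.
final class SubsetMatchSketch {

    // Number of field tokens that are NOT one of the query terms.
    // Equivalent to fieldLength - sum of termfreq over distinct query terms.
    static int uncoveredTokens(List<String> fieldTokens, Set<String> queryTerms) {
        int covered = 0;
        for (String token : fieldTokens) {
            if (queryTerms.contains(token)) {
                covered++;
            }
        }
        return fieldTokens.size() - covered;
    }

    // Condition (2) of the issue: every term of the field is also a query term.
    static boolean fieldIsSubsetOfQuery(List<String> fieldTokens, Set<String> queryTerms) {
        return uncoveredTokens(fieldTokens, queryTerms) == 0;
    }

    public static void main(String[] args) {
        List<String> field = Arrays.asList("A", "B", "C");
        Set<String> abcd = new HashSet<>(Arrays.asList("A", "B", "C", "D"));
        Set<String> abd = new HashSet<>(Arrays.asList("A", "B", "D"));
        System.out.println("A B C D -> " + fieldIsSubsetOfQuery(field, abcd));
        System.out.println("A B D   -> " + fieldIsSubsetOfQuery(field, abd));
    }
}
```

This covers only the subset condition; queries such as "A B", where the document contains all query terms, are still the job of the ordinary all-terms-required clause described in the issue.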
[jira] [Commented] (LUCENE-7148) Support boolean subset matching
[ https://issues.apache.org/jira/browse/LUCENE-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227887#comment-15227887 ] Ahmet Arslan commented on LUCENE-7148: -- bq. Perhaps you mean something like Solr's frange that filters based on the value? Exactly. Given that q=john smith, let's assume that we have a field titleLength that stores the number of words in the field. We can even extract that info from norm doc values later on. Something like fq={!frange l=0 u=0} sub(titleLength, sum(termfreq(title,'smith'), termfreq(title,'john'))) bq. That would be O(docs) as it evaluates per doc. Can't we make this filter query execute last, with cache=false cost=150? > Support boolean subset matching > --- > > Key: LUCENE-7148 > URL: https://issues.apache.org/jira/browse/LUCENE-7148 > Project: Lucene - Core > Issue Type: New Feature > Components: core/search >Affects Versions: 5.x >Reporter: Otmar Caduff > Labels: newbie > > In Lucene, I know of the possibility of Occur.SHOULD, Occur.MUST and the > “minimum should match” setting on the boolean query. > Now, when querying, I want to > - (1) match the documents which either contain all the terms of the query > (Occur.MUST for all terms would do that) or, > - (2) if all terms for a given field of a document are a subset of the query > terms, that document should match as well. > Example: > Document d has field f with terms A, B, C > Query with the following terms should match that document: > A > B > A B > A B C > A B C D > Query with the following terms should not match: > D > A B D -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7148) Support boolean subset matching
[ https://issues.apache.org/jira/browse/LUCENE-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227378#comment-15227378 ] Ahmet Arslan commented on LUCENE-7148: -- Can't we have a function query that just returns the number of matching terms here? Then we compare it with the document length. > Support boolean subset matching > --- > > Key: LUCENE-7148 > URL: https://issues.apache.org/jira/browse/LUCENE-7148 > Project: Lucene - Core > Issue Type: New Feature > Components: core/search >Affects Versions: 5.x >Reporter: Otmar Caduff > Labels: newbie > > In Lucene, I know of the possibility of Occur.SHOULD, Occur.MUST and the > “minimum should match” setting on the boolean query. > Now, when querying, I want to > - (1) match the documents which either contain all the terms of the query > (Occur.MUST for all terms would do that) or, > - (2) if all terms for a given field of a document are a subset of the query > terms, that document should match as well. > Example: > Document d has field f with terms A, B, C > Query with the following terms should match that document: > A > B > A B > A B C > A B C D > Query with the following terms should not match: > D > A B D -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7132) ScoreDoc.score() returns a different value than that of Explanation's
[ https://issues.apache.org/jira/browse/LUCENE-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210008#comment-15210008 ] Ahmet Arslan commented on LUCENE-7132: -- It is really hard to decipher what is going on inside the good old TFIDFSimilarity.
{code:title=TFIDFSimilarity.IDFStats.normalize|borderStyle=solid}
@Override
public void normalize(float queryNorm, float boost) {
  this.boost = boost;
  this.queryNorm = queryNorm;
  queryWeight = queryNorm * boost * idf.getValue();
  value = queryWeight * idf.getValue(); // idf for document
}
{code}
* Why does the query weight have an IDF multiplicand?
* Why is TFIDFSimilarity.IDFStats#value set to IDF squared?
* Why is TFIDFSimilarity.IDFStats#value needed even though we have TFIDFSimilarity.IDFStats.idf.getValue()?
* TFIDFSimilarity.TFIDFSimScorer#score returns tf(freq) * IDFStats.value, which looks like tf x IDF x IDF to me.
> ScoreDoc.score() returns a different value than that of Explanation's > - > > Key: LUCENE-7132 > URL: https://issues.apache.org/jira/browse/LUCENE-7132 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 5.5 >Reporter: Ahmet Arslan >Assignee: Steve Rowe > Attachments: LUCENE-7132.patch, SOLR-8884.patch, SOLR-8884.patch, > debug.xml > > > Some of the folks > [reported|http://find.searchhub.org/document/80666f5c3b86ddda] that sometimes > explain's score can be different than the score requested by fields > parameter. Interestingly, Explain's scores would create a different ranking > than the original result list. This is something users experience, but it > cannot be re-produced deterministically. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
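To make the "tf x IDF x IDF" observation in the comment above concrete, the arithmetic of the quoted normalize method can be replayed with plain floats. This is a sketch only: the class below is not Lucene code, the method names are invented, and it leaves out any length-norm factor the real scorer may also apply.

```java
// Hypothetical replay of the arithmetic in the quoted normalize() method.
final class IdfSquaredSketch {

    // queryWeight = queryNorm * boost * idf, as in the quoted snippet.
    static float queryWeight(float queryNorm, float boost, float idf) {
        return queryNorm * boost * idf;
    }

    // value = queryWeight * idf, so idf enters the product a second time.
    static float value(float queryNorm, float boost, float idf) {
        return queryWeight(queryNorm, boost, idf) * idf;
    }

    // score = tf * value, following the comment's reading of the scorer (norms omitted here).
    static float score(float tf, float queryNorm, float boost, float idf) {
        return tf * value(queryNorm, boost, idf);
    }

    public static void main(String[] args) {
        float tf = 3f;
        float idf = 2f;
        // With queryNorm = 1 and boost = 1 the score reduces to tf * idf * idf.
        System.out.println("value = " + value(1f, 1f, idf));
        System.out.println("score = " + score(tf, 1f, 1f, idf));
    }
}
```

With a query norm and boost of 1, the result is exactly tf multiplied by idf twice, which is the pattern the comment is questioning.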
[jira] [Updated] (LUCENE-7132) ScoreDoc.score() returns a different value than that of Explanation's
[ https://issues.apache.org/jira/browse/LUCENE-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated LUCENE-7132: - Component/s: core/search > ScoreDoc.score() returns a different value than that of Explanation's > - > > Key: LUCENE-7132 > URL: https://issues.apache.org/jira/browse/LUCENE-7132 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 5.5 >Reporter: Ahmet Arslan >Assignee: Steve Rowe > Attachments: LUCENE-7132.patch, SOLR-8884.patch, SOLR-8884.patch, > debug.xml > > > Some of the folks > [reported|http://find.searchhub.org/document/80666f5c3b86ddda] that sometimes > explain's score can be different than the score requested by fields > parameter. Interestingly, Explain's scores would create a different ranking > than the original result list. This is something users experience, but it > cannot be re-produced deterministically. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7132) ScoreDoc.score() returns a different value than that of Explanation's
[ https://issues.apache.org/jira/browse/LUCENE-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209155#comment-15209155 ] Ahmet Arslan commented on LUCENE-7132: -- Thanks Steve for taking care of this! > ScoreDoc.score() returns a different value than that of Explanation's > - > > Key: LUCENE-7132 > URL: https://issues.apache.org/jira/browse/LUCENE-7132 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 5.5 >Reporter: Ahmet Arslan >Assignee: Steve Rowe > Attachments: LUCENE-7132.patch, SOLR-8884.patch, SOLR-8884.patch, > debug.xml > > > Some of the folks > [reported|http://find.searchhub.org/document/80666f5c3b86ddda] that sometimes > explain's score can be different than the score requested by fields > parameter. Interestingly, Explain's scores would create a different ranking > than the original result list. This is something users experience, but it > cannot be re-produced deterministically. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-7132) ScoreDoc.score() returns a different value than that of Explanation's
[ https://issues.apache.org/jira/browse/LUCENE-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated LUCENE-7132: - Attachment: LUCENE-7132.patch Lucene-only patch. Interestingly, the *testExplainScoreEquality* method also failed once for me, which can be reproduced with: {{ant test -Dtestcase=TestExplain -Dtests.method=testExplainScoreEquality -Dtests.seed=B90C674F754D524 -Dtests.locale=de -Dtests.timezone=Etc/GMT-12 -Dtests.asserts=true -Dtests.file.encoding=UTF-8}} However, the *testRajeshData* method fails more frequently. > ScoreDoc.score() returns a different value than that of Explanation's > - > > Key: LUCENE-7132 > URL: https://issues.apache.org/jira/browse/LUCENE-7132 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 5.5 >Reporter: Ahmet Arslan >Assignee: Steve Rowe > Attachments: LUCENE-7132.patch, SOLR-8884.patch, SOLR-8884.patch, > debug.xml > > > Some of the folks > [reported|http://find.searchhub.org/document/80666f5c3b86ddda] that sometimes > explain's score can be different than the score requested by fields > parameter. Interestingly, Explain's scores would create a different ranking > than the original result list. This is something users experience, but it > cannot be re-produced deterministically. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-7132) ScoreDoc.score() returns a different value than that of Explanation's
[ https://issues.apache.org/jira/browse/LUCENE-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated LUCENE-7132: - Summary: ScoreDoc.score() returns a different value than that of Explanation's (was: fl=score returns a different value than that of Explain's) > ScoreDoc.score() returns a different value than that of Explanation's > - > > Key: LUCENE-7132 > URL: https://issues.apache.org/jira/browse/LUCENE-7132 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 5.5 >Reporter: Ahmet Arslan >Assignee: Steve Rowe > Attachments: SOLR-8884.patch, SOLR-8884.patch, debug.xml > > > Some of the folks > [reported|http://find.searchhub.org/document/80666f5c3b86ddda] that sometimes > explain's score can be different than the score requested by fields > parameter. Interestingly, Explain's scores would create a different ranking > than the original result list. This is something users experience, but it > cannot be re-produced deterministically. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8884) fl=score returns a different value than that of Explain's
[ https://issues.apache.org/jira/browse/SOLR-8884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208940#comment-15208940 ] Ahmet Arslan commented on SOLR-8884: Can someone who has the appropriate permissions please move SOLR-8884 to LUCENE-? > fl=score returns a different value than that of Explain's > - > > Key: SOLR-8884 > URL: https://issues.apache.org/jira/browse/SOLR-8884 > Project: Solr > Issue Type: Bug > Components: search >Affects Versions: 5.5 >Reporter: Ahmet Arslan > Attachments: SOLR-8884.patch, SOLR-8884.patch, debug.xml > > > Some of the folks > [reported|http://find.searchhub.org/document/80666f5c3b86ddda] that sometimes > explain's score can be different than the score requested by fields > parameter. Interestingly, Explain's scores would create a different ranking > than the original result list. This is something users experience, but it > cannot be re-produced deterministically. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8884) fl=score returns a different value than that of Explain's
[ https://issues.apache.org/jira/browse/SOLR-8884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated SOLR-8884: --- Attachment: SOLR-8884.patch This is truly a Lucene-level bug. The attached patch includes a failing test case. It can be reproduced with: {{ant test -Dtestcase=TestExplain -Dtests.method=testRajeshData -Dtests.seed=D5E55A7E84F4C82C -Dtests.slow=true -Dtests.locale=es-HN -Dtests.timezone=Asia/Samarkand -Dtests.asserts=true -Dtests.file.encoding=UTF-8}} > fl=score returns a different value than that of Explain's > - > > Key: SOLR-8884 > URL: https://issues.apache.org/jira/browse/SOLR-8884 > Project: Solr > Issue Type: Bug > Components: search >Affects Versions: 5.5 >Reporter: Ahmet Arslan > Attachments: SOLR-8884.patch, SOLR-8884.patch, debug.xml > > > Some of the folks > [reported|http://find.searchhub.org/document/80666f5c3b86ddda] that sometimes > explain's score can be different than the score requested by fields > parameter. Interestingly, Explain's scores would create a different ranking > than the original result list. This is something users experience, but it > cannot be re-produced deterministically. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8884) fl=score returns a different value than that of Explain's
[ https://issues.apache.org/jira/browse/SOLR-8884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated SOLR-8884: --- Attachment: SOLR-8884.patch Randomized test case for Lucene in hopes that it will trigger sometime. Will try to write Solr counterpart. > fl=score returns a different value than that of Explain's > - > > Key: SOLR-8884 > URL: https://issues.apache.org/jira/browse/SOLR-8884 > Project: Solr > Issue Type: Bug > Components: search >Affects Versions: 5.5 >Reporter: Ahmet Arslan > Attachments: SOLR-8884.patch, debug.xml > > > Some of the folks > [reported|http://find.searchhub.org/document/80666f5c3b86ddda] that sometimes > explain's score can be different than the score requested by fields > parameter. Interestingly, Explain's scores would create a different ranking > than the original result list. This is something users experience, but it > cannot be re-produced deterministically. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8884) fl=score returns a different value than that of Explain's
[ https://issues.apache.org/jira/browse/SOLR-8884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated SOLR-8884: --- Attachment: debug.xml Here is Rajesh's response file, which demonstrates the problem. > fl=score returns a different value than that of Explain's > - > > Key: SOLR-8884 > URL: https://issues.apache.org/jira/browse/SOLR-8884 > Project: Solr > Issue Type: Bug > Components: search >Affects Versions: 5.5 >Reporter: Ahmet Arslan > Attachments: debug.xml > > > Some of the folks > [reported|http://find.searchhub.org/document/80666f5c3b86ddda] that sometimes > explain's score can be different than the score requested by fields > parameter. Interestingly, Explain's scores would create a different ranking > than the original result list. This is something users experience, but it > cannot be re-produced deterministically. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-8884) fl=score returns a different value than that of Explain's
Ahmet Arslan created SOLR-8884: -- Summary: fl=score returns a different value than that of Explain's Key: SOLR-8884 URL: https://issues.apache.org/jira/browse/SOLR-8884 Project: Solr Issue Type: Bug Components: search Affects Versions: 5.5 Reporter: Ahmet Arslan Some of the folks [reported|http://find.searchhub.org/document/80666f5c3b86ddda] that sometimes explain's score can be different than the score requested by fields parameter. Interestingly, Explain's scores would create a different ranking than the original result list. This is something users experience, but it cannot be re-produced deterministically. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-7014) Use TimeUnit.TARGETUNIT.convert() to convert between time units
[ https://issues.apache.org/jira/browse/LUCENE-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated LUCENE-7014: - Attachment: LUCENE-7014.patch I started to incorporate the suggested changes. This patch includes only {{org.apache.lucene.index.*}} files. For three digits, I switched to milliseconds. However, I rounded {{%.1f}}. Is this reasonable in terms of precision loss? Maybe we should not touch these cases? > Use TimeUnit.TARGETUNIT.convert() to convert between time units > --- > > Key: LUCENE-7014 > URL: https://issues.apache.org/jira/browse/LUCENE-7014 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: master, 5.4.1 >Reporter: Ahmet Arslan >Priority: Minor > Fix For: 5.5, master, 6.0 > > Attachments: LUCENE-7014.patch, LUCENE-7014.patch > > > Re-phrased from [~steve_rowe]'s > [comment|https://issues.apache.org/jira/browse/LUCENE-6823?focusedCommentId=14941283=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14941283] > : > System.nanoTime(), which is guaranteed to be monotonic, is now used to > record elapsed times. In several places, conversion from nanoseconds to > some target unit (e.g. seconds, milliseconds) is performed using hard-coded > conversion constants, which is prone to mistakes. > It would be nice to use {{TimeUnit.TARGETUNIT.convert(sourceDuration, > TimeUnit.SOURCEUNIT)}} instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-7014) Use TimeUnit.TARGETUNIT.convert() to convert between time units
[ https://issues.apache.org/jira/browse/LUCENE-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated LUCENE-7014: - Attachment: LUCENE-7014.patch > Use TimeUnit.TARGETUNIT.convert() to convert between time units > --- > > Key: LUCENE-7014 > URL: https://issues.apache.org/jira/browse/LUCENE-7014 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: master, 5.4.1 >Reporter: Ahmet Arslan >Priority: Minor > Fix For: 5.5, master, 6.0 > > Attachments: LUCENE-7014.patch > > > Re-phrased from [~steve_rowe]'s > [comment|https://issues.apache.org/jira/browse/LUCENE-6823?focusedCommentId=14941283=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14941283] > : > System.nanoTime(), which is guaranteed to be monotonic, is now used to > record elapsed times. In several places, conversion from nanoseconds to > some target unit (e.g. seconds, milliseconds) is performed using hard-coded > conversion constants, which is prone to mistakes. > It would be nice to use {{TimeUnit.TARGETUNIT.convert(sourceDuration, > TimeUnit.SOURCEUNIT)}} instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-7014) Use TimeUnit.TARGETUNIT.convert() to convert between time units
Ahmet Arslan created LUCENE-7014: Summary: Use TimeUnit.TARGETUNIT.convert() to convert between time units Key: LUCENE-7014 URL: https://issues.apache.org/jira/browse/LUCENE-7014 Project: Lucene - Core Issue Type: Improvement Affects Versions: 5.4.1, master Reporter: Ahmet Arslan Priority: Minor Fix For: 5.5, master, 6.0 Re-phrased from [~steve_rowe]'s [comment|https://issues.apache.org/jira/browse/LUCENE-6823?focusedCommentId=14941283=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14941283] : System.nanoTime(), which is guaranteed to be monotonic, is now used to record elapsed times. In several places, conversion from nanoseconds to some target unit (e.g. seconds, milliseconds) is performed using hard-coded conversion constants, which is prone to mistakes. It would be nice to use {{TimeUnit.TARGETUNIT.convert(sourceDuration, TimeUnit.SOURCEUNIT)}} instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
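For readers following LUCENE-7014, the conversion the issue asks for can be sketched as below. This is only an illustration, not code from the attached patch: the class and method names ({{ElapsedTimeSketch}}, {{elapsedMillis}}, {{elapsedSeconds}}) are made up for the example. Note that {{TimeUnit.convert}} truncates toward zero, which is why the fractional-seconds helper divides instead of converting.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper class; the names are illustrative and not taken from the patch.
final class ElapsedTimeSketch {

    // Whole milliseconds between two System.nanoTime() readings.
    // TimeUnit.convert truncates, so 1,999,999 ns becomes 1 ms.
    static long elapsedMillis(long startNanos, long endNanos) {
        return TimeUnit.MILLISECONDS.convert(endNanos - startNanos, TimeUnit.NANOSECONDS);
    }

    // Fractional seconds. The divisor comes from TimeUnit rather than a
    // hand-typed constant such as 1e9 or 1000000000.
    static double elapsedSeconds(long startNanos, long endNanos) {
        return (endNanos - startNanos) / (double) TimeUnit.SECONDS.toNanos(1);
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long end = start + 1_500_000_000L; // pretend 1.5 seconds have passed
        System.out.println(elapsedMillis(start, end) + " ms");
        System.out.println(String.format("%.1f s", elapsedSeconds(start, end)));
    }
}
```

Whether truncation or rounding is acceptable depends on the call site, which is the precision question raised in the later comment on this issue.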
[jira] [Updated] (SOLR-8445) fix line separator in log4j.properties files
[ https://issues.apache.org/jira/browse/SOLR-8445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated SOLR-8445: --- Fix Version/s: 5.5 > fix line separator in log4j.properties files > > > Key: SOLR-8445 > URL: https://issues.apache.org/jira/browse/SOLR-8445 > Project: Solr > Issue Type: Bug > Components: Server >Affects Versions: 5.4, Trunk >Reporter: Ahmet Arslan >Priority: Trivial > Labels: log4j, logging > Fix For: 5.5, Trunk > > Attachments: SOLR-8445.patch, SOLR-8445.patch > > > new line is mistyped in conversion pattern -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8570) Make discountOverlaps' initialization value consistent across subclasses of SimilarityFactory
[ https://issues.apache.org/jira/browse/SOLR-8570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated SOLR-8570: --- Fix Version/s: 5.5 > Make discountOverlaps' initialization value consistent across subclasses of > SimilarityFactory > -- > > Key: SOLR-8570 > URL: https://issues.apache.org/jira/browse/SOLR-8570 > Project: Solr > Issue Type: Improvement >Affects Versions: 5.4 >Reporter: Ahmet Arslan >Priority: Minor > Labels: similarity > Fix For: 5.5, Trunk > > Attachments: SOLR-8570.patch, SOLR-8570.patch > > > Subclasses of SimilarityFactory have a member variable named > {{discountOverlaps}}. > In ClassicSimilarityFactory, it is initialized to {{true}} in SOLR-5561. > Since discountOverlaps' default value is true, we should do the same in > remaining subclasses. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8445) fix line separator in log4j.properties files
[ https://issues.apache.org/jira/browse/SOLR-8445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated SOLR-8445: --- Attachment: SOLR-8445.patch Patch generated by {{git diff}}. > fix line separator in log4j.properties files > > > Key: SOLR-8445 > URL: https://issues.apache.org/jira/browse/SOLR-8445 > Project: Solr > Issue Type: Bug > Components: Server >Affects Versions: 5.4, Trunk >Reporter: Ahmet Arslan >Priority: Trivial > Labels: log4j, logging > Fix For: Trunk > > Attachments: SOLR-8445.patch, SOLR-8445.patch > > > new line is mistyped in conversion pattern -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8570) Make discountOverlaps' initialization value consistent across subclasses of SimilarityFactory
[ https://issues.apache.org/jira/browse/SOLR-8570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated SOLR-8570: --- Attachment: SOLR-8570.patch Patch generated by {{git diff origin/master..SOLR-8570}} . > Make discountOverlaps' initialization value consistent across subclasses of > SimilarityFactory > -- > > Key: SOLR-8570 > URL: https://issues.apache.org/jira/browse/SOLR-8570 > Project: Solr > Issue Type: Improvement >Affects Versions: 5.4 >Reporter: Ahmet Arslan >Priority: Minor > Labels: similarity > Fix For: Trunk > > Attachments: SOLR-8570.patch, SOLR-8570.patch > > > Subclasses of SimilarityFactory have a member variable named > {{discountOverlaps}}. > In ClassicSimilarityFactory, it is initialized to {{true}} in SOLR-5561. > Since discountOverlaps' default value is true, we should do the same in > remaining subclasses. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8445) fix line separator in log4j.properties files
[ https://issues.apache.org/jira/browse/SOLR-8445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated SOLR-8445: --- Fix Version/s: Trunk > fix line separator in log4j.properties files > > > Key: SOLR-8445 > URL: https://issues.apache.org/jira/browse/SOLR-8445 > Project: Solr > Issue Type: Bug > Components: Server >Affects Versions: 5.4, Trunk >Reporter: Ahmet Arslan >Priority: Trivial > Labels: log4j, logging > Fix For: Trunk > > Attachments: SOLR-8445.patch > > > new line is mistyped in conversion pattern -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8445) fix line separator in log4j.properties files
[ https://issues.apache.org/jira/browse/SOLR-8445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated SOLR-8445: --- Labels: log4j logging (was: ) > fix line separator in log4j.properties files > > > Key: SOLR-8445 > URL: https://issues.apache.org/jira/browse/SOLR-8445 > Project: Solr > Issue Type: Bug > Components: Server >Affects Versions: 5.4, Trunk >Reporter: Ahmet Arslan >Priority: Trivial > Labels: log4j, logging > Fix For: Trunk > > Attachments: SOLR-8445.patch > > > new line is mistyped in conversion pattern -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8570) Make discountOverlaps' initialization value consistent across subclasses of SimilarityFactory
[ https://issues.apache.org/jira/browse/SOLR-8570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated SOLR-8570: --- Attachment: SOLR-8570.patch I had this patch handy. However, does moving {{protected boolean discountOverlaps = true;}} into SimilarityFactory break any good practices? > Make discountOverlaps' initialization value consistent across subclasses of > SimilarityFactory > -- > > Key: SOLR-8570 > URL: https://issues.apache.org/jira/browse/SOLR-8570 > Project: Solr > Issue Type: Improvement >Affects Versions: 5.4 >Reporter: Ahmet Arslan >Priority: Minor > Labels: similarity > Fix For: Trunk > > Attachments: SOLR-8570.patch > > > Subclasses of SimilarityFactory have a member variable named > {{discountOverlaps}}. > In ClassicSimilarityFactory, it is initialized to {{true}} in SOLR-5561. > Since discountOverlaps' default value is true, we should do the same in > remaining subclasses. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-8570) Make discountOverlaps' initialization value consistent across subclasses of SimilarityFactory
Ahmet Arslan created SOLR-8570: -- Summary: Make discountOverlaps' initialization value consistent across subclasses of SimilarityFactory Key: SOLR-8570 URL: https://issues.apache.org/jira/browse/SOLR-8570 Project: Solr Issue Type: Improvement Affects Versions: 5.4 Reporter: Ahmet Arslan Priority: Minor Fix For: Trunk Subclasses of SimilarityFactory have a member variable named {{discountOverlaps}}. In ClassicSimilarityFactory, it is initialized to {{true}} in SOLR-5561. Since discountOverlaps' default value is true, we should do the same in remaining subclasses. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6818) Implementing Divergence from Independence (DFI) Term-Weighting for Lucene/Solr
[ https://issues.apache.org/jira/browse/LUCENE-6818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106999#comment-15106999 ] Ahmet Arslan commented on LUCENE-6818: -- Thanks [~rcmuir] for taking care of this. bq. For the solr factory changes around discountOverlaps, can you make a separate issue for that? Created SOLR-8570 > Implementing Divergence from Independence (DFI) Term-Weighting for Lucene/Solr > -- > > Key: LUCENE-6818 > URL: https://issues.apache.org/jira/browse/LUCENE-6818 > Project: Lucene - Core > Issue Type: New Feature > Components: core/query/scoring >Affects Versions: 5.3 >Reporter: Ahmet Arslan >Assignee: Robert Muir >Priority: Minor > Labels: similarity > Fix For: 5.5, Trunk > > Attachments: LUCENE-6818.patch, LUCENE-6818.patch, LUCENE-6818.patch, > LUCENE-6818.patch, LUCENE-6818.patch > > > As explained in the > [write-up|http://lucidworks.com/blog/flexible-ranking-in-lucene-4], many > state-of-the-art ranking model implementations are added to Apache Lucene. > This issue aims to include DFI model, which is the non-parametric counterpart > of the Divergence from Randomness (DFR) framework. > DFI is both parameter-free and non-parametric: > * parameter-free: it does not require any parameter tuning or training. > * non-parametric: it does not make any assumptions about word frequency > distributions on document collections. > It is highly recommended *not* to remove stopwords (very common terms: the, > of, and, to, a, in, for, is, on, that, etc) with this similarity. > For more information see: [A nonparametric term weighting method for > information retrieval based on measuring the divergence from > independence|http://dx.doi.org/10.1007/s10791-013-9225-4] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-839) XML Query Parser support (deftype=xmlparser)
[ https://issues.apache.org/jira/browse/SOLR-839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated SOLR-839: -- Attachment: SOLR-839.patch This patch replaces utf8 constant string with StandardCharsets.UTF_8 as suggested by LUCENE-5560 > XML Query Parser support (deftype=xmlparser) > > > Key: SOLR-839 > URL: https://issues.apache.org/jira/browse/SOLR-839 > Project: Solr > Issue Type: New Feature > Components: query parsers >Affects Versions: 1.3, 5.4, Trunk >Reporter: Erik Hatcher >Assignee: Christine Poerschke >Priority: Minor > Fix For: Trunk > > Attachments: SOLR-839-object-parser.patch, SOLR-839.patch, > SOLR-839.patch, SOLR-839.patch, lucene-xml-query-parser-2.4-dev.jar > > > Lucene contrib includes a query parser that is able to create the > full-spectrum of Lucene queries, using an XML data structure. > This patch adds "xml" query parser support to Solr. > Example (from > {{lucene/queryparser/src/test/org/apache/lucene/queryparser/xml/NestedBooleanQuery.xml}}): > {code} > > > > > > doesNotExistButShouldBeOKBecauseOtherClauseExists > > > > > bank > > > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
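As a side note on the SOLR-839 patch comment, the general shape of replacing a charset name string with {{StandardCharsets.UTF_8}} might look like the sketch below. The class and method names are hypothetical and the actual patch contents are not reproduced here. The practical difference is that the {{Charset}} overloads of {{String.getBytes}} and the {{String}} constructor do not throw the checked {{UnsupportedEncodingException}}, and a mistyped charset name can no longer slip through to runtime.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; shows the Charset-based overloads only.
final class CharsetSketch {

    // Before: s.getBytes("UTF-8") forces callers to handle UnsupportedEncodingException.
    // After: the Charset overload has no checked exception.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = encode("bank");
        System.out.println(bytes.length + " bytes, round trip: " + decode(bytes));
    }
}
```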
[jira] [Updated] (SOLR-8445) fix line separator in log4j.properties files
[ https://issues.apache.org/jira/browse/SOLR-8445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated SOLR-8445: --- Attachment: SOLR-8445.patch > fix line separator in log4j.properties files > > > Key: SOLR-8445 > URL: https://issues.apache.org/jira/browse/SOLR-8445 > Project: Solr > Issue Type: Bug > Components: Server >Affects Versions: 5.4, Trunk >Reporter: Ahmet Arslan >Priority: Trivial > Attachments: SOLR-8445.patch > > > new line is mistyped in conversion pattern -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-8445) fix line separator in log4j.properties files
Ahmet Arslan created SOLR-8445: -- Summary: fix line separator in log4j.properties files Key: SOLR-8445 URL: https://issues.apache.org/jira/browse/SOLR-8445 Project: Solr Issue Type: Bug Components: Server Affects Versions: 5.4, Trunk Reporter: Ahmet Arslan Priority: Trivial new line is mistyped in conversion pattern -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2649) MM ignored in edismax queries with operators
[ https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated SOLR-2649: --- Priority: Major (was: Minor) > MM ignored in edismax queries with operators > > > Key: SOLR-2649 > URL: https://issues.apache.org/jira/browse/SOLR-2649 > Project: Solr > Issue Type: Bug > Components: query parsers >Reporter: Magnus Bergmark >Assignee: Erick Erickson > Fix For: 4.9, Trunk > > Attachments: SOLR-2649-with-Qop.patch, SOLR-2649-with-Qop.patch, > SOLR-2649.diff, SOLR-2649.patch > > > Hypothetical scenario: > 1. User searches for "stocks oil gold" with MM set to "50%" > 2. User adds "-stockings" to the query: "stocks oil gold -stockings" > 3. User gets no hits since MM was ignored and all terms were AND-ed > together > The behavior seems to be intentional, although the reason why is never > explained: > // For correct lucene queries, turn off mm processing if there > // were explicit operators (except for AND). > boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0; > (lines 232-234 taken from > tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java) > This makes edismax unsuitable as a replacement for dismax; mm is one of the > primary features of dismax. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators
[ https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038596#comment-15038596 ] Ahmet Arslan commented on SOLR-2649: bq. do we have a consensus on what the new behavior should be? I think Jan's [proposal | https://issues.apache.org/jira/browse/SOLR-2649?focusedCommentId=13199400=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13199400] Personally, I understand that edismax was initially designed for power users who are supposed to know what they are looking for. However, this assumption looks too strong given that a wide range of users have started to use edismax. > MM ignored in edismax queries with operators > > > Key: SOLR-2649 > URL: https://issues.apache.org/jira/browse/SOLR-2649 > Project: Solr > Issue Type: Improvement > Components: query parsers >Reporter: Magnus Bergmark >Assignee: Erick Erickson > Fix For: 4.9, Trunk > > Attachments: SOLR-2649-with-Qop.patch, SOLR-2649-with-Qop.patch, > SOLR-2649.diff, SOLR-2649.patch > > > Hypothetical scenario: > 1. User searches for "stocks oil gold" with MM set to "50%" > 2. User adds "-stockings" to the query: "stocks oil gold -stockings" > 3. User gets no hits since MM was ignored and all terms were AND-ed > together > The behavior seems to be intentional, although the reason why is never > explained: > // For correct lucene queries, turn off mm processing if there > // were explicit operators (except for AND). > boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0; > (lines 232-234 taken from > tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java) > This makes edismax unsuitable as a replacement for dismax; mm is one of the > primary features of dismax. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8339) SolrDocument and SolrInputDocument should have a common interface
[ https://issues.apache.org/jira/browse/SOLR-8339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035846#comment-15035846 ] Ahmet Arslan commented on SOLR-8339: With this change, can we remove {{org.apache.solr.client.solrj.util.ClientUtils#toSolrInputDocument}} method? And {{org.apache.solr.client.solrj.util.ClientUtils#toSolrDocument}} ? > SolrDocument and SolrInputDocument should have a common interface > - > > Key: SOLR-8339 > URL: https://issues.apache.org/jira/browse/SOLR-8339 > Project: Solr > Issue Type: Bug >Reporter: Ishan Chattopadhyaya > Attachments: SOLR-8339.patch, SOLR-8339.patch, SOLR-8339.patch > > > Currently, both share a Map interface (SOLR-928). However, there are many > common methods like createField(), setField() etc. that should perhaps go > into an interface/abstract class. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8345) Wrong query parsing
[ https://issues.apache.org/jira/browse/SOLR-8345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15029175#comment-15029175 ] Ahmet Arslan commented on SOLR-8345: Saar, this is not a bug. Minus is a special character for the query parser. Other query parsers, for example the terms query parser or the raw query parser, can be used to query values that contain such special characters. Alternatively, you can escape those special characters. Please raise your questions on the Solr mailing list. > Wrong query parsing > --- > > Key: SOLR-8345 > URL: https://issues.apache.org/jira/browse/SOLR-8345 > Project: Solr > Issue Type: Bug >Reporter: Saar Carmi >Priority: Minor > > When sending a query for a numeric field with =myfield:(-1) the query is > parsed as -myfield:1 > I would expect it to be either parsed as myfield:"-1" or an exception to be > returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
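To illustrate the escaping option mentioned in the SOLR-8345 comment, here is a minimal, dependency-free sketch that backslash-escapes characters the standard query parser treats as syntax. The class name, method name, and the exact character set are assumptions made for the example. SolrJ ships a ready-made {{ClientUtils.escapeQueryChars}} for this purpose, and real client code would normally call that instead of a hand-written helper.

```java
// Hypothetical stand-in for an escaping utility; the character list is an
// assumption and may not match every parser version exactly.
final class QueryEscapeSketch {

    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    // Prefix every special character (and whitespace) with a backslash so the
    // query parser treats it as literal text rather than as an operator.
    static String escape(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() + 4);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The leading minus is escaped, so it is no longer read as a NOT operator.
        System.out.println("myfield:" + escape("-1"));
    }
}
```

With the minus escaped, the value is passed to the field as the literal {{-1}} instead of negating the clause.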
[jira] [Updated] (LUCENE-6818) Implementing Divergence from Independence (DFI) Term-Weighting for Lucene/Solr
[ https://issues.apache.org/jira/browse/LUCENE-6818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated LUCENE-6818: - Attachment: LUCENE-6818.patch Patch updated to current trunk (revision 1713433) > Implementing Divergence from Independence (DFI) Term-Weighting for Lucene/Solr > -- > > Key: LUCENE-6818 > URL: https://issues.apache.org/jira/browse/LUCENE-6818 > Project: Lucene - Core > Issue Type: New Feature > Components: core/query/scoring >Affects Versions: 5.3 >Reporter: Ahmet Arslan >Assignee: Robert Muir >Priority: Minor > Labels: similarity > Fix For: Trunk > > Attachments: LUCENE-6818.patch, LUCENE-6818.patch, LUCENE-6818.patch, > LUCENE-6818.patch, LUCENE-6818.patch > > > As explained in the > [write-up|http://lucidworks.com/blog/flexible-ranking-in-lucene-4], many > state-of-the-art ranking model implementations are added to Apache Lucene. > This issue aims to include DFI model, which is the non-parametric counterpart > of the Divergence from Randomness (DFR) framework. > DFI is both parameter-free and non-parametric: > * parameter-free: it does not require any parameter tuning or training. > * non-parametric: it does not make any assumptions about word frequency > distributions on document collections. > It is highly recommended *not* to remove stopwords (very common terms: the, > of, and, to, a, in, for, is, on, that, etc) with this similarity. > For more information see: [A nonparametric term weighting method for > information retrieval based on measuring the divergence from > independence|http://dx.doi.org/10.1007/s10791-013-9225-4] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6818) Implementing Divergence from Independence (DFI) Term-Weighting for Lucene/Solr
[ https://issues.apache.org/jira/browse/LUCENE-6818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated LUCENE-6818: - Attachment: LUCENE-6818.patch I tried to implement Robert's suggestion at {{TestSimilarityBase#testCrazyIndexTimeBoosts}}. It iterates over all possible norm values and 10 different term frequency _tf_ values. NaN, Infinity, and negative values are checked, but I am not sure about the negative check. Some models can return negative scores for certain terms. For example, BM25 returns negative scores for common terms. Currently only DFI is tested, because other models fail the test in its current form. A random question: What is the preferred course of action during scoring when term frequency is greater than document length? I think we should simply recommend using index-time boosts only with ClassicSimilarity. I wonder how SweetSpotSimilarity works with index-time boosts, where artificially shortening the document length may decrease its rank. > Implementing Divergence from Independence (DFI) Term-Weighting for Lucene/Solr > -- > > Key: LUCENE-6818 > URL: https://issues.apache.org/jira/browse/LUCENE-6818 > Project: Lucene - Core > Issue Type: New Feature > Components: core/query/scoring >Affects Versions: 5.3 >Reporter: Ahmet Arslan >Assignee: Robert Muir >Priority: Minor > Labels: similarity > Fix For: Trunk > > Attachments: LUCENE-6818.patch, LUCENE-6818.patch, LUCENE-6818.patch, > LUCENE-6818.patch > > > As explained in the > [write-up|http://lucidworks.com/blog/flexible-ranking-in-lucene-4], many > state-of-the-art ranking model implementations are added to Apache Lucene. > This issue aims to include DFI model, which is the non-parametric counterpart > of the Divergence from Randomness (DFR) framework. > DFI is both parameter-free and non-parametric: > * parameter-free: it does not require any parameter tuning or training. 
> * non-parametric: it does not make any assumptions about word frequency > distributions on document collections. > It is highly recommended *not* to remove stopwords (very common terms: the, > of, and, to, a, in, for, is, on, that, etc) with this similarity. > For more information see: [A nonparametric term weighting method for > information retrieval based on measuring the divergence from > independence|http://dx.doi.org/10.1007/s10791-013-9225-4] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6818) Implementing Divergence from Independence (DFI) Term-Weighting for Lucene/Solr
[ https://issues.apache.org/jira/browse/LUCENE-6818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated LUCENE-6818: - Attachment: LUCENE-6818.patch * renamed failing test to {{TestSimilarityBase#testIndexTimeBoost}} * randomized the test method a bit > Implementing Divergence from Independence (DFI) Term-Weighting for Lucene/Solr > -- > > Key: LUCENE-6818 > URL: https://issues.apache.org/jira/browse/LUCENE-6818 > Project: Lucene - Core > Issue Type: New Feature > Components: core/query/scoring >Affects Versions: 5.3 >Reporter: Ahmet Arslan >Assignee: Robert Muir >Priority: Minor > Labels: similarity > Fix For: Trunk > > Attachments: LUCENE-6818.patch, LUCENE-6818.patch, LUCENE-6818.patch > > > As explained in the > [write-up|http://lucidworks.com/blog/flexible-ranking-in-lucene-4], many > state-of-the-art ranking model implementations are added to Apache Lucene. > This issue aims to include DFI model, which is the non-parametric counterpart > of the Divergence from Randomness (DFR) framework. > DFI is both parameter-free and non-parametric: > * parameter-free: it does not require any parameter tuning or training. > * non-parametric: it does not make any assumptions about word frequency > distributions on document collections. > It is highly recommended *not* to remove stopwords (very common terms: the, > of, and, to, a, in, for, is, on, that, etc) with this similarity. > For more information see: [A nonparametric term weighting method for > information retrieval based on measuring the divergence from > independence|http://dx.doi.org/10.1007/s10791-013-9225-4] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6818) Implementing Divergence from Independence (DFI) Term-Weighting for Lucene/Solr
[ https://issues.apache.org/jira/browse/LUCENE-6818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated LUCENE-6818: - Attachment: LUCENE-6818.patch This patch prevents an infinite score by using the +1 trick. Now {{TestSimilarity2#testCrazySpans}} passes. > Implementing Divergence from Independence (DFI) Term-Weighting for Lucene/Solr > -- > > Key: LUCENE-6818 > URL: https://issues.apache.org/jira/browse/LUCENE-6818 > Project: Lucene - Core > Issue Type: New Feature > Components: core/query/scoring >Affects Versions: 5.3 >Reporter: Ahmet Arslan >Assignee: Robert Muir >Priority: Minor > Labels: similarity > Fix For: Trunk > > Attachments: LUCENE-6818.patch, LUCENE-6818.patch > > > As explained in the > [write-up|http://lucidworks.com/blog/flexible-ranking-in-lucene-4], many > state-of-the-art ranking model implementations are added to Apache Lucene. > This issue aims to include DFI model, which is the non-parametric counterpart > of the Divergence from Randomness (DFR) framework. > DFI is both parameter-free and non-parametric: > * parameter-free: it does not require any parameter tuning or training. > * non-parametric: it does not make any assumptions about word frequency > distributions on document collections. > It is highly recommended *not* to remove stopwords (very common terms: the, > of, and, to, a, in, for, is, on, that, etc) with this similarity. > For more information see: [A nonparametric term weighting method for > information retrieval based on measuring the divergence from > independence|http://dx.doi.org/10.1007/s10791-013-9225-4]
[jira] [Commented] (LUCENE-6818) Implementing Divergence from Independence (DFI) Term-Weighting for Lucene/Solr
[ https://issues.apache.org/jira/browse/LUCENE-6818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14910174#comment-14910174 ] Ahmet Arslan commented on LUCENE-6818: -- bq. The typical solution is to do something like adjust expected: Thanks, Robert, for the suggestion and explanation. I used the typical solution and it's working now. bq. I have not read the paper, but these are things to deal with when integrating into lucene. For your information, if you want to take a look, the Terrier 4.0 source tree has this model in DFIC.java. bq. index-time boosts work on the norm, by making the document appear shorter or longer, so docLen might have a "crazy" value if the user does this. I was relying on {{o.a.l.search.similarities.SimilarityBase}} for this, but it looks like all of its subclasses (DFR, IB) have this problem. I included a {{TestSimilarityBase#testNorms}} method in the new patch to demonstrate the problem. If I am not missing something obvious, this is a bug, no? > Implementing Divergence from Independence (DFI) Term-Weighting for Lucene/Solr > -- > > Key: LUCENE-6818 > URL: https://issues.apache.org/jira/browse/LUCENE-6818 > Project: Lucene - Core > Issue Type: New Feature > Components: core/query/scoring >Affects Versions: 5.3 >Reporter: Ahmet Arslan >Assignee: Robert Muir >Priority: Minor > Labels: similarity > Fix For: Trunk > > Attachments: LUCENE-6818.patch, LUCENE-6818.patch > > > As explained in the > [write-up|http://lucidworks.com/blog/flexible-ranking-in-lucene-4], many > state-of-the-art ranking model implementations are added to Apache Lucene. > This issue aims to include DFI model, which is the non-parametric counterpart > of the Divergence from Randomness (DFR) framework. > DFI is both parameter-free and non-parametric: > * parameter-free: it does not require any parameter tuning or training. > * non-parametric: it does not make any assumptions about word frequency > distributions on document collections. > It is highly recommended *not* to remove stopwords (very common terms: the, > of, and, to, a, in, for, is, on, that, etc) with this similarity. > For more information see: [A nonparametric term weighting method for > information retrieval based on measuring the divergence from > independence|http://dx.doi.org/10.1007/s10791-013-9225-4]
[jira] [Updated] (LUCENE-6817) ComplexPhraseQueryParser.ComplexPhraseQuery does not display slop in toString()
[ https://issues.apache.org/jira/browse/LUCENE-6817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated LUCENE-6817: - Attachment: LUCENE-6817.patch > ComplexPhraseQueryParser.ComplexPhraseQuery does not display slop in > toString() > --- > > Key: LUCENE-6817 > URL: https://issues.apache.org/jira/browse/LUCENE-6817 > Project: Lucene - Core > Issue Type: Bug >Reporter: Dawid Weiss >Priority: Trivial > Fix For: Trunk > > Attachments: LUCENE-6817.patch, LUCENE-6817.patch > > > This one is quite simple (I think) -- ComplexPhraseQuery doesn't display the > slop factor which, when the result of parsing is dumped to logs, for example, > can be confusing. > I'm heading for a weekend out of office in a few hours... so in the spirit of > not committing and running away ( :) ), if anybody wishes to tackle this, go > ahead. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6817) ComplexPhraseQueryParser.ComplexPhraseQuery does not display slop in toString()
[ https://issues.apache.org/jira/browse/LUCENE-6817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated LUCENE-6817: - Attachment: LUCENE-6817.patch > ComplexPhraseQueryParser.ComplexPhraseQuery does not display slop in > toString() > --- > > Key: LUCENE-6817 > URL: https://issues.apache.org/jira/browse/LUCENE-6817 > Project: Lucene - Core > Issue Type: Bug >Reporter: Dawid Weiss >Priority: Trivial > Fix For: Trunk > > Attachments: LUCENE-6817.patch > > > This one is quite simple (I think) -- ComplexPhraseQuery doesn't display the > slop factor which, when the result of parsing is dumped to logs, for example, > can be confusing. > I'm heading for a weekend out of office in a few hours... so in the spirit of > not committing and running away ( :) ), if anybody wishes to tackle this, go > ahead. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-6818) Implementing Divergence from Independence (DFI) Term-Weighting for Lucene/Solr
Ahmet Arslan created LUCENE-6818: Summary: Implementing Divergence from Independence (DFI) Term-Weighting for Lucene/Solr Key: LUCENE-6818 URL: https://issues.apache.org/jira/browse/LUCENE-6818 Project: Lucene - Core Issue Type: New Feature Components: core/query/scoring Affects Versions: 5.3 Reporter: Ahmet Arslan Priority: Minor Fix For: Trunk As explained in the [write-up|http://lucidworks.com/blog/flexible-ranking-in-lucene-4], many state-of-the-art ranking model implementations are added to Apache Lucene. This issue aims to include DFI model, which is the non-parametric counterpart of the Divergence from Randomness (DFR) framework. DFI is both parameter-free and non-parametric: * parameter-free: it does not require any parameter tuning or training. * non-parametric: it does not make any assumptions about word frequency distributions on document collections. It is highly recommended *not* to remove stopwords (very common terms: the, of, and, to, a, in, for, is, on, that, etc) with this similarity. For more information see: [A nonparametric term weighting method for information retrieval based on measuring the divergence from independence|http://dx.doi.org/10.1007/s10791-013-9225-4] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
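As a rough illustration of the model described in this issue, the sketch below scores a term by how far its observed frequency departs from the frequency expected under independence. It assumes the standardized measure of independence from the cited paper; the class and method names are hypothetical, and the attached patch may be structured differently.

```java
// Hypothetical, self-contained sketch of DFI-style scoring (not the patch itself).
final class DfiSketch {
    private DfiSketch() {}

    /**
     * Frequency the term would be expected to have in a document of length
     * docLen if term occurrence were independent of the document.
     */
    static double expected(long totalTermFreq, long numberOfFieldTokens, double docLen) {
        return (double) totalTermFreq * docLen / numberOfFieldTokens;
    }

    /**
     * Score from the standardized divergence (freq - expected) / sqrt(expected).
     * Terms occurring no more often than expected contribute nothing.
     */
    static double score(double freq, double expected) {
        if (freq <= expected) {
            return 0.0;
        }
        double measure = (freq - expected) / Math.sqrt(expected);
        return Math.log(measure + 1.0) / Math.log(2.0); // log2(measure + 1)
    }

    public static void main(String[] args) {
        // Assumed toy statistics: 100 occurrences in a field of 1000 tokens.
        double e = expected(100, 1000, 10.0);
        System.out.println("expected = " + e);
        System.out.println("score(tf=5) = " + score(5.0, e));
        System.out.println("score(tf=1) = " + score(1.0, e));
    }
}
```

Note that the sketch has no tunable constant, which is what "parameter-free" refers to above.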
[jira] [Updated] (LUCENE-6818) Implementing Divergence from Independence (DFI) Term-Weighting for Lucene/Solr
[ https://issues.apache.org/jira/browse/LUCENE-6818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated LUCENE-6818: - Attachment: LUCENE-6818.patch Patch for DFI. However, with this one {{TestSimilarity2#testCrazySpans}} fails. Any pointers on how to fix this would be really appreciated. > Implementing Divergence from Independence (DFI) Term-Weighting for Lucene/Solr > -- > > Key: LUCENE-6818 > URL: https://issues.apache.org/jira/browse/LUCENE-6818 > Project: Lucene - Core > Issue Type: New Feature > Components: core/query/scoring >Affects Versions: 5.3 >Reporter: Ahmet Arslan >Priority: Minor > Labels: similarity > Fix For: Trunk > > Attachments: LUCENE-6818.patch > > > As explained in the > [write-up|http://lucidworks.com/blog/flexible-ranking-in-lucene-4], many > state-of-the-art ranking model implementations are added to Apache Lucene. > This issue aims to include DFI model, which is the non-parametric counterpart > of the Divergence from Randomness (DFR) framework. > DFI is both parameter-free and non-parametric: > * parameter-free: it does not require any parameter tuning or training. > * non-parametric: it does not make any assumptions about word frequency > distributions on document collections. > It is highly recommended *not* to remove stopwords (very common terms: the, > of, and, to, a, in, for, is, on, that, etc) with this similarity. > For more information see: [A nonparametric term weighting method for > information retrieval based on measuring the divergence from > independence|http://dx.doi.org/10.1007/s10791-013-9225-4]
[jira] [Created] (LUCENE-6730) Hyper-parameter c is ignored in term frequency NormalizationH1
Ahmet Arslan created LUCENE-6730: Summary: Hyper-parameter c is ignored in term frequency NormalizationH1 Key: LUCENE-6730 URL: https://issues.apache.org/jira/browse/LUCENE-6730 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 5.2.1 Reporter: Ahmet Arslan Priority: Minor Fix For: 5.4 Unlike {{NormalizationH2}}, the *c* parameter is not used in the term frequency calculation in {{NormalizationH1}}.
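To make the report concrete, here is a minimal sketch of the two length normalizations. It assumes H1 scales the term frequency by avgLen / len and H2 by log2(1 + c * avgLen / len); treating c as a plain multiplying factor in H1 is one plausible fix and is an assumption here, not a statement of what the attached patch does. All names are hypothetical.

```java
// Hypothetical sketch contrasting H1 with and without the hyper-parameter c.
final class NormalizationSketch {
    private NormalizationSketch() {}

    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /** H1 as reported: c never enters the computation. */
    static double h1IgnoringC(double tf, double avgLen, double len) {
        return tf * avgLen / len;
    }

    /** H1 with c applied as a multiplying factor (assumed form of the fix). */
    static double h1(double tf, double avgLen, double len, double c) {
        return tf * c * avgLen / len;
    }

    /** H2, where c already influences the normalized term frequency. */
    static double h2(double tf, double avgLen, double len, double c) {
        return tf * log2(1.0 + c * avgLen / len);
    }

    public static void main(String[] args) {
        double tf = 2.0, avgLen = 10.0, len = 20.0;
        for (double c : new double[] {1.0, 2.0}) {
            System.out.println("c=" + c
                + " h1IgnoringC=" + h1IgnoringC(tf, avgLen, len)
                + " h1=" + h1(tf, avgLen, len, c)
                + " h2=" + h2(tf, avgLen, len, c));
        }
    }
}
```

With c = 1 the two H1 variants agree, so the difference only shows up once c is actually tuned.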
[jira] [Updated] (LUCENE-6730) Hyper-parameter c is ignored in term frequency NormalizationH1
[ https://issues.apache.org/jira/browse/LUCENE-6730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated LUCENE-6730: - Attachment: LUCENE-6730.patch Hyper-parameter c is ignored in term frequency NormalizationH1 -- Key: LUCENE-6730 URL: https://issues.apache.org/jira/browse/LUCENE-6730 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 5.2.1 Reporter: Ahmet Arslan Priority: Minor Labels: similarity Fix For: 5.4 Attachments: LUCENE-6730.patch Unlike {{NormalizationH2}}, the *c* parameter is not used in the term frequency calculation in {{NormalizationH1}}.
[jira] [Updated] (LUCENE-6711) Instead of docCount(), maxDoc() is used for numberOfDocuments in SimilarityBase
[ https://issues.apache.org/jira/browse/LUCENE-6711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated LUCENE-6711: - Attachment: LUCENE-6711.patch Patch that includes the following migrate entry. I am not sure this is appropriate text for migrate.txt, though. {panel:title=The way the number of documents is calculated has changed (LUCENE-6711)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1|bgColor=#CE} The number of documents (docCount) is used to calculate term specificity (idf) and average document length (avdl). Prior to LUCENE-6711, collectionStats.maxDoc() was used for these statistics. Now collectionStats.docCount() is used whenever possible; if it is not available, maxDoc() is used. Assume that a collection contains 100 documents and 50 of them have a keywords field. In this example, maxDoc is 100 while docCount is 50 for the keywords field. The total number of tokens for the keywords field is divided by docCount to obtain avdl. Therefore docCount, which is the total number of documents that have at least one term for the field, is a more precise metric for optional fields. DefaultSimilarity does not leverage avdl, so this change should have a relatively minor effect on the result list, because the relative idf values of terms will remain the same. However, when combined with other factors such as term frequency, the relative ranking of documents could change. Some Similarity implementations (such as the ones instantiated with NormalizationH2 and BM25) take avdl into account and would show a notable change in the ranked list, especially if you have a collection of documents with varying lengths, because NormalizationH2 tends to punish documents longer than avdl. {panel} Instead of docCount(), maxDoc() is used for numberOfDocuments in SimilarityBase --- Key: LUCENE-6711 URL: https://issues.apache.org/jira/browse/LUCENE-6711 Project: Lucene - Core Issue Type: Bug Components: core/search Reporter: Ahmet Arslan Priority: Minor Fix For: Trunk Attachments: LUCENE-6711.patch, LUCENE-6711.patch, LUCENE-6711.patch, LUCENE-6711.patch {{SimilarityBase.java}} has the following line : {code} long numberOfDocuments = collectionStats.maxDoc(); {code} It seems like {{collectionStats.docCount()}}, which returns the total number of documents that have at least one term for this field, is more appropriate statistics here.
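The arithmetic behind the migrate entry can be worked through with a small sketch. The 100/50 document counts come from the example in the entry; the total of 200 tokens in the keywords field, the document frequencies, and the idf form ln(numDocs / (docFreq + 1)) + 1 are assumptions made only for illustration, and every name is hypothetical.

```java
// Hypothetical worked example: maxDoc versus docCount for an optional field.
final class DocCountExample {
    private DocCountExample() {}

    /** Average document length: total tokens in the field divided by the document count. */
    static double avdl(long sumTotalTermFreq, long numDocs) {
        return (double) sumTotalTermFreq / numDocs;
    }

    /** Assumed idf form, used only to compare two terms under the same numDocs. */
    static double idf(long docFreq, long numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        long maxDoc = 100;       // all documents in the collection
        long docCount = 50;      // documents that have a keywords field
        long totalTokens = 200;  // assumed token total for the keywords field

        System.out.println("avdl with maxDoc   = " + avdl(totalTokens, maxDoc));
        System.out.println("avdl with docCount = " + avdl(totalTokens, docCount));

        // The idf gap between two terms depends only on their docFreq values,
        // so switching numDocs shifts both idf values by the same amount.
        System.out.println("idf gap with maxDoc   = " + (idf(4, maxDoc) - idf(9, maxDoc)));
        System.out.println("idf gap with docCount = " + (idf(4, docCount) - idf(9, docCount)));
    }
}
```

Under these assumed numbers the average length doubles when docCount replaces maxDoc, while the idf gap between the two terms stays the same, which matches the entry's claim that length-normalizing similarities are affected more.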
[jira] [Updated] (LUCENE-6711) Instead of docCount(), maxDoc() is used for numberOfDocuments in SimilarityBase
[ https://issues.apache.org/jira/browse/LUCENE-6711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated LUCENE-6711: - Attachment: LUCENE-6711.patch Includes changes to TFIDF and BM25, {{ant clean test}} passes. Instead of docCount(), maxDoc() is used for numberOfDocuments in SimilarityBase --- Key: LUCENE-6711 URL: https://issues.apache.org/jira/browse/LUCENE-6711 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 5.2.1 Reporter: Ahmet Arslan Priority: Minor Fix For: 5.3 Attachments: LUCENE-6711.patch, LUCENE-6711.patch, LUCENE-6711.patch {{SimilarityBase.java}} has the following line : {code} long numberOfDocuments = collectionStats.maxDoc(); {code} It seems like {{collectionStats.docCount()}}, which returns the total number of documents that have at least one term for this field, is more appropriate statistics here. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6711) Instead of docCount(), maxDoc() is used for numberOfDocuments in SimilarityBase
[ https://issues.apache.org/jira/browse/LUCENE-6711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated LUCENE-6711: - Attachment: LUCENE-6711.patch This patch checks for -1 and uses maxDoc() if docCount() is not supported. Instead of docCount(), maxDoc() is used for numberOfDocuments in SimilarityBase --- Key: LUCENE-6711 URL: https://issues.apache.org/jira/browse/LUCENE-6711 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 5.2.1 Reporter: Ahmet Arslan Priority: Minor Fix For: 5.3 Attachments: LUCENE-6711.patch, LUCENE-6711.patch {{SimilarityBase.java}} has the following line : {code} long numberOfDocuments = collectionStats.maxDoc(); {code} It seems like {{collectionStats.docCount()}}, which returns the total number of documents that have at least one term for this field, is more appropriate statistics here.
[jira] [Commented] (LUCENE-6711) Instead of docCount(), maxDoc() is used for numberOfDocuments in SimilarityBase
[ https://issues.apache.org/jira/browse/LUCENE-6711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651758#comment-14651758 ] Ahmet Arslan commented on LUCENE-6711: -- bq. We should fix TFIDFSimilarity and BM25Similarity too. For TFIDF and BM25, do we simply replace {code}collectionStats.maxDoc(){code} with {code}collectionStats.docCount() == -1 ? collectionStats.maxDoc() : collectionStats.docCount(){code}? Instead of docCount(), maxDoc() is used for numberOfDocuments in SimilarityBase --- Key: LUCENE-6711 URL: https://issues.apache.org/jira/browse/LUCENE-6711 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 5.2.1 Reporter: Ahmet Arslan Priority: Minor Fix For: 5.3 Attachments: LUCENE-6711.patch, LUCENE-6711.patch {{SimilarityBase.java}} has the following line : {code} long numberOfDocuments = collectionStats.maxDoc(); {code} It seems like {{collectionStats.docCount()}}, which returns the total number of documents that have at least one term for this field, is more appropriate statistics here.
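The replacement proposed in this comment can be isolated into a small helper. The sketch below takes the two statistics as plain long values so that it compiles without Lucene on the classpath; the class and method names are hypothetical, and in real code the values would come from collectionStats.docCount() and collectionStats.maxDoc().

```java
// Hypothetical helper mirroring the ternary proposed in the comment.
final class DocCountFallback {
    private DocCountFallback() {}

    /**
     * Prefers the per-field document count and falls back to maxDoc when
     * docCount is reported as -1 (the "not available" value discussed in this thread).
     */
    static long numberOfDocuments(long docCount, long maxDoc) {
        return docCount == -1 ? maxDoc : docCount;
    }

    public static void main(String[] args) {
        System.out.println(numberOfDocuments(50, 100)); // field statistics available
        System.out.println(numberOfDocuments(-1, 100)); // statistics unavailable, fall back
    }
}
```

Keeping the fallback in one place would let SimilarityBase, TFIDF and BM25 share the same rule instead of repeating the ternary.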
[jira] [Updated] (LUCENE-6711) Instead of docCount(), maxDoc() is used for numberOfDocuments in SimilarityBase
[ https://issues.apache.org/jira/browse/LUCENE-6711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated LUCENE-6711: - Attachment: LUCENE-6711.patch Patch that includes suggested change. However, this breaks most of the tests in {{TestSimilarityBase}}. What is the preferred course of action here? Instead of docCount(), maxDoc() is used for numberOfDocuments in SimilarityBase --- Key: LUCENE-6711 URL: https://issues.apache.org/jira/browse/LUCENE-6711 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 5.2.1 Reporter: Ahmet Arslan Priority: Minor Fix For: 5.3 Attachments: LUCENE-6711.patch {{SimilarityBase.java}} has the following line : {code} long numberOfDocuments = collectionStats.maxDoc(); {code} It seems like {{collectionStats.docCount()}}, which returns the total number of documents that have at least one term for this field, is more appropriate statistics here. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org