[jira] [Commented] (SOLR-10308) Solr fails to work with Guava 21.0

2018-02-08 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357783#comment-16357783
 ] 

Ahmet Arslan commented on SOLR-10308:
-

{quote}I don't know if it was correct to assume UTF-8 on the hashString usages
{quote}
I believe it should be as follows; I confirmed this earlier in SOLR-11260:
 * HashFunction.hashString -> HashFunction.hashUnencodedChars
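To illustrate why the choice of replacement matters: as far as I understand the Guava documentation, {{hashUnencodedChars}} feeds each {{char}} to the hasher as raw bytes without applying any charset, whereas {{hashString(input, charset)}} hashes the encoded bytes. The JDK-only sketch below does not call Guava; the class and method names are hypothetical, and UTF-16LE is used as an assumed stand-in for the unencoded char bytes. It only shows that the two byte streams differ, so the resulting hashes would generally differ as well.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration; it does not call Guava itself.
public class HashInputDemo {

    // Bytes a hasher would see when the string is first encoded as UTF-8.
    static byte[] utf8Bytes(String input) {
        return input.getBytes(StandardCharsets.UTF_8);
    }

    // Assumption: unencoded chars are fed as two bytes per char, low byte first,
    // which corresponds to the UTF-16LE encoding of the string.
    static byte[] unencodedCharBytes(String input) {
        return input.getBytes(StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        String key = "solr";
        // The two arrays differ in length and content, so hashing them
        // would generally give different values.
        System.out.println(Arrays.toString(utf8Bytes(key)));
        System.out.println(Arrays.toString(unencodedCharBytes(key)));
    }
}
```

If existing hash values must stay stable across the upgrade, the replacement has to hash the same bytes the old call did, which is the point of preferring {{hashUnencodedChars}} over assuming UTF-8.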

> Solr fails to work with Guava 21.0
> --
>
> Key: SOLR-10308
> URL: https://issues.apache.org/jira/browse/SOLR-10308
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: highlighter
>Affects Versions: 6.4.2
>Reporter: Vincent Massol
>Priority: Major
> Attachments: SOLR-10308.patch
>
>
> This is what we get:
> {noformat}
> Caused by: java.lang.NoSuchMethodError: com.google.common.base.Objects.firstNonNull(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
>   at org.apache.solr.handler.component.HighlightComponent.prepare(HighlightComponent.java:118)
>   at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:269)
>   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:166)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2299)
>   at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:178)
>   at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:149)
>   at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:942)
>   at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:957)
>   at org.xwiki.search.solr.internal.AbstractSolrInstance.query(AbstractSolrInstance.java:117)
>   at org.xwiki.query.solr.internal.SolrQueryExecutor.execute(SolrQueryExecutor.java:122)
>   at org.xwiki.query.internal.DefaultQueryExecutorManager.execute(DefaultQueryExecutorManager.java:72)
>   at org.xwiki.query.internal.SecureQueryExecutorManager.execute(SecureQueryExecutorManager.java:67)
>   at org.xwiki.query.internal.DefaultQuery.execute(DefaultQuery.java:287)
>   at org.xwiki.query.internal.ScriptQuery.execute(ScriptQuery.java:237)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.velocity.util.introspection.UberspectImpl$VelMethodImpl.doInvoke(UberspectImpl.java:395)
>   at org.apache.velocity.util.introspection.UberspectImpl$VelMethodImpl.invoke(UberspectImpl.java:384)
>   at org.apache.velocity.runtime.parser.node.ASTMethod.execute(ASTMethod.java:173)
>   ... 183 more
> {noformat}
> Guava 21 has removed some method signatures that Solr currently uses.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-10308) Solr fails to work with Guava 21.0

2018-01-16 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327479#comment-16327479
 ] 

Ahmet Arslan commented on SOLR-10308:
-

Unfortunately, the patch affects the following HDFS test suites, which stall:
{noformat}
   [junit4] HEARTBEAT J1 PID(26866@...): 2018-01-16T19:27:40, stalled for 7988s at: HdfsNNFailoverTest (suite)
   [junit4] HEARTBEAT J2 PID(26860@...): 2018-01-16T19:27:40, stalled for 8591s at: StressHdfsTest (suite)
   [junit4] HEARTBEAT J0 PID(26869@...): 2018-01-16T19:28:04, stalled for 7141s at: MoveReplicaHDFSFailoverTest (suite)
   [junit4] HEARTBEAT J3 PID(26880@...): 2018-01-16T19:28:04, stalled for 8352s at: CheckHdfsIndexTest (suite)
   [junit4] HEARTBEAT J1 PID(26866@...): 2018-01-16T19:28:40, stalled for 8048s at: HdfsNNFailoverTest (suite)
   [junit4] HEARTBEAT J2 PID(26860@...): 2018-01-16T19:28:40, stalled for 8651s at: StressHdfsTest (suite)
   [junit4] HEARTBEAT J0 PID(26869@...): 2018-01-16T19:29:04, stalled for 7201s at: MoveReplicaHDFSFailoverTest (suite)
   [junit4] HEARTBEAT J3 PID(26880@...): 2018-01-16T19:29:04, stalled for 8412s at: CheckHdfsIndexTest (suite)
{noformat}







[jira] [Commented] (SOLR-10308) Solr fails to work with Guava 21.0

2018-01-16 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327071#comment-16327071
 ] 

Ahmet Arslan commented on SOLR-10308:
-

{quote} relocate the version of guava used{quote}
[~mdrob] can you please provide some context/pointers? I am not familiar with 
the relocate concept.







[jira] [Commented] (SOLR-10308) Solr fails to work with Guava 21.0

2018-01-16 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327068#comment-16327068
 ] 

Ahmet Arslan commented on SOLR-10308:
-

{{ant precommit}} passed. I am running {{ant test}} now, but it looks like it 
has hung:

{noformat}
   [junit4] HEARTBEAT J0 PID(2348@...): 2018-01-16T15:19:55, stalled for 28263s at: HdfsDirectoryTest (suite)
   [junit4] HEARTBEAT J0 PID(2348@...): 2018-01-16T15:20:55, stalled for 28323s at: HdfsDirectoryTest (suite)
   [junit4] HEARTBEAT J0 PID(2348@...): 2018-01-16T15:21:55, stalled for 28383s at: HdfsDirectoryTest (suite)
{noformat}

I am not sure whether this is due to the Guava update. I will try to figure 
it out.







[jira] [Commented] (SOLR-10308) Solr fails to work with Guava 21.0

2018-01-15 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16326657#comment-16326657
 ] 

Ahmet Arslan commented on SOLR-10308:
-

{quote}Fixing the three guava usages in Solr code that are incompatible with 
version 21 should be pretty easy
{quote}
I have a patch in SOLR-11260 for this. I needed it for another reason: to be 
able to use a third-party NLP library for my [Turkish analysis 
plugin|https://github.com/iorixxx/lucene-solr-analysis-turkish]. A drop-in 
upgrade of Guava breaks highlighting. I patched Solr 6.6.0 and am using it in 
a production-like environment. Does this patch solve [~vmassol]'s problem? 
Does it break any existing functionality?







[jira] [Assigned] (SOLR-11260) Update Guava to 23.0

2018-01-11 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan reassigned SOLR-11260:
---

Assignee: Ahmet Arslan

> Update Guava to 23.0
> 
>
> Key: SOLR-11260
> URL: https://issues.apache.org/jira/browse/SOLR-11260
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 6.6
>Reporter: Ahmet Arslan
>Assignee: Ahmet Arslan
>Priority: Minor
> Fix For: master (8.0)
>
> Attachments: SOLR-11260.patch
>
>
> Solr 6.6.0 depends on a pretty old version of Guava.






[jira] [Updated] (SOLR-11260) Update Guava to 23.0

2017-08-19 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated SOLR-11260:

Attachment: SOLR-11260.patch

The patch replaces two methods that have been removed as of Guava 23.0:
* Objects.firstNonNull -> MoreObjects.firstNonNull
* HashFunction.hashString -> HashFunction.hashUnencodedChars
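As a rough illustration of the first replacement: {{MoreObjects.firstNonNull(first, second)}} returns the first argument if it is non-null, otherwise the second, and fails if both are null. The JDK-only sketch below mirrors that behaviour in a local helper so it runs without Guava on the classpath; the class name and the sample values are hypothetical, and the helper is a stand-in, not the code in the patch.

```java
import java.util.Objects;

// JDK-only stand-in that mirrors the behaviour of firstNonNull.
// In patched code the call site would instead use
// com.google.common.base.MoreObjects.firstNonNull(first, second).
public class FirstNonNullDemo {

    static <T> T firstNonNull(T first, T second) {
        if (first != null) {
            return first;
        }
        // Both arguments being null is treated as a programming error.
        return Objects.requireNonNull(second, "Both parameters are null");
    }

    public static void main(String[] args) {
        String configured = null; // e.g. a parameter that was not set
        System.out.println(firstNonNull(configured, "default"));
    }
}
```

Because the method only moved between classes and kept its signature, the change at each call site should amount to an import and a class-name swap.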







[jira] [Created] (SOLR-11260) Update Guava to 23.0

2017-08-19 Thread Ahmet Arslan (JIRA)
Ahmet Arslan created SOLR-11260:
---

 Summary: Update Guava to 23.0
 Key: SOLR-11260
 URL: https://issues.apache.org/jira/browse/SOLR-11260
 Project: Solr
  Issue Type: Task
  Security Level: Public (Default Security Level. Issues are Public)
Affects Versions: 6.6
Reporter: Ahmet Arslan
Priority: Minor
 Fix For: master (8.0)


Solr 6.6.0 depends on a pretty old version of Guava.






[jira] [Updated] (LUCENE-7585) Interface for common parameters used across analysis factories

2017-07-03 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated LUCENE-7585:
-
Attachment: LUCENE-7585.patch

Fix TestStopFilterFactory and TestSuggestStopFilterFactory failures

> Interface for common parameters used across analysis factories
> --
>
> Key: LUCENE-7585
> URL: https://issues.apache.org/jira/browse/LUCENE-7585
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 6.3
>Reporter: Ahmet Arslan
>Assignee: David Smiley
>Priority: Minor
> Fix For: 7.0
>
> Attachments: LUCENE-7585.patch, LUCENE-7585.patch, LUCENE-7585.patch, 
> LUCENE-7585.patch, LUCENE-7585.patch, LUCENE-7585.patch, LUCENE-7585.patch, 
> LUCENE-7585.patch
>
>
> Certain parameters (String constants) are common to multiple analysis 
> factories. Some examples are {{ignoreCase}}, {{dictionary}}, and 
> {{preserveOriginal}}. These string constants are handled inconsistently in 
> different factories. This is an effort to define the most common constants in 
> the {{CommonAnalysisFactoryParams}} interface and reuse them.
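A minimal sketch of the idea described above, assuming the interface simply holds the shared parameter names as constants. Only the three parameter strings come from the issue text; the constant names, the toy factory, and the argument handling are hypothetical and are not taken from the patch.

```java
import java.util.HashMap;
import java.util.Map;

public class CommonParamsDemo {

    // Hypothetical shape of the shared interface; constant names are assumptions.
    interface CommonAnalysisFactoryParams {
        String IGNORE_CASE = "ignoreCase";
        String DICTIONARY = "dictionary";
        String PRESERVE_ORIGINAL = "preserveOriginal";
    }

    // A toy factory that reads its arguments through the shared constants
    // instead of declaring its own string literals.
    static class ToyFilterFactory implements CommonAnalysisFactoryParams {
        final boolean ignoreCase;
        final boolean preserveOriginal;

        ToyFilterFactory(Map<String, String> args) {
            this.ignoreCase = Boolean.parseBoolean(args.getOrDefault(IGNORE_CASE, "false"));
            this.preserveOriginal = Boolean.parseBoolean(args.getOrDefault(PRESERVE_ORIGINAL, "false"));
        }
    }

    public static void main(String[] args) {
        Map<String, String> factoryArgs = new HashMap<>();
        factoryArgs.put("ignoreCase", "true");
        ToyFilterFactory factory = new ToyFilterFactory(factoryArgs);
        System.out.println(factory.ignoreCase + " " + factory.preserveOriginal);
    }
}
```

With every factory referring to the same constant, a misspelled or differently cased parameter name becomes a compile-time concern rather than a per-factory inconsistency.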






[jira] [Commented] (LUCENE-7773) Remove unused/deprecated token types from StandardTokenizer

2017-07-03 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16072994#comment-16072994
 ] 

Ahmet Arslan commented on LUCENE-7773:
--

Can someone please look into this issue? This issue addresses a 
[TODO|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/analysis/standard/StandardTokenizer.java#L43]
 introduced by [~rcmuir] in 
[this|https://github.com/apache/lucene-solr/commit/bc3a3dc5d47af0c00748468b1ae14b4a18854366]
 commit.

> Remove unused/deprecated token types from StandardTokenizer
> ---
>
> Key: LUCENE-7773
> URL: https://issues.apache.org/jira/browse/LUCENE-7773
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 6.5
>Reporter: Ahmet Arslan
>Priority: Minor
>  Labels: analyzers
> Fix For: 7.0
>
> Attachments: LUCENE-7773.patch, LUCENE-7773.patch
>
>
> StandardTokenizer does not recognize e-mail, company etc. This issue removes 
> those token types.






[jira] [Updated] (LUCENE-7585) Interface for common parameters used across analysis factories

2017-07-03 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated LUCENE-7585:
-
Attachment: LUCENE-7585.patch

Sorry for the huge delay. This patch addresses the issues raised by David.

* consumeAllTokens is used by LimitTokenOffset and LimitTokenPosition too.
* applies Yonik's concept
* improved javadoc. Some arguments are difficult since they have different 
meanings in different components.
* covers a few more overlooked analysis factories
* spotted a copy-paste mistake

Any feedback is appreciated.







[jira] [Updated] (LUCENE-7585) Interface for common parameters used across analysis factories

2017-04-13 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated LUCENE-7585:
-
Attachment: LUCENE-7585.patch

Finally {{ant precommit}} passes with this patch. It checks missing javadocs 
using *level=package* for icu, morfologik, phonetic, and suggest.







[jira] [Updated] (LUCENE-7585) Interface for common parameters used across analysis factories

2017-04-13 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated LUCENE-7585:
-
Attachment: LUCENE-7585.patch

Bring the patch up to date with master.







[jira] [Updated] (LUCENE-7773) Remove unused/deprecated token types from StandardTokenizer

2017-04-09 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated LUCENE-7773:
-
Attachment: LUCENE-7773.patch

Make the {{TestAnalyzers}} compile again.







[jira] [Updated] (LUCENE-7773) Remove unused/deprecated token types from StandardTokenizer

2017-04-09 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated LUCENE-7773:
-
Attachment: LUCENE-7773.patch

This patch removes old types.




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-7773) Remove unused/deprecated token types from StandardTokenizer

2017-04-09 Thread Ahmet Arslan (JIRA)
Ahmet Arslan created LUCENE-7773:


 Summary: Remove unused/deprecated token types from 
StandardTokenizer
 Key: LUCENE-7773
 URL: https://issues.apache.org/jira/browse/LUCENE-7773
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 6.5
Reporter: Ahmet Arslan
Priority: Minor
 Fix For: master (7.0)


StandardTokenizer does not recognize e-mail, company etc. This issue removes 
those token types.






[jira] [Commented] (LUCENE-7602) Fix compiler warnings for ant clean compile

2016-12-27 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15780903#comment-15780903
 ] 

Ahmet Arslan commented on LUCENE-7602:
--

I think the current issue will clean up three previous issues. 

> Fix compiler warnings for ant clean compile
> ---
>
> Key: LUCENE-7602
> URL: https://issues.apache.org/jira/browse/LUCENE-7602
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>Priority: Minor
>  Labels: build
> Fix For: trunk
>
> Attachments: LUCENE-7602-ContextMap-lucene.patch, 
> LUCENE-7602-ContextMap-solr.patch, LUCENE-7602.patch, LUCENE-7602.patch
>
>







[jira] [Commented] (LUCENE-7602) Fix compiler warnings for ant clean compile

2016-12-25 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15777253#comment-15777253
 ] 

Ahmet Arslan commented on LUCENE-7602:
--

can't we just use Map instead of Map or ContextMap?








[jira] [Updated] (LUCENE-7585) Interface for common parameters used across analysis factories

2016-12-20 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated LUCENE-7585:
-
Attachment: LUCENE-7585.patch

Patch that adds javadocs. {{ant documentation-lint}} still fails for some 
reason that I cannot figure out.

> Interface for common parameters used across analysis factories
> --
>
> Key: LUCENE-7585
> URL: https://issues.apache.org/jira/browse/LUCENE-7585
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 6.3
>Reporter: Ahmet Arslan
>Assignee: David Smiley
>Priority: Minor
> Fix For: master (7.0)
>
> Attachments: LUCENE-7585.patch, LUCENE-7585.patch, LUCENE-7585.patch, 
> LUCENE-7585.patch
>
>
> Certain parameters (String constants) are same/common for multiple analysis 
> factories. Some examples are {{ignoreCase}}, {{dictionary}}, and 
> {{preserveOriginal}}. These string constants are handled inconsistently in 
> different factories. This is an effort to define most common constants in 
> ({{CommonAnalysisFactoryParams}}) interface and reuse them.






[jira] [Updated] (LUCENE-7599) replace TestRandomChains.Predicate with java.util.function.Predicate

2016-12-20 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated LUCENE-7599:
-
Attachment: LUCENE-7599.patch

Patch that replaces ArgProducer with Function

> replace TestRandomChains.Predicate with java.util.function.Predicate
> 
>
> Key: LUCENE-7599
> URL: https://issues.apache.org/jira/browse/LUCENE-7599
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/test
>Affects Versions: 6.3
>Reporter: Ahmet Arslan
>Priority: Trivial
>  Labels: test
> Fix For: master (7.0)
>
> Attachments: LUCENE-7599.patch, LUCENE-7599.patch
>
>
> {{TestRandomChains}} has its own Predicate interface which can be replaced 
> with {{java.util.function.Predicate}}






[jira] [Updated] (LUCENE-7599) replace TestRandomChains.Predicate with java.util.function.Predicate

2016-12-20 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated LUCENE-7599:
-
Attachment: LUCENE-7599.patch

Patch that removes {{TestRandomChains.Predicate}} in favour of 
{{java.util.function.Predicate}}. It simplifies code with lambda expressions or 
method references.
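
The kind of simplification meant here can be sketched as follows. This is only an illustration, not code from the patch: {{PredicateSketch}}, {{LocalPredicate}}, {{filterOld}} and {{filterNew}} are hypothetical names standing in for a test-local predicate interface and its callers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical demo class; none of these names come from TestRandomChains.
class PredicateSketch {

  // Before: a hand-rolled single-method interface owned by the test class.
  interface LocalPredicate<T> {
    boolean apply(T value);
  }

  // Caller written against the hand-rolled interface.
  static List<String> filterOld(List<String> in, LocalPredicate<String> p) {
    List<String> out = new ArrayList<>();
    for (String s : in) {
      if (p.apply(s)) out.add(s);
    }
    return out;
  }

  // After: the same caller written against java.util.function.Predicate.
  static List<String> filterNew(List<String> in, Predicate<String> p) {
    List<String> out = new ArrayList<>();
    for (String s : in) {
      if (p.test(s)) out.add(s);
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> names = Arrays.asList("", "lowercase", "stop", "");

    // Old style: anonymous inner class.
    List<String> old = filterOld(names, new LocalPredicate<String>() {
      @Override
      public boolean apply(String v) {
        return v.isEmpty();
      }
    });

    // New style: a method reference and a lambda.
    List<String> viaMethodRef = filterNew(names, String::isEmpty);
    List<String> viaLambda = filterNew(names, s -> s.length() > 4);

    System.out.println(old.size() + " " + viaMethodRef.size() + " " + viaLambda);
  }
}
```

Besides being shorter, the standard interface also brings {{negate()}}, {{and()}} and {{or()}} for free.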

> replace TestRandomChains.Predicate with java.util.function.Predicate
> 
>
> Key: LUCENE-7599
> URL: https://issues.apache.org/jira/browse/LUCENE-7599
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/test
>Affects Versions: 6.3
>Reporter: Ahmet Arslan
>Priority: Trivial
>  Labels: test
> Fix For: master (7.0)
>
> Attachments: LUCENE-7599.patch
>
>
> {{TestRandomChains}} has its own Predicate interface which can be replaced 
> with {{java.util.function.Predicate}}






[jira] [Created] (LUCENE-7599) replace TestRandomChains.Predicate with java.util.function.Predicate

2016-12-20 Thread Ahmet Arslan (JIRA)
Ahmet Arslan created LUCENE-7599:


 Summary: replace TestRandomChains.Predicate with 
java.util.function.Predicate
 Key: LUCENE-7599
 URL: https://issues.apache.org/jira/browse/LUCENE-7599
 Project: Lucene - Core
  Issue Type: Improvement
  Components: general/test
Affects Versions: 6.3
Reporter: Ahmet Arslan
Priority: Trivial
 Fix For: master (7.0)


{{TestRandomChains}} has its own Predicate interface which can be replaced with 
{{java.util.function.Predicate}}






[jira] [Commented] (LUCENE-7585) Interface for common parameters used across analysis factories

2016-12-18 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15758938#comment-15758938
 ] 

Ahmet Arslan commented on LUCENE-7585:
--

I tried adding javadocs to fields in the interface, but it did not solve the 
missing javadocs problem.
{{documentation-lint}} complains/fails for the lucene/analysis/modules, which 
are explicitly defined with the level of method in 
[lucene/build.xml|https://github.com/apache/lucene-solr/blob/master/lucene/build.xml]

{code:xml}





{code}

I figured out that this level=(method|class|none) thing comes from 
[checkJavaDocs.py|https://github.com/apache/lucene-solr/blob/master/dev-tools/scripts/checkJavaDocs.py].
Any pointers on how to document interface fields so that level="method" passes in 
checkJavaDocs.py?

Or, can we remove the above XML fragment from build.xml?

> Interface for common parameters used across analysis factories
> --
>
> Key: LUCENE-7585
> URL: https://issues.apache.org/jira/browse/LUCENE-7585
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 6.3
>Reporter: Ahmet Arslan
>Assignee: David Smiley
>Priority: Minor
> Fix For: master (7.0)
>
> Attachments: LUCENE-7585.patch, LUCENE-7585.patch, LUCENE-7585.patch
>
>
> Certain parameters (String constants) are same/common for multiple analysis 
> factories. Some examples are {{ignoreCase}}, {{dictionary}}, and 
> {{preserveOriginal}}. These string constants are handled inconsistently in 
> different factories. This is an effort to define most common constants in 
> ({{CommonAnalysisFactoryParams}}) interface and reuse them.






[jira] [Updated] (LUCENE-7585) Interface for common parameters used across analysis factories

2016-12-14 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated LUCENE-7585:
-
Attachment: LUCENE-7585.patch

A few more refactorings, including the overlooked code point filter factory.

> Interface for common parameters used across analysis factories
> --
>
> Key: LUCENE-7585
> URL: https://issues.apache.org/jira/browse/LUCENE-7585
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 6.3
>Reporter: Ahmet Arslan
>Assignee: David Smiley
>Priority: Minor
> Fix For: master (7.0)
>
> Attachments: LUCENE-7585.patch, LUCENE-7585.patch, LUCENE-7585.patch
>
>
> Certain parameters (String constants) are same/common for multiple analysis 
> factories. Some examples are {{ignoreCase}}, {{dictionary}}, and 
> {{preserveOriginal}}. These string constants are handled inconsistently in 
> different factories. This is an effort to define most common constants in 
> ({{CommonAnalysisFactoryParams}}) interface and reuse them.






[jira] [Commented] (LUCENE-7585) Interface for common parameters used across analysis factories

2016-12-14 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749577#comment-15749577
 ] 

Ahmet Arslan commented on LUCENE-7585:
--

By inconsistency, I mean the strategy used to retrieve those parameters from 
the arg map. Some factories use an inline string constant, e.g. getBoolean(args, "reverse"); 
others define a private or public static final String for the key.
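
To make the three styles concrete, here is a minimal sketch. Everything in it is an assumption for illustration: {{FactoryParamsSketch}}, {{CommonParams}} and the simplified {{getBoolean}} helper are hypothetical stand-ins, not the contents of the patch or of any existing factory.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; class, interface and method names are illustrative only.
class FactoryParamsSketch {

  // Shared constants, in the spirit of the proposed common-parameters interface.
  interface CommonParams {
    String IGNORE_CASE = "ignoreCase";
    String PRESERVE_ORIGINAL = "preserveOriginal";
  }

  // Simplified stand-in for a factory helper: consumes the key and parses its value.
  static boolean getBoolean(Map<String, String> args, String name, boolean defaultVal) {
    String s = args.remove(name);
    return s == null ? defaultVal : Boolean.parseBoolean(s);
  }

  // Style 1: the key is an inline string literal.
  static boolean inlineStyle(Map<String, String> args) {
    return getBoolean(args, "ignoreCase", false);
  }

  // Style 2: the key is a constant private to one factory.
  private static final String IGNORE_CASE_KEY = "ignoreCase";

  static boolean privateConstantStyle(Map<String, String> args) {
    return getBoolean(args, IGNORE_CASE_KEY, false);
  }

  // Style 3: the key comes from the shared interface, so every factory spells it the same way.
  static boolean sharedConstantStyle(Map<String, String> args) {
    return getBoolean(args, CommonParams.IGNORE_CASE, false);
  }

  public static void main(String[] args) {
    Map<String, String> params = new HashMap<>();
    params.put("ignoreCase", "true");
    System.out.println(inlineStyle(new HashMap<>(params)));
    System.out.println(privateConstantStyle(new HashMap<>(params)));
    System.out.println(sharedConstantStyle(new HashMap<>(params)));
  }
}
```

All three behave identically at runtime; the difference is only in where the key string lives and how easy it is to mistype or let it drift between factories.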

> Interface for common parameters used across analysis factories
> --
>
> Key: LUCENE-7585
> URL: https://issues.apache.org/jira/browse/LUCENE-7585
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 6.3
>Reporter: Ahmet Arslan
>Assignee: David Smiley
>Priority: Minor
> Fix For: master (7.0)
>
> Attachments: LUCENE-7585.patch, LUCENE-7585.patch
>
>
> Certain parameters (String constants) are same/common for multiple analysis 
> factories. Some examples are {{ignoreCase}}, {{dictionary}}, and 
> {{preserveOriginal}}. These string constants are handled inconsistently in 
> different factories. This is an effort to define most common constants in 
> ({{CommonAnalysisFactoryParams}}) interface and reuse them.






[jira] [Commented] (LUCENE-7585) Interface for common parameters used across analysis factories

2016-12-10 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15737593#comment-15737593
 ] 

Ahmet Arslan commented on LUCENE-7585:
--

Here is an excerpt from {{documentation-lint}}

{code}
 [exec] build/docs/analyzers-icu/org/apache/lucene/analysis/icu/segmentation/ICUTokenizerFactory.html
 [exec]   missing Fields: CONSUME_ALL_TOKENS
 [exec]   missing Fields: DELIMITER
 [exec]   missing Fields: DICTIONARY
 [exec]   missing Fields: ENCODER
 [exec]   missing Fields: FORMAT
 [exec]   missing Fields: IGNORE_CASE
 [exec]   missing Fields: LUCENE_MATCH_VERSION
 [exec]   missing Fields: MAX
 [exec]   missing Fields: MAX_TOKEN_LENGTH
 [exec]   missing Fields: MIN
 [exec]   missing Fields: PATTERN
 [exec]   missing Fields: PRESERVE_ORIGINAL
 [exec]   missing Fields: PROTECTED
 [exec]   missing Fields: TYPES
 [exec]   missing Fields: WORDS
 [exec] 
 [exec] Missing javadocs were found!
{code}

> Interface for common parameters used across analysis factories
> --
>
> Key: LUCENE-7585
> URL: https://issues.apache.org/jira/browse/LUCENE-7585
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 6.3
>Reporter: Ahmet Arslan
>Assignee: David Smiley
>Priority: Minor
> Fix For: master (7.0)
>
> Attachments: LUCENE-7585.patch, LUCENE-7585.patch
>
>
> Certain parameters (String constants) are same/common for multiple analysis 
> factories. Some examples are {{ignoreCase}}, {{dictionary}}, and 
> {{preserveOriginal}}. These string constants are handled inconsistently in 
> different factories. This is an effort to define most common constants in 
> ({{CommonAnalysisFactoryParams}}) interface and reuse them.






[jira] [Commented] (LUCENE-7585) Interface for common parameters used across analysis factories

2016-12-09 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15736501#comment-15736501
 ] 

Ahmet Arslan commented on LUCENE-7585:
--

Thank you for looking into this. Initially, I was planning to move all existing 
parameters to a common interface.
I figured that the interface will grow very large since certain factories have 
many specific parameters. 
So I moved only the most common parameters to the interface. However, many 
still remain in the codebase.
For example, the ngram package has "minGramSize" and "maxGramSize" in common. 
The phonetic module has "maxCodeLength" and "inject".

What could be the preferred course of action here?

* Handle packages and modules locally? If yes, how?
* Move all parameters to the interface unconditionally.
* Devise a rule: move a parameter if it is shared by at least two packages or 
modules.
* ?

> Interface for common parameters used across analysis factories
> --
>
> Key: LUCENE-7585
> URL: https://issues.apache.org/jira/browse/LUCENE-7585
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 6.3
>Reporter: Ahmet Arslan
>Assignee: David Smiley
>Priority: Minor
> Fix For: master (7.0)
>
> Attachments: LUCENE-7585.patch, LUCENE-7585.patch
>
>
> Certain parameters (String constants) are same/common for multiple analysis 
> factories. Some examples are {{ignoreCase}}, {{dictionary}}, and 
> {{preserveOriginal}}. These string constants are handled inconsistently in 
> different factories. This is an effort to define most common constants in 
> ({{CommonAnalysisFactoryParams}}) interface and reuse them.






[jira] [Updated] (LUCENE-7585) Interface for common parameters used across analysis factories

2016-12-09 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated LUCENE-7585:
-
Attachment: LUCENE-7585.patch

Properly created patch that includes the proposed changes (alphabetisation and 
lucene_match_version). Ant {{documentation-lint}} complains about the icu 
factories. Any pointers on how to fix it?

> Interface for common parameters used across analysis factories
> --
>
> Key: LUCENE-7585
> URL: https://issues.apache.org/jira/browse/LUCENE-7585
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 6.3
>Reporter: Ahmet Arslan
>Assignee: David Smiley
>Priority: Minor
> Fix For: master (7.0)
>
> Attachments: LUCENE-7585.patch, LUCENE-7585.patch
>
>
> Certain parameters (String constants) are same/common for multiple analysis 
> factories. Some examples are {{ignoreCase}}, {{dictionary}}, and 
> {{preserveOriginal}}. These string constants are handled inconsistently in 
> different factories. This is an effort to define most common constants in 
> ({{CommonAnalysisFactoryParams}}) interface and reuse them.






[jira] [Updated] (LUCENE-7585) Interface for common parameters used across analysis factories

2016-12-07 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated LUCENE-7585:
-
Attachment: LUCENE-7585.patch

> Interface for common parameters used across analysis factories
> --
>
> Key: LUCENE-7585
> URL: https://issues.apache.org/jira/browse/LUCENE-7585
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 6.3
>Reporter: Ahmet Arslan
>Priority: Minor
> Fix For: master (7.0)
>
> Attachments: LUCENE-7585.patch
>
>
> Certain parameters (String constants) are same/common for multiple analysis 
> factories. Some examples are {{ignoreCase}}, {{dictionary}}, and 
> {{preserveOriginal}}. These string constants are handled inconsistently in 
> different factories. This is an effort to define most common constants in 
> ({{CommonAnalysisFactoryParams}}) interface and reuse them.






[jira] [Created] (LUCENE-7585) Interface for common parameters used across analysis factories

2016-12-07 Thread Ahmet Arslan (JIRA)
Ahmet Arslan created LUCENE-7585:


 Summary: Interface for common parameters used across analysis 
factories
 Key: LUCENE-7585
 URL: https://issues.apache.org/jira/browse/LUCENE-7585
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 6.3
Reporter: Ahmet Arslan
Priority: Minor
 Fix For: master (7.0)


Certain parameters (String constants) are same/common for multiple analysis 
factories. Some examples are {{ignoreCase}}, {{dictionary}}, and 
{{preserveOriginal}}. These string constants are handled inconsistently in 
different factories. This is an effort to define most common constants in 
({{CommonAnalysisFactoryParams}}) interface and reuse them.






[jira] [Commented] (LUCENE-7525) ASCIIFoldingFilter.foldToASCII performance issue due to large compiled method size

2016-10-28 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616922#comment-15616922
 ] 

Ahmet Arslan commented on LUCENE-7525:
--

Can the workings of ICUFoldingFilter give any insight here?

> ASCIIFoldingFilter.foldToASCII performance issue due to large compiled method 
> size
> --
>
> Key: LUCENE-7525
> URL: https://issues.apache.org/jira/browse/LUCENE-7525
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 6.2.1
>Reporter: Karl von Randow
> Attachments: ASCIIFolding.java, ASCIIFoldingFilter.java, 
> TestASCIIFolding.java
>
>
> The {{ASCIIFoldingFilter.foldToASCII}} method has an enormous switch 
> statement and is too large for the HotSpot compiler to compile; causing a 
> performance problem.
> The method is about 13K compiled, versus the 8KB HotSpot limit. So splitting 
> the method in half works around the problem.
> In my tests splitting the method in half resulted in a 5X performance 
> increase.
> In the test code below you can see how slow the fold method is, even when it 
> is using the shortcut when the character is less than 0x80, compared to an 
> inline implementation of the same shortcut.
> So a workaround is to split the method. I'm happy to provide a patch. It's a 
> hack, of course. Perhaps using the {{MappingCharFilterFactory}} with an input 
> file as per SOLR-2013 would be a better replacement for this method in this 
> class?
> {code:java}
> public class ASCIIFoldingFilterPerformanceTest {
>   private static final int ITERATIONS = 1_000_000;
>   @Test
>   public void testFoldShortString() {
>     char[] input = "testing".toCharArray();
>     char[] output = new char[input.length * 4];
>     for (int i = 0; i < ITERATIONS; i++) {
>       ASCIIFoldingFilter.foldToASCII(input, 0, output, 0, input.length);
>     }
>   }
>   @Test
>   public void testFoldShortAccentedString() {
>     char[] input = "éúéúøßüäéúéúøßüä".toCharArray();
>     char[] output = new char[input.length * 4];
>     for (int i = 0; i < ITERATIONS; i++) {
>       ASCIIFoldingFilter.foldToASCII(input, 0, output, 0, input.length);
>     }
>   }
>   @Test
>   public void testManualFoldTinyString() {
>     char[] input = "t".toCharArray();
>     char[] output = new char[input.length * 4];
>     for (int i = 0; i < ITERATIONS; i++) {
>       int k = 0;
>       for (int j = 0; j < 1; ++j) {
>         final char c = input[j];
>         if (c < '\u0080') {
>           output[k++] = c;
>         } else {
>           Assert.assertTrue(false);
>         }
>       }
>     }
>   }
> }
> {code}






[jira] [Commented] (LUCENE-7377) Remove ClassicSimilarity?

2016-07-13 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374882#comment-15374882
 ] 

Ahmet Arslan commented on LUCENE-7377:
--

I think an implementation of TFIDF should stay in Lucene, but it should 
extend SimilarityBase and have a simple, single-line body in the 
org.apache.lucene.search.similarities.SimilarityBase#score method, e.g.:
{code}
return tf * log2(((double) stats.getNumberOfDocuments() / (double) stats.getDocFreq()) + 1);
{code}

The current TFIDFSimilarity and ClassicSimilarity are hard to understand.
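
As a worked illustration of that one-liner, here is a standalone sketch that does not depend on Lucene. {{TfIdfSketch}} and its method names are hypothetical; the plain {{long}} arguments stand in for the values that would come from the stats object.

```java
// Hypothetical standalone sketch of the formula above; not a Lucene Similarity.
class TfIdfSketch {

  // Base-2 logarithm via change of base.
  static double log2(double x) {
    return Math.log(x) / Math.log(2.0);
  }

  // score = tf * log2(N / df + 1), where N is the number of documents
  // in the collection and df is the document frequency of the term.
  static double score(double tf, long numberOfDocuments, long docFreq) {
    return tf * log2(((double) numberOfDocuments / (double) docFreq) + 1);
  }

  public static void main(String[] args) {
    // A term occurring 3 times in a document and found in 1 of 7 documents:
    // idf = log2(7 / 1 + 1) = log2(8), so the score is about 3 * 3.
    System.out.println(score(3, 7, 1));

    // A term that occurs in every document still gets idf = log2(2), about 1,
    // because of the "+ 1" inside the logarithm.
    System.out.println(score(1, 7, 7));
  }
}
```

Rarer terms get a larger multiplier, and the score grows linearly with the term frequency passed in.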

> Remove ClassicSimilarity?
> -
>
> Key: LUCENE-7377
> URL: https://issues.apache.org/jira/browse/LUCENE-7377
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
>
> ClassicSimilarity was relying on coordination factors in order to produce 
> good scores. Now that coords are gone, it is quite a bad option compared to 
> eg. BM25Similarity.
> Maybe we should remove ClassicSimilarity entirely in master and deprecate it in 
> 6.x in order to encourage users to move to BM25Similarity rather than stay on 
> a Similarity impl of lesser quality?






[jira] [Commented] (SOLR-9250) Search breaks with EU symbol € and wildcard *

2016-06-25 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15349836#comment-15349836
 ] 

Ahmet Arslan commented on SOLR-9250:


Yes, this is a known issue with wildcard queries.

> Search breaks with EU symbol € and wildcard *
> -
>
> Key: SOLR-9250
> URL: https://issues.apache.org/jira/browse/SOLR-9250
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Server
>Affects Versions: 5.3.1
>Reporter: Tim Nolan
> Attachments: contact-name-analyze.png, contact-name-field-type.png
>
>
> While testing UTF-8 character searches, which worked, we have noticed a 
> combination that fails. Testing with the data {{Tùûüÿ€àâæçéèêëïîôœm}}, we 
> found the search worked, but by adding a wild-card (e.g. 
> {{Tùûüÿ€àâæçéèêëïîôœm*}}), the search fails. Adding the wildcard before the 
> {{€}} symbol worked (i.e. {{Tùûüÿ*}}).
> Showing the logs for these queries:
> {noformat:title=Full text without wildcard, hit=1}
> 2016-06-25 13:16:34.361 [qtp237852351-21] INFO  
> org.apache.solr.core.SolrCore.Request  – [core-name] webapp=/solr 
> path=/select 
> params={q=Tùûüÿ€àâæçéèêëïîôœm=true=type:CONTACT=12=json&_=1466860594348}
>  hits=1 status=0 QTime=0 
> {noformat}
> {noformat:title=Full text with wildcard, hit=0}
> 2016-06-25 13:16:41.172 [qtp237852351-16] INFO  
> org.apache.solr.core.SolrCore.Request  – [core-name] webapp=/solr 
> path=/select 
> params={q=Tùûüÿ€àâæçéèêëïîôœm*=true=type:CONTACT=12=json&_=1466860601160}
>  hits=0 status=0 QTime=0 
> {noformat}
> {noformat:title=Partial text before € with wildcard, hit=1}
> 2016-06-25 13:16:52.135 [qtp237852351-18] INFO  
> org.apache.solr.core.SolrCore.Request  – [core-name] webapp=/solr 
> path=/select 
> params={q=Tùûüÿ*=true=type:CONTACT=12=json&_=1466860612125} 
> hits=1 status=0 QTime=2 
> {noformat}






[jira] [Commented] (SOLR-9250) Search breaks with EU symbol € and wildcard *

2016-06-25 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15349767#comment-15349767
 ] 

Ahmet Arslan commented on SOLR-9250:


Yes, this one, but you need to make the chains visible.
It is the  tag in schema.
Anyway, the problem looks like your tokenizer breaks/tokenizes your sample 
input at the EU char.
Please use the analysis admin page to see how your example text is 
tokenized/indexed.

Have you read https://wiki.apache.org/solr/MultitermQueryAnalysis ?
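
A toy model of what is probably happening, as a sketch only. The real field type has not been shown, so the analyzer below is an assumption: it simply splits on any character that is not a letter or digit, which includes the euro sign. {{WildcardAnalysisSketch}} and its methods are hypothetical and do not use the Lucene or Solr APIs; the sample text is a shortened version of the reported data, written with Unicode escapes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model; not the Lucene/Solr analysis API.
class WildcardAnalysisSketch {

  // Assumed analyzer: emits runs of letters/digits as tokens, so it
  // breaks the input at the euro sign (a currency symbol).
  static List<String> analyze(String text) {
    List<String> tokens = new ArrayList<>();
    StringBuilder cur = new StringBuilder();
    for (int i = 0; i < text.length(); i++) {
      char c = text.charAt(i);
      if (Character.isLetterOrDigit(c)) {
        cur.append(c);
      } else if (cur.length() > 0) {
        tokens.add(cur.toString());
        cur.setLength(0);
      }
    }
    if (cur.length() > 0) tokens.add(cur.toString());
    return tokens;
  }

  // Plain query: analyzed the same way as the indexed text,
  // then every query token must be present in the index.
  static boolean termQueryMatches(List<String> indexed, String query) {
    return indexed.containsAll(analyze(query));
  }

  // Wildcard (prefix) query: the raw prefix is NOT analyzed and is
  // compared directly against the indexed tokens.
  static boolean prefixQueryMatches(List<String> indexed, String rawPrefix) {
    for (String token : indexed) {
      if (token.startsWith(rawPrefix)) return true;
    }
    return false;
  }

  public static void main(String[] args) {
    // "T\u00f9\u00fb" + euro sign + "\u00e0\u00e2m"
    String data = "T\u00f9\u00fb\u20ac\u00e0\u00e2m";
    List<String> indexed = analyze(data); // two tokens, split at the euro sign

    System.out.println(termQueryMatches(indexed, data));             // full text, no wildcard
    System.out.println(prefixQueryMatches(indexed, data));           // full text followed by *
    System.out.println(prefixQueryMatches(indexed, "T\u00f9\u00fb")); // text before the euro sign, then *
  }
}
```

Under that assumption, no single indexed token contains the euro sign, so a raw prefix that includes it can never match, while the plain query and the shorter prefix both can.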

> Search breaks with EU symbol € and wildcard *
> -
>
> Key: SOLR-9250
> URL: https://issues.apache.org/jira/browse/SOLR-9250
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Server
>Affects Versions: 5.3.1
>Reporter: Tim Nolan
> Attachments: contact-name-analyze.png, contact-name-field-type.png
>
>
> While testing UTF-8 character searches, which worked, we have noticed a 
> combination that fails. Testing with the data {{Tùûüÿ€àâæçéèêëïîôœm}}, we 
> found the search worked, but by adding a wild-card (e.g. 
> {{Tùûüÿ€àâæçéèêëïîôœm*}}), the search fails. Adding the wildcard before the 
> {{€}} symbol worked (i.e. {{Tùûüÿ*}}).
> Showing the logs for these queries:
> {noformat:title=Full text without wildcard, hit=1}
> 2016-06-25 13:16:34.361 [qtp237852351-21] INFO  
> org.apache.solr.core.SolrCore.Request  – [core-name] webapp=/solr 
> path=/select 
> params={q=Tùûüÿ€àâæçéèêëïîôœm=true=type:CONTACT=12=json&_=1466860594348}
>  hits=1 status=0 QTime=0 
> {noformat}
> {noformat:title=Full text with wildcard, hit=0}
> 2016-06-25 13:16:41.172 [qtp237852351-16] INFO  
> org.apache.solr.core.SolrCore.Request  – [core-name] webapp=/solr 
> path=/select 
> params={q=Tùûüÿ€àâæçéèêëïîôœm*=true=type:CONTACT=12=json&_=1466860601160}
>  hits=0 status=0 QTime=0 
> {noformat}
> {noformat:title=Partial text before € with wildcard, hit=1}
> 2016-06-25 13:16:52.135 [qtp237852351-18] INFO  
> org.apache.solr.core.SolrCore.Request  – [core-name] webapp=/solr 
> path=/select 
> params={q=Tùûüÿ*=true=type:CONTACT=12=json&_=1466860612125} 
> hits=1 status=0 QTime=2 
> {noformat}






[jira] [Commented] (SOLR-9250) Search breaks with EU symbol € and wildcard *

2016-06-25 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15349738#comment-15349738
 ] 

Ahmet Arslan commented on SOLR-9250:


Please, we need to see the field *type* definition, where the analyzer elements are 
chained.

> Search breaks with EU symbol € and wildcard *
> -
>
> Key: SOLR-9250
> URL: https://issues.apache.org/jira/browse/SOLR-9250
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Server
>Affects Versions: 5.3.1
>Reporter: Tim Nolan
>
> While testing UTF-8 character searches, which worked, we have noticed a 
> combination that fails. Testing with the data {{Tùûüÿ€àâæçéèêëïîôœm}}, we 
> found the search worked, but by adding a wild-card (e.g. 
> {{Tùûüÿ€àâæçéèêëïîôœm*}}), the search fails. Adding the wildcard before the 
> {{€}} symbol worked (i.e. {{Tùûüÿ*}}).
> Showing the logs for these queries:
> {noformat:title=Full text without wildcard, hit=1}
> 2016-06-25 13:16:34.361 [qtp237852351-21] INFO  
> org.apache.solr.core.SolrCore.Request  – [core-name] webapp=/solr 
> path=/select 
> params={q=Tùûüÿ€àâæçéèêëïîôœm=true=type:CONTACT=12=json&_=1466860594348}
>  hits=1 status=0 QTime=0 
> {noformat}
> {noformat:title=Full text with wildcard, hit=0}
> 2016-06-25 13:16:41.172 [qtp237852351-16] INFO  
> org.apache.solr.core.SolrCore.Request  – [core-name] webapp=/solr 
> path=/select 
> params={q=Tùûüÿ€àâæçéèêëïîôœm*=true=type:CONTACT=12=json&_=1466860601160}
>  hits=0 status=0 QTime=0 
> {noformat}
> {noformat:title=Partial text before € with wildcard, hit=1}
> 2016-06-25 13:16:52.135 [qtp237852351-18] INFO  
> org.apache.solr.core.SolrCore.Request  – [core-name] webapp=/solr 
> path=/select 
> params={q=Tùûüÿ*=true=type:CONTACT=12=json&_=1466860612125} 
> hits=1 status=0 QTime=2 
> {noformat}






[jira] [Commented] (SOLR-9250) Search breaks with EU symbol € and wildcard *

2016-06-25 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15349677#comment-15349677
 ] 

Ahmet Arslan commented on SOLR-9250:


bq. I'm not sure what you mean by that statement
Please see https://wiki.apache.org/solr/MultitermQueryAnalysis

> Search breaks with EU symbol € and wildcard *
> -
>
> Key: SOLR-9250
> URL: https://issues.apache.org/jira/browse/SOLR-9250
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Server
>Affects Versions: 5.3.1
>Reporter: Tim Nolan
>
> While testing UTF-8 character searches, which worked, we have noticed a 
> combination that fails. Testing with the data {{Tùûüÿ€àâæçéèêëïîôœm}}, we 
> found the search worked, but by adding a wild-card (e.g. 
> {{Tùûüÿ€àâæçéèêëïîôœm*}}), the search fails. Adding the wildcard before the 
> {{€}} symbol worked (i.e. {{Tùûüÿ*}}).
> Showing the logs for these queries:
> {noformat:title=Full text without wildcard, hit=1}
> 2016-06-25 13:16:34.361 [qtp237852351-21] INFO  
> org.apache.solr.core.SolrCore.Request  – [core-name] webapp=/solr 
> path=/select 
> params={q=Tùûüÿ€àâæçéèêëïîôœm=true=type:CONTACT=12=json&_=1466860594348}
>  hits=1 status=0 QTime=0 
> {noformat}
> {noformat:title=Full text with wildcard, hit=0}
> 2016-06-25 13:16:41.172 [qtp237852351-16] INFO  
> org.apache.solr.core.SolrCore.Request  – [core-name] webapp=/solr 
> path=/select 
> params={q=Tùûüÿ€àâæçéèêëïîôœm*=true=type:CONTACT=12=json&_=1466860601160}
>  hits=0 status=0 QTime=0 
> {noformat}
> {noformat:title=Partial text before € with wildcard, hit=1}
> 2016-06-25 13:16:52.135 [qtp237852351-18] INFO  
> org.apache.solr.core.SolrCore.Request  – [core-name] webapp=/solr 
> path=/select 
> params={q=Tùûüÿ*=true=type:CONTACT=12=json&_=1466860612125} 
> hits=1 status=0 QTime=2 
> {noformat}






[jira] [Commented] (SOLR-9250) Search breaks with EU symbol € and wildcard *

2016-06-25 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15349674#comment-15349674
 ] 

Ahmet Arslan commented on SOLR-9250:


We need to see the field type definition for that field. The index-time analyzer may 
be breaking words at the EU symbol or something similar.

> Search breaks with EU symbol € and wildcard *
> -
>
> Key: SOLR-9250
> URL: https://issues.apache.org/jira/browse/SOLR-9250
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Server
>Affects Versions: 5.3.1
>Reporter: Tim Nolan
>
> While testing UTF-8 character searches, which worked, we have noticed a 
> combination that fails. Testing with the data {{Tùûüÿ€àâæçéèêëïîôœm}}, we 
> found the search worked, but by adding a wild-card (e.g. 
> {{Tùûüÿ€àâæçéèêëïîôœm*}}), the search fails. Adding the wildcard before the 
> {{€}} symbol worked (i.e. {{Tùûüÿ*}}).
> Showing the logs for these queries:
> {noformat:title=Full text without wildcard, hit=1}
> 2016-06-25 13:16:34.361 [qtp237852351-21] INFO  
> org.apache.solr.core.SolrCore.Request  – [core-name] webapp=/solr 
> path=/select 
> params={q=Tùûüÿ€àâæçéèêëïîôœm=true=type:CONTACT=12=json&_=1466860594348}
>  hits=1 status=0 QTime=0 
> {noformat}
> {noformat:title=Full text with wildcard, hit=0}
> 2016-06-25 13:16:41.172 [qtp237852351-16] INFO  
> org.apache.solr.core.SolrCore.Request  – [core-name] webapp=/solr 
> path=/select 
> params={q=Tùûüÿ€àâæçéèêëïîôœm*=true=type:CONTACT=12=json&_=1466860601160}
>  hits=0 status=0 QTime=0 
> {noformat}
> {noformat:title=Partial text before € with wildcard, hit=1}
> 2016-06-25 13:16:52.135 [qtp237852351-18] INFO  
> org.apache.solr.core.SolrCore.Request  – [core-name] webapp=/solr 
> path=/select 
> params={q=Tùûüÿ*=true=type:CONTACT=12=json&_=1466860612125} 
> hits=1 status=0 QTime=2 
> {noformat}






[jira] [Commented] (SOLR-9250) Search breaks with EU symbol € and wildcard *

2016-06-25 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15349652#comment-15349652
 ] 

Ahmet Arslan commented on SOLR-9250:


What do you mean by saying the search fails? Does it throw an exception, or 
does it not return the expected results?
By the way, wildcard queries are not analyzed.
Please ask questions of this type on the user mailing list.
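
To illustrate why an unanalyzed wildcard query could behave this way, here is a toy model in Python. It assumes, purely for illustration, an index-time analyzer that splits on every non-letter character, so the € sign becomes a token boundary. The helpers `toy_analyze`, `term_query_matches`, and `prefix_query_matches` are hypothetical and are not Solr APIs; the real field type may behave differently.

```python
import unicodedata

FULL = "Tùûüÿ€àâæçéèêëïîôœm"


def nfc(text):
    # Encoding hygiene only: make each accented letter a single code point.
    return unicodedata.normalize("NFC", text)


def toy_analyze(text):
    """Hypothetical analyzer: split into tokens at every non-letter character."""
    tokens, current = [], []
    for ch in nfc(text):
        if ch.isalpha():
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


def term_query_matches(query, indexed_tokens):
    """A normal query is analyzed first; every resulting token must be indexed."""
    return all(tok in indexed_tokens for tok in toy_analyze(query))


def prefix_query_matches(prefix, indexed_tokens):
    """A wildcard (prefix) query is not analyzed: the raw prefix is compared as-is."""
    raw = nfc(prefix)
    return any(tok.startswith(raw) for tok in indexed_tokens)


indexed = toy_analyze(FULL)                      # two tokens, split at the € sign
full_hit = term_query_matches(FULL, indexed)     # analyzed query
wild_hit = prefix_query_matches(FULL, indexed)   # raw prefix still contains €
part_hit = prefix_query_matches("Tùûüÿ", indexed)
```

Under these assumptions the three outcomes line up with the reported hit counts (1, 0, 1), which is why the actual field type definition is needed to confirm or rule out this explanation.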

> Search breaks with EU symbol € and wildcard *
> -
>
> Key: SOLR-9250
> URL: https://issues.apache.org/jira/browse/SOLR-9250
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Server
>Affects Versions: 5.3.1
>Reporter: Tim Nolan
>
> While testing UTF-8 character searches, which worked, we have noticed a 
> combination that fails. Testing with the data {{Tùûüÿ€àâæçéèêëïîôœm}}, we 
> found the search worked, but by adding a wild-card (e.g. 
> {{Tùûüÿ€àâæçéèêëïîôœm*}}), the search fails. Adding the wildcard before the 
> {{€}} symbol worked (i.e. {{Tùûüÿ*}}).
> Showing the logs for these queries:
> {noformat:title=Full text without wildcard, hit=1}
> 2016-06-25 13:16:34.361 [qtp237852351-21] INFO  
> org.apache.solr.core.SolrCore.Request  – [core-name] webapp=/solr 
> path=/select 
> params={q=Tùûüÿ€àâæçéèêëïîôœm=true=type:CONTACT=12=json&_=1466860594348}
>  hits=1 status=0 QTime=0 
> {noformat}
> {noformat:title=Full text with wildcard, hit=0}
> 2016-06-25 13:16:41.172 [qtp237852351-16] INFO  
> org.apache.solr.core.SolrCore.Request  – [core-name] webapp=/solr 
> path=/select 
> params={q=Tùûüÿ€àâæçéèêëïîôœm*=true=type:CONTACT=12=json&_=1466860601160}
>  hits=0 status=0 QTime=0 
> {noformat}
> {noformat:title=Partial text before € with wildcard, hit=1}
> 2016-06-25 13:16:52.135 [qtp237852351-18] INFO  
> org.apache.solr.core.SolrCore.Request  – [core-name] webapp=/solr 
> path=/select 
> params={q=Tùûüÿ*=true=type:CONTACT=12=json&_=1466860612125} 
> hits=1 status=0 QTime=2 
> {noformat}






[jira] [Commented] (LUCENE-7287) New lemma-tizer plugin for ukrainian language.

2016-06-23 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15346949#comment-15346949
 ] 

Ahmet Arslan commented on LUCENE-7287:
--

This is a new feature that has not been released yet, so a new ticket may not be needed.

> New lemma-tizer plugin for ukrainian language.
> --
>
> Key: LUCENE-7287
> URL: https://issues.apache.org/jira/browse/LUCENE-7287
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Dmytro Hambal
>Priority: Minor
>  Labels: analysis, language, plugin
> Fix For: master (7.0), 6.2
>
> Attachments: LUCENE-7287.patch, Screen Shot 2016-06-23 at 8.23.01 
> PM.png, Screen Shot 2016-06-23 at 8.41.28 PM.png
>
>
> Hi all,
> I wonder whether you are interested in supporting a plugin which provides a 
> mapping between ukrainian word forms and their lemmas. Some tests and docs go 
> out-of-the-box =) .
> https://github.com/mrgambal/elasticsearch-ukrainian-lemmatizer
> It's really simple but still works and generates some value for its users.
> More: https://github.com/elastic/elasticsearch/issues/18303






[jira] [Commented] (LUCENE-7287) New lemma-tizer plugin for ukrainian language.

2016-06-23 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15346875#comment-15346875
 ] 

Ahmet Arslan commented on LUCENE-7287:
--

Hi, 
Multiple tokens are OK, but multiple identical tokens look weird, no?
Have you checked the screenshot that includes 
RemoveDuplicatesTokenFilterFactory (RDTF)?

bq. Shall I create mappings_uk.txt so we can use it in solr?

Let's ask Michael. 
Either we ship a separate file, or we just recommend using the mapping char 
filter with the recommended mappings.
Maybe we can place the uk_mappings.txt file under 
https://github.com/apache/lucene-solr/tree/master/solr/server/solr/configsets/sample_techproducts_configs/conf/lang

> New lemma-tizer plugin for ukrainian language.
> --
>
> Key: LUCENE-7287
> URL: https://issues.apache.org/jira/browse/LUCENE-7287
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Dmytro Hambal
>Priority: Minor
>  Labels: analysis, language, plugin
> Fix For: master (7.0), 6.2
>
> Attachments: LUCENE-7287.patch, Screen Shot 2016-06-23 at 8.23.01 
> PM.png, Screen Shot 2016-06-23 at 8.41.28 PM.png
>
>
> Hi all,
> I wonder whether you are interested in supporting a plugin which provides a 
> mapping between ukrainian word forms and their lemmas. Some tests and docs go 
> out-of-the-box =) .
> https://github.com/mrgambal/elasticsearch-ukrainian-lemmatizer
> It's really simple but still works and generates some value for its users.
> More: https://github.com/elastic/elasticsearch/issues/18303






[jira] [Commented] (LUCENE-7287) New lemma-tizer plugin for ukrainian language.

2016-06-23 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15346857#comment-15346857
 ] 

Ahmet Arslan commented on LUCENE-7287:
--

Please see the screenshots in the attachments section at the beginning of the 
page and let me know what you think.

> New lemma-tizer plugin for ukrainian language.
> --
>
> Key: LUCENE-7287
> URL: https://issues.apache.org/jira/browse/LUCENE-7287
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Dmytro Hambal
>Priority: Minor
>  Labels: analysis, language, plugin
> Fix For: master (7.0), 6.2
>
> Attachments: LUCENE-7287.patch, Screen Shot 2016-06-23 at 8.23.01 
> PM.png, Screen Shot 2016-06-23 at 8.41.28 PM.png
>
>
> Hi all,
> I wonder whether you are interested in supporting a plugin which provides a 
> mapping between ukrainian word forms and their lemmas. Some tests and docs go 
> out-of-the-box =) .
> https://github.com/mrgambal/elasticsearch-ukrainian-lemmatizer
> It's really simple but still works and generates some value for its users.
> More: https://github.com/elastic/elasticsearch/issues/18303






[jira] [Updated] (LUCENE-7287) New lemma-tizer plugin for ukrainian language.

2016-06-23 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated LUCENE-7287:
-
Attachment: Screen Shot 2016-06-23 at 8.41.28 PM.png

Here is the screenshot of the analysis admin page, with 
RemoveDuplicatesTokenFilter added.
{code:xml}
   

   





  

{code}

> New lemma-tizer plugin for ukrainian language.
> --
>
> Key: LUCENE-7287
> URL: https://issues.apache.org/jira/browse/LUCENE-7287
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Dmytro Hambal
>Priority: Minor
>  Labels: analysis, language, plugin
> Fix For: master (7.0), 6.2
>
> Attachments: LUCENE-7287.patch, Screen Shot 2016-06-23 at 8.23.01 
> PM.png, Screen Shot 2016-06-23 at 8.41.28 PM.png
>
>
> Hi all,
> I wonder whether you are interested in supporting a plugin which provides a 
> mapping between ukrainian word forms and their lemmas. Some tests and docs go 
> out-of-the-box =) .
> https://github.com/mrgambal/elasticsearch-ukrainian-lemmatizer
> It's really simple but still works and generates some value for its users.
> More: https://github.com/elastic/elasticsearch/issues/18303






[jira] [Updated] (LUCENE-7287) New lemma-tizer plugin for ukrainian language.

2016-06-23 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated LUCENE-7287:
-
Attachment: Screen Shot 2016-06-23 at 8.23.01 PM.png

 {code:xml}
  

   




  

{code}

> New lemma-tizer plugin for ukrainian language.
> --
>
> Key: LUCENE-7287
> URL: https://issues.apache.org/jira/browse/LUCENE-7287
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Dmytro Hambal
>Priority: Minor
>  Labels: analysis, language, plugin
> Fix For: master (7.0), 6.2
>
> Attachments: LUCENE-7287.patch, Screen Shot 2016-06-23 at 8.23.01 
> PM.png
>
>
> Hi all,
> I wonder whether you are interested in supporting a plugin which provides a 
> mapping between ukrainian word forms and their lemmas. Some tests and docs go 
> out-of-the-box =) .
> https://github.com/mrgambal/elasticsearch-ukrainian-lemmatizer
> It's really simple but still works and generates some value for its users.
> More: https://github.com/elastic/elasticsearch/issues/18303






[jira] [Commented] (LUCENE-7287) New lemma-tizer plugin for ukrainian language.

2016-06-23 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15346816#comment-15346816
 ] 

Ahmet Arslan commented on LUCENE-7287:
--

Hi,

I was able to run the analyzer successfully without the mapping char filter, 
because the character mappings are hardcoded in the code.
I am attaching an analysis screenshot. However, it looks like we need a remove 
duplicates token filter at the end: the Morfologik filter appears to inject 
multiple tokens at the same position.
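
As a rough illustration of what a remove-duplicates step does at the end of the chain, here is a minimal Python sketch. It models a token stream as `(term, position_increment)` pairs, where an increment of 0 means the token is stacked on the same position as the previous one. The function name and the sample tokens are made up for the example; this is not the Lucene implementation.

```python
def remove_duplicates(tokens):
    """Drop tokens whose text was already emitted at the same position.

    `tokens` is a list of (term, position_increment) pairs.
    """
    result = []
    seen_at_position = set()
    for term, pos_inc in tokens:
        if pos_inc > 0:
            # A new position starts: forget the terms seen at the old one.
            seen_at_position = set()
        if term in seen_at_position:
            continue  # same text, same position: skip the duplicate
        seen_at_position.add(term)
        result.append((term, pos_inc))
    return result


# Placeholder terms: "a" is emitted twice at position 1, then again at position 2.
stream = [("a", 1), ("a", 0), ("b", 0), ("a", 1)]
deduped = remove_duplicates(stream)
```

Only the stacked repeat of `"a"` is removed; the same term at a later position is kept, since it is a different occurrence.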

> New lemma-tizer plugin for ukrainian language.
> --
>
> Key: LUCENE-7287
> URL: https://issues.apache.org/jira/browse/LUCENE-7287
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Dmytro Hambal
>Priority: Minor
>  Labels: analysis, language, plugin
> Fix For: master (7.0), 6.2
>
> Attachments: LUCENE-7287.patch
>
>
> Hi all,
> I wonder whether you are interested in supporting a plugin which provides a 
> mapping between ukrainian word forms and their lemmas. Some tests and docs go 
> out-of-the-box =) .
> https://github.com/mrgambal/elasticsearch-ukrainian-lemmatizer
> It's really simple but still works and generates some value for its users.
> More: https://github.com/elastic/elasticsearch/issues/18303






[jira] [Commented] (LUCENE-7287) New lemma-tizer plugin for ukrainian language.

2016-06-22 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15344403#comment-15344403
 ] 

Ahmet Arslan commented on LUCENE-7287:
--

Only committers have rights to edit the Confluence wiki. Contributors can 
include the proposed change/addition as a comment at the end of the page.

> New lemma-tizer plugin for ukrainian language.
> --
>
> Key: LUCENE-7287
> URL: https://issues.apache.org/jira/browse/LUCENE-7287
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Dmytro Hambal
>Priority: Minor
>  Labels: analysis, language, plugin
> Fix For: master (7.0), 6.2
>
> Attachments: LUCENE-7287.patch
>
>
> Hi all,
> I wonder whether you are interested in supporting a plugin which provides a 
> mapping between ukrainian word forms and their lemmas. Some tests and docs go 
> out-of-the-box =) .
> https://github.com/mrgambal/elasticsearch-ukrainian-lemmatizer
> It's really simple but still works and generates some value for its users.
> More: https://github.com/elastic/elasticsearch/issues/18303






[jira] [Commented] (LUCENE-7287) New lemma-tizer plugin for ukrainian language.

2016-06-22 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15344290#comment-15344290
 ] 

Ahmet Arslan commented on LUCENE-7287:
--

I think you, as the author of Ukrainian. Thanks!

> New lemma-tizer plugin for ukrainian language.
> --
>
> Key: LUCENE-7287
> URL: https://issues.apache.org/jira/browse/LUCENE-7287
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Dmytro Hambal
>Priority: Minor
>  Labels: analysis, language, plugin
> Fix For: master (7.0), 6.2
>
> Attachments: LUCENE-7287.patch
>
>
> Hi all,
> I wonder whether you are interested in supporting a plugin which provides a 
> mapping between ukrainian word forms and their lemmas. Some tests and docs go 
> out-of-the-box =) .
> https://github.com/mrgambal/elasticsearch-ukrainian-lemmatizer
> It's really simple but still works and generates some value for its users.
> More: https://github.com/elastic/elasticsearch/issues/18303






[jira] [Commented] (LUCENE-7287) New lemma-tizer plugin for ukrainian language.

2016-06-22 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15343877#comment-15343877
 ] 

Ahmet Arslan commented on LUCENE-7287:
--

So, the Solr field type counterpart of this analyzer would be something like:

{code:xml}


   





  

{code}

It would be nice to add an entry for Ukrainian to 
https://cwiki.apache.org/confluence/display/solr/Language+Analysis

> New lemma-tizer plugin for ukrainian language.
> --
>
> Key: LUCENE-7287
> URL: https://issues.apache.org/jira/browse/LUCENE-7287
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Dmytro Hambal
>Priority: Minor
>  Labels: analysis, language, plugin
> Fix For: master (7.0), 6.2
>
> Attachments: LUCENE-7287.patch
>
>
> Hi all,
> I wonder whether you are interested in supporting a plugin which provides a 
> mapping between ukrainian word forms and their lemmas. Some tests and docs go 
> out-of-the-box =) .
> https://github.com/mrgambal/elasticsearch-ukrainian-lemmatizer
> It's really simple but still works and generates some value for its users.
> More: https://github.com/elastic/elasticsearch/issues/18303






[jira] [Commented] (LUCENE-7287) New lemma-tizer plugin for ukrainian language.

2016-06-21 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342944#comment-15342944
 ] 

Ahmet Arslan commented on LUCENE-7287:
--

Can we use this analyzer in Solr?

{code:xml}
 
{code}


> New lemma-tizer plugin for ukrainian language.
> --
>
> Key: LUCENE-7287
> URL: https://issues.apache.org/jira/browse/LUCENE-7287
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Dmytro Hambal
>Priority: Minor
>  Labels: analysis, language, plugin
> Fix For: master (7.0), 6.2
>
> Attachments: LUCENE-7287.patch
>
>
> Hi all,
> I wonder whether you are interested in supporting a plugin which provides a 
> mapping between ukrainian word forms and their lemmas. Some tests and docs go 
> out-of-the-box =) .
> https://github.com/mrgambal/elasticsearch-ukrainian-lemmatizer
> It's really simple but still works and generates some value for its users.
> More: https://github.com/elastic/elasticsearch/issues/18303






[jira] [Commented] (LUCENE-7287) New lemma-tizer plugin for ukrainian language.

2016-06-12 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15326628#comment-15326628
 ] 

Ahmet Arslan commented on LUCENE-7287:
--

Maybe MappingCharFilter could be used instead of a token filter?

> New lemma-tizer plugin for ukrainian language.
> --
>
> Key: LUCENE-7287
> URL: https://issues.apache.org/jira/browse/LUCENE-7287
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Dmytro Hambal
>Priority: Minor
>  Labels: analysis, language, plugin
>
> Hi all,
> I wonder whether you are interested in supporting a plugin which provides a 
> mapping between ukrainian word forms and their lemmas. Some tests and docs go 
> out-of-the-box =) .
> https://github.com/mrgambal/elasticsearch-ukrainian-lemmatizer
> It's really simple but still works and generates some value for its users.
> More: https://github.com/elastic/elasticsearch/issues/18303






[jira] [Commented] (SOLR-8676) It's not possible to use a different log4.properties file on windows

2016-06-03 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15314246#comment-15314246
 ] 

Ahmet Arslan commented on SOLR-8676:


[~mkhludnev] do you mind looking into SOLR-8445 too?

> It's not possible to use a different log4.properties file on windows
> 
>
> Key: SOLR-8676
> URL: https://issues.apache.org/jira/browse/SOLR-8676
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.4.1
>Reporter: Kristine Jetzke
>Assignee: Mikhail Khludnev
> Attachments: SOLR-8676.patch, SOLR-8676.patch, verifying SOLR-8676.txt
>
>
> It's currently not possible to change the location of the log4j.properties 
> file on windows. The value of {{LOG4J_CONFIG}} always gets replaced with the 
> default value {{server\resources\log4j.properties}}. Thus, this file inside 
> the server directory needs to be changed after every update.
> See attached patch for a fix. Unfortunately, I couldn't figure out why 
> {{LOG4J_CONFIG}} was set to empty. I tested manually that logging still works 
> when running an example so I hope that this line is really just obsolete.






[jira] [Commented] (SOLR-9174) After Solr 5.5, mm parameter doesn't work properly

2016-05-31 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308266#comment-15308266
 ] 

Ahmet Arslan commented on SOLR-9174:


Can someone explain why (e)dismax should honor/respect/care about the {{q.op}} 
parameter?
(e)dismax has its own parameter, {{mm}}, for the task.
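
To make the difference between the two behaviours concrete, here is a small Python sketch of the example in the quoted report below. It assumes the field is indexed as character bigrams (the parsed queries show `so`, `ol`, `la`, `ar`), and it models `mm=2` as "at least two optional clauses must match" versus `q.op=AND` as "every clause is required". The function names are hypothetical and this is not how Solr builds the query internally.

```python
def bigrams(text):
    """Character bigrams of a string, e.g. 'solr' -> ['so', 'ol', 'lr']."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def matches_min_should_match(query, doc, mm):
    """All clauses optional; at least `mm` of them must match (mm semantics)."""
    doc_grams = set(bigrams(doc))
    hits = sum(1 for gram in bigrams(query) if gram in doc_grams)
    return hits >= mm


def matches_all_required(query, doc):
    """Every clause is mandatory (what the +text:.. clauses express)."""
    doc_grams = set(bigrams(doc))
    return all(gram in doc_grams for gram in bigrams(query))


with_mm = matches_min_should_match("solar", "solr", mm=2)
with_and = matches_all_required("solar", "solr")
```

In this model the document `solr` is found with `mm=2` because `so` and `ol` match, but not when all four bigrams are required, which is consistent with the differing results in the report.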

> After Solr 5.5, mm parameter doesn't work properly
> --
>
> Key: SOLR-9174
> URL: https://issues.apache.org/jira/browse/SOLR-9174
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers, search
>Affects Versions: 5.5, 6.0, 6.0.1
>Reporter: Issei Nishigata
>
> The "mm" parameter does not work properly when I set "q.op=AND" after Solr 5.5.
> In Solr 5.4, the mm parameter works as expected with the following setting.
> [schema]
> {code:xml}
> 
>   
>  maxGramSize="2"/>
>   
> 
> {code}
> [request]
> {quote}
> http://localhost:8983/solr/collection1/select?defType=edismax&q.op=AND&mm=2&q=solar
> {quote}
> After Solr 5.5, the result will not be the same as Solr 5.4.
> [Solr 5.4]
> {code:xml}
> 
> ...
>   
> 2
> solar
> edismax
> AND
>   
> ...
> 
>   
> 0
> 
>   solr
> 
>   
> 
> 
>   solar
>   solar
>   
>   (+DisjunctionMaxQuerytext:so text:ol text:la text:ar)~2/no_coord
>   
>   +(((text:so text:ol text:la 
> text:ar)~2))
>   ...
> 
> {code}
> [Solr 6.0.1]
> {code:xml}
> 
> ...
>   
> 2
> solar
> edismax
> AND
>   
> ...
> 
>   
> solar
> solar
> 
> (+DisjunctionMaxQuery(((+text:so +text:ol +text:la +text:ar/no_coord
> 
> +((+text:so +text:ol +text:la 
> +text:ar))
> ...
> {code}
> As shown above, parsedquery also differs between Solr 5.4 and Solr 6.0.1 (after 
> Solr 5.5).






[jira] [Commented] (LUCENE-7287) New lemma-tizer plugin for ukrainian language.

2016-05-24 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299381#comment-15299381
 ] 

Ahmet Arslan commented on LUCENE-7287:
--

This looks like a wrapper for a string-to-string mapping. There is no need to 
roll custom Lucene code for this: just replace the comma with a tab in the 
{{mapping_sorted.csv}} file and use the good old {{StemmerOverrideFilter}}, which 
has a fast lookup that does not require the {{termAtt.toString()}} conversion.
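
The comma-to-tab conversion could be sketched as follows in Python. This assumes each row of {{mapping_sorted.csv}} has exactly two columns (word form, then lemma), which has not been checked against the actual file; `csv_to_stemdict` is a hypothetical helper name and the sample rows are placeholders.

```python
import csv
import io


def csv_to_stemdict(csv_text):
    """Convert 'form,lemma' rows into 'form<TAB>lemma' lines."""
    out = io.StringIO()
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        out.write(f"{row[0]}\t{row[1]}\n")
    return out.getvalue()


# Placeholder rows standing in for real word forms and lemmas.
sample = "form1,lemma1\nform2,lemma1\n\n"
converted = csv_to_stemdict(sample)
```

Using the `csv` module rather than a plain string replace keeps quoted fields intact if the file happens to contain any.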

> New lemma-tizer plugin for ukrainian language.
> --
>
> Key: LUCENE-7287
> URL: https://issues.apache.org/jira/browse/LUCENE-7287
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Dmytro Hambal
>Priority: Minor
>  Labels: analysis, language, plugin
>
> Hi all,
> I wonder whether you are interested in supporting a plugin which provides a 
> mapping between ukrainian word forms and their lemmas. Some tests and docs go 
> out-of-the-box =) .
> https://github.com/mrgambal/elasticsearch-ukrainian-lemmatizer
> It's really simple but still works and generates some value for its users.
> More: https://github.com/elastic/elasticsearch/issues/18303






[jira] [Comment Edited] (LUCENE-7148) Support boolean subset matching

2016-04-06 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227887#comment-15227887
 ] 

Ahmet Arslan edited comment on LUCENE-7148 at 4/6/16 7:37 AM:
--

bq. Perhaps you mean something like Solr's frange that filters based on the 
value? 
Exactly. Given that q=john smith, let's assume that we have a field titleLength 
that stores the number of words in the field.  We can even extract that info 
from norm doc values later on. Something like: {noformat} fq={!frange l=0 u=0 
cache=false cost=200} sub(titleLength, sum(termfreq(title,'smith'), 
termfreq(title,'john'))) {noformat}

bq. That would be O(docs) as it evaluates per doc.
Can't we make this filter query execute last, with cache=false cost=150?
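
The arithmetic behind the frange idea can be sketched in plain Python. This is a model of the check only, not Solr code: it assumes `titleLength` is the total number of tokens in the field and that `termfreq` counts occurrences of a term in that field. Note that the frange part covers only case (2) of the issue (all document terms are among the query terms); case (1) would still be handled by an ordinary all-terms-required query. All function names are hypothetical.

```python
def term_freq(tokens, term):
    """Number of times `term` occurs in the field's tokens."""
    return tokens.count(term)


def is_subset_match(field_tokens, query_terms):
    """Model of frange l=0 u=0 over sub(titleLength, sum(termfreq(...)))."""
    title_length = len(field_tokens)
    matched = sum(term_freq(field_tokens, t) for t in set(query_terms))
    return title_length - matched == 0


def contains_all(field_tokens, query_terms):
    """Case (1): the document contains every query term."""
    return all(t in field_tokens for t in query_terms)


def matches(field_tokens, query_terms):
    return contains_all(field_tokens, query_terms) or is_subset_match(
        field_tokens, query_terms
    )


doc = ["A", "B", "C"]  # the example document from the issue description
```

With the example from the issue, `A B C D` matches through the subset check, `A` matches through the all-terms check, and `D` or `A B D` match neither.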


was (Author: iorixxx):
bq. Perhaps you mean something like Solr's frange that filters based on the 
value? 
Exactly. Given that q=john smith, let's assume that we have a field titleLength 
that stores the number of words in the field.  We can even extract that info 
from norm doc values later on. Something like fq={!frange l=0 u=0} 
sub(titleLength, sum(termfreq(title,'smith'), termfreq(title,'john')))

bq. That would be O(docs) as it evaluates per doc.
Can't we make this filter query execute last, with cache=false cost=150?

> Support boolean subset matching
> ---
>
> Key: LUCENE-7148
> URL: https://issues.apache.org/jira/browse/LUCENE-7148
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/search
>Affects Versions: 5.x
>Reporter: Otmar Caduff
>  Labels: newbie
>
> In Lucene, I know of the possibility of Occur.SHOULD, Occur.MUST and the 
> “minimum should match” setting on the boolean query.
> Now, when querying, I want to
> - (1)  match the documents which either contain all the terms of the query 
> (Occur.MUST for all terms would do that) or,
> - (2)  if all terms for a given field of a document are a subset of the query 
> terms, that document should match as well.
> Example:
> Document d has field f with terms A, B, C
> Query with the following terms should match that document:
> A
> B
> A B
> A B C
> A B C D
> Query with the following terms should not match:
> D
> A B D






[jira] [Commented] (LUCENE-7148) Support boolean subset matching

2016-04-06 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227887#comment-15227887
 ] 

Ahmet Arslan commented on LUCENE-7148:
--

bq. Perhaps you mean something like Solr's frange that filters based on the 
value? 
Exactly. Given that q=john smith, let's assume that we have a field titleLength 
that stores the number of words in the field.  We can even extract that info 
from norm doc values later on. Something like fq={!frange l=0 u=0} 
sub(titleLength, sum(termfreq(title,'smith'), termfreq(title,'john')))

bq. That would be O(docs) as it evaluates per doc.
Can't we make this filter query execute last, with cache=false cost=150?

> Support boolean subset matching
> ---
>
> Key: LUCENE-7148
> URL: https://issues.apache.org/jira/browse/LUCENE-7148
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/search
>Affects Versions: 5.x
>Reporter: Otmar Caduff
>  Labels: newbie
>
> In Lucene, I know of the possibility of Occur.SHOULD, Occur.MUST and the 
> “minimum should match” setting on the boolean query.
> Now, when querying, I want to
> - (1)  match the documents which either contain all the terms of the query 
> (Occur.MUST for all terms would do that) or,
> - (2)  if all terms for a given field of a document are a subset of the query 
> terms, that document should match as well.
> Example:
> Document d has field f with terms A, B, C
> Query with the following terms should match that document:
> A
> B
> A B
> A B C
> A B C D
> Query with the following terms should not match:
> D
> A B D






[jira] [Commented] (LUCENE-7148) Support boolean subset matching

2016-04-05 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227378#comment-15227378
 ] 

Ahmet Arslan commented on LUCENE-7148:
--

Can't we have a function query that just returns the number of matching terms 
here? Then we could compare it with the document length.

> Support boolean subset matching
> ---
>
> Key: LUCENE-7148
> URL: https://issues.apache.org/jira/browse/LUCENE-7148
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/search
>Affects Versions: 5.x
>Reporter: Otmar Caduff
>  Labels: newbie
>
> In Lucene, I know of the possibility of Occur.SHOULD, Occur.MUST and the 
> “minimum should match” setting on the boolean query.
> Now, when querying, I want to
> - (1)  match the documents which either contain all the terms of the query 
> (Occur.MUST for all terms would do that) or,
> - (2)  if all terms for a given field of a document are a subset of the query 
> terms, that document should match as well.
> Example:
> Document d has field f with terms A, B, C
> Query with the following terms should match that document:
> A
> B
> A B
> A B C
> A B C D
> Query with the following terms should not match:
> D
> A B D






[jira] [Commented] (LUCENE-7132) ScoreDoc.score() returns a different value than that of Explanation's

2016-03-24 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210008#comment-15210008
 ] 

Ahmet Arslan commented on LUCENE-7132:
--

It is really hard to decipher what is going on inside the good old 
TFIDFSimilarity.

{code:title=TFIDFSimilarity.IDFStats.normalize|borderStyle=solid}
@Override
public void normalize(float queryNorm, float boost) {
  this.boost = boost;
  this.queryNorm = queryNorm;
  queryWeight = queryNorm * boost * idf.getValue();
  value = queryWeight * idf.getValue(); // idf for document
}
{code}

* Why does the query weight have an IDF multiplicand?
* Why is TFIDFSimilarity.IDFStats#value set to IDF squared?
* Why is TFIDFSimilarity.IDFStats#value needed even though we have 
TFIDFSimilarity.IDFStats.idf.getValue()?
* TFIDFSimilarity.TFIDFSimScorer#score returns tf(freq) * IDFStats.value, which 
looks like tf x IDF x IDF to me.
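
The arithmetic in the {{normalize}} snippet above can be traced with a small Python sketch. It mirrors only the two assignments shown there plus a simplified score of the form tf * value * norm; `field_norm` and the sample numbers are assumptions for illustration, and this is not the Lucene code. For context, the practical scoring function in the TFIDFSimilarity javadoc does list idf(t) squared, explaining that idf appears once for the term in the query and once for the term in the document.

```python
def normalize(query_norm, boost, idf):
    """Mirror of the two assignments in IDFStats.normalize shown above."""
    query_weight = query_norm * boost * idf  # query-side weight, first idf factor
    value = query_weight * idf               # document side, second idf factor
    return query_weight, value


def score(tf, value, field_norm=1.0):
    """Simplified per-term score: tf * value, scaled by a field norm."""
    return tf * value * field_norm


# Sample numbers chosen so the products are easy to read.
query_weight, value = normalize(query_norm=1.0, boost=1.0, idf=2.0)
term_score = score(tf=3.0, value=value)
```

With queryNorm and boost both 1, the per-term score reduces to tf * idf * idf, which is the tf x IDF x IDF shape described in the last bullet.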

> ScoreDoc.score() returns a different value than that of Explanation's
> -
>
> Key: LUCENE-7132
> URL: https://issues.apache.org/jira/browse/LUCENE-7132
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 5.5
>Reporter: Ahmet Arslan
>Assignee: Steve Rowe
> Attachments: LUCENE-7132.patch, SOLR-8884.patch, SOLR-8884.patch, 
> debug.xml
>
>
> Some of the folks 
> [reported|http://find.searchhub.org/document/80666f5c3b86ddda] that sometimes 
> explain's score can be different than the score requested by fields 
> parameter. Interestingly, Explain's scores would create a different ranking 
> than the original result list. This is something users experience, but it 
> cannot be re-produced deterministically.






[jira] [Updated] (LUCENE-7132) ScoreDoc.score() returns a different value than that of Explanation's

2016-03-23 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated LUCENE-7132:
-
Component/s: core/search

> ScoreDoc.score() returns a different value than that of Explanation's
> -
>
> Key: LUCENE-7132
> URL: https://issues.apache.org/jira/browse/LUCENE-7132
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 5.5
>Reporter: Ahmet Arslan
>Assignee: Steve Rowe
> Attachments: LUCENE-7132.patch, SOLR-8884.patch, SOLR-8884.patch, 
> debug.xml
>
>
> Some of the folks 
> [reported|http://find.searchhub.org/document/80666f5c3b86ddda] that sometimes 
> explain's score can be different than the score requested by fields 
> parameter. Interestingly, Explain's scores would create a different ranking 
> than the original result list. This is something users experience, but it 
> cannot be re-produced deterministically.






[jira] [Commented] (LUCENE-7132) ScoreDoc.score() returns a different value than that of Explanation's

2016-03-23 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209155#comment-15209155
 ] 

Ahmet Arslan commented on LUCENE-7132:
--

Thanks Steve for taking care of this!

> ScoreDoc.score() returns a different value than that of Explanation's
> -
>
> Key: LUCENE-7132
> URL: https://issues.apache.org/jira/browse/LUCENE-7132
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 5.5
>Reporter: Ahmet Arslan
>Assignee: Steve Rowe
> Attachments: LUCENE-7132.patch, SOLR-8884.patch, SOLR-8884.patch, 
> debug.xml
>
>
> Some of the folks 
> [reported|http://find.searchhub.org/document/80666f5c3b86ddda] that sometimes 
> explain's score can be different than the score requested by fields 
> parameter. Interestingly, Explain's scores would create a different ranking 
> than the original result list. This is something users experience, but it 
> cannot be re-produced deterministically.






[jira] [Updated] (LUCENE-7132) ScoreDoc.score() returns a different value than that of Explanation's

2016-03-23 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated LUCENE-7132:
-
Attachment: LUCENE-7132.patch

Lucene-only patch. Interestingly, the *testExplainScoreEquality* method also 
failed once for me, which can be reproduced with: {{ant test  -Dtestcase=TestExplain 
-Dtests.method=testExplainScoreEquality -Dtests.seed=B90C674F754D524 
-Dtests.locale=de -Dtests.timezone=Etc/GMT-12 -Dtests.asserts=true 
-Dtests.file.encoding=UTF-8}} 
However, the *testRajeshData* method fails more frequently.

> ScoreDoc.score() returns a different value than that of Explanation's
> -
>
> Key: LUCENE-7132
> URL: https://issues.apache.org/jira/browse/LUCENE-7132
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 5.5
>Reporter: Ahmet Arslan
>Assignee: Steve Rowe
> Attachments: LUCENE-7132.patch, SOLR-8884.patch, SOLR-8884.patch, 
> debug.xml
>
>
> Some of the folks 
> [reported|http://find.searchhub.org/document/80666f5c3b86ddda] that sometimes 
> explain's score can be different than the score requested by fields 
> parameter. Interestingly, Explain's scores would create a different ranking 
> than the original result list. This is something users experience, but it 
> cannot be re-produced deterministically.






[jira] [Updated] (LUCENE-7132) ScoreDoc.score() returns a different value than that of Explanation's

2016-03-23 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated LUCENE-7132:
-
Summary: ScoreDoc.score() returns a different value than that of 
Explanation's  (was: fl=score returns a different value than that of Explain's)

> ScoreDoc.score() returns a different value than that of Explanation's
> -
>
> Key: LUCENE-7132
> URL: https://issues.apache.org/jira/browse/LUCENE-7132
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 5.5
>Reporter: Ahmet Arslan
>Assignee: Steve Rowe
> Attachments: SOLR-8884.patch, SOLR-8884.patch, debug.xml
>
>
> Some of the folks 
> [reported|http://find.searchhub.org/document/80666f5c3b86ddda] that sometimes 
> explain's score can be different than the score requested by fields 
> parameter. Interestingly, Explain's scores would create a different ranking 
> than the original result list. This is something users experience, but it 
> cannot be re-produced deterministically.






[jira] [Commented] (SOLR-8884) fl=score returns a different value than that of Explain's

2016-03-23 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208940#comment-15208940
 ] 

Ahmet Arslan commented on SOLR-8884:


Can someone who has the appropriate permissions please move SOLR-8884 to 
LUCENE-?

> fl=score returns a different value than that of Explain's
> -
>
> Key: SOLR-8884
> URL: https://issues.apache.org/jira/browse/SOLR-8884
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 5.5
>Reporter: Ahmet Arslan
> Attachments: SOLR-8884.patch, SOLR-8884.patch, debug.xml
>
>
> Some of the folks 
> [reported|http://find.searchhub.org/document/80666f5c3b86ddda] that sometimes 
> explain's score can be different than the score requested by fields 
> parameter. Interestingly, Explain's scores would create a different ranking 
> than the original result list. This is something users experience, but it 
> cannot be re-produced deterministically.






[jira] [Updated] (SOLR-8884) fl=score returns a different value than that of Explain's

2016-03-23 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated SOLR-8884:
---
Attachment: SOLR-8884.patch

This is truly a Lucene-level bug. The attached patch includes a failing test case. 
It can be reproduced with: {{ant test  -Dtestcase=TestExplain 
-Dtests.method=testRajeshData -Dtests.seed=D5E55A7E84F4C82C -Dtests.slow=true 
-Dtests.locale=es-HN -Dtests.timezone=Asia/Samarkand -Dtests.asserts=true 
-Dtests.file.encoding=UTF-8}}

> fl=score returns a different value than that of Explain's
> -
>
> Key: SOLR-8884
> URL: https://issues.apache.org/jira/browse/SOLR-8884
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 5.5
>Reporter: Ahmet Arslan
> Attachments: SOLR-8884.patch, SOLR-8884.patch, debug.xml
>
>
> Some of the folks 
> [reported|http://find.searchhub.org/document/80666f5c3b86ddda] that sometimes 
> explain's score can be different than the score requested by fields 
> parameter. Interestingly, Explain's scores would create a different ranking 
> than the original result list. This is something users experience, but it 
> cannot be re-produced deterministically.






[jira] [Updated] (SOLR-8884) fl=score returns a different value than that of Explain's

2016-03-22 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated SOLR-8884:
---
Attachment: SOLR-8884.patch

Randomized test case for Lucene, in the hope that it will trigger the problem at 
some point. I will try to write a Solr counterpart.

> fl=score returns a different value than that of Explain's
> -
>
> Key: SOLR-8884
> URL: https://issues.apache.org/jira/browse/SOLR-8884
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 5.5
>Reporter: Ahmet Arslan
> Attachments: SOLR-8884.patch, debug.xml
>
>
> Some of the folks 
> [reported|http://find.searchhub.org/document/80666f5c3b86ddda] that sometimes 
> explain's score can be different than the score requested by fields 
> parameter. Interestingly, Explain's scores would create a different ranking 
> than the original result list. This is something users experience, but it 
> cannot be re-produced deterministically.






[jira] [Updated] (SOLR-8884) fl=score returns a different value than that of Explain's

2016-03-22 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated SOLR-8884:
---
Attachment: debug.xml

Here is Rajesh's response file, which demonstrates the problem.

> fl=score returns a different value than that of Explain's
> -
>
> Key: SOLR-8884
> URL: https://issues.apache.org/jira/browse/SOLR-8884
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 5.5
>Reporter: Ahmet Arslan
> Attachments: debug.xml
>
>
> Some of the folks 
> [reported|http://find.searchhub.org/document/80666f5c3b86ddda] that sometimes 
> explain's score can be different than the score requested by fields 
> parameter. Interestingly, Explain's scores would create a different ranking 
> than the original result list. This is something users experience, but it 
> cannot be re-produced deterministically.






[jira] [Created] (SOLR-8884) fl=score returns a different value than that of Explain's

2016-03-22 Thread Ahmet Arslan (JIRA)
Ahmet Arslan created SOLR-8884:
--

 Summary: fl=score returns a different value than that of Explain's
 Key: SOLR-8884
 URL: https://issues.apache.org/jira/browse/SOLR-8884
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 5.5
Reporter: Ahmet Arslan


Some users 
[reported|http://find.searchhub.org/document/80666f5c3b86ddda] that explain's 
score sometimes differs from the score requested through the fields parameter 
({{fl=score}}). Interestingly, ranking by explain's scores would produce a 
different order than the original result list. Users do experience this, but it 
cannot be reproduced deterministically.






[jira] [Updated] (LUCENE-7014) Use TimeUnit.TARGETUNIT.convert() to convert between time units

2016-02-09 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated LUCENE-7014:
-
Attachment: LUCENE-7014.patch

I have started to incorporate the suggested changes. This patch covers only the 
{{org.apache.lucene.index.*}} files. For the three-digit cases, I switched to 
milliseconds. However, I rounded the {{%.1f}} cases. Is this reasonable in terms of 
precision loss? Maybe we should not touch these cases?
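
To make the precision question concrete, here is a minimal sketch of the trade-off. It is not taken from the patch: the class and method names are hypothetical, and it only contrasts a whole-number {{TimeUnit}} conversion with a one-decimal seconds value.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;

public class FractionalSecondsSketch {
    // Whole milliseconds: the TimeUnit conversion truncates toward zero.
    public static long wholeMillis(long nanos) {
        return TimeUnit.NANOSECONDS.toMillis(nanos);
    }

    // Seconds with one decimal: floating-point division, rounded by the formatter.
    public static String secondsOneDecimal(long nanos) {
        double seconds = nanos / (double) TimeUnit.SECONDS.toNanos(1);
        return String.format(Locale.ROOT, "%.1f", seconds);
    }

    public static void main(String[] args) {
        long nanos = 1_234_567_890L; // hypothetical elapsed time
        System.out.println(wholeMillis(nanos) + " ms");        // prints 1234 ms
        System.out.println(secondsOneDecimal(nanos) + " sec"); // prints 1.2 sec
    }
}
```

The millisecond form keeps three digits below the second without any floating-point arithmetic, while the one-decimal form keeps only a tenth of a second.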

> Use TimeUnit.TARGETUNIT.convert() to convert between time units
> ---
>
> Key: LUCENE-7014
> URL: https://issues.apache.org/jira/browse/LUCENE-7014
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: master, 5.4.1
>Reporter: Ahmet Arslan
>Priority: Minor
> Fix For: 5.5, master, 6.0
>
> Attachments: LUCENE-7014.patch, LUCENE-7014.patch
>
>
> Re-phrased from [~steve_rowe]'s 
> [comment|https://issues.apache.org/jira/browse/LUCENE-6823?focusedCommentId=14941283=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14941283]
>  :
> System.nanoTime(), which is guaranteed to be monotonic, is now used to 
> record elapsed times. In several places, conversion from nanoseconds to 
> some target unit (e.g. seconds, milliseconds) is performed using hard-coded 
> conversion constants, which is prone to mistakes. 
> It would be nice to use {{TimeUnit.TARGETUNIT.convert(sourceDuration, 
> TimeUnit.SOURCEUNIT)}} instead.






[jira] [Updated] (LUCENE-7014) Use TimeUnit.TARGETUNIT.convert() to convert between time units

2016-02-04 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated LUCENE-7014:
-
Attachment: LUCENE-7014.patch

> Use TimeUnit.TARGETUNIT.convert() to convert between time units
> ---
>
> Key: LUCENE-7014
> URL: https://issues.apache.org/jira/browse/LUCENE-7014
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: master, 5.4.1
>Reporter: Ahmet Arslan
>Priority: Minor
> Fix For: 5.5, master, 6.0
>
> Attachments: LUCENE-7014.patch
>
>
> Re-phrased from [~steve_rowe]'s 
> [comment|https://issues.apache.org/jira/browse/LUCENE-6823?focusedCommentId=14941283=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14941283]
>  :
> System.nanoTime(), which is guaranteed to be monotonic, is now used to 
> record elapsed times. In several places, conversion from nanoseconds to 
> some target unit (e.g. seconds, milliseconds) is performed using hard-coded 
> conversion constants, which is prone to mistakes. 
> It would be nice to use {{TimeUnit.TARGETUNIT.convert(sourceDuration, 
> TimeUnit.SOURCEUNIT)}} instead.






[jira] [Created] (LUCENE-7014) Use TimeUnit.TARGETUNIT.convert() to convert between time units

2016-02-04 Thread Ahmet Arslan (JIRA)
Ahmet Arslan created LUCENE-7014:


 Summary: Use TimeUnit.TARGETUNIT.convert() to convert between time 
units
 Key: LUCENE-7014
 URL: https://issues.apache.org/jira/browse/LUCENE-7014
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 5.4.1, master
Reporter: Ahmet Arslan
Priority: Minor
 Fix For: 5.5, master, 6.0


Re-phrased from [~steve_rowe]'s 
[comment|https://issues.apache.org/jira/browse/LUCENE-6823?focusedCommentId=14941283=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14941283]
 :

System.nanoTime(), which is guaranteed to be monotonic, is now used to record 
elapsed times. In several places, conversion from nanoseconds to some target 
unit (e.g. seconds, milliseconds) is performed using hard-coded conversion 
constants, which is prone to mistakes. 

It would be nice to use {{TimeUnit.TARGETUNIT.convert(sourceDuration, 
TimeUnit.SOURCEUNIT)}} instead.
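
A minimal sketch of the suggested style, next to the hard-coded form it would replace. The class and method names are hypothetical and not taken from the Lucene code base.

```java
import java.util.concurrent.TimeUnit;

public class ElapsedTimeSketch {
    // Hard-coded constant: easy to get wrong by a factor of 1000.
    public static long toMillisByHand(long nanos) {
        return nanos / 1_000_000L;
    }

    // Same conversion, with both units spelled out by TimeUnit.
    public static long toMillis(long nanos) {
        return TimeUnit.MILLISECONDS.convert(nanos, TimeUnit.NANOSECONDS);
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += i; // some work to time
        }
        long elapsedNanos = System.nanoTime() - start;
        System.out.println("sum=" + sum + ", elapsed ms: " + toMillis(elapsedNanos));
    }
}
```

Both methods truncate toward zero, so they return the same value; the second only removes the need to remember the constant.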






[jira] [Updated] (SOLR-8445) fix line separator in log4j.properties files

2016-01-27 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated SOLR-8445:
---
Fix Version/s: 5.5

> fix line separator in log4j.properties files
> 
>
> Key: SOLR-8445
> URL: https://issues.apache.org/jira/browse/SOLR-8445
> Project: Solr
>  Issue Type: Bug
>  Components: Server
>Affects Versions: 5.4, Trunk
>Reporter: Ahmet Arslan
>Priority: Trivial
>  Labels: log4j, logging
> Fix For: 5.5, Trunk
>
> Attachments: SOLR-8445.patch, SOLR-8445.patch
>
>
> new line is mistyped in conversion pattern 






[jira] [Updated] (SOLR-8570) Make discountOverlaps' initialization value consistent across subclasses of SimilarityFactory

2016-01-27 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated SOLR-8570:
---
Fix Version/s: 5.5

> Make discountOverlaps' initialization value consistent across subclasses of 
> SimilarityFactory 
> --
>
> Key: SOLR-8570
> URL: https://issues.apache.org/jira/browse/SOLR-8570
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 5.4
>Reporter: Ahmet Arslan
>Priority: Minor
>  Labels: similarity
> Fix For: 5.5, Trunk
>
> Attachments: SOLR-8570.patch, SOLR-8570.patch
>
>
> Subclasses of SimilarityFactory have a member variable named 
> {{discountOverlaps}}.
> In ClassicSimilarityFactory, it is initialized to {{true}} in SOLR-5561.
> Since discountOverlaps' default value is true, we should do the same in 
> remaining subclasses.






[jira] [Updated] (SOLR-8445) fix line separator in log4j.properties files

2016-01-26 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated SOLR-8445:
---
Attachment: SOLR-8445.patch

Patch generated by {{git diff}}.

> fix line separator in log4j.properties files
> 
>
> Key: SOLR-8445
> URL: https://issues.apache.org/jira/browse/SOLR-8445
> Project: Solr
>  Issue Type: Bug
>  Components: Server
>Affects Versions: 5.4, Trunk
>Reporter: Ahmet Arslan
>Priority: Trivial
>  Labels: log4j, logging
> Fix For: Trunk
>
> Attachments: SOLR-8445.patch, SOLR-8445.patch
>
>
> new line is mistyped in conversion pattern 






[jira] [Updated] (SOLR-8570) Make discountOverlaps' initialization value consistent across subclasses of SimilarityFactory

2016-01-26 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated SOLR-8570:
---
Attachment: SOLR-8570.patch

Patch generated by {{git diff origin/master..SOLR-8570}} .

> Make discountOverlaps' initialization value consistent across subclasses of 
> SimilarityFactory 
> --
>
> Key: SOLR-8570
> URL: https://issues.apache.org/jira/browse/SOLR-8570
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 5.4
>Reporter: Ahmet Arslan
>Priority: Minor
>  Labels: similarity
> Fix For: Trunk
>
> Attachments: SOLR-8570.patch, SOLR-8570.patch
>
>
> Subclasses of SimilarityFactory have a member variable named 
> {{discountOverlaps}}.
> In ClassicSimilarityFactory, it is initialized to {{true}} in SOLR-5561.
> Since discountOverlaps' default value is true, we should do the same in 
> remaining subclasses.






[jira] [Updated] (SOLR-8445) fix line separator in log4j.properties files

2016-01-19 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated SOLR-8445:
---
Fix Version/s: Trunk

> fix line separator in log4j.properties files
> 
>
> Key: SOLR-8445
> URL: https://issues.apache.org/jira/browse/SOLR-8445
> Project: Solr
>  Issue Type: Bug
>  Components: Server
>Affects Versions: 5.4, Trunk
>Reporter: Ahmet Arslan
>Priority: Trivial
>  Labels: log4j, logging
> Fix For: Trunk
>
> Attachments: SOLR-8445.patch
>
>
> new line is mistyped in conversion pattern 






[jira] [Updated] (SOLR-8445) fix line separator in log4j.properties files

2016-01-19 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated SOLR-8445:
---
Labels: log4j logging  (was: )

> fix line separator in log4j.properties files
> 
>
> Key: SOLR-8445
> URL: https://issues.apache.org/jira/browse/SOLR-8445
> Project: Solr
>  Issue Type: Bug
>  Components: Server
>Affects Versions: 5.4, Trunk
>Reporter: Ahmet Arslan
>Priority: Trivial
>  Labels: log4j, logging
> Fix For: Trunk
>
> Attachments: SOLR-8445.patch
>
>
> new line is mistyped in conversion pattern 






[jira] [Updated] (SOLR-8570) Make discountOverlaps' initialization value consistent across subclasses of SimilarityFactory

2016-01-19 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated SOLR-8570:
---
Attachment: SOLR-8570.patch

I had this patch handy. However, does moving {{protected boolean 
discountOverlaps = true;}} into SimilarityFactory break any good practices?
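
For illustration, the layout in question could look roughly like the sketch below. All class names here are hypothetical stand-ins, not the real Solr classes, and the {{init}} signature is an assumption made only to keep the example self-contained.

```java
import java.util.Collections;
import java.util.Map;

public class DiscountOverlapsSketch {
    // Hypothetical stand-in for the base factory; not the real Solr class.
    static abstract class BaseFactory {
        protected boolean discountOverlaps = true; // single shared default

        void init(Map<String, String> params) {
            String configured = params.get("discountOverlaps");
            if (configured != null) {
                discountOverlaps = Boolean.parseBoolean(configured);
            }
        }
    }

    // Hypothetical subclasses: neither repeats the field or its default.
    static class FactoryA extends BaseFactory {}
    static class FactoryB extends BaseFactory {}

    public static void main(String[] args) {
        FactoryA a = new FactoryA();
        a.init(Collections.<String, String>emptyMap());
        FactoryB b = new FactoryB();
        b.init(Collections.singletonMap("discountOverlaps", "false"));
        System.out.println(a.discountOverlaps + " " + b.discountOverlaps); // prints true false
    }
}
```

With the field in the base class, a subclass cannot end up with a different default by accident; the cost is that the field becomes part of the base class surface that every subclass sees.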

> Make discountOverlaps' initialization value consistent across subclasses of 
> SimilarityFactory 
> --
>
> Key: SOLR-8570
> URL: https://issues.apache.org/jira/browse/SOLR-8570
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 5.4
>Reporter: Ahmet Arslan
>Priority: Minor
>  Labels: similarity
> Fix For: Trunk
>
> Attachments: SOLR-8570.patch
>
>
> Subclasses of SimilarityFactory have a member variable named 
> {{discountOverlaps}}.
> In ClassicSimilarityFactory, it is initialized to {{true}} in SOLR-5561.
> Since discountOverlaps' default value is true, we should do the same in 
> remaining subclasses.






[jira] [Created] (SOLR-8570) Make discountOverlaps' initialization value consistent across subclasses of SimilarityFactory

2016-01-19 Thread Ahmet Arslan (JIRA)
Ahmet Arslan created SOLR-8570:
--

 Summary: Make discountOverlaps' initialization value consistent 
across subclasses of SimilarityFactory 
 Key: SOLR-8570
 URL: https://issues.apache.org/jira/browse/SOLR-8570
 Project: Solr
  Issue Type: Improvement
Affects Versions: 5.4
Reporter: Ahmet Arslan
Priority: Minor
 Fix For: Trunk


Subclasses of SimilarityFactory have a member variable named 
{{discountOverlaps}}.
In ClassicSimilarityFactory, it is initialized to {{true}} in SOLR-5561.
Since discountOverlaps' default value is true, we should do the same in 
remaining subclasses.






[jira] [Commented] (LUCENE-6818) Implementing Divergence from Independence (DFI) Term-Weighting for Lucene/Solr

2016-01-19 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106999#comment-15106999
 ] 

Ahmet Arslan commented on LUCENE-6818:
--

Thanks [~rcmuir] for taking care of this.

bq. For the solr factory changes around discountOverlaps, can you make a 
separate issue for that?
Created SOLR-8570

> Implementing Divergence from Independence (DFI) Term-Weighting for Lucene/Solr
> --
>
> Key: LUCENE-6818
> URL: https://issues.apache.org/jira/browse/LUCENE-6818
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/query/scoring
>Affects Versions: 5.3
>Reporter: Ahmet Arslan
>Assignee: Robert Muir
>Priority: Minor
>  Labels: similarity
> Fix For: 5.5, Trunk
>
> Attachments: LUCENE-6818.patch, LUCENE-6818.patch, LUCENE-6818.patch, 
> LUCENE-6818.patch, LUCENE-6818.patch
>
>
> As explained in the 
> [write-up|http://lucidworks.com/blog/flexible-ranking-in-lucene-4], many 
> state-of-the-art ranking model implementations are added to Apache Lucene. 
> This issue aims to include DFI model, which is the non-parametric counterpart 
> of the Divergence from Randomness (DFR) framework.
> DFI is both parameter-free and non-parametric:
> * parameter-free: it does not require any parameter tuning or training.
>  * non-parametric: it does not make any assumptions about word frequency 
> distributions on document collections.
> It is highly recommended *not* to remove stopwords (very common terms: the, 
> of, and, to, a, in, for, is, on, that, etc) with this similarity.
> For more information see: [A nonparametric term weighting method for 
> information retrieval based on measuring the divergence from 
> independence|http://dx.doi.org/10.1007/s10791-013-9225-4]






[jira] [Updated] (SOLR-839) XML Query Parser support (deftype=xmlparser)

2016-01-12 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated SOLR-839:
--
Attachment: SOLR-839.patch

This patch replaces the utf8 constant string with StandardCharsets.UTF_8, as 
suggested by LUCENE-5560.
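
As a generic illustration of that kind of replacement (a sketch with hypothetical names, not the code touched by the patch), compare a charset looked up by name with the {{StandardCharsets}} constant:

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetSketch {
    // Charset given as a string: resolved at run time, declares a checked exception.
    public static byte[] encodeByName(String s) throws UnsupportedEncodingException {
        return s.getBytes("UTF-8");
    }

    // Charset given as a constant: no lookup by name and no checked exception.
    public static byte[] encodeByConstant(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String text = "\u00e7ay"; // three characters, the first one non-ASCII
        System.out.println(Arrays.equals(encodeByName(text), encodeByConstant(text))); // prints true
    }
}
```

The bytes are identical either way; the constant mainly removes the chance of a misspelled charset name and the exception handling that comes with it.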

> XML Query Parser support (deftype=xmlparser)
> 
>
> Key: SOLR-839
> URL: https://issues.apache.org/jira/browse/SOLR-839
> Project: Solr
>  Issue Type: New Feature
>  Components: query parsers
>Affects Versions: 1.3, 5.4, Trunk
>Reporter: Erik Hatcher
>Assignee: Christine Poerschke
>Priority: Minor
> Fix For: Trunk
>
> Attachments: SOLR-839-object-parser.patch, SOLR-839.patch, 
> SOLR-839.patch, SOLR-839.patch, lucene-xml-query-parser-2.4-dev.jar
>
>
> Lucene contrib includes a query parser that is able to create the 
> full-spectrum of Lucene queries, using an XML data structure.
> This patch adds "xml" query parser support to Solr.
> Example (from 
> {{lucene/queryparser/src/test/org/apache/lucene/queryparser/xml/NestedBooleanQuery.xml}}):
> {code}
>   
>   
> 
>   
> 
> doesNotExistButShouldBeOKBecauseOtherClauseExists
>   
> 
>   
>   
> bank
>   
> 
> {code}






[jira] [Updated] (SOLR-8445) fix line separator in log4j.properties files

2015-12-18 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated SOLR-8445:
---
Attachment: SOLR-8445.patch

> fix line separator in log4j.properties files
> 
>
> Key: SOLR-8445
> URL: https://issues.apache.org/jira/browse/SOLR-8445
> Project: Solr
>  Issue Type: Bug
>  Components: Server
>Affects Versions: 5.4, Trunk
>Reporter: Ahmet Arslan
>Priority: Trivial
> Attachments: SOLR-8445.patch
>
>
> new line is mistyped in conversion pattern 






[jira] [Created] (SOLR-8445) fix line separator in log4j.properties files

2015-12-18 Thread Ahmet Arslan (JIRA)
Ahmet Arslan created SOLR-8445:
--

 Summary: fix line separator in log4j.properties files
 Key: SOLR-8445
 URL: https://issues.apache.org/jira/browse/SOLR-8445
 Project: Solr
  Issue Type: Bug
  Components: Server
Affects Versions: 5.4, Trunk
Reporter: Ahmet Arslan
Priority: Trivial


The newline is mistyped in the conversion pattern.






[jira] [Updated] (SOLR-2649) MM ignored in edismax queries with operators

2015-12-03 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated SOLR-2649:
---
Priority: Major  (was: Minor)

> MM ignored in edismax queries with operators
> 
>
> Key: SOLR-2649
> URL: https://issues.apache.org/jira/browse/SOLR-2649
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers
>Reporter: Magnus Bergmark
>Assignee: Erick Erickson
> Fix For: 4.9, Trunk
>
> Attachments: SOLR-2649-with-Qop.patch, SOLR-2649-with-Qop.patch, 
> SOLR-2649.diff, SOLR-2649.patch
>
>
> Hypothetical scenario:
>   1. User searches for "stocks oil gold" with MM set to "50%"
>   2. User adds "-stockings" to the query: "stocks oil gold -stockings"
>   3. User gets no hits since MM was ignored and all terms were AND-ed 
> together
> The behavior seems to be intentional, although the reason why is never 
> explained:
>   // For correct lucene queries, turn off mm processing if there
>   // were explicit operators (except for AND).
>   boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0; 
> (lines 232-234 taken from 
> tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java)
> This makes edismax unsuitable as a replacement for dismax; mm is one of the 
> primary features of dismax.






[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators

2015-12-03 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038596#comment-15038596
 ] 

Ahmet Arslan commented on SOLR-2649:


bq. do we have a consensus on what the new behavior should be? 
I think it should be Jan's 
[proposal|https://issues.apache.org/jira/browse/SOLR-2649?focusedCommentId=13199400=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13199400].

Personally, I understand that edismax was initially designed for power users who 
are supposed to know what they are looking for. However, this assumption looks too 
strong now that a wide range of users has started to use edismax.
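
For reference, the current behavior can be traced with a small sketch built around the condition quoted in the issue description. The operator counting below is a hypothetical simplification (whitespace tokens only) and not how the real parser tokenizes; only the final boolean expression comes from the quoted source.

```java
public class MinMatchSketch {
    // Hypothetical, simplified operator counting over whitespace-separated tokens.
    public static boolean doMinMatched(String query) {
        int numOR = 0, numNOT = 0, numPluses = 0, numMinuses = 0;
        for (String token : query.trim().split("\\s+")) {
            if (token.equals("OR")) {
                numOR++;
            } else if (token.equals("NOT")) {
                numNOT++;
            } else if (token.startsWith("+")) {
                numPluses++;
            } else if (token.startsWith("-")) {
                numMinuses++;
            }
        }
        // Condition quoted in the issue description: any explicit operator
        // other than AND switches mm processing off.
        return (numOR + numNOT + numPluses + numMinuses) == 0;
    }

    public static void main(String[] args) {
        System.out.println(doMinMatched("stocks oil gold"));            // prints true
        System.out.println(doMinMatched("stocks oil gold -stockings")); // prints false
    }
}
```

Adding the single {{-stockings}} clause flips the flag, which matches the scenario in the description where mm stops being applied.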

> MM ignored in edismax queries with operators
> 
>
> Key: SOLR-2649
> URL: https://issues.apache.org/jira/browse/SOLR-2649
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Reporter: Magnus Bergmark
>Assignee: Erick Erickson
> Fix For: 4.9, Trunk
>
> Attachments: SOLR-2649-with-Qop.patch, SOLR-2649-with-Qop.patch, 
> SOLR-2649.diff, SOLR-2649.patch
>
>
> Hypothetical scenario:
>   1. User searches for "stocks oil gold" with MM set to "50%"
>   2. User adds "-stockings" to the query: "stocks oil gold -stockings"
>   3. User gets no hits since MM was ignored and all terms were AND-ed 
> together
> The behavior seems to be intentional, although the reason why is never 
> explained:
>   // For correct lucene queries, turn off mm processing if there
>   // were explicit operators (except for AND).
>   boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0; 
> (lines 232-234 taken from 
> tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java)
> This makes edismax unsuitable as a replacement for dismax; mm is one of the 
> primary features of dismax.
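
For readers following the scenario above, here is a minimal sketch of the quoted condition in isolation. The class and method names are hypothetical and only the boolean expression comes from the quoted source; the operator counts are passed in by hand rather than produced by a real parser.

```java
// Illustration only: reproduces the quoted condition, nothing else from edismax.
class MinMatchToggle {
    // mm processing stays on only when the query has no explicit
    // OR, NOT, '+' or '-' operators (AND is not counted).
    static boolean doMinMatched(int numOR, int numNOT, int numPluses, int numMinuses) {
        return (numOR + numNOT + numPluses + numMinuses) == 0;
    }

    public static void main(String[] args) {
        // "stocks oil gold": no operators, so mm=50% would be applied.
        System.out.println(doMinMatched(0, 0, 0, 0));
        // "stocks oil gold -stockings": a single minus switches mm off.
        System.out.println(doMinMatched(0, 0, 0, 1));
    }
}
```

With mm switched off, the remaining terms fall back to the default operator, which is how step 3 of the scenario ends up with no hits.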






[jira] [Commented] (SOLR-8339) SolrDocument and SolrInputDocument should have a common interface

2015-12-02 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035846#comment-15035846
 ] 

Ahmet Arslan commented on SOLR-8339:


With this change, can we remove 
{{org.apache.solr.client.solrj.util.ClientUtils#toSolrInputDocument}} method? 
And {{org.apache.solr.client.solrj.util.ClientUtils#toSolrDocument}} ?

> SolrDocument and SolrInputDocument should have a common interface
> -
>
> Key: SOLR-8339
> URL: https://issues.apache.org/jira/browse/SOLR-8339
> Project: Solr
>  Issue Type: Bug
>Reporter: Ishan Chattopadhyaya
> Attachments: SOLR-8339.patch, SOLR-8339.patch, SOLR-8339.patch
>
>
> Currently, both share a Map interface (SOLR-928). However, there are many 
> common methods like createField(), setField() etc. that should perhaps go 
> into an interface/abstract class.






[jira] [Commented] (SOLR-8345) Wrong query parsing

2015-11-26 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15029175#comment-15029175
 ] 

Ahmet Arslan commented on SOLR-8345:


Saar, this is not a bug. Minus is a special query parser character. Other query 
parsers, for example the terms query parser or the raw query parser, can be used 
to query values that contain such special characters. Alternatively, you can 
escape those special characters. 

Please raise your questions on the Solr mailing list.
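
To make the escaping option concrete, here is a minimal sketch. The class name and the character list are assumptions made for illustration (the list is a subset, not the parser's full set of special characters); in a real client, SolrJ's ClientUtils.escapeQueryChars is the helper to reach for.

```java
// Illustration only: hand-rolled escaping so the example has no dependencies.
class QueryEscapeSketch {
    // Illustrative subset of characters the query parser treats as special.
    private static final String SPECIAL = "+-!(){}[]^\"~*?:\\/";

    // Prefix every special character with a backslash.
    static String escape(String raw) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The minus is escaped, so it is no longer read as a NOT operator.
        System.out.println("myfield:" + escape("-1"));
    }
}
```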

> Wrong query parsing
> ---
>
> Key: SOLR-8345
> URL: https://issues.apache.org/jira/browse/SOLR-8345
> Project: Solr
>  Issue Type: Bug
>Reporter: Saar Carmi
>Priority: Minor
>
> When sending a query for a numeric field with =myfield:(-1) the query is 
> parsed as -myfield:1 
> I would expect it to be either parsed as myfield:"-1" or an exception to be 
> returned.






[jira] [Updated] (LUCENE-6818) Implementing Divergence from Independence (DFI) Term-Weighting for Lucene/Solr

2015-11-09 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated LUCENE-6818:
-
Attachment: LUCENE-6818.patch

Patch updated to current trunk (revision 1713433)

> Implementing Divergence from Independence (DFI) Term-Weighting for Lucene/Solr
> --
>
> Key: LUCENE-6818
> URL: https://issues.apache.org/jira/browse/LUCENE-6818
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/query/scoring
>Affects Versions: 5.3
>Reporter: Ahmet Arslan
>Assignee: Robert Muir
>Priority: Minor
>  Labels: similarity
> Fix For: Trunk
>
> Attachments: LUCENE-6818.patch, LUCENE-6818.patch, LUCENE-6818.patch, 
> LUCENE-6818.patch, LUCENE-6818.patch
>
>
> As explained in the 
> [write-up|http://lucidworks.com/blog/flexible-ranking-in-lucene-4], many 
> state-of-the-art ranking model implementations are added to Apache Lucene. 
> This issue aims to include DFI model, which is the non-parametric counterpart 
> of the Divergence from Randomness (DFR) framework.
> DFI is both parameter-free and non-parametric:
> * parameter-free: it does not require any parameter tuning or training.
>  * non-parametric: it does not make any assumptions about word frequency 
> distributions on document collections.
> It is highly recommended *not* to remove stopwords (very common terms: the, 
> of, and, to, a, in, for, is, on, that, etc) with this similarity.
> For more information see: [A nonparametric term weighting method for 
> information retrieval based on measuring the divergence from 
> independence|http://dx.doi.org/10.1007/s10791-013-9225-4]






[jira] [Updated] (LUCENE-6818) Implementing Divergence from Independence (DFI) Term-Weighting for Lucene/Solr

2015-09-29 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated LUCENE-6818:
-
Attachment: LUCENE-6818.patch

I tried to implement Robert's suggestion in 
{{TestSimilarityBase#testCrazyIndexTimeBoosts}}.
It iterates over all possible norm values and 10 different term frequency (_tf_) 
values. NaN, Infinity, and negative values are checked. But I am not sure about 
the negative check: some models can return negative scores for certain terms. 
For example, BM25 returns negative scores for common terms.

Currently only DFI is tested, because the other models fail the test in its 
current form.

A random question:

What is the preferred course of action during scoring when term frequency is 
greater than document length?


I think we should simply recommend using index-time boosts only with 
ClassicSimilarity. I wonder how SweetSpotSimilarity works with index-time 
boosts, where artificially shortening the document length may decrease its rank.
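
As a rough illustration of the kind of check described above, the sketch below sweeps document frequencies and verifies that an idf value is neither NaN nor infinite. It uses the textbook Robertson/Sparck Jones idf, which is an assumption made for the example and not necessarily what Lucene's BM25Similarity computes; the class and method names are hypothetical. It also shows why a blanket "no negative scores" assertion is debatable: this idf is negative for any term that occurs in more than half of the documents.

```java
// Illustration only: a toy sanity check, not the test from the patch.
class ScoreSanitySketch {
    // Textbook idf: log((N - n + 0.5) / (n + 0.5)).
    static double classicIdf(long docCount, long docFreq) {
        return Math.log((docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // True when the score is neither NaN nor infinite.
    static boolean isFiniteScore(double score) {
        return !Double.isNaN(score) && !Double.isInfinite(score);
    }

    public static void main(String[] args) {
        long docCount = 100;
        for (long docFreq = 1; docFreq <= docCount; docFreq++) {
            double idf = classicIdf(docCount, docFreq);
            if (!isFiniteScore(idf)) {
                throw new AssertionError("non-finite idf for docFreq=" + docFreq);
            }
        }
        // A term present in 80 of 100 documents gets a negative weight.
        System.out.println(classicIdf(100, 80) < 0);
    }
}
```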

> Implementing Divergence from Independence (DFI) Term-Weighting for Lucene/Solr
> --
>
> Key: LUCENE-6818
> URL: https://issues.apache.org/jira/browse/LUCENE-6818
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/query/scoring
>Affects Versions: 5.3
>Reporter: Ahmet Arslan
>Assignee: Robert Muir
>Priority: Minor
>  Labels: similarity
> Fix For: Trunk
>
> Attachments: LUCENE-6818.patch, LUCENE-6818.patch, LUCENE-6818.patch, 
> LUCENE-6818.patch
>
>
> As explained in the 
> [write-up|http://lucidworks.com/blog/flexible-ranking-in-lucene-4], many 
> state-of-the-art ranking model implementations are added to Apache Lucene. 
> This issue aims to include DFI model, which is the non-parametric counterpart 
> of the Divergence from Randomness (DFR) framework.
> DFI is both parameter-free and non-parametric:
> * parameter-free: it does not require any parameter tuning or training.
>  * non-parametric: it does not make any assumptions about word frequency 
> distributions on document collections.
> It is highly recommended *not* to remove stopwords (very common terms: the, 
> of, and, to, a, in, for, is, on, that, etc) with this similarity.
> For more information see: [A nonparametric term weighting method for 
> information retrieval based on measuring the divergence from 
> independence|http://dx.doi.org/10.1007/s10791-013-9225-4]






[jira] [Updated] (LUCENE-6818) Implementing Divergence from Independence (DFI) Term-Weighting for Lucene/Solr

2015-09-28 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated LUCENE-6818:
-
Attachment: LUCENE-6818.patch

* renamed failing test to {{TestSimilarityBase#testIndexTimeBoost}}
* randomized the test method a bit

> Implementing Divergence from Independence (DFI) Term-Weighting for Lucene/Solr
> --
>
> Key: LUCENE-6818
> URL: https://issues.apache.org/jira/browse/LUCENE-6818
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/query/scoring
>Affects Versions: 5.3
>Reporter: Ahmet Arslan
>Assignee: Robert Muir
>Priority: Minor
>  Labels: similarity
> Fix For: Trunk
>
> Attachments: LUCENE-6818.patch, LUCENE-6818.patch, LUCENE-6818.patch
>
>
> As explained in the 
> [write-up|http://lucidworks.com/blog/flexible-ranking-in-lucene-4], many 
> state-of-the-art ranking model implementations are added to Apache Lucene. 
> This issue aims to include DFI model, which is the non-parametric counterpart 
> of the Divergence from Randomness (DFR) framework.
> DFI is both parameter-free and non-parametric:
> * parameter-free: it does not require any parameter tuning or training.
>  * non-parametric: it does not make any assumptions about word frequency 
> distributions on document collections.
> It is highly recommended *not* to remove stopwords (very common terms: the, 
> of, and, to, a, in, for, is, on, that, etc) with this similarity.
> For more information see: [A nonparametric term weighting method for 
> information retrieval based on measuring the divergence from 
> independence|http://dx.doi.org/10.1007/s10791-013-9225-4]






[jira] [Updated] (LUCENE-6818) Implementing Divergence from Independence (DFI) Term-Weighting for Lucene/Solr

2015-09-28 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated LUCENE-6818:
-
Attachment: LUCENE-6818.patch

This patch prevents infinity score by using +1 trick. Now 
{{TestSimilarity2#testCrazySpans}} passes.
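
The message does not say where the +1 is applied, so the following is only a generic sketch of the idea under that caveat: a ratio whose denominator can reach zero yields Infinity in double arithmetic, and adding 1 to the denominator keeps the result finite. The class, method, and parameter names are hypothetical.

```java
// Illustration only: generic "+1" smoothing, not the actual DFI patch.
class PlusOneSketch {
    // Unsmoothed ratio: a zero denominator produces Infinity.
    static double naiveRatio(double freq, double expected) {
        return freq / expected;
    }

    // Smoothed ratio: the +1 keeps the denominator strictly positive.
    static double smoothedRatio(double freq, double expected) {
        return freq / (expected + 1);
    }

    public static void main(String[] args) {
        System.out.println(Double.isInfinite(naiveRatio(3, 0)));
        System.out.println(smoothedRatio(3, 0));
    }
}
```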

> Implementing Divergence from Independence (DFI) Term-Weighting for Lucene/Solr
> --
>
> Key: LUCENE-6818
> URL: https://issues.apache.org/jira/browse/LUCENE-6818
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/query/scoring
>Affects Versions: 5.3
>Reporter: Ahmet Arslan
>Assignee: Robert Muir
>Priority: Minor
>  Labels: similarity
> Fix For: Trunk
>
> Attachments: LUCENE-6818.patch, LUCENE-6818.patch
>
>
> As explained in the 
> [write-up|http://lucidworks.com/blog/flexible-ranking-in-lucene-4], many 
> state-of-the-art ranking model implementations are added to Apache Lucene. 
> This issue aims to include DFI model, which is the non-parametric counterpart 
> of the Divergence from Randomness (DFR) framework.
> DFI is both parameter-free and non-parametric:
> * parameter-free: it does not require any parameter tuning or training.
>  * non-parametric: it does not make any assumptions about word frequency 
> distributions on document collections.
> It is highly recommended *not* to remove stopwords (very common terms: the, 
> of, and, to, a, in, for, is, on, that, etc) with this similarity.
> For more information see: [A nonparametric term weighting method for 
> information retrieval based on measuring the divergence from 
> independence|http://dx.doi.org/10.1007/s10791-013-9225-4]






[jira] [Commented] (LUCENE-6818) Implementing Divergence from Independence (DFI) Term-Weighting for Lucene/Solr

2015-09-28 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14910174#comment-14910174
 ] 

Ahmet Arslan commented on LUCENE-6818:
--

bq. The typical solution is to do something like adjust expected:
Thanks, Robert, for the suggestion and explanation. I used the typical 
solution, and it's working now.

bq. I have not read the paper, but these are things to deal with when 
integrating into lucene.
For your information, if you want to take a look, the Terrier 4.0 source tree 
has this model in DFIC.java

bq.  index-time boosts work on the norm, by making the document appear shorter 
or longer, so docLen might have a "crazy" value if the user does this.
I was relying on {{o.a.l.search.similarities.SimilarityBase}} for this, but it 
looks like all of its subclasses (DFR, IB) have this problem. I included a 
{{TestSimilarityBase#testNorms}} method in the new patch to demonstrate the 
problem. If I am not missing something obvious, this is a bug, no?

> Implementing Divergence from Independence (DFI) Term-Weighting for Lucene/Solr
> --
>
> Key: LUCENE-6818
> URL: https://issues.apache.org/jira/browse/LUCENE-6818
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/query/scoring
>Affects Versions: 5.3
>Reporter: Ahmet Arslan
>Assignee: Robert Muir
>Priority: Minor
>  Labels: similarity
> Fix For: Trunk
>
> Attachments: LUCENE-6818.patch, LUCENE-6818.patch
>
>
> As explained in the 
> [write-up|http://lucidworks.com/blog/flexible-ranking-in-lucene-4], many 
> state-of-the-art ranking model implementations are added to Apache Lucene. 
> This issue aims to include DFI model, which is the non-parametric counterpart 
> of the Divergence from Randomness (DFR) framework.
> DFI is both parameter-free and non-parametric:
> * parameter-free: it does not require any parameter tuning or training.
>  * non-parametric: it does not make any assumptions about word frequency 
> distributions on document collections.
> It is highly recommended *not* to remove stopwords (very common terms: the, 
> of, and, to, a, in, for, is, on, that, etc) with this similarity.
> For more information see: [A nonparametric term weighting method for 
> information retrieval based on measuring the divergence from 
> independence|http://dx.doi.org/10.1007/s10791-013-9225-4]






[jira] [Updated] (LUCENE-6817) ComplexPhraseQueryParser.ComplexPhraseQuery does not display slop in toString()

2015-09-28 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated LUCENE-6817:
-
Attachment: LUCENE-6817.patch

> ComplexPhraseQueryParser.ComplexPhraseQuery does not display slop in 
> toString()
> ---
>
> Key: LUCENE-6817
> URL: https://issues.apache.org/jira/browse/LUCENE-6817
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Dawid Weiss
>Priority: Trivial
> Fix For: Trunk
>
> Attachments: LUCENE-6817.patch, LUCENE-6817.patch
>
>
> This one is quite simple (I think) -- ComplexPhraseQuery doesn't display the 
> slop factor which, when the result of parsing is dumped to logs, for example, 
> can be confusing.
> I'm heading for a weekend out of office in a few hours... so in the spirit of 
> not committing and running away ( :) ), if anybody wishes to tackle this, go 
> ahead.






[jira] [Updated] (LUCENE-6817) ComplexPhraseQueryParser.ComplexPhraseQuery does not display slop in toString()

2015-09-28 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated LUCENE-6817:
-
Attachment: LUCENE-6817.patch

> ComplexPhraseQueryParser.ComplexPhraseQuery does not display slop in 
> toString()
> ---
>
> Key: LUCENE-6817
> URL: https://issues.apache.org/jira/browse/LUCENE-6817
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Dawid Weiss
>Priority: Trivial
> Fix For: Trunk
>
> Attachments: LUCENE-6817.patch
>
>
> This one is quite simple (I think) -- ComplexPhraseQuery doesn't display the 
> slop factor which, when the result of parsing is dumped to logs, for example, 
> can be confusing.
> I'm heading for a weekend out of office in a few hours... so in the spirit of 
> not committing and running away ( :) ), if anybody wishes to tackle this, go 
> ahead.






[jira] [Created] (LUCENE-6818) Implementing Divergence from Independence (DFI) Term-Weighting for Lucene/Solr

2015-09-27 Thread Ahmet Arslan (JIRA)
Ahmet Arslan created LUCENE-6818:


 Summary: Implementing Divergence from Independence (DFI) 
Term-Weighting for Lucene/Solr
 Key: LUCENE-6818
 URL: https://issues.apache.org/jira/browse/LUCENE-6818
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/query/scoring
Affects Versions: 5.3
Reporter: Ahmet Arslan
Priority: Minor
 Fix For: Trunk


As explained in the 
[write-up|http://lucidworks.com/blog/flexible-ranking-in-lucene-4], many 
state-of-the-art ranking model implementations are added to Apache Lucene. 

This issue aims to include DFI model, which is the non-parametric counterpart 
of the Divergence from Randomness (DFR) framework.

DFI is both parameter-free and non-parametric:

* parameter-free: it does not require any parameter tuning or training.
 * non-parametric: it does not make any assumptions about word frequency 
distributions on document collections.

It is highly recommended *not* to remove stopwords (very common terms: the, of, 
and, to, a, in, for, is, on, that, etc) with this similarity.

For more information see: [A nonparametric term weighting method for 
information retrieval based on measuring the divergence from 
independence|http://dx.doi.org/10.1007/s10791-013-9225-4]






[jira] [Updated] (LUCENE-6818) Implementing Divergence from Independence (DFI) Term-Weighting for Lucene/Solr

2015-09-27 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated LUCENE-6818:
-
Attachment: LUCENE-6818.patch

Patch for DFI. However, with this one {{TestSimilarity2#testCrazySpans}} fails.
Any pointers on how to fix this would be really appreciated. 

> Implementing Divergence from Independence (DFI) Term-Weighting for Lucene/Solr
> --
>
> Key: LUCENE-6818
> URL: https://issues.apache.org/jira/browse/LUCENE-6818
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/query/scoring
>Affects Versions: 5.3
>Reporter: Ahmet Arslan
>Priority: Minor
>  Labels: similarity
> Fix For: Trunk
>
> Attachments: LUCENE-6818.patch
>
>
> As explained in the 
> [write-up|http://lucidworks.com/blog/flexible-ranking-in-lucene-4], many 
> state-of-the-art ranking model implementations are added to Apache Lucene. 
> This issue aims to include DFI model, which is the non-parametric counterpart 
> of the Divergence from Randomness (DFR) framework.
> DFI is both parameter-free and non-parametric:
> * parameter-free: it does not require any parameter tuning or training.
>  * non-parametric: it does not make any assumptions about word frequency 
> distributions on document collections.
> It is highly recommended *not* to remove stopwords (very common terms: the, 
> of, and, to, a, in, for, is, on, that, etc) with this similarity.
> For more information see: [A nonparametric term weighting method for 
> information retrieval based on measuring the divergence from 
> independence|http://dx.doi.org/10.1007/s10791-013-9225-4]






[jira] [Created] (LUCENE-6730) Hyper-parameter c is ignored in term frequency NormalizationH1

2015-08-09 Thread Ahmet Arslan (JIRA)
Ahmet Arslan created LUCENE-6730:


 Summary: Hyper-parameter c is ignored in term frequency 
NormalizationH1
 Key: LUCENE-6730
 URL: https://issues.apache.org/jira/browse/LUCENE-6730
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Affects Versions: 5.2.1
Reporter: Ahmet Arslan
Priority: Minor
 Fix For: 5.4


Unlike {{NormalizationH2}}, the *c* parameter is not used in the term frequency 
calculation in {{NormalizationH1}}.
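
To show what is at stake, here is a sketch of the two normalizations as they are usually written in the DFR literature: H1 scales tf by avgLen / len, and H2 uses tf * log2(1 + c * avgLen / len). Multiplying H1 by c is an assumption about what "using c" would look like, not a description of the attached patch; all names below are hypothetical.

```java
// Illustration only: textbook-style formulas, not Lucene's implementation.
class NormalizationSketch {
    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    // H1 as reported: c has no influence on the result.
    static double h1IgnoringC(double tf, double avgLen, double len) {
        return tf * avgLen / len;
    }

    // H1 with c applied as a plain multiplier (assumed form of the fix).
    static double h1WithC(double tf, double c, double avgLen, double len) {
        return tf * c * avgLen / len;
    }

    // H2: c sits inside the logarithm.
    static double h2(double tf, double c, double avgLen, double len) {
        return tf * log2(1 + c * avgLen / len);
    }

    public static void main(String[] args) {
        // Same document, two values of c: only the c-aware variants change.
        System.out.println(h1IgnoringC(2, 10, 20));
        System.out.println(h1WithC(2, 1, 10, 20) + " vs " + h1WithC(2, 2, 10, 20));
        System.out.println(h2(2, 1, 10, 20) + " vs " + h2(2, 2, 10, 20));
    }
}
```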






[jira] [Updated] (LUCENE-6730) Hyper-parameter c is ignored in term frequency NormalizationH1

2015-08-09 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated LUCENE-6730:
-
Attachment: LUCENE-6730.patch

 Hyper-parameter c is ignored in term frequency NormalizationH1
 --

 Key: LUCENE-6730
 URL: https://issues.apache.org/jira/browse/LUCENE-6730
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Affects Versions: 5.2.1
Reporter: Ahmet Arslan
Priority: Minor
  Labels: similarity
 Fix For: 5.4

 Attachments: LUCENE-6730.patch


 Unlike {{NormalizationH2}}, the *c* parameter is not used in the term frequency 
 calculation in {{NormalizationH1}}.






[jira] [Updated] (LUCENE-6711) Instead of docCount(), maxDoc() is used for numberOfDocuments in SimilarityBase

2015-08-04 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated LUCENE-6711:
-
Attachment: LUCENE-6711.patch

Patch that includes the following migrate entry. I am not sure whether this is 
appropriate text for migrate.txt.
{panel:title=The way the number of documents is calculated has changed 
(LUCENE-6711)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1|bgColor=#CE}
The number of documents (docCount) is used to calculate term specificity (idf) 
and average document length (avdl). Prior to LUCENE-6711, 
collectionStats.maxDoc() was used for these statistics. Now 
collectionStats.docCount() is used whenever possible; otherwise maxDoc() is used.

Assume that a collection contains 100 documents, and 50 of them have a keywords 
field. In this example, maxDoc is 100 while docCount is 50 for the keywords 
field. The total number of tokens for the keywords field is divided by docCount 
to obtain avdl. Therefore docCount, which is the total number of documents that 
have at least one term for the field, is a more precise metric for optional 
fields.

DefaultSimilarity does not leverage avdl, so this change should have a 
relatively minor effect on the result list, because the relative idf values of 
terms remain the same. However, when combined with other factors such as term 
frequency, the relative ranking of documents could change. Some Similarity 
implementations (such as the ones instantiated with NormalizationH2, and BM25) 
take avdl into account and would show a notable change in the ranked list, 
especially if you have a collection of documents with varying lengths, because 
NormalizationH2 tends to punish documents longer than avdl.
{panel}
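
The 100-document example can be worked through numerically. In the sketch below, the total of 500 tokens in the keywords field is an assumed figure chosen for the arithmetic, and the class and method names are hypothetical.

```java
// Illustration only: average document length under the two document counts.
class AvdlSketch {
    // avdl = total number of tokens in the field / number of documents.
    static double avdl(long totalTokens, long numberOfDocuments) {
        return (double) totalTokens / numberOfDocuments;
    }

    public static void main(String[] args) {
        long totalTokens = 500; // assumed token count for the keywords field
        long maxDoc = 100;      // all documents in the collection
        long docCount = 50;     // documents that actually have the field

        System.out.println("avdl with maxDoc:   " + avdl(totalTokens, maxDoc));
        System.out.println("avdl with docCount: " + avdl(totalTokens, docCount));
    }
}
```

Under these assumptions maxDoc halves the apparent average length, so a document with 10 keyword tokens would look twice as long as average instead of exactly average.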

 Instead of docCount(), maxDoc() is used for numberOfDocuments in 
 SimilarityBase
 ---

 Key: LUCENE-6711
 URL: https://issues.apache.org/jira/browse/LUCENE-6711
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Reporter: Ahmet Arslan
Priority: Minor
 Fix For: Trunk

 Attachments: LUCENE-6711.patch, LUCENE-6711.patch, LUCENE-6711.patch, 
 LUCENE-6711.patch


 {{SimilarityBase.java}} has the following line :
 {code}
  long numberOfDocuments = collectionStats.maxDoc();
 {code}
 It seems like {{collectionStats.docCount()}}, which returns the total number 
 of documents that have at least one term for this field, is a more appropriate 
 statistic here. 






[jira] [Updated] (LUCENE-6711) Instead of docCount(), maxDoc() is used for numberOfDocuments in SimilarityBase

2015-08-03 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated LUCENE-6711:
-
Attachment: LUCENE-6711.patch

Includes changes to TFIDF and BM25, {{ant clean test}} passes.

 Instead of docCount(), maxDoc() is used for numberOfDocuments in 
 SimilarityBase
 ---

 Key: LUCENE-6711
 URL: https://issues.apache.org/jira/browse/LUCENE-6711
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Affects Versions: 5.2.1
Reporter: Ahmet Arslan
Priority: Minor
 Fix For: 5.3

 Attachments: LUCENE-6711.patch, LUCENE-6711.patch, LUCENE-6711.patch


 {{SimilarityBase.java}} has the following line :
 {code}
  long numberOfDocuments = collectionStats.maxDoc();
 {code}
 It seems like {{collectionStats.docCount()}}, which returns the total number 
 of documents that have at least one term for this field, is a more appropriate 
 statistic here. 






[jira] [Updated] (LUCENE-6711) Instead of docCount(), maxDoc() is used for numberOfDocuments in SimilarityBase

2015-08-03 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated LUCENE-6711:
-
Attachment: LUCENE-6711.patch

This patch checks for -1 and uses maxDoc() if docCount() is not supported.

 Instead of docCount(), maxDoc() is used for numberOfDocuments in 
 SimilarityBase
 ---

 Key: LUCENE-6711
 URL: https://issues.apache.org/jira/browse/LUCENE-6711
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Affects Versions: 5.2.1
Reporter: Ahmet Arslan
Priority: Minor
 Fix For: 5.3

 Attachments: LUCENE-6711.patch, LUCENE-6711.patch


 {{SimilarityBase.java}} has the following line :
 {code}
  long numberOfDocuments = collectionStats.maxDoc();
 {code}
 It seems like {{collectionStats.docCount()}}, which returns the total number 
 of documents that have at least one term for this field, is a more appropriate 
 statistic here. 






[jira] [Commented] (LUCENE-6711) Instead of docCount(), maxDoc() is used for numberOfDocuments in SimilarityBase

2015-08-03 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14651758#comment-14651758
 ] 

Ahmet Arslan commented on LUCENE-6711:
--

bq. We should fix TFIDFSimilarity and BM25Similarity too.

For TFIDF and BM25, do we simply replace {code}collectionStats.maxDoc(){code} 
with {code}collectionStats.docCount() == -1 ? collectionStats.maxDoc() : 
collectionStats.docCount(){code} ?
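
Pulled out into a helper, the proposed expression would look roughly like this. It is a minimal sketch: the class and method names are hypothetical, and plain long values stand in for the collectionStats calls so that the example runs without Lucene on the classpath.

```java
// Illustration only: the fallback expression from the comment above.
class DocCountFallbackSketch {
    // docCount == -1 signals that the statistic is not available,
    // in which case maxDoc is used instead.
    static long numberOfDocuments(long docCount, long maxDoc) {
        return docCount == -1 ? maxDoc : docCount;
    }

    public static void main(String[] args) {
        System.out.println(numberOfDocuments(50, 100)); // docCount available
        System.out.println(numberOfDocuments(-1, 100)); // falls back to maxDoc
    }
}
```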

 Instead of docCount(), maxDoc() is used for numberOfDocuments in 
 SimilarityBase
 ---

 Key: LUCENE-6711
 URL: https://issues.apache.org/jira/browse/LUCENE-6711
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Affects Versions: 5.2.1
Reporter: Ahmet Arslan
Priority: Minor
 Fix For: 5.3

 Attachments: LUCENE-6711.patch, LUCENE-6711.patch


 {{SimilarityBase.java}} has the following line :
 {code}
  long numberOfDocuments = collectionStats.maxDoc();
 {code}
 It seems like {{collectionStats.docCount()}}, which returns the total number 
 of documents that have at least one term for this field, is a more appropriate 
 statistic here. 






[jira] [Updated] (LUCENE-6711) Instead of docCount(), maxDoc() is used for numberOfDocuments in SimilarityBase

2015-08-01 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated LUCENE-6711:
-
Attachment: LUCENE-6711.patch

Patch that includes suggested change. However, this breaks most of the tests in 
{{TestSimilarityBase}}. What is the preferred course of action here?  

 Instead of docCount(), maxDoc() is used for numberOfDocuments in 
 SimilarityBase
 ---

 Key: LUCENE-6711
 URL: https://issues.apache.org/jira/browse/LUCENE-6711
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Affects Versions: 5.2.1
Reporter: Ahmet Arslan
Priority: Minor
 Fix For: 5.3

 Attachments: LUCENE-6711.patch


 {{SimilarityBase.java}} has the following line :
 {code}
  long numberOfDocuments = collectionStats.maxDoc();
 {code}
 It seems like {{collectionStats.docCount()}}, which returns the total number 
 of documents that have at least one term for this field, is a more appropriate 
 statistic here. 





