[JENKINS] Lucene-Solr-master-MacOSX (64bit/jdk1.8.0) - Build # 3932 - Still Unstable!

2017-03-30 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-MacOSX/3932/
Java: 64bit/jdk1.8.0 -XX:-UseCompressedOops -XX:+UseParallelGC

302 tests failed.
FAILED:  
org.apache.solr.AnalysisAfterCoreReloadTest.testStopwordsAfterCoreReload

Error Message:
SolrCore 'collection1' is not available due to init failure: invalid API spec: 
apispec/core.SchemaEdit.json

Stack Trace:
org.apache.solr.core.SolrCoreInitializationException: SolrCore 'collection1' is 
not available due to init failure: invalid API spec: 
apispec/core.SchemaEdit.json
at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:1311)
at 
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:146)
at 
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:160)
at 
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:177)
at 
org.apache.solr.AnalysisAfterCoreReloadTest.testStopwordsAfterCoreReload(AnalysisAfterCoreReloadTest.java:64)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:907)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:943)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:957)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:916)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:802)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:852)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 

[jira] [Commented] (SOLR-10239) MOVEREPLICA API

2017-03-30 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15950315#comment-15950315
 ] 

Noble Paul commented on SOLR-10239:
---

This API can have 2 variants
1) {{collection}}, {{replica}} and {{targetNode}} as parameters
2) {{collection}} , {{shard}}, {{fromNode}} & {{targetNode}}
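As a sketch only, the query strings for these two variants could be assembled as below. The parameter names ({{fromNode}}, {{targetNode}}, etc.) are taken from this comment and are assumptions, not the finalized API:

```python
from urllib.parse import urlencode

BASE = "/admin/collections"

def move_replica_v1(collection, replica, target_node):
    # Variant 1: identify the replica directly by name (param names assumed).
    params = {"action": "MOVEREPLICA", "collection": collection,
              "replica": replica, "targetNode": target_node}
    return BASE + "?" + urlencode(params)

def move_replica_v2(collection, shard, from_node, target_node):
    # Variant 2: identify the replica by shard plus its current node.
    params = {"action": "MOVEREPLICA", "collection": collection,
              "shard": shard, "fromNode": from_node, "targetNode": target_node}
    return BASE + "?" + urlencode(params)

print(move_replica_v1("test", "core_node2", "127.0.0.1:8984_solr"))
print(move_replica_v2("test", "shard1", "127.0.0.1:8983_solr", "127.0.0.1:8984_solr"))
```

Either way the server side has to resolve the request to one concrete replica before doing the ADDREPLICA/DELETEREPLICA dance internally.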

> MOVEREPLICA API
> ---
>
> Key: SOLR-10239
> URL: https://issues.apache.org/jira/browse/SOLR-10239
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>Assignee: Cao Manh Dat
>
> To move a replica from a node to another node, there should be an API 
> command. This should be better than having to do ADDREPLICA and DELETEREPLICA.
> The API will look like this
> {code}
> /admin/collections?action=MOVEREPLICA=collection=shard=replica=nodeName=nodeName
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Config file indenting rules

2017-03-30 Thread David Smiley
bq. Let's make 'em uniform and argue later.

Agreed. I find it very annoying that the indentation in some of them is
internally inconsistent.

On Wed, Mar 29, 2017 at 7:19 PM Erick Erickson 
wrote:

> I don't think we've ever really had a formal rule, certainly not one
> that's been enforced. Personally I'd be satisfied with just doing
> whatever IntelliJ or Eclipse are happy with.
>
> I did finally take time to look and you can make IntelliJ recognize
> "managed-schema" as an XML file with settings>>editor>>file types
>
> Let's make 'em uniform and argue later.
>
> Erick
>
> On Wed, Mar 29, 2017 at 4:10 PM, Alexandre Rafalovitch
>  wrote:
> > I am redoing an example and realized I have no idea what the
> > formatting rules are.
> >
> > In the code, the WIKI says, we should use 2-spaces offset.
> >
> > What about in config (XML) files?
> >
> > I see no tabs, so at least that part is clear. But with spaces, I see
> > all sorts of things.
> >
> > I seem to see a mix of 4 and 2 spaces in managed-schema. I seem to see
> > 2 spaces in solrconfig.xml. I am not sure what I see in DIH
> > configuration files.
> >
> > Also, there are comments and their multi-line internal comments. I
> > seem to see 1 space there. Also, does the text in the comment start on
> > the same line as the comment indicator?
> >
> http://www.solr-start.com/ - Resources for Solr users, new and
> experienced
> >
--
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


[jira] [Updated] (SOLR-10239) MOVEREPLICA API

2017-03-30 Thread Cao Manh Dat (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cao Manh Dat updated SOLR-10239:

Description: 
To move a replica from a node to another node, there should be an API command. 
This should be better than having to do ADDREPLICA and DELETEREPLICA.
The API will look like this
{code}
/admin/collections?action=MOVEREPLICA=collection=shard=replica=nodeName=nodeName
{code}

  was:
To move a replica from a node to another node, there should be an API command. 
This should be better than having to do ADDREPLICA and DELETEREPLICA.
The API will look like this
{code}
/admin/collections?action=MOVEREPLICA=collection=shard=replica=nodeName=nodeName
{code}


> MOVEREPLICA API
> ---
>
> Key: SOLR-10239
> URL: https://issues.apache.org/jira/browse/SOLR-10239
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>Assignee: Cao Manh Dat
>
> To move a replica from a node to another node, there should be an API 
> command. This should be better than having to do ADDREPLICA and DELETEREPLICA.
> The API will look like this
> {code}
> /admin/collections?action=MOVEREPLICA=collection=shard=replica=nodeName=nodeName
> {code}






[jira] [Updated] (SOLR-10239) MOVEREPLICA API

2017-03-30 Thread Cao Manh Dat (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cao Manh Dat updated SOLR-10239:

Description: 
To move a replica from a node to another node, there should be an API command. 
This should be better than having to do ADDREPLICA and DELETEREPLICA.
The API will look like this
{code}
/admin/collections?action=MOVEREPLICA=collection=shard=replica=nodeName=nodeName
{code}

  was:To move a replica from a node to another node, there should be an API 
command. This should be better than having to do ADDREPLICA and DELETEREPLICA.


> MOVEREPLICA API
> ---
>
> Key: SOLR-10239
> URL: https://issues.apache.org/jira/browse/SOLR-10239
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>Assignee: Cao Manh Dat
>
> To move a replica from a node to another node, there should be an API 
> command. This should be better than having to do ADDREPLICA and DELETEREPLICA.
> The API will look like this
> {code}
> /admin/collections?action=MOVEREPLICA=collection=shard=replica=nodeName=nodeName
> {code}






[jira] [Assigned] (SOLR-10239) MOVEREPLICA API

2017-03-30 Thread Cao Manh Dat (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cao Manh Dat reassigned SOLR-10239:
---

Assignee: Cao Manh Dat

> MOVEREPLICA API
> ---
>
> Key: SOLR-10239
> URL: https://issues.apache.org/jira/browse/SOLR-10239
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>Assignee: Cao Manh Dat
>
> To move a replica from a node to another node, there should be an API 
> command. This should be better than having to do ADDREPLICA and DELETEREPLICA.






[jira] [Updated] (SOLR-9989) Add support for PointFields in FacetModule (JSON Facets)

2017-03-30 Thread Cao Manh Dat (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cao Manh Dat updated SOLR-9989:
---
Attachment: SOLR-9989.patch

Initial work for this patch. Implemented UniqueAgg and HLLAgg for 
SortedNumericDocValues.
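For context, unique() and hll() are the JSON Facets aggregation functions involved; a request body exercising both over a numeric point field might be built like this (the field name price_i and the surrounding parameters are made up for illustration, not taken from the patch):

```python
import json

# JSON Facets request using the unique() and hll() aggregations over a
# hypothetical numeric point field "price_i". hll() gives a HyperLogLog
# approximation of the distinct count computed exactly-ish by unique().
facet_request = {
    "query": "*:*",
    "limit": 0,
    "facet": {
        "distinct_prices": "unique(price_i)",
        "approx_prices": "hll(price_i)",
    },
}
body = json.dumps(facet_request)
print(body)
```

With PointFields the per-document values come from SortedNumericDocValues rather than the term index, which is why the aggregations need the separate implementation this patch adds.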

> Add support for PointFields in FacetModule (JSON Facets)
> 
>
> Key: SOLR-9989
> URL: https://issues.apache.org/jira/browse/SOLR-9989
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Facet Module
>Reporter: Tomás Fernández Löbbe
> Attachments: SOLR-9989.patch
>
>
> Followup task of SOLR-8396






Re: [VOTE] Release PyLucene 6.5.0 (rc2) (now with Python 3 support)

2017-03-30 Thread Jeff Breidenbach
+1.00 ± 0.01

On Thu, Mar 30, 2017 at 12:27 PM, Andi Vajda  wrote:

>
> A few fixes were needed in JCC for better Windows support.
> The PyLucene 6.5.0 rc1 vote is thus cancelled.
>
> I'm now calling for a vote on PyLucene 6.5.0 rc2.
>
> The PyLucene 6.5.0 (rc2) release tracking the recent release of
> Apache Lucene 6.5.0 is ready.
>
> A release candidate is available from:
>   https://dist.apache.org/repos/dist/dev/lucene/pylucene/6.5.0-rc2/
>
> PyLucene 6.5.0 is built with JCC 3.0 included in these release artifacts.
>
> JCC 3.0 now supports Python 3.3+ (in addition to Python 2.3+).
> PyLucene may be built with Python 2 or Python 3.
>
> Please vote to release these artifacts as PyLucene 6.5.0.
> Anyone interested in this release can and should vote !
>
> Thanks !
>
> Andi..
>
> ps: the KEYS file for PyLucene release signing is at:
> https://dist.apache.org/repos/dist/release/lucene/pylucene/KEYS
> https://dist.apache.org/repos/dist/dev/lucene/pylucene/KEYS
>
> pps: here is my +1
>


[JENKINS] Lucene-Solr-6.x-MacOSX (64bit/jdk1.8.0) - Build # 795 - Unstable!

2017-03-30 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-6.x-MacOSX/795/
Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseParallelGC

11 tests failed.
FAILED:  org.apache.solr.cloud.CustomCollectionTest.testRouteFieldForHashRouter

Error Message:
Collection not found: routeFieldColl

Stack Trace:
org.apache.solr.common.SolrException: Collection not found: routeFieldColl
at 
__randomizedtesting.SeedInfo.seed([D6B026E0B1EE1F8E:7E86B83D2E8FF4D4]:0)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.getCollectionNames(CloudSolrClient.java:1382)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1075)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:1054)
at 
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:160)
at 
org.apache.solr.client.solrj.request.UpdateRequest.commit(UpdateRequest.java:233)
at 
org.apache.solr.cloud.CustomCollectionTest.testRouteFieldForHashRouter(CustomCollectionTest.java:166)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:907)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:943)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:957)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:916)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:802)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:852)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 

[jira] [Commented] (SOLR-10087) StreamHandler should be able to use runtimeLib jars

2017-03-30 Thread Kevin Risden (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949872#comment-15949872
 ] 

Kevin Risden commented on SOLR-10087:
-

[~ctargett] - I'm 90% sure the instructions are still accurate. I haven't had a 
chance to try it with the new Solr release though.

> StreamHandler should be able to use runtimeLib jars
> ---
>
> Key: SOLR-10087
> URL: https://issues.apache.org/jira/browse/SOLR-10087
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Fix For: 6.5, master (7.0)
>
> Attachments: SOLR-10087.patch
>
>
> StreamHandler currently can't use jars added via the runtimeLib and Blob
> Store API. This is because the StreamHandler uses core.getResourceLoader()
> instead of core.getMemClassLoader() for loading classes.
> An example of this working with the fix is here: 
> https://github.com/risdenk/solr_custom_streaming_expressions
> Steps:
> {code}
> # Inspired by 
> https://cwiki.apache.org/confluence/display/solr/Adding+Custom+Plugins+in+SolrCloud+Mode
> # Start Solr with enabling Blob Store
> ./bin/solr start -c -f -a "-Denable.runtime.lib=true"
> # Create test collection
> ./bin/solr create -c test
> # Create .system collection
> curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=.system'
> # Build custom streaming expression jar
> (cd custom-streaming-expression && mvn clean package)
> # Upload jar to .system collection using Blob Store API 
> (https://cwiki.apache.org/confluence/display/solr/Blob+Store+API)
> curl -X POST -H 'Content-Type: application/octet-stream' --data-binary 
> @custom-streaming-expression/target/custom-streaming-expression-1.0-SNAPSHOT.jar
>  'http://localhost:8983/solr/.system/blob/test'
> # List all blobs that are stored
> curl 'http://localhost:8983/solr/.system/blob?omitHeader=true'
> # Add the jar to the runtime lib
> curl 'http://localhost:8983/solr/test/config' -H 
> 'Content-type:application/json' -d '{
>"add-runtimelib": { "name":"test", "version":1 }
> }'
> # Create custom streaming expression using work from SOLR-9103
> # Patch from SOLR-10087 is required for StreamHandler to load the runtimeLib 
> jar
> curl 'http://localhost:8983/solr/test/config' -H 
> 'Content-type:application/json' -d '{
>   "create-expressible": {
> "name": "customstreamingexpression",
> "class": "com.test.solr.CustomStreamingExpression",
> "runtimeLib": true
>   }
> }'
> # Test the custom streaming expression
> curl 'http://localhost:8983/solr/test/stream?expr=customstreamingexpression()'
> {code}






[jira] [Closed] (SOLR-10387) zkTransfer normalizes destination path incorrectly if source is a windows directory

2017-03-30 Thread gopikannan venugopalsamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gopikannan venugopalsamy closed SOLR-10387.
---

> zkTransfer normalizes destination path incorrectly if source is a windows 
> directory 
> 
>
> Key: SOLR-10387
> URL: https://issues.apache.org/jira/browse/SOLR-10387
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: gopikannan venugopalsamy
>Assignee: Erick Erickson
> Fix For: trunk, 6.6
>
> Attachments: SOLR-10387.patch, SOLR-10387.patch, SOLR-10387.patch, 
> SOLR-10387.patch, SOLR-10387.patch
>
>
> While normalizing the destination it looks only for '/' in the source path,
> but this will not work for the Windows-style delimiter.
> /lucene-solr/solr/solrj/src/java/org/apache/solr/common/cloud/ZkMaintenanceUtils.java
>   private static String normalizeDest(String srcName, String dstName) {
> if (dstName.endsWith("/")) { // Dest is a directory.
>   int pos = srcName.lastIndexOf("/");
>   if (pos < 0) {
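A minimal sketch of the separator-agnostic basename logic the issue calls for, transliterated to Python for illustration (the committed patch is Java in ZkMaintenanceUtils, so names and structure here are approximations):

```python
def normalize_dest(src_name, dst_name):
    """If dst_name looks like a directory, append the source's base name.

    Splits src_name on both '/' and '\\' so Windows-style local paths
    (e.g. C:\\conf\\solrconfig.xml) work too. Sketch only, not the
    committed ZkMaintenanceUtils logic.
    """
    if dst_name.endswith("/"):  # Dest is a directory.
        # Take the last path component regardless of separator style;
        # a src with no separator at all falls through unchanged.
        base = src_name.replace("\\", "/").rsplit("/", 1)[-1]
        return dst_name + base
    return dst_name

print(normalize_dest("C:\\conf\\solrconfig.xml", "/configs/myconf/"))
```

The original lastIndexOf("/") returns -1 for a Windows path, so the whole source string (drive letter and all) was being appended to the destination.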






[jira] [Commented] (SOLR-10386) Analysis page breaks on anonymous inner class TokenFilter name

2017-03-30 Thread Varun Thacker (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949805#comment-15949805
 ] 

Varun Thacker commented on SOLR-10386:
--

I think I saw the same on SOLR-10366. It was referencing an anonymous inner
class when I hovered over it. But I don't see it anymore for that specific
case.

> Analysis page breaks on anonymous inner class TokenFilter name
> --
>
> Key: SOLR-10386
> URL: https://issues.apache.org/jira/browse/SOLR-10386
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Affects Versions: 6.2.1
>Reporter: Michael Braun
>Priority: Minor
>
> analysis.js has a function getShortComponentName which attempts to find the 
> short name given a full class name. However, this does not work and gets a 
> null internally when the class looks something like this, as in an anonymous 
> inner class:
> com.company.solr.stuff.MyTokenFilterFactory$1
> {code}
> TypeError: Cannot read property 'join' of null
>   at getShortComponentName (analysis.js:50)
>   at extractComponents (analysis.js:96)
>   at processAnalysisData (analysis.js:109)
>   at analysis.js:172
>   at angular-resource.min.js:33
>   at processQueue (angular.js:13193)
>   at angular.js:13209
> {code}
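The failure mode is easy to reproduce outside the browser: a pattern that expects a dot-separated, word-only class name matches nothing on a Factory$1-style name, and dereferencing the null match throws. A Python sketch of the bug and a defensive fix (the real code is JavaScript in analysis.js, so the regex here is only an approximation of getShortComponentName):

```python
import re

# Pattern expecting "package.segments.ClassName" with word characters only;
# '$' in an anonymous inner class name ("...Factory$1") makes it fail.
SHORT_NAME = re.compile(r"[\w.]*\.(\w+)$")

def short_component_name_buggy(name):
    # Mirrors the failing logic: assumes the match always succeeds.
    match = SHORT_NAME.match(name)
    return match.group(1)  # AttributeError when match is None

def short_component_name_safe(name):
    # Defensive version: fall back to the last dotted segment as-is.
    match = SHORT_NAME.match(name)
    return match.group(1) if match else name.rsplit(".", 1)[-1]

print(short_component_name_safe("com.company.solr.stuff.MyTokenFilterFactory$1"))
```

The buggy variant raises on the anonymous-class name, which is the Python analogue of the "Cannot read property 'join' of null" TypeError above; the safe variant just shows the "$1" suffix to the user.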






[JENKINS] Lucene-Solr-master-MacOSX (64bit/jdk1.8.0) - Build # 3931 - Unstable!

2017-03-30 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-MacOSX/3931/
Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseParallelGC

12 tests failed.
FAILED:  
org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testCollectionsAPI

Error Message:
Expected to see collection awhollynewcollection_0 null Last available state: 
DocCollection(awhollynewcollection_0//collections/awhollynewcollection_0/state.json/8)={
   "replicationFactor":"4",   "shards":{ "shard1":{   
"range":"8000-b332",   "state":"active",   "replicas":{ 
"core_node2":{   "core":"awhollynewcollection_0_shard1_replica4",   
"base_url":"http://127.0.0.1:59858/solr",
"node_name":"127.0.0.1:59858_solr",   "state":"active",   
"leader":"true"}, "core_node5":{   
"core":"awhollynewcollection_0_shard1_replica2",   
"base_url":"http://127.0.0.1:59860/solr",
"node_name":"127.0.0.1:59860_solr",   "state":"active"}, 
"core_node9":{   "core":"awhollynewcollection_0_shard1_replica3",   
"base_url":"http://127.0.0.1:59856/solr",
"node_name":"127.0.0.1:59856_solr",   "state":"active"}, 
"core_node18":{   "core":"awhollynewcollection_0_shard1_replica1",  
"base_url":"http://127.0.0.1:59857/solr",
"node_name":"127.0.0.1:59857_solr",   "state":"active"}}}, 
"shard2":{   "range":"b333-e665",   "state":"active",   
"replicas":{ "core_node3":{   
"core":"awhollynewcollection_0_shard2_replica2",   
"base_url":"http://127.0.0.1:59860/solr",
"node_name":"127.0.0.1:59860_solr",   "state":"active",   
"leader":"true"}, "core_node14":{   
"core":"awhollynewcollection_0_shard2_replica4",   
"base_url":"http://127.0.0.1:59858/solr",
"node_name":"127.0.0.1:59858_solr",   "state":"active"}, 
"core_node20":{   "core":"awhollynewcollection_0_shard2_replica3",  
"base_url":"http://127.0.0.1:59856/solr",
"node_name":"127.0.0.1:59856_solr",   "state":"active"}}}, 
"shard3":{   "range":"e666-1998",   "state":"active",   
"replicas":{ "core_node4":{   
"core":"awhollynewcollection_0_shard3_replica2",   
"base_url":"http://127.0.0.1:59860/solr",
"node_name":"127.0.0.1:59860_solr",   "state":"active"}, 
"core_node16":{   "core":"awhollynewcollection_0_shard3_replica4",  
"base_url":"http://127.0.0.1:59858/solr",
"node_name":"127.0.0.1:59858_solr",   "state":"active",   
"leader":"true"}}}, "shard4":{   "range":"1999-4ccb",   
"state":"active",   "replicas":{ "core_node7":{   
"core":"awhollynewcollection_0_shard4_replica4",   
"base_url":"http://127.0.0.1:59858/solr",
"node_name":"127.0.0.1:59858_solr",   "state":"active",   
"leader":"true"}, "core_node8":{   
"core":"awhollynewcollection_0_shard4_replica3",   
"base_url":"http://127.0.0.1:59856/solr",
"node_name":"127.0.0.1:59856_solr",   "state":"active"}, 
"core_node11":{   "core":"awhollynewcollection_0_shard4_replica2",  
"base_url":"http://127.0.0.1:59860/solr",
"node_name":"127.0.0.1:59860_solr",   "state":"active"}}}, 
"shard5":{   "range":"4ccc-7fff",   "state":"active",   
"replicas":{"core_node15":{   
"core":"awhollynewcollection_0_shard5_replica3",   
"base_url":"http://127.0.0.1:59856/solr",
"node_name":"127.0.0.1:59856_solr",   "state":"active",   
"leader":"true",   "router":{"name":"compositeId"},   
"maxShardsPerNode":"6",   "autoAddReplicas":"false",   "realtimeReplicas":"-1"}

Stack Trace:
java.lang.AssertionError: Expected to see collection awhollynewcollection_0
null
Last available state: 
DocCollection(awhollynewcollection_0//collections/awhollynewcollection_0/state.json/8)={
  "replicationFactor":"4",
  "shards":{
"shard1":{
  "range":"8000-b332",
  "state":"active",
  "replicas":{
"core_node2":{
  "core":"awhollynewcollection_0_shard1_replica4",
  "base_url":"http://127.0.0.1:59858/solr",
  "node_name":"127.0.0.1:59858_solr",
  "state":"active",
  "leader":"true"},
"core_node5":{
  "core":"awhollynewcollection_0_shard1_replica2",
  "base_url":"http://127.0.0.1:59860/solr",
  "node_name":"127.0.0.1:59860_solr",
  "state":"active"},
"core_node9":{
  "core":"awhollynewcollection_0_shard1_replica3",
  "base_url":"http://127.0.0.1:59856/solr",
  "node_name":"127.0.0.1:59856_solr",
  "state":"active"},
"core_node18":{
  

[jira] [Comment Edited] (SOLR-10346) Clean up static page HTML top nav

2017-03-30 Thread Cassandra Targett (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949724#comment-15949724
 ] 

Cassandra Targett edited comment on SOLR-10346 at 3/30/17 8:14 PM:
---

I just committed some changes that remove {{topnav.yml}} and hard-code the
HTML into {{topnav.html}}. This makes it easy to use the
{{site.solr-attributes.solr-javadocs}} variable in the liquid template and 
considering how little we need to put there that seemed easiest. The YAML data 
model really works best when you want the same look & feel with a different 
menu, and that's not the case here.

I also hard-coded the "News" link to go to the Solr website's News page, and 
changed what users see to "Solr News".

I modified {{_config.yml.template}} to disable feedback ({{feedback_disable: 
true}}). I couldn't really think of where we'd want email to go, and they will
be able to comment on the pages. I have not yet removed the {{feedback.html}} 
file, however, in case anyone disagrees.

Oh, and I removed the "Jekyll Resources" stuff entirely. 

The commit is: 
https://git1-us-west.apache.org/repos/asf?p=lucene-solr.git;a=commit;h=3b592f1be5f803c33fc170f6bf54b78e0597bef3


was (Author: ctargett):
I just committed some changes that remove {{topnav.yml}} and hard-code the
HTML into {{topnav.html}}. This makes it easy to use the
{{site.solr-attributes.solr-javadocs}} variable in the liquid template and 
considering how little we need to put there that seemed easiest. The YAML data 
model really works best when you want the same look & feel with a different 
menu, and that's not the case here.

I also hard-coded the "News" link to go to the Solr website's News page, and 
changed what users see to "Solr News".

I modified {{_config.yml.template}} to disable feedback ({{feedback_disable: 
true}}). I couldn't really think of where we'd want email to go, and they will
be able to comment on the pages. I have not yet removed the {{feedback.html}} 
file, however, in case anyone disagrees.

The commit is: 
https://git1-us-west.apache.org/repos/asf?p=lucene-solr.git;a=commit;h=3b592f1be5f803c33fc170f6bf54b78e0597bef3

> Clean up static page HTML top nav
> -
>
> Key: SOLR-10346
> URL: https://issues.apache.org/jira/browse/SOLR-10346
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Reporter: Cassandra Targett
> Attachments: SRG-top-nav-20170322.png, SRG-topnav-20170330.png
>
>
> For demo purposes, the top navigation bar for the HTML version of the Ref 
> Guide includes some stuff we probably don't want in production. This should 
> be cleaned up and finalized.
> I'll attach a screenshot of the current nav for reference. It currently has 
> these sections:
> * Home link. This should be made dynamic to update automatically for each 
> version the Guide applies to
> * News. Probably don't need this? Today it goes nowhere, but it could go to 
> the News section of the Solr website.
> * Jekyll Resources. Links to stuff about Jekyll. We don't want this.
> * Solr Resources. Links to Javadocs, Source code and Community page of Solr 
> website. Javadoc links should be dynamic.
> * Feedback. Javascript to open local Mail application to send an email. 
> Currently goes to my apache.org address, which I don't want.
> * Search box. This can stay, and we can modify it to do whatever we want it 
> to do when SOLR-10299 is resolved.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-10346) Clean up static page HTML top nav

2017-03-30 Thread Cassandra Targett (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cassandra Targett updated SOLR-10346:
-
Attachment: SRG-topnav-20170330.png

Added a new screenshot showing what the nav looks like after my latest changes.

> Clean up static page HTML top nav
> -
>
> Key: SOLR-10346
> URL: https://issues.apache.org/jira/browse/SOLR-10346
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Reporter: Cassandra Targett
> Attachments: SRG-top-nav-20170322.png, SRG-topnav-20170330.png
>
>
> For demo purposes, the top navigation bar for the HTML version of the Ref 
> Guide includes some stuff we probably don't want in production. This should 
> be cleaned up and finalized.
> I'll attach a screenshot of the current nav for reference. It currently has 
> these sections:
> * Home link. This should be made dynamic to update automatically for each 
> version the Guide applies to
> * News. Probably don't need this? Today it goes nowhere, but it could go to 
> the News section of the Solr website.
> * Jekyll Resources. Links to stuff about Jekyll. We don't want this.
> * Solr Resources. Links to Javadocs, Source code and Community page of Solr 
> website. Javadoc links should be dynamic.
> * Feedback. Javascript to open local Mail application to send an email. 
> Currently goes to my apache.org address, which I don't want.
> * Search box. This can stay, and we can modify it to do whatever we want it 
> to do when SOLR-10299 is resolved.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-10346) Clean up static page HTML top nav

2017-03-30 Thread Cassandra Targett (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949724#comment-15949724
 ] 

Cassandra Targett commented on SOLR-10346:
--

I just committed some changes that remove {{topnav.yml}} and hard-code the HTML 
into {{topnav.html}}. This makes it easy to use the 
{{site.solr-attributes.solr-javadocs}} variable in the Liquid template, and 
considering how little we need to put there, that seemed easiest. The YAML data 
model really works best when you want the same look & feel with a different 
menu, and that's not the case here.
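For reference, a minimal sketch of what the hard-coded entries can look like — the markup, and the News URL, are illustrative here rather than the exact contents of {{topnav.html}}:

```html
<!-- illustrative sketch; the real topnav.html markup may differ -->
<li><a href="{{ site.solr-attributes.solr-javadocs }}">Solr Javadocs</a></li>
<li><a href="https://lucene.apache.org/solr/news.html">Solr News</a></li>
```

The Liquid variable is resolved at build time from the value set in {{_config.yml.template}}, so the link tracks whichever version of the Guide is being built.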

I also hard-coded the "News" link to go to the Solr website's News page, and 
changed what users see to "Solr News".

I modified {{_config.yml.template}} to disable feedback ({{feedback_disable: 
true}}). I couldn't really think of where we'd want the email to go, and readers 
will be able to comment on the pages. I have not yet removed the 
{{feedback.html}} file, however, in case anyone disagrees.

The commit is: 
https://git1-us-west.apache.org/repos/asf?p=lucene-solr.git;a=commit;h=3b592f1be5f803c33fc170f6bf54b78e0597bef3

> Clean up static page HTML top nav
> -
>
> Key: SOLR-10346
> URL: https://issues.apache.org/jira/browse/SOLR-10346
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Reporter: Cassandra Targett
> Attachments: SRG-top-nav-20170322.png
>
>
> For demo purposes, the top navigation bar for the HTML version of the Ref 
> Guide includes some stuff we probably don't want in production. This should 
> be cleaned up and finalized.
> I'll attach a screenshot of the current nav for reference. It currently has 
> these sections:
> * Home link. This should be made dynamic to update automatically for each 
> version the Guide applies to
> * News. Probably don't need this? Today it goes nowhere, but it could go to 
> the News section of the Solr website.
> * Jekyll Resources. Links to stuff about Jekyll. We don't want this.
> * Solr Resources. Links to Javadocs, Source code and Community page of Solr 
> website. Javadoc links should be dynamic.
> * Feedback. Javascript to open local Mail application to send an email. 
> Currently goes to my apache.org address, which I don't want.
> * Search box. This can stay, and we can modify it to do whatever we want it 
> to do when SOLR-10299 is resolved.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9745) bin/solr* swallows errors from running example instances at least

2017-03-30 Thread gopikannan venugopalsamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gopikannan venugopalsamy updated SOLR-9745:
---
Attachment: SOLR-9745.patch

Please check this patch. I added a test case to verify that SolrCLI returns a 
failure when it is unable to execute the script.

> bin/solr* swallows errors from running example instances at least
> -
>
> Key: SOLR-9745
> URL: https://issues.apache.org/jira/browse/SOLR-9745
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Server
>Affects Versions: 6.3, master (7.0)
>Reporter: Mikhail Khludnev
>Assignee: Mikhail Khludnev
>  Labels: newbie, newdev
> Attachments: SOLR-9745.patch, SOLR-9745.patch
>
>
> It occurs in a mad scenario in LUCENE-7534:
> * solr.cmd wasn't granted +x (it happens under cygwin, yes)
> * a coolhacker worked around it with cmd /C solr.cmd start -e ..
> * but when SolrCLI runs solr instances with the same solr.cmd, it just 
> silently fails
> I think we can just pass an ExecuteResultHandler which will dump the 
> exception to the console. 
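
The handler idea can be sketched as follows — in Python for brevity, since SolrCLI itself is Java and would use commons-exec's ExecuteResultHandler; the function and command names below are made up for illustration:

```python
import subprocess
import sys

def run_script(cmd):
    """Launch a child process and surface failures instead of swallowing them.

    Mirrors the proposed fix: on any failure, dump the error to the console
    and propagate a non-zero status to the caller.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
    except OSError as e:  # e.g. the script was never granted +x
        print(f"Failed to execute {cmd[0]}: {e}")
        return 1
    if result.returncode != 0:
        # Dump the child's stderr rather than failing silently.
        print(result.stderr, end="")
    return result.returncode

# A child that exits non-zero: the status must be propagated, not swallowed.
status = run_script([sys.executable, "-c", "import sys; sys.exit(3)"])
```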



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[VOTE] Release PyLucene 6.5.0 (rc2) (now with Python 3 support)

2017-03-30 Thread Andi Vajda


A few fixes were needed in JCC for better Windows support.
The PyLucene 6.5.0 rc1 vote is thus cancelled.

I'm now calling for a vote on PyLucene 6.5.0 rc2.

The PyLucene 6.5.0 (rc2) release tracking the recent release of
Apache Lucene 6.5.0 is ready.

A release candidate is available from:
  https://dist.apache.org/repos/dist/dev/lucene/pylucene/6.5.0-rc2/

PyLucene 6.5.0 is built with JCC 3.0 included in these release artifacts.

JCC 3.0 now supports Python 3.3+ (in addition to Python 2.3+).
PyLucene may be built with Python 2 or Python 3.

Please vote to release these artifacts as PyLucene 6.5.0.
Anyone interested in this release can and should vote!

Thanks!

Andi..

ps: the KEYS file for PyLucene release signing is at:
https://dist.apache.org/repos/dist/release/lucene/pylucene/KEYS
https://dist.apache.org/repos/dist/dev/lucene/pylucene/KEYS

pps: here is my +1


[jira] [Updated] (LUCENE-7701) Refactor grouping collectors

2017-03-30 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-7701:
--
Attachment: LUCENE-7701.patch

This patch also refactors the second-pass collector into a concrete class that 
accepts a GroupReducer.  GroupReducers return a Collector instance for each 
group that is to be reduced, so for example the TopGroupsCollector will create 
a TopDocsCollector for each group.

Now if you want to create a new type of group (a set of ranges over a 
DoubleValuesSource, for example) you just need to create a GroupSelector 
implementation; and if you want to create a new type of group summarizer (say 
some statistics over a group), then you create a GroupReducer implementation.
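
The division of labor can be sketched in a few lines — an illustrative Python analogue, not the Lucene API; the class names below only mirror the patch's terminology:

```python
# Sketch of the refactored design: the selector maps each document to a
# group key, and the reducer summarizes the documents within each group.

class TermGroupSelector:
    """Selects a group key per document; here, by a stored field value."""
    def __init__(self, field):
        self.field = field

    def select(self, doc):
        return doc[self.field]

class CountReducer:
    """A GroupReducer-style summarizer: one counter per group."""
    def __init__(self):
        self.counts = {}

    def collect(self, key, doc):
        self.counts[key] = self.counts.get(key, 0) + 1

def group_collect(docs, selector, reducer):
    """The concrete collector loop shared by all group types."""
    for doc in docs:
        reducer.collect(selector.select(doc), doc)
    return reducer.counts

docs = [{"color": "red"}, {"color": "blue"}, {"color": "red"}]
print(group_collect(docs, TermGroupSelector("color"), CountReducer()))
# → {'red': 2, 'blue': 1}
```

A new group definition only needs a new selector, and a new summary only needs a new reducer — the collector loop itself never changes.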

There's a failing test in Solr that I still need to track down, but I think 
this is a much nicer grouping interface.  It is not at all backwards 
compatible, so I'd be targeting this at 7.0.  cc [~martijn.v.groningen]

> Refactor grouping collectors
> 
>
> Key: LUCENE-7701
> URL: https://issues.apache.org/jira/browse/LUCENE-7701
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
> Attachments: LUCENE-7701.patch, LUCENE-7701.patch
>
>
> Grouping currently works via abstract collectors, which need to be overridden 
> for each way of defining a group - currently we have two, 'term' (based on 
> SortedDocValues) and 'function' (based on ValueSources).  These collectors 
> all have a lot of repeated code, and means that if you want to implement your 
> own group definitions, you need to override four or five different classes.
> This would be easier to deal with if instead the 'group selection' code was 
> abstracted out into a single interface, and the various collectors were 
> changed to concrete implementations.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] Release PyLucene 6.5.0 (rc1) (now with Python 3 support)

2017-03-30 Thread Andi Vajda


On Thu, 30 Mar 2017, Rüdiger Meier wrote:




On 03/30/2017 09:05 PM, Andi Vajda wrote:


On Thu, 30 Mar 2017, Petrus Hyvönen wrote:


Hi,

My current diff to the svn is below (as in the chain of mails). Now
i get it to wrap my library in both 2.7, 3.5 and 3.6.


I believe, I've now applied all these diffs (or equivalents). Thank
you Petrus for testing on Windows, I'm going to release rc2 artifacts
and call for a new vote.


Just noticed another minor thing needed for python 3.7 support.


There is no Python 3.7 yet, is there?
Anyhow, I just cut rc2. Next release.

Andi..



jcc3/sources/jcc.cpp, line 485:

-char *option = PyUnicode_AsUTF8(arg);
+const char *option = PyUnicode_AsUTF8(arg);

cu,
Rudi


[JENKINS] Lucene-Solr-6.x-Windows (32bit/jdk1.8.0_121) - Build # 816 - Still Unstable!

2017-03-30 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Windows/816/
Java: 32bit/jdk1.8.0_121 -client -XX:+UseConcMarkSweepGC

2 tests failed.
FAILED:  
junit.framework.TestSuite.org.apache.solr.handler.component.DistributedFacetPivotLongTailTest

Error Message:
Could not remove the following files (in the order of attempts):
C:\Users\jenkins\workspace\Lucene-Solr-6.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.component.DistributedFacetPivotLongTailTest_1E2C89DC1F576EF8-001\tempDir-001\shard0\configsets\upload\with-script-processor:
 java.nio.file.DirectoryNotEmptyException: 
C:\Users\jenkins\workspace\Lucene-Solr-6.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.component.DistributedFacetPivotLongTailTest_1E2C89DC1F576EF8-001\tempDir-001\shard0\configsets\upload\with-script-processor

C:\Users\jenkins\workspace\Lucene-Solr-6.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.component.DistributedFacetPivotLongTailTest_1E2C89DC1F576EF8-001\tempDir-001\shard0\configsets\upload:
 java.nio.file.DirectoryNotEmptyException: 
C:\Users\jenkins\workspace\Lucene-Solr-6.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.component.DistributedFacetPivotLongTailTest_1E2C89DC1F576EF8-001\tempDir-001\shard0\configsets\upload

C:\Users\jenkins\workspace\Lucene-Solr-6.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.component.DistributedFacetPivotLongTailTest_1E2C89DC1F576EF8-001\tempDir-001\shard0\configsets:
 java.nio.file.DirectoryNotEmptyException: 
C:\Users\jenkins\workspace\Lucene-Solr-6.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.component.DistributedFacetPivotLongTailTest_1E2C89DC1F576EF8-001\tempDir-001\shard0\configsets

C:\Users\jenkins\workspace\Lucene-Solr-6.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.component.DistributedFacetPivotLongTailTest_1E2C89DC1F576EF8-001\tempDir-001\shard0:
 java.nio.file.DirectoryNotEmptyException: 
C:\Users\jenkins\workspace\Lucene-Solr-6.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.component.DistributedFacetPivotLongTailTest_1E2C89DC1F576EF8-001\tempDir-001\shard0

C:\Users\jenkins\workspace\Lucene-Solr-6.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.component.DistributedFacetPivotLongTailTest_1E2C89DC1F576EF8-001\tempDir-001:
 java.nio.file.DirectoryNotEmptyException: 
C:\Users\jenkins\workspace\Lucene-Solr-6.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.component.DistributedFacetPivotLongTailTest_1E2C89DC1F576EF8-001\tempDir-001
 

Stack Trace:
java.io.IOException: Could not remove the following files (in the order of 
attempts):
   
C:\Users\jenkins\workspace\Lucene-Solr-6.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.component.DistributedFacetPivotLongTailTest_1E2C89DC1F576EF8-001\tempDir-001\shard0\configsets\upload\with-script-processor:
 java.nio.file.DirectoryNotEmptyException: 
C:\Users\jenkins\workspace\Lucene-Solr-6.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.component.DistributedFacetPivotLongTailTest_1E2C89DC1F576EF8-001\tempDir-001\shard0\configsets\upload\with-script-processor
   
C:\Users\jenkins\workspace\Lucene-Solr-6.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.component.DistributedFacetPivotLongTailTest_1E2C89DC1F576EF8-001\tempDir-001\shard0\configsets\upload:
 java.nio.file.DirectoryNotEmptyException: 
C:\Users\jenkins\workspace\Lucene-Solr-6.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.component.DistributedFacetPivotLongTailTest_1E2C89DC1F576EF8-001\tempDir-001\shard0\configsets\upload
   
C:\Users\jenkins\workspace\Lucene-Solr-6.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.component.DistributedFacetPivotLongTailTest_1E2C89DC1F576EF8-001\tempDir-001\shard0\configsets:
 java.nio.file.DirectoryNotEmptyException: 
C:\Users\jenkins\workspace\Lucene-Solr-6.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.component.DistributedFacetPivotLongTailTest_1E2C89DC1F576EF8-001\tempDir-001\shard0\configsets
   
C:\Users\jenkins\workspace\Lucene-Solr-6.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.component.DistributedFacetPivotLongTailTest_1E2C89DC1F576EF8-001\tempDir-001\shard0:
 java.nio.file.DirectoryNotEmptyException: 
C:\Users\jenkins\workspace\Lucene-Solr-6.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.component.DistributedFacetPivotLongTailTest_1E2C89DC1F576EF8-001\tempDir-001\shard0
   
C:\Users\jenkins\workspace\Lucene-Solr-6.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.component.DistributedFacetPivotLongTailTest_1E2C89DC1F576EF8-001\tempDir-001:
 java.nio.file.DirectoryNotEmptyException: 
C:\Users\jenkins\workspace\Lucene-Solr-6.x-Windows\solr\build\solr-core\test\J0\temp\solr.handler.component.DistributedFacetPivotLongTailTest_1E2C89DC1F576EF8-001\tempDir-001

at __randomizedtesting.SeedInfo.seed([1E2C89DC1F576EF8]:0)
at org.apache.lucene.util.IOUtils.rm(IOUtils.java:323)
at 

Re: [VOTE] Release PyLucene 6.5.0 (rc1) (now with Python 3 support)

2017-03-30 Thread Rüdiger Meier



On 03/30/2017 09:05 PM, Andi Vajda wrote:


On Thu, 30 Mar 2017, Petrus Hyvönen wrote:


Hi,

My current diff to the svn is below (as in the chain of mails). Now
i get it to wrap my library in both 2.7, 3.5 and 3.6.


I believe, I've now applied all these diffs (or equivalents). Thank
you Petrus for testing on Windows, I'm going to release rc2 artifacts
and call for a new vote.


Just noticed another minor thing needed for python 3.7 support.

jcc3/sources/jcc.cpp, line 485:

-char *option = PyUnicode_AsUTF8(arg);
+const char *option = PyUnicode_AsUTF8(arg);

cu,
Rudi


Re: [VOTE] Release PyLucene 6.5.0 (rc1) (now with Python 3 support)

2017-03-30 Thread Andi Vajda


On Thu, 30 Mar 2017, Petrus Hyvönen wrote:


Hi,

My current diff to the svn is below (as in the chain of mails). Now i get
it to wrap my library in both 2.7, 3.5 and 3.6.


I believe, I've now applied all these diffs (or equivalents).
Thank you Petrus for testing on Windows, I'm going to release rc2 artifacts 
and call for a new vote.


Andi..



/Regards


Index: jcc2/__init__.py
===
--- jcc2/__init__.py (revision 1789413)
+++ jcc2/__init__.py (working copy)
@@ -20,7 +20,7 @@
from windows import add_jvm_dll_directory_to_path
add_jvm_dll_directory_to_path()

-from jcc2.config import SHARED
+from jcc.config import SHARED
if SHARED:
path = os.environ['Path'].split(os.pathsep)
eggpath =
os.path.abspath(os.path.dirname(os.path.dirname(__file__)))
Index: jcc3/sources/functions.cpp
===
--- jcc3/sources/functions.cpp (revision 1789413)
+++ jcc3/sources/functions.cpp (working copy)
@@ -300,7 +300,7 @@
#if defined(_MSC_VER) || defined(__SUNPRO_CC)
int __parseArgs(PyObject *args, char *types, ...)
{
-int count = PY_SIZE((PyTupleObject *) args);
+int count = Py_SIZE((PyTupleObject *) args); //WAS PY_SIZE
va_list list, check;

va_start(list, types);
Index: jcc3/sources/jcc.cpp
===
--- jcc3/sources/jcc.cpp (revision 1789413)
+++ jcc3/sources/jcc.cpp (working copy)
@@ -195,11 +195,11 @@

static PyObject *t_jccenv_strhash(PyObject *self, PyObject *arg)
{
-static const size_t hexdig = sizeof(uintmax_t) * 2;
-uintmax_t hash = (uintmax_t) PyObject_Hash(arg);
+unsigned long long hash = (unsigned long long) PyObject_Hash(arg);
+static const size_t hexdig = sizeof(hash) * 2;
char buffer[hexdig + 1];

-sprintf(buffer, "%0*"PRIxMAX, (int) hexdig, hash);
+sprintf(buffer, "%0*llx", (int) hexdig, hash);
return PyUnicode_FromStringAndSize(buffer, hexdig);
}

Index: setup.py
===
--- setup.py (revision 1789413)
+++ setup.py (working copy)
@@ -158,7 +158,7 @@
'sunos5': ['-L%(sunos5)s/jre/lib/i386' %(JDK), '-ljava',
   '-L%(sunos5)s/jre/lib/i386/client' %(JDK), '-ljvm',
   '-R%(sunos5)s/jre/lib/i386:%(sunos5)s/jre/lib/i386/client'
%(JDK)],
-'win32': ['/LIBPATH:%(win32)s/lib' %(JDK), 'Ws2_32.lib', 'jvm.lib'],
+'win32': ['/LIBPATH:%(win32)s/lib' %(JDK), 'Ws2_32.lib',
'jvm.lib','/DLL'],
'mingw32': ['-L%(mingw32)s/lib' %(JDK), '-ljvm'],
'freebsd7': ['-L%(freebsd7)s/jre/lib/i386' %(JDK), '-ljava',
'-lverify',
 '-L%(freebsd7)s/jre/lib/i386/client' %(JDK), '-ljvm',



On Thu, Mar 30, 2017 at 5:36 PM, Petrus Hyvönen 
wrote:


Hi,

I was trying the Python 2.7 build and I think line 23 in
jcc2/__init__.py should be:

from jcc.config import SHARED

(instead of from jcc2.config import..)

Regards
/Petrus


On Thu, Mar 30, 2017 at 9:10 AM, Petrus Hyvönen 
wrote:


Hi,

With this version of t_jccenv_strhash I can build both JCC and wrap
the library I'm using!

Regards
/Petrus







static PyObject *t_jccenv_strhash(PyObject *self, PyObject *arg)
{
   unsigned long long hash = (unsigned long long) PyObject_Hash(arg);
   static const size_t hexdig = sizeof(hash) * 2;
   char buffer[hexdig + 1];

   sprintf(buffer, "%0*llx", (int) hexdig, hash);
   return PyUnicode_FromStringAndSize(buffer, hexdig);
}

BTW this function should also be copied to the py2 directory, where we
still use int although PyObject_Hash already returns long on Python 2.x.



cu,
Rudi






--
_
Petrus Hyvönen, Uppsala, Sweden
Mobile Phone/SMS: +46 73 803 19 00







[jira] [Updated] (SOLR-8138) Simple UI for issuing SQL queries

2017-03-30 Thread Michael Suzuki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Suzuki updated SOLR-8138:
-
Attachment: SOLR-8138.patch

The new SQL query UI with Angular ui-grid.

> Simple UI for issuing SQL queries
> -
>
> Key: SOLR-8138
> URL: https://issues.apache.org/jira/browse/SOLR-8138
> Project: Solr
>  Issue Type: New Feature
>  Components: Admin UI
>Affects Versions: 6.0
>Reporter: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-8138.patch, SOLR-8138.patch
>
>
> It would be great for Solr 6 if we could have admin screen where we could 
> issue SQL queries using the new SQL interface.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8138) Simple UI for issuing SQL queries

2017-03-30 Thread Michael Suzuki (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949587#comment-15949587
 ] 

Michael Suzuki commented on SOLR-8138:
--

Good point [~upayavira], I have reverted it back to its original response.

> Simple UI for issuing SQL queries
> -
>
> Key: SOLR-8138
> URL: https://issues.apache.org/jira/browse/SOLR-8138
> Project: Solr
>  Issue Type: New Feature
>  Components: Admin UI
>Affects Versions: 6.0
>Reporter: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-8138.patch
>
>
> It would be great for Solr 6 if we could have admin screen where we could 
> issue SQL queries using the new SQL interface.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-10351) Add analyze Stream Evaluator to support streaming NLP

2017-03-30 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949582#comment-15949582
 ] 

ASF subversion and git services commented on SOLR-10351:


Commit 8fcf55634cd1e7335eed1c220c5ab628bbea8202 in lucene-solr's branch 
refs/heads/branch_6x from [~joel.bernstein]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8fcf556 ]

SOLR-10351: Fix pre-commit


> Add analyze Stream Evaluator to support streaming NLP
> -
>
> Key: SOLR-10351
> URL: https://issues.apache.org/jira/browse/SOLR-10351
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>  Labels: NLP, Streaming
> Fix For: 6.6
>
> Attachments: SOLR-10351.patch, SOLR-10351.patch, SOLR-10351.patch, 
> SOLR-10351.patch
>
>
> The *analyze* Stream Evaluator uses a Solr analyzer to return a collection of 
> tokens from a *text field*. The collection of tokens can then be streamed out 
> by  the *cartesianProduct* Streaming Expression or attached to documents as 
> multi-valued fields by the *select* Streaming Expression.
> This allows Streaming Expressions to leverage all the existing tokenizers and 
> filters and provides a place for future NLP analyzers to be added to 
> Streaming Expressions.
> Sample syntax:
> {code}
> cartesianProduct(expr, analyze(analyzerField, textField) as outfield )
> {code}
> {code}
> select(expr, analyze(analyzerField, textField) as outfield )
> {code}
> Combined with Solr's batch text processing capabilities this provides an 
> entire parallel NLP framework. Solr's batch processing capabilities are 
> described here:
> *Batch jobs, Parallel ETL and Streaming Text Transformation*
> http://joelsolr.blogspot.com/2016/10/solr-63-batch-jobs-parallel-etl-and.html
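
As a rough illustration of the quoted description, here is a Python analogue of how *analyze* feeds *cartesianProduct* — the analyzer and field names are made up, and real Streaming Expressions run inside Solr:

```python
# Sketch: each input tuple is expanded into one output tuple per token
# produced by the analyzer chain, which is what cartesianProduct does
# with the collection that analyze() returns.

def analyze(text):
    """Stand-in for a Solr analyzer chain: lowercase + whitespace tokenize."""
    return text.lower().split()

def cartesian_product(tuples, field, out_field):
    for t in tuples:
        for token in analyze(t[field]):
            yield {**t, out_field: token}

stream = [{"id": 1, "body": "Hello Streaming NLP"}]
for tup in cartesian_product(stream, "body", "term"):
    print(tup)
```

The *select* variant instead attaches the whole token collection to the tuple as a multi-valued field, leaving one output tuple per input tuple.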



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-10351) Add analyze Stream Evaluator to support streaming NLP

2017-03-30 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949581#comment-15949581
 ] 

ASF subversion and git services commented on SOLR-10351:


Commit 434a61e1edcf425ae24213b4fddb2a6e4ed741be in lucene-solr's branch 
refs/heads/branch_6x from [~joel.bernstein]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=434a61e ]

SOLR-10351: Add analyze Stream Evaluator to support streaming NLP


> Add analyze Stream Evaluator to support streaming NLP
> -
>
> Key: SOLR-10351
> URL: https://issues.apache.org/jira/browse/SOLR-10351
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>  Labels: NLP, Streaming
> Fix For: 6.6
>
> Attachments: SOLR-10351.patch, SOLR-10351.patch, SOLR-10351.patch, 
> SOLR-10351.patch
>
>
> The *analyze* Stream Evaluator uses a Solr analyzer to return a collection of 
> tokens from a *text field*. The collection of tokens can then be streamed out 
> by  the *cartesianProduct* Streaming Expression or attached to documents as 
> multi-valued fields by the *select* Streaming Expression.
> This allows Streaming Expressions to leverage all the existing tokenizers and 
> filters and provides a place for future NLP analyzers to be added to 
> Streaming Expressions.
> Sample syntax:
> {code}
> cartesianProduct(expr, analyze(analyzerField, textField) as outfield )
> {code}
> {code}
> select(expr, analyze(analyzerField, textField) as outfield )
> {code}
> Combined with Solr's batch text processing capabilities this provides an 
> entire parallel NLP framework. Solr's batch processing capabilities are 
> described here:
> *Batch jobs, Parallel ETL and Streaming Text Transformation*
> http://joelsolr.blogspot.com/2016/10/solr-63-batch-jobs-parallel-etl-and.html



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2017-03-30 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949575#comment-15949575
 ] 

Joel Bernstein commented on SOLR-8593:
--

Yeah, I'm running behind on the docs. I'm traveling this week, but can update 
the docs early next week.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>  Components: Parallel SQL
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Fix For: 6.5, master (7.0)
>
> Attachments: SOLR-8593.patch, SOLR-8593.patch, SOLR-8593.patch
>
>
>The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-10320) Perform secondary sort using both values in and outside Solr index

2017-03-30 Thread Brandy Kinlaw (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandy Kinlaw updated SOLR-10320:
-
Attachment: 0001-SOLR-10320-Perform-secondary-sort-using-both-values-.patch

Submitted Pull Request #179. This is just an ideas pull request to address 
issue SOLR-10320. Any suggestions/feedback are welcome. It still needs more 
testing and unit tests. 

> Perform secondary sort using both values in and outside Solr index
> --
>
> Key: SOLR-10320
> URL: https://issues.apache.org/jira/browse/SOLR-10320
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Yuchuan Zhou
> Attachments: 
> 0001-SOLR-10320-Perform-secondary-sort-using-both-values-.patch
>
>
> There are some situations that we need to sort results based on values 
> outside of Solr (say, from a separate datastore or a data analytics service 
> that ranks entities based on analytic results). There is also the need to 
> return results in a deterministic order but applying a dynamic chain of 
> sorting and/or ranking algorithms to the result set. This chain would be 
> processed as a secondary sort implementation, where ties returned from one 
> sorting/ranking algorithm are passed to the next sorting/ranking algorithm in 
> the chain until all ties are resolved, resulting in a deterministic result 
> order. This chain should have the ability to apply sorting algorithms that 
> use data found within the solr index as well as outside of the index.
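
The tie-breaking chain in the quoted description can be sketched as follows — illustrative Python, not the proposed Solr API; the field names are made up:

```python
# Each comparator in the chain is consulted only where all earlier ones
# report a tie, which is exactly what a composite sort key provides.

def chain_sort(results, key_fns):
    """key_fns: list of key functions, in priority order.

    Later keys only matter where earlier keys tie, so the chain yields a
    deterministic order once some key (e.g. a unique id) breaks all ties.
    """
    return sorted(results, key=lambda r: tuple(fn(r) for fn in key_fns))

docs = [
    {"id": 3, "score": 2.0, "external_rank": 7},  # external_rank: a value from outside the index
    {"id": 1, "score": 2.0, "external_rank": 5},
    {"id": 2, "score": 5.0, "external_rank": 9},
]
ordered = chain_sort(docs, [
    lambda d: -d["score"],         # primary: score from the index (descending)
    lambda d: d["external_rank"],  # tie-break: value from an external datastore
    lambda d: d["id"],             # final tie-break: unique id
])
print([d["id"] for d in ordered])
# → [2, 1, 3]
```

The two docs tied on score are ordered by the external value, showing how in-index and out-of-index criteria can mix in one chain.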



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-10039) LatLonPointSpatialField

2017-03-30 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949561#comment-15949561
 ] 

David Smiley commented on SOLR-10039:
-

bq. David Smiley: can you please update the ref guide with some guidance on 
this new field type?

Absolutely; I planned to do so tonight.  I know I have to beat the clock before 
the ref guide RC sometime tomorrow.

> LatLonPointSpatialField
> ---
>
> Key: SOLR-10039
> URL: https://issues.apache.org/jira/browse/SOLR-10039
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: spatial
>Reporter: David Smiley
>Assignee: David Smiley
> Fix For: 6.5
>
> Attachments: SOLR_10039_LatLonPointSpatialField.patch, 
> SOLR_10039_LatLonPointSpatialField.patch, 
> SOLR_10039_LatLonPointSpatialField.patch, 
> SOLR_10039_LatLonPointSpatialField.patch
>
>
> The fastest and most efficient spatial field for point data in Lucene/Solr is 
> {{LatLonPoint}} in Lucene's sandbox module.  I'll include 
> {{LatLonDocValuesField}} with this even though it's a separate class.  
> LatLonPoint is based on the Points API, using a BKD index.  It's multi-valued 
> capable.  LatLonDocValuesField is based on sorted numeric DocValues, and thus 
> is also multi-valued capable (a big deal as the existing Solr ones either 
> aren't or do poorly at it).  Note that this feature is limited to a 
> latitude/longitude spherical world model.  And furthermore the precision is 
> at about a centimeter -- less precise than the other spatial fields but 
> nonetheless plenty good for most applications.  Last but not least, this 
> capability natively supports polygons, albeit those that don't wrap the 
> dateline or a pole.
> I propose a {{LatLonPointSpatialField}} which uses this.  Patch & details 
> forthcoming...
> This development was funded by the Harvard Center for Geographic Analysis as 
> part of the HHypermap project



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-10320) Perform secondary sort using both values in and outside Solr index

2017-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949555#comment-15949555
 ] 

ASF GitHub Bot commented on SOLR-10320:
---

GitHub user bkinlaw opened a pull request:

https://github.com/apache/lucene-solr/pull/179

SOLR-10320: Perform secondary sort using both values in and outside Solr index


This is an ideas pull request to address issue SOLR-10320. Any 
suggestions/feedback is welcome. Still needs more testing and unit tests. 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bkinlaw/lucene-solr SOLR-10320

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/179.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #179


commit 52a72dc845261e116a304ccda6e78d2a04790529
Author: Yuchuan Zhou 
Date:   2017-03-23T14:07:28Z

SOLR-10320: Perform secondary sort using both values in and outside Solr 
index




> Perform secondary sort using both values in and outside Solr index
> --
>
> Key: SOLR-10320
> URL: https://issues.apache.org/jira/browse/SOLR-10320
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Yuchuan Zhou
>
> There are some situations where we need to sort results based on values 
> outside of Solr (say, from a separate datastore or a data analytics service 
> that ranks entities based on analytic results). There is also the need to 
> return results in a deterministic order by applying a dynamic chain of 
> sorting and/or ranking algorithms to the result set. This chain would be 
> processed as a secondary sort implementation, where ties returned from one 
> sorting/ranking algorithm are passed to the next sorting/ranking algorithm in 
> the chain until all ties are resolved, resulting in a deterministic result 
> order. This chain should have the ability to apply sorting algorithms that 
> use data found within the Solr index as well as outside of the index.
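A minimal sketch of the chained tie-breaking described above (hypothetical names, not the patch's actual implementation): each comparator in the chain only matters when every earlier one reports a tie.

```java
import java.util.*;

// Hypothetical sketch: a chain of comparators where ties from one stage fall
// through to the next, mixing index-side and external values.
public class SecondarySortChain {
    // stand-in for values fetched from outside the index, e.g. an analytics service
    static final Map<String, Double> externalScore = Map.of("a", 0.9, "b", 0.5, "c", 0.9);

    // fold the chain into one comparator: ties fall through to the next stage
    static List<String> sort(List<String> ids, List<Comparator<String>> chain) {
        Comparator<String> combined = chain.stream()
                .reduce(Comparator::thenComparing)
                .orElseThrow();
        List<String> out = new ArrayList<>(ids);
        out.sort(combined);
        return out;
    }

    public static void main(String[] args) {
        Comparator<String> byExternal =
                Comparator.comparing((String id) -> externalScore.get(id)).reversed();
        Comparator<String> byId = Comparator.naturalOrder(); // deterministic final tie-breaker
        // "a" and "c" tie on the external score (0.9); the id comparator resolves it
        System.out.println(sort(List.of("b", "c", "a"), List.of(byExternal, byId))); // [a, c, b]
    }
}
```

A Solr-side version would run the early stages against indexed fields or doc values and later stages against the external store, but that wiring is specific to the patch.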






[GitHub] lucene-solr pull request #179: SOLR-10320: Perform secondary sort using both...

2017-03-30 Thread bkinlaw
GitHub user bkinlaw opened a pull request:

https://github.com/apache/lucene-solr/pull/179

SOLR-10320: Perform secondary sort using both values in and outside Solr index


This is an ideas pull request to address issue SOLR-10320. Any 
suggestions/feedback is welcome. Still needs more testing and unit tests. 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bkinlaw/lucene-solr SOLR-10320

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/179.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #179


commit 52a72dc845261e116a304ccda6e78d2a04790529
Author: Yuchuan Zhou 
Date:   2017-03-23T14:07:28Z

SOLR-10320: Perform secondary sort using both values in and outside Solr 
index




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---




[jira] [Commented] (SOLR-10320) Perform secondary sort using both values in and outside Solr index

2017-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949554#comment-15949554
 ] 

ASF GitHub Bot commented on SOLR-10320:
---

Github user bkinlaw closed the pull request at:

https://github.com/apache/lucene-solr/pull/178


> Perform secondary sort using both values in and outside Solr index
> --
>
> Key: SOLR-10320
> URL: https://issues.apache.org/jira/browse/SOLR-10320
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Yuchuan Zhou
>
> There are some situations where we need to sort results based on values 
> outside of Solr (say, from a separate datastore or a data analytics service 
> that ranks entities based on analytic results). There is also the need to 
> return results in a deterministic order by applying a dynamic chain of 
> sorting and/or ranking algorithms to the result set. This chain would be 
> processed as a secondary sort implementation, where ties returned from one 
> sorting/ranking algorithm are passed to the next sorting/ranking algorithm in 
> the chain until all ties are resolved, resulting in a deterministic result 
> order. This chain should have the ability to apply sorting algorithms that 
> use data found within the Solr index as well as outside of the index.






[jira] [Commented] (SOLR-10320) Perform secondary sort using both values in and outside Solr index

2017-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949553#comment-15949553
 ] 

ASF GitHub Bot commented on SOLR-10320:
---

GitHub user bkinlaw opened a pull request:

https://github.com/apache/lucene-solr/pull/178

SOLR-10320: Perform secondary sort using both values in and outside Solr index

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bkinlaw/lucene-solr SOLR-10320

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/178.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #178


commit 52a72dc845261e116a304ccda6e78d2a04790529
Author: Yuchuan Zhou 
Date:   2017-03-23T14:07:28Z

SOLR-10320: Perform secondary sort using both values in and outside Solr 
index




> Perform secondary sort using both values in and outside Solr index
> --
>
> Key: SOLR-10320
> URL: https://issues.apache.org/jira/browse/SOLR-10320
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Yuchuan Zhou
>
> There are some situations where we need to sort results based on values 
> outside of Solr (say, from a separate datastore or a data analytics service 
> that ranks entities based on analytic results). There is also the need to 
> return results in a deterministic order by applying a dynamic chain of 
> sorting and/or ranking algorithms to the result set. This chain would be 
> processed as a secondary sort implementation, where ties returned from one 
> sorting/ranking algorithm are passed to the next sorting/ranking algorithm in 
> the chain until all ties are resolved, resulting in a deterministic result 
> order. This chain should have the ability to apply sorting algorithms that 
> use data found within the Solr index as well as outside of the index.






[GitHub] lucene-solr pull request #178: SOLR-10320: Perform secondary sort using both...

2017-03-30 Thread bkinlaw
Github user bkinlaw closed the pull request at:

https://github.com/apache/lucene-solr/pull/178





[GitHub] lucene-solr pull request #178: SOLR-10320: Perform secondary sort using both...

2017-03-30 Thread bkinlaw
GitHub user bkinlaw opened a pull request:

https://github.com/apache/lucene-solr/pull/178

SOLR-10320: Perform secondary sort using both values in and outside Solr index

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bkinlaw/lucene-solr SOLR-10320

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/178.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #178


commit 52a72dc845261e116a304ccda6e78d2a04790529
Author: Yuchuan Zhou 
Date:   2017-03-23T14:07:28Z

SOLR-10320: Perform secondary sort using both values in and outside Solr 
index







[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2017-03-30 Thread Cassandra Targett (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949518#comment-15949518
 ] 

Cassandra Targett commented on SOLR-8593:
-

[~risdenk], [~caomanhdat], [~joel.bernstein]: The note in CHANGES for this 
issue says the user should refer to the documentation for details on the 
changes made by moving to Calcite. However, it doesn't look like the Parallel 
SQL docs have been updated - did I miss it?

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>  Components: Parallel SQL
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Fix For: 6.5, master (7.0)
>
> Attachments: SOLR-8593.patch, SOLR-8593.patch, SOLR-8593.patch
>
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.






[jira] [Commented] (SOLR-10087) StreamHandler should be able to use runtimeLib jars

2017-03-30 Thread Cassandra Targett (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949517#comment-15949517
 ] 

Cassandra Targett commented on SOLR-10087:
--

[~risdenk]: are the instructions in the description of this issue still 
accurate after it was committed? Asking for the Ref Guide, unless perhaps you 
think we should skip documenting this.

> StreamHandler should be able to use runtimeLib jars
> ---
>
> Key: SOLR-10087
> URL: https://issues.apache.org/jira/browse/SOLR-10087
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Fix For: 6.5, master (7.0)
>
> Attachments: SOLR-10087.patch
>
>
> StreamHandler currently can't use jars added via the runtimeLib and Blob 
> Store API. This is because the StreamHandler uses core.getResourceLoader() 
> instead of core.getMemClassLoader() for loading classes.
> An example of this working with the fix is here: 
> https://github.com/risdenk/solr_custom_streaming_expressions
> Steps:
> {code}
> # Inspired by 
> https://cwiki.apache.org/confluence/display/solr/Adding+Custom+Plugins+in+SolrCloud+Mode
> # Start Solr with enabling Blob Store
> ./bin/solr start -c -f -a "-Denable.runtime.lib=true"
> # Create test collection
> ./bin/solr create -c test
> # Create .system collection
> curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=.system'
> # Build custom streaming expression jar
> (cd custom-streaming-expression && mvn clean package)
> # Upload jar to .system collection using Blob Store API 
> (https://cwiki.apache.org/confluence/display/solr/Blob+Store+API)
> curl -X POST -H 'Content-Type: application/octet-stream' --data-binary 
> @custom-streaming-expression/target/custom-streaming-expression-1.0-SNAPSHOT.jar
>  'http://localhost:8983/solr/.system/blob/test'
> # List all blobs that are stored
> curl 'http://localhost:8983/solr/.system/blob?omitHeader=true'
> # Add the jar to the runtime lib
> curl 'http://localhost:8983/solr/test/config' -H 
> 'Content-type:application/json' -d '{
>"add-runtimelib": { "name":"test", "version":1 }
> }'
> # Create custom streaming expression using work from SOLR-9103
> # Patch from SOLR-10087 is required for StreamHandler to load the runtimeLib 
> jar
> curl 'http://localhost:8983/solr/test/config' -H 
> 'Content-type:application/json' -d '{
>   "create-expressible": {
> "name": "customstreamingexpression",
> "class": "com.test.solr.CustomStreamingExpression",
> "runtimeLib": true
>   }
> }'
> # Test the custom streaming expression
> curl 'http://localhost:8983/solr/test/stream?expr=customstreamingexpression()'
> {code}






[jira] [Commented] (SOLR-10351) Add analyze Stream Evaluator to support streaming NLP

2017-03-30 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949496#comment-15949496
 ] 

ASF subversion and git services commented on SOLR-10351:


Commit bdd0c7e32087f534de04657fb3ef1b3afa93cc68 in lucene-solr's branch 
refs/heads/master from [~joel.bernstein]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=bdd0c7e ]

SOLR-10351: Fix pre-commit


> Add analyze Stream Evaluator to support streaming NLP
> -
>
> Key: SOLR-10351
> URL: https://issues.apache.org/jira/browse/SOLR-10351
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>  Labels: NLP, Streaming
> Fix For: 6.6
>
> Attachments: SOLR-10351.patch, SOLR-10351.patch, SOLR-10351.patch, 
> SOLR-10351.patch
>
>
> The *analyze* Stream Evaluator uses a Solr analyzer to return a collection of 
> tokens from a *text field*. The collection of tokens can then be streamed out 
> by  the *cartesianProduct* Streaming Expression or attached to documents as 
> multi-valued fields by the *select* Streaming Expression.
> This allows Streaming Expressions to leverage all the existing tokenizers and 
> filters and provides a place for future NLP analyzers to be added to 
> Streaming Expressions.
> Sample syntax:
> {code}
> cartesianProduct(expr, analyze(analyzerField, textField) as outfield )
> {code}
> {code}
> select(expr, analyze(analyzerField, textField) as outfield )
> {code}
> Combined with Solr's batch text processing capabilities this provides an 
> entire parallel NLP framework. Solr's batch processing capabilities are 
> described here:
> *Batch jobs, Parallel ETL and Streaming Text Transformation*
> http://joelsolr.blogspot.com/2016/10/solr-63-batch-jobs-parallel-etl-and.html
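The analyze-then-fan-out semantics described above can be modeled in a few lines. The sketch below is a toy (assumed behavior, not the implementation): a stand-in "analyzer" doing lowercase whitespace splitting, where Solr would run a real analysis chain, produces tokens, and cartesianProduct emits one output tuple per token.

```java
import java.util.*;
import java.util.stream.*;

// Toy model of analyze + cartesianProduct: tokens from a text field fan out
// into one tuple per token, bound to the requested output field.
public class AnalyzeSketch {
    static List<String> analyze(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("\\s+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // one input tuple fans out into one tuple per token, bound to outField
    static List<Map<String, String>> cartesianProduct(Map<String, String> tuple,
                                                      String textField, String outField) {
        return analyze(tuple.get(textField)).stream()
                .map(tok -> {
                    Map<String, String> out = new LinkedHashMap<>(tuple);
                    out.put(outField, tok);
                    return out;
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        cartesianProduct(Map.of("id", "1", "body", "Hello Streaming NLP"), "body", "term_s")
                .forEach(System.out::println); // three tuples, one per token
    }
}
```

The select variant would instead attach the whole token list to the tuple as one multi-valued field rather than fanning out.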






[jira] [Commented] (SOLR-10351) Add analyze Stream Evaluator to support streaming NLP

2017-03-30 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949495#comment-15949495
 ] 

ASF subversion and git services commented on SOLR-10351:


Commit 6c2155c02434bfae2ff5aa62c9ffe57318063626 in lucene-solr's branch 
refs/heads/master from [~joel.bernstein]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6c2155c ]

SOLR-10351: Add analyze Stream Evaluator to support streaming NLP


> Add analyze Stream Evaluator to support streaming NLP
> -
>
> Key: SOLR-10351
> URL: https://issues.apache.org/jira/browse/SOLR-10351
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>  Labels: NLP, Streaming
> Fix For: 6.6
>
> Attachments: SOLR-10351.patch, SOLR-10351.patch, SOLR-10351.patch, 
> SOLR-10351.patch
>
>
> The *analyze* Stream Evaluator uses a Solr analyzer to return a collection of 
> tokens from a *text field*. The collection of tokens can then be streamed out 
> by  the *cartesianProduct* Streaming Expression or attached to documents as 
> multi-valued fields by the *select* Streaming Expression.
> This allows Streaming Expressions to leverage all the existing tokenizers and 
> filters and provides a place for future NLP analyzers to be added to 
> Streaming Expressions.
> Sample syntax:
> {code}
> cartesianProduct(expr, analyze(analyzerField, textField) as outfield )
> {code}
> {code}
> select(expr, analyze(analyzerField, textField) as outfield )
> {code}
> Combined with Solr's batch text processing capabilities this provides an 
> entire parallel NLP framework. Solr's batch processing capabilities are 
> described here:
> *Batch jobs, Parallel ETL and Streaming Text Transformation*
> http://joelsolr.blogspot.com/2016/10/solr-63-batch-jobs-parallel-etl-and.html






[JENKINS] Lucene-Solr-master-Windows (32bit/jdk1.8.0_121) - Build # 6486 - Unstable!

2017-03-30 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Windows/6486/
Java: 32bit/jdk1.8.0_121 -server -XX:+UseParallelGC

1 tests failed.
FAILED:  junit.framework.TestSuite.org.apache.solr.cloud.TestCloudRecovery

Error Message:
Could not remove the following files (in the order of attempts):
C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\solr\build\solr-core\test\J0\temp\solr.cloud.TestCloudRecovery_EED6F9F525150FD1-001\tempDir-001\node1\collection1_shard1_replica2\data\tlog\tlog.000:
 java.nio.file.FileSystemException: 
C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\solr\build\solr-core\test\J0\temp\solr.cloud.TestCloudRecovery_EED6F9F525150FD1-001\tempDir-001\node1\collection1_shard1_replica2\data\tlog\tlog.000:
 The process cannot access the file because it is being used by another 
process. 
C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\solr\build\solr-core\test\J0\temp\solr.cloud.TestCloudRecovery_EED6F9F525150FD1-001\tempDir-001\node1\collection1_shard1_replica2\data\tlog:
 java.nio.file.DirectoryNotEmptyException: 
C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\solr\build\solr-core\test\J0\temp\solr.cloud.TestCloudRecovery_EED6F9F525150FD1-001\tempDir-001\node1\collection1_shard1_replica2\data\tlog

C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\solr\build\solr-core\test\J0\temp\solr.cloud.TestCloudRecovery_EED6F9F525150FD1-001\tempDir-001\node1\collection1_shard1_replica2\data:
 java.nio.file.DirectoryNotEmptyException: 
C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\solr\build\solr-core\test\J0\temp\solr.cloud.TestCloudRecovery_EED6F9F525150FD1-001\tempDir-001\node1\collection1_shard1_replica2\data

C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\solr\build\solr-core\test\J0\temp\solr.cloud.TestCloudRecovery_EED6F9F525150FD1-001\tempDir-001\node1\collection1_shard1_replica2:
 java.nio.file.DirectoryNotEmptyException: 
C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\solr\build\solr-core\test\J0\temp\solr.cloud.TestCloudRecovery_EED6F9F525150FD1-001\tempDir-001\node1\collection1_shard1_replica2

C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\solr\build\solr-core\test\J0\temp\solr.cloud.TestCloudRecovery_EED6F9F525150FD1-001\tempDir-001\node1:
 java.nio.file.DirectoryNotEmptyException: 
C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\solr\build\solr-core\test\J0\temp\solr.cloud.TestCloudRecovery_EED6F9F525150FD1-001\tempDir-001\node1

C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\solr\build\solr-core\test\J0\temp\solr.cloud.TestCloudRecovery_EED6F9F525150FD1-001\tempDir-001:
 java.nio.file.DirectoryNotEmptyException: 
C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\solr\build\solr-core\test\J0\temp\solr.cloud.TestCloudRecovery_EED6F9F525150FD1-001\tempDir-001

C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\solr\build\solr-core\test\J0\temp\solr.cloud.TestCloudRecovery_EED6F9F525150FD1-001:
 java.nio.file.DirectoryNotEmptyException: 
C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\solr\build\solr-core\test\J0\temp\solr.cloud.TestCloudRecovery_EED6F9F525150FD1-001
 

Stack Trace:
java.io.IOException: Could not remove the following files (in the order of 
attempts):
   
C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\solr\build\solr-core\test\J0\temp\solr.cloud.TestCloudRecovery_EED6F9F525150FD1-001\tempDir-001\node1\collection1_shard1_replica2\data\tlog\tlog.000:
 java.nio.file.FileSystemException: 
C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\solr\build\solr-core\test\J0\temp\solr.cloud.TestCloudRecovery_EED6F9F525150FD1-001\tempDir-001\node1\collection1_shard1_replica2\data\tlog\tlog.000:
 The process cannot access the file because it is being used by another process.

   
C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\solr\build\solr-core\test\J0\temp\solr.cloud.TestCloudRecovery_EED6F9F525150FD1-001\tempDir-001\node1\collection1_shard1_replica2\data\tlog:
 java.nio.file.DirectoryNotEmptyException: 
C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\solr\build\solr-core\test\J0\temp\solr.cloud.TestCloudRecovery_EED6F9F525150FD1-001\tempDir-001\node1\collection1_shard1_replica2\data\tlog
   
C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\solr\build\solr-core\test\J0\temp\solr.cloud.TestCloudRecovery_EED6F9F525150FD1-001\tempDir-001\node1\collection1_shard1_replica2\data:
 java.nio.file.DirectoryNotEmptyException: 
C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\solr\build\solr-core\test\J0\temp\solr.cloud.TestCloudRecovery_EED6F9F525150FD1-001\tempDir-001\node1\collection1_shard1_replica2\data
   
C:\Users\jenkins\workspace\Lucene-Solr-master-Windows\solr\build\solr-core\test\J0\temp\solr.cloud.TestCloudRecovery_EED6F9F525150FD1-001\tempDir-001\node1\collection1_shard1_replica2:
 java.nio.file.DirectoryNotEmptyException: 

[jira] [Updated] (SOLR-10264) ManagedSynonymFilterFactory does not parse multi-term synonyms

2017-03-30 Thread Christine Poerschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christine Poerschke updated SOLR-10264:
---
Attachment: SOLR-10264.patch

Attaching final(?) patch, will commit early next week or so if no further 
comments or concerns.

> ManagedSynonymFilterFactory does not parse multi-term synonyms
> --
>
> Key: SOLR-10264
> URL: https://issues.apache.org/jira/browse/SOLR-10264
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Schema and Analysis
>Affects Versions: 6.4.2
>Reporter: Jörg Rathlev
>Assignee: Christine Poerschke
>Priority: Minor
> Attachments: SOLR-10264.patch, SOLR-10264.patch, SOLR-10264.patch, 
> SOLR-10264.patch, SOLR-10264-test.patch
>
>
> The parser that the {{ManagedSynonymFilterFactory}} uses to parse the JSON 
> resource into a synonym map does not parse multi-term synonyms in the 
> expected way.
> If the synonym {"foo bar":"baz"} is added to the managed resource, the 
> expected behavior is that the multi-term synonym "foo bar" will be mapped to 
> the synonym "baz".
> In the {{analyze}} method of {{SynonymMap.Parser}}, multiple origin terms are 
> concatenated with a separating {{SynonymMap.WORD_SEPARATOR}}, but the 
> {{analyze}} method is not used by the parser in the 
> {{ManagedSynonymFilterFactory}}.
> As a workaround, multi-term synonyms can be uploaded separated by a null 
> character, i.e., uploading {"foo\u0000bar":"baz"} works.
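The workaround above boils down to joining the words of a multi-term origin with the same null-character separator that SynonymMap uses internally. A small sketch (the class and method names here are illustrative, not from the patch):

```java
// Sketch of the workaround: SynonymMap joins the words of a multi-term origin
// with a null character, so an uploaded key must use the same separator.
public class SynonymKey {
    static final char WORD_SEPARATOR = '\u0000'; // same value as SynonymMap.WORD_SEPARATOR

    // turn a human-readable multi-term origin into the key form the parser expects
    static String toSynonymKey(String origin) {
        return origin.replace(' ', WORD_SEPARATOR);
    }

    public static void main(String[] args) {
        String key = toSynonymKey("foo bar");
        // the space is replaced by U+0000, matching the {"foo\u0000bar":"baz"} workaround
        System.out.println((int) key.charAt(3)); // 0
    }
}
```

The proper fix is for the managed parser to run origins through the analyze method, which performs this concatenation itself.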






[jira] [Updated] (SOLR-10394) search.grouping.Command rename: getSortWithinGroup --> getWithinGroupSort

2017-03-30 Thread Christine Poerschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christine Poerschke updated SOLR-10394:
---
Attachment: SOLR-10394.patch

Attaching proposed patch, factored out from [~Judith]'s SOLR-6203 changes.

> search.grouping.Command rename: getSortWithinGroup --> getWithinGroupSort
> -
>
> Key: SOLR-10394
> URL: https://issues.apache.org/jira/browse/SOLR-10394
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
> Attachments: SOLR-10394.patch
>
>
> The class is marked _@lucene.experimental_ and SOLR-9660 previously included 
> sortSpecWithinGroup to withinGroupSortSpec renaming for GroupSpecification; 
> the rename proposed here is in line with that.
> Motivation for the change is to reduce group-sort vs. within-group-sort 
> confusion, generally and specifically in SOLR-6203.






[jira] [Created] (SOLR-10394) search.grouping.Command rename: getSortWithinGroup --> getWithinGroupSort

2017-03-30 Thread Christine Poerschke (JIRA)
Christine Poerschke created SOLR-10394:
--

 Summary: search.grouping.Command rename: getSortWithinGroup --> 
getWithinGroupSort
 Key: SOLR-10394
 URL: https://issues.apache.org/jira/browse/SOLR-10394
 Project: Solr
  Issue Type: Task
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Christine Poerschke
Assignee: Christine Poerschke
Priority: Minor


The class is marked _@lucene.experimental_ and SOLR-9660 previously included 
sortSpecWithinGroup to withinGroupSortSpec renaming for GroupSpecification; the 
rename proposed here is in line with that.

Motivation for the change is to reduce group-sort vs. within-group-sort 
confusion, generally and specifically in SOLR-6203.






[jira] [Commented] (SOLR-10039) LatLonPointSpatialField

2017-03-30 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949437#comment-15949437
 ] 

Hoss Man commented on SOLR-10039:
-

[~dsmiley]: can you please update the ref guide with some guidance on this new 
field type?

Replacing LatLonType with LatLonPointSpatialField in the list of field types is 
easy, but given the way the "Spatial" page is written, updating it to make 
meaningful comments about LatLonPointSpatialField (as compared to RPT) is a lot 
harder without some first-hand knowledge of the merits...

* https://cwiki.apache.org/confluence/display/solr/Spatial+Search
* 
https://cwiki.apache.org/confluence/display/solr/Field+Types+Included+with+Solr

> LatLonPointSpatialField
> ---
>
> Key: SOLR-10039
> URL: https://issues.apache.org/jira/browse/SOLR-10039
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: spatial
>Reporter: David Smiley
>Assignee: David Smiley
> Fix For: 6.5
>
> Attachments: SOLR_10039_LatLonPointSpatialField.patch, 
> SOLR_10039_LatLonPointSpatialField.patch, 
> SOLR_10039_LatLonPointSpatialField.patch, 
> SOLR_10039_LatLonPointSpatialField.patch
>
>
> The fastest and most efficient spatial field for point data in Lucene/Solr is 
> {{LatLonPoint}} in Lucene's sandbox module.  I'll include 
> {{LatLonDocValuesField}} with this even though it's a separate class.  
> LatLonPoint is based on the Points API, using a BKD index.  It's multi-valued 
> capable.  LatLonDocValuesField is based on sorted numeric DocValues, and thus 
> is also multi-valued capable (a big deal as the existing Solr ones either 
> aren't or do poorly at it).  Note that this feature is limited to a 
> latitude/longitude spherical world model.  And furthermore the precision is 
> at about a centimeter -- less precise than the other spatial fields but 
> nonetheless plenty good for most applications.  Last but not least, this 
> capability natively supports polygons, albeit those that don't wrap the 
> dateline or a pole.
> I propose a {{LatLonPointSpatialField}} which uses this.  Patch & details 
> forthcoming...
> This development was funded by the Harvard Center for Geographic Analysis as 
> part of the HHypermap project






[jira] [Commented] (SOLR-7452) json facet api returning inconsistent counts in cloud set up

2017-03-30 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949405#comment-15949405
 ] 

ASF subversion and git services commented on SOLR-7452:
---

Commit b17b48d5353fd469c0d8bdbfa25894049495cb46 in lucene-solr's branch 
refs/heads/branch_6x from [~yo...@apache.org]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b17b48d ]

SOLR-7452: refinement of missing buckets and partial facets through missing 
buckets


> json facet api returning inconsistent counts in cloud set up
> 
>
> Key: SOLR-7452
> URL: https://issues.apache.org/jira/browse/SOLR-7452
> Project: Solr
>  Issue Type: Bug
>  Components: Facet Module
>Affects Versions: 5.1
>Reporter: Vamsi Krishna D
>  Labels: count, facet, sort
> Attachments: SOLR-7452.patch, SOLR-7452.patch
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> While using the newly added JSON term facet API 
> (http://yonik.com/json-facet-api/#TermsFacet) I am encountering inconsistent 
> counts for faceted values (note: I am running Solr in cloud mode). For 
> example, consider that I have txns_id (a unique field or key), 
> consumer_number, and amount. Now, for 10 million such records, let's say I 
> query for 
> q=*:*=0&
>  json.facet={
>biskatoo:{
>type : terms,
>field : consumer_number,
>limit : 20,
>   sort : {y:desc},
>   numBuckets : true,
>   facet:{
>y : "sum(amount)"
>}
>}
>  }
> the results are as follows ( some are omitted ):
> "facets":{
> "count":6641277,
> "biskatoo":{
>   "numBuckets":3112708,
>   "buckets":[{
>   "val":"surya",
>   "count":4,
>   "y":2.264506},
>   {
>   "val":"raghu",
>   "COUNT":3,   // capitalised for recognition 
>   "y":1.8},
> {
>   "val":"malli",
>   "count":4,
>   "y":1.78}]}}}
> but if i restrict the query to 
> q=consumer_number:raghu=0&
>  json.facet={
>biskatoo:{
>type : terms,
>field : consumer_number,
>limit : 20,
>   sort : {y:desc},
>   numBuckets : true,
>   facet:{
>y : "sum(amount)"
>}
>}
>  }
> i get :
>   "facets":{
> "count":4,
> "biskatoo":{
>   "numBuckets":1,
>   "buckets":[{
>   "val":"raghu",
>   "COUNT":4,
>   "y":2429708.24}]}}}
> One can see that the count results are inconsistent (and I found many other 
> occurrences of the inconsistency).
> I have tried the patch from https://issues.apache.org/jira/browse/SOLR-7412, 
> but the issue still seems unresolved.
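For intuition, here is a minimal sketch (plain Python, not Solr internals; shard contents and names are made up) of why merging per-shard top-k facet buckets can undercount a term: a term that is frequent overall can still fall outside some shard's local top-k, and its contribution from that shard is then lost in the merged result.

```python
from collections import Counter

def shard_top_k(docs, k):
    # Each shard reports only its k most frequent terms (its local top-k).
    return dict(Counter(docs).most_common(k))

shard1 = ["surya"] * 3 + ["raghu"] * 2 + ["malli"]
shard2 = ["malli"] * 3 + ["surya"] * 2 + ["raghu"]

merged = Counter()
for shard in (shard1, shard2):
    merged.update(shard_top_k(shard, k=2))

true_counts = Counter(shard1 + shard2)
# "raghu" fell outside shard2's local top-2, so its shard2 hits are lost:
print(merged["raghu"], true_counts["raghu"])  # 2 3
```

Refinement (as in the commit above) asks shards a second time about buckets they did not originally return, closing exactly this gap.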






[jira] [Comment Edited] (SOLR-10351) Add analyze Stream Evaluator to support streaming NLP

2017-03-30 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949342#comment-15949342
 ] 

Joel Bernstein edited comment on SOLR-10351 at 3/30/17 4:26 PM:


Added a test with the select function


was (Author: joel.bernstein):
Added a test with select function

> Add analyze Stream Evaluator to support streaming NLP
> -
>
> Key: SOLR-10351
> URL: https://issues.apache.org/jira/browse/SOLR-10351
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>  Labels: NLP, Streaming
> Fix For: 6.6
>
> Attachments: SOLR-10351.patch, SOLR-10351.patch, SOLR-10351.patch, 
> SOLR-10351.patch
>
>
> The *analyze* Stream Evaluator uses a Solr analyzer to return a collection of 
> tokens from a *text field*. The collection of tokens can then be streamed out 
> by  the *cartesianProduct* Streaming Expression or attached to documents as 
> multi-valued fields by the *select* Streaming Expression.
> This allows Streaming Expressions to leverage all the existing tokenizers and 
> filters and provides a place for future NLP analyzers to be added to 
> Streaming Expressions.
> Sample syntax:
> {code}
> cartesianProduct(expr, analyze(analyzerField, textField) as outfield )
> {code}
> {code}
> select(expr, analyze(analyzerField, textField) as outfield )
> {code}
> Combined with Solr's batch text processing capabilities this provides an 
> entire parallel NLP framework. Solr's batch processing capabilities are 
> described here:
> *Batch jobs, Parallel ETL and Streaming Text Transformation*
> http://joelsolr.blogspot.com/2016/10/solr-63-batch-jobs-parallel-etl-and.html
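As a rough illustration of the pattern described above (a toy sketch, not Solr's implementation: the whitespace/lowercase "analyzer" and all field names are stand-ins), this mimics how analyze feeds cartesianProduct to emit one tuple per token:

```python
def analyze(text):
    # Stand-in "analyzer": whitespace tokenizer plus a lowercase filter.
    return [t.lower() for t in text.split()]

def cartesian_product(tuples, field, out_field):
    # Emit one output tuple per token of the analyzed field, mimicking
    # the cartesianProduct Streaming Expression over a multi-valued field.
    for tup in tuples:
        for token in analyze(tup[field]):
            yield {**tup, out_field: token}

docs = [{"id": 1, "body": "Streaming NLP Rocks"}]
out = list(cartesian_product(docs, "body", "term_s"))
print([t["term_s"] for t in out])  # ['streaming', 'nlp', 'rocks']
```

The select variant would instead attach the whole token list to the document as one multi-valued field rather than exploding it into separate tuples.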






[jira] [Commented] (SOLR-10351) Add analyze Stream Evaluator to support streaming NLP

2017-03-30 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949342#comment-15949342
 ] 

Joel Bernstein commented on SOLR-10351:
---

Added a test with select function

> Add analyze Stream Evaluator to support streaming NLP
> -
>
> Key: SOLR-10351
> URL: https://issues.apache.org/jira/browse/SOLR-10351
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>  Labels: NLP, Streaming
> Fix For: 6.6
>
> Attachments: SOLR-10351.patch, SOLR-10351.patch, SOLR-10351.patch, 
> SOLR-10351.patch
>
>
> The *analyze* Stream Evaluator uses a Solr analyzer to return a collection of 
> tokens from a *text field*. The collection of tokens can then be streamed out 
> by  the *cartesianProduct* Streaming Expression or attached to documents as 
> multi-valued fields by the *select* Streaming Expression.
> This allows Streaming Expressions to leverage all the existing tokenizers and 
> filters and provides a place for future NLP analyzers to be added to 
> Streaming Expressions.
> Sample syntax:
> {code}
> cartesianProduct(expr, analyze(analyzerField, textField) as outfield )
> {code}
> {code}
> select(expr, analyze(analyzerField, textField) as outfield )
> {code}
> Combined with Solr's batch text processing capabilities this provides an 
> entire parallel NLP framework. Solr's batch processing capabilities are 
> described here:
> *Batch jobs, Parallel ETL and Streaming Text Transformation*
> http://joelsolr.blogspot.com/2016/10/solr-63-batch-jobs-parallel-etl-and.html






[jira] [Updated] (SOLR-10351) Add analyze Stream Evaluator to support streaming NLP

2017-03-30 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-10351:
--
Attachment: SOLR-10351.patch

> Add analyze Stream Evaluator to support streaming NLP
> -
>
> Key: SOLR-10351
> URL: https://issues.apache.org/jira/browse/SOLR-10351
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>  Labels: NLP, Streaming
> Fix For: 6.6
>
> Attachments: SOLR-10351.patch, SOLR-10351.patch, SOLR-10351.patch, 
> SOLR-10351.patch
>
>
> The *analyze* Stream Evaluator uses a Solr analyzer to return a collection of 
> tokens from a *text field*. The collection of tokens can then be streamed out 
> by  the *cartesianProduct* Streaming Expression or attached to documents as 
> multi-valued fields by the *select* Streaming Expression.
> This allows Streaming Expressions to leverage all the existing tokenizers and 
> filters and provides a place for future NLP analyzers to be added to 
> Streaming Expressions.
> Sample syntax:
> {code}
> cartesianProduct(expr, analyze(analyzerField, textField) as outfield )
> {code}
> {code}
> select(expr, analyze(analyzerField, textField) as outfield )
> {code}
> Combined with Solr's batch text processing capabilities this provides an 
> entire parallel NLP framework. Solr's batch processing capabilities are 
> described here:
> *Batch jobs, Parallel ETL and Streaming Text Transformation*
> http://joelsolr.blogspot.com/2016/10/solr-63-batch-jobs-parallel-etl-and.html






[jira] [Commented] (SOLR-8096) Major faceting performance regressions

2017-03-30 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949319#comment-15949319
 ] 

Erick Erickson commented on SOLR-8096:
--

OK, what's the status of this JIRA? Last comment was 9 months ago

> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.0, 5.1, 5.2, 5.3, 6.0
>Reporter: Yonik Seeley
>Priority: Critical
> Attachments: simple_facets.diff
>
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was removed as part of LUCENE-5666, causing 
> severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time  
> || || Percent of index being faceted ||
> ||num_unique_values|| 10% || 50% || 90% ||
> |10 | 351.17% | 1587.08% | 3057.28% |
> |100 | 158.10% | 203.61% | 1421.93% |
> |1000 | 143.78% | 168.01% | 1325.87% |
> |10000 | 137.98% | 175.31% | 1233.97% |
> |100000 | 142.98% | 159.42% | 1252.45% |
> |1000000 | 255.15% | 165.17% | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were 
> faceted.
> One user who brought the performance problem to our attention: 
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in 
> SOLR-7190, but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request






[jira] [Updated] (SOLR-10351) Add analyze Stream Evaluator to support streaming NLP

2017-03-30 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-10351:
--
Attachment: SOLR-10351.patch

More tests

> Add analyze Stream Evaluator to support streaming NLP
> -
>
> Key: SOLR-10351
> URL: https://issues.apache.org/jira/browse/SOLR-10351
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>  Labels: NLP, Streaming
> Fix For: 6.6
>
> Attachments: SOLR-10351.patch, SOLR-10351.patch, SOLR-10351.patch
>
>
> The *analyze* Stream Evaluator uses a Solr analyzer to return a collection of 
> tokens from a *text field*. The collection of tokens can then be streamed out 
> by  the *cartesianProduct* Streaming Expression or attached to documents as 
> multi-valued fields by the *select* Streaming Expression.
> This allows Streaming Expressions to leverage all the existing tokenizers and 
> filters and provides a place for future NLP analyzers to be added to 
> Streaming Expressions.
> Sample syntax:
> {code}
> cartesianProduct(expr, analyze(analyzerField, textField) as outfield )
> {code}
> {code}
> select(expr, analyze(analyzerField, textField) as outfield )
> {code}
> Combined with Solr's batch text processing capabilities this provides an 
> entire parallel NLP framework. Solr's batch processing capabilities are 
> described here:
> *Batch jobs, Parallel ETL and Streaming Text Transformation*
> http://joelsolr.blogspot.com/2016/10/solr-63-batch-jobs-parallel-etl-and.html






Re: [VOTE] Release PyLucene 6.5.0 (rc1) (now with Python 3 support)

2017-03-30 Thread Petrus Hyvönen
Hi,

My current diff against svn is below (as in the chain of mails). Now I get
it to wrap my library on 2.7, 3.5 and 3.6.

/Regards


Index: jcc2/__init__.py
===
--- jcc2/__init__.py (revision 1789413)
+++ jcc2/__init__.py (working copy)
@@ -20,7 +20,7 @@
 from windows import add_jvm_dll_directory_to_path
 add_jvm_dll_directory_to_path()

-from jcc2.config import SHARED
+from jcc.config import SHARED
 if SHARED:
 path = os.environ['Path'].split(os.pathsep)
 eggpath =
os.path.abspath(os.path.dirname(os.path.dirname(__file__)))
Index: jcc3/sources/functions.cpp
===
--- jcc3/sources/functions.cpp (revision 1789413)
+++ jcc3/sources/functions.cpp (working copy)
@@ -300,7 +300,7 @@
 #if defined(_MSC_VER) || defined(__SUNPRO_CC)
 int __parseArgs(PyObject *args, char *types, ...)
 {
-int count = PY_SIZE((PyTupleObject *) args);
+int count = Py_SIZE((PyTupleObject *) args); //WAS PY_SIZE
 va_list list, check;

 va_start(list, types);
Index: jcc3/sources/jcc.cpp
===
--- jcc3/sources/jcc.cpp (revision 1789413)
+++ jcc3/sources/jcc.cpp (working copy)
@@ -195,11 +195,11 @@

 static PyObject *t_jccenv_strhash(PyObject *self, PyObject *arg)
 {
-static const size_t hexdig = sizeof(uintmax_t) * 2;
-uintmax_t hash = (uintmax_t) PyObject_Hash(arg);
+unsigned long long hash = (unsigned long long) PyObject_Hash(arg);
+static const size_t hexdig = sizeof(hash) * 2;
 char buffer[hexdig + 1];

-sprintf(buffer, "%0*"PRIxMAX, (int) hexdig, hash);
+sprintf(buffer, "%0*llx", (int) hexdig, hash);
 return PyUnicode_FromStringAndSize(buffer, hexdig);
 }

Index: setup.py
===
--- setup.py (revision 1789413)
+++ setup.py (working copy)
@@ -158,7 +158,7 @@
 'sunos5': ['-L%(sunos5)s/jre/lib/i386' %(JDK), '-ljava',
'-L%(sunos5)s/jre/lib/i386/client' %(JDK), '-ljvm',
'-R%(sunos5)s/jre/lib/i386:%(sunos5)s/jre/lib/i386/client'
%(JDK)],
-'win32': ['/LIBPATH:%(win32)s/lib' %(JDK), 'Ws2_32.lib', 'jvm.lib'],
+'win32': ['/LIBPATH:%(win32)s/lib' %(JDK), 'Ws2_32.lib',
'jvm.lib','/DLL'],
 'mingw32': ['-L%(mingw32)s/lib' %(JDK), '-ljvm'],
 'freebsd7': ['-L%(freebsd7)s/jre/lib/i386' %(JDK), '-ljava',
'-lverify',
  '-L%(freebsd7)s/jre/lib/i386/client' %(JDK), '-ljvm',



On Thu, Mar 30, 2017 at 5:36 PM, Petrus Hyvönen 
wrote:

> Hi,
>
> I was trying the python 2.7 build and I think the line 23 in
> jcc2/__init__.py should be:
>
> from jcc.config import SHARED
>
> (instead of from jcc2.config import..)
>
> Regards
> /Petrus
>
>
> On Thu, Mar 30, 2017 at 9:10 AM, Petrus Hyvönen 
> wrote:
>
>> Hi,
>>
>> With this version of t_jccenv_strhash I can build both JCC and wrap
>> the library I'm using!
>>
>> Regards
>> /Petrus
>>
>>
>>
>>>
>>>
 static PyObject *t_jccenv_strhash(PyObject *self, PyObject *arg)
 {
unsigned long long hash = (unsigned long long) PyObject_Hash(arg);
static const size_t hexdig = sizeof(hash) * 2;
char buffer[hexdig + 1];

sprintf(buffer, "%0*llx", (int) hexdig, hash);
return PyUnicode_FromStringAndSize(buffer, hexdig);
 }

 BTW, this function should also be copied to the py2 directory, where we
 still use int although PyObject_Hash already returns long on Python

> 2.x.
>

 cu,
 Rudi



>>
>>
>> --
>> _
>> Petrus Hyvönen, Uppsala, Sweden
>> Mobile Phone/SMS:+46 73 803 19 00
>>
>
>
>
> --
> _
> Petrus Hyvönen, Uppsala, Sweden
> Mobile Phone/SMS:+46 73 803 19 00
>



-- 
_
Petrus Hyvönen, Uppsala, Sweden
Mobile Phone/SMS:+46 73 803 19 00


[jira] [Updated] (SOLR-10351) Add analyze Stream Evaluator to support streaming NLP

2017-03-30 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-10351:
--
Attachment: SOLR-10351.patch

Added a very basic test. Expanded tests still to come.

> Add analyze Stream Evaluator to support streaming NLP
> -
>
> Key: SOLR-10351
> URL: https://issues.apache.org/jira/browse/SOLR-10351
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>  Labels: NLP, Streaming
> Fix For: 6.6
>
> Attachments: SOLR-10351.patch, SOLR-10351.patch
>
>
> The *analyze* Stream Evaluator uses a Solr analyzer to return a collection of 
> tokens from a *text field*. The collection of tokens can then be streamed out 
> by  the *cartesianProduct* Streaming Expression or attached to documents as 
> multi-valued fields by the *select* Streaming Expression.
> This allows Streaming Expressions to leverage all the existing tokenizers and 
> filters and provides a place for future NLP analyzers to be added to 
> Streaming Expressions.
> Sample syntax:
> {code}
> cartesianProduct(expr, analyze(analyzerField, textField) as outfield )
> {code}
> {code}
> select(expr, analyze(analyzerField, textField) as outfield )
> {code}
> Combined with Solr's batch text processing capabilities this provides an 
> entire parallel NLP framework. Solr's batch processing capabilities are 
> described here:
> *Batch jobs, Parallel ETL and Streaming Text Transformation*
> http://joelsolr.blogspot.com/2016/10/solr-63-batch-jobs-parallel-etl-and.html






[jira] [Updated] (SOLR-10347) Remove index level boost support from "documents" section of the admin UI

2017-03-30 Thread Amrit Sarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amrit Sarkar updated SOLR-10347:

Attachment: screenshot-old-UI.png

> Remove index level boost support from "documents" section of the admin UI
> -
>
> Key: SOLR-10347
> URL: https://issues.apache.org/jira/browse/SOLR-10347
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Reporter: Tomás Fernández Löbbe
> Attachments: screenshot-new-UI.png, screenshot-old-UI.png, 
> SOLR-10347.patch
>
>
> Index-time boost is deprecated since LUCENE-6819






Re: [VOTE] Release PyLucene 6.5.0 (rc1) (now with Python 3 support)

2017-03-30 Thread Petrus Hyvönen
Hi,

I was trying the python 2.7 build and I think the line 23 in
jcc2/__init__.py should be:

from jcc.config import SHARED

(instead of from jcc2.config import..)

Regards
/Petrus


On Thu, Mar 30, 2017 at 9:10 AM, Petrus Hyvönen 
wrote:

> Hi,
>
> With this version of t_jccenv_strhash I can build both JCC and wrap the
> library I'm using!
>
> Regards
> /Petrus
>
>
>
>>
>>
>>> static PyObject *t_jccenv_strhash(PyObject *self, PyObject *arg)
>>> {
>>>unsigned long long hash = (unsigned long long) PyObject_Hash(arg);
>>>static const size_t hexdig = sizeof(hash) * 2;
>>>char buffer[hexdig + 1];
>>>
>>>sprintf(buffer, "%0*llx", (int) hexdig, hash);
>>>return PyUnicode_FromStringAndSize(buffer, hexdig);
>>> }
>>>
>>> BTW, this function should also be copied to the py2 directory, where we
>>> still use int although PyObject_Hash already returns long on Python
>>>
 2.x.

>>>
>>> cu,
>>> Rudi
>>>
>>>
>>>
>
>
> --
> _
> Petrus Hyvönen, Uppsala, Sweden
> Mobile Phone/SMS:+46 73 803 19 00
>



-- 
_
Petrus Hyvönen, Uppsala, Sweden
Mobile Phone/SMS:+46 73 803 19 00


[jira] [Updated] (SOLR-10347) Remove index level boost support from "documents" section of the admin UI

2017-03-30 Thread Amrit Sarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amrit Sarkar updated SOLR-10347:

Attachment: screenshot-new-UI.png

> Remove index level boost support from "documents" section of the admin UI
> -
>
> Key: SOLR-10347
> URL: https://issues.apache.org/jira/browse/SOLR-10347
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Reporter: Tomás Fernández Löbbe
> Attachments: screenshot-new-UI.png, SOLR-10347.patch
>
>
> Index-time boost is deprecated since LUCENE-6819






[jira] [Commented] (LUCENE-7671) Enhance UpgradeIndexMergePolicy with additional options

2017-03-30 Thread Keith Laban (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949261#comment-15949261
 ] 

Keith Laban commented on LUCENE-7671:
-

[~mikemccand] I updated PR with some extra changes:

- Fixed typo in testUpgradeWithExcplicitUpgrades
- Added usage for -include-new-segments options
- Also added -num-segments option for IndexUpgrader and usage
- Added random toggle for new options to be added in tests

Still outstanding: See my earlier 
[comment|https://issues.apache.org/jira/browse/LUCENE-7671?focusedCommentId=15925030=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15925030]
 about the failing test.



> Enhance UpgradeIndexMergePolicy with additional options
> ---
>
> Key: LUCENE-7671
> URL: https://issues.apache.org/jira/browse/LUCENE-7671
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Keith Laban
>
> Enhance UpgradeIndexMergePolicy to be a MergePolicy that can be used outside 
> the scope of the IndexUpgrader.
> The enhancement aims to allow the UpgradeIndexMergePolicy to:
> 1) Delegate normal force merges to the underlying merge policy
> 2) Enable a flag that will explicitly tell UpgradeIndexMergePolicy when it 
> should start looking for upgrades.
> 3) Allow new segments to be considered to be merged with old segments, 
> depending on underlying MergePolicy.
> 4) Be configurable for backwards compatibility such that only segments 
> needing an upgrade would be considered when merging, no explicit upgrades.
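A toy sketch of the segment-selection idea in points 3 and 4 (hypothetical names and data shapes, not Lucene's MergePolicy API): by default only old-format segments are merge candidates; a flag optionally lets new segments join the upgrade merge.

```python
CURRENT_VERSION = 7  # hypothetical "current" index format version

def segments_to_upgrade(segments, include_new=False):
    # Point 4: only segments that actually need an upgrade are candidates.
    old = [s for s in segments if s["version"] < CURRENT_VERSION]
    if include_new and old:
        # Point 3: once an upgrade is needed, allow new segments to be
        # considered for merging together with the old ones.
        return segments
    return old

segments = [{"name": "_0", "version": 6}, {"name": "_1", "version": 7}]
print([s["name"] for s in segments_to_upgrade(segments)])        # ['_0']
print([s["name"] for s in segments_to_upgrade(segments, True)])  # ['_0', '_1']
```

In the real policy this selection would feed into the wrapped MergePolicy's force-merge logic (point 1) rather than being a standalone function.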






[jira] [Updated] (LUCENE-7761) Typo in comment in ReqExclScorer

2017-03-30 Thread Pablo Pita Leira (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Pita Leira updated LUCENE-7761:
-
Fix Version/s: 5.x

> Typo in comment in ReqExclScorer
> 
>
> Key: LUCENE-7761
> URL: https://issues.apache.org/jira/browse/LUCENE-7761
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 5.x, 6.x, master (7.0)
>Reporter: Pablo Pita Leira
>Priority: Trivial
> Fix For: 5.x, 6.x, master (7.0)
>
> Attachments: LUCENE-7761.patch
>
>
> There is a typo in the last comment in ReqExclScorer. It should say: 
> "reqTwoPhaseIterator is MORE costly than exclTwoPhaseIterator, check it last"
> The patch fixes this comment.  






[jira] [Updated] (LUCENE-7761) Typo in comment in ReqExclScorer

2017-03-30 Thread Pablo Pita Leira (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Pita Leira updated LUCENE-7761:
-
Affects Version/s: 5.x

> Typo in comment in ReqExclScorer
> 
>
> Key: LUCENE-7761
> URL: https://issues.apache.org/jira/browse/LUCENE-7761
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 5.x, 6.x, master (7.0)
>Reporter: Pablo Pita Leira
>Priority: Trivial
> Fix For: 5.x, 6.x, master (7.0)
>
> Attachments: LUCENE-7761.patch
>
>
> There is a typo in the last comment in ReqExclScorer. It should say: 
> "reqTwoPhaseIterator is MORE costly than exclTwoPhaseIterator, check it last"
> The patch fixes this comment.  






Re: 6.5.1 release?

2017-03-30 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Correct, release 6.5.1 will be off branch_6_5 branch and there will be a 6.5.1 
tag (but not branch) e.g. for the 6.4.2 release we have tag 
https://github.com/apache/lucene-solr/tree/releases/lucene-solr/6.4.2

branch_6_6 will (in future) be branched off the branch_6x branch.

branch_7x will (in future) be branched off master branch.

That's my understanding anyhow and I'm slightly tempted to draw this as a 
diagram similar to what "git log --decorate --oneline --graph" outputs ...

Regards,
Christine

- Original Message -
From: dev@lucene.apache.org
To: dev@lucene.apache.org
At: 03/30/17 15:29:15

That's done off the 6.5 branch right? I am committing soon a redone
DIH example and only want it in trunk and 6.6 as the change is quite
large.

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 30 March 2017 at 06:19, Joel Bernstein  wrote:
> Hi,
>
> I would like to have a 6.5.1 release due to
> https://issues.apache.org/jira/browse/SOLR-10341.
>
> The fix for this is committed and backported. I'm traveling this week, but I
> can be the release manager for this next week.
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/





Re: 6.5.1 release?

2017-03-30 Thread Alexandre Rafalovitch
That's done off the 6.5 branch right? I am committing soon a redone
DIH example and only want it in trunk and 6.6 as the change is quite
large.

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 30 March 2017 at 06:19, Joel Bernstein  wrote:
> Hi,
>
> I would like to have a 6.5.1 release due to
> https://issues.apache.org/jira/browse/SOLR-10341.
>
> The fix for this is committed and backported. I'm traveling this week, but I
> can be the release manager for this next week.
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/




Re: Question about grouping in distribute mode

2017-03-30 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Yes, I agree. And if there are no problems with the logic, it would improve the 
performance in both cases.

From: dev@lucene.apache.org At: 03/30/17 14:59:31
To: dev@lucene.apache.org
Subject: Re: Question about grouping in distribute mode

This is also the case for non-distributed, isn’t it?  The lucene-level 
FirstPassGroupingCollector doesn’t actually record the docid of the top doc for 
each group at the moment, but I don’t think there’s any reason it couldn’t - 
it’s stored in the relevant FieldComparator.  And it would be a nice shortcut 
in GroupingSearch more generally.

Alan Woodward
www.flax.co.uk

 

On 30 Mar 2017, at 14:26, Diego Ceccarelli  wrote:
Hello, I'm currently working on Solr grouping in order to support reranking 
[1].
I have a working patch for non-distributed search, and I'm now working on the 
distributed setting.

Looking at the code, distributed grouping (top-k groups, top-n documents for 
each group) search consists of: 

GROUPING_DISTRIBUTED_FIRST 
1. given the grouping query, each shard will return the top-k groups
2. federator will merge the top-k groups and will produce the top-k groups for 
the query

GROUPING_DISTRIBUTED_SECOND
1. given the top-k groups  each shard will return its top-n documents for each 
group.
2. federator will then compute top-n documents for each group merging all the 
shards responses. 

GET_FIELDS
as usual 

My plan was to change the collector in GROUPING_DISTRIBUTED_SECOND, and return 
the top documents for each group with a new score given by the function used 
to rerank (affecting maxScore for each group, and therefore also the order of 
the groups).
Looking at the code, I then realized that TopGroups asserts that the order of 
the groups does not change, and indeed, _if the ranking function is the same, 
group order can't change after the first stage_.

My question is: if the user is interested only in the top document for each 
group (i.e., the default: group.limit = 1), do we really need 
GROUPING_DISTRIBUTED_SECOND, or could we skip it? 
Is there any reason to perform GROUPING_DISTRIBUTED_SECOND in this case, or 
could we just return the top docid together with the top groups in 
GROUPING_DISTRIBUTED_FIRST and then go directly to GET_FIELDS? 

Cheers,
Diego

[1] https://issues.apache.org/jira/browse/SOLR-8542
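The two phases described above can be sketched roughly as follows (plain Python with hypothetical data shapes, not Solr's actual collectors):

```python
def phase_one(shards, k):
    # GROUPING_DISTRIBUTED_FIRST: each shard proposes groups; the federator
    # merges them and keeps the global top-k by best document score.
    candidates = {}
    for shard in shards:
        for group, docs in shard.items():
            best = max(d["score"] for d in docs)
            candidates[group] = max(candidates.get(group, 0.0), best)
    return sorted(candidates, key=candidates.get, reverse=True)[:k]

def phase_two(shards, groups, n):
    # GROUPING_DISTRIBUTED_SECOND: for the agreed groups, merge each
    # shard's documents and keep the top-n per group.
    out = {}
    for g in groups:
        docs = [d for shard in shards for d in shard.get(g, [])]
        out[g] = sorted(docs, key=lambda d: d["score"], reverse=True)[:n]
    return out

shards = [
    {"a": [{"id": 1, "score": 0.9}], "b": [{"id": 2, "score": 0.5}]},
    {"a": [{"id": 3, "score": 0.7}], "c": [{"id": 4, "score": 0.6}]},
]
top_groups = phase_one(shards, k=2)
per_group = phase_two(shards, top_groups, n=1)
print(top_groups)  # ['a', 'c']
```

In this sketch, with n = 1 (group.limit = 1) the phase-2 answer is already determined by the per-group best document each shard saw in phase 1, which is the shortcut being proposed.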




Re: Question about grouping in distribute mode

2017-03-30 Thread Alan Woodward
This is also the case for non-distributed, isn’t it?  The lucene-level 
FirstPassGroupingCollector doesn’t actually record the docid of the top doc for 
each group at the moment, but I don’t think there’s any reason it couldn’t - 
it’s stored in the relevant FieldComparator.  And it would be a nice shortcut 
in GroupingSearch more generally.

Alan Woodward
www.flax.co.uk


> On 30 Mar 2017, at 14:26, Diego Ceccarelli  wrote:
> 
> Hello, I'm currently working on Solr grouping in order to support reranking 
> [1].  
> I've a working patch for non distribute search, and I'm now working on the 
> distribute setting. 
> 
> Looking at the code of distribute grouping (top-k groups, top-n documents for 
> each group) search consists in: 
> 
> GROUPING_DISTRIBUTED_FIRST 
> 1. given the grouping query, each shard will return the top-k groups
> 2. federator will merge the top-k groups and will produce the top-k groups 
> for the query
> 
> GROUPING_DISTRIBUTED_SECOND
> 1. given the top-k groups  each shard will return its top-n documents for 
> each group.
> 2. federator will then compute top-n documents for each group merging all the 
> shards responses. 
> 
> GET_FIELDS
> as usual 
> 
> My plan was to change the collector in GROUPING_DISTRIBUTED_SECOND, and 
> return 
> the top documents for each group with a new score given by the function used 
> to rerank
> (affecting maxScore for each group and then also the order of the groups).
> Looking at the code then I realized that TopGroups asserts that order of the 
> groups is not changing, 
> and I realized that indeed _ if the ranking function is the same, group order 
> can't change after the first stage _. 
> 
> My question is: if the user is interested only in the top document for each 
> group (i.e., the default: group.limit = 1) do we really need 
> GROUPING_DISTRIBUTED_SECOND, or could we skip it? 
> is there any reason to perform grouping distributed second in this case? or 
> we could just return the top docid together with the topgroups in 
> GROUPING_DISTRIBUTED_FIRST and then go directly to GET_FIELDS? 
> 
> Cheers,
> Diego
> 
> [1] https://issues.apache.org/jira/browse/SOLR-8542 
> 
> 



[jira] [Commented] (LUCENE-7758) EdgeNGramTokenFilter breaks highlighting by keeping end offsets of original tokens

2017-03-30 Thread Mikhail Bystryantsev (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949095#comment-15949095
 ] 

Mikhail Bystryantsev commented on LUCENE-7758:
--

{quote}Assigning offsets is the responsibility of tokenizers. Tokenfilters 
should just look at tokens and modify them, but not split them or change their 
offsets.{quote}
But there can be *only one* tokenizer, so there is no way to get tokens other 
than those produced by that single tokenizer. There is no way to customize 
without writing your own tokenizer: it is possible to combine token filters, 
but not tokenizers.

{quote}In addition, highlighting is not meant to produce "exact" explanations 
of every analysis step. It is more meant to allow highlighting whole tokens 
afterwards, so the user has an idea, which token was responsible for a 
hit.{quote}
I think this should be decided by Lucene users, not by anyone else. When you 
design your index and search behaviour, only you can decide how it should 
work, based on your project's requirements.

> EdgeNGramTokenFilter breaks highlighting by keeping end offsets of original 
> tokens
> --
>
> Key: LUCENE-7758
> URL: https://issues.apache.org/jira/browse/LUCENE-7758
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Affects Versions: 6.4.1
> Environment: elasticsearch-5.3
>Reporter: Mikhail Bystryantsev
>  Labels: EdgeNGramTokenFilter, highlighting
>
> When EdgeNGramTokenFilter produces new tokens, they inherit the end offsets 
> of their parent tokens. This behaviour is irrational and breaks highlighting: 
> the whole source token is highlighted, not the matched pattern.
> It seems a similar problem was fixed in LUCENE-3642, but end offsets were 
> broken again after LUCENE-3907.
> Some discussion was found in SOLR-7926:
> {quote}I agree this (highlighting of hits from tokens produced by
> EdgeNGramFilter) got worse with LUCENE-3907, but it's not clear how to
> fix it.
> The stacking seems more correct: all these grams are logically
> interchangeable with the original token, and were derived from it, so
> e.g. a phrase query involving them with adjacent tokens would work
> correctly.
> We could perhaps remove the token graph requirement that tokens
> leaving from the same node have the same startOffset, and arriving to
> the same node have the same endOffset. Lucene would still be able to
> index such a graph, as long as all tokens leaving a given node are
> sorted according to their startOffset. But I'm not sure if there
> would be other problems...
> Or we could maybe improve the token graph, at least for the non-edge
> NGramTokenFilter, so that the grams are linked up correctly, so that any
> path through the graph reconstructs the original characters.
> But realistically it's not possible to innovate much with token graphs
> in Lucene today because of apparently severe back compat requirements:
> e.g. LUCENE-6664, which fixes the token graph bugs in the existing
> SynonymFilter so that proximity queries work correctly when using
> search-time synonyms, is blocked because of the back compat concerns
> from LUCENE-6721.
> I'm not sure what the path forward is...{quote}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-7761) Typo in comment in ReqExclScorer

2017-03-30 Thread Pablo Pita Leira (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Pita Leira updated LUCENE-7761:
-
Attachment: LUCENE-7761.patch

> Typo in comment in ReqExclScorer
> 
>
> Key: LUCENE-7761
> URL: https://issues.apache.org/jira/browse/LUCENE-7761
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 6.x, master (7.0)
>Reporter: Pablo Pita Leira
>Priority: Trivial
> Fix For: 6.x, master (7.0)
>
> Attachments: LUCENE-7761.patch
>
>
> There is a typo in the last comment in ReqExclScorer. It should say: 
> "reqTwoPhaseIterator is MORE costly than exclTwoPhaseIterator, check it last"
> The patch fixes this comment.  






[jira] [Created] (LUCENE-7761) Typo in comment in ReqExclScorer

2017-03-30 Thread Pablo Pita Leira (JIRA)
Pablo Pita Leira created LUCENE-7761:


 Summary: Typo in comment in ReqExclScorer
 Key: LUCENE-7761
 URL: https://issues.apache.org/jira/browse/LUCENE-7761
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Affects Versions: 6.x, master (7.0)
Reporter: Pablo Pita Leira
Priority: Trivial
 Fix For: 6.x, master (7.0)


There is a typo in the last comment in ReqExclScorer. It should say: 
"reqTwoPhaseIterator is MORE costly than exclTwoPhaseIterator, check it last"
The patch fixes this comment.  






[jira] [Resolved] (LUCENE-7755) Join queries should not reference IndexReaders.

2017-03-30 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-7755.
--
   Resolution: Fixed
Fix Version/s: 6.6
   master (7.0)
   6.5.1

> Join queries should not reference IndexReaders.
> ---
>
> Key: LUCENE-7755
> URL: https://issues.apache.org/jira/browse/LUCENE-7755
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
> Fix For: 6.5.1, master (7.0), 6.6
>
> Attachments: LUCENE-7755.patch
>
>
> This is similar to LUCENE-7657 and can cause memory leaks when those queries 
> are cached.






[jira] [Commented] (LUCENE-7755) Join queries should not reference IndexReaders.

2017-03-30 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949088#comment-15949088
 ] 

ASF subversion and git services commented on LUCENE-7755:
-

Commit bd2ec8e40e83e4712062c37ed121132054409918 in lucene-solr's branch 
refs/heads/branch_6x from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=bd2ec8e ]

LUCENE-7755: Join queries should not reference IndexReaders.


> Join queries should not reference IndexReaders.
> ---
>
> Key: LUCENE-7755
> URL: https://issues.apache.org/jira/browse/LUCENE-7755
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
> Attachments: LUCENE-7755.patch
>
>
> This is similar to LUCENE-7657 and can cause memory leaks when those queries 
> are cached.






[jira] [Commented] (LUCENE-7755) Join queries should not reference IndexReaders.

2017-03-30 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949089#comment-15949089
 ] 

ASF subversion and git services commented on LUCENE-7755:
-

Commit edafcbad14482f3cd2f072fdca0c89600e72885d in lucene-solr's branch 
refs/heads/master from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=edafcba ]

LUCENE-7755: Join queries should not reference IndexReaders.


> Join queries should not reference IndexReaders.
> ---
>
> Key: LUCENE-7755
> URL: https://issues.apache.org/jira/browse/LUCENE-7755
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
> Attachments: LUCENE-7755.patch
>
>
> This is similar to LUCENE-7657 and can cause memory leaks when those queries 
> are cached.






[jira] [Commented] (LUCENE-7755) Join queries should not reference IndexReaders.

2017-03-30 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949087#comment-15949087
 ] 

ASF subversion and git services commented on LUCENE-7755:
-

Commit 3a0c2a691d4fea1670b3d071032fc54c716b5d1a in lucene-solr's branch 
refs/heads/branch_6_5 from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=3a0c2a6 ]

LUCENE-7755: Join queries should not reference IndexReaders.


> Join queries should not reference IndexReaders.
> ---
>
> Key: LUCENE-7755
> URL: https://issues.apache.org/jira/browse/LUCENE-7755
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
> Attachments: LUCENE-7755.patch
>
>
> This is similar to LUCENE-7657 and can cause memory leaks when those queries 
> are cached.






[jira] [Commented] (LUCENE-7760) StandardAnalyzer/Tokenizer.setMaxTokenLength's javadocs are lying

2017-03-30 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949082#comment-15949082
 ] 

Steve Rowe commented on LUCENE-7760:


+1

From 
[http://mail-archives.apache.org/mod_mbox/lucene-java-user/201611.mbox/%3c36fcfd77-d873-4757-9d16-e89016f16...@gmail.com%3e],
where I most recently responded to a user question about the situation; this 
should be useful as a seed for javadoc fixes:

{noformat}
The behavior you mention is an intentional change from the behavior in Lucene 
4.9.0 and earlier,
when tokens longer than maxTokenLength were silently ignored: see LUCENE-5897[1] 
and LUCENE-5400[2].

The new behavior is as follows: Token matching rules are no longer allowed to 
match against
input char sequences longer than maxTokenLength.  If a rule would match a 
sequence longer than maxTokenLength, but also matches at maxTokenLength chars 
or fewer, and has 
the highest
priority among all other rules matching at this length, and no other rule 
matches more chars,
then a token will be emitted for that rule at the matching length.  And then 
the rule-matching
iteration simply continues from that point as normal.  If the same rule matches 
against the
remainder of the sequence that the first rule would have matched if 
maxTokenLength were longer,
then another token at the matched length will be emitted, and so on. 

Note that this can result in effectively splitting the sequence at 
maxTokenLength intervals
as you noted.

You can fix the problem by setting maxTokenLength higher - this has the side 
effect of growing
the buffer and not causing unwanted token splitting.  If this results in tokens 
larger than
you would like, you can remove them with LengthFilter.

FYI there is discussion on LUCENE-5897 about separating buffer size from 
maxTokenLength, starting
here: 

- ultimately I decided that few people would benefit from the increased 
configuration complexity.

[1] https://issues.apache.org/jira/browse/LUCENE-5897
[2] https://issues.apache.org/jira/browse/LUCENE-5400
{noformat}
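The chopping behaviour described above can be illustrated with a self-contained sketch in plain Java (no Lucene dependency; a simple whitespace split stands in for the real StandardTokenizer rules):

```java
import java.util.ArrayList;
import java.util.List;

public class MaxTokenLengthDemo {
    // Simulates the post-4.9 behaviour described above: a token longer than
    // maxTokenLength is emitted in maxTokenLength-sized chunks rather than
    // being silently dropped.
    static List<String> tokenize(String text, int maxTokenLength) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.split("\\s+")) {
            for (int i = 0; i < word.length(); i += maxTokenLength) {
                tokens.add(word.substring(i, Math.min(word.length(), i + maxTokenLength)));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Mirrors the LUCENE-7760 test case below: "toolong" is chopped, not discarded.
        System.out.println(tokenize("ab cd toolong xy z", 5));
    }
}
```

Raising maxTokenLength makes the chunk size larger (so fewer tokens are split), which is the workaround described above; a LengthFilter can then drop any remaining long tokens.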

> StandardAnalyzer/Tokenizer.setMaxTokenLength's javadocs are lying
> -
>
> Key: LUCENE-7760
> URL: https://issues.apache.org/jira/browse/LUCENE-7760
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: master (7.0), 6.6
>
>
> The javadocs claim that too-long tokens are discarded, but in fact they are 
> simply chopped up.  The following test case unexpectedly passes:
> {noformat}
>   public void testMaxTokenLengthNonDefault() throws Exception {
> StandardAnalyzer a = new StandardAnalyzer();
> a.setMaxTokenLength(5);
> assertAnalyzesTo(a, "ab cd toolong xy z", new String[]{"ab", "cd", 
> "toolo", "ng", "xy", "z"});
> a.close();
>   }
> {noformat}
> We should at least fix the javadocs ...
> (I hit this because I was trying to also add {{setMaxTokenLength}} to 
> {{EnglishAnalyzer}}).






[jira] [Updated] (SOLR-10351) Add analyze Stream Evaluator to support streaming NLP

2017-03-30 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-10351:
--
Attachment: SOLR-10351.patch

> Add analyze Stream Evaluator to support streaming NLP
> -
>
> Key: SOLR-10351
> URL: https://issues.apache.org/jira/browse/SOLR-10351
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>  Labels: NLP, Streaming
> Fix For: 6.6
>
> Attachments: SOLR-10351.patch
>
>
> The *analyze* Stream Evaluator uses a Solr analyzer to return a collection of 
> tokens from a *text field*. The collection of tokens can then be streamed out 
> by  the *cartesianProduct* Streaming Expression or attached to documents as 
> multi-valued fields by the *select* Streaming Expression.
> This allows Streaming Expressions to leverage all the existing tokenizers and 
> filters and provides a place for future NLP analyzers to be added to 
> Streaming Expressions.
> Sample syntax:
> {code}
> cartesianProduct(expr, analyze(analyzerField, textField) as outfield )
> {code}
> {code}
> select(expr, analyze(analyzerField, textField) as outfield )
> {code}
> Combined with Solr's batch text processing capabilities this provides an 
> entire parallel NLP framework. Solr's batch processing capabilities are 
> described here:
> *Batch jobs, Parallel ETL and Streaming Text Transformation*
> http://joelsolr.blogspot.com/2016/10/solr-63-batch-jobs-parallel-etl-and.html
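As a rough illustration of what the analyze plus select/cartesianProduct combination does, here is a sketch with a hypothetical stand-in analyzer (lowercase plus whitespace split), not Solr's actual analyzer API:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class AnalyzeSketch {
    // Stand-in for a Solr analyzer chain: lowercase plus whitespace tokenize.
    static List<String> analyze(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\s+"));
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("text", "Streaming NLP With Solr");
        // select(...): attach the tokens to the tuple as a multi-valued field.
        doc.put("outfield", analyze((String) doc.get("text")));
        System.out.println(doc.get("outfield"));
        // cartesianProduct(...): emit one tuple per token instead.
        for (Object token : (List<?>) doc.get("outfield")) {
            System.out.println(token);
        }
    }
}
```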






[jira] [Updated] (SOLR-10351) Add analyze Stream Evaluator to support streaming NLP

2017-03-30 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-10351:
--
Attachment: (was: SOLR-10351.patch)

> Add analyze Stream Evaluator to support streaming NLP
> -
>
> Key: SOLR-10351
> URL: https://issues.apache.org/jira/browse/SOLR-10351
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>  Labels: NLP, Streaming
> Fix For: 6.6
>
> Attachments: SOLR-10351.patch
>
>
> The *analyze* Stream Evaluator uses a Solr analyzer to return a collection of 
> tokens from a *text field*. The collection of tokens can then be streamed out 
> by  the *cartesianProduct* Streaming Expression or attached to documents as 
> multi-valued fields by the *select* Streaming Expression.
> This allows Streaming Expressions to leverage all the existing tokenizers and 
> filters and provides a place for future NLP analyzers to be added to 
> Streaming Expressions.
> Sample syntax:
> {code}
> cartesianProduct(expr, analyze(analyzerField, textField) as outfield )
> {code}
> {code}
> select(expr, analyze(analyzerField, textField) as outfield )
> {code}
> Combined with Solr's batch text processing capabilities this provides an 
> entire parallel NLP framework. Solr's batch processing capabilities are 
> described here:
> *Batch jobs, Parallel ETL and Streaming Text Transformation*
> http://joelsolr.blogspot.com/2016/10/solr-63-batch-jobs-parallel-etl-and.html






[jira] [Updated] (SOLR-10351) Add analyze Stream Evaluator to support streaming NLP

2017-03-30 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-10351:
--
Attachment: SOLR-10351.patch

Patch with the basic implementation. Test to follow.

> Add analyze Stream Evaluator to support streaming NLP
> -
>
> Key: SOLR-10351
> URL: https://issues.apache.org/jira/browse/SOLR-10351
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>  Labels: NLP, Streaming
> Fix For: 6.6
>
> Attachments: SOLR-10351.patch
>
>
> The *analyze* Stream Evaluator uses a Solr analyzer to return a collection of 
> tokens from a *text field*. The collection of tokens can then be streamed out 
> by  the *cartesianProduct* Streaming Expression or attached to documents as 
> multi-valued fields by the *select* Streaming Expression.
> This allows Streaming Expressions to leverage all the existing tokenizers and 
> filters and provides a place for future NLP analyzers to be added to 
> Streaming Expressions.
> Sample syntax:
> {code}
> cartesianProduct(expr, analyze(analyzerField, textField) as outfield )
> {code}
> {code}
> select(expr, analyze(analyzerField, textField) as outfield )
> {code}
> Combined with Solr's batch text processing capabilities this provides an 
> entire parallel NLP framework. Solr's batch processing capabilities are 
> described here:
> *Batch jobs, Parallel ETL and Streaming Text Transformation*
> http://joelsolr.blogspot.com/2016/10/solr-63-batch-jobs-parallel-etl-and.html






Question about grouping in distributed mode

2017-03-30 Thread Diego Ceccarelli
Hello, I'm currently working on Solr grouping in order to support reranking
[1].
I have a working patch for non-distributed search, and I'm now working on the
distributed setting.

Looking at the code, distributed grouping (top-k groups, top-n documents
for each group) consists of:

GROUPING_DISTRIBUTED_FIRST
1. given the grouping query, each shard returns its top-k groups
2. the federator merges the shards' top-k groups and produces the top-k groups
for the query

GROUPING_DISTRIBUTED_SECOND
1. given the top-k groups, each shard returns its top-n documents for
each group.
2. the federator then computes the top-n documents for each group by merging
all the shard responses.

GET_FIELDS
as usual
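The merge in GROUPING_DISTRIBUTED_FIRST can be sketched as follows (hypothetical shard responses and method names, not Solr's actual classes):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DistributedGroupingSketch {
    // Phase 1: each shard returns its top-k groups with a max score; the
    // federator merges them (keeping the best score seen per group) and
    // selects the global top-k.
    static List<String> topKGroups(List<Map<String, Double>> shardGroups, int k) {
        Map<String, Double> merged = new HashMap<>();
        for (Map<String, Double> shard : shardGroups) {
            shard.forEach((group, score) -> merged.merge(group, score, Math::max));
        }
        return merged.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, Double>> shards = Arrays.asList(
                Map.of("groupA", 3.0, "groupB", 1.5),
                Map.of("groupB", 2.5, "groupC", 2.0));
        // The federator keeps the two best groups across both shards.
        System.out.println(topKGroups(shards, 2));
    }
}
```

If each shard also returned the top docid per group in this phase, the group.limit=1 case could in principle skip the second phase, which is the question raised below.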

My plan was to change the collector in GROUPING_DISTRIBUTED_SECOND and return
the top documents for each group with a new score given by the function used
to rerank (affecting maxScore for each group, and therefore also the order of
the groups). Looking at the code, I then realized that TopGroups asserts that
the order of the groups does not change, and indeed _if the ranking function
is the same, group order can't change after the first stage_.

My question is: if the user is interested only in the top document for each
group (i.e., the default: group.limit = 1), do we really need
GROUPING_DISTRIBUTED_SECOND, or could we skip it?
Is there any reason to perform the second grouping phase in this case, or
could we just return the top docid together with the top groups in
GROUPING_DISTRIBUTED_FIRST and then go directly to GET_FIELDS?

Cheers,
Diego

[1] https://issues.apache.org/jira/browse/SOLR-8542


[jira] [Commented] (LUCENE-7759) TestBackwardsCompatibility only tests sorted indexes created by 6.2.1, 6.2.1 and 6.3.0

2017-03-30 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949042#comment-15949042
 ] 

Michael McCandless commented on LUCENE-7759:


Sigh, +1.

> TestBackwardsCompatibility only tests sorted indexes created by 6.2.1, 6.2.1 
> and 6.3.0
> -
>
> Key: LUCENE-7759
> URL: https://issues.apache.org/jira/browse/LUCENE-7759
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>
> I think we should improve the release process to add sorted bw indices 
> whenever we do a release?






[jira] [Commented] (LUCENE-7758) EdgeNGramTokenFilter breaks highlighting by keeping end offsets of original tokens

2017-03-30 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949039#comment-15949039
 ] 

Uwe Schindler commented on LUCENE-7758:
---

bq. Moreover, I would not be surprised that highlighting the entire token is a 
desired behaviour for some users.

This is correct. Modifying offsets inside a TokenFilter is not going to be 
correct for highlighting, for the reasons you are mentioning. This is a general 
issue with all token filters that split tokens: the "famous" example is 
WordDelimiterFilter.

Assigning offsets is the responsibility of tokenizers. Tokenfilters should just 
look at tokens and modify them, but not split them or change their offsets. 

In addition, highlighting is not meant to produce "exact" explanations of every 
analysis step. It is more meant to allow highlighting whole tokens afterwards, 
so the user has an idea which token was responsible for a hit.

> EdgeNGramTokenFilter breaks highlighting by keeping end offsets of original 
> tokens
> --
>
> Key: LUCENE-7758
> URL: https://issues.apache.org/jira/browse/LUCENE-7758
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Affects Versions: 6.4.1
> Environment: elasticsearch-5.3
>Reporter: Mikhail Bystryantsev
>  Labels: EdgeNGramTokenFilter, highlighting
>
> When EdgeNGramTokenFilter produces new tokens, they inherit the end offsets 
> of their parent tokens. This behaviour is irrational and breaks highlighting: 
> the whole source token is highlighted, not the matched pattern.
> It seems a similar problem was fixed in LUCENE-3642, but end offsets were 
> broken again after LUCENE-3907.
> Some discussion was found in SOLR-7926:
> {quote}I agree this (highlighting of hits from tokens produced by
> EdgeNGramFilter) got worse with LUCENE-3907, but it's not clear how to
> fix it.
> The stacking seems more correct: all these grams are logically
> interchangeable with the original token, and were derived from it, so
> e.g. a phrase query involving them with adjacent tokens would work
> correctly.
> We could perhaps remove the token graph requirement that tokens
> leaving from the same node have the same startOffset, and arriving to
> the same node have the same endOffset. Lucene would still be able to
> index such a graph, as long as all tokens leaving a given node are
> sorted according to their startOffset. But I'm not sure if there
> would be other problems...
> Or we could maybe improve the token graph, at least for the non-edge
> NGramTokenFilter, so that the grams are linked up correctly, so that any
> path through the graph reconstructs the original characters.
> But realistically it's not possible to innovate much with token graphs
> in Lucene today because of apparently severe back compat requirements:
> e.g. LUCENE-6664, which fixes the token graph bugs in the existing
> SynonymFilter so that proximity queries work correctly when using
> search-time synonyms, is blocked because of the back compat concerns
> from LUCENE-6721.
> I'm not sure what the path forward is...{quote}






[jira] [Created] (SOLR-10393) Add UUID Stream Evaluator

2017-03-30 Thread Joel Bernstein (JIRA)
Joel Bernstein created SOLR-10393:
-

 Summary: Add UUID Stream Evaluator
 Key: SOLR-10393
 URL: https://issues.apache.org/jira/browse/SOLR-10393
 Project: Solr
  Issue Type: New Feature
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Joel Bernstein


The cartesianProduct function emits multiple tuples from a single tuple. To 
save the cartesian product in another collection it would be useful to be able 
to dynamically assign new unique ids to tuples. The uuid() stream evaluator 
will allow us to do this.

sample syntax:

{code}
cartesianProduct(expr, fielda, uuid() as id)
{code} 
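A minimal sketch of the intended behaviour (hypothetical helper names, not Solr's implementation): expand the multi-valued field into one tuple per value and attach a fresh UUID to each emitted tuple.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

public class UuidTupleSketch {
    // Expands a multi-valued field into one tuple per value and gives each
    // new tuple its own generated unique id, mimicking
    // cartesianProduct(expr, fielda, uuid() as id).
    static List<Map<String, Object>> cartesianWithUuid(Map<String, Object> tuple, String field) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Object value : (List<?>) tuple.get(field)) {
            Map<String, Object> emitted = new LinkedHashMap<>(tuple);
            emitted.put(field, value);
            emitted.put("id", UUID.randomUUID().toString());
            out.add(emitted);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> tuple = new LinkedHashMap<>();
        tuple.put("fielda", Arrays.asList("a", "b", "c"));
        // Three tuples are emitted, each with a distinct id.
        System.out.println(cartesianWithUuid(tuple, "fielda").size());
    }
}
```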






[jira] [Commented] (LUCENE-7758) EdgeNGramTokenFilter breaks highlighting by keeping end offsets of original tokens

2017-03-30 Thread Mikhail Bystryantsev (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949023#comment-15949023
 ] 

Mikhail Bystryantsev commented on LUCENE-7758:
--

{quote}I just wanted to react to your comment that the current behaviour is 
irrational{quote}
Ok, in other words: unexpected, and possibly confusing from the average user's 
point of view.

{quote}I would not be surprised that highlighting the entire token is a desired 
behaviour for some users.{quote}
I think for such cases it should be possible to tune the behaviour, for example 
with a parameter like {{inherit_offsets: true | false}}.
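Such an {{inherit_offsets}} switch could behave roughly like this sketch (plain Java with hypothetical names, not an actual Lucene API):

```java
import java.util.ArrayList;
import java.util.List;

public class EdgeNGramOffsets {
    // A token with start/end character offsets, as a highlighter sees it.
    record Token(String text, int start, int end) {}

    // Emits edge n-grams of `parent`. With inheritOffsets=true (the current
    // behaviour discussed in this issue) every gram keeps the parent's end
    // offset, so a highlighter marks the whole source word; with false, each
    // gram's end offset is trimmed to the gram length.
    static List<Token> edgeNGrams(Token parent, int minGram, int maxGram, boolean inheritOffsets) {
        List<Token> grams = new ArrayList<>();
        for (int n = minGram; n <= Math.min(maxGram, parent.text().length()); n++) {
            int end = inheritOffsets ? parent.end() : parent.start() + n;
            grams.add(new Token(parent.text().substring(0, n), parent.start(), end));
        }
        return grams;
    }

    public static void main(String[] args) {
        Token t = new Token("search", 0, 6);
        System.out.println(edgeNGrams(t, 2, 3, true));  // every gram ends at offset 6
        System.out.println(edgeNGrams(t, 2, 3, false)); // grams end at offsets 2 and 3
    }
}
```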

> EdgeNGramTokenFilter breaks highlighting by keeping end offsets of original 
> tokens
> --
>
> Key: LUCENE-7758
> URL: https://issues.apache.org/jira/browse/LUCENE-7758
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Affects Versions: 6.4.1
> Environment: elasticsearch-5.3
>Reporter: Mikhail Bystryantsev
>  Labels: EdgeNGramTokenFilter, highlighting
>
> When EdgeNGramTokenFilter produces new tokens, they inherit the end offsets 
> of their parent tokens. This behaviour is irrational and breaks highlighting: 
> the whole source token is highlighted, not the matched pattern.
> It seems a similar problem was fixed in LUCENE-3642, but end offsets were 
> broken again after LUCENE-3907.
> Some discussion was found in SOLR-7926:
> {quote}I agree this (highlighting of hits from tokens produced by
> EdgeNGramFilter) got worse with LUCENE-3907, but it's not clear how to
> fix it.
> The stacking seems more correct: all these grams are logically
> interchangeable with the original token, and were derived from it, so
> e.g. a phrase query involving them with adjacent tokens would work
> correctly.
> We could perhaps remove the token graph requirement that tokens
> leaving from the same node have the same startOffset, and arriving to
> the same node have the same endOffset. Lucene would still be able to
> index such a graph, as long as all tokens leaving a given node are
> sorted according to their startOffset. But I'm not sure if there
> would be other problems...
> Or we could maybe improve the token graph, at least for the non-edge
> NGramTokenFilter, so that the grams are linked up correctly, so that any
> path through the graph reconstructs the original characters.
> But realistically it's not possible to innovate much with token graphs
> in Lucene today because of apparently severe back compat requirements:
> e.g. LUCENE-6664, which fixes the token graph bugs in the existing
> SynonymFilter so that proximity queries work correctly when using
> search-time synonyms, is blocked because of the back compat concerns
> from LUCENE-6721.
> I'm not sure what the path forward is...{quote}






[jira] [Commented] (SOLR-10254) significantTerms Streaming Expression should work in non-SolrCloud mode

2017-03-30 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948994#comment-15948994
 ] 

ASF subversion and git services commented on SOLR-10254:


Commit 4d3e94befcb5ea361ceff1fcff1bdc3e6166fdf1 in lucene-solr's branch 
refs/heads/branch_6_5 from [~cpoerschke]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4d3e94b ]

SOLR-10046: remove CHANGES.txt entry

(Reverses unintentional add alongside SOLR-10085 and SOLR-10254 CHANGES.txt 
update.)


> significantTerms Streaming Expression should work in non-SolrCloud mode
> ---
>
> Key: SOLR-10254
> URL: https://issues.apache.org/jira/browse/SOLR-10254
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Attachments: SOLR-10254.patch
>
>







[jira] [Commented] (SOLR-10085) SQL result set fields should be ordered by the field list

2017-03-30 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948992#comment-15948992
 ] 

ASF subversion and git services commented on SOLR-10085:


Commit 4d3e94befcb5ea361ceff1fcff1bdc3e6166fdf1 in lucene-solr's branch 
refs/heads/branch_6_5 from [~cpoerschke]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4d3e94b ]

SOLR-10046: remove CHANGES.txt entry

(Reverses unintentional add alongside SOLR-10085 and SOLR-10254 CHANGES.txt 
update.)


> SQL result set fields should be ordered by the field list
> -
>
> Key: SOLR-10085
> URL: https://issues.apache.org/jira/browse/SOLR-10085
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: faceting
>Affects Versions: 6.3
> Environment: Windows 8.1, Java 8
>Reporter: Yeo Zheng Lin
>Assignee: Joel Bernstein
>  Labels: json, streaming
> Attachments: SOLR-10085.patch, SOLR-10085.patch, SOLR-10085.patch
>
>
> I'm trying out the Streaming Expressions in Solr 6.3.0. 
> Currently, I'm facing the issue of not being able to get the fields in the 
> result-set to be displayed in the same order as what I put in the query.
> For example, when I execute this query:
>  http://localhost:8983/solr/collection1/stream?expr=facet(collection1,
>   q="*:*",
>   buckets="id,cost,quantity",
>   bucketSorts="cost desc",
>   bucketSizeLimit=100,
>   sum(cost), 
>   sum(quantity),
>   min(cost), 
>   min(quantity),
>   max(cost), 
>   max(quantity),
>   avg(cost), 
>   avg(quantity),
>   count(*))=true
> I get the following in the result-set.
>{
>   "result-set":{"docs":[
>   {
> "min(quantity)":12.21,
> "avg(quantity)":12.21,
> "sum(cost)":256.33,
> "max(cost)":256.33,
> "count(*)":1,
> "min(cost)":256.33,
> "cost":256.33,
> "avg(cost)":256.33,
> "quantity":12.21,
> "id":"01",
> "sum(quantity)":12.21,
> "max(quantity)":12.21},
>   {
> "EOF":true,
> "RESPONSE_TIME":359}]}}
> The fields are displayed randomly all over the place, instead of the order 
> sum, min, max, avg as in the query. This may cause confusion to users who look 
> at the output. A possible improvement would be to display the fields in the 
> result-set in the same order as in the query.
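The usual fix for this class of problem is to serialize from an insertion-ordered map; a minimal sketch (illustrative only, not Solr's actual code):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class FieldOrderSketch {
    public static void main(String[] args) {
        // A plain HashMap iterates in effectively arbitrary order, which is
        // how fields end up "displayed randomly" in serialized output.
        // A LinkedHashMap preserves insertion order, so writing result-set
        // fields into it in the order they appear in the query keeps that
        // order when the map is serialized.
        Map<String, Object> ordered = new LinkedHashMap<>();
        for (String field : Arrays.asList("id", "cost", "quantity",
                "sum(cost)", "min(cost)", "max(cost)", "avg(cost)", "count(*)")) {
            ordered.put(field, 0.0);
        }
        System.out.println(ordered.keySet());
    }
}
```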






[jira] [Commented] (SOLR-10046) Create UninvertDocValuesMergePolicy

2017-03-30 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948991#comment-15948991
 ] 

ASF subversion and git services commented on SOLR-10046:


Commit 4d3e94befcb5ea361ceff1fcff1bdc3e6166fdf1 in lucene-solr's branch 
refs/heads/branch_6_5 from [~cpoerschke]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4d3e94b ]

SOLR-10046: remove CHANGES.txt entry

(Reverses unintentional add alongside SOLR-10085 and SOLR-10254 CHANGES.txt 
update.)


> Create UninvertDocValuesMergePolicy
> ---
>
> Key: SOLR-10046
> URL: https://issues.apache.org/jira/browse/SOLR-10046
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Keith Laban
>Assignee: Christine Poerschke
>
> Create a merge policy that can detect schema changes and use 
> UninvertingReader to uninvert fields and write docvalues into merged segments 
> when a field has docvalues enabled.
> The current behavior is to write null values in the merged segment which can 
> lead to data integrity problems when sorting or faceting pending a full 
> reindex. 
> With this patch it would still be recommended to reindex when adding 
> docvalues for performance reasons, as it is not guaranteed that all 
> segments will be merged with docvalues turned on.






Re: 6.5.1 release?

2017-03-30 Thread Joel Bernstein
Sounds good.

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Mar 30, 2017 at 1:12 PM, Adrien Grand  wrote:

> +1
>
> I'll include https://issues.apache.org/jira/browse/LUCENE-7755 as well if
> it works for you.
>
> Le jeu. 30 mars 2017 à 12:19, Joel Bernstein  a
> écrit :
>
> Hi,
>
> I would like to have a 6.5.1 release due to https://issues.apache.org/
> jira/browse/SOLR-10341.
>
> The fix for this is committed and backported. I'm traveling this week,
> but I can be the release manager next week.
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>


[jira] [Commented] (SOLR-10254) significantTerms Streaming Expression should work in non-SolrCloud mode

2017-03-30 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948987#comment-15948987
 ] 

ASF subversion and git services commented on SOLR-10254:


Commit 09373aaa0875b8ae2bb795d5dfafbdb1450546cc in lucene-solr's branch 
refs/heads/branch_6x from [~cpoerschke]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=09373aa ]

SOLR-10046: remove CHANGES.txt entry

(Reverses unintentional add alongside SOLR-10085 and SOLR-10254 CHANGES.txt 
update.)


> significantTerms Streaming Expression should work in non-SolrCloud mode
> ---
>
> Key: SOLR-10254
> URL: https://issues.apache.org/jira/browse/SOLR-10254
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Attachments: SOLR-10254.patch
>
>







[jira] [Commented] (SOLR-10085) SQL result set fields should be ordered by the field list

2017-03-30 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948986#comment-15948986
 ] 

ASF subversion and git services commented on SOLR-10085:


Commit 09373aaa0875b8ae2bb795d5dfafbdb1450546cc in lucene-solr's branch 
refs/heads/branch_6x from [~cpoerschke]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=09373aa ]

SOLR-10046: remove CHANGES.txt entry

(Reverses unintentional add alongside SOLR-10085 and SOLR-10254 CHANGES.txt 
update.)


> SQL result set fields should be ordered by the field list
> -
>
> Key: SOLR-10085
> URL: https://issues.apache.org/jira/browse/SOLR-10085
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: faceting
>Affects Versions: 6.3
> Environment: Windows 8.1, Java 8
>Reporter: Yeo Zheng Lin
>Assignee: Joel Bernstein
>  Labels: json, streaming
> Attachments: SOLR-10085.patch, SOLR-10085.patch, SOLR-10085.patch
>
>
> I'm trying out the Streaming Expressions in Solr 6.3.0. 
> Currently, I'm facing the issue of not being able to get the fields in the 
> result-set to be displayed in the same order as what I put in the query.
> For example, when I execute this query:
>  http://localhost:8983/solr/collection1/stream?expr=facet(collection1,
>   q="*:*",
>   buckets="id,cost,quantity",
>   bucketSorts="cost desc",
>   bucketSizeLimit=100,
>   sum(cost), 
>   sum(quantity),
>   min(cost), 
>   min(quantity),
>   max(cost), 
>   max(quantity),
>   avg(cost), 
>   avg(quantity),
>   count(*))&indent=true
> I get the following in the result-set.
>{
>   "result-set":{"docs":[
>   {
> "min(quantity)":12.21,
> "avg(quantity)":12.21,
> "sum(cost)":256.33,
> "max(cost)":256.33,
> "count(*)":1,
> "min(cost)":256.33,
> "cost":256.33,
> "avg(cost)":256.33,
> "quantity":12.21,
> "id":"01",
> "sum(quantity)":12.21,
> "max(quantity)":12.21},
>   {
> "EOF":true,
> "RESPONSE_TIME":359}]}}
> The fields are displayed in an arbitrary order, instead of the order 
> sum, min, max, avg given in the query. This may confuse users who look 
> at the output. A possible improvement would be to display the fields in 
> the result-set in the same order as in the query.
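The symptom above is consistent with the tuple's fields being held in a hash-ordered map before serialization. As a minimal plain-Java illustration (not Solr code; the class and method names here are hypothetical), the choice of map implementation is what decides whether fields come back in query order:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldOrderDemo {
    // Insert (field, value) pairs into the given map and return its key order.
    static List<String> keyOrder(Map<String, Object> map, String[][] pairs) {
        for (String[] p : pairs) {
            map.put(p[0], p[1]);
        }
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        String[][] pairs = {
            {"id", "01"}, {"sum(cost)", "256.33"}, {"min(cost)", "256.33"},
            {"max(cost)", "256.33"}, {"avg(cost)", "256.33"}, {"count(*)", "1"}
        };
        // HashMap iterates in hash-bucket order, so fields appear "randomly".
        System.out.println(keyOrder(new HashMap<>(), pairs));
        // LinkedHashMap preserves insertion order, matching the query's field list.
        System.out.println(keyOrder(new LinkedHashMap<>(), pairs));
    }
}
```

With an insertion-ordered backing map, the JSON writer can emit fields in the same order as the query's field list.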






[jira] [Commented] (SOLR-10046) Create UninvertDocValuesMergePolicy

2017-03-30 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948985#comment-15948985
 ] 

ASF subversion and git services commented on SOLR-10046:


Commit 09373aaa0875b8ae2bb795d5dfafbdb1450546cc in lucene-solr's branch 
refs/heads/branch_6x from [~cpoerschke]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=09373aa ]

SOLR-10046: remove CHANGES.txt entry

(Reverses unintentional add alongside SOLR-10085 and SOLR-10254 CHANGES.txt 
update.)


> Create UninvertDocValuesMergePolicy
> ---
>
> Key: SOLR-10046
> URL: https://issues.apache.org/jira/browse/SOLR-10046
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Keith Laban
>Assignee: Christine Poerschke
>
> Create a merge policy that can detect schema changes and use 
> UninvertingReader to uninvert fields and write docvalues into merged segments 
> when a field has docvalues enabled.
> The current behavior is to write null values in the merged segment which can 
> lead to data integrity problems when sorting or faceting pending a full 
> reindex. 
> With this patch it would still be recommended to reindex when adding 
> docvalues for performance reasons, as it is not guaranteed that all 
> segments will be merged with docvalues turned on.






[jira] [Updated] (SOLR-10383) NPE on debug query in SOLR UI - LTR OriginalScoreFeature

2017-03-30 Thread Christine Poerschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christine Poerschke updated SOLR-10383:
---
Attachment: SOLR-10383.patch

Attaching partial extension to 
[TestOriginalScoreFeature|https://github.com/apache/lucene-solr/blob/master/solr/contrib/ltr/src/test/org/apache/solr/ltr/feature/TestOriginalScoreFeature.java]
 test.

> NPE on debug query in SOLR UI - LTR OriginalScoreFeature
> 
>
> Key: SOLR-10383
> URL: https://issues.apache.org/jira/browse/SOLR-10383
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 6.4.2
>Reporter: Vitezslav Zak
> Attachments: SOLR-10383.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Hi,
> there is a NPE if I want to debug query in SOLR UI.
> I'm using LTR for reranking result.
> My features:
> {code}
> {
>   "initArgs":{},
>   "initializedOn":"2017-03-29T05:32:52.160Z",
>   "updatedSinceInit":"2017-03-29T05:56:28.721Z",
>   "managedList":[
> {
>   "name":"documentRecency",
>   "class":"org.apache.solr.ltr.feature.SolrFeature",
>   "params":{"q":"{!func}recip( ms(NOW,initial_release_date), 3.16e-11, 1, 
> 1)"},
>   "store":"_DEFAULT_"},
> {
>   "name":"niceness",
>   "class":"org.apache.solr.ltr.feature.SolrFeature",
>   "params":{"fq":["{!func}recip(niceness, 0.1, 1, 1)"]},
>   "store":"_DEFAULT_"},
> {
>   "name":"originalScore",
>   "class":"org.apache.solr.ltr.feature.OriginalScoreFeature",
>   "params":null,
>   "store":"_DEFAULT_"}]}
> {code}
> My model:
> {code}
> {
>   "initArgs":{},
>   "initializedOn":"2017-03-29T05:32:52.167Z",
>   "updatedSinceInit":"2017-03-29T05:54:26.100Z",
>   "managedList":[{
>   "name":"myModel",
>   "class":"org.apache.solr.ltr.model.LinearModel",
>   "store":"_DEFAULT_",
>   "features":[
> {
>   "name":"documentRecency",
>   "norm":{"class":"org.apache.solr.ltr.norm.IdentityNormalizer"}},
> {
>   "name":"niceness",
>   "norm":{"class":"org.apache.solr.ltr.norm.IdentityNormalizer"}},
> {
>   "name":"originalScore",
>   "norm":{"class":"org.apache.solr.ltr.norm.IdentityNormalizer"}}],
>   "params":{"weights":{
>   "documentRecency":0.1,
>   "niceness":1.0,
>   "originalScore":0.5}}}]}
> {code}
> NPE occurs in this method, where docInfo is null.
> {code:title=OriginalScoreFeature.java}
> @Override
>   public float score() throws IOException {
> // This is done to improve the speed of feature extraction. Since this
> // was already scored in step 1
> // we shouldn't need to calc original score again.
> final DocInfo docInfo = getDocInfo();
> return (docInfo.hasOriginalDocScore() ? docInfo.getOriginalDocScore() 
> : originalScorer.score());
>   }
> {code}
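For context, the sketch below shows one defensive variant of that method: fall back to the wrapped scorer whenever the DocInfo is absent. The nested DocInfo class is a toy stand-in for the LTR class, and this is a hypothetical illustration rather than the committed fix:

```java
public class OriginalScoreGuardDemo {
    // Toy stand-in for the LTR DocInfo; mirrors the accessors in the snippet above.
    static class DocInfo {
        private final Float originalDocScore;
        DocInfo(Float score) { this.originalDocScore = score; }
        boolean hasOriginalDocScore() { return originalDocScore != null; }
        float getOriginalDocScore() { return originalDocScore; }
    }

    // Guarded score(): a null DocInfo (the debug-query case) no longer throws an
    // NPE; the score is simply recomputed via the wrapped scorer's value.
    static float score(DocInfo docInfo, float originalScorerScore) {
        if (docInfo == null || !docInfo.hasOriginalDocScore()) {
            return originalScorerScore;
        }
        return docInfo.getOriginalDocScore();
    }

    public static void main(String[] args) {
        System.out.println(score(null, 1.5f));               // falls back to 1.5
        System.out.println(score(new DocInfo(2.0f), 1.5f));  // uses cached 2.0
    }
}
```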






[jira] [Commented] (LUCENE-7758) EdgeNGramTokenFilter breaks highlighting by keeping end offsets of original tokens

2017-03-30 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948956#comment-15948956
 ] 

Adrien Grand commented on LUCENE-7758:
--

I agree with what you are saying; I would like to have the ability to modify 
offsets in the cases where it makes sense too. I just wanted to react to your 
comment that the current behaviour is irrational. Moreover, I would not be 
surprised if highlighting the entire token is a desired behaviour for some 
users.

I haven't thought about it much, but it feels to me that if we want to change 
offsets safely we would need a way to annotate token streams so that we know 
whether the content of the CharTermAttribute matches the original text between 
the offsets stored in the OffsetAttribute.

> EdgeNGramTokenFilter breaks highlighting by keeping end offsets of original 
> tokens
> --
>
> Key: LUCENE-7758
> URL: https://issues.apache.org/jira/browse/LUCENE-7758
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Affects Versions: 6.4.1
> Environment: elasticsearch-5.3
>Reporter: Mikhail Bystryantsev
>  Labels: EdgeNGramTokenFilter, highlighting
>
> When EdgeNGramTokenFilter produces new tokens, they inherit end offsets 
> from their parent tokens. This behaviour is irrational and breaks 
> highlighting: the whole source token is highlighted rather than just the 
> matched pattern.
> It seems a similar problem was fixed in LUCENE-3642, but end offsets were 
> broken again after LUCENE-3907.
> Some discussion was found in SOLR-7926:
> {quote}I agree this (highlighting of hits from tokens produced by
> EdgeNGramFilter) got worse with LUCENE-3907, but it's not clear how to
> fix it.
> The stacking seems more correct: all these grams are logically
> interchangeable with the original token, and were derived from it, so
> e.g. a phrase query involving them with adjacent tokens would work
> correctly.
> We could perhaps remove the token graph requirement that tokens
> leaving from the same node have the same startOffset, and arriving to
> the same node have the same endOffset. Lucene would still be able to
> index such a graph, as long as all tokens leaving a given node are
> sorted according to their startOffset. But I'm not sure if there
> would be other problems...
> Or we could maybe improve the token graph, at least for the non-edge
> NGramTokenFilter, so that the grams are linked up correctly, so that any
> path through the graph reconstructs the original characters.
> But realistically it's not possible to innovate much with token graphs
> in Lucene today because of apparently severe back compat requirements:
> e.g. LUCENE-6664, which fixes the token graph bugs in the existing
> SynonymFilter so that proximity queries work correctly when using
> search-time synonyms, is blocked because of the back compat concerns
> from LUCENE-6721.
> I'm not sure what the path forward is...{quote}






Re: 6.5.1 release?

2017-03-30 Thread Adrien Grand
+1

I'll include https://issues.apache.org/jira/browse/LUCENE-7755 as well if
it works for you.

Le jeu. 30 mars 2017 à 12:19, Joel Bernstein  a écrit :

Hi,

I would like to have a 6.5.1 release due to
https://issues.apache.org/jira/browse/SOLR-10341.

The fix for this is committed and backported. I'm traveling this week, but
I can be the release manager next week.


Joel Bernstein
http://joelsolr.blogspot.com/


[jira] [Updated] (SOLR-9530) Add an Atomic Update Processor

2017-03-30 Thread Amrit Sarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amrit Sarkar updated SOLR-9530:
---
Attachment: SOLR-9530.patch

Updated patch as per Noble's suggestions.

> Add an Atomic Update Processor 
> ---
>
> Key: SOLR-9530
> URL: https://issues.apache.org/jira/browse/SOLR-9530
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Varun Thacker
> Attachments: SOLR-9530.patch, SOLR-9530.patch, SOLR-9530.patch, 
> SOLR-9530.patch, SOLR-9530.patch, SOLR-9530.patch, SOLR-9530.patch, 
> SOLR-9530.patch
>
>
> I'd like to explore the idea of adding a new update processor to help ingest 
> partial updates.
> Example use-case - There are two datasets with a common id field. How can I 
> merge both of them at index time?
> Proposed Solution: 
> {code}
> <updateRequestProcessorChain name="atomic">
>   <processor class="solr.AtomicUpdateProcessorFactory">
>     <str name="operation">add</str>
>   </processor>
>   <processor class="solr.DistributedUpdateProcessorFactory"/>
>   <processor class="solr.RunUpdateProcessorFactory"/>
> </updateRequestProcessorChain>
> {code}
> So the first JSON dump could be ingested against 
> {{http://localhost:8983/solr/gettingstarted/update/json}}
> And then the second JSON could be ingested against
> {{http://localhost:8983/solr/gettingstarted/update/json?processor=atomic}}
> The Atomic Update Processor could support all the atomic update operations 
> currently supported.
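As a toy model of what such a processor would effectively do (plain Java, not Solr internals; the operation subset and method names are made up for illustration), each incoming partial document is merged into the stored document keyed by id:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AtomicMergeDemo {
    // Apply a partial update to an existing document, honouring "set" and "add"
    // operations (a tiny subset of Solr's atomic-update vocabulary).
    static Map<String, Object> merge(Map<String, Object> existing,
                                     Map<String, ? extends Map<String, ?>> partial) {
        Map<String, Object> out = new LinkedHashMap<>(existing);
        partial.forEach((field, op) -> {
            if (op.containsKey("set")) {
                out.put(field, op.get("set"));           // replace the value
            } else if (op.containsKey("add")) {
                List<Object> vals = new ArrayList<>();   // append to multi-valued field
                Object cur = out.get(field);
                if (cur instanceof List) {
                    vals.addAll((List<?>) cur);
                } else if (cur != null) {
                    vals.add(cur);
                }
                vals.add(op.get("add"));
                out.put(field, vals);
            }
        });
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> doc =
            new LinkedHashMap<>(Map.of("id", "1", "title", "first"));
        Map<String, Map<String, String>> update =
            Map.of("title", Map.of("set", "updated"), "tags", Map.of("add", "new"));
        System.out.println(merge(doc, update));
    }
}
```

The processor would perform this merge against the indexed document matching the common id field, so two JSON dumps ingested one after the other end up combined.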






[jira] [Comment Edited] (SOLR-10383) NPE on debug query in SOLR UI - LTR OriginalScoreFeature

2017-03-30 Thread Vitezslav Zak (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948899#comment-15948899
 ] 

Vitezslav Zak edited comment on SOLR-10383 at 3/30/17 11:42 AM:


Hi Christine,

I tried to run the test scenario as you described: I ran the techproducts 
example and added the features and model with these curl commands.
{code}
curl -XPUT 'http://localhost:8983/solr/techproducts/schema/feature-store' 
--data-binary "@/home/zaky/prac/myFeatures.json" -H 
'Content-type:application/json'
curl -XPUT 'http://localhost:8983/solr/techproducts/schema/model-store' 
--data-binary "@/home/zaky/prac/myModel.json" -H 'Content-type:application/json'
{code}

Features:
{code}
[
  {
"name" : "originalScore",
"class" : "org.apache.solr.ltr.feature.OriginalScoreFeature",
"params" : {}
  }
]
{code}

Model:
{code}
{
  "class" : "org.apache.solr.ltr.model.LinearModel",
  "name" : "myModel",
  "features" : [
{ "name" : "originalScore" }
  ],
  "params" : {
"weights" : {
  "originalScore" : 1.0
}
  }
}
{code}

Then I went to the Solr UI (core/Query) and ran a query with debugQuery 
enabled; at the end of the results there is an error.

I also ran this URL, and it shows the error at the end as well:
[http://localhost:8983/solr/techproducts/select?debugQuery=on&indent=on&q=*:*&rq={!ltr%20model=myModel%20reRankDocs=100}&wt=json]



was (Author: zaky.vit):
Hi Christine,

I try run test scenario as you said. I run example techproducts. I add features 
and model by curl commands.
{code}
curl -XPUT 'http://localhost:8983/solr/techproducts/schema/feature-store' 
--data-binary "@/home/zaky/prac/myFeatures.json" -H 
'Content-type:application/json'
curl -XPUT 'http://localhost:8983/solr/techproducts/schema/model-store' 
--data-binary "@/home/zaky/prac/myModel.json" -H 'Content-type:application/json'
{code}

Features:
{code}
[
  {
"name" : "originalScore",
"class" : "org.apache.solr.ltr.feature.OriginalScoreFeature",
"params" : {}
  }
]
{code}

Model:
{code}
{
  "class" : "org.apache.solr.ltr.model.LinearModel",
  "name" : "myModel",
  "features" : [
{ "name" : "originalScore" }
  ],
  "params" : {
"weights" : {
  "originalScore" : 1.0
}
  }
}
{code}

Then I go to the Solr UI/core/Query - run a query with debugQuery And at the 
end of results, there is a error.

I ran this url too and there is at the end error too:
http://localhost:8983/solr/techproducts/select?debugQuery=on=on=*:*={!ltr%20model=myModel%20reRankDocs=100}=json


> NPE on debug query in SOLR UI - LTR OriginalScoreFeature
> 
>
> Key: SOLR-10383
> URL: https://issues.apache.org/jira/browse/SOLR-10383
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 6.4.2
>Reporter: Vitezslav Zak
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Hi,
> there is a NPE if I want to debug query in SOLR UI.
> I'm using LTR for reranking result.
> My features:
> {code}
> {
>   "initArgs":{},
>   "initializedOn":"2017-03-29T05:32:52.160Z",
>   "updatedSinceInit":"2017-03-29T05:56:28.721Z",
>   "managedList":[
> {
>   "name":"documentRecency",
>   "class":"org.apache.solr.ltr.feature.SolrFeature",
>   "params":{"q":"{!func}recip( ms(NOW,initial_release_date), 3.16e-11, 1, 
> 1)"},
>   "store":"_DEFAULT_"},
> {
>   "name":"niceness",
>   "class":"org.apache.solr.ltr.feature.SolrFeature",
>   "params":{"fq":["{!func}recip(niceness, 0.1, 1, 1)"]},
>   "store":"_DEFAULT_"},
> {
>   "name":"originalScore",
>   "class":"org.apache.solr.ltr.feature.OriginalScoreFeature",
>   "params":null,
>   "store":"_DEFAULT_"}]}
> {code}
> My model:
> {code}
> {
>   "initArgs":{},
>   "initializedOn":"2017-03-29T05:32:52.167Z",
>   "updatedSinceInit":"2017-03-29T05:54:26.100Z",
>   "managedList":[{
>   "name":"myModel",
>   "class":"org.apache.solr.ltr.model.LinearModel",
>   "store":"_DEFAULT_",
>   "features":[
> {
>   "name":"documentRecency",
>   "norm":{"class":"org.apache.solr.ltr.norm.IdentityNormalizer"}},
> {
>   "name":"niceness",
>   "norm":{"class":"org.apache.solr.ltr.norm.IdentityNormalizer"}},
> {
>   "name":"originalScore",
>   "norm":{"class":"org.apache.solr.ltr.norm.IdentityNormalizer"}}],
>   "params":{"weights":{
>   "documentRecency":0.1,
>   "niceness":1.0,
>   "originalScore":0.5}}}]}
> {code}
> NPE occurs in this method, where docInfo is null.
> {code:title=OriginalScoreFeature.java}
> @Override
>   public float score() throws IOException {
> // This is done to improve the speed of feature extraction. Since this
> // was already scored in step 1
> // we shouldn't need to calc original 

[jira] [Commented] (SOLR-10383) NPE on debug query in SOLR UI - LTR OriginalScoreFeature

2017-03-30 Thread Vitezslav Zak (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948899#comment-15948899
 ] 

Vitezslav Zak commented on SOLR-10383:
--

Hi Christine,

I tried to run the test scenario as you described: I ran the techproducts 
example and added the features and model with these curl commands.
{code}
curl -XPUT 'http://localhost:8983/solr/techproducts/schema/feature-store' 
--data-binary "@/home/zaky/prac/myFeatures.json" -H 
'Content-type:application/json'
curl -XPUT 'http://localhost:8983/solr/techproducts/schema/model-store' 
--data-binary "@/home/zaky/prac/myModel.json" -H 'Content-type:application/json'
{code}

Features:
{code}
[
  {
"name" : "originalScore",
"class" : "org.apache.solr.ltr.feature.OriginalScoreFeature",
"params" : {}
  }
]
{code}

Model:
{code}
{
  "class" : "org.apache.solr.ltr.model.LinearModel",
  "name" : "myModel",
  "features" : [
{ "name" : "originalScore" }
  ],
  "params" : {
"weights" : {
  "originalScore" : 1.0
}
  }
}
{code}

Then I went to the Solr UI (core/Query) and ran a query with debugQuery 
enabled; at the end of the results there is an error.

I also ran this URL, and it shows the error at the end as well:
http://localhost:8983/solr/techproducts/select?debugQuery=on&indent=on&q=*:*&rq={!ltr%20model=myModel%20reRankDocs=100}&wt=json


> NPE on debug query in SOLR UI - LTR OriginalScoreFeature
> 
>
> Key: SOLR-10383
> URL: https://issues.apache.org/jira/browse/SOLR-10383
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 6.4.2
>Reporter: Vitezslav Zak
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Hi,
> there is a NPE if I want to debug query in SOLR UI.
> I'm using LTR for reranking result.
> My features:
> {code}
> {
>   "initArgs":{},
>   "initializedOn":"2017-03-29T05:32:52.160Z",
>   "updatedSinceInit":"2017-03-29T05:56:28.721Z",
>   "managedList":[
> {
>   "name":"documentRecency",
>   "class":"org.apache.solr.ltr.feature.SolrFeature",
>   "params":{"q":"{!func}recip( ms(NOW,initial_release_date), 3.16e-11, 1, 
> 1)"},
>   "store":"_DEFAULT_"},
> {
>   "name":"niceness",
>   "class":"org.apache.solr.ltr.feature.SolrFeature",
>   "params":{"fq":["{!func}recip(niceness, 0.1, 1, 1)"]},
>   "store":"_DEFAULT_"},
> {
>   "name":"originalScore",
>   "class":"org.apache.solr.ltr.feature.OriginalScoreFeature",
>   "params":null,
>   "store":"_DEFAULT_"}]}
> {code}
> My model:
> {code}
> {
>   "initArgs":{},
>   "initializedOn":"2017-03-29T05:32:52.167Z",
>   "updatedSinceInit":"2017-03-29T05:54:26.100Z",
>   "managedList":[{
>   "name":"myModel",
>   "class":"org.apache.solr.ltr.model.LinearModel",
>   "store":"_DEFAULT_",
>   "features":[
> {
>   "name":"documentRecency",
>   "norm":{"class":"org.apache.solr.ltr.norm.IdentityNormalizer"}},
> {
>   "name":"niceness",
>   "norm":{"class":"org.apache.solr.ltr.norm.IdentityNormalizer"}},
> {
>   "name":"originalScore",
>   "norm":{"class":"org.apache.solr.ltr.norm.IdentityNormalizer"}}],
>   "params":{"weights":{
>   "documentRecency":0.1,
>   "niceness":1.0,
>   "originalScore":0.5}}}]}
> {code}
> NPE occurs in this method, where docInfo is null.
> {code:title=OriginalScoreFeature.java}
> @Override
>   public float score() throws IOException {
> // This is done to improve the speed of feature extraction. Since this
> // was already scored in step 1
> // we shouldn't need to calc original score again.
> final DocInfo docInfo = getDocInfo();
> return (docInfo.hasOriginalDocScore() ? docInfo.getOriginalDocScore() 
> : originalScorer.score());
>   }
> {code}






[jira] [Comment Edited] (LUCENE-7758) EdgeNGramTokenFilter breaks highlighting by keeping end offsets of original tokens

2017-03-30 Thread Mikhail Bystryantsev (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948882#comment-15948882
 ] 

Mikhail Bystryantsev edited comment on LUCENE-7758 at 3/30/17 11:30 AM:


{quote}Now imagine that someone is applying edge n-grams on top of synonyms, 
this could generate broken offsets (going backwards for instance) so keeping 
the original offsets is the only safe move{quote}
But why should one feature break another? I don't use synonyms or anything 
like that, yet I have no way to use the token filter with proper offsets.

{quote}A workaround to this issue is to use the (edge) n-gram tokenizers (as 
opposed to filters){quote}
Such a workaround is only applicable when the input text can simply be split 
on specified characters. In my case I want to use {{icu_tokenizer}} before 
{{edge_ngram}} to split properly by words; consider Japanese, for example.


was (Author: mbystryantsev):
{quote}Now imagine that someone is applying edge n-grams on top of synonyms, 
this could generate broken offsets (going backwards for instance) so keeping 
the original offsets is the only safe move{quote}
But why one feature should break another? I don't use synonyms or something 
like that, but I have no possibility to use token filter with properly offsets.

{quote}A workaround to this issue is to use the (edge) n-gram tokenizers (as 
opposed to filters){quote}
Such workaround applicable only to cases when input text can be simple splitted 
on specified characters. In my case I want to use `icu_tokenizer` before 
`edge_ngram` for properly split by words. For example, imagine japan language.

> EdgeNGramTokenFilter breaks highlighting by keeping end offsets of original 
> tokens
> --
>
> Key: LUCENE-7758
> URL: https://issues.apache.org/jira/browse/LUCENE-7758
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Affects Versions: 6.4.1
> Environment: elasticsearch-5.3
>Reporter: Mikhail Bystryantsev
>  Labels: EdgeNGramTokenFilter, highlighting
>
> When EdgeNGramTokenFilter produces new tokens, they inherit end offsets 
> from their parent tokens. This behaviour is irrational and breaks 
> highlighting: the whole source token is highlighted rather than just the 
> matched pattern.
> It seems a similar problem was fixed in LUCENE-3642, but end offsets were 
> broken again after LUCENE-3907.
> Some discussion was found in SOLR-7926:
> {quote}I agree this (highlighting of hits from tokens produced by
> EdgeNGramFilter) got worse with LUCENE-3907, but it's not clear how to
> fix it.
> The stacking seems more correct: all these grams are logically
> interchangeable with the original token, and were derived from it, so
> e.g. a phrase query involving them with adjacent tokens would work
> correctly.
> We could perhaps remove the token graph requirement that tokens
> leaving from the same node have the same startOffset, and arriving to
> the same node have the same endOffset. Lucene would still be able to
> index such a graph, as long as all tokens leaving a given node are
> sorted according to their startOffset. But I'm not sure if there
> would be other problems...
> Or we could maybe improve the token graph, at least for the non-edge
> NGramTokenFilter, so that the grams are linked up correctly, so that any
> path through the graph reconstructs the original characters.
> But realistically it's not possible to innovate much with token graphs
> in Lucene today because of apparently severe back compat requirements:
> e.g. LUCENE-6664, which fixes the token graph bugs in the existing
> SynonymFilter so that proximity queries work correctly when using
> search-time synonyms, is blocked because of the back compat concerns
> from LUCENE-6721.
> I'm not sure what the path forward is...{quote}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7758) EdgeNGramTokenFilter breaks highlighting by keeping end offsets of original tokens

2017-03-30 Thread Mikhail Bystryantsev (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948882#comment-15948882
 ] 

Mikhail Bystryantsev commented on LUCENE-7758:
--

{quote}Now imagine that someone is applying edge n-grams on top of synonyms, 
this could generate broken offsets (going backwards for instance) so keeping 
the original offsets is the only safe move{quote}
But why should one feature break another? I don't use synonyms or anything 
like that, yet I have no way to use the token filter with proper offsets.

{quote}A workaround to this issue is to use the (edge) n-gram tokenizers (as 
opposed to filters){quote}
Such a workaround is applicable only when the input text can simply be split 
on specified characters. In my case I want to use `icu_tokenizer` before 
`edge_ngram` to split the text into words properly; consider Japanese, for example.
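To make the reported behaviour concrete, here is a minimal, self-contained sketch in plain Python (not Lucene's actual implementation) of how an edge n-gram filter that keeps the parent token's end offset confuses a highlighter: any gram's offsets span the entire original token.

```python
# Simulation of the offset behaviour described above: each edge n-gram
# inherits the ORIGINAL token's (start, end) offsets instead of trimming
# the end offset to start + len(gram).
def edge_ngrams_keeping_end_offset(token, start, end, min_gram=1, max_gram=3):
    """Mimics the reported EdgeNGramTokenFilter behaviour: grams keep
    the parent token's full offset range."""
    return [(token[:n], start, end)  # end offset is NOT trimmed
            for n in range(min_gram, min(max_gram, len(token)) + 1)]

text = "tokyo"
grams = edge_ngrams_keeping_end_offset(text, 0, len(text))
for gram, s, e in grams:
    # A highlighter that trusts offsets marks text[s:e] -- the whole word,
    # even though only `gram` actually matched.
    print(f"matched {gram!r}, but highlighted {text[s:e]!r}")
```

With trimmed end offsets (`start + len(gram)`), the highlighter would mark only the matched prefix, which is the behaviour the reporter is asking for.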




[jira] [Updated] (SOLR-10347) Remove index level boost support from "documents" section of the admin UI

2017-03-30 Thread Amrit Sarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amrit Sarkar updated SOLR-10347:

Attachment: SOLR-10347.patch

> Remove index level boost support from "documents" section of the admin UI
> -
>
> Key: SOLR-10347
> URL: https://issues.apache.org/jira/browse/SOLR-10347
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Reporter: Tomás Fernández Löbbe
> Attachments: SOLR-10347.patch
>
>
> Index-time boost is deprecated since LUCENE-6819






[jira] [Commented] (SOLR-10347) Remove index level boost support from "documents" section of the admin UI

2017-03-30 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948872#comment-15948872
 ] 

Amrit Sarkar commented on SOLR-10347:
-

[~arafalov] Point noted.

SOLR-10347.patch uploaded. Very trivial stuff: commented out the relevant 
lines and object declarations in the HTML and JS files, in both the new 
Angular UI and the original/classic one.




[jira] [Updated] (SOLR-10351) Add analyze Stream Evaluator to support streaming NLP

2017-03-30 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-10351:
--
Labels: NLP Streaming  (was: )

> Add analyze Stream Evaluator to support streaming NLP
> -
>
> Key: SOLR-10351
> URL: https://issues.apache.org/jira/browse/SOLR-10351
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>  Labels: NLP, Streaming
> Fix For: 6.6
>
>
> The *analyze* Stream Evaluator uses a Solr analyzer to return a collection of 
> tokens from a *text field*. The collection of tokens can then be streamed out 
> by the *cartesianProduct* Streaming Expression or attached to documents as 
> multi-valued fields by the *select* Streaming Expression.
> This allows Streaming Expressions to leverage all the existing tokenizers and 
> filters and provides a place for future NLP analyzers to be added to 
> Streaming Expressions.
> Sample syntax:
> {code}
> cartesianProduct(expr, analyze(analyzerField, textField) as outfield )
> {code}
> {code}
> select(expr, analyze(analyzerField, textField) as outfield )
> {code}
> Combined with Solr's batch text processing capabilities, this provides an 
> entire parallel NLP framework. Solr's batch processing capabilities are 
> described here:
> *Batch jobs, Parallel ETL and Streaming Text Transformation*
> http://joelsolr.blogspot.com/2016/10/solr-63-batch-jobs-parallel-etl-and.html
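The intended semantics of the two expressions above can be sketched in plain Python (hypothetical stand-ins, not the Solr implementation): `analyze` maps a text field to a token list, `cartesianProduct` emits one tuple per token, and `select` attaches the token list as a multi-valued field.

```python
# Plain-Python sketch of the described semantics; the analyzer here is a
# trivial lowercase + whitespace stand-in for a real Solr analyzer chain.
def analyze(text):
    return text.lower().split()

def cartesian_product(docs, field, out_field):
    # One output tuple per token of the analyzed field.
    for doc in docs:
        for token in analyze(doc[field]):
            yield {**doc, out_field: token}

def select(docs, field, out_field):
    # Attach the whole token list as a multi-valued field.
    for doc in docs:
        yield {**doc, out_field: analyze(doc[field])}

docs = [{"id": 1, "text": "Streaming NLP in Solr"}]
print(list(cartesian_product(docs, "text", "outfield")))  # 4 tuples
print(list(select(docs, "text", "outfield")))             # 1 doc, list field
```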






[jira] [Updated] (SOLR-10351) Add analyze Stream Evaluator to support streaming NLP

2017-03-30 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-10351:
--
Fix Version/s: 6.6




[jira] [Assigned] (SOLR-10351) Add analyze Stream Evaluator to support streaming NLP

2017-03-30 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein reassigned SOLR-10351:
-

Assignee: Joel Bernstein



