[jira] [Updated] (SOLR-9518) Kerberos Delegation Tokens doesn't work without a chrooted ZK

2016-09-25 Thread Ishan Chattopadhyaya (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chattopadhyaya updated SOLR-9518:
---
Description: 
Starting up Solr 6.2.0 (with delegation tokens enabled) that doesn't have a 
chrooted ZK, I see the following in the startup logs:

{code}
2016-09-15 07:08:22.453 ERROR (main) [   ] o.a.s.s.SolrDispatchFilter Could not 
start Solr. Check solr/home property and the logs
2016-09-15 07:08:22.477 ERROR (main) [   ] o.a.s.c.SolrCore 
null:java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(String.java:1927)
at org.apache.solr.security.KerberosPlugin.init(KerberosPlugin.java:138)
at 
org.apache.solr.core.CoreContainer.initializeAuthenticationPlugin(CoreContainer.java:316)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:442)
at 
org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:158)
at 
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:134)
at 
org.eclipse.jetty.servlet.FilterHolder.initialize(FilterHolder.java:137)
at 
org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:856)
at 
org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:348)
at 
org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1379)
at 
org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1341)
at 
org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:772)
at 
org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:261)
at 
org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:517)
at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at 
org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:41)
at 
org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:188)
at 
org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:499)
at 
org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:147)
at 
org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:180)
at 
org.eclipse.jetty.deploy.providers.WebAppProvider.fileAdded(WebAppProvider.java:458)
at 
org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(ScanningAppProvider.java:64)
at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:610)
at org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:529)
{code}

To me, it seems that adding a check for the presence of a chrooted ZK, and 
calculating the relative ZK path only if one exists, should suffice. I'll add a 
patch for this shortly.
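
For reference, a minimal sketch of the kind of guard being proposed (variable 
names are illustrative, not the actual patch):

{code}
// Hypothetical sketch; the real fix belongs in KerberosPlugin.init().
// zkHost is the ZK connect string, e.g. "host1:2181,host2:2181/solr"
// (chrooted) or "host1:2181" (no chroot).
String zkHost = System.getProperty("zkHost");
int chrootStart = zkHost.indexOf("/");
// Only take the substring when a chroot is present; substring(-1) on a
// chroot-less connect string is what throws the
// StringIndexOutOfBoundsException above.
String zkChroot = (chrootStart == -1) ? "" : zkHost.substring(chrootStart);
{code}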

  was:
Starting up Solr 6.2.0 that doesn't have a chrooted ZK, I see the following in 
the startup logs:

{code}
2016-09-15 07:08:22.453 ERROR (main) [   ] o.a.s.s.SolrDispatchFilter Could not 
start Solr. Check solr/home property and the logs
2016-09-15 07:08:22.477 ERROR (main) [   ] o.a.s.c.SolrCore 
null:java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(String.java:1927)
at org.apache.solr.security.KerberosPlugin.init(KerberosPlugin.java:138)
at 
org.apache.solr.core.CoreContainer.initializeAuthenticationPlugin(CoreContainer.java:316)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:442)
at 
org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:158)
at 
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:134)
at 
org.eclipse.jetty.servlet.FilterHolder.initialize(FilterHolder.java:137)
at 
org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:856)
at 
org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:348)
at 
org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1379)
at 
org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1341)
at 
org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:772)
at 
org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:261)
at 
org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:517)
at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at 
org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:41)
at 
org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:188)
at 
org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:499)

[jira] [Commented] (SOLR-9518) Kerberos Delegation Tokens doesn't work without a chrooted ZK

2016-09-25 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15522053#comment-15522053
 ] 

Ishan Chattopadhyaya commented on SOLR-9518:


[~noble.paul], can you please review?

> Kerberos Delegation Tokens doesn't work without a chrooted ZK
> -
>
> Key: SOLR-9518
> URL: https://issues.apache.org/jira/browse/SOLR-9518
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
> Attachments: SOLR-9518.patch, SOLR-9518.patch
>
>
> Starting up Solr 6.2.0 that doesn't have a chrooted ZK, I see the following 
> in the startup logs:
> {code}
> 2016-09-15 07:08:22.453 ERROR (main) [   ] o.a.s.s.SolrDispatchFilter Could 
> not start Solr. Check solr/home property and the logs
> 2016-09-15 07:08:22.477 ERROR (main) [   ] o.a.s.c.SolrCore 
> null:java.lang.StringIndexOutOfBoundsException: String index out of range: -1
> at java.lang.String.substring(String.java:1927)
> at 
> org.apache.solr.security.KerberosPlugin.init(KerberosPlugin.java:138)
> at 
> org.apache.solr.core.CoreContainer.initializeAuthenticationPlugin(CoreContainer.java:316)
> at org.apache.solr.core.CoreContainer.load(CoreContainer.java:442)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:158)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:134)
> at 
> org.eclipse.jetty.servlet.FilterHolder.initialize(FilterHolder.java:137)
> at 
> org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:856)
> at 
> org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:348)
> at 
> org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1379)
> at 
> org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1341)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:772)
> at 
> org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:261)
> at 
> org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:517)
> at 
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
> at 
> org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:41)
> at 
> org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:188)
> at 
> org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:499)
> at 
> org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:147)
> at 
> org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:180)
> at 
> org.eclipse.jetty.deploy.providers.WebAppProvider.fileAdded(WebAppProvider.java:458)
> at 
> org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(ScanningAppProvider.java:64)
> at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:610)
> at org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:529)
> {code}
> To me, it seems that adding a check for the presence of a chrooted ZK, and 
> calculating the relative ZK path only if one exists, should suffice. I'll add 
> a patch for this shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7460) Should SortedNumericDocValues expose a per-document random-access API?

2016-09-25 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15522041#comment-15522041
 ] 

Adrien Grand commented on LUCENE-7460:
--

Sorted numerics are a bit hard for me to reason about since I am not very 
clear about the use-cases, but I guess that in some cases one would want to use 
the minimum value when sorting in ascending order and the maximum value when 
sorting in descending order, so having fast access to the maximum value too 
feels like an important feature. Of course users can index the min/max values 
directly, but I think there is also some value in flexibility, e.g. we do not 
require users to index edge n-grams to run prefix queries.

That said, I do not feel too strongly about it; I mostly wanted to give some 
visibility to this change to our doc values API and discuss it. If you feel 
strongly about keeping the iterator API, I'm good with it.
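
As a concrete illustration, a selector built on a 6.x-style random-access API 
(the {{setDocument}}/{{count}}/{{valueAt}} methods quoted in the issue below) 
could pick either end of a document's sorted values in constant time; a sketch 
only, where {{in}} stands in for that interface:

{code}
// Sketch: ascending sort wants the minimum (index 0), descending sort
// wants the maximum (index count-1). Constant time either way.
long selectForSort(int docID, boolean reverse) {
  in.setDocument(docID);
  final int count = in.count();
  if (count == 0) {
    return 0; // missing
  }
  return reverse ? in.valueAt(count - 1) : in.valueAt(0);
}
{code}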

> Should SortedNumericDocValues expose a per-document random-access API?
> --
>
> Key: LUCENE-7460
> URL: https://issues.apache.org/jira/browse/LUCENE-7460
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Adrien Grand
>Priority: Minor
>
> Sorted numerics used to expose a per-document random-access API so that 
> accessing the median or max element would be cheap. The new 
> SortedNumericDocValues still exposes the number of values a document has, but 
> the only way to read values is to use {{nextValue}}, which forces reading all 
> values in order to read the max value.
> For instance, {{SortedNumericSelector.MAX}} does the following in master (the 
> important part is the for-loop):
> {code}
> private void setValue() throws IOException {
>   int count = in.docValueCount();
>   for (int i = 0; i < count; i++) {
>     value = in.nextValue();
>   }
> }
> @Override
> public int nextDoc() throws IOException {
>   int docID = in.nextDoc();
>   if (docID != NO_MORE_DOCS) {
> setValue();
>   }
>   return docID;
> }
> {code}
> while it used to simply look up the value at index {{count-1}} in 6.x:
> {code}
> @Override
> public long get(int docID) {
>   in.setDocument(docID);
>   final int count = in.count();
>   if (count == 0) {
> return 0; // missing
>   } else {
> return in.valueAt(count-1);
>   }
> }
> {code}
> This could be a conscious decision since a sequential API gives more 
> opportunities to the codec to compress efficiently, but on the other hand 
> this API prevents sorting by max or median values from being efficient.
> On my end I have a preference for the random-access API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-6.x-Solaris (64bit/jdk1.8.0) - Build # 412 - Still Unstable!

2016-09-25 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Solaris/412/
Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseSerialGC

3 tests failed.
FAILED:  
org.apache.solr.util.TestSolrCLIRunExample.testInteractiveSolrCloudExample

Error Message:
After running Solr cloud example, test collection 'testCloudExamplePrompt' not 
found in Solr at: http://localhost:47541/solr; tool output:  Welcome to the 
SolrCloud example!  This interactive session will help you launch a SolrCloud 
cluster on your local workstation. To begin, how many Solr nodes would you like 
to run in your local cluster? (specify 1-4 nodes) [2]:  Ok, let's start up 1 
Solr nodes for your example SolrCloud cluster. Please enter the port for node1 
[8983]:  Oops! Looks like port 47541 is already being used by another process. 
Please choose a different port. Please enter a port for node 1 [8983]:  
Creating Solr home directory 
/export/home/jenkins/workspace/Lucene-Solr-6.x-Solaris/solr/build/solr-core/test/J1/temp/solr.util.TestSolrCLIRunExample_3D70E669AD8FD1F3-001/tempDir-002/cloud/node1/solr
  Starting up Solr on port 2 using command: 
/export/home/jenkins/workspace/Lucene-Solr-6.x-Solaris/solr/bin/solr start 
-cloud -p 2 -s 
"temp/solr.util.TestSolrCLIRunExample_3D70E669AD8FD1F3-001/tempDir-002/cloud/node1/solr"
  

Stack Trace:
java.lang.AssertionError: After running Solr cloud example, test collection 
'testCloudExamplePrompt' not found in Solr at: http://localhost:47541/solr; 
tool output: 
Welcome to the SolrCloud example!

This interactive session will help you launch a SolrCloud cluster on your local 
workstation.
To begin, how many Solr nodes would you like to run in your local cluster? 
(specify 1-4 nodes) [2]: 
Ok, let's start up 1 Solr nodes for your example SolrCloud cluster.
Please enter the port for node1 [8983]: 
Oops! Looks like port 47541 is already being used by another process. Please 
choose a different port.
Please enter a port for node 1 [8983]: 
Creating Solr home directory 
/export/home/jenkins/workspace/Lucene-Solr-6.x-Solaris/solr/build/solr-core/test/J1/temp/solr.util.TestSolrCLIRunExample_3D70E669AD8FD1F3-001/tempDir-002/cloud/node1/solr

Starting up Solr on port 2 using command:
/export/home/jenkins/workspace/Lucene-Solr-6.x-Solaris/solr/bin/solr start 
-cloud -p 2 -s 
"temp/solr.util.TestSolrCLIRunExample_3D70E669AD8FD1F3-001/tempDir-002/cloud/node1/solr"


at 
__randomizedtesting.SeedInfo.seed([3D70E669AD8FD1F3:E60106A39AFA1495]:0)
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.solr.util.TestSolrCLIRunExample.testInteractiveSolrCloudExample(TestSolrCLIRunExample.java:434)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1764)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:871)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:907)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:921)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:809)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:460)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:880)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:781)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:816)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)

[jira] [Created] (SOLR-9560) Solr should check max open files and other ulimits and refuse to start if they are set too low

2016-09-25 Thread Shalin Shekhar Mangar (JIRA)
Shalin Shekhar Mangar created SOLR-9560:
---

 Summary: Solr should check max open files and other ulimits and 
refuse to start if they are set too low
 Key: SOLR-9560
 URL: https://issues.apache.org/jira/browse/SOLR-9560
 Project: Solr
  Issue Type: Wish
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Shalin Shekhar Mangar
 Fix For: 6.3, master (7.0)


Solr should check max open files and other ulimits and refuse to start if they 
are set too low. Specifically:
# max open files should be at least 32768
# max memory size and virtual memory should both be unlimited
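
A minimal sketch of such a check on the JVM side, assuming the 
{{com.sun.management}} extension is available (illustrative only; the real 
check, and whether it belongs in the bin/solr script instead, is up to the 
eventual patch):

{code}
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

static void checkOpenFileLimit() {
  OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
  if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
    long maxFds =
        ((com.sun.management.UnixOperatingSystemMXBean) os).getMaxFileDescriptorCount();
    if (maxFds < 32768) { // the floor proposed above
      throw new IllegalStateException(
          "Max open files is " + maxFds + "; at least 32768 is required");
    }
  }
}
{code}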



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org




Re: Solr configuration format fracturing

2016-09-25 Thread Alexandre Rafalovitch
Did you know about configoverlay.json?

+1 to the discussion.

Additional fuel for the fire is that the /config endpoint will return
solrconfig.xml + overlay.json merged, but not params.json. Confusing.

Additionally, the /config output is JSON, but not JSON that can round-trip, AFAIK.

Regards,
Alex

On 26 Sep 2016 12:42 AM, "Shawn Heisey"  wrote:

> There seems to be some fracturing in the format of various Solr
> configs.  Most of the config uses XML, but some new features in the last
> few years are using JSON, particularly where SolrCloud and Zookeeper are
> concerned.  When notifications about SOLR-9557 came through, it revealed
> that there is a config file sitting next to solrconfig.xml named
> "params.json" that Solr will use.  I wasn't aware of this until reading
> that issue.
>
> This leads me to suggest something rather drastic for 7.0:  Consolidate
> all configuration formats and agree to consistent format usage unless
> there is another major discussion and agreement to change formats.
>
> I did consider starting this discussion in Jira, but it's fairly major,
> so the dev list seemed like the right place to start.
>
> Comments from some new users have come my way along the lines of "XML is
> so 90's ... get with the times!"  Image problems like that can be fatal
> to a software project, even if there's no technical problem.
>
> The likely winner in the format discussion is pure unmodified JSON, but
> I'm not going to make any assumptions.  SOLR-8029 has some format
> discussions that may be relevant here.
>
> IMHO, in order to make the idea successful, Solr 7.0 will need to
> automatically convert most configs on startup from the old format to the
> new format without user intervention.  If there's something that we find
> we can't convert automatically, that should result in a failure to
> start, with a helpful message so the user has some idea what they need
> to do.
>
> Thoughts?  Is this too scary to contemplate?  Should I open an umbrella
> issue in Jira to get the ball rolling?
>
> Thanks,
> Shawn
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


[jira] [Assigned] (SOLR-9411) Better validation for Schema API

2016-09-25 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl reassigned SOLR-9411:
-

Assignee: Jan Høydahl

> Better validation for Schema API
> 
>
> Key: SOLR-9411
> URL: https://issues.apache.org/jira/browse/SOLR-9411
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Schema and Analysis
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
> Attachments: SOLR-9411.patch
>
>
> The Schema REST API needs better validation before making changes.
> * It should not be allowed to delete the uniqueKey field (also handled in SOLR-9349)
> * When adding a dynamic field, the API should test that its name begins or 
> ends with {{*}}. Today the change succeeds, but you get errors later.
> These are two known cases. We should harden validation across the board for 
> all known schema requirements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9411) Better validation for Schema API

2016-09-25 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-9411:
--
Attachment: SOLR-9411.patch

This patch fixes a bug in {{add-dynamic-field}}, where the field got created 
using {{SchemaField.create()}} instead of 
{{managedIndexSchema.newDynamicField()}}.
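
For illustration, the kind of up-front name validation being discussed might 
look like this (a hypothetical helper, not the attached patch):

{code}
// Reject dynamic field names that neither begin nor end with '*', so the
// request fails immediately instead of causing errors later.
static void validateDynamicFieldName(String name) {
  if (!name.startsWith("*") && !name.endsWith("*")) {
    throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
        "Dynamic field name '" + name + "' must begin or end with '*'");
  }
}
{code}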

> Better validation for Schema API
> 
>
> Key: SOLR-9411
> URL: https://issues.apache.org/jira/browse/SOLR-9411
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Schema and Analysis
>Reporter: Jan Høydahl
> Attachments: SOLR-9411.patch
>
>
> The Schema REST API needs better validation before making changes.
> * It should not be allowed to delete the uniqueKey field (also handled in SOLR-9349)
> * When adding a dynamic field, the API should test that its name begins or 
> ends with {{*}}. Today the change succeeds, but you get errors later.
> These are two known cases. We should harden validation across the board for 
> all known schema requirements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-7465) Add a PatternTokenizer that uses Lucene's RegExp implementation

2016-09-25 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-7465:
---
Attachment: LUCENE-7465.patch

Another iteration, adding {{SimplePatternSplitTokenizer}}.  It's surprisingly 
different from the non-split case, and sort of complex :)  But it does pass its 
tests.  I haven't compared performance to {{PatternTokenizer}} with group -1 
yet.  I'll see if I can simplify it, but I think this is otherwise close.
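
As background on the "compiled to a DFA up front" point in the description 
below, a small illustration of Lucene's automaton package (not the tokenizer 
itself): a too-hard pattern fails here, at construction time.

{code}
import org.apache.lucene.util.automaton.Automaton;
import org.apache.lucene.util.automaton.CharacterRunAutomaton;
import org.apache.lucene.util.automaton.RegExp;

// The RegExp is compiled to a DFA up front; an over-complex pattern throws
// during determinization rather than on some unlucky document later.
Automaton a = new RegExp("[a-z]+[0-9]*").toAutomaton();
CharacterRunAutomaton dfa = new CharacterRunAutomaton(a);
boolean accepted = dfa.run("abc123"); // true
{code}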

> Add a PatternTokenizer that uses Lucene's RegExp implementation
> ---
>
> Key: LUCENE-7465
> URL: https://issues.apache.org/jira/browse/LUCENE-7465
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: master (7.0), 6.3
>
> Attachments: LUCENE-7465.patch, LUCENE-7465.patch
>
>
> I think there are some nice benefits to a version of PatternTokenizer that 
> uses Lucene's RegExp impl instead of the JDK's:
>   * Lucene's RegExp is compiled to a DFA up front, so if a "too hard" RegExp 
> is attempted the user discovers it up front instead of later on when a 
> "lucky" document arrives
>   * It processes the incoming characters as a stream, only pulling 128 
> characters at a time, vs the existing {{PatternTokenizer}} which currently 
> reads the entire string up front (this has caused heap problems in the past)
>   * It should be fast.
> I named it {{SimplePatternTokenizer}}, and it still needs a factory and 
> improved tests, but I think it's otherwise close.
> It currently does not take a {{group}} parameter because Lucene's RegExps 
> don't yet implement sub group capture.  I think we could add that at some 
> point, but it's a bit tricky.
> This doesn't even have group=-1 support (like String.split) ... I think if we 
> did that we should maybe name it differently 
> ({{SimplePatternSplitTokenizer}}?).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-master-Linux (64bit/jdk1.8.0_102) - Build # 17901 - Failure!

2016-09-25 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/17901/
Java: 64bit/jdk1.8.0_102 -XX:+UseCompressedOops -XX:+UseG1GC

2 tests failed.
FAILED:  
junit.framework.TestSuite.org.apache.solr.client.solrj.TestLBHttpSolrClient

Error Message:
1 thread leaked from SUITE scope at 
org.apache.solr.client.solrj.TestLBHttpSolrClient: 1) Thread[id=2513, 
name=Connection evictor, state=TIMED_WAITING, group=TGRP-TestLBHttpSolrClient]  
   at java.lang.Thread.sleep(Native Method) at 
org.apache.http.impl.client.IdleConnectionEvictor$1.run(IdleConnectionEvictor.java:66)
 at java.lang.Thread.run(Thread.java:745)

Stack Trace:
com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE 
scope at org.apache.solr.client.solrj.TestLBHttpSolrClient: 
   1) Thread[id=2513, name=Connection evictor, state=TIMED_WAITING, 
group=TGRP-TestLBHttpSolrClient]
at java.lang.Thread.sleep(Native Method)
at 
org.apache.http.impl.client.IdleConnectionEvictor$1.run(IdleConnectionEvictor.java:66)
at java.lang.Thread.run(Thread.java:745)
at __randomizedtesting.SeedInfo.seed([F56801609D417348]:0)


FAILED:  org.apache.solr.client.solrj.TestLBHttpSolrClient.testReliability

Error Message:
Java heap space

Stack Trace:
java.lang.OutOfMemoryError: Java heap space




Build Log:
[...truncated 13226 lines...]
   [junit4] Suite: org.apache.solr.client.solrj.TestLBHttpSolrClient
   [junit4]   2> Creating dataDir: 
/home/jenkins/workspace/Lucene-Solr-master-Linux/solr/build/solr-solrj/test/J0/temp/solr.client.solrj.TestLBHttpSolrClient_F56801609D417348-001/init-core-data-001
   [junit4]   2> 95449 INFO  
(SUITE-TestLBHttpSolrClient-seed#[F56801609D417348]-worker) [] 
o.a.s.SolrTestCaseJ4 Randomized ssl (true) and clientAuth (true) via: 
@org.apache.solr.util.RandomizeSSL(reason=, value=NaN, ssl=NaN, clientAuth=NaN)
   [junit4]   2> 95452 INFO  
(TEST-TestLBHttpSolrClient.testTwoServers-seed#[F56801609D417348]) [] 
o.a.s.SolrTestCaseJ4 ###Starting testTwoServers
   [junit4]   2> 95457 INFO  
(TEST-TestLBHttpSolrClient.testTwoServers-seed#[F56801609D417348]) [] 
o.e.j.s.Server jetty-9.3.8.v20160314
   [junit4]   2> 95458 INFO  
(TEST-TestLBHttpSolrClient.testTwoServers-seed#[F56801609D417348]) [] 
o.e.j.s.h.ContextHandler Started 
o.e.j.s.ServletContextHandler@3c3ea710{/solr,null,AVAILABLE}
   [junit4]   2> 95460 INFO  
(TEST-TestLBHttpSolrClient.testTwoServers-seed#[F56801609D417348]) [] 
o.e.j.s.ServerConnector Started ServerConnector@efa6c9c{SSL,[ssl, 
http/1.1]}{127.0.0.1:39244}
   [junit4]   2> 95460 INFO  
(TEST-TestLBHttpSolrClient.testTwoServers-seed#[F56801609D417348]) [] 
o.e.j.s.Server Started @97415ms
   [junit4]   2> 95460 INFO  
(TEST-TestLBHttpSolrClient.testTwoServers-seed#[F56801609D417348]) [] 
o.a.s.c.s.e.JettySolrRunner Jetty properties: 
{solr.data.dir=/home/jenkins/workspace/Lucene-Solr-master-Linux/solr/build/solr-solrj/test/J0/temp/solr.client.solrj.TestLBHttpSolrClient_F56801609D417348-001/instance-0-001/collection1/data,
 solrconfig=bad_solrconfig.xml, hostContext=/solr, hostPort=39244}
   [junit4]   2> 95460 INFO  
(TEST-TestLBHttpSolrClient.testTwoServers-seed#[F56801609D417348]) [] 
o.a.s.c.SolrXmlConfig Loading container configuration from 
/home/jenkins/workspace/Lucene-Solr-master-Linux/solr/build/solr-solrj/test/J0/temp/solr.client.solrj.TestLBHttpSolrClient_F56801609D417348-001/instance-0-001/solr.xml
   [junit4]   2> 95467 INFO  
(TEST-TestLBHttpSolrClient.testTwoServers-seed#[F56801609D417348]) [] 
o.a.s.c.CorePropertiesLocator Found 1 core definitions underneath 
/home/jenkins/workspace/Lucene-Solr-master-Linux/solr/build/solr-solrj/test/J0/temp/solr.client.solrj.TestLBHttpSolrClient_F56801609D417348-001/instance-0-001/.
   [junit4]   2> 95467 INFO  
(TEST-TestLBHttpSolrClient.testTwoServers-seed#[F56801609D417348]) [] 
o.a.s.c.CorePropertiesLocator Cores are: [collection1]
   [junit4]   2> 95474 INFO  (coreLoadExecutor-214-thread-1) [] 
o.a.s.c.SolrConfig Using Lucene MatchVersion: 7.0.0
   [junit4]   2> 95478 INFO  (coreLoadExecutor-214-thread-1) [] 
o.a.s.c.SolrConfig Loaded SolrConfig: solrconfig.xml
   [junit4]   2> 95480 INFO  (coreLoadExecutor-214-thread-1) [] 
o.a.s.s.IndexSchema [collection1] Schema name=test
   [junit4]   2> 95482 INFO  (coreLoadExecutor-214-thread-1) [] 
o.a.s.s.IndexSchema [collection1] unique key field: id
   [junit4]   2> 95483 INFO  (coreLoadExecutor-214-thread-1) [] 
o.a.s.c.CoreContainer Creating SolrCore 'collection1' using configuration from 
instancedir 
/home/jenkins/workspace/Lucene-Solr-master-Linux/solr/build/solr-solrj/test/J0/temp/solr.client.solrj.TestLBHttpSolrClient_F56801609D417348-001/instance-0-001/./collection1
   [junit4]   2> 95484 INFO  (coreLoadExecutor-214-thread-1) [
x:collection1] o.a.s.c.SolrCore [[collection1] ] Opening new SolrCore at 

[jira] [Comment Edited] (LUCENE-7398) Nested Span Queries are buggy

2016-09-25 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15519320#comment-15519320
 ] 

Paul Elschot edited comment on LUCENE-7398 at 9/25/16 10:00 PM:


Patch of 24 Sep 2016, work in progress. Edit: superseded on 25 Sep, this can be 
ignored.

This introduces SpanNearQuery.MatchNear to choose the matching method.

The ORDERED_LAZY case is still the patch of 14 August; this should be changed 
back to the current implementation and be used to implement ORDERED_LOOKAHEAD.

This implements MatchNear.UNORDERED_STARTPOS and uses that as the default 
implementation for the unordered case.
The implementation of UNORDERED_STARTPOS is in NearSpansUnorderedStartPos, 
which is simpler than the current NearSpansUnordered, there is no SpansCell.
I'd expect this StartPos implementation to be a little faster, so I also 
implemented it as the default for the unordered case.  Only one test case 
needs the UNORDERED_LAZY method in order to pass.

The question is whether it is ok to change the default unordered implementation 
to only use the span start positions.

The collect() method is moved to the superclass ConjunctionSpans; this 
simplification might be done in another issue.


was (Author: paul.elsc...@xs4all.nl):
Patch of 24 Sep 2016, work in progress.

This introduces SpanNearQuery.MatchNear to choose the matching method.

The ORDERED_LAZY case is still the patch of 14 August; this should be changed 
back to the current implementation and be used to implement ORDERED_LOOKAHEAD.

This implements MatchNear.UNORDERED_STARTPOS and uses that as the default 
implementation for the unordered case.
The implementation of UNORDERED_STARTPOS is in NearSpansUnorderedStartPos, 
which is simpler than the current NearSpansUnordered, there is no SpansCell.
I'd expect this StartPos implementation to be a little faster, so I also 
implemented it as the default for the unordered case.  Only one test case 
needs the UNORDERED_LAZY method in order to pass.

The question is whether it is ok to change the default unordered implementation 
to only use the span start positions.

The collect() method is moved to the superclass ConjunctionSpans; this 
simplification might be done in another issue.

> Nested Span Queries are buggy
> -
>
> Key: LUCENE-7398
> URL: https://issues.apache.org/jira/browse/LUCENE-7398
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 5.5, 6.x
>Reporter: Christoph Goller
>Assignee: Alan Woodward
>Priority: Critical
> Attachments: LUCENE-7398-20160814.patch, LUCENE-7398-20160924.patch, 
> LUCENE-7398-20160925.patch, LUCENE-7398.patch, LUCENE-7398.patch, 
> TestSpanCollection.java
>
>
> Example for a nested SpanQuery that is not working:
> Document: Human Genome Organization , HUGO , is trying to coordinate gene 
> mapping research worldwide.
> Query: spanNear([body:coordinate, spanOr([spanNear([body:gene, body:mapping], 
> 0, true), body:gene]), body:research], 0, true)
> The query should match "coordinate gene mapping research" as well as 
> "coordinate gene research". It does not match  "coordinate gene mapping 
> research" with Lucene 5.5 or 6.1, it did however match with Lucene 4.10.4. It 
> probably stopped working with the changes on SpanQueries in 5.3. I will 
> attach a unit test that shows the problem.
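
For reference, the query string above translates roughly to the following 
construction with the standard span query classes (a sketch, not the attached 
test):

{code}
// spanNear([body:coordinate, spanOr([spanNear([body:gene, body:mapping],
// 0, true), body:gene]), body:research], 0, true)
SpanQuery geneMapping = new SpanNearQuery(new SpanQuery[] {
    new SpanTermQuery(new Term("body", "gene")),
    new SpanTermQuery(new Term("body", "mapping"))}, 0, true);
SpanQuery query = new SpanNearQuery(new SpanQuery[] {
    new SpanTermQuery(new Term("body", "coordinate")),
    new SpanOrQuery(geneMapping, new SpanTermQuery(new Term("body", "gene"))),
    new SpanTermQuery(new Term("body", "research"))}, 0, true);
{code}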



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-7398) Nested Span Queries are buggy

2016-09-25 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-7398:
-
Attachment: LUCENE-7398-20160925.patch

Patch of 25 Sep 2016.
Compared to the previous patch, this removes the ORDERED_STARTPOS case, because 
I don't know whether it is needed.
It also restores backward compatibility.

Compared to master, this has four MatchNear methods. Two are the current ones; 
they are called ORDERED_LAZY and UNORDERED_LAZY, and these are used when the 
current builder and constructors take a boolean ordered argument.

The third case is ORDERED_LOOKAHEAD, which is from the patch of 18 August.

The last case is UNORDERED_STARTPOS, which is simpler than UNORDERED_LAZY, 
hopefully a little faster, and gives more complete results.

Javadocs for all four cases have been added.

All test cases from here have been added, and where necessary they have been 
modified to use ORDERED_LOOKAHEAD and to not do span collection. These tests 
pass.

For the last case, UNORDERED_STARTPOS, no test cases have been added yet. This 
is still to be done. Does anyone have more difficult cases?

Minor point: the collect() method was moved to the superclass ConjunctionSpans.

Feedback welcome, especially on the javadocs of SpanNearQuery.MatchNear.

Instead of adding backtracking methods, it might be better to do counting of 
input spans in a matching window. I'm hoping that the UNORDERED_STARTPOS case 
can be extended for that. Any ideas there?

> Nested Span Queries are buggy
> -
>
> Key: LUCENE-7398
> URL: https://issues.apache.org/jira/browse/LUCENE-7398
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 5.5, 6.x
>Reporter: Christoph Goller
>Assignee: Alan Woodward
>Priority: Critical
> Attachments: LUCENE-7398-20160814.patch, LUCENE-7398-20160924.patch, 
> LUCENE-7398-20160925.patch, LUCENE-7398.patch, LUCENE-7398.patch, 
> TestSpanCollection.java
>
>
> Example for a nested SpanQuery that is not working:
> Document: Human Genome Organization , HUGO , is trying to coordinate gene 
> mapping research worldwide.
> Query: spanNear([body:coordinate, spanOr([spanNear([body:gene, body:mapping], 
> 0, true), body:gene]), body:research], 0, true)
> The query should match "coordinate gene mapping research" as well as 
> "coordinate gene research". It does not match  "coordinate gene mapping 
> research" with Lucene 5.5 or 6.1, it did however match with Lucene 4.10.4. It 
> probably stopped working with the changes on SpanQueries in 5.3. I will 
> attach a unit test that shows the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9387) Allow topic expression to store queries and macros

2016-09-25 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-9387:
-
Description: 
The topic expression already stores the checkpoints for a topic. This ticket 
will allow the topic to store the topic query and a *macro* to be performed 
with the topic. 

Macros will be run using Solr's built-in parameter substitution:

Sample syntax:

{code}
topic(collection1, q="*:*", macro="update(classify(model, ${topic}))")
{code}

The query and macro will be stored with the topic. Topics can be retrieved and 
executed as part of the larger macro using Solr's built in parameter 
substitution.

{code}
http://localhost:8983/solr/collection1/stream?expr=update(classify(model, 
${topic}))=topic(collection1,)
{code}

Because topics are stored in a SolrCloud collection this will allow for storing 
millions of topics and macros.

The parallel function can then be used to run the topics/macros in parallel 
across a large number of workers.
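
For illustration, a complete request using this substitution might look like 
the following; the {{topic}} parameter name and the topic arguments here are 
hypothetical:

{code}
http://localhost:8983/solr/collection1/stream?expr=update(classify(model, ${topic}))&topic=topic(collection1, q="*:*")
{code}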

  was:
The topic expression already stores the checkpoints for a topic. This ticket 
will allow the topic to store the topic query and a *macro* to be performed 
with the topic. 

Macros will be run using Solr's builtin parameter substitution:

Sample syntax:

{code}
topic(collection1, q="*:*", macro="update(classify(model, ${topic}))")
{code}

The query and macro will be stored with the topic. Topics can be retrieved and 
executed as part of the larger macro using Solr's built in parameter 
substitution.

{code}
http://localhost:8983/solr/collection1/stream?expr=update(classify(model, 
${topic}))=topic(collection1,)
{code}

Because topics are stored in a SolrCloud collection this will allow for storing 
millions of topics and macros.

The parallel function can then be used to run the topics/macros in parallel 
across a large number of workers.


> Allow topic expression to store queries and macros
> --
>
> Key: SOLR-9387
> URL: https://issues.apache.org/jira/browse/SOLR-9387
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>
> The topic expression already stores the checkpoints for a topic. This ticket 
> will allow the topic to store the topic query and a *macro* to be performed 
> with the topic. 
> Macros will be run using Solr's built-in parameter substitution:
> Sample syntax:
> {code}
> topic(collection1, q="*:*", macro="update(classify(model, ${topic}))")
> {code}
> The query and macro will be stored with the topic. Topics can be retrieved 
> and executed as part of the larger macro using Solr's built in parameter 
> substitution.
> {code}
> http://localhost:8983/solr/collection1/stream?expr=update(classify(model, 
> ${topic}))=topic(collection1,)
> {code}
> Because topics are stored in a SolrCloud collection this will allow for 
> storing millions of topics and macros.
> The parallel function can then be used to run the topics/macros in parallel 
> across a large number of workers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9387) Allow topic expression to store queries and macros

2016-09-25 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-9387:
-
Description: 
The topic expression already stores the checkpoints for a topic. This ticket 
will allow the topic to store the topic query and a *macro* to be performed 
with the topic. 

Macros will be run using Solr's builtin parameter substitution:

Sample syntax:

{code}
topic(collection1, q="*:*", macro="update(classify(model, ${topic}))")
{code}

The query and macro will be stored with the topic. Topics can be retrieved and 
executed as part of the larger macro using Solr's built in parameter 
substitution.

{code}
http://localhost:8983/solr/collection1/stream?expr=update(classify(model, 
${topic}))=topic(collection1,)
{code}

Because topics are stored in a SolrCloud collection this will allow for storing 
millions of topics and macros.

The parallel function can then be used to run the topics/macros in parallel 
across a large number of workers.

  was:
The topic expression already stores the checkpoints for a topic. This ticket 
will allow the topic to store the topic query as well as a macro to be 
performed with the topic. 

Macros will be run using Solr's builtin parameter substitution:

Sample syntax:

{code}
topic(collection1, q="*:*", macro="update(classify(model, ${topic}))")
{code}

The query and macro will be stored with the topic. Topics can be retrieved and 
executed as part of the larger macro using Solr's built in parameter 
substitution.

{code}
http://localhost:8983/solr/collection1/stream?expr=update(classify(model, 
${topic}))=topic(collection1,)
{code}

Because topics are stored in a SolrCloud collection this will allow for storing 
millions of topics and macros.

The parallel function can then be used to run the topics/macros in parallel 
across a large number of workers.


> Allow topic expression to store queries and macros
> --
>
> Key: SOLR-9387
> URL: https://issues.apache.org/jira/browse/SOLR-9387
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>
> The topic expression already stores the checkpoints for a topic. This ticket 
> will allow the topic to store the topic query and a *macro* to be performed 
> with the topic. 
> Macros will be run using Solr's builtin parameter substitution:
> Sample syntax:
> {code}
> topic(collection1, q="*:*", macro="update(classify(model, ${topic}))")
> {code}
> The query and macro will be stored with the topic. Topics can be retrieved 
> and executed as part of the larger macro using Solr's built in parameter 
> substitution.
> {code}
> http://localhost:8983/solr/collection1/stream?expr=update(classify(model, 
> ${topic}))=topic(collection1,)
> {code}
> Because topics are stored in a SolrCloud collection this will allow for 
> storing millions of topics and macros.
> The parallel function can then be used to run the topics/macros in parallel 
> across a large number of workers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2016-09-25 Thread Ben Manes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15521377#comment-15521377
 ] 

Ben Manes commented on SOLR-8241:
-

I took a look to refresh myself on LFUCache and decay. I don't think there is 
an issue, because TinyLFU has similar logic to age the frequencies 
asynchronously: it observes a sample of 10 * maximum size and then halves the 
counters. The difference is that the counters are stored in an array, are 
4-bit, and represent all items (not just those currently residing in the 
cache). This extended history, and using frequency for admission (rather than 
eviction), is what allows the policy to have a superior hit rate and be 
amortized O(1).
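
A toy sketch of that aging scheme, for intuition only (Caffeine's real sketch 
packs 4-bit counters and uses multiple hash functions; this shows just the 
sample-then-halve decay described above):

{code}
// Counters are capped at 15, i.e. conceptually 4-bit. After observing
// sampleSize = 10 * maximumSize events, every counter is halved so that
// stale popularity decays.
class ToyFrequencySketch {
  private final byte[] counters;
  private final int sampleSize;
  private int observed;

  ToyFrequencySketch(int maximumSize) {
    this.counters = new byte[4 * maximumSize]; // loose sizing, illustration only
    this.sampleSize = 10 * maximumSize;
  }

  void increment(Object key) {
    int i = Math.floorMod(key.hashCode(), counters.length);
    if (counters[i] < 15) {
      counters[i]++;
    }
    if (++observed == sampleSize) {
      age();
    }
  }

  int frequency(Object key) {
    return counters[Math.floorMod(key.hashCode(), counters.length)];
  }

  private void age() {
    for (int i = 0; i < counters.length; i++) {
      counters[i] >>= 1; // halve every counter
    }
    observed /= 2;
  }
}
{code}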

> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Reporter: Ben Manes
>Priority: Minor
> Attachments: SOLR-8241.patch
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-master-Linux (64bit/jdk1.8.0_102) - Build # 17900 - Unstable!

2016-09-25 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/17900/
Java: 64bit/jdk1.8.0_102 -XX:+UseCompressedOops -XX:+UseParallelGC

1 tests failed.
FAILED:  org.apache.solr.handler.TestReplicationHandler.doTestStressReplication

Error Message:
timed out waiting for collection1 startAt time to exceed: Sun Sep 25 20:09:56 
WEST 2016

Stack Trace:
java.lang.AssertionError: timed out waiting for collection1 startAt time to 
exceed: Sun Sep 25 20:09:56 WEST 2016
at 
__randomizedtesting.SeedInfo.seed([1AB77B68D36CBF84:C11C7BAED644D637]:0)
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.solr.handler.TestReplicationHandler.watchCoreStartAt(TestReplicationHandler.java:1508)
at 
org.apache.solr.handler.TestReplicationHandler.doTestStressReplication(TestReplicationHandler.java:858)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1764)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:871)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:907)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:921)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:809)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:460)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:880)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:781)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:816)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at java.lang.Thread.run(Thread.java:745)




Build Log:
[...truncated 11895 lines...]
   [junit4] Suite: 

[JENKINS] Lucene-Solr-6.x-Solaris (64bit/jdk1.8.0) - Build # 411 - Unstable!

2016-09-25 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Solaris/411/
Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseParallelGC

1 tests failed.
FAILED:  org.apache.solr.schema.TestCloudSchemaless.test

Error Message:
QUERY FAILED: 
xpath=/response/arr[@name='fields']/lst/str[@name='name'][.='newTestFieldInt445']
  request=/schema/fields?wt=xml  response=  

[jira] [Updated] (SOLR-9559) Add ExecutorStream to execute stored topics and macros

2016-09-25 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-9559:
-
Description: 
The ExecutorStream will execute the stored topics and macros from SOLR-9387.

The ExecutorStream can be pointed at a SolrCloud collection where the topics 
are stored and it will execute the topics and macros in batches.

The ExecutorStream will support parallel execution of topics/macros as well. 
This will allow the workload to be spread across a cluster of worker nodes.
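
A hypothetical invocation, assuming the stored expressions are read back from a 
field on the stored-topics collection (syntax illustrative only; the final API 
had not been settled when this was written):

{code}
executor(threads=4,
         topic(checkpoints, storedTopics, q="*:*", fl="id,expr_s",
               id="topic1", initialCheckpoint=0))
{code}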

  was:
The ExecutorStream will execute the stored topics and macros from SOLR-9387.

The ExecutorStream can be pointed at a SolrCloud collection where the topics 
are stored and it will execute the topics and macros in batches.

The ExecutorStream will support parallel execution of topics/macros as well. 
This will allow the workload to be spread across a cluster worker nodes.


> Add ExecutorStream to execute stored topics and macros
> --
>
> Key: SOLR-9559
> URL: https://issues.apache.org/jira/browse/SOLR-9559
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>
> The ExecutorStream will execute the stored topics and macros from SOLR-9387.
> The ExecutorStream can be pointed at a SolrCloud collection where the topics 
> are stored and it will execute the topics and macros in batches.
> The ExecutorStream will support parallel execution of topics/macros as well. 
> This will allow the workload to be spread across a cluster of worker nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9559) Add ExecutorStream to execute stored topics and macros

2016-09-25 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-9559:
-
Description: 
The ExecutorStream will execute the stored topics and macros from SOLR-9387.

The ExecutorStream can be pointed at a SolrCloud collection where the topics 
are stored and it will execute the topics and macros in batches.

The ExecutorStream will support parallel execution of topics/macros as well. 
This will allow the workload to be spread across a cluster worker nodes.

  was:
The ExecutorStream will execute the stored topics and macros from SOLR-9387.

The ExecutorStream can be pointed at a SolrCloud collection where the topics 
are stored and it will execute the topics and macros in batches.

The ExecutorStream will support parallel execution of topics/macros as well. 
This will allow the workload to be spread over a large number of worker nodes.


> Add ExecutorStream to execute stored topics and macros
> --
>
> Key: SOLR-9559
> URL: https://issues.apache.org/jira/browse/SOLR-9559
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>
> The ExecutorStream will execute the stored topics and macros from SOLR-9387.
> The ExecutorStream can be pointed at a SolrCloud collection where the topics 
> are stored and it will execute the topics and macros in batches.
> The ExecutorStream will support parallel execution of topics/macros as well. 
> This will allow the workload to be spread across a cluster worker nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Solr configuration format fracturing

2016-09-25 Thread Shawn Heisey
There seems to be some fracturing in the format of various Solr
configs.  Most of the config uses XML, but some new features in the last
few years are using JSON, particularly where SolrCloud and Zookeeper are
concerned.  When notifications about SOLR-9557 came through, it revealed
that there is a config file sitting next to solrconfig.xml named
"params.json" that Solr will use.  I wasn't aware of this until reading
that issue.

This leads me to suggest something rather drastic for 7.0:  Consolidate
all configuration formats and agree to consistent format usage unless
there is another major discussion and agreement to change formats.

I did consider starting this discussion in Jira, but it's fairly major,
so the dev list seemed like the right place to start.

Comments from some new users have come my way along the lines of "XML is
so 90's ... get with the times!"  Image problems like that can be fatal
to a software project, even if there's no technical problem.

The likely winner in the format discussion is pure unmodified JSON, but
I'm not going to make any assumptions.  SOLR-8029 has some format
discussions that may be relevant here.

IMHO, in order to make the idea successful, Solr 7.0 will need to
automatically convert most configs on startup from the old format to the
new format without user intervention.  If there's something that we find
we can't convert automatically, that should result in a failure to
start, with a helpful message so the user has some idea what they need
to do.
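
For illustration only (this is a hypothetical mapping, not a settled
design), a solrconfig.xml fragment such as:

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">10</int>
    </lst>
  </requestHandler>

might be converted automatically into JSON along these lines:

  "requestHandler": {
    "/select": {
      "class": "solr.SearchHandler",
      "defaults": { "echoParams": "explicit", "rows": 10 }
    }
  }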

Thoughts?  Is this too scary to contemplate?  Should I open an umbrella
issue in Jira to get the ball rolling?

Thanks,
Shawn


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9559) Add ExecutorStream to execute stored topics and macros

2016-09-25 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-9559:
-
Summary: Add ExecutorStream to execute stored topics and macros  (was: Add 
ExecutorStream)

> Add ExecutorStream to execute stored topics and macros
> --
>
> Key: SOLR-9559
> URL: https://issues.apache.org/jira/browse/SOLR-9559
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>
> The ExecutorStream will execute the stored topics and macros from SOLR-9387.
> The ExecutorStream can be pointed at a SolrCloud collection where the topics 
> are stored and it will execute the topics and macros in batches.
> The ExecutorStream will support parallel execution of topics/macros as well. 
> This will allow the workload to be spread over a large number of worker nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-9559) Add ExecutorStream

2016-09-25 Thread Joel Bernstein (JIRA)
Joel Bernstein created SOLR-9559:


 Summary: Add ExecutorStream
 Key: SOLR-9559
 URL: https://issues.apache.org/jira/browse/SOLR-9559
 Project: Solr
  Issue Type: New Feature
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Joel Bernstein


The ExecutorStream will execute the stored topics and macros from SOLR-9387.

The ExecutorStream can be pointed at a SolrCloud collection where the topics 
are stored and it will execute the topics and macros in batches.

The ExecutorStream will support parallel execution of topics/macros as well. 
This will allow the workload to be spread over a large number of worker nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-9506) cache IndexFingerprint for each segment

2016-09-25 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15521130#comment-15521130
 ] 

Pushkar Raste edited comment on SOLR-9506 at 9/25/16 5:14 PM:
--

POC/Initial commit - 
https://github.com/praste/lucene-solr/commit/ca55daa9ea1eb23232173b50111b9068f1817c13

There are two issues we still need to solve. 
* How to compute the cumulative {{versionsHash}} from the {{versionsHash}} of 
individual segments. We cannot use the current {{versionsHash}} (unless we 
cache all the individual version numbers), as it is not additive. Consider the 
following scenario:
*Leader segments, versions and versionsHash*
*seg1* : 
 versions: 100, 101, 102  
  versionHash: hash(100) + hash(101) + hash(102)
*seg2*: 
 versions: 103, 104, 105
  versionHash: hash(103) + hash(104) + hash(105) 
\\ \\ *Replica segments, versions and hash*
*seg1*: 
 versions: 100, 101
  versionHash: hash(100) + hash(101) 
*seg2*: 
 versions: 102, 103, 104, 105
  versionHash: hash(102) + hash(103) + hash(104) + hash(105)
\\ \\Leader and Replica are essentially in sync; however, using the current 
method there is no way to compute and ensure that the cumulative 
{{versionsHash}} of leader and replica would match. 
\\ \\Even if we decide not to cache the {{IndexFingerprint}} per segment but 
just to parallelize the computation, I think we would still run into the issue 
mentioned above.

* I still need to figure out how to keep the cache in {{DefaultSolrCoreState}}, 
so that we can reuse the {{IndexFingerprint}} of individual segments when a 
new Searcher is opened (a rough sketch follows below).
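
A minimal sketch of the caching idea for the second point (all names here are 
illustrative, not the actual patch):

{code}
// Hypothetical per-segment fingerprint cache held in DefaultSolrCoreState.
// Segments are immutable, so the core cache key stays valid across searcher
// reopens; entries for merged-away segments simply stop being requested.
// A real cache would also need to account for the maxVersion cutoff.
private final Map<Object, IndexFingerprint> perSegmentFingerprints =
    new ConcurrentHashMap<>();

IndexFingerprint getOrComputeFingerprint(LeafReaderContext leaf, long maxVersion)
    throws IOException {
  Object key = leaf.reader().getCoreCacheKey();
  IndexFingerprint cached = perSegmentFingerprints.get(key);
  if (cached == null) {
    cached = computeFingerprint(leaf, maxVersion);  // hypothetical helper
    perSegmentFingerprints.put(key, cached);
  }
  return cached;
}
{code}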


was (Author: praste):
POC/Initial commit - 
https://github.com/praste/lucene-solr/commit/ca55daa9ea1eb23232173b50111b9068f1817c13

There are two issues we still need to solve. 
* How to compute `versionsInHash` from `versionsInHash` of individual segments. 
We can not use current `versionsHash` (unless we cache all the individual 
version numbers), as it is not additive.  Consider following scenario
*Leader segments, versions and hash*
*seg1* : 
 versions: 100, 101, 102  
  versionHash: hash(100) + hash(101) + hash(102)
*seg2*: 
 versions: 103, 104, 105
  versionHash: hash(103) + hash(104) + hash(105) 
\\ \\ *Replica segments, versions and hash*
*seg1*: 
 versions: 100, 101
  versionHash: hash(100) + hash(101) 
*seg2*: 
 versions: 102, 103, 104, 105
  versionHash: hash(102) + hash(103) + hash(104) + hash(105)
\\ \\Leader and Replica are essentially in sync, however using current method 
there is no way to compute and ensure cumulative `versionHash` of leader and 
replica would match

* I still need to figure out how to keep cache in   `DefaultSolrCoreState`, so 
that we can reuse `IndexFingerprint` of individual segments when a new Searcher 
is opened.  

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>
> The IndexFingerprint is cached per index searcher. It is quite useless during 
> high-throughput indexing. If the fingerprint is cached per segment, it will 
> make computing the fingerprint vastly more efficient.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-09-25 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15521130#comment-15521130
 ] 

Pushkar Raste commented on SOLR-9506:
-

POC/Initial commit - 
https://github.com/praste/lucene-solr/commit/ca55daa9ea1eb23232173b50111b9068f1817c13

There are two issues we still need to solve. 
* How to compute the cumulative `versionsHash` from the `versionsHash` of 
individual segments. We cannot use the current `versionsHash` (unless we cache 
all the individual version numbers), as it is not additive. Consider the 
following scenario:
*Leader segments, versions and hash*
*seg1* : 
 versions: 100, 101, 102  
  versionHash: hash(100) + hash(101) + hash(102)
*seg2*: 
 versions: 103, 104, 105
  versionHash: hash(103) + hash(104) + hash(105) 
\\ \\ *Replica segments, versions and hash*
*seg1*: 
 versions: 100, 101
  versionHash: hash(100) + hash(101) 
*seg2*: 
 versions: 102, 103, 104, 105
  versionHash: hash(102) + hash(103) + hash(104) + hash(105)
\\ \\Leader and Replica are essentially in sync; however, using the current 
method there is no way to compute and ensure that the cumulative 
`versionsHash` of leader and replica would match.

* I still need to figure out how to keep the cache in `DefaultSolrCoreState`, 
so that we can reuse the `IndexFingerprint` of individual segments when a new 
Searcher is opened.

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>
> The IndexFingerprint is cached per index searcher. It is quite useless during 
> high-throughput indexing. If the fingerprint is cached per segment, it will 
> make computing the fingerprint vastly more efficient.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9387) Allow topic expression to store queries and macros

2016-09-25 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-9387:
-
Description: 
The topic expression already stores the checkpoints for a topic. This ticket 
will allow the topic to store the topic query as well as a macro to be 
performed with the topic. 

Macros will be run using Solr's builtin parameter substitution:

Sample syntax:

{code}
topic(collection1, q="*:*", macro="update(classify(model, ${topic}))")
{code}

The query and macro will be stored with the topic. Topics can be retrieved and 
executed as part of the larger macro using Solr's built in parameter 
substitution.

{code}
http://localhost:8983/solr/collection1/stream?expr=update(classify(model, 
${topic}))=topic(collection1,)
{code}

Because topics are stored in a SolrCloud collection this will allow for storing 
millions of topics and macros.

The parallel function can then be used to run the topics/macros in parallel 
across a large number of workers.

  was:
The topic expression already stores the checkpoints for a topic. This ticket 
will allow the topic to store the topic query as well as a macro to be 
performed with the topic. 

Macros will be run using Solr's builtin parameter substitution:

Sample syntax:

{code}
topic(collection1, q="*:*", macro="update(classify(model, ${topic}))")
{code}

The query and macro will be stored with the topic. Topics can be retrieved and 
executed as part of the larger macro as using Solr's built in parameter 
substitution.

{code}
http://localhost:8983/solr/collection1/stream?expr=update(classify(model, 
${topic}))=topic(collection1,)
{code}

Because topics are stored in a SolrCloud collection this will allow for storing 
millions of topics and macros.

The parallel function can then be used to run the topics/macros in parallel 
across a large number of workers.


> Allow topic expression to store queries and macros
> --
>
> Key: SOLR-9387
> URL: https://issues.apache.org/jira/browse/SOLR-9387
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>
> The topic expression already stores the checkpoints for a topic. This ticket 
> will allow the topic to store the topic query as well as a macro to be 
> performed with the topic. 
> Macros will be run using Solr's builtin parameter substitution:
> Sample syntax:
> {code}
> topic(collection1, q="*:*", macro="update(classify(model, ${topic}))")
> {code}
> The query and macro will be stored with the topic. Topics can be retrieved 
> and executed as part of the larger macro using Solr's built in parameter 
> substitution.
> {code}
> http://localhost:8983/solr/collection1/stream?expr=update(classify(model, 
> ${topic}))=topic(collection1,)
> {code}
> Because topics are stored in a SolrCloud collection this will allow for 
> storing millions of topics and macros.
> The parallel function can then be used to run the topics/macros in parallel 
> across a large number of workers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9387) Allow topic expression to store queries and macros

2016-09-25 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-9387:
-
Summary: Allow topic expression to store queries and macros  (was: Allow 
topic expression to store queries and perform actions)

> Allow topic expression to store queries and macros
> --
>
> Key: SOLR-9387
> URL: https://issues.apache.org/jira/browse/SOLR-9387
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>
> The topic expression already stores the checkpoints for a topic. This ticket 
> will allow the topic to store the topic query as well as a macro to be 
> performed with the topic. 
> Macros will be run using Solr's builtin parameter substitution:
> Sample syntax:
> {code}
> topic(collection1, q="*:*", macro="update(classify(model, ${topic}))")
> {code}
> The query and macro will be stored with the topic. Topics can be retrieved 
> and executed as part of the larger macro as using Solr's built in parameter 
> substitution.
> {code}
> http://localhost:8983/solr/collection1/stream?expr=update(classify(model, 
> ${topic}))=topic(collection1,)
> {code}
> Because topics are stored in a SolrCloud collection this will allow for 
> storing millions of topics and macros.
> The parallel function can then be used to run the topics/macros in parallel 
> across a large number of workers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9387) Allow topic expression to store queries and perform actions

2016-09-25 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-9387:
-
Description: 
The topic expression already stores the checkpoints for a topic. This ticket 
will allow the topic to store the topic query as well as a macro to be 
performed with the topic. 

Macros will be run using Solr's builtin parameter substitution:

Sample syntax:

{code}
topic(collection1, q="*:*", macro="update(classify(model, ${topic}))")
{code}

The query and macro will be stored with the topic. Topics can be retrieved and 
executed as part of the larger macro as using Solr's built in parameter 
substitution.

{code}
http://localhost:8983/solr/collection1/stream?expr=update(classify(model, 
${topic}))=topic(collection1,)
{code}

Because topics are stored in a SolrCloud collection this will allow for storing 
millions of topics and macros.

The parallel function can then be used to run the topics/macros in parallel 
across a large number of workers.

  was:
The topic expression already stores the checkpoints for a topic.

This ticket will allow the topic to store the topic query as well. 

Because topics are stored in a SolrCloud collection this will allow for storing 
millions of topics/queries.

The parallel function can then be used to execute the topic queries in parallel 
across a large number of workers.


> Allow topic expression to store queries and perform actions
> ---
>
> Key: SOLR-9387
> URL: https://issues.apache.org/jira/browse/SOLR-9387
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>
> The topic expression already stores the checkpoints for a topic. This ticket 
> will allow the topic to store the topic query as well as a macro to be 
> performed with the topic. 
> Macros will be run using Solr's builtin parameter substitution:
> Sample syntax:
> {code}
> topic(collection1, q="*:*", macro="update(classify(model, ${topic}))")
> {code}
> The query and macro will be stored with the topic. Topics can be retrieved 
> and executed as part of the larger macro as using Solr's built in parameter 
> substitution.
> {code}
> http://localhost:8983/solr/collection1/stream?expr=update(classify(model, 
> ${topic}))=topic(collection1,)
> {code}
> Because topics are stored in a SolrCloud collection this will allow for 
> storing millions of topics and macros.
> The parallel function can then be used to run the topics/macros in parallel 
> across a large number of workers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9387) Allow topic expression to store queries and perform actions

2016-09-25 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-9387:
-
Summary: Allow topic expression to store queries and perform actions  (was: 
Scalable stored queries and alerts with the topic Streaming Expression)

> Allow topic expression to store queries and perform actions
> ---
>
> Key: SOLR-9387
> URL: https://issues.apache.org/jira/browse/SOLR-9387
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>
> The topic expression already stores the checkpoints for a topic.
> This ticket will allow the topic to store the topic query as well. 
> Because topics are stored in a SolrCloud collection this will allow for 
> storing millions of topics/queries.
> The parallel function can then be used to execute the topic queries in 
> parallel across a large number of workers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-9387) Allow topic expression to store queries and perform actions

2016-09-25 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein reassigned SOLR-9387:


Assignee: Joel Bernstein

> Allow topic expression to store queries and perform actions
> ---
>
> Key: SOLR-9387
> URL: https://issues.apache.org/jira/browse/SOLR-9387
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>
> The topic expression already stores the checkpoints for a topic.
> This ticket will allow the topic to store the topic query as well. 
> Because topics are stored in a SolrCloud collection this will allow for 
> storing millions of topics/queries.
> The parallel function can then be used to execute the topic queries in 
> parallel across a large number of workers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-8577) Add AlertStream and ModelStream to the Streaming API

2016-09-25 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein resolved SOLR-8577.
--
Resolution: Duplicate

> Add AlertStream and ModelStream to the Streaming API
> 
>
> Key: SOLR-8577
> URL: https://issues.apache.org/jira/browse/SOLR-8577
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>
> The AlertStream will return the top N "new" documents for a query from a 
> SolrCloud collection. The AlertStream will track the highest version numbers 
> from each shard and use these as checkpoints to determine new content.
> The DaemonStream (SOLR-8550) can be used to create "live" alerts that run at 
> intervals. Sample syntax:
> {code}
> daemon(alert(collection1, q="hello", n="20"), runInterval="2000")
> {code}
> The DaemonStream can be installed in a SolrCloud worker node where it can 
> live and send out alerts.
> *AI Models*
> The *AlertStream* will also accept an optional *ModelStream* which will apply 
> a machine learning model to the alert. For example:
> {code}
> alert(collection1, q="hello", n="20", model(collection2, id="model1"))
> {code}
> The ModelStream will return a machine learning model saved in a SolrCloud 
> collection. Function queries for different model types will be developed so 
> the models can be applied in the re-ranker or as a sort.
> *Taking action*
> Custom decorator streams can be developed that *take actions based on the AI 
> driven alerts*. For example the pseudo code below would run the function 
> *someAction* on the Tuples emitted by the AlertStream.
> {code}
> daemon(someAction(alert(...)))
> {code} 
> *Learning*
> While some SolrCloud worker collections are alerting and taking action, other 
> worker collections can be *learning models* which can be applied for 
> alerting. For example:
> {code}
> daemon(update(logit()))
> {code}
> The pseudo code above calls the LogitStream (SOLR-8492)  which would learn a 
> Logistic Regression model and flow the model into a SolrCloud collection. The 
> model can then be used for alerting and taking action on new data as it 
> enters the system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-master-MacOSX (64bit/jdk1.8.0) - Build # 3563 - Unstable!

2016-09-25 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-MacOSX/3563/
Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC

3 tests failed.
FAILED:  org.apache.solr.cloud.CollectionsAPIDistributedZkTest.test

Error Message:
Timeout occured while waiting response from server at: 
http://127.0.0.1:54849/yf_o/mt

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting 
response from server at: http://127.0.0.1:54849/yf_o/mt
at 
__randomizedtesting.SeedInfo.seed([420ECADA066E5E46:CA5AF500A89233BE]:0)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:619)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:261)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:250)
at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
at 
org.apache.solr.cloud.CollectionsAPIDistributedZkTest.makeRequest(CollectionsAPIDistributedZkTest.java:400)
at 
org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testCollectionsAPI(CollectionsAPIDistributedZkTest.java:898)
at 
org.apache.solr.cloud.CollectionsAPIDistributedZkTest.test(CollectionsAPIDistributedZkTest.java:178)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1764)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:871)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:907)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:921)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:985)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:960)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:809)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:460)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:880)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:781)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:816)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 

[jira] [Commented] (SOLR-9558) DIH TemplateTransformer does not support multiple values

2016-09-25 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15520840#comment-15520840
 ] 

Shalin Shekhar Mangar commented on SOLR-9558:
-

So you want to transform the same column by multiple (different) templates to 
create a multi-valued field with each value being the output of an individual 
template?
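
For instance, with a hypothetical DIH config along these lines (the entity and 
column names are made up for illustration):

{code}
<entity name="person" query="select firstName, lastName from people">
  <field column="aliases" template="${person.firstName}" />
  <field column="aliases" template="${person.firstName} ${person.lastName}" />
</entity>
{code}

so that {{aliases}} would end up holding both rendered values rather than only 
the last one?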

> DIH TemplateTransformer does not support multiple values
> 
>
> Key: SOLR-9558
> URL: https://issues.apache.org/jira/browse/SOLR-9558
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - DataImportHandler
>Affects Versions: 6.2
>Reporter: Ted Sullivan
>Priority: Minor
> Fix For: trunk
>
> Attachments: SOLR-9558.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The DIH TemplateTransformer does not support multiple templates with the same 
> column name. Rather than creating a List of values, as it should do in this 
> case, the value of the last <field> tag with the same column name replaces 
> the values of previous transforms for that column. The reason is that it uses 
> a single HashMap to store the transformations, keyed on the column name. The 
> fix is to detect if a column has previously been transformed within the same 
> field set and to create a List for that column when this occurs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9558) DIH TemplateTransformer does not support multiple values

2016-09-25 Thread Ted Sullivan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Sullivan updated SOLR-9558:
---
Attachment: SOLR-9558.patch

> DIH TemplateTransformer does not support multiple values
> 
>
> Key: SOLR-9558
> URL: https://issues.apache.org/jira/browse/SOLR-9558
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - DataImportHandler
>Affects Versions: 6.2
>Reporter: Ted Sullivan
>Priority: Minor
> Fix For: trunk
>
> Attachments: SOLR-9558.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The DIH TemplateTransformer does not support multiple templates with the same 
> column name. Rather than creating a List of values, as it should do in this 
> case, the value of the last <field> tag with the same column name replaces 
> the values of previous transforms for that column. The reason is that it uses 
> a single HashMap to store the transformations, keyed on the column name. The 
> fix is to detect if a column has previously been transformed within the same 
> field set and to create a List for that column when this occurs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-9558) DIH TemplateTransformer does not support multiple values

2016-09-25 Thread Ted Sullivan (JIRA)
Ted Sullivan created SOLR-9558:
--

 Summary: DIH TemplateTransformer does not support multiple values
 Key: SOLR-9558
 URL: https://issues.apache.org/jira/browse/SOLR-9558
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: contrib - DataImportHandler
Affects Versions: 6.2
Reporter: Ted Sullivan
Priority: Minor
 Fix For: trunk


The DIH TemplateTransformer does not support multiple templates with the same 
column name. Rather than creating a List of values, as it should do in this 
case, the value of the last <field> tag with the same column name replaces the 
values of previous transforms for that column. The reason is that it uses a 
single HashMap to store the transformations, keyed on the column name. The fix 
is to detect if a column has previously been transformed within the same field 
set and to create a List for that column when this occurs.
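
A minimal sketch of the proposed fix (illustrative only; see the attached 
patch for the real change):

{code}
// Sketch: when a column is transformed more than once within the same field
// set, collect the outputs into a List instead of overwriting the prior value.
String resolved = resolver.replaceTokens(template);  // evaluate the template
Object prior = row.get(column);
if (prior == null) {
  row.put(column, resolved);
} else if (prior instanceof List) {
  ((List<Object>) prior).add(resolved);              // already multi-valued
} else {
  List<Object> values = new ArrayList<>();
  values.add(prior);                                 // keep the earlier output
  values.add(resolved);
  row.put(column, values);
}
{code}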





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-9310) PeerSync fails on a node restart due to IndexFingerPrint mismatch

2016-09-25 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15520747#comment-15520747
 ] 

Pushkar Raste edited comment on SOLR-9310 at 9/25/16 12:51 PM:
---

I went through logs at 
https://jenkins.thetaphi.de/job/Lucene-Solr-6.x-MacOSX/429/consoleFull 
If PeerSync was unsuccessful I would expect to see a line like 
{{o.a.s.u.PeerSync Fingerprint comparison: -1}} 

However, I don't see such a line. I can think of two scenarios that could 
break the test: 
* The data directory could get deleted while a node is brought down, since the 
data directory is created in {{temp}}. Upon restart, the replica would have no 
frame of reference and would have to fall back on replication.
* We need a better check than relying on the number of requests made to 
{{ReplicationHandler}}.



was (Author: praste):
I went through logs in the failed test email notification but those are 
truncated. Where can I look at the entire build.log for the test. 

Only thing I could think of at this point is data directory could get deleted 
while a node is brought down, since data directory is created in {{temp}}. Upon 
restart replica would have no frame of reference and will have to fall back on 
replication.



> PeerSync fails on a node restart due to IndexFingerPrint mismatch
> -
>
> Key: SOLR-9310
> URL: https://issues.apache.org/jira/browse/SOLR-9310
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
>Assignee: Noble Paul
> Fix For: 5.5.3, 6.3, trunk
>
> Attachments: PeerSync_3Node_Setup.jpg, PeerSync_Experiment.patch, 
> SOLR-9310.patch, SOLR-9310.patch, SOLR-9310.patch, SOLR-9310.patch, 
> SOLR-9310.patch, SOLR-9310.patch, SOLR-9310_3ReplicaTest.patch, 
> SOLR-9310_5x.patch, SOLR-9310_final.patch
>
>
> I found that PeerSync fails if a node restarts and documents were indexed 
> while the node was down. The IndexFingerPrint check fails after the 
> recovering node applies updates. 
> This happens only when a node restarts, and not if a node just misses updates 
> for a reason other than being down.
> Please check the attached patch for the test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9310) PeerSync fails on a node restart due to IndexFingerPrint mismatch

2016-09-25 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15520747#comment-15520747
 ] 

Pushkar Raste commented on SOLR-9310:
-

I went through the logs in the failed-test email notification, but those are 
truncated. Where can I look at the entire build.log for the test?

The only thing I can think of at this point is that the data directory could 
get deleted while a node is brought down, since the data directory is created 
in {{temp}}. Upon restart, the replica would have no frame of reference and 
would have to fall back on replication.



> PeerSync fails on a node restart due to IndexFingerPrint mismatch
> -
>
> Key: SOLR-9310
> URL: https://issues.apache.org/jira/browse/SOLR-9310
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
>Assignee: Noble Paul
> Fix For: 5.5.3, 6.3, trunk
>
> Attachments: PeerSync_3Node_Setup.jpg, PeerSync_Experiment.patch, 
> SOLR-9310.patch, SOLR-9310.patch, SOLR-9310.patch, SOLR-9310.patch, 
> SOLR-9310.patch, SOLR-9310.patch, SOLR-9310_3ReplicaTest.patch, 
> SOLR-9310_5x.patch, SOLR-9310_final.patch
>
>
> I found that PeerSync fails if a node restarts and documents were indexed 
> while the node was down. The IndexFingerPrint check fails after the 
> recovering node applies updates. 
> This happens only when a node restarts, and not if a node just misses updates 
> for a reason other than being down.
> Please check the attached patch for the test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-7452) improve exception message: child query must only match non-parent docs, but parent docID=180314...

2016-09-25 Thread Mikhail Khludnev (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev resolved LUCENE-7452.
--
Resolution: Fixed

> improve exception message: child query must only match non-parent docs, but 
> parent docID=180314...
> --
>
> Key: LUCENE-7452
> URL: https://issues.apache.org/jira/browse/LUCENE-7452
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 6.2
>Reporter: Mikhail Khludnev
>Assignee: Mikhail Khludnev
>Priority: Minor
> Fix For: master (7.0), 6.3
>
> Attachments: LUCENE-7452.patch
>
>
> When the parent filter intersects with the child query, the exception exposes 
> internal details: the doc number and the scorer class. I propose that the 
> exception message suggest executing a query that intersects them both. There 
> is an opinion that this suggestion should be added alongside the existing 
> details. 
> My main concern against that is: when the index is constantly updated, even 
> though SOLR-9582 allows searching for a doc number, it would be like catching 
> the wind; also consider the cloud case. But a user advised to execute the 
> query intersection can catch problem documents even if they occur only 
> sporadically.
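
For illustration, the kind of message the ticket aims for (the wording below 
is a sketch, not the committed text):

{code}
// Hypothetical replacement message: point the user at a query they can run
// instead of only exposing the internal doc number and scorer class.
throw new IllegalStateException(
    "Child query must not match same docs with parent filter. "
        + "Combine them as must (+) clauses to find a problem doc. "
        + "docID=" + doc + ", class=" + childScorer.getClass().getName());
{code}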



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7452) improve exception message: child query must only match non-parent docs, but parent docID=180314...

2016-09-25 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15520705#comment-15520705
 ] 

Mikhail Khludnev commented on LUCENE-7452:
--

Thanks, Alexandre!

> improve exception message: child query must only match non-parent docs, but 
> parent docID=180314...
> --
>
> Key: LUCENE-7452
> URL: https://issues.apache.org/jira/browse/LUCENE-7452
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 6.2
>Reporter: Mikhail Khludnev
>Assignee: Mikhail Khludnev
>Priority: Minor
> Fix For: master (7.0), 6.3
>
> Attachments: LUCENE-7452.patch
>
>
> When the parent filter intersects with the child query, the exception exposes 
> internal details: the doc number and the scorer class. I propose that the 
> exception message suggest executing a query that intersects them both. There 
> is an opinion that this suggestion should be added alongside the existing 
> details. 
> My main concern against that is: when the index is constantly updated, even 
> though SOLR-9582 allows searching for a doc number, it would be like catching 
> the wind; also consider the cloud case. But a user advised to execute the 
> query intersection can catch problem documents even if they occur only 
> sporadically.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-7465) Add a PatternTokenizer that uses Lucene's RegExp implementation

2016-09-25 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-7465:
---
Attachment: LUCENE-7465.patch

Patch.

I added the {{SimplePatternTokenizerFactory}} as well.

{{SimplePatternTokenizer}} takes either a String (parsed as a regexp and 
compiled) or a DFA (for the expert user who has pre-built their own 
automaton).

I folded in a nice idea from [~rcmuir] to optimize for ASCII code points even 
when using {{CharacterRunAutomaton}}.

It's quite fast, ~46% faster than {{PatternTokenizer}} when tokenizing
1 MB chunks from the English Wikipedia export, using a simplistic
whitespace regexp {{\[^ \t\r\n]+}}.

And it's nice that it doesn't read the entire input into heap!
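
A quick usage sketch (assuming the String-parsing constructor described above; 
not verbatim from the patch):

{code}
// Tokenize runs of non-whitespace; the pattern is compiled to a DFA up front,
// so a "too hard" regexp fails here rather than on some later document.
Tokenizer tok = new SimplePatternTokenizer("[^ \t\r\n]+");
tok.setReader(new StringReader("hello   simple pattern"));
CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
tok.reset();
while (tok.incrementToken()) {
  System.out.println(term);   // hello, simple, pattern
}
tok.end();
tok.close();
{code}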


> Add a PatternTokenizer that uses Lucene's RegExp implementation
> ---
>
> Key: LUCENE-7465
> URL: https://issues.apache.org/jira/browse/LUCENE-7465
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: master (7.0), 6.3
>
> Attachments: LUCENE-7465.patch
>
>
> I think there are some nice benefits to a version of PatternTokenizer that 
> uses Lucene's RegExp impl instead of the JDK's:
>   * Lucene's RegExp is compiled to a DFA up front, so if a "too hard" RegExp 
> is attempted the user discovers it up front instead of later on when a 
> "lucky" document arrives
>   * It processes the incoming characters as a stream, only pulling 128 
> characters at a time, vs the existing {{PatternTokenizer}} which currently 
> reads the entire string up front (this has caused heap problems in the past)
>   * It should be fast.
> I named it {{SimplePatternTokenizer}}, and it still needs a factory and 
> improved tests, but I think it's otherwise close.
> It currently does not take a {{group}} parameter because Lucene's RegExps 
> don't yet implement sub group capture.  I think we could add that at some 
> point, but it's a bit tricky.
> This doesn't even have group=-1 support (like String.split) ... I think if we 
> did that we should maybe name it differently 
> ({{SimplePatternSplitTokenizer}}?).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org