[JENKINS] Lucene-Solr-master-MacOSX (64bit/jdk1.8.0) - Build # 3409 - Still Failing!

2016-07-12 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-master-MacOSX/3409/
Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseSerialGC

1 tests failed.
FAILED:  org.apache.solr.cloud.TestLocalFSCloudBackupRestore.test

Error Message:
Error from server at http://127.0.0.1:52154/solr: 'location' is not specified 
as a query parameter or as a default repository property or as a cluster 
property.

Stack Trace:
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
from server at http://127.0.0.1:52154/solr: 'location' is not specified as a 
query parameter or as a default repository property or as a cluster property.
at 
__randomizedtesting.SeedInfo.seed([60B754BD1A87A703:E8E36B67B47BCAFB]:0)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:606)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:259)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248)
at 
org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:413)
at 
org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:366)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1270)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1040)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:976)
at 
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:149)
at 
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:166)
at 
org.apache.solr.cloud.AbstractCloudBackupRestoreTestCase.testInvalidPath(AbstractCloudBackupRestoreTestCase.java:149)
at 
org.apache.solr.cloud.AbstractCloudBackupRestoreTestCase.test(AbstractCloudBackupRestoreTestCase.java:128)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1764)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:871)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:907)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:921)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:809)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:460)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:880)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:781)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:816)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 

[jira] [Closed] (LUCENE-533) SpanQuery scoring: SpanWeight lacks a recursive traversal of the query tree

2016-07-12 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley closed LUCENE-533.
---
Resolution: Fixed

I believe this is finally no longer an issue, so I'm closing it.  Perhaps 
LUCENE-2880 (in Lucene 5.3) was the one to resolve it; if not, I'm pretty 
sure some other issue in the 5.x series did.  createWeight() impls do a tree 
traversal, and they are also weighted by a customizable boost factor -- 
SpanBoostQuery.
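
As a minimal sketch (not from this issue) of what that looks like in the 
current API -- SpanBoostQuery wraps any SpanQuery, and weight creation 
descends into the clauses:

{code}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanBoostQuery;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class SpanBoostExample {
  public static SpanQuery boostedNear() {
    // Boost one leaf of the span tree; the boost is no longer lost, because
    // createWeight() now traverses the sub-queries.
    SpanQuery lhs = new SpanBoostQuery(
        new SpanTermQuery(new Term("body", "lucene")), 2.0f);
    SpanQuery rhs = new SpanTermQuery(new Term("body", "scoring"));
    return new SpanNearQuery(new SpanQuery[] { lhs, rhs }, 5, true);
  }
}
{code}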

> SpanQuery scoring: SpanWeight lacks a recursive traversal of the query tree
> ---
>
> Key: LUCENE-533
> URL: https://issues.apache.org/jira/browse/LUCENE-533
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 1.9
>Reporter: Vincent Le Maout
>Priority: Minor
>
> I found the computing of weights to be somewhat different according to the 
> query type (BooleanQuery versus SpanQuery):
>
> org.apache.lucene.search.BooleanQuery.BooleanWeight:
>
>   public BooleanWeight(Searcher searcher) throws IOException {
>     this.similarity = getSimilarity(searcher);
>     for (int i = 0; i < clauses.size(); i++) {
>       BooleanClause c = (BooleanClause) clauses.elementAt(i);
>       weights.add(c.getQuery().createWeight(searcher));
>     }
>   }
>
> which looks like a recursive descent through the tree, taking into account 
> the weights of all the nodes, whereas:
>
> org.apache.lucene.search.spans.SpanWeight:
>
>   public SpanWeight(SpanQuery query, Searcher searcher) throws IOException {
>     this.similarity = query.getSimilarity(searcher);
>     this.query = query;
>     this.terms = query.getTerms();
>     idf = this.query.getSimilarity(searcher).idf(terms, searcher);
>   }
>
> lacks any traversal and, from what I have understood so far of the rest of 
> the code, only takes into account the boost of the tree root in 
> sumOfSquaredWeights(), which is consistent with the resulting scores not 
> considering the boosts of the tree leaves.
> vintz
> vintz



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-master-Linux (32bit/jdk1.8.0_92) - Build # 17234 - Still Failing!

2016-07-12 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/17234/
Java: 32bit/jdk1.8.0_92 -server -XX:+UseG1GC

3 tests failed.
FAILED:  
junit.framework.TestSuite.org.apache.solr.cloud.overseer.ZkStateWriterTest

Error Message:
1 thread leaked from SUITE scope at 
org.apache.solr.cloud.overseer.ZkStateWriterTest: 
   1) Thread[id=2116, name=watches-310-thread-1, state=TIMED_WAITING, 
group=TGRP-ZkStateWriterTest]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
        at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
        at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Stack Trace:
com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE 
scope at org.apache.solr.cloud.overseer.ZkStateWriterTest: 
   1) Thread[id=2116, name=watches-310-thread-1, state=TIMED_WAITING, 
group=TGRP-ZkStateWriterTest]
at sun.misc.Unsafe.park(Native Method)
at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at 
java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
at 
java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
at __randomizedtesting.SeedInfo.seed([C4BA732CCA914D51]:0)


FAILED:  
junit.framework.TestSuite.org.apache.solr.cloud.overseer.ZkStateWriterTest

Error Message:
There are still zombie threads that couldn't be terminated:
   1) Thread[id=2116, name=watches-310-thread-1, state=TIMED_WAITING, 
group=TGRP-ZkStateWriterTest]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
        at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
        at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Stack Trace:
com.carrotsearch.randomizedtesting.ThreadLeakError: There are still zombie 
threads that couldn't be terminated:
   1) Thread[id=2116, name=watches-310-thread-1, state=TIMED_WAITING, 
group=TGRP-ZkStateWriterTest]
at sun.misc.Unsafe.park(Native Method)
at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at 
java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
at 
java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
at __randomizedtesting.SeedInfo.seed([C4BA732CCA914D51]:0)


FAILED:  org.apache.solr.cloud.TestLocalFSCloudBackupRestore.test

Error Message:
Error from server at http://127.0.0.1:38510/solr: 'location' is not specified 
as a query parameter or as a default repository property or as a cluster 
property.

Stack Trace:
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
from server at http://127.0.0.1:38510/solr: 'location' is not specified as a 
query parameter or as a default repository property or as a cluster property.
at 
__randomizedtesting.SeedInfo.seed([C4BA732CCA914D51:4CEE4CF6646D20A9]:0)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:606)
at 

[jira] [Commented] (SOLR-9285) ArrayIndexOutOfBoundsException when ValueSourceAugmenter used with RTG on uncommitted doc

2016-07-12 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374299#comment-15374299
 ] 

Yonik Seeley commented on SOLR-9285:


{quote}
- refactors the ulog.openRealtimeSearcher() logic in RTG Component to also 
apply when there are transformers
- Adds a new lightweight (private) RTGResultContext class for wrapping 
realtime {{SolrIndexSearcher}}s for use with transformers.
{quote}

+1, this sounds like the right approach.

{quote}
- Fixes ValueSourceAugmenter.setContext to use ResultContext.getSearcher() 
instead of the one that comes from the request
   Independent of anything else, this seems like a bug 
{quote}

Yep.  ResultContext having a getSearcher is relatively new... I added it to 
support SOLR-7830, but I obviously didn't find all the places that needed 
changing.
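
As a toy illustration of the bug and the fix (hypothetical, simplified types 
-- not the real Solr classes): the transformer has to take its searcher from 
the ResultContext, which RTG can point at a freshly opened realtime searcher, 
instead of caching the one from the original request:

{code}
// All types here are illustrative stand-ins, not Solr's real API.
class Searcher { }
class Request { Searcher searcher = new Searcher(); }

class ResultContext {
  Request request = new Request();
  Searcher searcher = new Searcher();   // RTG may swap in a realtime searcher
  Searcher getSearcher() { return searcher; }
}

class ValueSourceAugmenterSketch {
  Searcher searcher;

  void setContext(ResultContext ctx) {
    // Fixed: use the context's searcher, which can see uncommitted docs...
    this.searcher = ctx.getSearcher();
    // ...not ctx.request.searcher, whose view predates the uncommitted doc
    // and triggers the ArrayIndexOutOfBoundsException described here.
  }
}
{code}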

> ArrayIndexOutOfBoundsException when ValueSourceAugmenter used with RTG on 
> uncommitted doc
> -
>
> Key: SOLR-9285
> URL: https://issues.apache.org/jira/browse/SOLR-9285
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Hoss Man
> Attachments: SOLR-9285.patch, SOLR-9285.patch
>
>
> Found in SOLR-9180 testing.
> Even in single node solr envs, doing an RTG for an uncommitted doc that uses 
> ValueSourceAugmenter (ie: simple field aliasing, or functions in fl) causes 
> an ArrayIndexOutOfBoundsException



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9252) Feature selection and logistic regression on text

2016-07-12 Thread Cao Manh Dat (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cao Manh Dat updated SOLR-9252:
---
Attachment: SOLR-9252.patch

Updated patch based on [~joel.bernstein]'s suggestion about numDocs().

> Feature selection and logistic regression on text
> -
>
> Key: SOLR-9252
> URL: https://issues.apache.org/jira/browse/SOLR-9252
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Joel Bernstein
> Attachments: SOLR-9252.patch, SOLR-9252.patch, SOLR-9252.patch, 
> SOLR-9252.patch, enron1.zip
>
>
> SOLR-9186 came up with the challenge that for each iteration we have to 
> rebuild the tf-idf vector for each document. That is a costly computation if 
> we represent a doc by a lot of terms. Feature selection can help reduce the 
> computation.
> Due to its computational efficiency and simple interpretation, information 
> gain is one of the most popular feature selection methods. It is used to 
> measure the dependence between features and labels and calculates the 
> information gain between the i-th feature and the class labels 
> (http://www.jiliang.xyz/publication/feature_selection_for_classification.pdf).
> I confirmed this by running logistic regression on the Enron mail dataset (in 
> which each email is represented by the top 100 terms with the highest 
> information gain) and got 92% accuracy and 82% precision.
> This ticket will create two new streaming expressions. Both of them use the 
> same *parallel iterative framework* as SOLR-8492.
> {code}
> featuresSelection(collection1, q="*:*",  field="tv_text", outcome="out_i", 
> positiveLabel=1, numTerms=100)
> {code}
> featuresSelection will emit the top terms with the highest information gain 
> scores. It can be combined with the new tlogit stream.
> {code}
> tlogit(collection1, q="*:*",
>  featuresSelection(collection1, 
>   q="*:*",  
>   field="tv_text", 
>   outcome="out_i", 
>   positiveLabel=1, 
>   numTerms=100),
>  field="tv_text",
>  outcome="out_i",
>  maxIterations=100)
> {code}
> In iteration n, the text logistic regression will emit the nth model and 
> compute the error of the (n-1)th model, because the error would be wrong if 
> we computed it dynamically within the same iteration.
> In each iteration tlogit will adjust the learning rate based on the error of 
> the previous iteration: it will increase the learning rate by 5% if the error 
> is going down, and decrease it by 50% if the error is going up.
> This will support use cases such as building models for spam detection, 
> sentiment analysis and threat detection. 
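
The learning-rate rule described above, as a minimal sketch (illustrative 
names only, not code from the patch):

{code}
// Adaptive learning rate sketch: +5% when the error decreased since the
// previous iteration, -50% when it increased.
public class LearningRateSketch {
  public static double next(double rate, double prevError, double error) {
    return (error < prevError) ? rate * 1.05 : rate * 0.5;
  }
}
{code}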



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-6.x-Windows (32bit/jdk1.8.0_92) - Build # 318 - Still Failing!

2016-07-12 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Windows/318/
Java: 32bit/jdk1.8.0_92 -client -XX:+UseG1GC

1 tests failed.
FAILED:  org.apache.solr.cloud.TestLocalFSCloudBackupRestore.test

Error Message:
Error from server at https://127.0.0.1:61976/solr: The backup directory already 
exists: 
file:///C:/Users/jenkins/workspace/Lucene-Solr-6.x-Windows/solr/build/solr-core/test/J1/temp/solr.cloud.TestLocalFSCloudBackupRestore_73AE08B1001AA66F-001/tempDir-002/mytestbackup/

Stack Trace:
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
from server at https://127.0.0.1:61976/solr: The backup directory already 
exists: 
file:///C:/Users/jenkins/workspace/Lucene-Solr-6.x-Windows/solr/build/solr-core/test/J1/temp/solr.cloud.TestLocalFSCloudBackupRestore_73AE08B1001AA66F-001/tempDir-002/mytestbackup/
at 
__randomizedtesting.SeedInfo.seed([73AE08B1001AA66F:FBFA376BAEE6CB97]:0)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:590)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:259)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248)
at 
org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:403)
at 
org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:356)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1228)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:998)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:934)
at 
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:149)
at 
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:166)
at 
org.apache.solr.cloud.AbstractCloudBackupRestoreTestCase.testBackupAndRestore(AbstractCloudBackupRestoreTestCase.java:207)
at 
org.apache.solr.cloud.AbstractCloudBackupRestoreTestCase.test(AbstractCloudBackupRestoreTestCase.java:127)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1764)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:871)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:907)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:921)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:809)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:460)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:880)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:781)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:816)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 

Re: Converting double values to char[]

2016-07-12 Thread Erick Erickson
Might have cracked it, I'll know more by tomorrow


On Tue, Jul 12, 2016 at 4:54 PM, Erick Erickson  wrote:
> Right, thanks. I had no intention of actually committing any of this
> in, I'm just trying to figure out if this is worth the effort with a
> quick hack to take some measurements. That said I'll be more cautious
> now about making any patches.
>
> I'll add that preliminarily, even on just looking at string fields
> it's looking promising. We can write a CharsRef without having to go
> through a String operation or hacking any code.
>
> Erick
>
> On Tue, Jul 12, 2016 at 4:31 PM, Yonik Seeley  wrote:
>> On Tue, Jul 12, 2016 at 7:22 PM, Erick Erickson  
>> wrote:
>>> Before I launch off into ripping off the toString code from, say, Float
>>> and Double types
>>
>> Please look into licensing before you do that...
>> Or rip off from Geronimo, which is ASL
>>
>> -Yonik
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-7280) Load cores in sorted order and tweak coreLoadThread counts to improve cluster stability on restarts

2016-07-12 Thread damien kamerman (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374195#comment-15374195
 ] 

damien kamerman commented on SOLR-7280:
---

Or, ensure that coreLoadThreads is >= max(a collection's replicas on a single 
node)?

> Load cores in sorted order and tweak coreLoadThread counts to improve cluster 
> stability on restarts
> ---
>
> Key: SOLR-7280
> URL: https://issues.apache.org/jira/browse/SOLR-7280
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Reporter: Shalin Shekhar Mangar
>Assignee: Noble Paul
> Fix For: 5.2, 6.0
>
> Attachments: SOLR-7280.patch
>
>
> In SOLR-7191, Damien mentioned that by loading solr cores in a sorted order 
> and tweaking some of the coreLoadThread counts, he was able to improve the 
> stability of a cluster with thousands of collections. We should explore some 
> of these changes and fold them into Solr.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9285) ArrayIndexOutOfBoundsException when ValueSourceAugmenter used with RTG on uncommitted doc

2016-07-12 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-9285:
---
Attachment: SOLR-9285.patch

bq. Reviewing the RTG Component code also makes me realize that in general we 
should have more RTG+transformer tests which:

Updated patch with a randomized test along these lines ... still some holes to 
fill in, but mostly feature complete.

> ArrayIndexOutOfBoundsException when ValueSourceAugmenter used with RTG on 
> uncommitted doc
> -
>
> Key: SOLR-9285
> URL: https://issues.apache.org/jira/browse/SOLR-9285
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Hoss Man
> Attachments: SOLR-9285.patch, SOLR-9285.patch
>
>
> Found in SOLR-9180 testing.
> Even in single node solr envs, doing an RTG for an uncommitted doc that uses 
> ValueSourceAugmenter (ie: simple field aliasing, or functions in fl) causes 
> an ArrayIndexOutOfBoundsException



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS-EA] Lucene-Solr-master-Linux (32bit/jdk-9-ea+126) - Build # 17233 - Still Failing!

2016-07-12 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/17233/
Java: 32bit/jdk-9-ea+126 -server -XX:+UseSerialGC

3 tests failed.
FAILED:  
junit.framework.TestSuite.org.apache.solr.cloud.CollectionsAPIDistributedZkTest

Error Message:
9 threads leaked from SUITE scope at 
org.apache.solr.cloud.CollectionsAPIDistributedZkTest: 
   1) Thread[id=10410, name=Connection evictor, state=TIMED_WAITING, 
group=TGRP-CollectionsAPIDistributedZkTest]
        at java.lang.Thread.sleep(java.base@9-ea/Native Method)
        at org.apache.http.impl.client.IdleConnectionEvictor$1.run(IdleConnectionEvictor.java:66)
        at java.lang.Thread.run(java.base@9-ea/Thread.java:843)
   2) Thread[id=10460, name=zkCallback-1848-thread-3, state=TIMED_WAITING, 
group=TGRP-CollectionsAPIDistributedZkTest]
        at jdk.internal.misc.Unsafe.park(java.base@9-ea/Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(java.base@9-ea/LockSupport.java:230)
        at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(java.base@9-ea/SynchronousQueue.java:461)
        at java.util.concurrent.SynchronousQueue$TransferStack.transfer(java.base@9-ea/SynchronousQueue.java:362)
        at java.util.concurrent.SynchronousQueue.poll(java.base@9-ea/SynchronousQueue.java:937)
        at java.util.concurrent.ThreadPoolExecutor.getTask(java.base@9-ea/ThreadPoolExecutor.java:1082)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@9-ea/ThreadPoolExecutor.java:1143)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@9-ea/ThreadPoolExecutor.java:632)
        at java.lang.Thread.run(java.base@9-ea/Thread.java:843)
   3) Thread[id=10484, name=zkCallback-1848-thread-4, state=TIMED_WAITING, 
group=TGRP-CollectionsAPIDistributedZkTest]
        at jdk.internal.misc.Unsafe.park(java.base@9-ea/Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(java.base@9-ea/LockSupport.java:230)
        at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(java.base@9-ea/SynchronousQueue.java:461)
        at java.util.concurrent.SynchronousQueue$TransferStack.transfer(java.base@9-ea/SynchronousQueue.java:362)
        at java.util.concurrent.SynchronousQueue.poll(java.base@9-ea/SynchronousQueue.java:937)
        at java.util.concurrent.ThreadPoolExecutor.getTask(java.base@9-ea/ThreadPoolExecutor.java:1082)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@9-ea/ThreadPoolExecutor.java:1143)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@9-ea/ThreadPoolExecutor.java:632)
        at java.lang.Thread.run(java.base@9-ea/Thread.java:843)
   4) Thread[id=10545, name=zkCallback-1848-thread-6, state=TIMED_WAITING, 
group=TGRP-CollectionsAPIDistributedZkTest]
        at jdk.internal.misc.Unsafe.park(java.base@9-ea/Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(java.base@9-ea/LockSupport.java:230)
        at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(java.base@9-ea/SynchronousQueue.java:461)
        at java.util.concurrent.SynchronousQueue$TransferStack.transfer(java.base@9-ea/SynchronousQueue.java:362)
        at java.util.concurrent.SynchronousQueue.poll(java.base@9-ea/SynchronousQueue.java:937)
        at java.util.concurrent.ThreadPoolExecutor.getTask(java.base@9-ea/ThreadPoolExecutor.java:1082)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@9-ea/ThreadPoolExecutor.java:1143)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@9-ea/ThreadPoolExecutor.java:632)
        at java.lang.Thread.run(java.base@9-ea/Thread.java:843)
   5) Thread[id=10412, 
name=TEST-CollectionsAPIDistributedZkTest.test-seed#[99712609A1052BEE]-EventThread,
 state=WAITING, group=TGRP-CollectionsAPIDistributedZkTest]
        at jdk.internal.misc.Unsafe.park(java.base@9-ea/Native Method)
        at java.util.concurrent.locks.LockSupport.park(java.base@9-ea/LockSupport.java:190)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@9-ea/AbstractQueuedSynchronizer.java:2064)
        at java.util.concurrent.LinkedBlockingQueue.take(java.base@9-ea/LinkedBlockingQueue.java:442)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:494)
   6) Thread[id=10509, name=zkCallback-1848-thread-5, state=TIMED_WAITING, 
group=TGRP-CollectionsAPIDistributedZkTest]
        at jdk.internal.misc.Unsafe.park(java.base@9-ea/Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(java.base@9-ea/LockSupport.java:230)
        at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(java.base@9-ea/SynchronousQueue.java:461)
        at java.util.concurrent.SynchronousQueue$TransferStack.transfer(java.base@9-ea/SynchronousQueue.java:362)
        at 

[jira] [Commented] (SOLR-7282) Cache config or index schema objects by configset and share them across cores

2016-07-12 Thread damien kamerman (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374179#comment-15374179
 ] 

damien kamerman commented on SOLR-7282:
---

What about caching the IndexSchema object only when the schemaFactory is the 
(immutable) ClassicIndexSchemaFactory?
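
For illustration, the kind of configset-keyed sharing being discussed might 
look like this sketch (hypothetical names, not the real Solr types):

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Sketch only: share one parsed schema per configset, and only when the
// schema is immutable (e.g. produced by ClassicIndexSchemaFactory).
public class SchemaCacheSketch<S> {
  private final ConcurrentMap<String, S> byConfigSet = new ConcurrentHashMap<>();

  public S schemaFor(String configSet, boolean immutable, Function<String, S> parse) {
    if (!immutable) {
      return parse.apply(configSet);   // mutable schemas must not be shared
    }
    return byConfigSet.computeIfAbsent(configSet, parse);  // parse at most once
  }
}
{code}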

> Cache config or index schema objects by configset and share them across cores
> -
>
> Key: SOLR-7282
> URL: https://issues.apache.org/jira/browse/SOLR-7282
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrCloud
>Reporter: Shalin Shekhar Mangar
>Assignee: Noble Paul
> Fix For: 5.2, 6.0
>
> Attachments: SOLR-7282.patch
>
>
> Sharing schema and config objects has been known to improve startup 
> performance when a large number of cores are on the same box (See 
> http://wiki.apache.org/solr/LotsOfCores). Damien also saw improvements to 
> cluster startup speed upon caching the index schema in SOLR-7191.
> Now that SolrCloud configuration is based on config sets in ZK, we should 
> explore how we can minimize config/schema parsing for each core in a way that 
> is compatible with the recent/planned changes in the config and schema APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-master-Windows (64bit/jdk1.8.0_92) - Build # 5981 - Still Failing!

2016-07-12 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-master-Windows/5981/
Java: 64bit/jdk1.8.0_92 -XX:-UseCompressedOops -XX:+UseParallelGC

4 tests failed.
FAILED:  junit.framework.TestSuite.org.apache.lucene.search.TestBoolean2

Error Message:
Java heap space

Stack Trace:
java.lang.OutOfMemoryError: Java heap space
at __randomizedtesting.SeedInfo.seed([7C3A3DE7DC768814]:0)
at java.util.Arrays.copyOf(Arrays.java:3308)
at 
org.apache.lucene.codecs.compressing.CompressingStoredFieldsIndexReader.(CompressingStoredFieldsIndexReader.java:106)
at 
org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.(CompressingStoredFieldsReader.java:135)
at 
org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsReader(CompressingStoredFieldsFormat.java:121)
at 
org.apache.lucene.index.SegmentCoreReaders.(SegmentCoreReaders.java:119)
at org.apache.lucene.index.SegmentReader.(SegmentReader.java:74)
at 
org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:62)
at 
org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:54)
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:675)
at 
org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:77)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:63)
at 
org.apache.lucene.index.RandomIndexWriter.mockIndexWriter(RandomIndexWriter.java:73)
at 
org.apache.lucene.index.RandomIndexWriter.mockIndexWriter(RandomIndexWriter.java:56)
at 
org.apache.lucene.index.RandomIndexWriter.(RandomIndexWriter.java:113)
at 
org.apache.lucene.index.RandomIndexWriter.(RandomIndexWriter.java:107)
at 
org.apache.lucene.search.TestBoolean2.beforeClass(TestBoolean2.java:149)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1764)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:811)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:827)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)


FAILED:  junit.framework.TestSuite.org.apache.lucene.search.TestBoolean2

Error Message:


Stack Trace:
java.lang.NullPointerException
at __randomizedtesting.SeedInfo.seed([7C3A3DE7DC768814]:0)
at 
org.apache.lucene.search.TestBoolean2.afterClass(TestBoolean2.java:167)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1764)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:834)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 

[jira] [Updated] (SOLR-8714) Implement translation contrib package for LanguageTranslationUpdateProcessor's

2016-07-12 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated SOLR-8714:
---
Fix Version/s: (was: 6.0)
   6.2

> Implement translation contrib package for LanguageTranslationUpdateProcessor's
> --
>
> Key: SOLR-8714
> URL: https://issues.apache.org/jira/browse/SOLR-8714
> Project: Solr
>  Issue Type: New Feature
>Reporter: Lewis John McGibbney
>Assignee: Tommaso Teofili
> Fix For: 6.2
>
>
> A while back over in Tika we implemented the 
> [Translator|https://github.com/apache/tika/blob/master/tika-core/src/main/java/org/apache/tika/language/translate/Translator.java]
>  interface. This now provides a number of 
> [implementations|https://github.com/apache/tika/tree/master/tika-translate/src/main/java/org/apache/tika/language/translate].
>  
> This issue will provide a translation contrib package offering a 
> LanguageTranslationUpdateProcessor.
> The new processor will probably utilize the existing [Solr Language 
> Identifier|https://github.com/apache/lucene-solr/tree/master/solr/contrib/langid]
>  and would enable a document to be translated based upon a user-defined 
> mapping. The LanguageTranslatorUpdateProcessors should be pluggable and 
> would be placed in an UpdateChain the same way as the 
> [LanguageIdentifierUpdateProcessor|https://github.com/apache/lucene-solr/blob/master/solr/contrib/langid/src/java/org/apache/solr/update/processor/LanguageIdentifierUpdateProcessor.java]
> It is my intent to also provide a wiki page which can be referenced and 
> maintained in conjunction with the code. 
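
As a rough sketch of where such a processor would sit (hypothetical names and 
simplified types -- not the real UpdateRequestProcessor API):

{code}
import java.util.HashMap;
import java.util.Map;

// Sketch: runs after language identification in the update chain; translates
// mapped fields into their configured target language before indexing.
public class TranslationProcessorSketch {
  private final Map<String, String> fieldToTargetLang = new HashMap<>();

  public TranslationProcessorSketch() {
    fieldToTargetLang.put("title", "en");   // user-defined mapping
  }

  /** 'language_s' is assumed filled by the langid processor upstream. */
  public void processAdd(Map<String, Object> doc) {
    String detected = (String) doc.get("language_s");
    for (Map.Entry<String, String> e : fieldToTargetLang.entrySet()) {
      String src = (String) doc.get(e.getKey());
      if (src != null && detected != null && !detected.equals(e.getValue())) {
        doc.put(e.getKey() + "_" + e.getValue(),
                translate(src, detected, e.getValue()));
      }
    }
  }

  private String translate(String text, String from, String to) {
    return text;   // a real impl would delegate to a Tika Translator
  }
}
{code}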



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS-EA] Lucene-Solr-master-Linux (32bit/jdk-9-ea+126) - Build # 17232 - Still Failing!

2016-07-12 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/17232/
Java: 32bit/jdk-9-ea+126 -client -XX:+UseSerialGC

2 tests failed.
FAILED:  org.apache.solr.handler.TestReqParamsAPI.test

Error Message:
Could not get expected value  'CY val' for path 'response/params/y/c' full 
output: {
  "responseHeader":{
    "status":0,
    "QTime":0},
  "response":{
    "znodeVersion":0,
    "params":{"x":{
        "a":"A val",
        "b":"B val",
        "":{"v":0},  from server:  https://127.0.0.1:46639/collection1

Stack Trace:
java.lang.AssertionError: Could not get expected value  'CY val' for path 
'response/params/y/c' full output: {
  "responseHeader":{
"status":0,
"QTime":0},
  "response":{
"znodeVersion":0,
"params":{"x":{
"a":"A val",
"b":"B val",
"":{"v":0},  from server:  https://127.0.0.1:46639/collection1
at 
__randomizedtesting.SeedInfo.seed([3FEF165720DF6CC6:B7BB298D8E23013E]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at 
org.apache.solr.core.TestSolrConfigHandler.testForResponseElement(TestSolrConfigHandler.java:481)
at 
org.apache.solr.handler.TestReqParamsAPI.testReqParams(TestReqParamsAPI.java:159)
at 
org.apache.solr.handler.TestReqParamsAPI.test(TestReqParamsAPI.java:61)
at 
jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@9-ea/Native 
Method)
at 
jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@9-ea/NativeMethodAccessorImpl.java:62)
at 
jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@9-ea/DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(java.base@9-ea/Method.java:533)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1764)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:871)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:907)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:921)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:985)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:960)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:809)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:460)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:880)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:781)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:816)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 

[JENKINS] Lucene-Solr-master-Solaris (64bit/jdk1.8.0) - Build # 715 - Still Failing!

2016-07-12 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-master-Solaris/715/
Java: 64bit/jdk1.8.0 -XX:-UseCompressedOops -XX:+UseParallelGC

3 tests failed.
FAILED:  org.apache.solr.cloud.CollectionStateFormat2Test.test

Error Message:
Could not find collection:.system

Stack Trace:
java.lang.AssertionError: Could not find collection:.system
at 
__randomizedtesting.SeedInfo.seed([4C6FC1BE3225788F:C43BFE649CD91577]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertNotNull(Assert.java:526)
at 
org.apache.solr.cloud.AbstractDistribZkTestBase.waitForRecoveriesToFinish(AbstractDistribZkTestBase.java:154)
at 
org.apache.solr.cloud.AbstractDistribZkTestBase.waitForRecoveriesToFinish(AbstractDistribZkTestBase.java:139)
at 
org.apache.solr.cloud.AbstractDistribZkTestBase.waitForRecoveriesToFinish(AbstractDistribZkTestBase.java:134)
at 
org.apache.solr.cloud.AbstractFullDistribZkTestBase.waitForRecoveriesToFinish(AbstractFullDistribZkTestBase.java:856)
at 
org.apache.solr.cloud.CollectionStateFormat2Test.testConfNameAndCollectionNameSame(CollectionStateFormat2Test.java:53)
at 
org.apache.solr.cloud.CollectionStateFormat2Test.test(CollectionStateFormat2Test.java:40)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1764)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:871)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:907)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:921)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:985)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:960)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:809)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:460)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:880)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:781)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:816)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 

Re: Converting double values to char[]

2016-07-12 Thread Erick Erickson
Right, thanks. I had no intention of actually committing any of this
in, I'm just trying to figure out if this is worth the effort with a
quick hack to take some measurements. That said I'll be more cautious
now about making any patches.

I'll add that preliminarily, even on just looking at string fields
it's looking promising. We can write a CharsRef without having to go
through a String operation or hacking any code.

Erick

On Tue, Jul 12, 2016 at 4:31 PM, Yonik Seeley  wrote:
> On Tue, Jul 12, 2016 at 7:22 PM, Erick Erickson  
> wrote:
>> Before I launch off into ripping off the toString code from, say, Float
>> and Double types
>
> Please look into licensing before you do that...
> Or rip off from Geronimo, which is ASL
>
> -Yonik
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Converting double values to char[]

2016-07-12 Thread Yonik Seeley
On Tue, Jul 12, 2016 at 7:22 PM, Erick Erickson  wrote:
> Before I launch off into ripping off the toString code from, say, Float
> and Double types

Please look into licensing before you do that...
Or rip off from Geronimo, which is ASL

-Yonik

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Converting double values to char[]

2016-07-12 Thread Erick Erickson
Is there any pre-existing art to convert from a numeric DocValues
field to a char[] that's re-usable?

This is for SOLR-9296. I have these "FieldWriters", one of which is
instantiated for each field for the duration of the export. I'm trying
to reduce the number of Java objects, thus replacing a number of the
toString() calls with something that writes to a char[] and then re-uses
that char[] array.
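
For the easy (long/int) case, a minimal sketch of the reusable-buffer idea
(illustrative only, not existing code -- doubles are the hard part):

    // Writes digits into a reusable scratch buffer, back to front, with no
    // per-call allocation. Works in negatives to handle Long.MIN_VALUE.
    public class LongToChars {
      private final char[] scratch = new char[21]; // fits Long.MIN_VALUE

      /** Returns the start offset; chars are scratch[pos..scratch.length). */
      public int write(long v) {
        int pos = scratch.length;
        boolean neg = v < 0;
        if (!neg) v = -v;
        do {
          scratch[--pos] = (char) ('0' - (v % 10));
          v /= 10;
        } while (v != 0);
        if (neg) scratch[--pos] = '-';
        return pos;
      }

      public char[] buffer() { return scratch; }
    }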

Before I launch off into ripping off the toString code from, say, Float
and Double types

Thanks,
Erick

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Pydev and pylucene issues reading libraries

2016-07-12 Thread Andi Vajda

> On Jul 12, 2016, at 15:09, Kevin Lopez  wrote:
> 
> I have a django project in eclipse's pydev environment which needs access to
> the libjvm.so located at:
> 
> /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/
> 
> I get this error:
> 
>> 
>> Traceback (most recent call last):
>>  File "/home/kevin/git/YIF/imageFinder/tools/indexer.py", line 5, in
>> 
>>import lucene
>>  File "/usr/lib/python2.7/dist-packages/lucene/__init__.py", line 2, in
>> 
>>import os, _lucene
>> ImportError: libjvm.so: cannot open shared object file: No such file or
>> directory
> 
> 
> How can I get pydev/eclipse to see this library? I tried doing:
> 
> import lucene
> lucene.initVM()
> 
> And it seems to work in a python shell running in a terminal, but I can't do
> this in eclipse. Does anyone know how I can resolve this? I am running Ubuntu
> 16.04

If it works in a shell and the shell finds it with the help of LD_LIBRARY_PATH, 
then you need to make sure eclipse sees the same env variable. If eclipse 
already loads a libjvm.so file for its own use, you need to ensure it's the 
same version as the one you're trying to load with PyLucene.
These are just guesses; I don't use eclipse.

Andi..

> 
> Thanks,
> 
> Kevin



[jira] [Commented] (SOLR-9299) Allow Streaming Expressions to use Analyzers

2016-07-12 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373901#comment-15373901
 ] 

Cao Manh Dat commented on SOLR-9299:


That's a good idea! It will make it easier to test, and lets us focus on a 
small part first.

> Allow Streaming Expressions to use Analyzers
> 
>
> Key: SOLR-9299
> URL: https://issues.apache.org/jira/browse/SOLR-9299
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>
> As SOLR-9240 is close to completion it will be important for Streaming 
> Expressions to be able to analyze text fields. This ticket will add this 
> capability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9136) Separate out the error statistics into server-side error vs client-side error

2016-07-12 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomás Fernández Löbbe updated SOLR-9136:

Fix Version/s: (was: 6.2)
   6.1

> Separate out the error statistics into server-side error vs client-side error
> -
>
> Key: SOLR-9136
> URL: https://issues.apache.org/jira/browse/SOLR-9136
> Project: Solr
>  Issue Type: Improvement
>Reporter: Jessica Cheng Mallet
>Priority: Minor
> Fix For: 6.1, trunk
>
> Attachments: SOLR-9136.patch, SOLR-9136.patch, SOLR-9136.patch
>
>
> Currently Solr counts both server-side errors (5xx) and client-side errors 
> (4xx) under the same statistic "errors". Operationally it's beneficial to 
> have those errors separated out so different teams can be alerted depending 
> on whether Solr is seeing lots of server errors vs. client errors.
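
A minimal sketch of the split (hypothetical names, not the patch's code):

{code}
import java.util.concurrent.atomic.AtomicLong;

// Sketch: count 4xx and 5xx responses separately so client errors and
// server errors can be monitored and alerted on independently.
public class ErrorStatsSketch {
  private final AtomicLong clientErrors = new AtomicLong(); // 4xx
  private final AtomicLong serverErrors = new AtomicLong(); // 5xx

  public void record(int httpStatus) {
    if (httpStatus >= 400 && httpStatus < 500) clientErrors.incrementAndGet();
    else if (httpStatus >= 500) serverErrors.incrementAndGet();
  }

  public long getClientErrors() { return clientErrors.get(); }
  public long getServerErrors() { return serverErrors.get(); }
}
{code}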



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9252) Feature selection and logistic regression on text

2016-07-12 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373886#comment-15373886
 ] 

Cao Manh Dat commented on SOLR-9252:


Thanks, that seems like a good improvement; I will update the patch soon.

In general, TF-IDF is a good/standard way to represent documents for 
classification. We could use TF only, but it won't be as good as TF-IDF, and a 
nice thing about Solr is that we can get the IDF of terms very quickly.
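
As a sketch, one common form of the tf-idf weight being referred to here 
(illustrative, not necessarily the exact formula used by the patch):

{code}
// tf-idf sketch: term frequency in the document, damped by how many
// documents in the collection contain the term.
public class TfIdfSketch {
  public static double tfidf(int tf, int docFreq, int numDocs) {
    double idf = Math.log((double) numDocs / (1 + docFreq));
    return tf * idf;
  }
}
{code}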

> Feature selection and logistic regression on text
> -
>
> Key: SOLR-9252
> URL: https://issues.apache.org/jira/browse/SOLR-9252
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Joel Bernstein
> Attachments: SOLR-9252.patch, SOLR-9252.patch, SOLR-9252.patch, 
> enron1.zip
>
>
> SOLR-9186 came up with the challenge that for each iteration we have to 
> rebuild the tf-idf vector for each document. That is a costly computation if 
> we represent a doc by a lot of terms. Feature selection can help reduce the 
> computation.
> Due to its computational efficiency and simple interpretation, information 
> gain is one of the most popular feature selection methods. It is used to 
> measure the dependence between features and labels and calculates the 
> information gain between the i-th feature and the class labels 
> (http://www.jiliang.xyz/publication/feature_selection_for_classification.pdf).
> I confirmed this by running logistic regression on the Enron mail dataset (in 
> which each email is represented by the top 100 terms with the highest 
> information gain) and got 92% accuracy and 82% precision.
> This ticket will create two new streaming expressions. Both of them use the 
> same *parallel iterative framework* as SOLR-8492.
> {code}
> featuresSelection(collection1, q="*:*",  field="tv_text", outcome="out_i", 
> positiveLabel=1, numTerms=100)
> {code}
> featuresSelection will emit the top terms with the highest information gain 
> scores. It can be combined with the new tlogit stream.
> {code}
> tlogit(collection1, q="*:*",
>  featuresSelection(collection1, 
>   q="*:*",  
>   field="tv_text", 
>   outcome="out_i", 
>   positiveLabel=1, 
>   numTerms=100),
>  field="tv_text",
>  outcome="out_i",
>  maxIterations=100)
> {code}
> In iteration n, the text logistic regression will emit the nth model and 
> compute the error of the (n-1)th model, because the error would be wrong if 
> we computed it dynamically within the same iteration.
> In each iteration tlogit will adjust the learning rate based on the error of 
> the previous iteration: it will increase the learning rate by 5% if the error 
> is going down, and decrease it by 50% if the error is going up.
> This will support use cases such as building models for spam detection, 
> sentiment analysis and threat detection. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-6.x-MacOSX (64bit/jdk1.8.0) - Build # 276 - Failure!

2016-07-12 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-6.x-MacOSX/276/
Java: 64bit/jdk1.8.0 -XX:-UseCompressedOops -XX:+UseParallelGC

1 tests failed.
FAILED:  
org.apache.solr.common.cloud.TestCollectionStateWatchers.testSimpleCollectionWatch

Error Message:
CollectionStateWatcher wasn't cleared after completion

Stack Trace:
java.lang.AssertionError: CollectionStateWatcher wasn't cleared after completion
at 
__randomizedtesting.SeedInfo.seed([56A9BB942B1F1FFD:B9274E46C1280C3]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at 
org.apache.solr.common.cloud.TestCollectionStateWatchers.testSimpleCollectionWatch(TestCollectionStateWatchers.java:117)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1764)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:871)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:907)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:921)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:809)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:460)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:880)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:781)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:816)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at java.lang.Thread.run(Thread.java:745)




Build Log:
[...truncated 13244 lines...]
   [junit4] Suite: org.apache.solr.common.cloud.TestCollectionStateWatchers
   [junit4]   2> Creating dataDir: 

Pydev and pylucene issues reading libraries

2016-07-12 Thread Kevin Lopez
I have a Django project in Eclipse's PyDev environment which needs access to
libjvm.so, located at:

/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/

I get this error:

>
> Traceback (most recent call last):
>   File "/home/kevin/git/YIF/imageFinder/tools/indexer.py", line 5, in <module>
>     import lucene
>   File "/usr/lib/python2.7/dist-packages/lucene/__init__.py", line 2, in <module>
>     import os, _lucene
> ImportError: libjvm.so: cannot open shared object file: No such file or
> directory


How can I get PyDev/Eclipse to see this library? I tried doing:

import lucene
lucene.initVM()

It seems to work in a Python shell running in a terminal, but I can't do
this in Eclipse. Does anyone know how I can resolve this? I am running Ubuntu
16.04.

Thanks,

Kevin
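A commonly suggested workaround for this (a sketch, assuming the OpenJDK path above is correct): preload libjvm.so with ctypes before importing lucene, since an Eclipse session launched from the desktop usually does not inherit LD_LIBRARY_PATH from the shell.

{code}
import ctypes
import os

# Path taken from the report above; adjust to the local JDK install.
JVM_DIR = "/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server"

# Load libjvm.so into the global namespace so that the _lucene
# extension module can resolve its symbols without LD_LIBRARY_PATH.
ctypes.CDLL(os.path.join(JVM_DIR, "libjvm.so"), mode=ctypes.RTLD_GLOBAL)

import lucene
lucene.initVM()
{code}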


[JENKINS-EA] Lucene-Solr-master-Linux (32bit/jdk-9-ea+126) - Build # 17231 - Failure!

2016-07-12 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/17231/
Java: 32bit/jdk-9-ea+126 -server -XX:+UseParallelGC

1 tests failed.
FAILED:  org.apache.solr.TestDistributedSearch.test

Error Message:
Expected to find shardAddress in the up shard info

Stack Trace:
java.lang.AssertionError: Expected to find shardAddress in the up shard info
at 
__randomizedtesting.SeedInfo.seed([2DDE02801EF6486D:A58A3D5AB00A2595]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at 
org.apache.solr.TestDistributedSearch.comparePartialResponses(TestDistributedSearch.java:1172)
at 
org.apache.solr.TestDistributedSearch.queryPartialResults(TestDistributedSearch.java:1113)
at 
org.apache.solr.TestDistributedSearch.test(TestDistributedSearch.java:973)
at 
jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@9-ea/Native 
Method)
at 
jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@9-ea/NativeMethodAccessorImpl.java:62)
at 
jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@9-ea/DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(java.base@9-ea/Method.java:533)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1764)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:871)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:907)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:921)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsRepeatStatement.callStatement(BaseDistributedSearchTestCase.java:1011)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:960)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:809)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:460)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:880)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:781)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:816)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 

[JENKINS] Lucene-Solr-master-MacOSX (64bit/jdk1.8.0) - Build # 3408 - Still Failing!

2016-07-12 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-master-MacOSX/3408/
Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseG1GC

1 tests failed.
FAILED:  org.apache.solr.cloud.TestLocalFSCloudBackupRestore.test

Error Message:
Error from server at http://127.0.0.1:54229/solr: 'location' is not specified 
as a query parameter or as a default repository property or as a cluster 
property.

Stack Trace:
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
from server at http://127.0.0.1:54229/solr: 'location' is not specified as a 
query parameter or as a default repository property or as a cluster property.
at 
__randomizedtesting.SeedInfo.seed([C5013D9D0DB5C997:4D550247A349A46F]:0)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:606)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:259)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248)
at 
org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:413)
at 
org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:366)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1270)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1040)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:976)
at 
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:149)
at 
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:166)
at 
org.apache.solr.cloud.AbstractCloudBackupRestoreTestCase.testInvalidPath(AbstractCloudBackupRestoreTestCase.java:149)
at 
org.apache.solr.cloud.AbstractCloudBackupRestoreTestCase.test(AbstractCloudBackupRestoreTestCase.java:128)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1764)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:871)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:907)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:921)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:809)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:460)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:880)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:781)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:816)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 

[jira] [Commented] (SOLR-8621) solrconfig.xml: deprecate/replace <mergePolicy> with <mergePolicyFactory>

2016-07-12 Thread Henrik (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373691#comment-15373691
 ] 

Henrik commented on SOLR-8621:
--

Never mind, I found the answer in 
https://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-5.4.pdf
 in "Merging Index Segments".

> solrconfig.xml: deprecate/replace <mergePolicy> with <mergePolicyFactory>
> -
>
> Key: SOLR-8621
> URL: https://issues.apache.org/jira/browse/SOLR-8621
> Project: Solr
>  Issue Type: Task
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
> Fix For: 5.5, 6.0
>
> Attachments: SOLR-8621-example_contrib_configs.patch, 
> SOLR-8621-example_contrib_configs.patch, SOLR-8621.patch, 
> explicit-merge-auto-set.patch
>
>
> *end-user benefits:*
> * Lucene's UpgradeIndexMergePolicy can be configured in Solr
> * Lucene's SortingMergePolicy can be configured in Solr (with SOLR-5730)
> * customisability: arbitrary merge policies, including wrapping/nested merge 
> policies, can be created and configured
> *roadmap:*
> * solr 5.5 introduces <mergePolicyFactory> support
> * solr 5.5 deprecates (but maintains) <mergePolicy> support
> * SOLR-8668 in solr 6.0(\?) will remove <mergePolicy> support






[jira] [Commented] (SOLR-9200) Add Delegation Token Support to Solr

2016-07-12 Thread Gregory Chanan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373674#comment-15373674
 ] 

Gregory Chanan commented on SOLR-9200:
--

[~ichattopadhyaya] your argument sounds reasonable to me.

[~anshumg] and Ishan, thanks for taking a look.

> Add Delegation Token Support to Solr
> 
>
> Key: SOLR-9200
> URL: https://issues.apache.org/jira/browse/SOLR-9200
> Project: Solr
>  Issue Type: New Feature
>  Components: security
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Attachments: SOLR-9200.patch, SOLR-9200.patch
>
>
> SOLR-7468 added support for kerberos authentication via the hadoop 
> authentication filter.  Hadoop also has support for an authentication filter 
> that supports delegation tokens, which allow authenticated users 
> to grab/renew/delete a token that can be used to bypass the normal 
> authentication path for a time.  This is useful in a variety of use cases:
> 1) Distributed clients (e.g. MapReduce) where each client may not have access 
> to the user's kerberos credentials.  Instead, the job runner can grab a 
> delegation token and use that during task execution.
> 2) If the load on the kerberos server is too high, delegation tokens can 
> avoid hitting the kerberos server after the first request.
> 3) If requests/permissions need to be delegated to another user: the more 
> privileged user can request a delegation token that can be passed to the less 
> privileged user.
> Note to self:
> In 
> https://issues.apache.org/jira/browse/SOLR-7468?focusedCommentId=14579636=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14579636
>  I made the following comment which I need to investigate further, since I 
> don't know if anything changed in this area:
> {quote}3) I'm a little concerned with the "NoContext" code in KerberosPlugin 
> moving forward (I understand this is more a generic auth question than 
> kerberos specific). For example, in the latest version of the filter we are 
> using at Cloudera, we play around with the ServletContext in order to pass 
> information around 
> (https://github.com/cloudera/lucene-solr/blob/cdh5-4.10.3_5.4.2/solr/core/src/java/org/apache/solr/servlet/SolrHadoopAuthenticationFilter.java#L106).
>  Is there any way we can get the actual ServletContext in a plugin?{quote}
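For orientation, a rough client-side sketch of the grab/renew/delete token lifecycle described above. The op/delegation query parameters follow the hadoop-auth filter's conventions; the base URL, the endpoint, and the response shape here are assumptions for illustration, not the committed Solr API:

{code}
import requests

BASE = "http://solr.example.com:8983/solr"  # hypothetical cluster

# 1) A kerberos-authenticated caller obtains a delegation token.
resp = requests.get(BASE + "/admin/collections",
                    params={"op": "GETDELEGATIONTOKEN"})
token = resp.json()["Token"]["urlString"]

# 2) Workers reuse the token instead of kerberos credentials.
requests.get(BASE + "/collection1/select",
             params={"q": "*:*", "delegation": token})

# 3) Renew while still needed, cancel when done.
requests.put(BASE + "/admin/collections",
             params={"op": "RENEWDELEGATIONTOKEN", "token": token})
requests.put(BASE + "/admin/collections",
             params={"op": "CANCELDELEGATIONTOKEN", "token": token})
{code}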






[jira] [Updated] (SOLR-9240) Support parallel ETL with the topic expression

2016-07-12 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-9240:
-
Description: 
It would be useful for SolrCloud to support large-scale *Extract, Transform and 
Load* workloads with streaming expressions. Instead of using MapReduce for 
ETL, the topic expression can be used, which allows SolrCloud to be treated like 
a distributed message queue filled with data to be processed. The topic 
expression works in batches and supports retrieval of stored fields, so 
large-scale *text ETL* will work perfectly with this approach.

This ticket makes two small changes to the topic() expression that make this 
possible:

1) Changes the topic expression so it can operate in parallel.
2) Adds the initialCheckpoint parameter to the topic expression so a topic can 
start pulling records from anywhere in the queue.

Daemons can be sent to worker nodes that each work on processing a partition of 
the data from the same topic. The daemon() function's natural behavior is 
perfect for iteratively calling a topic until all records in the topic have 
been processed.

The sample code below pulls all records from one collection and indexes them 
into another collection. A Transform function could be wrapped around the 
topic() to transform the records before loading. Custom functions can also be 
built to load the data in parallel to any outside system. 

{code}
parallel(
  workerCollection,
  workers="2",
  sort="_version_ desc",
  daemon(
    update(
      updateCollection,
      batchSize=200,
      topic(
        checkpointCollection,
        topicCollection,
        q=*:*,
        id="topic1",
        fl="id, to, from, body",
        partitionKeys="id",
        initialCheckpoint="0")),
    runInterval="1000",
    id="daemon1"))
{code}




  was:
It would be useful for SolrCloud to support large-scale *Extract, Transform and 
Load* workloads with streaming expressions. Instead of using MapReduce for 
ETL, the topic expression can be used, which allows SolrCloud to behave like a 
distributed message queue filled with data to be processed. The topic 
expression works in batches and supports retrieval of stored fields, so 
large-scale *text ETL* will work perfectly with this approach.

This ticket makes two small changes to the topic() expression that make this 
possible:

1) Changes the topic expression so it can operate in parallel.
2) Adds the initialCheckpoint parameter to the topic expression so a topic can 
start pulling records from anywhere in the queue.

Daemons can be sent to worker nodes that each work on processing a partition of 
the data from the same topic. The daemon() function's natural behavior is 
perfect for iteratively calling a topic until all records in the topic have 
been processed.

The sample code below pulls all records from one collection and indexes them 
into another collection. A Transform function could be wrapped around the 
topic() to transform the records before loading. Custom functions can also be 
built to load the data in parallel to any outside system. 

{code}
parallel(
  workerCollection,
  workers="2",
  sort="_version_ desc",
  daemon(
    update(
      updateCollection,
      batchSize=200,
      topic(
        checkpointCollection,
        topicCollection,
        q=*:*,
        id="topic1",
        fl="id, to, from, body",
        partitionKeys="id",
        initialCheckpoint="0")),
    runInterval="1000",
    id="daemon1"))
{code}





> Support parallel ETL with the topic expression
> --
>
> Key: SOLR-9240
> URL: https://issues.apache.org/jira/browse/SOLR-9240
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Fix For: 6.2
>
> Attachments: SOLR-9240.patch, SOLR-9240.patch
>
>
> It would be useful for SolrCloud to support large-scale *Extract, Transform 
> and Load* workloads with streaming expressions. Instead of using MapReduce 
> for ETL, the topic expression can be used, which allows SolrCloud to be 
> treated like a distributed message queue filled with data to be processed. 
> The topic expression works in batches and supports retrieval of stored 
> fields, so large-scale *text ETL* will work perfectly with this approach.

[jira] [Updated] (SOLR-9240) Support parallel ETL with the topic expression

2016-07-12 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-9240:
-
Description: 
It would be useful for Solr to support large-scale *Extract, Transform and 
Load* workloads with streaming expressions. Instead of using MapReduce for 
ETL, the topic expression can be used, which allows SolrCloud to behave like a 
distributed message queue filled with data to be processed. The topic 
expression works in batches and supports retrieval of stored fields, so 
large-scale *text ETL* will work perfectly with this approach.

This ticket makes two small changes to the topic() expression that make this 
possible:

1) Changes the topic expression so it can operate in parallel.
2) Adds the initialCheckpoint parameter to the topic expression so a topic can 
start pulling records from anywhere in the queue.

Daemons can be sent to worker nodes that each work on processing a partition of 
the data from the same topic. The daemon() function's natural behavior is 
perfect for iteratively calling a topic until all records in the topic have 
been processed.

The sample code below pulls all records from one collection and indexes them 
into another collection. A Transform function could be wrapped around the 
topic() to transform the records before loading. Custom functions can also be 
built to load the data in parallel to any outside system. 

{code}
parallel(
  workerCollection,
  workers="2",
  sort="_version_ desc",
  daemon(
    update(
      updateCollection,
      batchSize=200,
      topic(
        checkpointCollection,
        topicCollection,
        q=*:*,
        id="topic1",
        fl="id, to, from, body",
        partitionKeys="id",
        initialCheckpoint="0")),
    runInterval="1000",
    id="daemon1"))
{code}




  was:
It would be useful for Solr to support large-scale *Extract, Transform and 
Load* use cases with streaming expressions. Instead of using MapReduce for the 
ETL, the topic expression will be used and SolrCloud will behave like a giant 
message queue filled with data to be processed. The topic expression works in 
batches and supports retrieval of stored fields, so large-scale *text ETL* will 
work perfectly with this approach.

This ticket makes two small changes to the topic() expression that make this 
possible:

1) Changes the topic expression so it can operate in parallel.
2) Adds the initialCheckpoint parameter to the topic expression so a topic can 
start pulling records from anywhere in the queue.

Daemons can be sent to worker nodes that each work on processing a partition of 
the data from the same topic. The daemon() function's natural behavior is 
perfect for iteratively calling a topic until all records in the topic have 
been processed.

The sample code below pulls all records from one collection and indexes them 
into another collection. A Transform function could be wrapped around the 
topic() to transform the records before loading. Custom functions can also be 
built to load the data in parallel to any outside system. 

{code}
parallel(
  workerCollection,
  workers="2",
  sort="_version_ desc",
  daemon(
    update(
      updateCollection,
      batchSize=200,
      topic(
        checkpointCollection,
        topicCollection,
        q=*:*,
        id="topic1",
        fl="id, to, from, body",
        partitionKeys="id",
        initialCheckpoint="0")),
    runInterval="1000",
    id="daemon1"))
{code}





> Support parallel ETL with the topic expression
> --
>
> Key: SOLR-9240
> URL: https://issues.apache.org/jira/browse/SOLR-9240
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Fix For: 6.2
>
> Attachments: SOLR-9240.patch, SOLR-9240.patch
>
>
> It would be useful for Solr to support large-scale *Extract, Transform and 
> Load* workloads with streaming expressions. Instead of using MapReduce for 
> ETL, the topic expression can be used, which allows SolrCloud to behave like a 
> distributed message queue filled with data to be processed. The topic 
> expression works in batches and supports retrieval of stored fields, so 
> large-scale *text ETL* will work perfectly with this approach.
> This ticket makes two small 

[jira] [Updated] (SOLR-9240) Support parallel ETL with the topic expression

2016-07-12 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-9240:
-
Description: 
It would be useful for SolrCloud to support large-scale *Extract, Transform and 
Load* workloads with streaming expressions. Instead of using MapReduce for 
ETL, the topic expression can be used, which allows SolrCloud to behave like a 
distributed message queue filled with data to be processed. The topic 
expression works in batches and supports retrieval of stored fields, so 
large-scale *text ETL* will work perfectly with this approach.

This ticket makes two small changes to the topic() expression that make this 
possible:

1) Changes the topic expression so it can operate in parallel.
2) Adds the initialCheckpoint parameter to the topic expression so a topic can 
start pulling records from anywhere in the queue.

Daemons can be sent to worker nodes that each work on processing a partition of 
the data from the same topic. The daemon() function's natural behavior is 
perfect for iteratively calling a topic until all records in the topic have 
been processed.

The sample code below pulls all records from one collection and indexes them 
into another collection. A Transform function could be wrapped around the 
topic() to transform the records before loading. Custom functions can also be 
built to load the data in parallel to any outside system. 

{code}
parallel(
  workerCollection,
  workers="2",
  sort="_version_ desc",
  daemon(
    update(
      updateCollection,
      batchSize=200,
      topic(
        checkpointCollection,
        topicCollection,
        q=*:*,
        id="topic1",
        fl="id, to, from, body",
        partitionKeys="id",
        initialCheckpoint="0")),
    runInterval="1000",
    id="daemon1"))
{code}




  was:
It would be useful for Solr to support large-scale *Extract, Transform and 
Load* workloads with streaming expressions. Instead of using MapReduce for 
ETL, the topic expression can be used, which allows SolrCloud to behave like a 
distributed message queue filled with data to be processed. The topic 
expression works in batches and supports retrieval of stored fields, so 
large-scale *text ETL* will work perfectly with this approach.

This ticket makes two small changes to the topic() expression that make this 
possible:

1) Changes the topic expression so it can operate in parallel.
2) Adds the initialCheckpoint parameter to the topic expression so a topic can 
start pulling records from anywhere in the queue.

Daemons can be sent to worker nodes that each work on processing a partition of 
the data from the same topic. The daemon() function's natural behavior is 
perfect for iteratively calling a topic until all records in the topic have 
been processed.

The sample code below pulls all records from one collection and indexes them 
into another collection. A Transform function could be wrapped around the 
topic() to transform the records before loading. Custom functions can also be 
built to load the data in parallel to any outside system. 

{code}
parallel(
  workerCollection,
  workers="2",
  sort="_version_ desc",
  daemon(
    update(
      updateCollection,
      batchSize=200,
      topic(
        checkpointCollection,
        topicCollection,
        q=*:*,
        id="topic1",
        fl="id, to, from, body",
        partitionKeys="id",
        initialCheckpoint="0")),
    runInterval="1000",
    id="daemon1"))
{code}





> Support parallel ETL with the topic expression
> --
>
> Key: SOLR-9240
> URL: https://issues.apache.org/jira/browse/SOLR-9240
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Fix For: 6.2
>
> Attachments: SOLR-9240.patch, SOLR-9240.patch
>
>
> It would be useful for SolrCloud to support large-scale *Extract, Transform 
> and Load* workloads with streaming expressions. Instead of using MapReduce 
> for ETL, the topic expression can be used, which allows SolrCloud to behave 
> like a distributed message queue filled with data to be processed. The topic 
> expression works in batches and supports retrieval of stored fields, so 
> large-scale *text ETL* will work perfectly with this approach.
> This 

[JENKINS] Lucene-Solr-6.x-Windows (64bit/jdk1.8.0_92) - Build # 317 - Still Failing!

2016-07-12 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Windows/317/
Java: 64bit/jdk1.8.0_92 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC

1 tests failed.
FAILED:  junit.framework.TestSuite.org.apache.lucene.store.TestSimpleFSDirectory

Error Message:
Could not remove the following files (in the order of attempts):
C:\Users\jenkins\workspace\Lucene-Solr-6.x-Windows\lucene\build\core\test\J1\temp\lucene.store.TestSimpleFSDirectory_C7E540B1E34B1139-001\testThreadSafety-001:
 java.nio.file.AccessDeniedException: 
C:\Users\jenkins\workspace\Lucene-Solr-6.x-Windows\lucene\build\core\test\J1\temp\lucene.store.TestSimpleFSDirectory_C7E540B1E34B1139-001\testThreadSafety-001

C:\Users\jenkins\workspace\Lucene-Solr-6.x-Windows\lucene\build\core\test\J1\temp\lucene.store.TestSimpleFSDirectory_C7E540B1E34B1139-001:
 java.nio.file.DirectoryNotEmptyException: 
C:\Users\jenkins\workspace\Lucene-Solr-6.x-Windows\lucene\build\core\test\J1\temp\lucene.store.TestSimpleFSDirectory_C7E540B1E34B1139-001
 

Stack Trace:
java.io.IOException: Could not remove the following files (in the order of 
attempts):
   
C:\Users\jenkins\workspace\Lucene-Solr-6.x-Windows\lucene\build\core\test\J1\temp\lucene.store.TestSimpleFSDirectory_C7E540B1E34B1139-001\testThreadSafety-001:
 java.nio.file.AccessDeniedException: 
C:\Users\jenkins\workspace\Lucene-Solr-6.x-Windows\lucene\build\core\test\J1\temp\lucene.store.TestSimpleFSDirectory_C7E540B1E34B1139-001\testThreadSafety-001
   
C:\Users\jenkins\workspace\Lucene-Solr-6.x-Windows\lucene\build\core\test\J1\temp\lucene.store.TestSimpleFSDirectory_C7E540B1E34B1139-001:
 java.nio.file.DirectoryNotEmptyException: 
C:\Users\jenkins\workspace\Lucene-Solr-6.x-Windows\lucene\build\core\test\J1\temp\lucene.store.TestSimpleFSDirectory_C7E540B1E34B1139-001

at __randomizedtesting.SeedInfo.seed([C7E540B1E34B1139]:0)
at org.apache.lucene.util.IOUtils.rm(IOUtils.java:323)
at 
org.apache.lucene.util.TestRuleTemporaryFilesCleanup.afterAlways(TestRuleTemporaryFilesCleanup.java:216)
at 
com.carrotsearch.randomizedtesting.rules.TestRuleAdapter$1.afterAlways(TestRuleAdapter.java:31)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:43)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at java.lang.Thread.run(Thread.java:745)




Build Log:
[...truncated 782 lines...]
   [junit4] Suite: org.apache.lucene.store.TestSimpleFSDirectory
   [junit4]   2> NOTE: test params are: codec=Asserting(Lucene62), 
sim=RandomSimilarity(queryNorm=true,coord=yes): {}, locale=ar-MA, 
timezone=Africa/Lagos
   [junit4]   2> NOTE: Windows 10 10.0 amd64/Oracle Corporation 1.8.0_92 
(64-bit)/cpus=3,threads=1,free=137811952,total=200347648
   [junit4]   2> NOTE: All tests run in this JVM: [TestReusableStringReader, 
TestUTF32ToUTF8, TestClassicSimilarity, TestFixedBitDocIdSet, TestMixedCodecs, 
TestPerFieldPostingsFormat2, TestTerm, TestDocValues, TestTransactions, 
TestSpanBoostQuery, TestAssertions, TestDateSort, 
TestMultiValuedNumericRangeQuery, TestIndexWriterThreadsToSegments, 
TestAttributeSource, TestByteBlockPool, TestByteArrayDataInput, 
TestCompiledAutomaton, TestGeoEncodingUtils, TestLucene54DocValuesFormat, 
TestBoostQuery, TestSortedSetSortField, TestPrefixQuery, TestCharArraySet, 
TestTermsEnum2, TestNewestSegment, TestMultiPhraseEnum, 
TestLucene50CompoundFormat, TestComplexExplanations, TestMixedDocValuesUpdates, 
TestHugeRamFile, TestLazyProxSkipping, TestStressIndexing, TestReadOnlyIndex, 
TestComplexExplanationsOfNonMatches, TestLucene62SegmentInfoFormat, 
TestPositionIncrement, TestWordlistLoader, TestCodecs, TestForTooMuchCloning, 
TestPrefixInBooleanQuery, TestSimilarityBase, TestAllFilesDetectTruncation, 
TestNoDeletionPolicy, TestSloppyPhraseQuery, TestCrashCausesCorruptIndex, 
TestBooleanOr, TestQueryBuilder, TestPayloads, TestLSBRadixSorter, 
TestStandardAnalyzer, TestMergePolicyWrapper, TestConjunctions, TestRollback, 
TestMultiThreadTermVectors, TestMultiMMap, TestCharsRefBuilder, TestNRTThreads, 
TestNativeFSLockFactory, TestIOUtils, Test2BPoints, TestCodecUtil, 
TestTrackingDirectoryWrapper, TestRollingUpdates, TestIndexWriterOnJRECrash, 
TestPriorityQueue, TestSimpleExplanations, TestRecyclingIntBlockAllocator, 

[jira] [Comment Edited] (SOLR-9252) Feature selection and logistic regression on text

2016-07-12 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373522#comment-15373522
 ] 

Joel Bernstein edited comment on SOLR-9252 at 7/12/16 7:32 PM:
---

I just reviewed the latest patch; it looks good. One implementation detail:

The terms component also returns the numDocs now that SOLR-9193 has been 
committed. So you can retrieve the numDocs along with the doc frequencies by 
adding the *terms.stats* param.
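
(A hypothetical request shape for illustration, assuming the implicit /terms handler and a collection named collection1:)

{code}
import requests

# terms.stats (SOLR-9193) adds index-level stats such as numDocs
# alongside the per-term doc frequencies in the response.
resp = requests.get(
    "http://localhost:8983/solr/collection1/terms",
    params={"terms.fl": "tv_text", "terms.stats": "true", "wt": "json"},
)
print(resp.json())
{code}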

And one question about the use of tf-idf:

You're using tf-idf for the doc vectors which seems like a good idea. Is this a 
typical approach for text regression or is this something you decided to do 
because we have access to these types of stats in the index?


was (Author: joel.bernstein):
I just reviewed the latest patch. One implementation detail:

The terms component also returns the numDocs now that SOLR-9193 has been 
committed. So you can retrieve the numDocs along with the doc frequencies by 
adding the *terms.stats* param.

And one question about the use of tf-idf:

You're using tf-idf for the doc vectors which seems like a good idea. Is this a 
typical approach for text regression or is this something you decided to do 
because we have access to these types of stats in the index?

> Feature selection and logistic regression on text
> -
>
> Key: SOLR-9252
> URL: https://issues.apache.org/jira/browse/SOLR-9252
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Joel Bernstein
> Attachments: SOLR-9252.patch, SOLR-9252.patch, SOLR-9252.patch, 
> enron1.zip
>
>
> SOLR-9186 came up with a challenge: on each iteration we have to 
> rebuild the tf-idf vector for every document. That is a costly computation if we 
> represent a doc by a lot of terms. Feature selection can help reduce the 
> computation.
> Due to its computational efficiency and simple interpretation, information 
> gain is one of the most popular feature selection methods. It is used to 
> measure the dependence between features and labels and calculates the 
> information gain between the i-th feature and the class labels 
> (http://www.jiliang.xyz/publication/feature_selection_for_classification.pdf).
> I confirmed this by running logistic regression on the Enron mail dataset (in 
> which each email is represented by the top 100 terms with the highest 
> information gain) and got 92% accuracy and 82% precision.
> This ticket will create two new streaming expressions. Both of them use the 
> same *parallel iterative framework* as SOLR-8492.
> {code}
> featuresSelection(collection1, q="*:*", field="tv_text", outcome="out_i", 
> positiveLabel=1, numTerms=100)
> {code}
> featuresSelection will emit the top terms with the highest information gain 
> scores. It can be combined with the new tlogit stream.
> {code}
> tlogit(collection1, q="*:*",
>        featuresSelection(collection1,
>                          q="*:*",
>                          field="tv_text",
>                          outcome="out_i",
>                          positiveLabel=1,
>                          numTerms=100),
>        field="tv_text",
>        outcome="out_i",
>        maxIterations=100)
> {code}
> In iteration n, the text logistic regression will emit the nth model and 
> compute the error of the (n-1)th model, because the error would be wrong if we 
> computed it dynamically within the same iteration.
> In each iteration tlogit will adjust the learning rate based on the error of the 
> previous iteration: it will increase the learning rate by 5% if the error is 
> going down and decrease it by 50% if it is going up.
> This will support use cases such as building models for spam detection, 
> sentiment analysis and threat detection.






[JENKINS] Lucene-Solr-master-Windows (64bit/jdk1.8.0_92) - Build # 5980 - Still Failing!

2016-07-12 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-master-Windows/5980/
Java: 64bit/jdk1.8.0_92 -XX:+UseCompressedOops -XX:+UseSerialGC

2 tests failed.
FAILED:  
junit.framework.TestSuite.org.apache.solr.handler.TestSolrConfigHandlerCloud

Error Message:
ObjectTracker found 20 object(s) that were not released!!! [SolrCore, SolrCore, 
MDCAwareThreadPoolExecutor, MockDirectoryWrapper, MockDirectoryWrapper, 
SolrCore, MockDirectoryWrapper, MockDirectoryWrapper, MockDirectoryWrapper, 
SolrCore, MDCAwareThreadPoolExecutor, SolrCore, MockDirectoryWrapper, 
MDCAwareThreadPoolExecutor, SolrCore, MDCAwareThreadPoolExecutor, 
MDCAwareThreadPoolExecutor, SolrCore, MDCAwareThreadPoolExecutor, SolrCore]

Stack Trace:
java.lang.AssertionError: ObjectTracker found 20 object(s) that were not 
released!!! [SolrCore, SolrCore, MDCAwareThreadPoolExecutor, 
MockDirectoryWrapper, MockDirectoryWrapper, SolrCore, MockDirectoryWrapper, 
MockDirectoryWrapper, MockDirectoryWrapper, SolrCore, 
MDCAwareThreadPoolExecutor, SolrCore, MockDirectoryWrapper, 
MDCAwareThreadPoolExecutor, SolrCore, MDCAwareThreadPoolExecutor, 
MDCAwareThreadPoolExecutor, SolrCore, MDCAwareThreadPoolExecutor, SolrCore]
at __randomizedtesting.SeedInfo.seed([98CADD8A2B195E7A]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertNull(Assert.java:551)
at 
org.apache.solr.SolrTestCaseJ4.teardownTestCases(SolrTestCaseJ4.java:257)
at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1764)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:834)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at java.lang.Thread.run(Thread.java:745)


FAILED:  org.apache.solr.cloud.TestLocalFSCloudBackupRestore.test

Error Message:
expected: but was:

Stack Trace:
java.lang.AssertionError: expected: but was:
at 
__randomizedtesting.SeedInfo.seed([98CADD8A2B195E7A:109EE25085E53382]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:147)
at 
org.apache.solr.cloud.AbstractCloudBackupRestoreTestCase.testBackupAndRestore(AbstractCloudBackupRestoreTestCase.java:209)
at 
org.apache.solr.cloud.AbstractCloudBackupRestoreTestCase.test(AbstractCloudBackupRestoreTestCase.java:127)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1764)
at 

[jira] [Comment Edited] (SOLR-9252) Feature selection and logistic regression on text

2016-07-12 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373522#comment-15373522
 ] 

Joel Bernstein edited comment on SOLR-9252 at 7/12/16 7:28 PM:
---

I just reviewed the latest patch. One implementation detail:

The terms component also returns the numDocs now that SOLR-9193 has been 
committed. So you can retrieve the numDocs along with the doc frequencies by 
adding the *terms.stats* param.

And one question about the use of tf-idf:

You're using tf-idf for the doc vectors which seems like a good idea. Is this a 
typical approach for text regression or is this something you decided to do 
because we have access to these types of stats in the index?


was (Author: joel.bernstein):
I just reviewed the latest patch. One implementation detail:

The terms component also returns the numDocs now that SOLR-9193 has been 
committed. So you can retrieve the numDocs along with the doc frequencies by 
adding the terms.stats param.

And one question about the use of tf-idf:

You're using tf-idf for the doc vectors which seems like a good idea. Is this a 
typical approach for text regression or is this something you decided to do 
because we have access to these types of stats in the index?

> Feature selection and logistic regression on text
> -
>
> Key: SOLR-9252
> URL: https://issues.apache.org/jira/browse/SOLR-9252
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Joel Bernstein
> Attachments: SOLR-9252.patch, SOLR-9252.patch, SOLR-9252.patch, 
> enron1.zip
>
>
> SOLR-9186 came up with a challenge: on each iteration we have to 
> rebuild the tf-idf vector for every document. That is a costly computation if we 
> represent a doc by a lot of terms. Feature selection can help reduce the 
> computation.
> Due to its computational efficiency and simple interpretation, information 
> gain is one of the most popular feature selection methods. It is used to 
> measure the dependence between features and labels and calculates the 
> information gain between the i-th feature and the class labels 
> (http://www.jiliang.xyz/publication/feature_selection_for_classification.pdf).
> I confirmed this by running logistic regression on the Enron mail dataset (in 
> which each email is represented by the top 100 terms with the highest 
> information gain) and got 92% accuracy and 82% precision.
> This ticket will create two new streaming expressions. Both of them use the 
> same *parallel iterative framework* as SOLR-8492.
> {code}
> featuresSelection(collection1, q="*:*", field="tv_text", outcome="out_i", 
> positiveLabel=1, numTerms=100)
> {code}
> featuresSelection will emit the top terms with the highest information gain 
> scores. It can be combined with the new tlogit stream.
> {code}
> tlogit(collection1, q="*:*",
>        featuresSelection(collection1,
>                          q="*:*",
>                          field="tv_text",
>                          outcome="out_i",
>                          positiveLabel=1,
>                          numTerms=100),
>        field="tv_text",
>        outcome="out_i",
>        maxIterations=100)
> {code}
> In iteration n, the text logistic regression will emit the nth model and 
> compute the error of the (n-1)th model, because the error would be wrong if we 
> computed it dynamically within the same iteration.
> In each iteration tlogit will adjust the learning rate based on the error of the 
> previous iteration: it will increase the learning rate by 5% if the error is 
> going down and decrease it by 50% if it is going up.
> This will support use cases such as building models for spam detection, 
> sentiment analysis and threat detection.






[jira] [Comment Edited] (SOLR-9252) Feature selection and logistic regression on text

2016-07-12 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373522#comment-15373522
 ] 

Joel Bernstein edited comment on SOLR-9252 at 7/12/16 7:28 PM:
---

I just reviewed the latest patch. One implementation detail:

The terms component also returns the numDocs now that SOLR-9193 has been 
committed. So you can retrieve the numDocs along with the doc frequencies by 
adding the terms.stats param.

And one question about the use of tf-idf:

You're using tf-idf for the doc vectors which seems like a good idea. Is this a 
typical approach for text regression or is this something you decided to do 
because we have access to these types of stats in the index?


was (Author: joel.bernstein):
I just reviewed the latest patch. One implementation detail:

The terms component also returns the numDocs now that SOLR-9193 has been 
committed. So you can retrieve the numDocs along with the doc frequencies.

And one question about the use of tf-idf:

You're using tf-idf for the doc vectors which seems like a good idea. Is this a 
typical approach for text regression or is this something you decided to do 
because we have access to these types of stats in the index?

> Feature selection and logistic regression on text
> -
>
> Key: SOLR-9252
> URL: https://issues.apache.org/jira/browse/SOLR-9252
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Joel Bernstein
> Attachments: SOLR-9252.patch, SOLR-9252.patch, SOLR-9252.patch, 
> enron1.zip
>
>
> SOLR-9186 came up with a challenge: on each iteration we have to 
> rebuild the tf-idf vector for every document. That is a costly computation if we 
> represent a doc by a lot of terms. Feature selection can help reduce the 
> computation.
> Due to its computational efficiency and simple interpretation, information 
> gain is one of the most popular feature selection methods. It is used to 
> measure the dependence between features and labels and calculates the 
> information gain between the i-th feature and the class labels 
> (http://www.jiliang.xyz/publication/feature_selection_for_classification.pdf).
> I confirmed this by running logistic regression on the Enron mail dataset (in 
> which each email is represented by the top 100 terms with the highest 
> information gain) and got 92% accuracy and 82% precision.
> This ticket will create two new streaming expressions. Both of them use the 
> same *parallel iterative framework* as SOLR-8492.
> {code}
> featuresSelection(collection1, q="*:*", field="tv_text", outcome="out_i", 
> positiveLabel=1, numTerms=100)
> {code}
> featuresSelection will emit the top terms with the highest information gain 
> scores. It can be combined with the new tlogit stream.
> {code}
> tlogit(collection1, q="*:*",
>        featuresSelection(collection1,
>                          q="*:*",
>                          field="tv_text",
>                          outcome="out_i",
>                          positiveLabel=1,
>                          numTerms=100),
>        field="tv_text",
>        outcome="out_i",
>        maxIterations=100)
> {code}
> In iteration n, the text logistic regression will emit the nth model and 
> compute the error of the (n-1)th model, because the error would be wrong if we 
> computed it dynamically within the same iteration.
> In each iteration tlogit will adjust the learning rate based on the error of the 
> previous iteration: it will increase the learning rate by 5% if the error is 
> going down and decrease it by 50% if it is going up.
> This will support use cases such as building models for spam detection, 
> sentiment analysis and threat detection.






[jira] [Comment Edited] (SOLR-9252) Feature selection and logistic regression on text

2016-07-12 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373522#comment-15373522
 ] 

Joel Bernstein edited comment on SOLR-9252 at 7/12/16 7:25 PM:
---

I just reviewed the latest patch. One implementation detail:

The terms component also returns the numDocs now that SOLR-9193 has been 
committed. So you can retrieve the numDocs along with the doc frequencies.

And one question about the use of tf-idf:

You're using tf-idf for the doc vectors which seems like a good idea. Is this a 
typical approach for text regression or is this something you decided to do 
because we have access to these types of stats in the index?


was (Author: joel.bernstein):
I just reviewed the latest patch. One implementation detail:

The terms component also returns the numDocs now that SOLR-9193 has been 
committed. So you can retrieve the numDocs along with the doc frequencies.

And one question about the use of tf-idf:

Your using tf-idf for the doc vectors which seems like a good idea. Is this a 
typical approach for text regression or is this something you decided to do 
because we have access to these types of stats in the index?

> Feature selection and logistic regression on text
> -
>
> Key: SOLR-9252
> URL: https://issues.apache.org/jira/browse/SOLR-9252
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Joel Bernstein
> Attachments: SOLR-9252.patch, SOLR-9252.patch, SOLR-9252.patch, 
> enron1.zip
>
>
> SOLR-9186 came up with a challenge: on each iteration we have to 
> rebuild the tf-idf vector for every document. That is a costly computation if we 
> represent a doc by a lot of terms. Feature selection can help reduce the 
> computation.
> Due to its computational efficiency and simple interpretation, information 
> gain is one of the most popular feature selection methods. It is used to 
> measure the dependence between features and labels and calculates the 
> information gain between the i-th feature and the class labels 
> (http://www.jiliang.xyz/publication/feature_selection_for_classification.pdf).
> I confirmed this by running logistic regression on the Enron mail dataset (in 
> which each email is represented by the top 100 terms with the highest 
> information gain) and got 92% accuracy and 82% precision.
> This ticket will create two new streaming expressions. Both of them use the 
> same *parallel iterative framework* as SOLR-8492.
> {code}
> featuresSelection(collection1, q="*:*", field="tv_text", outcome="out_i", 
> positiveLabel=1, numTerms=100)
> {code}
> featuresSelection will emit the top terms with the highest information gain 
> scores. It can be combined with the new tlogit stream.
> {code}
> tlogit(collection1, q="*:*",
>        featuresSelection(collection1,
>                          q="*:*",
>                          field="tv_text",
>                          outcome="out_i",
>                          positiveLabel=1,
>                          numTerms=100),
>        field="tv_text",
>        outcome="out_i",
>        maxIterations=100)
> {code}
> In iteration n, the text logistic regression will emit the nth model and 
> compute the error of the (n-1)th model, because the error would be wrong if we 
> computed it dynamically within the same iteration.
> In each iteration tlogit will adjust the learning rate based on the error of the 
> previous iteration: it will increase the learning rate by 5% if the error is 
> going down and decrease it by 50% if it is going up.
> This will support use cases such as building models for spam detection, 
> sentiment analysis and threat detection.






[jira] [Commented] (SOLR-9252) Feature selection and logistic regression on text

2016-07-12 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373522#comment-15373522
 ] 

Joel Bernstein commented on SOLR-9252:
--

I just reviewed the latest patch. One implementation detail:

The terms component also returns the numDocs now that SOLR-9193 has been 
committed. So you can retrieve the numDocs along with the doc frequencies.

And one question about the use of tf-idf:

2) You're using tf-idf for the doc vectors, which seems like a good idea. Is 
this a typical approach for text regression, or is this something you decided 
to do because we have access to these types of stats in the index?
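
For context on the question above: tf-idf weighting needs exactly the two 
statistics being discussed, the per-term doc frequency and the collection-wide 
numDocs. A minimal sketch of one common variant (illustrative only, not the 
patch code):

{code}
// One common tf-idf variant, illustrative only. df(t) and numDocs are the two
// statistics the terms component can return together per SOLR-9193.
static double tfIdf(double termFreqInDoc, long docFreq, long numDocs) {
  double tf = termFreqInDoc > 0 ? 1 + Math.log(termFreqInDoc) : 0; // sublinear tf
  double idf = Math.log((double) numDocs / (1 + docFreq));         // smoothed idf
  return tf * idf;
}
{code}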

> Feature selection and logistic regression on text
> -
>
> Key: SOLR-9252
> URL: https://issues.apache.org/jira/browse/SOLR-9252
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Joel Bernstein
> Attachments: SOLR-9252.patch, SOLR-9252.patch, SOLR-9252.patch, 
> enron1.zip
>
>
> SOLR-9186 came up with a challenge: in each iteration we have to rebuild the 
> tf-idf vector for every document. That is a costly computation if we 
> represent a doc by a lot of terms. Feature selection can help reduce the 
> computation.
> Due to its computational efficiency and simple interpretation, information 
> gain is one of the most popular feature selection methods. It is used to 
> measure the dependence between features and labels and calculates the 
> information gain between the i-th feature and the class labels 
> (http://www.jiliang.xyz/publication/feature_selection_for_classification.pdf).
> I confirmed this by running logistic regression on the Enron mail dataset 
> (in which each email is represented by the top 100 terms with the highest 
> information gain) and got 92% accuracy and 82% precision.
> This ticket will create two new streaming expressions. Both of them use the 
> same *parallel iterative framework* as SOLR-8492.
> {code}
> featuresSelection(collection1, q="*:*",  field="tv_text", outcome="out_i", 
> positiveLabel=1, numTerms=100)
> {code}
> featuresSelection will emit the top terms that have the highest information 
> gain scores. It can be combined with the new tlogit stream.
> {code}
> tlogit(collection1, q="*:*",
>  featuresSelection(collection1, 
>   q="*:*",  
>   field="tv_text", 
>   outcome="out_i", 
>   positiveLabel=1, 
>   numTerms=100),
>  field="tv_text",
>  outcome="out_i",
>  maxIterations=100)
> {code}
> In iteration n, the text logistic regression will emit the nth model and 
> compute the error of the (n-1)th model, because the error would be wrong if 
> we computed it dynamically within the same iteration.
> In each iteration tlogit will adjust the learning rate based on the error of 
> the previous iteration: it will increase the learning rate by 5% if the 
> error is going down and decrease it by 50% if the error is going up.
> This will support use cases such as building models for spam detection, 
> sentiment analysis and threat detection. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-9252) Feature selection and logistic regression on text

2016-07-12 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373522#comment-15373522
 ] 

Joel Bernstein edited comment on SOLR-9252 at 7/12/16 7:25 PM:
---

I just reviewed the latest patch. One implementation detail:

The terms component also returns the numDocs now that SOLR-9193 has been 
committed. So you can retrieve the numDocs along with the doc frequencies.

And one question about the use of tf-idf:

You're using tf-idf for the doc vectors, which seems like a good idea. Is this a 
typical approach for text regression, or is this something you decided to do 
because we have access to these types of stats in the index?


was (Author: joel.bernstein):
I just reviewed the latest patch. One implementation detail:

The terms component also returns the numDocs now that SOLR-9193 has been 
committed. So you can retrieve the numDocs along with the doc frequencies.

And one question about the use of tf-idf:

2) You're using tf-idf for the doc vectors, which seems like a good idea. Is 
this a typical approach for text regression, or is this something you decided 
to do because we have access to these types of stats in the index?

> Feature selection and logistic regression on text
> -
>
> Key: SOLR-9252
> URL: https://issues.apache.org/jira/browse/SOLR-9252
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Joel Bernstein
> Attachments: SOLR-9252.patch, SOLR-9252.patch, SOLR-9252.patch, 
> enron1.zip
>
>
> SOLR-9186 came up with a challenge: in each iteration we have to rebuild the 
> tf-idf vector for every document. That is a costly computation if we 
> represent a doc by a lot of terms. Feature selection can help reduce the 
> computation.
> Due to its computational efficiency and simple interpretation, information 
> gain is one of the most popular feature selection methods. It is used to 
> measure the dependence between features and labels and calculates the 
> information gain between the i-th feature and the class labels 
> (http://www.jiliang.xyz/publication/feature_selection_for_classification.pdf).
> I confirmed this by running logistic regression on the Enron mail dataset 
> (in which each email is represented by the top 100 terms with the highest 
> information gain) and got 92% accuracy and 82% precision.
> This ticket will create two new streaming expressions. Both of them use the 
> same *parallel iterative framework* as SOLR-8492.
> {code}
> featuresSelection(collection1, q="*:*",  field="tv_text", outcome="out_i", 
> positiveLabel=1, numTerms=100)
> {code}
> featuresSelection will emit the top terms that have the highest information 
> gain scores. It can be combined with the new tlogit stream.
> {code}
> tlogit(collection1, q="*:*",
>  featuresSelection(collection1, 
>   q="*:*",  
>   field="tv_text", 
>   outcome="out_i", 
>   positiveLabel=1, 
>   numTerms=100),
>  field="tv_text",
>  outcome="out_i",
>  maxIterations=100)
> {code}
> In iteration n, the text logistic regression will emit the nth model and 
> compute the error of the (n-1)th model, because the error would be wrong if 
> we computed it dynamically within the same iteration.
> In each iteration tlogit will adjust the learning rate based on the error of 
> the previous iteration: it will increase the learning rate by 5% if the 
> error is going down and decrease it by 50% if the error is going up.
> This will support use cases such as building models for spam detection, 
> sentiment analysis and threat detection. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7013) Move license header before package declaration in all *.java files

2016-07-12 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373509#comment-15373509
 ] 

Steve Rowe commented on LUCENE-7013:


No rush :) - good luck with the house!

> Move license header before package declaration in all *.java files
> --
>
> Key: LUCENE-7013
> URL: https://issues.apache.org/jira/browse/LUCENE-7013
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Shai Erera
>Assignee: Shai Erera
>Priority: Minor
> Fix For: 5.5, trunk
>
> Attachments: LUCENE-7013-precommit.patch, LUCENE-7013.patch, 
> mvcopyright.py, mvcopyright.py
>
>
> In LUCENE-7012 we committed a change to the IDE templates to place the 
> license header before the package declaration in new Java files.
> I wrote a simple Python script which moves the header before the package 
> declaration. To be on the safe side, if a .java file does not already start 
> with the license header or with {{package org.apache}}, it doesn't modify it 
> and asks for manual intervention.
> It runs quite fast, so I don't mind running and committing one module at a 
> time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9298) solr/contrib/analysis-extras tests fail with maven (SSLTestConfig)

2016-07-12 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373506#comment-15373506
 ] 

Steve Rowe commented on SOLR-9298:
--

The analysis-extras module tests passed for me under Maven; +1 to commit.

But other modules' tests under Maven seem hosed: I got failures from solrj 
({{java.lang.ClassNotFoundException: org.hsqldb.jdbcDriver}}), so the Solr core 
tests didn't run as a result.  It also looks like the Lucene core tests were 
skipped for some reason.  I'll investigate.

> solr/contrib/analysis-extras tests fail with maven (SSLTestConfig)
> --
>
> Key: SOLR-9298
> URL: https://issues.apache.org/jira/browse/SOLR-9298
> Project: Solr
>  Issue Type: Test
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Christine Poerschke
>Priority: Minor
> Attachments: SOLR-9298.patch
>
>
> The error/exception concerned is
> {code}
> java.lang.IllegalStateException: Unable to locate keystore resource file in 
> classpath: SSLTestConfig.testing.keystore
> {code}
> and it seems something similar to the [dev-tools/idea 
> commit|https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6942fe2] 
> in SOLR-8970 is also needed for 
> [dev-tools/maven|https://github.com/apache/lucene-solr/tree/master/dev-tools/maven]'s
>  
> [solr/test-framework/pom.xml.template|https://github.com/apache/lucene-solr/blob/master/dev-tools/maven/solr/test-framework/pom.xml.template#L61]
>  file.
> Attached patch seems to work but I am new to maven and so would very much 
> appreciate additional input on this. Thank you.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7013) Move license header before package declaration in all *.java files

2016-07-12 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373486#comment-15373486
 ] 

Uwe Schindler commented on LUCENE-7013:
---

Hi, sorry for the delay. Let me apply the patch tomorrow. I will for sure 
report back! I am a bit busy because I'm buying a house...

> Move license header before package declaration in all *.java files
> --
>
> Key: LUCENE-7013
> URL: https://issues.apache.org/jira/browse/LUCENE-7013
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Shai Erera
>Assignee: Shai Erera
>Priority: Minor
> Fix For: 5.5, trunk
>
> Attachments: LUCENE-7013-precommit.patch, LUCENE-7013.patch, 
> mvcopyright.py, mvcopyright.py
>
>
> In LUCENE-7012 we committed a change to the IDE templates to place the 
> license header before the package declaration in new Java files.
> I wrote a simple Python script which moves the header before the package 
> declaration. To be on the safe side, if a .java file does not already start 
> with the license header or with {{package org.apache}}, it doesn't modify it 
> and asks for manual intervention.
> It runs quite fast, so I don't mind running and committing one module at a 
> time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5944) Support updates of numeric DocValues

2016-07-12 Thread Ishan Chattopadhyaya (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chattopadhyaya updated SOLR-5944:
---
Attachment: SOLR-5944.patch

Updated patch:
# Refactored the logic to add a previous version to an AddUpdateCommand, moved 
it to DUP from JavabinLoader/XMLLoader
# Updated javadocs to make them more detailed. Added javadocs to some related 
existing methods in VersionInfo.

> Support updates of numeric DocValues
> 
>
> Key: SOLR-5944
> URL: https://issues.apache.org/jira/browse/SOLR-5944
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>Assignee: Shalin Shekhar Mangar
> Attachments: DUP.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> TestStressInPlaceUpdates.eb044ac71.beast-167-failure.stdout.txt, 
> TestStressInPlaceUpdates.eb044ac71.beast-587-failure.stdout.txt, 
> TestStressInPlaceUpdates.eb044ac71.failures.tar.gz, 
> hoss.62D328FA1DEA57FD.fail.txt, hoss.62D328FA1DEA57FD.fail2.txt, 
> hoss.62D328FA1DEA57FD.fail3.txt, hoss.D768DD9443A98DC.fail.txt, 
> hoss.D768DD9443A98DC.pass.txt
>
>
> LUCENE-5189 introduced support for updates to numeric docvalues. It would be 
> really nice to have Solr support this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-12 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-9290:

Attachment: SOLR-9290-debug.patch
setup-solr.sh

setup-solr.sh is a small script that I use to set up different versions with 
SSL enabled and debug logging enabled for httpclient.

The debug logging looks like the following:
{code}
2016-07-12 13:25:05.692 DEBUG 
(httpShardExecutor-4-thread-7-processing-x:xyz_shard3_replica1 r:core_node5 
https:127.0.1.1:8983//solr//xyz_shard3_replica2// n:127.0.1.1:8984_solr 
s:shard3 c:xyz [https:127.0.1.1:8983//solr//xyz_shard3_replica2//]) [c:xyz 
s:shard3 r:core_node5 x:xyz_shard3_replica1] 
o.a.h.i.c.PoolingClientConnectionManager Connection leased: [id: 17][route: 
{s}->https://127.0.1.1:8983][total kept alive: 0; route allocated: 5 of 10; 
total allocated: 5 of 10]
...
2016-07-12 13:25:05.791 DEBUG 
(recoveryExecutor-3-thread-4-processing-n:127.0.1.1:8984_solr 
x:xyz_shard1_replica1 s:shard1 c:xyz r:core_node2) [c:xyz s:shard1 r:core_node2 
x:xyz_shard1_replica1] o.a.h.i.c.PoolingClientConnectionManager Connection 
released: [id: 17][route: {s}->https://127.0.1.1:8983][total kept alive: 8; 
route allocated: 8 of 10; total allocated: 8 of 10]
{code}

The attached SOLR-9290-debug.patch applies to 5.3.x and changes HttpSolrClient 
to log the connection details including the client port number for each request.

> TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled
> --
>
> Key: SOLR-9290
> URL: https://issues.apache.org/jira/browse/SOLR-9290
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 5.5.1, 5.5.2
>Reporter: Anshum Gupta
>Priority: Critical
> Attachments: SOLR-9290-debug.patch, setup-solr.sh
>
>
> Heavy indexing on Solr with SSL leads to a lot of connections in CLOSE_WAIT 
> state. 
> At my workplace, we have seen this issue only with 5.5.1 and could not 
> reproduce it with 5.4.1 but from my conversation with Shalin, he knows of 
> users with 5.3.1 running into this issue too. 
> Here's an excerpt from the email [~shaie] sent to the mailing list about 
> what we see:
> {quote}
> 1) It consistently reproduces on 5.5.1, but *does not* reproduce on 5.4.1
> 2) It does not reproduce when SSL is disabled
> 3) Restarting the Solr process (sometimes both need to be restarted), the
> count drops to 0, but if indexing continues, they climb up again
> When it does happen, Solr seems stuck. The leader cannot talk to the
> replica, or vice versa, the replica is usually put in DOWN state and
> there's no way to fix it besides restarting the JVM.
> {quote}
> Here's the mail thread: 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201607.mbox/%3c46cc66220a8143dc903fa34e79205...@vp-exc01.dips.local%3E
> Creating this issue so we could track this and have more people comment on 
> what they see. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-12 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373452#comment-15373452
 ] 

Shalin Shekhar Mangar commented on SOLR-9290:
-

It is reproducible very easily on stock solr with SSL enabled. My test setup 
creates two SSL-enabled Solr instances with a 5 shard x 2 replica collection 
and runs a short indexing program (just 9 update requests with 1 document each 
and a commit at the end). Keep running the indexing program repeatedly and 
the number of connections in the CLOSE_WAIT state gradually increases.

Interestingly, the number of connections stuck in CLOSE_WAIT decreases during 
indexing and increases again about 10 or so seconds after the indexing is 
stopped.

I can reproduce the problem on 6.1, 6.0, 5.5.1, 5.3.2. I am not able to 
reproduce this on master although I don't see anything relevant that has 
changed since 6.1 -- I tried this only once so it may have just been bad timing?

When the connections show up in the CLOSE_WAIT state, the recv-q buffer always 
has exactly 70 bytes.
{code}
netstat -tonp | grep CLOSE_WAIT | grep java
tcp   70  0 127.0.0.1:56538 127.0.1.1:8983  CLOSE_WAIT  
21654/java   off (0.00/0/0)
tcp   70  0 127.0.0.1:47995 127.0.1.1:8984  CLOSE_WAIT  
21654/java   off (0.00/0/0)
tcp   70  0 127.0.0.1:47477 127.0.1.1:8984  CLOSE_WAIT  
21654/java   off (0.00/0/0)
tcp   70  0 127.0.0.1:47996 127.0.1.1:8984  CLOSE_WAIT  
21654/java   off (0.00/0/0)
tcp   70  0 127.0.0.1:56644 127.0.1.1:8983  CLOSE_WAIT  
21654/java   off (0.00/0/0)
tcp   70  0 127.0.0.1:56533 127.0.1.1:8983  CLOSE_WAIT  
21654/java   off (0.00/0/0)
...
{code}

If I run the same steps with SSL disabled, then the connections in the 
CLOSE_WAIT state have just 1 byte in recv-q. I don't see the number of such 
connections increasing with indexing over time, but I know for a fact (from a 
client) that eventually more and more connections pile up in this state even 
without SSL.
{code}
tcp   1  0 127.0.0.1:41723 127.0.1.1:8983  CLOSE_WAIT  
2522/javaoff (0.00/0/0)
tcp   1  0 127.0.0.1:41780 127.0.1.1:8983  CLOSE_WAIT  
2640/javaoff (0.00/0/0)
...
{code}

I enabled debug logging for PoolingHttpClientConnectionManager (used in 6.x) 
and PoolingClientConnectionManager (used in 5.x), and after running the 
indexing program and verifying that some connections are in CLOSE_WAIT, I 
grepped the logs for connections leased vs. released. I always find the 
numbers to be the same, which means that the connections are always given back 
to the pool.

Now some connections hanging around in CLOSE_WAIT are to be expected because of 
the following (quoted from the httpclient documentation):
{quote}
One of the major shortcomings of the classic blocking I/O model is that the 
network socket can react to I/O events only when blocked in an I/O operation. 
When a connection is released back to the manager, it can be kept alive however 
it is unable to monitor the status of the socket and react to any I/O events. 
If the connection gets closed on the server side, the client side connection is 
unable to detect the change in the connection state (and react appropriately by 
closing the socket on its end).
HttpClient tries to mitigate the problem by testing whether the connection is 
'stale', that is no longer valid because it was closed on the server side, 
prior to using the connection for executing an HTTP request. The stale 
connection check is not 100% reliable. The only feasible solution that does not 
involve a one thread per socket model for idle connections is a dedicated 
monitor thread used to evict connections that are considered expired due to a 
long period of inactivity. The monitor thread can periodically call 
ClientConnectionManager#closeExpiredConnections() method to close all expired 
connections and evict closed connections from the pool. It can also optionally 
call ClientConnectionManager#closeIdleConnections() method to close all 
connections that have been idle over a given period of time.
{quote}

I'm going to try adding such a monitor thread and see if this is still a 
problem.
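
For reference, a minimal sketch of such a monitor thread against the 
HttpClient 4.x connection-manager API quoted above; the 5-second period and 
30-second idle timeout are illustrative choices, not a committed fix:

{code}
import java.util.concurrent.TimeUnit;
import org.apache.http.conn.HttpClientConnectionManager;

/** Illustrative sketch of the evictor thread described in the httpclient docs. */
class IdleConnectionMonitorThread extends Thread {
  private final HttpClientConnectionManager connMgr;
  private volatile boolean shutdown;

  IdleConnectionMonitorThread(HttpClientConnectionManager connMgr) {
    this.connMgr = connMgr;
    setDaemon(true);
  }

  @Override
  public void run() {
    try {
      while (!shutdown) {
        synchronized (this) {
          wait(5000);                                         // check every 5 seconds
          connMgr.closeExpiredConnections();                  // evict expired connections
          connMgr.closeIdleConnections(30, TimeUnit.SECONDS); // and long-idle ones
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();                     // terminate on interrupt
    }
  }

  void shutdown() {
    shutdown = true;
    synchronized (this) {
      notifyAll();
    }
  }
}
{code}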

> TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled
> --
>
> Key: SOLR-9290
> URL: https://issues.apache.org/jira/browse/SOLR-9290
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 5.5.1, 5.5.2
>Reporter: Anshum Gupta
>Priority: Critical
>
> Heavy indexing on Solr with SSL leads to a lot of connections in CLOSE_WAIT 
> state. 
> At my workplace, 

[JENKINS] Lucene-Solr-NightlyTests-master - Build # 1069 - Still Failing

2016-07-12 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-master/1069/

12 tests failed.
FAILED:  org.apache.solr.security.BasicAuthIntegrationTest.testBasics

Error Message:
IOException occured when talking to server at: 
http://127.0.0.1:33883/solr/testSolrCloudCollection_shard1_replica1

Stack Trace:
org.apache.solr.client.solrj.impl.CloudSolrClient$RouteException: IOException 
occured when talking to server at: 
http://127.0.0.1:33883/solr/testSolrCloudCollection_shard1_replica1
at 
__randomizedtesting.SeedInfo.seed([5478594789356A7C:69A0F76BB1DB340C]:0)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.directUpdate(CloudSolrClient.java:739)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1151)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1040)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:976)
at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
at 
org.apache.solr.security.BasicAuthIntegrationTest.doExtraTests(BasicAuthIntegrationTest.java:193)
at 
org.apache.solr.cloud.TestMiniSolrCloudClusterBase.testCollectionCreateSearchDelete(TestMiniSolrCloudClusterBase.java:196)
at 
org.apache.solr.cloud.TestMiniSolrCloudClusterBase.testBasics(TestMiniSolrCloudClusterBase.java:79)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1764)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:871)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:907)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:921)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:809)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:460)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:880)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:781)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:816)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at 

[jira] [Commented] (SOLR-9298) solr/contrib/analysis-extras tests fail with maven (SSLTestConfig)

2016-07-12 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373328#comment-15373328
 ] 

Steve Rowe commented on SOLR-9298:
--

Christine, I think your patch is the right fix.  I'd do one extra thing though: 
get rid of the {{\*\*/\*.java}}, which 
only makes sense when resources are co-located in a {{src/java/}} or 
{{src/test/}} directory.  (Trivial point here though since {{src/resources/}} 
doesn't contain any .java files right now.)

Initially I thought the existing resource (all non-.java files in the module) 
should be preserved, but when I did that (by adding a separate resource 
element for {{src/resources/}} instead of replacing it as your patch does), I 
could see that the jar actually contains the test keystore twice: once at the 
root (from the {{src/resources/}} resource element), and once at 
{{src/resources/}}, and the latter is useless (and the cause of this issue).

I'm running tests with your patch locally now.

> solr/contrib/analysis-extras tests fail with maven (SSLTestConfig)
> --
>
> Key: SOLR-9298
> URL: https://issues.apache.org/jira/browse/SOLR-9298
> Project: Solr
>  Issue Type: Test
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Christine Poerschke
>Priority: Minor
> Attachments: SOLR-9298.patch
>
>
> The error/exception concerned is
> {code}
> java.lang.IllegalStateException: Unable to locate keystore resource file in 
> classpath: SSLTestConfig.testing.keystore
> {code}
> and it seems something similar to the [dev-tools/idea 
> commit|https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6942fe2] 
> in SOLR-8970 is also needed for 
> [dev-tools/maven|https://github.com/apache/lucene-solr/tree/master/dev-tools/maven]'s
>  
> [solr/test-framework/pom.xml.template|https://github.com/apache/lucene-solr/blob/master/dev-tools/maven/solr/test-framework/pom.xml.template#L61]
>  file.
> Attached patch seems to work but I am new to maven and so would very much 
> appreciate additional input on this. Thank you.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9240) Support parallel ETL with the topic expression

2016-07-12 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-9240:
-
Description: 
It would be useful for Solr to support large scale *Extract, Transform and 
Load* use cases with streaming expressions. Instead of using MapReduce for the 
ETL, the topic expression will be used and SolrCloud will behave like a giant 
message queue filled with data to be processed. The topic expression works in 
batches and supports retrieval of stored fields, so large scale *text ETL* will 
work perfectly with this approach.

This ticket makes two small changes to the topic() expression that make this 
possible:

1) Changes the topic expression so it can operate in parallel.
2) Adds the initialCheckpoint parameter to the topic expression so a topic can 
start pulling records from anywhere in the queue.

Daemons can be sent to worker nodes that each work on processing a partition of 
the data from the same topic. The daemon() function's natural behavior is 
perfect for iteratively calling a topic until all records in the topic have 
been processed.

The sample code below pulls all records from one collection and indexes them 
into another collection. A Transform function could be wrapped around the 
topic() to transform the records before loading. Custom functions can also be 
built to load the data in parallel to any outside system. 

{code}

parallel(
  workerCollection,
  workers="2",
  sort="_version_ desc",
  daemon(
    update(
      updateCollection,
      batchSize=200,
      topic(
        checkpointCollection,
        topicCollection,
        q=*:*,
        id="topic1",
        fl="id, to, from, body",
        partitionKeys="id",
        initialCheckpoint="0")),
    runInterval="1000",
    id="daemon1"))
{code}




  was:
It would be useful for Solr to support large scale *Extract, Transform and 
Load* use cases with streaming expressions. Instead of using MapReduce for the 
ETL, the topic expression will be used and SolrCloud will behave like a giant 
message queue filled with data to be processed. The topic expression works in 
batches and supports retrieval of stored fields, so large scale *text ETL* will 
work perfectly with this approach.

This ticket makes two small changes to the topic() expression that make this 
possible:

1) Changes the topic() behavior so it can operate in parallel.
2) Adds the initialCheckpoint parameter to the topic expression so a topic can 
start pulling records from anywhere in the queue.

Daemons can be sent to worker nodes that each work on processing a partition of 
the data from the same topic. The daemon() function's natural behavior is 
perfect for iteratively calling a topic until all records in the topic have 
been processed.

The sample code below pulls all records from one collection and indexes them 
into another collection. A Transform function could be wrapped around the 
topic() to transform the records before loading. Custom functions can also be 
built to load the data in parallel to any outside system. 

{code}

parallel(
  workerCollection,
  workers="2",
  sort="_version_ desc",
  daemon(
    update(
      updateCollection,
      batchSize=200,
      topic(
        checkpointCollection,
        topicCollection,
        q=*:*,
        id="topic1",
        fl="id, to, from, body",
        partitionKeys="id",
        initialCheckpoint="0")),
    runInterval="1000",
    id="daemon1"))
{code}





> Support parallel ETL with the topic expression
> --
>
> Key: SOLR-9240
> URL: https://issues.apache.org/jira/browse/SOLR-9240
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Fix For: 6.2
>
> Attachments: SOLR-9240.patch, SOLR-9240.patch
>
>
> It would be useful for Solr to support large scale *Extract, Transform and 
> Load* use cases with streaming expressions. Instead of using MapReduce for 
> the ETL, the topic expression will be used and SolrCloud will behave like a 
> giant message queue filled with data to be processed. The topic expression 
> works in batches and supports retrieval of stored fields, so large scale 
> *text ETL* will work perfectly with this approach.
> This ticket makes two small changes to the 

[jira] [Updated] (SOLR-9299) Allow Streaming Expressions to use Analyzers

2016-07-12 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-9299:
-
Description: As SOLR-9240 is close to completion it will be important for 
Streaming Expressions to be able to analyze text fields. This ticket will add 
this capability.  (was: As SOLR-9240 is close to completion it we be important 
for Streaming Expression to be able to analyze text fields. This ticket will 
add this capability.)

> Allow Streaming Expressions to use Analyzers
> 
>
> Key: SOLR-9299
> URL: https://issues.apache.org/jira/browse/SOLR-9299
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>
> As SOLR-9240 is close to completion it will be important for Streaming 
> Expressions to be able to analyze text fields. This ticket will add this 
> capability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7013) Move license header before package declaration in all *.java files

2016-07-12 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373254#comment-15373254
 ] 

Steve Rowe commented on LUCENE-7013:


Did you get a chance to look [~thetaphi]?

> Move license header before package declaration in all *.java files
> --
>
> Key: LUCENE-7013
> URL: https://issues.apache.org/jira/browse/LUCENE-7013
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Shai Erera
>Assignee: Shai Erera
>Priority: Minor
> Fix For: 5.5, trunk
>
> Attachments: LUCENE-7013-precommit.patch, LUCENE-7013.patch, 
> mvcopyright.py, mvcopyright.py
>
>
> In LUCENE-7012 we committed a change to the IDE templates to place the 
> license header before the package declaration in new Java files.
> I wrote a simple Python script which moves the header before the package 
> declaration. To be on the safe side, if a .java file does not already start 
> with the license header or with {{package org.apache}}, it doesn't modify it 
> and asks for manual intervention.
> It runs quite fast, so I don't mind running and committing one module at a 
> time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9299) Allow Streaming Expressions to use Analyzers

2016-07-12 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373222#comment-15373222
 ] 

Joel Bernstein commented on SOLR-9299:
--

[~caomanhdat], I know that you are working on this as part of the larger 
Classification ticket. How would you feel about splitting this functionality 
off into this ticket to support SOLR-9240?

> Allow Streaming Expressions to use Analyzers
> 
>
> Key: SOLR-9299
> URL: https://issues.apache.org/jira/browse/SOLR-9299
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>
> As SOLR-9240 is close to completion it will be important for Streaming 
> Expressions to be able to analyze text fields. This ticket will add this 
> capability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-9299) Allow Streaming Expressions to use Analyzers

2016-07-12 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373222#comment-15373222
 ] 

Joel Bernstein edited comment on SOLR-9299 at 7/12/16 4:52 PM:
---

[~caomanhdat], I know that you are working on this as part of a larger 
Classification ticket. How would you feel about splitting this functionality 
off into this ticket to support SOLR-9240?


was (Author: joel.bernstein):
[~caomanhdat], I know that you are working on this as part a larger 
Classification ticket. How would you feel about splitting this functionality 
off into this ticket to support SOLR-9240?

> Allow Streaming Expressions to use Analyzers
> 
>
> Key: SOLR-9299
> URL: https://issues.apache.org/jira/browse/SOLR-9299
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>
> As SOLR-9240 is close to completion it will be important for Streaming 
> Expressions to be able to analyze text fields. This ticket will add this 
> capability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-9299) Allow Streaming Expressions to use Analyzers

2016-07-12 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373222#comment-15373222
 ] 

Joel Bernstein edited comment on SOLR-9299 at 7/12/16 4:51 PM:
---

[~caomanhdat], I know that you are working on this as part a larger 
Classification ticket. How would you feel about splitting this functionality 
off into this ticket to support SOLR-9240?


was (Author: joel.bernstein):
[~caomanhdat], I know that you are working on this as part of the larger 
Classification ticket. How would you feel about splitting this functionality 
off into this ticket to support SOLR-9240?

> Allow Streaming Expressions to use Analyzers
> 
>
> Key: SOLR-9299
> URL: https://issues.apache.org/jira/browse/SOLR-9299
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>
> As SOLR-9240 is close to completion it will be important for Streaming 
> Expressions to be able to analyze text fields. This ticket will add this 
> capability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-9299) Allow Streaming Expressions to use Analyzers

2016-07-12 Thread Joel Bernstein (JIRA)
Joel Bernstein created SOLR-9299:


 Summary: Allow Streaming Expressions to use Analyzers
 Key: SOLR-9299
 URL: https://issues.apache.org/jira/browse/SOLR-9299
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Joel Bernstein


As SOLR-9240 is close to completion it will be important for Streaming 
Expressions to be able to analyze text fields. This ticket will add this 
capability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Near real time search improvement

2016-07-12 Thread Adrien Grand
This is not something I am very familiar with, but this issue
https://issues.apache.org/jira/browse/LUCENE-2312 tried to improve NRT
latency by adding the ability to search directly into the indexing buffer
of the index writer.

On Tue, 12 Jul 2016 at 16:11, Konstantin wrote:

> Hello everyone,
> As far as I understand, NRT requires flushing a new segment to disk. Is it
> correct that the write cache is not searchable?
>
> A competing search library, groonga, claims to have much smaller realtime
> search latency (as far as I understand, via a searchable write-cache), but
> loading data into their index takes almost three times longer (the benchmark
> is in a blog post in Japanese; it seems to index a Wikipedia XML dump,
> though I'm not sure if it's the English one).
>
> I've created an incomplete prototype of a searchable write cache in my pet
> project, and it takes two times longer to index a fraction of Wikipedia
> using the same EnglishAnalyzer from lucene.analysis (probably there is room
> for optimizations). While loading data into Lucene I didn't reuse Document
> instances. The searchable write-cache was implemented as a bunch of
> persistent Scala SortedMap[TermKey, Measure] instances, one per logical
> core, where TermKey is defined as TermKey(termID: Int, docID: Long) and
> Measure is just frequency and norm (but could be extended).
>
> Do you think it's worth the slowdown? If so, I'm interested to learn how
> this part of Lucene works while implementing this feature. However, it is
> unclear to me how hard it would be to change the existing implementation. I
> cannot wrap my head around TermsHash and the whole flush process -- are
> there any docs or good blog posts to read about it?
>
>
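
A rough Java rendering of the per-core sorted-map design described above may 
help readers follow the idea; the Scala prototype uses persistent SortedMaps, 
and all names and types here are illustrative:

{code}
import java.util.NavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

/**
 * Rough rendering of the searchable write-cache sketched above: one sorted map
 * per logical core, keyed by (termID, docID) so that all buffered postings for
 * a term sit in one contiguous key range. Illustrative only -- the prototype
 * itself uses persistent Scala SortedMaps rather than a concurrent skip list.
 */
class WriteCacheSketch {

  /** Per-(term, doc) stats: frequency and norm, as described above. */
  record Measure(int freq, float norm) {}

  /** Composite key ordered by termID first, then docID. */
  record TermKey(int termID, long docID) implements Comparable<TermKey> {
    public int compareTo(TermKey o) {
      int c = Integer.compare(termID, o.termID);
      return c != 0 ? c : Long.compare(docID, o.docID);
    }
  }

  private final ConcurrentSkipListMap<TermKey, Measure> buffer =
      new ConcurrentSkipListMap<>();

  /** Called at index time for each (term, doc) pair. */
  void add(int termID, long docID, int freq, float norm) {
    buffer.put(new TermKey(termID, docID), new Measure(freq, norm));
  }

  /** Search-time scan: all buffered postings for one term, in docID order. */
  NavigableMap<TermKey, Measure> postings(int termID) {
    return buffer.subMap(new TermKey(termID, 0L), true,
                         new TermKey(termID + 1, 0L), false);
  }
}
{code}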


[JENKINS] Lucene-Solr-master-Solaris (64bit/jdk1.8.0) - Build # 714 - Still Failing!

2016-07-12 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-master-Solaris/714/
Java: 64bit/jdk1.8.0 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC

2 tests failed.
FAILED:  org.apache.solr.cloud.TestLocalFSCloudBackupRestore.test

Error Message:
Error from server at http://127.0.0.1:46928/solr: 'location' is not specified 
as a query parameter or as a default repository property or as a cluster 
property.

Stack Trace:
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
from server at http://127.0.0.1:46928/solr: 'location' is not specified as a 
query parameter or as a default repository property or as a cluster property.
at 
__randomizedtesting.SeedInfo.seed([A4C4B76E94DE792E:2C9088B43A2214D6]:0)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:606)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:259)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248)
at 
org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:413)
at 
org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:366)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1270)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1040)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:976)
at 
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:149)
at 
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:166)
at 
org.apache.solr.cloud.AbstractCloudBackupRestoreTestCase.testInvalidPath(AbstractCloudBackupRestoreTestCase.java:149)
at 
org.apache.solr.cloud.AbstractCloudBackupRestoreTestCase.test(AbstractCloudBackupRestoreTestCase.java:128)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1764)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:871)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:907)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:921)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:809)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:460)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:880)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:781)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:816)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 

[jira] [Updated] (SOLR-9298) solr/contrib/analysis-extras tests fail with maven (SSLTestConfig)

2016-07-12 Thread Christine Poerschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christine Poerschke updated SOLR-9298:
--
Attachment: SOLR-9298.patch

> solr/contrib/analysis-extras tests fail with maven (SSLTestConfig)
> --
>
> Key: SOLR-9298
> URL: https://issues.apache.org/jira/browse/SOLR-9298
> Project: Solr
>  Issue Type: Test
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Christine Poerschke
>Priority: Minor
> Attachments: SOLR-9298.patch
>
>
> The error/exception concerned is
> {code}
> java.lang.IllegalStateException: Unable to locate keystore resource file in 
> classpath: SSLTestConfig.testing.keystore
> {code}
> and it seems something similar to the [dev-tools/idea 
> commit|https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6942fe2] 
> in SOLR-8970 is also needed for 
> [dev-tools/maven|https://github.com/apache/lucene-solr/tree/master/dev-tools/maven]'s
>  
> [solr/test-framework/pom.xml.template|https://github.com/apache/lucene-solr/blob/master/dev-tools/maven/solr/test-framework/pom.xml.template#L61]
>  file.
> Attached patch seems to work but I am new to maven and so would very much 
> appreciate additional input on this. Thank you.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-9298) solr/contrib/analysis-extras tests fail with maven (SSLTestConfig)

2016-07-12 Thread Christine Poerschke (JIRA)
Christine Poerschke created SOLR-9298:
-

 Summary: solr/contrib/analysis-extras tests fail with maven 
(SSLTestConfig)
 Key: SOLR-9298
 URL: https://issues.apache.org/jira/browse/SOLR-9298
 Project: Solr
  Issue Type: Test
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Christine Poerschke
Priority: Minor


The error/exception concerned is
{code}
java.lang.IllegalStateException: Unable to locate keystore resource file in 
classpath: SSLTestConfig.testing.keystore
{code}
and it seems something similar to the [dev-tools/idea 
commit|https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6942fe2] in 
SOLR-8970 is also needed for 
[dev-tools/maven|https://github.com/apache/lucene-solr/tree/master/dev-tools/maven]'s
 
[solr/test-framework/pom.xml.template|https://github.com/apache/lucene-solr/blob/master/dev-tools/maven/solr/test-framework/pom.xml.template#L61]
 file.

Attached patch seems to work but I am new to maven and so would very much 
appreciate additional input on this. Thank you.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9240) Support parallel ETL with the topic expression

2016-07-12 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-9240:
-
Description: 
It would be useful for Solr to support large scale *Extract, Transform and 
Load* use cases with streaming expressions. Instead of using MapReduce for the 
ETL, the topic expression will be used and SolrCloud will behave like a giant 
message queue filled with data to be processed. The topic expression works in 
batches and supports retrieval of stored fields, so large scale *text ETL* will 
work perfectly with this approach.

This ticket makes two small changes to the topic() expression that make this 
possible:

1) Changes the topic() behavior so it can operate in parallel.
2) Adds the initialCheckpoint parameter to the topic expression so a topic can 
start pulling records from anywhere in the queue.

Daemons can be sent to worker nodes that each work on processing a partition of 
the data from the same topic. The daemon() function's natural behavior is 
perfect for iteratively calling a topic until all records in the topic have 
been processed.

The sample code below pulls all records from one collection and indexes them 
into another collection. A Transform function could be wrapped around the 
topic() to transform the records before loading. Custom functions can also be 
built to load the data in parallel to any outside system. 

{code}

parallel(
  workerCollection,
  workers="2",
  sort="_version_ desc",
  daemon(
    update(
      updateCollection,
      batchSize=200,
      topic(
        checkpointCollection,
        topicCollection,
        q=*:*,
        id="topic1",
        fl="id, to, from, body",
        partitionKeys="id",
        initialCheckpoint="0")),
    runInterval="1000",
    id="daemon1"))
{code}




  was:
It would be useful for Solr to support large scale *Extract, Transform and 
Load* use cases with streaming expressions. Instead of using MapReduce for the 
ETL, the topic expression will be used and SolrCloud will behave like a giant 
message queue filled with data to be processed. The topic expression works in 
batches and supports retrieval of stored fields, so large scale *text ETL* will 
work perfectly with this approach.

This ticket makes two small changes to the topic() expression that make this 
possible:

1) Changes the topic() behavior so it can operate in parallel.
2) Adds the initialCheckpoint parameter to the topic expression so a topic can 
start pulling records from anywhere in the queue.

Daemons can be sent to worker nodes that each work on processing a partition of 
the data from the same topic. The daemon() functions natural behavior is 
perfect for iteratively calling a topic until all records in the topic have 
been processed.

The sample code below pulls all records from one collection and indexes them 
into another collection. A Transform function could be wrapped around the 
topic() to transform the records before loading. Custom functions can also be 
built to load the data in parallel to any outside system. 

{code}

parallel(
  workerCollection,
  workers="2",
  sort="_version_ desc",
  daemon(
    update(
      updateCollection,
      batchSize=200,
      topic(
        checkpointCollection,
        topicCollection,
        q=*:*,
        id="topic1",
        fl="id, to, from, body",
        partitionKeys="id",
        initialCheckpoint="0")),
    runInterval="1000",
    id="daemon1"))
{code}





> Support parallel ETL with the topic expression
> --
>
> Key: SOLR-9240
> URL: https://issues.apache.org/jira/browse/SOLR-9240
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Fix For: 6.2
>
> Attachments: SOLR-9240.patch, SOLR-9240.patch
>
>
> It would be useful for Solr to support large scale *Extract, Transform and 
> Load* use cases with streaming expressions. Instead of using MapReduce for 
> the ETL, the topic expression will be used and SolrCloud will behave like a 
> giant message queue filled with data to be processed. The topic expression 
> works in batches and supports retrieval of stored fields, so large scale 
> *text ETL* will work perfectly with this approach.
> This ticket makes two small changes to the 

[jira] [Commented] (SOLR-3702) String concatenation function

2016-07-12 Thread Chantal Ackermann (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373155#comment-15373155
 ] 

Chantal Ackermann commented on SOLR-3702:
-

What's the status on this? It seems like it's finished? I'd be one user of it.

> String concatenation function
> -
>
> Key: SOLR-3702
> URL: https://issues.apache.org/jira/browse/SOLR-3702
> Project: Solr
>  Issue Type: New Feature
>  Components: query parsers
>Affects Versions: 4.0-ALPHA
>Reporter: Ted Strauss
>Assignee: Shalin Shekhar Mangar
> Fix For: 4.9, 6.0
>
> Attachments: SOLR-3702.patch, SOLR-3702.patch, SOLR-3702.patch, 
> SOLR-3702.patch, SOLR-3702.patch
>
>
> Related to https://issues.apache.org/jira/browse/SOLR-2526
> Add query function to support concatenation of Strings.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-7355) Leverage MultiTermAwareComponent in query parsers

2016-07-12 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-7355.
--
   Resolution: Fixed
Fix Version/s: 6.2
   master (7.0)

Thanks David for helping me iterate on this issue.

On the 6.x branch, AnalyzingQueryParser uses the new Analyzer#normalize 
functionality while the classic QueryParser still relies on the 
{{lowercaseExpandedTerms}} option.
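
For readers following along, a small usage sketch of the 6.x behavior 
described above (assuming Lucene 6.2+; this illustrates the API, it is not 
code from the patch):

{code}
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.BytesRef;

// Analyzer#normalize runs only the multi-term-safe parts of the analysis
// chain (e.g. lowercasing) on a fragment of a wildcard/prefix/fuzzy term,
// so a query parser no longer needs the lowercaseExpandedTerms option to
// make expanded terms match the indexed form.
public class NormalizeDemo {
  public static void main(String[] args) {
    Analyzer analyzer = new StandardAnalyzer();
    BytesRef normalized = analyzer.normalize("body", "WilDCarD");
    System.out.println(normalized.utf8ToString()); // expected: "wildcard"
  }
}
{code}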

> Leverage MultiTermAwareComponent in query parsers
> -
>
> Key: LUCENE-7355
> URL: https://issues.apache.org/jira/browse/LUCENE-7355
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: master (7.0), 6.2
>
> Attachments: LUCENE-7355.patch, LUCENE-7355.patch, LUCENE-7355.patch, 
> LUCENE-7355.patch, LUCENE-7355.patch, LUCENE-7355.patch, LUCENE-7355.patch, 
> LUCENE-7355.patch
>
>
> MultiTermAwareComponent is designed to make it possible to do the right thing 
> in query parsers when in comes to analysis of multi-term queries. However, 
> since query parsers just take an analyzer and since analyzers do not 
> propagate the information about what to do for multi-term analysis, query 
> parsers cannot do the right thing out of the box.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-7371) BKDReader could compress values better

2016-07-12 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-7371.
--
   Resolution: Fixed
Fix Version/s: 6.2
   master (7.0)

Thanks Mike!

> BKDReader could compress values better
> --
>
> Key: LUCENE-7371
> URL: https://issues.apache.org/jira/browse/LUCENE-7371
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Fix For: master (7.0), 6.2
>
> Attachments: LUCENE-7371.patch, LUCENE-7371.patch, LUCENE-7371.patch
>
>
> For compressing values, BKDReader only relies on shared prefixes in a block. 
> We could probably easily do better. For instance there are only 256 possible 
> values for the first byte of the dimension that the values are sorted by, yet 
> we use a block size of 1024. So by using something simple like run-length 
> compression we could save 6 bits per value on average.
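
As an illustration of the idea (not the committed encoding), run-length coding the sorted leading bytes of a block might look like this:

{code}
import java.io.IOException;
import org.apache.lucene.store.DataOutput;

// Sketch: values in the block are sorted by this dimension, so equal leading
// bytes form contiguous runs; 1024 one-byte values collapse into at most 256
// (value, runLength) pairs.
static void writeLeadingBytes(DataOutput out, byte[] leadingBytes, int count) throws IOException {
  for (int i = 0; i < count; ) {
    int runEnd = i + 1;
    while (runEnd < count && leadingBytes[runEnd] == leadingBytes[i] && runEnd - i < 0xFF) {
      runEnd++;
    }
    out.writeByte(leadingBytes[i]);      // the byte value
    out.writeByte((byte) (runEnd - i));  // how many times it repeats
    i = runEnd;
  }
}
{code}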



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7371) BKDReader could compress values better

2016-07-12 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15373139#comment-15373139
 ] 

ASF subversion and git services commented on LUCENE-7371:
-

Commit 1a6df249f91ca9f4dab792c48f5965f3388f1776 in lucene-solr's branch 
refs/heads/branch_6x from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1a6df24 ]

LUCENE-7371: Fix CHANGES entry.


> BKDReader could compress values better
> --
>
> Key: LUCENE-7371
> URL: https://issues.apache.org/jira/browse/LUCENE-7371
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Fix For: master (7.0), 6.2
>
> Attachments: LUCENE-7371.patch, LUCENE-7371.patch, LUCENE-7371.patch
>
>
> For compressing values, BKDReader only relies on shared prefixes in a block. 
> We could probably easily do better. For instance there are only 256 possible 
> values for the first byte of the dimension that the values are sorted by, yet 
> we use a block size of 1024. So by using something simple like run-length 
> compression we could save 6 bits per value on average.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7371) BKDReader could compress values better

2016-07-12 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15373140#comment-15373140
 ] 

ASF subversion and git services commented on LUCENE-7371:
-

Commit b54d46722b36f107edd59a8d843b93f5727a9058 in lucene-solr's branch 
refs/heads/master from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b54d467 ]

LUCENE-7371: Fix CHANGES entry.


> BKDReader could compress values better
> --
>
> Key: LUCENE-7371
> URL: https://issues.apache.org/jira/browse/LUCENE-7371
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Fix For: master (7.0), 6.2
>
> Attachments: LUCENE-7371.patch, LUCENE-7371.patch, LUCENE-7371.patch
>
>
> For compressing values, BKDReader only relies on shared prefixes in a block. 
> We could probably easily do better. For instance there are only 256 possible 
> values for the first byte of the dimension that the values are sorted by, yet 
> we use a block size of 1024. So by using something simple like run-length 
> compression we could save 6 bits per value on average.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7371) BKDReader could compress values better

2016-07-12 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15373137#comment-15373137
 ] 

ASF subversion and git services commented on LUCENE-7371:
-

Commit 1f446872aa9346c22643d0fb753ec42942b5a4d2 in lucene-solr's branch 
refs/heads/branch_6x from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1f44687 ]

LUCENE-7371: Better compression of values in Lucene60PointsFormat.


> BKDReader could compress values better
> --
>
> Key: LUCENE-7371
> URL: https://issues.apache.org/jira/browse/LUCENE-7371
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-7371.patch, LUCENE-7371.patch, LUCENE-7371.patch
>
>
> For compressing values, BKDReader only relies on shared prefixes in a block. 
> We could probably easily do better. For instance there are only 256 possible 
> values for the first byte of the dimension that the values are sorted by, yet 
> we use a block size of 1024. So by using something simple like run-length 
> compression we could save 6 bits per value on average.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7355) Leverage MultiTermAwareComponent in query parsers

2016-07-12 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15373136#comment-15373136
 ] 

ASF subversion and git services commented on LUCENE-7355:
-

Commit 7c2e7a0fb80a5bf733cf710aee6cbf01d02629eb in lucene-solr's branch 
refs/heads/branch_6x from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7c2e7a0 ]

LUCENE-7355: Add Analyzer#normalize() and use it in query parsers.


> Leverage MultiTermAwareComponent in query parsers
> -
>
> Key: LUCENE-7355
> URL: https://issues.apache.org/jira/browse/LUCENE-7355
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-7355.patch, LUCENE-7355.patch, LUCENE-7355.patch, 
> LUCENE-7355.patch, LUCENE-7355.patch, LUCENE-7355.patch, LUCENE-7355.patch, 
> LUCENE-7355.patch
>
>
> MultiTermAwareComponent is designed to make it possible to do the right thing 
> in query parsers when it comes to analysis of multi-term queries. However, 
> since query parsers just take an analyzer and since analyzers do not 
> propagate the information about what to do for multi-term analysis, query 
> parsers cannot do the right thing out of the box.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9240) Support parallel ETL with the topic expression

2016-07-12 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-9240:
-
Fix Version/s: 6.2

> Support parallel ETL with the topic expression
> --
>
> Key: SOLR-9240
> URL: https://issues.apache.org/jira/browse/SOLR-9240
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Fix For: 6.2
>
> Attachments: SOLR-9240.patch, SOLR-9240.patch
>
>
> It would be useful for Solr to support large scale *Extract, Transform and 
> Load* use cases with streaming expressions. Instead of using MapReduce for 
> the ETL, the topic expression will be used and SolrCloud will behave like a 
> giant message queue filled with data to be processed. The topic expression 
> works in batches and supports retrieval of stored fields, so large scale 
> *text ETL* will work perfectly with this approach.
> This ticket makes two small changes to the topic() expression that make this 
> possible:
> 1) Changes the topic() behavior so it can operate in parallel.
> 2) Adds the initialCheckpoint parameter to the topic expression so a topic 
> can start pulling records from anywhere in the queue.
> Daemons can be sent to worker nodes that each work on processing a partition 
> of the data from the same topic. The daemon() function's natural behavior is 
> perfect for iteratively calling a topic until all records in the topic have 
> been processed.
> The sample code below pulls all records from one collection and indexes them 
> into another collection. A Transform function could be wrapped around the 
> topic() to transform the records before loading. Custom functions can also be 
> built to load the data in parallel to any outside system. 
> {code}
> parallel(
>  workerCollection, 
>  workers="2", 
>  sort="_version_ desc", 
>  daemon(
>   update(
> updateCollection, 
> batchSize=200, 
> topic(
> checkpointCollection,
> topicCollection, 
> q=*:*, 
>  id="topic1",
>  fl="id, to , from, body", 
>  partitionKeys="id",
>  initialCheckpoint="0")), 
>runInterval="1000", 
>id="daemon1"))
> {code}
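
For reference, an expression like the one above can be submitted from SolrJ by pointing a SolrStream at the /stream handler. This is a hedged sketch: the SolrStream constructor form has shifted across 6.x releases, and the URL and collection names are placeholders:

{code}
import java.util.HashMap;
import java.util.Map;

import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.stream.SolrStream;

Map<String, String> params = new HashMap<>();
params.put("qt", "/stream");         // route the request to the stream handler
params.put("expr", "parallel(...)"); // the full expression from above

SolrStream stream = new SolrStream("http://localhost:8983/solr/workerCollection", params);
try {
  stream.open();
  // For a daemon expression, the response should be status tuples, one per worker.
  for (Tuple tuple = stream.read(); !tuple.EOF; tuple = stream.read()) {
    System.out.println(tuple.fields);
  }
} finally {
  stream.close();
}
{code}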



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9240) Support parallel ETL with the topic expression

2016-07-12 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-9240:
-
Description: 
It would be useful for Solr to support large scale *Extract, Transform and 
Load* use cases with streaming expressions. Instead of using MapReduce for the 
ETL, the topic expression will be used and SolrCloud will behave like a giant 
message queue filled with data to be processed. The topic expression works in 
batches and supports retrieval of stored fields, so large scale *text ETL* will 
work perfectly with this approach.

This ticket makes two small changes to the topic() expression that make this 
possible:

1) Changes the topic() behavior so it can operate in parallel.
2) Adds the initialCheckpoint parameter to the topic expression so a topic can 
start pulling records from anywhere in the queue.

Daemons can be sent to worker nodes that each work on processing a partition of 
the data from the same topic. The daemon() function's natural behavior is 
perfect for iteratively calling a topic until all records in the topic have 
been processed.

The sample code below pulls all records from one collection and indexes them 
into another collection. A Transform function could be wrapped around the 
topic() to transform the records before loading. Custom functions can also be 
built to load the data in parallel to any outside system. 

{code}

parallel(
 workerCollection, 
 workers="2", 
 sort="_version_ desc", 
 daemon(
  update(
updateCollection, 
batchSize=200, 
topic(
checkpointCollection,
topicCollection, 
q=*:*, 
 id="topic1",
 fl="id, to , from, body", 
 partitionKeys="id",
 initialCheckpoint="0")), 
   runInterval="1000", 
   id="daemon1"))
{code}




  was:
It would be useful for Solr to support large scale *Extract, Transform and 
Load* use cases with streaming expressions. Instead of using MapReduce for the 
ETL, the topic expression will be used and SolrCloud will be used like a giant 
message queue filled with data to be processed. The topic expression works in 
batches and supports retrieval of stored fields, so large scale *text ETL* will 
work perfectly with this approach.

This ticket makes two small changes to the topic() expression that make this 
possible:

1) Changes the topic() behavior so it can operate in parallel.
2) Adds the initialCheckpoint parameter to the topic expression so a topic can 
start pulling records from anywhere in the queue.

Daemons can be sent to worker nodes that each work on processing a partition of 
the data from the same topic. The daemon() function's natural behavior is 
perfect for iteratively calling a topic until all records in the topic have 
been processed.

The sample code below pulls all records from one collection and indexes them 
into another collection. A Transform function could be wrapped around the 
topic() to transform the records before loading. Custom functions can also be 
built to load the data in parallel to any outside system. 

{code}

parallel(
 workerCollection, 
 workers="2", 
 sort="_version_ desc", 
 daemon(
  update(
updateCollection, 
batchSize=200, 
topic(
checkpointCollection,
topicCollection, 
q=*:*, 
 id="topic1",
 fl="id, to , from, body", 
 partitionKeys="id",
 initialCheckpoint="0")), 
   runInterval="1000", 
   id="daemon1"))
{code}





> Support parallel ETL with the topic expression
> --
>
> Key: SOLR-9240
> URL: https://issues.apache.org/jira/browse/SOLR-9240
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Attachments: SOLR-9240.patch, SOLR-9240.patch
>
>
> It would be useful for Solr to support large scale *Extract, Transform and 
> Load* use cases with streaming expressions. Instead of using MapReduce for 
> the ETL, the topic expression will be used and SolrCloud will behave like a 
> giant message queue filled with data to be processed. The topic expression 
> works in batches and supports retrieval of stored fields, so large scale 
> *text ETL* will work perfectly with this approach.
> This ticket makes two small changes to the topic() expression that makes this 

[jira] [Updated] (SOLR-9240) Support parallel ETL with the topic expression

2016-07-12 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-9240:
-
Description: 
It would be useful for Solr to support large scale *Extract, Transform and 
Load* use cases with streaming expressions. Instead of using MapReduce for the 
ETL, the topic expression will be used and SolrCloud will be used like a giant 
message queue filled with data to be processed. The topic expression works in 
batches and supports retrieval of stored fields, so large scale *text ETL* will 
work perfectly with this approach.

This ticket makes two small changes to the topic() expression that make this 
possible:

1) Changes the topic() behavior so it can operate in parallel.
2) Adds the initialCheckpoint parameter to the topic expression so a topic can 
start pulling records from anywhere in the queue.

Daemons can be sent to worker nodes that each work on processing a partition of 
the data from the same topic. The daemon() function's natural behavior is 
perfect for iteratively calling a topic until all records in the topic have 
been processed.

The sample code below pulls all records from one collection and indexes them 
into another collection. A Transform function could be wrapped around the 
topic() to transform the records before loading. Custom functions can also be 
built to load the data in parallel to any outside system. 

{code}

parallel(
 workerCollection, 
 workers="2", 
 sort="_version_ desc", 
 daemon(
  update(
updateCollection, 
batchSize=200, 
topic(
checkpointCollection,
topicCollection, 
q=*:*, 
 id="topic1",
 fl="id, to , from, body", 
 partitionKeys="id",
 initialCheckpoint="0")), 
   runInterval="1000", 
   id="daemon1"))
{code}




  was:
It would be useful for Solr to support large scale *Extract, Transform and 
Load* use cases with streaming expressions. Instead of using MapReduce for the 
ETL, the topic expression will be used and SolrCloud will be treated like a giant 
message queue filled with data to be processed.

This ticket makes two small changes to the topic() expression that make this 
possible:

1) Changes the topic() behavior so it can operate in parallel.
2) Adds the initialCheckpoint parameter to the topic expression so a topic can 
start pulling records from anywhere in the queue.

Daemons can then be sent to worker nodes that each work on processing a 
partition of the data from the same topic. The daemon() function's natural 
behavior is perfect for iteratively calling a topic until all records in the 
topic have been processed.

{code}
{code}





> Support parallel ETL with the topic expression
> --
>
> Key: SOLR-9240
> URL: https://issues.apache.org/jira/browse/SOLR-9240
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Attachments: SOLR-9240.patch, SOLR-9240.patch
>
>
> It would be useful for Solr to support large scale *Extract, Transform and 
> Load* use cases with streaming expressions. Instead of using MapReduce for 
> the ETL, the topic expression will be used and SolrCloud will be used like a 
> giant message queue filled with data to be processed. The topic expression 
> works in batches and supports retrieval of stored fields, so large scale 
> *text ETL* will work perfectly with this approach.
> This ticket makes two small changes to the topic() expression that make this 
> possible:
> 1) Changes the topic() behavior so it can operate in parallel.
> 2) Adds the initialCheckpoint parameter to the topic expression so a topic 
> can start pulling records from anywhere in the queue.
> Daemons can be sent to worker nodes that each work on processing a partition 
> of the data from the same topic. The daemon() function's natural behavior is 
> perfect for iteratively calling a topic until all records in the topic have 
> been processed.
> The sample code below pulls all records from one collection and indexes them 
> into another collection. A Transform function could be wrapped around the 
> topic() to transform the records before loading. Custom functions can also be 
> built to load the data in parallel to any outside system. 
> {code}
> parallel(
>  workerCollection, 
>  workers="2", 
>  sort="_version_ desc", 
>  daemon(
>   update(
> updateCollection, 
> batchSize=200, 
> topic(
> 

[jira] [Commented] (LUCENE-7371) BKDReader could compress values better

2016-07-12 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15373129#comment-15373129
 ] 

ASF subversion and git services commented on LUCENE-7371:
-

Commit 866398bea67607bcd54331a48736e6bdb94a703d in lucene-solr's branch 
refs/heads/master from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=866398b ]

LUCENE-7371: Better compression of values in Lucene60PointsFormat.


> BKDReader could compress values better
> --
>
> Key: LUCENE-7371
> URL: https://issues.apache.org/jira/browse/LUCENE-7371
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-7371.patch, LUCENE-7371.patch, LUCENE-7371.patch
>
>
> For compressing values, BKDReader only relies on shared prefixes in a block. 
> We could probably easily do better. For instance there are only 256 possible 
> values for the first byte of the dimension that the values are sorted by, yet 
> we use a block size of 1024. So by using something simple like run-length 
> compression we could save 6 bits per value on average.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9240) Support parallel ETL with the topic expression

2016-07-12 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-9240:
-
Description: 
It would be useful for Solr to support large scale *Extract, Transform and 
Load* use cases with streaming expressions. Instead of using MapReduce for the 
ETL, the topic expression will be used and SolrCloud will be treated like a giant 
message queue filled with data to be processed.

This ticket makes two small changes to the topic() expression that make this 
possible:

1) Changes the topic() behavior so it can operate in parallel.
2) Adds the initialCheckpoint parameter to the topic expression so a topic can 
start pulling records from anywhere in the queue.

Daemons can then be sent to worker nodes that each work on processing a 
partition of the data from the same topic. The daemon() function's natural 
behavior is perfect for iteratively calling a topic until all records in the 
topic have been processed.

{code}
{code}




  was:
It would be useful for Solr to support large scale *Extract, Transform and 
Load* use cases with streaming expressions. Instead of using MapReduce for the 
ETL, the topic expression will be used and SolrCloud will be treated like a giant 
message queue filled with data to be processed.

This ticket makes two small changes to the topic() expression that make this 
possible:

1) Changes the topic() behavior so it can operate in parallel.
2) Adds the initialCheckpoint parameter to the topic expression so a topic can 
start pulling records from anywhere in the queue.

Daemons can then be sent to worker nodes that each work on processing a 
partition of the data from the same topic. The daemon() function's natural 
behavior is perfect for iteratively calling a topic until all records in the 
topic have been processed.






> Support parallel ETL with the topic expression
> --
>
> Key: SOLR-9240
> URL: https://issues.apache.org/jira/browse/SOLR-9240
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Attachments: SOLR-9240.patch, SOLR-9240.patch
>
>
> It would be useful for Solr to support large scale *Extract, Transform and 
> Load* use cases with streaming expressions. Instead of using MapReduce for 
> the ETL, the topic expression will be used and SolrCloud will be treated like a 
> giant message queue filled with data to be processed.
> This ticket makes two small changes to the topic() expression that make this 
> possible:
> 1) Changes the topic() behavior so it can operate in parallel.
> 2) Adds the initialCheckpoint parameter to the topic expression so a topic 
> can start pulling records from anywhere in the queue.
> Daemons can then be sent to worker nodes that each work on processing a 
> partition of the data from the same topic. The daemon() function's natural 
> behavior is perfect for iteratively calling a topic until all records in the 
> topic have been processed.
> {code}
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9240) Support parallel ETL with the topic expression

2016-07-12 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-9240:
-
Description: 
It would be useful for Solr to support large scale *Extract, Transform and 
Load* use cases with streaming expressions. Instead of using MapReduce for the 
ETL, the topic expression will be used and SolrCloud will be treated like a giant 
message queue filled with data to be processed.

This ticket makes two small changes to the topic() expression that make this 
possible:

1) Changes the topic() behavior so it can operate in parallel.
2) Adds the initialCheckpoint parameter to the topic expression so a topic can 
start pulling records from anywhere in the queue.

Daemons can then be sent to worker nodes that each work on processing a 
partition of the data from the same topic. The daemon() function's natural 
behavior is perfect for iteratively calling a topic until all records in the 
topic have been processed.





  was:
Currently the topic() function won't run in parallel mode because each worker 
needs to maintain a separate set of checkpoints. The proposed solution for this 
is to append the worker ID to the topic ID, which will cause each worker to 
have its own checkpoints.

It would be useful to support parallelizing the topic function because it will 
provide a general purpose approach for processing text in parallel across 
worker nodes.

For example this would allow a classify() function to be wrapped around a 
topic() function to classify documents in parallel across worker nodes. 

Sample syntax:

{code}
parallel(daemon(update(classify(topic(..., partitionKeys="id")))))
{code}

The example above would send a daemon to worker nodes that would classify all 
documents returned by the topic() function. The update function would send the 
output of classify() to a SolrCloud collection for indexing.

The partitionKeys parameter would ensure that each worker would receive a 
partition of the results returned by the topic() function. This allows the 
classify() function to be run in parallel.







> Support parallel ETL with the topic expression
> --
>
> Key: SOLR-9240
> URL: https://issues.apache.org/jira/browse/SOLR-9240
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Attachments: SOLR-9240.patch, SOLR-9240.patch
>
>
> It would be useful for Solr to support large scale *Extract, Transform and 
> Load* use cases with streaming expressions. Instead of using MapReduce for 
> the ETL, the topic expression will be used and SolrCloud will be treated like a 
> giant message queue filled with data to be processed.
> This ticket makes two small changes to the topic() expression that make this 
> possible:
> 1) Changes the topic() behavior so it can operate in parallel.
> 2) Adds the initialCheckpoint parameter to the topic expression so a topic 
> can start pulling records from anywhere in the queue.
> Daemons can then be sent to worker nodes that each work on processing a 
> partition of the data from the same topic. The daemon() function's natural 
> behavior is perfect for iteratively calling a topic until all records in the 
> topic have been processed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9240) Support parallel ETL with the topic expression

2016-07-12 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-9240:
-
Summary: Support parallel ETL with the topic expression  (was: Support 
running the topic() Streaming Expression in parallel mode.)

> Support parallel ETL with the topic expression
> --
>
> Key: SOLR-9240
> URL: https://issues.apache.org/jira/browse/SOLR-9240
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Attachments: SOLR-9240.patch, SOLR-9240.patch
>
>
> Currently the topic() function won't run in parallel mode because each worker 
> needs to maintain a separate set of checkpoints. The proposed solution for 
> this is to append the worker ID to the topic ID, which will cause each worker 
> to have its own checkpoints.
> It would be useful to support parallelizing the topic function because it 
> will provide a general purpose approach for processing text in parallel 
> across worker nodes.
> For example this would allow a classify() function to be wrapped around a 
> topic() function to classify documents in parallel across worker nodes. 
> Sample syntax:
> {code}
> parallel(daemon(update(classify(topic(..., partitionKeys="id")))))
> {code}
> The example above would send a daemon to worker nodes that would classify all 
> documents returned by the topic() function. The update function would send 
> the output of classify() to a SolrCloud collection for indexing.
> The partitionKeys parameter would ensure that each worker would receive a 
> partition of the results returned by the topic() function. This allows the 
> classify() function to be run in parallel.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-9103) Restore ability for users to add custom Streaming Expressions

2016-07-12 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15373078#comment-15373078
 ] 

Joel Bernstein edited comment on SOLR-9103 at 7/12/16 3:32 PM:
---

We need to restore the ability for users to add their own expressions. But 
I'm not sure solrconfig.xml is the best place to do this.

How do people feel about adding a new *expressions.properties* file to the 
configset?


was (Author: joel.bernstein):
We need to restore the ability for users to add their own expressions. But 
I'm not sure solrconfig.xml is the best place to do this.

How do people feel about adding a new expressions.properties file to the 
configset?

> Restore ability for users to add custom Streaming Expressions
> -
>
> Key: SOLR-9103
> URL: https://issues.apache.org/jira/browse/SOLR-9103
> Project: Solr
>  Issue Type: Improvement
>Reporter: Cao Manh Dat
> Attachments: HelloStream.class, SOLR-9103.PATCH, SOLR-9103.PATCH
>
>
> StreamHandler is an implicit handler. So to make it extensible, we can 
> introduce the below syntax in solrconfig.xml. 
> {code}
> 
> {code}
> This will add the hello function to the streamFactory of StreamHandler.
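
Whatever the configuration syntax ends up being, the underlying registration is presumably a StreamFactory call along these lines. A sketch only: HelloStream is the class attached to this issue and is assumed to implement Expressible, and the zkHost is a placeholder:

{code}
import org.apache.solr.client.solrj.io.stream.expr.StreamFactory;

StreamFactory factory = new StreamFactory()
    .withCollectionZkHost("collection1", "localhost:9983") // placeholder zkHost
    .withFunctionName("hello", HelloStream.class);         // maps "hello" to the custom stream
{code}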



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9103) Restore ability for users to add custom Streaming Expressions

2016-07-12 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-9103:
-
Summary: Restore ability for users to add custom Streaming Expressions  
(was: Make StreamHandler extensible)

> Restore ability for users to add custom Streaming Expressions
> -
>
> Key: SOLR-9103
> URL: https://issues.apache.org/jira/browse/SOLR-9103
> Project: Solr
>  Issue Type: Improvement
>Reporter: Cao Manh Dat
> Attachments: HelloStream.class, SOLR-9103.PATCH, SOLR-9103.PATCH
>
>
> StreamHandler is an implicit handler. So to make it extensible, we can 
> introduce the below syntax in solrconfig.xml. 
> {code}
> 
> {code}
> This will add the hello function to the streamFactory of StreamHandler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9103) Make StreamHandler extensible

2016-07-12 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15373078#comment-15373078
 ] 

Joel Bernstein commented on SOLR-9103:
--

We need to restore the ability for users to add their own expressions. But 
I'm not sure solrconfig.xml is the best place to do this.

How do people feel about adding a new expressions.properties file to the 
configset?

> Make StreamHandler extensible
> -
>
> Key: SOLR-9103
> URL: https://issues.apache.org/jira/browse/SOLR-9103
> Project: Solr
>  Issue Type: Improvement
>Reporter: Cao Manh Dat
> Attachments: HelloStream.class, SOLR-9103.PATCH, SOLR-9103.PATCH
>
>
> StreamHandler is an implicit handler. So to make it extensible, we can 
> introduce the below syntax in solrconfig.xml. 
> {code}
> 
> {code}
> This will add the hello function to the streamFactory of StreamHandler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7371) BKDReader could compress values better

2016-07-12 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372981#comment-15372981
 ] 

Michael McCandless commented on LUCENE-7371:


+1, great!

> BKDReader could compress values better
> --
>
> Key: LUCENE-7371
> URL: https://issues.apache.org/jira/browse/LUCENE-7371
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-7371.patch, LUCENE-7371.patch, LUCENE-7371.patch
>
>
> For compressing values, BKDReader only relies on shared prefixes in a block. 
> We could probably easily do better. For instance there are only 256 possible 
> values for the first byte of the dimension that the values are sorted by, yet 
> we use a block size of 1024. So by using something simple like run-length 
> compression we could save 6 bits per value on average.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7355) Leverage MultiTermAwareComponent in query parsers

2016-07-12 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372967#comment-15372967
 ] 

ASF subversion and git services commented on LUCENE-7355:
-

Commit e92a38af90d12e51390b4307ccbe0c24ac7b6b4e in lucene-solr's branch 
refs/heads/master from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e92a38a ]

LUCENE-7355: Add Analyzer#normalize() and use it in query parsers.


> Leverage MultiTermAwareComponent in query parsers
> -
>
> Key: LUCENE-7355
> URL: https://issues.apache.org/jira/browse/LUCENE-7355
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-7355.patch, LUCENE-7355.patch, LUCENE-7355.patch, 
> LUCENE-7355.patch, LUCENE-7355.patch, LUCENE-7355.patch, LUCENE-7355.patch, 
> LUCENE-7355.patch
>
>
> MultiTermAwareComponent is designed to make it possible to do the right thing 
> in query parsers when in comes to analysis of multi-term queries. However, 
> since query parsers just take an analyzer and since analyzers do not 
> propagate the information about what to do for multi-term analysis, query 
> parsers cannot do the right thing out of the box.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Near real time search improvement

2016-07-12 Thread Konstantin
Hello everyone,
As far as I understand, NRT requires flushing a new segment to disk. Is it
correct that the write cache is not searchable?

The competing search library Groonga claims to have much smaller realtime
search latency (as far as I understand, via a searchable write cache), but
loading data into their index takes almost three times longer (per a benchmark
in a Japanese blog post; the corpus seems to be a Wikipedia XML dump, though
I'm not sure whether it's the English one).

I've created an incomplete prototype of a searchable write cache in my pet
project, and it takes two times longer to index a fraction of Wikipedia using
the same EnglishAnalyzer from lucene.analysis (there is probably room for
optimizations). While loading data into Lucene I didn't reuse Document
instances. The searchable write cache was implemented as a bunch of persistent
Scala SortedMap[TermKey, Measure] instances, one per logical core, where
TermKey is defined as TermKey(termID: Int, docID: Long) and Measure is just
frequency and norm (but could be extended).
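
In Java terms the structure is roughly the following (a hypothetical rendering of my prototype, not actual Lucene code):

{code}
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Postings are ordered by (termID, docID), as in the prototype.
final class TermKey implements Comparable<TermKey> {
  final int termId;
  final long docId;
  TermKey(int termId, long docId) { this.termId = termId; this.docId = docId; }
  public int compareTo(TermKey o) {
    int cmp = Integer.compare(termId, o.termId);
    return cmp != 0 ? cmp : Long.compare(docId, o.docId);
  }
}

final class Measure {
  final int freq;   // term frequency in the document
  final byte norm;  // encoded length normalization
  Measure(int freq, byte norm) { this.freq = freq; this.norm = norm; }
}

// One map per logical core; a term's postings are the half-open submap
// [(termId, 0), (termId + 1, 0)), which is what makes uncommitted docs searchable.
final class WriteCacheShard {
  final ConcurrentSkipListMap<TermKey, Measure> map = new ConcurrentSkipListMap<>();

  ConcurrentNavigableMap<TermKey, Measure> postings(int termId) {
    return map.subMap(new TermKey(termId, 0L), true, new TermKey(termId + 1, 0L), false);
  }
}
{code}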

Do you think it's worth the slowdown? If so, I'm interested in learning how
this part of Lucene works while implementing this feature. However, it is
unclear to me how hard it would be to change the existing implementation. I
cannot wrap my head around TermsHash and the whole flush process - is there
any documentation, or are there good blog posts to read about it?


[jira] [Commented] (SOLR-8742) HdfsDirectoryTest fails reliably after changes in LUCENE-6932

2016-07-12 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372918#comment-15372918
 ] 

Steve Rowe commented on SOLR-8742:
--

More reproducing seeds on master (from Policeman, ASF and my Jenkins): 

145163F0806CEBE
17E18886733CD59C
6B256AA5BE342DA5
9A60B5259CC9E719


> HdfsDirectoryTest fails reliably after changes in LUCENE-6932
> -
>
> Key: SOLR-8742
> URL: https://issues.apache.org/jira/browse/SOLR-8742
> Project: Solr
>  Issue Type: Bug
>Reporter: Hoss Man
>
> the following seed fails reliably for me on master...
> {noformat}
>[junit4]   2> 1370568 INFO  
> (TEST-HdfsDirectoryTest.testEOF-seed#[A0D22782D87E1CE2]) [] 
> o.a.s.SolrTestCaseJ4 ###Ending testEOF
>[junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=HdfsDirectoryTest 
> -Dtests.method=testEOF -Dtests.seed=A0D22782D87E1CE2 -Dtests.slow=true 
> -Dtests.locale=es-PR -Dtests.timezone=Indian/Mauritius -Dtests.asserts=true 
> -Dtests.file.encoding=ISO-8859-1
>[junit4] ERROR   0.13s J0 | HdfsDirectoryTest.testEOF <<<
>[junit4]> Throwable #1: java.lang.NullPointerException
>[junit4]>  at 
> __randomizedtesting.SeedInfo.seed([A0D22782D87E1CE2:31B9658A9A5ABA9E]:0)
>[junit4]>  at 
> org.apache.lucene.store.RAMInputStream.readByte(RAMInputStream.java:69)
>[junit4]>  at 
> org.apache.solr.store.hdfs.HdfsDirectoryTest.testEof(HdfsDirectoryTest.java:159)
>[junit4]>  at 
> org.apache.solr.store.hdfs.HdfsDirectoryTest.testEOF(HdfsDirectoryTest.java:151)
>[junit4]>  at java.lang.Thread.run(Thread.java:745)
> {noformat}
> git bisect says this is the first commit where it started failing..
> {noformat}
> ddc65d977f920013c5fca16c8ac75ae2c6895f9d is the first bad commit
> commit ddc65d977f920013c5fca16c8ac75ae2c6895f9d
> Author: Michael McCandless 
> Date:   Thu Jan 21 17:50:28 2016 +
> LUCENE-6932: RAMInputStream now throws EOFException if you seek beyond 
> the end of the file
> 
> git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1726039 
> 13f79535-47bb-0310-9956-ffa450edef68
> {noformat}
> ...which seems remarkably relevant and likely to indicate a problem that 
> needs to be fixed in the HdfsDirectory code (or perhaps just the test)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-07-12 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372906#comment-15372906
 ] 

Joel Bernstein commented on SOLR-8593:
--

Hi Julian!

Thanks for the offer to help out. [~risdenk] and I are very interested in using 
Calcite to power Solr's Parallel SQL engine so we can use Calcite's awesome 
optimizer.

Kevin has been doing most of the work on this but I will be helping out more 
following the next Solr release. 

I think our biggest struggle has been understanding how to apply the rules 
properly to push down distributed joins and aggregations. Solr supports fast 
MapReduce shuffling and distributed joins, and also has mature faceting 
analytics, so we'd like to take advantage of all this power from the SQL 
interface.



> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-7371) BKDReader could compress values better

2016-07-12 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-7371:
-
Attachment: LUCENE-7371.patch

Good point, here is an updated patch. I am getting the following indexing 
times, which suggests that it does not hurt:
{noformat}
278.064423505 sec to index part 0 // master
270.492947789 sec to index part 0 // patch
{noformat}

The index size is also unchanged, which was expected since the previous 
heuristic should be equivalent with dense data.

bq. Also, I noticed TestBackwardsCompatibility seems not to test points! I'll 
go fix that ...

Ouch, good catch. Thanks!

> BKDReader could compress values better
> --
>
> Key: LUCENE-7371
> URL: https://issues.apache.org/jira/browse/LUCENE-7371
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-7371.patch, LUCENE-7371.patch, LUCENE-7371.patch
>
>
> For compressing values, BKDReader only relies on shared prefixes in a block. 
> We could probably easily do better. For instance there are only 256 possible 
> values for the first byte of the dimension that the values are sorted by, yet 
> we use a block size of 1024. So by using something simple like run-length 
> compression we could save 6 bits per value on average.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-master-MacOSX (64bit/jdk1.8.0) - Build # 3407 - Failure!

2016-07-12 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-master-MacOSX/3407/
Java: 64bit/jdk1.8.0 -XX:-UseCompressedOops -XX:+UseG1GC

1 tests failed.
FAILED:  
org.apache.solr.common.cloud.TestCollectionStateWatchers.testSimpleCollectionWatch

Error Message:


Stack Trace:
java.util.concurrent.TimeoutException
at 
__randomizedtesting.SeedInfo.seed([F374360139312322:AE4FF9717E3CBC1C]:0)
at 
org.apache.solr.common.cloud.ZkStateReader.waitForState(ZkStateReader.java:1225)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.waitForState(CloudSolrClient.java:630)
at 
org.apache.solr.common.cloud.TestCollectionStateWatchers.testSimpleCollectionWatch(TestCollectionStateWatchers.java:92)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1764)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:871)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:907)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:921)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:809)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:460)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:880)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:781)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:816)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at java.lang.Thread.run(Thread.java:745)




Build Log:
[...truncated 13144 lines...]
   [junit4] Suite: org.apache.solr.common.cloud.TestCollectionStateWatchers
   [junit4]   2> Creating dataDir: 

[jira] [Commented] (SOLR-8621) solrconfig.xml: deprecate/replace <mergePolicy> with <mergePolicyFactory>

2016-07-12 Thread Henrik (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372851#comment-15372851
 ] 

Henrik commented on SOLR-8621:
--

I also have this, by the way:
{code}
5
{code}

> solrconfig.xml: deprecate/replace <mergePolicy> with <mergePolicyFactory>
> -
>
> Key: SOLR-8621
> URL: https://issues.apache.org/jira/browse/SOLR-8621
> Project: Solr
>  Issue Type: Task
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
> Fix For: 5.5, 6.0
>
> Attachments: SOLR-8621-example_contrib_configs.patch, 
> SOLR-8621-example_contrib_configs.patch, SOLR-8621.patch, 
> explicit-merge-auto-set.patch
>
>
> *end-user benefits:*
> * Lucene's UpgradeIndexMergePolicy can be configured in Solr
> * Lucene's SortingMergePolicy can be configured in Solr (with SOLR-5730)
> * customisability: arbitrary merge policies including wrapping/nested merge 
> policies can be created and configured
> *roadmap:*
> * solr 5.5 introduces <mergePolicyFactory> support
> * solr 5.5 deprecates (but maintains) <mergePolicy> support
> * SOLR-8668 in solr 6.0(\?) will remove <mergePolicy> support



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8621) solrconfig.xml: deprecate/replace <mergePolicy> with <mergePolicyFactory>

2016-07-12 Thread Henrik (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372847#comment-15372847
 ] 

Henrik commented on SOLR-8621:
--

Hi [~cpoerschke] and others,

I couldn't find an upgrade document for {{mergeFactor}}, so I'll just ask here. 
This is my current config:
{code}
<mergeFactor>6</mergeFactor>
<maxIndexingThreads>8</maxIndexingThreads>
{code}
If I want the exact same behaviour as I currently have, how should it look in 
Solr 6?  I can't find mention of {{maxIndexingThreads}} in 
https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig and 
I haven't figured out if the "old" policy is {{TieredMergePolicyFactory}} or 
{{LogMergePolicy}}.  
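
My current guess at the plain-Lucene equivalent is below - please correct me if this is wrong. I believe the fixed indexing-thread limit was removed from IndexWriter in the 5.x series, so {{maxIndexingThreads}} may have no direct Solr 6 equivalent at all:

{code}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;

TieredMergePolicy mp = new TieredMergePolicy();
mp.setMaxMergeAtOnce(6);    // Solr historically derived both of these
mp.setSegmentsPerTier(6.0); // from a single <mergeFactor> value

IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
iwc.setMergePolicy(mp);
// No setMaxThreadStates() anymore: IndexWriter scales with its callers.
{code}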

Thanks.

> solrconfig.xml: deprecate/replace <mergePolicy> with <mergePolicyFactory>
> -
>
> Key: SOLR-8621
> URL: https://issues.apache.org/jira/browse/SOLR-8621
> Project: Solr
>  Issue Type: Task
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
> Fix For: 5.5, 6.0
>
> Attachments: SOLR-8621-example_contrib_configs.patch, 
> SOLR-8621-example_contrib_configs.patch, SOLR-8621.patch, 
> explicit-merge-auto-set.patch
>
>
> *end-user benefits:*
> * Lucene's UpgradeIndexMergePolicy can be configured in Solr
> * Lucene's SortingMergePolicy can be configured in Solr (with SOLR-5730)
> * customisability: arbitrary merge policies including wrapping/nested merge 
> policies can be created and configured
> *roadmap:*
> * solr 5.5 introduces <mergePolicyFactory> support
> * solr 5.5 deprecates (but maintains) <mergePolicy> support
> * SOLR-8668 in solr 6.0(\?) will remove <mergePolicy> support



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-7347) Remove queryNorm and coords

2016-07-12 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-7347.
--
   Resolution: Fixed
Fix Version/s: master (7.0)

Both queryNorm and coords are now gone in master:
 - disjunctions just sum up the scores of their matching sub queries, 
 - boosts are applied as multiplicative factors to the scores.
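
Concretely, on master that means the following (a sketch against the current API; the field and term values are placeholders):

{code}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.BoostQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

Query disjunction = new BooleanQuery.Builder()
    .add(new TermQuery(new Term("body", "lucene")), Occur.SHOULD)
    .add(new TermQuery(new Term("body", "solr")), Occur.SHOULD)
    .build();
// A doc matching both clauses now scores score(lucene) + score(solr):
// no coord scaling by matched/total clauses, and no queryNorm.
Query boosted = new BoostQuery(disjunction, 2f); // multiplies that sum by 2
{code}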

> Remove queryNorm and coords
> ---
>
> Key: LUCENE-7347
> URL: https://issues.apache.org/jira/browse/LUCENE-7347
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Assignee: Adrien Grand
> Fix For: master (7.0)
>
>
> These two features are specific to TF-IDF and introduce some complexity (see 
> eg. handling of coords in BooleanWeight) and bugs/corner-cases (see eg. how 
> taking the query norm into account causes scoring challenges on LUCENE-7337).
> Since we made BM25 the default in 6.0, I propose that we remove these 
> TF-IDF-specific features in 7.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-7368) Remove queryNorm

2016-07-12 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-7368.
--
Resolution: Fixed

> Remove queryNorm
> 
>
> Key: LUCENE-7368
> URL: https://issues.apache.org/jira/browse/LUCENE-7368
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Adrien Grand
>Assignee: Adrien Grand
> Fix For: master (7.0)
>
> Attachments: LUCENE-7368.patch
>
>
> Splitting LUCENE-7347 into smaller tasks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7368) Remove queryNorm

2016-07-12 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372813#comment-15372813
 ] 

ASF subversion and git services commented on LUCENE-7368:
-

Commit 5def78ba101dd87261c787dc865979769c4b58e4 in lucene-solr's branch 
refs/heads/master from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5def78b ]

LUCENE-7368: Remove queryNorm.


> Remove queryNorm
> 
>
> Key: LUCENE-7368
> URL: https://issues.apache.org/jira/browse/LUCENE-7368
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Adrien Grand
>Assignee: Adrien Grand
> Fix For: master (7.0)
>
> Attachments: LUCENE-7368.patch
>
>
> Splitting LUCENE-7347 into smaller tasks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7372) factor out a org.apache.lucene.search.FilterWeight class

2016-07-12 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372808#comment-15372808
 ] 

Adrien Grand commented on LUCENE-7372:
--

Building your own query/weight is expert-level, so I'm more in favour of 
keeping this internal. While we want the API for consuming queries to be 
stable, since this is what Lucene is about, I don't see that as a goal as far 
as implementing custom queries/weights is concerned.

Regarding the other Filter* classes, the annotation is indeed not used 
consistently. See also this comment: 
https://issues.apache.org/jira/browse/LUCENE-7123?focusedCommentId=15204103&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15204103.

> factor out a org.apache.lucene.search.FilterWeight class
> 
>
> Key: LUCENE-7372
> URL: https://issues.apache.org/jira/browse/LUCENE-7372
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
> Attachments: LUCENE-7372.patch, LUCENE-7372.patch, LUCENE-7372.patch
>
>
> * {{FilterWeight}} to delegate method implementations to the {{Weight}} that 
> it wraps
> * exception: no delegation for the {{bulkScorer}} method implementation, since 
> currently not all FilterWeights implement/override that default method
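
The delegation pattern being described, sketched against the 6.x Weight API (the committed class may differ in details, and note that bulkScorer is deliberately left to Weight's default implementation):

{code}
import java.io.IOException;
import java.util.Set;

import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.Weight;

abstract class FilterWeight extends Weight {
  protected final Weight in;

  protected FilterWeight(Weight in) {
    super(in.getQuery());
    this.in = in;
  }

  @Override
  public void extractTerms(Set<Term> terms) {
    in.extractTerms(terms);
  }

  @Override
  public Explanation explain(LeafReaderContext context, int doc) throws IOException {
    return in.explain(context, doc);
  }

  @Override
  public float getValueForNormalization() throws IOException {
    return in.getValueForNormalization();
  }

  @Override
  public void normalize(float norm, float boost) {
    in.normalize(norm, boost);
  }

  @Override
  public Scorer scorer(LeafReaderContext context) throws IOException {
    return in.scorer(context);
  }

  // bulkScorer(LeafReaderContext) intentionally not overridden; see above.
}
{code}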



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7371) BKDReader could compress values better

2016-07-12 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372803#comment-15372803
 ] 

Michael McCandless commented on LUCENE-7371:


This is a nice optimization!  Patch looks good!

The {{BKDWriter}} change to pick which dimension to apply the run-length coding 
to is best effort, right?  Because you could have a dim with fewer unique 
leading suffix bytes, but a larger delta between first and last values?  But it 
would take quite a bit more work at indexing time to figure it out ... maybe 
add a comment explaining this tradeoff?  It seems likely the "min delta" 
approach should work well in practice, but have you tried the 
slow-but-correct approach to verify?
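
To make sure we mean the same heuristic, here is a toy version of the "min delta" pick as I read it (names are made up; the real BKDWriter logic differs):

{code}
// Pick the dimension whose leading byte varies least across the block:
// once the block is sorted by that dimension, its leading bytes form the
// fewest, longest runs, which is what run-length coding wants.
static int pickRunLengthDim(byte[][] minPacked, byte[][] maxPacked, int numDims) {
  int best = 0;
  int bestDelta = Integer.MAX_VALUE;
  for (int dim = 0; dim < numDims; dim++) {
    int delta = (maxPacked[dim][0] & 0xFF) - (minPacked[dim][0] & 0xFF);
    if (delta < bestDelta) {
      bestDelta = delta;
      best = dim;
    }
  }
  return best;
}
{code}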

Also, I noticed {{TestBackwardsCompatibility}} seems not to test points!  I'll 
go fix that ...

> BKDReader could compress values better
> --
>
> Key: LUCENE-7371
> URL: https://issues.apache.org/jira/browse/LUCENE-7371
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-7371.patch, LUCENE-7371.patch
>
>
> For compressing values, BKDReader only relies on shared prefixes in a block. 
> We could probably easily do better. For instance there are only 256 possible 
> values for the first byte of the dimension that the values are sorted by, yet 
> we use a block size of 1024. So by using something simple like run-length 
> compression we could save 6 bits per value on average.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-6.x-Windows (64bit/jdk1.8.0_92) - Build # 316 - Still Failing!

2016-07-12 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Windows/316/
Java: 64bit/jdk1.8.0_92 -XX:-UseCompressedOops -XX:+UseParallelGC

1 tests failed.
FAILED:  org.apache.solr.cloud.TestLocalFSCloudBackupRestore.test

Error Message:
expected: but was:

Stack Trace:
java.lang.AssertionError: expected: but was:
at 
__randomizedtesting.SeedInfo.seed([5F65CF7DB795ACB4:D731F0A71969C14C]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:147)
at 
org.apache.solr.cloud.AbstractCloudBackupRestoreTestCase.testBackupAndRestore(AbstractCloudBackupRestoreTestCase.java:209)
at 
org.apache.solr.cloud.AbstractCloudBackupRestoreTestCase.test(AbstractCloudBackupRestoreTestCase.java:127)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1764)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:871)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:907)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:921)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:809)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:460)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:880)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:781)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:816)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at java.lang.Thread.run(Thread.java:745)




Build Log:
[...truncated 12153 lines...]

[JENKINS] Lucene-Solr-SmokeRelease-master - Build # 541 - Failure

2016-07-12 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-SmokeRelease-master/541/

No tests ran.

Build Log:
[...truncated 40562 lines...]
prepare-release-no-sign:
[mkdir] Created dir: 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/lucene/build/smokeTestRelease/dist
 [copy] Copying 476 files to 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/lucene/build/smokeTestRelease/dist/lucene
 [copy] Copying 245 files to 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/lucene/build/smokeTestRelease/dist/solr
   [smoker] Java 1.8 
JAVA_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.8
   [smoker] NOTE: output encoding is UTF-8
   [smoker] 
   [smoker] Load release URL 
"file:/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/lucene/build/smokeTestRelease/dist/"...
   [smoker] 
   [smoker] Test Lucene...
   [smoker]   test basics...
   [smoker]   get KEYS
   [smoker] 0.2 MB in 0.02 sec (9.8 MB/sec)
   [smoker]   check changes HTML...
   [smoker]   download lucene-7.0.0-src.tgz...
   [smoker] 29.8 MB in 0.03 sec (916.1 MB/sec)
   [smoker] verify md5/sha1 digests
   [smoker]   download lucene-7.0.0.tgz...
   [smoker] 64.3 MB in 0.07 sec (923.6 MB/sec)
   [smoker] verify md5/sha1 digests
   [smoker]   download lucene-7.0.0.zip...
   [smoker] 74.9 MB in 0.08 sec (910.9 MB/sec)
   [smoker] verify md5/sha1 digests
   [smoker]   unpack lucene-7.0.0.tgz...
   [smoker] verify JAR metadata/identity/no javax.* or java.* classes...
   [smoker] test demo with 1.8...
   [smoker]   got 6022 hits for query "lucene"
   [smoker] checkindex with 1.8...
   [smoker] check Lucene's javadoc JAR
   [smoker]   unpack lucene-7.0.0.zip...
   [smoker] verify JAR metadata/identity/no javax.* or java.* classes...
   [smoker] test demo with 1.8...
   [smoker]   got 6022 hits for query "lucene"
   [smoker] checkindex with 1.8...
   [smoker] check Lucene's javadoc JAR
   [smoker]   unpack lucene-7.0.0-src.tgz...
   [smoker] make sure no JARs/WARs in src dist...
   [smoker] run "ant validate"
   [smoker] run tests w/ Java 8 and testArgs='-Dtests.slow=false'...
   [smoker] test demo with 1.8...
   [smoker]   got 222 hits for query "lucene"
   [smoker] checkindex with 1.8...
   [smoker] generate javadocs w/ Java 8...
   [smoker] 
   [smoker] Crawl/parse...
   [smoker] 
   [smoker] Verify...
   [smoker]   confirm all releases have coverage in TestBackwardsCompatibility
   [smoker] find all past Lucene releases...
   [smoker] run TestBackwardsCompatibility..
   [smoker] success!
   [smoker] 
   [smoker] Test Solr...
   [smoker]   test basics...
   [smoker]   get KEYS
   [smoker] 0.2 MB in 0.00 sec (92.5 MB/sec)
   [smoker]   check changes HTML...
   [smoker]   download solr-7.0.0-src.tgz...
   [smoker] 39.2 MB in 0.37 sec (107.2 MB/sec)
   [smoker] verify md5/sha1 digests
   [smoker]   download solr-7.0.0.tgz...
   [smoker] 137.2 MB in 1.54 sec (89.1 MB/sec)
   [smoker] verify md5/sha1 digests
   [smoker]   download solr-7.0.0.zip...
   [smoker] 145.9 MB in 1.92 sec (75.8 MB/sec)
   [smoker] verify md5/sha1 digests
   [smoker]   unpack solr-7.0.0.tgz...
   [smoker] verify JAR metadata/identity/no javax.* or java.* classes...
   [smoker] unpack lucene-7.0.0.tgz...
   [smoker]   **WARNING**: skipping check of 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/lucene/build/smokeTestRelease/tmp/unpack/solr-7.0.0/contrib/dataimporthandler-extras/lib/javax.mail-1.5.1.jar:
 it has javax.* classes
   [smoker]   **WARNING**: skipping check of 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/lucene/build/smokeTestRelease/tmp/unpack/solr-7.0.0/contrib/dataimporthandler-extras/lib/activation-1.1.1.jar:
 it has javax.* classes
   [smoker] copying unpacked distribution for Java 8 ...
   [smoker] test solr example w/ Java 8...
   [smoker]   start Solr instance 
(log=/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/lucene/build/smokeTestRelease/tmp/unpack/solr-7.0.0-java8/solr-example.log)...
   [smoker] No process found for Solr node running on port 8983
   [smoker]   Running techproducts example on port 8983 from 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/lucene/build/smokeTestRelease/tmp/unpack/solr-7.0.0-java8
   [smoker] Creating Solr home directory 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/lucene/build/smokeTestRelease/tmp/unpack/solr-7.0.0-java8/example/techproducts/solr
   [smoker] 
   [smoker] Starting up Solr on port 8983 using command:
   [smoker] bin/solr start -p 8983 -s "example/techproducts/solr"
   [smoker] 
   [smoker] Waiting up to 30 seconds to see Solr running on port 8983 ...

[JENKINS] Lucene-Solr-master-Linux (32bit/jdk1.8.0_92) - Build # 17226 - Failure!

2016-07-12 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/17226/
Java: 32bit/jdk1.8.0_92 -server -XX:+UseSerialGC

1 tests failed.
FAILED:  org.apache.solr.handler.TestReqParamsAPI.test

Error Message:
Could not get expected value  'first' for path 
'response/params/x/_appends_/add' full output: {   "responseHeader":{ 
"status":0, "QTime":0},   "response":{ "znodeVersion":3, "params":{ 
  "x":{ "a":"A val", "b":"B val", "":{"v":0}},  
 "y":{ "p":"P val", "q":"Q val", "":{"v":2},  from 
server:  http://127.0.0.1:41492/collection1

Stack Trace:
java.lang.AssertionError: Could not get expected value  'first' for path 
'response/params/x/_appends_/add' full output: {
  "responseHeader":{
"status":0,
"QTime":0},
  "response":{
"znodeVersion":3,
"params":{
  "x":{
"a":"A val",
"b":"B val",
"":{"v":0}},
  "y":{
"p":"P val",
"q":"Q val",
"":{"v":2},  from server:  http://127.0.0.1:41492/collection1
at 
__randomizedtesting.SeedInfo.seed([974E4D88FA16FC9F:1F1A725254EA9167]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at 
org.apache.solr.core.TestSolrConfigHandler.testForResponseElement(TestSolrConfigHandler.java:481)
at 
org.apache.solr.handler.TestReqParamsAPI.testReqParams(TestReqParamsAPI.java:230)
at 
org.apache.solr.handler.TestReqParamsAPI.test(TestReqParamsAPI.java:61)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1764)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:871)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:907)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:921)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:985)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:960)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:809)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:460)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:880)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:781)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:816)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 

[JENKINS] Lucene-Solr-master-Windows (32bit/jdk1.8.0_92) - Build # 5979 - Still Failing!

2016-07-12 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-master-Windows/5979/
Java: 32bit/jdk1.8.0_92 -client -XX:+UseConcMarkSweepGC

3 tests failed.
FAILED:  
junit.framework.TestSuite.org.apache.solr.cloud.CdcrVersionReplicationTest

Error Message:
ObjectTracker found 1 object(s) that were not released!!! [InternalHttpClient]

Stack Trace:
java.lang.AssertionError: ObjectTracker found 1 object(s) that were not 
released!!! [InternalHttpClient]
at __randomizedtesting.SeedInfo.seed([FB5C46903F5CD1F5]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertNull(Assert.java:551)
at 
org.apache.solr.SolrTestCaseJ4.teardownTestCases(SolrTestCaseJ4.java:257)
at sun.reflect.GeneratedMethodAccessor46.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1764)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:834)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at java.lang.Thread.run(Thread.java:745)


FAILED:  junit.framework.TestSuite.org.apache.solr.util.TestSolrCLIRunExample

Error Message:
ObjectTracker found 3 object(s) that were not released!!! 
[MockDirectoryWrapper, MockDirectoryWrapper, SolrCore]

Stack Trace:
java.lang.AssertionError: ObjectTracker found 3 object(s) that were not 
released!!! [MockDirectoryWrapper, MockDirectoryWrapper, SolrCore]
at __randomizedtesting.SeedInfo.seed([FB5C46903F5CD1F5]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertNull(Assert.java:551)
at 
org.apache.solr.SolrTestCaseJ4.teardownTestCases(SolrTestCaseJ4.java:257)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1764)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:834)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 

[jira] [Comment Edited] (LUCENE-7372) factor out a org.apache.lucene.search.FilterWeight class

2016-07-12 Thread Christine Poerschke (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372633#comment-15372633
 ] 

Christine Poerschke edited comment on LUCENE-7372 at 7/12/16 10:02 AM:
---


bq. Can you add @lucene.internal to the class javadocs ...

Just to confirm, {{@lucene.internal}} or {{@lucene.experimental}} or perhaps 
both?

I notice that {{FilterCollector}} and {{FilterLeafCollector}} are marked 
{{@lucene.experimental}} but other 
[org.apache.lucene.search|https://github.com/apache/lucene-solr/tree/master/lucene/core/src/java/org/apache/lucene/search]
 Filter classes are unmarked.

Should they ({{FilteredDocIdSetIterator}}, {{FilterScorer}}, {{FilterSpans}}) 
also be marked, and if so (outside the scope of this ticket), as what?

A similar question would apply to 
[org.apache.lucene.index|https://github.com/apache/lucene-solr/tree/master/lucene/core/src/java/org/apache/lucene/index]
 Filter classes, e.g. {{FilterCodecReader}}, {{FilterDirectoryReader}}, 
{{FilteredTermsEnum}}, {{FilterLeafReader}}, and possibly other classes too.


was (Author: cpoerschke):


bq. Can you add @lucene.internal to the class javadocs ...

Just to confirm, {{@lucene.internal}} or {{@lucene.experimental}} or perhaps 
both?

I notice that {{FilterCollector}} and {{FilterLeafCollector}} are marked 
{{@lucene.experimental}} but other 
[org.apache.lucene.search|https://github.com/apache/lucene-solr/tree/master/lucene/core/src/java/org/apache/lucene/search]
 Filter classes are unmarked.

Should they ({{FilteredDocIdSetIterator}}, {{FilterLeafCollector}}, 
{{FilterScorer}}, {{FilterSpans}}) be marked also and if so (outside the scope 
of this ticket) as what?

Similar question would apply to 
[org.apache.lucene.index|https://github.com/apache/lucene-solr/tree/master/lucene/core/src/java/org/apache/lucene/index]
 Filter classes e.g. {{FilterCodecReader}}, {{FilterDirectoryReader}}, 
{{FilteredTermsEnum}}, {{FilterLeafReader}} and possibly other classes too.

> factor out a org.apache.lucene.search.FilterWeight class
> 
>
> Key: LUCENE-7372
> URL: https://issues.apache.org/jira/browse/LUCENE-7372
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
> Attachments: LUCENE-7372.patch, LUCENE-7372.patch, LUCENE-7372.patch
>
>
> * {{FilterWeight}} to delegate method implementations to the {{Weight}} that 
> it wraps
> * exception: no delegating for the {{bulkScorer}} method implementation since 
> currently not all FilterWeights implement/override that default method



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7372) factor out a org.apache.lucene.search.FilterWeight class

2016-07-12 Thread Christine Poerschke (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372633#comment-15372633
 ] 

Christine Poerschke commented on LUCENE-7372:
-



bq. Can you add @lucene.internal to the class javadocs ...

Just to confirm, {{@lucene.internal}} or {{@lucene.experimental}} or perhaps 
both?

I notice that {{FilterCollector}} and {{FilterLeafCollector}} are marked 
{{@lucene.experimental}} but other 
[org.apache.lucene.search|https://github.com/apache/lucene-solr/tree/master/lucene/core/src/java/org/apache/lucene/search]
 Filter classes are unmarked.

Should they ({{FilteredDocIdSetIterator}}, {{FilterLeafCollector}}, 
{{FilterScorer}}, {{FilterSpans}}) also be marked, and if so (outside the scope 
of this ticket), as what?

A similar question would apply to 
[org.apache.lucene.index|https://github.com/apache/lucene-solr/tree/master/lucene/core/src/java/org/apache/lucene/index]
 Filter classes, e.g. {{FilterCodecReader}}, {{FilterDirectoryReader}}, 
{{FilteredTermsEnum}}, {{FilterLeafReader}}, and possibly other classes too.

> factor out a org.apache.lucene.search.FilterWeight class
> 
>
> Key: LUCENE-7372
> URL: https://issues.apache.org/jira/browse/LUCENE-7372
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
> Attachments: LUCENE-7372.patch, LUCENE-7372.patch, LUCENE-7372.patch
>
>
> * {{FilterWeight}} to delegate method implementations to the {{Weight}} that 
> it wraps
> * exception: no delegating for the {{bulkScorer}} method implementation since 
> currently not all FilterWeights implement/override that default method



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene Index accelerator interface

2016-07-12 Thread Konstantin
Hello, flushing the write buffer (term hashes) to disk is a CPU-intensive
operation, and some part of it might be quite easy to move to GPUs (I think it
involves sorting). Another important part is encoding postings lists
(integers); it has been shown that SIMD instructions can be handy for this
task, so I think it could be moved to the GPU as well, but decoding should
probably stay on the CPU (see the toy sketch below).
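
For context, a toy sketch of that encoding step (my illustration, not Lucene's
actual codec): postings are sorted doc IDs, so they are stored as small deltas
and then bit-packed, and the delta loop below is the kind of branch-free
kernel that maps well to SIMD or a GPU.

    // Toy delta-encoding of a sorted postings list; not Lucene's codec.
    class DeltaSketch {
      static int[] encodeDeltas(int[] sortedDocIds) {
        int[] deltas = new int[sortedDocIds.length];
        int prev = 0;
        for (int i = 0; i < sortedDocIds.length; i++) {
          deltas[i] = sortedDocIds[i] - prev;  // always >= 0 for sorted input
          prev = sortedDocIds[i];
        }
        return deltas;
      }
    }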

But accelerating text analysis seems much harder to me - it probably
involves a lot of branching, which is bad for GPU utilization.

I'm new to Lucene internals, so I might be horribly wrong :)
On 12 Jul 2016 at 1:30, "Beercandyman"
 wrote:

> I’ve been working on writing an indexer that runs in an FPGA. I find FPGAs
> have a hard time doing functions that are very easy on a CPU. When I think
> about it the same is true of GPUs. I want to propose some kind of
> accelerator interface into the indexing code.
>
>
>
> Here is a start to a proposal.
>
> http://bit.ly/29EOSnW
>
>
>
> I’m wondering if anyone thinks this idea has merit and if anyone wants to
> work on it with me?
>
>
>
> Cheers
>
>
>
> Steve
>
>
>
>
>


[JENKINS] Lucene-Solr-SmokeRelease-6.x - Build # 109 - Still Failing

2016-07-12 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-SmokeRelease-6.x/109/

No tests ran.

Build Log:
[...truncated 40569 lines...]
prepare-release-no-sign:
[mkdir] Created dir: 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-6.x/lucene/build/smokeTestRelease/dist
 [copy] Copying 476 files to 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-6.x/lucene/build/smokeTestRelease/dist/lucene
 [copy] Copying 245 files to 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-6.x/lucene/build/smokeTestRelease/dist/solr
   [smoker] Java 1.8 
JAVA_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/latest1.8
   [smoker] NOTE: output encoding is UTF-8
   [smoker] 
   [smoker] Load release URL 
"file:/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-6.x/lucene/build/smokeTestRelease/dist/"...
   [smoker] 
   [smoker] Test Lucene...
   [smoker]   test basics...
   [smoker]   get KEYS
   [smoker] 0.2 MB in 0.01 sec (16.9 MB/sec)
   [smoker]   check changes HTML...
   [smoker]   download lucene-6.2.0-src.tgz...
   [smoker] 29.9 MB in 0.04 sec (831.4 MB/sec)
   [smoker] verify md5/sha1 digests
   [smoker]   download lucene-6.2.0.tgz...
   [smoker] 64.4 MB in 0.08 sec (826.5 MB/sec)
   [smoker] verify md5/sha1 digests
   [smoker]   download lucene-6.2.0.zip...
   [smoker] 75.0 MB in 0.09 sec (818.7 MB/sec)
   [smoker] verify md5/sha1 digests
   [smoker]   unpack lucene-6.2.0.tgz...
   [smoker] verify JAR metadata/identity/no javax.* or java.* classes...
   [smoker] test demo with 1.8...
   [smoker]   got 6036 hits for query "lucene"
   [smoker] checkindex with 1.8...
   [smoker] check Lucene's javadoc JAR
   [smoker]   unpack lucene-6.2.0.zip...
   [smoker] verify JAR metadata/identity/no javax.* or java.* classes...
   [smoker] test demo with 1.8...
   [smoker]   got 6036 hits for query "lucene"
   [smoker] checkindex with 1.8...
   [smoker] check Lucene's javadoc JAR
   [smoker]   unpack lucene-6.2.0-src.tgz...
   [smoker] make sure no JARs/WARs in src dist...
   [smoker] run "ant validate"
   [smoker] run tests w/ Java 8 and testArgs='-Dtests.slow=false'...
   [smoker] test demo with 1.8...
   [smoker]   got 224 hits for query "lucene"
   [smoker] checkindex with 1.8...
   [smoker] generate javadocs w/ Java 8...
   [smoker] 
   [smoker] Crawl/parse...
   [smoker] 
   [smoker] Verify...
   [smoker]   confirm all releases have coverage in TestBackwardsCompatibility
   [smoker] find all past Lucene releases...
   [smoker] run TestBackwardsCompatibility..
   [smoker] success!
   [smoker] 
   [smoker] Test Solr...
   [smoker]   test basics...
   [smoker]   get KEYS
   [smoker] 0.2 MB in 0.03 sec (4.7 MB/sec)
   [smoker]   check changes HTML...
   [smoker]   download solr-6.2.0-src.tgz...
   [smoker] 39.2 MB in 2.00 sec (19.7 MB/sec)
   [smoker] verify md5/sha1 digests
   [smoker]   download solr-6.2.0.tgz...
   [smoker] 137.3 MB in 5.29 sec (26.0 MB/sec)
   [smoker] verify md5/sha1 digests
   [smoker]   download solr-6.2.0.zip...
   [smoker] 146.0 MB in 4.62 sec (31.6 MB/sec)
   [smoker] verify md5/sha1 digests
   [smoker]   unpack solr-6.2.0.tgz...
   [smoker] verify JAR metadata/identity/no javax.* or java.* classes...
   [smoker] unpack lucene-6.2.0.tgz...
   [smoker]   **WARNING**: skipping check of 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-6.x/lucene/build/smokeTestRelease/tmp/unpack/solr-6.2.0/contrib/dataimporthandler-extras/lib/javax.mail-1.5.1.jar:
 it has javax.* classes
   [smoker]   **WARNING**: skipping check of 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-6.x/lucene/build/smokeTestRelease/tmp/unpack/solr-6.2.0/contrib/dataimporthandler-extras/lib/activation-1.1.1.jar:
 it has javax.* classes
   [smoker] copying unpacked distribution for Java 8 ...
   [smoker] test solr example w/ Java 8...
   [smoker]   start Solr instance 
(log=/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-6.x/lucene/build/smokeTestRelease/tmp/unpack/solr-6.2.0-java8/solr-example.log)...
   [smoker] No process found for Solr node running on port 8983
   [smoker]   Running techproducts example on port 8983 from 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-6.x/lucene/build/smokeTestRelease/tmp/unpack/solr-6.2.0-java8
   [smoker] Creating Solr home directory 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-6.x/lucene/build/smokeTestRelease/tmp/unpack/solr-6.2.0-java8/example/techproducts/solr
   [smoker] 
   [smoker] Starting up Solr on port 8983 using command:
   [smoker] bin/solr start -p 8983 -s "example/techproducts/solr"
   [smoker] 
   [smoker] Waiting up to 30 seconds to see Solr running on port 8983 ...
   [smoker] 

[jira] [Created] (SOLR-9297) UI encoding issue while displaying segments that a searcher is opened on

2016-07-12 Thread Varun Thacker (JIRA)
Varun Thacker created SOLR-9297:
---

 Summary: UI encoding issue while displaying segments that a 
searcher is opened on
 Key: SOLR-9297
 URL: https://issues.apache.org/jira/browse/SOLR-9297
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Varun Thacker
Assignee: Varun Thacker
Priority: Minor
 Fix For: 6.2


Steps to reproduce

1. Start solr {{bin/solr start -e techproducts}}
2. Go to Core Selector -> techproducts -> Plugins/Stats 

Expand the {{Searcher@}} and {{searcher}} dropdowns.  The {{reader}} and 
{{readerDir}} rows don't encode the underscore correctly.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7372) factor out a org.apache.lucene.search.FilterWeight class

2016-07-12 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372572#comment-15372572
 ] 

Adrien Grand commented on LUCENE-7372:
--

Can you add {{@lucene.internal}} to the class javadocs and add a comment to the 
constructor that takes a Query parameter to be explicit about the fact that 
this query should be rewritten? Otherwise +1 to push.

> factor out a org.apache.lucene.search.FilterWeight class
> 
>
> Key: LUCENE-7372
> URL: https://issues.apache.org/jira/browse/LUCENE-7372
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
> Attachments: LUCENE-7372.patch, LUCENE-7372.patch, LUCENE-7372.patch
>
>
> * {{FilterWeight}} to delegate method implementations to the {{Weight}} that 
> it wraps
> * exception: no delegating for the {{bulkScorer}} method implementation since 
> currently not all FilterWeights implement/override that default method



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


