[jira] [Commented] (LUCENE-7560) Can we make QueryBuilder.createFieldQuery un-final?

2016-11-15 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15669760#comment-15669760
 ] 

Tommaso Teofili commented on LUCENE-7560:
-

+1

> Can we make QueryBuilder.createFieldQuery un-final?
> ---
>
> Key: LUCENE-7560
> URL: https://issues.apache.org/jira/browse/LUCENE-7560
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>
> It's marked final, I assume because we want people who customize their query 
> parsers to only override {{newXXXQuery}} instead.
> But for deeper query parser customization, like consuming a 
> graph and creating a {{TermAutomatonQuery}}, or a union of {{PhraseQuery}}, 
> etc., it is not possible today and one must fork the class.
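As a concrete illustration, here is a minimal sketch (not code from the issue; {{MyQueryBuilder}} is hypothetical) of what is and is not overridable today:

{code}
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.QueryBuilder;

public class MyQueryBuilder extends QueryBuilder {
  public MyQueryBuilder(Analyzer analyzer) {
    super(analyzer);
  }

  // The protected newXXXQuery hooks can be overridden, e.g. to change
  // how multi-term synonyms are combined:
  @Override
  protected Query newSynonymQuery(Term[] terms) {
    return super.newSynonymQuery(terms);
  }

  // But createFieldQuery(...) is final, so the token-stream consumption
  // itself (e.g. building a TermAutomatonQuery from a graph) cannot be
  // customized without forking the class.
}
{code}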






[jira] [Commented] (LUCENE-6664) Replace SynonymFilter with SynonymGraphFilter

2016-11-15 Thread Tommaso Teofili (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15669749#comment-15669749
 ] 

Tommaso Teofili commented on LUCENE-6664:
-

{quote}
 I'm proposing that we make it possible for query-time position graphs to work 
correctly, so multi-token synonyms are no longer buggy, and I think this is a 
good way to make that happen.
{quote}

+1 

> Replace SynonymFilter with SynonymGraphFilter
> -
>
> Key: LUCENE-6664
> URL: https://issues.apache.org/jira/browse/LUCENE-6664
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Attachments: LUCENE-6664.patch, LUCENE-6664.patch, LUCENE-6664.patch, 
> LUCENE-6664.patch, usa.png, usa_flat.png
>
>
> Spinoff from LUCENE-6582.
> I created a new SynonymGraphFilter (to replace the current buggy
> SynonymFilter), that produces correct graphs (does no "graph
> flattening" itself).  I think this makes it simpler.
> This means you must add the FlattenGraphFilter yourself, if you are
> applying synonyms during indexing.
> Index-time syn expansion is a necessarily "lossy" graph transformation
> when multi-token (input or output) synonyms are applied, because the
> index does not store {{posLength}}, so there will always be phrase
> queries that should match but do not, and then phrase queries that
> should not match but do.
> http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
> goes into detail about this.
> However, with this new SynonymGraphFilter, if instead you do synonym
> expansion at query time (and don't do the flattening), and you use
> TermAutomatonQuery (future: somehow integrated into a query parser),
> or maybe just "enumerate all paths and make union of PhraseQuery", you
> should get 100% correct matches (not sure about "proper" scoring
> though...).
> This new syn filter still cannot consume an arbitrary graph.
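A rough sketch of the index-time vs. query-time split described above, assuming the class names and constructors from the attached patch ({{SynonymGraphFilter}}, {{FlattenGraphFilter}}) and a prebuilt {{SynonymMap}} named {{synonyms}}:

{code}
// Sketch only; SynonymGraphFilter and FlattenGraphFilter are the classes
// from the patch, and 'synonyms' is a prebuilt SynonymMap.
Analyzer indexAnalyzer = new Analyzer() {
  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    Tokenizer source = new StandardTokenizer();
    // Produces a correct token graph; multi-token synonyms get posLength > 1.
    TokenStream ts = new SynonymGraphFilter(source, synonyms, true);
    // Index time only: the index cannot store posLength, so the graph
    // must be flattened (lossily) before indexing.
    ts = new FlattenGraphFilter(ts);
    return new TokenStreamComponents(source, ts);
  }
};
// At query time, apply SynonymGraphFilter without FlattenGraphFilter and
// hand the resulting graph to e.g. TermAutomatonQuery.
{code}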






[jira] [Commented] (SOLR-9771) Resolve Variables in DIH when using encryptKeyFile.

2016-11-15 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15669624#comment-15669624
 ] 

Mikhail Khludnev commented on SOLR-9771:


Given 
https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler
 you might be able to do 
{code}
...
{code}
and then propagate the value via solrconfig.xml:
{code}
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    ...
    <str name="db.passwdkey">${db.passwdkey}</str>
  </lst>
</requestHandler>
{code}

Then you'd be able to pass it via -Ddb.passwdkey=... Does it work for you? 


> Resolve Variables in DIH when using encryptKeyFile.
> --
>
> Key: SOLR-9771
> URL: https://issues.apache.org/jira/browse/SOLR-9771
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 5.5.3
>Reporter: Bill Bell
>
> I would like to use a variable like ${db.passwdkey} for the password when using 
> encryptKeyFile in various DIH files:
> -Ddb.passwdkey="U2FsdGVkX18QMjY0yfCqlfBMvAB4d3XkwY96L7gfO2o="
> Please backport to 5.5.3.
> This does not appear to work when used in DIH as below; hard-coding the 
> encrypted password works:
> {code}
> <dataSource
>   url="jdbc:oracle:thin:@//hostname:port/SID" user="db_username"
>   password="U2FsdGVkX18QMjY0yfCqlfBMvAB4d3XkwY96L7gfO2o="
>   encryptKeyFile="/location/of/encryptionkey"
> />
> {code}
> but substituting a variable does not:
> {code}
> <dataSource
>   url="jdbc:oracle:thin:@//hostname:port/SID" user="db_username"
>   password=${solr.passkey}
>   encryptKeyFile="/location/of/encryptionkey"
> />
> {code}






[jira] [Created] (SOLR-9774) Delta indexing with child documents with the help of cacheImpl="SortedMapBackedCache"

2016-11-15 Thread Aniket Khare (JIRA)
Aniket Khare created SOLR-9774:
--

 Summary: Delta indexing with child documents with the help of 
cacheImpl="SortedMapBackedCache"
 Key: SOLR-9774
 URL: https://issues.apache.org/jira/browse/SOLR-9774
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: contrib - DataImportHandler, Data-driven Schema
Affects Versions: 6.1
Reporter: Aniket Khare


Hi,

I am using Solr DIH for indexing parent-child relational data, with 
cacheImpl="SortedMapBackedCache".
For full data indexing I use the full-import command with clean="true", and for 
delta indexing I use full-import with clean="false".
So the same queries are executed for both full and delta imports, and indexing 
works properly.
The issue we are facing occurs when a particular parent document has no child 
documents yet and we add a new child document.
Following are the steps to reproduce the issue.

1. Add a child document to an existing parent document that has no child 
documents.
2. Once the child document is added with delta indexing, modify the parent 
document and run delta indexing again.
3. After the delta indexing completes, I can see the modified child documents 
in the Solr DIH page in debug mode, but they are not getting updated in the 
Solr collection.

My data config defines a parent entity and a child entity that is cached with 
cacheImpl="SortedMapBackedCache".






[JENKINS] Lucene-Solr-NightlyTests-6.x - Build # 200 - Still unstable

2016-11-15 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-6.x/200/

4 tests failed.
FAILED:  junit.framework.TestSuite.org.apache.solr.cloud.SolrCloudExampleTest

Error Message:
ObjectTracker found 5 object(s) that were not released!!! [TransactionLog, 
MDCAwareThreadPoolExecutor, MockDirectoryWrapper, MockDirectoryWrapper, 
RawDirectoryWrapper] 
org.apache.solr.common.util.ObjectReleaseTracker$ObjectTrackerException  at 
org.apache.solr.common.util.ObjectReleaseTracker.track(ObjectReleaseTracker.java:43)
  at org.apache.solr.update.TransactionLog.(TransactionLog.java:188)  at 
org.apache.solr.update.UpdateLog.newTransactionLog(UpdateLog.java:344)  at 
org.apache.solr.update.UpdateLog.ensureLog(UpdateLog.java:859)  at 
org.apache.solr.update.UpdateLog.add(UpdateLog.java:428)  at 
org.apache.solr.update.UpdateLog.add(UpdateLog.java:415)  at 
org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:299)
  at 
org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:211)
  at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:166)
  at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
  at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
  at 
org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$AddSchemaFieldsUpdateProcessor.processAdd(AddSchemaFieldsUpdateProcessorFactory.java:335)
  at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
  at 
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
  at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
  at 
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
  at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
  at 
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
  at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
  at 
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
  at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
  at 
org.apache.solr.update.processor.FieldNameMutatingUpdateProcessorFactory$1.processAdd(FieldNameMutatingUpdateProcessorFactory.java:74)
  at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
  at 
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
  at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
  at 
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:957)
  at 
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1112)
  at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:738)
  at 
org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
  at 
org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:97)  
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:179)
  at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:135)
  at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:298)  
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:121)
  at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:263)  
at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:181)  
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:186)
  at 
org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:107)
  at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:54)  
at 
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
  at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
  at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:153)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:2210)  at 
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658)  at 
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)  at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:303)
  at 

[JENKINS-EA] Lucene-Solr-master-Linux (64bit/jdk-9-ea+140) - Build # 18294 - Unstable!

2016-11-15 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/18294/
Java: 64bit/jdk-9-ea+140 -XX:+UseCompressedOops -XX:+UseG1GC

1 tests failed.
FAILED:  
org.apache.solr.handler.component.SpellCheckComponentTest.testNumericQuery

Error Message:
List size mismatch @ spellcheck/suggestions

Stack Trace:
java.lang.RuntimeException: List size mismatch @ spellcheck/suggestions
at 
__randomizedtesting.SeedInfo.seed([A9E122609411E368:A2CD75A00BD85FC7]:0)
at org.apache.solr.SolrTestCaseJ4.assertJQ(SolrTestCaseJ4.java:900)
at org.apache.solr.SolrTestCaseJ4.assertJQ(SolrTestCaseJ4.java:847)
at 
org.apache.solr.handler.component.SpellCheckComponentTest.testNumericQuery(SpellCheckComponentTest.java:154)
at 
jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@9-ea/Native 
Method)
at 
jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@9-ea/NativeMethodAccessorImpl.java:62)
at 
jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@9-ea/DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(java.base@9-ea/Method.java:535)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:907)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:943)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:957)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:811)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:462)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:916)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:802)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:852)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at java.lang.Thread.run(java.base@9-ea/Thread.java:843)




Build Log:
[...truncated 11770 lines...]
   [junit4] Suite: 

[jira] [Commented] (SOLR-9252) Feature selection and logistic regression on text

2016-11-15 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15669357#comment-15669357
 ] 

Joel Bernstein commented on SOLR-9252:
--

[~caomanhdat], I added an improved test case which I was planning to commit, but 
haven't gotten to it yet. We could resolve this ticket and create a new ticket 
with the latest patch as a starting point.

> Feature selection and logistic regression on text
> -
>
> Key: SOLR-9252
> URL: https://issues.apache.org/jira/browse/SOLR-9252
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: search, SolrCloud, SolrJ
>Reporter: Cao Manh Dat
>Assignee: Joel Bernstein
>  Labels: Streaming
> Fix For: 6.2
>
> Attachments: SOLR-9252.patch, SOLR-9252.patch, SOLR-9252.patch, 
> SOLR-9252.patch, SOLR-9252.patch, SOLR-9252.patch, SOLR-9252.patch, 
> SOLR-9252.patch, SOLR-9252.patch, SOLR-9252.patch, SOLR-9299-1.patch
>
>
> This ticket adds two new streaming expressions: *features* and *train*
> These two functions work together to train a logistic regression model on 
> text, from a training set stored in a SolrCloud collection.
> The syntax is as follows:
> {code}
> train(collection1, q="*:*",
>   features(collection1, 
>q="*:*",  
>field="body", 
>outcome="out_i", 
>positiveLabel=1, 
>numTerms=100),
>   field="body",
>   outcome="out_i",
>   maxIterations=100)
> {code}
> The *features* function extracts the feature terms from a training set using 
> *information gain* to score the terms. 
> http://www.jiliang.xyz/publication/feature_selection_for_classification.pdf
> The *train* function uses the extracted features to train a logistic 
> regression model on a text field in the training set.
> For both *features* and *train* the training set is defined by a query. The 
> doc vectors in the *train* function use tf-idf to represent the terms in the 
> document. The idf is calculated for the specific training set, allowing 
> multiple training sets to be stored in the same collection without polluting 
> the idf. 
> In the *train* function a batch gradient descent approach is used to 
> iteratively train the model.
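For reference, the standard batch gradient-descent update for logistic regression, in notation assumed here rather than taken from the patch (learning rate \alpha, tf-idf doc vectors x_i, labels y_i in {0,1}; Solr's exact implementation may differ in details such as regularization):

{code}
w \leftarrow w - \alpha \sum_{i=1}^{n} \left( \sigma(w^{\top} x_i) - y_i \right) x_i,
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}
{code}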
> Both the *features* and the *train* function are embedded in Solr using the 
> AnalyticsQuery framework. So only the model is transported across the network 
> with each iteration.
> Both the features and the models can be stored in a SolrCloud collection. 
> Using this approach Solr can hold millions of models which can be selectively 
> deployed. For example a model could be trained for each user, to personalize 
> ranking and recommendations.
> Below is the final iteration of a model trained on the Enron Ham/Spam 
> dataset. The model includes the terms and their idfs and weights as well as a 
> classification evaluation describing the accuracy of the model on the training 
> set. 
> {code}
> {
>   "idfs_ds": [1.2627703388716238, 1.2043595767152093, 
> 1.3886172425360304, 1.5488587854881268, 1.6127302558747882, 
> 2.1359177807201526, 1.514866246141212, 1.7375701403808523, 
> 1.6166175299631897, 1.756428159015249, 1.7929202354640175, 
> 1.2834893120635762, 1.899442866302021, 1.8639061320252337, 
> 1.7631697575821685, 1.6820002892260415, 1.4411352768194767, 
> 2.103708877350535, 1.2225773869965861, 2.208893321170597, 1.878981794430681, 
> 2.043737027506736, 2.2819184561854864, 2.3264563106163885, 
> 1.9336117619172708, 2.0467265663551024, 1.7386696457142692, 
> 2.468795829515302, 2.069437610615317, 2.6294363202479327, 3.7388303845193307, 
> 2.5446615802900157, 1.7430797961918219, 3.0787440662202736, 
> 1.9579702057493114, 2.289523055570706, 1.5362003886162032, 
> 2.7549569891263763, 3.955894889757158, 2.587435396273302, 3.945844553903657, 
> 1.003513057076781, 3.0416264032637708, 2.248395764146843, 4.018415246738492, 
> 2.2876164773001246, 3.3636289340509933, 1.2438124251270097, 
> 2.733903579928544, 3.439026951535205, 0.6709665389201712, 0.9546224358275518, 
> 2.8080115520822657, 2.477970205791343, 2.2631561797299637, 
> 3.2378087608499606, 0.36177021415584676, 4.1083634834014315, 
> 4.120197941048435, 2.471081544796158, 2.424147775633, 2.92339362620, 
> 2.9269972337044097, 3.2987413118451183, 2.383498249003407, 4.168988105217867, 
> 2.877691472720256, 4.233526626355437, 3.8505343740993316, 2.3264563106163885, 
> 2.6429318017228174, 4.260555298743357, 3.0058372954121855, 
> 3.8688835127675283, 3.021585652380325, 3.0295538220295017, 
> 1.9620882623582288, 3.469610374907285, 3.945844553903657, 3.4821105376715167, 
> 4.3169082352944885, 2.520329479630485, 3.609372317282444, 3.070375816549757, 
> 4.220281399605417, 

[JENKINS-EA] Lucene-Solr-6.x-Linux (32bit/jdk-9-ea+140) - Build # 2191 - Still Unstable!

2016-11-15 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Linux/2191/
Java: 32bit/jdk-9-ea+140 -server -XX:+UseConcMarkSweepGC

1 tests failed.
FAILED:  
org.apache.solr.handler.TestReplicationHandler.doTestIndexAndConfigAliasReplication

Error Message:
expected:<1> but was:<0>

Stack Trace:
java.lang.AssertionError: expected:<1> but was:<0>
at 
__randomizedtesting.SeedInfo.seed([C83E2AC814C1C8B:FBF00CF447A4B36D]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at org.junit.Assert.assertEquals(Assert.java:456)
at 
org.apache.solr.handler.TestReplicationHandler.doTestIndexAndConfigAliasReplication(TestReplicationHandler.java:1331)
at 
jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@9-ea/Native 
Method)
at 
jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@9-ea/NativeMethodAccessorImpl.java:62)
at 
jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@9-ea/DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(java.base@9-ea/Method.java:535)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:907)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:943)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:957)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:811)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:462)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:916)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:802)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:852)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at 

[jira] [Commented] (LUCENE-7407) Explore switching doc values to an iterator API

2016-11-15 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15669305#comment-15669305
 ] 

Otis Gospodnetic commented on LUCENE-7407:
--

I had a quick look at [~yo...@apache.org]'s SOLR-9599 and then at [~jpountz]'s 
patch in LUCENE-7462 that makes the search-time work less expensive.  The last 
comment from Yonik reporting a faceting regression in Solr was from October 18, 
and Adrien's patch was committed on October 24.  Maybe things are working better 
for Solr now?

If not, in the interest of moving forward, what do people think about Yonik's 
suggestion?
bq. Perhaps we should have both a random access API as well as an iterator API?

> Explore switching doc values to an iterator API
> ---
>
> Key: LUCENE-7407
> URL: https://issues.apache.org/jira/browse/LUCENE-7407
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>  Labels: docValues
> Fix For: master (7.0)
>
> Attachments: LUCENE-7407.patch
>
>
> I think it could be compelling if we restricted doc values to use an
> iterator API at read time, instead of the more general random access
> API we have today:
>   * It would make doc values disk usage more of a "you pay for
> what you actually use", like postings, which is a compelling
> reduction for sparse usage.
>   * I think codecs could compress better and maybe speed up decoding
> of doc values, even in the non-sparse case, since the read-time
> API is more restrictive "forward only" instead of random access.
>   * We could remove {{getDocsWithField}} entirely, since that's
> implicit in the iteration, and the awkward "return 0 if the
> document didn't have this field" would go away.
>   * We can remove the annoying thread locals we must make today in
> {{CodecReader}}, and close the trappy "I accidentally shared a
> single XXXDocValues instance across threads", since an iterator is
> inherently "use once".
>   * We could maybe leverage the numerous optimizations we've done for
> postings over time, since the two problems ("iterate over doc ids
> and store something interesting for each") are very similar.
> This idea has come up many times in the past, e.g. LUCENE-7253 is a recent
> example, and very early iterations of doc values started with exactly
> this ;)
> However, it's a truly enormous change, likely 7.0 only.  Or maybe we
> could have the new iterator APIs also ported to 6.x side by side with
> the deprecated existing random-access APIs.
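To make the contrast concrete, a hypothetical sketch of the two read-time styles for a numeric field ({{reader}} is a LeafReader; the iterator method names are assumed, not final):

{code}
// Random access (today): must answer for any docID, returning 0
// when the document has no value for the field.
NumericDocValues values = reader.getNumericDocValues("price");
long v = values.get(docID);

// Iterator style (proposed): forward-only, like postings; only
// documents that actually have the field are visited.
NumericDocValues it = reader.getNumericDocValues("price");
if (it.advance(docID) == docID) {
  long v2 = it.longValue();
}
{code}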






[jira] [Resolved] (SOLR-9597) Add setReadOnly(String ...) to ConnectionImpl

2016-11-15 Thread Kevin Risden (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden resolved SOLR-9597.

   Resolution: Fixed
Fix Version/s: 6.4

> Add setReadOnly(String ...) to ConnectionImpl
> -
>
> Key: SOLR-9597
> URL: https://issues.apache.org/jira/browse/SOLR-9597
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrJ
>Affects Versions: 6.2.1, master (7.0)
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Fix For: 6.4
>
> Attachments: SOLR-9597.patch, SOLR-9597.patch
>
>
> When using the OpenLink ODBC-JDBC bridge on Windows, it tries to run the method 
> ConnectionImpl.setReadOnly(String ...). The spec says that 
> setReadOnly(boolean ...) is required. This causes the bridge to fail on 
> Windows.
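One plausible shape for the fix, as a sketch (not necessarily the committed patch): add a String overload to ConnectionImpl that delegates to the boolean variant the JDBC spec requires:

{code}
// Sketch: tolerate the non-spec setReadOnly(String) call made by the
// OpenLink bridge by delegating to the spec-required boolean overload.
public void setReadOnly(String readOnly) throws SQLException {
  setReadOnly(Boolean.parseBoolean(readOnly));
}
{code}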






[jira] [Commented] (SOLR-9597) Add setReadOnly(String ...) to ConnectionImpl

2016-11-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15669257#comment-15669257
 ] 

ASF subversion and git services commented on SOLR-9597:
---

Commit 8c7decb4c020b91e77b521f555f33865cae89a1b in lucene-solr's branch 
refs/heads/branch_6x from [~risdenk]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8c7decb ]

SOLR-9597: Add setReadOnly(String ...) to ConnectionImpl


> Add setReadOnly(String ...) to ConnectionImpl
> -
>
> Key: SOLR-9597
> URL: https://issues.apache.org/jira/browse/SOLR-9597
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrJ
>Affects Versions: 6.2.1, master (7.0)
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Attachments: SOLR-9597.patch, SOLR-9597.patch
>
>
> When using the OpenLink ODBC-JDBC bridge on Windows, it tries to run the method 
> ConnectionImpl.setReadOnly(String ...). The spec says that 
> setReadOnly(boolean ...) is required. This causes the bridge to fail on 
> Windows.






[jira] [Commented] (SOLR-9597) Add setReadOnly(String ...) to ConnectionImpl

2016-11-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15669256#comment-15669256
 ] 

ASF subversion and git services commented on SOLR-9597:
---

Commit 012d75d36d6da8b7e5b0fb7ab0b3f25c0952833e in lucene-solr's branch 
refs/heads/master from [~risdenk]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=012d75d ]

SOLR-9597: Add setReadOnly(String ...) to ConnectionImpl


> Add setReadOnly(String ...) to ConnectionImpl
> -
>
> Key: SOLR-9597
> URL: https://issues.apache.org/jira/browse/SOLR-9597
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrJ
>Affects Versions: 6.2.1, master (7.0)
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Attachments: SOLR-9597.patch, SOLR-9597.patch
>
>
> When using the OpenLink ODBC-JDBC bridge on Windows, it tries to run the method 
> ConnectionImpl.setReadOnly(String ...). The spec says that 
> setReadOnly(boolean ...) is required. This causes the bridge to fail on 
> Windows.






[jira] [Updated] (SOLR-9597) Add setReadOnly(String ...) to ConnectionImpl

2016-11-15 Thread Kevin Risden (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated SOLR-9597:
---
Attachment: SOLR-9597.patch

Updated patch to master

> Add setReadOnly(String ...) to ConnectionImpl
> -
>
> Key: SOLR-9597
> URL: https://issues.apache.org/jira/browse/SOLR-9597
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrJ
>Affects Versions: 6.2.1, master (7.0)
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Attachments: SOLR-9597.patch, SOLR-9597.patch
>
>
> When using the OpenLink ODBC-JDBC bridge on Windows, it tries to run the method 
> ConnectionImpl.setReadOnly(String ...). The spec says that 
> setReadOnly(boolean ...) is required. This causes the bridge to fail on 
> Windows.






[jira] [Commented] (SOLR-9252) Feature selection and logistic regression on text

2016-11-15 Thread Kevin Risden (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15669226#comment-15669226
 ] 

Kevin Risden commented on SOLR-9252:


[~joel.bernstein] - Should this ticket still be open? Looks like there were 
commits to master and branch_6x?

> Feature selection and logistic regression on text
> -
>
> Key: SOLR-9252
> URL: https://issues.apache.org/jira/browse/SOLR-9252
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: search, SolrCloud, SolrJ
>Reporter: Cao Manh Dat
>Assignee: Joel Bernstein
>  Labels: Streaming
> Fix For: 6.2
>
> Attachments: SOLR-9252.patch, SOLR-9252.patch, SOLR-9252.patch, 
> SOLR-9252.patch, SOLR-9252.patch, SOLR-9252.patch, SOLR-9252.patch, 
> SOLR-9252.patch, SOLR-9252.patch, SOLR-9252.patch, SOLR-9299-1.patch
>
>
> This ticket adds two new streaming expressions: *features* and *train*
> These two functions work together to train a logistic regression model on 
> text, from a training set stored in a SolrCloud collection.
> The syntax is as follows:
> {code}
> train(collection1, q="*:*",
>   features(collection1, 
>q="*:*",  
>field="body", 
>outcome="out_i", 
>positiveLabel=1, 
>numTerms=100),
>   field="body",
>   outcome="out_i",
>   maxIterations=100)
> {code}
> The *features* function extracts the feature terms from a training set using 
> *information gain* to score the terms. 
> http://www.jiliang.xyz/publication/feature_selection_for_classification.pdf
> The *train* function uses the extracted features to train a logistic 
> regression model on a text field in the training set.
> For both *features* and *train* the training set is defined by a query. The 
> doc vectors in the *train* function use tf-idf to represent the terms in the 
> document. The idf is calculated for the specific training set, allowing 
> multiple training sets to be stored in the same collection without polluting 
> the idf. 
> In the *train* function a batch gradient descent approach is used to 
> iteratively train the model.
> Both the *features* and the *train* function are embedded in Solr using the 
> AnalyticsQuery framework. So only the model is transported across the network 
> with each iteration.
> Both the features and the models can be stored in a SolrCloud collection. 
> Using this approach Solr can hold millions of models which can be selectively 
> deployed. For example a model could be trained for each user, to personalize 
> ranking and recommendations.
> Below is the final iteration of a model trained on the Enron Ham/Spam 
> dataset. The model includes the terms and their idfs and weights as well as a 
> classification evaluation describing the accuracy of the model on the training 
> set. 
> {code}
> {
>   "idfs_ds": [1.2627703388716238, 1.2043595767152093, 
> 1.3886172425360304, 1.5488587854881268, 1.6127302558747882, 
> 2.1359177807201526, 1.514866246141212, 1.7375701403808523, 
> 1.6166175299631897, 1.756428159015249, 1.7929202354640175, 
> 1.2834893120635762, 1.899442866302021, 1.8639061320252337, 
> 1.7631697575821685, 1.6820002892260415, 1.4411352768194767, 
> 2.103708877350535, 1.2225773869965861, 2.208893321170597, 1.878981794430681, 
> 2.043737027506736, 2.2819184561854864, 2.3264563106163885, 
> 1.9336117619172708, 2.0467265663551024, 1.7386696457142692, 
> 2.468795829515302, 2.069437610615317, 2.6294363202479327, 3.7388303845193307, 
> 2.5446615802900157, 1.7430797961918219, 3.0787440662202736, 
> 1.9579702057493114, 2.289523055570706, 1.5362003886162032, 
> 2.7549569891263763, 3.955894889757158, 2.587435396273302, 3.945844553903657, 
> 1.003513057076781, 3.0416264032637708, 2.248395764146843, 4.018415246738492, 
> 2.2876164773001246, 3.3636289340509933, 1.2438124251270097, 
> 2.733903579928544, 3.439026951535205, 0.6709665389201712, 0.9546224358275518, 
> 2.8080115520822657, 2.477970205791343, 2.2631561797299637, 
> 3.2378087608499606, 0.36177021415584676, 4.1083634834014315, 
> 4.120197941048435, 2.471081544796158, 2.424147775633, 2.92339362620, 
> 2.9269972337044097, 3.2987413118451183, 2.383498249003407, 4.168988105217867, 
> 2.877691472720256, 4.233526626355437, 3.8505343740993316, 2.3264563106163885, 
> 2.6429318017228174, 4.260555298743357, 3.0058372954121855, 
> 3.8688835127675283, 3.021585652380325, 3.0295538220295017, 
> 1.9620882623582288, 3.469610374907285, 3.945844553903657, 3.4821105376715167, 
> 4.3169082352944885, 2.520329479630485, 3.609372317282444, 3.070375816549757, 
> 4.220281399605417, 3.985484239117, 3.6165408067610563, 3.788840805093992, 
> 4.392131656532076, 4.392131656532076, 

[jira] [Resolved] (SOLR-9729) JDBCStream improvements

2016-11-15 Thread Kevin Risden (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden resolved SOLR-9729.

Resolution: Fixed

> JDBCStream improvements
> ---
>
> Key: SOLR-9729
> URL: https://issues.apache.org/jira/browse/SOLR-9729
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Fix For: 6.4
>
> Attachments: SOLR-9729.patch, SOLR-9729.patch
>
>
> JDBCStream has a few items that can be improved:
> * IOExceptions don't have the expression in the message
> * Use .equals() when comparing class names
> * Use .getColumnLabel instead of .getColumnName when working with SQL 
> ResultSet to make sure AS is properly handled.
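For the last point, a small illustration with plain java.sql (not code from the patch):

{code}
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;

// With "SELECT id AS product_id ...", getColumnName(1) returns "id",
// while getColumnLabel(1) returns the alias "product_id", which is the
// name callers of the stream actually expect.
static String columnKey(ResultSet rs) throws SQLException {
  ResultSetMetaData md = rs.getMetaData();
  return md.getColumnLabel(1);
}
{code}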






[jira] [Commented] (SOLR-9729) JDBCStream improvements

2016-11-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15669212#comment-15669212
 ] 

ASF subversion and git services commented on SOLR-9729:
---

Commit 8af0223812946e3d5d2bf455316065a00c3457e6 in lucene-solr's branch 
refs/heads/branch_6x from [~risdenk]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8af0223 ]

SOLR-9729: JDBCStream improvements


> JDBCStream improvements
> ---
>
> Key: SOLR-9729
> URL: https://issues.apache.org/jira/browse/SOLR-9729
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Fix For: 6.4
>
> Attachments: SOLR-9729.patch, SOLR-9729.patch
>
>
> JDBCStream has a few items that can be improved:
> * IOExceptions don't have the expression in the message
> * Use .equals() when comparing class names
> * Use .getColumnLabel instead of .getColumnName when working with SQL 
> ResultSet to make sure AS is properly handled.






[jira] [Commented] (SOLR-9729) JDBCStream improvements

2016-11-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15669210#comment-15669210
 ] 

ASF subversion and git services commented on SOLR-9729:
---

Commit c20d1298d3b26482dfc46a557d9c0680ce84aaed in lucene-solr's branch 
refs/heads/master from [~risdenk]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c20d129 ]

SOLR-9729: JDBCStream improvements


> JDBCStream improvements
> ---
>
> Key: SOLR-9729
> URL: https://issues.apache.org/jira/browse/SOLR-9729
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Minor
> Fix For: 6.4
>
> Attachments: SOLR-9729.patch, SOLR-9729.patch
>
>
> JDBCStream has a few items that can be improved:
> * IOExceptions don't have the expression in the message
> * Use .equals() when comparing class names
> * Use .getColumnLabel instead of .getColumnName when working with SQL 
> ResultSet to make sure AS is properly handled.






[jira] [Commented] (SOLR-9284) The HDFS BlockDirectoryCache should not let its keysToRelease or names maps grow indefinitely.

2016-11-15 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15669208#comment-15669208
 ] 

Steve Rowe commented on SOLR-9284:
--

My Jenkins found a reproducing seed half an hour ago (after the commits above) 
- note that I had to run the test without 
{{-Dtests.method=ensureCacheConfigurable}} to get it to reproduce: 

{noformat}
  [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=BlockDirectoryTest 
-Dtests.method=ensureCacheConfigurable -Dtests.seed=281E6C2B5FD2D4E1 
-Dtests.slow=true -Dtests.locale=tr-TR -Dtests.timezone=PST8PDT 
-Dtests.asserts=true -Dtests.file.encoding=UTF-8
  [junit4] ERROR   1.39s J3  | BlockDirectoryTest.ensureCacheConfigurable <<<
  [junit4]> Throwable #1: java.lang.OutOfMemoryError: Direct buffer memory
  [junit4]> at java.nio.Bits.reserveMemory(Bits.java:693)
  [junit4]> at 
java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
  [junit4]> at 
java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
  [junit4]> at 
org.apache.solr.store.blockcache.BlockCache.<init>(BlockCache.java:68)
  [junit4]> at 
org.apache.solr.store.blockcache.BlockDirectoryTest.setUp(BlockDirectoryTest.java:119)
  [junit4]> at java.lang.Thread.run(Thread.java:745)Throwable #2: 
java.lang.NullPointerException
  [junit4]> at 
org.apache.solr.store.blockcache.BlockDirectoryTest.tearDown(BlockDirectoryTest.java:131)
{noformat}

> The HDFS BlockDirectoryCache should not let its keysToRelease or names maps 
> grow indefinitely.
> ---
>
> Key: SOLR-9284
> URL: https://issues.apache.org/jira/browse/SOLR-9284
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: hdfs
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: master (7.0), 6.4
>
> Attachments: SOLR-9284.patch, SOLR-9284.patch
>
>
> https://issues.apache.org/jira/browse/SOLR-9284






[jira] [Resolved] (SOLR-9077) Streaming expressions should support collection alias

2016-11-15 Thread Kevin Risden (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden resolved SOLR-9077.

Resolution: Fixed

> Streaming expressions should support collection alias
> -
>
> Key: SOLR-9077
> URL: https://issues.apache.org/jira/browse/SOLR-9077
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Affects Versions: 5.5.1
>Reporter: Suds
>Assignee: Kevin Risden
>Priority: Minor
> Fix For: 6.4
>
> Attachments: SOLR-9077.patch, SOLR-9077.patch, SOLR-9077.patch, 
> SOLR-9077.patch, SOLR-9077.patch
>
>
> Streaming expressions should support collection aliases. 
> When I tried to access a collection alias I got a null pointer exception. The 
> issue seems to be related to the following code, where clusterState.getActiveSlices 
> returns null:
> {code}
> Collection<Slice> slices = clusterState.getActiveSlices(this.collection);
> {code}
> The fix seems to be fairly simple: clusterState.getActiveSlices can be made aware 
> of collection aliases. I am not sure what will happen when we have a large alias 
> which has hundreds of slices.
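A sketch of the alias handling implied by the description, using ZkStateReader's Aliases API (illustrative only, not necessarily the committed fix):

{code}
import org.apache.solr.common.cloud.Aliases;
import org.apache.solr.common.cloud.ZkStateReader;

// Resolve a possible collection alias before asking for slices:
// getCollectionAlias returns the alias target (a comma-separated list
// of collections) or null when the name is not an alias.
static String resolveCollectionName(ZkStateReader zkStateReader, String name) {
  Aliases aliases = zkStateReader.getAliases();
  String target = aliases.getCollectionAlias(name);
  return target != null ? target : name;
}
{code}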






[jira] [Commented] (SOLR-9077) Streaming expressions should support collection alias

2016-11-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15669172#comment-15669172
 ] 

ASF subversion and git services commented on SOLR-9077:
---

Commit e3db9f3b8a28e1de0b6fcd5cb358a948f7a23423 in lucene-solr's branch 
refs/heads/branch_6x from [~risdenk]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e3db9f3 ]

SOLR-9077: Streaming expressions should support collection alias


> Streaming expressions should support collection alias
> -
>
> Key: SOLR-9077
> URL: https://issues.apache.org/jira/browse/SOLR-9077
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Affects Versions: 5.5.1
>Reporter: Suds
>Assignee: Kevin Risden
>Priority: Minor
> Fix For: 6.4
>
> Attachments: SOLR-9077.patch, SOLR-9077.patch, SOLR-9077.patch, 
> SOLR-9077.patch, SOLR-9077.patch
>
>
> Streaming expressions should support collection aliases. 
> When I tried to access a collection alias I got a null pointer exception. The 
> issue seems to be related to the following code, where clusterState.getActiveSlices 
> returns null:
> {code}
> Collection<Slice> slices = clusterState.getActiveSlices(this.collection);
> {code}
> The fix seems to be fairly simple: clusterState.getActiveSlices can be made aware 
> of collection aliases. I am not sure what will happen when we have a large alias 
> which has hundreds of slices.






[jira] [Commented] (SOLR-9077) Streaming expressions should support collection alias

2016-11-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15669140#comment-15669140
 ] 

ASF subversion and git services commented on SOLR-9077:
---

Commit ace423e958182aa8ad6329f5cc1dc3ca6cd877c7 in lucene-solr's branch 
refs/heads/master from [~risdenk]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ace423e ]

SOLR-9077: Streaming expressions should support collection alias


> Streaming expressions should support collection alias
> -
>
> Key: SOLR-9077
> URL: https://issues.apache.org/jira/browse/SOLR-9077
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Affects Versions: 5.5.1
>Reporter: Suds
>Assignee: Kevin Risden
>Priority: Minor
> Fix For: 6.4
>
> Attachments: SOLR-9077.patch, SOLR-9077.patch, SOLR-9077.patch, 
> SOLR-9077.patch, SOLR-9077.patch
>
>
> Streaming expressions should support collection aliases. 
> When I tried to access a collection alias I got a null pointer exception. The 
> issue seems to be related to the following code, where clusterState.getActiveSlices 
> returns null:
> {code}
> Collection<Slice> slices = clusterState.getActiveSlices(this.collection);
> {code}
> The fix seems to be fairly simple: clusterState.getActiveSlices can be made aware 
> of collection aliases. I am not sure what will happen when we have a large alias 
> which has hundreds of slices.






[jira] [Closed] (SOLR-9773) hive

2016-11-15 Thread xuxiaoxiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xuxiaoxiao closed SOLR-9773.

Resolution: Invalid

> hive
> 
>
> Key: SOLR-9773
> URL: https://issues.apache.org/jira/browse/SOLR-9773
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: search
>Reporter: xuxiaoxiao
>







[jira] [Created] (SOLR-9773) hive

2016-11-15 Thread xuxiaoxiao (JIRA)
xuxiaoxiao created SOLR-9773:


 Summary: hive
 Key: SOLR-9773
 URL: https://issues.apache.org/jira/browse/SOLR-9773
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: search
Reporter: xuxiaoxiao









[jira] [Resolved] (SOLR-9666) SolrJ LukeResponse support dynamic fields

2016-11-15 Thread Kevin Risden (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden resolved SOLR-9666.

Resolution: Fixed

Thanks [~Fengtan]!

> SolrJ LukeResponse support dynamic fields
> -
>
> Key: SOLR-9666
> URL: https://issues.apache.org/jira/browse/SOLR-9666
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 6.2.1
>Reporter: Fengtan
>Assignee: Kevin Risden
>Priority: Minor
> Fix For: 6.4
>
> Attachments: SOLR-9666.patch, SOLR-9666.patch
>
>
> LukeRequestHandler (/admin/luke), when invoked with the show=schema 
> parameter, returns a list of static fields and dynamic fields.
> For instance on my local machine 
> http://localhost:8983/solr/collection1/admin/luke?show=schema returns 
> something like this:
> {code:xml}
> <lst name="schema">
>   ...
>   <lst name="fields">
>     <lst name="...">
>       <str name="type">string</str>
>       <str name="flags">I-S-OF-l</str>
>     </lst>
>     ...
>   </lst>
>   <lst name="dynamicFields">
>     <lst name="...">
>       <str name="type">string</str>
>       <str name="flags">I---OF--</str>
>     </lst>
>     ...
>   </lst>
>   ...
> </lst>
> {code}
> However, when processing a LukeRequest in SolrJ, only static fields are 
> parsed and made available to the client application through 
> lukeResponse.getFieldInfo(). There does not seem to be a way for the client 
> application to get the dynamic fields.
> Maybe we could parse dynamic fields and make them accessible? Possibly 
> something like this:
> {code}
> public class MyClass {
>   public static void main(String[] args) throws Exception {
>     SolrClient client = new 
> HttpSolrClient("http://localhost:8983/solr/collection1");
>     LukeRequest request = new LukeRequest();
>     request.setShowSchema(true);
>     LukeResponse response = request.process(client);
>     Map<String, LukeResponse.FieldInfo> staticFields = response.getFieldInfo(); // SolrJ 
> already provides this.
>     Map<String, LukeResponse.FieldInfo> dynamicFields = response.getDynamicFieldInfo(); // 
> Proposed improvement.
>   }
> }
> {code}






[jira] [Commented] (SOLR-9666) SolrJ LukeResponse support dynamic fields

2016-11-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15669045#comment-15669045
 ] 

ASF subversion and git services commented on SOLR-9666:
---

Commit ead40a9e00b53620511ed243932ecaf12093aafa in lucene-solr's branch 
refs/heads/branch_6x from [~risdenk]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ead40a9 ]

SOLR-9666: SolrJ LukeResponse support dynamic fields


> SolrJ LukeResponse support dynamic fields
> -
>
> Key: SOLR-9666
> URL: https://issues.apache.org/jira/browse/SOLR-9666
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 6.2.1
>Reporter: Fengtan
>Assignee: Kevin Risden
>Priority: Minor
> Fix For: 6.4
>
> Attachments: SOLR-9666.patch, SOLR-9666.patch
>
>
> LukeRequestHandler (/admin/luke), when invoked with the show=schema 
> parameter, returns a list of static fields and dynamic fields.
> For instance on my local machine 
> http://localhost:8983/solr/collection1/admin/luke?show=schema returns 
> something like this:
> {code:xml}
> <lst name="schema">
>   ...
>   <lst name="fields">
>     <lst name="...">
>       <str name="type">string</str>
>       <str name="flags">I-S-OF-l</str>
>     </lst>
>     ...
>   </lst>
>   <lst name="dynamicFields">
>     <lst name="...">
>       <str name="type">string</str>
>       <str name="flags">I---OF--</str>
>     </lst>
>     ...
>   </lst>
>   ...
> </lst>
> {code}
> However, when processing a LukeRequest in SolrJ, only static fields are 
> parsed and made available to the client application through 
> lukeResponse.getFieldInfo(). There does not seem to be a way for the client 
> application to get the dynamic fields.
> Maybe we could parse dynamic fields and make them accessible? Possibly 
> something like this:
> {code}
> public class MyClass {
>   public static void main(String[] args) throws Exception {
>     SolrClient client = new 
> HttpSolrClient("http://localhost:8983/solr/collection1");
>     LukeRequest request = new LukeRequest();
>     request.setShowSchema(true);
>     LukeResponse response = request.process(client);
>     Map<String, LukeResponse.FieldInfo> staticFields = response.getFieldInfo(); // SolrJ 
> already provides this.
>     Map<String, LukeResponse.FieldInfo> dynamicFields = response.getDynamicFieldInfo(); // 
> Proposed improvement.
>   }
> }
> {code}






[JENKINS-EA] Lucene-Solr-6.x-Linux (32bit/jdk-9-ea+140) - Build # 2190 - Unstable!

2016-11-15 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Linux/2190/
Java: 32bit/jdk-9-ea+140 -client -XX:+UseG1GC

2 tests failed.
FAILED:  
org.apache.solr.handler.component.SpellCheckComponentTest.testThresholdTokenFrequency

Error Message:
Path not found: /spellcheck/suggestions/[1]/suggestion

Stack Trace:
java.lang.RuntimeException: Path not found: 
/spellcheck/suggestions/[1]/suggestion
at 
__randomizedtesting.SeedInfo.seed([CFBBCB631F443CAB:451C449290AF05D0]:0)
at org.apache.solr.SolrTestCaseJ4.assertJQ(SolrTestCaseJ4.java:900)
at org.apache.solr.SolrTestCaseJ4.assertJQ(SolrTestCaseJ4.java:847)
at 
org.apache.solr.handler.component.SpellCheckComponentTest.testThresholdTokenFrequency(SpellCheckComponentTest.java:277)
at 
jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@9-ea/Native 
Method)
at 
jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@9-ea/NativeMethodAccessorImpl.java:62)
at 
jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@9-ea/DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(java.base@9-ea/Method.java:535)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:907)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:943)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:957)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:811)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:462)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:916)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:802)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:852)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at java.lang.Thread.run(java.base@9-ea/Thread.java:843)


FAILED:  

[jira] [Commented] (SOLR-9324) Support Secure Impersonation / Proxy User for solr authentication

2016-11-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668950#comment-15668950
 ] 

ASF GitHub Bot commented on SOLR-9324:
--

GitHub user hgadre opened a pull request:

https://github.com/apache/lucene-solr/pull/117

SOLR-9324: Support Secure Impersonation / Proxy User for solr authentication

A patch against branch_6x. It also includes unit test fixes applied on the 
master branch...

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hgadre/lucene-solr SOLR-9324_6x

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/117.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #117


commit d23d4a424d636b893b9075968ae21edcddb3500c
Author: Gregory Chanan 
Date:   2016-07-25T18:15:48Z

SOLR-9324: Support Secure Impersonation / Proxy User for solr authentication

Conflicts:
solr/CHANGES.txt
solr/core/src/java/org/apache/solr/security/KerberosPlugin.java

commit 74b05ba4e42272571eac33609bc15777d1358827
Author: Gregory Chanan 
Date:   2016-08-06T04:04:58Z

SOLR-9324: Fix local host test assumptions

commit 40ba331403f8e7201d823ab99edecbbda9c46250
Author: Uwe Schindler 
Date:   2016-09-03T08:48:01Z

SOLR-9460: Disable test that does not work with Windows

commit 2d5afdc98eadfa9cc6862f0fa881909c62938af0
Author: Uwe Schindler 
Date:   2016-09-03T18:30:30Z

SOLR-9460: Fully fix test setup

commit 32ccf9f62190f3e867fc7edaad198020635fcd4d
Author: Hrishikesh Gadre 
Date:   2016-11-16T00:32:21Z

SOLR-9324 Fix TestSolrCloudWithSecureImpersonation#testForwarding




> Support Secure Impersonation / Proxy User for solr authentication
> -
>
> Key: SOLR-9324
> URL: https://issues.apache.org/jira/browse/SOLR-9324
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Attachments: SOLR-9324-tests.patch, SOLR-9324.patch, SOLR-9324.patch, 
> SOLR-9324.patch, SOLR-9324_branch_6x.patch, build-6025.log
>
>
> Solr should support Proxy User / Secure Impersonation for authentication, as 
> supported by hadoop 
> (http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html)
>  and supported by the hadoop AuthenticationFilter (which we use for the 
> KerberosPlugin).
> There are a number of use cases, but a common one is this:
> There is a front end for searches (say, Hue http://gethue.com/) that supports 
> its own login mechanisms.  If the cluster uses kerberos for authentication, 
> hue must have kerberos credentials for each user, which is a pain to manage.  
> Instead, hue can be allowed to impersonate known users from known machines so 
> it only needs its own kerberos credentials.
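
For illustration only, a minimal SolrJ sketch of what an impersonated request
could look like, assuming the Hadoop convention of passing the effective user
in a {{doAs}} request parameter (the parameter name and the user/collection
names here are assumptions for the sketch, not taken from the patch):

{code}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ProxyUserExample {
  public static void main(String[] args) throws Exception {
    // Hue authenticates with its own kerberos credentials, then asks Solr to
    // execute the query as the end user via the (Hadoop-style) doAs parameter.
    HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/collection1");
    SolrQuery query = new SolrQuery("*:*");
    query.set("doAs", "enduser");  // hypothetical impersonated user
    QueryResponse response = client.query(query);
    System.out.println(response.getResults().getNumFound());
    client.close();
  }
}
{code}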



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request #117: SOLR-9324: Support Secure Impersonation / Pro...

2016-11-15 Thread hgadre
GitHub user hgadre opened a pull request:

https://github.com/apache/lucene-solr/pull/117

SOLR-9324: Support Secure Impersonation / Proxy User for solr authentication

A patch against branch_6x. It also includes unit test fixes applied on the 
master branch...

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hgadre/lucene-solr SOLR-9324_6x

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/117.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #117


commit d23d4a424d636b893b9075968ae21edcddb3500c
Author: Gregory Chanan 
Date:   2016-07-25T18:15:48Z

SOLR-9324: Support Secure Impersonation / Proxy User for solr authentication

Conflicts:
solr/CHANGES.txt
solr/core/src/java/org/apache/solr/security/KerberosPlugin.java

commit 74b05ba4e42272571eac33609bc15777d1358827
Author: Gregory Chanan 
Date:   2016-08-06T04:04:58Z

SOLR-9324: Fix local host test assumptions

commit 40ba331403f8e7201d823ab99edecbbda9c46250
Author: Uwe Schindler 
Date:   2016-09-03T08:48:01Z

SOLR-9460: Disable test that does not work with Windows

commit 2d5afdc98eadfa9cc6862f0fa881909c62938af0
Author: Uwe Schindler 
Date:   2016-09-03T18:30:30Z

SOLR-9460: Fully fix test setup

commit 32ccf9f62190f3e867fc7edaad198020635fcd4d
Author: Hrishikesh Gadre 
Date:   2016-11-16T00:32:21Z

SOLR-9324 Fix TestSolrCloudWithSecureImpersonation#testForwarding




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9666) SolrJ LukeResponse support dynamic fields

2016-11-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668917#comment-15668917
 ] 

ASF subversion and git services commented on SOLR-9666:
---

Commit 782923b894a7eda6cc8940e83d1e8b4863d7d063 in lucene-solr's branch 
refs/heads/master from [~risdenk]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=782923b ]

SOLR-9666: SolrJ LukeResponse support dynamic fields


> SolrJ LukeResponse support dynamic fields
> -
>
> Key: SOLR-9666
> URL: https://issues.apache.org/jira/browse/SOLR-9666
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 6.2.1
>Reporter: Fengtan
>Assignee: Kevin Risden
>Priority: Minor
> Fix For: 6.4
>
> Attachments: SOLR-9666.patch, SOLR-9666.patch
>
>
> LukeRequestHandler (/admin/luke), when invoked with the show=schema 
> parameter, returns a list of static fields and dynamic fields.
> For instance on my local machine 
> http://localhost:8983/solr/collection1/admin/luke?show=schema returns 
> something like this:
> {code:xml}
> <lst name="schema">
>   ...
>   <lst name="fields">
>     <lst name="id">
>       <str name="type">string</str>
>       <str name="flags">I-S-OF-l</str>
>     </lst>
>     ...
>   </lst>
>   <lst name="dynamicFields">
>     <lst name="*_s">
>       <str name="type">string</str>
>       <str name="flags">I---OF--</str>
>     </lst>
>     ...
>   </lst>
>   ...
> </lst>
> {code}
> However, when processing a LukeRequest in SolrJ, only static fields are 
> parsed and made available to the client application through 
> lukeResponse.getFieldInfo(). There does not seem to be a way for the client 
> application to get the dynamic fields.
> Maybe we could parse dynamic fields and make them accessible? Possibly 
> something like this:
> {code}
> public class MyClass {
>   public static void main(String[] args) throws Exception {
>     SolrClient client = new HttpSolrClient("http://localhost:8983/solr/collection1");
>     LukeRequest request = new LukeRequest();
>     request.setShowSchema(true);
>     LukeResponse response = request.process(client);
>     Map<String, FieldInfo> staticFields = response.getFieldInfo(); // SolrJ already provides this.
>     Map<String, FieldInfo> dynamicFields = response.getDynamicFieldInfo(); // Proposed improvement.
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9666) SolrJ LukeResponse support dynamic fields

2016-11-15 Thread Kevin Risden (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated SOLR-9666:
---
Summary: SolrJ LukeResponse support dynamic fields  (was: Extract dynamic 
fields in LukeResponse)

> SolrJ LukeResponse support dynamic fields
> -
>
> Key: SOLR-9666
> URL: https://issues.apache.org/jira/browse/SOLR-9666
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 6.2.1
>Reporter: Fengtan
>Assignee: Kevin Risden
>Priority: Minor
> Fix For: 6.4
>
> Attachments: SOLR-9666.patch, SOLR-9666.patch
>
>
> LukeRequestHandler (/admin/luke), when invoked with the show=schema 
> parameter, returns a list of static fields and dynamic fields.
> For instance on my local machine 
> http://localhost:8983/solr/collection1/admin/luke?show=schema returns 
> something like this:
> {code:xml}
> <lst name="schema">
>   ...
>   <lst name="fields">
>     <lst name="id">
>       <str name="type">string</str>
>       <str name="flags">I-S-OF-l</str>
>     </lst>
>     ...
>   </lst>
>   <lst name="dynamicFields">
>     <lst name="*_s">
>       <str name="type">string</str>
>       <str name="flags">I---OF--</str>
>     </lst>
>     ...
>   </lst>
>   ...
> </lst>
> {code}
> However, when processing a LukeRequest in SolrJ, only static fields are 
> parsed and made available to the client application through 
> lukeResponse.getFieldInfo(). There does not seem to be a way for the client 
> application to get the dynamic fields.
> Maybe we could parse dynamic fields and make them accessible? Possibly 
> something like this:
> {code}
> public class MyClass {
>   public static void main(String[] args) throws Exception {
>     SolrClient client = new HttpSolrClient("http://localhost:8983/solr/collection1");
>     LukeRequest request = new LukeRequest();
>     request.setShowSchema(true);
>     LukeResponse response = request.process(client);
>     Map<String, FieldInfo> staticFields = response.getFieldInfo(); // SolrJ already provides this.
>     Map<String, FieldInfo> dynamicFields = response.getDynamicFieldInfo(); // Proposed improvement.
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9708) Expose UnifiedHighlighter in Solr

2016-11-15 Thread Timothy M. Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668824#comment-15668824
 ] 

Timothy M. Rodriguez commented on SOLR-9708:


I was suggesting those instead of hl.tag.pre, but realized that's used too. No 
sense in adding a third, even though both names are not so ideal IMO.

> Expose UnifiedHighlighter in Solr
> -
>
> Key: SOLR-9708
> URL: https://issues.apache.org/jira/browse/SOLR-9708
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: highlighter
>Reporter: Timothy M. Rodriguez
>Assignee: David Smiley
> Fix For: 6.4
>
>
> This ticket is for creating a Solr plugin that can utilize the new 
> UnifiedHighlighter which was initially committed in 
> https://issues.apache.org/jira/browse/LUCENE-7438



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9666) Extract dynamic fields in LukeResponse

2016-11-15 Thread Kevin Risden (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated SOLR-9666:
---
Fix Version/s: 6.4

> Extract dynamic fields in LukeResponse
> --
>
> Key: SOLR-9666
> URL: https://issues.apache.org/jira/browse/SOLR-9666
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 6.2.1
>Reporter: Fengtan
>Assignee: Kevin Risden
>Priority: Minor
> Fix For: 6.4
>
> Attachments: SOLR-9666.patch, SOLR-9666.patch
>
>
> LukeRequestHandler (/admin/luke), when invoked with the show=schema 
> parameter, returns a list of static fields and dynamic fields.
> For instance on my local machine 
> http://localhost:8983/solr/collection1/admin/luke?show=schema returns 
> something like this:
> {code:xml}
> <lst name="schema">
>   ...
>   <lst name="fields">
>     <lst name="id">
>       <str name="type">string</str>
>       <str name="flags">I-S-OF-l</str>
>     </lst>
>     ...
>   </lst>
>   <lst name="dynamicFields">
>     <lst name="*_s">
>       <str name="type">string</str>
>       <str name="flags">I---OF--</str>
>     </lst>
>     ...
>   </lst>
>   ...
> </lst>
> {code}
> However, when processing a LukeRequest in SolrJ, only static fields are 
> parsed and made available to the client application through 
> lukeResponse.getFieldInfo(). There does not seem to be a way for the client 
> application to get the dynamic fields.
> Maybe we could parse dynamic fields and make them accessible? Possibly 
> something like this:
> {code}
> public class MyClass {
>   public static void main(String[] args) throws Exception {
>     SolrClient client = new HttpSolrClient("http://localhost:8983/solr/collection1");
>     LukeRequest request = new LukeRequest();
>     request.setShowSchema(true);
>     LukeResponse response = request.process(client);
>     Map<String, FieldInfo> staticFields = response.getFieldInfo(); // SolrJ already provides this.
>     Map<String, FieldInfo> dynamicFields = response.getDynamicFieldInfo(); // Proposed improvement.
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-9666) Extract dynamic fields in LukeResponse

2016-11-15 Thread Kevin Risden (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden reassigned SOLR-9666:
--

Assignee: Kevin Risden

> Extract dynamic fields in LukeResponse
> --
>
> Key: SOLR-9666
> URL: https://issues.apache.org/jira/browse/SOLR-9666
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 6.2.1
>Reporter: Fengtan
>Assignee: Kevin Risden
>Priority: Minor
> Attachments: SOLR-9666.patch, SOLR-9666.patch
>
>
> LukeRequestHandler (/admin/luke), when invoked with the show=schema 
> parameter, returns a list of static fields and dynamic fields.
> For instance on my local machine 
> http://localhost:8983/solr/collection1/admin/luke?show=schema returns 
> something like this:
> {code:xml}
> <lst name="schema">
>   ...
>   <lst name="fields">
>     <lst name="id">
>       <str name="type">string</str>
>       <str name="flags">I-S-OF-l</str>
>     </lst>
>     ...
>   </lst>
>   <lst name="dynamicFields">
>     <lst name="*_s">
>       <str name="type">string</str>
>       <str name="flags">I---OF--</str>
>     </lst>
>     ...
>   </lst>
>   ...
> </lst>
> {code}
> However, when processing a LukeRequest in SolrJ, only static fields are 
> parsed and made available to the client application through 
> lukeResponse.getFieldInfo(). There does not seem to be a way for the client 
> application to get the dynamic fields.
> Maybe we could parse dynamic fields and make them accessible? Possibly 
> something like this:
> {code}
> public class MyClass {
>   public static void main(String[] args) throws Exception {
>     SolrClient client = new HttpSolrClient("http://localhost:8983/solr/collection1");
>     LukeRequest request = new LukeRequest();
>     request.setShowSchema(true);
>     LukeResponse response = request.process(client);
>     Map<String, FieldInfo> staticFields = response.getFieldInfo(); // SolrJ already provides this.
>     Map<String, FieldInfo> dynamicFields = response.getDynamicFieldInfo(); // Proposed improvement.
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9708) Expose UnifiedHighlighter in Solr

2016-11-15 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668771#comment-15668771
 ] 

David Smiley commented on SOLR-9708:


I suggested to support _both_. 

> Expose UnifiedHighlighter in Solr
> -
>
> Key: SOLR-9708
> URL: https://issues.apache.org/jira/browse/SOLR-9708
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: highlighter
>Reporter: Timothy M. Rodriguez
>Assignee: David Smiley
> Fix For: 6.4
>
>
> This ticket is for creating a Solr plugin that can utilize the new 
> UnifiedHighlighter which was initially committed in 
> https://issues.apache.org/jira/browse/LUCENE-7438



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9708) Expose UnifiedHighlighter in Solr

2016-11-15 Thread Timothy M. Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668766#comment-15668766
 ] 

Timothy M. Rodriguez commented on SOLR-9708:


I thought the suggestion was to use hl.tag.pre instead of hl.simple.pre?

> Expose UnifiedHighlighter in Solr
> -
>
> Key: SOLR-9708
> URL: https://issues.apache.org/jira/browse/SOLR-9708
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: highlighter
>Reporter: Timothy M. Rodriguez
>Assignee: David Smiley
> Fix For: 6.4
>
>
> This ticket is for creating a Solr plugin that can utilize the new 
> UnifiedHighlighter which was initially committed in 
> https://issues.apache.org/jira/browse/LUCENE-7438



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9708) Expose UnifiedHighlighter in Solr

2016-11-15 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668761#comment-15668761
 ] 

David Smiley commented on SOLR-9708:


-1 I would hate to see new parameters when there are semantically equivalent 
ones already. 

> Expose UnifiedHighlighter in Solr
> -
>
> Key: SOLR-9708
> URL: https://issues.apache.org/jira/browse/SOLR-9708
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: highlighter
>Reporter: Timothy M. Rodriguez
>Assignee: David Smiley
> Fix For: 6.4
>
>
> This ticket is for creating a Solr plugin that can utilize the new 
> UnifiedHighlighter which was initially committed in 
> https://issues.apache.org/jira/browse/LUCENE-7438



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9708) Expose UnifiedHighlighter in Solr

2016-11-15 Thread Timothy M. Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668715#comment-15668715
 ] 

Timothy M. Rodriguez commented on SOLR-9708:


I'm okay with hl.tag.pre/post, but it may not always be a tag.  Perhaps 
something like hl.pre.marker? or hl.pre.sigil?

> Expose UnifiedHighlighter in Solr
> -
>
> Key: SOLR-9708
> URL: https://issues.apache.org/jira/browse/SOLR-9708
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: highlighter
>Reporter: Timothy M. Rodriguez
>Assignee: David Smiley
> Fix For: 6.4
>
>
> This ticket is for creating a Solr plugin that can utilize the new 
> UnifiedHighlighter which was initially committed in 
> https://issues.apache.org/jira/browse/LUCENE-7438



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request #116: fixed NPEs

2016-11-15 Thread kagan770
GitHub user kagan770 opened a pull request:

https://github.com/apache/lucene-solr/pull/116

fixed NPEs



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kagan770/lucene-solr master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/116.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #116


commit 2b1a3e4c0f7c25bb7ae820e5bfbe7516551e394e
Author: rkagan 
Date:   2016-11-15T21:42:30Z

fixed NPEs




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7537) Add multi valued field support to index sorting

2016-11-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668388#comment-15668388
 ] 

ASF subversion and git services commented on LUCENE-7537:
-

Commit 6c3c6bc3797307efa13cae06778d41f24a26bccb in lucene-solr's branch 
refs/heads/master from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6c3c6bc ]

LUCENE-7537: Index time sorting now supports multi-valued sorts using selectors 
(MIN, MAX, etc.)


> Add multi valued field support to index sorting
> ---
>
> Key: LUCENE-7537
> URL: https://issues.apache.org/jira/browse/LUCENE-7537
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Ferenczi Jim
> Fix For: master (7.0), 6.4
>
> Attachments: LUCENE-7537.patch, LUCENE-7537.patch, LUCENE-7537.patch, 
> LUCENE-7537.patch, LUCENE-7537.patch
>
>
> Today index sorting can be done on a single valued field through 
> NumericDocValues (for numerics) and SortedDocValues (for strings).
> I'd like to add the ability to sort on multi valued fields. Since index 
> sorting does not accept a custom comparator, we could just take the minimum 
> value of each document for an ascending sort and the maximum value for a 
> descending sort.
> This way we could handle all cases instead of throwing an exception during a 
> merge when we encounter multi valued DVs. 
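
A minimal sketch of what this commit enables, configuring an index-time sort
over a multi-valued numeric field with a MIN selector (field names and values
are illustrative):

{code}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.SortedNumericDocValuesField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.SortedNumericSelector;
import org.apache.lucene.search.SortedNumericSortField;
import org.apache.lucene.store.RAMDirectory;

public class MultiValuedIndexSortExample {
  public static void main(String[] args) throws Exception {
    IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
    // Sort the index by the MINIMUM value of the multi-valued "prices" field.
    config.setIndexSort(new Sort(new SortedNumericSortField(
        "prices", SortField.Type.LONG, false, SortedNumericSelector.Type.MIN)));
    try (IndexWriter writer = new IndexWriter(new RAMDirectory(), config)) {
      Document doc = new Document();
      doc.add(new SortedNumericDocValuesField("prices", 15L));
      doc.add(new SortedNumericDocValuesField("prices", 7L));  // MIN selects 7
      writer.addDocument(doc);
    }
  }
}
{code}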



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-7526) Improvements to UnifiedHighlighter OffsetStrategies

2016-11-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved LUCENE-7526.
--
Resolution: Fixed

Committed. Thanks Tim and thanks [~mbraun688] for some internal code reviewing.

> Improvements to UnifiedHighlighter OffsetStrategies
> ---
>
> Key: LUCENE-7526
> URL: https://issues.apache.org/jira/browse/LUCENE-7526
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: Timothy M. Rodriguez
>Assignee: David Smiley
>Priority: Minor
> Fix For: 6.4
>
> Attachments: LUCENE-7526.patch
>
>
> This ticket improves several of the UnifiedHighlighter FieldOffsetStrategies 
> by reducing reliance on creating or re-creating TokenStreams.
> The primary changes are as follows:
> * AnalysisOffsetStrategy - split into two offset strategies
>   ** MemoryIndexOffsetStrategy - the primary analysis mode that utilizes a 
> MemoryIndex for producing Offsets
>   ** TokenStreamOffsetStrategy - an offset strategy that avoids creating a 
> MemoryIndex.  Can only be used if the query distills down to terms and 
> automata.
> * TokenStream removal 
>   ** MemoryIndexOffsetStrategy - previously a TokenStream was created to fill 
> the memory index and then once consumed a new one was generated by 
> uninverting the MemoryIndex back into a TokenStream if there were automata 
> (wildcard/mtq queries) involved.  Now this is avoided, which should save 
> memory and avoid a second pass over the data.
>   ** TermVectorOffsetStrategy - this was refactored in a similar way to avoid 
> generating a TokenStream if automata are involved.
>   ** PostingsWithTermVectorsOffsetStrategy - similar refactoring
> * CompositePostingsEnum - aggregates several underlying PostingsEnums for 
> wildcard/mtq queries.  This should improve relevancy by providing unified 
> metrics for a wildcard across all its term matches
> * Added a HighlightFlag for enabling the newly separated 
> TokenStreamOffsetStrategy since it can adversely affect passage relevancy
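
For context, a minimal sketch of basic UnifiedHighlighter usage, which is
unchanged by this refactoring (the field name and query are illustrative):

{code}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.uhighlight.UnifiedHighlighter;

public class UnifiedHighlighterExample {
  // Returns one highlighted snippet per hit for the "body" field.
  static String[] highlight(IndexSearcher searcher) throws Exception {
    UnifiedHighlighter highlighter =
        new UnifiedHighlighter(searcher, new StandardAnalyzer());
    Query query = new TermQuery(new Term("body", "lucene"));
    TopDocs topDocs = searcher.search(query, 10);
    // Entries may be null where the field had no match to highlight.
    return highlighter.highlight("body", query, topDocs);
  }
}
{code}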



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7526) Improvements to UnifiedHighlighter OffsetStrategies

2016-11-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668358#comment-15668358
 ] 

ASF subversion and git services commented on LUCENE-7526:
-

Commit 0790c34cc555ec9b09cae04da6e61f465cfc in lucene-solr's branch 
refs/heads/branch_6x from [~dsmiley]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0790c34 ]

LUCENE-7526: UnifiedHighlighter: enhance MTQ passage relevancy. 
TokenStreamFromTermVector isn't used by the UH anymore. Refactor 
AnalysisOffsetStrategy into TokenStream and MemoryIndex strategies, and related 
refactorings from that.

(cherry picked from commit 7af454a)


> Improvements to UnifiedHighlighter OffsetStrategies
> ---
>
> Key: LUCENE-7526
> URL: https://issues.apache.org/jira/browse/LUCENE-7526
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: Timothy M. Rodriguez
>Assignee: David Smiley
>Priority: Minor
> Fix For: 6.4
>
> Attachments: LUCENE-7526.patch
>
>
> This ticket improves several of the UnifiedHighlighter FieldOffsetStrategies 
> by reducing reliance on creating or re-creating TokenStreams.
> The primary changes are as follows:
> * AnalysisOffsetStrategy - split into two offset strategies
>   ** MemoryIndexOffsetStrategy - the primary analysis mode that utilizes a 
> MemoryIndex for producing Offsets
>   ** TokenStreamOffsetStrategy - an offset strategy that avoids creating a 
> MemoryIndex.  Can only be used if the query distills down to terms and 
> automata.
> * TokenStream removal 
>   ** MemoryIndexOffsetStrategy - previously a TokenStream was created to fill 
> the memory index and then once consumed a new one was generated by 
> uninverting the MemoryIndex back into a TokenStream if there were automata 
> (wildcard/mtq queries) involved.  Now this is avoided, which should save 
> memory and avoid a second pass over the data.
>   ** TermVectorOffsetStrategy - this was refactored in a similar way to avoid 
> generating a TokenStream if automata are involved.
>   ** PostingsWithTermVectorsOffsetStrategy - similar refactoring
> * CompositePostingsEnum - aggregates several underlying PostingsEnums for 
> wildcard/mtq queries.  This should improve relevancy by providing unified 
> metrics for a wildcard across all its term matches
> * Added a HighlightFlag for enabling the newly separated 
> TokenStreamOffsetStrategy since it can adversely affect passage relevancy



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7562) CompletionFieldsConsumer throws NPE on ghost fields

2016-11-15 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668357#comment-15668357
 ] 

Adrien Grand commented on LUCENE-7562:
--

Nevermind then, +1 to the current patch!

> CompletionFieldsConsumer throws NPE on ghost fields
> ---
>
> Key: LUCENE-7562
> URL: https://issues.apache.org/jira/browse/LUCENE-7562
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: master (7.0), 6.4
>
> Attachments: LUCENE-7562.patch
>
>
> If you index {{SuggestField}} for some field X, but later delete all 
> documents with that field, it can cause a ghost situation where the field 
> infos believes field X exists yet the postings do not.
> I believe this bug is the root cause of this ES issue: 
> https://github.com/elastic/elasticsearch/issues/21500
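
A minimal sketch of the sequence that can produce such a ghost field (names
are illustrative; assumes {{writer}} is an IndexWriter whose codec routes the
suggest field to the completion postings format, as SuggestField requires):

{code}
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.suggest.document.SuggestField;

Document doc = new Document();
doc.add(new SuggestField("suggest_field", "apple", 4));
doc.add(new StringField("id", "1", Field.Store.NO));
writer.addDocument(doc);
writer.commit();

// Delete every document carrying the suggest field, then merge: the field
// infos still list "suggest_field" but its postings are gone -- the "ghost"
// state that triggered the NPE.
writer.deleteDocuments(new Term("id", "1"));
writer.forceMerge(1);
{code}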



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7526) Improvements to UnifiedHighlighter OffsetStrategies

2016-11-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668354#comment-15668354
 ] 

ASF subversion and git services commented on LUCENE-7526:
-

Commit 7af454ad767c3a0364757d6fcf55bff9f063febe in lucene-solr's branch 
refs/heads/master from [~dsmiley]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7af454a ]

LUCENE-7526: UnifiedHighlighter: enhance MTQ passage relevancy. 
TokenStreamFromTermVector isn't used by the UH anymore. Refactor 
AnalysisOffsetStrategy into TokenStream and MemoryIndex strategies, and related 
refactorings from that.


> Improvements to UnifiedHighlighter OffsetStrategies
> ---
>
> Key: LUCENE-7526
> URL: https://issues.apache.org/jira/browse/LUCENE-7526
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: Timothy M. Rodriguez
>Assignee: David Smiley
>Priority: Minor
> Fix For: 6.4
>
> Attachments: LUCENE-7526.patch
>
>
> This ticket improves several of the UnifiedHighlighter FieldOffsetStrategies 
> by reducing reliance on creating or re-creating TokenStreams.
> The primary changes are as follows:
> * AnalysisOffsetStrategy - split into two offset strategies
>   ** MemoryIndexOffsetStrategy - the primary analysis mode that utilizes a 
> MemoryIndex for producing Offsets
>   ** TokenStreamOffsetStrategy - an offset strategy that avoids creating a 
> MemoryIndex.  Can only be used if the query distills down to terms and 
> automata.
> * TokenStream removal 
>   ** MemoryIndexOffsetStrategy - previously a TokenStream was created to fill 
> the memory index and then once consumed a new one was generated by 
> uninverting the MemoryIndex back into a TokenStream if there were automata 
> (wildcard/mtq queries) involved.  Now this is avoided, which should save 
> memory and avoid a second pass over the data.
>   ** TermVectorOffsetStrategy - this was refactored in a similar way to avoid 
> generating a TokenStream if automata are involved.
>   ** PostingsWithTermVectorsOffsetStrategy - similar refactoring
> * CompositePostingsEnum - aggregates several underlying PostingsEnums for 
> wildcard/mtq queries.  This should improve relevancy by providing unified 
> metrics for a wildcard across all its term matches
> * Added a HighlightFlag for enabling the newly separated 
> TokenStreamOffsetStrategy since it can adversely affect passage relevancy



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9708) Expose UnifiedHighlighter in Solr

2016-11-15 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668334#comment-15668334
 ] 

David Smiley commented on SOLR-9708:


Also I'm on the fence about whether we should support {{hl.simple.pre}} 
({{HighlightParams.SIMPLE_PRE}}) and the corresponding -post.  We probably 
should, for a better user experience, but only as fallbacks, as I think 
{{hl.tag.pre}} is the better name.

> Expose UnifiedHighlighter in Solr
> -
>
> Key: SOLR-9708
> URL: https://issues.apache.org/jira/browse/SOLR-9708
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: highlighter
>Reporter: Timothy M. Rodriguez
>Assignee: David Smiley
> Fix For: 6.4
>
>
> This ticket is for creating a Solr plugin that can utilize the new 
> UnifiedHighlighter which was initially committed in 
> https://issues.apache.org/jira/browse/LUCENE-7438



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7537) Add multi valued field support to index sorting

2016-11-15 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668337#comment-15668337
 ] 

Michael McCandless commented on LUCENE-7537:


[~jim.ferenczi] thanks, this looks awesome ... I'll run tests and push soon.

> Add multi valued field support to index sorting
> ---
>
> Key: LUCENE-7537
> URL: https://issues.apache.org/jira/browse/LUCENE-7537
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Ferenczi Jim
> Fix For: master (7.0), 6.4
>
> Attachments: LUCENE-7537.patch, LUCENE-7537.patch, LUCENE-7537.patch, 
> LUCENE-7537.patch, LUCENE-7537.patch
>
>
> Today index sorting can be done on a single valued field through 
> NumericDocValues (for numerics) and SortedDocValues (for strings).
> I'd like to add the ability to sort on multi valued fields. Since index 
> sorting does not accept a custom comparator, we could just take the minimum 
> value of each document for an ascending sort and the maximum value for a 
> descending sort.
> This way we could handle all cases instead of throwing an exception during a 
> merge when we encounter multi valued DVs. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7562) CompletionFieldsConsumer throws NPE on ghost fields

2016-11-15 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668325#comment-15668325
 ] 

Michael McCandless commented on LUCENE-7562:


Alas, that won't work (well) because {{CompletionPostingsFormat}} is not 
general purpose; e.g. it requires that fields are indexed with 
positions/payloads, as {{SuggestField}} does ...

> CompletionFieldsConsumer throws NPE on ghost fields
> ---
>
> Key: LUCENE-7562
> URL: https://issues.apache.org/jira/browse/LUCENE-7562
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: master (7.0), 6.4
>
> Attachments: LUCENE-7562.patch
>
>
> If you index {{SuggestField}} for some field X, but later delete all 
> documents with that field, it can cause a ghost situation where the field 
> infos believes field X exists yet the postings do not.
> I believe this bug is the root cause of this ES issue: 
> https://github.com/elastic/elasticsearch/issues/21500



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4924) indices getting out of sync with SolrCloud

2016-11-15 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668320#comment-15668320
 ] 

Mark Miller commented on SOLR-4924:
---

I think a *lot* of different later issues fixed this type of thing, so it's 
hard to say which one this relates to.

Anyway, given the old version, I don't think there is much value in keeping 
this open.

> indices getting out of sync with SolrCloud
> --
>
> Key: SOLR-4924
> URL: https://issues.apache.org/jira/browse/SOLR-4924
> Project: Solr
>  Issue Type: Bug
>  Components: replication (java), SolrCloud
>Affects Versions: 4.2
> Environment: Linux 2.6.18-308.16.1.el5 #1 SMP Tue Oct 2 22:01:43 EDT 
> 2012 x86_64 x86_64 x86_64 GNU/Linux
> CentOS release 5.8 (Final)
> Solr 4.2.1
>Reporter: Ricardo Merizalde
>
> We are experiencing an issue in our production servers where the indices get 
> out of sync. Customers will see different results/result sorting depending on 
> the instance that serves the request.
> We currently have 2 instances with a single shard. This is our update handler 
> configuration:
> {code:xml}
> <updateHandler class="solr.DirectUpdateHandler2">
>   <autoCommit>
>     <maxTime>60</maxTime>
>     <maxDocs>5000</maxDocs>
>     <openSearcher>false</openSearcher>
>   </autoCommit>
>   <autoSoftCommit>
>     <maxTime>5000</maxTime>
>   </autoSoftCommit>
>   <updateLog>
>     <str name="dir">${solr.data.dir:}</str>
>   </updateLog>
> </updateHandler>
> {code}
> When the indices get out of sync the follower replica ends up with a higher 
> version than the master. Optimizing the leader or reloading the follower core 
> does not help. The only way to get the indices back in sync is to restart the 
> server.
> This is an example state of the leader:
> version: 1102541
> numDocs: 214007
> maxDoc: 370861
> deletedDocs: 156854 
> While the follower core has the following state:
> version: 1109143
> numDocs: 213890
> maxDoc: 341585
> deletedDocs: 127695 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-9284) The HDFS BlockDirectoryCache should not let its keysToRelease or names maps grow indefinitely.

2016-11-15 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller resolved SOLR-9284.
---
   Resolution: Fixed
Fix Version/s: (was: 6.2)
   6.4

> The HDFS BlockDirectoryCache should not let its keysToRelease or names maps 
> grow indefinitely.
> ---
>
> Key: SOLR-9284
> URL: https://issues.apache.org/jira/browse/SOLR-9284
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: hdfs
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: master (7.0), 6.4
>
> Attachments: SOLR-9284.patch, SOLR-9284.patch
>
>
> https://issues.apache.org/jira/browse/SOLR-9284



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9708) Expose UnifiedHighlighter in Solr

2016-11-15 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668291#comment-15668291
 ] 

David Smiley commented on SOLR-9708:


After you add hl.useUnifiedHighlighter, I think the tests should be updated to 
go about it this way instead of a custom search component, because people 
probably won't bother to explicitly change the search component given this 
approach works without fuss.

Maybe hl.method should be hl.offsetSource so as to not suggest you're picking 
the highlighter implementation overall?

> Expose UnifiedHighlighter in Solr
> -
>
> Key: SOLR-9708
> URL: https://issues.apache.org/jira/browse/SOLR-9708
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: highlighter
>Reporter: Timothy M. Rodriguez
>Assignee: David Smiley
> Fix For: 6.4
>
>
> This ticket is for creating a Solr plugin that can utilize the new 
> UnifiedHighlighter which was initially committed in 
> https://issues.apache.org/jira/browse/LUCENE-7438



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9772) FieldSortValues should reuse comparator and only invalidate leafComparator

2016-11-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668237#comment-15668237
 ] 

ASF GitHub Bot commented on SOLR-9772:
--

GitHub user johnthcall opened a pull request:

https://github.com/apache/lucene-solr/pull/115

SOLR-9772 FieldSortValues should reuse comparator and only invalidate 
leafComparator

No need to recreate the comparator as the leaf changes. There was a bug where 
lastIdx was not set, so the comparator was recreated and the leafComparator 
re-initialized for each document.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/johnthcall/lucene-solr patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/115.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #115


commit 8a47601e0f42563c1d45c3049d21194505b1a89d
Author: John Call 
Date:   2016-11-15T20:37:37Z

SOLR-9772 FieldSortValues should reuse comparator and only invalidate 
leafComparator




> FieldSortValues should reuse comparator and only invalidate leafComparator
> --
>
> Key: SOLR-9772
> URL: https://issues.apache.org/jira/browse/SOLR-9772
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: search
>Affects Versions: master (7.0)
>Reporter: John Call
>Priority: Minor
> Fix For: master (7.0)
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> In QueryComponent.doFieldSortValues, when there are multiple leaves, a 
> comparator and leafComparator are created for each document instead of 
> creating a common comparator once and new leafComparators only as different 
> leaves are visited.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request #115: SOLR-9772 FieldSortValues should reuse compar...

2016-11-15 Thread johnthcall
GitHub user johnthcall opened a pull request:

https://github.com/apache/lucene-solr/pull/115

SOLR-9772 FieldSortValues should reuse comparator and only invalidate 
leafComparator

No need to recreate the comparator as the leaf changes. There was a bug where 
lastIdx was not set, so the comparator was recreated and the leafComparator 
re-initialized for each document.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/johnthcall/lucene-solr patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/115.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #115


commit 8a47601e0f42563c1d45c3049d21194505b1a89d
Author: John Call 
Date:   2016-11-15T20:37:37Z

SOLR-9772 FieldSortValues should reuse comparator and only invalidate 
leafComparator




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-9772) FieldSortValues should reuse comparator and only invalidate leafComparator

2016-11-15 Thread John Call (JIRA)
John Call created SOLR-9772:
---

 Summary: FieldSortValues should reuse comparator and only 
invalidate leafComparator
 Key: SOLR-9772
 URL: https://issues.apache.org/jira/browse/SOLR-9772
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: search
Affects Versions: master (7.0)
Reporter: John Call
Priority: Minor
 Fix For: master (7.0)


In QueryComponent.doFieldSortValues, when there are multiple leaves, a comparator 
and leafComparator are created for each document instead of creating a common 
comparator once and new leafComparators only as different leaves are visited.
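
A sketch of the intended pattern, using the Lucene comparator API: build the
FieldComparator once and only fetch a new LeafFieldComparator when crossing
into a new segment (the surrounding loop and variables are illustrative):

{code}
// Assumes: SortField sortField; List<LeafReaderContext> leaves; int[] sortedDocIds.
FieldComparator<?> comparator = sortField.getComparator(1, 0);  // create once
LeafFieldComparator leafComparator = null;
int lastReaderIdx = -1;

for (int docId : sortedDocIds) {
  int readerIdx = ReaderUtil.subIndex(docId, leaves);
  if (readerIdx != lastReaderIdx) {
    // Only re-acquire the per-segment view when the leaf actually changes.
    leafComparator = comparator.getLeafComparator(leaves.get(readerIdx));
    lastReaderIdx = readerIdx;
  }
  leafComparator.copy(0, docId - leaves.get(readerIdx).docBase);  // read the sort value
}
{code}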



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9728) Ability to specify Key Store type in solr.in file for SSL

2016-11-15 Thread Mano Kovacs (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668226#comment-15668226
 ] 

Mano Kovacs commented on SOLR-9728:
---

Hi [~michaelsuzuki], thank you for the patch! I tried to add the store types 
manually and found that only setting the property on the sslContextFactory in 
jetty-ssl.xml was effective. I might be mistaken, but adding the two args below 
to jetty-ssl.xml would make the patch fully functional.
{code:title="solr/server/etc/jetty-ssl.xml"}
...
<Set name="KeyStoreType"><Property name="solr.jetty.keystore.type" default="JKS"/></Set>
<Set name="TrustStoreType"><Property name="solr.jetty.truststore.type" default="JKS"/></Set>
...
{code}

> Ability to specify Key Store type in solr.in file for SSL
> -
>
> Key: SOLR-9728
> URL: https://issues.apache.org/jira/browse/SOLR-9728
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Server
>Affects Versions: master (7.0)
>Reporter: Michael Suzuki
> Attachments: SOLR-9728.patch
>
>
> At present when SSL is enabled we can't set the key store type. It currently 
> defaults to JKS.
> As a user I would like to configure the SSL type via the solr.in file.
> For instance "JCEKS" would be configured as:
> {code}
> SOLR_SSL_KEYSTORE_TYPE=JCEKS
> SOLR_SSL_TRUSTSTORE_TYPE=JCEKS
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9689) Process updates concurrently during PeerSync

2016-11-15 Thread Pushkar Raste (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pushkar Raste updated SOLR-9689:

Attachment: parallelize-peersync.patch

> Process updates concurrently during PeerSync
> 
>
> Key: SOLR-9689
> URL: https://issues.apache.org/jira/browse/SOLR-9689
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
> Attachments: SOLR-9689.patch, SOLR-9689.patch2, 
> parallelize-peersync.patch
>
>
> This came up during discussion with [~shalinmangar]
> During {{PeerSync}}, updates are applied one at a time by looping through the 
> updates received from the leader. This is slow and could keep a node in 
> recovery for a long time if the number of updates to apply is large. 
> We can apply updates concurrently; this should be no different from what 
> could happen during normal indexing (we can't really ensure that a replica 
> will process updates in the same order as the leader or other replicas).
> There are a few corner cases around dbq we should be careful about. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9689) Process updates concurrently during PeerSync

2016-11-15 Thread Pushkar Raste (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pushkar Raste updated SOLR-9689:

Attachment: (was: parallelize-peersync.patch)

> Process updates concurrently during PeerSync
> 
>
> Key: SOLR-9689
> URL: https://issues.apache.org/jira/browse/SOLR-9689
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
> Attachments: SOLR-9689.patch, SOLR-9689.patch2, 
> parallelize-peersync.patch
>
>
> This came up during discussion with [~shalinmangar]
> During {{PeerSync}}, updates are applied one at a time by looping through the 
> updates received from the leader. This is slow and could keep a node in 
> recovery for a long time if the number of updates to apply is large. 
> We can apply updates concurrently; this should be no different from what 
> could happen during normal indexing (we can't really ensure that a replica 
> will process updates in the same order as the leader or other replicas).
> There are a few corner cases around dbq we should be careful about. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9689) Process updates concurrently during PeerSync

2016-11-15 Thread Pushkar Raste (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pushkar Raste updated SOLR-9689:

Attachment: parallelize-peersync.patch

Attached a working patch. 
In my tests I didn't see much improvement (in fact, in some cases performance 
degraded) with parallelization, and I could not find any hotspot in the profile.

My theory is that the documents in the test are so short and simple that, 
although the parallelization works functionally, we need to test with more 
complex documents to verify the performance gains. 

Most of the parallelization parameters are subjective, and people will need to 
verify which ones work best for them.

It also seems performance would suffer if there are relatively many DBQs to be 
applied, since updates are applied out of order.
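
To make the idea concrete, a very rough sketch of the parallelization (the
{{applyUpdate}} helper, the input list, and the pool size are hypothetical,
not taken from the patch):

{code}
// Apply updates fetched from the leader concurrently instead of one at a time.
ExecutorService pool = Executors.newFixedThreadPool(4);  // size is subjective
List<Future<?>> futures = new ArrayList<>();
for (Object update : updatesFromLeader) {               // assumed input list
  futures.add(pool.submit(() -> applyUpdate(update)));  // hypothetical helper
}
for (Future<?> f : futures) {
  f.get();  // surface any failure before declaring PeerSync successful
}
pool.shutdown();
{code}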

> Process updates concurrently during PeerSync
> 
>
> Key: SOLR-9689
> URL: https://issues.apache.org/jira/browse/SOLR-9689
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
> Attachments: SOLR-9689.patch, SOLR-9689.patch2, 
> parallelize-peersync.patch
>
>
> This came up during discussion with [~shalinmangar]
> During {{PeerSync}}, updates are applied one at a time by looping through the 
> updates received from the leader. This is slow and could keep a node in 
> recovery for a long time if the number of updates to apply is large. 
> We can apply updates concurrently; this should be no different from what 
> could happen during normal indexing (we can't really ensure that a replica 
> will process updates in the same order as the leader or other replicas).
> There are a few corner cases around dbq we should be careful about. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7562) CompletionFieldsConsumer throws NPE on ghost fields

2016-11-15 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668113#comment-15668113
 ] 

Michael McCandless commented on LUCENE-7562:


Thanks [~jpountz] I'll do that.

> CompletionFieldsConsumer throws NPE on ghost fields
> ---
>
> Key: LUCENE-7562
> URL: https://issues.apache.org/jira/browse/LUCENE-7562
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: master (7.0), 6.4
>
> Attachments: LUCENE-7562.patch
>
>
> If you index {{SuggestField}} for some field X, but later delete all 
> documents with that field, it can cause a ghost situation where the field 
> infos believes field X exists yet the postings do not.
> I believe this bug is the root cause of this ES issue: 
> https://github.com/elastic/elasticsearch/issues/21500



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6664) Replace SynonymFilter with SynonymGraphFilter

2016-11-15 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668108#comment-15668108
 ] 

Michael McCandless commented on LUCENE-6664:


I'm re-opening this issue: I think my original patch here is a good way to move 
forward.  It is a simple, backwards compatible way for token streams to 
naturally produce graphs, and it empowers token filters to create new positions.

Existing token streams, that produce posInc=0 or posInc=1 and posLength=1 
tokens, naturally work the way they do today with this change, producing 
"sausage" graphs.

Graph-aware token streams, like the new {{SynonymGraphFilter}} here, the 
Kuromoji {{JapaneseTokenizer}}, and {{WordDelimiterFilter}} if we improve it, 
can produce correct graphs which can be used at query time to make accurate 
queries.

Today, multi-word synonyms are buggy (see 
https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter
 and 
http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html), 
missing hits that should match, and incorrectly returning hits that should not 
match, for queries that involve the synonyms.  With this change, if you use 
query time synonym expansion, along with separate improvements to query parser, 
it would fix the bug.  The required changes to query parsing are surprisingly 
contained ... see https://github.com/elastic/elasticsearch/pull/21517 as an 
example approach.

I am not proposing, here, that the Lucene index format be changed to support 
indexing a position graph.  Instead, I'm proposing that we make it possible for 
query-time position graphs to work correctly, so multi-token synonyms are no 
longer buggy, and I think this is a good way to make that happen.
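As an illustration of what query-time use could look like: a minimal sketch wiring 
the {{SynonymGraphFilter}} from the attached patch into a query analyzer, with no 
{{FlattenGraphFilter}} so the graph reaches the query parser intact (treat the 
exact signatures as those of the patch, i.e. subject to change):

{code}
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.synonym.SynonymGraphFilter;
import org.apache.lucene.analysis.synonym.SynonymMap;
import org.apache.lucene.util.CharsRef;
import org.apache.lucene.util.CharsRefBuilder;

public class QueryTimeSynonyms {
  public static Analyzer build() throws Exception {
    SynonymMap.Builder builder = new SynonymMap.Builder(true);
    // "usa" -> "united states": multi-token output, kept alongside the original
    builder.add(new CharsRef("usa"),
        SynonymMap.Builder.join(new String[] {"united", "states"}, new CharsRefBuilder()),
        true);
    final SynonymMap map = builder.build();
    return new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName) {
        Tokenizer source = new StandardTokenizer();
        TokenStream sink = new LowerCaseFilter(source);
        // No FlattenGraphFilter at query time: the position graph survives,
        // so the parser can enumerate paths or build a TermAutomatonQuery.
        sink = new SynonymGraphFilter(sink, map, true);
        return new TokenStreamComponents(source, sink);
      }
    };
  }
}
{code}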

> Replace SynonymFilter with SynonymGraphFilter
> -
>
> Key: LUCENE-6664
> URL: https://issues.apache.org/jira/browse/LUCENE-6664
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Attachments: LUCENE-6664.patch, LUCENE-6664.patch, LUCENE-6664.patch, 
> LUCENE-6664.patch, usa.png, usa_flat.png
>
>
> Spinoff from LUCENE-6582.
> I created a new SynonymGraphFilter (to replace the current buggy
> SynonymFilter), that produces correct graphs (does no "graph
> flattening" itself).  I think this makes it simpler.
> This means you must add the FlattenGraphFilter yourself, if you are
> applying synonyms during indexing.
> Index-time syn expansion is a necessarily "lossy" graph transformation
> when multi-token (input or output) synonyms are applied, because the
> index does not store {{posLength}}, so there will always be phrase
> queries that should match but do not, and then phrase queries that
> should not match but do.
> http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
> goes into detail about this.
> However, with this new SynonymGraphFilter, if instead you do synonym
> expansion at query time (and don't do the flattening), and you use
> TermAutomatonQuery (future: somehow integrated into a query parser),
> or maybe just "enumerate all paths and make union of PhraseQuery", you
> should get 100% correct matches (not sure about "proper" scoring
> though...).
> This new syn filter still cannot consume an arbitrary graph.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Reopened] (LUCENE-6664) Replace SynonymFilter with SynonymGraphFilter

2016-11-15 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reopened LUCENE-6664:


> Replace SynonymFilter with SynonymGraphFilter
> -
>
> Key: LUCENE-6664
> URL: https://issues.apache.org/jira/browse/LUCENE-6664
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Attachments: LUCENE-6664.patch, LUCENE-6664.patch, LUCENE-6664.patch, 
> LUCENE-6664.patch, usa.png, usa_flat.png
>
>
> Spinoff from LUCENE-6582.
> I created a new SynonymGraphFilter (to replace the current buggy
> SynonymFilter), that produces correct graphs (does no "graph
> flattening" itself).  I think this makes it simpler.
> This means you must add the FlattenGraphFilter yourself, if you are
> applying synonyms during indexing.
> Index-time syn expansion is a necessarily "lossy" graph transformation
> when multi-token (input or output) synonyms are applied, because the
> index does not store {{posLength}}, so there will always be phrase
> queries that should match but do not, and then phrase queries that
> should not match but do.
> http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
> goes into detail about this.
> However, with this new SynonymGraphFilter, if instead you do synonym
> expansion at query time (and don't do the flattening), and you use
> TermAutomatonQuery (future: somehow integrated into a query parser),
> or maybe just "enumerate all paths and make union of PhraseQuery", you
> should get 100% correct matches (not sure about "proper" scoring
> though...).
> This new syn filter still cannot consume an arbitrary graph.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-7387) Something wrong with how "File Formats" link is generated in docs/index.html - can cause precommit to fail on some systems

2016-11-15 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved LUCENE-7387.
--
   Resolution: Fixed
 Assignee: Hoss Man
Fix Version/s: 6.4
   master (7.0)

> Something wrong with how "File Formats" link is generated in docs/index.html 
> - can cause precommit to fail on some systems
> --
>
> Key: LUCENE-7387
> URL: https://issues.apache.org/jira/browse/LUCENE-7387
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Hoss Man
>Assignee: Hoss Man
> Fix For: master (7.0), 6.4
>
> Attachments: LUCENE-7387.patch
>
>
> I'm not sure what's going on, but here's what I've figured out while poking 
> at things with Ishan to try and figure out why {{ant precommit}} fails for 
> him on a clean checkout of master...
> * on my machine, with a clean checkout, the generated index.html file has 
> lines that look like this...{noformat}
> <li>
> <a href="core/org/apache/lucene/codecs/lucene62
> /package-summary.html#package.description">File Formats</a>: Guide to the 
> supported index format used by Lucene.  This can be customized by using <a 
> href="core/org/apache/lucene/codecs/package-summary.html#package.description">an
>  alternate codec</a>.
> </li>
> {noformat}...note there is a newline in the href after {{lucene62}}
> * on ishan's machine, with a clean checkout, the same line looks like 
> this...{noformat}
> <li>
> <a 
> href="core/org/apache/lucene/codecs/lucene62%0A/package-summary.html#package.description">File
>  Formats</a>: Guide to the supported index format used by Lucene.  This can 
> be customized by using <a 
> href="core/org/apache/lucene/codecs/package-summary.html#package.description">an
>  alternate codec</a>.
> </li>
> {noformat}...note that he has a URL escaped {{'NO-BREAK SPACE' (U+00A0)}} 
> character in href attribute.
> * on my machine, {{ant documentation-lint}} doesn't complain about the 
> newline in the href attribute when checking links.
> * on ishan's machine, {{ant documentation-lint}} most certainly complains 
> about the 'NO-BREAK SPACE'...{noformat}
> ...
> -documentation-lint:
>  [echo] checking for broken html...
> [jtidy] Checking for broken html (such as invalid tags)...
>[delete] Deleting directory 
> /home/ishan/code/chatman-lucene-solr/lucene/build/jtidy_tmp
>  [echo] Checking for broken links...
>  [exec] 
>  [exec] Crawl/parse...
>  [exec] 
>  [exec] Verify...
>  [exec] 
>  [exec] file:///build/docs/index.html
>  [exec]   BROKEN LINK: 
> file:///build/docs/core/org/apache/lucene/codecs/lucene62%0A/package-summary.html
>  [exec] 
>  [exec] Broken javadocs links were found!
> BUILD FAILED
> {noformat}
> Raising the following questions...
> * How is *either* a newline or a 'NO-BREAK SPACE' getting introduced into the 
> {{$defaultCodecPackage}} variable that index.xsl uses to generate that href 
> attribute?
> * why doesn't {{documentation-lint}} complain that the href has a newline in 
> it on my system?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7387) Something wrong with how "File Formats" link is generated in docs/index.html - can cause precommit to fail on some systems

2016-11-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668048#comment-15668048
 ] 

ASF subversion and git services commented on LUCENE-7387:
-

Commit 280cbfd8fb70376be3d32902baa629baf0b66e00 in lucene-solr's branch 
refs/heads/master from Chris Hostetter
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=280cbfd ]

LUCENE-7387: fix defaultCodec in build.xml to account for the line ending

this not only fixes the link in the javadoc to be correct, but also gets 
precommit working with ant 1.9.6


> Something wrong with how "File Formats" link is generated in docs/index.html 
> - can cause precommit to fail on some systems
> --
>
> Key: LUCENE-7387
> URL: https://issues.apache.org/jira/browse/LUCENE-7387
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Hoss Man
> Attachments: LUCENE-7387.patch
>
>
> I'm not sure what's going on, but here's what I've figured out while poking 
> at things with Ishan to try and figure out why {{ant precommit}} fails for 
> him on a clean checkout of master...
> * on my machine, with a clean checkout, the generated index.html file has 
> lines that look like this...{noformat}
> <li>
> <a href="core/org/apache/lucene/codecs/lucene62
> /package-summary.html#package.description">File Formats</a>: Guide to the 
> supported index format used by Lucene.  This can be customized by using <a 
> href="core/org/apache/lucene/codecs/package-summary.html#package.description">an
>  alternate codec</a>.
> </li>
> {noformat}...note there is a newline in the href after {{lucene62}}
> * on ishan's machine, with a clean checkout, the same line looks like 
> this...{noformat}
> <li>
> <a 
> href="core/org/apache/lucene/codecs/lucene62%0A/package-summary.html#package.description">File
>  Formats</a>: Guide to the supported index format used by Lucene.  This can 
> be customized by using <a 
> href="core/org/apache/lucene/codecs/package-summary.html#package.description">an
>  alternate codec</a>.
> </li>
> {noformat}...note that he has a URL escaped {{'NO-BREAK SPACE' (U+00A0)}} 
> character in href attribute.
> * on my machine, {{ant documentation-lint}} doesn't complain about the 
> newline in the href attribute when checking links.
> * on ishan's machine, {{ant documentation-lint}} most certainly complains 
> about the 'NO-BREAK SPACE'...{noformat}
> ...
> -documentation-lint:
>  [echo] checking for broken html...
> [jtidy] Checking for broken html (such as invalid tags)...
>[delete] Deleting directory 
> /home/ishan/code/chatman-lucene-solr/lucene/build/jtidy_tmp
>  [echo] Checking for broken links...
>  [exec] 
>  [exec] Crawl/parse...
>  [exec] 
>  [exec] Verify...
>  [exec] 
>  [exec] file:///build/docs/index.html
>  [exec]   BROKEN LINK: 
> file:///build/docs/core/org/apache/lucene/codecs/lucene62%0A/package-summary.html
>  [exec] 
>  [exec] Broken javadocs links were found!
> BUILD FAILED
> {noformat}
> Raising the following questions...
> * How is *either* a newline or a 'NO-BREAK SPACE' getting introduced into the 
> {{$defaultCodecPackage}} variable that index.xsl uses to generate that href 
> attribute?
> * why doesn't {{documentation-lint}} complain that the href has a newline in 
> it on my system?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7387) Something wrong with how "File Formats" link is generated in docs/index.html - can cause precommit to fail on some systems

2016-11-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668047#comment-15668047
 ] 

ASF subversion and git services commented on LUCENE-7387:
-

Commit 38a67e25ae872d921107896e359da5364040ba79 in lucene-solr's branch 
refs/heads/branch_6x from Chris Hostetter
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=38a67e2 ]

LUCENE-7387: fix defaultCodec in build.xml to account for the line ending

this not only fixes the link in the javadoc to be correct, but also gets 
precommit working with ant 1.9.6

(cherry picked from commit 280cbfd8fb70376be3d32902baa629baf0b66e00)


> Something wrong with how "File Formats" link is generated in docs/index.html 
> - can cause precommit to fail on some systems
> --
>
> Key: LUCENE-7387
> URL: https://issues.apache.org/jira/browse/LUCENE-7387
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Hoss Man
> Attachments: LUCENE-7387.patch
>
>
> I'm not sure what's going on, but here's what I've figured out while poking 
> at things with Ishan to try and figure out why {{ant precommit}} fails for 
> him on a clean checkout of master...
> * on my machine, with a clean checkout, the generated index.html file has 
> lines that look like this...{noformat}
> <li>
> <a href="core/org/apache/lucene/codecs/lucene62
> /package-summary.html#package.description">File Formats</a>: Guide to the 
> supported index format used by Lucene.  This can be customized by using <a 
> href="core/org/apache/lucene/codecs/package-summary.html#package.description">an
>  alternate codec</a>.
> </li>
> {noformat}...note there is a newline in the href after {{lucene62}}
> * on ishan's machine, with a clean checkout, the same line looks like 
> this...{noformat}
> <li>
> <a 
> href="core/org/apache/lucene/codecs/lucene62%0A/package-summary.html#package.description">File
>  Formats</a>: Guide to the supported index format used by Lucene.  This can 
> be customized by using <a 
> href="core/org/apache/lucene/codecs/package-summary.html#package.description">an
>  alternate codec</a>.
> </li>
> {noformat}...note that he has a URL escaped {{'NO-BREAK SPACE' (U+00A0)}} 
> character in href attribute.
> * on my machine, {{ant documentation-lint}} doesn't complain about the 
> newline in the href attribute when checking links.
> * on ishan's machine, {{ant documentation-lint}} most certainly complains 
> about the 'NO-BREAK SPACE'...{noformat}
> ...
> -documentation-lint:
>  [echo] checking for broken html...
> [jtidy] Checking for broken html (such as invalid tags)...
>[delete] Deleting directory 
> /home/ishan/code/chatman-lucene-solr/lucene/build/jtidy_tmp
>  [echo] Checking for broken links...
>  [exec] 
>  [exec] Crawl/parse...
>  [exec] 
>  [exec] Verify...
>  [exec] 
>  [exec] file:///build/docs/index.html
>  [exec]   BROKEN LINK: 
> file:///build/docs/core/org/apache/lucene/codecs/lucene62%0A/package-summary.html
>  [exec] 
>  [exec] Broken javadocs links were found!
> BUILD FAILED
> {noformat}
> Raising the following questions...
> * How is *either* a newline or a 'NO-BREAK SPACE' getting introduced into the 
> {{$defaultCodecPackage}} variable that index.xsl uses to generate that href 
> attribute?
> * why doesn't {{documentation-lint}} complain that the href has a newline in 
> it on my system?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9707) DeleteByQuery forward requests to down replicas and set it in LiR

2016-11-15 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667995#comment-15667995
 ] 

Yonik Seeley commented on SOLR-9707:


I don't think so...
It seems like it should follow the same procedure as any other update.

> DeleteByQuery forward requests to down replicas and set it in LiR
> -
>
> Key: SOLR-9707
> URL: https://issues.apache.org/jira/browse/SOLR-9707
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Jessica Cheng Mallet
>Assignee: Varun Thacker
>  Labels: solrcloud
> Attachments: SOLR-9707.diff
>
>
> DeleteByQuery, unlike other requests, does not filter out the down replicas. 
> Thus, the update is still forwarded to the down replica and fails, and the 
> leader then sets the replica in LiR. In a cluster where there are lots of 
> deleteByQuery requests, this can flood the /overseer/queue.
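A sketch of the kind of liveness check the DBQ path skips, using the SolrJ 
cluster-state classes; {{forward()}} is a hypothetical stand-in for the actual 
request distribution:

{code}
import java.util.Set;
import org.apache.solr.common.cloud.Replica;
import org.apache.solr.common.cloud.Slice;

public class ForwardToLiveReplicas {

  // Hypothetical stand-in for forwarding the update request to a replica.
  static void forward(Replica replica) { /* ... */ }

  // Skip replicas that are not ACTIVE or whose node is not live, instead of
  // forwarding, failing, and then putting the replica into LiR.
  static void forwardToLive(Slice slice, Set<String> liveNodes) {
    for (Replica replica : slice.getReplicas()) {
      if (replica.getState() == Replica.State.ACTIVE
          && liveNodes.contains(replica.getNodeName())) {
        forward(replica);
      }
    }
  }
}
{code}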



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-11-15 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667956#comment-15667956
 ] 

Ishan Chattopadhyaya commented on SOLR-9506:


Can we resolve this issue, since it seems it was released as part of 6.3.0? (I 
will open a separate issue for the problem I described two comments above.)

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
> Attachments: SOLR-9506-combined-deletion-key.patch, SOLR-9506.patch, 
> SOLR-9506.patch, SOLR-9506.patch, SOLR-9506.patch, SOLR-9506_POC.patch, 
> SOLR-9506_final.patch
>
>
> The IndexFingerprint is cached per index searcher. It is quite useless during 
> high-throughput indexing. If the fingerprint is cached per segment it will 
> be vastly more efficient to compute the fingerprint.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9707) DeleteByQuery forward requests to down replicas and set it in LiR

2016-11-15 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667951#comment-15667951
 ] 

Mark Miller commented on SOLR-9707:
---

[~ysee...@gmail.com], any idea if that was on purpose to avoid a state race or 
something?

> DeleteByQuery forward requests to down replicas and set it in LiR
> -
>
> Key: SOLR-9707
> URL: https://issues.apache.org/jira/browse/SOLR-9707
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Jessica Cheng Mallet
>Assignee: Varun Thacker
>  Labels: solrcloud
> Attachments: SOLR-9707.diff
>
>
> DeleteByQuery, unlike other requests, does not filter out the down replicas. 
> Thus, the update is still forwarded to the down replica and fails, and the 
> leader then sets the replica in LiR. In a cluster where there are lots of 
> deleteByQuery requests, this can flood the /overseer/queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-9707) DeleteByQuery forward requests to down replicas and set it in LiR

2016-11-15 Thread Varun Thacker (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Thacker reassigned SOLR-9707:
---

Assignee: Varun Thacker

> DeleteByQuery forward requests to down replicas and set it in LiR
> -
>
> Key: SOLR-9707
> URL: https://issues.apache.org/jira/browse/SOLR-9707
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Jessica Cheng Mallet
>Assignee: Varun Thacker
>  Labels: solrcloud
> Attachments: SOLR-9707.diff
>
>
> DeleteByQuery, unlike other requests, does not filter out the down replicas. 
> Thus, the update is still forwarded to the down replica and fails, and the 
> leader then sets the replica in LiR. In a cluster where there are lots of 
> deleteByQuery requests, this can flood the /overseer/queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7543) Make changes-to-html target an offline operation

2016-11-15 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667911#comment-15667911
 ] 

Hoss Man commented on LUCENE-7543:
--

or just {{dev-tools/doap/lucene.rdf}} and {{dev-tools/doap/solr.rdf}} (with a 
{{README.txt}} in the same dir explaining to future devs why that dir is there) 
... or whatever names we like ... there's no rule that the filenames / URLs have 
to have "doap" in them ... the bike sheds can be any color we want.

> Make changes-to-html target an offline operation
> 
>
> Key: LUCENE-7543
> URL: https://issues.apache.org/jira/browse/LUCENE-7543
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Steve Rowe
>
> Currently changes-to-html pulls release dates from JIRA, and so fails when 
> JIRA is inaccessible (e.g. from behind a firewall).
> SOLR-9711 advocates adding a build sysprop to ignore JIRA connection 
> failures, but I'd rather make the operation always offline.
> In an offline discussion, [~hossman] advocated moving Lucene's and Solr's 
> {{doap.rdf}} files, which contain all of the release dates that the 
> changes-to-html now pulls from JIRA, from the CMS Subversion repository 
> (downloadable from the website at http://lucene.apache.org/core/doap.rdf and 
> http://lucene.apache.org/solr/doap.rdf) to the Lucene/Solr git repository. If 
> we did that, then the process could be entirely offline if release dates were 
> taken from the local {{doap.rdf}} files instead of downloaded from JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9442) Add json.nl=arrnvp (array of NamedValuePair) style in JSONResponseWriter

2016-11-15 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667886#comment-15667886
 ] 

Hoss Man commented on SOLR-9442:


bq. What do you think?

Well, I guess I don't really see what use cases the "arrnvp" format (already 
committed) serves that aren't equally or better served by the "arrntv" format 
... if you find it more useful than what I suggested then I guess it has a 
purpose as well ... I'm just not seeing it.

I suppose my main concern with having both is making sure the tests/docs make 
it very clear how things behave when either the name or the value (or both) is 
null.

Even after reading the original diff for this issue it's not clear to me what 
JSON attribute(s) will exist for the equivalents of things like {{}} or {{}} since there is no "type" to use as a JSON 
attribute name ... or does that just result in something like...

{noformat}
NamedList("bar"=null,null=true,null=null)
  => [{"name":"bar"},
  {"bool":true},
  {}]
{noformat}

...which seems really weird.

> Add json.nl=arrnvp (array of NamedValuePair) style in JSONResponseWriter
> 
>
> Key: SOLR-9442
> URL: https://issues.apache.org/jira/browse/SOLR-9442
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Response Writers
>Reporter: Jonny Marks
>Assignee: Christine Poerschke
>Priority: Minor
> Fix For: master (7.0), 6.4
>
> Attachments: SOLR-9442-arrntv.patch, SOLR-9442.patch, 
> SOLR-9442.patch, SOLR-9442.patch
>
>
> The JSONResponseWriter class currently supports several styles of NamedList 
> output format, documented on the wiki at http://wiki.apache.org/solr/SolJSON 
> and in the code at 
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/response/JSONResponseWriter.java#L71-L76.
> For example the 'arrmap' style:
> {code}NamedList("a"=1,"b"=2,null=3) => [{"a":1},{"b":2},3]
> NamedList("a"=1,"bar"="foo",null=3.4f) => [{"a":1},{"bar":"foo"},{3.4}]{code}
> This patch creates a new style 'arrnvp' which is an array of named value 
> pairs. For example:
> {code}NamedList("a"=1,"b"=2,null=3) => 
> [{"name":"a","int":1},{"name":"b","int":2},{"int":3}]
> NamedList("a"=1,"bar"="foo",null=3.4f) => 
> [{"name":"a","int":1},{"name":"bar","str":"foo"},{"float":3.4}]{code}
> This style maintains the type information of the values, similar to the xml 
> format:
> {code:xml}
> <int name="a">1</int>
> <str name="bar">foo</str>
> <float>3.4</float>
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5944) Support updates of numeric DocValues

2016-11-15 Thread Ishan Chattopadhyaya (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chattopadhyaya updated SOLR-5944:
---
Attachment: SOLR-5944.patch

Added another patch. The PeerSyncTest was failing, due to fingerprint caching 
issue. This patch now depends on SOLR-9506's 
"SOLR-9506-combined-deletion-key.patch" patch.

> Support updates of numeric DocValues
> 
>
> Key: SOLR-5944
> URL: https://issues.apache.org/jira/browse/SOLR-5944
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>Assignee: Shalin Shekhar Mangar
> Attachments: DUP.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> TestStressInPlaceUpdates.eb044ac71.beast-167-failure.stdout.txt, 
> TestStressInPlaceUpdates.eb044ac71.beast-587-failure.stdout.txt, 
> TestStressInPlaceUpdates.eb044ac71.failures.tar.gz, defensive-checks.log.gz, 
> hoss.62D328FA1DEA57FD.fail.txt, hoss.62D328FA1DEA57FD.fail2.txt, 
> hoss.62D328FA1DEA57FD.fail3.txt, hoss.D768DD9443A98DC.fail.txt, 
> hoss.D768DD9443A98DC.pass.txt
>
>
> LUCENE-5189 introduced support for updates to numeric docvalues. It would be 
> really nice to have Solr support this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-9771) Resolv Variables in DIH when using encryptKeyFile.

2016-11-15 Thread Bill Bell (JIRA)
Bill Bell created SOLR-9771:
---

 Summary: Resolv Variables in DIH when using encryptKeyFile.
 Key: SOLR-9771
 URL: https://issues.apache.org/jira/browse/SOLR-9771
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Affects Versions: 5.5.3
Reporter: Bill Bell



I would like to use a variable like ${db.passwdkey} for password when using 
encryptKeyFile in various DIH files.

-Ddb.passwdkey="U2FsdGVkX18QMjY0yfCqlfBMvAB4d3XkwY96L7gfO2o="

Please backport to 5.5.3

This does not appear to work when used in DIH below.

<dataSource
    password="U2FsdGVkX18QMjY0yfCqlfBMvAB4d3XkwY96L7gfO2o="
    encryptKeyFile="/location/of/encryptionkey"
/>

{{{
<dataSource
    password="${solr.passkey}"
    encryptKeyFile="/location/of/encryptionkey"
/>
}}}


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-NightlyTests-master - Build # 1154 - Still Unstable

2016-11-15 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-master/1154/

8 tests failed.
FAILED:  
org.apache.solr.handler.TestReplicationHandler.doTestReplicateAfterCoreReload

Error Message:
expected:<[{indexVersion=1479222981730,generation=2,filelist=[_26u.cfe, 
_26u.cfs, _26u.si, _26v.doc, _26v.fdt, _26v.fdx, _26v.fnm, _26v.nvd, _26v.nvm, 
_26v.si, _26v.tim, _26v.tip, _26w.cfe, _26w.cfs, _26w.si, _26x.cfe, _26x.cfs, 
_26x.si, _26y.cfe, _26y.cfs, _26y.si, _26z.cfe, _26z.cfs, _26z.si, _270.cfe, 
_270.cfs, _270.si, _271.cfe, _271.cfs, _271.si, _272.cfe, _272.cfs, _272.si, 
_273.cfe, _273.cfs, _273.si, _274.cfe, _274.cfs, _274.si, _275.cfe, _275.cfs, 
_275.si, _277.cfe, _277.cfs, _277.si, _278.cfe, _278.cfs, _278.si, _279.cfe, 
_279.cfs, _279.si, _27a.cfe, _27a.cfs, _27a.si, _27b.cfe, _27b.cfs, _27b.si, 
_27c.cfe, _27c.cfs, _27c.si, segments_2]}]> but 
was:<[{indexVersion=1479222981730,generation=2,filelist=[_26u.cfe, _26u.cfs, 
_26u.si, _26v.doc, _26v.fdt, _26v.fdx, _26v.fnm, _26v.nvd, _26v.nvm, _26v.si, 
_26v.tim, _26v.tip, _26w.cfe, _26w.cfs, _26w.si, _26x.cfe, _26x.cfs, _26x.si, 
_26y.cfe, _26y.cfs, _26y.si, _26z.cfe, _26z.cfs, _26z.si, _270.cfe, _270.cfs, 
_270.si, _271.cfe, _271.cfs, _271.si, _272.cfe, _272.cfs, _272.si, _273.cfe, 
_273.cfs, _273.si, _274.cfe, _274.cfs, _274.si, _275.cfe, _275.cfs, _275.si, 
_277.cfe, _277.cfs, _277.si, _278.cfe, _278.cfs, _278.si, _279.cfe, _279.cfs, 
_279.si, _27a.cfe, _27a.cfs, _27a.si, _27b.cfe, _27b.cfs, _27b.si, _27c.cfe, 
_27c.cfs, _27c.si, segments_2]}, 
{indexVersion=1479222981730,generation=3,filelist=[_275.cfe, _275.cfs, _275.si, 
_276.doc, _276.fdt, _276.fdx, _276.fnm, _276.nvd, _276.nvm, _276.si, _276.tim, 
_276.tip, _277.cfe, _277.cfs, _277.si, _278.cfe, _278.cfs, _278.si, _279.cfe, 
_279.cfs, _279.si, _27a.cfe, _27a.cfs, _27a.si, _27b.cfe, _27b.cfs, _27b.si, 
_27c.cfe, _27c.cfs, _27c.si, segments_3]}]>

Stack Trace:
java.lang.AssertionError: 
expected:<[{indexVersion=1479222981730,generation=2,filelist=[_26u.cfe, 
_26u.cfs, _26u.si, _26v.doc, _26v.fdt, _26v.fdx, _26v.fnm, _26v.nvd, _26v.nvm, 
_26v.si, _26v.tim, _26v.tip, _26w.cfe, _26w.cfs, _26w.si, _26x.cfe, _26x.cfs, 
_26x.si, _26y.cfe, _26y.cfs, _26y.si, _26z.cfe, _26z.cfs, _26z.si, _270.cfe, 
_270.cfs, _270.si, _271.cfe, _271.cfs, _271.si, _272.cfe, _272.cfs, _272.si, 
_273.cfe, _273.cfs, _273.si, _274.cfe, _274.cfs, _274.si, _275.cfe, _275.cfs, 
_275.si, _277.cfe, _277.cfs, _277.si, _278.cfe, _278.cfs, _278.si, _279.cfe, 
_279.cfs, _279.si, _27a.cfe, _27a.cfs, _27a.si, _27b.cfe, _27b.cfs, _27b.si, 
_27c.cfe, _27c.cfs, _27c.si, segments_2]}]> but 
was:<[{indexVersion=1479222981730,generation=2,filelist=[_26u.cfe, _26u.cfs, 
_26u.si, _26v.doc, _26v.fdt, _26v.fdx, _26v.fnm, _26v.nvd, _26v.nvm, _26v.si, 
_26v.tim, _26v.tip, _26w.cfe, _26w.cfs, _26w.si, _26x.cfe, _26x.cfs, _26x.si, 
_26y.cfe, _26y.cfs, _26y.si, _26z.cfe, _26z.cfs, _26z.si, _270.cfe, _270.cfs, 
_270.si, _271.cfe, _271.cfs, _271.si, _272.cfe, _272.cfs, _272.si, _273.cfe, 
_273.cfs, _273.si, _274.cfe, _274.cfs, _274.si, _275.cfe, _275.cfs, _275.si, 
_277.cfe, _277.cfs, _277.si, _278.cfe, _278.cfs, _278.si, _279.cfe, _279.cfs, 
_279.si, _27a.cfe, _27a.cfs, _27a.si, _27b.cfe, _27b.cfs, _27b.si, _27c.cfe, 
_27c.cfs, _27c.si, segments_2]}, 
{indexVersion=1479222981730,generation=3,filelist=[_275.cfe, _275.cfs, _275.si, 
_276.doc, _276.fdt, _276.fdx, _276.fnm, _276.nvd, _276.nvm, _276.si, _276.tim, 
_276.tip, _277.cfe, _277.cfs, _277.si, _278.cfe, _278.cfs, _278.si, _279.cfe, 
_279.cfs, _279.si, _27a.cfe, _27a.cfs, _27a.si, _27b.cfe, _27b.cfs, _27b.si, 
_27c.cfe, _27c.cfs, _27c.si, segments_3]}]>
at 
__randomizedtesting.SeedInfo.seed([87D43D2F6FF8E245:A203261F1FB0EC46]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:147)
at 
org.apache.solr.handler.TestReplicationHandler.doTestReplicateAfterCoreReload(TestReplicationHandler.java:1227)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:907)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:943)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:957)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 

[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-11-15 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667801#comment-15667801
 ] 

Ishan Chattopadhyaya commented on SOLR-9506:


I see... I saw it was unresolved, and thought it hadn't made it into 6.3 yet. 
I'll check whether it made it into 6.3, and open a new ticket if that's the case.

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
> Attachments: SOLR-9506-combined-deletion-key.patch, SOLR-9506.patch, 
> SOLR-9506.patch, SOLR-9506.patch, SOLR-9506.patch, SOLR-9506_POC.patch, 
> SOLR-9506_final.patch
>
>
> The IndexFingerprint is cached per index searcher. It is quite useless during 
> high-throughput indexing. If the fingerprint is cached per segment it will 
> be vastly more efficient to compute the fingerprint.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-11-15 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667795#comment-15667795
 ] 

Noble Paul commented on SOLR-9506:
--

Ishan, I guess this is already fixed in 6.3, so we may need to open another 
ticket.

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
> Attachments: SOLR-9506-combined-deletion-key.patch, SOLR-9506.patch, 
> SOLR-9506.patch, SOLR-9506.patch, SOLR-9506.patch, SOLR-9506_POC.patch, 
> SOLR-9506_final.patch
>
>
> The IndexFingerprint is cached per index searcher. It is quite useless during 
> high-throughput indexing. If the fingerprint is cached per segment it will 
> be vastly more efficient to compute the fingerprint.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Collection API for performance monitoring?

2016-11-15 Thread Tomás Fernández Löbbe
If you only need query/update performance you could aggregate the logs too.
If you need more information, I like what was proposed in SOLR-9641, which
would allow you to collect and aggregate metrics for internal components
too.

Tomás

On Tue, Nov 15, 2016 at 8:31 AM, Walter Underwood 
wrote:

> To calculate percentiles we need all the data points. If there is a lot of
> data, it could be sampled.
>
> Average can be calculated with the total time and the number of requests.
> Snapshots of those
> two values allow snapshots of averages.
>
> But averages are the wrong metric for a one-sided distribution like
> response time. Let’s assume
> that any response longer than 10 seconds is a bad experience. Percentiles
> will tell you what
> response time 95% of customer searches are getting. With averages, a
> single 30 second response
> time will increase the metric, even though it is “just as broken” as a 15
> s response.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> On Nov 15, 2016, at 7:27 AM, Ryan Josal  wrote:
>
> I haven't tried for 95th percentile, but generally with those collection
> start stats you would monitor based on calculated deltas.  You can figure
> out the average response time for any given window of time not smaller than
> your snapshot polling interval.  I don't see why 95th percentile would be
> any different.
>
> Ryan
>
> On Monday, November 14, 2016, Walter Underwood 
> wrote:
>
>> Because the current stats are not usable. They really should be removed
>> from the code.
>>
>> They calculate percentiles since the last collection load. We need to
>> know 95th percentile
>> during the peak hour last night, not the 95th for the last month.
>>
>> Right now, we run eleven collections in our Solr 4 cluster. In each
>> collection, we have
>> several different handlers. Usually, one for autosuggest (instant
>> results), one for the SRP,
>> and one for mobile, though we also have SEO requests and so on. We can
>> track performance
>> for each of these.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>>
>> On Nov 14, 2016, at 3:54 PM, Erick Erickson 
>> wrote:
>>
>> Point taken, and thanks for the link. The stats I'm referring to in
>> this thread are available now, and would (I think) be a quick win. I
>> don't have a huge amount of investment in it though, more "why didn't
>> we think of this before?" followed by "maybe there's a very good
>> reason not to bother". This may be it since we now standardize on
>> Jetty. My question of course is whether this would be supported moving
>> forward to netty or whatever...
>>
>> Best,
>> Erick
>>
>> On Mon, Nov 14, 2016 at 3:44 PM, Walter Underwood 
>> wrote:
>>
>> I’m not fond of polling for performance stats. I’d rather have the app
>> report them.
>>
>> We could integrate existing Jetty monitoring:
>>
>> http://metrics.dropwizard.io/3.1.0/manual/jetty/
>>
>> From our experience with a similar approach, we might need some
>> Solr-specific metric
>> conflation. SolrJ sends a request to /solr/collection/handler as
>> /solr/collection/select?qt=/handler.
>> In our code, we fix that request to the intended path. We’ve been running
>> a
>> Tomcat metrics search
>> filter for three years.
>>
>> Also, see:
>>
>> https://issues.apache.org/jira/browse/SOLR-8785
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>>
>> On Nov 14, 2016, at 3:25 PM, Erick Erickson 
>> wrote:
>>
>> What do people think about exposing a Collections API call (name TBD,
>> but the sense is PERFORMANCESTATS) that would simply issue the
>> admin/mbeans call to each replica of a collection and report them
>> back. This would give operations monitors the ability to see, say,
>> anomalous replicas that had poor average response times for the last 5
>> minutes and the like.
>>
>> Seems like an easy enhancement that would make ops people's lives easier.
>>
>> I'll raise a JIRA if there's interest, but sure won't make progress on
>> it until I clear my plate of some other JIRAs that I've let linger for
>> far too long.
>>
>> Erick
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>>
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>>
>>
>


[jira] [Updated] (SOLR-9506) cache IndexFingerprint for each segment

2016-11-15 Thread Ishan Chattopadhyaya (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chattopadhyaya updated SOLR-9506:
---
Attachment: SOLR-9506-combined-deletion-key.patch

While working on SOLR-5944, I realized that the current per-segment caching 
logic works fine for deleted documents (it compares a segment's numDocs as the 
criterion for a cache hit/miss). However, if a segment has docValues updates, 
the same logic is insufficient. It is my understanding that changing the 
caching key from reader().getCoreCacheKey() to 
reader().getCombinedCoreAndDeletesKey() would fix this, since docValues 
updates are internally handled using the deletion queue and hence are 
reflected in the "combined" core and deletes key. Attaching a patch for the same.
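A sketch of the keying change being proposed, with the fingerprint itself stubbed 
out (the map here is illustrative, not Solr's actual cache):

{code}
import java.util.concurrent.ConcurrentHashMap;
import org.apache.lucene.index.LeafReader;

public class PerSegmentFingerprintCache {
  // Placeholder for the computed per-segment fingerprint.
  static final class Fingerprint { }

  private final ConcurrentHashMap<Object, Fingerprint> cache = new ConcurrentHashMap<>();

  Fingerprint get(LeafReader reader) {
    // getCombinedCoreAndDeletesKey() changes when deletes are applied -- and
    // docValues updates ride on the deletes machinery -- while
    // getCoreCacheKey() does not; the combined key therefore avoids serving
    // a stale fingerprint for a segment with in-place updates.
    return cache.computeIfAbsent(reader.getCombinedCoreAndDeletesKey(),
        key -> compute(reader));
  }

  private Fingerprint compute(LeafReader reader) {
    return new Fingerprint(); // real code would fold in versions, maxDoc, etc.
  }
}
{code}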

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
> Attachments: SOLR-9506-combined-deletion-key.patch, SOLR-9506.patch, 
> SOLR-9506.patch, SOLR-9506.patch, SOLR-9506.patch, SOLR-9506_POC.patch, 
> SOLR-9506_final.patch
>
>
> The IndexFingerprint is cached per index searcher. It is quite useless during 
> high-throughput indexing. If the fingerprint is cached per segment it will 
> be vastly more efficient to compute the fingerprint.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS-EA] Lucene-Solr-master-Linux (32bit/jdk-9-ea+140) - Build # 18290 - Unstable!

2016-11-15 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/18290/
Java: 32bit/jdk-9-ea+140 -client -XX:+UseSerialGC

1 tests failed.
FAILED:  org.apache.solr.cloud.PeerSyncReplicationTest.test

Error Message:
expected:<204> but was:<190>

Stack Trace:
java.lang.AssertionError: expected:<204> but was:<190>
at 
__randomizedtesting.SeedInfo.seed([42ED2D51153DE44B:CAB9128BBBC189B3]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at org.junit.Assert.assertEquals(Assert.java:456)
at 
org.apache.solr.cloud.PeerSyncReplicationTest.bringUpDeadNodeAndEnsureNoReplication(PeerSyncReplicationTest.java:280)
at 
org.apache.solr.cloud.PeerSyncReplicationTest.test(PeerSyncReplicationTest.java:157)
at 
jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@9-ea/Native 
Method)
at 
jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@9-ea/NativeMethodAccessorImpl.java:62)
at 
jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@9-ea/DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(java.base@9-ea/Method.java:535)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:907)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:943)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:957)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:985)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:960)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:811)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:462)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:916)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:802)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:852)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 

[jira] [Commented] (LUCENE-7563) BKD index should compress unused leading bytes

2016-11-15 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667700#comment-15667700
 ] 

Adrien Grand commented on LUCENE-7563:
--

+1

> BKD index should compress unused leading bytes
> --
>
> Key: LUCENE-7563
> URL: https://issues.apache.org/jira/browse/LUCENE-7563
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
> Fix For: master (7.0), 6.4
>
>
> Today the BKD (points) in-heap index always uses {{dimensionNumBytes}} per 
> dimension, but if e.g. you are indexing {{LongPoint}} yet only use the bottom 
> two bytes in a given segment, we shouldn't store all those leading 0s in the 
> index.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-7537) Add multi valued field support to index sorting

2016-11-15 Thread Ferenczi Jim (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferenczi Jim updated LUCENE-7537:
-
Attachment: LUCENE-7537.patch

Oh right, "sorted_string" is ambiguous. Here is another patch with the renaming 
to "multi_valued" for string and numerics.
Thanks [~mikemccand]

> Add multi valued field support to index sorting
> ---
>
> Key: LUCENE-7537
> URL: https://issues.apache.org/jira/browse/LUCENE-7537
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Ferenczi Jim
> Fix For: master (7.0), 6.4
>
> Attachments: LUCENE-7537.patch, LUCENE-7537.patch, LUCENE-7537.patch, 
> LUCENE-7537.patch, LUCENE-7537.patch
>
>
> Today index sorting can be done on single-valued fields through 
> NumericDocValues (for numerics) and SortedDocValues (for strings).
> I'd like to add the ability to sort on multi-valued fields. Since index 
> sorting does not accept custom comparators, we could just take the minimum 
> value of each document for an ascending sort and the maximum value for a 
> descending sort.
> This way we could handle all cases instead of throwing an exception during a 
> merge when we encounter multi-valued DVs. 
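A sketch of that selection rule against the 6.x {{SortedNumericDocValues}} API, 
where a document's values come back in ascending order:

{code}
import org.apache.lucene.index.SortedNumericDocValues;

public class MultiValuedSortValue {
  // For index sorting without custom comparators: ascending sorts compare on
  // each document's minimum value, descending sorts on its maximum.
  static long sortValue(SortedNumericDocValues dv, int docID, boolean reverse) {
    dv.setDocument(docID);
    int count = dv.count();
    if (count == 0) {
      return 0L; // the missing-value policy is a separate decision
    }
    // values are returned in ascending order, so min/max sit at the ends
    return reverse ? dv.valueAt(count - 1) : dv.valueAt(0);
  }
}
{code}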



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-7563) BKD index should compress unused leading bytes

2016-11-15 Thread Michael McCandless (JIRA)
Michael McCandless created LUCENE-7563:
--

 Summary: BKD index should compress unused leading bytes
 Key: LUCENE-7563
 URL: https://issues.apache.org/jira/browse/LUCENE-7563
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: master (7.0), 6.4


Today the BKD (points) in-heap index always uses {{dimensionNumBytes}} per 
dimension, but if e.g. you are indexing {{LongPoint}} yet only use the bottom 
two bytes in a given segment, we shouldn't store all those leading 0s in the 
index.
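A sketch of how the writer could detect the unused prefix for one dimension (the 
actual encoding belongs in the BKD index writer):

{code}
public class LeadingBytes {
  // Count the leading bytes that are zero in every value of one dimension;
  // those bytes carry no information and need not be stored per value.
  static int unusedLeadingBytes(byte[][] valuesForDim, int numBytesPerDim) {
    int prefix = numBytesPerDim;
    for (byte[] value : valuesForDim) {
      int i = 0;
      while (i < prefix && value[i] == 0) {
        i++;
      }
      prefix = i;
      if (prefix == 0) {
        break; // some value uses its first byte; nothing to strip
      }
    }
    return prefix;
  }
}
{code}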



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-9770) Solr should not cache queries that fail

2016-11-15 Thread Erick Erickson (JIRA)
Erick Erickson created SOLR-9770:


 Summary: Solr should not cache queries that fail
 Key: SOLR-9770
 URL: https://issues.apache.org/jira/browse/SOLR-9770
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Affects Versions: trunk, 6.4
Reporter: Erick Erickson
Priority: Minor


Bram Van Dam on the user's list had a problem with a bad query causing an 
exception:

java.lang.IllegalStateException: Too many values for UnInvertedField
faceting on field text. 

Then the query apparently got into one of the caches and was then autowarmed, 
leading to the error every time a new searcher was opened.

This does _not_ happen with, say, a query that references an undefined field. 
Such a query doesn't get into the cache in the first place.

I have not been able to verify this, but it seems worth a JIRA to investigate; 
at the least, no query that throws an exception should get into the caches. You 
can imagine a situation where this leads to OOM errors and Solr needing to be 
restarted to get past it.
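The invariant being asked for, in miniature; the cache and executor types here 
are hypothetical, not Solr's API:

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ExecuteThenCache<Q, R> {
  private final Map<Q, R> cache = new ConcurrentHashMap<>();

  interface QueryExecutor<Q, R> { R run(Q query) throws Exception; }

  // Only insert after the query has executed successfully: a query that
  // throws must never reach the cache, or autowarming will replay the
  // failure on every new searcher.
  R execute(Q query, QueryExecutor<Q, R> executor) throws Exception {
    R cached = cache.get(query);
    if (cached != null) {
      return cached;
    }
    R result = executor.run(query); // may throw; nothing is cached then
    cache.put(query, result);
    return result;
  }
}
{code}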



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7537) Add multi valued field support to index sorting

2016-11-15 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667660#comment-15667660
 ] 

Michael McCandless commented on LUCENE-7537:


Thanks [~jim.ferenczi]; I still see e.g.:

{noformat}
+  case "sorted_string":
+type = SortField.Type.STRING;
+selectorSet = readSetSelector(input, scratch);
+break;
{noformat}

in SimpleText ... can we maybe rename that to:

{noformat}
  case "multi_valued_string":
...
{noformat}

Otherwise I think this is ready!

> Add multi valued field support to index sorting
> ---
>
> Key: LUCENE-7537
> URL: https://issues.apache.org/jira/browse/LUCENE-7537
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Ferenczi Jim
> Fix For: master (7.0), 6.4
>
> Attachments: LUCENE-7537.patch, LUCENE-7537.patch, LUCENE-7537.patch, 
> LUCENE-7537.patch
>
>
> Today index sorting can be done on single-valued fields through 
> NumericDocValues (for numerics) and SortedDocValues (for strings).
> I'd like to add the ability to sort on multi-valued fields. Since index 
> sorting does not accept custom comparators, we could just take the minimum 
> value of each document for an ascending sort and the maximum value for a 
> descending sort.
> This way we could handle all cases instead of throwing an exception during a 
> merge when we encounter multi-valued DVs. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9764) Design a memory efficient DocSet if a query returns all docs

2016-11-15 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667611#comment-15667611
 ] 

Mark Miller commented on SOLR-9764:
---

Nice Michael, looks interesting.

Looks like you need to handle intersection overloads to avoid an infinite loop 
of method callbacks?

Guessing that is due to:

BitDocSet.java
{code}
  // they had better not call us back!
  return other.intersectionSize(this);
{code}



> Design a memory efficient DocSet if a query returns all docs
> 
>
> Key: SOLR-9764
> URL: https://issues.apache.org/jira/browse/SOLR-9764
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Michael Sun
> Attachments: SOLR-9764.patch
>
>
> In some use cases, particularly with time series data, using a collection 
> alias and partitioning data into multiple small collections by timestamp, a 
> filter query can match all documents in a collection. Currently BitDocSet is 
> used, which contains a large array of long integers with every bit set to 1. 
> After querying, the resulting DocSet saved in the filter cache is large and 
> becomes one of the main memory consumers in these use cases.
> For example, suppose a Solr setup has 14 collections for data from the last 
> 14 days, each collection with one day of data. A filter query for the last 
> week of data would result in at least six DocSets in the filter cache, each 
> matching all documents in one of six collections.
> This issue is to design a new DocSet that is memory efficient for such a use 
> case. The new DocSet removes the large array, reducing memory usage and GC 
> pressure without losing the advantage of a large filter cache.
> In particular, for use cases with time series data, collection aliases and 
> data partitioned into multiple small collections by timestamp, the gain can 
> be large.
> For further optimization, it may be helpful to design a DocSet with run 
> length encoding. Thanks [~mmokhtar] for the suggestion. 
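A sketch of the core idea, deliberately not using Solr's actual {{DocSet}} 
interface: a match-all set needs to remember only {{maxDoc}}:

{code}
public class MatchAllDocsSketch {
  // A match-all doc set only needs the doc count: membership is trivially
  // true and intersections can simply return the other set.
  static final class MatchAllDocSet {
    final int maxDoc;

    MatchAllDocSet(int maxDoc) { this.maxDoc = maxDoc; }

    boolean exists(int docID) { return docID >= 0 && docID < maxDoc; }

    int size() { return maxDoc; }

    // Intersecting with anything yields the other set unchanged, so no bit
    // array is ever materialized -- compare roughly maxDoc/8 bytes for a
    // BitDocSet with every bit set over a large index.
  }
}
{code}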



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Collection API for performance monitoring?

2016-11-15 Thread Walter Underwood
To calculate percentiles we need all the data points. If there is a lot of 
data, it could be sampled.

Average can be calculated with the total time and the number of requests. 
Snapshots of those
two values allow snapshots of averages.

But averages are the wrong metric for a one-sided distribution like response 
time. Let’s assume 
that any response longer than 10 seconds is a bad experience. Percentiles will 
tell you what 
response time 95% of customer searches are getting. With averages, a single 30 
second response
time will increase the metric, even though it is “just as broken” as a 15 s 
response.
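For reference, the Dropwizard Metrics library linked earlier in this thread 
computes exactly this kind of percentile from a sampled reservoir; a minimal 
sketch (the handler wiring and metric name are made up):

{code}
import java.util.concurrent.TimeUnit;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Snapshot;
import com.codahale.metrics.Timer;

public class HandlerTiming {
  static final MetricRegistry registry = new MetricRegistry();
  static final Timer requests = registry.timer("collection.select.requests");

  static void handle(Runnable work) {
    // The Timer keeps a sampled reservoir of durations, so percentiles
    // reflect a recent window rather than everything since the collection
    // loaded.
    final Timer.Context ctx = requests.time();
    try {
      work.run();
    } finally {
      ctx.stop();
    }
  }

  static double p95Millis() {
    Snapshot s = requests.getSnapshot();
    return TimeUnit.NANOSECONDS.toMillis((long) s.get95thPercentile());
  }
}
{code}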

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Nov 15, 2016, at 7:27 AM, Ryan Josal  wrote:
> 
> I haven't tried for 95th percentile, but generally with those collection 
> start stats you would monitor based on calculated deltas.  You can figure out 
> the average response time for any given window of time not smaller than your 
> snapshot polling interval.  I don't see why 95th percentile would be any 
> different.
> 
> Ryan
> 
> On Monday, November 14, 2016, Walter Underwood  > wrote:
> Because the current stats are not usable. They really should be removed from 
> the code.
> 
> They calculate percentiles since the last collection load. We need to know 
> 95th percentile
> during the peak hour last night, not the 95th for the last month.
> 
> Right now, we run eleven collections in our Solr 4 cluster. In each 
> collection, we have
> several different handlers. Usually, one for autosuggest (instant results), 
> one for the SRP,
> and one for mobile, though we also have SEO requests and so on. We can track 
> performance
> for each of these.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org 
> http://observer.wunderwood.org/   (my blog)
> 
> 
>> On Nov 14, 2016, at 3:54 PM, Erick Erickson > > wrote:
>> 
>> Point taken, and thanks for the link. The stats I'm referring to in
>> this thread are available now, and would (I think) be a quick win. I
>> don't have a huge amount of investment in it though, more "why didn't
>> we think of this before?" followed by "maybe there's a very good
>> reason not to bother". This may be it since we now standardize on
>> Jetty. My question of course is whether this would be supported moving
>> forward to netty or whatever...
>> 
>> Best,
>> Erick
>> 
>> On Mon, Nov 14, 2016 at 3:44 PM, Walter Underwood wrote:
>>> I’m not fond of polling for performance stats. I’d rather have the app
>>> report them.
>>> 
>>> We could integrate existing Jetty monitoring:
>>> 
>>> http://metrics.dropwizard.io/3.1.0/manual/jetty/ 
>>> 
>>> 
>>> From our experience with a similar approach, we might need some
>>> Solr-specific metric
>>> conflation. SolrJ sends a request to /solr/collection/handler as
>>> /solr/collection/select?qt=/handler.
>>> In our code, we fix that request to the intended path. We’ve been running a
>>> Tomcat metrics search
>>> filter for three years.
>>> 
>>> Also, see:
>>> 
>>> https://issues.apache.org/jira/browse/SOLR-8785 
>>> 
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org 
>>> 
>>> http://observer.wunderwood.org/   (my blog)
>>> 
>>> 
>>> On Nov 14, 2016, at 3:25 PM, Erick Erickson wrote:
>>> 
>>> What do people think about exposing a Collections API call (name TBD,
>>> but the sense is PERFORMANCESTATS) that would simply issue the
>>> admin/mbeans call to each replica of a collection and report them
>>> back. This would give operations monitors the ability to see, say,
>>> anomalous replicas that had poor average response times for the last 5
>>> minutes and the like.
>>> 
>>> Seems like an easy enhancement that would make ops people's lives easier.
>>> 
>>> I'll raise a JIRA if there's interest, but sure won't make progress on
>>> it until I clear my plate of some other JIRAs that I've let linger for
>>> far too long.
>>> 
>>> Erick
>>> 
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org 
>>> 
>>> For additional commands, e-mail: dev-h...@lucene.apache.org 
>>> 
>>> 
>>> 
>> 
>> -
>> To unsubscribe, e-mail: 

[jira] [Commented] (SOLR-9442) Add json.nl=arrnvp (array of NamedValuePair) style in JSONResponseWriter

2016-11-15 Thread Christine Poerschke (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667571#comment-15667571
 ] 

Christine Poerschke commented on SOLR-9442:
---

Hi [~hossman], thanks for your input. Jonny and I discussed offline and here's 
what we think.

bq. ... you have to either know beforehand what to expect, or iterate ...

Yes, with the json.nl=arrnvp \{"name":"a","int":1\} representation it helps to 
know beforehand what to expect. Our (quite possibly unusual) use case is 
actually to parse-and-convert the JSON response into an object validated by a 
_schema_.

We agree that a \{"name":"a", "type":"int", "value":1\} representation would 
help avoid iterating since clients can rely on the existence of the exact 3 
attributes.

How about supporting both json.nl=arrnvp (Named Value Pair) and json.nl=arrntv 
(Name Type Value) representations? Attached SOLR-9442-arrntv.patch shows how 
both can easily share most code. What do you think?
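
For illustration, the arrntv style would emit name/type/value triples along 
these lines (a sketch of the intended shape, not output copied from the 
patch; how a null name is rendered here is an assumption):

{code}
NamedList("a"=1,"b"=2,null=3) =>
[{"name":"a","type":"int","value":1},{"name":"b","type":"int","value":2},{"type":"int","value":3}]
{code}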

> Add json.nl=arrnvp (array of NamedValuePair) style in JSONResponseWriter
> 
>
> Key: SOLR-9442
> URL: https://issues.apache.org/jira/browse/SOLR-9442
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Response Writers
>Reporter: Jonny Marks
>Assignee: Christine Poerschke
>Priority: Minor
> Fix For: master (7.0), 6.4
>
> Attachments: SOLR-9442-arrntv.patch, SOLR-9442.patch, 
> SOLR-9442.patch, SOLR-9442.patch
>
>
> The JSONResponseWriter class currently supports several styles of NamedList 
> output format, documented on the wiki at http://wiki.apache.org/solr/SolJSON 
> and in the code at 
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/response/JSONResponseWriter.java#L71-L76.
> For example the 'arrmap' style:
> {code}NamedList("a"=1,"b"=2,null=3) => [{"a":1},{"b":2},3]
> NamedList("a"=1,"bar”=“foo",null=3.4f) => [{"a":1},{"bar”:”foo"},{3.4}]{code}
> This patch creates a new style ‘arrnvp’ which is an array of named value 
> pairs. For example:
> {code}NamedList("a"=1,"b"=2,null=3) => 
> [{"name":"a","int":1},{"name":"b","int":2},{"int":3}]
> NamedList("a"=1,"bar”=“foo",null=3.4f) => 
> [{"name":"a","int":1},{"name":"b","str":"foo"},{"float":3.4}]{code}
> This style maintains the type information of the values, similar to the xml 
> format:
> {code:xml}
>   <int name="a">1</int>
>   <str name="bar">foo</str>
>   <float>3.4</float>
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9442) Add json.nl=arrnvp (array of NamedValuePair) style in JSONResponseWriter

2016-11-15 Thread Christine Poerschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christine Poerschke updated SOLR-9442:
--
Attachment: SOLR-9442-arrntv.patch

> Add json.nl=arrnvp (array of NamedValuePair) style in JSONResponseWriter
> 
>
> Key: SOLR-9442
> URL: https://issues.apache.org/jira/browse/SOLR-9442
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Response Writers
>Reporter: Jonny Marks
>Assignee: Christine Poerschke
>Priority: Minor
> Fix For: master (7.0), 6.4
>
> Attachments: SOLR-9442-arrntv.patch, SOLR-9442.patch, 
> SOLR-9442.patch, SOLR-9442.patch
>
>
> The JSONResponseWriter class currently supports several styles of NamedList 
> output format, documented on the wiki at http://wiki.apache.org/solr/SolJSON 
> and in the code at 
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/response/JSONResponseWriter.java#L71-L76.
> For example the 'arrmap' style:
> {code}NamedList("a"=1,"b"=2,null=3) => [{"a":1},{"b":2},3]
> NamedList("a"=1,"bar”=“foo",null=3.4f) => [{"a":1},{"bar”:”foo"},{3.4}]{code}
> This patch creates a new style ‘arrnvp’ which is an array of named value 
> pairs. For example:
> {code}NamedList("a"=1,"b"=2,null=3) => 
> [{"name":"a","int":1},{"name":"b","int":2},{"int":3}]
> NamedList("a"=1,"bar”=“foo",null=3.4f) => 
> [{"name":"a","int":1},{"name":"b","str":"foo"},{"float":3.4}]{code}
> This style maintains the type information of the values, similar to the xml 
> format:
> {code:xml}
>   <int name="a">1</int>
>   <str name="bar">foo</str>
>   <float>3.4</float>
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9284) The HDFS BlockDirectoryCache should not let its keysToRelease or names maps grow indefinitely.

2016-11-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667522#comment-15667522
 ] 

ASF subversion and git services commented on SOLR-9284:
---

Commit b90b4dc694edb9b31c5afd69b477e6d90f24adfd in lucene-solr's branch 
refs/heads/branch_6x from markrmiller
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b90b4dc ]

SOLR-9284: Reduce off heap cache size and fix test asserts.


> The HDFS BlockDirectoryCache should not let its keysToRelease or names maps 
> grow indefinitely.
> ---
>
> Key: SOLR-9284
> URL: https://issues.apache.org/jira/browse/SOLR-9284
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: hdfs
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 6.2, master (7.0)
>
> Attachments: SOLR-9284.patch, SOLR-9284.patch
>
>
> https://issues.apache.org/jira/browse/SOLR-9284



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9284) The HDFS BlockDirectoryCache should not let its keysToRelease or names maps grow indefinitely.

2016-11-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667521#comment-15667521
 ] 

ASF subversion and git services commented on SOLR-9284:
---

Commit 358c164620f774820bd22278fcf425c599a254b2 in lucene-solr's branch 
refs/heads/master from markrmiller
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=358c164 ]

SOLR-9284: Reduce off heap cache size and fix test asserts.


> The HDFS BlockDirectoryCache should not let its keysToRelease or names maps 
> grow indefinitely.
> ---
>
> Key: SOLR-9284
> URL: https://issues.apache.org/jira/browse/SOLR-9284
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: hdfs
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 6.2, master (7.0)
>
> Attachments: SOLR-9284.patch, SOLR-9284.patch
>
>
> https://issues.apache.org/jira/browse/SOLR-9284



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-6.x-Linux (32bit/jdk1.8.0_102) - Build # 2187 - Unstable!

2016-11-15 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Linux/2187/
Java: 32bit/jdk1.8.0_102 -client -XX:+UseConcMarkSweepGC

2 tests failed.
FAILED:  
org.apache.solr.store.blockcache.BlockDirectoryTest.testRandomAccessWrites

Error Message:
Direct buffer memory

Stack Trace:
java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:693)
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
at 
org.apache.solr.store.blockcache.BlockCache.<init>(BlockCache.java:68)
at 
org.apache.solr.store.blockcache.BlockDirectoryTest.setUp(BlockDirectoryTest.java:119)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:941)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:957)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:811)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:462)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:916)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:802)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:852)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at java.lang.Thread.run(Thread.java:745)


FAILED:  org.apache.solr.store.blockcache.BlockDirectoryTest.testEOF

Error Message:
Direct buffer memory

Stack Trace:
java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:693)
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
at 

Re: Collection API for performance monitoring?

2016-11-15 Thread Ryan Josal
I haven't tried for 95th percentile, but generally with those collection
start stats you would monitor based on calculated deltas.  You can figure
out the average response time for any given window of time not smaller than
your snapshot polling interval.  I don't see why 95th percentile would be
any different.

Ryan

On Monday, November 14, 2016, Walter Underwood 
wrote:

> Because the current stats are not usable. They really should be removed
> from the code.
>
> They calculate percentiles since the last collection load. We need to know
> 95th percentile
> during the peak hour last night, not the 95th for the last month.
>
> Right now, we run eleven collections in our Solr 4 cluster. In each
> collection, we have
> several different handlers. Usually, one for autosuggest (instant
> results), one for the SRP,
> and one for mobile, though we also have SEO requests and so on. We can
> track performance
> for each of these.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> 
> http://observer.wunderwood.org/  (my blog)
>
>
> On Nov 14, 2016, at 3:54 PM, Erick Erickson wrote:
>
> Point taken, and thanks for the link. The stats I'm referring to in
> this thread are available now, and would (I think) be a quick win. I
> don't have a huge amount of investment in it though, more "why didn't
> we think of this before?" followed by "maybe there's a very good
> reason not to bother". This may be it since we now standardize on
> Jetty. My question of course is whether this would be supported moving
> forward to netty or whatever...
>
> Best,
> Erick
>
> On Mon, Nov 14, 2016 at 3:44 PM, Walter Underwood wrote:
>
> I’m not fond of polling for performance stats. I’d rather have the app
> report them.
>
> We could integrate existing Jetty monitoring:
>
> http://metrics.dropwizard.io/3.1.0/manual/jetty/
>
> From our experience with a similar approach, we might need some
> Solr-specific metric
> conflation. SolrJ sends a request to /solr/collection/handler as
> /solr/collection/select?qt=/handler.
> In our code, we fix that request to the intended path. We’ve been running a
> Tomcat metrics search
> filter for three years.
>
> Also, see:
>
> https://issues.apache.org/jira/browse/SOLR-8785
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> 
> http://observer.wunderwood.org/  (my blog)
>
>
> On Nov 14, 2016, at 3:25 PM, Erick Erickson wrote:
>
> What do people think about exposing a Collections API call (name TBD,
> but the sense is PERFORMANCESTATS) that would simply issue the
> admin/mbeans call to each replica of a collection and report them
> back. This would give operations monitors the ability to see, say,
> anomalous replicas that had poor average response times for the last 5
> minutes and the like.
>
> Seems like an easy enhancement that would make ops people's lives easier.
>
> I'll raise a JIRA if there's interest, but sure won't make progress on
> it until I clear my plate of some other JIRAs that I've let linger for
> far too long.
>
> Erick
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> 
> For additional commands, e-mail: dev-h...@lucene.apache.org
> 
>
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> 
> For additional commands, e-mail: dev-h...@lucene.apache.org
> 
>
>
>


[jira] [Updated] (LUCENE-7536) ASCIIFoldingFilterFactory.getMultiTermComponent can emit two tokens

2016-11-15 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-7536:
-
Attachment: LUCENE-7536.patch

Here is a proposal: the multi-term component only emits the folded token, even 
when {{preserveOriginal}} is true.
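
To make the proposal concrete, a small sketch of the folding the multi-term 
component would apply, using the filter's existing static helper (the 
surrounding class is illustrative):

{code}
import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter;

class FoldingSketch {
  public static void main(String[] args) {
    char[] in = "café".toCharArray();
    // the folded form can be up to 4x longer, per the helper's contract
    char[] out = new char[4 * in.length];
    int len = ASCIIFoldingFilter.foldToASCII(in, 0, out, 0, in.length);
    // under the proposal, multi-term analysis emits only this folded form
    // ("cafe"), even when the original filter has preserveOriginal=true
    System.out.println(new String(out, 0, len));
  }
}
{code}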

> ASCIIFoldingFilterFactory.getMultiTermComponent can emit two tokens
> ---
>
> Key: LUCENE-7536
> URL: https://issues.apache.org/jira/browse/LUCENE-7536
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-7536.patch
>
>
> My understanding is that it is a requirement for multi-term analysis to only 
> normalize tokens, and not eg. remove tokens (stop filter) or add tokens (by 
> tokenizing or adding synonyms). Yet 
> ASCIIFoldingFilterFactory.getMultiTermComponent will return a factory that 
> emits synonyms if preserveOriginal is set to true on the original filter.
> This looks like a bug to me but I'm not entirely sure how to fix it. Should 
> the multi-term analysis component do ascii folding or not if the original 
> factory has preserveOriginal set to true?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-7537) Add multi valued field support to index sorting

2016-11-15 Thread Ferenczi Jim (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferenczi Jim updated LUCENE-7537:
-
Attachment: LUCENE-7537.patch

Thanks [~mikemccand], I attached a new patch that addresses your comments. 
I can also make another patch for 6.4 if needed. 

> Add multi valued field support to index sorting
> ---
>
> Key: LUCENE-7537
> URL: https://issues.apache.org/jira/browse/LUCENE-7537
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Ferenczi Jim
> Fix For: master (7.0), 6.4
>
> Attachments: LUCENE-7537.patch, LUCENE-7537.patch, LUCENE-7537.patch, 
> LUCENE-7537.patch
>
>
> Today index sorting can be done on single-valued fields through 
> NumericDocValues (for numerics) and SortedDocValues (for strings).
> I'd like to add the ability to sort on multi-valued fields. Since index 
> sorting does not accept a custom comparator, we could just take the minimum 
> value of each document for an ascending sort and the maximum value for a 
> descending sort.
> This way we could handle all cases instead of throwing an exception when we 
> encounter multi-valued DVs during a merge. 
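
As a sketch of how this might look to users once committed, assuming the 
existing SortedNumericSortField/SortedNumericSelector API is reused for the 
index sort (field name is illustrative, and this is not code from the 
attached patch):

{code}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.SortedNumericSelector;
import org.apache.lucene.search.SortedNumericSortField;

class MultiValuedIndexSortSketch {
  static IndexWriterConfig configure() {
    IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
    // ascending index sort on a multi-valued numeric field: compare documents
    // by their MINIMUM value, as proposed above (MAX for a descending sort)
    iwc.setIndexSort(new Sort(
        new SortedNumericSortField("timestamp", SortField.Type.LONG,
            false /* reverse */, SortedNumericSelector.Type.MIN)));
    return iwc;
  }
}
{code}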



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9769) solr stop on a service already stopped should return exit code 0

2016-11-15 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiří Pejchal updated SOLR-9769:
---
Description: 
According to the LSB specification
https://refspecs.linuxfoundation.org/LSB_4.0.0/LSB-Core-generic/LSB-Core-generic.html#INISCRPTACT
 running stop on a service already stopped or not running should be considered 
successful and return code should be 0 (zero).

Solr currently returns exit code 1:
{code}
$ /etc/init.d/solr stop; echo $?
Sending stop command to Solr running on port 8983 ... waiting up to 180 seconds 
to allow Jetty process 4277 to stop gracefully.
0
$ /etc/init.d/solr stop; echo $?
No process found for Solr node running on port 8983
1
{code}

{code:title="bin/solr"}
if [ "$SOLR_PID" != "" ]; then
stop_solr "$SOLR_SERVER_DIR" "$SOLR_PORT" "$STOP_KEY" "$SOLR_PID"
  else
if [ "$SCRIPT_CMD" == "stop" ]; then
  echo -e "No process found for Solr node running on port $SOLR_PORT"
  exit 1
fi
  fi

{code}

  was:
According to the LSB specification
https://refspecs.linuxfoundation.org/LSB_4.0.0/LSB-Core-generic/LSB-Core-generic.html#INISCRPTACT
 running stop on a service already stopped or not running should be considered 
successful and return code should be 0 (zero).

Solr currently returns exit code 1:
{code}
$ /etc/init.d/solr stop; echo $?
Sending stop command to Solr running on port 8983 ... waiting up to 180 seconds 
to allow Jetty process 4277 to stop gracefully.
0
$ /etc/init.d/solr stop; echo $?
No process found for Solr node running on port 8983
1
{code}


> solr stop on a service already stopped should return exit code 0
> 
>
> Key: SOLR-9769
> URL: https://issues.apache.org/jira/browse/SOLR-9769
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: scripts and tools
>Affects Versions: 6.3
>Reporter: Jiří Pejchal
>
> According to the LSB specification
> https://refspecs.linuxfoundation.org/LSB_4.0.0/LSB-Core-generic/LSB-Core-generic.html#INISCRPTACT
>  running stop on a service already stopped or not running should be 
> considered successful and return code should be 0 (zero).
> Solr currently returns exit code 1:
> {code}
> $ /etc/init.d/solr stop; echo $?
> Sending stop command to Solr running on port 8983 ... waiting up to 180 
> seconds to allow Jetty process 4277 to stop gracefully.
> 0
> $ /etc/init.d/solr stop; echo $?
> No process found for Solr node running on port 8983
> 1
> {code}
> {code:title="bin/solr"}
> if [ "$SOLR_PID" != "" ]; then
> stop_solr "$SOLR_SERVER_DIR" "$SOLR_PORT" "$STOP_KEY" "$SOLR_PID"
>   else
> if [ "$SCRIPT_CMD" == "stop" ]; then
>   echo -e "No process found for Solr node running on port $SOLR_PORT"
>   exit 1
> fi
>   fi
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-9769) solr stop on a service already stopped should return exit code 0

2016-11-15 Thread JIRA
Jiří Pejchal created SOLR-9769:
--

 Summary: solr stop on a service already stopped should return exit 
code 0
 Key: SOLR-9769
 URL: https://issues.apache.org/jira/browse/SOLR-9769
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: scripts and tools
Affects Versions: 6.3
Reporter: Jiří Pejchal


According to the LSB specification
https://refspecs.linuxfoundation.org/LSB_4.0.0/LSB-Core-generic/LSB-Core-generic.html#INISCRPTACT
 running stop on a service already stopped or not running should be considered 
successful and return code should be 0 (zero).

Solr currently returns exit code 1:
{code}
$ /etc/init.d/solr stop; echo $?
Sending stop command to Solr running on port 8983 ... waiting up to 180 seconds 
to allow Jetty process 4277 to stop gracefully.
0
$ /etc/init.d/solr stop; echo $?
No process found for Solr node running on port 8983
1
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-7461) Refactor doc values queries to better use the new doc values APIs

2016-11-15 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-7461.
--
   Resolution: Fixed
Fix Version/s: master (7.0)

> Refactor doc values queries to better use the new doc values APIs
> -
>
> Key: LUCENE-7461
> URL: https://issues.apache.org/jira/browse/LUCENE-7461
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Fix For: master (7.0)
>
> Attachments: LUCENE-7461.patch
>
>
> The new doc values APIs make it easy to implement a TwoPhaseIterator, and 
> things are going to be faster in the sparse case since we can use the doc 
> values object as an approximation.
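
For readers unfamiliar with the pattern, a minimal sketch of using a doc 
values iterator as the approximation of a TwoPhaseIterator (the field 
semantics and range bounds are illustrative, not the committed code):

{code}
import java.io.IOException;
import org.apache.lucene.index.SortedNumericDocValues;
import org.apache.lucene.search.TwoPhaseIterator;

class DocValuesRangeSketch {
  static TwoPhaseIterator rangeMatcher(SortedNumericDocValues values,
                                       long lower, long upper) {
    // the doc values iterator itself is the approximation: it only visits
    // documents that have a value, which is what speeds up the sparse case
    return new TwoPhaseIterator(values) {
      @Override
      public boolean matches() throws IOException {
        for (int i = 0; i < values.docValueCount(); ++i) {
          long v = values.nextValue();
          if (v >= lower && v <= upper) {
            return true;
          }
        }
        return false;
      }

      @Override
      public float matchCost() {
        return 4; // rough estimate of the per-document verification cost
      }
    };
  }
}
{code}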



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-6966) Contribution: Codec for index-level encryption

2016-11-15 Thread Renaud Delbru (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renaud Delbru updated LUCENE-6966:
--
Attachment: Encryption Codec Documentation.pdf

An initial technical documentation.

> Contribution: Codec for index-level encryption
> --
>
> Key: LUCENE-6966
> URL: https://issues.apache.org/jira/browse/LUCENE-6966
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/other
>Reporter: Renaud Delbru
>  Labels: codec, contrib
> Attachments: Encryption Codec Documentation.pdf, LUCENE-6966-1.patch, 
> LUCENE-6966-2-docvalues.patch, LUCENE-6966-2.patch
>
>
> We would like to contribute a codec that enables the encryption of sensitive 
> data in the index that has been developed as part of an engagement with a 
> customer. We think that this could be of interest for the community.
> Below is a description of the project.
> h1. Introduction
> In comparison with approaches where all data is encrypted (e.g., file system 
> encryption, index output / directory encryption), encryption at a codec level 
> enables more fine-grained control on which block of data is encrypted. This 
> is more efficient since less data has to be encrypted. This also gives more 
> flexibility such as the ability to select which field to encrypt.
> Some of the requirements for this project were:
> * The performance impact of the encryption should be reasonable.
> * The user can choose which field to encrypt.
> * Key management: During the life cycle of the index, the user can provide a 
> new version of his encryption key. Multiple key versions should co-exist in 
> one index.
> h1. What is supported?
> - Block tree terms index and dictionary
> - Compressed stored fields format
> - Compressed term vectors format
> - Doc values format (prototype based on an encrypted index output) - this 
> will be submitted as a separate patch
> - Index upgrader: command to upgrade all the index segments with the latest 
> key version available.
> h1. How is it implemented?
> h2. Key Management
> One index segment is encrypted with a single key version. An index can have 
> multiple segments, each one encrypted using a different key version. The key 
> version for a segment is stored in the segment info.
> The provided codec is abstract, and a subclass is responsible for providing 
> an implementation of the cipher factory. The cipher factory is responsible 
> for creating a cipher instance based on a given key version.
> h2. Encryption Model
> The encryption model is based on AES/CBC with padding. The initialisation 
> vector (IV) is reused for performance reasons, but only on a per-format and 
> per-segment basis.
> While IV reuse is usually considered bad practice, the CBC mode is somewhat 
> resilient to IV reuse. The only "leak" of information that this could lead to 
> is being able to know that two encrypted blocks of data start with the same 
> prefix. However, it is unlikely that two data blocks in an index segment will 
> start with the same data:
> - Stored Fields Format: Each encrypted data block is a compressed block 
> (~4kb) of one or more documents. It is unlikely that two compressed blocks 
> start with the same data prefix.
> - Term Vectors: Each encrypted data block is a compressed block (~4kb) of 
> terms and payloads from one or more documents. It is unlikely that two 
> compressed blocks start with the same data prefix.
> - Term Dictionary Index: The term dictionary index is encoded and encrypted 
> in one single data block.
> - Term Dictionary Data: Each data block of the term dictionary encodes a set 
> of suffixes. It is unlikely to have two dictionary data blocks sharing the 
> same prefix within the same segment.
> - DocValues: A DocValues file will be composed of multiple encrypted data 
> blocks. It is unlikely to have two data blocks sharing the same prefix within 
> the same segment (each one encodes a list of values associated with a 
> field).
> To the best of our knowledge, this model should be safe. However, it would be 
> good if someone with security expertise in the community could review and 
> validate it. 
> h1. Performance
> We report here a performance benchmark we did on an early prototype based on 
> Lucene 4.x. The benchmark was performed on the Wikipedia dataset where all 
> the fields (id, title, body, date) were encrypted. Only the block tree terms 
> and compressed stored fields format were tested at that time. 
> h2. Indexing
> The indexing throughput slightly decreased and is roughly 15% less than with 
> the base Lucene. 
> The merge time increased by roughly 35%.
> There was no significant difference in terms of index size.
> h2. Query Throughput
> With respect to query throughput, we observed no significant impact on the 
> following 
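
As a rough illustration of the key-management scheme described above, a 
sketch using the JDK's javax.crypto API (the factory's name and shape are 
assumptions, not the attached code):

{code}
import java.security.GeneralSecurityException;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class VersionedCipherFactory {
  private final Map<Integer, byte[]> keysByVersion; // key version -> AES key

  VersionedCipherFactory(Map<Integer, byte[]> keysByVersion) {
    this.keysByVersion = keysByVersion;
  }

  // one segment is encrypted with a single key version, recorded in its
  // segment info; the factory resolves that version to a concrete cipher
  Cipher newCipher(int keyVersion, byte[] iv, int opMode)
      throws GeneralSecurityException {
    Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding"); // AES/CBC + padding
    cipher.init(opMode,
        new SecretKeySpec(keysByVersion.get(keyVersion), "AES"),
        new IvParameterSpec(iv));
    return cipher;
  }
}
{code}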

[jira] [Commented] (LUCENE-6966) Contribution: Codec for index-level encryption

2016-11-15 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667213#comment-15667213
 ] 

Renaud Delbru commented on LUCENE-6966:
---

Is there still interest from the community in considering this patch as a 
contribution? Even though it has limitations and therefore will not cover all 
possible scenarios, we think it provides an initial set of core features and 
a good starting point for future work. We have received multiple personal 
requests for this patch, which shows there is real interest in such a 
feature. I am also attaching initial technical documentation that explains 
how to use the codec and clarifies its current known limitations.

> Contribution: Codec for index-level encryption
> --
>
> Key: LUCENE-6966
> URL: https://issues.apache.org/jira/browse/LUCENE-6966
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/other
>Reporter: Renaud Delbru
>  Labels: codec, contrib
> Attachments: Encryption Codec Documentation.pdf, LUCENE-6966-1.patch, 
> LUCENE-6966-2-docvalues.patch, LUCENE-6966-2.patch
>
>
> We would like to contribute a codec that enables the encryption of sensitive 
> data in the index that has been developed as part of an engagement with a 
> customer. We think that this could be of interest for the community.
> Below is a description of the project.
> h1. Introduction
> In comparison with approaches where all data is encrypted (e.g., file system 
> encryption, index output / directory encryption), encryption at a codec level 
> enables more fine-grained control on which block of data is encrypted. This 
> is more efficient since less data has to be encrypted. This also gives more 
> flexibility such as the ability to select which field to encrypt.
> Some of the requirements for this project were:
> * The performance impact of the encryption should be reasonable.
> * The user can choose which field to encrypt.
> * Key management: During the life cycle of the index, the user can provide a 
> new version of his encryption key. Multiple key versions should co-exist in 
> one index.
> h1. What is supported?
> - Block tree terms index and dictionary
> - Compressed stored fields format
> - Compressed term vectors format
> - Doc values format (prototype based on an encrypted index output) - this 
> will be submitted as a separate patch
> - Index upgrader: command to upgrade all the index segments with the latest 
> key version available.
> h1. How is it implemented?
> h2. Key Management
> One index segment is encrypted with a single key version. An index can have 
> multiple segments, each one encrypted using a different key version. The key 
> version for a segment is stored in the segment info.
> The provided codec is abstract, and a subclass is responsible for providing 
> an implementation of the cipher factory. The cipher factory is responsible 
> for creating a cipher instance based on a given key version.
> h2. Encryption Model
> The encryption model is based on AES/CBC with padding. The initialisation 
> vector (IV) is reused for performance reasons, but only on a per-format and 
> per-segment basis.
> While IV reuse is usually considered bad practice, the CBC mode is somewhat 
> resilient to IV reuse. The only "leak" of information that this could lead to 
> is being able to know that two encrypted blocks of data start with the same 
> prefix. However, it is unlikely that two data blocks in an index segment will 
> start with the same data:
> - Stored Fields Format: Each encrypted data block is a compressed block 
> (~4kb) of one or more documents. It is unlikely that two compressed blocks 
> start with the same data prefix.
> - Term Vectors: Each encrypted data block is a compressed block (~4kb) of 
> terms and payloads from one or more documents. It is unlikely that two 
> compressed blocks start with the same data prefix.
> - Term Dictionary Index: The term dictionary index is encoded and encrypted 
> in one single data block.
> - Term Dictionary Data: Each data block of the term dictionary encodes a set 
> of suffixes. It is unlikely to have two dictionary data blocks sharing the 
> same prefix within the same segment.
> - DocValues: A DocValues file will be composed of multiple encrypted data 
> blocks. It is unlikely to have two data blocks sharing the same prefix within 
> the same segment (each one encodes a list of values associated with a 
> field).
> To the best of our knowledge, this model should be safe. However, it would be 
> good if someone with security expertise in the community could review and 
> validate it. 
> h1. Performance
> We report here a performance benchmark we did on an early prototype based on 
> Lucene 4.x. The benchmark was performed on the Wikipedia dataset where all 

[jira] [Commented] (LUCENE-7461) Refactor doc values queries to better use the new doc values APIs

2016-11-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667204#comment-15667204
 ] 

ASF subversion and git services commented on LUCENE-7461:
-

Commit 212b1d846235b06ec40fdf27cb969838072dca95 in lucene-solr's branch 
refs/heads/master from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=212b1d8 ]

LUCENE-7461: Refactor doc values queries to leverage the new iterator API.


> Refactor doc values queries to better use the new doc values APIs
> -
>
> Key: LUCENE-7461
> URL: https://issues.apache.org/jira/browse/LUCENE-7461
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-7461.patch
>
>
> The new doc values APIs make it easy to implement a TwoPhaseIterator, and 
> things are going to be faster in the sparse case since we can use the doc 
> values object as an approximation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7562) CompletionFieldsConsumer throws NPE on ghost fields

2016-11-15 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667197#comment-15667197
 ] 

Adrien Grand commented on LUCENE-7562:
--

I see BasePostingsFormatTestCase has a test case for ghost fields; maybe we 
should have a test that extends BasePostingsFormatTestCase with the 
CompletionPostingsFormat?
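
Something along these lines, perhaps (a sketch; the concrete format subclass 
and the codec wiring are assumptions):

{code}
import org.apache.lucene.codecs.Codec;
import org.apache.lucene.index.BasePostingsFormatTestCase;
import org.apache.lucene.search.suggest.document.Completion50PostingsFormat;
import org.apache.lucene.util.TestUtil;

public class TestCompletionPostingsFormat extends BasePostingsFormatTestCase {
  // wrap the completion postings format in a codec so the base class's
  // tests (including the ghost-field one) run against it
  private final Codec codec =
      TestUtil.alwaysPostingsFormat(new Completion50PostingsFormat());

  @Override
  protected Codec getCodec() {
    return codec;
  }
}
{code}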

> CompletionFieldsConsumer throws NPE on ghost fields
> ---
>
> Key: LUCENE-7562
> URL: https://issues.apache.org/jira/browse/LUCENE-7562
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: master (7.0), 6.4
>
> Attachments: LUCENE-7562.patch
>
>
> If you index {{SuggestField}} for some field X, but later delete all 
> documents with that field, it can cause a ghost situation where the field 
> infos believes field X exists yet the postings do not.
> I believe this bug is the root cause of this ES issue: 
> https://github.com/elastic/elasticsearch/issues/21500



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9284) The HDFS BlockDirectoryCache should not let its keysToRelease or names maps grow indefinitely.

2016-11-15 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667176#comment-15667176
 ] 

Steve Rowe commented on SOLR-9284:
--

OOM issues likely caused by commit here: 
https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/18288/

Also reproducible, from my Jenkins:

{noformat}
  [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=BlockDirectoryTest 
-Dtests.method=testEOF -Dtests.seed=81253F7E7D614B6C -Dtests.slow=true 
-Dtests.locale=bg -Dtests.timezone=Etc/UCT -Dtests.asserts=true 
-Dtests.file.encoding=ISO-8859-1
   [junit4] ERROR   1.41s | BlockDirectoryTest.testEOF <<<
   [junit4]> Throwable #1: java.lang.OutOfMemoryError: Direct buffer memory
   [junit4]>at java.nio.Bits.reserveMemory(Bits.java:693)
   [junit4]>at 
java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
   [junit4]>at 
java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
   [junit4]>at 
org.apache.solr.store.blockcache.BlockCache.<init>(BlockCache.java:68)
   [junit4]>at 
org.apache.solr.store.blockcache.BlockDirectoryTest.setUp(BlockDirectoryTest.java:119)
   [junit4]>at java.lang.Thread.run(Thread.java:745)
   [junit4]> Throwable #2: java.lang.NullPointerException
   [junit4]>at 
org.apache.solr.store.blockcache.BlockDirectoryTest.tearDown(BlockDirectoryTest.java:131)
 {noformat}

> The HDFS BlockDirectoryCache should not let its keysToRelease or names maps 
> grow indefinitely.
> ---
>
> Key: SOLR-9284
> URL: https://issues.apache.org/jira/browse/SOLR-9284
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: hdfs
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 6.2, master (7.0)
>
> Attachments: SOLR-9284.patch, SOLR-9284.patch
>
>
> https://issues.apache.org/jira/browse/SOLR-9284



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9768) RecordingJsonParser produces incomplete JSON when the updated document stream is larger than the input parser buffer size

2016-11-15 Thread Wojciech Stryszyk (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wojciech Stryszyk updated SOLR-9768:

Attachment: SOLR_9768_RecordingJsonParser_test.patch
SOLR_9768_RecordingJsonParser_fix.patch

> RecordingJsonParser produces incomplete JSON when the updated document 
> stream is larger than the input parser buffer size
> 
>
> Key: SOLR-9768
> URL: https://issues.apache.org/jira/browse/SOLR-9768
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: update
>Affects Versions: 6.1, 6.2, 6.3
>Reporter: Wojciech Stryszyk
>Priority: Minor
>  Labels: json, parsing, update_request_handler
> Attachments: SOLR_9768_RecordingJsonParser_fix.patch, 
> SOLR_9768_RecordingJsonParser_test.patch
>
>
> While using srcField, RecordingJsonParser produces incomplete JSON when the 
> updated document stream is larger than the buffer size (8192). 
> A quick fix is to align all documents in the stream to the buffer size; 
> another is attached in the patch below.
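
The boundary condition is easy to illustrate outside Solr (a toy sketch, not 
the RecordingJsonParser itself): a recorder must copy each filled buffer out 
before the next read overwrites it, otherwise everything before the final 
refill is lost.

{code}
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class RecordingReaderSketch {
  public static void main(String[] args) throws IOException {
    char[] buf = new char[8192];
    StringBuilder recorded = new StringBuilder();
    // a 10,000-char "document", i.e. larger than the 8192-char buffer
    Reader in = new StringReader(new String(new char[10000]).replace('\0', 'x'));
    int n;
    while ((n = in.read(buf)) != -1) {
      // copy every filled buffer out before it is overwritten by the next
      // read; recording only the final buffer state would keep just the
      // last 1808 chars and drop the first 8192
      recorded.append(buf, 0, n);
    }
    System.out.println(recorded.length()); // 10000
  }
}
{code}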



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


