[JENKINS] Lucene-Solr-master-Solaris (64bit/jdk1.8.0) - Build # 1062 - Still Unstable!

2017-01-06 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Solaris/1062/
Java: 64bit/jdk1.8.0 -XX:-UseCompressedOops -XX:+UseG1GC

1 tests failed.
FAILED:  org.apache.solr.update.SolrIndexMetricsTest.testIndexMetrics

Error Message:
minorMerge: 3 expected:<4> but was:<3>

Stack Trace:
java.lang.AssertionError: minorMerge: 3 expected:<4> but was:<3>
at __randomizedtesting.SeedInfo.seed([56E5B67BEE4E29C:C9BE66DB7E6A19A7]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at org.apache.solr.update.SolrIndexMetricsTest.testIndexMetrics(SolrIndexMetricsTest.java:70)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713)
at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:907)
at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:943)
at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:957)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:811)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:462)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:916)
at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:802)
at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:852)
at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at java.lang.Thread.run(Thread.java:745)




Build Log:
[...truncated 12733 lines...]
   [junit4] Suite: org.apache.solr.update.SolrIndexMetricsTest
   [junit4]   2> Creating dataDir: 

[jira] [Commented] (LUCENE-7614) Allow single prefix "phrase*" in complexphrase queryparser

2017-01-06 Thread ASF subversion and git services (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-7614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806949#comment-15806949 ]

ASF subversion and git services commented on LUCENE-7614:
-

Commit ac85a41cbefa7b0ea8c1b0b5c3ec9584d318a1cb in lucene-solr's branch 
refs/heads/branch_6x from [~mkhludnev]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ac85a41 ]

LUCENE-7614: ComplexPhraseQueryParser ignores quotes around single terms phrases


> Allow single prefix "phrase*" in complexphrase queryparser 
> ---
>
> Key: LUCENE-7614
> URL: https://issues.apache.org/jira/browse/LUCENE-7614
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Mikhail Khludnev
>Priority: Minor
> Fix For: master (7.0), 6.4
>
> Attachments: LUCENE-7614.patch, LUCENE-7614.patch
>
>
> {quote}
> From  Otmar Caduff 
> Subject   ComplexPhraseQueryParser with wildcards
> Date  Tue, 20 Dec 2016 13:55:42 GMT
> Hi,
> I have an index with a single document with a field "field" and textual
> content "johnny peters" and I am using
> org.apache.lucene.queryparser.complexPhrase.ComplexPhraseQueryParser to
> parse the query:
>field: (john* peter)
> When searching with this query, I am getting the document as expected.
> However with this query:
>field: ("john*" "peter")
> I am getting the following exception:
> Exception in thread "main" java.lang.IllegalArgumentException: Unknown
> query type "org.apache.lucene.search.PrefixQuery" found in phrase query
> string "john*"
> at
> org.apache.lucene.queryparser.complexPhrase.ComplexPhraseQueryParser$ComplexPhraseQuery.rewrite(ComplexPhraseQueryParser.java:268)
> {quote}
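
A minimal repro sketch of the quoted report (assuming a {{StandardAnalyzer}}; names follow the mail above). Before this commit, the second query parsed fine but blew up with the quoted {{IllegalArgumentException}} once it was rewritten against an index:

{code}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.complexPhrase.ComplexPhraseQueryParser;

public class ComplexPhraseRepro {
  public static void main(String[] args) throws Exception {
    ComplexPhraseQueryParser parser =
        new ComplexPhraseQueryParser("field", new StandardAnalyzer());
    // Unquoted prefix inside the phrase: always worked.
    System.out.println(parser.parse("field:(john* peter)"));
    // Quoted single-term "phrases": the parse succeeds, but before
    // LUCENE-7614 the resulting query failed at ComplexPhraseQuery.rewrite()
    // with: Unknown query type "org.apache.lucene.search.PrefixQuery" ...
    System.out.println(parser.parse("field:(\"john*\" \"peter\")"));
  }
}
{code}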



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7614) Allow single prefix "phrase*" in complexphrase queryparser

2017-01-06 Thread ASF subversion and git services (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-7614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806907#comment-15806907 ]

ASF subversion and git services commented on LUCENE-7614:
-

Commit 52f2a77b78fc95bc98d664411cda63d58606df52 in lucene-solr's branch 
refs/heads/master from [~mkhludnev]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=52f2a77 ]

LUCENE-7614: ComplexPhraseQueryParser ignores quotes around single terms phrases


> Allow single prefix "phrase*" in complexphrase queryparser 
> ---
>
> Key: LUCENE-7614
> URL: https://issues.apache.org/jira/browse/LUCENE-7614
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Mikhail Khludnev
>Priority: Minor
> Fix For: master (7.0), 6.4
>
> Attachments: LUCENE-7614.patch, LUCENE-7614.patch
>
>
> {quote}
> From  Otmar Caduff 
> Subject   ComplexPhraseQueryParser with wildcards
> Date  Tue, 20 Dec 2016 13:55:42 GMT
> Hi,
> I have an index with a single document with a field "field" and textual
> content "johnny peters" and I am using
> org.apache.lucene.queryparser.complexPhrase.ComplexPhraseQueryParser to
> parse the query:
>field: (john* peter)
> When searching with this query, I am getting the document as expected.
> However with this query:
>field: ("john*" "peter")
> I am getting the following exception:
> Exception in thread "main" java.lang.IllegalArgumentException: Unknown
> query type "org.apache.lucene.search.PrefixQuery" found in phrase query
> string "john*"
> at
> org.apache.lucene.queryparser.complexPhrase.ComplexPhraseQueryParser$ComplexPhraseQuery.rewrite(ComplexPhraseQueryParser.java:268)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-7620) UnifiedHighlighter: add target character width BreakIterator wrapper

2017-01-06 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-7620:
-
Attachment: LUCENE_7620_UH_LengthGoalBreakIterator.patch

Here's an update to the patch, mostly related to testing, to clarify what's being 
tested. And I did the {{createClosestToLength}} rename.

> UnifiedHighlighter: add target character width BreakIterator wrapper
> 
>
> Key: LUCENE-7620
> URL: https://issues.apache.org/jira/browse/LUCENE-7620
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: David Smiley
>Assignee: David Smiley
> Fix For: 6.4
>
> Attachments: LUCENE_7620_UH_LengthGoalBreakIterator.patch, 
> LUCENE_7620_UH_LengthGoalBreakIterator.patch, 
> LUCENE_7620_UH_LengthGoalBreakIterator.patch
>
>
> The original Highlighter includes a {{SimpleFragmenter}} that delineates 
> fragments (aka Passages) by a character width.  The default is 100 characters.
> It would be great to support something similar for the UnifiedHighlighter.  
> It's useful in its own right and of course it helps users transition to the 
> UH.  I'd like to do it as a wrapper to another BreakIterator -- perhaps a 
> sentence one.  In this way you get back Passages that are a number of 
> sentences so they will look nice instead of breaking mid-way through a 
> sentence.  And you get some control by specifying a target number of 
> characters.  This BreakIterator wouldn't be a general purpose 
> java.text.BreakIterator since it would assume it's called in a manner exactly 
> as the UnifiedHighlighter uses it.  It would probably be compatible with the 
> PostingsHighlighter too.
> I don't propose doing this by default; besides, it's easy enough to pick your 
> BreakIterator config.
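
As a rough usage sketch of how such a wrapper could be plugged in (assuming the factory name from the attached patch, {{LengthGoalBreakIterator.createClosestToLength}}, and the highlighter's protected {{getBreakIterator}} hook):

{code}
import java.text.BreakIterator;
import java.util.Locale;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.uhighlight.LengthGoalBreakIterator; // from the attached patch
import org.apache.lucene.search.uhighlight.UnifiedHighlighter;

public class LengthGoalExample {
  // Sketch: aim for ~100-character passages (SimpleFragmenter's default)
  // while still breaking on sentence boundaries.
  static UnifiedHighlighter lengthGoalHighlighter(IndexSearcher searcher, Analyzer analyzer) {
    return new UnifiedHighlighter(searcher, analyzer) {
      @Override
      protected BreakIterator getBreakIterator(String field) {
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ROOT);
        return LengthGoalBreakIterator.createClosestToLength(sentences, 100);
      }
    };
  }
}
{code}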



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-Tests-6.x - Build # 655 - Still Unstable

2017-01-06 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Tests-6.x/655/

1 tests failed.
FAILED:  junit.framework.TestSuite.org.apache.solr.ltr.model.TestLinearModel

Error Message:
Suite timeout exceeded (>= 7200000 msec).

Stack Trace:
java.lang.Exception: Suite timeout exceeded (>= 7200000 msec).
at __randomizedtesting.SeedInfo.seed([AC70116F38CD4793]:0)




Build Log:
[...truncated 19322 lines...]
   [junit4] Suite: org.apache.solr.ltr.model.TestLinearModel
   [junit4]   2> Creating dataDir: 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-6.x/solr/build/contrib/solr-ltr/test/J1/temp/solr.ltr.model.TestLinearModel_AC70116F38CD4793-001/init-core-data-001
   [junit4]   2> 36986 INFO  
(SUITE-TestLinearModel-seed#[AC70116F38CD4793]-worker) [] 
o.a.s.SolrTestCaseJ4 Randomized ssl (false) and clientAuth (false) via: 
@org.apache.solr.util.RandomizeSSL(reason=, ssl=NaN, value=NaN, clientAuth=NaN)
   [junit4]   2> 36986 INFO  
(SUITE-TestLinearModel-seed#[AC70116F38CD4793]-worker) [] 
o.a.s.SolrTestCaseJ4 initCore
   [junit4]   2> 36992 INFO  
(SUITE-TestLinearModel-seed#[AC70116F38CD4793]-worker) [] 
o.a.s.c.SolrConfig Using Lucene MatchVersion: 6.0.0
   [junit4]   2> 36999 INFO  
(SUITE-TestLinearModel-seed#[AC70116F38CD4793]-worker) [] 
o.a.s.s.IndexSchema [null] Schema name=example
   [junit4]   2> 37005 INFO  
(SUITE-TestLinearModel-seed#[AC70116F38CD4793]-worker) [] 
o.a.s.s.IndexSchema Loaded schema example/1.5 with uniqueid field id
   [junit4]   2> 37009 INFO  
(SUITE-TestLinearModel-seed#[AC70116F38CD4793]-worker) [] 
o.a.s.u.UpdateShardHandler Creating UpdateShardHandler HTTP client with params: 
socketTimeout=3=3=true
   [junit4]   2> 37010 WARN  
(SUITE-TestLinearModel-seed#[AC70116F38CD4793]-worker) [] 
o.a.s.m.r.SolrJmxReporter No serviceUrl or agentId was configured, using first 
MBeanServer.
   [junit4]   2> 37012 INFO  
(SUITE-TestLinearModel-seed#[AC70116F38CD4793]-worker) [] 
o.a.s.m.r.SolrJmxReporter JMX monitoring enabled at server: 
com.sun.jmx.mbeanserver.JmxMBeanServer@516a0efe
   [junit4]   2> 37012 WARN  
(SUITE-TestLinearModel-seed#[AC70116F38CD4793]-worker) [] 
o.a.s.m.r.SolrJmxReporter No serviceUrl or agentId was configured, using first 
MBeanServer.
   [junit4]   2> 37014 INFO  
(SUITE-TestLinearModel-seed#[AC70116F38CD4793]-worker) [] 
o.a.s.m.r.SolrJmxReporter JMX monitoring enabled at server: 
com.sun.jmx.mbeanserver.JmxMBeanServer@516a0efe
   [junit4]   2> 37014 WARN  
(SUITE-TestLinearModel-seed#[AC70116F38CD4793]-worker) [] 
o.a.s.m.r.SolrJmxReporter No serviceUrl or agentId was configured, using first 
MBeanServer.
   [junit4]   2> 37014 INFO  
(SUITE-TestLinearModel-seed#[AC70116F38CD4793]-worker) [] 
o.a.s.m.r.SolrJmxReporter JMX monitoring enabled at server: 
com.sun.jmx.mbeanserver.JmxMBeanServer@516a0efe
   [junit4]   2> 37020 INFO  (coreLoadExecutor-371-thread-1) [
x:collection1] o.a.s.c.SolrConfig Using Lucene MatchVersion: 6.0.0
   [junit4]   2> 37034 INFO  (coreLoadExecutor-371-thread-1) [
x:collection1] o.a.s.s.IndexSchema [collection1] Schema name=example
   [junit4]   2> 37040 INFO  (coreLoadExecutor-371-thread-1) [
x:collection1] o.a.s.s.IndexSchema Loaded schema example/1.5 with uniqueid 
field id
   [junit4]   2> 37042 INFO  (coreLoadExecutor-371-thread-1) [
x:collection1] o.a.s.c.CoreContainer Creating SolrCore 'collection1' using 
configuration from instancedir 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-6.x/solr/contrib/ltr/src/test-files/solr/collection1
   [junit4]   2> 37042 WARN  (coreLoadExecutor-371-thread-1) [
x:collection1] o.a.s.m.r.SolrJmxReporter No serviceUrl or agentId was 
configured, using first MBeanServer.
   [junit4]   2> 37043 INFO  (coreLoadExecutor-371-thread-1) [
x:collection1] o.a.s.m.r.SolrJmxReporter JMX monitoring enabled at server: 
com.sun.jmx.mbeanserver.JmxMBeanServer@516a0efe
   [junit4]   2> 37043 INFO  (coreLoadExecutor-371-thread-1) [
x:collection1] o.a.s.c.SolrCore [[collection1] ] Opening new SolrCore at 
[/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-6.x/solr/contrib/ltr/src/test-files/solr/collection1],
 
dataDir=[/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-Tests-6.x/solr/build/contrib/solr-ltr/test/J1/temp/solr.ltr.model.TestLinearModel_AC70116F38CD4793-001/init-core-data-001/]
   [junit4]   2> 37064 WARN  (coreLoadExecutor-371-thread-1) [
x:collection1] o.a.s.c.RequestHandlers no default request handler is registered 
(either '/select' or 'standard')
   [junit4]   2> 37072 INFO  (coreLoadExecutor-371-thread-1) [
x:collection1] o.a.s.u.UpdateHandler Using UpdateLog implementation: 
org.apache.solr.update.UpdateLog
   [junit4]   2> 37072 INFO  (coreLoadExecutor-371-thread-1) [
x:collection1] o.a.s.u.UpdateLog Initializing UpdateLog: dataDir= 
defaultSyncLevel=FLUSH numRecordsToKeep=100 maxNumLogsToKeep=10 
numVersionBuckets=65536
   [junit4]   2> 37072 INFO  

[jira] [Commented] (SOLR-9918) An UpdateRequestProcessor to skip duplicate inserts and ignore updates to missing docs

2017-01-06 Thread Koji Sekiguchi (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806603#comment-15806603 ]

Koji Sekiguchi commented on SOLR-9918:
--

Thank you for giving such a great explanation, which is more than I expected. :)

> An UpdateRequestProcessor to skip duplicate inserts and ignore updates to 
> missing docs
> --
>
> Key: SOLR-9918
> URL: https://issues.apache.org/jira/browse/SOLR-9918
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: update
>Reporter: Tim Owen
>Assignee: Koji Sekiguchi
> Attachments: SOLR-9918.patch, SOLR-9918.patch
>
>
> This is an UpdateRequestProcessor and Factory that we have been using in 
> production, to handle 2 common cases that were awkward to achieve using the 
> existing update pipeline and current processor classes:
> * When inserting document(s), if some already exist then quietly skip the new 
> document inserts - do not churn the index by replacing the existing documents 
> and do not throw a noisy exception that breaks the batch of inserts. By 
> analogy with SQL, {{insert if not exists}}. In our use-case, multiple 
> application instances can (rarely) process the same input so it's easier for 
> us to de-dupe these at Solr insert time than to funnel them into a global 
> ordered queue first.
> * When applying AtomicUpdate documents, if a document being updated does not 
> exist, quietly do nothing - do not create a new partially-populated document 
> and do not throw a noisy exception about missing required fields. By analogy 
> with SQL, {{update where id = ..}}. Our use-case relies on this because we 
> apply updates optimistically and have best-effort knowledge about what 
> documents will exist, so it's easiest to skip the updates (in the same way a 
> Database would).
> I would have kept this in our own package hierarchy but it relies on some 
> package-scoped methods, and seems like it could be useful to others if they 
> choose to configure it. Some bits of the code were borrowed from 
> {{DocBasedVersionConstraintsProcessorFactory}}.
> Attached patch has unit tests to confirm the behaviour.
> This class can be used by configuring solrconfig.xml like so:
> {noformat}
>   <updateRequestProcessorChain name="skipexisting">
>     <processor class="solr.LogUpdateProcessorFactory" />
>     <processor class="org.apache.solr.update.processor.SkipExistingDocumentsProcessorFactory">
>       <bool name="skipInsertIfExists">true</bool>
>       <bool name="skipUpdateIfMissing">false</bool> <!-- Optional -->
>     </processor>
>     <processor class="solr.DistributedUpdateProcessorFactory" />
>     <processor class="solr.RunUpdateProcessorFactory" />
>   </updateRequestProcessorChain>
> {noformat}
> and initParams defaults of
> {noformat}
>   <str name="update.chain">skipexisting</str>
> {noformat}
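
The behaviour described above boils down to two guards in {{processAdd}}; here is a hedged sketch of the shape, with the existence and atomic-update checks left as hypothetical placeholders (the real patch relies on package-scoped real-time-get internals):

{code}
import java.io.IOException;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

// Rough shape only; docExists() and isAtomicUpdate() are hypothetical
// stand-ins for the patch's package-scoped lookups.
public class SkipExistingDocumentsSketch extends UpdateRequestProcessor {
  private final boolean skipInsertIfExists;
  private final boolean skipUpdateIfMissing;

  public SkipExistingDocumentsSketch(UpdateRequestProcessor next,
                                     boolean skipInsertIfExists,
                                     boolean skipUpdateIfMissing) {
    super(next);
    this.skipInsertIfExists = skipInsertIfExists;
    this.skipUpdateIfMissing = skipUpdateIfMissing;
  }

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    boolean atomicUpdate = isAtomicUpdate(cmd);
    boolean exists = docExists(cmd);
    if (!atomicUpdate && skipInsertIfExists && exists) {
      return;  // "insert if not exists": quietly drop the duplicate insert
    }
    if (atomicUpdate && skipUpdateIfMissing && !exists) {
      return;  // "update where id = ..": quietly drop updates to missing docs
    }
    super.processAdd(cmd);  // otherwise continue down the chain as usual
  }

  private boolean isAtomicUpdate(AddUpdateCommand cmd) {
    throw new UnsupportedOperationException("hypothetical: detect atomic-update syntax");
  }

  private boolean docExists(AddUpdateCommand cmd) throws IOException {
    throw new UnsupportedOperationException("hypothetical: real-time-get lookup");
  }
}
{code}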



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-9918) An UpdateRequestProcessor to skip duplicate inserts and ignore updates to missing docs

2017-01-06 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi reassigned SOLR-9918:


Assignee: Koji Sekiguchi

> An UpdateRequestProcessor to skip duplicate inserts and ignore updates to 
> missing docs
> --
>
> Key: SOLR-9918
> URL: https://issues.apache.org/jira/browse/SOLR-9918
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: update
>Reporter: Tim Owen
>Assignee: Koji Sekiguchi
> Attachments: SOLR-9918.patch, SOLR-9918.patch
>
>
> This is an UpdateRequestProcessor and Factory that we have been using in 
> production, to handle 2 common cases that were awkward to achieve using the 
> existing update pipeline and current processor classes:
> * When inserting document(s), if some already exist then quietly skip the new 
> document inserts - do not churn the index by replacing the existing documents 
> and do not throw a noisy exception that breaks the batch of inserts. By 
> analogy with SQL, {{insert if not exists}}. In our use-case, multiple 
> application instances can (rarely) process the same input so it's easier for 
> us to de-dupe these at Solr insert time than to funnel them into a global 
> ordered queue first.
> * When applying AtomicUpdate documents, if a document being updated does not 
> exist, quietly do nothing - do not create a new partially-populated document 
> and do not throw a noisy exception about missing required fields. By analogy 
> with SQL, {{update where id = ..}}. Our use-case relies on this because we 
> apply updates optimistically and have best-effort knowledge about what 
> documents will exist, so it's easiest to skip the updates (in the same way a 
> Database would).
> I would have kept this in our own package hierarchy but it relies on some 
> package-scoped methods, and seems like it could be useful to others if they 
> choose to configure it. Some bits of the code were borrowed from 
> {{DocBasedVersionConstraintsProcessorFactory}}.
> Attached patch has unit tests to confirm the behaviour.
> This class can be used by configuring solrconfig.xml like so:
> {noformat}
>   <updateRequestProcessorChain name="skipexisting">
>     <processor class="solr.LogUpdateProcessorFactory" />
>     <processor class="org.apache.solr.update.processor.SkipExistingDocumentsProcessorFactory">
>       <bool name="skipInsertIfExists">true</bool>
>       <bool name="skipUpdateIfMissing">false</bool> <!-- Optional -->
>     </processor>
>     <processor class="solr.DistributedUpdateProcessorFactory" />
>     <processor class="solr.RunUpdateProcessorFactory" />
>   </updateRequestProcessorChain>
> {noformat}
> and initParams defaults of
> {noformat}
>   <str name="update.chain">skipexisting</str>
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5944) Support updates of numeric DocValues

2017-01-06 Thread Ishan Chattopadhyaya (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806566#comment-15806566 ]

Ishan Chattopadhyaya commented on SOLR-5944:


{quote}
 We're either going to need to change things as part of SOLR-9941 first, or 
deal with out of order DBQs differently as part of this issue
{quote}
I have really tried to think of various ways to "deal with out of order DBQs 
differently", but haven't found anything other than the current fetch-from-leader 
logic. I've even looked at ways to "undelete" a recently DBQ'd document, but 
that didn't look promising. There is likely no clean way, in a replica, to 
retroactively decide to ditch the partial update and do a full update instead 
(that decision was already taken in the past by the leader, so only going back 
to the leader for the full update/document can suffice here). Hence, I would 
think we need to address SOLR-9941.

> Support updates of numeric DocValues
> 
>
> Key: SOLR-5944
> URL: https://issues.apache.org/jira/browse/SOLR-5944
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>Assignee: Shalin Shekhar Mangar
> Attachments: DUP.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> TestStressInPlaceUpdates.eb044ac71.beast-167-failure.stdout.txt, 
> TestStressInPlaceUpdates.eb044ac71.beast-587-failure.stdout.txt, 
> TestStressInPlaceUpdates.eb044ac71.failures.tar.gz, defensive-checks.log.gz, 
> demo-why-dynamic-fields-cannot-be-inplace-updated-first-time.patch, 
> hoss.62D328FA1DEA57FD.fail.txt, hoss.62D328FA1DEA57FD.fail2.txt, 
> hoss.62D328FA1DEA57FD.fail3.txt, hoss.D768DD9443A98DC.fail.txt, 
> hoss.D768DD9443A98DC.pass.txt
>
>
> LUCENE-5189 introduced support for updates to numeric docvalues. It would be 
> really nice to have Solr support this.
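
For context, the Lucene-level primitive from LUCENE-5189 that this issue wants Solr to expose looks like this (a minimal, self-contained sketch; the field names are illustrative):

{code}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class InPlaceDocValuesUpdateDemo {
  public static void main(String[] args) throws Exception {
    try (Directory dir = new RAMDirectory();
         IndexWriter writer =
             new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
      Document doc = new Document();
      doc.add(new StringField("id", "DOCX", Field.Store.YES));
      doc.add(new NumericDocValuesField("foo_dvo", 41L));
      writer.addDocument(doc);
      writer.commit();
      // The in-place update: no document replacement, just a new value for
      // the numeric doc-values field of every doc matching the term.
      writer.updateNumericDocValue(new Term("id", "DOCX"), "foo_dvo", 42L);
      writer.commit();
    }
  }
}
{code}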



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9941) log replay redundently (pre-)applies DBQs as if they were out of order

2017-01-06 Thread Ishan Chattopadhyaya (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chattopadhyaya updated SOLR-9941:
---
Attachment: SOLR-9941.patch

The attached patch seems to fix the problem, and feels like the right thing to 
do. Based on Hoss' idea:
{quote}
it seems like at a minimum DUH2 could inspect the UpdateCommand flags to see if 
this is a {{REPLAY}} command and if it is skip the {{UpdateLog.getDBQNewer}} 
call?
{quote}

The corresponding failing test in the jira/solr-5944 branch passes with this patch 
(even after excluding the getUpdateLog().deleteAll() call).
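
For reference, the guard Hoss describes is tiny; a hedged sketch of the check (not necessarily how the attached patch spells it):

{code}
import org.apache.solr.update.UpdateCommand;

// Only look for "reordered" DBQs when the add did NOT come from tlog replay;
// during replay the DBQs will be applied in their own turn anyway.
static boolean shouldCheckReorderedDBQs(UpdateCommand cmd) {
  return (cmd.getFlags() & UpdateCommand.REPLAY) == 0;
}
{code}

In DUH2's add path, {{UpdateLog.getDBQNewer}} would then only be consulted when this returns true.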

> log replay redundently (pre-)applies DBQs as if they were out of order
> --
>
> Key: SOLR-9941
> URL: https://issues.apache.org/jira/browse/SOLR-9941
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Hoss Man
> Attachments: SOLR-9941.patch
>
>
> There's kind of an odd situation that arises when a Solr node starts up 
> (after a crash) and tries to recover from its tlog, which causes deletes to be 
> redundantly & excessively applied -- at a minimum it produces really 
> confusing log messages
> * {{UpdateLog.init(...)}} creates {{TransactionLog}} instances for the most 
> recent log files found (based on numRecordsToKeep) and then builds a 
> {{RecentUpdates}} instance from them
> * Delete entries from the {{RecentUpdates}} are used to populate 2 lists:
> ** {{deleteByQueries}}
> ** {{oldDeletes}} (for deleteById).
> * Then when {{UpdateLog.recoverFromLog}} is called, a {{LogReplayer}} is used 
> to replay any (uncommitted) {{TransactionLog}} entries
> ** during replay {{UpdateLog}} delegates to the UpdateRequestProcessorChain 
> for the various adds/deletes, etc...
> ** when an add makes it to {{RunUpdateProcessor}} it delegates to 
> {{DirectUpdateHandler2}}, which (independent of the fact that we're in log 
> replay) calls {{UpdateLog.getDBQNewer}} for every add, looking for any 
> "Reordered" deletes that have a version greater than the add
> *** if it finds _any_ DBQs "newer" than the document being added, it does a 
> low level {{IndexWriter.updateDocument}} and then immediately executes _all_ 
> the newer DBQs ... _once per add_
> ** these deletes are *also* still executed as part of the normal tlog replay, 
> because they are in the tlog.
> Which means if you are recovering from a tlog with 90 addDocs followed by 5 
> DBQs, each of those 5 DBQs will be executed 91 times (once pre-emptively for 
> every one of the 90 adds, plus once during normal replay) -- and for 
> 90 of those executions, a DUH2 INFO log message will say {{"Reordered DBQs 
> detected. ..."}} even though the only reason they are out of order is because 
> Solr is deliberately applying them out of order.
> * At a minimum we should improve the log messages
> * Ideally we should stop (pre-emptively) applying these deletes during tlog 
> replay.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-6.x-Windows (64bit/jdk1.8.0_112) - Build # 673 - Unstable!

2017-01-06 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Windows/673/
Java: 64bit/jdk1.8.0_112 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC

1 tests failed.
FAILED:  org.apache.lucene.index.TestDirectoryReader.testFilesOpenClose

Error Message:


Stack Trace:
java.lang.reflect.InvocationTargetException
at __randomizedtesting.SeedInfo.seed([EE4F33618F0F7F12:CA56815D76F2667B]:0)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.lucene.util.CommandLineUtil.newFSDirectory(CommandLineUtil.java:138)
at org.apache.lucene.util.LuceneTestCase.newFSDirectoryImpl(LuceneTestCase.java:1606)
at org.apache.lucene.util.LuceneTestCase.newFSDirectory(LuceneTestCase.java:1410)
at org.apache.lucene.util.LuceneTestCase.newFSDirectory(LuceneTestCase.java:1391)
at org.apache.lucene.util.LuceneTestCase.newFSDirectory(LuceneTestCase.java:1378)
at org.apache.lucene.index.TestDirectoryReader.testFilesOpenClose(TestDirectoryReader.java:449)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713)
at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:907)
at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:943)
at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:957)
at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:811)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:462)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:916)
at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:802)
at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:852)
at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)

[jira] [Commented] (LUCENE-7620) UnifiedHighlighter: add target character width BreakIterator wrapper

2017-01-06 Thread David Smiley (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-7620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806526#comment-15806526 ]

David Smiley commented on LUCENE-7620:
--

bq. (Tim) For the following method, does it make sense to return the baseIter 
if the followingIdx < startIndex? Maybe throw an exception instead or just have 
an assert that it's less?

It's already asserting (line 124).  Or I'm not understanding you.

bq.  (Tim) This is subjective, but I find it's more useful to break out the 
different tests with methods for each condition. For example: breakAtGoal, 
breakLessThanGoal, breakMoreThanGoal, breakGoalPlusRandom, etc. Similar for the 
defaultSummary tests. This helps when coming back to the test and helps tease 
apart if one piece of functionality is broken vs another.

Fair point.  A better compromise in my mind, not as verbose as your suggestion, 
is to use the "message" parameter of the assert methods.  I will do 
this and upload a new patch tonight.

bq.  (Jim) ... but I wonder if the logic to get the boundary could not be 
simplified.
Isn't it possible to always invoke baseIter.preceding(targetIdx) and based on 
isMinimumSize return current() or baseIter.next() ?

No; I don't think so. If one looks at {{preceding(target)}}, you don't know 
whether its result is closer to the target than the following break or not.  The 
"target" mode of this BI gets the _closest_ break.  Come to think of it, maybe 
I should rename {{createTargetLength}} to be {{createClosestToLength}}.  At 
least its javadocs are already clear, I think?
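
The "closest break" selection being discussed can be pictured with a simplified sketch (it ignores the iterator-state and range edge cases the real class must handle, and assumes {{setText}} was called and 0 < targetIdx < text length):

{code}
import java.text.BreakIterator;

public class ClosestBreakSketch {
  // Compare the boundary at-or-before the target with the one at-or-after
  // it, and keep whichever is nearer to the target.
  static int closestBreak(BreakIterator bi, int targetIdx) {
    int before = bi.preceding(targetIdx);     // last boundary < targetIdx
    int after = bi.following(targetIdx - 1);  // first boundary >= targetIdx
    if (before == BreakIterator.DONE) return after;
    if (after == BreakIterator.DONE) return before;
    return (targetIdx - before) <= (after - targetIdx) ? before : after;
  }
}
{code}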

> UnifiedHighlighter: add target character width BreakIterator wrapper
> 
>
> Key: LUCENE-7620
> URL: https://issues.apache.org/jira/browse/LUCENE-7620
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: David Smiley
>Assignee: David Smiley
> Fix For: 6.4
>
> Attachments: LUCENE_7620_UH_LengthGoalBreakIterator.patch, 
> LUCENE_7620_UH_LengthGoalBreakIterator.patch
>
>
> The original Highlighter includes a {{SimpleFragmenter}} that delineates 
> fragments (aka Passages) by a character width.  The default is 100 characters.
> It would be great to support something similar for the UnifiedHighlighter.  
> It's useful in its own right and of course it helps users transition to the 
> UH.  I'd like to do it as a wrapper to another BreakIterator -- perhaps a 
> sentence one.  In this way you get back Passages that are a number of 
> sentences so they will look nice instead of breaking mid-way through a 
> sentence.  And you get some control by specifying a target number of 
> characters.  This BreakIterator wouldn't be a general purpose 
> java.text.BreakIterator since it would assume it's called in a manner exactly 
> as the UnifiedHighlighter uses it.  It would probably be compatible with the 
> PostingsHighlighter too.
> I don't propose doing this by default; besides, it's easy enough to pick your 
> BreakIterator config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-7620) UnifiedHighlighter: add target character width BreakIterator wrapper

2017-01-06 Thread David Smiley (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-7620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806526#comment-15806526 ]

David Smiley edited comment on LUCENE-7620 at 1/7/17 2:21 AM:
--

bq. (Tim) For the following method, does it make sense to return the baseIter 
if the followingIdx < startIndex? Maybe throw an exception instead or just have 
an assert that it's less?

It's already asserting (line 124).  Or I'm not understanding you.

bq.  (Tim) This is subjective, but I find it's more useful to break out the 
different tests with methods for each condition. For example: breakAtGoal, 
breakLessThanGoal, breakMoreThanGoal, breakGoalPlusRandom, etc. Similar for the 
defaultSummary tests. This helps when coming back to the test and helps tease 
apart if one piece of functionality is broken vs another.

Fair point.  A better compromise in my mind, not as verbose as your suggestion, 
is to use the "message" parameter of the assert methods.  I will do 
this and upload a new patch tonight.

bq.  (Jim) ... but I wonder if the logic to get the boundary could not be 
simplified. Isn't it possible to always invoke baseIter.preceding(targetIdx) 
and based on isMinimumSize return current() or baseIter.next() ?

No; I don't think so. If one looks at {{preceding(target)}}, you don't know 
whether its result is closer to the target than the following break or not.  The 
"target" mode of this BI gets the _closest_ break.  Come to think of it, maybe 
I should rename {{createTargetLength}} to be {{createClosestToLength}}.  At 
least its javadocs are already clear, I think?


was (Author: dsmiley):
bq. (Tim) For the following method, does it make sense to return the baseIter 
if the followingIdx < startIndex? Maybe throw an exception instead or just have 
an assert that it's less?

It's already asserting (line 124).  Or I'm not understanding you.

bq.  (Tim) This is subjective, but I find it's more useful to break out the 
different tests with methods for each condition. For example: breakAtGoal, 
breakLessThanGoal, breakMoreThanGoal, breakGoalPlusRandom, etc. Similar for the 
defaultSummary tests. This helps when coming back to the test and helps tease 
apart if one piece of functionality is broken vs another.

Fair point.  A better compromise in my mind, not as verbose as your suggestion, 
is to use the "message" parameter of the assert methods.  I will do 
this and upload a new patch tonight.

bq.  (Jim) ... but I wonder if the logic to get the boundary could not be 
simplified.
Isn't it possible to always invoke baseIter.preceding(targetIdx) and based on 
isMinimumSize return current() or baseIter.next() ?

No; I don't think so. If one looks at {{preceding(target)}}, you don't know 
whether its result is closer to the target than the following break or not.  The 
"target" mode of this BI gets the _closest_ break.  Come to think of it, maybe 
I should rename {{createTargetLength}} to be {{createClosestToLength}}.  At 
least its javadocs are already clear, I think?

> UnifiedHighlighter: add target character width BreakIterator wrapper
> 
>
> Key: LUCENE-7620
> URL: https://issues.apache.org/jira/browse/LUCENE-7620
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: David Smiley
>Assignee: David Smiley
> Fix For: 6.4
>
> Attachments: LUCENE_7620_UH_LengthGoalBreakIterator.patch, 
> LUCENE_7620_UH_LengthGoalBreakIterator.patch
>
>
> The original Highlighter includes a {{SimpleFragmenter}} that delineates 
> fragments (aka Passages) by a character width.  The default is 100 characters.
> It would be great to support something similar for the UnifiedHighlighter.  
> It's useful in its own right and of course it helps users transition to the 
> UH.  I'd like to do it as a wrapper to another BreakIterator -- perhaps a 
> sentence one.  In this way you get back Passages that are a number of 
> sentences so they will look nice instead of breaking mid-way through a 
> sentence.  And you get some control by specifying a target number of 
> characters.  This BreakIterator wouldn't be a general purpose 
> java.text.BreakIterator since it would assume it's called in a manner exactly 
> as the UnifiedHighlighter uses it.  It would probably be compatible with the 
> PostingsHighlighter too.
> I don't propose doing this by default; besides, it's easy enough to pick your 
> BreakIterator config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-8292) TransactionLog.next() does not honor contract and return null for EOF

2017-01-06 Thread Cao Manh Dat (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806407#comment-15806407 ]

Cao Manh Dat edited comment on SOLR-8292 at 1/7/17 1:59 AM:


I think people are kind of misunderstanding this line
{code}
* @return The log record, or null if EOF
{code}
{{EOF}} here is not related to {{EOFException}}: {{EOF}} means the file was 
read all the way to the end, while an {{EOFException}} thrown by 
TransactionLog.next() means the file is corrupted.

For example 
{code:title=TransactionLog.java|borderStyle=solid}
codec.writeTag(JavaBinCodec.ARR, 3);
codec.writeInt(UpdateLog.ADD | flags);  // should just take one byte
codec.writeLong(cmd.getVersion());
codec.writeSolrInputDocument(cmd.getSolrInputDocument());
{code}

So when {{LogReader}} reads the tag {{JavaBinCodec.ARR = 3}}, it expects 
that there are 3 more elements to be read. But if the file has only 2 elements 
(because the file is corrupted/truncated), it will throw an {{EOFException}}.

FYI: I also wrote a test ({{TestCloudRecovery.corruptedLogTest()}}) to check 
that even if all the tlogs are corrupted/truncated, the collection can still 
become healthy after restart.

So in my opinion SOLR-4116 is quite general:
- if the system is restarted gracefully and the EOFException is still thrown, 
it's a bug;
- if the system is restarted roughly (kill -9), it's not a bug.

So we cannot be sure whether this is a bug or not without a way to reproduce 
it.
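
To make the EOF-vs-EOFException distinction concrete, here is a small self-contained illustration using a plain {{DataInputStream}} (deliberately not the JavaBinCodec API): a record that promises three elements but is truncated after two surfaces as an {{EOFException}}, not as a clean end-of-file:

{code}
import java.io.*;

public class TruncatedRecordDemo {
  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(buf);
    out.writeByte(3);   // like "ARR, 3": promise three elements
    out.writeLong(1L);
    out.writeLong(2L);  // third element never written: simulated truncation

    DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
    int n = in.readByte();
    try {
      for (int i = 0; i < n; i++) {
        System.out.println(in.readLong());
      }
    } catch (EOFException e) {
      // A corrupted/truncated record, not a clean EOF between records.
      System.out.println("truncated record: " + e);
    }
  }
}
{code}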


was (Author: caomanhdat):
I think people are kind of misunderstanding this line
{code}
* @return The log record, or null if EOF
{code}
{{EOF}} here is not related to {{EOFException}}: {{EOF}} means the file was 
read all the way to the end, while an {{EOFException}} thrown by 
TransactionLog.next() means the file is corrupted.

For example 
{code:title=TransactionLog.java|borderStyle=solid}
codec.writeTag(JavaBinCodec.ARR, 3);
codec.writeInt(UpdateLog.ADD | flags);  // should just take one byte
codec.writeLong(cmd.getVersion());
codec.writeSolrInputDocument(cmd.getSolrInputDocument());
{code}

So when {{LogReader}} reads the tag {{JavaBinCodec.ARR = 3}}, it expects 
that there are 3 more elements to be read. But if the file has only 2 elements 
(because the file is corrupted/truncated), it will throw an {{EOFException}}.

FYI: I also wrote a test ({{TestCloudRecovery.corruptedLogTest()}}) to check 
that even if all the tlogs are corrupted/truncated, the collection can still 
become healthy after restart.

So in my opinion SOLR-4116 is quite general:
- if the system is restarted gracefully and the EOFException is still thrown, 
it's a bug;
- if the system is restarted roughly (kill -9), it's not a bug.

> TransactionLog.next() does not honor contract and return null for EOF
> -
>
> Key: SOLR-8292
> URL: https://issues.apache.org/jira/browse/SOLR-8292
> Project: Solr
>  Issue Type: Bug
>Reporter: Erick Erickson
>Assignee: Erick Erickson
> Attachments: SOLR-8292.patch
>
>
> This came to light in CDCR testing, which stresses this code a lot; there's a 
> stack trace showing this line (641 on trunk) throwing an EOF exception:
> o = codec.readVal(fis);
> At first I thought to just wrap reading fis in a try/catch and return null, 
> but looking at the code a bit more I'm not so sure, that seems like it'd mask 
> what looks at first glance like a bug in the logic.
> A few lines earlier (633-4) there are these lines:
> // shouldn't currently happen - header and first record are currently written 
> at the same time
> if (fis.position() >= fos.size()) {
> Why are we comparing the input file position against the size of the 
> output file? Maybe because the 'i' key is right next to the 'o' key? The 
> comment hints that it's checking for the ability to read the first record in 
> input stream along with the header. And perhaps there's a different issue 
> here because the expectation clearly is that the first record should be there 
> if the header is.
> So what's the right thing to do? Wrap in a try/catch and return null for EOF? 
> Change the test? Do both?
> I can take care of either, but wanted a clue whether the comparison of fis to 
> fos is intended.
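
Of the options floated at the end, the first would look roughly like this (a sketch only, reusing {{codec}} and {{fis}} from the quoted snippet; whether it would mask the fis/fos comparison question is exactly what's being asked):

{code}
// Sketch of "wrap in a try/catch and return null for EOF":
Object o;
try {
  o = codec.readVal(fis);
} catch (EOFException e) {
  return null;  // honor the documented contract: null means EOF
}
{code}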



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5944) Support updates of numeric DocValues

2017-01-06 Thread Hoss Man (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806457#comment-15806457 ]

Hoss Man commented on SOLR-5944:


while reviewing some of ishan's recent commits to the jira/solr-5944 branch I 
was confused by how exactly 
{{TestRecovery.testLogReplayWithInPlaceUpdatesAndDeletes}} works and what it 
was doing, and asked ishan about it offline -- while he walked me through the 
test, we realized that at one point while writing it he had assumed some weird 
behavior he had seen relating to the update log replay was an artifact of the 
test harness, and he included this line to work around it...

{code}
  // Clearing the updatelog as it would happen after a fresh node restart
  h.getCore().getUpdateHandler().getUpdateLog().deleteAll();
{code}

...but the more we talked about it, the more it seemed like a legitimate bug -- 
either in the new code, or in the existing log replay code.

I've been investigating, and it seems like this is intentional, but weird, code 
on master; see SOLR-9941.

The net result is that the test as originally written (w/o that call to 
{{getUpdateLog().deleteAll()}}) really was finding a problematic situation with 
log recovery on the branch, because of how the existing DUH2 code tries to 
pre-emptively apply DBQs during log recovery.  We're either going to need to 
change things as part of SOLR-9941 first, or deal with out of order DBQs 
differently as part of this issue, because the current approach of "re-fetch 
whole doc from leader" won't work if the leader (or a single node install) is 
itself recovering from its tlog.



Here are some simple steps to demonstrate the problem as it stands on the 
jira/solr-5944 branch (as of 5db04fd)...

{noformat}
bin/solr -e techproducts

curl -X POST http://localhost:8983/solr/techproducts/config -H 'Content-Type: 
application/json' --data-binary 
'{"set-property":{"updateHandler.autoCommit.maxTime":"-1"}}'

curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field":{
 "name":"foo_dvo",
 "type":"int",
 "stored":false,
 "indexed":false,
 "docValues":true }
}' http://localhost:8983/solr/techproducts/schema

curl -X POST http://localhost:8983/solr/techproducts/update -H 'Content-Type: 
application/json' --data-binary '{
  "add":{
"doc": {
  "id": "DOCX",
  "foo_dvo": 41
}
   },
  "delete": { "query":"foo_dvo:42" },
  "delete": { "query":"foo_dvo:43" },
  "add":{
 "doc": {
  "id": "DOCX",
  "foo_dvo": { "inc" : "1" }
}
  },
  "delete": { "query":"foo_dvo:41" },
  "delete": { "query":"foo_dvo:43" },
  "add":{
"doc": {
  "id": "DOCX",
  "foo_dvo": { "inc" : "1" }
}
  },
  "delete": { "query":"foo_dvo:41" },
  "delete": { "query":"foo_dvo:42" }
}'

# verify the in-place atomic updates were applied correctly...
curl 'http://localhost:8983/solr/techproducts/get?wt=json&id=DOCX'
{
  "doc":
  {
"id":"DOCX",
"_version_":1555823554278195200,
"foo_dvo":43}}

# crash the node...
kill -9 PID # use whatever PID you get from "ps -ef | grep start.jar | grep 
techproducts"

# restart and let recovery from log replay happen...
bin/solr start -s example/techproducts/solr/

# because of how the DUH2/UpdateLog code interacts, the doc is deleted during 
one of the DBQs and the inplace update code can't recover it from anywhere 
else...
curl 'http://localhost:8983/solr/techproducts/get?wt=json&id=DOCX'
{
  "doc":null}

{noformat}



> Support updates of numeric DocValues
> 
>
> Key: SOLR-5944
> URL: https://issues.apache.org/jira/browse/SOLR-5944
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>Assignee: Shalin Shekhar Mangar
> Attachments: DUP.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> SOLR-5944.patch, SOLR-5944.patch, SOLR-5944.patch, 
> 

[jira] [Commented] (SOLR-9941) log replay redundently (pre-)applies DBQs as if they were out of order

2017-01-06 Thread Hoss Man (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806431#comment-15806431 ]

Hoss Man commented on SOLR-9941:



I'm guessing this situation exists because the {{RecentUpdates}} code was added 
to {{UpdateLog}} to handle (true) out of order updates while a system was 
"live", but it's not clear to me why the {{RecentUpdates}} are used the way 
they are during {{UpdateLog.init}} ... it seems like at a minimum DUH2 could 
inspect the UpdateCommand flags to see if this is a {{REPLAY}} command and if 
it is skip the {{UpdateLog.getDBQNewer}} call?

Here are some steps that make it easy to see how weird this situation can get...

{noformat}
bin/solr -e techproducts

curl -X POST http://localhost:8983/solr/techproducts/config -H 'Content-Type: 
application/json' --data-binary 
'{"set-property":{"updateHandler.autoCommit.maxTime":"-1"}}'

curl -X POST http://localhost:8983/solr/techproducts/update -H 'Content-Type: 
application/json' --data-binary '{
  "add":{
"doc": {
  "id": "DOCX",
  "foo_i": 41
}
   },
  "delete": { "query":"foo_i:42" },
  "delete": { "query":"foo_i:43" },
  "add":{
 "doc": {
  "id": "DOCX",
  "foo_i": { "inc" : "1" }
}
  },
  "delete": { "query":"foo_i:41" },
  "delete": { "query":"foo_i:43" },
  "add":{
"doc": {
  "id": "DOCX",
  "foo_i": { "inc" : "1" }
}
  },
  "delete": { "query":"foo_i:41" },
  "delete": { "query":"foo_i:42" }
}'

# verify the updates were applied correctly and doc didn't get deleted by 
mistake
curl 'http://localhost:8983/solr/techproducts/get?wt=json&id=DOCX'
{
  "doc":
  {
"id":"DOCX",
"foo_i":43,
"_version_":1555827152896655360}}


kill -9 PID # use whatever PID you get from "ps -ef | grep start.jar | grep 
techproducts"

bin/solr start -s example/techproducts/solr/

# re-verify the updates were applied correctly during replay
curl 'http://localhost:8983/solr/techproducts/get?wt=json&id=DOCX'
{
  "doc":
  {
"id":"DOCX",
"foo_i":43,
"_version_":1555827152896655360}}

{noformat}

And here's a snippet of solr's logs during the replay on (second) startup...

{noformat}
WARN  - 2017-01-07 01:27:45.408; [   x:techproducts] 
org.apache.solr.update.UpdateLog$LogReplayer; Starting log replay 
tlog{file=/home/hossman/lucene/dev/solr/example/techproducts/solr/techproducts/data/tlog/tlog.001
 refcount=2} active=false starting pos=0
INFO  - 2017-01-07 01:27:45.422; [   x:techproducts] 
org.apache.solr.update.DirectUpdateHandler2; Reordered DBQs detected.  
Update=add{flags=a,_version_=1555827152848420864,id=DOCX} 
DBQs=[DBQ{version=1555827152905043968,q=foo_i:42}, 
DBQ{version=1555827152897703936,q=foo_i:41}, 
DBQ{version=1555827152894558208,q=foo_i:43}, 
DBQ{version=1555827152883023872,q=foo_i:41}, 
DBQ{version=1555827152872538112,q=foo_i:43}, 
DBQ{version=1555827152849469440,q=foo_i:42}]
INFO  - 2017-01-07 01:27:45.434; [   x:techproducts] 
org.apache.solr.core.SolrCore; [techproducts]  webapp=null path=null 
params={q=static+firstSearcher+warming+in+solrconfig.xml&distrib=false&event=firstSearcher}
 hits=3 status=0 QTime=62
INFO  - 2017-01-07 01:27:45.446; [   x:techproducts] 
org.apache.solr.core.QuerySenderListener; QuerySenderListener done.
INFO  - 2017-01-07 01:27:45.446; [   x:techproducts] 
org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener; 
Loading spell index for spellchecker: default
INFO  - 2017-01-07 01:27:45.446; [   x:techproducts] 
org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener; 
Loading spell index for spellchecker: wordbreak
INFO  - 2017-01-07 01:27:45.448; [   x:techproducts] 
org.apache.solr.core.SolrCore; [techproducts] Registered new searcher 
Searcher@1826a6c8[techproducts] 
main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_0(7.0.0):C32)))}
INFO  - 2017-01-07 01:27:45.532; [   x:techproducts] 
org.apache.solr.search.SolrIndexSearcher; Opening 
[Searcher@5bd6b40f[techproducts] realtime]
INFO  - 2017-01-07 01:27:45.536; [   x:techproducts] 
org.apache.solr.update.DirectUpdateHandler2; Reordered DBQs detected.  
Update=add{flags=a,_version_=1555827152881975296,id=DOCX} 
DBQs=[DBQ{version=1555827152905043968,q=foo_i:42}, 
DBQ{version=1555827152897703936,q=foo_i:41}, 
DBQ{version=1555827152894558208,q=foo_i:43}, 
DBQ{version=1555827152883023872,q=foo_i:41}]
INFO  - 2017-01-07 01:27:45.546; [   x:techproducts] 
org.apache.solr.search.SolrIndexSearcher; Opening 
[Searcher@667a386f[techproducts] realtime]
INFO  - 2017-01-07 01:27:45.555; [   x:techproducts] 
org.apache.solr.update.DirectUpdateHandler2; Reordered DBQs detected.  
Update=add{flags=a,_version_=1555827152896655360,id=DOCX} 
DBQs=[DBQ{version=1555827152905043968,q=foo_i:42}, 
DBQ{version=1555827152897703936,q=foo_i:41}]
INFO  - 2017-01-07 01:27:45.573; [   x:techproducts] 
org.apache.solr.search.SolrIndexSearcher; Opening 
[Searcher@3ed8aa7e[techproducts] 

[jira] [Comment Edited] (SOLR-8292) TransactionLog.next() does not honor contract and return null for EOF

2017-01-06 Thread Cao Manh Dat (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806407#comment-15806407 ]

Cao Manh Dat edited comment on SOLR-8292 at 1/7/17 1:33 AM:


I think people are kind of misunderstanding this line
{code}
* @return The log record, or null if EOF
{code}
{{EOF}} here is not related to {{EOFException}}: {{EOF}} means the file was 
read all the way to the end, while an {{EOFException}} thrown by 
TransactionLog.next() means the file is corrupted.

For example 
{code:title=TransactionLog.java|borderStyle=solid}
codec.writeTag(JavaBinCodec.ARR, 3);
codec.writeInt(UpdateLog.ADD | flags);  // should just take one byte
codec.writeLong(cmd.getVersion());
codec.writeSolrInputDocument(cmd.getSolrInputDocument());
{code}

So when {{LogReader}} reads the tag {{JavaBinCodec.ARR = 3}}, it expects 
that there are 3 more elements to be read. But if the file has only 2 elements 
(because the file is corrupted/truncated), it will throw an {{EOFException}}.

FYI: I also wrote a test ({{TestCloudRecovery.corruptedLogTest()}}) to check 
that even if all the tlogs are corrupted/truncated, the collection can still 
become healthy after restart.

So in my opinion SOLR-4116 is quite general:
- if the system is restarted gracefully and the EOFException is still thrown, 
it's a bug;
- if the system is restarted roughly (kill -9), it's not a bug.


was (Author: caomanhdat):
I think people kinda of misunderstanding this line
{code}
* @return The log record, or null if EOF
{code} 
{{EOF}} here is not related to {{EOFException}}; {{EOF}} means the file has been 
fully read to the end, while an {{EOFException}} thrown by TransactionLog.next() 
means the file is corrupted. 

For example 
{code:title=TransactionLog.java|borderStyle=solid}
codec.writeTag(JavaBinCodec.ARR, 3);
codec.writeInt(UpdateLog.ADD | flags);  // should just take one byte
codec.writeLong(cmd.getVersion());
codec.writeSolrInputDocument(cmd.getSolrInputDocument());
{code}

So when {{LogReader}} reads the tag {{JavaBinCodec.ARR = 3}}, it expects 3 more 
elements to follow. But if the file has only 2 elements (because the file is 
corrupted), it will throw an {{EOFException}}.

> TransactionLog.next() does not honor contract and return null for EOF
> -
>
> Key: SOLR-8292
> URL: https://issues.apache.org/jira/browse/SOLR-8292
> Project: Solr
>  Issue Type: Bug
>Reporter: Erick Erickson
>Assignee: Erick Erickson
> Attachments: SOLR-8292.patch
>
>
> This came to light in CDCR testing, which stresses this code a lot, there's a 
> stack trace showing this line (641 trunk) throwing an EOF exception:
> o = codec.readVal(fis);
> At first I thought to just wrap reading fis in a try/catch and return null, 
> but looking at the code a bit more I'm not so sure, that seems like it'd mask 
> what looks at first glance like a bug in the logic.
> A few lines earlier (633-4) there are these lines:
> // shouldn't currently happen - header and first record are currently written 
> at the same time
> if (fis.position() >= fos.size()) {
> Why are we comparing the input file position against the size of the 
> output file? Maybe because the 'i' key is right next to the 'o' key? The 
> comment hints that it's checking for the ability to read the first record in 
> input stream along with the header. And perhaps there's a different issue 
> here because the expectation clearly is that the first record should be there 
> if the header is.
> So what's the right thing to do? Wrap in a try/catch and return null for EOF? 
> Change the test? Do both?
> I can take care of either, but wanted a clue whether the comparison of fis to 
> fos is intended.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-9941) log replay redundantly (pre-)applies DBQs as if they were out of order

2017-01-06 Thread Hoss Man (JIRA)
Hoss Man created SOLR-9941:
--

 Summary: log replay redundantly (pre-)applies DBQs as if they were 
out of order
 Key: SOLR-9941
 URL: https://issues.apache.org/jira/browse/SOLR-9941
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Hoss Man



There's kind of an odd situation that arises when a Solr node starts up (after 
a crash) and tries to recover from its tlog, causing deletes to be redundantly 
& excessively applied -- at a minimum it causes really confusing log messages

* {{UpdateLog.init(...)}} creates {{TransactionLog}} instances for the most 
recent log files found (based on numRecordsToKeep) and then builds a 
{{RecentUpdates}} instance from them
* Delete entries from the {{RecentUpdates}} are used to populate 2 lists:
** {{deleteByQueries}}
** {{oldDeletes}} (for deleteById).
* Then when {{UpdateLog.recoverFromLog}} is called, a {{LogReplayer}} is used to 
replay any (uncommitted) {{TransactionLog}} entries
** during replay {{UpdateLog}} delegates to the UpdateRequestProcessorChain for 
the various adds/deletes, etc...
** when an add makes it to {{RunUpdateProcessor}} it delegates to 
{{DirectUpdateHandler2}}, which (independent of the fact that we're in log 
replay) calls {{UpdateLog.getDBQNewer}} for every add, looking for any 
"Reordered" deletes that have a version greater then the add
*** if it finds _any_ DBQs "newer" then the document being added, it does a low 
level {{IndexWriter.updateDocument}} and then immediately executes _all_ the 
newer DBQs ... _once per add_
** these deletes are *also* still executed as part of the normal tlog replay, 
because they are in the tlog.

Which means if you are recovering from a tlog with 90 addDocs followed by 5 
DBQs, then *each* of those 5 DBQs will be executed 91 times -- and for 90 of 
those executions, a DUH2 INFO log message will say {{"Reordered DBQs 
detected. ..."}} even though the only reason they are out of order is that 
Solr is deliberately applying them out of order.
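
To make the arithmetic concrete, here is a rough sketch of the interaction 
(hypothetical names, not the actual DUH2/UpdateLog code):
{code}
for (Entry entry : tlogBeingReplayed) {
  if (entry.isAdd()) {
    // DUH2.addDoc(), unaware that we are in log replay, checks for "reordered" DBQs:
    List<DBQ> newer = updateLog.getDBQNewer(entry.getVersion());
    if (!newer.isEmpty()) {              // true for every add older than the DBQs
      writer.updateDocument(entry.getId(), entry.getDoc());
      for (DBQ dbq : newer) {
        executeDelete(dbq);              // _all_ newer DBQs, once per add
      }
    }
  } else if (entry.isDeleteByQuery()) {
    executeDelete(entry.asDBQ());        // the same DBQs run again during normal replay
  }
}
// 90 adds then 5 DBQs: each DBQ runs once per add (90x) plus once here = 91x
{code}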

* At a minimum we should improve the log messages
* Ideally we should stop (pre-emptively) applying these deletes during tlog 
replay.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8292) TransactionLog.next() does not honor contract and return null for EOF

2017-01-06 Thread Cao Manh Dat (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806407#comment-15806407
 ] 

Cao Manh Dat commented on SOLR-8292:


I think people are kind of misunderstanding this line:
{code}
* @return The log record, or null if EOF
{code} 
{{EOF}} here is not related to {{EOFException}}; {{EOF}} means the file has been 
fully read to the end, while an {{EOFException}} thrown by TransactionLog.next() 
means the file is corrupted. 

For example 
{code:title=TransactionLog.java|borderStyle=solid}
codec.writeTag(JavaBinCodec.ARR, 3);
codec.writeInt(UpdateLog.ADD | flags);  // should just take one byte
codec.writeLong(cmd.getVersion());
codec.writeSolrInputDocument(cmd.getSolrInputDocument());
{code}

So when {{LogReader}} reads the tag {{JavaBinCodec.ARR = 3}}, it expects 3 more 
elements to follow. But if the file has only 2 elements (because the file is 
corrupted), it will throw an {{EOFException}}.

> TransactionLog.next() does not honor contract and return null for EOF
> -
>
> Key: SOLR-8292
> URL: https://issues.apache.org/jira/browse/SOLR-8292
> Project: Solr
>  Issue Type: Bug
>Reporter: Erick Erickson
>Assignee: Erick Erickson
> Attachments: SOLR-8292.patch
>
>
> This came to light in CDCR testing, which stresses this code a lot, there's a 
> stack trace showing this line (641 trunk) throwing an EOF exception:
> o = codec.readVal(fis);
> At first I thought to just wrap reading fis in a try/catch and return null, 
> but looking at the code a bit more I'm not so sure, that seems like it'd mask 
> what looks at first glance like a bug in the logic.
> A few lines earlier (633-4) there are these lines:
> // shouldn't currently happen - header and first record are currently written 
> at the same time
> if (fis.position() >= fos.size()) {
> Why are we comparing the input file position against the size of the 
> output file? Maybe because the 'i' key is right next to the 'o' key? The 
> comment hints that it's checking for the ability to read the first record in 
> input stream along with the header. And perhaps there's a different issue 
> here because the expectation clearly is that the first record should be there 
> if the header is.
> So what's the right thing to do? Wrap in a try/catch and return null for EOF? 
> Change the test? Do both?
> I can take care of either, but wanted a clue whether the comparison of fis to 
> fos is intended.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-master-Windows (32bit/jdk1.8.0_112) - Build # 6339 - Unstable!

2017-01-06 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Windows/6339/
Java: 32bit/jdk1.8.0_112 -server -XX:+UseSerialGC

1 tests failed.
FAILED:  org.apache.solr.cloud.PeerSyncReplicationTest.test

Error Message:
timeout waiting to see all nodes active

Stack Trace:
java.lang.AssertionError: timeout waiting to see all nodes active
at 
__randomizedtesting.SeedInfo.seed([8EEB7ACCA81F5D78:6BF451606E33080]:0)
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.solr.cloud.PeerSyncReplicationTest.waitTillNodesActive(PeerSyncReplicationTest.java:311)
at 
org.apache.solr.cloud.PeerSyncReplicationTest.bringUpDeadNodeAndEnsureNoReplication(PeerSyncReplicationTest.java:262)
at 
org.apache.solr.cloud.PeerSyncReplicationTest.forceNodeFailureAndDoPeerSync(PeerSyncReplicationTest.java:244)
at 
org.apache.solr.cloud.PeerSyncReplicationTest.test(PeerSyncReplicationTest.java:133)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:907)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:943)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:957)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:985)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:960)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:811)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:462)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:916)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:802)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:852)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 

[jira] [Commented] (LUCENE-7620) UnifiedHighlighter: add target character width BreakIterator wrapper

2017-01-06 Thread Jim Ferenczi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806121#comment-15806121
 ] 

Jim Ferenczi commented on LUCENE-7620:
--

I think the two methods to create the break iterator are useful, but I wonder 
if the logic to get the boundary could be simplified.
Isn't it possible to always invoke baseIter.preceding(targetIdx) and, based on 
isMinimumSize, return current() or baseIter.next()?
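
Something like this (sketch only, reusing the patch's names):
{code}
baseIter.preceding(targetIdx);              // boundary before the target index
return isMinimumSize ? baseIter.next()      // min mode: step past the length goal
                     : baseIter.current();  // target mode: keep the preceding boundary
{code}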

> UnifiedHighlighter: add target character width BreakIterator wrapper
> 
>
> Key: LUCENE-7620
> URL: https://issues.apache.org/jira/browse/LUCENE-7620
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: David Smiley
>Assignee: David Smiley
> Fix For: 6.4
>
> Attachments: LUCENE_7620_UH_LengthGoalBreakIterator.patch, 
> LUCENE_7620_UH_LengthGoalBreakIterator.patch
>
>
> The original Highlighter includes a {{SimpleFragmenter}} that delineates 
> fragments (aka Passages) by a character width.  The default is 100 characters.
> It would be great to support something similar for the UnifiedHighlighter.  
> It's useful in its own right and of course it helps users transition to the 
> UH.  I'd like to do it as a wrapper to another BreakIterator -- perhaps a 
> sentence one.  In this way you get back Passages that are a number of 
> sentences so they will look nice instead of breaking mid-way through a 
> sentence.  And you get some control by specifying a target number of 
> characters.  This BreakIterator wouldn't be a general purpose 
> java.text.BreakIterator since it would assume it's called in a manner exactly 
> as the UnifiedHighlighter uses it.  It would probably be compatible with the 
> PostingsHighlighter too.
> I don't propose doing this by default; besides, it's easy enough to pick your 
> BreakIterator config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-7620) UnifiedHighlighter: add target character width BreakIterator wrapper

2017-01-06 Thread Timothy M. Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806094#comment-15806094
 ] 

Timothy M. Rodriguez edited comment on LUCENE-7620 at 1/6/17 11:09 PM:
---

Very useful!  I like that it decorates an underlying BreakIterator.  For the 
following method, does it make sense to return the baseIter if the followingIdx 
< startIndex?  Maybe throw an exception instead or just have an assert that 
it's less?

This is subjective, but I find it's more useful to break out the different 
tests with methods for each condition.  For example: breakAtGoal, 
breakLessThanGoal, breakMoreThanGoal, breakGoalPlusRandom,  etc. Similar for 
the defaultSummary tests.  This helps when coming back to the test and helps 
tease apart if one piece of functionality is broken vs another.


was (Author: timothy055):
Very useful!  I like that it decorates an underlying BreakIterator.  For the 
following method, does it make sense to return the baseIter if the followingIdx 
< startIndex?  Maybe throw an exception instead or just have an assert that 
it's less?

This is subjective, but I find it's more useful to break out the different 
tests with methods for each condition.  For example: breakAtGoal, 
breakLessThanGoal, breakMoreThanGoal, breakGoalPlusRandom,  etc. Similar for 
the defaultSummary tests.

> UnifiedHighlighter: add target character width BreakIterator wrapper
> 
>
> Key: LUCENE-7620
> URL: https://issues.apache.org/jira/browse/LUCENE-7620
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: David Smiley
>Assignee: David Smiley
> Fix For: 6.4
>
> Attachments: LUCENE_7620_UH_LengthGoalBreakIterator.patch, 
> LUCENE_7620_UH_LengthGoalBreakIterator.patch
>
>
> The original Highlighter includes a {{SimpleFragmenter}} that delineates 
> fragments (aka Passages) by a character width.  The default is 100 characters.
> It would be great to support something similar for the UnifiedHighlighter.  
> It's useful in its own right and of course it helps users transition to the 
> UH.  I'd like to do it as a wrapper to another BreakIterator -- perhaps a 
> sentence one.  In this way you get back Passages that are a number of 
> sentences so they will look nice instead of breaking mid-way through a 
> sentence.  And you get some control by specifying a target number of 
> characters.  This BreakIterator wouldn't be a general purpose 
> java.text.BreakIterator since it would assume it's called in a manner exactly 
> as the UnifiedHighlighter uses it.  It would probably be compatible with the 
> PostingsHighlighter too.
> I don't propose doing this by default; besides, it's easy enough to pick your 
> BreakIterator config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7620) UnifiedHighlighter: add target character width BreakIterator wrapper

2017-01-06 Thread Timothy M. Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806094#comment-15806094
 ] 

Timothy M. Rodriguez commented on LUCENE-7620:
--

Very useful!  I like that it decorates an underlying BreakIterator.  For the 
following method, does it make sense to return the baseIter if the followingIdx 
< startIndex?  Maybe throw an exception instead or just have an assert that 
it's less?

This is subjective, but I find it's more useful to break out the different 
tests with methods for each condition.  For example: breakAtGoal, 
breakLessThanGoal, breakMoreThanGoal, breakGoalPlusRandom,  etc. Similar for 
the defaultSummary tests.

> UnifiedHighlighter: add target character width BreakIterator wrapper
> 
>
> Key: LUCENE-7620
> URL: https://issues.apache.org/jira/browse/LUCENE-7620
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: David Smiley
>Assignee: David Smiley
> Fix For: 6.4
>
> Attachments: LUCENE_7620_UH_LengthGoalBreakIterator.patch, 
> LUCENE_7620_UH_LengthGoalBreakIterator.patch
>
>
> The original Highlighter includes a {{SimpleFragmenter}} that delineates 
> fragments (aka Passages) by a character width.  The default is 100 characters.
> It would be great to support something similar for the UnifiedHighlighter.  
> It's useful in its own right and of course it helps users transition to the 
> UH.  I'd like to do it as a wrapper to another BreakIterator -- perhaps a 
> sentence one.  In this way you get back Passages that are a number of 
> sentences so they will look nice instead of breaking mid-way through a 
> sentence.  And you get some control by specifying a target number of 
> characters.  This BreakIterator wouldn't be a general purpose 
> java.text.BreakIterator since it would assume it's called in a manner exactly 
> as the UnifiedHighlighter uses it.  It would probably be compatible with the 
> PostingsHighlighter too.
> I don't propose doing this by default; besides, it's easy enough to pick your 
> BreakIterator config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2017-01-06 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806078#comment-15806078
 ] 

Joel Bernstein commented on SOLR-8593:
--

Great!

If we had another week we could have gotten this into 6.4. But we'll have the 
entire 6.5 dev cycle to make sure it's ready to go.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Attachments: SOLR-8593.patch, SOLR-8593.patch
>
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-6.x-Solaris (64bit/jdk1.8.0) - Build # 601 - Still Unstable!

2017-01-06 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Solaris/601/
Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseParallelGC

1 tests failed.
FAILED:  org.apache.solr.update.HardAutoCommitTest.testCommitWithin

Error Message:
Exception during query

Stack Trace:
java.lang.RuntimeException: Exception during query
at 
__randomizedtesting.SeedInfo.seed([FFCC61ADF6F6B847:451E0ED575D85652]:0)
at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:818)
at 
org.apache.solr.update.HardAutoCommitTest.testCommitWithin(HardAutoCommitTest.java:100)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:907)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:943)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:957)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:811)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:462)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:916)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:802)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:852)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: REQUEST FAILED: 
xpath=//result[@numFound=1]
xml response was: 

00


request was:q=id:529&qt=standard&start=0&rows=20&version=2.2
at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:811)
... 40 more




Build Log:
[...truncated 11822 lines...]
   [junit4] Suite: 

[jira] [Updated] (SOLR-9937) StandardDirectoryFactory::move never uses atomic implementation

2017-01-06 Thread Mike Drob (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob updated SOLR-9937:

Summary: StandardDirectoryFactory::move never uses atomic implementation  
(was: StandardDirectoryFactory::move never uses more efficient implementation)

> StandardDirectoryFactory::move never uses atomic implementation
> ---
>
> Key: SOLR-9937
> URL: https://issues.apache.org/jira/browse/SOLR-9937
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Mike Drob
>Assignee: Mark Miller
> Attachments: SOLR-9937.patch
>
>
> {noformat}
>   Path path1 = ((FSDirectory) 
> baseFromDir).getDirectory().toAbsolutePath();
>   Path path2 = ((FSDirectory) 
> baseFromDir).getDirectory().toAbsolutePath();
>   
>   try {
> Files.move(path1.resolve(fileName), path2.resolve(fileName), 
> StandardCopyOption.ATOMIC_MOVE);
>   } catch (AtomicMoveNotSupportedException e) {
> Files.move(path1.resolve(fileName), path2.resolve(fileName));
>   }
> {noformat}
> Because {{path1 == path2}} this code never does anything and move always 
> defaults to the less efficient implementation in DirectoryFactory.
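
Presumably the fix is simply to derive {{path2}} from the destination directory; a 
sketch (assuming a {{baseToDir}} counterpart to {{baseFromDir}} is in scope -- not 
confirmed against the actual patch):
{noformat}
  Path path1 = ((FSDirectory) baseFromDir).getDirectory().toAbsolutePath();
  Path path2 = ((FSDirectory) baseToDir).getDirectory().toAbsolutePath(); // to, not from

  try {
    // atomic rename between the two directories, when the filesystem supports it
    Files.move(path1.resolve(fileName), path2.resolve(fileName),
        StandardCopyOption.ATOMIC_MOVE);
  } catch (AtomicMoveNotSupportedException e) {
    Files.move(path1.resolve(fileName), path2.resolve(fileName));
  }
{noformat}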



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: 6.4 release

2017-01-06 Thread David Smiley
LUCENE-7620 (with SOLR-9935 adapter) for UnifiedHighlighter Passage min or
target lengths.
I'll commit this weekend.  Very low risk of problems since it's an opt-in
feature on the Lucene side, and opt-in to the UH on the Solr side as well
(with the ability to disable this fragsize limit if you don't want it).

On Fri, Jan 6, 2017 at 4:40 PM Christine Poerschke (BLOOMBERG/ LONDON) <
cpoersc...@bloomberg.net> wrote:

> All done, thanks for checking. Have a good weekend.
>
> From: dev@lucene.apache.org At: 01/06/17 18:23:06
> To: dev@lucene.apache.org
> Subject: Re: 6.4 release
>
> Thanks all for your answers.
>
> Are we still good to build the first RC on Monday? Christine, do you maybe
> need more time?
>
>
> 2017-01-05 5:47 GMT+01:00 Shalin Shekhar Mangar :
>
> Aggregated metrics will not be in 6.4 -- that work is still in progress.
>
> On Thu, Jan 5, 2017 at 6:57 AM, S G  wrote:
> > +1 for adding the metric related changes.
> > Aggregated metrics from replicas sounds like a very nice thing to
> have.
> >
> > On Wed, Jan 4, 2017 at 12:11 PM, Varun Thacker 
> wrote:
> >>
> >> +1 to cut a release branch on monday. Lots of goodies in this release!
> >>
> >> On Tue, Jan 3, 2017 at 8:23 AM, jim ferenczi 
> >> wrote:
> >>>
> >>> Hi,
> >>> I would like to volunteer to release 6.4. I can cut the release branch
> >>> next Monday if everybody agrees.
> >>>
> >>> Jim
> >>
> >>
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>
> --
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


[jira] [Updated] (LUCENE-7620) UnifiedHighlighter: add target character width BreakIterator wrapper

2017-01-06 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-7620:
-
Fix Version/s: 6.4

> UnifiedHighlighter: add target character width BreakIterator wrapper
> 
>
> Key: LUCENE-7620
> URL: https://issues.apache.org/jira/browse/LUCENE-7620
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: David Smiley
>Assignee: David Smiley
> Fix For: 6.4
>
> Attachments: LUCENE_7620_UH_LengthGoalBreakIterator.patch, 
> LUCENE_7620_UH_LengthGoalBreakIterator.patch
>
>
> The original Highlighter includes a {{SimpleFragmenter}} that delineates 
> fragments (aka Passages) by a character width.  The default is 100 characters.
> It would be great to support something similar for the UnifiedHighlighter.  
> It's useful in its own right and of course it helps users transition to the 
> UH.  I'd like to do it as a wrapper to another BreakIterator -- perhaps a 
> sentence one.  In this way you get back Passages that are a number of 
> sentences so they will look nice instead of breaking mid-way through a 
> sentence.  And you get some control by specifying a target number of 
> characters.  This BreakIterator wouldn't be a general purpose 
> java.text.BreakIterator since it would assume it's called in a manner exactly 
> as the UnifiedHighlighter uses it.  It would probably be compatible with the 
> PostingsHighlighter too.
> I don't propose doing this by default; besides, it's easy enough to pick your 
> BreakIterator config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-7620) UnifiedHighlighter: add target character width BreakIterator wrapper

2017-01-06 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-7620:
-
Attachment: LUCENE_7620_UH_LengthGoalBreakIterator.patch

Here's an updated patch.  I added assertions, not exceptions, because if by 
chance this circumstance happens in production, it's really okay to return a 
possibly wrong break and have a passage that isn't quite the ideal size, 
rather than throw some exception.

It now has 2 modes of operation, with 2 corresponding factory methods to 
clarify which: {{createMinLength(...)}} and {{createTargetLength(...)}}.  The 
minLength mode might be useful because it's faster (than target).  I think it's 
more useful than a MaxLength (which still could be added in the future) because 
a too-long passage can possibly be trimmed by the client, but the reverse is 
not true -- you can't lengthen a passage that is too short (if it reaches the 
client talking to a search server).
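
For illustration, usage would presumably look something like this (a sketch 
assuming the patch's {{LengthGoalBreakIterator}} class and the factory names 
above; not taken verbatim from the patch):
{code}
BreakIterator base = BreakIterator.getSentenceInstance(Locale.ROOT);
// min mode: passages end at the first sentence boundary at or after ~100 chars
BreakIterator minLen = LengthGoalBreakIterator.createMinLength(base, 100);
// target mode: passages end at the sentence boundary closest to ~100 chars
BreakIterator target = LengthGoalBreakIterator.createTargetLength(base, 100);
{code}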

I did some benchmarking too, which in addition to measuring the overhead also 
helped ensure it didn't throw exceptions (at least for the test queries & test 
data); no exceptions were ever thrown.  I squashed bugs in the test and chose 
sizes to tease out the edge conditions.  In so doing I found a minor bug with 
CustomSeparatorBreakIterator, but I'll leave that for another time.  
Benchmarking showed that minLength is noticeably faster than targetLength, 
maybe 10% overall.  Also (something I already knew), I observed that a "cheap" 
underlying BreakIterator like CustomSeparatorBreakIterator is ~20% faster than 
a JDK sentence one.

I'll commit it this weekend, or possibly tonight if you review it positively 
in time.

> UnifiedHighlighter: add target character width BreakIterator wrapper
> 
>
> Key: LUCENE-7620
> URL: https://issues.apache.org/jira/browse/LUCENE-7620
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: David Smiley
>Assignee: David Smiley
> Attachments: LUCENE_7620_UH_LengthGoalBreakIterator.patch, 
> LUCENE_7620_UH_LengthGoalBreakIterator.patch
>
>
> The original Highlighter includes a {{SimpleFragmenter}} that delineates 
> fragments (aka Passages) by a character width.  The default is 100 characters.
> It would be great to support something similar for the UnifiedHighlighter.  
> It's useful in its own right and of course it helps users transition to the 
> UH.  I'd like to do it as a wrapper to another BreakIterator -- perhaps a 
> sentence one.  In this way you get back Passages that are a number of 
> sentences so they will look nice instead of breaking mid-way through a 
> sentence.  And you get some control by specifying a target number of 
> characters.  This BreakIterator wouldn't be a general purpose 
> java.text.BreakIterator since it would assume it's called in a manner exactly 
> as the UnifiedHighlighter uses it.  It would probably be compatible with the 
> PostingsHighlighter too.
> I don't propose doing this by default; besides, it's easy enough to pick your 
> BreakIterator config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-Tests-6.x - Build # 654 - Still Unstable

2017-01-06 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Tests-6.x/654/

1 tests failed.
FAILED:  org.apache.solr.cloud.CollectionsAPISolrJTest.testSplitShard

Error Message:
Error from server at https://127.0.0.1:57624/solr: Could not fully remove 
collection: solrj_test_splitshard

Stack Trace:
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
from server at https://127.0.0.1:57624/solr: Could not fully remove collection: 
solrj_test_splitshard
at 
__randomizedtesting.SeedInfo.seed([8E7578B987654669:557FD5D599907AD6]:0)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:610)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:279)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:268)
at 
org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:435)
at 
org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:387)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1344)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1095)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:1037)
at 
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:149)
at 
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:166)
at 
org.apache.solr.cloud.CollectionsAPISolrJTest.testSplitShard(CollectionsAPISolrJTest.java:143)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:907)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:943)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:957)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:811)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:462)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:916)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:802)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:852)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 

[jira] [Updated] (SOLR-9940) Config API throws Index Not Mutable when creating a RequestHandler on a Classic Schema

2017-01-06 Thread Webster Homer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Webster Homer updated SOLR-9940:

Priority: Minor  (was: Major)

> Config API throws Index Not Mutable when creating a RequestHandler on a 
> Classic Schema
> --
>
> Key: SOLR-9940
> URL: https://issues.apache.org/jira/browse/SOLR-9940
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: config-api
>Affects Versions: 6.2
> Environment: linux
>Reporter: Webster Homer
>Priority: Minor
>
> All of our schemas are Classic; we do not use mutable schemas.
> We use the Config API to create /cdcr request handlers: 
> {code}
> {
>   "add-requesthandler":{
> "name":"/cdcr",
> "class":"solr.CdcrRequestHandler",
> "replica": [{
>   "zkHost":"stldeepx20.sial.com:2181/solr",
>   "source":"sial-catalog-material",
>   "target":"sial-catalog-material"
> }, {
>   "zkHost":"stldeepx06.sial.com:2181/solr",
>   "source":"sial-catalog-material",
>   "target":"sial-catalog-material"
> }],
> "replicator": {
>   "threadPoolSize":2,
>   "schedule": 1000,
>   "batchSize": 128
> },
> "updateLogSynchronizer" : {
>   "schedule": 6
> }
>   }
> }
> {code}
> The actual handler is generated by an endpoint.
> The handler is created and functional, but the following errors are 
> present in the log:
> {code}
> 2017-01-06 21:34:12.377 ERROR (SolrConfigHandler-refreshconf) 
> [c:test-catalog-product s:shard1 r:core_node1 x:test-catalog-product_s
> hard1_replica1] o.a.s.s.IndexSchema This IndexSchema is not mutable.
> 2017-01-06 21:34:12.377 WARN  (SolrConfigHandler-refreshconf) 
> [c:test-catalog-product s:shard1 r:core_node1 x:test-catalog-product_s
> hard1_replica1] o.a.s.c.SolrCore 
> org.apache.solr.common.SolrException: This IndexSchema is not mutable.
> at 
> org.apache.solr.schema.IndexSchema.getSchemaUpdateLock(IndexSchema.java:1864)
> at 
> org.apache.solr.schema.SchemaManager.getFreshManagedSchema(SchemaManager.java:434)
> at 
> org.apache.solr.core.SolrCore.lambda$getConfListener$6(SolrCore.java:2571)
> at org.apache.solr.core.SolrCore$$Lambda$91/766533008.run(Unknown 
> Source)
> at 
> org.apache.solr.handler.SolrConfigHandler$Command.lambda$handleGET$0(SolrConfigHandler.java:216)
> at 
> org.apache.solr.handler.SolrConfigHandler$Command$$Lambda$143/2029435585.run(Unknown
>  Source)
> at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8029) Modernize and standardize Solr APIs

2017-01-06 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805884#comment-15805884
 ] 

ASF subversion and git services commented on SOLR-8029:
---

Commit d9caa8082c0e04909fd4d6b9095464ed452742b1 in lucene-solr's branch 
refs/heads/apiv2 from [~noble.paul]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d9caa80 ]

SOLR-8029: changed command id to a parameter instead of path segment


> Modernize and standardize Solr APIs
> ---
>
> Key: SOLR-8029
> URL: https://issues.apache.org/jira/browse/SOLR-8029
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 6.0
>Reporter: Noble Paul
>Assignee: Noble Paul
>  Labels: API, EaseOfUse
> Fix For: 6.0
>
> Attachments: SOLR-8029.patch, SOLR-8029.patch, SOLR-8029.patch, 
> SOLR-8029.patch
>
>
> Solr APIs have organically evolved and they are sometimes inconsistent with 
> each other or not in sync with the widely followed conventions of HTTP 
> protocol. Trying to make incremental changes to make them modern is like 
> applying band-aid. So, we have done a complete rethink of what the APIs 
> should be. The most notable aspects of the API are as follows:
> The new set of APIs will be placed under a new path {{/solr2}}. The legacy 
> APIs will continue to work under the {{/solr}} path as they used to and they 
> will be eventually deprecated.
> There are 4 types of requests in the new API:
> * {{/v2/<collection>/*}} : Hit a collection directly or manage 
> collections/shards/replicas 
> * {{/v2/<core>/*}} : Hit a core directly or manage cores 
> * {{/v2/cluster/*}} : Operations on the cluster not pertaining to any 
> collection or core, e.g. security, overseer ops, etc.
> This will be released as part of a major release. Check the link given below 
> for the full specification.  Your comments are welcome
> [Solr API version 2 Specification | http://bit.ly/1JYsBMQ]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-9940) Config API throws Index Not Mutable when creating a RequestHandler on a Classic Schema

2017-01-06 Thread Webster Homer (JIRA)
Webster Homer created SOLR-9940:
---

 Summary: Config API throws Index Not Mutable when creating a 
RequestHandler on a Classic Schema
 Key: SOLR-9940
 URL: https://issues.apache.org/jira/browse/SOLR-9940
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: config-api
Affects Versions: 6.2
 Environment: linux
Reporter: Webster Homer


All of our schemas are Classic; we do not use mutable schemas.
We use the Config API to create /cdcr request handlers: 
{code}
{
  "add-requesthandler":{
"name":"/cdcr",
"class":"solr.CdcrRequestHandler",
"replica": [{
"zkHost":"stldeepx20.sial.com:2181/solr",
"source":"sial-catalog-material",
"target":"sial-catalog-material"
}, {
"zkHost":"stldeepx06.sial.com:2181/solr",
"source":"sial-catalog-material",
"target":"sial-catalog-material"
}],
"replicator": {
  "threadPoolSize":2,
  "schedule": 1000,
  "batchSize": 128
},
"updateLogSynchronizer" : {
  "schedule": 6
}
  }
}
{code}
The actual handler is generated by an endpoint.
The handler is created and functional, but the following errors are present 
in the log:
{code}
2017-01-06 21:34:12.377 ERROR (SolrConfigHandler-refreshconf) 
[c:test-catalog-product s:shard1 r:core_node1 x:test-catalog-product_s
hard1_replica1] o.a.s.s.IndexSchema This IndexSchema is not mutable.
2017-01-06 21:34:12.377 WARN  (SolrConfigHandler-refreshconf) 
[c:test-catalog-product s:shard1 r:core_node1 x:test-catalog-product_s
hard1_replica1] o.a.s.c.SolrCore 
org.apache.solr.common.SolrException: This IndexSchema is not mutable.
at 
org.apache.solr.schema.IndexSchema.getSchemaUpdateLock(IndexSchema.java:1864)
at 
org.apache.solr.schema.SchemaManager.getFreshManagedSchema(SchemaManager.java:434)
at 
org.apache.solr.core.SolrCore.lambda$getConfListener$6(SolrCore.java:2571)
at org.apache.solr.core.SolrCore$$Lambda$91/766533008.run(Unknown 
Source)
at 
org.apache.solr.handler.SolrConfigHandler$Command.lambda$handleGET$0(SolrConfigHandler.java:216)
at 
org.apache.solr.handler.SolrConfigHandler$Command$$Lambda$143/2029435585.run(Unknown
 Source)
at java.lang.Thread.run(Thread.java:745)
{code}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2017-01-06 Thread Kevin Risden (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805873#comment-15805873
 ] 

Kevin Risden commented on SOLR-8593:


Also merged master into jira/solr-8593 and fixed the merge conflicts. They 
weren't bad, just some minor changes.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Attachments: SOLR-8593.patch, SOLR-8593.patch
>
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-9939) Ping handler logs each request twice

2017-01-06 Thread Shawn Heisey (JIRA)
Shawn Heisey created SOLR-9939:
--

 Summary: Ping handler logs each request twice
 Key: SOLR-9939
 URL: https://issues.apache.org/jira/browse/SOLR-9939
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Affects Versions: 6.4
Reporter: Shawn Heisey
Priority: Minor


Requests to the ping handler are being logged twice.  The first line has "hits" 
and the second one doesn't, but other than that they have the same info.

These lines are from a 5.3.2-SNAPSHOT version.  In the IRC channel, [~ctargett] 
confirmed that this also happens in 6.4-SNAPSHOT.

{noformat}
2017-01-06 14:16:37.253 INFO  (qtp1510067370-186262) [   x:sparkmain] 
or.ap.so.co.So.Request [sparkmain] webapp=/solr path=/admin/ping params={} 
hits=400271103 status=0 QTime=4
2017-01-06 14:16:37.253 INFO  (qtp1510067370-186262) [   x:sparkmain] 
or.ap.so.co.So.Request [sparkmain] webapp=/solr path=/admin/ping params={} 
status=0 QTime=4
{noformat}

Unless there's a good reason to have it that I'm not aware of, the second log 
should be removed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9939) Ping handler logs each request twice

2017-01-06 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805862#comment-15805862
 ] 

Shawn Heisey commented on SOLR-9939:


When I have some time I can look for the problem and try to fix it, but if 
anybody else wants the issue, feel free to take it.

> Ping handler logs each request twice
> 
>
> Key: SOLR-9939
> URL: https://issues.apache.org/jira/browse/SOLR-9939
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 6.4
>Reporter: Shawn Heisey
>Priority: Minor
>
> Requests to the ping handler are being logged twice.  The first line has 
> "hits" and the second one doesn't, but other than that they have the same 
> info.
> These lines are from a 5.3.2-SNAPSHOT version.  In the IRC channel, 
> [~ctargett] confirmed that this also happens in 6.4-SNAPSHOT.
> {noformat}
> 2017-01-06 14:16:37.253 INFO  (qtp1510067370-186262) [   x:sparkmain] 
> or.ap.so.co.So.Request [sparkmain] webapp=/solr path=/admin/ping params={} 
> hits=400271103 status=0 QTime=4
> 2017-01-06 14:16:37.253 INFO  (qtp1510067370-186262) [   x:sparkmain] 
> or.ap.so.co.So.Request [sparkmain] webapp=/solr path=/admin/ping params={} 
> status=0 QTime=4
> {noformat}
> Unless there's a good reason to have it that I'm not aware of, the second log 
> should be removed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: AutomatonTermsEnum and fixed-string automata

2017-01-06 Thread Michael McCandless
Unfortunately I think that's somewhat dangerous because it creates an
ambiguous API with a nasty performance trap?

I.e. this new method won't invoke the fast Terms.intersect in the
default terms dict?

Mike McCandless

http://blog.mikemccandless.com


On Fri, Jan 6, 2017 at 3:20 PM, Alan Woodward  wrote:
> Hm, how about something like this, on CompiledAutomaton:
>
> public TermsEnum getTermsEnum(TermsEnum te) throws IOException {
>   switch (type) {
> case NONE:
>   return TermsEnum.EMPTY;
> case ALL:
>   return te;
> case SINGLE:
>   return new SingleTermsEnum(te, term);
> case NORMAL:
>   return new AutomatonTermsEnum(te, this);
> default:
>   // unreachable
>   throw new RuntimeException("unhandled case");
>   }
> }
>
>
> Alan Woodward
> www.flax.co.uk
>
>
> On 6 Jan 2017, at 19:16, Michael McCandless 
> wrote:
>
> These automaton intersection APIs are frustrating with all the special
> case handling... Ideas welcome!
>
> We've had similar challenges with them in the past, when a user
> invoked Terms.intersect directly instead of via CompiledAutomaton:
> https://issues.apache.org/jira/browse/LUCENE-7576
>
> The problem is CompiledAutomaton specializes certain cases (all
> strings match, no strings match, single term) and sidesteps
> Terms.intersect for those cases.
>
> We should fix AutomatonTermsEnum public ctor w/ the same checks
> (insist on a NORMAL case) so you don't hit assert failures, or, worse
> ... I'll do that.
>
> I think a new CompiledAutomaton.intersect taking TermsEnum would be
> tricky in general because it relies on the (efficient) Terms.intersect
> to handle the NORMAL case well, but we can't invoke that from a
> TermsEnum.
>
> In the SINGLE case, could you use SingleTermsEnum, passing the
> TermsEnum from your doc values, and the term from the
> CompiledAutomaton?  Would that suffice as a workaround?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Fri, Jan 6, 2017 at 11:17 AM, Alan Woodward  wrote:
>
> We’ve hit an issue while developing marple, where we want to have the
> ability to filter the values from a SortedDocValues terms dictionary.
> Normally you’d create a CompiledAutomaton from the filter string, and then
> call #getTermsEnum(Terms) on it; but for docvalues, we don’t have a Terms
> instance, we instead have a TermsEnum.
>
> Using AutomatonTermsEnum to wrap the TermsEnum works in most cases here, but
> if the CompiledAutomaton in question is a fixed string, then we get
> assertion failures, because ATE uses the compiled automaton’s internal 
> ByteRunAutomaton for filtering, and fixed-string automata don’t have one.
>
> Is there a work-around that I’m missing here?  Or should I maybe open a JIRA
> to add a #getTermsEnum(TermsEnum) method to CompiledAutomaton?
>
> Alan Woodward
> www.flax.co.uk
>
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: 6.4 release

2017-01-06 Thread Christine Poerschke (BLOOMBERG/ LONDON)
All done, thanks for checking. Have a good weekend.

From: dev@lucene.apache.org At: 01/06/17 18:23:06
To: dev@lucene.apache.org
Subject: Re: 6.4 release

Thanks all for your answers.
Are we still good to build the first RC on Monday? Christine, do you maybe need 
more time?


2017-01-05 5:47 GMT+01:00 Shalin Shekhar Mangar :

Aggregated metrics will not be in 6.4 -- that work is still in progress.

On Thu, Jan 5, 2017 at 6:57 AM, S G  wrote:
> +1 for adding the metric related changes.
> Aggregated metrics from replicas sounds like a very nice thing to have.
>
> On Wed, Jan 4, 2017 at 12:11 PM, Varun Thacker  wrote:
>>
>> +1 to cut a release branch on monday. Lots of goodies in this release!
>>
>> On Tue, Jan 3, 2017 at 8:23 AM, jim ferenczi 
>> wrote:
>>>
>>> Hi,
>>> I would like to volunteer to release 6.4. I can cut the release branch
>>> next Monday if everybody agrees.
>>>
>>> Jim
>>
>>
>


--
Regards,
Shalin Shekhar Mangar.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org




[jira] [Resolved] (SOLR-8542) Integrate Learning to Rank into Solr

2017-01-06 Thread Christine Poerschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christine Poerschke resolved SOLR-8542.
---
Resolution: Fixed

Thanks everyone!

> Integrate Learning to Rank into Solr
> 
>
> Key: SOLR-8542
> URL: https://issues.apache.org/jira/browse/SOLR-8542
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joshua Pantony
>Assignee: Christine Poerschke
> Fix For: master (7.0), 6.4
>
> Attachments: SOLR-8542-branch_5x.patch, SOLR-8542-trunk.patch, 
> SOLR-8542.patch
>
>
> This is a ticket to integrate learning to rank machine learning models into 
> Solr. Solr Learning to Rank (LTR) provides a way for you to extract features 
> directly inside Solr for use in training a machine learned model. You can 
> then deploy that model to Solr and use it to rerank your top X search 
> results. This concept was previously [presented by the authors at Lucene/Solr 
> Revolution 
> 2015|http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp].
> 
> Solr Reference Guide documentation:
> * https://cwiki.apache.org/confluence/display/solr/Result+Reranking
> Source code and README files:
> * 
> [solr/contrib/ltr|https://github.com/apache/lucene-solr/blob/master/solr/contrib/ltr]
> * 
> [solr/contrib/ltr/example|https://github.com/apache/lucene-solr/blob/master/solr/contrib/ltr/example]
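
For readers skimming the archive: the reranking request that the README and Reference 
Guide linked above describe looks roughly like this (a hedged illustration; the model 
name and reRankDocs value are placeholders):

{noformat}
# Extract feature values with each hit, and rerank the top 100 results
# with a previously deployed model named "myModel" (placeholder name)
q=test&fl=id,score,[features]&rq={!ltr model=myModel reRankDocs=100}
{noformat}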



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-8542) Integrate Learning to Rank into Solr

2017-01-06 Thread Christine Poerschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christine Poerschke updated SOLR-8542:
--
Description: 
This is a ticket to integrate learning to rank machine learning models into 
Solr. Solr Learning to Rank (LTR) provides a way for you to extract features 
directly inside Solr for use in training a machine learned model. You can then 
deploy that model to Solr and use it to rerank your top X search results. This 
concept was previously [presented by the authors at Lucene/Solr Revolution 
2015|http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp].



Solr Reference Guide documentation:
* https://cwiki.apache.org/confluence/display/solr/Result+Reranking

Source code and README files:
* 
[solr/contrib/ltr|https://github.com/apache/lucene-solr/blob/master/solr/contrib/ltr]
* 
[solr/contrib/ltr/example|https://github.com/apache/lucene-solr/blob/master/solr/contrib/ltr/example]

  was:
This is a ticket to integrate learning to rank machine learning models into 
Solr. Solr Learning to Rank (LTR) provides a way for you to extract features 
directly inside Solr for use in training a machine learned model. You can then 
deploy that model to Solr and use it to rerank your top X search results. This 
concept was previously [presented by the authors at Lucene/Solr Revolution 
2015|http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp].

---

Solr Reference Guide documentation:
* https://cwiki.apache.org/confluence/display/solr/Result+Reranking

Source code and README files:
* 
[solr/contrib/ltr|https://github.com/apache/lucene-solr/blob/master/solr/contrib/ltr]
* 
[solr/contrib/ltr/example|https://github.com/apache/lucene-solr/blob/master/solr/contrib/ltr/example]


> Integrate Learning to Rank into Solr
> 
>
> Key: SOLR-8542
> URL: https://issues.apache.org/jira/browse/SOLR-8542
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joshua Pantony
>Assignee: Christine Poerschke
> Fix For: master (7.0), 6.4
>
> Attachments: SOLR-8542-branch_5x.patch, SOLR-8542-trunk.patch, 
> SOLR-8542.patch
>
>
> This is a ticket to integrate learning to rank machine learning models into 
> Solr. Solr Learning to Rank (LTR) provides a way for you to extract features 
> directly inside Solr for use in training a machine learned model. You can 
> then deploy that model to Solr and use it to rerank your top X search 
> results. This concept was previously [presented by the authors at Lucene/Solr 
> Revolution 
> 2015|http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp].
> 
> Solr Reference Guide documentation:
> * https://cwiki.apache.org/confluence/display/solr/Result+Reranking
> Source code and README files:
> * 
> [solr/contrib/ltr|https://github.com/apache/lucene-solr/blob/master/solr/contrib/ltr]
> * 
> [solr/contrib/ltr/example|https://github.com/apache/lucene-solr/blob/master/solr/contrib/ltr/example]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-8542) Integrate Learning to Rank into Solr

2017-01-06 Thread Christine Poerschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christine Poerschke updated SOLR-8542:
--
Description: 
This is a ticket to integrate learning to rank machine learning models into 
Solr. Solr Learning to Rank (LTR) provides a way for you to extract features 
directly inside Solr for use in training a machine learned model. You can then 
deploy that model to Solr and use it to rerank your top X search results. This 
concept was previously [presented by the authors at Lucene/Solr Revolution 
2015|http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp].

---

Solr Reference Guide documentation:
* https://cwiki.apache.org/confluence/display/solr/Result+Reranking

Source code and README files:
* 
[solr/contrib/ltr|https://github.com/apache/lucene-solr/blob/master/solr/contrib/ltr]
* 
[solr/contrib/ltr/example|https://github.com/apache/lucene-solr/blob/master/solr/contrib/ltr/example]

  was:
This is a ticket to integrate learning to rank machine learning models into 
Solr. Solr Learning to Rank (LTR) provides a way for you to extract features 
directly inside Solr for use in training a machine learned model. You can then 
deploy that model to Solr and use it to rerank your top X search results. This 
concept was previously [presented by the authors at Lucene/Solr Revolution 
2015|http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp].

[Read through the 
README|https://github.com/bloomberg/lucene-solr/tree/master-ltr-plugin-release/solr/contrib/ltr]
 for a tutorial on using the plugin, in addition to how to train your own 
external model.



> Integrate Learning to Rank into Solr
> 
>
> Key: SOLR-8542
> URL: https://issues.apache.org/jira/browse/SOLR-8542
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joshua Pantony
>Assignee: Christine Poerschke
> Fix For: master (7.0), 6.4
>
> Attachments: SOLR-8542-branch_5x.patch, SOLR-8542-trunk.patch, 
> SOLR-8542.patch
>
>
> This is a ticket to integrate learning to rank machine learning models into 
> Solr. Solr Learning to Rank (LTR) provides a way for you to extract features 
> directly inside Solr for use in training a machine learned model. You can 
> then deploy that model to Solr and use it to rerank your top X search 
> results. This concept was previously [presented by the authors at Lucene/Solr 
> Revolution 
> 2015|http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp].
> ---
> Solr Reference Guide documentation:
> * https://cwiki.apache.org/confluence/display/solr/Result+Reranking
> Source code and README files:
> * 
> [solr/contrib/ltr|https://github.com/apache/lucene-solr/blob/master/solr/contrib/ltr]
> * 
> [solr/contrib/ltr/example|https://github.com/apache/lucene-solr/blob/master/solr/contrib/ltr/example]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2017-01-06 Thread Kevin Risden (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805828#comment-15805828
 ] 

Kevin Risden commented on SOLR-8593:


Updated jira/solr-8593 branch with Calcite 1.11.0 since that was just released.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Attachments: SOLR-8593.patch, SOLR-8593.patch
>
>
>The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work, though, will require an optimizer. This is where 
> Apache Calcite comes into play. It has a battle-tested cost-based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-master-Solaris (64bit/jdk1.8.0) - Build # 1061 - Unstable!

2017-01-06 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Solaris/1061/
Java: 64bit/jdk1.8.0 -XX:-UseCompressedOops -XX:+UseG1GC

2 tests failed.
FAILED:  org.apache.solr.cloud.PeerSyncReplicationTest.test

Error Message:
timeout waiting to see all nodes active

Stack Trace:
java.lang.AssertionError: timeout waiting to see all nodes active
at 
__randomizedtesting.SeedInfo.seed([C470F112EA5E5039:4C24CEC844A23DC1]:0)
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.solr.cloud.PeerSyncReplicationTest.waitTillNodesActive(PeerSyncReplicationTest.java:311)
at 
org.apache.solr.cloud.PeerSyncReplicationTest.bringUpDeadNodeAndEnsureNoReplication(PeerSyncReplicationTest.java:262)
at 
org.apache.solr.cloud.PeerSyncReplicationTest.forceNodeFailureAndDoPeerSync(PeerSyncReplicationTest.java:244)
at 
org.apache.solr.cloud.PeerSyncReplicationTest.test(PeerSyncReplicationTest.java:133)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:907)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:943)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:957)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:985)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:960)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:811)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:462)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:916)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:802)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:852)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 

[jira] [Commented] (SOLR-8542) Integrate Learning to Rank into Solr

2017-01-06 Thread Christine Poerschke (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805800#comment-15805800
 ] 

Christine Poerschke commented on SOLR-8542:
---

The bot did not (yet) update for it here, but there is an equivalent 'master' 
branch commit, as per the bot's 
[update|https://issues.apache.org/jira/browse/SOLR-9929?focusedCommentId=15805749&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15805749]
 on SOLR-9929 itself.

> Integrate Learning to Rank into Solr
> 
>
> Key: SOLR-8542
> URL: https://issues.apache.org/jira/browse/SOLR-8542
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joshua Pantony
>Assignee: Christine Poerschke
> Fix For: master (7.0), 6.4
>
> Attachments: SOLR-8542-branch_5x.patch, SOLR-8542-trunk.patch, 
> SOLR-8542.patch
>
>
> This is a ticket to integrate learning to rank machine learning models into 
> Solr. Solr Learning to Rank (LTR) provides a way for you to extract features 
> directly inside Solr for use in training a machine learned model. You can 
> then deploy that model to Solr and use it to rerank your top X search 
> results. This concept was previously [presented by the authors at Lucene/Solr 
> Revolution 
> 2015|http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp].
> [Read through the 
> README|https://github.com/bloomberg/lucene-solr/tree/master-ltr-plugin-release/solr/contrib/ltr]
>  for a tutorial on using the plugin, in addition to how to train your own 
> external model.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9929) Documentation and sample code about how to train the model using user clicks when use ltr module

2017-01-06 Thread Christine Poerschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christine Poerschke updated SOLR-9929:
--
Priority: Minor  (was: Major)

> Documentation and sample code about how to train the model using user clicks 
> when use ltr module
> 
>
> Key: SOLR-9929
> URL: https://issues.apache.org/jira/browse/SOLR-9929
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: jefferyyuan
>Assignee: Christine Poerschke
>Priority: Minor
>  Labels: learning-to-rank, machine_learning, solr
> Fix For: master (7.0), 6.4
>
> Attachments: 0001-Improve-Learning-to-Rank-example-Readme.patch
>
>
> Thanks very much for integrating machine learning into Solr.
> https://issues.apache.org/jira/browse/SOLR-8542
> I tried to integrate it, but I have difficulty figuring out how to translate the 
> partial pairwise feedback into the importance or relevance of a doc.
> https://github.com/apache/lucene-solr/blob/f62874e47a0c790b9e396f58ef6f14ea04e2280b/solr/contrib/ltr/README.md
> In the "Assemble training data" part, the third column indicates the relative 
> importance or relevance of that doc.
> Could you please give more info about how to assign a score based on what a user 
> clicks?
> I have read 
> https://static.aminer.org/pdf/PDF/000/472/865/optimizing_search_engines_using_clickthrough_data.pdf
> http://www.cs.cornell.edu/people/tj/publications/joachims_etal_05a.pdf
> http://alexbenedetti.blogspot.com/2016/07/solr-is-learning-to-rank-better-part-1.html
> but still have no clue.
> From a user's perspective, steps such as setting up the features and model in 
> Solr are simple, but collecting the feedback data and training/updating the 
> model is much more complex. Without that, we can't really use the 
> learning-to-rank function in Solr.
> It would be great if Solr could provide some detailed instructions and sample 
> code about how to translate the partial pairwise feedback and use it to train 
> and update the model.
> Thanks
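
For concreteness, one plausible shape of such a training-data file, purely as a 
made-up illustration consistent with the "third column" description above (the 
queries, doc ids, and judgments are all invented):

{noformat}
# query          docid    relevance (the third column referenced above)
hard drive       doc17    4
hard drive       doc42    2
ssd              doc99    0
{noformat}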



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9929) Documentation and sample code about how to train the model using user clicks when use ltr module

2017-01-06 Thread Christine Poerschke (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christine Poerschke updated SOLR-9929:
--
Issue Type: Task  (was: Improvement)

> Documentation and sample code about how to train the model using user clicks 
> when use ltr module
> 
>
> Key: SOLR-9929
> URL: https://issues.apache.org/jira/browse/SOLR-9929
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: jefferyyuan
>Assignee: Christine Poerschke
>  Labels: learning-to-rank, machine_learning, solr
> Fix For: master (7.0), 6.4
>
> Attachments: 0001-Improve-Learning-to-Rank-example-Readme.patch
>
>
> Thanks very much for integrating machine learning into Solr.
> https://issues.apache.org/jira/browse/SOLR-8542
> I tried to integrate it, but I have difficulty figuring out how to translate the 
> partial pairwise feedback into the importance or relevance of a doc.
> https://github.com/apache/lucene-solr/blob/f62874e47a0c790b9e396f58ef6f14ea04e2280b/solr/contrib/ltr/README.md
> In the "Assemble training data" part, the third column indicates the relative 
> importance or relevance of that doc.
> Could you please give more info about how to assign a score based on what a user 
> clicks?
> I have read 
> https://static.aminer.org/pdf/PDF/000/472/865/optimizing_search_engines_using_clickthrough_data.pdf
> http://www.cs.cornell.edu/people/tj/publications/joachims_etal_05a.pdf
> http://alexbenedetti.blogspot.com/2016/07/solr-is-learning-to-rank-better-part-1.html
> but still have no clue.
> From a user's perspective, steps such as setting up the features and model in 
> Solr are simple, but collecting the feedback data and training/updating the 
> model is much more complex. Without that, we can't really use the 
> learning-to-rank function in Solr.
> It would be great if Solr could provide some detailed instructions and sample 
> code about how to translate the partial pairwise feedback and use it to train 
> and update the model.
> Thanks



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7580) Spans tree scoring

2017-01-06 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805767#comment-15805767
 ] 

Paul Elschot commented on LUCENE-7580:
--

SpanSynonymQuery is unusual here because it uses a single SpansDocScorer per 
segment, independent of the number of synonym terms.

Since the TermSpans for SynonymSpans are Spans without a SpansDocScorer, it 
makes some sense not to merge Spans and SpansDocScorer later.



> Spans tree scoring
> --
>
> Key: LUCENE-7580
> URL: https://issues.apache.org/jira/browse/LUCENE-7580
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: master (7.0)
>Reporter: Paul Elschot
>Priority: Minor
> Fix For: 6.x
>
> Attachments: LUCENE-7580.patch, LUCENE-7580.patch, LUCENE-7580.patch, 
> LUCENE-7580.patch
>
>
> Recurse the spans tree to compose a score based on the type of subqueries and 
> what matched



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9929) Documentation and sample code about how to train the model using user clicks when use ltr module

2017-01-06 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805757#comment-15805757
 ] 

ASF subversion and git services commented on SOLR-9929:
---

Commit 88450c70bb4daa3ca6c4750581bddeaad9bea6f9 in lucene-solr's branch 
refs/heads/branch_6x from [~cpoerschke]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=88450c7 ]

SOLR-8542: expand 'Assemble training data' content in solr/contrib/ltr/README

(Diego Ceccarelli via Christine Poerschke in response to SOLR-9929 enquiry from 
Jeffery Yuan.)


> Documentation and sample code about how to train the model using user clicks 
> when use ltr module
> 
>
> Key: SOLR-9929
> URL: https://issues.apache.org/jira/browse/SOLR-9929
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: jefferyyuan
>Assignee: Christine Poerschke
>  Labels: learning-to-rank, machine_learning, solr
> Fix For: master (7.0), 6.4
>
> Attachments: 0001-Improve-Learning-to-Rank-example-Readme.patch
>
>
> Thanks very much for integrating machine learning into Solr.
> https://issues.apache.org/jira/browse/SOLR-8542
> I tried to integrate it, but I have difficulty figuring out how to translate the 
> partial pairwise feedback into the importance or relevance of a doc.
> https://github.com/apache/lucene-solr/blob/f62874e47a0c790b9e396f58ef6f14ea04e2280b/solr/contrib/ltr/README.md
> In the "Assemble training data" part, the third column indicates the relative 
> importance or relevance of that doc.
> Could you please give more info about how to assign a score based on what a user 
> clicks?
> I have read 
> https://static.aminer.org/pdf/PDF/000/472/865/optimizing_search_engines_using_clickthrough_data.pdf
> http://www.cs.cornell.edu/people/tj/publications/joachims_etal_05a.pdf
> http://alexbenedetti.blogspot.com/2016/07/solr-is-learning-to-rank-better-part-1.html
> but still have no clue.
> From a user's perspective, steps such as setting up the features and model in 
> Solr are simple, but collecting the feedback data and training/updating the 
> model is much more complex. Without that, we can't really use the 
> learning-to-rank function in Solr.
> It would be great if Solr could provide some detailed instructions and sample 
> code about how to translate the partial pairwise feedback and use it to train 
> and update the model.
> Thanks



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8542) Integrate Learning to Rank into Solr

2017-01-06 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805756#comment-15805756
 ] 

ASF subversion and git services commented on SOLR-8542:
---

Commit 88450c70bb4daa3ca6c4750581bddeaad9bea6f9 in lucene-solr's branch 
refs/heads/branch_6x from [~cpoerschke]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=88450c7 ]

SOLR-8542: expand 'Assemble training data' content in solr/contrib/ltr/README

(Diego Ceccarelli via Christine Poerschke in response to SOLR-9929 enquiry from 
Jeffery Yuan.)


> Integrate Learning to Rank into Solr
> 
>
> Key: SOLR-8542
> URL: https://issues.apache.org/jira/browse/SOLR-8542
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joshua Pantony
>Assignee: Christine Poerschke
> Fix For: master (7.0), 6.4
>
> Attachments: SOLR-8542-branch_5x.patch, SOLR-8542-trunk.patch, 
> SOLR-8542.patch
>
>
> This is a ticket to integrate learning to rank machine learning models into 
> Solr. Solr Learning to Rank (LTR) provides a way for you to extract features 
> directly inside Solr for use in training a machine learned model. You can 
> then deploy that model to Solr and use it to rerank your top X search 
> results. This concept was previously [presented by the authors at Lucene/Solr 
> Revolution 
> 2015|http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp].
> [Read through the 
> README|https://github.com/bloomberg/lucene-solr/tree/master-ltr-plugin-release/solr/contrib/ltr]
>  for a tutorial on using the plugin, in addition to how to train your own 
> external model.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9937) StandardDirectoryFactory::move never uses more efficient implementation

2017-01-06 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805754#comment-15805754
 ] 

Mark Miller commented on SOLR-9937:
---

Left unfixed, this issue creates the same problems that SOLR-9901 intended to 
fix for HDFS.

> StandardDirectoryFactory::move never uses more efficient implementation
> ---
>
> Key: SOLR-9937
> URL: https://issues.apache.org/jira/browse/SOLR-9937
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Mike Drob
>Assignee: Mark Miller
> Attachments: SOLR-9937.patch
>
>
> {noformat}
>   Path path1 = ((FSDirectory) 
> baseFromDir).getDirectory().toAbsolutePath();
>   Path path2 = ((FSDirectory) 
> baseFromDir).getDirectory().toAbsolutePath();
>   
>   try {
> Files.move(path1.resolve(fileName), path2.resolve(fileName), 
> StandardCopyOption.ATOMIC_MOVE);
>   } catch (AtomicMoveNotSupportedException e) {
> Files.move(path1.resolve(fileName), path2.resolve(fileName));
>   }
> {noformat}
> Because {{path1 == path2}} this code never does anything and move always 
> defaults to the less efficient implementation in DirectoryFactory.
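
For illustration, a minimal sketch of the presumably intended fix, assuming the 
second path should come from the destination directory (a {{baseToDir}} variable 
is assumed here; the actual patch may differ):

{noformat}
Path path1 = ((FSDirectory) baseFromDir).getDirectory().toAbsolutePath();
Path path2 = ((FSDirectory) baseToDir).getDirectory().toAbsolutePath(); // destination, not the source again

try {
  // cheap atomic rename when the filesystem supports it
  Files.move(path1.resolve(fileName), path2.resolve(fileName), StandardCopyOption.ATOMIC_MOVE);
} catch (AtomicMoveNotSupportedException e) {
  // fall back to a non-atomic move
  Files.move(path1.resolve(fileName), path2.resolve(fileName));
}
{noformat}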



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9929) Documentation and sample code about how to train the model using user clicks when use ltr module

2017-01-06 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805749#comment-15805749
 ] 

ASF subversion and git services commented on SOLR-9929:
---

Commit 024c4031e55a998b73288fd276e30ffd626f0b91 in lucene-solr's branch 
refs/heads/master from [~cpoerschke]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=024c403 ]

SOLR-8542: expand 'Assemble training data' content in solr/contrib/ltr/README

(Diego Ceccarelli via Christine Poerschke in response to SOLR-9929 enquiry from 
Jeffery Yuan.)


> Documentation and sample code about how to train the model using user clicks 
> when use ltr module
> 
>
> Key: SOLR-9929
> URL: https://issues.apache.org/jira/browse/SOLR-9929
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: jefferyyuan
>Assignee: Christine Poerschke
>  Labels: learning-to-rank, machine_learning, solr
> Fix For: master (7.0), 6.4
>
> Attachments: 0001-Improve-Learning-to-Rank-example-Readme.patch
>
>
> Thanks very much for integrating machine learning into Solr.
> https://issues.apache.org/jira/browse/SOLR-8542
> I tried to integrate it, but I have difficulty figuring out how to translate the 
> partial pairwise feedback into the importance or relevance of a doc.
> https://github.com/apache/lucene-solr/blob/f62874e47a0c790b9e396f58ef6f14ea04e2280b/solr/contrib/ltr/README.md
> In the "Assemble training data" part, the third column indicates the relative 
> importance or relevance of that doc.
> Could you please give more info about how to assign a score based on what a user 
> clicks?
> I have read 
> https://static.aminer.org/pdf/PDF/000/472/865/optimizing_search_engines_using_clickthrough_data.pdf
> http://www.cs.cornell.edu/people/tj/publications/joachims_etal_05a.pdf
> http://alexbenedetti.blogspot.com/2016/07/solr-is-learning-to-rank-better-part-1.html
> but still have no clue.
> From a user's perspective, steps such as setting up the features and model in 
> Solr are simple, but collecting the feedback data and training/updating the 
> model is much more complex. Without that, we can't really use the 
> learning-to-rank function in Solr.
> It would be great if Solr could provide some detailed instructions and sample 
> code about how to translate the partial pairwise feedback and use it to train 
> and update the model.
> Thanks



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9937) StandardDirectoryFactory::move never uses more efficient implementation

2017-01-06 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805747#comment-15805747
 ] 

Mark Miller commented on SOLR-9937:
---

No, the behavior is not correct: we want a move to provide resiliency for 
replication, and a move implies atomic file semantics that we want to ensure are 
the default behavior. We need to make sure someone else doesn't break this after 
it's fixed.

> StandardDirectoryFactory::move never uses more efficient implementation
> ---
>
> Key: SOLR-9937
> URL: https://issues.apache.org/jira/browse/SOLR-9937
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Mike Drob
>Assignee: Mark Miller
> Attachments: SOLR-9937.patch
>
>
> {noformat}
>   Path path1 = ((FSDirectory) 
> baseFromDir).getDirectory().toAbsolutePath();
>   Path path2 = ((FSDirectory) 
> baseFromDir).getDirectory().toAbsolutePath();
>   
>   try {
> Files.move(path1.resolve(fileName), path2.resolve(fileName), 
> StandardCopyOption.ATOMIC_MOVE);
>   } catch (AtomicMoveNotSupportedException e) {
> Files.move(path1.resolve(fileName), path2.resolve(fileName));
>   }
> {noformat}
> Because {{path1 == path2}} this code never does anything and move always 
> defaults to the less efficient implementation in DirectoryFactory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-7580) Spans tree scoring

2017-01-06 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-7580:
-
Attachment: LUCENE-7580.patch

Patch of 6 Jan 2017.

This contains:

The changes in the patch of 30 Dec 2016.

Support for SpanSynonymQuery, see SynonymSpans and SynonymSpansDocScorer.

Class AsSingleTermSpansDocScorer as common superclass for TermSpansDocScorer 
and SynonymSpansDocScorer. This is the place where matching and non-matching 
term occurrences are scored with a SimScorer from Similarity while taking into 
account the slop factors.

Method SpansTreeQuery.wrapAfterRewrite() to use SpansTreeQuery.wrap() at the 
right moment.


> Spans tree scoring
> --
>
> Key: LUCENE-7580
> URL: https://issues.apache.org/jira/browse/LUCENE-7580
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: master (7.0)
>Reporter: Paul Elschot
>Priority: Minor
> Fix For: 6.x
>
> Attachments: LUCENE-7580.patch, LUCENE-7580.patch, LUCENE-7580.patch, 
> LUCENE-7580.patch
>
>
> Recurse the spans tree to compose a score based on the type of subqueries and 
> what matched



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Installing PyLucene

2017-01-06 Thread Andi Vajda


On Fri, 6 Jan 2017, Jan Høydahl wrote:


Hi,

I hope you didn't get this wrong! We all appreciate the existence of 
JCC/PyLucene and especially all the effort you've put into this.



PyLucene is driven by its own community, and user involvement and contributions 
are a must.
The (sub)project will survive only to the extent that its current users invest 
in it.


So if some funding is required to get this going ...


For an ASF Open Source Project, the only thing that is required to get going is 
user/developer
involvement and teamwork. While Andi started the project due to needs at the 
time, and became
a committer, he is no longer an active user, so perhaps the time has come for 
other users to step up and take responsibility.

What 'funding' would look like in the Python3 case is not so much sending money 
to the ASF, but more individual companies like your own sponsoring (through 
developer time) the major work on the patch and driving it through to 
completion. Hopefully other users will contribute along the way too.

You will of course need help from experienced developers, but the ideal 
situation is that after
a couple of such patches that get committed, you (or the developer working on 
the code) will be nominated
as committer and can continue developing PyLucene without the need for Andi or 
any other single individual.


There have been some discussions about the future of PyLucene on this list, but 
I still didn't see any conclusion/decision.



The discussion sparked some new development and a release, which is a success. 
So the decision, I guess, is to keep PyLucene alive and try to strengthen the 
community.
As long as the project continues to produce releases, it is (somewhat) alive.
If on the other hand another year or two goes by without another release, I'm 
sure the PMC will take action again.


I intend to produce a PyLucene 6.4 release once Lucene 6.4 is done.
It's been a few months now...

Andi..



--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com


On 6 Jan 2017, at 10:34, Thomas Koch  wrote:

Dear Andi,

I've just sent the link to the public gist with the patch to Petrus and this list. As 
mentioned by Oliver we'd be more than happy if a core developer of JCC/PyLucene could 
review the patch and decide what to do with it. It has been developed without intimate 
knowledge of JCC with the goal to make PyLucene(36) usable with Python3. It may have some 
issues or need improvements (also cf. "IMPORTANT NOTES" in my last email about 
current limitations of the patch). That's where expert review (and effort) is needed.

For the future of course a port to newer versions of JCC/PyLucene would be more 
than valuable. I think what Oliver wanted to express is that we don't have that 
much deep know-how of JCC and can thus only provide initial efforts and 
contributions, but for production/release ready code an expert review is still 
needed. Also we haven't watched the development of newer versions of PyLucene 
as we're still stuck with PyLucene36.

I hope you didn't get this wrong! We all appreciate the existence of 
JCC/PyLucene and especially all the effort you've put into this.

However, I fear that Python 3 support is a must-have for a Python tool or 
library that's available today:
- Python3 is here to stay! (py3.6 has just been released)
- Most of the popular Python packages meanwhile provide Python3 support - cf. 
http://py3readiness.org
- Python2 support will end by 2020 (sounds far away but isn't - cf. 
https://pythonclock.org )

There have been some discussions about the future of PyLucene on this list but I 
still didn't see any conclusion/decision. Without a transparent roadmap and 
ongoing development (i.e. new releases, Python3 support etc.) the usage of 
JCC/PyLucene is most likely unattractive for developers who start a new project, 
and this is how the user base shrinks and further contributions stall 
(something of a chicken-and-egg problem).

I'm not sure how far the ASF may help here, but I've read that the Python 
Software Foundation occasionally funds projects to port libraries that are 
widely used but don't have enough of a community to do a port.
cf. 
https://developers.slashdot.org/story/13/08/25/2115204/interviews-guido-van-rossum-answers-your-questions
 


So if some funding is required to get this going ...



best regards,

Thomas

On 4 Jan 2017, at 19:41, Andi Vajda  wrote:



Note that PyLucene currently lacks official Python3 support!
We've done a port of PyLucene 3.6 (!) to support Python3 and offered the 
patches needed to JCC and PyLucene for use/review on the list - but didn't get 
any feedback so far.
cf. https://www.mail-archive.com/pylucene-dev@lucene.apache.org/msg02167.html 

Re: Installing PyLucene

2017-01-06 Thread Andi Vajda


 Hi Thomas,

On Fri, 6 Jan 2017, Thomas Koch wrote:

I've just sent the link to the public gist with the patch to Petrus and 
this list. As mentioned by Oliver we'd be more than happy if a core 
developer of JCC/PyLucene could review the patch and decide what to do 
with it. It has been developed without intimate knowledge of JCC with the 
goal to make PyLucene(36) usable with Python3. It may have some issues or 
need improvements (also cf. "IMPORTANT NOTES" in my last email about 
current limitations of the patch). That's where expert review (and effort) 
is needed.


For the future of course a port to newer versions of JCC/PyLucene would be 
more than valuable. I think what Oliver wanted to express is that we don't 
have that much deep know-how of JCC and can thus only provide initial 
efforts and contributions, but for production/release ready code an expert 
review is still needed. Also we haven't watched the development of newer 
versions of PyLucene as we're still stuck with PyLucene36.


I hope you didn't get this wrong! We all appreciate the existence of 
JCC/PyLucene and especially all the effort you've put into this.


However, I fear that Python 3 support is a must-have for a Python tool or 
library that's available today:
- Python3 is here to stay! (py3.6 has just been released)
- Most of the popular Python packages meanwhile provide Python3 support - cf. 
http://py3readiness.org
- Python2 support will end by 2020 (sounds far away but isn't - cf. 
https://pythonclock.org )

There have been some discussions about the future of PyLucene on this list 
but I still didn't see any conclusion/decision. Without a transparent 
roadmap and ongoing development (i.e. new releases, Python3 support etc.) 
the usage of JCC/PyLucene is most likely unattractive for developers who 
start a new project, and this is how the user base shrinks and further 
contributions stall (something of a chicken-and-egg problem).


I'm not sure how far the ASF may help here, but I've read that the Python 
Software Foundation occasionally funds projects to port libraries that are 
widely used but don't have enough of a community to do a port.
cf. 
https://developers.slashdot.org/story/13/08/25/2115204/interviews-guido-van-rossum-answers-your-questions
 


So if some funding is required to get this going ...


I now took a look at the python 3 patches you sent a link to in an earlier 
message and here is the gist of my thoughts:

  - Moving to Python 3 is desirable, but what about Python 2 support today
in 2017? I have no desire to support both for PyLucene manually. If,
somehow, there can be two versions of JCC, one for Python 2, one for
Python 3 and the PyLucene tests can be 2to3'd automatically, then the
Python 3 support idea looks more attractive already. Supporting two
versions of JCC is fine until 2020.

  - The JCC patches look very reasonable but should be updated to the latest
Python 3. In particular, the internal Python 3 string representation was
changed again after 3.2 (?) and has clever optimizations possible based
on the internal byte size of characters chosen by Python (internally)
for each string, based on the range of the characters used in the string.
This makes it possible to often just copy chars from Python to Java.
I just did a rewrite for this in PyICU (another long-term
project of mine, https://github.com/ovalhub/pyicu/) and the Python 3
string story got much cleaner post 3.2 (at least more
understandable). Lots of bugs with long unicode chars (forgot the proper
term, sorry) got fixed along the way (emoticon support, yay).

So, if you're prepared to fund this effort, it might be best to hire
back the contractor who did the JCC Python 3 port originally and have
him/her refresh it for the latest JCC on trunk (not too many changes
happened in the past few years) and to use the Python internal string
APIs that appeared post Python 3.2. The ones in use in the patch are
deprecated already. I love it that we'd then shed _all_ backwards
compatibility baggage in JCC going forward in Python 3.x, x >= 6.

If you get the JCC/Python3 patches into a shape where I can apply them to
trunk without trouble and using the latest CPython string APIs:
   https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_AsUCS4
and related (PyUnicode_KIND, etc...)
then there is a good chance that PyLucene/JCC would be fully supported
with Python 3.x, x >= 6.

  - The PyLucene patches should probably be redone so that they can be
automated with 2to3. If we get JCC in shape, I can take care of the rest.

Thank you for the work done so far, it's looking really good but it needs to
be refreshed to JCC/trunk and latest Python 3 to minimize work on my side.

[jira] [Reopened] (SOLR-9928) MetricsDirectoryFactory::renameWithOverwrite incorrectly calls super

2017-01-06 Thread Andrzej Bialecki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki  reopened SOLR-9928:
-

> MetricsDirectoryFactory::renameWithOverwrite incorrectly calls super
> 
>
> Key: SOLR-9928
> URL: https://issues.apache.org/jira/browse/SOLR-9928
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: master (7.0), 6.4
>Reporter: Mike Drob
>Assignee: Andrzej Bialecki 
> Fix For: master (7.0), 6.4
>
> Attachments: SOLR-9928.patch, SOLR-9928.patch
>
>
> MetricsDirectoryFactory::renameWithOverwrite should call the delegate instead 
> of super. Trivial patch forthcoming.
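
For readers following along, a hedged sketch of the difference (the {{delegate}} 
field name and the exact signature are assumptions here, not the actual patch):

{noformat}
@Override
public void renameWithOverwrite(Directory dir, String fileName, String toName) throws IOException {
  // was: super.renameWithOverwrite(dir, fileName, toName);
  // intended: forward to the wrapped DirectoryFactory so its implementation is used
  delegate.renameWithOverwrite(dir, fileName, toName);
}
{noformat}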



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7617) Improve GroupingSearch API and extensibility

2017-01-06 Thread Martijn van Groningen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805662#comment-15805662
 ] 

Martijn van Groningen commented on LUCENE-7617:
---

+1 Thanks for cleaning this up!

I found a few places still using GROUP_VALUE_TYPE, in 
SecondPassGroupingCollector.SearchGroupDocs, GroupDocs, TopGroups, 
AllGroupHeadsCollector.GroupHead and Grouping.Command (in Solr).

bq. Given that everything here is marked as experimental, I think we're OK to 
just backwards-break?

Yes, that is OK. 



> Improve GroupingSearch API and extensibility
> 
>
> Key: LUCENE-7617
> URL: https://issues.apache.org/jira/browse/LUCENE-7617
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Minor
> Attachments: LUCENE-7617.patch, LUCENE-7617.patch
>
>
> While looking at how to make grouping work with the new XValuesSource API in 
> core, I thought I'd try and clean up GroupingSearch a bit.  We have three 
> different ways of grouping at the moment: by doc block, using a single-pass 
> collector; by field; and by ValueSource.  The latter two both use essentially 
> the same two-pass mechanism, with different Collector implementations.
> I can see a number of possible improvements here:
> * abstract the two-pass collector creation into a factory API, which should 
> allow us to add the XValuesSource implementations as well
> * clean up the generics on the two-pass collectors - maybe look into removing 
> them entirely?  I'm not sure they add anything really, and we don't have them 
> on the equivalent plain search APIs
> * think about moving the document block method into the join module instead, 
> alongside all the other block-indexing code
> * rename the various Collector base classes so that they don't have 
> 'Abstract' in them anymore
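
As a purely hypothetical sketch of the factory idea in the first bullet (none of 
these names exist in Lucene; it only illustrates the shape being discussed, with 
imports elided):

{noformat}
// Hypothetical: both the field-based and ValueSource-based groupers
// (and later an XValuesSource-based one) would implement this.
interface GroupingCollectorFactory<T> {
  FirstPassGroupingCollector<T> firstPass(Sort groupSort, int topNGroups) throws IOException;
  SecondPassGroupingCollector<T> secondPass(Collection<SearchGroup<T>> topGroups,
      Sort groupSort, Sort withinGroupSort, int maxDocsPerGroup) throws IOException;
}
{noformat}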



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9636) Add support for javabin for /stream, /sql internode communication

2017-01-06 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805657#comment-15805657
 ] 

Joel Bernstein commented on SOLR-9636:
--

I'll also test javabin with gatherNodes() graph traversal. gatherNodes simply 
passes its parameters through to CloudSolrStream, so it's easy to toggle the 
writer type on and off and test performance.
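
Purely as a hedged sketch of what toggling the writer type could look like (the 
parameter plumbing below is an assumption, not the committed API):

{noformat}
// Hypothetical comparison of JSON vs javabin on the internode stream requests
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("q", "*:*");
params.set("fl", "id,score");
params.set("sort", "id asc");
params.set("wt", "javabin"); // assumption: the writer type is passed through to the shard requests
CloudSolrStream stream = new CloudSolrStream(zkHost, "collection1", params);
{noformat}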

> Add support for javabin for /stream, /sql internode communication
> -
>
> Key: SOLR-9636
> URL: https://issues.apache.org/jira/browse/SOLR-9636
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Noble Paul
> Fix For: master (7.0), 6.4
>
> Attachments: SOLR-9636.patch
>
>
> currently it uses json, which is verbose and slow



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: AutomatonTermsEnum and fixed-string automata

2017-01-06 Thread Alan Woodward
Hm, how about something like this, on CompiledAutomaton:

public TermsEnum getTermsEnum(TermsEnum te) throws IOException {
  switch (type) {
case NONE:
  return TermsEnum.EMPTY;
case ALL:
  return te;
case SINGLE:
  return new SingleTermsEnum(te, term);
case NORMAL:
  return new AutomatonTermsEnum(te, this);
default:
  // unreachable
  throw new RuntimeException("unhandled case");
  }
}
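
(For what it's worth, filtering a SortedDocValues terms dictionary would then look 
roughly like this; 'sdv' and 'filter' are placeholders:)

CompiledAutomaton ca = new CompiledAutomaton(new RegExp(filter).toAutomaton());
TermsEnum te = ca.getTermsEnum(sdv.termsEnum());   // method sketched above
for (BytesRef term = te.next(); term != null; term = te.next()) {
  // each term is a doc values value accepted by the automaton
}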

Alan Woodward
www.flax.co.uk


> On 6 Jan 2017, at 19:16, Michael McCandless  wrote:
> 
> These automaton intersection APIs are frustrating with all the special
> case handling... Ideas welcome!
> 
> We've had similar challenges with them in the past, when a user
> invoked Terms.intersect directly instead of via CompiledAutomaton:
> https://issues.apache.org/jira/browse/LUCENE-7576
> 
> The problem is CompiledAutomaton specializes certain cases (all
> strings match, no strings match, single term) and sidesteps
> Terms.intersect for those cases.
> 
> We should fix AutomatonTermsEnum public ctor w/ the same checks
> (insist on a NORMAL case) so you don't hit assert failures, or, worse
> ... I'll do that.
> 
> I think a new CompiledAutomaton.intersect taking TermsEnum would be
> tricky in general because it relies on the (efficient) Terms.intersect
> to handle the NORMAL case well, but we can't invoke that from a
> TermsEnum.
> 
> In the SINGLE case, could you use SingleTermsEnum, passing the
> TermsEnum from your doc values, and the term from the
> CompiledAutomaton?  Would that suffice as a workaround?
> 
> Mike McCandless
> 
> http://blog.mikemccandless.com
> 
> On Fri, Jan 6, 2017 at 11:17 AM, Alan Woodward  wrote:
>> We’ve hit an issue while developing marple, where we want to have the
>> ability to filter the values from a SortedDocValues terms dictionary.
>> Normally you’d create a CompiledAutomaton from the filter string, and then
>> call #getTermsEnum(Terms) on it; but for docvalues, we don’t have a Terms
>> instance, we instead have a TermsEnum.
>> 
>> Using AutomatonTermsEnum to wrap the TermsEnum works in most cases here, but
>> if the CompiledAutomaton in question is a fixed string, then we get
>> assertion failures, because ATE uses the compiled automaton’s internal
>> ByteRunAutomaton for filtering, and fixed-string automata don’t have one.
>> 
>> Is there a work-around that I’m missing here?  Or should I maybe open a JIRA
>> to add a #getTermsEnum(TermsEnum) method to CompiledAutomaton?
>> 
>> Alan Woodward
>> www.flax.co.uk
>> 
>> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> 



Re: 6.4 release

2017-01-06 Thread Alan Woodward
Hi Jim,

I’d like to get LUCENE-7609, LUCENE-7610 and LUCENE-7611 in - I’ll commit them 
over the weekend unless anybody squawks.

Alan Woodward
www.flax.co.uk


> On 6 Jan 2017, at 18:22, jim ferenczi  wrote:
> 
> Thanks all for your answers.
> Are we still good to build the first RC on Monday? Christine, do you maybe 
> need more time?
> 
> 
> 2017-01-05 5:47 GMT+01:00 Shalin Shekhar Mangar :
> Aggregated metrics will not be in 6.4 -- that work is still in progress.
> 
> On Thu, Jan 5, 2017 at 6:57 AM, S G  wrote:
> > +1 for adding the metric related changes.
> > Aggregated metrics from replicas sounds like a very nice thing to have.
> >
> > On Wed, Jan 4, 2017 at 12:11 PM, Varun Thacker  wrote:
> >>
> >> +1 to cut a release branch on monday. Lots of goodies in this release!
> >>
> >> On Tue, Jan 3, 2017 at 8:23 AM, jim ferenczi  wrote:
> >>>
> >>> Hi,
> >>> I would like to volunteer to release 6.4. I can cut the release branch
> >>> next Monday if everybody agrees.
> >>>
> >>> Jim
> >>
> >>
> >
> 
> 
> 
> --
> Regards,
> Shalin Shekhar Mangar.
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> 
> 



[jira] [Commented] (SOLR-9937) StandardDirectoryFactory::move never uses more efficient implementation

2017-01-06 Thread Mike Drob (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805574#comment-15805574
 ] 

Mike Drob commented on SOLR-9937:
-

The current behaviour is correct, but not optimally efficient. Not sure what 
kind of testing you think we would benefit from.

Could do a performance test where we try to move 1000 files using 
DirectoryFactory and then move 1000 using StandardDirectoryFactory and measure 
that the second is faster. But that still sounds like it would run into problems 
and false failures on somebody's hardware in ways I haven't thought about.

> StandardDirectoryFactory::move never uses more efficient implementation
> ---
>
> Key: SOLR-9937
> URL: https://issues.apache.org/jira/browse/SOLR-9937
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Mike Drob
>Assignee: Mark Miller
> Attachments: SOLR-9937.patch
>
>
> {noformat}
>   Path path1 = ((FSDirectory) 
> baseFromDir).getDirectory().toAbsolutePath();
>   Path path2 = ((FSDirectory) 
> baseFromDir).getDirectory().toAbsolutePath();
>   
>   try {
> Files.move(path1.resolve(fileName), path2.resolve(fileName), 
> StandardCopyOption.ATOMIC_MOVE);
>   } catch (AtomicMoveNotSupportedException e) {
> Files.move(path1.resolve(fileName), path2.resolve(fileName));
>   }
> {noformat}
> Because {{path1 == path2}} this code never does anything and move always 
> defaults to the less efficient implementation in DirectoryFactory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7576) RegExp automaton causes NPE on Terms.intersect

2017-01-06 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805539#comment-15805539
 ] 

ASF subversion and git services commented on LUCENE-7576:
-

Commit 8e974ecdcfc85243442fadf353cab4cb52a6cab2 in lucene-solr's branch 
refs/heads/branch_6x from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8e974ec ]

LUCENE-7576: AutomatonTermsEnum ctor should also insist on a NORMAL 
CompiledAutomaton in


> RegExp automaton causes NPE on Terms.intersect
> --
>
> Key: LUCENE-7576
> URL: https://issues.apache.org/jira/browse/LUCENE-7576
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/codecs, core/index
>Affects Versions: 6.2.1
> Environment: java version "1.8.0_77" macOS 10.12.1
>Reporter: Tom Mortimer
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: master (7.0), 6.4
>
> Attachments: LUCENE-7576.patch
>
>
> Calling org.apache.lucene.index.Terms.intersect(automaton, null) causes an 
> NPE:
> String index_path = 
> String term = 
> Directory directory = FSDirectory.open(Paths.get(index_path));
> IndexReader reader = DirectoryReader.open(directory);
> Fields fields = MultiFields.getFields(reader);
> Terms terms = fields.terms(args[1]);
> CompiledAutomaton automaton = new CompiledAutomaton(
>   new RegExp("do_not_match_anything").toAutomaton());
> TermsEnum te = terms.intersect(automaton, null);
> throws:
> Exception in thread "main" java.lang.NullPointerException
>   at 
> org.apache.lucene.codecs.blocktree.IntersectTermsEnum.<init>(IntersectTermsEnum.java:127)
>   at 
> org.apache.lucene.codecs.blocktree.FieldReader.intersect(FieldReader.java:185)
>   at org.apache.lucene.index.MultiTerms.intersect(MultiTerms.java:85)
> ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7576) RegExp automaton causes NPE on Terms.intersect

2017-01-06 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805536#comment-15805536
 ] 

ASF subversion and git services commented on LUCENE-7576:
-

Commit ebb5c7e6768c03c83be4aa3abdab22e16cb67c2c in lucene-solr's branch 
refs/heads/master from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ebb5c7e ]

LUCENE-7576: AutomatonTermsEnum ctor should also insist on a NORMAL 
CompiledAutomaton in


> RegExp automaton causes NPE on Terms.intersect
> --
>
> Key: LUCENE-7576
> URL: https://issues.apache.org/jira/browse/LUCENE-7576
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/codecs, core/index
>Affects Versions: 6.2.1
> Environment: java version "1.8.0_77" macOS 10.12.1
>Reporter: Tom Mortimer
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: master (7.0), 6.4
>
> Attachments: LUCENE-7576.patch
>
>
> Calling org.apache.lucene.index.Terms.intersect(automaton, null) causes an 
> NPE:
> String index_path = 
> String term = 
> Directory directory = FSDirectory.open(Paths.get(index_path));
> IndexReader reader = DirectoryReader.open(directory);
> Fields fields = MultiFields.getFields(reader);
> Terms terms = fields.terms(args[1]);
> CompiledAutomaton automaton = new CompiledAutomaton(
>   new RegExp("do_not_match_anything").toAutomaton());
> TermsEnum te = terms.intersect(automaton, null);
> throws:
> Exception in thread "main" java.lang.NullPointerException
>   at 
> org.apache.lucene.codecs.blocktree.IntersectTermsEnum.<init>(IntersectTermsEnum.java:127)
>   at 
> org.apache.lucene.codecs.blocktree.FieldReader.intersect(FieldReader.java:185)
>   at org.apache.lucene.index.MultiTerms.intersect(MultiTerms.java:85)
> ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7588) A parallel DrillSideways implementation

2017-01-06 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15805350#comment-15805350
 ] 

Michael McCandless commented on LUCENE-7588:


Thanks [~ekeller]!

> A parallel DrillSideways implementation
> ---
>
> Key: LUCENE-7588
> URL: https://issues.apache.org/jira/browse/LUCENE-7588
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: master (7.0), 6.3.1
>Reporter: Emmanuel Keller
>Priority: Minor
>  Labels: facet, faceting
> Fix For: master (7.0), 6.4
>
> Attachments: LUCENE-7588.patch
>
>
> Currently the DrillSideways implementation is based on the single threaded 
> IndexSearcher.search(Query query, Collector results).
> On a large document set, the single threaded collection can be really slow.
> The ParallelDrillSideways implementation could:
> 1. Use the CollectorManager-based method IndexSearcher.search(Query query, 
> CollectorManager collectorManager) to get the benefits of multithreading on 
> index segments (see the sketch below),
> 2. Compute each DrillSideways subquery on a single thread.
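
Purely as illustration, here is a hedged sketch (not code from the attached 
patch) of the CollectorManager-based entry point step 1 refers to; 
TotalHitCountCollector stands in for the real drill-down/drill-sideways 
collectors:

{code}
import java.util.Collection;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.CollectorManager;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TotalHitCountCollector;

// One collector per index slice, searched concurrently, reduced to one total.
static long countInParallel(IndexReader reader, Query query) throws Exception {
  ExecutorService pool = Executors.newFixedThreadPool(4);
  IndexSearcher searcher = new IndexSearcher(reader, pool); // slices run on the pool
  try {
    return searcher.search(query, new CollectorManager<TotalHitCountCollector, Long>() {
      @Override
      public TotalHitCountCollector newCollector() {
        return new TotalHitCountCollector(); // fresh collector per slice
      }
      @Override
      public Long reduce(Collection<TotalHitCountCollector> collectors) {
        long total = 0;
        for (TotalHitCountCollector c : collectors) {
          total += c.getTotalHits();
        }
        return total;
      }
    });
  } finally {
    pool.shutdown();
  }
}
{code}

The same CollectorManager pattern would let each drill-sideways dimension be 
collected per segment in parallel, which is the gain the issue is after.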



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: AutomatonTermsEnum and fixed-string automata

2017-01-06 Thread Michael McCandless
These automaton intersection APIs are frustrating with all the special
case handling... Ideas welcome!

We've had similar challenges with them in the past, when a user
invoked Terms.intersect directly instead of via CompiledAutomaton:
https://issues.apache.org/jira/browse/LUCENE-7576

The problem is CompiledAutomaton specializes certain cases (all
strings match, no strings match, single term) and sidesteps
Terms.intersect for those cases.

We should fix AutomatonTermsEnum public ctor w/ the same checks
(insist on a NORMAL case) so you don't hit assert failures, or, worse
... I'll do that.

I think a new CompiledAutomaton.intersect taking TermsEnum would be
tricky in general because it relies on the (efficient) Terms.intersect
to handle the NORMAL case well, but we can't invoke that from a
TermsEnum.

In the SINGLE case, could you use SingleTermsEnum, passing the
TermsEnum from your doc values, and the term from the
CompiledAutomaton?  Would that suffice as a workaround?
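
For what it's worth, a rough sketch of that dispatch (my own illustration, not
an existing Lucene method; it assumes the public AutomatonTermsEnum ctor and
SingleTermsEnum):

    // Hypothetical helper: pick a TermsEnum for a CompiledAutomaton when we
    // only have another TermsEnum (e.g. from doc values), not a Terms instance.
    static TermsEnum filteredEnum(CompiledAutomaton ca, TermsEnum te) throws IOException {
      switch (ca.type) {
        case NONE:   return TermsEnum.EMPTY;                  // matches no terms
        case ALL:    return te;                               // matches every term
        case SINGLE: return new SingleTermsEnum(te, ca.term); // the fixed-string case
        default:     return new AutomatonTermsEnum(te, ca);   // NORMAL: byte-run automaton
      }
    }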

Mike McCandless

http://blog.mikemccandless.com

On Fri, Jan 6, 2017 at 11:17 AM, Alan Woodward  wrote:
> We’ve hit an issue while developing marple, where we want to have the
> ability to filter the values from a SortedDocValues terms dictionary.
> Normally you’d create a CompiledAutomaton from the filter string, and then
> call #getTermsEnum(Terms) on it; but for docvalues, we don’t have a Terms
> instance, we instead have a TermsEnum.
>
> Using AutomatonTermsEnum to wrap the TermsEnum works in most cases here, but
> if the CompiledAutomaton in question is a fixed string, then we get
> assertion failures, because ATE uses the compiled automaton’s internal
> ByteRunAutomaton for filtering, and fixed-string automata don’t have one.
>
> Is there a work-around that I’m missing here?  Or should I maybe open a JIRA
> to add a #getTermsEnum(TermsEnum) method to CompiledAutomaton?
>
> Alan Woodward
> www.flax.co.uk
>
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: 6.4 release

2017-01-06 Thread Joel Bernstein
I'm all done with 6.4 changes.

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Jan 6, 2017 at 1:22 PM, jim ferenczi  wrote:

> Thanks all for your answers.
> Are we still good to build the first RC on Monday? Christine, do you maybe
> need more time?
>
>
> 2017-01-05 5:47 GMT+01:00 Shalin Shekhar Mangar :
>
>> Aggregated metrics will not be in 6.4 -- that work is still in progress.
>>
>> On Thu, Jan 5, 2017 at 6:57 AM, S G  wrote:
>> > +1 for adding the metric related changes.
>> > Aggregated metrics from replicas sounds like a very nice thing to
>> have.
>> >
>> > On Wed, Jan 4, 2017 at 12:11 PM, Varun Thacker 
>> wrote:
>> >>
>> >> +1 to cut a release branch on monday. Lots of goodies in this release!
>> >>
>> >> On Tue, Jan 3, 2017 at 8:23 AM, jim ferenczi 
>> >> wrote:
>> >>>
>> >>> Hi,
>> >>> I would like to volunteer to release 6.4. I can cut the release branch
>> >>> next Monday if everybody agrees.
>> >>>
>> >>> Jim
>> >>
>> >>
>> >
>>
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>>
>


[JENKINS] Lucene-Solr-NightlyTests-6.x - Build # 249 - Still unstable

2017-01-06 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-6.x/249/

11 tests failed.
FAILED:  org.apache.solr.cloud.BasicDistributedZkTest.test

Error Message:
Test abandoned because suite timeout was reached.

Stack Trace:
java.lang.Exception: Test abandoned because suite timeout was reached.
at __randomizedtesting.SeedInfo.seed([6126209FE45704B1]:0)


FAILED:  junit.framework.TestSuite.org.apache.solr.cloud.BasicDistributedZkTest

Error Message:
Suite timeout exceeded (>= 720 msec).

Stack Trace:
java.lang.Exception: Suite timeout exceeded (>= 720 msec).
at __randomizedtesting.SeedInfo.seed([6126209FE45704B1]:0)


FAILED:  
org.apache.solr.cloud.hdfs.HdfsCollectionsAPIDistributedZkTest.testCreateShouldFailOnExistingCore

Error Message:
{responseHeader={status=0,QTime=38146},failure={127.0.0.1:33914_solr=org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error
 from server at http://127.0.0.1:33914/solr: Core with name 
'halfcollection_shard1_replica1' already 
exists.,127.0.0.1:35654_solr=org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error
 from server at http://127.0.0.1:35654/solr: Error CREATEing SolrCore 
'halfcollection_shard2_replica1': Unable to create core 
[halfcollection_shard2_replica1] Caused by: Direct buffer memory}}

Stack Trace:
java.lang.AssertionError: 
{responseHeader={status=0,QTime=38146},failure={127.0.0.1:33914_solr=org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error
 from server at http://127.0.0.1:33914/solr: Core with name 
'halfcollection_shard1_replica1' already 
exists.,127.0.0.1:35654_solr=org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error
 from server at http://127.0.0.1:35654/solr: Error CREATEing SolrCore 
'halfcollection_shard2_replica1': Unable to create core 
[halfcollection_shard2_replica1] Caused by: Direct buffer memory}}
at 
__randomizedtesting.SeedInfo.seed([6126209FE45704B1:8DE08D2B1FCBB955]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertNotNull(Assert.java:526)
at 
org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testCreateShouldFailOnExistingCore(CollectionsAPIDistributedZkTest.java:308)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:907)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:943)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:957)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:811)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:462)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:916)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:802)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:852)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 

[jira] [Comment Edited] (SOLR-9929) Documentation and sample code about how to train the model using user clicks when use ltr module

2017-01-06 Thread Diego Ceccarelli (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15805240#comment-15805240
 ] 

Diego Ceccarelli edited comment on SOLR-9929 at 1/6/17 6:41 PM:


Thanks [~jefferyyuan] for opening the issue, I submitted a patch to the 
learning to rank example readme, trying to explain better how a user can 
produce a training set from feedback data. The new version is available here: 
https://github.com/bloomberg/lucene-solr/blob/master-ltr/solr/contrib/ltr/example/README.md

Please let me know if you have comments or more questions. Thanks! 


was (Author: diegoceccarelli):
Improve Learning to Rank example readme

> Documentation and sample code about how to train the model using user clicks 
> when use ltr module
> 
>
> Key: SOLR-9929
> URL: https://issues.apache.org/jira/browse/SOLR-9929
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: jefferyyuan
>Assignee: Christine Poerschke
>  Labels: learning-to-rank, machine_learning, solr
> Fix For: master (7.0), 6.4
>
> Attachments: 0001-Improve-Learning-to-Rank-example-Readme.patch
>
>
> Thanks very much for integrating machine learning to Solr.
> https://issues.apache.org/jira/browse/SOLR-8542
> I tried to integrate it. But I have difficulty figuring out how to translate 
> the partial pairwise feedback into the importance or relevance of that doc.
> https://github.com/apache/lucene-solr/blob/f62874e47a0c790b9e396f58ef6f14ea04e2280b/solr/contrib/ltr/README.md
> In the Assemble training data part: the third column indicates the relative 
> importance or relevance of that doc
> Could you please give more info about how to give a score based on what user 
> clicks?
> I have read 
> https://static.aminer.org/pdf/PDF/000/472/865/optimizing_search_engines_using_clickthrough_data.pdf
> http://www.cs.cornell.edu/people/tj/publications/joachims_etal_05a.pdf
> http://alexbenedetti.blogspot.com/2016/07/solr-is-learning-to-rank-better-part-1.html
> But still have no clue yet.
> From a user's perspective, steps such as setting up the features and model in 
> Solr are simple, but collecting the feedback data and training/updating the 
> model is much more complex. Without that, we can't really use the 
> learning-to-rank function in Solr.
> It would be great if Solr could provide some detailed instructions and sample 
> code about how to translate the partial pairwise feedback and use it to train 
> and update the model.
> Thanks
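
Purely illustrative (an assumption about one common approach, not the
project's official recipe): a simple way to start is to bucket the observed
click-through rate per (query, document) pair into 0-4 relevance grades and
emit RankSVM/LibSVM-style training lines. A hedged sketch:

{code}
// Hypothetical: derive a 0..4 relevance grade from impressions/clicks and
// emit a training line like "3 qid:7 1:0.5 2:12.0 # doc42".
// Position bias and presentation effects are deliberately ignored here.
static String trainingLine(int qid, String docId, long impressions, long clicks,
                           double[] features) {
  double ctr = impressions > 0 ? (double) clicks / impressions : 0.0;
  int grade = (int) Math.min(4, Math.floor(ctr * 5)); // crude CTR bucketing
  StringBuilder sb = new StringBuilder();
  sb.append(grade).append(" qid:").append(qid);
  for (int i = 0; i < features.length; i++) {
    sb.append(' ').append(i + 1).append(':').append(features[i]);
  }
  return sb.append(" # ").append(docId).toString();
}
{code}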



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9929) Documentation and sample code about how to train the model using user clicks when use ltr module

2017-01-06 Thread Diego Ceccarelli (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Diego Ceccarelli updated SOLR-9929:
---
Attachment: 0001-Improve-Learning-to-Rank-example-Readme.patch

Improve Learning to Rank example readme

> Documentation and sample code about how to train the model using user clicks 
> when use ltr module
> 
>
> Key: SOLR-9929
> URL: https://issues.apache.org/jira/browse/SOLR-9929
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: jefferyyuan
>Assignee: Christine Poerschke
>  Labels: learning-to-rank, machine_learning, solr
> Fix For: master (7.0), 6.4
>
> Attachments: 0001-Improve-Learning-to-Rank-example-Readme.patch
>
>
> Thanks very much for integrating machine learning to Solr.
> https://issues.apache.org/jira/browse/SOLR-8542
> I tried to integrate it. But I have difficulty figuring out how to translate 
> the partial pairwise feedback into the importance or relevance of that doc.
> https://github.com/apache/lucene-solr/blob/f62874e47a0c790b9e396f58ef6f14ea04e2280b/solr/contrib/ltr/README.md
> In the Assemble training data part: the third column indicates the relative 
> importance or relevance of that doc
> Could you please give more info about how to give a score based on what user 
> clicks?
> I have read 
> https://static.aminer.org/pdf/PDF/000/472/865/optimizing_search_engines_using_clickthrough_data.pdf
> http://www.cs.cornell.edu/people/tj/publications/joachims_etal_05a.pdf
> http://alexbenedetti.blogspot.com/2016/07/solr-is-learning-to-rank-better-part-1.html
> But still have no clue yet.
> From a user's perspective, steps such as setting up the features and model in 
> Solr are simple, but collecting the feedback data and training/updating the 
> model is much more complex. Without that, we can't really use the 
> learning-to-rank function in Solr.
> It would be great if Solr could provide some detailed instructions and sample 
> code about how to translate the partial pairwise feedback and use it to train 
> and update the model.
> Thanks



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: 6.4 release

2017-01-06 Thread jim ferenczi
Thanks all for your answers.
Are we still good to build the first RC on Monday? Christine, do you maybe
need more time?


2017-01-05 5:47 GMT+01:00 Shalin Shekhar Mangar :

> Aggregated metrics will not be in 6.4 -- that work is still in progress.
>
> On Thu, Jan 5, 2017 at 6:57 AM, S G  wrote:
> > +1 for adding the metric related changes.
> > Aggregated metrics from replicas sounds like a very nice thing to
> have.
> >
> > On Wed, Jan 4, 2017 at 12:11 PM, Varun Thacker 
> wrote:
> >>
> >> +1 to cut a release branch on monday. Lots of goodies in this release!
> >>
> >> On Tue, Jan 3, 2017 at 8:23 AM, jim ferenczi 
> >> wrote:
> >>>
> >>> Hi,
> >>> I would like to volunteer to release 6.4. I can cut the release branch
> >>> next Monday if everybody agrees.
> >>>
> >>> Jim
> >>
> >>
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


[jira] [Commented] (LUCENE-7620) UnifiedHighlighter: add target character width BreakIterator wrapper

2017-01-06 Thread Jim Ferenczi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15805199#comment-15805199
 ] 

Jim Ferenczi commented on LUCENE-7620:
--

{quote}
By choosing a lengthGoal on the low side; maybe "too long" will tend not to be 
a problem? Or see my TODO at the top of the file – essentially choose the break 
that is closest to the goal instead of always the first following it.
{quote}

Yeah, it depends on how the lengthGoal is perceived. I was looking at it as a 
boundary mainly to solve "too long" fragments, while this issue is more about 
"too short" fragments. Maybe that's a different issue then, but I am just 
afraid that we'll end up with multiple public break iterator impls that must 
follow a specific pattern to be used.
Anyway, this patch is a start toward better highlighting through custom break 
iterators, and it solves a real issue. Please push to 6.4 if you think it's 
ready; we can always discuss the next steps in a follow-up.
Regarding the assertion, I prefer an IllegalStateException with a clear 
message, but I am maybe too paranoid.






> UnifiedHighlighter: add target character width BreakIterator wrapper
> 
>
> Key: LUCENE-7620
> URL: https://issues.apache.org/jira/browse/LUCENE-7620
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: David Smiley
>Assignee: David Smiley
> Attachments: LUCENE_7620_UH_LengthGoalBreakIterator.patch
>
>
> The original Highlighter includes a {{SimpleFragmenter}} that delineates 
> fragments (aka Passages) by a character width.  The default is 100 characters.
> It would be great to support something similar for the UnifiedHighlighter.  
> It's useful in its own right and of course it helps users transition to the 
> UH.  I'd like to do it as a wrapper to another BreakIterator -- perhaps a 
> sentence one.  In this way you get back Passages that are a number of 
> sentences so they will look nice instead of breaking mid-way through a 
> sentence.  And you get some control by specifying a target number of 
> characters.  This BreakIterator wouldn't be a general purpose 
> java.text.BreakIterator since it would assume it's called in a manner exactly 
> as the UnifiedHighlighter uses it.  It would probably be compatible with the 
> PostingsHighlighter too.
> I don't propose doing this by default; besides, it's easy enough to pick your 
> BreakIterator config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9928) MetricsDirectoryFactory::renameWithOverwrite incorrectly calls super

2017-01-06 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15805150#comment-15805150
 ] 

Mark Miller commented on SOLR-9928:
---

The argument for hiding and doing a single unwrap everywhere seems to be that 
this factory tries to inject itself in an abnormal way, rather than counting on 
being configured. It almost looks like it hides outside the cache - and so the 
unwrap would pass the cache key directory that impls may expect to get passed.

> MetricsDirectoryFactory::renameWithOverwrite incorrectly calls super
> 
>
> Key: SOLR-9928
> URL: https://issues.apache.org/jira/browse/SOLR-9928
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: master (7.0), 6.4
>Reporter: Mike Drob
>Assignee: Andrzej Bialecki 
> Fix For: master (7.0), 6.4
>
> Attachments: SOLR-9928.patch, SOLR-9928.patch
>
>
> MetricsDirectoryFactory::renameWithOverwrite should call the delegate instead 
> of super. Trivial patch forthcoming.
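
To illustrate the delegate-versus-super distinction with a self-contained toy
(these are not the actual Solr classes):

{code}
// Toy model: a metrics wrapper around a delegate factory.
abstract class DirFactory {
  void renameWithOverwrite(String file) { /* slow generic copy+delete fallback */ }
}

class FastDirFactory extends DirFactory {
  @Override
  void renameWithOverwrite(String file) { /* efficient native rename */ }
}

class MetricsDirFactory extends DirFactory {
  private final DirFactory delegate; // e.g. a FastDirFactory

  MetricsDirFactory(DirFactory delegate) { this.delegate = delegate; }

  @Override
  void renameWithOverwrite(String file) {
    // Calling super.renameWithOverwrite(file) would silently take the generic
    // fallback; forwarding to the delegate keeps the wrapped factory's fast path.
    delegate.renameWithOverwrite(file);
  }
}
{code}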



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9928) MetricsDirectoryFactory::renameWithOverwrite incorrectly calls super

2017-01-06 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15805115#comment-15805115
 ] 

Mark Miller commented on SOLR-9928:
---

Previously none of the impls unwrapped more than one layer, and they only did 
it for the NRT caching dir. I recently fixed it so they unwrap filtered dirs 
multiple layers deep. So we should probably remove the unwrapping. The 
consistency argument is that it's confusing to only do it in some methods. It 
indicates the unwrapping is unnecessary cruft where it is used. Or it's 
intentionally trying to hide the dir and failing to in some cases. It doesn't 
make sense in either scenario.

> MetricsDirectoryFactory::renameWithOverwrite incorrectly calls super
> 
>
> Key: SOLR-9928
> URL: https://issues.apache.org/jira/browse/SOLR-9928
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: master (7.0), 6.4
>Reporter: Mike Drob
>Assignee: Andrzej Bialecki 
> Fix For: master (7.0), 6.4
>
> Attachments: SOLR-9928.patch, SOLR-9928.patch
>
>
> MetricsDirectoryFactory::renameWithOverwrite should call the delegate instead 
> of super. Trivial patch forthcoming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9937) StandardDirectoryFactory::move never uses more efficient implementation

2017-01-06 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15805103#comment-15805103
 ] 

Mark Miller commented on SOLR-9937:
---

Nice catch. I suppose we should look at getting a test in for this.

> StandardDirectoryFactory::move never uses more efficient implementation
> ---
>
> Key: SOLR-9937
> URL: https://issues.apache.org/jira/browse/SOLR-9937
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Mike Drob
> Attachments: SOLR-9937.patch
>
>
> {noformat}
>   Path path1 = ((FSDirectory) 
> baseFromDir).getDirectory().toAbsolutePath();
>   Path path2 = ((FSDirectory) 
> baseFromDir).getDirectory().toAbsolutePath();
>   
>   try {
> Files.move(path1.resolve(fileName), path2.resolve(fileName), 
> StandardCopyOption.ATOMIC_MOVE);
>   } catch (AtomicMoveNotSupportedException e) {
> Files.move(path1.resolve(fileName), path2.resolve(fileName));
>   }
> {noformat}
> Because {{path1 == path2}} this code never does anything and move always 
> defaults to the less efficient implementation in DirectoryFactory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-9937) StandardDirectoryFactory::move never uses more efficient implementation

2017-01-06 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller reassigned SOLR-9937:
-

Assignee: Mark Miller

> StandardDirectoryFactory::move never uses more efficient implementation
> ---
>
> Key: SOLR-9937
> URL: https://issues.apache.org/jira/browse/SOLR-9937
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Mike Drob
>Assignee: Mark Miller
> Attachments: SOLR-9937.patch
>
>
> {noformat}
>   Path path1 = ((FSDirectory) 
> baseFromDir).getDirectory().toAbsolutePath();
>   Path path2 = ((FSDirectory) 
> baseFromDir).getDirectory().toAbsolutePath();
>   
>   try {
> Files.move(path1.resolve(fileName), path2.resolve(fileName), 
> StandardCopyOption.ATOMIC_MOVE);
>   } catch (AtomicMoveNotSupportedException e) {
> Files.move(path1.resolve(fileName), path2.resolve(fileName));
>   }
> {noformat}
> Because {{path1 == path2}} this code never does anything and move always 
> defaults to the less efficient implementation in DirectoryFactory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7588) A parallel DrillSideways implementation

2017-01-06 Thread Emmanuel Keller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15805029#comment-15805029
 ] 

Emmanuel Keller commented on LUCENE-7588:
-

Ok, I am able to reproduce the failure in my own environment. I'll try to fix 
it now.

{noformat}

   [junit4]   2> NOTE: reproduce with: ant test  
-Dtestcase=TestParallelDrillSideways -Dtests.method=testRandom 
-Dtests.seed=734B3451E1B6F47B -Dtests.slow=true -Dtests.locale=ar-BH 
-Dtests.timezone=America/North_Dakota/Center -Dtests.asserts=true 
-Dtests.file.encoding=US-ASCII
   [junit4] FAILURE 3.50s | TestParallelDrillSideways.testRandom <<<
   [junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<1[00]4> 
but was:<1[]4>
   [junit4]>at 
__randomizedtesting.SeedInfo.seed([734B3451E1B6F47B:107115E50D64208]:0)
   [junit4]>at 
org.apache.lucene.facet.TestDrillSideways.verifyEquals(TestDrillSideways.java:1036)
   [junit4]>at 
org.apache.lucene.facet.TestDrillSideways.testRandom(TestDrillSideways.java:820)
   [junit4]>at java.lang.Thread.run(Thread.java:745)
   [junit4]   2> Jan 06, 2017 6:09:15 PM 
com.carrotsearch.randomizedtesting.ThreadLeakControl checkThreadLeaks
   [junit4]   2> WARNING: Will linger awaiting termination of 1 leaked 
thread(s).
   [junit4]   2> Jan 06, 2017 6:09:35 PM 
com.carrotsearch.randomizedtesting.ThreadLeakControl checkThreadLeaks
   [junit4]   2> SEVERE: 1 thread leaked from SUITE scope at 
org.apache.lucene.facet.TestParallelDrillSideways: 
   [junit4]   2>1) Thread[id=17, name=LuceneTestCase-1-thread-1, 
state=WAITING, group=TGRP-TestParallelDrillSideways]
   [junit4]   2> at sun.misc.Unsafe.park(Native Method)
   [junit4]   2> at 
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
   [junit4]   2> at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
   [junit4]   2> at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
   [junit4]   2> at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
   [junit4]   2> at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
   [junit4]   2> at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   [junit4]   2> at java.lang.Thread.run(Thread.java:745)
   [junit4]   2> Jan 06, 2017 6:09:35 PM 
com.carrotsearch.randomizedtesting.ThreadLeakControl tryToInterruptAll
   [junit4]   2> INFO: Starting to interrupt leaked threads:
   [junit4]   2>1) Thread[id=17, name=LuceneTestCase-1-thread-1, 
state=WAITING, group=TGRP-TestParallelDrillSideways]
   [junit4]   2> Jan 06, 2017 6:09:38 PM 
com.carrotsearch.randomizedtesting.ThreadLeakControl tryToInterruptAll
   [junit4]   2> SEVERE: There are still zombie threads that couldn't be 
terminated:
   [junit4]   2>1) Thread[id=17, name=LuceneTestCase-1-thread-1, 
state=WAITING, group=TGRP-TestParallelDrillSideways]
   [junit4]   2> at sun.misc.Unsafe.park(Native Method)
   [junit4]   2> at 
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
   [junit4]   2> at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
   [junit4]   2> at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
   [junit4]   2> at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
   [junit4]   2> at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
   [junit4]   2> at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   [junit4]   2> at java.lang.Thread.run(Thread.java:745)

{noformat}

> A parallel DrillSideways implementation
> ---
>
> Key: LUCENE-7588
> URL: https://issues.apache.org/jira/browse/LUCENE-7588
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: master (7.0), 6.3.1
>Reporter: Emmanuel Keller
>Priority: Minor
>  Labels: facet, faceting
> Fix For: master (7.0), 6.4
>
> Attachments: LUCENE-7588.patch
>
>
> Currently the DrillSideways implementation is based on the single threaded 
> IndexSearcher.search(Query query, Collector results).
> On a large document set, the single threaded collection can be really slow.
> The ParallelDrillSideways implementation could:
> 1. Use the CollectorManager-based method IndexSearcher.search(Query query, 
> CollectorManager collectorManager) to get the benefits of multithreading on 
> index segments,
> 2. Compute each DrillSideways subquery on a single thread.



--
This message was sent by Atlassian JIRA

[jira] [Commented] (SOLR-8292) TransactionLog.next() does not honor contract and return null for EOF

2017-01-06 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15805015#comment-15805015
 ] 

Erick Erickson commented on SOLR-8292:
--

[~caomanhdat317] All I was really looking for was whether, in your opinion, 
this was even possible any more; I was just being lazy. This wasn't 
particularly about CDCR; it was just that CDCR exercised it, I think.

Please don't spend time trying to reproduce. It sure would have been helpful if 
I'd recorded _what_ test failed a year ago, wouldn't it? Shhh.

It's been a long time since I opened this. I'll just start monitoring CDCR 
Jenkins failures (I've noticed a few go by but mostly haven't pursued them) and 
see if anything similar reappears; if not, maybe we can close it. That'll take 
a while before anyone would feel comfortable. I probably should have been doing 
that all along. Ditto for SOLR-4116.

> TransactionLog.next() does not honor contract and return null for EOF
> -
>
> Key: SOLR-8292
> URL: https://issues.apache.org/jira/browse/SOLR-8292
> Project: Solr
>  Issue Type: Bug
>Reporter: Erick Erickson
>Assignee: Erick Erickson
> Attachments: SOLR-8292.patch
>
>
> This came to light in CDCR testing, which stresses this code a lot; there's a 
> stack trace showing this line (line 641 on trunk) throwing an EOF exception:
> o = codec.readVal(fis);
> At first I thought to just wrap reading fis in a try/catch and return null, 
> but looking at the code a bit more I'm not so sure, that seems like it'd mask 
> what looks at first glance like a bug in the logic.
> A few lines earlier (633-4) there's these lines:
> // shouldn't currently happen - header and first record are currently written 
> at the same time
> if (fis.position() >= fos.size()) {
> Why are we comparing the input file position against the size of the 
> output file? Maybe because the 'i' key is right next to the 'o' key? The 
> comment hints that it's checking for the ability to read the first record in 
> input stream along with the header. And perhaps there's a different issue 
> here because the expectation clearly is that the first record should be there 
> if the header is.
> So what's the right thing to do? Wrap in a try/catch and return null for EOF? 
> Change the test? Do both?
> I can take care of either, but wanted a clue whether the comparison of fis to 
> fos is intended.
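
A minimal sketch of the try/catch option the description floats (assumed to
live inside the log reader's next(); "codec" and "fis" as in the description):

{code}
Object o;
try {
  o = codec.readVal(fis);          // may hit the end of a partially written tlog
} catch (java.io.EOFException e) {
  return null;                     // honor the contract: null signals EOF
}
{code}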



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9938) Improve the performance of CloudSolrStream

2017-01-06 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-9938:
-
Description: 
Now that we have started to improve the efficiency of Streaming, I think it 
makes sense to work on CloudSolrStream, which is used as a bulk stream source.

The first thing to tackle is how the merge sort of the SolrStreams from each 
shard is done.

Currently the sorting is done by a TreeSet, which is not the most efficient 
approach. For one thing each *put* and *poll* on the TreeSet creates a new map 
Entry. When streaming millions of documents this adds up. Also the TreeSet is 
backed by a TreeMap that maintains a fully ordered set of tuples. We just need 
to know the highest Tuple on each read().

I think we can increase throughput significantly by using a custom priority 
queue for sorting rather than the TreeSet.  




  was:
Now that we have started to improve the efficiency of Streaming, I think it 
makes sense to work on CloudSolrStream, which is used as a bulk stream source.

The first thing to tackle is how the merge sort of the SolrStreams from each 
shard is done.

Currently the sorting is done by a TreeSet, which is not the most efficient 
approach. For one thing each *put* and *poll* on the TreeSet creates a new map 
Entry. When streaming millions of documents this adds up. Also the TreeSet is 
backed by a TreeMap that maintains a fully ordered set of tuples. We just need to 
know the highest Tuple.

I think we can increase throughput significantly by using a custom priority 
queue for sorting rather than the TreeSet.  





> Improve the performance of CloudSolrStream
> --
>
> Key: SOLR-9938
> URL: https://issues.apache.org/jira/browse/SOLR-9938
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>
> Now that we have started to improve the efficiency of Streaming, I think it 
> makes sense to work on CloudSolrStream, which is used as a bulk stream source.
> The first thing to tackle is how the merge sort of the SolrStreams from each 
> shard is done.
> Currently the sorting is done by a TreeSet, which is not the most efficient 
> approach. For one thing each *put* and *poll* on the TreeSet creates a new 
> map Entry. When streaming millions of documents this adds up. Also the 
> TreeSet is backed by a TreeMap that maintains a fully ordered set of tuples. 
> We just need to know the highest Tuple on each read().
> I think we can increase throughput significantly by using a custom priority 
> queue for sorting rather than the TreeSet.  
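
As a rough sketch of the priority-queue idea (using Lucene's
org.apache.lucene.util.PriorityQueue; ShardStream and Tuple here are simplified
stand-ins, not the real SolrJ classes):

{code}
// k-way merge over per-shard streams with a fixed-size heap, avoiding the
// per-tuple Map.Entry garbage of a TreeSet-based merge.
class StreamQueue extends org.apache.lucene.util.PriorityQueue<ShardStream> {
  StreamQueue(int numShards) {
    super(numShards); // heap sized once, to the shard count
  }
  @Override
  protected boolean lessThan(ShardStream a, ShardStream b) {
    return a.current().compareTo(b.current()) < 0; // compare on the sort fields
  }
}

// Each read() is O(log numShards) and allocates nothing per tuple.
// Callers check queue.size() > 0 first.
Tuple read(StreamQueue queue) throws java.io.IOException {
  ShardStream top = queue.top();
  Tuple t = top.current();
  if (top.next()) {
    queue.updateTop(); // sift the advanced stream to its new position
  } else {
    queue.pop();       // this shard is exhausted
  }
  return t;
}
{code}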



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9938) Improve the performance of CloudSolrStream

2017-01-06 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-9938:
-
Description: 
Now that we have started to improve the efficiency of Streaming, I think it 
makes sense to work on CloudSolrStream, which is used as a bulk stream source.

The first thing to tackle is how the merge sort of the SolrStreams from each 
shard is done.

Currently the sorting is done by a TreeSet, which is not the most efficient 
approach. For one thing each *put* and *poll* on the TreeSet creates a new map 
Entry. When streaming millions of documents this adds up. Also the TreeSet is 
backed by a TreeMap that maintains a fully ordered set of tuples. We just need to 
know the highest Tuple.

I think we can increase throughput significantly by using a custom priority 
queue for sorting rather than the TreeSet.  




  was:
Now that we have started to improve the efficiency of Streaming, I think it 
makes sense to work on CloudSolrStream, which is used as a bulk stream source.

The first thing to tackle is how the merge sort of the SolrStream from each 
shard is done.

Currently the sorting is done by a TreeSet, which is not the most efficient 
approach. For one thing each *put* and *poll* on the TreeSet creates a new map 
Entry. When streaming millions of documents this adds up. Also the TreeSet is 
backed by a TreeMap that maintains a fully ordered set of tuples. We just need to 
know the highest Tuple.

I think we can increase throughput significantly by using a custom priority 
queue for sorting rather than the TreeSet.  





> Improve the performance of CloudSolrStream
> --
>
> Key: SOLR-9938
> URL: https://issues.apache.org/jira/browse/SOLR-9938
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>
> Now that we have started to improve the efficiency of Streaming, I think it 
> makes sense to work on CloudSolrStream, which is used as a bulk stream source.
> The first thing to tackle is how the merge sort of the SolrStreams from each 
> shard is done.
> Currently the sorting is done by a TreeSet, which is not the most efficient 
> approach. For one thing each *put* and *poll* on the TreeSet creates a new 
> map Entry. When streaming millions of documents this adds up. Also the 
> TreeSet is backed by a TreeMap that maintains a fully ordered set of tuples. We 
> just need to know the highest Tuple.
> I think we can increase throughput significantly by using a custom priority 
> queue for sorting rather than the TreeSet.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-9938) Improve the performance of CloudSolrStream

2017-01-06 Thread Joel Bernstein (JIRA)
Joel Bernstein created SOLR-9938:


 Summary: Improve the performance of CloudSolrStream
 Key: SOLR-9938
 URL: https://issues.apache.org/jira/browse/SOLR-9938
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Joel Bernstein


Now that we have started to improve the efficiency of Streaming, I think it 
makes sense to work on CloudSolrStream, which is used as a bulk stream source.

The first thing to tackle is how the merge sort of the SolrStream from each 
shard is done.

Currently the sorting is done by a TreeSet, which is not the most efficient 
approach. For one thing each *put* and *poll* on the TreeSet creates a new map 
Entry. When streaming millions of documents this adds up. Also the TreeSet is 
backed by a TreeMap that maintains a fully ordered set of tuples. We just need to 
know the highest Tuple.

I think we can increase throughput significantly by using a custom priority 
queue for sorting rather than the TreeSet.  






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9937) StandardDirectoryFactory::move never uses more efficient implementation

2017-01-06 Thread Mike Drob (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob updated SOLR-9937:

Attachment: SOLR-9937.patch

[~markrmil...@gmail.com] - you were last to touch this in SOLR-9902, care to 
take a look?

> StandardDirectoryFactory::move never uses more efficient implementation
> ---
>
> Key: SOLR-9937
> URL: https://issues.apache.org/jira/browse/SOLR-9937
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Mike Drob
> Attachments: SOLR-9937.patch
>
>
> {noformat}
>   Path path1 = ((FSDirectory) 
> baseFromDir).getDirectory().toAbsolutePath();
>   Path path2 = ((FSDirectory) 
> baseFromDir).getDirectory().toAbsolutePath();
>   
>   try {
> Files.move(path1.resolve(fileName), path2.resolve(fileName), 
> StandardCopyOption.ATOMIC_MOVE);
>   } catch (AtomicMoveNotSupportedException e) {
> Files.move(path1.resolve(fileName), path2.resolve(fileName));
>   }
> {noformat}
> Because {{path1 == path2}} this code never does anything and move always 
> defaults to the less efficient implementation in DirectoryFactory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9928) MetricsDirectoryFactory::renameWithOverwrite incorrectly calls super

2017-01-06 Thread Mike Drob (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15804976#comment-15804976
 ] 

Mike Drob commented on SOLR-9928:
-

Ran some tests locally and it looks like the only two specializations both 
unwrap properly internally, so we don't have to worry about it here. I needed 
to unwrap in renameWithOverwrite because the implementations there _did not_ 
unwrap before trying to use the directory. I don't have strong opinions about 
the consistency argument here.

However, while looking into this, I discovered a bug in 
StandardDirectoryFactory::move, filed as SOLR-9937

> MetricsDirectoryFactory::renameWithOverwrite incorrectly calls super
> 
>
> Key: SOLR-9928
> URL: https://issues.apache.org/jira/browse/SOLR-9928
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: master (7.0), 6.4
>Reporter: Mike Drob
>Assignee: Andrzej Bialecki 
> Fix For: master (7.0), 6.4
>
> Attachments: SOLR-9928.patch, SOLR-9928.patch
>
>
> MetricsDirectoryFactory::renameWithOverwrite should call the delegate instead 
> of super. Trivial patch forthcoming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-9937) StandardDirectoryFactory::move never uses more efficient implementation

2017-01-06 Thread Mike Drob (JIRA)
Mike Drob created SOLR-9937:
---

 Summary: StandardDirectoryFactory::move never uses more efficient 
implementation
 Key: SOLR-9937
 URL: https://issues.apache.org/jira/browse/SOLR-9937
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Mike Drob


{noformat}
  Path path1 = ((FSDirectory) baseFromDir).getDirectory().toAbsolutePath();
  Path path2 = ((FSDirectory) baseFromDir).getDirectory().toAbsolutePath();
  
  try {
Files.move(path1.resolve(fileName), path2.resolve(fileName), 
StandardCopyOption.ATOMIC_MOVE);
  } catch (AtomicMoveNotSupportedException e) {
Files.move(path1.resolve(fileName), path2.resolve(fileName));
  }
{noformat}

Because {{path1 == path2}} this code never does anything and move always 
defaults to the less efficient implementation in DirectoryFactory.
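
Presumably the fix is just to derive {{path2}} from the destination; a hedged
sketch (assuming {{baseToDir}} is the unwrapped target directory available in
the surrounding method):

{noformat}
  Path path1 = ((FSDirectory) baseFromDir).getDirectory().toAbsolutePath();
  Path path2 = ((FSDirectory) baseToDir).getDirectory().toAbsolutePath(); // destination

  try {
    Files.move(path1.resolve(fileName), path2.resolve(fileName),
        StandardCopyOption.ATOMIC_MOVE);
  } catch (AtomicMoveNotSupportedException e) {
    Files.move(path1.resolve(fileName), path2.resolve(fileName));
  }
{noformat}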



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-4116) Log Replay [recoveryExecutor-8-thread-1] - : java.io.EOFException

2017-01-06 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson reassigned SOLR-4116:


Assignee: Erick Erickson

> Log Replay [recoveryExecutor-8-thread-1] - : java.io.EOFException
> -
>
> Key: SOLR-4116
> URL: https://issues.apache.org/jira/browse/SOLR-4116
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 5.1
> Environment: 5.0.0.2012.11.28.10.42.06
> Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
>Reporter: Markus Jelsma
>Assignee: Erick Erickson
> Fix For: 5.5, 6.0
>
>
> With SOLR-4032 fixed we see other issues when randomly taking down nodes 
> (nicely via tomcat restart) while indexing a few million web pages from 
> Hadoop. We do make sure that at least one node is up for a shard but due to 
> recovery issues it may not be live.
> {code}
> 2012-11-28 11:32:33,086 WARN [solr.update.UpdateLog] - 
> [recoveryExecutor-8-thread-1] - : Starting log replay 
> tlog{file=/opt/solr/cores/openindex_e/data/tlog/tlog.028 
> refcount=2} active=false starting pos=0
> 2012-11-28 11:32:41,873 ERROR [solr.update.UpdateLog] - 
> [recoveryExecutor-8-thread-1] - : java.io.EOFException
> at 
> org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:151)
> at 
> org.apache.solr.common.util.JavaBinCodec.readStr(JavaBinCodec.java:479)
> at 
> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:176)
> at 
> org.apache.solr.common.util.JavaBinCodec.readSolrInputDocument(JavaBinCodec.java:374)
> at 
> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:225)
> at 
> org.apache.solr.common.util.JavaBinCodec.readArray(JavaBinCodec.java:451)
> at 
> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:182)
> at 
> org.apache.solr.update.TransactionLog$LogReader.next(TransactionLog.java:618)
> at 
> org.apache.solr.update.UpdateLog$LogReplayer.doReplay(UpdateLog.java:1198)
> at 
> org.apache.solr.update.UpdateLog$LogReplayer.run(UpdateLog.java:1143)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



AutomatonTermsEnum and fixed-string automata

2017-01-06 Thread Alan Woodward
We’ve hit an issue while developing marple, where we want to have the ability 
to filter the values from a SortedDocValues terms dictionary.  Normally you’d 
create a CompiledAutomaton from the filter string, and then call 
#getTermsEnum(Terms) on it; but for docvalues, we don’t have a Terms instance, 
we instead have a TermsEnum.

Using AutomatonTermsEnum to wrap the TermsEnum works in most cases here, but if 
the CompiledAutomaton in question is a fixed string, then we get assertion 
failures, because ATE uses the compiled automaton’s internal ByteRunAutomaton 
for filtering, and fixed-string automata don’t have one.

Is there a work-around that I’m missing here?  Or should I maybe open a JIRA to 
add a #getTermsEnum(TermsEnum) method to CompiledAutomaton?

Alan Woodward
www.flax.co.uk




[jira] [Commented] (LUCENE-7620) UnifiedHighlighter: add target character width BreakIterator wrapper

2017-01-06 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15804800#comment-15804800
 ] 

David Smiley commented on LUCENE-7620:
--

bq. Though I wonder if we should also break the sentence if it's too long ? 
Maybe the wrapped breakiterator could always be a sentence one and we could use 
a WordBreakIterator to cut sentences that are too long ? This way it would 
produce snippets that are similar to the SimpleFragmenter.
It could also be done in another breakiterator on top of this one but this 
would make things over complicated, I guess.

By choosing a lengthGoal on the low side, maybe "too long" will tend not to be 
a problem?  Or see my TODO at the top of the file -- essentially choose the 
break that is closest to the goal instead of always the first one following it.  
Maybe I'll add that in my next patch.
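
Roughly, that TODO might look like this (a sketch over java.text.BreakIterator;
edge handling such as BreakIterator.DONE and clamping at the passage start is
omitted):

{code}
int target = passageStart + lengthGoal;
int after = baseIter.following(target);  // first break strictly after the goal
int before = baseIter.previous();        // step back: the break at or before it
int end = (target - before <= after - target) ? before : after;
{code}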

I don't think we should try to emulate SimpleFragmenter exactly.  We can do a 
much better job ;-)  I like this implementation as a wrapper BreakIterator; 
perhaps we'll add a Regex BI one day and then it would simply fit right in.

bq. For the implementation can you throw an exception on the method that should 
not be called ? For instance ...(etc)

Yeah I could go either way on that... how about {{assert false : "not 
supported/expected";}}?  

bq. Additionally I think that we should have a way to change the start and end 
of a passage when we know all the matches that it contains. This is what the 
FVH is doing, and it should be doable in the UH because the passages are 
created on the fly in a forward manner. This is of course not the purpose of 
this issue and it should be treated as a new feature, but I think it would be 
great to have the same output as the FVH when the max length of the passage is 
set.

Definitely a separate issue.  It wouldn't fit into the BreakIterator 
abstraction either.  Maybe some Passage post-processor like thing.  Or maybe 
simply expose sufficient hooks to allow subclassers to do this.  That keeps the 
UH simpler.


> UnifiedHighlighter: add target character width BreakIterator wrapper
> 
>
> Key: LUCENE-7620
> URL: https://issues.apache.org/jira/browse/LUCENE-7620
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: David Smiley
>Assignee: David Smiley
> Attachments: LUCENE_7620_UH_LengthGoalBreakIterator.patch
>
>
> The original Highlighter includes a {{SimpleFragmenter}} that delineates 
> fragments (aka Passages) by a character width.  The default is 100 characters.
> It would be great to support something similar for the UnifiedHighlighter.  
> It's useful in its own right and of course it helps users transition to the 
> UH.  I'd like to do it as a wrapper to another BreakIterator -- perhaps a 
> sentence one.  In this way you get back Passages that are a number of 
> sentences so they will look nice instead of breaking mid-way through a 
> sentence.  And you get some control by specifying a target number of 
> characters.  This BreakIterator wouldn't be a general purpose 
> java.text.BreakIterator since it would assume it's called in a manner exactly 
> as the UnifiedHighlighter uses it.  It would probably be compatible with the 
> PostingsHighlighter too.
> I don't propose doing this by default; besides, it's easy enough to pick your 
> BreakIterator config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-7617) Improve GroupingSearch API and extensibility

2017-01-06 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-7617:
--
Attachment: LUCENE-7617.patch

Thanks for the review Martijn!  Here's an updated patch:
* no more 'Abstract' in abstract class names
* generics are changed so that instead of Class<GROUP_VALUE_TYPE>, we just use 
Class<T>.  Also, type parameters are all set to T rather than GROUP_VALUE_TYPE. 
You can shrink your windows when looking at grouping code now :)
* Block grouping is left where it is
* Added javadocs, and extra {} around if statements.

Given that everything here is marked as experimental, I think we're OK to just 
backwards-break?  Most people will be using GroupingSearch, I think, which 
stays the same.

> Improve GroupingSearch API and extensibility
> 
>
> Key: LUCENE-7617
> URL: https://issues.apache.org/jira/browse/LUCENE-7617
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Minor
> Attachments: LUCENE-7617.patch, LUCENE-7617.patch
>
>
> While looking at how to make grouping work with the new XValuesSource API in 
> core, I thought I'd try and clean up GroupingSearch a bit.  We have three 
> different ways of grouping at the moment: by doc block, using a single-pass 
> collector; by field; and by ValueSource.  The latter two both use essentially 
> the same two-pass mechanism, with different Collector implementations.
> I can see a number of possible improvements here:
> * abstract the two-pass collector creation into a factory API, which should 
> allow us to add the XValuesSource implementations as well
> * clean up the generics on the two-pass collectors - maybe look into removing 
> them entirely?  I'm not sure they add anything really, and we don't have them 
> on the equivalent plain search APIs
> * think about moving the document block method into the join module instead, 
> alongside all the other block-indexing code
> * rename the various Collector base classes so that they don't have 
> 'Abstract' in them anymore



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9503) NPE in Replica Placement Rules when using Overseer Role with other rules

2017-01-06 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15804750#comment-15804750
 ] 

ASF subversion and git services commented on SOLR-9503:
---

Commit 2b66d0cb127b5e3e92a0f988aa7ba10690227ac3 in lucene-solr's branch 
refs/heads/branch_6x from [~noble.paul]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2b66d0c ]

SOLR-9503: NPE in Replica Placement Rules when using Overseer Role with other 
rules


> NPE in Replica Placement Rules when using Overseer Role with other rules
> 
>
> Key: SOLR-9503
> URL: https://issues.apache.org/jira/browse/SOLR-9503
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Rules, SolrCloud
>Affects Versions: 6.2, master (7.0)
>Reporter: Tim Owen
>Assignee: Noble Paul
> Attachments: SOLR-9503.patch, SOLR-9503.patch
>
>
> The overseer role introduced in SOLR-9251 works well if there's only a single 
> Rule for replica placement e.g. {code}rule=role:!overseer{code} but when 
> combined with another rule, e.g. 
> {code}rule=role:!overseer&rule=host:*,shard:*,replica:<2{code} it can result 
> in a NullPointerException (in Rule.tryAssignNodeToShard)
> This happens because the code builds up a nodeVsTags map, but it only has 
> entries for nodes that have values for *all* tags used among the rules. This 
> means not enough information is available to other rules when they are being 
> checked during replica assignment. In the example rules above, if we have a 
> cluster of 12 nodes and only 3 are given the Overseer role, the others do not 
> have any entry in the nodeVsTags map because they only have the host tag 
> value and not the role tag value.
> Looking at the code in ReplicaAssigner.getTagsForNodes, it is explicitly only 
> keeping entries that fulfil the constraint of having values for all tags used 
> in the rules. Possibly this constraint was suitable when rules were 
> originally introduced, but the Role tag (used for Overseers) is unlikely to 
> be present for all nodes in the cluster, and similarly for sysprop tags which 
> may or may not be set for a node.
> My patch removes this constraint, so the nodeVsTags map contains everything 
> known about all nodes, even if they have no value for a given tag. This 
> allows the rule combination above to work, and doesn't appear to cause any 
> problems with the code paths that use the nodeVsTags map. They handle null 
> values quite well, and the tests pass.
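
A hedged sketch of the relaxed map-building the patch describes (names here are
illustrative, not the actual ReplicaAssigner code):

{code}
// Keep an entry for every node, even when some tags have no value for it.
Map<String, Map<String, Object>> nodeVsTags = new HashMap<>();
for (String node : liveNodes) {
  Map<String, Object> tags = new HashMap<>();
  for (String tagName : tagNamesUsedInRules) {
    tags.put(tagName, valueOrNull(node, tagName)); // null when the node lacks the tag
  }
  nodeVsTags.put(node, tags); // no "must have values for all tags" filter
}
{code}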



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: PyLucene package test error

2017-01-06 Thread Andi Vajda

> On Jan 6, 2017, at 03:02, Shawn Gao  wrote:
> 
> Hello, PyLucene User and Developers
> 
>Problems occurred during `make test` in pylucene-6.2.0 from the PyLucene
> Homepage  when testing
> 'test_PythonException.py'. And I think there might be something wrong with
> the test python code.

This error was just covered in another thread a few days ago: you either didn't 
build JCC in shared mode, or didn't pass --shared on the jcc invocation 
command line in PyLucene's Makefile when building it. The bug is that this 
test should be disabled when JCC is not built in shared mode, since this 
exception support depends on it.

Andi..
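
A minimal sketch of that guard, assuming JCC's generated config module exposes a SHARED flag (the import path and flag name are assumptions, not verified against the jcc sources):

import unittest
from PyLuceneTestCase import PyLuceneTestCase

try:
    from jcc.config import SHARED   # written by JCC at build time (assumed name)
except ImportError:
    SHARED = False                  # conservatively treat unknown as non-shared

@unittest.skipUnless(SHARED, "exception propagation requires a --shared JCC build")
class PythonExceptionTestCase(PyLuceneTestCase):
    pass  # original test body as in the script quoted below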

> 
> The test Python code I ran and the error log are posted at the end of
> this email.
> 
> The test script raised a Python exception (TestException), but a JavaError
> was returned instead, which failed the test.
> 
> If I change assertRaises(TestException) to assertRaises(lucene.JavaError)
> here, the test passes.
> 
> Should I make this change to pass the test, or am I missing something?
> 
> 
> Thanks for your advice!
> 
> 
> 
> Here is the Python script:
> 
> import sys, lucene, unittest
> from PyLuceneTestCase import PyLuceneTestCase
> 
> from org.apache.lucene.analysis.standard import StandardAnalyzer
> from org.apache.pylucene.queryparser.classic import PythonQueryParser
> 
> 
> class PythonExceptionTestCase(PyLuceneTestCase):
>     def testThroughLayerException(self):
>         class TestException(Exception):
>             pass
> 
>         class TestQueryParser(PythonQueryParser):
>             def getFieldQuery_quoted(_self, field, queryText, quoted):
>                 raise TestException("TestException")
> 
>         qp = TestQueryParser('all', StandardAnalyzer())
> 
>         with self.assertRaises(TestException):
>             qp.parse("foo bar")
> 
> if __name__ == "__main__":
>     lucene.initVM(vmargs=['-Djava.awt.headless=true'])
>     if '-loop' in sys.argv:
>         print "in if"
>         sys.argv.remove('-loop')
>         while True:
>             try:
>                 unittest.main()
>             except:
>                 pass
>     else:
>         print "in else"
>         unittest.main()
> 
> 
> Here's the error stack trace:
> 
> 
> shawn@shawn-Precision-T1700:~/workspace/pylucene-6.2.0/test$ python
> ./test_PythonException.py
> in else
> E
> ==
> ERROR: testThroughLayerException (__main__.PythonExceptionTestCase)
> --
> Traceback (most recent call last):
>  File "./test_PythonException.py", line 34, in testThroughLayerException
>qp.parse("foo bar")
> JavaError: , >
>    Java stacktrace:
> java.lang.RuntimeException: TestException
>     at org.apache.pylucene.queryparser.classic.PythonQueryParser.getFieldQuery_quoted(Native Method)
>     at org.apache.pylucene.queryparser.classic.PythonQueryParser.getFieldQuery(Unknown Source)
>     at org.apache.lucene.queryparser.classic.QueryParser.MultiTerm(QueryParser.java:585)
>     at org.apache.lucene.queryparser.classic.QueryParser.Query(QueryParser.java:198)
>     at org.apache.lucene.queryparser.classic.QueryParser.TopLevelQuery(QueryParser.java:187)
>     at org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:111)



[jira] [Updated] (SOLR-9936) Allow configuration for recoveryExecutor thread pool size

2017-01-06 Thread Tim Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Owen updated SOLR-9936:
---
Description: 
There are two executor services in {{UpdateShardHandler}}: the 
{{updateExecutor}}, whose size is unbounded for reasons explained in the code 
comments, and the {{recoveryExecutor}}, which was added later and is the one 
that executes the {{RecoveryStrategy}} code to actually fetch index files and 
store them to disk, eventually handing off to an {{fsync}} thread to ensure the 
data is written.

We found that with a fast network such as 10GbE it's very easy to overload the 
local disk storage when restarting Solr instances after some downtime, if they 
have many cores to load. Typically each of our physical servers contains 6 SSDs 
and 6 Solr instances, so each Solr has its home dir on a dedicated SSD. With 
100+ cores (shard replicas) on each instance, startup can really hammer the SSD 
as it writes in parallel from as many cores as Solr is recovering. This made 
recovery time bad enough that replicas were down for a long time, and shards 
were even marked as down if none of their replicas had recovered (usually when 
many machines had been restarted). The very slow IO times (tens of seconds or 
worse) also made the JVM pause, causing disconnects from ZK, which didn't help 
recovery either.

This patch allowed us to throttle how much parallelism there is when writing to 
a disk. In practice we're using a pool size of 4 threads to prevent the SSD 
getting overloaded, and that worked well enough to recover all cores in 
reasonable time.

Given the comment about the other thread pool's size, though, I'd like some 
feedback on whether it's OK to do this for the {{recoveryExecutor}}.

It's configured in solr.xml with e.g.

{noformat}
  <updateShardHandler>
    <int name="recoveryExecutorThreads">${solr.recovery.threads:4}</int>
  </updateShardHandler>
{noformat}


  was:
There are two executor services in {{UpdateShardHandler}}: the 
{{updateExecutor}}, whose size is unbounded for reasons explained in the code 
comments, and the {{recoveryExecutor}}, which was added later and is the one 
that executes the {{RecoveryStrategy}} code to actually fetch index files and 
store them to disk, eventually handing off to an {{fsync}} thread to ensure the 
data is written.

We found that with a fast network such as 10GbE it's very easy to overload the 
local disk storage when restarting Solr instances after some downtime, if they 
have many cores to load. Typically each of our physical servers contains 6 SSDs 
and 6 Solr instances, so each Solr has its home dir on a dedicated SSD. With 
100+ cores (shard replicas) on each instance, startup can really hammer the SSD 
as it writes in parallel from as many cores as Solr is recovering. This made 
recovery time bad enough that replicas were down for a long time, and shards 
were even marked as down if none of their replicas had recovered (usually when 
many machines had been restarted).

This patch allowed us to throttle how much parallelism there is when writing to 
a disk. In practice we're using a pool size of 4 threads to prevent the SSD 
getting overloaded, and that worked well enough to recover all cores in 
reasonable time.

Given the comment about the other thread pool's size, though, I'd like some 
feedback on whether it's OK to do this for the {{recoveryExecutor}}.

It's configured in solr.xml with e.g.

{noformat}
  <updateShardHandler>
    <int name="recoveryExecutorThreads">${solr.recovery.threads:4}</int>
  </updateShardHandler>
{noformat}



> Allow configuration for recoveryExecutor thread pool size
> -
>
> Key: SOLR-9936
> URL: https://issues.apache.org/jira/browse/SOLR-9936
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: replication (java)
>Affects Versions: 6.3
>Reporter: Tim Owen
> Attachments: SOLR-9936.patch
>
>
> There are two executor services in {{UpdateShardHandler}}: the 
> {{updateExecutor}}, whose size is unbounded for reasons explained in the code 
> comments, and the {{recoveryExecutor}}, which was added later and is the one 
> that executes the {{RecoveryStrategy}} code to actually fetch index files and 
> store them to disk, eventually handing off to an {{fsync}} thread to ensure 
> the data is written.
> We found that with a fast network such as 10GbE it's very easy to overload 
> the local disk storage when restarting Solr instances after some downtime, if 
> they have many cores to load. Typically each of our physical servers contains 
> 6 SSDs and 6 Solr instances, so each Solr has its home dir on a dedicated 
> SSD. With 100+ cores (shard replicas) on each instance, startup can really 
> hammer the SSD as it writes in parallel from as many cores as Solr is 
> recovering. This made recovery time bad enough that replicas were down for a 
> long time, and 

[jira] [Commented] (SOLR-9503) NPE in Replica Placement Rules when using Overseer Role with other rules

2017-01-06 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15804740#comment-15804740
 ] 

ASF subversion and git services commented on SOLR-9503:
---

Commit cd4f908d5ba223e615920be73285b7c5cc57704a in lucene-solr's branch 
refs/heads/master from [~noble.paul]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=cd4f908 ]

SOLR-9503: NPE in Replica Placement Rules when using Overseer Role with other 
rules


> NPE in Replica Placement Rules when using Overseer Role with other rules
> 
>
> Key: SOLR-9503
> URL: https://issues.apache.org/jira/browse/SOLR-9503
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Rules, SolrCloud
>Affects Versions: 6.2, master (7.0)
>Reporter: Tim Owen
>Assignee: Noble Paul
> Attachments: SOLR-9503.patch, SOLR-9503.patch
>
>
> The overseer role introduced in SOLR-9251 works well if there's only a single 
> Rule for replica placement, e.g. {code}rule=role:!overseer{code}, but when 
> combined with another rule, e.g. 
> {code}rule=role:!overseer&rule=host:*,shard:*,replica:<2{code}, it can result 
> in a NullPointerException (in Rule.tryAssignNodeToShard).
> This happens because the code builds up a nodeVsTags map, but it only has 
> entries for nodes that have values for *all* tags used among the rules. This 
> means not enough information is available to other rules when they are being 
> checked during replica assignment. In the example rules above, if we have a 
> cluster of 12 nodes and only 3 are given the Overseer role, the others do not 
> have any entry in the nodeVsTags map because they only have the host tag 
> value and not the role tag value.
> Looking at the code in ReplicaAssigner.getTagsForNodes, it is explicitly only 
> keeping entries that fulfil the constraint of having values for all tags used 
> in the rules. Possibly this constraint was suitable when rules were 
> originally introduced, but the Role tag (used for Overseers) is unlikely to 
> be present for all nodes in the cluster, and similarly for sysprop tags, which 
> may or may not be set for a node.
> My patch removes this constraint, so the nodeVsTags map contains everything 
> known about all nodes, even if they have no value for a given tag. This 
> allows the rule combination above to work, and doesn't appear to cause any 
> problems with the code paths that use the nodeVsTags map. They handle null 
> values quite well, and the tests pass.






[jira] [Commented] (LUCENE-7588) A parallel DrillSideways implementation

2017-01-06 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15804734#comment-15804734
 ] 

Michael McCandless commented on LUCENE-7588:


Hmm the ES jenkins caught this failure:

{noformat}
 NOTE: reproduce with: ant test  -Dtestcase=TestParallelDrillSideways 
-Dtests.method=testRandom -Dtests.seed=734B3451E1B6F47B -Dtests.slow=true 
-Dtests.locale=ar-BH -Dtests.timezone=America/North_Dakota/Center 
-Dtests.asserts=true -Dtests.file.encoding=US-ASCII
   [junit4] FAILURE 1.87s J2 | TestParallelDrillSideways.testRandom <<<
   [junit4]> Throwable #1: org.junit.ComparisonFailure: expected:<1[00]4> 
but was:<1[]4>
   [junit4]>at 
__randomizedtesting.SeedInfo.seed([734B3451E1B6F47B:107115E50D64208]:0)
   [junit4]>at 
org.apache.lucene.facet.TestDrillSideways.verifyEquals(TestDrillSideways.java:1034)
   [junit4]>at 
org.apache.lucene.facet.TestDrillSideways.testRandom(TestDrillSideways.java:818)
   [junit4]>at java.lang.Thread.run(Thread.java:745)
{noformat}

And it does repro for me on the current (rev 
7ae9ca85d9d920db353d3d080b0cb36567e206b2) branch_6x head.  [~ekeller] any ideas?

> A parallel DrillSideways implementation
> ---
>
> Key: LUCENE-7588
> URL: https://issues.apache.org/jira/browse/LUCENE-7588
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: master (7.0), 6.3.1
>Reporter: Emmanuel Keller
>Priority: Minor
>  Labels: facet, faceting
> Fix For: master (7.0), 6.4
>
> Attachments: LUCENE-7588.patch
>
>
> Currently the DrillSideways implementation is based on the single-threaded 
> IndexSearcher.search(Query query, Collector results).
> On a large document set, single-threaded collection can be really slow.
> The ParallelDrillSideways implementation could:
> 1. Use the CollectorManager-based method IndexSearcher.search(Query query, 
> CollectorManager collectorManager) to get the benefits of multithreading on 
> index segments,
> 2. Compute each DrillSideways subquery on a single thread.
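
For reference, the CollectorManager mechanism from point 1 looks like this in miniature (a hedged sketch, not the attached patch; it assumes the IndexSearcher was built with an ExecutorService so segments are collected concurrently):

{code}
import java.util.Collection;
import org.apache.lucene.search.CollectorManager;
import org.apache.lucene.search.TotalHitCountCollector;

// Counts hits across index segments in parallel: one collector per slice,
// results merged in reduce().
public final class HitCountManager
    implements CollectorManager<TotalHitCountCollector, Integer> {
  @Override
  public TotalHitCountCollector newCollector() {
    return new TotalHitCountCollector();
  }
  @Override
  public Integer reduce(Collection<TotalHitCountCollector> collectors) {
    int total = 0;
    for (TotalHitCountCollector c : collectors) {
      total += c.getTotalHits();
    }
    return total;
  }
}
// usage: int hits = searcher.search(query, new HitCountManager());
{code}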






[jira] [Updated] (SOLR-9936) Allow configuration for recoveryExecutor thread pool size

2017-01-06 Thread Tim Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Owen updated SOLR-9936:
---
Attachment: SOLR-9936.patch

> Allow configuration for recoveryExecutor thread pool size
> -
>
> Key: SOLR-9936
> URL: https://issues.apache.org/jira/browse/SOLR-9936
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: replication (java)
>Affects Versions: 6.3
>Reporter: Tim Owen
> Attachments: SOLR-9936.patch
>
>
> There are two executor services in {{UpdateShardHandler}}: the 
> {{updateExecutor}}, whose size is unbounded for reasons explained in the code 
> comments, and the {{recoveryExecutor}}, which was added later and is the one 
> that executes the {{RecoveryStrategy}} code to actually fetch index files and 
> store them to disk, eventually handing off to an {{fsync}} thread to ensure 
> the data is written.
> We found that with a fast network such as 10GbE it's very easy to overload 
> the local disk storage when restarting Solr instances after some downtime, if 
> they have many cores to load. Typically each of our physical servers contains 
> 6 SSDs and 6 Solr instances, so each Solr has its home dir on a dedicated 
> SSD. With 100+ cores (shard replicas) on each instance, startup can really 
> hammer the SSD as it writes in parallel from as many cores as Solr is 
> recovering. This made recovery time bad enough that replicas were down for a 
> long time, and shards were even marked as down if none of their replicas had 
> recovered (usually when many machines had been restarted).
> This patch allowed us to throttle how much parallelism there is when writing 
> to a disk. In practice we're using a pool size of 4 threads to prevent the 
> SSD getting overloaded, and that worked well enough to recover all cores in 
> reasonable time.
> Given the comment about the other thread pool's size, though, I'd like some 
> feedback on whether it's OK to do this for the {{recoveryExecutor}}.
> It's configured in solr.xml with e.g.
> {noformat}
>   <updateShardHandler>
>     <int name="recoveryExecutorThreads">${solr.recovery.threads:4}</int>
>   </updateShardHandler>
> {noformat}






[jira] [Created] (SOLR-9936) Allow configuration for recoveryExecutor thread pool size

2017-01-06 Thread Tim Owen (JIRA)
Tim Owen created SOLR-9936:
--

 Summary: Allow configuration for recoveryExecutor thread pool size
 Key: SOLR-9936
 URL: https://issues.apache.org/jira/browse/SOLR-9936
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: replication (java)
Affects Versions: 6.3
Reporter: Tim Owen


There are two executor services in {{UpdateShardHandler}}: the 
{{updateExecutor}}, whose size is unbounded for reasons explained in the code 
comments, and the {{recoveryExecutor}}, which was added later and is the one 
that executes the {{RecoveryStrategy}} code to actually fetch index files and 
store them to disk, eventually handing off to an {{fsync}} thread to ensure the 
data is written.

We found that with a fast network such as 10GbE it's very easy to overload the 
local disk storage when restarting Solr instances after some downtime, if they 
have many cores to load. Typically each of our physical servers contains 6 SSDs 
and 6 Solr instances, so each Solr has its home dir on a dedicated SSD. With 
100+ cores (shard replicas) on each instance, startup can really hammer the SSD 
as it writes in parallel from as many cores as Solr is recovering. This made 
recovery time bad enough that replicas were down for a long time, and shards 
were even marked as down if none of their replicas had recovered (usually when 
many machines had been restarted).

This patch allowed us to throttle how much parallelism there is when writing to 
a disk. In practice we're using a pool size of 4 threads to prevent the SSD 
getting overloaded, and that worked well enough to recover all cores in 
reasonable time.

Given the comment about the other thread pool's size, though, I'd like some 
feedback on whether it's OK to do this for the {{recoveryExecutor}}.

It's configured in solr.xml with e.g.

{noformat}
  <updateShardHandler>
    <int name="recoveryExecutorThreads">${solr.recovery.threads:4}</int>
  </updateShardHandler>
{noformat}
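
The essence of the change is just bounding that pool. A hedged sketch in plain java.util.concurrent terms (illustrative names; Solr's actual executors also install MDC-aware thread factories, omitted here):

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class RecoveryPoolSketch {
  // Bound the recovery pool via a system property (default 4), while the
  // update pool stays an unbounded cached pool.
  static ExecutorService newRecoveryExecutor() {
    int threads = Integer.getInteger("solr.recovery.threads", 4);
    return Executors.newFixedThreadPool(threads);
  }
}
{code}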







[jira] [Updated] (LUCENE-7611) Make suggester module use LongValuesSource

2017-01-06 Thread Alan Woodward (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-7611:
--
Attachment: LUCENE-7611.patch

Updated patch, moving the constant() helper to LongValuesSource.

> Make suggester module use LongValuesSource
> --
>
> Key: LUCENE-7611
> URL: https://issues.apache.org/jira/browse/LUCENE-7611
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
> Attachments: LUCENE-7611.patch, LUCENE-7611.patch
>
>
> This allows us to remove the suggester module's dependency on the queries 
> module.
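
For context, a hedged usage sketch of the moved helper, assuming the patch leaves a static constant() factory on LongValuesSource and a dictionary constructor that accepts it:

{code}
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.LongValuesSource;
import org.apache.lucene.search.suggest.DocumentValueSourceDictionary;

class ConstantWeightSketch {
  // Every suggestion gets the same fixed weight, with no dependency on the
  // queries module's ValueSource.
  static DocumentValueSourceDictionary dict(IndexReader reader) {
    return new DocumentValueSourceDictionary(
        reader, "suggest", LongValuesSource.constant(1L));
  }
}
{code}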






[jira] [Commented] (SOLR-9777) IndexFingerprinting: use getCombinedCoreAndDeletesKey() instead of getCoreCacheKey() for per-segment caching

2017-01-06 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15804572#comment-15804572
 ] 

Ishan Chattopadhyaya commented on SOLR-9777:


I'm planning to commit this soon, so if someone has the time to review this 
change, that would be great.

> IndexFingerprinting: use getCombinedCoreAndDeletesKey() instead of  
> getCoreCacheKey() for per-segment caching
> -
>
> Key: SOLR-9777
> URL: https://issues.apache.org/jira/browse/SOLR-9777
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>Assignee: Noble Paul
> Attachments: SOLR-9777.patch
>
>
> [Note: Had initially posted to SOLR-9506, but now moved here]
> While working on SOLR-5944, I realized that the current per-segment caching 
> logic works fine for deleted documents (since a segment's numDocs is compared 
> as the criterion for a cache hit/miss). However, if a segment has docValues 
> updates, the same logic is insufficient. It is my understanding that changing 
> the caching key from reader().getCoreCacheKey() to 
> reader().getCombinedCoreAndDeletesKey() would work here, since docValues 
> updates are internally handled using the deletion queue, and hence the 
> "combined" core and deletes key changes when they occur. Attaching a patch 
> for the same.
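
A minimal sketch of the proposed keying (the cache shape and the compute() stand-in are illustrative, not Solr's actual fingerprint code):

{code}
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.lucene.index.LeafReaderContext;

abstract class FingerprintCacheSketch<F> {
  private final Map<Object, F> cache = new ConcurrentHashMap<>();

  F fingerprintFor(LeafReaderContext leaf) throws IOException {
    // was: leaf.reader().getCoreCacheKey(), which stays the same across
    // deletes and docValues updates to the segment
    Object key = leaf.reader().getCombinedCoreAndDeletesKey();
    F f = cache.get(key);
    if (f == null) {
      f = compute(leaf); // illustrative stand-in for the fingerprint logic
      cache.put(key, f);
    }
    return f;
  }

  abstract F compute(LeafReaderContext leaf) throws IOException;
}
{code}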






[jira] [Assigned] (SOLR-9777) IndexFingerprinting: use getCombinedCoreAndDeletesKey() instead of getCoreCacheKey() for per-segment caching

2017-01-06 Thread Ishan Chattopadhyaya (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chattopadhyaya reassigned SOLR-9777:
--

Assignee: Ishan Chattopadhyaya  (was: Noble Paul)

> IndexFingerprinting: use getCombinedCoreAndDeletesKey() instead of  
> getCoreCacheKey() for per-segment caching
> -
>
> Key: SOLR-9777
> URL: https://issues.apache.org/jira/browse/SOLR-9777
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
> Attachments: SOLR-9777.patch
>
>
> [Note: Had initially posted to SOLR-9506, but now moved here]
> While working on SOLR-5944, I realized that the current per-segment caching 
> logic works fine for deleted documents (since a segment's numDocs is compared 
> as the criterion for a cache hit/miss). However, if a segment has docValues 
> updates, the same logic is insufficient. It is my understanding that changing 
> the caching key from reader().getCoreCacheKey() to 
> reader().getCombinedCoreAndDeletesKey() would work here, since docValues 
> updates are internally handled using the deletion queue, and hence the 
> "combined" core and deletes key changes when they occur. Attaching a patch 
> for the same.






Solr Dedupe Issue

2017-01-06 Thread Paris, Dan
Hi Solr Dev,

I'm attempting to get dedupe working in Solr 6.3.0 but am experiencing some 
issues.

The updateRequestProcessorChain for dedupe doesn't appear to be working.

We are running Solr 6.3.0 (in cloud mode), taking in data via a NiFi flow 
using a "PutSolrContentStream" processor with the following configuration:

[two inline screenshots of the processor configuration, not preserved]

When attempting to use the dedupe capability of Solr as per the documentation, 
it is not working. The NiFi flow is continually consuming a hard-coded JSON 
document from a lightweight Spring Boot server. The document contains a 
"signature" string and a "content" string (these are just placeholder fields 
for demo purposes). New documents are continually being created in the Solr 
collection, when I would expect to see no change.

solrconfig.xml and schema.xml are attached.

Would you be able to provide some assistance?

Thanks in advance,
Dan

Dan Paris | Leading Engineer
250 Brook Drive, Reading, RG2 6UA | United Kingdom
M:  +44 7920783573
dan.pa...@cgi.com  | 
www.cgi.com
Registered in England & Wales (registered number 947968)
Registered Office: 250 Brook Drive, Green Park, Reading RG2 6UA, United Kingdom




[Attachment: solrconfig.xml - the archive stripped most of its XML markup. 
From the values that survived, the dedupe-related sections appear to have 
been:]

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">id</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">signature,content</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<initParams path="/update/**">
  <lst name="defaults">
    <str name="update.chain">dedupe</str>
  </lst>
</initParams>
[JENKINS] Lucene-Solr-6.x-Solaris (64bit/jdk1.8.0) - Build # 600 - Unstable!

2017-01-06 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Solaris/600/
Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseParallelGC

1 tests failed.
FAILED:  junit.framework.TestSuite.org.apache.solr.cloud.hdfs.HdfsRecoveryZkTest

Error Message:
ObjectTracker found 1 object(s) that were not released!!! [HdfsTransactionLog] 
org.apache.solr.common.util.ObjectReleaseTracker$ObjectTrackerException  at 
org.apache.solr.common.util.ObjectReleaseTracker.track(ObjectReleaseTracker.java:43)
  at 
org.apache.solr.update.HdfsTransactionLog.<init>(HdfsTransactionLog.java:130)  
at org.apache.solr.update.HdfsUpdateLog.init(HdfsUpdateLog.java:202)  at 
org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:137)  at 
org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:94)  at 
org.apache.solr.update.DirectUpdateHandler2.<init>(DirectUpdateHandler2.java:102)
  at sun.reflect.GeneratedConstructorAccessor183.newInstance(Unknown Source)  
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)  at 
org.apache.solr.core.SolrCore.createInstance(SolrCore.java:753)  at 
org.apache.solr.core.SolrCore.createUpdateHandler(SolrCore.java:815)  at 
org.apache.solr.core.SolrCore.initUpdateHandler(SolrCore.java:1065)  at 
org.apache.solr.core.SolrCore.<init>(SolrCore.java:930)  at 
org.apache.solr.core.SolrCore.<init>(SolrCore.java:823)  at 
org.apache.solr.core.CoreContainer.create(CoreContainer.java:889)  at 
org.apache.solr.core.CoreContainer.lambda$load$3(CoreContainer.java:541)  at 
com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:197)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)  at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
 at java.lang.Thread.run(Thread.java:745)  

Stack Trace:
java.lang.AssertionError: ObjectTracker found 1 object(s) that were not 
released!!! [HdfsTransactionLog]
org.apache.solr.common.util.ObjectReleaseTracker$ObjectTrackerException
at 
org.apache.solr.common.util.ObjectReleaseTracker.track(ObjectReleaseTracker.java:43)
at 
org.apache.solr.update.HdfsTransactionLog.<init>(HdfsTransactionLog.java:130)
at org.apache.solr.update.HdfsUpdateLog.init(HdfsUpdateLog.java:202)
at org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:137)
at org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:94)
at 
org.apache.solr.update.DirectUpdateHandler2.<init>(DirectUpdateHandler2.java:102)
at sun.reflect.GeneratedConstructorAccessor183.newInstance(Unknown 
Source)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:753)
at org.apache.solr.core.SolrCore.createUpdateHandler(SolrCore.java:815)
at org.apache.solr.core.SolrCore.initUpdateHandler(SolrCore.java:1065)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:930)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:823)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:889)
at 
org.apache.solr.core.CoreContainer.lambda$load$3(CoreContainer.java:541)
at 
com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:197)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)


at __randomizedtesting.SeedInfo.seed([8F36A980324E7016]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertNull(Assert.java:551)
at 
org.apache.solr.SolrTestCaseJ4.teardownTestCases(SolrTestCaseJ4.java:266)
at sun.reflect.GeneratedMethodAccessor50.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:870)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 

[JENKINS] Lucene-Solr-master-Linux (32bit/jdk1.8.0_112) - Build # 18711 - Unstable!

2017-01-06 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/18711/
Java: 32bit/jdk1.8.0_112 -client -XX:+UseParallelGC

1 tests failed.
FAILED:  org.apache.solr.cloud.PeerSyncReplicationTest.test

Error Message:
timeout waiting to see all nodes active

Stack Trace:
java.lang.AssertionError: timeout waiting to see all nodes active
at 
__randomizedtesting.SeedInfo.seed([84F4EA3AA629126F:CA0D5E008D57F97]:0)
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.solr.cloud.PeerSyncReplicationTest.waitTillNodesActive(PeerSyncReplicationTest.java:311)
at 
org.apache.solr.cloud.PeerSyncReplicationTest.bringUpDeadNodeAndEnsureNoReplication(PeerSyncReplicationTest.java:262)
at 
org.apache.solr.cloud.PeerSyncReplicationTest.forceNodeFailureAndDoPeerSync(PeerSyncReplicationTest.java:244)
at 
org.apache.solr.cloud.PeerSyncReplicationTest.test(PeerSyncReplicationTest.java:133)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:907)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:943)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:957)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:985)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:960)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:811)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:462)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:916)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:802)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:852)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 

[jira] [Commented] (LUCENE-7614) Allow single prefix "phrase*" in complexphrase queryparser

2017-01-06 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15804457#comment-15804457
 ] 

Michael McCandless commented on LUCENE-7614:


+1

> Allow single prefix "phrase*" in complexphrase queryparser 
> ---
>
> Key: LUCENE-7614
> URL: https://issues.apache.org/jira/browse/LUCENE-7614
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Mikhail Khludnev
>Priority: Minor
> Fix For: master (7.0), 6.4
>
> Attachments: LUCENE-7614.patch, LUCENE-7614.patch
>
>
> {quote}
> From  Otmar Caduff 
> Subject   ComplexPhraseQueryParser with wildcards
> Date  Tue, 20 Dec 2016 13:55:42 GMT
> Hi,
> I have an index with a single document with a field "field" and textual
> content "johnny peters" and I am using
> org.apache.lucene.queryparser.complexPhrase.ComplexPhraseQueryParser to
> parse the query:
>    field: (john* peter)
> When searching with this query, I am getting the document as expected.
> However with this query:
>field: ("john*" "peter")
> I am getting the following exception:
> Exception in thread "main" java.lang.IllegalArgumentException: Unknown
> query type "org.apache.lucene.search.PrefixQuery" found in phrase query
> string "john*"
> at
> org.apache.lucene.queryparser.complexPhrase.ComplexPhraseQueryParser$ComplexPhraseQuery.rewrite(ComplexPhraseQueryParser.java:268)
> {quote}
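
A compact reproduction sketch of the quoted report (index and searcher setup omitted; this is illustrative, not a test from the patch):

{code}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.complexPhrase.ComplexPhraseQueryParser;
import org.apache.lucene.search.Query;

class ComplexPhraseRepro {
  static void repro() throws ParseException {
    ComplexPhraseQueryParser qp =
        new ComplexPhraseQueryParser("field", new StandardAnalyzer());
    Query works = qp.parse("field:(john* peter)"); // matches as expected
    Query breaks = qp.parse("field:(\"john*\" \"peter\")");
    // Before this change, rewriting/searching 'breaks' throws
    // IllegalArgumentException: Unknown query type ...PrefixQuery.
  }
}
{code}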





