[jira] [Commented] (SOLR-13933) Cluster mode Stress test suite

2019-11-23 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980991#comment-16980991
 ] 

Ishan Chattopadhyaya commented on SOLR-13933:
-

If someone has ideas on an existing Java tool that can take in a configuration 
(JSON) of tasks to run in separate threads and then execute them while 
collecting metrics, please let me know. I've evaluated Sundial, Argo, etc.; 
JMeter's embedded mode also came close. I ditched all of them because their 
configurations were very ugly, so I'm building this from scratch (based on the 
configuration in my previous comment). I'll update the configuration as 
necessary, to keep it as simple and yet as expressive as possible.
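
In the meantime, here's a minimal sketch of the kind of executor loop I have in 
mind: run each configured task on its own fixed-size pool and record wall-clock 
timings per task. The Task shape and all names here are illustrative, not the 
final API.
{code}
import java.util.*;
import java.util.concurrent.*;

public class TaskRunner {
  public static class Task {
    final String name; final int instances; final int concurrency; final Runnable body;
    Task(String name, int instances, int concurrency, Runnable body) {
      this.name = name; this.instances = instances; this.concurrency = concurrency; this.body = body;
    }
  }

  /** Runs each task on its own pool and records wall-clock millis per task. */
  public Map<String, Long> runAll(List<Task> tasks) throws InterruptedException {
    Map<String, Long> millisPerTask = new LinkedHashMap<>();
    for (Task t : tasks) {
      ExecutorService pool = Executors.newFixedThreadPool(t.concurrency);
      long start = System.currentTimeMillis();
      for (int i = 0; i < t.instances; i++) {
        pool.submit(t.body);
      }
      pool.shutdown();
      pool.awaitTermination(1, TimeUnit.HOURS);
      millisPerTask.put(t.name, System.currentTimeMillis() - start);
    }
    return millisPerTask;
  }
}
{code}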

> Cluster mode Stress test suite 
> ---
>
> Key: SOLR-13933
> URL: https://issues.apache.org/jira/browse/SOLR-13933
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
>
> We need a stress test harness based on 10s or 100s of nodes, 1000s of 
> collection API operations, overseer operations etc. This suite should run 
> nightly, publish results publicly, so as to help with:
> # Uncover stability problems
> # Benchmarking (timings, resource metrics etc.) on collection operations
> # Indexing/querying performance
> # Validate the accuracy of potential improvements
> References:
> SOLR-10317
> https://github.com/lucidworks/solr-scale-tk
> https://github.com/shalinmangar/solr-perf-tools
> Lucene benchmarks






[jira] [Commented] (SOLR-13933) Cluster mode Stress test suite

2019-11-23 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980989#comment-16980989
 ] 

Ishan Chattopadhyaya commented on SOLR-13933:
-

I've managed to spin up and down instances on GCP, build the Solr jar from a 
commit and scp it to the instances.

Here's a tentative format for defining the tasks to be executed.
{code}
{
  "task-types": [
    {
      "name": "indexing-wikipedia",
      "indexing-benchmark": {
        "name": "wikipedia-small",
        "description": "Wikipedia dataset on SolrCloud",
        "dataset-file": "small-data/small-enwiki.tsv.gz",
        "setups": [
          {
            "setup-name": "wiki_2x2",
            "collection": "wiki_2x2",
            "replication-factor": 2,
            "shards": 2,
            "min-threads": 4,
            "max-threads": 12,
            "thread-step": 4
          }
        ]
      }
    },
    {
      "name": "collection-creation",
      "command": "http://${HOST}:${PORT}/solr/admin/collections?action=CREATE&name=collection${INDEX}&numShards=${SHARDS}",
      "defaults": {
        "INDEX": 0,
        "SHARDS": 1
      }
    },
    {
      "name": "shard-splitting",
      "command": "http://${HOST}:${PORT}/solr/admin/collections?action=SPLITSHARD&collection=${COLLECTION}&shard=${SHARD}",
      "defaults": {}
    }
  ],

  "global-variables": {
    "collection-counter": 1
  },

  "tasks": [
    {
      "task": "task1",
      "type": "indexing-wikipedia",
      "mode": "async"
    },
    {
      "description": "Create 100 collections in parallel using 4 threads",
      "task": "task2",
      "type": "collection-creation",
      "instances": 100,
      "concurrency": 4,
      "parameters": {
        "INDEX": "${collection-counter}"
      },
      "pre-task-evals": [
        "inc(collection-counter,1)"
      ],
      "mode": "async"
    },
    {
      "description": "Once all collections are created, split a shard in collection1",
      "task": "task3",
      "type": "shard-splitting",
      "parameters": {
        "COLLECTION": "collection1",
        "SHARD": "shard1"
      },
      "waitFor": "task2",
      "mode": "sync"
    }
  ]
}
{code}
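
To make the intended semantics concrete, here is a rough sketch of how the 
${VAR} substitution and the {{inc()}} pre-task eval could be implemented (all 
names here are illustrative, not committed API):
{code}
// Sketch only: global counters plus ${VAR} substitution for command templates.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TaskVars {
  private static final Pattern VAR = Pattern.compile("\\$\\{([\\w-]+)\\}");
  private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

  /** Implements a pre-task eval such as "inc(collection-counter,1)". */
  public void inc(String name, long delta) {
    counters.computeIfAbsent(name, k -> new AtomicLong()).addAndGet(delta);
  }

  /** Expands ${NAME} from global counters, then per-task parameters/defaults. */
  public String substitute(String template, Map<String, String> params) {
    String s = template;
    // A couple of passes so a parameter value like "${collection-counter}"
    // can itself resolve against the global counters.
    for (int pass = 0; pass < 4 && s.contains("${"); pass++) {
      Matcher m = VAR.matcher(s);
      StringBuffer out = new StringBuffer();
      while (m.find()) {
        String name = m.group(1);
        String value = counters.containsKey(name)
            ? Long.toString(counters.get(name).get())
            : params.getOrDefault(name, m.group()); // leave unknown vars as-is
        m.appendReplacement(out, Matcher.quoteReplacement(value));
      }
      m.appendTail(out);
      s = out.toString();
    }
    return s;
  }
}
{code}
With {{"INDEX": "${collection-counter}"}} and a pre-task 
{{inc(collection-counter,1)}}, the Nth instance of task2 would create 
collectionN.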

> Cluster mode Stress test suite 
> ---
>
> Key: SOLR-13933
> URL: https://issues.apache.org/jira/browse/SOLR-13933
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
>
> We need a stress test harness based on 10s or 100s of nodes, 1000s of 
> collection API operations, overseer operations etc. This suite should run 
> nightly, publish results publicly, so as to help with:
> # Uncover stability problems
> # Benchmarking (timings, resource metrics etc.) on collection operations
> # Indexing/querying performance
> # Validate the accuracy of potential improvements
> References:
> SOLR-10317
> https://github.com/lucidworks/solr-scale-tk
> https://github.com/shalinmangar/solr-perf-tools
> Lucene benchmarks






[jira] [Commented] (SOLR-13842) Remove wt=json from Implicit API definition's defaults

2019-11-23 Thread Munendra S N (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980956#comment-16980956
 ] 

Munendra S N commented on SOLR-13842:
-

Please go ahead. Leave a comment when you pick this up.
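
For whoever picks this up, the change is in the implicit handler definitions 
(ImplicitPlugins.json). A hypothetical sketch of the kind of entry involved, 
for illustration only:
{code}
// hypothetical entry, for illustration -- not copied from ImplicitPlugins.json
"/admin/luke": {
  "class": "solr.LukeRequestHandler",
  "defaults": {
    "wt": "json"   // redundant from Solr 7 on; this is what should be removed
  }
}
{code}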

> Remove wt=json from Implicit API definition's defaults
> --
>
> Key: SOLR-13842
> URL: https://issues.apache.org/jira/browse/SOLR-13842
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Munendra S N
>Priority: Minor
>  Labels: newdev
>
> From Solr 7, {{json}} is the default response writer, so {{wt=json}} can be 
> removed from the implicit API definitions.






[jira] [Commented] (SOLR-13963) JavaBinCodec has concurrent modification of CharArr resulting in corrupt intranode updates

2019-11-23 Thread Noble Paul (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980913#comment-16980913
 ] 

Noble Paul commented on SOLR-13963:
---

I have attached a patch where the main {{_readStr(DataInputInputStream dis, 
StringCache stringCache, int sz)}} does not pay the price of synchronization.
I'm still thinking about how to write a test case that can localize the problem.
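
The shape of the change is roughly this (a sketch of the idea, not the attached 
patch verbatim): keep the common no-cache path on a local buffer with no lock, 
and synchronize on the shared CharArr only when the StringCache is consulted.
{code}
// Illustrative pattern only; decodeUtf8 and the cache lookup are placeholders.
String readString(int sz) throws IOException {
  if (stringCache == null) {
    CharArr local = new CharArr(sz); // local scratch: no shared state, no lock
    decodeUtf8(local, sz);
    return local.toString();
  }
  synchronized (arr) { // 'arr' is the CharArr shared with getStringProvider()
    arr.reset();
    decodeUtf8(arr, sz);
    return stringCache.get(bytesRef);
  }
}
{code}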

> JavaBinCodec has concurrent modification of CharArr resulting in corrupt 
> intranode updates
> --
>
> Key: SOLR-13963
> URL: https://issues.apache.org/jira/browse/SOLR-13963
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.3
>Reporter: Colvin Cowie
>Assignee: Noble Paul
>Priority: Major
> Attachments: JavaBinCodec.java, SOLR-13963.patch, SOLR-13963.patch
>
>
> Discussed on the mailing list "Possible data corruption in JavaBinCodec in 
> Solr 8.3 during distributed update?"
>  
> In summary, after moving to 8.3 we had a consistent (but non-deterministic) 
> set of failing tests where the data being sent in intranode requests was 
> _sometimes_ corrupted. For example if the well formed data was
>  _'fieldName':"this is a long string"_
>  The error we saw from Solr might be that
>  unknown field _+'fieldNamis a long string"+_ 
>   
>  The change that indirectly caused this issue to materialize was 
> SOLR-13682, which meant that 
> org.apache.solr.common.SolrInputDocument.writeMap(EntryWriter) would call 
> org.apache.solr.common.SolrInputField.getValue() rather than 
> org.apache.solr.common.SolrInputField.getRawValue() as it had before.
>   
>  getRawValue for a string calls 
> org.apache.solr.common.util.ByteArrayUtf8CharSequence._getStr() which in this 
> context calls
>  org.apache.solr.common.util.JavaBinCodec.getStringProvider()
>  
>  JavaBinCodec has a CharArr, _arr_, which is modified in two different 
> locations, but only one of which is protected with a synchronized block
>   
>  getStringProvider() synchronizes on _arr_:
>  
> [https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L966]
>   
>  but  _readStr() doesn't:
>  
> [https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L930]
>   
>  The two methods are called concurrently, but weren't prior to SOLR-13682.
>   
>  Adding a synchronized block into _readStr() around the modification of _arr_ 
> fixes the problem as far as I can see.
>  
> Also, the problem does not seem to occur when using the dynamic schema mode 
> of autoCreateFields=true in the updateRequestProcessorChain.






[jira] [Resolved] (SOLR-12193) Move some log messages to TRACE level

2019-11-23 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SOLR-12193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl resolved SOLR-12193.

Fix Version/s: (was: 8.1)
   (was: master (9.0))
   8.4
   Resolution: Fixed

Thanks [~gezapeti], finally got around to this one.
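
The fix itself is mechanical; for messages like the one quoted below it is 
essentially this (a sketch, message text illustrative):
{code}
// before: emitted on every startup at DEBUG
log.debug("null missing optional {}", path);
// after: only visible at TRACE
log.trace("null missing optional {}", path);
{code}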

> Move some log messages to TRACE level
> -
>
> Key: SOLR-12193
> URL: https://issues.apache.org/jira/browse/SOLR-12193
> Project: Solr
>  Issue Type: Improvement
>  Components: logging
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
>  Labels: newbie, newdev
> Fix For: 8.4
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> One example of a wasteful DEBUG log which could be moved to TRACE level is:
> {noformat}
> $ solr start -f -v
> 2018-04-05 22:46:14.488 INFO  (main) [   ] o.a.s.c.SolrXmlConfig Loading 
> container configuration from /opt/solr/server/solr/solr.xml
> 2018-04-05 22:46:14.574 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/@coreLoadThreads
> 2018-04-05 22:46:14.577 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/@persistent
> 2018-04-05 22:46:14.579 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/@sharedLib
> 2018-04-05 22:46:14.581 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/@zkHost
> 2018-04-05 22:46:14.583 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/cores
> 2018-04-05 22:46:14.605 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/transientCoreCacheFactory
> 2018-04-05 22:46:14.609 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/metrics/suppliers/counter
> 2018-04-05 22:46:14.609 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/metrics/suppliers/meter
> 2018-04-05 22:46:14.611 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/metrics/suppliers/timer
> 2018-04-05 22:46:14.612 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/metrics/suppliers/histogram
> 201
> {noformat}
> There are probably other examples as well.






[jira] [Commented] (SOLR-12193) Move some log messages to TRACE level

2019-11-23 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-12193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980910#comment-16980910
 ] 

ASF subversion and git services commented on SOLR-12193:


Commit 5f11efb2d51ce7ebc28012db059553f83ba4fdff in lucene-solr's branch 
refs/heads/branch_8x from Jan Høydahl
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5f11efb ]

SOLR-12193: Move some log messages to TRACE level, remove some dead code

(cherry picked from commit d809bc27f1b5cd6d97e0bfe688c99d481bc42d39)


> Move some log messages to TRACE level
> -
>
> Key: SOLR-12193
> URL: https://issues.apache.org/jira/browse/SOLR-12193
> Project: Solr
>  Issue Type: Improvement
>  Components: logging
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
>  Labels: newbie, newdev
> Fix For: 8.1, master (9.0)
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> One example of a wasteful DEBUG log which could be moved to TRACE level is:
> {noformat}
> $ solr start -f -v
> 2018-04-05 22:46:14.488 INFO  (main) [   ] o.a.s.c.SolrXmlConfig Loading 
> container configuration from /opt/solr/server/solr/solr.xml
> 2018-04-05 22:46:14.574 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/@coreLoadThreads
> 2018-04-05 22:46:14.577 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/@persistent
> 2018-04-05 22:46:14.579 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/@sharedLib
> 2018-04-05 22:46:14.581 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/@zkHost
> 2018-04-05 22:46:14.583 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/cores
> 2018-04-05 22:46:14.605 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/transientCoreCacheFactory
> 2018-04-05 22:46:14.609 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/metrics/suppliers/counter
> 2018-04-05 22:46:14.609 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/metrics/suppliers/meter
> 2018-04-05 22:46:14.611 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/metrics/suppliers/timer
> 2018-04-05 22:46:14.612 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/metrics/suppliers/histogram
> 201
> {noformat}
> There are probably other examples as well.






[jira] [Updated] (SOLR-13963) JavaBinCodec has concurrent modification of CharArr resulting in corrupt intranode updates

2019-11-23 Thread Noble Paul (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-13963:
--
Attachment: SOLR-13963.patch

> JavaBinCodec has concurrent modification of CharArr resulting in corrupt 
> intranode updates
> --
>
> Key: SOLR-13963
> URL: https://issues.apache.org/jira/browse/SOLR-13963
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.3
>Reporter: Colvin Cowie
>Assignee: Noble Paul
>Priority: Major
> Attachments: JavaBinCodec.java, SOLR-13963.patch, SOLR-13963.patch
>
>
> Discussed on the mailing list "Possible data corruption in JavaBinCodec in 
> Solr 8.3 during distributed update?"
>  
> In summary, after moving to 8.3 we had a consistent (but non-deterministic) 
> set of failing tests where the data being sent in intranode requests was 
> _sometimes_ corrupted. For example if the well formed data was
>  _'fieldName':"this is a long string"_
>  The error we saw from Solr might be that
>  unknown field _+'fieldNamis a long string"+_ 
>   
>  The change that indirectly caused this issue to materialize was 
> SOLR-13682, which meant that 
> org.apache.solr.common.SolrInputDocument.writeMap(EntryWriter) would call 
> org.apache.solr.common.SolrInputField.getValue() rather than 
> org.apache.solr.common.SolrInputField.getRawValue() as it had before.
>   
>  getRawValue for a string calls 
> org.apache.solr.common.util.ByteArrayUtf8CharSequence._getStr() which in this 
> context calls
>  org.apache.solr.common.util.JavaBinCodec.getStringProvider()
>  
>  JavaBinCodec has a CharArr, _arr_, which is modified in two different 
> locations, but only one of which is protected with a synchronized block
>   
>  getStringProvider() synchronizes on _arr_:
>  
> [https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L966]
>   
>  but  _readStr() doesn't:
>  
> [https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L930]
>   
>  The two methods are called concurrently, but weren't prior to SOLR-13682.
>   
>  Adding a synchronized block into _readStr() around the modification of _arr_ 
> fixes the problem as far as I can see.
>  
> Also, the problem does not seem to occur when using the dynamic schema mode 
> of autoCreateFields=true in the updateRequestProcessorChain.






[jira] [Commented] (SOLR-12193) Move some log messages to TRACE level

2019-11-23 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-12193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980911#comment-16980911
 ] 

ASF subversion and git services commented on SOLR-12193:


Commit 340b238f1c15e4c5facc58990fbb653064a0b121 in lucene-solr's branch 
refs/heads/branch_8x from Jan Høydahl
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=340b238 ]

SOLR-12193: reverting one line back to trace

(cherry picked from commit 592ea19eff0a0d4225f92d0b96bfb3c9559c077e)


> Move some log messages to TRACE level
> -
>
> Key: SOLR-12193
> URL: https://issues.apache.org/jira/browse/SOLR-12193
> Project: Solr
>  Issue Type: Improvement
>  Components: logging
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
>  Labels: newbie, newdev
> Fix For: 8.1, master (9.0)
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> One example of a wasteful DEBUG log which could be moved to TRACE level is:
> {noformat}
> $ solr start -f -v
> 2018-04-05 22:46:14.488 INFO  (main) [   ] o.a.s.c.SolrXmlConfig Loading 
> container configuration from /opt/solr/server/solr/solr.xml
> 2018-04-05 22:46:14.574 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/@coreLoadThreads
> 2018-04-05 22:46:14.577 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/@persistent
> 2018-04-05 22:46:14.579 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/@sharedLib
> 2018-04-05 22:46:14.581 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/@zkHost
> 2018-04-05 22:46:14.583 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/cores
> 2018-04-05 22:46:14.605 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/transientCoreCacheFactory
> 2018-04-05 22:46:14.609 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/metrics/suppliers/counter
> 2018-04-05 22:46:14.609 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/metrics/suppliers/meter
> 2018-04-05 22:46:14.611 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/metrics/suppliers/timer
> 2018-04-05 22:46:14.612 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/metrics/suppliers/histogram
> 201
> {noformat}
> There are probably other examples as well.






[jira] [Commented] (SOLR-12193) Move some log messages to TRACE level

2019-11-23 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-12193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980909#comment-16980909
 ] 

ASF subversion and git services commented on SOLR-12193:


Commit 592ea19eff0a0d4225f92d0b96bfb3c9559c077e in lucene-solr's branch 
refs/heads/master from Jan Høydahl
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=592ea19 ]

SOLR-12193: reverting one line back to trace


> Move some log messages to TRACE level
> -
>
> Key: SOLR-12193
> URL: https://issues.apache.org/jira/browse/SOLR-12193
> Project: Solr
>  Issue Type: Improvement
>  Components: logging
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
>  Labels: newbie, newdev
> Fix For: 8.1, master (9.0)
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> One example of a wasteful DEBUG log which could be moved to TRACE level is:
> {noformat}
> $ solr start -f -v
> 2018-04-05 22:46:14.488 INFO  (main) [   ] o.a.s.c.SolrXmlConfig Loading 
> container configuration from /opt/solr/server/solr/solr.xml
> 2018-04-05 22:46:14.574 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/@coreLoadThreads
> 2018-04-05 22:46:14.577 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/@persistent
> 2018-04-05 22:46:14.579 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/@sharedLib
> 2018-04-05 22:46:14.581 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/@zkHost
> 2018-04-05 22:46:14.583 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/cores
> 2018-04-05 22:46:14.605 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/transientCoreCacheFactory
> 2018-04-05 22:46:14.609 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/metrics/suppliers/counter
> 2018-04-05 22:46:14.609 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/metrics/suppliers/meter
> 2018-04-05 22:46:14.611 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/metrics/suppliers/timer
> 2018-04-05 22:46:14.612 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/metrics/suppliers/histogram
> 201
> {noformat}
> There are probably other examples as well.






[jira] [Commented] (SOLR-12193) Move some log messages to TRACE level

2019-11-23 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-12193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980904#comment-16980904
 ] 

ASF subversion and git services commented on SOLR-12193:


Commit d809bc27f1b5cd6d97e0bfe688c99d481bc42d39 in lucene-solr's branch 
refs/heads/master from Jan Høydahl
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d809bc2 ]

SOLR-12193: Move some log messages to TRACE level, remove some dead code


> Move some log messages to TRACE level
> -
>
> Key: SOLR-12193
> URL: https://issues.apache.org/jira/browse/SOLR-12193
> Project: Solr
>  Issue Type: Improvement
>  Components: logging
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
>  Labels: newbie, newdev
> Fix For: 8.1, master (9.0)
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> One example of a wasteful DEBUG log which could be moved to TRACE level is:
> {noformat}
> $ solr start -f -v
> 2018-04-05 22:46:14.488 INFO  (main) [   ] o.a.s.c.SolrXmlConfig Loading 
> container configuration from /opt/solr/server/solr/solr.xml
> 2018-04-05 22:46:14.574 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/@coreLoadThreads
> 2018-04-05 22:46:14.577 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/@persistent
> 2018-04-05 22:46:14.579 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/@sharedLib
> 2018-04-05 22:46:14.581 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/@zkHost
> 2018-04-05 22:46:14.583 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/cores
> 2018-04-05 22:46:14.605 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/transientCoreCacheFactory
> 2018-04-05 22:46:14.609 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/metrics/suppliers/counter
> 2018-04-05 22:46:14.609 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/metrics/suppliers/meter
> 2018-04-05 22:46:14.611 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/metrics/suppliers/timer
> 2018-04-05 22:46:14.612 DEBUG (main) [   ] o.a.s.c.Config null missing 
> optional solr/metrics/suppliers/histogram
> 201
> {noformat}
> There are probably other examples as well.






[jira] [Commented] (LUCENE-8674) UnsupportedOperationException due to call to o.a.l.q.f.FunctionValues.floatVal

2019-11-23 Thread Rahul Yadav (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980901#comment-16980901
 ] 

Rahul Yadav commented on LUCENE-8674:
-

I am looking at this

> UnsupportedOperationException due to call to o.a.l.q.f.FunctionValues.floatVal
> --
>
> Key: LUCENE-8674
> URL: https://issues.apache.org/jira/browse/LUCENE-8674
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/query/scoring
>Affects Versions: master (9.0)
> Environment: h1. Steps to reproduce
> * Use a Linux machine.
> *  Build commit {{ea2c8ba}} of Solr as described in the section below.
> * Build the films collection as described below.
> * Start the server using the command {{./bin/solr start -f -p 8983 -s 
> /tmp/home}}
> * Request the URL given in the bug description.
> h1. Compiling the server
> {noformat}
> git clone https://github.com/apache/lucene-solr
> cd lucene-solr
> git checkout ea2c8ba
> ant compile
> cd solr
> ant server
> {noformat}
> h1. Building the collection and reproducing the bug
> We followed [Exercise 
> 2|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html#exercise-2] from 
> the [Solr 
> Tutorial|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html].
> {noformat}
> mkdir -p /tmp/home
> echo '<solr></solr>' > 
> /tmp/home/solr.xml
> {noformat}
> In one terminal start a Solr instance in foreground:
> {noformat}
> ./bin/solr start -f -p 8983 -s /tmp/home
> {noformat}
> In another terminal, create a collection of movies, with no shards and no 
> replication, and initialize it:
> {noformat}
> bin/solr create -c films
> curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field": 
> {"name":"name", "type":"text_general", "multiValued":false, "stored":true}}' 
> http://localhost:8983/solr/films/schema
> curl -X POST -H 'Content-type:application/json' --data-binary 
> '{"add-copy-field" : {"source":"*","dest":"_text_"}}' 
> http://localhost:8983/solr/films/schema
> ./bin/post -c films example/films/films.json
> curl -v “URL_BUG”
> {noformat}
> Please check the issue description below to find the “URL_BUG” that will 
> allow you to reproduce the issue reported.
>Reporter: Johannes Kloos
>Priority: Minor
>  Labels: diffblue, newdev
>
> Requesting the following URL causes Solr to return an HTTP 500 error response:
> {noformat}
> http://localhost:8983/solr/films/select?fq={!frange%20l=10%20u=100}or_version_s,directed_by
> {noformat}
> The error response seems to be caused by the following uncaught exception:
> {noformat}
> java.lang.UnsupportedOperationException
> at 
> org.apache.lucene.queries.function.FunctionValues.floatVal(FunctionValues.java:47)
> at 
> org.apache.lucene.queries.function.FunctionValues$3.matches(FunctionValues.java:188)
> at 
> org.apache.lucene.queries.function.ValueSourceScorer$1.matches(ValueSourceScorer.java:53)
> at 
> org.apache.lucene.search.TwoPhaseIterator$TwoPhaseIteratorAsDocIdSetIterator.doNext(TwoPhaseIterator.java:89)
> at 
> org.apache.lucene.search.TwoPhaseIterator$TwoPhaseIteratorAsDocIdSetIterator.nextDoc(TwoPhaseIterator.java:77)
> at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:261)
> at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:214)
> at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
> at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:652)
> at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:443)
> at org.apache.solr.search.DocSetUtil.createDocSetGeneric(DocSetUtil.java:151)
> at org.apache.solr.search.DocSetUtil.createDocSet(DocSetUtil.java:140)
> at 
> org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:1177)
> at 
> org.apache.solr.search.SolrIndexSearcher.getPositiveDocSet(SolrIndexSearcher.java:817)
> at 
> org.apache.solr.search.SolrIndexSearcher.getProcessedFilter(SolrIndexSearcher.java:1025)
> at 
> org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1540)
> at 
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1420)
> at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:567)
> at 
> org.apache.solr.handler.component.QueryComponent.doProcessUngroupedSearch(QueryComponent.java:1434)
> {noformat}
> Sadly, I can't understand the logic of this code well enough to give any 
> insights.
> To set up an environment to reproduce this bug, follow the description in the 
> ‘Environment’ field.
> We found this issue and ~70 more like this using [Diffblue Microservices 
> Testing|https://www.diffblue.com/labs/?utm_source=solr-br]. Find more 
> information on this [fuzz testing 
> 

[jira] [Resolved] (SOLR-13345) Admin UI login page doesn't accept empty passwords

2019-11-23 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SOLR-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl resolved SOLR-13345.

Resolution: Won't Fix

Closing this. I checked a few code paths, and there are some checks that won't 
allow you to enter an empty password. I have not checked everywhere, though. 
Feel free to re-open if you want to block empty passwords in other code paths.

> Admin UI login page doesn't accept empty passwords
> --
>
> Key: SOLR-13345
> URL: https://issues.apache.org/jira/browse/SOLR-13345
> Project: Solr
>  Issue Type: Bug
>  Components: Admin UI
>Affects Versions: 7.7, 8.0
>Reporter: Märt
>Assignee: Jan Høydahl
>Priority: Minor
>
> In solr 7.6 and older, it was possible to log in with an empty password using 
> basic auth. The new Admin UI login page implemented in SOLR-7896 no longer 
> accepts empty passwords.
> This issue was discussed in the solr-user mailing list 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201903.mbox/%3C7629BDDD-3D22-4203-9188-0E0A8DCF2FEE%40cominvent.com%3E






[jira] [Commented] (SOLR-13842) Remove wt=json from Implicit API definition's defaults

2019-11-23 Thread Rahul Yadav (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980888#comment-16980888
 ] 

Rahul Yadav commented on SOLR-13842:


If no one is working on this, can I start looking at it?

> Remove wt=json from Implicit API definition's defaults
> --
>
> Key: SOLR-13842
> URL: https://issues.apache.org/jira/browse/SOLR-13842
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Munendra S N
>Priority: Minor
>  Labels: newdev
>
> From Solr 7, {{json}} is the default response writer, so {{wt=json}} can be 
> removed from the implicit API definitions.






[jira] [Updated] (LUCENE-9031) UnsupportedOperationException on highlighting Interval Query

2019-11-23 Thread Mikhail Khludnev (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated LUCENE-9031:
-
Attachment: LUCENE-9031.patch
Status: Patch Available  (was: Patch Available)

Moved the passing (only) intervals tests to {{TestUnifiedHighlighterTermIntervals}}.

Refreshed https://github.com/apache/lucene-solr/pull/1011 as well. 

So far, there are two open questions: 
* fixField() highlighting LUCENE-9058
* Multiterm intervals highlighting. 

Given that everything requested has been resolved, I plan to push it after a +1 
from precommit. However, more feedback, concerns, and even vetoes are highly 
appreciated. 

Thanks, [~romseygeek]! 

> UnsupportedOperationException on highlighting Interval Query
> 
>
> Key: LUCENE-9031
> URL: https://issues.apache.org/jira/browse/LUCENE-9031
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/queries
>Reporter: Mikhail Khludnev
>Assignee: Mikhail Khludnev
>Priority: Major
> Fix For: 8.4
>
> Attachments: LUCENE-9031.patch, LUCENE-9031.patch, LUCENE-9031.patch, 
> LUCENE-9031.patch, LUCENE-9031.patch, LUCENE-9031.patch, LUCENE-9031.patch, 
> LUCENE-9031.patch, LUCENE-9031.patch
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> When UnifiedHighlighter highlights Interval Query it encounters 
> UnsupportedOperationException. 






[jira] [Commented] (SOLR-13963) JavaBinCodec has concurrent modification of CharArr resulting in corrupt intranode updates

2019-11-23 Thread Colvin Cowie (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980879#comment-16980879
 ] 

Colvin Cowie commented on SOLR-13963:
-

[~noble.paul] Eclipse is playing up and I need to head off.

If you run the test in the patch without the extra synchronized{} block you 
(should) see it fail every time.

I've attached another version of the JavaBinCodec where I've replaced the 
synchronized{} block with a ReentrantLock and thrown an exception when the lock 
for the instance is held by another thread, which shows that it is happening.

If there's more detail you need after you've tried running the test, then let 
me know and I can get back to you tomorrow.
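
The diagnostic replacement looks roughly like this (a sketch of the approach, 
not the attached file verbatim):
{code}
// requires java.util.concurrent.locks.ReentrantLock
private final ReentrantLock arrLock = new ReentrantLock();

private void mutateArr(Runnable mutation) {
  // Diagnostic only: fail fast if another thread already holds the lock,
  // which proves two threads are mutating the same CharArr concurrently.
  if (!arrLock.tryLock()) {
    throw new IllegalStateException(
        "Concurrent modification of CharArr from " + Thread.currentThread().getName());
  }
  try {
    mutation.run();
  } finally {
    arrLock.unlock();
  }
}
{code}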

When I was debugging it earlier, I grabbed this stacktrace where two threads 
were modifying the same instance. The line numbers won't match here because the 
code was formatted differently.
{noformat}
Thread [qtp1047503754-123] (Suspended) 
 
JavaBinUpdateRequestCodec$StreamingCodec(JavaBinCodec)._readStr(DataInputInputStream,
 JavaBinCodec$StringCache, int) line: 931 
 
JavaBinUpdateRequestCodec$StreamingCodec(JavaBinCodec).readStr(DataInputInputStream,
 JavaBinCodec$StringCache, boolean) line: 920 
 
JavaBinUpdateRequestCodec$StreamingCodec(JavaBinCodec).readExternString(DataInputInputStream)
 line: 1190 
 
JavaBinUpdateRequestCodec$StreamingCodec(JavaBinCodec).readObject(DataInputInputStream)
 line: 302 
 
JavaBinUpdateRequestCodec$StreamingCodec(JavaBinCodec).readVal(DataInputInputStream)
 line: 280 
 
JavaBinUpdateRequestCodec$StreamingCodec(JavaBinCodec).readSolrInputDocument(DataInputInputStream)
 line: 626 
 
JavaBinUpdateRequestCodec$StreamingCodec(JavaBinCodec).readObject(DataInputInputStream)
 line: 339 
 
JavaBinUpdateRequestCodec$StreamingCodec(JavaBinCodec).readVal(DataInputInputStream)
 line: 280 
 
JavaBinUpdateRequestCodec$StreamingCodec.readOuterMostDocIterator(DataInputInputStream)
 line: 321 
 JavaBinUpdateRequestCodec$StreamingCodec.readIterator(DataInputInputStream) 
line: 280 
 
JavaBinUpdateRequestCodec$StreamingCodec(JavaBinCodec).readObject(DataInputInputStream)
 line: 335 
 
JavaBinUpdateRequestCodec$StreamingCodec(JavaBinCodec).readVal(DataInputInputStream)
 line: 280 
 JavaBinUpdateRequestCodec$StreamingCodec.readNamedList(DataInputInputStream) 
line: 235 
 
JavaBinUpdateRequestCodec$StreamingCodec(JavaBinCodec).readObject(DataInputInputStream)
 line: 300 
 
JavaBinUpdateRequestCodec$StreamingCodec(JavaBinCodec).readVal(DataInputInputStream)
 line: 280 
 JavaBinUpdateRequestCodec$StreamingCodec(JavaBinCodec).unmarshal(InputStream) 
line: 189 
 JavaBinUpdateRequestCodec.unmarshal(InputStream, 
JavaBinUpdateRequestCodec$StreamingUpdateHandler) line: 126 
 JavabinLoader.parseAndLoadDocs(SolrQueryRequest, SolrQueryResponse, 
InputStream, UpdateRequestProcessor) line: 123 
 JavabinLoader.load(SolrQueryRequest, SolrQueryResponse, ContentStream, 
UpdateRequestProcessor) line: 70 
 UpdateRequestHandler$1.load(SolrQueryRequest, SolrQueryResponse, 
ContentStream, UpdateRequestProcessor) line: 97 
 
UpdateRequestHandler(ContentStreamHandlerBase).handleRequestBody(SolrQueryRequest,
 SolrQueryResponse) line: 68 
 UpdateRequestHandler(RequestHandlerBase).handleRequest(SolrQueryRequest, 
SolrQueryResponse) line: 198 
 SolrCore.execute(SolrRequestHandler, SolrQueryRequest, SolrQueryResponse) 
line: 2576 
 HttpSolrCall.execute(SolrQueryResponse) line: 803 
 HttpSolrCall.call() line: 582 
 RobustSolrDispatchFilter(SolrDispatchFilter).doFilter(ServletRequest, 
ServletResponse, FilterChain, boolean) line: 424 
 RobustSolrDispatchFilter(SolrDispatchFilter).doFilter(ServletRequest, 
ServletResponse, FilterChain) line: 351 
 ServletHandler$CachedChain.doFilter(ServletRequest, ServletResponse) line: 
1602 
 ServletHandler.doHandle(String, Request, HttpServletRequest, 
HttpServletResponse) line: 540 
 ServletHandler(ScopedHandler).handle(String, Request, HttpServletRequest, 
HttpServletResponse) line: 146 
 ConstraintSecurityHandler(SecurityHandler).handle(String, Request, 
HttpServletRequest, HttpServletResponse) line: 548 
 SessionHandler(HandlerWrapper).handle(String, Request, HttpServletRequest, 
HttpServletResponse) line: 132 
 SessionHandler(ScopedHandler).nextHandle(String, Request, HttpServletRequest, 
HttpServletResponse) line: 257 
 SessionHandler.doHandle(String, Request, HttpServletRequest, 
HttpServletResponse) line: 1711 
 WebAppContext(ScopedHandler).nextHandle(String, Request, HttpServletRequest, 
HttpServletResponse) line: 255 
 WebAppContext(ContextHandler).doHandle(String, Request, HttpServletRequest, 
HttpServletResponse) line: 1347 
 ServletHandler(ScopedHandler).nextScope(String, Request, HttpServletRequest, 
HttpServletResponse) line: 203 
 ServletHandler.doScope(String, Request, HttpServletRequest, 
HttpServletResponse) line: 480 
 SessionHandler.doScope(String, Request, HttpServletRequest, 
HttpServletResponse) line: 1678 
 

[jira] [Updated] (SOLR-13963) JavaBinCodec has concurrent modification of CharArr resulting in corrupt intranode updates

2019-11-23 Thread Colvin Cowie (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colvin Cowie updated SOLR-13963:

Attachment: JavaBinCodec.java

> JavaBinCodec has concurrent modification of CharArr resulting in corrupt 
> intranode updates
> --
>
> Key: SOLR-13963
> URL: https://issues.apache.org/jira/browse/SOLR-13963
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.3
>Reporter: Colvin Cowie
>Assignee: Noble Paul
>Priority: Major
> Attachments: JavaBinCodec.java, SOLR-13963.patch
>
>
> Discussed on the mailing list "Possible data corruption in JavaBinCodec in 
> Solr 8.3 during distributed update?"
>  
> In summary, after moving to 8.3 we had a consistent (but non-deterministic) 
> set of failing tests where the data being sent in intranode requests was 
> _sometimes_ corrupted. For example if the well formed data was
>  _'fieldName':"this is a long string"_
>  The error we saw from Solr might be that
>  unknown field _+'fieldNamis a long string"+_ 
>   
>  The change that indirectly caused this issue to materialize was 
> SOLR-13682, which meant that 
> org.apache.solr.common.SolrInputDocument.writeMap(EntryWriter) would call 
> org.apache.solr.common.SolrInputField.getValue() rather than 
> org.apache.solr.common.SolrInputField.getRawValue() as it had before.
>   
>  getRawValue for a string calls 
> org.apache.solr.common.util.ByteArrayUtf8CharSequence._getStr() which in this 
> context calls
>  org.apache.solr.common.util.JavaBinCodec.getStringProvider()
>  
>  JavaBinCodec has a CharArr, _arr_, which is modified in two different 
> locations, but only one of which is protected with a synchronized block
>   
>  getStringProvider() synchronizes on _arr_:
>  
> [https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L966]
>   
>  but  _readStr() doesn't:
>  
> [https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L930]
>   
>  The two methods are called concurrently, but weren't prior to SOLR-13682.
>   
>  Adding a synchronized block into _readStr() around the modification of _arr_ 
> fixes the problem as far as I can see.
>  
> Also, the problem does not seem to occur when using the dynamic schema mode 
> of autoCreateFields=true in the updateRequestProcessorChain.






[jira] [Commented] (SOLR-13963) JavaBinCodec has concurrent modification of CharArr resulting in corrupt intranode updates

2019-11-23 Thread Noble Paul (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980876#comment-16980876
 ] 

Noble Paul commented on SOLR-13963:
---

Thanks, I'll try to modify the test into something smaller and easily reproducible.

> JavaBinCodec has concurrent modification of CharArr resulting in corrupt 
> intranode updates
> --
>
> Key: SOLR-13963
> URL: https://issues.apache.org/jira/browse/SOLR-13963
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.3
>Reporter: Colvin Cowie
>Assignee: Noble Paul
>Priority: Major
> Attachments: SOLR-13963.patch
>
>
> Discussed on the mailing list "Possible data corruption in JavaBinCodec in 
> Solr 8.3 during distributed update?"
>  
> In summary, after moving to 8.3 we had a consistent (but non-deterministic) 
> set of failing tests where the data being sent in intranode requests was 
> _sometimes_ corrupted. For example if the well formed data was
>  _'fieldName':"this is a long string"_
>  The error we saw from Solr might be that
>  unknown field _+'fieldNamis a long string"+_ 
>   
>  The change that indirectly caused this issue to materialize was 
> SOLR-13682, which meant that 
> org.apache.solr.common.SolrInputDocument.writeMap(EntryWriter) would call 
> org.apache.solr.common.SolrInputField.getValue() rather than 
> org.apache.solr.common.SolrInputField.getRawValue() as it had before.
>   
>  getRawValue for a string calls 
> org.apache.solr.common.util.ByteArrayUtf8CharSequence._getStr() which in this 
> context calls
>  org.apache.solr.common.util.JavaBinCodec.getStringProvider()
>  
>  JavaBinCodec has a CharArr, _arr_, which is modified in two different 
> locations, but only one of which is protected with a synchronized block
>   
>  getStringProvider() synchronizes on _arr_:
>  
> [https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L966]
>   
>  but  _readStr() doesn't:
>  
> [https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L930]
>   
>  The two methods are called concurrently, but weren't prior to SOLR-13682.
>   
>  Adding a synchronized block into _readStr() around the modification of _arr_ 
> fixes the problem as far as I can see.
>  
> Also, the problem does not seem to occur when using the dynamic schema mode 
> of autoCreateFields=true in the updateRequestProcessorChain.






[jira] [Assigned] (SOLR-13963) JavaBinCodec has concurrent modification of CharArr resulting in corrupt intranode updates

2019-11-23 Thread Noble Paul (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul reassigned SOLR-13963:
-

Assignee: Noble Paul

> JavaBinCodec has concurrent modification of CharArr resulting in corrupt 
> intranode updates
> --
>
> Key: SOLR-13963
> URL: https://issues.apache.org/jira/browse/SOLR-13963
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.3
>Reporter: Colvin Cowie
>Assignee: Noble Paul
>Priority: Major
> Attachments: SOLR-13963.patch
>
>
> Discussed on the mailing list "Possible data corruption in JavaBinCodec in 
> Solr 8.3 during distributed update?"
>  
> In summary, after moving to 8.3 we had a consistent (but non-deterministic) 
> set of failing tests where the data being sent in intranode requests was 
> _sometimes_ corrupted. For example if the well formed data was
>  _'fieldName':"this is a long string"_
>  The error we saw from Solr might be that
>  unknown field _+'fieldNamis a long string"+_ 
>   
>  The change that indirectly caused this issue to materialize was 
> SOLR-13682, which meant that 
> org.apache.solr.common.SolrInputDocument.writeMap(EntryWriter) would call 
> org.apache.solr.common.SolrInputField.getValue() rather than 
> org.apache.solr.common.SolrInputField.getRawValue() as it had before.
>   
>  getRawValue for a string calls 
> org.apache.solr.common.util.ByteArrayUtf8CharSequence._getStr() which in this 
> context calls
>  org.apache.solr.common.util.JavaBinCodec.getStringProvider()
>  
>  JavaBinCodec has a CharArr, _arr_, which is modified in two different 
> locations, but only one of which is protected with a synchronized block
>   
>  getStringProvider() synchronizes on _arr_:
>  
> [https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L966]
>   
>  but  _readStr() doesn't:
>  
> [https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L930]
>   
>  The two methods are called concurrently, but weren't prior to SOLR-13682.
>   
>  Adding a synchronized block into _readStr() around the modification of _arr_ 
> fixes the problem as far as I can see.
>  
> Also, the problem does not seem to occur when using the dynamic schema mode 
> of autoCreateFields=true in the updateRequestProcessorChain.






[jira] [Updated] (SOLR-13963) JavaBinCodec has concurrent modification of CharArr resulting in corrupt intranode updates

2019-11-23 Thread Colvin Cowie (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colvin Cowie updated SOLR-13963:

Summary: JavaBinCodec has concurrent modification of CharArr resulting in 
corrupt intranode updates  (was: JavaBinCodec has concurrent modification of 
CharrArr resulting in corrupt intranode updates)

> JavaBinCodec has concurrent modification of CharArr resulting in corrupt 
> intranode updates
> --
>
> Key: SOLR-13963
> URL: https://issues.apache.org/jira/browse/SOLR-13963
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.3
>Reporter: Colvin Cowie
>Priority: Major
> Attachments: SOLR-13963.patch
>
>
> Discussed on the mailing list "Possible data corruption in JavaBinCodec in 
> Solr 8.3 during distributed update?"
>  
> In summary, after moving to 8.3 we had a consistent (but non-deterministic) 
> set of failing tests where the data being sent in intranode requests was 
> _sometimes_ corrupted. For example if the well formed data was
>  _'fieldName':"this is a long string"_
>  The error we saw from Solr might be that
>  unknown field _+'fieldNamis a long string"+_ 
>   
>  The change that indirectly caused this issue to materialize was 
> SOLR-13682, which meant that 
> org.apache.solr.common.SolrInputDocument.writeMap(EntryWriter) would call 
> org.apache.solr.common.SolrInputField.getValue() rather than 
> org.apache.solr.common.SolrInputField.getRawValue() as it had before.
>   
>  getRawValue for a string calls 
> org.apache.solr.common.util.ByteArrayUtf8CharSequence._getStr() which in this 
> context calls
>  org.apache.solr.common.util.JavaBinCodec.getStringProvider()
>  
>  JavaBinCodec has a CharArr, _arr_, which is modified in two different 
> locations, but only one of which is protected with a synchronized block
>   
>  getStringProvider() synchronizes on _arr_:
>  
> [https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L966]
>   
>  but  _readStr() doesn't:
>  
> [https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L930]
>   
>  The two methods are called concurrently, but weren't prior to SOLR-13682.
>   
>  Adding a synchronized block into _readStr() around the modification of _arr_ 
> fixes the problem as far as I can see.
>  
> Also, the problem does not seem to occur when using the dynamic schema mode 
> of autoCreateFields=true in the updateRequestProcessorChain.






[jira] [Commented] (LUCENE-8674) UnsupportedOperationException due to call to o.a.l.q.f.FunctionValues.floatVal

2019-11-23 Thread Rahul Yadav (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980868#comment-16980868
 ] 

Rahul Yadav commented on LUCENE-8674:
-

Hi,

New dev here, can I take up this issue?

> UnsupportedOperationException due to call to o.a.l.q.f.FunctionValues.floatVal
> --
>
> Key: LUCENE-8674
> URL: https://issues.apache.org/jira/browse/LUCENE-8674
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/query/scoring
>Affects Versions: master (9.0)
> Environment: h1. Steps to reproduce
> * Use a Linux machine.
> *  Build commit {{ea2c8ba}} of Solr as described in the section below.
> * Build the films collection as described below.
> * Start the server using the command {{./bin/solr start -f -p 8983 -s 
> /tmp/home}}
> * Request the URL given in the bug description.
> h1. Compiling the server
> {noformat}
> git clone https://github.com/apache/lucene-solr
> cd lucene-solr
> git checkout ea2c8ba
> ant compile
> cd solr
> ant server
> {noformat}
> h1. Building the collection and reproducing the bug
> We followed [Exercise 
> 2|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html#exercise-2] from 
> the [Solr 
> Tutorial|http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html].
> {noformat}
> mkdir -p /tmp/home
> echo '<solr></solr>' > 
> /tmp/home/solr.xml
> {noformat}
> In one terminal start a Solr instance in foreground:
> {noformat}
> ./bin/solr start -f -p 8983 -s /tmp/home
> {noformat}
> In another terminal, create a collection of movies, with no shards and no 
> replication, and initialize it:
> {noformat}
> bin/solr create -c films
> curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field": 
> {"name":"name", "type":"text_general", "multiValued":false, "stored":true}}' 
> http://localhost:8983/solr/films/schema
> curl -X POST -H 'Content-type:application/json' --data-binary 
> '{"add-copy-field" : {"source":"*","dest":"_text_"}}' 
> http://localhost:8983/solr/films/schema
> ./bin/post -c films example/films/films.json
> curl -v “URL_BUG”
> {noformat}
> Please check the issue description below to find the “URL_BUG” that will 
> allow you to reproduce the issue reported.
>Reporter: Johannes Kloos
>Priority: Minor
>  Labels: diffblue, newdev
>
> Requesting the following URL causes Solr to return an HTTP 500 error response:
> {noformat}
> http://localhost:8983/solr/films/select?fq={!frange%20l=10%20u=100}or_version_s,directed_by
> {noformat}
> The error response seems to be caused by the following uncaught exception:
> {noformat}
> java.lang.UnsupportedOperationException
> at 
> org.apache.lucene.queries.function.FunctionValues.floatVal(FunctionValues.java:47)
> at 
> org.apache.lucene.queries.function.FunctionValues$3.matches(FunctionValues.java:188)
> at 
> org.apache.lucene.queries.function.ValueSourceScorer$1.matches(ValueSourceScorer.java:53)
> at 
> org.apache.lucene.search.TwoPhaseIterator$TwoPhaseIteratorAsDocIdSetIterator.doNext(TwoPhaseIterator.java:89)
> at 
> org.apache.lucene.search.TwoPhaseIterator$TwoPhaseIteratorAsDocIdSetIterator.nextDoc(TwoPhaseIterator.java:77)
> at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:261)
> at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:214)
> at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
> at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:652)
> at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:443)
> at org.apache.solr.search.DocSetUtil.createDocSetGeneric(DocSetUtil.java:151)
> at org.apache.solr.search.DocSetUtil.createDocSet(DocSetUtil.java:140)
> at 
> org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:1177)
> at 
> org.apache.solr.search.SolrIndexSearcher.getPositiveDocSet(SolrIndexSearcher.java:817)
> at 
> org.apache.solr.search.SolrIndexSearcher.getProcessedFilter(SolrIndexSearcher.java:1025)
> at 
> org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1540)
> at 
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1420)
> at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:567)
> at 
> org.apache.solr.handler.component.QueryComponent.doProcessUngroupedSearch(QueryComponent.java:1434)
> {noformat}
> Sadly, I can't understand the logic of this code well enough to give any 
> insights.
> To set up an environment to reproduce this bug, follow the description in the 
> ‘Environment’ field.
> We found this issue and ~70 more like this using [Diffblue Microservices 
> Testing|https://www.diffblue.com/labs/?utm_source=solr-br]. Find more 
> information on this [fuzz testing 
> 

[jira] [Commented] (LUCENE-6744) equals methods should compare classes directly, not use instanceof

2019-11-23 Thread Rahul Yadav (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-6744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980862#comment-16980862
 ] 

Rahul Yadav commented on LUCENE-6744:
-

Hi,

New dev here. Is this issue resolved/abandoned? If not, can I start looking 
at it?

> equals methods should compare classes directly, not use instanceof
> --
>
> Key: LUCENE-6744
> URL: https://issues.apache.org/jira/browse/LUCENE-6744
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Chris M. Hostetter
>Priority: Major
>  Labels: newdev
> Attachments: LUCENE-6744.patch, LUCENE-6744.patch
>
>
> from a 2015-07-12 email to the dev list from Fuxiang Chen...
> {noformat}
> We have found some inconsistencies in the overriding of the equals() method
> in some files with respect to the conforming to the contract structure
> based on the Java Specification.
> Affected files:
> 1) ConstValueSource.java
> 2) DoubleConstValueSource.java
> 3) FixedBitSet.java
> 4) GeohashFunction.java
> 5) LongBitSet.java
> 6) SpanNearQuery.java
> 7) StringDistanceFunction.java
> 8) ValueSourceRangeFilter.java
> 9) VectorDistanceFunction.java
> The above files all uses instanceof in the overridden equals() method in
> comparing two objects.
> According to the Java Specification, the equals() method must be reflexive,
> symmetric, transitive and consistent. In the case of symmetric, it is
> stated that x.equals(y) should return true if and only if y.equals(x)
> returns true. Using instanceof is asymmetric and is not a valid symmetric
> contract.
> A more preferred way will be to compare the classes instead. i.e. if
> (this.getClass() != o.getClass()).
> However, if compiling the source code using JDK 7 and above, and if
> developers still prefer to use instanceof, you can make use of the static
> methods of Objects such as Objects.equals(this.id, that.id). (Making use of
> the static methods of Objects is currently absent in the methods.) It will
> be easier to override the equals() method and will ensure that the
> overridden equals() method will fulfill the contract rules.
> {noformat}
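
A self-contained illustration of the asymmetry described above (hypothetical classes):
{code}
class Base {
  final int id;
  Base(int id) { this.id = id; }
  @Override public boolean equals(Object o) {
    return o instanceof Base && ((Base) o).id == id; // instanceof: asymmetric
  }
  @Override public int hashCode() { return id; }
}

class Derived extends Base {
  final String tag;
  Derived(int id, String tag) { super(id); this.tag = tag; }
  @Override public boolean equals(Object o) {
    return o instanceof Derived && super.equals(o) && ((Derived) o).tag.equals(tag);
  }
  @Override public int hashCode() { return 31 * super.hashCode() + tag.hashCode(); }
}

// Base b = new Base(1); Derived d = new Derived(1, "x");
// b.equals(d) -> true, but d.equals(b) -> false: symmetry is broken.
// Comparing classes directly restores it:
//   if (o == null || o.getClass() != getClass()) return false;
{code}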






[jira] [Commented] (SOLR-13963) JavaBinCodec has concurrent modification of CharrArr resulting in corrupt intranode updates

2019-11-23 Thread Colvin Cowie (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980860#comment-16980860
 ] 

Colvin Cowie commented on SOLR-13963:
-

I've attached a patch that fixes it, and I've included a new test that 
reproduces the problem without the fix.

I don't know enough about how Solr's tests are organized to know the best way 
to write a test for this, so I've just done something that works.

But if there is a better way to do it, or a different coding style etc., then 
obviously I'm open to it being done differently.

> JavaBinCodec has concurrent modification of CharrArr resulting in corrupt 
> intranode updates
> ---
>
> Key: SOLR-13963
> URL: https://issues.apache.org/jira/browse/SOLR-13963
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.3
>Reporter: Colvin Cowie
>Priority: Major
> Attachments: SOLR-13963.patch
>
>
> Discussed on the mailing list "Possible data corruption in JavaBinCodec in 
> Solr 8.3 during distributed update?"
>  
> In summary, after moving to 8.3 we had a consistent (but non-deterministic) 
> set of failing tests where the data being sent in intranode requests was 
> _sometimes_ corrupted. For example if the well formed data was
>  _'fieldName':"this is a long string"_
>  The error we saw from Solr might be that
>  unknown field _+'fieldNamis a long string"+_ 
>   
>  The change that indirectly caused this issue to materialize was from 
> SOLR-13682 which meant that 
> org.apache.solr.common.SolrInputDocument.writeMap(EntryWriter) would call 
> org.apache.solr.common.SolrInputField.getValue() rather than 
> org.apache.solr.common.SolrInputField.getRawValue() as it had before.
>   
>  getRawValue for a string calls 
> org.apache.solr.common.util.ByteArrayUtf8CharSequence._getStr() which in this 
> context calls
>  org.apache.solr.common.util.JavaBinCodec.getStringProvider()
>  
>  JavaBinCodec has a CharArr, _arr_, which is modified in two different 
> locations, but only one of which is protected with a synchronized block
>   
>  getStringProvider() synchronizes on _arr_:
>  
> [https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L966]
>   
>  but  _readStr() doesn't:
>  
> [https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L930]
>   
>  The two methods are called concurrently, but weren't prior to SOLR-13682.
>   
>  Adding a synchronized block into _readStr() around the modification of _arr_ 
> fixes the problem as far as I can see.
>  
> Also, the problem does not seem to occur when using the dynamic schema mode 
> of autoCreateFields=true in the updateRequestProcessorChain.
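
To make the race concrete, here is a minimal, self-contained sketch, using 
StringBuilder as a stand-in for JavaBinCodec's shared CharArr; the class and 
method names are illustrative, not the actual JavaBinCodec code or the attached 
patch:
{code:java}
// Illustrative only: StringBuilder stands in for the shared CharArr buffer.
class SharedBufferDecoder {
  private final StringBuilder arr = new StringBuilder();

  // Analogous to readStr() before the fix: the shared buffer is mutated
  // without a lock, so a thread inside a synchronized reader can still
  // observe a half-built or truncated string (like the 'fieldNam...' above).
  String readStrUnsafe(char[] utf16) {
    arr.setLength(0);
    arr.append(utf16);
    return arr.toString();
  }

  // Analogous to the described fix: synchronize on the same monitor that
  // getStringProvider() already uses, making reset + append + copy atomic.
  String readStrSafe(char[] utf16) {
    synchronized (arr) {
      arr.setLength(0);
      arr.append(utf16);
      return arr.toString();
    }
  }
}
{code}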



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-13963) JavaBinCodec has concurrent modification of CharrArr resulting in corrupt intranode updates

2019-11-23 Thread Colvin Cowie (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colvin Cowie updated SOLR-13963:

Status: Patch Available  (was: Open)

> JavaBinCodec has concurrent modification of CharrArr resulting in corrupt 
> intranode updates
> ---
>
> Key: SOLR-13963
> URL: https://issues.apache.org/jira/browse/SOLR-13963
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.3
>Reporter: Colvin Cowie
>Priority: Major
> Attachments: SOLR-13963.patch
>
>
> Discussed on the mailing list "Possible data corruption in JavaBinCodec in 
> Solr 8.3 during distributed update?"
>  
> In summary, after moving to 8.3 we had a consistent (but non-deterministic) 
> set of failing tests where the data being sent in intranode requests was 
> _sometimes_ corrupted. For example if the well formed data was
>  _'fieldName':"this is a long string"_
>  The error we saw from Solr might be that
>  unknown field _+'fieldNamis a long string"+_ 
>   
>  The change that indirectly caused this issue to materialize was from 
> SOLR-13682 which meant that 
> org.apache.solr.common.SolrInputDocument.writeMap(EntryWriter) would call 
> org.apache.solr.common.SolrInputField.getValue() rather than 
> org.apache.solr.common.SolrInputField.getRawValue() as it had before.
>   
>  getRawValue for a string calls 
> org.apache.solr.common.util.ByteArrayUtf8CharSequence._getStr() which in this 
> context calls
>  org.apache.solr.common.util.JavaBinCodec.getStringProvider()
>  
>  JavaBinCodec has a CharArr, _arr_, which is modified in two different 
> locations, but only one of which is protected with a synchronized block
>   
>  getStringProvider() synchronizes on _arr_:
>  
> [https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L966]
>   
>  but  _readStr() doesn't:
>  
> [https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L930]
>   
>  The two methods are called concurrently, but weren't prior to SOLR-13682.
>   
>  Adding a synchronized block into _readStr() around the modification of _arr_ 
> fixes the problem as far as I can see.
>  
> Also, the problem does not seem to occur when using the dynamic schema mode 
> of autoCreateFields=true in the updateRequestProcessorChain.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-13963) JavaBinCodec has concurrent modification of CharrArr resulting in corrupt intranode updates

2019-11-23 Thread Colvin Cowie (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colvin Cowie updated SOLR-13963:

Attachment: SOLR-13963.patch

> JavaBinCodec has concurrent modification of CharrArr resulting in corrupt 
> intranode updates
> ---
>
> Key: SOLR-13963
> URL: https://issues.apache.org/jira/browse/SOLR-13963
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.3
>Reporter: Colvin Cowie
>Priority: Major
> Attachments: SOLR-13963.patch
>
>
> Discussed on the mailing list "Possible data corruption in JavaBinCodec in 
> Solr 8.3 during distributed update?"
>  
> In summary, after moving to 8.3 we had a consistent (but non-deterministic) 
> set of failing tests where the data being sent in intranode requests was 
> _sometimes_ corrupted. For example if the well formed data was
>  _'fieldName':"this is a long string"_
>  The error we saw from Solr might be that
>  unknown field _+'fieldNamis a long string"+_ 
>   
>  The change that indirectly caused this issue to materialize was from 
> SOLR-13682 which meant that 
> org.apache.solr.common.SolrInputDocument.writeMap(EntryWriter) would call 
> org.apache.solr.common.SolrInputField.getValue() rather than 
> org.apache.solr.common.SolrInputField.getRawValue() as it had before.
>   
>  getRawValue for a string calls 
> org.apache.solr.common.util.ByteArrayUtf8CharSequence._getStr() which in this 
> context calls
>  org.apache.solr.common.util.JavaBinCodec.getStringProvider()
>  
>  JavaBinCodec has a CharArr, _arr_, which is modified in two different 
> locations, but only one of which is protected with a synchronized block
>   
>  getStringProvider() synchronizes on _arr_:
>  
> [https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L966]
>   
>  but  _readStr() doesn't:
>  
> [https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L930]
>   
>  The two methods are called concurrently, but weren't prior to SOLR-13682.
>   
>  Adding a synchronized block into _readStr() around the modification of _arr_ 
> fixes the problem as far as I can see.
>  
> Also, the problem does not seem to occur when using the dynamic schema mode 
> of autoCreateFields=true in the updateRequestProcessorChain.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-9056) Simplify BlockImpactsDocsEnum#advance

2019-11-23 Thread Adrien Grand (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-9056.
--
Fix Version/s: 8.4
   Resolution: Fixed

> Simplify BlockImpactsDocsEnum#advance
> -
>
> Key: LUCENE-9056
> URL: https://issues.apache.org/jira/browse/LUCENE-9056
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 8.4
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This is a follow-up to LUCENE-9027. Now that we compute the prefix sum in 
> #refillDocs, we can remove the check for whether we are on the last document 
> of the postings list (in which case we should return NO_MORE_DOCS).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9060) Fix the files generated python scripts in lucene/util/packed to not use RamUsageEstimator.NUM_BYTES_INT

2019-11-23 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980838#comment-16980838
 ] 

Adrien Grand commented on LUCENE-9060:
--

+1 Does the script then recreate the file exactly as it is today?

> Fix the files generated python scripts in lucene/util/packed to not use 
> RamUsageEstimator.NUM_BYTES_INT
> ---
>
> Key: LUCENE-9060
> URL: https://issues.apache.org/jira/browse/LUCENE-9060
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Erick Erickson
>Priority: Major
> Attachments: LUCENE-9060.patch
>
>
> RamUsageEstimator.NUM_BYTES_INT has been removed. But the Python code still 
> puts it in the generated code. Once you run "ant regenerate" (and I had to 
> run it with 24G!) you can no longer build.
> We should verify that warnings against hand-editing end up in the generated 
> code, although they weren't hand-edited in this case.
> It looks like the constants were removed as part of LUCENE-8745.
> I think it's just a straightforward substitution of "Integer.BYTES".
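
For illustration, the kind of substitution meant, as a hypothetical 
before/after; the exact expressions in the generated files may differ:
{code:java}
// before: references a constant removed in LUCENE-8745, no longer compiles
long bytes = RamUsageEstimator.NUM_BYTES_INT * (long) valueCount;

// after: the JDK constant, same value (4)
long bytes = Integer.BYTES * (long) valueCount;
{code}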



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8985) SynonymGraphFilter cannot handle input stream with tokens filtered.

2019-11-23 Thread Jira


[ 
https://issues.apache.org/jira/browse/LUCENE-8985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980835#comment-16980835
 ] 

Jan Høydahl commented on LUCENE-8985:
-

I was hoping the new tests would describe the bug. The current SGF code does 
not handle holes in the token stream caused by removing tokens in e.g. a 
StopFilter. When holes are removed, a phrase query may get a wrong match. 
Simplified example:

Document: Please clean the screen
After stopfilter: Please clean * screen

Query: “clean the monitor”
After stopfilter: “clean * monitor”
After sgf: “clean screen|monitor”
No match
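
For anyone trying to reproduce this, a minimal analysis chain of the kind 
involved, assuming Lucene 8.x with analyzers-common on the classpath; the 
synonym rule and stop set here are illustrative, not taken from the patch:
{code:java}
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.synonym.SynonymGraphFilter;
import org.apache.lucene.analysis.synonym.SynonymMap;
import org.apache.lucene.util.CharsRef;

public class StopThenSynonyms {
  // StopFilter removes "the" and leaves a hole; SynonymGraphFilter then runs
  // over the stream containing the hole, which is where the bug shows up.
  static Analyzer build() throws IOException {
    SynonymMap.Builder builder = new SynonymMap.Builder(true);
    builder.add(new CharsRef("monitor"), new CharsRef("screen"), true);
    SynonymMap synonyms = builder.build();
    return new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName) {
        Tokenizer source = new StandardTokenizer();
        TokenStream sink = new StopFilter(source, EnglishAnalyzer.ENGLISH_STOP_WORDS_SET);
        sink = new SynonymGraphFilter(sink, synonyms, true);
        return new TokenStreamComponents(source, sink);
      }
    };
  }
}
{code}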

> SynonymGraphFilter cannot handle input stream with tokens filtered.
> ---
>
> Key: LUCENE-8985
> URL: https://issues.apache.org/jira/browse/LUCENE-8985
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Chongchen Chen
>Assignee: Jan Høydahl
>Priority: Major
> Fix For: 8.3
>
> Attachments: SGF_SF_interaction.patch.txt
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> [~janhoy] found the bug.
> In an analyzer with e.g. stopFilter where tokens are removed from the stream 
> and replaced with a “hole”, synonymgraphfilter will not preserve these holes 
> but remove them, resulting in certain phrase queries failing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13963) JavaBinCodec has concurrent modification of CharrArr resulting in corrupt intranode updates

2019-11-23 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980830#comment-16980830
 ] 

Ishan Chattopadhyaya commented on SOLR-13963:
-

Thanks Colvin! [~noble], FYI.

> JavaBinCodec has concurrent modification of CharrArr resulting in corrupt 
> intranode updates
> ---
>
> Key: SOLR-13963
> URL: https://issues.apache.org/jira/browse/SOLR-13963
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.3
>Reporter: Colvin Cowie
>Priority: Major
>
> Discussed on the mailing list "Possible data corruption in JavaBinCodec in 
> Solr 8.3 during distributed update?"
>  
> In summary, after moving to 8.3 we had a consistent (but non-deterministic) 
> set of failing tests where the data being sent in intranode requests was 
> _sometimes_ corrupted. For example if the well formed data was
>  _'fieldName':"this is a long string"_
>  The error we saw from Solr might be that
>  unknown field _+'fieldNamis a long string"+_ 
>   
>  The change that indirectly caused this issue to materialize was from 
> SOLR-13682 which meant that 
> org.apache.solr.common.SolrInputDocument.writeMap(EntryWriter) would call 
> org.apache.solr.common.SolrInputField.getValue() rather than 
> org.apache.solr.common.SolrInputField.getRawValue() as it had before.
>   
>  getRawValue for a string calls 
> org.apache.solr.common.util.ByteArrayUtf8CharSequence._getStr() which in this 
> context calls
>  org.apache.solr.common.util.JavaBinCodec.getStringProvider()
>  
>  JavaBinCodec has a CharArr, _arr_, which is modified in two different 
> locations, but only one of which is protected with a synchronized block
>   
>  getStringProvider() synchronizes on _arr_:
>  
> [https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L966]
>   
>  but  _readStr() doesn't:
>  
> [https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L930]
>   
>  The two methods are called concurrently, but weren't prior to SOLR-13682.
>   
>  Adding a synchronized block into _readStr() around the modification of _arr_ 
> fixes the problem as far as I can see.
>  
> Also, the problem does not seem to occur when using the dynamic schema mode 
> of autoCreateFields=true in the updateRequestProcessorChain.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13963) JavaBinCodec has concurrent modification of CharrArr resulting in corrupt intranode updates

2019-11-23 Thread Colvin Cowie (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980829#comment-16980829
 ] 

Colvin Cowie commented on SOLR-13963:
-

SOLR-12983 is in 7.x and is when the getStringProvider() was added, so there is 
a potential bug in 7 as well. But maybe there's nothing hitting it.

[https://github.com/apache/lucene-solr/commit/507a96e4181d4151d36332d46dd51e7ca5a09f90]

Probably worth applying the fix to both anyway

> JavaBinCodec has concurrent modification of CharrArr resulting in corrupt 
> intranode updates
> ---
>
> Key: SOLR-13963
> URL: https://issues.apache.org/jira/browse/SOLR-13963
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.3
>Reporter: Colvin Cowie
>Priority: Major
>
> Discussed on the mailing list "Possible data corruption in JavaBinCodec in 
> Solr 8.3 during distributed update?"
>  
> In summary, after moving to 8.3 we had a consistent (but non-deterministic) 
> set of failing tests where the data being sent in intranode requests was 
> _sometimes_ corrupted. For example if the well formed data was
>  _'fieldName':"this is a long string"_
>  The error we saw from Solr might be that
>  unknown field _+'fieldNamis a long string"+_ 
>   
>  The change that indirectly caused this issue to materialize was from 
> SOLR-13682 which meant that 
> org.apache.solr.common.SolrInputDocument.writeMap(EntryWriter) would call 
> org.apache.solr.common.SolrInputField.getValue() rather than 
> org.apache.solr.common.SolrInputField.getRawValue() as it had before.
>   
>  getRawValue for a string calls 
> org.apache.solr.common.util.ByteArrayUtf8CharSequence._getStr() which in this 
> context calls
>  org.apache.solr.common.util.JavaBinCodec.getStringProvider()
>  
>  JavaBinCodec has a CharArr, _arr_, which is modified in two different 
> locations, but only one of which is protected with a synchronized block
>   
>  getStringProvider() synchronizes on _arr_:
>  
> [https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L966]
>   
>  but  _readStr() doesn't:
>  
> [https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L930]
>   
>  The two methods are called concurrently, but weren't prior to SOLR-13682.
>   
>  Adding a synchronized block into _readStr() around the modification of _arr_ 
> fixes the problem as far as I can see.
>  
> Also, the problem does not seem to occur when using the dynamic schema mode 
> of autoCreateFields=true in the updateRequestProcessorChain.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-13963) JavaBinCodec has concurrent modification of CharrArr resulting in corrupt intranode updates

2019-11-23 Thread Colvin Cowie (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colvin Cowie updated SOLR-13963:

Description: 
Discussed on the mailing list "Possible data corruption in JavaBinCodec in Solr 
8.3 during distributed update?"

 

In summary, after moving to 8.3 we had a consistent (but non-deterministic) set 
of failing tests where the data being sent in intranode requests was 
_sometimes_ corrupted. For example if the well formed data was
 _'fieldName':"this is a long string"_
 The error we saw from Solr might be that
 unknown field _+'fieldNamis a long string"+_ 
  
 The change that indirectly caused this issue to materialize was from 
SOLR-13682 which meant that 
org.apache.solr.common.SolrInputDocument.writeMap(EntryWriter) would call 
org.apache.solr.common.SolrInputField.getValue() rather than 
org.apache.solr.common.SolrInputField.getRawValue() as it had before.
  
 getRawValue for a string calls 
org.apache.solr.common.util.ByteArrayUtf8CharSequence._getStr() which in this 
context calls
 org.apache.solr.common.util.JavaBinCodec.getStringProvider()

 
 JavaBinCodec has a CharArr, _arr_, which is modified in two different 
locations, but only one of which is protected with a synchronized block
  
 getStringProvider() synchronizes on _arr_:
 
[https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L966]
  
 but  _readStr() doesn't:
 
[https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L930]
  
 The two methods are called concurrently, but weren't prior to SOLR-13682.
  
 Adding a synchronized block into _readStr() around the modification of _arr_ 
fixes the problem as far as I can see.

 

Also, the problem does not seem to occur when using the dynamic schema mode of 
autoCreateFields=true in the updateRequestProcessorChain.

  was:
Discussed on the mailing list "Possible data corruption in JavaBinCodec in Solr 
8.3 during distributed update?"

 

In summary, after moving to 8.3 we had a consistent (but non-deterministic) set 
of failing tests where the data being sent in intranode requests was 
_sometimes_ corrupted. For example if the well formed data was
_'fieldName':"this is a long string"_
The error we saw from Solr might be that
unknown field  _+'fieldNamis a long string"+_ 
 
The change that indirectly caused this issue to materialize was from 
SOLR-13682 which meant that 
org.apache.solr.common.SolrInputDocument.writeMap(EntryWriter) would call 
org.apache.solr.common.SolrInputField.getValue() rather than 
org.apache.solr.common.SolrInputField.getRawValue() as it had before.
 
getRawValue for a string calls 
org.apache.solr.common.util.ByteArrayUtf8CharSequence._getStr() which in this 
context calls
org.apache.solr.common.util.JavaBinCodec.getStringProvider()

 
JavaBinCodec has a CharArr, _arr_, which is modified in two different 
locations, but only one of which is protected with a synchronized block
 
getStringProvider() synchronizes on _arr_:
[https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L966]
 
but  _readStr() doesn't:
[https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L930]
 
The two methods are called concurrently, but weren't prior to SOLR-13682.
 
Adding a synchronized block into _readStr() around the modification of _arr_ 
fixes the problem as far as I can see.


> JavaBinCodec has concurrent modification of CharrArr resulting in corrupt 
> intranode updates
> ---
>
> Key: SOLR-13963
> URL: https://issues.apache.org/jira/browse/SOLR-13963
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.3
>Reporter: Colvin Cowie
>Priority: Major
>
> Discussed on the mailing list "Possible data corruption in JavaBinCodec in 
> Solr 8.3 during distributed update?"
>  
> In summary, after moving to 8.3 we had a consistent (but non-deterministic) 
> set of failing tests where the data being sent in intranode requests was 
> _sometimes_ corrupted. For example if the well formed data was
>  _'fieldName':"this is a long string"_
>  The error we saw from Solr might be that
>  unknown field _+'fieldNamis a long string"+_ 
>   
>  The change that indirectly caused this issue to materialize was from 
> SOLR-13682 which meant that 
> org.apache.solr.common.SolrInputDocument.writeMap(EntryWriter) would call 
> org.apache.solr.common.SolrInputField.getValue() rather than 
> org.apache.solr.common.SolrInputField.getRawValue() as it had before.
>   
>  getRawValue for a string calls 
> 

[jira] [Created] (SOLR-13963) JavaBinCodec has concurrent modification of CharrArr resulting in corrupt intranode updates

2019-11-23 Thread Colvin Cowie (Jira)
Colvin Cowie created SOLR-13963:
---

 Summary: JavaBinCodec has concurrent modification of CharrArr 
resulting in corrupt intranode updates
 Key: SOLR-13963
 URL: https://issues.apache.org/jira/browse/SOLR-13963
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Affects Versions: 8.3
Reporter: Colvin Cowie


Discussed on the mailing list "Possible data corruption in JavaBinCodec in Solr 
8.3 during distributed update?"

 

In summary, after moving to 8.3 we had a consistent (but non-deterministic) set 
of failing tests where the data being sent in intranode requests was 
_sometimes_ corrupted. For example if the well formed data was
_'fieldName':"this is a long string"_
The error we saw from Solr might be that
unknown field  _+'fieldNamis a long string"+_ 
 
The change that indirectly caused this issue to materialize was from 
SOLR-13682 which meant that 
org.apache.solr.common.SolrInputDocument.writeMap(EntryWriter) would call 
org.apache.solr.common.SolrInputField.getValue() rather than 
org.apache.solr.common.SolrInputField.getRawValue() as it had before.
 
getRawValue for a string calls 
org.apache.solr.common.util.ByteArrayUtf8CharSequence._getStr() which in this 
context calls
org.apache.solr.common.util.JavaBinCodec.getStringProvider()

 
JavaBinCodec has a CharArr, _arr_, which is modified in two different 
locations, but only one of which is protected with a synchronized block
 
getStringProvider() synchronizes on _arr_:
[https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L966]
 
but  _readStr() doesn't:
[https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L930]
 
The two methods are called concurrently, but weren't prior to SOLR-13682.
 
Adding a synchronized block into _readStr() around the modification of _arr_ 
fixes the problem as far as I can see.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-9004) Approximate nearest vector search

2019-11-23 Thread Tomoko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980772#comment-16980772
 ] 

Tomoko Uchida edited comment on LUCENE-9004 at 11/23/19 4:23 PM:
-

Just for a status update:
 [my PoC 
branch|https://github.com/mocobeta/lucene-solr-mirror/tree/jira/LUCENE-9004-aknn]
 is still at a pretty early stage and works only on one segment, but it can now 
index and query arbitrary vectors via [this example 
code|https://gist.github.com/mocobeta/5c174ee9fc6408470057a9e7d2020c45]. The 
newly added KnnGraphQuery is an extension of the Query class, so it should be 
combinable with other queries, with some limitations, because the kNN query by 
nature cannot score the entire dataset. Indexing performance is terrible for now 
(it takes a few minutes for hundreds of thousands of vectors w/ 100 dims on 
commodity PCs), but searching doesn't look too bad (~30 msec for the same 
dataset) thanks to the skip-list-like graph structure.

On my current branch I wrapped {{BinaryDocValues}} to store vector values. 
However, exposing random-access capability for doc values (or their extensions) 
could be controversial, so I'd like to propose a new codec that combines 1. the 
HNSW graph and 2. the vectors (float arrays).

The new format for each vector field would have three parts (in other words, 
three files in a segment). They would look like this:
{code:java}
 Meta data and index part:
 +--------+------------------------------------------+
 | meta data                                         |
 +--------+------------------------------------------+
 | doc id | offset to first friend list for the doc  |
 +--------+------------------------------------------+
 | doc id | offset to first friend list for the doc  |
 +--------+------------------------------------------+
 |   ..                                              |
 +--------+------------------------------------------+

 Graph data part:
 +-------------------------+---------------------------+----+-------------------------+
 | friends list at layer N | friends list at layer N-1 | .. | friends list at layer 0 | <- friends lists for doc 0
 +-------------------------+---------------------------+----+-------------------------+
 | friends list at layer N | friends list at layer N-1 | .. | friends list at layer 0 | <- friends lists for doc 1
 +-------------------------+---------------------------+----+-------------------------+
 |   ..                                                                               | <- and so on
 +-------------------------------------------------------------------------------------+

 Vector data part:
 +----------------------+
 | encoded vector value | <- vector value for doc 0
 +----------------------+
 | encoded vector value | <- vector value for doc 1
 +----------------------+
 |   ..                 | <- and so on
 +----------------------+
{code}
 - "meta data" includes: number of dimensions, distance function for similarity 
calculation, and other field level meta data
 - "doc id" is: doc ids having a vector value on this field
 - "friends list at layer N" is: a delta encoded target doc id list where each 
target doc is connected to the doc at Nth layer
 - "encoded vector value" is: a fixed length byte array. the offset of the 
value can be calculated on the fly. (limitations: each document can have only 
one vector value for each vector field)

The graph data (friends lists) is relatively small so we could keep all of them 
on the Java heap for fast retrieval (though some off-heap strategy might be 
required for very large graphs).
 The vector data (vector values) is large and only the small fraction of it is 
needed when searching, so they should be kept on disk and accessed by some 
on-demand style.

Feedback is welcomed.

And I have a question about introducing new formats - is there a way to inject 
XXXFormat to the indexing chain so that we can add in this feature without any 
change on the {{lucene-core}}?
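
As an aside for readers, a minimal sketch of how a fixed-length encoding lets 
the vector for a given ordinal be located by a computed offset, as the format 
above implies. All names here are illustrative, not from any actual codec, and 
a real implementation would read from an IndexInput rather than a ByteBuffer:
{code:java}
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class VectorDataSketch {
  private final ByteBuffer data; // stand-in for a slice of the vector data file
  private final int numDims;

  VectorDataSketch(ByteBuffer data, int numDims) {
    this.data = data.duplicate().order(ByteOrder.LITTLE_ENDIAN);
    this.numDims = numDims;
  }

  /** Decodes the vector stored for the ord-th document that has a value. */
  float[] vectorValue(int ord) {
    int recordBytes = numDims * Float.BYTES; // fixed length per document
    int offset = ord * recordBytes;          // computed on the fly, not stored
    float[] vector = new float[numDims];
    for (int i = 0; i < numDims; i++) {
      vector[i] = data.getFloat(offset + i * Float.BYTES);
    }
    return vector;
  }
}
{code}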


was (Author: tomoko uchida):
Just for a status update:
 [my PoC 
branch|https://github.com/mocobeta/lucene-solr-mirror/tree/jira/LUCENE-9004-aknn]
 is still at a pretty early stage and works only on one segment, but it can now 
index and query arbitrary vectors via [this example 
code|https://gist.github.com/mocobeta/5c174ee9fc6408470057a9e7d2020c45]. The 
newly added KnnGraphQuery is an extension of the Query class, so it should be 
combinable with other queries, with some limitations, because the kNN query by 
nature cannot score the entire dataset. Indexing performance is terrible for now 
(it takes a few minutes for hundreds of thousands of vectors w/ 100 dims on 
commodity PCs), but searching doesn't look too bad (~30 msec for the same 
dataset) thanks to the skip-list-like graph structure.

On my current branch I wrapped {{BinaryDocValues}} to store vector 

[jira] [Comment Edited] (LUCENE-9004) Approximate nearest vector search

2019-11-23 Thread Tomoko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980772#comment-16980772
 ] 

Tomoko Uchida edited comment on LUCENE-9004 at 11/23/19 3:52 PM:
-

Just for a status update:
 [my PoC 
branch|https://github.com/mocobeta/lucene-solr-mirror/tree/jira/LUCENE-9004-aknn]
 is still at a pretty early stage and works only on one segment, but it can now 
index and query arbitrary vectors via [this example 
code|https://gist.github.com/mocobeta/5c174ee9fc6408470057a9e7d2020c45]. The 
newly added KnnGraphQuery is an extension of the Query class, so it should be 
combinable with other queries, with some limitations, because the kNN query by 
nature cannot score the entire dataset. Indexing performance is terrible for now 
(it takes a few minutes for hundreds of thousands of vectors w/ 100 dims on 
commodity PCs), but searching doesn't look too bad (~30 msec for the same 
dataset) thanks to the skip-list-like graph structure.

On my current branch I wrapped {{BinaryDocValues}} to store vector values. 
However, exposing random-access capability for doc values (or their extensions) 
could be controversial, so I'd like to propose a new codec that combines 1. the 
HNSW graph and 2. the vectors (float arrays).

The new format for each vector field would have three parts (in other words, 
three files in a segment). They would look like this:
{code:java}
 Meta data and index part:
 +--------+------------------------------------------+
 | meta data                                         |
 +--------+------------------------------------------+
 | doc id | offset to first friend list for the doc  |
 +--------+------------------------------------------+
 | doc id | offset to first friend list for the doc  |
 +--------+------------------------------------------+
 |   ..                                              |
 +--------+------------------------------------------+

 Graph data part:
 +-------------------------+---------------------------+----+-------------------------+
 | friends list at layer N | friends list at layer N-1 | .. | friends list at layer 0 | <- friends lists for doc 0
 +-------------------------+---------------------------+----+-------------------------+
 | friends list at layer N | friends list at layer N-1 | .. | friends list at layer 0 | <- friends lists for doc 1
 +-------------------------+---------------------------+----+-------------------------+
 |   ..                                                                               | <- and so on
 +-------------------------------------------------------------------------------------+

 Vector data part:
 +----------------------+
 | encoded vector value | <- vector value for doc 0
 +----------------------+
 | encoded vector value | <- vector value for doc 1
 +----------------------+
 |   ..                 | <- and so on
 +----------------------+
{code}
 - "meta data" includes: number of dimensions, distance function for similarity 
calculation, and other field level meta data
 - "doc id" is: doc ids having a vector value on this field
 - "friends list at layer N" is: a delta encoded target doc id list where each 
target doc is connected to the doc at Nth layer
 - "encoded vector value" is: a fixed length byte array. the offset of the 
value can be calculated on the fly. (limitations: each document can have only 
one vector value for each vector field)

The graph data (friends lists) is relatively small so we could keep all of them 
on the Java heap for fast retrieval (though some off-heap strategy might be 
required for very large graphs).
 The vector data (vector values) is large and only the small fraction of it is 
needed when searching, so they should be accessed by on-demand style via the 
index.

Feedback is welcomed.

And I have a question about introducing new formats - is there a way to inject 
XXXFormat to the indexing chain so that we can add in this feature without any 
change on the {{lucene-core}}?


was (Author: tomoko uchida):
Just for a status update:
 [my PoC 
branch|https://github.com/mocobeta/lucene-solr-mirror/tree/jira/LUCENE-9004-aknn]
 is still at a pretty early stage and works only on one segment, but it can now 
index and query arbitrary vectors via [this example 
code|https://gist.github.com/mocobeta/5c174ee9fc6408470057a9e7d2020c45]. The 
newly added KnnGraphQuery is an extension of the Query class, so it should be 
combinable with other queries, with some limitations, because the kNN query by 
nature cannot score the entire dataset. Indexing performance is terrible for now 
(it takes a few minutes for hundreds of thousands of vectors w/ 100 dims on 
commodity PCs), but searching doesn't look too bad (~30 msec for the same 
dataset) thanks to the skip-list-like graph structure.

On my current branch I wrapped {{BinaryDocValues}} to store vector values. 

[GitHub] [lucene-solr] freedev commented on a change in pull request #996: SOLR-13863: Added spayload query function to read and sort string pay…

2019-11-23 Thread GitBox
freedev commented on a change in pull request #996: SOLR-13863: Added spayload 
query function to read and sort string pay…
URL: https://github.com/apache/lucene-solr/pull/996#discussion_r349876782
 
 

 ##
 File path: 
solr/core/src/java/org/apache/solr/search/StringPayloadValueSource.java
 ##
 @@ -0,0 +1,306 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.search;
+
+import java.io.IOException;
+import java.util.Map;
+
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.index.PostingsEnum;
+import org.apache.lucene.index.Terms;
+import org.apache.lucene.index.TermsEnum;
+import org.apache.lucene.queries.function.FunctionValues;
+import org.apache.lucene.queries.function.ValueSource;
+import org.apache.lucene.queries.function.docvalues.StrDocValues;
+import org.apache.lucene.search.DocIdSetIterator;
+import org.apache.lucene.search.FieldComparator;
+import org.apache.lucene.search.FieldComparatorSource;
+import org.apache.lucene.search.IndexSearcher;
+import org.apache.lucene.search.SimpleFieldComparator;
+import org.apache.lucene.search.SortField;
+import org.apache.lucene.util.BytesRef;
+
+public class StringPayloadValueSource extends ValueSource {
 
 Review comment:
  Hi Erik, thanks for the suggestion. Surprisingly, even though I've subscribed 
for notifications, I haven't received any message about your comment. I'll try 
to refactor my solution, adding this new behaviour to the payload() function. 
Thanks again for your time. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13952) Separate out Gradle-specific code from other (mostly test) changes and commit separately

2019-11-23 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980773#comment-16980773
 ] 

David Smiley commented on SOLR-13952:
-

Then file an issue RE SuppressWarnings with 200 files and commit that.  That's 
a valid subject/theme that could have a commit message that makes sense.  I 
just don't want a commit/issue that's basically "Bunch of random stuff Mark 
did; he knows best"

> Separate out Gradle-specific code from other (mostly test) changes and commit 
> separately
> 
>
> Key: SOLR-13952
> URL: https://issues.apache.org/jira/browse/SOLR-13952
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
> Attachments: fordavid.patch
>
>
> The gradle_8 branch has many changes unrelated to gradle. It would be much 
> easier to work on the gradle parts if these were separated. So here's my plan:
> - establish a branch to use for the non-gradle parts of the gradle_8 branch 
> and commit separately. For a first cut, I'll make all the changes I'm 
> confident of, and mark the others with nocommits so we can iterate and decide 
> when to merge to master and 8x.
> - create a "gradle_9" branch that hosts only the gradle changes for us all to 
> iterate on.
> I hope to have a preliminary cut at this over the weekend. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9004) Approximate nearest vector search

2019-11-23 Thread Tomoko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980772#comment-16980772
 ] 

Tomoko Uchida commented on LUCENE-9004:
---

Just for a status update:
 [my PoC 
branch|https://github.com/mocobeta/lucene-solr-mirror/tree/jira/LUCENE-9004-aknn]
 is still at a pretty early stage and works only on one segment, but it can now 
index and query arbitrary vectors via [this example 
code|https://gist.github.com/mocobeta/5c174ee9fc6408470057a9e7d2020c45]. The 
newly added KnnGraphQuery is an extension of the Query class, so it should be 
combinable with other queries, with some limitations, because the kNN query by 
nature cannot score the entire dataset. Indexing performance is terrible for now 
(it takes a few minutes for hundreds of thousands of vectors w/ 100 dims on 
commodity PCs), but searching doesn't look too bad (~30 msec for the same 
dataset) thanks to the skip-list-like graph structure.

On my current branch I wrapped {{BinaryDocValues}} to store vector values. 
However, exposing random-access capability for doc values (or their extensions) 
could be controversial, so I'd like to propose a new codec that combines 1. the 
HNSW graph and 2. the vectors (float arrays).

The new format for each vector field would have three parts (in other words, 
three files in a segment). They would look like this:
{code:java}
 Meta data and index part:
 +--------+------------------------------------------+
 | meta data                                         |
 +--------+------------------------------------------+
 | doc id | offset to first friend list for the doc  |
 +--------+------------------------------------------+
 | doc id | offset to first friend list for the doc  |
 +--------+------------------------------------------+
 |   ..                                              |
 +--------+------------------------------------------+

 Graph data part:
 +-------------------------+---------------------------+----+-------------------------+
 | friends list at layer N | friends list at layer N-1 | .. | friends list at layer 0 |
 +-------------------------+---------------------------+----+-------------------------+
 | friends list at layer N | friends list at layer N-1 | .. | friends list at layer 0 |
 +-------------------------+---------------------------+----+-------------------------+
 |   ..                                                                               |
 +-------------------------------------------------------------------------------------+

 Vector data part:
 +----------------------+
 | encoded vector value |
 +----------------------+
 | encoded vector value |
 +----------------------+
 |   ..                 |
 +----------------------+
{code}
 - "meta data" includes: number of dimensions, distance function for similarity 
calculation, and other field level meta data
 - "doc id" is: doc ids having a vector value on this field
 - "friends list at layer N" is: a delta encoded target doc id list where each 
target doc is connected to the doc at Nth layer
 - "encoded vector value" is: a fixed length byte array. the offset of the 
value can be calculated on the fly. (limitations: each document can have only 
one vector value for each vector field)

The graph data (friends lists) is relatively small so we could keep all of them 
on the Java heap for fast retrieval (though some off-heap strategy might be 
required for very large graphs).
 The vector data (vector values) is large and only the small fraction of it is 
needed when searching, so they should be accessed by on-demand style via the 
index.

Feedback is welcomed.

And I have a question about introducing new formats - is there a way to inject 
XXXFormat to the indexing chain so that we can add in this feature without any 
change on the {{lucene-core}}?

> Approximate nearest vector search
> -
>
> Key: LUCENE-9004
> URL: https://issues.apache.org/jira/browse/LUCENE-9004
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Michael Sokolov
>Priority: Major
> Attachments: hnsw_layered_graph.png
>
>
> "Semantic" search based on machine-learned vector "embeddings" representing 
> terms, queries and documents is becoming a must-have feature for a modern 
> search engine. SOLR-12890 is exploring various approaches to this, including 
> providing vector-based scoring functions. This is a spinoff issue from that.
> The idea here is to explore approximate nearest-neighbor search. Researchers 
> have found an approach based on navigating a graph that partially encodes the 
> nearest neighbor relation at multiple scales can provide accuracy > 95% (as 
> compared to exact nearest neighbor calculations) at a reasonable cost. This 
> issue will explore implementing HNSW (hierarchical navigable 

[jira] [Commented] (SOLR-13952) Separate out Gradle-specific code from other (mostly test) changes and commit separately

2019-11-23 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980771#comment-16980771
 ] 

Erick Erickson commented on SOLR-13952:
---

Thanks for letting me know about XmlOffsetCorrector.

re: one big commit.  There are over 200 (?) separate files affected, the vast 
majority of them follow the pattern:

{code}
@SuppressWarnings("blah blah blah")
{code}

Mostly for deprecations and the like, and another group for thread leaks from 
packages that we don't control; suites shouldn't fail because of thread leaks 
in them.

There are maybe 3-4 about matters like this. And this one (XmlOffsetCorrector) 
doesn't count since I'm going to revert it on your advice.

I'm not willing to create a huge number of tickets for this. Look at the bright 
side, at least the gradle branch won't have them (soon I hope).

The goal here is to peel this out of the Gradle build precisely to introduce 
_some_ separation of changes in the gradle branch without losing Mark's efforts 
at test improvement (or, in many cases, at fixing compiler warnings).

So what do you suggest here? I'm not going to do a lot of busy work to address 
this.

> Separate out Gradle-specific code from other (mostly test) changes and commit 
> separately
> 
>
> Key: SOLR-13952
> URL: https://issues.apache.org/jira/browse/SOLR-13952
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
> Attachments: fordavid.patch
>
>
> The gradle_8 branch has many changes unrelated to gradle. It would be much 
> easier to work on the gradle parts if these were separated. So here's my plan:
> - establish a branch to use for the non-gradle parts of the gradle_8 branch 
> and commit separately. For a first cut, I'll make all the changes I'm 
> confident of, and mark the others with nocommits so we can iterate and decide 
> when to merge to master and 8x.
> - create a "gradle_9" branch that hosts only the gradle changes for us all to 
> iterate on.
> I hope to have a preliminary cut at this over the weekend. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9061) Async channel tests may leak internal java threads

2019-11-23 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980766#comment-16980766
 ] 

ASF subversion and git services commented on LUCENE-9061:
-

Commit fad75cf98dc0e3a24fad259f9cea18b3d8bf9a05 in lucene-solr's branch 
refs/heads/master from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=fad75cf ]

LUCENE-9061: Use an explicit executor service in async channel tests, otherwise 
they leak internal JVM threads.
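
For readers unfamiliar with the pattern, a minimal sketch of what "an explicit 
executor service" means for async channels, assuming plain java.nio.channels; 
this is illustrative, not the actual test code:
{code:java}
import java.net.InetSocketAddress;
import java.nio.channels.AsynchronousChannelGroup;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ExplicitGroupSketch {
  public static void main(String[] args) throws Exception {
    // Worker threads come from a pool we own and can shut down, instead of
    // the JVM's internal default channel group, whose threads would linger
    // and be reported as leaked by the test framework.
    ExecutorService pool = Executors.newFixedThreadPool(2);
    AsynchronousChannelGroup group = AsynchronousChannelGroup.withThreadPool(pool);
    try (AsynchronousServerSocketChannel server =
             AsynchronousServerSocketChannel.open(group)) {
      server.bind(new InetSocketAddress("localhost", 0));
      // ... exercise the channel asynchronously ...
    } finally {
      group.shutdown();
      group.awaitTermination(5, TimeUnit.SECONDS);
    }
  }
}
{code}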


> Async channel tests may leak internal java threads
> --
>
> Key: LUCENE-9061
> URL: https://issues.apache.org/jira/browse/LUCENE-9061
> Project: Lucene - Core
>  Issue Type: Test
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Trivial
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-9061) Async channel tests may leak internal java threads

2019-11-23 Thread Dawid Weiss (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated LUCENE-9061:

Fix Version/s: master (9.0)

> Async channel tests may leak internal java threads
> --
>
> Key: LUCENE-9061
> URL: https://issues.apache.org/jira/browse/LUCENE-9061
> Project: Lucene - Core
>  Issue Type: Test
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Trivial
> Fix For: master (9.0)
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-9061) Async channel tests may leak internal java threads

2019-11-23 Thread Dawid Weiss (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss resolved LUCENE-9061.
-
Resolution: Fixed

> Async channel tests may leak internal java threads
> --
>
> Key: LUCENE-9061
> URL: https://issues.apache.org/jira/browse/LUCENE-9061
> Project: Lucene - Core
>  Issue Type: Test
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Trivial
> Fix For: master (9.0)
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-9061) Async channel tests may leak internal java threads

2019-11-23 Thread Dawid Weiss (Jira)
Dawid Weiss created LUCENE-9061:
---

 Summary: Async channel tests may leak internal java threads
 Key: LUCENE-9061
 URL: https://issues.apache.org/jira/browse/LUCENE-9061
 Project: Lucene - Core
  Issue Type: Test
Reporter: Dawid Weiss
Assignee: Dawid Weiss






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8985) SynonymGraphFilter cannot handle input stream with tokens filtered.

2019-11-23 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980750#comment-16980750
 ] 

Michael Sokolov commented on LUCENE-8985:
-

I looked at the patch - it seems quite a significant change, maybe a good one?! 
I'm not entirely clear what the goal is though - the bug description in this 
issue is pretty light on details. Can we explain here what the current behavior 
of PhraseQuery and other positional queries is w.r.t. holes more generally, 
ignoring synonyms, and then how it currently works (is broken) in the presence 
of synonyms? I don't have enough context to review this. I'm willing to help 
out, but I'd need more to go on, and I can't really commit to the 8.4 release 
schedule, sorry [~janhoy].

> SynonymGraphFilter cannot handle input stream with tokens filtered.
> ---
>
> Key: LUCENE-8985
> URL: https://issues.apache.org/jira/browse/LUCENE-8985
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Chongchen Chen
>Assignee: Jan Høydahl
>Priority: Major
> Fix For: 8.3
>
> Attachments: SGF_SF_interaction.patch.txt
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> [~janhoy] found the bug.
> In an analyzer with e.g. stopFilter where tokens are removed from the stream 
> and replaced with a “hole”, synonymgraphfilter will not preserve these holes 
> but remove them, resulting in certain phrase queries failing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13961) Unsetting Nested Documents using Atomic Update leads to SolrException: undefined field

2019-11-23 Thread Jira


[ 
https://issues.apache.org/jira/browse/SOLR-13961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980749#comment-16980749
 ] 

Thomas Wöckinger commented on SOLR-13961:
-

[~dsmiley]: Changed it already, and resolved the discussion. All tests are 
passing.

> Unsetting Nested Documents using Atomic Update leads to SolrException: 
> undefined field
> --
>
> Key: SOLR-13961
> URL: https://issues.apache.org/jira/browse/SOLR-13961
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Tests, UpdateRequestProcessors
>Affects Versions: master (9.0), 8.3, 8.4
>Reporter: Thomas Wöckinger
>Assignee: David Smiley
>Priority: Critical
>  Labels: easyfix
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Using null or empty collection to unset nested documents (as suggested by 
> documentation) leads to SolrException: undefined field ... .
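
For context, a minimal sketch of the kind of atomic-update request body meant 
here; the field names follow the linked test, the values are illustrative:
{code}
{"id": "1", "child1": {"set": null}}
{"id": "1", "child1": {"set": []}}
{code}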



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] thomaswoeckinger commented on a change in pull request #1030: SOLR-13961: Fix Atomic Update unset nested documents

2019-11-23 Thread GitBox
thomaswoeckinger commented on a change in pull request #1030: SOLR-13961: Fix 
Atomic Update unset nested documents
URL: https://github.com/apache/lucene-solr/pull/1030#discussion_r349867533
 
 

 ##
 File path: 
solr/core/src/test/org/apache/solr/update/processor/NestedAtomicUpdateTest.java
 ##
 @@ -642,6 +642,118 @@ public void testBlockAtomicRemove() throws Exception {
 );
   }
 
+  @Test
+  public void testBlockAtomicSetToNull() throws Exception {
+    SolrInputDocument doc = sdoc("id", "1",
+        "cat_ss", new String[] {"aaa", "ccc"},
+        "child1", sdocs(sdoc("id", "2", "cat_ss", "child"), sdoc("id", "3", "cat_ss", "child")));
+    assertU(adoc(doc));
+
+    BytesRef rootDocId = new BytesRef("1");
+    SolrCore core = h.getCore();
+    SolrInputDocument block = RealTimeGetComponent.getInputDocument(core, rootDocId,
+        RealTimeGetComponent.Resolution.ROOT_WITH_CHILDREN);
+    // assert block doc has child docs
+    assertTrue(block.containsKey("child1"));
+
+    assertJQ(req("q", "id:1"), "/response/numFound==0");
+
+    // commit the changes
+    assertU(commit());
+
+    SolrInputDocument committedBlock = RealTimeGetComponent.getInputDocument(core, rootDocId,
+        RealTimeGetComponent.Resolution.ROOT_WITH_CHILDREN);
+    BytesRef childDocId = new BytesRef("2");
+    // ensure the whole block is returned when resolveBlock is true and id of a child doc is provided
+    assertEquals(committedBlock.toString(), RealTimeGetComponent
+        .getInputDocument(core, childDocId, RealTimeGetComponent.Resolution.ROOT_WITH_CHILDREN).toString());
+
+    assertJQ(req("q", "id:1"), "/response/numFound==1");
+
+    assertJQ(req("qt", "/get", "id", "1", "fl", "id, cat_ss, child1, [child]"), "=={\"doc\":{'id':\"1\"" +
+        ", cat_ss:[\"aaa\",\"ccc\"], child1:[{\"id\":\"2\",\"cat_ss\":[\"child\"]}, {\"id\":\"3\",\"cat_ss\":[\"child\"]}]}}");
+
+    assertU(commit());
+
+    assertJQ(req("qt", "/get", "id", "1", "fl", "id, cat_ss, child1, [child]"), "=={\"doc\":{'id':\"1\"" +
+        ", cat_ss:[\"aaa\",\"ccc\"], child1:[{\"id\":\"2\",\"cat_ss\":[\"child\"]}, {\"id\":\"3\",\"cat_ss\":[\"child\"]}]}}");
+
+    doc = sdoc("id", "1", "child1", Collections.singletonMap("set", null));
+    addAndGetVersion(doc, params("wt", "json"));
+
+    assertJQ(req("qt", "/get", "id", "1", "fl", "id, cat_ss, child1, [child]"), "=={\"doc\":{'id':\"1\", cat_ss:[\"aaa\",\"ccc\"]}}");
+
+    assertU(commit());
+
+    // a cut-n-paste of the first big query, but this time it will be retrieved from the index rather than the
+    // transaction log
+    // this requires ChildDocTransformer to get the whole block, since the document is retrieved using an index lookup
+    assertJQ(req("qt", "/get", "id", "1", "fl", "id, cat_ss, child1, [child]"), "=={'doc':{'id':'1', cat_ss:[\"aaa\",\"ccc\"]}}");
+
+    // ensure the whole block has been committed correctly to the index.
+    assertJQ(req("q", "id:1", "fl", "*, [child]"),
+        "/response/numFound==1",
+        "/response/docs/[0]/id=='1'",
+        "/response/docs/[0]/cat_ss/[0]==\"aaa\"",
+        "/response/docs/[0]/cat_ss/[1]==\"ccc\"");
+  }
+
+  @Test
+  public void testBlockAtomicSetToEmpty() throws Exception {
 
 Review comment:
   You are right, it was a long day, but yes, I changed it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9042) Refactor TopGroups.merge tests

2019-11-23 Thread Lucene/Solr QA (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980725#comment-16980725
 ] 

Lucene/Solr QA commented on LUCENE-9042:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
20s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 20s{color} 
| {color:red} lucene_grouping generated 3 new + 108 unchanged - 0 fixed = 111 
total (was 108) {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | 
{color:green}  0m 20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Check forbidden APIs {color} | 
{color:green}  0m 20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | 
{color:green}  0m 20s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
21s{color} | {color:green} grouping in the patch passed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}  4m 42s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | LUCENE-9042 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12986537/LUCENE-9042.patch |
| Optional Tests |  compile  javac  unit  ratsources  checkforbiddenapis  
validatesourcepatterns  |
| uname | Linux lucene2-us-west.apache.org 4.4.0-112-generic #135-Ubuntu SMP 
Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-LUCENE-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh
 |
| git revision | master / 312431b |
| ant | version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018 |
| Default Java | LTS |
| javac | 
https://builds.apache.org/job/PreCommit-LUCENE-Build/237/artifact/out/diff-compile-javac-lucene_grouping.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-LUCENE-Build/237/testReport/ |
| modules | C: lucene/grouping U: lucene/grouping |
| Console output | 
https://builds.apache.org/job/PreCommit-LUCENE-Build/237/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> Refactor TopGroups.merge tests
> --
>
> Key: LUCENE-9042
> URL: https://issues.apache.org/jira/browse/LUCENE-9042
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Diego Ceccarelli
>Priority: Minor
> Attachments: LUCENE-9042.patch
>
>
> This task proposes a refactoring of the test coverage for the 
> {{TopGroups.merge}} method implemented in LUCENE-9010. For now it will cover 
> only 3 main cases. 
> 1. Merging to empty TopGroups
> 2. Merging a TopGroups with scores and a TopGroups without scores (currently 
> broken because of LUCENE-8996 bug) 
> 3. Merging two TopGroups with scores.
> I'm planning to increase the coverage testing also invalid inputs but I would 
> do that in a separate PR to keep the code readable. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org