[jira] [Comment Edited] (LUCENE-7588) A parallel DrillSideways implementation
[ https://issues.apache.org/jira/browse/LUCENE-7588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15807344#comment-15807344 ]

Emmanuel Keller edited comment on LUCENE-7588 at 1/7/17 11:39 AM:
--

This patch changes the verifyEquals behaviour. It checks that the documents are present and that they are equal, regardless of their order.

was (Author: ekeller):
This patch change the verifyEquals behaviour. It checks that the documents are present and that they are equals, regardless the order.

> A parallel DrillSideways implementation
> ---
>
> Key: LUCENE-7588
> URL: https://issues.apache.org/jira/browse/LUCENE-7588
> Project: Lucene - Core
> Issue Type: Improvement
> Affects Versions: master (7.0), 6.3.1
> Reporter: Emmanuel Keller
> Priority: Minor
> Labels: facet, faceting
> Fix For: master (7.0), 6.4
>
> Attachments: LUCENE-7588.patch, lucene-7588-test.patch
>
>
> Currently, the DrillSideways implementation is based on the single-threaded
> IndexSearcher.search(Query query, Collector results).
> On a large document set, single-threaded collection can be really slow.
> The ParallelDrillSideways implementation could:
> 1. Use the CollectorManager-based method IndexSearcher.search(Query query,
> CollectorManager collectorManager) to get the benefits of multithreading on
> index segments,
> 2. Compute each DrillSideways subquery on a single thread.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
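The first point of the issue description (collecting over index segments in parallel and then merging) follows a map/reduce shape that can be illustrated without Lucene. The sketch below is only an illustration under stated assumptions: it uses plain JDK classes, models each segment as an `int[]`, and the names `ParallelCollectSketch` and `countMatches` are hypothetical. It is not the patch and does not call any Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.IntPredicate;

// Hypothetical illustration only: a "segment" is modelled as an int[] of document values.
class ParallelCollectSketch {

  // Map step: one task per segment counts its matching documents on the executor.
  // Reduce step: the per-segment counts are summed on the calling thread.
  static int countMatches(List<int[]> segments, IntPredicate matches, ExecutorService executor) {
    List<CompletableFuture<Integer>> partials = new ArrayList<>();
    for (int[] segment : segments) {
      partials.add(CompletableFuture.supplyAsync(() -> {
        int count = 0;
        for (int value : segment) {
          if (matches.test(value)) {
            count++;
          }
        }
        return count;
      }, executor));
    }
    int total = 0;
    for (CompletableFuture<Integer> partial : partials) {
      total += partial.join(); // waits for that segment's task to finish
    }
    return total;
  }

  public static void main(String[] args) {
    ExecutorService executor = Executors.newFixedThreadPool(4);
    try {
      List<int[]> segments = Arrays.asList(new int[] {1, 2, 3}, new int[] {4, 5}, new int[] {6});
      // Counts the even values across all three segments.
      System.out.println(countMatches(segments, v -> v % 2 == 0, executor));
    } finally {
      executor.shutdown();
    }
  }
}
```

A count is order-insensitive, so the reduce step here is trivially deterministic. Merging ranked hits with tied scores is not, which is the point raised about the test in the later comments.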
[jira] [Comment Edited] (LUCENE-7588) A parallel DrillSideways implementation
[ https://issues.apache.org/jira/browse/LUCENE-7588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15807272#comment-15807272 ]

Emmanuel Keller edited comment on LUCENE-7588 at 1/7/17 10:58 AM:
--

Both the actual array and the expected array contain 24 documents, but they are not sorted identically.

The test expects the retrieved ScoreDoc array to be ordered. However, the scores are identical for all documents. As we are using a multithreaded map/reduce design, we can't expect the order to be preserved. [~mikemccand] am I right?

IMHO, the equality check must be modified to only check that each document is present and equal.

Here is the current check for the ScoreDoc array:
{code:java}
for (int i = 0; i < expected.hits.size(); i++) {
  if (VERBOSE) {
    System.out.println("hit " + i + " expected=" + expected.hits.get(i).id);
  }
  assertEquals(expected.hits.get(i).id, s.doc(actual.hits.scoreDocs[i].doc).get("id"));
  // Score should be IDENTICAL:
  assertEquals(scores.get(expected.hits.get(i).id), actual.hits.scoreDocs[i].score, 0.0f);
}
{code}

was (Author: ekeller):
Both actual array and expected array contains 24 documents. But not equally sorted. The test expects that the retrieved ScoreDoc array is ordered. In this test, but the score are identical for all documents. As we are using a multithreaded map/reduce design we can't expect that the order will be preserved. [~mikemccand] am I right ? IMHO, the equality check must be modified to only check that each document are present and equals.
Here is the current check test for the ScoreDoc array:
{code:java}
for (int i = 0; i < expected.hits.size(); i++) {
  if (VERBOSE) {
    System.out.println("hit " + i + " expected=" + expected.hits.get(i).id);
  }
  assertEquals(expected.hits.get(i).id, s.doc(actual.hits.scoreDocs[i].doc).get("id"));
  // Score should be IDENTICAL:
  assertEquals(scores.get(expected.hits.get(i).id), actual.hits.scoreDocs[i].score, 0.0f);
}
{code}
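The order-independent check proposed in the comment above could look roughly like the following. This is a minimal sketch using only the JDK, not the actual patch or Lucene's test code: `UnorderedHitsCheck`, `Hit`, and `sameHits` are hypothetical names, and the sketch assumes that ids are unique within each result list.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the verifyEquals method from the patch.
class UnorderedHitsCheck {

  // Stand-in for a search hit: the stored "id" field and the score.
  static final class Hit {
    final String id;
    final float score;

    Hit(String id, float score) {
      this.id = id;
      this.score = score;
    }
  }

  // Returns true when both lists hold the same ids with identical scores,
  // whatever their order. Assumes ids are unique within each list.
  static boolean sameHits(List<Hit> expected, List<Hit> actual) {
    if (expected.size() != actual.size()) {
      return false;
    }
    Map<String, Float> actualScores = new HashMap<>();
    for (Hit hit : actual) {
      if (actualScores.put(hit.id, hit.score) != null) {
        return false; // duplicate id breaks the uniqueness assumption
      }
    }
    for (Hit hit : expected) {
      Float score = actualScores.get(hit.id);
      if (score == null || Float.compare(score, hit.score) != 0) {
        return false; // missing document or different score
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<Hit> expected = Arrays.asList(new Hit("a", 1.0f), new Hit("b", 1.0f));
    List<Hit> reordered = Arrays.asList(new Hit("b", 1.0f), new Hit("a", 1.0f));
    System.out.println(sameHits(expected, reordered));
  }
}
```

Keying on the id rather than on the array index keeps the "score should be identical" requirement of the original loop while dropping the dependence on position.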
[jira] [Comment Edited] (LUCENE-7588) A parallel DrillSideways implementation
[ https://issues.apache.org/jira/browse/LUCENE-7588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15807272#comment-15807272 ]

Emmanuel Keller edited comment on LUCENE-7588 at 1/7/17 10:50 AM:
--

Both the actual array and the expected array contain 24 documents, but they are not sorted identically.

The test expects the retrieved ScoreDoc array to be ordered. In this test, however, the scores are identical for all documents. As we are using a multithreaded map/reduce design, we can't expect the order to be preserved. [~mikemccand] am I right?

IMHO, the equality check must be modified to only check that each document is present and equal.

Here is the current check for the ScoreDoc array:
{code:java}
for (int i = 0; i < expected.hits.size(); i++) {
  if (VERBOSE) {
    System.out.println("hit " + i + " expected=" + expected.hits.get(i).id);
  }
  assertEquals(expected.hits.get(i).id, s.doc(actual.hits.scoreDocs[i].doc).get("id"));
  // Score should be IDENTICAL:
  assertEquals(scores.get(expected.hits.get(i).id), actual.hits.scoreDocs[i].score, 0.0f);
}
{code}

was (Author: ekeller):
Bot actual array and expected array contains 24 documents. But not equally sorted. The test expects that the retrieved ScoreDoc array is ordered. In this test, but the score are identical for all documents. As we are using a multithreaded map/reduce design we can't expect that the order will be preserved. [~mikemccand] am I right ? IMHO, the equality check must be modified to only check that each document are present and equals.
Here is the current check test for the ScoreDoc array:
{code:java}
for (int i = 0; i < expected.hits.size(); i++) {
  if (VERBOSE) {
    System.out.println("hit " + i + " expected=" + expected.hits.get(i).id);
  }
  assertEquals(expected.hits.get(i).id, s.doc(actual.hits.scoreDocs[i].doc).get("id"));
  // Score should be IDENTICAL:
  assertEquals(scores.get(expected.hits.get(i).id), actual.hits.scoreDocs[i].score, 0.0f);
}
{code}
[jira] [Comment Edited] (LUCENE-7588) A parallel DrillSideways implementation
[ https://issues.apache.org/jira/browse/LUCENE-7588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15807272#comment-15807272 ]

Emmanuel Keller edited comment on LUCENE-7588 at 1/7/17 10:50 AM:
--

Both the actual array and the expected array contain 24 documents, but they are not sorted identically.

The test expects the retrieved ScoreDoc array to be ordered. In this test, however, the scores are identical for all documents. As we are using a multithreaded map/reduce design, we can't expect the order to be preserved. [~mikemccand] am I right?

IMHO, the equality check must be modified to only check that each document is present and equal.

Here is the current check for the ScoreDoc array:
{code:java}
for (int i = 0; i < expected.hits.size(); i++) {
  if (VERBOSE) {
    System.out.println("hit " + i + " expected=" + expected.hits.get(i).id);
  }
  assertEquals(expected.hits.get(i).id, s.doc(actual.hits.scoreDocs[i].doc).get("id"));
  // Score should be IDENTICAL:
  assertEquals(scores.get(expected.hits.get(i).id), actual.hits.scoreDocs[i].score, 0.0f);
}
{code}

was (Author: ekeller):
The test expects that the retrieved ScoreDoc array is ordered. In this test, the score are identical for all documents. As we are using a multithreaded map/reduce design we can't expect that the order will be preserved. [~mikemccand] am I right ? IMHO, the equality check must be modified to only check that the document are present with the same score.
Here is the current check test for the ScoreDoc array:
{code:java}
for (int i = 0; i < expected.hits.size(); i++) {
  if (VERBOSE) {
    System.out.println("hit " + i + " expected=" + expected.hits.get(i).id);
  }
  assertEquals(expected.hits.get(i).id, s.doc(actual.hits.scoreDocs[i].doc).get("id"));
  // Score should be IDENTICAL:
  assertEquals(scores.get(expected.hits.get(i).id), actual.hits.scoreDocs[i].score, 0.0f);
}
{code}
[jira] [Comment Edited] (LUCENE-7588) A parallel DrillSideways implementation
[ https://issues.apache.org/jira/browse/LUCENE-7588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15807272#comment-15807272 ]

Emmanuel Keller edited comment on LUCENE-7588 at 1/7/17 10:47 AM:
--

The test expects the retrieved ScoreDoc array to be ordered. In this test, the scores are identical for all documents. As we are using a multithreaded map/reduce design, we can't expect the order to be preserved. [~mikemccand] am I right?

IMHO, the equality check must be modified to only check that the documents are present with the same score.

Here is the current check for the ScoreDoc array:
{code:java}
for (int i = 0; i < expected.hits.size(); i++) {
  if (VERBOSE) {
    System.out.println("hit " + i + " expected=" + expected.hits.get(i).id);
  }
  assertEquals(expected.hits.get(i).id, s.doc(actual.hits.scoreDocs[i].doc).get("id"));
  // Score should be IDENTICAL:
  assertEquals(scores.get(expected.hits.get(i).id), actual.hits.scoreDocs[i].score, 0.0f);
}
{code}

was (Author: ekeller):
The test expects that the retrieved ScoreDoc array is ordered. In this test, the score are identical for all documents. As we are using a multithreaded map/reduce design we can't expect that the order will be preserved. [~mikemccand] am I right ? IMHO, the equality check must be modified to only check that the document are present with the same score.
{code:java}
for (int i = 0; i < expected.hits.size(); i++) {
  if (VERBOSE) {
    System.out.println("hit " + i + " expected=" + expected.hits.get(i).id);
  }
  assertEquals(expected.hits.get(i).id, s.doc(actual.hits.scoreDocs[i].doc).get("id"));
  // Score should be IDENTICAL:
  assertEquals(scores.get(expected.hits.get(i).id), actual.hits.scoreDocs[i].score, 0.0f);
}
{code}
[jira] [Comment Edited] (LUCENE-7588) A parallel DrillSideways implementation
[ https://issues.apache.org/jira/browse/LUCENE-7588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765461#comment-15765461 ]

Emmanuel Keller edited comment on LUCENE-7588 at 12/20/16 10:51 PM:

New patch:
1. In the DrillSideways.search method, if executor is non-null, we invoke the concurrent version.
2. The unit test now effectively tests the new concurrent methods.

I am working on the benchmark now. [~mikemccand] I will submit a new bench to your repo luceneutils.

was (Author: ekeller):
New patch:
1. In the DrillSideways.search method, if executor is non-null, we invoke the concurrent version.
2. The unit test tests effectively the new concurrent methods.
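Item 1 of the patch notes above describes a dispatch on whether an executor was supplied. A minimal sketch of that pattern, using only the JDK, might look like this. It is an assumption-laden illustration rather than the patch itself: `ExecutorDispatchSketch` and its methods are hypothetical names, and each sub-query is modelled as a `Supplier<Integer>` instead of a real Lucene query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical illustration only; not the DrillSideways class.
class ExecutorDispatchSketch {

  // A null executor means "stay on the calling thread".
  private final ExecutorService executor;

  ExecutorDispatchSketch(ExecutorService executor) {
    this.executor = executor;
  }

  // Runs every sub-query and returns the results in submission order,
  // choosing the concurrent path only when an executor was supplied.
  List<Integer> search(List<Supplier<Integer>> subQueries) {
    return executor == null ? searchSequential(subQueries) : searchConcurrent(subQueries);
  }

  private List<Integer> searchSequential(List<Supplier<Integer>> subQueries) {
    List<Integer> results = new ArrayList<>();
    for (Supplier<Integer> subQuery : subQueries) {
      results.add(subQuery.get());
    }
    return results;
  }

  private List<Integer> searchConcurrent(List<Supplier<Integer>> subQueries) {
    // Each sub-query runs as a single task on the executor.
    List<CompletableFuture<Integer>> futures = new ArrayList<>();
    for (Supplier<Integer> subQuery : subQueries) {
      futures.add(CompletableFuture.supplyAsync(subQuery, executor));
    }
    // Joining in submission order keeps the result order deterministic.
    List<Integer> results = new ArrayList<>();
    for (CompletableFuture<Integer> future : futures) {
      results.add(future.join());
    }
    return results;
  }

  public static void main(String[] args) {
    List<Supplier<Integer>> subQueries = Arrays.<Supplier<Integer>>asList(() -> 3, () -> 5, () -> 8);
    ExecutorService executor = Executors.newFixedThreadPool(2);
    try {
      System.out.println(new ExecutorDispatchSketch(null).search(subQueries));
      System.out.println(new ExecutorDispatchSketch(executor).search(subQueries));
    } finally {
      executor.shutdown();
    }
  }
}
```

Keeping both paths behind one `search` method means callers that never pass an executor see no change in behaviour.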
[jira] [Comment Edited] (LUCENE-7588) A parallel DrillSideways implementation
[ https://issues.apache.org/jira/browse/LUCENE-7588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15751916#comment-15751916 ]

Emmanuel Keller edited comment on LUCENE-7588 at 12/15/16 5:18 PM:
---

Thanks for your feedback, guys; it's pretty clear.

FYI, the patch includes unit tests derived from the existing facet tests.

was (Author: ekeller):
Thanks for your feedback guys, it's pretty clear. FYI, the patch includes unit tests derived for the already existing test on facets.