[jira] [Comment Edited] (LUCENE-7588) A parallel DrillSideways implementation

2017-01-07 Thread Emmanuel Keller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15807344#comment-15807344
 ] 

Emmanuel Keller edited comment on LUCENE-7588 at 1/7/17 11:39 AM:
--

This patch changes the verifyEquals behaviour. It checks that the documents are 
present and that they are equal, regardless of the order.
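
A minimal sketch of such an order-insensitive check, written in plain Java rather than against the actual test class. The Hit class, its fields, and the sameHitsIgnoringOrder helper are hypothetical names introduced for illustration, and the sketch assumes that document ids are unique:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class UnorderedHitCheck {
  /** Hypothetical stand-in for a hit: the stored "id" value plus its score. */
  static final class Hit {
    final String id;
    final float score;

    Hit(String id, float score) {
      this.id = id;
      this.score = score;
    }
  }

  /** Returns true when both lists hold the same ids with identical scores, in any order. */
  static boolean sameHitsIgnoringOrder(List<Hit> expected, List<Hit> actual) {
    if (expected.size() != actual.size()) {
      return false;
    }
    // Index the expected hits by id (assumption: ids are unique).
    Map<String, Float> remaining = new HashMap<>();
    for (Hit hit : expected) {
      remaining.put(hit.id, hit.score);
    }
    // Every actual hit must consume exactly one expected entry with the same score.
    for (Hit hit : actual) {
      Float expectedScore = remaining.remove(hit.id);
      if (expectedScore == null || Float.compare(expectedScore, hit.score) != 0) {
        return false;
      }
    }
    return remaining.isEmpty();
  }

  public static void main(String[] args) {
    List<Hit> expected = List.of(new Hit("a", 1.0f), new Hit("b", 1.0f));
    List<Hit> shuffled = List.of(new Hit("b", 1.0f), new Hit("a", 1.0f));
    System.out.println(sameHitsIgnoringOrder(expected, shuffled)); // prints true
  }
}
```

Removing each matched id from the map means a duplicated hit in the actual list cannot mask a missing one.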


was (Author: ekeller):
This patch change the verifyEquals behaviour. It checks that the documents are 
present and that they are equals, regardless the order.

> A parallel DrillSideways implementation
> ---
>
> Key: LUCENE-7588
> URL: https://issues.apache.org/jira/browse/LUCENE-7588
> Project: Lucene - Core
> Issue Type: Improvement
> Affects Versions: master (7.0), 6.3.1
> Reporter: Emmanuel Keller
> Priority: Minor
> Labels: facet, faceting
> Fix For: master (7.0), 6.4
>
> Attachments: LUCENE-7588.patch, lucene-7588-test.patch
>
>
> Currently, the DrillSideways implementation is based on the single-threaded 
> IndexSearcher.search(Query query, Collector results).
> On a large document set, single-threaded collection can be really slow.
> The ParallelDrillSideways implementation could:
> 1. Use the CollectorManager-based method IndexSearcher.search(Query query, 
> CollectorManager collectorManager) to get the benefits of multithreading on 
> index segments,
> 2. Compute each DrillSideways subquery on a single thread.
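
The map/reduce shape behind point 1 can be illustrated with a small stand-alone sketch. This is not Lucene code: the integer arrays stand in for index segments, and SegmentMapReduce and countMatches are hypothetical names. Each segment is counted on its own task and the partial counts are then summed, which is roughly the role a CollectorManager plays when it creates one collector per slice and reduces them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SegmentMapReduce {
  /** Counts occurrences of target in each "segment" on its own task, then sums the partial counts. */
  static int countMatches(List<int[]> segments, int target, ExecutorService executor) {
    // Map step: one task per segment.
    List<Future<Integer>> partials = new ArrayList<>();
    for (int[] segment : segments) {
      partials.add(executor.submit(() -> {
        int count = 0;
        for (int value : segment) {
          if (value == target) {
            count++;
          }
        }
        return count;
      }));
    }
    // Reduce step: combine the per-segment results.
    int total = 0;
    try {
      for (Future<Integer> partial : partials) {
        total += partial.get();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for a segment", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("segment task failed", e);
    }
    return total;
  }

  public static void main(String[] args) {
    ExecutorService executor = Executors.newFixedThreadPool(2);
    try {
      List<int[]> segments = List.of(new int[] {1, 2, 1}, new int[] {1, 3}, new int[] {2, 2});
      System.out.println(countMatches(segments, 1, executor)); // prints 3
    } finally {
      executor.shutdown();
    }
  }
}
```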



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-7588) A parallel DrillSideways implementation

2017-01-07 Thread Emmanuel Keller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15807272#comment-15807272
 ] 

Emmanuel Keller edited comment on LUCENE-7588 at 1/7/17 10:58 AM:
--

Both the actual array and the expected array contain 24 documents, but they are 
not sorted identically.

The test expects the retrieved ScoreDoc array to be ordered. However, the 
scores are identical for all documents.

As we are using a multithreaded map/reduce design, we can't expect the order 
to be preserved.
[~mikemccand] am I right?

IMHO, the equality check must be modified to only check that each document is 
present and equal.

Here is the current check for the ScoreDoc array:

{code:java}
for (int i = 0; i < expected.hits.size(); i++) {
  if (VERBOSE) {
    System.out.println("hit " + i + " expected=" + expected.hits.get(i).id);
  }
  assertEquals(expected.hits.get(i).id,
               s.doc(actual.hits.scoreDocs[i].doc).get("id"));
  // Score should be IDENTICAL:
  assertEquals(scores.get(expected.hits.get(i).id),
               actual.hits.scoreDocs[i].score, 0.0f);
}
{code}
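
To illustrate why a position-by-position check like this can fail when all scores tie, here is a small stand-alone sketch (not Lucene code; TieBreakDemo, Hit, and both merge helpers are hypothetical names). It merges per-task hit lists in the order they arrive and sorts them by score only, then repeats the merge with the document number as a secondary key:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class TieBreakDemo {
  /** Hypothetical stand-in for a scored hit. */
  static final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
      this.doc = doc;
      this.score = score;
    }
  }

  /** Sorts by descending score only; the stable sort keeps arrival order among ties. */
  static List<Integer> mergeByScoreOnly(List<List<Hit>> partials) {
    List<Hit> merged = concat(partials);
    merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
    return docs(merged);
  }

  /** Sorts by descending score, then ascending doc, so ties no longer depend on arrival order. */
  static List<Integer> mergeWithDocTieBreak(List<List<Hit>> partials) {
    List<Hit> merged = concat(partials);
    merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed()
        .thenComparingInt(h -> h.doc));
    return docs(merged);
  }

  private static List<Hit> concat(List<List<Hit>> partials) {
    List<Hit> merged = new ArrayList<>();
    for (List<Hit> partial : partials) {
      merged.addAll(partial);
    }
    return merged;
  }

  private static List<Integer> docs(List<Hit> hits) {
    List<Integer> result = new ArrayList<>();
    for (Hit hit : hits) {
      result.add(hit.doc);
    }
    return result;
  }

  public static void main(String[] args) {
    List<Hit> first = List.of(new Hit(0, 1.0f), new Hit(1, 1.0f));
    List<Hit> second = List.of(new Hit(2, 1.0f));
    System.out.println(mergeByScoreOnly(List.of(first, second)));     // prints [0, 1, 2]
    System.out.println(mergeByScoreOnly(List.of(second, first)));     // prints [2, 0, 1]
    System.out.println(mergeWithDocTieBreak(List.of(second, first))); // prints [0, 1, 2]
  }
}
```

Either approach would address the failure: compare the hits as an unordered set, or impose a deterministic tie-break before comparing positions.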


> A parallel DrillSideways implementation
> ---
>
> Key: LUCENE-7588
> URL: https://issues.apache.org/jira/browse/LUCENE-7588
> Project: Lucene - Core
> Issue Type: Improvement
> Affects Versions: master (7.0), 6.3.1
> Reporter: Emmanuel Keller
> Priority: Minor
> Labels: facet, faceting
> Fix For: master (7.0), 6.4
>
> Attachments: LUCENE-7588.patch



[jira] [Comment Edited] (LUCENE-7588) A parallel DrillSideways implementation

2017-01-07 Thread Emmanuel Keller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15807272#comment-15807272
 ] 

Emmanuel Keller edited comment on LUCENE-7588 at 1/7/17 10:50 AM:
--

Both the actual array and the expected array contain 24 documents, but they are 
not sorted identically.

The test expects the retrieved ScoreDoc array to be ordered. In this test, 
however, the scores are identical for all documents.

As we are using a multithreaded map/reduce design, we can't expect the order 
to be preserved.
[~mikemccand] am I right?

IMHO, the equality check must be modified to only check that each document is 
present and equal.

Here is the current check for the ScoreDoc array:

{code:java}
for (int i = 0; i < expected.hits.size(); i++) {
  if (VERBOSE) {
    System.out.println("hit " + i + " expected=" + expected.hits.get(i).id);
  }
  assertEquals(expected.hits.get(i).id,
               s.doc(actual.hits.scoreDocs[i].doc).get("id"));
  // Score should be IDENTICAL:
  assertEquals(scores.get(expected.hits.get(i).id),
               actual.hits.scoreDocs[i].score, 0.0f);
}
{code}


> A parallel DrillSideways implementation
> ---
>
> Key: LUCENE-7588
> URL: https://issues.apache.org/jira/browse/LUCENE-7588
> Project: Lucene - Core
> Issue Type: Improvement
> Affects Versions: master (7.0), 6.3.1
> Reporter: Emmanuel Keller
> Priority: Minor
> Labels: facet, faceting
> Fix For: master (7.0), 6.4
>
> Attachments: LUCENE-7588.patch



[jira] [Comment Edited] (LUCENE-7588) A parallel DrillSideways implementation

2017-01-07 Thread Emmanuel Keller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15807272#comment-15807272
 ] 

Emmanuel Keller edited comment on LUCENE-7588 at 1/7/17 10:50 AM:
--

Both the actual array and the expected array contain 24 documents, but they are 
not sorted identically.

The test expects the retrieved ScoreDoc array to be ordered. In this test, 
however, the scores are identical for all documents.

As we are using a multithreaded map/reduce design, we can't expect the order 
to be preserved.
[~mikemccand] am I right?

IMHO, the equality check must be modified to only check that each document is 
present and equal.

Here is the current check for the ScoreDoc array:

{code:java}
for (int i = 0; i < expected.hits.size(); i++) {
  if (VERBOSE) {
    System.out.println("hit " + i + " expected=" + expected.hits.get(i).id);
  }
  assertEquals(expected.hits.get(i).id,
               s.doc(actual.hits.scoreDocs[i].doc).get("id"));
  // Score should be IDENTICAL:
  assertEquals(scores.get(expected.hits.get(i).id),
               actual.hits.scoreDocs[i].score, 0.0f);
}
{code}


> A parallel DrillSideways implementation
> ---
>
> Key: LUCENE-7588
> URL: https://issues.apache.org/jira/browse/LUCENE-7588
> Project: Lucene - Core
> Issue Type: Improvement
> Affects Versions: master (7.0), 6.3.1
> Reporter: Emmanuel Keller
> Priority: Minor
> Labels: facet, faceting
> Fix For: master (7.0), 6.4
>
> Attachments: LUCENE-7588.patch



[jira] [Comment Edited] (LUCENE-7588) A parallel DrillSideways implementation

2017-01-07 Thread Emmanuel Keller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15807272#comment-15807272
 ] 

Emmanuel Keller edited comment on LUCENE-7588 at 1/7/17 10:47 AM:
--

The test expects the retrieved ScoreDoc array to be ordered. In this test, 
the scores are identical for all documents.

As we are using a multithreaded map/reduce design, we can't expect the order 
to be preserved.
[~mikemccand] am I right?

IMHO, the equality check must be modified to only check that the documents are 
present with the same score.

Here is the current check for the ScoreDoc array:

{code:java}
for (int i = 0; i < expected.hits.size(); i++) {
  if (VERBOSE) {
    System.out.println("hit " + i + " expected=" + expected.hits.get(i).id);
  }
  assertEquals(expected.hits.get(i).id,
               s.doc(actual.hits.scoreDocs[i].doc).get("id"));
  // Score should be IDENTICAL:
  assertEquals(scores.get(expected.hits.get(i).id),
               actual.hits.scoreDocs[i].score, 0.0f);
}
{code}


was (Author: ekeller):
The test expects that the retrieved ScoreDoc array is ordered. In this test, 
the score are identical for all documents.

As we are using a multithreaded map/reduce design we can't expect that the 
order will be preserved.
[~mikemccand] am I right ?

IMHO, the equality check must be modified to only check that the document are 
present with the same score.  

{code:java}
for (int i = 0; i < expected.hits.size(); i++) {
  if (VERBOSE) {
    System.out.println("hit " + i + " expected=" + expected.hits.get(i).id);
  }
  assertEquals(expected.hits.get(i).id,
               s.doc(actual.hits.scoreDocs[i].doc).get("id"));
  // Score should be IDENTICAL:
  assertEquals(scores.get(expected.hits.get(i).id),
               actual.hits.scoreDocs[i].score, 0.0f);
}
{code}

> A parallel DrillSideways implementation
> ---
>
> Key: LUCENE-7588
> URL: https://issues.apache.org/jira/browse/LUCENE-7588
> Project: Lucene - Core
> Issue Type: Improvement
> Affects Versions: master (7.0), 6.3.1
> Reporter: Emmanuel Keller
> Priority: Minor
> Labels: facet, faceting
> Fix For: master (7.0), 6.4
>
> Attachments: LUCENE-7588.patch



[jira] [Comment Edited] (LUCENE-7588) A parallel DrillSideways implementation

2016-12-20 Thread Emmanuel Keller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15765461#comment-15765461
 ] 

Emmanuel Keller edited comment on LUCENE-7588 at 12/20/16 10:51 PM:


New patch:
1. In the DrillSideways.search method, if the executor is non-null, we invoke 
the concurrent version.
2. The unit test now actually tests the new concurrent methods.

I am working on the benchmark now. [~mikemccand] I will submit a new benchmark 
to your luceneutil repo.
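
The dispatch described in point 1 can be sketched in isolation as follows. This is plain Java and not the patch itself: ExecutorDispatch and runAll are hypothetical names, and the tasks stand in for the drill-sideways subqueries. When no executor is supplied, the work runs on the calling thread; otherwise it is handed to the executor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ExecutorDispatch {
  private final ExecutorService executor; // may be null, meaning "run sequentially"

  ExecutorDispatch(ExecutorService executor) {
    this.executor = executor;
  }

  /** Runs every task, concurrently when an executor was supplied, otherwise on the calling thread. */
  List<Integer> runAll(List<Callable<Integer>> tasks) {
    List<Integer> results = new ArrayList<>();
    try {
      if (executor == null) {
        for (Callable<Integer> task : tasks) {
          results.add(task.call());
        }
      } else {
        // invokeAll returns the futures in the same order as the submitted tasks.
        for (Future<Integer> future : executor.invokeAll(tasks)) {
          results.add(future.get());
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while running tasks", e);
    } catch (Exception e) {
      throw new IllegalStateException("task failed", e);
    }
    return results;
  }

  public static void main(String[] args) {
    List<Callable<Integer>> tasks = Arrays.<Callable<Integer>>asList(() -> 1, () -> 2, () -> 3);
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      System.out.println(new ExecutorDispatch(null).runAll(tasks)); // prints [1, 2, 3]
      System.out.println(new ExecutorDispatch(pool).runAll(tasks)); // prints [1, 2, 3]
    } finally {
      pool.shutdown();
    }
  }
}
```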


was (Author: ekeller):
New patch:
1. In the DrillSideways.search method, if executor is non-null, we invoke the 
concurrent version.
2. The unit test tests effectively the new concurrent methods.


> A parallel DrillSideways implementation
> ---
>
> Key: LUCENE-7588
> URL: https://issues.apache.org/jira/browse/LUCENE-7588
> Project: Lucene - Core
> Issue Type: Improvement
> Affects Versions: master (7.0), 6.3.1
> Reporter: Emmanuel Keller
> Priority: Minor
> Labels: facet, faceting
> Fix For: master (7.0), 6.3.1
>
> Attachments: LUCENE-7588.patch



[jira] [Comment Edited] (LUCENE-7588) A parallel DrillSideways implementation

2016-12-15 Thread Emmanuel Keller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15751916#comment-15751916
 ] 

Emmanuel Keller edited comment on LUCENE-7588 at 12/15/16 5:18 PM:
---

Thanks for your feedback guys, it's pretty clear. FYI, the patch includes unit 
tests derived from the already existing test on facets.


was (Author: ekeller):
Thanks for your feedback guys, it's pretty clear. FYI, the patch includes unit 
tests derived for the already existing test on facets.

> A parallel DrillSideways implementation
> ---
>
> Key: LUCENE-7588
> URL: https://issues.apache.org/jira/browse/LUCENE-7588
> Project: Lucene - Core
> Issue Type: Improvement
> Affects Versions: master (7.0), 6.3.1
> Reporter: Emmanuel Keller
> Priority: Minor
> Labels: facet, faceting
> Fix For: master (7.0), 6.3.1
>
> Attachments: LUCENE-7588.patch