[jira] [Updated] (SOLR-6581) Efficient DocValues support and numeric collapse field implementations for Collapse and Expand

2015-01-21 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-6581:
-
Description: 
The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
are optimized to work with a top level FieldCache. Top level FieldCaches have a 
very fast docID to top-level ordinal lookup. Fast access to the top-level 
ordinals allows for very high performance field collapsing on high cardinality 
fields. 

LUCENE-5666 unified the DocValues and FieldCache APIs so that the top level 
FieldCache is no longer in regular use. Instead all top level caches are 
accessed through MultiDocValues. 

This ticket does the following:

1) Optimizes Collapse and Expand to use MultiDocValues and makes this the 
default approach when collapsing on String fields

2) Provides an option to use a top level FieldCache if the performance of 
MultiDocValues is a blocker. The mechanism for switching to the FieldCache is a 
new hint parameter. If the hint parameter is set to top_fc then the 
top-level FieldCache would be used for both Collapse and Expand.

Example syntax:
{code}
fq={!collapse field=x hint=TOP_FC}
{code}

3)  Adds numeric collapse field implementations.

4) Resolves issue SOLR-6066

  was:
The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
are optimized to work with a top level FieldCache. Top level FieldCaches have a 
very fast docID to top-level ordinal lookup. Fast access to the top-level 
ordinals allows for very high performance field collapsing on high cardinality 
fields. 

LUCENE-5666 unified the DocValues and FieldCache APIs so that the top level 
FieldCache is no longer in regular use. Instead all top level caches are 
accessed through MultiDocValues. 

This ticket does the following:

1) Optimizes Collapse and Expand to use MultiDocValues and makes this the 
default approach when collapsing on String fields

2) Provides an option to use a top level FieldCache if the performance of 
MultiDocValues is a blocker. The mechanism for switching to the FieldCache is a 
new hint parameter. If the hint parameter is set to top_fc then the 
top-level FieldCache would be used for both Collapse and Expand.

Example syntax:
{code}
fq={!collapse field=x hint=top_fc}
{code}

3)  Adds numeric collapse field implementations.

4) Resolves issue SOLR-6066

 Efficient DocValues support and numeric collapse field implementations for 
 Collapse and Expand
 --

 Key: SOLR-6581
 URL: https://issues.apache.org/jira/browse/SOLR-6581
 Project: Solr
  Issue Type: Bug
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 5.0, Trunk

 Attachments: SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 renames.diff


 The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
 are optimized to work with a top level FieldCache. Top level FieldCaches have 
 a very fast docID to top-level ordinal lookup. Fast access to the top-level 
 ordinals allows for very high performance field collapsing on high 
 cardinality fields. 
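
As a rough illustration of why a fast docID to ordinal lookup matters for collapsing, the Python sketch below keeps the highest-scoring document per ordinal. It is a simplified model and not the CollapsingQParserPlugin code: the `ords` list stands in for the top-level ordinal lookup, `collapse_by_ordinal` and its arguments are hypothetical names, and every hit is assumed to have a value for the collapse field.

```python
from typing import List, Optional, Sequence, Tuple


def collapse_by_ordinal(
    hits: Sequence[Tuple[int, float]],
    ords: Sequence[int],
    num_ords: int,
) -> List[int]:
    """Keep the highest-scoring doc per group, where a group is one ordinal.

    hits     -- (doc_id, score) pairs for the matching documents
    ords     -- ords[doc_id] is the top-level ordinal of the collapse field
    num_ords -- number of distinct ordinals (the field's cardinality)
    """
    # One slot per ordinal, so each hit costs one lookup and one comparison.
    best_doc: List[Optional[int]] = [None] * num_ords
    best_score = [float("-inf")] * num_ords

    for doc_id, score in hits:
        ord_ = ords[doc_id]  # the docID -> ordinal lookup (assumed present)
        if score > best_score[ord_]:
            best_score[ord_] = score
            best_doc[ord_] = doc_id

    # Return the surviving group heads in docID order.
    return sorted(d for d in best_doc if d is not None)


# Five documents spread over three groups (ordinals 0, 1, 2).
ords = [0, 1, 0, 2, 1]
hits = [(0, 1.0), (1, 0.5), (2, 2.0), (3, 0.7), (4, 0.9)]
group_heads = collapse_by_ordinal(hits, ords, num_ords=3)
print(group_heads)
```

Because the lookup runs once per matching document, any extra cost in it is multiplied by the size of the result set.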
 LUCENE-5666 unified the DocValues and FieldCache APIs so that the top level 
 FieldCache is no longer in regular use. Instead all top level caches are 
 accessed through MultiDocValues. 
 This ticket does the following:
 1) Optimizes Collapse and Expand to use MultiDocValues and makes this the 
 default approach when collapsing on String fields
 2) Provides an option to use a top level FieldCache if the performance of 
 MultiDocValues is a blocker. The mechanism for switching to the FieldCache is 
 a new hint parameter. If the hint parameter is set to top_fc then the 
 top-level FieldCache would be used for both Collapse and Expand.
 Example syntax:
 {code}
 fq={!collapse field=x hint=TOP_FC}
 {code}
 3)  Adds numeric collapse field implementations.
 4) Resolves issue SOLR-6066
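
For clients that assemble the request themselves, the filter from the example syntax can be built and URL-encoded as in the sketch below. The helper names (`collapse_fq`, `select_query_string`) and the `q=*:*` query are assumptions for illustration only. The hint value is passed through unchanged, since the description spells it both `top_fc` and `TOP_FC`.

```python
from urllib.parse import parse_qs, urlencode


def collapse_fq(field: str, hint: str = "") -> str:
    """Build the local-params filter string for the collapse query parser."""
    parts = ["!collapse", f"field={field}"]
    if hint:
        # Only add the hint when the caller asks for it.
        parts.append(f"hint={hint}")
    return "{" + " ".join(parts) + "}"


def select_query_string(q: str, field: str, hint: str = "") -> str:
    """Return a URL-encoded query string carrying q and the collapse fq."""
    return urlencode([("q", q), ("fq", collapse_fq(field, hint))])


fq = collapse_fq("x", hint="top_fc")
query_string = select_query_string("*:*", "x", hint="top_fc")

# Decoding the query string recovers the original filter text.
decoded = parse_qs(query_string)
print(fq)
print(query_string)
```

Leaving `hint` empty produces the plain `{!collapse field=x}` filter, which per the description above uses the MultiDocValues path.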
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6581) Efficient DocValues support and numeric collapse field implementations for Collapse and Expand

2015-01-12 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-6581:
-
Fix Version/s: Trunk

 Efficient DocValues support and numeric collapse field implementations for 
 Collapse and Expand
 --

 Key: SOLR-6581
 URL: https://issues.apache.org/jira/browse/SOLR-6581
 Project: Solr
  Issue Type: Bug
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 5.0, Trunk

 Attachments: SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 renames.diff


 The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
 are optimized to work with a top level FieldCache. Top level FieldCaches have 
 a very fast docID to top-level ordinal lookup. Fast access to the top-level 
 ordinals allows for very high performance field collapsing on high 
 cardinality fields. 
 LUCENE-5666 unified the DocValues and FieldCache APIs so that the top level 
 FieldCache is no longer in regular use. Instead all top level caches are 
 accessed through MultiDocValues. 
 This ticket does the following:
 1) Optimizes Collapse and Expand to use MultiDocValues and makes this the 
 default approach when collapsing on String fields
 2) Provides an option to use a top level FieldCache if the performance of 
 MultiDocValues is a blocker. The mechanism for switching to the FieldCache is 
 a new hint parameter. If the hint parameter is set to FAST_QUERY then the 
 top-level FieldCache would be used for both Collapse and Expand.
 Example syntax:
 {code}
 fq={!collapse field=x hint=FAST_QUERY}
 {code}
 3)  Adds numeric collapse field implementations.
  






[jira] [Updated] (SOLR-6581) Efficient DocValues support and numeric collapse field implementations for Collapse and Expand

2015-01-12 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-6581:
-
Description: 
The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
are optimized to work with a top level FieldCache. Top level FieldCaches have a 
very fast docID to top-level ordinal lookup. Fast access to the top-level 
ordinals allows for very high performance field collapsing on high cardinality 
fields. 

LUCENE-5666 unified the DocValues and FieldCache APIs so that the top level 
FieldCache is no longer in regular use. Instead all top level caches are 
accessed through MultiDocValues. 

This ticket does the following:

1) Optimizes Collapse and Expand to use MultiDocValues and makes this the 
default approach when collapsing on String fields

2) Provides an option to use a top level FieldCache if the performance of 
MultiDocValues is a blocker. The mechanism for switching to the FieldCache is a 
new hint parameter. If the hint parameter is set to top_fc then the 
top-level FieldCache would be used for both Collapse and Expand.

Example syntax:
{code}
fq={!collapse field=x hint=top_fc}
{code}

3)  Adds numeric collapse field implementations.

  was:
The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
are optimized to work with a top level FieldCache. Top level FieldCaches have a 
very fast docID to top-level ordinal lookup. Fast access to the top-level 
ordinals allows for very high performance field collapsing on high cardinality 
fields. 

LUCENE-5666 unified the DocValues and FieldCache APIs so that the top level 
FieldCache is no longer in regular use. Instead all top level caches are 
accessed through MultiDocValues. 

This ticket does the following:

1) Optimizes Collapse and Expand to use MultiDocValues and makes this the 
default approach when collapsing on String fields

2) Provides an option to use a top level FieldCache if the performance of 
MultiDocValues is a blocker. The mechanism for switching to the FieldCache is a 
new hint parameter. If the hint parameter is set to FAST_QUERY then the 
top-level FieldCache would be used for both Collapse and Expand.

Example syntax:
{code}
fq={!collapse field=x hint=FAST_QUERY}
{code}

3)  Adds numeric collapse field implementations.

 Efficient DocValues support and numeric collapse field implementations for 
 Collapse and Expand
 --

 Key: SOLR-6581
 URL: https://issues.apache.org/jira/browse/SOLR-6581
 Project: Solr
  Issue Type: Bug
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 5.0, Trunk

 Attachments: SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 renames.diff


 The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
 are optimized to work with a top level FieldCache. Top level FieldCaches have 
 a very fast docID to top-level ordinal lookup. Fast access to the top-level 
 ordinals allows for very high performance field collapsing on high 
 cardinality fields. 
 LUCENE-5666 unified the DocValues and FieldCache APIs so that the top level 
 FieldCache is no longer in regular use. Instead all top level caches are 
 accessed through MultiDocValues. 
 This ticket does the following:
 1) Optimizes Collapse and Expand to use MultiDocValues and makes this the 
 default approach when collapsing on String fields
 2) Provides an option to use a top level FieldCache if the performance of 
 MultiDocValues is a blocker. The mechanism for switching to the FieldCache is 
 a new hint parameter. If the hint parameter is set to top_fc then the 
 top-level FieldCache would be used for both Collapse and Expand.
 Example syntax:
 {code}
 fq={!collapse field=x hint=top_fc}
 {code}
 3)  Adds numeric collapse field implementations.
  






[jira] [Updated] (SOLR-6581) Efficient DocValues support and numeric collapse field implementations for Collapse and Expand

2015-01-12 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-6581:
-
Description: 
The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
are optimized to work with a top level FieldCache. Top level FieldCaches have a 
very fast docID to top-level ordinal lookup. Fast access to the top-level 
ordinals allows for very high performance field collapsing on high cardinality 
fields. 

LUCENE-5666 unified the DocValues and FieldCache APIs so that the top level 
FieldCache is no longer in regular use. Instead all top level caches are 
accessed through MultiDocValues. 

This ticket does the following:

1) Optimizes Collapse and Expand to use MultiDocValues and makes this the 
default approach when collapsing on String fields

2) Provides an option to use a top level FieldCache if the performance of 
MultiDocValues is a blocker. The mechanism for switching to the FieldCache is a 
new hint parameter. If the hint parameter is set to top_fc then the 
top-level FieldCache would be used for both Collapse and Expand.

Example syntax:
{code}
fq={!collapse field=x hint=top_fc}
{code}

3)  Adds numeric collapse field implementations.

4) Resolves issue SOLR-6066

  was:
The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
are optimized to work with a top level FieldCache. Top level FieldCaches have a 
very fast docID to top-level ordinal lookup. Fast access to the top-level 
ordinals allows for very high performance field collapsing on high cardinality 
fields. 

LUCENE-5666 unified the DocValues and FieldCache APIs so that the top level 
FieldCache is no longer in regular use. Instead all top level caches are 
accessed through MultiDocValues. 

This ticket does the following:

1) Optimizes Collapse and Expand to use MultiDocValues and makes this the 
default approach when collapsing on String fields

2) Provides an option to use a top level FieldCache if the performance of 
MultiDocValues is a blocker. The mechanism for switching to the FieldCache is a 
new hint parameter. If the hint parameter is set to top_fc then the 
top-level FieldCache would be used for both Collapse and Expand.

Example syntax:
{code}
fq={!collapse field=x hint=top_fc}
{code}

3)  Adds numeric collapse field implementations.

 Efficient DocValues support and numeric collapse field implementations for 
 Collapse and Expand
 --

 Key: SOLR-6581
 URL: https://issues.apache.org/jira/browse/SOLR-6581
 Project: Solr
  Issue Type: Bug
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 5.0, Trunk

 Attachments: SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 renames.diff


 The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
 are optimized to work with a top level FieldCache. Top level FieldCaches have 
 a very fast docID to top-level ordinal lookup. Fast access to the top-level 
 ordinals allows for very high performance field collapsing on high 
 cardinality fields. 
 LUCENE-5666 unified the DocValues and FieldCache APIs so that the top level 
 FieldCache is no longer in regular use. Instead all top level caches are 
 accessed through MultiDocValues. 
 This ticket does the following:
 1) Optimizes Collapse and Expand to use MultiDocValues and makes this the 
 default approach when collapsing on String fields
 2) Provides an option to use a top level FieldCache if the performance of 
 MultiDocValues is a blocker. The mechanism for switching to the FieldCache is 
 a new hint parameter. If the hint parameter is set to top_fc then the 
 top-level FieldCache would be used for both Collapse and Expand.
 Example syntax:
 {code}
 fq={!collapse field=x hint=top_fc}
 {code}
 3)  Adds numeric collapse field implementations.
 4) Resolves issue SOLR-6066
  






[jira] [Updated] (SOLR-6581) Efficient DocValues support and numeric collapse field implementations for Collapse and Expand

2015-01-11 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-6581:
-
Summary: Efficient DocValues support and numeric collapse field 
implementations for Collapse and Expand  (was: Prepare CollapsingQParserPlugin 
and ExpandComponent for 5.0)

 Efficient DocValues support and numeric collapse field implementations for 
 Collapse and Expand
 --

 Key: SOLR-6581
 URL: https://issues.apache.org/jira/browse/SOLR-6581
 Project: Solr
  Issue Type: Bug
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, renames.diff


 *Background*
 The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
 are optimized to work with a top level FieldCache. Top level FieldCaches have 
 a very fast docID to top-level ordinal lookup. Fast access to the top-level 
 ordinals allows for very high performance field collapsing on high 
 cardinality fields. 
 LUCENE-5666 unified the DocValues and FieldCache APIs so that the top level 
 FieldCache is no longer in regular use. Instead all top level caches are 
 accessed through MultiDocValues. 
 There are some major advantages to using MultiDocValues rather than a top 
 level FieldCache. But there is one disadvantage: the lookup from docId to 
 top-level ordinals is slower using MultiDocValues.
 My testing has shown that *after optimizing* the CollapsingQParserPlugin code 
 to use MultiDocValues, the performance drop is around 100%.  For some use 
 cases this performance drop is a blocker.
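
One way to picture the extra cost is the Python sketch below, which compares the two lookup paths. It is a simplified model with hypothetical class names, not Lucene's MultiDocValues implementation. It assumes the per-segment view resolves a document by locating its segment, reading the segment-local ordinal, and then mapping that to a global ordinal.

```python
from bisect import bisect_right
from typing import List, Sequence


class TopLevelOrds:
    """Models a top-level cache: one flat array indexed by global docID."""

    def __init__(self, global_ords: Sequence[int]) -> None:
        self._ords = list(global_ords)

    def ord(self, doc_id: int) -> int:
        return self._ords[doc_id]  # a single array read


class PerSegmentOrds:
    """Models per-segment ordinals plus a segment-ord -> global-ord map."""

    def __init__(
        self,
        segment_ords: Sequence[Sequence[int]],
        ord_maps: Sequence[Sequence[int]],
    ) -> None:
        self._segment_ords = [list(s) for s in segment_ords]
        self._ord_maps = [list(m) for m in ord_maps]
        # doc_starts[i] is the first global docID that belongs to segment i.
        self._doc_starts: List[int] = []
        start = 0
        for seg in self._segment_ords:
            self._doc_starts.append(start)
            start += len(seg)

    def ord(self, doc_id: int) -> int:
        # 1. Find the segment that contains this global docID.
        seg = bisect_right(self._doc_starts, doc_id) - 1
        # 2. Read the segment-local ordinal for the segment-local docID.
        seg_ord = self._segment_ords[seg][doc_id - self._doc_starts[seg]]
        # 3. Translate the segment-local ordinal to the global ordinal.
        return self._ord_maps[seg][seg_ord]


# Global terms: a=0, b=1, c=2.
# Segment 0 holds docs with values [a, c, a]; its local terms are a=0, c=1.
# Segment 1 holds docs with values [b, c];    its local terms are b=0, c=1.
top_level = TopLevelOrds([0, 2, 0, 1, 2])
per_segment = PerSegmentOrds(
    segment_ords=[[0, 1, 0], [0, 1]],
    ord_maps=[[0, 2], [1, 2]],
)

top_level_result = [top_level.ord(d) for d in range(5)]
per_segment_result = [per_segment.ord(d) for d in range(5)]
print(top_level_result, per_segment_result)
```

Both paths return the same ordinals, but the per-segment path touches three structures per document instead of one, which fits the memory-access explanation offered in this description.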
 *What About Faceting?*
 String faceting also relies on the top level ordinals. Is faceting 
 performance affected also? My testing has shown that the faceting performance 
 is affected much less than collapsing. 
 One possible reason for this may be that field collapsing is memory bound and 
 faceting is not. So the additional memory accesses needed for MultiDocValues 
 affect field collapsing much more than faceting.
 *Proposed Solution*
 The proposed solution is to have the default Collapse and Expand algorithm 
 use MultiDocValues, but to provide an option to use a top level FieldCache if 
 the performance of MultiDocValues is a blocker.
 The proposed mechanism for switching to the FieldCache would be a new hint 
 parameter. If the hint parameter is set to FAST_QUERY then the top-level 
 FieldCache would be used for both Collapse and Expand.
 Example syntax:
 {code}
 fq={!collapse field=x hint=FAST_QUERY}
 {code}
  
  






[jira] [Updated] (SOLR-6581) Efficient DocValues support and numeric collapse field implementations for Collapse and Expand

2015-01-11 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-6581:
-
Description: 
*Background*

The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
are optimized to work with a top level FieldCache. Top level FieldCaches have a 
very fast docID to top-level ordinal lookup. Fast access to the top-level 
ordinals allows for very high performance field collapsing on high cardinality 
fields. 

LUCENE-5666 unified the DocValues and FieldCache APIs so that the top level 
FieldCache is no longer in regular use. Instead all top level caches are 
accessed through MultiDocValues. 

There are some major advantages to using MultiDocValues rather than a top 
level FieldCache. But there is one disadvantage: the lookup from docId to 
top-level ordinals is slower using MultiDocValues.

My testing has shown that *after optimizing* the CollapsingQParserPlugin code 
to use MultiDocValues, the performance drop is around 100%.  For some use cases 
this performance drop is a blocker.

*What About Faceting?*

String faceting also relies on the top level ordinals. Is faceting performance 
affected also? My testing has shown that the faceting performance is affected 
much less than collapsing. 

One possible reason for this may be that field collapsing is memory bound and 
faceting is not. So the additional memory accesses needed for MultiDocValues 
affect field collapsing much more than faceting.

*Proposed Solution*

The proposed solution is to have the default Collapse and Expand algorithm use 
MultiDocValues, but to provide an option to use a top level FieldCache if the 
performance of MultiDocValues is a blocker.

The proposed mechanism for switching to the FieldCache would be a new hint 
parameter. If the hint parameter is set to FAST_QUERY then the top-level 
FieldCache would be used for both Collapse and Expand.

Example syntax:
{code}
fq={!collapse field=x hint=FAST_QUERY}
{code}

*Numeric Collapse Fields*

This ticket also adds numeric collapse field implementations.

  was:
*Background*

The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
are optimized to work with a top level FieldCache. Top level FieldCaches have a 
very fast docID to top-level ordinal lookup. Fast access to the top-level 
ordinals allows for very high performance field collapsing on high cardinality 
fields. 

LUCENE-5666 unified the DocValues and FieldCache APIs so that the top level 
FieldCache is no longer in regular use. Instead all top level caches are 
accessed through MultiDocValues. 

There are some major advantages to using MultiDocValues rather than a top 
level FieldCache. But there is one disadvantage: the lookup from docId to 
top-level ordinals is slower using MultiDocValues.

My testing has shown that *after optimizing* the CollapsingQParserPlugin code 
to use MultiDocValues, the performance drop is around 100%.  For some use cases 
this performance drop is a blocker.

*What About Faceting?*

String faceting also relies on the top level ordinals. Is faceting performance 
affected also? My testing has shown that the faceting performance is affected 
much less than collapsing. 

One possible reason for this may be that field collapsing is memory bound and 
faceting is not. So the additional memory accesses needed for MultiDocValues 
affect field collapsing much more than faceting.

*Proposed Solution*

The proposed solution is to have the default Collapse and Expand algorithm use 
MultiDocValues, but to provide an option to use a top level FieldCache if the 
performance of MultiDocValues is a blocker.

The proposed mechanism for switching to the FieldCache would be a new hint 
parameter. If the hint parameter is set to FAST_QUERY then the top-level 
FieldCache would be used for both Collapse and Expand.

Example syntax:
{code}
fq={!collapse field=x hint=FAST_QUERY}
{code}

 Efficient DocValues support and numeric collapse field implementations for 
 Collapse and Expand
 --

 Key: SOLR-6581
 URL: https://issues.apache.org/jira/browse/SOLR-6581
 Project: Solr
  Issue Type: Bug
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, renames.diff


 *Background*
 The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
 are optimized to work with a top level FieldCache. Top level FieldCaches have 
 a very fast 

[jira] [Updated] (SOLR-6581) Efficient DocValues support and numeric collapse field implementations for Collapse and Expand

2015-01-11 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-6581:
-
Description: 
The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
are optimized to work with a top level FieldCache. Top level FieldCaches have a 
very fast docID to top-level ordinal lookup. Fast access to the top-level 
ordinals allows for very high performance field collapsing on high cardinality 
fields. 

LUCENE-5666 unified the DocValues and FieldCache APIs so that the top level 
FieldCache is no longer in regular use. Instead all top level caches are 
accessed through MultiDocValues. 

This ticket does the following:

1) Optimizes Collapse and Expand to use MultiDocValues and makes this the 
default approach when collapsing on String fields

2) Provides an option to use a top level FieldCache if the performance of 
MultiDocValues is a blocker. The mechanism for switching to the FieldCache is a 
new hint parameter. If the hint parameter is set to FAST_QUERY then the 
top-level FieldCache would be used for both Collapse and Expand.

Example syntax:
{code}
fq={!collapse field=x hint=FAST_QUERY}
{code}

3)  Adds numeric collapse field implementations.

  was:
*Background*

The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
are optimized to work with a top level FieldCache. Top level FieldCaches have a 
very fast docID to top-level ordinal lookup. Fast access to the top-level 
ordinals allows for very high performance field collapsing on high cardinality 
fields. 

LUCENE-5666 unified the DocValues and FieldCache APIs so that the top level 
FieldCache is no longer in regular use. Instead all top level caches are 
accessed through MultiDocValues. 

There are some major advantages to using MultiDocValues rather than a top 
level FieldCache. But there is one disadvantage: the lookup from docId to 
top-level ordinals is slower using MultiDocValues.

My testing has shown that *after optimizing* the CollapsingQParserPlugin code 
to use MultiDocValues, the performance drop is around 100%.  For some use cases 
this performance drop is a blocker.

*What About Faceting?*

String faceting also relies on the top level ordinals. Is faceting performance 
affected also? My testing has shown that the faceting performance is affected 
much less than collapsing. 

One possible reason for this may be that field collapsing is memory bound and 
faceting is not. So the additional memory accesses needed for MultiDocValues 
affect field collapsing much more than faceting.

*Proposed Solution*

The proposed solution is to have the default Collapse and Expand algorithm use 
MultiDocValues, but to provide an option to use a top level FieldCache if the 
performance of MultiDocValues is a blocker.

The proposed mechanism for switching to the FieldCache would be a new hint 
parameter. If the hint parameter is set to FAST_QUERY then the top-level 
FieldCache would be used for both Collapse and Expand.

Example syntax:
{code}
fq={!collapse field=x hint=FAST_QUERY}
{code}

*Numeric Collapse Fields*

This ticket also adds numeric collapse field implementations.

 Efficient DocValues support and numeric collapse field implementations for 
 Collapse and Expand
 --

 Key: SOLR-6581
 URL: https://issues.apache.org/jira/browse/SOLR-6581
 Project: Solr
  Issue Type: Bug
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, renames.diff


 The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
 are optimized to work with a top level FieldCache. Top level FieldCaches have 
 a very fast docID to top-level ordinal lookup. Fast access to the top-level 
 ordinals allows for very high performance field collapsing on high 
 cardinality fields. 
 LUCENE-5666 unified the DocValues and FieldCache APIs so that the top level 
 FieldCache is no longer in regular use. Instead all top level caches are 
 accessed through MultiDocValues. 
 This ticket does the following:
 1) Optimizes Collapse and Expand to use MultiDocValues and makes this the 
 default approach when collapsing on String fields
 2) Provides an option to use a top level FieldCache if the performance of 
 MultiDocValues is a blocker. The mechanism for switching to the FieldCache is 
 a new hint parameter. If the hint parameter is set to FAST_QUERY then the 
 top-level FieldCache would be used for both 

[jira] [Updated] (SOLR-6581) Efficient DocValues support and numeric collapse field implementations for Collapse and Expand

2015-01-11 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-6581:
-
Attachment: SOLR-6581.patch

Unit tests are passing, manual testing looks good, and pre-commit passes. 

 Efficient DocValues support and numeric collapse field implementations for 
 Collapse and Expand
 --

 Key: SOLR-6581
 URL: https://issues.apache.org/jira/browse/SOLR-6581
 Project: Solr
  Issue Type: Bug
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 5.0

 Attachments: SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
 renames.diff


 The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
 are optimized to work with a top level FieldCache. Top level FieldCaches have 
 a very fast docID to top-level ordinal lookup. Fast access to the top-level 
 ordinals allows for very high performance field collapsing on high 
 cardinality fields. 
 LUCENE-5666 unified the DocValues and FieldCache APIs so that the top level 
 FieldCache is no longer in regular use. Instead all top level caches are 
 accessed through MultiDocValues. 
 This ticket does the following:
 1) Optimizes Collapse and Expand to use MultiDocValues and makes this the 
 default approach when collapsing on String fields
 2) Provides an option to use a top level FieldCache if the performance of 
 MultiDocValues is a blocker. The mechanism for switching to the FieldCache is 
 a new hint parameter. If the hint parameter is set to FAST_QUERY then the 
 top-level FieldCache would be used for both Collapse and Expand.
 Example syntax:
 {code}
 fq={!collapse field=x hint=FAST_QUERY}
 {code}
 3)  Adds numeric collapse field implementations.
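
The description does not say how the numeric implementations are built. Purely as an illustration, and not as a description of the patch, collapsing on a numeric field can key groups directly on the field value, for example with a hash map. `collapse_by_numeric_value` and its arguments are hypothetical names, and every hit is assumed to have a value.

```python
from typing import Dict, List, Sequence, Tuple


def collapse_by_numeric_value(
    hits: Sequence[Tuple[int, float]],
    values: Sequence[int],
) -> List[int]:
    """Keep the highest-scoring doc per distinct numeric field value.

    hits   -- (doc_id, score) pairs for the matching documents
    values -- values[doc_id] is the numeric collapse field value
    """
    # Maps field value -> (best score so far, docID holding that score).
    best: Dict[int, Tuple[float, int]] = {}
    for doc_id, score in hits:
        key = values[doc_id]
        current = best.get(key)
        if current is None or score > current[0]:
            best[key] = (score, doc_id)
    # Return the surviving group heads in docID order.
    return sorted(doc_id for _, doc_id in best.values())


# Five documents spread over three numeric group values.
values = [1001, 2002, 1001, 3003, 2002]
hits = [(0, 1.0), (1, 0.5), (2, 2.0), (3, 0.7), (4, 0.9)]
numeric_heads = collapse_by_numeric_value(hits, values)
print(numeric_heads)
```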
  


