[jira] [Updated] (SOLR-8530) Add HavingStream to Streaming API and StreamingExpressions

Joel Bernstein (JIRA) Tue, 03 Jan 2017 10:14:31 -0800

     [ 
https://issues.apache.org/jira/browse/SOLR-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Joel Bernstein updated SOLR-8530:
---------------------------------
    Attachment: SOLR-8530.patch

Patch with tests.

> Add HavingStream to Streaming API and StreamingExpressions
> ----------------------------------------------------------
>
>                 Key: SOLR-8530
>                 URL: https://issues.apache.org/jira/browse/SOLR-8530
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrJ
>    Affects Versions: 6.0
>            Reporter: Dennis Gove
>            Priority: Minor
>         Attachments: SOLR-8530.patch
>
>
> The goal here is to support something similar to SQL's HAVING clause where 
> one can filter documents based on data that is not available in the index. 
> For example, filter the output of a reduce(....) based on the calculated 
> metrics.
> {code}
> having(
>   reduce(
>     search(.....),
>     sum(cost),
>     on=customerId
>   ),
>   q="sum(cost):[500 TO *]"
> )
> {code}
> This example would return all where the total spent by each distinct customer 
> is >= 500. The total spent is calculated via the sum(cost) metric in the 
> reduce stream.
> The intent is to support as the filters in the having(...) clause the full 
> query syntax of a search(...) clause. I see this being possible in one of two 
> ways. 
> 1. Use Lucene's MemoryIndex and as each tuple is read out of the underlying 
> stream creating an instance of MemoryIndex and apply the query to it. If the 
> result of that is >0 then the tuple should be returned from the HavingStream.
> 2. Create an in-memory solr index via something like RamDirectory, read all 
> tuples into that in-memory index using the UpdateStream, and then stream out 
> of that all the matching tuples from the query.
> There are benefits to each approach but I think the easiest and most direct 
> one is the MemoryIndex approach. With MemoryIndex it isn't necessary to read 
> all incoming tuples before returning a single tuple. With a MemoryIndex there 
> is a need to parse the solr query parameters and create a valid Lucene query 
> but I suspect that can be done using existing QParser implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (SOLR-8530) Add HavingStream to Streaming API and StreamingExpressions

Reply via email to