[jira] [Commented] (SOLR-8530) Add HavingStream to Streaming API and StreamingExpressions

2017-01-05 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15803218#comment-15803218
 ] 

ASF subversion and git services commented on SOLR-8530:
---

Commit 7ae9ca85d9d920db353d3d080b0cb36567e206b2 in lucene-solr's branch 
refs/heads/branch_6x from [~joel.bernstein]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7ae9ca8 ]

SOLR-8530: Add support for aggregate HAVING comparisons without single quotes


> Add HavingStream to Streaming API and StreamingExpressions
> --
>
> Key: SOLR-8530
> URL: https://issues.apache.org/jira/browse/SOLR-8530
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrJ
>Affects Versions: 6.0
>Reporter: Dennis Gove
>Priority: Minor
> Fix For: master (7.0), 6.4
>
> Attachments: SOLR-8530.patch
>
>
> The goal here is to support something similar to SQL's HAVING clause where 
> one can filter documents based on data that is not available in the index. 
> For example, filter the output of a reduce() based on the calculated 
> metrics.
> {code}
> having(
>   reduce(
>     search(.),
>     sum(cost),
>     on=customerId
>   ),
>   q="sum(cost):[500 TO *]"
> )
> {code}
> This example would return all tuples where the total spent by a distinct 
> customer is >= 500. The total spent is calculated via the sum(cost) metric 
> in the reduce stream.
> The intent is to support, as the filter in the having(...) clause, the full 
> query syntax of a search(...) clause. I see two possible ways to do this:
> 1. Use Lucene's MemoryIndex: as each tuple is read out of the underlying 
> stream, create a MemoryIndex instance from it and apply the query to it. If 
> the resulting score is > 0, the tuple is returned from the HavingStream.
> 2. Create an in-memory Solr index via something like RAMDirectory, read all 
> tuples into it using the UpdateStream, and then stream out all the tuples 
> that match the query.
> There are benefits to each approach, but I think the MemoryIndex approach is 
> the easiest and most direct. With MemoryIndex it isn't necessary to read all 
> incoming tuples before returning the first one. It does require parsing the 
> Solr query parameters into a valid Lucene query, but I suspect that can be 
> done using existing QParser implementations.
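The HAVING semantics described in the issue can be sketched in plain Java. This sketch does not use Lucene's MemoryIndex or Solr's Streaming API. Tuples are modeled as `Map<String, Object>`, and a `java.util.function.Predicate` stands in for the parsed `sum(cost):[500 TO *]` query. All class and method names (`HavingSketch`, `reduceSum`, `HavingIterator`, `sumCostAtLeast`, `tuple`) are hypothetical. To keep things simple, the reduce stand-in groups everything in memory. The part of interest is the filter: like approach 1, it pulls and tests one tuple at a time without buffering.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical sketch; not Solr's Streaming API. Tuples are plain maps.
public class HavingSketch {

    // Stand-in for reduce(search(...), sum(cost), on=customerId): groups tuples by
    // customerId (in memory, for simplicity) and emits one tuple per customer.
    static List<Map<String, Object>> reduceSum(List<Map<String, Object>> tuples) {
        Map<Object, Double> sums = new LinkedHashMap<>();
        for (Map<String, Object> t : tuples) {
            double cost = ((Number) t.get("cost")).doubleValue();
            sums.merge(t.get("customerId"), cost, Double::sum);
        }
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map.Entry<Object, Double> e : sums.entrySet()) {
            Map<String, Object> t = new LinkedHashMap<>();
            t.put("customerId", e.getKey());
            t.put("sum(cost)", e.getValue());
            out.add(t);
        }
        return out;
    }

    // Stand-in for having(...): reads one tuple at a time from the underlying stream and
    // passes it through only if the predicate matches. At most one tuple is held at once.
    static final class HavingIterator implements Iterator<Map<String, Object>> {
        private final Iterator<Map<String, Object>> source;
        private final Predicate<Map<String, Object>> filter;
        private Map<String, Object> pending;

        HavingIterator(Iterator<Map<String, Object>> source, Predicate<Map<String, Object>> filter) {
            this.source = source;
            this.filter = filter;
        }

        @Override
        public boolean hasNext() {
            while (pending == null && source.hasNext()) {
                Map<String, Object> candidate = source.next();
                if (filter.test(candidate)) {
                    pending = candidate;
                }
            }
            return pending != null;
        }

        @Override
        public Map<String, Object> next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            Map<String, Object> result = pending;
            pending = null;
            return result;
        }
    }

    // Plays the role of q="sum(cost):[500 TO *]": inclusive lower bound, no upper bound.
    static Predicate<Map<String, Object>> sumCostAtLeast(double min) {
        return t -> ((Number) t.get("sum(cost)")).doubleValue() >= min;
    }

    static Map<String, Object> tuple(String customerId, double cost) {
        Map<String, Object> t = new LinkedHashMap<>();
        t.put("customerId", customerId);
        t.put("cost", cost);
        return t;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = List.of(
            tuple("c1", 300), tuple("c1", 250),   // total 550: kept
            tuple("c2", 100), tuple("c2", 50),    // total 150: dropped
            tuple("c3", 500));                    // total 500: kept (inclusive bound)
        Iterator<Map<String, Object>> having =
            new HavingIterator(reduceSum(docs).iterator(), sumCostAtLeast(500));
        while (having.hasNext()) {
            System.out.println(having.next());
        }
    }
}
```

`HavingIterator` holds only the next matching tuple, so it can return its first result before the underlying stream is exhausted. This is the advantage the issue claims for the MemoryIndex approach over loading every tuple into an in-memory index first.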



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8530) Add HavingStream to Streaming API and StreamingExpressions

2017-01-05 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15803191#comment-15803191
 ] 

ASF subversion and git services commented on SOLR-8530:
---

Commit b32cd82318f5c8817a8383e1be7534c772e6fa13 in lucene-solr's branch 
refs/heads/master from [~joel.bernstein]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b32cd82 ]

SOLR-8530: Add support for aggregate HAVING comparisons without single quotes





[jira] [Commented] (SOLR-8530) Add HavingStream to Streaming API and StreamingExpressions

2017-01-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15796914#comment-15796914
 ] 

ASF subversion and git services commented on SOLR-8530:
---

Commit 93d1bba8f2194970ca736bee993cedea24e66b91 in lucene-solr's branch 
refs/heads/branch_6x from [~joel.bernstein]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=93d1bba ]

SOLR-8530: Add support for single quoted aggregate HAVING comparisons





[jira] [Commented] (SOLR-8530) Add HavingStream to Streaming API and StreamingExpressions

2017-01-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15796890#comment-15796890
 ] 

ASF subversion and git services commented on SOLR-8530:
---

Commit ccdbb6ac0e0094985e5145c84b3cc2814ababf1d in lucene-solr's branch 
refs/heads/master from [~joel.bernstein]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ccdbb6a ]

SOLR-8530: Add support for single quoted aggregate HAVING comparisons





[jira] [Commented] (SOLR-8530) Add HavingStream to Streaming API and StreamingExpressions

2017-01-03 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15796835#comment-15796835
 ] 

Joel Bernstein commented on SOLR-8530:
--

This issue is reopened to support HAVING comparisons on aggregates.




[jira] [Commented] (SOLR-8530) Add HavingStream to Streaming API and StreamingExpressions

2017-01-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15796408#comment-15796408
 ] 

ASF subversion and git services commented on SOLR-8530:
---

Commit 297b1789092f4f9b3a2cfb91da397e5034708486 in lucene-solr's branch 
refs/heads/branch_6x from [~joel.bernstein]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=297b178 ]

SOLR-8530: Add tests from the HavingStream





[jira] [Commented] (SOLR-8530) Add HavingStream to Streaming API and StreamingExpressions

2017-01-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15796410#comment-15796410
 ] 

ASF subversion and git services commented on SOLR-8530:
---

Commit d8a58146c3155a13f9bb8c46eb2d2878301426d3 in lucene-solr's branch 
refs/heads/branch_6x from [~joel.bernstein]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d8a5814 ]

SOLR-8530: Updated CHANGES.txt





[jira] [Commented] (SOLR-8530) Add HavingStream to Streaming API and StreamingExpressions

2017-01-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15796407#comment-15796407
 ] 

ASF subversion and git services commented on SOLR-8530:
---

Commit 2f7d6fc0fa3de7e2f1d09823d9ef4c6ee08e9d44 in lucene-solr's branch 
refs/heads/branch_6x from [~joel.bernstein]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2f7d6fc ]

SOLR-8530: Add HavingStream to Streaming API and StreamingExpressions





[jira] [Commented] (SOLR-8530) Add HavingStream to Streaming API and StreamingExpressions

2017-01-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15796409#comment-15796409
 ] 

ASF subversion and git services commented on SOLR-8530:
---

Commit 00af5fff4d096000b0cde9066a599f68076c1862 in lucene-solr's branch 
refs/heads/branch_6x from [~joel.bernstein]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=00af5ff ]

SOLR-8530: Fixed javadoc





[jira] [Commented] (SOLR-8530) Add HavingStream to Streaming API and StreamingExpressions

2017-01-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15796350#comment-15796350
 ] 

ASF subversion and git services commented on SOLR-8530:
---

Commit db7d2ff1629e7ae45a405eebdcdde1c68664d01f in lucene-solr's branch 
refs/heads/master from [~joel.bernstein]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=db7d2ff ]

SOLR-8530: Updated CHANGES.txt





[jira] [Commented] (SOLR-8530) Add HavingStream to Streaming API and StreamingExpressions

2017-01-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15796347#comment-15796347
 ] 

ASF subversion and git services commented on SOLR-8530:
---

Commit 1da283ef2c673b2effac834da1de1cb94c0118bb in lucene-solr's branch 
refs/heads/master from [~joel.bernstein]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1da283e ]

SOLR-8530: Add HavingStream to Streaming API and StreamingExpressions





[jira] [Commented] (SOLR-8530) Add HavingStream to Streaming API and StreamingExpressions

2017-01-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15796348#comment-15796348
 ] 

ASF subversion and git services commented on SOLR-8530:
---

Commit 5bbd4d6765d69d245131d049a2551c0534c1180d in lucene-solr's branch 
refs/heads/master from [~joel.bernstein]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5bbd4d6 ]

SOLR-8530: Add tests from the HavingStream





[jira] [Commented] (SOLR-8530) Add HavingStream to Streaming API and StreamingExpressions

2017-01-02 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15793146#comment-15793146
 ] 

Joel Bernstein commented on SOLR-8530:
--

We ran into the same problem when we implemented the classify() function, which 
needed access to the analyzers. We ended up placing the ClassifyStream in core: 
org.apache.solr.handler.

This means the classify() function can only be run via the /stream handler 
rather than as a standalone SolrJ client. But in scenarios where we have 
functions that require integration with Solr core classes, I think this makes 
sense. 







[jira] [Commented] (SOLR-8530) Add HavingStream to Streaming API and StreamingExpressions

2017-01-02 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15793092#comment-15793092
 ] 

Dennis Gove commented on SOLR-8530:
---

One problem I ran into when I was approaching the Match (or SolrMatch; I like 
David's idea about naming) implementation was that the classes needed for the 
in-memory index don't exist in the SolrJ library. This means it would create a 
dependency on something outside SolrJ. If I remember correctly, the specific 
piece I was trying to implement was the parsing of a Solr query into a 
Lucene-compatible query. This is because the in-memory index requires Lucene 
syntax, while I wanted SolrMatch to accept Solr syntax.




[jira] [Commented] (SOLR-8530) Add HavingStream to Streaming API and StreamingExpressions

2017-01-02 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15793050#comment-15793050
 ] 

David Smiley commented on SOLR-8530:


Nice plan, Joel.

RE naming... maybe include the string "solr" in some way, e.g. "solrMatch"?  or 
"solrPredicate"?  "match" by itself seems too generic/ambiguous to me.




[jira] [Commented] (SOLR-8530) Add HavingStream to Streaming API and StreamingExpressions

2017-01-01 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15792021#comment-15792021
 ] 

Joel Bernstein commented on SOLR-8530:
--

I returned the HavingStream as part of SOLR-8593.

What I found during the implementation is that both approaches described in 
this ticket can coexist in the same HavingStream implementation. 

What [~dpgove] originally described was indexing a document on the fly and then 
using a Lucene/Solr query to implement the boolean logic.

What I described is implementing the boolean logic as stream operations that 
would handle typical SQL HAVING comparisons (=, <, >, <>, >=, <=). 

I have implemented the HavingStream I described as part of SOLR-8593, with 
syntax that looks like this:

{code}
having(expr, booleanOp)
{code}

Where booleanOp is a new type of operation that returns *TRUE* or *FALSE* for 
each tuple. The basic boolean operations have been implemented, such as:

{code}
having(expr, and(gt(field1, 5), lt(field1, 10)))
{code}

This would emit tuples from the underlying expr where field1 is greater than 5 
and less than 10.
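A rough model of how such boolean operations could compose, for illustration only: it assumes tuples are plain maps and that each operation is a hypothetical `BooleanOp` returning TRUE or FALSE per tuple. These are not the SolrJ classes from SOLR-8593, and treating a missing field as FALSE is an assumption of this sketch.

```java
import java.util.Map;

// Hypothetical: a boolean operation evaluated once per tuple (TRUE keeps the tuple).
interface BooleanOp {
    boolean evaluate(Map<String, ?> tuple);
}

class BooleanOps {
    // Reads a numeric field as a double; returns null if it is missing or not a number.
    private static Double number(Map<String, ?> tuple, String field) {
        Object v = tuple.get(field);
        return (v instanceof Number) ? ((Number) v).doubleValue() : null;
    }

    static BooleanOp gt(String field, double value) {
        return t -> {
            Double d = number(t, field);
            return d != null && d > value; // missing field -> FALSE
        };
    }

    static BooleanOp lt(String field, double value) {
        return t -> {
            Double d = number(t, field);
            return d != null && d < value; // missing field -> FALSE
        };
    }

    static BooleanOp and(BooleanOp... ops) {
        return t -> {
            for (BooleanOp op : ops) {
                if (!op.evaluate(t)) {
                    return false; // short-circuit on the first FALSE
                }
            }
            return true;
        };
    }

    public static void main(String[] args) {
        // Mirrors: having(expr, and(gt(field1, 5), lt(field1, 10)))
        BooleanOp op = and(gt("field1", 5), lt("field1", 10));
        System.out.println(op.evaluate(Map.of("field1", 7)));  // true
        System.out.println(op.evaluate(Map.of("field1", 10))); // false: lt is strict
        System.out.println(op.evaluate(Map.of("field2", 7)));  // false: field1 is missing
    }
}
```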

To implement what [~dpgove] had in mind, we can add a new boolean operation 
called *match*. The match operation will index the tuple in an in-memory index 
and then match a Lucene/Solr query against it. Here is the sample syntax:

{code}
having(expr, match("field1:[5 TO 10]"))
{code}

The match boolean operation could then be intermingled with other boolean 
operations, for example:

{code}
having(expr, and(gt(field2, 8), match("body:(hello world)")))
{code}
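To show how a match-style predicate could be intermingled with comparison operations, here is a deliberately tiny stand-in. The proposed match() would index the tuple in an in-memory index and run a full Lucene/Solr query; this hypothetical `MatchSketch.match` instead recognizes only numeric inclusive ranges of the form `field:[lo TO hi]` (with `*` as an open bound) and would not handle text queries such as `body:(hello world)`. Combining with `Predicate.and` is an assumption made for brevity.

```java
import java.util.Map;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy: NOT a Solr query parser, only a placeholder for match().
class MatchSketch {
    // Recognizes only "field:[lo TO hi]"; '*' means unbounded on that side.
    private static final Pattern RANGE = Pattern.compile("(\\w+):\\[(\\S+) TO (\\S+)\\]");

    static Predicate<Map<String, ?>> match(String query) {
        Matcher m = RANGE.matcher(query.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("toy parser only supports field:[lo TO hi]: " + query);
        }
        String field = m.group(1);
        double lo = m.group(2).equals("*") ? Double.NEGATIVE_INFINITY : Double.parseDouble(m.group(2));
        double hi = m.group(3).equals("*") ? Double.POSITIVE_INFINITY : Double.parseDouble(m.group(3));
        return t -> {
            Object v = t.get(field);
            if (!(v instanceof Number)) {
                return false; // missing or non-numeric field never matches
            }
            double d = ((Number) v).doubleValue();
            return d >= lo && d <= hi; // square brackets mean inclusive bounds
        };
    }

    static Predicate<Map<String, ?>> gt(String field, double value) {
        return t -> t.get(field) instanceof Number && ((Number) t.get(field)).doubleValue() > value;
    }

    public static void main(String[] args) {
        // Rough analogue of: and(gt(field2, 8), match("field1:[5 TO 10]"))
        Predicate<Map<String, ?>> op = gt("field2", 8).and(match("field1:[5 TO 10]"));
        System.out.println(op.test(Map.of("field1", 10, "field2", 9))); // true
        System.out.println(op.test(Map.of("field1", 11, "field2", 9))); // false
    }
}
```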

Depending on the progress of SOLR-8593, I may strip out the HavingStream 
implementation and commit it with this ticket, so it can be ready for Solr 6.4.









[jira] [Commented] (SOLR-8530) Add HavingStream to Streaming API and StreamingExpressions

2016-01-08 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090228#comment-15090228
 ] 

Dennis Gove commented on SOLR-8530:
---

This is another good option. 

My thinking for using an index is threefold. First, a desire not to ask users 
to learn yet another way to do comparisons: if they already know the Solr 
syntax, they can use it directly in this stream. Second, to support even 
non-simple comparisons without having to implement them, for example a date 
range filter. This assumes that at some point we'll support metrics over dates, 
but I think that's a reasonable assumption. Third, given the JDBCStream, this 
provides a way for someone to do text-based queries over a subset of documents 
out of a join of Solr and non-Solr supplied documents. Obviously one could do a 
textual search over the Solr-supplied stream directly, but that may not be 
possible over the JDBC-supplied stream.

That said, I'm not averse to a ComparisonOperation. I just feel that full 
index support gives us a lot of power going forward.




[jira] [Commented] (SOLR-8530) Add HavingStream to Streaming API and StreamingExpressions

2016-01-08 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090316#comment-15090316
 ] 

Joel Bernstein commented on SOLR-8530:
--

I think it makes sense to have two implementations:

*MatchStream*: Uses an in-memory index to match Tuples.
*HavingStream*: Uses a ComparisonOperation to match Tuples.

One of the things we can think over is a specific stream for doing *parallel 
alerting*. The MatchStream is a step in that direction.
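Either stream is essentially a lazy filter over its underlying stream. The sketch below models that with a plain `java.util.Iterator` rather than SolrJ's `TupleStream`; `LazyHavingIterator` is a hypothetical name, and the generic `T` stands in for a tuple type. It reads from the source only as far as needed to find the next match, which is the property the MemoryIndex approach relies on (no need to read all incoming tuples before returning one).

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical lazy filter: pulls from the source only until the next match is found.
class LazyHavingIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Predicate<? super T> predicate;
    private T buffered;          // next matching element, if one has been found
    private boolean hasBuffered; // whether `buffered` holds a valid element

    LazyHavingIterator(Iterator<T> source, Predicate<? super T> predicate) {
        this.source = source;
        this.predicate = predicate;
    }

    @Override
    public boolean hasNext() {
        // Advance the source until a match is buffered or the source is exhausted.
        while (!hasBuffered && source.hasNext()) {
            T candidate = source.next();
            if (predicate.test(candidate)) {
                buffered = candidate;
                hasBuffered = true;
            }
        }
        return hasBuffered;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T result = buffered;
        buffered = null;
        hasBuffered = false;
        return result;
    }
}

class LazyHavingDemo {
    public static void main(String[] args) {
        int[] pulled = {0}; // counts how many elements were read from the source
        Iterator<Integer> source = new Iterator<Integer>() {
            int i = 0;
            public boolean hasNext() { return i < 1_000_000; }
            public Integer next() { pulled[0]++; return i++; }
        };
        Iterator<Integer> having = new LazyHavingIterator<>(source, x -> x > 2);
        System.out.println(having.next()); // 3
        System.out.println(pulled[0]);     // 4: only elements 0..3 were read
    }
}
```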




[jira] [Commented] (SOLR-8530) Add HavingStream to Streaming API and StreamingExpressions

2016-01-08 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090073#comment-15090073
 ] 

Joel Bernstein commented on SOLR-8530:
--

Is there a specific reason to use an index for the comparison logic? We could 
also add a ComparisonOperator interface and implement the basic comparison 
logic. 
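One possible shape for such a ComparisonOperator, covering the typical SQL HAVING comparisons (=, <, >, <>, >=, <=), is sketched below. The enum name, its constants, and the `matches` helper are hypothetical. Comparing all numbers by `double` value (so `5` equals `5.0`) and making missing or incomparable values never match, even for `<>`, are design assumptions of this sketch, loosely mirroring SQL NULL handling.

```java
import java.util.Map;

// Hypothetical comparison operator over a tuple field and a constant.
enum ComparisonOperator {
    EQ("="), NE("<>"), LT("<"), LTE("<="), GT(">"), GTE(">=");

    private final String symbol;

    ComparisonOperator(String symbol) {
        this.symbol = symbol;
    }

    static ComparisonOperator fromSymbol(String symbol) {
        for (ComparisonOperator op : values()) {
            if (op.symbol.equals(symbol)) {
                return op;
            }
        }
        throw new IllegalArgumentException("Unknown operator: " + symbol);
    }

    // Interprets the sign of a compareTo-style result for this operator.
    boolean test(int cmp) {
        switch (this) {
            case EQ:  return cmp == 0;
            case NE:  return cmp != 0;
            case LT:  return cmp < 0;
            case LTE: return cmp <= 0;
            case GT:  return cmp > 0;
            case GTE: return cmp >= 0;
            default:  throw new AssertionError(this);
        }
    }

    // Compares tuple[field] against value; missing or incomparable values never match.
    boolean matches(Map<String, ?> tuple, String field, Object value) {
        Object actual = tuple.get(field);
        int cmp;
        if (actual instanceof Number && value instanceof Number) {
            cmp = Double.compare(((Number) actual).doubleValue(), ((Number) value).doubleValue());
        } else if (actual instanceof String && value instanceof String) {
            cmp = ((String) actual).compareTo((String) value);
        } else {
            return false;
        }
        return test(cmp);
    }

    public static void main(String[] args) {
        Map<String, Object> tuple = Map.of("customerId", "a", "sum(cost)", 550.0);
        System.out.println(fromSymbol(">=").matches(tuple, "sum(cost)", 500)); // true
        System.out.println(fromSymbol("<").matches(tuple, "sum(cost)", 500));  // false
    }
}
```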




[jira] [Commented] (SOLR-8530) Add HavingStream to Streaming API and StreamingExpressions

2016-01-08 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090089#comment-15090089
 ] 

Joel Bernstein commented on SOLR-8530:
--

Then I could also throw away the HavingStream that comes with the SQLHandler 
which relies on Presto classes. 
