[jira] [Updated] (SOLR-7377) SOLR Streaming Expressions

2015-04-23 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7377:
--
Attachment: SOLR-7377.patch

Added the ability to turn an ExpressibleStream into a StreamExpression. Combined 
with the already existing ability to turn a StreamExpression into a string, we 
can now go back and forth between string and stream (string <-> stream).

This will allow us to modify ParallelStream to pass along the string expression 
of the stream it wants to parallelize.

 SOLR Streaming Expressions
 --

 Key: SOLR-7377
 URL: https://issues.apache.org/jira/browse/SOLR-7377
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Reporter: Dennis Gove
Priority: Minor
 Fix For: Trunk

 Attachments: SOLR-7377.patch


 It would be beneficial to add an expression-based interface to the Streaming 
 API described in SOLR-7082. Right now that API requires streaming requests to 
 come in from clients as serialized bytecode of the streaming classes. The 
 suggestion here is to support string expressions that describe the streaming 
 operations the client wishes to perform. 
 {code:java}
 search(collection1, q=*:*, fl=id,fieldA,fieldB, sort=fieldA asc)
 {code}
 With this syntax in mind, one can now express arbitrarily complex stream 
 queries with a single string.
 {code:java}
 // merge two distinct searches together on common fields
 merge(
   search(collection1, q=id:(0 3 4), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc),
   search(collection2, q=id:(1 2), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc),
   on=a_f asc, a_s asc)
 // find top 20 unique records of a search
 top(
   n=20,
   unique(
     search(collection1, q=*:*, fl=id,a_s,a_i,a_f, sort=a_f desc),
     over=a_f desc),
   sort=a_f desc)
 {code}
 The syntax would support:
 1. Configurable expression names (e.g., via solrconfig.xml one can map unique 
 to a class implementing a Unique stream class). This allows users to build 
 their own streams and use them as they wish.
 2. Named parameters (of both simple and expression types)
 3. Unnamed, type-matched parameters (to support requiring N streams as 
 arguments to another stream)
 4. Positional parameters
 The main goal here is to make streaming as accessible as possible and define 
 a syntax for running complex queries across large distributed systems.
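
 To make the shape of the proposed syntax concrete, here is a minimal sketch of 
 a parser that turns an expression string into a small tree and renders it back 
 to a string. It is not the code in the patch: the class name ExprSketch and all 
 of its methods are hypothetical, and it assumes that any value containing a 
 comma is wrapped in double quotes (the description above does not say how such 
 values are delimited). Escaped quotes are not handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical expression tree for illustration; not the classes in the patch. */
class ExprSketch {
    /** A name=value parameter; the value is a String or a nested ExprSketch. */
    static final class Named {
        final String name;
        final Object value;
        Named(String name, Object value) { this.name = name; this.value = value; }
    }

    final String functionName;
    final List<Object> params = new ArrayList<>(); // String, Named, or ExprSketch

    ExprSketch(String functionName) { this.functionName = functionName; }

    static ExprSketch parse(String text) {
        String s = text.trim();
        int open = s.indexOf('(');
        if (open <= 0 || !s.endsWith(")")) {
            throw new IllegalArgumentException("not an expression: " + text);
        }
        ExprSketch expr = new ExprSketch(s.substring(0, open).trim());
        for (String part : splitTopLevel(s.substring(open + 1, s.length() - 1))) {
            int eq = part.indexOf('=');
            if (isExpression(part)) {
                expr.params.add(parse(part));                  // unnamed stream parameter
            } else if (eq > 0 && isIdentifier(part.substring(0, eq).trim())) {
                String raw = part.substring(eq + 1).trim();    // named parameter
                Object value = isExpression(raw) ? parse(raw) : unquote(raw);
                expr.params.add(new Named(part.substring(0, eq).trim(), value));
            } else {
                expr.params.add(unquote(part));                // positional parameter
            }
        }
        return expr;
    }

    /** Splits on commas that are outside parentheses and outside double quotes. */
    static List<String> splitTopLevel(String body) {
        List<String> parts = new ArrayList<>();
        int depth = 0, start = 0;
        boolean quoted = false;
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == '"') quoted = !quoted;
            else if (!quoted && c == '(') depth++;
            else if (!quoted && c == ')') depth--;
            else if (!quoted && depth == 0 && c == ',') {
                parts.add(body.substring(start, i).trim());
                start = i + 1;
            }
        }
        String last = body.substring(start).trim();
        if (!last.isEmpty()) parts.add(last);
        return parts;
    }

    static boolean isIdentifier(String s) { return s.matches("[A-Za-z_][A-Za-z0-9_]*"); }

    static boolean isExpression(String s) {
        int open = s.indexOf('(');
        return open > 0 && s.endsWith(")") && isIdentifier(s.substring(0, open).trim());
    }

    static String unquote(String s) {
        String t = s.trim();
        boolean wrapped = t.length() >= 2 && t.startsWith("\"") && t.endsWith("\"");
        return wrapped ? t.substring(1, t.length() - 1) : t;
    }

    private static String render(Object v) {
        if (v instanceof Named) { Named n = (Named) v; return n.name + "=" + render(n.value); }
        if (v instanceof ExprSketch) return v.toString();
        String s = (String) v;
        return s.contains(",") ? "\"" + s + "\"" : s;      // re-quote values with commas
    }

    @Override public String toString() {
        StringBuilder sb = new StringBuilder(functionName).append('(');
        for (int i = 0; i < params.size(); i++) {
            if (i > 0) sb.append(", ");
            sb.append(render(params.get(i)));
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        String text = "top(n=20, unique(search(collection1, q=*:*, sort=a_f desc), over=a_f desc), sort=a_f desc)";
        System.out.println(parse(text)); // re-rendered from the parsed tree
    }
}
```

 Under these assumptions a query value such as id:(0 3 4) stays a plain string 
 because the text before its parenthesis is not a bare function name, while 
 unique(...) and search(...) are parsed recursively as nested expressions.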



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-7377) SOLR Streaming Expressions

2015-04-23 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14510208#comment-14510208
 ] 

Dennis Gove commented on SOLR-7377:
---

Well that makes it much clearer for me. I'm sorry for deleting all the older 
patches. In my reading of the How to Contribute I was under the impression 
that uploading a new patch (with same name) would just replace the old one. 
Each time I uploaded a new version I would see the previous one still there and 
figured I'd done something wrong so went ahead and deleted the old version. It 
didn't occur to me that the old versions would still stay there but just be 
greyed out. I won't be deleting the old versions going forward. Thanks for 
clearing that up for me!

The patch is large because of some package refactoring in the 
org.apache.solr.client.solrj.io package, which seems to make the diff show a 
bunch of deleted/added files.

 SOLR Streaming Expressions
 --

 Key: SOLR-7377
 URL: https://issues.apache.org/jira/browse/SOLR-7377
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Reporter: Dennis Gove
Priority: Minor
 Fix For: Trunk

 Attachments: SOLR-7377.patch





[jira] [Updated] (SOLR-7377) SOLR Streaming Expressions

2015-04-23 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7377:
--
Attachment: (was: SOLR-7377.patch)

 SOLR Streaming Expressions
 --

 Key: SOLR-7377
 URL: https://issues.apache.org/jira/browse/SOLR-7377
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Reporter: Dennis Gove
Priority: Minor
 Fix For: Trunk





[jira] [Updated] (SOLR-7377) SOLR Streaming Expressions

2015-04-21 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7377:
--
Attachment: (was: SOLR-7377.patch)

 SOLR Streaming Expressions
 --

 Key: SOLR-7377
 URL: https://issues.apache.org/jira/browse/SOLR-7377
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Reporter: Dennis Gove
Priority: Minor
 Fix For: Trunk

 Attachments: SOLR-7377.patch





[jira] [Comment Edited] (SOLR-7377) SOLR Streaming Expressions

2015-04-25 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14512791#comment-14512791
 ] 

Dennis Gove edited comment on SOLR-7377 at 4/26/15 12:29 AM:
-

Made ParallelStream an ExpressibleStream and modified the StreamHandler to 
accept stream expression strings instead of bytecode.

Refactored operation and operand into functionName and parameter, and updated 
all required references and tangentially related variable/class names to 
match.

Renamed EqualToComparator to FieldComparator to be a little more descriptive in 
the name.

Added ability to support pluggable streams by making it something you can 
configure in solrconfig.xml.
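
As a rough illustration of what pluggable streams amount to, the sketch below 
keeps a map from function name to implementing class and instantiates the class 
on demand. Everything in it is hypothetical: the class name 
FunctionRegistrySketch, its methods, and the use of java.util.TreeSet as a 
stand-in for a stream class so that the example runs on its own. The actual 
solrconfig.xml element layout is not shown in this thread and is not reproduced 
here.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical name-to-class registry; the real configuration format is not shown here. */
class FunctionRegistrySketch {
    private final Map<String, Class<?>> byName = new HashMap<>();

    /** Records that functionName should be served by the named class. */
    void register(String functionName, String className) {
        try {
            byName.put(functionName, Class.forName(className));
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("cannot load " + className, e);
        }
    }

    /** Creates a new instance for functionName using the no-argument constructor. */
    Object create(String functionName) {
        Class<?> type = byName.get(functionName);
        if (type == null) {
            throw new IllegalArgumentException("unknown function: " + functionName);
        }
        try {
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        FunctionRegistrySketch registry = new FunctionRegistrySketch();
        registry.register("unique", "java.util.TreeSet"); // stand-in class, assumption
        System.out.println(registry.create("unique").getClass().getName());
    }
}
```

A real stream class would presumably be constructed from its parsed expression 
rather than from a no-argument constructor; that detail is left out of the 
sketch.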

All stream-related tests pass. At this point I'd consider this functionally 
complete. 


was (Author: dpgove):
Made ParallelStream an ExpressibleStream and modified the StreamHandler to 
accept stream expression strings instead of bytecode.

Refactored operation and operand into functionName and parameter. And 
refactored all required references and tangentially related variable/class 
names.

Renamed EqualToComparator to FieldComparator to be a little more descriptive in 
the name.

Added ability to support pluggable streams by making it something you can 
configure in solrconfig.xml.

 SOLR Streaming Expressions
 --

 Key: SOLR-7377
 URL: https://issues.apache.org/jira/browse/SOLR-7377
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Reporter: Dennis Gove
Priority: Minor
 Fix For: Trunk

 Attachments: SOLR-7377.patch, SOLR-7377.patch





[jira] [Updated] (SOLR-7377) SOLR Streaming Expressions

2015-04-25 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7377:
--
Attachment: SOLR-7377.patch

Made ParallelStream an ExpressibleStream and modified the StreamHandler to 
accept stream expression strings instead of bytecode.

Refactored operation and operand into functionName and parameter, and updated 
all required references and tangentially related variable/class names to 
match.

Renamed EqualToComparator to FieldComparator to be a little more descriptive in 
the name.

Added ability to support pluggable streams by making it something you can 
configure in solrconfig.xml.

 SOLR Streaming Expressions
 --

 Key: SOLR-7377
 URL: https://issues.apache.org/jira/browse/SOLR-7377
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Reporter: Dennis Gove
Priority: Minor
 Fix For: Trunk

 Attachments: SOLR-7377.patch, SOLR-7377.patch





[jira] [Commented] (SOLR-7377) SOLR Streaming Expressions

2015-04-24 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14512165#comment-14512165
 ] 

Dennis Gove commented on SOLR-7377:
---

I was thinking that all comparators, no matter their implemented comparison 
logic, return one of three basic values when comparing A and B. 

1. A and B are logically equal to each other
2. A is logically before B
3. A is logically after B

The implemented comparison logic is then wholly dependent on what one might be 
intending to use the comparator for. For example, EqualToComparator's 
implemented comparison logic will return that A and B are logically equal if 
they are in fact equal to each other. Its logically before/after response 
depends on the sort order (ascending or descending) but is basically deciding 
if A is less than B or if A is greater than B.

One could, if they wanted to, create a comparator returning that two dates are 
logically equal to each other if they occur within the same week. Or a 
comparator returning that two numbers are logically equal if their values are 
within the same logarithmic order of magnitude. So on and so forth.

My thinking is that comparators determine the logical comparison and make no 
assumption on what that implemented logic is. This leaves open the possibility 
of implementing other comparators for given situations as they arise.
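
To illustrate the week-based example, here is a minimal sketch of a comparator 
that reports two dates as logically equal when they fall in the same ISO week, 
and otherwise as before or after. It uses the plain java.util.Comparator 
interface and java.time rather than the comparator types in the patch; the 
class name SameWeekComparatorSketch is hypothetical.

```java
import java.time.LocalDate;
import java.time.temporal.IsoFields;
import java.util.Comparator;

/** Hypothetical comparator: dates in the same ISO week compare as equal. */
class SameWeekComparatorSketch implements Comparator<LocalDate> {
    @Override
    public int compare(LocalDate a, LocalDate b) {
        // Compare the week-based year first, then the week number within it.
        int byYear = Integer.compare(
                a.get(IsoFields.WEEK_BASED_YEAR), b.get(IsoFields.WEEK_BASED_YEAR));
        if (byYear != 0) {
            return byYear;
        }
        return Integer.compare(
                a.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR),
                b.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR));
    }

    public static void main(String[] args) {
        SameWeekComparatorSketch cmp = new SameWeekComparatorSketch();
        LocalDate monday = LocalDate.of(2015, 4, 20);
        LocalDate sunday = LocalDate.of(2015, 4, 26);
        System.out.println(cmp.compare(monday, sunday)); // both dates are in the same ISO week
    }
}
```

The result still falls into one of the three outcomes listed above (equal, 
before, after); only the definition of "equal" has changed.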

 SOLR Streaming Expressions
 --

 Key: SOLR-7377
 URL: https://issues.apache.org/jira/browse/SOLR-7377
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Reporter: Dennis Gove
Priority: Minor
 Fix For: Trunk

 Attachments: SOLR-7377.patch





[jira] [Comment Edited] (SOLR-7377) SOLR Streaming Expressions

2015-04-20 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503160#comment-14503160
 ] 

Dennis Gove edited comment on SOLR-7377 at 4/20/15 4:46 PM:


Updated the patch based on some additional items I wanted to include in this. 
Note that this patch adds a dependency on Guava in the solr/solrj/ivy.xml file. 
We may want to revisit this additional dependency. Guava is being used for some 
basic string checks (to ensure operations only include supported characters), 
and this logic could be coded by hand if we want to avoid adding a dependency.

{code}
<dependency org="com.google.guava" name="guava" rev="${/com.google.guava/guava}" conf="compile"/>
{code}
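
For a sense of how small the hand-coded alternative could be, here is a sketch 
of a character check that uses only the JDK. The allowed character set 
(letters, digits, underscore, hyphen) and the names FunctionNameCheckSketch and 
hasOnlySupportedChars are assumptions made for illustration; the set the patch 
actually accepts is not stated in this thread.

```java
/** Hypothetical check that a function name uses only an assumed set of characters. */
class FunctionNameCheckSketch {
    static boolean hasOnlySupportedChars(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean supported = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9')
                    || c == '_'
                    || c == '-';
            if (!supported) {
                return false; // first unsupported character ends the scan
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(hasOnlySupportedChars("search"));
        System.out.println(hasOnlySupportedChars("bad name("));
    }
}
```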


was (Author: dpgove):
Updated the patch based on some additional items I wanted to include in this. 
Note that this patch adds a dependency on Guava in the solr/solrj/ivy.xml file. 
We may want to revisit this additional dependency. Guava is being used for some 
basic string checks (to ensure operations only include supported characters), 
and this logic could be coded by hand if we want to avoid adding a dependency.

<dependency org="com.google.guava" name="guava" rev="${/com.google.guava/guava}" conf="compile"/>

 SOLR Streaming Expressions
 --

 Key: SOLR-7377
 URL: https://issues.apache.org/jira/browse/SOLR-7377
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Reporter: Dennis Gove
Priority: Minor
 Fix For: Trunk

 Attachments: SOLR-7377.patch, SOLR-7377.patch





[jira] [Updated] (SOLR-7377) SOLR Streaming Expressions

2015-04-20 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7377:
--
Attachment: SOLR-7377.patch

Updated the patch based on some additional items I wanted to include in this. 
Note that this patch adds a dependency on Guava in the solr/solrj/ivy.xml file. 
We may want to revisit this additional dependency. Guava is being used for some 
basic string checks (to ensure operations only include supported characters), 
and this logic could be coded by hand if we want to avoid adding a dependency.

<dependency org="com.google.guava" name="guava" rev="${/com.google.guava/guava}" conf="compile"/>

 SOLR Streaming Expressions
 --

 Key: SOLR-7377
 URL: https://issues.apache.org/jira/browse/SOLR-7377
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Reporter: Dennis Gove
Priority: Minor
 Fix For: Trunk

 Attachments: SOLR-7377.patch, SOLR-7377.patch





[jira] [Updated] (SOLR-7377) SOLR Streaming Expressions

2015-04-20 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7377:
--
Attachment: (was: SOLR-7377.patch)

 SOLR Streaming Expressions
 --

 Key: SOLR-7377
 URL: https://issues.apache.org/jira/browse/SOLR-7377
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Reporter: Dennis Gove
Priority: Minor
 Fix For: Trunk

 Attachments: SOLR-7377.patch





[jira] [Updated] (SOLR-7377) SOLR Streaming Expressions

2015-04-21 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7377:
--
Attachment: SOLR-7377.patch

Now allows a search expression to include a zkHost (though does not require 
it). Improved performance of EqualToComparator by moving some branching logic 
into the constructor and creating a lambda for the actual comparison.
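
The general idea of that optimization can be sketched as follows: decide the 
sort direction once in the constructor and store a lambda, so that compare() 
does no branching per call. This is not the code from the patch. The class name 
PrebuiltComparatorSketch is hypothetical, and tuples are simplified to 
Map<String, Long> so that the example is self-contained.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.Map;

/** Hypothetical comparator that picks its comparison lambda once, at construction. */
class PrebuiltComparatorSketch implements Comparator<Map<String, Long>> {
    private final Comparator<Map<String, Long>> delegate;

    PrebuiltComparatorSketch(String field, boolean ascending) {
        // Branch on the sort order here, once, instead of inside every compare() call.
        if (ascending) {
            delegate = (a, b) -> a.get(field).compareTo(b.get(field));
        } else {
            delegate = (a, b) -> b.get(field).compareTo(a.get(field));
        }
    }

    @Override
    public int compare(Map<String, Long> a, Map<String, Long> b) {
        return delegate.compare(a, b); // no order check needed at this point
    }

    public static void main(String[] args) {
        Map<String, Long> one = Collections.singletonMap("a_i", 1L);
        Map<String, Long> two = Collections.singletonMap("a_i", 2L);
        System.out.println(new PrebuiltComparatorSketch("a_i", true).compare(one, two));
        System.out.println(new PrebuiltComparatorSketch("a_i", false).compare(one, two));
    }
}
```

The sketch assumes the field is present and non-null in both maps; a real 
implementation would need to decide how missing values compare.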

 SOLR Streaming Expressions
 --

 Key: SOLR-7377
 URL: https://issues.apache.org/jira/browse/SOLR-7377
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Reporter: Dennis Gove
Priority: Minor
 Fix For: Trunk

 Attachments: SOLR-7377.patch





[jira] [Updated] (SOLR-7377) SOLR Streaming Expressions

2015-04-28 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7377:
--
Attachment: SOLR-7377.patch

Fixed a bug in CloudSolrStream when handling aliases. When filtering out the 
stream-only parameters from those that need to be passed to Solr for the query, 
I was checking for the parameter name "alias" when I should have been checking 
for "aliases".
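
The kind of filtering involved can be sketched like this: parameters consumed 
by the stream itself are dropped before the rest are passed on as query 
parameters. The names StreamParamFilterSketch, STREAM_ONLY and queryParams are 
hypothetical, and the set of stream-only names ("aliases" and "zkHost") is an 
assumption based on parameters mentioned in this thread, not a copy of the list 
in CloudSolrStream.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical filter separating stream-only parameters from query parameters. */
class StreamParamFilterSketch {
    // Assumed names; a misspelled entry here (e.g. "alias") would leak the
    // parameter through to the query.
    static final Set<String> STREAM_ONLY = new HashSet<>(Arrays.asList("aliases", "zkHost"));

    /** Returns the parameters that should be forwarded with the query, in input order. */
    static Map<String, String> queryParams(Map<String, String> all) {
        Map<String, String> forwarded = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : all.entrySet()) {
            if (!STREAM_ONLY.contains(entry.getKey())) {
                forwarded.put(entry.getKey(), entry.getValue());
            }
        }
        return forwarded;
    }

    public static void main(String[] args) {
        Map<String, String> all = new LinkedHashMap<>();
        all.put("q", "*:*");
        all.put("aliases", "a_i=alias.a_i"); // value format is illustrative only
        all.put("zkHost", "localhost:2181");
        System.out.println(queryParams(all).keySet());
    }
}
```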

 SOLR Streaming Expressions
 --

 Key: SOLR-7377
 URL: https://issues.apache.org/jira/browse/SOLR-7377
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Reporter: Dennis Gove
Priority: Minor
 Fix For: Trunk

 Attachments: SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch








[jira] [Commented] (SOLR-7377) SOLR Streaming Expressions

2015-05-02 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14525364#comment-14525364
 ] 

Dennis Gove commented on SOLR-7377:
---

I don't see the ExpressionRunner in the patch - am I missing it somewhere? 
Also, I noticed ParallelStream lines 94-100 have some System.out.println lines. 
I suspect you intended to remove those.

Tests look good.

 SOLR Streaming Expressions
 --

 Key: SOLR-7377
 URL: https://issues.apache.org/jira/browse/SOLR-7377
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Reporter: Dennis Gove
Priority: Minor
 Fix For: Trunk

 Attachments: SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, 
 SOLR-7377.patch, SOLR-7377.patch








[jira] [Commented] (SOLR-7377) SOLR Streaming Expressions

2015-04-30 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522440#comment-14522440
 ] 

Dennis Gove commented on SOLR-7377:
---

I'm not totally against doing that but I feel like the refactoring is a 
required piece of this patch. I could, however, create a new ticket with just 
the refactoring and then make this one depend on that one. 

I am worried that such a ticket might look like unnecessary refactoring. 
Without the expression stuff added here I think the streaming stuff has a 
reasonable home in org.apache.solr.client.solrj.io.

That said, I certainly understand the benefit of smaller patches.

 SOLR Streaming Expressions
 --

 Key: SOLR-7377
 URL: https://issues.apache.org/jira/browse/SOLR-7377
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Reporter: Dennis Gove
Priority: Minor
 Fix For: Trunk

 Attachments: SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, 
 SOLR-7377.patch








[jira] [Issue Comment Deleted] (SOLR-7377) SOLR Streaming Expressions

2015-04-30 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7377:
--
Comment: was deleted

(was: I'm not totally against doing that but I feel like the refactoring is a 
required piece of this patch. I could, however, create a new ticket with just 
the refactoring and then make this one depend on that one. 

I am worried that such a ticket might look like unnecessary refactoring. 
Without the expression stuff added here I think the streaming stuff has a 
reasonable home in org.apache.solr.client.solrj.io.

That said, I certainly understand the benefit of smaller patches.)

 SOLR Streaming Expressions
 --

 Key: SOLR-7377
 URL: https://issues.apache.org/jira/browse/SOLR-7377
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Reporter: Dennis Gove
Priority: Minor
 Fix For: Trunk

 Attachments: SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, 
 SOLR-7377.patch








[jira] [Commented] (SOLR-7377) SOLR Streaming Expressions

2015-04-30 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522439#comment-14522439
 ] 

Dennis Gove commented on SOLR-7377:
---

I'm not totally against doing that but I feel like the refactoring is a 
required piece of this patch. I could, however, create a new ticket with just 
the refactoring and then make this one depend on that one. 

I am worried that such a ticket might look like unnecessary refactoring. 
Without the expression stuff added here I think the streaming stuff has a 
reasonable home in org.apache.solr.client.solrj.io.

That said, I certainly understand the benefit of smaller patches.

 SOLR Streaming Expressions
 --

 Key: SOLR-7377
 URL: https://issues.apache.org/jira/browse/SOLR-7377
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Reporter: Dennis Gove
Priority: Minor
 Fix For: Trunk

 Attachments: SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, 
 SOLR-7377.patch








[jira] [Commented] (SOLR-7377) SOLR Streaming Expressions

2015-05-05 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14528339#comment-14528339
 ] 

Dennis Gove commented on SOLR-7377:
---

I believe I've found a bug in FieldComparator. I don't have time to create a 
new patch right now, but the bug is that the field is not checked for null before 
compare is called. The fixed version is below:

{code:java}
  private void assignComparator(){
    if(ComparatorOrder.DESCENDING == order){
      // What black magic is this type intersection??
      // Because this class is serializable we need to make sure the lambda is also serializable.
      // This can be done by providing this type intersection on the definition of the lambda.
      // Why not do it in the lambda interface? Functional Interfaces don't allow extends clauses
      comparator = (ComparatorLambda & Serializable)(leftTuple, rightTuple) -> {
        Comparable leftComp = (Comparable)leftTuple.get(leftField);
        Comparable rightComp = (Comparable)rightTuple.get(rightField);

        if(null == leftComp){ return -1; }
        if(null == rightComp){ return 1; }

        return rightComp.compareTo(leftComp);
      };
    }
    else{
      // See above for black magic reasoning.
      comparator = (ComparatorLambda & Serializable)(leftTuple, rightTuple) -> {
        Comparable leftComp = (Comparable)leftTuple.get(leftField);
        Comparable rightComp = (Comparable)rightTuple.get(rightField);

        if(null == leftComp){ return -1; }
        if(null == rightComp){ return 1; }

        return leftComp.compareTo(rightComp);
      };
    }
  }
{code}

 SOLR Streaming Expressions
 --

 Key: SOLR-7377
 URL: https://issues.apache.org/jira/browse/SOLR-7377
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Reporter: Dennis Gove
Priority: Minor
 Fix For: Trunk

 Attachments: SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, 
 SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, 
 SOLR-7377.patch








[jira] [Updated] (SOLR-7513) Add Equalitors to Streaming Expressions

2015-05-07 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7513:
--
Attachment: SOLR-7513.patch

 Add Equalitors to Streaming Expressions
 ---

 Key: SOLR-7513
 URL: https://issues.apache.org/jira/browse/SOLR-7513
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Affects Versions: Trunk
Reporter: Dennis Gove
Priority: Minor
 Attachments: SOLR-7513.patch








[jira] [Created] (SOLR-7513) Add Equalitors to Streaming Expressions

2015-05-07 Thread Dennis Gove (JIRA)
Dennis Gove created SOLR-7513:
-

 Summary: Add Equalitors to Streaming Expressions
 Key: SOLR-7513
 URL: https://issues.apache.org/jira/browse/SOLR-7513
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Affects Versions: Trunk
Reporter: Dennis Gove
Priority: Minor


Right now all streams use the Comparator<Tuple> interface to compare tuples. 
The Comparator interface will tell you if tupleA is before, after, or equal to 
tupleB. This is great for most streams as they use this logic when combining 
multiple streams together. However, some streams only care about the equality 
of two tuples, and the less-than/greater-than logic is unnecessary.

This depends on SOLR-7377.

This patch introduces a new interface into streaming expressions called 
Equalitor<Tuple>, which returns whether two tuples are equal. The benefit here is 
that the expressions for streams using Equalitor instead of Comparator can omit 
the ordering part.

{code}
unique(somestream, over=fieldA asc, fieldB desc)
{code}

can become

{code}
unique(somestream, over=fieldA,fieldB)
{code}

The added benefit is that this will set us up with simpler expressions for 
joins (hash, merge, inner, outer, etc.), as those only care about equality.

By adding this as an interface we make no assumptions about what it means to be 
equal, just that some implementation needs to exist adhering to the 
Equalitor<Tuple> interface which will determine if two tuples are logically 
equal. 

We do define at least one concrete class which checks for equality but that 
does not preclude others from adding additional concrete classes with their own 
logic in place.
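
As a rough sketch of what such an interface and one concrete implementation 
might look like (this is not the code in the attached patch): the names 
TupleEqualitor and FieldsEqualitor are hypothetical, and a plain Map stands in 
for the Tuple class so the example is self-contained.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical equality-only interface; no ordering is implied.
interface TupleEqualitor<T> {
  boolean test(T left, T right);
}

// Hypothetical concrete class: two tuples are equal when every listed field
// holds equal values (two missing/null values count as equal here).
class FieldsEqualitor implements TupleEqualitor<Map<String, Object>> {
  private final String[] fields;

  FieldsEqualitor(String... fields) {
    this.fields = fields;
  }

  @Override
  public boolean test(Map<String, Object> left, Map<String, Object> right) {
    for (String field : fields) {
      if (!Objects.equals(left.get(field), right.get(field))) {
        return false;
      }
    }
    return true;
  }
}
```

An expression such as unique(somestream, over=fieldA,fieldB) could then map to 
new FieldsEqualitor("fieldA", "fieldB"), with no asc/desc needed.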






[jira] [Updated] (SOLR-7377) SOLR Streaming Expressions

2015-05-07 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7377:
--
Attachment: SOLR-7377.patch

Updated patch with a few changes.

FieldComparator and StreamComparator have been collapsed into a single class 
StreamComparator. There was no need for a separate abstract class.

Added null checks in StreamComparator. For now if both are null then they will 
evaluate to equal. We can add a later enhancement under a new ticket to make 
that configurable.
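
A minimal sketch of the null handling described above, assuming a single field 
and using a plain Map in place of Tuple. The class NullSafeFieldComparator is 
hypothetical and is not the StreamComparator from the patch; placing a lone 
null first in both sort orders follows the fix posted earlier for 
FieldComparator.

```java
import java.util.Comparator;
import java.util.Map;

// Hypothetical stand-in for a field-based stream comparator.
class NullSafeFieldComparator implements Comparator<Map<String, Object>> {
  private final String field;
  private final boolean descending;

  NullSafeFieldComparator(String field, boolean descending) {
    this.field = field;
    this.descending = descending;
  }

  @Override
  @SuppressWarnings({"unchecked", "rawtypes"})
  public int compare(Map<String, Object> left, Map<String, Object> right) {
    Comparable leftComp = (Comparable) left.get(field);
    Comparable rightComp = (Comparable) right.get(field);

    // Two nulls evaluate to equal; a single null sorts first.
    if (leftComp == null && rightComp == null) { return 0; }
    if (leftComp == null) { return -1; }
    if (rightComp == null) { return 1; }

    return descending ? rightComp.compareTo(leftComp)
                      : leftComp.compareTo(rightComp);
  }
}
```

Checking the both-null case before the single-null cases keeps compare(a, b) 
and compare(b, a) consistent when both values are missing.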

Interfaces ExpressibleStream and ExpressibleComparator have been collapsed into 
interface Expressible. They defined the same interface and there's no reason to 
have separate interfaces for them.

Passes precommit checks.

 SOLR Streaming Expressions
 --

 Key: SOLR-7377
 URL: https://issues.apache.org/jira/browse/SOLR-7377
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Reporter: Dennis Gove
Priority: Minor
 Fix For: Trunk

 Attachments: SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, 
 SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, 
 SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch








[jira] [Commented] (SOLR-7513) Add Equalitors to Streaming Expressions

2015-05-10 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537412#comment-14537412
 ] 

Dennis Gove commented on SOLR-7513:
---

I'm pretty sure I want to change this to use Java's BiPredicate interface 
instead:

https://docs.oracle.com/javase/8/docs/api/java/util/function/BiPredicate.html
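
For illustration, a minimal sketch of how tuple equality might be expressed 
with java.util.function.BiPredicate. The class BiPredicateEquality and its 
onField method are hypothetical, and a plain Map stands in for Tuple.

```java
import java.util.Map;
import java.util.Objects;
import java.util.function.BiPredicate;

// Hypothetical helper showing equality checks built on BiPredicate.
class BiPredicateEquality {

  // Builds a predicate that is true when both tuples agree on one field.
  static BiPredicate<Map<String, Object>, Map<String, Object>> onField(String field) {
    return (left, right) -> Objects.equals(left.get(field), right.get(field));
  }
}
```

One attraction of this approach is that BiPredicate already provides and(), 
or() and negate(), so equality over several fields can be composed, for 
example onField("fieldA").and(onField("fieldB")).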

 Add Equalitors to Streaming Expressions
 ---

 Key: SOLR-7513
 URL: https://issues.apache.org/jira/browse/SOLR-7513
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Affects Versions: Trunk
Reporter: Dennis Gove
Priority: Minor
 Attachments: SOLR-7513.patch








[jira] [Commented] (SOLR-7377) SOLR Streaming Expressions

2015-05-05 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529210#comment-14529210
 ] 

Dennis Gove commented on SOLR-7377:
---

We may want to make that configurable in solrconfig.xml. Also, should this 
respect the already configurable setting of whether nulls propagate to the 
start or end of result sets?

 SOLR Streaming Expressions
 --

 Key: SOLR-7377
 URL: https://issues.apache.org/jira/browse/SOLR-7377
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Reporter: Dennis Gove
Priority: Minor
 Fix For: Trunk

 Attachments: SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, 
 SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, 
 SOLR-7377.patch








[jira] [Commented] (SOLR-7377) SOLR Streaming Expressions

2015-05-05 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529233#comment-14529233
 ] 

Dennis Gove commented on SOLR-7377:
---

I do agree with you that two nulls should compare equal (I should've included 
that in my original fix), but I have seen a number of situations where users 
have balked at the decision (outside of Solr). 

That said, I think it's reasonable to insist that two nulls evaluate to equal. 
(I've never agreed with the case that they wouldn't). 

Were we to make it a user-overridable thing then I do like the idea to make it 
a query-time decision.

 SOLR Streaming Expressions
 --

 Key: SOLR-7377
 URL: https://issues.apache.org/jira/browse/SOLR-7377
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Reporter: Dennis Gove
Priority: Minor
 Fix For: Trunk

 Attachments: SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, 
 SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, 
 SOLR-7377.patch








[jira] [Created] (SOLR-7524) Make Streaming Expressions Java 7 Compatible

2015-05-11 Thread Dennis Gove (JIRA)
Dennis Gove created SOLR-7524:
-

 Summary: Make Streaming Expressions Java 7 Compatible
 Key: SOLR-7524
 URL: https://issues.apache.org/jira/browse/SOLR-7524
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Affects Versions: Trunk
Reporter: Dennis Gove
Priority: Trivial
 Fix For: 5.2


SOLR-7377 added Streaming Expressions to trunk. It uses, by choice and not 
necessity, some features of Java 8. This patch makes minor changes to 
three files to make Streaming Expressions compatible with Java 7 and therefore 
able to be included in version 5.2.
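
The issue does not say which Java 8 features are involved. One plausible case, 
given the serializable lambda posted earlier for FieldComparator, is replacing 
an intersection-cast lambda with a small named class. The sketch below shows 
that general technique only; Java7ComparatorSketch and Descending are 
hypothetical names, and String stands in for the tuple field type.

```java
import java.io.Serializable;
import java.util.Comparator;

// Hypothetical example of a Java 7 compatible replacement for a lambda.
class Java7ComparatorSketch {

  // Java 8 form, shown for comparison only:
  //   (Comparator<String> & Serializable) (left, right) -> right.compareTo(left)

  // Java 7 form: a named class can implement both interfaces directly.
  static final class Descending implements Comparator<String>, Serializable {
    private static final long serialVersionUID = 1L;

    @Override
    public int compare(String left, String right) {
      return right.compareTo(left);
    }
  }
}
```

A named class is used rather than an anonymous one because an anonymous class 
can implement only a single interface.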






[jira] [Updated] (SOLR-7524) Make Streaming Expressions Java 7 Compatible

2015-05-11 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7524:
--
Attachment: SOLR-7524.patch

 Make Streaming Expressions Java 7 Compatible
 

 Key: SOLR-7524
 URL: https://issues.apache.org/jira/browse/SOLR-7524
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Affects Versions: Trunk
Reporter: Dennis Gove
Priority: Trivial
 Fix For: 5.2

 Attachments: SOLR-7524.patch





[jira] [Commented] (SOLR-7377) SOLR Streaming Expressions

2015-05-05 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528228#comment-14528228
 ] 

Dennis Gove commented on SOLR-7377:
---

This looks good, I think.

 SOLR Streaming Expressions
 --

 Key: SOLR-7377
 URL: https://issues.apache.org/jira/browse/SOLR-7377
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Reporter: Dennis Gove
Priority: Minor
 Fix For: Trunk

 Attachments: SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, 
 SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch


 It would be beneficial to add an expression-based interface to the Streaming 
 API described in SOLR-7082. Right now that API requires streaming requests to 
 come in from clients as serialized bytecode of the streaming classes. The 
 suggestion here is to support string expressions which describe the streaming 
 operations the client wishes to perform. 
 {code:java}
 search(collection1, q="*:*", fl="id,fieldA,fieldB", sort="fieldA asc")
 {code}
 With this syntax in mind, one can now express arbitrarily complex stream 
 queries with a single string.
 {code:java}
 // merge two distinct searches together on common fields
 merge(
   search(collection1, q="id:(0 3 4)", fl="id,a_s,a_i,a_f", sort="a_f asc, a_s asc"),
   search(collection2, q="id:(1 2)", fl="id,a_s,a_i,a_f", sort="a_f asc, a_s asc"),
   on="a_f asc, a_s asc")
 // find top 20 unique records of a search
 top(
   n=20,
   unique(
     search(collection1, q="*:*", fl="id,a_s,a_i,a_f", sort="a_f desc"),
     over="a_f desc"),
   sort="a_f desc")
 {code}
 The syntax would support:
 1. Configurable expression names (e.g. via solrconfig.xml one can map "unique" 
 to a class implementing a Unique stream). This allows users to build their 
 own streams and use them as they wish.
 2. Named parameters (of both simple and expression types)
 3. Unnamed, type-matched parameters (to support requiring N streams as 
 arguments to another stream)
 4. Positional parameters
 The main goal here is to make streaming as accessible as possible and to define 
 a syntax for running complex queries across large distributed systems.
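
To make the distinction between named and unnamed parameters concrete, here is
a minimal sketch in Java. It is not the parser from the SOLR-7377 patch: the
class ExpressionSketch and its methods are hypothetical, and it assumes that
the expression is well formed, that string values are wrapped in double quotes,
and that parameters are separated by top-level commas.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Solr: splits one expression into its parts. */
final class ExpressionSketch {

  /** Returns the function name, i.e. everything before the first '('. */
  static String functionName(String expression) {
    // Assumes a well-formed expression that contains at least one '('.
    return expression.substring(0, expression.indexOf('(')).trim();
  }

  /** Splits the argument list on commas that sit outside quotes and nested parentheses. */
  static List<String> parameters(String expression) {
    int open = expression.indexOf('(');
    int close = expression.lastIndexOf(')');
    String body = expression.substring(open + 1, close);

    List<String> params = new ArrayList<String>();
    StringBuilder current = new StringBuilder();
    int depth = 0;            // nesting level of parentheses inside the argument list
    boolean inQuotes = false; // true while inside a double-quoted value
    for (char c : body.toCharArray()) {
      if (c == '"') {
        inQuotes = !inQuotes;
      } else if (!inQuotes && c == '(') {
        depth++;
      } else if (!inQuotes && c == ')') {
        depth--;
      }
      if (c == ',' && depth == 0 && !inQuotes) {
        // A top-level comma ends the current parameter.
        params.add(current.toString().trim());
        current.setLength(0);
      } else {
        current.append(c);
      }
    }
    if (current.toString().trim().length() > 0) {
      params.add(current.toString().trim());
    }
    return params;
  }

  /** A parameter counts as named when a bare identifier is directly followed by '='. */
  static boolean isNamed(String parameter) {
    return parameter.matches("(?s)[A-Za-z_][A-Za-z0-9_]*\\s*=.*");
  }
}
```

Applied to the top(...) example above, parameters() would yield three entries:
the named n=20, the unnamed nested unique(...) expression, and the named sort.
A real implementation would recurse into the nested expression and also handle
escaping, which this sketch leaves out.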






[jira] [Commented] (SOLR-7377) SOLR Streaming Expressions

2015-05-05 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528229#comment-14528229
 ] 

Dennis Gove commented on SOLR-7377:
---

This looks good, I think.

 SOLR Streaming Expressions
 --

 Key: SOLR-7377
 URL: https://issues.apache.org/jira/browse/SOLR-7377
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Reporter: Dennis Gove
Priority: Minor
 Fix For: Trunk

 Attachments: SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, 
 SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch





[jira] [Issue Comment Deleted] (SOLR-7377) SOLR Streaming Expressions

2015-05-05 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7377:
--
Comment: was deleted

(was: This looks good, I think.)

 SOLR Streaming Expressions
 --

 Key: SOLR-7377
 URL: https://issues.apache.org/jira/browse/SOLR-7377
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Reporter: Dennis Gove
Priority: Minor
 Fix For: Trunk

 Attachments: SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, 
 SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch





[jira] [Commented] (SOLR-7377) SOLR Streaming Expressions

2015-05-08 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534432#comment-14534432
 ] 

Dennis Gove commented on SOLR-7377:
---

I can't figure out how I screwed that up - my only thought is that when I 
pulled down your latest with curl it failed and I didn't notice. Internet 
access on Amtrak trains can be splotchy. My apologies, I'll be more careful in 
the future. Let's forget my latest patch; I'll add those in a new, smaller one 
after this is in trunk.

 SOLR Streaming Expressions
 --

 Key: SOLR-7377
 URL: https://issues.apache.org/jira/browse/SOLR-7377
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Reporter: Dennis Gove
Priority: Minor
 Fix For: Trunk

 Attachments: SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, 
 SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, 
 SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch





[jira] [Updated] (SOLR-7554) Add checks in Streams for incoming stream order

2015-05-16 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7554:
--
Attachment: SOLR-7554.patch

 Add checks in Streams for incoming stream order
 ---

 Key: SOLR-7554
 URL: https://issues.apache.org/jira/browse/SOLR-7554
 Project: Solr
  Issue Type: Improvement
  Components: SolrJ
Affects Versions: Trunk, 5.2
Reporter: Dennis Gove
Priority: Minor
  Labels: streaming
 Fix For: Trunk, 5.2

 Attachments: SOLR-7554.patch





[jira] [Created] (SOLR-7554) Add checks in Streams for incoming stream order

2015-05-16 Thread Dennis Gove (JIRA)
Dennis Gove created SOLR-7554:
-

 Summary: Add checks in Streams for incoming stream order
 Key: SOLR-7554
 URL: https://issues.apache.org/jira/browse/SOLR-7554
 Project: Solr
  Issue Type: Improvement
  Components: SolrJ
Affects Versions: Trunk, 5.2
Reporter: Dennis Gove
Priority: Minor
 Fix For: Trunk, 5.2


Most Streams built on top of other streams require that their incoming 
stream(s) be ordered in a way that is complementary to how this stream is 
expected to output its results. 

For example, if a MergeStream is merging two streams on "fieldA asc, fieldB 
desc", then both of its incoming streams must be ordered in a similar way. That 
said, an incoming stream could be ordered more strictly, e.g. "fieldA asc, 
fieldB desc, fieldC asc"; as long as the comparator used in the MergeStream 
can be derived from the incoming stream's comparator, we are good to go. 

Some comparator A can be derived from some other comparator B iff the fields 
and their order in A are equal to the first fields and their order in B. For 
example, "fieldA asc, fieldB desc" can be derived from "fieldA asc, fieldB 
desc, fieldC asc, fieldD asc" but cannot be derived from "fieldA asc".
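
The derivation rule above amounts to a prefix check. The following Java sketch
shows the idea only; it is not the code in the patch. The class SortOrderSketch
is hypothetical, and it assumes an ordering can be modelled as a list of
"field direction" strings rather than as real Comparator objects.

```java
import java.util.List;

/** Hypothetical illustration, not the Comparator classes touched by the patch. */
final class SortOrderSketch {

  /**
   * Returns true when ordering a can be derived from ordering b, that is,
   * when a is a prefix of b. Each entry is a "field direction" pair such as
   * "fieldA asc".
   */
  static boolean isDerivedFrom(List<String> a, List<String> b) {
    if (a.size() > b.size()) {
      return false; // a asks for more fields than b guarantees
    }
    for (int i = 0; i < a.size(); i++) {
      if (!a.get(i).equals(b.get(i))) {
        return false; // field or direction differs at position i
      }
    }
    return true;
  }
}
```

With these assumptions, ["fieldA asc", "fieldB desc"] is derived from
["fieldA asc", "fieldB desc", "fieldC asc", "fieldD asc"] but not from
["fieldA asc"], which matches the example in the description.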

This patch adds this validation support. It requires changes to Comparators, 
Equalitors, most Streams, and related tests. It adds a way to compare 
Comparators and Equalitors and, in the end, is one more required piece before 
we can add support for Join streams.

It depends on SOLR-7513 and SOLR-7528. Its other dependencies have already 
been committed to trunk and the 5.2 branch.

It does not change any interfaces to code already released (5.1 and below). It 
does change interfaces to code in trunk and 5.2.






[jira] [Created] (SOLR-7548) CloudSolrStream Limits Max Results to rows Param

2015-05-14 Thread Dennis Gove (JIRA)
Dennis Gove created SOLR-7548:
-

 Summary: CloudSolrStream Limits Max Results to rows Param
 Key: SOLR-7548
 URL: https://issues.apache.org/jira/browse/SOLR-7548
 Project: Solr
  Issue Type: Bug
Affects Versions: Trunk
Reporter: Dennis Gove
Priority: Minor
 Fix For: Trunk


The CloudSolrStream stream class accepts a set of params to be passed to the 
standard query handler. If the provided params don't include rows=N, then the 
maximum number of records returned by this stream is the configured default 
rows value (generally 10, but perhaps more). 

As CloudSolrStream would generally be the first part of a larger set of stream 
expressions, it seems counterintuitive to limit the first set by this value.

This ticket is to address this so that we either pass a param of rows=MAX, 
where MAX is the max value we can pass (max int or max long, I suppose), or make 
it so that the default value is ignored when in a streaming context.

Example:
Imagine we have a collection "people" with 90 documents in it.

The following query would return at most 10 documents (assuming 10 is the 
default):
{code}
search(people,q="*:*",fl="id,name_s,gender_s,nick_s",sort="name_s desc")
{code}

The following query would return all documents:
{code}
search(people,q="*:*",fl="id,name_s,gender_s,nick_s",sort="name_s desc",rows=100)
{code}
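
As a rough sketch of the first option (passing rows=MAX when the caller gave no
rows value), something like the following Java could work. This is only an
illustration: RowsDefaultSketch is a hypothetical class, a plain Map stands in
for whatever params object CloudSolrStream actually receives, and max int is
assumed as the MAX value.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of defaulting rows; not CloudSolrStream code. */
final class RowsDefaultSketch {

  /** Returns a copy of params with rows set to max int when the caller supplied none. */
  static Map<String, String> withUnlimitedRows(Map<String, String> params) {
    Map<String, String> copy = new HashMap<String, String>(params);
    if (!copy.containsKey("rows")) {
      // Assumption: max int is an acceptable upper bound for rows.
      copy.put("rows", String.valueOf(Integer.MAX_VALUE));
    }
    return copy;
  }
}
```

An explicit rows=100 from the caller is left untouched, so the second query
above would behave as it does today. Whether the query handler copes well with
such a large rows value is a separate question that this sketch does not answer.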







[jira] [Comment Edited] (SOLR-7543) Create GraphQuery that allows graph traversal as a query operator.

2015-05-14 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14544713#comment-14544713
 ] 

Dennis Gove edited comment on SOLR-7543 at 5/15/15 1:20 AM:


For interface/semantics, I think this might be able to benefit from the 
Expression stuff recently added for streams (SOLR-7377). With that, you could 
do something like

{code}
graph(root=search(collection1, q=some query, fl=used fields), 
traverse=search(collection1, q=some dynamic query,fl=used fields), 
on=parent.field=child.field, maxDepth=5, returnRoot=true, 
returnOnlyLeaf=false)
{code}

This would also allow you to do other things like make use of stream merging, 
uniquing, etc

Would even allow for tree traversal across multiple collections.


was (Author: dpgove):
For interface/semantics, I think this might be able to benefit from the 
Expression stuff recently added for streams (SOLR-7377). With that, you could 
do something like

{code}
graph(root=search(collection1, q=some query, fl=used fields), 
traverse=search(collection1, q=some dynamic query,fl=used fields), 
on=parent.field=child.field, maxDepth=5, returnRoot=true, 
returnOnlyLeaf=false)
{code}

This would also allow you to do other things like make use of stream merging, 
uniquing, etc

Would even allow for tree traversal across collections.

 Create GraphQuery that allows graph traversal as a query operator.
 --

 Key: SOLR-7543
 URL: https://issues.apache.org/jira/browse/SOLR-7543
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Kevin Watters
Priority: Minor

 I have a GraphQuery that I implemented a long time back that allows a user to 
 specify a startQuery to identify which documents to start graph traversal 
 from.  It then gathers up the edge ids for those documents and optionally 
 applies an additional filter.  The query is then re-executed repeatedly 
 until no new edge ids are identified.  I am currently hosting this code at 
 https://github.com/kwatters/solrgraph and I would like to work with the 
 community to get some feedback and ultimately get it committed back in as a 
 lucene query.
 Here's a bit more of a description of the parameters for the query / graph 
 traversal:
 q - the initial start query that identifies the universe of documents to 
 start traversal from.
 fromField - the field name that contains the node id.
 toField - the name of the field that contains the edge id(s).
 traversalFilter - an additional query that can be supplied to limit 
 the scope of graph traversal to just the edges that satisfy the 
 traversalFilter query.
 maxDepth - integer specifying how deep the breadth first search should go.
 returnStartNodes - boolean to determine if the documents that matched the 
 original q should be returned as part of the graph.
 onlyLeafNodes - boolean that filters the graph query to only return 
 documents/nodes that have no edges.
 We identify a set of documents with q as any arbitrary lucene query.  The 
 traversal collects the values in the fromField, creates an OR query with those 
 values, optionally applies an additional constraint from the traversalFilter, 
 and walks the result set until no new edges are detected.  Traversal can also 
 be stopped at N hops away as defined with the maxDepth.  This is a BFS 
 (Breadth First Search) algorithm.  Cycle detection is done by not revisiting 
 the same document for edge extraction.  
 This query operator does not keep track of how you arrived at the document, 
 but only that the traversal did arrive at the document.
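
The traversal described above can be sketched in Java as a plain breadth-first
search. This is not the code from the solrgraph repository: the class
GraphTraversalSketch is hypothetical, an in-memory adjacency map stands in for
the index lookups, a negative maxDepth is assumed to mean "no limit", and
traversalFilter, returnStartNodes and onlyLeafNodes are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of the BFS with cycle detection and a depth limit. */
final class GraphTraversalSketch {

  /**
   * Walks the graph breadth first from startNodes and returns every node reached,
   * start nodes included. edges maps a node id to the ids it points to.
   */
  static Set<String> traverse(Map<String, List<String>> edges,
                              Set<String> startNodes,
                              int maxDepth) {
    Set<String> visited = new LinkedHashSet<String>(startNodes);
    List<String> frontier = new ArrayList<String>(startNodes);
    int depth = 0;
    // Stop when a hop finds nothing new or when the depth limit is reached.
    while (!frontier.isEmpty() && (maxDepth < 0 || depth < maxDepth)) {
      List<String> next = new ArrayList<String>();
      for (String node : frontier) {
        List<String> targets = edges.get(node);
        if (targets == null) {
          continue; // leaf node: no outgoing edges
        }
        for (String target : targets) {
          // Cycle detection: add() returns false for nodes that were already visited.
          if (visited.add(target)) {
            next.add(target);
          }
        }
      }
      frontier = next;
      depth++;
    }
    return visited;
  }
}
```

As in the description, the result records only that a node was reached, not
the path that led to it.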






[jira] [Commented] (SOLR-7543) Create GraphQuery that allows graph traversal as a query operator.

2015-05-14 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14544713#comment-14544713
 ] 

Dennis Gove commented on SOLR-7543:
---

For interface/semantics, I think this might be able to benefit from the 
Expression stuff recently added for streams (SOLR-7377). With that, you could 
do something like

{code}
graph(root=search(collection1, q=some query, fl=used fields), 
traverse=search(collection1, q=some dynamic query,fl=used fields), 
on=parent.field=child.field, maxDepth=5, returnRoot=true, 
returnOnlyLeaf=false)
{code}

This would also allow you to do other things like make use of stream merging, 
uniquing, etc

 Create GraphQuery that allows graph traversal as a query operator.
 --

 Key: SOLR-7543
 URL: https://issues.apache.org/jira/browse/SOLR-7543
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Kevin Watters
Priority: Minor




[jira] [Comment Edited] (SOLR-7543) Create GraphQuery that allows graph traversal as a query operator.

2015-05-14 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14544713#comment-14544713
 ] 

Dennis Gove edited comment on SOLR-7543 at 5/15/15 1:19 AM:


For interface/semantics, I think this might be able to benefit from the 
Expression stuff recently added for streams (SOLR-7377). With that, you could 
do something like

{code}
graph(root=search(collection1, q=some query, fl=used fields), 
traverse=search(collection1, q=some dynamic query,fl=used fields), 
on=parent.field=child.field, maxDepth=5, returnRoot=true, 
returnOnlyLeaf=false)
{code}

This would also allow you to do other things like make use of stream merging, 
uniquing, etc

Would even allow for tree traversal across collections.


was (Author: dpgove):
For interface/semantics, I think this might be able to benefit from the 
Expression stuff recently added for streams (SOLR-7377). With that, you could 
do something like

{code}
graph(root=search(collection1, q=some query, fl=used fields), 
traverse=search(collection1, q=some dynamic query,fl=used fields), 
on=parent.field=child.field, maxDepth=5, returnRoot=true, 
returnOnlyLeaf=false)
{code}

This would also allow you to do other things like make use of stream merging, 
uniquing, etc

 Create GraphQuery that allows graph traversal as a query operator.
 --

 Key: SOLR-7543
 URL: https://issues.apache.org/jira/browse/SOLR-7543
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Kevin Watters
Priority: Minor




[jira] [Commented] (SOLR-7548) CloudSolrStream Limits Max Results to rows Param

2015-05-15 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14545472#comment-14545472
 ] 

Dennis Gove commented on SOLR-7548:
---

That makes sense. At the moment how would I make that change to use export? Is 
it in the solrconfig.xml or as part of the incoming query?

 CloudSolrStream Limits Max Results to rows Param
 

 Key: SOLR-7548
 URL: https://issues.apache.org/jira/browse/SOLR-7548
 Project: Solr
  Issue Type: Bug
Affects Versions: Trunk
Reporter: Dennis Gove
Priority: Minor
  Labels: Streaming
 Fix For: Trunk





[jira] [Commented] (SOLR-7543) Create GraphQuery that allows graph traversal as a query operator.

2015-05-15 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14545604#comment-14545604
 ] 

Dennis Gove commented on SOLR-7543:
---

This might be crazy, but it would allow a little more flexibility on what to 
return:

return=Root | Leaf => would return documents that are either in the root or 
a leaf.
return=Root & Leaf => would return documents that are in the root and are 
leaves themselves (no children).
return=Leaf | Children>4 => would return documents that are leaves or have 
more than 4 children.

 Create GraphQuery that allows graph traversal as a query operator.
 --

 Key: SOLR-7543
 URL: https://issues.apache.org/jira/browse/SOLR-7543
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Kevin Watters
Priority: Minor

 I have a GraphQuery that I implemented a long time back that allows a user to 
 specify a startQuery to identify which documents to start graph traversal 
 from.  It then gathers up the edge ids for those documents , optionally 
 applies an additional filter.  The query is then re-executed continually 
 until no new edge ids are identified.  I am currently hosting this code up at 
 https://github.com/kwatters/solrgraph and I would like to work with the 
 community to get some feedback and ultimately get it committed back in as a 
 lucene query.
 Here's a bit more of a description of the parameters for the query / graph 
 traversal:
 q - the initial start query that identifies the universe of documents to 
 start traversal from.
 fromField - the field name that contains the node id
 toField - the name of the field that contains the edge id(s).
 traversalFilter - this is an additional query that can be supplied to limit 
 the scope of graph traversal to just the edges that satisfy the 
 traversalFilter query.
 maxDepth - integer specifying how deep the breadth first search should go.
 returnStartNodes - boolean to determine if the documents that matched the 
 original q should be returned as part of the graph.
 onlyLeafNodes - boolean that filters the graph query to only return 
 documents/nodes that have no edges.
 We identify a set of documents with q as any arbitrary lucene query.  It 
 will collect the values in the fromField, create an OR query with those 
 values, optionally apply an additional constraint from the traversalFilter, 
 and walk the result set until no new edges are detected.  Traversal can also 
 be stopped at N hops away as defined with the maxDepth.  This is a BFS 
 (Breadth First Search) algorithm.  Cycle detection is done by not revisiting 
 the same document for edge extraction.  
 This query operator does not keep track of how you arrived at the document, 
 but only that the traversal did arrive at the document.
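
The traversal described above can be sketched in memory as follows. This is only an illustration under stated assumptions, not the code hosted in the solrgraph repository: the class name `GraphTraversalSketch`, the `edgesByNode` map (node id to the edge ids stored on that document) and the convention that a negative `maxDepth` means "no limit" are all hypothetical. The OR query is modelled as a lookup of edge ids that correspond to existing documents, and cycle detection is a visited set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class GraphTraversalSketch {

  // Breadth-first traversal: each pass collects the edge ids of the current
  // frontier and keeps only documents that have not been visited before.
  static Set<String> traverse(Map<String, List<String>> edgesByNode,
                              List<String> startIds,
                              int maxDepth) {
    Set<String> visited = new LinkedHashSet<String>(startIds);
    List<String> frontier = new ArrayList<String>(visited);
    for (int depth = 0;
         !frontier.isEmpty() && (maxDepth < 0 || depth < maxDepth);
         depth++) {
      List<String> next = new ArrayList<String>();
      for (String nodeId : frontier) {
        List<String> edgeIds = edgesByNode.get(nodeId);
        if (edgeIds == null) {
          continue; // unknown document, nothing to follow
        }
        for (String edgeId : edgeIds) {
          // visited.add returns false for documents already seen, which is
          // what stops the traversal from looping on cycles
          if (edgesByNode.containsKey(edgeId) && visited.add(edgeId)) {
            next.add(edgeId);
          }
        }
      }
      frontier = next;
    }
    return visited;
  }

  // Hypothetical sample data: a -> b, c; b -> d; c -> a (a cycle); d is a leaf.
  static Map<String, List<String>> sampleGraph() {
    Map<String, List<String>> g = new HashMap<String, List<String>>();
    g.put("a", Arrays.asList("b", "c"));
    g.put("b", Arrays.asList("d"));
    g.put("c", Arrays.asList("a"));
    g.put("d", Collections.<String>emptyList());
    return g;
  }
}
```

As in the description, the sketch records only that a document was reached, not the path that led to it.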



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-7543) Create GraphQuery that allows graph traversal as a query operator.

2015-05-15 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546062#comment-14546062
 ] 

Dennis Gove commented on SOLR-7543:
---

I'm with you on wanting to keep the memory usage as low as possible - I thought 
maybe you had that info hanging around already. In either case, I think this 
syntax might lower the bar to entry for usage, especially if people are already 
using streaming aggregation for other things. 




[jira] [Comment Edited] (SOLR-7543) Create GraphQuery that allows graph traversal as a query operator.

2015-05-15 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546062#comment-14546062
 ] 

Dennis Gove edited comment on SOLR-7543 at 5/15/15 7:39 PM:


I'm with you on wanting to keep the memory usage as low as possible - I thought 
maybe you had that info hanging around already. In either case, I think this 
syntax might lower the bar to entry for usage, especially if people are already 
using streaming aggregation for other things. 


was (Author: dpgove):
I'm with on the wanting to keep the memory usage as low as possible - I thought 
maybe you had that info hanging around already. In either case, I think this 
syntax might lower the bar to entry for usage, especially if people are already 
using streaming aggregation for other things. 




[jira] [Updated] (SOLR-7528) Simplify Interfaces used in Streaming Expressions

2015-05-12 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7528:
--
Attachment: SOLR-7528.patch

 Simplify Interfaces used in Streaming Expressions
 -

 Key: SOLR-7528
 URL: https://issues.apache.org/jira/browse/SOLR-7528
 Project: Solr
  Issue Type: Improvement
  Components: SolrJ
Affects Versions: Trunk, 5.2
Reporter: Dennis Gove
Priority: Minor
 Fix For: Trunk, 5.2

 Attachments: SOLR-7528.patch


 FieldComparator and StreamComparator have been collapsed into a single class 
 StreamComparator. There was no need for a separate abstract class.
 Added null checks in StreamComparator. For now if both are null then they 
 will evaluate to equal. We can add a later enhancement under a new ticket to 
 make that configurable.
 Interfaces ExpressibleStream and ExpressibleComparator have been collapsed 
 into interface Expressible. They defined the same interface and there's no 
 reason to have separate interfaces for them.
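
The null handling described above could look roughly like the following. This is a sketch, not the patched StreamComparator: the class name `NullSafeFieldComparatorSketch`, the use of a plain `Map` in place of a tuple, and the choice to order a single null before any value are assumptions made for illustration. Only the "both null evaluates to equal" rule comes from the ticket.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

class NullSafeFieldComparatorSketch implements Comparator<Map<String, Object>> {
  private final String field;
  private final boolean ascending;

  NullSafeFieldComparatorSketch(String field, boolean ascending) {
    this.field = field;
    this.ascending = ascending;
  }

  @Override
  @SuppressWarnings({"unchecked", "rawtypes"})
  public int compare(Map<String, Object> left, Map<String, Object> right) {
    Comparable l = (Comparable) left.get(field);
    Comparable r = (Comparable) right.get(field);
    if (l == null && r == null) {
      return 0; // rule stated in the ticket: two nulls evaluate to equal
    }
    if (l == null) {
      return -1; // assumption: a lone null sorts first, whatever the direction
    }
    if (r == null) {
      return 1;
    }
    int result = l.compareTo(r);
    return ascending ? result : -result;
  }

  // Hypothetical helper that builds a one-field tuple for experiments.
  static Map<String, Object> tuple(String field, Object value) {
    Map<String, Object> t = new HashMap<String, Object>();
    t.put(field, value);
    return t;
  }
}
```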






[jira] [Updated] (SOLR-7513) Add Equalitors to Streaming Expressions

2015-05-12 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7513:
--
Attachment: SOLR-7513.patch

Modified Equalitor interface to more closely mirror Java 8's BiPredicate. I'm 
not using BiPredicate because this should be back-ported into 5.2 and as such 
needs to be Java 7 compatible.

Depends on SOLR-7377, SOLR-7524, and SOLR-7528.

 Add Equalitors to Streaming Expressions
 ---

 Key: SOLR-7513
 URL: https://issues.apache.org/jira/browse/SOLR-7513
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Affects Versions: Trunk
Reporter: Dennis Gove
Priority: Minor
 Attachments: SOLR-7513.patch, SOLR-7513.patch


 Right now all streams use the Comparator<Tuple> interface to compare tuples. 
 The Comparator interface will tell you if tupleA is before, after, or equal 
 to tupleB. This is great for most streams as they use this logic when 
 combining multiple streams together. However, some streams only care about 
 the equality of two tuples and the less/greater than logic is unnecessary.
 This depends on SOLR-7377.
 This patch is to introduce a new interface into streaming expressions called 
 Equalitor<Tuple> which reports whether two tuples are equal. The benefit here 
 is that the expressions for streams using Equalitor instead of Comparator can 
 omit the ordering part.
 {code}
 unique(somestream, over=fieldA asc, fieldB desc)
 {code}
 can become
 {code}
 unique(somestream, over=fieldA,fieldB)
 {code}
 The added benefit is that this will set us up with simpler expressions for 
 joins (hash, merge, inner, outer, etc...) as those only care about equality.
 By adding this as an interface we make no assumptions about what it means to 
 be equal, just that some implementation needs to exist adhering to the 
 Equalitor<Tuple> interface which will determine if two tuples are logically 
 equal. 
 We do define at least one concrete class which checks for equality but that 
 does not preclude others from adding additional concrete classes with their 
 own logic in place.
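
A minimal sketch of the idea, kept Java 7 compatible as the update comment requires. The names `EqualitorSketch` and `FieldEqualitorSketch`, the method name `test` (borrowed from Java 8's BiPredicate) and the use of a plain `Map` in place of a tuple are assumptions, not the interface from the patch. The `unique` helper further assumes that the incoming tuples are already sorted so that equal tuples are adjacent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Equality-only counterpart of a comparator: no before/after ordering.
interface EqualitorSketch<T> {
  boolean test(T left, T right);
}

class FieldEqualitorSketch implements EqualitorSketch<Map<String, Object>> {
  private final List<String> fields;

  FieldEqualitorSketch(String... fields) {
    this.fields = Arrays.asList(fields);
  }

  @Override
  public boolean test(Map<String, Object> left, Map<String, Object> right) {
    for (String field : fields) {
      Object l = left.get(field);
      Object r = right.get(field);
      if (l == null ? r != null : !l.equals(r)) {
        return false;
      }
    }
    return true;
  }

  // Roughly what unique(somestream, over=fieldA,fieldB) needs: keep a tuple
  // only when it differs from the tuple emitted just before it.
  static List<Map<String, Object>> unique(List<Map<String, Object>> sorted,
                                          EqualitorSketch<Map<String, Object>> eq) {
    List<Map<String, Object>> out = new ArrayList<Map<String, Object>>();
    Map<String, Object> previous = null;
    for (Map<String, Object> current : sorted) {
      if (previous == null || !eq.test(previous, current)) {
        out.add(current);
      }
      previous = current;
    }
    return out;
  }

  // Hypothetical helper that builds a three-field tuple for experiments.
  static Map<String, Object> tuple(Object fieldA, Object fieldB, Object id) {
    Map<String, Object> t = new HashMap<String, Object>();
    t.put("fieldA", fieldA);
    t.put("fieldB", fieldB);
    t.put("id", id);
    return t;
  }
}
```

Because only equality is asked for, the caller never has to say `asc` or `desc`, which is what allows the shorter `over=fieldA,fieldB` form.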






[jira] [Created] (SOLR-7528) Simplify Interfaces used in Streaming Expressions

2015-05-12 Thread Dennis Gove (JIRA)
Dennis Gove created SOLR-7528:
-

 Summary: Simplify Interfaces used in Streaming Expressions
 Key: SOLR-7528
 URL: https://issues.apache.org/jira/browse/SOLR-7528
 Project: Solr
  Issue Type: Improvement
  Components: SolrJ
Affects Versions: Trunk, 5.2
Reporter: Dennis Gove
Priority: Minor
 Fix For: Trunk, 5.2


FieldComparator and StreamComparator have been collapsed into a single class 
StreamComparator. There was no need for a separate abstract class.

Added null checks in StreamComparator. For now if both are null then they will 
evaluate to equal. We can add a later enhancement under a new ticket to make 
that configurable.

Interfaces ExpressibleStream and ExpressibleComparator have been collapsed into 
interface Expressible. They defined the same interface and there's no reason to 
have separate interfaces for them.






[jira] [Created] (SOLR-7377) SOLR Streaming Expressions

2015-04-10 Thread Dennis Gove (JIRA)
Dennis Gove created SOLR-7377:
-

 Summary: SOLR Streaming Expressions
 Key: SOLR-7377
 URL: https://issues.apache.org/jira/browse/SOLR-7377
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Reporter: Dennis Gove
Priority: Minor
 Fix For: Trunk


It would be beneficial to add an expression-based interface to Streaming API 
described in SOLR-6526. Right now that API requires streaming requests to come 
in from clients as serialized bytecode of the streaming classes. The suggestion 
here is to support string expressions which describe the streaming operations 
the client wishes to perform. 

{code:java}
search(collection1, q=*:*, fl=id,fieldA,fieldB, sort=fieldA asc)
{code}

With this syntax in mind, one can now express arbitrarily complex stream 
queries with a single string.

{code:java}
// merge two distinct searches together on common fields
merge(
  search(collection1, q=id:(0 3 4), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc),
  search(collection2, q=id:(1 2), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc),
  on=a_f asc, a_s asc)

// find top 20 unique records of a search
top(
  n=20,
  unique(
    search(collection1, q=*:*, fl=id,a_s,a_i,a_f, sort=a_f desc),
    over=a_f desc),
  over=a_f desc)
{code}

The syntax would support
1. Configurable expression names (eg. via solrconfig.xml one can map unique 
to a class implementing a Unique stream class) This allows users to build their 
own streams and use as they wish.
2. Named parameters (of both simple and expression types)
3. Unnamed, type-matched parameters (to support requiring N streams as 
arguments to another stream)
4. Positional parameters

The main goal here is to make streaming as accessible as possible and define a 
syntax for running complex queries across large distributed systems.
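
To make the parameter kinds listed above concrete, here is a rough sketch of how such a string might be broken into a function name, positional parameters, nested expressions and named parameters. It is not the parser from the patch. The class `ExpressionSketch` and its fields are hypothetical, named values are kept as raw text, and the rule that a bare token following a named parameter continues that parameter's value (so that `sort=a_f asc, a_s asc` stays together) is an assumption made only to cope with the unquoted commas in the examples.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ExpressionSketch {
  final String name;
  // Each entry is either a String or a nested ExpressionSketch.
  final List<Object> positional = new ArrayList<Object>();
  final Map<String, String> named = new LinkedHashMap<String, String>();

  ExpressionSketch(String name) {
    this.name = name;
  }

  static ExpressionSketch parse(String text) {
    String s = text.trim();
    int open = s.indexOf('(');
    if (open <= 0 || !s.endsWith(")")) {
      throw new IllegalArgumentException("not a function call: " + text);
    }
    ExpressionSketch expr = new ExpressionSketch(s.substring(0, open).trim());
    String lastNamed = null;
    for (String raw : splitTopLevel(s.substring(open + 1, s.length() - 1))) {
      String token = raw.trim();
      if (token.isEmpty()) {
        continue;
      }
      int eq = token.indexOf('=');
      int paren = token.indexOf('(');
      if (eq > 0 && (paren < 0 || eq < paren)) {
        // name=value, where the value may itself contain parentheses
        lastNamed = token.substring(0, eq).trim();
        expr.named.put(lastNamed, token.substring(eq + 1).trim());
      } else if (paren > 0 && token.endsWith(")")) {
        // unnamed nested expression such as search(...)
        expr.positional.add(parse(token));
        lastNamed = null;
      } else if (lastNamed != null) {
        // assumption: a bare token extends the previous named value
        expr.named.put(lastNamed, expr.named.get(lastNamed) + "," + token);
      } else {
        expr.positional.add(token);
      }
    }
    return expr;
  }

  // Splits on commas that are not enclosed in parentheses.
  static List<String> splitTopLevel(String args) {
    List<String> parts = new ArrayList<String>();
    int depth = 0;
    int start = 0;
    for (int i = 0; i < args.length(); i++) {
      char c = args.charAt(i);
      if (c == '(') {
        depth++;
      } else if (c == ')') {
        depth--;
      } else if (c == ',' && depth == 0) {
        parts.add(args.substring(start, i));
        start = i + 1;
      }
    }
    parts.add(args.substring(start));
    return parts;
  }
}
```

Parsing the `merge(...)` example with this sketch yields two nested `search` expressions as positional parameters and a single named parameter `on`. A real grammar would more likely require quoting for values that contain commas rather than rely on the continuation rule.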






[jira] [Updated] (SOLR-7377) SOLR Streaming Expressions

2015-04-10 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7377:
--
Attachment: SOLR-7377.patch

First-pass patch. Looking for initial feedback.




[jira] [Updated] (SOLR-7377) SOLR Streaming Expressions

2015-04-10 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7377:
--
Description: 
It would be beneficial to add an expression-based interface to Streaming API 
described in SOLR-6526. Right now that API requires streaming requests to come 
in from clients as serialized bytecode of the streaming classes. The suggestion 
here is to support string expressions which describe the streaming operations 
the client wishes to perform. 

{code:java}
search(collection1, q=*:*, fl=id,fieldA,fieldB, sort=fieldA asc)
{code}

With this syntax in mind, one can now express arbitrarily complex stream 
queries with a single string.

{code:java}
// merge two distinct searches together on common fields
merge(
  search(collection1, q=id:(0 3 4), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc),
  search(collection2, q=id:(1 2), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc),
  on=a_f asc, a_s asc)

// find top 20 unique records of a search
top(
  n=20,
  unique(
    search(collection1, q=*:*, fl=id,a_s,a_i,a_f, sort=a_f desc),
    over=a_f desc),
  sort=a_f desc)
{code}

The syntax would support
1. Configurable expression names (eg. via solrconfig.xml one can map unique 
to a class implementing a Unique stream class) This allows users to build their 
own streams and use as they wish.
2. Named parameters (of both simple and expression types)
3. Unnamed, type-matched parameters (to support requiring N streams as 
arguments to another stream)
4. Positional parameters

The main goal here is to make streaming as accessible as possible and define a 
syntax for running complex queries across large distributed systems.

  was:
It would be beneficial to add an expression-based interface to Streaming API 
described in SOLR-6526. Right now that API requires streaming requests to come 
in from clients as serialized bytecode of the streaming classes. The suggestion 
here is to support string expressions which describe the streaming operations 
the client wishes to perform. 

{code:java}
search(collection1, q=*:*, fl=id,fieldA,fieldB, sort=fieldA asc)
{code}

With this syntax in mind, one can now express arbitrarily complex stream 
queries with a single string.

{code:java}
// merge two distinct searches together on common fields
merge(
  search(collection1, q=id:(0 3 4), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc),
  search(collection2, q=id:(1 2), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc),
  on=a_f asc, a_s asc)

// find top 20 unique records of a search
top(
  n=20,
  unique(
    search(collection1, q=*:*, fl=id,a_s,a_i,a_f, sort=a_f desc),
    over=a_f desc),
  over=a_f desc)
{code}

The syntax would support
1. Configurable expression names (eg. via solrconfig.xml one can map unique 
to a class implementing a Unique stream class) This allows users to build their 
own streams and use as they wish.
2. Named parameters (of both simple and expression types)
3. Unnamed, type-matched parameters (to support requiring N streams as 
arguments to another stream)
4. Positional parameters

The main goal here is to make streaming as accessible as possible and define a 
syntax for running complex queries across large distributed systems.



[jira] [Commented] (SOLR-7275) Pluggable authorization module in Solr

2015-04-10 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14490733#comment-14490733
 ] 

Dennis Gove commented on SOLR-7275:
---

I like this concept but I think the response can be expanded to add a bit more 
functionality. It would be nice if the pluggable security layer could respond 
in such a way as to not wholly reject a request but to instead restrict what is 
returned from a request. It could accomplish this by providing additional 
filters to apply to a request.

{code}
public class SolrAuthorizationResponse {
  boolean authorized;
  String additionalFilterQuery;

  ...
}
{code}

By adding additionalFilterQuery, this would give the security layer an 
opportunity to say, "yup, you're authorized but you can't see records matching 
this filter" or "yup, you're authorized but you can only see records also 
matching this filter". It provides a way to add fine-grained control of data 
access but keep that control completely outside of SOLR (as it would live in 
the pluggable security layer).

Additionally, it allows the security layer to add fine-grained control 
**without notifying the user they are being restricted** as this lives wholly 
in the SOLR <-> security layer communication. There are times when telling 
the user their request was rejected due to it returning records they're not 
privileged to see actually gives the user some information you may not want 
them to know - the fact that these restricted records even exist. Instead, by 
adding filters and just not returning records the user isn't privileged for, 
the user is none the wiser that they were restricted at all.
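
One way the dispatching side might apply such a response is sketched below. Everything here is hypothetical: the class `AuthorizationResponseSketch`, the helper `effectiveFilters` and the sample filter strings are illustrations only, and the sketch relies on the existing Solr behaviour that multiple filter queries on a request are intersected.

```java
import java.util.ArrayList;
import java.util.List;

class AuthorizationResponseSketch {
  final boolean authorized;
  final String additionalFilterQuery; // null when no restriction is needed

  AuthorizationResponseSketch(boolean authorized, String additionalFilterQuery) {
    this.authorized = authorized;
    this.additionalFilterQuery = additionalFilterQuery;
  }

  // Returns the filter queries to run: the ones the user asked for plus the
  // one supplied by the security layer. The user never sees the extra filter.
  static List<String> effectiveFilters(List<String> requested,
                                       AuthorizationResponseSketch response) {
    if (!response.authorized) {
      throw new SecurityException("request rejected by the authorization plugin");
    }
    List<String> filters = new ArrayList<String>(requested);
    if (response.additionalFilterQuery != null
        && !response.additionalFilterQuery.isEmpty()) {
      filters.add(response.additionalFilterQuery);
    }
    return filters;
  }
}
```

A negative filter such as `-classification:secret` would express "you can't see records matching this filter", while a positive one such as `owner:alice` would express "you can only see records also matching this filter".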

 Pluggable authorization module in Solr
 --

 Key: SOLR-7275
 URL: https://issues.apache.org/jira/browse/SOLR-7275
 Project: Solr
  Issue Type: Sub-task
Reporter: Anshum Gupta
Assignee: Anshum Gupta

 Solr needs an interface that makes it easy for different authorization 
 systems to be plugged into it. Here's what I plan on doing:
 Define an interface {{SolrAuthorizationPlugin}} with one single method 
 {{isAuthorized}}. This would take in a {{SolrRequestContext}} object and 
 return a {{SolrAuthorizationResponse}} object. The object as of now would 
 only contain a single boolean value but in the future could contain more 
 information e.g. ACL for document filtering etc.
 The reason why we need a context object is so that the plugin doesn't need to 
 understand Solr's capabilities e.g. how to extract the name of the collection 
 or other information from the incoming request as there are multiple ways to 
 specify the target collection for a request. Similarly request type can be 
 specified by {{qt}} or {{/handler_name}}.
 Flow:
 Request -> SolrDispatchFilter -> isAuthorized(context) -> Process/Return.
 {code}
 public interface SolrAuthorizationPlugin {
   public SolrAuthorizationResponse isAuthorized(SolrRequestContext context);
 }
 {code}
 {code}
 public class SolrRequestContext {
   UserInfo userInfo; // Will contain user context from the authentication layer.
   HTTPRequest request;
   Enum OperationType; // Correlated with user roles.
   String[] CollectionsAccessed;
   String[] FieldsAccessed;
   String Resource;
 }
 {code}
 {code}
 public class SolrAuthorizationResponse {
   boolean authorized;
   public boolean isAuthorized() { return authorized; }
 }
 {code}
 User Roles: 
 * Admin
 * Collection Level:
   * Query
   * Update
   * Admin
 Using this framework, an implementation could be written for specific 
 security systems e.g. Apache Ranger or Sentry. It would keep all the security 
 system specific code out of Solr.






[jira] [Commented] (SOLR-7560) Parallel SQL Support

2015-06-09 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578750#comment-14578750
 ] 

Dennis Gove commented on SOLR-7560:
---

Possible expression syntax for the RollupStream

{code}
rollup(
  someStream(),
  over=fieldA, fieldB, fieldC,
  min(fieldA),
  max(fieldA),
  min(fieldB),
  mean(fieldD),
  sum(fieldC)
)
{code}

This would require making the *Metric types Expressible, but I think that ends 
up as a good thing. It would make it easy to support other options on metrics, 
such as excluding outliers. For example, the sum of values within 3 standard 
deviations of the mean could be 
{code}
sum(fieldC, limit=standardDev(3))
{code}
 (note, how that particular calculation could be implemented is left as an 
exercise for the reader, I'm just using it as an example of adding additional 
options on a relatively simple metric).
Another option example is what to do with null values. For example, in some 
cases a null should not impact a mean but in others it should. You could 
express those as
{code}
mean(fieldA, replace(null, 0))  // replace null values with 0 thus leading to an impact on the mean
mean(fieldA, includeNull=true) // nulls are counted in the denominator but nothing added to numerator
mean(fieldA, includeNull=false) // nulls neither counted in denominator nor added to numerator
mean(fieldA, replace(null, fieldB), includeNull=true) // if fieldA is null replace it with fieldB, include null fieldB in mean
{code}
so on and so forth.
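
The difference between the null-handling options can be sketched with plain lists. This is not RollupStream code: `MeanSketch`, `mean` and `replaceNull` are hypothetical names, returning NaN for an empty denominator is an assumption, and the `replace(null, fieldB)` variant that pulls the replacement from another field is not modelled.

```java
import java.util.ArrayList;
import java.util.List;

class MeanSketch {

  // includeNull=true: nulls count in the denominator but add nothing to the
  // numerator. includeNull=false: nulls are skipped entirely.
  static double mean(List<Double> values, boolean includeNull) {
    double sum = 0;
    int count = 0;
    for (Double value : values) {
      if (value == null) {
        if (includeNull) {
          count++;
        }
        continue;
      }
      sum += value;
      count++;
    }
    return count == 0 ? Double.NaN : sum / count;
  }

  // replace(null, constant): substitute a fixed value before aggregating.
  static List<Double> replaceNull(List<Double> values, double replacement) {
    List<Double> out = new ArrayList<Double>(values.size());
    for (Double value : values) {
      out.add(value == null ? Double.valueOf(replacement) : value);
    }
    return out;
  }
}
```

For the values 2, null and 4, skipping nulls gives a mean of 3 while counting them gives 2, which is also what replacing null with 0 would produce.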

 Parallel SQL Support
 

 Key: SOLR-7560
 URL: https://issues.apache.org/jira/browse/SOLR-7560
 Project: Solr
  Issue Type: New Feature
  Components: clients - java, search
Reporter: Joel Bernstein
 Fix For: 5.3

 Attachments: SOLR-7560.patch


 This ticket provides support for executing *Parallel SQL* queries across 
 SolrCloud collections. The SQL engine will be built on top of the Streaming 
 API (SOLR-7082), which provides support for *parallel relational algebra* and 
 *real-time map-reduce*.
 Basic design:
 1) A new SQLHandler will be added to process SQL requests. The SQL statements 
 will be compiled to live Streaming API objects for parallel execution across 
 SolrCloud worker nodes.
 2) SolrCloud collections will be abstracted as *Relational Tables*. 
 3) The Presto SQL parser will be used to parse the SQL statements.
 4) A JDBC thin client will be added as a Solrj client.
 This ticket will focus on putting the framework in place and providing basic 
 SELECT support and GROUP BY aggregate support.
 Future releases will build on this framework to provide additional SQL 
 features.






[jira] [Created] (SOLR-7707) Add StreamExpression Support to RollupStream

2015-06-19 Thread Dennis Gove (JIRA)
Dennis Gove created SOLR-7707:
-

 Summary: Add StreamExpression Support to RollupStream
 Key: SOLR-7707
 URL: https://issues.apache.org/jira/browse/SOLR-7707
 Project: Solr
  Issue Type: Improvement
  Components: SolrJ
Reporter: Dennis Gove
Priority: Minor


This ticket is to add Stream Expression support to the RollupStream as 
discussed in SOLR-7560.

Proposed expression syntax for the RollupStream (copied from that ticket)

{code}
rollup(
  someStream(),
  over=fieldA, fieldB, fieldC,
  min(fieldA),
  max(fieldA),
  min(fieldB),
  mean(fieldD),
  sum(fieldC)
)
{code}

This requires making the *Metric types Expressible, but I think that ends up as 
a good thing. It would make it easy to support other options on metrics, such 
as excluding outliers. For example, the sum of values within 3 standard 
deviations of the mean could be 
{code}
sum(fieldC, limit=standardDev(3))
{code}
 (note, how that particular calculation could be implemented is left as an 
exercise for the reader, I'm just using it as an example of adding additional 
options on a relatively simple metric).
Another option example is what to do with null values. For example, in some 
cases a null should not impact a mean but in others it should. You could 
express those as
{code}
mean(fieldA, replace(null, 0))  // replace null values with 0 thus leading to an impact on the mean
mean(fieldA, includeNull=true) // nulls are counted in the denominator but nothing added to numerator
mean(fieldA, includeNull=false) // nulls neither counted in denominator nor added to numerator
mean(fieldA, replace(null, fieldB), includeNull=true) // if fieldA is null replace it with fieldB, include null fieldB in mean
{code}
so on and so forth.






[jira] [Commented] (SOLR-7513) Add Equalitors to Streaming Expressions

2015-06-19 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593800#comment-14593800
 ] 

Dennis Gove commented on SOLR-7513:
---

I appreciate the help with this, Joel. Thanks!

 Add Equalitors to Streaming Expressions
 ---

 Key: SOLR-7513
 URL: https://issues.apache.org/jira/browse/SOLR-7513
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Affects Versions: Trunk
Reporter: Dennis Gove
Assignee: Joel Bernstein
Priority: Minor
 Fix For: 5.3

 Attachments: SOLR-7513.patch, SOLR-7513.patch, SOLR-7513.patch, 
 SOLR-7513.patch


 Right now all streams use the ComparatorTuple interface to compare tuples. 
 The Comparator interface will tell you if tupleA is before, after, or equal 
 to tupleB. This is great for most streams as they use this logic when 
 combining multiple streams together. However, some streams only care about 
 the equality of two tuples and the less/greater than logic is unnecessary.
 This depends on SOLR-7377.
 This patch is to introduce a new interface into streaming expressions called 
 EqualitorTuple which will return if two tuples are equal. The benefit here 
 is that the expressions for streams using Equalitor instead of Comparator can 
 omit the ordering part.
 {code}
 unique(somestream, over=fieldA asc, fieldB desc)
 {code}
 can become
 {code}
 unique(somestream, over=fieldA,fieldB)
 {code}
 The added benefit is that this will set us up with simpler expressions for 
 joins (hash, merge, inner, outer, etc.), as those only care about equality.
 By adding this as an interface we make no assumptions about what it means to 
 be equal, just that some implementation needs to exist adhering to the 
 Equalitor<Tuple> interface which will determine if two tuples are logically 
 equal. 
 We do define at least one concrete class which checks for equality, but that 
 does not preclude others from adding additional concrete classes with their 
 own logic in place.
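
 A minimal sketch of the idea, assuming tuples are modelled as plain Maps. The
 nested Equalitor interface, its test method, and the helper names below are
 illustrative assumptions, not the classes in the attached patch.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class EqualitorSketch {
    // Hypothetical interface: the name follows the proposal, the method name is an assumption.
    interface Equalitor<T> {
        boolean test(T left, T right);
    }

    // Equality over a list of field names; no asc/desc direction is involved.
    static Equalitor<Map<String, Object>> fieldsEqual(List<String> fields) {
        return (left, right) -> {
            for (String field : fields) {
                if (!Objects.equals(left.get(field), right.get(field))) {
                    return false;
                }
            }
            return true;
        };
    }

    // Any Comparator can be adapted: "compares to zero" is treated as equal.
    static <T> Equalitor<T> fromComparator(Comparator<T> comparator) {
        return (left, right) -> comparator.compare(left, right) == 0;
    }

    public static void main(String[] args) {
        Map<String, Object> a = Map.of("id", "doc1", "fieldA", 1, "fieldB", "x");
        Map<String, Object> b = Map.of("id", "doc2", "fieldA", 1, "fieldB", "x");
        System.out.println(fieldsEqual(List.of("fieldA", "fieldB")).test(a, b)); // true
        System.out.println(fieldsEqual(List.of("id")).test(a, b));               // false
    }
}
```

 The fromComparator adapter shows why existing Comparator-based streams could
 keep working: an ordering always implies an equality test, but not the reverse.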






[jira] [Updated] (SOLR-7669) Add SelectStream to Streaming API

2015-06-11 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7669:
--
Attachment: SOLR-7669.patch

 Add SelectStream to Streaming API
 -

 Key: SOLR-7669
 URL: https://issues.apache.org/jira/browse/SOLR-7669
 Project: Solr
  Issue Type: Improvement
  Components: SolrJ
Reporter: Dennis Gove
Priority: Minor
  Labels: Streaming
 Attachments: SOLR-7669.patch


 Adds a new stream called SelectStream which can be used for two purposes.
  1. Limit the set of fields included in an outgoing tuple to remove unwanted 
 fields
  2. Provide aliases for fields. With this it acts as an alternative to the 
 CloudSolrStream's 'aliases' option.
  For example, in a simple case
 {code}
 select(
   id, 
   fieldA_i as fieldA, 
   fieldB_s as fieldB,
   search(collection1, q=*:*, fl=id,fieldA_i,fieldB_s, sort=fieldA_i asc, 
 fieldB_s asc, id asc)
 )
 {code}
 This can also be used as part of complex expressions to help keep track of 
 what is being worked on. This is particularly useful when merging/joining 
 multiple collections which share field names. For example, the following 
 results in a set of tuples including only the fields id, left.ident, and 
 right.ident even though the total set of fields required to perform the 
 search and join is much larger than just those three fields.
 {code}
 select(
   id, left.ident, right.ident,
   innerJoin(
     select(
       id, join1_i as left.join1, join2_s as left.join2, ident_s as left.ident,
       search(collection1, q=side_s:left, fl=id,join1_i,join2_s,ident_s, 
         sort=join1_i asc, join2_s asc, id asc)
     ),
     select(
       join3_i as right.join1, join2_s as right.join2, ident_s as right.ident,
       search(collection1, q=side_s:right, fl=join3_i,join2_s,ident_s, 
         sort=join3_i asc, join2_s asc)
     ),
     on=left.join1=right.join1, left.join2=right.join2
   )
 )
 {code}
 This depends on SOLR-7584.






[jira] [Created] (SOLR-7669) Add SelectStream to Streaming API

2015-06-11 Thread Dennis Gove (JIRA)
Dennis Gove created SOLR-7669:
-

 Summary: Add SelectStream to Streaming API
 Key: SOLR-7669
 URL: https://issues.apache.org/jira/browse/SOLR-7669
 Project: Solr
  Issue Type: Improvement
  Components: SolrJ
Reporter: Dennis Gove
Priority: Minor


Adds a new stream called SelectStream which can be used for two purposes.
 1. Limit the set of fields included in an outgoing tuple to remove unwanted 
fields
 2. Provide aliases for fields. With this it acts as an alternative to the 
CloudSolrStream's 'aliases' option.

 For example, in a simple case
{code}
select(
  id, 
  fieldA_i as fieldA, 
  fieldB_s as fieldB,
  search(collection1, q=*:*, fl=id,fieldA_i,fieldB_s, sort=fieldA_i asc, 
fieldB_s asc, id asc)
)
{code}

This can also be used as part of complex expressions to help keep track of what 
is being worked on. This is particularly useful when merging/joining multiple 
collections which share field names. For example, the following results in a 
set of tuples including only the fields id, left.ident, and right.ident even 
though the total set of fields required to perform the search and join is much 
larger than just those three fields.
{code}
select(
  id, left.ident, right.ident,
  innerJoin(
    select(
      id, join1_i as left.join1, join2_s as left.join2, ident_s as left.ident,
      search(collection1, q=side_s:left, fl=id,join1_i,join2_s,ident_s, 
        sort=join1_i asc, join2_s asc, id asc)
    ),
    select(
      join3_i as right.join1, join2_s as right.join2, ident_s as right.ident,
      search(collection1, q=side_s:right, fl=join3_i,join2_s,ident_s, 
        sort=join3_i asc, join2_s asc)
    ),
    on=left.join1=right.join1, left.join2=right.join2
  )
)
{code}

This depends on SOLR-7584.
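
To make the two purposes concrete, here is a rough sketch of what a select does 
to a single tuple, assuming tuples are plain Maps. SelectSketch and its select 
method are hypothetical names for illustration and are not taken from the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SelectSketch {
    // selection maps a source field name to its outgoing name
    // (the two are identical when no alias is given).
    static Map<String, Object> select(Map<String, Object> tuple, Map<String, String> selection) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : selection.entrySet()) {
            // Fields not named in the selection are dropped from the outgoing tuple.
            if (tuple.containsKey(entry.getKey())) {
                out.put(entry.getValue(), tuple.get(entry.getKey()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> selection = new LinkedHashMap<>();
        selection.put("id", "id");
        selection.put("fieldA_i", "fieldA");
        selection.put("fieldB_s", "fieldB");

        Map<String, Object> tuple =
                Map.of("id", "doc1", "fieldA_i", 7, "fieldB_s", "x", "extra_s", "dropped");
        System.out.println(select(tuple, selection)); // {id=doc1, fieldA=7, fieldB=x}
    }
}
```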






[jira] [Updated] (SOLR-7584) Add Joins to the Streaming API and Streaming Expressions

2015-06-02 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7584:
--
Attachment: SOLR-7584.patch

Adds LeftOuterJoinStream to support left outer joins w/tests (work done by 
Corey Wu).
Moves some functions from InnerJoinStream up to parent classes as they are 
shared in LeftOuterJoinStream.

 Add Joins to the Streaming API and Streaming Expressions
 

 Key: SOLR-7584
 URL: https://issues.apache.org/jira/browse/SOLR-7584
 Project: Solr
  Issue Type: Improvement
  Components: SolrJ
Reporter: Dennis Gove
Priority: Minor
  Labels: Streaming
 Attachments: SOLR-7584.patch, SOLR-7584.patch


 Add InnerJoinStream, LeftOuterJoinStream, and supporting classes to the 
 Streaming API to allow for joining between sub-streams.
 At its most basic, it would look something like this
 {code}
 innerJoin(
   search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...),
   search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...),
   on=fieldA=fieldA
 )
 {code}
 or with multi-field on clauses
 {code}
 innerJoin(
   search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...),
   search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...),
   on=fieldA=fieldA, fieldB=fieldD
 )
 {code}
 I'd also like to support the option of doing a hash join instead of the 
 default merge join but I haven't yet figured out the best way to express 
 that. I'd like to let the user tell us which sub-stream should be hashed (the 
 least-cost one).
 Also, I've been thinking about field aliasing and might want to add a 
 SelectStream which serves the purpose of allowing us to limit the fields 
 coming out and rename fields.
 Depends on SOLR-7554






[jira] [Updated] (SOLR-7584) Add Joins to the Streaming API and Streaming Expressions

2015-06-02 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7584:
--
Attachment: SOLR-7584.patch

Missed a single line in my diff that corrected a throw statement. Sorry for the 
double upload.

 Add Joins to the Streaming API and Streaming Expressions
 

 Key: SOLR-7584
 URL: https://issues.apache.org/jira/browse/SOLR-7584
 Project: Solr
  Issue Type: Improvement
  Components: SolrJ
Reporter: Dennis Gove
Priority: Minor
  Labels: Streaming
 Attachments: SOLR-7584.patch, SOLR-7584.patch, SOLR-7584.patch


 Add InnerJoinStream, LeftOuterJoinStream, and supporting classes to the 
 Streaming API to allow for joining between sub-streams.
 At its most basic, it would look something like this
 {code}
 innerJoin(
   search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...),
   search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...),
   on=fieldA=fieldA
 )
 {code}
 or with multi-field on clauses
 {code}
 innerJoin(
   search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...),
   search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...),
   on=fieldA=fieldA, fieldB=fieldD
 )
 {code}
 I'd also like to support the option of doing a hash join instead of the 
 default merge join but I haven't yet figured out the best way to express 
 that. I'd like to let the user tell us which sub-stream should be hashed (the 
 least-cost one).
 Also, I've been thinking about field aliasing and might want to add a 
 SelectStream which serves the purpose of allowing us to limit the fields 
 coming out and rename fields.
 Depends on SOLR-7554






[jira] [Commented] (SOLR-7621) Frequent 500 error IOExceptions from StreamingExpressions

2015-06-01 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568340#comment-14568340
 ] 

Dennis Gove commented on SOLR-7621:
---

Note that SOLR-7528 adds null checks in the Comparators to allow for null 
values in the sort fields. It considers two null values to be equal. The 
intention is that in a future enhancement we can support a configurable 
approach to how we treat nulls.
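
As a rough illustration of that null check (not the SOLR-7528 code itself), a 
null-tolerant field comparator could look like the sketch below. Treating two 
nulls as equal comes from the comment above; sorting a null before a non-null 
value is purely an assumption of this sketch, and the class name is made up.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

final class NullSafeFieldComparator {
    // Compares two tuples on a single field in ascending order.
    static Comparator<Map<String, Object>> ascending(String field) {
        return (left, right) -> {
            Object l = left.get(field);
            Object r = right.get(field);
            if (l == null && r == null) {
                return 0;   // two nulls are considered equal
            }
            if (l == null) {
                return -1;  // assumption: null sorts before any value
            }
            if (r == null) {
                return 1;
            }
            // Assumes both values are mutually comparable (e.g. both Doubles).
            @SuppressWarnings("unchecked")
            Comparable<Object> comparable = (Comparable<Object>) l;
            return comparable.compareTo(r);
        };
    }

    // Map.of rejects null values, so the demo tuples are built with HashMap.
    static Map<String, Object> tuple(Object value) {
        Map<String, Object> t = new HashMap<>();
        t.put("a_f", value);
        return t;
    }

    public static void main(String[] args) {
        Comparator<Map<String, Object>> byField = ascending("a_f");
        System.out.println(byField.compare(tuple(null), tuple(null)));    // 0
        System.out.println(byField.compare(tuple(null), tuple(1.5)) < 0); // true
        System.out.println(byField.compare(tuple(2.5), tuple(1.5)) > 0);  // true
    }
}
```

A configurable null policy, as mentioned above, would amount to making the two 
hard-coded return values in the null branches a parameter.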

 Frequent 500 error IOExceptions from StreamingExpressions
 -

 Key: SOLR-7621
 URL: https://issues.apache.org/jira/browse/SOLR-7621
 Project: Solr
  Issue Type: Bug
Affects Versions: 5.2
Reporter: Hoss Man
Assignee: Joel Bernstein

 While trying to test out the new Streaming Expressions functionality, I 
 encountered lots of 500 error / IOException responses with various root 
 causes (I'll post details in the comments).
 It looks like the API needs to be better hardened to give the user useful 
 feedback and return 4xx errors when used in an incorrect manner.






[jira] [Comment Edited] (SOLR-7707) Add StreamExpression Support to RollupStream

2015-07-03 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612814#comment-14612814
 ] 

Dennis Gove edited comment on SOLR-7707 at 7/3/15 2:18 PM:
---

Looks like I cut my branch from trunk before those changes were committed. I'll 
go through some rebasing tomorrow and post up a new patch. Sorry about that.


was (Author: dpgove):
Looks like I cut my branch from trunk before those changes were committed. I'll 
go through some rebasing tomorrow and post up a new patch. Sorry abut that.

 Add StreamExpression Support to RollupStream
 

 Key: SOLR-7707
 URL: https://issues.apache.org/jira/browse/SOLR-7707
 Project: Solr
  Issue Type: Improvement
  Components: SolrJ
Reporter: Dennis Gove
Priority: Minor
 Attachments: SOLR-7707.patch, SOLR-7707.patch, SOLR-7707.patch


 This ticket is to add Stream Expression support to the RollupStream as 
 discussed in SOLR-7560.
 Proposed expression syntax for the RollupStream (copied from that ticket)
 {code}
 rollup(
   someStream(),
   over=fieldA, fieldB, fieldC,
   min(fieldA),
   max(fieldA),
   min(fieldB),
   mean(fieldD),
   sum(fieldC)
 )
 {code}
 This requires making the *Metric types Expressible but I think that ends up 
 as a good thing. Would make it real easy to support other options on metrics 
 like excluding outliers, for example find the sum of values within 3 standard 
 deviations from the mean could be 
 {code}
 sum(fieldC, limit=standardDev(3))
 {code}
  (note, how that particular calculation could be implemented is left as an 
 exercise for the reader, I'm just using it as an example of adding additional 
 options on a relatively simple metric).
 Another option example is what to do with null values. For example, in some 
 cases a null should not impact a mean but in others it should. You could 
 express those as
 {code}
 // replace null values with 0, thus leading to an impact on the mean
 mean(fieldA, replace(null, 0))
 // nulls are counted in the denominator but nothing is added to the numerator
 mean(fieldA, includeNull=true)
 // nulls are neither counted in the denominator nor added to the numerator
 mean(fieldA, includeNull=false)
 // if fieldA is null replace it with fieldB; a null fieldB is included in the mean
 mean(fieldA, replace(null, fieldB), includeNull=true)
 {code}
 so on and so forth.
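
 The arithmetic behind the includeNull option can be sketched as follows. This
 is only an illustration of the proposed semantics; MeanSketch and its mean
 method are hypothetical and do not correspond to an existing metric class.

```java
import java.util.Arrays;
import java.util.List;

final class MeanSketch {
    // includeNull=true:  a null adds 1 to the denominator and nothing to the numerator.
    // includeNull=false: a null is skipped entirely.
    static double mean(List<Double> values, boolean includeNull) {
        double sum = 0.0;
        long count = 0;
        for (Double value : values) {
            if (value != null) {
                sum += value;
                count++;
            } else if (includeNull) {
                count++;
            }
        }
        // With nothing to average there is no meaningful mean; NaN is this sketch's choice.
        return count == 0 ? Double.NaN : sum / count;
    }

    public static void main(String[] args) {
        List<Double> values = Arrays.asList(2.0, null, 4.0);
        System.out.println(mean(values, true));  // 2.0 (6.0 / 3)
        System.out.println(mean(values, false)); // 3.0 (6.0 / 2)
    }
}
```

 Note that under these semantics replace(null, 0) and includeNull=true give
 the same number; they would only differ once the replacement value is non-zero.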






[jira] [Updated] (SOLR-7707) Add StreamExpression Support to RollupStream

2015-07-03 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7707:
--
Attachment: SOLR-7707.patch

New correctly based patch attached.

 Add StreamExpression Support to RollupStream
 

 Key: SOLR-7707
 URL: https://issues.apache.org/jira/browse/SOLR-7707
 Project: Solr
  Issue Type: Improvement
  Components: SolrJ
Reporter: Dennis Gove
Priority: Minor
 Attachments: SOLR-7707.patch, SOLR-7707.patch, SOLR-7707.patch


 This ticket is to add Stream Expression support to the RollupStream as 
 discussed in SOLR-7560.
 Proposed expression syntax for the RollupStream (copied from that ticket)
 {code}
 rollup(
   someStream(),
   over=fieldA, fieldB, fieldC,
   min(fieldA),
   max(fieldA),
   min(fieldB),
   mean(fieldD),
   sum(fieldC)
 )
 {code}
 This requires making the *Metric types Expressible but I think that ends up 
 as a good thing. Would make it real easy to support other options on metrics 
 like excluding outliers, for example find the sum of values within 3 standard 
 deviations from the mean could be 
 {code}
 sum(fieldC, limit=standardDev(3))
 {code}
  (note, how that particular calculation could be implemented is left as an 
 exercise for the reader, I'm just using it as an example of adding additional 
 options on a relatively simple metric).
 Another option example is what to do with null values. For example, in some 
 cases a null should not impact a mean but in others it should. You could 
 express those as
 {code}
 // replace null values with 0, thus leading to an impact on the mean
 mean(fieldA, replace(null, 0))
 // nulls are counted in the denominator but nothing is added to the numerator
 mean(fieldA, includeNull=true)
 // nulls are neither counted in the denominator nor added to the numerator
 mean(fieldA, includeNull=false)
 // if fieldA is null replace it with fieldB; a null fieldB is included in the mean
 mean(fieldA, replace(null, fieldB), includeNull=true)
 {code}
 so on and so forth.






[jira] [Updated] (SOLR-7707) Add StreamExpression Support to RollupStream

2015-07-02 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7707:
--
Attachment: SOLR-7707.patch

I found the problem.

There is a test class called CountStream. In some of the test files 
(particularly 
solr/solrj/src/test-files/solrj/solr/collection1/conf/solrconfig-streaming.xml) 
the function name count was mapped to that Stream. However, now with a count 
metric I was also mapping the count function name to CountMetric.

For the moment I have corrected this by renaming CountStream to 
RecordCountStream and commented out the mapping in the solrconfig-streaming.xml 
file. I chose to change this one because it is a class in the test suite and 
not, apparently, used outside of testing.

However, this brings up an interesting question: should we allow conflicting 
names across streams and metrics? Right now the mappings from function name 
to Stream and to Metric are stored in the same Map, and as such we do not 
allow conflicting names, i.e., a stream and a metric cannot share the 
same function name. However, should we allow that?

I believe the answer, for clarity, is no. If you assign the string count to 
CountMetric then you cannot also assign it to CountStream. This will allow 
users to know what count() means without having to know the context. For 
example, allowing count to map to both could result in confusion in the 
following

{code}
rollup(
  count(search()),
  min(fieldA),
  count(fieldB)
)
{code}
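
A single shared map makes that rule easy to enforce at registration time. The 
sketch below is only an illustration: FunctionRegistry, register, and lookup 
are hypothetical names, and the two nested classes are empty placeholders 
standing in for the real metric and stream types.

```java
import java.util.HashMap;
import java.util.Map;

final class FunctionRegistry {
    // One map for streams and metrics alike, so a name can mean only one thing.
    private final Map<String, Class<?>> functions = new HashMap<>();

    FunctionRegistry register(String name, Class<?> type) {
        Class<?> existing = functions.putIfAbsent(name, type);
        if (existing != null && !existing.equals(type)) {
            throw new IllegalArgumentException(
                    "'" + name + "' is already mapped to " + existing.getSimpleName());
        }
        return this;
    }

    Class<?> lookup(String name) {
        return functions.get(name);
    }

    // Placeholders for illustration only.
    static final class CountMetric {}
    static final class RecordCountStream {}

    public static void main(String[] args) {
        FunctionRegistry registry = new FunctionRegistry()
                .register("count", CountMetric.class);
        try {
            registry.register("count", RecordCountStream.class);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println(registry.lookup("count").getSimpleName()); // CountMetric
    }
}
```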

 Add StreamExpression Support to RollupStream
 

 Key: SOLR-7707
 URL: https://issues.apache.org/jira/browse/SOLR-7707
 Project: Solr
  Issue Type: Improvement
  Components: SolrJ
Reporter: Dennis Gove
Priority: Minor
 Attachments: SOLR-7707.patch, SOLR-7707.patch


 This ticket is to add Stream Expression support to the RollupStream as 
 discussed in SOLR-7560.
 Proposed expression syntax for the RollupStream (copied from that ticket)
 {code}
 rollup(
   someStream(),
   over=fieldA, fieldB, fieldC,
   min(fieldA),
   max(fieldA),
   min(fieldB),
   mean(fieldD),
   sum(fieldC)
 )
 {code}
 This requires making the *Metric types Expressible but I think that ends up 
 as a good thing. Would make it real easy to support other options on metrics 
 like excluding outliers, for example find the sum of values within 3 standard 
 deviations from the mean could be 
 {code}
 sum(fieldC, limit=standardDev(3))
 {code}
  (note, how that particular calculation could be implemented is left as an 
 exercise for the reader, I'm just using it as an example of adding additional 
 options on a relatively simple metric).
 Another option example is what to do with null values. For example, in some 
 cases a null should not impact a mean but in others it should. You could 
 express those as
 {code}
 // replace null values with 0, thus leading to an impact on the mean
 mean(fieldA, replace(null, 0))
 // nulls are counted in the denominator but nothing is added to the numerator
 mean(fieldA, includeNull=true)
 // nulls are neither counted in the denominator nor added to the numerator
 mean(fieldA, includeNull=false)
 // if fieldA is null replace it with fieldB; a null fieldB is included in the mean
 mean(fieldA, replace(null, fieldB), includeNull=true)
 {code}
 so on and so forth.






[jira] [Commented] (SOLR-7707) Add StreamExpression Support to RollupStream

2015-07-02 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612814#comment-14612814
 ] 

Dennis Gove commented on SOLR-7707:
---

Looks like I cut my branch from trunk before those changes were committed. I'll 
go through some rebasing tomorrow and post up a new patch. Sorry about that.

 Add StreamExpression Support to RollupStream
 

 Key: SOLR-7707
 URL: https://issues.apache.org/jira/browse/SOLR-7707
 Project: Solr
  Issue Type: Improvement
  Components: SolrJ
Reporter: Dennis Gove
Priority: Minor
 Attachments: SOLR-7707.patch, SOLR-7707.patch


 This ticket is to add Stream Expression support to the RollupStream as 
 discussed in SOLR-7560.
 Proposed expression syntax for the RollupStream (copied from that ticket)
 {code}
 rollup(
   someStream(),
   over=fieldA, fieldB, fieldC,
   min(fieldA),
   max(fieldA),
   min(fieldB),
   mean(fieldD),
   sum(fieldC)
 )
 {code}
 This requires making the *Metric types Expressible but I think that ends up 
 as a good thing. Would make it real easy to support other options on metrics 
 like excluding outliers, for example find the sum of values within 3 standard 
 deviations from the mean could be 
 {code}
 sum(fieldC, limit=standardDev(3))
 {code}
  (note, how that particular calculation could be implemented is left as an 
 exercise for the reader, I'm just using it as an example of adding additional 
 options on a relatively simple metric).
 Another option example is what to do with null values. For example, in some 
 cases a null should not impact a mean but in others it should. You could 
 express those as
 {code}
 // replace null values with 0, thus leading to an impact on the mean
 mean(fieldA, replace(null, 0))
 // nulls are counted in the denominator but nothing is added to the numerator
 mean(fieldA, includeNull=true)
 // nulls are neither counted in the denominator nor added to the numerator
 mean(fieldA, includeNull=false)
 // if fieldA is null replace it with fieldB; a null fieldB is included in the mean
 mean(fieldA, replace(null, fieldB), includeNull=true)
 {code}
 so on and so forth.






[jira] [Updated] (SOLR-7707) Add StreamExpression Support to RollupStream

2015-06-30 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7707:
--
Attachment: SOLR-7707.patch

Adds expression support to RollupStream. 

Note: I have added a ParallelRollupStream test but I cannot get it to pass. I 
feel as though I've forgotten a required change to make it work with 
ParallelStream.

 Add StreamExpression Support to RollupStream
 

 Key: SOLR-7707
 URL: https://issues.apache.org/jira/browse/SOLR-7707
 Project: Solr
  Issue Type: Improvement
  Components: SolrJ
Reporter: Dennis Gove
Priority: Minor
 Attachments: SOLR-7707.patch


 This ticket is to add Stream Expression support to the RollupStream as 
 discussed in SOLR-7560.
 Proposed expression syntax for the RollupStream (copied from that ticket)
 {code}
 rollup(
   someStream(),
   over=fieldA, fieldB, fieldC,
   min(fieldA),
   max(fieldA),
   min(fieldB),
   mean(fieldD),
   sum(fieldC)
 )
 {code}
 This requires making the *Metric types Expressible but I think that ends up 
 as a good thing. Would make it real easy to support other options on metrics 
 like excluding outliers, for example find the sum of values within 3 standard 
 deviations from the mean could be 
 {code}
 sum(fieldC, limit=standardDev(3))
 {code}
  (note, how that particular calculation could be implemented is left as an 
 exercise for the reader, I'm just using it as an example of adding additional 
 options on a relatively simple metric).
 Another option example is what to do with null values. For example, in some 
 cases a null should not impact a mean but in others it should. You could 
 express those as
 {code}
 // replace null values with 0, thus leading to an impact on the mean
 mean(fieldA, replace(null, 0))
 // nulls are counted in the denominator but nothing is added to the numerator
 mean(fieldA, includeNull=true)
 // nulls are neither counted in the denominator nor added to the numerator
 mean(fieldA, includeNull=false)
 // if fieldA is null replace it with fieldB; a null fieldB is included in the mean
 mean(fieldA, replace(null, fieldB), includeNull=true)
 {code}
 so on and so forth.
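
 For readers unfamiliar with rollups, the sketch below shows the general shape
 of the computation, under the assumption that the incoming tuples are already
 sorted on the grouping field so that a group ends whenever the key changes.
 It groups on a single field and computes four metrics over one numeric field
 for brevity; RollupSketch and its method names are hypothetical and this is
 not the RollupStream implementation.

```java
import java.util.ArrayList;
import java.util.DoubleSummaryStatistics;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class RollupSketch {
    static List<Map<String, Object>> rollup(
            List<Map<String, Object>> sorted, String groupField, String metricField) {
        List<Map<String, Object>> out = new ArrayList<>();
        Object currentKey = null;
        DoubleSummaryStatistics stats = null;
        for (Map<String, Object> tuple : sorted) {
            Object key = tuple.get(groupField);
            if (stats == null || !Objects.equals(key, currentKey)) {
                // Key changed: emit the finished group and start a new one.
                if (stats != null) {
                    out.add(emit(groupField, currentKey, metricField, stats));
                }
                currentKey = key;
                stats = new DoubleSummaryStatistics();
            }
            // Assumes the metric field is present and numeric in every tuple.
            stats.accept(((Number) tuple.get(metricField)).doubleValue());
        }
        if (stats != null) {
            out.add(emit(groupField, currentKey, metricField, stats));
        }
        return out;
    }

    private static Map<String, Object> emit(
            String groupField, Object key, String metricField, DoubleSummaryStatistics stats) {
        Map<String, Object> tuple = new LinkedHashMap<>();
        tuple.put(groupField, key);
        tuple.put("min(" + metricField + ")", stats.getMin());
        tuple.put("max(" + metricField + ")", stats.getMax());
        tuple.put("mean(" + metricField + ")", stats.getAverage());
        tuple.put("sum(" + metricField + ")", stats.getSum());
        return tuple;
    }

    public static void main(String[] args) {
        Map<String, Object> t1 = Map.of("fieldA", "a", "fieldC", 1);
        Map<String, Object> t2 = Map.of("fieldA", "a", "fieldC", 3);
        Map<String, Object> t3 = Map.of("fieldA", "b", "fieldC", 5);
        for (Map<String, Object> group : rollup(List.of(t1, t2, t3), "fieldA", "fieldC")) {
            System.out.println(group);
        }
    }
}
```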






[jira] [Updated] (SOLR-7584) Add Joins to the Streaming API and Streaming Expressions

2015-05-21 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7584:
--
Description: 
Add InnerJoinStream, LeftOuterJoinStream, and supporting classes to the 
Streaming API to allow for joining between sub-streams.

At its most basic, it would look something like this
{code}
innerJoin(
  search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...),
  search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...),
  on=fieldA=fieldA
)
{code}
or with multi-field on clauses
{code}
innerJoin(
  search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...),
  search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...),
  on=fieldA=fieldA, fieldB=fieldD
)
{code}

I'd also like to support the option of doing a hash join instead of the default 
merge join but I haven't yet figured out the best way to express that. I'd like 
to let the user tell us which sub-stream should be hashed (the least-cost one).

Also, I've been thinking about field aliasing and might want to add a 
SelectStream which serves the purpose of allowing us to limit the fields coming 
out and rename fields.

Depends on SOLR-7554

  was:
Add InnerJoinStream, LeftOuterJoinStream, and supporting classes to the 
Streaming API to allow for joining between sub-streams.

At its most basic, it would look something like this
{code}
innerJoin(
  search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...),
  search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...),
  on=fieldA=fieldA
)
{code}
or with multi-field on clauses
{code}
innerJoin(
  search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...),
  search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...),
  on=fieldA=fieldA, fieldB=fieldD
)
{code}

I'd also like to support the option of doing a hash join instead of the default 
merge join but I haven't yet figured out the best way to express that. I'd like 
to let the user tell us which sub-stream should be hashed (the least-cost one).

Also, I've been thinking about field aliasing and might want to add a 
SelectStream which serves the purpose of allowing us to limit the fields coming 
out and rename fields.


 Add Joins to the Streaming API and Streaming Expressions
 

 Key: SOLR-7584
 URL: https://issues.apache.org/jira/browse/SOLR-7584
 Project: Solr
  Issue Type: Improvement
  Components: SolrJ
Reporter: Dennis Gove
Priority: Minor
  Labels: Streaming

 Add InnerJoinStream, LeftOuterJoinStream, and supporting classes to the 
 Streaming API to allow for joining between sub-streams.
 At its most basic, it would look something like this
 {code}
 innerJoin(
   search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...),
   search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...),
   on=fieldA=fieldA
 )
 {code}
 or with multi-field on clauses
 {code}
 innerJoin(
   search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...),
   search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...),
   on=fieldA=fieldA, fieldB=fieldD
 )
 {code}
 I'd also like to support the option of doing a hash join instead of the 
 default merge join but I haven't yet figured out the best way to express 
 that. I'd like to let the user tell us which sub-stream should be hashed (the 
 least-cost one).
 Also, I've been thinking about field aliasing and might want to add a 
 SelectStream which serves the purpose of allowing us to limit the fields 
 coming out and rename fields.
 Depends on SOLR-7554






[jira] [Created] (SOLR-7584) Add Joins to the Streaming API and Streaming Expressions

2015-05-21 Thread Dennis Gove (JIRA)
Dennis Gove created SOLR-7584:
-

 Summary: Add Joins to the Streaming API and Streaming Expressions
 Key: SOLR-7584
 URL: https://issues.apache.org/jira/browse/SOLR-7584
 Project: Solr
  Issue Type: Improvement
  Components: SolrJ
Reporter: Dennis Gove
Priority: Minor


Add InnerJoinStream, LeftOuterJoinStream, and supporting classes to the 
Streaming API to allow for joining between sub-streams.

At its most basic, it would look something like this
{code}
innerJoin(
  search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...),
  search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...),
  on=fieldA=fieldA
)
{code}
or with multi-field on clauses
{code}
innerJoin(
  search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...),
  search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...),
  on=fieldA=fieldA, fieldB=fieldD
)
{code}

I'd also like to support the option of doing a hash join instead of the default 
merge join but I haven't yet figured out the best way to express that. I'd like 
to let the user tell us which sub-stream should be hashed (the least-cost one).

Also, I've been thinking about field aliasing and might want to add a 
SelectStream which serves the purpose of allowing us to limit the fields coming 
out and rename fields.
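
To illustrate the hash-join option, here is a minimal sketch in which the caller 
decides which side is hashed, assuming tuples are plain Maps held in memory. 
HashJoinSketch and its innerJoin signature are hypothetical; how this choice 
would be written in an expression is still the open question described above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class HashJoinSketch {
    // `hashed` is fully read into memory (ideally the least-cost side);
    // `streamed` is then passed over once. Neither side needs to be sorted.
    static List<Map<String, Object>> innerJoin(
            List<Map<String, Object>> streamed, String streamedField,
            List<Map<String, Object>> hashed, String hashedField) {
        Map<Object, List<Map<String, Object>>> index = new HashMap<>();
        for (Map<String, Object> tuple : hashed) {
            Object key = tuple.get(hashedField);
            if (key != null) { // tuples without a join value can never match
                index.computeIfAbsent(key, k -> new ArrayList<>()).add(tuple);
            }
        }
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> tuple : streamed) {
            List<Map<String, Object>> matches =
                    index.getOrDefault(tuple.get(streamedField), Collections.emptyList());
            for (Map<String, Object> match : matches) {
                Map<String, Object> joined = new LinkedHashMap<>(tuple);
                joined.putAll(match); // on a field-name clash the hashed side wins (an assumption)
                out.add(joined);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> l1 = Map.of("fieldA", 1, "fieldB", "b1");
        Map<String, Object> l2 = Map.of("fieldA", 2, "fieldB", "b2");
        Map<String, Object> r1 = Map.of("fieldA", 1, "fieldD", "d1");
        Map<String, Object> r3 = Map.of("fieldA", 3, "fieldD", "d3");
        // Only fieldA=1 exists on both sides, so a single joined tuple is produced.
        System.out.println(innerJoin(List.of(l1, l2), "fieldA", List.of(r1, r3), "fieldA"));
    }
}
```

The field-name clash handled by putAll is the same problem that motivates the 
aliasing idea mentioned above.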






[jira] [Updated] (SOLR-7584) Add Joins to the Streaming API and Streaming Expressions

2015-05-21 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7584:
--
Attachment: SOLR-7584.patch

Adds JoinStream to support joins of N sub-streams.
Adds BiJoinStream to limit JoinStream to 2 sub-streams, left and right.
Adds InnerJoinStream with support for merge join.

Does not handle hash joins.
Uses aliasing concept already available in CloudSolrStream.

Still work to be done.

 Add Joins to the Streaming API and Streaming Expressions
 

 Key: SOLR-7584
 URL: https://issues.apache.org/jira/browse/SOLR-7584
 Project: Solr
  Issue Type: Improvement
  Components: SolrJ
Reporter: Dennis Gove
Priority: Minor
  Labels: Streaming
 Attachments: SOLR-7584.patch


 Add InnerJoinStream, LeftOuterJoinStream, and supporting classes to the 
 Streaming API to allow for joining between sub-streams.
 At its most basic, it would look something like this
 {code}
 innerJoin(
   search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...),
   search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...),
   on=fieldA=fieldA
 )
 {code}
 or with multi-field on clauses
 {code}
 innerJoin(
   search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...),
   search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...),
   on=fieldA=fieldA, fieldB=fieldD
 )
 {code}
 I'd also like to support the option of doing a hash join instead of the 
 default merge join but I haven't yet figured out the best way to express 
 that. I'd like to let the user tell us which sub-stream should be hashed (the 
 least-cost one).
 Also, I've been thinking about field aliasing and might want to add a 
 SelectStream which serves the purpose of allowing us to limit the fields 
 coming out and rename fields.
 Depends on SOLR-7554






[jira] [Comment Edited] (SOLR-7584) Add Joins to the Streaming API and Streaming Expressions

2015-05-21 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555348#comment-14555348
 ] 

Dennis Gove edited comment on SOLR-7584 at 5/22/15 12:27 AM:
-

Adds abstract JoinStream to support joins of N sub-streams.
Adds abstract BiJoinStream to limit JoinStream to 2 sub-streams, left and right.
Adds concrete InnerJoinStream with support for merge join.

Does not handle hash joins.
Uses aliasing concept already available in CloudSolrStream.

Still work to be done.


was (Author: dpgove):
Adds JoinStream to support joins of N sub-streams.
Adds BiJoinStream to limit JoinStream to 2 sub-streams, left and right.
Adds InnerJoinStream with support for merge join.

Does not handle hash joins.
Uses aliasing concept already available in CloudSolrStream.

Still work to be done.

 Add Joins to the Streaming API and Streaming Expressions
 

 Key: SOLR-7584
 URL: https://issues.apache.org/jira/browse/SOLR-7584
 Project: Solr
  Issue Type: Improvement
  Components: SolrJ
Reporter: Dennis Gove
Priority: Minor
  Labels: Streaming
 Attachments: SOLR-7584.patch


 Add InnerJoinStream, LeftOuterJoinStream, and supporting classes to the 
 Streaming API to allow for joining between sub-streams.
 At its most basic, it would look something like this
 {code}
 innerJoin(
   search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...),
   search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...),
   on=fieldA=fieldA
 )
 {code}
 or with multi-field on clauses
 {code}
 innerJoin(
   search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...),
   search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...),
   on=fieldA=fieldA, fieldB=fieldD
 )
 {code}
 I'd also like to support the option of doing a hash join instead of the 
 default merge join but I haven't yet figured out the best way to express 
 that. I'd like to let the user tell us which sub-stream should be hashed (the 
 least-cost one).
 Also, I've been thinking about field aliasing and might want to add a 
 SelectStream which serves the purpose of allowing us to limit the fields 
 coming out and rename fields.
 Depends on SOLR-7554






[jira] [Commented] (SOLR-7584) Add Joins to the Streaming API and Streaming Expressions

2015-05-22 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556008#comment-14556008
 ] 

Dennis Gove commented on SOLR-7584:
---

That's right. LeftOuterJoin wasn't included in the first version of the patch. 
At the moment the patch includes changes to a set of supporting classes and 
adds inner join. Left outer join isn't ready yet. I expect the expression 
syntax to be the same (two streams with an on clause) and the implementation to 
be fairly similar to inner join but taking into account that a right-side 
record isn't required for the left-side record to be returned.
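
To make the difference between the two join types concrete, here is a minimal sketch of a merge join over two inputs that are already sorted on the join field. It is an illustration only, not the code in the patch: the class MergeJoinSketch, its tuple() helper, and the choice to let left-side values win on a field-name clash are all assumptions. Tuples are modelled as plain maps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the InnerJoinStream/LeftOuterJoinStream classes from the patch.
final class MergeJoinSketch {
  // Both inputs must already be sorted ascending on their join field.
  @SuppressWarnings("unchecked")
  static List<Map<String, Object>> join(List<Map<String, Object>> left,
                                        List<Map<String, Object>> right,
                                        String leftField, String rightField,
                                        boolean leftOuter) {
    List<Map<String, Object>> out = new ArrayList<>();
    int r = 0; // first right-side tuple whose key is >= the current left key
    for (Map<String, Object> l : left) {
      Comparable<Object> key = (Comparable<Object>) l.get(leftField);
      while (r < right.size() && key.compareTo(right.get(r).get(rightField)) > 0) r++;
      boolean matched = false;
      for (int i = r; i < right.size()
          && key.compareTo(right.get(i).get(rightField)) == 0; i++) {
        Map<String, Object> merged = new LinkedHashMap<>(right.get(i));
        merged.putAll(l); // assumption: left-side values win on a name clash
        out.add(merged);
        matched = true;
      }
      // Left outer join: emit the left tuple even without a right-side match.
      if (!matched && leftOuter) out.add(new LinkedHashMap<>(l));
    }
    return out;
  }

  // Builds a tuple from alternating field names and values.
  static Map<String, Object> tuple(Object... kv) {
    Map<String, Object> t = new LinkedHashMap<>();
    for (int i = 0; i < kv.length; i += 2) t.put((String) kv[i], kv[i + 1]);
    return t;
  }

  static List<Map<String, Object>> demo(boolean leftOuter) {
    List<Map<String, Object>> left = Arrays.asList(
        tuple("fieldA", 1, "fieldB", "x"),
        tuple("fieldA", 2, "fieldB", "y"),
        tuple("fieldA", 3, "fieldB", "z"));
    List<Map<String, Object>> right = Arrays.asList(
        tuple("fieldA", 1, "fieldD", "p"),
        tuple("fieldA", 3, "fieldD", "q"));
    return join(left, right, "fieldA", "fieldA", leftOuter);
  }
}
```

With the sample data in demo(), the inner join drops the left tuple with fieldA=2 while the left outer join keeps it without any right-side fields.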







[jira] [Created] (SOLR-7938) MergeStream to support N streams

2015-08-17 Thread Dennis Gove (JIRA)
Dennis Gove created SOLR-7938:
-

 Summary: MergeStream to support N streams
 Key: SOLR-7938
 URL: https://issues.apache.org/jira/browse/SOLR-7938
 Project: Solr
  Issue Type: Bug
  Components: SolrJ
Affects Versions: Trunk
Reporter: Dennis Gove
Priority: Minor


Enhances MergeStream to support merging N streams. This was previously limited 
to merging just two streams but with this enhancement it can now accept any 
number of streams to merge.

Based on the comparator, if more than one stream could provide the next value 
then the selected value will follow the order of the streams as they appear in 
the expression or were added to the MergeStream object.

{code}
merge(
  search(collection1, q=id:(0 4), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc),
  search(collection1, q=id:(1), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc),
  search(collection1, q=id:(2), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc),
  on=a_f asc
)
{code}
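
The tie-breaking rule described above can be sketched as follows. This is an illustration under stated assumptions, not the MergeStream implementation: MergeSketch, tuple(), and demo() are hypothetical names, tuples are plain maps, and the inputs are in-memory lists that are already sorted by the comparator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of an N-way merge; not the actual MergeStream class.
final class MergeSketch {
  // Merges N already-sorted inputs. On ties the earliest-listed input wins.
  static <T> List<T> merge(List<List<T>> inputs, Comparator<T> comp) {
    List<Iterator<T>> its = new ArrayList<>();
    List<T> heads = new ArrayList<>(); // next unread value per input, null when exhausted
    for (List<T> in : inputs) {
      Iterator<T> it = in.iterator();
      its.add(it);
      heads.add(it.hasNext() ? it.next() : null);
    }
    List<T> out = new ArrayList<>();
    while (true) {
      int best = -1;
      for (int i = 0; i < heads.size(); i++) {
        if (heads.get(i) == null) continue;
        // Strict "<" keeps the lower-indexed input when values compare equal.
        if (best < 0 || comp.compare(heads.get(i), heads.get(best)) < 0) best = i;
      }
      if (best < 0) return out; // every input is exhausted
      out.add(heads.get(best));
      Iterator<T> it = its.get(best);
      heads.set(best, it.hasNext() ? it.next() : null);
    }
  }

  // Builds a tuple from alternating field names and values.
  static Map<String, Object> tuple(Object... kv) {
    Map<String, Object> t = new LinkedHashMap<>();
    for (int i = 0; i < kv.length; i += 2) t.put((String) kv[i], kv[i + 1]);
    return t;
  }

  // Three inputs sorted on a_f; ids 0 and 1 tie on a_f = 1.0.
  static List<Object> demo() {
    List<List<Map<String, Object>>> inputs = new ArrayList<>();
    inputs.add(Arrays.asList(tuple("id", 0, "a_f", 1.0), tuple("id", 4, "a_f", 4.0)));
    inputs.add(Arrays.asList(tuple("id", 1, "a_f", 1.0)));
    inputs.add(Arrays.asList(tuple("id", 2, "a_f", 2.0)));
    Comparator<Map<String, Object>> byF =
        Comparator.comparingDouble(t -> ((Number) t.get("a_f")).doubleValue());
    List<Object> ids = new ArrayList<>();
    for (Map<String, Object> t : merge(inputs, byF)) ids.add(t.get("id"));
    return ids;
  }
}
```

In demo(), id 0 is emitted before id 1 only because its input is listed first; the comparator itself considers them equal.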






[jira] [Updated] (SOLR-7938) MergeStream to support N streams

2015-08-17 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7938:
--
Issue Type: Improvement  (was: Bug)

 MergeStream to support N streams
 

 Key: SOLR-7938
 URL: https://issues.apache.org/jira/browse/SOLR-7938
 Project: Solr
  Issue Type: Improvement
  Components: SolrJ
Affects Versions: Trunk
Reporter: Dennis Gove
Priority: Minor
  Labels: streaming

 Enhances MergeStream to support merging N streams. This was previously 
 limited to merging just two streams but with this enhancement it can now 
 accept any number of streams to merge.
 Based on the comparator, if more than one stream could provide the next value 
 then the selected value will follow the order of the streams as they appear 
 in the expression or were added to the MergeStream object.
 {code}
 merge(
   search(collection1, q=id:(0 4), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc),
   search(collection1, q=id:(1), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc),
   search(collection1, q=id:(2), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc),
   on=a_f asc
 )
 {code}






[jira] [Updated] (SOLR-7938) MergeStream to support N streams

2015-08-17 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7938:
--
Attachment: SOLR-7938.patch

 MergeStream to support N streams
 

 Key: SOLR-7938
 URL: https://issues.apache.org/jira/browse/SOLR-7938
 Project: Solr
  Issue Type: Improvement
  Components: SolrJ
Affects Versions: Trunk
Reporter: Dennis Gove
Priority: Minor
  Labels: streaming
 Attachments: SOLR-7938.patch


 Enhances MergeStream to support merging N streams. This was previously 
 limited to merging just two streams but with this enhancement it can now 
 accept any number of streams to merge.
 Based on the comparator, if more than one stream could provide the next value 
 then the selected value will follow the order of the streams as they appear 
 in the expression or were added to the MergeStream object.
 {code}
 merge(
   search(collection1, q=id:(0 4), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc),
   search(collection1, q=id:(1), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc),
   search(collection1, q=id:(2), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc),
   on=a_f asc
 )
 {code}






[jira] [Updated] (SOLR-7669) Add SelectStream to Streaming API

2015-08-06 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7669:
--
Attachment: SOLR-7669.patch

Updated to add support for performing operations on the selected values. The 
only operation included in this patch is Replace, which can be used to replace 
field values (or null) with a different value or the value of another field.

In the following example, if fieldA is null then it will be replaced with value 
123 and if fieldB is foo then it will be set to bar.
{code}
select(
  id, 
  fieldA_i as fieldA, 
  fieldB_s as fieldB,
  replace(fieldA, null, withValue=123),
  replace(fieldB, foo, withValue=bar),
  search(collection1, q=*:*, fl=id,fieldA_i,fieldB_s, sort=fieldA_i asc, fieldB_s asc, id asc)
)
{code}

In the following example, if fieldA is null or ??? then it will be replaced 
with the value of fieldB.
{code}
select(
  id, 
  fieldA_s as fieldA, 
  fieldB_s as fieldB,
  replace(fieldA, null, withField=fieldB),
  replace(fieldA, ???, withField=fieldB),
  search(collection1, q=*:*, fl=id,fieldA_i,fieldB_s, sort=fieldA_i asc, fieldB_s asc, id asc)
)
{code}
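
As a rough illustration of what the two forms of Replace do to a single tuple, the sketch below models a tuple as a map. ReplaceSketch and its method names are hypothetical, and comparing values with Objects.equals is an assumption; the patch may compare values differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of the Replace operation; not the class from the patch.
final class ReplaceSketch {
  // withValue form: if the field currently holds `match` (null also matches a
  // missing field), overwrite it with a fixed value.
  static void replaceWithValue(Map<String, Object> t, String field,
                               Object match, Object withValue) {
    if (Objects.equals(t.get(field), match)) t.put(field, withValue);
  }

  // withField form: same test, but the replacement is copied from another field.
  static void replaceWithField(Map<String, Object> t, String field,
                               Object match, String withField) {
    if (Objects.equals(t.get(field), match)) t.put(field, t.get(withField));
  }

  // Mirrors the first example: null -> 123 and foo -> bar.
  static Map<String, Object> demo() {
    Map<String, Object> t = new LinkedHashMap<>();
    t.put("id", 1);
    t.put("fieldA", null);
    t.put("fieldB", "foo");
    replaceWithValue(t, "fieldA", null, 123);
    replaceWithValue(t, "fieldB", "foo", "bar");
    return t;
  }

  // Mirrors the second example: a null fieldA takes the value of fieldB.
  static Map<String, Object> demoWithField() {
    Map<String, Object> t = new LinkedHashMap<>();
    t.put("id", 2);
    t.put("fieldA", null);
    t.put("fieldB", "fallback");
    replaceWithField(t, "fieldA", null, "fieldB");
    return t;
  }
}
```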

 Add SelectStream to Streaming API
 -

 Key: SOLR-7669
 URL: https://issues.apache.org/jira/browse/SOLR-7669
 Project: Solr
  Issue Type: Improvement
  Components: SolrJ
Reporter: Dennis Gove
Priority: Minor
  Labels: Streaming
 Attachments: SOLR-7669.patch, SOLR-7669.patch


 Adds a new stream called SelectStream which can be used for two purposes.
  1. Limit the set of fields included in an outgoing tuple to remove unwanted 
 fields
  2. Provide aliases for fields. With this it acts as an alternative to the 
 CloudSolrStream's 'aliases' option.
  For example, in a simple case
 {code}
 select(
   id, 
   fieldA_i as fieldA, 
   fieldB_s as fieldB,
   search(collection1, q=*:*, fl=id,fieldA_i,fieldB_s, sort=fieldA_i asc, fieldB_s asc, id asc)
 )
 {code}
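
 The two purposes can be sketched per tuple as a single pass over a selection map. This is only an illustration: SelectSketch, the map-based tuple, and the extra_s field in the sample data are assumptions and not part of the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of select(); not the SelectStream class itself.
final class SelectSketch {
  // `selection` maps incoming field name -> outgoing name.
  // Fields that are not listed are dropped from the outgoing tuple.
  static Map<String, Object> select(Map<String, Object> in,
                                    Map<String, String> selection) {
    Map<String, Object> out = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : selection.entrySet()) {
      if (in.containsKey(e.getKey())) out.put(e.getValue(), in.get(e.getKey()));
    }
    return out;
  }

  static Map<String, Object> demo() {
    Map<String, Object> in = new LinkedHashMap<>();
    in.put("id", 1);
    in.put("fieldA_i", 7);
    in.put("fieldB_s", "x");
    in.put("extra_s", "dropped"); // not selected, so it will not appear in the output

    Map<String, String> selection = new LinkedHashMap<>();
    selection.put("id", "id");             // kept under the same name
    selection.put("fieldA_i", "fieldA");   // fieldA_i as fieldA
    selection.put("fieldB_s", "fieldB");   // fieldB_s as fieldB
    return select(in, selection);
  }
}
```
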
 This can also be used as part of complex expressions to help keep track of 
 what is being worked on. This is particularly useful when merging/joining 
 multiple collections which share field names. For example, the following 
 results in a set of tuples including only the fields id, left.ident, and 
 right.ident even though the total set of fields required to perform the 
 search and join is much larger than just those three fields.
 {code}
 select(
   id, left.ident, right.ident,
   innerJoin(
     select(
       id, join1_i as left.join1, join2_s as left.join2, ident_s as left.ident,
       search(collection1, q=side_s:left, fl=id,join1_i,join2_s,ident_s, sort=join1_i asc, join2_s asc, id asc)
     ),
     select(
       join3_i as right.join1, join2_s as right.join2, ident_s as right.ident,
       search(collection1, q=side_s:right, fl=join3_i,join2_s,ident_s, sort=join3_i asc, join2_s asc)
     ),
     on=left.join1=right.join1, left.join2=right.join2
   )
 )
 {code}
 This depends on SOLR-7584.






[jira] [Updated] (SOLR-7584) Add Joins to the Streaming API and Streaming Expressions

2015-08-06 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7584:
--
Attachment: SOLR-7584.patch

Recreated patch off current trunk. Previous patch was a little outdated.

 Add Joins to the Streaming API and Streaming Expressions
 

 Key: SOLR-7584
 URL: https://issues.apache.org/jira/browse/SOLR-7584
 Project: Solr
  Issue Type: Improvement
  Components: SolrJ
Reporter: Dennis Gove
Priority: Minor
  Labels: Streaming
 Attachments: SOLR-7584.patch, SOLR-7584.patch, SOLR-7584.patch, 
 SOLR-7584.patch


 Add InnerJoinStream, LeftOuterJoinStream, and supporting classes to the 
 Streaming API to allow for joining between sub-streams.
 At its most basic, it would look something like this
 {code}
 innerJoin(
   search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...),
   search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...),
   on=fieldA=fieldA
 )
 {code}
 or with multi-field on clauses
 {code}
 innerJoin(
   search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...),
   search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...),
   on=fieldA=fieldA, fieldB=fieldD
 )
 {code}
 I'd also like to support the option of doing a hash join instead of the 
 default merge join but I haven't yet figured out the best way to express 
 that. I'd like to let the user tell us which sub-stream should be hashed (the 
 least-cost one).
 Also, I've been thinking about field aliasing and might want to add a 
 SelectStream which serves the purpose of allowing us to limit the fields 
 coming out and rename fields.
 Depends on SOLR-7554






[jira] [Comment Edited] (SOLR-7669) Add SelectStream to Streaming API

2015-08-10 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661163#comment-14661163
 ] 

Dennis Gove edited comment on SOLR-7669 at 8/10/15 1:55 PM:


Updated to add support for performing operations on the selected values. The 
only operation included in this patch is Replace, which can be used to replace 
field values (or null) with a different value or the value of another field.

In the following example, if fieldA is null then it will be replaced with value 
123 and if fieldB is foo then it will be set to bar.
{code}
select(
  id, 
  fieldA_i as fieldA, 
  fieldB_s as fieldB,
  replace(fieldA, null, withValue=123),
  replace(fieldB, foo, withValue=bar),
  search(collection1, q=*:*, fl=id,fieldA_i,fieldB_s, sort=fieldA_i asc, fieldB_s asc, id asc)
)
{code}

In the following example, if fieldA is null or ??? then it will be replaced 
with the value of fieldB.
{code}
select(
  id, 
  fieldA_s as fieldA, 
  fieldB_s as fieldB,
  replace(fieldA, null, withField=fieldB),
  replace(fieldA, ???, withField=fieldB),
  search(collection1, q=*:*, fl=id,fieldA_i,fieldB_s, sort=fieldA_i asc, fieldB_s asc, id asc)
)
{code}


was (Author: dpgove):
Updated to add support for performing operations on the selected values. The 
only operation included in this patch is Replace, which can be used to replace 
field values (or null) with a different value or the value of another field.

In the following example, if fieldA is null then it will be replaced with value 
123 and if fieldB is foo then it will be set to bar.
{code}
select(
  id, 
  fieldA_i as fieldA, 
  fieldB_s as fieldB,
  replace(fieldA, null, 123),
  replace(fieldB, foo, withValue=bar),
  search(collection1, q=*:*, fl=id,fieldA_i,fieldB_s, sort=fieldA_i asc, 
fieldB_s asc, id asc)
)
{code}

In the following example, if fieldA is null or ??? then it will be replaced 
with the value of fieldB.
{code}
select(
  id, 
  fieldA_s as fieldA, 
  fieldB_s as fieldB,
  replace(fieldA, null, withField=fieldB),
  replace(fieldA, ???, withField=fieldB)
  search(collection1, q=*:*, fl=id,fieldA_i,fieldB_s, sort=fieldA_i asc, 
fieldB_s asc, id asc)
)
{code}







[jira] [Updated] (SOLR-7669) Add SelectStream to Streaming API

2015-10-22 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7669:
--
Attachment: SOLR-7669.patch

Deleted the EditStream as its functionality (the removal of fields from a 
tuple) is superseded by the SelectStream. Updated the SQLHandler to use the 
SelectStream instead of the EditStream.

All relevant tests pass. 

> Add SelectStream to Streaming API
> -
>
> Key: SOLR-7669
> URL: https://issues.apache.org/jira/browse/SOLR-7669
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrJ
>Reporter: Dennis Gove
>Priority: Minor
>  Labels: Streaming
> Attachments: SOLR-7669.patch, SOLR-7669.patch, SOLR-7669.patch, 
> SOLR-7669.patch
>
>
> Adds a new stream called SelectStream which can be used for two purposes.
>  1. Limit the set of fields included in an outgoing tuple to remove unwanted 
> fields
>  2. Provide aliases for fields. With this it acts as an alternative to the 
> CloudSolrStream's 'aliases' option.
>  For example, in a simple case
> {code}
> select(
>   id, 
>   fieldA_i as fieldA, 
>   fieldB_s as fieldB,
>   search(collection1, q="*:*", fl="id,fieldA_i,fieldB_s", sort="fieldA_i asc, fieldB_s asc, id asc")
> )
> {code}
> This can also be used as part of complex expressions to help keep track of 
> what is being worked on. This is particularly useful when merging/joining 
> multiple collections which share field names. For example, the following 
> results in a set of tuples including only the fields id, left.ident, and 
> right.ident even though the total set of fields required to perform the 
> search and join is much larger than just those three fields.
> {code}
> select(
>   id, left.ident, right.ident,
>   innerJoin(
>     select(
>       id, join1_i as left.join1, join2_s as left.join2, ident_s as left.ident,
>       search(collection1, q="side_s:left", fl="id,join1_i,join2_s,ident_s", sort="join1_i asc, join2_s asc, id asc")
>     ),
>     select(
>       join3_i as right.join1, join2_s as right.join2, ident_s as right.ident,
>       search(collection1, q="side_s:right", fl="join3_i,join2_s,ident_s", sort="join3_i asc, join2_s asc")
>     ),
>     on="left.join1=right.join1, left.join2=right.join2"
>   )
> )
> {code}
> This depends on SOLR-7584.






[jira] [Created] (SOLR-8188) Add hash style joins to the Streaming API and Streaming Expressions

2015-10-22 Thread Dennis Gove (JIRA)
Dennis Gove created SOLR-8188:
-

 Summary: Add hash style joins to the Streaming API and Streaming 
Expressions
 Key: SOLR-8188
 URL: https://issues.apache.org/jira/browse/SOLR-8188
 Project: Solr
  Issue Type: Improvement
  Components: SolrJ
Reporter: Dennis Gove
Priority: Minor


Add HashJoinStream and OuterHashJoinStream to the Streaming API to allow for 
optimized joining between sub-streams.

HashJoinStream is similar to an InnerJoinStream except that it does not insist 
on any particular order and will read all values from the stream being hashed 
(hashStream) when open() is called. During read() it will return the next tuple 
from the stream not being hashed (fullStream) which has at least one matching 
record in hashStream. It will return a tuple which is the merge of both tuples. 
If the tuple from the fullStream matches with more than one tuple from the 
hashStream then calling read() will return the merge with the next matching 
tuple. The order of the resulting stream is the order of the fullStream.

OuterHashJoinStream is similar to a HashJoinStream and LeftOuterJoinStream in 
that a tuple from fullStream will be returned even if it doesn't have a 
matching record in hashStream. All other pieces are identical.

In expression form
{code}
hashJoin(
  search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...),
  hashed=search(collection2, q=*:*, fl="fieldA, fieldB, fieldE", ...),
  on="fieldA, fieldB"
)
{code}

{code}
outerHashJoin(
  search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...),
  hashed=search(collection2, q=*:*, fl="fieldA, fieldB, fieldE", ...),
  on="fieldA, fieldB"
)
{code}

As you can see, the hashed stream is passed as a named parameter (hashed=), 
which makes it very clear which stream should be hashed.
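
The open()/read() behaviour described above can be sketched with in-memory lists. This is an illustration under stated assumptions rather than the HashJoinStream code: HashJoinSketch, key(), tuple(), and demo() are hypothetical names, tuples are plain maps, and letting fullStream values win on a field-name clash is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the HashJoinStream/OuterHashJoinStream classes.
final class HashJoinSketch {
  static List<Map<String, Object>> hashJoin(List<Map<String, Object>> fullStream,
                                            List<Map<String, Object>> hashStream,
                                            List<String> on, boolean outer) {
    // open(): read the whole hashed stream into buckets keyed by the join fields.
    Map<List<Object>, List<Map<String, Object>>> buckets = new HashMap<>();
    for (Map<String, Object> h : hashStream) {
      buckets.computeIfAbsent(key(h, on), k -> new ArrayList<>()).add(h);
    }
    // read(): walk fullStream in its own order, one merged tuple per match.
    List<Map<String, Object>> out = new ArrayList<>();
    for (Map<String, Object> f : fullStream) {
      List<Map<String, Object>> matches = buckets.get(key(f, on));
      if (matches == null) {
        if (outer) out.add(new LinkedHashMap<>(f)); // outer: keep unmatched tuple
        continue;
      }
      for (Map<String, Object> h : matches) {
        Map<String, Object> merged = new LinkedHashMap<>(h);
        merged.putAll(f); // assumption: fullStream values win on a name clash
        out.add(merged);
      }
    }
    return out;
  }

  // The join key is the list of values of the `on` fields, in order.
  static List<Object> key(Map<String, Object> t, List<String> on) {
    List<Object> k = new ArrayList<>();
    for (String f : on) k.add(t.get(f));
    return k;
  }

  // Builds a tuple from alternating field names and values.
  static Map<String, Object> tuple(Object... kv) {
    Map<String, Object> t = new LinkedHashMap<>();
    for (int i = 0; i < kv.length; i += 2) t.put((String) kv[i], kv[i + 1]);
    return t;
  }

  static List<Map<String, Object>> demo(boolean outer) {
    List<Map<String, Object>> full = Arrays.asList( // deliberately unsorted
        tuple("fieldA", 2, "fieldB", "b", "fieldC", "c2"),
        tuple("fieldA", 1, "fieldB", "a", "fieldC", "c1"),
        tuple("fieldA", 9, "fieldB", "z", "fieldC", "c9"));
    List<Map<String, Object>> hashed = Arrays.asList(
        tuple("fieldA", 1, "fieldB", "a", "fieldE", "e1"),
        tuple("fieldA", 1, "fieldB", "a", "fieldE", "e1b"),
        tuple("fieldA", 2, "fieldB", "b", "fieldE", "e2"));
    return hashJoin(full, hashed, Arrays.asList("fieldA", "fieldB"), outer);
  }
}
```

Because only the hashed side is held in memory, the sketch also shows why the smaller (least-cost) stream is the natural one to hash.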






[jira] [Updated] (SOLR-8198) Change ReducerStream to use StreamEqualitor instead of StreamComparator

2015-10-27 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-8198:
--
Attachment: SOLR-8198.patch

All tests pass.

> Change ReducerStream to use StreamEqualitor instead of StreamComparator
> ---
>
> Key: SOLR-8198
> URL: https://issues.apache.org/jira/browse/SOLR-8198
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrJ
>Reporter: Dennis Gove
>Priority: Minor
> Attachments: SOLR-8198.patch
>
>
> Currently the ReducerStream uses a StreamComparator to determine whether 
> tuples are equal. StreamEqualitors are a simplified version of a comparator 
> in that they do not require a sort to be provided. Using the function 
> getStreamSort we are still able to validate the incoming stream's sort and 
> pass that on up to any parent stream which might require it.
> This will simplify the use of the ReducerStream in join scenarios where the 
> reducer is used to find like records. Such a scenario exists with Inner/Outer 
> JoinStream, ComplementStream, and [Outer]HashJoinStreams.






[jira] [Commented] (SOLR-7525) Add ComplementStream to the Streaming API and Streaming Expressions

2015-10-27 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14976307#comment-14976307
 ] 

Dennis Gove commented on SOLR-7525:
---

I've got a patch for this that also includes an IntersectStream (returns tuples 
from streamA that also exist in streamB). I just want to add some additional 
tests before I post the patch.

> Add ComplementStream to the Streaming API and Streaming Expressions
> ---
>
> Key: SOLR-7525
> URL: https://issues.apache.org/jira/browse/SOLR-7525
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrJ
>Reporter: Joel Bernstein
>Priority: Minor
>
> This ticket adds a ComplementStream to the Streaming API and Streaming 
> Expression language.
> The ComplementStream will wrap two TupleStreams (StreamA, StreamB) and emit 
> Tuples from StreamA that are not in StreamB.
> Streaming API Syntax:
> {code}
> ComplementStream cstream = new ComplementStream(streamA, streamB, comp);
> {code}
> Streaming Expression syntax:
> {code}
> complement(search(...), search(...), on(...))
> {code}
> Internal implementation will rely on the ReducerStream. The ComplementStream 
> can be parallelized using the ParallelStream.
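
The complement and intersect semantics can be sketched with one pass over two sorted inputs. Note the swap: the ticket says the real implementation relies on the ReducerStream, whereas this sketch uses a plain two-pointer scan. SetOpsSketch is a hypothetical name, and each tuple is reduced to its integer join key for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the ComplementStream/IntersectStream classes.
final class SetOpsSketch {
  // Both lists must be sorted ascending.
  // keepMatches = false -> complement (values of a that are not in b)
  // keepMatches = true  -> intersect  (values of a that are also in b)
  static List<Integer> filter(List<Integer> a, List<Integer> b, boolean keepMatches) {
    List<Integer> out = new ArrayList<>();
    int j = 0; // position in b; only ever moves forward
    for (Integer x : a) {
      while (j < b.size() && b.get(j).compareTo(x) < 0) j++;
      boolean inB = j < b.size() && b.get(j).compareTo(x) == 0;
      if (inB == keepMatches) out.add(x);
    }
    return out;
  }
}
```

For example, with a = [1, 2, 3, 5] and b = [2, 5, 8], the complement is [1, 3] and the intersection is [2, 5].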






[jira] [Comment Edited] (SOLR-7525) Add ComplementStream to the Streaming API and Streaming Expressions

2015-10-29 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14981511#comment-14981511
 ] 

Dennis Gove edited comment on SOLR-7525 at 10/29/15 10:59 PM:
--

Includes both ComplementStream and IntersectStream. All tests pass.

Depends on SOLR-8198.


was (Author: dpgove):
Includes both ComplementStream and IntersectStream. All tests pass.

> Add ComplementStream to the Streaming API and Streaming Expressions
> ---
>
> Key: SOLR-7525
> URL: https://issues.apache.org/jira/browse/SOLR-7525
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrJ
>Reporter: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-7525.patch
>
>
> This ticket adds a ComplementStream to the Streaming API and Streaming 
> Expression language.
> The ComplementStream will wrap two TupleStreams (StreamA, StreamB) and emit 
> Tuples from StreamA that are not in StreamB.
> Streaming API Syntax:
> {code}
> ComplementStream cstream = new ComplementStream(streamA, streamB, comp);
> {code}
> Streaming Expression syntax:
> {code}
> complement(search(...), search(...), on(...))
> {code}
> Internal implementation will rely on the ReducerStream. The ComplementStream 
> can be parallelized using the ParallelStream.






[jira] [Updated] (SOLR-7525) Add ComplementStream to the Streaming API and Streaming Expressions

2015-10-29 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7525:
--
Attachment: SOLR-7525.patch

Includes both ComplementStream and IntersectStream. All tests pass.







[jira] [Commented] (SOLR-7584) Add Joins to the Streaming API and Streaming Expressions

2015-10-23 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14972324#comment-14972324
 ] 

Dennis Gove commented on SOLR-7584:
---

Could you describe your use-case for joining on facets? I can imagine that a 
HashJoin (SOLR-8188) would be good for something like that because it removes 
the sort requirement.

Yes, you can apply functions like sum and average on the joined data by 
wrapping the resulting joined stream in a RollupStream and using metrics.
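
As a rough picture of what a rollup with sum and average metrics computes over joined tuples, here is a minimal sketch. It is not the RollupStream API: RollupSketch, the output field names sum and avg, and the sample fields fieldA and price are all assumptions. It also assumes the input is sorted on the grouping field, so each group arrives as one consecutive run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of a rollup with sum/avg metrics; not RollupStream itself.
final class RollupSketch {
  // Emits one tuple {over, sum, avg} per run of equal `over` values.
  static List<Map<String, Object>> rollup(List<Map<String, Object>> sorted,
                                          String over, String metric) {
    List<Map<String, Object>> out = new ArrayList<>();
    Object current = null;
    double sum = 0;
    long count = 0;
    for (Map<String, Object> t : sorted) {
      Object bucket = t.get(over);
      if (count > 0 && !Objects.equals(bucket, current)) {
        out.add(result(over, current, sum, count)); // the previous run has ended
        sum = 0;
        count = 0;
      }
      current = bucket;
      sum += ((Number) t.get(metric)).doubleValue();
      count++;
    }
    if (count > 0) out.add(result(over, current, sum, count)); // flush the last run
    return out;
  }

  static Map<String, Object> result(String over, Object bucket, double sum, long count) {
    Map<String, Object> r = new LinkedHashMap<>();
    r.put(over, bucket);
    r.put("sum", sum);
    r.put("avg", sum / count);
    return r;
  }

  // Builds a tuple from alternating field names and values.
  static Map<String, Object> tuple(Object... kv) {
    Map<String, Object> t = new LinkedHashMap<>();
    for (int i = 0; i < kv.length; i += 2) t.put((String) kv[i], kv[i + 1]);
    return t;
  }

  static List<Map<String, Object>> demo() {
    return rollup(Arrays.asList(
        tuple("fieldA", "a", "price", 1.0),
        tuple("fieldA", "a", "price", 3.0),
        tuple("fieldA", "b", "price", 10.0)), "fieldA", "price");
  }
}
```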

> Add Joins to the Streaming API and Streaming Expressions
> 
>
> Key: SOLR-7584
> URL: https://issues.apache.org/jira/browse/SOLR-7584
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrJ
>Reporter: Dennis Gove
>Priority: Minor
>  Labels: Streaming
> Attachments: SOLR-7584.patch, SOLR-7584.patch, SOLR-7584.patch, 
> SOLR-7584.patch, SOLR-7584.patch
>
>
> Add InnerJoinStream, LeftOuterJoinStream, and supporting classes to the 
> Streaming API to allow for joining between sub-streams.
> At its most basic, it would look something like this
> {code}
> innerJoin(
>   search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...),
>   search(collection2, q=*:*, fl="fieldA, fieldD, fieldE", ...),
>   on="fieldA=fieldA"
> )
> {code}
> or with multi-field on clauses
> {code}
> innerJoin(
>   search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...),
>   search(collection2, q=*:*, fl="fieldA, fieldD, fieldE", ...),
>   on="fieldA=fieldA, fieldB=fieldD"
> )
> {code}
> I'd also like to support the option of doing a hash join instead of the 
> default merge join but I haven't yet figured out the best way to express 
> that. I'd like to let the user tell us which sub-stream should be hashed (the 
> least-cost one).
> Also, I've been thinking about field aliasing and might want to add a 
> SelectStream which serves the purpose of allowing us to limit the fields 
> coming out and rename fields.
> Depends on SOLR-7554






[jira] [Created] (SOLR-8198) Change ReducerStream to use StreamEqualitor instead of StreamComparator

2015-10-23 Thread Dennis Gove (JIRA)
Dennis Gove created SOLR-8198:
-

 Summary: Change ReducerStream to use StreamEqualitor instead of 
StreamComparator
 Key: SOLR-8198
 URL: https://issues.apache.org/jira/browse/SOLR-8198
 Project: Solr
  Issue Type: Improvement
  Components: SolrJ
Reporter: Dennis Gove
Priority: Minor


Currently the ReducerStream uses a StreamComparator to determine whether tuples 
are equal. StreamEqualitors are a simplified version of a comparator in that 
they do not require a sort to be provided. Using the function getStreamSort we 
are still able to validate the incoming stream's sort and pass that on up to 
any parent stream which might require it.

This will simplify the use of the ReducerStream in join scenarios where the 
reducer is used to find like records. Such a scenario exists with Inner/Outer 
JoinStream, ComplementStream, and [Outer]HashJoinStreams.
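
A small sketch of the idea: an equalitor only has to answer whether two tuples belong together, with no notion of ascending or descending order. The Equalitor interface and the reduce() helper below are hypothetical stand-ins, not the StreamEqualitor or ReducerStream classes, and tuples are modelled as plain maps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch; not the StreamEqualitor/ReducerStream classes.
final class EqualitorSketch {
  // Answers "same group?" only; unlike a comparator it carries no sort direction.
  interface Equalitor {
    boolean test(Map<String, Object> a, Map<String, Object> b);
  }

  static Equalitor onField(String field) {
    return (a, b) -> Objects.equals(a.get(field), b.get(field));
  }

  // Groups consecutive tuples that the equalitor considers equal.
  static List<List<Map<String, Object>>> reduce(List<Map<String, Object>> in,
                                                Equalitor eq) {
    List<List<Map<String, Object>>> groups = new ArrayList<>();
    for (Map<String, Object> t : in) {
      List<Map<String, Object>> last =
          groups.isEmpty() ? null : groups.get(groups.size() - 1);
      if (last != null && eq.test(last.get(0), t)) {
        last.add(t);
      } else {
        List<Map<String, Object>> g = new ArrayList<>();
        g.add(t);
        groups.add(g);
      }
    }
    return groups;
  }

  // Builds a tuple from alternating field names and values.
  static Map<String, Object> tuple(Object... kv) {
    Map<String, Object> t = new LinkedHashMap<>();
    for (int i = 0; i < kv.length; i += 2) t.put((String) kv[i], kv[i + 1]);
    return t;
  }

  // Input x, x, y, x: the final x is not adjacent to the first two.
  static List<Integer> demo() {
    List<Integer> sizes = new ArrayList<>();
    for (List<Map<String, Object>> g : reduce(Arrays.asList(
        tuple("a_s", "x"), tuple("a_s", "x"), tuple("a_s", "y"), tuple("a_s", "x")),
        onField("a_s"))) {
      sizes.add(g.size());
    }
    return sizes;
  }
}
```

The trailing x in demo() forms its own group, which is why the incoming stream's sort still has to be validated even though the equalitor itself does not need one.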






[jira] [Updated] (SOLR-7584) Add Joins to the Streaming API and Streaming Expressions

2015-10-23 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7584:
--
Attachment: SOLR-7584.patch

Part of this ticket is a change in comparators and equalitors to support 
differing field names on either side of the comparison (ie, fieldA = fieldB). 
Due to changes that have come into trunk between the creation of this patch and 
now it was required that I propagate those changes to a couple of other files.

Note, I originally included this change in SOLR-7669 but realized today that 
it's actually necessary in this patch. Here's me regretting the decision to not 
create a separate ticket for the equalitor/comparator changes but this patch 
does also add support for distributed joins so there's that. Either way, 
description of change is below.

Required a couple of changes in the SQL and FacetStream areas related to 
FieldComparator. The FieldComparator has been changed to support different 
field names on the left and right side. The SQL and FacetStream areas use 
FieldComparator for sorting (a totally valid use case) but do expect the left 
and right side field names to be equal. The changes I made go through and 
validate that assumption.

In the future I think I may circle back around and create a new FieldComparator 
with a single field name so that on construction that assumption can be 
enforced.
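
The validation described above can be pictured with a small Java sketch. This is not the patch's code: the class name `FieldComparatorSketch`, the method names, and the use of plain `Map` objects as tuples are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a comparator that may name different fields on each side.
final class FieldComparatorSketch {
    final String leftField;
    final String rightField;

    FieldComparatorSketch(String leftField, String rightField) {
        this.leftField = leftField;
        this.rightField = rightField;
    }

    boolean hasDifferentFieldNames() {
        return !leftField.equals(rightField);
    }

    // Compares left[leftField] with right[rightField]; null handling is omitted for brevity.
    @SuppressWarnings("unchecked")
    int compare(Map<String, Object> left, Map<String, Object> right) {
        Comparable<Object> l = (Comparable<Object>) left.get(leftField);
        return l.compareTo(right.get(rightField));
    }

    // What a sort-only caller might do: reject comparators whose two sides differ.
    static String requireSingleSortField(FieldComparatorSketch c) {
        if (c.hasDifferentFieldNames()) {
            throw new IllegalArgumentException(
                "sort needs one field name, got " + c.leftField + "=" + c.rightField);
        }
        return c.leftField;
    }

    public static void main(String[] args) {
        Map<String, Object> left = new HashMap<>();
        left.put("fieldB", 1);
        Map<String, Object> right = new HashMap<>();
        right.put("fieldD", 2);
        // A join may compare fieldB on the left with fieldD on the right.
        System.out.println(new FieldComparatorSketch("fieldB", "fieldD").compare(left, right) < 0);
        // A sort insists that both sides name the same field.
        System.out.println(requireSingleSortField(new FieldComparatorSketch("fieldA", "fieldA")));
    }
}
```

A dedicated single-field comparator, as suggested above, would move the check in `requireSingleSortField` into the constructor.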

All tests pass.

> Add Joins to the Streaming API and Streaming Expressions
> 
>
> Key: SOLR-7584
> URL: https://issues.apache.org/jira/browse/SOLR-7584
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrJ
>Reporter: Dennis Gove
>Priority: Minor
>  Labels: Streaming
> Attachments: SOLR-7584.patch, SOLR-7584.patch, SOLR-7584.patch, 
> SOLR-7584.patch, SOLR-7584.patch
>
>
> Add InnerJoinStream, LeftOuterJoinStream, and supporting classes to the 
> Streaming API to allow for joining between sub-streams.
> At its most basic, it would look something like this
> {code}
> innerJoin(
>   search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...),
>   search(collection2, q=*:*, fl="fieldA, fieldD, fieldE", ...),
>   on="fieldA=fieldA"
> )
> {code}
> or with multi-field on clauses
> {code}
> innerJoin(
>   search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...),
>   search(collection2, q=*:*, fl="fieldA, fieldD, fieldE", ...),
>   on="fieldA=fieldA, fieldB=fieldD"
> )
> {code}
> I'd also like to support the option of doing a hash join instead of the 
> default merge join but I haven't yet figured out the best way to express 
> that. I'd like to let the user tell us which sub-stream should be hashed (the 
> least-cost one).
> Also, I've been thinking about field aliasing and might want to add a 
> SelectStream which serves the purpose of allowing us to limit the fields 
> coming out and rename fields.
> Depends on SOLR-7554






[jira] [Updated] (SOLR-8188) Add hash style joins to the Streaming API and Streaming Expressions

2015-10-22 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-8188:
--
Attachment: SOLR-8188.patch

Added a field separator to the hash calculation. This is to prevent a situation 
where two tuples have the same hashed value where they shouldn't.

t1.fieldA = "foo"
t1.fieldB = "bar"

t2.fieldA = "foob"
t2.fieldB = "ar"

With this change the hash will be different for t1 and t2.
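
The collision is easy to reproduce in a few lines of Java. This is only a sketch: the class name `HashKeySketch`, the helper names, and the `"::"` separator are assumptions, since the thread does not show which separator the patch uses.

```java
import java.util.Arrays;
import java.util.List;

final class HashKeySketch {
    // Hypothetical separator; the patch's actual choice is not shown in this thread.
    static final String SEPARATOR = "::";

    // Plain concatenation: ("foo", "bar") and ("foob", "ar") both become "foobar".
    static String keyWithoutSeparator(List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (String v : values) {
            sb.append(v);
        }
        return sb.toString();
    }

    // A separator after each value keeps the field boundaries in the key.
    static String keyWithSeparator(List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (String v : values) {
            sb.append(v).append(SEPARATOR);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> t1 = Arrays.asList("foo", "bar");
        List<String> t2 = Arrays.asList("foob", "ar");
        // Same key, therefore same hash.
        System.out.println(keyWithoutSeparator(t1).hashCode() == keyWithoutSeparator(t2).hashCode());
        // Different keys once the separator is included.
        System.out.println(keyWithSeparator(t1).equals(keyWithSeparator(t2)));
    }
}
```

A separator only helps if it cannot appear inside a field value, so a real implementation would need to pick one with that in mind or compare the field values themselves on a hash match.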

> Add hash style joins to the Streaming API and Streaming Expressions
> ---
>
> Key: SOLR-8188
> URL: https://issues.apache.org/jira/browse/SOLR-8188
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrJ
>Reporter: Dennis Gove
>Priority: Minor
> Attachments: SOLR-8188.patch, SOLR-8188.patch
>
>
> Add HashJoinStream and OuterHashJoinStream to the Streaming API to allow for 
> optimized joining between sub-streams.
> HashJoinStream is similar to an InnerJoinStream except that it does not 
> insist on any particular order and will read all values from the stream being 
> hashed (hashStream) when open() is called. During read() it will return the 
> next tuple from the stream not being hashed (fullStream) which has at least 
> one matching record in hashStream. It will return a tuple which is the merge 
> of both tuples. If the tuple from the fullStream matches with more than one 
> tuple from the hashStream then calling read() will return the merge with the 
> next matching tuple. The order of the resulting stream is the order of the 
> fullStream.
> OuterHashJoinStream is similar to a HashJoinStream and LeftOuterJoinStream in 
> that a tuple from fullStream will be returned even if it doesn't have a 
> matching record in hashStream. All other pieces are identical.
> In expression form
> {code}
> hashJoin(
>   search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...),
>   hashed=search(collection2, q=*:*, fl="fieldA, fieldB, fieldE", ...),
>   on="fieldA, fieldB"
> )
> {code}
> {code}
> outerHashJoin(
>   search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...),
>   hashed=search(collection2, q=*:*, fl="fieldA, fieldB, fieldE", ...),
>   on="fieldA, fieldB"
> )
> {code}
> As you can see the hashStream is a named parameter, which makes it very clear 
> which stream should be hashed.
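
The behaviour in the quoted description can be sketched in plain Java. This is not the HashJoinStream source: tuples are modelled as `Map` objects, the class and method names are invented for illustration, and letting the hashed side overwrite same-named fields in the merged tuple is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class HashJoinSketch {
    // Small helper to build a tuple from alternating field names and values.
    static Map<String, Object> tuple(Object... keyValues) {
        Map<String, Object> t = new LinkedHashMap<>();
        for (int i = 0; i < keyValues.length; i += 2) {
            t.put((String) keyValues[i], keyValues[i + 1]);
        }
        return t;
    }

    // The lookup key is the list of "on" field values, so field boundaries are preserved.
    static List<Object> key(Map<String, Object> tuple, List<String> on) {
        List<Object> k = new ArrayList<>();
        for (String field : on) {
            k.add(tuple.get(field));
        }
        return k;
    }

    static List<Map<String, Object>> join(List<Map<String, Object>> fullStream,
                                          List<Map<String, Object>> hashStream,
                                          List<String> on, boolean outer) {
        // "open()": read the hashed side completely into an index.
        Map<List<Object>, List<Map<String, Object>>> index = new HashMap<>();
        for (Map<String, Object> t : hashStream) {
            index.computeIfAbsent(key(t, on), k -> new ArrayList<>()).add(t);
        }
        // "read()": walk the full side in its own order, emitting one merge per match.
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> t : fullStream) {
            List<Map<String, Object>> matches = index.get(key(t, on));
            if (matches == null) {
                if (outer) {
                    out.add(new LinkedHashMap<>(t)); // outer join keeps unmatched tuples
                }
                continue;
            }
            for (Map<String, Object> m : matches) {
                Map<String, Object> merged = new LinkedHashMap<>(t);
                merged.putAll(m);
                out.add(merged);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> full = Arrays.asList(
            tuple("fieldA", 1, "fieldB", "x", "fieldC", "c1"),
            tuple("fieldA", 2, "fieldB", "y", "fieldC", "c2"));
        List<Map<String, Object>> hashed = Arrays.asList(
            tuple("fieldA", 1, "fieldB", "x", "fieldE", "e1"),
            tuple("fieldA", 1, "fieldB", "x", "fieldE", "e2"));
        List<String> on = Arrays.asList("fieldA", "fieldB");
        System.out.println(join(full, hashed, on, false).size()); // inner join
        System.out.println(join(full, hashed, on, true).size());  // outer join
    }
}
```

Because only the hashed side is held in memory, the sketch also shows why it pays to hash the smaller of the two streams.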






[jira] [Updated] (SOLR-8188) Add hash style joins to the Streaming API and Streaming Expressions

2015-10-22 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-8188:
--
Attachment: SOLR-8188.patch

All tests pass.

> Add hash style joins to the Streaming API and Streaming Expressions
> ---
>
> Key: SOLR-8188
> URL: https://issues.apache.org/jira/browse/SOLR-8188
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrJ
>Reporter: Dennis Gove
>Priority: Minor
> Attachments: SOLR-8188.patch
>
>
> Add HashJoinStream and OuterHashJoinStream to the Streaming API to allow for 
> optimized joining between sub-streams.
> HashJoinStream is similar to an InnerJoinStream except that it does not 
> insist on any particular order and will read all values from the stream being 
> hashed (hashStream) when open() is called. During read() it will return the 
> next tuple from the stream not being hashed (fullStream) which has at least 
> one matching record in hashStream. It will return a tuple which is the merge 
> of both tuples. If the tuple from the fullStream matches with more than one 
> tuple from the hashStream then calling read() will return the merge with the 
> next matching tuple. The order of the resulting stream is the order of the 
> fullStream.
> OuterHashJoinStream is similar to a HashJoinStream and LeftOuterJoinStream in 
> that a tuple from fullStream will be returned even if it doesn't have a 
> matching record in hashStream. All other pieces are identical.
> In expression form
> {code}
> hashJoin(
>   search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...),
>   hashed=search(collection2, q=*:*, fl="fieldA, fieldB, fieldE", ...),
>   on="fieldA, fieldB"
> )
> {code}
> {code}
> outerHashJoin(
>   search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...),
>   hashed=search(collection2, q=*:*, fl="fieldA, fieldB, fieldE", ...),
>   on="fieldA, fieldB"
> )
> {code}
> As you can see the hashStream is a named parameter, which makes it very clear 
> which stream should be hashed.






[jira] [Comment Edited] (SOLR-8188) Add hash style joins to the Streaming API and Streaming Expressions

2015-10-22 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970311#comment-14970311
 ] 

Dennis Gove edited comment on SOLR-8188 at 10/23/15 2:43 AM:
-

Added a field separator to the hash calculation. This is to prevent a situation 
where two tuples have the same hashed value where they shouldn't.

t1.fieldA = "foo"
t1.fieldB = "bar"

t2.fieldA = "foob"
t2.fieldB = "ar"

With this change the hash will be different for t1 and t2.


was (Author: dpgove):
Added a field seperator to the hash calculation. This is to prevent a situation 
where two tuples have the same hashed value where they shoudn't.

t1.fieldA = "foo"
t1.fieldB = "bar"

t2.fieldA = "foob"
t2.fieldB = "ar"

With this change the hash will be different for t1 and t2.

> Add hash style joins to the Streaming API and Streaming Expressions
> ---
>
> Key: SOLR-8188
> URL: https://issues.apache.org/jira/browse/SOLR-8188
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrJ
>Reporter: Dennis Gove
>Priority: Minor
> Attachments: SOLR-8188.patch, SOLR-8188.patch
>
>
> Add HashJoinStream and OuterHashJoinStream to the Streaming API to allow for 
> optimized joining between sub-streams.
> HashJoinStream is similar to an InnerJoinStream except that it does not 
> insist on any particular order and will read all values from the stream being 
> hashed (hashStream) when open() is called. During read() it will return the 
> next tuple from the stream not being hashed (fullStream) which has at least 
> one matching record in hashStream. It will return a tuple which is the merge 
> of both tuples. If the tuple from the fullStream matches with more than one 
> tuple from the hashStream then calling read() will return the merge with the 
> next matching tuple. The order of the resulting stream is the order of the 
> fullStream.
> OuterHashJoinStream is similar to a HashJoinStream and LeftOuterJoinStream in 
> that a tuple from fullStream will be returned even if it doesn't have a 
> matching record in hashStream. All other pieces are identical.
> In expression form
> {code}
> hashJoin(
>   search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...),
>   hashed=search(collection2, q=*:*, fl="fieldA, fieldB, fieldE", ...),
>   on="fieldA, fieldB"
> )
> {code}
> {code}
> outerHashJoin(
>   search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...),
>   hashed=search(collection2, q=*:*, fl="fieldA, fieldB, fieldE", ...),
>   on="fieldA, fieldB"
> )
> {code}
> As you can see the hashStream is a named parameter, which makes it very clear 
> which stream should be hashed.






[jira] [Created] (SOLR-8185) Add operations support to streaming metrics

2015-10-21 Thread Dennis Gove (JIRA)
Dennis Gove created SOLR-8185:
-

 Summary: Add operations support to streaming metrics
 Key: SOLR-8185
 URL: https://issues.apache.org/jira/browse/SOLR-8185
 Project: Solr
  Issue Type: Improvement
  Components: SolrJ
Reporter: Dennis Gove
Priority: Minor


Adds support for operations on stream metrics.

With this feature one can modify tuple values before applying to the computed 
metric. There are a lot of use-cases I can see with this - I'll describe one 
here.

Imagine you have a RollupStream which is computing the average over some field 
but you cannot be sure that all documents have a value for that field, ie the 
value is null. When the value is null you want to treat it as a 0. With this 
feature you can accomplish that like this

{code}
rollup(
  search(collection1, q=*:*, fl="a_s,a_i,a_f", sort="a_s asc"),
  over="a_s",
  avg(a_i, replace(null, withValue=0)),
  count(*)
)
{code}

The operations are applied to the tuple separately for each metric in the 
stream, which means you can perform different operations on different metrics 
without one metric being impacted by the operations on another. 

Adding to our previous example, imagine you also want the min of the same field 
but have it ignore null values rather than treat them as 0.

{code}
rollup(
  search(collection1, q=*:*, fl="a_s,a_i,a_f", sort="a_s asc"),
  over="a_s",
  avg(a_i, replace(null, withValue=0)),
  min(a_i),
  count(*)
)
{code}

Also, the tuple is not modified for streams that might wrap this one. Ie, the 
only thing that sees the applied operation is that particular metric. If you 
want to apply operations for wrapping streams you can still achieve that with 
the SelectStream (SOLR-7669).
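
A rough Java sketch of that isolation, under stated assumptions: tuples are plain `Map` objects, the names `MetricOperationSketch`, `replaceNull`, `avg` and `min` are invented here, and metrics are assumed to skip values that are still null after the operation runs. None of this is taken from the patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

final class MetricOperationSketch {
    // Hypothetical counterpart of replace(null, withValue=...).
    static UnaryOperator<Object> replaceNull(Object withValue) {
        return v -> v == null ? withValue : v;
    }

    // The operation is applied to the value read from the tuple; the tuple is never written.
    static double avg(List<Map<String, Object>> tuples, String field, UnaryOperator<Object> op) {
        double sum = 0;
        long count = 0;
        for (Map<String, Object> t : tuples) {
            Object v = op.apply(t.get(field));
            if (v != null) {
                sum += ((Number) v).doubleValue();
                count++;
            }
        }
        return count == 0 ? 0 : sum / count;
    }

    // No operation attached, so nulls are simply skipped (an assumption of this sketch).
    static Double min(List<Map<String, Object>> tuples, String field) {
        Double min = null;
        for (Map<String, Object> t : tuples) {
            Object v = t.get(field);
            if (v == null) {
                continue;
            }
            double d = ((Number) v).doubleValue();
            if (min == null || d < min) {
                min = d;
            }
        }
        return min;
    }

    // Three tuples whose a_i values are 4, null and 8.
    static List<Map<String, Object>> sample() {
        List<Map<String, Object>> tuples = new ArrayList<>();
        for (Integer value : new Integer[] {4, null, 8}) {
            Map<String, Object> t = new HashMap<>();
            t.put("a_i", value);
            tuples.add(t);
        }
        return tuples;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> tuples = sample();
        System.out.println(avg(tuples, "a_i", replaceNull(0))); // null counted as 0
        System.out.println(min(tuples, "a_i"));                 // null ignored
        System.out.println(tuples.get(1).get("a_i"));           // tuple itself unchanged
    }
}
```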

One feature I'm investigating but this patch DOES NOT add is the ability to 
assign names to the resulting metric value. For example, to allow for something 
like this

{code}
rollup(
  search(collection1, q=*:*, fl="a_s,a_i,a_f", sort="a_s asc"),
  over="a_s",
  avg(a_i, replace(null, withValue=0), as="avg_a_i_null_as_0"),
  avg(a_i),
  count(*, as="totalCount")
)
{code}

Right now that isn't possible because the identifier for each metric would be 
the same "avg_a_i" and as such both couldn't be returned. It's relatively easy 
to add but I have to investigate its impact on the SQL and FacetStream areas.

Depends on SOLR-7669 (SelectStream)






[jira] [Updated] (SOLR-8185) Add operations support to streaming metrics

2015-10-21 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-8185:
--
Attachment: SOLR-8185.patch

Full patch. All tests pass.

> Add operations support to streaming metrics
> ---
>
> Key: SOLR-8185
> URL: https://issues.apache.org/jira/browse/SOLR-8185
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrJ
>Reporter: Dennis Gove
>Priority: Minor
> Attachments: SOLR-8185.patch
>
>
> Adds support for operations on stream metrics.
> With this feature one can modify tuple values before applying to the computed 
> metric. There are a lot of use-cases I can see with this - I'll describe one 
> here.
> Imagine you have a RollupStream which is computing the average over some 
> field but you cannot be sure that all documents have a value for that field, 
> ie the value is null. When the value is null you want to treat it as a 0. 
> With this feature you can accomplish that like this
> {code}
> rollup(
>   search(collection1, q=*:*, fl="a_s,a_i,a_f", sort="a_s asc"),
>   over="a_s",
>   avg(a_i, replace(null, withValue=0)),
>   count(*)
> )
> {code}
> The operations are applied to the tuple separately for each metric in the 
> stream, which means you can perform different operations on different metrics 
> without one metric being impacted by the operations on another. 
> Adding to our previous example, imagine you also want the min of the same 
> field but have it ignore null values rather than treat them as 0.
> {code}
> rollup(
>   search(collection1, q=*:*, fl="a_s,a_i,a_f", sort="a_s asc"),
>   over="a_s",
>   avg(a_i, replace(null, withValue=0)),
>   min(a_i),
>   count(*)
> )
> {code}
> Also, the tuple is not modified for streams that might wrap this one. Ie, the 
> only thing that sees the applied operation is that particular metric. If you 
> want to apply operations for wrapping streams you can still achieve that with 
> the SelectStream (SOLR-7669).
> One feature I'm investigating but this patch DOES NOT add is the ability to 
> assign names to the resulting metric value. For example, to allow for 
> something like this
> {code}
> rollup(
>   search(collection1, q=*:*, fl="a_s,a_i,a_f", sort="a_s asc"),
>   over="a_s",
>   avg(a_i, replace(null, withValue=0), as="avg_a_i_null_as_0"),
>   avg(a_i),
>   count(*, as="totalCount")
> )
> {code}
> Right now that isn't possible because the identifier for each metric would be 
> the same "avg_a_i" and as such both couldn't be returned. It's relatively 
> easy to add but I have to investigate its impact on the SQL and FacetStream 
> areas.
> Depends on SOLR-7669 (SelectStream)






[jira] [Updated] (SOLR-7669) Add SelectStream to Streaming API

2015-10-21 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7669:
--
Attachment: SOLR-7669.patch

Rebased against trunk (git hash f63fc48, SOLR-8114: in Grouping.java rename 
groupSort to withinGroupSort)

Required a couple of changes in the SQL and FacetStream areas related to 
FieldComparator. The FieldComparator has been changed to support different 
field names on the left and right side. The SQL and FacetStream areas use 
FieldComparator for sorting (a totally valid use case) but do expect the left 
and right side field names to be equal. The changes I made go through and 
validate that assumption.

In the future I think I may circle back around and create a new FieldComparator 
with a single field name so that on construction that assumption can be 
enforced.

All tests pass.

> Add SelectStream to Streaming API
> -
>
> Key: SOLR-7669
> URL: https://issues.apache.org/jira/browse/SOLR-7669
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrJ
>Reporter: Dennis Gove
>Priority: Minor
>  Labels: Streaming
> Attachments: SOLR-7669.patch, SOLR-7669.patch, SOLR-7669.patch
>
>
> Adds a new stream called SelectStream which can be used for two purposes.
>  1. Limit the set of fields included in an outgoing tuple to remove unwanted 
> fields
>  2. Provide aliases for fields. With this it acts as an alternative to the 
> CloudSolrStream's 'aliases' option.
>  For example, in a simple case
> {code}
> select(
>   id, 
>   fieldA_i as fieldA, 
>   fieldB_s as fieldB,
>   search(collection1, q="*:*", fl="id,fieldA_i,fieldB_s",
>     sort="fieldA_i asc, fieldB_s asc, id asc")
> )
> {code}
> This can also be used as part of complex expressions to help keep track of 
> what is being worked on. This is particularly useful when merging/joining 
> multiple collections which share field names. For example, the following 
> results in a set of tuples including only the fields id, left.ident, and 
> right.ident even though the total set of fields required to perform the 
> search and join is much larger than just those three fields.
> {code}
> select(
>   id, left.ident, right.ident,
>   innerJoin(
>     select(
>       id, join1_i as left.join1, join2_s as left.join2, ident_s as left.ident,
>       search(collection1, q="side_s:left", fl="id,join1_i,join2_s,ident_s",
>         sort="join1_i asc, join2_s asc, id asc")
>     ),
>     select(
>       join3_i as right.join1, join2_s as right.join2, ident_s as right.ident,
>       search(collection1, q="side_s:right", fl="join3_i,join2_s,ident_s",
>         sort="join3_i asc, join2_s asc")
>     ),
>     on="left.join1=right.join1, left.join2=right.join2"
>   )
> )
> {code}
> This depends on SOLR-7584.
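
The two purposes of SelectStream, dropping unwanted fields and renaming the ones that remain, come down to a single pass over a selection map. The Java below is an illustrative sketch only: `SelectSketch` and its `select` method are invented names, and tuples are modelled as plain `Map` objects rather than Solr's tuple class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SelectSketch {
    // selection maps each source field name to its outgoing name.
    // A field selected without an alias maps to itself; unlisted fields are dropped.
    static Map<String, Object> select(Map<String, Object> tuple, Map<String, String> selection) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : selection.entrySet()) {
            if (tuple.containsKey(e.getKey())) {
                out.put(e.getValue(), tuple.get(e.getKey()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> tuple = new LinkedHashMap<>();
        tuple.put("id", 1);
        tuple.put("fieldA_i", 5);
        tuple.put("fieldB_s", "x");
        tuple.put("unwanted_s", "y");

        Map<String, String> selection = new LinkedHashMap<>();
        selection.put("id", "id");
        selection.put("fieldA_i", "fieldA");
        selection.put("fieldB_s", "fieldB");

        // Prints the three selected fields under their outgoing names.
        System.out.println(select(tuple, selection));
    }
}
```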






[jira] [Assigned] (SOLR-8268) Makes StatsStream implement Expressible interface

2015-11-10 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove reassigned SOLR-8268:
-

Assignee: Dennis Gove

> Makes StatsStream implement Expressible interface
> -
>
> Key: SOLR-8268
> URL: https://issues.apache.org/jira/browse/SOLR-8268
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrJ
>Affects Versions: Trunk
>Reporter: Dennis Gove
>Assignee: Dennis Gove
>Priority: Trivial
>  Labels: Streaming
> Fix For: Trunk
>
> Attachments: SOLR-8268.patch
>
>
> Adds expression support to the Stats stream. With this it will now be 
> possible to express a stats stream as
> {code}
> stats(
>   collection1, q=*:*, fl="fieldA,fieldB,fieldInt,fieldFloat",
>   sum(fieldInt), 
>   sum(fieldFloat), 
>   min(fieldInt), 
>   min(fieldFloat), 
>   max(fieldInt), 
>   max(fieldFloat), 
>   avg(fieldInt), 
>   avg(fieldFloat), 
>   count(*)
> )
> {code}
> You can collect stats on any supported metric and use full metric features. 
> Ie, when SOLR-8185 is committed you can then include operations in the 
> metrics.






[jira] [Assigned] (SOLR-7669) Add SelectStream to Streaming API

2015-11-11 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove reassigned SOLR-7669:
-

Assignee: Dennis Gove

> Add SelectStream to Streaming API
> -
>
> Key: SOLR-7669
> URL: https://issues.apache.org/jira/browse/SOLR-7669
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrJ
>Reporter: Dennis Gove
>Assignee: Dennis Gove
>Priority: Minor
>  Labels: Streaming
> Attachments: SOLR-7669.patch, SOLR-7669.patch, SOLR-7669.patch, 
> SOLR-7669.patch, SOLR-7669.patch
>
>
> Adds a new stream called SelectStream which can be used for two purposes.
>  1. Limit the set of fields included in an outgoing tuple to remove unwanted 
> fields
>  2. Provide aliases for fields. With this it acts as an alternative to the 
> CloudSolrStream's 'aliases' option.
>  For example, in a simple case
> {code}
> select(
>   id, 
>   fieldA_i as fieldA, 
>   fieldB_s as fieldB,
>   search(collection1, q="*:*", fl="id,fieldA_i,fieldB_s",
>     sort="fieldA_i asc, fieldB_s asc, id asc")
> )
> {code}
> This can also be used as part of complex expressions to help keep track of 
> what is being worked on. This is particularly useful when merging/joining 
> multiple collections which share field names. For example, the following 
> results in a set of tuples including only the fields id, left.ident, and 
> right.ident even though the total set of fields required to perform the 
> search and join is much larger than just those three fields.
> {code}
> select(
>   id, left.ident, right.ident,
>   innerJoin(
>     select(
>       id, join1_i as left.join1, join2_s as left.join2, ident_s as left.ident,
>       search(collection1, q="side_s:left", fl="id,join1_i,join2_s,ident_s",
>         sort="join1_i asc, join2_s asc, id asc")
>     ),
>     select(
>       join3_i as right.join1, join2_s as right.join2, ident_s as right.ident,
>       search(collection1, q="side_s:right", fl="join3_i,join2_s,ident_s",
>         sort="join3_i asc, join2_s asc")
>     ),
>     on="left.join1=right.join1, left.join2=right.join2"
>   )
> )
> {code}
> This depends on SOLR-7584.






[jira] [Updated] (SOLR-7669) Add SelectStream to Streaming API

2015-11-11 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7669:
--
Attachment: SOLR-7669.patch

Fixes for pre-commit failures. Add documentation on the operations.

> Add SelectStream to Streaming API
> -
>
> Key: SOLR-7669
> URL: https://issues.apache.org/jira/browse/SOLR-7669
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrJ
>Reporter: Dennis Gove
>Assignee: Dennis Gove
>Priority: Minor
>  Labels: Streaming
> Attachments: SOLR-7669.patch, SOLR-7669.patch, SOLR-7669.patch, 
> SOLR-7669.patch, SOLR-7669.patch, SOLR-7669.patch
>
>
> Adds a new stream called SelectStream which can be used for two purposes.
>  1. Limit the set of fields included in an outgoing tuple to remove unwanted 
> fields
>  2. Provide aliases for fields. With this it acts as an alternative to the 
> CloudSolrStream's 'aliases' option.
>  For example, in a simple case
> {code}
> select(
>   id, 
>   fieldA_i as fieldA, 
>   fieldB_s as fieldB,
>   search(collection1, q="*:*", fl="id,fieldA_i,fieldB_s",
>     sort="fieldA_i asc, fieldB_s asc, id asc")
> )
> {code}
> This can also be used as part of complex expressions to help keep track of 
> what is being worked on. This is particularly useful when merging/joining 
> multiple collections which share field names. For example, the following 
> results in a set of tuples including only the fields id, left.ident, and 
> right.ident even though the total set of fields required to perform the 
> search and join is much larger than just those three fields.
> {code}
> select(
>   id, left.ident, right.ident,
>   innerJoin(
>     select(
>       id, join1_i as left.join1, join2_s as left.join2, ident_s as left.ident,
>       search(collection1, q="side_s:left", fl="id,join1_i,join2_s,ident_s",
>         sort="join1_i asc, join2_s asc, id asc")
>     ),
>     select(
>       join3_i as right.join1, join2_s as right.join2, ident_s as right.ident,
>       search(collection1, q="side_s:right", fl="join3_i,join2_s,ident_s",
>         sort="join3_i asc, join2_s asc")
>     ),
>     on="left.join1=right.join1, left.join2=right.join2"
>   )
> )
> {code}
> This depends on SOLR-7584.






[jira] [Closed] (SOLR-7669) Add SelectStream to Streaming API

2015-11-11 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove closed SOLR-7669.
-
   Resolution: Implemented
Fix Version/s: Trunk

> Add SelectStream to Streaming API
> -
>
> Key: SOLR-7669
> URL: https://issues.apache.org/jira/browse/SOLR-7669
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrJ
>Reporter: Dennis Gove
>Assignee: Dennis Gove
>Priority: Minor
>  Labels: Streaming
> Fix For: Trunk
>
> Attachments: SOLR-7669.patch, SOLR-7669.patch, SOLR-7669.patch, 
> SOLR-7669.patch, SOLR-7669.patch, SOLR-7669.patch
>
>
> Adds a new stream called SelectStream which can be used for two purposes.
>  1. Limit the set of fields included in an outgoing tuple to remove unwanted 
> fields
>  2. Provide aliases for fields. With this it acts as an alternative to the 
> CloudSolrStream's 'aliases' option.
>  For example, in a simple case
> {code}
> select(
>   id, 
>   fieldA_i as fieldA, 
>   fieldB_s as fieldB,
>   search(collection1, q="*:*", fl="id,fieldA_i,fieldB_s",
>     sort="fieldA_i asc, fieldB_s asc, id asc")
> )
> {code}
> This can also be used as part of complex expressions to help keep track of 
> what is being worked on. This is particularly useful when merging/joining 
> multiple collections which share field names. For example, the following 
> results in a set of tuples including only the fields id, left.ident, and 
> right.ident even though the total set of fields required to perform the 
> search and join is much larger than just those three fields.
> {code}
> select(
>   id, left.ident, right.ident,
>   innerJoin(
>     select(
>       id, join1_i as left.join1, join2_s as left.join2, ident_s as left.ident,
>       search(collection1, q="side_s:left", fl="id,join1_i,join2_s,ident_s",
>         sort="join1_i asc, join2_s asc, id asc")
>     ),
>     select(
>       join3_i as right.join1, join2_s as right.join2, ident_s as right.ident,
>       search(collection1, q="side_s:right", fl="join3_i,join2_s,ident_s",
>         sort="join3_i asc, join2_s asc")
>     ),
>     on="left.join1=right.join1, left.join2=right.join2"
>   )
> )
> {code}
> This depends on SOLR-7584.






[jira] [Comment Edited] (SOLR-8185) Add operations support to streaming metrics

2015-11-12 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968357#comment-14968357
 ] 

Dennis Gove edited comment on SOLR-8185 at 11/13/15 1:41 AM:
-

Full patch. 


was (Author: dpgove):
Full patch. All tests pass.

> Add operations support to streaming metrics
> ---
>
> Key: SOLR-8185
> URL: https://issues.apache.org/jira/browse/SOLR-8185
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrJ
>Reporter: Dennis Gove
>Assignee: Dennis Gove
>Priority: Minor
> Attachments: SOLR-8185.patch
>
>
> Adds support for operations on stream metrics.
> With this feature one can modify tuple values before applying to the computed 
> metric. There are a lot of use-cases I can see with this - I'll describe one 
> here.
> Imagine you have a RollupStream which is computing the average over some 
> field but you cannot be sure that all documents have a value for that field, 
> ie the value is null. When the value is null you want to treat it as a 0. 
> With this feature you can accomplish that like this
> {code}
> rollup(
>   search(collection1, q=*:*, fl="a_s,a_i,a_f", sort="a_s asc"),
>   over="a_s",
>   avg(a_i, replace(null, withValue=0)),
>   count(*)
> )
> {code}
> The operations are applied to the tuple separately for each metric in the 
> stream, which means you can perform different operations on different metrics 
> without one metric being impacted by the operations on another. 
> Adding to our previous example, imagine you also want the min of the same 
> field but have it ignore null values rather than treat them as 0.
> {code}
> rollup(
>   search(collection1, q=*:*, fl="a_s,a_i,a_f", sort="a_s asc"),
>   over="a_s",
>   avg(a_i, replace(null, withValue=0)),
>   min(a_i),
>   count(*)
> )
> {code}
> Also, the tuple is not modified for streams that might wrap this one. Ie, the 
> only thing that sees the applied operation is that particular metric. If you 
> want to apply operations for wrapping streams you can still achieve that with 
> the SelectStream (SOLR-7669).
> One feature I'm investigating but this patch DOES NOT add is the ability to 
> assign names to the resulting metric value. For example, to allow for 
> something like this
> {code}
> rollup(
>   search(collection1, q=*:*, fl="a_s,a_i,a_f", sort="a_s asc"),
>   over="a_s",
>   avg(a_i, replace(null, withValue=0), as="avg_a_i_null_as_0"),
>   avg(a_i),
>   count(*, as="totalCount")
> )
> {code}
> Right now that isn't possible because the identifier for each metric would be 
> the same "avg_a_i" and as such both couldn't be returned. It's relatively 
> easy to add but I have to investigate its impact on the SQL and FacetStream 
> areas.
> Depends on SOLR-7669 (SelectStream)






[jira] [Assigned] (SOLR-8185) Add operations support to streaming metrics

2015-11-12 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove reassigned SOLR-8185:
-

Assignee: Dennis Gove

> Add operations support to streaming metrics
> ---
>
> Key: SOLR-8185
> URL: https://issues.apache.org/jira/browse/SOLR-8185
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrJ
>Reporter: Dennis Gove
>Assignee: Dennis Gove
>Priority: Minor
> Attachments: SOLR-8185.patch
>
>
> Adds support for operations on stream metrics.
> With this feature one can modify tuple values before applying to the computed 
> metric. There are a lot of use-cases I can see with this - I'll describe one 
> here.
> Imagine you have a RollupStream which is computing the average over some 
> field but you cannot be sure that all documents have a value for that field, 
> ie the value is null. When the value is null you want to treat it as a 0. 
> With this feature you can accomplish that like this
> {code}
> rollup(
>   search(collection1, q=*:*, fl="a_s,a_i,a_f", sort="a_s asc"),
>   over="a_s",
>   avg(a_i, replace(null, withValue=0)),
>   count(*)
> )
> {code}
> The operations are applied to the tuple separately for each metric in the 
> stream, which means you can perform different operations on different metrics 
> without one metric being impacted by the operations on another. 
> Adding to our previous example, imagine you also want the min of the same 
> field but have it ignore null values rather than treat them as 0.
> {code}
> rollup(
>   search(collection1, q=*:*, fl="a_s,a_i,a_f", sort="a_s asc"),
>   over="a_s",
>   avg(a_i, replace(null, withValue=0)),
>   min(a_i),
>   count(*)
> )
> {code}
> Also, the tuple is not modified for streams that might wrap this one, i.e. 
> only that particular metric sees the applied operation. If you want to apply 
> operations for wrapping streams you can still achieve that with the 
> SelectStream (SOLR-7669).
> One feature I'm investigating, but which this patch DOES NOT add, is the 
> ability to assign names to the resulting metric value. For example, to allow 
> for something like this:
> {code}
> rollup(
>   search(collection1, q=*:*, fl="a_s,a_i,a_f", sort="a_s asc"),
>   over="a_s",
>   avg(a_i, replace(null, withValue=0), as="avg_a_i_null_as_0"),
>   avg(a_i),
>   count(*, as="totalCount")
> )
> {code}
> Right now that isn't possible because the identifier for both avg metrics 
> would be the same, "avg_a_i", so both couldn't be returned. It's relatively 
> easy to add, but I have to investigate its impact on the SQL and FacetStream 
> areas.
> Depends on SOLR-7669 (SelectStream)
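
The per-metric behaviour described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the code from the patch: tuples are modelled as maps, `MetricOperationSketch`, `avg`, `min` and `tuple` are hypothetical names, and metrics are assumed to skip values that are still null after the operation has run.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical illustration; none of these names come from the SOLR-8185 patch.
final class MetricOperationSketch {

    // Average of `field`, applying `operation` to each value before it is
    // accumulated. The tuple itself is never modified.
    static double avg(List<Map<String, Object>> tuples, String field,
                      UnaryOperator<Object> operation) {
        double sum = 0;
        long count = 0;
        for (Map<String, Object> tuple : tuples) {
            Object value = operation.apply(tuple.get(field));
            if (value instanceof Number) {   // assumption: remaining nulls are skipped
                sum += ((Number) value).doubleValue();
                count++;
            }
        }
        return count == 0 ? 0 : sum / count;
    }

    // Minimum of `field` with no operation attached: null values are skipped.
    static Double min(List<Map<String, Object>> tuples, String field) {
        Double min = null;
        for (Map<String, Object> tuple : tuples) {
            Object value = tuple.get(field);
            if (value instanceof Number) {
                double d = ((Number) value).doubleValue();
                if (min == null || d < min) {
                    min = d;
                }
            }
        }
        return min;
    }

    static Map<String, Object> tuple(String group, Integer value) {
        Map<String, Object> t = new HashMap<>();
        t.put("a_s", group);
        t.put("a_i", value);
        return t;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> tuples = new ArrayList<>();
        tuples.add(tuple("x", 4));
        tuples.add(tuple("x", null));
        tuples.add(tuple("x", 8));

        // Similar in spirit to avg(a_i, replace(null, withValue=0)).
        UnaryOperator<Object> nullAsZero = v -> v == null ? Integer.valueOf(0) : v;

        System.out.println(avg(tuples, "a_i", nullAsZero));               // 4.0
        System.out.println(avg(tuples, "a_i", UnaryOperator.identity())); // 6.0
        System.out.println(min(tuples, "a_i"));                           // 4.0
    }
}
```

Because the operation is handed to one metric only, the min over the same field still sees the original null and ignores it, and the tuples handed to any wrapping stream are unchanged.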



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8185) Add operations support to streaming metrics

2015-11-12 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003390#comment-15003390
 ] 

Dennis Gove commented on SOLR-8185:
---

I'm running into some issues turning the expression into something that 
performs the expected .equals() comparison.

{code}
 avg(a_f, replace(10, withValue=0))
{code}
In this example, what type is 10? Is it a long or a float or a double? The 
field is a float (as noted by the _f) so one would expect 10 to be a float as 
well. However, in converting 10 to some Object that we can call .equals(...) on 
we are not sure what the type is. This has been a persistent problem with this 
patch.

But I think I've come up with something that puts some of the decision making 
in the hands of the expression writer.

{code}
 avg(a_f, replace(10f, withValue=0f))
{code}
In this case the value can only be converted to a float so it will be created 
as a float object.

However, before adding this new requirement on the expression writer I want to 
take a deeper look at what it might impact and make sure the documentation is 
very clear. If a user doesn't do the correct thing (gives us 10 instead of 10f) and
the value in the tuple is a float then float.equals(long) == false every single 
time.
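
The equality pitfall can be reproduced in plain Java. The sketch below is not the parser from the patch: `parseLiteral` is a hypothetical helper, and it assumes that a trailing f, d or l suffix selects the boxed type and that an unsuffixed integer literal becomes a Long.

```java
// Hypothetical illustration; parseLiteral is not part of the SOLR-8185 patch.
final class LiteralTypeSketch {

    // Pick a boxed numeric type from an optional suffix on a literal.
    static Object parseLiteral(String text) {
        String t = text.trim();
        char last = Character.toLowerCase(t.charAt(t.length() - 1));
        String digits = t.substring(0, t.length() - 1);
        if (last == 'f') {
            return Float.valueOf(digits);
        }
        if (last == 'd') {
            return Double.valueOf(digits);
        }
        if (last == 'l') {
            return Long.valueOf(digits);
        }
        // No suffix (assumption): Long for integers, Double otherwise.
        if (t.contains(".")) {
            return Double.valueOf(t);
        }
        return Long.valueOf(t);
    }

    public static void main(String[] args) {
        Object tupleValue = Float.valueOf(10f);   // value read from an a_f field

        // Boxed equals() checks the class first, so a Float never equals a Long.
        System.out.println(tupleValue.equals(parseLiteral("10")));   // false
        System.out.println(tupleValue.equals(parseLiteral("10f")));  // true
    }
}
```

An if/else chain is used rather than a conditional expression on purpose: `cond ? Double.valueOf(t) : Long.valueOf(t)` would unbox both branches and always yield a Double.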

Anyway, this note is somewhat of a rant. 




[jira] [Assigned] (SOLR-8188) Add hash style joins to the Streaming API and Streaming Expressions

2015-11-11 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove reassigned SOLR-8188:
-

Assignee: Dennis Gove

> Add hash style joins to the Streaming API and Streaming Expressions
> ---
>
> Key: SOLR-8188
> URL: https://issues.apache.org/jira/browse/SOLR-8188
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrJ
>Reporter: Dennis Gove
>Assignee: Dennis Gove
>Priority: Minor
> Attachments: SOLR-8188.patch, SOLR-8188.patch
>
>
> Add HashJoinStream and OuterHashJoinStream to the Streaming API to allow for 
> optimized joining between sub-streams.
> HashJoinStream is similar to an InnerJoinStream except that it does not 
> insist on any particular order and reads all values from the stream being 
> hashed (hashStream) when open() is called. During read() it returns the next 
> tuple from the stream not being hashed (fullStream) that has at least one 
> matching record in hashStream; the returned tuple is the merge of both 
> tuples. If a tuple from fullStream matches more than one tuple from 
> hashStream, each subsequent call to read() returns the merge with the next 
> matching tuple. The order of the resulting stream is the order of the 
> fullStream.
> OuterHashJoinStream is similar to a HashJoinStream, but like a 
> LeftOuterJoinStream it returns a tuple from fullStream even if that tuple has 
> no matching record in hashStream. All other pieces are identical.
> In expression form:
> {code}
> hashJoin(
>   search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...),
>   hashed=search(collection2, q=*:*, fl="fieldA, fieldB, fieldE", ...),
>   on="fieldA, fieldB"
> )
> {code}
> {code}
> outerHashJoin(
>   search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...),
>   hashed=search(collection2, q=*:*, fl="fieldA, fieldB, fieldE", ...),
>   on="fieldA, fieldB"
> )
> {code}
> As you can see, the hashStream is a named parameter, which makes it very clear 
> which stream should be hashed.
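
The open()/read() behaviour described above can be sketched in plain Java. This is a simplified illustration, not the HashJoinStream implementation: tuples are modelled as maps, the result is collected into a list instead of being returned one tuple per read(), and `HashJoinSketch`, `key`, `join` and `tuple` are hypothetical names. Letting the hashed tuple's values win when both sides share a field name is also an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the code from the SOLR-8188 patch.
final class HashJoinSketch {

    // Build the join key from the `on` fields of a tuple.
    static List<Object> key(Map<String, Object> tuple, List<String> on) {
        List<Object> key = new ArrayList<>();
        for (String field : on) {
            key.add(tuple.get(field));
        }
        return key;
    }

    // outer == false behaves like hashJoin, outer == true like outerHashJoin.
    static List<Map<String, Object>> join(List<Map<String, Object>> fullStream,
                                          List<Map<String, Object>> hashStream,
                                          List<String> on, boolean outer) {
        // "open()": read the whole hashed stream into memory, grouped by key.
        Map<List<Object>, List<Map<String, Object>>> hashed = new HashMap<>();
        for (Map<String, Object> t : hashStream) {
            hashed.computeIfAbsent(key(t, on), k -> new ArrayList<>()).add(t);
        }

        // "read()": walk fullStream in its own order, one merged tuple per match.
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> full : fullStream) {
            List<Map<String, Object>> matches = hashed.get(key(full, on));
            if (matches == null) {
                if (outer) {
                    out.add(new LinkedHashMap<>(full)); // unmatched tuple kept as-is
                }
                continue;
            }
            for (Map<String, Object> match : matches) {
                Map<String, Object> merged = new LinkedHashMap<>(full);
                merged.putAll(match);
                out.add(merged);
            }
        }
        return out;
    }

    static Map<String, Object> tuple(Object... kv) {
        Map<String, Object> t = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            t.put((String) kv[i], kv[i + 1]);
        }
        return t;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> full = Arrays.asList(
            tuple("fieldA", 2, "fieldC", "c2"),
            tuple("fieldA", 1, "fieldC", "c1"),
            tuple("fieldA", 3, "fieldC", "c3"));
        List<Map<String, Object>> hash = Arrays.asList(
            tuple("fieldA", 1, "fieldE", "e1"),
            tuple("fieldA", 2, "fieldE", "e2a"),
            tuple("fieldA", 2, "fieldE", "e2b"));
        List<String> on = Arrays.asList("fieldA");

        System.out.println(join(full, hash, on, false).size()); // 3
        System.out.println(join(full, hash, on, true).size());  // 4
    }
}
```

Neither input needs to be sorted, but the hashed side is held in memory in full, which is why it helps to let the expression writer choose the smaller stream.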






[jira] [Closed] (SOLR-8188) Add hash style joins to the Streaming API and Streaming Expressions

2015-11-11 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove closed SOLR-8188.
-
   Resolution: Implemented
Fix Version/s: Trunk

> Add hash style joins to the Streaming API and Streaming Expressions
> ---
>
> Key: SOLR-8188
> URL: https://issues.apache.org/jira/browse/SOLR-8188
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrJ
>Reporter: Dennis Gove
>Assignee: Dennis Gove
>Priority: Minor
> Fix For: Trunk
>
> Attachments: SOLR-8188.patch, SOLR-8188.patch



[jira] [Reopened] (SOLR-8188) Add hash style joins to the Streaming API and Streaming Expressions

2015-11-11 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove reopened SOLR-8188:
---

Forgot to attach a slightly modified patch file (rebased off trunk).




[jira] [Updated] (SOLR-8188) Add hash style joins to the Streaming API and Streaming Expressions

2015-11-11 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-8188:
--
Attachment: SOLR-8188.patch

This is the patch that was applied to trunk.

> Add hash style joins to the Streaming API and Streaming Expressions
> ---
>
> Key: SOLR-8188
> URL: https://issues.apache.org/jira/browse/SOLR-8188
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrJ
>Reporter: Dennis Gove
>Assignee: Dennis Gove
>Priority: Minor
> Fix For: Trunk
>
> Attachments: SOLR-8188.patch, SOLR-8188.patch, SOLR-8188.patch



[jira] [Closed] (SOLR-8188) Add hash style joins to the Streaming API and Streaming Expressions

2015-11-11 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove closed SOLR-8188.
-
Resolution: Implemented

Still closed




[jira] [Updated] (SOLR-7669) Add SelectStream to Streaming API

2015-11-11 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7669:
--
Attachment: SOLR-7669.patch

Rebased against trunk.

> Add SelectStream to Streaming API
> -
>
> Key: SOLR-7669
> URL: https://issues.apache.org/jira/browse/SOLR-7669
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrJ
>Reporter: Dennis Gove
>Priority: Minor
>  Labels: Streaming
> Attachments: SOLR-7669.patch, SOLR-7669.patch, SOLR-7669.patch, 
> SOLR-7669.patch, SOLR-7669.patch
>
>
> Adds a new stream called SelectStream which can be used for two purposes:
>  1. Limit the set of fields included in an outgoing tuple to remove unwanted 
> fields.
>  2. Provide aliases for fields. With this it acts as an alternative to the 
> CloudSolrStream's 'aliases' option.
>  For example, in a simple case:
> {code}
> select(
>   id, 
>   fieldA_i as fieldA, 
>   fieldB_s as fieldB,
>   search(collection1, q="*:*", fl="id,fieldA_i,fieldB_s", sort="fieldA_i asc, fieldB_s asc, id asc")
> )
> {code}
> This can also be used as part of complex expressions to help keep track of 
> what is being worked on. This is particularly useful when merging/joining 
> multiple collections which share field names. For example, the following 
> results in a set of tuples including only the fields id, left.ident, and 
> right.ident even though the total set of fields required to perform the 
> search and join is much larger than just those three fields.
> {code}
> select(
>   id, left.ident, right.ident,
>   innerJoin(
>     select(
>       id, join1_i as left.join1, join2_s as left.join2, ident_s as left.ident,
>       search(collection1, q="side_s:left", fl="id,join1_i,join2_s,ident_s", sort="join1_i asc, join2_s asc, id asc")
>     ),
>     select(
>       join3_i as right.join1, join2_s as right.join2, ident_s as right.ident,
>       search(collection1, q="side_s:right", fl="join3_i,join2_s,ident_s", sort="join3_i asc, join2_s asc")
>     ),
>     on="left.join1=right.join1, left.join2=right.join2"
>   )
> )
> {code}
> This depends on SOLR-7584.
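
The two purposes above (dropping unwanted fields and renaming the ones that are kept) can be sketched in plain Java. This is only an illustration with hypothetical names (`SelectSketch`, `select`), not the SelectStream implementation; a tuple is modelled as a map and the selection as a map from incoming field name to outgoing field name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not the code from the SOLR-7669 patch.
final class SelectSketch {

    // `selected` maps incoming field name -> outgoing field name.
    // Fields that are not listed are dropped from the outgoing tuple.
    static Map<String, Object> select(Map<String, Object> tuple, Map<String, String> selected) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : selected.entrySet()) {
            if (tuple.containsKey(e.getKey())) {
                out.put(e.getValue(), tuple.get(e.getKey()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> tuple = new LinkedHashMap<>();
        tuple.put("id", "doc1");
        tuple.put("fieldA_i", 7);
        tuple.put("fieldB_s", "b");
        tuple.put("fieldC_s", "not wanted");

        // Corresponds to: select(id, fieldA_i as fieldA, fieldB_s as fieldB, ...)
        Map<String, String> selected = new LinkedHashMap<>();
        selected.put("id", "id");
        selected.put("fieldA_i", "fieldA");
        selected.put("fieldB_s", "fieldB");

        System.out.println(select(tuple, selected)); // {id=doc1, fieldA=7, fieldB=b}
    }
}
```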






[jira] [Closed] (SOLR-8268) Makes StatsStream implement Expressible interface

2015-11-10 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove closed SOLR-8268.
-
Resolution: Implemented

> Makes StatsStream implement Expressible interface
> -
>
> Key: SOLR-8268
> URL: https://issues.apache.org/jira/browse/SOLR-8268
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrJ
>Affects Versions: Trunk
>Reporter: Dennis Gove
>Assignee: Dennis Gove
>Priority: Trivial
>  Labels: Streaming
> Fix For: Trunk
>
> Attachments: SOLR-8268.patch
>
>
> Adds expression support to the Stats stream. With this it will now be 
> possible to express a stats stream as
> {code}
> stats(
>   collection1, q=*:*, fl="fieldA,fieldB,fieldInt,fieldFloat",
>   sum(fieldInt), 
>   sum(fieldFloat), 
>   min(fieldInt), 
>   min(fieldFloat), 
>   max(fieldInt), 
>   max(fieldFloat), 
>   avg(fieldInt), 
>   avg(fieldFloat), 
>   count(*)
> )
> {code}
> You can collect stats on any supported metric and use full metric features, 
> i.e. once SOLR-8185 is committed you can include operations in the 
> metrics.






[jira] [Updated] (SOLR-7584) Add Joins to the Streaming API and Streaming Expressions

2015-11-10 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7584:
--
Attachment: SOLR-7584.patch

Rebased against current trunk. A couple of comment changes. All tests pass.

> Add Joins to the Streaming API and Streaming Expressions
> 
>
> Key: SOLR-7584
> URL: https://issues.apache.org/jira/browse/SOLR-7584
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrJ
>Reporter: Dennis Gove
>Priority: Minor
>  Labels: Streaming
> Attachments: SOLR-7584.patch, SOLR-7584.patch, SOLR-7584.patch, 
> SOLR-7584.patch, SOLR-7584.patch, SOLR-7584.patch
>
>
> Add InnerJoinStream, LeftOuterJoinStream, and supporting classes to the 
> Streaming API to allow for joining between sub-streams.
> At its most basic, it would look something like this:
> {code}
> innerJoin(
>   search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...),
>   search(collection2, q=*:*, fl="fieldA, fieldD, fieldE", ...),
>   on="fieldA=fieldA"
> )
> {code}
> or with multi-field on clauses
> {code}
> innerJoin(
>   search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...),
>   search(collection2, q=*:*, fl="fieldA, fieldD, fieldE", ...),
>   on="fieldA=fieldA, fieldB=fieldD"
> )
> {code}
> I'd also like to support the option of doing a hash join instead of the 
> default merge join but I haven't yet figured out the best way to express 
> that. I'd like to let the user tell us which sub-stream should be hashed (the 
> least-cost one).
> Also, I've been thinking about field aliasing and might want to add a 
> SelectStream which serves the purpose of allowing us to limit the fields 
> coming out and rename fields.
> Depends on SOLR-7554
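
For contrast with a hash join, here is a generic sort-merge inner join in plain Java. It is a textbook sketch and not the InnerJoinStream code from the patch: it assumes a single join field per side, non-null comparable keys, and inputs already sorted ascending on that field. `MergeJoinSketch`, `innerJoin` and `tuple` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a generic merge join; not the SOLR-7584 code.
final class MergeJoinSketch {

    @SuppressWarnings("unchecked")
    static int compare(Object a, Object b) {
        return ((Comparable<Object>) a).compareTo(b);
    }

    // Inner merge join of two lists already sorted ascending on their join
    // fields (leftField on the left side, rightField on the right side).
    static List<Map<String, Object>> innerJoin(List<Map<String, Object>> left, String leftField,
                                               List<Map<String, Object>> right, String rightField) {
        List<Map<String, Object>> out = new ArrayList<>();
        int l = 0;
        int r = 0;
        while (l < left.size() && r < right.size()) {
            int c = compare(left.get(l).get(leftField), right.get(r).get(rightField));
            if (c < 0) {
                l++;                         // left key is smaller: advance left
            } else if (c > 0) {
                r++;                         // right key is smaller: advance right
            } else {
                // Find the run of right tuples sharing this key.
                Object key = right.get(r).get(rightField);
                int runEnd = r;
                while (runEnd < right.size()
                        && compare(right.get(runEnd).get(rightField), key) == 0) {
                    runEnd++;
                }
                // Pair every left tuple with this key against the whole right run.
                while (l < left.size() && compare(left.get(l).get(leftField), key) == 0) {
                    for (int i = r; i < runEnd; i++) {
                        Map<String, Object> merged = new LinkedHashMap<>(left.get(l));
                        merged.putAll(right.get(i));
                        out.add(merged);
                    }
                    l++;
                }
                r = runEnd;
            }
        }
        return out;
    }

    static Map<String, Object> tuple(Object... kv) {
        Map<String, Object> t = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            t.put((String) kv[i], kv[i + 1]);
        }
        return t;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> left = new ArrayList<>();
        for (int k : new int[] {1, 2, 2, 4}) {
            left.add(tuple("fieldA", k, "fieldB", "b" + k));
        }
        List<Map<String, Object>> right = new ArrayList<>();
        for (int k : new int[] {2, 2, 3, 4}) {
            right.add(tuple("fieldA", k, "fieldD", "d" + k));
        }
        // Keys 2 (2 x 2 pairs) and 4 (1 pair) match.
        System.out.println(innerJoin(left, "fieldA", right, "fieldA").size()); // 5
    }
}
```

A merge join only needs to buffer the current run of equal keys, at the cost of requiring both inputs to arrive sorted; a hash join drops the ordering requirement but has to hold one whole side in memory.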





