[jira] [Updated] (SOLR-7377) SOLR Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Gove updated SOLR-7377:
Attachment: SOLR-7377.patch

Added the ability to turn an ExpressibleStream into a StreamExpression. Combined with the existing ability to turn a StreamExpression into a string, we can now go back and forth between string and stream. This will allow us to modify ParallelStream to pass along the string expression of the stream it wants to parallelize.

SOLR Streaming Expressions

Key: SOLR-7377
URL: https://issues.apache.org/jira/browse/SOLR-7377
Project: Solr
Issue Type: Improvement
Components: clients - java
Reporter: Dennis Gove
Priority: Minor
Fix For: Trunk
Attachments: SOLR-7377.patch

It would be beneficial to add an expression-based interface to the Streaming API described in SOLR-7082. Right now that API requires streaming requests to come in from clients as serialized bytecode of the streaming classes. The suggestion here is to support string expressions which describe the streaming operations the client wishes to perform.

{code:java}
search(collection1, q="*:*", fl="id,fieldA,fieldB", sort="fieldA asc")
{code}

With this syntax in mind, one can now express arbitrarily complex stream queries with a single string.

{code:java}
// merge two distinct searches together on common fields
merge(
  search(collection1, q="id:(0 3 4)", fl="id,a_s,a_i,a_f", sort="a_f asc, a_s asc"),
  search(collection2, q="id:(1 2)", fl="id,a_s,a_i,a_f", sort="a_f asc, a_s asc"),
  on="a_f asc, a_s asc")

// find top 20 unique records of a search
top(
  n=20,
  unique(
    search(collection1, q="*:*", fl="id,a_s,a_i,a_f", sort="a_f desc"),
    over="a_f desc"),
  sort="a_f desc")
{code}

The syntax would support
1. Configurable expression names (e.g. via solrconfig.xml one can map unique to a class implementing a Unique stream class). This allows users to build their own streams and use them as they wish.
2. Named parameters (of both simple and expression types)
3. Unnamed, type-matched parameters (to support requiring N streams as arguments to another stream)
4. Positional parameters

The main goal here is to make streaming as accessible as possible and define a syntax for running complex queries across large distributed systems.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
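To make the function-style syntax above more concrete, here is a minimal sketch of how an expression string could be split into its function name and top-level parameters. ExpressionSplitter is a hypothetical helper written for illustration and is not taken from the SOLR-7377 patch. It assumes that values containing commas or parentheses are wrapped in double quotes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of the SOLR-7377 patch. */
final class ExpressionSplitter {

    /** Returns the function name followed by its top-level parameters. */
    static List<String> split(String expression) {
        String text = expression.trim();
        int open = text.indexOf('(');
        if (open <= 0 || !text.endsWith(")")) {
            throw new IllegalArgumentException("Not a function-style expression: " + expression);
        }
        List<String> parts = new ArrayList<>();
        parts.add(text.substring(0, open).trim());

        // Walk the parameter list, splitting on commas that are neither
        // inside a nested expression nor inside a double-quoted value.
        String body = text.substring(open + 1, text.length() - 1);
        StringBuilder current = new StringBuilder();
        int depth = 0;
        boolean quoted = false;
        for (char c : body.toCharArray()) {
            if (c == '"') {
                quoted = !quoted;
            } else if (!quoted && c == '(') {
                depth++;
            } else if (!quoted && c == ')') {
                depth--;
            }
            if (c == ',' && depth == 0 && !quoted) {
                parts.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        String last = current.toString().trim();
        if (!last.isEmpty()) {
            parts.add(last);
        }
        return parts;
    }
}
```

Each nested expression comes back as a single parameter string, so the same method can be applied recursively to walk the whole tree.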
[jira] [Commented] (SOLR-7377) SOLR Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510208#comment-14510208 ]

Dennis Gove commented on SOLR-7377:

Well that makes it much clearer for me. I'm sorry for deleting all the older patches. From my reading of How to Contribute, I was under the impression that uploading a new patch with the same name would simply replace the old one. Each time I uploaded a new version I saw the previous one still there, figured I'd done something wrong, and deleted the old version. It didn't occur to me that the old versions would stay there but be greyed out. I won't be deleting the old versions going forward. Thanks for clearing that up for me!

The size of the patch is a function of some package refactoring in the org.apache.solr.client.solrj.io package, which seems to make the diff show a bunch of deleted/added files.

Attachments: SOLR-7377.patch
[jira] [Updated] (SOLR-7377) SOLR Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Gove updated SOLR-7377:
Attachment: (was: SOLR-7377.patch)
[jira] [Updated] (SOLR-7377) SOLR Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Gove updated SOLR-7377:
Attachment: (was: SOLR-7377.patch)

Attachments: SOLR-7377.patch
[jira] [Comment Edited] (SOLR-7377) SOLR Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512791#comment-14512791 ]

Dennis Gove edited comment on SOLR-7377 at 4/26/15 12:29 AM:

Made ParallelStream an ExpressibleStream and modified the StreamHandler to accept stream expression strings instead of bytecode.

Refactored operation and operand into functionName and parameter, along with all required references and tangentially related variable/class names.

Renamed EqualToComparator to FieldComparator to make the name a little more descriptive.

Added support for pluggable streams by making them configurable in solrconfig.xml.

All stream-related tests pass. At this point I'd consider this functionally complete.

was (Author: dpgove):
Made ParallelStream an ExpressibleStream and modified the StreamHandler to accept stream expression strings instead of bytecode. Refactored operation and operand into functionName and parameter, along with all required references and tangentially related variable/class names. Renamed EqualToComparator to FieldComparator to make the name a little more descriptive. Added support for pluggable streams by making them configurable in solrconfig.xml.

Attachments: SOLR-7377.patch, SOLR-7377.patch
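As a rough illustration of what pluggable streams imply, the sketch below keeps a lookup table from function names to factories, which is the kind of mapping a solrconfig.xml entry could populate. StreamFunctionRegistry and its methods are hypothetical names chosen for this example. The patch's actual classes and configuration format are not shown in this thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical registry for illustration; the patch's real mechanism may differ. */
final class StreamFunctionRegistry {
    private final Map<String, Supplier<?>> factories = new HashMap<>();

    /** Maps a function name (e.g. "unique") to a factory for its stream object. */
    StreamFunctionRegistry register(String functionName, Supplier<?> factory) {
        factories.put(functionName, factory);
        return this;
    }

    /** Creates a new instance for the given function name, or fails if it is unknown. */
    Object create(String functionName) {
        Supplier<?> factory = factories.get(functionName);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown stream function: " + functionName);
        }
        return factory.get();
    }
}
```

A user-defined stream would then only need one extra register call, or its configuration equivalent, to become usable inside an expression.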
[jira] [Updated] (SOLR-7377) SOLR Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Gove updated SOLR-7377:
Attachment: SOLR-7377.patch

Made ParallelStream an ExpressibleStream and modified the StreamHandler to accept stream expression strings instead of bytecode.

Refactored operation and operand into functionName and parameter, along with all required references and tangentially related variable/class names.

Renamed EqualToComparator to FieldComparator to make the name a little more descriptive.

Added support for pluggable streams by making them configurable in solrconfig.xml.

Attachments: SOLR-7377.patch, SOLR-7377.patch
[jira] [Commented] (SOLR-7377) SOLR Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512165#comment-14512165 ]

Dennis Gove commented on SOLR-7377:

I was thinking that all comparators, no matter their implemented comparison logic, return one of three basic values when comparing A and B.
1. A and B are logically equal to each other
2. A is logically before B
3. A is logically after B

The implemented comparison logic is then wholly dependent on what one intends to use the comparator for. For example, EqualToComparator's comparison logic returns that A and B are logically equal if they are in fact equal to each other. Its logically before/after response depends on the sort order (ascending or descending), but it is basically deciding whether A is less than B or A is greater than B.

One could, if they wanted to, create a comparator returning that two dates are logically equal if they occur within the same week, or a comparator returning that two numbers are logically equal if their values are within the same logarithmic order of magnitude. And so on.

My thinking is that comparators determine the logical comparison and make no assumption about what that implemented logic is. This leaves open the possibility of implementing other comparators for given situations as they arise.

Attachments: SOLR-7377.patch
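The order-of-magnitude example from the comment above can be sketched as follows. This uses the plain java.util.Comparator interface rather than the comparator interface in the patch, and MagnitudeComparator is a hypothetical name. It assumes strictly positive inputs.

```java
import java.util.Comparator;

/**
 * Hypothetical comparator for illustration: two positive numbers are
 * "logically equal" when they share the same base-10 order of magnitude.
 */
final class MagnitudeComparator implements Comparator<Double> {
    private final boolean ascending;

    MagnitudeComparator(boolean ascending) {
        this.ascending = ascending;
    }

    /** Order of magnitude of a strictly positive value, e.g. 950.0 -> 2. */
    private static int magnitude(double value) {
        return (int) Math.floor(Math.log10(value));
    }

    @Override
    public int compare(Double a, Double b) {
        // Zero means logically equal, negative means a is logically before b,
        // positive means a is logically after b.
        int result = Integer.compare(magnitude(a), magnitude(b));
        return ascending ? result : -result;
    }
}
```

With this comparator 120.0 and 950.0 compare as equal, while 9.0 sorts before 10.0 in ascending order and after it in descending order.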
[jira] [Comment Edited] (SOLR-7377) SOLR Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503160#comment-14503160 ]

Dennis Gove edited comment on SOLR-7377 at 4/20/15 4:46 PM:

Updated the patch based on some additional items I wanted to include in this.

Note that this patch adds a dependency on guava in the solr/solrj/ivy.xml file. We may want to revisit this additional dependency. Guava is being used for some basic string checks (to ensure operations only include supported characters), and this logic could be coded up by hand if we want to avoid adding a dependency.

{code}
<dependency org="com.google.guava" name="guava" rev="${/com.google.guava/guava}" conf="compile"/>
{code}

was (Author: dpgove):
Updated the patch based on some additional items I wanted to include in this. Note that this patch adds a dependency on guava in the solr/solrj/ivy.xml file. We may want to revisit this additional dependency. Guava is being used for some basic string checks (to ensure operations only include supported characters), and this logic could be coded up by hand if we want to avoid adding a dependency.

<dependency org="com.google.guava" name="guava" rev="${/com.google.guava/guava}" conf="compile"/>

Attachments: SOLR-7377.patch, SOLR-7377.patch
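For a sense of how small the hand-coded alternative to Guava could be, here is a sketch of a character check using only the JDK. The set of supported characters (ASCII letters, digits, underscore, hyphen and dot) is an assumption made for this example. The thread does not say which characters the patch actually allows, and NameCheck is a hypothetical class name.

```java
/** Hypothetical JDK-only check for illustration; the allowed character set is assumed. */
final class NameCheck {

    /** Returns true if the name is non-empty and uses only the assumed supported characters. */
    static boolean isSupportedName(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean supported = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9')
                    || c == '_' || c == '-' || c == '.';
            if (!supported) {
                return false;
            }
        }
        return true;
    }
}
```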
[jira] [Updated] (SOLR-7377) SOLR Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Gove updated SOLR-7377:
Attachment: SOLR-7377.patch

Updated the patch based on some additional items I wanted to include in this.

Note that this patch adds a dependency on guava in the solr/solrj/ivy.xml file. We may want to revisit this additional dependency. Guava is being used for some basic string checks (to ensure operations only include supported characters), and this logic could be coded up by hand if we want to avoid adding a dependency.

<dependency org="com.google.guava" name="guava" rev="${/com.google.guava/guava}" conf="compile"/>

Attachments: SOLR-7377.patch, SOLR-7377.patch
[jira] [Updated] (SOLR-7377) SOLR Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Gove updated SOLR-7377:
Attachment: (was: SOLR-7377.patch)

Attachments: SOLR-7377.patch
[jira] [Updated] (SOLR-7377) SOLR Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Gove updated SOLR-7377:
Attachment: SOLR-7377.patch

Now allows a search expression to include a zkHost (though does not require it). Improved performance of EqualToComparator by moving some branching logic into the constructor and creating a lambda for the actual comparison.

Attachments: SOLR-7377.patch
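The optimization described in this update, deciding the branch once in the constructor and storing a lambda for the comparison itself, can be sketched like this. PrecomputedComparator is a hypothetical stand-in and not the EqualToComparator code from the patch.

```java
import java.util.Comparator;

/** Hypothetical sketch of resolving the sort direction once, at construction time. */
final class PrecomputedComparator<T extends Comparable<T>> implements Comparator<T> {
    private final Comparator<T> delegate;

    PrecomputedComparator(boolean ascending) {
        // The ascending/descending branch is taken here exactly once,
        // instead of on every call to compare().
        if (ascending) {
            delegate = (a, b) -> a.compareTo(b);
        } else {
            delegate = (a, b) -> b.compareTo(a);
        }
    }

    @Override
    public int compare(T a, T b) {
        return delegate.compare(a, b);
    }
}
```

The trade-off is one extra indirection per comparison in exchange for removing a conditional from a method that may be called once per tuple.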
[jira] [Updated] (SOLR-7377) SOLR Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Gove updated SOLR-7377:
Attachment: SOLR-7377.patch

Fixed a bug in CloudSolrStream when handling aliases. When filtering the stream-only parameters out of those that need to be passed to Solr for the query, I was checking for the parameter name "alias" when I should have been checking for "aliases".

Attachments: SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch
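The kind of filtering involved in this fix can be sketched as below. StreamParamFilter is a hypothetical class, and the set of stream-only names is an assumption: "aliases" comes from the comment above, while treating "zkHost" as stream-only is a guess made for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; the real CloudSolrStream code is not shown in this thread. */
final class StreamParamFilter {
    // Assumed set of parameters consumed by the stream itself rather than by Solr.
    private static final Set<String> STREAM_ONLY =
            new HashSet<>(Arrays.asList("aliases", "zkHost"));

    /** Returns only the parameters that should be passed on with the query. */
    static Map<String, String> solrParams(Map<String, String> all) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : all.entrySet()) {
            if (!STREAM_ONLY.contains(entry.getKey())) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }
}
```

A lookup against the wrong name, such as "alias", would silently let the "aliases" parameter through to the query, which matches the symptom described in the update.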
[jira] [Commented] (SOLR-7377) SOLR Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14525364#comment-14525364 ]

Dennis Gove commented on SOLR-7377:

I don't see the ExpressionRunner in the patch - am I missing it somewhere?

Also, I noticed ParallelStream lines 94-100 have some System.out.println lines. I suspect you intended to remove those.

Tests look good.

Attachments: SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch
[jira] [Commented] (SOLR-7377) SOLR Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522440#comment-14522440 ]

Dennis Gove commented on SOLR-7377:

I'm not totally against doing that but I feel like the refactoring is a required piece of this patch. I could, however, create a new ticket with just the refactoring and then make this one depend on that one. I am worried that such a ticket might look like unnecessary refactoring. Without the expression stuff added here I think the streaming stuff has a reasonable home in org.apache.solr.client.solrj.io. That said, I certainly understand the benefit of smaller patches.
[jira] [Issue Comment Deleted] (SOLR-7377) SOLR Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Gove updated SOLR-7377:

Comment: was deleted

(was: I'm not totally against doing that but I feel like the refactoring is a required piece of this patch. I could, however, create a new ticket with just the refactoring and then make this one depend on that one. I am worried that such a ticket might look like unnecessary refactoring. Without the expression stuff added here I think the streaming stuff has a reasonable home in org.apache.solr.client.solrj.io. That said, I certainly understand the benefit of smaller patches.)
[jira] [Commented] (SOLR-7377) SOLR Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522439#comment-14522439 ]

Dennis Gove commented on SOLR-7377:

I'm not totally against doing that but I feel like the refactoring is a required piece of this patch. I could, however, create a new ticket with just the refactoring and then make this one depend on that one. I am worried that such a ticket might look like unnecessary refactoring. Without the expression stuff added here I think the streaming stuff has a reasonable home in org.apache.solr.client.solrj.io. That said, I certainly understand the benefit of smaller patches.
[jira] [Commented] (SOLR-7377) SOLR Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14528339#comment-14528339 ]

Dennis Gove commented on SOLR-7377:

I believe I've found a bug in FieldComparator. I don't have time to create a new patch right now, but the bug is not checking for null on the field before calling compare. Fixed version is below.
{code:java}
private void assignComparator(){
  if(ComparatorOrder.DESCENDING == order){
    // What black magic is this type intersection??
    // Because this class is serializable we need to make sure the lambda is also serializable.
    // This can be done by providing this type intersection on the definition of the lambda.
    // Why not do it in the lambda interface? Functional Interfaces don't allow extends clauses
    comparator = (ComparatorLambda & Serializable)(leftTuple, rightTuple) -> {
      Comparable leftComp = (Comparable)leftTuple.get(leftField);
      Comparable rightComp = (Comparable)rightTuple.get(rightField);

      if(null == leftComp){ return -1; }
      if(null == rightComp){ return 1; }

      return rightComp.compareTo(leftComp);
    };
  }
  else{
    // See above for black magic reasoning.
    comparator = (ComparatorLambda & Serializable)(leftTuple, rightTuple) -> {
      Comparable leftComp = (Comparable)leftTuple.get(leftField);
      Comparable rightComp = (Comparable)rightTuple.get(rightField);

      if(null == leftComp){ return -1; }
      if(null == rightComp){ return 1; }

      return leftComp.compareTo(rightComp);
    };
  }
}
{code}
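For readers unfamiliar with the intersection cast used in the comment above, here is a minimal, self-contained sketch of the same idea. It is not the code from the patch: TupleComparatorLambda is a hypothetical stand-in for ComparatorLambda (whose definition is not shown in this thread), and tuples are modeled as plain maps rather than the Solr Tuple class. The sketch shows that a lambda cast to (Interface & Serializable) can be written to and read back from an object stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the patch's ComparatorLambda interface.
interface TupleComparatorLambda {
  int compare(Map<String, Object> left, Map<String, Object> right);
}

class SerializableLambdaSketch {

  // Ascending comparison on one field, with the same null guards as the
  // comment above: a null left value sorts first, a null right value sorts last.
  @SuppressWarnings({"unchecked", "rawtypes"})
  static TupleComparatorLambda ascending(String field) {
    // The intersection cast asks the compiler to make the lambda Serializable
    // without requiring the functional interface itself to extend Serializable.
    return (TupleComparatorLambda & Serializable) (left, right) -> {
      Comparable leftComp = (Comparable) left.get(field);
      Comparable rightComp = (Comparable) right.get(field);
      if (null == leftComp) { return -1; }
      if (null == rightComp) { return 1; }
      return leftComp.compareTo(rightComp);
    };
  }

  // Writes the lambda to a byte array and reads it back.
  static TupleComparatorLambda roundTrip(TupleComparatorLambda lambda) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(lambda);
      }
      try (ObjectInputStream in =
               new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (TupleComparatorLambda) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("lambda did not survive serialization", e);
    }
  }

  // Builds a one-field tuple for illustration.
  static Map<String, Object> tuple(String field, Object value) {
    Map<String, Object> t = new HashMap<>();
    t.put(field, value);
    return t;
  }
}
```

Without the cast, writeObject would fail with a NotSerializableException, since a lambda is only serializable when its target type is.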
[jira] [Updated] (SOLR-7513) Add Equalitors to Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Gove updated SOLR-7513:

Attachment: SOLR-7513.patch

Add Equalitors to Streaming Expressions

Key: SOLR-7513
URL: https://issues.apache.org/jira/browse/SOLR-7513
Project: Solr
Issue Type: Improvement
Components: clients - java
Affects Versions: Trunk
Reporter: Dennis Gove
Priority: Minor
Attachments: SOLR-7513.patch

Right now all streams use the Comparator<Tuple> interface to compare tuples. The Comparator interface tells you whether tupleA is before, after, or equal to tupleB. This is great for most streams, as they use this logic when combining multiple streams together. However, some streams only care about the equality of two tuples, and the less-than/greater-than logic is unnecessary.

This depends on SOLR-7377.

This patch introduces a new interface into streaming expressions called Equalitor<Tuple>, which returns whether two tuples are equal. The benefit here is that the expressions for streams using Equalitor instead of Comparator can omit the ordering part.
{code}
unique(somestream, over=fieldA asc, fieldB desc)
{code}
can become
{code}
unique(somestream, over=fieldA,fieldB)
{code}
The added benefit is that this will set us up with simpler expressions for joins (hash, merge, inner, outer, etc...), as those only care about equality.

By adding this as an interface we make no assumptions about what it means to be equal, just that some implementation needs to exist adhering to the Equalitor<Tuple> interface which will determine if two tuples are logically equal. We do define at least one concrete class which checks for equality, but that does not preclude others from adding additional concrete classes with their own logic in place.
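As a rough illustration of the proposal above, the following sketch shows what an equality-only interface and one concrete field-based implementation might look like. All names here (TupleEqualitor, FieldEqualitorSketch, parse) are hypothetical, tuples are modeled as plain maps, and the treatment of two missing values as equal is an assumption; the actual signatures in SOLR-7513.patch are not shown in this thread.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical shape of the proposed interface: equality only, no ordering.
interface TupleEqualitor {
  boolean test(Map<String, Object> left, Map<String, Object> right);
}

// One possible concrete implementation: two tuples are equal when every
// listed field holds equal values in both.
class FieldEqualitorSketch implements TupleEqualitor {
  private final List<String> fields;

  FieldEqualitorSketch(String... fields) {
    this.fields = Arrays.asList(fields);
  }

  @Override
  public boolean test(Map<String, Object> left, Map<String, Object> right) {
    for (String field : fields) {
      // Assumption: Objects.equals treats two missing (null) values as equal.
      if (!Objects.equals(left.get(field), right.get(field))) {
        return false;
      }
    }
    return true;
  }

  // Parses the value of an over= parameter such as "fieldA,fieldB".
  // No "asc"/"desc" suffix is needed because ordering is irrelevant here.
  static FieldEqualitorSketch parse(String over) {
    String[] names = over.split(",");
    for (int i = 0; i < names.length; i++) {
      names[i] = names[i].trim();
    }
    return new FieldEqualitorSketch(names);
  }

  // Builds a tuple from alternating field names and values, for illustration.
  static Map<String, Object> tuple(Object... keyValues) {
    Map<String, Object> t = new HashMap<>();
    for (int i = 0; i + 1 < keyValues.length; i += 2) {
      t.put((String) keyValues[i], keyValues[i + 1]);
    }
    return t;
  }
}
```

A stream such as unique could then hold a TupleEqualitor and skip a tuple whenever test returns true against the previously emitted one.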
[jira] [Created] (SOLR-7513) Add Equalitors to Streaming Expressions
Dennis Gove created SOLR-7513:

Summary: Add Equalitors to Streaming Expressions
Key: SOLR-7513
URL: https://issues.apache.org/jira/browse/SOLR-7513
Project: Solr
Issue Type: Improvement
Components: clients - java
Affects Versions: Trunk
Reporter: Dennis Gove
Priority: Minor
[jira] [Updated] (SOLR-7377) SOLR Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Gove updated SOLR-7377:

Attachment: SOLR-7377.patch

Updated patch with a few changes.

FieldComparator and StreamComparator have been collapsed into a single class StreamComparator. There was no need for a separate abstract class.

Added null checks in StreamComparator. For now if both are null then they will evaluate to equal. We can add a later enhancement under a new ticket to make that configurable.

Interfaces ExpressibleStream and ExpressibleComparator have been collapsed into interface Expressible. They defined the same interface and there's no reason to have separate interfaces for them.

Passes precommit checks.
[jira] [Commented] (SOLR-7513) Add Equalitors to Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537412#comment-14537412 ]

Dennis Gove commented on SOLR-7513:

I'm pretty sure I want to change this to instead use Java's BiPredicate interface: https://docs.oracle.com/javase/8/docs/api/java/util/function/BiPredicate.html
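To make the BiPredicate idea concrete, here is a small sketch of how tuple equality might be expressed with java.util.function.BiPredicate instead of a custom interface. The class and method names are hypothetical and tuples are again modeled as plain maps; only BiPredicate and its and() default method come from the JDK.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.BiPredicate;

class BiPredicateEqualitySketch {

  // Equality on a single field.
  static BiPredicate<Map<String, Object>, Map<String, Object>> fieldEquals(String field) {
    return (left, right) -> Objects.equals(left.get(field), right.get(field));
  }

  // Equality on several fields, composed with BiPredicate.and so that
  // evaluation stops at the first field that differs.
  static BiPredicate<Map<String, Object>, Map<String, Object>> allFieldsEqual(String... fields) {
    BiPredicate<Map<String, Object>, Map<String, Object>> combined = (left, right) -> true;
    for (String field : fields) {
      combined = combined.and(fieldEquals(field));
    }
    return combined;
  }

  // Builds a tuple from alternating field names and values, for illustration.
  static Map<String, Object> tuple(Object... keyValues) {
    Map<String, Object> t = new HashMap<>();
    for (int i = 0; i + 1 < keyValues.length; i += 2) {
      t.put((String) keyValues[i], keyValues[i + 1]);
    }
    return t;
  }
}
```

One appeal of this route is that and(), or(), and negate() come for free, so multi-field equality needs no dedicated composite class.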
[jira] [Commented] (SOLR-7377) SOLR Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529210#comment-14529210 ]

Dennis Gove commented on SOLR-7377:

We may want to make that configurable in solrconfig.xml. Also, should this respect the already configurable setting of whether nulls propagate to the start or end of result sets?
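As one way to picture the configurable null placement raised above, the sketch below builds a field comparator whose null handling is chosen by a flag. The nullsFirst flag and the class name are hypothetical, and nothing here reads solrconfig.xml; the JDK's Comparator.nullsFirst and Comparator.nullsLast do the work, and both treat two nulls as equal.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

class NullOrderingSketch {

  // nullsFirst is a hypothetical flag standing in for whatever setting
  // (config file or query-time parameter) would carry the choice.
  @SuppressWarnings("unchecked")
  static Comparator<Map<String, Object>> byField(String field, boolean nullsFirst) {
    // Natural ordering of the field values; assumes values are mutually Comparable.
    Comparator<Object> natural = (a, b) -> ((Comparable<Object>) a).compareTo(b);

    final Comparator<Object> nullSafe;
    if (nullsFirst) {
      nullSafe = Comparator.nullsFirst(natural);
    } else {
      nullSafe = Comparator.nullsLast(natural);
    }
    return (left, right) -> nullSafe.compare(left.get(field), right.get(field));
  }

  // Builds a one-field tuple for illustration.
  static Map<String, Object> tuple(String field, Object value) {
    Map<String, Object> t = new HashMap<>();
    t.put(field, value);
    return t;
  }
}
```

Calling byField("a_f", true) places tuples with a missing a_f ahead of all others, while byField("a_f", false) places them after.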
[jira] [Commented] (SOLR-7377) SOLR Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529233#comment-14529233 ]

Dennis Gove commented on SOLR-7377:

I do agree with you that two nulls should compare equal (I should've included that in my original fix), but I have seen a number of situations where users have balked at the decision (outside of solr). That said, I think it's reasonable to insist that two nulls evaluate to equal. (I've never agreed with the case that they wouldn't). Were we to make it a user-overridable thing then I do like the idea to make it a query-time decision.
[jira] [Created] (SOLR-7524) Make Streaming Expressions Java 7 Compatible
Dennis Gove created SOLR-7524:

Summary: Make Streaming Expressions Java 7 Compatible
Key: SOLR-7524
URL: https://issues.apache.org/jira/browse/SOLR-7524
Project: Solr
Issue Type: Improvement
Components: clients - java
Affects Versions: Trunk
Reporter: Dennis Gove
Priority: Trivial
Fix For: 5.2

SOLR-7377 added Streaming Expressions to trunk. It uses, by choice and not necessity, some features of Java 8. This patch makes minor changes to three files to make Streaming Expressions compatible with Java 7 and therefore able to be included in version 5.2.
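The issue does not list which Java 8 features are involved. Assuming lambdas are among them, a Java 7 compatible rewrite of a serializable comparator lambda could look like the sketch below: a named static nested class that implements both Comparator and Serializable. The class names are hypothetical, tuples are modeled as plain maps, and the both-null-equal rule follows the later SOLR-7377 patch notes rather than the contents of SOLR-7524.patch.

```java
import java.io.Serializable;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

class Java7ComparatorSketch {

  // Java 7 has no lambdas or intersection casts, so the comparator is a
  // named class that declares Serializable directly.
  static final class FieldDescending
      implements Comparator<Map<String, Object>>, Serializable {
    private static final long serialVersionUID = 1L;
    private final String field;

    FieldDescending(String field) {
      this.field = field;
    }

    @Override
    @SuppressWarnings({"unchecked", "rawtypes"})
    public int compare(Map<String, Object> left, Map<String, Object> right) {
      Comparable leftComp = (Comparable) left.get(field);
      Comparable rightComp = (Comparable) right.get(field);
      if (leftComp == null && rightComp == null) { return 0; } // two nulls are equal
      if (leftComp == null) { return -1; }
      if (rightComp == null) { return 1; }
      return rightComp.compareTo(leftComp); // reversed operands give descending order
    }
  }

  // Builds a one-field tuple for illustration.
  static Map<String, Object> tuple(String field, Object value) {
    Map<String, Object> t = new HashMap<String, Object>();
    t.put(field, value);
    return t;
  }
}
```

The behavior matches the lambda form; only the syntax changes, which fits the issue's remark that the Java 8 features were used by choice and not necessity.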
[jira] [Updated] (SOLR-7524) Make Streaming Expressions Java 7 Compatible
[ https://issues.apache.org/jira/browse/SOLR-7524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Gove updated SOLR-7524:

Attachment: SOLR-7524.patch
[jira] [Commented] (SOLR-7377) SOLR Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14528228#comment-14528228 ]

Dennis Gove commented on SOLR-7377:

This looks good, I think.
[jira] [Commented] (SOLR-7377) SOLR Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14528229#comment-14528229 ]

Dennis Gove commented on SOLR-7377:
---
This looks good, I think.

SOLR Streaming Expressions
--
Key: SOLR-7377
URL: https://issues.apache.org/jira/browse/SOLR-7377
Project: Solr
Issue Type: Improvement
Components: clients - java
Reporter: Dennis Gove
Priority: Minor
Fix For: Trunk
Attachments: SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch

It would be beneficial to add an expression-based interface to the Streaming API described in SOLR-7082. Right now that API requires streaming requests to come in from clients as serialized bytecode of the streaming classes. The suggestion here is to support string expressions which describe the streaming operations the client wishes to perform.

{code:java}
search(collection1, q=*:*, fl=id,fieldA,fieldB, sort=fieldA asc)
{code}

With this syntax in mind, one can now express arbitrarily complex stream queries with a single string.

{code:java}
// merge two distinct searches together on common fields
merge(
  search(collection1, q=id:(0 3 4), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc),
  search(collection2, q=id:(1 2), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc),
  on=a_f asc, a_s asc)

// find top 20 unique records of a search
top(
  n=20,
  unique(
    search(collection1, q=*:*, fl=id,a_s,a_i,a_f, sort=a_f desc),
    over=a_f desc),
  sort=a_f desc)
{code}

The syntax would support
1. Configurable expression names (e.g. via solrconfig.xml one can map unique to a class implementing a Unique stream class). This allows users to build their own streams and use them as they wish.
2. Named parameters (of both simple and expression types)
3. Unnamed, type-matched parameters (to support requiring N streams as arguments to another stream)
4. Positional parameters

The main goal here is to make streaming as accessible as possible and define a syntax for running complex queries across large distributed systems.
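To make the nesting in the description concrete, here is a minimal Python sketch (not Solr code) that assembles expression strings of this shape from nested objects. The `Expr` class and its rendering rules are assumptions made purely for illustration; the patch itself works with Java stream classes and a StreamExpression type.

```python
class Expr:
    """Hypothetical stand-in for a stream expression: a name plus parameters."""

    def __init__(self, name, *positional, **named):
        self.name = name
        self.positional = positional  # unnamed parameters (plain values or nested Expr)
        self.named = named            # named parameters such as q, fl, sort

    def __str__(self):
        # Unnamed parameters are rendered first, then name=value pairs.
        parts = [str(p) for p in self.positional]
        parts += [f"{key}={value}" for key, value in self.named.items()]
        return f"{self.name}({', '.join(parts)})"


# Top 20 unique records of a search, mirroring the example in the description.
inner = Expr("search", "collection1", q="*:*", fl="id,a_s,a_i,a_f", sort="a_f desc")
query = Expr("top", Expr("unique", inner, over="a_f desc"), n=20, sort="a_f desc")
print(query)
```

Because each nested stream renders itself, a whole pipeline collapses into one string, which is the property the ticket relies on when it talks about sending a single expression to the server.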
[jira] [Issue Comment Deleted] (SOLR-7377) SOLR Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Gove updated SOLR-7377:
--
Comment: was deleted (was: This looks good, I think.)

SOLR Streaming Expressions
--
Key: SOLR-7377
URL: https://issues.apache.org/jira/browse/SOLR-7377
Project: Solr
Issue Type: Improvement
Components: clients - java
Reporter: Dennis Gove
Priority: Minor
Fix For: Trunk
Attachments: SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch
[jira] [Commented] (SOLR-7377) SOLR Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14534432#comment-14534432 ]

Dennis Gove commented on SOLR-7377:
---
I can't figure out how I screwed that up - my only thought is that when I pulled down your latest with curl it failed and I didn't notice. Internet access on Amtrak trains can be splotchy. My apologies, I'll be more careful in the future. Let's forget my latest patch - I'll add those in a new smaller one after this is in trunk.

SOLR Streaming Expressions
--
Key: SOLR-7377
URL: https://issues.apache.org/jira/browse/SOLR-7377
Project: Solr
Issue Type: Improvement
Components: clients - java
Reporter: Dennis Gove
Priority: Minor
Fix For: Trunk
Attachments: SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch, SOLR-7377.patch
[jira] [Updated] (SOLR-7554) Add checks in Streams for incoming stream order
[ https://issues.apache.org/jira/browse/SOLR-7554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Gove updated SOLR-7554:
--
Attachment: SOLR-7554.patch

Add checks in Streams for incoming stream order
---
Key: SOLR-7554
URL: https://issues.apache.org/jira/browse/SOLR-7554
Project: Solr
Issue Type: Improvement
Components: SolrJ
Affects Versions: Trunk, 5.2
Reporter: Dennis Gove
Priority: Minor
Labels: streaming
Fix For: Trunk, 5.2
Attachments: SOLR-7554.patch

Most Streams built on top of other streams require that their incoming stream(s) be ordered in a way that is complementary to how this stream is expected to output its results. For example, if a MergeStream is merging two streams on fieldA asc, fieldB desc, then both its incoming streams must be ordered in a similar way. That said, the incoming stream could be ordered more strictly, i.e. fieldA asc, fieldB desc, fieldC asc, but as long as the comparator used in the MergeStream can be derived from the incoming stream's comparator then we are good to go.

Some comparator A can be derived from some other comparator B iff the fields and their order in A are equal to the first fields and their order in B. For example, fieldA asc, fieldB desc can be derived from fieldA asc, fieldB desc, fieldC asc, fieldD asc but cannot be derived from fieldA asc.

This patch is to add this validation support. It requires changes to Comparators, Equalitors, most Streams, and related tests. It adds a way to compare Comparators and Equalitors and in the end is one more required piece before we can add support for Join streams.

It is dependent on SOLR-7513 and SOLR-7528. Its other dependencies have already been committed to trunk and the 5.2 branch.

It does not change any interfaces to code already released (5.1 and below). It does change interfaces to code in trunk and 5.2.
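The derivation rule in this description amounts to a prefix check over (field, direction) pairs. The following Python sketch illustrates only that rule; `parse_order` and `can_derive` are hypothetical helper names, and the patch itself compares Java Comparator and Equalitor objects rather than strings.

```python
def parse_order(spec):
    """Turn 'fieldA asc, fieldB desc' into [('fieldA', 'asc'), ('fieldB', 'desc')]."""
    pairs = []
    for part in spec.split(","):
        field, direction = part.split()
        pairs.append((field, direction.lower()))
    return pairs


def can_derive(a, b):
    """Return True if ordering a is a leading prefix of ordering b."""
    a_pairs = parse_order(a)
    b_pairs = parse_order(b)
    # If b is shorter than a, the slice is shorter too and the comparison fails.
    return b_pairs[:len(a_pairs)] == a_pairs


# The two cases given in the ticket text.
print(can_derive("fieldA asc, fieldB desc",
                 "fieldA asc, fieldB desc, fieldC asc, fieldD asc"))
print(can_derive("fieldA asc, fieldB desc", "fieldA asc"))
```

A stream such as MergeStream could run a check like this once, when it is constructed, and reject an incoming stream whose ordering does not satisfy it.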
[jira] [Created] (SOLR-7554) Add checks in Streams for incoming stream order
Dennis Gove created SOLR-7554:
-
Summary: Add checks in Streams for incoming stream order
Key: SOLR-7554
URL: https://issues.apache.org/jira/browse/SOLR-7554
Project: Solr
Issue Type: Improvement
Components: SolrJ
Affects Versions: Trunk, 5.2
Reporter: Dennis Gove
Priority: Minor
Fix For: Trunk, 5.2

Most Streams built on top of other streams require that their incoming stream(s) be ordered in a way that is complementary to how this stream is expected to output its results. For example, if a MergeStream is merging two streams on fieldA asc, fieldB desc, then both its incoming streams must be ordered in a similar way. That said, the incoming stream could be ordered more strictly, i.e. fieldA asc, fieldB desc, fieldC asc, but as long as the comparator used in the MergeStream can be derived from the incoming stream's comparator then we are good to go.

Some comparator A can be derived from some other comparator B iff the fields and their order in A are equal to the first fields and their order in B. For example, fieldA asc, fieldB desc can be derived from fieldA asc, fieldB desc, fieldC asc, fieldD asc but cannot be derived from fieldA asc.

This patch is to add this validation support. It requires changes to Comparators, Equalitors, most Streams, and related tests. It adds a way to compare Comparators and Equalitors and in the end is one more required piece before we can add support for Join streams.

It is dependent on SOLR-7513 and SOLR-7528. Its other dependencies have already been committed to trunk and the 5.2 branch.

It does not change any interfaces to code already released (5.1 and below). It does change interfaces to code in trunk and 5.2.
[jira] [Created] (SOLR-7548) CloudSolrStream Limits Max Results to rows Param
Dennis Gove created SOLR-7548:
-
Summary: CloudSolrStream Limits Max Results to rows Param
Key: SOLR-7548
URL: https://issues.apache.org/jira/browse/SOLR-7548
Project: Solr
Issue Type: Bug
Affects Versions: Trunk
Reporter: Dennis Gove
Priority: Minor
Fix For: Trunk

The CloudSolrStream stream class accepts a set of params to be passed to the standard query handler. If the provided params don't include rows=N then the maximum number of records returned by this stream is the configured default rows value (generally 10, but perhaps more). As CloudSolrStream would generally be the first part of a larger set of stream expressions it seems counterintuitive to limit the first set by this value.

This ticket is to address this so that either we pass a param of rows=MAX, where MAX is the max value we can pass (max int or max long I suppose), or make it so that the default value is ignored when in a streaming context.

Example: Imagine we have a collection people with 90 documents in it.

The following query would return at most 10 documents (assuming 10 is the default)
{code}
search(people,q=*:*,fl=id,name_s,gender_s,nick_s,sort=name_s desc)
{code}

The following query would return all documents
{code}
search(people,q=*:*,fl=id,name_s,gender_s,nick_s,sort=name_s desc,rows=100)
{code}
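Of the two options floated in this ticket, the first (pass rows=MAX when the caller gave no rows) can be sketched in a few lines of Python. This is only an illustration of the idea, not how CloudSolrStream handles it: `with_default_rows` is a hypothetical helper, and taking MAX to be Java's Integer.MAX_VALUE is an assumption, since the ticket leaves the choice between max int and max long open.

```python
JAVA_INT_MAX = 2**31 - 1  # assumption: "max int" means Java's Integer.MAX_VALUE


def with_default_rows(params, max_rows=JAVA_INT_MAX):
    """Return a copy of params with rows added only when the caller omitted it."""
    result = dict(params)                # leave the caller's dict untouched
    result.setdefault("rows", max_rows)  # an explicit rows=N always wins
    return result


print(with_default_rows({"q": "*:*", "sort": "name_s desc"}))
print(with_default_rows({"q": "*:*", "sort": "name_s desc", "rows": 100}))
```

With the 90-document people collection from the example, the first call would no longer be capped at the handler's default of 10, while the second keeps the explicit rows=100.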
[jira] [Comment Edited] (SOLR-7543) Create GraphQuery that allows graph traversal as a query operator.
[ https://issues.apache.org/jira/browse/SOLR-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544713#comment-14544713 ]

Dennis Gove edited comment on SOLR-7543 at 5/15/15 1:20 AM:

For interface/semantics, I think this might be able to benefit from the Expression stuff recently added for streams (SOLR-7377). With that, you could do something like
{code}
graph(root=search(collection1, q=some query, fl=used fields),
      traverse=search(collection1, q=some dynamic query,fl=used fields),
      on=parent.field=child.field,
      maxDepth=5,
      returnRoot=true,
      returnOnlyLeaf=false)
{code}
This would also allow you to do other things like make use of stream merging, uniquing, etc. Would even allow for tree traversal across multiple collections.

was (Author: dpgove):
For interface/semantics, I think this might be able to benefit from the Expression stuff recently added for streams (SOLR-7377). With that, you could do something like
{code}
graph(root=search(collection1, q=some query, fl=used fields),
      traverse=search(collection1, q=some dynamic query,fl=used fields),
      on=parent.field=child.field,
      maxDepth=5,
      returnRoot=true,
      returnOnlyLeaf=false)
{code}
This would also allow you to do other things like make use of stream merging, uniquing, etc. Would even allow for tree traversal across collections.

Create GraphQuery that allows graph traversal as a query operator.
--
Key: SOLR-7543
URL: https://issues.apache.org/jira/browse/SOLR-7543
Project: Solr
Issue Type: New Feature
Components: search
Reporter: Kevin Watters
Priority: Minor

I have a GraphQuery that I implemented a long time back that allows a user to specify a startQuery to identify which documents to start graph traversal from. It then gathers up the edge ids for those documents and optionally applies an additional filter. The query is then re-executed continually until no new edge ids are identified. I am currently hosting this code up at https://github.com/kwatters/solrgraph and I would like to work with the community to get some feedback and ultimately get it committed back in as a Lucene query.

Here's a bit more of a description of the parameters for the query / graph traversal:
q - the initial start query that identifies the universe of documents to start traversal from.
fromField - the field name that contains the node id
toField - the name of the field that contains the edge id(s).
traversalFilter - this is an additional query that can be supplied to limit the scope of graph traversal to just the edges that satisfy the traversalFilter query.
maxDepth - integer specifying how deep the breadth first search should go.
returnStartNodes - boolean to determine if the documents that matched the original q should be returned as part of the graph.
onlyLeafNodes - boolean that filters the graph query to only return documents/nodes that have no edges.

We identify a set of documents with q as any arbitrary Lucene query. It will collect the values in the fromField, create an OR query with those values, optionally apply an additional constraint from the traversalFilter and walk the result set until no new edges are detected. Traversal can also be stopped at N hops away as defined with the maxDepth. This is a BFS (Breadth First Search) algorithm. Cycle detection is done by not revisiting the same document for edge extraction. This query operator does not keep track of how you arrived at the document, but only that the traversal did arrive at the document.
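The traversal described above can be sketched in Python over an in-memory list of documents. This is an illustration only, not the GraphQuery implementation: `graph_query` is a hypothetical function, the start query and traversal filter are modelled as plain predicates, and it assumes each hop matches documents whose toField contains a value collected from the fromField of the current frontier. onlyLeafNodes is left out.

```python
def graph_query(docs, start, from_field, to_field,
                traversal_filter=None, max_depth=None, return_start_nodes=True):
    """Breadth-first traversal over a list of dict documents."""
    def values(doc, field):
        value = doc.get(field, [])
        return set(value) if isinstance(value, (list, tuple, set)) else {value}

    start_ids = {i for i, doc in enumerate(docs) if start(doc)}
    visited = set(start_ids)   # cycle detection: never expand a document twice
    frontier = set(start_ids)
    depth = 0
    while frontier and (max_depth is None or depth < max_depth):
        # Collect the from_field values of the frontier (the "OR query" terms).
        wanted = set()
        for i in frontier:
            wanted |= values(docs[i], from_field)
        # Next hop: unvisited documents whose to_field matches any collected value.
        frontier = {
            i for i, doc in enumerate(docs)
            if i not in visited
            and values(doc, to_field) & wanted
            and (traversal_filter is None or traversal_filter(doc))
        }
        visited |= frontier
        depth += 1
    result = visited if return_start_nodes else visited - start_ids
    return [docs[i] for i in sorted(result)]


docs = [
    {"id": "A", "edges": []},
    {"id": "B", "edges": ["A"]},
    {"id": "C", "edges": ["B"]},
    {"id": "D", "edges": ["C", "A"]},
    {"id": "E", "edges": ["Z"]},
]
found = graph_query(docs, lambda d: d["id"] == "A", "id", "edges")
print([d["id"] for d in found])
```

As in the description, the sketch records only that a document was reached, not the path that led to it, so memory grows with the number of visited documents rather than the number of paths.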
[jira] [Commented] (SOLR-7543) Create GraphQuery that allows graph traversal as a query operator.
[ https://issues.apache.org/jira/browse/SOLR-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544713#comment-14544713 ]

Dennis Gove commented on SOLR-7543:
---
For interface/semantics, I think this might be able to benefit from the Expression stuff recently added for streams (SOLR-7377). With that, you could do something like
{code}
graph(root=search(collection1, q=some query, fl=used fields),
      traverse=search(collection1, q=some dynamic query,fl=used fields),
      on=parent.field=child.field,
      maxDepth=5,
      returnRoot=true,
      returnOnlyLeaf=false)
{code}
This would also allow you to do other things like make use of stream merging, uniquing, etc.

Create GraphQuery that allows graph traversal as a query operator.
--
Key: SOLR-7543
URL: https://issues.apache.org/jira/browse/SOLR-7543
Project: Solr
Issue Type: New Feature
Components: search
Reporter: Kevin Watters
Priority: Minor
[jira] [Comment Edited] (SOLR-7543) Create GraphQuery that allows graph traversal as a query operator.
[ https://issues.apache.org/jira/browse/SOLR-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544713#comment-14544713 ]

Dennis Gove edited comment on SOLR-7543 at 5/15/15 1:19 AM:

For interface/semantics, I think this might be able to benefit from the Expression stuff recently added for streams (SOLR-7377). With that, you could do something like
{code}
graph(root=search(collection1, q=some query, fl=used fields),
      traverse=search(collection1, q=some dynamic query,fl=used fields),
      on=parent.field=child.field,
      maxDepth=5,
      returnRoot=true,
      returnOnlyLeaf=false)
{code}
This would also allow you to do other things like make use of stream merging, uniquing, etc. Would even allow for tree traversal across collections.

was (Author: dpgove):
For interface/semantics, I think this might be able to benefit from the Expression stuff recently added for streams (SOLR-7377). With that, you could do something like
{code}
graph(root=search(collection1, q=some query, fl=used fields),
      traverse=search(collection1, q=some dynamic query,fl=used fields),
      on=parent.field=child.field,
      maxDepth=5,
      returnRoot=true,
      returnOnlyLeaf=false)
{code}
This would also allow you to do other things like make use of stream merging, uniquing, etc.

Create GraphQuery that allows graph traversal as a query operator.
--
Key: SOLR-7543
URL: https://issues.apache.org/jira/browse/SOLR-7543
Project: Solr
Issue Type: New Feature
Components: search
Reporter: Kevin Watters
Priority: Minor
[jira] [Commented] (SOLR-7548) CloudSolrStream Limits Max Results to rows Param
[ https://issues.apache.org/jira/browse/SOLR-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545472#comment-14545472 ]

Dennis Gove commented on SOLR-7548:
---
That makes sense. At the moment how would I make that change to use export? Is it in the solrconfig.xml or as part of the incoming query?

CloudSolrStream Limits Max Results to rows Param
--
Key: SOLR-7548
URL: https://issues.apache.org/jira/browse/SOLR-7548
Project: Solr
Issue Type: Bug
Affects Versions: Trunk
Reporter: Dennis Gove
Priority: Minor
Labels: Streaming
Fix For: Trunk
[jira] [Commented] (SOLR-7543) Create GraphQuery that allows graph traversal as a query operator.
[ https://issues.apache.org/jira/browse/SOLR-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545604#comment-14545604 ]

Dennis Gove commented on SOLR-7543:
---
This might be crazy, but would allow a little more flexibility on what to return
return=Root | Leaf => would return documents that are either in the root, or a leaf.
return=Root & Leaf => would return documents that are in root and are leafs themselves (no children)
return=Leaf | Children>4 => would return documents that are leaf or have more than 4 children.

Create GraphQuery that allows graph traversal as a query operator.
--
Key: SOLR-7543
URL: https://issues.apache.org/jira/browse/SOLR-7543
Project: Solr
Issue Type: New Feature
Components: search
Reporter: Kevin Watters
Priority: Minor
[jira] [Commented] (SOLR-7543) Create GraphQuery that allows graph traversal as a query operator.
[ https://issues.apache.org/jira/browse/SOLR-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546062#comment-14546062 ]

Dennis Gove commented on SOLR-7543:
---
I'm with on the wanting to keep the memory usage as low as possible - I thought maybe you had that info hanging around already. In either case, I think this syntax might lower the bar to entry for usage, especially if people are already using streaming aggregation for other things.

Create GraphQuery that allows graph traversal as a query operator.
--
Key: SOLR-7543
URL: https://issues.apache.org/jira/browse/SOLR-7543
Project: Solr
Issue Type: New Feature
Components: search
Reporter: Kevin Watters
Priority: Minor
[jira] [Comment Edited] (SOLR-7543) Create GraphQuery that allows graph traversal as a query operator.
[ https://issues.apache.org/jira/browse/SOLR-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546062#comment-14546062 ]

Dennis Gove edited comment on SOLR-7543 at 5/15/15 7:39 PM:

I'm with you on wanting to keep the memory usage as low as possible - I thought maybe you had that info hanging around already. In either case, I think this syntax might lower the bar to entry for usage, especially if people are already using streaming aggregation for other things.

was (Author: dpgove):
I'm with on the wanting to keep the memory usage as low as possible - I thought maybe you had that info hanging around already. In either case, I think this syntax might lower the bar to entry for usage, especially if people are already using streaming aggregation for other things.

Create GraphQuery that allows graph traversal as a query operator.
--
Key: SOLR-7543
URL: https://issues.apache.org/jira/browse/SOLR-7543
Project: Solr
Issue Type: New Feature
Components: search
Reporter: Kevin Watters
Priority: Minor
[jira] [Updated] (SOLR-7528) Simplify Interfaces used in Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Gove updated SOLR-7528:
Attachment: SOLR-7528.patch

Simplify Interfaces used in Streaming Expressions

Key: SOLR-7528
URL: https://issues.apache.org/jira/browse/SOLR-7528
Project: Solr
Issue Type: Improvement
Components: SolrJ
Affects Versions: Trunk, 5.2
Reporter: Dennis Gove
Priority: Minor
Fix For: Trunk, 5.2
Attachments: SOLR-7528.patch

FieldComparator and StreamComparator have been collapsed into a single class StreamComparator. There was no need for a separate abstract class.

Added null checks in StreamComparator. For now if both are null then they will evaluate to equal. We can add a later enhancement under a new ticket to make that configurable.

Interfaces ExpressibleStream and ExpressibleComparator have been collapsed into interface Expressible. They defined the same interface and there's no reason to have separate interfaces for them.
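A null-tolerant comparator along the lines described above might look like the sketch below. It is not the StreamComparator from the patch: the tuple is modelled as a plain Map, the class name is hypothetical, and the handling of a single null (it sorts first in both directions) is an assumption, since the ticket only says what happens when both values are null.

```java
import java.util.Comparator;
import java.util.Map;

// Hypothetical stand-in: a tuple is modelled as a Map of field name to value.
class NullSafeFieldComparator implements Comparator<Map<String, Object>> {
  private final String field;
  private final boolean ascending;

  NullSafeFieldComparator(String field, boolean ascending) {
    this.field = field;
    this.ascending = ascending;
  }

  @Override
  @SuppressWarnings({"unchecked", "rawtypes"})
  public int compare(Map<String, Object> a, Map<String, Object> b) {
    Comparable left = (Comparable) a.get(field);
    Comparable right = (Comparable) b.get(field);
    if (left == null && right == null) {
      return 0; // both null: treated as equal, as the ticket describes
    }
    if (left == null) {
      return -1; // assumption: a lone null sorts first
    }
    if (right == null) {
      return 1; // assumption: a lone null sorts first
    }
    // Swap the operands for descending order instead of negating the result.
    return ascending ? left.compareTo(right) : right.compareTo(left);
  }
}
```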
[jira] [Updated] (SOLR-7513) Add Equalitors to Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Gove updated SOLR-7513:
Attachment: SOLR-7513.patch

Modified Equalitor interface to more closely mirror Java 8's BiPredicate. I'm not using BiPredicate because this should be back-ported into 5.2 and as such needs to be Java 7 compatible. Depends on SOLR-7377, SOLR-7524, and SOLR-7528.

Add Equalitors to Streaming Expressions

Key: SOLR-7513
URL: https://issues.apache.org/jira/browse/SOLR-7513
Project: Solr
Issue Type: Improvement
Components: clients - java
Affects Versions: Trunk
Reporter: Dennis Gove
Priority: Minor
Attachments: SOLR-7513.patch, SOLR-7513.patch

Right now all streams use the Comparator<Tuple> interface to compare tuples. The Comparator interface will tell you if tupleA is before, after, or equal to tupleB. This is great for most streams as they use this logic when combining multiple streams together. However, some streams only care about the equality of two tuples and the less/greater than logic is unnecessary.

This depends on SOLR-7377.

This patch is to introduce a new interface into streaming expressions called Equalitor<Tuple> which will return whether two tuples are equal. The benefit here is that the expressions for streams using Equalitor instead of Comparator can omit the ordering part.

{code}
unique(somestream, over=fieldA asc, fieldB desc)
{code}

can become

{code}
unique(somestream, over=fieldA,fieldB)
{code}

The added benefit is that this will set us up with simpler expressions for joins (hash, merge, inner, outer, etc...) as those only care about equality.

By adding this as an interface we make no assumptions about what it means to be equal, just that some implementation needs to exist adhering to the Equalitor<Tuple> interface which will determine if two tuples are logically equal. We do define at least one concrete class which checks for equality but that does not preclude others from adding additional concrete classes with their own logic in place.
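To make the idea concrete, here is a minimal sketch of an equality-only interface and a unique-style filter that uses it. The interface shape (a single test method, mirroring Java 8's BiPredicate while staying Java 7 compatible) follows the comment above, but the exact signatures in the patch are not shown in this thread, so everything below, including FieldEqualitor and UniqueSketch, should be read as hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical shape, modelled on Java 8's BiPredicate#test but usable on Java 7.
interface Equalitor<T> {
  boolean test(T left, T right);
}

// Hypothetical concrete implementation: tuples are equal when all named fields match.
class FieldEqualitor implements Equalitor<Map<String, Object>> {
  private final String[] fields;

  FieldEqualitor(String... fields) {
    this.fields = fields;
  }

  @Override
  public boolean test(Map<String, Object> left, Map<String, Object> right) {
    for (String field : fields) {
      if (!Objects.equals(left.get(field), right.get(field))) {
        return false;
      }
    }
    return true;
  }
}

class UniqueSketch {
  // Drops a tuple when it equals the one immediately before it.
  // Assumes the input is already sorted so equal tuples are adjacent.
  static <T> List<T> unique(List<T> sorted, Equalitor<T> equalitor) {
    List<T> out = new ArrayList<>();
    T previous = null;
    boolean first = true;
    for (T tuple : sorted) {
      if (first || !equalitor.test(previous, tuple)) {
        out.add(tuple);
      }
      previous = tuple;
      first = false;
    }
    return out;
  }
}
```

Note that the filter never needs to know whether a field is ascending or descending, which is why the expression can shrink to over=fieldA,fieldB.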
[jira] [Created] (SOLR-7528) Simplify Interfaces used in Streaming Expressions
Dennis Gove created SOLR-7528:

Summary: Simplify Interfaces used in Streaming Expressions
Key: SOLR-7528
URL: https://issues.apache.org/jira/browse/SOLR-7528
Project: Solr
Issue Type: Improvement
Components: SolrJ
Affects Versions: Trunk, 5.2
Reporter: Dennis Gove
Priority: Minor
Fix For: Trunk, 5.2

FieldComparator and StreamComparator have been collapsed into a single class StreamComparator. There was no need for a separate abstract class.

Added null checks in StreamComparator. For now if both are null then they will evaluate to equal. We can add a later enhancement under a new ticket to make that configurable.

Interfaces ExpressibleStream and ExpressibleComparator have been collapsed into interface Expressible. They defined the same interface and there's no reason to have separate interfaces for them.
[jira] [Created] (SOLR-7377) SOLR Streaming Expressions
Dennis Gove created SOLR-7377:

Summary: SOLR Streaming Expressions
Key: SOLR-7377
URL: https://issues.apache.org/jira/browse/SOLR-7377
Project: Solr
Issue Type: Improvement
Components: clients - java
Reporter: Dennis Gove
Priority: Minor
Fix For: Trunk

It would be beneficial to add an expression-based interface to Streaming API described in SOLR-6526. Right now that API requires streaming requests to come in from clients as serialized bytecode of the streaming classes. The suggestion here is to support string expressions which describe the streaming operations the client wishes to perform.

{code:java}
search(collection1, q=*:*, fl=id,fieldA,fieldB, sort=fieldA asc)
{code}

With this syntax in mind, one can now express arbitrarily complex stream queries with a single string.

{code:java}
// merge two distinct searches together on common fields
merge(
  search(collection1, q=id:(0 3 4), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc),
  search(collection2, q=id:(1 2), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc),
  on=a_f asc, a_s asc)

// find top 20 unique records of a search
top(
  n=20,
  unique(
    search(collection1, q=*:*, fl=id,a_s,a_i,a_f, sort=a_f desc),
    over=a_f desc),
  over=a_f desc)
{code}

The syntax would support
1. Configurable expression names (eg. via solrconfig.xml one can map unique to a class implementing a Unique stream class) This allows users to build their own streams and use as they wish.
2. Named parameters (of both simple and expression types)
3. Unnamed, type-matched parameters (to support requiring N streams as arguments to another stream)
4. Positional parameters

The main goal here is to make streaming as accessible as possible and define a syntax for running complex queries across large distributed systems.
[jira] [Updated] (SOLR-7377) SOLR Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Gove updated SOLR-7377:
Attachment: SOLR-7377.patch

First-pass patch. Looking for initial feedback.

SOLR Streaming Expressions

Key: SOLR-7377
URL: https://issues.apache.org/jira/browse/SOLR-7377
Project: Solr
Issue Type: Improvement
Components: clients - java
Reporter: Dennis Gove
Priority: Minor
Fix For: Trunk
Attachments: SOLR-7377.patch

It would be beneficial to add an expression-based interface to Streaming API described in SOLR-6526. Right now that API requires streaming requests to come in from clients as serialized bytecode of the streaming classes. The suggestion here is to support string expressions which describe the streaming operations the client wishes to perform.

{code:java}
search(collection1, q=*:*, fl=id,fieldA,fieldB, sort=fieldA asc)
{code}

With this syntax in mind, one can now express arbitrarily complex stream queries with a single string.

{code:java}
// merge two distinct searches together on common fields
merge(
  search(collection1, q=id:(0 3 4), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc),
  search(collection2, q=id:(1 2), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc),
  on=a_f asc, a_s asc)

// find top 20 unique records of a search
top(
  n=20,
  unique(
    search(collection1, q=*:*, fl=id,a_s,a_i,a_f, sort=a_f desc),
    over=a_f desc),
  over=a_f desc)
{code}

The syntax would support
1. Configurable expression names (eg. via solrconfig.xml one can map unique to a class implementing a Unique stream class) This allows users to build their own streams and use as they wish.
2. Named parameters (of both simple and expression types)
3. Unnamed, type-matched parameters (to support requiring N streams as arguments to another stream)
4. Positional parameters

The main goal here is to make streaming as accessible as possible and define a syntax for running complex queries across large distributed systems.
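As a rough illustration of how an expression tree with named, unnamed, and nested parameters could be rendered back into a single string, consider the sketch below. ExpressionSketch and its methods are hypothetical and are not the StreamExpression classes from the patch. Parsing in the other direction is deliberately left out, because commas also appear inside parameter values such as sort=a_f asc, a_s asc and need more care than a short example allows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical expression node: a function name plus an ordered parameter list.
class ExpressionSketch {
  private final String functionName;
  private final List<String> parameters = new ArrayList<>();

  ExpressionSketch(String functionName) {
    this.functionName = functionName;
  }

  // Unnamed simple parameter, e.g. a collection name.
  ExpressionSketch withValue(String value) {
    parameters.add(value);
    return this;
  }

  // Named simple parameter, e.g. q=*:*
  ExpressionSketch withNamed(String name, String value) {
    parameters.add(name + "=" + value);
    return this;
  }

  // Unnamed expression parameter, e.g. a nested stream.
  // The nested expression is rendered at the time of this call.
  ExpressionSketch withExpression(ExpressionSketch nested) {
    parameters.add(nested.toString());
    return this;
  }

  @Override
  public String toString() {
    StringBuilder sb = new StringBuilder(functionName).append('(');
    for (int i = 0; i < parameters.size(); i++) {
      if (i > 0) {
        sb.append(", ");
      }
      sb.append(parameters.get(i));
    }
    return sb.append(')').toString();
  }
}
```

Building search, wrapping it in unique, and wrapping that in top with this class reproduces the nesting of the second example in the description.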
[jira] [Updated] (SOLR-7377) SOLR Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Gove updated SOLR-7377:

Description:

It would be beneficial to add an expression-based interface to Streaming API described in SOLR-6526. Right now that API requires streaming requests to come in from clients as serialized bytecode of the streaming classes. The suggestion here is to support string expressions which describe the streaming operations the client wishes to perform.

{code:java}
search(collection1, q=*:*, fl=id,fieldA,fieldB, sort=fieldA asc)
{code}

With this syntax in mind, one can now express arbitrarily complex stream queries with a single string.

{code:java}
// merge two distinct searches together on common fields
merge(
  search(collection1, q=id:(0 3 4), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc),
  search(collection2, q=id:(1 2), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc),
  on=a_f asc, a_s asc)

// find top 20 unique records of a search
top(
  n=20,
  unique(
    search(collection1, q=*:*, fl=id,a_s,a_i,a_f, sort=a_f desc),
    over=a_f desc),
  sort=a_f desc)
{code}

The syntax would support
1. Configurable expression names (eg. via solrconfig.xml one can map unique to a class implementing a Unique stream class) This allows users to build their own streams and use as they wish.
2. Named parameters (of both simple and expression types)
3. Unnamed, type-matched parameters (to support requiring N streams as arguments to another stream)
4. Positional parameters

The main goal here is to make streaming as accessible as possible and define a syntax for running complex queries across large distributed systems.

was:

It would be beneficial to add an expression-based interface to Streaming API described in SOLR-6526. Right now that API requires streaming requests to come in from clients as serialized bytecode of the streaming classes. The suggestion here is to support string expressions which describe the streaming operations the client wishes to perform.

{code:java}
search(collection1, q=*:*, fl=id,fieldA,fieldB, sort=fieldA asc)
{code}

With this syntax in mind, one can now express arbitrarily complex stream queries with a single string.

{code:java}
// merge two distinct searches together on common fields
merge(
  search(collection1, q=id:(0 3 4), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc),
  search(collection2, q=id:(1 2), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc),
  on=a_f asc, a_s asc)

// find top 20 unique records of a search
top(
  n=20,
  unique(
    search(collection1, q=*:*, fl=id,a_s,a_i,a_f, sort=a_f desc),
    over=a_f desc),
  over=a_f desc)
{code}

The syntax would support
1. Configurable expression names (eg. via solrconfig.xml one can map unique to a class implementing a Unique stream class) This allows users to build their own streams and use as they wish.
2. Named parameters (of both simple and expression types)
3. Unnamed, type-matched parameters (to support requiring N streams as arguments to another stream)
4. Positional parameters

The main goal here is to make streaming as accessible as possible and define a syntax for running complex queries across large distributed systems.

SOLR Streaming Expressions

Key: SOLR-7377
URL: https://issues.apache.org/jira/browse/SOLR-7377
Project: Solr
Issue Type: Improvement
Components: clients - java
Reporter: Dennis Gove
Priority: Minor
Fix For: Trunk
Attachments: SOLR-7377.patch

It would be beneficial to add an expression-based interface to Streaming API described in SOLR-6526. Right now that API requires streaming requests to come in from clients as serialized bytecode of the streaming classes. The suggestion here is to support string expressions which describe the streaming operations the client wishes to perform.

{code:java}
search(collection1, q=*:*, fl=id,fieldA,fieldB, sort=fieldA asc)
{code}

With this syntax in mind, one can now express arbitrarily complex stream queries with a single string.

{code:java}
// merge two distinct searches together on common fields
merge(
  search(collection1, q=id:(0 3 4), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc),
  search(collection2, q=id:(1 2), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc),
  on=a_f asc, a_s asc)

// find top 20 unique records of a search
top(
  n=20,
  unique(
    search(collection1, q=*:*, fl=id,a_s,a_i,a_f, sort=a_f desc),
    over=a_f desc),
  sort=a_f desc)
{code}

The syntax would support
1. Configurable expression names (eg. via solrconfig.xml one can map unique to a class implementing a Unique stream class) This allows users to build their own streams and use as they wish.
2. Named parameters (of both simple and expression types)
3. Unnamed, type-matched parameters (to support requiring N streams
[jira] [Commented] (SOLR-7275) Pluggable authorization module in Solr
[ https://issues.apache.org/jira/browse/SOLR-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14490733#comment-14490733 ]

Dennis Gove commented on SOLR-7275:

I like this concept but I think the response can be expanded to add a bit more functionality. It would be nice if the pluggable security layer could respond in such a way as to not wholly reject a request but to instead restrict what is returned from a request. It could accomplish this by providing additional filters to apply to a request.

{code}
public class SolrAuthorizationResponse {
  boolean authorized;
  String additionalFilterQuery;
  ...
}
{code}

By adding additionalFilterQuery, this would give the security layer an opportunity to say, "yup, you're authorized but you can't see records matching this filter" or "yup, you're authorized but you can only see records also matching this filter". It provides a way to add fine-grained control of data access but keep that control completely outside of SOLR (as it would live in the pluggable security layer).

Additionally, it allows the security layer to add fine-grained control **without notifying the user they are being restricted** as this lives wholly in the SOLR <--> security layer communication. There are times when telling the user their request was rejected due to it returning records they're not privileged to see actually gives the user some information you may not want them to know - the fact that these restricted records even exist. Instead, by adding filters and just not returning records the user isn't privileged for, the user is none the wiser that they were restricted at all.

Pluggable authorization module in Solr

Key: SOLR-7275
URL: https://issues.apache.org/jira/browse/SOLR-7275
Project: Solr
Issue Type: Sub-task
Reporter: Anshum Gupta
Assignee: Anshum Gupta

Solr needs an interface that makes it easy for different authorization systems to be plugged into it. Here's what I plan on doing:

Define an interface {{SolrAuthorizationPlugin}} with one single method {{isAuthorized}}. This would take in a {{SolrRequestContext}} object and return a {{SolrAuthorizationResponse}} object. The object as of now would only contain a single boolean value but in the future could contain more information e.g. ACL for document filtering etc.

The reason why we need a context object is so that the plugin doesn't need to understand Solr's capabilities e.g. how to extract the name of the collection or other information from the incoming request as there are multiple ways to specify the target collection for a request. Similarly request type can be specified by {{qt}} or {{/handler_name}}.

Flow: Request -> SolrDispatchFilter -> isAuthorized(context) -> Process/Return.

{code}
public interface SolrAuthorizationPlugin {
  public SolrAuthorizationResponse isAuthorized(SolrRequestContext context);
}
{code}

{code}
public class SolrRequestContext {
  UserInfo; // Will contain user context from the authentication layer.
  HTTPRequest request;
  Enum OperationType; // Correlated with user roles.
  String[] CollectionsAccessed;
  String[] FieldsAccessed;
  String Resource;
}
{code}

{code}
public class SolrAuthorizationResponse {
  boolean authorized;
  public boolean isAuthorized();
}
{code}

User Roles:
* Admin
* Collection Level:
  * Query
  * Update
  * Admin

Using this framework, an implementation could be written for specific security systems e.g. Apache Ranger or Sentry. It would keep all the security system specific code out of Solr.
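The additionalFilterQuery suggestion from the comment above could be applied roughly as in the following sketch. None of this is Solr code: AuthorizationResponseSketch and FilterApplier are hypothetical names, and representing a rejected request as a null return is an assumption made only to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical response object carrying the proposed extra filter.
class AuthorizationResponseSketch {
  final boolean authorized;
  final String additionalFilterQuery; // null when no restriction applies

  AuthorizationResponseSketch(boolean authorized, String additionalFilterQuery) {
    this.authorized = authorized;
    this.additionalFilterQuery = additionalFilterQuery;
  }
}

class FilterApplier {
  // Returns the filter queries the request should run with,
  // or null when the request is rejected outright (assumption for this sketch).
  static List<String> effectiveFilters(List<String> requestedFilters,
                                       AuthorizationResponseSketch response) {
    if (!response.authorized) {
      return null;
    }
    List<String> filters = new ArrayList<>(requestedFilters);
    if (response.additionalFilterQuery != null) {
      // The caller never sees this filter; it is simply appended server-side.
      filters.add(response.additionalFilterQuery);
    }
    return filters;
  }
}
```

Because the extra filter is appended alongside the user's own filters, restricted records drop out of the result set without any visible rejection.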
[jira] [Commented] (SOLR-7560) Parallel SQL Support
[ https://issues.apache.org/jira/browse/SOLR-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578750#comment-14578750 ]

Dennis Gove commented on SOLR-7560:

Possible expression syntax for the RollupStream

{code}
rollup(
  someStream(),
  over=fieldA, fieldB, fieldC,
  min(fieldA),
  max(fieldA),
  min(fieldB),
  mean(fieldD),
  sum(fieldC)
)
{code}

This would require making the *Metric types Expressible but I think that ends up as a good thing. Would make it real easy to support other options on metrics like excluding outliers, for example find the sum of values within 3 standard deviations from the mean could be

{code}
sum(fieldC, limit=standardDev(3))
{code}

(note, how that particular calculation could be implemented is left as an exercise for the reader, I'm just using it as an example of adding additional options on a relatively simple metric).

Another option example is what to do with null values. For example, in some cases a null should not impact a mean but in others it should. You could express those as

{code}
mean(fieldA, replace(null, 0)) // replace null values with 0 thus leading to an impact on the mean
mean(fieldA, includeNull=true) // nulls are counted in the denominator but nothing added to numerator
mean(fieldA, includeNull=false) // nulls neither counted in denominator nor added to numerator
mean(fieldA, replace(null, fieldB), includeNull=true) // if fieldA is null replace it with fieldB, include null fieldB in mean
{code}

so on and so forth.

Parallel SQL Support

Key: SOLR-7560
URL: https://issues.apache.org/jira/browse/SOLR-7560
Project: Solr
Issue Type: New Feature
Components: clients - java, search
Reporter: Joel Bernstein
Fix For: 5.3
Attachments: SOLR-7560.patch

This ticket provides support for executing *Parallel SQL* queries across SolrCloud collections. The SQL engine will be built on top of the Streaming API (SOLR-7082), which provides support for *parallel relational algebra* and *real-time map-reduce*.

Basic design:
1) A new SQLHandler will be added to process SQL requests. The SQL statements will be compiled to live Streaming API objects for parallel execution across SolrCloud worker nodes.
2) SolrCloud collections will be abstracted as *Relational Tables*.
3) The Presto SQL parser will be used to parse the SQL statements.
4) A JDBC thin client will be added as a Solrj client.

This ticket will focus on putting the framework in place and providing basic SELECT support and GROUP BY aggregate support. Future releases will build on this framework to provide additional SQL features.
[jira] [Created] (SOLR-7707) Add StreamExpression Support to RollupStream
Dennis Gove created SOLR-7707:

Summary: Add StreamExpression Support to RollupStream
Key: SOLR-7707
URL: https://issues.apache.org/jira/browse/SOLR-7707
Project: Solr
Issue Type: Improvement
Components: SolrJ
Reporter: Dennis Gove
Priority: Minor

This ticket is to add Stream Expression support to the RollupStream as discussed in SOLR-7560.

Proposed expression syntax for the RollupStream (copied from that ticket)

{code}
rollup(
  someStream(),
  over=fieldA, fieldB, fieldC,
  min(fieldA),
  max(fieldA),
  min(fieldB),
  mean(fieldD),
  sum(fieldC)
)
{code}

This requires making the *Metric types Expressible but I think that ends up as a good thing. Would make it real easy to support other options on metrics like excluding outliers, for example find the sum of values within 3 standard deviations from the mean could be

{code}
sum(fieldC, limit=standardDev(3))
{code}

(note, how that particular calculation could be implemented is left as an exercise for the reader, I'm just using it as an example of adding additional options on a relatively simple metric).

Another option example is what to do with null values. For example, in some cases a null should not impact a mean but in others it should. You could express those as

{code}
mean(fieldA, replace(null, 0)) // replace null values with 0 thus leading to an impact on the mean
mean(fieldA, includeNull=true) // nulls are counted in the denominator but nothing added to numerator
mean(fieldA, includeNull=false) // nulls neither counted in denominator nor added to numerator
mean(fieldA, replace(null, fieldB), includeNull=true) // if fieldA is null replace it with fieldB, include null fieldB in mean
{code}

so on and so forth.
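The two includeNull behaviours proposed above differ only in whether a null contributes to the denominator. A small sketch of that arithmetic follows; MeanSketch is a hypothetical helper, not one of the *Metric classes, and returning NaN for an empty denominator is an assumption.

```java
import java.util.List;

class MeanSketch {
  // includeNull=true: nulls count in the denominator but add nothing to the numerator.
  // includeNull=false: nulls are ignored entirely.
  static double mean(List<Double> values, boolean includeNull) {
    double sum = 0;
    long count = 0;
    for (Double value : values) {
      if (value == null) {
        if (includeNull) {
          count++;
        }
        continue;
      }
      sum += value;
      count++;
    }
    // Assumption: with nothing to average, report NaN rather than throwing.
    return count == 0 ? Double.NaN : sum / count;
  }
}
```

For the values 2.0, null, 4.0 this gives 6.0 / 3 = 2.0 with includeNull=true and 6.0 / 2 = 3.0 with includeNull=false.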
[jira] [Commented] (SOLR-7513) Add Equalitors to Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593800#comment-14593800 ]

Dennis Gove commented on SOLR-7513:

I appreciate the help with this, Joel. Thanks!

Add Equalitors to Streaming Expressions

Key: SOLR-7513
URL: https://issues.apache.org/jira/browse/SOLR-7513
Project: Solr
Issue Type: Improvement
Components: clients - java
Affects Versions: Trunk
Reporter: Dennis Gove
Assignee: Joel Bernstein
Priority: Minor
Fix For: 5.3
Attachments: SOLR-7513.patch, SOLR-7513.patch, SOLR-7513.patch, SOLR-7513.patch

Right now all streams use the Comparator<Tuple> interface to compare tuples. The Comparator interface will tell you if tupleA is before, after, or equal to tupleB. This is great for most streams as they use this logic when combining multiple streams together. However, some streams only care about the equality of two tuples and the less/greater than logic is unnecessary.

This depends on SOLR-7377.

This patch is to introduce a new interface into streaming expressions called Equalitor<Tuple> which will return whether two tuples are equal. The benefit here is that the expressions for streams using Equalitor instead of Comparator can omit the ordering part.

{code}
unique(somestream, over=fieldA asc, fieldB desc)
{code}

can become

{code}
unique(somestream, over=fieldA,fieldB)
{code}

The added benefit is that this will set us up with simpler expressions for joins (hash, merge, inner, outer, etc...) as those only care about equality.

By adding this as an interface we make no assumptions about what it means to be equal, just that some implementation needs to exist adhering to the Equalitor<Tuple> interface which will determine if two tuples are logically equal. We do define at least one concrete class which checks for equality but that does not preclude others from adding additional concrete classes with their own logic in place.
[jira] [Updated] (SOLR-7669) Add SelectStream to Streaming API
[ https://issues.apache.org/jira/browse/SOLR-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Gove updated SOLR-7669:
Attachment: SOLR-7669.patch

Add SelectStream to Streaming API

Key: SOLR-7669
URL: https://issues.apache.org/jira/browse/SOLR-7669
Project: Solr
Issue Type: Improvement
Components: SolrJ
Reporter: Dennis Gove
Priority: Minor
Labels: Streaming
Attachments: SOLR-7669.patch

Adds a new stream called SelectStream which can be used for two purposes.
1. Limit the set of fields included in an outgoing tuple to remove unwanted fields
2. Provide aliases for fields. With this it acts as an alternative to the CloudSolrStream's 'aliases' option.

For example, in a simple case

{code}
select(
  id,
  fieldA_i as fieldA,
  fieldB_s as fieldB,
  search(collection1, q=*:*, fl=id,fieldA_i,fieldB_s, sort=fieldA_i asc, fieldB_s asc, id asc)
)
{code}

This can also be used as part of complex expressions to help keep track of what is being worked on. This is particularly useful when merging/joining multiple collections which share field names. For example, the following results in a set of tuples including only the fields id, left.ident, and right.ident even though the total set of fields required to perform the search and join is much larger than just those three fields.

{code}
select(
  id,
  left.ident,
  right.ident,
  innerJoin(
    select(
      id,
      join1_i as left.join1,
      join2_s as left.join2,
      ident_s as left.ident,
      search(collection1, q=side_s:left, fl=id,join1_i,join2_s,ident_s, sort=join1_i asc, join2_s asc, id asc)
    ),
    select(
      join3_i as right.join1,
      join2_s as right.join2,
      ident_s as right.ident,
      search(collection1, q=side_s:right, fl=join3_i,join2_s,ident_s, sort=join3_i asc, join2_s asc)
    ),
    on=left.join1=right.join1, left.join2=right.join2
  )
)
{code}

This depends on SOLR-7584.
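The two jobs of the proposed SelectStream, dropping unwanted fields and renaming the ones that remain, amount to a small projection over each tuple. The sketch below shows that projection on a plain Map; SelectSketch is a hypothetical helper rather than the SelectStream from the patch, and skipping fields that are absent from the incoming tuple is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SelectSketch {
  // selection maps an incoming field name to its outgoing name.
  // A field kept without an alias simply maps to itself.
  static Map<String, Object> select(Map<String, Object> tuple,
                                    Map<String, String> selection) {
    Map<String, Object> out = new LinkedHashMap<>();
    for (Map.Entry<String, String> entry : selection.entrySet()) {
      // Assumption: fields missing from the tuple are skipped rather than emitted as null.
      if (tuple.containsKey(entry.getKey())) {
        out.put(entry.getValue(), tuple.get(entry.getKey()));
      }
    }
    return out;
  }
}
```

With the selection id -> id, fieldA_i -> fieldA, fieldB_s -> fieldB, this mirrors the simple select(...) example above: any other field on the tuple is dropped and the two suffixed fields come out under their aliases.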
[jira] [Created] (SOLR-7669) Add SelectStream to Streaming API
Dennis Gove created SOLR-7669:

Summary: Add SelectStream to Streaming API
Key: SOLR-7669
URL: https://issues.apache.org/jira/browse/SOLR-7669
Project: Solr
Issue Type: Improvement
Components: SolrJ
Reporter: Dennis Gove
Priority: Minor

Adds a new stream called SelectStream which can be used for two purposes.
1. Limit the set of fields included in an outgoing tuple to remove unwanted fields
2. Provide aliases for fields. With this it acts as an alternative to the CloudSolrStream's 'aliases' option.

For example, in a simple case

{code}
select(
  id,
  fieldA_i as fieldA,
  fieldB_s as fieldB,
  search(collection1, q=*:*, fl=id,fieldA_i,fieldB_s, sort=fieldA_i asc, fieldB_s asc, id asc)
)
{code}

This can also be used as part of complex expressions to help keep track of what is being worked on. This is particularly useful when merging/joining multiple collections which share field names. For example, the following results in a set of tuples including only the fields id, left.ident, and right.ident even though the total set of fields required to perform the search and join is much larger than just those three fields.

{code}
select(
  id,
  left.ident,
  right.ident,
  innerJoin(
    select(
      id,
      join1_i as left.join1,
      join2_s as left.join2,
      ident_s as left.ident,
      search(collection1, q=side_s:left, fl=id,join1_i,join2_s,ident_s, sort=join1_i asc, join2_s asc, id asc)
    ),
    select(
      join3_i as right.join1,
      join2_s as right.join2,
      ident_s as right.ident,
      search(collection1, q=side_s:right, fl=join3_i,join2_s,ident_s, sort=join3_i asc, join2_s asc)
    ),
    on=left.join1=right.join1, left.join2=right.join2
  )
)
{code}

This depends on SOLR-7584.
[jira] [Updated] (SOLR-7584) Add Joins to the Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Gove updated SOLR-7584:
Attachment: SOLR-7584.patch

Adds LeftOuterJoinStream to support left outer joins w/tests (work done by Corey Wu). Moves some functions from InnerJoinStream up to parent classes as they are shared in LeftOuterJoinStream.

Add Joins to the Streaming API and Streaming Expressions

Key: SOLR-7584
URL: https://issues.apache.org/jira/browse/SOLR-7584
Project: Solr
Issue Type: Improvement
Components: SolrJ
Reporter: Dennis Gove
Priority: Minor
Labels: Streaming
Attachments: SOLR-7584.patch, SOLR-7584.patch

Add InnerJoinStream, LeftOuterJoinStream, and supporting classes to the Streaming API to allow for joining between sub-streams. At its most basic, it would look something like this

{code}
innerJoin(
  search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...),
  search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...),
  on=fieldA=fieldA
)
{code}

or with multi-field on clauses

{code}
innerJoin(
  search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...),
  search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...),
  on=fieldA=fieldA, fieldB=fieldD
)
{code}

I'd also like to support the option of doing a hash join instead of the default merge join but I haven't yet figured out the best way to express that. I'd like to let the user tell us which sub-stream should be hashed (the least-cost one).

Also, I've been thinking about field aliasing and might want to add a SelectStream which serves the purpose of allowing us to limit the fields coming out and rename fields.

Depends on SOLR-7554
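For readers unfamiliar with the "default merge join" mentioned above, the following sketch shows the general technique on two in-memory lists that are already sorted ascending on their join fields. It is a textbook merge join, not the InnerJoinStream from the patch; the class name, the single-field on clause, and the use of Integer join keys are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MergeJoinSketch {
  // Assumption for this sketch: join keys are non-null Integers.
  private static int key(Map<String, Object> tuple, String field) {
    return (Integer) tuple.get(field);
  }

  // Inner merge join. Both inputs must be sorted ascending on their join field.
  static List<Map<String, Object>> innerJoin(List<Map<String, Object>> left,
                                             List<Map<String, Object>> right,
                                             String leftField,
                                             String rightField) {
    List<Map<String, Object>> out = new ArrayList<>();
    int l = 0;
    int r = 0;
    while (l < left.size() && r < right.size()) {
      int leftKey = key(left.get(l), leftField);
      int cmp = Integer.compare(leftKey, key(right.get(r), rightField));
      if (cmp < 0) {
        l++; // left is behind, advance it
      } else if (cmp > 0) {
        r++; // right is behind, advance it
      } else {
        // Find the end of the run of right tuples sharing this key.
        int runEnd = r;
        while (runEnd < right.size() && key(right.get(runEnd), rightField) == leftKey) {
          runEnd++;
        }
        // Pair every left tuple with this key against the whole right run.
        while (l < left.size() && key(left.get(l), leftField) == leftKey) {
          for (int i = r; i < runEnd; i++) {
            Map<String, Object> joined = new LinkedHashMap<>(left.get(l));
            joined.putAll(right.get(i)); // right values win on shared field names
            out.add(joined);
          }
          l++;
        }
        r = runEnd;
      }
    }
    return out;
  }
}
```

Only the current run of equal keys has to be held at once, which is the appeal of a merge join over sorted streams; a hash join would instead hold one whole side in memory, which is why the ticket wants the user to pick the cheaper side.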
[jira] [Updated] (SOLR-7584) Add Joins to the Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Gove updated SOLR-7584:
Attachment: SOLR-7584.patch

Missed a single line in my diff that corrected a throw statement. Sorry for the double upload.

Add Joins to the Streaming API and Streaming Expressions

Key: SOLR-7584
URL: https://issues.apache.org/jira/browse/SOLR-7584
Project: Solr
Issue Type: Improvement
Components: SolrJ
Reporter: Dennis Gove
Priority: Minor
Labels: Streaming
Attachments: SOLR-7584.patch, SOLR-7584.patch, SOLR-7584.patch

Add InnerJoinStream, LeftOuterJoinStream, and supporting classes to the Streaming API to allow for joining between sub-streams. At its most basic, it would look something like this

{code}
innerJoin(
  search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...),
  search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...),
  on=fieldA=fieldA
)
{code}

or with multi-field on clauses

{code}
innerJoin(
  search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...),
  search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...),
  on=fieldA=fieldA, fieldB=fieldD
)
{code}

I'd also like to support the option of doing a hash join instead of the default merge join but I haven't yet figured out the best way to express that. I'd like to let the user tell us which sub-stream should be hashed (the least-cost one).

Also, I've been thinking about field aliasing and might want to add a SelectStream which serves the purpose of allowing us to limit the fields coming out and rename fields.

Depends on SOLR-7554
[jira] [Commented] (SOLR-7621) Frequent 500 error IOExceptions from StreamingExpressions
[ https://issues.apache.org/jira/browse/SOLR-7621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568340#comment-14568340 ]

Dennis Gove commented on SOLR-7621:

Note that SOLR-7528 adds null checks in the Comparators to allow for null values in the sort fields. It considers two null values to be equal. The intention is that in a future enhancement we can support a configurable approach to how we treat nulls.

Frequent 500 error IOExceptions from StreamingExpressions

Key: SOLR-7621
URL: https://issues.apache.org/jira/browse/SOLR-7621
Project: Solr
Issue Type: Bug
Affects Versions: 5.2
Reporter: Hoss Man
Assignee: Joel Bernstein

While trying to test out the new Streaming Expressions functionality, I encountered lots of 500 error / IOException with various root causes (I'll post details in the comments).

It looks like the API needs to be better hardened to give the user useful feedback and return 4xx errors when used in an incorrect manner.
[jira] [Comment Edited] (SOLR-7707) Add StreamExpression Support to RollupStream
[ https://issues.apache.org/jira/browse/SOLR-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612814#comment-14612814 ] Dennis Gove edited comment on SOLR-7707 at 7/3/15 2:18 PM: --- Looks like I cut my branch from trunk before those changes were committed. I'll go through some rebasing tomorrow and post up a new patch. Sorry about that. was (Author: dpgove): Looks like I cut my branch from trunk before those changes were committed. I'll go through some rebasing tomorrow and post up a new patch. Sorry abut that. Add StreamExpression Support to RollupStream Key: SOLR-7707 URL: https://issues.apache.org/jira/browse/SOLR-7707 Project: Solr Issue Type: Improvement Components: SolrJ Reporter: Dennis Gove Priority: Minor Attachments: SOLR-7707.patch, SOLR-7707.patch, SOLR-7707.patch This ticket is to add Stream Expression support to the RollupStream as discussed in SOLR-7560. Proposed expression syntax for the RollupStream (copied from that ticket) {code} rollup( someStream(), over=fieldA, fieldB, fieldC, min(fieldA), max(fieldA), min(fieldB), mean(fieldD), sum(fieldC) ) {code} This requires making the *Metric types Expressible but I think that ends up as a good thing. Would make it real easy to support other options on metrics like excluding outliers, for example find the sum of values within 3 standard deviations from the mean could be {code} sum(fieldC, limit=standardDev(3)) {code} (note, how that particular calculation could be implemented is left as an exercise for the reader, I'm just using it as an example of adding additional options on a relatively simple metric). Another option example is what to do with null values. For example, in some cases a null should not impact a mean but in others it should. 
You could express those as {code} mean(fieldA, replace(null, 0)) // replace null values with 0 thus leading to an impact on the mean mean(fieldA, includeNull=true) // nulls are counted in the denominator but nothing added to numerator mean(fieldA, includeNull=false) // nulls neither counted in denominator nor added to numerator mean(fieldA, replace(null, fieldB), includeNull=true) // if fieldA is null replace it with fieldB, include null fieldB in mean {code} so on and so forth. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
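The null options proposed for mean(...) above can be made concrete with a short Python sketch. It only illustrates the arithmetic in the comments of the proposed expressions. The function name and the parameters include_null and replace_null_with are hypothetical stand-ins for includeNull and replace(null, ...), and the per-record replace(null, fieldB) variant is left out.

```python
def mean(values, include_null=False, replace_null_with=None):
    """Mean of a list of numbers that may contain nulls (None).

    replace_null_with: if given, nulls are replaced by this value first,
        so they affect both numerator and denominator.
    include_null: if True, remaining nulls are counted in the denominator
        but add nothing to the numerator; if False they are ignored.
    Returns None when there is nothing to average.
    """
    if replace_null_with is not None:
        values = [replace_null_with if v is None else v for v in values]
    total = sum(v for v in values if v is not None)
    if include_null:
        count = len(values)
    else:
        count = sum(1 for v in values if v is not None)
    return total / count if count else None


field_a = [3, None, 9]

replaced = mean(field_a, replace_null_with=0)   # nulls become 0
counted = mean(field_a, include_null=True)      # nulls only in denominator
ignored = mean(field_a, include_null=False)     # nulls skipped entirely
```

With this input, replacing nulls with 0 and counting nulls in the denominator give the same number; they would differ for any replacement value other than 0.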
[jira] [Updated] (SOLR-7707) Add StreamExpression Support to RollupStream
[ https://issues.apache.org/jira/browse/SOLR-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-7707: -- Attachment: SOLR-7707.patch New correctly based patch attached. Add StreamExpression Support to RollupStream Key: SOLR-7707 URL: https://issues.apache.org/jira/browse/SOLR-7707 Project: Solr Issue Type: Improvement Components: SolrJ Reporter: Dennis Gove Priority: Minor Attachments: SOLR-7707.patch, SOLR-7707.patch, SOLR-7707.patch This ticket is to add Stream Expression support to the RollupStream as discussed in SOLR-7560. Proposed expression syntax for the RollupStream (copied from that ticket) {code} rollup( someStream(), over=fieldA, fieldB, fieldC, min(fieldA), max(fieldA), min(fieldB), mean(fieldD), sum(fieldC) ) {code} This requires making the *Metric types Expressible but I think that ends up as a good thing. Would make it real easy to support other options on metrics like excluding outliers, for example find the sum of values within 3 standard deviations from the mean could be {code} sum(fieldC, limit=standardDev(3)) {code} (note, how that particular calculation could be implemented is left as an exercise for the reader, I'm just using it as an example of adding additional options on a relatively simple metric). Another option example is what to do with null values. For example, in some cases a null should not impact a mean but in others it should. You could express those as {code} mean(fieldA, replace(null, 0)) // replace null values with 0 thus leading to an impact on the mean mean(fieldA, includeNull=true) // nulls are counted in the denominator but nothing added to numerator mean(fieldA, includeNull=false) // nulls neither counted in denominator nor added to numerator mean(fieldA, replace(null, fieldB), includeNull=true) // if fieldA is null replace it with fieldB, include null fieldB in mean {code} so on and so forth. 
[jira] [Updated] (SOLR-7707) Add StreamExpression Support to RollupStream
[ https://issues.apache.org/jira/browse/SOLR-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-7707: -- Attachment: SOLR-7707.patch I found the problem. There is a test class called CountStream. In some of the test files (particularly solr/solrj/src/test-files/solrj/solr/collection1/conf/solrconfig-streaming.xml) the function name count was mapped to that Stream. However, now with a count metric I was also mapping the count function name to CountMetric. For the moment I have corrected this by renaming CountStream to RecordCountStream and commenting out the mapping in the solrconfig-streaming.xml file. I chose to change this one because it is a class in the test suite and not, apparently, used outside of testing. However, this brings up an interesting question: should we allow conflicting names across streams and metrics? Right now the mappings from function name to Stream and from function name to Metric are stored in the same Map, and as such we do not allow name conflicts, i.e., a stream and a metric cannot share the same function name. Should we allow that? I believe the answer, for clarity, is no. If you assign the string count to CountMetric then you cannot also assign it to CountStream. This will allow users to know what count() means without having to know the context. For example, allowing count to map to both could result in confusion in the following {code} rollup( count(search()), min(fieldA), count(fieldB) ) {code} Add StreamExpression Support to RollupStream Key: SOLR-7707 URL: https://issues.apache.org/jira/browse/SOLR-7707 Project: Solr Issue Type: Improvement Components: SolrJ Reporter: Dennis Gove Priority: Minor Attachments: SOLR-7707.patch, SOLR-7707.patch This ticket is to add Stream Expression support to the RollupStream as discussed in SOLR-7560.
Proposed expression syntax for the RollupStream (copied from that ticket) {code} rollup( someStream(), over=fieldA, fieldB, fieldC, min(fieldA), max(fieldA), min(fieldB), mean(fieldD), sum(fieldC) ) {code} This requires making the *Metric types Expressible but I think that ends up as a good thing. Would make it real easy to support other options on metrics like excluding outliers, for example find the sum of values within 3 standard deviations from the mean could be {code} sum(fieldC, limit=standardDev(3)) {code} (note, how that particular calculation could be implemented is left as an exercise for the reader, I'm just using it as an example of adding additional options on a relatively simple metric). Another option example is what to do with null values. For example, in some cases a null should not impact a mean but in others it should. You could express those as {code} mean(fieldA, replace(null, 0)) // replace null values with 0 thus leading to an impact on the mean mean(fieldA, includeNull=true) // nulls are counted in the denominator but nothing added to numerator mean(fieldA, includeNull=false) // nulls neither counted in denominator nor added to numerator mean(fieldA, replace(null, fieldB), includeNull=true) // if fieldA is null replace it with fieldB, include null fieldB in mean {code} so on and so forth. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
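The naming rule argued for in the comment above (one function name resolves to exactly one stream or metric) can be sketched in Python as a single registry that rejects conflicting registrations. This is an illustration only and not the factory code in the patch. The class FunctionRegistry and its methods are hypothetical, and the class names are passed as plain strings.

```python
class FunctionRegistry:
    """Maps a function name to exactly one implementation.

    Streams and metrics share one map, so a name such as "count" cannot
    refer to both a stream and a metric at the same time.
    """

    def __init__(self):
        self._by_name = {}

    def register(self, name, implementation):
        existing = self._by_name.get(name)
        if existing is not None and existing != implementation:
            raise ValueError(
                f"function name '{name}' is already mapped to {existing}"
            )
        self._by_name[name] = implementation

    def lookup(self, name):
        return self._by_name[name]


registry = FunctionRegistry()
registry.register("count", "CountMetric")

try:
    # A second, different mapping for the same name is refused.
    registry.register("count", "RecordCountStream")
    conflict_rejected = False
except ValueError:
    conflict_rejected = True
```

Rejecting the conflict at registration time means count() in an expression has the same meaning regardless of where it appears.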
[jira] [Commented] (SOLR-7707) Add StreamExpression Support to RollupStream
[ https://issues.apache.org/jira/browse/SOLR-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612814#comment-14612814 ] Dennis Gove commented on SOLR-7707: --- Looks like I cut my branch from trunk before those changes were committed. I'll go through some rebasing tomorrow and post up a new patch. Sorry about that. Add StreamExpression Support to RollupStream Key: SOLR-7707 URL: https://issues.apache.org/jira/browse/SOLR-7707 Project: Solr Issue Type: Improvement Components: SolrJ Reporter: Dennis Gove Priority: Minor Attachments: SOLR-7707.patch, SOLR-7707.patch This ticket is to add Stream Expression support to the RollupStream as discussed in SOLR-7560. Proposed expression syntax for the RollupStream (copied from that ticket) {code} rollup( someStream(), over=fieldA, fieldB, fieldC, min(fieldA), max(fieldA), min(fieldB), mean(fieldD), sum(fieldC) ) {code} This requires making the *Metric types Expressible but I think that ends up as a good thing. It would make it real easy to support other options on metrics, like excluding outliers. For example, finding the sum of values within 3 standard deviations from the mean could be {code} sum(fieldC, limit=standardDev(3)) {code} (note, how that particular calculation could be implemented is left as an exercise for the reader, I'm just using it as an example of adding additional options on a relatively simple metric). Another option example is what to do with null values. For example, in some cases a null should not impact a mean but in others it should.
You could express those as {code} mean(fieldA, replace(null, 0)) // replace null values with 0 thus leading to an impact on the mean mean(fieldA, includeNull=true) // nulls are counted in the denominator but nothing added to numerator mean(fieldA, includeNull=false) // nulls neither counted in denominator nor added to numerator mean(fieldA, replace(null, fieldB), includeNull=true) // if fieldA is null replace it with fieldB, include null fieldB in mean {code} so on and so forth. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-7707) Add StreamExpression Support to RollupStream
[ https://issues.apache.org/jira/browse/SOLR-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-7707: -- Attachment: SOLR-7707.patch Adds expression support to RollupStream. Note: I have added a ParallelRollupStream test but I cannot get it to pass. I feel as though I've forgotten a required change to make it work with ParallelStream. Add StreamExpression Support to RollupStream Key: SOLR-7707 URL: https://issues.apache.org/jira/browse/SOLR-7707 Project: Solr Issue Type: Improvement Components: SolrJ Reporter: Dennis Gove Priority: Minor Attachments: SOLR-7707.patch This ticket is to add Stream Expression support to the RollupStream as discussed in SOLR-7560. Proposed expression syntax for the RollupStream (copied from that ticket) {code} rollup( someStream(), over=fieldA, fieldB, fieldC, min(fieldA), max(fieldA), min(fieldB), mean(fieldD), sum(fieldC) ) {code} This requires making the *Metric types Expressible but I think that ends up as a good thing. Would make it real easy to support other options on metrics like excluding outliers, for example find the sum of values within 3 standard deviations from the mean could be {code} sum(fieldC, limit=standardDev(3)) {code} (note, how that particular calculation could be implemented is left as an exercise for the reader, I'm just using it as an example of adding additional options on a relatively simple metric). Another option example is what to do with null values. For example, in some cases a null should not impact a mean but in others it should. 
You could express those as {code} mean(fieldA, replace(null, 0)) // replace null values with 0 thus leading to an impact on the mean mean(fieldA, includeNull=true) // nulls are counted in the denominator but nothing added to numerator mean(fieldA, includeNull=false) // nulls neither counted in denominator nor added to numerator mean(fieldA, replace(null, fieldB), includeNull=true) // if fieldA is null replace it with fieldB, include null fieldB in mean {code} so on and so forth. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-7584) Add Joins to the Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-7584: -- Description: Add InnerJoinStream, LeftOuterJoinStream, and supporting classes to the Streaming API to allow for joining between sub-streams. At its basic, it would look something like this {code} innerJoin( search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...), search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...), on=fieldA=fieldA ) {code} or with multi-field on clauses {code} innerJoin( search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...), search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...), on=fieldA=fieldA, fieldB=fieldD ) {code} I'd also like to support the option of doing a hash join instead of the default merge join but I haven't yet figured out the best way to express that. I'd like to let the user tell us which sub-stream should be hashed (the least-cost one). Also, I've been thinking about field aliasing and might want to add a SelectStream which serves the purpose of allowing us to limit the fields coming out and rename fields. Depends on SOLR-7554 was: Add InnerJoinStream, LeftOuterJoinStream, and supporting classes to the Streaming API to allow for joining between sub-streams. At its basic, it would look something like this {code} innerJoin( search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...), search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...), on=fieldA=fieldA ) {code} or with multi-field on clauses {code} innerJoin( search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...), search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...), on=fieldA=fieldA, fieldB=fieldD ) {code} I'd also like to support the option of doing a hash join instead of the default merge join but I haven't yet figured out the best way to express that. I'd like to let the user tell us which sub-stream should be hashed (the least-cost one). 
Also, I've been thinking about field aliasing and might want to add a SelectStream which serves the purpose of allowing us to limit the fields coming out and rename fields. Add Joins to the Streaming API and Streaming Expressions Key: SOLR-7584 URL: https://issues.apache.org/jira/browse/SOLR-7584 Project: Solr Issue Type: Improvement Components: SolrJ Reporter: Dennis Gove Priority: Minor Labels: Streaming Add InnerJoinStream, LeftOuterJoinStream, and supporting classes to the Streaming API to allow for joining between sub-streams. At its basic, it would look something like this {code} innerJoin( search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...), search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...), on=fieldA=fieldA ) {code} or with multi-field on clauses {code} innerJoin( search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...), search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...), on=fieldA=fieldA, fieldB=fieldD ) {code} I'd also like to support the option of doing a hash join instead of the default merge join but I haven't yet figured out the best way to express that. I'd like to let the user tell us which sub-stream should be hashed (the least-cost one). Also, I've been thinking about field aliasing and might want to add a SelectStream which serves the purpose of allowing us to limit the fields coming out and rename fields. Depends on SOLR-7554 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-7584) Add Joins to the Streaming API and Streaming Expressions
Dennis Gove created SOLR-7584: - Summary: Add Joins to the Streaming API and Streaming Expressions Key: SOLR-7584 URL: https://issues.apache.org/jira/browse/SOLR-7584 Project: Solr Issue Type: Improvement Components: SolrJ Reporter: Dennis Gove Priority: Minor Add InnerJoinStream, LeftOuterJoinStream, and supporting classes to the Streaming API to allow for joining between sub-streams. At its basic, it would look something like this {code} innerJoin( search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...), search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...), on=fieldA=fieldA ) {code} or with multi-field on clauses {code} innerJoin( search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...), search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...), on=fieldA=fieldA, fieldB=fieldD ) {code} I'd also like to support the option of doing a hash join instead of the default merge join but I haven't yet figured out the best way to express that. I'd like to let the user tell us which sub-stream should be hashed (the least-cost one). Also, I've been thinking about field aliasing and might want to add a SelectStream which serves the purpose of allowing us to limit the fields coming out and rename fields. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
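For readers unfamiliar with the default merge join mentioned in the description above, here is a minimal Python sketch of an inner merge join over two inputs already sorted on the join fields. It is not the InnerJoinStream implementation. The function name, the list-of-dicts representation, and the rule that left-side values win on shared field names are assumptions for illustration.

```python
def inner_merge_join(left, right, on):
    """Inner merge join of two lists of dicts.

    left, right: sorted ascending by their respective join fields.
    on: list of (left_field, right_field) pairs, e.g.
        [("fieldA", "fieldA"), ("fieldB", "fieldD")].
    ASSUMPTION: on shared field names the left value is kept.
    """
    def left_key(t):
        return tuple(t[l] for l, _ in on)

    def right_key(t):
        return tuple(t[r] for _, r in on)

    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left_key(left[i]), right_key(right[j])
        if lk < rk:
            i += 1          # left tuple has no partner, skip it
        elif lk > rk:
            j += 1          # right tuple has no partner, skip it
        else:
            # Find the run of right tuples sharing this key.
            j_end = j
            while j_end < len(right) and right_key(right[j_end]) == lk:
                j_end += 1
            # Pair every left tuple with this key against that run.
            while i < len(left) and left_key(left[i]) == lk:
                for r in right[j:j_end]:
                    out.append({**r, **left[i]})
                i += 1
            j = j_end
    return out


left_tuples = [
    {"fieldA": 1, "fieldB": "x"},
    {"fieldA": 2, "fieldB": "y"},
    {"fieldA": 4, "fieldB": "z"},
]
right_tuples = [
    {"fieldA": 2, "fieldD": "p"},
    {"fieldA": 2, "fieldD": "q"},
    {"fieldA": 3, "fieldD": "r"},
]
joined = inner_merge_join(left_tuples, right_tuples, [("fieldA", "fieldA")])
```

Because both inputs are consumed in sorted order, only the current run of equal-key right tuples needs to be held in memory, which is the main reason a merge join suits streams. A hash join would instead hold one whole side in memory, which is why the description wants the user to pick the least-cost sub-stream to hash.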
[jira] [Updated] (SOLR-7584) Add Joins to the Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-7584: -- Attachment: SOLR-7584.patch Adds JoinStream to support joins of N sub-streams. Adds BiJoinStream to limit JoinStream to 2 sub-streams, left and right. Adds InnerJoinStream with support for merge join. Does not handle hash joins. Uses aliasing concept already available in CloudSolrStream. Still work to be done. Add Joins to the Streaming API and Streaming Expressions Key: SOLR-7584 URL: https://issues.apache.org/jira/browse/SOLR-7584 Project: Solr Issue Type: Improvement Components: SolrJ Reporter: Dennis Gove Priority: Minor Labels: Streaming Attachments: SOLR-7584.patch Add InnerJoinStream, LeftOuterJoinStream, and supporting classes to the Streaming API to allow for joining between sub-streams. At its basic, it would look something like this {code} innerJoin( search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...), search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...), on=fieldA=fieldA ) {code} or with multi-field on clauses {code} innerJoin( search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...), search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...), on=fieldA=fieldA, fieldB=fieldD ) {code} I'd also like to support the option of doing a hash join instead of the default merge join but I haven't yet figured out the best way to express that. I'd like to let the user tell us which sub-stream should be hashed (the least-cost one). Also, I've been thinking about field aliasing and might want to add a SelectStream which serves the purpose of allowing us to limit the fields coming out and rename fields. Depends on SOLR-7554 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-7584) Add Joins to the Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555348#comment-14555348 ] Dennis Gove edited comment on SOLR-7584 at 5/22/15 12:27 AM: - Adds abstract JoinStream to support joins of N sub-streams. Adds abstract BiJoinStream to limit JoinStream to 2 sub-streams, left and right. Adds concrete InnerJoinStream with support for merge join. Does not handle hash joins. Uses aliasing concept already available in CloudSolrStream. Still work to be done. was (Author: dpgove): Adds JoinStream to support joins of N sub-streams. Adds BiJoinStream to limit JoinStream to 2 sub-streams, left and right. Adds InnerJoinStream with support for merge join. Does not handle hash joins. Uses aliasing concept already available in CloudSolrStream. Still work to be done. Add Joins to the Streaming API and Streaming Expressions Key: SOLR-7584 URL: https://issues.apache.org/jira/browse/SOLR-7584 Project: Solr Issue Type: Improvement Components: SolrJ Reporter: Dennis Gove Priority: Minor Labels: Streaming Attachments: SOLR-7584.patch Add InnerJoinStream, LeftOuterJoinStream, and supporting classes to the Streaming API to allow for joining between sub-streams. At its basic, it would look something like this {code} innerJoin( search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...), search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...), on=fieldA=fieldA ) {code} or with multi-field on clauses {code} innerJoin( search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...), search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...), on=fieldA=fieldA, fieldB=fieldD ) {code} I'd also like to support the option of doing a hash join instead of the default merge join but I haven't yet figured out the best way to express that. I'd like to let the user tell us which sub-stream should be hashed (the least-cost one). 
Also, I've been thinking about field aliasing and might want to add a SelectStream which serves the purpose of allowing us to limit the fields coming out and rename fields. Depends on SOLR-7554
[jira] [Commented] (SOLR-7584) Add Joins to the Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556008#comment-14556008 ] Dennis Gove commented on SOLR-7584: --- That's right. LeftOuterJoin wasn't included in the first version of the patch. At the moment the patch includes changes to a set of supporting classes and adds inner join. Left outer join isn't ready yet. I expect the expression syntax to be the same (two streams with an on clause) and the implementation to be fairly similar to inner join but taking into account that a right-side record isn't required for the left-side record to be returned. Add Joins to the Streaming API and Streaming Expressions Key: SOLR-7584 URL: https://issues.apache.org/jira/browse/SOLR-7584 Project: Solr Issue Type: Improvement Components: SolrJ Reporter: Dennis Gove Priority: Minor Labels: Streaming Attachments: SOLR-7584.patch Add InnerJoinStream, LeftOuterJoinStream, and supporting classes to the Streaming API to allow for joining between sub-streams. At its basic, it would look something like this {code} innerJoin( search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...), search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...), on=fieldA=fieldA ) {code} or with multi-field on clauses {code} innerJoin( search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...), search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...), on=fieldA=fieldA, fieldB=fieldD ) {code} I'd also like to support the option of doing a hash join instead of the default merge join but I haven't yet figured out the best way to express that. I'd like to let the user tell us which sub-stream should be hashed (the least-cost one). Also, I've been thinking about field aliasing and might want to add a SelectStream which serves the purpose of allowing us to limit the fields coming out and rename fields. 
Depends on SOLR-7554
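The left outer join behaviour described in the comment above (a right-side record is not required for the left-side record to be returned) could look roughly like the following Python sketch. Left outer join was not part of the patch at this point, so this is purely illustrative. The function name and the list-of-dicts representation are hypothetical, and both inputs are assumed to be sorted ascending on the join fields.

```python
def left_outer_merge_join(left, right, on):
    """Left outer merge join of two sorted lists of dicts.

    on: list of (left_field, right_field) pairs.
    Every left tuple is returned at least once; when right tuples with
    the same key exist, one merged tuple is returned per match.
    """
    def left_key(t):
        return tuple(t[l] for l, _ in on)

    def right_key(t):
        return tuple(t[r] for _, r in on)

    out = []
    j = 0
    for l in left:
        lk = left_key(l)
        # Advance past right tuples that sort before this left key.
        while j < len(right) and right_key(right[j]) < lk:
            j += 1
        # Scan (without consuming) the run of matching right tuples so
        # that a following left tuple with the same key sees them too.
        k = j
        matched = False
        while k < len(right) and right_key(right[k]) == lk:
            out.append({**right[k], **l})
            matched = True
            k += 1
        if not matched:
            out.append(dict(l))   # no partner: emit the left tuple alone
    return out


left_tuples = [
    {"fieldA": 1, "fieldB": "x"},
    {"fieldA": 2, "fieldB": "y"},
    {"fieldA": 4, "fieldB": "z"},
]
right_tuples = [
    {"fieldA": 2, "fieldD": "p"},
    {"fieldA": 2, "fieldD": "q"},
    {"fieldA": 3, "fieldD": "r"},
]
outer = left_outer_merge_join(left_tuples, right_tuples, [("fieldA", "fieldA")])
```

The only difference from an inner merge join is the final branch that emits an unmatched left tuple unchanged.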
[jira] [Created] (SOLR-7938) MergeStream to support N streams
Dennis Gove created SOLR-7938: - Summary: MergeStream to support N streams Key: SOLR-7938 URL: https://issues.apache.org/jira/browse/SOLR-7938 Project: Solr Issue Type: Bug Components: SolrJ Affects Versions: Trunk Reporter: Dennis Gove Priority: Minor Enhances MergeStream to support merging N streams. This was previously limited to merging just two streams but with this enhancement it can now accept any number of streams to merge. Based on the comparator, if more than one stream could provide the next value then the selected value will follow the order of the streams as they appear in the expression or were added to the MergeStream object. {code} merge( search(collection1, q=id:(0 4), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc), search(collection1, q=id:(1), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc), search(collection1, q=id:(2), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc), on=a_f asc ) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
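The tie-breaking rule in the description above (when several streams could provide the next value, the earlier stream in the expression wins) can be sketched in Python with a heap keyed on the sort value and the stream's position. This is not the MergeStream code. The function name is hypothetical, only ascending order is handled, and streams are modelled as plain lists of dicts.

```python
import heapq


def merge_streams(streams, key):
    """Merge N already-sorted streams into one sorted stream.

    streams: list of iterables, each sorted ascending by key.
    key: function extracting the sort value from a tuple.
    Ties between streams are resolved by stream position, so the stream
    listed first supplies its tuple first.
    ASSUMPTION: streams never yield None (used here as end marker).
    """
    iterators = [iter(s) for s in streams]
    heap = []
    for index, it in enumerate(iterators):
        first = next(it, None)
        if first is not None:
            heap.append((key(first), index, first))
    heapq.heapify(heap)

    while heap:
        # (sort value, stream index) is unique per heap entry because
        # each stream has at most one tuple in the heap at a time.
        _, index, tup = heapq.heappop(heap)
        yield tup
        following = next(iterators[index], None)
        if following is not None:
            heapq.heappush(heap, (key(following), index, following))


stream0 = [{"id": 0, "a_f": 1.0}, {"id": 4, "a_f": 3.0}]
stream1 = [{"id": 1, "a_f": 1.0}]
stream2 = [{"id": 2, "a_f": 2.0}]

merged_ids = [
    t["id"]
    for t in merge_streams([stream0, stream1, stream2], key=lambda t: t["a_f"])
]
```

Here ids 0 and 1 share a_f = 1.0, and id 0 is returned first because its stream appears first.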
[jira] [Updated] (SOLR-7938) MergeStream to support N streams
[ https://issues.apache.org/jira/browse/SOLR-7938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-7938: -- Issue Type: Improvement (was: Bug) MergeStream to support N streams Key: SOLR-7938 URL: https://issues.apache.org/jira/browse/SOLR-7938 Project: Solr Issue Type: Improvement Components: SolrJ Affects Versions: Trunk Reporter: Dennis Gove Priority: Minor Labels: streaming Enhances MergeStream to support merging N streams. This was previously limited to merging just two streams but with this enhancement it can now accept any number of streams to merge. Based on the comparator, if more than one stream could provide the next value then the selected value will follow the order of the streams as they appear in the expression or were added to the MergeStream object. {code} merge( search(collection1, q=id:(0 4), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc), search(collection1, q=id:(1), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc), search(collection1, q=id:(2), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc), on=a_f asc ) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-7938) MergeStream to support N streams
[ https://issues.apache.org/jira/browse/SOLR-7938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-7938: -- Attachment: SOLR-7938.patch MergeStream to support N streams Key: SOLR-7938 URL: https://issues.apache.org/jira/browse/SOLR-7938 Project: Solr Issue Type: Improvement Components: SolrJ Affects Versions: Trunk Reporter: Dennis Gove Priority: Minor Labels: streaming Attachments: SOLR-7938.patch Enhances MergeStream to support merging N streams. This was previously limited to merging just two streams but with this enhancement it can now accept any number of streams to merge. Based on the comparator, if more than one stream could provide the next value then the selected value will follow the order of the streams as they appear in the expression or were added to the MergeStream object. {code} merge( search(collection1, q=id:(0 4), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc), search(collection1, q=id:(1), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc), search(collection1, q=id:(2), fl=id,a_s,a_i,a_f, sort=a_f asc, a_s asc), on=a_f asc ) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-7669) Add SelectStream to Streaming API
[ https://issues.apache.org/jira/browse/SOLR-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-7669: -- Attachment: SOLR-7669.patch Updated to add support for performing operations on the selected values. The only operation included in this patch is Replace, which can be used to replace field values (or null) with a different value or the value of another field. In the following example, if fieldA is null then it will be replaced with value 123 and if fieldB is foo then it will be set to bar. {code} select( id, fieldA_i as fieldA, fieldB_s as fieldB, replace(fieldA, null, withValue=123), replace(fieldB, foo, withValue=bar), search(collection1, q=*:*, fl=id,fieldA_i,fieldB_s, sort=fieldA_i asc, fieldB_s asc, id asc) ) {code} In the following example, if fieldA is null or ??? then it will be replaced with the value of fieldB. {code} select( id, fieldA_s as fieldA, fieldB_s as fieldB, replace(fieldA, null, withField=fieldB), replace(fieldA, ???, withField=fieldB), search(collection1, q=*:*, fl=id,fieldA_i,fieldB_s, sort=fieldA_i asc, fieldB_s asc, id asc) ) {code} Add SelectStream to Streaming API - Key: SOLR-7669 URL: https://issues.apache.org/jira/browse/SOLR-7669 Project: Solr Issue Type: Improvement Components: SolrJ Reporter: Dennis Gove Priority: Minor Labels: Streaming Attachments: SOLR-7669.patch, SOLR-7669.patch Adds a new stream called SelectStream which can be used for two purposes. 1. Limit the set of fields included in an outgoing tuple to remove unwanted fields 2. Provide aliases for fields. With this it acts as an alternative to the CloudSolrStream's 'aliases' option. For example, in a simple case {code} select( id, fieldA_i as fieldA, fieldB_s as fieldB, search(collection1, q=*:*, fl=id,fieldA_i,fieldB_s, sort=fieldA_i asc, fieldB_s asc, id asc) ) {code} This can also be used as part of complex expressions to help keep track of what is being worked on.
This is particularly useful when merging/joining multiple collections which share field names. For example, the following results in a set of tuples including only the fields id, left.ident, and right.ident even though the total set of fields required to perform the search and join is much larger than just those three fields. {code} select( id, left.ident, right.ident, innerJoin( select( id, join1_i as left.join1, join2_s as left.join2, ident_s as left.ident, search(collection1, q=side_s:left, fl=id,join1_i,join2_s,ident_s, sort=join1_i asc, join2_s asc, id asc) ), select( join3_i as right.join1, join2_s as right.join2, ident_s as right.ident, search(collection1, q=side_s:right, fl=join3_i,join2_s,ident_s, sort=join3_i asc, join2_s asc) ), on=left.join1=right.join1, left.join2=right.join2 ) ) {code} This depends on SOLR-7584.
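The two purposes of SelectStream described above (limiting fields and aliasing them), together with the Replace operation from the patch comment, can be illustrated with a short Python sketch. It is not the SelectStream implementation. The function name, the dict-based field mapping, and the (field, match, new value) replacement triples are hypothetical, and only the withValue style of replacement is shown.

```python
def select(tuples, fields, replacements=()):
    """Project and rename fields, then apply simple value replacements.

    tuples: iterable of dicts.
    fields: dict of source field name -> outgoing field name; fields not
        listed are dropped from the outgoing tuple.
    replacements: iterable of (outgoing_field, match_value, new_value);
        a field equal to match_value (None stands for null) is set to
        new_value.
    """
    for t in tuples:
        out = {alias: t.get(source) for source, alias in fields.items()}
        for field, match_value, new_value in replacements:
            if field in out and out[field] == match_value:
                out[field] = new_value
        yield out


source_tuples = [
    {"id": 1, "fieldA_i": None, "fieldB_s": "foo", "other_s": "dropped"},
    {"id": 2, "fieldA_i": 5, "fieldB_s": "baz", "other_s": "dropped"},
]

selected = list(
    select(
        source_tuples,
        fields={"id": "id", "fieldA_i": "fieldA", "fieldB_s": "fieldB"},
        replacements=[("fieldA", None, 123), ("fieldB", "foo", "bar")],
    )
)
```

Applying the replacements to the outgoing (aliased) names mirrors the examples above, where replace(...) refers to fieldA and fieldB rather than fieldA_i and fieldB_s.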
[jira] [Updated] (SOLR-7584) Add Joins to the Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-7584: -- Attachment: SOLR-7584.patch Recreated patch off current trunk. Previous patch was a little outdated. Add Joins to the Streaming API and Streaming Expressions Key: SOLR-7584 URL: https://issues.apache.org/jira/browse/SOLR-7584 Project: Solr Issue Type: Improvement Components: SolrJ Reporter: Dennis Gove Priority: Minor Labels: Streaming Attachments: SOLR-7584.patch, SOLR-7584.patch, SOLR-7584.patch, SOLR-7584.patch Add InnerJoinStream, LeftOuterJoinStream, and supporting classes to the Streaming API to allow for joining between sub-streams. At its basic, it would look something like this {code} innerJoin( search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...), search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...), on=fieldA=fieldA ) {code} or with multi-field on clauses {code} innerJoin( search(collection1, q=*:*, fl=fieldA, fieldB, fieldC, ...), search(collection2, q=*:*, fl=fieldA, fieldD, fieldE, ...), on=fieldA=fieldA, fieldB=fieldD ) {code} I'd also like to support the option of doing a hash join instead of the default merge join but I haven't yet figured out the best way to express that. I'd like to let the user tell us which sub-stream should be hashed (the least-cost one). Also, I've been thinking about field aliasing and might want to add a SelectStream which serves the purpose of allowing us to limit the fields coming out and rename fields. Depends on SOLR-7554 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-7669) Add SelectStream to Streaming API
[ https://issues.apache.org/jira/browse/SOLR-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661163#comment-14661163 ]

Dennis Gove edited comment on SOLR-7669 at 8/10/15 1:55 PM:

Updated to add support for performing operations on the selected values. The only operation included in this patch is Replace, which can be used to replace a field's value (or null) with a different value or with the value of another field.

In the following example, if fieldA is null then it will be replaced with the value 123, and if fieldB is foo then it will be set to bar.

{code}
select(
  id,
  fieldA_i as fieldA,
  fieldB_s as fieldB,
  replace(fieldA, null, withValue=123),
  replace(fieldB, foo, withValue=bar),
  search(collection1, q=*:*, fl=id,fieldA_i,fieldB_s, sort=fieldA_i asc, fieldB_s asc, id asc)
)
{code}

In the following example, if fieldA is null or ??? then it will be replaced with the value of fieldB.

{code}
select(
  id,
  fieldA_s as fieldA,
  fieldB_s as fieldB,
  replace(fieldA, null, withField=fieldB),
  replace(fieldA, ???, withField=fieldB),
  search(collection1, q=*:*, fl=id,fieldA_i,fieldB_s, sort=fieldA_i asc, fieldB_s asc, id asc)
)
{code}

was (Author: dpgove):

Updated to add support for performing operations on the selected values. The only operation included in this patch is Replace, which can be used to replace a field's value (or null) with a different value or with the value of another field.

In the following example, if fieldA is null then it will be replaced with the value 123, and if fieldB is foo then it will be set to bar.

{code}
select(
  id,
  fieldA_i as fieldA,
  fieldB_s as fieldB,
  replace(fieldA, null, 123),
  replace(fieldB, foo, withValue=bar),
  search(collection1, q=*:*, fl=id,fieldA_i,fieldB_s, sort=fieldA_i asc, fieldB_s asc, id asc)
)
{code}

In the following example, if fieldA is null or ??? then it will be replaced with the value of fieldB.
{code}
select(
  id,
  fieldA_s as fieldA,
  fieldB_s as fieldB,
  replace(fieldA, null, withField=fieldB),
  replace(fieldA, ???, withField=fieldB),
  search(collection1, q=*:*, fl=id,fieldA_i,fieldB_s, sort=fieldA_i asc, fieldB_s asc, id asc)
)
{code}

Add SelectStream to Streaming API
---------------------------------

Key: SOLR-7669
URL: https://issues.apache.org/jira/browse/SOLR-7669
Project: Solr
Issue Type: Improvement
Components: SolrJ
Reporter: Dennis Gove
Priority: Minor
Labels: Streaming
Attachments: SOLR-7669.patch, SOLR-7669.patch

Adds a new stream called SelectStream which can be used for two purposes.
1. Limit the set of fields included in an outgoing tuple to remove unwanted fields
2. Provide aliases for fields. With this it acts as an alternative to the CloudSolrStream's 'aliases' option.

For example, in a simple case

{code}
select(
  id,
  fieldA_i as fieldA,
  fieldB_s as fieldB,
  search(collection1, q=*:*, fl=id,fieldA_i,fieldB_s, sort=fieldA_i asc, fieldB_s asc, id asc)
)
{code}

This can also be used as part of complex expressions to help keep track of what is being worked on. This is particularly useful when merging/joining multiple collections which share field names. For example, the following results in a set of tuples including only the fields id, left.ident, and right.ident even though the total set of fields required to perform the search and join is much larger than just those three fields.

{code}
select(
  id, left.ident, right.ident,
  innerJoin(
    select(
      id, join1_i as left.join1, join2_s as left.join2, ident_s as left.ident,
      search(collection1, q=side_s:left, fl=id,join1_i,join2_s,ident_s, sort=join1_i asc, join2_s asc, id asc)
    ),
    select(
      join3_i as right.join1, join2_s as right.join2, ident_s as right.ident,
      search(collection1, q=side_s:right, fl=join3_i,join2_s,ident_s, sort=join3_i asc, join2_s asc)
    ),
    on=left.join1=right.join1, left.join2=right.join2
  )
)
{code}

This depends on SOLR-7584.
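The two roles of SelectStream described above (limiting fields and aliasing them), together with the Replace operation from the comment, can be sketched on plain maps. This is a minimal sketch under stated assumptions: SelectSketch, select, and replace are hypothetical names, tuples are modelled as java.util.Map, and nothing here reflects the actual classes in the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch; not the SelectStream implementation from SOLR-7669.
final class SelectSketch {

  // Keeps only the fields named in 'selection' (source name -> outgoing name).
  // Fields not listed are dropped, which covers both limiting and aliasing.
  static Map<String, Object> select(Map<String, Object> tuple,
                                    Map<String, String> selection) {
    Map<String, Object> out = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : selection.entrySet()) {
      if (tuple.containsKey(e.getKey())) {
        out.put(e.getValue(), tuple.get(e.getKey()));
      }
    }
    return out;
  }

  // Mirrors replace(field, original, withValue=...): when the field currently
  // holds 'original' (which may be null), overwrite it with 'withValue'.
  static void replace(Map<String, Object> tuple, String field,
                      Object original, Object withValue) {
    if (Objects.equals(tuple.get(field), original)) {
      tuple.put(field, withValue);
    }
  }

  static Map<String, Object> demo() {
    Map<String, Object> incoming = new LinkedHashMap<>();
    incoming.put("id", 1);
    incoming.put("fieldA_i", null);
    incoming.put("fieldB_s", "foo");
    incoming.put("extra_s", "not selected");

    Map<String, String> selection = new LinkedHashMap<>();
    selection.put("id", "id");
    selection.put("fieldA_i", "fieldA");
    selection.put("fieldB_s", "fieldB");

    Map<String, Object> outgoing = select(incoming, selection);
    replace(outgoing, "fieldA", null, 123);
    replace(outgoing, "fieldB", "foo", "bar");
    return outgoing;
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

In this sketch the operations run on the already-aliased tuple, matching the examples in the comment where replace refers to fieldA rather than fieldA_i.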
[jira] [Updated] (SOLR-7669) Add SelectStream to Streaming API
[ https://issues.apache.org/jira/browse/SOLR-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-7669: -- Attachment: SOLR-7669.patch Deleted the EditStream as its functionality (the removal of fields from a tuple) is superseded by the SelectStream. Updated the SQLHandler to use the SelectStream instead of the EditStream. All relevant tests pass. > Add SelectStream to Streaming API > - > > Key: SOLR-7669 > URL: https://issues.apache.org/jira/browse/SOLR-7669 > Project: Solr > Issue Type: Improvement > Components: SolrJ >Reporter: Dennis Gove >Priority: Minor > Labels: Streaming > Attachments: SOLR-7669.patch, SOLR-7669.patch, SOLR-7669.patch, > SOLR-7669.patch > > > Adds a new stream called SelectStream which can be used for two purpose. > 1. Limit the set of fields included in an outgoing tuple to remove unwanted > fields > 2. Provide aliases for fields. With this it acts as an alternative to the > CloudSolrStream's 'aliases' option. > For example, in a simple case > {code} > select( > id, > fieldA_i as fieldA, > fieldB_s as fieldB, > search(collection1, q="*:*", fl="id,fieldA_i,fieldB_s", sort="fieldA_i asc, > fieldB_s asc, id asc") > ) > {code} > This can also be used as part of complex expressions to help keep track of > what is being worked on. This is particularly useful when merging/joining > multiple collections which share field names. For example, the following > results in a set of tuples including only the fields id, left.ident, and > right.ident even though the total set of fields required to perform the > search and join is much larger than just those three fields. 
> {code} > select( > id, left.ident, right.ident, > innerJoin( > select( > id, join1_i as left.join1, join2_s as left.join2, ident_s as left.ident, > search(collection1, q="side_s:left", fl="id,join1_i,join2_s,ident_s", > sort="join1_i asc, join2_s asc, id asc") > ), > select( > join3_i as right.join1, join2_s as right.join2, ident_s as right.ident, > search(collection1, q="side_s:right", fl="join3_i,join2_s,ident_s", > sort="join3_i asc, join2_s asc"), > ), > on="left.join1=right.join1, left.join2=right.join2" > ) > ) > {code} > This depends on SOLR-7584. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-8188) Add hash style joins to the Streaming API and Streaming Expressions
Dennis Gove created SOLR-8188:
---------------------------------

Summary: Add hash style joins to the Streaming API and Streaming Expressions
Key: SOLR-8188
URL: https://issues.apache.org/jira/browse/SOLR-8188
Project: Solr
Issue Type: Improvement
Components: SolrJ
Reporter: Dennis Gove
Priority: Minor

Add HashJoinStream and OuterHashJoinStream to the Streaming API to allow for optimized joining between sub-streams.

HashJoinStream is similar to an InnerJoinStream except that it does not insist on any particular order and will read all values from the stream being hashed (hashStream) when open() is called. During read() it will return the next tuple from the stream not being hashed (fullStream) which has at least one matching record in hashStream. The returned tuple is the merge of both tuples. If the tuple from the fullStream matches more than one tuple from the hashStream then the next call to read() will return the merge with the next matching tuple. The order of the resulting stream is the order of the fullStream.

OuterHashJoinStream is similar to a HashJoinStream and a LeftOuterJoinStream in that a tuple from fullStream will be returned even if it doesn't have a matching record in hashStream. All other pieces are identical.

In expression form

{code}
hashJoin(
  search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...),
  hashed=search(collection2, q=*:*, fl="fieldA, fieldB, fieldE", ...),
  on="fieldA, fieldB"
)
{code}

{code}
outerHashJoin(
  search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...),
  hashed=search(collection2, q=*:*, fl="fieldA, fieldB, fieldE", ...),
  on="fieldA, fieldB"
)
{code}

As you can see, the hashStream is a named parameter, which makes it very clear which stream should be hashed.
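The behaviour described in SOLR-8188 above (hash one stream fully, then emit merged tuples in the order of the other) can be sketched as follows. This is only an illustration under assumptions: HashJoinSketch and its methods are hypothetical names, both streams are modelled as in-memory lists rather than open()/read() streams, and the hash key is a list of field values rather than whatever the patch actually computes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the HashJoinStream code from SOLR-8188.
final class HashJoinSketch {

  static Map<String, Object> tuple(Object... kv) {
    Map<String, Object> t = new HashMap<>();
    for (int i = 0; i < kv.length; i += 2) {
      t.put((String) kv[i], kv[i + 1]);
    }
    return t;
  }

  // The join key is the list of values of the 'on' fields.
  static List<Object> key(Map<String, Object> t, List<String> on) {
    List<Object> values = new ArrayList<>();
    for (String field : on) {
      values.add(t.get(field));
    }
    return values;
  }

  static List<Map<String, Object>> join(
      List<Map<String, Object>> fullStream,
      List<Map<String, Object>> hashStream,
      List<String> on,
      boolean outer) {
    // Corresponds to reading all of hashStream up front.
    Map<List<Object>, List<Map<String, Object>>> hashed = new HashMap<>();
    for (Map<String, Object> t : hashStream) {
      hashed.computeIfAbsent(key(t, on), k -> new ArrayList<>()).add(t);
    }
    // Output follows the order of fullStream.
    List<Map<String, Object>> out = new ArrayList<>();
    for (Map<String, Object> full : fullStream) {
      List<Map<String, Object>> matches = hashed.get(key(full, on));
      if (matches == null) {
        if (outer) {
          out.add(new HashMap<>(full)); // outer join keeps unmatched tuples
        }
        continue;
      }
      for (Map<String, Object> match : matches) {
        Map<String, Object> merged = new HashMap<>(full);
        merged.putAll(match);
        out.add(merged);
      }
    }
    return out;
  }

  static List<Map<String, Object>> demo(boolean outer) {
    List<Map<String, Object>> full = Arrays.asList(
        tuple("fieldA", 1, "fieldB", "x", "fieldC", "c1"),
        tuple("fieldA", 2, "fieldB", "y", "fieldC", "c2"));
    List<Map<String, Object>> hash = Arrays.asList(
        tuple("fieldA", 1, "fieldB", "x", "fieldE", "e1"),
        tuple("fieldA", 1, "fieldB", "x", "fieldE", "e2"));
    return join(full, hash, Arrays.asList("fieldA", "fieldB"), outer);
  }

  public static void main(String[] args) {
    System.out.println(demo(false));
    System.out.println(demo(true));
  }
}
```

Since only the hashed side is held in memory, hashing the smaller (least-cost) stream is the natural choice, which is why naming it explicitly in the expression is useful.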
[jira] [Updated] (SOLR-8198) Change ReducerStream to use StreamEqualitor instead of StreamComparator
[ https://issues.apache.org/jira/browse/SOLR-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-8198: -- Attachment: SOLR-8198.patch All tests pass. > Change ReducerStream to use StreamEqualitor instead of StreamComparator > --- > > Key: SOLR-8198 > URL: https://issues.apache.org/jira/browse/SOLR-8198 > Project: Solr > Issue Type: Improvement > Components: SolrJ >Reporter: Dennis Gove >Priority: Minor > Attachments: SOLR-8198.patch > > > Currently the ReducerStream uses a StreamComparator to determine whether > tuples are equal. StreamEqualitors are a simplified version of a comparator > in that they do not require a sort to be provided. Using the function > getStreamSort we are still able to validate the incoming stream's sort and > pass that on up to any parent stream which might require it. > This will simplify the use of the ReducerStream in join scenarios where the > reducer is used to find like records. Such a scenario exists with Inner/Outer > JoinStream, ComplementStream, and [Outer]HashJoinStreams. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7525) Add ComplementStream to the Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14976307#comment-14976307 ]

Dennis Gove commented on SOLR-7525:
-----------------------------------

I've got a patch for this that also includes an IntersectStream (which returns tuples from streamA that also exist in streamB). I just want to add some additional tests before I post the patch.

> Add ComplementStream to the Streaming API and Streaming Expressions
> -------------------------------------------------------------------
>
> Key: SOLR-7525
> URL: https://issues.apache.org/jira/browse/SOLR-7525
> Project: Solr
> Issue Type: New Feature
> Components: SolrJ
> Reporter: Joel Bernstein
> Priority: Minor
>
> This ticket adds a ComplementStream to the Streaming API and Streaming Expression language.
> The ComplementStream will wrap two TupleStreams (StreamA, StreamB) and emit Tuples from StreamA that are not in StreamB.
> Streaming API Syntax:
> {code}
> ComplementStream cstream = new ComplementStream(streamA, streamB, comp);
> {code}
> Streaming Expression syntax:
> {code}
> complement(search(...), search(...), on(...))
> {code}
> Internal implementation will rely on the ReducerStream. The ComplementStream can be parallelized using the ParallelStream.
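The semantics of the two streams discussed in this comment (complement emits tuples of streamA not in streamB, intersect emits tuples of streamA that are also in streamB) can be shown with a small sketch. Note the swapped-in technique: the ticket says the real implementation relies on the ReducerStream over sorted input, whereas this sketch simply collects streamB's keys into a HashSet. SetStreamSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the semantics only; it uses a HashSet lookup,
// not the ReducerStream-based approach the ticket describes.
final class SetStreamSketch {

  static Map<String, Object> tuple(String field, Object value) {
    Map<String, Object> t = new HashMap<>();
    t.put(field, value);
    return t;
  }

  // Keeps tuples of streamA whose 'on' value is (or is not) present in streamB.
  static List<Map<String, Object>> filter(
      List<Map<String, Object>> streamA,
      List<Map<String, Object>> streamB,
      String on,
      boolean keepIfPresentInB) {
    Set<Object> keysInB = new HashSet<>();
    for (Map<String, Object> t : streamB) {
      keysInB.add(t.get(on));
    }
    List<Map<String, Object>> out = new ArrayList<>();
    for (Map<String, Object> t : streamA) {
      if (keysInB.contains(t.get(on)) == keepIfPresentInB) {
        out.add(t);
      }
    }
    return out;
  }

  static List<Map<String, Object>> complement(
      List<Map<String, Object>> a, List<Map<String, Object>> b, String on) {
    return filter(a, b, on, false);
  }

  static List<Map<String, Object>> intersect(
      List<Map<String, Object>> a, List<Map<String, Object>> b, String on) {
    return filter(a, b, on, true);
  }

  // Extracts the 'id' values so results are easy to inspect.
  static List<Object> ids(List<Map<String, Object>> tuples) {
    List<Object> ids = new ArrayList<>();
    for (Map<String, Object> t : tuples) {
      ids.add(t.get("id"));
    }
    return ids;
  }

  static List<Map<String, Object>> streamA() {
    return Arrays.asList(tuple("id", 1), tuple("id", 2), tuple("id", 3));
  }

  static List<Map<String, Object>> streamB() {
    return Arrays.asList(tuple("id", 2), tuple("id", 4));
  }

  static List<Object> demoComplement() {
    return ids(complement(streamA(), streamB(), "id"));
  }

  static List<Object> demoIntersect() {
    return ids(intersect(streamA(), streamB(), "id"));
  }

  public static void main(String[] args) {
    System.out.println(demoComplement()); // prints [1, 3]
    System.out.println(demoIntersect());  // prints [2]
  }
}
```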
[jira] [Comment Edited] (SOLR-7525) Add ComplementStream to the Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981511#comment-14981511 ] Dennis Gove edited comment on SOLR-7525 at 10/29/15 10:59 PM: -- Includes both ComplementStream and IntersectStream. All tests pass. Depends on SOLR-8198. was (Author: dpgove): Includes both ComplementStream and IntersectStream. All tests pass. > Add ComplementStream to the Streaming API and Streaming Expressions > --- > > Key: SOLR-7525 > URL: https://issues.apache.org/jira/browse/SOLR-7525 > Project: Solr > Issue Type: New Feature > Components: SolrJ >Reporter: Joel Bernstein >Priority: Minor > Attachments: SOLR-7525.patch > > > This ticket adds a ComplementStream to the Streaming API and Streaming > Expression language. > The ComplementStream will wrap two TupleStreams (StreamA, StreamB) and emit > Tuples from StreamA that are not in StreamB. > Streaming API Syntax: > {code} > ComplementStream cstream = new ComplementStream(streamA, streamB, comp); > {code} > Streaming Expression syntax: > {code} > complement(search(...), search(...), on(...)) > {code} > Internal implementation will rely on the ReducerStream. The ComplementStream > can be parallelized using the ParallelStream. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-7525) Add ComplementStream to the Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-7525: -- Attachment: SOLR-7525.patch Includes both ComplementStream and IntersectStream. All tests pass. > Add ComplementStream to the Streaming API and Streaming Expressions > --- > > Key: SOLR-7525 > URL: https://issues.apache.org/jira/browse/SOLR-7525 > Project: Solr > Issue Type: New Feature > Components: SolrJ >Reporter: Joel Bernstein >Priority: Minor > Attachments: SOLR-7525.patch > > > This ticket adds a ComplementStream to the Streaming API and Streaming > Expression language. > The ComplementStream will wrap two TupleStreams (StreamA, StreamB) and emit > Tuples from StreamA that are not in StreamB. > Streaming API Syntax: > {code} > ComplementStream cstream = new ComplementStream(streamA, streamB, comp); > {code} > Streaming Expression syntax: > {code} > complement(search(...), search(...), on(...)) > {code} > Internal implementation will rely on the ReducerStream. The ComplementStream > can be parallelized using the ParallelStream. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7584) Add Joins to the Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972324#comment-14972324 ] Dennis Gove commented on SOLR-7584: --- Could you describe your use-case for joining on facets? I can imagine that a HashJoin (SOLR-8188) would be good for something like that because it removes the sort requirement. Yes, you can apply functions like sum and average on the joined data by wrapping the resulting joined stream in a RollupStream and using metrics. > Add Joins to the Streaming API and Streaming Expressions > > > Key: SOLR-7584 > URL: https://issues.apache.org/jira/browse/SOLR-7584 > Project: Solr > Issue Type: Improvement > Components: SolrJ >Reporter: Dennis Gove >Priority: Minor > Labels: Streaming > Attachments: SOLR-7584.patch, SOLR-7584.patch, SOLR-7584.patch, > SOLR-7584.patch, SOLR-7584.patch > > > Add InnerJoinStream, LeftOuterJoinStream, and supporting classes to the > Streaming API to allow for joining between sub-streams. > At its basic, it would look something like this > {code} > innerJoin( > search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...), > search(collection2, q=*:*, fl="fieldA, fieldD, fieldE", ...), > on="fieldA=fieldA" > ) > {code} > or with multi-field on clauses > {code} > innerJoin( > search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...), > search(collection2, q=*:*, fl="fieldA, fieldD, fieldE", ...), > on="fieldA=fieldA, fieldB=fieldD" > ) > {code} > I'd also like to support the option of doing a hash join instead of the > default merge join but I haven't yet figured out the best way to express > that. I'd like to let the user tell us which sub-stream should be hashed (the > least-cost one). > Also, I've been thinking about field aliasing and might want to add a > SelectStream which serves the purpose of allowing us to limit the fields > coming out and rename fields. 
> Depends on SOLR-7554
[jira] [Created] (SOLR-8198) Change ReducerStream to use StreamEqualitor instead of StreamComparator
Dennis Gove created SOLR-8198: - Summary: Change ReducerStream to use StreamEqualitor instead of StreamComparator Key: SOLR-8198 URL: https://issues.apache.org/jira/browse/SOLR-8198 Project: Solr Issue Type: Improvement Components: SolrJ Reporter: Dennis Gove Priority: Minor Currently the ReducerStream uses a StreamComparator to determine whether tuples are equal. StreamEqualitors are a simplified version of a comparator in that they do not require a sort to be provided. Using the function getStreamSort we are still able to validate the incoming stream's sort and pass that on up to any parent stream which might require it. This will simplify the use of the ReducerStream in join scenarios where the reducer is used to find like records. Such a scenario exists with Inner/Outer JoinStream, ComplementStream, and [Outer]HashJoinStreams. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
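To illustrate why an equality test is enough for finding like records in a stream that is already sorted, here is a minimal sketch. The thread does not show the StreamEqualitor interface, so java.util.function.BiPredicate stands in for it here; EqualitorSketch and groupAdjacent are hypothetical names, and the grouping loop is only a rough analogue of what a reducer does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical sketch; BiPredicate is a stand-in for a StreamEqualitor.
final class EqualitorSketch {

  // Groups adjacent items that the equalitor considers equal. No ordering
  // information is needed here: the input is assumed to be sorted already,
  // so like records are next to each other.
  static <T> List<List<T>> groupAdjacent(List<T> sorted, BiPredicate<T, T> equalitor) {
    List<List<T>> groups = new ArrayList<>();
    List<T> current = null;
    for (T item : sorted) {
      if (current == null || !equalitor.test(current.get(0), item)) {
        current = new ArrayList<>();
        groups.add(current);
      }
      current.add(item);
    }
    return groups;
  }

  static List<List<String>> demo() {
    // Two records are "like" when they share the same first character.
    BiPredicate<String, String> samePrefix = (x, y) -> x.charAt(0) == y.charAt(0);
    return groupAdjacent(Arrays.asList("a1", "a2", "b1", "c1", "c2"), samePrefix);
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints [[a1, a2], [b1], [c1, c2]]
  }
}
```

A comparator would additionally have to say which of two unequal records comes first, which is information the grouping step never uses.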
[jira] [Updated] (SOLR-7584) Add Joins to the Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-7584: -- Attachment: SOLR-7584.patch Part of this ticket is a change in comparators and equalitors to support differing field names on either side of the comparison (ie, fieldA = fieldB). Due to changes that have come into trunk between the creation of this patch and now it was required that I propagate those changes to a couple of other files. Note, I originally included this change in SOLR-7669 but realized today that it's actually necessary in this patch. Here's me regretting the decision to not create a separate ticket for the equalitor/comparator changes but this patch does also add support for distributed joins so there's that. Either way, description of change is below. Required a couple of changes in the SQL and FacetStream areas related to FieldComparator. The FieldComparator has been changed to support different field names on the left and right side. The SQL and FacetStream areas use FieldComparator for sorting (a totally valid use case) but do expect the left and right side field names to be equal. The changes I made go through and validate that assumption. In the future I think I may circle back around and create a new FieldComparator with a single field name so that on construction that assumption can be enforced. All tests pass. > Add Joins to the Streaming API and Streaming Expressions > > > Key: SOLR-7584 > URL: https://issues.apache.org/jira/browse/SOLR-7584 > Project: Solr > Issue Type: Improvement > Components: SolrJ >Reporter: Dennis Gove >Priority: Minor > Labels: Streaming > Attachments: SOLR-7584.patch, SOLR-7584.patch, SOLR-7584.patch, > SOLR-7584.patch, SOLR-7584.patch > > > Add InnerJoinStream, LeftOuterJoinStream, and supporting classes to the > Streaming API to allow for joining between sub-streams. 
> At its basic, it would look something like this > {code} > innerJoin( > search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...), > search(collection2, q=*:*, fl="fieldA, fieldD, fieldE", ...), > on="fieldA=fieldA" > ) > {code} > or with multi-field on clauses > {code} > innerJoin( > search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...), > search(collection2, q=*:*, fl="fieldA, fieldD, fieldE", ...), > on="fieldA=fieldA, fieldB=fieldD" > ) > {code} > I'd also like to support the option of doing a hash join instead of the > default merge join but I haven't yet figured out the best way to express > that. I'd like to let the user tell us which sub-stream should be hashed (the > least-cost one). > Also, I've been thinking about field aliasing and might want to add a > SelectStream which serves the purpose of allowing us to limit the fields > coming out and rename fields. > Depends on SOLR-7554 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8188) Add hash style joins to the Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Gove updated SOLR-8188:
------------------------------
Attachment: SOLR-8188.patch

Added a field separator to the hash calculation. This is to prevent a situation where two tuples have the same hashed value when they shouldn't.

t1.fieldA = "foo"
t1.fieldB = "bar"
t2.fieldA = "foob"
t2.fieldB = "ar"

With this change the hash will be different for t1 and t2.

> Add hash style joins to the Streaming API and Streaming Expressions
> -------------------------------------------------------------------
>
> Key: SOLR-8188
> URL: https://issues.apache.org/jira/browse/SOLR-8188
> Project: Solr
> Issue Type: Improvement
> Components: SolrJ
> Reporter: Dennis Gove
> Priority: Minor
> Attachments: SOLR-8188.patch, SOLR-8188.patch
>
> Add HashJoinStream and OuterHashJoinStream to the Streaming API to allow for optimized joining between sub-streams.
> HashJoinStream is similar to an InnerJoinStream except that it does not insist on any particular order and will read all values from the stream being hashed (hashStream) when open() is called. During read() it will return the next tuple from the stream not being hashed (fullStream) which has at least one matching record in hashStream. It will return a tuple which is the merge of both tuples. If the tuple from the fullStream matches with more than one tuple from the hashStream then calling read() will return the merge with the next matching tuple. The order of the resulting stream is the order of the fullStream.
> OuterHashJoinStream is similar to a HashJoinStream and LeftOuterJoinStream in that a tuple from fullStream will be returned even if it doesn't have a matching record in hashStream. All other pieces are identical.
> In expression form > {code} > hashJoin( > search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...), > hashed=search(collection2, q=*:*, fl="fieldA, fieldB, fieldE", ...), > on="fieldA, fieldB" > ) > {code} > {code} > outerHashJoin( > search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...), > hashed=search(collection2, q=*:*, fl="fieldA, fieldB, fieldE", ...), > on="fieldA, fieldB" > ) > {code} > As you can see the hashStream is named parameter which makes it very clear > which stream should be hashed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
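The collision that the field separator is meant to prevent (the foo/bar versus foob/ar case in the update above) is easy to reproduce with plain string concatenation. This is a sketch under assumptions: the "::" separator and the HashKeySketch class are made up for illustration, and the thread does not say which separator or hash function the patch uses.

```java
// Hypothetical sketch; the separator actually used by the patch is not
// stated in this thread, "::" is an arbitrary choice for illustration.
final class HashKeySketch {

  static final String SEPARATOR = "::";

  // Concatenates field values directly, so field boundaries are lost.
  static String withoutSeparator(String... values) {
    return String.join("", values);
  }

  // Concatenates field values with a separator marking each boundary.
  static String withSeparator(String... values) {
    return String.join(SEPARATOR, values);
  }

  public static void main(String[] args) {
    // t1: fieldA = "foo", fieldB = "bar"; t2: fieldA = "foob", fieldB = "ar"
    System.out.println(withoutSeparator("foo", "bar"));  // foobar
    System.out.println(withoutSeparator("foob", "ar"));  // foobar
    System.out.println(withSeparator("foo", "bar"));     // foo::bar
    System.out.println(withSeparator("foob", "ar"));     // foob::ar
  }
}
```

A separator only helps as long as it cannot occur inside a field value; values that themselves contain the separator could still produce equal keys.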
[jira] [Updated] (SOLR-8188) Add hash style joins to the Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-8188: -- Attachment: SOLR-8188.patch All tests pass. > Add hash style joins to the Streaming API and Streaming Expressions > --- > > Key: SOLR-8188 > URL: https://issues.apache.org/jira/browse/SOLR-8188 > Project: Solr > Issue Type: Improvement > Components: SolrJ >Reporter: Dennis Gove >Priority: Minor > Attachments: SOLR-8188.patch > > > Add HashJoinStream and OuterHashJoinStream to the Streaming API to allow for > optimized joining between sub-streams. > HashJoinStream is similar to an InnerJoinStream except that it does not > insist on any particular order and will read all values from the stream being > hashed (hashStream) when open() is called. During read() it will return the > next tuple from the stream not being hashed (fullStream) which has at least > one matching record in hashStream. It will return a tuple which is the merge > of both tuples. If the tuple from the fullStream matches with more than one > tuple from the hashStream then calling read() will return the merge with the > next matching tuple. The order of the resulting stream is the order of the > fullStream. > OuterHashJoinStream is similar to a HashJoinStream and LeftOuterJoinStream in > that a tuple from fullStream will be returned even if it doesn't have a > matching record in hashStream. All other pieces are identical. > In expression form > {code} > hashJoin( > search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...), > hashed=search(collection2, q=*:*, fl="fieldA, fieldB, fieldE", ...), > on="fieldA, fieldB" > ) > {code} > {code} > outerHashJoin( > search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...), > hashed=search(collection2, q=*:*, fl="fieldA, fieldB, fieldE", ...), > on="fieldA, fieldB" > ) > {code} > As you can see the hashStream is named parameter which makes it very clear > which stream should be hashed. 
[jira] [Comment Edited] (SOLR-8188) Add hash style joins to the Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970311#comment-14970311 ] Dennis Gove edited comment on SOLR-8188 at 10/23/15 2:43 AM: - Added a field separator to the hash calculation. This is to prevent a situation where two tuples have the same hashed value where they shouldn't. t1.fieldA = "foo" t1.fieldB = "bar" t2.fieldA = "foob" t2.fieldB = "ar" With this change the hash will be different for t1 and t2. was (Author: dpgove): Added a field seperator to the hash calculation. This is to prevent a situation where two tuples have the same hashed value where they shoudn't. t1.fieldA = "foo" t1.fieldB = "bar" t2.fieldA = "foob" t2.fieldB = "ar" With this change the hash will be different for t1 and t2. > Add hash style joins to the Streaming API and Streaming Expressions > --- > > Key: SOLR-8188 > URL: https://issues.apache.org/jira/browse/SOLR-8188 > Project: Solr > Issue Type: Improvement > Components: SolrJ >Reporter: Dennis Gove >Priority: Minor > Attachments: SOLR-8188.patch, SOLR-8188.patch > > > Add HashJoinStream and OuterHashJoinStream to the Streaming API to allow for > optimized joining between sub-streams. > HashJoinStream is similar to an InnerJoinStream except that it does not > insist on any particular order and will read all values from the stream being > hashed (hashStream) when open() is called. During read() it will return the > next tuple from the stream not being hashed (fullStream) which has at least > one matching record in hashStream. It will return a tuple which is the merge > of both tuples. If the tuple from the fullStream matches with more than one > tuple from the hashStream then calling read() will return the merge with the > next matching tuple. The order of the resulting stream is the order of the > fullStream. 
> OuterHashJoinStream is similar to a HashJoinStream and LeftOuterJoinStream in > that a tuple from fullStream will be returned even if it doesn't have a > matching record in hashStream. All other pieces are identical. > In expression form > {code} > hashJoin( > search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...), > hashed=search(collection2, q=*:*, fl="fieldA, fieldB, fieldE", ...), > on="fieldA, fieldB" > ) > {code} > {code} > outerHashJoin( > search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...), > hashed=search(collection2, q=*:*, fl="fieldA, fieldB, fieldE", ...), > on="fieldA, fieldB" > ) > {code} > As you can see the hashStream is named parameter which makes it very clear > which stream should be hashed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-8185) Add operations support to streaming metrics
Dennis Gove created SOLR-8185:
---------------------------------

Summary: Add operations support to streaming metrics
Key: SOLR-8185
URL: https://issues.apache.org/jira/browse/SOLR-8185
Project: Solr
Issue Type: Improvement
Components: SolrJ
Reporter: Dennis Gove
Priority: Minor

Adds support for operations on stream metrics.

With this feature one can modify tuple values before they are applied to the computed metric. There are a lot of use-cases I can see with this - I'll describe one here.

Imagine you have a RollupStream which is computing the average over some field but you cannot be sure that all documents have a value for that field, i.e. the value may be null. When the value is null you want to treat it as a 0. With this feature you can accomplish that like this

{code}
rollup(
  search(collection1, q=*:*, fl="a_s,a_i,a_f", sort="a_s asc"),
  over="a_s",
  avg(a_i, replace(null, withValue=0)),
  count(*)
)
{code}

The operations are applied to the tuple for each metric in the stream, which means you can perform different operations on different metrics without being impacted by operations on other metrics.

Adding to our previous example, imagine you also want to get the min of a field, but without considering null values.

{code}
rollup(
  search(collection1, q=*:*, fl="a_s,a_i,a_f", sort="a_s asc"),
  over="a_s",
  avg(a_i, replace(null, withValue=0)),
  min(a_i),
  count(*)
)
{code}

Also, the tuple is not modified for streams that might wrap this one. That is, the only thing that sees the applied operation is that particular metric. If you want to apply operations for wrapping streams you can still achieve that with the SelectStream (SOLR-7669).

One feature I'm investigating, but which this patch DOES NOT add, is the ability to assign names to the resulting metric value. For example, to allow for something like this

{code}
rollup(
  search(collection1, q=*:*, fl="a_s,a_i,a_f", sort="a_s asc"),
  over="a_s",
  avg(a_i, replace(null, withValue=0), as="avg_a_i_null_as_0"),
  avg(a_i),
  count(*, as="totalCount")
)
{code}

Right now that isn't possible because the identifier for each metric would be the same "avg_a_i" and as such both couldn't be returned. It's relatively easy to add but I have to investigate its impact on the SQL and FacetStream areas.

Depends on SOLR-7669 (SelectStream)
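The per-metric behaviour described in SOLR-8185 above (one metric sees null replaced by 0, another metric ignores nulls, and the tuple itself is left untouched) can be sketched as follows. This is an illustration under assumptions only: MetricOperationSketch and its methods are hypothetical names, tuples are modelled as maps, and the real metric classes in the patch are not shown in this thread.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the metric or operation classes from SOLR-8185.
final class MetricOperationSketch {

  static Map<String, Object> tuple(Object... kv) {
    Map<String, Object> t = new HashMap<>();
    for (int i = 0; i < kv.length; i += 2) {
      t.put((String) kv[i], kv[i + 1]);
    }
    return t;
  }

  // Analogue of avg(a_i, replace(null, withValue=...)): the replacement is
  // applied to the value this metric reads, never written back to the tuple.
  static double avgReplacingNull(List<Map<String, Object>> tuples,
                                 String field, double replacement) {
    if (tuples.isEmpty()) {
      return 0.0;
    }
    double sum = 0.0;
    for (Map<String, Object> t : tuples) {
      Object value = t.get(field);
      sum += (value == null) ? replacement : ((Number) value).doubleValue();
    }
    return sum / tuples.size();
  }

  // Analogue of min(a_i) with no operation: null values are not considered.
  static Double minIgnoringNull(List<Map<String, Object>> tuples, String field) {
    Double min = null;
    for (Map<String, Object> t : tuples) {
      Object value = t.get(field);
      if (value == null) {
        continue;
      }
      double d = ((Number) value).doubleValue();
      if (min == null || d < min) {
        min = d;
      }
    }
    return min;
  }

  static List<Map<String, Object>> demoTuples() {
    return Arrays.asList(
        tuple("a_s", "x", "a_i", 4),
        tuple("a_s", "x", "a_i", null),
        tuple("a_s", "x", "a_i", 2));
  }

  public static void main(String[] args) {
    List<Map<String, Object>> tuples = demoTuples();
    System.out.println(avgReplacingNull(tuples, "a_i", 0)); // (4 + 0 + 2) / 3
    System.out.println(minIgnoringNull(tuples, "a_i"));
    System.out.println(tuples.get(1).get("a_i")); // still null afterwards
  }
}
```

Keeping the replacement local to the metric is what lets avg and min disagree about how to treat the same null without either affecting wrapping streams.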
[jira] [Updated] (SOLR-8185) Add operations support to streaming metrics
[ https://issues.apache.org/jira/browse/SOLR-8185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-8185: -- Attachment: SOLR-8185.patch Full patch. All tests pass. > Add operations support to streaming metrics > --- > > Key: SOLR-8185 > URL: https://issues.apache.org/jira/browse/SOLR-8185 > Project: Solr > Issue Type: Improvement > Components: SolrJ >Reporter: Dennis Gove >Priority: Minor > Attachments: SOLR-8185.patch > > > Adds support for operations on stream metrics. > With this feature one can modify tuple values before they are applied to the computed > metric. There are a lot of use-cases I can see with this - I'll describe one > here. > Imagine you have a RollupStream which is computing the average over some > field but you cannot be sure that all documents have a value for that field, > i.e. the value may be null. When the value is null you want to treat it as a 0. > With this feature you can accomplish that like this > {code} > rollup( > search(collection1, q=*:*, fl="a_s,a_i,a_f", sort="a_s asc"), > over="a_s", > avg(a_i, replace(null, withValue=0)), > count(*) > ) > {code} > The operations are applied to the tuple for each metric in the stream which > means you can perform different operations on different metrics without being > impacted by operations on other metrics. > Adding to our previous example, imagine you also want to get the min of a > field but without considering null values. > {code} > rollup( > search(collection1, q=*:*, fl="a_s,a_i,a_f", sort="a_s asc"), > over="a_s", > avg(a_i, replace(null, withValue=0)), > min(a_i), > count(*) > ) > {code} > Also, the tuple is not modified for streams that might wrap this one. I.e., the > only thing that sees the applied operation is that particular metric. If you > want to apply operations for wrapping streams you can still achieve that with > the SelectStream (SOLR-7669).
> One feature I'm investigating, but which this patch DOES NOT add, is the ability to > assign names to the resulting metric value. For example, to allow for > something like this > {code} > rollup( > search(collection1, q=*:*, fl="a_s,a_i,a_f", sort="a_s asc"), > over="a_s", > avg(a_i, replace(null, withValue=0), as="avg_a_i_null_as_0"), > avg(a_i), > count(*, as="totalCount") > ) > {code} > Right now that isn't possible because the identifier for each metric would be > the same "avg_a_i" and as such both couldn't be returned. It's relatively > easy to add but I have to investigate its impact on the SQL and FacetStream > areas. > Depends on SOLR-7669 (SelectStream)
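The per-metric behaviour described in SOLR-8185 above can be sketched in plain Java. This is only an illustration and not the patch's code: the class MetricOperationSketch and the helpers tuple, replaceNull and avg are hypothetical names, tuples are modelled as plain maps, and it is an assumption here that a metric without an operation skips null values. The point it shows is that the operation works on a copy, so the tuple seen by other metrics and by wrapping streams is left unchanged.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names are Solr classes.
class MetricOperationSketch {

    /** Builds a one-field tuple; HashMap is used because the value may be null. */
    static Map<String, Object> tuple(String field, Object value) {
        Map<String, Object> t = new HashMap<>();
        t.put(field, value);
        return t;
    }

    /** Returns a copy of the tuple in which a null or missing field is replaced. */
    static Map<String, Object> replaceNull(Map<String, Object> tuple, String field, Object withValue) {
        Map<String, Object> copy = new HashMap<>(tuple);
        if (copy.get(field) == null) {
            copy.put(field, withValue);
        }
        return copy;
    }

    /** Average over a field. With replaceNulls, nulls count as 0; otherwise they are skipped (assumption). */
    static double avg(List<Map<String, Object>> tuples, String field, boolean replaceNulls) {
        double sum = 0;
        long count = 0;
        for (Map<String, Object> tuple : tuples) {
            // The operation is applied to a copy seen only by this metric.
            Map<String, Object> seen = replaceNulls ? replaceNull(tuple, field, 0L) : tuple;
            Object value = seen.get(field);
            if (value == null) {
                continue;
            }
            sum += ((Number) value).doubleValue();
            count++;
        }
        return count == 0 ? 0.0 : sum / count;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> tuples = new ArrayList<>();
        tuples.add(tuple("a_i", 4L));
        tuples.add(tuple("a_i", null));
        tuples.add(tuple("a_i", 8L));
        System.out.println(avg(tuples, "a_i", true));   // prints 4.0
        System.out.println(avg(tuples, "a_i", false));  // prints 6.0
        System.out.println(tuples.get(1).get("a_i"));   // prints null, the original tuple is untouched
    }
}
```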
[jira] [Updated] (SOLR-7669) Add SelectStream to Streaming API
[ https://issues.apache.org/jira/browse/SOLR-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-7669: -- Attachment: SOLR-7669.patch Rebased against trunk (git hash f63fc48, SOLR-8114: in Grouping.java rename groupSort to withinGroupSort). Required a couple of changes in the SQL and FacetStream areas related to FieldComparator. The FieldComparator has been changed to support different field names on the left and right side. The SQL and FacetStream areas use FieldComparator for sorting (a totally valid use case) but do expect the left and right side field names to be equal. The changes I made go through and validate that assumption. In the future I think I may circle back around and create a new FieldComparator with a single field name so that on construction that assumption can be enforced. All tests pass. > Add SelectStream to Streaming API > - > > Key: SOLR-7669 > URL: https://issues.apache.org/jira/browse/SOLR-7669 > Project: Solr > Issue Type: Improvement > Components: SolrJ >Reporter: Dennis Gove >Priority: Minor > Labels: Streaming > Attachments: SOLR-7669.patch, SOLR-7669.patch, SOLR-7669.patch > > > Adds a new stream called SelectStream which can be used for two purposes. > 1. Limit the set of fields included in an outgoing tuple to remove unwanted > fields > 2. Provide aliases for fields. With this it acts as an alternative to the > CloudSolrStream's 'aliases' option. > For example, in a simple case > {code} > select( > id, > fieldA_i as fieldA, > fieldB_s as fieldB, > search(collection1, q="*:*", fl="id,fieldA_i,fieldB_s", sort="fieldA_i asc, > fieldB_s asc, id asc") > ) > {code} > This can also be used as part of complex expressions to help keep track of > what is being worked on. This is particularly useful when merging/joining > multiple collections which share field names.
For example, the following > results in a set of tuples including only the fields id, left.ident, and > right.ident even though the total set of fields required to perform the > search and join is much larger than just those three fields. > {code} > select( > id, left.ident, right.ident, > innerJoin( > select( > id, join1_i as left.join1, join2_s as left.join2, ident_s as left.ident, > search(collection1, q="side_s:left", fl="id,join1_i,join2_s,ident_s", > sort="join1_i asc, join2_s asc, id asc") > ), > select( > join3_i as right.join1, join2_s as right.join2, ident_s as right.ident, > search(collection1, q="side_s:right", fl="join3_i,join2_s,ident_s", > sort="join3_i asc, join2_s asc") > ), > on="left.join1=right.join1, left.join2=right.join2" > ) > ) > {code} > This depends on SOLR-7584.
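The two purposes of SelectStream described above (dropping unwanted fields and aliasing the ones that are kept) can be sketched on a single tuple in plain Java. This is an illustration only and not the SelectStream implementation: the class SelectSketch and its select method are hypothetical names, a tuple is modelled as a map, and the selection is modelled as a map from original field name to outgoing name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; SelectSketch is not a Solr class.
class SelectSketch {

    /** Keeps only the fields named in selection, renaming each one to its alias. */
    static Map<String, Object> select(Map<String, Object> tuple, Map<String, String> selection) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : selection.entrySet()) {
            if (tuple.containsKey(entry.getKey())) {
                out.put(entry.getValue(), tuple.get(entry.getKey()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> tuple = new LinkedHashMap<>();
        tuple.put("id", 1);
        tuple.put("fieldA_i", 10);
        tuple.put("fieldB_s", "x");
        tuple.put("extra_s", "not selected");

        // Mirrors: select(id, fieldA_i as fieldA, fieldB_s as fieldB, ...)
        Map<String, String> selection = new LinkedHashMap<>();
        selection.put("id", "id");
        selection.put("fieldA_i", "fieldA");
        selection.put("fieldB_s", "fieldB");

        System.out.println(select(tuple, selection)); // prints {id=1, fieldA=10, fieldB=x}
    }
}
```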
[jira] [Assigned] (SOLR-8268) Makes StatsStream implement Expressible interface
[ https://issues.apache.org/jira/browse/SOLR-8268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove reassigned SOLR-8268: - Assignee: Dennis Gove > Makes StatsStream implement Expressible interface > - > > Key: SOLR-8268 > URL: https://issues.apache.org/jira/browse/SOLR-8268 > Project: Solr > Issue Type: Improvement > Components: SolrJ >Affects Versions: Trunk >Reporter: Dennis Gove >Assignee: Dennis Gove >Priority: Trivial > Labels: Streaming > Fix For: Trunk > > Attachments: SOLR-8268.patch > > > Adds expression support to the Stats stream. With this it will now be > possible to express a stats stream as > {code} > stats( > collection1, q=*:*, fl="fieldA,fieldB,fieldInt,fieldFloat", > sum(fieldInt), > sum(fieldFloat), > min(fieldInt), > min(fieldFloat), > max(fieldInt), > max(fieldFloat), > avg(fieldInt), > avg(fieldFloat), > count(*) > ) > {code} > You can collect stats on any supported metric and use full metric features. > I.e., when SOLR-8185 is committed you can then include operations in the > metrics.
[jira] [Assigned] (SOLR-7669) Add SelectStream to Streaming API
[ https://issues.apache.org/jira/browse/SOLR-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove reassigned SOLR-7669: - Assignee: Dennis Gove > Add SelectStream to Streaming API > - > > Key: SOLR-7669 > URL: https://issues.apache.org/jira/browse/SOLR-7669 > Project: Solr > Issue Type: Improvement > Components: SolrJ >Reporter: Dennis Gove >Assignee: Dennis Gove >Priority: Minor > Labels: Streaming > Attachments: SOLR-7669.patch, SOLR-7669.patch, SOLR-7669.patch, > SOLR-7669.patch, SOLR-7669.patch
[jira] [Updated] (SOLR-7669) Add SelectStream to Streaming API
[ https://issues.apache.org/jira/browse/SOLR-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-7669: -- Attachment: SOLR-7669.patch Fixes for pre-commit failures. Add documentation on the operations. > Add SelectStream to Streaming API > - > > Key: SOLR-7669 > URL: https://issues.apache.org/jira/browse/SOLR-7669 > Project: Solr > Issue Type: Improvement > Components: SolrJ >Reporter: Dennis Gove >Assignee: Dennis Gove >Priority: Minor > Labels: Streaming > Attachments: SOLR-7669.patch, SOLR-7669.patch, SOLR-7669.patch, > SOLR-7669.patch, SOLR-7669.patch, SOLR-7669.patch
[jira] [Closed] (SOLR-7669) Add SelectStream to Streaming API
[ https://issues.apache.org/jira/browse/SOLR-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove closed SOLR-7669. - Resolution: Implemented Fix Version/s: Trunk > Add SelectStream to Streaming API > - > > Key: SOLR-7669 > URL: https://issues.apache.org/jira/browse/SOLR-7669 > Project: Solr > Issue Type: Improvement > Components: SolrJ >Reporter: Dennis Gove >Assignee: Dennis Gove >Priority: Minor > Labels: Streaming > Fix For: Trunk > > Attachments: SOLR-7669.patch, SOLR-7669.patch, SOLR-7669.patch, > SOLR-7669.patch, SOLR-7669.patch, SOLR-7669.patch
[jira] [Comment Edited] (SOLR-8185) Add operations support to streaming metrics
[ https://issues.apache.org/jira/browse/SOLR-8185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968357#comment-14968357 ] Dennis Gove edited comment on SOLR-8185 at 11/13/15 1:41 AM: - Full patch. was (Author: dpgove): Full patch. All tests pass. > Add operations support to streaming metrics > --- > > Key: SOLR-8185 > URL: https://issues.apache.org/jira/browse/SOLR-8185 > Project: Solr > Issue Type: Improvement > Components: SolrJ >Reporter: Dennis Gove >Assignee: Dennis Gove >Priority: Minor > Attachments: SOLR-8185.patch
[jira] [Assigned] (SOLR-8185) Add operations support to streaming metrics
[ https://issues.apache.org/jira/browse/SOLR-8185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove reassigned SOLR-8185: - Assignee: Dennis Gove > Add operations support to streaming metrics > --- > > Key: SOLR-8185 > URL: https://issues.apache.org/jira/browse/SOLR-8185 > Project: Solr > Issue Type: Improvement > Components: SolrJ >Reporter: Dennis Gove >Assignee: Dennis Gove >Priority: Minor > Attachments: SOLR-8185.patch
[jira] [Commented] (SOLR-8185) Add operations support to streaming metrics
[ https://issues.apache.org/jira/browse/SOLR-8185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003390#comment-15003390 ] Dennis Gove commented on SOLR-8185: --- Running into some issues turning the expression into something that would perform the expected .equals() {code} avg(a_f, replace(10, withValue=0)) {code} In this example, what type is 10? Is it a long or a float or a double? The field is a float (as noted by the _f) so one would expect 10 to be a float as well. However, in converting 10 to some Object that we can call .equals(...) on we are not sure what the type is. This has been a persistent problem with this patch. But I think I've come up with something that puts some of the decision making in the hands of the expression writer. {code} avg(a_f, replace(10f, withValue=0f)) {code} In this case the value can only be converted to a float so it will be created as a float object. However, to add this new requirement on the expression creator I want to take a deeper look at what this might impact and make sure the documentation is very clear. If a user doesn't do the correct thing (gives us 10 instead of 10f) and the value in the tuple is a float then float.equals(long) == false every single time. Anyway, this note is somewhat of a rant. > Add operations support to streaming metrics > --- > > Key: SOLR-8185 > URL: https://issues.apache.org/jira/browse/SOLR-8185 > Project: Solr > Issue Type: Improvement > Components: SolrJ >Reporter: Dennis Gove >Assignee: Dennis Gove >Priority: Minor > Attachments: SOLR-8185.patch
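The type problem raised in the SOLR-8185 comment above can be reproduced with plain Java boxed numbers. The sketch below is an illustration only: parseLiteral is a hypothetical helper, and its rules (a trailing f gives a Float, a decimal point gives a Double, anything else a Long) are assumptions made for this example rather than the patch's actual parsing logic. What is standard Java behaviour is that Float.equals returns false for any argument that is not a Float, even when the numeric value is the same.

```java
// Hypothetical sketch; parseLiteral's rules are assumptions, not the patch's logic.
class NumericEqualsSketch {

    /** Turns an expression literal into a boxed number based on how it is written. */
    static Object parseLiteral(String literal) {
        if (literal.endsWith("f")) {
            // "10f" can only be read as a float.
            return Float.valueOf(literal.substring(0, literal.length() - 1));
        }
        if (literal.contains(".")) {
            return Double.valueOf(literal);
        }
        // A bare "10" is ambiguous; this sketch picks Long.
        return Long.valueOf(literal);
    }

    public static void main(String[] args) {
        Object tupleValue = Float.valueOf(10f); // value of a float field such as a_f

        System.out.println(tupleValue.equals(parseLiteral("10")));  // prints false: Float vs Long
        System.out.println(tupleValue.equals(parseLiteral("10f"))); // prints true: Float vs Float
    }
}
```

With a literal written as 10 the replace operation would never match a float field, which is why the comment proposes letting the expression writer state the type explicitly.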
[jira] [Assigned] (SOLR-8188) Add hash style joins to the Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove reassigned SOLR-8188: - Assignee: Dennis Gove > Add hash style joins to the Streaming API and Streaming Expressions > --- > > Key: SOLR-8188 > URL: https://issues.apache.org/jira/browse/SOLR-8188 > Project: Solr > Issue Type: Improvement > Components: SolrJ >Reporter: Dennis Gove >Assignee: Dennis Gove >Priority: Minor > Attachments: SOLR-8188.patch, SOLR-8188.patch > > > Add HashJoinStream and OuterHashJoinStream to the Streaming API to allow for > optimized joining between sub-streams. > HashJoinStream is similar to an InnerJoinStream except that it does not > insist on any particular order and will read all values from the stream being > hashed (hashStream) when open() is called. During read() it will return the > next tuple from the stream not being hashed (fullStream) which has at least > one matching record in hashStream. It will return a tuple which is the merge > of both tuples. If the tuple from the fullStream matches with more than one > tuple from the hashStream then calling read() will return the merge with the > next matching tuple. The order of the resulting stream is the order of the > fullStream. > OuterHashJoinStream is similar to a HashJoinStream and LeftOuterJoinStream in > that a tuple from fullStream will be returned even if it doesn't have a > matching record in hashStream. All other pieces are identical. > In expression form > {code} > hashJoin( > search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...), > hashed=search(collection2, q=*:*, fl="fieldA, fieldB, fieldE", ...), > on="fieldA, fieldB" > ) > {code} > {code} > outerHashJoin( > search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...), > hashed=search(collection2, q=*:*, fl="fieldA, fieldB, fieldE", ...), > on="fieldA, fieldB" > ) > {code} > As you can see the hashStream is a named parameter which makes it very clear > which stream should be hashed.
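The hash join behaviour described in SOLR-8188 above can be sketched in plain Java over in-memory lists. This is an illustration only and not the HashJoinStream code: HashJoinSketch, tuple, key and hashJoin are hypothetical names, tuples are modelled as maps, the result is materialized as a list instead of being returned one tuple per read(), and the rule that fullStream values win when both sides carry the same field is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; HashJoinSketch is not a Solr class.
class HashJoinSketch {

    /** Builds a tuple from alternating field names and values. */
    static Map<String, Object> tuple(Object... fieldsAndValues) {
        Map<String, Object> t = new HashMap<>();
        for (int i = 0; i < fieldsAndValues.length; i += 2) {
            t.put((String) fieldsAndValues[i], fieldsAndValues[i + 1]);
        }
        return t;
    }

    /** The join key is the list of values of the "on" fields. */
    static List<Object> key(Map<String, Object> tuple, List<String> on) {
        List<Object> k = new ArrayList<>();
        for (String field : on) {
            k.add(tuple.get(field));
        }
        return k;
    }

    static List<Map<String, Object>> hashJoin(List<Map<String, Object>> fullStream,
                                              List<Map<String, Object>> hashStream,
                                              List<String> on,
                                              boolean outer) {
        // Corresponds to open(): read every tuple of hashStream into a hash table.
        Map<List<Object>, List<Map<String, Object>>> hashed = new HashMap<>();
        for (Map<String, Object> t : hashStream) {
            hashed.computeIfAbsent(key(t, on), k -> new ArrayList<>()).add(t);
        }

        // Corresponds to repeated read() calls: walk fullStream in its own order.
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> full : fullStream) {
            List<Map<String, Object>> matches = hashed.get(key(full, on));
            if (matches == null) {
                if (outer) {
                    out.add(new HashMap<>(full)); // outer join keeps unmatched tuples
                }
                continue;
            }
            for (Map<String, Object> match : matches) {
                Map<String, Object> merged = new HashMap<>(match);
                merged.putAll(full); // assumption: fullStream values win on conflict
                out.add(merged);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> full = new ArrayList<>();
        full.add(tuple("fieldA", 1, "fieldB", "x", "fieldC", "c1"));
        full.add(tuple("fieldA", 2, "fieldB", "y", "fieldC", "c2"));

        List<Map<String, Object>> hash = new ArrayList<>();
        hash.add(tuple("fieldA", 1, "fieldB", "x", "fieldE", "e1"));
        hash.add(tuple("fieldA", 1, "fieldB", "x", "fieldE", "e2"));

        List<String> on = new ArrayList<>();
        on.add("fieldA");
        on.add("fieldB");

        System.out.println(hashJoin(full, hash, on, false).size()); // prints 2
        System.out.println(hashJoin(full, hash, on, true).size());  // prints 3
    }
}
```

Neither input needs to be sorted on the join fields, at the cost of holding all of hashStream in memory, which is why it matters that the expression names the hashed stream explicitly.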
[jira] [Closed] (SOLR-8188) Add hash style joins to the Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove closed SOLR-8188. - Resolution: Implemented Fix Version/s: Trunk > Add hash style joins to the Streaming API and Streaming Expressions > --- > > Key: SOLR-8188 > URL: https://issues.apache.org/jira/browse/SOLR-8188 > Project: Solr > Issue Type: Improvement > Components: SolrJ >Reporter: Dennis Gove >Assignee: Dennis Gove >Priority: Minor > Fix For: Trunk > > Attachments: SOLR-8188.patch, SOLR-8188.patch
[jira] [Reopened] (SOLR-8188) Add hash style joins to the Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove reopened SOLR-8188: --- Forgot to attach a slightly modified patch file (rebased off trunk). > Add hash style joins to the Streaming API and Streaming Expressions > --- > > Key: SOLR-8188 > URL: https://issues.apache.org/jira/browse/SOLR-8188 > Project: Solr > Issue Type: Improvement > Components: SolrJ >Reporter: Dennis Gove >Assignee: Dennis Gove >Priority: Minor > Fix For: Trunk > > Attachments: SOLR-8188.patch, SOLR-8188.patch
[jira] [Updated] (SOLR-8188) Add hash style joins to the Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-8188: -- Attachment: SOLR-8188.patch This is the patch that was applied to trunk. > Add hash style joins to the Streaming API and Streaming Expressions > --- > > Key: SOLR-8188 > URL: https://issues.apache.org/jira/browse/SOLR-8188 > Project: Solr > Issue Type: Improvement > Components: SolrJ >Reporter: Dennis Gove >Assignee: Dennis Gove >Priority: Minor > Fix For: Trunk > > Attachments: SOLR-8188.patch, SOLR-8188.patch, SOLR-8188.patch
[jira] [Closed] (SOLR-8188) Add hash style joins to the Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove closed SOLR-8188. - Resolution: Implemented Still closed > Add hash style joins to the Streaming API and Streaming Expressions > --- > > Key: SOLR-8188 > URL: https://issues.apache.org/jira/browse/SOLR-8188 > Project: Solr > Issue Type: Improvement > Components: SolrJ >Reporter: Dennis Gove >Assignee: Dennis Gove >Priority: Minor > Fix For: Trunk > > Attachments: SOLR-8188.patch, SOLR-8188.patch, SOLR-8188.patch > > > Add HashJoinStream and OuterHashJoinStream to the Streaming API to allow for > optimized joining between sub-streams. > HashJoinStream is similar to an InnerJoinStream except that it does not > insist on any particular order and will read all values from the stream being > hashed (hashStream) when open() is called. During read() it will return the > next tuple from the stream not being hashed (fullStream) which has at least > one matching record in hashStream. It will return a tuple which is the merge > of both tuples. If the tuple from the fullStream matches with more than one > tuple from the hashStream then calling read() will return the merge with the > next matching tuple. The order of the resulting stream is the order of the > fullStream. > OuterHashJoinStream is similar to a HashJoinStream and LeftOuterJoinStream in > that a tuple from fullStream will be returned even if it doesn't have a > matching record in hashStream. All other pieces are identical. 
> In expression form
> {code}
> hashJoin(
>   search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...),
>   hashed=search(collection2, q=*:*, fl="fieldA, fieldB, fieldE", ...),
>   on="fieldA, fieldB"
> )
> {code}
> {code}
> outerHashJoin(
>   search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...),
>   hashed=search(collection2, q=*:*, fl="fieldA, fieldB, fieldE", ...),
>   on="fieldA, fieldB"
> )
> {code}
> As you can see, the hashStream is a named parameter, which makes it very clear
> which stream should be hashed.
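The open()/read() behaviour described for HashJoinStream and OuterHashJoinStream can be pictured with a small in-memory analogue. This is only a sketch: the class HashJoinSketch, its method names, and the use of plain Map objects as tuples are hypothetical and are not taken from the patch, and the rule that fullStream values win on a field-name clash is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of a hash join over in-memory tuples; not the patch's classes. */
class HashJoinSketch {

  /** Builds the lookup key from the values of the "on" fields. */
  static List<Object> keyOf(Map<String, Object> tuple, List<String> on) {
    List<Object> key = new ArrayList<>();
    for (String field : on) {
      key.add(tuple.get(field));
    }
    return key;
  }

  /** Joins full against hashed on the given fields; outer=true keeps unmatched full tuples. */
  static List<Map<String, Object>> join(List<Map<String, Object>> full,
                                        List<Map<String, Object>> hashed,
                                        List<String> on,
                                        boolean outer) {
    // Analogue of open(): read the entire hashed side, grouped by join key.
    Map<List<Object>, List<Map<String, Object>>> table = new HashMap<>();
    for (Map<String, Object> h : hashed) {
      table.computeIfAbsent(keyOf(h, on), k -> new ArrayList<>()).add(h);
    }

    // Analogue of repeated read(): walk the full side in its own order and probe the table.
    List<Map<String, Object>> out = new ArrayList<>();
    for (Map<String, Object> f : full) {
      List<Map<String, Object>> matches = table.get(keyOf(f, on));
      if (matches == null) {
        if (outer) {
          out.add(new LinkedHashMap<>(f)); // outer variant: emit the unmatched tuple as-is
        }
        continue;
      }
      for (Map<String, Object> h : matches) {
        Map<String, Object> merged = new LinkedHashMap<>(h);
        merged.putAll(f); // assumption: values from the full side win on a name clash
        out.add(merged);  // one output tuple per matching hashed tuple
      }
    }
    return out;
  }
}
```

Because only the hashed side is held in memory, the smaller of the two streams is the natural choice for the hashed parameter, and the output keeps the order of the other stream.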
[jira] [Updated] (SOLR-7669) Add SelectStream to Streaming API
[ https://issues.apache.org/jira/browse/SOLR-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Gove updated SOLR-7669:
--
    Attachment: SOLR-7669.patch

Rebased against trunk.

> Add SelectStream to Streaming API
> -
>
> Key: SOLR-7669
> URL: https://issues.apache.org/jira/browse/SOLR-7669
> Project: Solr
> Issue Type: Improvement
> Components: SolrJ
> Reporter: Dennis Gove
> Priority: Minor
> Labels: Streaming
> Attachments: SOLR-7669.patch, SOLR-7669.patch, SOLR-7669.patch,
> SOLR-7669.patch, SOLR-7669.patch
>
>
> Adds a new stream called SelectStream which can be used for two purposes.
> 1. Limit the set of fields included in an outgoing tuple to remove unwanted
> fields
> 2. Provide aliases for fields. With this it acts as an alternative to the
> CloudSolrStream's 'aliases' option.
> For example, in a simple case
> {code}
> select(
>   id,
>   fieldA_i as fieldA,
>   fieldB_s as fieldB,
>   search(collection1, q="*:*", fl="id,fieldA_i,fieldB_s",
>     sort="fieldA_i asc, fieldB_s asc, id asc")
> )
> {code}
> This can also be used as part of complex expressions to help keep track of
> what is being worked on. This is particularly useful when merging/joining
> multiple collections which share field names. For example, the following
> results in a set of tuples including only the fields id, left.ident, and
> right.ident even though the total set of fields required to perform the
> search and join is much larger than just those three fields.
> {code}
> select(
>   id, left.ident, right.ident,
>   innerJoin(
>     select(
>       id, join1_i as left.join1, join2_s as left.join2, ident_s as left.ident,
>       search(collection1, q="side_s:left", fl="id,join1_i,join2_s,ident_s",
>         sort="join1_i asc, join2_s asc, id asc")
>     ),
>     select(
>       join3_i as right.join1, join2_s as right.join2, ident_s as right.ident,
>       search(collection1, q="side_s:right", fl="join3_i,join2_s,ident_s",
>         sort="join3_i asc, join2_s asc")
>     ),
>     on="left.join1=right.join1, left.join2=right.join2"
>   )
> )
> {code}
> This depends on SOLR-7584.
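The two purposes of SelectStream described in this issue (dropping unwanted fields and aliasing the ones that remain) amount to a per-tuple projection. The sketch below shows that idea on a plain Map; the class SelectSketch and its select method are hypothetical names chosen for illustration, and skipping a selected field that is absent from the tuple is an assumption, not documented behaviour.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of field selection and aliasing on a single tuple. */
class SelectSketch {

  /**
   * @param tuple     the incoming tuple
   * @param selection maps each incoming field name to its outgoing name;
   *                  a field kept without an alias maps to itself
   * @return a new tuple holding only the selected fields under their outgoing names
   */
  static Map<String, Object> select(Map<String, Object> tuple, Map<String, String> selection) {
    Map<String, Object> out = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : selection.entrySet()) {
      // Assumption: a selected field missing from the tuple is simply skipped.
      if (tuple.containsKey(e.getKey())) {
        out.put(e.getValue(), tuple.get(e.getKey()));
      }
    }
    return out;
  }
}
```

With a selection of id to id, fieldA_i to fieldA, and fieldB_s to fieldB, this mirrors the simple select(...) expression in the issue: any other field on the incoming tuple does not appear in the result.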
[jira] [Closed] (SOLR-8268) Makes StatsStream implement Expressible interface
[ https://issues.apache.org/jira/browse/SOLR-8268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Gove closed SOLR-8268.
-
    Resolution: Implemented

> Makes StatsStream implement Expressible interface
> -
>
> Key: SOLR-8268
> URL: https://issues.apache.org/jira/browse/SOLR-8268
> Project: Solr
> Issue Type: Improvement
> Components: SolrJ
> Affects Versions: Trunk
> Reporter: Dennis Gove
> Assignee: Dennis Gove
> Priority: Trivial
> Labels: Streaming
> Fix For: Trunk
>
> Attachments: SOLR-8268.patch
>
>
> Adds expression support to the Stats stream. With this it will now be
> possible to express a stats stream as
> {code}
> stats(
>   collection1, q=*:*, fl="fieldA,fieldB,fieldInt,fieldFloat",
>   sum(fieldInt),
>   sum(fieldFloat),
>   min(fieldInt),
>   min(fieldFloat),
>   max(fieldInt),
>   max(fieldFloat),
>   avg(fieldInt),
>   avg(fieldFloat),
>   count(*)
> )
> {code}
> You can collect stats on any supported metric and use full metric features.
> For example, once SOLR-8185 is committed you can include operations in the
> metrics.
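To make the meaning of the sum, min, max, avg, and count metrics in the stats(...) expression concrete, here is a client-side analogue over in-memory tuples. It is a sketch only: StatsSketch and statsFor are hypothetical names, it says nothing about where or how the real StatsStream computes its values, and its count covers only tuples with a numeric value in the chosen field, which differs from count(*).

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the metrics named in the stats(...) expression. */
class StatsSketch {

  /** Accumulates sum, min, max, average, and count for one numeric field. */
  static DoubleSummaryStatistics statsFor(List<Map<String, Object>> tuples, String field) {
    DoubleSummaryStatistics stats = new DoubleSummaryStatistics();
    for (Map<String, Object> tuple : tuples) {
      Object value = tuple.get(field);
      // Only numeric values contribute; missing or non-numeric values are ignored.
      if (value instanceof Number) {
        stats.accept(((Number) value).doubleValue());
      }
    }
    return stats;
  }
}
```

A caller would read the individual metrics with getSum(), getMin(), getMax(), getAverage(), and getCount() on the returned object.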
[jira] [Updated] (SOLR-7584) Add Joins to the Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Gove updated SOLR-7584:
--
    Attachment: SOLR-7584.patch

Rebased against current trunk. A couple of comment changes. All tests pass.

> Add Joins to the Streaming API and Streaming Expressions
> 
>
> Key: SOLR-7584
> URL: https://issues.apache.org/jira/browse/SOLR-7584
> Project: Solr
> Issue Type: Improvement
> Components: SolrJ
> Reporter: Dennis Gove
> Priority: Minor
> Labels: Streaming
> Attachments: SOLR-7584.patch, SOLR-7584.patch, SOLR-7584.patch,
> SOLR-7584.patch, SOLR-7584.patch, SOLR-7584.patch
>
>
> Add InnerJoinStream, LeftOuterJoinStream, and supporting classes to the
> Streaming API to allow for joining between sub-streams.
> At its most basic, it would look something like this
> {code}
> innerJoin(
>   search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...),
>   search(collection2, q=*:*, fl="fieldA, fieldD, fieldE", ...),
>   on="fieldA=fieldA"
> )
> {code}
> or with multi-field on clauses
> {code}
> innerJoin(
>   search(collection1, q=*:*, fl="fieldA, fieldB, fieldC", ...),
>   search(collection2, q=*:*, fl="fieldA, fieldD, fieldE", ...),
>   on="fieldA=fieldA, fieldB=fieldD"
> )
> {code}
> I'd also like to support the option of doing a hash join instead of the
> default merge join but I haven't yet figured out the best way to express
> that. I'd like to let the user tell us which sub-stream should be hashed (the
> least-cost one).
> Also, I've been thinking about field aliasing and might want to add a
> SelectStream which serves the purpose of allowing us to limit the fields
> coming out and rename fields.
> Depends on SOLR-7554
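The issue refers to the default join as a merge join. As a rough picture of what that implies, the sketch below joins two in-memory lists that are already sorted ascending on their join fields, advancing whichever side has the smaller key. MergeJoinSketch and its methods are hypothetical and not the patch's code; it handles a single equality pair such as on="fieldA=fieldA", and letting left-side values win on a field-name clash is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of an inner merge join over two sorted tuple lists. */
class MergeJoinSketch {

  /** Both inputs must already be sorted ascending on their respective join field. */
  static List<Map<String, Object>> innerJoin(List<Map<String, Object>> left, String leftField,
                                             List<Map<String, Object>> right, String rightField) {
    List<Map<String, Object>> out = new ArrayList<>();
    int l = 0;
    int r = 0;
    while (l < left.size() && r < right.size()) {
      Object key = left.get(l).get(leftField);
      int cmp = compare(key, right.get(r).get(rightField));
      if (cmp < 0) {
        l++; // left key is smaller: no partner on the right, move on
      } else if (cmp > 0) {
        r++; // right key is smaller: no partner on the left, move on
      } else {
        // Find the run of right tuples that share this key.
        int runEnd = r;
        while (runEnd < right.size() && compare(key, right.get(runEnd).get(rightField)) == 0) {
          runEnd++;
        }
        // Pair every left tuple carrying this key with every tuple in the run.
        while (l < left.size() && compare(left.get(l).get(leftField), key) == 0) {
          for (int i = r; i < runEnd; i++) {
            Map<String, Object> merged = new LinkedHashMap<>(right.get(i));
            merged.putAll(left.get(l)); // assumption: left values win on a name clash
            out.add(merged);
          }
          l++;
        }
        r = runEnd;
      }
    }
    return out;
  }

  /** Compares two join values; assumes both are mutually Comparable (e.g. both Integer). */
  @SuppressWarnings("unchecked")
  static int compare(Object a, Object b) {
    return ((Comparable<Object>) a).compareTo(b);
  }
}
```

The dependence on sorted input is the main contrast with a hash join: a merge join needs both sub-streams ordered on the join fields but never has to hold an entire stream in memory.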