[jira] [Updated] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression
[ https://issues.apache.org/jira/browse/SOLR-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Bernstein updated SOLR-7535:
    Attachment: SOLR-7535.patch

Patch with the latest work. Ready to commit but having a hard time getting the full test suite to run through. I had a stall earlier on the StreamingExpressionTests which I had never seen before. So I'm being extra careful with this. I'd like to run the tests successfully several more times to see if it was a one time problem.

> Add UpdateStream to Streaming API and Streaming Expression
>
> Key: SOLR-7535
> URL: https://issues.apache.org/jira/browse/SOLR-7535
> Project: Solr
> Issue Type: New Feature
> Components: clients - java, SolrJ
> Reporter: Joel Bernstein
> Assignee: Joel Bernstein
> Priority: Minor
> Attachments: SOLR-7535.patch, SOLR-7535.patch, SOLR-7535.patch,
> SOLR-7535.patch, SOLR-7535.patch, SOLR-7535.patch, SOLR-7535.patch,
> SOLR-7535.patch
>
> The ticket adds an UpdateStream implementation to the Streaming API and
> streaming expressions. The UpdateStream will wrap a TupleStream and send the
> Tuples it reads to a SolrCloud collection to be indexed.
> This will allow users to pull data from different Solr Cloud collections,
> merge and transform the streams and send the transformed data to another Solr
> Cloud collection.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression
[ https://issues.apache.org/jira/browse/SOLR-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Bernstein updated SOLR-7535:
    Attachment: SOLR-7535.patch

Changed testParallelUpdateStream() to mirror the changes made to testUpdateStream().
[jira] [Updated] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression
[ https://issues.apache.org/jira/browse/SOLR-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Bernstein updated SOLR-7535:
    Attachment: SOLR-7535.patch
[jira] [Updated] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression
[ https://issues.apache.org/jira/browse/SOLR-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Bernstein updated SOLR-7535:
    Attachment: SOLR-7535.patch

New patch that wraps the stream source in a PushbackStream. This allows us to push back the EOF tuple and upload the batch. This is a nice approach that preserves the EOF tuple from the source stream in case there is info in the EOF tuple.

Existing tests are passing with this patch. I'll spend some time today expanding the tests.
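The pushback idea described in this message can be illustrated with a minimal, self-contained sketch. It does not use the actual Solr classes: `Source`, `Pushback`, `tuple`, `fromList`, and `readBatch` are hypothetical stand-ins, and tuples are modeled as plain maps. The point is only that when the reader meets the EOF tuple before a batch is full, it hands the tuple back, the partial batch can be uploaded, and the next read still returns the source's original EOF tuple with whatever fields it carried.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class PushbackSketch {
    /** Hypothetical stand-in for a tuple stream; the last tuple carries an "EOF" key. */
    public interface Source {
        Map<String, Object> read();
    }

    /** Wraps a Source and lets the caller hand one tuple back to be read again. */
    public static class Pushback implements Source {
        private final Source inner;
        private Map<String, Object> pushedBack;

        public Pushback(Source inner) {
            this.inner = inner;
        }

        @Override
        public Map<String, Object> read() {
            if (pushedBack != null) {
                Map<String, Object> t = pushedBack;
                pushedBack = null;
                return t;
            }
            return inner.read();
        }

        public void pushBack(Map<String, Object> t) {
            pushedBack = t;
        }
    }

    /** Builds a tuple from alternating key/value arguments. */
    public static Map<String, Object> tuple(Object... kv) {
        Map<String, Object> t = new HashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            t.put((String) kv[i], kv[i + 1]);
        }
        return t;
    }

    /** Serves tuples from a list; reading past the final (EOF) tuple is not supported. */
    public static Source fromList(List<Map<String, Object>> tuples) {
        Iterator<Map<String, Object>> it = tuples.iterator();
        return it::next;
    }

    /** Collects up to batchSize tuples; on EOF, pushes the EOF tuple back and stops early. */
    public static List<Map<String, Object>> readBatch(Pushback in, int batchSize) {
        List<Map<String, Object>> batch = new ArrayList<>();
        while (batch.size() < batchSize) {
            Map<String, Object> t = in.read();
            if (t.containsKey("EOF")) {
                in.pushBack(t); // keep the source's EOF tuple for the next read
                break;
            }
            batch.add(t);
        }
        return batch;
    }

    public static void main(String[] args) {
        Pushback in = new Pushback(fromList(Arrays.asList(
                tuple("id", "1"), tuple("id", "2"), tuple("EOF", true, "RESPONSE_TIME", 146))));
        System.out.println("batch size: " + readBatch(in, 5).size());
        System.out.println("next tuple: " + in.read());
    }
}
```

A caller would upload the returned partial batch first and only then read the EOF tuple, so any information attached to it by the source is passed along unchanged.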
[jira] [Updated] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression
[ https://issues.apache.org/jira/browse/SOLR-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Bernstein updated SOLR-7535:
    Attachment: (was: SOLR-7535.patch)
[jira] [Updated] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression
[ https://issues.apache.org/jira/browse/SOLR-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Bernstein updated SOLR-7535:
    Attachment: SOLR-7535.patch

Added multi-value fields to testUpdateStream() and am also now checking the values of the Tuples from the destination collection. I'll do the same for the testParallelUpdateStream and then move on to manual testing.
[jira] [Updated] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression
[ https://issues.apache.org/jira/browse/SOLR-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Gerlowski updated SOLR-7535:
    Attachment: SOLR-7535.patch

I forgot to make the change Joel suggested for supporting multivalued fields. This patch is a small update to take care of that.
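As a rough illustration of what multivalued support in a tuple-to-document mapping might involve, here is a minimal sketch using plain Java collections. It is not the code from the patch: `TupleToDocSketch` and `toDoc` are hypothetical names, and the document is modeled as a map from field name to a list of values rather than as a real Solr input document. The assumption is simply that a collection-valued tuple field becomes several values of the same document field, while a scalar becomes a single value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TupleToDocSketch {
    /**
     * Copies every tuple field into a document-like map (hypothetical stand-in
     * for an input document). Collection values are expanded so that each
     * element becomes one value of the field; scalars become a single value.
     */
    public static Map<String, List<Object>> toDoc(Map<String, Object> tuple) {
        Map<String, List<Object>> doc = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : tuple.entrySet()) {
            List<Object> values = doc.computeIfAbsent(field.getKey(), k -> new ArrayList<>());
            Object value = field.getValue();
            if (value instanceof Collection) {
                values.addAll((Collection<?>) value); // multivalued field
            } else {
                values.add(value); // single-valued field
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> tuple = new LinkedHashMap<>();
        tuple.put("id", "1");
        tuple.put("tags", Arrays.asList("a", "b"));
        System.out.println(toDoc(tuple));
    }
}
```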
[jira] [Updated] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression
[ https://issues.apache.org/jira/browse/SOLR-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Gerlowski updated SOLR-7535:
    Attachment: SOLR-7535.patch

After some more poking around last night and this morning (and with help from Joel and Dennis), I found where my confusion was coming from yesterday.

I've updated the patch to include *basic* tests. {{StreamExpressionTest}} now has clauses for testUpdateStream and testParallelUpdateStream. I also added a testUpdateStream to {{StreamExpressionToExpessionTest}}.

The tests (hopefully) do about what you expect them to. I want to stress that these are just _basic_ tests though. There were a few other test cases that I thought of adding but didn't (test where batchSize evenly divides into numResults for the underlying stream, test where batchSize doesn't evenly divide, test where there are 0 results from the underlying stream, test that nice messages are returned on common error cases, test that multivalued fields are handled properly, etc.).

I'm happy to add these sorts of tests too if people think they're worth the future-maintenance and test-suite-runtime cost. (I think they're definitely worth it, but I wanted to defer to others with more experience before starting...just a sanity check.) Ideally, since there are more cases I'm trying to cover, I'd like to put these tests in a separate file entirely (i.e. a new {{UpdateStreamTest}}).
[jira] [Updated] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression
[ https://issues.apache.org/jira/browse/SOLR-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Gerlowski updated SOLR-7535:
    Attachment: SOLR-7535.patch

Latest patch addresses 3 of Joel's concerns (see above):

1.) UpdateStream now extends TupleStream, instead of CloudSolrStream.
2.) UpdateStream no longer commits on EOF.
3.) UpdateStream now takes a mandatory batchSize argument as a namedParameter. It reads batchSize tuples from the wrapped stream before sending them off. It then spits out a tuple with an "uploadedDocs" parameter.

i.e. the stream now outputs data that looks like:

{code}
{"result-set":
  {"docs":[
    {"uploadedDocs":5},
    {"uploadedDocs":5},
    {"uploadedDocs":5},
    {"uploadedDocs":4,"EOF":true,"RESPONSE_TIME":146}]
  }
}
{code}

I thought a bit about making batchSize an optional parameter, and just using a reasonable default/fallback value when no value is provided. But I decided against it, since this is probably something a user should be deciding for themselves.

Still no tests on this patch. Running late for work, so I can't add them now. Hopefully that'll be a job for this evening.
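The batching behaviour described in this message can be sketched in a few lines of standalone Java. This is not the patch itself: `BatchCountSketch` and `uploadInBatches` are hypothetical names, the "upload" is reduced to recording the batch size, and the handling of an empty input (no counts emitted) is an assumption rather than something the thread specifies. With 19 input documents and a batchSize of 5 it yields the same per-batch counts as the sample output above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class BatchCountSketch {
    /**
     * Drains docs in groups of batchSize and returns the size of each group,
     * mirroring the "uploadedDocs" value reported per batch. A trailing
     * partial batch is reported as well.
     */
    public static <T> List<Integer> uploadInBatches(Iterator<T> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<Integer> uploadedDocs = new ArrayList<>();
        List<T> batch = new ArrayList<>(batchSize);
        while (docs.hasNext()) {
            batch.add(docs.next());
            if (batch.size() == batchSize) {
                uploadedDocs.add(batch.size()); // a real stream would send the batch here
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            uploadedDocs.add(batch.size()); // leftover documents when the source ends
        }
        return uploadedDocs;
    }

    public static void main(String[] args) {
        Iterator<String> docs = Collections.nCopies(19, "doc").iterator();
        System.out.println(uploadInBatches(docs, 5)); // prints [5, 5, 5, 4]
    }
}
```

The evenly-dividing and zero-result cases mentioned elsewhere in the thread are the interesting edges: in this sketch 20 documents give four full batches with no trailing partial one, and an empty source gives an empty list.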
[jira] [Updated] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression
[ https://issues.apache.org/jira/browse/SOLR-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Gerlowski updated SOLR-7535:
    Attachment: SOLR-7535.patch

After a bit of thought and a holiday break, I've got my first attempt at this ready for some feedback.

h5. Notes about this Patch

1.) No tests yet. It does work (I tried it out manually), but it's getting close to the end of my night, and I wanted to get this out there on the off chance that someone has the time to take a look and give me some feedback before I sit back down to work on this again tomorrow evening. But I am planning on adding tests to {{StreamExpressionTest}} and {{StreamExpressionToExpessionTest}}.

2.) I didn't make any attempt to restrict the {{TupleStream}} implementations that {{UpdateStream}} can wrap. Mainly because I didn't get around to it yet. But also because, IMO, there are use cases where a user wouldn't need to use a {{SelectStream}} (for example, if they're doing field filtering in their initial Solr query/search() expression). Happy to change this in a subsequent patch. Just wanted to see what people thought.

3.) I kept my original tuple-to-input-doc mapping intact. It's limited, but as Joel mentioned, will probably do the job for a first pass.

h5. Questions about Surrounding Code

These aren't necessarily related to this JIRA/patch, but working on this patch made me think of a few questions that I couldn't figure out answers to on my own.

1.) Many of the {{TupleStream}} implementations require a collection to be explicitly stated as the first argument (i.e. {{search(gettingstarted...)}}). However, the collection name is already specified in the URL path (i.e. {{localhost:7574/solr/gettingstarted/stream?...}}). Are these values ever allowed to be different?

2.) Many of the Stream Expressions are specified using a syntax that mixes named parameters (rows, sort, zkHost, etc.) and unnamed parameters ('collection' is probably the most common). Are there any guidelines/logic around which parameters are named, and which are unnamed? If I'm creating a new TupleStream type (as we are here), are there any guidelines on what the expression interface should look like?

Thanks in advance if anyone can help clarify some of those things for me. Should be back online soon to revise this further.