[jira] [Updated] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression

2016-01-04 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-7535:
-
Attachment: SOLR-7535.patch

Patch with the latest work. It is ready to commit, but I'm having a hard time 
getting the full test suite to run through. Earlier I hit a stall in the 
StreamingExpressionTests, which I had never seen before, so I'm being extra 
careful with this. I'd like to run the tests successfully several more times to 
see whether it was a one-time problem.


> Add UpdateStream to Streaming API and Streaming Expression
> --
>
> Key: SOLR-7535
> URL: https://issues.apache.org/jira/browse/SOLR-7535
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java, SolrJ
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-7535.patch, SOLR-7535.patch, SOLR-7535.patch, 
> SOLR-7535.patch, SOLR-7535.patch, SOLR-7535.patch, SOLR-7535.patch, 
> SOLR-7535.patch
>
>
> The ticket adds an UpdateStream implementation to the Streaming API and 
> streaming expressions. The UpdateStream will wrap a TupleStream and send the 
> Tuples it reads to a SolrCloud collection to be indexed.
> This will allow users to pull data from different Solr Cloud collections, 
> merge and transform the streams and send the transformed data to another Solr 
> Cloud collection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression

2016-01-03 Thread Joel Bernstein (JIRA)


Joel Bernstein updated SOLR-7535:
-
Attachment: SOLR-7535.patch

Changed testParallelUpdateStream() to mirror the changes made to 
testUpdateStream().

> Add UpdateStream to Streaming API and Streaming Expression
> --
>
> Key: SOLR-7535
> URL: https://issues.apache.org/jira/browse/SOLR-7535
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java, SolrJ
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-7535.patch, SOLR-7535.patch, SOLR-7535.patch, 
> SOLR-7535.patch, SOLR-7535.patch, SOLR-7535.patch, SOLR-7535.patch






[jira] [Updated] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression

2016-01-03 Thread Joel Bernstein (JIRA)


Joel Bernstein updated SOLR-7535:
-
Attachment: SOLR-7535.patch

> Add UpdateStream to Streaming API and Streaming Expression
> --
>
> Key: SOLR-7535
> URL: https://issues.apache.org/jira/browse/SOLR-7535
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java, SolrJ
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-7535.patch, SOLR-7535.patch, SOLR-7535.patch, 
> SOLR-7535.patch, SOLR-7535.patch






[jira] [Updated] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression

2016-01-03 Thread Joel Bernstein (JIRA)


Joel Bernstein updated SOLR-7535:
-
Attachment: SOLR-7535.patch

New patch that wraps the stream source in a PushbackStream. This allows us to 
push back the EOF tuple and upload the batch. This is a nice approach that 
preserves the EOF tuple from the source stream in case there is info in the EOF 
tuple.
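
The push-back idea can be sketched roughly as follows. This is only an illustration, not the code in the patch: {{SimpleStream}}, {{Pushback}}, {{readBatch}} and {{sourceOf}} are hypothetical stand-ins, tuples are modelled as plain maps, and the RESPONSE_TIME value is made up. The point is that when the batching loop meets the EOF tuple it hands it back to the wrapper, so the partial batch can be uploaded and the EOF tuple (with whatever info it carries) is still available on the next read.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PushbackSketch {
    /** Hypothetical stand-in for a tuple stream; the last tuple carries an "EOF" key. */
    interface SimpleStream {
        Map<String, Object> read();
    }

    /** Wraps a stream and lets the caller hand back one tuple to be read again. */
    static class Pushback implements SimpleStream {
        private final SimpleStream source;
        private Map<String, Object> pushedBack;

        Pushback(SimpleStream source) {
            this.source = source;
        }

        @Override
        public Map<String, Object> read() {
            if (pushedBack != null) {
                Map<String, Object> tuple = pushedBack;
                pushedBack = null;
                return tuple;
            }
            return source.read();
        }

        void pushBack(Map<String, Object> tuple) {
            pushedBack = tuple;
        }
    }

    /** Reads up to batchSize documents; on EOF the EOF tuple is pushed back, not consumed. */
    static List<Map<String, Object>> readBatch(Pushback stream, int batchSize) {
        List<Map<String, Object>> batch = new ArrayList<>();
        while (batch.size() < batchSize) {
            Map<String, Object> tuple = stream.read();
            if (tuple.containsKey("EOF")) {
                stream.pushBack(tuple); // preserve the EOF tuple for the next read
                break;
            }
            batch.add(tuple);
        }
        return batch;
    }

    /** Test source: emits docCount documents, then an EOF tuple with illustrative metadata. */
    static SimpleStream sourceOf(int docCount) {
        int[] next = {0};
        return () -> {
            Map<String, Object> tuple = new HashMap<>();
            if (next[0] < docCount) {
                tuple.put("id", next[0]++);
            } else {
                tuple.put("EOF", true);
                tuple.put("RESPONSE_TIME", 146L); // made-up value for the sketch
            }
            return tuple;
        };
    }

    public static void main(String[] args) {
        Pushback stream = new Pushback(sourceOf(3));
        List<Map<String, Object>> batch = readBatch(stream, 5);
        System.out.println("docs in partial batch: " + batch.size());
        Map<String, Object> eof = stream.read(); // the pushed-back EOF tuple
        System.out.println("EOF preserved: " + eof.containsKey("EOF"));
    }
}
```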

Existing tests are passing with this patch.

I'll spend some time today expanding the tests. 

> Add UpdateStream to Streaming API and Streaming Expression
> --
>
> Key: SOLR-7535
> URL: https://issues.apache.org/jira/browse/SOLR-7535
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java, SolrJ
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-7535.patch, SOLR-7535.patch, SOLR-7535.patch, 
> SOLR-7535.patch






[jira] [Updated] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression

2016-01-03 Thread Joel Bernstein (JIRA)


Joel Bernstein updated SOLR-7535:
-
Attachment: (was: SOLR-7535.patch)

> Add UpdateStream to Streaming API and Streaming Expression
> --
>
> Key: SOLR-7535
> URL: https://issues.apache.org/jira/browse/SOLR-7535
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java, SolrJ
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-7535.patch, SOLR-7535.patch, SOLR-7535.patch, 
> SOLR-7535.patch






[jira] [Updated] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression

2016-01-03 Thread Joel Bernstein (JIRA)


Joel Bernstein updated SOLR-7535:
-
Attachment: SOLR-7535.patch

Added multi-valued fields to testUpdateStream(), and I'm now also checking the 
values of the Tuples from the destination collection.

I'll do the same for testParallelUpdateStream() and then move on to manual 
testing.

> Add UpdateStream to Streaming API and Streaming Expression
> --
>
> Key: SOLR-7535
> URL: https://issues.apache.org/jira/browse/SOLR-7535
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java, SolrJ
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-7535.patch, SOLR-7535.patch, SOLR-7535.patch, 
> SOLR-7535.patch, SOLR-7535.patch, SOLR-7535.patch






[jira] [Updated] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression

2016-01-02 Thread Jason Gerlowski (JIRA)


Jason Gerlowski updated SOLR-7535:
--
Attachment: SOLR-7535.patch

I forgot to make the change Joel suggested for supporting multivalued fields.  
This patch is a small update to take care of that.
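
For anyone following along, here is a rough sketch of what handling a multivalued field in the tuple-to-document mapping could look like. It is not the code from the patch: {{toDoc}} is a hypothetical helper, tuples are modelled as plain maps, and the "document" is just a map from field name to a list of values rather than a real SolrJ input document. The assumption is that a tuple value that is a collection becomes one field with several values, while any other value becomes a single-valued field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TupleToDocSketch {
    /**
     * Hypothetical mapping: each tuple entry becomes a field; a Collection value
     * is expanded into one value per element, anything else is kept as a single value.
     */
    static Map<String, List<Object>> toDoc(Map<String, Object> tuple) {
        Map<String, List<Object>> doc = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : tuple.entrySet()) {
            List<Object> values = new ArrayList<>();
            Object value = entry.getValue();
            if (value instanceof Collection) {
                values.addAll((Collection<?>) value); // multivalued field
            } else {
                values.add(value); // single-valued field
            }
            doc.put(entry.getKey(), values);
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> tuple = new LinkedHashMap<>();
        tuple.put("id", "doc1");
        tuple.put("s_multi", Arrays.asList("aaaa", "bbbb")); // field names are illustrative
        Map<String, List<Object>> doc = toDoc(tuple);
        System.out.println("values in s_multi: " + doc.get("s_multi").size());
        System.out.println("values in id: " + doc.get("id").size());
    }
}
```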

> Add UpdateStream to Streaming API and Streaming Expression
> --
>
> Key: SOLR-7535
> URL: https://issues.apache.org/jira/browse/SOLR-7535
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java, SolrJ
>Reporter: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-7535.patch, SOLR-7535.patch, SOLR-7535.patch, 
> SOLR-7535.patch






[jira] [Updated] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression

2016-01-02 Thread Jason Gerlowski (JIRA)


Jason Gerlowski updated SOLR-7535:
--
Attachment: SOLR-7535.patch

After some more poking around last night and this morning (and with help from 
Joel and Dennis), I found where my confusion was coming from yesterday.

I've updated the patch to include *basic* tests.  {{StreamExpressionTest}} now 
has test cases for testUpdateStream and testParallelUpdateStream.  I also added a 
testUpdateStream to {{StreamExpressionToExpessionTest}}.  The tests (hopefully) 
do about what you'd expect them to.

I want to stress that these are just _basic_ tests though.  There were a few 
other test cases that I thought of adding but didn't.  (test where batchSize 
evenly divides into numResults for underlying stream, test where batchSize 
doesn't evenly divide, test where there are 0 results from underlying stream, 
test that nice messages are returned on common error cases, test that 
multivalued fields are handled properly, etc.)

I'm happy to add these sorts of tests too if people think they're worth the 
future-maintenance and test-suite-runtime cost.  (I think they're definitely 
worth it, but I wanted to defer to others with more experience before starting, 
just as a sanity check.)  Ideally, since there are more cases I'm trying to 
cover, I'd like to put these tests in a separate file entirely (i.e. a new 
{{UpdateStreamTest}}).

> Add UpdateStream to Streaming API and Streaming Expression
> --
>
> Key: SOLR-7535
> URL: https://issues.apache.org/jira/browse/SOLR-7535
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java, SolrJ
>Reporter: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-7535.patch, SOLR-7535.patch, SOLR-7535.patch






[jira] [Updated] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression

2015-12-30 Thread Jason Gerlowski (JIRA)


Jason Gerlowski updated SOLR-7535:
--
Attachment: SOLR-7535.patch

Latest patch addresses 3 of Joel's concerns (see above):

1.) UpdateStream now extends TupleStream, instead of CloudSolrStream
2.) UpdateStream no longer commits on EOF.
3.) UpdateStream now takes a mandatory batchSize argument as a namedParameter.  
It reads batchSize tuples from the wrapped stream before sending them off.  It 
then emits a tuple with an "uploadedDocs" field.

i.e. the stream now outputs data that looks like:
{code}
{"result-set":
{"docs":[
{"uploadedDocs":5},
{"uploadedDocs":5},
{"uploadedDocs":5},

{"uploadedDocs":4,"EOF":true,"RESPONSE_TIME":146}]
}
}
{code}
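
To make the batching arithmetic concrete, here is a small sketch of how the per-batch counts fall out. It is an illustration only and not the patch itself: {{uploadedDocsPerBatch}} is a hypothetical helper, and the totals used in the example (19 and 10 tuples, batchSize 5) are made-up inputs. It also only reports non-empty batches; how the real stream reports an evenly divided or empty source is not something this sketch tries to settle.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchCountSketch {
    /**
     * Hypothetical helper: given the number of tuples in the wrapped stream and
     * the batch size, returns the "uploadedDocs" count for each non-empty batch.
     */
    static List<Integer> uploadedDocsPerBatch(int totalTuples, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<Integer> counts = new ArrayList<>();
        for (int remaining = totalTuples; remaining > 0; remaining -= batchSize) {
            // A full batch, or whatever is left for the final partial batch.
            counts.add(Math.min(batchSize, remaining));
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(uploadedDocsPerBatch(19, 5)); // prints [5, 5, 5, 4]
        System.out.println(uploadedDocsPerBatch(10, 5)); // prints [5, 5]
        System.out.println(uploadedDocsPerBatch(0, 5));  // prints []
    }
}
```

The three calls correspond to the cases where batchSize does not divide the result count evenly, where it does, and where the underlying stream is empty.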

I thought a bit about making batchSize an optional parameter, and just using a  
reasonable default/fallback value when no value is provided.  But I decided 
against it, since this is probably something a user should be deciding for 
themselves.


Still no tests on this patch.  Running late for work, so I can't add them now.  
Hopefully that'll be a job for this evening.

> Add UpdateStream to Streaming API and Streaming Expression
> --
>
> Key: SOLR-7535
> URL: https://issues.apache.org/jira/browse/SOLR-7535
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java, SolrJ
>Reporter: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-7535.patch, SOLR-7535.patch






[jira] [Updated] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression

2015-12-28 Thread Jason Gerlowski (JIRA)


Jason Gerlowski updated SOLR-7535:
--
Attachment: SOLR-7535.patch

After a bit of thought and a holiday break, I've got my first attempt at this 
ready for some feedback.

h5. Notes about this Patch
1.) No tests yet.  It does work (I tried it out manually), but it's getting 
close to the end of my night, and I wanted to get this out there on the off 
chance that someone has the time to take a look and give me some feedback 
before I sit back down to work on this again tomorrow evening.  But I am 
planning on adding tests to {{StreamExpressionTest}}, and 
{{StreamExpressionToExpessionTest}}.
2.) I didn't make any attempt to restrict the {{TupleStream}} implementations 
that {{UpdateStream}} can wrap.  Mainly because I didn't get around to it yet.  
But also because, IMO, there are use cases where a user wouldn't need to use a 
{{SelectStream}} (for example, if they're doing field filtering in their 
initial Solr query/search() expression).  Happy to change this in a subsequent 
patch.  Just wanted to see what people thought.
3.) I kept my original tuple-to-input-doc mapping intact.  It's limited, but 
as Joel mentioned, it will probably do the job for a first pass.

h5. Questions about Surrounding Code
These aren't necessarily related to this JIRA/patch, but working on this patch 
made me think of a few questions that I couldn't figure out answers to on my 
own.

1.) Many of the {{TupleStream}} implementations require a collection to be 
explicitly stated as the first argument (e.g. {{search(gettingstarted...)}}).  
However, the collection name is already specified in the URL path (e.g. 
{{localhost:7574/solr/gettingstarted/stream?...}}).  Are these values ever 
allowed to be different?
2.) Many of the Stream Expressions are specified using a syntax that mixes 
named parameters (rows, sort, zkHost, etc.), and unnamed parameters 
('collection' is probably the most common).  Are there any guidelines/logic 
around which parameters are named, and which are unnamed?  If I'm creating a 
new TupleStream type (as we are here), are there any guidelines on what the 
expression interface should look like?


Thanks in advance if anyone can help clarify some of those things for me.  
Should be back online soon to revise this further. 

> Add UpdateStream to Streaming API and Streaming Expression
> --
>
> Key: SOLR-7535
> URL: https://issues.apache.org/jira/browse/SOLR-7535
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java, SolrJ
>Reporter: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-7535.patch


