[jira] [Commented] (SOLR-8487) Add CommitStream to Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-8487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15575681#comment-15575681 ]

ASF subversion and git services commented on SOLR-8487:
---

Commit edde433594c104668137350d9db640180b04f648 in lucene-solr's branch refs/heads/branch_6x from [~dpgove]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=edde433 ]

SOLR-8487: Adds CommitStream to support sending commits to a collection being updated

> Add CommitStream to Streaming API and Streaming Expressions
> ---
>
> Key: SOLR-8487
> URL: https://issues.apache.org/jira/browse/SOLR-8487
> Project: Solr
> Issue Type: New Feature
> Affects Versions: 6.3
> Reporter: Jason Gerlowski
> Assignee: Dennis Gove
> Priority: Minor
> Fix For: 6.3
>
> Attachments: SOLR-8487.patch, SOLR-8487.patch
>
> (Paraphrased from Joel's idea/suggestions in the comments of SOLR-7535).
> With SOLR-7535, users can now index documents/tuples using an UpdateStream.
> However, there's no way currently using the Streaming API to force a commit
> on the collection that received these updates.
> The purpose of this ticket is to add a CommitStream, which can be used to
> trigger commit(s) on a given collection.
> The proposed usage/behavior would look a little bit like:
> {{commit(collection, parallel(update(search())))}}
> Note that...
> 1.) CommitStream has a positional collection parameter, to indicate which
> collection to commit on. (Alternatively, it could recurse through
> {{children()}} nodes until it finds the UpdateStream, and then retrieve the
> collection from the UpdateStream).
> 2.) CommitStream forwards all tuples received by an underlying, wrapped
> stream.
> 3.) CommitStream commits when the underlying stream emits its EOF tuple.
> (Alternatively, it could commit every X tuples, based on a parameter).
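A minimal sketch of the behavior proposed in points 2 and 3 (forward every tuple from the wrapped stream, commit at EOF, optionally commit every X tuples). This is a plain-Java model, not Solr's `TupleStream` or `CommitStream` code: `CommitStreamSketch`, `commitEvery`, the map-based tuples, and the `"EOF"` key are assumptions made for illustration, and commits are only recorded rather than sent to a collection.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical model of the proposed CommitStream; this is NOT Solr's TupleStream API.
// Tuples are plain maps, and end-of-stream is a tuple whose "EOF" key is true.
public class CommitStreamSketch {

    static final class CommitStream {
        private final String collection;                    // point 1: positional collection parameter
        private final Iterator<Map<String, Object>> source; // the wrapped stream
        private final int commitEvery;                      // 0 = commit only at EOF
        private final List<String> commitLog = new ArrayList<>();
        private int sinceLastCommit = 0;

        CommitStream(String collection, Iterator<Map<String, Object>> source, int commitEvery) {
            this.collection = collection;
            this.source = source;
            this.commitEvery = commitEvery;
        }

        // Point 2: every tuple from the wrapped stream is returned unchanged.
        Map<String, Object> read() {
            Map<String, Object> tuple = source.next();
            if (Boolean.TRUE.equals(tuple.get("EOF"))) {
                // Point 3: commit when the wrapped stream emits EOF (skipped if nothing is pending).
                if (sinceLastCommit > 0) {
                    commit();
                }
                return tuple;
            }
            sinceLastCommit++;
            // Alternative in point 3: also commit every `commitEvery` tuples.
            if (commitEvery > 0 && sinceLastCommit >= commitEvery) {
                commit();
            }
            return tuple;
        }

        private void commit() {
            // A real implementation would send a commit to `collection`; this sketch only records it.
            commitLog.add(collection);
            sinceLastCommit = 0;
        }

        int commitCount() {
            return commitLog.size();
        }
    }

    // Builds n data tuples followed by an EOF tuple.
    static List<Map<String, Object>> tuples(int n) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            out.add(Map.<String, Object>of("id", i));
        }
        out.add(Map.<String, Object>of("EOF", true));
        return out;
    }

    // Reads the wrapped stream to EOF and returns how many commits were sent.
    static int drain(String collection, List<Map<String, Object>> tuples, int commitEvery) {
        CommitStream stream = new CommitStream(collection, tuples.iterator(), commitEvery);
        while (!Boolean.TRUE.equals(stream.read().get("EOF"))) {
            // each forwarded tuple would be handed downstream here
        }
        return stream.commitCount();
    }

    public static void main(String[] args) {
        System.out.println(drain("target", tuples(5), 0)); // prints 1 (EOF only)
        System.out.println(drain("target", tuples(5), 2)); // prints 3 (after 2, after 4, at EOF)
    }
}
```

Skipping the EOF commit when nothing is pending is a choice made in this sketch; the description only says the stream commits on EOF, so a real implementation might commit unconditionally.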
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8487) Add CommitStream to Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-8487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15513872#comment-15513872 ]

Dennis Gove commented on SOLR-8487:
---

Added a section in the reference guide - https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions#StreamingExpressions-commit
[jira] [Commented] (SOLR-8487) Add CommitStream to Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-8487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15513823#comment-15513823 ]

ASF subversion and git services commented on SOLR-8487:
---

Commit 6365920a0e9ed3bf0b13b90955cd73535d495f9a in lucene-solr's branch refs/heads/master from [~dpgove]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6365920 ]

SOLR-8487: Adds CommitStream to support sending commits to a collection being updated
[jira] [Commented] (SOLR-8487) Add CommitStream to Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-8487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15510877#comment-15510877 ]

Joel Bernstein commented on SOLR-8487:
--

+1 looks good
[jira] [Commented] (SOLR-8487) Add CommitStream to Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-8487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15487515#comment-15487515 ]

Dennis Gove commented on SOLR-8487:
---

I just realized I had a fundamental misunderstanding of the UpdateStream. I thought it was returning all source tuples on a call to read() but that is not the case. It is instead sending a batch of source tuples into the destination collection, dropping them, and then returning a summary tuple. This will change some of the implementation details of the CommitStream.
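The corrected understanding can be sketched as follows: each read() call on the update stream drains up to one batch from its source, hands that batch to the destination, and returns only a summary tuple. This is a hypothetical plain-Java model, not Solr's `UpdateStream`; the summary field names `batchIndexed` and `totalIndexed` and the in-memory `destination` list are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical model of the UpdateStream read() behavior described above; not Solr's code.
// The summary field names "batchIndexed" and "totalIndexed" are assumptions.
public class UpdateSummarySketch {

    static final class UpdateStream {
        private final Iterator<Map<String, Object>> source;
        private final int batchSize;
        // Stand-in for the destination collection.
        private final List<Map<String, Object>> destination = new ArrayList<>();
        private long totalIndexed = 0;

        UpdateStream(Iterator<Map<String, Object>> source, int batchSize) {
            this.source = source;
            this.batchSize = batchSize;
        }

        // Consumes up to batchSize source tuples, "sends" them, and returns a summary tuple.
        Map<String, Object> read() {
            List<Map<String, Object>> batch = new ArrayList<>();
            while (batch.size() < batchSize && source.hasNext()) {
                batch.add(source.next());
            }
            Map<String, Object> summary = new HashMap<>();
            if (batch.isEmpty()) {
                summary.put("EOF", true); // source exhausted
                return summary;
            }
            destination.addAll(batch); // the source tuples are not returned to the caller
            totalIndexed += batch.size();
            summary.put("batchIndexed", batch.size());
            summary.put("totalIndexed", totalIndexed);
            return summary;
        }
    }

    // Returns the batchIndexed value of every summary tuple a downstream reader sees.
    static List<Integer> batchSizes(int sourceTuples, int batchSize) {
        List<Map<String, Object>> source = new ArrayList<>();
        for (int i = 0; i < sourceTuples; i++) {
            source.add(Map.<String, Object>of("id", i));
        }
        UpdateStream update = new UpdateStream(source.iterator(), batchSize);
        List<Integer> sizes = new ArrayList<>();
        for (Map<String, Object> t = update.read(); !t.containsKey("EOF"); t = update.read()) {
            sizes.add((Integer) t.get("batchIndexed"));
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(batchSizes(7, 3)); // prints [3, 3, 1]: three summaries, not seven source tuples
    }
}
```

A wrapper around this stream therefore sees one summary tuple per batch, so any "commit every X tuples" logic would be counting batches rather than documents unless it reads the per-batch count.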
[jira] [Commented] (SOLR-8487) Add CommitStream to Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-8487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15440449#comment-15440449 ]

Joel Bernstein commented on SOLR-8487:
--

I'll be on vacation next week, so if you're feeling good about your current approach, I don't want to hold things up.
[jira] [Commented] (SOLR-8487) Add CommitStream to Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-8487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15440446#comment-15440446 ]

Joel Bernstein commented on SOLR-8487:
--

The original patch had the update function do its own commits. But it was taken out because if an expression is doing parallel updates, there would be multiple workers committing at the same time. So the commit function is needed to support this scenario:

{code}
commit(parallel(update(search(...))))
{code}

So I think we're left with the choice of using the data in the tuples returned by the update stream, or leaving it decoupled.
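To illustrate why the commit was pulled out of update(), here is a hypothetical simulation: a thread pool stands in for the parallel() workers, and the method counts how many commits the target collection receives when every worker commits versus when a single outer commit runs after all workers finish. `commitsReceived` and its parameters are made up for this sketch; no Solr classes are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical simulation, not Solr code: `workers` threads stand in for parallel() workers,
// each running update(search(...)) over its own partition of the data.
public class ParallelCommitSketch {

    // Returns the number of commits the target collection receives.
    static int commitsReceived(int workers, int docsPerWorker, boolean eachWorkerCommits) {
        AtomicInteger indexed = new AtomicInteger();
        AtomicInteger commits = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                futures.add(pool.submit(() -> {
                    indexed.addAndGet(docsPerWorker);  // this worker's updates
                    if (eachWorkerCommits) {
                        commits.incrementAndGet();     // update() committing on its own, concurrently
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get();                               // parallel() is done once every worker is done
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        if (!eachWorkerCommits) {
            commits.incrementAndGet();                 // commit(parallel(...)): one commit at the end
        }
        return commits.get();
    }

    public static void main(String[] args) {
        System.out.println(commitsReceived(4, 100, true));  // prints 4
        System.out.println(commitsReceived(4, 100, false)); // prints 1
    }
}
```

The same documents are indexed either way; only the wrapper form guarantees that exactly one commit happens, and that it happens after every worker has finished.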
[jira] [Commented] (SOLR-8487) Add CommitStream to Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-8487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15439454#comment-15439454 ]

Kevin Risden commented on SOLR-8487:

I like the commit() outside update(). This makes commit look more like a count or something similar. One thing that may be useful is a trigger based on the amount of time passed (I know this makes it harder). Let's say the underlying stream is a daemon that runs every 30 seconds. If you set the batch size to 1 that would work, but maybe you want to commit every 1000 tuples or every 5 minutes. I guess at that point you could instead have Solr do the auto commit. Just a thought.
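A count-or-time trigger like the one suggested above might look like this sketch. `CountOrTimePolicy`, `onIndexed`, and the injected `LongSupplier` clock are hypothetical (the clock is injected so the policy can be exercised without waiting in real time); the thresholds mirror the "every 1000 tuples or every 5 minutes" example.

```java
import java.util.function.LongSupplier;

// Hypothetical "every N tuples or every T milliseconds" commit trigger; not part of Solr.
public class CommitPolicySketch {

    static final class CountOrTimePolicy {
        private final long maxTuples;
        private final long maxMillis;
        private final LongSupplier clock;
        private long pendingTuples = 0;
        private long lastCommitMillis;

        CountOrTimePolicy(long maxTuples, long maxMillis, LongSupplier clock) {
            this.maxTuples = maxTuples;
            this.maxMillis = maxMillis;
            this.clock = clock;
            this.lastCommitMillis = clock.getAsLong();
        }

        // Records newly indexed tuples; returns true when a commit should be sent now.
        boolean onIndexed(long tuples) {
            pendingTuples += tuples;
            long now = clock.getAsLong();
            boolean due = pendingTuples > 0
                    && (pendingTuples >= maxTuples || now - lastCommitMillis >= maxMillis);
            if (due) {
                pendingTuples = 0;
                lastCommitMillis = now;
            }
            return due;
        }
    }

    public static void main(String[] args) {
        long[] now = {0L}; // fake clock in milliseconds
        CountOrTimePolicy policy = new CountOrTimePolicy(1000, 5 * 60 * 1000, () -> now[0]);
        System.out.println(policy.onIndexed(400)); // false: 400 tuples, 0 ms elapsed
        System.out.println(policy.onIndexed(700)); // true: 1100 tuples >= 1000
        now[0] = 6 * 60 * 1000;
        System.out.println(policy.onIndexed(10));  // true: 6 minutes since the last commit
    }
}
```

Because the time check only runs when new tuples arrive, an idle daemon never triggers a commit on its own. That gap is part of why Solr's built-in autoCommit (the `maxDocs`/`maxTime` settings in `solrconfig.xml`) may be the simpler option, as the comment notes.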
[jira] [Commented] (SOLR-8487) Add CommitStream to Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-8487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15439256#comment-15439256 ]

Dennis Gove commented on SOLR-8487:
---

I'm not a huge fan of tying two streams together like that (i.e., one being dependent on the other). If we wanted to tie update and commit more closely, I'd rather see the commit as an operation inside the UpdateStream, like

{code}
update(foo, stream(...), batchSize=#, commit(nBatches/batchSize/time))
{code}
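One way to picture the proposed `commit(nBatches)` option, where update() owns its commits, is the sketch below. It only computes after which batches a commit would be sent; `EmbeddedCommitSketch`, `commitPoints`, and its parameters are hypothetical, and the `batchSize` and `time` variants from the proposal are not modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of update() owning its commits via a commit(nBatches) option; not Solr code.
public class EmbeddedCommitSketch {

    // Returns the 1-based batch numbers after which a commit would be sent.
    static List<Integer> commitPoints(int totalDocs, int batchSize, int nBatches) {
        if (batchSize <= 0 || nBatches <= 0) {
            throw new IllegalArgumentException("batchSize and nBatches must be positive");
        }
        List<Integer> commits = new ArrayList<>();
        int batches = (totalDocs + batchSize - 1) / batchSize; // ceiling division
        for (int b = 1; b <= batches; b++) {
            // ... batch b would be sent to the collection here ...
            if (b % nBatches == 0) {
                commits.add(b);
            }
        }
        if (batches % nBatches != 0) {
            commits.add(batches); // final commit for the trailing, not-yet-committed batches
        }
        return commits;
    }

    public static void main(String[] args) {
        System.out.println(commitPoints(1000, 100, 3)); // prints [3, 6, 9, 10]
    }
}
```

Note the trade-off raised elsewhere in this thread: if this logic ran inside each worker of a parallel() expression, every worker would commit on its own schedule.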
[jira] [Commented] (SOLR-8487) Add CommitStream to Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-8487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15437411#comment-15437411 ]

Joel Bernstein commented on SOLR-8487:
--

Looks good! Wondering if we should tie this more closely with the update stream. A couple of possibilities:

1) The update stream returns a tuple with each batch, which includes the batch size. Should we use that to calculate when to commit?
2) We could have the update stream add the collection to its outgoing tuples and then use that, instead of specifying the collection as a parameter to the commit function.
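A sketch of both suggestions together: the commit wrapper sums the per-batch count reported in each summary tuple and takes the collection name from the tuple instead of a parameter. The field names `batchIndexed` and `collection` and the helper names are assumptions; an update stream emitting a `collection` field is exactly what suggestion 2 proposes, not existing behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of both suggestions; not Solr code. The summary fields
// "batchIndexed" (per-batch count) and "collection" (suggestion 2) are assumptions.
public class SummaryDrivenCommitSketch {

    // Walks the update stream's summary tuples and returns one collection name per commit sent.
    static List<String> commits(List<Map<String, Object>> summaries, long commitThreshold) {
        List<String> sent = new ArrayList<>();
        long pending = 0;         // documents indexed since the last commit (suggestion 1)
        String collection = null; // taken from the tuples instead of a parameter (suggestion 2)
        for (Map<String, Object> summary : summaries) {
            if (Boolean.TRUE.equals(summary.get("EOF"))) {
                break;
            }
            collection = (String) summary.get("collection");
            pending += ((Number) summary.get("batchIndexed")).longValue();
            if (pending >= commitThreshold) {
                sent.add(collection);
                pending = 0;
            }
        }
        if (pending > 0 && collection != null) {
            sent.add(collection); // flush the remainder once the stream ends
        }
        return sent;
    }

    static Map<String, Object> summary(String collection, int batchIndexed) {
        return Map.<String, Object>of("collection", collection, "batchIndexed", batchIndexed);
    }

    static Map<String, Object> eof() {
        return Map.<String, Object>of("EOF", true);
    }

    public static void main(String[] args) {
        List<Map<String, Object>> summaries = List.of(
                summary("books", 250), summary("books", 250), summary("books", 250), eof());
        System.out.println(commits(summaries, 500)); // prints [books, books]
    }
}
```

Reading the count from the summary tuples keeps the threshold in documents even though the wrapper only ever sees one tuple per batch.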
[jira] [Commented] (SOLR-8487) Add CommitStream to Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-8487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435610#comment-15435610 ]

Dennis Gove commented on SOLR-8487:
---

I'm working on this. Hoping to have a first draft in a day or two.
[jira] [Commented] (SOLR-8487) Add CommitStream to Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-8487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082950#comment-15082950 ]

Joel Bernstein commented on SOLR-8487:
--

Closed this issue by mistake and then re-opened.