[jira] [Commented] (BEAM-91) Retractions
[ https://issues.apache.org/jira/browse/BEAM-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15348596#comment-15348596 ]

Matt Pouttu-Clarke commented on BEAM-91:

Apologies in advance for the SQL-like code examples, but they are the most understandable to a general audience:
https://github.com/LamdaFu/bloklinx/wiki/Semantics-and-Usage

> Retractions
> ---
>
> Key: BEAM-91
> URL: https://issues.apache.org/jira/browse/BEAM-91
> Project: Beam
> Issue Type: New Feature
> Components: beam-model
> Reporter: Tyler Akidau
> Assignee: Frances Perry
> Original Estimate: 672h
> Remaining Estimate: 672h
>
> We still haven't added retractions to Beam, even though they're a core part
> of the model. We should document all the necessary aspects (uncombine,
> reverting DoFn output with DoOvers, sink integration, source-level
> retractions, etc), and then implement them.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (BEAM-91) Retractions
[ https://issues.apache.org/jira/browse/BEAM-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345507#comment-15345507 ]

Matt Pouttu-Clarke commented on BEAM-91:

With regard to versioning structural changes: removing a field is one example, as is changing the type of a field. In this case one must replay all relevant history with the change applied AND, more importantly, quickly identify the root cause of failures related to the structural change.

With regard to retaining "deleted" data and relationships, the best real-world examples I have are versioned hierarchical structures like zip codes and sales territories. You cannot reject mail because the zip code has changed or moved, and salespeople will have a conniption if their numbers change and affect their commissions. Thus in the real world these historical structures remain frozen in time, potentially forever, even when they are "deleted".
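The point about "deleted" zip codes and territories staying frozen in time can be illustrated with a small as-of lookup. This is only a minimal sketch, not the Bloklinx design: the `VersionedMap` class, the integer version numbers, and the zip code and territory names are all assumptions made up for illustration. A delete is stored as a tombstone, so earlier versions stay resolvable.

```python
from bisect import bisect_right


class VersionedMap:
    """Hypothetical append-only map that answers lookups as of a given version."""

    def __init__(self):
        # key -> (versions, values); both lists grow together in version order
        self._history = {}

    def put(self, key, version, value):
        versions, values = self._history.setdefault(key, ([], []))
        # Assumption: callers write versions in increasing order per key.
        versions.append(version)
        values.append(value)

    def delete(self, key, version):
        # A delete is a tombstone; older entries are never removed.
        self.put(key, version, None)

    def get_as_of(self, key, version):
        versions, values = self._history.get(key, ([], []))
        i = bisect_right(versions, version)
        return values[i - 1] if i else None


territories = VersionedMap()
territories.put("94105", 1, "west")
territories.put("94105", 5, "pacific")
territories.delete("94105", 9)

print(territories.get_as_of("94105", 3))   # west
print(territories.get_as_of("94105", 7))   # pacific
print(territories.get_as_of("94105", 9))   # None
```

A commission report computed as of version 3 keeps resolving "94105" to "west" even after the entry is deleted at version 9.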
[jira] [Commented] (BEAM-91) Retractions
[ https://issues.apache.org/jira/browse/BEAM-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345476#comment-15345476 ]

Matt Pouttu-Clarke commented on BEAM-91:

Yes, agreed: it is not yet clear from the docs how this relates directly to Beam. However, this is mainly a terminology issue from my perspective. The bespoke systems I have built over the last few years to unify batch and stream processing all rely on data versioning to ensure point-in-session consistency (watermarks) across streams and all data derived from streams, such as aggregates, transforms, splits, and replicas.

There is no hard dependency on a configuration service, but it is critical to keep the current watermark and all historical watermarks in a system of record. This could be as simple as a shared file system or as complex as etcd.

That aside, the versioning model I put forward for flatbuffers is an example using more recent technologies. I have done the same with relational tables and Avro in the past. I'll work on examples of how the versioning model feeds aggregate refresh, and hopefully it will become clearer.
[jira] [Commented] (BEAM-91) Retractions
[ https://issues.apache.org/jira/browse/BEAM-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345012#comment-15345012 ]

Matt Pouttu-Clarke commented on BEAM-91:

Added information on the local repo (similar to a git local repo):
https://github.com/LamdaFu/bloklinx/wiki/Local-Repo

Working on the Bloklinx swarm design next (i.e. what happens during and after a push).
[jira] [Commented] (BEAM-91) Retractions
[ https://issues.apache.org/jira/browse/BEAM-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15343104#comment-15343104 ]

Matt Pouttu-Clarke commented on BEAM-91:

https://github.com/LamdaFu/bloklinx/wiki
^^ provides the basic description

https://github.com/LamdaFu/bloklinx/wiki/Bloklinx-Schema-(flatbuffers)
^^ provides the definition of the basic mapping to flatbuffer serialization

I am working on examples of basic versioning as well as branching / merge / change data processing.

The question I get asked most is why an "UPDATE" is composed of a redaction followed by an assertion rather than just one record. The answer is that this provides several huge benefits, including very efficient refresh of downstream aggregations, splits, and merges, easier data diff, and much easier reconciliation. This will become clear with my subsequent examples.
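One way to see why a redaction followed by an assertion makes aggregate refresh cheap is the minimal sketch below. It is not taken from the Bloklinx wiki: the `Change` record, the `update` and `apply_changes` helpers, and the sample keys and amounts are hypothetical. Because the redaction carries the old value, a downstream sum can be corrected by subtracting and adding, without rescanning history.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    kind: str     # "assert" or "redact"
    key: str      # grouping key, e.g. a sales territory (hypothetical)
    amount: int   # value carried by the record


def update(key, old_amount, new_amount):
    """Model an UPDATE as a redaction of the old record plus an assertion of the new one."""
    return [Change("redact", key, old_amount), Change("assert", key, new_amount)]


def apply_changes(totals, changes):
    """Refresh per-key sums in place; each change costs one addition or subtraction."""
    for change in changes:
        sign = 1 if change.kind == "assert" else -1
        totals[change.key] += sign * change.amount
    return totals


totals = defaultdict(int)
apply_changes(totals, [
    Change("assert", "west", 100),
    Change("assert", "west", 50),
    Change("assert", "east", 70),
])
apply_changes(totals, update("west", 50, 80))
print(dict(totals))  # {'west': 180, 'east': 70}
```

With a single "new value only" update record, the aggregator would have to look up or recompute the old value before it could adjust the sum.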
[jira] [Commented] (BEAM-91) Retractions
[ https://issues.apache.org/jira/browse/BEAM-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342259#comment-15342259 ]

Matt Pouttu-Clarke commented on BEAM-91:

Yes, I just wanted to be sure there was interest before documenting it.

FYI: it requires some form of distributed configuration service, such as etcd or ZooKeeper, to keep track of in-process change sessions. Once the change sessions are done or "committed" (or time out), they are cleared from the config service but can be obtained from logs for later replays. Also, in terms of the granularity of change sessions, a large number of change sessions making very small changes can cause problems for the design and should be throttled on the client side.

I'll post a link here to the doco.
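The change-session lifecycle described above (tracked while in process, cleared on commit or timeout, recoverable from logs) might look roughly like the sketch below. It uses an in-memory stand-in rather than a real etcd or ZooKeeper client; the `SessionRegistry` class, its method names, and the 30-second timeout are assumptions for illustration only. Time is passed in explicitly so the behaviour is deterministic.

```python
class SessionRegistry:
    """Hypothetical in-memory stand-in for a distributed config service."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.active = {}   # session_id -> start time of in-process sessions
        self.log = []      # (session_id, outcome) records kept for later replay

    def begin(self, session_id, now):
        self.active[session_id] = now

    def commit(self, session_id):
        # A committed session leaves the registry but stays in the log.
        del self.active[session_id]
        self.log.append((session_id, "committed"))

    def expire(self, now):
        # Clear every session that has been open longer than the timeout.
        expired = [sid for sid, started in self.active.items()
                   if now - started > self.timeout_seconds]
        for sid in expired:
            del self.active[sid]
            self.log.append((sid, "timed_out"))
        return expired


registry = SessionRegistry(timeout_seconds=30)
registry.begin("s1", now=0)
registry.begin("s2", now=10)
registry.commit("s1")
expired = registry.expire(now=45)   # s2 has been open for 35 seconds

print(expired)          # ['s2']
print(registry.active)  # {}
print(registry.log)     # [('s1', 'committed'), ('s2', 'timed_out')]
```

Client-side throttling of many tiny sessions is not shown; it would sit in front of `begin`.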
[jira] [Commented] (BEAM-91) Retractions
[ https://issues.apache.org/jira/browse/BEAM-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342022#comment-15342022 ]

Matt Pouttu-Clarke commented on BEAM-91:

The design actually solves multiple problems: BEAM-25, BEAM-91, and BEAM-101 share a common solution.
[jira] [Commented] (BEAM-91) Retractions
[ https://issues.apache.org/jira/browse/BEAM-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341996#comment-15341996 ]

Matt Pouttu-Clarke commented on BEAM-91:

I've solved this problem successfully several times since around 2007. It requires implementing data versioning and treating data much like you would treat code in GitHub. You could also call it streaming with parallel universe support, as some consumers may not want or need your redactions, while others may have a critical need for them (much like in the source code world, where some users do not want immediate "upgrades").

Also, please note that it is just as important to support redacting structural changes as it is to support redacting data changes. I have mature and battle-tested designs in this area if there's interest.
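The "parallel universe" idea, where one consumer applies redactions and another ignores them, can be sketched as two views materialized from the same change stream. This is an illustrative assumption about how such consumers could behave, not the author's design; the `materialize` function and the tuple-based change format are hypothetical.

```python
def materialize(changes, apply_redactions):
    """Build a consumer's view from (kind, record) changes.

    Consumers that opt out of redactions simply skip them and keep
    every record that was ever asserted.
    """
    view = []
    for kind, record in changes:
        if kind == "assert":
            view.append(record)
        elif kind == "redact" and apply_redactions and record in view:
            view.remove(record)  # drops the first matching record
    return view


changes = [("assert", "a"), ("assert", "b"), ("redact", "a"), ("assert", "c")]

with_redactions = materialize(changes, apply_redactions=True)
without_redactions = materialize(changes, apply_redactions=False)

print(with_redactions)     # ['b', 'c']
print(without_redactions)  # ['a', 'b', 'c']
```

Both consumers read the identical stream; only their policy differs, which is what allows them to diverge without the producer publishing two feeds.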