[jira] [Commented] (BEAM-91) Retractions

2016-06-24 Thread Matt Pouttu-Clarke (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15348596#comment-15348596
 ] 

Matt Pouttu-Clarke commented on BEAM-91:


Apologies in advance for the SQL-like code examples, but they are the most 
understandable to a general audience: 
https://github.com/LamdaFu/bloklinx/wiki/Semantics-and-Usage

> Retractions
> ---
>
> Key: BEAM-91
> URL: https://issues.apache.org/jira/browse/BEAM-91
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Tyler Akidau
>Assignee: Frances Perry
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> We still haven't added retractions to Beam, even though they're a core part 
> of the model. We should document all the necessary aspects (uncombine, 
> reverting DoFn output with DoOvers, sink integration, source-level 
> retractions, etc), and then implement them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-91) Retractions

2016-06-22 Thread Matt Pouttu-Clarke (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345507#comment-15345507
 ] 

Matt Pouttu-Clarke commented on BEAM-91:


With regard to versioning structural changes, removing a field is one example, 
as is changing the type of a field.  In this case one must replay all relevant 
history with the change applied AND, more importantly, quickly identify the 
root cause of failures related to the structural change. 

With regard to retaining "deleted" data and relationships, the best real 
example I have is versioned hierarchical structures like zip codes and sales 
territories.  You cannot reject mail because the zip code has changed or moved, 
and sales people will have a conniption if their numbers change and affect 
their commissions.  Thus in the real world these historical structures remain 
frozen in time, potentially forever, even when they are "deleted".
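
A minimal sketch of the "frozen in time" point above, assuming each zip code keeps an ordered list of versions and a "delete" is recorded as one more version rather than erasing history. The data, the `HISTORY` layout, and the `territory_as_of` helper are hypothetical illustrations, not part of Bloklinx or Beam.

```python
from bisect import bisect_right

# Version history per zip code: (effective_version, territory).
# A territory of None marks a "delete"; earlier versions stay queryable.
# All values are made up for illustration.
HISTORY = {
    "94105": [(1, "west-1"), (5, "west-2"), (9, None)],
}


def territory_as_of(zip_code, version):
    """Return the territory a zip code belonged to at `version`, or None if absent."""
    versions = HISTORY.get(zip_code, [])
    # Find the last entry whose effective version is <= the requested version.
    i = bisect_right([v for v, _ in versions], version)
    return versions[i - 1][1] if i else None


before_move = territory_as_of("94105", 3)    # mapping in force at version 3
after_move = territory_as_of("94105", 7)     # mapping in force at version 7
after_delete = territory_as_of("94105", 9)   # deleted as of version 9
```

A commission calculated at version 3 can still be reproduced after the zip code is "deleted", because the lookup never consults anything newer than the version it is given.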





[jira] [Commented] (BEAM-91) Retractions

2016-06-22 Thread Matt Pouttu-Clarke (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345476#comment-15345476
 ] 

Matt Pouttu-Clarke commented on BEAM-91:


Yes, agreed, it is not yet clear from the docs how this relates directly to 
Beam.  However, this is mainly a terminology issue from my perspective.  The 
bespoke systems I have built over the last few years to unify batch and stream 
processing all rely on data versioning to ensure point-in-session consistency 
(watermarks) across streams and all data derived from streams, such as 
aggregates, transforms, splits, and replicas.  

There is no hard dependency on a configuration service, but it is critical to 
keep the current watermark and all historical watermarks in a system of record.  
This could be as simple as a shared file system or as complex as etcd.  
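
As a rough sketch of the "shared file system" option, assuming watermarks are plain version numbers per stream: an append-only file keeps every historical watermark, and the current one is simply the last entry. The file name, record layout, and function names are hypothetical, and a real shared file system with concurrent writers would also need locking, which this sketch omits.

```python
import json
import os
import tempfile


def record_watermark(path, stream, version):
    """Append a watermark; earlier entries are never rewritten, so history is kept."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"stream": stream, "version": version}) + "\n")


def read_watermarks(path, stream):
    """Return every recorded watermark for a stream, oldest first."""
    with open(path, encoding="utf-8") as f:
        entries = [json.loads(line) for line in f if line.strip()]
    return [e["version"] for e in entries if e["stream"] == stream]


# A temporary directory stands in for the shared file system.
path = os.path.join(tempfile.mkdtemp(), "watermarks.jsonl")
record_watermark(path, "orders", 1)
record_watermark(path, "orders", 2)
record_watermark(path, "clicks", 7)

history = read_watermarks(path, "orders")  # all historical watermarks
current = history[-1]                      # the current watermark
```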

That aside, the versioning model I put forward for flatbuffers is an example 
using more recent technologies. I have done the same with relational tables and 
Avro in the past.

I'll work on the examples of how the versioning model feeds aggregate refresh, 
and hopefully it will become clearer.  





[jira] [Commented] (BEAM-91) Retractions

2016-06-22 Thread Matt Pouttu-Clarke (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345012#comment-15345012
 ] 

Matt Pouttu-Clarke commented on BEAM-91:


Added information on the local repo (similar to a git local repo): 
https://github.com/LamdaFu/bloklinx/wiki/Local-Repo

Working on Bloklinx swarm design next (i.e. what happens during and after a 
push)



[jira] [Commented] (BEAM-91) Retractions

2016-06-21 Thread Matt Pouttu-Clarke (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15343104#comment-15343104
 ] 

Matt Pouttu-Clarke commented on BEAM-91:


https://github.com/LamdaFu/bloklinx/wiki
^^ provides the basic description
https://github.com/LamdaFu/bloklinx/wiki/Bloklinx-Schema-(flatbuffers)
^^ provides definition of the basic mapping to flatbuffer serialization

I am working on examples of basic versioning as well as branching / merge / 
change data processing

The question I get asked most is why an "UPDATE" is composed of a redaction 
followed by an assertion rather than just one record.  The answer is that this 
provides several huge benefits, including very efficient refresh of downstream 
aggregations, splits, and merges, easier data diff, and much easier 
reconciliation.  This will become clear with my subsequent examples.
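
To illustrate why splitting an update into two records helps downstream aggregates, here is a minimal sketch assuming a simple per-key sum. The `Change` record, its `sign` field, and the helper names are hypothetical and are not the Bloklinx schema. The redaction carries the old value, so the aggregate can subtract it without rescanning history.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One change record: sign is -1 for a redaction, +1 for an assertion."""
    key: str      # grouping key, e.g. a sales territory (made up)
    amount: int   # value being summed
    sign: int


def update(key_old, amount_old, key_new, amount_new):
    """Express an UPDATE as a redaction of the old row plus an assertion of the new one."""
    return [Change(key_old, amount_old, -1), Change(key_new, amount_new, +1)]


def apply_changes(totals, changes):
    """Refresh per-key sums incrementally, touching only the keys named in the changes."""
    for c in changes:
        totals[c.key] += c.sign * c.amount
    return totals


totals = defaultdict(int)
apply_changes(totals, [Change("east", 100, +1), Change("west", 40, +1)])

# Move the 100 sale from "east" to "west" and correct its amount to 120.
apply_changes(totals, update("east", 100, "west", 120))
print(dict(totals))  # {'east': 0, 'west': 160}
```

With a single "new value only" record, the aggregate for "east" could not be corrected without looking up what the row used to contain.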



[jira] [Commented] (BEAM-91) Retractions

2016-06-21 Thread Matt Pouttu-Clarke (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342259#comment-15342259
 ] 

Matt Pouttu-Clarke commented on BEAM-91:


Yes, just wanted to be sure there was interest before documenting it.  FYI: it 
requires some form of distributed configuration service, such as etcd or 
zookeeper, to keep track of in-process change sessions.  Once the change 
sessions are done or "committed" (or time out), they are cleared from the 
config service but can be obtained from logs for later replays.  Also, in terms 
of granularity, a large number of change sessions each making very small 
changes can cause problems for the design, so they should be throttled on the 
client side.  I'll post a link to the documentation here.
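
The session lifecycle described above can be sketched as follows. This is an in-memory stand-in under stated assumptions, not an etcd or ZooKeeper client: the `SessionRegistry` class, its method names, and the "limit the number of open sessions" form of throttling are all hypothetical.

```python
import time


class SessionRegistry:
    """In-memory stand-in for the config service that tracks open change sessions."""

    def __init__(self, timeout_s, max_open):
        self.timeout_s = timeout_s
        self.max_open = max_open    # throttle on concurrently open sessions
        self.open_sessions = {}     # session id -> start time
        self.log = []               # cleared sessions, kept for later replay

    def begin(self, session_id, now=None):
        """Open a session; return False when throttled."""
        now = time.monotonic() if now is None else now
        self.expire(now)
        if len(self.open_sessions) >= self.max_open:
            return False            # caller should batch changes or retry later
        self.open_sessions[session_id] = now
        return True

    def commit(self, session_id):
        """Clear a finished session from the registry and record it in the log."""
        if self.open_sessions.pop(session_id, None) is not None:
            self.log.append((session_id, "committed"))

    def expire(self, now=None):
        """Clear sessions that have been open for at least timeout_s."""
        now = time.monotonic() if now is None else now
        for sid, started in list(self.open_sessions.items()):
            if now - started >= self.timeout_s:
                del self.open_sessions[sid]
                self.log.append((sid, "timed_out"))


reg = SessionRegistry(timeout_s=30.0, max_open=2)
reg.begin("s1", now=0.0)
reg.begin("s2", now=1.0)
throttled = not reg.begin("s3", now=2.0)   # two sessions are already open
reg.commit("s1")
reopened = reg.begin("s3", now=40.0)       # s2 has timed out by now
```

Explicit `now` values are passed only to make the example deterministic; omitting them falls back to `time.monotonic()`.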



[jira] [Commented] (BEAM-91) Retractions

2016-06-21 Thread Matt Pouttu-Clarke (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342022#comment-15342022
 ] 

Matt Pouttu-Clarke commented on BEAM-91:


The design actually solves multiple problems: i.e. BEAM-25, BEAM-91, and 
BEAM-101 share a common solution.



[jira] [Commented] (BEAM-91) Retractions

2016-06-21 Thread Matt Pouttu-Clarke (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341996#comment-15341996
 ] 

Matt Pouttu-Clarke commented on BEAM-91:


I've solved this problem successfully several times since around 2007.  It 
requires implementing data versioning and treating data much like you would 
treat code on GitHub.  You could also call it streaming with parallel universe 
support, as some consumers may not want or need your redactions, while others 
may have critical need of them (much like in the source code world, where some 
users do not want immediate "upgrades").  Also, please note that it is just as 
important to support redacting structural changes as it is to support redacting 
data changes.  I have mature and battle-tested designs in this area if there's 
interest.
