[jira] [Commented] (APEXMALHAR-2278) Implement Kudu Output Operator for non-transactional streams

2017-05-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15995704#comment-15995704
 ] 

ASF GitHub Bot commented on APEXMALHAR-2278:


GitHub user asfgit closed the pull request at:

https://github.com/apache/apex-malhar/pull/486


> Implement Kudu Output Operator for non-transactional streams
> 
>
> Key: APEXMALHAR-2278
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2278
> Project: Apache Apex Malhar
> Issue Type: New Feature
> Components: adapters database
> Reporter: Ananth
> Assignee: Ananth
> Fix For: 3.8.0
>
>
> Here are some benefits of integrating Kudu and Apex:
> Kudu has just been declared 1.0 and production ready.
> Kudu as a store might be a good fit for many architectures in the years to 
> come because of its ability to provide mutability of data (unlike HDFS) 
> and its storage formats optimized for low-latency scans.
> It also seems to withstand high-throughput write patterns, which makes it 
> a stable sink for Apex workflows that operate at very high volumes.
> [Design]
> 1. The operator would be an AbstractOperator and would allow the concrete 
> implementations to set a few behavioral aspects of the operator.
> 2. The following are the major phases of the operator:
>   a. During the activate() phase of the operator: establish a connection to the 
> cluster and get the metadata about the table that is being used as the sink.
>   b. During the setup() phase of the operator: fetch the current window 
> information and use it to decide whether we are recovering from a failure. (See 
> point 8 below.)
>   c. During process() of the input port: inspect the incoming ExecutionContext 
> tuple (see point 6 below) and perform one of the operations 
> (Insert/Update/Delete/Upsert).
> 3. The following parameters are tunable while establishing a Kudu connection:
> table name, boss worker threads, worker threads, socket read timeouts 
> and external consistency mode.
> 4. The user need not specify any schema outright. The POJO fields are 
> automatically mapped to the table column names identified when the schema 
> is parsed in the activate phase.
> 5. Allow the concrete implementation of the operator to override the mapping 
> from POJO field name to table schema column name. This allows flexibility in 
> cases where table column names are not compatible with Java bean 
> frameworks, or where column names cannot be controlled because the POJO 
> comes from an upstream operator.
> 6. The input tuple supplied to this operator is of type "Kudu 
> Execution Context". This tuple encompasses the actual POJO that is going to 
> be persisted to the Kudu store. Additionally, it allows the upstream operator 
> to specify the operation that needs to be performed. One of the following 
> operations is permitted as part of the context: Insert, Upsert, Update or 
> Delete on the POJO that is acting as the payload in the Execution Context.
> 7. The concrete implementation of the operator would allow the user to 
> specify the actual POJO class definition that would be used to write to 
> the table. The execution context would contain this POJO as well as the 
> metadata that defines how that tuple needs to be processed.
> 8. The operator would allow for a special execution mode for the 
> first window that is processed after the operator is activated. There 
> are two modes for the first window of processing:
>   a. Safe Mode: the "happy path" execution, in which no extra 
> processing is required to perform the Kudu mutation.
>   b. Reconciling Mode: an additional function is 
> called to check whether the user would like the tuple to be used for mutation. This 
> mode is set automatically when OperatorContext.ACTIVATION_WINDOW_ID != 
> Stateless.WINDOW_ID during the first window of processing by the operator. 
> This is useful when an operator is recovering from an 
> application crash and we do not want to apply the same mutation 
> more than once, given that ATLEAST_ONCE is the default processing semantics.
> 9. The operator is a stateless operator.
> 10. The operator would generate the following autometrics:
>   a. Counts of Inserts, Upserts, Deletes and Updates (separate counters 
> for each mutation) for a given window
>   b. Bytes written in a given window
>   c. Write RPCs in the given window
>   d. Total RPC errors in this window
>   e. All of the above metrics for the entire lifetime of the operator.
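
Points 2, 6 and 7 of the design describe a tuple that carries both the POJO payload and the mutation to apply. A minimal Java sketch of that idea follows. All class, enum and method names here are hypothetical and chosen only for illustration; the dispatcher records what it would send instead of calling a real Kudu client, so this is not the operator's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; not taken from the actual operator.
enum KuduMutationType { INSERT, UPDATE, DELETE, UPSERT }

// Wraps the POJO payload together with the mutation requested by the upstream operator.
final class KuduExecutionContext<T> {
  private final T payload;
  private final KuduMutationType mutationType;

  KuduExecutionContext(T payload, KuduMutationType mutationType) {
    this.payload = payload;
    this.mutationType = mutationType;
  }

  T getPayload() { return payload; }

  KuduMutationType getMutationType() { return mutationType; }
}

// Stand-in for the input port's process() step. It only logs the operation
// it would perform; a real operator would build a Kudu write here instead.
final class MutationDispatcher<T> {
  private final List<String> log = new ArrayList<>();

  void process(KuduExecutionContext<T> tuple) {
    switch (tuple.getMutationType()) {
      case INSERT: log.add("insert " + tuple.getPayload()); break;
      case UPDATE: log.add("update " + tuple.getPayload()); break;
      case DELETE: log.add("delete " + tuple.getPayload()); break;
      case UPSERT: log.add("upsert " + tuple.getPayload()); break;
      default: throw new IllegalArgumentException("Unknown mutation type");
    }
  }

  List<String> getLog() { return log; }
}
```

Keeping the mutation type on the tuple rather than on the operator lets a single stream mix inserts, updates, upserts and deletes, which appears to be the intent of point 6.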

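The first-window behavior in point 8 can be sketched as a small gate in front of the mutation step. This is only an illustration under stated assumptions: FirstWindowGate, the shouldWrite predicate and the STATELESS_WINDOW_ID constant are hypothetical stand-ins, and the constant's value of -1 is an assumption standing in for Stateless.WINDOW_ID rather than something the issue text states.

```java
import java.util.function.Predicate;

// Hypothetical sketch of the Safe / Reconciling decision for the first window.
final class FirstWindowGate<T> {
  // Assumed stand-in for Stateless.WINDOW_ID; the value is an assumption.
  static final long STATELESS_WINDOW_ID = -1L;

  private final boolean reconciling;
  private final Predicate<T> shouldWrite;
  private boolean firstWindow = true;

  // activationWindowId plays the role of OperatorContext.ACTIVATION_WINDOW_ID.
  FirstWindowGate(long activationWindowId, Predicate<T> shouldWrite) {
    this.reconciling = activationWindowId != STATELESS_WINDOW_ID;
    this.shouldWrite = shouldWrite;
  }

  // Returns true when the tuple should be turned into a Kudu mutation.
  boolean accept(T tuple) {
    if (firstWindow && reconciling) {
      // Reconciling mode: ask the user-supplied function first.
      return shouldWrite.test(tuple);
    }
    // Safe mode, or any window after the first: no extra check.
    return true;
  }

  // After the first window ends, every tuple is written without the extra check.
  void endWindow() {
    firstWindow = false;
  }
}
```

The predicate is where an application would decide whether a replayed tuple was already written before the crash, for example by looking it up in the table; how that check is done is left to the concrete implementation.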

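The metrics in point 10 combine counters that reset every window with counters that run for the operator's lifetime. A minimal sketch of that bookkeeping is below. WriteMetrics and Mutation are hypothetical names, only mutation counts and bytes are shown (RPC counts and RPC errors would follow the same pattern), and how the values are exposed as Apex autometrics is deliberately left out.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical names; a sketch of per-window and lifetime counters only.
enum Mutation { INSERT, UPSERT, DELETE, UPDATE }

final class WriteMetrics {
  private final Map<Mutation, Long> windowCounts = new EnumMap<>(Mutation.class);
  private final Map<Mutation, Long> lifetimeCounts = new EnumMap<>(Mutation.class);
  private long windowBytes;
  private long lifetimeBytes;

  // Called at the start of each window: per-window values reset, lifetime values do not.
  void beginWindow() {
    windowCounts.clear();
    windowBytes = 0;
  }

  // Records one mutation of the given kind and the bytes written for it.
  void record(Mutation mutation, long bytes) {
    windowCounts.merge(mutation, 1L, Long::sum);
    lifetimeCounts.merge(mutation, 1L, Long::sum);
    windowBytes += bytes;
    lifetimeBytes += bytes;
  }

  long windowCount(Mutation mutation) { return windowCounts.getOrDefault(mutation, 0L); }

  long lifetimeCount(Mutation mutation) { return lifetimeCounts.getOrDefault(mutation, 0L); }

  long getWindowBytes() { return windowBytes; }

  long getLifetimeBytes() { return lifetimeBytes; }
}
```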

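Points 4 and 5 describe mapping POJO fields to table columns by name, with an optional override per column. One way this could look is sketched below using plain reflection. ColumnMapper, SampleRow and the override map are hypothetical, the column list stands in for the schema fetched during activate(), and exact-name matching is an assumption about how the default mapping behaves.

```java
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical POJO used only for illustration.
final class SampleRow {
  long id;
  String userName;
}

final class ColumnMapper {
  // Returns a map of column name -> POJO field name.
  // overrides maps a column name to the POJO field that should feed it;
  // columns without an override are matched to a field of the same name.
  static Map<String, String> map(Class<?> pojoClass, List<String> columns,
      Map<String, String> overrides) {
    Map<String, String> result = new LinkedHashMap<>();
    for (String column : columns) {
      String wanted = overrides.getOrDefault(column, column);
      for (Field field : pojoClass.getDeclaredFields()) {
        if (field.getName().equals(wanted)) {
          result.put(column, field.getName());
          break;
        }
      }
    }
    return result;
  }
}
```

With columns "id" and "user_name" and an override from "user_name" to "userName", the sketch maps both columns even though "user_name" is not a Java-bean-style field name, which is the case point 5 calls out.
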
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (APEXMALHAR-2278) Implement Kudu Output Operator for non-transactional streams

2016-11-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15647571#comment-15647571
 ] 

ASF GitHub Bot commented on APEXMALHAR-2278:


GitHub user ananthc opened a pull request:

https://github.com/apache/apex-malhar/pull/486

APEXMALHAR-2278 Implement KuduNonTransactional Output Operator

@tweise / @PramodSSImmaneni: Could you please review this, or suggest the 
right person to complete the review?



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ananthc/apex-malhar APEXMALHAR-2278.KuduNonTransactionalOutputOperator

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/486.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #486


commit af6944f53c86d0fb355798b9c48d0156964648c9
Author: ananthc 
Date:   2016-11-08T13:28:28Z

APEXMALHAR-2278 Implement KuduNonTransactional Output Operator






