[1/2] apex-malhar git commit: APEXMALHAR-2179: Add documentation for JDBC Poll Input Operator

2016-10-01 Thread ramapex
Repository: apex-malhar
Updated Branches:
  refs/heads/master baff632ae -> 12d6183cf


APEXMALHAR-2179: Add documentation for JDBC Poll Input Operator


Project: http://git-wip-us.apache.org/repos/asf/apex-malhar/repo
Commit: http://git-wip-us.apache.org/repos/asf/apex-malhar/commit/87a72434
Tree: http://git-wip-us.apache.org/repos/asf/apex-malhar/tree/87a72434
Diff: http://git-wip-us.apache.org/repos/asf/apex-malhar/diff/87a72434

Branch: refs/heads/master
Commit: 87a72434274c27532c8f38a71dfe8e51e85cc8db
Parents: 0a924ad
Author: Priyanka Gugale 
Authored: Tue Aug 9 15:45:57 2016 +0530
Committer: Priyanka Gugale 
Committed: Wed Sep 21 23:51:58 2016 +0530

--
 .../images/jdbcinput/operatorsClassDiagram.png  | Bin 0 -> 49841 bytes
 docs/operators/jdbcPollInputOperator.md | 175 +++
 2 files changed, 175 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/apex-malhar/blob/87a72434/docs/operators/images/jdbcinput/operatorsClassDiagram.png
--
diff --git a/docs/operators/images/jdbcinput/operatorsClassDiagram.png 
b/docs/operators/images/jdbcinput/operatorsClassDiagram.png
new file mode 100644
index 000..4b0432d
Binary files /dev/null and 
b/docs/operators/images/jdbcinput/operatorsClassDiagram.png differ

http://git-wip-us.apache.org/repos/asf/apex-malhar/blob/87a72434/docs/operators/jdbcPollInputOperator.md
--
diff --git a/docs/operators/jdbcPollInputOperator.md 
b/docs/operators/jdbcPollInputOperator.md
new file mode 100644
index 000..aa1d107
--- /dev/null
+++ b/docs/operators/jdbcPollInputOperator.md
@@ -0,0 +1,175 @@
+JDBC Poller Input Operator
+=
+
+## Operator Objective
+This operator scans a JDBC database table in parallel. It was added to address two common input operator problems:
+
+1. As discussed in [Development Best Practices](https://github.com/apache/apex-core/blob/master/docs/development_best_practices.md),
+the operator callbacks such as `beginWindow()`, `endWindow()`, `emitTuples()`, etc.
+(which are invoked by the main operator thread)
+are required to return quickly, well within the default streaming window duration of
+500ms. This requirement can be an issue when retrieving data from slow external systems
+such as databases or object stores: if the call takes too long, the platform will deem
+the operator blocked and restart it. Restarting will often run into the same issue
+causing an unbroken sequence of restarts.
+
+2. When a large volume of data is available from a single store that allows reading from
+   arbitrary locations (such as a file or a database table), reading the data sequentially
+   can be throughput limiting: Having multiple readers read from non-overlapping sections
+   of the store allows any downstream parallelism in the DAG to be exploited better to
+   enhance throughput. For files, this approach is used by the file splitter and block
+   reader operators in the Malhar library.
+
+The JDBC Poller Input operator addresses the first issue with an asynchronous worker thread that retrieves the data and adds it to an in-memory queue; the main operator thread dequeues tuples very quickly if data is available, or simply returns if not. The second issue is addressed in a way that parallels the approach used for files: multiple partitions read records from non-overlapping areas of the table. Additional details of how this is done are described below.
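
The worker-thread-plus-queue pattern described above can be sketched roughly as follows. This is a minimal illustration and not the Malhar implementation: the class `AsyncPollSketch`, its constructor arguments, and the `emitTuples(int)` signature are all hypothetical names chosen for the example, and the "slow source" is just an `Iterable` standing in for a database cursor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical illustration of the pattern; not the actual operator class. */
final class AsyncPollSketch<T> {
  private final BlockingQueue<T> queue;
  private final Thread worker;

  AsyncPollSketch(Iterable<T> slowSource, int capacity) {
    this.queue = new LinkedBlockingQueue<>(capacity);
    // The worker does the slow reads; put() blocks while the queue is full,
    // which keeps memory use bounded.
    this.worker = new Thread(() -> {
      try {
        for (T row : slowSource) {
          queue.put(row);
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    this.worker.setDaemon(true);
  }

  void start() {
    worker.start();
  }

  /** Stands in for the main-thread callback: takes what is ready and never waits. */
  List<T> emitTuples(int maxBatch) {
    List<T> batch = new ArrayList<>();
    queue.drainTo(batch, maxBatch);
    return batch;
  }

  void stop() {
    worker.interrupt();
    try {
      worker.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

Because `drainTo` returns immediately with however many elements are available (possibly none), the caller's latency does not depend on how slow the source is, which is the property the streaming-window requirement calls for.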
+
+#### Assumption
+The operator assumes that the table has an ordered column from which range queries can be formed. That is, the table has a column, or a combination of columns, with a unique constraint, and every newly inserted record has a value in that column greater than the current maximum, because the operator polls only appended records.
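
To make the role of the ordered column concrete, here is a rough sketch of how non-overlapping ranges and an "appended records only" poll query could be formed. Everything in it is an assumption made for illustration: the class and method names are hypothetical, the `LIMIT ... OFFSET` syntax is the MySQL/PostgreSQL dialect, and the operator's real query construction may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; shows the idea only, not the operator's query builder. */
final class RangeQuerySketch {

  /** Splits [0, rowCount) into contiguous, non-overlapping [start, end) ranges. */
  static List<long[]> splitRanges(long rowCount, int partitions) {
    List<long[]> ranges = new ArrayList<>();
    long base = rowCount / partitions;
    long extra = rowCount % partitions; // the first `extra` ranges get one more row
    long start = 0;
    for (int i = 0; i < partitions; i++) {
      long size = base + (i < extra ? 1 : 0);
      ranges.add(new long[] {start, start + size});
      start += size;
    }
    return ranges;
  }

  /** Query for one static range; the ORDER BY on the key makes the offsets stable. */
  static String rangeQuery(String table, String keyColumn, long start, long end) {
    // Identifiers are concatenated, so they must come from trusted configuration.
    return "SELECT * FROM " + table + " ORDER BY " + keyColumn
        + " LIMIT " + (end - start) + " OFFSET " + start;
  }

  /** Query for newly appended rows; the parameter is the largest key seen so far. */
  static String pollQuery(String table, String keyColumn) {
    return "SELECT * FROM " + table + " WHERE " + keyColumn + " > ? ORDER BY " + keyColumn;
  }
}
```

For example, `splitRanges(10, 3)` yields the ranges [0, 4), [4, 7) and [7, 10). The poll query only finds every new row if inserts always carry a key larger than any existing one, which is why the uniqueness and monotonic-insert assumptions matter.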
+
+## Use cases
+1. Scan huge database tables, either to copy them to another database or to process them with **Apache Apex**. An example application that uses this operator to copy database contents to HDFS is available in the [examples repository](https://github.com/DataTorrent/examples/tree/master/tutorials/jdbcIngest). Look for "PollJdbcToHDFSApp" for an example of this particular operator.
+
+## How to Use?
+The tuple type in the abstract class is a generic parameter. Concrete subclasses need to choose an appropriate class (such as String or another concrete Java class with a no-argument constructor, so that it can be serialized using Kryo). They must also implement a couple of abstract methods: `getTuple(ResultSet)` to convert database rows to objects of the concrete class, and `emitTuple(T)` to emit the tuple.
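
As a rough sketch of what such a subclass might look like, the example below uses a hypothetical stand-in base class, `PollOperatorStandIn<T>`, that declares only the two methods named above, so the code compiles without the Malhar library. The tuple class `UserRow`, the column names `id` and `name`, and the list used in place of an output port are likewise assumptions for illustration; the real abstract class has additional members and its exact signatures may differ.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the abstract operator; not the Malhar class. */
abstract class PollOperatorStandIn<T> {
  abstract T getTuple(ResultSet rs) throws SQLException;

  abstract void emitTuple(T tuple);
}

/** Tuple type: a plain Java class with a no-argument constructor. */
class UserRow {
  int id;
  String name;

  UserRow() {
  }
}

class UserRowPoller extends PollOperatorStandIn<UserRow> {
  // Stands in for an output port in this sketch.
  final List<UserRow> emitted = new ArrayList<>();

  @Override
  UserRow getTuple(ResultSet rs) throws SQLException {
    // Convert the current database row into a tuple object.
    UserRow row = new UserRow();
    row.id = rs.getInt("id");
    row.name = rs.getString("name");
    return row;
  }

  @Override
  void emitTuple(UserRow tuple) {
    emitted.add(tuple);
  }
}
```

Keeping the row-to-object conversion in `getTuple` and the emission in `emitTuple` separates the part that knows the table schema from the part that knows where tuples go.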
+
+In principle, no ports need be defined in the rare case that the operator 
simply 

[2/2] apex-malhar git commit: Merge branch 'APEXMALHAR-2179-jdbc-documentation' of https://github.com/DT-Priyanka/incubator-apex-malhar

2016-10-01 Thread ramapex
Merge branch 'APEXMALHAR-2179-jdbc-documentation' of 
https://github.com/DT-Priyanka/incubator-apex-malhar


Project: http://git-wip-us.apache.org/repos/asf/apex-malhar/repo
Commit: http://git-wip-us.apache.org/repos/asf/apex-malhar/commit/12d6183c
Tree: http://git-wip-us.apache.org/repos/asf/apex-malhar/tree/12d6183c
Diff: http://git-wip-us.apache.org/repos/asf/apex-malhar/diff/12d6183c

Branch: refs/heads/master
Commit: 12d6183cfa69874915be5c8fc61d80840af77120
Parents: baff632 87a7243
Author: Munagala V. Ramanath 
Authored: Sat Oct 1 07:29:59 2016 -0700
Committer: Munagala V. Ramanath 
Committed: Sat Oct 1 07:29:59 2016 -0700

--
 .../images/jdbcinput/operatorsClassDiagram.png  | Bin 0 -> 49841 bytes
 docs/operators/jdbcPollInputOperator.md | 175 +++
 2 files changed, 175 insertions(+)
--