Repository: apex-malhar
Updated Branches:
  refs/heads/master 513e9e2d4 -> b60cf6eb8


APEXMALHAR-2384 adding documentation for fixed width parser


Project: http://git-wip-us.apache.org/repos/asf/apex-malhar/repo
Commit: http://git-wip-us.apache.org/repos/asf/apex-malhar/commit/b60cf6eb
Tree: http://git-wip-us.apache.org/repos/asf/apex-malhar/tree/b60cf6eb
Diff: http://git-wip-us.apache.org/repos/asf/apex-malhar/diff/b60cf6eb

Branch: refs/heads/master
Commit: b60cf6eb88f5b90be519fae0096477c32e1cea0c
Parents: 513e9e2
Author: Hitesh-Scorpio <forhitesh...@gmail.com>
Authored: Wed Jan 11 12:27:27 2017 +0530
Committer: Hitesh-Scorpio <forhitesh...@gmail.com>
Committed: Mon Apr 3 14:11:06 2017 +0530

----------------------------------------------------------------------
 docs/operators/fixedWidthParserOperator.md      | 240 +++++++++++++++++++
 .../fixedWidthParser/fixedWidthParser.png       | Bin 0 -> 91569 bytes
 mkdocs.yml                                      |   3 +-
 3 files changed, 242 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/apex-malhar/blob/b60cf6eb/docs/operators/fixedWidthParserOperator.md
----------------------------------------------------------------------
diff --git a/docs/operators/fixedWidthParserOperator.md 
b/docs/operators/fixedWidthParserOperator.md
new file mode 100644
index 0000000..f987257
--- /dev/null
+++ b/docs/operators/fixedWidthParserOperator.md
@@ -0,0 +1,240 @@
+Fixed Width Parser Operator
+=============
+
+## Operator Objective
+This operator parses fixed-width records and constructs a map or a concrete Java class, also known as a ["POJO"](https://en.wikipedia.org/wiki/Plain_Old_Java_Object), from each record. The user must provide a schema that describes the fixed-width data. Incoming records are parsed according to the schema, and either a map or a POJO (or both) is emitted.
+Invalid records are emitted on the error port along with an error message.
+
+**Note**: Field names in the schema must match the field names of the POJO and must be listed in the same order as the fields appear in the incoming data.
+
+FixedWidthParser is **idempotent**, **fault-tolerant** and 
**statically/dynamically partitionable**.
+
+## Class Diagram
+![](images/fixedWidthParser/fixedWidthParser.png)
+
+## Operator Information
+1. Operator location: **_malhar-contrib_**
+2. Available since: **_3.8.0_**
+3. Operator state: **_Evolving_**
+4. Java package: [com.datatorrent.contrib.parser.FixedWidthParser](https://github.com/apache/apex-malhar/blob/master/contrib/src/main/java/com/datatorrent/contrib/parser/FixedWidthParser.java)
+
+
+## <a name="props"></a>Properties of FixedWidthParser
+
+Data in a fixed-width text file is arranged in rows and columns, with one entry per row. A fixed-width record is one row of such a file. Each column has a fixed width, specified in characters, which determines the maximum amount of data it can contain. No delimiters separate the fields in the file. Instead, the data is left- or right-justified in its column as specified by the `alignment` value of the schema, and the remaining space is filled with the padding character, also specified in the schema, so that the start of a given column can always be expressed as an offset from the beginning of a line.
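
Because every column has a fixed width, the start offset of each column is simply the sum of the widths before it. The sketch below illustrates that arithmetic; the class name `ColumnOffsets` and the widths are hypothetical and not part of the operator.

```java
import java.util.Arrays;

// Illustration only: hypothetical helper, not part of FixedWidthParser.
final class ColumnOffsets {

    // Returns the zero-based start offset of each column, given the column widths in order.
    static int[] startOffsets(int[] widths) {
        int[] offsets = new int[widths.length];
        int next = 0;
        for (int i = 0; i < widths.length; i++) {
            offsets[i] = next;   // this column starts where the previous one ended
            next += widths[i];
        }
        return offsets;
    }

    public static void main(String[] args) {
        int[] widths = {3, 3, 10};   // hypothetical column widths
        System.out.println(Arrays.toString(startOffsets(widths)));   // prints [0, 3, 6]
    }
}
```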
+
+The user must set the schema that describes the fixed-width data.
+
+| **Property** | **Description** | **Type** | **Mandatory** | **Default Value** |
+| -------- | ----------- | ---- | ------------------ | ------------- |
+| *jsonSchema* | [Schema](https://github.com/apache/apex-malhar/blob/master/contrib/src/main/java/com/datatorrent/contrib/parser/FixedWidthSchema.java) describing the fixed-width data. Based on the schema definition, the operator parses each incoming record into a map and a POJO. Valid records are emitted as a POJO / map, while invalid ones are emitted on the error port with an error message. | String | YES | N/A |
+
+This property can be set in `properties.xml` as follows:
+
+```xml
+<property>
+    <name>
+      dt.application.{ApplicationName}.operator.{OperatorName}.prop.jsonSchema
+    </name>
+    <value>
+    {
+      "padding": "_",
+      "alignment": "left",
+      "fields":
+      [
+        {
+          "name": "adId",
+          "type": "Integer",
+          "length": "3",
+          "padding": "0"
+        },
+        {
+          "name": "campaignId",
+          "type": "Integer",
+          "length": "3",
+          "padding": " "
+        },
+        {
+          "name": "adName",
+          "type": "String",
+          "length": "10",
+          "alignment":"right"
+        },
+        {
+          "name": "bidPrice",
+          "type": "Double",
+          "length": "3"
+        },
+        {
+          "name": "startDate",
+          "type": "Date",
+          "format": "yyyy-MM-dd HH:mm:ss",
+          "length": "19"
+        },
+        {
+          "name": "endDate",
+          "type": "Date",
+          "format": "dd/MM/yyyy",
+          "length": "10"
+        },
+        {
+          "name": "securityCode",
+          "type": "Long",
+          "length": "5"
+        },
+        {
+          "name": "active",
+          "type": "Boolean",
+          "length": "5",
+          "trueValue": "true",
+          "falseValue": "false"
+        },
+        {
+          "name": "optimized",
+          "type": "Boolean",
+          "length": "1",
+          "trueValue": "y",
+          "falseValue": "n"
+        },
+        {
+          "name": "parentCampaign",
+          "type": "String",
+          "length": "10"
+        },
+        {
+          "name": "weatherTargeted",
+          "type": "Character",
+          "length": "1"
+        }
+      ]
+    }
+    </value>
+</property>
+```
+Here {OperatorName} is the name of the operator and {ApplicationName} is the name of the application.
+As explained earlier, padding is the character used in the incoming records to fill a field to its fixed width when required. The user can specify a single padding character for the entire file, or a separate padding character for individual fields (columns of the record). A field-level padding value, if specified, overrides the global padding value for the entire file.
+Similarly, the user can define the alignment of the data in the incoming records as left, centre or right. Note that currently only the British spelling 'centre' is accepted.
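
The padding and alignment rules above can be sketched in plain Java. This is an illustration under stated assumptions, not the operator's implementation: the class `PaddingRules` and its methods are hypothetical, and it assumes that left-aligned fields carry padding on the right, right-aligned fields on the left, and centred fields on both sides.

```java
// Illustration only: hypothetical helper, not part of FixedWidthParser.
final class PaddingRules {

    // A field-level padding character, when present, wins over the global one.
    static char effectivePadding(Character fieldPadding, char globalPadding) {
        return fieldPadding != null ? fieldPadding : globalPadding;
    }

    // Removes padding from a raw column value according to its alignment.
    // Assumption: "left" strips trailing padding, "right" strips leading padding,
    // and any other value (such as "centre") strips both sides.
    static String strip(String raw, char pad, String alignment) {
        int start = 0;
        int end = raw.length();
        if (!alignment.equals("left")) {
            while (start < end && raw.charAt(start) == pad) {
                start++;
            }
        }
        if (!alignment.equals("right")) {
            while (end > start && raw.charAt(end - 1) == pad) {
                end--;
            }
        }
        return raw.substring(start, end);
    }

    public static void main(String[] args) {
        char pad = effectivePadding(null, '_');            // no field-level padding: global '_' applies
        System.out.println(strip("ab___", pad, "left"));   // prints ab
    }
}
```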
+
+The sample JSON schema for records with three fields, 'Occupation', 'Age' and 'Gender', with field widths 20, 2 and 6, padding characters '#', '$' and '@', and alignments 'left', 'centre' and 'right', is as follows:
+```
+{
+  "fields":
+  [
+    {
+      "name": "Occupation",
+      "type": "String",
+      "length": "20",
+      "padding": "#",
+      "alignment":"left"
+    },
+    {
+      "name": "Age",
+      "type": "Integer",
+      "length": "2",
+      "padding": "$",
+      "alignment":"centre"
+    },
+    {
+      "name": "Gender",
+      "type": "String",
+      "length": "6",
+      "padding": "@",
+      "alignment":"right"
+    }
+  ]
+}
+```
+
+The corresponding record with values for 'Occupation' as Engineer, 'Age' as 30 
and 'Gender' as Male would be as follows:
+```
+Engineer############30@@Male
+```
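
To see how the record above follows from the schema, the sketch below pads each value to its field width and concatenates the results. The class `FixedWidthRecordBuilder` is hypothetical and only illustrates the layout; how a centred field splits an odd amount of padding is an assumption here (the extra character goes on the right).

```java
// Illustration only: hypothetical helper that composes a fixed-width record.
final class FixedWidthRecordBuilder {

    // Pads a value to the given width using the field's padding character and alignment.
    static String pad(String value, int width, char pad, String alignment) {
        int total = width - value.length();
        if (total < 0) {
            throw new IllegalArgumentException("value longer than field width: " + value);
        }
        int left;
        switch (alignment) {
            case "left":   left = 0;         break;   // data first, padding after
            case "right":  left = total;     break;   // padding first, data after
            case "centre": left = total / 2; break;   // assumption: extra padding goes on the right
            default: throw new IllegalArgumentException("unknown alignment: " + alignment);
        }
        StringBuilder sb = new StringBuilder(width);
        for (int i = 0; i < left; i++) {
            sb.append(pad);
        }
        sb.append(value);
        for (int i = 0; i < total - left; i++) {
            sb.append(pad);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String record = pad("Engineer", 20, '#', "left")
                + pad("30", 2, '$', "centre")
                + pad("Male", 6, '@', "right");
        System.out.println(record);   // prints Engineer############30@@Male
    }
}
```

The 'Age' value fills its two-character field completely, so its padding character '$' never appears in the record.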
+
+
+## Platform Attributes that influence operator behavior
+
+| **Attribute** | **Description** | **Type** | **Mandatory** |
+| -------- | ----------- | ---- | ------------------ |
+| *TUPLE_CLASS* | TUPLE_CLASS attribute on the output port, which tells the operator the class of the POJO that needs to be emitted | Class | Yes |
+
+This attribute can be set in `properties.xml` as follows.
+In the examples below, {OperatorName} is the name of the operator, {ApplicationName} is the name of the application and "com.datatorrent.tutorial.fixedwidthparser.Ad" is the fully qualified name of the tuple class.
+
+```xml
+<property>
+    <name>dt.application.{ApplicationName}.operator.{OperatorName}.port.out.attr.TUPLE_CLASS</name>
+    <value>com.datatorrent.tutorial.fixedwidthparser.Ad</value>
+</property>
+```
+
+The following code can be added to the `populateDAG()` method of the application to set the tuple class:
+```java
+dag.setOutputPortAttribute({OperatorName}.out, 
Context.PortContext.TUPLE_CLASS, 
com.datatorrent.tutorial.fixedwidthparser.Ad.class);
+```
+
+## <a name="dataTypes"></a>Supported DataTypes in Schema
+  * Integer
+  * Long
+  * Double
+  * Character
+  * String
+  * Boolean
+  * Date
+  * Float
+
+
+## Ports
+
+| **Port** | **Description** | **Type** | **Mandatory** |
+| -------- | ----------- | ---- | ------------------ |
+| *in* | Tuples that need to be parsed are received on this port | byte[] | Yes |
+| *out* | Valid tuples are emitted as POJOs | Object (POJO) | No |
+| *parsedOutput* | Valid tuples are emitted as maps | Map | No |
+| *err* | Invalid tuples are emitted with an error message | KeyValPair <String, String\> | No |
+
+## Partitioning
+Fixed Width Parser is both statically and dynamically partitionable.
+### Static Partitioning
+
+Static partitioning can be achieved by specifying the partitioner and number 
of partitions in the populateDAG() method.
+
+```java
+FixedWidthParser fixedWidthParser = dag.addOperator("fixedWidthParser", 
FixedWidthParser.class);
+StatelessPartitioner<FixedWidthParser> partitioner1 = new 
StatelessPartitioner<FixedWidthParser>(2);
+dag.setAttribute(fixedWidthParser, Context.OperatorContext.PARTITIONER, 
partitioner1);
+```
+
+Static partitioning can also be achieved by specifying the partitioner in the properties file.
+
+```xml
+<property>
+    <name>dt.operator.{OperatorName}.attr.PARTITIONER</name>
+    <value>com.datatorrent.common.partitioner.StatelessPartitioner:2</value>
+</property>
+```
+
+where {OperatorName} is the name of the FixedWidthParser operator. The lines above create two static partitions of FixedWidthParser. Change the value to change the number of static partitions.
+
+
+### Dynamic Partitioning
+
+FixedWidthParser can be dynamically partitioned using the out-of-the-box partitioner:
+
+#### Throughput based
+The following code can be added to the `populateDAG()` method of the application to dynamically partition FixedWidthParser:
+```java
+FixedWidthParser fixedWidthParser = dag.addOperator("fixedWidthParser", 
FixedWidthParser.class);
+StatelessThroughputBasedPartitioner<FixedWidthParser> partitioner = new 
StatelessThroughputBasedPartitioner<>();
+partitioner.setCooldownMillis(conf.getLong("dt.cooldown", 10000));
+partitioner.setMaximumEvents(conf.getLong("dt.maxThroughput", 30000));
+partitioner.setMinimumEvents(conf.getLong("dt.minThroughput", 10000));
+dag.setAttribute(fixedWidthParser, OperatorContext.STATS_LISTENERS, 
Arrays.asList(new StatsListener[]{partitioner}));
+dag.setAttribute(fixedWidthParser, OperatorContext.PARTITIONER, partitioner);
+```
+
+The code above dynamically partitions FixedWidthParser when the throughput changes.
+If the overall throughput of FixedWidthParser rises above 30000 or falls below 10000, the platform repartitions FixedWidthParser
+to bring the throughput of a single partition back between 10000 and 30000.
+A CooldownMillis of 10000 is used as the threshold time for which the throughput change is observed.
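
The threshold-and-cooldown behaviour described above can be pictured with the following sketch. It is not the logic of `StatelessThroughputBasedPartitioner`; the class `ThroughputDecision`, its `Action` enum and its parameters are hypothetical and only illustrate the idea that a change must persist for the cooldown period before any repartitioning is requested.

```java
// Illustration only: hypothetical decision rule, not the platform's partitioner.
final class ThroughputDecision {

    enum Action { NONE, SCALE_UP, SCALE_DOWN }

    // Decides whether to repartition, given the observed events and how long
    // the throughput has stayed outside the [minEvents, maxEvents] range.
    static Action decide(long observedEvents, long minEvents, long maxEvents,
                         long millisOutsideRange, long cooldownMillis) {
        if (millisOutsideRange < cooldownMillis) {
            return Action.NONE;         // change has not persisted long enough
        }
        if (observedEvents > maxEvents) {
            return Action.SCALE_UP;     // too much load: add partitions
        }
        if (observedEvents < minEvents) {
            return Action.SCALE_DOWN;   // too little load: merge partitions
        }
        return Action.NONE;
    }

    public static void main(String[] args) {
        // Values mirror the defaults used in the snippet above: 10000 ms, 10000 and 30000 events.
        System.out.println(decide(40000, 10000, 30000, 10000, 10000));   // prints SCALE_UP
    }
}
```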
+
+## Example
+An example application for the Fixed Width Parser can be found at: [https://github.com/DataTorrent/examples/tree/master/tutorials/parser](https://github.com/DataTorrent/examples/tree/master/tutorials/parser)

http://git-wip-us.apache.org/repos/asf/apex-malhar/blob/b60cf6eb/docs/operators/images/fixedWidthParser/fixedWidthParser.png
----------------------------------------------------------------------
diff --git a/docs/operators/images/fixedWidthParser/fixedWidthParser.png 
b/docs/operators/images/fixedWidthParser/fixedWidthParser.png
new file mode 100644
index 0000000..3359237
Binary files /dev/null and 
b/docs/operators/images/fixedWidthParser/fixedWidthParser.png differ

http://git-wip-us.apache.org/repos/asf/apex-malhar/blob/b60cf6eb/mkdocs.yml
----------------------------------------------------------------------
diff --git a/mkdocs.yml b/mkdocs.yml
index af78fa4..6ac8b94 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -15,7 +15,8 @@ pages:
     - File Output: operators/file_output.md
     - File Splitter: operators/file_splitter.md
     - Filter: operators/filter.md
-    - JDBC Output Operator: 
operators/AbstractJdbcTransactionableOutputOperator.md
+    - Fixed Width Parser: operators/fixedWidthParserOperator.md
+    - Jdbc Output Operator: 
operators/AbstractJdbcTransactionableOutputOperator.md
     - JDBC Poller Input: operators/jdbcPollInputOperator.md
     - JMS Input: operators/jmsInputOperator.md
     - JSON Formatter: operators/jsonFormatter.md
