[jira] [Commented] (APEXCORE-332) Support Even Distribution Of Tuples To A Non Power Of 2 Number Of Partitions

2016-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303566#comment-15303566
 ] 

ASF GitHub Bot commented on APEXCORE-332:
-

Github user ilganeli commented on the pull request:


https://github.com/apache/incubator-apex-core/pull/347#issuecomment-222066531
  
Makes sense / that's why I didn't want to dive too deep. Just enough to get 
a feel for what it would look like. 


> Support Even Distribution Of Tuples To A Non Power Of 2 Number Of Partitions
> 
>
> Key: APEXCORE-332
> URL: https://issues.apache.org/jira/browse/APEXCORE-332
> Project: Apache Apex Core
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Ilya Ganelin
>Priority: Minor
>
> Currently partitions masks must be defined as a binary mask. As a result the 
> number of partitions must be a power of 2, otherwise the distribution of 
> tuples will be uneven. If we support the modulus operation instead of a 
> binary mask, we could support an even distribution of tuples to a non-power 
> of 2 number of partitions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXCORE-332) Support Even Distribution Of Tuples To A Non Power Of 2 Number Of Partitions

2016-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303463#comment-15303463
 ] 

ASF GitHub Bot commented on APEXCORE-332:
-

Github user vrozov commented on a diff in the pull request:

https://github.com/apache/incubator-apex-core/pull/347#discussion_r64856234
  
--- Diff: .idea/codeStyleSettings.xml ---
@@ -104,5 +104,6 @@
   
 
 
+
   
 
--- End diff --

This file is created by IntelliJ and is supposed not to have final line 
terminator.


> Support Even Distribution Of Tuples To A Non Power Of 2 Number Of Partitions
> 
>
> Key: APEXCORE-332
> URL: https://issues.apache.org/jira/browse/APEXCORE-332
> Project: Apache Apex Core
>  Issue Type: Improvement
>Reporter: Timothy Farkas
>Assignee: Ilya Ganelin
>Priority: Minor
>
> Currently partitions masks must be defined as a binary mask. As a result the 
> number of partitions must be a power of 2, otherwise the distribution of 
> tuples will be uneven. If we support the modulus operation instead of a 
> binary mask, we could support an even distribution of tuples to a non-power 
> of 2 number of partitions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-apex-core pull request: [APEXCORE-332][WIP] Improve part...

2016-05-26 Thread amberarrow
Github user amberarrow commented on a diff in the pull request:

https://github.com/apache/incubator-apex-core/pull/347#discussion_r64847617
  
--- Diff: .idea/codeStyleSettings.xml ---
@@ -104,5 +104,6 @@
   
 
 
+
   
 
--- End diff --

Missing final line terminator


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-apex-core pull request: [APEXCORE-332][WIP] Improve part...

2016-05-26 Thread vrozov
Github user vrozov commented on a diff in the pull request:

https://github.com/apache/incubator-apex-core/pull/347#discussion_r64846471
  
--- Diff: .idea/codeStyleSettings.xml ---
@@ -104,5 +104,6 @@
   
 
 
+
--- End diff --

Please keep original IntelliJ code style settings.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (APEXMALHAR-2086) Kafka Output Operator with Kafka 0.9 API

2016-05-26 Thread Sandesh (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandesh updated APEXMALHAR-2086:

Description: 
Goal : 2 Operartors for Kafka Output

  1. Simple Kafka Output Operator 
- Supports Atleast Once 
- Expose most used producer properties as class properties

  2. Exactly Once Kafka Output ( Not possible in all the cases, will be 
documented later )


Design for Exactly Once

Window Data Manager - Stores the Kafka partitions offsets.
Kafka Key - Used by the operator = AppID#OperatorId

During recovery. Partially written window is re-created using the following  
approach:

Tuples between the largest recovery offsets and the current offset are checked. 
Based on the key, tuples written by the other entities are discarded. 

Only tuples which are not in the recovered set are emitted.
  

  was:
Goal : 2 Operartors for Kafka Output

  1. Simple Kafka Output Operator 
- Supports Atleast Once 
- Expose most used producer properties as class properties

  2. Exactly Once Kafka Output ( Not possible in all the cases, will be 
documented later )

  


> Kafka Output Operator with Kafka 0.9 API
> 
>
> Key: APEXMALHAR-2086
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2086
> Project: Apache Apex Malhar
>  Issue Type: New Feature
>Reporter: Sandesh
>Assignee: Sandesh
>
> Goal : 2 Operartors for Kafka Output
>   1. Simple Kafka Output Operator 
> - Supports Atleast Once 
> - Expose most used producer properties as class properties
>   2. Exactly Once Kafka Output ( Not possible in all the cases, will be 
> documented later )
> 
> Design for Exactly Once
> Window Data Manager - Stores the Kafka partitions offsets.
> Kafka Key - Used by the operator = AppID#OperatorId
> During recovery. Partially written window is re-created using the following  
> approach:
> Tuples between the largest recovery offsets and the current offset are 
> checked. Based on the key, tuples written by the other entities are 
> discarded. 
> Only tuples which are not in the recovered set are emitted.
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Kafka Exactly once output operator

2016-05-26 Thread Sandesh Hegde
Current design:

Window Data Manager - Stores the Kafka partitions offsets.
Kafka Key - Used by the operator = AppID#OperatorId

During recovery. Partially written window is re-created using the following
 approach:

Tuples between the largest recovery offsets and the current offset are
checked. Based on the key, tuples written by the other entities are
discarded.

Only tuples which are not in the recovered set are emitted.

Here is the first cut of the design
https://github.com/apache/incubator-apex-malhar/pull/298

Please give your feedbacks on the design.

@Bright,
Recovery data needs to be present in the Key, to distinguish the tuple
coming from the different instances of the output operator or external
applications.

Thanks

On Fri, May 13, 2016 at 2:14 PM Bright Chen  wrote:

> Hi Sandesh,
> I think it’s maybe better to keep it into Jira.
>
> Do you mean keep the key in other Kafka topic or the key is in fact the
> key of Kafka Message which represent user tuple?
> If it  is separate key, how to keep the relation between key and value?
> If Key is the key of Kafka message, basically, it will change the expected
> data. As I understand, the key here is just used for recovery, it’s not the
> data user required. And the data which write to the Kafka probably need to
> be decided by the customer logic.
>
> Think about a customer build two applications with our operator, the first
> application write data to Kafka, the second one read data from Kafka. And
> at the very beginning, the first application was implemented by a
> none-exactly once output operator, and then changed to exactly once
> operator. I think the customer don’t expect to change the second
> application. But the second application has to be changed if it’s logic
> depended on key.
>
> thanks
> Bright
>
> > On May 13, 2016, at 12:37 PM, Sandesh Hegde 
> wrote:
> >
> > Hi All,
> >
> > I am working on Kafka 0.9 output operator and one of the requirement is
> to
> > implement Exactly Once Output operator. Here is the one possible idea,
> > please give your feedback or suggest new design.
> >
> >
> -
> >
> > Use *Key* to store meta information which is used during recovery.
> >
> > Operator users will use *Value* to store their key-value pair and
> implement
> > the Kafka partitioning accordingly.
> >
> > Format of the *Key* is as specified below:
> >
> >
> >
> > Key = 1. OperatorName#ApexPartitionId#WindowId#Message#MessageId ( During
> > message write )
> >
> > 2. OperatorName#ApexPartitionId#WindowId#CheckPoint ( During end
> > Window )
> >
> > During End window, checkpoint marker is written to all the Kafka
> partitions
> > of the topic.
> >
> > Every message is given a message id, counter-reset every window, and then
> > written to Kafka.
> >
> > During recovery, Kafka partitions are read until the last checkpoint
> > message from this operator is reached and the partially written window is
> > constructed.
> >
> >
> 
> >
> > Note: Existing Kafka exactly once operator, ( Kafka 0.8 ) also needs to
> be
> > re-written.
> >
> > Thanks
>
>


[GitHub] incubator-apex-malhar pull request: *For review only* [APEXMALHAR-...

2016-05-26 Thread sandeshh
GitHub user sandeshh opened a pull request:

https://github.com/apache/incubator-apex-malhar/pull/298

*For review only* [APEXMALHAR-2086] Kafka output: 0.9.1 first cut

Kafka output exactly once operator and the regular output operator.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sandeshh/incubator-apex-malhar APEXMALHAR-2086

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-apex-malhar/pull/298.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #298


commit ea26306b1bc4a3d13ef3e28a222032ad7d1f6955
Author: sandeshh 
Date:   2016-05-25T15:56:56Z

Kafka output: 0.9.1 first cut




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (APEXCORE-208) StreamingContainer.java Contains Violations of Style Guide

2016-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303221#comment-15303221
 ] 

ASF GitHub Bot commented on APEXCORE-208:
-

Github user ilganeli closed the pull request at:

https://github.com/apache/incubator-apex-core/pull/346


> StreamingContainer.java Contains Violations of Style Guide
> --
>
> Key: APEXCORE-208
> URL: https://issues.apache.org/jira/browse/APEXCORE-208
> Project: Apache Apex Core
>  Issue Type: Sub-task
>Reporter: Ilya Ganelin
>
> StreamingContainer.java needs to be updated to conform to CodeStyle checks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-apex-core pull request: [APEXCORE-208] Fixed checkstyle ...

2016-05-26 Thread davidyan74
Github user davidyan74 commented on the pull request:


https://github.com/apache/incubator-apex-core/pull/346#issuecomment-222028058
  
@ilganeli I see that you're changing the code by wrapping the long lines. 
We some time ago decided on the Apex dev mailing list that we are not enforcing 
the character limit per line. See 
[here](http://mail-archives.apache.org/mod_mbox/apex-dev/201512.mbox/%3ccakjfldmw68-afrmtxj_7l8y36scojrzyhbkw3axcdmc8po0...@mail.gmail.com%3E).

If you're using IntelliJ IDEA, you can turn on the "Soft Wrap" feature so 
that you don't have to fiddle with the horizontal scrollbar.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-apex-malhar pull request: APEXMALHAR-2096: Add property ...

2016-05-26 Thread gauravgopi123
Github user gauravgopi123 commented on a diff in the pull request:


https://github.com/apache/incubator-apex-malhar/pull/287#discussion_r64840077
  
--- Diff: 
library/src/main/java/com/datatorrent/lib/io/fs/FSInputModule.java ---
@@ -59,6 +58,9 @@
   private long blockSize;
   private boolean sequencialFileRead = false;
   private int readersCount;
+  @NotNull
+  @Min(1)
+  protected Integer blocksThreshold;
--- End diff --

why Integer type and not primitive int type?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (APEXMALHAR-2096) Add blockThreshold parameter to FSInputModule

2016-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303198#comment-15303198
 ] 

ASF GitHub Bot commented on APEXMALHAR-2096:


Github user gauravgopi123 commented on a diff in the pull request:


https://github.com/apache/incubator-apex-malhar/pull/287#discussion_r64840077
  
--- Diff: 
library/src/main/java/com/datatorrent/lib/io/fs/FSInputModule.java ---
@@ -59,6 +58,9 @@
   private long blockSize;
   private boolean sequencialFileRead = false;
   private int readersCount;
+  @NotNull
+  @Min(1)
+  protected Integer blocksThreshold;
--- End diff --

why Integer type and not primitive int type?


> Add blockThreshold parameter to FSInputModule
> -
>
> Key: APEXMALHAR-2096
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2096
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>Reporter: Priyanka Gugale
>Assignee: Priyanka Gugale
>
> FileSplitter is very fast, the downstream operators can't work at that speed, 
> so we need to limit the speed with which fileSplitter works.
> One way to do is limit number of blocks emitted per window.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2096) Add blockThreshold parameter to FSInputModule

2016-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303179#comment-15303179
 ] 

ASF GitHub Bot commented on APEXMALHAR-2096:


Github user chandnisingh commented on a diff in the pull request:


https://github.com/apache/incubator-apex-malhar/pull/287#discussion_r64839243
  
--- Diff: 
library/src/main/java/com/datatorrent/lib/io/fs/AbstractFileSplitter.java ---
@@ -53,6 +53,7 @@
* This is a threshold on the no. of blocks emitted per window. A lot of 
blocks emitted
* per window can overwhelm the downstream operators. This setting helps 
to control that.
*/
+  @NotNull
--- End diff --

Please don't change the type.  Having the constraint min(1) is sufficient 


> Add blockThreshold parameter to FSInputModule
> -
>
> Key: APEXMALHAR-2096
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2096
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>Reporter: Priyanka Gugale
>Assignee: Priyanka Gugale
>
> FileSplitter is very fast, the downstream operators can't work at that speed, 
> so we need to limit the speed with which fileSplitter works.
> One way to do is limit number of blocks emitted per window.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXCORE-208) StreamingContainer.java Contains Violations of Style Guide

2016-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303135#comment-15303135
 ] 

ASF GitHub Bot commented on APEXCORE-208:
-

GitHub user ilganeli opened a pull request:

https://github.com/apache/incubator-apex-core/pull/346

[APEXCORE-208] Fixed checkstyle violations in StreamingContainer.java

- Fixed checkstyle violations in StreamingContainer.java. 
- Updated apex-style configuration in .idea to be consistent with 
CheckStyle settings in regards to line wrapping

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ilganeli/incubator-apex-core APEXCORE-208

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-apex-core/pull/346.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #346


commit 32570003bcff059ae9a8f70f7129baf9681ac8e6
Author: Ilya Ganelin 
Date:   2016-05-26T22:58:32Z

Fixed checkstyle violations in StreamingContainer.java. Updated apex-style 
configuration in .idea to be consistent with checkstyle settings.




> StreamingContainer.java Contains Violations of Style Guide
> --
>
> Key: APEXCORE-208
> URL: https://issues.apache.org/jira/browse/APEXCORE-208
> Project: Apache Apex Core
>  Issue Type: Sub-task
>Reporter: Ilya Ganelin
>
> StreamingContainer.java needs to be updated to conform to CodeStyle checks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-apex-core pull request: [APEXCORE-208] Fixed checkstyle ...

2016-05-26 Thread ilganeli
GitHub user ilganeli opened a pull request:

https://github.com/apache/incubator-apex-core/pull/346

[APEXCORE-208] Fixed checkstyle violations in StreamingContainer.java

- Fixed checkstyle violations in StreamingContainer.java. 
- Updated apex-style configuration in .idea to be consistent with 
CheckStyle settings in regards to line wrapping

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ilganeli/incubator-apex-core APEXCORE-208

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-apex-core/pull/346.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #346


commit 32570003bcff059ae9a8f70f7129baf9681ac8e6
Author: Ilya Ganelin 
Date:   2016-05-26T22:58:32Z

Fixed checkstyle violations in StreamingContainer.java. Updated apex-style 
configuration in .idea to be consistent with checkstyle settings.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-apex-malhar pull request: APEXMALHAR-2096: Add property ...

2016-05-26 Thread chandnisingh
Github user chandnisingh commented on a diff in the pull request:


https://github.com/apache/incubator-apex-malhar/pull/287#discussion_r64835548
  
--- Diff: 
library/src/main/java/com/datatorrent/lib/io/fs/AbstractFileSplitter.java ---
@@ -53,6 +53,7 @@
* This is a threshold on the no. of blocks emitted per window. A lot of 
blocks emitted
* per window can overwhelm the downstream operators. This setting helps 
to control that.
*/
+  @NotNull
--- End diff --

@DT-Priyanka 
It is of type primitive int. So default value will be 0. Since the min is 
1, it will be necessary for the user to set it. @NotNull is not relevant here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (APEXMALHAR-2096) Add blockThreshold parameter to FSInputModule

2016-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303116#comment-15303116
 ] 

ASF GitHub Bot commented on APEXMALHAR-2096:


Github user DT-Priyanka commented on a diff in the pull request:


https://github.com/apache/incubator-apex-malhar/pull/287#discussion_r64835068
  
--- Diff: 
library/src/main/java/com/datatorrent/lib/io/fs/AbstractFileSplitter.java ---
@@ -53,6 +53,7 @@
* This is a threshold on the no. of blocks emitted per window. A lot of 
blocks emitted
* per window can overwhelm the downstream operators. This setting helps 
to control that.
*/
+  @NotNull
--- End diff --

As per discussions on pull request #283 we decided to make blocksThreshold 
as compulsory parameter, so user always sets it. 


> Add blockThreshold parameter to FSInputModule
> -
>
> Key: APEXMALHAR-2096
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2096
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>Reporter: Priyanka Gugale
>Assignee: Priyanka Gugale
>
> FileSplitter is very fast, the downstream operators can't work at that speed, 
> so we need to limit the speed with which fileSplitter works.
> One way to do is limit number of blocks emitted per window.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-apex-malhar pull request: APEXMALHAR-2096: Add property ...

2016-05-26 Thread DT-Priyanka
Github user DT-Priyanka commented on a diff in the pull request:


https://github.com/apache/incubator-apex-malhar/pull/287#discussion_r64835068
  
--- Diff: 
library/src/main/java/com/datatorrent/lib/io/fs/AbstractFileSplitter.java ---
@@ -53,6 +53,7 @@
* This is a threshold on the no. of blocks emitted per window. A lot of 
blocks emitted
* per window can overwhelm the downstream operators. This setting helps 
to control that.
*/
+  @NotNull
--- End diff --

As per discussions on pull request #283 we decided to make blocksThreshold 
as compulsory parameter, so user always sets it. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (APEXCORE-172) Fix License Header In StatefulStreamCodec

2016-05-26 Thread Thomas Weise (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXCORE-172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Weise updated APEXCORE-172:
--
Assignee: Ilya Ganelin  (was: Timothy Farkas)

> Fix License Header In StatefulStreamCodec
> -
>
> Key: APEXCORE-172
> URL: https://issues.apache.org/jira/browse/APEXCORE-172
> Project: Apache Apex Core
>  Issue Type: Bug
>Reporter: Timothy Farkas
>Assignee: Ilya Ganelin
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (APEXCORE-172) Fix License Header In StatefulStreamCodec

2016-05-26 Thread Thomas Weise (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXCORE-172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Weise closed APEXCORE-172.
-
Resolution: Fixed

> Fix License Header In StatefulStreamCodec
> -
>
> Key: APEXCORE-172
> URL: https://issues.apache.org/jira/browse/APEXCORE-172
> Project: Apache Apex Core
>  Issue Type: Bug
>Reporter: Timothy Farkas
>Assignee: Ilya Ganelin
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-apex-core pull request: [APEXCORE-172] Fixed license hea...

2016-05-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-apex-core/pull/345


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (APEXCORE-172) Fix License Header In StatefulStreamCodec

2016-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303082#comment-15303082
 ] 

ASF GitHub Bot commented on APEXCORE-172:
-

GitHub user ilganeli opened a pull request:

https://github.com/apache/incubator-apex-core/pull/345

[APEXCORE-172] Fixed license header in StatefulStreamCodec.java

* Fixed license header to be correct.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ilganeli/incubator-apex-core APEXCORE-172

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-apex-core/pull/345.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #345


commit b12e8b01c7f471a91d7891a11e04cdae93c03a90
Author: Ilya Ganelin 
Date:   2016-05-26T22:30:15Z

Fixed license header in StatefulStreamCodec.java




> Fix License Header In StatefulStreamCodec
> -
>
> Key: APEXCORE-172
> URL: https://issues.apache.org/jira/browse/APEXCORE-172
> Project: Apache Apex Core
>  Issue Type: Bug
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-apex-malhar pull request: APEXMALHAR-2096: Add property ...

2016-05-26 Thread gauravgopi123
Github user gauravgopi123 commented on a diff in the pull request:


https://github.com/apache/incubator-apex-malhar/pull/287#discussion_r64828487
  
--- Diff: 
library/src/main/java/com/datatorrent/lib/io/fs/FSInputModule.java ---
@@ -59,6 +58,9 @@
   private long blockSize;
   private boolean sequencialFileRead = false;
   private int readersCount;
+  @NotNull
--- End diff --

Is this required?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (APEXMALHAR-2066) Add jdbc poller input operator

2016-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303005#comment-15303005
 ] 

ASF GitHub Bot commented on APEXMALHAR-2066:


Github user gauravgopi123 commented on a diff in the pull request:


https://github.com/apache/incubator-apex-malhar/pull/282#discussion_r64826973
  
--- Diff: 
library/src/main/java/com/datatorrent/lib/db/jdbc/AbstractJdbcPollInputOperator.java
 ---
@@ -0,0 +1,652 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package com.datatorrent.lib.db.jdbc;
+
+import java.io.IOException;
+import java.sql.PreparedStatement;
+import java.sql.ResultSet;
+import java.sql.SQLException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ArrayBlockingQueue;
+import java.util.concurrent.atomic.AtomicReference;
+
+import javax.validation.constraints.Min;
+
+import org.slf4j.LoggerFactory;
+
+import org.apache.apex.malhar.lib.wal.FSWindowDataManager;
+import org.apache.apex.malhar.lib.wal.WindowDataManager;
+import org.apache.commons.lang3.tuple.MutablePair;
+
+import com.datatorrent.api.Context;
+import com.datatorrent.api.Context.OperatorContext;
+import com.datatorrent.api.DefaultPartition;
+import com.datatorrent.api.Operator.ActivationListener;
+import com.datatorrent.api.Operator.IdleTimeHandler;
+import com.datatorrent.api.Partitioner;
+import com.datatorrent.api.annotation.Stateless;
+import com.datatorrent.lib.db.AbstractStoreInputOperator;
+import com.datatorrent.lib.util.KeyValPair;
+import com.datatorrent.lib.util.KryoCloneUtils;
+import com.datatorrent.netlet.util.DTThrowable;
+
+/**
+ * Abstract operator for for consuming data using JDBC interface
+ * User needs User needs to provide
+ * tableName,dbConnection,setEmitColumnList,look-up key 
+ * Optionally batchSize,pollInterval,Look-up key and a where clause can be 
given
+ * 
+ * This operator uses static partitioning to arrive at range queries for 
exactly
+ * once reads
+ * Assumption is that there is an ordered column using which range queries 
can
+ * be formed
+ * If an emitColumnList is provided, please ensure that the keyColumn is 
the
+ * first column in the list
+ * Range queries are formed using the {@link JdbcMetaDataUtility}} Output -
+ * comma separated list of the emit columns eg columnA,columnB,columnC
+ * 
+ * @displayName Jdbc Polling Input Operator
+ * @category Input
+ * @tags database, sql, jdbc, partitionable,exactlyOnce
+ */
+public abstract class AbstractJdbcPollInputOperator extends 
AbstractStoreInputOperator
--- End diff --

How is AbstractJdbcPollInputOperator different from 
AbstractJdbcInputOperator?


> Add jdbc poller input operator
> --
>
> Key: APEXMALHAR-2066
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2066
> Project: Apache Apex Malhar
>  Issue Type: Task
>Reporter: Ashwin Chandra Putta
>Assignee: devendra tagare
>
> Create a JDBC poller input operator that has the following features.
> 1. poll from external jdbc store asynchronously in the input operator.
> 2. polling frequency and batch size should be configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXMALHAR-2066) Add jdbc poller input operator

2016-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303002#comment-15303002
 ] 

ASF GitHub Bot commented on APEXMALHAR-2066:


Github user devtagare commented on a diff in the pull request:


https://github.com/apache/incubator-apex-malhar/pull/282#discussion_r64826542
  
--- Diff: 
library/src/main/java/com/datatorrent/lib/db/jdbc/AbstractJdbcPollInputOperator.java
 ---
@@ -0,0 +1,652 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package com.datatorrent.lib.db.jdbc;
+
+import java.io.IOException;
+import java.sql.PreparedStatement;
+import java.sql.ResultSet;
+import java.sql.SQLException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ArrayBlockingQueue;
+import java.util.concurrent.atomic.AtomicReference;
+
+import javax.validation.constraints.Min;
+
+import org.slf4j.LoggerFactory;
+
+import org.apache.apex.malhar.lib.wal.FSWindowDataManager;
+import org.apache.apex.malhar.lib.wal.WindowDataManager;
+import org.apache.commons.lang3.tuple.MutablePair;
+
+import com.datatorrent.api.Context;
+import com.datatorrent.api.Context.OperatorContext;
+import com.datatorrent.api.DefaultPartition;
+import com.datatorrent.api.Operator.ActivationListener;
+import com.datatorrent.api.Operator.IdleTimeHandler;
+import com.datatorrent.api.Partitioner;
+import com.datatorrent.api.annotation.Stateless;
+import com.datatorrent.lib.db.AbstractStoreInputOperator;
+import com.datatorrent.lib.util.KeyValPair;
+import com.datatorrent.lib.util.KryoCloneUtils;
+import com.datatorrent.netlet.util.DTThrowable;
+
+/**
+ * Abstract operator for for consuming data using JDBC interface
+ * User needs User needs to provide
+ * tableName,dbConnection,setEmitColumnList,look-up key 
+ * Optionally batchSize,pollInterval,Look-up key and a where clause can be 
given
+ * 
+ * This operator uses static partitioning to arrive at range queries for 
exactly
+ * once reads
+ * Assumption is that there is an ordered column using which range queries 
can
+ * be formed
+ * If an emitColumnList is provided, please ensure that the keyColumn is 
the
+ * first column in the list
+ * Range queries are formed using the {@link JdbcMetaDataUtility}} Output -
+ * comma separated list of the emit columns eg columnA,columnB,columnC
+ * 
+ * @displayName Jdbc Polling Input Operator
+ * @category Input
+ * @tags database, sql, jdbc, partitionable,exactlyOnce
+ */
+public abstract class AbstractJdbcPollInputOperator extends 
AbstractStoreInputOperator
+implements ActivationListener, IdleTimeHandler, 
Partitioner
+{
+  /*
+   * poll interval in milliseconds
+   */
+  private int pollInterval;
+
+  @Min(1)
+  private int partitionCount = 1;
+  protected transient int operatorId;
+  protected transient boolean isReplayed;
+  protected transient boolean isPollable;
+  protected int batchSize;
+  protected int fetchSize;
+  /**
+   * Map of windowId to  of the range key
+   */
+  protected transient MutablePair 
currentWindowRecoveryState;
+
+  /**
+   * size of the emit queue used to hold polled records before emit
+   */
+  private int queueCapacity = 4 * 1024 * 1024;
+  private transient volatile boolean execute;
+  private transient AtomicReference cause;
+  protected transient int spinMillis;
+  private transient OperatorContext context;
+  protected String tableName;
+  protected String key;
+  protected long currentWindowId;
+  protected KeyValPair rangeQueryPair;
+  protected String lower;
+  protected String upper;
+  protected boolean 

[GitHub] incubator-apex-malhar pull request: APEXMALHAR-2066 JdbcPolling,id...

2016-05-26 Thread gauravgopi123
Github user gauravgopi123 commented on a diff in the pull request:


https://github.com/apache/incubator-apex-malhar/pull/282#discussion_r64825894
  
--- Diff: 
library/src/main/java/com/datatorrent/lib/db/jdbc/AbstractJdbcPollInputOperator.java
 ---
@@ -0,0 +1,652 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package com.datatorrent.lib.db.jdbc;
+
+import java.io.IOException;
+import java.sql.PreparedStatement;
+import java.sql.ResultSet;
+import java.sql.SQLException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ArrayBlockingQueue;
+import java.util.concurrent.atomic.AtomicReference;
+
+import javax.validation.constraints.Min;
+
+import org.slf4j.LoggerFactory;
+
+import org.apache.apex.malhar.lib.wal.FSWindowDataManager;
+import org.apache.apex.malhar.lib.wal.WindowDataManager;
+import org.apache.commons.lang3.tuple.MutablePair;
+
+import com.datatorrent.api.Context;
+import com.datatorrent.api.Context.OperatorContext;
+import com.datatorrent.api.DefaultPartition;
+import com.datatorrent.api.Operator.ActivationListener;
+import com.datatorrent.api.Operator.IdleTimeHandler;
+import com.datatorrent.api.Partitioner;
+import com.datatorrent.api.annotation.Stateless;
+import com.datatorrent.lib.db.AbstractStoreInputOperator;
+import com.datatorrent.lib.util.KeyValPair;
+import com.datatorrent.lib.util.KryoCloneUtils;
+import com.datatorrent.netlet.util.DTThrowable;
+
+/**
+ * Abstract operator for for consuming data using JDBC interface
+ * User needs User needs to provide
+ * tableName,dbConnection,setEmitColumnList,look-up key 
+ * Optionally batchSize,pollInterval,Look-up key and a where clause can be 
given
+ * 
+ * This operator uses static partitioning to arrive at range queries for 
exactly
+ * once reads
+ * Assumption is that there is an ordered column using which range queries 
can
+ * be formed
+ * If an emitColumnList is provided, please ensure that the keyColumn is 
the
+ * first column in the list
+ * Range queries are formed using the {@link JdbcMetaDataUtility}} Output -
+ * comma separated list of the emit columns eg columnA,columnB,columnC
+ * 
+ * @displayName Jdbc Polling Input Operator
+ * @category Input
+ * @tags database, sql, jdbc, partitionable,exactlyOnce
+ */
+public abstract class AbstractJdbcPollInputOperator extends 
AbstractStoreInputOperator
+implements ActivationListener, IdleTimeHandler, 
Partitioner
+{
+  /*
+   * poll interval in milliseconds
+   */
+  private int pollInterval;
+
+  @Min(1)
+  private int partitionCount = 1;
+  protected transient int operatorId;
+  protected transient boolean isReplayed;
+  protected transient boolean isPollable;
+  protected int batchSize;
+  protected int fetchSize;
+  /**
+   * Map of windowId to  of the range key
+   */
+  protected transient MutablePair 
currentWindowRecoveryState;
+
+  /**
+   * size of the emit queue used to hold polled records before emit
+   */
+  private int queueCapacity = 4 * 1024 * 1024;
+  private transient volatile boolean execute;
+  private transient AtomicReference cause;
+  protected transient int spinMillis;
+  private transient OperatorContext context;
+  protected String tableName;
+  protected String key;
+  protected long currentWindowId;
+  protected KeyValPair rangeQueryPair;
+  protected String lower;
+  protected String upper;
+  protected boolean recovered;
+  protected boolean isPolled;
+  protected String whereCondition = null;
+  protected String previousUpperBound;
+  protected String highestPolled;
+  private static final String user = "user";
+  private static final 

[jira] [Commented] (APEXMALHAR-2094) Quantiles sketch operator

2016-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302894#comment-15302894
 ] 

ASF GitHub Bot commented on APEXMALHAR-2094:


Github user sandeep-n commented on a diff in the pull request:


https://github.com/apache/incubator-apex-malhar/pull/296#discussion_r64818531
  
--- Diff: 
sketches/src/main/java/org/apache/apex/malhar/sketches/QuantilesEstimator.java 
---
@@ -0,0 +1,188 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.apex.malhar.sketches;
+
+import com.yahoo.sketches.quantiles.QuantilesSketch;
+
+import com.datatorrent.api.DefaultInputPort;
+import com.datatorrent.api.DefaultOutputPort;
+import com.datatorrent.api.annotation.OperatorAnnotation;
+import com.datatorrent.common.util.BaseOperator;
+
+/**
+ * An implementation of BaseOperator that computes a "sketch" (a 
representation of the probability distribution using
+ * a low memory footprint) of the incoming numeric data, and 
evaluates/outputs the cumulative distribution function and
+ * quantiles of the probability distribution. Leverages the quantiles 
sketch implementation from the Yahoo Datasketches
+ * Library.
+ * 
+ * Input Port(s) : 
+ * data :  Data values input port. 
+ * 
+ * Output Port(s) :  
+ * cdfOutput : cumulative distribution function output port. 
+ * quantilesOutput : quantiles output port. 
+ * 
+ * Partitions : No, no will yield wrong results. 
+ * +
+ */
+@OperatorAnnotation(partitionable = false)
+public class QuantilesEstimator extends BaseOperator
+{
+  private transient QuantilesSketch quantilesSketch = 
QuantilesSketch.builder().build();
+
+  /**
+   * Constructor that allows non-default initialization of the quantile 
sketch object
+   *
+   * @param k:Parameter that determines accuracy and memory usage of 
quantile sketch. See QuantilesSketch documentation
+   *  for details
+   * @param seed: The quantile sketch algorithm is inherently random. Set 
seed to 0 for reproducibility in testing, but
+   *  do not set otherwise.
+   */
+  public QuantilesEstimator(int k, short seed)
+  {
+quantilesSketch = QuantilesSketch.builder().setSeed(seed).build(k);
+  }
+
+  /**
+   * Output port that emits cdf estimated at the current data point
+   */
+  public final transient DefaultOutputPort cdfOutput = new 
DefaultOutputPort<>();
+  /**
+   * Emits quantiles of stream seen thus far
+   */
+  public final transient DefaultOutputPort quantilesOutput = new 
DefaultOutputPort<>();
+  /**
+   * Emits probability masses on specified intervals
+   */
+  public final transient DefaultOutputPort pmfOutput = new 
DefaultOutputPort<>();
+  /**
+   * This operator computes three different quantities which are output on 
separate output ports. If not using any of
+   * these quantities, these variables can be set to avoid unnecessary 
computation.
+   */
+
+  private boolean computeCdf = true;
+  private boolean computeQuantiles = true;
+  private boolean computePmf = true;
+
+  public boolean isComputeCdf()
+  {
+return computeCdf;
+  }
+
+  public void setComputeCdf(boolean computeCdf)
+  {
+this.computeCdf = computeCdf;
+  }
+
+  public boolean isComputeQuantiles()
+  {
+return computeQuantiles;
+  }
+
+  public void setComputeQuantiles(boolean computeQuantiles)
+  {
+this.computeQuantiles = computeQuantiles;
+  }
+
+  public boolean isComputePmf()
+  {
+return computePmf;
+  }
+
+  public void setComputePmf(boolean computePmf)
+  {
+this.computePmf = computePmf;
+  }
+
+  public int getK()
+  {
+return quantilesSketch.getK();
+  }
+
+  public double[] getFractions()

[jira] [Commented] (APEXMALHAR-2094) Quantiles sketch operator

2016-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302895#comment-15302895
 ] 

ASF GitHub Bot commented on APEXMALHAR-2094:


Github user sandeep-n commented on a diff in the pull request:


https://github.com/apache/incubator-apex-malhar/pull/296#discussion_r64818557
  
--- Diff: 
sketches/src/main/java/org/apache/apex/malhar/sketches/QuantilesEstimator.java 
---
@@ -0,0 +1,188 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.apex.malhar.sketches;
+
+import com.yahoo.sketches.quantiles.QuantilesSketch;
+
+import com.datatorrent.api.DefaultInputPort;
+import com.datatorrent.api.DefaultOutputPort;
+import com.datatorrent.api.annotation.OperatorAnnotation;
+import com.datatorrent.common.util.BaseOperator;
+
+/**
+ * An implementation of BaseOperator that computes a "sketch" (a 
representation of the probability distribution using
+ * a low memory footprint) of the incoming numeric data, and 
evaluates/outputs the cumulative distribution function and
+ * quantiles of the probability distribution. Leverages the quantiles 
sketch implementation from the Yahoo Datasketches
+ * Library.
+ * 
+ * Input Port(s) : 
+ * data :  Data values input port. 
+ * 
+ * Output Port(s) :  
+ * cdfOutput : cumulative distribution function output port. 
+ * quantilesOutput : quantiles output port. 
+ * 
+ * Partitions : No, no will yield wrong results. 
+ * +
+ */
+@OperatorAnnotation(partitionable = false)
+public class QuantilesEstimator extends BaseOperator
+{
+  private transient QuantilesSketch quantilesSketch = 
QuantilesSketch.builder().build();
+
+  /**
+   * Constructor that allows non-default initialization of the quantile 
sketch object
+   *
+   * @param k:Parameter that determines accuracy and memory usage of 
quantile sketch. See QuantilesSketch documentation
+   *  for details
+   * @param seed: The quantile sketch algorithm is inherently random. Set 
seed to 0 for reproducibility in testing, but
+   *  do not set otherwise.
+   */
+  public QuantilesEstimator(int k, short seed)
+  {
+quantilesSketch = QuantilesSketch.builder().setSeed(seed).build(k);
+  }
+
+  /**
+   * Output port that emits cdf estimated at the current data point
+   */
+  public final transient DefaultOutputPort cdfOutput = new 
DefaultOutputPort<>();
+  /**
+   * Emits quantiles of stream seen thus far
+   */
+  public final transient DefaultOutputPort quantilesOutput = new 
DefaultOutputPort<>();
+  /**
+   * Emits probability masses on specified intervals
+   */
+  public final transient DefaultOutputPort pmfOutput = new 
DefaultOutputPort<>();
+  /**
+   * This operator computes three different quantities which are output on 
separate output ports. If not using any of
+   * these quantities, these variables can be set to avoid unnecessary 
computation.
+   */
+
+  private boolean computeCdf = true;
+  private boolean computeQuantiles = true;
+  private boolean computePmf = true;
+
+  public boolean isComputeCdf()
+  {
+return computeCdf;
+  }
+
+  public void setComputeCdf(boolean computeCdf)
+  {
+this.computeCdf = computeCdf;
+  }
+
+  public boolean isComputeQuantiles()
+  {
+return computeQuantiles;
+  }
+
+  public void setComputeQuantiles(boolean computeQuantiles)
+  {
+this.computeQuantiles = computeQuantiles;
+  }
+
+  public boolean isComputePmf()
+  {
+return computePmf;
+  }
+
+  public void setComputePmf(boolean computePmf)
+  {
+this.computePmf = computePmf;
+  }
+
+  public int getK()
+  {
+return quantilesSketch.getK();
+  }
+
+  public double[] getFractions()

[GitHub] incubator-apex-malhar pull request: APEXMALHAR-2094 Quantiles Oper...

2016-05-26 Thread sandeep-n
Github user sandeep-n commented on a diff in the pull request:


https://github.com/apache/incubator-apex-malhar/pull/296#discussion_r64818557
  
--- Diff: 
sketches/src/main/java/org/apache/apex/malhar/sketches/QuantilesEstimator.java 
---
@@ -0,0 +1,188 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.apex.malhar.sketches;
+
+import com.yahoo.sketches.quantiles.QuantilesSketch;
+
+import com.datatorrent.api.DefaultInputPort;
+import com.datatorrent.api.DefaultOutputPort;
+import com.datatorrent.api.annotation.OperatorAnnotation;
+import com.datatorrent.common.util.BaseOperator;
+
+/**
+ * An implementation of BaseOperator that computes a "sketch" (a 
representation of the probability distribution using
+ * a low memory footprint) of the incoming numeric data, and 
evaluates/outputs the cumulative distribution function and
+ * quantiles of the probability distribution. Leverages the quantiles 
sketch implementation from the Yahoo Datasketches
+ * Library.
+ * 
+ * Input Port(s) : 
+ * data :  Data values input port. 
+ * 
+ * Output Port(s) :  
+ * cdfOutput : cumulative distribution function output port. 
+ * quantilesOutput : quantiles output port. 
+ * 
+ * Partitions : No, no will yield wrong results. 
+ * +
+ */
+@OperatorAnnotation(partitionable = false)
+public class QuantilesEstimator extends BaseOperator
+{
+  private transient QuantilesSketch quantilesSketch = 
QuantilesSketch.builder().build();
+
+  /**
+   * Constructor that allows non-default initialization of the quantile 
sketch object
+   *
+   * @param k:Parameter that determines accuracy and memory usage of 
quantile sketch. See QuantilesSketch documentation
+   *  for details
+   * @param seed: The quantile sketch algorithm is inherently random. Set 
seed to 0 for reproducibility in testing, but
+   *  do not set otherwise.
+   */
+  public QuantilesEstimator(int k, short seed)
+  {
+quantilesSketch = QuantilesSketch.builder().setSeed(seed).build(k);
+  }
+
+  /**
+   * Output port that emits cdf estimated at the current data point
+   */
+  public final transient DefaultOutputPort cdfOutput = new 
DefaultOutputPort<>();
+  /**
+   * Emits quantiles of stream seen thus far
+   */
+  public final transient DefaultOutputPort quantilesOutput = new 
DefaultOutputPort<>();
+  /**
+   * Emits probability masses on specified intervals
+   */
+  public final transient DefaultOutputPort pmfOutput = new 
DefaultOutputPort<>();
+  /**
+   * This operator computes three different quantities which are output on 
separate output ports. If not using any of
+   * these quantities, these variables can be set to avoid unnecessary 
computation.
+   */
+
+  private boolean computeCdf = true;
+  private boolean computeQuantiles = true;
+  private boolean computePmf = true;
+
+  public boolean isComputeCdf()
+  {
+return computeCdf;
+  }
+
+  public void setComputeCdf(boolean computeCdf)
+  {
+this.computeCdf = computeCdf;
+  }
+
+  public boolean isComputeQuantiles()
+  {
+return computeQuantiles;
+  }
+
+  public void setComputeQuantiles(boolean computeQuantiles)
+  {
+this.computeQuantiles = computeQuantiles;
+  }
+
+  public boolean isComputePmf()
+  {
+return computePmf;
+  }
+
+  public void setComputePmf(boolean computePmf)
+  {
+this.computePmf = computePmf;
+  }
+
+  public int getK()
+  {
+return quantilesSketch.getK();
+  }
+
+  public double[] getFractions()
+  {
+return fractions;
+  }
+
+  public void setFractions(double[] fractions)
+  {
+this.fractions = fractions;
+  }
+
+  public double[] getPmfIntervals()
+  {
+return pmfIntervals;
+  }
 

Re: IntelliJ and Netbeans code styles are missing

2016-05-26 Thread Ganelin, Ilya
Another question - are CheckStyle settings packaged anywhere? There is 
obviously a CheckStyle run by Jenkins - is this CheckStyle published for Apex 
or Malhar?




On 5/24/16, 2:19 PM, "Ganelin, Ilya"  wrote:

>I just realized that these were moved to subdirectories but the associated 
>documentation was never updated.
>I’ll fix that.
>
>
>
>
>On 5/24/16, 2:16 PM, "Ganelin, Ilya"  wrote:
>
>>Hi all – I’ve been setting up a new dev environment and noticed that the 
>>apex-style.jar and apex-style.zip are missing from:
>>https://github.com/apache/incubator-apex-core/tree/master/misc/ide-templates/intellij
>>https://github.com/apache/incubator-apex-core/tree/master/misc/ide-templates/netbeans
>>
>>Did these get lost in the migration from Apache or did we decide to approach 
>>this a different way?
>>
>>In recent work I did in the Alluxio project, I created a utility that would 
>>package/unpack code-style settings allowing changes to be tracked in GitHub 
>>but still provide a relatively automated process for setting up 
>>configurations. Would something like that make sense for Apex?
>>
>>https://github.com/Alluxio/alluxio/pull/3168
>>
>>
>>
>>The information contained in this e-mail is confidential and/or proprietary 
>>to Capital One and/or its affiliates and may only be used solely in 
>>performance of work or services for Capital One. The information transmitted 
>>herewith is intended only for use by the individual or entity to which it is 
>>addressed. If the reader of this message is not the intended recipient, you 
>>are hereby notified that any review, retransmission, dissemination, 
>>distribution, copying or other use of, or taking of any action in reliance 
>>upon this information is strictly prohibited. If you have received this 
>>communication in error, please contact the sender and delete the material 
>>from your computer.
>
>
>The information contained in this e-mail is confidential and/or proprietary to 
>Capital One and/or its affiliates and may only be used solely in performance 
>of work or services for Capital One. The information transmitted herewith is 
>intended only for use by the individual or entity to which it is addressed. If 
>the reader of this message is not the intended recipient, you are hereby 
>notified that any review, retransmission, dissemination, distribution, copying 
>or other use of, or taking of any action in reliance upon this information is 
>strictly prohibited. If you have received this communication in error, please 
>contact the sender and delete the material from your computer.


The information contained in this e-mail is confidential and/or proprietary to 
Capital One and/or its affiliates and may only be used solely in performance of 
work or services for Capital One. The information transmitted herewith is 
intended only for use by the individual or entity to which it is addressed. If 
the reader of this message is not the intended recipient, you are hereby 
notified that any review, retransmission, dissemination, distribution, copying 
or other use of, or taking of any action in reliance upon this information is 
strictly prohibited. If you have received this communication in error, please 
contact the sender and delete the material from your computer.


[jira] [Commented] (APEXMALHAR-2094) Quantiles sketch operator

2016-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302823#comment-15302823
 ] 

ASF GitHub Bot commented on APEXMALHAR-2094:


Github user ilganeli commented on a diff in the pull request:


https://github.com/apache/incubator-apex-malhar/pull/296#discussion_r64810211
  
--- Diff: 
sketches/src/main/java/org/apache/apex/malhar/sketches/QuantilesEstimator.java 
---
@@ -0,0 +1,188 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.apex.malhar.sketches;
+
+import com.yahoo.sketches.quantiles.QuantilesSketch;
+
+import com.datatorrent.api.DefaultInputPort;
+import com.datatorrent.api.DefaultOutputPort;
+import com.datatorrent.api.annotation.OperatorAnnotation;
+import com.datatorrent.common.util.BaseOperator;
+
+/**
+ * An implementation of BaseOperator that computes a "sketch" (a 
representation of the probability distribution using
+ * a low memory footprint) of the incoming numeric data, and 
evaluates/outputs the cumulative distribution function and
+ * quantiles of the probability distribution. Leverages the quantiles 
sketch implementation from the Yahoo Datasketches
+ * Library.
+ * 
+ * Input Port(s) : 
+ * data :  Data values input port. 
+ * 
+ * Output Port(s) :  
+ * cdfOutput : cumulative distribution function output port. 
+ * quantilesOutput : quantiles output port. 
+ * 
+ * Partitions : No, no will yield wrong results. 
+ * +
+ */
+@OperatorAnnotation(partitionable = false)
+public class QuantilesEstimator extends BaseOperator
+{
+  private transient QuantilesSketch quantilesSketch = 
QuantilesSketch.builder().build();
+
+  /**
+   * Constructor that allows non-default initialization of the quantile 
sketch object
+   *
+   * @param k:Parameter that determines accuracy and memory usage of 
quantile sketch. See QuantilesSketch documentation
+   *  for details
+   * @param seed: The quantile sketch algorithm is inherently random. Set 
seed to 0 for reproducibility in testing, but
+   *  do not set otherwise.
+   */
+  public QuantilesEstimator(int k, short seed)
+  {
+quantilesSketch = QuantilesSketch.builder().setSeed(seed).build(k);
+  }
+
+  /**
+   * Output port that emits cdf estimated at the current data point
+   */
+  public final transient DefaultOutputPort cdfOutput = new 
DefaultOutputPort<>();
+  /**
+   * Emits quantiles of stream seen thus far
+   */
+  public final transient DefaultOutputPort quantilesOutput = new 
DefaultOutputPort<>();
+  /**
+   * Emits probability masses on specified intervals
+   */
+  public final transient DefaultOutputPort pmfOutput = new 
DefaultOutputPort<>();
+  /**
+   * This operator computes three different quantities which are output on 
separate output ports. If not using any of
+   * these quantities, these variables can be set to avoid unnecessary 
computation.
+   */
+
+  private boolean computeCdf = true;
+  private boolean computeQuantiles = true;
+  private boolean computePmf = true;
+
+  public boolean isComputeCdf()
+  {
+return computeCdf;
+  }
+
+  public void setComputeCdf(boolean computeCdf)
+  {
+this.computeCdf = computeCdf;
+  }
+
+  public boolean isComputeQuantiles()
+  {
+return computeQuantiles;
+  }
+
+  public void setComputeQuantiles(boolean computeQuantiles)
+  {
+this.computeQuantiles = computeQuantiles;
+  }
+
+  public boolean isComputePmf()
+  {
+return computePmf;
+  }
+
+  public void setComputePmf(boolean computePmf)
+  {
+this.computePmf = computePmf;
+  }
+
+  public int getK()
+  {
+return quantilesSketch.getK();
+  }
+
+  public double[] getFractions()
 

[jira] [Commented] (APEXMALHAR-2094) Quantiles sketch operator

2016-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302821#comment-15302821
 ] 

ASF GitHub Bot commented on APEXMALHAR-2094:


Github user ilganeli commented on a diff in the pull request:


https://github.com/apache/incubator-apex-malhar/pull/296#discussion_r64810147
  
--- Diff: 
sketches/src/main/java/org/apache/apex/malhar/sketches/QuantilesEstimator.java 
---
@@ -0,0 +1,188 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.apex.malhar.sketches;
+
+import com.yahoo.sketches.quantiles.QuantilesSketch;
+
+import com.datatorrent.api.DefaultInputPort;
+import com.datatorrent.api.DefaultOutputPort;
+import com.datatorrent.api.annotation.OperatorAnnotation;
+import com.datatorrent.common.util.BaseOperator;
+
+/**
+ * An implementation of BaseOperator that computes a "sketch" (a 
representation of the probability distribution using
+ * a low memory footprint) of the incoming numeric data, and 
evaluates/outputs the cumulative distribution function and
+ * quantiles of the probability distribution. Leverages the quantiles 
sketch implementation from the Yahoo Datasketches
+ * Library.
+ * 
+ * Input Port(s) : 
+ * data :  Data values input port. 
+ * 
+ * Output Port(s) :  
+ * cdfOutput : cumulative distribution function output port. 
+ * quantilesOutput : quantiles output port. 
+ * 
+ * Partitions : No, no will yield wrong results. 
+ * +
+ */
+@OperatorAnnotation(partitionable = false)
+public class QuantilesEstimator extends BaseOperator
+{
+  private transient QuantilesSketch quantilesSketch = 
QuantilesSketch.builder().build();
+
+  /**
+   * Constructor that allows non-default initialization of the quantile 
sketch object
+   *
+   * @param k:Parameter that determines accuracy and memory usage of 
quantile sketch. See QuantilesSketch documentation
+   *  for details
+   * @param seed: The quantile sketch algorithm is inherently random. Set 
seed to 0 for reproducibility in testing, but
+   *  do not set otherwise.
+   */
+  public QuantilesEstimator(int k, short seed)
+  {
+quantilesSketch = QuantilesSketch.builder().setSeed(seed).build(k);
+  }
+
+  /**
+   * Output port that emits cdf estimated at the current data point
+   */
+  public final transient DefaultOutputPort cdfOutput = new 
DefaultOutputPort<>();
+  /**
+   * Emits quantiles of stream seen thus far
+   */
+  public final transient DefaultOutputPort quantilesOutput = new 
DefaultOutputPort<>();
+  /**
+   * Emits probability masses on specified intervals
+   */
+  public final transient DefaultOutputPort pmfOutput = new 
DefaultOutputPort<>();
+  /**
+   * This operator computes three different quantities which are output on 
separate output ports. If not using any of
+   * these quantities, these variables can be set to avoid unnecessary 
computation.
+   */
+
+  private boolean computeCdf = true;
+  private boolean computeQuantiles = true;
+  private boolean computePmf = true;
+
+  public boolean isComputeCdf()
+  {
+return computeCdf;
+  }
+
+  public void setComputeCdf(boolean computeCdf)
+  {
+this.computeCdf = computeCdf;
+  }
+
+  public boolean isComputeQuantiles()
+  {
+return computeQuantiles;
+  }
+
+  public void setComputeQuantiles(boolean computeQuantiles)
+  {
+this.computeQuantiles = computeQuantiles;
+  }
+
+  public boolean isComputePmf()
+  {
+return computePmf;
+  }
+
+  public void setComputePmf(boolean computePmf)
+  {
+this.computePmf = computePmf;
+  }
+
+  public int getK()
+  {
+return quantilesSketch.getK();
+  }
+
+  public double[] getFractions()
 

[GitHub] incubator-apex-malhar pull request: APEXMALHAR-2094 Quantiles Oper...

2016-05-26 Thread ilganeli
Github user ilganeli commented on a diff in the pull request:


https://github.com/apache/incubator-apex-malhar/pull/296#discussion_r64810211
  
--- Diff: 
sketches/src/main/java/org/apache/apex/malhar/sketches/QuantilesEstimator.java 
---
@@ -0,0 +1,188 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.apex.malhar.sketches;
+
+import com.yahoo.sketches.quantiles.QuantilesSketch;
+
+import com.datatorrent.api.DefaultInputPort;
+import com.datatorrent.api.DefaultOutputPort;
+import com.datatorrent.api.annotation.OperatorAnnotation;
+import com.datatorrent.common.util.BaseOperator;
+
+/**
+ * An implementation of BaseOperator that computes a "sketch" (a 
representation of the probability distribution using
+ * a low memory footprint) of the incoming numeric data, and 
evaluates/outputs the cumulative distribution function and
+ * quantiles of the probability distribution. Leverages the quantiles 
sketch implementation from the Yahoo Datasketches
+ * Library.
+ * 
+ * Input Port(s) : 
+ * data :  Data values input port. 
+ * 
+ * Output Port(s) :  
+ * cdfOutput : cumulative distribution function output port. 
+ * quantilesOutput : quantiles output port. 
+ * 
+ * Partitions : No, no will yield wrong results. 
+ * +
+ */
+@OperatorAnnotation(partitionable = false)
+public class QuantilesEstimator extends BaseOperator
+{
+  private transient QuantilesSketch quantilesSketch = 
QuantilesSketch.builder().build();
+
+  /**
+   * Constructor that allows non-default initialization of the quantile 
sketch object
+   *
+   * @param k:Parameter that determines accuracy and memory usage of 
quantile sketch. See QuantilesSketch documentation
+   *  for details
+   * @param seed: The quantile sketch algorithm is inherently random. Set 
seed to 0 for reproducibility in testing, but
+   *  do not set otherwise.
+   */
+  public QuantilesEstimator(int k, short seed)
+  {
+quantilesSketch = QuantilesSketch.builder().setSeed(seed).build(k);
+  }
+
+  /**
+   * Output port that emits cdf estimated at the current data point
+   */
+  public final transient DefaultOutputPort cdfOutput = new 
DefaultOutputPort<>();
+  /**
+   * Emits quantiles of stream seen thus far
+   */
+  public final transient DefaultOutputPort quantilesOutput = new 
DefaultOutputPort<>();
+  /**
+   * Emits probability masses on specified intervals
+   */
+  public final transient DefaultOutputPort pmfOutput = new 
DefaultOutputPort<>();
+  /**
+   * This operator computes three different quantities which are output on 
separate output ports. If not using any of
+   * these quantities, these variables can be set to avoid unnecessary 
computation.
+   */
+
+  private boolean computeCdf = true;
+  private boolean computeQuantiles = true;
+  private boolean computePmf = true;
+
+  public boolean isComputeCdf()
+  {
+return computeCdf;
+  }
+
+  public void setComputeCdf(boolean computeCdf)
+  {
+this.computeCdf = computeCdf;
+  }
+
+  public boolean isComputeQuantiles()
+  {
+return computeQuantiles;
+  }
+
+  public void setComputeQuantiles(boolean computeQuantiles)
+  {
+this.computeQuantiles = computeQuantiles;
+  }
+
+  public boolean isComputePmf()
+  {
+return computePmf;
+  }
+
+  public void setComputePmf(boolean computePmf)
+  {
+this.computePmf = computePmf;
+  }
+
+  public int getK()
+  {
+return quantilesSketch.getK();
+  }
+
+  public double[] getFractions()
+  {
+return fractions;
+  }
+
+  public void setFractions(double[] fractions)
+  {
+this.fractions = fractions;
+  }
+
+  public double[] getPmfIntervals()
+  {
+return pmfIntervals;
+  }
  

[GitHub] incubator-apex-malhar pull request: APEXMALHAR-2094 Quantiles Oper...

2016-05-26 Thread ilganeli
Github user ilganeli commented on a diff in the pull request:


https://github.com/apache/incubator-apex-malhar/pull/296#discussion_r64810147
  
--- Diff: 
sketches/src/main/java/org/apache/apex/malhar/sketches/QuantilesEstimator.java 
---
@@ -0,0 +1,188 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.apex.malhar.sketches;
+
+import com.yahoo.sketches.quantiles.QuantilesSketch;
+
+import com.datatorrent.api.DefaultInputPort;
+import com.datatorrent.api.DefaultOutputPort;
+import com.datatorrent.api.annotation.OperatorAnnotation;
+import com.datatorrent.common.util.BaseOperator;
+
+/**
+ * An implementation of BaseOperator that computes a "sketch" (a 
representation of the probability distribution using
+ * a low memory footprint) of the incoming numeric data, and 
evaluates/outputs the cumulative distribution function and
+ * quantiles of the probability distribution. Leverages the quantiles 
sketch implementation from the Yahoo Datasketches
+ * Library.
+ * 
+ * Input Port(s) : 
+ * data :  Data values input port. 
+ * 
+ * Output Port(s) :  
+ * cdfOutput : cumulative distribution function output port. 
+ * quantilesOutput : quantiles output port. 
+ * 
+ * Partitions : No, no will yield wrong results. 
+ * +
+ */
+@OperatorAnnotation(partitionable = false)
+public class QuantilesEstimator extends BaseOperator
+{
+  private transient QuantilesSketch quantilesSketch = 
QuantilesSketch.builder().build();
+
+  /**
+   * Constructor that allows non-default initialization of the quantile 
sketch object
+   *
+   * @param k:Parameter that determines accuracy and memory usage of 
quantile sketch. See QuantilesSketch documentation
+   *  for details
+   * @param seed: The quantile sketch algorithm is inherently random. Set 
seed to 0 for reproducibility in testing, but
+   *  do not set otherwise.
+   */
+  public QuantilesEstimator(int k, short seed)
+  {
+quantilesSketch = QuantilesSketch.builder().setSeed(seed).build(k);
+  }
+
+  /**
+   * Output port that emits cdf estimated at the current data point
+   */
+  public final transient DefaultOutputPort cdfOutput = new 
DefaultOutputPort<>();
+  /**
+   * Emits quantiles of stream seen thus far
+   */
+  public final transient DefaultOutputPort quantilesOutput = new 
DefaultOutputPort<>();
+  /**
+   * Emits probability masses on specified intervals
+   */
+  public final transient DefaultOutputPort pmfOutput = new 
DefaultOutputPort<>();
+  /**
+   * This operator computes three different quantities which are output on 
separate output ports. If not using any of
+   * these quantities, these variables can be set to avoid unnecessary 
computation.
+   */
+
+  private boolean computeCdf = true;
+  private boolean computeQuantiles = true;
+  private boolean computePmf = true;
+
+  public boolean isComputeCdf()
+  {
+return computeCdf;
+  }
+
+  public void setComputeCdf(boolean computeCdf)
+  {
+this.computeCdf = computeCdf;
+  }
+
+  public boolean isComputeQuantiles()
+  {
+return computeQuantiles;
+  }
+
+  public void setComputeQuantiles(boolean computeQuantiles)
+  {
+this.computeQuantiles = computeQuantiles;
+  }
+
+  public boolean isComputePmf()
+  {
+return computePmf;
+  }
+
+  public void setComputePmf(boolean computePmf)
+  {
+this.computePmf = computePmf;
+  }
+
+  public int getK()
+  {
+return quantilesSketch.getK();
+  }
+
+  public double[] getFractions()
+  {
+return fractions;
+  }
+
+  public void setFractions(double[] fractions)
+  {
+this.fractions = fractions;
+  }
+
+  public double[] getPmfIntervals()
+  {
+return pmfIntervals;
+  }
  

[jira] [Updated] (APEXMALHAR-2102) Add A Clone Partitioner Which Sends The Same Data To Each Partition

2016-05-26 Thread Thomas Weise (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Weise updated APEXMALHAR-2102:
-
Assignee: Ilya Ganelin

> Add A Clone Partitioner Which Sends The Same Data To Each Partition
> ---
>
> Key: APEXMALHAR-2102
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2102
> Project: Apache Apex Malhar
>  Issue Type: New Feature
>Reporter: Timothy Farkas
>Assignee: Ilya Ganelin
>
> This should go into com.datatorrent.common.partitioner and would be very 
> similar to the StatelessPartitioner



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Moved] (APEXMALHAR-2102) Add A Clone Partitioner Which Sends The Same Data To Each Partition

2016-05-26 Thread Thomas Weise (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Weise moved APEXCORE-146 to APEXMALHAR-2102:
---

Assignee: (was: Ilya Ganelin)
Workflow: Default workflow, editable Closed status  (was: jira)
 Key: APEXMALHAR-2102  (was: APEXCORE-146)
 Project: Apache Apex Malhar  (was: Apache Apex Core)

> Add A Clone Partitioner Which Sends The Same Data To Each Partition
> ---
>
> Key: APEXMALHAR-2102
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2102
> Project: Apache Apex Malhar
>  Issue Type: New Feature
>Reporter: Timothy Farkas
>
> This should go into com.datatorrent.common.partitioner and would be very 
> similar to the StatelessPartitioner



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXCORE-146) Add A Clone Partitioner Which Sends The Same Data To Each Partition

2016-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302786#comment-15302786
 ] 

ASF GitHub Bot commented on APEXCORE-146:
-

Github user ilganeli commented on the pull request:


https://github.com/apache/incubator-apex-core/pull/344#issuecomment-221973428
  
Moved to https://github.com/apache/incubator-apex-malhar/pull/297



> Add A Clone Partitioner Which Sends The Same Data To Each Partition
> ---
>
> Key: APEXCORE-146
> URL: https://issues.apache.org/jira/browse/APEXCORE-146
> Project: Apache Apex Core
>  Issue Type: New Feature
>Reporter: Timothy Farkas
>Assignee: Ilya Ganelin
>
> This should go into com.datatorrent.common.partitioner and would be very 
> similar to the StatelessPartitioner



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXCORE-146) Add A Clone Partitioner Which Sends The Same Data To Each Partition

2016-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302783#comment-15302783
 ] 

ASF GitHub Bot commented on APEXCORE-146:
-

GitHub user ilganeli opened a pull request:

https://github.com/apache/incubator-apex-malhar/pull/297

[APEXCORE-146] Add ClonePartitioner

**Note:** This is still marked as an Apex issue even though it should live 
in Malhar. Once the JIRA is updated I'll update title and reference. 

* Created a Clone partitioner similar to the StatelessPartitioner that 
assigns all data to all partitions
* Added a simple unit test suite to test scale up and scale down

**Questions**
Is there a good place/way to add a test to verify that data is partitioned 
appropriately? For example, the stateless partitioner assigns keys based on the 
nearest power of two - is that validated anywhere?

How is the serialVersionUUID generated?

Does the manual assignment of ports to PartitionKeys need to happen for 
each Partition? There is a comment in the Partition class that by default all 
data is sent to all partitions. However, digging into the implementation in 
depth doesn't show that behavior. Instead, it seems data is assigned based on 
the mask associated with each Partition. 

This references: https://issues.apache.org/jira/browse/APEXCORE-146

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ilganeli/incubator-apex-malhar APEXCORE-146

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-apex-malhar/pull/297.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #297


commit 74f6fd5113a3d488015a9817790018dd63b5e8cb
Author: Ilya Ganelin 
Date:   2016-05-26T19:39:24Z

Added a new partitioner that replicates data across all partitions by 
default.




> Add A Clone Partitioner Which Sends The Same Data To Each Partition
> ---
>
> Key: APEXCORE-146
> URL: https://issues.apache.org/jira/browse/APEXCORE-146
> Project: Apache Apex Core
>  Issue Type: New Feature
>Reporter: Timothy Farkas
>Assignee: Ilya Ganelin
>
> This should go into com.datatorrent.common.partitioner and would be very 
> similar to the StatelessPartitioner



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (APEXCORE-146) Add A Clone Partitioner Which Sends The Same Data To Each Partition

2016-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302784#comment-15302784
 ] 

ASF GitHub Bot commented on APEXCORE-146:
-

Github user ilganeli closed the pull request at:

https://github.com/apache/incubator-apex-core/pull/344


> Add A Clone Partitioner Which Sends The Same Data To Each Partition
> ---
>
> Key: APEXCORE-146
> URL: https://issues.apache.org/jira/browse/APEXCORE-146
> Project: Apache Apex Core
>  Issue Type: New Feature
>Reporter: Timothy Farkas
>Assignee: Ilya Ganelin
>
> This should go into com.datatorrent.common.partitioner and would be very 
> similar to the StatelessPartitioner



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] incubator-apex-malhar pull request: [APEXCORE-146] Add ClonePartit...

2016-05-26 Thread ilganeli
GitHub user ilganeli opened a pull request:

https://github.com/apache/incubator-apex-malhar/pull/297

[APEXCORE-146] Add ClonePartitioner

**Note:** This is still marked as an Apex issue even though it should live 
in Malhar. Once the JIRA is updated I'll update title and reference. 

* Created a Clone partitioner similar to the StatelessPartitioner that 
assigns all data to all partitions
* Added a simple unit test suite to test scale up and scale down

**Questions**
Is there a good place/way to add a test to verify that data is partitioned 
appropriately? For example, the stateless partitioner assigns keys based on the 
nearest power of two - is that validated anywhere?

How is the serialVersionUUID generated?

Does the manual assignment of ports to PartitionKeys need to happen for 
each Partition? There is a comment in the Partition class that by default all 
data is sent to all partitions. However, digging into the implementation in 
depth doesn't show that behavior. Instead, it seems data is assigned based on 
the mask associated with each Partition. 

This references: https://issues.apache.org/jira/browse/APEXCORE-146

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ilganeli/incubator-apex-malhar APEXCORE-146

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-apex-malhar/pull/297.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #297


commit 74f6fd5113a3d488015a9817790018dd63b5e8cb
Author: Ilya Ganelin 
Date:   2016-05-26T19:39:24Z

Added a new partitioner that replicates data across all partitions by 
default.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-apex-core pull request: [APEXCORE-146] Add ClonePartitio...

2016-05-26 Thread ilganeli
Github user ilganeli commented on the pull request:


https://github.com/apache/incubator-apex-core/pull/344#issuecomment-221973428
  
Moved to https://github.com/apache/incubator-apex-malhar/pull/297



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (APEXMALHAR-2063) Integrate WAL to FS WindowDataManager

2016-05-26 Thread Chandni Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15301685#comment-15301685
 ] 

Chandni Singh commented on APEXMALHAR-2063:
---

Window Data Manager supports dynamic partitioning in operator by allowing an 
instance to read data saved for a particular window id by all operator 
instances. The following method provides that support
{code}
 /**
   * When an operator can partition itself dynamically then there is no 
guarantee that an input state which was being
   * handled by one instance previously will be handled by the same instance 
after partitioning. 
   * For eg. An {@link AbstractFileInputOperator} instance which reads a File X 
till offset l (not check-pointed) may no
   * longer be the instance that handles file X after repartitioning as no. of 
instances may have changed and file X
   * is re-hashed to another instance. 
   * The new instance wouldn't know from what point to read the File X unless 
it reads the idempotent storage of all the
   * operators for the window being replayed and fix it's state.
   *
   * @param windowId window id.
   * @return mapping of operator id to the corresponding state
   * @throws IOException
   */
  Map load(long windowId) throws IOException;
{code}
To provide the support for above with FileSystemWAL becomes complicated. 
Currently the FileSystemWAL reader and writer are assumed to be in the same 
physical partition. However, supporting above requires multiple readers which 
are in different physical partitions than the writer.

So the FileSystem WAL needs to be changed, in order to be used in read-only 
mode.

> Integrate WAL to FS WindowDataManager
> -
>
> Key: APEXMALHAR-2063
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2063
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>
> FS Window Data Manager is used to save meta-data that helps in replaying 
> tuples every completed application window after failure. For this it saves 
> meta-data in a file per window. Having multiple small size files on hdfs 
> cause issues as highlighted here:
> http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
> Instead FS Window Data Manager can utilize the WAL to write data and maintain 
> a mapping of how much data was flushed to WAL each window.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)