[jira] [Assigned] (APEXMALHAR-2413) Improve PojoInnerJoin Accumulation

2017-02-21 Thread Chinmay Kolhatkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinmay Kolhatkar reassigned APEXMALHAR-2413:
-

Assignee: Hitesh Kapoor  (was: Chinmay Kolhatkar)

> Improve PojoInnerJoin Accumulation
> --
>
> Key: APEXMALHAR-2413
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2413
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>  Components: algorithms
>Affects Versions: 3.6.0
>Reporter: Chinmay Kolhatkar
>Assignee: Hitesh Kapoor
>
> This contains a series of subtasks:
> 1. Enable PojoJoinInner accumulation to emit POJO instead of Map
> 2. Improve performance of PojoInnerJoin accumulation by using PojoUtils 
> instead of java reflection
> 3. Enable PojoInnerJoin accumulation to accept group of keys for join instead 
> of a single key.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (APEXMALHAR-2414) Improve performance of PojoInnerJoin accum by using PojoUtils

2017-02-21 Thread Chinmay Kolhatkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinmay Kolhatkar reassigned APEXMALHAR-2414:
-

Assignee: (was: Chinmay Kolhatkar)

> Improve performance of PojoInnerJoin accum by using PojoUtils
> -
>
> Key: APEXMALHAR-2414
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2414
> Project: Apache Apex Malhar
>  Issue Type: Sub-task
>  Components: algorithms
>Reporter: Chinmay Kolhatkar
>
> Currently PojoInnerJoin accumulation uses java reflection to deal with 
> objects. Use PojoUtils instead.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (APEXMALHAR-2418) Update the twitter library to 4.0.6

2017-02-21 Thread Priyanka Gugale (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Priyanka Gugale resolved APEXMALHAR-2418.
-
   Resolution: Fixed
 Assignee: (was: Priyanka Gugale)
Fix Version/s: 3.7.0

> Update the twitter library to 4.0.6 
> 
>
> Key: APEXMALHAR-2418
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2418
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Shubham Agrawal
>Priority: Minor
> Fix For: 3.7.0
>
>
> There was a known issue of Build Failure for twitter version 4.0.4.
> So for succesfully building and launching the app need to change the version 
> from 4.0.4 to 4.0.6.
> It has been tested.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (APEXMALHAR-2418) Update the twitter library to 4.0.6

2017-02-21 Thread Priyanka Gugale (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Priyanka Gugale reassigned APEXMALHAR-2418:
---

Assignee: Priyanka Gugale

> Update the twitter library to 4.0.6 
> 
>
> Key: APEXMALHAR-2418
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2418
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Shubham Agrawal
>Assignee: Priyanka Gugale
>Priority: Minor
>
> There was a known issue of Build Failure for twitter version 4.0.4.
> So for succesfully building and launching the app need to change the version 
> from 4.0.4 to 4.0.6.
> It has been tested.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (APEXMALHAR-2418) Update the twitter library to 4.0.6

2017-02-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877471#comment-15877471
 ] 

ASF GitHub Bot commented on APEXMALHAR-2418:


Github user asfgit closed the pull request at:

https://github.com/apache/apex-malhar/pull/558


> Update the twitter library to 4.0.6 
> 
>
> Key: APEXMALHAR-2418
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2418
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Shubham Agrawal
>Priority: Minor
>
> There was a known issue of Build Failure for twitter version 4.0.4.
> So for succesfully building and launching the app need to change the version 
> from 4.0.4 to 4.0.6.
> It has been tested.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] apex-malhar pull request #558: APEXMALHAR-2418 Updated twitter4j library ver...

2017-02-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/apex-malhar/pull/558


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] apex-core pull request #440: APEXCORE-580-581 custom control tuple support

2017-02-21 Thread bhupeshchawda
Github user bhupeshchawda closed the pull request at:

https://github.com/apache/apex-core/pull/440


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (APEXCORE-580) Interface for processing and emitting control tuples

2017-02-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877448#comment-15877448
 ] 

ASF GitHub Bot commented on APEXCORE-580:
-

Github user bhupeshchawda closed the pull request at:

https://github.com/apache/apex-core/pull/440


> Interface for processing and emitting control tuples
> 
>
> Key: APEXCORE-580
> URL: https://issues.apache.org/jira/browse/APEXCORE-580
> Project: Apache Apex Core
>  Issue Type: Sub-task
>Reporter: David Yan
>
> DefaultOutputPort needs to have a emitControl method so that operator code 
> can call to emit a control tuple.
> DefaultInputPort needs to have a processControl method so that operator would 
> be able to act on the arrival of a control tuple.
> Similar to a regular data tuple, we also need to provide a way for the user 
> to provide custom serialization for the control tuple.
> We need to design this so that the default behavior is to propagate control 
> tuples to all output ports, and it should allow the user to easily change 
> that behavior. The user can selectively propagate control tuples to certain 
> output ports, or block the propagation altogether.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (APEXCORE-580) Interface for processing and emitting control tuples

2017-02-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877449#comment-15877449
 ] 

ASF GitHub Bot commented on APEXCORE-580:
-

GitHub user bhupeshchawda reopened a pull request:

https://github.com/apache/apex-core/pull/440

APEXCORE-580-581 custom control tuple support



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bhupeshchawda/apex-core 
APEXCORE-580-581-custom-control-tuples

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-core/pull/440.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #440






> Interface for processing and emitting control tuples
> 
>
> Key: APEXCORE-580
> URL: https://issues.apache.org/jira/browse/APEXCORE-580
> Project: Apache Apex Core
>  Issue Type: Sub-task
>Reporter: David Yan
>
> DefaultOutputPort needs to have a emitControl method so that operator code 
> can call to emit a control tuple.
> DefaultInputPort needs to have a processControl method so that operator would 
> be able to act on the arrival of a control tuple.
> Similar to a regular data tuple, we also need to provide a way for the user 
> to provide custom serialization for the control tuple.
> We need to design this so that the default behavior is to propagate control 
> tuples to all output ports, and it should allow the user to easily change 
> that behavior. The user can selectively propagate control tuples to certain 
> output ports, or block the propagation altogether.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] apex-core pull request #440: APEXCORE-580-581 custom control tuple support

2017-02-21 Thread bhupeshchawda
GitHub user bhupeshchawda reopened a pull request:

https://github.com/apache/apex-core/pull/440

APEXCORE-580-581 custom control tuple support



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bhupeshchawda/apex-core 
APEXCORE-580-581-custom-control-tuples

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-core/pull/440.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #440






---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (APEXCORE-642) Catch all exceptions in StreamingAppMasterService.serviceInit() and create a StramEvent

2017-02-21 Thread Sanjay M Pujare (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXCORE-642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877070#comment-15877070
 ] 

Sanjay M Pujare commented on APEXCORE-642:
--

I am going to close this JIRA as "Won't Fix" because of the complexity involved 
vis-a-vis the value we get out of it unless someone feels otherwise.

> Catch all exceptions in StreamingAppMasterService.serviceInit() and create a 
> StramEvent
> ---
>
> Key: APEXCORE-642
> URL: https://issues.apache.org/jira/browse/APEXCORE-642
> Project: Apache Apex Core
>  Issue Type: Bug
>Reporter: Sanjay M Pujare
>Assignee: Sanjay M Pujare
>Priority: Minor
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> When the AM (Stram) starts, it executes 
> StreamingAppMasterService.serviceInit() to perform service initialization as 
> per the Hadoop service contract. In this, the Stram deserializes the DAG 
> which can fail (e.g. bad jar versions or other deserialization issues) and 
> any exception is thrown is not logged in Apex log files or events. It is 
> proposed to catch these exceptions and log them to the dt log file as well as 
> Stram events so the Apex user knows about these exceptions.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (APEXMALHAR-2419) KafkaSinglePortExactlyOnceOutputOperator fails on recovery

2017-02-21 Thread Sandesh (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876656#comment-15876656
 ] 

Sandesh commented on APEXMALHAR-2419:
-

Fix is here, until it is merged.
https://github.com/sandeshh/apex-malhar/tree/APEXMALHAR-2419


> KafkaSinglePortExactlyOnceOutputOperator fails on recovery
> --
>
> Key: APEXMALHAR-2419
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2419
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Munagala V. Ramanath
>Assignee: Sandesh
>
> The KafkaSinglePortExactlyOnceOutputOperator fails on recovery with the 
> message: "Violates Exactly once. Not all the tuples received after operator 
> reset."
> This is because of this check in endWindow():
> {code}
>if (!partialWindowTuples.isEmpty() && windowId > 
> windowDataManager.getLargestCompletedWindow()) {
>   throw new RuntimeException("Violates Exactly once. Not all the tuples 
> received after operator reset.");
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (APEXMALHAR-2419) KafkaSinglePortExactlyOnceOutputOperator fails on recovery

2017-02-21 Thread Sandesh (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandesh reassigned APEXMALHAR-2419:
---

Assignee: Sandesh

> KafkaSinglePortExactlyOnceOutputOperator fails on recovery
> --
>
> Key: APEXMALHAR-2419
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2419
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Munagala V. Ramanath
>Assignee: Sandesh
>
> The KafkaSinglePortExactlyOnceOutputOperator fails on recovery with the 
> message: "Violates Exactly once. Not all the tuples received after operator 
> reset."
> This is because of this check in endWindow():
> {code}
>if (!partialWindowTuples.isEmpty() && windowId > 
> windowDataManager.getLargestCompletedWindow()) {
>   throw new RuntimeException("Violates Exactly once. Not all the tuples 
> received after operator reset.");
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (APEXMALHAR-2419) KafkaSinglePortExactlyOnceOutputOperator fails on recovery

2017-02-21 Thread Sandesh (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876641#comment-15876641
 ] 

Sandesh commented on APEXMALHAR-2419:
-

largestCompletedWindow will be -1, if the recovery is not happening.

> KafkaSinglePortExactlyOnceOutputOperator fails on recovery
> --
>
> Key: APEXMALHAR-2419
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2419
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Munagala V. Ramanath
>
> The KafkaSinglePortExactlyOnceOutputOperator fails on recovery with the 
> message: "Violates Exactly once. Not all the tuples received after operator 
> reset."
> This is because of this check in endWindow():
> {code}
>if (!partialWindowTuples.isEmpty() && windowId > 
> windowDataManager.getLargestCompletedWindow()) {
>   throw new RuntimeException("Violates Exactly once. Not all the tuples 
> received after operator reset.");
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (APEXMALHAR-2410) KafkaSinglePortExactlyOnceOutputOperator fails if an operator crashes in the very first window

2017-02-21 Thread Sandesh (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandesh closed APEXMALHAR-2410.
---

> KafkaSinglePortExactlyOnceOutputOperator fails if an operator crashes in the 
> very first window
> --
>
> Key: APEXMALHAR-2410
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2410
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Oliver Winke
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (APEXMALHAR-2410) KafkaSinglePortExactlyOnceOutputOperator fails if an operator crashes in the very first window

2017-02-21 Thread Sandesh (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandesh resolved APEXMALHAR-2410.
-
Resolution: Duplicate

Duplicate of APEXMALHAR-2419

> KafkaSinglePortExactlyOnceOutputOperator fails if an operator crashes in the 
> very first window
> --
>
> Key: APEXMALHAR-2410
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2410
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Oliver Winke
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: Occasional Out of order tuples when emitting from a thread

2017-02-21 Thread Ashwin Chandra Putta
Sunil,

You can poll the queue in end window since process method in the input port
does not get called if there is no incoming tuple. However, end window is
called irrespective of there are incoming tuples or not.

Regards,
Ashwin.

On Tue, Feb 21, 2017 at 11:32 AM, Sunil Parmar 
wrote:

> Ram,
> Thanks for the prompt response. If we use the approach you suggested we’re
> dependent on main thread’s process call I.e. Tuples in the thread safe
> queue gets only processed when main thread is processing incoming tuples.
> How can we explicitly call the process from polling of delay queue ?
>
> Just for reference here’s the sample code snippet for our operator.
>
> public class MyOperator extends BaseOperator implements
>
> Operator.ActivationListener {
> …..
>
> @InputPortFieldAnnotation
>
> public transient DefaultInputPort kafkaStreamInput =
>
> new DefaultInputPort() {
>
> List errors = new ArrayList();
>
> @Override
>
> public void process(String consumerRecord) {
>
> //Code for normal tuple process
>
> //Code to poll thread safe queue
>
> }
>
> ***—*
> *From: *Munagala Ramanath 
> *To: *us...@apex.apache.org
> *CC: *"dev@apex.apache.org" , Allan De Leon <
> adel...@threatmetrix.com>, Tim Zhu 
> *Subject: *Re: Occasional Out of order tuples when emitting from a thread
> *Date: *2017-02-21 10:08 (-0800)
> *List: *us...@apex.apache.org
> 
>
> Please note that tuples should not be emitted by any thread other than the
> main operator thread.
>
> A common pattern is to use a thread-safe queue and have worker threads
> enqueue
> tuples there; the main operator thread then pulls tuples from the queue and
> emits them.
>
> Ram
>
> ___
>
> Munagala V. Ramanath
>
> Software Engineer
>
> E: r...@datatorrent.com | M: (408) 331-5034 | Twitter: @UnknownRam
> www.datatorrent.com  |  apex.apache.org
>
>
> From: Sunil Parmar 
> Date: Tuesday, February 21, 2017 at 10:05 AM
> To: "us...@apex.apache.org" , "dev@apex.apache.org"
> 
> Cc: Allan De Leon , Tim Zhu <
> t...@threatmetrix.com>
> Subject: Occasional Out of order tuples when emitting from a thread
>
> Hi there,
> We have the following setup:
>
>- we have a generic operator that’s processing tuples in its input port
>- in the input port’s process method, we check for a condition, and:
>   - if the condition is met, the tuple is emitted to the next
>   operator right away (in the process method)
>   - Otherwise, if the condition is not met, we store the tuple  in
>   some cache and we use some threads that periodically check the 
> condition to
>   become true. Once the condition is true, the threads call the emit 
> method
>   on the stored tuples.
>
> With this setup, we occasionally encounter the following error:
> 2017-02-15 17:29:09,364 ERROR com.datatorrent.stram.engine.GenericNode:
> Catastrophic Error: Out of sequence BEGIN_WINDOW tuple 58a404613b7f on
> port transformedJSON while expecting 58a404613b7e
>
> Is there a way to make the above work correctly?
> If not, can you recommend a better way of doing this?
> How can we ensure window assignment is done synchronously before emitting
> tuples ?
>
> Thanks very much in advance…
> -allan
>



-- 

Regards,
Ashwin.


Re: Occasional Out of order tuples when emitting from a thread

2017-02-21 Thread Sunil Parmar
Ram,
Thanks for the prompt response. If we use the approach you suggested we're 
dependent on main thread's process call I.e. Tuples in the thread safe queue 
gets only processed when main thread is processing incoming tuples. How can we 
explicitly call the process from polling of delay queue ?

Just for reference here's the sample code snippet for our operator.


public class MyOperator extends BaseOperator implements

Operator.ActivationListener {

.


@InputPortFieldAnnotation

public transient DefaultInputPort kafkaStreamInput =

new DefaultInputPort() {

List errors = new ArrayList();

@Override

public void process(String consumerRecord) {

//Code for normal tuple process

//Code to poll thread safe queue

}

-
From: Munagala Ramanath >
To: us...@apex.apache.org
CC: "dev@apex.apache.org" 
>, Allan De Leon 
>, Tim Zhu 
>
Subject: Re: Occasional Out of order tuples when emitting from a thread
Date: 2017-02-21 10:08 (-0800)
List: 
us...@apex.apache.org

Please note that tuples should not be emitted by any thread other than the
main operator thread.

A common pattern is to use a thread-safe queue and have worker threads
enqueue
tuples there; the main operator thread then pulls tuples from the queue and
emits them.

Ram

___

Munagala V. Ramanath

Software Engineer

E: r...@datatorrent.com | M: (408) 331-5034 | 
Twitter: @UnknownRam

www.datatorrent.com  |  apex.apache.org

From: Sunil Parmar >
Date: Tuesday, February 21, 2017 at 10:05 AM
To: "us...@apex.apache.org" 
>, 
"dev@apex.apache.org" 
>
Cc: Allan De Leon >, 
Tim Zhu >
Subject: Occasional Out of order tuples when emitting from a thread

Hi there,
We have the following setup:

  *   we have a generic operator that's processing tuples in its input port
  *   in the input port's process method, we check for a condition, and:
 *   if the condition is met, the tuple is emitted to the next operator 
right away (in the process method)
 *   Otherwise, if the condition is not met, we store the tuple  in some 
cache and we use some threads that periodically check the condition to become 
true. Once the condition is true, the threads call the emit method on the 
stored tuples.

With this setup, we occasionally encounter the following error:
2017-02-15 17:29:09,364 ERROR com.datatorrent.stram.engine.GenericNode: 
Catastrophic Error: Out of sequence BEGIN_WINDOW tuple 58a404613b7f on port 
transformedJSON while expecting 58a404613b7e

Is there a way to make the above work correctly?
If not, can you recommend a better way of doing this?
How can we ensure window assignment is done synchronously before emitting 
tuples ?

Thanks very much in advance...
-allan


Re: Occasional Out of order tuples when emitting from a thread

2017-02-21 Thread Munagala Ramanath
Please note that tuples should not be emitted by any thread other than the
main operator thread.

A common pattern is to use a thread-safe queue and have worker threads
enqueue
tuples there; the main operator thread then pulls tuples from the queue and
emits them.

Ram

On Tue, Feb 21, 2017 at 10:05 AM, Sunil Parmar 
wrote:

> Hi there,
> We have the following setup:
>
>- we have a generic operator that’s processing tuples in its input port
>- in the input port’s process method, we check for a condition, and:
>   - if the condition is met, the tuple is emitted to the next
>   operator right away (in the process method)
>   - Otherwise, if the condition is not met, we store the tuple  in
>   some cache and we use some threads that periodically check the 
> condition to
>   become true. Once the condition is true, the threads call the emit 
> method
>   on the stored tuples.
>
> With this setup, we occasionally encounter the following error:
> 2017-02-15 17:29:09,364 ERROR com.datatorrent.stram.engine.GenericNode:
> Catastrophic Error: Out of sequence BEGIN_WINDOW tuple 58a404613b7f on
> port transformedJSON while expecting 58a404613b7e
>
> Is there a way to make the above work correctly?
> If not, can you recommend a better way of doing this?
> How can we ensure window assignment is done synchronously before emitting
> tuples ?
>
> Thanks very much in advance…
> -allan
>



-- 

___

Munagala V. Ramanath

Software Engineer

E: r...@datatorrent.com | M: (408) 331-5034 | Twitter: @UnknownRam

www.datatorrent.com  |  apex.apache.org


Occasional Out of order tuples when emitting from a thread

2017-02-21 Thread Sunil Parmar
Hi there,
We have the following setup:

  *   we have a generic operator that's processing tuples in its input port
  *   in the input port's process method, we check for a condition, and:
 *   if the condition is met, the tuple is emitted to the next operator 
right away (in the process method)
 *   Otherwise, if the condition is not met, we store the tuple  in some 
cache and we use some threads that periodically check the condition to become 
true. Once the condition is true, the threads call the emit method on the 
stored tuples.

With this setup, we occasionally encounter the following error:
2017-02-15 17:29:09,364 ERROR com.datatorrent.stram.engine.GenericNode: 
Catastrophic Error: Out of sequence BEGIN_WINDOW tuple 58a404613b7f on port 
transformedJSON while expecting 58a404613b7e

Is there a way to make the above work correctly?
If not, can you recommend a better way of doing this?
How can we ensure window assignment is done synchronously before emitting 
tuples ?

Thanks very much in advance...
-allan


Re: Redshift Output Operator

2017-02-21 Thread Amol Kekre
Chaitanya,
This is good first cut. Post this work, do take a look at loading data
before file rotation.

Thks
Amol


*Follow @amolhkekre*
*Join us at Apex Big Data World-San Jose
, April 4, 2017!*
[image: http://www.apexbigdata.com/san-jose-register.html]


On Mon, Feb 20, 2017 at 10:56 PM, Chaitanya Chebolu <
chaita...@datatorrent.com> wrote:

> Created JIRA for this task: APEXMALHAR-2416
>
> On Mon, Feb 13, 2017 at 4:14 PM, Chaitanya Chebolu <
> chaita...@datatorrent.com> wrote:
>
> > Hi All,
> >
> >   I am proposing Amazon Redshift output module.
> >   Please refer below link about the Redshift: https://aws.amazon.com/
> > redshift/
> >
> >   Primary functionality of this module is load data into Redshift tables
> > from data files using copy command. Refer the below link about the copy
> > command:
> > http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html
> >
> > Input type to this module is byte[].
> >
> >   I am proposing the below design:
> > 1) Write the tuples into EMR/S3. By default, it writes to S3.
> > 2) Once the file is rolled, upload the file into Redshift using copy
> > command.
> >
> > Please share your thoughts on design.
> >
> > Regards,
> > Chaitanya
> >
>
>
>
> --
>
> *Chaitanya*
>
> Software Engineer
>
> E: chaita...@datatorrent.com | Twitter: @chaithu1403
>
> www.datatorrent.com  |  apex.apache.org
>


[jira] [Commented] (APEXMALHAR-2419) KafkaSinglePortExactlyOnceOutputOperator fails on recovery

2017-02-21 Thread Munagala V. Ramanath (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876082#comment-15876082
 ] 

Munagala V. Ramanath commented on APEXMALHAR-2419:
--

The expectation seems to be that 
`windowDataManager.getLargestCompletedWindow()` will be the last window on 
which `windowDataManager.save()` was called in the previous failed run which in 
turn means that `windowId > windowDataManager.getLargestCompletedWindow()` will 
be true only when replay is complete. However as the following log messages 
show, the `windowDataManager.getLargestCompletedWindow()` always returns the 
same value during a run -- this may indicate a bug in FSWindowDataManager.

Notice in the initial deploy, the largestCompletedWindow is always -1 even 
after a save() call in endWindow:

Initial deploy:
{quote}
2017-02-21 06:18:55,196 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: 
beginWindow: windowId = 6389565783323705345, largestCompletedWindow = -1
2017-02-21 06:18:55,240 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: 
endWindow: windowId = 6389565783323705345, partialWindowTuples.size = 0, 
largestCompletedWindow = -1
2017-02-21 06:18:55,747 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: 
endWindow: saved data for windowId = 6389565783323705345, 
largestCompletedWindow = -1
2017-02-21 06:18:55,749 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: 
beginWindow: windowId = 6389565783323705346, largestCompletedWindow = -1
2017-02-21 06:18:55,749 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: 
endWindow: windowId = 6389565783323705346, partialWindowTuples.size = 0, 
largestCompletedWindow = -1
2017-02-21 06:18:55,806 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: 
endWindow: saved data for windowId = 6389565783323705346, 
largestCompletedWindow = -1
2017-02-21 06:18:55,806 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: 
beginWindow: windowId = 6389565783323705347, largestCompletedWindow = -1
2017-02-21 06:18:55,806 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: 
endWindow: windowId = 6389565783323705347, partialWindowTuples.size = 0, 
largestCompletedWindow = -1
2017-02-21 06:18:55,906 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: 
endWindow: saved data for windowId = 6389565783323705347, 
largestCompletedWindow = -1
2017-02-21 06:18:55,906 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: 
beginWindow: windowId = 6389565783323705348, largestCompletedWindow = -1
2017-02-21 06:18:55,906 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: 
endWindow: windowId = 6389565783323705348, partialWindowTuples.size = 0, 
largestCompletedWindow = -1
2017-02-21 06:18:55,964 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: 
endWindow: saved data for windowId = 6389565783323705348, 
largestCompletedWindow = -1
2017-02-21 06:18:55,964 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: 
beginWindow: windowId = 6389565783323705349, largestCompletedWindow = -1
2017-02-21 06:18:55,965 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: 
endWindow: windowId = 6389565783323705349, partialWindowTuples.size = 0, 
largestCompletedWindow = -1
2017-02-21 06:18:56,022 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: 
endWindow: saved data for windowId = 6389565783323705349, 
largestCompletedWindow = -1
2017-02-21 06:18:56,023 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: 
beginWindow: windowId = 6389565783323705350, largestCompletedWindow = -1
2017-02-21 06:19:03,570 INFO 
com.datatorrent.stram.engine.StreamingContainer: Undeploy request: [3]
{quote}

Notice here that when we reach the second replay window, it is not recognized 
as replay and throws the exception.

After redeploy:

{quote}
2017-02-21 06:19:11,899 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: 
beginWindow: windowId = 6389565783323705345, largestCompletedWindow = 
6389565783323705345
2017-02-21 06:19:11,900 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: Rebuild 
the partial window after 6389565783323705345
2017-02-21 06:19:13,151 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: 
endWindow: windowId = 6389565783323705345, partialWindowTuples.size = 4, 
largestCompletedWindow = 6389565783323705345
2017-02-21 06:19:13,766 INFO 

[jira] [Created] (APEXMALHAR-2419) KafkaSinglePortExactlyOnceOutputOperator fails on recovery

2017-02-21 Thread Munagala V. Ramanath (JIRA)
Munagala V. Ramanath created APEXMALHAR-2419:


 Summary: KafkaSinglePortExactlyOnceOutputOperator fails on recovery
 Key: APEXMALHAR-2419
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2419
 Project: Apache Apex Malhar
  Issue Type: Bug
Reporter: Munagala V. Ramanath


The KafkaSinglePortExactlyOnceOutputOperator fails on recovery with the 
message: "Violates Exactly once. Not all the tuples received after operator 
reset."

This is because of this check in endWindow():
{code}
   if (!partialWindowTuples.isEmpty() && windowId > 
windowDataManager.getLargestCompletedWindow()) {
  throw new RuntimeException("Violates Exactly once. Not all the tuples 
received after operator reset.");
}
{code}




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (APEXMALHAR-2418) Update the twitter library to 4.0.6

2017-02-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875771#comment-15875771
 ] 

ASF GitHub Bot commented on APEXMALHAR-2418:


GitHub user Shubham0411 opened a pull request:

https://github.com/apache/apex-malhar/pull/558

APEXMALHAR-2418 Updated twitter4j library version for twitter demo ap…

There was a known issue of Build Failure for twitter version 4.0.4.
So for successfully building and launching the application, need to change 
the version from 4.0.4 to 4.0.6. 
It has been tested.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Shubham0411/apex-malhar 
APEXMALHAR-2418-twitter-library-version

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/558.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #558


commit bb968f2106e6640590a8a0338ae4a5480dd2b239
Author: Shubham0411 
Date:   2017-02-21T10:45:29Z

APEXMALHAR-2418 Updated twitter4j library version for twitter demo 
application.




> Update the twitter library to 4.0.6 
> 
>
> Key: APEXMALHAR-2418
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2418
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Shubham Agrawal
>Priority: Minor
>
> There was a known issue of Build Failure for twitter version 4.0.4.
> So for succesfully building and launching the app need to change the version 
> from 4.0.4 to 4.0.6.
> It has been tested.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] apex-malhar pull request #558: APEXMALHAR-2418 Updated twitter4j library ver...

2017-02-21 Thread Shubham0411
GitHub user Shubham0411 opened a pull request:

https://github.com/apache/apex-malhar/pull/558

APEXMALHAR-2418 Updated twitter4j library version for twitter demo ap…

There was a known issue of Build Failure for twitter version 4.0.4.
So for successfully building and launching the application, need to change 
the version from 4.0.4 to 4.0.6. 
It has been tested.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Shubham0411/apex-malhar 
APEXMALHAR-2418-twitter-library-version

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/558.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #558


commit bb968f2106e6640590a8a0338ae4a5480dd2b239
Author: Shubham0411 
Date:   2017-02-21T10:45:29Z

APEXMALHAR-2418 Updated twitter4j library version for twitter demo 
application.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [jira] [Commented] (APEXMALHAR-2381) Change FSWindowManager for peraformance issues in Kinesis Input Operator

2017-02-21 Thread Hitesh Kapoor
On o Zxdfxxffxd hu

On Feb 21, 2017 2:21 PM, "ASF GitHub Bot (JIRA)"  wrote:


[ https://issues.apache.org/jira/browse/APEXMALHAR-2381?
page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&
focusedCommentId=15875616#comment-15875616 ]

ASF GitHub Bot commented on APEXMALHAR-2381:


Github user asfgit closed the pull request at:

https://github.com/apache/apex-malhar/pull/550


> Change FSWindowManager for performance issues in Kinesis Input Operator
> ---
>
> Key: APEXMALHAR-2381
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2381
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>Reporter: Deepak Narkhede
>Assignee: Deepak Narkhede
> Fix For: 3.7.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (APEXMALHAR-2418) Update the twitter library to 4.0.6

2017-02-21 Thread Shubham Agrawal (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shubham Agrawal updated APEXMALHAR-2418:

Summary: Update the twitter library to 4.0.6   (was: Updating the supported 
version in Twitter Library.)

> Update the twitter library to 4.0.6 
> 
>
> Key: APEXMALHAR-2418
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2418
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Shubham Agrawal
>Priority: Minor
>
> There was a known issue of Build Failure for twitter version 4.0.4.
> So for succesfully building and launching the app need to change the version 
> from 4.0.4 to 4.0.6.
> It has been tested.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (APEXMALHAR-2418) Updating the supported version in Twitter Library.

2017-02-21 Thread Shubham Agrawal (JIRA)
Shubham Agrawal created APEXMALHAR-2418:
---

 Summary: Updating the supported version in Twitter Library.
 Key: APEXMALHAR-2418
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2418
 Project: Apache Apex Malhar
  Issue Type: Bug
Reporter: Shubham Agrawal
Priority: Minor


There was a known issue of Build Failure for twitter version 4.0.4.

So for succesfully building and launching the app need to change the version 
from 4.0.4 to 4.0.6.

It has been tested.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (APEXMALHAR-2381) Change FSWindowManager for performance issues in Kinesis Input Operator

2017-02-21 Thread Chaitanya (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaitanya resolved APEXMALHAR-2381.
---
   Resolution: Fixed
Fix Version/s: 3.7.0

> Change FSWindowManager for performance issues in Kinesis Input Operator
> ---
>
> Key: APEXMALHAR-2381
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2381
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>Reporter: Deepak Narkhede
>Assignee: Deepak Narkhede
> Fix For: 3.7.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] apex-malhar pull request #550: APEXMALHAR-2381 Change WindowManager for perf...

2017-02-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/apex-malhar/pull/550


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (APEXMALHAR-2381) Change FSWindowManager for performance issues in Kinesis Input Operator

2017-02-21 Thread Chaitanya (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaitanya updated APEXMALHAR-2381:
--
Issue Type: Improvement  (was: Bug)

> Change FSWindowManager for performance issues in Kinesis Input Operator
> ---
>
> Key: APEXMALHAR-2381
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2381
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>Reporter: Deepak Narkhede
>Assignee: Deepak Narkhede
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (APEXMALHAR-2417) Add PojoOuterJoin (left, right and full) accumulation

2017-02-21 Thread Hitesh Kapoor (JIRA)
Hitesh Kapoor created APEXMALHAR-2417:
-

 Summary: Add PojoOuterJoin (left, right and full) accumulation
 Key: APEXMALHAR-2417
 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2417
 Project: Apache Apex Malhar
  Issue Type: Task
Reporter: Hitesh Kapoor
Assignee: Hitesh Kapoor


joinSimilar to PojoInnerJoin accumulation we should have  PojoOuterJoin 
accumulation



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)