[jira] [Commented] (APEXMALHAR-2419) KafkaSinglePortExactlyOnceOutputOperator fails on recovery

2017-03-24 Thread Thomas Weise (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940588#comment-15940588
 ] 

Thomas Weise commented on APEXMALHAR-2419:
--

Missing fix version.

> KafkaSinglePortExactlyOnceOutputOperator fails on recovery
> --
>
> Key: APEXMALHAR-2419
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2419
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Munagala V. Ramanath
>Assignee: Sandesh
>
> The KafkaSinglePortExactlyOnceOutputOperator fails on recovery with the 
> message: "Violates Exactly once. Not all the tuples received after operator 
> reset."
> This is because of this check in endWindow():
> {code}
>if (!partialWindowTuples.isEmpty() && windowId > 
> windowDataManager.getLargestCompletedWindow()) {
>   throw new RuntimeException("Violates Exactly once. Not all the tuples 
> received after operator reset.");
> }
> {code}
> Note:
> It is failing because of the bug in the WindowDataManager,
> https://issues.apache.org/jira/browse/APEXMALHAR-2422



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (APEXMALHAR-2419) KafkaSinglePortExactlyOnceOutputOperator fails on recovery

2017-02-22 Thread Thomas Weise (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879182#comment-15879182
 ] 

Thomas Weise commented on APEXMALHAR-2419:
--

Completed windows (endWindow) are supposed to be flushed to WAL, irrespective 
of committed.


> KafkaSinglePortExactlyOnceOutputOperator fails on recovery
> --
>
> Key: APEXMALHAR-2419
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2419
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Munagala V. Ramanath
>Assignee: Sandesh
>
> The KafkaSinglePortExactlyOnceOutputOperator fails on recovery with the 
> message: "Violates Exactly once. Not all the tuples received after operator 
> reset."
> This is because of this check in endWindow():
> {code}
>if (!partialWindowTuples.isEmpty() && windowId > 
> windowDataManager.getLargestCompletedWindow()) {
>   throw new RuntimeException("Violates Exactly once. Not all the tuples 
> received after operator reset.");
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (APEXMALHAR-2419) KafkaSinglePortExactlyOnceOutputOperator fails on recovery

2017-02-22 Thread Sandesh (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15878951#comment-15878951
 ] 

Sandesh commented on APEXMALHAR-2419:
-

Largest completed window is moving only after a committed call on 
WindowDataManager but the expected behavior is, Largest completed Window is the 
last saved Window ID. Current behaviour breaks not only this operator but also 
all the operators that use this operator.

> KafkaSinglePortExactlyOnceOutputOperator fails on recovery
> --
>
> Key: APEXMALHAR-2419
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2419
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Munagala V. Ramanath
>Assignee: Sandesh
>
> The KafkaSinglePortExactlyOnceOutputOperator fails on recovery with the 
> message: "Violates Exactly once. Not all the tuples received after operator 
> reset."
> This is because of this check in endWindow():
> {code}
>if (!partialWindowTuples.isEmpty() && windowId > 
> windowDataManager.getLargestCompletedWindow()) {
>   throw new RuntimeException("Violates Exactly once. Not all the tuples 
> received after operator reset.");
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (APEXMALHAR-2419) KafkaSinglePortExactlyOnceOutputOperator fails on recovery

2017-02-22 Thread Munagala V. Ramanath (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15878506#comment-15878506
 ] 

Munagala V. Ramanath commented on APEXMALHAR-2419:
--

To clarify, the point of the logs above is manifold:
1. The semantics of getLargestCompletedWindow() call are unclear -- what does 
"completed"
mean ?  We have well defined meanings for "committed" and "checkpointed" 
windows but not
for "completed"; the only reasonable meaning I can think of is: a window in 
which endWindow()
   has been called. It appears that operators are using this call with this 
expectation.
2. If the above interpretation is correct, we expect this call to return a new 
value after each
endWindow() call but that is not happening -- the value is constant for the 
entire duration of a
single run.
2. The operators that use WindowDataManager need to know when they are 
processing
replayed windows and when replay is finished. Given the current behavior, 
there is currently
   no way for operators to determine when replay is done using 
WindowDataManager.

> KafkaSinglePortExactlyOnceOutputOperator fails on recovery
> --
>
> Key: APEXMALHAR-2419
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2419
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Munagala V. Ramanath
>Assignee: Sandesh
>
> The KafkaSinglePortExactlyOnceOutputOperator fails on recovery with the 
> message: "Violates Exactly once. Not all the tuples received after operator 
> reset."
> This is because of this check in endWindow():
> {code}
>if (!partialWindowTuples.isEmpty() && windowId > 
> windowDataManager.getLargestCompletedWindow()) {
>   throw new RuntimeException("Violates Exactly once. Not all the tuples 
> received after operator reset.");
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (APEXMALHAR-2419) KafkaSinglePortExactlyOnceOutputOperator fails on recovery

2017-02-22 Thread Munagala V. Ramanath (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15878463#comment-15878463
 ] 

Munagala V. Ramanath commented on APEXMALHAR-2419:
--

[~sandesh] Does "recovery is not happening" mean (a) This is the first run of 
the operator and it has not gone down; or (b) This may be a restarted run but 
we've finished replaying all the old windows are are now processing new windows 
?

largestCompletedWindow will be -1 for (a) but not for (b)

> KafkaSinglePortExactlyOnceOutputOperator fails on recovery
> --
>
> Key: APEXMALHAR-2419
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2419
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Munagala V. Ramanath
>Assignee: Sandesh
>
> The KafkaSinglePortExactlyOnceOutputOperator fails on recovery with the 
> message: "Violates Exactly once. Not all the tuples received after operator 
> reset."
> This is because of this check in endWindow():
> {code}
>if (!partialWindowTuples.isEmpty() && windowId > 
> windowDataManager.getLargestCompletedWindow()) {
>   throw new RuntimeException("Violates Exactly once. Not all the tuples 
> received after operator reset.");
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (APEXMALHAR-2419) KafkaSinglePortExactlyOnceOutputOperator fails on recovery

2017-02-21 Thread Sandesh (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876656#comment-15876656
 ] 

Sandesh commented on APEXMALHAR-2419:
-

Fix is here, until it is merged.
https://github.com/sandeshh/apex-malhar/tree/APEXMALHAR-2419


> KafkaSinglePortExactlyOnceOutputOperator fails on recovery
> --
>
> Key: APEXMALHAR-2419
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2419
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Munagala V. Ramanath
>Assignee: Sandesh
>
> The KafkaSinglePortExactlyOnceOutputOperator fails on recovery with the 
> message: "Violates Exactly once. Not all the tuples received after operator 
> reset."
> This is because of this check in endWindow():
> {code}
>if (!partialWindowTuples.isEmpty() && windowId > 
> windowDataManager.getLargestCompletedWindow()) {
>   throw new RuntimeException("Violates Exactly once. Not all the tuples 
> received after operator reset.");
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (APEXMALHAR-2419) KafkaSinglePortExactlyOnceOutputOperator fails on recovery

2017-02-21 Thread Sandesh (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876641#comment-15876641
 ] 

Sandesh commented on APEXMALHAR-2419:
-

largestCompletedWindow will be -1, if the recovery is not happening.

> KafkaSinglePortExactlyOnceOutputOperator fails on recovery
> --
>
> Key: APEXMALHAR-2419
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2419
> Project: Apache Apex Malhar
>  Issue Type: Bug
>Reporter: Munagala V. Ramanath
>
> The KafkaSinglePortExactlyOnceOutputOperator fails on recovery with the 
> message: "Violates Exactly once. Not all the tuples received after operator 
> reset."
> This is because of this check in endWindow():
> {code}
>if (!partialWindowTuples.isEmpty() && windowId > 
> windowDataManager.getLargestCompletedWindow()) {
>   throw new RuntimeException("Violates Exactly once. Not all the tuples 
> received after operator reset.");
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (APEXMALHAR-2419) KafkaSinglePortExactlyOnceOutputOperator fails on recovery

2017-02-21 Thread Munagala V. Ramanath (JIRA)

[ 
https://issues.apache.org/jira/browse/APEXMALHAR-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876082#comment-15876082
 ] 

Munagala V. Ramanath commented on APEXMALHAR-2419:
--

The expectation seems to be that 
`windowDataManager.getLargestCompletedWindow()` will be the last window on 
which `windowDataManager.save()` was called in the previous failed run which in 
turn means that `windowId > windowDataManager.getLargestCompletedWindow()` will 
be true only when replay is complete. However as the following log messages 
show, the `windowDataManager.getLargestCompletedWindow()` always returns the 
same value during a run -- this may indicate a bug in FSWindowDataManager.

Notice in the initial deploy, the largestCompletedWindow is always -1 even 
after a save() call in endWindow:

Initial deploy:
{quote}
2017-02-21 06:18:55,196 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: 
beginWindow: windowId = 6389565783323705345, largestCompletedWindow = -1
2017-02-21 06:18:55,240 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: 
endWindow: windowId = 6389565783323705345, partialWindowTuples.size = 0, 
largestCompletedWindow = -1
2017-02-21 06:18:55,747 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: 
endWindow: saved data for windowId = 6389565783323705345, 
largestCompletedWindow = -1
2017-02-21 06:18:55,749 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: 
beginWindow: windowId = 6389565783323705346, largestCompletedWindow = -1
2017-02-21 06:18:55,749 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: 
endWindow: windowId = 6389565783323705346, partialWindowTuples.size = 0, 
largestCompletedWindow = -1
2017-02-21 06:18:55,806 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: 
endWindow: saved data for windowId = 6389565783323705346, 
largestCompletedWindow = -1
2017-02-21 06:18:55,806 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: 
beginWindow: windowId = 6389565783323705347, largestCompletedWindow = -1
2017-02-21 06:18:55,806 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: 
endWindow: windowId = 6389565783323705347, partialWindowTuples.size = 0, 
largestCompletedWindow = -1
2017-02-21 06:18:55,906 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: 
endWindow: saved data for windowId = 6389565783323705347, 
largestCompletedWindow = -1
2017-02-21 06:18:55,906 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: 
beginWindow: windowId = 6389565783323705348, largestCompletedWindow = -1
2017-02-21 06:18:55,906 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: 
endWindow: windowId = 6389565783323705348, partialWindowTuples.size = 0, 
largestCompletedWindow = -1
2017-02-21 06:18:55,964 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: 
endWindow: saved data for windowId = 6389565783323705348, 
largestCompletedWindow = -1
2017-02-21 06:18:55,964 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: 
beginWindow: windowId = 6389565783323705349, largestCompletedWindow = -1
2017-02-21 06:18:55,965 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: 
endWindow: windowId = 6389565783323705349, partialWindowTuples.size = 0, 
largestCompletedWindow = -1
2017-02-21 06:18:56,022 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: 
endWindow: saved data for windowId = 6389565783323705349, 
largestCompletedWindow = -1
2017-02-21 06:18:56,023 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: 
beginWindow: windowId = 6389565783323705350, largestCompletedWindow = -1
2017-02-21 06:19:03,570 INFO 
com.datatorrent.stram.engine.StreamingContainer: Undeploy request: [3]
{quote}

Notice here that when we reach the second replay window, it is not recognized 
as replay and throws the exception.

After redeploy:

{quote}
2017-02-21 06:19:11,899 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: 
beginWindow: windowId = 6389565783323705345, largestCompletedWindow = 
6389565783323705345
2017-02-21 06:19:11,900 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: Rebuild 
the partial window after 6389565783323705345
2017-02-21 06:19:13,151 INFO 
org.apache.apex.malhar.kafka.KafkaSinglePortExactlyOnceOutputOperator: 
endWindow: windowId = 6389565783323705345, partialWindowTuples.size = 4, 
largestCompletedWindow = 6389565783323705345
2017-02-21 06:19:13,766 INFO