[jira] [Commented] (FLINK-4640) Serialization of the initialValue of a Fold on WindowedStream fails

2016-09-20 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507442#comment-15507442
 ] 

Stephan Ewen commented on FLINK-4640:
-

Have a fix and nice tests. Waiting for the CI to give green light, then merging 
this fix.

> Serialization of the initialValue of a Fold on WindowedStream fails
> ---
>
> Key: FLINK-4640
> URL: https://issues.apache.org/jira/browse/FLINK-4640
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.2.0, 1.1.2
>Reporter: Fabian Hueske
>Priority: Blocker
> Fix For: 1.2.0, 1.1.3
>
>
> The following program
> {code}
> DataStream> src = env.fromElements(new Tuple2 Long>("a", 1L));
> src
>   .keyBy(1)
>   .timeWindow(Time.minutes(5))
>   .fold(TreeMultimap.create(), new FoldFunction Long>, TreeMultimap>() {
> @Override
> public TreeMultimap fold(
> TreeMultimap topKSoFar, 
> Tuple2 itemCount) throws Exception 
> {
>   String item = itemCount.f0;
>   Long count = itemCount.f1;
>   topKSoFar.put(count, item);
>   if (topKSoFar.keySet().size() > 10) {
> topKSoFar.removeAll(topKSoFar.keySet().first());
>   }
>   return topKSoFar;
> }
> });
> {code}
> throws this exception
> {quote}
> Caused by: java.lang.RuntimeException: Could not add value to folding state.
>   at 
> org.apache.flink.runtime.state.memory.MemFoldingState.add(MemFoldingState.java:91)
>   at 
> org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.processElement(WindowOperator.java:382)
>   at 
> org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:176)
>   at 
> org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:66)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:266)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
>   at 
> com.google.common.collect.AbstractMapBasedMultimap.put(AbstractMapBasedMultimap.java:192)
>   at 
> com.google.common.collect.AbstractSetMultimap.put(AbstractSetMultimap.java:121)
>   at com.google.common.collect.TreeMultimap.put(TreeMultimap.java:78)
>   at 
> org.apache.flink.streaming.examples.wordcount.WordCount$1.fold(WordCount.java:115)
>   at 
> org.apache.flink.streaming.examples.wordcount.WordCount$1.fold(WordCount.java:109)
>   at 
> org.apache.flink.runtime.state.memory.MemFoldingState.add(MemFoldingState.java:85)
>   ... 6 more
> {quote}
> The exception is caused because the initial value was not correctly 
> deserialized and is {{null}}.
> Using the same {{FoldFunction}} on a {{KeyedStream}} (without a window) works 
> fine.
> I tracked the problem down to the serialization of the {{StateDescriptor}}, 
> i.e., the {{writeObject()}} and {{readObject()}} methods. The methods use 
> Flink's TypeSerializers to serialize the default value. In case of the 
> {{TreeMultiMap}} this is the {{KryoSerializer}} which fails to read the 
> serialized data for some reason.
> A quick workaround to solve this issue would be to check if the default value 
> implements {{Serializable}} and use Java Serialization in this case. However, 
> it would be good to track the root cause of this problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4640) Serialization of the initialValue of a Fold on WindowedStream fails

2016-09-20 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507323#comment-15507323
 ] 

Stephan Ewen commented on FLINK-4640:
-

Okay, so the {{keyBy(1).fold(...)}} case can be fixed via 
{{env.registerTypeWithKryoSerializer(TreeMultimap.class, 
JavaSerializer.class);}}

The Window operator still fails if I do 
{{keyBy(1).timeWindow(Time.seconds(10)).fold(...)}}.
The bug is that serializer registrations are not properly forwarded.

Fixing that...

> Serialization of the initialValue of a Fold on WindowedStream fails
> ---
>
> Key: FLINK-4640
> URL: https://issues.apache.org/jira/browse/FLINK-4640
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.2.0, 1.1.2
>Reporter: Fabian Hueske
>Priority: Blocker
> Fix For: 1.2.0, 1.1.3
>
>
> The following program
> {code}
> DataStream> src = env.fromElements(new Tuple2 Long>("a", 1L));
> src
>   .keyBy(1)
>   .timeWindow(Time.minutes(5))
>   .fold(TreeMultimap.create(), new FoldFunction Long>, TreeMultimap>() {
> @Override
> public TreeMultimap fold(
> TreeMultimap topKSoFar, 
> Tuple2 itemCount) throws Exception 
> {
>   String item = itemCount.f0;
>   Long count = itemCount.f1;
>   topKSoFar.put(count, item);
>   if (topKSoFar.keySet().size() > 10) {
> topKSoFar.removeAll(topKSoFar.keySet().first());
>   }
>   return topKSoFar;
> }
> });
> {code}
> throws this exception
> {quote}
> Caused by: java.lang.RuntimeException: Could not add value to folding state.
>   at 
> org.apache.flink.runtime.state.memory.MemFoldingState.add(MemFoldingState.java:91)
>   at 
> org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.processElement(WindowOperator.java:382)
>   at 
> org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:176)
>   at 
> org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:66)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:266)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
>   at 
> com.google.common.collect.AbstractMapBasedMultimap.put(AbstractMapBasedMultimap.java:192)
>   at 
> com.google.common.collect.AbstractSetMultimap.put(AbstractSetMultimap.java:121)
>   at com.google.common.collect.TreeMultimap.put(TreeMultimap.java:78)
>   at 
> org.apache.flink.streaming.examples.wordcount.WordCount$1.fold(WordCount.java:115)
>   at 
> org.apache.flink.streaming.examples.wordcount.WordCount$1.fold(WordCount.java:109)
>   at 
> org.apache.flink.runtime.state.memory.MemFoldingState.add(MemFoldingState.java:85)
>   ... 6 more
> {quote}
> The exception is caused because the initial value was not correctly 
> deserialized and is {{null}}.
> Using the same {{FoldFunction}} on a {{KeyedStream}} (without a window) works 
> fine.
> I tracked the problem down to the serialization of the {{StateDescriptor}}, 
> i.e., the {{writeObject()}} and {{readObject()}} methods. The methods use 
> Flink's TypeSerializers to serialize the default value. In case of the 
> {{TreeMultiMap}} this is the {{KryoSerializer}} which fails to read the 
> serialized data for some reason.
> A quick workaround to solve this issue would be to check if the default value 
> implements {{Serializable}} and use Java Serialization in this case. However, 
> it would be good to track the root cause of this problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4640) Serialization of the initialValue of a Fold on WindowedStream fails

2016-09-20 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15507104#comment-15507104
 ] 

Stephan Ewen commented on FLINK-4640:
-

I actually reproduced the error BOTH on the {{KeyedStream}} and on the 
{{WindowDataStream}}.

The problem is that Kryo cannot properly serialize the {{TreeMultiMap}}. It 
uses Objenesis to instantiate the map on deserialization, which leaves a broken 
object that then causes the nullpointer exception.
That is a Kryo/Guava incompatibility. Not sure there is anything we can do 
directly about that.

Should be fixable by registering a suitable serializer for the TreeMultiMap:
{code}
env.registerTypeWithKryoSerializer(TreeMultimap.class, JavaSerializer.class);
{code}

> Serialization of the initialValue of a Fold on WindowedStream fails
> ---
>
> Key: FLINK-4640
> URL: https://issues.apache.org/jira/browse/FLINK-4640
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.2.0, 1.1.2
>Reporter: Fabian Hueske
>Priority: Blocker
> Fix For: 1.2.0, 1.1.3
>
>
> The following program
> {code}
> DataStream> src = env.fromElements(new Tuple2 Long>("a", 1L));
> src
>   .keyBy(1)
>   .timeWindow(Time.minutes(5))
>   .fold(TreeMultimap.create(), new FoldFunction Long>, TreeMultimap>() {
> @Override
> public TreeMultimap fold(
> TreeMultimap topKSoFar, 
> Tuple2 itemCount) throws Exception 
> {
>   String item = itemCount.f0;
>   Long count = itemCount.f1;
>   topKSoFar.put(count, item);
>   if (topKSoFar.keySet().size() > 10) {
> topKSoFar.removeAll(topKSoFar.keySet().first());
>   }
>   return topKSoFar;
> }
> });
> {code}
> throws this exception
> {quote}
> Caused by: java.lang.RuntimeException: Could not add value to folding state.
>   at 
> org.apache.flink.runtime.state.memory.MemFoldingState.add(MemFoldingState.java:91)
>   at 
> org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.processElement(WindowOperator.java:382)
>   at 
> org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:176)
>   at 
> org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:66)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:266)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
>   at 
> com.google.common.collect.AbstractMapBasedMultimap.put(AbstractMapBasedMultimap.java:192)
>   at 
> com.google.common.collect.AbstractSetMultimap.put(AbstractSetMultimap.java:121)
>   at com.google.common.collect.TreeMultimap.put(TreeMultimap.java:78)
>   at 
> org.apache.flink.streaming.examples.wordcount.WordCount$1.fold(WordCount.java:115)
>   at 
> org.apache.flink.streaming.examples.wordcount.WordCount$1.fold(WordCount.java:109)
>   at 
> org.apache.flink.runtime.state.memory.MemFoldingState.add(MemFoldingState.java:85)
>   ... 6 more
> {quote}
> The exception is caused because the initial value was not correctly 
> deserialized and is {{null}}.
> Using the same {{FoldFunction}} on a {{KeyedStream}} (without a window) works 
> fine.
> I tracked the problem down to the serialization of the {{StateDescriptor}}, 
> i.e., the {{writeObject()}} and {{readObject()}} methods. The methods use 
> Flink's TypeSerializers to serialize the default value. In case of the 
> {{TreeMultiMap}} this is the {{KryoSerializer}} which fails to read the 
> serialized data for some reason.
> A quick workaround to solve this issue would be to check if the default value 
> implements {{Serializable}} and use Java Serialization in this case. However, 
> it would be good to track the root cause of this problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)