Re: How to get Kafka record's timestamp using Flink's KeyedDeserializationSchema?

2018-11-16 Thread Flink Developer
Thanks. Is there an alternative way to obtain the Kafka record timestamps using 
FlinkKafkaConsumer?

‐‐‐ Original Message ‐‐‐
On Friday, November 16, 2018 9:59 AM, Andrey Zagrebin 
 wrote:

> Hi,
>
> I think this is still on-going effort. You can monitor the corresponding 
> issues [1] and [2].
>
> Best,
> Andrey
>
> [1] https://issues.apache.org/jira/browse/FLINK-8500
> [2] https://issues.apache.org/jira/browse/FLINK-8354
>
>> On 16 Nov 2018, at 09:41, Flink Developer  
>> wrote:
>>
>> Kafka timestamp
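
One workaround for the question above, assuming Flink's Kafka 0.10+ consumer and an
event-time job: FlinkKafkaConsumer010 attaches each record's Kafka timestamp to the
emitted StreamRecord, so a downstream ProcessFunction can read it back via
ctx.timestamp(). A minimal sketch (the String element type is an assumption):

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

// Pairs each record with the timestamp the Kafka consumer attached to it.
public class AttachKafkaTimestamp extends ProcessFunction<String, Tuple2<String, Long>> {
    @Override
    public void processElement(String value, Context ctx, Collector<Tuple2<String, Long>> out) {
        // ctx.timestamp() is the record's attached timestamp; it is null unless the job
        // runs with TimeCharacteristic.EventTime, so guard for that in real code.
        out.collect(Tuple2.of(value, ctx.timestamp()));
    }
}

// usage: kafkaStream.process(new AttachKafkaTimestamp());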

Re: Flink Serialization and case class fields limit

2018-11-16 Thread Fabian Hueske
Hi Andrea,

I wrote the post 2.5 years ago. Sorry, I don't think I kept the code
anywhere, but the mechanics in Flink should still be the same.

Best, Fabian

On Fri, 16 Nov 2018 at 20:06, Andrea Sella wrote:

> Hi Andrey,
>
> My bad, I forgot to say that I am using Scala 2.11, that’s why I asked
> about the limitation, and Flink 1.5.5.
>
> If I recall correctly, CaseClassSerializer and CaseClassTypeInfo don't rely
> on unapply and tupled functions, so I'd say that Flink doesn't have this
> kind of limitation with Scala 2.11. Correct?
>
> Thank you,
> Andrea
> On Fri, 16 Nov 2018 at 19:34, Andrey Zagrebin 
> wrote:
>
>> Hi Andrea,
>>
>> The 22 limit comes from Scala [1], not Flink.
>> I am not sure about any repo for the post, but I also cc'ed Fabian; maybe
>> he can point to one if it exists.
>>
>> Best,
>> Andrey
>>
>> [1] https://underscore.io/blog/posts/2016/10/11/twenty-two.html
>>
>>
>> On 16 Nov 2018, at 13:10, Andrea Sella  wrote:
>>
>> Hey squirrels,
>>
>> I've started to study Flink serialization and its "type
>> system" more in depth.
>>
>> I have a case class generated with scalapb that has more than 30 fields;
>> I've seen that Flink still uses the CaseClassSerializer and the
>> TypeInformation is CaseClassTypeInfo, even though the docs [1] say
>> otherwise (22-field limit). I'd have expected a GenericTypeInfo, but all
>> is well because the CaseClassSerializer is faster than Kryo. Did I
>> misunderstand the documentation, or does the limitation no longer apply?
>>
>> Another thing: I've read the "Juggling with Bits and Bytes" [2] blog post and I
>> would like to replicate the experiment with some tailored changes to dive
>> deeper into the topic. Is the source code on GitHub or somewhere
>> else?
>>
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/types_serialization.html#flinks-typeinformation-class
>> [2]
>> https://flink.apache.org/news/2015/05/11/Juggling-with-Bits-and-Bytes.html
>>
>> Thank you,
>> Andrea
>>
>>
>>


Re: Kinesis Shards and Parallelism

2018-11-16 Thread shkob1
Actually was looking at the task manager level, i did have more slots than
shards, so it does make sense i had an idle task manager while other task
managers split the shards between their slots

Thanks!
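
For reference, a sketch of pinning the source parallelism to the shard count so that no
consumer subtask sits idle (stream name, schema, properties, and shard count are
placeholders, not taken from the thread):

import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
Properties consumerConfig = new Properties();  // region/credentials settings omitted

int numberOfShards = 4;  // assumed shard count of the Kinesis stream
DataStream<String> kinesis = env
        .addSource(new FlinkKinesisConsumer<>("my-stream", new SimpleStringSchema(), consumerConfig))
        .setParallelism(numberOfShards);  // one consumer subtask per shard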





Re: Flink Serialization and case class fields limit

2018-11-16 Thread Andrea Sella
Hi Andrey,

My bad, I forgot to say that I am using Scala 2.11, that’s why I asked
about the limitation, and Flink 1.5.5.

If I recall correctly, CaseClassSerializer and CaseClassTypeInfo don't rely
on unapply and tupled functions, so I'd say that Flink doesn't have this
kind of limitation with Scala 2.11. Correct?

Thank you,
Andrea
On Fri, 16 Nov 2018 at 19:34, Andrey Zagrebin 
wrote:

> Hi Andrea,
>
> The 22 limit comes from Scala [1], not Flink.
> I am not sure about any repo for the post, but I also cc'ed Fabian; maybe
> he can point to one if it exists.
>
> Best,
> Andrey
>
> [1] https://underscore.io/blog/posts/2016/10/11/twenty-two.html
>
>
> On 16 Nov 2018, at 13:10, Andrea Sella  wrote:
>
> Hey squirrels,
>
> I've started to study Flink serialization and its "type
> system" more in depth.
>
> I have a case class generated with scalapb that has more than 30 fields;
> I've seen that Flink still uses the CaseClassSerializer and the
> TypeInformation is CaseClassTypeInfo, even though the docs [1] say
> otherwise (22-field limit). I'd have expected a GenericTypeInfo, but all
> is well because the CaseClassSerializer is faster than Kryo. Did I
> misunderstand the documentation, or does the limitation no longer apply?
>
> Another thing: I've read the "Juggling with Bits and Bytes" [2] blog post and I
> would like to replicate the experiment with some tailored changes to dive
> deeper into the topic. Is the source code on GitHub or somewhere
> else?
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/types_serialization.html#flinks-typeinformation-class
> [2]
> https://flink.apache.org/news/2015/05/11/Juggling-with-Bits-and-Bytes.html
>
> Thank you,
> Andrea
>
>
>


Re: BucketingSink vs StreamingFileSink

2018-11-16 Thread Andrey Zagrebin
Hi,

StreamingFileSink is supposed to subsume BucketingSink, which will be deprecated.

StreamingFileSink fixes some issues of BucketingSink, especially with AWS S3,
and adds more flexibility in defining the rolling policy.

StreamingFileSink does not support older Hadoop versions at the moment,
but there are ideas on how to resolve this.

You can have a look at how to use StreamingFileSink with Parquet here [1].

I also cc’ed Kostas, he might add more to this topic.

Best,
Andrey

[1] 
https://github.com/apache/flink/blob/0b4947b6142f813d2f1e0e662d0fefdecca0e382/flink-formats/flink-parquet/src/test/java/org/apache/flink/formats/parquet/avro/ParquetStreamingFileSinkITCase.java
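
For the row-encoded case asked about below, a minimal sketch (Flink 1.6; the output path
and the DataStream<String> of JSON lines are assumptions):

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

StreamingFileSink<String> sink = StreamingFileSink
        .forRowFormat(new Path("hdfs:///output/json"), new SimpleStringEncoder<String>("UTF-8"))
        .build();  // default rolling policy; a custom one can be set on the builder

jsonStream.addSink(sink);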

> On 16 Nov 2018, at 11:31, Edward Rojas  wrote:
> 
> Hello,
> We are currently using Flink 1.5 and we use the BucketingSink to save the
> result of job processing to HDFS.
> The data is in JSON format and we store one object per line in the resulting
> files. 
> 
> We are planning to upgrade to Flink 1.6 and we see that there is this new
> StreamingFileSink; from the description it looks very similar to
> BucketingSink when using a Row-encoded Output Format. My question is: should
> we consider moving to StreamingFileSink?
> 
> I would like to better understand what the suggested use cases are for each
> of the two options now.
> 
> We are also considering additionally outputting the data in Parquet format
> for data scientists (to be stored in HDFS as well). For this I see some
> utils to work with StreamingFileSink, so I guess for this case it's
> recommended to use that option?
> Is it possible to use the Parquet writers even when the schema of the data
> may evolve?
> 
> Thanks in advance for your help.
> (Sorry if I put too many questions in the same message)
> 
> 
> 



Re: Could not find previous entry with key.

2018-11-16 Thread Steve Bistline
I implemented hashCode() on both DEVICE_ID and MOTION_DIRECTION (the
pattern is built around the latter). It is still giving me the following error:


java.lang.RuntimeException: Exception occurred while processing valve
output watermark:
at 
org.apache.flink.streaming.runtime.io.StreamInputProcessor$ForwardingValveOutputHandler.handleWatermark(StreamInputProcessor.java:265)
at 
org.apache.flink.streaming.runtime.streamstatus.StatusWatermarkValve.findAndOutputNewMinWatermarkAcrossAlignedChannels(StatusWatermarkValve.java:189)
at 
org.apache.flink.streaming.runtime.streamstatus.StatusWatermarkValve.inputWatermark(StatusWatermarkValve.java:111)
at 
org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:184)
at 
org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:103)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:306)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Could not find previous
entry with key: first event, value:
{"DEVICE_ID":7435b060-d2fb-11e8-8da5-9779854d8172,"TIME_STAMP":11/16/2018
06:34:33.994 
pm,"TEMPERATURE":0.0,"HUMIDITY":"0.0","LIGHT_WHITE":0.0,"PROCX":0.0,"MOTION_DIRECTION":0,"MOTION_SPEED":0,"MOOD_STATE":0,"VISION_PEOPLE":0,"AUDIO1":0.0}
and timestamp: 1542393274928. This can indicate that either you did
not implement the equals() and hashCode() methods of your input
elements properly or that the element belonging to that entry has been
already pruned.
at org.apache.flink.cep.nfa.SharedBuffer.put(SharedBuffer.java:107)
at org.apache.flink.cep.nfa.NFA.computeNextStates(NFA.java:566)
at org.apache.flink.cep.nfa.NFA.process(NFA.java:252)
at 
org.apache.flink.cep.operator.AbstractKeyedCEPPatternOperator.processEvent(AbstractKeyedCEPPatternOperator.java:332)
at 
org.apache.flink.cep.operator.AbstractKeyedCEPPatternOperator.lambda$onEventTime$0(AbstractKeyedCEPPatternOperator.java:235)
at 
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at 
java.util.stream.ReferencePipeline$Head.forEachOrdered(ReferencePipeline.java:590)
at 
org.apache.flink.cep.operator.AbstractKeyedCEPPatternOperator.onEventTime(AbstractKeyedCEPPatternOperator.java:234)
at 
org.apache.flink.streaming.api.operators.HeapInternalTimerService.advanceWatermark(HeapInternalTimerService.java:288)
at 
org.apache.flink.streaming.api.operators.InternalTimeServiceManager.advanceWatermark(InternalTimeServiceManager.java:108)
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator.processWatermark(AbstractStreamOperator.java:734)
at 
org.apache.flink.streaming.runtime.io.StreamInputProcessor$ForwardingValveOutputHandler.handleWatermark(StreamInputProcessor.java:262)
... 7 more


On Thu, Nov 15, 2018 at 8:13 AM Chesnay Schepler  wrote:

> Does the issue persist if you implement hashCode() in IoTEvent like this:
>
> @Override
> public int hashCode() {
>     return this.DEVICE_ID.hashCode();
> }
>
>
> On 15.11.2018 13:58, Steve Bistline wrote:
>
> Sure, here it is.
>
> Thank you so very much for having a look at this.
>
> Steve
>
>
> On Thu, Nov 15, 2018 at 4:18 AM Chesnay Schepler 
> wrote:
>
>> Can you provide us with the implementation of your Event and IoTEvent
>> classes?
>>
>> On 15.11.2018 06:10, Steve Bistline wrote:
>>
>> Any thoughts on where to start with this error would be appreciated.
>>
>> Caused by: java.lang.IllegalStateException: Could not find previous entry 
>> with key: first event, value: 
>> {"DEVICE_ID":f8a395a0-d3e2-11e8-b050-9779854d8172,"TIME_STAMP":11/15/2018 
>> 02:29:30.343 
>> am,"TEMPERATURE":0.0,"HUMIDITY":"0.0","LIGHT_WHITE":0.0,"PROCX":0.0,"MOTION_DIRECTION":0,"MOTION_SPEED":0,"MOOD_STATE":0,"VISION_PEOPLE":0,"AUDIO1":0.0}
>>  and timestamp: 1542248971585. This can indicate that either you did not 
>> implement the equals() and hashCode() methods of your input elements 
>> properly or that the element belonging to that entry has been already pruned.
>>
>> =
>>
>> CODE HERE
>>
>> =
>>
>> //kinesisConsumerConfig.list(System.out);
>>
>> // Consume the data streams from AWS Kinesis stream
>> DataStream dataStream = env.addSource(new FlinkKinesisConsumer<>(
>>         pt.getRequired("stream"),
>>         new EventSchema(),
>>         kinesisConsumerConfig))
>>     .name("Kinesis Stream Consumer");
>>
>> System.out.printf("Print dataStream\n");
>> //dataStream.print();
>>
>> DataStream kinesisStream = dataStream
>>     .assignTimestampsAndWatermarks(new TimeLagWatermarkGenerator())
>>     .map(event -> (IoTEvent) event);

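For completeness, a sketch of an equals()/hashCode() pair satisfying the contract the
exception above refers to, assuming (per the thread) that DEVICE_ID and MOTION_DIRECTION
identify an event; the real IoTEvent may need to compare every field the pattern reads:

@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof IoTEvent)) return false;
    IoTEvent that = (IoTEvent) o;
    return MOTION_DIRECTION == that.MOTION_DIRECTION
            && java.util.Objects.equals(DEVICE_ID, that.DEVICE_ID);
}

@Override
public int hashCode() {
    // must use the same fields as equals() and stay stable for the element's lifetime
    return java.util.Objects.hash(DEVICE_ID, MOTION_DIRECTION);
}
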
Re: Flink Serialization and case class fields limit

2018-11-16 Thread Andrey Zagrebin
Hi Andrea,

The 22 limit comes from Scala [1], not Flink.
I am not sure about any repo for the post, but I also cc'ed Fabian; maybe he
can point to one if it exists.

Best,
Andrey

[1] https://underscore.io/blog/posts/2016/10/11/twenty-two.html

> On 16 Nov 2018, at 13:10, Andrea Sella  wrote:
> 
> Hey squirrels,
> 
> I've started to study Flink serialization and its "type system" more in depth.
> 
> I have a case class generated with scalapb that has more than 30 fields;
> I've seen that Flink still uses the CaseClassSerializer and the TypeInformation
> is CaseClassTypeInfo, even though the docs [1] say otherwise (22-field
> limit). I'd have expected a GenericTypeInfo, but all is well because
> the CaseClassSerializer is faster than Kryo. Did I misunderstand the
> documentation, or does the limitation no longer apply?
> 
> Another thing: I've read the "Juggling with Bits and Bytes" [2] blog post and I
> would like to replicate the experiment with some tailored changes to dive
> deeper into the topic. Is the source code on GitHub or somewhere else?
> 
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/types_serialization.html#flinks-typeinformation-class
> [2]
> https://flink.apache.org/news/2015/05/11/Juggling-with-Bits-and-Bytes.html
> 
> Thank you,
> Andrea
> 



Re: How to get Kafka record's timestamp using Flink's KeyedDeserializationSchema?

2018-11-16 Thread Andrey Zagrebin
Hi,

I think this is still on-going effort. You can monitor the corresponding issues 
[1] and [2].

Best,
Andrey

[1] https://issues.apache.org/jira/browse/FLINK-8500 

[2] https://issues.apache.org/jira/browse/FLINK-8354

> On 16 Nov 2018, at 09:41, Flink Developer  wrote:
> 
> Kafka timestamp



Re: Standalone HA cluster: Fatal error occurred in the cluster entrypoint.

2018-11-16 Thread Olga Luganska
Hi, Miki

Thank you for your reply!

I have deleted the ZooKeeper data and was able to restart the cluster.

Olga

Sent from my iPhone

On Nov 16, 2018, at 4:38 AM, miki haiat <miko5...@gmail.com> wrote:

I "solved" this issue by cleaning the ZooKeeper information and starting the
cluster again. All the checkpoint and job graph data will be erased, and
basically you will start a new cluster...

It happened to me a lot on 1.5.x.
On 1.6 things are running perfectly.
I'm not sure why this error is back again in 1.6.1?


On Fri, 16 Nov 2018, 0:42 Olga Luganska <trebl...@hotmail.com> wrote:
Hello,

I am running a flink 1.6.1 standalone HA cluster. Today I am unable to start the
cluster because of "Fatal error in cluster entrypoint".
(I used to see this error when running flink 1.5; after upgrading to 1.6.1,
which had a fix for this bug, everything worked well for a while.)

Question: what exactly needs to be done to clean "state handle store"?


2018-11-15 15:09:53,181 DEBUG 
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor  - Fencing token 
not set: Ignoring message LocalFencedMessage(null, 
org.apache.flink.runtime.rpc.messages.RunAsync@21fd224c) because the fencing 
token is null.

2018-11-15 15:09:53,182 ERROR 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Fatal error 
occurred in the cluster entrypoint.

java.lang.RuntimeException: org.apache.flink.util.FlinkException: Could not 
retrieve submitted JobGraph from state handle under 
/e13034f83a80072204facb2cec9ea6a3. This indicates that the retrieved state 
handle is broken. Try cleaning the state handle store.

at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:199)

at 
org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$1(FunctionUtils.java:61)

at 
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)

at 
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)

at 
java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)

at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)

at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)

at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by: org.apache.flink.util.FlinkException: Could not retrieve submitted 
JobGraph from state handle under /e13034f83a80072204facb2cec9ea6a3. This 
indicates that the retrieved state handle is broken. Try cleaning the state 
handle store.

at 
org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore.recoverJobGraph(ZooKeeperSubmittedJobGraphStore.java:208)

at 
org.apache.flink.runtime.dispatcher.Dispatcher.recoverJob(Dispatcher.java:692)

at 
org.apache.flink.runtime.dispatcher.Dispatcher.recoverJobGraphs(Dispatcher.java:677)

at 
org.apache.flink.runtime.dispatcher.Dispatcher.recoverJobs(Dispatcher.java:658)

at 
org.apache.flink.runtime.dispatcher.Dispatcher.lambda$null$26(Dispatcher.java:817)

at 
org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$1(FunctionUtils.java:59)

... 9 more

Caused by: java.io.FileNotFoundException: 
/checkpoint_repo/ha/submittedJobGraphdd865937d674 (No such file or directory)

at java.io.FileInputStream.open0(Native Method)

at java.io.FileInputStream.open(FileInputStream.java:195)

at java.io.FileInputStream.<init>(FileInputStream.java:138)

at 
org.apache.flink.core.fs.local.LocalDataInputStream.<init>(LocalDataInputStream.java:50)

at 
org.apache.flink.core.fs.local.LocalFileSystem.open(LocalFileSystem.java:142)

at 
org.apache.flink.runtime.state.filesystem.FileStateHandle.openInputStream(FileStateHandle.java:68)

at 
org.apache.flink.runtime.state.RetrievableStreamStateHandle.openInputStream(RetrievableStreamStateHandle.java:64)

at 
org.apache.flink.runtime.state.RetrievableStreamStateHandle.retrieveState(RetrievableStreamStateHandle.java:57)

at 
org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore.recoverJobGraph(ZooKeeperSubmittedJobGraphStore.java:202)

... 14 more

2018-11-15 15:09:53,185 INFO  org.apache.flink.runtime.blob.TransientBlobCache  
- Shutting down BLOB cache


thank you,

Olga
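
For reference, with Flink's default high-availability.zookeeper.path.root of /flink, the
cleanup described above amounts to deleting that subtree with the ZooKeeper CLI (host,
port, and path are assumptions to adapt; as noted, this also discards the stored job
graphs and checkpoint pointers):

bin/zkCli.sh -server localhost:2181
rmr /flink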



Re: Null Pointer Exception

2018-11-16 Thread Ken Krugler
Hi Steve,

I don't have experience with Flink CEP, but based on the previous stack trace
you posted, I'm guessing that for one of the records, value.getAudio_FFT1()
returns null.

— Ken

> On Nov 16, 2018, at 3:40 AM, Steve Bistline  wrote:
> 
> I have a fairly straightforward project that is generating a null pointer and
> a heap space error.
> 
> Any thoughts on where to begin debugging this?
> 
> I suspect it is in this part of the code somewhere.
> 
> Pattern pattern = Pattern
>         .begin("first event").subtype(IoTEvent.class)
>         .where(new IterativeCondition() {
>             private static final long serialVersionUID = -6301755149429716724L;
>
>             @Override
>             public boolean filter(IoTEvent value, Context ctx) throws Exception {
>                 return PatternConstants.AUDIO_YELL.toLowerCase().contains(value.getAudio_FFT1());
>             }
>         })
>         .next("second")
>         .subtype(IoTEvent.class)
>         .where(new IterativeCondition() {
>             private static final long serialVersionUID = 2392863109523984059L;
>
>             @Override
>             public boolean filter(IoTEvent value, Context ctx) throws Exception {
>                 return PatternConstants.AUDIO_YELL.toLowerCase().contains(value.getAudio_FFT1());
>             }
>         })
>         .next("third")
>         .subtype(IoTEvent.class)
>         .where(new IterativeCondition() {
>             private static final long serialVersionUID = 2392863109523984059L;
>
>             @Override
>             public boolean filter(IoTEvent value, Context ctx) throws Exception {
>                 return PatternConstants.AUDIO_YELL.toLowerCase().contains(value.getAudio_FFT1());
>             }
>         })
>         .next("fourth")
>         .subtype(IoTEvent.class)
>         .where(new IterativeCondition() {
>             private static final long serialVersionUID = 2392863109523984059L;
>
>             @Override
>             public boolean filter(IoTEvent value, Context ctx) throws Exception {
>                 return PatternConstants.AUDIO_YELL.toLowerCase().contains(value.getAudio_FFT1());
>             }
>         })
>         .within(Time.seconds(10));
> 
> Any help appreciated
> 
> Thanks,
> 
> SRB
> 
> ===
> 
> =
> 
> java.lang.RuntimeException: Exception occurred while processing valve output 
> watermark: 
>   at 
> org.apache.flink.streaming.runtime.io.StreamInputProcessor$ForwardingValveOutputHandler.handleWatermark(StreamInputProcessor.java:265)
>   at 
> org.apache.flink.streaming.runtime.streamstatus.StatusWatermarkValve.findAndOutputNewMinWatermarkAcrossAlignedChannels(StatusWatermarkValve.java:189)
>   at 
> org.apache.flink.streaming.runtime.streamstatus.StatusWatermarkValve.inputWatermark(StatusWatermarkValve.java:111)
>   at 
> org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:184)
>   at 
> org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:103)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:306)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: Failure happened in filter function.
>   at org.apache.flink.cep.nfa.NFA.createDecisionGraph(NFA.java:704)
>   at org.apache.flink.cep.nfa.NFA.computeNextStates(NFA.java:500)
>   at org.apache.flink.cep.nfa.NFA.process(NFA.java:252)
>   at 
> org.apache.flink.cep.operator.AbstractKeyedCEPPatternOperator.processEvent(AbstractKeyedCEPPatternOperator.java:332)
>   at 
> org.apache.flink.cep.operator.AbstractKeyedCEPPatternOperator.lambda$onEventTime$0(AbstractKeyedCEPPatternOperator.java:235)
>   at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
>   at 
> java.util.stream.ReferencePipeline$Head.forEachOrdered(ReferencePipeline.java:590)
>   at 
> org.apache.flink.cep.operator.AbstractKeyedCEPPatternOperator.onEventTime(AbstractKeyedCEPPatternOperator.java:234)
>   at 
> org.apache.flink.streaming.api.operators.HeapInternalTimerService.advanceWatermark(HeapInternalTimerService.java:288)
>   at 
> org.apache.flink.streaming.api.operators.InternalTimeServiceManager.advanceWatermark(InternalTimeServiceManager.java:108)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.processWatermark(AbstractStreamOperator.java:734)
>   at 
> org.apache.flink.streaming.runtime.io.StreamInputProcessor$ForwardingValveOutputHandler.handleWatermark(StreamInputProcessor.java:262)
>   ... 7 more
> Caused by: java.lang.NullPointerException
> 

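If Ken's guess is right, a null guard in each of the four identical conditions would make
the bad records explicit instead of failing with a NullPointerException inside the NFA; a
sketch, assuming getAudio_FFT1() returns a String:

@Override
public boolean filter(IoTEvent value, Context ctx) throws Exception {
    String fft = value.getAudio_FFT1();  // may be null for some records
    return fft != null && PatternConstants.AUDIO_YELL.toLowerCase().contains(fft);
}
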
Flink Serialization and case class fields limit

2018-11-16 Thread Andrea Sella
Hey squirrels,

I've started to study Flink serialization and its "type
system" more in depth.

I have a case class generated with scalapb that has more than 30 fields;
I've seen that Flink still uses the CaseClassSerializer and the
TypeInformation is CaseClassTypeInfo, even though the docs [1] say
otherwise (22-field limit). I'd have expected a GenericTypeInfo, but all
is well because the CaseClassSerializer is faster than Kryo. Did I
misunderstand the documentation, or does the limitation no longer apply?

Another thing: I've read the "Juggling with Bits and Bytes" [2] blog post and I
would like to replicate the experiment with some tailored changes to dive
deeper into the topic. Is the source code on GitHub or somewhere
else?

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/types_serialization.html#flinks-typeinformation-class
[2]
https://flink.apache.org/news/2015/05/11/Juggling-with-Bits-and-Bytes.html

Thank you,
Andrea
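
The observation above ("Flink still uses the CaseClassSerializer") can be verified from
the derived TypeInformation; a minimal sketch, with the stream over the scalapb-generated
class as an assumption:

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

static void printTypeInfo(StreamExecutionEnvironment env, DataStream<?> events) {
    TypeInformation<?> typeInfo = events.getType();
    // A case class handled by the Scala API shows up as CaseClassTypeInfo;
    // a Kryo fallback would show up as GenericTypeInfo.
    System.out.println(typeInfo.getClass().getSimpleName() + ": " + typeInfo);
    // And the serializer actually used at runtime, e.g. CaseClassSerializer vs. KryoSerializer:
    System.out.println(typeInfo.createSerializer(env.getConfig()).getClass().getSimpleName());
}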


Null Pointer Exception

2018-11-16 Thread Steve Bistline
I have a fairly straightforward project that is generating a null pointer
and a heap space error.

Any thoughts on where to begin debugging this?

I suspect it is in this part of the code somewhere.


Pattern pattern = Pattern
        .begin("first event").subtype(IoTEvent.class)
        .where(new IterativeCondition() {
            private static final long serialVersionUID = -6301755149429716724L;

            @Override
            public boolean filter(IoTEvent value, Context ctx) throws Exception {
                return PatternConstants.AUDIO_YELL.toLowerCase().contains(value.getAudio_FFT1());
            }
        })
        .next("second")
        .subtype(IoTEvent.class)
        .where(new IterativeCondition() {
            private static final long serialVersionUID = 2392863109523984059L;

            @Override
            public boolean filter(IoTEvent value, Context ctx) throws Exception {
                return PatternConstants.AUDIO_YELL.toLowerCase().contains(value.getAudio_FFT1());
            }
        })
        .next("third")
        .subtype(IoTEvent.class)
        .where(new IterativeCondition() {
            private static final long serialVersionUID = 2392863109523984059L;

            @Override
            public boolean filter(IoTEvent value, Context ctx) throws Exception {
                return PatternConstants.AUDIO_YELL.toLowerCase().contains(value.getAudio_FFT1());
            }
        })
        .next("fourth")
        .subtype(IoTEvent.class)
        .where(new IterativeCondition() {
            private static final long serialVersionUID = 2392863109523984059L;

            @Override
            public boolean filter(IoTEvent value, Context ctx) throws Exception {
                return PatternConstants.AUDIO_YELL.toLowerCase().contains(value.getAudio_FFT1());
            }
        })
        .within(Time.seconds(10));

Any help appreciated

Thanks,

SRB

===

=

java.lang.RuntimeException: Exception occurred while processing valve
output watermark:
at 
org.apache.flink.streaming.runtime.io.StreamInputProcessor$ForwardingValveOutputHandler.handleWatermark(StreamInputProcessor.java:265)
at 
org.apache.flink.streaming.runtime.streamstatus.StatusWatermarkValve.findAndOutputNewMinWatermarkAcrossAlignedChannels(StatusWatermarkValve.java:189)
at 
org.apache.flink.streaming.runtime.streamstatus.StatusWatermarkValve.inputWatermark(StatusWatermarkValve.java:111)
at 
org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:184)
at 
org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:103)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:306)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Failure happened in filter function.
at org.apache.flink.cep.nfa.NFA.createDecisionGraph(NFA.java:704)
at org.apache.flink.cep.nfa.NFA.computeNextStates(NFA.java:500)
at org.apache.flink.cep.nfa.NFA.process(NFA.java:252)
at 
org.apache.flink.cep.operator.AbstractKeyedCEPPatternOperator.processEvent(AbstractKeyedCEPPatternOperator.java:332)
at 
org.apache.flink.cep.operator.AbstractKeyedCEPPatternOperator.lambda$onEventTime$0(AbstractKeyedCEPPatternOperator.java:235)
at 
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at 
java.util.stream.ReferencePipeline$Head.forEachOrdered(ReferencePipeline.java:590)
at 
org.apache.flink.cep.operator.AbstractKeyedCEPPatternOperator.onEventTime(AbstractKeyedCEPPatternOperator.java:234)
at 
org.apache.flink.streaming.api.operators.HeapInternalTimerService.advanceWatermark(HeapInternalTimerService.java:288)
at 
org.apache.flink.streaming.api.operators.InternalTimeServiceManager.advanceWatermark(InternalTimeServiceManager.java:108)
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator.processWatermark(AbstractStreamOperator.java:734)
at 
org.apache.flink.streaming.runtime.io.StreamInputProcessor$ForwardingValveOutputHandler.handleWatermark(StreamInputProcessor.java:262)
... 7 more
Caused by: java.lang.NullPointerException

Steve Bistline, Thu, Nov 15, 8:28 PM:

More to the story: the cluster kicked out this error:

Exception: java.lang.OutOfMemoryError thrown from the
UncaughtExceptionHandler in thread "flink-7"

Uncaught error from thread [Uncaught error from thread [Uncaught error from
thread [flink-scheduler-1]:
flink-akka.remote.default-remote-dispatcher-92]: flink-10]: Java heap
space, shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled
for for ActorSystem[Java heap spaceflink, shutting down JVM since

Re: Standalone HA cluster: Fatal error occurred in the cluster entrypoint.

2018-11-16 Thread miki haiat
I "solved" this issue by cleaning the ZooKeeper information and starting the
cluster again. All the checkpoint and job graph data will be erased, and
basically you will start a new cluster...

It happened to me a lot on 1.5.x.
On 1.6 things are running perfectly.
I'm not sure why this error is back again in 1.6.1?


On Fri, 16 Nov 2018, 0:42 Olga Luganska wrote:

> Hello,
>
> I am running a flink 1.6.1 standalone HA cluster. Today I am unable to start
> the cluster because of "Fatal error in cluster entrypoint".
> (I used to see this error when running flink 1.5; after upgrading to 1.6.1,
> which had a fix for this bug, everything worked well for a while.)
>
> Question: what exactly needs to be done to clean "state handle store"?
>
> 2018-11-15 15:09:53,181 DEBUG
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor  - Fencing
> token not set: Ignoring message LocalFencedMessage(null,
> org.apache.flink.runtime.rpc.messages.RunAsync@21fd224c) because the
> fencing token is null.
>
> 2018-11-15 15:09:53,182 ERROR
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Fatal error
> occurred in the cluster entrypoint.
>
> java.lang.RuntimeException: org.apache.flink.util.FlinkException: Could
> not retrieve submitted JobGraph from state handle under
> /e13034f83a80072204facb2cec9ea6a3. This indicates that the retrieved state
> handle is broken. Try cleaning the state handle store.
>
> at
> org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:199)
>
> at
> org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$1(FunctionUtils.java:61)
>
> at
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
>
> at
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>
> at
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>
> at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
>
> at
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
>
> at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>
> at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>
> at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>
> at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
> Caused by: org.apache.flink.util.FlinkException: Could not retrieve
> submitted JobGraph from state handle under
> /e13034f83a80072204facb2cec9ea6a3. This indicates that the retrieved state
> handle is broken. Try cleaning the state handle store.
>
> at
> org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore.recoverJobGraph(ZooKeeperSubmittedJobGraphStore.java:208)
>
> at
> org.apache.flink.runtime.dispatcher.Dispatcher.recoverJob(Dispatcher.java:692)
>
> at
> org.apache.flink.runtime.dispatcher.Dispatcher.recoverJobGraphs(Dispatcher.java:677)
>
> at
> org.apache.flink.runtime.dispatcher.Dispatcher.recoverJobs(Dispatcher.java:658)
>
> at
> org.apache.flink.runtime.dispatcher.Dispatcher.lambda$null$26(Dispatcher.java:817)
>
> at
> org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$1(FunctionUtils.java:59)
>
> ... 9 more
>
> Caused by: java.io.FileNotFoundException:
> /checkpoint_repo/ha/submittedJobGraphdd865937d674 (No such file or
> directory)
>
> at java.io.FileInputStream.open0(Native Method)
>
> at java.io.FileInputStream.open(FileInputStream.java:195)
>
> at java.io.FileInputStream.<init>(FileInputStream.java:138)
>
> at
> org.apache.flink.core.fs.local.LocalDataInputStream.<init>(LocalDataInputStream.java:50)
>
> at
> org.apache.flink.core.fs.local.LocalFileSystem.open(LocalFileSystem.java:142)
>
> at
> org.apache.flink.runtime.state.filesystem.FileStateHandle.openInputStream(FileStateHandle.java:68)
>
> at
> org.apache.flink.runtime.state.RetrievableStreamStateHandle.openInputStream(RetrievableStreamStateHandle.java:64)
>
> at
> org.apache.flink.runtime.state.RetrievableStreamStateHandle.retrieveState(RetrievableStreamStateHandle.java:57)
>
> at
> org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore.recoverJobGraph(ZooKeeperSubmittedJobGraphStore.java:202)
>
> ... 14 more
>
> 2018-11-15 15:09:53,185 INFO
> org.apache.flink.runtime.blob.TransientBlobCache  - Shutting
> down BLOB cache
>
>
> thank you,
>
> Olga
>
>


BucketingSink vs StreamingFileSink

2018-11-16 Thread Edward Rojas
Hello,
We are currently using Flink 1.5 and we use the BucketingSink to save the
result of job processing to HDFS.
The data is in JSON format and we store one object per line in the resulting
files. 

We are planning to upgrade to Flink 1.6 and we see that there is this new
StreamingFileSink; from the description it looks very similar to
BucketingSink when using a Row-encoded Output Format. My question is: should
we consider moving to StreamingFileSink?

I would like to better understand what the suggested use cases are for each
of the two options now.

We are also considering additionally outputting the data in Parquet format
for data scientists (to be stored in HDFS as well). For this I see some
utils to work with StreamingFileSink, so I guess for this case it's
recommended to use that option?
Is it possible to use the Parquet writers even when the schema of the data
may evolve?

Thanks in advance for your help.
(Sorry if I put too many questions in the same message)





The flink checkpoint time interval is not normal

2018-11-16 Thread 远远
The Flink app is running on Flink 1.5.4 on YARN; the checkpoint should be
triggered every 5 min...

[image: image.png]


How to get Kafka record's timestamp using Flink's KeyedDeserializationSchema?

2018-11-16 Thread Flink Developer
Hi, I have a flink app which uses the FlinkKafkaConsumer.

I am interested in retrieving the Kafka timestamp for a given record/offset 
using the *KeyedDeserializationSchema* which provides topic, partition, offset 
and message.

How can the timestamp be obtained through this interface?

Thank you
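
For reference, KeyedDeserializationSchema's deserialize method exposes only the fields
listed above; the record timestamp is not among its parameters, which is what the issues
linked in the replies above (FLINK-8500, FLINK-8354) track. The signature in the Kafka
connector:

T deserialize(byte[] messageKey, byte[] message, String topic, int partition, long offset)
        throws IOException;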