RE: CsvTableSource Types.TIMESTAMP

2018-03-06 Thread Esa Heikkinen
Hi Thank you, it worked, but there was another problem now in same example. How to use .filter(): val table = tEnv .scan("customers") .filter('name.isNotNull && 'last_update > "2016-01-01 00:00:00".toTimestamp) .select('id, 'name.lowerCase(), 'prefs) Error in compiling: "Value > is not member o

Re: Which test cluster to use for checkpointing tests?

2018-03-06 Thread Nico Kruber
Hi Ken, sorry, I was mislead by the fact that you are using iterations and those were only documented for the DataSet API. Running checkpoints with closed sources sounds like a more general thing than being part of the iterations rework of FLIP-15. I couldn't dig up anything on jira regarding this

Re: Which test cluster to use for checkpointing tests?

2018-03-06 Thread Paris Carbone
Hey, Indeed checkpointing iterations and dealing with closed sources are orthogonal issues, that is why the latter is not part of FLIP-15. Though, you kinda need both to have meaningful checkpoints for jobs with iterations. One has to do with correctness (checkpointing strongly connected compone

Re: Which test cluster to use for checkpointing tests?

2018-03-06 Thread Nico Kruber
There are still some upcoming changes for the network stack, but most of the heavy stuff it already through - you may track this under https://issues.apache.org/jira/browse/FLINK-8581 FLIP-6 is somewhat similar and currently only undergoes some stability improvements/bug fixing. The architectural

akka.remote.ReliableDeliverySupervisor Temporary failure in name resolution

2018-03-06 Thread miki haiat
Hi , Im running flink jobs on kubernetes after a day or so. the task manager and job managerlosing connection and i have to restart earthing . Im assuming that one of the pods crashed and when now pod start he cant find the job manager ? Also i saw that is an Akka issue... and it wiil be fi

Re: Rocksdb in production

2018-03-06 Thread Jayant Ameta
Thanks Fabian. Will there be any performance issues if I use NFS as the shared filesystem (as compared to HDFS or S3)? Jayant Ameta On Mon, Mar 5, 2018 at 10:31 PM, Fabian Hueske wrote: > Yes, that is correct. > > 2018-03-05 8:57 GMT-08:00 Jayant Ameta : > >> Oh! Somehow I missed while reading

Flink is looking for Kafka topic "n/a"

2018-03-06 Thread Mu Kong
Hi, I have encountered a wired problem. After I start the job for several days, Flink gave me the following error: *java.lang.RuntimeException: Unable to find a leader for partitions: [Partition: KafkaTopicPartition{topic='n/a', partition=-1}, KafkaPartitionHandle=[n/a,-1], offset=(not set)]* *

Re: Flink is looking for Kafka topic "n/a"

2018-03-06 Thread Nico Kruber
Hi Mu, which version of flink are you using? I checked the latest branches for 1.2 - 1.5 to look for findLeaderForPartitions at line 205 in Kafka08Fetcher but they did not match. From what I can see in the code, there is a MARKER partition state with topic "n/a" but that is explicitly removed from

Re: Spread Kafka sink tasks over different nodes

2018-03-06 Thread Aljoscha Krettek
Hi Dongwon, I think there is currently no way of ensuring that tasks are spread out across different machines because the scheduling logic does not take into account what machine a slot is on. I currently see two workarounds: - Let all operations have the same parallelism and only have 8 slots

Re: akka.remote.ReliableDeliverySupervisor Temporary failure in name resolution

2018-03-06 Thread Nico Kruber
Hi Miki, I'm no expert on the Kubernetes part, but could that be related to https://github.com/kubernetes/kubernetes/issues/6667? I'm not sure this is an Akka issue: if it cannot communicate with some address it basically blocks it from further connection attempts for a given time (here 5 seconds)

Re: Rocksdb in production

2018-03-06 Thread Fabian Hueske
That depends on your job and the setup. Remember that all operators will write their checkpoint data into that file system. If the state grows very large and only have an NFS with little write performance, it might be a problem. But the same would apply to HDFS as well. 2018-03-06 2:51 GMT-08:00 J

Re: Kafka offset auto-commit stops after timeout

2018-03-06 Thread Nico Kruber
Hi Edward, looking through the Kafka code, I do see a path where they deliberately do not want recursive retries, i.e. if the coordinator is unknown. It seems like you are getting into this scenario. I'm no expert on Kafka and therefore I'm not sure on the implications or ways to circumvent/fix th

Re: Kafka offset auto-commit stops after timeout

2018-03-06 Thread Edward
Thanks for the reply, Nico. I've been testing with OffsetCommitMode.ON_CHECKPOINTS, and I can confirm that this fixes the issue -- even if a single commit time out when communicating with Kafka, subsequent offset commits are still successful. -- Sent from: http://apache-flink-user-mailing-list-

Re: bin/start-cluster.sh won't start jobmanager on master machine

2018-03-06 Thread Nico Kruber
Hi Yesheng, `nohup /bin/bash -l bin/jobmanager.sh start cluster ...` looks a bit strange since it should (imho) be an absolute path towards flink. What you could do to diagnose further, is to try to run the ssh command manually, i.e. figure out what is being executed by calling bash -x ./bin/start

Re: Akka wants to connect with username "flink"

2018-03-06 Thread Nico Kruber
Hi Lukas, those are akka-internal names that you don't have to worry about. It looks like your TaskManager cannot reach the JobManager. Is 'jobmanager.rpc.address' configured correctly on the TaskManager? And is it reachable by this name? Is port 6123 allowed through the firewall? Are you sure th

Does Flink support stream-stream outer joins in the latest version?

2018-03-06 Thread kant kodali
Hi All, Does Flink support stream-stream outer joins in the latest version? Thanks!

Re: Serialization and Deserialization of Avro messages stored in Kafka

2018-03-06 Thread Gordon Weakliem
The 010 consumer extends 09, so I'd guess whatever code is reporting sees the FlinkKafkaConsumer010 as its superclass. I've seen this error a bunch, and it's because MyDeserializationSchema isn't serializable, or likely one of its fields is not serializable, or one of the fields of its fields - yo

Re: bin/start-cluster.sh won't start jobmanager on master machine

2018-03-06 Thread Yesheng Ma
Hi Nico, Thanks for your reply. My major concern is actually the `-l` argument. The command I executed is: `nohup /bin/bash -x -l "/state/partition1/ysma/flink-1.4.1/bin/jobmanager.sh" start cluster dell-01.epcc 8091`, with and without the `-l` argument (the script in Flink's bin directory uses th

Re: bin/start-cluster.sh won't start jobmanager on master machine

2018-03-06 Thread Yesheng Ma
Related source code: https://github.com/apache/flink/blob/master/flink-dist/src/main/flink-bin/bin/start-cluster.sh#L40 On Wed, Mar 7, 2018 at 2:11 AM, Yesheng Ma wrote: > Hi Nico, > > Thanks for your reply. My major concern is actually the `-l` argument. > The command I executed is: `nohup /bin

How back pressure works in Flink?

2018-03-06 Thread Pawel Bartoszek
Can you explain how back pressure affect the source in flink? I read the great article https://data-artisans.com/blog/how-flink-handles-backpressure and got the idea but I would like to know more details. Let's consider org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext

Table Api and CSV builder

2018-03-06 Thread Karim Amer
Hi there I Have a CSV file with the timestamp deconstructed into 3 fields and I was wondering what is the best way to specify the those 3 fields are the event time ? Should I make extend CsvTableSource and do the preprocessing or can CsvTableSource.builder() handle it. Or is there a better

Re: Flink is looking for Kafka topic "n/a"

2018-03-06 Thread Mu Kong
Hi Nico, Thanks for your prompt response. I'm using Flink 1.3.0 for this job. Please let me know if you need more information. Best regards, Mu On Tue, Mar 6, 2018 at 10:17 PM, Nico Kruber wrote: > Hi Mu, > which version of flink are you using? I checked the latest branches for > 1.2 - 1.5 t

Rest APIs

2018-03-06 Thread Daniele Foroni
Hi guys, I am using flink 1.4.1 and I am working with the rest api. If I run a flink job through the command line (./bin/flink run job.jar) is it uploaded to the folder set in variable jobmanager.web.upload.dir? It seems no. So, through the rest api I can cancel the job creating a savepoint, but

Re: Spread Kafka sink tasks over different nodes

2018-03-06 Thread Dongwon Kim
Hi Aljoscha and Robert, You guys are right. I resubmit the application with # session window tasks equal to # Kafka sink tasks. I never thought that multiple different Kafka tasks can write to the same partition. Initially, I do not set the default parallelism and I explicitly set # partitions

回复:How back pressure works in Flink?

2018-03-06 Thread Zhijiang(wangzhijiang999)
Hi Pawel, The data transfer process on sender side is in the following way:operator collect record --> serilize to flink buffer --> copy to netty buffer --> flush to socket On receiver side: socket --> netty --> flink buffer --> deserialize to record --> operator process On receiver side, if the

Re: Does Flink support stream-stream outer joins in the latest version?

2018-03-06 Thread Xingcan Cui
Hi Kant, I suppose you refer to the stream join in SQL/Table API since the outer join for windowed-streams can always be achieved with the `JoinFunction` in DataStream API. There are two kinds of stream joins, namely, the time-windowed join and the non-windowed join in Flink SQL/Table API [1,

Re: Flink is looking for Kafka topic "n/a"

2018-03-06 Thread Tzu-Li Tai
Hi Mu, You mentioned that the job stopped after the "n/a" topic error, but the job failed to recover. What exception did you encounter in the restart executions? Was it the same error? This would verify if we actually should be removing more than one of these special MARKER partition states. On t

Re: Serialization and Deserialization of Avro messages stored in Kafka

2018-03-06 Thread Tzu-Li Tai
Hi Filipe, What Gordon mentioned is correct. Did you manage to fix the issue? >From your code snippet, it looks like that the `Schema` field may not be serializable. Could you double check that? Cheers, Gordon -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Restart hook and checkpoint

2018-03-06 Thread Tzu-Li Tai
Hi Ashish, Could you elaborate a bit more on why you think the restart of all operators lead to data loss? When restart occurs, Flink will restart the job from the latest complete checkpoint. All operator states will be reloaded with state written in that checkpoint, and the position of the input

Re: Restart hook and checkpoint

2018-03-06 Thread Ashish Pokharel
Hi Gordon, The issue really is we are trying to avoid checkpointing as datasets are really heavy and all of the states are really transient in a few of our apps (flushed within few seconds). So high volume/velocity and transient nature of state make those app good candidates to just not have ch

Re: Does Flink support stream-stream outer joins in the latest version?

2018-03-06 Thread Hequn Cheng
Hi Kant, The stream-stream outer joins are work in progress now(left/right/full), and will probably be ready before the end of this month. You can check the progress from[1]. Best, Hequn [1] https://issues.apache.org/jira/browse/FLINK-5878 On Wed, Mar 7, 2018 at 1:01 PM, Xingcan Cui wrote: >