Re: termination of stream#iterate on finite streams

2017-09-01 Thread Xingcan Cui
Hi Peter, let me try to explain this. As you have shown in the examples, the iterate method takes a function that "splits" the initial stream into two separate streams, i.e., initialStream => (stream1, stream2). stream2 works as the output stream, whose results will be emitted to the successor
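The split Xingcan describes can be sketched outside Flink as a plain feedback loop (illustrative plain Java only, not the Flink API; the step function and routing predicate here are made up): each element the body produces is either fed back into the loop, as closeWith would, or emitted downstream.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class IterateSketch {
    // Each element leaving the iteration body is routed either back into
    // the loop (the "feedback" stream, cf. closeWith) or downstream
    // (the "output" stream) -- the (stream1, stream2) split.
    static List<Integer> run(List<Integer> initial) {
        Deque<Integer> feedback = new ArrayDeque<>(initial);
        List<Integer> output = new ArrayList<>();
        while (!feedback.isEmpty()) {
            int stepped = feedback.poll() - 2;   // the iteration body
            if (stepped > 0) {
                feedback.add(stepped);           // stream1: fed back
            } else {
                output.add(stepped);             // stream2: emitted downstream
            }
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(IterateSketch.run(List.of(5, 4))); // [0, -1]
    }
}
```

Note one deliberate difference from Flink: this loop terminates when the feedback queue empties, whereas a real iterate operator cannot know whether more feedback data will ever arrive, which is what the termination question in this thread is about.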

Re: dynamically partitioned stream

2017-09-01 Thread Tony Wei
Hi Martin, Aljoscha, I think Aljoscha is right. My original thought was to keep state only after a lambda function arrives. Using Aljoscha's scenario as an example: initially, all data will be discarded because there are no lambdas. When lambda f1 [D, E] and f2 [A, C] come, A, C begin to be

Re: part files written to HDFS with .pending extension

2017-09-01 Thread Krishnanand Khambadkone
BTW, I am using a BucketingSink and a DateTimeBucketer.  Do I need to set any other property to move the files out of the .pending state? BucketingSink sink = new BucketingSink("hdfs://localhost:8020/flinktwitter/"); sink.setBucketer(new DateTimeBucketer("-MM-dd--HHmm")); On Friday, September

part files written to HDFS with .pending extension

2017-09-01 Thread Krishnanand Khambadkone
Hi,  I have written a small program that uses a Twitter input stream and a HDFS output sink.   When the files are written to HDFS each part file in the directory has a .pending extension.  I am able to cat the file and see the tweet text.  Is this normal for the part files to have .pending

Re: Error submitting flink job

2017-09-01 Thread Krishnanand Khambadkone
I had to restart the flink process. That fixed the issue. Sent from my iPhone > On Sep 1, 2017, at 3:39 PM, ant burton wrote: > > Is this of any help > > https://stackoverflow.com/questions/33890759/how-to-specify-overwrite-to-writeastext-in-apache-flink-streaming-0-10-0

Re: Re: Error submitting flink job

2017-09-01 Thread Krishnanand Khambadkone
I have set this in my flink-conf.yaml file. On Friday, September 1, 2017, 3:39:05 PM PDT, ant burton wrote: Is this of any help https://stackoverflow.com/questions/33890759/how-to-specify-overwrite-to-writeastext-in-apache-flink-streaming-0-10-0

Re: Error submitting flink job

2017-09-01 Thread ant burton
Is this of any help https://stackoverflow.com/questions/33890759/how-to-specify-overwrite-to-writeastext-in-apache-flink-streaming-0-10-0 fs.overwrite-files: true in your
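For reference, the setting ant mentions is a one-line change in flink-conf.yaml (path assumed to be the default conf directory); this is a config sketch, not a verified fix for every sink:

```yaml
# conf/flink-conf.yaml
# Allow file sinks to overwrite existing output paths, avoiding the
# NO_OVERWRITE IOException when resubmitting a job to the same path.
fs.overwrite-files: true
```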

Error submitting flink job

2017-09-01 Thread Krishnanand Khambadkone
I am trying to submit a flink job from the command line and seeing this error.   Any idea what could be happening java.io.IOException: File or directory already exists. Existing files and directories are not overwritten in NO_OVERWRITE mode. Use OVERWRITE mode to overwrite existing files and

Re: [EXTERNAL] Re: Bucketing/Rolling Sink: New timestamp appended to the part file name every time a new part file is rolled

2017-09-01 Thread Felix Cheung
Yap I was able to get this to work with a custom bucketer. A custom bucketer can use the clock given ("processing time") or it can use a timestamp from the data ("event time") for the bucketing path. From: Raja.Aravapalli Sent:

Re: [EXTERNAL] Re: Bucketing/Rolling Sink: New timestamp appended to the part file name every time a new part file is rolled

2017-09-01 Thread Raja.Aravapalli
Thanks Aljoscha for the inputs. I will look into extending the “BasePathBucketer” class. Regards, Raja. From: Aljoscha Krettek Date: Friday, September 1, 2017 at 10:27 AM To: Piotr Nowojski Cc: Raja Aravapalli ,

Re: dynamically partitioned stream

2017-09-01 Thread Aljoscha Krettek
Hi Martin, I think with those requirements this is very hard (or maybe impossible) to do efficiently in a distributed setting. It might be that I'm misunderstanding things but let's look at an example. Assume that initially, we don't have any lambdas, so data can be sent to any machine because

Re: Bucketing/Rolling Sink: New timestamp appended to the part file name every time a new part file is rolled

2017-09-01 Thread Aljoscha Krettek
Hi Raja, I think you can in fact do this by implementing a custom Bucketer. You can have a look at BasePathBucketer and extend that to include the timestamp in the path that is returned. You should probably clamp the timestamp so that you don't get a new path for every millisecond. Best,
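The clamping Aljoscha describes can be sketched in plain Java (the class and method names here are hypothetical, not Flink's Bucketer API): truncate the timestamp to the bucket granularity before formatting it into the path, so all records in the same minute land in the same bucket.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class BucketClamp {
    // Clamp a millisecond timestamp down to the start of its minute, so
    // every record within the same minute maps to the same bucket path
    // instead of producing a new path per millisecond.
    static long clampToMinute(long timestampMillis) {
        return timestampMillis - (timestampMillis % 60_000L);
    }

    // What a custom Bucketer's path computation might return (hypothetical
    // helper; a real Bucketer would override getBucketPath instead).
    static String bucketPath(String base, long timestampMillis) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd--HHmm");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return base + "/" + fmt.format(new Date(clampToMinute(timestampMillis)));
    }

    public static void main(String[] args) {
        long t1 = 1504224000_123L;            // 2017-09-01 00:00:00.123 UTC
        long t2 = t1 + 30_000L;               // 30 seconds later, same minute
        System.out.println(bucketPath("/flinktwitter", t1)); // → /flinktwitter/2017-09-01--0000
        System.out.println(bucketPath("/flinktwitter", t1)
                .equals(bucketPath("/flinktwitter", t2)));   // → true
    }
}
```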

Re: Operator variables in memory scoped by key

2017-09-01 Thread gerardg
Thanks Aljoscha, So I can think of three possible solutions: * Use an instance dictionary to store the trie tree scoped by the same key that the KeyedStream. That should work if the lifetime of the operator object is tied to the keys that it processes. * Store the trie tree in a ValueState but

termination of stream#iterate on finite streams

2017-09-01 Thread Peter Ertl
Hi folks, I was doing some experiments with DataStream#iterate, and what felt strange to me is the fact that #iterate() does not terminate on its own when consuming a _finite_ stream. I think this is awkward and unexpected. The only thing that "helped" was setting an arbitrary and meaningless

Re: Sink -> Source

2017-09-01 Thread Nico Kruber
Hi Philipp, afaik, Flink doesn't offer this out-of-the-box. You could either hack something as suggested or use Kafka to glue different jobs together. Both may affect exactly-once/at-least-once guarantees, however. Also refer to

Re: Flink session on Yarn - ClassNotFoundException

2017-09-01 Thread Albert Giménez
Thanks for the replies :) I managed to get it working following the instructions here, but I found a few issues that I guess were specific to HDInsight, or at least to the HDP version it

Re: Bucketing/Rolling Sink: New timestamp appended to the part file name every time a new part file is rolled

2017-09-01 Thread Piotr Nowojski
Hi, BucketingSink doesn’t support the feature that you are requesting: you cannot specify a dynamically generated prefix/suffix. Piotrek > On Aug 31, 2017, at 7:12 PM, Raja.Aravapalli > wrote: > > > Hi, > > I have a flink application that is streaming data