Re: Flink: KafkaProducer Data Loss

2017-02-02 Thread Tzu-Li (Gordon) Tai
Hi Ninad and Till, Thank you for looking into the issue! This is actually a bug. Till’s suggestion is correct: the producer holds a `pendingRecords` value that is incremented on each invoke() and decremented on each callback; it is used to check whether the producer needs to sync on pending callbacks on
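
[Editor's note: a minimal, hypothetical Scala sketch of the counting pattern described above; this is NOT Flink's actual FlinkKafkaProducerBase, which differs in detail, and all names are made up.]

    import java.util.Properties
    import org.apache.kafka.clients.producer.{Callback, KafkaProducer, ProducerRecord, RecordMetadata}

    // Count in-flight records so a checkpoint can wait for pending callbacks.
    class CountingProducerSketch(props: Properties, topic: String) {
      private val producer = new KafkaProducer[Array[Byte], Array[Byte]](props)
      private val lock = new Object
      private var pendingRecords = 0L            // +1 per invoke(), -1 per callback
      @volatile private var asyncError: Exception = _

      def invoke(value: Array[Byte]): Unit = {
        if (asyncError != null) throw asyncError // surface earlier async failures
        lock.synchronized { pendingRecords += 1 }
        producer.send(new ProducerRecord[Array[Byte], Array[Byte]](topic, value),
          new Callback {
            override def onCompletion(md: RecordMetadata, e: Exception): Unit = {
              if (e != null) asyncError = e
              lock.synchronized { pendingRecords -= 1; lock.notifyAll() }
            }
          })
      }

      // On checkpoint: flush, then block until every pending callback has fired.
      def flushOnCheckpoint(): Unit = {
        producer.flush()
        lock.synchronized {
          while (pendingRecords > 0) lock.wait()
          if (asyncError != null) throw asyncError
        }
      }
    }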

Re: Compiler error while using 'CsvTableSource'

2017-02-02 Thread nsengupta
Till, FWIW, I have run the entire test suite for the latest Flink snapshot. Almost all test cases passed, particularly this one: This case uses a built-in loaded CSV (in

Re: Compiler error while using 'CsvTableSource'

2017-02-02 Thread nsengupta
Hello Till, Many thanks for the quick reply. I have tried to follow your suggestion, with no luck. Just to give it a shot, I have tried this too (following the Flink documentation):

Parallelism and Partitioning

2017-02-02 Thread Mohit Anchlia
What is the granularity of parallelism in Flink? For example: if I am reading from Kafka -> map -> sink and I set parallelism to 2, does that mean it creates 2 consumer threads and allocates them on 2 separate task managers? Also, it would be good to understand the difference between parallelism and partitioning
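
[Editor's note: a hedged Scala sketch of per-operator parallelism; the socket source stands in for Kafka, and note that parallel subtasks are placed into task slots by the scheduler, so parallelism 2 does not by itself mean 2 separate TaskManagers.]

    import org.apache.flink.streaming.api.scala._

    object ParallelismExample {
      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment
        env.setParallelism(2)                    // default for every operator in the job

        env.socketTextStream("localhost", 9999)  // stand-in for a Kafka source
          .map(_.toUpperCase).setParallelism(4)  // each operator may override the default
          .print().setParallelism(1)             // the sink, again with its own parallelism
        env.execute("parallelism example")
      }
    }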

Clarification on state backend parameters

2017-02-02 Thread Mohit Anchlia
Trying to understand these 4 parameters: state.backend, state.backend.fs.checkpointdir, state.backend.rocksdb.checkpointdir, state.checkpoints.dir. As I understand it, the stream of data and the state of operators are 2 different concepts, and both need to be checkpointed. I am a bit confused about the
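
[Editor's note: a hedged flink-conf.yaml sketch of how these keys are commonly set together; paths are placeholders, semantics as of Flink 1.2.]

    state.backend: rocksdb                                      # which backend holds operator state
    state.backend.fs.checkpointdir: hdfs:///flink/checkpoints   # where checkpoint data is written
    state.backend.rocksdb.checkpointdir: /tmp/flink-rocksdb     # local working dir for RocksDB files
    state.checkpoints.dir: hdfs:///flink/ext-checkpoints        # metadata of externalized checkpoints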

Re: broadcasting a stream from a collection that is populated from a web service call

2017-02-02 Thread Sathi Chowdhury
Hi Till, Thanks for your reply. Keeping a Kinesis stream in between probably makes a lot of sense from many angles. My assumption is that the code below can be changed to read the second stream from a Kinesis stream, and will always read the latest data from the second stream and broadcast it, so no disparity

Re: broadcasting a stream from a collection that is populated from a web service call

2017-02-02 Thread Till Rohrmann
Hi Sathi, I would ingest the metadata into a Kinesis queue as well and read the data from there. Then you don't have to fiddle around with the REST API from within your Flink job. If that is not feasible for you, then you can also write your own custom source function which queries the REST
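
[Editor's note: a rough Scala sketch of such a custom source; RestPollingSource, fetchFromRest, and the poll interval are made-up names, and the HTTP call is deliberately the simplest one possible. It would be attached with env.addSource(new RestPollingSource(url, 10000)).]

    import org.apache.flink.streaming.api.functions.source.SourceFunction

    // Periodically query a REST endpoint and emit each response downstream.
    class RestPollingSource(url: String, pollIntervalMs: Long)
        extends SourceFunction[String] {

      @volatile private var running = true

      override def run(ctx: SourceFunction.SourceContext[String]): Unit = {
        while (running) {
          val response = fetchFromRest(url)      // replace with your HTTP client
          ctx.getCheckpointLock.synchronized {
            ctx.collect(response)                // emit under the checkpoint lock
          }
          Thread.sleep(pollIntervalMs)
        }
      }

      override def cancel(): Unit = { running = false }

      private def fetchFromRest(u: String): String = {
        val src = scala.io.Source.fromURL(u)
        try src.mkString finally src.close()
      }
    }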

Re: Flink: KafkaProducer Data Loss

2017-02-02 Thread Till Rohrmann
Hi Ninad, thanks for reporting the issue. To me it also looks as if exceptions might go unnoticed under certain circumstances. For example, if you have a write operation which fails, this will set the asyncException field, which is not checked before the next invoke call happens. If now a

Re: Compiler error while using 'CsvTableSource'

2017-02-02 Thread Till Rohrmann
Hi Nirmalya, could you try casting the Types.STRING into an org.apache.flink.api.common.typeinfo.TypeInformation[String] type? Cheers, Till On Thu, Feb 2, 2017 at 5:55 PM, nsengupta wrote: > I am using *flink-shell*, available with flink-1.2-SNAPSHOT. > > While
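
[Editor's note: a hedged sketch of the idea in the Scala shell, typing the field-type array explicitly as TypeInformation[_] and using BasicTypeInfo.STRING_TYPE_INFO to stay self-contained; the exact CsvTableSource constructor and the package of Types vary across 1.2 snapshots.]

    import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, TypeInformation}
    import org.apache.flink.table.sources.CsvTableSource

    val fieldNames: Array[String] = Array("name", "city")
    // An explicit TypeInformation[_] array lets the compiler resolve the constructor.
    val fieldTypes: Array[TypeInformation[_]] =
      Array(BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO)

    val csvSource = new CsvTableSource("/path/to/input.csv", fieldNames, fieldTypes)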

broadcasting a stream from a collection that is populated from a web service call

2017-02-02 Thread Sathi Chowdhury
It’s good to be here. I have a data stream coming from Kinesis. I also have a list of hashmaps which holds metadata that needs to participate in the processing. In my Flink processor class I construct this metadata (hardcoded): public static void main(String[] args) throws Exception { …….//

Flink: KafkaProducer Data Loss

2017-02-02 Thread ninad
Our Flink streaming workflow publishes messages to Kafka. KafkaProducer's 'retry' mechanism doesn't kick in until a message is added to its internal buffer. If there's an exception before that, KafkaProducer will throw that exception, and it seems like Flink isn't handling that. In this case there

Compiler error while using 'CsvTableSource'

2017-02-02 Thread nsengupta
I am using *flink-shell*, available with flink-1.2-SNAPSHOT. While loading a CSV file into a CsvTableSource - following the example given in the documentation - I get this error. I am not sure

Netty issues while deploying Flink with Yarn on MapR

2017-02-02 Thread ani.desh1512
I am trying to run Flink using Yarn on MapR. My previous issue got resolved, and I have updated the original post accordingly. I then modified pom.xml to change the

Re: Kafka data not read in FlinkKafkaConsumer09 second time from command line

2017-02-02 Thread Sujit Sakre
Implementing this formula seems to have solved our problem now. Thanks. On 2 February 2017 at 21:21, Sujit Sakre wrote: > Hi Aljoscha, > > Thanks for your response. > > We wanted to customize the watermark period calculation since we were not > getting the desired

Re: Connection refused error when writing to socket?

2017-02-02 Thread Till Rohrmann
Hi Li Peng, I think what you're trying to do won't work. The problem is that you have two TCP clients (sink and source) which are supposed to connect to each other. Without a server which buffers the incoming data and forwards it to the outgoing connections, it won't be possible to read the

Re: Kafka data not read in FlinkKafkaConsumer09 second time from command line

2017-02-02 Thread Aljoscha Krettek
Hi, what about using BoundedOutOfOrdernessGenerator? Why did that not work for your case? Cheers, Aljoscha On Mon, 30 Jan 2017 at 17:20 Sujit Sakre wrote: > Hi Robert, Aljoscha, > > Many thanks for pointing out. Watermark generation is the problem. It is >
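
[Editor's note: a hedged Scala sketch using the shipped counterpart of that docs example, BoundedOutOfOrdernessTimestampExtractor; MyEvent, its timestamp field, and the 10-second bound are placeholders.]

    import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.api.windowing.time.Time

    case class MyEvent(timestamp: Long, payload: String)

    // Watermarks trail the max seen timestamp by a fixed out-of-orderness bound.
    def withWatermarks(stream: DataStream[MyEvent]): DataStream[MyEvent] =
      stream.assignTimestampsAndWatermarks(
        new BoundedOutOfOrdernessTimestampExtractor[MyEvent](Time.seconds(10)) {
          override def extractTimestamp(e: MyEvent): Long = e.timestamp
        })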

Re: user Digest 2 Feb 2017 14:54:03 -0000 Issue 1703

2017-02-02 Thread Boris Lublinsky
Is KafkaIO supported on Flink Runner? I see the code in Github, but it is not part of .4 libraries > On Feb 2, 2017, at 8:54 AM, user-digest-h...@flink.apache.org wrote: > > > Have a look at >

Re: Cannot cancel job with savepoint due to timeout

2017-02-02 Thread Till Rohrmann
Hi Bruno, the missing documentation for akka.client.timeout is an oversight on our part [1]. I'll update it ASAP. Unfortunately, at the moment there is no other way than to specify akka.client.timeout in the flink-conf.yaml file. [1] https://issues.apache.org/jira/browse/FLINK-5700 Cheers,
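
[Editor's note: for example, in flink-conf.yaml; the value is illustrative, not a recommendation.]

    akka.client.timeout: 600 s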

Re: Externalized Checkpoints vs Periodic Checkpoints

2017-02-02 Thread Till Rohrmann
Hi Yassine, a periodic checkpoint is a checkpoint which is triggered periodically by Flink. The checkpoint itself can have multiple properties, and one of them is whether the checkpoint is externalized or not. An externalized checkpoint is a checkpoint for which Flink writes the meta
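
[Editor's note: a hedged sketch of how the two relate in the Flink 1.2 Scala API: enableCheckpointing switches on periodic checkpoints, and the externalized property is set on the checkpoint config.]

    import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup
    import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.enableCheckpointing(60000)   // trigger a checkpoint every 60 s
    // Keep the checkpoint metadata around even when the job is cancelled.
    env.getCheckpointConfig.enableExternalizedCheckpoints(
      ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION)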

Re: Externalized Checkpoints vs Periodic Checkpoints

2017-02-02 Thread Yassine MARZOUGUI
Thank you Till for the clarification, that was helpful. Best, Yassine 2017-02-02 15:31 GMT+01:00 Till Rohrmann : > Hi Yassine, > > a periodic checkpoint is checkpoint which will be triggered periodically > by Flink. The checkpoint itself can have multiple properties and

Re: Graphite reporter recover from broken pipe

2017-02-02 Thread Till Rohrmann
Hi Philipp, Flink 1.2 will also come with Dropwizard 3.1.0. If you need a newer version you can try to build a custom Flink version where you bump the Dropwizard version. It should only be necessary to re-build the module flink-metrics-graphite. Cheers, Till On Wed, Feb 1, 2017 at 3:23 PM,
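
[Editor's note: a hedged sketch of that rebuild, with the module path as in the Flink source tree; bump the Dropwizard version in this module's pom.xml first.]

    # from the Flink source root, after changing the Dropwizard version:
    cd flink-metrics/flink-metrics-graphite
    mvn clean install -DskipTests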

Re: Support for Auto scaling

2017-02-02 Thread Till Rohrmann
Hi Sandeep, dynamically starting new TaskManagers when needed is also not yet supported. However, it can be done manually via the taskmanager.sh script. We're working on enabling this feature for 1.3, if everything goes according to plan. Cheers, Till On Thu, Feb 2, 2017 at 6:49 AM, Tzu-Li
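
[Editor's note: for example, on a worker host, assuming the standard Flink distribution layout.]

    bin/taskmanager.sh start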