Flink to S3 streaming

2017-04-07 Thread pradeep s
Hi, I have a use case to stream messages from Kafka to Amazon S3. I am not using the S3 filesystem approach since I need object tags to be added to each object written to S3, so I am planning to use the AWS S3 SDK. But I have a question about how to hold the data until the batch size reaches a few MBs.
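A minimal sketch of one way to do this, assuming the AWS SDK for Java v1 is on the classpath: a Flink sink that buffers records in memory and, once the buffer passes a size threshold, uploads it as a single tagged object. The class name, bucket, key prefix, tag, and 5 MB threshold are hypothetical placeholders, and the in-memory buffer is not integrated with Flink's checkpointing, so it is not fault tolerant as written.

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.ObjectTagging;
import com.amazonaws.services.s3.model.PutObjectRequest;
import com.amazonaws.services.s3.model.Tag;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.UUID;

/** Buffers records in memory and uploads a tagged S3 object once the buffer reaches ~5 MB. */
public class BufferingS3Sink extends RichSinkFunction<String> {

    private static final int FLUSH_THRESHOLD_BYTES = 5 * 1024 * 1024; // hypothetical threshold

    private transient AmazonS3 s3;
    private transient ByteArrayOutputStream buffer;

    @Override
    public void open(Configuration parameters) {
        s3 = AmazonS3ClientBuilder.defaultClient();
        buffer = new ByteArrayOutputStream();
    }

    @Override
    public void invoke(String value) throws Exception {
        buffer.write(value.getBytes(StandardCharsets.UTF_8));
        buffer.write('\n');
        if (buffer.size() >= FLUSH_THRESHOLD_BYTES) {
            flush();
        }
    }

    private void flush() {
        byte[] bytes = buffer.toByteArray();
        ObjectMetadata metadata = new ObjectMetadata();
        metadata.setContentLength(bytes.length);
        // Attach object tags at upload time via the SDK's ObjectTagging support.
        PutObjectRequest request =
                new PutObjectRequest("my-bucket", "events/" + UUID.randomUUID(), // hypothetical bucket/key
                        new ByteArrayInputStream(bytes), metadata)
                        .withTagging(new ObjectTagging(
                                Collections.singletonList(new Tag("source", "kafka")))); // hypothetical tag
        s3.putObject(request);
        buffer.reset();
    }

    @Override
    public void close() {
        // Flush whatever is left when the sink shuts down.
        if (buffer != null && buffer.size() > 0) {
            flush();
        }
    }
}
```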

Re: hadoopcompatibility not in dist

2017-04-07 Thread Petr Novotnik
Hey Fabi, many thanks for your clarifications! It seems flink-shaded-hadoop2 itself is already included in the binary distribution:
> $ jar tf flink-1.2.0/lib/flink-dist_2.10-1.2.0.jar | grep org/apache/hadoop | head -n3
> org/apache/hadoop/
> org/apache/hadoop/fs/
> org/apache/hadoop/fs/FileS

Re: hadoopcompatibility not in dist

2017-04-07 Thread Fabian Hueske
Hi Petr, I think that's expected behavior, because the exception is intercepted and enriched with an instruction on how to solve the problem. As you assumed, you need to add the flink-hadoop-compatibility JAR file to the ./lib folder. Unfortunately, the file is not included in the binary distribution.

Flink + Druid example?

2017-04-07 Thread Matt
Hi all, I'm looking for an example of using Tranquility (Druid's library) as a Flink sink. I'm trying to follow the code in [1], but I feel it's incomplete or maybe outdated; it doesn't mention anything about another method (tranquilizer) that seems to be part of the BeamFactory interface in the current version

Re: Hi

2017-04-07 Thread Fabian Hueske
Hi Wolfe, that's all correct. Thank you! I'd like to emphasize that the FsStateBackend stores all state on the heap of the worker JVM, so you might run into OutOfMemoryErrors if your state grows too large. Therefore, the RocksDBStateBackend is the recommended choice for most production use cases.
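For reference, a minimal sketch of how the backend choice is made in the job, assuming Flink around 1.2 and, for RocksDB, the flink-statebackend-rocksdb dependency on the classpath; the checkpoint paths are placeholders (any supported file system works, HDFS is not required).

```java
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StateBackendExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Option 1: FsStateBackend keeps working state on the TaskManager heap and
        // writes checkpoints to the configured file system (HDFS, S3, local, ...).
        // Large state can therefore cause OutOfMemoryErrors.
        // env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints")); // placeholder path

        // Option 2: RocksDBStateBackend keeps working state in RocksDB on local disk,
        // so it can grow well beyond the heap size; checkpoints still go to the file system.
        env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints")); // placeholder path

        env.enableCheckpointing(60_000); // checkpoint every 60 seconds
        // ... define sources/operators/sinks, then call env.execute("my-job");
    }
}
```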

Re: Hi

2017-04-07 Thread Brian Wolfe
Hi Kant, jumping in here; I would love corrections if I'm wrong about any of this. The short answer is no, HDFS is not necessary to run stateful stream processing. In the minimal case, you can use the MemoryStateBackend to back up your state onto the JobManager. In any production scenario, you will w

hadoopcompatibility not in dist

2017-04-07 Thread Petr Novotnik
Hello, with 1.2.0 `WritableTypeInfo` was moved into its own artifact (flink-hadoop-compatibility_2.10-1.2.0.jar). Unlike 1.1.0, the distribution jar `flink-dist_2.10-1.2.0.jar` no longer includes the hadoop compatibility classes. However, `TypeExtractor`, which is part of the distribution
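For illustration, a minimal sketch of the kind of program that hits this path, assuming Flink 1.2: any DataSet whose element type implements Hadoop's Writable makes the TypeExtractor reflectively look up WritableTypeInfo, which now lives in flink-hadoop-compatibility and therefore has to be on the runtime classpath.

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.hadoop.io.Text;

public class WritableTypeExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Text implements org.apache.hadoop.io.Writable, so the TypeExtractor
        // tries to create a WritableTypeInfo for it via reflection. That class
        // lives in flink-hadoop-compatibility, which must therefore be available
        // at runtime (e.g. copied into the ./lib folder).
        DataSet<Text> words = env.fromElements(new Text("hello"), new Text("world"));

        words.print();
    }
}
```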

Hi

2017-04-07 Thread kant kodali
Hi all, I read the docs; however, I still have the following question: is HDFS mandatory for stateful stream processing? In some places I see that it is required and in other places I see that RocksDB can be used. I just want to know whether HDFS is mandatory for stateful stream processing. Thanks!

Disk I/O in Flink

2017-04-07 Thread Robert Schmidtke
Hi, I'm currently examining the I/O patterns of Flink, and I'd like to know when/how Flink goes to disk. Let me give an overview of what I have done so far. I am running TeraGen (from the Hadoop examples package) + TeraSort (https://github.com/robert-schmidtke/terasort) on a 16-node cluster,

Re: Shared DataStream

2017-04-07 Thread nragon
Yeah, hoping you'd say that :) Would be nice to share them just because of the network overhead between Flink and Kafka. Thanks

Re: Shared DataStream

2017-04-07 Thread Fabian Hueske
Flink does not provide any tooling to link two jobs together. I would use Kafka to connect Flink jobs with each other. Best, Fabian

2017-04-06 17:28 GMT+02:00 nragon:
> Hi,
> Can we share the endpoint of one job (datastream) with another job?
> The architecture I'm working on should provide a
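A minimal sketch of that handover, assuming the Kafka 0.9 connector that ships alongside Flink 1.2; the broker address, topic name, and socket source are placeholders for the real pipelines.

```java
import java.util.Properties;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class KafkaHandover {

    // Job A: write its output stream to a Kafka topic.
    public static void producerJob() throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> results = env.socketTextStream("localhost", 9999); // stand-in for the real pipeline
        results.addSink(new FlinkKafkaProducer09<String>(
                "broker:9092", "handover-topic", new SimpleStringSchema()));
        env.execute("job-a");
    }

    // Job B: consume the same topic as its source.
    public static void consumerJob() throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker:9092");
        props.setProperty("group.id", "job-b");
        DataStream<String> input = env.addSource(
                new FlinkKafkaConsumer09<String>("handover-topic", new SimpleStringSchema(), props));
        input.print();
        env.execute("job-b");
    }
}
```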

Re: Submit Flink job programatically

2017-04-07 Thread Kamil Dziublinski
Hey, I had a similar problem when I tried to list the jobs and kill one by name in a YARN cluster. Initially I also tried to set YARN_CONF_DIR, but it didn't work. What helped, though, was passing the Hadoop conf dir to my application when starting it, like this: java -cp application.jar:/etc/hadoop/conf Rea