Re: Flink on EMR Question

2016-01-05 Thread Maximilian Bode
Hi everyone, Regarding Q1, I believe I have witnessed a comparable phenomenon in a (3-node, non-EMR) YARN cluster. After shutting down the yarn session via `stop`, one container seems to linger around. `yarn application -list` is empty, whereas `bin/yarn-session.sh -q` lists the left-over

Local collection data sink for the streaming API

2016-01-05 Thread Filipe Correia
Hi, Collecting results locally (e.g., for unit testing) is possible in the DataSet API by using "LocalCollectionOutputFormat", as described in the programming guide: https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/programming_guide.html#collection-data-sources-and-sinks Can

Re: Local collection data sink for the streaming API

2016-01-05 Thread Filipe Correia
Hi Gábor, Thanks! I'm using Scala though. DataStreamUtils.collect() depends on org.apache.flink.streaming.api.datastream.DataStream, rather than org.apache.flink.streaming.api.scala.DataStream. Any suggestion on how to handle this, other than creating my own scala implementation of

Re: Local collection data sink for the streaming API

2016-01-05 Thread Gábor Gévay
Try the getJavaStream method of the scala DataStream. Best, Gábor 2016-01-05 19:14 GMT+01:00 Filipe Correia : > Hi Gábor, Thanks! > > I'm using Scala though. DataStreamUtils.collect() depends on > org.apache.flink.streaming.api.datastream.DataStream, rather than >

Re: Flink on EMR Question

2016-01-05 Thread Stephan Ewen
Hi! Concerning (1) We have seen that a few times. The JVMs / Threads do sometimes not properly exit in a graceful way, and YARN is not always able to kill the process (YARN bug). I am currently working on a refactoring of the YARN resource manager (to allow to easy addition of other frameworks)

Re: Sink - Cassandra

2016-01-05 Thread Nick Dimiduk
Hi Sebastian, I've had preliminary success with a steaming job that is Kafka -> Flink -> HBase (actually, Phoenix) using the Hadoop OutputFormat adapter. A little glue was required but it seems to work okay. My guess it it would be the same for Cassandra. Maybe that can get you started? Good

flink kafka scala error

2016-01-05 Thread Madhukar Thota
Hi I am seeing the following error when i am trying to run the jar in Flink Cluster. I am not sure what dependency is missing. /opt/DataHUB/flink-0.10.1/bin/flink run datahub-heka-1.0-SNAPSHOT.jar flink.properties java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class

Re: Behaviour of CountWindowAll

2016-01-05 Thread Aljoscha Krettek
Hi, I’m afraid this is not possible right now. I’m also not sure about the Evictors as a whole. Using them makes window operations very slow because all elements in a window have to be kept, i.e. window results cannot be pre-aggregated. Cheers, Aljoscha > On 15 Dec 2015, at 12:23, Radu Tudoran

Re: Questions re: ExecutionGraph & ResultPartitions for interactive use a la Spark

2016-01-05 Thread Aljoscha Krettek
Hi, these are certainly valid use cases. As far is I know, the people who know most in this area are on vacation right now. They should be back in a week, I think. They should be able to give you a proper description of the current situation and some pointers. Cheers, Aljoscha > On 04 Jan

Re: Sink - Cassandra

2016-01-05 Thread Aljoscha Krettek
Hi Sebastian, I’m afraid the people working on Flink don’t have much experience with Cassandra. Maybe you could look into the Elasticsearch sink and adapt it to write to Cassandra instead. That could be a valuable addition to Flink. Cheers, Aljoscha > On 22 Dec 2015, at 14:36, syepes