Flink Cassandra Connector is not working

2016-10-26 Thread NagaSaiPradeep
Hi, I am working on connecting Flink with Cassandra. When I ran the sample program (which I got from Github ), I am getting

Testing iterative data flows

2016-10-26 Thread Ken Krugler
Hi all, What’s the recommended way currently to test a streaming data flow that has iterations? I know that using timeouts in tests (which FLINK-2390 also discusses) isn’t reliable, and it’s hard to know when a job with iterations is really

Re: Retrieving a single element from a DataSet

2016-10-26 Thread Greg Hogan
It sounds like you want to use an all-pairs shortest paths algorithm. This would be a great contribution to Gelly! https://en.wikipedia.org/wiki/Shortest_path_problem#All-pairs_shortest_paths On Wed, Oct 26, 2016 at 9:29 AM, otherwise777 wrote: > That is indeed not the

Re: TIMESTAMP TypeInformation

2016-10-26 Thread Fabian Hueske
Hi Radu, I might not have complete understood your problem, but if you do val env = StreamExecutionEnvironment.getExecutionEnvironment val tEnv = TableEnvironment.getTableEnvironment(env) val ds = env.fromElements( (1, 1L, new Time(1,2,3)) ) val t = ds.toTable(tEnv, 'a, 'b, 'c) val results = t

Re: Flushing the result of a groupReduce to a Sink before all reduces complete

2016-10-26 Thread Fabian Hueske
Hi Paul, Flink pushes the results of operators (including GroupReduce) to the next operator or sink as soon as they are computed. So what you are asking for is actually happening. However, before the GroupReduceFunction can be applied, the whole data is sorted in order to group the data. This

Flushing the result of a groupReduce to a Sink before all reduces complete

2016-10-26 Thread Paul Wilson
Hi, DataSet API Flink 1.1.3 I have an application where I'd like to perform some mapping before batching the results and passing them to the sink. I'm performing a 'composite' key selection to group the items by their natural key as well as a batch (itemCount / batchSize). When I reduce the

Re: Checkpointing large RocksDB state to S3 - tips?

2016-10-26 Thread Aljoscha Krettek
Hi Josh, might the bandwidth to S3 be shared by all the running nodes? (Not sure how that is setup, so I'm just guessing here.) If you're on 1.2-SNAPSHOT you should also get fully elastic jobs in about a week. (I'm talking about the ability to restart from a savepoint with a different parallelism

Re: Reprocessing data in Flink / rebuilding Flink state

2016-10-26 Thread Konstantin Gregor
Hi Ufuk, thanks for this information, this is good news! Updating Flink to 1.1 is not really in our hands, but will hopefully happen soon :-) Thank you and best regards Konstantin On 26.10.2016 16:07, Ufuk Celebi wrote: > On Wed, Oct 26, 2016 at 3:06 PM, Konstantin Gregor >

Re: Reprocessing data in Flink / rebuilding Flink state

2016-10-26 Thread Ufuk Celebi
On Wed, Oct 26, 2016 at 3:06 PM, Konstantin Gregor wrote: > We are still using 1.0.1 so this is an expected behavior, but I just > wondered whether there are any news concerning this topic. Yes, we will add an option to ignore this while restoring. This will be

Re: Broadcast Config-Values through connected Configuration Stream

2016-10-26 Thread Ufuk Celebi
Does the following work? stream1.keyBy().connect(stream2.broadcast()) On Wed, Oct 26, 2016 at 2:01 PM, Julian Bauß wrote: > Hello Everybody, > > I'm currently trying to change the state of a CoFlatMapFunction with the > help of a connected configuration-stream. The code

Re: Retrieving a single element from a DataSet

2016-10-26 Thread otherwise777
That is indeed not the nice way to do it because it will create an executionplan just to get that value, but it does work, so thnx for that A more concrete example for what i want, In gelly you have the SingleSourceShortestPaths algorith which requires the sourceVertexId, now i want to execute a

Re: Reprocessing data in Flink / rebuilding Flink state

2016-10-26 Thread Konstantin Gregor
Hi everyone, I found this thread while examining an issue where Flink could not start from a savepoint. Problem was that we removed an operator, pretty much the same thing that occurred to Josh earlier in this thread. We are still using 1.0.1 so this is an expected behavior, but I just wondered

Re: Retrieving a single element from a DataSet

2016-10-26 Thread Sebastian Neef
Hi, I'm also interested in that question/solution. For now, my workaround looks like this: > DataSet<...> .filter(... object.Id == NeededElement.Id ... ).collect().get(0) I filter the DataSet for the element I want to find, collect it into a List which then returns the first element. That's a

Retrieving a single element from a DataSet

2016-10-26 Thread otherwise777
I'm currently making a shortest path algorithm in Gelly using DataSets, here's a piece of the code i've started with: public DataSet> ShortestPathsEAT(K startingnode) { DataSet> results = this.getVertices().distinct().map(new MapFunction,

Broadcast Config-Values through connected Configuration Stream

2016-10-26 Thread Julian Bauß
Hello Everybody, I'm currently trying to change the state of a CoFlatMapFunction with the help of a connected configuration-stream. The code looks something like this. streamToBeConfigured.connect(configMessageStream) .keyBy(new SomeIdKeySelecor(), new ConfigMessageKeySelector) .flatMap(new

Re: Distributing Tasks over Task manager

2016-10-26 Thread Till Rohrmann
Hi Jürgen, In a nutshell, Flink's scheduling works the following way: The sources are deployed wrt to local preferences. If there are no local preferences then the first machine from a map's iterator which stores the machines is used. So in general, the sources will first fill up the first

Re: Elasticsearch Http Connector

2016-10-26 Thread Timur Shenkao
Hi! For ElasticSearch HTTP Conector example, one may look at Flume's ElasticSearchSink https://github.com/apache/flume/tree/trunk/flume-ng-sinks/flume-ng-elasticsearch-sink It uses Apache Commons HTTPComponent On Wed, Oct 26, 2016 at 9:51 AM, Philipp Bussche wrote:

Re: "Slow ReadProcessor" warnings when using BucketSink

2016-10-26 Thread Robert Metzger
Hi Max, maybe you need to ask this question on the Hadoop user mailing list (or your Hadoop vendor support, if you are using a Hadoop distribution). On Tue, Oct 18, 2016 at 11:19 AM, static-max wrote: > Hi Robert, > > thanks for your reply. I also didn't find anything

Re: Distributing Tasks over Task manager

2016-10-26 Thread Robert Metzger
I'm sorry for the delay. I've added Till who knows the scheduler details to the conversation. On Tue, Oct 18, 2016 at 3:09 PM, Jürgen Thomann < juergen.thom...@innogames.com> wrote: > Hi Robert, > > Do you already had a chance to look on it? If you need more information > just let me know. > >

Re: Watermarks and window firing

2016-10-26 Thread Robert Metzger
Just for others who are wondering what this email is about: I suspect that this email was send accidentally and that this is the correct one: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Event-time-watermarks-and-windows-td9687.html On Mon, Oct 24, 2016 at 4:56 PM, Paul

Re: Unit testing a Kafka stream based application?

2016-10-26 Thread Robert Metzger
Hi Niels, Sorry for the late response. you can launch a Kafka Broker within a JVM and use it for testing purposes. Flink's Kafka connector is using that a lot for integration tests. Here is the code starting the Kafka server:

Re: Side effects or multiple sinks on streaming jobs?

2016-10-26 Thread Robert Metzger
Yes, you need to send the errors in band. There are two options how you implement it: A) Using the KeyedSerializationSchema, then you need to define only one Kafka Producer, because you can specify the target topic using the KeyedSerializationSchema.getTargetTopic() method. (operator

Re: FlinkKafkaConsumerBase - Received confirmation for unknown checkpoint

2016-10-26 Thread Robert Metzger
Hi Pedro, The message is a bit unexpected for me as well, but it does not make the checkpointing inconsistent. The only thing that's not happening in case of this warning is that the offsets are not written to Zookeeper. Which Flink version are you using? On Mon, Oct 24, 2016 at 7:25 PM,

Re: Side effects or multiple sinks on streaming jobs?

2016-10-26 Thread Luis Mariano Guerra
do I have to send the errors "in band"? that is, return maybe more than one tuple in my operations then flatmap and use a KeyedSerializationSchema? or is there a way to emit a tuple to another sink from within operations directly? On Wed, Oct 26, 2016 at 9:20 AM, Robert Metzger

Re: Side effects or multiple sinks on streaming jobs?

2016-10-26 Thread Robert Metzger
Hi Luis, You can define as many data sinks as you want in a Flink job topology. So its not a problem for your use case to define two Kafka sinks, sending data to different topics. Regards, Robert On Tue, Oct 25, 2016 at 3:30 PM, Luis Mariano Guerra < mari...@event-fabric.com> wrote: > hi, > >

Re: add FLINK_LIB_DIR to classpath on yarn -> add properties file to class path on yarn

2016-10-26 Thread Robert Metzger
Hi Vinay, the JobManager and TaskManager logs contain the classpath used when starting a container on YARN. Can you check if the yaml file is in the classpath? On Tue, Oct 25, 2016 at 8:28 AM, vinay patil wrote: > Hi Max, > > As discussed here , I have put my yaml file

Re: Elasticsearch Http Connector

2016-10-26 Thread Philipp Bussche
Hi, yes I wrote an ElasticSearch Sink for Flink that uses Jest. Sure, I will make something available for you. Just travelling at the moment so this will be a few hours before it is there. Thanks -- View this message in context: