Window start and end issue with TumblingProcessingTimeWindows

2016-06-06 Thread Soumya Simanta
I've a simple program which takes some inputs from a command line (Socket stream) and then aggregates based on the key. When running this program on my local machine I see some output that is counter-intuitive to my understanding of windows in Flink. The start time of the Window is around the

Re: Hourly top-k statistics of DataStream

2016-06-06 Thread Yukun Guo
My algorithm is roughly like this, taking the top-K words problem as an example (the purpose of computing local “word count” is to deal with data imbalance): DataStream of words -> timeWindow of 1h -> converted to DataSet of words -> random partitioning by rebalance -> local “word count” using
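The two-phase idea above (local counts on rebalanced partitions, then a global merge and top-K) can be sketched in plain Java; this is an illustration of the counting logic only, not the Flink pipeline itself, and the class and method names are invented for the example:

```java
import java.util.*;
import java.util.stream.*;

public class TopKSketch {
    // Phase 1: local "word count" per partition (plain lists stand in
    // for the rebalanced sub-streams).
    static Map<String, Long> localCount(List<String> partition) {
        return partition.stream()
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
    }

    // Phase 2: merge the partial counts and keep the global top K.
    static List<Map.Entry<String, Long>> topK(List<Map<String, Long>> partials, int k) {
        Map<String, Long> merged = new HashMap<>();
        for (Map<String, Long> p : partials) {
            p.forEach((w, c) -> merged.merge(w, c, Long::sum));
        }
        return merged.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(k)
                .collect(Collectors.toList());
    }
}
```

In the actual job the local counts would come out of the parallel window operators and the merge would be a final non-parallel reduce.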

Hourly top-k statistics of DataStream

2016-06-06 Thread Yukun Guo
Hi, I'm working on a project which uses Flink to compute hourly log statistics like top-K. The logs are fetched from Kafka by a FlinkKafkaConsumer and packed into a DataStream. The problem is, I find the computation quite challenging to express with Flink's DataStream API: 1. If I use something

Periodically evicting internal states when using mapWithState()

2016-06-06 Thread Jack Huang
Hi all, I have an incoming stream of event objects, each with its session ID. I am writing a task that aggregates the events by session. The general logic looks like: case class Event(sessionId: Int, data: String); case class Session(id: Int, var events: List[Event]); val events = ... // some source
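The eviction being asked about, dropping session state that has been idle too long, can be sketched independently of Flink's state API; this is a hedged Java illustration of the bookkeeping (a map keyed by session ID plus a last-seen timestamp and a TTL), not the mapWithState() API itself:

```java
import java.util.*;

public class SessionEvictor {
    static class Session {
        final int id;
        final List<String> events = new ArrayList<>();
        long lastSeen;
        Session(int id, long now) { this.id = id; this.lastSeen = now; }
    }

    private final Map<Integer, Session> sessions = new HashMap<>();
    private final long ttlMillis;

    SessionEvictor(long ttlMillis) { this.ttlMillis = ttlMillis; }

    // Aggregate an event into its session, refreshing the session's timestamp.
    void onEvent(int sessionId, String data, long now) {
        Session s = sessions.computeIfAbsent(sessionId, id -> new Session(id, now));
        s.events.add(data);
        s.lastSeen = now;
        evictStale(now);
    }

    // Drop sessions that have been idle longer than the TTL.
    void evictStale(long now) {
        sessions.values().removeIf(s -> now - s.lastSeen > ttlMillis);
    }

    int size() { return sessions.size(); }
}
```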

Re: Event processing time with lateness

2016-06-06 Thread Elias Levy
On Fri, Jun 3, 2016 at 6:47 AM, Kostas Kloudas wrote: > To see a relatively more complex example of such a trigger and how to > implement it, > you can have a look at this implementation: >

Re: Kafka producer sink message loss?

2016-06-06 Thread Elias Levy
On Sun, Jun 5, 2016 at 3:16 PM, Stephan Ewen wrote: > You raised a good point. Fortunately, there should be a simple way to fix > this. > > The Kafka Sink Function should implement the "Checkpointed" interface. It > will get a call to the "snapshotState()" method whenever a
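The idea suggested above, flushing in-flight records when a checkpoint snapshot is taken so that nothing acknowledged by the checkpoint is still sitting in a buffer, can be modeled in a few lines. This is a simplified stand-in, not Flink's actual Checkpointed interface or the Kafka producer:

```java
import java.util.*;

// Simplified model of a sink that buffers records and flushes them
// when a checkpoint snapshot is taken, so no records are lost between
// the snapshot and the actual send.
public class FlushingSink {
    private final List<String> buffer = new ArrayList<>();
    private final List<String> delivered = new ArrayList<>();

    // Called once per incoming record.
    void invoke(String record) { buffer.add(record); }

    // Called at checkpoint time: flush everything still in flight
    // before the checkpoint is confirmed.
    void snapshotState() {
        delivered.addAll(buffer);
        buffer.clear();
    }

    List<String> delivered() { return delivered; }
    int pending() { return buffer.size(); }
}
```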

Does Flink allows for encapsulation of transformations?

2016-06-06 Thread Ser Kho
The question is how to encapsulate numerous transformations into one object or may be a function in Apache Flink Java setting. I have tried to investigate this question using an example of Pi calculation (see below). I am wondering whether or not the suggested approach is valid from the Flink's
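One common way to encapsulate a chain of transformations is to expose it as a single reusable function from input to result; the same shape carries over to Flink with a DataSet in place of plain values. Below is a hedged, Flink-free sketch using the Pi example from the question (class name and seed are invented for the illustration):

```java
import java.util.Random;
import java.util.function.Function;
import java.util.stream.IntStream;

// Encapsulates the whole "generate points -> filter -> count -> scale"
// chain behind one Function, so callers see a single reusable unit.
public class PiEstimator implements Function<Integer, Double> {
    private final Random rnd = new Random(42);  // fixed seed for reproducibility

    @Override
    public Double apply(Integer samples) {
        long inside = IntStream.range(0, samples)
                .mapToObj(i -> new double[]{rnd.nextDouble(), rnd.nextDouble()})
                .filter(p -> p[0] * p[0] + p[1] * p[1] <= 1.0)  // inside quarter circle
                .count();
        return 4.0 * inside / samples;
    }
}
```

In a Flink program the analogous pattern is a class (or method) that takes a DataSet, applies the internal map/filter/reduce chain, and returns the resulting DataSet.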

Re: Submit Flink Jobs to YARN running on AWS

2016-06-06 Thread Bajaj, Abhinav
Hi Josh, I have not yet :-( . I am working on getting a REST service setup on AWS that can do it rather than using the Flink client remotely. This way the Akka communication is within AWS. However, I still need the solution for running some of the integration/system tests. ~ Abhi From: Josh

Re: java.io.IOException: Couldn't access resultSet

2016-06-06 Thread Chesnay Schepler
open() not being called is a valuable observation, but from then on I have problems following you. Within hasNext() we first check whether exhausted is true, and if so return false. Since it is initialized with false we will not return here; this seems like correct behaviour. I have one
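The hasNext() logic described above, an "exhausted" flag that short-circuits before the underlying result set is touched, can be modeled as a plain iterator; this is an illustrative stand-in (a queue plays the role of the JDBC ResultSet), not the actual JDBCInputFormat code:

```java
import java.util.*;

public class CursorIterator implements Iterator<Integer> {
    private final Queue<Integer> resultSet;   // stands in for the JDBC ResultSet
    private boolean exhausted = false;

    CursorIterator(Collection<Integer> rows) {
        this.resultSet = new ArrayDeque<>(rows);
    }

    @Override
    public boolean hasNext() {
        if (exhausted) {           // first check: already done, return false
            return false;
        }
        if (resultSet.isEmpty()) { // otherwise probe the cursor
            exhausted = true;
            return false;
        }
        return true;
    }

    @Override
    public Integer next() {
        if (!hasNext()) throw new NoSuchElementException();
        return resultSet.poll();
    }
}
```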

Re: Data Source Generator emits 4 instances of the same tuple

2016-06-06 Thread Biplob Biswas
Hi, I tried streaming the source data in 2 ways. 1. A simple, straightforward way of sending data without using the serving-speed concept http://pastebin.com/cTv0Pk5U 2. The one where I use the TaxiRide source, which is exactly similar except for loading the data into the proper data structures.

Re: HBase Input Format for streaming

2016-06-06 Thread Christophe Salperwyck
I just did that:

public T nextRecord(final T reuse) throws IOException {
    if (this.rs == null) {
        // throw new IOException("No table result scanner provided!");
        return null;
    }
    ...

because in the class FileSourceFunction we have:

@Override
public void run(SourceContext ctx) throws Exception {

Re: Custom keyBy(), look for similaties

2016-06-06 Thread Ufuk Celebi
Hey Iñaki, you can use the KeySelector as described here: https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/common/index.html#specifying-keys But you only have a local view of the current element, i.e. the library you use to determine the similarity has to know the similarities
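Since a KeySelector sees only the current element, "key by similarity" in practice usually means mapping each element independently to a canonical key that similar values share. A hedged sketch of such a normalization (lower-case, strip punctuation, sort the tokens), invented for this example rather than taken from any library:

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class CanonicalKey {
    // "John Locke" and "locke, John" both normalize to "john locke",
    // so a keyBy on this value groups them together.
    static String of(String name) {
        return Arrays.stream(name.toLowerCase()
                        .replaceAll("[^a-z0-9 ]", "")
                        .split("\\s+"))
                .filter(t -> !t.isEmpty())
                .sorted()
                .collect(Collectors.joining(" "));
    }
}
```

In Flink this function body would live inside `KeySelector.getKey()`. Anything fuzzier than a shared canonical form (e.g. edit distance between arbitrary pairs) cannot be expressed as a keyBy, because the selector never sees two elements at once.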

Re: Weird Kryo exception (Unable to find class: java.ttil.HashSet)

2016-06-06 Thread Ufuk Celebi
Unless someone really invests time into debugging this I fear that the different misspellings are not really helpful, Flavio. On Mon, Jun 6, 2016 at 10:31 AM, Flavio Pompermaier wrote: > This time I had the following exception (obviously >

Re: HBase Input Format for streaming

2016-06-06 Thread Ufuk Celebi
From the code it looks like the open method of the TableInputFormat is never called. What are you doing differently in the StreamingTableInputFormat? – Ufuk On Mon, Jun 6, 2016 at 1:49 PM, Christophe Salperwyck wrote: > Hi all, > > I am trying to read data

Re: Logs show `Marking the coordinator 2147483637 dead` in Flink-Kafka conn

2016-06-06 Thread Robert Metzger
Hi, the error being logged is coming from the Kafka code we are referring to. The Kafka user@ list is probably a better place to seek help on this one. I've searched for the error message on Google. It seems that the message does not immediately indicate an error. Also note, it is logged at

Re: Yarn batch not working with standalone yarn job manager once a persistent, HA job manager is launched ?

2016-06-06 Thread Ufuk Celebi
Thanks for clarification. I think it might be related to the YARN properties file, which is still being used for the batch jobs. Can you try to delete it between submissions as a temporary workaround to check whether it's related? – Ufuk On Mon, Jun 6, 2016 at 3:18 PM, LINZ, Arnaud
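The workaround suggested above can be scripted between submissions. Note the default path is an assumption here: Flink's YARN client writes its properties file to /tmp/.yarn-properties-$USER unless yarn.properties-file.location points elsewhere (as it does in this thread):

```shell
# Delete the YARN session properties file so the next batch submission
# starts fresh instead of reusing the HA session's settings.
# Path is an assumption; override FLINK_YARN_PROPS if yours differs.
PROPS="${FLINK_YARN_PROPS:-/tmp/.yarn-properties-$USER}"
rm -f "$PROPS"
echo "cleared $PROPS"
```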

RE: Yarn batch not working with standalone yarn job manager once a persistent, HA job manager is launched ?

2016-06-06 Thread LINZ, Arnaud
Hi, The zookeeper path is only for my persistent container, and I do use a different one for all my persistent containers. The -Drecovery.mode=standalone was passed inside the JVM_ARGS ("${JVM_ARGS} -Drecovery.mode=standalone -Dyarn.properties-file.location=/tmp/flink/batch") I've tried

Logs show `Marking the coordinator 2147483637 dead` in Flink-Kafka conn

2016-06-06 Thread Sendoh
Hi Flink community, In a worker's log of Flink I saw the following info appear 30 times or more, approximately every 10 minutes. `org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Marking the coordinator 2147483637 dead. ` Would this indicate the Kafka-Flink consumer group is

Yarn batch not working with standalone yarn job manager once a persistent, HA job manager is launched ?

2016-06-06 Thread LINZ, Arnaud
Hi, I use Flink 1.0.0. I have a persistent yarn container set (a persistent flink job manager) that I use for streaming jobs ; and I use the “yarn-cluster” mode to launch my batches. I’ve just switched “HA” mode on for my streaming persistent job manager and it seems to work ; however my

HBase Input Format for streaming

2016-06-06 Thread Christophe Salperwyck
Hi all, I am trying to read data from HBase and use the windows functions of Flink streaming. I can read my data using the ExecutionEnvironment but not from the StreamExecutionEnvironment. Is that a known issue? Are the inputsplits used in the streaming environment? Here a sample of my code:

Data Source Generator emits 4 instances of the same tuple

2016-06-06 Thread Biplob Biswas
Hi, I am using a Data Source Generator, which very closely resembles the one here on the dataartisans github page https://github.com/dataArtisans/flink-training-exercises/blob/master/src/main/java/com/dataartisans/flinktraining/exercises/datastream_java/sources/TaxiRideSource.java This

Re: java.io.IOException: Couldn't access resultSet

2016-06-06 Thread Chesnay Schepler
the JDBC InputFormat does not, and never has, used the configuration. On 06.06.2016 09:27, Aljoscha Krettek wrote: The problem could be that open() is not called with a proper Configuration object in streaming mode. On Sun, 5 Jun 2016 at 19:33 Stephan Ewen >

Custom keyBy(), look for similaties

2016-06-06 Thread iñaki williams
Hi guys, I am using Flink on my project and I have a question. (I am using Java) Is it possible to modify the keyBy() method in order to key by similarities and not by the exact name? Example: I receive 2 DataStreams; in the first one, the name of the field that I want to keyBy is "John Locke",

Re: java.io.IOException: Couldn't access resultSet

2016-06-06 Thread Aljoscha Krettek
The problem could be that open() is not called with a proper Configuration object in streaming mode. On Sun, 5 Jun 2016 at 19:33 Stephan Ewen wrote: > Hi David! > > You are using the JDBC format that was written for the batch API in the > streaming API. > > While that should

Re: env.fromElements produces TypeInformation error

2016-06-06 Thread Aljoscha Krettek
Hi, I think the problem is that the case class has generic parameters. You can try making TypeInformation for those parameters implicitly available at the call site, i.e.: implicit val typeT = createTypeInformation[T] // where you insert the specific type for T and do the same for the other