Re: Handle event time

2017-09-07 Thread Xingcan Cui
Hi AndreaKinn, The AscendingTimestampExtractor does not work the way you think. It should be applied to streams whose timestamps are naturally monotonically ascending. Flink uses watermarks to deal with unordered data. When a watermark *t* is received, it means there should be no more records whose
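If the Kafka records can arrive slightly out of order, a bounded-out-of-orderness extractor is usually the better fit. A minimal sketch, assuming a hypothetical SensorReading type and a tolerance of 10 seconds (not the original poster's code):

    import org.apache.flink.streaming.api.TimeCharacteristic;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public class BoundedLatenessExample {

        // Hypothetical event type standing in for the deserialized Kafka record.
        public static class SensorReading {
            public long timestamp;
            public double value;
            public SensorReading() {}
            public SensorReading(long t, double v) { timestamp = t; value = v; }
        }

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

            // In the real job this would be the Kafka source.
            env.fromElements(new SensorReading(2000L, 0.2), new SensorReading(1000L, 0.1))
               // Watermarks trail the highest timestamp seen by 10 seconds, so records
               // may arrive up to 10 seconds out of order without being treated as late.
               .assignTimestampsAndWatermarks(
                   new BoundedOutOfOrdernessTimestampExtractor<SensorReading>(Time.seconds(10)) {
                       @Override
                       public long extractTimestamp(SensorReading r) {
                           return r.timestamp; // event time in epoch milliseconds
                       }
                   })
               .print();

            env.execute("bounded-lateness-example");
        }
    }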

File System State Backend

2017-09-07 Thread rnosworthy
Flink 1.3.2. I have one VM for the job manager and another for the task manager. I have a custom windowing trigger, shown below. My checkpoint data is not clearing. I have tried to inject a fileStateThresholdSize when instantiating the FsStateBackend object, but that didn't work. I have tried
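For reference, a minimal sketch of configuring the backend with a file state size threshold; the HDFS path is hypothetical, and this assumes the FsStateBackend constructor that takes a threshold in bytes:

    import java.net.URI;
    import org.apache.flink.runtime.state.filesystem.FsStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class CheckpointBackendSetup {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // State chunks smaller than the threshold (here 1 MB) are stored inline in
            // the checkpoint metadata instead of as separate files on the file system.
            env.setStateBackend(new FsStateBackend(new URI("hdfs:///flink/checkpoints"), 1024 * 1024));
            env.enableCheckpointing(60_000); // trigger a checkpoint every 60 seconds

            // ... build and execute the actual topology here ...
        }
    }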

Capturing the exception that leads to a job entering the FAILED state

2017-09-07 Thread Andrew Roberts
Hello, I’m trying to connect our Flink deployment to our error monitor tool, and I’m struggling to find an entry point for capturing that exception. I’ve been poking around a little bit of the source, but I can’t seem to connect anything I’ve found to the job submission API we’re using
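One hedged option, if the job is submitted in attached mode via execute(): catch the exception on the client side and forward it. This is only a sketch, not the poster's actual submission path; reportToErrorMonitor is a hypothetical hook.

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class MonitoredJobRunner {
        public static void main(String[] args) {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // ... build the topology ...
            try {
                env.execute("monitored-job");
            } catch (Exception e) {
                // In attached mode, the failure cause of a FAILED job is propagated
                // back to the submitting client, so it can be reported from here.
                reportToErrorMonitor(e);
                throw new RuntimeException(e);
            }
        }

        private static void reportToErrorMonitor(Throwable t) {
            // Placeholder: forward the exception to the external error monitoring tool.
            t.printStackTrace();
        }
    }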

Handle event time

2017-09-07 Thread AndreaKinn
Hi, I'm getting sensor data from a Kafka source and I absolutely need them to be ordered by the time of data generation. I've implemented a custom deserialiser and employed an AscendingTimestampExtractor to handle event time. Obviously I set EventTime as the stream time characteristic. Unfortunately when
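For reference, a minimal sketch of the setup described above (event-time characteristic plus an AscendingTimestampExtractor); SensorReading is a hypothetical stand-in for the deserialized record:

    import org.apache.flink.streaming.api.TimeCharacteristic;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.timestamps.AscendingTimestampExtractor;

    public class AscendingEventTimeSetup {

        public static class SensorReading {
            public long generationTime;
            public SensorReading() {}
            public SensorReading(long t) { generationTime = t; }
        }

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

            // Stand-in for the Kafka source with the custom deserializer.
            env.fromElements(new SensorReading(1000L), new SensorReading(2000L))
               .assignTimestampsAndWatermarks(new AscendingTimestampExtractor<SensorReading>() {
                   @Override
                   public long extractAscendingTimestamp(SensorReading r) {
                       // Timestamps must be monotonically increasing within each
                       // parallel instance, otherwise the extractor logs violations.
                       return r.generationTime;
                   }
               })
               .print();

            env.execute("ascending-event-time-setup");
        }
    }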

Bucketing HDFS Sink Failing randomly after fews days it runs successfully

2017-09-07 Thread Raja . Aravapalli
Hi Team, I have a Bucketing Sink writing to HDFS files…. It ran successfully for 4 days and then suddenly failed with the exception below: Caused by: java.io.IOException: Cannot find required BLOB at /tmp/blobStore Code below: BucketingSink HdfsSink = new BucketingSink (hdfsOutputPath);

Does Flink have a JDBC server where I can submit Calcite Streaming Queries?

2017-09-07 Thread kant kodali
Hi All, Does Flink have a JDBC server where I can submit Calcite streaming queries, such that I get a stream of responses back from Flink forever via JDBC? What is the standard way to do this? Thanks, Kant

Disable job graph in web UI

2017-09-07 Thread Joshua Griffith
Hello, I have an auto-generated job that creates too many tasks for the web UI’s job graph to handle. The browser pinwheels while the page attempts to load. Is it possible to disable the job graph component in the web UI? For slightly smaller jobs, once the graph loads, the rest of the UI is usable.

Re: can flink do streaming from data sources other than Kafka?

2017-09-07 Thread Elias Levy
If you want to ensure you see all changes to a Cassandra table, you need to make use of the Change Data Capture feature. For that, you'll need code running on the Cassandra nodes to read the commit log segments from the Cassandra CDC

Re: Broadcast Config through Connected Stream

2017-09-07 Thread Navneeth Krishnan
Hi All, Any suggestions on this would really help. Thanks. On Tue, Sep 5, 2017 at 2:42 PM, Navneeth Krishnan wrote: > Hi All, > > I looked into an earlier email about the topic broadcast config through > connected stream and I couldn't find the conclusion. > > I
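In the pre-broadcast-state API (Flink 1.3), the usual pattern is to broadcast the config stream and connect it to the data stream with a CoFlatMapFunction. A minimal sketch with stand-in element types (not the original poster's code):

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
    import org.apache.flink.util.Collector;

    public class BroadcastConfigExample {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            DataStream<String> events = env.fromElements("a", "b", "c");  // stand-in for the data stream
            DataStream<String> configUpdates = env.fromElements("upper"); // stand-in for the config stream

            events
                .connect(configUpdates.broadcast()) // every parallel instance sees every config update
                .flatMap(new CoFlatMapFunction<String, String, String>() {
                    private String config = "lower"; // latest config held per parallel instance

                    @Override
                    public void flatMap1(String event, Collector<String> out) {
                        out.collect("upper".equals(config) ? event.toUpperCase() : event);
                    }

                    @Override
                    public void flatMap2(String newConfig, Collector<String> out) {
                        config = newConfig; // apply the broadcasted config update
                    }
                })
                .print();

            env.execute("broadcast-config");
        }
    }

Each parallel instance keeps its own copy of the latest config; for fault tolerance that copy would additionally need to be checkpointed, e.g. as operator state.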

Re: State Maintenance

2017-09-07 Thread Navneeth Krishnan
Will I be able to use both queryable MapState and union list state while implementing the CheckpointedFunction interface? One of my major requirements for that operator is to provide a queryable state, and in order to compute that state we need the common static state across all parallel

Re: Flink 1.2.1 JobManager Election Deadlock

2017-09-07 Thread Ufuk Celebi
Thanks for looking into this and finding out that it is (probably) related to Curator. Very valuable information! On Thu, Sep 7, 2017 at 3:09 PM, Timo Walther wrote: > Thanks for informing us. As far as I know, we were not aware of any deadlock > in the JobManager election.

Re: dynamically partitioned stream

2017-09-07 Thread Tony Wei
Hi Martin, For the first question, as far as I know, Flink guarantees that the order of records from the same sub-task of the consumer won't be changed. If A, B and C came from different sub-tasks, the result might be what you're concerned about. After all, you can't have all sub-tasks process in the same

Re: Flink 1.2.1 JobManager Election Deadlock

2017-09-07 Thread Timo Walther
Thanks for informing us. As far as I know, we were not aware of any deadlock in the JobManager election. Let's hope that the updated Curator version fixed the problem. We will definitely keep an eye on this. Feel free to contact the dev@ mailing list if the problem still exists in 1.3.2.

Re: MapState Default Value

2017-09-07 Thread Stefan Richter
I don’t know any better answer than: it was never implemented. But it could make sense, so there is no deeper reason from my point of view. > On 07.09.2017 at 14:49, Timo Walther wrote: > > I will loop in Stefan, who might know the answer. > > > On 07.09.17 at 02:10

Re: MapState Default Value

2017-09-07 Thread Timo Walther
I will loop in Stefan, who might know the answer. On 07.09.17 at 02:10, Navneeth Krishnan wrote: Hi, Is there a reason behind removing the default value option in MapStateDescriptor? I was using it in the earlier version to initialize a Guava cache with a loader etc., and in the new version by
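Since the descriptor no longer carries a default, one workaround is to initialize entries lazily on first access. A rough sketch with hypothetical types (the function must run downstream of a keyBy(), since MapState is keyed state):

    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.api.common.state.MapState;
    import org.apache.flink.api.common.state.MapStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.util.Collector;

    public class CountPerField extends RichFlatMapFunction<String, Long> {

        private transient MapState<String, Long> counts;

        @Override
        public void open(Configuration parameters) {
            // MapStateDescriptor no longer takes a default value, so defaults are
            // emulated by initializing entries lazily on first access.
            MapStateDescriptor<String, Long> desc =
                new MapStateDescriptor<>("counts", String.class, Long.class);
            counts = getRuntimeContext().getMapState(desc);
        }

        @Override
        public void flatMap(String field, Collector<Long> out) throws Exception {
            Long current = counts.get(field);
            if (current == null) {   // would have been the default value before
                current = 0L;
            }
            counts.put(field, current + 1);
            out.collect(current + 1);
        }
    }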

Re: Exception when using keyby operator

2017-09-07 Thread Timo Walther
Hi Sridhar, according to the exception, your "meEvents" stream is not a POJO. You can check that by printing "meEvents.getType()". In general, you can always check the log for such problems. There should be something like: 14:40:57,079 INFO org.apache.flink.api.java.typeutils.TypeExtractor
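For reference, a type Flink treats as a POJO needs to be a public class with a public no-argument constructor and fields that are either public or reachable through getters and setters. A hypothetical sketch (the field names are made up, not taken from the original code):

    // Recognized by Flink's TypeExtractor as a POJO rather than a generic type.
    public class MeEvent {

        public String id;        // public fields (or private fields with getters/setters)
        public long timestamp;
        public double value;

        public MeEvent() {}      // public no-argument constructor is required

        public MeEvent(String id, long timestamp, double value) {
            this.id = id;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

Printing meEvents.getType() should then show a PojoTypeInfo instead of a GenericTypeInfo.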

Re: Flink on AWS EMR Protobuf

2017-09-07 Thread Timo Walther
I'm not sure if this is a Flink issue. It popped up on other non-Flink projects as well: http://community.cloudera.com/t5/Storage-Random-Access-HDFS/map-red-over-hbase-in-cdh-5-7/td-p/43902 I would definitely check your dependencies. This looks like conflicting versions in your classpath.

Re: Question about Flink internals

2017-09-07 Thread Timo Walther
Hi Junguk, I'll try to answer your questions, but also loop in Ufuk, who might know more about the network internals: 1. Yes, every operator/operator chain has a "setParallelism()" method to specify the parallelism. The overall parallelism of the job can be set when submitting a job. The
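A small sketch of the two levels of parallelism mentioned above (job-wide default plus a per-operator override):

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class ParallelismExample {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.setParallelism(4); // default parallelism for all operators of this job

            env.fromElements("a", "b", "c")
               .map(new MapFunction<String, String>() {
                   @Override
                   public String map(String s) {
                       return s.toUpperCase();
                   }
               })
               .setParallelism(2) // overrides the job default for this operator only
               .print();

            env.execute("parallelism-example");
        }
    }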

Re: dynamically partitioned stream

2017-09-07 Thread Martin Eden
Hi Tony, Ah I see. Yes, you are right. What I was saying in my last message is that I relaxed that requirement after realising that it works the way you just described (and Aljoscha previously), and that global state is not really feasible/possible. Here is a re-worked example. Please let me know if it

Re: Flink on AWS EMR Protobuf

2017-09-07 Thread ant burton
Please find a stack trace below, and thank you for taking a look :-) java.lang.ClassCastException: org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetFileInfoRequestProto cannot be cast to com.google.protobuf.Message at

Re: can flink do streaming from data sources other than Kafka?

2017-09-07 Thread kant kodali
Yes, I can indeed create them, but I wonder if that is even possible? I haven't seen any framework doing this as of today. Flink has something called AsyncDataStream, and I wonder if this can be leveraged to create a stream out of a Cassandra source? Thanks! On Thu, Sep 7, 2017 at 1:16 AM, Tzu-Li

Re: dynamically partitioned stream

2017-09-07 Thread Tony Wei
Hi Martin, What I was talking about is how to store the arguments' state. In the example where you explained your use case to Aljoscha: 4: f1(V4, V3, V3), f2(V4, V3); 3: f1(V3, V3, V3); 2: -; 1: -. You showed that when lambda f2 came, it would emit f2(V4, V3)

Re: can flink do streaming from data sources other than Kafka?

2017-09-07 Thread Tzu-Li (Gordon) Tai
Ah, I see. I’m not aware of any existing work / JIRAs on streaming sources for Cassandra or HBase, only sinks. If you are interested in one, could you open JIRAs for them? On 7 September 2017 at 4:11:05 PM, kant kodali (kanth...@gmail.com) wrote: Hi Gordon, Thanks for the response, I did go

Re: Additional data read inside dataset transformations

2017-09-07 Thread Fabian Hueske
Hi, traditionally you would do a join, but that would mean reading all Parquet files that might contain relevant data, which might be too much. If you want to read data from within a user function (like GroupReduce), you are pretty much on your own. You could create a HadoopInputFormat
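A rough sketch of the "read from within a user function" option, with a hypothetical loader in open() standing in for the actual Parquet/HDFS read:

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.flink.api.common.functions.RichGroupReduceFunction;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.util.Collector;

    // Enriches each group with lookup data that is loaded once per task in open(),
    // instead of joining against the full Parquet data set.
    public class EnrichingGroupReduce extends RichGroupReduceFunction<String, String> {

        private transient Map<String, String> lookup;

        @Override
        public void open(Configuration parameters) {
            // Hypothetical loader: read only the additional records this task needs,
            // e.g. via a HadoopInputFormat instance or a plain HDFS client.
            lookup = loadLookupData();
        }

        @Override
        public void reduce(Iterable<String> values, Collector<String> out) {
            for (String value : values) {
                out.collect(value + " -> " + lookup.getOrDefault(value, "n/a"));
            }
        }

        private Map<String, String> loadLookupData() {
            return new HashMap<>(); // placeholder for the real read
        }
    }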

Re: can flink do streaming from data sources other than Kafka?

2017-09-07 Thread kant kodali
Hi Gordon, Thanks for the response. I did go over the links for sources and sinks prior to posting my question. Maybe I didn't get my question across correctly, so let me rephrase it: can I get data out of data stores like Cassandra and HBase in a streaming manner? Because, currently more or less all

Re: can flink do streaming from data sources other than Kafka?

2017-09-07 Thread Tzu-Li (Gordon) Tai
Hi! I am wondering if Flink can do streaming from data sources other than Kafka. For example, can Flink do streaming from a database like Cassandra, HBase, or MongoDB to sinks like, say, Elasticsearch or Kafka? Yes, Flink currently supports various connectors for different sources and sinks. For

Re: State Maintenance

2017-09-07 Thread Fabian Hueske
Hi Navneeth, there's a lower-level state interface that should address your requirements: OperatorStateStore.getUnionListState() This union list state is similar to the regular operator list state, but instead of splitting the list for recovery and giving out splits to operator instances, it
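A minimal sketch of that pattern: an operator that implements CheckpointedFunction, snapshots its local copy of the shared config, and on restore merges the union of all instances' copies (types and names are hypothetical):

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.api.common.state.ListState;
    import org.apache.flink.api.common.state.ListStateDescriptor;
    import org.apache.flink.api.common.typeinfo.TypeHint;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.runtime.state.FunctionInitializationContext;
    import org.apache.flink.runtime.state.FunctionSnapshotContext;
    import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
    import org.apache.flink.util.Collector;

    public class ConfigHoldingFunction extends RichFlatMapFunction<String, String>
            implements CheckpointedFunction {

        // Local copy of the "common static" config held by every parallel instance.
        private final Map<String, String> config = new HashMap<>();
        private transient ListState<Map<String, String>> unionState;

        @Override
        public void initializeState(FunctionInitializationContext context) throws Exception {
            ListStateDescriptor<Map<String, String>> desc = new ListStateDescriptor<>(
                "config-union", TypeInformation.of(new TypeHint<Map<String, String>>() {}));
            // Union list state: on restore, every parallel instance receives the full list.
            unionState = context.getOperatorStateStore().getUnionListState(desc);
            for (Map<String, String> restored : unionState.get()) {
                config.putAll(restored);
            }
        }

        @Override
        public void snapshotState(FunctionSnapshotContext context) throws Exception {
            unionState.clear();
            unionState.add(config); // each instance contributes its copy to the union
        }

        @Override
        public void flatMap(String value, Collector<String> out) {
            out.collect(value + " (config entries: " + config.size() + ")");
        }
    }

The queryable state from the question would be keyed state registered separately in the same operator (a keyed state descriptor can be marked queryable), so the two can coexist as long as the operator runs on a keyed stream.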

can flink do streaming from data sources other than Kafka?

2017-09-07 Thread kant kodali
Hi All, I am wondering if Flink can do streaming from data sources other than Kafka. For example, can Flink do streaming from a database like Cassandra, HBase, or MongoDB to sinks like, say, Elasticsearch or Kafka? Also, for out-of-core stateful streaming, is RocksDB the only option? Can I use some

Additional data read inside dataset transformations

2017-09-07 Thread eSKa
Hello, I will describe my use case briefly, with steps for easier understanding: 1) Currently my job loads data from Parquet files using HadoopInputFormat along with AvroParquetInputFormat, with the current approach: AvroParquetInputFormat inputFormat = new AvroParquetInputFormat();
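For context, a sketch of step 1 as described (wrapping AvroParquetInputFormat in Flink's HadoopInputFormat for the mapreduce API); the input path is hypothetical and this assumes a parquet-avro version where AvroParquetInputFormat is generic, so treat it as an approximation rather than the poster's actual code:

    import org.apache.avro.generic.GenericRecord;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.parquet.avro.AvroParquetInputFormat;

    public class ParquetReadJob {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            Job job = Job.getInstance();
            AvroParquetInputFormat<GenericRecord> parquetFormat = new AvroParquetInputFormat<>();
            AvroParquetInputFormat.addInputPath(job, new Path("hdfs:///data/input")); // hypothetical path

            // Wrap the Hadoop input format so the DataSet API can read the Parquet files.
            HadoopInputFormat<Void, GenericRecord> hadoopIF =
                new HadoopInputFormat<>(parquetFormat, Void.class, GenericRecord.class, job);
            DataSet<Tuple2<Void, GenericRecord>> records = env.createInput(hadoopIF);

            records.first(10).print();
        }
    }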

Re: dynamically partitioned stream

2017-09-07 Thread Martin Eden
Hi Tony, Yes, exactly: I am assuming the lambda emits a value only after it has been published to the control topic (t1) and at least one value arrives in the data topic for each of its arguments. This will happen at a time t2 > t1. So yes, there is uncertainty with regards to when t2 will happen.