Re: Kafka consumers are too fast for some partitions in "flatMap"-like jobs

2017-08-30 Thread Elias Levy
On Wed, Aug 30, 2017 at 11:50 AM, Oleksandr Baliev <aleksanderba...@gmail.com> wrote: > So the main question is how to synchronize data reading between Kafka partitions when data is sequential per partition, but late for some of them, and we care that the data is not thrown away and will
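A common way to keep fast partitions from racing ahead in event time is to assign watermarks per Kafka partition inside the consumer, so the merged stream's event time advances only as fast as the slowest partition. A minimal sketch, assuming ascending timestamps within each partition; the topic, broker address, and timestamp parsing are placeholders:

    import java.util.Properties;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.timestamps.AscendingTimestampExtractor;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
    import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

    public class PerPartitionWatermarks {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
            props.setProperty("group.id", "example-group");           // placeholder

            FlinkKafkaConsumer010<String> consumer =
                new FlinkKafkaConsumer010<>("TOPIC_IN", new SimpleStringSchema(), props);

            // Watermarks are generated inside the consumer, per Kafka partition.
            // The merged stream's watermark is the minimum across partitions, so
            // event-time windows wait for slow partitions instead of dropping
            // their records as late data.
            consumer.assignTimestampsAndWatermarks(new AscendingTimestampExtractor<String>() {
                @Override
                public long extractAscendingTimestamp(String record) {
                    // Assumption: the event timestamp is the first CSV field.
                    return Long.parseLong(record.split(",")[0]);
                }
            });

            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.addSource(consumer).print();
            env.execute("per-partition watermarks");
        }
    }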

Re: metrics for Flink sinks

2017-08-30 Thread Elias Levy
Not an exact match, but I am guessing it is related to FLINK-7286, which I reported. Feel free to modify that issue to cover the root cause. On Wed, Aug 30, 2017 at 8:32 AM, Martin Eden wrote: > Thanks

Re: Distributed reading and parsing of protobuf files from S3 in Apache Flink

2017-08-30 Thread ShB
Hi Fabian, Thank you so much for your quick response, I appreciate it. Since I'm working with a very large number of small files, I don't necessarily need to read each file in parallel. I need to read my large list of files in parallel - that is, split up my list of files into
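One way to do that (a sketch under assumptions, not the thread's actual solution): make the file list itself the data set and read each file inside a flatMap, so every parallel instance handles a subset of the list. The S3 paths and the readAndParse helper are hypothetical:

    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;
    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.util.Collector;

    public class ParallelFileList {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // Placeholder; in practice this is the large list of S3 object paths.
            List<String> paths = Arrays.asList("s3://bucket/a.pb", "s3://bucket/b.pb");

            // fromCollection is a non-parallel source, but rebalance() spreads
            // the paths round-robin over all slots, so the flatMap that does the
            // actual reading runs with full parallelism.
            DataSet<String> records = env.fromCollection(paths)
                .rebalance()
                .flatMap(new RichFlatMapFunction<String, String>() {
                    @Override
                    public void flatMap(String path, Collector<String> out) throws Exception {
                        for (String record : readAndParse(path)) { // hypothetical helper
                            out.collect(record);
                        }
                    }
                });

            records.print();
        }

        // Hypothetical stand-in for opening one small S3 file and parsing its records.
        private static List<String> readAndParse(String path) {
            return Collections.emptyList();
        }
    }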

Re: Kafka consumers are too fast for some partitions in "flatMap"-like jobs

2017-08-30 Thread Oleksandr Baliev
Hi Elias, Thanks for the reply. TOPIC_OUT has fewer partitions, ~20, but actually there are 4 output topics with different numbers of partitions. So the job is a kind of router. In general, having 1:1 partitions for IN and OUT topics is good, thanks for the tip. But since the main goal is to have windows in

Re: datastream.print() doesn't work

2017-08-30 Thread AndreaKinn
Hi, overnight, by uninstalling and re-installing Maven and Flink, I solved my issue. I started the web dashboard using the start-local.sh script and used createLocalEnvironmentWithWebUI(new Configuration()) as you suggested. Anyway, when I start it in Eclipse, in the UI dashboard no running jobs are
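For reference, a minimal sketch of that setup from the IDE (assuming the flink-runtime-web dependency is on the classpath; the job is a placeholder). One likely reason for seeing no running jobs is that the embedded cluster, and with it the dashboard, only lives while the job executes:

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class LocalWebUI {
        public static void main(String[] args) throws Exception {
            // Starts an embedded cluster plus the web dashboard (default port 8081).
            // Once execute() returns, the embedded cluster and its UI shut down,
            // so a short-lived job may disappear before you can inspect it.
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(new Configuration());

            env.fromElements(1, 2, 3).print(); // placeholder job
            env.execute("local web UI example");
        }
    }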

Re: Twitter example

2017-08-30 Thread Krishnanand Khambadkone
Fabian, Thank you for the prompt responses. I was able to find the output in the JobManager logs. There were no TaskManager logs generated. Sent from my iPhone > On Aug 30, 2017, at 8:24 AM, Fabian Hueske wrote: > > print() writes to the standard output of the

Re: metrics for Flink sinks

2017-08-30 Thread Martin Eden
Thanks Chesnay. Just for completeness, are there any relevant tickets for the discussion that one can follow, upvote, or contribute to? M On Tue, Aug 29, 2017 at 8:57 PM, Chesnay Schepler wrote: > Hello, > > 1. Because no one found time to fix it. In contrast to the remaining

Re: Twitter example

2017-08-30 Thread Fabian Hueske
print() writes to the standard output of the TaskManager process. The TM stdout is usually redirected to an .out file in the ./log folder. 2017-08-30 17:20 GMT+02:00 Krishnanand Khambadkone: > I am running this standalone, not under yarn, on a single instance > setup. I

Re: Twitter example

2017-08-30 Thread Krishnanand Khambadkone
I am running this standalone, not under YARN, on a single instance setup. I believe you are referring to the Flink log files. Sent from my iPhone > On Aug 30, 2017, at 12:56 AM, Fabian Hueske wrote:

Re: Kafka Offset settings in Flink Kafka Consumer 10

2017-08-30 Thread sohimankotia
My bad :-) Sorry. We are using Flink 1.2 dependencies, and I think this functionality is only available from the Flink 1.3 API version.

Re: Kafka Offset settings in Flink Kafka Consumer 10

2017-08-30 Thread Oleksandr Baliev
Hi, it's there: https://ci.apache.org/projects/flink/flink-docs-release-1.3/api/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaConsumerBase.html#setStartFromSpecificOffsets-java.util.Map- - it's just defined in FlinkKafkaConsumerBase. 2017-08-30 16:34 GMT+02:00 sohimankotia

Kafka Offset settings in Flink Kafka Consumer 10

2017-08-30 Thread sohimankotia
Hi, I see that the Flink Kafka consumer has the ability to set specific offsets to read from:

    Map<KafkaTopicPartition, Long> specificStartOffsets = new HashMap<>();
    specificStartOffsets.put(new KafkaTopicPartition("topic", 0), 23L);
    specificStartOffsets.put(new KafkaTopicPartition("topic", 1), 31L);
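For completeness, a sketch of how those offsets plug into a consumer on the Flink 1.3 API (broker address, group id, and topic are placeholders):

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
    import org.apache.flink.streaming.connectors.kafka.internals.KafkaTopicPartition;
    import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

    public class SpecificOffsets {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
            props.setProperty("group.id", "example-group");           // placeholder

            Map<KafkaTopicPartition, Long> specificStartOffsets = new HashMap<>();
            specificStartOffsets.put(new KafkaTopicPartition("topic", 0), 23L);
            specificStartOffsets.put(new KafkaTopicPartition("topic", 1), 31L);

            FlinkKafkaConsumer010<String> consumer =
                new FlinkKafkaConsumer010<>("topic", new SimpleStringSchema(), props);
            // Available from Flink 1.3 on; defined in FlinkKafkaConsumerBase.
            consumer.setStartFromSpecificOffsets(specificStartOffsets);

            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.addSource(consumer).print();
            env.execute("start from specific offsets");
        }
    }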

Re: "Unable to find registrar for hdfs" on Flink cluster

2017-08-30 Thread P. Ramanjaneya Reddy
Thank you Aljoscha. With the above steps, the Beam wordcount quickstart program works. When running on the actual Beam source tree I get the following error.

    root1@master:~/Projects/beam/examples/java$ git branch
      master
    * release-2.0.0   ==> beam source code

Re: Classloader issue with UDF's in DataStreamSource

2017-08-30 Thread Edward
In case anyone else runs into this, here's what I discovered: For whatever reason, the classloader used by org.apache.flink.api.java.typeutils.TypeExtractor did not have access to the classes in my udf.jar file. However, if I changed my KeyedDeserializationSchema implementation to use standard
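For context, this is roughly the shape of a KeyedDeserializationSchema (a generic sketch, not the poster's code; the String payload and parsing are placeholders). Its getProducedType() is where the TypeExtractor, and hence the classloader question above, can surface when the produced type lives in a user-code jar:

    import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.streaming.util.serialization.KeyedDeserializationSchema;

    public class StringKeyedSchema implements KeyedDeserializationSchema<String> {

        @Override
        public String deserialize(byte[] messageKey, byte[] message,
                                  String topic, int partition, long offset) {
            return new String(message); // placeholder parsing
        }

        @Override
        public boolean isEndOfStream(String nextElement) {
            return false; // never treat a record as end-of-stream
        }

        @Override
        public TypeInformation<String> getProducedType() {
            // For custom classes from a user jar, the TypeInformation created
            // here must be resolvable by the classloader in use.
            return BasicTypeInfo.STRING_TYPE_INFO;
        }
    }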

Re: Union limit

2017-08-30 Thread Fabian Hueske
Hi b0c1, This is a limitation in Flink's optimizer. Internally, all binary unions are merged into a single n-ary union, and the optimizer restricts the number of inputs for an operator to 64. You can work around this limitation with an identity mapper which prevents the union operators from
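A sketch of that workaround (the helper name and the every-60 threshold are illustrative): insert an identity map periodically so that no merged union ends up with more than 64 inputs.

    import java.util.List;
    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.DataSet;

    public class UnionHelper {
        // Unions many DataSets, breaking the chain with an identity mapper
        // every 60 inputs so the merged n-ary union stays under 64 inputs.
        public static <T> DataSet<T> unionAll(List<DataSet<T>> inputs) {
            DataSet<T> result = inputs.get(0);
            for (int i = 1; i < inputs.size(); i++) {
                result = result.union(inputs.get(i));
                if (i % 60 == 0) {
                    // The identity map is a separate operator, so unions on
                    // either side of it are not merged together by the optimizer.
                    result = result
                        .map(new MapFunction<T, T>() {
                            @Override
                            public T map(T value) {
                                return value;
                            }
                        })
                        .returns(result.getType());
                }
            }
            return result;
        }
    }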

BlobCache and its functioning

2017-08-30 Thread Federico D'Ambrosio
Hi, I have a rather simple Flink job which has a KinesisConsumer as a source and an HBase table as sink, in which I write using writeOutputFormat. I'm running it on a local machine with a single taskmanager (2 slots, 2G). The KinesisConsumer works fine and the connection to the HBase table gets

Modify field topics (KafkaConsumer) during runtime

2017-08-30 Thread Jose Miguel Tejedor Fernandez
Hi, I am using Flink version 1.3.1. I am wondering whether it is possible to add/delete topics in a FlinkKafkaConsumer during the execution of a job? Otherwise, I guess I need to cancel the job and redeploy a new one. Cheers BR

Re: EOFException related to memory segments during run of Beam pipeline on Flink

2017-08-30 Thread Fabian Hueske
Hi Reinier, this is in fact a bug that you stumbled upon. In general, Flink works very well with larger data sets and little memory and gracefully spills data to disk. The problem in your case is caused by a wrapped exception. Internally, Flink uses an EOFException to signal that the memory pool

EOFException related to memory segments during run of Beam pipeline on Flink

2017-08-30 Thread Reinier Kip
Hi all, I’ve been running a Beam pipeline on Flink. Depending on the dataset size and the heap memory configuration of the jobmanager and taskmanager, I may run into an EOFException, which causes the job to fail. You will find the stacktrace near the bottom of this post (data censored). I

Re: Flink session on Yarn - ClassNotFoundException

2017-08-30 Thread Federico D'Ambrosio
Hi, What is your "hadoop version" output? I'm asking because you said your hadoop distribution is in /usr/hdp so it looks like you're using Hortonworks HDP, just like myself. So, this would be a third party distribution and you'd need to build Flink from source according to this:

Re: Flink session on Yarn - ClassNotFoundException

2017-08-30 Thread albert
Hi Chesnay, Thanks for your reply. I did download the binaries matching my Hadoop version (2.7); that's why I was wondering if the issue had something to do with the exact Hadoop version Flink is compiled against, or if there might be things missing in my environment.

Re: Off heap memory issue

2017-08-30 Thread Robert Metzger
Hi Javier, I'm not aware of such issues with Flink, but if you could give us some more details on your setup, I might get some more ideas on what to look for. Are you using the RocksDBStateBackend? (RocksDB does some JNI allocations that could potentially leak memory.) Also, are you passing
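For anyone checking this on their own job, a minimal sketch of enabling the RocksDB backend (requires the flink-statebackend-rocksdb dependency; the checkpoint URI and the job are placeholders):

    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class RocksDbBackendExample {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // RocksDB keeps state in native memory via JNI, i.e. off the JVM heap,
            // which is why it is a prime suspect for off-heap memory growth.
            env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints"));

            env.fromElements(1, 2, 3).print(); // placeholder job
            env.execute("rocksdb backend example");
        }
    }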

Re: Elasticsearch Sink - Error

2017-08-30 Thread Fabian Hueske
That's correct, Flavio. The issue has been reported as https://issues.apache.org/jira/browse/FLINK-7386. Best, Fabian 2017-08-30 9:21 GMT+02:00 Flavio Pompermaier: > I also had problems with ES 5.4.3 and I had to modify the connector > code...I fear that the code is

Re: Twitter example

2017-08-30 Thread Fabian Hueske
Hi, print() writes the data to the out files of the TaskManagers. So you need to go to the machine that runs the TM and check its out file which is located in the log folder. Best, Fabian 2017-08-29 23:53 GMT+02:00 Krishnanand Khambadkone : > I am trying to run the

Re: Elasticsearch Sink - Error

2017-08-30 Thread Flavio Pompermaier
I also had problems with ES 5.4.3 and I had to modify the connector code... I fear that the code is compatible only up to ES 5.2 or thereabouts. On Wed, Aug 30, 2017 at 5:40 AM, Raj Kumar wrote: > Hi, > I am using elasticsearch 5.4.3 version in my flink project (flink