Re: Create a file in parquet format

2018-09-12 Thread jose farfan
Thanks! On Tue, 11 Sep 2018 at 15:36, Gary Yao wrote: > Hi Jose, > > You can find an example here: > > > https://github.com/apache/flink/blob/1a94c2094b8045a717a92e232f9891b23120e0f2/flink-formats/flink-parquet/src/test/java/org/apache/flink/formats/parquet/avro/ParquetStreamingFileSinkITCase.ja

回复:Flink application down due to RpcTimeout exception

2018-09-12 Thread Zhijiang(wangzhijiang999)
Hi, 1. This rpc timeout occurs during JobMaster deploying task into TaskExecutor. The rpc thread in TaskExecutor does not respond the deployment message within 10 seconds. There are many possibilities to cause this issue, such as network problem between TaskExecutor and JobMaster or other time

Flink application down due to RpcTimeout exception

2018-09-12 Thread 徐涛
Hi All, I`m running flink1.6 on yarn,after the program run for a day, the flink program fails on yarn, and the error log is as follows: It seems that it is due to a timeout error. But I have the following questions: 1. In which step the flink components communicate failed?

How to get taskmanager hostname and port on runtime

2018-09-12 Thread 郑舒力
Hello community, Is there a way to get taskmanager hostname and port on runtime? I’d like to implement a kafka metric reporter, reporter should be able to report the taskmanager hostname and port to monitoring system. Thanks.

Re: Question regarding Streaming Resources

2018-09-12 Thread Ken Krugler
Hi Bhaskar, > On 2018/09/12 20:42:22, Ken Krugler wrote: >> Hi Bhaskar, >> >> I assume you don’t have 1000 streams, but rather one (keyed) stream with >> 1000 different key values, yes? >> >> If so, then this one stream is physically partitioned based on the >> parallelism of the operator fo

Logging metrics from within Elasticsearch ActionRequestFailureHandler

2018-09-12 Thread Averell
Good day everyone, I'm writing to Elasticsearch, and I need to count the number of records that the process failed to write. The problem that I'm facing is there is no RunningContext that I can access from within o.a.f.s.c.elasticsearch.ActionRequestFailureHandler's onFailure method so that I can

Re: Question regarding Streaming Resources

2018-09-12 Thread bhaskar . ebay77
On 2018/09/12 20:42:22, Ken Krugler wrote: > Hi Bhaskar, > > I assume you don’t have 1000 streams, but rather one (keyed) stream with 1000 > different key values, yes? > > If so, then this one stream is physically partitioned based on the > parallelism of the operator following the keyBy()

Test harness for validating proper checkpointing of custom SourceFunction

2018-09-12 Thread Ken Krugler
Hi all, We’re using the (Keyed)(One|Two)InputStreamOperatorTestHarness classes to test checkpointing of some custom functions. But in looking through the Flink source, I didn’t see anything comparable for testing a custom SourceFunction (which implements the ListCheckpointed interface). What’

Re: Question regarding Streaming Resources

2018-09-12 Thread Ken Krugler
Hi Bhaskar, I assume you don’t have 1000 streams, but rather one (keyed) stream with 1000 different key values, yes? If so, then this one stream is physically partitioned based on the parallelism of the operator following the keyBy(), not per unique key. The most common per-key “resource” is t

Broadcast managed state

2018-09-12 Thread Deepya
I am using Flink to process Streams. There is a MapValued Managed state that I am using to store custom state. I notice that the state is not shared across the Task Managers. Is there a way to Broadcast Managed state? Thanks, Deepya. -- Sent from: http://apache-flink-user-mailing-list-archive

Re: Question regarding Streaming Resources

2018-09-12 Thread bhaskar . ebay77
On 2018/09/12 16:55:09, bhaskar.eba...@gmail.com wrote: > Hi > > I have created a KeyedStream with state as explained below > For example i have created 1000 streams, out of which 50% of streams data is > going to come once in 8 hours. Will the resources of these under utilized > streams

Question regarding Streaming Resources

2018-09-12 Thread bhaskar . ebay77
Hi I have created a KeyedStream with state as explained below For example i have created 1000 streams, out of which 50% of streams data is going to come once in 8 hours. Will the resources of these under utilized streams are idle for that duration? Or Flink internal task manager is having some

Weird behaviour after change sources in a job.

2018-09-12 Thread Juan Gentile
Hello! We have found a weird issue while replacing the source in one of our Flink SQL Jobs. We have a job which was reading from a Kafka topic (with externalize checkpoints) and we needed to change the topic while keeping the same logic for the job/SQL. After we restarted the job, instead of c

Task managers run on separate nodes in a cluster

2018-09-12 Thread Martin Eden
Hi all, We're using Flink 1.3.2 with DCOS / Mesos. We have a 3 node cluster and are running the Flink DCOS package (Flink Mesos framework) configured with 3 Task Managers. Our goal is to run each of them on separate hosts for better load balancing but it seems the task managers end up running on

Custom metrics in Watermark Assigner

2018-09-12 Thread Oleksandr Nitavskyi
Hello guys, In our custom AssignerWithPunctuatedWatermarks we want to have custom metrics. Unfortunately operator’s wrapping this assigner interface is hidden from user API. What do you think if we add some optional api in order to let user possibility to register custom metrics in the watermar

Re: How does flink read a DataSet?

2018-09-12 Thread Taher Koitawala
Thanks a lot! For your explanation i am much clearer. However for my reference can you give me links of some documentations for flink Dataset and DataStream which clearly and in detail explain all the internals right from reading to processing etc etc. The flink landing page doesn't have in depth i

Re: How does flink read a DataSet?

2018-09-12 Thread Fabian Hueske
The InputFormat interface is similar to Hadoop MapReduce's. Data is emitted record-by-record, but InputFormats can read larger blocks for better efficiency (e.g., for ORC or Parquet files). In general, Flink tries to push data forward as early as possible and avoids collecting records in memory unl

Re: How does flink read a DataSet?

2018-09-12 Thread Taher Koitawala
So flink TMs reads one line at a time from hdfs in parallel and keep filling it in memory and keep passing the records to the next operator? I just want to know how data comes in memory? How it is partition between TMs Is there a documentation i can refer how the reading is done and how data is pus

Re: How does flink read a DataSet?

2018-09-12 Thread Fabian Hueske
Actually, some parts of Flink's batch engine are similar to streaming as well. If the data does not need to be sorted or put into a hash-table, the data is pipelined (like in many relational database systems). For example, if you have a job that joins two inputs with a HashJoin, only the build side

How to clear keyed states periodically?

2018-09-12 Thread Paul Lam
Hi, I’m using MapState to deduplicate some ids and the MapState needs to be truncated periodically. I tried to use ProcessingTimeCallback to call state.clear(), but in this way I can only clear the state for one key, and actually I need a key group level cleanup. So I’m wondering is there any

Re: What is the right way to add classpath?

2018-09-12 Thread bupt_ljy
Hi,Yun Tang Thanks for help. The first option makes the package process heavy, the second will make a change to flink’s lib folder. And the -yt cannot help also, because I need these dependencies before it’s submitted on yarn, and I did use -yt to submit my job and failed. Best, Jiayi Liao

Re: What is the right way to add classpath?

2018-09-12 Thread Yun Tang
Hi Jiayi As far as I know, there exist three ways: 1. Build the fat-application jar with dependencies using maven-shade-plugin or maven-assembly-plugin. 2. Copy the dependency jars to local ${FLINK_HOME}/lib folder. 3. Submit the job with -yt,--yarnship command, please refer to https

Re: Problem with querying state on Flink 1.6.

2018-09-12 Thread Kostas Kloudas
Hi Joe, And it would help a lot if you could share a bit more details about your setup and the code of your job or a minimal example that can reproduce it. Thanks, Kostas > On Sep 12, 2018, at 9:59 AM, Till Rohrmann wrote: > > Hi Joe, > > what is the current problem you are facing? > > Chee

Re: aggregate does not allow RichAggregateFunction ?

2018-09-12 Thread chiggi_dev
Hi Fabian, We came across this issue while working on RichAggregateFunction. Isnt generic state mergeable, similar to ACC merge? What if I need the Flink classLoader in the Aggregate function? Is there anyway I can do that without RuntimeContext? Thanks, Chirag -- Sent from: http://apache

Flink Checkpointing in production

2018-09-12 Thread Ahmad Hassan
Hi All, We need two clarifications for using Flink 1.6.0. We have flink jobs running to handle 100's of tenants with sliding window of 24hrs and slide by 5 minutes. 1) If checkpointing is enabled and flink job crashes in the middle of spitting out results to kafka producer. Then if the job resume

What is the right way to add classpath?

2018-09-12 Thread bupt_ljy
Hi,all My program needs some dependencies before it’s submitted to yarn. Like: ``` stream.filter(new FilterService()).print() env.execute() ``` I use external dependency inFilterService, and the program reports NoClassDefFoundError at org.apache.flink.client.program.PackagedProgram.callM

Re: Problem with querying state on Flink 1.6.

2018-09-12 Thread Till Rohrmann
Hi Joe, what is the current problem you are facing? Cheers, Till On Wed, Sep 12, 2018 at 12:18 AM Joe Olson wrote: > Kostas - Till's advice got me past my first problem. I'm still having > issues with the client side. I've got your example code from [1] in a > github project [2]. > > My proble