Re: StreamingFileSink cannot get AWS S3 credentials

2019-01-15 Thread Vinay Patil
Hi, Can someone please help on this issue. We have even tried to set fs.s3a.impl in core-site.xml, still its not working. Regards, Vinay Patil On Fri, Jan 11, 2019 at 5:03 PM Taher Koitawala [via Apache Flink User Mailing List archive.] wrote: > Hi All, > We have implemented S3 sink

Re: There is no classloader in TableFactoryUtil using ConnectTableDescriptor to registerTableSource

2019-01-15 Thread Joshua Fan
Hi Hequn Thanks. Now I know what you mean. To use tableEnv.registerTableSource instead of using StreamTableDescriptor.registerTableSource. Yes, it is a good solution. If the StreamTableDescriptor itself can use a user-defined classloader, it is better. Thank you. Yours sincerely Joshua On Wed,

Re: There is no classloader in TableFactoryUtil using ConnectTableDescriptor to registerTableSource

2019-01-15 Thread Joshua Fan
Hi Hequn Yes, the TableFactoryService has a proper method. As I use StreamTableDescriptor to connect to Kafka, StreamTableDescriptor actually uses ConnectTableDescriptor which calls TableFactoryUtil to do service load, and TableFactoryUtil does not use a user defined classloader, so I can not use

Re: Streaming Checkpoint - Could not materialize checkpoint Exception

2019-01-15 Thread Congxian Qiu
Hi, Sohi You can check out doc[1][2] to find out the answer. [1] https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/state/checkpointing.html#enabling-and-configuring-checkpointing [2] https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/restart_strategies.html

Re: One TaskManager per node or multiple TaskManager per node

2019-01-15 Thread Ethan Li
It makes sense. Thank you very much, Jamie! > On Jan 15, 2019, at 12:48 PM, Jamie Grier wrote: > > Ethan, it depends on what you mean by easy ;) It just depends a lot on what > infra tools you already have in place. On bare metal it's probably safe to > say there is no "easy" way. You

Unable to override metric format for Prometheus Reporter

2019-01-15 Thread Kaustubh Rudrawar
Hi, I'm setting up Flink 1.7.0 on a Kubernetes cluster and am seeing some unexpected behavior when using the Prometheus Reporter. With the following setup in flink-conf.yaml: metrics.reporters: prometheus metrics.reporter.prometheus.class:

Re: One TaskManager per node or multiple TaskManager per node

2019-01-15 Thread Jamie Grier
Ethan, it depends on what you mean by easy ;) It just depends a lot on what infra tools you already have in place. On bare metal it's probably safe to say there is no "easy" way. You need a lot of automation to make it easy. Bastien, IMO, #1 applies to batch jobs as well. On Tue, Jan 15, 2019

Re: Parallelism questions

2019-01-15 Thread Till Rohrmann
I'm not aware of someone working on this feature right now. On Tue, Jan 15, 2019 at 3:22 PM Alexandru Gutan wrote: > Thats great news! > > Are there any plans to expose it in the upcoming Flink release? > > On Tue, 15 Jan 2019 at 12:59, Till Rohrmann wrote: > >> Hi Alexandru, >> >> at the

Re: Get watermark metric as a delta of current time

2019-01-15 Thread Andrey Zagrebin
Hi Cristian, Have you tried to extend AbstractUdfStreamOperator and override processWatermark? This method should deliver the increasing watermark. Do you use processing or event time of records? Best, Andrey On Mon, Jan 14, 2019 at 11:03 PM Cristian wrote: > Hello. > > Flink emits watermark

Re: ElasticSearch RestClient throws NoSuchMethodError due to shade mechanism

2019-01-15 Thread Rong Rong
Hi Henry, I was not sure if this is the suggested way. but from what I understand of the pom file in elasticsearch5, you are allowed to change the sub version of the org.ealisticsearch.client via manually override using -Delasticsearch.version=5.x.x during maven build progress if you are using

Re: Get the savepointPath of a particular savepoint

2019-01-15 Thread anaray
Dawid , Gary, Got it . Thanks -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Duplicate record writes to sink after job failure

2019-01-15 Thread Andrey Zagrebin
Hi Chris, there is no way to provide "exactly-once" and avoid duplicates without transactions available since Kafka 0.11. The only way I could think of is building a custom deduplication step on consumer side. E.g. using in memory cache with eviction or some other temporary storage to keep set of

Re: One TaskManager per node or multiple TaskManager per node

2019-01-15 Thread bastien dine
Hello Jamie, Does #1 apply to batch jobs too ? Regards, -- Bastien DINE Data Architect / Software Engineer / Sysadmin bastiendine.io Le lun. 14 janv. 2019 à 20:39, Jamie Grier a écrit : > There are a lot of different ways to deploy Flink. It would be easier to > answer

Re: Flink on Kubernetes - Hostname resolution between job/tasks-managers

2019-01-15 Thread bastien dine
Nevermind.. Problem already discussed in thread : Flink 1.7 jobmanager tries to lookup taskmanager by its hostname in k8s environment" -- Bastien DINE Data Architect / Software Engineer / Sysadmin bastiendine.io Le mar. 15 janv. 2019 à 15:16, bastien dine a écrit : > Hello,

Re: Parallelism questions

2019-01-15 Thread Alexandru Gutan
Thats great news! Are there any plans to expose it in the upcoming Flink release? On Tue, 15 Jan 2019 at 12:59, Till Rohrmann wrote: > Hi Alexandru, > > at the moment `/jobs/:jobid/rescaling` will always change the parallelism > for all operators. The maximum is the maximum parallelism which

Re: Re: How can I make HTTP requests from an Apache Flink program?

2019-01-15 Thread Alexandru Gutan
Hi Jacopo, Check this: https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/operators/asyncio.html Best, Alex On Tue, 15 Jan 2019 at 13:57, wrote: > Hi, > > > > I have a flink program which needs to process many messages and part of > this processing is to process the data

Flink on Kubernetes - Hostname resolution between job/tasks-managers

2019-01-15 Thread bastien dine
Hello, I am trying to install Flink on Kube, it's almost working.. I am using the kube files on flink 1.7.1 doc My cluster is starting well, my 2 tasksmanagers are registering successfully to job manager On webUI, i see them : akka.tcp://flink@dev-flink-taskmanager-3717639837-gvwh4

RE: Re: How can I make HTTP requests from an Apache Flink program?

2019-01-15 Thread jacopo.gobbi
Hi, I have a flink program which needs to process many messages and part of this processing is to process the data using an external web service using http calls. Example: val myStream: DataStream[String] myStream .map(new MyProcessingFunction) .map(new MyWebServiceHttpClient) .print Any

Re: Subtask much slower than the others when creating checkpoints

2019-01-15 Thread Till Rohrmann
Hi Pasquale, if you configured a checkpoint directory, then the MemoryStateBackend will also write the checkpoint data to disk in order to persist it. Cheers, Till On Tue, Jan 15, 2019 at 1:08 PM Pasquale Vazzana wrote: > I can send you some debug logs and the execution plan, can I use your >

Re: How can I make HTTP requests from an Apache Flink program?

2019-01-15 Thread miki haiat
Can you share more which use case are you trying to implement ? On Tue, Jan 15, 2019 at 2:02 PM wrote: > Hi all, > > > > I was wondering if anybody has any recommendation over making HTTP > requests from Flink to another service. > > On the long term we are looking for a solution that is both

Re: Parallelism questions

2019-01-15 Thread Till Rohrmann
Hi Alexandru, at the moment `/jobs/:jobid/rescaling` will always change the parallelism for all operators. The maximum is the maximum parallelism which you have defined for an operator. I agree that it should also be possible to rescale an individual operator. There internal functionality is

Re: Bug in RocksDB timer service

2019-01-15 Thread Gyula Fóra
Thanks for the tip Stefan, you are probably right that this might be related to a custom change. We have a change that deletes every state that hasn't been registered in the open method and maybe it accidentally delates the timer service as well, need to check. Thanks! Gyula On Tue, Jan 15, 2019

Re: There is no classloader in TableFactoryUtil using ConnectTableDescriptor to registerTableSource

2019-01-15 Thread Hequn Cheng
Hi Joshua, Could you use `TableFactoryService` directly to register TableSource? The code looks like: final TableSource tableSource = > TableFactoryService.find(StreamTableSourceFactory.class, > streamTableDescriptor, classloader) > .createStreamTableSource(propertiesMap); >

RE: Subtask much slower than the others when creating checkpoints

2019-01-15 Thread Pasquale Vazzana
I can send you some debug logs and the execution plan, can I use your personal email? There might be sensitive info in the logs. Incoming and Outgoing records are fairly distributed across subtasks, with similar but alternate loads, when the checkpoint is triggered, the load drops to nearly

Re: Parallelism questions

2019-01-15 Thread Alexandru Gutan
Thanks Till! To execute the above (using Kubernetes), one would enter the running JobManager service and execute it? The following REST API call does the same */jobs/:jobid/rescaling*? I assume it changes the base parallelism, but what it will do if I had already set the parallelism of my

How can I make HTTP requests from an Apache Flink program?

2019-01-15 Thread jacopo.gobbi
Hi all, I was wondering if anybody has any recommendation over making HTTP requests from Flink to another service. On the long term we are looking for a solution that is both performing and integrates well with our flink program. Does it matter the library we use? Do we need a special connector

[SURVEY] Custom RocksDB branch

2019-01-15 Thread Andrey Zagrebin
Dear Flink users and developers! I start this discussion to collect feedback about maintaining a custom RocksDB branch for Flink, if anyone sees any problems with this approach. Are there people who already uses a custom RocksDB client build with RocksDB state backend? As you might already know,

Re: Subtask much slower than the others when creating checkpoints

2019-01-15 Thread Bruno Aranda
Hi Stefan, Thanks for your suggestion. As you may see from the original screenshot, the actual state is small, and even smaller than other some of the other subtasks. We are consuming from a Kafka topic with 600 partitions, with parallelism set to around 20. Our metrics show that all the subtasks

Re: Subtask much slower than the others when creating checkpoints

2019-01-15 Thread Stefan Richter
Hi, I have seen a few cases where for certain jobs a small imbalance in the state partition assignment did cascade into a larger imbalance of the job. If your max parallelism mod parallelism is not 0, it means that some tasks have one partition more than others. Again, depending on how much

Re: Subtask much slower than the others when creating checkpoints

2019-01-15 Thread Bruno Aranda
Hi, Just an update from our side. We couldn't find anything specific in the logs and the problem is not easy reproducible. This week, the system is running fine, which makes me suspicious as well of some resourcing issue. But so far, we haven't been able to find the reason though we have

There is no classloader in TableFactoryUtil using ConnectTableDescriptor to registerTableSource

2019-01-15 Thread Joshua Fan
Hi As known, TableFactoryService has many methods to find a suitable service to load. Some of them use a user defined classloader, the others just uses the default classloader. Now I use ConnectTableDescriptor to registerTableSource in the environment, which uses TableFactoryUtil to load

ElasticSearch RestClient throws NoSuchMethodError due to shade mechanism

2019-01-15 Thread 徐涛
Hi All, I use the following code try to build a RestClient org.elasticsearch.client.RestClient.builder( new HttpHost(xxx, xxx,"http") ).build() but when in running time, a NoSuchMethodError throws out, I think the reason is: There are two RestClient classes, one

Re: Subtask much slower than the others when creating checkpoints

2019-01-15 Thread Till Rohrmann
Same here Pasquale, the logs on DEBUG log level could be helpful. My guess would be that the respective tasks are overloaded or there is some resource congestion (network, disk, etc). You should see in the web UI the number of incoming and outgoing events. It would be good to check that the

Re: Parallelism questions

2019-01-15 Thread Till Rohrmann
Hi Alexandru, you can use the `modify` command `bin/flink modify --parallelism ` to modify the parallelism of a job. At the moment, it is implemented as first taking a savepoint, stopping the job and then redeploying the job with the changed parallelism and resuming from the savepoint. Cheers,

Re: Recovery problem 2 of 2 in Flink 1.6.3

2019-01-15 Thread Till Rohrmann
Hi John, this looks indeed strange. How many concurrent operators do you have which write state to s3? After the cancellation, the JobManager should keep the slots for some time until they are freed. This is the normal behaviour and can be controlled with `slot.idle.timeout`. Could you maybe

Re: Recovery problem 1 of 2 in Flink 1.6.3

2019-01-15 Thread Till Rohrmann
Hi John, this is definitely not how Flink should behave in this situation and could indicate a bug. From the logs I couldn't figure out the problem. Would it be possible to obtain for the TMs and JM the full logs with DEBUG log level? This would help me to further debug the problem. Cheers, Till

Re: Bug in RocksDB timer service

2019-01-15 Thread Stefan Richter
Hi, I have never seen this before. I would assume to see this exception because the write batch is flushed and contained a write against a column family that does not exist (anymore). However, we initialize everything relevant in RocksDBCachingPriorityQueueSet as final (CF handle) and never

Bug in RocksDB timer service

2019-01-15 Thread Gyula Fóra
Hi! Lately I seem to be hitting a bug in the rocksdb timer service. This happens mostly at checkpoints but sometimes even at watermark: java.lang.RuntimeException: Exception occurred while processing valve output watermark: at

Re: NoMatchingTableFactoryException when test flink sql with kafka in flink 1.7

2019-01-15 Thread Joshua Fan
Hi Zhenghua Yes, the topic is polluted somehow. After I create a new topic to consume, It is OK now. Yours sincerely Joshua On Tue, Jan 15, 2019 at 4:28 PM Zhenghua Gao wrote: > May be you're generating non-standard JSON record. > > > > -- > Sent from: >

Re: NoMatchingTableFactoryException when test flink sql with kafka in flink 1.7

2019-01-15 Thread Zhenghua Gao
May be you're generating non-standard JSON record. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Streaming Checkpoint - Could not materialize checkpoint Exception

2019-01-15 Thread sohimankotia
Yes. File got deleted . 2019-01-15 10:40:41,360 INFO FSNamesystem.audit: allowed=true ugi=hdfs (auth:SIMPLE) ip=/192.168.3.184 cmd=delete src=/pipeline/job/checkpoints/e9a08c0661a6c31b5af540cf352e1265/chk-470/5fb3a899-8c0f-45f6-a847-42cbb71e6d19 dst=nullperm=null

Re: Streaming Checkpoint - Could not materialize checkpoint Exception

2019-01-15 Thread Congxian Qiu
Hi, Sohi Seems like the checkpoint file `hdfs:/pipeline/job/checkpoints/e9a08c0661a6c31b5af540cf352e1265/chk-470/5fb3a899-8c0f-45f6-a847-42cbb71e6d19` did not exist for some reason, you can check the life cycle of this file from hdfs audit log and find out why the file did not exist. maybe the