Exactly Once Guarantees with StreamingFileSink to S3

2019-02-05 Thread Kaustubh Rudrawar
Hi, I'm trying to understand the exactly-once semantics of the StreamingFileSink with S3 in Flink 1.7.1 and am a bit confused about how it guarantees exactly once under a very specific failure scenario. For simplicity, let's say we will roll the current part file on checkpoint (and only on

JDBCAppendTableSink on Data stream

2019-02-05 Thread Chirag Dewan
Hi, In the documentation, the JDBC sink is mentioned as a sink on the Table API/stream.  Can I use the same sink with a DataStream as well? My use case is to read the data from Kafka and send the data to Postgres. I was also hoping to achieve exactly-once since these will mainly be idempotent
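Idempotent writes are the usual way to get effectively-once results out of an at-least-once JDBC sink: if replayed records overwrite the same row, duplicates are harmless. A sketch of such an upsert for Postgres (the table and column names here are hypothetical, not from the original message):

```sql
-- Replaying the same (event_id, payload) pair leaves the table unchanged,
-- so retries after a failure do not produce duplicates.
INSERT INTO events (event_id, payload)
VALUES (?, ?)
ON CONFLICT (event_id) DO UPDATE SET payload = EXCLUDED.payload;
```

The key design point is that the primary key must be derived deterministically from the record, so a replay targets the same row.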

1.7.1 and hadoop pre 2.7

2019-02-05 Thread Vishal Santoshi
Is there a workaround for https://issues.apache.org/jira/browse/FLINK-10203 or do we have to be on Hadoop 2.7?

What async Scala HTTP client do people use with Flink async functions?

2019-02-05 Thread William Saar
Hi, Can anyone recommend a Scala http client that generates responses as futures and works with Flink's child-first class loading for use in Flink async functions? I have tried the sttp client with OkHttp and Akka backends and both seem to run into class loading problems, but I'm guessing I'm
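When a user-code HTTP client clashes with Flink's child-first class loading, one common workaround is to force the conflicting packages to resolve parent-first via the cluster configuration. A hedged sketch (the package prefixes listed are examples for the clients mentioned above, not a recommendation):

```yaml
# flink-conf.yaml (Flink 1.7) -- class loading knobs.
# child-first is the default; the additional patterns below tell Flink to
# load the named package prefixes from the framework classpath instead of
# the user jar, which can resolve duplicate-class conflicts.
classloader.resolve-order: child-first
classloader.parent-first-patterns.additional: okhttp3.;akka.
```

Whether this helps depends on whether the conflicting classes are actually present on Flink's own classpath; otherwise shading the client inside the user jar is the alternative.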

H-A Deployment : Job / task manager configuration

2019-02-05 Thread bastien dine
Hello everyone, I would like to know what exactly I need to configure on my job / task managers for an H-A deployment. The documentation ( https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/jobmanager_high_availability.html) is not really clear about this. The conf/masters need to be on
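For reference, the pieces the linked page asks for are a `conf/masters` file plus a handful of `flink-conf.yaml` keys. A minimal ZooKeeper-based sketch (hostnames and the storage path are placeholders):

```yaml
# conf/masters -- one (potential) JobManager host:port per line
# localhost:8081

# flink-conf.yaml -- minimal ZooKeeper HA settings (Flink 1.7)
high-availability: zookeeper
high-availability.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
high-availability.storageDir: hdfs:///flink/ha/
```

Both files generally need to be identical on every JobManager host; TaskManagers pick up the leading JobManager through ZooKeeper, so they need the same `high-availability.*` keys but no `masters` entry.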

Kafka Sinking from DataSet

2019-02-05 Thread Jonny Graham
Hi, I'm using HadoopInputs.readHadoopFile() to read a Parquet file, which gives me a DataSource which (as far as I can see) is basically a DataSet. I want to write data from this source into Kafka, but the Kafka sink only works on a DataStream. There's no easy way to convert my DataSet to a

Running JobManager as Deployment instead of Job

2019-02-05 Thread Sergey Belikov
Hi, my team is currently experimenting with Flink running in Kubernetes (job cluster setup). And we found out that with the JobManager being deployed as a "Job" we can't simply update certain values in the job's yaml, e.g. spec.template.spec.containers.image (
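Since a Kubernetes `Job` spec is largely immutable once created, one option is to run the same job-cluster entry point under a `Deployment`, whose pod template can be updated in place. A hedged sketch (image name, job class, and labels are placeholders):

```yaml
# Sketch: JobManager of a Flink job cluster as a Deployment instead of a Job,
# so that spec.template.spec.containers[].image can be changed with a rolling update.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-jobmanager
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flink-jobmanager
  template:
    metadata:
      labels:
        app: flink-jobmanager
    spec:
      containers:
        - name: jobmanager
          image: example/flink-job:1.7.1
          args: ["job-cluster", "--job-classname", "com.example.MyJob"]
```

The trade-off is that a Deployment restarts the pod on exit, so a job that finishes normally will be resubmitted unless that is handled separately.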

late element and expired state

2019-02-05 Thread Aggarwal, Ajay
Hello, I have some questions regarding best practices for dealing with ever-expanding state with KeyBy(). In my input stream I will continue to see new keys, and I am using keyed state. How do I keep the total state size bounded? After reading the Flink documentation and some blogs I am planning to

Re: Endorsed lib in Flink

2019-02-05 Thread Till Rohrmann
Hi Chirag, have you tried excluding the log4j1 jars and adding the log4j 1.2 bridge [1] so that all logging goes through log4j2? [1] https://logging.apache.org/log4j/2.0/log4j-1.2-api/ Cheers, Till On Tue, Feb 5, 2019 at 1:04 PM Chirag Dewan wrote: > Hi, > > Is there some sort of endorsed
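In Maven terms, Till's suggestion amounts to excluding the transitive log4j1 jar and adding the bridge artifact. A sketch (the third-party coordinates are placeholders; `log4j-1.2-api` is the real bridge artifact):

```xml
<!-- pom.xml sketch: drop the 3PP's bundled log4j1 binding and route
     log4j 1.x API calls through log4j2 via the bridge. -->
<dependency>
  <groupId>com.example</groupId>
  <artifactId>third-party-lib</artifactId>
  <version>1.0</version>
  <exclusions>
    <exclusion>
      <groupId>log4j</groupId>
      <artifactId>log4j</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>org.apache.logging.log4j</groupId>
  <artifactId>log4j-1.2-api</artifactId>
  <version>2.11.1</version>
</dependency>
```

With the binding excluded, SLF4J and legacy log4j 1.x calls both end up in the single log4j2 backend.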

Re: Dataset column statistics

2019-02-05 Thread Flavio Pompermaier
Any news on this Kurt? Could you share some insight about how you implemented it? I'm debating whether to run multiple jobs or whether the analysis could be performed in a single big job. Best, Flavio On Tue, Dec 18, 2018 at 3:26 AM Kurt Young wrote: > Hi, > > We have implemented ANALYZE TABLE in our

Endorsed lib in Flink

2019-02-05 Thread Chirag Dewan
Hi, Is there some sort of endorsed lib in Flink yet? A brief about my use case : I am using a 3PP in my job which uses SLF4J as logging facade but has included a log4j1 binding in its source code. And I am trying to use log4j2 for my Flink application. I wired Flink to use log4j2 - added all

Re: How to add caching to async function?

2019-02-05 Thread William Saar
Ah, thanks, missed it when I only looked at the slides. Yes, have heard deadlocks are a problem with iterations but hoped it had been fixed. Pity, had been hoping to replace an external service with the Flink job, but will keep the service around for the caching, - Original Message -

Re: How to load multiple same-format files with single batch job?

2019-02-05 Thread françois lacombe
Thank you Fabian, That's good, I'll go for a custom File input stream. All the best François On Mon, Feb 4, 2019 at 12:10, Fabian Hueske wrote: > Hi, > > The files will be read in a streaming fashion. > Typically files are broken down into processing splits that are > distributed to tasks

Re: AssertionError: mismatched type $5 TIMESTAMP(3)

2019-02-05 Thread Chris Miller
Sorry to reply to my own post but I wasn't able to figure out a solution for this. Does anyone have any suggestions I could try? -- Original Message -- From: "Chris Miller" To: "Timo Walther" ; "user" Sent: 29/01/2019 10:06:47 Subject: Re: AssertionError: mismatched type $5

Re: How to add caching to async function?

2019-02-05 Thread Lasse Nedergaard
Hi William, No, iterations aren't the solution, as you can (will) end up in a deadlock. We concluded that storing the results from the external lookup in Kafka and using that data as input to the cache was the only way. Best regards, Lasse Nedergaard > On Feb 5, 2019 at 10:22

Re: How to add caching to async function?

2019-02-05 Thread William Saar
Thanks! Looks like iterations is indeed the way to go for now then... - Original Message - From: "Lasse Nedergaard" To: "Fabian Hueske" Cc: "William Saar", "user" Sent: Mon, 4 Feb 2019 20:20:30 +0100 Subject: Re: How to add caching to async function? Hi William We have created a
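Whichever feeding mechanism ends up populating it (a Kafka topic as Lasse suggests, or direct lookups), the cache inside the async function itself can be as simple as a map with per-entry expiry. A minimal plain-Java sketch (all names are hypothetical, not Flink APIs; checkpointing and sharing across parallel instances are deliberately out of scope):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Minimal TTL cache sketch for use inside an async lookup operator. */
public class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;
        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public TtlCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Stores a value with an expiry ttlMillis from now. */
    public void put(K key, V value) {
        map.put(key, new Entry<>(value, System.currentTimeMillis() + ttlMillis));
    }

    /** Returns the cached value, or null if absent or expired (expired entries are evicted). */
    public V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) {
            return null;
        }
        if (System.currentTimeMillis() >= e.expiresAtMillis) {
            map.remove(key);
            return null;
        }
        return e.value;
    }
}
```

On a cache miss the async function fires the external request (or reads the Kafka-fed side input) and `put`s the result; on a hit it completes the result future immediately without touching the external service.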