Re: Few question about upgrade from 1.4 to 1.5 flink ( some very basic )

2018-06-28 Thread Christophe Jolif
Chesnay, Do you have rough idea of the 1.5.1 timeline? Thanks, -- Christophe On Mon, Jun 25, 2018 at 4:22 PM, Chesnay Schepler wrote: > The watermark issue is know and will be fixed in 1.5.1 > > > On 25.06.2018 15:03, Vishal Santoshi wrote: > > Thank you > > One addition > > I do not see W

Re: FlinkML

2018-04-18 Thread Christophe Jolif
Szymon, The short answer is no. See: http://mail-archives.apache.org/mod_mbox/flink-user/201802.mbox/%3ccaadrtt39ciiec1uzwthzgnbkjxs-_h5yfzowhzph_zbidux...@mail.gmail.com%3E On Mon, Apr 16, 2018 at 11:25 PM, Szymon Szczypiński wrote: > Hi, > > i wonder if there are possibility to build FlinkML

Re: keyBy and parallelism

2018-04-12 Thread Christophe Jolif
lelism as my cluster allow me to do so. Regards, > Sihua Zhou > > On 04/12/2018 15:44,Christophe Jolif > wrote: > > Thanks Chesnay (and others). > > That's what I was figuring out. Now let's go onto the follow up with my > exact use-case. > > I have two streams

Re: keyBy and parallelism

2018-04-12 Thread Christophe Jolif
ince you specify a parallellism of > 16, however 8 of these will not get any data. > > > On 11.04.2018 23:29, Hao Sun wrote: > > From what I learnt, you have to control parallelism your self. You can set > parallelism on operator or set default one through flink-config.yaml. >

keyBy and parallelism

2018-04-11 Thread Christophe Jolif
Hi all, Imagine I have a default parallelism of 16 and I do something like stream.keyBy("something").flatMap() Now let's imagine I have less than 16 keys, maybe 8. How many parallel executions of the flatMap function will I get? 8 because I have 8 keys, or 16 because I have default parallelism

Re: SSL config on Kubernetes - Dynamic IP

2018-04-09 Thread Christophe Jolif
17:48 GMT+02:00 Till Rohrmann : >>> >>>> Hi Edward, >>>> >>>> could you please file a JIRA issue for this problem. It might be as >>>> simple as that the TaskManager's network stack uses the IP instead of the >>>> hos

Re: SSL config on Kubernetes - Dynamic IP

2018-03-27 Thread Christophe Jolif
I suspect this relates to: https://issues.apache.org/jira/browse/FLINK-5030 For which there was a PR at some point but nothing has been done so far. It seems the current code explicitly uses the IP vs Hostname for Netty SSL configuration. Without that I'm really wondering how people are reasonabl

Re: Secure TLS/SSL ElasticSearch connector for current and future connector

2018-03-26 Thread Christophe Jolif
Hi Fritz, I think the High Level Rest Client implementation in this PR: https://github.com/apache/flink/pull/5374 should work. If you don't get the certificate properly available in your Java certs, you might want to redefine the createClient method to do something along those lines to get the con

Re: "dynamic" bucketing sink

2018-03-26 Thread Christophe Jolif
ynamic" feature. Have you looked in to the bucketing sink code? Maybe you > can adapt it to your needs? > > Otherwise it might also make sense to open an issue for it to discuss a > design for it. Maybe other contributors are interested in this feature as > well. > > Regards,

"dynamic" bucketing sink

2018-03-23 Thread Christophe Jolif
Hi all, I'm using the nice topic pattern feature on the KafkaConsumer to read from multiple topics, automatically discovering new topics added into the system. At the end of the processing I'm sinking the result into a Hadoop Filesystem using a BucketingSink. All works great until I get the requ

Re: Stopping a kafka consumer gracefully (no losing of inflight events, StoppableFunction)

2018-02-21 Thread Christophe Jolif
are of this problem and gives you exactly > once guarantees. > > Cheers, > Till > > On Tue, Feb 20, 2018 at 11:51 PM, Christophe Jolif > wrote: > >> Hmm, I did not realize that. >> >> I was planning when upgrading a job (consuming from Kafka) to cancel it &g

Re: Stopping a kafka consumer gracefully (no losing of inflight events, StoppableFunction)

2018-02-20 Thread Christophe Jolif
Hmm, I did not realize that. I was planning when upgrading a job (consuming from Kafka) to cancel it with a savepoint and then start it back from the savedpoint. But this savedpoint thing was giving me the apparently false feeling I would not lose anything? My understanding was that maybe I would

Re: Kafka and parallelism

2018-02-07 Thread Christophe Jolif
chema#deserialize` > method, which exposes information about which topic and partition each > record came from. > > Cheers, > Gordon > > On 7 February 2018 at 9:40:50 AM, Christophe Jolif (cjo...@gmail.com) > wrote: > > Hi Gordon, or anyone else reading this, > >

Re: Kafka and parallelism

2018-02-07 Thread Christophe Jolif
how to do > that. > > Hope this helps! > > Cheers, > Gordon > > [1] https://ci.apache.org/projects/flink/flink-docs- > release-1.4/dev/connectors/kafka.html#kafka-consumers- > topic-and-partition-discovery > > On 3 February 2018 at 6:53:47 PM, Christophe Jolif (cjo...@gm

Re: ML and Stream

2018-02-05 Thread Christophe Jolif
other possible directions. > > Best, Fabian > > [1] https://lists.apache.org/thread.html/eeb80481f3723c160bc923d689416a > 352d6df4aad98fe7424bf33132@%3Cdev.flink.apache.org%3E > [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP- > 23+-+Model+Serving > > 2018

ML and Stream

2018-02-05 Thread Christophe Jolif
Hi all, Sorry, this is me again with another question. Maybe I did not search deep enough, but it seems the FlinkML API is still pure batch. If I read https://cwiki.apache.org/confluence/display/FLINK/FlinkML%3A+Vision+and+Roadmap it seems there was the intend to "exploit the streaming nature of

Re: RocksDB / checkpoint questions

2018-02-05 Thread Christophe Jolif
successful > checkpoint and recover once DFS is back. > > Best, > Stefan > > Am 03.02.2018 um 17:45 schrieb Christophe Jolif : > > Thanks for sharing Kien. Sounds like the logical behavior but good to hear > it is confirmed by your experience. > > -- > Christoph

Re: Kafka and parallelism

2018-02-05 Thread Christophe Jolif
> > [1] https://ci.apache.org/projects/flink/flink-docs- > release-1.4/dev/connectors/kafka.html#kafka-consumers- > topic-and-partition-discovery > > On 3 February 2018 at 6:53:47 PM, Christophe Jolif (cjo...@gmail.com) > wrote: > > Hi, > > If I'm sourcing from

Kafka and parallelism

2018-02-03 Thread Christophe Jolif
Hi, If I'm sourcing from a KafkaConsumer do I have to explicitly set the Flink job parallelism to the number of partions or will it adjust automatically accordingly? In other word if I don't call setParallelism will get 1 or the number of partitions? The reason I'm asking is that I'm listening to

Re: RocksDB / checkpoint questions

2018-02-03 Thread Christophe Jolif
should succeed. >> >> Of course, if you also write to the distributed disk inside your job, >> then your job may crash too, but this is unrelated to the checkpoint >> process. >> >> Best regards, >> Kien >> >> Sent from TypeApp <http://w

RocksDB / checkpoint questions

2018-02-02 Thread Christophe Jolif
ount a new distributed disk? Or will it stop? May I lose data/reprocess things under that condition? -- Christophe Jolif

Re: Flink on K8s job submission best practices

2018-02-01 Thread Christophe Jolif
Hi Maximilian, Coming back on this as we have similar challenges. I was leaning towards 3. But then I read you and figured I might have missed something ;) We agree 3 is not idiomatic and creates a "detached job" but in a lack of a proper solution I can live with that. We also agree there is no

Re: ElasticsearchSink in Flink 1.4.0 with Elasticsearch 5.2+

2018-01-30 Thread Christophe Jolif
1.4-and-1.5-timeline.html > . > > > On 29.01.2018 13:41, Christophe Jolif wrote: > > Thanks a lot. Is there any timeline for 1.5 by the way? > > -- > Christophe > > On Mon, Jan 29, 2018 at 11:36 AM, Tzu-Li (Gordon) Tai > wrote: > >> Hi Christophe, >> &

Re: ElasticsearchSink in Flink 1.4.0 with Elasticsearch 5.2+

2018-01-29 Thread Christophe Jolif
ight now, but I agree that we should have support for ES > 5.3 and Es 6.x for the next minor release 1.5. > > Best, > Fabian > > > 2018-01-26 23:09 GMT+01:00 Christophe Jolif : > >> Ok, I got it "done". I have a PR for ES5.3 (FLINK-7386) just rebasing >>

Re: ElasticsearchSink in Flink 1.4.0 with Elasticsearch 5.2+

2018-01-26 Thread Christophe Jolif
;) Thanks, -- Christophe On Fri, Jan 26, 2018 at 1:46 PM, Christophe Jolif wrote: > Fabien, > > Unfortunately I need more than that :) But this PR is definitely a first > step. > > My real need is Elasticsearch 6.x support through RestHighLevel client. > FYI Elastic has deprecated

Re: ElasticsearchSink in Flink 1.4.0 with Elasticsearch 5.2+

2018-01-25 Thread Christophe Jolif
Hi Fabian, FYI I rebased the branch and tested it and it worked OK on a sample. -- Christophe On Mon, Jan 22, 2018 at 2:53 PM, Fabian Hueske wrote: > Hi Adrian, > > thanks for raising this issue again. > I agree, we should add support for newer ES versions. > I've added 1.5.0 as target release

State backend questions

2018-01-16 Thread Christophe Jolif
Hi all, At first my state should not be "that" big and fit in memory, so FsStateBackend could be a solution for me. However moving forward I envision more features and more users and the state growing. With that in mind RocksDBStateBackend might be the solution. Is there an easy "upgrade" path fr

event ordering

2018-01-09 Thread Christophe Jolif
Hi everyone, Let's imagine I have a stream of events coming a bit like this: { id: "1", value: 1, timestamp: 1 } { id: "2", value: 2, timestamp: 1 } { id: "1", value: 4, timestamp: 3 } { id: "1", value: 5, timestamp: 2 } { id: "2", value: 5, timestamp: 3 } ... As you can see with the non monoto

"keyed" aggregation

2018-01-05 Thread Christophe Jolif
Hi all, I'm sourcing from a Kafka topic, using the key of the Kafka message to key the stream, then doing some aggregation on the keyed stream. Now I want to sink back to a different Kafka topic but re-using the same key. The thing is that my aggregation "lost" the key. Obviously I can make sure