co-location groups vs slot sharing groups

2018-02-20 Thread Deepak Sharma
Hi, I have a few questions regarding slot sharing and co-location: The slotSharingGroup(String name) function sets the slot sharing group; is there a function to set the co-location group? Does setting a co-location group exclude tasks of other groups from the co-located group's task slots?

Re: Regarding BucketingSink

2018-02-20 Thread Mu Kong
Hi Aljoscha, Thanks for confirming the fact that Flink doesn't clean up pending files. Is it safe to clean up (remove) all the pending files after a cancel (with or without a savepoint) or a failover? If we do that, will we lose some data? Thanks! Best, Mu On Wed, Feb 21, 2018 at 5:27 AM, Vishal

Re: SQL materialized upsert tables

2018-02-20 Thread Elias Levy
[ Adding the list back in, as this clarifies my question ] On Tue, Feb 20, 2018 at 3:42 PM, Darshan Singh wrote: > I am no expert in Flink but I will try my best. Issue you mentioned will > be with all streaming systems even with Kafka KTable I use them a lot for >

SQL materialized upsert tables

2018-02-20 Thread Elias Levy
I noticed that there has been significant work on the SQL / Table subsystem and decided to evaluate it for one of our use cases. The use case requires the joining of two streams, which can be considered a stream of table upserts. Critically, when joining the streams, we only want to join against the

Re: Need to understand the execution model of the Flink

2018-02-20 Thread Darshan Singh
Are there any plans for this in the future? I could look at the plans, but without these stats I am a bit lost on what to look for, such as what the pain points are. I can see some very obvious things but not much else with these plans. My question is: is there a guide or document which describes what your plans

Re: Stopping a kafka consumer gracefully (no losing of inflight events, StoppableFunction)

2018-02-20 Thread Christophe Jolif
Hmm, I did not realize that. When upgrading a job (consuming from Kafka), I was planning to cancel it with a savepoint and then start it back from the savepoint. But this savepoint thing was giving me the apparently false feeling I would not lose anything? My understanding was that maybe I would

Re: Stopping a kafka consumer gracefully (no losing of inflight events, StoppableFunction)

2018-02-20 Thread Bart Kastermans
Thanks for the reply; is there a FLIP for this? - bart On Mon, Feb 19, 2018, at 5:50 PM, Till Rohrmann wrote: > Hi Bart, > > you're right that Flink currently does not support a graceful stop > mechanism for the Kafka source. The community already has a good > idea how to solve it in the

Re: Regarding BucketingSink

2018-02-20 Thread Vishal Santoshi
Sorry, but I just wanted to confirm: does the "at-least-once" delivery assertion hold true if there is a dangling pending file? On Mon, Feb 19, 2018 at 2:21 PM, Vishal Santoshi wrote: > That is fine, as long as Flink assures at-least-once semantics? > > If the contents of a

Re: Get which key groups are assigned to an operator

2018-02-20 Thread Gerard Garcia
You are right that the best solution would probably be to be able to use different state backends for different operators; I hope it gets implemented at some point. Meanwhile I'll take a look at the methods in org.apache.flink.runtime.state.KeyGroupRangeAssignment, maybe I can find a workaround

Secured communication with Zookeeper without Kerberos

2018-02-20 Thread Edward Rojas
Hi, I'm setting up a Flink cluster on Kubernetes with High Availability using ZooKeeper. It's working well so far without the security configuration for ZooKeeper. I need to have secured communication between Flink and ZooKeeper but I'd like to avoid the need to set up a Kerberos server only for

Re: Retrieving name of last external checkpoint directory

2018-02-20 Thread Aljoscha Krettek
Hi, I think there is currently no easy way of doing this. Things that come to mind are: - looking at the JM log - polling the JM REST interface for completed externalised checkpoints The good news is that Flink 1.5 will rework how externalised checkpoints work a bit: basically, all

Re: Correlation between number of operators and Job manager memory requirements

2018-02-20 Thread Shailesh Jain
Hi Till, Thanks for your reply. >> My suggestion would be to split the different patterns up and run them with in different jobs. I'm not able to understand how splitting up the jobs based on patterns would be more efficient than based on the key. The total number of operators would still be

Re: Get which key groups are assigned to an operator

2018-02-20 Thread Stefan Richter
Hi, ok, now I understand your goal a bit better. I would still like to point out that it may take a bit more than it looks like. Just to name one example, you probably also want to support asynchronous snapshots, which is most likely difficult when using just a hashmap. I think the proper

Save the date: ApacheCon North America, September 24-27 in Montréal

2018-02-20 Thread Rich Bowen
Dear Apache Enthusiast, (You’re receiving this message because you’re subscribed to a user@ or dev@ list of one or more Apache Software Foundation projects.) We’re pleased to announce the upcoming ApacheCon [1] in Montréal, September 24-27. This event is all about you — the Apache project

Re: Get which key groups are assigned to an operator

2018-02-20 Thread Gerard Garcia
Hi Stefan, thanks. Yes, we are also using keyed state in other operators; the problem is that serialization is quite expensive and in some of them we would prefer to avoid it by storing the state in memory (for our use case one specific operator with in-memory state gives at least a 30% throughput

Re: Get which key groups are assigned to an operator

2018-02-20 Thread Stefan Richter
Hi, from what I read, I get the impression that you attempt to implement your own "keyed state" with a hashmap? Why not use the keyed state that is already provided by Flink and gives you efficient rescaling etc. out of the box? Please see [1] for the details. [1]

Re: Flink Batch job: All slots for groupReduce task scheduled on same machine

2018-02-20 Thread Aljoscha Krettek
Hmm, that seems weird. Could you please also post the code of the complete program? Only the parts that build the program graph should be enough. And maybe a screenshot of the complete graph from the dashboard. -- Aljoscha > On 20. Feb 2018, at 11:36, Aneesha Kaushal

Get which key groups are assigned to an operator

2018-02-20 Thread gerardg
Hello, To improve performance we have "keyed state" in the operator's memory; basically we keep a Map which contains the state for each of the keys. The problem comes when we want to restore the state after a failure or after rescaling the operator. What we are doing is sending the concatenation
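
A rough sketch of the key-group arithmetic this thread is about, in case it helps to see why a hand-rolled map is awkward to restore after rescaling. This is not Flink code: the function names are hypothetical, the range and index formulas are assumed to mirror what org.apache.flink.runtime.state.KeyGroupRangeAssignment does, and zlib.crc32 is only a stand-in for the hash Flink actually applies to a key.

```python
import zlib


def key_group_for_key(key: str, max_parallelism: int) -> int:
    """Map a key to a key group. crc32 is a stand-in hash (assumption)."""
    return zlib.crc32(key.encode("utf-8")) % max_parallelism


def key_group_range(max_parallelism: int, parallelism: int, operator_index: int) -> range:
    """Contiguous key groups owned by one parallel operator instance."""
    start = (operator_index * max_parallelism + parallelism - 1) // parallelism
    end = ((operator_index + 1) * max_parallelism - 1) // parallelism
    return range(start, end + 1)


def operator_index_for_key_group(max_parallelism: int, parallelism: int, key_group: int) -> int:
    """Inverse lookup: which operator instance owns a given key group."""
    return key_group * parallelism // max_parallelism


def entries_for_operator(state: dict, max_parallelism: int, parallelism: int, operator_index: int) -> dict:
    """Keep only the map entries whose key group belongs to this instance."""
    owned = key_group_range(max_parallelism, parallelism, operator_index)
    return {k: v for k, v in state.items() if key_group_for_key(k, max_parallelism) in owned}
```

After a change in parallelism, each instance would call something like entries_for_operator on the restored map with the new parallelism, so that every key lands on exactly one instance.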

Re: Managed State Custom Serializer with Avro

2018-02-20 Thread Arvid Heise
Hi Aljoscha, I opened https://issues.apache.org/jira/browse/FLINK-8715 for the RocksDB issue with pointers to the code. Let me know if you need more details. Best, Arvid On Tue, Feb 20, 2018 at 1:04 PM, Arvid Heise wrote: > Hi Aljoscha, hi Till, > > @Aljoscha, the new

Re: Managed State Custom Serializer with Avro

2018-02-20 Thread Arvid Heise
Hi Aljoscha, hi Till, @Aljoscha, the new AvroSerializer is almost what I wanted except that it does not use the schema of the snapshot while reading. In fact, this version will fail with the same error as before when a field is added or removed.

Re: Managed State Custom Serializer with Avro

2018-02-20 Thread Till Rohrmann
A small addition: currently savepoints are always full checkpoints. Thus, you should not have the problem when calling cancel with savepoint. Concerning 2, I think the idea was to only check for compatibility at restore time. The check will either say it's compatible or not. If it's not

Re: Managed State Custom Serializer with Avro

2018-02-20 Thread Aljoscha Krettek
Hi Arvid, Did you check out the most recent AvroSerializer code? https://github.com/apache/flink/blob/f3a2197a23524048200ae2b4712d6ed833208124/flink-formats/flink-avro/src/main/java/org/apache/flink/formats/avro/typeutils/AvroSerializer.java#L185

Re: Correlation between number of operators and Job manager memory requirements

2018-02-20 Thread Till Rohrmann
Hi Shailesh, I fear that given your job topology, it is not that surprising that things break. The problem is that you might have M x N CEP operators concurrently active. This means that they have to keep their state in memory. Given 3.5 GB isn't that much if you have more than 300 CEP NFAs

Re: flink read hdfs file error

2018-02-20 Thread Aljoscha Krettek
Hi, What is the exact problem you're seeing? Could you please post your POM file and also a listing of all the files in your user jar? Best, Aljoscha > On 15. Feb 2018, at 11:12, Or Sher wrote: > > Hi, > Did you ever get to solve this issue? I'm getting the same error. >

Re: Flink Batch job: All slots for groupReduce task scheduled on same machine

2018-02-20 Thread Aneesha Kaushal
2018-02-20, 14:31:58 2018-02-20, 14:52:28 20m 30s Map (Map at com.rfk.dataplatform.batch.jobs.topk.TopkOperations$$anonfun$4.apply(TopkOperations.scala:128)) 10.8 GB 130,639,359 10.8 GB 130,639,359 16 00016000 FINISHED Start Time End Time Duration

Re: Managed State Custom Serializer with Avro

2018-02-20 Thread Arvid Heise
Hi guys, I just wanted to write about that topic on my own. The FF talk of Tzu-Li also gave me the impression that by just using AvroSerializer, we get some kind of state evolution for free.

Re: sink with BucketingSink to S3 files override

2018-02-20 Thread Aljoscha Krettek
Hi, I'm afraid the BucketingSink does not work well with S3 because of the eventually-consistent nature of S3. As you noticed in the code snippet you sent, the sink relies on the fact that directory listings are accurate, which is not the case with S3. The Flink community is aware of this

Re: Correlation between number of operators and Job manager memory requirements

2018-02-20 Thread Shailesh Jain
Hi Till, When I'm submitting one big job, both JM and TM (sometimes just JM) are crashing at the time of initialization itself (i.e. not all operators switch to RUNNING) with OOM. The number of threads on TM goes to almost 1000. But when I'm submitting multiple jobs, job submission is completed.

Re: CoProcess() VS union.Process() & Timers in them

2018-02-20 Thread m@xi
OK man! Thanks a lot. To tell you the truth the documentation did not explain it in a convincing way to consider it an important/potential operator to use in my applications. Thanks for mentioning. Best, Max -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: CoProcess() VS union.Process() & Timers in them

2018-02-20 Thread Fabian Hueske
I don't think that the mapping is that sophisticated. I'd assume it is a bit simpler and just keeps one local pipeline (the one with the same subtask index) which will run in the same slot (unless explicitly configured differently). TBH, I would not rely on this behavior. rescale() is rather an

Re: Flink Batch job: All slots for groupReduce task scheduled on same machine

2018-02-20 Thread Aljoscha Krettek
Could you please send a screenshot? > On 20. Feb 2018, at 11:09, Aneesha Kaushal > wrote: > > Hello Aljoscha > > I looked into the Subtasks section on Flink Dashboard, for the above two > tasks. > > Thanks > Aneesha > >> On 20-Feb-2018, at 3:32 PM, Aljoscha

Re: Manipulating Processing elements of Network Buffers

2018-02-20 Thread m@xi
Hi Till! Thanks a lot for your useful reply. So now I get it. I should not manipulate or disturb the network buffer contents, as this will trigger other problematic behaviours. On the other hand, the price of buffering the data in my operator first and e.g. sorting them first based on some

Re: Flink Batch job: All slots for groupReduce task scheduled on same machine

2018-02-20 Thread Aneesha Kaushal
Hello Aljoscha, I looked into the Subtasks section on Flink Dashboard, for the above two tasks. Thanks Aneesha > On 20-Feb-2018, at 3:32 PM, Aljoscha Krettek wrote: > > Hi, > > Could you please also post where/how you see which tasks are mapped to which >

Re: Flink Batch job: All slots for groupReduce task scheduled on same machine

2018-02-20 Thread Aljoscha Krettek
Hi, Could you please also post where/how you see which tasks are mapped to which slots/TaskManagers? Best, Aljoscha > On 20. Feb 2018, at 10:50, Aneesha Kaushal > wrote: > > Hello, > > I have a Flink batch job, where I am grouping a dataset on some keys, and

Re: CoProcess() VS union.Process() & Timers in them

2018-02-20 Thread m@xi
Hey Fabian! Thanks for the comprehensive replies. Now I understand those concepts properly. Regarding .rescale(), it does not receive any arguments. Thus, I assume that the way it does the shuffling from operator A to operator B instances is a black box for the programmer and probably has to do

Re: # of active session windows of a streaming job

2018-02-20 Thread Fabian Hueske
Hi Dongwon Kim, That's an interesting question. I don't have a solution blueprint for you, but a few ideas that should help to solve the problem. I would start with a separate job first and later try to integrate it with the other job. You could implement a Trigger that fires when a new window

Flink Batch job: All slots for groupReduce task scheduled on same machine

2018-02-20 Thread Aneesha Kaushal
Hello, I have a Flink batch job, where I am grouping a dataset on some keys, and then using group reduce. Parallelism is set to 16. The slots for the Map task are distributed across all the machines, but for GroupReduce all the slots are being assigned to the same machine. Can you help me

Re: Iterating over state entries

2018-02-20 Thread Fabian Hueske
Hi Ken, That's correct. The iterator will become invalid once you leave the method. If you are only interested in a few specific entries, then index access is probably the most efficient approach. Best, Fabian 2018-02-20 1:03 GMT+01:00 Ken Krugler : > Hi Till, > >

Re: Need to understand the execution model of the Flink

2018-02-20 Thread Fabian Hueske
No, there is no size or cardinality estimation happening at the moment. Best, Fabian 2018-02-19 21:56 GMT+01:00 Darshan Singh : > Thanks, is there a metric or other way to know how much space each > task/job is taking? Does the execution plan have these details? > > Thanks >

# of active session windows of a streaming job

2018-02-20 Thread Dongwon Kim
Hi, It could be a totally stupid question but I currently have no idea how to get the number of active session windows from a running job. Our traffic trajectory application (which handles up to 10,000 tps) uses event-time session window on KeyedStream (keyed by userID). Should I write

Re: A "per operator instance" window all ?

2018-02-20 Thread Julien
Hi Xingcan, Ken and Till, OK, thank you. It is clear. I have various options then: * the one suggested by Ken where I can find a way to build a key that will be well distributed (1 key per task) o it relies on the way Flink partitions the key, but it will do the job * or I can