Why does a running JobManager need slots?

2018-02-04 Thread mingleizhang
I found some code in Flink that does not make sense to me, in the classes below.


JobMasterGateway.java has an offerSlots method, documented as offering the given 
slots to the job manager. I was wondering why a running JobManager should need 
slots.
TaskExecutor.java has an offerSlotsToJobManager method, which offers slots 
to the JobManager.


Both of the above confuse me. I only know that a running task needs slots, which 
are provided by a TaskManager. Can anyone tell me what it means for a JobManager 
to need slots?


Thanks in advance.
Rice.



 

Re: How does BucketingSink generate a SUCCESS file when a directory is finished

2018-02-04 Thread xiaobin yan
Hi ,

I have tested it. There are some small problems: when a checkpoint is 
finished, the name of the file changes, and the success file is written 
before the checkpoint.

Best,
Ben

> On 1 Feb 2018, at 8:06 PM, Kien Truong  wrote:
> 
> Hi,
> 
> I did not actually test this, but I think with Flink 1.4 you can extend 
> BucketingSink and override the invoke method to access the watermark.
> Pseudo code:
> invoke(IN value, SinkFunction.Context context) {
>    long currentWatermark = context.watermark();
>    int taskIndex = getRuntimeContext().getIndexOfThisSubtask();
>    if (taskIndex == 0 && currentWatermark - lastSuccessWatermark > 1 hour) {
>        write _SUCCESS;
>        lastSuccessWatermark = currentWatermark rounded down to 1 hour;
>    }
>    invoke(value);
> }
> 
> Regards,
> Kien
> On 1/31/2018 5:54 PM, xiaobin yan wrote:
>> Hi:
>> 
>> I think so too! But I have a question: when should I add this logic to 
>> BucketingSink? And which instance executes this logic, ensuring it is 
>> executed only once rather than by every parallel instance of the sink?
>> 
>> Best,
>> Ben
>> 
>>> On 31 Jan 2018, at 5:58 PM, Hung wrote:
>>> 
>>> It depends on how you partition your files. In my case I write one file per
>>> hour, so I'm sure that the file is ready after that hour period, in
>>> processing time. Here, ready to read means the file contains all the data
>>> for that hour period.
>>> 
>>> If the downstream runs in a batch way, you may want to ensure the file is
>>> ready. In this case, ready to read can mean that all the data before the
>>> watermark has arrived. You could take the BucketingSink and implement this
>>> logic there, maybe waiting until the watermark reaches the end of the period.
>>> 
>>> Best,
>>> 
>>> Sendoh
>>> 
>>> 
>>> 
>>> --
>>> Sent from: 
>>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ 
>>> 
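Kien's pseudocode above can be modeled without any Flink dependency. The sketch below (class and method names are my own, and the one-hour threshold is illustrative) captures just the watermark-gated decision of when subtask 0 should write the _SUCCESS marker:

```java
// Hypothetical plain-Java model of the watermark-gated _SUCCESS logic from
// the pseudocode above. No Flink dependency; SuccessMarkerGate and
// shouldWriteSuccess are illustrative names.
class SuccessMarkerGate {
    private static final long ONE_HOUR_MS = 60L * 60L * 1000L;
    private long lastSuccessWatermark = 0L;

    // Returns true when the _SUCCESS marker should be written: only on
    // subtask 0, and only once the watermark has advanced more than one
    // hour past the last marker.
    boolean shouldWriteSuccess(int taskIndex, long currentWatermark) {
        if (taskIndex == 0 && currentWatermark - lastSuccessWatermark > ONE_HOUR_MS) {
            // round the watermark down to the start of its hour
            lastSuccessWatermark = (currentWatermark / ONE_HOUR_MS) * ONE_HOUR_MS;
            return true;
        }
        return false;
    }
}
```

Restricting the write to one subtask index matches Jürgen's concern that the marker must be written once, not by every parallel instance.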



Checkpoint is not triggering as per configuration

2018-02-04 Thread syed
Hi,
I am new to the Flink world and trying to understand it. Currently, I am using
Flink 1.3.2 on a small cluster of 4 nodes.
I have configured the checkpoint directory on HDFS and run the streaming word
count example with my own custom input file of 63M entries.
I enabled checkpointing every second: env.enableCheckpointing(1000)

The problem I am facing is that a checkpoint is triggered only once, after 1
second, but no checkpoints happen afterwards. I ran the application for more
than 5 minutes, but the checkpoint history shows only 1 checkpoint, which was
successful. I don't know why checkpoints are not triggered every second.
Please suggest what might be wrong.
Thanks in anticipation.
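For reference, a minimal untested sketch of the checkpoint settings (Flink DataStream API) that commonly interact with checkpoint frequency; a stuck in-flight checkpoint or a long minimum pause can prevent new ones from being triggered:

```java
// Sketch only: checkpoint settings that affect how often checkpoints
// are actually triggered.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

env.enableCheckpointing(1000); // ask for a checkpoint every 1000 ms

// Only one checkpoint may be in flight at a time by default; a checkpoint
// that never completes blocks all later ones.
env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);

// A minimum pause between checkpoints also delays the next trigger.
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(500);
```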






Re: Getting Key from keyBy() in ProcessFunction

2018-02-04 Thread Ken Krugler
Hi Jürgen,

That makes sense to me.

Anyone from the Flink team want to comment on (a) if there is a way to get the 
current key in the timer callback without using an explicit ValueState that’s 
maintained in the processElement() method, and (b) if not, whether that could 
be added to the context?

Thanks,

— Ken


> On Feb 4, 2018, at 6:14 AM, Jürgen Thomann wrote:
> 
> Hi Ken,
> 
> thanks for your answer. You're right and I'm doing it already that way. I 
> just hoped that I could avoid the ValueState (I'm using a MapState as well 
> already, which does not store the key) and get the key from the provided 
> Context of the ProcessFunction. This would avoid having the ValueState and 
> setting it in the processElement just to know the key in the onTimer 
> function. 
> In the current approach I have to either check the ValueState for every 
> element to see if the key is already set, or just set it again every time 
> the processElement method is invoked.
> 
> Best,
> Jürgen
> 
> On 02.02.2018 18:37, Ken Krugler wrote:
>> Hi Jürgen,
>> 
>>> On Feb 2, 2018, at 6:24 AM, Jürgen Thomann wrote:
>>> 
>>> Hi,
>>> 
>>> I'm currently using a ProcessFunction after a keyBy() and can't find a way 
>>> to get the key.
>> 
>> Doesn’t your keyBy() take a field (position, or name) to use as the key?
>> 
>> So then that same field contains the key in the 
>> ProcessFunction.processElement(in, …) parameter, yes?
>> 
>>> I'm currently storing it in a ValueState within processElement
>> 
>> If you’re using a ValueState, then there’s one of those for each unique key, 
>> not one for the operation.
>> 
>> I.e. the ValueState for key = “one” is separate from the ValueState for key 
>> = “two”.
>> 
>> You typically store the key in the state so it’s accessible in the onTimer 
>> method.
>> 
>>> and set it all the time, so that I can access it in onTimer(). Is there a 
>>> better way to get the key? We are using Flink 1.3 at the moment.
>> 
>> The ValueState (what you used in processElement) that you’re accessing in 
>> the onTimer() method is also scoped by the current key.
>> 
>> So assuming you stored the key in the state inside of your processElement() 
>> call, then you should have everything you need.
>> 
>> — Ken
>> 
>> PS - Check out 
>> https://www.slideshare.net/dataArtisans/apache-flink-training-datastream-api-processfunction
>>  
>> 
--
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
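The scoping behavior Ken describes — keyed ValueState giving each key its own slot, so the key stored in processElement() is the one visible later in onTimer() for that same key — can be modeled in plain Java. No Flink dependency; all names below are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of Flink's keyed ValueState scoping (illustrative only):
// the runtime sets the current key before each callback, and state reads
// and writes are implicitly scoped to that key.
class KeyedStateModel {
    private final Map<String, String> keyState = new HashMap<>(); // one "ValueState" per key
    private String currentKey; // set by the "runtime" before each callback

    void setCurrentKey(String key) { this.currentKey = key; }

    // processElement(): remember the current key in per-key state
    void processElement(String value) { keyState.put(currentKey, currentKey); }

    // onTimer(): reads the state slot of the key whose timer fired
    String onTimer() { return keyState.get(currentKey); }
}
```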


Global window keyBy

2018-02-04 Thread miki haiat
I'm using a trigger and a GUID in order to key the stream.

I have some trouble understanding how to clear the window.


   - Does FIRE_AND_PURGE in the trigger remove only the keyed data?

If fire and purge removes all the data, then I need to implement it more
like this example:

https://github.com/dataArtisans/flink-training-exercises/blob/master/src/main/java/com/dataartisans/flinktraining/exercises/datastream_java/windows/DrivingSegments.java

An Evictor is used to clear the data by timestamp, but how can I
clear the data by key as well?


Thanks,

Miki
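On the FIRE_AND_PURGE question: as I understand it, purging clears only the window pane of the key/window whose trigger fired, not the panes of other keys. A plain-Java model of that per-key behavior (no Flink; class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative model: one window pane of elements per key, where
// fireAndPurge emits and clears only the pane of the given key.
class KeyedWindowPanes {
    private final Map<String, List<String>> panes = new HashMap<>();

    void add(String key, String element) {
        panes.computeIfAbsent(key, k -> new ArrayList<>()).add(element);
    }

    // FIRE_AND_PURGE for one key: emit that key's pane and clear it;
    // other keys' panes are untouched.
    List<String> fireAndPurge(String key) {
        List<String> emitted = panes.remove(key);
        return emitted == null ? Collections.emptyList() : emitted;
    }

    int paneSize(String key) {
        return panes.getOrDefault(key, Collections.emptyList()).size();
    }
}
```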


Re: Getting Key from keyBy() in ProcessFunction

2018-02-04 Thread Jürgen Thomann

Hi Ken,

thanks for your answer. You're right and I'm doing it already that way. 
I just hoped that I could avoid the ValueState (I'm using a MapState as 
well already, which does not store the key) and get the key from the 
provided Context of the ProcessFunction. This would avoid having the 
ValueState and setting it in the processElement just to know the key in 
the onTimer function.


In the current approach I have to either check the ValueState for every 
element to see if the key is already set, or just set it again every time 
the processElement method is invoked.


Best,
Jürgen


On 02.02.2018 18:37, Ken Krugler wrote:

Hi Jürgen,

On Feb 2, 2018, at 6:24 AM, Jürgen Thomann wrote:


Hi,

I'm currently using a ProcessFunction after a keyBy() and can't find 
a way to get the key.


Doesn’t your keyBy() take a field (position, or name) to use as the key?

So then that same field contains the key in the 
ProcessFunction.processElement(in, …) parameter, yes?



I'm currently storing it in a ValueState within processElement


If you’re using a ValueState, then there’s one of those for each 
unique key, not one for the operation.


I.e. the ValueState for key = “one” is separate from the ValueState 
for key = “two”.


You typically store the key in the state so it’s accessible in the 
onTimer method.


and set it all the time, so that I can access it in onTimer(). Is 
there a better way to get the key? We are using Flink 1.3 at the moment.


The ValueState (what you used in processElement) that you’re accessing 
in the onTimer() method is also scoped by the current key.


So assuming you stored the key in the state inside of your 
processElement() call, then you should have everything you need.


— Ken

PS - Check out 
https://www.slideshare.net/dataArtisans/apache-flink-training-datastream-api-processfunction


--
Ken Krugler
http://www.scaleunlimited.com 
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr