Re: Retrieve written records of a sink after job

2018-02-19 Thread Flavio Pompermaier
Hi Fabian,
thanks for the feedback. Right now I'm doing exactly as you said.
Since I was seeing this as a useful API extension I just proposed this
addition and so I asked for feedbacks.
Of course, it doesn't make much sense if I'm the only one asking for it :)

Best,
Flavio

On Mon, Feb 19, 2018 at 10:31 AM, Fabian Hueske  wrote:

> Hi Flavio,
>
> Not sure if I would add this functionality to the sinks.
> You could also add a MapFunction with a counting Accumulator right before
> each sink.
>
> Best, Fabian
>
>
> 2018-02-14 14:11 GMT+01:00 Flavio Pompermaier :
>
>> So, if I'm not wrong, the right way to do this is using
>> accumulators..what do you think about my proposal to add an easy way to add
>> to a sink an accumulator for the written/outputed records?
>>
>> On Wed, Feb 14, 2018 at 1:08 PM, Chesnay Schepler 
>> wrote:
>>
>>> Technically yes, a subset of metrics is stored in the ExecutionGraph
>>> when the job finishes. (This is for example where the webUI derives the
>>> values from for finished jobs). However these are on the task level, and
>>> will not contain the number of incoming records if your sink is chained to
>>> another operator. Changing this would be a larger endeavor, and tbh i don't
>>> see this happening soon.
>>>
>>> I'm afraid for now you're stuck with the REST API for finished jobs.
>>> (Correction for my previous mail: The metrics REST API cannot be used for
>>> finished jobs)
>>>
>>> Alternatively, if you rather want to work on files/json you can enable
>>> job archiving by configuring the jobmanager.archive.fs.dir directory.
>>> When the job finishes this will contain a big JSON file for each job
>>> containing all responses that the UI would return for finished jobs.
>>>
>>>
>>> On 14.02.2018 12:50, Flavio Pompermaier wrote:
>>>
>>> The problem here is that I don't know the vertex id of the sink..would
>>> it be possible to access the sink info by id?
>>> And couldn't be all those info attached to the JobExecutionResult
>>> (avoiding to set up all the rest connection etc)?
>>>
>>> On Wed, Feb 14, 2018 at 12:44 PM, Chesnay Schepler 
>>> wrote:
>>>
 The only way to access this info from the client is the REST API
 
 or the Metrics REST API
 .



 On 14.02.2018 12:38, Flavio Pompermaier wrote:

 Actually I'd like to get this number from my Java class in order to
 update some external dataset "catalog",
 so I'm asking if there's some programmatic way to access this info
 (from JobExecutionResult for example).

 On Wed, Feb 14, 2018 at 12:25 PM, Chesnay Schepler 
 wrote:

> Do you want to know how many records the sink received, or how many
> the sink wrote to the DB?
> If it's the first you're in luck because we measure that already,
> check out the metrics documentation.
> If it's the latter, then this issue is essentially covered by
> FLINK-7286 which aims at allowing functions
> to modify the numRecordsIn/numRecordsOut counts.
>
>
> On 14.02.2018 12:22, Flavio Pompermaier wrote:
>
> Hi to all,
> I have a (batch) job that writes to 1 or more sinks.
> Is there a way to retrieve, once the job has terminated, the number of
> records written to each sink?
> Is there any better way than than using an accumulator for each sink?
> If that is the only way to do that, the Sink API could be enriched in
> order to automatically create an accumulator when required. E.g.
>
> dataset.output(JDBCOutputFormat.buildJDBCOutputFormat()
> .setDrivername(...)
> .setDBUrl(...)
> .setQuery(...)
> *.addRecordsCountAccumulator("some-name")*
> .finish())
>
> Best,
> Flavio
>
>
>


 --
 Flavio Pompermaier
 Development Department

 OKKAM S.r.l.
 Tel. +(39) 0461 041809 <+39%200461%20041809>



>>>
>>>
>>> --
>>> Flavio Pompermaier
>>> Development Department
>>>
>>> OKKAM S.r.l.
>>> Tel. +(39) 0461 041809 <+39%200461%20041809>
>>>
>>>
>>>
>>
>>
>> --
>> Flavio Pompermaier
>> Development Department
>>
>> OKKAM S.r.l.
>> Tel. +(39) 0461 041809 <+39%200461%20041809>
>>
>
>


-- 
Flavio Pompermaier
Development Department

OKKAM S.r.l.
Tel. +(39) 0461 041809


Re: Retrieve written records of a sink after job

2018-02-19 Thread Fabian Hueske
Hi Flavio,

Not sure if I would add this functionality to the sinks.
You could also add a MapFunction with a counting Accumulator right before
each sink.

Best, Fabian


2018-02-14 14:11 GMT+01:00 Flavio Pompermaier :

> So, if I'm not wrong, the right way to do this is using accumulators..what
> do you think about my proposal to add an easy way to add to a sink an
> accumulator for the written/outputed records?
>
> On Wed, Feb 14, 2018 at 1:08 PM, Chesnay Schepler 
> wrote:
>
>> Technically yes, a subset of metrics is stored in the ExecutionGraph when
>> the job finishes. (This is for example where the webUI derives the values
>> from for finished jobs). However these are on the task level, and will not
>> contain the number of incoming records if your sink is chained to another
>> operator. Changing this would be a larger endeavor, and tbh i don't see
>> this happening soon.
>>
>> I'm afraid for now you're stuck with the REST API for finished jobs.
>> (Correction for my previous mail: The metrics REST API cannot be used for
>> finished jobs)
>>
>> Alternatively, if you rather want to work on files/json you can enable
>> job archiving by configuring the jobmanager.archive.fs.dir directory.
>> When the job finishes this will contain a big JSON file for each job
>> containing all responses that the UI would return for finished jobs.
>>
>>
>> On 14.02.2018 12:50, Flavio Pompermaier wrote:
>>
>> The problem here is that I don't know the vertex id of the sink..would it
>> be possible to access the sink info by id?
>> And couldn't be all those info attached to the JobExecutionResult
>> (avoiding to set up all the rest connection etc)?
>>
>> On Wed, Feb 14, 2018 at 12:44 PM, Chesnay Schepler 
>> wrote:
>>
>>> The only way to access this info from the client is the REST API
>>> 
>>> or the Metrics REST API
>>> .
>>>
>>>
>>>
>>> On 14.02.2018 12:38, Flavio Pompermaier wrote:
>>>
>>> Actually I'd like to get this number from my Java class in order to
>>> update some external dataset "catalog",
>>> so I'm asking if there's some programmatic way to access this info
>>> (from JobExecutionResult for example).
>>>
>>> On Wed, Feb 14, 2018 at 12:25 PM, Chesnay Schepler 
>>> wrote:
>>>
 Do you want to know how many records the sink received, or how many the
 sink wrote to the DB?
 If it's the first you're in luck because we measure that already, check
 out the metrics documentation.
 If it's the latter, then this issue is essentially covered by
 FLINK-7286 which aims at allowing functions
 to modify the numRecordsIn/numRecordsOut counts.


 On 14.02.2018 12:22, Flavio Pompermaier wrote:

 Hi to all,
 I have a (batch) job that writes to 1 or more sinks.
 Is there a way to retrieve, once the job has terminated, the number of
 records written to each sink?
 Is there any better way than than using an accumulator for each sink?
 If that is the only way to do that, the Sink API could be enriched in
 order to automatically create an accumulator when required. E.g.

 dataset.output(JDBCOutputFormat.buildJDBCOutputFormat()
 .setDrivername(...)
 .setDBUrl(...)
 .setQuery(...)
 *.addRecordsCountAccumulator("some-name")*
 .finish())

 Best,
 Flavio



>>>
>>>
>>> --
>>> Flavio Pompermaier
>>> Development Department
>>>
>>> OKKAM S.r.l.
>>> Tel. +(39) 0461 041809 <+39%200461%20041809>
>>>
>>>
>>>
>>
>>
>> --
>> Flavio Pompermaier
>> Development Department
>>
>> OKKAM S.r.l.
>> Tel. +(39) 0461 041809 <+39%200461%20041809>
>>
>>
>>
>
>
> --
> Flavio Pompermaier
> Development Department
>
> OKKAM S.r.l.
> Tel. +(39) 0461 041809 <+39%200461%20041809>
>


Re: Retrieve written records of a sink after job

2018-02-14 Thread Flavio Pompermaier
So, if I'm not wrong, the right way to do this is using accumulators..what
do you think about my proposal to add an easy way to add to a sink an
accumulator for the written/outputed records?

On Wed, Feb 14, 2018 at 1:08 PM, Chesnay Schepler 
wrote:

> Technically yes, a subset of metrics is stored in the ExecutionGraph when
> the job finishes. (This is for example where the webUI derives the values
> from for finished jobs). However these are on the task level, and will not
> contain the number of incoming records if your sink is chained to another
> operator. Changing this would be a larger endeavor, and tbh i don't see
> this happening soon.
>
> I'm afraid for now you're stuck with the REST API for finished jobs.
> (Correction for my previous mail: The metrics REST API cannot be used for
> finished jobs)
>
> Alternatively, if you rather want to work on files/json you can enable job
> archiving by configuring the jobmanager.archive.fs.dir directory. When
> the job finishes this will contain a big JSON file for each job containing
> all responses that the UI would return for finished jobs.
>
>
> On 14.02.2018 12:50, Flavio Pompermaier wrote:
>
> The problem here is that I don't know the vertex id of the sink..would it
> be possible to access the sink info by id?
> And couldn't be all those info attached to the JobExecutionResult
> (avoiding to set up all the rest connection etc)?
>
> On Wed, Feb 14, 2018 at 12:44 PM, Chesnay Schepler 
> wrote:
>
>> The only way to access this info from the client is the REST API
>> 
>> or the Metrics REST API
>> .
>>
>>
>>
>> On 14.02.2018 12:38, Flavio Pompermaier wrote:
>>
>> Actually I'd like to get this number from my Java class in order to
>> update some external dataset "catalog",
>> so I'm asking if there's some programmatic way to access this info
>> (from JobExecutionResult for example).
>>
>> On Wed, Feb 14, 2018 at 12:25 PM, Chesnay Schepler 
>> wrote:
>>
>>> Do you want to know how many records the sink received, or how many the
>>> sink wrote to the DB?
>>> If it's the first you're in luck because we measure that already, check
>>> out the metrics documentation.
>>> If it's the latter, then this issue is essentially covered by FLINK-7286
>>> which aims at allowing functions
>>> to modify the numRecordsIn/numRecordsOut counts.
>>>
>>>
>>> On 14.02.2018 12:22, Flavio Pompermaier wrote:
>>>
>>> Hi to all,
>>> I have a (batch) job that writes to 1 or more sinks.
>>> Is there a way to retrieve, once the job has terminated, the number of
>>> records written to each sink?
>>> Is there any better way than than using an accumulator for each sink?
>>> If that is the only way to do that, the Sink API could be enriched in
>>> order to automatically create an accumulator when required. E.g.
>>>
>>> dataset.output(JDBCOutputFormat.buildJDBCOutputFormat()
>>> .setDrivername(...)
>>> .setDBUrl(...)
>>> .setQuery(...)
>>> *.addRecordsCountAccumulator("some-name")*
>>> .finish())
>>>
>>> Best,
>>> Flavio
>>>
>>>
>>>
>>
>>
>> --
>> Flavio Pompermaier
>> Development Department
>>
>> OKKAM S.r.l.
>> Tel. +(39) 0461 041809 <+39%200461%20041809>
>>
>>
>>
>
>
> --
> Flavio Pompermaier
> Development Department
>
> OKKAM S.r.l.
> Tel. +(39) 0461 041809 <+39%200461%20041809>
>
>
>


-- 
Flavio Pompermaier
Development Department

OKKAM S.r.l.
Tel. +(39) 0461 041809 <+39%200461%20041809>


Re: Retrieve written records of a sink after job

2018-02-14 Thread Chesnay Schepler
Technically yes, a subset of metrics is stored in the ExecutionGraph 
when the job finishes. (This is for example where the webUI derives the 
values from for finished jobs). However these are on the task level, and 
will not contain the number of incoming records if your sink is chained 
to another operator. Changing this would be a larger endeavor, and tbh i 
don't see this happening soon.


I'm afraid for now you're stuck with the REST API for finished jobs. 
(Correction for my previous mail: The metrics REST API cannot be used 
for finished jobs)


Alternatively, if you rather want to work on files/json you can enable 
job archiving by configuring the |jobmanager.archive.fs.dir| directory. 
When the job finishes this will contain a big JSON file for each job 
containing all responses that the UI would return for finished jobs.


On 14.02.2018 12:50, Flavio Pompermaier wrote:
The problem here is that I don't know the vertex id of the sink..would 
it be possible to access the sink info by id?
And couldn't be all those info attached to the JobExecutionResult 
(avoiding to set up all the rest connection etc)?


On Wed, Feb 14, 2018 at 12:44 PM, Chesnay Schepler > wrote:


The only way to access this info from the client is the REST API


or the Metrics REST API

.



On 14.02.2018 12:38, Flavio Pompermaier wrote:

Actually I'd like to get this number from my Java class in order
to update some external dataset "catalog",
so I'm asking if there's some programmatic way to access this
info (from JobExecutionResult for example).

On Wed, Feb 14, 2018 at 12:25 PM, Chesnay Schepler
> wrote:

Do you want to know how many records the sink received, or
how many the sink wrote to the DB?
If it's the first you're in luck because we measure that
already, check out the metrics documentation.
If it's the latter, then this issue is essentially covered by
FLINK-7286 which aims at allowing functions
to modify the numRecordsIn/numRecordsOut counts.


On 14.02.2018 12:22, Flavio Pompermaier wrote:

Hi to all,
I have a (batch) job that writes to 1 or more sinks.
Is there a way to retrieve, once the job has terminated, the
number of records written to each sink?
Is there any better way than than using an accumulator for
each sink?
If that is the only way to do that, the Sink API could be
enriched in order to automatically create an accumulator
when required. E.g.

dataset.output(JDBCOutputFormat.buildJDBCOutputFormat()
.setDrivername(...)
.setDBUrl(...)
.setQuery(...)
*.addRecordsCountAccumulator("some-name")*
.finish())

Best,
Flavio






-- 
Flavio Pompermaier

Development Department

OKKAM S.r.l.
Tel. +(39) 0461 041809 






--
Flavio Pompermaier
Development Department

OKKAM S.r.l.
Tel. +(39) 0461 041809





Re: Retrieve written records of a sink after job

2018-02-14 Thread Flavio Pompermaier
The problem here is that I don't know the vertex id of the sink..would it
be possible to access the sink info by id?
And couldn't be all those info attached to the JobExecutionResult (avoiding
to set up all the rest connection etc)?

On Wed, Feb 14, 2018 at 12:44 PM, Chesnay Schepler 
wrote:

> The only way to access this info from the client is the REST API
> 
> or the Metrics REST API
> 
> .
>
>
> On 14.02.2018 12:38, Flavio Pompermaier wrote:
>
> Actually I'd like to get this number from my Java class in order to update
> some external dataset "catalog",
> so I'm asking if there's some programmatic way to access this info
> (from JobExecutionResult for example).
>
> On Wed, Feb 14, 2018 at 12:25 PM, Chesnay Schepler 
> wrote:
>
>> Do you want to know how many records the sink received, or how many the
>> sink wrote to the DB?
>> If it's the first you're in luck because we measure that already, check
>> out the metrics documentation.
>> If it's the latter, then this issue is essentially covered by FLINK-7286
>> which aims at allowing functions
>> to modify the numRecordsIn/numRecordsOut counts.
>>
>>
>> On 14.02.2018 12:22, Flavio Pompermaier wrote:
>>
>> Hi to all,
>> I have a (batch) job that writes to 1 or more sinks.
>> Is there a way to retrieve, once the job has terminated, the number of
>> records written to each sink?
>> Is there any better way than than using an accumulator for each sink?
>> If that is the only way to do that, the Sink API could be enriched in
>> order to automatically create an accumulator when required. E.g.
>>
>> dataset.output(JDBCOutputFormat.buildJDBCOutputFormat()
>> .setDrivername(...)
>> .setDBUrl(...)
>> .setQuery(...)
>> *.addRecordsCountAccumulator("some-name")*
>> .finish())
>>
>> Best,
>> Flavio
>>
>>
>>
>
>
> --
> Flavio Pompermaier
> Development Department
>
> OKKAM S.r.l.
> Tel. +(39) 0461 041809 <+39%200461%20041809>
>
>
>


-- 
Flavio Pompermaier
Development Department

OKKAM S.r.l.
Tel. +(39) 0461 041809


Re: Retrieve written records of a sink after job

2018-02-14 Thread Chesnay Schepler
The only way to access this info from the client is the REST API 
 
or the Metrics REST API 
.


On 14.02.2018 12:38, Flavio Pompermaier wrote:
Actually I'd like to get this number from my Java class in order to 
update some external dataset "catalog",
so I'm asking if there's some programmatic way to access this info 
(from JobExecutionResult for example).


On Wed, Feb 14, 2018 at 12:25 PM, Chesnay Schepler > wrote:


Do you want to know how many records the sink received, or how
many the sink wrote to the DB?
If it's the first you're in luck because we measure that already,
check out the metrics documentation.
If it's the latter, then this issue is essentially covered by
FLINK-7286 which aims at allowing functions
to modify the numRecordsIn/numRecordsOut counts.


On 14.02.2018 12:22, Flavio Pompermaier wrote:

Hi to all,
I have a (batch) job that writes to 1 or more sinks.
Is there a way to retrieve, once the job has terminated, the
number of records written to each sink?
Is there any better way than than using an accumulator for each sink?
If that is the only way to do that, the Sink API could be
enriched in order to automatically create an accumulator when
required. E.g.

dataset.output(JDBCOutputFormat.buildJDBCOutputFormat()
.setDrivername(...)
.setDBUrl(...)
.setQuery(...)
*.addRecordsCountAccumulator("some-name")*
.finish())

Best,
Flavio






--
Flavio Pompermaier
Development Department

OKKAM S.r.l.
Tel. +(39) 0461 041809





Re: Retrieve written records of a sink after job

2018-02-14 Thread Flavio Pompermaier
Actually I'd like to get this number from my Java class in order to update
some external dataset "catalog",
so I'm asking if there's some programmatic way to access this info
(from JobExecutionResult for example).

On Wed, Feb 14, 2018 at 12:25 PM, Chesnay Schepler 
wrote:

> Do you want to know how many records the sink received, or how many the
> sink wrote to the DB?
> If it's the first you're in luck because we measure that already, check
> out the metrics documentation.
> If it's the latter, then this issue is essentially covered by FLINK-7286
> which aims at allowing functions
> to modify the numRecordsIn/numRecordsOut counts.
>
>
> On 14.02.2018 12:22, Flavio Pompermaier wrote:
>
> Hi to all,
> I have a (batch) job that writes to 1 or more sinks.
> Is there a way to retrieve, once the job has terminated, the number of
> records written to each sink?
> Is there any better way than than using an accumulator for each sink?
> If that is the only way to do that, the Sink API could be enriched in
> order to automatically create an accumulator when required. E.g.
>
> dataset.output(JDBCOutputFormat.buildJDBCOutputFormat()
> .setDrivername(...)
> .setDBUrl(...)
> .setQuery(...)
> *.addRecordsCountAccumulator("some-name")*
> .finish())
>
> Best,
> Flavio
>
>
>


-- 
Flavio Pompermaier
Development Department

OKKAM S.r.l.
Tel. +(39) 0461 041809


Re: Retrieve written records of a sink after job

2018-02-14 Thread Chesnay Schepler
Do you want to know how many records the sink received, or how many the 
sink wrote to the DB?
If it's the first you're in luck because we measure that already, check 
out the metrics documentation.
If it's the latter, then this issue is essentially covered by FLINK-7286 
which aims at allowing functions

to modify the numRecordsIn/numRecordsOut counts.

On 14.02.2018 12:22, Flavio Pompermaier wrote:

Hi to all,
I have a (batch) job that writes to 1 or more sinks.
Is there a way to retrieve, once the job has terminated, the number of 
records written to each sink?

Is there any better way than than using an accumulator for each sink?
If that is the only way to do that, the Sink API could be enriched in 
order to automatically create an accumulator when required. E.g.


dataset.output(JDBCOutputFormat.buildJDBCOutputFormat()
.setDrivername(...)
.setDBUrl(...)
.setQuery(...)
*.addRecordsCountAccumulator("some-name")*
.finish())

Best,
Flavio





Retrieve written records of a sink after job

2018-02-14 Thread Flavio Pompermaier
Hi to all,
I have a (batch) job that writes to 1 or more sinks.
Is there a way to retrieve, once the job has terminated, the number of
records written to each sink?
Is there any better way than than using an accumulator for each sink?
If that is the only way to do that, the Sink API could be enriched in order
to automatically create an accumulator when required. E.g.

dataset.output(JDBCOutputFormat.buildJDBCOutputFormat()
.setDrivername(...)
.setDBUrl(...)
.setQuery(...)
*.addRecordsCountAccumulator("some-name")*
.finish())

Best,
Flavio