Hi Flavio,

I'm not sure I would add this functionality to the sinks.
You could also add a MapFunction with a counting Accumulator right before
each sink.
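Here is a minimal sketch of this suggestion. It is written as a plain-Java model so that it runs without Flink on the classpath. The names `countingMap` and `runJob` are hypothetical. A `Map` stands in for the accumulator results, and each `Consumer` stands in for a sink.

In the Flink DataSet API, the counting stage would be a `RichMapFunction` that registers a `LongCounter` via `getRuntimeContext().addAccumulator(name, counter)` in `open()` and forwards each record unchanged. The client would then read the count with `JobExecutionResult#getAccumulatorResult(name)` after `env.execute()` returns.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Plain-Java model of "a counting map right before each sink", not Flink's API.
public class CountBeforeSink {

    /** Pass-through stage: bumps a named counter, then forwards the value unchanged. */
    static <T> Consumer<T> countingMap(String name, Map<String, Long> counters, Consumer<T> sink) {
        counters.putIfAbsent(name, 0L); // register the counter even if no value arrives
        return value -> {
            counters.merge(name, 1L, Long::sum);
            sink.accept(value);
        };
    }

    /** A tiny two-sink "job"; the returned map plays the role of the accumulator results. */
    static Map<String, Long> runJob(List<String> dbInput, List<String> fileInput) {
        Map<String, Long> counters = new LinkedHashMap<>();
        List<String> dbRows = new ArrayList<>();    // stand-in for the JDBC sink
        List<String> fileLines = new ArrayList<>(); // stand-in for a file sink

        // Each sink gets its own, separately named counting stage.
        Consumer<String> toDb = countingMap("db-sink-records", counters, dbRows::add);
        Consumer<String> toFile = countingMap("file-sink-records", counters, fileLines::add);

        dbInput.forEach(toDb);
        fileInput.forEach(toFile);
        return counters;
    }

    public static void main(String[] args) {
        // Prints {db-sink-records=3, file-sink-records=1}
        System.out.println(runJob(List.of("a", "b", "c"), List.of("x")));
    }
}
```

Such a counter measures the records handed to the sink, not the records the sink actually wrote to the external system.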

Best, Fabian


2018-02-14 14:11 GMT+01:00 Flavio Pompermaier <pomperma...@okkam.it>:

> So, if I'm not mistaken, the right way to do this is to use accumulators.
> What do you think about my proposal to provide an easy way to attach to a
> sink an accumulator that counts the written/output records?
>
> On Wed, Feb 14, 2018 at 1:08 PM, Chesnay Schepler <ches...@apache.org>
> wrote:
>
>> Technically yes, a subset of metrics is stored in the ExecutionGraph when
>> the job finishes. (This is, for example, where the web UI gets the values
>> for finished jobs.) However, these are on the task level and will not
>> contain the number of incoming records if your sink is chained to another
>> operator. Changing this would be a larger endeavor, and to be honest I
>> don't see it happening soon.
>>
>> I'm afraid that for now you're stuck with the REST API for finished jobs.
>> (Correction to my previous mail: the metrics REST API cannot be used for
>> finished jobs.)
>>
>> Alternatively, if you'd rather work with files/JSON, you can enable job
>> archiving by configuring the jobmanager.archive.fs.dir directory. When
>> the job finishes, this directory will contain a big JSON file for each job
>> with all the responses that the UI would return for finished jobs.
>>
>>
>> On 14.02.2018 12:50, Flavio Pompermaier wrote:
>>
>> The problem here is that I don't know the vertex ID of the sink. Would it
>> be possible to access the sink info by ID?
>> And couldn't all this info be attached to the JobExecutionResult
>> (avoiding the need to set up the REST connection, etc.)?
>>
>> On Wed, Feb 14, 2018 at 12:44 PM, Chesnay Schepler <ches...@apache.org>
>> wrote:
>>
>>> The only way to access this info from the client is the REST API
>>> <https://ci.apache.org/projects/flink/flink-docs-master/monitoring/rest_api.html#details-of-a-running-or-completed-job>
>>> or the Metrics REST API
>>> <https://ci.apache.org/projects/flink/flink-docs-master/monitoring/metrics.html#rest-api-integration>.
>>>
>>>
>>>
>>> On 14.02.2018 12:38, Flavio Pompermaier wrote:
>>>
>>> Actually, I'd like to get this number from my Java class in order to
>>> update an external dataset "catalog",
>>> so I'm asking whether there's some programmatic way to access this info
>>> (from JobExecutionResult, for example).
>>>
>>> On Wed, Feb 14, 2018 at 12:25 PM, Chesnay Schepler <ches...@apache.org>
>>> wrote:
>>>
>>>> Do you want to know how many records the sink received, or how many the
>>>> sink wrote to the DB?
>>>> If it's the former, you're in luck because we already measure that;
>>>> check out the metrics documentation.
>>>> If it's the latter, then this issue is essentially covered by
>>>> FLINK-7286, which aims at allowing functions
>>>> to modify the numRecordsIn/numRecordsOut counts.
>>>>
>>>>
>>>> On 14.02.2018 12:22, Flavio Pompermaier wrote:
>>>>
>>>> Hi all,
>>>> I have a (batch) job that writes to one or more sinks.
>>>> Is there a way to retrieve, once the job has terminated, the number of
>>>> records written to each sink?
>>>> Is there any better way than using an accumulator for each sink?
>>>> If that is the only way, the sink API could be extended to
>>>> automatically create an accumulator when required, e.g.:
>>>>
>>>> dataset.output(JDBCOutputFormat.buildJDBCOutputFormat()
>>>>             .setDrivername(...)
>>>>             .setDBUrl(...)
>>>>             .setQuery(...)
>>>>             .addRecordsCountAccumulator("some-name") // proposed, does not exist yet
>>>>             .finish());
>>>>
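Here is a minimal sketch of how the proposed `addRecordsCountAccumulator(...)` option could behave. It uses a decorator around the sink that counts two things: records received, and records the wrapped sink wrote without error. This also separates the "received" and "written" numbers that Chesnay distinguishes.

`RecordSink`, `CountingSink` and `run` are hypothetical names loosely modeled on Flink's `OutputFormat#writeRecord`; this is not an existing Flink API. A real JDBC sink batches statements, so a record is only confirmed as written when a batch is flushed. The sketch ignores batching.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class CountingSinkSketch {

    /** Hypothetical minimal sink interface, loosely shaped like Flink's OutputFormat#writeRecord. */
    interface RecordSink<T> {
        void writeRecord(T value) throws IOException;
    }

    /** Decorator that counts records received and records the delegate wrote without error. */
    static final class CountingSink<T> implements RecordSink<T> {
        private final RecordSink<T> delegate;
        private long received;
        private long written;

        CountingSink(RecordSink<T> delegate) {
            this.delegate = delegate;
        }

        @Override
        public void writeRecord(T value) throws IOException {
            received++;
            delegate.writeRecord(value); // if this throws, the record is not counted as written
            written++;
        }

        long received() { return received; }
        long written() { return written; }
    }

    /** Feeds records to a sink that rejects empty strings; returns {received, written}. */
    static long[] run(List<String> input) {
        List<String> table = new ArrayList<>(); // stand-in for the database table
        CountingSink<String> sink = new CountingSink<>(value -> {
            if (value.isEmpty()) {
                throw new IOException("rejected empty record");
            }
            table.add(value);
        });
        for (String value : input) {
            try {
                sink.writeRecord(value);
            } catch (IOException e) {
                // A real job would fail here; the sketch skips the record so the two counts diverge.
            }
        }
        return new long[] {sink.received(), sink.written()};
    }

    public static void main(String[] args) {
        long[] counts = run(List.of("a", "", "b"));
        System.out.println("received=" + counts[0] + ", written=" + counts[1]);
    }
}
```

With the input `"a", "", "b"` this prints `received=3, written=2`, because the empty record is rejected by the wrapped sink.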
>>>> Best,
>>>> Flavio
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Flavio Pompermaier
>>> Development Department
>>>
>>> OKKAM S.r.l.
>>> Tel. +(39) 0461 041809
>>>
>>>
>>>
>>
>>
>>
>
>
>
