Hi Flavio,

Not sure if I would add this functionality to the sinks. You could also add
a MapFunction with a counting Accumulator right before each sink.
Best,
Fabian

2018-02-14 14:11 GMT+01:00 Flavio Pompermaier <pomperma...@okkam.it>:

> So, if I'm not wrong, the right way to do this is using accumulators.
> What do you think about my proposal to add an easy way to attach an
> accumulator for the written/outputted records to a sink?
>
> On Wed, Feb 14, 2018 at 1:08 PM, Chesnay Schepler <ches...@apache.org>
> wrote:
>
>> Technically yes, a subset of metrics is stored in the ExecutionGraph
>> when the job finishes. (This is, for example, where the web UI derives
>> the values from for finished jobs.) However, these are on the task level
>> and will not contain the number of incoming records if your sink is
>> chained to another operator. Changing this would be a larger endeavor,
>> and tbh I don't see it happening soon.
>>
>> I'm afraid that for now you're stuck with the REST API for finished
>> jobs. (Correction to my previous mail: the metrics REST API cannot be
>> used for finished jobs.)
>>
>> Alternatively, if you would rather work with files/JSON, you can enable
>> job archiving by configuring the jobmanager.archive.fs.dir directory.
>> When a job finishes, this directory will contain a big JSON file for
>> each job with all responses that the UI would return for finished jobs.
>>
>> On 14.02.2018 12:50, Flavio Pompermaier wrote:
>>
>> The problem here is that I don't know the vertex id of the sink. Would
>> it be possible to access the sink info by id?
>> And couldn't all that info be attached to the JobExecutionResult
>> (avoiding having to set up the REST connection etc.)?
>>
>> On Wed, Feb 14, 2018 at 12:44 PM, Chesnay Schepler <ches...@apache.org>
>> wrote:
>>
>>> The only way to access this info from the client is the REST API
>>> <https://ci.apache.org/projects/flink/flink-docs-master/monitoring/rest_api.html#details-of-a-running-or-completed-job>
>>> or the Metrics REST API
>>> <https://ci.apache.org/projects/flink/flink-docs-master/monitoring/metrics.html#rest-api-integration>.
>>>
>>> On 14.02.2018 12:38, Flavio Pompermaier wrote:
>>>
>>> Actually, I'd like to get this number from my Java class in order to
>>> update some external dataset "catalog", so I'm asking if there's some
>>> programmatic way to access this info (from JobExecutionResult, for
>>> example).
>>>
>>> On Wed, Feb 14, 2018 at 12:25 PM, Chesnay Schepler <ches...@apache.org>
>>> wrote:
>>>
>>>> Do you want to know how many records the sink received, or how many
>>>> the sink wrote to the DB?
>>>> If it's the former, you're in luck because we measure that already;
>>>> check out the metrics documentation.
>>>> If it's the latter, then this issue is essentially covered by
>>>> FLINK-7286, which aims at allowing functions to modify the
>>>> numRecordsIn/numRecordsOut counts.
>>>>
>>>> On 14.02.2018 12:22, Flavio Pompermaier wrote:
>>>>
>>>> Hi to all,
>>>> I have a (batch) job that writes to 1 or more sinks.
>>>> Is there a way to retrieve, once the job has terminated, the number of
>>>> records written to each sink?
>>>> Is there any better way than using an accumulator for each sink?
>>>> If that is the only way to do it, the Sink API could be enriched to
>>>> automatically create an accumulator when required. E.g.:
>>>>
>>>> dataset.output(JDBCOutputFormat.buildJDBCOutputFormat()
>>>>     .setDrivername(...)
>>>>     .setDBUrl(...)
>>>>     .setQuery(...)
>>>>     *.addRecordsCountAccumulator("some-name")*
>>>>     .finish())
>>>>
>>>> Best,
>>>> Flavio
>>>
>>> --
>>> Flavio Pompermaier
>>> Development Department
>>>
>>> OKKAM S.r.l.
>>> Tel. +(39) 0461 041809
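[Editor's note] Fabian's suggestion boils down to an identity map stage that
counts each record on its way to the sink; in Flink that would be a
RichMapFunction that registers a LongCounter via
getRuntimeContext().addAccumulator(...) in open(), with the total read back
from JobExecutionResult.getAccumulatorResult(...) after execute(). Since
Flink itself is not on hand here, the sketch below illustrates only the
pattern with plain-Java stand-ins: the names CountingMapSketch and
countingMap, the AtomicLong "accumulator", and the List "sink" are all
illustrative, not Flink API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

public class CountingMapSketch {

    // Identity "map" stage that counts every record it forwards.
    // In Flink this role is played by a RichMapFunction whose open()
    // registers a LongCounter accumulator under a well-known name.
    static <T> Function<T, T> countingMap(AtomicLong accumulator) {
        return record -> {
            accumulator.incrementAndGet();
            return record;
        };
    }

    public static void main(String[] args) {
        AtomicLong recordsOut = new AtomicLong();  // stand-in for Flink's LongCounter
        List<String> sink = new ArrayList<>();     // stand-in for the real sink (e.g. JDBC)

        Function<String, String> counted = countingMap(recordsOut);
        for (String record : List.of("a", "b", "c")) {
            sink.add(counted.apply(record));       // map -> sink, as in dataset.map(...).output(...)
        }

        // After the "job" finishes, the accumulator holds the written-record
        // count, analogous to reading it from the JobExecutionResult.
        System.out.println("records-out=" + recordsOut.get());
    }
}
```

One counting-map instance (one accumulator name) per sink gives a per-sink
count, which is what the thread is after, without touching the sink API.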