Hi Meghajit,

Thanks for the feedback. I have filed a ticket:
https://issues.apache.org/jira/browse/FLINK-28117
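
Until that lands, the reader-side metrics in your list (readers created,
readers closed, time taken to create a reader) can be tracked in your own
code. Below is a minimal sketch of the bookkeeping only. `ReaderMetrics` and
its method names are hypothetical, it is plain Java rather than a Flink class,
and it assumes the reader is created through a call you can wrap.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical helper, not a Flink class: plain-Java bookkeeping for
// reader-side metrics (readers created/closed, time to create a reader).
final class ReaderMetrics {
    private final AtomicLong readersCreated = new AtomicLong();
    private final AtomicLong readersClosed = new AtomicLong();
    private final AtomicLong lastCreateNanos = new AtomicLong();

    // Runs the given factory, records how long it took, and counts the
    // reader only if creation succeeded (an exception skips both updates).
    <R> R timeCreate(Supplier<R> factory) {
        long start = System.nanoTime();
        R reader = factory.get();
        lastCreateNanos.set(System.nanoTime() - start);
        readersCreated.incrementAndGet();
        return reader;
    }

    // Call this from the reader's close path.
    void markClosed() {
        readersClosed.incrementAndGet();
    }

    long readersCreated() { return readersCreated.get(); }
    long readersClosed() { return readersClosed.get(); }
    long lastCreateNanos() { return lastCreateNanos.get(); }
}
```

The getters are meant to back whatever metric objects you register on a Flink
metric group. That wiring is left out here because it depends on where your
code can reach a metric group.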

Best regards,
Jing

On Mon, Jun 13, 2022 at 7:23 AM Meghajit Mazumdar <
meghajit.mazum...@gojek.com> wrote:

> Hi folks,
>
> Thanks for the reply.
> We have implemented our own SplitAssigner, FileReaderFormat and
> FileReaderFormat.Reader implementations. Hence, we plan to add custom
> metrics such as these:
> 1. Number of splits the SplitAssigner is initialized with, and number of
> splits added back to the SplitAssigner
> 2. Readers created per unit time
> 3. Time taken to create a reader
> 4. Time taken for the Reader to produce a single Row
> 5. Readers closed per unit time
> ... and some more
>
> However, since we haven't implemented our own FileSource or
> SplitEnumerator, we don't have visibility into the metrics of these
> components. We would ideally like to measure these:
> 1. Number of rows emitted by the source per unit time
> 2. Time taken by the enumerator to discover the splits
> 3. Total splits discovered
>
>
> Regards,
> Meghajit
>
>
> On Fri, Jun 10, 2022 at 10:04 PM Jing Ge <j...@ververica.com> wrote:
>
>> Hi Meghajit,
>>
>> I think it makes sense to extend the current metrics. Could you list all
>> metrics you need? Thanks!
>>
>> Best regards,
>> Jing
>>
>> On Fri, Jun 10, 2022 at 5:06 PM Lijie Wang <wangdachui9...@gmail.com>
>> wrote:
>>
>>> Hi Meghajit,
>>>
>>> As far as I know, currently, the FileSource does not have the metrics
>>> you need.  You can implement your own source, and register custom metrics
>>> via `SplitEnumeratorContext#metricGroup` and
>>> `SourceReaderContext#metricGroup`.
>>>
>>> Best,
>>> Lijie
>>>
>>> On Fri, Jun 10, 2022 at 4:36 PM Meghajit Mazumdar <meghajit.mazum...@gojek.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> We are working on a Flink project which uses FileSource to discover and
>>>> read Parquet files from GCS (using Flink 1.14).
>>>>
>>>> As part of this, we wanted to implement some health metrics around the
>>>> code.
>>>> I wanted to know whether Flink itself gathers any metrics around
>>>> FileSource, e.g., the number of files discovered by the SplitEnumerator,
>>>> the number of files added back to the SplitAssigner, the time taken to
>>>> process each split, etc.?
>>>>
>>>> I checked in the official documentation
>>>> <https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/datastream/filesystem/>
>>>> but there don't appear to be any. Is the solution then to implement
>>>> custom metrics like this
>>>> <https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/ops/metrics/>
>>>> ?
>>>>
>>>>
>>>> *Regards,*
>>>> *Meghajit*
>>>>
>>>
>
> --
> *Regards,*
> *Meghajit*
>
