Re: Dataset rowCount accumulator

2019-02-04 Thread Flavio Pompermaier
Thinking about it, I realized that adding a map function after the read is probably a more general solution. Is there any "significant" difference in terms of performance between using such a dedicated map function (one that just reads a row, increments an accumulator and returns immediately) vs adding this accumulator
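
For illustration, here is a minimal sketch of the pass-through counting map described above, written in plain Java so that it runs without any dependencies. The names `RowCountSketch` and `CountingMap` are hypothetical, and `LongAdder` is only a stand-in for a Flink accumulator. In an actual Flink job the same shape would presumably be a `RichMapFunction` that registers a `LongCounter` through `getRuntimeContext().addAccumulator(...)` in `open()` and calls `add(1L)` in `map()`.

```java
import java.util.List;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Function;
import java.util.stream.Collectors;

public class RowCountSketch {

    /**
     * Hypothetical pass-through map: counts each row and returns it unchanged.
     * LongAdder stands in for a Flink accumulator (an assumption of this sketch).
     */
    public static final class CountingMap<T> implements Function<T, T> {
        private final LongAdder rowCount = new LongAdder();

        @Override
        public T apply(T row) {
            rowCount.increment(); // the only extra work done per row
            return row;           // the row itself is not modified or copied
        }

        public long count() {
            return rowCount.sum();
        }
    }

    public static void main(String[] args) {
        // Stand-in for the rows produced by an input format.
        List<String> rows = List.of("a", "b", "c");

        CountingMap<String> counter = new CountingMap<>();
        List<String> out = rows.stream().map(counter).collect(Collectors.toList());

        System.out.println(counter.count() + " rows read, output = " + out);
    }
}
```

The per-row cost in this sketch is a single counter increment. Whether the extra operator adds measurable overhead inside a real job is not something this example can show.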

Dataset rowCount accumulator

2019-02-04 Thread Flavio Pompermaier
Hi to all, we often need to track the number of rows of a dataset. In order not to add to the job's complexity, we use accumulators to track this information. The problem is that we have to extend all the InputFormats that we use in order to properly handle such a row-count accumulator... my question is: