I had that problem/question some time ago, too.
The quick fix is to just put the line number in the line itself. Go for it.
However, we worked out a solution for another distributed processing
system that did the following:
Read each partition, count the lines, broadcast a map
"partition->lin
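
A rough sketch of how such a two-pass scheme might look, in plain Python rather than any particular framework's API. It assumes the broadcast map takes each partition index to the global number of that partition's first line; that reading, the in-memory lists standing in for partitions, and the names partition_offsets and number_lines are all made up for illustration.

```python
def partition_offsets(partitions):
    """Pass 1: count the lines in each partition and derive its starting offset."""
    offsets = {}
    total = 0
    for index, part in enumerate(partitions):
        # A partition starts right after all lines of the partitions before it.
        offsets[index] = total
        total += len(part)
    return offsets


def number_lines(partitions):
    """Pass 2: each partition adds its own offset to its local line counter."""
    # This small map is what would be broadcast to every worker.
    offsets = partition_offsets(partitions)
    return [
        [(offsets[index] + local, line) for local, line in enumerate(part)]
        for index, part in enumerate(partitions)
    ]


if __name__ == "__main__":
    # Three hypothetical partitions of a file holding matrix rows.
    parts = [["1 2", "3 4"], ["5 6"], ["7 8", "9 0"]]
    for numbered in number_lines(parts):
        print(numbered)
```

Only the per-partition counts have to be gathered in one place, so the map stays small even for large inputs; the price is reading the data twice.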
Hi Anastasiia,
this is difficult because the input is usually read in parallel, i.e., an
input file is split into several blocks which are independently read and
processed by different threads (possibly on different machines). So it is
difficult to have a sequential row number.
If all rows have th
Is there a way to get the current line number (or, more generally, the index
of the element currently being processed) inside a mapper?
An example is a matrix that you read line by line from a file, where you need both
the row and the column numbers. Column number is easy to get, but how to know
the row numb