Hi
Rather than trying to work out the line number of each line, you can
use the line's byte offset within the file.
It is just as unique as the line number, and much easier to obtain:
TextInputFormat (a FileInputFormat subclass) already passes it to the mapper as the key.
The key is the byte position at which the line starts in the file, and the value is the line of text.
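To make the idea concrete, here is a minimal plain-Java sketch (no Hadoop needed) that computes the starting byte offset of each line of the sample edge list from the original question. The class and method names are made up for illustration, and it assumes '\n' line endings; it only mimics the kind of key TextInputFormat hands to the mapper, it is not Hadoop's own code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
class LineOffsets {
    // Returns the byte offset at which each line starts, assuming '\n' line endings.
    public static List<Long> lineStartOffsets(byte[] data) {
        List<Long> offsets = new ArrayList<>();
        long lineStart = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n') {
                offsets.add(lineStart);   // record where the finished line began
                lineStart = i + 1;        // the next line starts right after the newline
            }
        }
        if (lineStart < data.length) {
            offsets.add(lineStart);       // last line without a trailing newline
        }
        return offsets;
    }

    public static void main(String[] args) {
        // The sample edge list from the question: five lines of 4 bytes each.
        byte[] data = "1 2\n1 5\n2 3\n4 2\n4 3\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(lineStartOffsets(data)); // prints [0, 4, 8, 12, 16]
    }
}
```

No two lines of one file can start at the same byte, so the offsets are unique within that file, although they are not consecutive like line numbers.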
If you have multiple input files, the same offset can occur in more than one file, so
combine the byte offset with the file name (path) to get an id that is unique across files.
See "How to get the input file name in the mapper".
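A rough sketch of the combined id, again in plain Java so it runs without a cluster. The helper name, the ":" separator and the example paths are my own assumptions. In a real mapper the path would typically come from ((FileSplit) context.getInputSplit()).getPath() and the offset from the LongWritable key.

```java
// Hypothetical helper, for illustration only.
class LineKey {
    // Joins the input file path and the line's byte offset into one id.
    // The same offset in two different files yields two different ids.
    public static String uniqueId(String filePath, long byteOffset) {
        return filePath + ":" + byteOffset;
    }

    public static void main(String[] args) {
        // Example paths are invented; both lines start at byte offset 4.
        System.out.println(uniqueId("/data/edges-a.txt", 4L)); // prints /data/edges-a.txt:4
        System.out.println(uniqueId("/data/edges-b.txt", 4L)); // prints /data/edges-b.txt:4
    }
}
```

Note that this gives a text id rather than a Long; if you really need a Long you would have to map each file to a number first, which this sketch does not attempt.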
hth
Gabriel Balan
On 6/26/2015 5:29 AM, Ravikant Dindokar wrote:
The problem can be thought of as assigning a line number to each line. Is there any
built-in functionality in Hadoop which can do this?
On Fri, Jun 26, 2015 at 1:11 PM, Ravikant Dindokar wrote:
Yes, there can be loops in the graph.
On Fri, Jun 26, 2015 at 9:09 AM, Harshit Mathur wrote:
Are there loops in your graph?
On Thu, Jun 25, 2015 at 10:39 PM, Ravikant Dindokar wrote:
Hi Hadoop user,
I have a file containing one line for each edge in the graph with two
vertex ids (source & sink).
sample:
1 2 (here 1 is source and 2 is sink node for the edge)
1 5
2 3
4 2
4 3
I want to assign a unique id (a Long value) to each edge, i.e. to each
line of the file.
How can I ensure that the assigned values are unique across distributed
mapper processes?
Note: the file is large, so using only one reducer is not
feasible.
Thanks
Ravikant
--
Harshit Mathur
--
The statements and opinions expressed here are my own and do not necessarily
represent those of Oracle Corporation.