Hi

Rather than trying to figure out the line number of the current line, you can 
use the byte offset of the current line.
Within a single file it's just as unique as the line number, and much easier to obtain: 
TextInputFormat (a FileInputFormat subclass) already supplies it as the key.

   Keys are the position in the file, and values are the line of text.
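To see why the mappers need no coordination, here is a small plain-Java simulation with no Hadoop dependency. The class `OffsetIds` and its methods are hypothetical names for illustration, and the edge list is assumed to be tab-separated. It cuts the file into tiny splits and reads each split roughly the way LineRecordReader (the reader behind TextInputFormat) does: a split skips its first, possibly partial, line unless it starts at byte 0, and keeps reading while a line starts at or before the split's end. Every line lands in exactly one split, and its key is the offset of its first byte, so no two edges share an ID. In a real job the mapper simply receives this offset as its `LongWritable` key.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class: simulates the byte-offset keys that
// TextInputFormat hands to mappers, without any Hadoop dependency.
class OffsetIds {

    /** Byte offset of the first byte of every line in data. */
    static List<Long> lineStarts(byte[] data) {
        List<Long> starts = new ArrayList<>();
        if (data.length == 0) {
            return starts;
        }
        starts.add(0L);
        // A '\n' starts a new line only if more bytes follow it.
        for (int i = 0; i < data.length - 1; i++) {
            if (data[i] == '\n') {
                starts.add((long) (i + 1));
            }
        }
        return starts;
    }

    /**
     * (offset -> line) records one mapper would see for the split [start, end),
     * following roughly the boundary rule of Hadoop's LineRecordReader.
     */
    static Map<Long, String> readSplit(byte[] data, long start, long end) {
        Map<Long, String> records = new TreeMap<>();
        int pos = (int) start;
        if (start != 0) {
            // Skip the first (possibly partial) line: the previous split owns it.
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step past the newline
        }
        // Read every line that starts at or before the split end.
        while (pos < data.length && pos <= end) {
            int lineEnd = pos;
            while (lineEnd < data.length && data[lineEnd] != '\n') {
                lineEnd++;
            }
            String line = new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8);
            records.put((long) pos, line); // key = byte offset, like TextInputFormat
            pos = lineEnd + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        // The edge list from the question, assumed tab-separated.
        byte[] data = "1\t2\n1\t5\n2\t3\n4\t2\n4\t3\n".getBytes(StandardCharsets.UTF_8);
        long splitSize = 5; // deliberately tiny so split boundaries cut lines in half
        Map<Long, String> all = new TreeMap<>();
        for (long start = 0; start < data.length; start += splitSize) {
            long end = Math.min(start + splitSize, data.length);
            Map<Long, String> part = readSplit(data, start, end);
            for (Map.Entry<Long, String> e : part.entrySet()) {
                // put() returns the old value if this id was already handed out
                if (all.put(e.getKey(), e.getValue()) != null) {
                    throw new IllegalStateException("duplicate id " + e.getKey());
                }
            }
            System.out.println("split [" + start + ", " + end + ") -> " + part.keySet());
        }
        System.out.println("edge ids: " + all.keySet());
        System.out.println("matches line starts: "
                + new ArrayList<>(all.keySet()).equals(lineStarts(data)));
    }
}
```

Running `main` prints the IDs 0, 4, 8, 12 and 16 for the five sample edges: unique, but not consecutive like line numbers would be.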

If you have multiple files, you may want to combine the byte offset with the file 
name (path) to get a unique ID. See "How to get the input file name in the mapper" 
for how to obtain the file name inside the mapper.
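For several input files, here is a hedged sketch of two ways to build such an ID, again in plain Java with hypothetical names (`CompositeEdgeId`, `stringId`, `fileIndex`, `packedId`). In a mapper written against the `org.apache.hadoop.mapreduce` API, the path is commonly obtained with `((FileSplit) context.getInputSplit()).getPath()`. Option 1 concatenates path and offset into a String. Option 2 packs both into a single long, since the question asked for a Long value; it assumes every file is smaller than 2^40 bytes (1 TiB) and that all mappers agree on a numbering of the input files, for example by sorting the same list of input paths.

```java
import java.util.List;

// Hypothetical helpers for turning (input file, byte offset) into an edge id.
class CompositeEdgeId {

    // Assumption: every input file is smaller than 2^40 bytes (1 TiB).
    static final int OFFSET_BITS = 40;
    // The remaining bits below the sign bit hold the file index.
    static final int MAX_FILES = 1 << (63 - OFFSET_BITS);

    /** Option 1: a String id; unique as long as the paths are unique. */
    static String stringId(String path, long offset) {
        return path + "#" + offset;
    }

    /**
     * Hypothetical: position of path in a job-wide list of input paths that
     * every mapper builds the same way (e.g. sorted), so all agree on it.
     */
    static int fileIndex(List<String> sortedPaths, String path) {
        int i = sortedPaths.indexOf(path);
        if (i < 0) {
            throw new IllegalArgumentException("unknown input file: " + path);
        }
        return i;
    }

    /** Option 2: pack file index and offset into one non-negative long. */
    static long packedId(int fileIndex, long offset) {
        if (offset < 0 || offset >= (1L << OFFSET_BITS)) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        if (fileIndex < 0 || fileIndex >= MAX_FILES) {
            throw new IllegalArgumentException("file index out of range: " + fileIndex);
        }
        return ((long) fileIndex << OFFSET_BITS) | offset;
    }

    public static void main(String[] args) {
        // Hypothetical input paths; offset 4 occurs in both files.
        List<String> inputs = List.of("/graph/edges-a.txt", "/graph/edges-b.txt");
        for (String path : inputs) {
            long offset = 4;
            System.out.println(stringId(path, offset) + " -> "
                    + packedId(fileIndex(inputs, path), offset));
        }
    }
}
```

The same offset 4 yields `/graph/edges-a.txt#4 -> 4` and `/graph/edges-b.txt#4 -> 1099511627780`. Option 1 needs no assumptions but produces a String; Option 2 produces a long at the cost of the file-size and file-numbering assumptions above.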

hth
Gabriel Balan

On 6/26/2015 5:29 AM, Ravikant Dindokar wrote:
The problem can be thought of as assigning a line number to each line. Is there any 
built-in functionality in Hadoop that can do this?

On Fri, Jun 26, 2015 at 1:11 PM, Ravikant Dindokar wrote:

    Yes, there can be loops in the graph.

    On Fri, Jun 26, 2015 at 9:09 AM, Harshit Mathur wrote:

        Are there loops in your graph?


        On Thu, Jun 25, 2015 at 10:39 PM, Ravikant Dindokar wrote:

            Hi Hadoop users,

            I have a file containing one line for each edge in the graph, with two 
            vertex IDs (source and sink).
            Sample:
            1    2 (here 1 is the source and 2 is the sink node of the edge)
            1    5
            2    3
            4    2
            4    3
            I want to assign a unique ID (a long value) to each edge, i.e. to each 
            line of the file.

            How can I ensure that the assigned values are unique when the mappers 
            run in a distributed fashion?

            Note: the file is large, so using a single reducer is not feasible.

            Thanks
            Ravikant




-- Harshit Mathur




--
The statements and opinions expressed here are my own and do not necessarily 
represent those of Oracle Corporation.
