Thanks, Gabriel.
On Tue, Jun 30, 2015 at 1:04 AM, gabriel balan <[email protected]> wrote:
> Hi
>
> Rather than trying to figure out the line number of the current line, you
> can use the byte offset of the current line.
> It's just as unique as the line number, and much easier to obtain:
> TextInputFormat (FileInputFormat) uses it as the key.
>
> Keys are the position in the file, and values are the line of text.
>
> If you have multiple files, you may want to combine the file offset with
> the file name (path) to get a unique id. See here how to get the input
> file name in the mapper
> <http://How%20to%20get%20the%20input%20file%20name%20in%20the%20mapper>.
>
> hth
> Gabriel Balan
>
>
> On 6/26/2015 5:29 AM, Ravikant Dindokar wrote:
>
> The problem can be thought of as assigning a line number to each line. Is
> there any built-in functionality in Hadoop which can do this?
>
> On Fri, Jun 26, 2015 at 1:11 PM, Ravikant Dindokar <
> [email protected]> wrote:
>
>> Yes, there can be loops in the graph.
>>
>> On Fri, Jun 26, 2015 at 9:09 AM, Harshit Mathur <[email protected]>
>> wrote:
>>
>>> Are there loops in your graph?
>>>
>>>
>>> On Thu, Jun 25, 2015 at 10:39 PM, Ravikant Dindokar <
>>> [email protected]> wrote:
>>>
>>>> Hi Hadoop user,
>>>>
>>>> I have a file containing one line for each edge in the graph, with two
>>>> vertex ids (source & sink).
>>>> Sample:
>>>> 1 2 (here 1 is the source and 2 is the sink node for the edge)
>>>> 1 5
>>>> 2 3
>>>> 4 2
>>>> 4 3
>>>> I want to assign a unique ID (Long value) to each edge, i.e. to each
>>>> line of the file.
>>>>
>>>> How can I ensure that unique values are assigned when the mappers
>>>> run as distributed processes?
>>>>
>>>> Note: the file is large, so using only one reducer is not feasible.
>>>>
>>>> Thanks
>>>> Ravikant
>>>>
>>>
>>>
>>>
>>> --
>>> Harshit Mathur
>>>
>>
>>
>
> --
> The statements and opinions expressed here are my own and do not necessarily
> represent those of Oracle Corporation.
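Gabriel's suggestion in the thread can be tried out without a cluster. The standalone Java sketch here has no Hadoop dependency; the class name `LineOffsetIds`, its helper methods and the sample path are hypothetical. It computes, for each line, the byte offset at which that line starts (the value TextInputFormat hands to the mapper as its key) and prefixes it with the file path so IDs stay unique across files. Since the original question asks for a Long, it also shows one possible way to pack a file index and an offset into a single long, assuming at most 2^23 input files, each smaller than 2^40 bytes (1 TiB). That packing scheme is an assumption for illustration, not something Hadoop provides.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class LineOffsetIds {

    /**
     * Returns the byte offset at which each line of {@code data} starts,
     * i.e. the position of the line's first byte in the file. This mirrors
     * the key TextInputFormat gives a mapper. Simplification: lines are
     * split on '\n' only.
     */
    static List<Long> lineStartOffsets(byte[] data) {
        List<Long> offsets = new ArrayList<>();
        if (data.length == 0) {
            return offsets; // an empty file produces no records
        }
        offsets.add(0L); // the first line starts at byte 0
        for (int i = 0; i < data.length; i++) {
            // A new line starts right after each '\n', unless it is the last byte.
            if (data[i] == '\n' && i + 1 < data.length) {
                offsets.add((long) (i + 1));
            }
        }
        return offsets;
    }

    /** Combines path and offset so the ID stays unique across several files. */
    static String edgeId(String path, long offset) {
        return path + ":" + offset;
    }

    /**
     * HYPOTHETICAL packing into one long. Assumes each input file has been
     * given a small integer index (< 2^23) and is smaller than 2^40 bytes,
     * so the result always fits in a non-negative long.
     */
    static long packedEdgeId(int fileIndex, long offset) {
        if (fileIndex < 0 || fileIndex >= (1 << 23) || offset < 0 || offset >= (1L << 40)) {
            throw new IllegalArgumentException("file index or offset out of range");
        }
        return ((long) fileIndex << 40) | offset;
    }

    public static void main(String[] args) {
        // The sample edge list from the original question.
        String edges = "1 2\n1 5\n2 3\n4 2\n4 3\n";
        byte[] data = edges.getBytes(StandardCharsets.UTF_8);
        String path = "hdfs:///graph/edges-part-0"; // hypothetical input path
        List<Long> offsets = lineStartOffsets(data);
        String[] lines = edges.split("\n");
        for (int i = 0; i < offsets.size(); i++) {
            System.out.println(edgeId(path, offsets.get(i)) + "\t" + lines[i]);
        }
    }
}
```

In a real job nothing needs to be counted: the mapper's LongWritable key already is this offset, and with the new `org.apache.hadoop.mapreduce` API the path can be read via `((FileSplit) context.getInputSplit()).getPath()`. Note that these IDs are unique but not consecutive (for the sample they are 0, 4, 8, 12 and 16), so they are not line numbers in the strict sense.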
