Re: NLine Input Format

2008-11-19 Thread Rahul Tenany
Hi Amareshwari,
It is in the ToolRunner.run() method that I am setting the input format
to NLineInputFormat, and in the same method I am setting the
mapred.line.input.format.linespermap property. Will that not work? How
can I override LineRecordReader so that it returns N lines as the value?

Thanks
Rahul

On Mon, Nov 17, 2008 at 9:43 AM, Amareshwari Sriramadasu 
[EMAIL PROTECTED] wrote:

 Hi Rahul,

 How did you set the configuration mapred.line.input.format.linespermap
 and your input format? You have to set them in hadoop-site.xml or pass them
 through the -D option to the job.
 NLineInputFormat puts N lines of input into one split, so each map
 gets N lines.
 But the RecordReader is still LineRecordReader, which reads one line at a
 time, so the key is the offset in the file and the value is the line.
 If you want N lines as the value, you may have to override LineRecordReader.

 Thanks
 Amareshwari


Re: NLine Input Format

2008-11-19 Thread Amareshwari Sriramadasu

Rahul Tenany wrote:

Hi Amareshwari,
It is in the ToolRunner.run() method that I am setting the input format
to NLineInputFormat, and in the same method I am setting the
mapred.line.input.format.linespermap property. Will that not work? How
can I override LineRecordReader so that it returns N lines as the value?


Setting the configuration in the run() method will also work. You have to
extend LineRecordReader and override its next() method to return N lines
as the value instead of one line.
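A minimal sketch of that override, written as a stdlib-only Java model so it runs without Hadoop. SimpleLineReader is a hypothetical stand-in for LineRecordReader (one line per next() call, key = byte offset of the line), and NLinesRecordReader extends it and overrides next() so that it calls super.next() up to N times and joins the lines into one value. In real old-API Hadoop code the method being overridden would be next(LongWritable key, Text value); the Key and Value holder classes below are assumptions standing in for LongWritable and Text.

```java
// Stdlib-only model, NOT Hadoop code.
class SimpleLineReader {
    /** Mutable holders standing in (hypothetically) for LongWritable and Text. */
    static final class Key { long value; }
    static final class Value { final StringBuilder text = new StringBuilder(); }

    private final String[] lines;
    private int index = 0;
    private long pos = 0;

    SimpleLineReader(String input) {
        this.lines = input.isEmpty() ? new String[0] : input.split("\n");
    }

    /** Like a line reader: key = byte offset of the line, value = one line. */
    boolean next(Key key, Value value) {
        if (index >= lines.length) {
            return false;
        }
        String line = lines[index++];
        key.value = pos;
        value.text.setLength(0);
        value.text.append(line);
        pos += line.length() + 1; // assumes ASCII text: one byte per char, plus '\n'
        return true;
    }
}

/** Extends the line reader and overrides next() to return up to n lines per call. */
class NLinesRecordReader extends SimpleLineReader {
    private final int n;
    private final Key lineKey = new Key();   // scratch key for each single line
    private final Value line = new Value();  // scratch value for each single line

    NLinesRecordReader(String input, int n) {
        super(input);
        this.n = n;
    }

    @Override
    boolean next(Key key, Value value) {
        int read = 0;
        value.text.setLength(0);
        // Pull single lines from the parent reader until n lines are joined or input ends.
        while (read < n && super.next(lineKey, line)) {
            if (read == 0) {
                key.value = lineKey.value; // key = offset of the group's first line
            } else {
                value.text.append('\n');
            }
            value.text.append(line.text);
            read++;
        }
        return read > 0;
    }

    public static void main(String[] args) {
        NLinesRecordReader reader = new NLinesRecordReader("5\n3\n8\n1\n9\n", 2);
        Key key = new Key();
        Value value = new Value();
        while (reader.next(key, value)) {
            System.out.println(key.value + " -> " + value.text.toString().replace("\n", " | "));
        }
    }
}
```

With N = 2 and the input 5, 3, 8, 1, 9 (one per line), the reader returns three records keyed 0, 4 and 8, the last holding a single line. Keying each group by its first line's offset and returning a short final group are choices made for this sketch, not requirements.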


Thanks
Amareshwari

Re: NLine Input Format

2008-11-19 Thread Amareshwari Sriramadasu

Rahul Tenany wrote:

Hi Amareshwari,
It is in the ToolRunner.run() method that I am setting the input format
to NLineInputFormat, and in the same method I am setting the
mapred.line.input.format.linespermap property. Will that not work? How
can I override LineRecordReader so that it returns N lines as the value?


Thanks
Rahul

One more thing: I don't think you need NLineInputFormat for your
requirement. NLineInputFormat puts N lines into one split, so each map
processes N lines. In your application, you don't want each map to
process just N lines; you want the value to be N lines, right? So you
should write a new input format that extends FileInputFormat and whose
getRecordReader returns your new RecordReader implementation. Does
this make sense?


Thanks
Amareshwari



Re: NLine Input Format

2008-11-16 Thread Amareshwari Sriramadasu

Hi Rahul,

How did you set the configuration mapred.line.input.format.linespermap
and your input format? You have to set them in hadoop-site.xml or pass
them through the -D option to the job.
NLineInputFormat puts N lines of input into one split, so each map
gets N lines.
But the RecordReader is still LineRecordReader, which reads one line at a
time, so the key is the offset in the file and the value is the line.

If you want N lines as the value, you may have to override LineRecordReader.
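To illustrate why the map still sees one line per call, here is a small stdlib-only Java model, not Hadoop code: it groups the input into splits of N lines the way NLineInputFormat does, but, like LineRecordReader, it still emits one (byte offset, line) record per map() call. Each split corresponds to one map task. The names NLineSplitModel, splitIntoNLines and countMapCalls are hypothetical, and the model assumes '\n'-terminated lines.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Stdlib-only model, NOT Hadoop code: groups N lines per split (like
// NLineInputFormat) but still yields one (byte offset, line) record per
// map() call (like LineRecordReader).
class NLineSplitModel {

    /** One record as a line reader emits it: key = byte offset, value = line. */
    static final class LineRecord {
        final long offset;
        final String line;

        LineRecord(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    /** Groups '\n'-terminated input into splits of at most n lines each. */
    static List<List<LineRecord>> splitIntoNLines(String input, int n) {
        List<List<LineRecord>> splits = new ArrayList<>();
        if (input.isEmpty()) {
            return splits;
        }
        List<LineRecord> current = new ArrayList<>();
        long offset = 0;
        for (String line : input.split("\n")) {
            current.add(new LineRecord(offset, line));
            // Skip past this line's bytes plus its '\n' terminator.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
            if (current.size() == n) {
                splits.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    /** map() runs once per record, so this sums the records over all splits. */
    static int countMapCalls(List<List<LineRecord>> splits) {
        int calls = 0;
        for (List<LineRecord> split : splits) {
            calls += split.size();
        }
        return calls;
    }

    public static void main(String[] args) {
        List<List<LineRecord>> splits = splitIntoNLines("5\n3\n8\n1\n9\n", 2);
        for (int i = 0; i < splits.size(); i++) {
            for (LineRecord r : splits.get(i)) {
                System.out.println("split " + i + ": map(" + r.offset + ", \"" + r.line + "\")");
            }
        }
        System.out.println(splits.size() + " splits, " + countMapCalls(splits) + " map() calls");
    }
}
```

With linespermap = 2 and five input lines, the model produces three splits but still five map() calls, which matches the "only one line at a time" symptom described below.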

Thanks
Amareshwari

Rahul Tenany wrote:

Hi, I am writing a binary search tree on Hadoop, and for that I need
to use NLineInputFormat. I'll read n lines at a time, convert the numbers in
each line from string to int, and then insert them into the binary tree. Once
the binary tree is built, I'll search for elements in it. But even though I set
the input format to NLineInputFormat and set
mapred.line.input.format.linespermap to 10, I am only able to read one line
at a time. Any idea where I am going wrong? How can I tell whether
NLineInputFormat is working or not?

I want my program to work for any object that is Comparable, not just
integers, so is there any way I can read N objects at a time?
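The tree-building step described above can be made generic over any Comparable type. The following is a minimal sketch that assumes the N lines arrive as one newline-separated value (for example, from a custom record reader). It also assumes the caller supplies a parser that turns each line into an element, such as Integer::parseInt for numbers. ComparableBst and insertAll are hypothetical names, not part of Hadoop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Minimal, unbalanced binary search tree over any Comparable type.
class ComparableBst<T extends Comparable<T>> {

    private static final class Node<T> {
        final T item;
        Node<T> left, right;

        Node(T item) { this.item = item; }
    }

    private Node<T> root;
    private int size;

    /** Inserts item; duplicates are ignored. Returns true if the tree changed. */
    boolean insert(T item) {
        if (root == null) {
            root = new Node<>(item);
            size++;
            return true;
        }
        Node<T> cur = root;
        while (true) {
            int cmp = item.compareTo(cur.item);
            if (cmp == 0) return false;
            if (cmp < 0) {
                if (cur.left == null) { cur.left = new Node<>(item); size++; return true; }
                cur = cur.left;
            } else {
                if (cur.right == null) { cur.right = new Node<>(item); size++; return true; }
                cur = cur.right;
            }
        }
    }

    /** Standard BST search: go left if smaller, right if larger. */
    boolean contains(T item) {
        Node<T> cur = root;
        while (cur != null) {
            int cmp = item.compareTo(cur.item);
            if (cmp == 0) return true;
            cur = cmp < 0 ? cur.left : cur.right;
        }
        return false;
    }

    int size() { return size; }

    /** In-order traversal returns the elements in ascending order. */
    List<T> inOrder() {
        List<T> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private void collect(Node<T> node, List<T> out) {
        if (node == null) return;
        collect(node.left, out);
        out.add(node.item);
        collect(node.right, out);
    }

    /** Parses each non-blank line of a multi-line value and inserts it. */
    void insertAll(String nLines, Function<String, T> parser) {
        for (String line : nLines.split("\n")) {
            if (!line.trim().isEmpty()) {
                insert(parser.apply(line.trim()));
            }
        }
    }

    public static void main(String[] args) {
        ComparableBst<Integer> tree = new ComparableBst<>();
        tree.insertAll("5\n3\n8\n1\n9", Integer::parseInt);
        System.out.println(tree.inOrder() + " contains 8? " + tree.contains(8));
    }
}
```

Because the tree is unbalanced, already-sorted input degrades lookups to linear time. If balance matters, java.util.TreeSet (backed by a red-black tree) is the stdlib alternative.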

I am completely stuck. Any help will be appreciated.

Thanks
Rahul