Re: NLine Input Format
Hi Amareshwari,

It is in the ToolRunner.run() method that I am setting the input format to NLineInputFormat, and in the same method I am setting the mapred.line.input.format.linespermap property. Will that not work? How can I override LineRecordReader so that it returns N lines as the value?

Thanks
Rahul

On Mon, Nov 17, 2008 at 9:43 AM, Amareshwari Sriramadasu [EMAIL PROTECTED] wrote:

Hi Rahul,

How did you set the configuration property mapred.line.input.format.linespermap and your input format? You have to set them in hadoop-site.xml or pass them through the -D option to the job.

NLineInputFormat will split N lines of input as one split, so each map gets N lines. But the RecordReader is still LineRecordReader, which reads one line at a time; the key is therefore the offset in the file and the value is the line. If you want N lines as the value, you may have to override LineRecordReader.

Thanks
Amareshwari

Rahul Tenany wrote:

Hi,

I am writing a binary search tree on Hadoop, and for that I need to use NLineInputFormat. I'll read N lines at a time, convert the numbers in each line from string to int, and then insert them into the binary tree. Once the binary tree is built, I'll search for elements in it.

But even if I set the input format to NLineInputFormat and set mapred.line.input.format.linespermap to 10, I am able to read only 1 line at a time. Any idea where I am going wrong? How can I find out whether NLineInputFormat is working or not?

I want my program to work for any object that is comparable, not just integers, so is there any way I can read N objects at a time? I am completely stuck. Any help will be appreciated.

Thanks
Rahul
Re: NLine Input Format
Rahul Tenany wrote:

Hi Amareshwari,

It is in the ToolRunner.run() method that I am setting the input format to NLineInputFormat, and in the same method I am setting the mapred.line.input.format.linespermap property. Will that not work? How can I override LineRecordReader so that it returns N lines as the value?

Setting the configuration in the run() method will also work. You have to extend LineRecordReader and override the next() method to return N lines as the value instead of 1 line.

Thanks
Amareshwari

Thanks
Rahul

On Mon, Nov 17, 2008 at 9:43 AM, Amareshwari Sriramadasu [EMAIL PROTECTED] wrote:

Hi Rahul,

How did you set the configuration property mapred.line.input.format.linespermap and your input format? You have to set them in hadoop-site.xml or pass them through the -D option to the job.

NLineInputFormat will split N lines of input as one split, so each map gets N lines. But the RecordReader is still LineRecordReader, which reads one line at a time; the key is therefore the offset in the file and the value is the line. If you want N lines as the value, you may have to override LineRecordReader.

Thanks
Amareshwari

Rahul Tenany wrote:

Hi,

I am writing a binary search tree on Hadoop, and for that I need to use NLineInputFormat. I'll read N lines at a time, convert the numbers in each line from string to int, and then insert them into the binary tree. Once the binary tree is built, I'll search for elements in it.

But even if I set the input format to NLineInputFormat and set mapred.line.input.format.linespermap to 10, I am able to read only 1 line at a time. Any idea where I am going wrong? How can I find out whether NLineInputFormat is working or not?

I want my program to work for any object that is comparable, not just integers, so is there any way I can read N objects at a time? I am completely stuck. Any help will be appreciated.

Thanks
Rahul
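As a rough illustration of the advice in this reply (have next() return N lines as one value instead of 1 line), here is a minimal sketch in plain Java. It deliberately does not use the Hadoop API: the class name NLinesReaderSketch and its methods are hypothetical, and a BufferedReader stands in for the underlying LineRecordReader. Only the accumulation logic is meant to carry over.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a record reader whose next() yields N lines per value.
// Plain Java only; in a real job the lines would come from LineRecordReader.
class NLinesReaderSketch {
    private final BufferedReader in;
    private final int linesPerRecord;

    NLinesReaderSketch(BufferedReader in, int linesPerRecord) {
        if (linesPerRecord < 1) {
            throw new IllegalArgumentException("linesPerRecord must be >= 1");
        }
        this.in = in;
        this.linesPerRecord = linesPerRecord;
    }

    // Returns up to N lines joined by '\n', or null once the input is exhausted.
    String next() throws IOException {
        StringBuilder value = new StringBuilder();
        int count = 0;
        String line;
        // The count check comes first so no extra line is consumed after the Nth.
        while (count < linesPerRecord && (line = in.readLine()) != null) {
            if (count > 0) {
                value.append('\n');
            }
            value.append(line);
            count++;
        }
        return count == 0 ? null : value.toString();
    }

    // Convenience helper: reads every record from a string of newline-separated lines.
    static List<String> readAll(String text, int linesPerRecord) {
        NLinesReaderSketch reader =
            new NLinesReaderSketch(new BufferedReader(new StringReader(text)), linesPerRecord);
        List<String> records = new ArrayList<>();
        try {
            String record;
            while ((record = reader.next()) != null) {
                records.add(record);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }
}
```

With five input lines and N = 2, readAll returns three records; the last one holds the single leftover line. A mapper receiving such a value would split it on '\n' before parsing each number.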
Re: NLine Input Format
Rahul Tenany wrote:

Hi Amareshwari,

It is in the ToolRunner.run() method that I am setting the input format to NLineInputFormat, and in the same method I am setting the mapred.line.input.format.linespermap property. Will that not work? How can I override LineRecordReader so that it returns N lines as the value?

Thanks
Rahul

On Mon, Nov 17, 2008 at 9:43 AM, Amareshwari Sriramadasu [EMAIL PROTECTED] wrote:

Hi Rahul,

How did you set the configuration property mapred.line.input.format.linespermap and your input format? You have to set them in hadoop-site.xml or pass them through the -D option to the job.

NLineInputFormat will split N lines of input as one split, so each map gets N lines. But the RecordReader is still LineRecordReader, which reads one line at a time; the key is therefore the offset in the file and the value is the line. If you want N lines as the value, you may have to override LineRecordReader.

Thanks
Amareshwari

Rahul Tenany wrote:

Hi,

I am writing a binary search tree on Hadoop, and for that I need to use NLineInputFormat. I'll read N lines at a time, convert the numbers in each line from string to int, and then insert them into the binary tree. Once the binary tree is built, I'll search for elements in it.

But even if I set the input format to NLineInputFormat and set mapred.line.input.format.linespermap to 10, I am able to read only 1 line at a time. Any idea where I am going wrong? How can I find out whether NLineInputFormat is working or not?

I want my program to work for any object that is comparable, not just integers, so is there any way I can read N objects at a time? I am completely stuck. Any help will be appreciated.

Thanks
Rahul

One more thing: I don't think you need to use NLineInputFormat for your requirement. NLineInputFormat splits N lines as one split, so each map processes N lines. In your application, you don't want each map to process just N lines; you want the value to be N lines, right?

So, you should write a new input format that extends FileInputFormat, and its getRecordReader method should return your new RecordReader implementation. Does this make sense?

Thanks
Amareshwari
Re: NLine Input Format
Hi Rahul,

How did you set the configuration property mapred.line.input.format.linespermap and your input format? You have to set them in hadoop-site.xml or pass them through the -D option to the job.

NLineInputFormat will split N lines of input as one split, so each map gets N lines. But the RecordReader is still LineRecordReader, which reads one line at a time; the key is therefore the offset in the file and the value is the line. If you want N lines as the value, you may have to override LineRecordReader.

Thanks
Amareshwari

Rahul Tenany wrote:

Hi,

I am writing a binary search tree on Hadoop, and for that I need to use NLineInputFormat. I'll read N lines at a time, convert the numbers in each line from string to int, and then insert them into the binary tree. Once the binary tree is built, I'll search for elements in it.

But even if I set the input format to NLineInputFormat and set mapred.line.input.format.linespermap to 10, I am able to read only 1 line at a time. Any idea where I am going wrong? How can I find out whether NLineInputFormat is working or not?

I want my program to work for any object that is comparable, not just integers, so is there any way I can read N objects at a time? I am completely stuck. Any help will be appreciated.

Thanks
Rahul
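The distinction drawn in this reply, between how many lines go into a split and how many lines go into a record, can be simulated in a few lines of plain Java. This is only an illustration under stated assumptions: the class SplitVsRecordSketch and its methods are hypothetical and do not call Hadoop, and lines are assumed to end with a single '\n' byte.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation (no Hadoop classes) of split size versus record size.
class SplitVsRecordSketch {

    // Groups the input into splits of at most n lines, i.e. what each map task is handed.
    static List<List<String>> splits(List<String> lines, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += n) {
            out.add(new ArrayList<>(lines.subList(i, Math.min(i + n, lines.size()))));
        }
        return out;
    }

    // Emits one record per line as "offset<TAB>line", where offset is the byte
    // position of the line start, mirroring key = offset and value = line.
    static List<String> records(List<String> lines) {
        List<String> out = new ArrayList<>();
        long offset = 0;
        for (String line : lines) {
            out.add(offset + "\t" + line);
            // +1 accounts for the assumed single '\n' terminator.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return out;
    }
}
```

Five lines with n = 2 give three splits, yet records still produces one entry per line. That matches the behaviour reported in the thread: the map function sees a single line per call even when linespermap is set to 10.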