Re: multi-thread mapreduce

2012-12-13 Thread Yu Yang
Thank you all. In fact, I don't expect this approach to improve performance. I need to process 3 different logs (each with a different format). I just want to start processing all 3 logs at the same time, all in this one program. But I can give a different separator to each threa

Re: multi-thread mapreduce

2012-12-13 Thread Harsh J
I suppose you could also use the job configuration, or a per-input mapper implementation via MultipleInputs, to do this.
On Thu, Dec 13, 2012 at 5:44 PM, Yu Yang wrote:
> Thank you all. In fact, I don't expect that this way can help to enhance
> the performance.
> I need to process 3 different logs (with

Map output files and partitions.

2012-12-13 Thread Pedro Sá da Costa
Hi, There are only 2 types of map output files: Sequence and Text files. If those files are going to be used as input to several reduce tasks, they need to be partitioned into blocks. Are there any SEPARATOR bits that delimit each partition? Can I read a specific partition of a map output file? Is there

Re: Map output files and partitions.

2012-12-13 Thread Harsh J
Map output files, by which you perhaps mean the intermediate data files used for temporary K/V persistence, are stored as IFiles. They use neither text nor sequence files (historically, though, they did use sequence files at some point). You can read the IFile's sources at http://svn.apache.org/repos/asf/had

Re: Map output files and partitions.

2012-12-13 Thread Mohammad Tariq
Hello Pedro, The first part of your question is very well covered by Harsh. For the second part, the generation and number of partitions is governed by the getPartition() method of the Partitioner interface. The default behavior is to create partitions based on hashing. You can have y
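To make the hash-based default mentioned above concrete, here is a minimal sketch in plain Java with no Hadoop dependency. It mirrors the formula used by Hadoop's HashPartitioner, (key.hashCode() & Integer.MAX_VALUE) % numPartitions, where numPartitions corresponds to the number of reduce tasks. The class name HashPartitionSketch, the helper assign(), and the sample keys are hypothetical and exist only for illustration; a real job would subclass Partitioner instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of Hadoop.
class HashPartitionSketch {

    // Same arithmetic as Hadoop's default HashPartitioner: mask off the sign
    // bit so the result is non-negative, then take it modulo the partition count.
    static int getPartition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Hypothetical helper: groups keys by the partition they would land in.
    static List<List<String>> assign(List<String> keys, int numPartitions) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String key : keys) {
            partitions.get(getPartition(key, numPartitions)).add(key);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Sample keys are made up for the example.
        List<String> keys = Arrays.asList("GET", "POST", "PUT", "DELETE");
        List<List<String>> partitions = assign(keys, 3);
        for (int i = 0; i < partitions.size(); i++) {
            System.out.println("partition " + i + ": " + partitions.get(i));
        }
    }
}
```

Because the partition depends only on the key's hash code, every record with the same key goes to the same reduce task. A custom partitioner would replace the body of getPartition() with its own rule while still returning a value in the range 0 to numPartitions - 1.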