Re: Why do most algorithms use sequencefile as input and output?

2014-11-09 Thread Bertrand Dechoux
SequenceFile is/was also the standard for binary data on Hadoop. The question is rather: what else would you expect? Surely not a text format? Bertrand On Fri, Nov 7, 2014 at 3:51 AM, Lee S sle...@gmail.com wrote: Any other reasons, or can you give a thorough analysis? 2014-11-05 11:00

Re: Why do most algorithms use sequencefile as input and output?

2014-11-09 Thread Serega Sheypak
Also, it's the easiest way to serialize and deserialize (SerDe) complex data and get splitting plus block compression, since SequenceFiles are splittable and support compression out of the box. See the code: it has really complex stuff to transfer between jobs. 2014-11-10 3:06 GMT+03:00 Bertrand Dechoux decho...@gmail.com:
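
To make the SerDe and block-compression point above concrete, here is a minimal Python sketch. It is NOT the real SequenceFile on-disk format and does not use any Hadoop API; the record layout and the helper names (encode_record, decode_records, write_block, read_block) are assumptions made up for illustration. It only shows the general idea: typed key/value records are written in binary, and a batch of records is compressed as one block.

```python
import struct
import zlib


def encode_record(key, values):
    """Serialize one (key, vector) pair as a length-prefixed binary record.

    Hypothetical layout: 4-byte key length, 4-byte value count,
    UTF-8 key bytes, then the values as big-endian doubles.
    """
    key_bytes = key.encode("utf-8")
    header = struct.pack(">II", len(key_bytes), len(values))
    body = struct.pack(">%dd" % len(values), *values)
    return header + key_bytes + body


def decode_records(data):
    """Yield (key, values) pairs from a buffer of concatenated records."""
    offset = 0
    while offset < len(data):
        key_len, count = struct.unpack_from(">II", data, offset)
        offset += 8
        key = data[offset:offset + key_len].decode("utf-8")
        offset += key_len
        values = list(struct.unpack_from(">%dd" % count, data, offset))
        offset += 8 * count
        yield key, values


def write_block(records):
    """Compress a batch of records together as a single block."""
    raw = b"".join(encode_record(k, v) for k, v in records)
    return zlib.compress(raw)


def read_block(block):
    """Decompress one block and decode every record in it."""
    return list(decode_records(zlib.decompress(block)))


points = [("point1", [1.0, 2.0, 3.0, 4.0]), ("point2", [0.5, 0.25])]
restored = read_block(write_block(points))
```

The values come back as doubles without any text parsing, which is the type-conversion benefit discussed elsewhere in this thread. Splittability is not modelled in this sketch.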

Re: Why do most algorithms use sequencefile as input and output?

2014-11-06 Thread Lee S
Any other reasons, or can you give a thorough analysis? 2014-11-05 11:00 GMT+08:00 Ted Dunning ted.dunn...@gmail.com: Yes, type conversion is a reason. On Nov 4, 2014, at 18:59, Lee S sle...@gmail.com wrote: e.g. kmeans input: 1,2,3,4 // text file kmeans output

Why do most algorithms use sequencefile as input and output?

2014-11-04 Thread Lee S
Hi all: I'm wondering why the input and output of most algorithms, like kmeans and naivebayes, are all sequencefiles. One more conversion step needs to be done before the algorithm works, and I think that step is time-consuming because it is also a MapReduce job. For the reason to deal with small

Re: Why do most algorithms use sequencefile as input and output?

2014-11-04 Thread Ted Dunning
What should the input be? On Tue, Nov 4, 2014 at 12:28 AM, Lee S sle...@gmail.com wrote: Hi all: I'm wondering why the input and output of most algorithms, like kmeans and naivebayes, are all sequencefiles. One more conversion step needs to be done before the algorithm works, and I think

Re: Why do most algorithms use sequencefile as input and output?

2014-11-04 Thread Lee S
E.g. kmeans input: 1,2,3,4 (text file); kmeans output: point1, point2, point3 (text file of center points). I just thought of one reason: the input data should be stored in vector (dense or sparse) format, so a conversion step needs to be done before the algorithms can process the data. Is that right?
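
As an illustration of the conversion step described above, here is a small Python sketch that turns a comma-separated text line such as "1,2,3,4" into a dense vector and then into a sparse one. The function names parse_dense and to_sparse are hypothetical, and this is not how Mahout implements its vector classes; it only shows why a parsing pass is needed before an algorithm can work on numeric vectors.

```python
def parse_dense(line):
    """Parse a comma-separated text line into a dense vector (list of floats)."""
    return [float(field) for field in line.split(",") if field.strip()]


def to_sparse(dense):
    """Keep only the non-zero entries, keyed by their index."""
    return {index: value for index, value in enumerate(dense) if value != 0.0}


dense_point = parse_dense("1,2,3,4")
sparse_point = to_sparse([0.0, 2.0, 0.0, 4.0])
```

Doing this once and storing the typed result means later jobs do not have to re-parse text on every pass over the data.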

Re: Why do most algorithms use sequencefile as input and output?

2014-11-04 Thread Ted Dunning
Yes, type conversion is a reason. On Nov 4, 2014, at 18:59, Lee S sle...@gmail.com wrote: E.g. kmeans input: 1,2,3,4 (text file); kmeans output: point1, point2, point3 (text file of center points). I just thought of one reason: the input data should be stored in