SequenceFile is/was also the standard for binary data on Hadoop. The
question is rather: what else would you expect? Surely not a text format?
Bertrand
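For anyone new to the thread, the reason a binary record container beats a text format here: records are length-prefixed (so arbitrary serialized bytes are safe), and periodic sync markers keep the file splittable, since a reader dropped mid-file can scan to the next marker and resume on a record boundary. A toy sketch of that idea in plain Java (the class name, marker bytes, and interval are all invented for illustration; this is not the real SequenceFile on-disk format):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Toy record container: length-prefixed key/value pairs plus a periodic
// sync marker, loosely mimicking why SequenceFile stays splittable.
public class ToyRecordFile {
    static final byte[] SYNC = {-1, -1, -1, -1}; // invented marker bytes
    static final int SYNC_INTERVAL = 3;          // records between markers

    static byte[] write(List<String[]> records) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            int sinceSync = 0;
            for (String[] kv : records) {
                // Emit a sync marker every SYNC_INTERVAL records so a
                // reader can resynchronize from an arbitrary offset.
                if (sinceSync == SYNC_INTERVAL) {
                    out.write(SYNC);
                    sinceSync = 0;
                }
                byte[] k = kv[0].getBytes(StandardCharsets.UTF_8);
                byte[] v = kv[1].getBytes(StandardCharsets.UTF_8);
                out.writeInt(k.length); out.write(k); // length-prefixed key
                out.writeInt(v.length); out.write(v); // length-prefixed value
                sinceSync++;
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e); // in-memory stream won't really throw
        }
    }
}
```

With four one-byte keys and values, the first three records take 10 bytes each, the marker lands at offset 30, and the fourth record follows it.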
On Fri, Nov 7, 2014 at 3:51 AM, Lee S sle...@gmail.com wrote:
Any other reasons, or can you give a thorough analysis?
Also it's the easiest way to SerDe any complex stuff, and you get splitting
plus block compression, since SeqFiles are splittable and can be compressed
by default. See the code: it has really complex stuff to transfer between
jobs.
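The block-compression point is easy to demonstrate with the JDK alone: many small records compressed together as one block shrink far more than the same records compressed one by one, which is why SequenceFile offers CompressionType.BLOCK in addition to per-record compression. A stdlib sketch (this uses java.util.zip, not Hadoop's codec classes, and the record contents are made up):

```java
import java.util.zip.Deflater;

// Compare compressing each small record on its own ("record" style)
// against compressing many records together ("block" style).
public class BlockVsRecordCompression {
    static int deflatedSize(byte[] input) {
        Deflater d = new Deflater();
        d.setInput(input);
        d.finish();
        byte[] buf = new byte[input.length + 64]; // room for zlib overhead
        int n = 0;
        while (!d.finished()) {
            n += d.deflate(buf, n, buf.length - n);
        }
        d.end();
        return n;
    }

    public static void main(String[] args) {
        String record = "userid=42,score=3.14,label=spam\n"; // made-up record
        int n = 200;
        // Per-record: each tiny record pays the full codec overhead.
        int perRecord = n * deflatedSize(record.getBytes());
        // Block: one deflate stream sees all the redundancy across records.
        int block = deflatedSize(record.repeat(n).getBytes());
        System.out.println("per-record total: " + perRecord + " bytes");
        System.out.println("block total: " + block + " bytes");
    }
}
```

The block total comes out far smaller because the codec can exploit redundancy across records instead of restarting its dictionary for every one.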
2014-11-10 3:06 GMT+03:00 Bertrand Dechoux decho...@gmail.com:
2014-11-05 11:00 GMT+08:00 Ted Dunning ted.dunn...@gmail.com:
Yes, type conversion is a reason.
Sent from my iPhone
On Nov 4, 2014, at 18:59, Lee S sle...@gmail.com wrote:
eg. kmeans input:
1,2,3,4 //text file
kmeans output:
Hi all:
I'm wondering why the input and output of most algorithms like
kmeans and naivebayes are all sequence files. One more conversion step needs
to be done if we want the algorithm to work, and
I think that step is time-consuming, because it's also a MapReduce job.
For the reason to deal with small
What should the input be?
On Tue, Nov 4, 2014 at 12:28 AM, Lee S sle...@gmail.com wrote:
Hi all:
I'm wondering why the input and output of most algorithms like
kmeans and naivebayes are all sequence files. One more conversion step needs
to be done if we want the algorithm to work, and
I think that step is time-consuming, because it's also a MapReduce job.
eg. kmeans input:
1,2,3,4 //text file
kmeans output:
point1, point2, point3 (text file of center points)
I just thought of one reason. The input data should be stored in
vector (dense or sparse) format, so a conversion step
needs to be done before the algorithms deal with the data. Is that right?
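That conversion is essentially parsing each text line into a vector type the job can consume. A minimal sketch of the parsing half in plain Java (in Mahout proper the double[] would be wrapped in a DenseVector, then a VectorWritable, and written to a SequenceFile; the class and method names here are invented for illustration):

```java
import java.util.Arrays;

public class TextToVector {
    // Turn one line of the text input ("1,2,3,4") into a dense vector.
    // Mahout's real pipeline would wrap this in a DenseVector/VectorWritable
    // before writing it out as a SequenceFile record.
    static double[] parseLine(String line) {
        String[] fields = line.split(",");
        double[] vector = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            vector[i] = Double.parseDouble(fields[i].trim());
        }
        return vector;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseLine("1,2,3,4")));
        // prints [1.0, 2.0, 3.0, 4.0]
    }
}
```

Doing this once up front (as its own MapReduce job) costs time, but it means every downstream iteration reads typed vectors directly instead of re-parsing text.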