, 2015 11:07 AM
To: Shuai Zheng
Cc: user
Subject: Re: How to Take the whole file as a partition
Your situation is special. It seems to me Spark may not fit well in your case.
You want to process the individual files (500M~2G) as a whole, and you want
good performance.
You may want to write your
Have a look at sparkContext.binaryFiles. It works like wholeTextFiles but
returns a PortableDataStream per file. It might be a workable solution, though
you'll need to handle the conversion from binary to UTF-8 (or an equivalent
encoding) yourself.
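A minimal sketch of that approach, assuming PySpark rather than the Scala/Java
API. In PySpark, binaryFiles hands each file over as a (path, bytes) pair
instead of a PortableDataStream, so the whole file is held in memory, which
may be a concern at 500M~2G per file. The count_lines helper, the app name and
the input path are hypothetical placeholders, not anything from this thread.

```python
def count_lines(content: bytes, encoding: str = "utf-8") -> int:
    """Decode one whole file and count its lines (placeholder per-file logic)."""
    # errors="replace" keeps going on invalid bytes instead of raising.
    text = content.decode(encoding, errors="replace")
    return len(text.splitlines())


def process_whole_files(sc, path):
    """Return a list of (file_path, line_count) pairs, one per input file."""
    # binaryFiles yields one (path, bytes) record per file, so a file is
    # never split across records.
    return sc.binaryFiles(path).mapValues(count_lines).collect()


if __name__ == "__main__":
    # Imported here so count_lines stays usable without Spark installed.
    from pyspark import SparkContext

    sc = SparkContext(appName="whole-file-example")  # hypothetical app name
    # Hypothetical input location; replace with the real directory or glob.
    for file_path, n in process_whole_files(sc, "hdfs:///data/input/*"):
        print(file_path, n)
    sc.stop()
```

Swapping count_lines for the real per-file logic keeps the one-file-per-record
shape; whether that performs well enough for files this large would need to be
measured.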
Thanks,
Ewan
From: Shuai Zheng [mailto:szheng.c...@gmail.com]
Sent: 03 S
Your situation is special. It seems to me Spark may not fit well in your
case.
You want to process the individual files (500M~2G) as a whole, and you want
good performance.
You may want to write your own Scala/Java programs and distribute them along
with those files across your cluster, and run them in p