RE: How to Take the whole file as a partition

2015-09-03 Thread Shuai Zheng
, 2015 11:07 AM To: Shuai Zheng Cc: user Subject: Re: How to Take the whole file as a partition Your situation is special. It seems to me Spark may not fit well in your case. You want to process the individual files (500M~2G) as a whole, and you want good performance. You may want to write your

RE: How to Take the whole file as a partition

2015-09-03 Thread Ewan Leith
Have a look at sparkContext.binaryFiles; it works like wholeTextFiles but returns a PortableDataStream per file. It might be a workable solution, though you'll need to handle the binary-to-UTF-8 (or equivalent) conversion yourself. Thanks, Ewan
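
For illustration only, here is a minimal sketch of the binaryFiles approach, not code from anyone in this thread. It assumes the files are UTF-8 text. The input path, the object name WholeFilePerRecord and the helper countLines are hypothetical, and the line count stands in for whatever per-file processing is actually needed. Each file is read through its stream rather than loaded into one array, since files approaching 2G may not fit in a single byte array.

```scala
import java.io.{BufferedReader, InputStream, InputStreamReader}
import java.nio.charset.StandardCharsets

import org.apache.spark.{SparkConf, SparkContext}

object WholeFilePerRecord {
  // Hypothetical per-file work: count lines, decoding bytes as UTF-8 while
  // streaming so the whole file never has to sit in memory at once.
  def countLines(in: InputStream): Long = {
    val reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))
    var n = 0L
    while (reader.readLine() != null) n += 1
    n
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical input location; pass the real one as the first argument.
    val inputPath = if (args.nonEmpty) args(0) else "hdfs:///path/to/input"
    // The master URL is assumed to be supplied by spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("whole-file-per-record"))
    try {
      // One (path, PortableDataStream) pair per file; a file is never split.
      val perFile = sc.binaryFiles(inputPath).map { case (path, stream) =>
        val in = stream.open() // DataInputStream over the whole file
        try { (path, countLines(in)) } finally { in.close() }
      }
      perFile.collect().foreach { case (path, n) => println(s"$path: $n lines") }
    } finally {
      sc.stop()
    }
  }
}
```

Note that binaryFiles makes each file one record, but several files can still share a partition, so this gives whole-file processing rather than strictly one partition per file.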

Re: How to Take the whole file as a partition

2015-09-03 Thread Tao Lu
Your situation is special. It seems to me Spark may not fit well in your case. You want to process the individual files (500M~2G) as a whole, and you want good performance. You may want to write your own Scala/Java programs, distribute them along with those files across your cluster, and run them in p