Re: How to speed up reading from file?

2015-10-16 Thread Xiao Li
Hi, Saif,

The optimal number of rows per partition depends on many factors, for
example your row size, your file system configuration, your replication
configuration, and the performance of your underlying hardware. The best
approach is to run performance tests and tune your configuration based on
the results. Generally, if each batch contains just a few MB, performance
is worse than with bigger batches.
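
As a rough starting point for that testing, the partition count can be
derived from the total data size and a target size per partition. The
sketch below is only an illustration: the helper name
estimate_partitions and the 128 MB default target are assumptions, not
values from this thread, and the right target still has to come from your
own measurements.

```python
def estimate_partitions(num_rows, avg_row_bytes,
                        target_partition_bytes=128 * 1024 * 1024):
    """Return a starting partition count for a dataset.

    Hypothetical helper: the 128 MB default is an assumed target size,
    to be replaced by whatever your own benchmarks favour.
    """
    if num_rows < 0 or avg_row_bytes <= 0 or target_partition_bytes <= 0:
        raise ValueError("sizes must be positive and num_rows non-negative")
    total_bytes = num_rows * avg_row_bytes
    # Integer ceiling division, so a partial partition still counts as one.
    partitions = -(-total_bytes // target_partition_bytes)
    # Always keep at least one partition, even for tiny or empty inputs.
    return max(1, partitions)
```

In Spark, the resulting number could then be passed to something like
df.repartition(n) before writing, and adjusted up or down after measuring
read times.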

The following paper on the performance of Spark and MR (MapReduce) might
help you understand your use case:
http://www.vldb.org/pvldb/vol8/p2110-shi.pdf
For example, caching may be an option in your system.

Good luck,

Xiao Li

2015-10-16 14:08 GMT-07:00 :

> Hello,
>
> Is there an optimal number of partitions per number of rows when writing
> to disk, so that we can later re-read from the source in a distributed way?
> Any thoughts?
>
> Thanks
> Saif
>
>


How to speed up reading from file?

2015-10-16 Thread Saif.A.Ellafi
Hello,

Is there an optimal number of partitions per number of rows when writing to
disk, so that we can later re-read from the source in a distributed way?
Any thoughts?

Thanks
Saif