This depends on whether you give the driver information about the underlying
RDBMS table. Assuming the table has a unique numeric ID column, you can use
that column to partition the load across executors.
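To answer the numPartitions question directly: with numPartitions = 10 and a partition column plus lower/upper bounds, Spark issues up to 10 concurrent JDBC queries, one per partition, each constrained by a WHERE clause on that column. Below is a simplified, illustrative sketch of how those per-partition predicates are derived (modelled on Spark's JDBCRelation.columnPartition logic; the exact implementation differs in detail):

```python
# Simplified sketch of how Spark's JDBC source turns
# lowerBound/upperBound/numPartitions into per-partition WHERE clauses.
# Each clause backs one task, i.e. one concurrent JDBC read.
def column_partition(column, lower, upper, num_partitions):
    stride = (upper - lower) // num_partitions
    clauses = []
    current = lower
    for i in range(num_partitions):
        lbound = f"{column} >= {current}" if i > 0 else None
        current += stride
        ubound = f"{column} < {current}" if i < num_partitions - 1 else None
        if lbound and ubound:
            clauses.append(f"{lbound} AND {ubound}")
        elif lbound:
            clauses.append(lbound)
        else:
            # the first partition also picks up NULL keys
            clauses.append(f"{ubound} OR {column} IS NULL")
    return clauses

preds = column_partition("id", 0, 100, 10)
print(len(preds))   # 10 predicates -> up to 10 parallel reads
print(preds[0])     # id < 10 OR id IS NULL
print(preds[-1])    # id >= 90
```

In real Spark code you would not build these clauses yourself; you would pass `partitionColumn`, `lowerBound`, `upperBound` and `numPartitions` as options to `spark.read.format("jdbc")`, and Spark generates equivalent predicates internally. The column names and bounds above are made up for illustration.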

Have a look at this

http://metricbrew.com/get-data-from-databases-with-apache-spark-jdbc/

HTH

Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 4 April 2016 at 09:09, Zhang, Jingyu <jingyu.zh...@news.com.au> wrote:

> Hi All,
>
> I want to read MySQL from Spark. Please let me know how many threads will
> be used to read the RDBMS after setting numPartitions = 10 in Spark JDBC.
> What is the best practice for reading a large dataset from an RDBMS into
> Spark?
>
> Thanks,
>
> Jingyu
>
