Use it like this. You can set up all of the JDBC partitioning properties (driver, partitionColumn, lowerBound, upperBound, numPartitions), but start with the driver first: use it to look up the maximum id in the table.
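A minimal sketch of that first step, reusing the hypothetical connection details, table name, and id column from the snippet below, and pushing a MAX query down to MySQL as a subquery in dbtable:

maxIdDf = sqlContext.read.format('jdbc').options(
    url="jdbc:mysql://ip-address:3306/sometable?user=username&password=password",
    # a derived table must be aliased; MySQL computes the MAX, not Spark
    dbtable="(SELECT MAX(id) AS max_id FROM sometable) AS tmp",
    driver="com.mysql.jdbc.Driver"
).load()
maxId = maxIdDf.first()[0]  # single-row, single-column result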
Now that you have the maximum id, you can use it for the upperBound parameter. Choose numPartitions based on the size of your table and the resources of the system you are running on. With this snippet you read a database table into a DataFrame with Spark:

df = sqlContext.read.format('jdbc').options(
    url="jdbc:mysql://ip-address:3306/sometable?user=username&password=password",
    dbtable="sometable",
    driver="com.mysql.jdbc.Driver",
    partitionColumn="id",
    lowerBound="1",
    upperBound=str(maxId),
    numPartitions="100"
).load()

Regards
Sanjiv Singh
Mob : +091 9990-447-339

On Wed, Aug 10, 2016 at 6:35 AM, Siva A <siva9940261...@gmail.com> wrote:
> Hi Team,
>
> How do we increase the parallelism in Spark SQL?
> In Spark Core, we can re-partition or pass extra arguments as part of the transformation.
>
> I am trying the below example:
>
> val df1 = sqlContext.read.format("jdbc").options(Map(...)).load
> val df2 = df1.cache
> df2.count
>
> Here the count operation uses only one task. I couldn't increase the parallelism.
> Thanks in advance
>
> Thanks
> Siva
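To confirm the parallelism Siva asked about, a hedged sketch, assuming df was loaded with the partitioning options above:

# Each of the numPartitions JDBC partitions becomes one Spark partition,
# so the count should now run one task per partition instead of a single task.
print(df.rdd.getNumPartitions())  # expected: 100, matching numPartitions
df.count()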