Use it like this. You can set up all of the JDBC partitioning properties (driver, partitionColumn, lowerBound, upperBound, numPartitions), but start with the driver first: use it to look up the maximum id in the table.
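A minimal sketch of that first step, reusing the hypothetical connection details, table name, and id column from the snippet below, and pushing a MAX query down to MySQL as a subquery in dbtable:

maxIdDf = sqlContext.read.format('jdbc').options(
    url="jdbc:mysql://ip-address:3306/sometable?user=username&password=password",
    # a derived table must be aliased; MySQL computes the MAX, not Spark
    dbtable="(SELECT MAX(id) AS max_id FROM sometable) AS tmp",
    driver="com.mysql.jdbc.Driver"
).load()
maxId = maxIdDf.first()[0]  # single-row, single-column result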
Now that you have the maximum id, you can use it for the upperBound parameter. Choose numPartitions based on the size of your table and the resources of the system you are running on. With this snippet you read a database table into a DataFrame with Spark:

df = sqlContext.read.format('jdbc').options(
    url="jdbc:mysql://ip-address:3306/sometable?user=username&password=password",
    dbtable="sometable",
    driver="com.mysql.jdbc.Driver",
    partitionColumn="id",
    lowerBound="1",
    upperBound=str(maxId),
    numPartitions="100"
).load()

Regards
Sanjiv Singh
Mob : +091 9990-447-339

On Wed, Aug 10, 2016 at 6:35 AM, Siva A <siva9940261...@gmail.com> wrote:
> Hi Team,
>
> How do we increase the parallelism in Spark SQL?
> In Spark Core, we can re-partition or pass extra arguments as part of the transformation.
>
> I am trying the below example:
>
> val df1 = sqlContext.read.format("jdbc").options(Map(...)).load
> val df2 = df1.cache
> df2.count
>
> Here the count operation uses only one task. I couldn't increase the parallelism.
> Thanks in advance
>
> Thanks
> Siva
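To confirm the parallelism Siva asked about, a hedged sketch, assuming df was loaded with the partitioning options above:

# Each of the numPartitions JDBC partitions becomes one Spark partition,
# so the count should now run one task per partition instead of a single task.
print(df.rdd.getNumPartitions())  # expected: 100, matching numPartitions
df.count()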