Re: Parallelizing multiple RDD / DataFrame creation in Spark

2015-07-08 Thread ayan guha
Do you have a benchmark showing that running these two statements as is will be slower than what you suggest?

On 9 Jul 2015 01:06, "Brandon White" wrote:
> The point of running them in parallel would be faster creation of the
> tables. Has anybody been able to efficiently parallelize something like

Re: Parallelizing multiple RDD / DataFrame creation in Spark

2015-07-08 Thread Srikanth
Your tableLoad() APIs are not actions. The files will be read fully only when an action is performed. If the action is something like table1.join(table2), then I think both files will be read in parallel. Can you try that and look at the execution plan? In 1.4 this is also shown in the Spark UI.

Srikanth

On

Re: Parallelizing multiple RDD / DataFrame creation in Spark

2015-07-08 Thread Brandon White
The point of running them in parallel would be faster creation of the tables. Has anybody been able to efficiently parallelize something like this in Spark?

On Jul 8, 2015 12:29 AM, "Akhil Das" wrote:
> What's the point of creating them in parallel? You can multi-thread it and run
> it in parallel tho

Re: Parallelizing multiple RDD / DataFrame creation in Spark

2015-07-08 Thread Ashish Dutt
Thank you, Akhil, for the link.

Sincerely,
Ashish Dutt
PhD Candidate
Department of Information Systems
University of Malaya, Lembah Pantai, 50603 Kuala Lumpur, Malaysia

On Wed, Jul 8, 2015 at 3:43 PM, Akhil Das wrote:
> Have a look
> http://alvinalexander.com/scala/how-to-create-java-thread-runn

Re: Parallelizing multiple RDD / DataFrame creation in Spark

2015-07-08 Thread Akhil Das
Have a look at http://alvinalexander.com/scala/how-to-create-java-thread-runnable-in-scala, then create two threads and call thread1.start() and thread2.start().

Thanks
Best Regards

On Wed, Jul 8, 2015 at 1:06 PM, Ashish Dutt wrote:
> Thanks for your reply Akhil.
> How do you multithread it?
>
> Sincerely,
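
The two-thread suggestion above can be sketched roughly as follows. This is only an illustration, not code from the thread: loadTable1() and loadTable2() here are hypothetical placeholders that just record that they ran, standing in for the Spark calls from the original post. Threads help only to the extent that each load actually triggers work on the cluster.

```scala
object ParallelLoads {
  // Thread-safe set recording which placeholder loads have run.
  val loaded = java.util.concurrent.ConcurrentHashMap.newKeySet[String]()

  // Hypothetical stand-ins for loadTable1()/loadTable2() in the original post.
  // In a real job each body would hold the sqlContext.jsonFile(...),
  // cache() and registerTempTable(...) calls instead.
  def loadTable1(): Unit = loaded.add("table1")
  def loadTable2(): Unit = loaded.add("table2")

  def runInParallel(): Unit = {
    val thread1 = new Thread(new Runnable { def run(): Unit = loadTable1() })
    val thread2 = new Thread(new Runnable { def run(): Unit = loadTable2() })
    // Start both loads, then wait for both to finish before continuing.
    thread1.start()
    thread2.start()
    thread1.join()
    thread2.join()
  }
}
```

Calling ParallelLoads.runInParallel() returns once both loads have completed; the join() calls are what keep later code from querying a table that is not registered yet.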

Re: Parallelizing multiple RDD / DataFrame creation in Spark

2015-07-08 Thread Ashish Dutt
Thanks for your reply, Akhil. How do you multithread it?

Sincerely,
Ashish Dutt

On Wed, Jul 8, 2015 at 3:29 PM, Akhil Das wrote:
> What's the point of creating them in parallel? You can multi-thread it and run
> it in parallel, though.
>
> Thanks
> Best Regards
>
> On Wed, Jul 8, 2015 at 5:34 AM, Bra

Re: Parallelizing multiple RDD / DataFrame creation in Spark

2015-07-08 Thread Akhil Das
What's the point of creating them in parallel? You can multi-thread it and run it in parallel, though.

Thanks
Best Regards

On Wed, Jul 8, 2015 at 5:34 AM, Brandon White wrote:
> Say I have a spark job that looks like following:
>
> def loadTable1() {
> val table1 = sqlContext.jsonFile(s"s3://textfi
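
One possible way to multi-thread the two loads from the driver is scala.concurrent.Future, sketched below. This is an assumption about how one might do it, not something proposed in the thread: loadTable1() and loadTable2() are hypothetical placeholders that only return a table name, the global execution context is used for simplicity, and the 10-minute timeout is an arbitrary choice.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureLoads {
  // Hypothetical placeholders: a real version would run the
  // sqlContext.jsonFile(...) / cache() / registerTempTable(...) calls
  // and return the name of the table it registered.
  def loadTable1(): String = "table1"
  def loadTable2(): String = "table2"

  def loadAll(): Seq[String] = {
    // Both futures are created (and so start running) before either is awaited.
    val f1 = Future(loadTable1())
    val f2 = Future(loadTable2())
    // Combine them and block until both complete; a failure in either load
    // is rethrown here.
    Await.result(Future.sequence(Seq(f1, f2)), 10.minutes)
  }
}
```

Compared with explicit threads, this form propagates an exception from either load to the caller and returns the results in the order the futures were listed.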

Parallelizing multiple RDD / DataFrame creation in Spark

2015-07-07 Thread Brandon White
Say I have a Spark job that looks like the following:

def loadTable1() {
  val table1 = sqlContext.jsonFile(s"s3://textfiledirectory/")
  table1.cache().registerTempTable("table1")
}

def loadTable2() {
  val table2 = sqlContext.jsonFile(s"s3://testfiledirectory2/")
  table2.cache().registerTempTable("t