Do you have a benchmark showing that running these two statements as they are is slower than what you suggest?

On 9 Jul 2015 01:06, "Brandon White" <bwwintheho...@gmail.com> wrote:
> The point of running them in parallel would be faster creation of the
> tables. Has anybody been able to efficiently parallelize something like
> this in Spark?
>
> On Jul 8, 2015 12:29 AM, "Akhil Das" <ak...@sigmoidanalytics.com> wrote:
>
>> What's the point of creating them in parallel? You can multi-thread it
>> and run it in parallel, though.
>>
>> Thanks
>> Best Regards
>>
>> On Wed, Jul 8, 2015 at 5:34 AM, Brandon White <bwwintheho...@gmail.com>
>> wrote:
>>
>>> Say I have a Spark job that looks like the following:
>>>
>>> def loadTable1() {
>>>   val table1 = sqlContext.jsonFile(s"s3://textfiledirectory/")
>>>   table1.cache().registerTempTable("table1")
>>> }
>>>
>>> def loadTable2() {
>>>   val table2 = sqlContext.jsonFile(s"s3://testfiledirectory2/")
>>>   table2.cache().registerTempTable("table2")
>>> }
>>>
>>> def loadAllTables() {
>>>   loadTable1()
>>>   loadTable2()
>>> }
>>>
>>> loadAllTables()
>>>
>>> How do I parallelize this Spark job so that both tables are created at
>>> the same time or in parallel?
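
For reference, the multi-threading suggestion quoted above could be sketched with plain Scala Futures, roughly as below. This is only a sketch, not a benchmarked solution: `ParallelLoad` and `runAll` are hypothetical names, and the loaders are passed in as ordinary functions so the snippet does not depend on Spark itself. Whether it is actually faster would still need measuring. As far as I know, `jsonFile` scans the input to infer the schema when none is supplied, while `cache()` is lazy, so most of the caching work is deferred until the first action on each table either way.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

// Hypothetical helper (not part of Spark): runs each loader on its own thread.
object ParallelLoad {
  // Submits every loader to a dedicated thread pool and blocks until all
  // of them have finished. A failure in any loader is rethrown by Await.result.
  def runAll(loaders: Seq[() => Unit]): Unit = {
    val pool = Executors.newFixedThreadPool(math.max(1, loaders.size))
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      val futures = loaders.map(load => Future(load()))
      Await.result(Future.sequence(futures), Duration.Inf)
    } finally {
      pool.shutdown() // release the threads once the loaders are done
    }
  }
}
```

With the functions from the original question, the call would look like `ParallelLoad.runAll(Seq(() => loadTable1(), () => loadTable2()))`, assuming both share the same `sqlContext`.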