Re: Parallelizing multiple RDD / DataFrame creation in Spark

ayan guha Wed, 08 Jul 2015 08:33:13 -0700

Do you have a benchmark showing that running these two statements as they
are will be slower than what you suggest?
On 9 Jul 2015 01:06, "Brandon White" <bwwintheho...@gmail.com> wrote:


> The point of running them in parallel would be faster creation of the
> tables. Has anybody been able to efficiently parallelize something like
> this in Spark?
> On Jul 8, 2015 12:29 AM, "Akhil Das" <ak...@sigmoidanalytics.com> wrote:
>
>> What's the point of creating them in parallel? You can multi-thread it to
>> run it in parallel, though.
>>
>> Thanks
>> Best Regards
>>
>> On Wed, Jul 8, 2015 at 5:34 AM, Brandon White <bwwintheho...@gmail.com>
>> wrote:
>>
>>> Say I have a Spark job that looks like the following:
>>>
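>>> def loadTable1(): Unit = {
>>>   val table1 = sqlContext.jsonFile("s3://textfiledirectory/")
>>>   // cache() is lazy: the data is cached when the table is first queried.
>>>   table1.cache().registerTempTable("table1")
>>> }
>>>
>>> def loadTable2(): Unit = {
>>>   val table2 = sqlContext.jsonFile("s3://testfiledirectory2/")
>>>   table2.cache().registerTempTable("table2")
>>> }
>>>
>>> // Runs sequentially: loadTable2 starts only after loadTable1 returns.
>>> def loadAllTables(): Unit = {
>>>   loadTable1()
>>>   loadTable2()
>>> }
>>>
>>> loadAllTables()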
>>>
>>> How do I parallelize this Spark job so that both tables are created at
>>> the same time or in parallel?
>>>
>>
>>
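A minimal sketch of the multi-threading Akhil suggests, using plain Scala
Futures. `ParallelLoad`, `runInParallel` and `timeMillis` are hypothetical
names, not Spark API. The sketch relies on Spark's documented support for
submitting jobs to one SparkContext from several threads, and it assumes that
calling registerTempTable from several threads is safe in your Spark version.

```scala
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

// Hypothetical helper, not part of Spark: runs independent blocking
// actions (e.g. the loadTableN functions) on separate threads.
object ParallelLoad {

  // Starts every task in its own Future and waits until all have finished.
  // `blocking` tells the global pool that the task may block, so it can add
  // threads instead of running the tasks one after another.
  // If a task throws, Await.result rethrows the failure here.
  def runInParallel(tasks: (() => Unit)*): Unit = {
    val futures = tasks.map(task => Future(blocking(task())))
    Await.result(Future.sequence(futures), Duration.Inf)
  }

  // Wall-clock time of `body` in milliseconds, for a rough benchmark.
  def timeMillis(body: => Unit): Long = {
    val start = System.nanoTime()
    body
    (System.nanoTime() - start) / 1000000L
  }
}
```

With loadTable1, loadTable2 and loadAllTables from the question in scope
(e.g. in spark-shell), comparing `ParallelLoad.timeMillis(loadAllTables())`
with `ParallelLoad.timeMillis(ParallelLoad.runInParallel(() => loadTable1(),
() => loadTable2()))` would give a rough version of the benchmark asked about
above. Two caveats: cache() is lazy, so neither timing includes the caching
itself; and Spark schedules concurrent jobs FIFO by default, so the second
load only gets the executors the first leaves free (setting
spark.scheduler.mode to FAIR shares them more evenly).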
