Re: Broadcasting a parquet file using spark and python

Jitesh chandra Mishra Tue, 31 Mar 2015 21:38:28 -0700

Hi Michael,

Thanks for your response. I am running Spark 1.2.1.


Is there any workaround to achieve the same with 1.2.1?

Thanks,
Jitesh

On Wed, Apr 1, 2015 at 12:25 AM, Michael Armbrust <mich...@databricks.com>
wrote:

> In Spark 1.3 I would expect this to happen automatically when the parquet
> table is small (under 10 MB by default, configurable with
> spark.sql.autoBroadcastJoinThreshold).
> If you are running 1.3 and not seeing this, can you show the code you are
> using to create the table?
>
> On Tue, Mar 31, 2015 at 3:25 AM, jitesh129 <jitesh...@gmail.com> wrote:
>
>> How can we implement a BroadcastHashJoin in Spark with Python?
>>
>> My Spark SQL inner joins are taking a lot of time because they are being
>> executed as ShuffledHashJoin.
>>
>> The tables being joined are stored as parquet files.
>>
>> Please help.
>>
>> Thanks and regards,
>> Jitesh
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Broadcasting-a-parquet-file-using-spark-and-python-tp22315.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
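
For readers following the thread: the difference between the two join strategies discussed above is that a broadcast hash join ships the small table to every worker as an in-memory lookup, so the large table never has to be shuffled. The sketch below shows only that idea in plain Python, not Spark's implementation. The function names, the "id" join key, and the sample rows are hypothetical and stand in for the two parquet tables.

```python
from collections import defaultdict


def build_lookup(small_rows, key):
    """Index the small table by join key. This dict is what would be broadcast."""
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)
    return dict(lookup)


def probe_partition(large_rows, lookup, key):
    """Inner-join one partition of the large table against the in-memory lookup."""
    for row in large_rows:
        # Rows with no match in the small table are dropped (inner join).
        for match in lookup.get(row[key], []):
            merged = dict(match)
            merged.update(row)
            yield merged


# Hypothetical sample data standing in for the small and large tables.
small = [{"id": 1, "country": "IN"}, {"id": 2, "country": "US"}]
large = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}, {"id": 3, "amount": 30}]

lookup = build_lookup(small, "id")
joined = list(probe_partition(large, lookup, "id"))
```

On 1.2.1, one possible way to apply this by hand would be to collect the small table on the driver, wrap the lookup with sc.broadcast(...), and call probe_partition from rdd.mapPartitions on the large table, reading the dict through the broadcast variable's .value. That wiring is an assumption on my part and was not confirmed in this thread; it only makes sense if the small table fits comfortably in memory on the driver and on each executor.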
