To: Jitesh chandra Mishra
Cc: user
Subject: Re: Broadcasting a parquet file using spark and python
You will need to create a Hive Parquet table that points to the data and run
"ANALYZE TABLE tableName COMPUTE STATISTICS noscan" so that we have statistics
on the size.
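
A rough sketch of those steps from Python, in case it helps. It only builds the SQL strings and passes them to whatever context object you give it, so the table name, columns, and location below are hypothetical placeholders, not anything from the original question. It assumes a Hive-enabled context (for example pyspark's HiveContext) whose sql() method accepts these statements, and it uses the full "ANALYZE TABLE ... COMPUTE STATISTICS noscan" form given in the Spark SQL documentation. The threshold setting is spark.sql.autoBroadcastJoinThreshold; 10 MB is used here as an illustrative value.

```python
def broadcast_join_statements(table, columns, location,
                              threshold_bytes=10 * 1024 * 1024):
    """Build the SQL statements for the setup described above.

    Nothing is executed here; the caller decides where to run them.
    `columns` is a list of (name, sql_type) pairs (hypothetical schema).
    """
    column_ddl = ", ".join(
        "{} {}".format(name, sql_type) for name, sql_type in columns
    )
    return [
        # Hive table definition pointing at the existing Parquet data.
        "CREATE EXTERNAL TABLE IF NOT EXISTS {} ({}) "
        "STORED AS PARQUET LOCATION '{}'".format(table, column_ddl, location),
        # Record the table size in the metastore without scanning the data.
        "ANALYZE TABLE {} COMPUTE STATISTICS noscan".format(table),
        # Tables smaller than this many bytes are candidates for broadcast.
        "SET spark.sql.autoBroadcastJoinThreshold={}".format(threshold_bytes),
    ]


def run_statements(sql_context, statements):
    """Run each statement through the context's sql() method, in order."""
    for statement in statements:
        sql_context.sql(statement)


# Example use (assumes pyspark and an existing SparkContext `sc`):
#   from pyspark.sql import HiveContext
#   run_statements(HiveContext(sc), broadcast_join_statements(
#       "small_dim", [("id", "BIGINT"), ("name", "STRING")], "/data/small_dim"))
```

After this, calling explain() on the joined result should show whether the planner picked BroadcastHashJoin or still falls back to ShuffledHashJoin.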
On Tue, Mar 31, 2015 at 9:36 PM, Jite
> Regards,
>
> Shuai
>
> *From:* Michael Armbrust [mailto:mich...@databricks.com]
> *Sent:* Wednesday, April 01, 2015 2:01 PM
> *To:* Jitesh chandra Mishra
> *Cc:* user
> *Subject:* Re: Broadcasting a parquet file using spark and python
>
>>> ... BroadcastHashJoin for Spark with Python?
>>>
>>> My Spark SQL inner joins are taking a lot of time because they are being
>>> executed as ShuffledHashJoin.
>>>
>>> The tables being joined are stored as Parquet files.
>>>
>>> Please help.
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Broadcasting-a-parquet-file-using-spark-and-python-tp22315.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
> Thanks and regards,
> Jitesh