Dima Zhiyanov created SPARK-6072:
Summary: Enable hash joins for nullable columns
Key: SPARK-6072
URL: https://issues.apache.org/jira/browse/SPARK-6072
Project: Spark
Issue Type: Improvement
Hello
Question regarding the new DataFrame API introduced here
https://databricks.com/blog/2015/02/17/introducing-dataframes-in-spark-for-large-scale-data-science.html
I often use the zipWithUniqueId method of the SchemaRDD (as an RDD) to
replace string keys with more efficient long keys. Wo
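For context, zipWithUniqueId assigns ids without launching a Spark job: the i-th item of partition k (out of n partitions) gets id k + i*n, so ids are unique but not consecutive. A plain-Python sketch of that contract, used the way the email describes (mapping string keys to compact long keys); the example partitioning and key names are illustrative, not from the original message:

```python
def zip_with_unique_id(partitions):
    """Mimic RDD.zipWithUniqueId: the i-th item of partition k
    (out of n partitions) gets id k + i*n."""
    n = len(partitions)
    return [
        [(item, k + i * n) for i, item in enumerate(part)]
        for k, part in enumerate(partitions)
    ]

# Replace string keys with long keys: build a string -> long lookup.
parts = [["apple", "banana"], ["cherry"]]
key_map = {s: uid for part in zip_with_unique_id(parts) for s, uid in part}
# "apple" -> 0, "cherry" -> 1, "banana" -> 2
```

Note the ids are guessable from partition layout alone, which is why zipWithUniqueId is cheaper than zipWithIndex (the latter needs an extra pass to count partition sizes).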
Dima Zhiyanov created SPARK-5919:
Summary: Enable broadcast joins for Parquet files
Key: SPARK-5919
URL: https://issues.apache.org/jira/browse/SPARK-5919
Project: Spark
Issue Type
Hello
Has Spark implemented computing statistics for Parquet files? Or is there
any other way I can enable broadcast joins between Parquet file RDDs in
Spark SQL?
Thanks
Dima
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-do-broadcast-join-in-Sp
Sent from my iPhone
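For what it's worth, Spark SQL of that era gated broadcast (map-side) joins on the spark.sql.autoBroadcastJoinThreshold setting, which is compared against table-size statistics; whether Parquet-backed tables report such statistics is exactly the open question above. A hedged sketch of the knobs involved (the threshold value and table name are illustrative; ANALYZE TABLE applies to Hive-managed tables, and Parquet-as-RDD paths may lack statistics entirely):

```sql
-- Size limit (bytes) under which a table is broadcast; 10 MB here.
SET spark.sql.autoBroadcastJoinThreshold=10485760;

-- For Hive-managed tables, populate statistics so the planner
-- can see that the table is small enough to broadcast:
ANALYZE TABLE small_table COMPUTE STATISTICS noScan;
```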
Yes
Sent from my iPhone
> On Aug 5, 2014, at 7:38 AM, "Dima Zhiyanov [via Apache Spark User List]"
> wrote:
I am also experiencing this Kryo buffer problem. My join is a left outer
join with under 40 MB on the right side. I would expect the broadcast join
to succeed in this case (Hive did).
Another problem is that the optimizer chose a nested loop join for some
reason; I would expect a broadcast (map-side) hash join.
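The Kryo buffer error mentioned here is typically addressed by enlarging the serializer buffer ceiling. A sketch of the relevant spark-defaults.conf entries (property names as in Spark 1.x, where the value was an integer number of megabytes; later releases renamed it to spark.kryoserializer.buffer.max and accept a size suffix such as "256m" — the value below is illustrative):

```properties
# spark-defaults.conf
spark.serializer                    org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max.mb  256
```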