DataFrame: Enable zipWithUniqueId

2015-02-20 Thread Dima Zhiyanov
Hello

Question regarding the new DataFrame API introduced here
https://databricks.com/blog/2015/02/17/introducing-dataframes-in-spark-for-large-scale-data-science.html

I often use the zipWithUniqueId method of SchemaRDD (as an RDD) to
replace string keys with more efficient long keys. Would it be possible to
use the same method on the new DataFrame class?

It looks like, unlike SchemaRDD, DataFrame does not extend RDD.
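
In the meantime, a minimal sketch of one possible workaround (assuming a
Spark 1.3-style SQLContext named sqlContext and an existing DataFrame df,
both stand-in names) is to drop to the underlying RDD[Row], zip, and
rebuild a DataFrame:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{LongType, StructField, StructType}

    // Zip every row with a unique Long and append it as an "id" column.
    val rowsWithId = df.rdd.zipWithUniqueId().map { case (row, id) =>
      Row.fromSeq(row.toSeq :+ id)
    }

    // Extend the original schema with the generated long key column.
    val schemaWithId = StructType(
      df.schema.fields :+ StructField("id", LongType, nullable = false))

    val dfWithId = sqlContext.createDataFrame(rowsWithId, schemaWithId)

The generated long keys can then stand in for the string keys in joins, at
the cost of one extra pass over the data.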

Thanks
Dima







Re: How to do broadcast join in SparkSQL

2015-02-12 Thread Dima Zhiyanov
Hello 

Has Spark implemented computing statistics for Parquet files? Or is there
any other way I can enable broadcast joins between Parquet-backed RDDs in
Spark SQL?
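
In the meantime, one knob that may help, as a hedged sketch (the 50 MB
figure is arbitrary): Spark SQL broadcasts the smaller side of a join when
its estimated size falls below spark.sql.autoBroadcastJoinThreshold, so
raising the threshold can trigger a broadcast join, provided the planner
has a size estimate for the relation at all:

    // Sketch: let relations estimated below ~50 MB be broadcast (the
    // default threshold is roughly 10 MB). Assumes a SQLContext named
    // sqlContext is in scope.
    sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold",
      (50 * 1024 * 1024).toString)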

Thanks 
Dima









Re: How to do broadcast join in SparkSQL

2015-02-11 Thread Dima Zhiyanov
Thank you!

The Hive solution seemed more like a workaround. I was wondering whether
native Spark SQL support for computing statistics for Parquet files will
become available.
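
For reference, a sketch of the explicit hint that newer Spark versions
ship in org.apache.spark.sql.functions (the hint postdates this thread,
and small and large are hypothetical DataFrame names):

    import org.apache.spark.sql.functions.broadcast

    // Force the small side to be broadcast regardless of statistics.
    val joined = large.join(broadcast(small), "key")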

Dima



Sent from my iPhone

> On Feb 11, 2015, at 3:34 PM, Ted Yu  wrote:
> 
> See earlier thread:
> http://search-hadoop.com/m/JW1q5BZhf92
> 
>> On Wed, Feb 11, 2015 at 3:04 PM, Dima Zhiyanov  
>> wrote:
>> Hello
>> 
>> Has Spark implemented computing statistics for Parquet files? Or is there
>> any other way I can enable broadcast joins between Parquet-backed RDDs in
>> Spark SQL?
>> 
>> Thanks
>> Dima





Re: spark sql left join gives KryoException: Buffer overflow

2014-08-05 Thread Dima Zhiyanov
Yes

Sent from my iPhone

> On Aug 5, 2014, at 7:38 AM, "Dima Zhiyanov [via Apache Spark User List]" 
>  wrote:
> 
> I am also experiencing this Kryo buffer problem. My join is a left outer
> join with under 40 MB on the right side. I would expect the broadcast join
> to succeed in this case (Hive did).
> Another problem is that the optimizer chose a nested loop join for some
> reason; I would expect a broadcast (map-side) hash join.
> Am I correct in my expectations?
> 
> 





Re: spark sql left join gives KryoException: Buffer overflow

2014-08-05 Thread Dima Zhiyanov
I am also experiencing this Kryo buffer problem. My join is a left outer
join with under 40 MB on the right side. I would expect the broadcast join
to succeed in this case (Hive did).
Another problem is that the optimizer chose a nested loop join for some
reason; I would expect a broadcast (map-side) hash join.
Am I correct in my expectations?
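
A sketch of the usual mitigation, with hedges: enlarge Kryo's maximum
serialization buffer. The property spelling below is the pre-1.4 one
(newer versions use spark.kryoserializer.buffer.max with a size suffix),
and 128 MB is an arbitrary illustrative value:

    import org.apache.spark.SparkConf

    // Enlarge Kryo's maximum buffer so serializing the ~40 MB broadcast
    // side does not overflow the default.
    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryoserializer.buffer.max.mb", "128")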



