DataFrame: Enable zipWithUniqueId
Hello,

Question regarding the new DataFrame API introduced here: https://databricks.com/blog/2015/02/17/introducing-dataframes-in-spark-for-large-scale-data-science.html

I often use the zipWithUniqueId method of the SchemaRDD (as an RDD) to replace string keys with more efficient long keys. Would it be possible to use the same method on the new DataFrame class? It looks like, unlike SchemaRDD, DataFrame does not extend RDD.

Thanks,
Dima

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/DataFrame-Enable-zipWithUniqueId-tp21733.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
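[Editor's note: one workaround is that a DataFrame still exposes its underlying RDD via `df.rdd`, so one can drop to the RDD, call `zipWithUniqueId`, and rebuild a DataFrame. Per the RDD API docs, `zipWithUniqueId` gives the i-th item of partition k the id `i * numPartitions + k` (unique but not consecutive). A plain-Python sketch of that id scheme, with made-up partition data, showing how it yields a string-to-long key dictionary:]

```python
def zip_with_unique_id(partitions):
    """Mimic RDD.zipWithUniqueId: the i-th item of partition k
    receives id i * n + k, where n is the number of partitions.
    Ids are unique across partitions but not consecutive."""
    n = len(partitions)
    return [
        [(item, i * n + k) for i, item in enumerate(part)]
        for k, part in enumerate(partitions)
    ]

# Replace string keys with compact long ids (the use case above).
# Partition contents are illustrative only.
partitions = [["apple", "banana"], ["cherry"]]
with_ids = zip_with_unique_id(partitions)
key_to_id = {key: uid for part in with_ids for key, uid in part}
print(key_to_id)  # {'apple': 0, 'banana': 2, 'cherry': 1}
```

Because the id depends only on a row's partition and position, no shuffle or global counter is needed, which is why Spark can assign the ids in a single pass.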
Re: How to do broadcast join in SparkSQL
Hello,

Has Spark implemented computing statistics for Parquet files? Or is there any other way I can enable broadcast joins between Parquet file RDDs in Spark SQL?

Thanks,
Dima

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-do-broadcast-join-in-SparkSQL-tp15298p21632.html
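[Editor's note: the planner picks a broadcast join when the smaller side's estimated size falls below `spark.sql.autoBroadcastJoinThreshold`; without statistics for the input (as with plain Parquet files at the time), that estimate may be unavailable and the optimizer falls back to a shuffle or nested loop join. Mechanically, a broadcast join ships the small table to every task and probes a hash map built from it. A single-process Python sketch of that probe step, with made-up table contents:]

```python
def broadcast_hash_join(large, small, key):
    """Map-side (broadcast) hash join: build a hash map over the
    small side once, then stream the large side and probe it.
    Spark runs this per partition after broadcasting the small table."""
    lookup = {}
    for row in small:
        lookup.setdefault(row[key], []).append(row)
    out = []
    for row in large:
        for match in lookup.get(row[key], []):
            merged = dict(row)
            merged.update(match)
            out.append(merged)
    return out

# Illustrative data (made up):
facts = [{"user_id": 1, "amount": 10}, {"user_id": 2, "amount": 5}]
dims = [{"user_id": 1, "name": "alice"}]
print(broadcast_hash_join(facts, dims, "user_id"))
# [{'user_id': 1, 'amount': 10, 'name': 'alice'}]
```

The large side is never shuffled, which is the whole appeal: only the small side moves over the network.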
Re: How to do broadcast join in SparkSQL
Thank you! The Hive solution seemed more like a workaround. I was wondering whether native Spark SQL support for computing statistics for Parquet files would become available.

Dima

Sent from my iPhone

> On Feb 11, 2015, at 3:34 PM, Ted Yu wrote:
>
> See earlier thread:
> http://search-hadoop.com/m/JW1q5BZhf92
>
>> On Wed, Feb 11, 2015 at 3:04 PM, Dima Zhiyanov wrote:
>> Hello
>>
>> Has Spark implemented computing statistics for Parquet files? Or is there
>> any other way I can enable broadcast joins between parquet file RDDs in
>> Spark SQL?
>>
>> Thanks
>> Dima
Re: spark sql left join gives KryoException: Buffer overflow
Yes

Sent from my iPhone

> On Aug 5, 2014, at 7:38 AM, "Dima Zhiyanov [via Apache Spark User List]" wrote:
>
> I am also experiencing this Kryo buffer problem. My join is a left outer
> join with under 40 MB on the right side. I would expect the broadcast join
> to succeed in this case (Hive did). Another problem is that the optimizer
> chose a nested loop join for some reason; I would expect a broadcast
> (map-side) hash join. Am I correct in my expectations?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-sql-left-join-gives-KryoException-Buffer-overflow-tp10157p11433.html
Re: spark sql left join gives KryoException: Buffer overflow
I am also experiencing this Kryo buffer problem. My join is a left outer join with under 40 MB on the right side. I would expect the broadcast join to succeed in this case (Hive did). Another problem is that the optimizer chose a nested loop join for some reason; I would expect a broadcast (map-side) hash join. Am I correct in my expectations?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-sql-left-join-gives-KryoException-Buffer-overflow-tp10157p11432.html
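[Editor's note: a common mitigation for the KryoException itself was raising the serializer buffer limit (the `spark.kryoserializer.buffer.max.mb` setting in Spark of that era; the exact property name varies by version, so check the configuration docs). As for the expectation: yes, a sub-40 MB right side is normally a broadcast hash join candidate, and a left outer join is still expressible that way by padding unmatched left rows with nulls. A plain-Python sketch of that left-outer variant, with made-up data:]

```python
def left_outer_hash_join(left, right, key, null_row):
    """Left outer broadcast hash join: every left row appears at
    least once; left rows with no match on the right are padded
    with the fields of null_row."""
    lookup = {}
    for row in right:
        lookup.setdefault(row[key], []).append(row)
    out = []
    for row in left:
        matches = lookup.get(row[key])
        if matches:
            for m in matches:
                merged = dict(row)
                merged.update(m)
                out.append(merged)
        else:
            merged = dict(row)
            merged.update(null_row)
            out.append(merged)
    return out

# Illustrative data (made up):
orders = [{"id": 1, "sku": "a"}, {"id": 2, "sku": "b"}]
prices = [{"sku": "a", "price": 3}]
print(left_outer_hash_join(orders, prices, "sku", {"price": None}))
# [{'id': 1, 'sku': 'a', 'price': 3}, {'id': 2, 'sku': 'b', 'price': None}]
```

A nested loop join computes the same result in O(|left| x |right|) comparisons instead of one hash probe per left row, which is why the planner's choice here is surprising when the right side is small.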