Re: Did DataFrames break basic SQLContext?
> Now, I am not able to directly use my RDD object and have it implicitly
> become a DataFrame. It can be used as a DataFrameHolder, of which I could
> write: rdd.toDF.registerTempTable("foo")

The rationale here was that we added a lot of methods to DataFrame and made the implicits more powerful, but that increased the likelihood of accidental application of the implicit. I personally have had to explain the accidental application of implicits (and the confusing compiler messages that can result) to beginners so many times that we decided to remove the subtle conversion from RDD to DataFrame and instead make it an explicit method call.
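For readers following along, the explicit 1.3-style conversion looks roughly like this (a minimal spark-shell sketch; `sc` and `sqlContext` are the contexts the shell provides):

```scala
// Spark 1.3 spark-shell sketch: the RDD-to-DataFrame conversion is now an
// explicit .toDF call, enabled by importing the SQLContext's implicits.
import sqlContext.implicits._  // brings rddToDataFrameHolder into scope

case class Foo(x: Int)
val rdd = sc.parallelize(List(Foo(1)))

// Explicit conversion: RDD[Foo] -> DataFrameHolder -> DataFrame
rdd.toDF().registerTempTable("foo")
sqlContext.sql("SELECT x FROM foo").show()
```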
Did DataFrames break basic SQLContext?
I started to play with 1.3.0 and found that there are a lot of breaking changes. Previously, I could do the following:

case class Foo(x: Int)
val rdd = sc.parallelize(List(Foo(1)))
import sqlContext._
rdd.registerTempTable("foo")

Now, I am not able to directly use my RDD object and have it implicitly become a DataFrame. It can be used as a DataFrameHolder, of which I could write:

rdd.toDF.registerTempTable("foo")

But that is kind of a pain in comparison. The other problem for me is that I keep getting a SQLException:

java.sql.SQLException: Failed to start database 'metastore_db' with class loader sun.misc.Launcher$AppClassLoader@10393e97, see the next exception for details.

This seems to be a dependency on Hive, when previously (1.2.0) there was no such dependency. I can open tickets for these, but wanted to ask here first; maybe I am doing something wrong?

Thanks,
Justin

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Did-DataFrames-break-basic-SQLContext-tp22120.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
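If the objection is the implicit-import dance itself, Spark 1.3 also exposes the conversion as a plain method on SQLContext, so no implicits are needed at all (a sketch; `createDataFrame` infers the schema from an RDD of a case class):

```scala
// Alternative without importing implicits: call createDataFrame directly.
case class Foo(x: Int)
val rdd = sc.parallelize(List(Foo(1)))

// SQLContext.createDataFrame infers the schema from the case-class fields.
val df = sqlContext.createDataFrame(rdd)
df.registerTempTable("foo")
sqlContext.sql("SELECT x FROM foo").collect()
```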
Re: Did DataFrames break basic SQLContext?
It appears that the metastore_db problem is related to https://issues.apache.org/jira/browse/SPARK-4758. I had another shell open that was stuck. This is probably a bug, though?

import sqlContext.implicits._
case class Foo(x: Int)
val rdd = sc.parallelize(List(Foo(1)))
rdd.toDF

results in a frozen shell after this line:

INFO MetaStoreDirectSql: MySQL check failed, assuming we are not on mysql: Lexical error at line 1, column 5. Encountered: "@" (64), after : "".

which locks the internally created metastore_db.
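If the hang really is a second shell holding the embedded Derby metastore, killing the stuck shell and clearing Derby's leftover lock files usually frees it again. A sketch, assuming the default metastore_db directory created in the shell's working directory (db.lck and dbex.lck are Derby's lock-file names):

```shell
# Derby allows only one JVM to open the embedded metastore at a time.
# After killing the stuck shell, remove any stale lock files it left behind.
ls metastore_db/*.lck 2>/dev/null    # shows db.lck / dbex.lck if present
rm -f metastore_db/db.lck metastore_db/dbex.lck
```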
Re: Did DataFrames break basic SQLContext?
To answer your first question: yes, 1.3.0 did break backward compatibility with the change from SchemaRDD to DataFrame. Spark SQL was an alpha component, so API-breaking changes could happen. It is no longer an alpha component as of 1.3.0, so this will not be the case in the future. Adding toDF should hopefully not be too much of an effort.

For the second point: I have also seen these exceptions when upgrading jobs to 1.3.0, but they don't fail my jobs. Not sure what the cause is; it would be good to understand this.

--
Sent from Mailbox
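On the "adding toDF" point: one small upside of the new explicit route is that the DataFrameHolder's toDF also accepts column names, letting you rename columns during the conversion, which the old implicit path did not offer (a spark-shell sketch; `sc` and `sqlContext` assumed from the shell):

```scala
import sqlContext.implicits._

case class Foo(x: Int)
val rdd = sc.parallelize(List(Foo(1)))

// toDF() with no args keeps the case-class field names; with args it renames.
rdd.toDF().registerTempTable("foo")         // column name: x
rdd.toDF("value").registerTempTable("bar")  // column name: value
```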