Shouldn't df.select take just the column names, and sqlC.sql the SELECT statement?
Therefore perhaps we could use: df.select("COLUMN1", "COLUMN2") and
sqlC.sql("select COLUMN1, COLUMN2 from tablename").

Why would someone want to run a select on a DataFrame after registering it
as a table? I think we should be using HiveContext or SQLContext to run
queries on a registered table.

Regards,
Gourav Sengupta

On Sat, Dec 26, 2015 at 6:27 PM, Eugene Morozov <evgeny.a.moro...@gmail.com> wrote:

> Chris, thanks. That'd be great to try =)
>
> --
> Be well!
> Jean Morozov
>
> On Fri, Dec 25, 2015 at 10:50 PM, Chris Fregly <ch...@fregly.com> wrote:
>
>> oh, and it's worth noting that - starting with Spark 1.6 - you'll be able
>> to just do the following:
>>
>> SELECT * FROM json.`/path/to/json/file`
>>
>> (note the back ticks)
>>
>> instead of calling registerTempTable() for the sole purpose of using SQL.
>>
>> https://issues.apache.org/jira/browse/SPARK-11197
>>
>> On Fri, Dec 25, 2015 at 2:17 PM, Chris Fregly <ch...@fregly.com> wrote:
>>
>>> I assume by "The same code perfectly works through Zeppelin 0.5.5" that
>>> you're using the %sql interpreter with your regular SQL SELECT statement,
>>> correct?
>>>
>>> If so, the Zeppelin interpreter is converting the <sql-statement> that
>>> follows
>>>
>>> %sql
>>>
>>> to
>>>
>>> sqlContext.sql(<sql-statement>)
>>>
>>> per the following code:
>>>
>>> https://github.com/apache/incubator-zeppelin/blob/01f4884a3a971ece49d668a9783d6b705cf6dbb5/spark/src/main/java/org/apache/zeppelin/spark/SparkSqlInterpreter.java#L125
>>>
>>> https://github.com/apache/incubator-zeppelin/blob/01f4884a3a971ece49d668a9783d6b705cf6dbb5/spark/src/main/java/org/apache/zeppelin/spark/SparkSqlInterpreter.java#L140-L141
>>>
>>> Also, keep in mind that you can do something like this if you want to
>>> stay in DataFrame land:
>>>
>>> df.selectExpr("*").limit(5).show()
>>>
>>> On Fri, Dec 25, 2015 at 12:53 PM, Eugene Morozov <evgeny.a.moro...@gmail.com> wrote:
>>>
>>>> Ted, Igor,
>>>>
>>>> Oh my... thanks a lot to both of you!
>>>> Igor was absolutely right, but I missed that I have to use sqlContext =(
>>>>
>>>> Everything's perfect.
>>>> Thank you.
>>>>
>>>> --
>>>> Be well!
>>>> Jean Morozov
>>>>
>>>> On Fri, Dec 25, 2015 at 8:31 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>
>>>>> DataFrame uses different syntax from a SQL query.
>>>>> I searched the unit tests but didn't find any in the form of
>>>>> df.select("select ...").
>>>>>
>>>>> Looks like you should use sqlContext as other people suggested.
>>>>>
>>>>> On Fri, Dec 25, 2015 at 8:29 AM, Eugene Morozov <evgeny.a.moro...@gmail.com> wrote:
>>>>>
>>>>>> Thanks for the comments, although the issue is not in the limit()
>>>>>> predicate. It's something with Spark being unable to resolve the
>>>>>> expression.
>>>>>>
>>>>>> I can do something like this, and it works as it's supposed to:
>>>>>> df.select(df.col("*")).where(df.col("x1").equalTo(3.0)).show(5);
>>>>>>
>>>>>> But I think the old-fashioned SQL style should work too. I have
>>>>>> df.registerTempTable("tmptable") and then
>>>>>>
>>>>>> df.select("select * from tmptable where x1 = '3.0'").show();
>>>>>>
>>>>>> org.apache.spark.sql.AnalysisException: cannot resolve 'select * from
>>>>>> tmp where x1 = '1.0'' given input columns x1, x4, x5, x3, x2;
>>>>>>
>>>>>> at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>>>>>> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:56)
>>>>>> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.sca
>>>>>>
>>>>>> From the first statement I conclude that my custom data source is
>>>>>> perfectly fine. Just wondering how to fix / work around that.
>>>>>> --
>>>>>> Be well!
>>>>>> Jean Morozov
>>>>>>
>>>>>> On Fri, Dec 25, 2015 at 6:13 PM, Igor Berman <igor.ber...@gmail.com> wrote:
>>>>>>
>>>>>>> sqlContext.sql("select * from table limit 5").show() (not sure if
>>>>>>> limit 5 is supported)
>>>>>>>
>>>>>>> or use Dmitriy's solution. select() defines your projection when
>>>>>>> you've specified the entire query.
>>>>>>>
>>>>>>> On 25 December 2015 at 15:42, Василец Дмитрий <pronix.serv...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>> you can try to use df.limit(5).show() -
>>>>>>>> just a trick :)
>>>>>>>>
>>>>>>>> On Fri, Dec 25, 2015 at 2:34 PM, Eugene Morozov <evgeny.a.moro...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hello, I'm basically stuck, as I have no idea where to look.
>>>>>>>>>
>>>>>>>>> The following simple code, given that my DataSource is working, gives
>>>>>>>>> me an exception:
>>>>>>>>>
>>>>>>>>> DataFrame df = sqlc.load(filename, "com.epam.parso.spark.ds.DefaultSource");
>>>>>>>>> df.cache();
>>>>>>>>> df.printSchema();    <-- prints the schema perfectly fine!
>>>>>>>>>
>>>>>>>>> df.show();           <-- works perfectly fine (shows a table with 20 lines)!
>>>>>>>>> df.registerTempTable("table");
>>>>>>>>> df.select("select * from table limit 5").show();    <-- gives a weird exception
>>>>>>>>>
>>>>>>>>> The exception is:
>>>>>>>>>
>>>>>>>>> AnalysisException: cannot resolve 'select * from table limit 5' given
>>>>>>>>> input columns VER, CREATED, SOC, SOCC, HLTC, HLGTC, STATUS
>>>>>>>>>
>>>>>>>>> I can do a collect on the DataFrame, but cannot select any specific
>>>>>>>>> columns with either "select * from table" or "select VER, CREATED from table".
>>>>>>>>>
>>>>>>>>> I use Spark 1.5.2.
>>>>>>>>> The same code works perfectly through Zeppelin 0.5.5.
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>> --
>>>>>>>>> Be well!
>>>>>>>>> Jean Morozov
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>> --
>>>
>>> *Chris Fregly*
>>> Principal Data Solutions Engineer
>>> IBM Spark Technology Center, San Francisco, CA
>>> http://spark.tc | http://advancedspark.com
>>
>> --
>>
>> *Chris Fregly*
>> Principal Data Solutions Engineer
>> IBM Spark Technology Center, San Francisco, CA
>> http://spark.tc | http://advancedspark.com
>
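To summarize the thread's resolution, here is a minimal sketch of the two working approaches side by side. It assumes a Spark 1.5.x SQLContext named sqlContext and the DataFrame df from the original post (with the column names VER and CREATED taken from the reported schema); this is an illustration of the API distinction, not a complete program, and it needs the Spark 1.5.x jars on the classpath to compile:

```java
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class SelectVsSql {
    static void demo(SQLContext sqlContext, DataFrame df) {
        // DataFrame.select() takes column names (or Column expressions),
        // never a full SQL statement:
        df.select("VER", "CREATED").limit(5).show();

        // A full SQL statement goes through the SQLContext (or HiveContext)
        // against a registered temp table:
        df.registerTempTable("tmptable");
        sqlContext.sql("SELECT * FROM tmptable LIMIT 5").show();
    }
}
```

Passing SQL text to df.select() makes Spark treat the whole string as a single (nonexistent) column name, which is exactly why the AnalysisException above lists the real columns it could not match against.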