Shouldn't df.select just take the column names,
and sqlC.sql take the select statement?

Therefore perhaps we could use: df.select("COLUMN1", "COLUMN2") and
sqlC.sql("select COLUMN1, COLUMN2 from tablename")

Why would someone want to do a select on a DataFrame after registering it
as a table? I think we should be using HiveContext or SQLContext to run
queries on a registered table.
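
For illustration, a minimal sketch of the two styles (the DataFrame,
column, and table names here are hypothetical):

// select() takes column names, not a SQL statement
DataFrame projected = df.select("COLUMN1", "COLUMN2");

// a full SQL statement goes through the SQLContext against a registered table
df.registerTempTable("tablename");
DataFrame viaSql = sqlContext.sql("select COLUMN1, COLUMN2 from tablename");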


Regards,
Gourav Sengupta


On Sat, Dec 26, 2015 at 6:27 PM, Eugene Morozov <evgeny.a.moro...@gmail.com>
wrote:

> Chris, thanks. That'd be great to try =)
>
> --
> Be well!
> Jean Morozov
>
> On Fri, Dec 25, 2015 at 10:50 PM, Chris Fregly <ch...@fregly.com> wrote:
>
>> oh, and it's worth noting that - starting with Spark 1.6 - you'll be able
>> to just do the following:
>>
>> SELECT * FROM json.`/path/to/json/file`
>>
>> (note the back ticks)
>>
>> instead of calling registerTempTable() for the sole purpose of using SQL.
>>
>> https://issues.apache.org/jira/browse/SPARK-11197
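>>
>> For example, a minimal sketch of running that from code (same placeholder
>> path as above; assumes Spark 1.6+):
>>
>> DataFrame df = sqlContext.sql("SELECT * FROM json.`/path/to/json/file`");
>> df.show();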
>>
>> On Fri, Dec 25, 2015 at 2:17 PM, Chris Fregly <ch...@fregly.com> wrote:
>>
>>> I assume by "The same code perfectly works through Zeppelin 0.5.5" that
>>> you're using the %sql interpreter with your regular SQL SELECT statement,
>>> correct?
>>>
>>> If so, the Zeppelin interpreter is converting the <sql-statement> that
>>> follows
>>>
>>> %sql
>>>
>>> to
>>>
>>> sqlContext.sql(<sql-statement>)
>>>
>>> per the following code:
>>>
>>>
>>> https://github.com/apache/incubator-zeppelin/blob/01f4884a3a971ece49d668a9783d6b705cf6dbb5/spark/src/main/java/org/apache/zeppelin/spark/SparkSqlInterpreter.java#L125
>>>
>>>
>>> https://github.com/apache/incubator-zeppelin/blob/01f4884a3a971ece49d668a9783d6b705cf6dbb5/spark/src/main/java/org/apache/zeppelin/spark/SparkSqlInterpreter.java#L140-L141
>>>
>>>
>>> Also, keep in mind that you can do something like this if you want to
>>> stay in DataFrame land:
>>>
>>> df.selectExpr("*").limit(5).show()
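>>>
>>> And if you need a WHERE clause while staying in DataFrame land, a quick
>>> sketch (column name x1 borrowed from your earlier example):
>>>
>>> df.filter("x1 = 3.0").limit(5).show();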
>>>
>>>
>>>
>>> On Fri, Dec 25, 2015 at 12:53 PM, Eugene Morozov <
>>> evgeny.a.moro...@gmail.com> wrote:
>>>
>>>> Ted, Igor,
>>>>
>>>> Oh my... thanks a lot to both of you!
>>>> Igor was absolutely right, but I missed that I have to use sqlContext =(
>>>>
>>>> Everything's perfect.
>>>> Thank you.
>>>>
>>>> --
>>>> Be well!
>>>> Jean Morozov
>>>>
>>>> On Fri, Dec 25, 2015 at 8:31 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>
>>>>> DataFrame uses a different syntax from a SQL query.
>>>>> I searched the unit tests but didn't find any in the form of df.select("select
>>>>> ...")
>>>>>
>>>>> Looks like you should use sqlContext as other people suggested.
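>>>>>
>>>>> A minimal sketch of that pattern (table and column names taken from
>>>>> your example):
>>>>>
>>>>> df.registerTempTable("tmptable");
>>>>> sqlContext.sql("select * from tmptable where x1 = 3.0").show();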
>>>>>
>>>>> On Fri, Dec 25, 2015 at 8:29 AM, Eugene Morozov <
>>>>> evgeny.a.moro...@gmail.com> wrote:
>>>>>
>>>>>> Thanks for the comments, although the issue is not in the limit()
>>>>>> predicate.
>>>>>> It's something with Spark being unable to resolve the expression.
>>>>>>
>>>>>> I can do something like this, and it works as it's supposed to:
>>>>>>  df.select(df.col("*")).where(df.col("x1").equalTo(3.0)).show(5);
>>>>>>
>>>>>> But I think the old-fashioned SQL style should work too. I have
>>>>>> df.registerTempTable("tmptable") and then
>>>>>>
>>>>>> df.select("select * from tmptable where x1 = '3.0'").show();
>>>>>>
>>>>>> org.apache.spark.sql.AnalysisException: cannot resolve 'select * from
>>>>>> tmp where x1 = '1.0'' given input columns x1, x4, x5, x3, x2;
>>>>>>
>>>>>> at
>>>>>> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>>>>>> at
>>>>>> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:56)
>>>>>> at
>>>>>> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.sca
>>>>>>
>>>>>>
>>>>>> From the first statement I conclude that my custom datasource is
>>>>>> perfectly fine.
>>>>>> I just wonder how to fix or work around that.
>>>>>> --
>>>>>> Be well!
>>>>>> Jean Morozov
>>>>>>
>>>>>> On Fri, Dec 25, 2015 at 6:13 PM, Igor Berman <igor.ber...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> sqlContext.sql("select * from table limit 5").show() (not sure if
>>>>>>> limit 5 supported)
>>>>>>>
>>>>>>> or use Dmitriy's solution. select() defines your projection, whereas
>>>>>>> you've specified an entire query.
>>>>>>>
>>>>>>> On 25 December 2015 at 15:42, Василец Дмитрий <
>>>>>>> pronix.serv...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>> you can try to use df.limit(5).show()
>>>>>>>> just a trick :)
>>>>>>>>
>>>>>>>> On Fri, Dec 25, 2015 at 2:34 PM, Eugene Morozov <
>>>>>>>> evgeny.a.moro...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hello, I'm basically stuck, as I have no idea where to look.
>>>>>>>>>
>>>>>>>>> The following simple code, given that my datasource is working,
>>>>>>>>> gives me an exception.
>>>>>>>>>
>>>>>>>>> DataFrame df = sqlc.load(filename, "com.epam.parso.spark.ds.DefaultSource");
>>>>>>>>> df.cache();
>>>>>>>>> df.printSchema();  <-- prints the schema perfectly fine!
>>>>>>>>>
>>>>>>>>> df.show();         <-- works perfectly fine (shows a table with 20 lines)!
>>>>>>>>> df.registerTempTable("table");
>>>>>>>>> df.select("select * from table limit 5").show();  <-- gives a weird exception
>>>>>>>>>
>>>>>>>>> Exception is:
>>>>>>>>>
>>>>>>>>> AnalysisException: cannot resolve 'select * from table limit 5' given 
>>>>>>>>> input columns VER, CREATED, SOC, SOCC, HLTC, HLGTC, STATUS
>>>>>>>>>
>>>>>>>>> I can do a collect on a DataFrame, but cannot select any specific
>>>>>>>>> columns with either "select * from table" or "select VER, CREATED
>>>>>>>>> from table".
>>>>>>>>>
>>>>>>>>> I use spark 1.5.2.
>>>>>>>>> The same code perfectly works through Zeppelin 0.5.5.
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>> --
>>>>>>>>> Be well!
>>>>>>>>> Jean Morozov
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>>
>>> *Chris Fregly*
>>> Principal Data Solutions Engineer
>>> IBM Spark Technology Center, San Francisco, CA
>>> http://spark.tc | http://advancedspark.com
>>>
>>
>>
>>
>> --
>>
>> *Chris Fregly*
>> Principal Data Solutions Engineer
>> IBM Spark Technology Center, San Francisco, CA
>> http://spark.tc | http://advancedspark.com
>>
>
>
