I assume by "The same code perfectly works through Zeppelin 0.5.5" that
you're using the %sql interpreter with your regular SQL SELECT statement,
correct?

If so, the Zeppelin interpreter is converting the <sql-statement> that
follows

%sql

to

sqlContext.sql(<sql-statement>)

per the following code:

https://github.com/apache/incubator-zeppelin/blob/01f4884a3a971ece49d668a9783d6b705cf6dbb5/spark/src/main/java/org/apache/zeppelin/spark/SparkSqlInterpreter.java#L125

https://github.com/apache/incubator-zeppelin/blob/01f4884a3a971ece49d668a9783d6b705cf6dbb5/spark/src/main/java/org/apache/zeppelin/spark/SparkSqlInterpreter.java#L140-L141


Also, keep in mind that you can do something like this if you want to stay
in DataFrame land:

df.selectExpr("*").limit(5).show()
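To make the contrast concrete, here is a minimal, hypothetical Java sketch for Spark 1.5.x (the input path, column names, and app name are illustrative, not from the original thread): select() takes column expressions, while a complete SQL string has to go through sqlContext.sql(), which is effectively what the %sql interpreter does under the hood.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class SelectVsSql {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("SelectVsSql").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);

        // Hypothetical input; any source with columns x1..x5 would do.
        DataFrame df = sqlContext.read().json("data.json");
        df.registerTempTable("tmptable");

        // select() takes column expressions, NOT a full SQL statement:
        df.select(df.col("*")).where(df.col("x1").equalTo(3.0)).show(5);

        // A complete SQL string goes through the SQLContext instead --
        // passing it to df.select() raises the AnalysisException below,
        // because the whole string is treated as one column name:
        sqlContext.sql("select * from tmptable limit 5").show();

        sc.stop();
    }
}
```

This mirrors the conversion the Zeppelin interpreter performs: the text after %sql ends up inside sqlContext.sql(...), never inside DataFrame.select(...).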



On Fri, Dec 25, 2015 at 12:53 PM, Eugene Morozov <evgeny.a.moro...@gmail.com> wrote:

> Ted, Igor,
>
> Oh my... thanks a lot to both of you!
> Igor was absolutely right, but I missed that I have to use sqlContext =(
>
> Everything's perfect.
> Thank you.
>
> --
> Be well!
> Jean Morozov
>
> On Fri, Dec 25, 2015 at 8:31 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> A DataFrame uses different syntax from a SQL query.
>> I searched the unit tests but didn't find any in the form of
>> df.select("select ...").
>>
>> Looks like you should use sqlContext as other people suggested.
>>
>> On Fri, Dec 25, 2015 at 8:29 AM, Eugene Morozov <evgeny.a.moro...@gmail.com> wrote:
>>
>>> Thanks for the comments, although the issue is not in the limit() predicate.
>>> It's something with Spark being unable to resolve the expression.
>>>
>>> I can do something like this, and it works as it's supposed to:
>>>  df.select(df.col("*")).where(df.col("x1").equalTo(3.0)).show(5);
>>>
>>> But I think the old-fashioned SQL style should work too. I have
>>> df.registerTempTable("tmptable") and then
>>>
>>> df.select("select * from tmptable where x1 = '3.0'").show();
>>>
>>> org.apache.spark.sql.AnalysisException: cannot resolve 'select * from
>>> tmp where x1 = '1.0'' given input columns x1, x4, x5, x3, x2;
>>>
>>> at
>>> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>>> at
>>> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:56)
>>> at
>>> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.sca
>>>
>>>
>>> From the first statement I conclude that my custom datasource is
>>> perfectly fine.
>>> I just wonder how to fix / work around that.
>>> --
>>> Be well!
>>> Jean Morozov
>>>
>>> On Fri, Dec 25, 2015 at 6:13 PM, Igor Berman <igor.ber...@gmail.com>
>>> wrote:
>>>
>>>> sqlContext.sql("select * from table limit 5").show() (not sure if limit
>>>> 5 is supported)
>>>>
>>>> or use Dmitriy's solution. select() defines a projection, but you've
>>>> passed it an entire query.
>>>>
>>>> On 25 December 2015 at 15:42, Василец Дмитрий <pronix.serv...@gmail.com> wrote:
>>>>
>>>>> hello
>>>>> you can try to use df.limit(5).show()
>>>>> just a trick :)
>>>>>
>>>>> On Fri, Dec 25, 2015 at 2:34 PM, Eugene Morozov <evgeny.a.moro...@gmail.com> wrote:
>>>>>
>>>>>> Hello, I'm basically stuck, as I have no idea where to look.
>>>>>>
>>>>>> The following simple code, given that my datasource is working, gives
>>>>>> me an exception.
>>>>>>
>>>>>> DataFrame df = sqlc.load(filename, "com.epam.parso.spark.ds.DefaultSource");
>>>>>> df.cache();
>>>>>> df.printSchema();       <-- prints the schema perfectly fine!
>>>>>> df.show();              <-- works perfectly fine (shows a table with 20 rows)!
>>>>>> df.registerTempTable("table");
>>>>>> df.select("select * from table limit 5").show(); <-- gives a weird exception
>>>>>>
>>>>>> Exception is:
>>>>>>
>>>>>> AnalysisException: cannot resolve 'select * from table limit 5' given 
>>>>>> input columns VER, CREATED, SOC, SOCC, HLTC, HLGTC, STATUS
>>>>>>
>>>>>> I can do a collect on a dataframe, but cannot select any specific
>>>>>> columns with either "select * from table" or "select VER, CREATED from table".
>>>>>>
>>>>>> I use spark 1.5.2.
>>>>>> The same code perfectly works through Zeppelin 0.5.5.
>>>>>>
>>>>>> Thanks.
>>>>>> --
>>>>>> Be well!
>>>>>> Jean Morozov
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>


-- 

*Chris Fregly*
Principal Data Solutions Engineer
IBM Spark Technology Center, San Francisco, CA
http://spark.tc | http://advancedspark.com
