Chris, thanks. That'd be great to try =)
--
Be well!
Jean Morozov
On Fri, Dec 25, 2015 at 10:50 PM, Chris Fregly <ch...@fregly.com> wrote:

> oh, and it's worth noting that - starting with Spark 1.6 - you'll be able
> to just do the following:
>
> SELECT * FROM json.`/path/to/json/file`
>
> (note the back ticks)
>
> instead of calling registerTempTable() for the sole purpose of using SQL.
>
> https://issues.apache.org/jira/browse/SPARK-11197
>
> On Fri, Dec 25, 2015 at 2:17 PM, Chris Fregly <ch...@fregly.com> wrote:
>
>> I assume by "The same code perfectly works through Zeppelin 0.5.5" that
>> you're using the %sql interpreter with your regular SQL SELECT statement,
>> correct?
>>
>> If so, the Zeppelin interpreter is converting the <sql-statement> that
>> follows
>>
>> %sql
>>
>> to
>>
>> sqlContext.sql(<sql-statement>)
>>
>> per the following code:
>>
>> https://github.com/apache/incubator-zeppelin/blob/01f4884a3a971ece49d668a9783d6b705cf6dbb5/spark/src/main/java/org/apache/zeppelin/spark/SparkSqlInterpreter.java#L125
>>
>> https://github.com/apache/incubator-zeppelin/blob/01f4884a3a971ece49d668a9783d6b705cf6dbb5/spark/src/main/java/org/apache/zeppelin/spark/SparkSqlInterpreter.java#L140-L141
>>
>> Also, keep in mind that you can do something like this if you want to
>> stay in DataFrame land:
>>
>> df.selectExpr("*").limit(5).show()
>>
>> On Fri, Dec 25, 2015 at 12:53 PM, Eugene Morozov <evgeny.a.moro...@gmail.com> wrote:
>>
>>> Ted, Igor,
>>>
>>> Oh my... thanks a lot to both of you!
>>> Igor was absolutely right, but I missed that I have to use sqlContext =(
>>>
>>> Everything's perfect.
>>> Thank you.
>>>
>>> --
>>> Be well!
>>> Jean Morozov
>>>
>>> On Fri, Dec 25, 2015 at 8:31 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>
>>>> DataFrame uses different syntax from an SQL query.
>>>> I searched the unit tests but didn't find any in the form of
>>>> df.select("select ...")
>>>>
>>>> Looks like you should use sqlContext as other people suggested.
>>>>
>>>> On Fri, Dec 25, 2015 at 8:29 AM, Eugene Morozov <evgeny.a.moro...@gmail.com> wrote:
>>>>
>>>>> Thanks for the comments, although the issue is not in the limit()
>>>>> predicate.
>>>>> It's something with Spark being unable to resolve the expression.
>>>>>
>>>>> I can do something like this, and it works as it's supposed to:
>>>>> df.select(df.col("*")).where(df.col("x1").equalTo(3.0)).show(5);
>>>>>
>>>>> But I think the old-fashioned SQL style should also work. I have
>>>>> df.registerTempTable("tmptable") and then
>>>>>
>>>>> df.select("select * from tmptable where x1 = '3.0'").show();
>>>>>
>>>>> org.apache.spark.sql.AnalysisException: cannot resolve 'select * from
>>>>> tmp where x1 = '1.0'' given input columns x1, x4, x5, x3, x2;
>>>>>
>>>>> at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>>>>> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:56)
>>>>> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.sca
>>>>>
>>>>> From the first statement I conclude that my custom datasource is
>>>>> perfectly fine.
>>>>> I just wonder how to fix / work around this.
>>>>> --
>>>>> Be well!
>>>>> Jean Morozov
>>>>>
>>>>> On Fri, Dec 25, 2015 at 6:13 PM, Igor Berman <igor.ber...@gmail.com> wrote:
>>>>>
>>>>>> sqlContext.sql("select * from table limit 5").show() (not sure if
>>>>>> limit 5 is supported)
>>>>>>
>>>>>> or use Dmitriy's solution.
>>>>>> select() defines your projection, but you've passed it an entire query.
>>>>>>
>>>>>> On 25 December 2015 at 15:42, Василец Дмитрий <pronix.serv...@gmail.com> wrote:
>>>>>>
>>>>>>> hello
>>>>>>> you can try to use df.limit(5).show()
>>>>>>> just a trick :)
>>>>>>>
>>>>>>> On Fri, Dec 25, 2015 at 2:34 PM, Eugene Morozov <evgeny.a.moro...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello, I'm basically stuck as I have no idea where to look.
>>>>>>>>
>>>>>>>> The following simple code, given that my datasource is working, gives
>>>>>>>> me an exception.
>>>>>>>>
>>>>>>>> DataFrame df = sqlc.load(filename, "com.epam.parso.spark.ds.DefaultSource");
>>>>>>>> df.cache();
>>>>>>>> df.printSchema();    <-- prints the schema perfectly fine!
>>>>>>>>
>>>>>>>> df.show();           <-- works perfectly fine (shows a table with 20 lines)!
>>>>>>>> df.registerTempTable("table");
>>>>>>>> df.select("select * from table limit 5").show();    <-- gives a weird
>>>>>>>> exception
>>>>>>>>
>>>>>>>> The exception is:
>>>>>>>>
>>>>>>>> AnalysisException: cannot resolve 'select * from table limit 5' given
>>>>>>>> input columns VER, CREATED, SOC, SOCC, HLTC, HLGTC, STATUS
>>>>>>>>
>>>>>>>> I can do a collect on the dataframe, but I cannot select any specific
>>>>>>>> columns with either "select * from table" or "select VER, CREATED from table".
>>>>>>>>
>>>>>>>> I use Spark 1.5.2.
>>>>>>>> The same code works perfectly through Zeppelin 0.5.5.
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>> --
>>>>>>>> Be well!
>>>>>>>> Jean Morozov
>>
>> --
>>
>> *Chris Fregly*
>> Principal Data Solutions Engineer
>> IBM Spark Technology Center, San Francisco, CA
>> http://spark.tc | http://advancedspark.com
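[For readers landing on this thread from the archives: the resolution above can be condensed into one snippet. DataFrame.select() takes column names or expressions, never a full SQL statement; a full statement must go through sqlContext.sql(). This is a sketch for the Spark 1.5.x spark-shell in Scala, reusing the datasource class and column names from the original post; the file path is a placeholder and the snippet assumes a running Spark shell, so treat it as illustrative rather than copy-paste ready:

```scala
// spark-shell provides `sqlContext` out of the box (Spark 1.5.x).
// The custom datasource and column name come from the original post.
val df = sqlContext.load("/path/to/file", "com.epam.parso.spark.ds.DefaultSource")

// DataFrame-land: select() is a projection over column expressions.
df.select(df.col("*")).where(df.col("x1").equalTo(3.0)).show(5)

// SQL-land: a full statement must be parsed by the SQLContext.
df.registerTempTable("tmptable")
sqlContext.sql("SELECT * FROM tmptable WHERE x1 = 3.0 LIMIT 5").show()

// The failing call from the thread, for contrast: the whole SQL string is
// treated as a single column name, which cannot be resolved.
// df.select("select * from tmptable").show()   // AnalysisException
```

This mirrors what Zeppelin's %sql interpreter does implicitly: it wraps the statement in sqlContext.sql(...), which is why the same query worked there. -- list moderator]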