Chris, thanks. That'd be great to try =)
--
Be well!
Jean Morozov
On Fri, Dec 25, 2015 at 10:50 PM, Chris Fregly <ch...@fregly.com> wrote:

> oh, and it's worth noting that - starting with Spark 1.6 - you'll be able
> to just do the following:
>
> SELECT * FROM json.`/path/to/json/file`
>
> (note the back ticks)
>
> instead of calling registerTempTable() for the sole purpose of using SQL.
>
> https://issues.apache.org/jira/browse/SPARK-11197
>
> On Fri, Dec 25, 2015 at 2:17 PM, Chris Fregly <ch...@fregly.com> wrote:
>
>> I assume by "The same code perfectly works through Zeppelin 0.5.5" that
>> you're using the %sql interpreter with your regular SQL SELECT statement,
>> correct?
>>
>> If so, the Zeppelin interpreter is converting the <sql-statement> that
>> follows
>>
>> %sql
>>
>> to
>>
>> sqlContext.sql(<sql-statement>)
>>
>> per the following code:
>>
>> https://github.com/apache/incubator-zeppelin/blob/01f4884a3a971ece49d668a9783d6b705cf6dbb5/spark/src/main/java/org/apache/zeppelin/spark/SparkSqlInterpreter.java#L125
>>
>> https://github.com/apache/incubator-zeppelin/blob/01f4884a3a971ece49d668a9783d6b705cf6dbb5/spark/src/main/java/org/apache/zeppelin/spark/SparkSqlInterpreter.java#L140-L141
>>
>> Also, keep in mind that you can do something like this if you want to
>> stay in DataFrame land:
>>
>> df.selectExpr("*").limit(5).show()
>>
>> On Fri, Dec 25, 2015 at 12:53 PM, Eugene Morozov <evgeny.a.moro...@gmail.com> wrote:
>>
>>> Ted, Igor,
>>>
>>> Oh my... thanks a lot to both of you!
>>> Igor was absolutely right, but I missed that I have to use sqlContext =(
>>>
>>> Everything's perfect.
>>> Thank you.
>>>
>>> --
>>> Be well!
>>> Jean Morozov
>>>
>>> On Fri, Dec 25, 2015 at 8:31 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>
>>>> DataFrame uses different syntax from an SQL query.
>>>> I searched the unit tests but didn't find any in the form of
>>>> df.select("select ...")
>>>>
>>>> Looks like you should use sqlContext as other people suggested.
>>>>
>>>> On Fri, Dec 25, 2015 at 8:29 AM, Eugene Morozov <evgeny.a.moro...@gmail.com> wrote:
>>>>
>>>>> Thanks for the comments, although the issue is not in the limit()
>>>>> predicate.
>>>>> It's something with Spark being unable to resolve the expression.
>>>>>
>>>>> I can do something like this, and it works as it's supposed to:
>>>>> df.select(df.col("*")).where(df.col("x1").equalTo(3.0)).show(5);
>>>>>
>>>>> But I think the old-fashioned SQL style should also work. I have
>>>>> df.registerTempTable("tmptable") and then
>>>>>
>>>>> df.select("select * from tmptable where x1 = '3.0'").show();
>>>>>
>>>>> org.apache.spark.sql.AnalysisException: cannot resolve 'select * from
>>>>> tmp where x1 = '1.0'' given input columns x1, x4, x5, x3, x2;
>>>>>
>>>>> at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>>>>> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:56)
>>>>> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.sca
>>>>>
>>>>> From the first statement I conclude that my custom datasource is
>>>>> perfectly fine.
>>>>> I just wonder how to fix / work around this.
>>>>> --
>>>>> Be well!
>>>>> Jean Morozov
>>>>>
>>>>> On Fri, Dec 25, 2015 at 6:13 PM, Igor Berman <igor.ber...@gmail.com> wrote:
>>>>>
>>>>>> sqlContext.sql("select * from table limit 5").show() (not sure if
>>>>>> limit 5 is supported)
>>>>>>
>>>>>> or use Dmitriy's solution.
>>>>>> select() defines your projection, but you've passed it an entire query.
>>>>>>
>>>>>> On 25 December 2015 at 15:42, Василец Дмитрий <pronix.serv...@gmail.com> wrote:
>>>>>>
>>>>>>> hello
>>>>>>> you can try to use df.limit(5).show()
>>>>>>> just a trick :)
>>>>>>>
>>>>>>> On Fri, Dec 25, 2015 at 2:34 PM, Eugene Morozov <evgeny.a.moro...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello, I'm basically stuck as I have no idea where to look.
>>>>>>>>
>>>>>>>> The following simple code, given that my datasource is working, gives
>>>>>>>> me an exception.
>>>>>>>>
>>>>>>>> DataFrame df = sqlc.load(filename, "com.epam.parso.spark.ds.DefaultSource");
>>>>>>>> df.cache();
>>>>>>>> df.printSchema();    <-- prints the schema perfectly fine!
>>>>>>>>
>>>>>>>> df.show();           <-- works perfectly fine (shows a table with 20 lines)!
>>>>>>>> df.registerTempTable("table");
>>>>>>>> df.select("select * from table limit 5").show();    <-- gives a weird
>>>>>>>> exception
>>>>>>>>
>>>>>>>> The exception is:
>>>>>>>>
>>>>>>>> AnalysisException: cannot resolve 'select * from table limit 5' given
>>>>>>>> input columns VER, CREATED, SOC, SOCC, HLTC, HLGTC, STATUS
>>>>>>>>
>>>>>>>> I can do a collect on the dataframe, but I cannot select any specific
>>>>>>>> columns with either "select * from table" or "select VER, CREATED from table".
>>>>>>>>
>>>>>>>> I use Spark 1.5.2.
>>>>>>>> The same code works perfectly through Zeppelin 0.5.5.
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>> --
>>>>>>>> Be well!
>>>>>>>> Jean Morozov
>>
>> --
>>
>> *Chris Fregly*
>> Principal Data Solutions Engineer
>> IBM Spark Technology Center, San Francisco, CA
>> http://spark.tc | http://advancedspark.com
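[For readers landing on this thread from the archives: the resolution above can be condensed into one snippet. DataFrame.select() takes column names or expressions, never a full SQL statement; a full statement must go through sqlContext.sql(). This is a sketch for the Spark 1.5.x spark-shell in Scala, reusing the datasource class and column names from the original post; the file path is a placeholder and the snippet assumes a running Spark shell, so treat it as illustrative rather than copy-paste ready:

```scala
// spark-shell provides `sqlContext` out of the box (Spark 1.5.x).
// The custom datasource and column name come from the original post.
val df = sqlContext.load("/path/to/file", "com.epam.parso.spark.ds.DefaultSource")

// DataFrame-land: select() is a projection over column expressions.
df.select(df.col("*")).where(df.col("x1").equalTo(3.0)).show(5)

// SQL-land: a full statement must be parsed by the SQLContext.
df.registerTempTable("tmptable")
sqlContext.sql("SELECT * FROM tmptable WHERE x1 = 3.0 LIMIT 5").show()

// The failing call from the thread, for contrast: the whole SQL string is
// treated as a single column name, which cannot be resolved.
// df.select("select * from tmptable").show()   // AnalysisException
```

This mirrors what Zeppelin's %sql interpreter does implicitly: it wraps the statement in sqlContext.sql(...), which is why the same query worked there. -- list moderator]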