Hi all, I am trying out Spark for the first time, so I am reaching out with what may seem like a very basic question.
Consider the example below:

>>> l = [("US","City1",125),("US","City2",123),("Europe","CityX",23),("Europe","CityY",17)]
>>> print l
[('US', 'City1', 125), ('US', 'City2', 123), ('Europe', 'CityX', 23), ('Europe', 'CityY', 17)]
>>> sc = SparkContext(appName="N")
>>> sqlsc = SQLContext(sc)
>>> df = sqlsc.createDataFrame(l)
>>> df.printSchema()
root
 |-- _1: string (nullable = true)
 |-- _2: string (nullable = true)
 |-- _3: long (nullable = true)
>>> df.registerTempTable("t1")
>>> rdf = sqlsc.sql("Select _1,sum(_3) from t1 group by _1")
>>> rdf.show()
+------+---+
|    _1|_c1|
+------+---+
|    US|248|
|Europe| 40|
+------+---+
>>> rdf.printSchema()
root
 |-- _1: string (nullable = true)
 |-- _c1: long (nullable = true)
>>> rdf.registerTempTable("t2")
>>> sqlsc.sql("Select * from t2 where _c1 > 200").show()
+---+---+
| _1|_c1|
+---+---+
| US|248|
+---+---+

So basically, I am trying to find, for each country, the sum of _3 (which could be, say, the population subscribed to some service) where that sum is above a threshold. In the flow above, an additional DataFrame (rdf) is created just so it can be registered as a second temp table and filtered.

Now, how do I eliminate the rdf DataFrame and embed the complete query against df itself? I tried, but pyspark throws an error:

>>> sqlsc.sql("Select _1,sum(_3) from t1 group by _1").show()
+------+---+
|    _1|_c1|
+------+---+
|    US|248|
|Europe| 40|
+------+---+
>>> sqlsc.sql("Select _1,sum(_3) from t1 group by _1 where _c1 > 200").show()
Traceback (most recent call last):
  File "/ghostcache/kimanjun/spark-1.6.0/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o28.sql.
: java.lang.RuntimeException: [1.39] failure: ``union'' expected but `where' found

Question: Is there a way to avoid creating the intermediate DataFrame (rdf) and get the result directly from df? I have not put much thought into how it would be beneficial; I am just pondering the question.

--
Regards
Kiran
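P.S. To clarify what I am after, below is a sketch of the kind of single-statement query I was hoping for. I am guessing at a standard SQL HAVING clause here, and at the DataFrame groupBy/agg API with an alias; I have not verified either on my 1.6 setup, and the column name "total" is just one I made up.

>>> # Guess 1: filter on the aggregate in the same statement with HAVING,
>>> # so no second DataFrame or temp table is needed
>>> sqlsc.sql("Select _1, sum(_3) as total from t1 group by _1 having sum(_3) > 200").show()

>>> # Guess 2: skip SQL and temp tables entirely via the DataFrame API
>>> from pyspark.sql import functions as F
>>> df.groupBy("_1").agg(F.sum("_3").alias("total")).where("total > 200").show()

If either of these is the idiomatic way, I would expect both to return only the US row (sum 248).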