I'm using Python to set up a DataFrame, but for some reason it is not being 
made available to SQL.  Code (from Zeppelin) below.  I don't get any errors 
when loading/prepping the data or the DataFrame.  Any tips?


(Originally I was not hardcoding the Row() structure, as my other tutorial 
added it by default. I'm not sure why that didn't work here, but it might be 
beside the point.)


Any guesses greatly appreciated as I sink my teeth in here for the first time.

Thanks!


-------


%pyspark
from pyspark.sql.types import (Row, StructType, StructField, IntegerType,
                               StringType, DecimalType)

from os import getcwd
sqlContext = SQLContext(sc)

datafile = sc.textFile("/Users/tyler/data/geonames/CA.txt")

geonames = datafile.map(lambda s: s.split("\t")).map(lambda s: Row(
    geonameid=int(s[0]), asciiname=str(s[2]), latitude=float(s[4]),
    longitude=float(s[5]), elevation=str(s[16]), featureclass=str(s[6]),
    featurecode=str(s[7]), countrycode=str(s[8])))

gndf = sqlContext.inferSchema(geonames)
gndf.registerAsTable("geonames")

#print gndf.count()
print "-----------"
print gndf.columns
print "-----------"
print gndf.first()
print "-----------"
gndf.schema

============
OUTPUT
============

[u'asciiname', u'countrycode', u'elevation', u'featureclass', u'featurecode', 
u'geonameid', u'latitude', u'longitude']
-----------
Row(asciiname=u'100 Mile House', countrycode=u'CA', elevation=u'928', 
featureclass=u'P', featurecode=u'PPL', geonameid=5881639, latitude=51.64982, 
longitude=-121.28594)
-----------
StructType(List(StructField(asciiname,StringType,true),StructField(countrycode,StringType,true),StructField(elevation,StringType,true),StructField(featureclass,StringType,true),StructField(featurecode,StringType,true),StructField(geonameid,LongType,true),StructField(latitude,DoubleType,true),StructField(longitude,DoubleType,true)))

=============
%sql
SELECT geonameid, count(1) value
FROM geonames
LIMIT 1

no such table List(geonames); line 2 pos 5

