Hi all, I've got a (very) basic Spark application in Python that selects some basic information from my Phoenix table. However, I can't figure out how (or even whether I can) select dynamic columns through it.
Here's what I have:

    from pyspark import SparkContext, SparkConf
    from pyspark.sql import SQLContext

    conf = SparkConf().setAppName("pysparkPhoenixLoad").setMaster("local")
    sc = SparkContext(conf=conf)
    sqlContext = SQLContext(sc)

    df = sqlContext.read.format("org.apache.phoenix.spark") \
        .option("table", """MYTABLE("dynamic_column" VARCHAR)""") \
        .option("zkUrl", "127.0.0.1:2181:/hbase-unsecure") \
        .load()

    df.show()
    df.printSchema()

This fails with an org.apache.phoenix.schema.TableNotFoundException. If I instead register the data frame as a table and query it with SQL:

    sqlContext.registerDataFrameAsTable(df, "test")
    sqlContext.sql("""SELECT * FROM test("dynamic_column" VARCHAR)""")

I get a rather strange exception:

    py4j.protocol.Py4JJavaError: An error occurred while calling o37.sql.
    : java.lang.RuntimeException: [1.19] failure: ``union'' expected but `(' found

    SELECT * FROM test("dynamic_column" VARCHAR)

Does anybody have a pointer on whether this is supported, and how I might be able to query a dynamic column? I haven't found much information on the wider Internet about Spark + Phoenix integration for this kind of thing. Simple selects (without dynamic columns) are working.

Final note: I have (rather stupidly) lower-cased my column names in Phoenix, so I need to quote them when I execute a query (I'll be changing this as soon as possible).

Any assistance would be appreciated :)

*-- Craig*
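P.S. For reference, here's a small helper (illustrative only, names from my schema) showing the shape of the dynamic-column query I'm ultimately trying to get Spark to issue; Phoenix's SQL grammar accepts this form, so the question is really how to get it through the connector:

    # Illustrative helper: builds the Phoenix dynamic-column SELECT I want.
    # Table and column names below are from my schema; the helper itself is
    # just for this post, not something from the Phoenix or Spark APIs.

    def dynamic_column_query(table, dynamic_cols):
        """Build SELECT * FROM table("col" TYPE, ...) for Phoenix dynamic columns.

        Column names are double-quoted because mine are lower-cased in Phoenix.
        """
        clause = ", ".join('"{0}" {1}'.format(name, sql_type)
                           for name, sql_type in dynamic_cols.items())
        return 'SELECT * FROM {0}({1})'.format(table, clause)

    query = dynamic_column_query("MYTABLE", {"dynamic_column": "VARCHAR"})
    # query == 'SELECT * FROM MYTABLE("dynamic_column" VARCHAR)'

If the phoenix-spark connector's "table" option can't accept this form, I'm wondering whether pushing the query through the Phoenix JDBC driver instead is the right route.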