Hi, I'm having some issues with the Phoenix Spark connector. I'm using the phoenix-for-cloudera [1] build, so I'm not sure whether these are bugs that have already been fixed.
1. I can't use the connector to load tables that have lowercase names. Suppose I have a view of an HBase table called 'test' (lowercase). Loading it with

    val df = sqlContext.load(
      "org.apache.phoenix.spark",
      Map("table" -> "test", "zkUrl" -> zk)
    )

fails with the exception:

    org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table undefined. tableName=TEST
        at org.apache.phoenix.schema.PMetaDataImpl.getTableRef(PMetaDataImpl.java:71)
        at org.apache.phoenix.jdbc.PhoenixConnection.getTable(PhoenixConnection.java:452)
        at org.apache.phoenix.util.PhoenixRuntime.getTable(PhoenixRuntime.java:399)
        at org.apache.phoenix.util.PhoenixRuntime.generateColumnInfo(PhoenixRuntime.java:425)
        at org.apache.phoenix.mapreduce.util.PhoenixConfigurationUtil.getSelectColumnMetadataList(PhoenixConfigurationUtil.java:281)
        at org.apache.phoenix.spark.PhoenixRDD.toDataFrame(PhoenixRDD.scala:106)
        at org.apache.phoenix.spark.PhoenixRelation.schema(PhoenixRelation.scala:60)
        at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:40)

2. A column whose name contains a period cannot be used. For example, I added a column named 'test.test' to a table. Through phoenix-sqlline.py I can use this column without a problem, but when I load the table with the Phoenix Spark connector as before:

    val df = sqlContext.load(
      "org.apache.phoenix.spark",
      Map("table" -> "TEST", "zkUrl" -> zk)
    )

then I cannot even view the table:

    scala> df.show
    org.apache.phoenix.schema.ColumnFamilyNotFoundException: ERROR 1001 (42I01): Undefined column family. familyName=test
        at org.apache.phoenix.schema.PTableImpl.getColumnFamily(PTableImpl.java:921)
        at org.apache.phoenix.util.PhoenixRuntime.getColumnInfo(PhoenixRuntime.java:494)
        at org.apache.phoenix.util.PhoenixRuntime.generateColumnInfo(PhoenixRuntime.java:440)
        at org.apache.phoenix.mapreduce.util.PhoenixConfigurationUtil.getSelectColumnMetadataList(PhoenixConfigurationUtil.java:281)
        at org.apache.phoenix.spark.PhoenixRDD.toDataFrame(PhoenixRDD.scala:106)
        at org.apache.phoenix.spark.PhoenixRelation.buildScan(PhoenixRelation.scala:47)
        at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$11.apply(DataSourceStrategy.scala:336)

[1] https://github.com/chiastic-security/phoenix-for-cloudera
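For what it's worth, Phoenix upper-cases unquoted SQL identifiers (which is why the error reports tableName=TEST), and a period in an unquoted name is interpreted as a family.qualifier separator, so both failures look like identifier-quoting problems in the column metadata the connector generates. Below is the workaround I would expect to work, assuming the connector passes the quoted identifiers through to Phoenix verbatim — I have not been able to confirm that it does:

```scala
// Sketch of a possible workaround, untested: double-quote the
// identifiers so Phoenix preserves case instead of upper-casing them.
// The escaped quotes are assumed to be forwarded verbatim to Phoenix.
val df = sqlContext.load(
  "org.apache.phoenix.spark",
  Map("table" -> "\"test\"", "zkUrl" -> zk)
)

// At the SQL level (e.g. in sqlline) quoting handles both cases:
//   SELECT "test.test" FROM "test";
// Without the quotes, Phoenix would look up TEST and would treat
// the prefix before the period as a column family name.
```

Is there a way to pass the column list to the connector in already-quoted form, or does this need a fix in PhoenixConfigurationUtil/PhoenixRuntime?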