Hi, I am attempting to connect to Phoenix from Spark, but with no success so far. For writing into Phoenix, I am trying this:
tdd.toDF("ID", "COL1", "COL2", "COL3").write.format("org.apache.phoenix.spark").option("zkUrl", "zookeper-host-url:2181").option("table", htablename).mode("overwrite").save() But getting: *java.sql.SQLException: ERROR 103 (08004): Unable to establish connection.* For reading, on the other hand, attempting this: val hbConf = HBaseConfiguration.create() val hbaseSitePath = "/etc/hbase/conf/hbase-site.xml" hbConf.addResource(new Path(hbaseSitePath)) spark.sqlContext.phoenixTableAsDataFrame("VISTA_409X68", Array("ID"), conf = hbConf) Gets me *java.lang.NoClassDefFoundError: Could not initialize class org.apache.phoenix.query.QueryServicesOptions* I have added phoenix-queryserver-5.0.0-HBase-2.0.jar and phoenix-queryserver-client-5.0.0-HBase-2.0.jar Any thoughts? I have an hbase-site.xml file with more configuration but not sure how to get it to be read in the saving instance. Thanks On Thu, Sep 13, 2018 at 11:38 AM Josh Elser <els...@apache.org> wrote: > Pretty sure we ran tests with Spark 2.3 with Phoenix 5.0. Not sure if > Spark has already moved beyond that. > > On 9/12/18 11:00 PM, Saif Addin wrote: > > Thanks, we'll try Spark Connector then. Thought it didn't support newest > > Spark Versions > > > > On Wed, Sep 12, 2018 at 11:03 PM Jaanai Zhang <cloud.pos...@gmail.com > > <mailto:cloud.pos...@gmail.com>> wrote: > > > > It seems columns data missing mapping information of the schema. if > > you want to use this way to write HBase table, you can create an > > HBase table and uses Phoenix mapping it. > > > > ---------------------------------------- > > Jaanai Zhang > > Best regards! > > > > > > > > Thomas D'Silva <tdsi...@salesforce.com > > <mailto:tdsi...@salesforce.com>> 于2018年9月13日周四 上午6:03写道: > > > > Is there a reason you didn't use the spark-connector to > > serialize your data? > > > > On Wed, Sep 12, 2018 at 2:28 PM, Saif Addin <saif1...@gmail.com > > <mailto:saif1...@gmail.com>> wrote: > > > > Thank you Josh! That was helpful. Indeed, there was a salt > > bucket on the table, and the key-column now shows correctly. > > > > However, the problem still persists in that the rest of the > > columns show as completely empty on Phoenix (appear > > correctly on Hbase). We'll be looking into this but if you > > have any further advice, appreciated. > > > > Saif > > > > On Wed, Sep 12, 2018 at 5:50 PM Josh Elser > > <els...@apache.org <mailto:els...@apache.org>> wrote: > > > > Reminder: Using Phoenix internals forces you to > > understand exactly how > > the version of Phoenix that you're using serializes > > data. Is there a > > reason you're not using SQL to interact with Phoenix? > > > > Sounds to me that Phoenix is expecting more data at the > > head of your > > rowkey. Maybe a salt bucket that you've defined on the > > table but not > > created? > > > > On 9/12/18 4:32 PM, Saif Addin wrote: > > > Hi all, > > > > > > We're trying to write tables with all string columns > > from spark. > > > We are not using the Spark Connector, instead we are > > directly writing > > > byte arrays from RDDs. > > > > > > The process works fine, and Hbase receives the data > > correctly, and > > > content is consistent. > > > > > > However reading the table from Phoenix, we notice the > > first character of > > > strings are missing. This sounds like it's a byte > > encoding issue, but > > > we're at loss. We're using PVarchar to generate bytes. 
Thanks

On Thu, Sep 13, 2018 at 11:38 AM Josh Elser <els...@apache.org> wrote:

> Pretty sure we ran tests with Spark 2.3 with Phoenix 5.0. Not sure if
> Spark has already moved beyond that.
>
> On 9/12/18 11:00 PM, Saif Addin wrote:
> > Thanks, we'll try the Spark Connector then. We thought it didn't
> > support the newest Spark versions.
> >
> > On Wed, Sep 12, 2018 at 11:03 PM Jaanai Zhang <cloud.pos...@gmail.com> wrote:
> >
> >     It seems the columns' data is missing the mapping information of the
> >     schema. If you want to write the HBase table this way, you can create
> >     an HBase table and use Phoenix to map it.
> >
> >     ----------------------------------------
> >        Jaanai Zhang
> >        Best regards!
> >
> >     Thomas D'Silva <tdsi...@salesforce.com> wrote on Thu, Sep 13, 2018 at 6:03 AM:
> >
> >         Is there a reason you didn't use the spark-connector to
> >         serialize your data?
> >
> >         On Wed, Sep 12, 2018 at 2:28 PM, Saif Addin <saif1...@gmail.com> wrote:
> >
> >             Thank you Josh! That was helpful. Indeed, there was a salt
> >             bucket on the table, and the key column now shows correctly.
> >
> >             However, the problem still persists: the rest of the columns
> >             show as completely empty in Phoenix (they appear correctly in
> >             HBase). We'll be looking into this, but any further advice is
> >             appreciated.
> >
> >             Saif
> >
> >             On Wed, Sep 12, 2018 at 5:50 PM Josh Elser <els...@apache.org> wrote:
> >
> >                 Reminder: using Phoenix internals forces you to understand
> >                 exactly how the version of Phoenix that you're using
> >                 serializes data. Is there a reason you're not using SQL to
> >                 interact with Phoenix?
> >
> >                 Sounds to me like Phoenix is expecting more data at the
> >                 head of your rowkey. Maybe a salt bucket that you've
> >                 defined on the table but not created?
> >
> >                 On 9/12/18 4:32 PM, Saif Addin wrote:
> >                 > Hi all,
> >                 >
> >                 > We're trying to write tables with all-string columns from
> >                 > Spark. We are not using the Spark Connector; instead we
> >                 > are directly writing byte arrays from RDDs.
> >                 >
> >                 > The process works fine, HBase receives the data
> >                 > correctly, and the content is consistent.
> >                 >
> >                 > However, reading the table from Phoenix, we notice that
> >                 > the first character of the strings is missing. This
> >                 > sounds like a byte-encoding issue, but we're at a loss.
> >                 > We're using PVarchar to generate the bytes.
> >                 >
> >                 > Here's the snippet of code creating the RDD:
> >                 >
> >                 > val tdd = pdd.flatMap(x => {
> >                 >   val rowKey = PVarchar.INSTANCE.toBytes(x._1)
> >                 >   for (i <- 0 until cols.length) yield {
> >                 >     other stuff for other columns ...
> >                 >     ...
> >                 >     (rowKey, (column1, column2, column3))
> >                 >   }
> >                 > })
> >                 >
> >                 > ...
> >                 >
> >                 > We then create the following output to be written to
> >                 > HBase:
> >                 >
> >                 > val output = tdd.map(x => {
> >                 >   val rowKeyByte: Array[Byte] = x._1
> >                 >   val immutableRowKey = new ImmutableBytesWritable(rowKeyByte)
> >                 >
> >                 >   val kv = new KeyValue(rowKeyByte,
> >                 >     PVarchar.INSTANCE.toBytes(column1),
> >                 >     PVarchar.INSTANCE.toBytes(column2),
> >                 >     PVarchar.INSTANCE.toBytes(column3)
> >                 >   )
> >                 >   (immutableRowKey, kv)
> >                 > })
> >                 >
> >                 > By the way, we are using *KryoSerializer* in order to be
> >                 > able to serialize all the classes necessary for HBase
> >                 > (KeyValue, BytesWritable, etc.).
> >                 >
> >                 > The key of this table is the one missing data when
> >                 > queried from Phoenix, so we guess something is wrong with
> >                 > the byte serialization.
> >                 >
> >                 > Any ideas? Appreciated!
> >                 > Saif
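P.S. For reference, the connector-based write suggested earlier in the thread
would look roughly like this for the same data. This is only a sketch: it
assumes pdd is an RDD of (ID, COL1, COL2, COL3) tuples, that the Phoenix table
already exists with those columns, and that hbConf is the configuration
loading hbase-site.xml from the read snippet above:

import org.apache.phoenix.spark._   // adds saveToPhoenix to RDDs of tuples

// The connector goes through Phoenix's own serialization, so salt bytes and
// type encoding are handled for us, with no manual PVarchar/KeyValue work.
pdd.saveToPhoenix(
  "VISTA_409X68",
  Seq("ID", "COL1", "COL2", "COL3"),
  conf = hbConf
)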