Is there a reason you didn't use the spark-connector to serialize your data?
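To illustrate the kind of detail the connector takes care of: on a salted table, Phoenix expects one extra byte at the head of the rowkey, so a hand-built key without it reads back with its first character missing. Below is a rough, self-contained sketch of that effect. The names (SaltedKeySketch, saltByte, saltedKey, readSalted) are made up for illustration, and the hash is only a stand-in; the real computation depends on your Phoenix version, so don't rely on this to produce keys Phoenix will accept.

```scala
import java.nio.charset.StandardCharsets.UTF_8

object SaltedKeySketch {
  // Illustrative stand-in for a salt computation (an assumption, not
  // necessarily what your Phoenix version does): hash the key bytes and
  // map the result onto one of `buckets` buckets.
  def saltByte(key: Array[Byte], buckets: Int): Byte = {
    val hash = key.foldLeft(1)((h, b) => 31 * h + b)
    math.abs(hash % buckets).toByte
  }

  // Layout a salted table expects: one salt byte, then the key bytes.
  def saltedKey(key: Array[Byte], buckets: Int): Array[Byte] =
    saltByte(key, buckets) +: key

  // A reader that assumes a salt byte skips the first byte of whatever
  // is stored, whether or not the writer actually put a salt byte there.
  def readSalted(stored: Array[Byte]): String =
    new String(stored.drop(1), UTF_8)
}
```

With this, reading an unsalted key such as "row1" through readSalted drops the leading "r", while reading saltedKey("row1", n) returns the full string. Going through SQL or the connector avoids having to reproduce this layout by hand.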
On Wed, Sep 12, 2018 at 2:28 PM, Saif Addin <saif1...@gmail.com> wrote:

> Thank you Josh! That was helpful. Indeed, there was a salt bucket on the
> table, and the key-column now shows correctly.
>
> However, the problem still persists in that the rest of the columns show
> as completely empty on Phoenix (appear correctly on Hbase). We'll be
> looking into this but if you have any further advice, appreciated.
>
> Saif
>
> On Wed, Sep 12, 2018 at 5:50 PM Josh Elser <els...@apache.org> wrote:
>
>> Reminder: Using Phoenix internals forces you to understand exactly how
>> the version of Phoenix that you're using serializes data. Is there a
>> reason you're not using SQL to interact with Phoenix?
>>
>> Sounds to me that Phoenix is expecting more data at the head of your
>> rowkey. Maybe a salt bucket that you've defined on the table but not
>> created?
>>
>> On 9/12/18 4:32 PM, Saif Addin wrote:
>> > Hi all,
>> >
>> > We're trying to write tables with all string columns from spark.
>> > We are not using the Spark Connector, instead we are directly writing
>> > byte arrays from RDDs.
>> >
>> > The process works fine, and Hbase receives the data correctly, and
>> > content is consistent.
>> >
>> > However reading the table from Phoenix, we notice the first character of
>> > strings are missing. This sounds like it's a byte encoding issue, but
>> > we're at loss. We're using PVarchar to generate bytes.
>> >
>> > Here's the snippet of code creating the RDD:
>> >
>> > val tdd = pdd.flatMap(x => {
>> >   val rowKey = PVarchar.INSTANCE.toBytes(x._1)
>> >   for(i <- 0 until cols.length) yield {
>> >     other stuff for other columns ...
>> >     ...
>> >     (rowKey, (column1, column2, column3))
>> >   }
>> > })
>> >
>> > ...
>> >
>> > We then create the following output to be written down in Hbase
>> >
>> > val output = tdd.map(x => {
>> >   val rowKeyByte: Array[Byte] = x._1
>> >   val immutableRowKey = new ImmutableBytesWritable(rowKeyByte)
>> >
>> >   val kv = new KeyValue(rowKeyByte,
>> >     PVarchar.INSTANCE.toBytes(column1),
>> >     PVarchar.INSTANCE.toBytes(column2),
>> >     PVarchar.INSTANCE.toBytes(column3)
>> >   )
>> >   (immutableRowKey, kv)
>> > })
>> >
>> > By the way, we are using *KryoSerializer* in order to be able to
>> > serialize all classes necessary for Hbase (KeyValue, BytesWritable, etc).
>> >
>> > The key of this table is the one missing data when queried from Phoenix.
>> > So we guess something is wrong with the byte ser.
>> >
>> > Any ideas? Appreciated!
>> > Saif
>> >