Thanks, we'll try the Spark connector then. We thought it didn't support the newest Spark versions.
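For the archive, here's roughly what we intend to try. This is an untested
sketch; the table name and ZooKeeper quorum are placeholders, and the target
table must already exist in Phoenix:

    import org.apache.spark.sql.{SaveMode, SparkSession}

    val spark = SparkSession.builder().appName("phoenix-write").getOrCreate()
    import spark.implicits._

    // DataFrame column names must match the Phoenix table's columns
    // (placeholders here).
    val df = Seq(("key1", "a", "b", "c"))
      .toDF("PK", "COLUMN1", "COLUMN2", "COLUMN3")

    df.write
      .format("org.apache.phoenix.spark")
      .mode(SaveMode.Overwrite)        // the connector only supports Overwrite; it upserts
      .option("table", "MY_TABLE")     // placeholder Phoenix table name
      .option("zkUrl", "zkhost:2181")  // placeholder ZooKeeper quorum
      .save()

If this works, the connector should take care of the bookkeeping we were doing
by hand: the salt byte, the empty cell Phoenix expects in each row, and the
type serialization.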
On Wed, Sep 12, 2018 at 11:03 PM Jaanai Zhang <cloud.pos...@gmail.com> wrote:

> It seems the column data is missing mapping information from the schema.
> If you want to write to the HBase table this way, you can create an HBase
> table and use Phoenix to map it.
>
> ----------------------------------------
>    Jaanai Zhang
>    Best regards!
>
>
> Thomas D'Silva <tdsi...@salesforce.com> wrote on Thu, Sep 13, 2018 at
> 6:03 AM:
>
>> Is there a reason you didn't use the spark-connector to serialize your
>> data?
>>
>> On Wed, Sep 12, 2018 at 2:28 PM, Saif Addin <saif1...@gmail.com> wrote:
>>
>>> Thank you Josh! That was helpful. Indeed, there was a salt bucket on
>>> the table, and the key column now shows correctly.
>>>
>>> However, the problem still persists in that the rest of the columns
>>> show up completely empty in Phoenix (they appear correctly in HBase).
>>> We'll be looking into this, but if you have any further advice, it
>>> would be appreciated.
>>>
>>> Saif
>>>
>>> On Wed, Sep 12, 2018 at 5:50 PM Josh Elser <els...@apache.org> wrote:
>>>
>>>> Reminder: using Phoenix internals forces you to understand exactly
>>>> how the version of Phoenix that you're using serializes data. Is
>>>> there a reason you're not using SQL to interact with Phoenix?
>>>>
>>>> Sounds to me like Phoenix is expecting more data at the head of your
>>>> rowkey. Maybe a salt bucket that you've defined on the table but not
>>>> created?
>>>>
>>>> On 9/12/18 4:32 PM, Saif Addin wrote:
>>>> > Hi all,
>>>> >
>>>> > We're trying to write tables with all-string columns from Spark.
>>>> > We are not using the Spark connector; instead we are writing byte
>>>> > arrays directly from RDDs.
>>>> >
>>>> > The process works fine: HBase receives the data correctly, and the
>>>> > content is consistent.
>>>> >
>>>> > However, when reading the table from Phoenix, we notice the first
>>>> > character of each string is missing. This sounds like a byte
>>>> > encoding issue, but we're at a loss. We're using PVarchar to
>>>> > generate the bytes.
>>>> >
>>>> > Here's the snippet of code creating the RDD:
>>>> >
>>>> > val tdd = pdd.flatMap(x => {
>>>> >   val rowKey = PVarchar.INSTANCE.toBytes(x._1)
>>>> >   for (i <- 0 until cols.length) yield {
>>>> >     // other stuff for other columns ...
>>>> >     ...
>>>> >     (rowKey, (column1, column2, column3))
>>>> >   }
>>>> > })
>>>> >
>>>> > ...
>>>> >
>>>> > We then create the following output to be written to HBase:
>>>> >
>>>> > val output = tdd.map(x => {
>>>> >   val rowKeyByte: Array[Byte] = x._1
>>>> >   val immutableRowKey = new ImmutableBytesWritable(rowKeyByte)
>>>> >   val kv = new KeyValue(rowKeyByte,
>>>> >     PVarchar.INSTANCE.toBytes(column1),
>>>> >     PVarchar.INSTANCE.toBytes(column2),
>>>> >     PVarchar.INSTANCE.toBytes(column3)
>>>> >   )
>>>> >   (immutableRowKey, kv)
>>>> > })
>>>> >
>>>> > By the way, we are using *KryoSerializer* in order to be able to
>>>> > serialize all the classes necessary for HBase (KeyValue,
>>>> > BytesWritable, etc.).
>>>> >
>>>> > The key of this table is the one missing data when queried from
>>>> > Phoenix, so we guess something is wrong with the byte
>>>> > serialization.
>>>> >
>>>> > Any ideas? Appreciated!
>>>> > Saif
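Following up on Jaanai's suggestion for the empty-columns problem: map the
pre-existing HBase table into Phoenix with a view, so Phoenix knows which
column family and qualifier each column lives in. A rough, untested sketch;
the table, family, and column names are placeholders:

    import java.sql.DriverManager

    // The double-quoted, case-sensitive identifiers must match the HBase
    // table name, column family, and qualifiers the Spark job actually
    // wrote, byte-for-byte.
    val conn = DriverManager.getConnection("jdbc:phoenix:zkhost:2181") // placeholder quorum
    val stmt = conn.createStatement()
    stmt.execute(
      """CREATE VIEW "my_table" (
        |  pk VARCHAR PRIMARY KEY,
        |  "cf"."column1" VARCHAR,
        |  "cf"."column2" VARCHAR,
        |  "cf"."column3" VARCHAR
        |)""".stripMargin)
    conn.close()

Note that unquoted identifiers are upper-cased by Phoenix, which is one way
column names can silently fail to line up with cells written directly to
HBase.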
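And for anyone who hits the truncated-first-character symptom from earlier
in the thread: when a table is declared with SALT_BUCKETS = n, Phoenix
prepends a single salt byte to every rowkey, so a writer that bypasses
Phoenix has to prepend it as well; otherwise Phoenix consumes the first data
byte as the salt. A hedged sketch of doing that by hand (the bucket count
here is an assumption and must equal the table's SALT_BUCKETS):

    import org.apache.phoenix.schema.SaltingUtil
    import org.apache.phoenix.schema.types.PVarchar

    val saltBuckets = 8 // assumption: must match the table's SALT_BUCKETS
    val keyBody = PVarchar.INSTANCE.toBytes("someKey")

    // Compute the salt from the key bytes, then prepend it.
    val salted = new Array[Byte](keyBody.length + 1)
    salted(0) = SaltingUtil.getSaltingByte(keyBody, 0, keyBody.length, saltBuckets)
    System.arraycopy(keyBody, 0, salted, 1, keyBody.length)

This matches the fix described above: once the salt bucket was accounted
for, the key column read back correctly.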