Thanks, we'll try the Spark connector then. We thought it didn't support the newest Spark versions.
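For the archive, here's roughly what we intend to try. This is an untested
sketch; the table name and ZooKeeper quorum are placeholders, and the target
table must already exist in Phoenix:

    import org.apache.spark.sql.{SaveMode, SparkSession}

    val spark = SparkSession.builder().appName("phoenix-write").getOrCreate()
    import spark.implicits._

    // DataFrame column names must match the Phoenix table's columns
    // (placeholders here).
    val df = Seq(("key1", "a", "b", "c"))
      .toDF("PK", "COLUMN1", "COLUMN2", "COLUMN3")

    df.write
      .format("org.apache.phoenix.spark")
      .mode(SaveMode.Overwrite)        // the connector only supports Overwrite; it upserts
      .option("table", "MY_TABLE")     // placeholder Phoenix table name
      .option("zkUrl", "zkhost:2181")  // placeholder ZooKeeper quorum
      .save()

If this works, the connector should take care of the bookkeeping we were doing
by hand: the salt byte, the empty cell Phoenix expects in each row, and the
type serialization.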
On Wed, Sep 12, 2018 at 11:03 PM Jaanai Zhang <cloud.pos...@gmail.com> wrote:

> It seems the column data is missing mapping information from the schema.
> If you want to write to the HBase table this way, you can create an HBase
> table and use Phoenix to map it.
>
> ----------------------------------------
>    Jaanai Zhang
>    Best regards!
>
>
> Thomas D'Silva <tdsi...@salesforce.com> wrote on Thu, Sep 13, 2018 at
> 6:03 AM:
>
>> Is there a reason you didn't use the spark-connector to serialize your
>> data?
>>
>> On Wed, Sep 12, 2018 at 2:28 PM, Saif Addin <saif1...@gmail.com> wrote:
>>
>>> Thank you Josh! That was helpful. Indeed, there was a salt bucket on
>>> the table, and the key column now shows correctly.
>>>
>>> However, the problem still persists in that the rest of the columns
>>> show up completely empty in Phoenix (they appear correctly in HBase).
>>> We'll be looking into this, but if you have any further advice, it
>>> would be appreciated.
>>>
>>> Saif
>>>
>>> On Wed, Sep 12, 2018 at 5:50 PM Josh Elser <els...@apache.org> wrote:
>>>
>>>> Reminder: using Phoenix internals forces you to understand exactly
>>>> how the version of Phoenix that you're using serializes data. Is
>>>> there a reason you're not using SQL to interact with Phoenix?
>>>>
>>>> Sounds to me like Phoenix is expecting more data at the head of your
>>>> rowkey. Maybe a salt bucket that you've defined on the table but not
>>>> created?
>>>>
>>>> On 9/12/18 4:32 PM, Saif Addin wrote:
>>>> > Hi all,
>>>> >
>>>> > We're trying to write tables with all-string columns from Spark.
>>>> > We are not using the Spark connector; instead we are writing byte
>>>> > arrays directly from RDDs.
>>>> >
>>>> > The process works fine: HBase receives the data correctly, and the
>>>> > content is consistent.
>>>> >
>>>> > However, when reading the table from Phoenix, we notice the first
>>>> > character of each string is missing. This sounds like a byte
>>>> > encoding issue, but we're at a loss. We're using PVarchar to
>>>> > generate the bytes.
>>>> >
>>>> > Here's the snippet of code creating the RDD:
>>>> >
>>>> > val tdd = pdd.flatMap(x => {
>>>> >   val rowKey = PVarchar.INSTANCE.toBytes(x._1)
>>>> >   for (i <- 0 until cols.length) yield {
>>>> >     // other stuff for other columns ...
>>>> >     ...
>>>> >     (rowKey, (column1, column2, column3))
>>>> >   }
>>>> > })
>>>> >
>>>> > ...
>>>> >
>>>> > We then create the following output to be written to HBase:
>>>> >
>>>> > val output = tdd.map(x => {
>>>> >   val rowKeyByte: Array[Byte] = x._1
>>>> >   val immutableRowKey = new ImmutableBytesWritable(rowKeyByte)
>>>> >   val kv = new KeyValue(rowKeyByte,
>>>> >     PVarchar.INSTANCE.toBytes(column1),
>>>> >     PVarchar.INSTANCE.toBytes(column2),
>>>> >     PVarchar.INSTANCE.toBytes(column3)
>>>> >   )
>>>> >   (immutableRowKey, kv)
>>>> > })
>>>> >
>>>> > By the way, we are using *KryoSerializer* in order to be able to
>>>> > serialize all the classes necessary for HBase (KeyValue,
>>>> > BytesWritable, etc.).
>>>> >
>>>> > The key of this table is the one missing data when queried from
>>>> > Phoenix, so we guess something is wrong with the byte
>>>> > serialization.
>>>> >
>>>> > Any ideas? Appreciated!
>>>> > Saif
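Following up on Jaanai's suggestion for the empty-columns problem: map the
pre-existing HBase table into Phoenix with a view, so Phoenix knows which
column family and qualifier each column lives in. A rough, untested sketch;
the table, family, and column names are placeholders:

    import java.sql.DriverManager

    // The double-quoted, case-sensitive identifiers must match the HBase
    // table name, column family, and qualifiers the Spark job actually
    // wrote, byte-for-byte.
    val conn = DriverManager.getConnection("jdbc:phoenix:zkhost:2181") // placeholder quorum
    val stmt = conn.createStatement()
    stmt.execute(
      """CREATE VIEW "my_table" (
        |  pk VARCHAR PRIMARY KEY,
        |  "cf"."column1" VARCHAR,
        |  "cf"."column2" VARCHAR,
        |  "cf"."column3" VARCHAR
        |)""".stripMargin)
    conn.close()

Note that unquoted identifiers are upper-cased by Phoenix, which is one way
column names can silently fail to line up with cells written directly to
HBase.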
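And for anyone who hits the truncated-first-character symptom from earlier
in the thread: when a table is declared with SALT_BUCKETS = n, Phoenix
prepends a single salt byte to every rowkey, so a writer that bypasses
Phoenix has to prepend it as well; otherwise Phoenix consumes the first data
byte as the salt. A hedged sketch of doing that by hand (the bucket count
here is an assumption and must equal the table's SALT_BUCKETS):

    import org.apache.phoenix.schema.SaltingUtil
    import org.apache.phoenix.schema.types.PVarchar

    val saltBuckets = 8 // assumption: must match the table's SALT_BUCKETS
    val keyBody = PVarchar.INSTANCE.toBytes("someKey")

    // Compute the salt from the key bytes, then prepend it.
    val salted = new Array[Byte](keyBody.length + 1)
    salted(0) = SaltingUtil.getSaltingByte(keyBody, 0, keyBody.length, saltBuckets)
    System.arraycopy(keyBody, 0, salted, 1, keyBody.length)

This matches the fix described above: once the salt bucket was accounted
for, the key column read back correctly.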