As I said earlier, the expectation is that you use the
phoenix-client.jar and phoenix-spark2.jar for the phoenix-spark
integration with spark2.
You do not need to reference all of these jars by hand. We create the
jars with all of the necessary dependencies bundled to specifically
avoid creating this problem for users.
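
For illustration, a minimal sketch of wiring those two jars into a Spark 2 session; the /opt/phoenix paths are placeholders for wherever your Phoenix distribution puts the bundled jars:

    import org.apache.spark.sql.SparkSession

    // phoenix-client.jar bundles the JDBC driver and its dependencies;
    // phoenix-spark2.jar adds the Spark 2 integration on top of it.
    val spark = SparkSession.builder()
      .appName("phoenix-spark2-example")
      .config("spark.jars",
        "/opt/phoenix/phoenix-client.jar,/opt/phoenix/phoenix-spark2.jar")
      .getOrCreate()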
On 9/17/18 3:27 PM, Saif Addin wrote:
Thanks for the patience; sorry, maybe I sent incomplete information. We
are loading the following jars and still getting:

(executor 1): java.lang.NoClassDefFoundError: Could not initialize class
org.apache.phoenix.query.QueryServicesOptions
http://central.maven.org/maven2/org/apache/hbase/hbase-client/2.1.0/hbase-client-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/hbase-common/2.1.0/hbase-common-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/hbase-hadoop-compat/2.1.0/hbase-hadoop-compat-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/hbase-mapreduce/2.1.0/hbase-mapreduce-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/thirdparty/hbase-shaded-miscellaneous/2.1.0/hbase-shaded-miscellaneous-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/hbase-protocol/2.1.0/hbase-protocol-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/hbase-protocol-shaded/2.1.0/hbase-protocol-shaded-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/thirdparty/hbase-shaded-protobuf/2.1.0/hbase-shaded-protobuf-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/thirdparty/hbase-shaded-netty/2.1.0/hbase-shaded-netty-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/hbase-server/2.1.0/hbase-server-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/hbase-hadoop2-compat/2.1.0/hbase-hadoop2-compat-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/hbase-metrics/2.1.0/hbase-metrics-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/hbase-metrics-api/2.1.0/hbase-metrics-api-2.1.0.jar
http://central.maven.org/maven2/org/apache/hbase/hbase-zookeeper/2.1.0/hbase-zookeeper-2.1.0.jar
http://central.maven.org/maven2/org/apache/phoenix/phoenix-spark/5.0.0-HBase-2.0/phoenix-spark-5.0.0-HBase-2.0.jar
http://central.maven.org/maven2/org/apache/phoenix/phoenix-core/5.0.0-HBase-2.0/phoenix-core-5.0.0-HBase-2.0.jar
http://central.maven.org/maven2/org/apache/phoenix/phoenix-queryserver/5.0.0-HBase-2.0/phoenix-queryserver-5.0.0-HBase-2.0.jar
http://central.maven.org/maven2/org/apache/phoenix/phoenix-queryserver-client/5.0.0-HBase-2.0/phoenix-queryserver-client-5.0.0-HBase-2.0.jar
http://central.maven.org/maven2/org/apache/twill/twill-zookeeper/0.13.0/twill-zookeeper-0.13.0.jar
http://central.maven.org/maven2/org/apache/twill/twill-discovery-core/0.13.0/twill-discovery-core-0.13.0.jar
Not sure which one I could be missing??
On Fri, Sep 14, 2018 at 7:34 PM Josh Elser <els...@apache.org> wrote:
Uh, you're definitely not using the right JARs :)
You'll want the phoenix-client.jar for the Phoenix JDBC driver and the
phoenix-spark.jar for the Phoenix RDD.
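
A quick way to confirm the client jar is wired up correctly (a minimal sketch; the ZooKeeper quorum below is a placeholder):

    import java.sql.DriverManager

    // The Phoenix JDBC driver in phoenix-client.jar answers jdbc:phoenix:
    // URLs; the host:port is your ZooKeeper quorum.
    val conn = DriverManager.getConnection("jdbc:phoenix:zookeper-host-url:2181")
    val rs = conn.createStatement()
      .executeQuery("SELECT TABLE_NAME FROM SYSTEM.CATALOG LIMIT 5")
    while (rs.next()) println(rs.getString("TABLE_NAME"))
    conn.close()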
On 9/14/18 1:08 PM, Saif Addin wrote:
> Hi, I am attempting to make a connection with Spark, but no success so far.
>
> For writing into Phoenix, I am trying this:
>
> tdd.toDF("ID", "COL1", "COL2", "COL3")
>   .write
>   .format("org.apache.phoenix.spark")
>   .option("zkUrl", "zookeper-host-url:2181")
>   .option("table", htablename)
>   .mode("overwrite")
>   .save()
>
> But getting:
> java.sql.SQLException: ERROR 103 (08004): Unable to establish connection.
>
> For reading, on the other hand, I am attempting this:
>
> import org.apache.hadoop.fs.Path
> import org.apache.hadoop.hbase.HBaseConfiguration
> import org.apache.phoenix.spark._  // provides phoenixTableAsDataFrame
>
> val hbConf = HBaseConfiguration.create()
> val hbaseSitePath = "/etc/hbase/conf/hbase-site.xml"
> hbConf.addResource(new Path(hbaseSitePath))
>
> spark.sqlContext.phoenixTableAsDataFrame("VISTA_409X68", Array("ID"), conf = hbConf)
>
> Gets me:
> java.lang.NoClassDefFoundError: Could not initialize class
> org.apache.phoenix.query.QueryServicesOptions
>
> I have added phoenix-queryserver-5.0.0-HBase-2.0.jar and
> phoenix-queryserver-client-5.0.0-HBase-2.0.jar
> Any thoughts? I have an hbase-site.xml file with more configuration, but I am
> not sure how to get it read in the saving instance.
> Thanks
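
One possible way to get that hbase-site.xml picked up (an assumption, not confirmed anywhere in this thread) is to put its directory on the executor and driver classpaths, since HBaseConfiguration.create() loads hbase-site.xml from the classpath:

    import org.apache.spark.sql.SparkSession

    // Assumption: /etc/hbase/conf contains hbase-site.xml. Note that in
    // client deploy mode the driver classpath entry must be supplied at
    // launch (e.g. on the spark-submit command line) rather than set here,
    // because the driver JVM has already started by this point.
    val spark = SparkSession.builder()
      .config("spark.executor.extraClassPath", "/etc/hbase/conf")
      .config("spark.driver.extraClassPath", "/etc/hbase/conf")
      .getOrCreate()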
>
> On Thu, Sep 13, 2018 at 11:38 AM Josh Elser <els...@apache.org> wrote:
>
> Pretty sure we ran tests with Spark 2.3 against Phoenix 5.0. Not sure if
> Spark has already moved beyond that.
>
> On 9/12/18 11:00 PM, Saif Addin wrote:
> > Thanks, we'll try the Spark connector then. We thought it didn't support
> > the newest Spark versions.
> >
> > On Wed, Sep 12, 2018 at 11:03 PM Jaanai Zhang <cloud.pos...@gmail.com> wrote:
> >
> > It seems the column data is missing mapping information from the schema.
> > If you want to write the HBase table this way, you can create an HBase
> > table and use Phoenix to map it.
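
A minimal sketch of that mapping, assuming a hypothetical existing HBase table "T" with a column family "CF" (the names and the quorum are made up): a Phoenix view over the HBase table gives Phoenix the schema it needs to decode the stored bytes.

    import java.sql.DriverManager

    val conn = DriverManager.getConnection("jdbc:phoenix:zookeper-host-url:2181")
    // Map the existing HBase table "T": the rowkey becomes the primary key,
    // and each mapped column names its HBase column family and qualifier.
    conn.createStatement().execute(
      """CREATE VIEW "T" (
        |  PK VARCHAR PRIMARY KEY,
        |  "CF"."COL1" VARCHAR,
        |  "CF"."COL2" VARCHAR
        |)""".stripMargin)
    conn.close()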
> >
> > ----------------------------------------
> > Jaanai Zhang
> > Best regards!
> >
> >
> >
> > On Thu, Sep 13, 2018 at 6:03 AM, Thomas D'Silva <tdsi...@salesforce.com> wrote:
> >
> > Is there a reason you didn't use the spark-connector to serialize your data?
> >
> > On Wed, Sep 12, 2018 at 2:28 PM, Saif Addin <saif1...@gmail.com> wrote:
> >
> > Thank you Josh! That was helpful. Indeed, there was a salt bucket on the
> > table, and the key column now shows correctly.
> >
> > However, the problem still persists: the rest of the columns show as
> > completely empty in Phoenix (they appear correctly in HBase). We'll be
> > looking into this, but if you have any further advice, it is appreciated.
> >
> > Saif
> >
> > On Wed, Sep 12, 2018 at 5:50 PM Josh Elser <els...@apache.org> wrote:
> >
> > Reminder: using Phoenix internals forces you to understand exactly how
> > the version of Phoenix that you're using serializes data. Is there a
> > reason you're not using SQL to interact with Phoenix?
> >
> > Sounds to me like Phoenix is expecting more data at the head of your
> > rowkey. Maybe a salt bucket that you've defined on the table but not
> > created?
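
For context: declaring SALT_BUCKETS on a table makes Phoenix prepend a one-byte hash to every rowkey it writes, so raw HBase writes that omit that leading byte read back shifted by one byte through Phoenix. A minimal sketch (the table name and quorum are made up):

    import java.sql.DriverManager

    val conn = DriverManager.getConnection("jdbc:phoenix:zookeper-host-url:2181")
    // SALT_BUCKETS = 4: Phoenix stores hash(rowkey) % 4 as the first byte
    // of every rowkey, and raw writers must reproduce that byte themselves.
    conn.createStatement().execute(
      "CREATE TABLE EXAMPLE (ID VARCHAR PRIMARY KEY, COL1 VARCHAR) SALT_BUCKETS = 4")
    conn.close()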
> >
> > On 9/12/18 4:32 PM, Saif Addin wrote:
> > > Hi all,
> > >
> > > We're trying to write tables with all string columns from Spark.
> > > We are not using the Spark connector; instead, we are directly writing
> > > byte arrays from RDDs.
> > >
> > > The process works fine: HBase receives the data correctly, and the
> > > content is consistent.
> > >
> > > However, when reading the table from Phoenix, we notice the first
> > > character of each string is missing. This sounds like a byte-encoding
> > > issue, but we're at a loss. We're using PVarchar to generate the bytes.
> > >
> > > Here's the snippet of code creating the RDD:
> > >
> > > val tdd = pdd.flatMap(x => {
> > >   val rowKey = PVarchar.INSTANCE.toBytes(x._1)
> > >   for (i <- 0 until cols.length) yield {
> > >     // other stuff for other columns ...
> > >     ...
> > >     (rowKey, (column1, column2, column3))
> > >   }
> > > })
> > >
> > > ...
> > >
> > > We then create the following output to be written to HBase:
> > >
> > > val output = tdd.map(x => {
> > >   val rowKeyByte: Array[Byte] = x._1
> > >   val immutableRowKey = new ImmutableBytesWritable(rowKeyByte)
> > >
> > >   // KeyValue(row, family, qualifier, value): column1 ends up as the
> > >   // column family, column2 as the qualifier, and column3 as the value.
> > >   val kv = new KeyValue(rowKeyByte,
> > >     PVarchar.INSTANCE.toBytes(column1),
> > >     PVarchar.INSTANCE.toBytes(column2),
> > >     PVarchar.INSTANCE.toBytes(column3)
> > >   )
> > >   (immutableRowKey, kv)
> > > })
> > >
> > > By the way, we are using KryoSerializer in order to be able to
> > > serialize all the classes necessary for HBase (KeyValue, BytesWritable,
> > > etc.).
> > >
> > > The key of this table is the one missing data when queried from
> > > Phoenix, so we guess something is wrong with the byte serialization.
> > >
> > > Any ideas? Appreciated!
> > > Saif
> >
> >
>