Re: HBase connector does not read ZK configuration from Spark session

2018-02-23 Thread Deepak Sharma
Hi Dharmin
With the first approach, you will have to read the properties from the
file shipped via --files, using something like:
SparkFiles.get("file.txt")

Or else, you can copy the file to HDFS, read it using sc.textFile, and use
the properties within it.

If you add files using --files, they get copied to each executor's working
directory, but you still have to read them and set the properties in the
configuration yourself.
Thanks
Deepak
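
The approach Deepak describes can be sketched as follows. This is a minimal,
hedged sketch: the file name "hbase.properties", the helper loadProps, and the
key names are illustrative assumptions; on a real driver the text would come
from scala.io.Source.fromFile(SparkFiles.get("hbase.properties")).mkString.

```scala
import java.io.StringReader
import java.util.Properties

// Parse key=value pairs (e.g. the contents of a file fetched with
// SparkFiles.get(...)) into an immutable Scala Map.
def loadProps(text: String): Map[String, String] = {
  val p = new Properties()
  p.load(new StringReader(text))
  val it = p.stringPropertyNames().iterator()
  var m = Map.empty[String, String]
  while (it.hasNext) {
    val k = it.next()
    m += (k -> p.getProperty(k))
  }
  m
}

val props = loadProps(
  "hbase.zookeeper.quorum=ip1,ip2,ip3\nhbase.zookeeper.property.clientPort=2181")
// Each entry can then be applied with conf.set(key, value) on an
// org.apache.hadoop.conf.Configuration before connecting to HBase.
```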

On Fri, Feb 23, 2018 at 10:25 AM, Dharmin Siddesh J <
siddeshjdhar...@gmail.com> wrote:

> I am trying to write a Spark program that reads data from HBase and stores
> it in a DataFrame.
>
> I am able to run it perfectly with hbase-site.xml in the $SPARK_HOME/conf
> folder, but I am facing a few issues here.
>
> Issue 1
>
> The first issue is passing hbase-site.xml location with the --files
> parameter submitted through client mode (it works in cluster mode).
>
>
> When I removed hbase-site.xml from $SPARK_HOME/conf and tried to execute
> it in client mode by passing it with the --files parameter over YARN, I
> keep getting an exception (which I think means it is not taking the
> ZooKeeper configuration from hbase-site.xml):
>
> spark-submit \
>
>   --master yarn \
>
>   --deploy-mode client \
>
>   --files /home/siddesh/hbase-site.xml \
>
>   --class com.orzota.rs.json.HbaseConnector \
>
>   --packages com.hortonworks:shc:1.0.0-2.0-s_2.11 \
>
>   --repositories http://repo.hortonworks.com/content/groups/public/ \
>
>   target/scala-2.11/test-0.1-SNAPSHOT.jar
>
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
>
> 18/02/22 01:43:09 INFO ClientCnxn: Opening socket connection to server
> localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using
> SASL (unknown error)
>
> 18/02/22 01:43:09 WARN ClientCnxn: Session 0x0 for server null, unexpected
> error, closing socket connection and attempting reconnect
>
> java.net.ConnectException: Connection refused
>
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>
> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>
> at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
>
> However, it works fine when I run it in cluster mode.
>
>
> Issue 2
>
> Passing the HBase configuration details through the Spark session, which I
> can't get to work in either client or cluster mode.
>
>
> Instead of passing the entire hbase-site.xml I am trying to add the
> configuration directly in the code by adding it as a configuration
> parameter in the SparkSession, e.g.:
>
>
> val spark = SparkSession
>
>   .builder()
>
>   .appName(name)
>
>   .config("hbase.zookeeper.property.clientPort", "2181")
>
>   .config("hbase.zookeeper.quorum", "ip1,ip2,ip3")
>
>   .config("spark.hbase.host","zookeeperquorum")
>
>   .getOrCreate()
>
>
> val json_df =
>
>   spark.read.option("catalog",catalog_read).
>
>   format("org.apache.spark.sql.execution.datasources.hbase").
>
>   load()
>
> This is not working in cluster mode either.
>
>
> Can anyone help me with a solution, or explain why this is happening? Are
> there any workarounds?
>
>
>


-- 
Thanks
Deepak
www.bigdatabig.com
www.keosha.net


Re: HBase connector does not read ZK configuration from Spark session

2018-02-22 Thread Jorge Machado
Could it be that you are missing the HBASE_HOME variable?

Jorge Machado






> On 23 Feb 2018, at 04:55, Dharmin Siddesh J  wrote:
> 
> I am trying to write a Spark program that reads data from HBase and stores
> it in a DataFrame. [...]


HBase connector does not read ZK configuration from Spark session

2018-02-22 Thread Dharmin Siddesh J
I am trying to write a Spark program that reads data from HBase and stores
it in a DataFrame.

I am able to run it perfectly with hbase-site.xml in the $SPARK_HOME/conf
folder, but I am facing a few issues here.

Issue 1

The first issue is passing hbase-site.xml location with the --files
parameter submitted through client mode (it works in cluster mode).


When I removed hbase-site.xml from $SPARK_HOME/conf and tried to execute it
in client mode by passing it with the --files parameter over YARN, I keep
getting an exception (which I think means it is not taking the ZooKeeper
configuration from hbase-site.xml):

spark-submit \

  --master yarn \

  --deploy-mode client \

  --files /home/siddesh/hbase-site.xml \

  --class com.orzota.rs.json.HbaseConnector \

  --packages com.hortonworks:shc:1.0.0-2.0-s_2.11 \

  --repositories http://repo.hortonworks.com/content/groups/public/ \

  target/scala-2.11/test-0.1-SNAPSHOT.jar

at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)

18/02/22 01:43:09 INFO ClientCnxn: Opening socket connection to server
localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL
(unknown error)

18/02/22 01:43:09 WARN ClientCnxn: Session 0x0 for server null, unexpected
error, closing socket connection and attempting reconnect

java.net.ConnectException: Connection refused

at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)

at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)

at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)

at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)

However, it works fine when I run it in cluster mode.
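
One possible explanation for the client-mode failure, offered as an
assumption rather than a confirmed diagnosis: --files stages the file for
the YARN containers (executors, and the driver only in cluster mode), but a
client-mode driver runs on the submitting machine and never sees it, so the
HBase client falls back to the default localhost:2181 quorum seen in the
log. If that is the cause, a sketch of a fix is to also put the directory
containing hbase-site.xml on the driver classpath (paths reuse the command
above):

```shell
# Config sketch: same submission as above, with the directory holding
# hbase-site.xml additionally added to the driver's classpath so the
# client-mode driver can load the ZooKeeper settings.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --files /home/siddesh/hbase-site.xml \
  --driver-class-path /home/siddesh \
  --class com.orzota.rs.json.HbaseConnector \
  --packages com.hortonworks:shc:1.0.0-2.0-s_2.11 \
  --repositories http://repo.hortonworks.com/content/groups/public/ \
  target/scala-2.11/test-0.1-SNAPSHOT.jar
```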


Issue 2

Passing the HBase configuration details through the Spark session, which I
can't get to work in either client or cluster mode.


Instead of passing the entire hbase-site.xml I am trying to add the
configuration directly in the code by adding it as a configuration
parameter in the SparkSession, e.g.:


val spark = SparkSession

  .builder()

  .appName(name)

  .config("hbase.zookeeper.property.clientPort", "2181")

  .config("hbase.zookeeper.quorum", "ip1,ip2,ip3")

  .config("spark.hbase.host","zookeeperquorum")

  .getOrCreate()


val json_df =

  spark.read.option("catalog",catalog_read).

  format("org.apache.spark.sql.execution.datasources.hbase").

  load()

This is not working in cluster mode either.
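
A likely reason, stated here as an assumption: SparkSession.config(...) only
writes entries into the Spark conf, while the HBase client builds its own
Hadoop Configuration and never consults Spark's, so the hbase.* keys above
are silently ignored. A sketch of a workaround is to copy those entries over
by hand before connecting; the helper name hbaseEntries is illustrative:

```scala
// Select the HBase client settings out of a map of Spark conf entries.
// In a real job the input would be spark.conf.getAll, and each selected
// entry would be applied with hadoopConf.set(k, v) (on an
// org.apache.hadoop.conf.Configuration) before creating the connection.
def hbaseEntries(sparkConf: Map[String, String]): Map[String, String] =
  sparkConf.filter { case (k, _) => k.startsWith("hbase.") }

val conf = Map(
  "hbase.zookeeper.quorum" -> "ip1,ip2,ip3",
  "hbase.zookeeper.property.clientPort" -> "2181",
  "spark.hbase.host" -> "zookeeperquorum")

val forHBase = hbaseEntries(conf)
// forHBase holds only the two hbase.* keys; "spark.hbase.host" is a
// connector-level option and stays in the Spark conf.
```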


Can anyone help me with a solution, or explain why this is happening? Are
there any workarounds?