RE: Spark opening to many connection with zookeeper

Amit Hora Tue, 20 Oct 2015 09:01:23 -0700

Hi Ted,

I made a mistake last time. Yes, the connections are well controlled when I use 
put: I iterated over each RDD and, within that, for each partition opened a 
connection and executed a list of puts against HBase.
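
The per-partition approach described above could look roughly like this. It is 
a minimal sketch, assuming the HBase 1.x client API (ConnectionFactory, Table) 
and an RDD of JSON strings. The names writePartitions and zkQuorum and the batch 
size of 1000 are hypothetical and not taken from the job discussed in this thread.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.spark.rdd.RDD
import scala.collection.JavaConverters._

object HBasePartitionWriter {
  // Hypothetical helper: writes each partition through a single HBase connection.
  def writePartitions(rdd: RDD[String], tableName: String, zkQuorum: String,
                      convertToPut: String => Put): Unit = {
    rdd.foreachPartition { records =>
      // Built on the executor, because Hadoop Configuration is not serializable.
      val hconf = HBaseConfiguration.create()
      hconf.set("hbase.zookeeper.quorum", zkQuorum)
      val connection = ConnectionFactory.createConnection(hconf)
      try {
        val table = connection.getTable(TableName.valueOf(tableName))
        try {
          // Send the puts in batches (1000 is an assumed size) instead of one by one.
          records.map(convertToPut).grouped(1000).foreach { batch =>
            table.put(batch.asJava)
          }
        } finally table.close()
      } finally connection.close() // one connection per partition, always released
    }
  }
}
```

With this shape, the number of open connections is bounded by the number of 
partitions being processed at the same time.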


But why did the number of connections grow so much when I used hconf and the 
saveAsHadoopDataset method?

-----Original Message-----
From: "Amit Hora" <hora.a...@gmail.com>
Sent: 20-10-2015 20:38
To: "Ted Yu" <yuzhih...@gmail.com>
Cc: "user" <user@spark.apache.org>
Subject: RE: Spark opening to many connection with zookeeper

I tried that as well, but the number of connections kept increasing: it started 
at 10 and went up to 299. 
Then I changed my ZooKeeper configuration to set the maximum client connections 
to just 30 and restarted the job. 
For the last 2 hours the connection count has stayed between 18 and 24.

I am unable to understand this behaviour.
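
For context, the connection cap mentioned above is normally the maxClientCnxns 
setting in ZooKeeper's zoo.cfg, which limits concurrent connections from a 
single client IP address. The fragment below is a sketch of that change, 
assuming a standalone ZooKeeper ensemble. The thread does not say which file was 
actually edited; when HBase manages ZooKeeper, the equivalent property is 
hbase.zookeeper.property.maxClientCnxns in hbase-site.xml.

```
# zoo.cfg (assumed location of the setting)
# Maximum concurrent connections allowed from a single client IP address
maxClientCnxns=30
```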


From: Ted Yu
Sent: 20-10-2015 20:19
To: Amit Hora
Cc: user
Subject: Re: Spark opening to many connection with zookeeper


Can you take a look at example 37 on page 225 of:

http://hbase.apache.org/apache_hbase_reference_guide.pdf



You can use the following method of Table:
  void put(List<Put> puts) throws IOException;
After put() returns, you can close the connection.
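
A minimal sketch of that call, assuming the HBase 1.x client API. The object 
BatchPut and its method putAll are hypothetical names used only for 
illustration:

```scala
import java.util.{List => JList}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}

object BatchPut {
  // Hypothetical helper: opens a connection, writes the batch, then closes everything.
  def putAll(hconf: Configuration, tableName: String, puts: JList[Put]): Unit = {
    val connection = ConnectionFactory.createConnection(hconf)
    try {
      val table = connection.getTable(TableName.valueOf(tableName))
      try table.put(puts)    // void put(List<Put> puts) throws IOException
      finally table.close()  // releases the table handle
    } finally connection.close() // releases the connection's ZooKeeper session
  }
}
```

The nested try/finally blocks make sure both the table and the connection are 
released even if put() throws.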
Cheers


On Tue, Oct 20, 2015 at 2:40 AM, Amit Hora <hora.a...@gmail.com> wrote:

One region 


From: Ted Yu
Sent: 20-10-2015 15:01
To: Amit Singh Hora
Cc: user
Subject: Re: Spark opening to many connection with zookeeper


How many regions does your table have?


Which HBase release do you use?


Cheers


On Tue, Oct 20, 2015 at 12:32 AM, Amit Singh Hora <hora.a...@gmail.com> wrote:

Hi All,

My Spark job started reporting ZooKeeper errors. After looking at the zkdumps
from the HBase master, I realized that a large number of connections are being
made from the nodes where the Spark workers are running. I believe the
connections are somehow not getting closed, and that is leading to the errors.

Please find the code below:

import com.typesafe.config.ConfigFactory
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.mapred.TableOutputFormat
import org.apache.hadoop.mapred.JobConf

val conf = ConfigFactory.load("connection.conf").getConfig("connection")

// HBase/ZooKeeper client settings, read from connection.conf
val hconf = HBaseConfiguration.create()
hconf.set(TableOutputFormat.OUTPUT_TABLE, conf.getString("hbase.tablename"))
hconf.set("zookeeper.session.timeout", conf.getString("hbase.zookeepertimeout"))
hconf.set("hbase.client.retries.number", Integer.toString(1))
hconf.set("zookeeper.recovery.retry", Integer.toString(1))
hconf.set("hbase.master", conf.getString("hbase.hbase_master"))
// zkquorum consists of 5 nodes
hconf.set("hbase.zookeeper.quorum", conf.getString("hbase.hbase_zkquorum"))
hconf.set("zookeeper.znode.parent", "/hbase-unsecure")
hconf.set("hbase.zookeeper.property.clientPort", conf.getString("hbase.hbase_zk_port"))

// Old-API (mapred) job configuration that writes to the HBase table
val jobConfig: JobConf = new JobConf(hconf, this.getClass)
jobConfig.set("mapreduce.output.fileoutputformat.outputdir", "/user/user01/out")
jobConfig.setOutputFormat(classOf[TableOutputFormat])
jobConfig.set(TableOutputFormat.OUTPUT_TABLE, conf.getString("hbase.tablename"))

// Convert each JSON record to a Put and write it through TableOutputFormat
rdd.map(convertToPut).saveAsHadoopDataset(jobConfig)

The convertToPut method does nothing but convert the JSON to HBase Put
objects.

After I killed the application/driver, the number of connections decreased
drastically.

Kindly help me understand and resolve the issue.



