RE: Table is disabled an no way to get it back online

2016-11-22 Thread Cecile, Adam
Hello,

Thanks a lot, the table is back online. One last question: can you provide a
log pattern to spot this, just in case it occurs again? ;-)
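For Adam's question, here is a minimal sketch of such a log pattern. It is not an official HBase tool. It assumes each master log record sits on a single line in the HBase 1.2 ProcedureExecutor format quoted later in this thread ("Rolledback procedure DisableTableProcedure (table=...) id=... exception=..."). It flags tables whose table procedures keep getting rolled back. The regex and function names are illustrative.

```python
import re
from collections import Counter

# Assumed record format (one line per record), modelled on the HBase 1.2 master
# log entries quoted in this thread:
#   ... procedure2.ProcedureExecutor: Rolledback procedure DisableTableProcedure
#   (table=sentinel-meta) id=43538 owner=hbase state=ROLLEDBACK ...
#   exception=org.apache.hadoop.hbase.TableNotEnabledException: sentinel-meta
ROLLBACK_RE = re.compile(
    r"Rolledback procedure (?P<proc>\w+Procedure)\s+"
    r"\(table=(?P<table>[^)]+)\)\s+id=(?P<id>\d+)"
    r".*?exception=(?P<exc>[\w.$]+)"
)


def rolled_back_procedures(lines):
    """Yield (procedure, table, proc_id, exception_class) for each matching record."""
    for line in lines:
        m = ROLLBACK_RE.search(line)
        if m:
            yield m.group("proc"), m.group("table"), int(m.group("id")), m.group("exc")


def repeated_rollbacks(lines, threshold=2):
    """Return {(procedure, table): count} for pairs rolled back at least `threshold` times."""
    counts = Counter((proc, table) for proc, table, _, _ in rolled_back_procedures(lines))
    return {key: n for key, n in counts.items() if n >= threshold}
```

For example, passing an open master log file (the file name depends on your install) to repeated_rollbacks() would report ('DisableTableProcedure', 'sentinel-meta') if that rollback appears two or more times. Records that the logger or a mail client wrapped across lines would need to be joined first.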

Regards, Adam.

From: Matteo Bertozzi
Sent: Tuesday, November 22, 2016 19:48
To: user@hbase.apache.org
Subject: Re: Table is disabled an no way to get it back online

hadoop fs -rm -r /hbase/MasterProcWALs

Matteo
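Matteo's recovery, quoted below, is: drop the MasterWALs directory, drop the znode /hbase/table/sentinel-meta, restart the master, then disable and enable the table. The sketch below only prints those steps in order, and it makes several assumptions. It assumes a plain HDFS/tarball deployment, with `hadoop fs -rm -r` as the non-deprecated form of `-rmr`. The `hbase zkcli` command with ZooKeeper's `rmr`, `hbase-daemon.sh restart master`, and piping commands into `hbase shell` are also assumptions about the setup. Managed distributions restart the master differently. If your ZooKeeper CLI does not accept a command on the command line, run `hbase zkcli` and type the `rmr` command at its prompt. Nothing is executed, and the first two steps are destructive.

```python
def recovery_plan(table, proc_wal_dir="/hbase/MasterProcWALs"):
    """Return, in order, shell commands for the recovery described in this thread.

    Hypothetical helper: it only builds strings and runs nothing. The restart
    command assumes a tarball install managed with hbase-daemon.sh.
    """
    znode = f"/hbase/table/{table}"  # table-state znode named in the thread
    return [
        # 1. Drop the master procedure WALs (destructive).
        f"hadoop fs -rm -r {proc_wal_dir}",
        # 2. Drop the table's znode via the ZooKeeper CLI shipped with HBase (destructive).
        f"hbase zkcli rmr {znode}",
        # 3. Restart the active master.
        "hbase-daemon.sh restart master",
        # 4. Disable, then enable, the table from the HBase shell.
        f"echo \"disable '{table}'\" | hbase shell",
        f"echo \"enable '{table}'\" | hbase shell",
    ]
```

Printing each entry of recovery_plan("sentinel-meta") gives a checklist to review before running anything by hand.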


On Tue, Nov 22, 2016 at 10:42 AM, Cecile, Adam  wrote:

> Can you explain how to delete the MasterWALs directory?
>
>  Original message 
> From: Matteo Bertozzi 
> Date: 22/11/2016 19:12 (GMT+01:00)
> To: user@hbase.apache.org
> Subject: Re: Table is disabled an no way to get it back online
>
> I don't think this has anything to do with HBASE-13415 or the related
> bugfix I'm working on.
>
> This is probably the usual case of a state mismatch with ZK. Disable is
> saying that the table is not enabled (i.e. it is already disabled), so
> enableTable() is the one that should throw an exception, if anything, but I
> don't see any.
> I suggest you just drop the MasterWALs directory, drop the znode
> /hbase/table/sentinel-meta, restart the master, and try disable and then
> enable. That should bring the table back online.
>
> Matteo
>
>
> On Tue, Nov 22, 2016 at 10:05 AM, Cecile, Adam 
> wrote:
>
> > Thanks for everything. As you said, this bug is supposed to be fixed in
> > 1.2.0. Is Matteo reading this list as well?
> >
> > Regards, Adam.
> > 
> > From: Ted Yu
> > Sent: Tuesday, November 22, 2016 19:00
> > To: user@hbase.apache.org
> > Subject: Re: Table is disabled an no way to get it back online
> >
> > Please take a look at HBASE-13415
> >
> > From the log, you're already using HBase 1.2.0. But I heard there is a
> > subtle bug which is being fixed.
> >
> > Matteo is the person with the best knowledge in this regard.
> >
> > On Tue, Nov 22, 2016 at 9:48 AM, Cecile, Adam 
> > wrote:
> >
> > > Here is another one, because I'm not sure the log is overwritten when
> > > restarting. This one was cleared before the service started.
> > > 
> > > From: Cecile, Adam
> > > Sent: Tuesday, November 22, 2016 18:42
> > > To: user@hbase.apache.org
> > > Subject: RE: Table is disabled an no way to get it back online
> > >
> > > Hello,
> > >
> > > Sadly I could not use the web UI; it killed my Firefox (it probably took
> > > way too much time). Here is the debug log (11 MB uncompressed for maybe
> > > two minutes of running!).
> > >
> > > Best regards, Adam.
> > > 
> > > From: Ted Yu
> > > Sent: Tuesday, November 22, 2016 17:05
> > > To: user@hbase.apache.org
> > > Subject: Re: Table is disabled an no way to get it back online
> > >
> > > In log4j.properties:
> > >
> > > log4j.logger.org.apache.hadoop.hbase=DEBUG
> > >
> > > On the master UI, you can select the Procedures tab. Pastebin what you see
> > > (text is enough).
> > >
> > > Thanks
> > >
> > > On Tue, Nov 22, 2016 at 7:16 AM, Cecile, Adam 
> > > wrote:
> > >
> > > > Hey Ted,
> > > >
> > > > Thank you. Heading home right now but I'll start the laptop again. I'm
> > > > not sure exactly how to turn debug logging on, so if you have the
> > > > information it'd be appreciated; otherwise I'll look at the XML files.
> > > >
> > > > Regards, Adam.
> > > > 
> > > > From: Ted Yu
> > > > Sent: Tuesday, November 22, 2016 15:46
> > > > To: user@hbase.apache.org
> > > > Subject: Re: Table is disabled an no way to get it back online
> > > >
> > > > The master log contained entries in the following form:
> > > >
> > > > 2016-11-22 13:13:41,836 INFO  [ProcedureExecutor-3]
> > > > procedure2.ProcedureExecutor: Rolledback procedure DisableTableProcedure
> > > > (table=sentinel-meta) id=43538 owner=hbase state=ROLLEDBACK
> > > > exec-time=242hrs, 10mins, 28.896sec
> > > > exception=org.apache.hadoop.hbase.TableNotEnabledException: sentinel-meta
> > > >
> > > > Note the procedure ID was around 43000, far lower than 147464.
> > > >
> > > > Can you turn on debug logging and repost the master log?
> > > >
> > > > Thanks
> > > >
> > > > On Tue, Nov 22, 2016 at 4:16 AM, Cecile, Adam 
> > > > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > >
> > > > > We have a table stuck in the disabled state. First, here is what I
> > > > > have already tried:
> > > > >
> > > > >
> > > > > * Restart all machines involved in the HBase cluster
> > > > >
> > > > > * hbase hbck with various arguments
> > > > >
> > > > > * hdfs fsck
> > > > >
> > > > > * Purge ZK /hbase and restart masters
> > > > >
> > > > >
> > > > > Now, more details about what happens:
> > > > >
> > > > > * 

RE: problem in launching HBase

2016-11-22 Thread QI Congyun

According to the HBase quickstart guide on the web, the directory
"/home/testuser/zookeeper" doesn't need to be created manually.

I wonder whether the directory can't be created because ZooKeeper can't
authenticate using SASL: since ZooKeeper can't set up a socket connection, it
can't log on to localhost, and so it can't create the directory configured in
"hbase-site.xml". What do you think?

2016-11-22 16:26:20,262 INFO  [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
2016-11-22 16:26:20,266 WARN  [main-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
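The log above shows the ZooKeeper client trying the IPv6 loopback (localhost/0:0:0:0:0:0:0:1:2181) and getting "Connection refused". The small sketch below assumes Python 3 on the same host. It lists every address that localhost resolves to and reports whether anything accepts a TCP connection on port 2181. If every address is refused, that would suggest no ZooKeeper is listening at all, rather than an IPv6-only problem. The function name is illustrative.

```python
import socket


def probe(host="localhost", port=2181, timeout=2.0):
    """Resolve `host`, try a TCP connect to every address, return (family, address, reachable)."""
    results = []
    for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        label = "IPv6" if family == socket.AF_INET6 else "IPv4"
        try:
            with socket.socket(family, socktype, proto) as s:
                s.settimeout(timeout)
                # connect_ex returns 0 on success instead of raising on refusal.
                reachable = s.connect_ex(sockaddr) == 0
        except OSError:  # e.g. IPv6 disabled on the host, or a timeout
            reachable = False
        results.append((label, sockaddr[0], reachable))
    return results
```

Calling probe() with no arguments checks localhost:2181. Something like [('IPv6', '::1', False), ('IPv4', '127.0.0.1', False)] would mean neither loopback address has a listener.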

From http://hbase.apache.org/book.html#quickstart:
"You do not need to create the HBase data directory. HBase will do this for
you. If you create the directory, HBase will attempt to do a migration, which
is not what you want."
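The sketch below shows which directories and hosts hbase-site.xml actually configures, using only the Python standard library. It reads the Hadoop-style <property><name/><value/> entries into a dict. It also lists any values that still mention localhost, which are the ones Ted suggests replacing. The property names hbase.rootdir, hbase.zookeeper.property.dataDir and hbase.zookeeper.quorum are standard HBase settings; the function names are illustrative.

```python
import xml.etree.ElementTree as ET


def read_hbase_site(xml_text):
    """Parse Hadoop-style <configuration><property><name/><value/> XML into a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def localhost_settings(props):
    """Return the properties whose value still mentions localhost."""
    return {k: v for k, v in props.items() if "localhost" in v}
```

For example, read conf/hbase-site.xml into a string, parse it, and look at props.get("hbase.zookeeper.property.dataDir"). That is presumably where the "Unable to create data dir" path comes from.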



-Original Message-
From: Ted Yu [mailto:yuzhih...@gmail.com] 
Sent: Wednesday, November 23, 2016 12:03 PM
To: user@hbase.apache.org
Subject: Re: problem in launching HBase

bq. Unable to create data dir /home/testuser/zookeeper

Can you check why the above directory couldn't be created?

The attachment didn't come through. Consider using pastebin.

Cheers
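One way to answer Ted's question is to inspect the nearest existing parent of the path. Do this as the same user that runs start-hbase.sh; the shell prompt in the quoted output suggests "hadoop". The best-effort sketch below assumes a POSIX system because it uses os.getuid. It only inspects paths and permissions and does not create anything. The function name is illustrative.

```python
import os


def explain_mkdir_failure(path):
    """Best-effort guess at why the current user could not create directory `path`."""
    path = os.path.abspath(path)
    if os.path.isdir(path):
        return f"{path} already exists"
    if os.path.exists(path):
        return f"{path} exists but is not a directory"
    # Walk up to the nearest ancestor that already exists ("/" always does).
    parent = os.path.dirname(path)
    while not os.path.exists(parent):
        parent = os.path.dirname(parent)
    if not os.path.isdir(parent):
        return f"ancestor {parent} is not a directory"
    # Creating an entry needs write and search (execute) permission on the parent.
    if not os.access(parent, os.W_OK | os.X_OK):
        return f"no write permission on {parent} for uid {os.getuid()}"
    return f"{parent} is writable; the cause is probably elsewhere"
```

For example, run explain_mkdir_failure("/home/testuser/zookeeper") as that user. If /home/testuser is missing and /home is not writable, it reports the permission problem on /home.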

On Tue, Nov 22, 2016 at 7:51 PM, QI Congyun  wrote:

>
> Confirmed. The attached files include my /etc/hosts and
> /hbase/conf/regionservers configuration. I tried modifying regionservers
> and restarting HBase, but the problem appeared again.
> 
> 
> [hadoop@hadoop2 hbase-1.2.3]$ bin/start-hbase.sh
> localhost: starting zookeeper, logging to /home/hadoop/hbase-1.2.3/bin/../logs/hbase-hadoop-zookeeper-hadoop2.out
> localhost: java.io.IOException: Unable to create data dir /home/testuser/zookeeper
> localhost:  at org.apache.hadoop.hbase.zookeeper.HQuorumPeer.writeMyID(HQuorumPeer.java:157)
> localhost:  at org.apache.hadoop.hbase.zookeeper.HQuorumPeer.main(HQuorumPeer.java:70)
> starting master, logging to /home/hadoop/hbase-1.2.3/logs/hbase-hadoop-master-hadoop2.out
> hadoop2: starting regionserver, logging to /home/hadoop/hbase-1.2.3/bin/../logs/hbase-hadoop-regionserver-hadoop2.out
> 
> 
>
> -Original Message-
> From: Sen [mailto:besent...@gmail.com]
> Sent: Tuesday, November 22, 2016 11:55 PM
> To: user@hbase.apache.org
> Subject: Re: problem in launching HBase
>
> Did you make sure your /etc/hosts file has the IP address of the HBase
> server?
>
> On Tue, Nov 22, 2016 at 8:39 PM, Ted Yu  wrote:
>
> > I think HBase 1.2.3 should run fine with Hadoop 2.7.3.
> >
> > Can you replace localhost in your hbase-site.xml and try again
> > (remember to set the corresponding entry in /etc/hosts)?
> >
> > BTW I will be out of office starting tomorrow morning.
> >
> > On Tue, Nov 22, 2016 at 12:44 AM, QI Congyun < 
> > congyun...@alcatel-sbell.com.cn> wrote:
> >
> > > Hello Ted,
> > >
> > > I have tried removing the HBase folder and re-installing it many
> > > times, but the same faults below happened.
> > > I wonder whether HBase 1.2.3 is incompatible with Hadoop 2.7.3. I
> > > searched the internet for similar issues, but found very few.
> > > I'm very bewildered; could you help me find the reason?
> > >
> > > Thanks.
> > >
> > >
> > > -Original Message-
> > > From: Ted Yu [mailto:yuzhih...@gmail.com]
> > > Sent: Wednesday, November 16, 2016 11:13 AM
> > > To: user@hbase.apache.org
> > > Subject: Re: problem in launching HBase
> > >
> > > 2016-10-31 15:49:57,503 INFO
> > > [master/localhost/127.0.0.1:16000-SendThread(localhost:2181)]
> > > zookeeper.ClientCnxn: Opening socket connection to server
> > > localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate
> > > using SASL (unknown error)
> > >
> > > Is your machine running IPv6?
> > >
> > > I don't have much experience with IPv6.
> > >
> > > Cheers
> > >
> > > On Tue, Nov 15, 2016 at 6:59 PM, QI Congyun <congyun...@alcatel-sbell.com.cn> wrote:
> > >
> > > > Hi, 

Re: Use experience and performance data of offheap from Alibaba online cluster

2016-11-22 Thread Yu Li
Thanks Andrew, actually a blog is coming soon (smile).

And I've opened HBASE-17138 for the backport-to-branch-1 discussion, FWIW.

Best Regards,
Yu

On 22 November 2016 at 22:13, Andrew Purtell 
wrote:

> > I hope we could strengthen our faith in HBase capability
>
> Us too. Would you be interested in turning the metrics, and the discussion of
> them that came out in this thread, into a post for the HBase project blog (
> https://blogs.apache.org/hbase)? As you can see from the other blog
> entries, details about the use case do not need to reveal proprietary
> information; readers would be most interested in the metrics you
> observed/achieved on 11/11, followed by a technical discussion of how
> (roughly) to replicate them. You have a good command of the English
> language, so that won't be a problem, and anyway I offer my services as
> editor should you like to try. Think about it. This would be a great post,
> and I am sure a very popular one.
>
>
> > On Nov 22, 2016, at 12:51 AM, Yu Li  wrote:
> >
> > bq. If it were not "confidential" might you mention why there is such a
> > large (several orders of magnitude) explosion of end user queries to
> > backend ones?
> > For the index building and online machine learning systems, more
> > information is recorded after each visit/trade, such as user query/click
> > history, item stock updates, etc., and multiple pieces of user-specific
> > feature data are read/updated for better recommendation. The flow is
> > pretty much like:
> > user visits some items
> > -> puts them into the shopping cart
> > -> checks out/removes items from the shopping cart
> > -> item stock update/new items recommended to the user
> > -> user visits new items
> > I can't supply many more details, but I believe you can imagine how many
> > queries/updates there will be at the backend for such loops, right?
> > (smile)
> >
> > Thanks again for the interest and questions, although they derail the
> > thread a little, and I hope we could strengthen our faith in HBase
> > capability after these discussions. :-)
> >
> > Best Regards,
> > Yu
> >
> >> On 21 November 2016 at 01:26, Stephen Boesch  wrote:
> >>
> >> Thanks Yu - given your apparent direct knowledge of the data, that is
> >> helpful (my earlier response had been to 张铎). It is important to make
> >> sure we inform colleagues of numbers that are "real".
> >>
> >> If it were not "confidential" might you mention why there is such a
> >> large (several orders of magnitude) explosion of end user queries to
> >> backend ones?
> >>
> >>
> >>
> >> 2016-11-20 7:51 GMT-08:00 Yu Li :
> >>
> >>> Thanks everyone for the feedback/comments, glad this data means
> >>> something and has drawn your interest. Let me answer the questions (and
> >>> sorry for the lag).
> >>>
> >>> For the backport patches, ours are based on a customized 1.1.2 version
> >>> and cannot be applied directly to any 1.x branch. It would be easy for us
> >>> to upload the existing patches somewhere, but that is obviously not very
> >>> useful... so maybe we should still get them into branch-1 and officially
> >>> support read-path offheap in a future 1.x release? Let me create a JIRA
> >>> for this and let's discuss it there. And to be very clear, it's a big YES
> >>> to sharing our patches with everyone rather than only numbers; the only
> >>> question is which way is better (smile).
> >>>
> >>> And answers for @Stephen Boesch:
> >>>
> >>> bq. In any case the data is marked as 9/25/16 not 11/11/16
> >>> As specifically noted, the 9/25 data are from our online A/B test
> >>> cluster rather than fully online data, because we released offheap
> >>> together with NettyRpcServer online and so have no standalone comparison
> >>> data for offheap. Please check my original email more carefully (smile).
> >>>
> >>> bq. Repeating my earlier question:  20*Meg* queries per second??  Just
> >>> checked and *google* does 40*K* queries per second.
> >>> As you already noticed, the 20M QPS figure is from the A/B testing
> >>> cluster (450 nodes), and the number is much higher on the 11/11 online
> >>> cluster (1600+ nodes). Please note that this is NOT a cluster that
> >>> directly serves end-user queries; it serves the index building and online
> >>> machine learning systems. Refer to our talk at HBaseCon 2016 (slides:
> >>> apache-hbase-and-its-applications-in-alibaba-search, recording:
> >>> h9HrA9qfDVOeNh1l_T5HvwvkO9raWy=10) for more details, if you're
> >>> interested. And unlike Google, we have an obvious "hot spot", so I don't
> >>> think the QPS of these two different systems is comparable.
> >>>
> >>> bq. So maybe please check your numbers again.
> >>> The numbers come from the online monitoring system and

RE: problem in launching HBase

2016-11-22 Thread QI Congyun

Confirmed. The attached files include my /etc/hosts and
/hbase/conf/regionservers configuration. I tried modifying regionservers and
restarting HBase, but the problem appeared again.

[hadoop@hadoop2 hbase-1.2.3]$ bin/start-hbase.sh
localhost: starting zookeeper, logging to /home/hadoop/hbase-1.2.3/bin/../logs/hbase-hadoop-zookeeper-hadoop2.out
localhost: java.io.IOException: Unable to create data dir /home/testuser/zookeeper
localhost:  at org.apache.hadoop.hbase.zookeeper.HQuorumPeer.writeMyID(HQuorumPeer.java:157)
localhost:  at org.apache.hadoop.hbase.zookeeper.HQuorumPeer.main(HQuorumPeer.java:70)
starting master, logging to /home/hadoop/hbase-1.2.3/logs/hbase-hadoop-master-hadoop2.out
hadoop2: starting regionserver, logging to /home/hadoop/hbase-1.2.3/bin/../logs/hbase-hadoop-regionserver-hadoop2.out


-Original Message-
From: Sen [mailto:besent...@gmail.com] 
Sent: Tuesday, November 22, 2016 11:55 PM
To: user@hbase.apache.org
Subject: Re: problem in launching HBase

Did you make sure your /etc/hosts file has the IP address of the HBase server?
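Sen's check can be scripted. The minimal sketch below reads /etc/hosts-formatted text and returns the addresses a hostname maps to. It also flags names that resolve only to loopback addresses. The hostname hadoop2 comes from the shell prompt in this thread, and the function names are illustrative. Whether a loopback-only mapping is actually a problem depends on the cluster layout.

```python
import ipaddress


def hosts_addresses(hosts_text, hostname):
    """Return the IP addresses that /etc/hosts-formatted text maps to `hostname`."""
    addresses = []
    for raw in hosts_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, split on whitespace
        # Field 0 is the address; the remaining fields are names and aliases.
        if len(fields) >= 2 and hostname in fields[1:]:
            addresses.append(fields[0])
    return addresses


def only_loopback(addresses):
    """True if every address is a loopback address (127.0.0.0/8 or ::1)."""
    return bool(addresses) and all(ipaddress.ip_address(a).is_loopback for a in addresses)
```

For example, call hosts_addresses() on the contents of /etc/hosts with "hadoop2". An empty list, or only_loopback() returning True, is worth a second look before editing hbase-site.xml.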

On Tue, Nov 22, 2016 at 8:39 PM, Ted Yu  wrote:

> I think HBase 1.2.3 should run fine with Hadoop 2.7.3.
>
> Can you replace localhost in your hbase-site.xml and try again
> (remember to set the corresponding entry in /etc/hosts)?
>
> BTW I will be out of office starting tomorrow morning.
>
> On Tue, Nov 22, 2016 at 12:44 AM, QI Congyun < 
> congyun...@alcatel-sbell.com.cn> wrote:
>
> > Hello Ted,
> >
> > I have tried removing the HBase folder and re-installing it many
> > times, but the same faults below happened.
> > I wonder whether HBase 1.2.3 is incompatible with Hadoop 2.7.3. I
> > searched the internet for similar issues, but found very few.
> > I'm very bewildered; could you help me find the reason?
> >
> > Thanks.
> >
> >
> > -Original Message-
> > From: Ted Yu [mailto:yuzhih...@gmail.com]
> > Sent: Wednesday, November 16, 2016 11:13 AM
> > To: user@hbase.apache.org
> > Subject: Re: problem in launching HBase
> >
> > 2016-10-31 15:49:57,503 INFO
> > [master/localhost/127.0.0.1:16000-SendThread(localhost:2181)]
> > zookeeper.ClientCnxn: Opening socket connection to server
> > localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate
> > using SASL (unknown error)
> >
> > Is your machine running IPv6?
> >
> > I don't have much experience with IPv6.
> >
> > Cheers
> >
> > On Tue, Nov 15, 2016 at 6:59 PM, QI Congyun <congyun...@alcatel-sbell.com.cn> wrote:
> >
> > > Hi, Ted,
> > >
> > > Do you think some incorrect configuration of mine is causing the
> > > issues I'm encountering?
> > > Thanks.
> > >
> > >
> > > -Original Message-
> > > From: QI Congyun
> > > Sent: Tuesday, November 15, 2016 1:29 PM
> > > To: user@hbase.apache.org
> > > Subject: RE: problem in launching HBase
> > >
> > >
> > > I'm so sorry, I made a mistake. The Hadoop configuration files were
> > > attached to the previous e-mail.
> > >
> > > The hbase-site.xml is attached; please check it.
> > >
> > >
> > >
> > > -Original Message-
> > > From: Ted Yu [mailto:yuzhih...@gmail.com]
> > > Sent: Tuesday, November 15, 2016 1:25 PM
> > > To: user@hbase.apache.org
> > > Subject: Re: problem in launching HBase
> > >
> > > I don't see hbase-site.xml attached.
> > >
> > > Consider using pastebin.
> > >
> > > On Mon, Nov 14, 2016 at 9:19 PM, QI Congyun <congyun...@alcatel-sbell.com.cn> wrote:
> > >
> > > >
> > > > The NameNode and DataNode are running normally, as the following
> > > > process list shows. The file "hbase-site.xml" and other associated
> > > > files are enclosed.
> > > > Thanks.
> > > >
> > > > 
> > > > 
> > > > ---
> > > > [hadoop@hadoop2 conf]$ jps
> > > > 11805 SecondaryNameNode
> > > > 32314 Jps
> > > > 11614 DataNode
> > > > 507 NodeManager
> > > > 385 ResourceManager
> > > > 11379 NameNode
> > > > 
> > > > 
> > > > 
> > > > ----
> > > > [hadoop@hadoop2 hadoop-2.7.3]$ bin/hdfs dfsadmin -report
> > > > Configured Capacity: 154684043264 (144.06 GB)
> > > > Present Capacity: 133174730752 (124.03 GB)
> > > > DFS Remaining: 128144982016 (119.34 GB)
> > > > DFS Used: 5029748736 (4.68 GB)
> > > > DFS Used%: 3.78%
> > > > Under replicated blocks: 0
> > > > Blocks with corrupt replicas: 0
> > > > Missing blocks: 0
> > > > Missing blocks (with replication factor 1): 0
> > > >
> > > > 


RE: Table is disabled an no way to get it back online

2016-11-22 Thread Cecile, Adam
Can you explain how to delete the MasterWALs directory?

Original message
From: Matteo Bertozzi 
Date: 22/11/2016 19:12 (GMT+01:00)
To: user@hbase.apache.org
Subject: Re: Table is disabled an no way to get it back online

I don't think this has anything to do with HBASE-13415 or the related
bugfix I'm working on.

This is probably the usual case of a state mismatch with ZK. Disable is
saying that the table is not enabled (i.e. it is already disabled), so
enableTable() is the one that should throw an exception, if anything, but I
don't see any.
I suggest you just drop the MasterWALs directory, drop the znode
/hbase/table/sentinel-meta, restart the master, and try disable and then
enable. That should bring the table back online.

Matteo


On Tue, Nov 22, 2016 at 10:05 AM, Cecile, Adam  wrote:

> Thanks for everything. As you said, this bug is supposed to be fixed in
> 1.2.0. Is Matteo reading this list as well?
>
> Regards, Adam.
> 
> From: Ted Yu
> Sent: Tuesday, November 22, 2016 19:00
> To: user@hbase.apache.org
> Subject: Re: Table is disabled an no way to get it back online
>
> Please take a look at HBASE-13415
>
> From the log, you're already using HBase 1.2.0. But I heard there is a
> subtle bug which is being fixed.
>
> Matteo is the person with the best knowledge in this regard.
>
> On Tue, Nov 22, 2016 at 9:48 AM, Cecile, Adam 
> wrote:
>
> > Here is another one, because I'm not sure the log is overwritten when
> > restarting. This one was cleared before the service started.
> > 
> > From: Cecile, Adam
> > Sent: Tuesday, November 22, 2016 18:42
> > To: user@hbase.apache.org
> > Subject: RE: Table is disabled an no way to get it back online
> >
> > Hello,
> >
> > Sadly I could not use the web UI; it killed my Firefox (it probably took
> > way too much time). Here is the debug log (11 MB uncompressed for maybe
> > two minutes of running!).
> >
> > Best regards, Adam.
> > 
> > From: Ted Yu
> > Sent: Tuesday, November 22, 2016 17:05
> > To: user@hbase.apache.org
> > Subject: Re: Table is disabled an no way to get it back online
> >
> > In log4j.properties:
> >
> > log4j.logger.org.apache.hadoop.hbase=DEBUG
> >
> > On the master UI, you can select the Procedures tab. Pastebin what you see
> > (text is enough).
> >
> > Thanks
> >
> > On Tue, Nov 22, 2016 at 7:16 AM, Cecile, Adam 
> > wrote:
> >
> > > Hey Ted,
> > >
> > > Thank you. Heading home right now but I'll start the laptop again. I'm
> > > not sure exactly how to turn debug logging on, so if you have the
> > > information it'd be appreciated; otherwise I'll look at the XML files.
> > >
> > > Regards, Adam.
> > > 
> > > From: Ted Yu
> > > Sent: Tuesday, November 22, 2016 15:46
> > > To: user@hbase.apache.org
> > > Subject: Re: Table is disabled an no way to get it back online
> > >
> > > The master log contained entries in the following form:
> > >
> > > 2016-11-22 13:13:41,836 INFO  [ProcedureExecutor-3]
> > > procedure2.ProcedureExecutor: Rolledback procedure DisableTableProcedure
> > > (table=sentinel-meta) id=43538 owner=hbase state=ROLLEDBACK
> > > exec-time=242hrs, 10mins, 28.896sec
> > > exception=org.apache.hadoop.hbase.TableNotEnabledException: sentinel-meta
> > >
> > > Note the procedure ID was around 43000, far lower than 147464.
> > >
> > > Can you turn on debug logging and repost the master log?
> > >
> > > Thanks
> > >
> > > On Tue, Nov 22, 2016 at 4:16 AM, Cecile, Adam 
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > >
> > > > We have a table stuck in the disabled state. First, here is what I
> > > > have already tried:
> > > >
> > > >
> > > > * Restart all machines involved in the HBase cluster
> > > >
> > > > * hbase hbck with various arguments
> > > >
> > > > * hdfs fsck
> > > >
> > > > * Purge ZK /hbase and restart masters
> > > >
> > > >
> > > > Now, more details about what happens:
> > > >
> > > > * When enabling from hbase shell:
> > > >
> > > >
> > > > hbase(main):002:0> enable "sentinel-meta"
> > > > ERROR: The procedure 147464 is still running
> > > >
> > > >
> > > > The task ID changes every time I run the command, so I think it's
> > > > talking about itself (and it gets stuck for a while before saying
> > > > anything).
> > > >
> > > >
> > > > In the log, all I can see is:
> > > >
> > > > 2016-11-22 13:10:50,776 INFO  [ProcedureExecutor-0]
> > > > procedure2.ProcedureExecutor: Rolledback procedure DisableTableProcedure
> > > > (table=sentinel-meta) id=43220 owner=hbase state=ROLLEDBACK
> > > > exec-time=242hrs, 52mins, 7.454sec
> > > > exception=org.apache.hadoop.hbase.TableNotEnabledException: sentinel-meta
> > > > 2016-11-22 13:10:50,781 

Re: Table is disabled an no way to get it back online

2016-11-22 Thread Matteo Bertozzi
I don't think this has anything to do with HBASE-13415 or the related bugfix
I'm working on.

This is probably the usual case of a state mismatch with ZK. Disable is
saying that the table is already disabled, so enableTable() is the one
that should throw an exception, if anything, but I don't see any.
I suggest you just drop the MasterWALs directory, drop the znode
/hbase/table/sentinel-meta, restart the master, and then try disable
followed by enable. That should bring the table back online.
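
A dry-run sketch of those steps in Python; it only builds and prints the commands, it runs nothing. It assumes hbase.rootdir is /hbase (so the procedure WAL directory is /hbase/MasterProcWALs, the path given in a later message of this thread), the default zookeeper.znode.parent of /hbase, the hadoop/hbase scripts on PATH, and ZooKeeper 3.4 `rmr` syntax through `hbase zkcli`. Deleting the procedure WALs throws away every pending master procedure, so review the list before running any of it by hand.

```python
"""Dry-run sketch: print the recovery commands for a table stuck disabled.

Assumptions (check them against your own cluster):
  * hbase.rootdir is /hbase, so procedure WALs live in /hbase/MasterProcWALs
  * zookeeper.znode.parent is the default /hbase
  * hadoop, hbase and hbase-daemon.sh are on PATH
Nothing is executed; the commands are only returned/printed for review.
"""
from typing import List


def recovery_plan(table: str, hbase_root: str = "/hbase",
                  znode_parent: str = "/hbase") -> List[str]:
    """Return the commands for the suggested fix, in order."""
    return [
        # 1. Drop the master procedure WALs (discards ALL pending procedures).
        f"hadoop fs -rm -r {hbase_root}/MasterProcWALs",
        # 2. Drop the table's state znode (ZooKeeper 3.4 CLI 'rmr').
        f"hbase zkcli rmr {znode_parent}/table/{table}",
        # 3. Restart the master, as suggested in the message above.
        "hbase-daemon.sh restart master",
        # 4. Disable, then enable, the table from the HBase shell.
        f"echo \"disable '{table}'\" | hbase shell",
        f"echo \"enable '{table}'\" | hbase shell",
    ]


if __name__ == "__main__":
    for step, command in enumerate(recovery_plan("sentinel-meta"), start=1):
        print(f"{step}. {command}")
```

Keeping the plan as data rather than executing it makes it easy to paste the steps one at a time and check the table state between them.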

Matteo


On Tue, Nov 22, 2016 at 10:05 AM, Cecile, Adam  wrote:

> Thanks for everything. As you said, this bug is supposed to be fixed in
> 1.2.0. Matteo is reading this list as well ?
>
> Regards, Adam.
> 
> De : Ted Yu 
> Envoyé : mardi 22 novembre 2016 19:00
> À : user@hbase.apache.org
> Objet : Re: Table is disabled an no way to get it back online
>
> Please take a look at HBASE-13415
>
> From the log, you're using hbase 1.2.0 already. But I heard there is a
> subtle bug which is being fixed.
>
> Matteo is the person with best knowledge in this regard.
>
> On Tue, Nov 22, 2016 at 9:48 AM, Cecile, Adam 
> wrote:
>
> > Another one, because I'm not sure the log is overwritten when restarting.
> > This one has been cleared before service start.
> > 
> > De : Cecile, Adam 
> > Envoyé : mardi 22 novembre 2016 18:42
> > À : user@hbase.apache.org
> > Objet : RE: Table is disabled an no way to get it back online
> >
> > Hello,
> >
> > Sadly I could not use the webui, it killed my firefox (probably way too
> > much time). Here is the debug log... (11Mb uncompressed for maybe two
> > minutes running !!)
> >
> > Best regards, Adam.
> > 
> > De : Ted Yu 
> > Envoyé : mardi 22 novembre 2016 17:05
> > À : user@hbase.apache.org
> > Objet : Re: Table is disabled an no way to get it back online
> >
> > In log4j.properties :
> >
> > log4j.logger.org.apache.hadoop.hbase=DEBUG
> >
> > On master UI, you can select the Procedures tab. Pastebin what you see
> > (text is enough).
> >
> > Thanks
> >
> > On Tue, Nov 22, 2016 at 7:16 AM, Cecile, Adam 
> > wrote:
> >
> > > Hey Ted,
> > >
> > > Thank you. Heading home right now but I'll start the laptop again. Not
> > > sure exactly how I should turn debug log on so if you have the
> > > information it'd be appreciated, otherwise I'll look at the xml files.
> > >
> > > Regards, Adam.
> > > 
> > > De : Ted Yu 
> > > Envoyé : mardi 22 novembre 2016 15:46
> > > À : user@hbase.apache.org
> > > Objet : Re: Table is disabled an no way to get it back online
> > >
> > > Master log contained entries in the following form:
> > >
> > > 2016-11-22 13:13:41,836 INFO  [ProcedureExecutor-3]
> > > procedure2.ProcedureExecutor: Rolledback procedure DisableTableProcedure
> > > (table=sentinel-meta) id=43538 owner=hbase state=ROLLEDBACK
> > > exec-time=242hrs, 10mins, 28.896sec
> > > exception=org.apache.hadoop.hbase.TableNotEnabledException: sentinel-meta
> > >
> > > Note the procedure Id was around 43000, far lower than 147464.
> > >
> > > Can you turn debug log on and repost master log ?
> > >
> > > Thanks
> > >
> > > On Tue, Nov 22, 2016 at 4:16 AM, Cecile, Adam 
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > >
> > > > We're having a table stuck in disabled state. First I'd like to start
> > > > with what I tried already:
> > > >
> > > >
> > > > * Restart all machines involved in HBase cluster
> > > >
> > > > * hbase hbck with various arguments
> > > >
> > > > * hdfs fsck
> > > >
> > > > * Purge ZK /hbase and restart masters
> > > >
> > > >
> > > > Now more details about what happens:
> > > >
> > > > * When enabling from hbase shell:
> > > >
> > > >
> > > > hbase(main):002:0> enable "sentinel-meta"
> > > > ERROR: The procedure 147464 is still running
> > > >
> > > >
> > > > The task ID changes every time I run the command so I think it's
> > > > talking about itself (and it gets stuck for a while before saying anything)
> > > >
> > > >
> > > > In the log, all I can see is:
> > > >
> > > > 2016-11-22 13:10:50,776 INFO  [ProcedureExecutor-0]
> > > > procedure2.ProcedureExecutor: Rolledback procedure DisableTableProcedure
> > > > (table=sentinel-meta) id=43220 owner=hbase state=ROLLEDBACK
> > > > exec-time=242hrs, 52mins, 7.454sec
> > > > exception=org.apache.hadoop.hbase.TableNotEnabledException: sentinel-meta
> > > > 2016-11-22 13:10:50,781 INFO  [ProcedureExecutor-0]
> > > > procedure.DisableTableProcedure: Table sentinel-meta isn't enabled; skipping disable
> > > > 2016-11-22 13:10:51,084 INFO  [ProcedureExecutor-0]
> > > > procedure2.ProcedureExecutor: Rolledback procedure DisableTableProcedure
> > > > 

RE: Table is disabled an no way to get it back online

2016-11-22 Thread Cecile, Adam
Thanks for everything. As you said, this bug is supposed to be fixed in 1.2.0.
Is Matteo reading this list as well?

Regards, Adam.

De : Ted Yu 
Envoyé : mardi 22 novembre 2016 19:00
À : user@hbase.apache.org
Objet : Re: Table is disabled an no way to get it back online

Please take a look at HBASE-13415

From the log, you're using hbase 1.2.0 already. But I heard there is a
subtle bug which is being fixed.

Matteo is the person with best knowledge in this regard.

On Tue, Nov 22, 2016 at 9:48 AM, Cecile, Adam  wrote:

> Another one, because I'm not sure the log is overwritten when restarting.
> This one has been cleared before service start.
> 
> De : Cecile, Adam 
> Envoyé : mardi 22 novembre 2016 18:42
> À : user@hbase.apache.org
> Objet : RE: Table is disabled an no way to get it back online
>
> Hello,
>
> Sadly I could not use the webui, it killed my firefox (probably way too
> much time). Here is the debug log... (11Mb uncompressed for maybe two
> minutes running !!)
>
> Best regards, Adam.
> 
> De : Ted Yu 
> Envoyé : mardi 22 novembre 2016 17:05
> À : user@hbase.apache.org
> Objet : Re: Table is disabled an no way to get it back online
>
> In log4j.properties :
>
> log4j.logger.org.apache.hadoop.hbase=DEBUG
>
> On master UI, you can select the Procedures tab. Pastebin what you see
> (text is enough).
>
> Thanks
>
> On Tue, Nov 22, 2016 at 7:16 AM, Cecile, Adam 
> wrote:
>
> > Hey Ted,
> >
> > Thank you. Heading home right now but I'll start the laptop again. Not
> > sure exactly how I should turn debug log on so if you have the
> > information it'd be appreciated, otherwise I'll look at the xml files.
> >
> > Regards, Adam.
> > 
> > De : Ted Yu 
> > Envoyé : mardi 22 novembre 2016 15:46
> > À : user@hbase.apache.org
> > Objet : Re: Table is disabled an no way to get it back online
> >
> > Master log contained entries in the following form:
> >
> > 2016-11-22 13:13:41,836 INFO  [ProcedureExecutor-3]
> > procedure2.ProcedureExecutor: Rolledback procedure DisableTableProcedure
> > (table=sentinel-meta) id=43538 owner=hbase state=ROLLEDBACK
> > exec-time=242hrs, 10mins, 28.896sec
> > exception=org.apache.hadoop.hbase.TableNotEnabledException: sentinel-meta
> >
> > Note the procedure Id was around 43000, far lower than 147464.
> >
> > Can you turn debug log on and repost master log ?
> >
> > Thanks
> >
> > On Tue, Nov 22, 2016 at 4:16 AM, Cecile, Adam 
> > wrote:
> >
> > > Hello,
> > >
> > >
> > > We're having a table stuck in disabled state. First I'd like to start
> > > with what I tried already:
> > >
> > >
> > > * Restart all machines involved in HBase cluster
> > >
> > > * hbase hbck with various arguments
> > >
> > > * hdfs fsck
> > >
> > > * Purge ZK /hbase and restart masters
> > >
> > >
> > > Now more details about what happens:
> > >
> > > * When enabling from hbase shell:
> > >
> > >
> > > hbase(main):002:0> enable "sentinel-meta"
> > > ERROR: The procedure 147464 is still running
> > >
> > >
> > > The task ID changes every time I run the command so I think it's
> > > talking about itself (and it gets stuck for a while before saying anything)
> > >
> > >
> > > In the log, all I can see is:
> > >
> > > 2016-11-22 13:10:50,776 INFO  [ProcedureExecutor-0]
> > > procedure2.ProcedureExecutor: Rolledback procedure DisableTableProcedure
> > > (table=sentinel-meta) id=43220 owner=hbase state=ROLLEDBACK
> > > exec-time=242hrs, 52mins, 7.454sec
> > > exception=org.apache.hadoop.hbase.TableNotEnabledException: sentinel-meta
> > > 2016-11-22 13:10:50,781 INFO  [ProcedureExecutor-0]
> > > procedure.DisableTableProcedure: Table sentinel-meta isn't enabled; skipping disable
> > > 2016-11-22 13:10:51,084 INFO  [ProcedureExecutor-0]
> > > procedure2.ProcedureExecutor: Rolledback procedure DisableTableProcedure
> > > (table=sentinel-meta) id=43221 owner=hbase state=ROLLEDBACK
> > > exec-time=242hrs, 51mins, 42.288sec
> > > exception=org.apache.hadoop.hbase.TableNotEnabledException: sentinel-meta
> > > 2016-11-22 13:10:51,088 INFO  [ProcedureExecutor-0]
> > > procedure.DisableTableProcedure: Table sentinel-meta isn't enabled; skipping disable
> > >
> > >
> > > Please also find attached a complete log from startup to shutdown on a
> > > single active master. You'll see the table is found as well as the
> > > regions but it gets deactivated with no reason.
> > >
> > >
> > > Thanks a lot for your help, we're kinda running out of ideas here.
> > >
> > >
> > > Best regards,
> > >
> > >
> > > Adam.
> > >
> > >
> > >
> >
>


Re: Table is disabled an no way to get it back online

2016-11-22 Thread Ted Yu
Please take a look at HBASE-13415

From the log, you're using hbase 1.2.0 already. But I heard there is a
subtle bug which is being fixed.

Matteo is the person with best knowledge in this regard.

On Tue, Nov 22, 2016 at 9:48 AM, Cecile, Adam  wrote:

> Another one, because I'm not sure the log is overwritten when restarting.
> This one has been cleared before service start.
> 
> De : Cecile, Adam 
> Envoyé : mardi 22 novembre 2016 18:42
> À : user@hbase.apache.org
> Objet : RE: Table is disabled an no way to get it back online
>
> Hello,
>
> Sadly I could not use the webui, it killed my firefox (probably way too
> much time). Here is the debug log... (11Mb uncompressed for maybe two
> minutes running !!)
>
> Best regards, Adam.
> 
> De : Ted Yu 
> Envoyé : mardi 22 novembre 2016 17:05
> À : user@hbase.apache.org
> Objet : Re: Table is disabled an no way to get it back online
>
> In log4j.properties :
>
> log4j.logger.org.apache.hadoop.hbase=DEBUG
>
> On master UI, you can select the Procedures tab. Pastebin what you see
> (text is enough).
>
> Thanks
>
> On Tue, Nov 22, 2016 at 7:16 AM, Cecile, Adam 
> wrote:
>
> > Hey Ted,
> >
> > Thank you. Heading home right now but I'll start the laptop again. Not
> > sure exactly how I should turn debug log on so if you have the
> > information it'd be appreciated, otherwise I'll look at the xml files.
> >
> > Regards, Adam.
> > 
> > De : Ted Yu 
> > Envoyé : mardi 22 novembre 2016 15:46
> > À : user@hbase.apache.org
> > Objet : Re: Table is disabled an no way to get it back online
> >
> > Master log contained entries in the following form:
> >
> > 2016-11-22 13:13:41,836 INFO  [ProcedureExecutor-3]
> > procedure2.ProcedureExecutor: Rolledback procedure DisableTableProcedure
> > (table=sentinel-meta) id=43538 owner=hbase state=ROLLEDBACK
> > exec-time=242hrs, 10mins, 28.896sec
> > exception=org.apache.hadoop.hbase.TableNotEnabledException: sentinel-meta
> >
> > Note the procedure Id was around 43000, far lower than 147464.
> >
> > Can you turn debug log on and repost master log ?
> >
> > Thanks
> >
> > On Tue, Nov 22, 2016 at 4:16 AM, Cecile, Adam 
> > wrote:
> >
> > > Hello,
> > >
> > >
> > > We're having a table stuck in disabled state. First I'd like to start
> > > with what I tried already:
> > >
> > >
> > > * Restart all machines involved in HBase cluster
> > >
> > > * hbase hbck with various arguments
> > >
> > > * hdfs fsck
> > >
> > > * Purge ZK /hbase and restart masters
> > >
> > >
> > > Now more details about what happens:
> > >
> > > * When enabling from hbase shell:
> > >
> > >
> > > hbase(main):002:0> enable "sentinel-meta"
> > > ERROR: The procedure 147464 is still running
> > >
> > >
> > > The task ID changes every time I run the command so I think it's
> > > talking about itself (and it gets stuck for a while before saying anything)
> > >
> > >
> > > In the log, all I can see is:
> > >
> > > 2016-11-22 13:10:50,776 INFO  [ProcedureExecutor-0]
> > > procedure2.ProcedureExecutor: Rolledback procedure DisableTableProcedure
> > > (table=sentinel-meta) id=43220 owner=hbase state=ROLLEDBACK
> > > exec-time=242hrs, 52mins, 7.454sec
> > > exception=org.apache.hadoop.hbase.TableNotEnabledException: sentinel-meta
> > > 2016-11-22 13:10:50,781 INFO  [ProcedureExecutor-0]
> > > procedure.DisableTableProcedure: Table sentinel-meta isn't enabled; skipping disable
> > > 2016-11-22 13:10:51,084 INFO  [ProcedureExecutor-0]
> > > procedure2.ProcedureExecutor: Rolledback procedure DisableTableProcedure
> > > (table=sentinel-meta) id=43221 owner=hbase state=ROLLEDBACK
> > > exec-time=242hrs, 51mins, 42.288sec
> > > exception=org.apache.hadoop.hbase.TableNotEnabledException: sentinel-meta
> > > 2016-11-22 13:10:51,088 INFO  [ProcedureExecutor-0]
> > > procedure.DisableTableProcedure: Table sentinel-meta isn't enabled; skipping disable
> > >
> > >
> > > Please also find attached a complete log from startup to shutdown on a
> > > single active master. You'll see the table is found as well as the
> > > regions but it gets deactivated with no reason.
> > >
> > >
> > > Thanks a lot for your help, we're kinda running out of ideas here.
> > >
> > >
> > > Best regards,
> > >
> > >
> > > Adam.
> > >
> > >
> > >
> >
>


Re: problem in launching HBase

2016-11-22 Thread Sen
Did you make sure your /etc/hosts file has the IP address of the HBase server?
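
A minimal Python sketch for checking that: it parses /etc/hosts-style text, lists the addresses each name maps to, and reports whether a name maps only to loopback addresses. The sample contents, including the documentation-range address 192.0.2.10 for `hadoop2`, are hypothetical. A `::1` entry for localhost is one mapping that could explain the ZooKeeper client connecting to localhost/0:0:0:0:0:0:0:1 in the log quoted further down.

```python
import ipaddress
from typing import Dict, List

# Hypothetical /etc/hosts contents, for illustration only.
SAMPLE_HOSTS = """\
127.0.0.1   localhost localhost.localdomain
::1         localhost localhost.localdomain
192.0.2.10  hadoop2    # hypothetical LAN address for the HBase host
"""


def parse_hosts(text: str) -> Dict[str, List[str]]:
    """Map every hostname/alias to the addresses listed for it, in file order."""
    mapping: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            mapping.setdefault(name, []).append(address)
    return mapping


def is_loopback_only(mapping: Dict[str, List[str]], host: str) -> bool:
    """True if `host` is listed and every address for it is loopback (127.x, ::1)."""
    addresses = mapping.get(host, [])
    return bool(addresses) and all(
        ipaddress.ip_address(a).is_loopback for a in addresses)


if __name__ == "__main__":
    hosts = parse_hosts(SAMPLE_HOSTS)
    for name in ("localhost", "hadoop2"):
        print(name, hosts.get(name), "loopback only:", is_loopback_only(hosts, name))
```

Run against the real file (`parse_hosts(open("/etc/hosts").read())`), a server host name that turns out to be loopback-only would be worth fixing before pointing hbase-site.xml at it.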

On Tue, Nov 22, 2016 at 8:39 PM, Ted Yu  wrote:

> I think hbase 1.2.3 should run fine with Hadoop 2.7.3
>
> Can you replace localhost in your hbase-site.xml and try again (remember to
> set the corresponding entry in /etc/hosts)?
>
> BTW, I will be out of the office starting tomorrow morning.
>
> On Tue, Nov 22, 2016 at 12:44 AM, QI Congyun <
> congyun...@alcatel-sbell.com.cn> wrote:
>
> > Hello Ted,
> >
> > I tried removing the HBase folder and reinstalling it many times, but the
> > same faults below happened.
> > I wonder whether HBase 1.2.3 is incompatible with Hadoop 2.7.3? I searched
> > the internet for similar issues but found very few.
> > I'm very confused; could you help me find the reason?
> >
> > Thanks.
> >
> >
> > -Original Message-
> > From: Ted Yu [mailto:yuzhih...@gmail.com]
> > Sent: Wednesday, November 16, 2016 11:13 AM
> > To: user@hbase.apache.org
> > Subject: Re: problem in launching HBase
> >
> > 2016-10-31 15:49:57,503 INFO
> > [master/localhost/127.0.0.1:16000-SendThread(localhost:2181)]
> > zookeeper.ClientCnxn: Opening socket connection to server
> > localhost/0:0:0:0:0:0:0:1:  2181. Will not attempt to authenticate using
> > SASL (unknown error)
> >
> > Is your machine running IPv6 ?
> >
> > I don't have much experience with IPv6.
> >
> > Cheers
> >
> > On Tue, Nov 15, 2016 at 6:59 PM, QI Congyun <congyun...@alcatel-sbell.com.cn> wrote:
> >
> > > Hi, Ted,
> > >
> > > Do you think some incorrect configuration on my part led to the issues
> > > I'm encountering?
> > > Thanks.
> > >
> > >
> > > -Original Message-
> > > From: QI Congyun
> > > Sent: Tuesday, November 15, 2016 1:29 PM
> > > To: user@hbase.apache.org
> > > Subject: RE: problem in launching HBase
> > >
> > >
> > > I'm so sorry, I made a mistake: the Hadoop configuration files were
> > > attached to the previous e-mail.
> > >
> > > The hbase-site.xml is attached; please check it.
> > >
> > >
> > >
> > > -Original Message-
> > > From: Ted Yu [mailto:yuzhih...@gmail.com]
> > > Sent: Tuesday, November 15, 2016 1:25 PM
> > > To: user@hbase.apache.org
> > > Subject: Re: problem in launching HBase
> > >
> > > I don't see hbase-site.xml attached.
> > >
> > > Consider using pastebin.
> > >
> > > On Mon, Nov 14, 2016 at 9:19 PM, QI Congyun <congyun...@alcatel-sbell.com.cn> wrote:
> > >
> > > >
> > > > The NameNode and DataNode are running normally, as the following
> > > > process list shows. The file "hbase-site.xml" and other associated
> > > > files are enclosed.
> > > > Thanks.
> > > >
> > > > 
> > > > ---
> > > > [hadoop@hadoop2 conf]$ jps
> > > > 11805 SecondaryNameNode
> > > > 32314 Jps
> > > > 11614 DataNode
> > > > 507 NodeManager
> > > > 385 ResourceManager
> > > > 11379 NameNode
> > > > 
> > > > 
> > > > --
> > > > --
> > > > [hadoop@hadoop2 hadoop-2.7.3]$ bin/hdfs dfsadmin -report
> > > > Configured Capacity: 154684043264 (144.06 GB)
> > > > Present Capacity: 133174730752 (124.03 GB)
> > > > DFS Remaining: 128144982016 (119.34 GB)
> > > > DFS Used: 5029748736 (4.68 GB)
> > > > DFS Used%: 3.78%
> > > > Under replicated blocks: 0
> > > > Blocks with corrupt replicas: 0
> > > > Missing blocks: 0
> > > > Missing blocks (with replication factor 1): 0
> > > >
> > > > -
> > > >
> > > > Live datanodes (1):
> > > >
> > > > Name: 127.0.0.1:9866 (localhost)
> > > > Hostname: localhost
> > > > Decommission Status : Normal
> > > > Configured Capacity: 154684043264 (144.06 GB)
> > > > DFS Used: 5029748736 (4.68 GB)
> > > > Non DFS Used: 21509312512 (20.03 GB)
> > > > DFS Remaining: 128144982016 (119.34 GB)
> > > > DFS Used%: 3.25%
> > > > DFS Remaining%: 82.84%
> > > > Configured Cache Capacity: 0 (0 B)
> > > > Cache Used: 0 (0 B)
> > > > Cache Remaining: 0 (0 B)
> > > > Cache Used%: 100.00%
> > > > Cache Remaining%: 0.00%
> > > > Xceivers: 1
> > > > Last contact: Tue Nov 15 13:17:01 CST 2016
> > > > .
> > > > 
> > > > ..
> > > >
> > > >
> > > >
> > > > -Original Message-
> > > > From: Ted Yu [mailto:yuzhih...@gmail.com]
> > > > Sent: Tuesday, November 15, 2016 11:50 AM
> > > > To: user@hbase.apache.org
> > > > Subject: Re: problem in launching HBase
> > > >
> > > > 2016-10-31 15:49:57,528 FATAL [localhost:16000.activeMasterManager]
> > > > master.HMaster: Failed to become active master
> > > > java.net.ConnectException: Call From hadoop2/127.0.0.1 to
> > > > localhost:8020 failed on connection exception:
> > > > java.net.ConnectException: Connection refused; For 

RE: Table is disabled an no way to get it back online

2016-11-22 Thread Cecile, Adam
Hey Ted,

Thank you. Heading home right now but I'll start the laptop again. I'm not sure
exactly how to turn on debug logging, so if you have the information it'd be
appreciated; otherwise I'll look at the XML files.

Regards, Adam.

De : Ted Yu 
Envoyé : mardi 22 novembre 2016 15:46
À : user@hbase.apache.org
Objet : Re: Table is disabled an no way to get it back online

Master log contained entries in the following form:

2016-11-22 13:13:41,836 INFO  [ProcedureExecutor-3]
procedure2.ProcedureExecutor: Rolledback procedure DisableTableProcedure
(table=sentinel-meta) id=43538 owner=hbase state=ROLLEDBACK
exec-time=242hrs, 10mins, 28.896sec
exception=org.apache.hadoop.hbase.TableNotEnabledException: sentinel-meta

Note the procedure Id was around 43000, far lower than 147464.

Can you turn on debug logging and repost the master log?

Thanks

On Tue, Nov 22, 2016 at 4:16 AM, Cecile, Adam  wrote:

> Hello,
>
>
> We're having a table stuck in disabled state. First I'd like to start with
> what I tried already:
>
>
> * Restart all machines involved in HBase cluster
>
> * hbase hbck with various arguments
>
> * hdfs fsck
>
> * Purge ZK /hbase and restart masters
>
>
> Now more details about what happens:
>
> * When enabling from hbase shell:
>
>
> hbase(main):002:0> enable "sentinel-meta"
> ERROR: The procedure 147464 is still running
>
>
> The task ID changes every time I run the command so I think it's talking
> about itself (and it gets stuck for a while before saying anything)
>
>
> In the log, all I can see is:
>
> 2016-11-22 13:10:50,776 INFO  [ProcedureExecutor-0]
> procedure2.ProcedureExecutor: Rolledback procedure DisableTableProcedure
> (table=sentinel-meta) id=43220 owner=hbase state=ROLLEDBACK
> exec-time=242hrs, 52mins, 7.454sec 
> exception=org.apache.hadoop.hbase.TableNotEnabledException:
> sentinel-meta
> 2016-11-22 13:10:50,781 INFO  [ProcedureExecutor-0] 
> procedure.DisableTableProcedure:
> Table sentinel-meta isn't enabled; skipping disable
> 2016-11-22 13:10:51,084 INFO  [ProcedureExecutor-0]
> procedure2.ProcedureExecutor: Rolledback procedure DisableTableProcedure
> (table=sentinel-meta) id=43221 owner=hbase state=ROLLEDBACK
> exec-time=242hrs, 51mins, 42.288sec 
> exception=org.apache.hadoop.hbase.TableNotEnabledException:
> sentinel-meta
> 2016-11-22 13:10:51,088 INFO  [ProcedureExecutor-0] 
> procedure.DisableTableProcedure:
> Table sentinel-meta isn't enabled; skipping disable
>
>
> Please also find attached a complete log from startup to shutdown on a
> single active master. You'll see the table is found as well as the regions
> but it gets deactivated with no reason.
>
>
> Thanks a lot for your help, we're kinda running out of ideas here.
>
>
> Best regards,
>
>
> Adam.
>
>
>


Re: problem in launching HBase

2016-11-22 Thread Ted Yu
I think hbase 1.2.3 should run fine with Hadoop 2.7.3

Can you replace localhost in your hbase-site.xml and try again (remember to
set the corresponding entry in /etc/hosts)?
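
A Python sketch of that edit, assuming you want every `localhost` in a property value swapped for the machine's host name. The sample hbase-site.xml is hypothetical (the real file never reached the list); its `hdfs://localhost:8020/hbase` value only mirrors the address in the ConnectException quoted below, and `hadoop2` is the host name from the shell prompt in this thread. ElementTree drops XML comments and the XML declaration, so diff the output against the original before replacing the file.

```python
import xml.etree.ElementTree as ET

# Hypothetical hbase-site.xml contents, for illustration only.
SAMPLE_SITE = """<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:8020/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>localhost</value>
  </property>
</configuration>"""


def replace_localhost(site_xml: str, hostname: str) -> str:
    """Return the XML with 'localhost' replaced by `hostname` in every <value>."""
    root = ET.fromstring(site_xml)
    for prop in root.iter("property"):
        value = prop.find("value")
        if value is not None and value.text:
            value.text = value.text.replace("localhost", hostname)
    return ET.tostring(root, encoding="unicode")


def property_values(site_xml: str) -> dict:
    """Collect {name: value} pairs, handy for eyeballing the result."""
    root = ET.fromstring(site_xml)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


if __name__ == "__main__":
    print(property_values(replace_localhost(SAMPLE_SITE, "hadoop2")))
```

The replacement is a plain substring swap, so a value such as `localhost.localdomain` would become `hadoop2.localdomain`; check the printed values before using them.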

BTW, I will be out of the office starting tomorrow morning.

On Tue, Nov 22, 2016 at 12:44 AM, QI Congyun <
congyun...@alcatel-sbell.com.cn> wrote:

> Hello Ted,
>
> I tried removing the HBase folder and reinstalling it many times, but the
> same faults below happened.
> I wonder whether HBase 1.2.3 is incompatible with Hadoop 2.7.3? I searched
> the internet for similar issues but found very few.
> I'm very confused; could you help me find the reason?
>
> Thanks.
>
>
> -Original Message-
> From: Ted Yu [mailto:yuzhih...@gmail.com]
> Sent: Wednesday, November 16, 2016 11:13 AM
> To: user@hbase.apache.org
> Subject: Re: problem in launching HBase
>
> 2016-10-31 15:49:57,503 INFO
> [master/localhost/127.0.0.1:16000-SendThread(localhost:2181)]
> zookeeper.ClientCnxn: Opening socket connection to server
> localhost/0:0:0:0:0:0:0:1:  2181. Will not attempt to authenticate using
> SASL (unknown error)
>
> Is your machine running IPv6 ?
>
> I don't have much experience with IPv6.
>
> Cheers
>
> On Tue, Nov 15, 2016 at 6:59 PM, QI Congyun wrote:
>
> > Hi, Ted,
> >
> > Do you think some incorrect configuration on my part led to the issues
> > I'm encountering?
> > Thanks.
> >
> >
> > -Original Message-
> > From: QI Congyun
> > Sent: Tuesday, November 15, 2016 1:29 PM
> > To: user@hbase.apache.org
> > Subject: RE: problem in launching HBase
> >
> >
> > I'm so sorry, I made a mistake: the Hadoop configuration files were
> > attached to the previous e-mail.
> >
> > The hbase-site.xml is attached; please check it.
> >
> >
> >
> > -Original Message-
> > From: Ted Yu [mailto:yuzhih...@gmail.com]
> > Sent: Tuesday, November 15, 2016 1:25 PM
> > To: user@hbase.apache.org
> > Subject: Re: problem in launching HBase
> >
> > I don't see hbase-site.xml attached.
> >
> > Consider using pastebin.
> >
> > On Mon, Nov 14, 2016 at 9:19 PM, QI Congyun <congyun...@alcatel-sbell.com.cn> wrote:
> >
> > >
> > > The NameNode and DataNode are running normally, as the following
> > > process list shows. The file "hbase-site.xml" and other associated
> > > files are enclosed.
> > > Thanks.
> > >
> > > 
> > > ---
> > > [hadoop@hadoop2 conf]$ jps
> > > 11805 SecondaryNameNode
> > > 32314 Jps
> > > 11614 DataNode
> > > 507 NodeManager
> > > 385 ResourceManager
> > > 11379 NameNode
> > > 
> > > 
> > > --
> > > --
> > > [hadoop@hadoop2 hadoop-2.7.3]$ bin/hdfs dfsadmin -report
> > > Configured Capacity: 154684043264 (144.06 GB)
> > > Present Capacity: 133174730752 (124.03 GB)
> > > DFS Remaining: 128144982016 (119.34 GB)
> > > DFS Used: 5029748736 (4.68 GB)
> > > DFS Used%: 3.78%
> > > Under replicated blocks: 0
> > > Blocks with corrupt replicas: 0
> > > Missing blocks: 0
> > > Missing blocks (with replication factor 1): 0
> > >
> > > -
> > >
> > > Live datanodes (1):
> > >
> > > Name: 127.0.0.1:9866 (localhost)
> > > Hostname: localhost
> > > Decommission Status : Normal
> > > Configured Capacity: 154684043264 (144.06 GB)
> > > DFS Used: 5029748736 (4.68 GB)
> > > Non DFS Used: 21509312512 (20.03 GB)
> > > DFS Remaining: 128144982016 (119.34 GB)
> > > DFS Used%: 3.25%
> > > DFS Remaining%: 82.84%
> > > Configured Cache Capacity: 0 (0 B)
> > > Cache Used: 0 (0 B)
> > > Cache Remaining: 0 (0 B)
> > > Cache Used%: 100.00%
> > > Cache Remaining%: 0.00%
> > > Xceivers: 1
> > > Last contact: Tue Nov 15 13:17:01 CST 2016
> > > .
> > > 
> > > ..
> > >
> > >
> > >
> > > -Original Message-
> > > From: Ted Yu [mailto:yuzhih...@gmail.com]
> > > Sent: Tuesday, November 15, 2016 11:50 AM
> > > To: user@hbase.apache.org
> > > Subject: Re: problem in launching HBase
> > >
> > > 2016-10-31 15:49:57,528 FATAL [localhost:16000.activeMasterManager]
> > > master.HMaster: Failed to become active master
> > > java.net.ConnectException: Call From hadoop2/127.0.0.1 to
> > > localhost:8020 failed on connection exception:
> > > java.net.ConnectException: Connection refused; For more details see:
> > > http://wiki.apache.org/hadoop/ConnectionRefused
> > >   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> > >   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> > > ...
> > >   at org.apache.hadoop.hdfs.DFSClient.setSafeMode(DFSClient.java:2264)
> > >   at
> > > 

Re: Table is disabled an no way to get it back online

2016-11-22 Thread Ted Yu
Master log contained entries in the following form:

2016-11-22 13:13:41,836 INFO  [ProcedureExecutor-3]
procedure2.ProcedureExecutor: Rolledback procedure DisableTableProcedure
(table=sentinel-meta) id=43538 owner=hbase state=ROLLEDBACK
exec-time=242hrs, 10mins, 28.896sec
exception=org.apache.hadoop.hbase.TableNotEnabledException: sentinel-meta

Note the procedure Id was around 43000, far lower than 147464.
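
One way to make that comparison, and to have a pattern to watch for if this happens again, is to pull the rolled-back DisableTableProcedure entries out of the master log. A Python sketch, assuming each entry sits on a single line in the log file (the mail wrapping above splits them) and that the format matches the HBase 1.2 lines quoted in this thread; the regex and the exec-time units it understands (hrs, mins, sec) are fitted to these samples only. It also shows how old these procedures are: 242 hrs is just over ten days.

```python
import re
from typing import Iterable, List, NamedTuple

# Fitted to the single-line master log entries quoted in this thread.
ROLLBACK_RE = re.compile(
    r"Rolledback procedure (?P<proc>\w+) \(table=(?P<table>[^)]+)\) "
    r"id=(?P<id>\d+) .*?exec-time=(?P<exec>.+?) exception=(?P<exc>[\w.]+)"
)
UNIT_SECONDS = {"hrs": 3600.0, "mins": 60.0, "sec": 1.0}


class Rollback(NamedTuple):
    procedure: str
    table: str
    proc_id: int
    exec_seconds: float
    exception: str


def exec_time_seconds(text: str) -> float:
    """Convert e.g. '242hrs, 52mins, 7.454sec' to seconds."""
    return sum(float(n) * UNIT_SECONDS[u]
               for n, u in re.findall(r"(\d+(?:\.\d+)?)(hrs|mins|sec)", text))


def parse_rollbacks(lines: Iterable[str]) -> List[Rollback]:
    """Return one Rollback per matching log line; other lines are ignored."""
    found = []
    for line in lines:
        m = ROLLBACK_RE.search(line)
        if m:
            found.append(Rollback(m["proc"], m["table"], int(m["id"]),
                                  exec_time_seconds(m["exec"]), m["exc"]))
    return found


SAMPLE_LOG = [
    "2016-11-22 13:10:50,776 INFO  [ProcedureExecutor-0] procedure2.ProcedureExecutor: "
    "Rolledback procedure DisableTableProcedure (table=sentinel-meta) id=43220 "
    "owner=hbase state=ROLLEDBACK exec-time=242hrs, 52mins, 7.454sec "
    "exception=org.apache.hadoop.hbase.TableNotEnabledException: sentinel-meta",
    "2016-11-22 13:10:50,781 INFO  [ProcedureExecutor-0] procedure.DisableTableProcedure: "
    "Table sentinel-meta isn't enabled; skipping disable",
    "2016-11-22 13:13:41,836 INFO  [ProcedureExecutor-3] procedure2.ProcedureExecutor: "
    "Rolledback procedure DisableTableProcedure (table=sentinel-meta) id=43538 "
    "owner=hbase state=ROLLEDBACK exec-time=242hrs, 10mins, 28.896sec "
    "exception=org.apache.hadoop.hbase.TableNotEnabledException: sentinel-meta",
]

if __name__ == "__main__":
    shell_reported_id = 147464  # id from the 'procedure ... is still running' error
    for r in parse_rollbacks(SAMPLE_LOG):
        print(f"id={r.proc_id} table={r.table} age={r.exec_seconds / 86400:.1f} days "
              f"older_than_shell_id={r.proc_id < shell_reported_id}")
```

Feeding it the real log (`parse_rollbacks(open(path))`) lists every rolled-back procedure id in order, which makes a gap like 43000 versus 147464 easy to spot.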

Can you turn on debug logging and repost the master log?

Thanks

On Tue, Nov 22, 2016 at 4:16 AM, Cecile, Adam  wrote:

> Hello,
>
>
> We're having a table stuck in disabled state. First I'd like to start with
> what I tried already:
>
>
> * Restart all machines involved in HBase cluster
>
> * hbase hbck with various arguments
>
> * hdfs fsck
>
> * Purge ZK /hbase and restart masters
>
>
> Now more details about what happens:
>
> * When enabling from hbase shell:
>
>
> hbase(main):002:0> enable "sentinel-meta"
> ERROR: The procedure 147464 is still running
>
>
> The task ID changes every time I run the command so I think it's talking
> about itself (and it gets stuck for a while before saying anything)
>
>
> In the log, all I can see is:
>
> 2016-11-22 13:10:50,776 INFO  [ProcedureExecutor-0]
> procedure2.ProcedureExecutor: Rolledback procedure DisableTableProcedure
> (table=sentinel-meta) id=43220 owner=hbase state=ROLLEDBACK
> exec-time=242hrs, 52mins, 7.454sec 
> exception=org.apache.hadoop.hbase.TableNotEnabledException:
> sentinel-meta
> 2016-11-22 13:10:50,781 INFO  [ProcedureExecutor-0] 
> procedure.DisableTableProcedure:
> Table sentinel-meta isn't enabled; skipping disable
> 2016-11-22 13:10:51,084 INFO  [ProcedureExecutor-0]
> procedure2.ProcedureExecutor: Rolledback procedure DisableTableProcedure
> (table=sentinel-meta) id=43221 owner=hbase state=ROLLEDBACK
> exec-time=242hrs, 51mins, 42.288sec 
> exception=org.apache.hadoop.hbase.TableNotEnabledException:
> sentinel-meta
> 2016-11-22 13:10:51,088 INFO  [ProcedureExecutor-0] 
> procedure.DisableTableProcedure:
> Table sentinel-meta isn't enabled; skipping disable
>
>
> Please also find attached a complete log from startup to shutdown on a
> single active master. You'll see the table is found as well as the regions
> but it gets deactivated with no reason.
>
>
> Thanks a lot for your help, we're kinda running out of ideas here.
>
>
> Best regards,
>
>
> Adam.
>
>
>


Re: High CPU utilization by meta region

2016-11-22 Thread Jean-Marc Spaggiari
To add to what Stack asked, do you have the metrics for your META vs. the
other regions? Is META hot-spotted, which might increase its CPU usage?
Not just the requests per second, but also the number of calls. Does
META have way more? Or almost the same? Or less?
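
For a first read on Tim's own figures: the meta server handles roughly 3k of the cluster's 90k requests per second, about 3.3% of the total, which by itself says little about hot-spotting. What matters is how meta compares with the other regions. A Python sketch of that comparison, assuming you have already collected a per-region request rate or call count (it does not fetch metrics); the region names and numbers below are hypothetical.

```python
from statistics import median
from typing import Dict

# Hypothetical per-region request rates (requests/sec), for illustration only.
SAMPLE_RATES = {
    "hbase:meta,,1": 3000.0,
    "table_a,,1": 1200.0,
    "table_b,,1": 900.0,
    "table_c,,1": 1500.0,
}


def hotspot_factor(rates: Dict[str, float], region: str) -> float:
    """How many times busier `region` is than the median of all other regions."""
    others = [rate for name, rate in rates.items() if name != region]
    if not others:
        raise ValueError("need at least one other region to compare against")
    baseline = median(others)
    return float("inf") if baseline == 0 else rates[region] / baseline


if __name__ == "__main__":
    factor = hotspot_factor(SAMPLE_RATES, "hbase:meta,,1")
    print(f"meta is {factor:.1f}x the median region")
```

The median is used as the baseline so that a single other busy region does not mask or exaggerate the comparison; the same function works for request rates or raw call counts.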

thanks,

JMS


2016-11-22 0:04 GMT-05:00 Stack :

> Can we see configs -- encodings? -- and a thread dump?  Any I/O? If you
> look in HDFS, many files under hbase:meta? Is it big? When was last time it
> major compacted?
>
> Thanks,
> S
>
> On Mon, Nov 21, 2016 at 5:50 PM, Timothy Brown 
> wrote:
>
> > Hi,
> >
> > We are seeing about 80% CPU utilization on the Region Server that solely
> > serves the meta table while other region servers typically have under 50%
> > CPU utilization. Is this expected?
> >
> > Here's some more info about our cluster:
> > HBase version 1.2
> > Number of regions: 72
> > Number of tables: 97
> > Approx. requests per second to meta region server: 3k
> > Approx. requests per second to entire HBase cluster: 90k
> >
> > Let me know what other information would be useful.
> >
> > Thanks for the help,
> > Tim
> >
>


Re: Use experience and performance data of offheap from Alibaba online cluster

2016-11-22 Thread Andrew Purtell
> I hope we could strengthen our faith in HBase capability

Us too. Would you be interested in turning the metrics, and the discussion of
them that came out of this thread, into a post for the HBase project blog
(https://blogs.apache.org/hbase)? As you can see from the other blog entries,
details about the use case do not need to reveal proprietary information;
readers would be most interested in the metrics you observed/achieved on 11/11,
followed by a technical discussion of how (roughly) to replicate them. You have
a good command of the English language, so that won't be a problem, and anyway
I offer my services as editor should you like to try. Think about it. This
would be a great post, and I am sure a very popular one.


> On Nov 22, 2016, at 12:51 AM, Yu Li  wrote:
> 
> bq. If it were not "confidential" might you mention why there is such a
> large (several orders of magnitude) explosion of end user queries to
> backend ones?
> For the index-building and online machine-learning systems, more
> information is recorded after each visit/trade, such as user query/click
> history, item stock updates, etc., and multiple user-specific features
> will be read/updated for better recommendations. The flow is roughly
> like this:
> user visit some items
> -> put them into shopping cart
> -> checkout/removing item from shopping cart
> -> item stock update/recommend new items to user
> -> user visit new items
> Not many more details can be supplied, but I believe we can imagine how
> many queries/updates there will be at the backend for such loops, right? (smile)
> 
> Thanks again for the interest and questions, although they derail the
> thread a little, and I hope we could strengthen our faith in HBase capability
> after these discussions. :-)
> 
> Best Regards,
> Yu
> 
>> On 21 November 2016 at 01:26, Stephen Boesch  wrote:
>> 
>> Thanks Yu - given your apparent direct knowledge of the data that is
>> helpful (my response earlier had been to 张铎). It is important so as to
>> ensure informing colleagues of numbers that are "real".
>> 
>> If it were not "confidential" might you mention why there is such a large
>> (several orders of magnitude) explosion of end user queries to backend
>> ones?
>> 
>> 
>> 
>> 2016-11-20 7:51 GMT-08:00 Yu Li :
>> 
>>> Thanks everyone for the feedback/comments; glad this data means something
>>> and has drawn your interest. Let me answer the questions (and sorry for
>>> the lag).
>>>
>>> For the backport patches, ours are based on a customized 1.1.2 version and
>>> cannot be applied directly to any 1.x branch. It would be easy for us to
>>> upload the existing patches somewhere, but that's obviously not very
>>> useful... so maybe we should still get them into branch-1 and officially
>>> support read-path offheap in a future 1.x release? Let me create a JIRA
>>> for this, and let's discuss it there. And to be very clear, it's a big YES
>>> to sharing our patches with everyone rather than only numbers; the only
>>> question is which way is better (smile).
>>> 
>>> And answers for @Stephen Boesch:
>>> 
>>> bq. In any case the data is marked as 9/25/16 not 11/11/16
>>> As specifically noted, the 9/25 data are from our online A/B test
>>> cluster, not from the full online cluster: we rolled out offheap together
>>> with NettyRpcServer in production, so there is no standalone comparison
>>> data for offheap there. Please check my original email more carefully (smile).
>>> 
>>> bq. Repeating my earlier question:  20*Meg* queries per second??  Just
>>> checked and *google* does 40*K* queries per second.
>>> As you already noticed, the 20M QPS number is from the A/B testing cluster
>>> (450 nodes), and there are many more on the 11/11 online cluster (1600+
>>> nodes). Please note that this is NOT a cluster that directly serves queries
>>> from end users; it serves the index building and online machine learning
>>> systems. Refer to our talk at HBaseCon 2016 (slides
>>> <...apache-hbase-and-its-applications-in-alibaba-search> / recording
>>> <...T5HvwvkO9raWy=10>) for more details, if you're interested. And unlike
>>> Google, there's an obvious "hot spot" for us, so I don't think the QPS
>>> figures of these two different systems are comparable.
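For a rough sense of scale, the A/B cluster figures quoted above (20M QPS over 450 nodes) can be reduced to a per-node average. This is only arithmetic on the numbers in this thread, and it assumes an even spread of load, which the "hot spot" remark suggests is not the real distribution.

```shell
#!/usr/bin/env bash
# Average QPS per node, assuming load is spread evenly (integer division rounds down).
per_node_qps() {
  local total_qps=$1
  local nodes=$2
  echo $(( total_qps / nodes ))
}

# A/B test cluster figures from the thread: 20M QPS on 450 nodes.
per_node_qps 20000000 450   # prints 44444
```

That works out to roughly 44K QPS per node on average.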
>>> 
>>> bq. So maybe please check your numbers again.
>>> The numbers come from our online monitoring system and are all real, not
>>> fake, so there's no need to check them. Maybe it just takes some more time
>>> to take them in and understand them? (smile)
>>> 
>>> Best Regards,
>>> Yu
>>> 
>>>> On 20 November 2016 at 23:03, Stephen Boesch  wrote:
>>>>
>>>> Your arguments do not reflect direct knowledge of the numbers.  (a) There
>>>> is no super-spikiness in the graphs in the data (b) In any case the data
>>>> is marked as 9/25/16 not 11/11/16.  (c) The number of internet users says

Re: hbase/spark - Delegation Token can be issued only with kerberos or web authentication

2016-11-22 Thread Abel Fernández
I think the TGT is not the problem; checking the logs, I can see:

16/11/22 10:06:40 DEBUG [main] YarnSparkHadoopUtil: running as user: hbase
16/11/22 10:06:40 DEBUG [main] UserGroupInformation: hadoop login
16/11/22 10:06:40 DEBUG [main] UserGroupInformation: hadoop login commit
16/11/22 10:06:40 DEBUG [main] UserGroupInformation: using kerberos user:hb...@company.corp
16/11/22 10:06:40 DEBUG [main] UserGroupInformation: Using user: "hb...@company.corp" with name hb...@company.corp
16/11/22 10:06:40 DEBUG [main] UserGroupInformation: User entry: "hb...@company.corp"
16/11/22 10:06:40 DEBUG [main] UserGroupInformation: UGI loginUser:hb...@company.corp (auth:KERBEROS)
16/11/22 10:06:40 DEBUG [main] UserGroupInformation: PrivilegedAction as:hbase (auth:SIMPLE) from:org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68)
16/11/22 10:06:40 DEBUG [TGT Renewer for hb...@company.corp] UserGroupInformation: Found tgt Ticket (hex) =
: 61 82 01 61 30 82 01 5D   A0 03 02 01 05 A1 12 1B  a..a0..]
0010: 10 53 41 4E 54 41 4E 44   45 52 55 4B 2E 43 4F 52  .COMPANY.COR
0020: 50 A2 25 30 23 A0 03 02   01 02 A1 1C 30 1A 1B 06  P.%0#...0...
0030: 6B 72 62 74 67 74 1B 10   53 41 4E 54 41 4E 44 45


Client Principal = hb...@company.corp
Server Principal = krbtgt/company.c...@company.corp
Session Key = EncryptionKey: keyType=18 keyBytes (hex dump)=
: 2D 9D 67 F5 7C B4 15 17   AE DE BE A5 B9 2C 15 95  -.g..,..
0010: E6 6B 1C 4A 02 A2 44 67   6D D2 16 36 4A DA 11 82  .k.J..Dgm..6J...


Forwardable Ticket true
Forwarded Ticket false
Proxiable Ticket false
Proxy Ticket false
Postdated Ticket false
Renewable Ticket true
Initial Ticket true
Auth Time = Tue Nov 22 03:39:05 CET 2016
Start Time = Tue Nov 22 03:39:05 CET 2016
End Time = Wed Nov 23 03:39:05 CET 2016
Renew Till = Tue Nov 29 03:39:05 CET 2016
Client Addresses  Null
16/11/22 10:06:40 DEBUG [TGT Renewer for hb...@company.corp] UserGroupInformation: Current time is 1479805600691
16/11/22 10:06:40 DEBUG [TGT Renewer for hb...@company.corp] UserGroupInformation: Next refresh is 1479851465000
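The two epoch-millisecond values above can be cross-checked against the ticket's Start and End Times. The sketch assumes that Hadoop's `UserGroupInformation` renewer schedules the next refresh at 80% of the ticket lifetime, i.e. start + 0.8 x (end - start); that rule is our reading of the Hadoop source, not something stated in this thread. It also assumes GNU `date` and that CET is UTC+1 in November.

```shell
#!/usr/bin/env bash
# Requires GNU date (for -d). Ticket times from the log, converted from CET (UTC+1) to UTC.
start=$(date -u -d '2016-11-22 02:39:05' +%s)   # Start Time: Tue Nov 22 03:39:05 CET 2016
end=$(date -u -d '2016-11-23 02:39:05' +%s)     # End Time:   Wed Nov 23 03:39:05 CET 2016

# Assumed renewal rule: refresh at 80% of the ticket lifetime, returned in milliseconds.
refresh_ms() {
  local s=$1 e=$2
  echo $(( (s + (e - s) * 4 / 5) * 1000 ))
}

refresh=$(refresh_ms "$start" "$end")
echo "computed next refresh: $refresh"   # computed next refresh: 1479851465000
echo "logged next refresh:   1479851465000"

# The logged "Current time" (milliseconds) as a UTC timestamp: 2016-11-22 09:06:40 UTC, i.e. 10:06:40 CET.
date -u -d "@$(( 1479805600691 / 1000 ))" '+%Y-%m-%d %H:%M:%S UTC'
```

The computed value matches the logged `Next refresh is 1479851465000` (22:51:05 CET), and the logged current time matches the 10:06:40 log timestamp, so the renewer is working from a ticket valid until 03:39:05 CET on Nov 23. That is consistent with the driver-side TGT being fine.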

Is the retrofit version you are using public? We are using CDH 5.5.4 but
with a backported version of hbase on spark from the latest code released
on github.

On Mon, 21 Nov 2016 at 21:11 Nkechi Achara  wrote:

> I am still convinced that it could be due to classpath issues, but I might
> be missing something.
>
> Just to make sure: have you tried using the principal/keytab on the driver
> only, so you can confirm the TGT is valid?
>
> I am using the same config, but with CDH 5.5.2 and a retrofit of the
> Cloudera Labs HBase-on-Spark.
>
> Thanks
>
> On 21 Nov 2016 5:32 p.m., "Abel Fernández"  wrote:
>
> > I have included the krb5.conf and the jaas.conf in the spark-submit and on
> > all NodeManagers and drivers, but I am still having the same problem.
> >
> > I think the problem is this piece of code: it tries to execute a
> > function on the executors and, for some reason, the executors cannot get
> > valid credentials.
> >
> > /**
> >  * A simple enrichment of the traditional Spark RDD foreachPartition.
> >  * This function differs from the original in that it offers the
> >  * developer access to an already connected Connection object
> >  *
> >  * Note: Do not close the Connection object.  All Connection
> >  * management is handled outside this method
> >  *
> >  * @param rdd  Original RDD with data to iterate over
> >  * @param f    Function to be given an iterator to iterate through
> >  *             the RDD values and a Connection object to interact
> >  *             with HBase
> >  */
> > def foreachPartition[T](rdd: RDD[T],
> > f: (Iterator[T], Connection) => Unit):Unit = {
> >   rdd.foreachPartition(
> > it => hbaseForeachPartition(broadcastedConf, it, f))
> > }
> >
> >
> > The first thing hbaseForeachPartition tries to do is get the
> > credentials, but I think this code is never executed:
> >
> > /**
> >  *  underlying wrapper for all foreach functions in HBaseContext
> >  */
> > private def hbaseForeachPartition[T](
> >     configBroadcast: Broadcast[SerializableWritable[Configuration]],
> >     it: Iterator[T],
> >     f: (Iterator[T], Connection) => Unit) = {
> >
> >   val config = getConf(configBroadcast)
> >
> >   applyCreds
> >   // specify that this is a proxy user
> >   val smartConn = HBaseConnectionCache.getConnection(config)
> >   f(it, smartConn.connection)
> >   smartConn.close()
> > }
> >
> >
> > This is the latest spark-submit I am using:
> > #!/bin/bash
> >
> > SPARK_CONF_DIR=conf-hbase spark-submit --master yarn-cluster \
> >   --executor-memory 6G \
> >   --num-executors 10 \
> >   --queue cards \
> >   --executor-cores 4 \
> >   --driver-java-options "-Dlog4j.configuration=file:log4j.properties" \
> >   

RE: problem in launching HBase

2016-11-22 Thread QI Congyun
Hello Ted,

I have tried removing the HBase folder and reinstalling it many times, but the 
same faults below keep happening. 
I wonder whether HBase 1.2.3 is incompatible with Hadoop 2.7.3. I searched the 
internet for similar issues, but found very few.
I'm very puzzled; could you help me find the cause?

Thanks.


-Original Message-
From: Ted Yu [mailto:yuzhih...@gmail.com] 
Sent: Wednesday, November 16, 2016 11:13 AM
To: user@hbase.apache.org
Subject: Re: problem in launching HBase

2016-10-31 15:49:57,503 INFO [master/localhost/127.0.0.1:16000-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)

Is your machine running IPv6 ?

I don't have much experience with IPv6.

Cheers

On Tue, Nov 15, 2016 at 6:59 PM, QI Congyun  wrote:

> Hi, Ted,
>
> Do you think some incorrect configuration on my side caused the issues
> I'm encountering?
> Thanks.
>
>
> -Original Message-
> From: QI Congyun
> Sent: Tuesday, November 15, 2016 1:29 PM
> To: user@hbase.apache.org
> Subject: RE: problem in launching HBase
>
>
> I'm so sorry, I made a mistake. The Hadoop configuration files were
> attached to the previous e-mail.
>
> The hbase-site.xml is attached; please check it.
>
>
>
> -Original Message-
> From: Ted Yu [mailto:yuzhih...@gmail.com]
> Sent: Tuesday, November 15, 2016 1:25 PM
> To: user@hbase.apache.org
> Subject: Re: problem in launching HBase
>
> I don't see hbase-site.xml attached.
>
> Consider using pastebin.
>
> On Mon, Nov 14, 2016 at 9:19 PM, QI Congyun  cn
> > wrote:
>
> >
> > The NameNode and DataNode are running normally, as the following process
> > list shows. The file "hbase-site.xml" and other associated files are
> > enclosed.
> > Thanks.
> >
> > 
> > ---
> > [hadoop@hadoop2 conf]$ jps
> > 11805 SecondaryNameNode
> > 32314 Jps
> > 11614 DataNode
> > 507 NodeManager
> > 385 ResourceManager
> > 11379 NameNode
> > ---
> > [hadoop@hadoop2 hadoop-2.7.3]$ bin/hdfs dfsadmin -report
> > Configured Capacity: 154684043264 (144.06 GB)
> > Present Capacity: 133174730752 (124.03 GB)
> > DFS Remaining: 128144982016 (119.34 GB)
> > DFS Used: 5029748736 (4.68 GB)
> > DFS Used%: 3.78%
> > Under replicated blocks: 0
> > Blocks with corrupt replicas: 0
> > Missing blocks: 0
> > Missing blocks (with replication factor 1): 0
> >
> > -
> >
> > Live datanodes (1):
> >
> > Name: 127.0.0.1:9866 (localhost)
> > Hostname: localhost
> > Decommission Status : Normal
> > Configured Capacity: 154684043264 (144.06 GB)
> > DFS Used: 5029748736 (4.68 GB)
> > Non DFS Used: 21509312512 (20.03 GB)
> > DFS Remaining: 128144982016 (119.34 GB)
> > DFS Used%: 3.25%
> > DFS Remaining%: 82.84%
> > Configured Cache Capacity: 0 (0 B)
> > Cache Used: 0 (0 B)
> > Cache Remaining: 0 (0 B)
> > Cache Used%: 100.00%
> > Cache Remaining%: 0.00%
> > Xceivers: 1
> > Last contact: Tue Nov 15 13:17:01 CST 2016
> > .
> > 
> > ..
> >
> >
> >
> > -Original Message-
> > From: Ted Yu [mailto:yuzhih...@gmail.com]
> > Sent: Tuesday, November 15, 2016 11:50 AM
> > To: user@hbase.apache.org
> > Subject: Re: problem in launching HBase
> >
> > 2016-10-31 15:49:57,528 FATAL [localhost:16000.activeMasterManager] master.HMaster: Failed to become active master
> > java.net.ConnectException: Call From hadoop2/127.0.0.1 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
> >   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> >   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> > ...
> >   at org.apache.hadoop.hdfs.DFSClient.setSafeMode(DFSClient.java:2264)
> >   at org.apache.hadoop.hdfs.DistributedFileSystem.setSafeMode(DistributedFileSystem.java:986)
> >   at org.apache.hadoop.hdfs.DistributedFileSystem.setSafeMode(DistributedFileSystem.java:970)
> >   at org.apache.hadoop.hbase.util.FSUtils.isInSafeMode(FSUtils.java:525)
> >   at org.apache.hadoop.hbase.util.FSUtils.waitOnSafeMode(FSUtils.java:971)
> >
> > Was the namenode running fine on localhost ?
> >
> > Can you pastebin the contents of hbase-site.xml ?
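The FATAL error shows the master dialing the NameNode at localhost:8020 and being refused. One thing worth checking is whether the host:port in `hbase.rootdir` (hbase-site.xml) matches `fs.defaultFS` (core-site.xml), since a mismatch would produce this kind of refused connection even with a healthy NameNode. The helpers below are a hypothetical sketch that compares the authority part of two HDFS URIs; the example values, including port 9000, are made up for illustration. On a real cluster the configured value can be read with `hdfs getconf -confKey fs.defaultFS`.

```shell
#!/usr/bin/env bash
# Extract "host:port" from an hdfs:// URI, e.g. hdfs://localhost:8020/hbase -> localhost:8020
hdfs_authority() {
  local uri=${1#hdfs://}   # drop the scheme
  echo "${uri%%/*}"        # drop everything from the first "/" onward
}

# Succeed only when both URIs name the same NameNode host:port (plain string comparison).
same_namenode() {
  [ "$(hdfs_authority "$1")" = "$(hdfs_authority "$2")" ]
}

# Hypothetical values: hbase.rootdir from hbase-site.xml, fs.defaultFS from core-site.xml.
rootdir="hdfs://localhost:8020/hbase"
default_fs="hdfs://localhost:9000"

if same_namenode "$rootdir" "$default_fs"; then
  echo "hbase.rootdir and fs.defaultFS agree"
else
  echo "mismatch: HBase dials $(hdfs_authority "$rootdir"), NameNode is at $(hdfs_authority "$default_fs")"
fi
```

The comparison is purely textual, so `localhost` and `127.0.0.1` would be reported as different even though they point at the same host.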
> >
> > On Mon, Nov 14, 2016 at 7:40 PM, QI Congyun <
> congyun...@alcatel-sbell.com.
> > cn
> > > wrote:
> >
> > > Dear Ted,
> > >
> > > I had