[jira] [Commented] (HBASE-5846) HBase rpm packing is broken at multiple places
[ https://issues.apache.org/jira/browse/HBASE-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258467#comment-13258467 ]

Shrijeet Paliwal commented on HBASE-5846:
-----------------------------------------

Here is what happens if one runs an update:

{noformat}
D: install: %post(hbase-0.92.1-2.x86_64) synchronous scriptlet start
D: install: %post(hbase-0.92.1-2.x86_64) execv(/bin/sh) pid 26772
+ /usr/share/hbase/sbin/update-hbase-env.sh --prefix=/usr --bin-dir=/usr/bin --conf-dir=/etc/hbase --log-dir=/var/log/hbase --pid-dir=/var/run/hbase
D: install: waitpid(26772) rc 26772 status 0 secs 0.038
D: == --- hbase-0.92.1-1 x86_64-linux 0x0
D: erase: hbase-0.92.1-1 has 224 files, test = 0
D: erase: %preun(hbase-0.92.1-1.x86_64) asynchronous scriptlet start
D: erase: %preun(hbase-0.92.1-1.x86_64) execv(/bin/sh) pid 26819
+ /usr/share/hbase/sbin/update-hbase-env.sh --prefix=/usr --bin-dir=/usr/bin --conf-dir=/etc/hbase --log-dir=/var/log/hbase --pid-dir=/var/run/hbase --uninstall
{noformat}

This is the output of rpm -Uvv. Note how the %post of the new package (0.92.1-2) runs first, followed by the %preun of the old package (0.92.1-1). Because %preun calls the script with --uninstall, it erases all the work that %post just did.

> HBase rpm packing is broken at multiple places
> ----------------------------------------------
>
>                 Key: HBASE-5846
>                 URL: https://issues.apache.org/jira/browse/HBASE-5846
>             Project: HBase
>          Issue Type: Bug
>          Components: build
>    Affects Versions: 0.92.1
>         Environment: CentOS release 5.7 (Final)
>            Reporter: Shrijeet Paliwal
>
> Here is how I executed rpm build:
> {noformat}
> MAVEN_OPTS="-Xmx2g" mvn clean package assembly:single -Prpm -DskipTests
> {noformat}
> The issues with the rpm build are:
> * There is no clean (%clean) section in the hbase.spec file. The last run can
> leave stuff in RPM_BUILD_ROOT, which in turn will fail the build. As a fix I added
> 'rm -rf $RPM_BUILD_ROOT' to the %clean section.
> * The Buildroot is set to _build_dir. The build fails with this error:
> {noformat}
> cp: cannot copy a directory,
> `/data/9adda425-1f1e-4fe5-8a53-83bd2ce5ad45/app/jenkins/workspace/hbase.92/target/rpm/hbase/BUILD',
> into itself,
> `/data/9adda425-1f1e-4fe5-8a53-83bd2ce5ad45/app/jenkins/workspace/hbase.92/target/rpm/hbase/BUILD/BUILD'
> {noformat}
> If we set it to '%{_tmppath}/%{name}-%{version}-root', the build passes.
> * The src/packages/update-hbase-env.sh script will leave an inconsistent state
> if 'yum update hbase' is executed. It deletes data from /etc/init.d/hbase*
> and does not put the scripts back during the update.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3638) If a FS bootstrap, need to also ensure ZK is cleaned
[ https://issues.apache.org/jira/browse/HBASE-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183929#comment-13183929 ]

Shrijeet Paliwal commented on HBASE-3638:
-----------------------------------------

To avoid ambiguity, I must add that the log I pasted is from a time when the master is initializing.

> If a FS bootstrap, need to also ensure ZK is cleaned
> ----------------------------------------------------
>
>                 Key: HBASE-3638
>                 URL: https://issues.apache.org/jira/browse/HBASE-3638
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Priority: Minor
>
> In a test environment where a cycle of start, operation, kill hbase (repeat),
> noticed that we were doing a bootstrap on startup but then we were picking up
> the previous cycle's zk state. It made for a mess in the test.
> Last thing seen on previous cycle was:
> {code}
> 2011-03-11 06:33:36,708 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Handling
> transition=RS_ZK_REGION_OPENING, server=X.X.X.60020,1299853933073,
> region=1028785192/.META.
> {code}
> Then, in the messed up cycle I saw:
> {code}
> 2011-03-11 06:42:48,530 INFO org.apache.hadoop.hbase.master.MasterFileSystem:
> BOOTSTRAP: creating ROOT and first META regions
> .
> {code}
> Then after setting watcher on .META., we get a
> {code}
> 2011-03-11 06:42:58,301 INFO
> org.apache.hadoop.hbase.master.AssignmentManager: Processing region
> .META.,,1.1028785192 in state RS_ZK_REGION_OPENED
> 2011-03-11 06:42:58,302 WARN
> org.apache.hadoop.hbase.master.AssignmentManager: Region in transition
> 1028785192 references a server no longer up X.X.X; letting RIT timeout so
> will be assigned elsewhere
> {code}
> We're all confused.
> Should at least clear our zk if a bootstrap happened.
[jira] [Commented] (HBASE-3638) If a FS bootstrap, need to also ensure ZK is cleaned
[ https://issues.apache.org/jira/browse/HBASE-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183925#comment-13183925 ]

Shrijeet Paliwal commented on HBASE-3638:
-----------------------------------------

Here is the relevant portion of the log. The master will always get stuck in this state, even if you restart all the HBase services across the cluster.

{noformat}
2012-01-10 21:28:03,382 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region in transition 1028785192 references a server no longer up txa-18.rfiserve.net,60020,1326125886539; letting RIT timeout so will be assigned elsewhere
2012-01-10 21:28:06,787 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: .META.,,1.1028785192 state=OPENING, ts=1326241230066
2012-01-10 21:28:06,788 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING for too long, reassigning region=.META.,,1.1028785192
2012-01-10 21:28:16,787 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: .META.,,1.1028785192 state=OPENING, ts=1326241230066
2012-01-10 21:28:16,787 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING for too long, reassigning region=.META.,,1.1028785192
2012-01-10 21:28:26,787 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: .META.,,1.1028785192 state=OPENING, ts=1326241230066
2012-01-10 21:28:26,787 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING for too long, reassigning region=.META.,,1.1028785192
2012-01-10 21:28:36,787 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: .META.,,1.1028785192 state=OPENING, ts=1326241230066
2012-01-10 21:28:36,787 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING for too long, reassigning region=.META.,,1.1028785192
2012-01-10 21:28:46,788 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: .META.,,1.1028785192 state=OPENING, ts=1326241230066
2012-01-10 21:28:46,788 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING for too long, reassigning region=.META.,,1.1028785192
2012-01-10 21:28:56,788 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: .META.,,1.1028785192 state=OPENING, ts=1326241230066
{noformat}

bq. What do you think Stack, can master pick a stale ZK state which is not a leftover from previous HBase install, in other words a stale state created by itself?

By this I was referring to the comment Todd made in the related jira, where he said:

bq. Notably, it wasn't clearing ZK between runs. So some leftover RIT data from a previous HBase incarnation may be confusing this one's master.

He floated one possibility: leftover RIT data from a previous incarnation. I am wondering what other possibilities there are.
[jira] [Commented] (HBASE-3638) If a FS bootstrap, need to also ensure ZK is cleaned
[ https://issues.apache.org/jira/browse/HBASE-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183875#comment-13183875 ]

Shrijeet Paliwal commented on HBASE-3638:
-----------------------------------------

We just hit this issue today in production. We did not do an FS bootstrap (by FS bootstrap I assume you mean cleaning the /hbase directory from hdfs).

It was a regular day. An RS was throwing "not serving" exceptions, so I went ahead and restarted it. It was not a META- or ROOT-serving RS. Following this RS restart, hbck started reporting holes in regions.

Later, for some inexplicable, crazy and panicky reason, I restarted the master and all the other region servers. This is the point where the master started complaining that META is in OPENED state in ZK for a server which no longer exists. And, as Todd explained in the other Jira, the master went into an unending loop.

The workaround was to clear all files from the ZK data directory.

What do you think, Stack: can the master pick up a *stale* ZK state which is not a leftover from a previous HBase install, in other words a stale state created by itself?
[jira] [Commented] (HBASE-5041) Major compaction on non existing table does not throw error
[ https://issues.apache.org/jira/browse/HBASE-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179907#comment-13179907 ]

Shrijeet Paliwal commented on HBASE-5041:
-----------------------------------------

{code}
mvn clean compile test -Dtest=TestReplication
{code}

The above passes without error for branch 0.90 on my dev machine.

-Shrijeet

> Major compaction on non existing table does not throw error
> -----------------------------------------------------------
>
>                 Key: HBASE-5041
>                 URL: https://issues.apache.org/jira/browse/HBASE-5041
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, shell
>    Affects Versions: 0.90.3
>            Reporter: Shrijeet Paliwal
>            Assignee: Shrijeet Paliwal
>             Fix For: 0.92.0, 0.94.0, 0.90.6
>
>         Attachments:
> 0002-HBASE-5041-Throw-error-if-table-does-not-exist.patch,
> 0002-HBASE-5041-Throw-error-if-table-does-not-exist.patch,
> 0003-HBASE-5041-Throw-error-if-table-does-not-exist.0.90.patch
>
>
> The following will not complain even if fubar does not exist:
> {code}
> echo "major_compact 'fubar'" | $HBASE_HOME/bin/hbase shell
> {code}
> The downside of this defect is that major compaction may be skipped due to
> a typo by Ops.
[jira] [Commented] (HBASE-5041) Major compaction on non existing table does not throw error
[ https://issues.apache.org/jira/browse/HBASE-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179208#comment-13179208 ]

Shrijeet Paliwal commented on HBASE-5041:
-----------------------------------------

@Ted,
For me it is failing randomly on some tests, but not on the one you mentioned. In the current test run (which is in progress as I type this), it has failed on org.apache.hadoop.hbase.TestZooKeeper and org.apache.hadoop.hbase.replication.TestMasterReplication. It passed the one you mentioned.

Interestingly, Hadoop QA reported some other failures (not the ones you or I saw).

OK. How do I resubmit the patch for Hadoop QA? I don't see a retrigger option on Jenkins.

> Major compaction on non existing table does not throw error
> -----------------------------------------------------------
>
>                 Key: HBASE-5041
>                 URL: https://issues.apache.org/jira/browse/HBASE-5041
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, shell
>    Affects Versions: 0.90.3
>            Reporter: Shrijeet Paliwal
>            Assignee: Shrijeet Paliwal
>             Fix For: 0.92.0, 0.94.0, 0.90.6
>
>         Attachments:
> 0001-HBASE-5041-Throw-error-if-table-does-not-exist.patch,
> 0002-HBASE-5041-Throw-error-if-table-does-not-exist.patch
>
>
> The following will not complain even if fubar does not exist:
> {code}
> echo "major_compact 'fubar'" | $HBASE_HOME/bin/hbase shell
> {code}
> The downside of this defect is that major compaction may be skipped due to
> a typo by Ops.
[jira] [Commented] (HBASE-5041) Major compaction on non existing table does not throw error
[ https://issues.apache.org/jira/browse/HBASE-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177849#comment-13177849 ]

Shrijeet Paliwal commented on HBASE-5041:
-----------------------------------------

I will update this Jira with a new patch after the holidays.
[jira] [Commented] (HBASE-5041) Major compaction on non existing table does not throw error
[ https://issues.apache.org/jira/browse/HBASE-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175292#comment-13175292 ]

Shrijeet Paliwal commented on HBASE-5041:
-----------------------------------------

Also, could one of you suggest whether it makes sense to explicitly declare TNFE in the throws clause of split, flush and compact? We don't strictly need it, since TNFE is a subclass of IOE, but what is the best practice?
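On the question above, Java does allow a throws clause to name a subclass next to its superclass, so listing the more specific exception only documents intent and does not affect callers that already catch IOException. A minimal sketch of that, where `TableMissingException`, `compact` and `callLikeOldCode` are hypothetical names invented for illustration and are not the HBase classes or methods:

```java
import java.io.IOException;
import java.util.Collections;
import java.util.Set;

// Hypothetical stand-in for a "table not found" exception that extends IOException.
class TableMissingException extends IOException {
    TableMissingException(String table) {
        super("Table not found: " + table);
    }
}

class ThrowsClauseSketch {
    // Naming the subclass is redundant for the compiler, but it tells callers
    // which specific failure they may want to handle separately.
    static String compact(Set<String> tables, String name)
            throws IOException, TableMissingException {
        if (!tables.contains(name)) {
            throw new TableMissingException(name);
        }
        return "compaction requested for " + name;
    }

    // A caller written against IOException alone still compiles and still
    // catches the more specific exception.
    static String callLikeOldCode(Set<String> tables, String name) {
        try {
            return compact(tables, name);
        } catch (IOException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        Set<String> tables = Collections.singleton("users");
        System.out.println(callLikeOldCode(tables, "users"));
        System.out.println(callLikeOldCode(tables, "fubar"));
    }
}
```

Whether the extra declaration is worth it is a style decision; the sketch only shows that it is source-compatible.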
[jira] [Commented] (HBASE-5041) Major compaction on non existing table does not throw error
[ https://issues.apache.org/jira/browse/HBASE-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175287#comment-13175287 ]

Shrijeet Paliwal commented on HBASE-5041:
-----------------------------------------

@Stack
{quote}
I think patch is doing right thing. It's changing the contract for isRegionName but this is a private method and you are tightening what was a sloppy contract previously; it looks too like all instances of isRegionName can benefit from this tightening (is this your thought Shrijeet?).
{quote}
Yes, that is the idea.

{quote}
You might make a method that returns a String tablename for a table you know exists (else it throws the TNFE).
{quote}
Makes sense, will do.

{quote}
We are creating a new CatalogTracker instance. No one seems to be shutting it down? Is that a prob?
{quote}
I did not understand this one, Stack. The cleanupCatalogTracker call in the finally block will stop the CatalogTracker, no?
[jira] [Commented] (HBASE-5041) Major compaction on non existing table does not throw error
[ https://issues.apache.org/jira/browse/HBASE-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175264#comment-13175264 ]

Shrijeet Paliwal commented on HBASE-5041:
-----------------------------------------

@Ted, I will add a unit test and upload a new patch on top of trunk.

@Ram, thanks for commenting. Do you mean to say isRegionName should throw an exception? I wanted to keep the semantics the same as before: it tells whether the name argument 'appears' to be a region name or not. When MetaReader.getRegion returns null, we know one thing for sure: it is not a region. Determining whether it is a valid table is left to the caller, depending on need. Did you mean something else?
[jira] [Commented] (HBASE-5041) Major compaction on non existing table does not throw error
[ https://issues.apache.org/jira/browse/HBASE-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175184#comment-13175184 ]

Shrijeet Paliwal commented on HBASE-5041:
-----------------------------------------

Will do.
[jira] [Commented] (HBASE-5041) Major compaction on non existing table does not throw error
[ https://issues.apache.org/jira/browse/HBASE-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175154#comment-13175154 ]

Shrijeet Paliwal commented on HBASE-5041:
-----------------------------------------

One possibility is to call MetaReader.getRegion for the name and return true if the result is non-null, false if it is null.
[jira] [Commented] (HBASE-5041) Major compaction on non existing table does not throw error
[ https://issues.apache.org/jira/browse/HBASE-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175132#comment-13175132 ]

Shrijeet Paliwal commented on HBASE-5041:
-----------------------------------------

Our logic for checking whether a name is a region name or a table name is designed as follows. tl;dr: if it is not an existing table, it is assumed to be a region.

{noformat}
/**
 * @param tableNameOrRegionName Name of a table or name of a region.
 * @return True if tableNameOrRegionName is *possibly* a region
 * name else false if a verified tablename (we call {@link #tableExists(byte[])};
 * else we throw an exception.
 * @throws IOException
 */
private boolean isRegionName(final byte [] tableNameOrRegionName)
throws IOException {
  if (tableNameOrRegionName == null) {
    throw new IllegalArgumentException("Pass a table name or region name");
  }
  return !tableExists(tableNameOrRegionName);
}
{noformat}

My plan was to modify the else block of the majorCompact function to check whether the table exists and throw TableNotFoundException if it does not. But because of the name logic above, a name that is not an existing table never reaches the 'else' part: it is treated as a region, and a compaction request is registered on the assumption that it must be one.
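The trap described in this comment can be reproduced in a few lines. The following is a minimal sketch, not HBase code: the two in-memory sets stand in for the catalog, and `isRegionNameOld`, `isRegionNameStrict` and this `majorCompact` are hypothetical helpers. The strict variant follows the lookup-based idea floated elsewhere in the thread (treat a name as a region only if a region lookup finds it).

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory stand-in for the catalog; not HBase code.
class RegionNameSketch {
    final Set<String> tables = new HashSet<>(Arrays.asList("users"));
    final Set<String> regions = new HashSet<>(Arrays.asList("users,,1.abc"));

    // Mirrors the quoted logic: anything that is not an existing table
    // is assumed to be a region, including a mistyped table name.
    boolean isRegionNameOld(String name) {
        return !tables.contains(name);
    }

    // Tightened variant: a name is a region only if a lookup finds it.
    boolean isRegionNameStrict(String name) {
        return regions.contains(name);
    }

    // With the strict check, the table branch becomes reachable for unknown
    // names, so the caller can reject them instead of queuing a request.
    String majorCompact(String name) throws IOException {
        if (isRegionNameStrict(name)) {
            return "compact region " + name;
        }
        if (!tables.contains(name)) {
            throw new IOException("Table not found: " + name);
        }
        return "compact table " + name;
    }

    public static void main(String[] args) throws IOException {
        RegionNameSketch s = new RegionNameSketch();
        System.out.println("old check says 'fubar' is a region: " + s.isRegionNameOld("fubar"));
        System.out.println("strict check says 'fubar' is a region: " + s.isRegionNameStrict("fubar"));
        System.out.println(s.majorCompact("users"));
    }
}
```

The sketch throws a plain IOException to stay self-contained; the thread discusses TableNotFoundException for the same role.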
[jira] [Commented] (HBASE-5035) Runtime exceptions during meta scan
[ https://issues.apache.org/jira/browse/HBASE-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173780#comment-13173780 ]

Shrijeet Paliwal commented on HBASE-5035:
-----------------------------------------

Hmm, you might be right.

{noformat}
final String serverAddress = Bytes.toString(value);

// instantiate the location
HRegionLocation loc = new HRegionLocation(regionInfo,
    new HServerAddress(serverAddress));
{noformat}

The Bytes.toString call may, in theory, return either an empty string or null. In the case when it returns null (see below), it tries to log an error, which I did not see in my log file. So I am still not 100% sure this is our guy.

{noformat}
try {
  return new String(b, off, len, HConstants.UTF8_ENCODING);
} catch (UnsupportedEncodingException e) {
  LOG.error("UTF-8 not supported?", e);
  return null;
}
{noformat}

Nonetheless, it will be good to check the serverAddress variable for emptiness as well as for null, since the HServerAddress constructor may throw a runtime error otherwise. The interesting point is that it can throw both ArrayIndexOutOfBoundsException and NPE, and I saw both cases.

{noformat}
/**
 * @param hostAndPort Hostname and port formatted as <hostname> ':' <port>
 */
public HServerAddress(String hostAndPort) {
  int colonIndex = hostAndPort.lastIndexOf(':');
{noformat}

I will open a subtask to make the trace more helpful.

> Runtime exceptions during meta scan
> -----------------------------------
>
>                 Key: HBASE-5035
>                 URL: https://issues.apache.org/jira/browse/HBASE-5035
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.3
>            Reporter: Shrijeet Paliwal
>
> Version: 0.90.3 + patches back ported
> The other day our client started spitting these two runtime exceptions. Not
> all clients connected to the cluster were impacted, only 4 of them. While
> 3 of them were throwing NPE, one of them was throwing
> ArrayIndexOutOfBoundsException. The errors are:
> 1. http://pastie.org/2987926
> 2. http://pastie.org/2987927
> Clients did not recover from this and I had to restart them.
> The motive of this jira is to identify and put null checks at appropriate places.
> Also, with the given stack trace I cannot tell which line caused the NPE or
> AIOBE, hence an additional motive is to make the trace more helpful.
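The null-and-empty check suggested in this comment could look roughly like the following. This is a minimal sketch under stated assumptions: `isUsable` and `locate` are hypothetical helpers written for illustration, the "host:port" validation rules are my own, and nothing here is the HBase `HServerAddress` or `HRegionLocation` code.

```java
// Hypothetical guard for a "host:port" value read from a catalog row; not HBase code.
class ServerAddressGuardSketch {
    // True only if the value has a non-empty host, a colon, and a numeric port.
    static boolean isUsable(String hostAndPort) {
        if (hostAndPort == null || hostAndPort.isEmpty()) {
            return false;
        }
        int colon = hostAndPort.lastIndexOf(':');
        if (colon <= 0 || colon == hostAndPort.length() - 1) {
            return false; // no colon, empty host, or empty port
        }
        for (int i = colon + 1; i < hostAndPort.length(); i++) {
            if (!Character.isDigit(hostAndPort.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Returns a printable location, or null so the caller can skip the row
    // instead of failing with a runtime exception further down.
    static String locate(String regionName, String hostAndPort) {
        if (!isUsable(hostAndPort)) {
            return null;
        }
        return regionName + " @ " + hostAndPort;
    }

    public static void main(String[] args) {
        System.out.println(locate("r1", "host1:60020"));
        System.out.println(locate("r2", ""));
        System.out.println(locate("r3", null));
    }
}
```

Skipping the row is only one possible policy; logging the offending value before skipping would also serve the stated goal of making the failure easier to trace.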
[jira] [Commented] (HBASE-5035) Runtime exceptions during meta scan
[ https://issues.apache.org/jira/browse/HBASE-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173697#comment-13173697 ]

Shrijeet Paliwal commented on HBASE-5035:
-----------------------------------------

Ted, you had mentioned the following in the email thread:
"Null check for regionInfo should be added"

I could not work out why regionInfo could possibly be null. The call 'Writables.getHRegionInfo(value);' does not seem to ever return null. Could you please tell me your reasoning?

Meanwhile, I am still reading the code and trying to find the place where the NPE might occur.
[jira] [Commented] (HBASE-5035) Runtime exceptions during meta scan
[ https://issues.apache.org/jira/browse/HBASE-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169955#comment-13169955 ]

Shrijeet Paliwal commented on HBASE-5035:
-----------------------------------------

Here is the patched HCM, which can be used to match line numbers: https://gist.github.com/1478070
[jira] [Commented] (HBASE-4980) Null pointer exception in HBaseClient receiveResponse
[ https://issues.apache.org/jira/browse/HBASE-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165406#comment-13165406 ] Shrijeet Paliwal commented on HBASE-4980: - Done attaching. Should I click cancel patch and then click submit patch again? > Null pointer exception in HBaseClient receiveResponse > - > > Key: HBASE-4980 > URL: https://issues.apache.org/jira/browse/HBASE-4980 > Project: HBase > Issue Type: Bug > Components: client >Affects Versions: 0.92.0 >Reporter: Shrijeet Paliwal > Labels: newbie > Attachments: > 0001-HBASE-4980-Fix-NPE-in-HBaseClient-receiveResponse.patch, > 0002-HBASE-4980-Fix-NPE-in-HBaseClient-receiveResponse.patch, > 0003-HBASE-4980-Fix-NPE-in-HBaseClient-receiveResponse.patch > > > Relevant Stack trace: > 2011-11-30 13:10:26,557 [IPC Client (47) connection to > xx.xx.xx/172.22.4.68:60020 from an unknown user] WARN > org.apache.hadoop.ipc.HBaseClient - Unexpected exception receiving call > responses > java.lang.NullPointerException > at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:583) > at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:511) > {code} > if (LOG.isDebugEnabled()) > LOG.debug(getName() + " got value #" + id); > Call call = calls.remove(id); > // Read the flag byte > byte flag = in.readByte(); > boolean isError = ResponseFlag.isError(flag); > if (ResponseFlag.isLength(flag)) { > // Currently length if present is unused. > in.readInt(); > } > int state = in.readInt(); // Read the state. Currently unused. > if (isError) { > //noinspection ThrowableInstanceNeverThrown > call.setException(new RemoteException( WritableUtils.readString(in), > WritableUtils.readString(in))); > } else { > {code} > This line {code}Call call = calls.remove(id);{code} may return a null > 'call'. 
This happens because, if you have the RPC timeout enabled, we proactively clean > up other calls whose lifetime has expired, along with the call for > which the socket timeout exception happened.
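The null 'call' described above could be guarded against with an explicit check before the call is used. The following is only a minimal sketch of that idea and not necessarily what the attached patches do: NullCallGuardSketch and PendingCall are hypothetical stand-ins for the real HBaseClient classes, and the stream handling is left out. A real fix would presumably still need to read the rest of the response from the stream so that the connection stays in sync.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the client connection; not the real HBaseClient.
class NullCallGuardSketch {

    // Hypothetical stand-in for a pending RPC call.
    static final class PendingCall {
        private volatile String error;

        void setException(String message) {
            this.error = message;
        }

        String getError() {
            return error;
        }
    }

    private final Map<Integer, PendingCall> calls = new ConcurrentHashMap<>();

    void register(int id, PendingCall call) {
        calls.put(id, call);
    }

    // Simulates the timeout cleanup dropping a call before its response arrives.
    void expire(int id) {
        calls.remove(id);
    }

    // Returns false if the call was already cleaned up, true if it was handled.
    boolean receiveResponse(int id, boolean isError, String errorMessage) {
        PendingCall call = calls.remove(id);
        if (call == null) {
            // The call was removed earlier (for example by the timeout cleanup),
            // so there is nothing left to complete.
            return false;
        }
        if (isError) {
            call.setException(errorMessage);
        }
        return true;
    }

    public static void main(String[] args) {
        NullCallGuardSketch client = new NullCallGuardSketch();
        client.register(1, new PendingCall());
        client.register(2, new PendingCall());
        client.expire(2); // call 2 timed out before its response was read

        System.out.println(client.receiveResponse(1, true, "remote failure"));
        System.out.println(client.receiveResponse(2, true, "remote failure"));
    }
}
```

The second receiveResponse call hits the guard instead of dereferencing a null call.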
[jira] [Commented] (HBASE-4633) Potential memory leak in client RPC timeout mechanism
[ https://issues.apache.org/jira/browse/HBASE-4633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162475#comment-13162475 ] Shrijeet Paliwal commented on HBASE-4633: - Recent updates: * In my case the leak/memory hold does not seem to be in the HBase client; I could not find enough evidence to conclude that it is. What I did find is that our application holds one heavy object in memory. This object is shared between threads. Every N minutes the application creates a new instance of this class. Unless some thread is still holding on to an old instance, all old instances are GCed in time. Hence, in theory, at any time there should be only one active instance of the heavy object. * Under heavy load and with the client operation RPC timeout enabled, some threads get stuck. This causes multiple instances of the heavy object to stay alive, and in turn the heap grows. After reading the client code multiple times I cannot work out why an application thread would get stuck for several minutes. We have safeguards to clean up calls 'forcefully' if they have been alive for more than the RPC timeout interval. I had planned to update the title of the Jira to reflect the above finding, but Gaojinchao observed something interesting at his end, so I am keeping the title the same for now. Gaojinchao's thread is here: http://search-hadoop.com/m/teczL8KvcH > Potential memory leak in client RPC timeout mechanism > - > > Key: HBASE-4633 > URL: https://issues.apache.org/jira/browse/HBASE-4633 > Project: HBase > Issue Type: Bug > Components: client >Affects Versions: 0.90.3 > Environment: HBase version: 0.90.3 + Patches, Hadoop version: CDH3u0 >Reporter: Shrijeet Paliwal > > Relevant Jiras: https://issues.apache.org/jira/browse/HBASE-2937, > https://issues.apache.org/jira/browse/HBASE-4003 > We have been using the 'hbase.client.operation.timeout' knob > introduced in 2937 for quite some time now. It helps us enforce our SLA. 
> We have two HBase clusters and two HBase client clusters. 
One of them > is much busier than the other. > We have seen deterministic behavior in clients running in the busy > cluster. Their (the clients') memory footprint increases consistently > after they have been up for roughly 24 hours. > This memory footprint almost doubles from its usual value (usual case > == RPC timeout disabled). After much investigation nothing concrete > came out and we had to put in a hack > which keeps heap size in control even when RPC timeout is enabled. Also > note, the same behavior is not observed in the 'not so busy' > cluster. > The patch is here: https://gist.github.com/1288023
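The comment above mentions safeguards that forcefully clean up calls which have been alive longer than the RPC timeout interval. As a rough illustration of that kind of cleanup, and not the actual HBase client code, here is a minimal sketch. CallExpirySketch and its methods are hypothetical names, and the current time is passed in explicitly so the behavior is easy to follow.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical tracker of pending calls; not part of the HBase client.
class CallExpirySketch {
    // Maps call id to the time (in milliseconds) at which the call started.
    private final Map<Integer, Long> startTimes = new ConcurrentHashMap<>();
    private final long rpcTimeoutMs;

    CallExpirySketch(long rpcTimeoutMs) {
        this.rpcTimeoutMs = rpcTimeoutMs;
    }

    void start(int id, long nowMs) {
        startTimes.put(id, nowMs);
    }

    // Removes every call that has been alive for at least the timeout
    // and returns the ids that were dropped.
    List<Integer> cleanupExpired(long nowMs) {
        List<Integer> expired = new ArrayList<>();
        Iterator<Map.Entry<Integer, Long>> it = startTimes.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Integer, Long> entry = it.next();
            if (nowMs - entry.getValue() >= rpcTimeoutMs) {
                expired.add(entry.getKey());
                it.remove();
            }
        }
        return expired;
    }

    int pending() {
        return startTimes.size();
    }

    public static void main(String[] args) {
        CallExpirySketch tracker = new CallExpirySketch(1000);
        tracker.start(1, 0);
        tracker.start(2, 900);
        // At t=1000 ms call 1 has reached the timeout, call 2 has not.
        System.out.println(tracker.cleanupExpired(1000));
        System.out.println(tracker.pending());
    }
}
```

With a cleanup like this in place, a response that arrives for an already expired call finds no matching entry, which is why the receiving side has to tolerate a missing call.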
[jira] [Commented] (HBASE-4633) Potential memory leak in client RPC timeout mechanism
[ https://issues.apache.org/jira/browse/HBASE-4633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131786#comment-13131786 ] Shrijeet Paliwal commented on HBASE-4633: - @Stack No, we did not run with that flag. Also, we never got to a point where the application had to die because of OOM. The reasons (I guess) are: # We have GC flags to do garbage collection as fast as possible. # The monitoring in place starts sending out alerts and we usually shoot the server in the head before it OOMs. # The load balancer kicks in and stops sending work to the application server once it realizes the server is in a bad state. As mentioned earlier, I have found it hard to reproduce in a dev environment, having failed to simulate production-like load. But I must try again when. > Potential memory leak in client RPC timeout mechanism > - > > Key: HBASE-4633 > URL: https://issues.apache.org/jira/browse/HBASE-4633 > Project: HBase > Issue Type: Bug > Components: client >Affects Versions: 0.90.3 > Environment: HBase version: 0.90.3 + Patches, Hadoop version: CDH3u0 >Reporter: Shrijeet Paliwal > > Relevant Jiras: https://issues.apache.org/jira/browse/HBASE-2937, > https://issues.apache.org/jira/browse/HBASE-4003 > We have been using the 'hbase.client.operation.timeout' knob > introduced in 2937 for quite some time now. It helps us enforce our SLA. > We have two HBase clusters and two HBase client clusters. One of them > is much busier than the other. > We have seen deterministic behavior in clients running in the busy > cluster. Their (the clients') memory footprint increases consistently > after they have been up for roughly 24 hours. > This memory footprint almost doubles from its usual value (usual case > == RPC timeout disabled). After much investigation nothing concrete > came out and we had to put in a hack > which keeps heap size in control even when RPC timeout is enabled. 
Also > note, the same behavior is not observed in the 'not so busy' > cluster. 
> The patch is here: https://gist.github.com/1288023
[jira] [Commented] (HBASE-4633) Potential memory leak in client RPC timeout mechanism
[ https://issues.apache.org/jira/browse/HBASE-4633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131677#comment-13131677 ] Shrijeet Paliwal commented on HBASE-4633: - @Liyin, are you using RPC timeouts for client operations? bq. But Not sure the leak comes from HBase Client jar itself or just our client code. In the absence of concrete evidence that the leak is indeed in the HBase client jar, I have a similar feeling. It could be in our application layer. bq. Our symptom is that the memory footprint will increase as time. But the actual heap size of the client is not increasing. We observe the used memory using a collectd plugin: http://collectd.org/wiki/index.php/Plugin:Memory bq. So I am very interested to know when you have keep the heap size in control, is the memory leaking solved ? We run with max and min memory set as -Xmx2{X}G -Xms{X}G. When the 'leak' happens, the plugin shows the used memory touching the 2X value, so it does seem the heap size is increasing. Correct me if I am mistaken here. Let me know if you need more input. > Potential memory leak in client RPC timeout mechanism > - > > Key: HBASE-4633 > URL: https://issues.apache.org/jira/browse/HBASE-4633 > Project: HBase > Issue Type: Bug > Components: client >Affects Versions: 0.90.3 > Environment: HBase version: 0.90.3 + Patches, Hadoop version: CDH3u0 >Reporter: Shrijeet Paliwal > > Relevant Jiras: https://issues.apache.org/jira/browse/HBASE-2937, > https://issues.apache.org/jira/browse/HBASE-4003 > We have been using the 'hbase.client.operation.timeout' knob > introduced in 2937 for quite some time now. It helps us enforce our SLA. > We have two HBase clusters and two HBase client clusters. One of them > is much busier than the other. > We have seen deterministic behavior in clients running in the busy > cluster. Their (the clients') memory footprint increases consistently > after they have been up for roughly 24 hours. 
> This memory footprint almost doubles from its usual value (usual case > == RPC timeout disabled). After much investigation nothing concrete > came out and we had to put in a hack > which keeps heap size in control even when RPC timeout is enabled. Also > note, the same behavior is not observed in the 'not so busy' > cluster. > The patch is here: https://gist.github.com/1288023
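The exchange above turns on whether the memory reported at the OS level reflects growth of the Java heap itself. One way to check from inside the JVM is the standard java.lang.management API, which reports used, committed and maximum heap separately. The sketch below is only an illustration: HeapUsageSketch is a hypothetical helper, not something from the thread or from HBase. When -Xms is lower than -Xmx, the committed size can grow toward the maximum even if the amount of live data stays stable, so comparing the three values over time may help separate the two cases.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper for logging heap figures; not part of HBase.
class HeapUsageSketch {

    // Returns {used, committed, max} heap sizes in bytes; max is -1 if undefined.
    static long[] heapSnapshot() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        return new long[] { heap.getUsed(), heap.getCommitted(), heap.getMax() };
    }

    public static void main(String[] args) {
        long[] snapshot = heapSnapshot();
        long mb = 1024L * 1024L;
        System.out.println("heap used (MB):      " + snapshot[0] / mb);
        System.out.println("heap committed (MB): " + snapshot[1] / mb);
        // Max may be reported as -1 when the JVM does not define a limit.
        System.out.println("heap max (bytes):    " + snapshot[2]);
    }
}
```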