[jira] [Commented] (HBASE-5846) HBase rpm packing is broken at multiple places

2012-04-20 Thread Shrijeet Paliwal (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258467#comment-13258467
 ] 

Shrijeet Paliwal commented on HBASE-5846:
-

Here is what happens when one runs an update: 

{noformat}
D:   install: %post(hbase-0.92.1-2.x86_64) synchronous scriptlet start
D:   install: %post(hbase-0.92.1-2.x86_64)  execv(/bin/sh) pid 26772
+ /usr/share/hbase/sbin/update-hbase-env.sh --prefix=/usr --bin-dir=/usr/bin 
--conf-dir=/etc/hbase --log-dir=/var/log/hbase --pid-dir=/var/run/hbase
D:   install: waitpid(26772) rc 26772 status 0 secs 0.038
D: == --- hbase-0.92.1-1 x86_64-linux 0x0
D: erase: hbase-0.92.1-1 has 224 files, test = 0
D: erase: %preun(hbase-0.92.1-1.x86_64) asynchronous scriptlet start
D: erase: %preun(hbase-0.92.1-1.x86_64) execv(/bin/sh) pid 26819
+ /usr/share/hbase/sbin/update-hbase-env.sh --prefix=/usr --bin-dir=/usr/bin 
--conf-dir=/etc/hbase --log-dir=/var/log/hbase --pid-dir=/var/run/hbase 
--uninstall
{noformat}

This is the output of rpm -Uvv. Note how the new package's %post runs first, 
followed by the old package's %preun. The %preun then erases all the work that 
was done by %post.

> HBase rpm packing is broken at multiple places
> --
>
> Key: HBASE-5846
> URL: https://issues.apache.org/jira/browse/HBASE-5846
> Project: HBase
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.92.1
> Environment: CentOS release 5.7 (Final)
>Reporter: Shrijeet Paliwal
>
> Here is how I executed rpm build: 
> {noformat}
> MAVEN_OPTS="-Xmx2g" mvn clean package assembly:single -Prpm -DskipTests
> {noformat}
> The issues with the rpm build are: 
> * There is no %clean section in the hbase.spec file. A previous run can 
> leave files in RPM_BUILD_ROOT, which in turn fails the build. As a fix I added 
> 'rm -rf $RPM_BUILD_ROOT' to the %clean section.
> * The BuildRoot is set to _build_dir. The build fails with this error: 
> {noformat}
> cp: cannot copy a directory, 
> `/data/9adda425-1f1e-4fe5-8a53-83bd2ce5ad45/app/jenkins/workspace/hbase.92/target/rpm/hbase/BUILD',
>  into itself, 
> `/data/9adda425-1f1e-4fe5-8a53-83bd2ce5ad45/app/jenkins/workspace/hbase.92/target/rpm/hbase/BUILD/BUILD'
> {noformat}
> If we set it to '%{_tmppath}/%{name}-%{version}-root', the build passes.
> * The src/packages/update-hbase-env.sh script leaves an inconsistent state 
> if 'yum update hbase' is executed. It deletes the /etc/init.d/hbase* scripts 
> and does not put them back during the update. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-3638) If a FS bootstrap, need to also ensure ZK is cleaned

2012-01-10 Thread Shrijeet Paliwal (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183929#comment-13183929
 ] 

Shrijeet Paliwal commented on HBASE-3638:
-

To avoid ambiguity, I must add that the log I pasted is from a time when the 
master was initializing. 

> If a FS bootstrap, need to also ensure ZK is cleaned
> 
>
> Key: HBASE-3638
> URL: https://issues.apache.org/jira/browse/HBASE-3638
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Priority: Minor
>
> In a test environment running a cycle of start, operation, kill hbase (repeat), 
> I noticed that we were doing a bootstrap on startup but then picking up 
> the previous cycle's zk state.  It made for a mess in the test.
> Last thing seen on previous cycle was:
> {code}
> 2011-03-11 06:33:36,708 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_OPENING, server=X.X.X.60020,1299853933073, 
> region=1028785192/.META.
> {code}
> Then, in the messed up cycle I saw:
> {code}
> 2011-03-11 06:42:48,530 INFO org.apache.hadoop.hbase.master.MasterFileSystem: 
> BOOTSTRAP: creating ROOT and first META regions
> .
> {code}
> Then after setting watcher on .META., we get a 
> {code}
> 2011-03-11 06:42:58,301 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Processing region 
> .META.,,1.1028785192 in state RS_ZK_REGION_OPENED
> 2011-03-11 06:42:58,302 WARN 
> org.apache.hadoop.hbase.master.AssignmentManager: Region in transition 
> 1028785192 references a server no longer up X.X.X; letting RIT timeout so 
> will be assigned elsewhere
> {code}
> We're all confused.
> Should at least clear our zk if a bootstrap happened.


[jira] [Commented] (HBASE-3638) If a FS bootstrap, need to also ensure ZK is cleaned

2012-01-10 Thread Shrijeet Paliwal (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183925#comment-13183925
 ] 

Shrijeet Paliwal commented on HBASE-3638:
-

Here is the relevant portion of the log. 

The master always gets stuck in this state, even if you restart all the HBase 
services across the cluster. 
{noformat}
2012-01-10 21:28:03,382 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
Region in transition 1028785192 references a server no longer up 
txa-18.rfiserve.net,60020,1326125886539; letting RIT timeout so will be 
assigned elsewhere
2012-01-10 21:28:06,787 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
Regions in transition timed out:  .META.,,1.1028785192 state=OPENING, 
ts=1326241230066
2012-01-10 21:28:06,788 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
Region has been OPENING for too long, reassigning region=.META.,,1.1028785192
2012-01-10 21:28:16,787 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
Regions in transition timed out:  .META.,,1.1028785192 state=OPENING, 
ts=1326241230066
2012-01-10 21:28:16,787 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
Region has been OPENING for too long, reassigning region=.META.,,1.1028785192
2012-01-10 21:28:26,787 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
Regions in transition timed out:  .META.,,1.1028785192 state=OPENING, 
ts=1326241230066
2012-01-10 21:28:26,787 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
Region has been OPENING for too long, reassigning region=.META.,,1.1028785192
2012-01-10 21:28:36,787 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
Regions in transition timed out:  .META.,,1.1028785192 state=OPENING, 
ts=1326241230066
2012-01-10 21:28:36,787 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
Region has been OPENING for too long, reassigning region=.META.,,1.1028785192
2012-01-10 21:28:46,788 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
Regions in transition timed out:  .META.,,1.1028785192 state=OPENING, 
ts=1326241230066
2012-01-10 21:28:46,788 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
Region has been OPENING for too long, reassigning region=.META.,,1.1028785192
2012-01-10 21:28:56,788 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
Regions in transition timed out:  .META.,,1.1028785192 state=OPENING, 
ts=1326241230066
{noformat}


bq. What do you think Stack, can master pick a stale ZK state which is not a 
leftover from previous HBase install, in other words a stale state created by 
itself?

By this I was referring to the comment Todd made in the related Jira, where he 
said:

bq. Notably, it wasn't clearing ZK between runs. So some leftover RIT data from 
a previous HBase incarnation may be confusing this one's master.

He floated one possibility: leftover RIT data from a previous incarnation. I am 
wondering what other possibilities there are. 



[jira] [Commented] (HBASE-3638) If a FS bootstrap, need to also ensure ZK is cleaned

2012-01-10 Thread Shrijeet Paliwal (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183875#comment-13183875
 ] 

Shrijeet Paliwal commented on HBASE-3638:
-

We just hit this issue today in production. We did not do an FS bootstrap (I 
assume that by FS bootstrap you mean cleaning the /hbase directory from HDFS). 
It was a regular day: an RS was throwing not-serving exceptions, so I went ahead 
and restarted it. It was not an RS serving META or ROOT. Following this RS 
restart, hbck started reporting holes in regions. 

Later, for some inexplicable, crazy and panicky reason, I restarted the Master 
and all the other region servers. This is the point where the master started 
complaining that META is in the OPENED state in ZK for a server which no longer 
exists. And, as Todd explained in the other Jira, the master went into an 
unending loop. 

The workaround was to clear all files from the ZK data directory. 

What do you think, Stack: can the master pick up a *stale* ZK state which is not 
a leftover from a previous HBase install, in other words a stale state created 
by itself?



[jira] [Commented] (HBASE-5041) Major compaction on non existing table does not throw error

2012-01-04 Thread Shrijeet Paliwal (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179907#comment-13179907
 ] 

Shrijeet Paliwal commented on HBASE-5041:
-

{code}
 mvn clean compile test -Dtest=TestReplication
{code}

The above passes without error on branch 0.90 on my dev machine. 

-Shrijeet

> Major compaction on non existing table does not throw error 
> 
>
> Key: HBASE-5041
> URL: https://issues.apache.org/jira/browse/HBASE-5041
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, shell
>Affects Versions: 0.90.3
>Reporter: Shrijeet Paliwal
>Assignee: Shrijeet Paliwal
> Fix For: 0.92.0, 0.94.0, 0.90.6
>
> Attachments: 
> 0002-HBASE-5041-Throw-error-if-table-does-not-exist.patch, 
> 0002-HBASE-5041-Throw-error-if-table-does-not-exist.patch, 
> 0003-HBASE-5041-Throw-error-if-table-does-not-exist.0.90.patch
>
>
> Following will not complain even if fubar does not exist
> {code}
> echo "major_compact 'fubar'" | $HBASE_HOME/bin/hbase shell
> {code}
> The downside for this defect is that major compaction may be skipped due to
> a typo by Ops.


[jira] [Commented] (HBASE-5041) Major compaction on non existing table does not throw error

2012-01-03 Thread Shrijeet Paliwal (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179208#comment-13179208
 ] 

Shrijeet Paliwal commented on HBASE-5041:
-

@Ted,
For me it fails randomly on some tests, but not on the one you mentioned. In 
the current test run (which is going on as I type this), it has failed on 
org.apache.hadoop.hbase.TestZooKeeper and 
org.apache.hadoop.hbase.replication.TestMasterReplication. It passed the one 
you mentioned. Interestingly, Hadoop QA reported some other failures (not the 
ones you or I saw). 

OK. How do I resubmit the patch for Hadoop QA? I don't see a retrigger option 
on Jenkins. 

> Major compaction on non existing table does not throw error 
> 
>
> Key: HBASE-5041
> URL: https://issues.apache.org/jira/browse/HBASE-5041
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, shell
>Affects Versions: 0.90.3
>Reporter: Shrijeet Paliwal
>Assignee: Shrijeet Paliwal
> Fix For: 0.92.0, 0.94.0, 0.90.6
>
> Attachments: 
> 0001-HBASE-5041-Throw-error-if-table-does-not-exist.patch, 
> 0002-HBASE-5041-Throw-error-if-table-does-not-exist.patch
>
>
> Following will not complain even if fubar does not exist
> {code}
> echo "major_compact 'fubar'" | $HBASE_HOME/bin/hbase shell
> {code}
> The downside for this defect is that major compaction may be skipped due to
> a typo by Ops.


[jira] [Commented] (HBASE-5041) Major compaction on non existing table does not throw error

2011-12-30 Thread Shrijeet Paliwal (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177849#comment-13177849
 ] 

Shrijeet Paliwal commented on HBASE-5041:
-

I will update this Jira with a new patch after the holidays.

> Major compaction on non existing table does not throw error 
> 
>
> Key: HBASE-5041
> URL: https://issues.apache.org/jira/browse/HBASE-5041
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, shell
>Affects Versions: 0.90.3
>Reporter: Shrijeet Paliwal
>Assignee: Shrijeet Paliwal
> Fix For: 0.92.0, 0.94.0, 0.90.6
>
> Attachments: 0001-HBASE-5041-Throw-error-if-table-does-not-exist.patch
>
>
> Following will not complain even if fubar does not exist
> {code}
> echo "major_compact 'fubar'" | $HBASE_HOME/bin/hbase shell
> {code}
> The downside for this defect is that major compaction may be skipped due to
> a typo by Ops.


[jira] [Commented] (HBASE-5041) Major compaction on non existing table does not throw error

2011-12-22 Thread Shrijeet Paliwal (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175292#comment-13175292
 ] 

Shrijeet Paliwal commented on HBASE-5041:
-

Also, could one of you suggest whether it makes sense to explicitly declare 
TNFE in the throws clause of the split, flush and compact methods? We don't 
strictly need it, since TNFE is a subclass of IOE, but what is the best practice? 
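
For reference, a minimal sketch of what the question is about: in Java, listing a subclass next to its superclass in a throws clause is legal, if redundant for the compiler. The classes below are hypothetical stand-ins, not the HBase ones. TableNotFoundException here only mirrors the TNFE/IOE relationship mentioned above, and the boolean tableExists parameter is an assumption made to keep the example self-contained.

```java
import java.io.IOException;

// Hypothetical stand-in for HBase's TableNotFoundException (TNFE),
// described above as a subclass of IOException (IOE).
class TableNotFoundException extends IOException {
  TableNotFoundException(String message) {
    super(message);
  }
}

class ThrowsClauseSketch {
  // Listing TNFE next to IOException is redundant for the compiler,
  // but it documents the failure mode for readers and for javadoc.
  static void compact(String tableName, boolean tableExists)
      throws TableNotFoundException, IOException {
    if (!tableExists) {
      throw new TableNotFoundException(tableName);
    }
    // The real compaction request would be issued here.
  }

  // Callers can catch the subclass first and treat it differently.
  static String tryCompact(String tableName, boolean tableExists) {
    try {
      compact(tableName, tableExists);
      return "requested";
    } catch (TableNotFoundException e) {
      return "no such table: " + e.getMessage();
    } catch (IOException e) {
      return "io error";
    }
  }
}
```

Either way the caller can already catch the subclass, so the explicit declaration mainly affects documentation rather than behavior.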



[jira] [Commented] (HBASE-5041) Major compaction on non existing table does not throw error

2011-12-22 Thread Shrijeet Paliwal (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175287#comment-13175287
 ] 

Shrijeet Paliwal commented on HBASE-5041:
-

@Stack
{quote}
I think patch is doing right thing. Its changing the contract for isRegionName 
but this is a private method and you are tightening what was a sloppy contract 
previous; it looks too like all instances of isRegionName can benefit from this 
tightening (is this your though Shrijeet?).
{quote}
Yes that is the idea. 

{quote}
You might make a method that returns a String tablename for a table you know 
exists (else it throws the TNFE).
{quote}
Makes sense, will do.

{quote}
We are creating a new CatalogTracker instance. No one seems to be shutting it 
down? Is that a prob?
{quote}
I did not understand this one, Stack. cleanupCatalogTracker, called in the 
finally block, will stop the CatalogTracker, no? 




[jira] [Commented] (HBASE-5041) Major compaction on non existing table does not throw error

2011-12-22 Thread Shrijeet Paliwal (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175264#comment-13175264
 ] 

Shrijeet Paliwal commented on HBASE-5041:
-

@Ted, I will add a unit test and upload a new patch on top of trunk. 

@Ram, thanks for commenting. Do you mean to say isRegionName should throw an 
exception? I wanted to keep the semantics the same as before: it tells whether 
the name argument 'appears' to be a region name or not. When 
MetaReader.getRegion returns null we know one thing for sure: it is not a 
region. Determining whether it is a valid table is left to the caller, 
depending on need.

Did you mean something else?



[jira] [Commented] (HBASE-5041) Major compaction on non existing table does not throw error

2011-12-22 Thread Shrijeet Paliwal (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175184#comment-13175184
 ] 

Shrijeet Paliwal commented on HBASE-5041:
-

Will do. 

> Major compaction on non existing table does not throw error 
> 
>
> Key: HBASE-5041
> URL: https://issues.apache.org/jira/browse/HBASE-5041
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, shell
>Affects Versions: 0.90.3
>Reporter: Shrijeet Paliwal
>
> Following will not complain even if fubar does not exist
> {code}
> echo "major_compact 'fubar'" | $HBASE_HOME/bin/hbase shell
> {code}
> The downside for this defect is that major compaction may be skipped due to
> a typo by Ops.


[jira] [Commented] (HBASE-5041) Major compaction on non existing table does not throw error

2011-12-22 Thread Shrijeet Paliwal (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175154#comment-13175154
 ] 

Shrijeet Paliwal commented on HBASE-5041:
-

One possibility is to make a call to MetaReader.getRegion for the name and 
return true/false based on a non-null/null result. 
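
A minimal sketch of that idea, assuming a lookup that returns null for an unknown region. The map-backed getRegion below is a hypothetical stand-in for MetaReader.getRegion, whose real signature is not shown in this thread, and all class and method names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

class RegionLookupSketch {
  // Hypothetical stand-in for the catalog: region name -> hosting server.
  private final Map<String, String> metaRows = new HashMap<String, String>();

  void addRegion(String regionName, String server) {
    metaRows.put(regionName, server);
  }

  // Stand-in for a MetaReader.getRegion style lookup; null means no such region.
  String getRegion(String name) {
    return metaRows.get(name);
  }

  // The name counts as a region name only if the lookup actually finds it.
  boolean isRegionName(String tableNameOrRegionName) {
    if (tableNameOrRegionName == null) {
      throw new IllegalArgumentException("Pass a table name or region name");
    }
    return getRegion(tableNameOrRegionName) != null;
  }
}
```

With this shape, a mistyped table name such as 'fubar' is no longer classified as a region, so the caller gets a chance to check whether it is a table at all.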



[jira] [Commented] (HBASE-5041) Major compaction on non existing table does not throw error

2011-12-22 Thread Shrijeet Paliwal (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175132#comment-13175132
 ] 

Shrijeet Paliwal commented on HBASE-5041:
-

Our logic to check whether the name is a region name or a table name is 
designed as follows: 
tl;dr: If it is not an existing table, it should be a region. 

{noformat}
  /**
   * @param tableNameOrRegionName Name of a table or name of a region.
   * @return True if tableNameOrRegionName is *possibly* a region
   * name else false if a verified tablename (we call {@link #tableExists(byte[])};
   * else we throw an exception.
   * @throws IOException 
   */
  private boolean isRegionName(final byte [] tableNameOrRegionName)
  throws IOException {
    if (tableNameOrRegionName == null) {
      throw new IllegalArgumentException("Pass a table name or region name");
    }
    return !tableExists(tableNameOrRegionName);
  }
{noformat}

My plan was to modify the else block of the majorCompact function to check 
whether the table exists and throw TableNotFoundException if it does not. 
But because of the name logic above, a name that is not an existing table is 
always treated as a region name, so one never reaches the 'else' part, and a 
compaction request is registered on the assumption that it must be a region. 
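
The problem can be reproduced in a small sketch. Everything here is a hypothetical stand-in (the table set, the returned strings, the class name, the plain IOException in place of TableNotFoundException); only the shape of isRegionName follows the snippet quoted above. Because a missing table makes isRegionName return true, the table check in the else branch never runs for it.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class MajorCompactSketch {
  // Hypothetical stand-in for the set of existing tables.
  private final Set<String> tables = new HashSet<String>(Arrays.asList("t1"));

  boolean tableExists(String name) {
    return tables.contains(name);
  }

  // Same shape as the quoted isRegionName: not a table, so presumably a region.
  boolean isRegionName(String tableNameOrRegionName) {
    if (tableNameOrRegionName == null) {
      throw new IllegalArgumentException("Pass a table name or region name");
    }
    return !tableExists(tableNameOrRegionName);
  }

  // Returns a description of the request that would be registered.
  String majorCompact(String tableNameOrRegionName) throws IOException {
    if (isRegionName(tableNameOrRegionName)) {
      return "region compaction: " + tableNameOrRegionName;
    } else {
      // Planned check. It cannot fire for a missing table, because a missing
      // table has already been classified as a region above.
      if (!tableExists(tableNameOrRegionName)) {
        throw new IOException("Table not found: " + tableNameOrRegionName);
      }
      return "table compaction: " + tableNameOrRegionName;
    }
  }
}
```

Calling majorCompact with a nonexistent name such as "fubar" takes the region branch instead of raising an error, which matches the behaviour reported in this issue.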



[jira] [Commented] (HBASE-5035) Runtime exceptions during meta scan

2011-12-20 Thread Shrijeet Paliwal (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173780#comment-13173780
 ] 

Shrijeet Paliwal commented on HBASE-5035:
-

Hmm, you might be right. 

{noformat}
final String serverAddress = Bytes.toString(value);

// instantiate the location
HRegionLocation loc = new HRegionLocation(regionInfo,
    new HServerAddress(serverAddress));
{noformat}

The Bytes.toString call may, in theory, return either an empty string or 
null.
In the case where it returns null (see below), it tries to log an error, which 
I didn't see in my log file. 
So I am still not 100% sure this is our guy. 
{noformat}
    try {
      return new String(b, off, len, HConstants.UTF8_ENCODING);
    } catch (UnsupportedEncodingException e) {
      LOG.error("UTF-8 not supported?", e);
      return null;
    }
{noformat}
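
As an aside, and only as a sketch rather than what the 0.90 code does: the String constructor that takes a Charset declares no UnsupportedEncodingException, so a helper built on it has no catch block that could return null. The class name and the null/empty handling below are assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;

class BytesToStringSketch {
  // Illustrative alternative, not the HBase code quoted above.
  static String toString(byte[] b, int off, int len) {
    if (b == null) {
      return null; // a null input still has to be handled by the caller
    }
    if (len == 0) {
      return "";
    }
    // The Charset overload cannot fail with UnsupportedEncodingException.
    return new String(b, off, len, StandardCharsets.UTF_8);
  }
}
```

This removes one way of getting null, but an empty or null value read from the catalog would still reach the caller.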

Nonetheless, it would be good to check the serverAddress variable for 
emptiness as well as nullness, since the HServerAddress constructor may throw a 
runtime error otherwise. The interesting point is that it can throw both 
ArrayIndexOutOfBoundsException and NPE, and I saw both cases.

{noformat}
  /**
   * @param hostAndPort Hostname and port formatted as <hostname> ':' <port>
   */
  public HServerAddress(String hostAndPort) {
    int colonIndex = hostAndPort.lastIndexOf(':');
{noformat}
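
A minimal sketch of such a guard, assuming the value is expected to look like "host:port". The helper name looksLikeHostAndPort and the class are hypothetical. The caller would skip constructing HServerAddress, or treat the location as unknown, when the check fails.

```java
class ServerAddressGuardSketch {
  // Hypothetical helper: true only when the value has a non-empty host,
  // a colon, and a non-empty port part.
  static boolean looksLikeHostAndPort(String serverAddress) {
    if (serverAddress == null || serverAddress.isEmpty()) {
      return false;
    }
    int colonIndex = serverAddress.lastIndexOf(':');
    return colonIndex > 0 && colonIndex < serverAddress.length() - 1;
  }
}
```

The check deliberately does not validate the port number itself; it only rules out the null, empty and colon-less inputs discussed above.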


I will open a subtask to make the trace more helpful. 

> Runtime exceptions during meta scan
> ---
>
> Key: HBASE-5035
> URL: https://issues.apache.org/jira/browse/HBASE-5035
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.3
>Reporter: Shrijeet Paliwal
>
> Version: 0.90.3 + patches back ported 
> The other day our client started spitting out these two runtime exceptions. Not 
> all clients connected to the cluster were affected, only 4 of them. While 
> 3 of them were throwing NPE, one of them was throwing 
> ArrayIndexOutOfBoundsException. The errors are: 
> 1. http://pastie.org/2987926
> 2. http://pastie.org/2987927
> Clients did not recover from this and I had to restart them. 
> The motive of this Jira is to identify and put null checks at appropriate places. 
> Also, with the given stack trace I cannot tell which line caused the NPE or 
> AIOBE, hence an additional motive is to make the trace more helpful. 


[jira] [Commented] (HBASE-5035) Runtime exceptions during meta scan

2011-12-20 Thread Shrijeet Paliwal (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173697#comment-13173697
 ] 

Shrijeet Paliwal commented on HBASE-5035:
-

Ted, you mentioned the following in the email thread: 

"Null check for regionInfo should be added" 

I could not work out why regionInfo could possibly be null. The call 
'Writables.getHRegionInfo(value);' does not seem to ever return null. Could you 
please tell me your reasoning? 

Meanwhile, I am still reading the code and trying to find the place where the 
NPE might occur.  



[jira] [Commented] (HBASE-5035) Runtime exceptions during meta scan

2011-12-14 Thread Shrijeet Paliwal (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169955#comment-13169955
 ] 

Shrijeet Paliwal commented on HBASE-5035:
-

Here is the patched HCM: https://gist.github.com/1478070 . It can be used to 
match the line numbers in the stack traces.  



[jira] [Commented] (HBASE-4980) Null pointer exception in HBaseClient receiveResponse

2011-12-08 Thread Shrijeet Paliwal (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165406#comment-13165406
 ] 

Shrijeet Paliwal commented on HBASE-4980:
-

Done attaching, should I click cancel patch and then click submit patch again?

> Null pointer exception in HBaseClient receiveResponse
> -
>
> Key: HBASE-4980
> URL: https://issues.apache.org/jira/browse/HBASE-4980
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.92.0
>Reporter: Shrijeet Paliwal
>  Labels: newbie
> Attachments: 
> 0001-HBASE-4980-Fix-NPE-in-HBaseClient-receiveResponse.patch, 
> 0002-HBASE-4980-Fix-NPE-in-HBaseClient-receiveResponse.patch, 
> 0003-HBASE-4980-Fix-NPE-in-HBaseClient-receiveResponse.patch
>
>
> Relevant Stack trace: 
> 2011-11-30 13:10:26,557 [IPC Client (47) connection to 
> xx.xx.xx/172.22.4.68:60020 from an unknown user] WARN  
> org.apache.hadoop.ipc.HBaseClient - Unexpected exception receiving call 
> responses
> java.lang.NullPointerException
>     at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:583)
>     at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:511)
> {code}
> if (LOG.isDebugEnabled())
>   LOG.debug(getName() + " got value #" + id);
> // 'call' may be null here if the call was already cleaned up after a timeout
> Call call = calls.remove(id);
> // Read the flag byte
> byte flag = in.readByte();
> boolean isError = ResponseFlag.isError(flag);
> if (ResponseFlag.isLength(flag)) {
>   // Currently length if present is unused.
>   in.readInt();
> }
> int state = in.readInt(); // Read the state.  Currently unused.
> if (isError) {
>   //noinspection ThrowableInstanceNeverThrown
>   call.setException(new RemoteException(WritableUtils.readString(in),
>       WritableUtils.readString(in)));
> } else {
> {code}
> The line {code}Call call = calls.remove(id);{code} may return a null 
> 'call'. This happens because, when the rpc timeout is enabled, we proactively 
> clean up other calls whose lifetime has expired, along with the call for 
> which the socket timeout exception happened.
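
The failure mode described above can be sketched in isolation. The following is a minimal, hypothetical model and not the HBase code or the attached patch: the class name, the two-field frame layout, and the helpers `register` and `frame` are assumptions made for illustration. It shows one way to tolerate a response whose call was already removed by timeout cleanup: read the whole frame first so the stream stays aligned, then skip the response if the lookup returns null.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical stand-in for a client's table of outstanding calls; not HBase code. */
final class NullSafeResponseSketch {
    /** Minimal placeholder for a pending RPC. */
    static final class Call {
        final int id;
        volatile String error;
        Call(int id) { this.id = id; }
    }

    static final ConcurrentMap<Integer, Call> calls = new ConcurrentHashMap<>();

    /** Registers a pending call, as a sender thread would before writing the request. */
    static void register(int id) {
        calls.put(id, new Call(id));
    }

    /** Builds an illustrative response frame: 4-byte call id, then a 1-byte error flag. */
    static ByteBuffer frame(int id, boolean isError) {
        ByteBuffer buf = ByteBuffer.allocate(5);
        buf.putInt(id);
        buf.put((byte) (isError ? 1 : 0));
        buf.flip();
        return buf;
    }

    /**
     * Consumes one frame. Returns true if a pending call was completed, false if the
     * call had already been removed (for example by timeout cleanup).
     */
    static boolean receiveResponse(ByteBuffer in) {
        // Read the full frame before looking up the call, so an unknown id
        // never leaves unread bytes behind.
        int id = in.getInt();
        boolean isError = in.get() != 0;

        Call call = calls.remove(id);
        if (call == null) {
            // The call expired and was cleaned up earlier; drop the late response.
            return false;
        }
        if (isError) {
            call.error = "remote error";
        }
        return true;
    }

    public static void main(String[] args) {
        register(1);
        register(2);
        calls.remove(2); // simulate the timeout cleanup removing call 2
        System.out.println(receiveResponse(frame(1, false)));
        System.out.println(receiveResponse(frame(2, true)));
    }
}
```

Whether a late response should be dropped silently or logged is a design choice; the only point of the sketch is that the null case is handled before the call is dereferenced.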


[jira] [Commented] (HBASE-4633) Potential memory leak in client RPC timeout mechanism

2011-12-04 Thread Shrijeet Paliwal (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162475#comment-13162475
 ] 

Shrijeet Paliwal commented on HBASE-4633:
-

Recent updates: 
* In my case the leak/memory-hold does not appear to be in the HBase client; I 
could not find enough evidence to conclude that it is. What I did find is that 
our application holds one heavy object in memory. This object is shared between 
threads. Every N minutes the application creates a new instance of this class. 
Unless a thread is still holding on to an old instance, all old instances are 
GCed in time. Hence in theory there should be only one active instance of the 
heavy object at any time. 

* Under heavy load with the client operation RPC timeout enabled, some threads 
get stuck. Each stuck thread keeps an old instance reachable, so multiple 
instances of the heavy object stay alive and the heap grows. 

After reading the client code multiple times I cannot work out why an 
application thread would get stuck for several minutes. We have safeguards to 
clean up calls 'forcefully' if they have been alive for longer than the rpc 
timeout interval. 

I had planned to update the title of this Jira to reflect the above finding, 
but Gaojinchao observed something interesting at his end, so I am keeping the 
title the same for now. Gaojinchao's thread is here: 
http://search-hadoop.com/m/teczL8KvcH
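
The retention pattern in the two bullets above can be modeled with a short, single-threaded sketch. Everything in it is hypothetical (the `Heavy` class, the `simulate` method and its parameters); it is not the application's code and it does not involve the HBase client. It only counts how many instances remain strongly reachable when some holders never release their reference across refreshes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical model of a periodically refreshed shared object; not application code. */
final class HeavyObjectRetentionSketch {
    /** Placeholder for the large shared object. */
    static final class Heavy {
        final int generation;
        Heavy(int generation) { this.generation = generation; }
    }

    /**
     * Simulates 'refreshes' replacement cycles. In each of the first 'stuckWorkers'
     * rounds, one worker picks up the current instance and then never lets go of it,
     * as a thread stuck in a long call would. Returns the number of distinct
     * instances that are still strongly reachable at the end.
     */
    static int simulate(int stuckWorkers, int refreshes) {
        AtomicReference<Heavy> current = new AtomicReference<>(new Heavy(0));
        List<Heavy> heldByStuckWorkers = new ArrayList<>();

        for (int round = 0; round < refreshes; round++) {
            if (round < stuckWorkers) {
                // This worker grabs the shared instance and gets stuck holding it.
                heldByStuckWorkers.add(current.get());
            }
            // The periodic refresh publishes a brand new instance.
            current.set(new Heavy(round + 1));
        }

        // Count by identity: the current instance plus everything a stuck worker pins.
        Set<Heavy> reachable =
                Collections.newSetFromMap(new IdentityHashMap<Heavy, Boolean>());
        reachable.add(current.get());
        reachable.addAll(heldByStuckWorkers);
        return reachable.size();
    }

    public static void main(String[] args) {
        System.out.println("no stuck workers:    " + simulate(0, 5));
        System.out.println("three stuck workers: " + simulate(3, 5));
    }
}
```

With no stuck workers only the current instance remains reachable; each worker that got stuck in a different refresh interval pins one additional old instance, which matches the heap growth described above.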


> Potential memory leak in client RPC timeout mechanism
> -
>
> Key: HBASE-4633
> URL: https://issues.apache.org/jira/browse/HBASE-4633
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.3
> Environment: HBase version: 0.90.3 + Patches , Hadoop version: CDH3u0
>Reporter: Shrijeet Paliwal
>
> Relevant Jiras: https://issues.apache.org/jira/browse/HBASE-2937,
> https://issues.apache.org/jira/browse/HBASE-4003
> We have been using the 'hbase.client.operation.timeout' knob
> introduced in 2937 for quite some time now. It helps us enforce SLA.
> We have two HBase clusters and two HBase client clusters. One of them
> is much busier than the other.
> We have seen deterministic behavior in clients running in the busy
> cluster. Their memory footprint increases consistently
> after they have been up for roughly 24 hours.
> This memory footprint almost doubles from its usual value (usual case
> == RPC timeout disabled). After much investigation nothing concrete
> came out and we had to put in a hack
> which keeps the heap size in control even when RPC timeout is enabled. Also
> note that the same behavior is not observed in the 'not so busy'
> cluster.
> The patch is here: https://gist.github.com/1288023


[jira] [Commented] (HBASE-4633) Potential memory leak in client RPC timeout mechanism

2011-10-20 Thread Shrijeet Paliwal (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131786#comment-13131786
 ] 

Shrijeet Paliwal commented on HBASE-4633:
-

@Stack
No, we did not run with that flag. Also, we never got to the point where the 
application had to die because of OOM. The reasons (I guess) are:
# We have GC flags set to collect garbage as fast as possible. 
# The monitoring in place starts sending out alerts and we usually shoot the 
server in the head before it OOMs.
# The load balancer kicks in and stops sending work to the application 
server once it realizes the server is in a bad state. 

As mentioned earlier, I have found it hard to reproduce in the dev environment, 
having failed to simulate production-like load. But I must try again.



[jira] [Commented] (HBASE-4633) Potential memory leak in client RPC timeout mechanism

2011-10-20 Thread Shrijeet Paliwal (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131677#comment-13131677
 ] 

Shrijeet Paliwal commented on HBASE-4633:
-

@Liyin, 
Are you using RPC timeouts for client operations? 

bq. But Not sure the leak comes from HBase Client jar itself or just our client 
code. 
In the absence of concrete evidence that the leak is indeed in the HBase client 
jar, I have a similar feeling. It could be in our application layer. 

bq. Our symptom is that the memory footprint will increase as time. But the 
actual heap size of the client is not increasing.
We observe the used memory using a collectd plugin 
http://collectd.org/wiki/index.php/Plugin:Memory

bq. So I am very interested to know when you have keep the heap size in 
control, is the memory leaking solved ?
We run with max and min memory set as -Xmx2{X}G -Xms{X}G . When the 'leak' 
happens the plugin shows the used memory touching the 2X value, so it does seem 
the heap size is increasing. Correct me here if I am mistaken. 

Let me know if you need more inputs. 
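
One way to separate the two observations discussed above (memory seen from the operating system versus the heap as the JVM sees it) is to read the heap figures from inside the JVM. The sketch below is only an illustration and was not part of the investigation; the class and method names are hypothetical, while `ManagementFactory.getMemoryMXBean()` and `MemoryUsage` are standard `java.lang.management` APIs. When -Xms is smaller than -Xmx, committed heap can grow toward the maximum while the live (used) portion after GC stays much lower, so an OS-level "used memory" figure alone does not show how much of the heap is actually live.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

/** Hypothetical helper that prints JVM-side memory figures next to each other. */
final class HeapVersusProcessMemory {
    private static final long MB = 1024L * 1024L;

    /** Returns the current heap usage snapshot as reported by the JVM. */
    static MemoryUsage heapUsage() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getHeapMemoryUsage();
    }

    /** Formats used, committed and max in megabytes; max is -1 when undefined. */
    static String describe(String label, MemoryUsage usage) {
        long max = usage.getMax();
        return label
                + ": used=" + (usage.getUsed() / MB) + "MB"
                + " committed=" + (usage.getCommitted() / MB) + "MB"
                + " max=" + (max < 0 ? "undefined" : (max / MB) + "MB");
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        // 'used' is what objects currently occupy; 'committed' is what the JVM has
        // reserved from the OS and is closer to what an external monitor observes.
        System.out.println(describe("heap", memory.getHeapMemoryUsage()));
        System.out.println(describe("non-heap", memory.getNonHeapMemoryUsage()));
    }
}
```

Logging these values periodically from the client application and comparing them with the collectd graph would indicate whether the growth is live heap, committed-but-free heap, or memory outside the heap.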



