
[jira] [Commented] (HBASE-5604) M/R tool to replay WAL files

2012-04-12 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252537#comment-13252537
 ] 

Ted Yu commented on HBASE-5604:
---

If you run the patch through 'arc lint', you will see warnings such as these (excerpt only):
{code}
   Warning  (TXT3) Line Too Long
This line is 106 characters long, but the convention is 100 characters.

 274     System.err.println("  -D" + BULK_OUTPUT_CONF_KEY + "=/path/for/output");
 275     System.err.println("  (Exactly one table must be specified for this option, and not mapping can be specified!)");
 276     System.err.println("Other options:");
  277     System.err.println("  -D" + HLogInputFormat.START_TIME_KEY + "=ms (only apply edit after this time)");
 278     System.err.println("  -D" + HLogInputFormat.END_TIME_KEY + "=ms (only apply edit before this time)");
 279     System.err.println("For performance also consider the following options:\n"
 280         + "  -Dmapred.map.tasks.speculative.execution=false\n"

   Warning  (TXT3) Line Too Long
This line is 105 characters long, but the convention is 100 characters.

 275     System.err.println("  (Exactly one table must be specified for this option, and not mapping can be specified!)");
 276     System.err.println("Other options:");
 277     System.err.println("  -D" + HLogInputFormat.START_TIME_KEY + "=ms (only apply edit after this time)");
  278     System.err.println("  -D" + HLogInputFormat.END_TIME_KEY + "=ms (only apply edit before this time)");
 279     System.err.println("For performance also consider the following options:\n"
 280         + "  -Dmapred.map.tasks.speculative.execution=false\n"
 281         + "  -Dmapred.reduce.tasks.speculative.execution=false");
{code}
Below is a long line:
{code}
+throw new IOException(option + " must be specified either in the form 2001-02-20T16:35:06.99 or as a number of milliseconds");
{code}
Minor comment: the 'a' in 'as a number' should be omitted.
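
For illustration only, here is a minimal sketch of how such an option value might be parsed while keeping every source line under 100 characters. The class name, method name and date pattern are assumptions inferred from the example value in the message; this is not the code in the patch.

```java
import java.io.IOException;
import java.text.ParseException;
import java.text.SimpleDateFormat;

final class TimeOptionSketch {
  // Assumed pattern, inferred from the example "2001-02-20T16:35:06.99".
  private static final String PATTERN = "yyyy-MM-dd'T'HH:mm:ss.SS";

  /** Returns epoch milliseconds for either a date string or a plain number. */
  static long parseTime(String option, String value) throws IOException {
    try {
      // SimpleDateFormat is not thread-safe, so a new instance is created per call.
      return new SimpleDateFormat(PATTERN).parse(value).getTime();
    } catch (ParseException notADate) {
      // Not in date form; fall through and try plain milliseconds.
    }
    try {
      return Long.parseLong(value);
    } catch (NumberFormatException notANumber) {
      // Splitting the literal keeps each line within the 100-character convention.
      throw new IOException(option
          + " must be specified either in the form 2001-02-20T16:35:06.99"
          + " or as a number of milliseconds");
    }
  }
}
```

The date form is interpreted in the JVM's default time zone in this sketch.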

 M/R tool to replay WAL files
 

 Key: HBASE-5604
 URL: https://issues.apache.org/jira/browse/HBASE-5604
 Project: HBase
  Issue Type: New Feature
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: 5604-v4.txt, 5604-v6.txt, 5604-v7.txt, 5604-v8.txt, 
 5604-v9.txt, HLog-5604-v3.txt


 Just an idea I had. Might be useful for restore of a backup using the HLogs.
 This could be an M/R job (with a mapper per HLog file).
 The tool would get a timerange and a (set of) table(s). We'd pick the right 
 HLogs based on time before the M/R job is started and then have a mapper per 
 HLog file.
 The mapper would then go through the HLog, filter out all WALEdits that don't 
 fit into the time range or are not for any of the tables, and then use 
 HFileOutputFormat to generate HFiles.
 Would need to indicate the splits we want, probably from a live table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5604) M/R tool to replay WAL files

2012-04-12 Thread Ted Yu (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252916#comment-13252916
]

Ted Yu commented on HBASE-5604:
---

Patch looks good.
Minor comment:
{code}
+ HLog.Entry temp;
+ long i=-1;
{code}
Insert spaces around equal sign above.

M/R tool to replay WAL files

Key: HBASE-5604
URL: https://issues.apache.org/jira/browse/HBASE-5604
Project: HBase
Issue Type: New Feature
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Fix For: 0.94.0, 0.96.0

Attachments: 5604-v10.txt, 5604-v11.txt, 5604-v4.txt, 5604-v6.txt,
5604-v7.txt, 5604-v8.txt, 5604-v9.txt, HLog-5604-v3.txt


[jira] [Commented] (HBASE-5780) Fix race in HBase regionserver startup vs ZK SASL authentication

2012-04-12 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253074#comment-13253074
 ] 

Ted Yu commented on HBASE-5780:
---

{code}
+} catch (InterruptedException e) {
+  LOG.error("Interrupted while waiting for the ZookeeperWatcher to authenticate", e);
{code}
Is it safe to proceed with start() in the above case ?

 Fix race in HBase regionserver startup vs ZK SASL authentication
 

 Key: HBASE-5780
 URL: https://issues.apache.org/jira/browse/HBASE-5780
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.92.1, 0.94.0
Reporter: Shaneal Manek
Assignee: Shaneal Manek
 Attachments: HBASE-5780.patch


 Secure RegionServers sometimes fail to start with the following backtrace:
 2012-03-22 17:20:16,737 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server centos60-20.ent.cloudera.com,60020,1332462015929: Unexpected exception during initialization, aborting
 org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /hbase/shutdown
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1131)
 at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:295)
 at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataInternal(ZKUtil.java:518)
 at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:494)
 at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:77)
 at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:569)
 at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:532)
 at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:634)
 at java.lang.Thread.run(Thread.java:662)


[jira] [Commented] (HBASE-5604) M/R tool to replay WAL files

2012-04-12 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253085#comment-13253085
 ] 

Ted Yu commented on HBASE-5604:
---

TestWALPlayer.java doesn't have a test category.

 M/R tool to replay WAL files
 

 Key: HBASE-5604
 URL: https://issues.apache.org/jira/browse/HBASE-5604
 Project: HBase
  Issue Type: New Feature
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0, 0.96.0

 Attachments: 5604-v10.txt, 5604-v11.txt, 5604-v4.txt, 5604-v6.txt, 
 5604-v7.txt, 5604-v8.txt, 5604-v9.txt, HLog-5604-v3.txt



[jira] [Commented] (HBASE-5780) Fix race in HBase regionserver startup vs ZK SASL authentication

2012-04-12 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253102#comment-13253102
 ] 

Ted Yu commented on HBASE-5780:
---

I think throwing exception immediately is better.
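
As a rough illustration of failing fast on interruption, the sketch below restores the interrupt flag and rethrows instead of logging and proceeding. The class, the latch and the method names are hypothetical stand-ins, not the code in HBASE-5780.patch.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class AuthWaitSketch {
  // Hypothetical stand-in for "the ZooKeeper watcher has authenticated".
  private final CountDownLatch authenticated = new CountDownLatch(1);

  void markAuthenticated() {
    authenticated.countDown();
  }

  /** Waits for authentication; throws rather than letting start() proceed. */
  void waitForAuthentication(long timeoutMs) throws IOException {
    try {
      if (!authenticated.await(timeoutMs, TimeUnit.MILLISECONDS)) {
        throw new IOException("Timed out waiting for authentication");
      }
    } catch (InterruptedException e) {
      // Restore the flag so callers further up can still observe the interrupt.
      Thread.currentThread().interrupt();
      InterruptedIOException iioe =
          new InterruptedIOException("Interrupted while waiting for authentication");
      iioe.initCause(e);
      throw iioe;
    }
  }
}
```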

Please also run the test suite with '-Psecurity', since Hadoop QA doesn't 
test the security profile. Let us know the test results.

Thanks

 Fix race in HBase regionserver startup vs ZK SASL authentication
 

 Key: HBASE-5780
 URL: https://issues.apache.org/jira/browse/HBASE-5780
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.92.1, 0.94.0
Reporter: Shaneal Manek
Assignee: Shaneal Manek
 Attachments: HBASE-5780.patch



[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-12 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253107#comment-13253107
 ] 

Ted Yu commented on HBASE-5778:
---

{code}
+  } catch (IndexOutOfBoundsException iobe) {
+    // this can happen with a corrupted file, fall through
+  }
{code}
I think we should record the cause of the failure to retrieve the dictionary 
entry and provide a clearer message in the IOE below:
{code}
   if (entry == null) {
     throw new IOException("Missing dictionary entry for index "
         + dictIdx);
{code}
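
One possible shape for this, as a sketch only: keep the caught exception as the cause of the IOException and say why the lookup failed. The class and its list-backed dictionary are hypothetical and stand in for the real dictionary in the patch.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class DictionaryLookupSketch {
  // Hypothetical list-backed dictionary, used only for this illustration.
  private final List<byte[]> entries = new ArrayList<>();

  void addEntry(byte[] entry) {
    entries.add(entry);
  }

  /** Looks up an entry; on failure the original exception is kept as the cause. */
  byte[] getEntry(short dictIdx) throws IOException {
    try {
      return entries.get(dictIdx);
    } catch (IndexOutOfBoundsException iobe) {
      // This can happen with a corrupted file; report the reason instead of hiding it.
      throw new IOException("Missing dictionary entry for index " + dictIdx
          + " (dictionary holds " + entries.size() + " entries); file may be corrupted",
          iobe);
    }
  }
}
```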

 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Priority: Blocker
 Fix For: 0.94.0, 0.96.0

 Attachments: 5778-addendum.txt, 5778.addendum, HBASE-5778.patch


 I ran some tests to verify whether WAL compression should be turned on by default.
 For a use case where it's not very useful (values two orders of magnitude 
 bigger than the keys), the insert time wasn't different and CPU usage was 15% 
 higher (150% CPU usage vs. 130% when not compressing the WAL).
 When values are smaller than the keys, I saw a 38% improvement in the insert 
 run time and CPU usage was 33% higher (600% CPU usage vs. 450%). I'm not sure 
 WAL compression accounts for all the additional CPU usage; it might just be 
 that we're able to insert faster and so spend more time in the MemStore per 
 second (because our MemStores are bad when they contain tens of thousands of 
 values).
 Those are two extremes, but they show that for the price of some CPU we can 
 save a lot. My machines have 2 quads with HT, so I still had a lot of idle 
 CPUs.


[jira] [Commented] (HBASE-5778) Turn on WAL compression by default

2012-04-12 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253127#comment-13253127
 ] 

Ted Yu commented on HBASE-5778:
---

The remaining issue is about how the replication sink correctly decompresses 
WAL.
From test output, I saw:
{code}
java.io.EOFException
  at java.io.DataInputStream.readFully(DataInputStream.java:180)
  at org.apache.hadoop.hbase.KeyValue.readFields(KeyValue.java:2243)
  at org.apache.hadoop.hbase.KeyValue.readFields(KeyValue.java:2249)
  at org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFields(WALEdit.java:129)
  at org.apache.hadoop.hbase.regionserver.wal.HLog$Entry.readFields(HLog.java:1700)
{code}
For replication sink, there is no CompressionContext in HLog$Entry which can be 
used to perform decompression.

I agree the change should be reverted.

 Turn on WAL compression by default
 --

 Key: HBASE-5778
 URL: https://issues.apache.org/jira/browse/HBASE-5778
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Priority: Blocker
 Fix For: 0.94.0, 0.96.0

 Attachments: 5778-addendum.txt, 5778.addendum, HBASE-5778.patch



[jira] [Commented] (HBASE-5656) LoadIncrementalHFiles createTable should detect and set compression algorithm

2012-04-08 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249555#comment-13249555
 ] 

Ted Yu commented on HBASE-5656:
---

Patch looks good. Minor comment:
{code}
+hcd.setCompressionType(reader.getCompressionAlgorithm());
{code}
I think we should log the compression type used in the if block.

 LoadIncrementalHFiles createTable should detect and set compression algorithm
 -

 Key: HBASE-5656
 URL: https://issues.apache.org/jira/browse/HBASE-5656
 Project: HBase
  Issue Type: Bug
  Components: util
Affects Versions: 0.92.1
Reporter: Cosmin Lehene
Assignee: Cosmin Lehene
 Fix For: 0.92.2, 0.94.0, 0.96.0

 Attachments: 5656-simple.txt, HBASE-5656-0.92.patch, 
 HBASE-5656-0.92.patch, HBASE-5656-0.92.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 LoadIncrementalHFiles doesn't set compression when creating the table.
 This can be detected from the files within each family dir. 


[jira] [Commented] (HBASE-5656) LoadIncrementalHFiles createTable should detect and set compression algorithm

2012-04-08 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249602#comment-13249602
 ] 

Ted Yu commented on HBASE-5656:
---

Latest patch looks good to me.

 LoadIncrementalHFiles createTable should detect and set compression algorithm
 -

 Key: HBASE-5656
 URL: https://issues.apache.org/jira/browse/HBASE-5656
 Project: HBase
  Issue Type: Bug
  Components: util
Affects Versions: 0.92.1
Reporter: Cosmin Lehene
Assignee: Cosmin Lehene
 Fix For: 0.92.2, 0.94.0, 0.96.0

 Attachments: 5656-simple.txt, HBASE-5656-0.92.patch, 
 HBASE-5656-0.92.patch, HBASE-5656-0.92.patch, HBASE-5656-0.92.patch

   Original Estimate: 1h
  Remaining Estimate: 1h


[jira] [Commented] (HBASE-5727) secure hbase build broke because of 'HBASE-5451 Switch RPC call envelope/headers to PBs'

2012-04-08 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249612#comment-13249612
 ] 

Ted Yu commented on HBASE-5727:
---

TestRowProcessorEndpoint in security profile passed smoothly with latest patch.

+1 on integrating the patch and launching another secure build.

Thanks for working over the weekend, Devaraj.

 secure hbase build broke because of 'HBASE-5451 Switch RPC call 
 envelope/headers to PBs'
 

 Key: HBASE-5727
 URL: https://issues.apache.org/jira/browse/HBASE-5727
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Devaraj Das
Priority: Blocker
 Attachments: 5727.1.patch, 5727.patch


 If you build with the security profile -- i.e. add '-P security' on the 
 command line -- you'll see that the secure build is broken since we messed with 
 rpc.
 Assigning Devaraj to take a look. If you can't work on this now DD, just 
 give it back to me and I'll have a go at it. Thanks.


[jira] [Commented] (HBASE-5689) Skipping RecoveredEdits may cause data loss

2012-04-01 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243743#comment-13243743
 ] 

Ted Yu commented on HBASE-5689:
---

Using a TreeMap is common practice.

Please attach test suite result - Hadoop QA is not working.

 Skipping RecoveredEdits may cause data loss
 ---

 Key: HBASE-5689
 URL: https://issues.apache.org/jira/browse/HBASE-5689
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.94.0

 Attachments: 5689-simplified.txt, 5689-testcase.patch, 
 HBASE-5689.patch


 Consider the following scenario:
 1. The region is on server A.
 2. Put KV(r1-v1) to the region.
 3. Move the region from server A to server B.
 4. Put KV(r2-v2) to the region.
 5. Move the region from server B to server A.
 6. Put KV(r3-v3) to the region.
 7. kill -9 server B and start it.
 8. kill -9 server A and start it.
 9. Scan the region: we get only two KVs (r1-v1, r2-v2); the third, 
 KV(r3-v3), is lost.
 Let's analyse the above scenario from the code:
 1. The edit logs of KV(r1-v1) and KV(r3-v3) are both recorded in the same 
 hlog file on server A.
 2. When we split server B's hlog file in ServerShutdownHandler, 
 we create one RecoveredEdits file f1 for the region.
 3. When we split server A's hlog file in ServerShutdownHandler, 
 we create another RecoveredEdits file f2 for the region.
 4. However, RecoveredEdits file f2 will be skipped when initializing the region, 
 in HRegion#replayRecoveredEditsIfAny:
 {code}
  for (Path edits: files) {
    if (edits == null || !this.fs.exists(edits)) {
      LOG.warn("Null or non-existent edits file: " + edits);
      continue;
    }
    if (isZeroLengthThenDelete(this.fs, edits)) continue;
    if (checkSafeToSkip) {
      Path higher = files.higher(edits);
      long maxSeqId = Long.MAX_VALUE;
      if (higher != null) {
        // Edit file name pattern, HLog.EDITFILES_NAME_PATTERN: "-?[0-9]+"
        String fileName = higher.getName();
        maxSeqId = Math.abs(Long.parseLong(fileName));
      }
      if (maxSeqId <= minSeqId) {
        String msg = "Maximum possible sequenceid for this log is " + maxSeqId
            + ", skipped the whole file, path=" + edits;
        LOG.debug(msg);
        continue;
      } else {
        checkSafeToSkip = false;
      }
    }
 {code}
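
To make the skip condition easier to follow in isolation, here is a simplified, self-contained sketch of the same check. It assumes each edits file is represented only by the sequence id in its name; the class and method names are hypothetical and this is not the HRegion code itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

final class SkipCheckSketch {
  /**
   * Returns the files that would be replayed. A file is skipped only while
   * the name of the next higher file (an upper bound on the sequence ids in
   * the current file) is at or below minSeqId.
   */
  static List<Long> filesToReplay(TreeSet<Long> fileSeqIds, long minSeqId) {
    List<Long> replay = new ArrayList<>();
    boolean checkSafeToSkip = true;
    for (Long file : fileSeqIds) {
      if (checkSafeToSkip) {
        Long higher = fileSeqIds.higher(file);
        long maxSeqId = (higher == null) ? Long.MAX_VALUE : Math.abs(higher);
        if (maxSeqId <= minSeqId) {
          continue; // the whole file is assumed to be already persisted
        }
        checkSafeToSkip = false;
      }
      replay.add(file);
    }
    return replay;
  }
}
```

The check relies on file names being ordered consistently with the edits they contain, which is the assumption the scenario above calls into question when files come from logs on different servers.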
  


[jira] [Commented] (HBASE-5693) When creating a region, the master initializes it and creates a memstore within the master server

2012-04-01 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243746#comment-13243746
 ] 

Ted Yu commented on HBASE-5693:
---

CreateTableHandler isn't initializing the regions.
Who will initialize them ?

 When creating a region, the master initializes it and creates a memstore 
 within the master server
 -

 Key: HBASE-5693
 URL: https://issues.apache.org/jira/browse/HBASE-5693
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5693.v1.patch


 I didn't do a complete analysis, but the attached patch saves more than 0.25s 
 for each region creation and locally all the unit tests work.


[jira] [Commented] (HBASE-5677) The master never does balance because duplicate openhandled the one region

2012-04-01 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243747#comment-13243747
 ] 

Ted Yu commented on HBASE-5677:
---

Interesting.
Chunhui proposed safe mode for Master in HBASE-5270. See 
https://issues.apache.org/jira/browse/HBASE-5270?focusedCommentId=13214394&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13214394

Can you verify that this issue has been fixed in 0.92.2 ?

Thanks

 The master never does balance because duplicate openhandled the one region
 --

 Key: HBASE-5677
 URL: https://issues.apache.org/jira/browse/HBASE-5677
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6
 Environment: 0.90
Reporter: xufeng
Assignee: xufeng

 If a region is assigned while the master is initializing (before 
 processFailover runs), the opening of that region is handled twice, 
 because the unassigned node in ZooKeeper is handled again in 
 AssignmentManager#processFailover().
 This leaves the region in RIT, so the master never runs the balancer.


[jira] [Commented] (HBASE-5693) When creating a region, the master initializes it and creates a memstore within the master server

2012-04-01 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243762#comment-13243762
 ] 

Ted Yu commented on HBASE-5693:
---

It is called from OpenRegionHandler.openRegion()

I once made some threads daemon which passed unit tests but resulted in master 
and region server failing to start.

Testing on a real cluster is desirable.

 When creating a region, the master initializes it and creates a memstore 
 within the master server
 -

 Key: HBASE-5693
 URL: https://issues.apache.org/jira/browse/HBASE-5693
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5693.v1.patch



[jira] [Commented] (HBASE-5693) When creating a region, the master initializes it and creates a memstore within the master server

2012-04-01 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243765#comment-13243765
 ] 

Ted Yu commented on HBASE-5693:
---

@N:
Can you rebase the patch for trunk?
{code}
Hunk #3 FAILED at 3613.
1 out of 3 hunks FAILED -- saving rejects to file src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java.rej
{code}

 When creating a region, the master initializes it and creates a memstore 
 within the master server
 -

 Key: HBASE-5693
 URL: https://issues.apache.org/jira/browse/HBASE-5693
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5693.v1.patch



[jira] [Commented] (HBASE-5665) Repeated split causes HRegionServer failures and breaks table

2012-04-01 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243842#comment-13243842
 ] 

Ted Yu commented on HBASE-5665:
---

HBASE-5665-trunk.patch looks good.

 Repeated split causes HRegionServer failures and breaks table 
 --

 Key: HBASE-5665
 URL: https://issues.apache.org/jira/browse/HBASE-5665
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0, 0.92.1
Reporter: Cosmin Lehene
Assignee: Cosmin Lehene
Priority: Blocker
 Attachments: HBASE-5665-0.92.patch, HBASE-5665-trunk.patch


 Repeated splits on large tables (2 consecutive would suffice) will 
 essentially break the table (and the cluster) unrecoverably.
 The regionserver doing the split dies and the master will get into an 
 infinite loop trying to assign regions that seem to have the files missing 
 from HDFS.
 The table can be disabled once. Upon trying to re-enable it, it will remain 
 in an intermediary state forever.
 I was able to reproduce this on a smaller table consistently.
 {code}
 hbase(main):030:0> (0..1).each{|x| put 't1', "#{x}", 'f1:t', 'dd'}
 hbase(main):030:0> (0..1000).each{|x| split 't1', "#{x*10}"}
 {code}
 Running overlapping splits in parallel (e.g. #{x*10+1}, #{x*10+2}... ) 
 will reproduce the issue almost instantly and consistently. 
 {code}
 2012-03-28 10:57:16,320 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Offlined parent region t1,,1332957435767.2fb0473f4e71339e88dab0ee0d4dffa1. in META
 2012-03-28 10:57:16,321 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Split requested for t1,5,1332957435767.648d30de55a5cec6fc2f56dcb3c7eee1..  compaction_queue=(0:1), split_queue=10
 2012-03-28 10:57:16,343 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup of failed split of t1,,1332957435767.2fb0473f4e71339e88dab0ee0d4dffa1.; Failed ld2,60020,1332957343833-daughterOpener=2469c5650ea2aeed631eb85d3cdc3124
 java.io.IOException: Failed ld2,60020,1332957343833-daughterOpener=2469c5650ea2aeed631eb85d3cdc3124
 at org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughters(SplitTransaction.java:363)
 at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:451)
 at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:67)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 Caused by: java.io.FileNotFoundException: File does not exist: /hbase/t1/589c44cabba419c6ad8c9b427e5894e3.2fb0473f4e71339e88dab0ee0d4dffa1/f1/d62a852c25ad44e09518e102ca557237
 at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1822)
 at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1813)
 at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:544)
 at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:187)
 at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:456)
 at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:341)
 at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.<init>(StoreFile.java:1008)
 at org.apache.hadoop.hbase.io.HalfStoreFileReader.<init>(HalfStoreFileReader.java:65)
 at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:467)
 at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:548)
 at org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:284)
 at org.apache.hadoop.hbase.regionserver.Store.<init>(Store.java:221)
 at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:2511)
 at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:450)
 at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3229)
 at org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughterRegion(SplitTransaction.java:504)
 at org.apache.hadoop.hbase.regionserver.SplitTransaction$DaughterOpener.run(SplitTransaction.java:484)
 ... 1 more
 2012-03-28 10:57:16,345 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server ld2,60020,1332957343833: Abort; we got an error after point-of-no-return
 {code}
 http://hastebin.com/diqinibajo.avrasm
 later edit:
 (I'm using the last 4 characters from each string)
 Region 94e3 has storefile 7237
 Region 94e3 gets split into daughters a: ffa1 and b: eee1
 Daughter region ffa1

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-04-01 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243845#comment-13243845
 ] 

Ted Yu commented on HBASE-5666:
---

{code}
+if (keeperEx != null)
+  throw keeperEx;
{code}
Please either lift the throw to the same line as the if, or add curly braces.
{code}
+checkExists(zk, parentZNode, maxTimeMs);
+LOG.info("Parent znode exists: " + parentZNode);
{code}
If checkExists() returns -1, would the log statement still be true ?
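To make the question concrete, here is a minimal sketch of one way to guard the log message on the result of the check. It is not the code from the patch: waitForNode, describe and the BooleanSupplier probe are hypothetical stand-ins, and the only thing taken from the comment above is the convention that -1 means the node was not found.

```java
import java.util.function.BooleanSupplier;

class ParentZNodeCheck {
    /**
     * Hypothetical helper: polls the probe until it reports true or
     * maxTimeMs elapses. Returns the elapsed milliseconds on success,
     * or -1 if the node never showed up (or the wait was interrupted).
     */
    static long waitForNode(BooleanSupplier exists, long maxTimeMs, long sleepMs) {
        long start = System.currentTimeMillis();
        while (true) {
            if (exists.getAsBoolean()) {
                return System.currentTimeMillis() - start;
            }
            if (System.currentTimeMillis() - start >= maxTimeMs) {
                return -1;
            }
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return -1;
            }
        }
    }

    /** Builds a log message that is only "exists" when the check succeeded. */
    static String describe(long result, String parentZNode) {
        if (result < 0) {
            return "Parent znode not found: " + parentZNode;
        }
        return "Parent znode exists: " + parentZNode;
    }
}
```

With this shape the caller logs describe(waitForNode(...), parentZNode), so the "exists" message cannot be printed after a timeout.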

 RegionServer doesn't retry to check if base node is available
 -

 Key: HBASE-5666
 URL: https://issues.apache.org/jira/browse/HBASE-5666
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Attachments: HBASE-5666-v1.patch, hbase-1-regionserver.log, 
 hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, 
 hbase-regionserver.log, hbase-zookeeper.log


 I've a script that starts hbase and a couple of region servers in distributed 
 mode (hbase.cluster.distributed = true)
 {code}
 $HBASE_HOME/bin/start-hbase.sh
 $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
 {code}
 but the region servers are not able to start...
 It seems that during RS startup the znode is not yet available, and 
 HRegionServer.initializeZooKeeper() checks only once whether the base node is 
 available.
 {code}
 2012-03-28 21:54:05,013 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value 
 configured in 'zookeeper.znode.parent'. There could be a mismatch with the 
 one configured in the master.
 2012-03-28 21:54:08,598 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 localhost,60202,133296824: Initialization of RS failed.  Hence aborting 
 RS.
 java.io.IOException: Received the shutdown message while waiting.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
   at java.lang.Thread.run(Thread.java:662)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-04-01 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243863#comment-13243863
 ] 

Ted Yu commented on HBASE-5666:
---

Patch v2 looks good.

 RegionServer doesn't retry to check if base node is available
 -

 Key: HBASE-5666
 URL: https://issues.apache.org/jira/browse/HBASE-5666
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, 
 hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, 
 hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log


 I've a script that starts hbase and a couple of region servers in distributed 
 mode (hbase.cluster.distributed = true)
 {code}
 $HBASE_HOME/bin/start-hbase.sh
 $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
 {code}
 but the region servers are not able to start...
 It seems that during RS startup the znode is not yet available, and 
 HRegionServer.initializeZooKeeper() checks only once whether the base node is 
 available.
 {code}
 2012-03-28 21:54:05,013 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value 
 configured in 'zookeeper.znode.parent'. There could be a mismatch with the 
 one configured in the master.
 2012-03-28 21:54:08,598 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 localhost,60202,133296824: Initialization of RS failed.  Hence aborting 
 RS.
 java.io.IOException: Received the shutdown message while waiting.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
   at java.lang.Thread.run(Thread.java:662)
 {code}


[jira] [Commented] (HBASE-5436) Right-size the map when reading attributes.

2012-04-01 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243958#comment-13243958
 ] 

Ted Yu commented on HBASE-5436:
---

Integrated to 0.92 branch.

 Right-size the map when reading attributes.
 ---

 Key: HBASE-5436
 URL: https://issues.apache.org/jira/browse/HBASE-5436
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.0
Reporter: Benoit Sigoure
Assignee: Benoit Sigoure
Priority: Trivial
  Labels: performance
 Fix For: 0.94.0

 Attachments: 0001-Right-size-the-map-when-reading-attributes.patch





[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush

2012-03-26 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238675#comment-13238675
 ] 

Ted Yu commented on HBASE-5623:
---

Last patch should be good to go.

 Race condition when rolling the HLog and hlogFlush
 --

 Key: HBASE-5623
 URL: https://issues.apache.org/jira/browse/HBASE-5623
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.94.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Priority: Critical
 Fix For: 0.94.0

 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 
 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, 
 HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch


 When doing a ycsb test with a large number of handlers 
 (regionserver.handler.count=60), I get the following exceptions:
 {code}
 Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314)
   at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291)
   at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
   at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
   at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
   at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152)
   at $Proxy1.multi(Unknown Source)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689)
   at 
 org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214)
 {code}
 and 
 {code}
   java.lang.NullPointerException
   at 
 org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026)
   at 
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068)
   at 
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
   at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
 {code}
 It seems the root cause of the issue is that we open a new log writer and 
 close the old one in HLog#rollWriter() while holding the updateLock, but the 
 other threads running syncer() call
 {code} 
 logSyncerThread.hlogFlush(this.writer);
 {code}
 without holding the updateLock. LogSyncer only synchronizes against 
 concurrent appends and flush(), but not on the passed writer, which can be 
 closed already by rollWriter(). In this case, since 
 SequenceFile#Writer.close() sets its out field to null, we get the NPE. 
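The race described above can be illustrated with a small stand-alone sketch. This is not the HLog code and not necessarily the approach taken in the attached patches: RollingLog and its Writer are hypothetical stand-ins, and the sketch simply takes the same lock in both the roll path and the flush path so that a flush can never run against a writer that has already been closed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class RollingLog {
    /** Stand-in writer: close() nulls the output, so a later append throws an NPE. */
    static final class Writer {
        private StringBuilder out = new StringBuilder();
        void append(String edit) { out.append(edit); }
        void close() { out = null; }
    }

    private final Object updateLock = new Object();
    private Writer writer = new Writer();

    /** Swaps in a new writer and closes the old one while holding the lock. */
    void roll() {
        synchronized (updateLock) {
            Writer old = writer;
            writer = new Writer();
            old.close();
        }
    }

    /** Flushes under the same lock, so the writer cannot be closed mid-call. */
    void flush(String edit) {
        synchronized (updateLock) {
            writer.append(edit);
        }
    }

    /** Runs flusher threads against one roller thread and counts failed flushes. */
    static int stress(int flushers, int iterations) {
        RollingLog log = new RollingLog();
        AtomicInteger failures = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < flushers; i++) {
            threads.add(new Thread(() -> {
                for (int n = 0; n < iterations; n++) {
                    try {
                        log.flush("edit");
                    } catch (RuntimeException e) {
                        failures.incrementAndGet();
                    }
                }
            }));
        }
        threads.add(new Thread(() -> {
            for (int n = 0; n < iterations; n++) {
                log.roll();
            }
        }));
        for (Thread t : threads) {
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return failures.get();
    }
}
```

Holding the lock for the whole flush is the simplest way to close the window, but it serializes flushes with rolls; a real fix may prefer a narrower critical section.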


[jira] [Commented] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid

2012-03-26 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238891#comment-13238891
 ] 

Ted Yu commented on HBASE-2600:
---

@Alex:
Can you attach hbase-2600-root.dir.tgz to this JIRA ?
Please briefly describe how you generated the tar ball.

Thanks

 Change how we do meta tables; from tablename+STARTROW+randomid to instead, 
 tablename+ENDROW+randomid
 

 Key: HBASE-2600
 URL: https://issues.apache.org/jira/browse/HBASE-2600
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Alex Newman
 Attachments: 
 0001-Changed-regioninfo-format-to-use-endKey-instead-of-s.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v2.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v4.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v6.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v7.2.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v8, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v8.1, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v9.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch, 
 2600-trunk-01-17.txt, HBASE-2600+5217-Sun-Mar-25-2012-v3.patch, 
 HBASE-2600+5217-Sun-Mar-25-2012-v4.patch, jenkins.pdf


 This is an idea that Ryan and I have been kicking around on and off for a 
 while now.
 If regionnames were made of tablename+endrow instead of tablename+startrow, 
 then in the metatables, doing a search for the region that contains the 
 wanted row, we'd just have to open a scanner using passed row and the first 
 row found by the scan would be that of the region we need (If offlined 
 parent, we'd have to scan to the next row).
 If we redid the meta tables in this format, we'd be using an access that is 
 natural to hbase, a scan as opposed to the perverse, expensive 
 getClosestRowBefore we currently have that has to walk backward in meta 
 finding a containing region.
 This issue is about changing the way we name regions.
 If we were using scans, prewarming client cache would be near costless (as 
 opposed to what we'll currently have to do which is first a 
 getClosestRowBefore and then a scan from the closestrowbefore forward).
 Converting to the new method, we'd have to run a migration on startup 
 changing the content in meta.
 Up to now, the randomid component of a region name has been the timestamp of 
 region creation.  HBASE-2531 ("32-bit encoding of regionnames waaay 
 too susceptible to hash clashes") proposes changing the randomid so that it 
 contains actual name of the directory in the filesystem that hosts the 
 region.  If we had this in place, I think it would help with the migration to 
 this new way of doing the meta because as is, the region name in fs is a hash 
 of regionname... changing the format of the regionname would mean we generate 
 a different hash... so we'd need hbase-2531 to be in place before we could do 
 this change.
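The difference between the two naming schemes can be sketched with a sorted map standing in for the meta table. This is only an illustration under stated assumptions: MetaLookup, the region names r1..r3, and the "\uffff" sentinel used as the end key of the last region are all hypothetical, and a TreeMap is not how meta is actually stored.

```java
import java.util.Map;
import java.util.TreeMap;

class MetaLookup {
    /**
     * Start-row keyed meta: the containing region is the closest key at or
     * before the row, i.e. a backward lookup (the getClosestRowBefore shape).
     */
    static String byStartRow(TreeMap<String, String> meta, String row) {
        Map.Entry<String, String> e = meta.floorEntry(row);
        return e == null ? null : e.getValue();
    }

    /**
     * End-row keyed meta: the containing region is the first key strictly
     * after the row (end keys are exclusive), i.e. the first hit of a
     * forward scan that starts at the row.
     */
    static String byEndRow(TreeMap<String, String> meta, String row) {
        Map.Entry<String, String> e = meta.higherEntry(row);
        return e == null ? null : e.getValue();
    }
}
```

For three regions [, g), [g, p) and [p, end), both lookups resolve row "h" to the middle region, but only the end-row variant does so by moving forward from the requested row.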


[jira] [Commented] (HBASE-5641) decayingSampleTick1 prevents HBase from shutting down.

2012-03-26 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13239139#comment-13239139
 ] 

Ted Yu commented on HBASE-5641:
---

+1 on patch.

 decayingSampleTick1 prevents HBase from shutting down.
 --

 Key: HBASE-5641
 URL: https://issues.apache.org/jira/browse/HBASE-5641
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Blocker
 Fix For: 0.94.0, 0.96.0

 Attachments: 5641.txt


 I think this is the problem. It creates a non-daemon thread.
 {code}
   private static final ScheduledExecutorService TICK_SERVICE = 
   Executors.newScheduledThreadPool(1, 
   Threads.getNamedThreadFactory("decayingSampleTick"));
 {code}
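For illustration, a scheduled executor whose worker is a daemon thread could look roughly like the sketch below. It uses only java.util.concurrent; DaemonTick and daemonFactory are hypothetical names, and this is not necessarily what the attached 5641.txt does.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

class DaemonTick {
    /** Hypothetical factory: names threads prefix1, prefix2, ... and marks them daemon. */
    static ThreadFactory daemonFactory(String prefix) {
        AtomicInteger counter = new AtomicInteger();
        return runnable -> {
            Thread t = new Thread(runnable, prefix + counter.incrementAndGet());
            // A daemon thread does not keep the JVM alive on shutdown.
            t.setDaemon(true);
            return t;
        };
    }

    /** Single-threaded scheduler backed by the daemon factory above. */
    static ScheduledExecutorService newTickService() {
        return Executors.newScheduledThreadPool(1, daemonFactory("decayingSampleTick"));
    }
}
```

The alternative is to keep the non-daemon thread and shut the executor down explicitly during process shutdown.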


[jira] [Commented] (HBASE-5615) the master never does balance because of balancing the parent region

2012-03-25 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238009#comment-13238009
 ] 

Ted Yu commented on HBASE-5615:
---

Integrated to 0.92, 0.94 and TRUNK as well.

 the master never does balance because of balancing the parent region
 

 Key: HBASE-5615
 URL: https://issues.apache.org/jira/browse/HBASE-5615
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.7
Reporter: xufeng
Assignee: xufeng
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: 5615-trunk.txt, HBASE-5615-90.patch, HBASE-5615.patch, 
 NoPatched-surefire-report-5615-90.html, Patched_surefire-report-5615-90.html


 The master never balances because, when the master runs rebuildUserRegions(), 
 it adds the parent region to AssignmentManager#servers. If the balancer then 
 tries to move the parent region, the parent stays in RIT forever, so 
 balancing is never executed.
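As a rough sketch of the idea, assuming the remedy is to keep split parents out of the set of regions the balancer may move: the Region class, its splitParent flag and the assignable method below are hypothetical and are not taken from the attached patches.

```java
import java.util.ArrayList;
import java.util.List;

class BalanceCandidates {
    /** Hypothetical, minimal view of a region: its name and whether it is a split parent. */
    static final class Region {
        final String name;
        final boolean splitParent;

        Region(String name, boolean splitParent) {
            this.name = name;
            this.splitParent = splitParent;
        }
    }

    /** Returns the names of regions that may be balanced, skipping split parents. */
    static List<String> assignable(List<Region> regions) {
        List<String> out = new ArrayList<>();
        for (Region r : regions) {
            if (!r.splitParent) {
                out.add(r.name);
            }
        }
        return out;
    }
}
```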


[jira] [Commented] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid

2012-03-25 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238010#comment-13238010
 ] 

Ted Yu commented on HBASE-2600:
---

@Alex:
Try this:
{code}
git diff --no-prefix --binary 
{code}

Thanks

 Change how we do meta tables; from tablename+STARTROW+randomid to instead, 
 tablename+ENDROW+randomid
 

 Key: HBASE-2600
 URL: https://issues.apache.org/jira/browse/HBASE-2600
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Alex Newman
 Attachments: 
 0001-Changed-regioninfo-format-to-use-endKey-instead-of-s.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v2.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v4.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v6.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v7.2.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v8, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v8.1, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v9.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch, 
 2600-trunk-01-17.txt, HBASE-2600+5217-Sun-Mar-25-2012-v3.patch, jenkins.pdf


 This is an idea that Ryan and I have been kicking around on and off for a 
 while now.
 If regionnames were made of tablename+endrow instead of tablename+startrow, 
 then in the metatables, doing a search for the region that contains the 
 wanted row, we'd just have to open a scanner using passed row and the first 
 row found by the scan would be that of the region we need (If offlined 
 parent, we'd have to scan to the next row).
 If we redid the meta tables in this format, we'd be using an access that is 
 natural to hbase, a scan as opposed to the perverse, expensive 
 getClosestRowBefore we currently have that has to walk backward in meta 
 finding a containing region.
 This issue is about changing the way we name regions.
 If we were using scans, prewarming client cache would be near costless (as 
 opposed to what we'll currently have to do which is first a 
 getClosestRowBefore and then a scan from the closestrowbefore forward).
 Converting to the new method, we'd have to run a migration on startup 
 changing the content in meta.
 Up to now, the randomid component of a region name has been the timestamp of 
 region creation.  HBASE-2531 ("32-bit encoding of regionnames waaay 
 too susceptible to hash clashes") proposes changing the randomid so that it 
 contains actual name of the directory in the filesystem that hosts the 
 region.  If we had this in place, I think it would help with the migration to 
 this new way of doing the meta because as is, the region name in fs is a hash 
 of regionname... changing the format of the regionname would mean we generate 
 a different hash... so we'd need hbase-2531 to be in place before we could do 
 this change.


[jira] [Commented] (HBASE-5615) the master never does balance because of balancing the parent region

2012-03-25 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238030#comment-13238030
 ] 

Ted Yu commented on HBASE-5615:
---

@Xufeng:
You're welcome.

In the future, please grant license to Apache when you attach patches.

 the master never does balance because of balancing the parent region
 

 Key: HBASE-5615
 URL: https://issues.apache.org/jira/browse/HBASE-5615
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.7
Reporter: xufeng
Assignee: xufeng
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: 5615-trunk.txt, HBASE-5615-90.patch, HBASE-5615.patch, 
 NoPatched-surefire-report-5615-90.html, Patched_surefire-report-5615-90.html


 The master never balances because, when the master runs rebuildUserRegions(), 
 it adds the parent region to AssignmentManager#servers. If the balancer then 
 tries to move the parent region, the parent stays in RIT forever, so 
 balancing is never executed.


[jira] [Commented] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid

2012-03-25 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238074#comment-13238074
 ] 

Ted Yu commented on HBASE-2600:
---

dev-support/test-patch.sh doesn't use '--binary' option when applying patches.

I tried the following command:
{code}
patch -p0 --binary -i HBASE-2600+5217-Sun-Mar-25-2012-v4.patch
{code}
But src/test/data/hbase-2600-root.dir.tgz wasn't unpacked from patch.

 Change how we do meta tables; from tablename+STARTROW+randomid to instead, 
 tablename+ENDROW+randomid
 

 Key: HBASE-2600
 URL: https://issues.apache.org/jira/browse/HBASE-2600
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Alex Newman
 Attachments: 
 0001-Changed-regioninfo-format-to-use-endKey-instead-of-s.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v2.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v4.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v6.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v7.2.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v8, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v8.1, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v9.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch, 
 2600-trunk-01-17.txt, HBASE-2600+5217-Sun-Mar-25-2012-v3.patch, 
 HBASE-2600+5217-Sun-Mar-25-2012-v4.patch, jenkins.pdf


 This is an idea that Ryan and I have been kicking around on and off for a 
 while now.
 If regionnames were made of tablename+endrow instead of tablename+startrow, 
 then in the metatables, doing a search for the region that contains the 
 wanted row, we'd just have to open a scanner using passed row and the first 
 row found by the scan would be that of the region we need (If offlined 
 parent, we'd have to scan to the next row).
 If we redid the meta tables in this format, we'd be using an access that is 
 natural to hbase, a scan as opposed to the perverse, expensive 
 getClosestRowBefore we currently have that has to walk backward in meta 
 finding a containing region.
 This issue is about changing the way we name regions.
 If we were using scans, prewarming client cache would be near costless (as 
 opposed to what we'll currently have to do which is first a 
 getClosestRowBefore and then a scan from the closestrowbefore forward).
 Converting to the new method, we'd have to run a migration on startup 
 changing the content in meta.
 Up to now, the randomid component of a region name has been the timestamp of 
 region creation.  HBASE-2531 ("32-bit encoding of regionnames waaay 
 too susceptible to hash clashes") proposes changing the randomid so that it 
 contains actual name of the directory in the filesystem that hosts the 
 region.  If we had this in place, I think it would help with the migration to 
 this new way of doing the meta because as is, the region name in fs is a hash 
 of regionname... changing the format of the regionname would mean we generate 
 a different hash... so we'd need hbase-2531 to be in place before we could do 
 this change.


[jira] [Commented] (HBASE-5615) the master never does balance because of balancing the parent region

2012-03-24 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237556#comment-13237556
 ] 

Ted Yu commented on HBASE-5615:
---

Integrated to 0.90 branch.

Thanks for the patch Xufeng.

Thanks for the review Ramkrishna and Jinchao.

Patch for TRUNK to follow

 the master never does balance because of balancing the parent region
 

 Key: HBASE-5615
 URL: https://issues.apache.org/jira/browse/HBASE-5615
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.7
Reporter: xufeng
Assignee: xufeng
Priority: Critical
 Fix For: 0.90.7, 0.96.0

 Attachments: 5615-trunk.txt, HBASE-5615-90.patch, HBASE-5615.patch, 
 NoPatched-surefire-report-5615-90.html, Patched_surefire-report-5615-90.html


 The master never balances because, when the master runs rebuildUserRegions(), 
 it adds the parent region to AssignmentManager#servers. If the balancer then 
 tries to move the parent region, the parent stays in RIT forever, so 
 balancing is never executed.


[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush

2012-03-24 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237714#comment-13237714
 ] 

Ted Yu commented on HBASE-5623:
---

Adding LOG.debug should be fine.

 Race condition when rolling the HLog and hlogFlush
 --

 Key: HBASE-5623
 URL: https://issues.apache.org/jira/browse/HBASE-5623
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.94.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Priority: Critical
 Fix For: 0.94.0

 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 
 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, 
 HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch


 When doing a ycsb test with a large number of handlers 
 (regionserver.handler.count=60), I get the following exceptions:
 {code}
 Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314)
   at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291)
   at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
   at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
   at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
   at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152)
   at $Proxy1.multi(Unknown Source)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689)
   at 
 org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214)
 {code}
 and 
 {code}
   java.lang.NullPointerException
   at 
 org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026)
   at 
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068)
   at 
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
   at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
 {code}
 It seems the root cause of the issue is that we open a new log writer and 
 close the old one in HLog#rollWriter() while holding the updateLock, but the 
 other threads running syncer() call
 {code} 
 logSyncerThread.hlogFlush(this.writer);
 {code}
 without holding the updateLock. LogSyncer only synchronizes against 
 concurrent appends and flush(), but not on the passed writer, which can be 
 closed already by rollWriter(). In this case, since 
 SequenceFile#Writer.close() sets its out field to null, we get the NPE. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:

[jira] [Commented] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2012-03-19 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233120#comment-13233120
 ] 

Ted Yu commented on HBASE-3996:
---

Eran might be busy.

I created https://reviews.apache.org/r/4411/ for people to review.

 Support multiple tables and scanners as input to the mapper in map/reduce jobs
 --

 Key: HBASE-3996
 URL: https://issues.apache.org/jira/browse/HBASE-3996
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Eran Kutner
Assignee: Eran Kutner
 Fix For: 0.94.0, 0.96.0

 Attachments: 3996-v2.txt, 3996-v3.txt, HBase-3996.patch


 It seems that in many cases feeding data from multiple tables or multiple 
 scanners on a single table can save a lot of time when running map/reduce 
 jobs.
 I propose a new MultiTableInputFormat class that would allow doing this.


[jira] [Commented] (HBASE-5564) Bulkload is discarding duplicate records

2012-03-19 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233179#comment-13233179
 ] 

Ted Yu commented on HBASE-5564:
---

{code}
+public int getTimestapKeyColumnIndex() {
{code}
Please fix typo in the above method name.
{code}
+-D + TIMESTAMP_CONF_KEY + =currentTimeAsLong - use the specified 
timestamp for the import. This option is ignored if HBASE_TS_KEY is specfied in 
'importtsv.columns'\n +
{code}
Please wrap the long line above.
{code}
+// Should never get 0.
+ts = conf.getLong(ImportTsv.TIMESTAMP_CONF_KEY, 0);
{code}
Please explain why 0 wouldn't be returned.
{code}
+  if (parser.getTimestapKeyColumnIndex() != -1)
+ts = parsed.getTimestamp();
{code}
Please use curly braces around the assignment.

 Bulkload is discarding duplicate records
 

 Key: HBASE-5564
 URL: https://issues.apache.org/jira/browse/HBASE-5564
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.90.7, 0.92.2, 0.94.0, 0.96.0
 Environment: HBase 0.92
Reporter: Laxman
Assignee: Laxman
  Labels: bulkloader
 Attachments: HBASE-5564_trunk.patch


 Duplicate records are discarded when they exist in the same input file, and 
 more specifically when they exist in the same split.
 Duplicate records are considered if the records are from different 
 splits.
 Version under test: HBase 0.92


[jira] [Commented] (HBASE-5596) Few minor bugs from HBASE-5209

2012-03-19 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233216#comment-13233216
 ] 

Ted Yu commented on HBASE-5596:
---

Is this patch ready for integration ?

From 
https://builds.apache.org/job/PreCommit-HBASE-Build/1212//testReport/org.apache.hadoop.hbase.client/TestInstantSchemaChangeFailover/testInstantSchemaChangeWhileRSCrash/:
{code}
Caused by: java.lang.RuntimeException: Master not initialized after 200 seconds
    at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:208)
    at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:424)
    at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:200)
{code}

 Few minor bugs from HBASE-5209
 --

 Key: HBASE-5596
 URL: https://issues.apache.org/jira/browse/HBASE-5596
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.92.1, 0.94.0
Reporter: David S. Wang
Assignee: David S. Wang
Priority: Minor
 Attachments: HBASE_5596.patch


 A few leftover bugs from HBASE-5209.  Comments are documented here:
 https://reviews.apache.org/r/3892/


[jira] [Commented] (HBASE-5542) Unify HRegion.mutateRowsWithLocks() and HRegion.processRow()

2012-03-12 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227569#comment-13227569
 ] 

Ted Yu commented on HBASE-5542:
---

I think we should keep a time bound, whose default value can be large.

 Unify HRegion.mutateRowsWithLocks() and HRegion.processRow()
 

 Key: HBASE-5542
 URL: https://issues.apache.org/jira/browse/HBASE-5542
 Project: HBase
  Issue Type: Improvement
Reporter: Scott Chen
Assignee: Scott Chen
 Fix For: 0.96.0

 Attachments: HBASE-5542.D2217.1.patch, HBASE-5542.D2217.2.patch, 
 HBASE-5542.D2217.3.patch, HBASE-5542.D2217.4.patch, HBASE-5542.D2217.5.patch, 
 HBASE-5542.D2217.6.patch, HBASE-5542.D2217.7.patch


 mutateRowsWithLocks() does atomic mutations on multiple rows.
 processRow() does atomic read-modify-writes on a single row.
 It will be useful to generalize both and have a
 processRowsWithLocks() that does atomic read-modify-writes on multiple rows.
 This also helps reduce some redundancy in the code.


[jira] [Commented] (HBASE-5206) Port HBASE-5155 to 0.92 and TRUNK

2012-03-12 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227661#comment-13227661
 ] 

Ted Yu commented on HBASE-5206:
---

Patch v2 looks good.
Minor comments:
{code}
 // Call to undisableTable does this. TODO: Make a more formal purge table.
-am.getZKTable().setEnabledTable(Bytes.toString(tableName));
+am.getZKTable().setDeletedTable(Bytes.toString(tableName));
{code}
I don't see undisableTable. Can we remove the comment above ?
{code}
+  } else if (!this.zkTable
+  .isEnabledTable(region.getTableNameAsString())) {
+setEnabledTable(region);
{code}
setEnabledTable(HRegionInfo hri) already calls zkTable.isEnabledTable(). It 
seems we can call setEnabledTable(region) directly above.
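
The point about the redundant check can be sketched as follows. This is a minimal illustration, not the AssignmentManager or ZKTable code: EnabledTableSketch and its writes counter are hypothetical, and the only assumption taken from the comment is that the setter already checks the enabled state itself.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for the table-state holder; not the real ZKTable. */
class EnabledTableSketch {
  private final Set<String> enabled = new HashSet<>();
  private int writes; // counts how often state was actually changed

  boolean isEnabledTable(String tableName) {
    return enabled.contains(tableName);
  }

  /** Already guards on isEnabledTable(), so callers need no outer check. */
  void setEnabledTable(String tableName) {
    if (!isEnabledTable(tableName)) {
      enabled.add(tableName);
      writes++;
    }
  }

  int writes() {
    return writes;
  }
}
```

Because the setter is idempotent, calling it unconditionally changes state at most once per table, which is why the outer isEnabledTable() test adds nothing.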

 Port HBASE-5155 to 0.92 and TRUNK
 -

 Key: HBASE-5206
 URL: https://issues.apache.org/jira/browse/HBASE-5206
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.2, 0.96.0
Reporter: Zhihong Yu
 Attachments: 5206_92_1.patch, 5206_92_latest_1.patch, 
 5206_trunk_1.patch, 5206_trunk_latest_1.patch


 This JIRA ports HBASE-5155 (ServerShutDownHandler And Disable/Delete should 
 not happen parallely leading to recreation of regions that were deleted) to 
 0.92 and TRUNK


[jira] [Commented] (HBASE-5206) Port HBASE-5155 to 0.92 and TRUNK

2012-03-12 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227669#comment-13227669
 ] 

Ted Yu commented on HBASE-5206:
---

{code}
+  String errorMsg = "Unable to ensure that the table " + tableName
+  + "will be" + " enabled because of a ZooKeeper issue";
{code}
A space should be added between " and will.
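
The requested fix amounts to a leading space in the "will be" literal. A minimal sketch, assuming the string literals implied by the review comment rather than copied from the patch; ErrorMessageSketch and message() are hypothetical names:

```java
/** Hypothetical helper showing the corrected concatenation. */
class ErrorMessageSketch {
  static String message(String tableName) {
    // The leading space in " will be" keeps the table name and "will" apart.
    return "Unable to ensure that the table " + tableName
        + " will be" + " enabled because of a ZooKeeper issue";
  }
}
```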


[jira] [Commented] (HBASE-5206) Port HBASE-5155 to 0.92 and TRUNK

2012-03-12 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227688#comment-13227688
 ] 

Ted Yu commented on HBASE-5206:
---

The following error is reproducible on MacBook (patch for 0.92):
{code}
Tests in error: 
  org.apache.hadoop.hbase.TestDrainingServer: 
org.apache.hadoop.hbase.TableNotEnabledException: t
{code}


[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread Ted Yu (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227892#comment-13227892
]

Ted Yu commented on HBASE-4608:
---

Introducing WAL_VERSION would imply that we may change aspects of HLog other than
compression in the future.
Is there a plan for that?
Having another compression type is nice but requires making HLogKey persistence
pluggable.

I think it would be better to introduce one meta entry instead of two.

HLog Compression

Key: HBASE-4608
URL: https://issues.apache.org/jira/browse/HBASE-4608
Project: HBase
Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
Fix For: 0.94.0

Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt,
4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt,
4608v18.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt

The current bottleneck to HBase write speed is replicating the WAL appends
across different datanodes. We can speed up this process by compressing the
HLog. Current plan involves using a dictionary to compress table name, region
id, cf name, and possibly other bits of repeated data. Also, HLog format may
be changed in other ways to produce a smaller HLog.

[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227901#comment-13227901
 ] 

Ted Yu commented on HBASE-4608:
---

bq. try a paragraph of text going in and out
LRUDictionary deals with byte array:
{code}
  public short findEntry(byte[] data, int offset, int length) {
{code}
In this regard, piping text into the dictionary is functionally the same as 
piping in the byte[] form of an integer.
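
To illustrate why the input type does not matter, here is a minimal sketch of a dictionary keyed purely on byte ranges. It is an assumption-laden stand-in, not the LRUDictionary from the patch: ByteDictionarySketch, addEntry() and NOT_IN_DICTIONARY are hypothetical, eviction is omitted, and only the findEntry(byte[], int, int) signature comes from the comment above.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical byte-range dictionary without eviction; not LRUDictionary. */
class ByteDictionarySketch {
  static final short NOT_IN_DICTIONARY = -1;

  // ByteBuffer compares and hashes by remaining content, so it works as a key.
  private final Map<ByteBuffer, Short> indexByBytes = new HashMap<>();

  /** Returns the index of the given byte range, or NOT_IN_DICTIONARY. */
  public short findEntry(byte[] data, int offset, int length) {
    Short index = indexByBytes.get(ByteBuffer.wrap(data, offset, length));
    return index == null ? NOT_IN_DICTIONARY : index;
  }

  /** Adds the byte range if absent and returns its index. */
  public short addEntry(byte[] data, int offset, int length) {
    short existing = findEntry(data, offset, length);
    if (existing != NOT_IN_DICTIONARY) {
      return existing;
    }
    // Copy the range so later changes to the caller's array do not affect the key.
    byte[] copy = Arrays.copyOfRange(data, offset, offset + length);
    short index = (short) indexByBytes.size();
    indexByBytes.put(ByteBuffer.wrap(copy), index);
    return index;
  }
}
```

Text encoded as UTF-8 and an integer serialized to four bytes take exactly the same path through findEntry(); the dictionary never learns which one it was given.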


[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread Ted Yu (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227946#comment-13227946
]

Ted Yu commented on HBASE-4608:
---

I think WAL_VERSION metadata is orthogonal to compression type metadata and I
would expect both to be present in new HLog files written with this feature.
Say we define WAL_VERSION as v2 which has WAL compression capability. We still
need to check compression type metadata before applying dictionary compression.
In this regard adding WAL_VERSION seems to be redundant.
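
The redundancy argument can be sketched as a reader-side check. This is only an illustration: the metadata is modeled as a plain map, and the key and value names (WAL_VERSION, COMPRESSION_TYPE, DICTIONARY) are hypothetical placeholders, not the names used in the patch.

```java
import java.util.Map;

/** Hypothetical reader-side check; key and value names are placeholders. */
class WalMetadataSketch {
  static final String VERSION_KEY = "WAL_VERSION";          // hypothetical key
  static final String COMPRESSION_KEY = "COMPRESSION_TYPE"; // hypothetical key
  static final String DICTIONARY = "DICTIONARY";            // hypothetical value

  /**
   * Decides whether dictionary compression applies. The version entry is
   * deliberately not consulted: even a v2 file must still state its
   * compression type, so that entry alone settles the question.
   */
  static boolean usesDictionaryCompression(Map<String, String> metadata) {
    return DICTIONARY.equals(metadata.get(COMPRESSION_KEY));
  }
}
```

Under these assumptions a file marked as v2 but lacking a compression-type entry is read uncompressed, which is the case the comment uses to argue that the version entry carries no extra information for this decision.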


[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227954#comment-13227954
 ] 

Ted Yu commented on HBASE-4608:
---

bq. Should the Compression class in wal package ...
I only see KeyValueCompression.java under wal package. Please elaborate which 
class should carry more comments.


[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227961#comment-13227961
 ] 

Ted Yu commented on HBASE-4608:
---

Uploaded v23 onto review board.
After WAL version metadata design is finalized, will add that.


[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-12 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227984#comment-13227984
 ] 

Ted Yu commented on HBASE-4608:
---

For code specific review, please use https://reviews.apache.org/r/4185/ where 
there would be context.

I can add WAL_VERSION as v2 in the metadata.
My question is: would HLog v2 be allowed not to compress Log entries ?

If desirable, we can discuss in more detail, face to face, on the 27th.


[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble

2012-03-12 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13227997#comment-13227997
 ] 

Ted Yu commented on HBASE-5399:
---

TestAtomicOperation failed in latest TRUNK build:
https://builds.apache.org/job/HBase-TRUNK/2676/testReport/org.apache.hadoop.hbase.regionserver/TestAtomicOperation/testMultiRowMutationMultiThreads/

Similar failure shows up in the latest Hadoop QA run of HBASE-5542

 Cut the link between the client and the zookeeper ensemble
 --

 Key: HBASE-5399
 URL: https://issues.apache.org/jira/browse/HBASE-5399
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.94.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 0.96.0

 Attachments: 5399.v27.patch, 5399.v38.patch, 5399.v39.patch, 
 5399.v40.patch, 5399.v41.patch, 5399.v42.patch, 5399.v42.patch, 
 5399.v42.patch, 5399.v42.patch, 5399_inprogress.patch, 
 5399_inprogress.v14.patch, 5399_inprogress.v16.patch, 
 5399_inprogress.v18.patch, 5399_inprogress.v20.patch, 
 5399_inprogress.v21.patch, 5399_inprogress.v23.patch, 
 5399_inprogress.v3.patch, 5399_inprogress.v32.patch, 
 5399_inprogress.v9.patch, nochange.patch


 The link is often considered an issue, for various reasons. One of them 
 is that there is a limit on the number of connections that ZK can manage. 
 Stack also suggested removing the link to master from HConnection.
 There are choices to be made considering the existing API (that we don't want 
 to break).
 The first patches I will submit on hadoop-qa should not be committed: they 
 are here to show the progress on the direction taken.
 ZooKeeper is used for:
 - public getter, to let the client do whatever he wants, and close ZooKeeper 
 when closing the connection => we have to deprecate this but keep it.
 - read the master address to create a master => now done with a temporary 
 zookeeper connection
 - read root location => now done with a temporary zookeeper connection, but 
 questionable. Used in public function locateRegion. To be reworked.
 - read cluster id => now done once with a temporary zookeeper connection.
 - check if base node is available => now done once with a zookeeper 
 connection given as a parameter
 - isTableDisabled/isTableAvailable => public functions, now done with a 
 temporary zookeeper connection.
  - Called internally from HBaseAdmin and HTable
 - getCurrentNrHRS(): public function to get the number of region servers and 
 create a pool of threads => now done with a temporary zookeeper connection
 Master is used for:
 - getMaster public getter, as for ZooKeeper => we have to deprecate this but 
 keep it.
 - isMasterRunning(): public function, used internally by HMerge & HBaseAdmin
 - getHTableDescriptor*: public functions offering access to the master. => 
 we could make them use a temporary master connection as well.
 Main points are:
 - hbase class for ZooKeeper; ZooKeeperWatcher is really designed for a 
 strongly coupled architecture ;-). This can be changed, but requires a lot of 
 modifications in these classes (likely adding a class in the middle of the 
 hierarchy, something like that). Anyway, a non-connected client will always 
 be much slower, because establishing a TCP connection is slow.
 - having a link between ZK and all the clients seems to make sense for some 
 use cases. However, it won't scale if a TCP connection is required for every 
 client.
 - if we move the table descriptor part away from the client, we need to find 
 a new place for it.
 - we will have the same issue in HBaseAdmin (for both ZK & Master); maybe we 
 can put a timeout on the connection. That would make the whole system less 
 deterministic, however.


[jira] [Commented] (HBASE-5399) Cut the link between the client and the zookeeper ensemble

2012-03-12 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228009#comment-13228009
 ] 

Ted Yu commented on HBASE-5399:
---

From test output:
{code}
Exception in thread "Thread-211" junit.framework.AssertionFailedError
    at junit.framework.Assert.fail(Assert.java:48)
    at junit.framework.Assert.fail(Assert.java:56)
    at org.apache.hadoop.hbase.regionserver.TestAtomicOperation$2.run(TestAtomicOperation.java:392)
{code}
Here is related code in test:
{code}
  if (r.size() != 1) {
    LOG.debug(r);
    failures.incrementAndGet();
    fail();
  }
{code}


[jira] [Commented] (HBASE-5201) Add TThreadedSelectorServer and remove repeat codes in ThriftServer and HRegionThriftServer

2012-01-15 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186520#comment-13186520
 ] 

Ted Yu commented on HBASE-5201:
---

@Scott:
I think you need to mark your work complete before seeing the 'Submit Patch' 
button.
After you click 'Submit Patch', future attachments would be picked up by Hadoop 
QA.

 Add TThreadedSelectorServer and remove repeat codes in ThriftServer and 
 HRegionThriftServer
 ---

 Key: HBASE-5201
 URL: https://issues.apache.org/jira/browse/HBASE-5201
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Scott Chen
Assignee: Scott Chen
 Fix For: 0.94.0

 Attachments: HBASE-5201-v2.txt, HBASE-5201-v3.txt, HBASE-5201.txt


 TThreadedSelectorServer is good for RPC-heavy situations because IO is not 
 limited to one CPU. See
 https://issues.apache.org/jira/browse/Thrift-1167
 I am porting the related classes from thrift trunk (they are not in 
 thrift-0.7.0).
 There is a lot of repeated code in ThriftServer and HRegionThriftServer.
 This code is now moved to a Runnable called ThriftServerRunner.


[jira] [Commented] (HBASE-5174) Coalesce aborted tasks in the TaskMonitor

2012-01-15 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186527#comment-13186527
 ] 

Ted Yu commented on HBASE-5174:
---

A slight variation of my previous proposal:
MonitoredTaskImpl can maintain a Map<String, MonitoredTask> where the String 
key is the description passed to TaskMonitor.createStatus(), prepended with 
the MonitoredTask.State and a separator string (such as '||').

A task may have two entries in the map, one starting with 'ABORTED', the other 
starting with 'COMPLETE'. This corresponds to task retries.
Special handling would be added to MonitoredTaskImpl.setState().
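
The keying scheme proposed above can be sketched as follows. This is a minimal illustration under stated assumptions, not HBase code: TaskCoalescingSketch, its State enum, Entry and record() are hypothetical, and an occurrence counter stands in for whatever a real MonitoredTask would carry.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch; not the MonitoredTaskImpl or TaskMonitor in HBase. */
class TaskCoalescingSketch {
  enum State { RUNNING, COMPLETE, ABORTED }

  static final String SEPARATOR = "||";

  /** One coalesced entry: how many tasks shared this state and description. */
  static final class Entry {
    final State state;
    final String description;
    int occurrences;

    Entry(State state, String description) {
      this.state = state;
      this.description = description;
    }
  }

  private final Map<String, Entry> entries = new LinkedHashMap<>();

  /** Builds the map key: state, separator, then the task description. */
  static String key(State state, String description) {
    return state + SEPARATOR + description;
  }

  /** Records a task, folding repeats of the same state and description. */
  Entry record(State state, String description) {
    Entry entry = entries.computeIfAbsent(
        key(state, description), k -> new Entry(state, description));
    entry.occurrences++;
    return entry;
  }

  int size() {
    return entries.size();
  }
}
```

With this keying, a flush that is aborted many times and finally completes occupies two entries, one under ABORTED and one under COMPLETE, instead of one entry per attempt.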

 Coalesce aborted tasks in the TaskMonitor
 -

 Key: HBASE-5174
 URL: https://issues.apache.org/jira/browse/HBASE-5174
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
 Fix For: 0.94.0, 0.92.1


 Some tasks, such as flushing while a split is in progress, can get canceled 
 repeatedly. In the logs it looks like this:
 {noformat}
 2012-01-10 19:28:29,164 INFO 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region 
 test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. due to global heap 
 pressure
 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 NOT flushing memstore for region 
 test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, 
 writesEnabled=false
 2012-01-10 19:28:29,164 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=1.6g
 2012-01-10 19:28:29,164 INFO 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region 
 test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. due to global heap 
 pressure
 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 NOT flushing memstore for region 
 test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, 
 writesEnabled=false
 2012-01-10 19:28:29,164 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=1.6g
 2012-01-10 19:28:29,164 INFO 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region 
 test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. due to global heap 
 pressure
 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 NOT flushing memstore for region 
 test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, 
 writesEnabled=false
 {noformat}
 But in the TaskMonitor UI you'll get MAX_TASKS (1000) displayed on top of the 
 regions. Basically 1000x:
 {noformat}
 Tue Jan 10 19:28:29 UTC 2012  Flushing 
 test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. ABORTED (since 31sec 
 ago)   Not flushing since writes not enabled (since 31sec ago)
 {noformat}
 It's ugly and I'm sure some users will freak out seeing this, plus you have 
 to scroll down all the way to see your regions. Coalescing consecutive 
 aborted tasks seems like a good solution.
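The coalescing idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: TaskEntry and TaskCoalescer are hypothetical stand-ins, not the real TaskMonitor classes, and "identical" is assumed to mean same description, same status and state ABORTED.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a monitored task; not the real HBase class. */
final class TaskEntry {
  final String description;
  final String state;   // e.g. "ABORTED" or "RUNNING"
  final String status;
  int repeatCount = 1;  // how many consecutive identical tasks this entry stands for

  TaskEntry(String description, String state, String status) {
    this.description = description;
    this.state = state;
    this.status = status;
  }

  /** True when both entries are ABORTED and carry the same description and status. */
  boolean sameAbortedTaskAs(TaskEntry other) {
    return "ABORTED".equals(state)
        && state.equals(other.state)
        && description.equals(other.description)
        && status.equals(other.status);
  }
}

final class TaskCoalescer {
  /** Collapses each run of identical consecutive ABORTED tasks into one entry with a count. */
  static List<TaskEntry> coalesce(List<TaskEntry> tasks) {
    List<TaskEntry> out = new ArrayList<TaskEntry>();
    for (TaskEntry t : tasks) {
      TaskEntry last = out.isEmpty() ? null : out.get(out.size() - 1);
      if (last != null && last.sameAbortedTaskAs(t)) {
        last.repeatCount++;
      } else {
        out.add(new TaskEntry(t.description, t.state, t.status));
      }
    }
    return out;
  }
}
```

With something like this, the UI would show one row with a repeat count instead of 1000 identical rows, and the regions would stay visible without scrolling.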

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4951) master process can not be stopped when it is initializing

2011-12-12 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13167581#comment-13167581
 ] 

Ted Yu commented on HBASE-4951:
---

+1 on patch. 

 master process can not be stopped when it is initializing
 -

 Key: HBASE-4951
 URL: https://issues.apache.org/jira/browse/HBASE-4951
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.3
Reporter: xufeng
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.90.6

 Attachments: HBASE-4951.patch


 It is easy to reproduce with the following steps:
 step 1: start the master process (do not start any regionserver process in the cluster).
 The master will wait for the regionserver(s) to check in:
 org.apache.hadoop.hbase.master.ServerManager: Waiting on regionserver(s) to 
 checkin
 step 2: stop the master with the shell command bin/hbase master stop
 result: the master process never dies because the catalogTracker.waitForRoot() 
 method blocks until the root region is assigned.


[jira] [Commented] (HBASE-5001) Improve the performance of block cache keys

2011-12-12 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13167947#comment-13167947
 ] 

Ted Yu commented on HBASE-5001:
---

In LruBlockCache, please change the javadoc for the cacheKey parameter. 

 Improve the performance of block cache keys
 ---

 Key: HBASE-5001
 URL: https://issues.apache.org/jira/browse/HBASE-5001
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Priority: Minor
 Fix For: 0.94.0

 Attachments: 5001-v1.txt


 Doing a pure random read test on data that's 100% in the block cache, I see that we 
 are spending quite some time in getBlockCacheKey:
 {quote}
 IPC Server handler 19 on 62023 daemon prio=10 tid=0x7fe0501ff800 
 nid=0x6c87 runnable [0x7fe0577f6000]
java.lang.Thread.State: RUNNABLE
   at java.util.Arrays.copyOf(Arrays.java:2882)
   at 
 java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
   at 
 java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
   at java.lang.StringBuilder.append(StringBuilder.java:119)
   at 
 org.apache.hadoop.hbase.io.hfile.HFile.getBlockCacheKey(HFile.java:457)
   at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:249)
   at 
 org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.seekToDataBlock(HFileBlockIndex.java:209)
   at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.seekTo(HFileReaderV2.java:521)
   at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.seekTo(HFileReaderV2.java:536)
   at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:178)
   at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:111)
   at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekExactly(StoreFileScanner.java:219)
   at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.init(StoreScanner.java:80)
   at 
 org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:1689)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.init(HRegion.java:2857)
 {quote}
 Since the HFile name size is known and the offset is a long, it should be 
 possible to allocate exactly what we need. Maybe use byte[] as the key and 
 drop the separator too.
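One way to avoid the StringBuilder work on the read path can be sketched as below. Note that this swaps in a small composite key object rather than the byte[] key suggested above, and that BlockKey is a hypothetical name used only for illustration.

```java
/** Hypothetical block cache key: HFile name plus block offset, with no string concatenation. */
final class BlockKey {
  private final String hfileName;
  private final long offset;

  BlockKey(String hfileName, long offset) {
    this.hfileName = hfileName;
    this.offset = offset;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof BlockKey)) {
      return false;
    }
    BlockKey other = (BlockKey) o;
    return offset == other.offset && hfileName.equals(other.hfileName);
  }

  @Override
  public int hashCode() {
    // String caches its own hash code, so a lookup allocates nothing here.
    return 31 * hfileName.hashCode() + (int) (offset ^ (offset >>> 32));
  }

  @Override
  public String toString() {
    // Built only on demand (for example for logging), not on every read.
    return hfileName + "_" + offset;
  }
}
```

The design choice is that a lookup costs one small object allocation and no array copying, while the string form is still available when needed.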


[jira] [Commented] (HBASE-4994) TestHeapSize broke in trunk

2011-12-09 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13166405#comment-13166405
 ] 

Ted Yu commented on HBASE-4994:
---

Thanks for fixing this. 
Hadoop QA didn't report this.

 TestHeapSize broke in trunk
 ---

 Key: HBASE-4994
 URL: https://issues.apache.org/jira/browse/HBASE-4994
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 0.92.0

 Attachments: heapsize.txt


 This commit added Map to HRegion
 {code}
 commit 888d73a9f5fe907f7c616211322fff339eeaa446
 Author: Zhihong Yu te...@apache.org
 Date:   Fri Dec 9 06:01:58 2011 +
 HBASE-4946  HTable.coprocessorExec (and possibly coprocessorProxy) does 
 not work with
dynamically loaded coprocessors (Andrei Dragomir)
 {code}


[jira] [Commented] (HBASE-4880) Region is on service before openRegionHandler completes, may cause data loss

2011-12-08 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13165837#comment-13165837
 ] 

Ted Yu commented on HBASE-4880:
---

+1 on patch v4.

 Region is on service before openRegionHandler completes, may cause data loss
 

 Key: HBASE-4880
 URL: https://issues.apache.org/jira/browse/HBASE-4880
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: 4880.txt, hbase-4880.patch, hbase-4880v2.patch, 
 hbase-4880v3.patch, hbase-4880v4.patch


 OpenRegionHandler in the regionserver proceeds through the following steps:
 {code}
 1.openregion()(Through it, closed = false, closing = false)
 2.addToOnlineRegions(region)
 3.update .meta. table 
 4.update ZK's node state to RS_ZK_REGION_OPENED
 {code}
 We can see that the region is in service before step 4.
 This means a client could put data to this region after step 3.
 What happens if step 4 fails?
 OpenRegionHandler#cleanupFailedOpen is executed, which closes the 
 region, and the master assigns this region to another regionserver.
 If closing the region fails, the data put between step 3 and step 4 
 may be lost, because the region has been opened on another regionserver and 
 has received new data. Therefore, the data may not be recovered through 
 replayRecoveredEdit() because the edit's LogSeqId is smaller than the current 
 region SeqId.


[jira] [Commented] (HBASE-4946) HTable.coprocessorExec (and possibly coprocessorProxy) does not work with dynamically loaded coprocessors (from hdfs or local system), because the RPC system tries to d

2011-12-08 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13165839#comment-13165839
 ] 

Ted Yu commented on HBASE-4946:
---

Will commit patch v4 tomorrow.

 HTable.coprocessorExec (and possibly coprocessorProxy) does not work with 
 dynamically loaded coprocessors (from hdfs or local system), because the RPC 
 system tries to deserialize an unknown class. 
 -

 Key: HBASE-4946
 URL: https://issues.apache.org/jira/browse/HBASE-4946
 Project: HBase
  Issue Type: Bug
  Components: coprocessors
Affects Versions: 0.92.0
Reporter: Andrei Dragomir
Assignee: Andrei Dragomir
 Attachments: 4946-v4.txt, HBASE-4946-v2.patch, HBASE-4946-v3.patch, 
 HBASE-4946.patch


 Loading coprocessor jars from hdfs works fine. I load it from the shell, 
 after setting the attribute, and it gets loaded:
 {noformat}
 INFO org.apache.hadoop.hbase.regionserver.HRegion: Setting up tabledescriptor 
 config now ...
 INFO org.apache.hadoop.hbase.coprocessor.CoprocessorHost: Class 
 com.MyCoprocessorClass needs to be loaded from a file - 
 hdfs://localhost:9000/coproc/rt-  0.0.1-SNAPSHOT.jar.
 INFO org.apache.hadoop.hbase.coprocessor.CoprocessorHost: loadInstance: 
 com.MyCoprocessorClass
 INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: 
 RegionEnvironment createEnvironment
 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Registered protocol 
 handler: region=t1,,1322572939753.6409aee1726d31f5e5671a59fe6e384f. 
 protocol=com.MyCoprocessorClassProtocol
 INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Load 
 coprocessor com.MyCoprocessorClass from HTD of t1 successfully.
 {noformat}
 The problem is that this coprocessor simply extends BaseEndpointCoprocessor, 
 with a dynamic method. When calling this method from the client with 
 HTable.coprocessorExec, I get errors on the HRegionServer, because the call 
 cannot be deserialized from writables. 
 The problem is that Exec tries to do an early resolve of the coprocessor 
 class. The coprocessor class is loaded, but it is in the context of the 
 HRegionServer / HRegion. So, the call fails:
 {noformat}
 2011-12-02 00:34:17,348 ERROR org.apache.hadoop.hbase.io.HbaseObjectWritable: 
 Error in readFields
 java.io.IOException: Protocol class com.MyCoprocessorClassProtocol not found
   at org.apache.hadoop.hbase.client.coprocessor.Exec.readFields(Exec.java:125)
   at 
 org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:575)
   at org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:105)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1237)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1167)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:703)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:495)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:470)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:680)
 Caused by: java.lang.ClassNotFoundException: com.MyCoprocessorClassProtocol
   at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Class.java:247)
   at 
 org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:943)
   at org.apache.hadoop.hbase.client.coprocessor.Exec.readFields(Exec.java:122)
   ... 10 more
 {noformat}
 Probably the correct way to fix this is to make Exec really smart, so that it 
 knows all the class definitions loaded in CoprocessorHost(s).
 I created a small patch that simply doesn't resolve the class definition in 
 the Exec, instead passing it as string down to the HRegion layer. This layer 
 knows all the definitions, and simply loads it by name. 
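The approach described above (carry the protocol as a string and resolve it only at the layer that knows the loaded classes) might look roughly like the sketch below. ExecCall and RegionProtocolRegistry are hypothetical stand-ins for illustration; they are not the real Exec or HRegion classes and do not reflect the actual patch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the RPC payload: it carries the protocol by name only. */
final class ExecCall {
  final String protocolName;
  final String methodName;

  ExecCall(String protocolName, String methodName) {
    this.protocolName = protocolName;
    this.methodName = methodName;
  }
}

/** Hypothetical stand-in for the region layer, which knows the dynamically loaded handlers. */
final class RegionProtocolRegistry {
  private final Map<String, Object> handlersByProtocolName = new HashMap<String, Object>();

  /** Called when a coprocessor is loaded and its protocol handler is registered. */
  void register(String protocolName, Object handler) {
    handlersByProtocolName.put(protocolName, handler);
  }

  /** Resolves the handler at call time, where the coprocessor classes are visible. */
  Object resolve(ExecCall call) {
    Object handler = handlersByProtocolName.get(call.protocolName);
    if (handler == null) {
      throw new IllegalArgumentException(
          "No handler registered for protocol " + call.protocolName);
    }
    return handler;
  }
}
```

Because deserialization only reads a string, the RPC layer never needs to load a class that lives in the coprocessor's class loader.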


[jira] [Commented] (HBASE-4120) isolation and allocation

2011-12-08 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13165857#comment-13165857
 ] 

Ted Yu commented on HBASE-4120:
---

Please add license to QosRegionObserver.java

I think we should enhance ScannerListener.leaseExpired() with 
pre/postScannerClose() so that features such as table priority can receive 
consistent notification.
The trick here is that RegionScanner only exposes HRegionInfo. We should be 
able to utilize onlineRegions in looking up HRegion by region name.
Then we should be able to call the following:
{code}
if (region != null && region.getCoprocessorHost() != null) {
  region.getCoprocessorHost().postScannerClose(s);
}
{code}

 isolation and allocation
 

 Key: HBASE-4120
 URL: https://issues.apache.org/jira/browse/HBASE-4120
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver
Affects Versions: 0.90.2, 0.90.3, 0.90.4, 0.92.0
Reporter: Liu Jia
Assignee: Liu Jia
 Fix For: 0.94.0

 Attachments: Design_document_for_HBase_isolation_and_allocation.pdf, 
 Design_document_for_HBase_isolation_and_allocation_Revised.pdf, 
 HBase_isolation_and_allocation_user_guide.pdf, 
 Performance_of_Table_priority.pdf, System Structure.jpg, TablePriority.patch, 
 TablePriority_v12.patch, TablePriority_v12.patch, TablePriority_v8.patch, 
 TablePriority_v8.patch, TablePriority_v8_for_trunk.patch, 
 TablePrioriy_v9.patch


 The HBase isolation and allocation tool is designed to help users manage 
 cluster resources among different applications and tables.
 When a large HBase cluster has many applications running on it, many 
 problems arise. At Taobao there is a cluster that many departments use to 
 test the performance of their HBase-based applications. With one 12-server 
 cluster, only one application can run exclusively on the cluster at a time, 
 and the other applications must wait until the previous test finishes.
 After we add the allocation management function to the cluster, applications 
 can share the cluster and run concurrently. Also, if the test engineer wants 
 to make sure there is no interference, he/she can move other tables out of 
 this group.
 Within a group we use table priority to allocate resources; when the system 
 is busy, we can make sure high-priority tables are not affected by 
 lower-priority tables.
 Different groups can have different region server configurations: groups 
 optimized for reading can have a large block cache, and groups optimized 
 for writing can have a large memstore. 
 Tables and region servers can be moved easily between groups; after changing 
 the configuration, a group can be restarted alone instead of restarting the 
 whole cluster.
 git entry : https://github.com/ICT-Ope/HBase_allocation .
 We hope our work is helpful.


[jira] [Commented] (HBASE-4946) HTable.coprocessorExec (and possibly coprocessorProxy) does not work with dynamically loaded coprocessors (from hdfs or local system), because the RPC system tries to d

2011-12-08 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13165870#comment-13165870
 ] 

Ted Yu commented on HBASE-4946:
---

Integrated to 0.92 and TRUNK.

Thanks for the patch Andrei.

Thanks for the review Lars and Stack.

 HTable.coprocessorExec (and possibly coprocessorProxy) does not work with 
 dynamically loaded coprocessors (from hdfs or local system), because the RPC 
 system tries to deserialize an unknown class. 
 -

 Key: HBASE-4946
 URL: https://issues.apache.org/jira/browse/HBASE-4946
 Project: HBase
  Issue Type: Bug
  Components: coprocessors
Affects Versions: 0.92.0
Reporter: Andrei Dragomir
Assignee: Andrei Dragomir
 Fix For: 0.92.0, 0.94.0

 Attachments: 4946-v4.txt, 4946-v5.txt, HBASE-4946-v2.patch, 
 HBASE-4946-v3.patch, HBASE-4946.patch


 Loading coprocessor jars from hdfs works fine. I load it from the shell, 
 after setting the attribute, and it gets loaded:
 {noformat}
 INFO org.apache.hadoop.hbase.regionserver.HRegion: Setting up tabledescriptor 
 config now ...
 INFO org.apache.hadoop.hbase.coprocessor.CoprocessorHost: Class 
 com.MyCoprocessorClass needs to be loaded from a file - 
 hdfs://localhost:9000/coproc/rt-  0.0.1-SNAPSHOT.jar.
 INFO org.apache.hadoop.hbase.coprocessor.CoprocessorHost: loadInstance: 
 com.MyCoprocessorClass
 INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: 
 RegionEnvironment createEnvironment
 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Registered protocol 
 handler: region=t1,,1322572939753.6409aee1726d31f5e5671a59fe6e384f. 
 protocol=com.MyCoprocessorClassProtocol
 INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Load 
 coprocessor com.MyCoprocessorClass from HTD of t1 successfully.
 {noformat}
 The problem is that this coprocessor simply extends BaseEndpointCoprocessor, 
 with a dynamic method. When calling this method from the client with 
 HTable.coprocessorExec, I get errors on the HRegionServer, because the call 
 cannot be deserialized from writables. 
 The problem is that Exec tries to do an early resolve of the coprocessor 
 class. The coprocessor class is loaded, but it is in the context of the 
 HRegionServer / HRegion. So, the call fails:
 {noformat}
 2011-12-02 00:34:17,348 ERROR org.apache.hadoop.hbase.io.HbaseObjectWritable: 
 Error in readFields
 java.io.IOException: Protocol class com.MyCoprocessorClassProtocol not found
   at org.apache.hadoop.hbase.client.coprocessor.Exec.readFields(Exec.java:125)
   at 
 org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:575)
   at org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:105)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1237)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1167)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:703)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:495)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:470)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:680)
 Caused by: java.lang.ClassNotFoundException: com.MyCoprocessorClassProtocol
   at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Class.java:247)
   at 
 org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:943)
   at org.apache.hadoop.hbase.client.coprocessor.Exec.readFields(Exec.java:122)
   ... 10 more
 {noformat}
 Probably the correct way to fix this is to make Exec really smart, so that it 
 knows all the class definitions loaded in CoprocessorHost(s).
 I created a small patch that simply doesn't resolve the class definition in 
 the Exec, instead passing it as string down to the HRegion layer. This layer 
 knows all the definitions, and simply loads it by name. 


[jira] [Commented] (HBASE-4120) isolation and allocation

2011-12-08 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13165876#comment-13165876
 ] 

Ted Yu commented on HBASE-4120:
---

In PriorityFunction.java, several lines are much wider than 80 characters.
Please use the formatter from HBASE-3678 on the new files.

 isolation and allocation
 

 Key: HBASE-4120
 URL: https://issues.apache.org/jira/browse/HBASE-4120
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver
Affects Versions: 0.90.2, 0.90.3, 0.90.4, 0.92.0
Reporter: Liu Jia
Assignee: Liu Jia
 Fix For: 0.94.0

 Attachments: Design_document_for_HBase_isolation_and_allocation.pdf, 
 Design_document_for_HBase_isolation_and_allocation_Revised.pdf, 
 HBase_isolation_and_allocation_user_guide.pdf, 
 Performance_of_Table_priority.pdf, System Structure.jpg, TablePriority.patch, 
 TablePriority_v12.patch, TablePriority_v12.patch, TablePriority_v8.patch, 
 TablePriority_v8.patch, TablePriority_v8_for_trunk.patch, 
 TablePrioriy_v9.patch


 The HBase isolation and allocation tool is designed to help users manage 
 cluster resources among different applications and tables.
 When a large HBase cluster has many applications running on it, many 
 problems arise. At Taobao there is a cluster that many departments use to 
 test the performance of their HBase-based applications. With one 12-server 
 cluster, only one application can run exclusively on the cluster at a time, 
 and the other applications must wait until the previous test finishes.
 After we add the allocation management function to the cluster, applications 
 can share the cluster and run concurrently. Also, if the test engineer wants 
 to make sure there is no interference, he/she can move other tables out of 
 this group.
 Within a group we use table priority to allocate resources; when the system 
 is busy, we can make sure high-priority tables are not affected by 
 lower-priority tables.
 Different groups can have different region server configurations: groups 
 optimized for reading can have a large block cache, and groups optimized 
 for writing can have a large memstore. 
 Tables and region servers can be moved easily between groups; after changing 
 the configuration, a group can be restarted alone instead of restarting the 
 whole cluster.
 git entry : https://github.com/ICT-Ope/HBase_allocation .
 We hope our work is helpful.


[jira] [Commented] (HBASE-4970) Add a parameter to change keepAliveTime of Htable thread pool.

2011-12-07 Thread Ted Yu (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164274#comment-13164274
]

Ted Yu commented on HBASE-4970:
---

I think there shouldn't be upper case letters in the name of the new config.

Add a parameter to change keepAliveTime of Htable thread pool.
---

Key: HBASE-4970
URL: https://issues.apache.org/jira/browse/HBASE-4970
Project: HBase
Issue Type: Improvement
Components: client
Affects Versions: 0.90.4
Reporter: gaojinchao
Assignee: gaojinchao
Priority: Trivial
Fix For: 0.90.5

Attachments: HBASE-4970_Branch90.patch

In my cluster, I changed keepAliveTime from 60 s to 3600 s, and the growth of
RES slowed down.
Why does increasing the keepAliveTime of the HBase thread pool slow down our
problem [the increase of the RES value]?
You can go through the source of sun.nio.ch.Util. Every thread holds 3
soft references to direct buffers (mustangsrc) for reuse. The code calls these 3
soft references the buffercache. If all the buffers are occupied or none is
suitable in size when a new request comes in, a new direct buffer is allocated.
After the request is served, the bigger buffer replaces the smaller one in the
buffercache. The replaced buffer is released.
So I think we can add a parameter to change the keepAliveTime of the HTable
thread pool.
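A minimal sketch of such a parameter is shown below. It is an assumption-laden illustration: HTablePoolFactory is a hypothetical helper, java.util.Properties stands in for the HBase configuration object, and the all-lower-case key name is a placeholder rather than the name defined by the patch.

```java
import java.util.Properties;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper that builds a client thread pool with a configurable keep-alive. */
final class HTablePoolFactory {
  // Placeholder key, all lower case; the real name is whatever the patch defines.
  static final String KEEP_ALIVE_KEY = "hbase.htable.threads.keepalivetime";
  static final long DEFAULT_KEEP_ALIVE_SECONDS = 60L;

  static ThreadPoolExecutor create(Properties conf, int maxThreads) {
    // Fall back to the previous hard-coded 60 s when the key is not set.
    long keepAliveSeconds = Long.parseLong(
        conf.getProperty(KEEP_ALIVE_KEY, Long.toString(DEFAULT_KEEP_ALIVE_SECONDS)));
    // Idle threads above the core size are kept for keepAliveSeconds before exiting.
    return new ThreadPoolExecutor(
        1, maxThreads, keepAliveSeconds, TimeUnit.SECONDS, new SynchronousQueue<Runnable>());
  }
}
```

A longer keep-alive means worker threads, and the per-thread buffers they hold, are reused for longer instead of being recreated.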

[jira] [Commented] (HBASE-4224) Need a flush by regionserver rather than by table option

2011-12-07 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164599#comment-13164599
 ] 

Ted Yu commented on HBASE-4224:
---

@Akash:
Do you have a newer patch?
If so, please upload to this JIRA.

 Need a flush by regionserver rather than by table option
 

 Key: HBASE-4224
 URL: https://issues.apache.org/jira/browse/HBASE-4224
 Project: HBase
  Issue Type: Bug
  Components: shell
Reporter: stack
Assignee: Akash Ashok
 Attachments: HBase-4224.patch


 This evening I needed to clean out logs on the cluster.  Logs are kept per 
 regionserver.  To let go of logs, we need to have all edits flushed from 
 memory.  The only flush is by table or region.  We need to be able to flush 
 the regionserver.  Need to add this.


[jira] [Commented] (HBASE-4729) Clash between region unassign and splitting kills the master

2011-12-07 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164658#comment-13164658
 ] 

Ted Yu commented on HBASE-4729:
---

The HadoopQA report @ 
https://builds.apache.org/job/PreCommit-HBASE-Build/405//testReport/ showed 
basically no tests were run.

A manual test suite execution should have been performed.

 Clash between region unassign and splitting kills the master
 

 Key: HBASE-4729
 URL: https://issues.apache.org/jira/browse/HBASE-4729
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: stack
Priority: Critical
 Fix For: 0.92.0, 0.94.0

 Attachments: 4729-v2.txt, 4729-v3.txt, 4729-v4.txt, 4729-v5.txt, 
 4729-v6-092.txt, 4729-v6-trunk.txt, 4729.txt


 I was running an online alter while regions were splitting, and suddenly the 
 master died and left my table half-altered (haven't restarted the master yet).
 What killed the master:
 {quote}
 2011-11-02 17:06:44,428 FATAL org.apache.hadoop.hbase.master.HMaster: 
 Unexpected ZK exception creating node CLOSING
 org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
 NodeExists for /hbase/unassigned/f7e1783e65ea8d621a4bc96ad310f101
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:110)
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
 at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:459)
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:441)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndWatch(ZKUtil.java:769)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKAssign.createNodeClosing(ZKAssign.java:568)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1722)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1661)
 at org.apache.hadoop.hbase.master.BulkReOpen$1.run(BulkReOpen.java:69)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 {quote}
 A znode was created because the region server was splitting the region 4 
 seconds before:
 {quote}
 2011-11-02 17:06:40,704 INFO 
 org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of 
 region TestTable,0012469153,1320253135043.f7e1783e65ea8d621a4bc96ad310f101.
 2011-11-02 17:06:40,704 DEBUG 
 org.apache.hadoop.hbase.regionserver.SplitTransaction: 
 regionserver:62023-0x132f043bbde0710 Creating ephemeral node for 
 f7e1783e65ea8d621a4bc96ad310f101 in SPLITTING state
 2011-11-02 17:06:40,751 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:62023-0x132f043bbde0710 Attempting to transition node 
 f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to 
 RS_ZK_REGION_SPLITTING
 ...
 2011-11-02 17:06:44,061 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:62023-0x132f043bbde0710 Successfully transitioned node 
 f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to 
 RS_ZK_REGION_SPLIT
 2011-11-02 17:06:44,061 INFO 
 org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the 
 master to process the split for f7e1783e65ea8d621a4bc96ad310f101
 {quote}
 Now that the master is dead the region server is spewing those last two lines 
 like mad.


[jira] [Commented] (HBASE-4927) CatalogJanior:SplitParentFirstComparator doesn't sort as expected, for the last region when the endkey is empty

2011-12-07 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164695#comment-13164695
 ] 

Ted Yu commented on HBASE-4927:
---

Ran through the previously failing tests:
{code}
 mt -Dtest=TestMasterRestartAfterDisablingTable 
 mt -Dtest=TestOfflineMetaRebuildBase#testMetaRebuild
 mt -Dtest=TestOfflineMetaRebuildHole
{code}
They pass now.

Going to commit to 0.92 and TRUNK.

 CatalogJanior:SplitParentFirstComparator doesn't sort as expected, for the 
 last region when the endkey is empty
 ---

 Key: HBASE-4927
 URL: https://issues.apache.org/jira/browse/HBASE-4927
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Fix For: 0.92.0

 Attachments: 
 0001-Fixed-TestOffline-failure-caused-by-HBASE-4927.patch, 
 0001-Fixed-TestOffline-failure-caused-by-HBASE-4927.patch, 
 0001-HBASE-4927-CatalogJanior-SplitParentFirstComparator-.patch, 
 0001-HBASE-4927-CatalogJanior-SplitParentFirstComparator_v2.patch, 
 hbase-4927-fix-ws.txt


 When reviewing the HBASE-4238 backport, Jon found this issue.
 What happens if the split points are as follows? (An empty end key is the 
 last key; an empty start key is the first key.)
 Parent [A,)
 L daughter [A,B), 
 R daughter [B,)
 When sorted, we get to the end key comparison, which results in this 
 incorrect order:
 [A,B), [A,), [B,) 
 we wanted:
 [A,), [A,B), [B,)
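The wanted order above can be illustrated with a small comparator sketch. The assumptions are that keys are plain strings rather than byte arrays, and that KeyRange and ParentFirstComparator are hypothetical names; this is not the actual SplitParentFirstComparator code.

```java
import java.util.Comparator;

/** Hypothetical stand-in for a region's key range; an empty endKey means "last region". */
final class KeyRange {
  final String startKey;
  final String endKey;

  KeyRange(String startKey, String endKey) {
    this.startKey = startKey;
    this.endKey = endKey;
  }

  @Override
  public String toString() {
    return "[" + startKey + "," + endKey + ")";
  }
}

/** Orders by start key; on a tie, the wider range (the split parent) comes first. */
final class ParentFirstComparator implements Comparator<KeyRange> {
  @Override
  public int compare(KeyRange left, KeyRange right) {
    int byStart = left.startKey.compareTo(right.startKey);
    if (byStart != 0) {
      return byStart;
    }
    boolean leftIsLast = left.endKey.isEmpty();
    boolean rightIsLast = right.endKey.isEmpty();
    if (leftIsLast || rightIsLast) {
      // An empty end key is greater than every other end key, so that range sorts first.
      return leftIsLast == rightIsLast ? 0 : (leftIsLast ? -1 : 1);
    }
    // A larger end key means a wider range, so reverse the natural order.
    return right.endKey.compareTo(left.endKey);
  }
}
```

The special case for the empty end key is the point of the sketch: a plain end key comparison treats the empty key as the smallest value, which is what puts [A,B) ahead of [A,).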


[jira] [Commented] (HBASE-4927) CatalogJanior:SplitParentFirstComparator doesn't sort as expected, for the last region when the endkey is empty

2011-12-07 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164699#comment-13164699
 ] 

Ted Yu commented on HBASE-4927:
---

Also ran through the two tests in original patch:
{code}
mt -Dtest=TestHRegionInfo
mt -Dtest=TestCatalogJanitor
{code}
They passed as well.

Integrated to 0.92 and TRUNK.

Thanks for the addendum, Jimmy.

Thanks for the help, Jonathan.


[jira] [Commented] (HBASE-4120) isolation and allocation

2011-12-07 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164960#comment-13164960
 ] 

Ted Yu commented on HBASE-4120:
---

But preScannerClose() is only passed one scanner. 

 isolation and allocation
 

 Key: HBASE-4120
 URL: https://issues.apache.org/jira/browse/HBASE-4120
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver
Affects Versions: 0.90.2, 0.90.3, 0.90.4, 0.92.0
Reporter: Liu Jia
Assignee: Liu Jia
 Fix For: 0.94.0

 Attachments: Design_document_for_HBase_isolation_and_allocation.pdf, 
 Design_document_for_HBase_isolation_and_allocation_Revised.pdf, 
 HBase_isolation_and_allocation_user_guide.pdf, 
 Performance_of_Table_priority.pdf, System Structure.jpg, TablePriority.patch, 
 TablePriority_v12.patch, TablePriority_v12.patch, TablePriority_v8.patch, 
 TablePriority_v8.patch, TablePriority_v8_for_trunk.patch, 
 TablePrioriy_v9.patch


 The HBase isolation and allocation tool is designed to help users manage 
 cluster resources among different applications and tables.
 A large HBase cluster with many applications running on it runs into many 
 problems. At Taobao there is a cluster that many departments use to test the 
 performance of their HBase-based applications. On this 12-server cluster, 
 only one application can run exclusively at a time, and the other applications 
 must wait until the previous test has finished.
 After we add the allocation management function to the cluster, applications 
 can share the cluster and run concurrently. If a test engineer wants to 
 make sure there is no interference, he/she can move other tables out of 
 this group.
 Within a group we use table priority to allocate resources; when the system 
 is busy, we can make sure high-priority tables are not affected by 
 lower-priority tables.
 Different groups can have different region server configurations: groups 
 optimized for reading can have a large block cache, and groups optimized 
 for writing can have a large memstore. 
 Tables and region servers can be moved easily between groups; after changing 
 the configuration, a group can be restarted alone instead of restarting the 
 whole cluster.
 Git repository: https://github.com/ICT-Ope/HBase_allocation
 We hope our work is helpful.


[jira] [Commented] (HBASE-4120) isolation and allocation

2011-12-07 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13165017#comment-13165017
 ] 

Ted Yu commented on HBASE-4120:
---

Looks like we should expose scanner lease expiration through a new coprocessor 
API. 


[jira] [Commented] (HBASE-4980) Null pointer exception in HBaseClient receiveResponse

2011-12-07 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13165021#comment-13165021
 ] 

Ted Yu commented on HBASE-4980:
---

The patch changes the semantics of original error handling. 
I think we shouldn't leave data unread in the input stream. 

 Null pointer exception in HBaseClient receiveResponse
 -

 Key: HBASE-4980
 URL: https://issues.apache.org/jira/browse/HBASE-4980
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.92.0
Reporter: Shrijeet Paliwal
  Labels: newbie
 Attachments: 
 0001-HBASE-4980-Fix-NPE-in-HBaseClient-receiveResponse.patch


 Relevant Stack trace: 
 2011-11-30 13:10:26,557 [IPC Client (47) connection to xx.xx.xx/172.22.4.68:60020 from an unknown user] WARN  org.apache.hadoop.ipc.HBaseClient - Unexpected exception receiving call responses
 java.lang.NullPointerException
   at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:583)
   at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:511)
 {code}
 if (LOG.isDebugEnabled())
   LOG.debug(getName() + " got value #" + id);
 Call call = calls.remove(id);
 // Read the flag byte
 byte flag = in.readByte();
 boolean isError = ResponseFlag.isError(flag);
 if (ResponseFlag.isLength(flag)) {
   // Currently length if present is unused.
   in.readInt();
 }
 int state = in.readInt(); // Read the state.  Currently unused.
 if (isError) {
   //noinspection ThrowableInstanceNeverThrown
   call.setException(new RemoteException(WritableUtils.readString(in),
       WritableUtils.readString(in)));
 } else {
 {code}
 This line {code}Call call = calls.remove(id);{code} may return a null 
 'call'. This happens because, when the rpc timeout is enabled, we proactively 
 clean up other calls whose lifetime has expired, along with the call for 
 which the socket timeout exception happened.
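
One way to combine the null check with the review comment about not leaving data unread in the input stream is sketched below. It is a self-contained illustration, not the attached patch: the frame layout (id, error flag, one UTF string), the `Call` class, and every method name are assumptions made for the example and do not match the real HBase wire format. The idea is to consume the whole response before deciding whether anyone still waits for it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ReceiveResponseSketch {
  /** Hypothetical stand-in for the client's pending-call record. */
  static final class Call {
    volatile String error;
    volatile String value;
  }

  static final Map<Integer, Call> calls = new ConcurrentHashMap<>();

  /** Reads one response frame (assumed layout: id, error flag, one UTF payload). */
  static void receiveResponse(DataInputStream in) throws IOException {
    int id = in.readInt();
    Call call = calls.remove(id); // null if the call was already cleaned up
    boolean isError = in.readBoolean();
    // Always consume the payload so the stream stays aligned for the next frame.
    String payload = in.readUTF();
    if (call == null) {
      return; // nobody is waiting for this response any more
    }
    if (isError) {
      call.error = payload;
    } else {
      call.value = payload;
    }
  }

  /** Encodes one frame in the layout assumed above. */
  static byte[] frame(int id, boolean isError, String payload) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    out.writeInt(id);
    out.writeBoolean(isError);
    out.writeUTF(payload);
    return bytes.toByteArray();
  }

  /** Sends a response for an expired call, then one for a live call. */
  static String demo() {
    try {
      Call pending = new Call();
      calls.put(2, pending);
      ByteArrayOutputStream wire = new ByteArrayOutputStream();
      wire.write(frame(1, true, "expired call")); // id 1 is not in the map
      wire.write(frame(2, false, "ok"));
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire.toByteArray()));
      receiveResponse(in); // must neither throw nor desynchronize the stream
      receiveResponse(in);
      return pending.value;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

Returning early before reading the payload would also avoid the NullPointerException, but the next frame would then be parsed starting in the middle of the skipped one.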


[jira] [Commented] (HBASE-4980) Null pointer exception in HBaseClient receiveResponse

2011-12-07 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13165025#comment-13165025
 ] 

Ted Yu commented on HBASE-4980:
---

Please attach patch for trunk last so that HadoopQA can test it. 

 Null pointer exception in HBaseClient receiveResponse
 -

 Key: HBASE-4980
 URL: https://issues.apache.org/jira/browse/HBASE-4980
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.92.0
Reporter: Shrijeet Paliwal
  Labels: newbie
 Attachments: 
 0001-HBASE-4980-Fix-NPE-in-HBaseClient-receiveResponse.patch, 
 0002-HBASE-4980-Fix-NPE-in-HBaseClient-receiveResponse.patch



[jira] [Commented] (HBASE-4980) Null pointer exception in HBaseClient receiveResponse

2011-12-07 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13165031#comment-13165031
 ] 

Ted Yu commented on HBASE-4980:
---

Please use --no-prefix to generate patch. 


[jira] [Commented] (HBASE-4682) Support deleted rows using Import/Export

2011-12-07 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13165048#comment-13165048
 ] 

Ted Yu commented on HBASE-4682:
---

In the new Delete ctor, row key should be included in the IOE. 

 Support deleted rows using Import/Export
 

 Key: HBASE-4682
 URL: https://issues.apache.org/jira/browse/HBASE-4682
 Project: HBase
  Issue Type: Sub-task
  Components: mapreduce
Affects Versions: 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 4682-v1.txt


 Parent allows keeping deleted rows around. Would be nice if those could be 
 exported and imported as well.
 All the building blocks are there.


[jira] [Commented] (HBASE-4946) HTable.coprocessorExec (and possibly coprocessorProxy) does not work with dynamically loaded coprocessors (from hdfs or local system), because the RPC system tries to d

2011-12-06 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163819#comment-13163819
 ] 

Ted Yu commented on HBASE-4946:
---

The failed tests were due to 'unable to create new native thread'

 HTable.coprocessorExec (and possibly coprocessorProxy) does not work with 
 dynamically loaded coprocessors (from hdfs or local system), because the RPC 
 system tries to deserialize an unknown class. 
 -

 Key: HBASE-4946
 URL: https://issues.apache.org/jira/browse/HBASE-4946
 Project: HBase
  Issue Type: Bug
  Components: coprocessors
Affects Versions: 0.92.0
Reporter: Andrei Dragomir
Assignee: Andrei Dragomir
 Attachments: HBASE-4946-v2.patch, HBASE-4946-v3.patch, 
 HBASE-4946.patch


 Loading coprocessors jars from hdfs works fine. I load it from the shell, 
 after setting the attribute, and it gets loaded:
 {noformat}
 INFO org.apache.hadoop.hbase.regionserver.HRegion: Setting up tabledescriptor config now ...
 INFO org.apache.hadoop.hbase.coprocessor.CoprocessorHost: Class com.MyCoprocessorClass needs to be loaded from a file - hdfs://localhost:9000/coproc/rt-  0.0.1-SNAPSHOT.jar.
 INFO org.apache.hadoop.hbase.coprocessor.CoprocessorHost: loadInstance: com.MyCoprocessorClass
 INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: RegionEnvironment createEnvironment
 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Registered protocol handler: region=t1,,1322572939753.6409aee1726d31f5e5671a59fe6e384f. protocol=com.MyCoprocessorClassProtocol
 INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Load coprocessor com.MyCoprocessorClass from HTD of t1 successfully.
 {noformat}
 The problem is that this coprocessor simply extends BaseEndpointCoprocessor, 
 with a dynamic method. When calling this method from the client with 
 HTable.coprocessorExec, I get errors on the HRegionServer, because the call 
 cannot be deserialized from writables. 
 The cause is that Exec tries to resolve the coprocessor class early. 
 The coprocessor class is loaded, but only in the context of the 
 HRegionServer / HRegion. So, the call fails:
 {noformat}
 2011-12-02 00:34:17,348 ERROR org.apache.hadoop.hbase.io.HbaseObjectWritable: Error in readFields
 java.io.IOException: Protocol class com.MyCoprocessorClassProtocol not found
   at org.apache.hadoop.hbase.client.coprocessor.Exec.readFields(Exec.java:125)
   at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:575)
   at org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:105)
   at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1237)
   at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1167)
   at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:703)
   at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:495)
   at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:470)
   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:680)
 Caused by: java.lang.ClassNotFoundException: com.MyCoprocessorClassProtocol
   at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Class.java:247)
   at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:943)
   at org.apache.hadoop.hbase.client.coprocessor.Exec.readFields(Exec.java:122)
   ... 10 more
 {noformat}
 Probably the correct way to fix this is to make Exec really smart, so that it 
 knows all the class definitions loaded in CoprocessorHost(s).
 I created a small patch that simply doesn't resolve the class definition in 
 the Exec, instead passing it as string down to the HRegion layer. This layer 
 knows all the definitions, and simply loads it by name. 
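
The approach in the last paragraph, carrying the protocol as a string and resolving it only where the right class loader is available, might look roughly like the sketch below. It is not the attached patch: `ExecCall`, `resolveProtocol`, and `resolvedName` are hypothetical names, and `java.lang.Runnable` merely stands in for a coprocessor protocol class so the example runs without HBase on the classpath.

```java
import java.io.IOException;

public class LazyProtocolSketch {
  /** Hypothetical stand-in for the deserialized call: it keeps only the class name. */
  static final class ExecCall {
    final String protocolName;

    ExecCall(String protocolName) {
      this.protocolName = protocolName;
    }

    /** Resolves the protocol class with a loader chosen by the caller. */
    Class<?> resolveProtocol(ClassLoader loader) throws IOException {
      try {
        return Class.forName(protocolName, false, loader);
      } catch (ClassNotFoundException cnfe) {
        throw new IOException("Protocol class " + protocolName + " not found", cnfe);
      }
    }
  }

  /** Convenience wrapper for the demo: the resolved class name, or null if unknown. */
  static String resolvedName(String protocolName) {
    // Building the call never touches a class loader, so it cannot fail here.
    ExecCall call = new ExecCall(protocolName);
    // Later, the layer that owns the right loader resolves the name.
    ClassLoader regionLoader = LazyProtocolSketch.class.getClassLoader();
    try {
      return call.resolveProtocol(regionLoader).getName();
    } catch (IOException e) {
      return null;
    }
  }

  public static void main(String[] args) {
    System.out.println(resolvedName("java.lang.Runnable"));
    System.out.println(resolvedName("com.example.MissingProtocol"));
  }
}
```

With this split, a missing class surfaces at the layer that can actually load coprocessor jars rather than in the RPC deserialization path.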


[jira] [Commented] (HBASE-4946) HTable.coprocessorExec (and possibly coprocessorProxy) does not work with dynamically loaded coprocessors (from hdfs or local system), because the RPC system tries to d

2011-12-06 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163840#comment-13163840
 ] 

Ted Yu commented on HBASE-4946:
---

For coprocessor/Exec.java, javadoc doesn't match code:
{code}
 try {
   protocol = (Class<CoprocessorProtocol>)conf.getClassByName(protocolName);
 }
 catch (ClassNotFoundException cnfe) {
-  throw new IOException("Protocol class "+protocolName+" not found", cnfe);
+  // can't do eager instantiation. pass it as a string and try to deserialize later.
+  //throw new IOException("Protocol class "+protocolName+" not found", cnfe);
{code}
I think the above try block should be commented out.
TestCoprocessorEndpoint passes without the above assignment to protocol.


[jira] [Commented] (HBASE-4120) isolation and allocation

2011-12-06 Thread Ted Yu (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163853#comment-13163853
]

Ted Yu commented on HBASE-4120:
---

From Andrew Purtell (see 'trip report for Hadoop In China' up on dev@):

After 0.92 is out I intend to champion / mentor / co-develop 4120 and the
follow on table allocation work and target 0.94 for it. I think the RPC QoS
aspect is not too controversial. The allocation/reservation aspects I'd like to
aim for a coprocessor or at least master plugin based integration so they won't
impact stability for users who don't enable it. Unlike RPC QoS I suspect the
changes needed to core can be minimized to coprocessor framework additions.
Follow up in new JIRAs soon.


[jira] [Commented] (HBASE-4120) isolation and allocation

2011-12-06 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163902#comment-13163902
 ] 

Ted Yu commented on HBASE-4120:
---

Andy suggested placing the PriorityFunction.initRegionPriority(region) call in 
RegionObserver.postOpen()


[jira] [Commented] (HBASE-4605) Constraints

2011-12-06 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163939#comment-13163939
 ] 

Ted Yu commented on HBASE-4605:
---

Renaming IntegrationTestConstraint as described above makes sense.

Thanks Jesse.

 Constraints
 ---

 Key: HBASE-4605
 URL: https://issues.apache.org/jira/browse/HBASE-4605
 Project: HBase
  Issue Type: Improvement
  Components: client, coprocessors
Affects Versions: 0.94.0
Reporter: Jesse Yates
Assignee: Jesse Yates
 Attachments: 4605.v7, constraint_as_cp.txt, java_Constraint_v2.patch, 
 java_HBASE-4605_v1.patch, java_HBASE-4605_v2.patch, java_HBASE-4605_v3.patch


 From Jesse's comment on dev:
 {quote}
 What I would like to propose is a simple interface that people can use to 
 implement a 'constraint' (matching the classic database definition). This 
 would help ease of adoption by helping HBase more easily check that box, help 
 minimize code duplication across organizations, and lead to easier adoption.
 Essentially, people would implement a 'Constraint' interface for checking 
 keys before they are put into a table. Puts that are valid get written to the 
 table; for invalid puts, the constraint throws an exception that gets propagated 
 back to the client explaining why the put was invalid.
 Constraints would be set on a per-table basis and the user would be expected 
 to ensure the jars containing the constraint are present on the machines 
 serving that table.
 Yes, people could roll their own mechanism for doing this via coprocessors 
 each time, but this would make it easier to do so, so you only have to 
 implement a very minimal interface and not worry about the specifics.
 {quote}
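
The proposal quoted above could look roughly like the following. This is only a minimal sketch under stated assumptions: SimplePut, Table, IntegerOnly and the other names are hypothetical stand-ins invented for illustration, not the classes in the attached patches or in HBase itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these types are the real HBase classes. */
public class ConstraintSketch {

    /** Stand-in for a put: a row key plus a single value. */
    public static final class SimplePut {
        final String row;
        final String value;

        public SimplePut(String row, String value) {
            this.row = row;
            this.value = value;
        }
    }

    /** Thrown when a put violates a constraint; the message explains why. */
    public static class ConstraintException extends Exception {
        public ConstraintException(String message) {
            super(message);
        }
    }

    /** The minimal interface an implementor would have to write. */
    public interface Constraint {
        void check(SimplePut p) throws ConstraintException;
    }

    /** Example constraint: the value must parse as an integer. */
    public static final class IntegerOnly implements Constraint {
        @Override
        public void check(SimplePut p) throws ConstraintException {
            try {
                Integer.parseInt(p.value);
            } catch (NumberFormatException e) {
                throw new ConstraintException(
                    "row " + p.row + ": '" + p.value + "' is not an integer");
            }
        }
    }

    /** Constraints are registered per table and run before the write. */
    public static final class Table {
        private final List<Constraint> constraints = new ArrayList<>();
        private final Map<String, String> rows = new HashMap<>();

        public void addConstraint(Constraint c) {
            constraints.add(c);
        }

        public void put(SimplePut p) throws ConstraintException {
            for (Constraint c : constraints) {
                c.check(p); // an invalid put never reaches the table
            }
            rows.put(p.row, p.value);
        }

        public boolean contains(String row) {
            return rows.containsKey(row);
        }
    }
}
```

The exception is checked here so that the caller is forced to handle the rejection; whether the real feature does the same is a design decision for the patch.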


[jira] [Commented] (HBASE-4880) Region is on service before completing openRegionHanlder, may cause data loss

2011-12-06 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164104#comment-13164104
 ] 

Ted Yu commented on HBASE-4880:
---

Did RestAdmin pass for patch v2?

 Region is on service before completing openRegionHanlder, may cause data loss
 -

 Key: HBASE-4880
 URL: https://issues.apache.org/jira/browse/HBASE-4880
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: hbase-4880.patch, hbase-4880v2.patch


 OpenRegionHandler in the regionserver proceeds through the following steps:
 {code}
 1. openregion() (after it, closed = false, closing = false)
 2. addToOnlineRegions(region)
 3. update the .meta. table
 4. update ZK's node state to RS_ZK_REGION_OPENED
 {code}
 The region is therefore in service before step 4 completes.
 This means a client could put data to this region after step 3.
 What happens if step 4 fails?
 OpenRegionHandler#cleanupFailedOpen is executed, which closes the 
 region, and the master assigns the region to another regionserver.
 If closing the region fails, the data put between step 3 and step 4 
 may be lost, because the region has been opened on another regionserver and has 
 received new data. The lost edits may then not be recoverable through replayRecoveredEdit() 
 because their LogSeqId is smaller than the current region SeqId.
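
The window described above can be illustrated with a toy model. This is a sketch under assumptions, not the real OpenRegionHandler: Region, open and the clientActivity callback are hypothetical names, and the .meta. update and ZooKeeper transition are reduced to a boolean flag.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the ordering described above; not the real OpenRegionHandler. */
public class OpenOrderSketch {

    /** Stand-in for a region that accepts edits only while it is online. */
    public static final class Region {
        private boolean online;
        private final List<String> edits = new ArrayList<>();

        public boolean put(String edit) {
            if (!online) {
                return false;
            }
            edits.add(edit);
            return true;
        }

        public boolean isOnline() {
            return online;
        }

        public int editCount() {
            return edits.size();
        }
    }

    /**
     * Runs the four steps in the order listed in the description.
     * clientActivity models whatever clients do between step 3 and step 4.
     */
    public static boolean open(Region region, Runnable clientActivity,
                               boolean zkUpdateSucceeds) {
        region.online = true;      // steps 1-2: opened and added to online regions
        // step 3: the catalog table update would happen here (omitted)
        clientActivity.run();      // clients can already write to the region
        if (!zkUpdateSucceeds) {   // step 4 fails
            region.online = false; // cleanup closes the region again
            return false;
        }
        return true;
    }
}
```

In this model an edit is accepted even though the open ultimately fails, which is the situation in which the description says data may be lost.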


[jira] [Commented] (HBASE-4880) Region is on service before completing openRegionHanlder, may cause data loss

2011-12-06 Thread Ted Yu (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164168#comment-13164168
]

Ted Yu commented on HBASE-4880:
---

Please check failed tests.
It seems trunk is broken now.

Region is on service before completing openRegionHanlder, may cause data loss
-

Key: HBASE-4880
URL: https://issues.apache.org/jira/browse/HBASE-4880
Project: HBase
Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen
Attachments: hbase-4880.patch, hbase-4880v2.patch, hbase-4880v3.patch


[jira] [Commented] (HBASE-4946) HTable.coprocessorExec (and possibly coprocessorProxy) does not work with dynamically loaded coprocessors (from hdfs or local system), because the RPC system tries to d

2011-12-05 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13162740#comment-13162740
 ] 

Ted Yu commented on HBASE-4946:
---

The change in BloomFilterFactory is unrelated, please remove it from patch. 

 HTable.coprocessorExec (and possibly coprocessorProxy) does not work with 
 dynamically loaded coprocessors (from hdfs or local system), because the RPC 
 system tries to deserialize an unknown class. 
 -

 Key: HBASE-4946
 URL: https://issues.apache.org/jira/browse/HBASE-4946
 Project: HBase
  Issue Type: Bug
  Components: coprocessors
Affects Versions: 0.92.0
Reporter: Andrei Dragomir
Assignee: Andrei Dragomir
 Attachments: HBASE-4946-v2.patch, HBASE-4946.patch


 Loading coprocessor jars from hdfs works fine. I load the jar from the shell, 
 after setting the attribute, and it gets loaded:
 {noformat}
 INFO org.apache.hadoop.hbase.regionserver.HRegion: Setting up tabledescriptor config now ...
 INFO org.apache.hadoop.hbase.coprocessor.CoprocessorHost: Class com.MyCoprocessorClass needs to be loaded from a file - hdfs://localhost:9000/coproc/rt-  0.0.1-SNAPSHOT.jar.
 INFO org.apache.hadoop.hbase.coprocessor.CoprocessorHost: loadInstance: com.MyCoprocessorClass
 INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: RegionEnvironment createEnvironment
 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Registered protocol handler: region=t1,,1322572939753.6409aee1726d31f5e5671a59fe6e384f. protocol=com.MyCoprocessorClassProtocol
 INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Load coprocessor com.MyCoprocessorClass from HTD of t1 successfully.
 {noformat}
 The problem is that this coprocessor simply extends BaseEndpointCoprocessor, 
 with a dynamic method. When calling this method from the client with 
 HTable.coprocessorExec, I get errors on the HRegionServer, because the call 
 cannot be deserialized from writables. 
 The problem is that Exec tries to do an early resolve of the coprocessor 
 class. The coprocessor class is loaded, but it is in the context of the 
 HRegionServer / HRegion. So, the call fails:
 {noformat}
 2011-12-02 00:34:17,348 ERROR org.apache.hadoop.hbase.io.HbaseObjectWritable: Error in readFields
 java.io.IOException: Protocol class com.MyCoprocessorClassProtocol not found
   at org.apache.hadoop.hbase.client.coprocessor.Exec.readFields(Exec.java:125)
   at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:575)
   at org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:105)
   at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1237)
   at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1167)
   at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:703)
   at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:495)
   at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:470)
   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:680)
 Caused by: java.lang.ClassNotFoundException: com.MyCoprocessorClassProtocol
   at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Class.java:247)
   at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:943)
   at org.apache.hadoop.hbase.client.coprocessor.Exec.readFields(Exec.java:122)
   ... 10 more
 {noformat}
 Probably the correct way to fix this is to make Exec really smart, so that it 
 knows all the class definitions loaded in CoprocessorHost(s).
 I created a small patch that simply doesn't resolve the class definition in 
 the Exec, instead passing it as string down to the HRegion layer. This layer 
 knows all the definitions, and simply loads it by name. 
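
The idea of passing the class name down as a string and resolving it late could be sketched as follows. This is an illustration under assumptions, not the attached patch: ExecCall, Region, registerProtocol and resolve are hypothetical names chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of late class resolution; not the real Exec/HRegion code. */
public class LateResolveSketch {

    /** RPC payload: carries only the protocol class name, never a Class object. */
    public static final class ExecCall {
        final String protocolName;

        public ExecCall(String protocolName) {
            this.protocolName = protocolName;
        }
    }

    /** Region-side registry, filled when a coprocessor is loaded. */
    public static final class Region {
        private final Map<String, Object> handlers = new HashMap<>();

        public void registerProtocol(String protocolName, Object handler) {
            handlers.put(protocolName, handler);
        }

        /** Resolution happens here, where dynamically loaded classes are known. */
        public Object resolve(ExecCall call) {
            Object handler = handlers.get(call.protocolName);
            if (handler == null) {
                throw new IllegalArgumentException(
                    "No handler registered for protocol " + call.protocolName);
            }
            return handler;
        }
    }
}
```

Because deserialization only reads a string, the RPC layer never needs the coprocessor's class loader; an unknown protocol is reported by the layer that actually knows what is loaded.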


[jira] [Commented] (HBASE-4954) IllegalArgumentException in hfile2 blockseek

2011-12-05 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163010#comment-13163010
 ] 

Ted Yu commented on HBASE-4954:
---

In our internal Jenkins build which uses hadoop 0.22, there was no such 
exception.
I last merged from Apache 0.92 on Nov. 14th.
So this test failure might have been introduced by JIRAs integrated after Nov. 
14th - possibly the backport of HBASE-2856.

 IllegalArgumentException in hfile2 blockseek
 

 Key: HBASE-4954
 URL: https://issues.apache.org/jira/browse/HBASE-4954
 Project: HBase
  Issue Type: Bug
Reporter: stack
Priority: Critical
 Fix For: 0.92.0


 On Tue, Nov 29, 2011 at 10:20 PM, Stack st...@duboce.net wrote:
  The first hbase 0.92.0 release candidate is available for download:
 
   http://people.apache.org/~stack/hbase-0.92.0-candidate-0/
 Here's another persistent issue that I'd appreciate somebody taking
 a quick look at:
 
 http://bigtop01.cloudera.org:8080/view/Hadoop%200.22/job/Bigtop-hadoop22-smoketest/28/testReport/org.apache.bigtop.itest.hbase.smoke/TestHFileOutputFormat/testMRIncrementalLoadWithSplit/
 Caused by: java.lang.IllegalArgumentException
   at java.nio.Buffer.position(Buffer.java:218)
   at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.blockSeek(HFileReaderV2.java:632)
   at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.loadBlockAndSeekToKey(HFileReaderV2.java:545)
   at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.seekTo(HFileReaderV2.java:503)
   at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.seekTo(HFileReaderV2.java:511)
   at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.seekTo(HFileReaderV2.java:475)
   at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekTo(HalfStoreFileReader.java:157)
   at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.copyHFileHalf(LoadIncrementalHFiles.java:544)
   at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.splitStoreFile(LoadIncrementalHFiles.java:516)
   at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.splitStoreFile(LoadIncrementalHFiles.java:377)
   at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplit(LoadIncrementalHFiles.java:441)
   at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:325)
   at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:323)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:619)


[jira] [Commented] (HBASE-4954) IllegalArgumentException in hfile2 blockseek

2011-12-05 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163208#comment-13163208
 ] 

Ted Yu commented on HBASE-4954:
---

It turns out that we should be using the following commands:
{code}
mvn clean -Dhadoop.profile=22 compile
mvn -Dhadoop.profile=22 -P localTests test -Dtest=TestHFileOutputFormat#testMRIncrementalLoadWithSplit
{code}
where I saw (under TRUNK):
{code}
testMRIncrementalLoadWithSplit(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat)  Time elapsed: 18.591 sec  <<< FAILURE!
java.lang.AssertionError
  at org.junit.Assert.fail(Assert.java:92)
  at org.junit.Assert.assertTrue(Assert.java:43)
  at org.junit.Assert.assertTrue(Assert.java:54)
  at org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat.runIncrementalPELoad(TestHFileOutputFormat.java:479)
  at org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat.doIncrementalLoadTest(TestHFileOutputFormat.java:391)
  at org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat.testMRIncrementalLoadWithSplit(TestHFileOutputFormat.java:370)
{code}

 IllegalArgumentException in hfile2 blockseek
 

 Key: HBASE-4954
 URL: https://issues.apache.org/jira/browse/HBASE-4954
 Project: HBase
  Issue Type: Bug
Reporter: stack
Priority: Critical
 Fix For: 0.92.0




[jira] [Commented] (HBASE-4954) IllegalArgumentException in hfile2 blockseek

2011-12-05 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163212#comment-13163212
 ] 

Ted Yu commented on HBASE-4954:
---

Stack trace for 0.92 is essentially the same.
But I don't see test output:
{code}
-rw-r--r--  1 zhihyu  110088321  0 Dec  5 16:21 target/surefire-reports/org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat-output.txt
{code}


 IllegalArgumentException in hfile2 blockseek
 

 Key: HBASE-4954
 URL: https://issues.apache.org/jira/browse/HBASE-4954
 Project: HBase
  Issue Type: Bug
Reporter: stack
Priority: Critical
 Fix For: 0.92.0




[jira] [Commented] (HBASE-4954) IllegalArgumentException in hfile2 blockseek

2011-12-05 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163380#comment-13163380
 ] 

Ted Yu commented on HBASE-4954:
---

Can I mark this issue as Won't Fix?

 IllegalArgumentException in hfile2 blockseek
 

 Key: HBASE-4954
 URL: https://issues.apache.org/jira/browse/HBASE-4954
 Project: HBase
  Issue Type: Bug
Reporter: stack
Priority: Critical
 Fix For: 0.92.0




[jira] [Commented] (HBASE-4605) Constraints

2011-12-04 Thread Ted Yu (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13162565#comment-13162565
]

Ted Yu commented on HBASE-4605:
---

Thanks Gary for the detailed review.
For the check() API:
{code}
public void check(Put p) throws ConstraintException;
{code}
We can abstract the input parameter as Mutation.
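
A rough sketch of what that abstraction buys, assuming Put and Delete share a common Mutation base type. The classes below are hypothetical stand-ins written for this example, not the HBase client classes, and NonEmptyRow is an invented constraint.

```java
/** Hypothetical sketch; Mutation, Put and Delete here are stand-ins, not HBase classes. */
public class MutationCheckSketch {

    /** Common base type: every mutation targets a row. */
    public abstract static class Mutation {
        final String row;

        Mutation(String row) {
            this.row = row;
        }
    }

    public static final class Put extends Mutation {
        public Put(String row) {
            super(row);
        }
    }

    public static final class Delete extends Mutation {
        public Delete(String row) {
            super(row);
        }
    }

    public static class ConstraintException extends Exception {
        public ConstraintException(String message) {
            super(message);
        }
    }

    /** check() takes the base type, so one constraint covers puts and deletes. */
    public interface Constraint {
        void check(Mutation m) throws ConstraintException;
    }

    /** Rejects any mutation whose row key is empty, whatever its concrete type. */
    public static final class NonEmptyRow implements Constraint {
        @Override
        public void check(Mutation m) throws ConstraintException {
            if (m.row == null || m.row.isEmpty()) {
                throw new ConstraintException(
                    m.getClass().getSimpleName() + " has an empty row key");
            }
        }
    }
}
```

A constraint that needs put-specific data can still downcast inside check(), so widening the parameter does not take anything away from implementors.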

I think we need to reach agreement on the following major issues:
1. introduction of dependency of Guava in client library - Jonathan Gray would
strongly disagree with this practice.
2. whether IntegerConstraint should be included in the constraint package
3. constraint configuration serialization method

If the current implementation for some of the above isn't critical to this
feature (#1 and #2), we can defer to future JIRAs.

Actually a fourth issue (raised by Suraj under the discussion of 'Question on
Coprocessors and Atomicity') is more important: if we don't provide atomicity
by holding rowlock during the check, the use cases for Constraints would
decrease.
Issue #4 definitely doesn't have to be covered by this JIRA.

The fifth issue is how IntegrationTestConstraint.java should be tagged.

Constraints
---

Key: HBASE-4605
URL: https://issues.apache.org/jira/browse/HBASE-4605
Project: HBase
Issue Type: Improvement
Components: client, coprocessors
Affects Versions: 0.94.0
Reporter: Jesse Yates
Assignee: Jesse Yates
Attachments: 4605.v7, constraint_as_cp.txt, java_Constraint_v2.patch,
java_HBASE-4605_v1.patch, java_HBASE-4605_v2.patch, java_HBASE-4605_v3.patch


[jira] [Commented] (HBASE-4921) HTable initialization looks for EMPTY_START_ROW

2011-12-03 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13162215#comment-13162215
 ] 

Ted Yu commented on HBASE-4921:
---

@Pritam:
Can you provide a patch and let us know the result of running it through the 
test suite?

Thanks

 HTable initialization looks for EMPTY_START_ROW
 ---

 Key: HBASE-4921
 URL: https://issues.apache.org/jira/browse/HBASE-4921
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Pritam Damania

 The HTable initialization does something like this:
 {code}this.connection.locateRegion(tableName, HConstants.EMPTY_START_ROW);{code}
 What is the rationale behind this? What would happen if this region is in 
 flight? I ran into a problem where I disabled the first region of the table 
 and now I can't create an HTable instance for this table.
 Disabling the first region is like disabling the entire table from a client 
 perspective. I feel this is not the correct behavior.


[jira] [Commented] (HBASE-4379) [hbck] Does not complain about tables with no end region [Z,]

2011-12-03 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13162224#comment-13162224
 ] 

Ted Yu commented on HBASE-4379:
---

@Jonathan:
You need to re-attach the patch because HadoopQA checks the attachment Id.
If the attachment Id corresponds to a patch which was verified earlier, the 
test suite will not be executed again.

Thanks

 [hbck] Does not complain about tables with no end region [Z,]
 -

 Key: HBASE-4379
 URL: https://issues.apache.org/jira/browse/HBASE-4379
 Project: HBase
  Issue Type: Bug
  Components: hbck
Affects Versions: 0.92.0, 0.90.5
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: 
 0001-HBASE-4379-hbck-does-not-complain-about-tables-with-.patch, 
 hbase-4379.v2.patch


 hbck does not detect or have an error condition when the last region of a 
 table is missing (end key != '').


[jira] [Commented] (HBASE-4942) HMaster is unable to start of HFile V1 is used

2011-12-03 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13162270#comment-13162270
 ] 

Ted Yu commented on HBASE-4942:
---

You're right Lars.

 HMaster is unable to start of HFile V1 is used
 --

 Key: HBASE-4942
 URL: https://issues.apache.org/jira/browse/HBASE-4942
 Project: HBase
  Issue Type: Bug
  Components: io
Affects Versions: 0.92.0
Reporter: Ted Yu
Assignee: honghua zhu
 Fix For: 0.92.0, 0.94.0


 This was reported by HH Zhu (zhh200...@gmail.com)
 If the following is specified in hbase-site.xml:
 {code}
 <property>
   <name>hfile.format.version</name>
   <value>1</value>
 </property>
 {code}
 Clear the hdfs directory hbase.rootdir so that MasterFileSystem.bootstrap() 
 is executed.
 You would see:
 {code}
 java.lang.NullPointerException
   at org.apache.hadoop.hbase.io.hfile.HFileReaderV1.close(HFileReaderV1.java:358)
   at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.close(StoreFile.java:1083)
   at org.apache.hadoop.hbase.regionserver.StoreFile.closeReader(StoreFile.java:570)
   at org.apache.hadoop.hbase.regionserver.Store.close(Store.java:441)
   at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:782)
   at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:717)
   at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:688)
   at org.apache.hadoop.hbase.master.MasterFileSystem.bootstrap(MasterFileSystem.java:390)
   at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:356)
   at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:128)
   at org.apache.hadoop.hbase.master.MasterFileSystem.init(MasterFileSystem.java:113)
   at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:435)
   at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:314)
   at java.lang.Thread.run(Thread.java:619)
 {code}
 The above exception would lead to:
 {code}
 java.lang.RuntimeException: HMaster Aborted
   at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:152)
   at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:103)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
   at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
   at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:1512)
 {code}
 In org.apache.hadoop.hbase.master.HMaster.HMaster(Configuration conf), we 
 have:
 {code}
 this.conf.setFloat(CacheConfig.HFILE_BLOCK_CACHE_SIZE_KEY, 0.0f);
 {code}
 When CacheConfig is instantiated, the following is called:
 {code}
 org.apache.hadoop.hbase.io.hfile.CacheConfig.instantiateBlockCache(Configuration conf)
 {code}
 Since hfile.block.cache.size is 0.0, instantiateBlockCache() returns 
 null, leaving the blockCache field of CacheConfig null.
 When the master closes the root region, 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV1.close(boolean evictOnClose) 
 is called. cacheConf.getBlockCache() returns null, leading to the master 
 abort.
 The following should be called in HFileReaderV1.close(), similar to the code 
 in HFileReaderV2.close():
 {code}
 if (evictOnClose && cacheConf.isBlockCacheEnabled())
 {code}
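
A minimal sketch of that guard, under assumptions: BlockCache and CacheConfig below are simplified stand-ins written for this example (the real classes have many more members), and the static close() method only models the eviction part of a reader's close.

```java
/** Hypothetical sketch of the null guard; the types are simplified stand-ins. */
public class CloseGuardSketch {

    /** Stand-in for the block cache: evicts blocks of a file, returns the count. */
    public interface BlockCache {
        int evictBlocksByPrefix(String prefix);
    }

    /** Stand-in for CacheConfig: blockCache is null when the cache size is 0.0. */
    public static final class CacheConfig {
        private final BlockCache blockCache;

        public CacheConfig(BlockCache blockCache) {
            this.blockCache = blockCache;
        }

        public boolean isBlockCacheEnabled() {
            return blockCache != null;
        }

        public BlockCache getBlockCache() {
            return blockCache;
        }
    }

    /** Models the eviction step of close(): skip it when there is no cache. */
    public static int close(CacheConfig cacheConf, String fileName, boolean evictOnClose) {
        if (evictOnClose && cacheConf.isBlockCacheEnabled()) {
            return cacheConf.getBlockCache().evictBlocksByPrefix(fileName);
        }
        return 0; // nothing to evict, and no NullPointerException
    }
}
```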


[jira] [Commented] (HBASE-4936) Cached HRegionInterface connections crash when getting UnknownHost exceptions

2011-12-03 Thread Ted Yu (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13162272#comment-13162272
]

Ted Yu commented on HBASE-4936:
---

+1 on patch.

@Andrei:
Please re-attach patch using --no-prefix option.
HadoopQA uses -p0 to apply patches.

Cached HRegionInterface connections crash when getting UnknownHost exceptions
-

Key: HBASE-4936
URL: https://issues.apache.org/jira/browse/HBASE-4936
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.92.0
Reporter: Andrei Dragomir
Attachments: HBASE-4936.patch

This issue is unlikely to come up in a cluster test case. However, during
development, the following happens:
1. Start the HBase cluster locally, on network A (DNS A, etc.)
2. The region locations are cached using the hostname
(mycomputer.company.com, 211.x.y.z - real IP)
3. Change network location (go home)
4. Start the HBase cluster locally. My hostname / IPs are now different
(mycomputer, 192.168.0.130 - new IP)
If the region locations have been cached using the hostname, an
UnknownHostException is thrown in CatalogTracker.getCachedConnection(ServerName sn)
and is not handled by any of the catch statements. The server will crash constantly.
The error should be caught and not rethrown, so that the cached connection
expires normally.
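
A minimal sketch of the proposed handling, assuming a hypothetical Connector interface in place of whatever actually opens the region server connection (this is not the CatalogTracker code from the patch). The point is only that UnknownHostException is caught and turned into "no connection" instead of propagating:

```java
import java.io.IOException;
import java.net.UnknownHostException;

final class CachedConnectionSketch {
    /** Hypothetical connector; stands in for the code that opens a region server connection. */
    interface Connector<T> {
        T connect(String hostname) throws IOException;
    }

    /** Returns the connection, or null if the cached hostname no longer resolves. */
    static <T> T getCachedConnection(Connector<T> connector, String hostname) throws IOException {
        try {
            return connector.connect(hostname);
        } catch (UnknownHostException e) {
            // Stale cached location: report "no connection" rather than rethrowing,
            // so the caller can let the cached entry expire normally.
            return null;
        }
    }
}
```

Other IOExceptions still propagate in this sketch; whether they should is a separate decision not covered here.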

[jira] [Commented] (HBASE-4945) NPE in HRegion.bulkLoadHFiles(...)

2011-12-03 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13162275#comment-13162275
 ] 

Ted Yu commented on HBASE-4945:
---

Good catch, Lars.

I think the fix should go to 0.92.

 NPE in HRegion.bulkLoadHFiles(...)
 --

 Key: HBASE-4945
 URL: https://issues.apache.org/jira/browse/HBASE-4945
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.94.0
Reporter: Lars Hofhansl
Priority: Minor

 Was playing with completebulkload, and ran into an NPE.
 The problem is here (HRegion.bulkLoadHFiles(...)).
 {code}
 Store store = getStore(familyName);
 if (store == null) {
   IOException ioe = new DoNotRetryIOException(
       "No such column family " + Bytes.toStringBinary(familyName));
   ioes.add(ioe);
   failures.add(p);
 }
 try {
   store.assertBulkLoadHFileOk(new Path(path));
 } catch (WrongRegionException wre) {
   // recoverable (file doesn't fit in region)
   failures.add(p);
 } catch (IOException ioe) {
   // unrecoverable (hdfs problem)
   ioes.add(ioe);
 }
 {code}
 This should be 
 {code}
 Store store = getStore(familyName);
 if (store == null) {
 ...
 } else {
   try {
     store.assertBulkLoadHFileOk(new Path(path));
   ...
 }
 {code}
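
The corrected control flow can be sketched in a self-contained form as follows. SketchStore and the String-based family and path arguments are hypothetical simplifications, not the HRegion API; the sketch only shows that the store check moves into the else branch so a missing family never reaches the dereference:

```java
import java.io.IOException;
import java.util.List;
import java.util.Map;

final class BulkLoadCheckSketch {
    /** Hypothetical stand-in for the per-family store check. */
    interface SketchStore {
        void assertBulkLoadHFileOk(String path) throws IOException;
    }

    static void check(Map<String, SketchStore> stores, String family, String path,
                      List<String> failures, List<IOException> ioes) {
        SketchStore store = stores.get(family);
        if (store == null) {
            // Unknown family: record the error and do not touch the null store.
            ioes.add(new IOException("No such column family " + family));
            failures.add(path);
        } else {
            try {
                store.assertBulkLoadHFileOk(path);
            } catch (IOException ioe) {
                // Unrecoverable problem reported by the store check.
                ioes.add(ioe);
            }
        }
    }
}
```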


[jira] [Commented] (HBASE-4880) Region is on service before completing openRegionHanlder, may cause data loss

2011-12-03 Thread Ted Yu (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13162278#comment-13162278
]

Ted Yu commented on HBASE-4880:
---

I think the patch makes sense.

I ran TestHCM with patch applied. It passed.

Region is on service before completing openRegionHanlder, may cause data loss
-

Key: HBASE-4880
URL: https://issues.apache.org/jira/browse/HBASE-4880
Project: HBase
Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen
Attachments: hbase-4880.patch

[jira] [Commented] (HBASE-4944) Optionally verify bulk loaded HFiles

2011-12-03 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13162280#comment-13162280
 ] 

Ted Yu commented on HBASE-4944:
---

Minor comments:
{code}
+KeyValue pkv = null;
{code}
The variable can be named prevKV which is clearer.
{code}
+  throw new InvalidHFileException("Previous row is greater then
{code}
Typo above, should be 'greater than'.



 Optionally verify bulk loaded HFiles
 

 Key: HBASE-4944
 URL: https://issues.apache.org/jira/browse/HBASE-4944
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.92.0, 0.94.0, 0.90.5
Reporter: Andrew Purtell
Priority: Minor
 Attachments: 4944.txt


 We rely on users to produce properly formatted HFiles for bulk import. 
 Attached patch adds an optional code path, toggled by a configuration 
 property, that verifies the HFile under consideration for import is properly 
 sorted. The default maintains the current behavior, which does not scan the 
 file for correctness.
 Patch is against trunk but can apply against all active branches.
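
As a rough illustration of the kind of check described above (not the code in 4944.txt), sort order can be verified by remembering the previous row while scanning. The class and method names here are hypothetical, and rows are plain byte arrays compared as unsigned bytes:

```java
import java.util.List;

final class SortedRowsSketch {
    /** Unsigned lexicographic comparison of two byte arrays. */
    static int compareRows(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    /** Returns the index of the first out-of-order row, or -1 if the rows are sorted. */
    static int firstUnsortedIndex(List<byte[]> rows) {
        byte[] prevRow = null;
        for (int i = 0; i < rows.size(); i++) {
            byte[] row = rows.get(i);
            // A previous row greater than the current one means the input is not sorted.
            if (prevRow != null && compareRows(prevRow, row) > 0) {
                return i;
            }
            prevRow = row;
        }
        return -1;
    }
}
```

Equal consecutive rows are accepted in this sketch; the actual patch may apply a stricter rule.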


[jira] [Commented] (HBASE-4944) Optionally verify bulk loaded HFiles

2011-12-03 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13162281#comment-13162281
 ] 

Ted Yu commented on HBASE-4944:
---

Looks like the patch should be rebased:
{code}
4 out of 5 hunks FAILED -- saving rejects to file 
src/main/java/org/apache/hadoop/hbase/regionserver/Store.java.rej
{code}

 Optionally verify bulk loaded HFiles
 

 Key: HBASE-4944
 URL: https://issues.apache.org/jira/browse/HBASE-4944
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.92.0, 0.94.0, 0.90.5
Reporter: Andrew Purtell
Priority: Minor
 Attachments: 4944.txt




[jira] [Commented] (HBASE-3373) Allow regions of specific table to be load-balanced

2011-11-30 Thread Ted Yu (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13160233#comment-13160233
]

Ted Yu commented on HBASE-3373:
---

@Ben:
Thanks for trying out 0.94

The code snippet above deals with a region server which recently joined the
cluster. Its goal is to avoid a hot region server which receives above-average
load.
This is part of the changes from HBASE-3609. The randomization is done on this
line:
{code}
Collections.shuffle(sns, RANDOM);
{code}
where we schedule regions to region servers which are shuffled randomly.

Your observation about unbalanced table(s) in the cluster is valid. This is due
to the master not passing per-table region distribution to balanceCluster().
I have a patch in an internal repository where the master calls
balanceCluster() for each table.
Once we test it in a production cluster, I should be able to contribute it back.
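
To illustrate the per-table idea only (this is not the internal patch and not the actual balancer algorithm), the sketch below spreads each table's regions round-robin over a freshly shuffled server list. PerTableBalanceSketch and its String-based region and server names are assumptions made for the example:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

final class PerTableBalanceSketch {
    /**
     * Assigns each table's regions round-robin over a shuffled server list.
     * Input: table name -> region names. Output: server name -> assigned regions.
     * Assumes a non-empty server list.
     */
    static Map<String, List<String>> assign(Map<String, List<String>> regionsByTable,
                                            List<String> servers, Random random) {
        Map<String, List<String>> plan = new TreeMap<>();
        for (String server : servers) {
            plan.put(server, new ArrayList<>());
        }
        for (List<String> regions : regionsByTable.values()) {
            // Shuffle per table so no server is systematically first in line.
            List<String> shuffled = new ArrayList<>(servers);
            Collections.shuffle(shuffled, random);
            for (int i = 0; i < regions.size(); i++) {
                plan.get(shuffled.get(i % shuffled.size())).add(regions.get(i));
            }
        }
        return plan;
    }
}
```

Because every table is distributed on its own, a table with many regions cannot end up concentrated on one server even when the overall region counts look even.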

Allow regions of specific table to be load-balanced
---

Key: HBASE-3373
URL: https://issues.apache.org/jira/browse/HBASE-3373
Project: HBase
Issue Type: Improvement
Components: master
Affects Versions: 0.20.6
Reporter: Ted Yu
Attachments: HbaseBalancerTest2.java

From our experience, a cluster can be well balanced and yet one table's
regions may be badly concentrated on a few region servers.
For example, one table has 839 regions (380 regions at time of table
creation) out of which 202 are on one server.
It would be desirable for the load balancer to distribute regions of specified
tables evenly across the cluster. Each such table has a number of regions
many times the cluster size.

[jira] [Commented] (HBASE-4729) Race between online altering and splitting kills the master

2011-11-29 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13159180#comment-13159180
 ] 

Ted Yu commented on HBASE-4729:
---

For the unexpected zk exception, we should abort the master instead of logging 
error. 

 Race between online altering and splitting kills the master
 ---

 Key: HBASE-4729
 URL: https://issues.apache.org/jira/browse/HBASE-4729
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.92.0, 0.94.0

 Attachments: 4729-v2.txt, 4729-v3.txt, 4729-v4.txt, 4729.txt


 I was running an online alter while regions were splitting, and suddenly the 
 master died and left my table half-altered (haven't restarted the master yet).
 What killed the master:
 {quote}
 2011-11-02 17:06:44,428 FATAL org.apache.hadoop.hbase.master.HMaster: 
 Unexpected ZK exception creating node CLOSING
 org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
 NodeExists for /hbase/unassigned/f7e1783e65ea8d621a4bc96ad310f101
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:110)
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
 at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:459)
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:441)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndWatch(ZKUtil.java:769)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKAssign.createNodeClosing(ZKAssign.java:568)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1722)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1661)
 at org.apache.hadoop.hbase.master.BulkReOpen$1.run(BulkReOpen.java:69)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 {quote}
 A znode was created because the region server was splitting the region 4 
 seconds before:
 {quote}
 2011-11-02 17:06:40,704 INFO 
 org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of 
 region TestTable,0012469153,1320253135043.f7e1783e65ea8d621a4bc96ad310f101.
 2011-11-02 17:06:40,704 DEBUG 
 org.apache.hadoop.hbase.regionserver.SplitTransaction: 
 regionserver:62023-0x132f043bbde0710 Creating ephemeral node for 
 f7e1783e65ea8d621a4bc96ad310f101 in SPLITTING state
 2011-11-02 17:06:40,751 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:62023-0x132f043bbde0710 Attempting to transition node 
 f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to 
 RS_ZK_REGION_SPLITTING
 ...
 2011-11-02 17:06:44,061 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:62023-0x132f043bbde0710 Successfully transitioned node 
 f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to 
 RS_ZK_REGION_SPLIT
 2011-11-02 17:06:44,061 INFO 
 org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the 
 master to process the split for f7e1783e65ea8d621a4bc96ad310f101
 {quote}
 Now that the master is dead the region server is spewing those last two lines 
 like mad.


[jira] [Commented] (HBASE-4729) Race between online altering and splitting kills the master

2011-11-29 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13159410#comment-13159410
 ] 

Ted Yu commented on HBASE-4729:
---

Since TestAssignmentManager would be run in a shared JVM under TRUNK in the 
near future, I think it should be a medium test.

From N:
- Small tests are executed in a shared JVM. We put in this category all the
tests that can be executed quickly (the maximum execution time for a test
is 15 seconds, and they do not use a cluster) in a shared jvm.


 Race between online altering and splitting kills the master
 ---

 Key: HBASE-4729
 URL: https://issues.apache.org/jira/browse/HBASE-4729
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.92.0, 0.94.0

 Attachments: 4729-v2.txt, 4729-v3.txt, 4729-v4.txt, 4729-v5.txt, 
 4729.txt




[jira] [Commented] (HBASE-4729) Clash between region unassign and splitting kills the master

2011-11-29 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13159426#comment-13159426
 ] 

Ted Yu commented on HBASE-4729:
---

+1 on patch v5.

 Clash between region unassign and splitting kills the master
 

 Key: HBASE-4729
 URL: https://issues.apache.org/jira/browse/HBASE-4729
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.92.0, 0.94.0

 Attachments: 4729-v2.txt, 4729-v3.txt, 4729-v4.txt, 4729-v5.txt, 
 4729.txt




[jira] [Commented] (HBASE-4893) HConnectionImplementation closed-but-not-deleted, need a way to find the state of connection

2011-11-29 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13159613#comment-13159613
 ] 

Ted Yu commented on HBASE-4893:
---

HConnectionManager.deleteStaleConnection() can be utilized in the above 
proposal.

 HConnectionImplementation closed-but-not-deleted, need a way to find the 
 state of connection
 

 Key: HBASE-4893
 URL: https://issues.apache.org/jira/browse/HBASE-4893
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.1, 0.90.2, 0.90.3, 0.90.4
 Environment: Linux 2.6, HBase-0.90.1
Reporter: Mubarak Seyed
  Labels: hbase-client
 Fix For: 0.90.5


 In abort() of HConnectionManager$HConnectionImplementation, the instance of 
 HConnectionImplementation is marked as this.closed=true.
 There is no way for a client application to check whether the hbase client 
 connection is still open/good (this.closed=false) or not. We need a method 
 to validate the state of a connection, like isClosed().
 {code}
 public boolean isClosed(){
return this.closed;
 } 
 {code}
 Once the connection is closed, it should get deleted. However, the client application 
 still gets the closed connection from HConnectionManager.getConnection(Configuration) 
 and tries to make an RPC call to the RS. Since the connection is already closed, 
 HConnectionImplementation.getRegionServerWithRetries throws 
 RetriesExhaustedException with the error message:
 {code}
 Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying 
 to contact region server null for region , row 
 '----xxx', but failed after 10 attempts.
 Exceptions:
 java.io.IOException: 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7
  closed
 java.io.IOException: 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7
  closed
 java.io.IOException: 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7
  closed
 java.io.IOException: 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7
  closed
 java.io.IOException: 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7
  closed
 java.io.IOException: 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7
  closed
 java.io.IOException: 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7
  closed
 java.io.IOException: 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7
  closed
 java.io.IOException: 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7
  closed
 java.io.IOException: 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7
  closed
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1008)
   at org.apache.hadoop.hbase.client.HTable.get(HTable.java:546)
 {code}
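
A minimal sketch of how an isClosed() check could be used, assuming a simplified connection cache (ConnectionCacheSketch and SketchConnection are hypothetical and do not reproduce HConnectionManager). A closed connection found in the cache is replaced instead of being handed back to the client:

```java
import java.util.HashMap;
import java.util.Map;

final class ConnectionCacheSketch {
    /** Hypothetical connection with the proposed isClosed() accessor. */
    static final class SketchConnection {
        private volatile boolean closed;

        void abort() {
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    private final Map<String, SketchConnection> cache = new HashMap<>();

    /** Returns a live connection for the key, replacing a closed one if necessary. */
    synchronized SketchConnection getConnection(String key) {
        SketchConnection conn = cache.get(key);
        if (conn == null || conn.isClosed()) {
            // Drop the stale entry so callers never receive a closed connection.
            conn = new SketchConnection();
            cache.put(key, conn);
        }
        return conn;
    }
}
```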


[jira] [Commented] (HBASE-4616) Update hregion encoded name to reduce logic and prevent region collisions in META

2011-11-29 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13159676#comment-13159676
 ] 

Ted Yu commented on HBASE-4616:
---

Where is MetaSearchRow defined?
What's the difference between MetaSearchRow.getStartSearchRow(tableName, null) 
and MetaSearchRow.getStartSearchRow(tableName, HConstants.EMPTY_BYTE_ARRAY)?

 Update hregion encoded name to reduce logic and prevent region collisions in 
 META
 -

 Key: HBASE-4616
 URL: https://issues.apache.org/jira/browse/HBASE-4616
 Project: HBase
  Issue Type: Umbrella
Reporter: Alex Newman
Assignee: Alex Newman
 Attachments: HBASE-4616.patch





[jira] [Commented] (HBASE-4616) Update hregion encoded name to reduce logic and prevent region collisions in META

2011-11-29 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13159694#comment-13159694
 ] 

Ted Yu commented on HBASE-4616:
---

In HRegionInfo.java:
{code}
  public static final int END_OF_TABLE_TABLE_NAME = END_OF_TABLE_NAME + 1;
{code}
I think END_OF_TABLE_NAME_FOR_EMPTY_ENDKEY would be a better name.

In createRegionName():
{code}
if (id != null || id.length > 0 ) {
{code}
&& should be used above.
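
The difference matters when id is null: with ||, the second operand is evaluated and id.length throws a NullPointerException, while && short-circuits. A small sketch, assuming the comparison in the quoted line is a greater-than-zero length check (the class name is hypothetical):

```java
final class NullGuardSketch {
    /** Safe form: && short-circuits, so id.length is only read when id is non-null. */
    static boolean hasId(byte[] id) {
        return id != null && id.length > 0;
    }

    /** Unsafe form kept for contrast: || still evaluates id.length when id is null. */
    static boolean hasIdUnsafe(byte[] id) {
        return id != null || id.length > 0;
    }
}
```

Note that the || form also returns true for an empty array, which is a second reason to prefer &&.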

 Update hregion encoded name to reduce logic and prevent region collisions in 
 META
 -

 Key: HBASE-4616
 URL: https://issues.apache.org/jira/browse/HBASE-4616
 Project: HBase
  Issue Type: Umbrella
Reporter: Alex Newman
Assignee: Alex Newman
 Attachments: HBASE-4616.patch





[jira] [Commented] (HBASE-4616) Update hregion encoded name to reduce logic and prevent region collisions in META

2011-11-29 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13159761#comment-13159761
 ] 

Ted Yu commented on HBASE-4616:
---

In SplitTransaction.java, getDaughterRegionIdTimestamp() is removed.
I think we should take care of clock skew.

Also, using hri.getRegionId() - 1 as Id for daughter region Ids is not 
intuitive.

 Update hregion encoded name to reduce logic and prevent region collisions in 
 META
 -

 Key: HBASE-4616
 URL: https://issues.apache.org/jira/browse/HBASE-4616
 Project: HBase
  Issue Type: Umbrella
Reporter: Alex Newman
Assignee: Alex Newman
 Attachments: HBASE-4616.patch





[jira] [Commented] (HBASE-4899) Region would be assigned twice easily with continually killing server and moving region in testing environment

2011-11-29 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13159858#comment-13159858
 ] 

Ted Yu commented on HBASE-4899:
---

{code}
+  + " because it has been opened in "
+  + addressFromAM.getServerName());
{code}
We should use the value of rit (RegionState) in the above log instead of hard 
coding 'opened'.

 Region would be assigned twice easily with continually  killing server and 
 moving region in testing environment
 ---

 Key: HBASE-4899
 URL: https://issues.apache.org/jira/browse/HBASE-4899
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: chunhui shen
 Attachments: hbase-4899.patch


 Before assigning a region in ServerShutdownHandler#process, it will check 
 whether the region is in RIT;
 however, this check doesn't work as expected in the following case:
 1. move region A from server B to server C
 2. kill server B
 3. start server B immediately
 Let's see what happens in the code for the above case:
 {code}
 for step1:
 1.1 server B close the region A,
 1.2 master setOffline for region 
 A,(AssignmentManager#setOffline:this.regions.remove(regionInfo))
 1.3 server C start to open region A.(Not completed)
 for step3:
 master ServerShutdownHandler#process() for server B
 {
 ..
 splitlog()
 ...
 List<RegionState> regionsInTransition =
 this.services.getAssignmentManager()
 .processServerShutdown(this.serverName);
 ...
 Skip regions that were in transition unless CLOSING or PENDING_CLOSE
 ...
 assign region
 }
 {code}
 In fact, when running 
 ServerShutdownHandler#process()#this.services.getAssignmentManager().processServerShutdown(this.serverName),
  region A is in RIT (step 1.3 not completed), but the returned List<RegionState> 
 regionsInTransition doesn't contain it, because region A has been removed from 
 AssignmentManager.regions by AssignmentManager#setOffline in step 1.2.
 Therefore, region A will be assigned twice.
 Actually, one server being killed and started twice will also easily cause a region 
 to be assigned twice.
 Besides the above reason, there is another possibility: 
 when ServerShutdownHandler#process()#MetaReader.getServerUserRegions executes, 
 a region which is in RIT at that time is included.
 But after MetaReader.getServerUserRegions completes, the region has been 
 opened on another server and is no longer in RIT.
 In our testing environment, where balancing, moving and killing are executed 
 periodically, assigning a region twice often happens, and it is a nuisance because it 
 will affect other test cases.


[jira] [Commented] (HBASE-4899) Region would be assigned twice easily with continually killing server and moving region in testing environment

2011-11-29 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13159859#comment-13159859
 ] 

Ted Yu commented on HBASE-4899:
---

@Chunhui:
Please let us know testing result on your QA environment.

 Region would be assigned twice easily with continually  killing server and 
 moving region in testing environment
 ---

 Key: HBASE-4899
 URL: https://issues.apache.org/jira/browse/HBASE-4899
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: hbase-4899.patch


 Before assigning region in ServerShutdownHandler#process, it will check 
 whether region is in RIT,
 however, this checking doesn't work as the excepted in the following case:
 1.move region A from server B to server C
 2.kill server B
 3.start server B immediately
 Let's see what happen in the code for the above case
 {code}
 for step1:
 1.1 server B close the region A,
 1.2 master setOffline for region 
 A,(AssignmentManager#setOffline:this.regions.remove(regionInfo))
 1.3 server C start to open region A.(Not completed)
 for step3:
 master ServerShutdownHandler#process() for server B
 {
 ..
 splitlog()
 ...
 ListRegionState regionsInTransition =
 this.services.getAssignmentManager()
 .processServerShutdown(this.serverName);
 ...
 Skip regions that were in transition unless CLOSING or PENDING_CLOSE
 ...
 assign region
 }
 {code}
 In fact, when ServerShutdownHandler#process() calls 
 this.services.getAssignmentManager().processServerShutdown(this.serverName), 
 region A is in RIT (step 1.3 has not completed), but the returned 
 List<RegionState> regionsInTransition does not contain it, because region A 
 was removed from AssignmentManager.regions by AssignmentManager#setOffline in 
 step 1.2.
 Therefore, region A will be assigned twice.
 In fact, killing and starting one server twice will also easily cause a 
 region to be assigned twice.
 Apart from the above, there is another possibility: 
 when ServerShutdownHandler#process() calls MetaReader.getServerUserRegions, 
 the result includes a region that is in RIT at that moment. 
 But by the time MetaReader.getServerUserRegions completes, the region has 
 been opened on another server and is no longer in RIT.
 In our testing environment, where balancing, moving and killing are executed 
 periodically, regions are often assigned twice, and this is a nuisance 
 because it affects other test cases.
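
 The first scenario above can be reproduced in a toy model. The sketch below is 
 not HBase code: the class DoubleAssignSketch, its two maps and its method 
 bodies are hypothetical stand-ins, and it assumes only what the description 
 states, namely that the RIT lookup during shutdown handling walks the same 
 map that setOffline has already removed the region from.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the bookkeeping described above; hypothetical, not HBase code. */
final class DoubleAssignSketch {
    // region -> hosting server (stands in for AssignmentManager.regions)
    public final Map<String, String> regions = new HashMap<>();
    // region -> destination server, for regions in transition (RIT)
    public final Map<String, String> regionsInTransition = new HashMap<>();
    // every assignment the master issues, in order
    public final List<String> assignments = new ArrayList<>();

    /** Steps 1.1 to 1.3: the move has started, the open on dest is not completed. */
    public void startMove(String region, String dest) {
        regions.remove(region);                 // step 1.2: setOffline
        regionsInTransition.put(region, dest);  // step 1.3: open still pending
        assignments.add(region + "->" + dest);
    }

    /** Returns RIT regions, but only looks at regions still mapped to the dead server. */
    public List<String> processServerShutdown(String deadServer) {
        List<String> rit = new ArrayList<>();
        for (Map.Entry<String, String> e : regions.entrySet()) {
            if (e.getValue().equals(deadServer)
                    && regionsInTransition.containsKey(e.getKey())) {
                rit.add(e.getKey());
            }
        }
        return rit;
    }

    /** Step 3: reassign every region listed for the dead server unless it is in rit. */
    public void handleShutdown(String deadServer, List<String> regionsFromMeta,
                               String fallback) {
        List<String> rit = processServerShutdown(deadServer);
        for (String region : regionsFromMeta) {
            if (rit.contains(region)) {
                continue; // skipped only if the lookup above found it
            }
            assignments.add(region + "->" + fallback);
        }
    }

    public static void main(String[] args) {
        DoubleAssignSketch s = new DoubleAssignSketch();
        s.regions.put("A", "B");
        s.startMove("A", "C");
        s.handleShutdown("B", Arrays.asList("A"), "D");
        System.out.println(s.assignments); // region A is assigned twice
    }
}
```

 In this model the second assignment disappears if the shutdown handler 
 consults regionsInTransition directly instead of going through regions; 
 whether that matches the attached patch is not something the sketch claims.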

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4773) HBaseAdmin may leak ZooKeeper connections

2011-11-28 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13158569#comment-13158569
 ] 

Ted Yu commented on HBASE-4773:
---

Integrated to 0.90, 0.92 and TRUNK.

Thanks for the patch Xufeng.

Thanks for the review Jinchao and Ramkrishna.

 HBaseAdmin may leak ZooKeeper connections
 -

 Key: HBASE-4773
 URL: https://issues.apache.org/jira/browse/HBASE-4773
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4
Reporter: gaojinchao
Priority: Critical
 Fix For: 0.90.5

 Attachments: 4773.patch, branches_4773.patch, trunk_4773_patch.patch


 When the master crashes, HBaseAdmin leaks ZooKeeper connections.
 I think we should close the ZK connection when MasterNotRunningException is thrown:
   public HBaseAdmin(Configuration c)
       throws MasterNotRunningException, ZooKeeperConnectionException {
     this.conf = HBaseConfiguration.create(c);
     this.connection = HConnectionManager.getConnection(this.conf);
     this.pause = this.conf.getLong("hbase.client.pause", 1000);
     this.numRetries = this.conf.getInt("hbase.client.retries.number", 10);
     this.retryLongerMultiplier =
         this.conf.getInt("hbase.client.retries.longer.multiplier", 10);
     // we should add this code to close the zk connection
     try {
       this.connection.getMaster();
     } catch (MasterNotRunningException e) {
       HConnectionManager.deleteConnection(conf, false);
       throw e;
     }
   }
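
 The same idea, releasing in the constructor whatever was acquired before the 
 failing call, can be shown without any HBase dependency. This is a minimal 
 sketch: AdminSketch and its nested Connection class are hypothetical 
 stand-ins, and IllegalStateException plays the role of 
 MasterNotRunningException.

```java
/** Hypothetical stand-in for HBaseAdmin; illustrates cleanup on constructor failure. */
final class AdminSketch {
    /** Hypothetical stand-in for a connection that holds a ZooKeeper session. */
    static final class Connection implements AutoCloseable {
        public boolean closed = false;
        private final boolean masterRunning;

        public Connection(boolean masterRunning) {
            this.masterRunning = masterRunning;
        }

        public void getMaster() {
            if (!masterRunning) {
                throw new IllegalStateException("master not running");
            }
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    public final Connection connection;

    public AdminSketch(Connection connection) {
        this.connection = connection;
        try {
            connection.getMaster();
        } catch (IllegalStateException e) {
            // The caller never receives this object, so nobody else can close
            // the connection; release it here before propagating the failure.
            connection.close();
            throw e;
        }
    }

    public static void main(String[] args) {
        Connection c = new Connection(false);
        try {
            new AdminSketch(c);
        } catch (IllegalStateException e) {
            System.out.println("closed after failure: " + c.closed);
        }
    }
}
```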


[jira] [Commented] (HBASE-4120) isolation and allocation

2011-11-28 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13158589#comment-13158589
 ] 

Ted Yu commented on HBASE-4120:
---

The new tests should be placed under 
src/test/java/org/apache/hadoop/hbase/allocation/, instead of 
src/test/java/org/apache/hadoop/hbase/allocation/test

 isolation and allocation
 

 Key: HBASE-4120
 URL: https://issues.apache.org/jira/browse/HBASE-4120
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver
Affects Versions: 0.90.2, 0.90.3, 0.90.4, 0.92.0
Reporter: Liu Jia
Assignee: Liu Jia
 Fix For: 0.94.0

 Attachments: Design_document_for_HBase_isolation_and_allocation.pdf, 
 Design_document_for_HBase_isolation_and_allocation_Revised.pdf, 
 HBase_isolation_and_allocation_user_guide.pdf, 
 Performance_of_Table_priority.pdf, System Structure.jpg, TablePriority.patch, 
 TablePriority_v8.patch, TablePriority_v8.patch, 
 TablePriority_v8_for_trunk.patch, TablePrioriy_v9.patch


 The HBase isolation and allocation tool is designed to help users manage 
 cluster resources among different applications and tables.
 When many applications run on a large HBase cluster, many problems arise. 
 At Taobao there is a cluster that many departments use to test the 
 performance of their HBase-based applications. On this cluster, which has 12 
 servers, only one application can run exclusively at a time, and the other 
 applications must wait until the previous test has finished.
 After we add the allocation management function to the cluster, applications 
 can share the cluster and run concurrently. Also, if a test engineer wants to 
 make sure there is no interference, he/she can move other tables out of 
 this group.
 Within a group we use table priority to allocate resources; when the system 
 is busy, we can make sure high-priority tables are not affected by 
 lower-priority tables.
 Different groups can have different region server configurations: groups 
 optimized for reading can have a large block cache, and groups optimized 
 for writing can have a large memstore.
 Tables and region servers can be moved easily between groups; after changing 
 the configuration, a group can be restarted alone instead of restarting the 
 whole cluster.
 Git entry: https://github.com/ICT-Ope/HBase_allocation
 We hope our work is helpful.


[jira] [Commented] (HBASE-4120) isolation and allocation

2011-11-28 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13158598#comment-13158598
 ] 

Ted Yu commented on HBASE-4120:
---

Please also categorize the new tests.
See N's email on d...@hbase.apache.org, entitled 'unit tests - pom.xml  
surefire changed - categories are available', for guidelines.


[jira] [Commented] (HBASE-4120) isolation and allocation

2011-11-28 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13158626#comment-13158626
 ] 

Ted Yu commented on HBASE-4120:
---

Indentation in certain classes was off.
e.g.
{code}
public static int initRegionPriority(HRegion region) {
{code}
Please use the Eclipse formatter that Nicolas attached to HBASE-3678 to format 
every file.

{code}
private static int initRegionPriority(String region, boolean force) {
...
  if (ret != null)
return ret;
{code}
Please enclose the return statement in curly braces.
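
For illustration, the requested brace style looks like the sketch below. The 
class BraceStyleSketch and the method priorityOrDefault are hypothetical and 
are not taken from the patch; only the shape of the if statement matters.

```java
/** Hypothetical example of the brace style requested above. */
final class BraceStyleSketch {
    /** Returns the cached priority if present, otherwise the default. */
    public static int priorityOrDefault(Integer cached, int defaultPriority) {
        if (cached != null) {
            return cached;
        }
        return defaultPriority;
    }

    public static void main(String[] args) {
        System.out.println(priorityOrDefault(5, 1));
        System.out.println(priorityOrDefault(null, 1));
    }
}
```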




[jira] [Commented] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change

2011-11-28 Thread Ted Yu (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13158648#comment-13158648
]

Ted Yu commented on HBASE-4820:
---

@Jimmy:
Please upload the latest patch to Review Board.

The test report at
https://builds.apache.org/job/PreCommit-HBASE-Build/367//testReport/ was
incomplete. Please run the tests related to log splitting manually.

Let's see what Kannan has to say.

Distributed log splitting coding enhancement to make it easier to understand,
no semantics change
-

Key: HBASE-4820
URL: https://issues.apache.org/jira/browse/HBASE-4820
Project: HBase
Issue Type: Improvement
Components: wal
Affects Versions: 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
Labels: newbie
Fix For: 0.94.0

Attachments:
0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch,

0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch,
0001-HBASE-4820-Minor-distributed-log-splitting-enhanceme.patch,
0001-HBASE-4820-Minor-distributed-log-splitting-enhanceme.patch,
0001-HBASE-4820-Minor-distributed-log-splitting-enhanceme_new.patch

While reviewing the distributed log splitting feature, we found some cosmetic
issues that make the code hard to understand.
It would be great to fix them. For this issue, there should be no semantic
change.
