[jira] [Commented] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-04 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179335#comment-13179335
 ] 

Phabricator commented on HBASE-4218:


mcorgan has commented on the revision [jira] [HBASE-4218] HFile data block 
encoding framework and delta encoding implementation.

  Trying to review this with an eye on schema changes and compactions.

INLINE COMMENTS
  
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java:241
 What about the situation where a regionserver has been running for a while with 
ENCODING_IN_MEMORY=true and the block cache has filled with encoded blocks, and 
then the user does a schema change to disable encoding altogether?  Now the block 
cache may return an old encoded block.  (Assuming an online schema change doesn't 
invalidate all blocks for a table?)

  If I'm understanding that correctly, then it shouldn't be an 
IllegalStateException but should be handled normally.  It should probably 
invalidate the encoded block in the block cache if possible; otherwise it 
will expire normally.  Then it should return null so that HFileReaderV2 knows 
to go to the filesystem to get the block.
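
A minimal sketch of the handling being suggested here, assuming the cache lookup path knows whether encoding is still enabled for the family. Class and method names only approximate the 0.94-era BlockCache/HFileBlock interfaces; this is not the actual patch.
{code}
// Sketch only: drop a stale encoded block and fall back to the filesystem read
// path.  Types and signatures are approximated, not taken from the patch.
static HFileBlock getBlockRespectingEncoding(BlockCache cache, BlockCacheKey key,
    boolean encodingDisabledForFamily) {
  HFileBlock cached = (HFileBlock) cache.getBlock(key, true);
  if (cached == null) {
    return null;
  }
  if (encodingDisabledForFamily
      && cached.getBlockType() == BlockType.ENCODED_DATA) {
    cache.evictBlock(key);   // invalidate the old encoded copy if possible
    return null;             // caller (HFileReaderV2) re-reads the block from disk
  }
  return cached;
}
{code}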

REVISION DETAIL
  https://reviews.facebook.net/D447


 Data Block Encoding of KeyValues  (aka delta encoding / prefix compression)
 ---

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
  Labels: compression
 Fix For: 0.94.0

 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
 0001-Delta-encoding.patch, 4218-v16.txt, 4218.txt, D447.1.patch, 
 D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, 
 D447.15.patch, D447.16.patch, D447.17.patch, D447.2.patch, D447.3.patch, 
 D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, 
 D447.9.patch, Data-block-encoding-2011-12-23.patch, 
 Delta-encoding.patch-2011-12-22_11_52_07.patch, 
 Delta_encoding_with_memstore_TS.patch, open-source.diff


 A compression for keys. Keys are sorted in HFile and they are usually very 
 similar. Because of that, it is possible to design better compression than 
 general purpose algorithms.
 It is an additional step designed to be used in memory. It aims to save 
 memory in the cache as well as to speed up seeks within HFileBlocks. It should 
 improve performance a lot if key lengths are larger than value lengths. For 
 example, it makes a lot of sense to use it when the value is a counter.
 Initial tests on real data (key length = ~90 bytes, value length = 8 bytes) 
 show that I could achieve a decent level of compression:
  key compression ratio: 92%
  total compression ratio: 85%
  LZO on the same data: 85%
  LZO after delta encoding: 91%
 While having much better performance (20-80% faster decompression than 
 LZO). Moreover, it should allow far more efficient seeking, which should 
 improve performance a bit.
 It seems that simple compression algorithms are good enough. Most of the 
 savings are due to prefix compression, int128 encoding, timestamp diffs and 
 bitfields to avoid duplication. That way, comparisons of compressed data can 
 be much faster than with a byte comparator (thanks to prefix compression and 
 bitfields).
 In order to implement it in HBase, two important changes in design will be 
 needed:
 - solidify the interface to HFileBlock / HFileReader Scanner to provide seeking 
 and iterating; access to the uncompressed buffer in HFileBlock will have bad 
 performance
 - extend comparators to support comparison assuming that the first N bytes are 
 equal (or some fields are equal)
 Link to a discussion about something similar:
 http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression
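
As an illustration of the prefix-compression idea described above (a toy sketch, not the encoding format used by the patch), consecutive sorted keys can be stored as a shared-prefix length plus the remaining suffix:
{code}
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Toy prefix encoder for sorted keys: each key is written as
// (commonPrefixLength, suffixLength, suffix bytes).
public class PrefixEncodeSketch {
  static byte[] encode(byte[][] sortedKeys) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bos);
    byte[] prev = new byte[0];
    for (byte[] key : sortedKeys) {
      int common = 0;
      int limit = Math.min(prev.length, key.length);
      while (common < limit && prev[common] == key[common]) {
        common++;
      }
      out.writeShort(common);               // bytes shared with the previous key
      out.writeShort(key.length - common);  // length of the new suffix
      out.write(key, common, key.length - common);
      prev = key;
    }
    out.flush();
    return bos.toByteArray();
  }
}
{code}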

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-04 Thread Mikhail Bautin (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179349#comment-13179349
 ] 

Mikhail Bautin commented on HBASE-4218:
---

A brief status update. I am in the process of implementing support for column 
family data block encoding configuration changes. Those changes are coming in 
the next version of the patch that I will post tomorrow. After discussing this 
with Kannan, our solution is:
* Assign an in-cache data block encoding to every HFile reader. This in-cache 
encoding is determined as follows:
** If the HFile is not encoded on disk, the in-cache encoding is set to the 
column family's DATA_BLOCK_ENCODING.
** If the HFile is encoded on disk, the in-cache encoding is set to the HFile 
encoding to avoid the wasted effort of re-encoding blocks for cache.
* When a non-encoded block is loaded from disk, it is encoded using the 
in-cache encoding and put in cache.
* When an encoded block is loaded from disk, its encoding is left as is.
* To reduce the complexity of data block encoding switching, we can include the 
in-cache encoding type in the block cache key. For example, if 
ENCODED_IN_CACHE_ONLY is turned on without encoding on disk, and then the 
encoding is turned off altogether, the cache will be populated with non-encoded 
blocks (since they will have completely different keys) and encoded blocks will 
age out from the cache. While this is suboptimal, the implementation is very 
simple and the common case (when the CF encoding options do not change) is not 
complicated with unnecessary corner cases.
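
A rough sketch of that last point: folding the in-cache encoding into the cache key means blocks cached with different encodings can never collide. The real patch works against HBase's own block cache key class; the standalone class below is only an illustration.
{code}
public final class EncodingAwareCacheKey {
  private final String hfileName;
  private final long offset;
  private final String inCacheEncoding;   // e.g. "NONE", "PREFIX", "DIFF"

  public EncodingAwareCacheKey(String hfileName, long offset, String inCacheEncoding) {
    this.hfileName = hfileName;
    this.offset = offset;
    this.inCacheEncoding = inCacheEncoding;
  }

  @Override
  public int hashCode() {
    int h = hfileName.hashCode();
    h = 31 * h + (int) (offset ^ (offset >>> 32));
    return 31 * h + inCacheEncoding.hashCode();
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof EncodingAwareCacheKey)) return false;
    EncodingAwareCacheKey other = (EncodingAwareCacheKey) o;
    return offset == other.offset
        && hfileName.equals(other.hfileName)
        && inCacheEncoding.equals(other.inCacheEncoding);
  }
}
{code}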


 Data Block Encoding of KeyValues  (aka delta encoding / prefix compression)
 ---

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
  Labels: compression
 Fix For: 0.94.0

 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
 0001-Delta-encoding.patch, 4218-v16.txt, 4218.txt, D447.1.patch, 
 D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, 
 D447.15.patch, D447.16.patch, D447.17.patch, D447.2.patch, D447.3.patch, 
 D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, 
 D447.9.patch, Data-block-encoding-2011-12-23.patch, 
 Delta-encoding.patch-2011-12-22_11_52_07.patch, 
 Delta_encoding_with_memstore_TS.patch, open-source.diff


 A compression for keys. Keys are sorted in HFile and they are usually very 
 similar. Because of that, it is possible to design better compression than 
 general purpose algorithms.
 It is an additional step designed to be used in memory. It aims to save 
 memory in the cache as well as to speed up seeks within HFileBlocks. It should 
 improve performance a lot if key lengths are larger than value lengths. For 
 example, it makes a lot of sense to use it when the value is a counter.
 Initial tests on real data (key length = ~90 bytes, value length = 8 bytes) 
 show that I could achieve a decent level of compression:
  key compression ratio: 92%
  total compression ratio: 85%
  LZO on the same data: 85%
  LZO after delta encoding: 91%
 While having much better performance (20-80% faster decompression than 
 LZO). Moreover, it should allow far more efficient seeking, which should 
 improve performance a bit.
 It seems that simple compression algorithms are good enough. Most of the 
 savings are due to prefix compression, int128 encoding, timestamp diffs and 
 bitfields to avoid duplication. That way, comparisons of compressed data can 
 be much faster than with a byte comparator (thanks to prefix compression and 
 bitfields).
 In order to implement it in HBase, two important changes in design will be 
 needed:
 - solidify the interface to HFileBlock / HFileReader Scanner to provide seeking 
 and iterating; access to the uncompressed buffer in HFileBlock will have bad 
 performance
 - extend comparators to support comparison assuming that the first N bytes are 
 equal (or some fields are equal)
 Link to a discussion about something similar:
 http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5121) MajorCompaction may affect scan's correctness

2012-01-04 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179390#comment-13179390
 ] 

Zhihong Yu commented on HBASE-5121:
---

Should null assignment for lastTop be after the debug log ?

 MajorCompaction may affect scan's correctness
 -

 Key: HBASE-5121
 URL: https://issues.apache.org/jira/browse/HBASE-5121
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
 Attachments: hbase-5121.patch


 In our test, there are KeyValues from two families for one row.
 We found an infrequent problem in the scan's next() if a major compaction 
 happens concurrently.
 In two consecutive client calls to scan.next():
 1. The first call returns a result where family A is null.
 2. The second call returns a result where family B is null.
 The two next() results have the same row.
 If there are more families, I think the scenario will be even stranger...
 We found that the reason is that StoreScanner.peek() changes after a major 
 compaction if there are delete-type KeyValues.
 This change means the PriorityQueue of KeyValueScanners in the 
 RegionScanner's heap is no longer guaranteed to be sorted.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179394#comment-13179394
 ] 

Zhihong Yu commented on HBASE-5081:
---

In the above scenario, would there be many duplicate log messages in master 
log file ?

 Distributed log splitting deleteNode races against splitLog retry 
 --

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Prakash Khemani
 Fix For: 0.92.0

 Attachments: 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 distributed-log-splitting-screenshot.png, hbase-5081-patch-v6.txt, 
 hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
 hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
 patch_for_92_v3.txt


 Recently, during 0.92 RC testing, we found that distributed log splitting hangs 
 there forever.  Please see the attached screenshot.
 I looked into it and here is what I think happened:
 1. One regionserver died; the ServerShutdownHandler found this out and started 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. The ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a 
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in 
 the callback, it removed that task from the hashmap;
 6. One of the newly submitted tasks' ZooKeeper watchers found that the task is 
 unassigned, and since it is not in the hashmap, it created a new orphan task.
 7. All three tasks failed, but the task created in step 6 is an orphan, so 
 the batch.err counter came up one short and the log splitting hangs there, 
 waiting forever for a last task that is never going to finish.
 So I think the problem is step 2.  The fix is to make the deletion synchronous 
 instead of asynchronous, so that the retry has a clean start.
 An async deleteNode will interfere with the split-log retry.  In the extreme 
 case, if the async deleteNode doesn't happen soon enough, a node created 
 during the retry could be deleted.
 deleteNode should be synchronous.
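
For illustration, the synchronous/asynchronous distinction the fix relies on, shown with the plain ZooKeeper client API (the actual HBase code path goes through its own ZooKeeper helpers, so this is only a sketch):
{code}
import org.apache.zookeeper.AsyncCallback;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class DeleteNodeSketch {
  // Synchronous delete: does not return until the znode is gone (or the call
  // fails), so a following splitLog retry starts from a clean slate.
  static void deleteNodeSync(ZooKeeper zk, String path)
      throws KeeperException, InterruptedException {
    zk.delete(path, -1);   // -1 = any version
  }

  // Asynchronous delete: returns immediately; the actual deletion may land
  // after the retry has already re-created the task znode -- the race above.
  static void deleteNodeAsync(ZooKeeper zk, String path) {
    zk.delete(path, -1, new AsyncCallback.VoidCallback() {
      public void processResult(int rc, String p, Object ctx) {
        // runs later, possibly removing a node created by the retry
      }
    }, null);
  }
}
{code}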

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3939) Some crossports of Hadoop IPC fixes

2012-01-04 Thread Benoit Sigoure (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179450#comment-13179450
 ] 

Benoit Sigoure commented on HBASE-3939:
---

On 04/Nov/11 00:07, Stack wrote:
bq. So why introduce it? Just so its in place when we want to use it later?

Does anyone know the answer?

 Some crossports of Hadoop IPC fixes
 ---

 Key: HBASE-3939
 URL: https://issues.apache.org/jira/browse/HBASE-3939
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Priority: Critical
 Fix For: 0.92.0

 Attachments: 3939-v2.txt, 3939-v3.txt, 3939-v4.txt, 3939-v5.txt, 
 3939-v6.txt, 3939-v7.txt, 3939-v8.txt, 3939-v9.txt, 3939.txt


 A few fixes from Hadoop IPC that we should probably cross-port into our copy:
 - HADOOP-7227: remove the protocol version check at call time
 - HADOOP-7146: fix a socket leak in server
 - HADOOP-7121: fix behavior when response serialization throws an exception
 - HADOOP-7346: send back nicer error response when client is using an out of 
 date IPC version

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread Zhihong Yu (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179394#comment-13179394
 ] 

Zhihong Yu edited comment on HBASE-5081 at 1/4/12 2:48 PM:
---

In the above scenario, would there be many duplicate "log split scheduled for ..." 
messages in the master log file?

I think the latest patch should go through verification in a real cluster.

  was (Author: zhi...@ebaysf.com):
In the above scenario, would there be many duplicate log messages in 
master log file ?
  
 Distributed log splitting deleteNode races against splitLog retry 
 --

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Prakash Khemani
 Fix For: 0.92.0

 Attachments: 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 distributed-log-splitting-screenshot.png, hbase-5081-patch-v6.txt, 
 hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
 hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
 patch_for_92_v3.txt


 Recently, during 0.92 RC testing, we found that distributed log splitting hangs 
 there forever.  Please see the attached screenshot.
 I looked into it and here is what I think happened:
 1. One regionserver died; the ServerShutdownHandler found this out and started 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. The ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a 
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in 
 the callback, it removed that task from the hashmap;
 6. One of the newly submitted tasks' ZooKeeper watchers found that the task is 
 unassigned, and since it is not in the hashmap, it created a new orphan task.
 7. All three tasks failed, but the task created in step 6 is an orphan, so 
 the batch.err counter came up one short and the log splitting hangs there, 
 waiting forever for a last task that is never going to finish.
 So I think the problem is step 2.  The fix is to make the deletion synchronous 
 instead of asynchronous, so that the retry has a clean start.
 An async deleteNode will interfere with the split-log retry.  In the extreme 
 case, if the async deleteNode doesn't happen soon enough, a node created 
 during the retry could be deleted.
 deleteNode should be synchronous.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5122) Provide flexible method for loading ColumnInterpreters

2012-01-04 Thread Zhihong Yu (Created) (JIRA)
Provide flexible method for loading ColumnInterpreters
--

 Key: HBASE-5122
 URL: https://issues.apache.org/jira/browse/HBASE-5122
 Project: HBase
  Issue Type: Task
Reporter: Zhihong Yu


See the discussion on user list entitled 'AggregateProtocol Help'
From Royston:
{noformat}
I re-created my HBase table to contain Bytes.toBytes(Long) values and that 
fixed it.
{noformat}
It was not the intention when AggregateProtocol was designed that users have to 
change their schema to match LongColumnInterpreter.

This JIRA aims to provide a flexible way for users to load their custom 
ColumnInterpreters into region servers.
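
One possible shape for that, purely as a sketch: resolve the interpreter class by reflection from configuration. The configuration key shown is made up for this example and is not an existing HBase property.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.coprocessor.ColumnInterpreter;

public class InterpreterLoaderSketch {
  // Loads the interpreter named in configuration, defaulting to the existing
  // LongColumnInterpreter.  "hbase.aggregate.column.interpreter" is hypothetical.
  static ColumnInterpreter<?, ?> load(Configuration conf) throws Exception {
    String className = conf.get("hbase.aggregate.column.interpreter",
        "org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter");
    Class<?> clazz = Class.forName(className);
    return (ColumnInterpreter<?, ?>) clazz.newInstance();
  }
}
{code}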

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5123) Provide more aggregate functions for Aggregations Protocol

2012-01-04 Thread Zhihong Yu (Created) (JIRA)
Provide more aggregate functions for Aggregations Protocol
--

 Key: HBASE-5123
 URL: https://issues.apache.org/jira/browse/HBASE-5123
 Project: HBase
  Issue Type: Improvement
Reporter: Zhihong Yu


Royston requested the following aggregates on top of what we already have:
Median, Weighted Median, Mult

See discussion entitled 'AggregateProtocol Help' on user list

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3939) Some crossports of Hadoop IPC fixes

2012-01-04 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179551#comment-13179551
 ] 

stack commented on HBASE-3939:
--

@Benoît Is this a blocker for you?

 Some crossports of Hadoop IPC fixes
 ---

 Key: HBASE-3939
 URL: https://issues.apache.org/jira/browse/HBASE-3939
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Priority: Critical
 Fix For: 0.92.0

 Attachments: 3939-v2.txt, 3939-v3.txt, 3939-v4.txt, 3939-v5.txt, 
 3939-v6.txt, 3939-v7.txt, 3939-v8.txt, 3939-v9.txt, 3939.txt


 A few fixes from Hadoop IPC that we should probably cross-port into our copy:
 - HADOOP-7227: remove the protocol version check at call time
 - HADOOP-7146: fix a socket leak in server
 - HADOOP-7121: fix behavior when response serialization throws an exception
 - HADOOP-7346: send back nicer error response when client is using an out of 
 date IPC version

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4565) Maven HBase build broken on cygwin with copynativelib.sh call.

2012-01-04 Thread Suraj Varma (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179550#comment-13179550
 ] 

Suraj Varma commented on HBASE-4565:


Ah - looks like it is out of sync with 0.92 and trunk now with the 
hbase-security pom updates. I'll rebase and put out a new patch version.

 Maven HBase build broken on cygwin with copynativelib.sh call.
 --

 Key: HBASE-4565
 URL: https://issues.apache.org/jira/browse/HBASE-4565
 Project: HBase
  Issue Type: Bug
  Components: build
Affects Versions: 0.92.0
 Environment: cygwin (on xp and win7)
Reporter: Suraj Varma
Assignee: Suraj Varma
  Labels: build, maven
 Fix For: 0.94.0

 Attachments: HBASE-4565-0.92.patch, HBASE-4565-v2.patch, 
 HBASE-4565-v3-0.92.patch, HBASE-4565-v3.patch, HBASE-4565.patch


 This is broken in both the 0.92 and trunk pom.xml.
 Here's a sample Maven log snippet from trunk (from Mayuresh on the user 
 mailing list):
 [INFO] [antrun:run {execution: package}]
 [INFO] Executing tasks
 main:
[mkdir] Created dir: 
 D:\workspace\mkshirsa\hbase-trunk\target\hbase-0.93-SNAPSHOT\hbase-0.93-SNAPSHOT\lib\native\${build.platform}
 [exec] ls: cannot access D:workspacemkshirsahbase-trunktarget/nativelib: 
 No such file or directory
 [exec] tar (child): Cannot connect to D: resolve failed
 [INFO] 
 
 [ERROR] BUILD ERROR
 [INFO] 
 
 [INFO] An Ant BuildException has occured: exec returned: 3328
 There are two issues: 
 1) The antrun task below doesn't resolve the Windows file separators returned 
 by ${project.build.directory} - this causes the "resolve failed" error above.
 <!-- Using Unix cp to preserve symlinks, using script to handle wildcards -->
 <echo file="${project.build.directory}/copynativelibs.sh">
 if [ `ls ${project.build.directory}/nativelib | wc -l` -ne 0]; then
 2) The tar argument value below also has a similar issue in that the path arg 
 doesn't resolve correctly.
 <!-- Using Unix tar to preserve symlinks -->
 <exec executable="tar" failonerror="yes" 
 dir="${project.build.directory}/${project.artifactId}-${project.version}">
 <arg value="czf"/>
 <arg 
 value="/cygdrive/c/workspaces/hbase-0.92-svn/target/${project.artifactId}-${project.version}.tar.gz"/>
 <arg value="."/>
 </exec>
 In both cases, the fix would probably be to use a cross-platform way to 
 handle the directory locations. 
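
As a trivial illustration of the separator problem (not a proposed patch), the Windows-style path has to be normalized before the Unix tools invoked by the antrun step can use it:
{code}
public class PathNormalizeSketch {
  // D:\workspace\mkshirsa\hbase-trunk\target -> D:/workspace/mkshirsa/hbase-trunk/target
  static String toForwardSlashes(String path) {
    return path.replace('\\', '/');
  }

  public static void main(String[] args) {
    System.out.println(toForwardSlashes("D:\\workspace\\mkshirsa\\hbase-trunk\\target"));
  }
}
{code}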

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5123) Provide more aggregate functions for Aggregations Protocol

2012-01-04 Thread Tom Wilcox (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179559#comment-13179559
 ] 

Tom Wilcox commented on HBASE-5123:
---

SumProduct is probably another useful one.

 Provide more aggregate functions for Aggregations Protocol
 --

 Key: HBASE-5123
 URL: https://issues.apache.org/jira/browse/HBASE-5123
 Project: HBase
  Issue Type: Improvement
Reporter: Zhihong Yu

 Royston requested the following aggregates on top of what we already have:
 Median, Weighted Median, Mult
 See discussion entitled 'AggregateProtocol Help' on user list

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5120) Timeout monitor races with table disable handler

2012-01-04 Thread gaojinchao (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaojinchao updated HBASE-5120:
--

Attachment: HBASE-5120.patch

Patch is attached so that I can access it at home.  Not the final one and not 
fully tested in a cluster.

 Timeout monitor races with table disable handler
 

 Key: HBASE-5120
 URL: https://issues.apache.org/jira/browse/HBASE-5120
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Zhihong Yu
Priority: Blocker
 Attachments: HBASE-5120.patch


 Here is what J-D described here:
 https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176
 I think I will retract from my statement that it used to be extremely racy 
 and caused more troubles than it fixed, on my first test I got a stuck 
 region in transition instead of being able to recover. The timeout was set to 
 2 minutes to be sure I hit it.
 First the region gets closed
 {quote}
 2012-01-04 00:16:25,811 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 sv4r5s38,62023,1325635980913 for region 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 {quote}
 2 minutes later it times out:
 {quote}
 2012-01-04 00:18:30,026 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 state=PENDING_CLOSE, ts=1325636185810, server=null
 2012-01-04 00:18:30,026 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
 PENDING_CLOSE for too long, running forced unassign again on 
 region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 2012-01-04 00:18:30,027 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 (offlining)
 {quote}
 100ms later the master finally gets the event:
 {quote}
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, 
 region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for 1a4b111bcc228043e89f59c4c3f6a791
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so 
 deleting ZK node and removing from regions in transition, skipping assignment 
 of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Deleting existing unassigned node for 
 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Successfully deleted unassigned node for 
 region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED
 {quote}
 At this point everything is fine, the region was processed as closed. But 
 wait, remember that line where it said it was going to force an unassign?
 {quote}
 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Creating unassigned node for 
 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state
 2012-01-04 00:18:30,328 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Server null returned 
 java.lang.NullPointerException: Passed server is null for 
 1a4b111bcc228043e89f59c4c3f6a791
 {quote}
 Now the master is confused, it recreated the RIT znode but the region doesn't 
 even exist anymore. It even tries to shut it down but is blocked by NPEs. Now 
 this is what's going on.
 The late ZK notification that the znode was deleted (but it got recreated 
 after):
 {quote}
 2012-01-04 00:19:33,285 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: The znode of region 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been 
 deleted.
 {quote}
 Then it prints this, and much later tries to unassign it again:
 {quote}
 2012-01-04 00:19:46,607 DEBUG 
 org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
 to clear regions in transition; 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 state=PENDING_CLOSE, ts=1325636310328, server=null
 ...
 2012-01-04 00:20:39,623 DEBUG 
 org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
 to clear regions in transition; 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 state=PENDING_CLOSE, ts=1325636310328, server=null
 2012-01-04 

[jira] [Updated] (HBASE-5120) Timeout monitor races with table disable handler

2012-01-04 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5120:
--

Comment: was deleted

(was: Patch is attached so that i can access it at home.  Not the final one and 
not fully tested in cluster.)

 Timeout monitor races with table disable handler
 

 Key: HBASE-5120
 URL: https://issues.apache.org/jira/browse/HBASE-5120
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Zhihong Yu
Priority: Blocker
 Attachments: HBASE-5120.patch


 Here is what J-D described here:
 https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176
 I think I will retract from my statement that it used to be extremely racy 
 and caused more troubles than it fixed, on my first test I got a stuck 
 region in transition instead of being able to recover. The timeout was set to 
 2 minutes to be sure I hit it.
 First the region gets closed
 {quote}
 2012-01-04 00:16:25,811 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 sv4r5s38,62023,1325635980913 for region 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 {quote}
 2 minutes later it times out:
 {quote}
 2012-01-04 00:18:30,026 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 state=PENDING_CLOSE, ts=1325636185810, server=null
 2012-01-04 00:18:30,026 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
 PENDING_CLOSE for too long, running forced unassign again on 
 region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 2012-01-04 00:18:30,027 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 (offlining)
 {quote}
 100ms later the master finally gets the event:
 {quote}
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, 
 region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for 1a4b111bcc228043e89f59c4c3f6a791
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so 
 deleting ZK node and removing from regions in transition, skipping assignment 
 of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Deleting existing unassigned node for 
 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Successfully deleted unassigned node for 
 region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED
 {quote}
 At this point everything is fine, the region was processed as closed. But 
 wait, remember that line where it said it was going to force an unassign?
 {quote}
 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Creating unassigned node for 
 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state
 2012-01-04 00:18:30,328 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Server null returned 
 java.lang.NullPointerException: Passed server is null for 
 1a4b111bcc228043e89f59c4c3f6a791
 {quote}
 Now the master is confused, it recreated the RIT znode but the region doesn't 
 even exist anymore. It even tries to shut it down but is blocked by NPEs. Now 
 this is what's going on.
 The late ZK notification that the znode was deleted (but it got recreated 
 after):
 {quote}
 2012-01-04 00:19:33,285 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: The znode of region 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been 
 deleted.
 {quote}
 Then it prints this, and much later tries to unassign it again:
 {quote}
 2012-01-04 00:19:46,607 DEBUG 
 org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
 to clear regions in transition; 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 state=PENDING_CLOSE, ts=1325636310328, server=null
 ...
 2012-01-04 00:20:39,623 DEBUG 
 org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
 to clear regions in transition; 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 state=PENDING_CLOSE, ts=1325636310328, server=null
 

[jira] [Updated] (HBASE-5120) Timeout monitor races with table disable handler

2012-01-04 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5120:
--

Attachment: HBASE-5120.patch

Attaching the patch so that I can access it at home.  First cut version and 
not fully tested.

 Timeout monitor races with table disable handler
 

 Key: HBASE-5120
 URL: https://issues.apache.org/jira/browse/HBASE-5120
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Zhihong Yu
Priority: Blocker
 Attachments: HBASE-5120.patch


 Here is what J-D described here:
 https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176
 I think I will retract from my statement that it used to be extremely racy 
 and caused more troubles than it fixed, on my first test I got a stuck 
 region in transition instead of being able to recover. The timeout was set to 
 2 minutes to be sure I hit it.
 First the region gets closed
 {quote}
 2012-01-04 00:16:25,811 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 sv4r5s38,62023,1325635980913 for region 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 {quote}
 2 minutes later it times out:
 {quote}
 2012-01-04 00:18:30,026 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 state=PENDING_CLOSE, ts=1325636185810, server=null
 2012-01-04 00:18:30,026 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
 PENDING_CLOSE for too long, running forced unassign again on 
 region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 2012-01-04 00:18:30,027 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 (offlining)
 {quote}
 100ms later the master finally gets the event:
 {quote}
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, 
 region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for 1a4b111bcc228043e89f59c4c3f6a791
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so 
 deleting ZK node and removing from regions in transition, skipping assignment 
 of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Deleting existing unassigned node for 
 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Successfully deleted unassigned node for 
 region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED
 {quote}
 At this point everything is fine, the region was processed as closed. But 
 wait, remember that line where it said it was going to force an unassign?
 {quote}
 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Creating unassigned node for 
 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state
 2012-01-04 00:18:30,328 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Server null returned 
 java.lang.NullPointerException: Passed server is null for 
 1a4b111bcc228043e89f59c4c3f6a791
 {quote}
 Now the master is confused, it recreated the RIT znode but the region doesn't 
 even exist anymore. It even tries to shut it down but is blocked by NPEs. Now 
 this is what's going on.
 The late ZK notification that the znode was deleted (but it got recreated 
 after):
 {quote}
 2012-01-04 00:19:33,285 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: The znode of region 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been 
 deleted.
 {quote}
 Then it prints this, and much later tries to unassign it again:
 {quote}
 2012-01-04 00:19:46,607 DEBUG 
 org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
 to clear regions in transition; 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 state=PENDING_CLOSE, ts=1325636310328, server=null
 ...
 2012-01-04 00:20:39,623 DEBUG 
 org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
 to clear regions in transition; 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 state=PENDING_CLOSE, ts=1325636310328, server=null
 

[jira] [Updated] (HBASE-5120) Timeout monitor races with table disable handler

2012-01-04 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5120:
--

Attachment: (was: HBASE-5120.patch)

 Timeout monitor races with table disable handler
 

 Key: HBASE-5120
 URL: https://issues.apache.org/jira/browse/HBASE-5120
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Zhihong Yu
Priority: Blocker
 Attachments: HBASE-5120.patch


 Here is what J-D described here:
 https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176
 I think I will retract from my statement that it used to be extremely racy 
 and caused more troubles than it fixed, on my first test I got a stuck 
 region in transition instead of being able to recover. The timeout was set to 
 2 minutes to be sure I hit it.
 First the region gets closed
 {quote}
 2012-01-04 00:16:25,811 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 sv4r5s38,62023,1325635980913 for region 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 {quote}
 2 minutes later it times out:
 {quote}
 2012-01-04 00:18:30,026 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 state=PENDING_CLOSE, ts=1325636185810, server=null
 2012-01-04 00:18:30,026 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
 PENDING_CLOSE for too long, running forced unassign again on 
 region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 2012-01-04 00:18:30,027 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 (offlining)
 {quote}
 100ms later the master finally gets the event:
 {quote}
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, 
 region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for 1a4b111bcc228043e89f59c4c3f6a791
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so 
 deleting ZK node and removing from regions in transition, skipping assignment 
 of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Deleting existing unassigned node for 
 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Successfully deleted unassigned node for 
 region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED
 {quote}
 At this point everything is fine, the region was processed as closed. But 
 wait, remember that line where it said it was going to force an unassign?
 {quote}
 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Creating unassigned node for 
 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state
 2012-01-04 00:18:30,328 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Server null returned 
 java.lang.NullPointerException: Passed server is null for 
 1a4b111bcc228043e89f59c4c3f6a791
 {quote}
 Now the master is confused, it recreated the RIT znode but the region doesn't 
 even exist anymore. It even tries to shut it down but is blocked by NPEs. Now 
 this is what's going on.
 The late ZK notification that the znode was deleted (but it got recreated 
 after):
 {quote}
 2012-01-04 00:19:33,285 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: The znode of region 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been 
 deleted.
 {quote}
 Then it prints this, and much later tries to unassign it again:
 {quote}
 2012-01-04 00:19:46,607 DEBUG 
 org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
 to clear regions in transition; 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 state=PENDING_CLOSE, ts=1325636310328, server=null
 ...
 2012-01-04 00:20:39,623 DEBUG 
 org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
 to clear regions in transition; 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 state=PENDING_CLOSE, ts=1325636310328, server=null
 2012-01-04 00:20:39,864 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition 

[jira] [Issue Comment Edited] (HBASE-5121) MajorCompaction may affect scan's correctness

2012-01-04 Thread Zhihong Yu (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179390#comment-13179390
 ] 

Zhihong Yu edited comment on HBASE-5121 at 1/4/12 5:04 PM:
---

{code}
+this.lastTop = null;
+LOG.debug("Storescanner.peek() is changed where before = "
++ this.lastTop.toString() + ",and after = "
{code}
Should null assignment for lastTop be after the debug log ?
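
A sketch of the reordering being asked about (log first, then clear the field; the truncated "and after =" part of the message is left out here, as in the snippet above):
{code}
// Sketch only: keep the debug log before the assignment so it can still
// dereference lastTop.
LOG.debug("Storescanner.peek() is changed where before = "
    + this.lastTop.toString() /* + ",and after = ..." */);
this.lastTop = null;
{code}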

  was (Author: zhi...@ebaysf.com):
Should null assignment for lastTop be after the debug log ?
  
 MajorCompaction may affect scan's correctness
 -

 Key: HBASE-5121
 URL: https://issues.apache.org/jira/browse/HBASE-5121
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
 Attachments: hbase-5121.patch


 In our test, there are KeyValues from two families for one row.
 We found an infrequent problem in the scan's next() if a major compaction 
 happens concurrently.
 In two consecutive client calls to scan.next():
 1. The first call returns a result where family A is null.
 2. The second call returns a result where family B is null.
 The two next() results have the same row.
 If there are more families, I think the scenario will be even stranger...
 We found that the reason is that StoreScanner.peek() changes after a major 
 compaction if there are delete-type KeyValues.
 This change means the PriorityQueue of KeyValueScanners in the 
 RegionScanner's heap is no longer guaranteed to be sorted.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-04 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179650#comment-13179650
 ] 

Lars Hofhansl commented on HBASE-4218:
--

One more thought about ENCODED_IN_CACHE_ONLY (and then I'll shut up about 
this)...

If we ever wanted to extend this in the future and allow disk only encoding, 
maybe a better way would be to have ENCODING and ENCODE_ON_DISK. ENCODE_ON_DISK 
(default false) would just be the inverse of what ENCODED_IN_CACHE_ONLY is. 
That way (if we felt so inclined) we can add ENCODE_IN_CACHE later and allow it 
to be false.
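
For illustration, the two options described here could be carried as column family attributes; the key names below merely mirror the proposal and are not existing HColumnDescriptor constants, so this is only a sketch of how it might look to a user:
{code}
import org.apache.hadoop.hbase.HColumnDescriptor;

public class EncodingOptionsSketch {
  static HColumnDescriptor withEncoding(String family) {
    HColumnDescriptor cf = new HColumnDescriptor(family);
    cf.setValue("DATA_BLOCK_ENCODING", "PREFIX");  // which encoder to use
    cf.setValue("ENCODE_ON_DISK", "false");        // false: encode only in the block cache
    return cf;
  }
}
{code}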


 Data Block Encoding of KeyValues  (aka delta encoding / prefix compression)
 ---

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
  Labels: compression
 Fix For: 0.94.0

 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
 0001-Delta-encoding.patch, 4218-v16.txt, 4218.txt, D447.1.patch, 
 D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, 
 D447.15.patch, D447.16.patch, D447.17.patch, D447.2.patch, D447.3.patch, 
 D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, 
 D447.9.patch, Data-block-encoding-2011-12-23.patch, 
 Delta-encoding.patch-2011-12-22_11_52_07.patch, 
 Delta_encoding_with_memstore_TS.patch, open-source.diff


 A compression for keys. Keys are sorted in HFile and they are usually very 
 similar. Because of that, it is possible to design better compression than 
 general purpose algorithms.
 It is an additional step designed to be used in memory. It aims to save 
 memory in the cache as well as to speed up seeks within HFileBlocks. It should 
 improve performance a lot if key lengths are larger than value lengths. For 
 example, it makes a lot of sense to use it when the value is a counter.
 Initial tests on real data (key length = ~90 bytes, value length = 8 bytes) 
 show that I could achieve a decent level of compression:
  key compression ratio: 92%
  total compression ratio: 85%
  LZO on the same data: 85%
  LZO after delta encoding: 91%
 While having much better performance (20-80% faster decompression than 
 LZO). Moreover, it should allow far more efficient seeking, which should 
 improve performance a bit.
 It seems that simple compression algorithms are good enough. Most of the 
 savings are due to prefix compression, int128 encoding, timestamp diffs and 
 bitfields to avoid duplication. That way, comparisons of compressed data can 
 be much faster than with a byte comparator (thanks to prefix compression and 
 bitfields).
 In order to implement it in HBase, two important changes in design will be 
 needed:
 - solidify the interface to HFileBlock / HFileReader Scanner to provide seeking 
 and iterating; access to the uncompressed buffer in HFileBlock will have bad 
 performance
 - extend comparators to support comparison assuming that the first N bytes are 
 equal (or some fields are equal)
 Link to a discussion about something similar:
 http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5120) Timeout monitor races with table disable handler

2012-01-04 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179660#comment-13179660
 ] 

Lars Hofhansl commented on HBASE-5120:
--

Hey Ram, I know you said this is not done, yet...
{code}
if (t instanceof NullPointerException) {
  removeRegionInTransition(region);
...
{code}

Could we instead add a null check at the relevant point and deal with it there 
(or maybe throw another exception)? (Dealing with this after the NPE occurred 
leaves a bad taste... What we introduce another bug later that also causes an 
NPE, that will go unnoticed for a while.)
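
A sketch of the alternative being suggested: a null check at the relevant point rather than reacting to the NullPointerException afterwards. Method names here are illustrative only, not the actual AssignmentManager code.
{code}
void unassign(HRegionInfo region, ServerName server) {
  if (server == null) {
    // The region was already handled as closed; there is nothing to send a
    // CLOSE to, so just clear it from regions-in-transition.
    removeRegionInTransition(region);
    return;
  }
  sendRegionClose(server, region);
}
{code}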


 Timeout monitor races with table disable handler
 

 Key: HBASE-5120
 URL: https://issues.apache.org/jira/browse/HBASE-5120
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Zhihong Yu
Priority: Blocker
 Attachments: HBASE-5120.patch


 Here is what J-D described here:
 https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176
 I think I will retract from my statement that it used to be extremely racy 
 and caused more troubles than it fixed, on my first test I got a stuck 
 region in transition instead of being able to recover. The timeout was set to 
 2 minutes to be sure I hit it.
 First the region gets closed
 {quote}
 2012-01-04 00:16:25,811 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 sv4r5s38,62023,1325635980913 for region 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 {quote}
 2 minutes later it times out:
 {quote}
 2012-01-04 00:18:30,026 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 state=PENDING_CLOSE, ts=1325636185810, server=null
 2012-01-04 00:18:30,026 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
 PENDING_CLOSE for too long, running forced unassign again on 
 region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 2012-01-04 00:18:30,027 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 (offlining)
 {quote}
 100ms later the master finally gets the event:
 {quote}
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, 
 region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for 1a4b111bcc228043e89f59c4c3f6a791
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so 
 deleting ZK node and removing from regions in transition, skipping assignment 
 of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Deleting existing unassigned node for 
 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Successfully deleted unassigned node for 
 region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED
 {quote}
 At this point everything is fine, the region was processed as closed. But 
 wait, remember that line where it said it was going to force an unassign?
 {quote}
 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Creating unassigned node for 
 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state
 2012-01-04 00:18:30,328 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Server null returned 
 java.lang.NullPointerException: Passed server is null for 
 1a4b111bcc228043e89f59c4c3f6a791
 {quote}
 Now the master is confused, it recreated the RIT znode but the region doesn't 
 even exist anymore. It even tries to shut it down but is blocked by NPEs. Now 
 this is what's going on.
 The late ZK notification that the znode was deleted (but it got recreated 
 after):
 {quote}
 2012-01-04 00:19:33,285 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: The znode of region 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been 
 deleted.
 {quote}
 Then it prints this, and much later tries to unassign it again:
 {quote}
 2012-01-04 00:19:46,607 DEBUG 
 org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
 to clear regions in transition; 
 

[jira] [Issue Comment Edited] (HBASE-5120) Timeout monitor races with table disable handler

2012-01-04 Thread Lars Hofhansl (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179660#comment-13179660
 ] 

Lars Hofhansl edited comment on HBASE-5120 at 1/4/12 5:17 PM:
--

Hey Ram, I know you said this is not done, yet...
{code}
if (t instanceof NullPointerException) {
  removeRegionInTransition(region);
...
{code}

Could we instead add a null check at the relevant point and deal with it there 
(or maybe throw another exception)? (Dealing with this after the NPE occurred 
leaves a bad taste... What if we introduce another bug later that also causes 
an NPE? That would go unnoticed for a while.)


  was (Author: lhofhansl):
Hey Ram, I know you said this is not done, yet...
{code}
if (t instanceof NullPointerException) {
  removeRegionInTransition(region);
...
{code}

Could we instead add a null check at the relevant point and deal with it there 
(or maybe throw another exception)? (Dealing with this after the NPE occurred 
leaves a bad taste... What we introduce another bug later that also causes an 
NPE, that will go unnoticed for a while.)

  
 Timeout monitor races with table disable handler
 

 Key: HBASE-5120
 URL: https://issues.apache.org/jira/browse/HBASE-5120
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Zhihong Yu
Priority: Blocker
 Attachments: HBASE-5120.patch


 Here is what J-D described here:
 https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176
 I think I will retract from my statement that it used to be extremely racy 
 and caused more troubles than it fixed, on my first test I got a stuck 
 region in transition instead of being able to recover. The timeout was set to 
 2 minutes to be sure I hit it.
 First the region gets closed
 {quote}
 2012-01-04 00:16:25,811 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 sv4r5s38,62023,1325635980913 for region 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 {quote}
 2 minutes later it times out:
 {quote}
 2012-01-04 00:18:30,026 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 state=PENDING_CLOSE, ts=1325636185810, server=null
 2012-01-04 00:18:30,026 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
 PENDING_CLOSE for too long, running forced unassign again on 
 region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 2012-01-04 00:18:30,027 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 (offlining)
 {quote}
 100ms later the master finally gets the event:
 {quote}
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, 
 region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for 1a4b111bcc228043e89f59c4c3f6a791
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so 
 deleting ZK node and removing from regions in transition, skipping assignment 
 of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Deleting existing unassigned node for 
 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Successfully deleted unassigned node for 
 region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED
 {quote}
 At this point everything is fine, the region was processed as closed. But 
 wait, remember that line where it said it was going to force an unassign?
 {quote}
 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Creating unassigned node for 
 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state
 2012-01-04 00:18:30,328 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Server null returned 
 java.lang.NullPointerException: Passed server is null for 
 1a4b111bcc228043e89f59c4c3f6a791
 {quote}
 Now the master is confused, it recreated the RIT znode but the region doesn't 
 even exist anymore. It even tries to shut it down but is blocked by NPEs. Now 
 this is what's going on.
 The late ZK notification that the 

[jira] [Commented] (HBASE-5120) Timeout monitor races with table disable handler

2012-01-04 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179662#comment-13179662
 ] 

Zhihong Yu commented on HBASE-5120:
---

Thanks for the quick turnaround.

The patch depends on NullPointerException to detect the problematic region.
I think we should use a more protective measure, as Lars commented.
The patch specifies M_ZK_REGION_CLOSING for ZKAssign.deleteNode(), which narrows 
the applicability of the NPE handling.

 Timeout monitor races with table disable handler
 

 Key: HBASE-5120
 URL: https://issues.apache.org/jira/browse/HBASE-5120
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Zhihong Yu
Priority: Blocker
 Attachments: HBASE-5120.patch


 Here is what J-D described here:
 https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176
 I think I will retract from my statement that it used to be extremely racy 
 and caused more troubles than it fixed, on my first test I got a stuck 
 region in transition instead of being able to recover. The timeout was set to 
 2 minutes to be sure I hit it.
 First the region gets closed
 {quote}
 2012-01-04 00:16:25,811 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 sv4r5s38,62023,1325635980913 for region 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 {quote}
 2 minutes later it times out:
 {quote}
 2012-01-04 00:18:30,026 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 state=PENDING_CLOSE, ts=1325636185810, server=null
 2012-01-04 00:18:30,026 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
 PENDING_CLOSE for too long, running forced unassign again on 
 region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 2012-01-04 00:18:30,027 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 (offlining)
 {quote}
 100ms later the master finally gets the event:
 {quote}
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, 
 region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for 1a4b111bcc228043e89f59c4c3f6a791
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so 
 deleting ZK node and removing from regions in transition, skipping assignment 
 of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Deleting existing unassigned node for 
 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Successfully deleted unassigned node for 
 region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED
 {quote}
 At this point everything is fine, the region was processed as closed. But 
 wait, remember that line where it said it was going to force an unassign?
 {quote}
 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Creating unassigned node for 
 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state
 2012-01-04 00:18:30,328 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Server null returned 
 java.lang.NullPointerException: Passed server is null for 
 1a4b111bcc228043e89f59c4c3f6a791
 {quote}
 Now the master is confused, it recreated the RIT znode but the region doesn't 
 even exist anymore. It even tries to shut it down but is blocked by NPEs. Now 
 this is what's going on.
 The late ZK notification that the znode was deleted (but it got recreated 
 after):
 {quote}
 2012-01-04 00:19:33,285 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: The znode of region 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been 
 deleted.
 {quote}
 Then it prints this, and much later tries to unassign it again:
 {quote}
 2012-01-04 00:19:46,607 DEBUG 
 org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
 to clear regions in transition; 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 state=PENDING_CLOSE, ts=1325636310328, server=null
 ...
 2012-01-04 00:20:39,623 DEBUG 
 

[jira] [Updated] (HBASE-5120) Timeout monitor races with table disable handler

2012-01-04 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5120:
--

Fix Version/s: 0.94.0
   0.92.0

 Timeout monitor races with table disable handler
 

 Key: HBASE-5120
 URL: https://issues.apache.org/jira/browse/HBASE-5120
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Zhihong Yu
Priority: Blocker
 Fix For: 0.92.0, 0.94.0

 Attachments: HBASE-5120.patch


 Here is what J-D described here:
 https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176
 I think I will retract from my statement that it used to be extremely racy 
 and caused more troubles than it fixed, on my first test I got a stuck 
 region in transition instead of being able to recover. The timeout was set to 
 2 minutes to be sure I hit it.
 First the region gets closed
 {quote}
 2012-01-04 00:16:25,811 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 sv4r5s38,62023,1325635980913 for region 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 {quote}
 2 minutes later it times out:
 {quote}
 2012-01-04 00:18:30,026 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 state=PENDING_CLOSE, ts=1325636185810, server=null
 2012-01-04 00:18:30,026 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
 PENDING_CLOSE for too long, running forced unassign again on 
 region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 2012-01-04 00:18:30,027 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 (offlining)
 {quote}
 100ms later the master finally gets the event:
 {quote}
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, 
 region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for 1a4b111bcc228043e89f59c4c3f6a791
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so 
 deleting ZK node and removing from regions in transition, skipping assignment 
 of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Deleting existing unassigned node for 
 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Successfully deleted unassigned node for 
 region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED
 {quote}
 At this point everything is fine, the region was processed as closed. But 
 wait, remember that line where it said it was going to force an unassign?
 {quote}
 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Creating unassigned node for 
 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state
 2012-01-04 00:18:30,328 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Server null returned 
 java.lang.NullPointerException: Passed server is null for 
 1a4b111bcc228043e89f59c4c3f6a791
 {quote}
 Now the master is confused, it recreated the RIT znode but the region doesn't 
 even exist anymore. It even tries to shut it down but is blocked by NPEs. Now 
 this is what's going on.
 The late ZK notification that the znode was deleted (but it got recreated 
 after):
 {quote}
 2012-01-04 00:19:33,285 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: The znode of region 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been 
 deleted.
 {quote}
 Then it prints this, and much later tries to unassign it again:
 {quote}
 2012-01-04 00:19:46,607 DEBUG 
 org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
 to clear regions in transition; 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 state=PENDING_CLOSE, ts=1325636310328, server=null
 ...
 2012-01-04 00:20:39,623 DEBUG 
 org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
 to clear regions in transition; 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 state=PENDING_CLOSE, ts=1325636310328, server=null
 2012-01-04 00:20:39,864 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: 

[jira] [Commented] (HBASE-5121) MajorCompaction may affect scan's correctness

2012-01-04 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179666#comment-13179666
 ] 

Lars Hofhansl commented on HBASE-5121:
--

I see the patch is against trunk. Does this happen in trunk only?

It might have to do with the various scanning optimizations that the FB 
folks put in (if I recall correctly, FB rarely does major compactions).
Maybe Mikhail could have a look at this too.


 MajorCompaction may affect scan's correctness
 -

 Key: HBASE-5121
 URL: https://issues.apache.org/jira/browse/HBASE-5121
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
 Attachments: hbase-5121.patch


 In our test, there are KeyValues for two families in one row.
 But we found an infrequent problem in the scan's next() if a majorCompaction 
 happens concurrently.
 In two consecutive client calls to scan.next():
 1. The first time, scan.next() returns a result where family A is null.
 2. The second time, scan.next() returns a result where family B is null.
 The two next() results have the same row.
 If there are more families, I think the scenario will be even stranger...
 We found the reason is that storescanner.peek() is changed after a 
 majorCompaction if there are delete-type KeyValues.
 This change means the PriorityQueue of KeyValueScanners backing the 
 RegionScanner's heap is no longer guaranteed to be sorted.
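
 For illustration, a standalone sketch of the failure mode using a plain 
 java.util.PriorityQueue (not the HBase classes): if the element at the head 
 of the heap is mutated in place, the queue is not re-heapified, so peek() can 
 return an element that is no longer the smallest.
 {code}
 import java.util.Comparator;
 import java.util.PriorityQueue;

 public class HeapInvalidationDemo {
   public static void main(String[] args) {
     // Heap ordered by the first slot of each int[] (stands in for a scanner's peek key).
     PriorityQueue<int[]> heap = new PriorityQueue<int[]>(2, new Comparator<int[]>() {
       public int compare(int[] a, int[] b) {
         return a[0] - b[0];
       }
     });
     int[] scannerA = {1};
     int[] scannerB = {2};
     heap.add(scannerA);
     heap.add(scannerB);

     // Mutate the current head in place, analogous to a StoreScanner whose
     // peek() changes after a major compaction. The heap is not re-sorted.
     scannerA[0] = 5;

     System.out.println(heap.peek()[0]); // prints 5 even though 2 is now the smallest
   }
 }
 {code}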

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-2947) MultiIncrement/MultiAppend (MultiGet functionality for increments and appends)

2012-01-04 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179685#comment-13179685
 ] 

Lars Hofhansl commented on HBASE-2947:
--

I ran the four failing tests locally and they all pass.
I would like to commit this today (the change itself is pretty uncontentious, I think).


 MultiIncrement/MultiAppend (MultiGet functionality for increments and appends)
 --

 Key: HBASE-2947
 URL: https://issues.apache.org/jira/browse/HBASE-2947
 Project: HBase
  Issue Type: New Feature
  Components: client, regionserver
Reporter: Jonathan Gray
Assignee: Lars Hofhansl
Priority: Minor
 Fix For: 0.94.0

 Attachments: 2947-v2.txt, HBASE-2947-v1.patch


 HBASE-1845 introduced MultiGet and other cross-row/cross-region batch 
 operations.  We should add a way to do that with increments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (HBASE-2947) MultiIncrement/MultiAppend (MultiGet functionality for increments and appends)

2012-01-04 Thread Lars Hofhansl (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179685#comment-13179685
 ] 

Lars Hofhansl edited comment on HBASE-2947 at 1/4/12 5:53 PM:
--

I ran the four failing tests locally and they all pass.
I would like to commit this today (the change itself is pretty uncontentious I 
think).


  was (Author: lhofhansl):
I ran the four failing tests locally and they all pass.
I would like to commit this today (the change it pretty uncontentious I think).

  
 MultiIncrement/MultiAppend (MultiGet functionality for increments and appends)
 --

 Key: HBASE-2947
 URL: https://issues.apache.org/jira/browse/HBASE-2947
 Project: HBase
  Issue Type: New Feature
  Components: client, regionserver
Reporter: Jonathan Gray
Assignee: Lars Hofhansl
Priority: Minor
 Fix For: 0.94.0

 Attachments: 2947-v2.txt, HBASE-2947-v1.patch


 HBASE-1845 introduced MultiGet and other cross-row/cross-region batch 
 operations.  We should add a way to do that with increments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3373) Allow regions to be load-balanced by table

2012-01-04 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179712#comment-13179712
 ] 

Zhihong Yu commented on HBASE-3373:
---

@Ben, @Jonathan Gray:
What do you think of my patch?

Thanks

 Allow regions to be load-balanced by table
 --

 Key: HBASE-3373
 URL: https://issues.apache.org/jira/browse/HBASE-3373
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.20.6
Reporter: Ted Yu
 Fix For: 0.94.0

 Attachments: 3373.txt, HbaseBalancerTest2.java


 From our experience, a cluster can be well balanced and yet one table's 
 regions may be badly concentrated on a few region servers.
 For example, one table has 839 regions (380 regions at the time of table 
 creation), out of which 202 are on one server.
 It would be desirable for the load balancer to distribute regions of the 
 specified tables evenly across the cluster. Each such table has a number of 
 regions many times the cluster size.
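
 Just to make the idea concrete, a rough sketch of per-table placement 
 (hypothetical helper using plain java.util types plus ServerName; not the 
 attached patch): spread each table's regions round-robin over all servers 
 independently, so no single table piles up on a few region servers.
 {code}
 // Hypothetical sketch of per-table balancing, not the attached patch.
 Map<String, ServerName> balanceByTable(
     Map<String, List<String>> regionsByTable, List<ServerName> servers) {
   Map<String, ServerName> plan = new HashMap<String, ServerName>();
   for (List<String> tableRegions : regionsByTable.values()) {
     int i = 0;
     for (String region : tableRegions) {
       // Round-robin this table's regions across every server.
       plan.put(region, servers.get(i++ % servers.size()));
     }
   }
   return plan;
 }
 {code}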

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5097) RegionObserver implementation whose preScannerOpen and postScannerOpen Impl return null can stall the system initialization through NPE

2012-01-04 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5097:
--

Status: Open  (was: Patch Available)

 RegionObserver implementation whose preScannerOpen and postScannerOpen Impl 
 return null can stall the system initialization through NPE
 ---

 Key: HBASE-5097
 URL: https://issues.apache.org/jira/browse/HBASE-5097
 Project: HBase
  Issue Type: Bug
  Components: coprocessors
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-5097.patch, HBASE-5097_1.patch


 In HRegionServer.java openScanner()
 {code}
   r.prepareScanner(scan);
   RegionScanner s = null;
   if (r.getCoprocessorHost() != null) {
 s = r.getCoprocessorHost().preScannerOpen(scan);
   }
   if (s == null) {
 s = r.getScanner(scan);
   }
   if (r.getCoprocessorHost() != null) {
 s = r.getCoprocessorHost().postScannerOpen(scan, s);
   }
 {code}
 If we don't have an implementation for postScannerOpen, the RegionScanner is 
 null and so a NullPointerException is thrown: 
 {code}
 java.lang.NullPointerException
   at 
 java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.addScanner(HRegionServer.java:2282)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2272)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326)
 {code}
 Marking this defect as a blocker.  Please feel free to change the priority if 
 I am wrong.  Also correct me if my way of trying out coprocessors without 
 implementing postScannerOpen is wrong.  I am just a learner.
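
 For illustration, a hedged sketch of one possible guard (hypothetical, not a 
 committed fix): after the pre/post coprocessor hooks run in openScanner(), 
 refuse to register a null scanner instead of letting addScanner() hit the NPE.
 {code}
 // Hypothetical guard, continuing the openScanner() snippet above.
 if (s == null) {
   throw new IOException("Coprocessor returned a null RegionScanner for region "
       + r.getRegionNameAsString());
 }
 {code}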

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5097) RegionObserver implementation whose preScannerOpen and postScannerOpen Impl return null can stall the system initialization through NPE

2012-01-04 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5097:
--

Attachment: HBASE-5097_2.patch

Indentation problem occurred by mistake.  Corrected patch uploaded.

 RegionObserver implementation whose preScannerOpen and postScannerOpen Impl 
 return null can stall the system initialization through NPE
 ---

 Key: HBASE-5097
 URL: https://issues.apache.org/jira/browse/HBASE-5097
 Project: HBase
  Issue Type: Bug
  Components: coprocessors
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-5097.patch, HBASE-5097_1.patch, HBASE-5097_2.patch


 In HRegionServer.java openScanner()
 {code}
   r.prepareScanner(scan);
   RegionScanner s = null;
   if (r.getCoprocessorHost() != null) {
 s = r.getCoprocessorHost().preScannerOpen(scan);
   }
   if (s == null) {
 s = r.getScanner(scan);
   }
   if (r.getCoprocessorHost() != null) {
 s = r.getCoprocessorHost().postScannerOpen(scan, s);
   }
 {code}
 If we don't have an implementation for postScannerOpen, the RegionScanner is 
 null and so a NullPointerException is thrown: 
 {code}
 java.lang.NullPointerException
   at 
 java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.addScanner(HRegionServer.java:2282)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2272)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326)
 {code}
 Marking this defect as a blocker.  Please feel free to change the priority if 
 I am wrong.  Also correct me if my way of trying out coprocessors without 
 implementing postScannerOpen is wrong.  I am just a learner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5120) Timeout monitor races with table disable handler

2012-01-04 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179725#comment-13179725
 ] 

ramkrishna.s.vasudevan commented on HBASE-5120:
---

@Lars and @Ted
Thanks for your reviews.
I was also not convinced about adding the instanceof check for NPE.  It does 
not look good.
I was trying the other option of not calling sendRegionClose if the server is 
null, and also removing the RIT entry if anything has been added.  But I need 
to verify some more scenarios.  Not sure if I can dedicate time tomorrow as we 
have an internal workshop.  I will work on this and post my patch soon. :)


 Timeout monitor races with table disable handler
 

 Key: HBASE-5120
 URL: https://issues.apache.org/jira/browse/HBASE-5120
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Zhihong Yu
Priority: Blocker
 Fix For: 0.92.0, 0.94.0

 Attachments: HBASE-5120.patch


 Here is what J-D described here:
 https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176
 I think I will retract from my statement that it used to be extremely racy 
 and caused more troubles than it fixed, on my first test I got a stuck 
 region in transition instead of being able to recover. The timeout was set to 
 2 minutes to be sure I hit it.
 First the region gets closed
 {quote}
 2012-01-04 00:16:25,811 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 sv4r5s38,62023,1325635980913 for region 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 {quote}
 2 minutes later it times out:
 {quote}
 2012-01-04 00:18:30,026 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 state=PENDING_CLOSE, ts=1325636185810, server=null
 2012-01-04 00:18:30,026 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
 PENDING_CLOSE for too long, running forced unassign again on 
 region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 2012-01-04 00:18:30,027 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 (offlining)
 {quote}
 100ms later the master finally gets the event:
 {quote}
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, 
 region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for 1a4b111bcc228043e89f59c4c3f6a791
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so 
 deleting ZK node and removing from regions in transition, skipping assignment 
 of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Deleting existing unassigned node for 
 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Successfully deleted unassigned node for 
 region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED
 {quote}
 At this point everything is fine, the region was processed as closed. But 
 wait, remember that line where it said it was going to force an unassign?
 {quote}
 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Creating unassigned node for 
 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state
 2012-01-04 00:18:30,328 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Server null returned 
 java.lang.NullPointerException: Passed server is null for 
 1a4b111bcc228043e89f59c4c3f6a791
 {quote}
 Now the master is confused, it recreated the RIT znode but the region doesn't 
 even exist anymore. It even tries to shut it down but is blocked by NPEs. Now 
 this is what's going on.
 The late ZK notification that the znode was deleted (but it got recreated 
 after):
 {quote}
 2012-01-04 00:19:33,285 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: The znode of region 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been 
 deleted.
 {quote}
 Then it prints this, and much later tries to unassign it again:
 {quote}
 2012-01-04 00:19:46,607 DEBUG 
 org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
 to clear regions in 

[jira] [Commented] (HBASE-5097) RegionObserver implementation whose preScannerOpen and postScannerOpen Impl return null can stall the system initialization through NPE

2012-01-04 Thread Andrew Purtell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179727#comment-13179727
 ] 

Andrew Purtell commented on HBASE-5097:
---

+1 on fixed patch.

 RegionObserver implementation whose preScannerOpen and postScannerOpen Impl 
 return null can stall the system initialization through NPE
 ---

 Key: HBASE-5097
 URL: https://issues.apache.org/jira/browse/HBASE-5097
 Project: HBase
  Issue Type: Bug
  Components: coprocessors
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-5097.patch, HBASE-5097_1.patch, HBASE-5097_2.patch


 In HRegionServer.java openScanner()
 {code}
   r.prepareScanner(scan);
   RegionScanner s = null;
   if (r.getCoprocessorHost() != null) {
 s = r.getCoprocessorHost().preScannerOpen(scan);
   }
   if (s == null) {
 s = r.getScanner(scan);
   }
   if (r.getCoprocessorHost() != null) {
 s = r.getCoprocessorHost().postScannerOpen(scan, s);
   }
 {code}
 If we don't have an implementation for postScannerOpen, the RegionScanner is 
 null and so a NullPointerException is thrown: 
 {code}
 java.lang.NullPointerException
   at 
 java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.addScanner(HRegionServer.java:2282)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2272)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326)
 {code}
 Marking this defect as a blocker.  Please feel free to change the priority if 
 I am wrong.  Also correct me if my way of trying out coprocessors without 
 implementing postScannerOpen is wrong.  I am just a learner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5041) Major compaction on non existing table does not throw error

2012-01-04 Thread Shrijeet Paliwal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shrijeet Paliwal updated HBASE-5041:


Attachment: 0003-HBASE-5041-Throw-error-if-table-does-not-exist.0.90.patch

Uploading a patch for the 0.90 branch. Sorry for the delay, Ted. I noticed too 
late your comment that you wanted to merge it yesterday.

 Major compaction on non existing table does not throw error 
 

 Key: HBASE-5041
 URL: https://issues.apache.org/jira/browse/HBASE-5041
 Project: HBase
  Issue Type: Bug
  Components: regionserver, shell
Affects Versions: 0.90.3
Reporter: Shrijeet Paliwal
Assignee: Shrijeet Paliwal
 Fix For: 0.92.0, 0.94.0, 0.90.6

 Attachments: 
 0002-HBASE-5041-Throw-error-if-table-does-not-exist.patch, 
 0002-HBASE-5041-Throw-error-if-table-does-not-exist.patch, 
 0003-HBASE-5041-Throw-error-if-table-does-not-exist.0.90.patch


 The following will not complain even if 'fubar' does not exist:
 {code}
 echo "major_compact 'fubar'" | $HBASE_HOME/bin/hbase shell
 {code}
 The downside for this defect is that major compaction may be skipped due to
 a typo by Ops.
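
 For illustration, a minimal sketch of the kind of guard the fix is after 
 (hypothetical snippet using HBaseAdmin, not the attached patch): verify the 
 table exists before issuing the compaction so a typo fails loudly.
 {code}
 // Hypothetical sketch, not the attached patch: fail fast on a mistyped table name.
 HBaseAdmin admin = new HBaseAdmin(conf);
 if (!admin.tableExists("fubar")) {
   throw new TableNotFoundException("fubar");
 }
 admin.majorCompact("fubar");
 {code}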

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5124) Backport LoadTestTool to 0.92

2012-01-04 Thread Zhihong Yu (Created) (JIRA)
Backport LoadTestTool to 0.92
-

 Key: HBASE-5124
 URL: https://issues.apache.org/jira/browse/HBASE-5124
 Project: HBase
  Issue Type: Task
Reporter: Zhihong Yu
 Fix For: 0.92.0


LoadTestTool is very useful.
This JIRA backports LoadTestTool to 0.92 so that users don't have to build 
TRUNK in order to use it against a 0.92 cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5041) Major compaction on non existing table does not throw error

2012-01-04 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179735#comment-13179735
 ] 

Hadoop QA commented on HBASE-5041:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12509436/0003-HBASE-5041-Throw-error-if-table-does-not-exist.0.90.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 4 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/664//console

This message is automatically generated.

 Major compaction on non existing table does not throw error 
 

 Key: HBASE-5041
 URL: https://issues.apache.org/jira/browse/HBASE-5041
 Project: HBase
  Issue Type: Bug
  Components: regionserver, shell
Affects Versions: 0.90.3
Reporter: Shrijeet Paliwal
Assignee: Shrijeet Paliwal
 Fix For: 0.92.0, 0.94.0, 0.90.6

 Attachments: 
 0002-HBASE-5041-Throw-error-if-table-does-not-exist.patch, 
 0002-HBASE-5041-Throw-error-if-table-does-not-exist.patch, 
 0003-HBASE-5041-Throw-error-if-table-does-not-exist.0.90.patch


 The following will not complain even if 'fubar' does not exist:
 {code}
 echo "major_compact 'fubar'" | $HBASE_HOME/bin/hbase shell
 {code}
 The downside for this defect is that major compaction may be skipped due to
 a typo by Ops.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

2012-01-04 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179738#comment-13179738
 ] 

ramkrishna.s.vasudevan commented on HBASE-5094:
---

@Stack
I tried the option of applying the fix on the master side, but I felt the 
history of the region was not available.  (Maybe I am wrong.)
Also, ServerShutdownHandler.processShutDown() removes the online server first 
and also the list of regions.

Then, only after the region opening is done by the balancer flow is the new 
server name added back:
{code}
if (isServerOnline(sn)) {
  this.regions.put(regionInfo, sn);
  addToServers(sn, regionInfo);
  this.regions.notifyAll();
}
{code}
bq. Could we make it such that only one thread can transition a region at a time?
I have not thought much about this yet.
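
Just to make that idea concrete, something along these lines (purely a 
hypothetical sketch with assumed names, not a patch): serialize all transitions 
of the same region on a per-region lock.
{code}
// Hypothetical sketch of "one thread per region transition", not a patch.
private final ConcurrentMap<String, Object> transitionLocks =
    new ConcurrentHashMap<String, Object>();

void transition(HRegionInfo region, Runnable work) {
  Object lock = new Object();
  Object existing = transitionLocks.putIfAbsent(region.getEncodedName(), lock);
  if (existing != null) {
    lock = existing;
  }
  synchronized (lock) {  // only one thread may transition this region at a time
    work.run();
  }
}
{code}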

@Ming Ma
If it is ok, do you mind sharing the patch that you had prepared?


 The META can hold an entry for a region with a different server name from the 
 one actually in the AssignmentManager thus making the region inaccessible.
 

 Key: HBASE-5094
 URL: https://issues.apache.org/jira/browse/HBASE-5094
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Attachments: HBASE-5094_1.patch


 {code}
 RegionState rit = 
     this.services.getAssignmentManager().isRegionInTransition(e.getKey());
 ServerName addressFromAM = this.services.getAssignmentManager()
     .getRegionServerOfRegion(e.getKey());
 if (rit != null && !rit.isClosing() && !rit.isPendingClose()) {
   // Skip regions that were in transition unless CLOSING or
   // PENDING_CLOSE
   LOG.info("Skip assigning region " + rit.toString());
 } else if (addressFromAM != null
     && !addressFromAM.equals(this.serverName)) {
   LOG.debug("Skip assigning region "
       + e.getKey().getRegionNameAsString()
       + " because it has been opened in "
       + addressFromAM.getServerName());
 }
 {code}
 In ServerShutdownHandler we try to get the address in the AM.  This address 
 is initially null because it has not yet been updated after the region was 
 opened, i.e. the callback after node deletion has not yet been done on the 
 master side.  But removal from RIT has completed on the master side, so this 
 will trigger a new assignment.
 So there is a small window between when the opened region is actually added 
 to the online list and when the ServerShutdownHandler checks the existing 
 address in the AM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5121) MajorCompaction may affect scan's correctness

2012-01-04 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179743#comment-13179743
 ] 

Todd Lipcon commented on HBASE-5121:


No test for this? Can we get a functional test that can at least be run from 
the command line to trigger it?

Nits: the javadoc on the constructors is unnecessary (it doesn't give any 
extra information).
Using the exception for control flow here looks dirty IMO.

 MajorCompaction may affect scan's correctness
 -

 Key: HBASE-5121
 URL: https://issues.apache.org/jira/browse/HBASE-5121
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
 Attachments: hbase-5121.patch


 In our test, there are KeyValues for two families in one row.
 But we found an infrequent problem in the scan's next() if a majorCompaction 
 happens concurrently.
 In two consecutive client calls to scan.next():
 1. The first time, scan.next() returns a result where family A is null.
 2. The second time, scan.next() returns a result where family B is null.
 The two next() results have the same row.
 If there are more families, I think the scenario will be even stranger...
 We found the reason is that storescanner.peek() is changed after a 
 majorCompaction if there are delete-type KeyValues.
 This change means the PriorityQueue of KeyValueScanners backing the 
 RegionScanner's heap is no longer guaranteed to be sorted.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread Prakash Khemani (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Khemani updated HBASE-5081:
---

Attachment: 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch

implement feedback

 Distributed log splitting deleteNode races against splitLog retry 
 --

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Prakash Khemani
 Fix For: 0.92.0

 Attachments: 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 distributed-log-splitting-screenshot.png, hbase-5081-patch-v6.txt, 
 hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
 hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
 patch_for_92_v3.txt


 Recently, during 0.92 rc testing, we found that distributed log splitting 
 hangs there forever.  Please see the attached screen shot.
 I looked into it and here is what I think happened:
 1. One rs died; the servershutdownhandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. Servershutdownhandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a 
 hashmap (tasks);
 5. The asynchronous deletion in step 2 finally happened for one task; in the 
 callback, it removed one task from the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found out that the task 
 is unassigned, and it is not in the hashmap, so it created a new orphan task.
 7. All three tasks failed, but the task created in step 6 is an orphan, so the 
 batch.err counter was one short, and the log splitting hangs there waiting 
 forever for the last task to finish.
 So I think the problem is step 2.  The fix is to make the deletion sync 
 instead of async, so that the retry will have a clean start.
 An async deleteNode will mess up the split log retry.  In an extreme 
 situation, if the async deleteNode doesn't happen soon enough, some node 
 created during the retry could be deleted.
 deleteNode should be sync.
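
 For illustration only (hypothetical variable names, plain ZooKeeper client 
 calls, not the attached patch), the difference between the async delete that 
 races with the retry and a sync delete that finishes before the retry 
 recreates the task nodes:
 {code}
 // Async: returns immediately; the callback can fire after the retry has
 // already recreated the node, removing the wrong entry from the map.
 zk.delete(taskPath, -1, new AsyncCallback.VoidCallback() {
   public void processResult(int rc, String path, Object ctx) {
     tasks.remove(path);
   }
 }, null);

 // Sync: does not return until the node is gone, so the retry starts clean.
 zk.delete(taskPath, -1);
 tasks.remove(taskPath);
 {code}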

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-5121) MajorCompaction may affect scan's correctness

2012-01-04 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-5121:
-

Assignee: chunhui shen

 MajorCompaction may affect scan's correctness
 -

 Key: HBASE-5121
 URL: https://issues.apache.org/jira/browse/HBASE-5121
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: hbase-5121.patch


 In our test, there are KeyValues for two families in one row.
 But we found an infrequent problem in the scan's next() if a majorCompaction 
 happens concurrently.
 In two consecutive client calls to scan.next():
 1. The first time, scan.next() returns a result where family A is null.
 2. The second time, scan.next() returns a result where family B is null.
 The two next() results have the same row.
 If there are more families, I think the scenario will be even stranger...
 We found the reason is that storescanner.peek() is changed after a 
 majorCompaction if there are delete-type KeyValues.
 This change means the PriorityQueue of KeyValueScanners backing the 
 RegionScanner's heap is no longer guaranteed to be sorted.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4773) HBaseAdmin may leak ZooKeeper connections

2012-01-04 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4773:
--

   Resolution: Fixed
Fix Version/s: 0.92.0
   Status: Resolved  (was: Patch Available)

Committed sometime back.

 HBaseAdmin may leak ZooKeeper connections
 -

 Key: HBASE-4773
 URL: https://issues.apache.org/jira/browse/HBASE-4773
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4
Reporter: gaojinchao
Assignee: xufeng
Priority: Critical
 Fix For: 0.92.0, 0.90.6

 Attachments: 4773.patch, branches_4773.patch, trunk_4773_patch.patch


 When the master crashes, HBaseAdmin will leak ZooKeeper connections.
 I think we should close the zk connection when throwing 
 MasterNotRunningException:
 {code}
 public HBaseAdmin(Configuration c)
     throws MasterNotRunningException, ZooKeeperConnectionException {
   this.conf = HBaseConfiguration.create(c);
   this.connection = HConnectionManager.getConnection(this.conf);
   this.pause = this.conf.getLong("hbase.client.pause", 1000);
   this.numRetries = this.conf.getInt("hbase.client.retries.number", 10);
   this.retryLongerMultiplier =
       this.conf.getInt("hbase.client.retries.longer.multiplier", 10);
   // we should add this code and close the zk connection
   try {
     this.connection.getMaster();
   } catch (MasterNotRunningException e) {
     HConnectionManager.deleteConnection(conf, false);
     throw e;
   }
 }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-04 Thread Kannan Muthukkaruppan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179763#comment-13179763
 ] 

Kannan Muthukkaruppan commented on HBASE-4218:
--

I also like ENCODE_ON_DISK instead of ENCODE_IN_CACHE_ONLY (with the reverse 
semantics).

I would say let's keep the default for ENCODE_ON_DISK as true though. This is 
more of a testing knob in the early stages, where someone will set it to false 
before publishing a new data block encoder for general use. By the time end 
users try this, the code should be robust enough, and the Column Family setting 
of which data block encoding to use should ideally be the only knob they need 
to think about.
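
For context, a hypothetical example of what the per-family configuration could 
end up looking like (setter names are assumptions based on the discussion in 
this issue, not the final API):
{code}
// Hypothetical usage based on the names discussed above, not the final API.
HColumnDescriptor cf = new HColumnDescriptor("cf");
cf.setDataBlockEncoding(DataBlockEncoding.PREFIX);  // the CF-level knob end users pick
cf.setEncodeOnDisk(true);                           // proposed default: encode on disk too
{code}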



 Data Block Encoding of KeyValues  (aka delta encoding / prefix compression)
 ---

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
  Labels: compression
 Fix For: 0.94.0

 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
 0001-Delta-encoding.patch, 4218-v16.txt, 4218.txt, D447.1.patch, 
 D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, 
 D447.15.patch, D447.16.patch, D447.17.patch, D447.2.patch, D447.3.patch, 
 D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, 
 D447.9.patch, Data-block-encoding-2011-12-23.patch, 
 Delta-encoding.patch-2011-12-22_11_52_07.patch, 
 Delta_encoding_with_memstore_TS.patch, open-source.diff


 A compression for keys. Keys are sorted in HFile and they are usually very 
 similar. Because of that, it is possible to design better compression than 
 general purpose algorithms,
 It is an additional step designed to be used in memory. It aims to save 
 memory in cache as well as speeding seeks within HFileBlocks. It should 
 improve performance a lot, if key lengths are larger than value lengths. For 
 example, it makes a lot of sense to use it when value is a counter.
 Initial tests on real data (key length = ~ 90 bytes , value length = 8 bytes) 
 shows that I could achieve decent level of compression:
  key compression ratio: 92%
  total compression ratio: 85%
  LZO on the same data: 85%
  LZO after delta encoding: 91%
 While having much better performance (20-80% faster decompression ratio than 
 LZO). Moreover, it should allow far more efficient seeking which should 
 improve performance a bit.
 It seems that a simple compression algorithms are good enough. Most of the 
 savings are due to prefix compression, int128 encoding, timestamp diffs and 
 bitfields to avoid duplication. That way, comparisons of compressed data can 
 be much faster than a byte comparator (thanks to prefix compression and 
 bitfields).
 In order to implement it in HBase two important changes in design will be 
 needed:
 -solidify interface to HFileBlock / HFileReader Scanner to provide seeking 
 and iterating; access to uncompressed buffer in HFileBlock will have bad 
 performance
 -extend comparators to support comparison assuming that N first bytes are 
 equal (or some fields are equal)
 Link to a discussion about something similar:
 http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3373) Allow regions to be load-balanced by table

2012-01-04 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179767#comment-13179767
 ] 

Lars Hofhansl commented on HBASE-3373:
--

I forgot to +1 it above.

 Allow regions to be load-balanced by table
 --

 Key: HBASE-3373
 URL: https://issues.apache.org/jira/browse/HBASE-3373
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.20.6
Reporter: Ted Yu
 Fix For: 0.94.0

 Attachments: 3373.txt, HbaseBalancerTest2.java


 From our experience, a cluster can be well balanced and yet one table's 
 regions may be badly concentrated on a few region servers.
 For example, one table has 839 regions (380 regions at the time of table 
 creation), out of which 202 are on one server.
 It would be desirable for the load balancer to distribute regions of the 
 specified tables evenly across the cluster. Each such table has a number of 
 regions many times the cluster size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179789#comment-13179789
 ] 

Hadoop QA commented on HBASE-5081:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12509441/0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 8 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -151 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 77 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.mapreduce.TestImportTsv
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/665//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/665//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/665//console

This message is automatically generated.

 Distributed log splitting deleteNode races against splitLog retry 
 --

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Prakash Khemani
 Fix For: 0.92.0

 Attachments: 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 distributed-log-splitting-screenshot.png, hbase-5081-patch-v6.txt, 
 hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
 hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
 patch_for_92_v3.txt


 Recently, during 0.92 rc testing, we found that distributed log splitting 
 hangs there forever.  Please see the attached screen shot.
 I looked into it and here is what I think happened:
 1. One rs died; the servershutdownhandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. Servershutdownhandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a 
 hashmap (tasks);
 5. The asynchronous deletion in step 2 finally happened for one task; in the 
 callback, it removed one task from the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found out that the task 
 is unassigned, and it is not in the hashmap, so it created a new orphan task.
 7. All three tasks failed, but the task created in step 6 is an orphan, so the 
 batch.err counter was one short, and the log splitting hangs there waiting 
 forever for the last task to finish.
 So I think the problem is step 2.  The fix is to make the deletion sync 
 instead of async, so that the retry will have a clean start.
 An async deleteNode will mess up the split log retry.  In an extreme 
 situation, if the async deleteNode doesn't happen soon enough, some node 
 created during the retry could be deleted.
 deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5109) Fix TestAvroServer so that it waits properly for the modifyTable operation to complete

2012-01-04 Thread Ming Ma (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179794#comment-13179794
 ] 

Ming Ma commented on HBASE-5109:


Thanks, Ted. Then it is mostly taken care of. I just want to point out two 
small issues in the fix for HBASE-4621:

1. The loop won't stop if there is a bug in the modifyTable call. In that case, the 
unit test will just hang instead of failing. Adding a timeout might be helpful.
2. The two assert statements after the while loop are redundant, so we can 
take one out.
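
For reference, a minimal sketch of the bounded wait (reusing the names from the test quoted below, JUnit's fail(), and a hypothetical 5-second cap):
{code}
// Sketch only: bound the wait so a broken modifyTable fails the test instead of hanging.
long deadline = System.currentTimeMillis() + 5000;
while (impl.describeTable(tableAname) == null
    || impl.describeTable(tableAname).maxFileSize != 123456L) {
  if (System.currentTimeMillis() > deadline) {
    fail("modifyTable did not take effect within 5 seconds");
  }
  Threads.sleep(100);
}
{code}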

 Fix TestAvroServer so that it waits properly for the modifyTable operation to 
 complete
 --

 Key: HBASE-5109
 URL: https://issues.apache.org/jira/browse/HBASE-5109
 Project: HBase
  Issue Type: Bug
  Components: test
Reporter: Ming Ma
Assignee: Ming Ma
 Attachments: HBASE-5109-0.92.patch


 TestAvroServer has the following issue:
 {code}
 impl.modifyTable(tableAname, tableA);
 // It can take a while for the change to take effect. Wait here a while.
 while(impl.describeTable(tableAname) == null ) {
   Threads.sleep(100);
 }
 assertTrue(impl.describeTable(tableAname).maxFileSize == 123456L);
 {code}
 impl.describeTable(tableAname) returns the default maxFileSize (256M) right away 
 because modifyTable is async. Until HBASE-4328 is fixed, we can fix the test code to 
 wait for, say, a max of 5 seconds to check whether 
 impl.describeTable(tableAname).maxFileSize has been updated to 123456L. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179797#comment-13179797
 ] 

stack commented on HBASE-5081:
--

These failures seem unrelated to this patch -- they happened on previous 
hadoopqa build for unrelated patch.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179798#comment-13179798
 ] 

stack commented on HBASE-5081:
--

I'll commit this unless objection.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-04 Thread Matt Corgan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179802#comment-13179802
 ] 

Matt Corgan commented on HBASE-4218:


Some food for thought - there is probably more complexity to this down the 
road.  There are always going to be trade-offs between encoding speed, 
compression ratio, scan throughput, and seek latency.  These trade-offs can 
actually be quite huge, like 10x when you start considering things like suffix 
compression.  I can see having different encodings in the same column family 
depending on dynamic performance decisions.  For example, use the most compact 
encoding during major compaction, but use the fastest encoding if memstore 
flushes are backlogged.

We probably can't get it perfect in this first iteration.  Just want to avoid 
shooting ourselves in the foot as much as possible.
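
As a rough illustration of that kind of dynamic choice (the encoding names and the decision rule below are assumptions for the sketch, not part of the patch):
{code}
// Illustrative sketch: pick a block encoding per write context.
enum WriteContext { MAJOR_COMPACTION, FLUSH_BACKLOGGED, NORMAL }

static DataBlockEncoding chooseEncoding(WriteContext ctx) {
  switch (ctx) {
    case MAJOR_COMPACTION: return DataBlockEncoding.DIFF;   // favor compression ratio
    case FLUSH_BACKLOGGED: return DataBlockEncoding.PREFIX; // favor encoding speed
    default:               return DataBlockEncoding.PREFIX;
  }
}
{code}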

 Data Block Encoding of KeyValues  (aka delta encoding / prefix compression)
 ---

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
  Labels: compression
 Fix For: 0.94.0

 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
 0001-Delta-encoding.patch, 4218-v16.txt, 4218.txt, D447.1.patch, 
 D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, 
 D447.15.patch, D447.16.patch, D447.17.patch, D447.2.patch, D447.3.patch, 
 D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, 
 D447.9.patch, Data-block-encoding-2011-12-23.patch, 
 Delta-encoding.patch-2011-12-22_11_52_07.patch, 
 Delta_encoding_with_memstore_TS.patch, open-source.diff


 A compression for keys. Keys are sorted in HFile and they are usually very 
 similar. Because of that, it is possible to design better compression than 
 general purpose algorithms,
 It is an additional step designed to be used in memory. It aims to save 
 memory in cache as well as speeding seeks within HFileBlocks. It should 
 improve performance a lot, if key lengths are larger than value lengths. For 
 example, it makes a lot of sense to use it when value is a counter.
 Initial tests on real data (key length = ~ 90 bytes , value length = 8 bytes) 
 shows that I could achieve decent level of compression:
  key compression ratio: 92%
  total compression ratio: 85%
  LZO on the same data: 85%
  LZO after delta encoding: 91%
 While having much better performance (20-80% faster decompression ratio than 
 LZO). Moreover, it should allow far more efficient seeking which should 
 improve performance a bit.
 It seems that a simple compression algorithms are good enough. Most of the 
 savings are due to prefix compression, int128 encoding, timestamp diffs and 
 bitfields to avoid duplication. That way, comparisons of compressed data can 
 be much faster than a byte comparator (thanks to prefix compression and 
 bitfields).
 In order to implement it in HBase two important changes in design will be 
 needed:
 -solidify interface to HFileBlock / HFileReader Scanner to provide seeking 
 and iterating; access to uncompressed buffer in HFileBlock will have bad 
 performance
 -extend comparators to support comparison assuming that N first bytes are 
 equal (or some fields are equal)
 Link to a discussion about something similar:
 http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5109) Fix TestAvroServer so that it waits properly for the modifyTable operation to complete

2012-01-04 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179805#comment-13179805
 ] 

stack commented on HBASE-5109:
--

+1 on committing the patch.  It's a nice, if small, cleanup.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179806#comment-13179806
 ] 

Zhihong Yu commented on HBASE-5081:
---

{code}
+if (oldtask.status == FAILURE) {
+  // wait for status to change to DELETED
+  try {
+    oldtask.wait();
+  } catch (InterruptedException e) {
+    LOG.warn("Interrupted when waiting for znode delete callback");
+    // fall through to return failure
{code}
Should a loop be introduced wrapping oldtask.wait()?  The JVM sometimes produces 
spurious wakeups.
{code}
+LOG.fatal("Logic error. Deleted task still present in tasks map");
+assert false : "Deleted task still present in tasks map";
{code}
The assertion wouldn't be effective at runtime (assertions are off by default), right?
I think throwing an exception would be better.
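
A sketch of the guarded wait being suggested, reusing the names from the snippet above (the surrounding synchronized block is assumed):
{code}
synchronized (oldtask) {
  // Loop on the condition so a spurious wakeup cannot let a FAILURE task
  // through before its znode is actually DELETED.
  while (oldtask.status == FAILURE) {
    try {
      oldtask.wait();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      break;   // fall through and report failure to the caller
    }
  }
}
{code}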



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5093) wiki update for HBase/Scala

2012-01-04 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179809#comment-13179809
 ] 

stack commented on HBASE-5093:
--

I just added you as a wiki contributor Joe.  Let me know if you still can't 
edit.

 wiki update for HBase/Scala
 ---

 Key: HBASE-5093
 URL: https://issues.apache.org/jira/browse/HBASE-5093
 Project: HBase
  Issue Type: Improvement
Reporter: Joe Stein

 I tried to edit the wiki but it says "immutable page".
 It would be helpful/nice for folks to know how to get sbt working with Scala.
 The following is what I did to get it working; not sure why I could not edit 
 the wiki, so I figured I'd open a JIRA so someone with access could update it.
 {code}
 resolvers += "Apache HBase" at "https://repository.apache.org/content/repositories/releases"

 resolvers += "Thrift" at "http://people.apache.org/~rawson/repo/"

 libraryDependencies ++= Seq(
   "org.apache.hadoop" % "hadoop-core" % "0.20.2",
   "org.apache.hbase" % "hbase" % "0.90.4"
 )
 {code}
 or let me access it and I can do it, np

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179811#comment-13179811
 ] 

Zhihong Yu commented on HBASE-5081:
---

See 
http://stackoverflow.com/questions/1050592/do-spurious-wakeups-actually-happen


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-04 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179813#comment-13179813
 ] 

stack commented on HBASE-4218:
--

/me hearts this issue


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179821#comment-13179821
 ] 

Zhihong Yu commented on HBASE-5081:
---

w.r.t. the master retry after the following exception is raised:
{code}
throw new IOException("duplicate log split scheduled for "
    + lf.getPath());
{code}
I see this code in MasterFileSystem.splitLog():
{code}
  try {
    splitLogSize = splitLogManager.splitLogDistributed(logDirs);
  } catch (OrphanHLogAfterSplitException e) {
    LOG.warn("Retrying distributed splitting for " + serverNames, e);
    splitLogManager.splitLogDistributed(logDirs);
  }
{code}
Would the retry be carried out as described?


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5118) Fix Scan documentation

2012-01-04 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179826#comment-13179826
 ] 

stack commented on HBASE-5118:
--

+1

 Fix Scan documentation
 --

 Key: HBASE-5118
 URL: https://issues.apache.org/jira/browse/HBASE-5118
 Project: HBase
  Issue Type: Sub-task
  Components: documentation
Affects Versions: 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Trivial
 Attachments: 5118.txt


 Current documentation for scan states:
 {code}
 Scan scan = new Scan();
 scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("attr"));
 scan.setStartRow(Bytes.toBytes("row"));                   // start key is inclusive
 scan.setStopRow(Bytes.toBytes("row" + new byte[] {0}));   // stop key is exclusive
 for (Result result : htable.getScanner(scan)) {
   // process Result instance
 }
 {code}
 "row" + new byte[] {0} is not correct. That should be "row" + (char)0.
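
 In other words, the corrected example would presumably read (assuming htable is the HTable in scope):
 {code}
 Scan scan = new Scan();
 scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("attr"));
 scan.setStartRow(Bytes.toBytes("row"));              // start key is inclusive
 scan.setStopRow(Bytes.toBytes("row" + (char) 0));    // stop key is exclusive
 for (Result result : htable.getScanner(scan)) {
   // process Result instance
 }
 {code}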

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-2947) MultiIncrement/MultiAppend (MultiGet functionality for increments and appends)

2012-01-04 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179825#comment-13179825
 ] 

stack commented on HBASE-2947:
--

Is this safe to do Lars?

{code}
-public class Increment implements Writable {
+public class Increment implements Row {
{code}

Should you leave Writable in place?

Else LGTM

 MultiIncrement/MultiAppend (MultiGet functionality for increments and appends)
 --

 Key: HBASE-2947
 URL: https://issues.apache.org/jira/browse/HBASE-2947
 Project: HBase
  Issue Type: New Feature
  Components: client, regionserver
Reporter: Jonathan Gray
Assignee: Lars Hofhansl
Priority: Minor
 Fix For: 0.94.0

 Attachments: 2947-v2.txt, HBASE-2947-v1.patch


 HBASE-1845 introduced MultiGet and other cross-row/cross-region batch 
 operations.  We should add a way to do that with increments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5097) RegionObserver implementation whose preScannerOpen and postScannerOpen Impl return null can stall the system initialization through NPE

2012-01-04 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179834#comment-13179834
 ] 

stack commented on HBASE-5097:
--

+1 on patch

 RegionObserver implementation whose preScannerOpen and postScannerOpen Impl 
 return null can stall the system initialization through NPE
 ---

 Key: HBASE-5097
 URL: https://issues.apache.org/jira/browse/HBASE-5097
 Project: HBase
  Issue Type: Bug
  Components: coprocessors
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-5097.patch, HBASE-5097_1.patch, HBASE-5097_2.patch


 In HRegionServer.java openScanner()
 {code}
   r.prepareScanner(scan);
   RegionScanner s = null;
   if (r.getCoprocessorHost() != null) {
 s = r.getCoprocessorHost().preScannerOpen(scan);
   }
   if (s == null) {
 s = r.getScanner(scan);
   }
   if (r.getCoprocessorHost() != null) {
 s = r.getCoprocessorHost().postScannerOpen(scan, s);
   }
 {code}
 If our postScannerOpen implementation returns null, the RegionScanner is null, and 
 addScanner throws a NullPointerException:
 {code}
 java.lang.NullPointerException
   at 
 java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.addScanner(HRegionServer.java:2282)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2272)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326)
 {code}
 Marking this defect as a blocker.  Please feel free to change the priority if I am 
 wrong.  Also correct me if my way of trying out coprocessors without 
 implementing postScannerOpen is wrong.  I am just a learner.
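
 One possible guard, sketched from the snippet above (illustrative only, not necessarily the committed fix), is to reject a null scanner instead of passing it on:
 {code}
 if (r.getCoprocessorHost() != null) {
   s = r.getCoprocessorHost().postScannerOpen(scan, s);
 }
 if (s == null) {
   throw new IOException("Coprocessor postScannerOpen returned a null RegionScanner");
 }
 // only now hand s to addScanner(...)
 {code}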

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5071) HFile has a possible cast issue.

2012-01-04 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179838#comment-13179838
 ] 

stack commented on HBASE-5071:
--

lgtm

 HFile has a possible cast issue.
 

 Key: HBASE-5071
 URL: https://issues.apache.org/jira/browse/HBASE-5071
 Project: HBase
  Issue Type: Bug
  Components: io
Affects Versions: 0.90.0
Reporter: Harsh J
  Labels: hfile

 HBASE-3040 introduced this line originally in HFile.Reader#loadFileInfo(...):
 {code}
 int allIndexSize = (int)(this.fileSize - this.trailer.dataIndexOffset - 
 FixedFileTrailer.trailerSize());
 {code}
 Which on trunk today, for HFile v1 is:
 {code}
 int sizeToLoadOnOpen = (int) (fileSize - trailer.getLoadOnOpenDataOffset() -
 trailer.getTrailerSize());
 {code}
 This computed (and cast) integer is then used to build an array of the same 
 size. But if fileSize is very large (> Integer.MAX_VALUE), then there's an 
 easy chance this can go negative at some point and spew out exceptions such 
 as:
 {code}
 java.lang.NegativeArraySizeException 
 at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readAllIndex(HFile.java:805) 
 at org.apache.hadoop.hbase.io.hfile.HFile$Reader.loadFileInfo(HFile.java:832) 
 at 
 org.apache.hadoop.hbase.regionserver.StoreFile$Reader.loadFileInfo(StoreFile.java:1003)
  
 at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:382) 
 at 
 org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:438)
  
 at org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:267) 
 at org.apache.hadoop.hbase.regionserver.Store.init(Store.java:209) 
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:2088)
  
 at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:358) 
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2661) 
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2647) 
 {code}
 Did we accidentally limit single region sizes this way?
 (Unsure about HFile v2's structure so far, so do not know if v2 has the same 
 issue.)
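
 A small sketch of the overflow (the numbers are made up) showing why keeping the arithmetic in long, or bounds-checking before the cast, matters:
 {code}
 long fileSize = 3L * 1024 * 1024 * 1024;   // hypothetical 3 GB HFile
 long loadOnOpenDataOffset = 1024;
 long trailerSize = 212;
 // The narrowing cast wraps around once the difference exceeds Integer.MAX_VALUE:
 int sizeToLoadOnOpen = (int) (fileSize - loadOnOpenDataOffset - trailerSize);
 // sizeToLoadOnOpen is now negative, so new byte[sizeToLoadOnOpen] throws
 // NegativeArraySizeException.
 {code}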

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179841#comment-13179841
 ] 

Zhihong Yu commented on HBASE-5081:
---

About the test suite, from 
https://builds.apache.org/job/PreCommit-HBASE-Build/665/console:
{code}
Running org.apache.hadoop.hbase.master.TestSplitLogManager
Running org.apache.hadoop.hbase.regionserver.TestSeekOptimizations
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.055 sec
{code}
It is a pity that the hung test was masked by known failures.

MAPREDUCE-3583 seems to have gone into oblivion.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2012-01-04 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179843#comment-13179843
 ] 

stack commented on HBASE-5088:
--

+1 on last patch posted by Lars.

 A concurrency issue on SoftValueSortedMap
 -

 Key: HBASE-5088
 URL: https://issues.apache.org/jira/browse/HBASE-5088
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4, 0.94.0
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Attachments: 5088-useMapInterfaces.txt, 5088.generics.txt, 
 HBase-5088-90.patch, HBase-5088-trunk.patch, HBase5088Reproduce.java


 SoftValueSortedMap is backed by a TreeMap. All the methods in this class are 
 synchronized. If we only use those methods to add/delete elements, it's ok.
 But HConnectionManager#getCachedLocation uses headMap to get a view 
 of SoftValueSortedMap#internalMap. Once we operate 
 on this view map (like add/delete) in other threads, a concurrency issue may 
 occur.
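
 A sketch of the hazard (variable names are illustrative): headMap returns a live view backed by the same TreeMap, so the view is not covered by SoftValueSortedMap's own synchronization:
 {code}
 SortedMap<byte[], HRegionLocation> view = tableLocations.headMap(row);  // a view, not a copy
 for (Map.Entry<byte[], HRegionLocation> e : view.entrySet()) {
   // another thread calling tableLocations.put()/remove() concurrently can
   // throw ConcurrentModificationException or corrupt the iteration
 }
 {code}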

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4397) -ROOT-, .META. tables stay offline for too long in recovery phase after all RSs are shutdown at the same time

2012-01-04 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179846#comment-13179846
 ] 

stack commented on HBASE-4397:
--

Nice one Ming.

 -ROOT-, .META. tables stay offline for too long in recovery phase after all 
 RSs are shutdown at the same time
 -

 Key: HBASE-4397
 URL: https://issues.apache.org/jira/browse/HBASE-4397
 Project: HBase
  Issue Type: Bug
Reporter: Ming Ma
Assignee: Ming Ma
 Fix For: 0.92.0, 0.94.0

 Attachments: HBASE-4397-0.92.patch


 1. Shutdown all RSs.
 2. Bring all RS back online.
 The -ROOT- and .META. tables stay in the offline state until the timeout monitor forces 
 assignment 30 minutes later. That is because HMaster can't find an RS to 
 assign the tables to in the assign operation.
 2011-09-13 13:25:52,743 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
 Failed assignment of -ROOT-,,0.70236052 to sea-lab-4,60020,1315870341387, 
 trying to assign elsewhere instead; retry=0
 java.net.ConnectException: Connection refused
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
 at 
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
 at 
 org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:373)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:345)
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1002)
 at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:854)
 at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:148)
 at $Proxy9.openRegion(Unknown Source)
 at 
 org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:407)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1408)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1153)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1128)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1123)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:1788)
 at 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRoot(ServerShutdownHandler.java:100)
 at 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRootWithRetries(ServerShutdownHandler.java:118)
 at 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:181)
 at 
 org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:167)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 2011-09-13 13:25:52,743 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Unable to find a viable 
 location to assign region -ROOT-,,0.70236052
 Possible fixes:
 1. Have serverManager handle server online event similar to how 
 RegionServerTracker.java calls servermanager.expireServer in the case server 
 goes down.
 2. Make timeoutMonitor handle the situation better. This is a special 
 situation in the cluster. 30 minutes timeout can be skipped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-2834) Deferred deletes

2012-01-04 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179849#comment-13179849
 ] 

stack commented on HBASE-2834:
--

We can close this issue then?  (After adding a bit of 'how to do deferred 
deletes' to the manual?)

 Deferred deletes
 

 Key: HBASE-2834
 URL: https://issues.apache.org/jira/browse/HBASE-2834
 Project: HBase
  Issue Type: New Feature
Reporter: Andrew Purtell

 Tangentally mentioned in a blog post, James Hamilton talks about deferred 
 deletes:
 {quote}
 If you have an application error, administrative error, or database 
 implementation bug that loses data, then it is simply gone unless you have 
 an offline copy.
 This is a technique where deleted items are marked as deleted but not garbage 
 collected until some days or preferably weeks later.  Deferred delete is not 
 full protection but it has saved my butt more than once and I'm a believer. 
 See On Designing and Deploying Internet-Scale Services 
 (http://mvdirona.com/jrh/talksAndPapers/JamesRH_Lisa.pdf) for more detail.
 {quote}
 (See 
 http://perspectives.mvdirona.com/2010/04/07/StonebrakerOnCAPTheoremAndDatabases.aspx)
 Because deletes -- at least, after the initial write has been flushed from 
 memstore -- are tombstones, deferred delete in HBase could be supported if 
 somehow tombstones could be invalidated, an undelete operation in effect. 
 This could be accomplished by adding support for tombstones for deletes. 
 It would complicate major compaction but otherwise not touch much. A typical use 
 case might be to resurrect any data deleted from _ts1_ to _ts2_, a period of 
 4 hours when an application error was operative. In this case a new write 
 would be issued to the table that is a tombstone covering any deletes over 
 that period of time. Users would defer major compactions until safe 
 checkpoint periods. 
 Such guarantees could optionally be extended to the memstore by using 
 tombstones there as well. But it would probably be sufficient to provide 
 guidance that forcing a flush is necessary to ensure edits are persisted in 
 a way that allows for undeletion.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5125) Upgrade hadoop to 1.0.0

2012-01-04 Thread stack (Created) (JIRA)
Upgrade hadoop to 1.0.0
---

 Key: HBASE-5125
 URL: https://issues.apache.org/jira/browse/HBASE-5125
 Project: HBase
  Issue Type: Task
Reporter: stack
 Fix For: 0.92.0
 Attachments: 5125.txt



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5125) Upgrade hadoop to 1.0.0

2012-01-04 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5125:
-

Attachment: 5125.txt

 Upgrade hadoop to 1.0.0
 ---

 Key: HBASE-5125
 URL: https://issues.apache.org/jira/browse/HBASE-5125
 Project: HBase
  Issue Type: Task
Reporter: stack
 Fix For: 0.92.0

 Attachments: 5125.txt




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5125) Upgrade hadoop to 1.0.0

2012-01-04 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5125:
-

Status: Patch Available  (was: Open)

 Upgrade hadoop to 1.0.0
 ---

 Key: HBASE-5125
 URL: https://issues.apache.org/jira/browse/HBASE-5125
 Project: HBase
  Issue Type: Task
Reporter: stack
 Fix For: 0.92.0

 Attachments: 5125.txt




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-5093) wiki update for HBase/Scala

2012-01-04 Thread Joe Stein (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein resolved HBASE-5093.
--

Resolution: Fixed

good to go thanks, wiki updated


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread Prakash Khemani (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179891#comment-13179891
 ] 

Prakash Khemani commented on HBASE-5081:


The retry logic is in HMaster.splitLogAfterStartup(). I will remove the
OrphanLogException handling from MasterFileSystem.

On 1/4/12 12:03 PM, Zhihong Yu (Commented) (JIRA) j...@apache.org





--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5041) Major compaction on non existing table does not throw error

2012-01-04 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179890#comment-13179890
 ] 

Zhihong Yu commented on HBASE-5041:
---

TestReplication failed in test suite run.
When I ran it separately, I got:
{code}
Failed tests: 
  queueFailover(org.apache.hadoop.hbase.replication.TestReplication)
{code}
Looking at the patch for 0.90, it doesn't seem to be related to replication.

@J-D:
Your review/comment on the patch would be helpful.

Thanks

 Major compaction on non existing table does not throw error 
 

 Key: HBASE-5041
 URL: https://issues.apache.org/jira/browse/HBASE-5041
 Project: HBase
  Issue Type: Bug
  Components: regionserver, shell
Affects Versions: 0.90.3
Reporter: Shrijeet Paliwal
Assignee: Shrijeet Paliwal
 Fix For: 0.92.0, 0.94.0, 0.90.6

 Attachments: 
 0002-HBASE-5041-Throw-error-if-table-does-not-exist.patch, 
 0002-HBASE-5041-Throw-error-if-table-does-not-exist.patch, 
 0003-HBASE-5041-Throw-error-if-table-does-not-exist.0.90.patch


 The following will not complain even if 'fubar' does not exist:
 {code}
 echo "major_compact 'fubar'" | $HBASE_HOME/bin/hbase shell
 {code}
 The downside of this defect is that a major compaction may be skipped due to
 a typo by Ops.
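
 What one would expect instead (an illustrative sketch of the desired behaviour, not the committed patch; conf and tableName are assumed to be in scope):
 {code}
 HBaseAdmin admin = new HBaseAdmin(conf);
 if (!admin.tableExists(tableName)) {
   // fail fast instead of silently skipping the compaction
   throw new TableNotFoundException(tableName);
 }
 admin.majorCompact(tableName);
 {code}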

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread Prakash Khemani (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179894#comment-13179894
 ] 

Prakash Khemani commented on HBASE-5081:


Will look into the test failure. I am not sure I know where to find the
test run's output logs.

On 1/4/12 12:35 PM, Zhihong Yu (Commented) (JIRA) j...@apache.org




 Distributed log splitting deleteNode races against splitLog retry 
 --

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Prakash Khemani
 Fix For: 0.92.0

 Attachments: 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 distributed-log-splitting-screenshot.png, hbase-5081-patch-v6.txt, 
 hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
 hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
 patch_for_92_v3.txt


 Recently, during 0.92 rc testing, we found distributed log splitting hangs 
 there forever.  Please see attached screen shot.
 I looked into it and here is what happened I think:
 1. One rs died, the servershutdownhandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. Servershutdownhandler retried the log splitting;
 4. During the retrial, it created these three tasks again, and put them in a 
 hashmap (tasks);
 5. The asynchronously deletion in step 2 finally happened for one task, in 
 the callback, it removed one
 task in the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found out that task is 
 unassigned, and it is not
 in the hashmap, so it created a new orphan task.
 7.  All three tasks failed, but that task created in step 6 is an orphan so 
 the batch.err counter was one short,
 so the log splitting hangs there and keeps waiting for the last task to 
 finish which is never going to happen.
 So I think the problem is step 2.  The fix is to make deletion sync, instead 
 of async, so that the retry will have
 a clean start.
 Async deleteNode will mess up with split log retrial.  In extreme situation, 
 if async deleteNode doesn't happen
 soon enough, some node created during the retrial could be deleted.
 deleteNode should be sync.
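(To make the sync-vs-async point concrete, here is a minimal sketch using the plain ZooKeeper client API; the class and method names are placeholders, not the actual SplitLogManager code.)

{code}
import org.apache.zookeeper.AsyncCallback;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

class TaskNodeCleanup {
  // Racy: returns immediately; the retry may recreate the node before (or
  // while) this delete lands, and the callback may then remove a fresh task
  // from the in-memory tasks map.
  static void deleteNodeAsync(ZooKeeper zk, String path) {
    zk.delete(path, -1, new AsyncCallback.VoidCallback() {
      public void processResult(int rc, String p, Object ctx) {
        // callback fires later, possibly after the retry has already started
      }
    }, null);
  }

  // Safe for retry: does not return until the znode is really gone, so the
  // subsequent resubmission starts from a clean state.
  static void deleteNodeSync(ZooKeeper zk, String path)
      throws KeeperException, InterruptedException {
    try {
      zk.delete(path, -1);
    } catch (KeeperException.NoNodeException e) {
      // already gone, which is fine for our purposes
    }
  }
}
{code}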

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5041) Major compaction on non existing table does not throw error

2012-01-04 Thread Shrijeet Paliwal (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179907#comment-13179907
 ] 

Shrijeet Paliwal commented on HBASE-5041:
-

{code}
 mvn clean compile test -Dtest=TestReplication
{code}

Above passes without error for branch 0.90 on my dev machine. 

-Shrijeet

 Major compaction on non existing table does not throw error 
 

 Key: HBASE-5041
 URL: https://issues.apache.org/jira/browse/HBASE-5041
 Project: HBase
  Issue Type: Bug
  Components: regionserver, shell
Affects Versions: 0.90.3
Reporter: Shrijeet Paliwal
Assignee: Shrijeet Paliwal
 Fix For: 0.92.0, 0.94.0, 0.90.6

 Attachments: 
 0002-HBASE-5041-Throw-error-if-table-does-not-exist.patch, 
 0002-HBASE-5041-Throw-error-if-table-does-not-exist.patch, 
 0003-HBASE-5041-Throw-error-if-table-does-not-exist.0.90.patch


 Following will not complain even if fubar does not exist
 {code}
 echo "major_compact 'fubar'" | $HBASE_HOME/bin/hbase shell
 {code}
 The downside for this defect is that major compaction may be skipped due to
 a typo by Ops.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread Prakash Khemani (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Khemani updated HBASE-5081:
---

Attachment: 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch

Set the interrupt flag on InterruptedException. Remove OrphanLogException 
handling for distributed log splitting.

 Distributed log splitting deleteNode races against splitLog retry 
 --

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Prakash Khemani
 Fix For: 0.92.0

 Attachments: 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 distributed-log-splitting-screenshot.png, hbase-5081-patch-v6.txt, 
 hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
 hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
 patch_for_92_v3.txt


 Recently, during 0.92 rc testing, we found distributed log splitting hangs 
 there forever.  Please see attached screen shot.
 I looked into it and here is what happened I think:
 1. One rs died, the servershutdownhandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. Servershutdownhandler retried the log splitting;
 4. During the retrial, it created these three tasks again, and put them in a 
 hashmap (tasks);
 5. The asynchronously deletion in step 2 finally happened for one task, in 
 the callback, it removed one
 task in the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found out that task is 
 unassigned, and it is not
 in the hashmap, so it created a new orphan task.
 7.  All three tasks failed, but that task created in step 6 is an orphan so 
 the batch.err counter was one short,
 so the log splitting hangs there and keeps waiting for the last task to 
 finish which is never going to happen.
 So I think the problem is step 2.  The fix is to make deletion sync, instead 
 of async, so that the retry will have
 a clean start.
 Async deleteNode will mess up with split log retrial.  In extreme situation, 
 if async deleteNode doesn't happen
 soon enough, some node created during the retrial could be deleted.
 deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179912#comment-13179912
 ] 

Zhihong Yu commented on HBASE-5081:
---

bq. If the caller thread is interrupted while waiting for status to change to 
DELETED ...
My comment @ 04/Jan/12 19:46 was about oldtask.wait() being awakened by a 
spurious wakeup from the JVM.

You can use the following to mimic the environment of hadoopx build machines 
when you run unit tests:
{code}
-<argLine>-enableassertions -Xmx1900m -Djava.security.egd=file:/dev/./urandom</argLine>
+<argLine>-d32 -enableassertions -Xmx1900m -Djava.security.egd=file:/dev/./urandom</argLine>
{code}

Looking at 
https://builds.apache.org/job/PreCommit-HBASE-Build/665/testReport/org.apache.hadoop.hbase.master/TestSplitLogManager/,
 it seems all the tests passed.

Maybe N Keywal would have some idea about this.

 Distributed log splitting deleteNode races against splitLog retry 
 --

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Prakash Khemani
 Fix For: 0.92.0

 Attachments: 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 distributed-log-splitting-screenshot.png, hbase-5081-patch-v6.txt, 
 hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
 hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
 patch_for_92_v3.txt


 Recently, during 0.92 rc testing, we found distributed log splitting hangs 
 there forever.  Please see attached screen shot.
 I looked into it and here is what happened I think:
 1. One rs died, the servershutdownhandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. Servershutdownhandler retried the log splitting;
 4. During the retrial, it created these three tasks again, and put them in a 
 hashmap (tasks);
 5. The asynchronously deletion in step 2 finally happened for one task, in 
 the callback, it removed one
 task in the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found out that task is 
 unassigned, and it is not
 in the hashmap, so it created a new orphan task.
 7.  All three tasks failed, but that task created in step 6 is an orphan so 
 the batch.err counter was one short,
 so the log splitting hangs there and keeps waiting for the last task to 
 finish which is never going to happen.
 So I think the problem is step 2.  The fix is to make deletion sync, instead 
 of async, so that the retry will have
 a clean start.
 Async deleteNode will mess up with split log retrial.  In extreme situation, 
 if async deleteNode doesn't happen
 soon enough, some node created during the retrial could be deleted.
 deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread Prakash Khemani (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179926#comment-13179926
 ] 

Prakash Khemani commented on HBASE-5081:


If there is a spurious wakeup before the status has changed to DELETED, then
the code will return an error (oldtask) to the caller.
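
For reference, the usual way to make such a wait robust against spurious wakeups is to re-check the condition in a loop; a minimal sketch, with Task and its fields being placeholder names rather than the actual SplitLogManager members:

{code}
// Hypothetical task holder; the real SplitLogManager fields differ.
class Task {
  private boolean deleted = false;

  synchronized void waitUntilDeleted(long timeoutMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    // Loop so a spurious wakeup (or a notify for some other state change)
    // does not let the caller proceed before the node is actually deleted.
    while (!deleted) {
      long remaining = deadline - System.currentTimeMillis();
      if (remaining <= 0) {
        break; // caller decides whether a timeout is an error
      }
      wait(remaining);
    }
  }

  synchronized void markDeleted() {
    deleted = true;
    notifyAll();
  }
}
{code}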

Regarding the hung TestSplitLogManager test in
https://builds.apache.org/job/PreCommit-HBASE-Build/665/console - I
couldn't find what failed or what hung.
https://builds.apache.org/job/PreCommit-HBASE-Build/665//testReport/org.apache.hadoop.hbase.master/TestSplitLogManager/
shows that everything passed.





 Distributed log splitting deleteNode races against splitLog retry 
 --

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Prakash Khemani
 Fix For: 0.92.0

 Attachments: 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 distributed-log-splitting-screenshot.png, hbase-5081-patch-v6.txt, 
 hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
 hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
 patch_for_92_v3.txt


 Recently, during 0.92 rc testing, we found distributed log splitting hangs 
 there forever.  Please see attached screen shot.
 I looked into it and here is what happened I think:
 1. One rs died, the servershutdownhandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. Servershutdownhandler retried the log splitting;
 4. During the retrial, it created these three tasks again, and put them in a 
 hashmap (tasks);
 5. The asynchronously deletion in step 2 finally happened for one task, in 
 the callback, it removed one
 task in the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found out that task is 
 unassigned, and it is not
 in the hashmap, so it created a new orphan task.
 7.  All three tasks failed, but that task created in step 6 is an orphan so 
 the batch.err counter was one short,
 so the log splitting hangs there and keeps waiting for the last task to 
 finish which is never going to happen.
 So I think the problem is step 2.  The fix is to make deletion sync, instead 
 of async, so that the retry will have
 a clean start.
 Async deleteNode will mess up with split log retrial.  In extreme situation, 
 if async deleteNode doesn't happen
 soon enough, some node created during the retrial could be deleted.
 deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5125) Upgrade hadoop to 1.0.0

2012-01-04 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179931#comment-13179931
 ] 

Hadoop QA commented on HBASE-5125:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12509452/5125.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -151 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 77 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.mapreduce.TestImportTsv
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/667//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/667//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/667//console

This message is automatically generated.

 Upgrade hadoop to 1.0.0
 ---

 Key: HBASE-5125
 URL: https://issues.apache.org/jira/browse/HBASE-5125
 Project: HBase
  Issue Type: Task
Reporter: stack
 Fix For: 0.92.0

 Attachments: 5125.txt




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-2947) MultiIncrement/MultiAppend (MultiGet functionality for increments and appends)

2012-01-04 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179933#comment-13179933
 ] 

Lars Hofhansl commented on HBASE-2947:
--

Good point... I just checked, and Row extends WritableComparable, which in turn 
extends Writable... So we should be good.
Maybe then Append can be simplified to only implement Row and not also Writable.
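
A small sketch of the inheritance chain being discussed (deliberately simplified; these are not the actual HBase/Hadoop declarations):

{code}
// Simplified view of why implementing Row is enough: WritableComparable
// already extends Writable, so anything that implements Row is
// transitively a Writable as well.
interface Writable { /* write(DataOutput), readFields(DataInput) */ }
interface WritableComparable<T> extends Writable, Comparable<T> { }
interface Row extends WritableComparable<Row> { }

// Declaring "implements Row" alone therefore satisfies the Writable
// contract; an extra "implements Writable" on Append would be redundant.
class Append implements Row {
  public int compareTo(Row other) { return 0; } // placeholder body
}
{code}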

 MultiIncrement/MultiAppend (MultiGet functionality for increments and appends)
 --

 Key: HBASE-2947
 URL: https://issues.apache.org/jira/browse/HBASE-2947
 Project: HBase
  Issue Type: New Feature
  Components: client, regionserver
Reporter: Jonathan Gray
Assignee: Lars Hofhansl
Priority: Minor
 Fix For: 0.94.0

 Attachments: 2947-v2.txt, HBASE-2947-v1.patch


 HBASE-1845 introduced MultiGet and other cross-row/cross-region batch 
 operations.  We should add a way to do that with increments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179935#comment-13179935
 ] 

Hadoop QA commented on HBASE-5081:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12509461/0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 8 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -151 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 77 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/668//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/668//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/668//console

This message is automatically generated.

 Distributed log splitting deleteNode races against splitLog retry 
 --

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Prakash Khemani
 Fix For: 0.92.0

 Attachments: 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 distributed-log-splitting-screenshot.png, hbase-5081-patch-v6.txt, 
 hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
 hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
 patch_for_92_v3.txt


 Recently, during 0.92 rc testing, we found distributed log splitting hangs 
 there forever.  Please see attached screen shot.
 I looked into it and here is what happened I think:
 1. One rs died, the servershutdownhandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. Servershutdownhandler retried the log splitting;
 4. During the retrial, it created these three tasks again, and put them in a 
 hashmap (tasks);
 5. The asynchronously deletion in step 2 finally happened for one task, in 
 the callback, it removed one
 task in the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found out that task is 
 unassigned, and it is not
 in the hashmap, so it created a new orphan task.
 7.  All three tasks failed, but that task created in step 6 is an orphan so 
 the batch.err counter was one short,
 so the log splitting hangs there and keeps waiting for the last task to 
 finish which is never going to happen.
 So I think the problem is step 2.  The fix is to make deletion sync, instead 
 of async, so that the retry will have
 a clean start.
 Async deleteNode will mess up with split log retrial.  In extreme situation, 
 if async deleteNode doesn't happen
 soon enough, some node created during the retrial could be deleted.
 deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5120) Timeout monitor races with table disable handler

2012-01-04 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179938#comment-13179938
 ] 

stack commented on HBASE-5120:
--

When would it ever make sense for the TimeoutMonitor to ask a regionserver to RE-close a 
region?  It implies that somehow the regionserver 'forgot' to do the close or 
that it dropped the message.  If either happened, we have bigger problems, and 
having the TM do a new close will only make it worse (if a RS 'forgot' or 
'failed' a close, it's probably on its way out and its ephemeral node is about 
to vanish, and the master shutdown handler will do the close fixups; if it does not 
work this way, we should make it so?).

The TM might make some sense for region openings, but even here I'd think the above 
applies; i.e. if a RS can't open a region, it should kill itself. If the region 
is bad, it's unlikely any RS will open it unless the error is transient.  For this 
latter case the TM running every 30 minutes, while conservative, is probably 
fine as a default.

I think we should leave the TM at its current 30-minute timeout and drop this 
issue's critical status against 0.92.

There are outstanding race conditions in HBase around the shutdown handler and 
master operations that deserve more attention than this, IMO.

 Timeout monitor races with table disable handler
 

 Key: HBASE-5120
 URL: https://issues.apache.org/jira/browse/HBASE-5120
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Zhihong Yu
Priority: Blocker
 Fix For: 0.92.0, 0.94.0

 Attachments: HBASE-5120.patch


 Here is what J-D described here:
 https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176
 I think I will retract from my statement that it used to be extremely racy 
 and caused more troubles than it fixed, on my first test I got a stuck 
 region in transition instead of being able to recover. The timeout was set to 
 2 minutes to be sure I hit it.
 First the region gets closed
 {quote}
 2012-01-04 00:16:25,811 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 sv4r5s38,62023,1325635980913 for region 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 {quote}
 2 minutes later it times out:
 {quote}
 2012-01-04 00:18:30,026 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 state=PENDING_CLOSE, ts=1325636185810, server=null
 2012-01-04 00:18:30,026 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
 PENDING_CLOSE for too long, running forced unassign again on 
 region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 2012-01-04 00:18:30,027 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 (offlining)
 {quote}
 100ms later the master finally gets the event:
 {quote}
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, 
 region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for 1a4b111bcc228043e89f59c4c3f6a791
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so 
 deleting ZK node and removing from regions in transition, skipping assignment 
 of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Deleting existing unassigned node for 
 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Successfully deleted unassigned node for 
 region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED
 {quote}
 At this point everything is fine, the region was processed as closed. But 
 wait, remember that line where it said it was going to force an unassign?
 {quote}
 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Creating unassigned node for 
 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state
 2012-01-04 00:18:30,328 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Server null returned 
 java.lang.NullPointerException: Passed server is null for 
 1a4b111bcc228043e89f59c4c3f6a791
 {quote}
 Now the master is confused, it recreated the RIT 

[jira] [Commented] (HBASE-5120) Timeout monitor races with table disable handler

2012-01-04 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179940#comment-13179940
 ] 

stack commented on HBASE-5120:
--

On the patch, Ram, should we even try to unassign if the server in this.regions is 
null; i.e. should we fail way earlier?

IMO, this AM needs a rewrite, if only to make its various transitions testable.  
There are also too many methods doing similar things with lots of overlap; it's 
really hard to figure out what's going on.

 Timeout monitor races with table disable handler
 

 Key: HBASE-5120
 URL: https://issues.apache.org/jira/browse/HBASE-5120
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Zhihong Yu
Priority: Blocker
 Fix For: 0.92.0, 0.94.0

 Attachments: HBASE-5120.patch


 Here is what J-D described here:
 https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176
 I think I will retract from my statement that it used to be extremely racy 
 and caused more troubles than it fixed, on my first test I got a stuck 
 region in transition instead of being able to recover. The timeout was set to 
 2 minutes to be sure I hit it.
 First the region gets closed
 {quote}
 2012-01-04 00:16:25,811 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 sv4r5s38,62023,1325635980913 for region 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 {quote}
 2 minutes later it times out:
 {quote}
 2012-01-04 00:18:30,026 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 state=PENDING_CLOSE, ts=1325636185810, server=null
 2012-01-04 00:18:30,026 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
 PENDING_CLOSE for too long, running forced unassign again on 
 region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 2012-01-04 00:18:30,027 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 (offlining)
 {quote}
 100ms later the master finally gets the event:
 {quote}
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, 
 region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for 1a4b111bcc228043e89f59c4c3f6a791
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so 
 deleting ZK node and removing from regions in transition, skipping assignment 
 of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Deleting existing unassigned node for 
 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Successfully deleted unassigned node for 
 region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED
 {quote}
 At this point everything is fine, the region was processed as closed. But 
 wait, remember that line where it said it was going to force an unassign?
 {quote}
 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Creating unassigned node for 
 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state
 2012-01-04 00:18:30,328 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Server null returned 
 java.lang.NullPointerException: Passed server is null for 
 1a4b111bcc228043e89f59c4c3f6a791
 {quote}
 Now the master is confused, it recreated the RIT znode but the region doesn't 
 even exist anymore. It even tries to shut it down but is blocked by NPEs. Now 
 this is what's going on.
 The late ZK notification that the znode was deleted (but it got recreated 
 after):
 {quote}
 2012-01-04 00:19:33,285 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: The znode of region 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been 
 deleted.
 {quote}
 Then it prints this, and much later tries to unassign it again:
 {quote}
 2012-01-04 00:19:46,607 DEBUG 
 org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
 to clear regions in transition; 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 state=PENDING_CLOSE, ts=1325636310328, server=null
 ...
 2012-01-04 

[jira] [Commented] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-04 Thread Mikhail Bautin (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179945#comment-13179945
 ] 

Mikhail Bautin commented on HBASE-4218:
---

I think that with an 8K line patch we probably should not try to put more 
complexity into the first version of delta encoding. We can always make things 
more complicated later. I like the two-parameter setup: DATA_BLOCK_ENCODING 
sets the encoding type (on-disk and in-cache by default) and ENCODE_ON_DISK 
(true by default) allows us to use in-cache-only encoding (when explicitly setting 
ENCODE_ON_DISK=false) and get the benefit of encoding in cache even before we 
are 100% sure that our encoding algorithms and encoded scanners are stable. If 
everyone agrees with that, I will finish the patch by (1) adding a unit test 
for switching data block encoding column family settings; (2) including 
encoding type in the cache key; and (3) simplifying the HFileDataBlockEncoder 
interface, since we assume that the in-memory format (used by scanners) is 
always the same as the in-cache format and don't need methods such as 
afterReadFromDiskAndPuttingInCache anymore.
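
For anyone following along, a minimal sketch of what the two-parameter column family setup could look like from the client side; it uses the generic HColumnDescriptor.setValue attribute mechanism with the attribute names from this discussion, since the dedicated setters only exist once this patch lands, and the accepted values ("PREFIX" here) are an assumption:

{code}
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;

public class EncodingSchemaExample {
  public static void main(String[] args) {
    HTableDescriptor table = new HTableDescriptor("t1");
    HColumnDescriptor family = new HColumnDescriptor("cf");
    // Two-parameter setup from the discussion above:
    //  - DATA_BLOCK_ENCODING picks the algorithm (on-disk and in-cache by default)
    //  - ENCODE_ON_DISK=false restricts encoding to the block cache only
    family.setValue("DATA_BLOCK_ENCODING", "PREFIX");
    family.setValue("ENCODE_ON_DISK", "false");
    table.addFamily(family);
    // pass 'table' to HBaseAdmin.createTable(...) or modifyTable(...) as usual
  }
}
{code}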

 Data Block Encoding of KeyValues  (aka delta encoding / prefix compression)
 ---

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
  Labels: compression
 Fix For: 0.94.0

 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
 0001-Delta-encoding.patch, 4218-v16.txt, 4218.txt, D447.1.patch, 
 D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, 
 D447.15.patch, D447.16.patch, D447.17.patch, D447.2.patch, D447.3.patch, 
 D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, 
 D447.9.patch, Data-block-encoding-2011-12-23.patch, 
 Delta-encoding.patch-2011-12-22_11_52_07.patch, 
 Delta_encoding_with_memstore_TS.patch, open-source.diff


 A compression for keys. Keys are sorted in HFile and they are usually very 
 similar. Because of that, it is possible to design better compression than 
 general purpose algorithms,
 It is an additional step designed to be used in memory. It aims to save 
 memory in cache as well as speeding seeks within HFileBlocks. It should 
 improve performance a lot, if key lengths are larger than value lengths. For 
 example, it makes a lot of sense to use it when value is a counter.
 Initial tests on real data (key length = ~ 90 bytes , value length = 8 bytes) 
 shows that I could achieve decent level of compression:
  key compression ratio: 92%
  total compression ratio: 85%
  LZO on the same data: 85%
  LZO after delta encoding: 91%
 While having much better performance (20-80% faster decompression ratio than 
 LZO). Moreover, it should allow far more efficient seeking which should 
 improve performance a bit.
 It seems that a simple compression algorithms are good enough. Most of the 
 savings are due to prefix compression, int128 encoding, timestamp diffs and 
 bitfields to avoid duplication. That way, comparisons of compressed data can 
 be much faster than a byte comparator (thanks to prefix compression and 
 bitfields).
 In order to implement it in HBase two important changes in design will be 
 needed:
 -solidify interface to HFileBlock / HFileReader Scanner to provide seeking 
 and iterating; access to uncompressed buffer in HFileBlock will have bad 
 performance
 -extend comparators to support comparison assuming that N first bytes are 
 equal (or some fields are equal)
 Link to a discussion about something similar:
 http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179944#comment-13179944
 ] 

stack commented on HBASE-5081:
--

I'm running tests locally.

 Distributed log splitting deleteNode races against splitLog retry 
 --

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Prakash Khemani
 Fix For: 0.92.0

 Attachments: 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 distributed-log-splitting-screenshot.png, hbase-5081-patch-v6.txt, 
 hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
 hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
 patch_for_92_v3.txt


 Recently, during 0.92 rc testing, we found distributed log splitting hangs 
 there forever.  Please see attached screen shot.
 I looked into it and here is what happened I think:
 1. One rs died, the servershutdownhandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. Servershutdownhandler retried the log splitting;
 4. During the retrial, it created these three tasks again, and put them in a 
 hashmap (tasks);
 5. The asynchronously deletion in step 2 finally happened for one task, in 
 the callback, it removed one
 task in the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found out that task is 
 unassigned, and it is not
 in the hashmap, so it created a new orphan task.
 7.  All three tasks failed, but that task created in step 6 is an orphan so 
 the batch.err counter was one short,
 so the log splitting hangs there and keeps waiting for the last task to 
 finish which is never going to happen.
 So I think the problem is step 2.  The fix is to make deletion sync, instead 
 of async, so that the retry will have
 a clean start.
 Async deleteNode will mess up with split log retrial.  In extreme situation, 
 if async deleteNode doesn't happen
 soon enough, some node created during the retrial could be deleted.
 deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5125) Upgrade hadoop to 1.0.0

2012-01-04 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5125:
-

Resolution: Fixed
  Assignee: stack
Status: Resolved  (was: Patch Available)

Committed to 0.92 and trunk

 Upgrade hadoop to 1.0.0
 ---

 Key: HBASE-5125
 URL: https://issues.apache.org/jira/browse/HBASE-5125
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
 Fix For: 0.92.0

 Attachments: 5125.txt




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-04 Thread Matt Corgan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179955#comment-13179955
 ] 

Matt Corgan commented on HBASE-4218:


{quote}I think that with an 8K line patch we probably should not try to put 
more complexity into the first version of delta encoding.{quote}Yes, I totally 
agree here.  It is a work in progress, and so these settings in this patch 
don't have to make perfect sense.  I like the latest 
DATA_BLOCK_ENCODING=NONE(?) and ENCODE_ON_DISK=true defaults.

All other comments look sensible.  Have you covered the case where you have 
encoded blocks in the block cache and are compacting to an unencoded hfile?  
You will want to make sure that you are using (not ignoring) the cached blocks.

 Data Block Encoding of KeyValues  (aka delta encoding / prefix compression)
 ---

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
  Labels: compression
 Fix For: 0.94.0

 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
 0001-Delta-encoding.patch, 4218-v16.txt, 4218.txt, D447.1.patch, 
 D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, 
 D447.15.patch, D447.16.patch, D447.17.patch, D447.2.patch, D447.3.patch, 
 D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, 
 D447.9.patch, Data-block-encoding-2011-12-23.patch, 
 Delta-encoding.patch-2011-12-22_11_52_07.patch, 
 Delta_encoding_with_memstore_TS.patch, open-source.diff


 A compression for keys. Keys are sorted in HFile and they are usually very 
 similar. Because of that, it is possible to design better compression than 
 general purpose algorithms,
 It is an additional step designed to be used in memory. It aims to save 
 memory in cache as well as speeding seeks within HFileBlocks. It should 
 improve performance a lot, if key lengths are larger than value lengths. For 
 example, it makes a lot of sense to use it when value is a counter.
 Initial tests on real data (key length = ~ 90 bytes , value length = 8 bytes) 
 shows that I could achieve decent level of compression:
  key compression ratio: 92%
  total compression ratio: 85%
  LZO on the same data: 85%
  LZO after delta encoding: 91%
 While having much better performance (20-80% faster decompression ratio than 
 LZO). Moreover, it should allow far more efficient seeking which should 
 improve performance a bit.
 It seems that a simple compression algorithms are good enough. Most of the 
 savings are due to prefix compression, int128 encoding, timestamp diffs and 
 bitfields to avoid duplication. That way, comparisons of compressed data can 
 be much faster than a byte comparator (thanks to prefix compression and 
 bitfields).
 In order to implement it in HBase two important changes in design will be 
 needed:
 -solidify interface to HFileBlock / HFileReader Scanner to provide seeking 
 and iterating; access to uncompressed buffer in HFileBlock will have bad 
 performance
 -extend comparators to support comparison assuming that N first bytes are 
 equal (or some fields are equal)
 Link to a discussion about something similar:
 http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-04 Thread Mikhail Bautin (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179960#comment-13179960
 ] 

Mikhail Bautin commented on HBASE-4218:
---

Re-reading my previous post, I want to make an addition: we still use cached 
encoded blocks when compacting a fully-encoded column family.

 Data Block Encoding of KeyValues  (aka delta encoding / prefix compression)
 ---

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
  Labels: compression
 Fix For: 0.94.0

 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
 0001-Delta-encoding.patch, 4218-v16.txt, 4218.txt, D447.1.patch, 
 D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, 
 D447.15.patch, D447.16.patch, D447.17.patch, D447.2.patch, D447.3.patch, 
 D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, 
 D447.9.patch, Data-block-encoding-2011-12-23.patch, 
 Delta-encoding.patch-2011-12-22_11_52_07.patch, 
 Delta_encoding_with_memstore_TS.patch, open-source.diff


 A compression for keys. Keys are sorted in HFile and they are usually very 
 similar. Because of that, it is possible to design better compression than 
 general purpose algorithms,
 It is an additional step designed to be used in memory. It aims to save 
 memory in cache as well as speeding seeks within HFileBlocks. It should 
 improve performance a lot, if key lengths are larger than value lengths. For 
 example, it makes a lot of sense to use it when value is a counter.
 Initial tests on real data (key length = ~ 90 bytes , value length = 8 bytes) 
 shows that I could achieve decent level of compression:
  key compression ratio: 92%
  total compression ratio: 85%
  LZO on the same data: 85%
  LZO after delta encoding: 91%
 While having much better performance (20-80% faster decompression ratio than 
 LZO). Moreover, it should allow far more efficient seeking which should 
 improve performance a bit.
 It seems that a simple compression algorithms are good enough. Most of the 
 savings are due to prefix compression, int128 encoding, timestamp diffs and 
 bitfields to avoid duplication. That way, comparisons of compressed data can 
 be much faster than a byte comparator (thanks to prefix compression and 
 bitfields).
 In order to implement it in HBase two important changes in design will be 
 needed:
 -solidify interface to HFileBlock / HFileReader Scanner to provide seeking 
 and iterating; access to uncompressed buffer in HFileBlock will have bad 
 performance
 -extend comparators to support comparison assuming that N first bytes are 
 equal (or some fields are equal)
 Link to a discussion about something similar:
 http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-04 Thread Mikhail Bautin (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179959#comment-13179959
 ] 

Mikhail Bautin commented on HBASE-4218:
---

Actually, I think it is OK to ignore cached encoded blocks on compaction. We 
can get encoded blocks in cache and have a compaction write an unencoded file 
in two cases:
* Encoding is turned on in cache only. In that case we don't want to use 
encoded blocks during compaction at all, because the in-cache-only mode implies 
that we don't trust our encoding algorithms 100% and want to guard against 
possible persistent data corruption.
* Encoding was turned on (either in cache only or everywhere) and it was turned 
off entirely. Since this is not a very frequent case, I think we could probably 
optimize this after the patch is stabilized.


 Data Block Encoding of KeyValues  (aka delta encoding / prefix compression)
 ---

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
  Labels: compression
 Fix For: 0.94.0

 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
 0001-Delta-encoding.patch, 4218-v16.txt, 4218.txt, D447.1.patch, 
 D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, 
 D447.15.patch, D447.16.patch, D447.17.patch, D447.2.patch, D447.3.patch, 
 D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, 
 D447.9.patch, Data-block-encoding-2011-12-23.patch, 
 Delta-encoding.patch-2011-12-22_11_52_07.patch, 
 Delta_encoding_with_memstore_TS.patch, open-source.diff


 A compression for keys. Keys are sorted in HFile and they are usually very 
 similar. Because of that, it is possible to design better compression than 
 general purpose algorithms,
 It is an additional step designed to be used in memory. It aims to save 
 memory in cache as well as speeding seeks within HFileBlocks. It should 
 improve performance a lot, if key lengths are larger than value lengths. For 
 example, it makes a lot of sense to use it when value is a counter.
 Initial tests on real data (key length = ~ 90 bytes , value length = 8 bytes) 
 shows that I could achieve decent level of compression:
  key compression ratio: 92%
  total compression ratio: 85%
  LZO on the same data: 85%
  LZO after delta encoding: 91%
 While having much better performance (20-80% faster decompression ratio than 
 LZO). Moreover, it should allow far more efficient seeking which should 
 improve performance a bit.
 It seems that a simple compression algorithms are good enough. Most of the 
 savings are due to prefix compression, int128 encoding, timestamp diffs and 
 bitfields to avoid duplication. That way, comparisons of compressed data can 
 be much faster than a byte comparator (thanks to prefix compression and 
 bitfields).
 In order to implement it in HBase two important changes in design will be 
 needed:
 -solidify interface to HFileBlock / HFileReader Scanner to provide seeking 
 and iterating; access to uncompressed buffer in HFileBlock will have bad 
 performance
 -extend comparators to support comparison assuming that N first bytes are 
 equal (or some fields are equal)
 Link to a discussion about something similar:
 http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-2947) MultiIncrement/MultiAppend (MultiGet functionality for increments and appends)

2012-01-04 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-2947:
-

Attachment: 2947-final.txt

Patch that I committed (based on Stack's comment that Append does not actually need 
to implement Writable, since Row already extends Writable).

 MultiIncrement/MultiAppend (MultiGet functionality for increments and appends)
 --

 Key: HBASE-2947
 URL: https://issues.apache.org/jira/browse/HBASE-2947
 Project: HBase
  Issue Type: New Feature
  Components: client, regionserver
Reporter: Jonathan Gray
Assignee: Lars Hofhansl
Priority: Minor
 Fix For: 0.94.0

 Attachments: 2947-final.txt, 2947-v2.txt, HBASE-2947-v1.patch


 HBASE-1845 introduced MultiGet and other cross-row/cross-region batch 
 operations.  We should add a way to do that with increments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-2947) MultiIncrement/MultiAppend (MultiGet functionality for increments and appends)

2012-01-04 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-2947:
-

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Thanks for the review Stack!

 MultiIncrement/MultiAppend (MultiGet functionality for increments and appends)
 --

 Key: HBASE-2947
 URL: https://issues.apache.org/jira/browse/HBASE-2947
 Project: HBase
  Issue Type: New Feature
  Components: client, regionserver
Reporter: Jonathan Gray
Assignee: Lars Hofhansl
Priority: Minor
 Fix For: 0.94.0

 Attachments: 2947-final.txt, 2947-v2.txt, HBASE-2947-v1.patch


 HBASE-1845 introduced MultiGet and other cross-row/cross-region batch 
 operations.  We should add a way to do that with increments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-04 Thread Matt Corgan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179973#comment-13179973
 ] 

Matt Corgan commented on HBASE-4218:


{quote}I think it is OK to ignore cached encoded blocks on compaction{quote}The 
circumstance I was worried about is if you are doing many small flushes and 
minor compactions.  The blocks to be compacted could mostly be in cache, and 
you would be ignoring them all.  I guess it doesn't matter if it's just for 
testing, but it might give a false impression of performance.

 Data Block Encoding of KeyValues  (aka delta encoding / prefix compression)
 ---

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
  Labels: compression
 Fix For: 0.94.0

 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
 0001-Delta-encoding.patch, 4218-v16.txt, 4218.txt, D447.1.patch, 
 D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, 
 D447.15.patch, D447.16.patch, D447.17.patch, D447.2.patch, D447.3.patch, 
 D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, 
 D447.9.patch, Data-block-encoding-2011-12-23.patch, 
 Delta-encoding.patch-2011-12-22_11_52_07.patch, 
 Delta_encoding_with_memstore_TS.patch, open-source.diff


 A compression for keys. Keys are sorted in HFile and they are usually very 
 similar. Because of that, it is possible to design better compression than 
 general purpose algorithms,
 It is an additional step designed to be used in memory. It aims to save 
 memory in cache as well as speeding seeks within HFileBlocks. It should 
 improve performance a lot, if key lengths are larger than value lengths. For 
 example, it makes a lot of sense to use it when value is a counter.
 Initial tests on real data (key length = ~ 90 bytes , value length = 8 bytes) 
 shows that I could achieve decent level of compression:
  key compression ratio: 92%
  total compression ratio: 85%
  LZO on the same data: 85%
  LZO after delta encoding: 91%
 While having much better performance (20-80% faster decompression ratio than 
 LZO). Moreover, it should allow far more efficient seeking which should 
 improve performance a bit.
 It seems that a simple compression algorithms are good enough. Most of the 
 savings are due to prefix compression, int128 encoding, timestamp diffs and 
 bitfields to avoid duplication. That way, comparisons of compressed data can 
 be much faster than a byte comparator (thanks to prefix compression and 
 bitfields).
 In order to implement it in HBase two important changes in design will be 
 needed:
 -solidify interface to HFileBlock / HFileReader Scanner to provide seeking 
 and iterating; access to uncompressed buffer in HFileBlock will have bad 
 performance
 -extend comparators to support comparison assuming that N first bytes are 
 equal (or some fields are equal)
 Link to a discussion about something similar:
 http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5126) AND (USING FilterList) of two ColumnPrefixFilters broken

2012-01-04 Thread Kannan Muthukkaruppan (Created) (JIRA)
AND (USING FilterList) of two ColumnPrefixFilters broken


 Key: HBASE-5126
 URL: https://issues.apache.org/jira/browse/HBASE-5126
 Project: HBase
  Issue Type: Bug
Reporter: Kannan Muthukkaruppan


[Noticed this in the 89 branch. Possibly an issue in trunk also.]

A test which does a columnPrefixFilter(tag0) AND columnPrefixFilter(tag1) 
should return 0 kvs; instead it returns kvs with prefix tag0.

{code}
table = HTable.new(conf, tableName)

put = Put.new(Bytes.toBytes("row"))
put.add(cf1name, Bytes.toBytes("tag0"), Bytes.toBytes("value0"))
put.add(cf1name, Bytes.toBytes("tag1"), Bytes.toBytes("value1"))
put.add(cf1name, Bytes.toBytes("tag2"), Bytes.toBytes("value2"))

table.put(put)

# Test for AND Two Column Prefix Filters

filter1 = ColumnPrefixFilter.new(Bytes.toBytes("tag0"));
filter2 = ColumnPrefixFilter.new(Bytes.toBytes("tag2"));

filters = FilterList.new(FilterList::Operator::MUST_PASS_ALL);
filters.addFilter(filter1);
filters.addFilter(filter1);

get = Get.new(Bytes.toBytes("row"))
get.setFilter(filters)
get.setMaxVersions();
keyValues = table.get(get).raw()

keyValues.each do |keyValue|
  puts "Key=#{Bytes.toStringBinary(keyValue.getQualifier())}; Value=#{Bytes.toStringBinary(keyValue.getValue())}; Timestamp=#{keyValue.getTimestamp()}"
end
{code}

outputs:

{code}
Key=tag0; Value=value0; Timestamp=1325719223523
{code}


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5126) AND (USING FilterList) of two ColumnPrefixFilters broken

2012-01-04 Thread Kannan Muthukkaruppan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kannan Muthukkaruppan updated HBASE-5126:
-

Attachment: testAndTwoPrefixFilters.rb

full test attached.

 AND (USING FilterList) of two ColumnPrefixFilters broken
 

 Key: HBASE-5126
 URL: https://issues.apache.org/jira/browse/HBASE-5126
 Project: HBase
  Issue Type: Bug
Reporter: Kannan Muthukkaruppan
 Attachments: testAndTwoPrefixFilters.rb


 [Notice this in 89 branch. Possibly an issue in trunk also.]
 A test which does a columnPrefixFilter(tag0) AND columnPrefixFilter(tag1) 
 should return 0 kvs; instead it returns kvs with prefix tag0.
 {code}
 table = HTable.new(conf, tableName)
 put = Put.new(Bytes.toBytes(row))
 put.add(cf1name, Bytes.toBytes(tag0), Bytes.toBytes(value0))
 put.add(cf1name, Bytes.toBytes(tag1), Bytes.toBytes(value1))
 put.add(cf1name, Bytes.toBytes(tag2), Bytes.toBytes(value2))
 table.put(put)
 # Test for AND Two Column Prefix Filters  
   

 filter1 = ColumnPrefixFilter.new(Bytes.toBytes(tag0));
 filter2 = ColumnPrefixFilter.new(Bytes.toBytes(tag2));
 filters = FilterList.new(FilterList::Operator::MUST_PASS_ALL);
 filters.addFilter(filter1);
 filters.addFilter(filter1);
 get = Get.new(Bytes.toBytes(row))
 get.setFilter(filters)
 get.setMaxVersions();
 keyValues = table.get(get).raw()
 keyValues.each do |keyValue|
    puts "Key=#{Bytes.toStringBinary(keyValue.getQualifier())}; Value=#{Bytes.toStringBinary(keyValue.getValue())}; Timestamp=#{keyValue.getTimestamp()}"
 end
 {code}
 outputs:
 {code}
 Key=tag0; Value=value0; Timestamp=1325719223523
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-5031) [89-fb] Remove hard-coded non-existent host name from TestScanner

2012-01-04 Thread Nicolas Spiegelberg (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Spiegelberg resolved HBASE-5031.


  Resolution: Fixed
Release Note: Fixed in 0.89-fb.  Not necessary in trunk

 [89-fb] Remove hard-coded non-existent host name from TestScanner 
 --

 Key: HBASE-5031
 URL: https://issues.apache.org/jira/browse/HBASE-5031
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Priority: Minor
 Attachments: D867.1.patch


 TestScanner is failing on 0.89-fb because it has a hard-coded fake host name 
 that it is trying to look up. Replacing this with 127.0.0.1:random_port 
 instead.
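 As a hedged aside, one common way to obtain a free "127.0.0.1:random_port" in a 
 test (not necessarily how D867.1.patch does it) is to bind an ephemeral socket 
 and reuse its port:
 {code}
import java.io.IOException;
import java.net.ServerSocket;

public class RandomPortSketch {
  // Ask the OS for a free port by binding port 0, then build the local address.
  // The port could in principle be taken again after the socket closes, which
  // tests typically tolerate.
  static String localAddressWithRandomPort() throws IOException {
    try (ServerSocket socket = new ServerSocket(0)) {
      return "127.0.0.1:" + socket.getLocalPort();
    }
  }
}
 {code}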

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5126) AND (USING FilterList) of two ColumnPrefixFilters broken

2012-01-04 Thread Madhuwanti Vaidya (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179991#comment-13179991
 ] 

Madhuwanti Vaidya commented on HBASE-5126:
--

Kannan: You added filter1 twice.

 AND (USING FilterList) of two ColumnPrefixFilters broken
 

 Key: HBASE-5126
 URL: https://issues.apache.org/jira/browse/HBASE-5126
 Project: HBase
  Issue Type: Bug
Reporter: Kannan Muthukkaruppan
 Attachments: testAndTwoPrefixFilters.rb


 [Notice this in 89 branch. Possibly an issue in trunk also.]
 A test which does a columnPrefixFilter(tag0) AND columnPrefixFilter(tag1) 
 should return 0 kvs; instead it returns kvs with prefix tag0.
 {code}
 table = HTable.new(conf, tableName)
 put = Put.new(Bytes.toBytes(row))
 put.add(cf1name, Bytes.toBytes(tag0), Bytes.toBytes(value0))
 put.add(cf1name, Bytes.toBytes(tag1), Bytes.toBytes(value1))
 put.add(cf1name, Bytes.toBytes(tag2), Bytes.toBytes(value2))
 table.put(put)
 # Test for AND Two Column Prefix Filters  
   

 filter1 = ColumnPrefixFilter.new(Bytes.toBytes(tag0));
 filter2 = ColumnPrefixFilter.new(Bytes.toBytes(tag2));
 filters = FilterList.new(FilterList::Operator::MUST_PASS_ALL);
 filters.addFilter(filter1);
 filters.addFilter(filter1);
 get = Get.new(Bytes.toBytes(row))
 get.setFilter(filters)
 get.setMaxVersions();
 keyValues = table.get(get).raw()
 keyValues.each do |keyValue|
    puts "Key=#{Bytes.toStringBinary(keyValue.getQualifier())}; Value=#{Bytes.toStringBinary(keyValue.getValue())}; Timestamp=#{keyValue.getTimestamp()}"
 end
 {code}
 outputs:
 {code}
 Key=tag0; Value=value0; Timestamp=1325719223523
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5124) Backport LoadTestTool to 0.92

2012-01-04 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5124:
--

Attachment: hbase-5124.txt

The patch, v1

 Backport LoadTestTool to 0.92
 -

 Key: HBASE-5124
 URL: https://issues.apache.org/jira/browse/HBASE-5124
 Project: HBase
  Issue Type: Task
Reporter: Zhihong Yu
 Fix For: 0.92.0

 Attachments: hbase-5124.txt


 LoadTestTool is very useful.
 This JIRA backports LoadTestTool to 0.92 so that users don't have to build 
 TRUNK in order to use it against 0.92 cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-5124) Backport LoadTestTool to 0.92

2012-01-04 Thread Zhihong Yu (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu reassigned HBASE-5124:
-

Assignee: Zhihong Yu

 Backport LoadTestTool to 0.92
 -

 Key: HBASE-5124
 URL: https://issues.apache.org/jira/browse/HBASE-5124
 Project: HBase
  Issue Type: Task
Reporter: Zhihong Yu
Assignee: Zhihong Yu
 Fix For: 0.92.0

 Attachments: hbase-5124.txt


 LoadTestTool is very useful.
 This JIRA backports LoadTestTool to 0.92 so that users don't have to build 
 TRUNK in order to use it against 0.92 cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1318#comment-1318
 ] 

stack commented on HBASE-5081:
--

All tests but TestClassLoading (unrelated) pass for me locally:

{code}
...
Results :

Failed tests:   
testClassLoadingFromHDFS(org.apache.hadoop.hbase.coprocessor.TestClassLoading): 
Class TestCP1 was missing on a region

Tests in error: 
  
testEnableTableRoundRobinAssignment(org.apache.hadoop.hbase.client.TestAdmin): 
org.apache.hadoop.hbase.TableNotEnabledException: testEnableTableAssignment

Tests run: 792, Failures: 1, Errors: 1, Skipped: 10

[INFO] 
[ERROR] BUILD FAILURE
[INFO] 
[INFO] There are test failures.
{code}

 Distributed log splitting deleteNode races against splitLog retry 
 --

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Prakash Khemani
 Fix For: 0.92.0

 Attachments: 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 distributed-log-splitting-screenshot.png, hbase-5081-patch-v6.txt, 
 hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
 hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
 patch_for_92_v3.txt


 Recently, during 0.92 RC testing, we found that distributed log splitting hangs 
 there forever.  Please see the attached screenshot.
 I looked into it and here is what I think happened:
 1. One RS died; the ServerShutdownHandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. The ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a 
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in 
 the callback, it removed that task from the hashmap;
 6. One of the newly submitted tasks' ZooKeeper watcher found that the task was 
 unassigned and not in the hashmap, so it created a new orphan task;
 7. All three tasks failed, but the task created in step 6 is an orphan, so 
 the batch.err counter was one short; the log splitting therefore hangs, waiting 
 for the last task to finish, which is never going to happen.
 So I think the problem is step 2.  The fix is to make the deletion synchronous 
 instead of asynchronous, so that the retry gets a clean start.
 An async deleteNode messes up the split log retry.  In an extreme situation, 
 if the async deleteNode doesn't happen soon enough, a node created during the 
 retry could be deleted.
 deleteNode should be sync.
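 For illustration only, a minimal sketch of the synchronous-delete direction 
 described above, using the plain ZooKeeper client API; this is not the actual 
 SplitLogManager patch, and the method name is made up:
 {code}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class SyncDeleteSketch {
  // Delete the stale task znode and return only once ZooKeeper has confirmed it,
  // so the retry that follows starts from a clean state and no async callback can
  // later remove a freshly re-created task from the in-memory map.
  static void deleteTaskSynchronously(ZooKeeper zk, String taskPath)
      throws KeeperException, InterruptedException {
    try {
      zk.delete(taskPath, -1);  // blocking call, any znode version
    } catch (KeeperException.NoNodeException e) {
      // Already gone; the retry can safely recreate it.
    }
  }
}
 {code}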

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180002#comment-13180002
 ] 

stack commented on HBASE-5081:
--

You down w/ my committing, Ted?

 Distributed log splitting deleteNode races against splitLog retry 
 --

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Prakash Khemani
 Fix For: 0.92.0

 Attachments: 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 distributed-log-splitting-screenshot.png, hbase-5081-patch-v6.txt, 
 hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
 hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
 patch_for_92_v3.txt


 Recently, during 0.92 rc testing, we found distributed log splitting hangs 
 there forever.  Please see attached screen shot.
 I looked into it and here is what happened I think:
 1. One rs died, the servershutdownhandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. Servershutdownhandler retried the log splitting;
 4. During the retrial, it created these three tasks again, and put them in a 
 hashmap (tasks);
 5. The asynchronously deletion in step 2 finally happened for one task, in 
 the callback, it removed one
 task in the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found out that task is 
 unassigned, and it is not
 in the hashmap, so it created a new orphan task.
 7.  All three tasks failed, but that task created in step 6 is an orphan so 
 the batch.err counter was one short,
 so the log splitting hangs there and keeps waiting for the last task to 
 finish which is never going to happen.
 So I think the problem is step 2.  The fix is to make deletion sync, instead 
 of async, so that the retry will have
 a clean start.
 Async deleteNode will mess up with split log retrial.  In extreme situation, 
 if async deleteNode doesn't happen
 soon enough, some node created during the retrial could be deleted.
 deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5126) AND (USING FilterList) of two ColumnPrefixFilters broken

2012-01-04 Thread Amitanand Aiyer (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180008#comment-13180008
 ] 

Amitanand Aiyer commented on HBASE-5126:


Fixed the bug in the test. 

But the issue still persists. There are rows that get printed.
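For reference, a hedged Java rendering of the corrected test (filter1 AND 
filter2), assuming the row has already been populated as in the Ruby snippet; 
with two disjoint prefixes ANDed under MUST_PASS_ALL, the expected result is 
empty:

{code}
import java.io.IOException;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.filter.ColumnPrefixFilter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.util.Bytes;

public class AndTwoPrefixFiltersExample {
  static Result andTwoPrefixes(HTable table, byte[] row) throws IOException {
    FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ALL);
    filters.addFilter(new ColumnPrefixFilter(Bytes.toBytes("tag0")));
    filters.addFilter(new ColumnPrefixFilter(Bytes.toBytes("tag2")));  // filter2, not filter1 twice

    Get get = new Get(row);
    get.setFilter(filters);
    get.setMaxVersions();
    return table.get(get);  // expected: empty Result when the filters are ANDed correctly
  }
}
{code}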

 AND (USING FilterList) of two ColumnPrefixFilters broken
 

 Key: HBASE-5126
 URL: https://issues.apache.org/jira/browse/HBASE-5126
 Project: HBase
  Issue Type: Bug
Reporter: Kannan Muthukkaruppan
 Attachments: testAndTwoPrefixFilters.rb


 [Notice this in 89 branch. Possibly an issue in trunk also.]
 A test which does a columnPrefixFilter(tag0) AND columnPrefixFilter(tag1) 
 should return 0 kvs; instead it returns kvs with prefix tag0.
 {code}
 table = HTable.new(conf, tableName)
 put = Put.new(Bytes.toBytes(row))
 put.add(cf1name, Bytes.toBytes(tag0), Bytes.toBytes(value0))
 put.add(cf1name, Bytes.toBytes(tag1), Bytes.toBytes(value1))
 put.add(cf1name, Bytes.toBytes(tag2), Bytes.toBytes(value2))
 table.put(put)
 # Test for AND Two Column Prefix Filters  
   

 filter1 = ColumnPrefixFilter.new(Bytes.toBytes(tag0));
 filter2 = ColumnPrefixFilter.new(Bytes.toBytes(tag2));
 filters = FilterList.new(FilterList::Operator::MUST_PASS_ALL);
 filters.addFilter(filter1);
 filters.addFilter(filter1);
 get = Get.new(Bytes.toBytes(row))
 get.setFilter(filters)
 get.setMaxVersions();
 keyValues = table.get(get).raw()
 keyValues.each do |keyValue|
    puts "Key=#{Bytes.toStringBinary(keyValue.getQualifier())}; Value=#{Bytes.toStringBinary(keyValue.getValue())}; Timestamp=#{keyValue.getTimestamp()}"
 end
 {code}
 outputs:
 {code}
 Key=tag0; Value=value0; Timestamp=1325719223523
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180018#comment-13180018
 ] 

stack commented on HBASE-5081:
--

Put up your version of the patch, Ted, and let's commit it.  I'll roll a new RC 
and let's test that rather than the individual patch.

 Distributed log splitting deleteNode races against splitLog retry 
 --

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Prakash Khemani
 Fix For: 0.92.0

 Attachments: 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 distributed-log-splitting-screenshot.png, hbase-5081-patch-v6.txt, 
 hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
 hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
 patch_for_92_v3.txt


 Recently, during 0.92 rc testing, we found distributed log splitting hangs 
 there forever.  Please see attached screen shot.
 I looked into it and here is what happened I think:
 1. One rs died, the servershutdownhandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. Servershutdownhandler retried the log splitting;
 4. During the retrial, it created these three tasks again, and put them in a 
 hashmap (tasks);
 5. The asynchronously deletion in step 2 finally happened for one task, in 
 the callback, it removed one
 task in the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found out that task is 
 unassigned, and it is not
 in the hashmap, so it created a new orphan task.
 7.  All three tasks failed, but that task created in step 6 is an orphan so 
 the batch.err counter was one short,
 so the log splitting hangs there and keeps waiting for the last task to 
 finish which is never going to happen.
 So I think the problem is step 2.  The fix is to make deletion sync, instead 
 of async, so that the retry will have
 a clean start.
 Async deleteNode will mess up with split log retrial.  In extreme situation, 
 if async deleteNode doesn't happen
 soon enough, some node created during the retrial could be deleted.
 deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5126) AND (USING FilterList) of two ColumnPrefixFilters broken

2012-01-04 Thread Kannan Muthukkaruppan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kannan Muthukkaruppan updated HBASE-5126:
-

Attachment: testAndTwoPrefixFilters.rb

 AND (USING FilterList) of two ColumnPrefixFilters broken
 

 Key: HBASE-5126
 URL: https://issues.apache.org/jira/browse/HBASE-5126
 Project: HBase
  Issue Type: Bug
Reporter: Kannan Muthukkaruppan
 Attachments: testAndTwoPrefixFilters.rb, testAndTwoPrefixFilters.rb


 [Notice this in 89 branch. Possibly an issue in trunk also.]
 A test which does a columnPrefixFilter(tag0) AND columnPrefixFilter(tag1) 
 should return 0 kvs; instead it returns kvs with prefix tag0.
 {code}
 table = HTable.new(conf, tableName)
 put = Put.new(Bytes.toBytes(row))
 put.add(cf1name, Bytes.toBytes(tag0), Bytes.toBytes(value0))
 put.add(cf1name, Bytes.toBytes(tag1), Bytes.toBytes(value1))
 put.add(cf1name, Bytes.toBytes(tag2), Bytes.toBytes(value2))
 table.put(put)
 # Test for AND Two Column Prefix Filters  
   

 filter1 = ColumnPrefixFilter.new(Bytes.toBytes(tag0));
 filter2 = ColumnPrefixFilter.new(Bytes.toBytes(tag2));
 filters = FilterList.new(FilterList::Operator::MUST_PASS_ALL);
 filters.addFilter(filter1);
 filters.addFilter(filter2);
 get = Get.new(Bytes.toBytes(row))
 get.setFilter(filters)
 get.setMaxVersions();
 keyValues = table.get(get).raw()
 keyValues.each do |keyValue|
    puts "Key=#{Bytes.toStringBinary(keyValue.getQualifier())}; Value=#{Bytes.toStringBinary(keyValue.getValue())}; Timestamp=#{keyValue.getTimestamp()}"
 end
 {code}
 outputs:
 {code}
 Key=tag0; Value=value0; Timestamp=1325719223523
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5126) AND (USING FilterList) of two ColumnPrefixFilters broken

2012-01-04 Thread Kannan Muthukkaruppan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kannan Muthukkaruppan updated HBASE-5126:
-

Description: 
[Notice this in 89 branch. Possibly an issue in trunk also.]

A test which does a columnPrefixFilter(tag0) AND columnPrefixFilter(tag1) 
should return 0 kvs; instead it returns kvs with prefix tag0.

{code}
table = HTable.new(conf, tableName)

put = Put.new(Bytes.toBytes(row))
put.add(cf1name, Bytes.toBytes(tag0), Bytes.toBytes(value0))
put.add(cf1name, Bytes.toBytes(tag1), Bytes.toBytes(value1))
put.add(cf1name, Bytes.toBytes(tag2), Bytes.toBytes(value2))

table.put(put)

# Test for AND Two Column Prefix Filters

   
filter1 = ColumnPrefixFilter.new(Bytes.toBytes(tag0));
filter2 = ColumnPrefixFilter.new(Bytes.toBytes(tag2));

filters = FilterList.new(FilterList::Operator::MUST_PASS_ALL);
filters.addFilter(filter1);
filters.addFilter(filter2);

get = Get.new(Bytes.toBytes(row))
get.setFilter(filters)
get.setMaxVersions();
keyValues = table.get(get).raw()

keyValues.each do |keyValue|
  puts "Key=#{Bytes.toStringBinary(keyValue.getQualifier())}; Value=#{Bytes.toStringBinary(keyValue.getValue())}; Timestamp=#{keyValue.getTimestamp()}"
end
{code}

outputs:

{code}
Key=tag0; Value=value0; Timestamp=1325719223523
{code}


  was:
[Notice this in 89 branch. Possibly an issue in trunk also.]

A test which does a columnPrefixFilter(tag0) AND columnPrefixFilter(tag1) 
should return 0 kvs; instead it returns kvs with prefix tag0.

{code}
table = HTable.new(conf, tableName)

put = Put.new(Bytes.toBytes(row))
put.add(cf1name, Bytes.toBytes(tag0), Bytes.toBytes(value0))
put.add(cf1name, Bytes.toBytes(tag1), Bytes.toBytes(value1))
put.add(cf1name, Bytes.toBytes(tag2), Bytes.toBytes(value2))

table.put(put)

# Test for AND Two Column Prefix Filters

   
filter1 = ColumnPrefixFilter.new(Bytes.toBytes(tag0));
filter2 = ColumnPrefixFilter.new(Bytes.toBytes(tag2));

filters = FilterList.new(FilterList::Operator::MUST_PASS_ALL);
filters.addFilter(filter1);
filters.addFilter(filter1);

get = Get.new(Bytes.toBytes(row))
get.setFilter(filters)
get.setMaxVersions();
keyValues = table.get(get).raw()

keyValues.each do |keyValue|
  puts "Key=#{Bytes.toStringBinary(keyValue.getQualifier())}; Value=#{Bytes.toStringBinary(keyValue.getValue())}; Timestamp=#{keyValue.getTimestamp()}"
end
{code}

outputs:

{code}
Key=tag0; Value=value0; Timestamp=1325719223523
{code}



 AND (USING FilterList) of two ColumnPrefixFilters broken
 

 Key: HBASE-5126
 URL: https://issues.apache.org/jira/browse/HBASE-5126
 Project: HBase
  Issue Type: Bug
Reporter: Kannan Muthukkaruppan
 Attachments: testAndTwoPrefixFilters.rb, testAndTwoPrefixFilters.rb


 [Notice this in 89 branch. Possibly an issue in trunk also.]
 A test which does a columnPrefixFilter(tag0) AND columnPrefixFilter(tag1) 
 should return 0 kvs; instead it returns kvs with prefix tag0.
 {code}
 table = HTable.new(conf, tableName)
 put = Put.new(Bytes.toBytes(row))
 put.add(cf1name, Bytes.toBytes(tag0), Bytes.toBytes(value0))
 put.add(cf1name, Bytes.toBytes(tag1), Bytes.toBytes(value1))
 put.add(cf1name, Bytes.toBytes(tag2), Bytes.toBytes(value2))
 table.put(put)
 # Test for AND Two Column Prefix Filters  
   

 filter1 = ColumnPrefixFilter.new(Bytes.toBytes(tag0));
 filter2 = ColumnPrefixFilter.new(Bytes.toBytes(tag2));
 filters = FilterList.new(FilterList::Operator::MUST_PASS_ALL);
 filters.addFilter(filter1);
 filters.addFilter(filter2);
 get = Get.new(Bytes.toBytes(row))
 get.setFilter(filters)
 get.setMaxVersions();
 keyValues = table.get(get).raw()
 keyValues.each do |keyValue|
    puts "Key=#{Bytes.toStringBinary(keyValue.getQualifier())}; Value=#{Bytes.toStringBinary(keyValue.getValue())}; Timestamp=#{keyValue.getTimestamp()}"
 end
 {code}
 outputs:
 {code}
 Key=tag0; Value=value0; Timestamp=1325719223523
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5127) [ref manual] Better block cache documentation

2012-01-04 Thread Jean-Daniel Cryans (Created) (JIRA)
[ref manual] Better block cache documentation
-

 Key: HBASE-5127
 URL: https://issues.apache.org/jira/browse/HBASE-5127
 Project: HBase
  Issue Type: Task
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans


I've been playing a lot with 0.92's block caching and I wrote down some 
documentation that I think will be useful to others.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5127) [ref manual] Better block cache documentation

2012-01-04 Thread Jean-Daniel Cryans (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-5127:
--

Attachment: HBASE-5127.patch

Here's the patch for book.xml; can anyone review both the content and my 
English?

 [ref manual] Better block cache documentation
 -

 Key: HBASE-5127
 URL: https://issues.apache.org/jira/browse/HBASE-5127
 Project: HBase
  Issue Type: Task
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Attachments: HBASE-5127.patch


 I've been playing a lot with 0.92's block caching and I wrote down some 
 documentation that I think will be useful to others.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5081:
--

Attachment: 5081-deleteNode-with-while-loop.txt

Patch based on Prakash's latest patch.
Changed 'if (oldtask.status == FAILURE)' to a while loop.
Also restored @param for the first SplitLogManager ctor

 Distributed log splitting deleteNode races against splitLog retry 
 --

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Prakash Khemani
 Fix For: 0.92.0

 Attachments: 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 5081-deleteNode-with-while-loop.txt, 
 distributed-log-splitting-screenshot.png, hbase-5081-patch-v6.txt, 
 hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
 hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
 patch_for_92_v3.txt


 Recently, during 0.92 rc testing, we found distributed log splitting hangs 
 there forever.  Please see attached screen shot.
 I looked into it and here is what happened I think:
 1. One rs died, the servershutdownhandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. Servershutdownhandler retried the log splitting;
 4. During the retrial, it created these three tasks again, and put them in a 
 hashmap (tasks);
 5. The asynchronously deletion in step 2 finally happened for one task, in 
 the callback, it removed one
 task in the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found out that task is 
 unassigned, and it is not
 in the hashmap, so it created a new orphan task.
 7.  All three tasks failed, but that task created in step 6 is an orphan so 
 the batch.err counter was one short,
 so the log splitting hangs there and keeps waiting for the last task to 
 finish which is never going to happen.
 So I think the problem is step 2.  The fix is to make deletion sync, instead 
 of async, so that the retry will have
 a clean start.
 Async deleteNode will mess up with split log retrial.  In extreme situation, 
 if async deleteNode doesn't happen
 soon enough, some node created during the retrial could be deleted.
 deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5126) AND (USING FilterList) of two ColumnPrefixFilters broken

2012-01-04 Thread Amitanand Aiyer (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180054#comment-13180054
 ] 

Amitanand Aiyer commented on HBASE-5126:


Seems like this is fixed in trunk ... although trunk also has new transform 
functions that don't exist in the 89 branch.

 AND (USING FilterList) of two ColumnPrefixFilters broken
 

 Key: HBASE-5126
 URL: https://issues.apache.org/jira/browse/HBASE-5126
 Project: HBase
  Issue Type: Bug
Reporter: Kannan Muthukkaruppan
 Attachments: testAndTwoPrefixFilters.rb, testAndTwoPrefixFilters.rb


 [Notice this in 89 branch. Possibly an issue in trunk also.]
 A test which does a columnPrefixFilter(tag0) AND columnPrefixFilter(tag1) 
 should return 0 kvs; instead it returns kvs with prefix tag0.
 {code}
 table = HTable.new(conf, tableName)
 put = Put.new(Bytes.toBytes(row))
 put.add(cf1name, Bytes.toBytes(tag0), Bytes.toBytes(value0))
 put.add(cf1name, Bytes.toBytes(tag1), Bytes.toBytes(value1))
 put.add(cf1name, Bytes.toBytes(tag2), Bytes.toBytes(value2))
 table.put(put)
 # Test for AND Two Column Prefix Filters  
   

 filter1 = ColumnPrefixFilter.new(Bytes.toBytes(tag0));
 filter2 = ColumnPrefixFilter.new(Bytes.toBytes(tag2));
 filters = FilterList.new(FilterList::Operator::MUST_PASS_ALL);
 filters.addFilter(filter1);
 filters.addFilter(filter2);
 get = Get.new(Bytes.toBytes(row))
 get.setFilter(filters)
 get.setMaxVersions();
 keyValues = table.get(get).raw()
 keyValues.each do |keyValue|
    puts "Key=#{Bytes.toStringBinary(keyValue.getQualifier())}; Value=#{Bytes.toStringBinary(keyValue.getValue())}; Timestamp=#{keyValue.getTimestamp()}"
 end
 {code}
 outputs:
 {code}
 Key=tag0; Value=value0; Timestamp=1325719223523
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-3949) Add Master link to RegionServer pages

2012-01-04 Thread Todd Lipcon (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reassigned HBASE-3949:
--

Assignee: Gregory Chanan

 Add Master link to RegionServer pages
 ---

 Key: HBASE-3949
 URL: https://issues.apache.org/jira/browse/HBASE-3949
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.90.3, 0.92.0
Reporter: Lars George
Assignee: Gregory Chanan
Priority: Minor
  Labels: noob

 Use the ZK info about where the master is to add a UI link at the top of each 
 RegionServer page. Currently you cannot navigate directly to the Master UI 
 once you are on a RS page.
 Not sure if the info port is exposed, OTTOMH, but we could either use the RS's 
 local config setting for that or add it to ZK to enable lookup.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



