[jira] [Created] (HBASE-5431) Improve delete marker handling in Import M/R jobs

2012-02-17 Thread Lars Hofhansl (Created) (JIRA)
Improve delete marker handling in Import M/R jobs
-

 Key: HBASE-5431
 URL: https://issues.apache.org/jira/browse/HBASE-5431
 Project: HBase
  Issue Type: Sub-task
  Components: mapreduce
Affects Versions: 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor
 Fix For: 0.94.0


Import currently creates a new Delete object for each delete KV found in a 
Result object.
This can be improved with the new Delete API that allows adding a delete KV to 
a Delete object.
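
A rough sketch of the idea (assuming the Delete.addDeleteMarker(KeyValue)-style API mentioned above; this is not the actual Import code):
{code}
// Sketch: collect all delete markers of a Result into a single Delete,
// instead of creating one Delete per delete KV.
private static Delete deletesFromResult(Result result) throws IOException {
  Delete delete = null;
  for (KeyValue kv : result.raw()) {
    if (kv.isDelete()) {
      if (delete == null) {
        delete = new Delete(result.getRow());
      }
      delete.addDeleteMarker(kv);  // reuse the single Delete object
    }
  }
  return delete;  // null if the Result contained no delete markers
}
{code}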





[jira] [Created] (HBASE-5440) Allow import to optionally use HFileOutputFormat

2012-02-21 Thread Lars Hofhansl (Created) (JIRA)
Allow import to optionally use HFileOutputFormat


 Key: HBASE-5440
 URL: https://issues.apache.org/jira/browse/HBASE-5440
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor
 Fix For: 0.94.0


importtsv supports importing into a live table or generating HFiles for bulk 
load.
Import should allow the same.

Could even consider merging these tools into one (in principle the only 
difference is the parsing part, although that is maybe for a different jira).
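
A rough sketch of what the job setup could look like (the option name "import.bulk.output" and the overall structure are illustrative, not necessarily what the patch will do):
{code}
Job job = new Job(conf, "import_" + tableName);
String hfileOutPath = conf.get("import.bulk.output");
if (hfileOutPath != null) {
  // bulk-load mode: write HFiles for LoadIncrementalHFiles
  HTable table = new HTable(conf, tableName);
  FileOutputFormat.setOutputPath(job, new Path(hfileOutPath));
  HFileOutputFormat.configureIncrementalLoad(job, table);
} else {
  // default mode: write Puts directly into the live table
  TableMapReduceUtil.initTableReducerJob(tableName, null, job);
  job.setNumReduceTasks(0);
}
{code}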





[jira] [Created] (HBASE-5460) Add protobuf as M/R dependency jar

2012-02-22 Thread Lars Hofhansl (Created) (JIRA)
Add protobuf as M/R dependency jar
--

 Key: HBASE-5460
 URL: https://issues.apache.org/jira/browse/HBASE-5460
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0


Getting this from M/R jobs (Export for example):

Error: java.lang.ClassNotFoundException: com.google.protobuf.Message
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
at 
org.apache.hadoop.hbase.io.HbaseObjectWritable.<clinit>(HbaseObjectWritable.java:262)
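
The usual remedy is to ship the protobuf jar with the job, e.g. like this (sketch, not necessarily the actual patch):
{code}
// Make sure the protobuf jar ends up on the task classpath along with the
// other HBase dependencies.
TableMapReduceUtil.addDependencyJars(job.getConfiguration(),
    org.apache.hadoop.hbase.HConstants.class,   // hbase jar
    com.google.protobuf.Message.class,          // protobuf jar (currently missing)
    org.apache.zookeeper.ZooKeeper.class);      // zookeeper jar
{code}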






[jira] [Created] (HBASE-5472) LoadIncrementalHFiles loops forever if the target table misses a CF

2012-02-23 Thread Lars Hofhansl (Created) (JIRA)
LoadIncrementalHFiles loops forever if the target table misses a CF
---

 Key: HBASE-5472
 URL: https://issues.apache.org/jira/browse/HBASE-5472
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Reporter: Lars Hofhansl
Priority: Minor


I have some HFiles for two column families 'y','z', but I specified a target 
table that only has CF 'y'.
I see the following repeated forever.
...
12/02/23 22:57:37 WARN mapreduce.LoadIncrementalHFiles: Attempt to bulk load 
region containing  into table z with files [family:y 
path:hdfs://bunnypig:9000/bulk/z2/y/bd6f1c3cc8b443fc9e9e5fddcdaa3b09, family:z 
path:hdfs://bunnypig:9000/bulk/z2/z/38f12fdbb7de40e8bf0e6489ef34365d] failed.  
This is recoverable and they will be retried.
12/02/23 22:57:37 DEBUG client.MetaScanner: Scanning .META. starting at 
row=z,,00 for max=2147483647 rows using 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7b7a4989
12/02/23 22:57:37 INFO mapreduce.LoadIncrementalHFiles: Split occured while 
grouping HFiles, retry attempt 1596 with 2 files remaining to group or split
12/02/23 22:57:37 INFO mapreduce.LoadIncrementalHFiles: Trying to load 
hfile=hdfs://bunnypig:9000/bulk/z2/y/bd6f1c3cc8b443fc9e9e5fddcdaa3b09 first=r 
last=r
12/02/23 22:57:37 INFO mapreduce.LoadIncrementalHFiles: Trying to load 
hfile=hdfs://bunnypig:9000/bulk/z2/z/38f12fdbb7de40e8bf0e6489ef34365d first=r 
last=r
12/02/23 22:57:37 DEBUG mapreduce.LoadIncrementalHFiles: Going to connect to 
server region=z,,1330066309814.d5fa76a38c9565f614755e34eacf8316., 
hostname=localhost, port=60020 for row 
...





[jira] [Created] (HBASE-5475) Allow importtsv and Import to work truly offline when using bulk import option

2012-02-24 Thread Lars Hofhansl (Created) (JIRA)
Allow importtsv and Import to work truly offline when using bulk import option
--

 Key: HBASE-5475
 URL: https://issues.apache.org/jira/browse/HBASE-5475
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Lars Hofhansl


Currently importtsv (and now also Import with HBASE-5440) supports using 
HFileOutputFormat for later bulk loading.
However, that currently cannot be done without access to the table we're going 
to import to, because both importtsv and Import need to look up the split 
points and find the compression settings.
It would be nice if there were an offline way to provide the split points and 
compression settings.
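
For reference, this is roughly why table access is currently required (sketch of the existing setup path, not a proposal): configureIncrementalLoad reads the split points and the per-family compression from the live table.
{code}
HTable table = new HTable(conf, tableName);   // needs a running cluster
Job job = new Job(conf, "import");
// uses table.getStartKeys() for the partitioner and the table's
// compression settings for the HFile writers
HFileOutputFormat.configureIncrementalLoad(job, table);
{code}
An offline mode would have to obtain both from configuration or the command line instead.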






[jira] [Created] (HBASE-5497) Add protobuf as M/R dependency jar (mapred)

2012-02-29 Thread Lars Hofhansl (Created) (JIRA)
Add protobuf as M/R dependency jar (mapred)
---

 Key: HBASE-5497
 URL: https://issues.apache.org/jira/browse/HBASE-5497
 Project: HBase
  Issue Type: Sub-task
  Components: mapreduce
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0


Getting this from M/R jobs (Export for example):

Error: java.lang.ClassNotFoundException: com.google.protobuf.Message
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
at 
org.apache.hadoop.hbase.io.HbaseObjectWritable.<clinit>(HbaseObjectWritable.java:262)






[jira] [Created] (HBASE-5509) MR based copier for copying HFiles (trunk version)

2012-03-01 Thread Lars Hofhansl (Created) (JIRA)
MR based copier for copying HFiles (trunk version)
--

 Key: HBASE-5509
 URL: https://issues.apache.org/jira/browse/HBASE-5509
 Project: HBase
  Issue Type: Sub-task
  Components: documentation, regionserver
Reporter: Karthik Ranganathan
Assignee: Karthik Ranganathan


This copier is a modification of the distcp tool in HDFS. It does the following:

1. List out all the regions in the HBase cluster for the required table
2. Write the above out to a file
3. Each mapper (see the sketch below):
   3.1 lists all the HFiles for a given region by querying the regionserver
   3.2 copies all the HFiles
   3.3 outputs success if the copy succeeded, failure otherwise; failed regions are retried in another loop
4. Mappers are placed on nodes which have maximum locality for a given region to speed up copying
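
A rough sketch of the mapper side described in step 3 (class and configuration names are placeholders, not the actual tool):
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Input value: one "region \t hfile-path" line produced by the listing step.
public class HFileCopyMapper extends Mapper<LongWritable, Text, Text, Text> {
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    Configuration conf = context.getConfiguration();
    String[] parts = value.toString().split("\t");
    String region = parts[0];
    Path src = new Path(parts[1]);
    Path dst = new Path(conf.get("hfile.copy.dest.root"), src.getName());
    try {
      FileSystem srcFs = src.getFileSystem(conf);
      FileSystem dstFs = dst.getFileSystem(conf);
      FileUtil.copy(srcFs, src, dstFs, dst, false, conf);  // keep the source
      context.write(new Text(region), new Text("SUCCESS"));
    } catch (IOException e) {
      // failed regions are collected by the driver and retried in another loop
      context.write(new Text(region), new Text("FAILURE"));
    }
  }
}
{code}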






[jira] [Created] (HBASE-5523) Fix Delete Timerange logic for KEEP_DELETED_CELLS

2012-03-05 Thread Lars Hofhansl (Created) (JIRA)
Fix Delete Timerange logic for KEEP_DELETED_CELLS
-

 Key: HBASE-5523
 URL: https://issues.apache.org/jira/browse/HBASE-5523
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor
 Fix For: 0.94.0, 0.96.0


A Delete at time T marks a Put at time T as deleted.
In the parent issue I invented special logic that inserts a virtual millisecond 
into the timerange (tr) if the encountered KV is a delete marker.
This was done so that there is a way to specify a timerange that allows one to 
see the put but not the delete:
{code}
if (kv.isDelete()) {
  if (!keepDeletedCells) {
    // first ignore delete markers if the scanner can do so, and the
    // range does not include the marker
    boolean includeDeleteMarker = seePastDeleteMarkers ?
        // +1, to allow a range between a delete and put of same TS
        tr.withinTimeRange(timestamp + 1) :
        tr.withinOrAfterTimeRange(timestamp);
{code}

Discussed this today with a coworker and he convinced me that this is very 
confusing and also not needed.
When we have a Delete and a Put at the same time T, there *is* no timerange 
that can include the Put but not the Delete.

So I will change the code to this (and fix the tests):
{code}
if (kv.isDelete()) {
  if (!keepDeletedCells) {
    // first ignore delete markers if the scanner can do so, and the
    // range does not include the marker
    boolean includeDeleteMarker = seePastDeleteMarkers ?
        tr.withinTimeRange(timestamp) :
        tr.withinOrAfterTimeRange(timestamp);
{code}

It's easier to understand, and does not lead to strange scenarios when the TS 
is used as a controlled counter.

Needs to be done before 0.94 goes out.





[jira] [Created] (HBASE-5541) Avoid holding the rowlock during HLog sync in HRegion.mutateRowWithLocks

2012-03-08 Thread Lars Hofhansl (Created) (JIRA)
Avoid holding the rowlock during HLog sync in HRegion.mutateRowWithLocks


 Key: HBASE-5541
 URL: https://issues.apache.org/jira/browse/HBASE-5541
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0


Currently mutateRowsWithLocks holds the row lock while the HLog is sync'ed.
Similar to what we do in doMiniBatchPut, we should create the log entry with 
the lock held, but only sync the HLog after the lock is released, along with 
rollback logic in case the sync'ing fails.
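
A minimal sketch of the intended pattern (WalStub/MemstoreStub are invented placeholders, not the actual HRegion code): append and apply under the lock, sync after releasing it, roll back if the sync fails.
{code}
import java.util.concurrent.locks.ReentrantLock;

public class SyncOutsideLockSketch {
  static class WalStub {
    long append(String edit) { return 1L; }       // returns a txid
    void sync(long txid) throws Exception { }     // may fail
  }
  static class MemstoreStub {
    void apply(String edit) { }
    void rollback(String edit) { }
  }

  private final ReentrantLock rowLock = new ReentrantLock();
  private final WalStub wal = new WalStub();
  private final MemstoreStub memstore = new MemstoreStub();

  void mutateRow(String edit) throws Exception {
    long txid;
    rowLock.lock();
    try {
      txid = wal.append(edit);   // create the log entry while holding the lock
      memstore.apply(edit);      // apply the mutation under the lock
    } finally {
      rowLock.unlock();          // release before the (slow) sync
    }
    try {
      wal.sync(txid);            // sync without blocking other row-lock users
    } catch (Exception e) {
      memstore.rollback(edit);   // rollback logic in case the sync'ing fails
      throw e;
    }
  }
}
{code}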





[jira] [Created] (HBASE-5547) Don't delete HFiles when in backup mode

2012-03-08 Thread Lars Hofhansl (Created) (JIRA)
Don't delete HFiles when in backup mode
-

 Key: HBASE-5547
 URL: https://issues.apache.org/jira/browse/HBASE-5547
 Project: HBase
  Issue Type: New Feature
Reporter: Lars Hofhansl


This came up in a discussion I had with Stack.
It would be nice if HBase could be notified that a backup is in progress (via a 
znode, for example) and in that case either:
1. rename HFiles to be deleted to file.bck
2. rename the HFiles into a special directory
3. rename them to a general trash directory (which would not need to be tied to 
backup mode).

That way it should be possible to get a consistent backup based on HFiles (HDFS 
snapshots or hard links would be better options here, but we do not have those).

#1 makes cleanup a bit harder.






[jira] [Created] (HBASE-5569) TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally

2012-03-12 Thread Lars Hofhansl (Created) (JIRA)
TestAtomicOperation.testMultiRowMutationMultiThreads fails occasionally
---

 Key: HBASE-5569
 URL: https://issues.apache.org/jira/browse/HBASE-5569
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Priority: Minor


What I have pieced together so far is that it is the *scanning* side that has 
problems sometimes.

Every time I see an assertion failure in the log I see this before it:
{quote}
2012-03-12 21:48:49,523 DEBUG [Thread-211] regionserver.StoreScanner(499): 
Storescanner.peek() is changed where before = 
rowB/colfamily11:qual1/75366/Put/vlen=6,and after = 
rowB/colfamily11:qual1/75203/DeleteColumn/vlen=0
{quote}
The order of the Put and the Delete is sometimes reversed.

The test threads should always see exactly one KV: if the 'before' was the Put 
the threads see 0 KVs, and if the 'before' was the Delete the threads see 2 KVs.

This debug message comes from StoreScanner.checkReseek. It seems we still have 
some consistency issues with scanning sometimes :(






[jira] [Created] (HBASE-5604) HLog replay tool that generates HFiles for use by LoadIncrementalHFiles.

2012-03-20 Thread Lars Hofhansl (Created) (JIRA)
HLog replay tool that generates HFiles for use by LoadIncrementalHFiles.


 Key: HBASE-5604
 URL: https://issues.apache.org/jira/browse/HBASE-5604
 Project: HBase
  Issue Type: New Feature
Reporter: Lars Hofhansl


Just an idea I had. Might be useful for restoring a backup using the HLogs.
This could either be a standalone tool or an M/R job (with a mapper per HLog 
file).






[jira] [Created] (HBASE-5622) Improve efficiency of mapred version of RowCounter

2012-03-22 Thread Lars Hofhansl (Created) (JIRA)
Improve efficiency of mapred version of RowCounter
-

 Key: HBASE-5622
 URL: https://issues.apache.org/jira/browse/HBASE-5622
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Priority: Minor
 Fix For: 0.94.1








[jira] [Created] (HBASE-5641) decayingSampleTick1 prevents HBase from shutting down.

2012-03-26 Thread Lars Hofhansl (Created) (JIRA)
decayingSampleTick1 prevents HBase from shutting down.
--

 Key: HBASE-5641
 URL: https://issues.apache.org/jira/browse/HBASE-5641
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Priority: Blocker
 Fix For: 0.94.0, 0.96.0
 Attachments: 5641.txt

I think this is the problem. It creates a non-daemon thread.
{code}
  private static final ScheduledExecutorService TICK_SERVICE =
      Executors.newScheduledThreadPool(1,
          Threads.getNamedThreadFactory("decayingSampleTick"));
{code}
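
The usual fix is to hand the executor a factory that creates daemon threads, roughly like this (illustrative only; the attached patch may do it differently):
{code}
private static final ScheduledExecutorService TICK_SERVICE =
    Executors.newScheduledThreadPool(1, new ThreadFactory() {
      public Thread newThread(Runnable r) {
        Thread t = new Thread(r, "decayingSampleTick");
        t.setDaemon(true);   // daemon threads do not keep the JVM alive
        return t;
      }
    });
{code}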






[jira] [Created] (HBASE-5659) TestAtomicOperation.testMultiRowMutationMultiThreads is still failing occasionally

2012-03-27 Thread Lars Hofhansl (Created) (JIRA)
TestAtomicOperation.testMultiRowMutationMultiThreads is still failing 
occasionally
--

 Key: HBASE-5659
 URL: https://issues.apache.org/jira/browse/HBASE-5659
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Priority: Minor


See run here: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1318//testReport/org.apache.hadoop.hbase.regionserver/TestAtomicOperation/testMultiRowMutationMultiThreads/
{quote}
2012-03-27 04:36:12,627 DEBUG [Thread-118] regionserver.StoreScanner(499): 
Storescanner.peek() is changed where before = 
rowB/colfamily11:qual1/7202/Put/vlen=6/ts=7922,and after = 
rowB/colfamily11:qual1/7199/DeleteColumn/vlen=0/ts=0
2012-03-27 04:36:12,629 INFO  [Thread-121] regionserver.HRegion(1558): Finished 
memstore flush of ~2.9k/2952, currentsize=1.6k/1640 for region 
testtable,,1332822963417.7cd30e219714cfc5e91f69def66e7f81. in 14ms, 
sequenceid=7927, compaction requested=true
2012-03-27 04:36:12,629 DEBUG [Thread-126] 
regionserver.TestAtomicOperation$2(362): flushing
2012-03-27 04:36:12,630 DEBUG [Thread-126] regionserver.HRegion(1426): Started 
memstore flush for testtable,,1332822963417.7cd30e219714cfc5e91f69def66e7f81., 
current region memstore size 1.9k
2012-03-27 04:36:12,630 DEBUG [Thread-126] regionserver.HRegion(1474): Finished 
snapshotting testtable,,1332822963417.7cd30e219714cfc5e91f69def66e7f81., 
commencing wait for mvcc, flushsize=1968
2012-03-27 04:36:12,630 DEBUG [Thread-126] regionserver.HRegion(1484): Finished 
snapshotting, commencing flushing stores
2012-03-27 04:36:12,630 DEBUG [Thread-126] util.FSUtils(153): Creating 
file=/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/.tmp/61954619003e469baf1a34be5ff2ec57
 with permission=rwxrwxrwx
2012-03-27 04:36:12,631 DEBUG [Thread-126] hfile.HFileWriterV2(143): 
Initialized with CacheConfig:enabled [cacheDataOnRead=true] 
[cacheDataOnWrite=false] [cacheIndexesOnWrite=false] [cacheBloomsOnWrite=false] 
[cacheEvictOnClose=false] [cacheCompressed=false]
2012-03-27 04:36:12,631 INFO  [Thread-126] regionserver.StoreFile$Writer(997): 
Delete Family Bloom filter type for 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/.tmp/61954619003e469baf1a34be5ff2ec57:
 CompoundBloomFilterWriter
2012-03-27 04:36:12,632 INFO  [Thread-126] regionserver.StoreFile$Writer(1220): 
NO General Bloom and NO DeleteFamily was added to HFile 
(/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/.tmp/61954619003e469baf1a34be5ff2ec57)
 
2012-03-27 04:36:12,632 INFO  [Thread-126] regionserver.Store(770): Flushed , 
sequenceid=7934, memsize=1.9k, into tmp file 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/.tmp/61954619003e469baf1a34be5ff2ec57
2012-03-27 04:36:12,632 DEBUG [Thread-126] regionserver.Store(795): Renaming 
flushed file at 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/.tmp/61954619003e469baf1a34be5ff2ec57
 to 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/colfamily11/61954619003e469baf1a34be5ff2ec57
2012-03-27 04:36:12,634 INFO  [Thread-126] regionserver.Store(818): Added 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/colfamily11/61954619003e469baf1a34be5ff2ec57,
 entries=12, sequenceid=7934, filesize=1.3k
2012-03-27 04:36:12,642 DEBUG [Thread-118] 
regionserver.TestAtomicOperation$2(392): []
Exception in thread "Thread-118" junit.framework.AssertionFailedError
at junit.framework.Assert.fail(Assert.java:48)
at junit.framework.Assert.fail(Assert.java:56)
at 
org.apache.hadoop.hbase.regionserver.TestAtomicOperation$2.run(TestAtomicOperation.java:394)
2012-03-27 04:36:12,643 INFO  [Thread-126] regionserver.HRegion(1558): Finished 
memstore flush of ~1.9k/1968, currentsize=1.3k/1312 for region 

[jira] [Created] (HBASE-5670) Have Mutation implement the Row interface.

2012-03-28 Thread Lars Hofhansl (Created) (JIRA)
Have Mutation implement the Row interface.
--

 Key: HBASE-5670
 URL: https://issues.apache.org/jira/browse/HBASE-5670
 Project: HBase
  Issue Type: Improvement
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Trivial


In HBASE-4347 I factored some code from Put/Delete/Append into Mutation.

In a discussion with a co-worker I noticed that Put/Delete/Append still 
implement the Row interface, but Mutation does not.

In a trivial change I would like to move that interface up to Mutation, along 
with changing HTable.batch(List<Row>) to HTable.batch(List<? extends Row>) 
(HConnection.processBatch takes List<? extends Row> already anyway), so that 
HTable.batch can be used with a list of Mutations.
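
A usage sketch of what this enables (table/row/family names are placeholders; htable is an existing HTable):
{code}
List<Mutation> mutations = new ArrayList<Mutation>();
Put put = new Put(Bytes.toBytes("row1"));
put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
mutations.add(put);
mutations.add(new Delete(Bytes.toBytes("row2")));
// compiles once Mutation implements Row and batch takes List<? extends Row>
htable.batch(mutations);
{code}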





[jira] [Created] (HBASE-5682) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers (port to 0.94)

2012-03-30 Thread Lars Hofhansl (Created) (JIRA)
Add retry logic in HConnectionImplementation#resetZooKeeperTrackers (port to 
0.94)
--

 Key: HBASE-5682
 URL: https://issues.apache.org/jira/browse/HBASE-5682
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl


Just realized that without this HBASE-4805 is broken.
I.e. there's no point keeping a persistent HConnection around if it can be 
rendered permanently unusable when the ZK connection is lost temporarily.
Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to 
backport).





[jira] [Created] (HBASE-5096) Replication does not handle deletes correctly.

2011-12-26 Thread Lars Hofhansl (Created) (JIRA)
Replication does not handle deletes correctly.
--

 Key: HBASE-5096
 URL: https://issues.apache.org/jira/browse/HBASE-5096
 Project: HBase
  Issue Type: Sub-task
  Components: replication
Affects Versions: 0.94.0, 0.92.1
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl


Teruyoshi Zenmyo discovered this problem.

The problem turns out to be this code in ReplicationSink.java:
{code}
if (kvs.get(0).isDelete()) {
  ...
  if (kv.isDeleteFamily()) {
    delete.deleteFamily(kv.getFamily());
  } else if (!kv.isEmptyColumn()) {
    delete.deleteColumn(kv.getFamily(), kv.getQualifier());
  }
}
...
{code}

So the code deals with family delete markers and then assumes that if it's not 
a family delete marker it must have been a version delete marker.
(deleteColumn sets a version delete marker, deleteColumns sets a column delete 
marker.)

I.e. column delete markers are not replicated correctly.
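
A sketch of the kind of branching the sink needs instead (based on the 0.94-era client API; not necessarily the committed patch):
{code}
if (kv.isDeleteFamily()) {
  delete.deleteFamily(kv.getFamily(), kv.getTimestamp());
} else if (KeyValue.Type.codeToType(kv.getType()) == KeyValue.Type.DeleteColumn) {
  // column delete marker: deletes all versions of the column
  delete.deleteColumns(kv.getFamily(), kv.getQualifier(), kv.getTimestamp());
} else {
  // version delete marker: deletes a single version
  delete.deleteColumn(kv.getFamily(), kv.getQualifier(), kv.getTimestamp());
}
{code}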





[jira] [Created] (HBASE-5118) Fix Scan documentation

2012-01-03 Thread Lars Hofhansl (Created) (JIRA)
Fix Scan documentation
--

 Key: HBASE-5118
 URL: https://issues.apache.org/jira/browse/HBASE-5118
 Project: HBase
  Issue Type: Sub-task
  Components: documentation
Affects Versions: 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Trivial


Current documentation for scan states:
{code}
Scan scan = new Scan();
scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("attr"));
scan.setStartRow(Bytes.toBytes("row"));                   // start key is inclusive
scan.setStopRow(Bytes.toBytes("row" + new byte[] {0}));   // stop key is exclusive
for (Result result : htable.getScanner(scan)) {
  // process Result instance
}
{code}
"row" + new byte[] {0} is not correct. That should be "row" + (char)0.






[jira] [Created] (HBASE-5164) Better HTable resource consumption in CoprocessorHost

2012-01-09 Thread Lars Hofhansl (Created) (JIRA)
Better HTable resource consumption in CoprocessorHost
-

 Key: HBASE-5164
 URL: https://issues.apache.org/jira/browse/HBASE-5164
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: Lars Hofhansl
Priority: Minor
 Fix For: 0.94.0


HBASE-4805 allows for more control over HTable's resource consumption.
This is currently not used by CoprocessorHost (even though it is even more 
critical to control this inside the RegionServer).

It's not immediately obvious how to do that.
Maybe CoprocessorHost should maintain a lazy ExecutorService and HConnection 
and reuse both for all HTables retrieved via 
CoprocessorEnvironment.getTable(...).

Not sure how critical this is, but I feel that without this it is dangerous to 
use getTable, as it would lead to all the resource-consumption problems we see 
in the client, but inside a crucial part of the HBase servers.
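
Roughly what that could look like (all names are illustrative, not the actual CoprocessorHost code):
{code}
private volatile ExecutorService sharedPool;
private volatile HConnection sharedConnection;

private synchronized void ensureShared(Configuration conf) throws IOException {
  if (sharedPool == null) {
    sharedPool = Executors.newCachedThreadPool();
    sharedConnection = HConnectionManager.getConnection(conf);
  }
}

public HTableInterface getTable(byte[] tableName, Configuration conf)
    throws IOException {
  ensureShared(conf);
  // HBASE-4805 constructor: the HTable reuses the shared pool and connection
  // instead of creating its own
  return new HTable(tableName, sharedConnection, sharedPool);
}
{code}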

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5203) Fix atomic put/delete with region server failures.

2012-01-15 Thread Lars Hofhansl (Created) (JIRA)
Fix atomic put/delete with region server failures.
--

 Key: HBASE-5203
 URL: https://issues.apache.org/jira/browse/HBASE-5203
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl


HBASE-3584 does not provide fully atomic operations in case of region server 
failures (see explanation there).

What should happen is that either (1) all edits are applied via a single 
WALEdit, or (2) the WALEdits are applied in async mode and then sync'ed 
together.

For #1 it is not clear whether it is advisable to manage multiple *different* 
operations (Put/Delete) via a single WAL edit. A quick check reveals that WAL 
replay on region startup would work, but that replication would need to be 
adapted. The refactoring needed would be non-trivial.

#2 might actually not work, as another operation could request sync'ing a later 
edit and hence flush these entries out as well.






[jira] [Created] (HBASE-5205) Delete handles deleteFamily incorrectly

2012-01-15 Thread Lars Hofhansl (Created) (JIRA)
Delete handles deleteFamily incorrectly
---

 Key: HBASE-5205
 URL: https://issues.apache.org/jira/browse/HBASE-5205
 Project: HBase
  Issue Type: Bug
  Components: client
Reporter: Lars Hofhansl
Priority: Minor


Delete.deleteFamily clears all other markers for the same family.
That is not correct as some of these other markers might be for a later time.

That logic should be removed.

If (really) needed this can be slightly optimized by keeping track of the max 
TS so far for each family.
If both the TS-so-far and the TS of a new deleteFamily request are < 
LATEST_TIMESTAMP and the TS-so-far is <= the TS of the deleteFamily marker, 
then the previous delete marker can be removed.
I think that might be overkill, as most deletes issued from clients are for 
LATEST_TIMESTAMP (which the server translates to the current time).

I'll have a (one-line) patch soon, unless folks insist on the optimization I 
mentioned above.





[jira] [Created] (HBASE-5229) Support atomic region operations

2012-01-18 Thread Lars Hofhansl (Created) (JIRA)
Support atomic region operations


 Key: HBASE-5229
 URL: https://issues.apache.org/jira/browse/HBASE-5229
 Project: HBase
  Issue Type: New Feature
  Components: client, regionserver
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0
 Attachments: 5229.txt

As discussed (at length) on the dev mailing list, with HBASE-3584 and 
HBASE-5203 committed, supporting atomic cross-row transactions within a region 
becomes simple.
I am aware of the hesitation about the usefulness of this feature, but we have 
to start somewhere.

Let's use this jira for discussion; I'll attach a patch (with tests) 
momentarily to make this concrete.





[jira] [Created] (HBASE-5257) Allow filter to be evaluated after version handling

2012-01-22 Thread Lars Hofhansl (Created) (JIRA)
Allow filter to be evaluated after version handling
---

 Key: HBASE-5257
 URL: https://issues.apache.org/jira/browse/HBASE-5257
 Project: HBase
  Issue Type: Improvement
Reporter: Lars Hofhansl


There are various use cases and filter types where evaluating the filter before 
versions are handled either does not make sense or makes filter handling more 
complicated.

Also see this comment in ScanQueryMatcher:
{code}
/**
 * Filters should be checked before checking column trackers. If we do
 * otherwise, as was previously being done, ColumnTracker may increment its
 * counter for even that KV which may be discarded later on by Filter. This
 * would lead to incorrect results in certain cases.
 */
{code}

So we had Filters after the column trackers (which do the version checking), 
and then moved them.
This should be at the discretion of the Filter.
We could either add a new method to FilterBase (maybe excludeVersions() or 
something), or have a new Filter wrapper (like WhileMatchFilter) that should 
only be used as the outermost filter and indicates the same (maybe 
ExcludeVersionsFilter).

See latest comments on HBASE-5229 for motivation.





[jira] [Created] (HBASE-5266) Add documentation for ColumnRangeFilter

2012-01-23 Thread Lars Hofhansl (Created) (JIRA)
Add documentation for ColumnRangeFilter
---

 Key: HBASE-5266
 URL: https://issues.apache.org/jira/browse/HBASE-5266
 Project: HBase
  Issue Type: Sub-task
  Components: documentation
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor
 Fix For: 0.94.0


There are only a few lines of documentation for ColumnRangeFilter.
Given the usefulness of this filter for efficient intra-row scanning (see 
HBASE-5229 and HBASE-4256), we should make this filter more prominent in the 
documentation.
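
An example of the kind of snippet the reference guide could include (family and qualifier names are just placeholders): scan only the columns whose qualifiers fall in a range within each row.
{code}
Scan scan = new Scan();
scan.addFamily(Bytes.toBytes("cf"));
scan.setFilter(new ColumnRangeFilter(
    Bytes.toBytes("bbbt"), true,     // min qualifier, inclusive
    Bytes.toBytes("bbdi"), false));  // max qualifier, exclusive
ResultScanner scanner = htable.getScanner(scan);
{code}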






[jira] [Created] (HBASE-5268) Add delete column prefix delete marker

2012-01-23 Thread Lars Hofhansl (Created) (JIRA)
Add delete column prefix delete marker
--

 Key: HBASE-5268
 URL: https://issues.apache.org/jira/browse/HBASE-5268
 Project: HBase
  Issue Type: Improvement
  Components: client, regionserver
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0


This is another part missing in the wide-row challenge.
Currently entire families of a row can be deleted, or individual columns or 
versions.
There is no facility to mark multiple columns for deletion by column prefix.

Turns out that this can be achieved with very little code (it's possible that I 
missed some of the new delete bloom filter code, so please review this 
thoroughly). 
I'll attach a patch soon, just working on some tests now.






[jira] [Created] (HBASE-5304) Pluggable split key policy

2012-01-30 Thread Lars Hofhansl (Created) (JIRA)
Pluggable split key policy
--

 Key: HBASE-5304
 URL: https://issues.apache.org/jira/browse/HBASE-5304
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0


We need a way to specify custom policies to determine split keys.
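
For illustration, the kind of policy this would enable (assuming the split policy gains a getSplitPoint() hook; the class name and prefix length are made up):
{code}
public class FixedPrefixSplitPolicy extends IncreasingToUpperBoundRegionSplitPolicy {
  private static final int PREFIX_LENGTH = 8;

  @Override
  protected byte[] getSplitPoint() {
    byte[] splitPoint = super.getSplitPoint();
    if (splitPoint != null && splitPoint.length > PREFIX_LENGTH) {
      // split on the row-key prefix so rows sharing a prefix stay in one region
      return Arrays.copyOf(splitPoint, PREFIX_LENGTH);
    }
    return splitPoint;
  }
}
{code}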






[jira] [Created] (HBASE-5311) Allow inmemory Memstore compactions

2012-01-31 Thread Lars Hofhansl (Created) (JIRA)
Allow inmemory Memstore compactions
---

 Key: HBASE-5311
 URL: https://issues.apache.org/jira/browse/HBASE-5311
 Project: HBase
  Issue Type: Improvement
Reporter: Lars Hofhansl


Just like we periodically compact the StoreFiles, we should also periodically 
compact the MemStore.
During these compactions we eliminate deleted cells, expired cells, cells to be 
removed because of version count, etc., before we even do a memstore flush.

Besides the optimization that we could get from this, it should also allow us 
to remove the special handling of ICV, Increment, and Append (all of which use 
upsert logic to avoid accumulating excessive cells in the Memstore).

Not targeting this.





[jira] [Created] (HBASE-5333) Introduce Memstore backpressure for writes

2012-02-03 Thread Lars Hofhansl (Created) (JIRA)
Introduce Memstore backpressure for writes


 Key: HBASE-5333
 URL: https://issues.apache.org/jira/browse/HBASE-5333
 Project: HBase
  Issue Type: Improvement
Reporter: Lars Hofhansl


Currently if the memstore/flush/compaction cannot keep up with the write load, 
we block writers for up to hbase.hstore.blockingWaitTime milliseconds (the 
default is 90000).
Would be nice if there were a concept of a soft backpressure that slows 
writing clients gracefully *before* we reach this condition.

From the log:
2012-02-04 00:00:06,963 WARN 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region 
table,,1328313512779.c2761757621ddf8fb78baf5288d71271. has too many store 
files; delaying flush up to 90000ms






[jira] [Created] (HBASE-5336) Spurious exceptions in HConnectionImplementation

2012-02-03 Thread Lars Hofhansl (Created) (JIRA)
Spurious exceptions in HConnectionImplementation


 Key: HBASE-5336
 URL: https://issues.apache.org/jira/browse/HBASE-5336
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl


I have seen this on the client a few times during heavy write testing:

java.util.concurrent.ExecutionException: java.io.IOException: 
java.io.IOException: java.lang.NullPointerException
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1524)
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1376)
at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:891)
at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:743)
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:730)
at NewsFeedCreate.insert(NewsFeedCreate.java:91)
at NewsFeedCreate$1.run(NewsFeedCreate.java:38)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: java.io.IOException: 
java.lang.NullPointerException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at 
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
at 
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:79)
at 
org.apache.hadoop.hbase.client.ServerCallable.translateException(ServerCallable.java:228)
at 
org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:212)
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1360)
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1348)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
... 1 more
Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: 
java.lang.NullPointerException
at 
org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099)
at 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:243)
at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1289)
at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1386)
at 
org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2161)
at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1954)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3363)
at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326)

at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:899)
at 
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
at $Proxy1.multi(Unknown Source)
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1353)
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1351)
at 
org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:210)
... 7 more






[jira] [Created] (HBASE-5368) Move PrefixSplitKeyPolicy out of the src/test into src, so it is accessible in HBase installs

2012-02-09 Thread Lars Hofhansl (Created) (JIRA)
Move PrefixSplitKeyPolicy out of the src/test into src, so it is accessible in 
HBase installs
-

 Key: HBASE-5368
 URL: https://issues.apache.org/jira/browse/HBASE-5368
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver
Affects Versions: 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor


Very simple change to make PrefixSplitKeyPolicy accessible in HBase installs 
(the user still needs to set up the table(s) accordingly).

Right now it is in src/test/org.apache.hadoop.hbase.regionserver; I propose 
moving it to src/org.apache.hadoop.hbase.regionserver (alongside 
ConstantSizeRegionSplitPolicy), and maybe renaming it too.





[jira] [Created] (HBASE-5370) Allow HBase shell set HTableDescriptor values

2012-02-09 Thread Lars Hofhansl (Created) (JIRA)
Allow HBase shell set HTableDescriptor values
-

 Key: HBASE-5370
 URL: https://issues.apache.org/jira/browse/HBASE-5370
 Project: HBase
  Issue Type: Improvement
Reporter: Lars Hofhansl
Priority: Minor


Currently it does not seem to be possible to set values on a table's 
HTableDescriptor (either on creation or afterwards).

The syntax I have in mind is something like:
create {NAME='table', 'somekey'='somevalue'}, 'column'

In analogy to how we allow a column to be either a string ('column') or an 
association {NAME='column', ...}.

alter would be changed to allow setting arbitrary values.





[jira] [Created] (HBASE-5774) Add documentation for WALPlayer to HBase reference guide.

2012-04-12 Thread Lars Hofhansl (Created) (JIRA)
Add documentation for WALPlayer to HBase reference guide.
-

 Key: HBASE-5774
 URL: https://issues.apache.org/jira/browse/HBASE-5774
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl








[jira] [Created] (HBASE-4488) Store could miss rows during flush

2011-09-26 Thread Lars Hofhansl (Created) (JIRA)
Store could miss rows during flush
--

 Key: HBASE-4488
 URL: https://issues.apache.org/jira/browse/HBASE-4488
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Lars Hofhansl
Priority: Critical


While looking at HBASE-4344 I found that my change HBASE-4241 contains a 
critical mistake.






[jira] [Created] (HBASE-4496) HFile V2 does not honor setCacheBlocks when scanning.

2011-09-27 Thread Lars Hofhansl (Created) (JIRA)
HFile V2 does not honor setCacheBlocks when scanning.
-

 Key: HBASE-4496
 URL: https://issues.apache.org/jira/browse/HBASE-4496
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Lars Hofhansl
 Fix For: 0.92.0, 0.94.0


While testing the LRU cache during the scanning I noticed quite some churn in 
the cache even when Scan.cacheBlocks is set to false. After debugging this, I 
found that HFile V2 always caches blocks in the LRU cache regardless of the 
cacheBlocks setting.

Here's a trace (from Eclipse) showing the problem:

HFileReaderV2.readBlock(long, int, boolean, boolean, boolean) line: 279 
HFileReaderV2.readBlockData(long, long, int, boolean) line: 219 
HFileBlockIndex$BlockIndexReader.seekToDataBlock(byte[], int, int, HFileBlock) 
line: 191
HFileReaderV2$ScannerV2.seekTo(byte[], int, int, boolean) line: 502 
HFileReaderV2$ScannerV2.reseekTo(byte[], int, int) line: 539
StoreFileScanner.reseekAtOrAfter(HFileScanner, KeyValue) line: 151  
StoreFileScanner.reseek(KeyValue) line: 110 
KeyValueHeap.reseek(KeyValue) line: 255 
StoreScanner.reseek(KeyValue) line: 409 
StoreScanner.next(List<KeyValue>, int) line: 304
KeyValueHeap.next(List<KeyValue>, int) line: 114
KeyValueHeap.next(List<KeyValue>) line: 143 
HRegion$RegionScannerImpl.nextRow(byte[]) line: 2774
HRegion$RegionScannerImpl.nextInternal(int) line: 2722  
HRegion$RegionScannerImpl.next(List<KeyValue>, int) line: 2682  
HRegion$RegionScannerImpl.next(List<KeyValue>) line: 2699   
HRegionServer.next(long, int) line: 2092

Every scanner.next causes a reseek, which eventually causes a call to 
HFileBlockIndex$BlockIndexReader.seekToDataBlock(...) at which point the 
cacheBlocks information is lost. HFileReaderV2.readBlockData calls 
HFileReaderV2.readBlock with cacheBlocks set unconditionally to true.

The fix is not immediately clear, unless we want to pass cacheBlocks to 
HFileBlockIndex$BlockIndexReader.seekToDataBlock and then on to 
HFileBlock.BasicReader.readBlockData and all its implementers, which is ugly as 
readBlockData should not care about caching.

Avoiding caching during scans is somewhat important for us.






[jira] [Created] (HBASE-4517) Document new replication features in 0.92

2011-09-29 Thread Lars Hofhansl (Created) (JIRA)
Document new replication features in 0.92
-

 Key: HBASE-4517
 URL: https://issues.apache.org/jira/browse/HBASE-4517
 Project: HBase
  Issue Type: Sub-task
  Components: documentation
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor
 Fix For: 0.92.0, 0.94.0


Document changes from HBASE-2195 and HBASE-2196





[jira] [Created] (HBASE-4536) Allow CF to retain deleted rows

2011-10-03 Thread Lars Hofhansl (Created) (JIRA)
Allow CF to retain deleted rows
---

 Key: HBASE-4536
 URL: https://issues.apache.org/jira/browse/HBASE-4536
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0, 0.94.0


Parent allows for a cluster to retain rows for a TTL or keep a minimum number 
of versions.
However, if a client deletes a row, all versions older than the delete tombstone 
will be removed at the next major compaction (and even at memstore flush - see 
HBASE-4241).
There should be a way to retain those versions to guard against software errors.

I see two options here:
1. Add a new flag to HColumnDescriptor. Something like RETAIN_DELETED (a rough 
client-side illustration follows below).
2. Fold this into the parent change. I.e. keep minimum-number-of-versions of 
versions even past the delete marker.

#1 would allow for more flexibility. #2 comes somewhat naturally with parent 
(from a user viewpoint).

Comments? Any other options?
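
Rough client-side illustration of option #1 (RETAIN_DELETED is only the name proposed above, not an existing attribute; admin is an existing HBaseAdmin):
{code}
HColumnDescriptor hcd = new HColumnDescriptor("cf");
hcd.setValue("RETAIN_DELETED", "true");   // hypothetical flag from option #1
HTableDescriptor htd = new HTableDescriptor("mytable");
htd.addFamily(hcd);
admin.createTable(htd);
{code}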





[jira] [Created] (HBASE-4556) Fix all incorrect uses of InternalScanner.next(...)

2011-10-07 Thread Lars Hofhansl (Created) (JIRA)
Fix all incorrect uses of InternalScanner.next(...)
---

 Key: HBASE-4556
 URL: https://issues.apache.org/jira/browse/HBASE-4556
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl


There are cases all over the code where InternalScanner.next(...) is not used 
correctly.

I see this a lot:
{code}
while(scanner.next(...)) {
}
{code}

The correct pattern is:
{code}
boolean more = false;
do {
   more = scanner.next(...);
} while (more);
{code}






[jira] [Created] (HBASE-4583) Integrate RWCC with Append and Increment operations

2011-10-12 Thread Lars Hofhansl (Created) (JIRA)
Integrate RWCC with Append and Increment operations
---

 Key: HBASE-4583
 URL: https://issues.apache.org/jira/browse/HBASE-4583
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
 Fix For: 0.94.0


Currently Increment and Append operations do not work with RWCC and hence a 
client could see the results of multiple such operations mixed in the same 
Get/Scan.
The semantics might be a bit more interesting here as upsert adds to and 
removes from the memstore.






[jira] [Created] (HBASE-4626) Filters unnecessarily copy byte arrays...

2011-10-19 Thread Lars Hofhansl (Created) (JIRA)
Filters unnecessarily copy byte arrays...
-

 Key: HBASE-4626
 URL: https://issues.apache.org/jira/browse/HBASE-4626
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl


Just looked at SingleCol and ValueFilter... And on every column compared they 
create a copy of the column and/or value portion of the KV.





[jira] [Created] (HBASE-4673) NPE in HFileReaderV2.close during major compaction when hfile.block.cache.size is set to 0

2011-10-25 Thread Lars Hofhansl (Created) (JIRA)
NPE in HFileReaderV2.close during major compaction when hfile.block.cache.size 
is set to 0 
---

 Key: HBASE-4673
 URL: https://issues.apache.org/jira/browse/HBASE-4673
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: Lars Hofhansl
Priority: Minor


On a test system I got this exception when hfile.block.cache.size is set to 0:

java.lang.NullPointerException
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.close(HFileReaderV2.java:321)
at 
org.apache.hadoop.hbase.regionserver.StoreFile$Reader.close(StoreFile.java:1065)
at 
org.apache.hadoop.hbase.regionserver.StoreFile.closeReader(StoreFile.java:539)
at 
org.apache.hadoop.hbase.regionserver.StoreFile.deleteReader(StoreFile.java:549)
at 
org.apache.hadoop.hbase.regionserver.Store.completeCompaction(Store.java:1314)
at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:686)
at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1016)
at 
org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest.run(CompactionRequest.java:178)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619) 

Minor issue, as nobody in their right mind will have hfile.block.cache.size=0.

Looks like this is due to HBASE-4422.





[jira] [Created] (HBASE-4682) Support deleted rows using Import/Export

2011-10-26 Thread Lars Hofhansl (Created) (JIRA)
Support deleted rows using Import/Export


 Key: HBASE-4682
 URL: https://issues.apache.org/jira/browse/HBASE-4682
 Project: HBase
  Issue Type: Sub-task
  Components: mapreduce
Reporter: Lars Hofhansl


Parent allows keeping deleted rows around. Would be nice if those could be 
exported and imported as well.
All the building blocks are there.
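
The main building block is a raw scan (sketch; assumes the raw-scan support from the parent work): it returns delete markers and deleted cells so the export job can write them out.
{code}
Scan scan = new Scan();
scan.setRaw(true);       // include delete markers and deleted cells
scan.setMaxVersions();   // all versions, not just the latest
ResultScanner scanner = htable.getScanner(scan);
{code}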





[jira] [Created] (HBASE-4683) Create config option to only cache index blocks

2011-10-26 Thread Lars Hofhansl (Created) (JIRA)
Create config option to only cache index blocks
---

 Key: HBASE-4683
 URL: https://issues.apache.org/jira/browse/HBASE-4683
 Project: HBase
  Issue Type: New Feature
Reporter: Lars Hofhansl
Priority: Minor
 Fix For: 0.94.0


This would add a new boolean config option: hfile.block.cache.datablocks. The 
default would be true.

Setting this to false puts HBase in a mode where only index blocks are cached, 
which is useful for analytical scenarios where a useful working set of the data 
cannot be expected to fit into the cache.
This is the equivalent of setting cacheBlocks to false on all scans (including 
scans on behalf of gets).

I would like to get a general feeling about what folks think about this.
The change itself would be simple.





[jira] [Created] (HBASE-4691) Remove more unnecessary byte[] copies from KeyValues

2011-10-27 Thread Lars Hofhansl (Created) (JIRA)
Remove more unnecessary byte[] copies from KeyValues


 Key: HBASE-4691
 URL: https://issues.apache.org/jira/browse/HBASE-4691
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor
 Fix For: 0.94.0


Just looking through the code I found some more spots where we unnecessarily 
copy byte[] rather than just passing offset and length around.





[jira] [Created] (HBASE-4800) Result.compareResults is incorrect

2011-11-16 Thread Lars Hofhansl (Created) (JIRA)
Result.compareResults is incorrect
--

 Key: HBASE-4800
 URL: https://issues.apache.org/jira/browse/HBASE-4800
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4, 0.92.0, 0.94.0
Reporter: Lars Hofhansl


A coworker of mine (James Taylor) found a bug in Result.compareResults(...).
This condition:
{code}
  if (!ourKVs[i].equals(replicatedKVs[i]) &&
      !Bytes.equals(ourKVs[i].getValue(), replicatedKVs[i].getValue())) {
    throw new Exception("This result was different: "
{code}
should be
{code}
  if (!ourKVs[i].equals(replicatedKVs[i]) ||
      !Bytes.equals(ourKVs[i].getValue(), replicatedKVs[i].getValue())) {
    throw new Exception("This result was different: "
{code}

Just checked, this is wrong in all branches.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-4805) Allow better control of resource consumption in HTable

2011-11-16 Thread Lars Hofhansl (Created) (JIRA)
Allow better control of resource consumption in HTable
--

 Key: HBASE-4805
 URL: https://issues.apache.org/jira/browse/HBASE-4805
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl


From some internal discussions at Salesforce we concluded that we need better 
control over the resources (mostly threads) consumed by HTable when used in an 
app server with many client threads.

Since HTable is not thread safe, the only options are to cache HTable instances 
(in a custom thread local or using HTablePool) or to create them on demand.

I propose a simple change: Add a new constructor to HTable that takes an 
optional ExecutorService and HConnection instance. That would make HTable a 
pretty lightweight object and we would manage the ES and HC separately.

I'll upload a patch soon to get some feedback.
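
A rough sketch of the intended usage; the exact shape of the new constructor is part of the proposal and therefore an assumption:
{code}
// Sketch of the proposed usage; the constructor signature is an assumption.
Configuration conf = HBaseConfiguration.create();
ExecutorService pool = Executors.newFixedThreadPool(16);          // shared, app-wide
HConnection connection = HConnectionManager.getConnection(conf);  // shared, app-wide

// A per-request HTable becomes a lightweight object that just borrows the
// shared ExecutorService and HConnection:
HTable table = new HTable(Bytes.toBytes("mytable"), connection, pool);
try {
  Put put = new Put(Bytes.toBytes("row1"));
  put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
  table.put(put);
} finally {
  table.close();  // must not tear down the shared pool/connection
}
{code}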

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-4838) Port 2856 (TestAcidGuarantees is failing) to 0.92

2011-11-21 Thread Lars Hofhansl (Created) (JIRA)
Port 2856 (TestAcidGuarantees is failing) to 0.92
-

 Key: HBASE-4838
 URL: https://issues.apache.org/jira/browse/HBASE-4838
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0


Moving the backport into a separate issue (as suggested by JonH), because this 
is not trivial.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-4844) Coprocessor hooks for log rolling

2011-11-21 Thread Lars Hofhansl (Created) (JIRA)
Coprocessor hooks for log rolling
-

 Key: HBASE-4844
 URL: https://issues.apache.org/jira/browse/HBASE-4844
 Project: HBase
  Issue Type: New Feature
Affects Versions: 0.94.0
Reporter: Lars Hofhansl
Priority: Minor


In order to eventually do point-in-time recovery we need a way to reliably back 
up the logs. Rather than adding hard-coded changes, we can provide 
coprocessor hooks and let folks implement their own policies.
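
A hypothetical sketch of what such a policy hook could look like; the hook name and where it hangs in the coprocessor hierarchy are assumptions for illustration, not a committed API:
{code}
// Hypothetical sketch only: hook name and placement are assumptions.
public class LogArchivingObserver /* would implement a WAL-level observer */ {

  // Imagined hook: invoked after the region server rolls its WAL, handing the
  // policy the path of the log file that was just closed.
  public void postLogRoll(Path oldLogPath, Path newLogPath) throws IOException {
    // A site-specific policy could, for example, copy oldLogPath to a backup
    // location to enable point-in-time recovery.
  }
}
{code}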

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-4886) truncate fails in HBase shell

2011-11-28 Thread Lars Hofhansl (Created) (JIRA)
truncate fails in HBase shell
-

 Key: HBASE-4886
 URL: https://issues.apache.org/jira/browse/HBASE-4886
 Project: HBase
  Issue Type: Bug
  Components: shell
Affects Versions: 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor
 Fix For: 0.94.0
 Attachments: 4886.txt

Seeing this in trunk:

{noformat}
hbase(main):001:0> truncate 'table'
Truncating 'table' table (it may take a while):

ERROR: wrong number of arguments (1 for 3)

Here is some help for this command:
  Disables, drops and recreates the specified table.
{noformat}

... caused by the removal of the HTable(String) constructor.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-4945) NPE in HRegion.bulkLoadHFiles(...)

2011-12-03 Thread Lars Hofhansl (Created) (JIRA)
NPE in HRegion.bulkLoadHFiles(...)
--

 Key: HBASE-4945
 URL: https://issues.apache.org/jira/browse/HBASE-4945
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.94.0
Reporter: Lars Hofhansl
Priority: Minor


Was playing with completebulkload and ran into an NPE.
The problem is here:

{code}
Store store = getStore(familyName);
if (store == null) {
  IOException ioe = new DoNotRetryIOException(
      "No such column family " + Bytes.toStringBinary(familyName));
  ioes.add(ioe);
  failures.add(p);
}

try {
  store.assertBulkLoadHFileOk(new Path(path));
} catch (WrongRegionException wre) {
  // recoverable (file doesn't fit in region)
  failures.add(p);
} catch (IOException ioe) {
  // unrecoverable (hdfs problem)
  ioes.add(ioe);
}
{code}

This should be 
{code}
Store store = getStore(familyName);
if (store == null) {
...
} else {
  try {
store.assertBulkLoadHFileOk(new Path(path));
...
}
{code}


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-4979) Setting KEEP_DELETE_CELLS fails in shell

2011-12-07 Thread Lars Hofhansl (Created) (JIRA)
Setting KEEP_DELETE_CELLS fails in shell


 Key: HBASE-4979
 URL: https://issues.apache.org/jira/browse/HBASE-4979
 Project: HBase
  Issue Type: Sub-task
  Components: shell
Affects Versions: 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl


admin.rb uses the wrong method on HColumnDescriptor to enable keeping of deleted 
cells.
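
For reference, a minimal sketch of the Java-level call the shell should end up making (the column family name is made up; setKeepDeletedCells is assumed to be the right setter):
{code}
// Sketch: what the shell's create/alter path should invoke on the descriptor.
HColumnDescriptor family = new HColumnDescriptor(Bytes.toBytes("cf"));
family.setKeepDeletedCells(true);   // keep delete markers and deleted cells
{code}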

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-4981) add raw scan support to shell

2011-12-07 Thread Lars Hofhansl (Created) (JIRA)
add raw scan support to shell
-

 Key: HBASE-4981
 URL: https://issues.apache.org/jira/browse/HBASE-4981
 Project: HBase
  Issue Type: Sub-task
  Components: shell
Affects Versions: 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl


The parent issue adds raw scan support to include delete markers and deleted rows 
in scan results. It would be nice if that were available in the shell, to see 
exactly what exists in a table.
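
For context, the client-side equivalent of what the shell option would enable (a sketch; table setup omitted):
{code}
// Client-side raw scan (sketch): returns delete markers and deleted cells too.
Scan scan = new Scan();
scan.setRaw(true);
scan.setMaxVersions();   // typically combined with all versions
ResultScanner scanner = table.getScanner(scan);
{code}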

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-4998) Support deleted rows in CopyTable

2011-12-09 Thread Lars Hofhansl (Created) (JIRA)
Support deleted rows in CopyTable
-

 Key: HBASE-4998
 URL: https://issues.apache.org/jira/browse/HBASE-4998
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor


It turns out that with HBASE-4682 in place, it is trivial to add this to 
CopyTable as well. This would be another tool in the backup arsenal.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5058) Allow HBaseAdmin to use an existing connection

2011-12-16 Thread Lars Hofhansl (Created) (JIRA)
Allow HBaseAdmin to use an existing connection
-

 Key: HBASE-5058
 URL: https://issues.apache.org/jira/browse/HBASE-5058
 Project: HBase
  Issue Type: Sub-task
  Components: client
Affects Versions: 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor


What HBASE-4805 does for HTables, this should do for HBaseAdmin.
Along with this, the shared error handling and retrying between HBaseAdmin and 
HConnectionManager can also be improved. I'll attach a first-pass patch soon.
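
A sketch of the intended usage; the constructor taking an HConnection is the proposal itself, so its exact signature is an assumption:
{code}
// Sketch of the proposed usage; the HConnection-taking constructor is assumed.
Configuration conf = HBaseConfiguration.create();
HConnection connection = HConnectionManager.getConnection(conf);  // shared connection

HBaseAdmin admin = new HBaseAdmin(connection);   // reuse instead of creating a new one
admin.disableTable("mytable");
admin.deleteTable("mytable");
{code}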

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5059) Tests for: Support deleted rows in CopyTable

2011-12-16 Thread Lars Hofhansl (Created) (JIRA)
Tests for: Support deleted rows in CopyTable


 Key: HBASE-5059
 URL: https://issues.apache.org/jira/browse/HBASE-5059
 Project: HBase
  Issue Type: Sub-task
  Components: mapreduce
Affects Versions: 0.94.0
Reporter: Lars Hofhansl
Priority: Minor




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira