[jira] [Commented] (HBASE-5754) data lost with gora continuous ingest test (goraci)

2012-04-12 Thread Keith Turner (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252491#comment-13252491
 ] 

Keith Turner commented on HBASE-5754:
-

The counts for the 1B run seem odd to me, but maybe that's just an artifact of 
how many map tasks you ran for the generator and how much data each task 
generated.  If a map task does not generate a multiple of 25,000,000 nodes, 
it will leave some unreferenced.  It generates a circular linked list 
every 25M.

{noformat}
12/04/12 03:54:11 INFO mapred.JobClient: REFERENCED=564459547
12/04/12 03:54:11 INFO mapred.JobClient: UNREFERENCED=104000
{noformat}

If you were to run 10 map tasks, each generating 100M, this should generate 
1B nodes with all of them referenced.  Minimizing the number of unreferenced 
nodes is ideal, because the test cannot detect the loss of unreferenced nodes.  
I should probably add this info to the README.
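The arithmetic behind those counters can be sketched as follows. This is a simplified model of the generator's bookkeeping, not goraci's actual code; the class and method names are made up for illustration.

```java
// Hypothetical model: each Generator map task links nodes into a
// circular list every LIST_SIZE nodes; a final partial list never
// gets closed into a circle and is left unreferenced.
public class IngestCounts {
    static final long LIST_SIZE = 25_000_000L;

    // Referenced nodes produced by one map task that writes n nodes.
    static long referenced(long n) {
        return (n / LIST_SIZE) * LIST_SIZE;
    }

    // Leftover nodes that never get closed into a circle.
    static long unreferenced(long n) {
        return n % LIST_SIZE;
    }

    public static void main(String[] args) {
        // 10 tasks x 100M each: every task's total is a multiple of
        // 25M, so nothing is left unreferenced.
        long perTask = 100_000_000L;
        System.out.println(10 * referenced(perTask));   // 1000000000
        System.out.println(10 * unreferenced(perTask)); // 0
    }
}
```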


 data lost with gora continuous ingest test (goraci)
 ---

 Key: HBASE-5754
 URL: https://issues.apache.org/jira/browse/HBASE-5754
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
 Environment: 10 node test cluster
Reporter: Eric Newton
Assignee: stack

 Keith Turner re-wrote the accumulo continuous ingest test using gora, which 
 has both hbase and accumulo back-ends.
 I put a billion entries into HBase, and ran the Verify map/reduce job.  The 
 verification failed because about 21K entries were missing.  The goraci 
 [README|https://github.com/keith-turner/goraci] explains the test, and how it 
 detects missing data.
 I re-ran the test with 100 million entries, and it verified successfully.  
 Both of the times I tested using a billion entries, the verification failed.
 If I run the verification step twice, the results are consistent, so the 
 problem is probably not in the verify step.
 Here are the versions of the various packages:
 ||package||version||
 |hadoop|0.20.205.0|
 |hbase|0.92.1|
 |gora|http://svn.apache.org/repos/asf/gora/trunk r1311277|
 |goraci|https://github.com/ericnewton/goraci  tagged 2012-04-08|
 The change I made to goraci was to configure it for hbase and to allow it to 
 build properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5754) data lost with gora continuous ingest test (goraci)

2012-04-12 Thread Keith Turner (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252502#comment-13252502
 ] 

Keith Turner commented on HBASE-5754:
-

When Eric ran 10 generators, each adding 100M, against HBase and no data was 
lost, he saw 1B referenced, 0 unreferenced, and 0 undefined.





[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2012-04-10 Thread Keith Turner (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250769#comment-13250769
 ] 

Keith Turner commented on HBASE-5487:
-

Accumulo 1.3 could not survive running our random walk test w/ the agitator (a 
perl script that kills accumulo processes; it's not as devious as Todd's 
gremlins).

  Random walk + Agitation + Accumulo 1.3 == foobar

Attempting the above would leave Accumulo in an inconsistent state (like a 
corrupted metadata table), or test clients would die with unexpected exceptions.

My point is that while developing FATE it was nice to have random walk + 
agitation to really beat up the FATE framework and the FATE table operations.  
We also wrote some new random walk tests for 1.4 that were even meaner.


 Generic framework for Master-coordinated tasks
 --

 Key: HBASE-5487
 URL: https://issues.apache.org/jira/browse/HBASE-5487
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver, zookeeper
Affects Versions: 0.94.0
Reporter: Mubarak Seyed
  Labels: noob

 Need a framework to execute master-coordinated tasks in a fault-tolerant 
 manner. 
 Master-coordinated tasks such as online-scheme change and delete-range 
 (deleting region(s) based on start/end key) can make use of this framework.
 The advantages of framework are
 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for 
 master-coordinated tasks
 2. Ability to abstract the common functions across Master - ZK and RS - ZK
 3. Easy to plugin new master-coordinated tasks without adding code to core 
 components





[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2012-04-10 Thread Keith Turner (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250770#comment-13250770
 ] 

Keith Turner commented on HBASE-5487:
-

To add context to my above comment: Accumulo 1.3 does not have FATE; it was 
introduced in 1.4.





[jira] [Commented] (HBASE-5754) data lost with gora continuous ingest test (goraci)

2012-04-10 Thread Keith Turner (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251017#comment-13251017
 ] 

Keith Turner commented on HBASE-5754:
-

You may run into GORA-116, a bug in the gora-hbase store.





[jira] [Commented] (HBASE-5479) Postpone CompactionSelection to compaction execution time

2012-04-04 Thread Keith Turner (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246443#comment-13246443
 ] 

Keith Turner commented on HBASE-5479:
-

There is an exception to what I said above.  User-requested compactions are 
still done depth first, with an optimization.  If a user requests that a tablet 
with 30 files be compacted, a compaction thread is allocated to compact that 
tablet down to one file.  It still only does up to 10 files at a time, though:

 * compact 10 smallest, results in 21 files
 * compact 10 smallest, results in 12 files
 * compact 3 smallest, results in 10 files -- this is the optimization to 
avoid redundant work
 * compact 10 smallest, results in 1 file
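The file-count arithmetic above can be sketched as follows. This is a hypothetical simulation, not Accumulo's actual scheduler; `MAX_FILES_PER_PASS` and the pass logic are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of a user-requested compaction that merges
// at most 10 files per pass, smallest files first, with the shortcut
// pass described above.
public class CompactDown {
    static final int MAX_FILES_PER_PASS = 10;

    // File counts after each pass, starting from `files`, until 1 remains.
    static List<Integer> passes(int files) {
        List<Integer> counts = new ArrayList<>();
        while (files > 1) {
            int merge = Math.min(files, MAX_FILES_PER_PASS);
            // Shortcut: when a full pass would leave more than 1 but
            // fewer than 10 files, merge just enough of the smallest
            // files to land exactly on 10, so one final pass finishes
            // the job without re-compacting data redundantly.
            if (files > MAX_FILES_PER_PASS
                    && files - merge + 1 < MAX_FILES_PER_PASS) {
                merge = files - MAX_FILES_PER_PASS + 1;
            }
            files = files - merge + 1; // merged files become one new file
            counts.add(files);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(passes(30)); // [21, 12, 10, 1]
    }
}
```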

 Postpone CompactionSelection to compaction execution time
 -

 Key: HBASE-5479
 URL: https://issues.apache.org/jira/browse/HBASE-5479
 Project: HBase
  Issue Type: New Feature
  Components: io, performance, regionserver
Reporter: Matt Corgan

 It can be commonplace for regionservers to develop long compaction queues, 
 meaning a CompactionRequest may execute hours after it was created.  The 
 CompactionRequest holds a CompactionSelection that was selected at request 
 time but may no longer be the optimal selection.  The CompactionSelection 
 should be created at compaction execution time rather than compaction request 
 time.
 The current mechanism breaks down during high volume insertion.  The 
 inefficiency is clearest when the inserts are finished.  Inserting for 5 
 hours may build up 50 storefiles and a 40 element compaction queue.  When 
 finished inserting, you would prefer that the next compaction merges all 50 
 files (or some large subset), but the current system will churn through each 
 of the 40 compaction requests, the first of which may be hours old.  This 
 ends up re-compacting the same data many times.  
 The current system is especially inefficient when dealing with time series 
 data where the data in the storefiles has minimal overlap.  With time series 
 data, there is even less benefit to intermediate merges because most 
 storefiles can be eliminated based on their key range during a read, even 
 without bloomfilters.  The only goal should be to reduce file count, not to 
 minimize number of files merged for each read.
 There are other aspects to the current queuing mechanism that would need to 
 be looked at.  You would want to avoid having the same Store in the queue 
 multiple times.  And you would want the completion of one compaction to 
 possibly queue another compaction request for the store.
 An alternative architecture to the current style of queues would be to have 
 each Store (all open in memory) keep a compactionPriority score up to date 
 after events like flushes, compactions, schema changes, etc.  Then you create 
 a CompactionPriorityComparator that implements Comparator<Store> and stick all 
 the Stores into a PriorityQueue (synchronized remove/add from the queue when 
 the value changes).  The async compaction threads would keep pulling off the 
 head of that queue as long as the head has compactionPriority > X.





[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase

2012-04-03 Thread Keith Turner (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245528#comment-13245528
 ] 

Keith Turner commented on HBASE-4821:
-

I am an Accumulo developer; there is some cruft in our test dir.  The two most 
successful cluster tests we have are continuous ingest and random walk.  We have 
found lots of bugs with these tests.  I wrote a Gora version of continuous ingest 
that should run against HBase.  The README on github has a nice description.

  https://github.com/keith-turner/goraci/

The accumulo version of continuous ingest can be found here.

  http://svn.apache.org/repos/asf/accumulo/tags/1.4.0/test/system/continuous/

This dir contains an old set of OpenOffice slides that also give an overview 
of continuous ingest.  At the end of the slides is the beginning of the idea of 
the random walk test.  I am not sure if we have a nice description of random 
walk anywhere.  It is a fairly simple test framework.  You write test nodes in 
Java and link the nodes together in a graph using XML.  You start a test client 
on each node in a cluster.  The test client just does a random walk of the test 
graph.  We have found a ton of bugs in 1.3 and 1.4 using random walk.

Actually, the Accumulo features page may be the only place we give an overview 
of random walk.  I noticed that our random walk readme only tells you how to 
run it, not what it is.  Below is a link to the random walk test, but like I 
said, it's not very informative.

  http://svn.apache.org/repos/asf/accumulo/tags/1.4.0/test/system/randomwalk/

The actual Java code is at the link below.  The framework and the test node 
code are all here.

  
http://svn.apache.org/repos/asf/accumulo/tags/1.4.0/src/server/src/main/java/org/apache/accumulo/server/test/randomwalk/

The short description of random walk I mentioned is here.

  http://accumulo.apache.org/notable_features.html#testing

If anyone is interested in generalizing random walk so that HBase could use it 
too, let me know.

One last thing.  We tested Accumulo for over a month on a 10 node cluster using 
Continuous ingest, Random Walk, and the Agitator.  Below are some of the bugs 
we found during that time period.

[Bugs found in 1.4 
testing|https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=labels+%3D+14_qa_bug]

 A fully automated comprehensive distributed integration test for HBase
 --

 Key: HBASE-4821
 URL: https://issues.apache.org/jira/browse/HBASE-4821
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Critical

 To properly verify that a particular version of HBase is good for production 
 deployment we need a better way to do real cluster testing after incremental 
 changes. Running unit tests is good, but we also need to deploy HBase to a 
 cluster, run integration tests, load tests, Thrift server tests, kill some 
 region servers, kill the master, and produce a report. All of this needs to 
 happen in 20-30 minutes with minimal manual intervention. I think this way we 
 can combine agile development with high stability of the codebase. I am 
 envisioning a high-level framework written in a scripting language (e.g. 
 Python) that would abstract external operations such as deploy to test 
 cluster, kill a particular server, run load test A, run load test B 
 (we already have a few kinds of load tests implemented in Java, and we could 
 write a Thrift load test in Python). This tool should also produce 
 intermediate output, allowing to catch problems early and restart the test.
 No implementation has yet been done. Any ideas or suggestions are welcome.





[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2012-04-03 Thread Keith Turner (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245545#comment-13245545
 ] 

Keith Turner commented on HBASE-5487:
-

I am an Accumulo developer.  CompactRange is our operation to force a range of 
tablets (regions) to major compact all of their files into one file.  The 
TableRangeOp will merge a range of tablets into one tablet.  TableRangeOp can 
also delete a range of rows from a table efficiently.  It inserts split points 
at the rows you want to delete, drops the tablets, and then merges what's left.





[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase

2012-04-03 Thread Keith Turner (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245557#comment-13245557
 ] 

Keith Turner commented on HBASE-4821:
-

Eric Newton has been experimenting with running goraci against HBase.  One 
issue he ran into was that gora-hbase enables auto flush on every HTable 
connection, which really slowed down ingest.  He modified the gora code locally 
so it would not do this, and posted a question on the gora user list asking 
why it behaves this way.  The Gora API has a flush call.
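A rough model of why auto flush hurts ingest: every mutation becomes its own round trip to the server, while a write buffer amortizes many mutations per flush. The arithmetic below is hypothetical illustration, not gora or HBase client code.

```java
// Hypothetical cost model: count client/server round trips for N
// mutations, with and without per-put auto flush.
public class AutoFlushCost {
    static long roundTrips(long mutations, boolean autoFlush, long bufferSize) {
        if (autoFlush) {
            return mutations; // one RPC per put
        }
        // One RPC per full write buffer (rounding up for the last flush).
        return (mutations + bufferSize - 1) / bufferSize;
    }

    public static void main(String[] args) {
        long n = 1_000_000L;
        System.out.println(roundTrips(n, true, 0));     // 1000000
        System.out.println(roundTrips(n, false, 1000)); // 1000
    }
}
```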





[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2012-04-03 Thread Keith Turner (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245596#comment-13245596
 ] 

Keith Turner commented on HBASE-5487:
-

The description of FATE given in this ticket is pretty good.  The following 
resources may provide a little more info.

http://people.apache.org/~kturner/accumulo14_15.pdf

http://mail-archives.apache.org/mod_mbox/incubator-accumulo-dev/201202.mbox/%3CCAGUtCHpcHTDue-C_2RyDkZm0diW=zojd7-bzcgszqdtidzn...@mail.gmail.com%3E





[jira] [Commented] (HBASE-643) Rename tables

2012-04-03 Thread Keith Turner (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245620#comment-13245620
 ] 

Keith Turner commented on HBASE-643:


Accumulo supports this feature by using table ids.  Table ids are generated 
using zookeeper and are never reused (base-36 numbers are used to keep them 
short and readable).  A mapping from table id to table name is stored in 
zookeeper.  To rename a table, lock the table and change the mapping in 
zookeeper.

Accumulo used to not use table ids; it stored the table name in meta and hdfs.  
Now it uses the table id in hdfs and meta.  We were discussing renaming tables, 
and it seemed so complicated.  Then someone thought of this table id solution; 
it was so elegant and made the problem trivial.

Although table ids were implemented to support table renaming, they had the 
nice side effect of making hdfs and meta entries much shorter.
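The scheme can be sketched with a toy in-memory registry. This is not Accumulo's implementation: the real mapping lives in zookeeper and id generation is coordinated there; the class and method names below are made up.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: with immutable table ids, rename is just an
// update to the id -> name mapping; nothing stored under the id
// (files, metadata entries) has to move.
public class TableIdRegistry {
    private final Map<String, String> idToName = new ConcurrentHashMap<>();
    private long nextId = 0;

    // Ids are base-36 numbers, never reused, to keep them short and readable.
    synchronized String create(String name) {
        String id = Long.toString(nextId++, 36);
        idToName.put(id, name);
        return id;
    }

    // Rename only touches the mapping; the id stays stable.
    void rename(String id, String newName) {
        idToName.put(id, newName);
    }

    String nameOf(String id) {
        return idToName.get(id);
    }
}
```

Because files and metadata are keyed by the stable id, the rename is a single small mutation rather than a rewrite of every reference to the table name.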

 Rename tables
 -

 Key: HBASE-643
 URL: https://issues.apache.org/jira/browse/HBASE-643
 Project: HBase
  Issue Type: New Feature
Reporter: Michael Bieniosek
 Attachments: copy_table.rb, rename_table.rb


 It would be nice to be able to rename tables, if this is possible.  Some of 
 our internal users are doing things like: upload table mytable -> realize 
 they screwed up -> upload table mytable_2 -> decide mytable_2 looks better -> 
 have to go on using mytable_2 instead of the originally desired table name.





[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase

2012-04-03 Thread Keith Turner (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245705#comment-13245705
 ] 

Keith Turner commented on HBASE-4821:
-

I noticed an earlier comment about Python code in the Accumulo test dir.  This 
is code in test/auto, and we call these functional tests.  This code is probably 
similar to some HBase unit tests.  It supports tests that run against a live 
instance of Accumulo.  The test framework starts an instance of Accumulo, runs 
a Python or Java test against that instance, and then shuts the instance down.  
Running all of the functional tests takes 1 to 2 hours.

This test framework was written before random walk, and it ensures basic 
functionality works.  For example, there's a test to verify that adding split 
points to a table continues to work.  Since we have implemented random walk, I 
have found myself writing a lot more random walk tests and fewer functional 
tests.  The reason for this is that a functional test usually tests a feature 
when the system is in one state, whereas random walk tests the same feature 
with the system in many different states.  For example, a random walk test that 
adds split points to a table will try to do that when the table and system are 
in many different states.  It may try to add the split when a tablet/region is 
migrating, currently splitting, minor compacting, major compacting, offline, 
etc.  So the likelihood of finding a bug with addSplits() in random walk is 
much greater than in the functional test.  The functional test will detect if 
the feature is completely broken; random walk can detect if the feature is 
broken under certain circumstances.
 






[jira] [Commented] (HBASE-5090) Allow the user to specify inclusiveness of start/end rows on Scan.

2012-01-03 Thread Keith Turner (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178930#comment-13178930
 ] 

Keith Turner commented on HBASE-5090:
-

One way to achieve this with the current API is to append a binary zero to the 
row; the result is the next possible row in sorted order.  Doing this makes the 
start row exclusive or the end row inclusive.
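A sketch of the trick: appending a 0x00 byte yields the immediate successor of a row key in byte order. The `successor` helper below is a hypothetical name for illustration.

```java
import java.util.Arrays;

// Sketch: the immediate successor of a row key in the scanner's sort
// order is the same bytes with a single 0x00 appended -- no shorter
// or equal key can sort between them.
public class RowBounds {
    static byte[] successor(byte[] row) {
        // Arrays.copyOf zero-fills the extra byte, which is exactly
        // the trailing 0x00 we want.
        return Arrays.copyOf(row, row.length + 1);
    }
}
```

With the existing Scan API, passing `successor(start)` as the start row makes the start bound exclusive, and passing `successor(stop)` as the stop row makes the stop bound inclusive.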

 Allow the user to specify inclusiveness of start/end rows on Scan.
 --

 Key: HBASE-5090
 URL: https://issues.apache.org/jira/browse/HBASE-5090
 Project: HBase
  Issue Type: Improvement
Reporter: Ioannis Canellos

 Currently Scans handle start/end rows in the following manner:
 startRow - row to start scanner at or after (inclusive)
 stopRow - row to stop scanner before (exclusive)
 It would be great if the user could specify the 
 inclusiveness/exclusiveness.
 For example, these two additional methods would be really useful:
 public Scan setStartRow(byte[] startRow, boolean inclusive);
 public Scan setEndRow(byte[] endRow, boolean inclusive);
