Re: Waiting for accumulo to be initialized

2013-03-27 Thread Krishmin Rai
Hi Aji,
I wrote the original question linked below (about re-initing Accumulo over an 
existing installation).  For what it's worth, I believe that my ZooKeeper data 
loss was related to the linux+java leap second bug -- not likely to be 
affecting you now (I did not go back and attempt to re-create the issue, so 
it's also possible there were other compounding issues). We have not 
encountered any ZK data-loss problems since. 

At the time, I did some basic experiments to understand the process better, and 
successfully followed (essentially) the steps Eric has described. The only real 
difficulty I had was identifying which directories corresponded to which 
tables; I ended up iterating over individual RFiles and manually identifying 
tables based on expected data. This was a somewhat painful process, but at 
least made me confident that it would be possible in production.

It's also important to note that, at least according to my understanding, this 
procedure still potentially loses data: mutations written after the last minor 
compaction will only have reached the write-ahead logs and will not be 
available in the raw RFiles you're importing from.

-Krishmin

On Mar 27, 2013, at 4:45 PM, Aji Janis wrote:

 Eric, really appreciate you jotting this down. Too late to try it out this 
 time, but I will give it a try if (hopefully not) there is a next time. 
 
 Thanks again.
 
 
 
 On Wed, Mar 27, 2013 at 4:19 PM, Eric Newton eric.new...@gmail.com wrote:
 I should write this up in the user manual.  It's not that hard, but it's 
 really not the first thing you want to tackle while learning how to use 
 accumulo.  I just opened ACCUMULO-1217 to do that.
 
 I wrote this from memory: expect errors.  Needless to say, you would only 
 want to do this when you are more comfortable with hadoop, zookeeper and 
 accumulo. 
 
 First, get zookeeper up and running, even if you have to delete all its data.  
 
 Next, attempt to determine the mapping of table names to tableIds.  You can 
 do this in the shell when your accumulo instance is healthy.  If it isn't 
 healthy, you will have to guess based on the data in the files in HDFS.
 
 So, for example, the table "trace" is probably table id 1.  You can find 
 the files for "trace" in /accumulo/tables/1.
 
 Don't worry if you get the names wrong.  You can always rename the tables 
 later. 
 
 Move the old files for accumulo out of the way and re-initialize:
 
 $ hadoop fs -mv /accumulo /accumulo-old
 $ ./bin/accumulo init
 $ ./bin/start-all.sh
 
 Recreate your tables:
 
 $ ./bin/accumulo shell -u root -p mysecret
 shell> createtable table1
 
 Learn the new table id mapping:
 shell> tables -l
 !METADATA => !0
 trace => 1
 table1 => 2
 ...
 
 Bulk import all your data back into the new table ids:
 Assuming you have determined that table1 used to be table id "a" and is now 
 "2", you do something like this:
 
 $ hadoop fs -mkdir /tmp/failed
 $ ./bin/accumulo shell -u root -p mysecret
 shell> table table1
 shell table1> importdirectory /accumulo-old/tables/a/default_tablet /tmp/failed true
 
 There are lots of directories under every table id directory.  You will need 
 to import each of them.  I suggest creating a script and passing it to the 
 shell on the command line.
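 
 For example, a rough script along these lines (untested; the old table id "a", 
 the new table name, the failures directory, and the password are all 
 placeholders) generates an importdirectory command for every directory and 
 feeds the commands to the shell over stdin:
 
 #!/bin/bash
 # Import every directory of old table id "a" into the new table1.
 OLD=/accumulo-old/tables/a
 hadoop fs -mkdir /tmp/failed           # failures dir should exist and be empty
 echo "table table1" > import-commands.txt
 hadoop fs -ls $OLD | awk '{print $NF}' | grep "^$OLD/" | while read dir; do
   echo "importdirectory $dir /tmp/failed true" >> import-commands.txt
 done
 ./bin/accumulo shell -u root -p mysecret < import-commands.txt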
 
 I know of instances in which trillions of entries were recovered and 
 available in a matter of hours.
 
 -Eric
 
 
 
 On Wed, Mar 27, 2013 at 3:39 PM, Aji Janis aji1...@gmail.com wrote:
 when you say "you can move the files aside in HDFS" ... which files are you 
 referring to? I have never set up zookeeper myself so I am not aware of all 
 the changes needed.
 
 
 
 On Wed, Mar 27, 2013 at 3:33 PM, Eric Newton eric.new...@gmail.com wrote:
 If you lose zookeeper, you can move the files aside in HDFS, recreate your 
 instance in zookeeper and bulk import all of the old files.  It's not 
 perfect: you lose table configurations, split points and user permissions, 
 but you do preserve most of the data.
 
 You can back up each of these bits of information periodically if you like.  
 Outside of the files in HDFS, the configuration information is pretty small.
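 
 For example, a rough periodic sketch (the table name, user name, credentials, 
 and output file names are placeholders; the exact shell command options may 
 differ slightly between versions):
 
 #!/bin/bash
 # Dump per-table config, split points, and user permissions to local files.
 TS=$(date +%Y%m%d)
 ASHELL="./bin/accumulo shell -u root -p mysecret"
 $ASHELL -e "config -t table1"            > table1-config-$TS.txt
 $ASHELL -e "getsplits -t table1"         > table1-splits-$TS.txt
 $ASHELL -e "users"                       > users-$TS.txt
 $ASHELL -e "userpermissions -u someuser" > someuser-perms-$TS.txt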
 
 -Eric
 
 
 
 On Wed, Mar 27, 2013 at 3:18 PM, Aji Janis aji1...@gmail.com wrote:
 Eric and Josh, thanks for all your feedback. We ended up losing all our 
 accumulo data because I had to reformat hadoop. Here is in a nutshell what I 
 did:
 
 1. Stop accumulo
 2. Stop hadoop
 3. On hadoop master and all datanodes, from dfs.data.dir (hdfs-site.xml), 
    remove everything under the data folder
 4. On hadoop master, from dfs.name.dir (hdfs-site.xml), remove everything 
    under the name folder
 5. As hadoop user, execute .../hadoop/bin/hadoop namenode -format
 6. As hadoop user, execute .../hadoop/bin/start-all.sh == should populate the 
    data/ and name/ dirs that were erased in steps 3 and 4
 7. Initialized Accumulo - as accumulo user, ../accumulo/bin/accumulo init (I 
    created a new instance)
 8. Start accumulo
 I was wondering if anyone had suggestions or thoughts on how I could have 

Re: Memory setting recommendations for Accumulo / Hadoop

2013-03-12 Thread Krishmin Rai
Have you also increased the maximum number of processes (nproc in the same 
file)? I have definitely seen this kind of error as a result of an 
insufficiently large process limit.
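
For example, entries along these lines in /etc/security/limits.conf (the 
username and values are just illustrative; the user has to log back in for 
them to take effect) raise both the open-file and process limits:

usernamehere  soft  nofile  65536
usernamehere  hard  nofile  65536
usernamehere  soft  nproc   65536
usernamehere  hard  nproc   65536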

Some more details, maybe, on these pages:

http://ww2.cs.fsu.edu/~czhang/errors.html
http://incubator.apache.org/ambari/1.2.0/installing-hadoop-using-ambari/content/ambari-chap5-3-1.html

-Krishmin

On Mar 12, 2013, at 1:52 PM, Mike Hugo wrote:

 Eventually it will be 4 nodes; this particular test was running on a single 
 node
 
 hadoop version is 1.0.4
 
 we already upped the limits in /etc/security/limits.conf to:
 
 usernamehere  hard  nofile  16384
 
 Mike
 
 
 On Tue, Mar 12, 2013 at 12:49 PM, Krishmin Rai kr...@missionfoc.us wrote:
 Hi Mike,
   This could be related to the maximum number of processes or files allowed 
 for your linux user. You might try bumping these values up (e.g. via 
 /etc/security/limits.conf).
 
 -Krishmin
 
 On Mar 12, 2013, at 1:35 PM, Mike Hugo wrote:
 
  Hello,
 
  I'm setting up accumulo on a small cluster where each node has 96GB of ram 
  and 24 cores.  Any recommendations on what memory settings to use for the 
  accumulo processes, as well as what to use for the hadoop processes (e.g. 
  datanode, etc)?
 
  I did a small test just to try some things standalone on a single node, 
  setting the accumulo processes to 2GB of ram and the HADOOP_HEAPSIZE=2000.  
  While running a map reduce job with 4 workers (each allocated 1GB of RAM), 
  the datanode runs out of memory about 25% of the way into the job and dies. 
   The job is basically building an index, iterating over data in one table 
  and applying mutations to another - nothing too fancy.
 
  Since I'm dealing with a subset of data, I set the table split threshold to 
  128M for testing purposes, there are currently about 170 tablets so we're not 
  dealing with a ton of data here. Might this low split threshold be a 
  contributing factor?
 
  Should I increase the HADOOP_HEAPSIZE even further?  Or will that just 
  delay the inevitable OOM error?
 
  The exception we are seeing is below.
 
  ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
  DatanodeRegistration(...):DataXceiveServer: Exiting due 
  to:java.lang.OutOfMemoryError: unable to create new native thread
  at java.lang.Thread.start0(Native Method)
  at java.lang.Thread.start(Unknown Source)
  at 
  org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:133)
  at java.lang.Thread.run(Unknown Source)
 
 
  Thanks for your help!
 
  Mike
 
 



Re: Number of partitions for sharded table

2012-10-30 Thread Krishmin Rai
I should clarify that I've been pre-splitting tables at each shard so that each 
tablet consists of a single row.
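
For example, with zero-padded integer shard ids, something like this creates 
one tablet per shard (the shard count, table name, and credentials are 
illustrative, and it assumes your shell's addsplits command accepts a splits 
file):

#!/bin/bash
# Generate split points 01..15 so shard rows 00..15 each get their own tablet.
N=16
seq 1 $((N-1)) | awk '{printf "%02d\n", $1}' > splits.txt
./bin/accumulo shell -u root -p secret -e "addsplits -t indexTable -sf splits.txt"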

On Oct 30, 2012, at 3:06 PM, Krishmin Rai wrote:

 Hi All, 
  We're working with an index table whose row is a shardId (an integer, like 
 the wiki-search or IndexedDoc examples). I was just wondering what the right 
 strategy is for choosing a number of partitions, particularly given a cluster 
 that could potentially grow.
 
  If I simply set the number of shards equal to the number of slave nodes, 
 additional nodes would not improve query performance (at least over the data 
 already ingested). But starting with more partitions than slave nodes would 
 result in multiple tablets per tablet server… I'm not really sure how that 
 would impact performance, particularly given that all queries against the 
 table will be batchscanners with an infinite range.
 
  Just wondering how others have addressed this problem, and if there are any 
 performance rules of thumb regarding the ratio of tablets to tablet servers.
 
 Thanks!
 Krishmin



Re: Number of partitions for sharded table

2012-10-30 Thread Krishmin Rai
Hi Dave,
  Our example is simpler than the wiki-search one, and I had forgotten exactly 
how wiki-search is structured: We're using a simple, single-table layout, 
everything sharded. We also have some other stuff in the column family, but 
simplified, it's just:

row: ShardID
colFam: Term
colQual: Document

And then searches will use an iterator extending IntersectingIterator to find 
results matching specified terms etc.
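
Concretely, the layout looks something like this in the shell (the shard id, 
terms, document ids, and throwaway values are all made up):

shell indexTable> insert 03 apple doc42 x
shell indexTable> insert 03 banana doc42 x
shell indexTable> insert 03 apple doc97 x
shell indexTable> scan
03 apple:doc42 []    x
03 apple:doc97 []    x
03 banana:doc42 []    x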

Thanks,
Krishmin

On Oct 30, 2012, at 3:43 PM, dlmar...@comcast.net wrote:

 Krishmin,
 
   In the wikisearch example there is a non-sharded index table and a sharded 
 document table. The index table is used to reduce the number of tablets that 
 need to be searched for a given set of terms. Is your setup similar? I'm a 
 little confused since you mention using a sharded index table and that all 
 queries will have an infinite range.
 
 Dave Marion
 
 From: Krishmin Rai kr...@missionfoc.us
 To: user@accumulo.apache.org
 Sent: Tuesday, October 30, 2012 3:28:15 PM
 Subject: Re: Number of partitions for sharded table
 
 I should clarify that I've been pre-splitting tables at each shard so that 
 each tablet consists of a single row.
 
 On Oct 30, 2012, at 3:06 PM, Krishmin Rai wrote:
 
  Hi All, 
   We're working with an index table whose row is a shardId (an integer, like 
  the wiki-search or IndexedDoc examples). I was just wondering what the 
  right strategy is for choosing a number of partitions, particularly given a 
  cluster that could potentially grow.
  
   If I simply set the number of shards equal to the number of slave nodes, 
  additional nodes would not improve query performance (at least over the 
  data already ingested). But starting with more partitions than slave nodes 
  would result in multiple tablets per tablet server… I'm not really sure how 
  that would impact performance, particularly given that all queries against 
  the table will be batchscanners with an infinite range.
  
   Just wondering how others have addressed this problem, and if there are 
  any performance rules of thumb regarding the ratio of tablets to tablet 
  servers.
  
  Thanks!
  Krishmin
 



Re: Number of partitions for sharded table

2012-10-30 Thread Krishmin Rai
Thanks, Adam… that's exactly what I was looking for, and gives me a lot to 
think about.

-Krishmin

On Oct 30, 2012, at 4:08 PM, Adam Fuchs wrote:

 Krishmin,
 
 There are a few extremes to keep in mind when choosing a manual partitioning 
 strategy:
 1. Parallelism and balance at ingest time. You need to find a happy medium 
 between too few partitions (not enough parallelism) and too many partitions 
 (tablet server resource contention and inefficient indexes). Probably at 
 least one partition per tablet server being actively written to is good, and 
 you'll want to pre-split so they can be distributed evenly. Ten partitions 
 per tablet server is probably not too many -- I wouldn't expect to see 
 contention at that point.
 2. Parallelism and balance at query time. At query time, you'll be selecting 
 a subset of all of the partitions that you've ever ingested into. This subset 
 should be bounded similarly to the concern addressed in #1, but the bounds 
 could be looser depending on the types of queries you want to run. Lower 
 latency queries would tend to favor only a few partitions per node.
 3. Growth over time in partition size. Over time, you want partitions to be 
 bounded to less than about 10-100GB. This has to do with limiting the maximum 
 amount of time that a major compaction will take, and impacts availability 
 and performance in the extreme cases. At the same time, you want partitions 
 to be as large as possible so that their indexes are more efficient.
 
 One strategy to optimize partition size would be to keep using each partition 
 until it is full, then make another partition. Another would be to allocate 
 a certain number of partitions per day, and then only put data in those 
 partitions during that day. These strategies are also elastic, and can be 
 tweaked as the cluster grows.
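 
 As a sketch of the per-day variant (the date format, shard count, table name, 
 and credentials are all placeholders), that day's partitions could be 
 pre-created ahead of ingest:
 
 #!/bin/bash
 # Add one split per partition for today, e.g. 20121030_00 .. 20121030_15.
 DAY=$(date +%Y%m%d)
 N=16
 seq 0 $((N-1)) | awk -v d=$DAY '{printf "%s_%02d\n", d, $1}' > splits-$DAY.txt
 ./bin/accumulo shell -u root -p secret -e "addsplits -t shardTable -sf splits-$DAY.txt"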
 
 In all of these cases, you will need a good load balancing strategy. The 
 default strategy of evening up the number of partitions per tablet server is 
 probably not sufficient, so you may need to write your own tablet load 
 balancer that is aware of your partitioning strategy.
 
 Cheers,
 Adam
 
 
 
 On Tue, Oct 30, 2012 at 3:06 PM, Krishmin Rai kr...@missionfoc.us wrote:
 Hi All,
   We're working with an index table whose row is a shardId (an integer, like 
 the wiki-search or IndexedDoc examples). I was just wondering what the right 
 strategy is for choosing a number of partitions, particularly given a cluster 
 that could potentially grow.
 
   If I simply set the number of shards equal to the number of slave nodes, 
 additional nodes would not improve query performance (at least over the data 
 already ingested). But starting with more partitions than slave nodes would 
 result in multiple tablets per tablet server… I'm not really sure how that 
 would impact performance, particularly given that all queries against the 
 table will be batchscanners with an infinite range.
 
   Just wondering how others have addressed this problem, and if there are any 
 performance rules of thumb regarding the ratio of tablets to tablet servers.
 
 Thanks!
 Krishmin
 



Re-init Accumulo over existing installation

2012-07-05 Thread Krishmin Rai
Hi All,
  We've recently encountered a strange situation on a small test cluster: after 
an awkward crash, our ZooKeeper data was erased and we no longer have the 
[accumulo] znode. The HDFS accumulo directory is intact, so all the RFiles etc. 
are still there, but it's not clear how best to bring Accumulo back up to its 
previous state. Obviously just starting Accumulo as-is complains about the 
missing znode ("Waiting for accumulo to be initialized"), whereas 
re-initializing is not possible over existing HDFS directories ("It appears 
this location was previously initialized, exiting").

  A couple of questions about recovery strategies:

1) Is there any way to re-create the znode for a previous instance-id? My 
understanding is that ZK is mostly used to store ephemeral data (such as which 
tserver is currently responsible for which tablets) and things like users 
(which we could re-create), so perhaps this is plausible?

2) I imagine that I could init Accumulo with a new instance.dfs.dir, then 
import the RFiles from the old installation back in. I see Patrick just asked a 
related question, so, with the data integrity caveats, I would essentially be 
following the last of the steps in ACCUMULO-456 (roughly sketched after 
question 3 below).

3) This is a vague question, but have any of you had experience with the 
[accumulo] znode being entirely deleted? Aside from stopping/starting ZK 
(3.3.5) and Accumulo 1.4.0 (possibly with a force-quit), I'm not sure what we 
could have done to actually delete it.
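
To make question (2) concrete, I imagine the procedure would look roughly like 
this (assuming instance.dfs.dir in accumulo-site.xml now points at a fresh 
location; the table name and old table id are placeholders):

$ ./bin/accumulo init
$ ./bin/start-all.sh
$ hadoop fs -mkdir /tmp/failed
$ ./bin/accumulo shell -u root -p secret
shell> createtable mytable
shell mytable> importdirectory /accumulo/tables/<old-id>/default_tablet /tmp/failed true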

This is just a test instance, and the data could easily be recreated, but I 
want to take this opportunity to learn a little more about Accumulo plumbing 
and maintenance.

Thanks,
Krishmin