Re: Waiting for accumulo to be initialized
Hi Aji,

I wrote the original question linked below (about re-initing Accumulo over an existing installation). For what it's worth, I believe that my ZooKeeper data loss was related to the linux+java leap second bug -- not likely to be affecting you now (I did not go back and attempt to re-create the issue, so it's also possible there were other compounding issues). We have not encountered any ZK data-loss problems since.

At the time, I did some basic experiments to understand the process better, and successfully followed (essentially) the steps Eric has described. The only real difficulty I had was identifying which directories corresponded to which tables; I ended up iterating over individual RFiles and manually identifying tables based on expected data. This was a somewhat painful process, but it at least made me confident that recovery would be possible in production.

It's also important to note that, at least according to my understanding, this procedure can still lose data: mutations written after the last minor compaction will only have reached the write-ahead logs and will not be available in the raw RFiles you're importing from.

-Krishmin

On Mar 27, 2013, at 4:45 PM, Aji Janis wrote:

Eric, really appreciate you jotting this down. Too late to try it out this time, but I will give this a try if (hopefully not) there is a next time. Thanks again.

On Wed, Mar 27, 2013 at 4:19 PM, Eric Newton eric.new...@gmail.com wrote:

I should write this up in the user manual. It's not that hard, but it's really not the first thing you want to tackle while learning how to use Accumulo. I just opened ACCUMULO-1217 to do that. I wrote this from memory: expect errors. Needless to say, you would only want to do this when you are comfortable with Hadoop, ZooKeeper and Accumulo.

First, get ZooKeeper up and running, even if you have to delete all its data.

Next, attempt to determine the mapping of table names to tableIds.
You can do this in the shell when your Accumulo instance is healthy. If it isn't healthy, you will have to guess based on the data in the files in HDFS. So, for example, the table "trace" is probably table id 1, and you can find the files for trace in /accumulo/tables/1. Don't worry if you get the names wrong; you can always rename the tables later.

Move the old files for Accumulo out of the way and re-initialize:

  $ hadoop fs -mv /accumulo /accumulo-old
  $ ./bin/accumulo init
  $ ./bin/start-all.sh

Recreate your tables:

  $ ./bin/accumulo shell -u root -p mysecret
  shell> createtable table1

Learn the new table id mapping:

  shell> tables -l
  !METADATA => !0
  trace     => 1
  table1    => 2
  ...

Bulk import all your data back into the new table ids. Assuming you have determined that table1 used to be table id "a" and is now 2, you do something like this:

  $ hadoop fs -mkdir /tmp/failed
  $ ./bin/accumulo shell -u root -p mysecret
  shell> table table1
  table1> importdirectory /accumulo-old/tables/a/default_tablet /tmp/failed true

There are lots of directories under every table id directory. You will need to import each of them. I suggest creating a script and passing it to the shell on the command line. I know of instances in which trillions of entries were recovered and available in a matter of hours.

-Eric

On Wed, Mar 27, 2013 at 3:39 PM, Aji Janis aji1...@gmail.com wrote:

When you say you can move the files aside in HDFS, which files are you referring to? I have never set up zookeeper myself, so I am not aware of all the changes needed.

On Wed, Mar 27, 2013 at 3:33 PM, Eric Newton eric.new...@gmail.com wrote:

If you lose ZooKeeper, you can move the files aside in HDFS, recreate your instance in zookeeper, and bulk import all of the old files. It's not perfect: you lose table configurations, split points and user permissions, but you do preserve most of the data. You can back up each of these bits of information periodically if you like.
Outside of the files in HDFS, the configuration information is pretty small.

-Eric

On Wed, Mar 27, 2013 at 3:18 PM, Aji Janis aji1...@gmail.com wrote:

Eric and Josh, thanks for all your feedback. We ended up losing all our Accumulo data because I had to reformat Hadoop. Here, in a nutshell, is what I did:

1. Stop accumulo
2. Stop hadoop
3. On the hadoop master and all datanodes, remove everything under the data folder given by dfs.data.dir (hdfs-site.xml)
4. On the hadoop master, remove everything under the name folder given by dfs.name.dir (hdfs-site.xml)
5. As the hadoop user, execute .../hadoop/bin/hadoop namenode -format
6. As the hadoop user, execute .../hadoop/bin/start-all.sh -- this should repopulate the data/ and name/ dirs that were erased in steps 3 and 4
7. Initialize Accumulo: as the accumulo user, .../accumulo/bin/accumulo init (I created a new instance)
8. Start accumulo

I was wondering if anyone had suggestions or thoughts on how I could have
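Eric's per-directory bulk-import step above lends itself to the scripting he suggests. Below is a minimal, hypothetical Python sketch that only generates the Accumulo shell commands, one per tablet directory; the old table id "a", the new table name table1, and the directory names follow the examples in the thread, and in practice you would list the directories with `hadoop fs -ls` and pipe the output into `accumulo shell`:

```python
# Sketch: emit "importdirectory" shell commands for every tablet directory
# under an old table id. Paths and the table-id mapping are illustrative,
# taken from the examples in this thread -- adjust for your instance.

def import_commands(old_table_id, new_table_name, tablet_dirs,
                    failure_dir="/tmp/failed"):
    """Yield one Accumulo shell command per tablet directory."""
    yield f"table {new_table_name}"
    for d in tablet_dirs:
        src = f"/accumulo-old/tables/{old_table_id}/{d}"
        yield f"importdirectory {src} {failure_dir} true"

# Example: table1 used to be table id "a"; directory names as they might
# appear in `hadoop fs -ls /accumulo-old/tables/a`.
cmds = list(import_commands("a", "table1",
                            ["default_tablet", "t-0001", "t-0002"]))
for c in cmds:
    print(c)
```

Writing the commands to a file and running `./bin/accumulo shell -u root -p mysecret -f commands.txt` (or passing them on stdin) would then perform the imports in one pass.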
Re: Memory setting recommendations for Accumulo / Hadoop
Have you also increased the maximum number of processes (nproc in the same file)? I have definitely seen this kind of error as a result of an insufficiently large process limit. Some more details, maybe, on these pages:

http://ww2.cs.fsu.edu/~czhang/errors.html
http://incubator.apache.org/ambari/1.2.0/installing-hadoop-using-ambari/content/ambari-chap5-3-1.html

-Krishmin

On Mar 12, 2013, at 1:52 PM, Mike Hugo wrote:

Eventually it will be 4 nodes; this particular test was running on a single node. The hadoop version is 1.0.4. We already upped the limits in /etc/security/limits.conf to:

  usernamehere hard nofile 16384

Mike

On Tue, Mar 12, 2013 at 12:49 PM, Krishmin Rai kr...@missionfoc.us wrote:

Hi Mike, this could be related to the maximum number of processes or files allowed for your linux user. You might try bumping these values up (e.g. via /etc/security/limits.conf).

-Krishmin

On Mar 12, 2013, at 1:35 PM, Mike Hugo wrote:

Hello,

I'm setting up Accumulo on a small cluster where each node has 96GB of RAM and 24 cores. Any recommendations on what memory settings to use for the Accumulo processes, as well as what to use for the Hadoop processes (e.g. datanode, etc.)?

I did a small test just to try some things standalone on a single node, setting the Accumulo processes to 2GB of RAM and HADOOP_HEAPSIZE=2000. While running a map reduce job with 4 workers (each allocated 1GB of RAM), the datanode runs out of memory about 25% of the way into the job and dies. The job is basically building an index, iterating over data in one table and applying mutations to another -- nothing too fancy.

Since I'm dealing with a subset of data, I set the table split threshold to 128M for testing purposes; there are currently about 170 tablets, so we're not dealing with a ton of data here. Might this low split threshold be a contributing factor? Should I increase HADOOP_HEAPSIZE even further? Or will that just delay the inevitable OOM error? The exception we are seeing is below.
ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(...):DataXceiveServer: Exiting due to: java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Unknown Source)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:133)
        at java.lang.Thread.run(Unknown Source)

Thanks for your help!

Mike
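Raising both limits Krishmin mentions might look like the following in /etc/security/limits.conf. The nofile value is the one Mike reported; the nproc figure is only an illustrative assumption. Note that "unable to create new native thread" usually points at the process/thread limit rather than the file limit, since Java threads count against nproc (on some RHEL-family systems a low per-user default in /etc/security/limits.d/ can override this file):

```
usernamehere  soft  nofile  16384
usernamehere  hard  nofile  16384
usernamehere  soft  nproc   32768
usernamehere  hard  nproc   32768
```

A fresh login is needed for the new limits to take effect; `ulimit -u` and `ulimit -n` confirm what the daemon's user actually gets.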
Re: Number of partitions for sharded table
I should clarify that I've been pre-splitting tables at each shard so that each tablet consists of a single row.

On Oct 30, 2012, at 3:06 PM, Krishmin Rai wrote:

Hi All,

We're working with an index table whose row is a shardId (an integer, like the wiki-search or IndexedDoc examples). I was just wondering what the right strategy is for choosing a number of partitions, particularly given a cluster that could potentially grow. If I simply set the number of shards equal to the number of slave nodes, additional nodes would not improve query performance (at least over the data already ingested). But starting with more partitions than slave nodes would result in multiple tablets per tablet server… I'm not really sure how that would impact performance, particularly given that all queries against the table will be batchscanners with an infinite range.

Just wondering how others have addressed this problem, and if there are any performance rules of thumb regarding the ratio of tablets to tablet servers.

Thanks!
Krishmin
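The core tension in the question above (shards fixed at ingest time vs. a growing cluster) can be seen in a small sketch of hash-based shard assignment. The shard count and hashing scheme here are illustrative assumptions, not from the thread:

```python
# Sketch: assigning documents to shards by hash, wikisearch-style.
# A zero-padded shard id as the row makes pre-splitting (one split per
# shard, one row per tablet) straightforward.
import hashlib

NUM_SHARDS = 16  # e.g. a few shards per tablet server

def shard_row(doc_id: str) -> str:
    # md5 rather than Python's hash() so the assignment is stable
    # across processes and JVM/Python versions.
    h = int(hashlib.md5(doc_id.encode()).hexdigest(), 16)
    return f"{h % NUM_SHARDS:04d}"

demo = shard_row("doc-123")
```

Because every query is a batch scan over the full range, all NUM_SHARDS tablets are consulted on each query; adding tablet servers beyond NUM_SHARDS therefore cannot spread query work further, which is exactly the growth concern raised above.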
Re: Number of partitions for sharded table
Hi Dave,

Our example is simpler than the wiki-search one, and I had forgotten exactly how wiki-search is structured: we're using a simple, single-table layout, everything sharded. We also have some other stuff in the column family, but simplified, it's just:

  row:     ShardID
  colFam:  Term
  colQual: Document

Searches then use an iterator extending IntersectingIterator to find results matching specified terms, etc.

Thanks,
Krishmin

On Oct 30, 2012, at 3:43 PM, dlmar...@comcast.net wrote:

Krishmin,

In the wikisearch example there is a non-sharded index table and a sharded document table. The index table is used to reduce the number of tablets that need to be searched for a given set of terms. Is your setup similar? I'm a little confused since you mention using a sharded index table and that all queries will have an infinite range.

Dave Marion

From: Krishmin Rai kr...@missionfoc.us
To: user@accumulo.apache.org
Sent: Tuesday, October 30, 2012 3:28:15 PM
Subject: Re: Number of partitions for sharded table

I should clarify that I've been pre-splitting tables at each shard so that each tablet consists of a single row.

On Oct 30, 2012, at 3:06 PM, Krishmin Rai wrote:

Hi All,

We're working with an index table whose row is a shardId (an integer, like the wiki-search or IndexedDoc examples). I was just wondering what the right strategy is for choosing a number of partitions, particularly given a cluster that could potentially grow. If I simply set the number of shards equal to the number of slave nodes, additional nodes would not improve query performance (at least over the data already ingested). But starting with more partitions than slave nodes would result in multiple tablets per tablet server… I'm not really sure how that would impact performance, particularly given that all queries against the table will be batchscanners with an infinite range.
Just wondering how others have addressed this problem, and if there are any performance rules of thumb regarding the ratio of tablets to tablet servers.

Thanks!
Krishmin
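The single-table layout Krishmin describes (row = shard, family = term, qualifier = document) can be sketched with plain tuples standing in for Accumulo Mutations; the document id, terms, and shard value are illustrative:

```python
# Sketch of the sharded term index described above:
# (row=shard id, column family=term, column qualifier=document id).
# Plain tuples are used in place of real Accumulo Mutations.

def index_entries(doc_id, terms, shard):
    """Build one index entry per distinct term of a document."""
    return [(shard, term, doc_id) for term in sorted(set(terms))]

entries = index_entries("doc-42", ["apple", "banana", "apple"], "0007")
```

An IntersectingIterator-style query for the terms {"apple", "banana"} then scans each shard row and intersects the document ids found under those two column families, which is why every shard must be consulted per query.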
Re: Number of partitions for sharded table
Thanks, Adam… that's exactly what I was looking for, and gives me a lot to think about.

-Krishmin

On Oct 30, 2012, at 4:08 PM, Adam Fuchs wrote:

Krishmin,

There are a few extremes to keep in mind when choosing a manual partitioning strategy:

1. Parallelism and balance at ingest time. You need to find a happy medium between too few partitions (not enough parallelism) and too many partitions (tablet server resource contention and inefficient indexes). Probably at least one partition per tablet server being actively written to is good, and you'll want to pre-split so they can be distributed evenly. Ten partitions per tablet server is probably not too many -- I wouldn't expect to see contention at that point.

2. Parallelism and balance at query time. At query time, you'll be selecting a subset of all of the partitions that you've ever ingested into. This subset should be bounded similarly to the concern addressed in #1, but the bounds could be looser depending on the types of queries you want to run. Lower-latency queries would tend to favor only a few partitions per node.

3. Growth over time in partition size. Over time, you want partitions to be bounded to less than about 10-100GB. This has to do with limiting the maximum amount of time that a major compaction will take, and impacts availability and performance in the extreme cases. At the same time, you want partitions to be as large as possible so that their indexes are more efficient. One strategy to optimize partition size would be to keep using each partition until it is full, then make another partition. Another would be to allocate a certain number of partitions per day, and then only put data in those partitions during that day. These strategies are also elastic, and can be tweaked as the cluster grows.

In all of these cases, you will need a good load balancing strategy.
The default strategy of evening up the number of partitions per tablet server is probably not sufficient, so you may need to write your own tablet load balancer that is aware of your partitioning strategy.

Cheers,
Adam

On Tue, Oct 30, 2012 at 3:06 PM, Krishmin Rai kr...@missionfoc.us wrote:

Hi All,

We're working with an index table whose row is a shardId (an integer, like the wiki-search or IndexedDoc examples). I was just wondering what the right strategy is for choosing a number of partitions, particularly given a cluster that could potentially grow. If I simply set the number of shards equal to the number of slave nodes, additional nodes would not improve query performance (at least over the data already ingested). But starting with more partitions than slave nodes would result in multiple tablets per tablet server… I'm not really sure how that would impact performance, particularly given that all queries against the table will be batchscanners with an infinite range.

Just wondering how others have addressed this problem, and if there are any performance rules of thumb regarding the ratio of tablets to tablet servers.

Thanks!
Krishmin
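Adam's second sizing strategy (a fixed number of partitions per day, written only during that day) can be sketched as follows; the key format and per-day partition count are illustrative assumptions, not from the thread:

```python
# Sketch: date-prefixed partition keys, per Adam's "partitions per day"
# strategy. Old days' partitions stop growing, which bounds how large a
# major compaction can get, while the per-day count can be raised as the
# cluster grows.
import hashlib
from datetime import date

PARTITIONS_PER_DAY = 8  # illustrative; tune to the cluster size

def partition_key(doc_id: str, day: date) -> str:
    h = int(hashlib.md5(doc_id.encode()).hexdigest(), 16)
    return f"{day:%Y%m%d}_{h % PARTITIONS_PER_DAY:02d}"

demo = partition_key("doc-1", date(2012, 10, 30))
```

One side effect worth noting: because rows now sort by date, queries restricted to a time window can scan only the matching days' partitions instead of every shard ever created.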
Re-init Accumulo over existing installation
Hi All,

We've recently encountered a strange situation on a small test cluster: after an awkward crash, our ZooKeeper data was erased and we no longer have the [accumulo] znode. The HDFS accumulo directory is intact, so all the RFiles etc. are still there, but it's not clear how best to bring Accumulo back up to its previous state. Obviously, just starting Accumulo as-is complains about the missing znode ("Waiting for accumulo to be initialized"), whereas re-initializing is not possible over existing HDFS directories ("It appears this location was previously initialized, exiting").

A couple of questions about recovery strategies:

1) Is there any way to re-create the znode for a previous instance-id? My understanding is that ZK is mostly used to store ephemeral data (such as which tserver is currently responsible for which tablets) and things like users (which we could re-create), so perhaps this is plausible?

2) I imagine that I could init Accumulo with a new instance.dfs.dir, then import the RFiles from the old installation back in. I see Patrick just asked a related question, so, with the data integrity caveats, I would essentially be following the last of the steps in ACCUMULO-456.

3) This is a vague question, but have any of you had experience with the [accumulo] znode being entirely deleted? Aside from stopping/starting ZK (3.3.5) and Accumulo 1.4.0 (possibly with a force-quit), I'm not sure what we could have done to actually delete it.

This is just a test instance, and the data could easily be recreated, but I want to take this opportunity to learn a little more about Accumulo plumbing and maintenance.

Thanks,
Krishmin