Re: Waiting for accumulo to be initialized

Aji Janis Thu, 28 Mar 2013 05:57:16 -0700

Krishmin,

Thank you for the response. It's always great to hear from someone who has
tried out the steps (even if you had a different issue). Like you said, I am
not really sure what caused the crash in our environment in the first place,
but having a plan is always good.


Thanks again all,
Aji


On Wed, Mar 27, 2013 at 5:00 PM, Krishmin Rai wrote:

> Hi Aji,
> I wrote the original question linked below (about re-initing Accumulo over
> an existing installation).  For what it's worth, I believe that my
> ZooKeeper data loss was related to the Linux+Java leap second bug
> <https://access.redhat.com/knowledge/articles/15145>, which is not
> likely to be affecting you now (I did not go back and attempt to re-create
> the issue, so it's also possible there were other compounding issues). We
> have not encountered any ZK data-loss problems since.
>
> At the time, I did some basic experiments to understand the process
> better, and successfully followed (essentially) the steps Eric has
> described. The only real difficulty I had was identifying which directories
> corresponded to which tables; I ended up iterating over individual RFiles
> and manually identifying tables based on expected data. This was a somewhat
> painful process, but at least made me confident that it would be possible
> in production.
>
> It's also important to note that, at least according to my understanding,
> this procedure still potentially loses data: mutations written after the
> last minor compaction will only have reached the write-ahead logs and will
> not be available in the raw RFiles you're importing from.
>
> -Krishmin
>
> On Mar 27, 2013, at 4:45 PM, Aji Janis wrote:
>
> Eric, really appreciate you jotting this down. It's too late to try it out
> this time, but I will give it a try if (hopefully not) there is a next
> time.
>
> Thanks again.
>
>
>
> On Wed, Mar 27, 2013 at 4:19 PM, Eric Newton wrote:
>
>> I should write this up in the user manual.  It's not that hard, but it's
>> really not the first thing you want to tackle while learning how to use
>> accumulo.  I just opened ACCUMULO-1217
>> <https://issues.apache.org/jira/browse/ACCUMULO-1217> to do that.
>>
>> I wrote this from memory: expect errors.  Needless to say, you would only
>> want to do this when you are more comfortable with hadoop, zookeeper and
>> accumulo.
>>
>> First, get zookeeper up and running, even if you have to delete all its
>> data.
>>
>> Next, attempt to determine the mapping of table names to tableIds.  You
>> can do this in the shell when your accumulo instance is healthy.  If it
>> isn't healthy, you will have to guess based on the data in the files in
>> HDFS.
>>
>> So, for example, the table "trace" is probably table id "1".  You can
>> find the files for trace in /accumulo/tables/1.
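When the shell cannot tell you the mapping, the relative size of each table-id directory is a cheap first clue. The helper below is a hypothetical sketch (`summarize_table_dirs` is an invented name). It assumes the output format of `hadoop fs -count`, whose columns are directory count, file count, content size in bytes and path, and lists the table ids largest first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `hadoop fs -count` output on stdin
# (assumed columns: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME) and
# print "table_id file_count bytes", largest directories first.
summarize_table_dirs() {
  # The last path component of each table directory is the table id;
  # splitting on "/" also copes with fully qualified hdfs:// paths.
  awk 'NF >= 4 { n = split($4, p, "/"); print p[n], $2, $3 }' |
    sort -k3,3nr   # numeric sort on the byte count, descending
}
```

For example, quoting the glob so that HDFS rather than the local shell expands it:

$ hadoop fs -count '/accumulo/tables/*' | summarize_table_dirs

Size alone will not identify a table; it only narrows down which directories are worth inspecting more closely.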
>>
>> Don't worry if you get the names wrong.  You can always rename the tables
>> later.
>>
>> Move the old files for accumulo out of the way and re-initialize:
>>
>> $ hadoop fs -mv /accumulo /accumulo-old
>> $ ./bin/accumulo init
>> $ ./bin/start-all.sh
>>
>> Recreate your tables:
>>
>> $ ./bin/accumulo shell -u root -p mysecret
>> shell > createtable table1
>>
>> Learn the new table id mapping:
>> shell > tables -l
>> !METADATA => !0
>> trace => 1
>> table1 => 2
>> ...
>>
>> Bulk import all your data back into the new table ids. Assuming you have
>> determined that "table1" used to be table id "a" and is now "2", you do
>> something like this:
>>
>> $ hadoop fs -mkdir /tmp/failed
>> $ ./bin/accumulo shell -u root -p mysecret
>> shell > table table1
>> shell table1 > importdirectory /accumulo-old/tables/a/default_tablet /tmp/failed true
>>
>> There are lots of directories under every table id directory.  You will
>> need to import each of them.  I suggest creating a script and passing it to
>> the shell on the command line.
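A script of that kind can be generated rather than typed by hand. The following is a minimal sketch, not Eric's own script: `gen_import_script` is a hypothetical name, and the failure-directory naming (`<fail_root>/<table>_<dirname>`) is an assumption, chosen on the understanding that bulk import expects an empty failure directory, so reusing a single `/tmp/failed` for every import could stall the run after the first partial failure.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Accumulo): print Accumulo shell commands
# that bulk import each old tablet directory into one re-created table.
#   $1  target table name in the new instance, e.g. table1
#   $2  HDFS parent directory for failure dirs, e.g. /tmp/failed
# Old tablet directory paths are read from stdin, one per line.
gen_import_script() {
  local table="$1" fail_root="$2" dir
  # Make the following importdirectory commands apply to the target table.
  printf 'table %s\n' "$table"
  # "|| [ -n "$dir" ]" keeps a final line that lacks a trailing newline.
  while IFS= read -r dir || [ -n "$dir" ]; do
    [ -n "$dir" ] || continue  # skip blank lines
    # Assumed naming: one failure dir per source dir, <fail_root>/<table>_<dirname>.
    printf 'importdirectory %s %s/%s_%s true\n' \
      "$dir" "$fail_root" "$table" "${dir##*/}"
  done
}
```

One possible way to drive it, assuming the function is saved in a file called `gen_import_script.sh` (hypothetical), that directory lines in `hadoop fs -ls` output start with `d` and end with the path, and that `accumulo shell -f` runs a file of commands:

$ . ./gen_import_script.sh
$ hadoop fs -ls /accumulo-old/tables/a | awk '/^d/ {print $NF}' > dirs.txt
$ while IFS= read -r d; do hadoop fs -mkdir "/tmp/failed/table1_${d##*/}"; done < dirs.txt
$ gen_import_script table1 /tmp/failed < dirs.txt > import.txt
$ ./bin/accumulo shell -u root -p mysecret -f import.txt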
>>
>> I know of instances in which trillions of entries were recovered and
>> available in a matter of hours.
>>
>> -Eric
>>
>>
>>
>> On Wed, Mar 27, 2013 at 3:39 PM, Aji Janis wrote:
>>
>>> When you say "you can move the files aside in HDFS", which files are
>>> you referring to? I have never set up zookeeper myself, so I am not aware
>>> of all the changes needed.
>>>
>>>
>>>
>>> On Wed, Mar 27, 2013 at 3:33 PM, Eric Newton wrote:
>>>
>>>> If you lose zookeeper, you can move the files aside in HDFS, recreate
>>>> your instance in zookeeper and bulk import all of the old files.  It's not
>>>> perfect: you lose table configurations, split points and user permissions,
>>>> but you do preserve most of the data.
>>>>
>>>> You can back up each of these bits of information periodically if you
>>>> like.  Outside of the files in HDFS, the configuration information is
>>>> pretty small.
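A periodic backup of those small pieces could look like the sketch below. It is hypothetical throughout: the function name and the `ACCUMULO_BIN`, `ACC_USER`, `ACC_PASS` and `BACKUP_USERS` variables are invented, and it assumes the shell's `-e` option (run one command and exit) plus the `tables -l`, `config -t`, `getsplits -t` and `userpermissions -u` commands, all of which should be checked against your Accumulo version. It saves the table-name-to-id mapping, table configuration, split points and user permissions.

```shell
#!/usr/bin/env bash
# Hypothetical backup helper (not part of Accumulo): saves the small pieces
# of metadata that a re-init loses by running one shell command at a time
# with -e and writing each result to a local file.
#   $1     local output directory
#   $2...  names of the tables to back up
# Assumed environment variables (names are illustrative):
#   ACCUMULO_BIN  accumulo launcher              (default ./bin/accumulo)
#   ACC_USER      user to connect as             (default root)
#   ACC_PASS      that user's password           (required)
#   BACKUP_USERS  users whose permissions to save (default: ACC_USER)
backup_accumulo_meta() {
  local outdir="$1"; shift
  local bin="${ACCUMULO_BIN:-./bin/accumulo}" user="${ACC_USER:-root}"
  local t u
  if [ -z "${ACC_PASS:-}" ]; then
    echo "backup_accumulo_meta: ACC_PASS is not set" >&2
    return 1
  fi
  mkdir -p "$outdir" || return 1
  # Common prefix: start the shell, log in, run the command given after -e.
  local -a sh_cmd=("$bin" shell -u "$user" -p "$ACC_PASS" -e)

  # Table name -> table id mapping, handy when matching old HDFS dirs later.
  "${sh_cmd[@]}" "tables -l" > "$outdir/tables.txt" || return 1
  for t in "$@"; do
    "${sh_cmd[@]}" "config -t $t"    > "$outdir/$t.config" || return 1
    "${sh_cmd[@]}" "getsplits -t $t" > "$outdir/$t.splits" || return 1
  done
  for u in ${BACKUP_USERS:-$user}; do   # unquoted on purpose: space-separated list
    "${sh_cmd[@]}" "userpermissions -u $u" > "$outdir/$u.permissions" || return 1
  done
}
```

For example, after sourcing the function (e.g. from cron):

$ ACC_PASS=mysecret backup_accumulo_meta /backups/accumulo/$(date +%F) table1 trace

Passing `-p` on the command line makes the password visible in the process list, so a real setup would want a safer way to supply it.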
>>>>
>>>> -Eric
>>>>
>>>>
>>>>
>>>> On Wed, Mar 27, 2013 at 3:18 PM, Aji Janis wrote:
>>>>
>>>>> Eric and Josh, thanks for all your feedback. We ended up *losing all
>>>>> our accumulo data* because I had to reformat hadoop. Here is in a
>>>>> nutshell what I did:
>>>>>
>>>>>
>>>>>    1. Stop accumulo
>>>>>    2. Stop hadoop
>>>>>    3. On hadoop master and all datanodes, from dfs.data.dir
>>>>>    (hdfs-site.xml) remove everything under the data folder
>>>>>    4. On hadoop master, from dfs.name.dir (hdfs-site.xml) remove
>>>>>    everything under the name folder
>>>>>    5. As hadoop user, execute .../hadoop/bin/hadoop namenode -format
>>>>>    6. As hadoop user, execute .../hadoop/bin/start-all.sh ==> should
>>>>>    populate the data/ and name/ dirs that were erased in steps 3 and 4.
>>>>>    7. Initialize Accumulo: as accumulo user, run
>>>>>    ../accumulo/bin/accumulo init (I created a new instance)
>>>>>    8. Start accumulo
>>>>>
>>>>> I was wondering if anyone had suggestions or thoughts on how I could
>>>>> have solved the original issue of Accumulo waiting for initialization
>>>>> without losing my accumulo data. Is it possible to do so?
>>>>>
>>>>
>>>>
>>>
>>
>
>
