Re: problem installing accumulo

Tim Piety Thu, 03 Jan 2013 11:27:00 -0800

John,

I guess a little of both. In the Enron email set I have a bunch of folders
representing people. Each folder has subfolders that equate to mailboxes
(inbox, sent_mail, etc...). Each mailbox simply contains text files named
1, 2, 3, 4 that equate to an individual email.
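To make the layout concrete, here is a minimal Python sketch of how the owner and mailbox could be read off a file path. The root folder name "maildir" and the sample path are assumptions for illustration only, not something taken from my actual copy of the data set.

```python
from pathlib import PurePosixPath


def describe_email_path(path, root="maildir"):
    """Split <root>/<owner>/<mailbox>/<file> into its parts.

    The root name "maildir" is a hypothetical default; adjust it to
    wherever the data set is unpacked.
    """
    parts = PurePosixPath(path).relative_to(root).parts
    owner = parts[0]
    # Join any intermediate folders so nested mailboxes stay intact.
    mailbox = "/".join(parts[1:-1])
    return {"owner": owner, "mailbox": mailbox, "file": parts[-1]}


# Hypothetical example path: person folder, mailbox folder, numbered file.
info = describe_email_path("maildir/smith-j/sent_mail/3")
```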

Each email is a text file that is easy to parse into specific fields. I
want to place those emails in accumulo and run some simple MapReduce jobs for
the demo, similar to what I saw in some Cloudbase training last year.
What I don't remember is how the tables were arranged.
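As a rough sketch of the parsing step, assuming the files follow the usual RFC 822 header layout, Python's standard `email` module can pull out the fields. The sample message and the field names in the returned dictionary are made up for illustration.

```python
from email import message_from_string
from email.utils import getaddresses

# Made-up sample message; real files may carry more headers.
SAMPLE = """\
Message-ID: <12345.example@mail>
From: alice@example.com
To: bob@example.com, carol@example.com
Cc: dave@example.com
Subject: Demo

Body text of the message.
"""


def addresses(msg, header):
    """Return the bare addresses found in every occurrence of a header."""
    return [addr for _, addr in getaddresses(msg.get_all(header, []))]


def parse_email(text):
    """Parse one email file into a plain dictionary of fields."""
    msg = message_from_string(text)
    return {
        "message_id": msg["Message-ID"],
        "from": addresses(msg, "From"),
        "to": addresses(msg, "To"),
        "cc": addresses(msg, "Cc"),
        "subject": msg["Subject"],
        "body": msg.get_payload(),
    }


fields = parse_email(SAMPLE)
```

Note that To and Cc come back as lists, which is exactly the part that needs a decision when mapping to columns.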

I was just going to make each email, regardless of mailbox, a row in
accumulo and make the mailbox and owner separate columns (or column
qualifiers, to be more specific). My issue is the To and CC fields. Each can
be a list. I was thinking of making the column family "to" and the column
qualifier 1, 2, 3, and so on. I could also make the column qualifier for the
"to" family the actual value "[email protected]". I wasn't exactly sure of the best
way.
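The two options for the To and CC lists can be sketched as plain (row, family, qualifier, value) tuples. This is only an illustration in Python and does not use the accumulo client API; the family name "meta" and the sample addresses are assumptions.

```python
def recipient_cells(row, family, addrs, address_as_qualifier=False):
    """Lay out a recipient list as (row, family, qualifier, value) tuples.

    Option 1 (default): qualifier is the position 1, 2, 3, ... and the
    value is the address.
    Option 2: qualifier is the address itself and the value is empty.
    """
    if address_as_qualifier:
        return [(row, family, addr, "") for addr in addrs]
    return [(row, family, str(i), addr) for i, addr in enumerate(addrs, start=1)]


def email_cells(message_id, owner, mailbox, to_addrs, cc_addrs,
                address_as_qualifier=False):
    """Build all cells for one email, keyed by its message ID as the row."""
    cells = [
        (message_id, "meta", "owner", owner),      # "meta" is a hypothetical family
        (message_id, "meta", "mailbox", mailbox),
    ]
    cells += recipient_cells(message_id, "to", to_addrs, address_as_qualifier)
    cells += recipient_cells(message_id, "cc", cc_addrs, address_as_qualifier)
    return cells


numbered = recipient_cells("m1", "to", ["a@example.com", "b@example.com"])
by_address = recipient_cells("m1", "to", ["a@example.com"], address_as_qualifier=True)
```

With the numbered form the address lives in the value, while with the second form it lives in the qualifier, which is the difference I am trying to weigh.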

Each email has a Message_ID and so far I think they are unique. If not I
can generate a unique ID.

Again this will be for a simple demo where people may want to search from
some person, to some person and maybe for specific terms in the body of the
email.
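To illustrate the kind of lookups I mean, here is a small in-memory sketch in Python. It is not accumulo code and not necessarily how the tables should be arranged; the field names "sender", "recipient" and "term" and the sample emails are made up.

```python
from collections import defaultdict


def build_index(emails):
    """Map (field, value) pairs to the set of message IDs that contain them."""
    index = defaultdict(set)
    for e in emails:
        index[("sender", e["sender"].lower())].add(e["id"])
        for addr in e["recipients"]:
            index[("recipient", addr.lower())].add(e["id"])
        # Naive whitespace tokenizer; good enough for a demo sketch.
        for term in e["body"].lower().split():
            index[("term", term)].add(e["id"])
    return index


def search(index, criteria):
    """Return the IDs matching every (field, value) pair in criteria."""
    matches = [index.get((field, value.lower()), set()) for field, value in criteria]
    return set.intersection(*matches) if matches else set()


emails = [
    {"id": "m1", "sender": "alice@example.com",
     "recipients": ["bob@example.com"], "body": "Quarterly budget attached"},
    {"id": "m2", "sender": "alice@example.com",
     "recipients": ["carol@example.com"], "body": "Lunch on Friday"},
]
index = build_index(emails)
hits = search(index, [("sender", "alice@example.com"), ("term", "budget")])
```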

Hope this gives a good idea of what I am trying to do. Feel free to ask any
other questions you may have if I wasn't clear enough. Again, I have more
experience working with existing structures. I am trying to use this
experience to learn a little about how to organize the data.

thanks in advance,

Tim

On Thu, Jan 3, 2013 at 12:28 PM, John Vines <[email protected]> wrote:

> Are you looking for generic pointers for it or do you have specific
> questions? Feel free to ask away and someone will be able to help.
>
> John
>
>
> On Thu, Jan 3, 2013 at 12:23 PM, Tim Piety <[email protected]> wrote:
>
>> John,
>>
>> No, I hadn't. Thank you, that was it. I took another look at the install doc
>> and didn't see this step in there.  I then looked at the README file on the
>> ACCUMULO website and it is in there.
>> I was able to start accumulo and then start an accumulo shell and execute
>> the tables command and it listed !METADATA. I presume that this means I am
>> up and running.
>>
>> I am going to use the enron dataset for my demo. I do have a few
>> questions regarding how to structure it if you don't mind a few more
>> questions.
>>
>>
>> thanks again.
>>
>> Tim
>>
>>
>> On Thu, Jan 3, 2013 at 12:07 PM, John Vines <[email protected]> wrote:
>>
>>> Did you initialize accumulo by running bin/accumulo init?
>>>
>>>
>>> On Thu, Jan 3, 2013 at 12:02 PM, Tim Piety <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I posted a message to the dev list before Xmas and got no response. I
>>>> figured I'd try this list. If this is not the correct forum, can someone
>>>> please let me know what the correct forum is? I am trying to install
>>>> accumulo for a simple demo. I have hadoop installed and running. I verified
>>>> by testing a mapreduce program and I can look at the HDFS system.
>>>>
>>>> When I try to start accumulo I get an INFO message saying attempting to
>>>> talk to zookeeper. I verified zookeeper is running and I can access it
>>>> using the zkCli.sh. The next line to display is INFO :Waiting for accumulo
>>>> to be initialized. That line repeats infinitely.
>>>>
>>>> I looked at the logs and get a message in the tserver_localhost.out
>>>> saying unable obtain instance id at /accumulo/instance_id. A quick web
>>>> search found a message (
>>>> http://affy.blogspot.com/2012/06/accumulo-where-is-my-instance-id.html)
>>>> saying I needed to put the HADOOP/conf directory in  my CLASSPATH. I tried
>>>> that, but that did not work.
>>>>
>>>> I have looked and didn't find any other groups where I could post a
>>>> question.
>>>>
>>>> thanks,
>>>>
>>>> Tim
>>>
>>>
>>>
>>
>
