John,

I guess a little of both. In the Enron email set I have a bunch of folders, one per person. Each person's folder has subfolders that correspond to mailboxes (inbox, sent_mail, etc.). Each mailbox simply contains text files named 1, 2, 3, 4, and so on, each of which is an individual email.
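To make that layout concrete, here is a rough Python sketch of how I would walk it. The function name iter_emails and the root path are placeholders of mine, and it assumes the first folder level is always the person and everything between that and the file is the mailbox.

```python
from pathlib import Path


def iter_emails(root):
    """Yield (owner, mailbox, path) for every email file under root.

    Assumed layout (from the description above): root/<person>/<mailbox>/<n>
    """
    root = Path(root)
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        parts = path.relative_to(root).parts
        if len(parts) < 3:
            continue  # file is not inside a person/mailbox folder
        owner = parts[0]
        # Join any intermediate folders so nested mailboxes still get a name.
        mailbox = "/".join(parts[1:-1])
        yield owner, mailbox, path
```

Running list(iter_emails("maildir")) on such a tree ("maildir" being a hypothetical root) would give one tuple per message file, which is enough to tag each email with its owner and mailbox before loading.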
Each email is a text file that is easy to parse into specific fields. I want to place those emails in Accumulo and run some simple MapReduce for the demo, similar to what I saw in some Cloudbase training last year. What I don't remember is how the tables were arranged.

I was just going to make each email, regardless of mailbox, a row in Accumulo and make the mailbox and owner separate columns (or column qualifiers, to be more specific). My issue is the To and CC fields. Each can be a list. I was thinking of making the column family "to" and the column qualifiers 1, 2, 3, and so on. I could also make the column qualifier for the "to" family the actual value, "[email protected]". I wasn't exactly sure of the best way.

Each email has a Message_ID and so far I think they are unique. If not, I can generate a unique ID.

Again, this will be for a simple demo where people may want to search from some person, to some person, and maybe for specific terms in the body of the email.

Hope this gives a good idea of what I am trying to do. Feel free to ask any other questions you may have if I wasn't clear enough. Again, I have more experience working with existing structures. I am trying to use this experience to learn a little about how to organize the data.

thanks in advance,

Tim

On Thu, Jan 3, 2013 at 12:28 PM, John Vines <[email protected]> wrote:

> Are you looking for generic pointers for it or do you have specific
> questions? Feel free to ask away and someone will be able to help.
>
> John
>
>
> On Thu, Jan 3, 2013 at 12:23 PM, Tim Piety <[email protected]> wrote:
>
>> John,
>>
>> No I hadn't. Thank you, that was it. I took another look at the install doc
>> and didn't see this step in there. I then looked at the README file on the
>> ACCUMULO website and it is in there.
>> I was able to start accumulo and then start an accumulo shell and execute
>> the tables command and it listed !METADATA. I presume that this means I am
>> up and running.
>>
>> I am going to use the enron dataset for my demo. I do have a few
>> questions regarding how to structure it if you don't mind a few more
>> questions.
>>
>>
>> thanks again.
>>
>> Tim
>>
>>
>> On Thu, Jan 3, 2013 at 12:07 PM, John Vines <[email protected]> wrote:
>>
>>> Did you initialize accumulo by running bin/accumulo init?
>>>
>>>
>>> On Thu, Jan 3, 2013 at 12:02 PM, Tim Piety <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I posted a message to the dev list before Xmas and got no response. I
>>>> figured I'd try this list. If this is not the correct forum can someone
>>>> please let me know what the correct forum is. I am trying to install
>>>> accumulo for a simple demo. I have hadoop installed and running. I verified
>>>> by testing a mapreduce program and I can look at the HDFS system.
>>>>
>>>> When I try to start accumulo I get an INFO message saying attempting to
>>>> talk to zookeeper. I verified zookeeper is running and I can access it
>>>> using the zkCli.sh. The next line to display is INFO :Waiting for accumulo
>>>> to be initialized. That line repeats infinitely.
>>>>
>>>> I looked at the logs and get a message in the tserver_localhost.out
>>>> saying unable obtain instance id at /accumulo/instance_id. A quick web
>>>> search found a message (
>>>> http://affy.blogspot.com/2012/06/accumulo-where-is-my-instance-id.html)
>>>> saying I needed to put the HADOOP/conf directory in my CLASSPATH. I tried
>>>> that, but that did not work.
>>>>
>>>> I have looked and didn't find any other groups where I could post a
>>>> question.
>>>>
>>>> thanks,
>>>>
>>>> Tim
>>>
>>>
>>>
>>
>
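
P.S. To make the two To/CC options from the top of this message concrete, here is a rough Python sketch that turns one parsed email into (row, family, qualifier, value) tuples. It does not call Accumulo at all. The function name email_to_cells and the family names "meta" and "body" are placeholders of mine, and it assumes the ID header is spelled Message-ID in the files.

```python
from email import message_from_string
from email.utils import getaddresses


def email_to_cells(raw, owner, mailbox, qualifier_is_address=True):
    """Turn one raw email into (row, family, qualifier, value) tuples.

    qualifier_is_address=True  -> to:<address> = ""        (address as qualifier)
    qualifier_is_address=False -> to:1, to:2, ... = address (numbered qualifiers)
    """
    msg = message_from_string(raw)
    row = (msg["Message-ID"] or "").strip()
    if not row:
        raise ValueError("no Message-ID header; generate a unique ID instead")
    cells = [
        (row, "meta", "owner", owner),      # "meta" is a made-up family name
        (row, "meta", "mailbox", mailbox),
        (row, "meta", "from", msg.get("From", "")),
        (row, "body", "text", msg.get_payload()),
    ]
    for family in ("to", "cc"):
        # Header lookup is case-insensitive; each header may list many people.
        addrs = [addr for _, addr in getaddresses(msg.get_all(family, []))]
        for i, addr in enumerate(addrs, start=1):
            if qualifier_is_address:
                cells.append((row, family, addr, ""))
            else:
                cells.append((row, family, str(i), addr))
    return cells
```

With the address as the qualifier, "was this email sent to X" becomes a lookup of one column in the row, while the numbered variant keeps the order of the recipients. I am still not sure which is the better fit.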
