Hi Arshak,

Here is how you might do it. We implement everything with batch writers and batch scanners. Note: if you are doing high ingest rates, the degree table can be tricky and usually requires pre-summing prior to ingestion to reduce the pressure on the accumulator (the combiner) inside Accumulo. Feel free to ask further questions, as I would imagine there are details that still aren't clear, in particular why we do it this way.
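To make the batch-writer and pre-summing points concrete, here is a rough Java sketch of the ingest path for the four tables laid out below. Treat it as a sketch, not the D4M reference implementation: the class name and connection details are placeholders, Java 8 is assumed, and it assumes a SummingCombiner has been configured on TedgeDegree so the locally pre-summed counts fold into the running totals. Note the row key: the reading timestamp with its digits flipped (1388191975000 becomes 0005791918831) plus the machine name, which helps spread sequential writes across tablets.

import java.util.HashMap;
import java.util.Map;

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.MultiTableBatchWriter;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;

public class D4mIngestSketch {

  static final Value ONE = new Value("1".getBytes());

  public static void main(String[] args) throws Exception {
    // Placeholder connection details -- substitute your own.
    Connector conn = new ZooKeeperInstance("instance", "zkhost:2181")
        .getConnector("user", new PasswordToken("secret"));

    MultiTableBatchWriter mtbw =
        conn.createMultiTableBatchWriter(new BatchWriterConfig());
    BatchWriter edge = mtbw.getBatchWriter("Tedge");
    BatchWriter transpose = mtbw.getBatchWriter("TedgeTranspose");
    BatchWriter degree = mtbw.getBatchWriter("TedgeDegree");
    BatchWriter text = mtbw.getBatchWriter("TedgeText");

    String[] records = {
        "neptune,west,5,1388191975000",
        "neptune,west,9,1388191975010",
        "pluto,east,13,1388191975090" };
    String[] cols = { "Machine", "Pool", "Load", "ReadingTimestamp" };

    // Pre-sum degrees in memory so each distinct term sends one
    // mutation per batch instead of one per record.
    Map<String, Long> degrees = new HashMap<>();

    for (String record : records) {
      String[] vals = record.split(",");
      // Flipped timestamp + machine name, e.g. 0005791918831-neptune.
      String row = new StringBuilder(vals[3]).reverse() + "-" + vals[0];

      Mutation e = new Mutation(row);
      for (int i = 0; i < cols.length; i++) {
        String term = cols[i] + "|" + vals[i];
        e.put("", term, ONE);                  // Tedge entry
        Mutation t = new Mutation(term);       // TedgeTranspose entry
        t.put("", row, ONE);
        transpose.addMutation(t);
        degrees.merge(term, 1L, Long::sum);
      }
      edge.addMutation(e);

      Mutation raw = new Mutation(row);        // TedgeText entry
      raw.put("", "Text", new Value(record.getBytes()));
      text.addMutation(raw);
    }

    // One summed mutation per distinct term; the SummingCombiner on
    // TedgeDegree folds these into the running totals.
    for (Map.Entry<String, Long> d : degrees.entrySet()) {
      Mutation m = new Mutation(d.getKey());
      m.put("", "Degree", new Value(d.getValue().toString().getBytes()));
      degree.addMutation(m);
    }

    mtbw.close();
  }
}

This also answers your question 1 below: in this design the transpose table is written by the ingest code alongside the edge table, and only the degree table relies on a server-side combiner.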
Regards. -Jeremy

Original data:

Machine,Pool,Load,ReadingTimestamp
neptune,west,5,1388191975000
neptune,west,9,1388191975010
pluto,east,13,1388191975090

Tedge table:

rowKey,columnQualifier,value
0005791918831-neptune,Machine|neptune,1
0005791918831-neptune,Pool|west,1
0005791918831-neptune,Load|5,1
0005791918831-neptune,ReadingTimestamp|1388191975000,1
0105791918831-neptune,Machine|neptune,1
0105791918831-neptune,Pool|west,1
0105791918831-neptune,Load|9,1
0105791918831-neptune,ReadingTimestamp|1388191975010,1
0905791918831-pluto,Machine|pluto,1
0905791918831-pluto,Pool|east,1
0905791918831-pluto,Load|13,1
0905791918831-pluto,ReadingTimestamp|1388191975090,1

TedgeTranspose table:

rowKey,columnQualifier,value
Machine|neptune,0005791918831-neptune,1
Pool|west,0005791918831-neptune,1
Load|5,0005791918831-neptune,1
ReadingTimestamp|1388191975000,0005791918831-neptune,1
Machine|neptune,0105791918831-neptune,1
Pool|west,0105791918831-neptune,1
Load|9,0105791918831-neptune,1
ReadingTimestamp|1388191975010,0105791918831-neptune,1
Machine|pluto,0905791918831-pluto,1
Pool|east,0905791918831-pluto,1
Load|13,0905791918831-pluto,1
ReadingTimestamp|1388191975090,0905791918831-pluto,1

TedgeDegree table:

rowKey,columnQualifier,value
Machine|neptune,Degree,2
Pool|west,Degree,2
Load|5,Degree,1
ReadingTimestamp|1388191975000,Degree,1
Load|9,Degree,1
ReadingTimestamp|1388191975010,Degree,1
Machine|pluto,Degree,1
Pool|east,Degree,1
Load|13,Degree,1
ReadingTimestamp|1388191975090,Degree,1

TedgeText table:

rowKey,columnQualifier,value
0005791918831-neptune,Text,< ... raw text of original log ...>
0105791918831-neptune,Text,< ... raw text of original log ...>
0905791918831-pluto,Text,< ... raw text of original log ...>

On Dec 27, 2013, at 8:01 PM, Arshak Navruzyan <[email protected]> wrote:

> Jeremy,
>
> Wow, didn't expect to get help from the author :)
>
> How about something simple like this:
>
> Machine   Pool   Load   ReadingTimestamp
> neptune   west   5      1388191975000
> neptune   west   9      1388191975010
> pluto     east   13     1388191975090
>
> These are the areas I am unclear on:
>
> 1. Should the transpose table be built as part of the ingest code or as
>    an Accumulo combiner?
> 2. What does the degree table do in this example? The paper mentions it's
>    useful for query optimization. How?
> 3. Does D4M accommodate "repurposing" the row_id to a partition key? The
>    wikisearch example shows how the partition id is important for parallel
>    scans of the index. But since Accumulo is a row store, how can you do
>    fast lookups by row if you've used the row_id as a partition key?
>
> Thank you,
>
> Arshak
>
> On Thu, Dec 26, 2013 at 5:31 PM, Jeremy Kepner <[email protected]> wrote:
> Hi Arshak,
> Maybe you can send a few (~3) records of data that you are familiar with
> and we can walk you through how the D4M schema would be applied to those
> records.
>
> Regards. -Jeremy
>
> On Thu, Dec 26, 2013 at 03:10:59PM -0500, Arshak Navruzyan wrote:
> > Hello,
> > I am trying to get my head around Accumulo schema designs. I went through
> > a lot of trouble to get the wikisearch example running, but since the
> > data is in protobuf lists, it's not that illustrative (for a newbie).
> > I would love to find another example that is a little simpler to
> > understand. In particular, I am interested in Java/Scala code that mimics
> > the D4M schema design (not a Matlab guy).
> > Thanks,
> > Arshak
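Re question 2 above, here is the idea in code form. Before scanning, look up each query term's count in TedgeDegree and start from the rarest term: its transpose row yields the fewest candidate row keys, which a BatchScanner can then fetch in parallel. This is only a rough sketch under stated assumptions: the table names from the example above, empty authorizations, degree values that parse as longs, and the intersection with the remaining query terms omitted.

import java.util.ArrayList;
import java.util.List;
import java.util.Map.Entry;

import org.apache.accumulo.core.client.BatchScanner;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;

public class D4mQuerySketch {

  // Point lookup of a term's count in the degree table.
  static long degree(Connector conn, String term) throws Exception {
    Scanner s = conn.createScanner("TedgeDegree", Authorizations.EMPTY);
    s.setRange(Range.exact(term));
    long d = 0;
    for (Entry<Key, Value> e : s) {
      d = Long.parseLong(e.getValue().toString());
    }
    return d;
  }

  public static void query(Connector conn, String... terms) throws Exception {
    // Pick the rarest term: fewest candidate rows to fetch and filter.
    String rarest = terms[0];
    for (String t : terms) {
      if (degree(conn, t) < degree(conn, rarest)) {
        rarest = t;
      }
    }

    // The transpose row for that term lists the matching Tedge row keys.
    List<Range> rows = new ArrayList<>();
    Scanner s = conn.createScanner("TedgeTranspose", Authorizations.EMPTY);
    s.setRange(Range.exact(rarest));
    for (Entry<Key, Value> e : s) {
      rows.add(Range.exact(e.getKey().getColumnQualifier().toString()));
    }
    if (rows.isEmpty()) {
      return;
    }

    // Fetch the original records in parallel; checking the remaining
    // terms against each candidate row is omitted here.
    BatchScanner bs =
        conn.createBatchScanner("TedgeText", Authorizations.EMPTY, 10);
    bs.setRanges(rows);
    for (Entry<Key, Value> e : bs) {
      System.out.println(e.getKey().getRow() + " -> " + e.getValue());
    }
    bs.close();
  }
}

That is why the degree table matters for query optimization: a handful of cheap point lookups tells the planner which term is most selective before any expensive scan starts.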