Hi Arshak,

Here is how you might do it. We implement everything with batch writers and batch scanners. Note: if you are doing high ingest rates, the degree table can be tricky and usually requires pre-summing prior to ingestion to reduce the pressure on the accumulator (the combiner) inside Accumulo. Feel free to ask further questions, as I would imagine there are details that still aren't clear, in particular why we do it this way.
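To make the batch-writer and pre-summing points concrete, here is a rough Java sketch of the ingest path for the four tables laid out below. Treat it as a sketch, not the D4M reference implementation: the class name and connection details are placeholders, Java 8 is assumed, and it assumes a SummingCombiner has been configured on TedgeDegree so the locally pre-summed counts fold into the running totals. Note the row key: the reading timestamp with its digits flipped (1388191975000 becomes 0005791918831) plus the machine name, which helps spread sequential writes across tablets.

import java.util.HashMap;
import java.util.Map;

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.MultiTableBatchWriter;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;

public class D4mIngestSketch {

  static final Value ONE = new Value("1".getBytes());

  public static void main(String[] args) throws Exception {
    // Placeholder connection details -- substitute your own.
    Connector conn = new ZooKeeperInstance("instance", "zkhost:2181")
        .getConnector("user", new PasswordToken("secret"));

    MultiTableBatchWriter mtbw =
        conn.createMultiTableBatchWriter(new BatchWriterConfig());
    BatchWriter edge = mtbw.getBatchWriter("Tedge");
    BatchWriter transpose = mtbw.getBatchWriter("TedgeTranspose");
    BatchWriter degree = mtbw.getBatchWriter("TedgeDegree");
    BatchWriter text = mtbw.getBatchWriter("TedgeText");

    String[] records = {
        "neptune,west,5,1388191975000",
        "neptune,west,9,1388191975010",
        "pluto,east,13,1388191975090" };
    String[] cols = { "Machine", "Pool", "Load", "ReadingTimestamp" };

    // Pre-sum degrees in memory so each distinct term sends one
    // mutation per batch instead of one per record.
    Map<String, Long> degrees = new HashMap<>();

    for (String record : records) {
      String[] vals = record.split(",");
      // Flipped timestamp + machine name, e.g. 0005791918831-neptune.
      String row = new StringBuilder(vals[3]).reverse() + "-" + vals[0];

      Mutation e = new Mutation(row);
      for (int i = 0; i < cols.length; i++) {
        String term = cols[i] + "|" + vals[i];
        e.put("", term, ONE);                  // Tedge entry
        Mutation t = new Mutation(term);       // TedgeTranspose entry
        t.put("", row, ONE);
        transpose.addMutation(t);
        degrees.merge(term, 1L, Long::sum);
      }
      edge.addMutation(e);

      Mutation raw = new Mutation(row);        // TedgeText entry
      raw.put("", "Text", new Value(record.getBytes()));
      text.addMutation(raw);
    }

    // One summed mutation per distinct term; the SummingCombiner on
    // TedgeDegree folds these into the running totals.
    for (Map.Entry<String, Long> d : degrees.entrySet()) {
      Mutation m = new Mutation(d.getKey());
      m.put("", "Degree", new Value(d.getValue().toString().getBytes()));
      degree.addMutation(m);
    }

    mtbw.close();
  }
}

This also answers your question 1 below: in this design the transpose table is written by the ingest code alongside the edge table, and only the degree table relies on a server-side combiner.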
Regards. -Jeremy

Original data:

Machine,Pool,Load,ReadingTimestamp
neptune,west,5,1388191975000
neptune,west,9,1388191975010
pluto,east,13,1388191975090

Tedge table:

rowKey,columnQualifier,value
0005791918831-neptune,Machine|neptune,1
0005791918831-neptune,Pool|west,1
0005791918831-neptune,Load|5,1
0005791918831-neptune,ReadingTimestamp|1388191975000,1
0105791918831-neptune,Machine|neptune,1
0105791918831-neptune,Pool|west,1
0105791918831-neptune,Load|9,1
0105791918831-neptune,ReadingTimestamp|1388191975010,1
0905791918831-pluto,Machine|pluto,1
0905791918831-pluto,Pool|east,1
0905791918831-pluto,Load|13,1
0905791918831-pluto,ReadingTimestamp|1388191975090,1

TedgeTranspose table:

rowKey,columnQualifier,value
Machine|neptune,0005791918831-neptune,1
Pool|west,0005791918831-neptune,1
Load|5,0005791918831-neptune,1
ReadingTimestamp|1388191975000,0005791918831-neptune,1
Machine|neptune,0105791918831-neptune,1
Pool|west,0105791918831-neptune,1
Load|9,0105791918831-neptune,1
ReadingTimestamp|1388191975010,0105791918831-neptune,1
Machine|pluto,0905791918831-pluto,1
Pool|east,0905791918831-pluto,1
Load|13,0905791918831-pluto,1
ReadingTimestamp|1388191975090,0905791918831-pluto,1

TedgeDegree table:

rowKey,columnQualifier,value
Machine|neptune,Degree,2
Pool|west,Degree,2
Load|5,Degree,1
ReadingTimestamp|1388191975000,Degree,1
Load|9,Degree,1
ReadingTimestamp|1388191975010,Degree,1
Machine|pluto,Degree,1
Pool|east,Degree,1
Load|13,Degree,1
ReadingTimestamp|1388191975090,Degree,1

TedgeText table:

rowKey,columnQualifier,value
0005791918831-neptune,Text,< ... raw text of original log ...>
0105791918831-neptune,Text,< ... raw text of original log ...>
0905791918831-pluto,Text,< ... raw text of original log ...>

On Dec 27, 2013, at 8:01 PM, Arshak Navruzyan <[email protected]> wrote:

> Jeremy,
>
> Wow, didn't expect to get help from the author :)
>
> How about something simple like this:
>
> Machine   Pool   Load   ReadingTimestamp
> neptune   west   5      1388191975000
> neptune   west   9      1388191975010
> pluto     east   13     1388191975090
>
> These are the areas I am unclear on:
>
> 1. Should the transpose table be built as part of the ingest code or as
>    an Accumulo combiner?
> 2. What does the degree table do in this example? The paper mentions it's
>    useful for query optimization. How?
> 3. Does D4M accommodate "repurposing" the row_id to a partition key? The
>    wikisearch example shows how the partition id is important for parallel
>    scans of the index. But since Accumulo is a row store, how can you do
>    fast lookups by row if you've used the row_id as a partition key?
>
> Thank you,
>
> Arshak
>
> On Thu, Dec 26, 2013 at 5:31 PM, Jeremy Kepner <[email protected]> wrote:
> Hi Arshak,
> Maybe you can send a few (~3) records of data that you are familiar with
> and we can walk you through how the D4M schema would be applied to those
> records.
>
> Regards. -Jeremy
>
> On Thu, Dec 26, 2013 at 03:10:59PM -0500, Arshak Navruzyan wrote:
> > Hello,
> > I am trying to get my head around Accumulo schema designs. I went through
> > a lot of trouble to get the wikisearch example running, but since the
> > data is in protobuf lists, it's not that illustrative (for a newbie).
> > I would love to find another example that is a little simpler to
> > understand. In particular, I am interested in Java/Scala code that mimics
> > the D4M schema design (not a Matlab guy).
> > Thanks,
> > Arshak
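Re question 2 above, here is the idea in code form. Before scanning, look up each query term's count in TedgeDegree and start from the rarest term: its transpose row yields the fewest candidate row keys, which a BatchScanner can then fetch in parallel. This is only a rough sketch under stated assumptions: the table names from the example above, empty authorizations, degree values that parse as longs, and the intersection with the remaining query terms omitted.

import java.util.ArrayList;
import java.util.List;
import java.util.Map.Entry;

import org.apache.accumulo.core.client.BatchScanner;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;

public class D4mQuerySketch {

  // Point lookup of a term's count in the degree table.
  static long degree(Connector conn, String term) throws Exception {
    Scanner s = conn.createScanner("TedgeDegree", Authorizations.EMPTY);
    s.setRange(Range.exact(term));
    long d = 0;
    for (Entry<Key, Value> e : s) {
      d = Long.parseLong(e.getValue().toString());
    }
    return d;
  }

  public static void query(Connector conn, String... terms) throws Exception {
    // Pick the rarest term: fewest candidate rows to fetch and filter.
    String rarest = terms[0];
    for (String t : terms) {
      if (degree(conn, t) < degree(conn, rarest)) {
        rarest = t;
      }
    }

    // The transpose row for that term lists the matching Tedge row keys.
    List<Range> rows = new ArrayList<>();
    Scanner s = conn.createScanner("TedgeTranspose", Authorizations.EMPTY);
    s.setRange(Range.exact(rarest));
    for (Entry<Key, Value> e : s) {
      rows.add(Range.exact(e.getKey().getColumnQualifier().toString()));
    }
    if (rows.isEmpty()) {
      return;
    }

    // Fetch the original records in parallel; checking the remaining
    // terms against each candidate row is omitted here.
    BatchScanner bs =
        conn.createBatchScanner("TedgeText", Authorizations.EMPTY, 10);
    bs.setRanges(rows);
    for (Entry<Key, Value> e : bs) {
      System.out.println(e.getKey().getRow() + " -> " + e.getValue());
    }
    bs.close();
  }
}

That is why the degree table matters for query optimization: a handful of cheap point lookups tells the planner which term is most selective before any expensive scan starts.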