Hi Arshak,

See interspersed below.

Regards. -Jeremy

On Dec 29, 2013, at 11:34 AM, Arshak Navruzyan <[email protected]> wrote:
> Jeremy,
>
> Thanks for the detailed explanation. Just a couple of final questions:
>
> 1. What's your advice on the transpose table as far as whether to repeat the
> indexed term (one per matching row id) or try to store all matching row ids
> from tedge in a single row in tedgetranspose (using protobuf for example)?
> What's the performance implication of each approach? In the paper you
> mentioned that if it's a few values they should just be stored together. Was
> there a cut-off point in your testing? Can you clarify?

I am not sure what you're asking.

> 2. You mentioned that the degrees should be calculated beforehand for high
> ingest rates. Doesn't this change Accumulo from being a true database to
> being more of an index? If changes to the data cause the degree table to get
> out of sync, it sounds like changes have to be applied elsewhere first and
> Accumulo has to be reloaded periodically. Or perhaps letting the degree
> table get out of sync is ok since it's just an assist...

My point was a very narrow comment on optimization in very high performance
situations. I probably shouldn't have mentioned it. If you ever have
performance issues with your degree tables, that would be the time to discuss
it. You may never encounter this issue.

> Thanks,
>
> Arshak
>
>
> On Sat, Dec 28, 2013 at 10:36 AM, Kepner, Jeremy - 0553 - MITLL
> <[email protected]> wrote:
> Hi Arshak,
> Here is how you might do it. We implement everything with batch writers
> and batch scanners. Note: if you are doing high ingest rates, the degree
> table can be tricky and usually requires pre-summing prior to ingestion to
> reduce the pressure on the accumulator inside of Accumulo. Feel free to ask
> further questions, as I would imagine there are details that still wouldn't
> be clear, in particular why we do it this way.
>
> Regards.
> -Jeremy
>
> Original data:
>
> Machine,Pool,Load,ReadingTimestamp
> neptune,west,5,1388191975000
> neptune,west,9,1388191975010
> pluto,east,13,1388191975090
>
>
> Tedge table:
> rowKey,columnQualifier,value
>
> 0005791918831-neptune,Machine|neptune,1
> 0005791918831-neptune,Pool|west,1
> 0005791918831-neptune,Load|5,1
> 0005791918831-neptune,ReadingTimestamp|1388191975000,1
> 0105791918831-neptune,Machine|neptune,1
> 0105791918831-neptune,Pool|west,1
> 0105791918831-neptune,Load|9,1
> 0105791918831-neptune,ReadingTimestamp|1388191975010,1
> 0905791918831-pluto,Machine|pluto,1
> 0905791918831-pluto,Pool|east,1
> 0905791918831-pluto,Load|13,1
> 0905791918831-pluto,ReadingTimestamp|1388191975090,1
>
>
> TedgeTranspose table:
> rowKey,columnQualifier,value
>
> Machine|neptune,0005791918831-neptune,1
> Pool|west,0005791918831-neptune,1
> Load|5,0005791918831-neptune,1
> ReadingTimestamp|1388191975000,0005791918831-neptune,1
> Machine|neptune,0105791918831-neptune,1
> Pool|west,0105791918831-neptune,1
> Load|9,0105791918831-neptune,1
> ReadingTimestamp|1388191975010,0105791918831-neptune,1
> Machine|pluto,0905791918831-pluto,1
> Pool|east,0905791918831-pluto,1
> Load|13,0905791918831-pluto,1
> ReadingTimestamp|1388191975090,0905791918831-pluto,1
>
>
> TedgeDegree table:
> rowKey,columnQualifier,value
>
> Machine|neptune,Degree,2
> Pool|west,Degree,2
> Load|5,Degree,1
> ReadingTimestamp|1388191975000,Degree,1
> Load|9,Degree,1
> ReadingTimestamp|1388191975010,Degree,1
> Machine|pluto,Degree,1
> Pool|east,Degree,1
> Load|13,Degree,1
> ReadingTimestamp|1388191975090,Degree,1
>
>
> TedgeText table:
> rowKey,columnQualifier,value
>
> 0005791918831-neptune,Text,< ... raw text of original log ...>
> 0105791918831-neptune,Text,< ... raw text of original log ...>
> 0905791918831-pluto,Text,< ... raw text of original log ...>
>
> On Dec 27, 2013, at 8:01 PM, Arshak Navruzyan <[email protected]> wrote:
>
> > Jeremy,
> >
> > Wow, didn't expect to get help from the author :)
> >
> > How about something simple like this:
> >
> > Machine   Pool   Load   ReadingTimestamp
> > neptune   west   5      1388191975000
> > neptune   west   9      1388191975010
> > pluto     east   13     1388191975090
> >
> > These are the areas I am unclear on:
> >
> > 1. Should the transpose table be built as part of the ingest code or as an
> > Accumulo combiner?
> > 2. What does the degree table do in this example? The paper mentions
> > it's useful for query optimization. How?
> > 3. Does D4M accommodate "repurposing" the row_id to a partition key? The
> > wikisearch example shows how the partition id is important for parallel
> > scans of the index. But since Accumulo is a row store, how can you do fast
> > lookups by row if you've used the row_id as a partition key?
> >
> > Thank you,
> >
> > Arshak
> >
> > On Thu, Dec 26, 2013 at 5:31 PM, Jeremy Kepner <[email protected]> wrote:
> > Hi Arshak,
> > Maybe you can send a few (~3) records of data that you are familiar with
> > and we can walk you through how the D4M schema would be applied to those
> > records.
> >
> > Regards. -Jeremy
> >
> > On Thu, Dec 26, 2013 at 03:10:59PM -0500, Arshak Navruzyan wrote:
> > > Hello,
> > > I am trying to get my head around Accumulo schema designs. I went
> > > through a lot of trouble to get the wikisearch example running, but
> > > since the data is in protobuf lists, it's not that illustrative (for a
> > > newbie).
> > > Would love to find another example that is a little simpler to
> > > understand. In particular I am interested in java/scala code that
> > > mimics the D4M schema design (not a Matlab guy).
> > > Thanks,
> > > Arshak
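[Editor's note, not part of the original thread] The table construction in Jeremy's Dec 28 message can be sketched in a few lines of code. Below is a minimal Python stand-in (the thread asks about Java/Scala, but the logic is language-agnostic): plain dicts play the role of the Accumulo tables and BatchWriters, and the degree counts are pre-summed client-side as Jeremy suggests for high ingest rates. The row keys in the example tables are consistent with reversing the timestamp digits and appending the machine name, which is what this sketch assumes; the helper names `row_key` and `lookup` are illustrative, not part of D4M. The TedgeText table (row key -> raw record text) is omitted for brevity.

```python
# Sketch of the D4M "exploded" schema from the thread above.
# Dicts stand in for Accumulo tables; a real ingester would use
# BatchWriters and (optionally) a SummingCombiner for degrees.

records = [
    # Machine, Pool, Load, ReadingTimestamp
    ("neptune", "west", 5, 1388191975000),
    ("neptune", "west", 9, 1388191975010),
    ("pluto", "east", 13, 1388191975090),
]

tedge = {}            # (rowKey, columnQualifier) -> 1
tedge_transpose = {}  # (columnQualifier, rowKey) -> 1
tedge_degree = {}     # columnQualifier -> running count

def row_key(timestamp, machine):
    # Reverse the timestamp digits (so hot recent writes spread across
    # tablets) and append the machine name, matching the example keys,
    # e.g. 1388191975000 -> "0005791918831-neptune".
    return str(timestamp)[::-1] + "-" + machine

for machine, pool, load, ts in records:
    rk = row_key(ts, machine)
    for col, val in [("Machine", machine), ("Pool", pool),
                     ("Load", load), ("ReadingTimestamp", ts)]:
        cq = "%s|%s" % (col, val)       # "type|value" exploded qualifier
        tedge[(rk, cq)] = 1
        tedge_transpose[(cq, rk)] = 1
        # Pre-sum degrees client-side before writing, per the advice
        # above about high ingest rates.
        tedge_degree[cq] = tedge_degree.get(cq, 0) + 1

def lookup(col, val):
    # Scan the transpose table for all row keys containing type|value.
    cq = "%s|%s" % (col, val)
    return {rk for (c, rk) in tedge_transpose if c == cq}

# How the degree table helps query optimization (question 2 in the
# thread): to AND two terms, scan the rarer (lower-degree) term first
# so the candidate set starts small.
terms = [("Pool", "west"), ("Load", "9")]
terms.sort(key=lambda t: tedge_degree.get("%s|%s" % t, 0))
rows = lookup(*terms[0])
for t in terms[1:]:
    rows &= lookup(*t)
print(sorted(rows))  # -> ['0105791918831-neptune']
```

Note the design choice this makes concrete: the transpose table repeats the indexed term once per matching row id (one entry per cell), rather than packing all row ids into a single value, which keeps each mutation independent at ingest time.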
