Hi Arshak,

See interspersed below.

Regards. -Jeremy

On Dec 29, 2013, at 11:34 AM, Arshak Navruzyan <[email protected]> wrote:
> Jeremy,
>
> Thanks for the detailed explanation. Just a couple of final questions:
>
> 1. What's your advice on the transpose table as far as whether to repeat the
> indexed term (one per matching row id) or try to store all matching row ids
> from tedge in a single row in tedgetranspose (using protobuf for example)?
> What's the performance implication of each approach? In the paper you
> mentioned that if it's a few values they should just be stored together. Was
> there a cut-off point in your testing? Can you clarify?

I am not sure what you're asking.

> 2. You mentioned that the degrees should be calculated beforehand for high
> ingest rates. Doesn't this change Accumulo from being a true database to
> being more of an index? If changes to the data cause the degree table to get
> out of sync, it sounds like changes have to be applied elsewhere first and
> Accumulo has to be reloaded periodically. Or perhaps letting the degree
> table get out of sync is ok since it's just an assist...

My point was a very narrow comment on optimization in very high performance
situations. I probably shouldn't have mentioned it. If you ever have
performance issues with your degree tables, that would be the time to discuss
it. You may never encounter this issue.

> Thanks,
>
> Arshak
>
>
> On Sat, Dec 28, 2013 at 10:36 AM, Kepner, Jeremy - 0553 - MITLL
> <[email protected]> wrote:
> Hi Arshak,
> Here is how you might do it. We implement everything with batch writers
> and batch scanners. Note: if you are doing high ingest rates, the degree
> table can be tricky and usually requires pre-summing prior to ingestion to
> reduce the pressure on the accumulator inside of Accumulo. Feel free to ask
> further questions, as I would imagine there are details that still wouldn't
> be clear, in particular why we do it this way.
>
> Regards.
> -Jeremy
>
> Original data:
>
> Machine,Pool,Load,ReadingTimestamp
> neptune,west,5,1388191975000
> neptune,west,9,1388191975010
> pluto,east,13,1388191975090
>
>
> Tedge table:
> rowKey,columnQualifier,value
>
> 0005791918831-neptune,Machine|neptune,1
> 0005791918831-neptune,Pool|west,1
> 0005791918831-neptune,Load|5,1
> 0005791918831-neptune,ReadingTimestamp|1388191975000,1
> 0105791918831-neptune,Machine|neptune,1
> 0105791918831-neptune,Pool|west,1
> 0105791918831-neptune,Load|9,1
> 0105791918831-neptune,ReadingTimestamp|1388191975010,1
> 0905791918831-pluto,Machine|pluto,1
> 0905791918831-pluto,Pool|east,1
> 0905791918831-pluto,Load|13,1
> 0905791918831-pluto,ReadingTimestamp|1388191975090,1
>
>
> TedgeTranspose table:
> rowKey,columnQualifier,value
>
> Machine|neptune,0005791918831-neptune,1
> Pool|west,0005791918831-neptune,1
> Load|5,0005791918831-neptune,1
> ReadingTimestamp|1388191975000,0005791918831-neptune,1
> Machine|neptune,0105791918831-neptune,1
> Pool|west,0105791918831-neptune,1
> Load|9,0105791918831-neptune,1
> ReadingTimestamp|1388191975010,0105791918831-neptune,1
> Machine|pluto,0905791918831-pluto,1
> Pool|east,0905791918831-pluto,1
> Load|13,0905791918831-pluto,1
> ReadingTimestamp|1388191975090,0905791918831-pluto,1
>
>
> TedgeDegree table:
> rowKey,columnQualifier,value
>
> Machine|neptune,Degree,2
> Pool|west,Degree,2
> Load|5,Degree,1
> ReadingTimestamp|1388191975000,Degree,1
> Load|9,Degree,1
> ReadingTimestamp|1388191975010,Degree,1
> Machine|pluto,Degree,1
> Pool|east,Degree,1
> Load|13,Degree,1
> ReadingTimestamp|1388191975090,Degree,1
>
>
> TedgeText table:
> rowKey,columnQualifier,value
>
> 0005791918831-neptune,Text,< ... raw text of original log ...>
> 0105791918831-neptune,Text,< ... raw text of original log ...>
> 0905791918831-pluto,Text,< ... raw text of original log ...>
>
> On Dec 27, 2013, at 8:01 PM, Arshak Navruzyan <[email protected]> wrote:
>
> > Jeremy,
> >
> > Wow, didn't expect to get help from the author :)
> >
> > How about something simple like this:
> >
> > Machine   Pool   Load   ReadingTimestamp
> > neptune   west   5      1388191975000
> > neptune   west   9      1388191975010
> > pluto     east   13     1388191975090
> >
> > These are the areas I am unclear on:
> >
> > 1. Should the transpose table be built as part of the ingest code or as an
> > Accumulo combiner?
> > 2. What does the degree table do in this example? The paper mentions
> > it's useful for query optimization. How?
> > 3. Does D4M accommodate "repurposing" the row_id to a partition key? The
> > wikisearch example shows how the partition id is important for parallel
> > scans of the index. But since Accumulo is a row store, how can you do fast
> > lookups by row if you've used the row_id as a partition key?
> >
> > Thank you,
> >
> > Arshak
> >
> > On Thu, Dec 26, 2013 at 5:31 PM, Jeremy Kepner <[email protected]> wrote:
> > Hi Arshak,
> > Maybe you can send a few (~3) records of data that you are familiar with
> > and we can walk you through how the D4M schema would be applied to those
> > records.
> >
> > Regards. -Jeremy
> >
> > On Thu, Dec 26, 2013 at 03:10:59PM -0500, Arshak Navruzyan wrote:
> > > Hello,
> > > I am trying to get my head around Accumulo schema designs. I went
> > > through a lot of trouble to get the wikisearch example running, but
> > > since the data is in protobuf lists, it's not that illustrative (for a
> > > newbie).
> > > Would love to find another example that is a little simpler to
> > > understand. In particular I am interested in java/scala code that
> > > mimics the D4M schema design (not a Matlab guy).
> > > Thanks,
> > > Arshak
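[Editor's note, not part of the original thread] The table construction in Jeremy's Dec 28 message can be sketched in a few lines of code. Below is a minimal Python stand-in (the thread asks about Java/Scala, but the logic is language-agnostic): plain dicts play the role of the Accumulo tables and BatchWriters, and the degree counts are pre-summed client-side as Jeremy suggests for high ingest rates. The row keys in the example tables are consistent with reversing the timestamp digits and appending the machine name, which is what this sketch assumes; the helper names `row_key` and `lookup` are illustrative, not part of D4M. The TedgeText table (row key -> raw record text) is omitted for brevity.

```python
# Sketch of the D4M "exploded" schema from the thread above.
# Dicts stand in for Accumulo tables; a real ingester would use
# BatchWriters and (optionally) a SummingCombiner for degrees.

records = [
    # Machine, Pool, Load, ReadingTimestamp
    ("neptune", "west", 5, 1388191975000),
    ("neptune", "west", 9, 1388191975010),
    ("pluto", "east", 13, 1388191975090),
]

tedge = {}            # (rowKey, columnQualifier) -> 1
tedge_transpose = {}  # (columnQualifier, rowKey) -> 1
tedge_degree = {}     # columnQualifier -> running count

def row_key(timestamp, machine):
    # Reverse the timestamp digits (so hot recent writes spread across
    # tablets) and append the machine name, matching the example keys,
    # e.g. 1388191975000 -> "0005791918831-neptune".
    return str(timestamp)[::-1] + "-" + machine

for machine, pool, load, ts in records:
    rk = row_key(ts, machine)
    for col, val in [("Machine", machine), ("Pool", pool),
                     ("Load", load), ("ReadingTimestamp", ts)]:
        cq = "%s|%s" % (col, val)       # "type|value" exploded qualifier
        tedge[(rk, cq)] = 1
        tedge_transpose[(cq, rk)] = 1
        # Pre-sum degrees client-side before writing, per the advice
        # above about high ingest rates.
        tedge_degree[cq] = tedge_degree.get(cq, 0) + 1

def lookup(col, val):
    # Scan the transpose table for all row keys containing type|value.
    cq = "%s|%s" % (col, val)
    return {rk for (c, rk) in tedge_transpose if c == cq}

# How the degree table helps query optimization (question 2 in the
# thread): to AND two terms, scan the rarer (lower-degree) term first
# so the candidate set starts small.
terms = [("Pool", "west"), ("Load", "9")]
terms.sort(key=lambda t: tedge_degree.get("%s|%s" % t, 0))
rows = lookup(*terms[0])
for t in terms[1:]:
    rows &= lookup(*t)
print(sorted(rows))  # -> ['0105791918831-neptune']
```

Note the design choice this makes concrete: the transpose table repeats the indexed term once per matching row id (one entry per cell), rather than packing all row ids into a single value, which keeps each mutation independent at ingest time.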
