Thanks for responding.

The RKEYs for nodes are N|<NodeID> and we have CF:CQs for each edge. We maintain the edge attributes as separate RKEYs using E<EdgeID>. I’m not sure what you mean by repeating the node id.

Mike

On Wed, Nov 6, 2013 at 9:58 AM, William Slacum <[email protected]> wrote:

> When you say schema, do you mean key schema? If so, why are you repeating
> the node id?
>
> Locality groups would help if you have larger swaths of data you wanted to
> group together and query discretely from other locality groups. For
> instance, I've seen key schemas where "in" and "out" edges are grouped
> together.
>
> At a system level, if you know some information about the distribution of
> the row values (in this case, it looks like node id and edge id), you can
> pre split the table by taking some samples out of that space. This would
> distribute the tablets around, making queries using the batch scanner
> faster by increasing the parallelism. This would also increase the number
> of input splits generated by the input format if you wanted to do batch
> processing on the entire graph.
>
> On Wed, Nov 6, 2013 at 9:19 AM, Michael Orr <[email protected]> wrote:
>
>> Hello,
>>
>> I’m working on an application that needs fast read performance. I’ve been
>> conducting some experiments starting with a single (pseudo-distributed)
>> cluster with the intent of scaling out. However, prior to doing so, I
>> wanted to get a good gauge for how fast a single tablet server can read.
>>
>> The application processes and stores graph data with the following schema:
>>
>> for nodes:
>> N|NodeID ID:NodeID EIN:EdgeID
>> EOUT:EdgeID .. lots of other attributes
>>
>> there can be multiple EIN and EOUT CFs for each node
>>
>> for edges
>> E|EdgeID ID:NodeID VIN:VertexID
>> EOUT:VertexID .. lots of other attributes
>>
>>
>> Scans into the system can be for entire graph or a subset of nodes and
>> edges. We generally pull navigational information first, then other
>> attributes later if needed. I’ve spent some time looking into using
>> locality groups but was curious if there are recommendations on backend
>> properties that could be set to improve read time particularly if memory
>> and space were not a concern.
>>
>> Thanks for your help!
>>
>> Mike
>>
>
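
The pre-splitting suggestion in the quoted reply (sample the row values, then split the table on points taken from that sample) could be sketched roughly as below. This is only an illustration under assumptions: the function name `pick_split_points`, the sample row keys, and the tablet count are all made up for the example, and the code does not talk to Accumulo at all. It only picks evenly spaced keys from a sorted sample; those keys would then be handed to whatever split mechanism is in use (for instance the shell's `addsplits` command).

```python
def pick_split_points(sample_rows, num_tablets):
    """Pick up to num_tablets - 1 row keys that cut the sorted,
    de-duplicated sample into roughly equal-sized ranges.

    sample_rows: iterable of row-key strings sampled from the table
                 (hypothetical input, e.g. "N|0001", "E|0042").
    num_tablets: desired number of tablets after splitting.
    """
    if num_tablets < 2:
        return []  # one tablet needs no split points

    # Sort the distinct keys the same way the table would order them
    # (plain lexicographic order is assumed here).
    rows = sorted(set(sample_rows))
    if not rows:
        return []

    splits = []
    for i in range(1, num_tablets):
        idx = i * len(rows) // num_tablets
        if idx == 0:
            continue  # sample too small to place this split point
        candidate = rows[idx]
        # Skip repeats, which occur when the sample is smaller
        # than the requested number of tablets.
        if not splits or candidate != splits[-1]:
            splits.append(candidate)
    return splits


if __name__ == "__main__":
    # Hypothetical sample of node row keys, for illustration only.
    sample = ["N|%03d" % i for i in range(8)]
    print(pick_split_points(sample, 4))
```

The quality of the splits depends entirely on how representative the sample is; a skewed sample gives skewed tablets, so this is a starting point rather than a tuning recipe.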
