Hello,
I’m working on an application that needs fast read performance. I’ve been
running some experiments on a single-node (pseudo-distributed) cluster,
with the intent of scaling out later. Before doing so, I want to gauge
how fast a single tablet server can serve reads.
The application processes and stores graph data with the following schema:
for nodes:
  N|NodeID   ID:NodeID   EIN:EdgeID   EOUT:EdgeID   .. lots of other attributes
  (there can be multiple EIN and EOUT CFs for each node)
for edges:
  E|EdgeID   ID:NodeID   VIN:VertexID   EOUT:VertexID   .. lots of other attributes
Scans can cover the entire graph or a subset of nodes and edges. We
generally pull navigational information first, then other attributes
later if needed. I’ve spent some time looking into locality groups, but
I’m curious whether there are recommended backend properties that could
be set to improve read performance, particularly if memory and space
are not a concern.
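
To make the access pattern concrete, here is a rough stand-alone sketch of the key layout and a navigation-only scan. It is plain Python over a sorted list, not the real client API. The sample IDs, the `name` and `weight` attributes, the `scan` helper, and the choice of which column families count as "navigational" are all assumptions for illustration only.

```python
from bisect import bisect_left

# Assumption: these column families from the schema above are the
# "navigational" ones that get fetched first.
NAV_FAMILIES = {"ID", "EIN", "EOUT", "VIN"}

# Hypothetical sample entries: (row, column family, column qualifier, value).
# Keeping them sorted mimics how a sorted key-value store lays out keys.
ENTRIES = sorted([
    ("N|n1", "ID", "n1", ""),
    ("N|n1", "EOUT", "e1", ""),
    ("N|n1", "name", "", "alice"),   # stands in for "other attributes"
    ("N|n2", "ID", "n2", ""),
    ("N|n2", "EIN", "e1", ""),
    ("E|e1", "VIN", "n1", ""),
    ("E|e1", "EOUT", "n2", ""),
    ("E|e1", "weight", "", "0.5"),   # stands in for "other attributes"
])


def scan(entries, row_prefix, families=None):
    """Return (row, cf, cq) for rows starting with row_prefix.

    If families is given, only those column families are returned,
    which is the filtering a navigation-first read relies on.
    """
    # Seek to the first entry whose row is >= the prefix.
    start = bisect_left(entries, (row_prefix,))
    result = []
    for row, cf, cq, _value in entries[start:]:
        if not row.startswith(row_prefix):
            break  # sorted order: no later row can match the prefix
        if families is None or cf in families:
            result.append((row, cf, cq))
    return result


# Navigation-only pass over all nodes, then a full read of one node.
nav_nodes = scan(ENTRIES, "N|", NAV_FAMILIES)
full_n1 = scan(ENTRIES, "N|n1")
```

The first call touches only the navigational families for every node row; the second pulls every attribute for a single node. That split is what I am hoping locality groups (or other settings) can make cheap on the server side.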
Thanks for your help!
Mike