Re: Accumulo: "BigTable" vs. "Document Model"

Josh Elser Fri, 04 Sep 2015 13:32:14 -0700

These days, I tend to lean towards breaking out each attribute in a record into discrete columns.

When you roll up multiple columns into a single value, you lose the ability to use the native column filtering (cf or cf+cq) that's built into Accumulo. Same goes for column visibilities (at least in the traditional sense). Deletes and updates are more difficult to reason about and require some extra coordination to work.
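
To make the filtering point concrete, here is a minimal Python sketch. It is not Accumulo client code: the tuples are a hypothetical in-memory stand-in for (row, cf, cq, value) entries, JSON stands in for whatever serialization the document model would use, and `fetch_column` and `fetch_field` are names invented for illustration. The first function mimics, roughly, what restricting a scan to a column family or column does; the second shows that a rolled-up value has to be deserialized before any one field can be looked at.

```python
import json

# Hypothetical stand-in for entries in the column-per-attribute layout.
column_entries = [
    ("1-20150101-1234", "Meta", "Timestamp", "123"),
    ("1-20150101-1234", "Sensor", "Barometer", "29.1"),
    ("1-20150101-1234", "Sensor", "Humidity", "40.2"),
    ("1-20150101-1234", "Sensor", "Temperature", "80.4"),
]

# Hypothetical stand-in for the same reading rolled up into one value.
document_entries = [
    ("1-20150101", "1234", "", json.dumps(
        {"Timestamp": 123, "Temperature": 80.4, "Humidity": 40.2, "Barometer": 29.1}
    )),
]


def fetch_column(entries, cf, cq=None):
    """Select by cf (and optionally cq) using only the key; values are never parsed."""
    return [e for e in entries if e[1] == cf and (cq is None or e[2] == cq)]


def fetch_field(entries, field):
    """Select one field from rolled-up values; every value must be deserialized."""
    return [(row, cf, json.loads(value)[field]) for row, cf, _cq, value in entries]


temps_from_columns = fetch_column(column_entries, "Sensor", "Temperature")
temps_from_documents = fetch_field(document_entries, "Temperature")
```

In the first case the selection is expressed purely in terms of the key, which is what lets the store do the work. In the second, the store only sees opaque bytes, so the selection logic has to live in your own code (client side or in a custom iterator).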

You can always aggregate many rows on the server dynamically if that makes it simpler to process things as one "entry".


Michael Moss wrote:

Hello, everyone.

I'd love to hear folks' input on using the "natural" data model of
Accumulo ("BigTable" style) vs more of a Document Model. I'll try to
succinctly describe with a contrived example.

Let's say I have one domain object I'd like to model, "SensorReadings".
A single entry might look something like the following with 4 distinct
CF, CQ pairs.

RowKey: DeviceID-YYYYMMDD-ReadingID (e.g. 1-20150101-1234)
CF: "Meta", CQ: "Timestamp", Value: <Some timestamp>
CF: "Sensor", CQ: "Temperature", Value: 80.4
CF: "Sensor", CQ: "Humidity", Value: 40.2
CF: "Sensor", CQ: "Barometer", Value: 29.1

I might do queries like "get me all SensorReadings for 2015 for DeviceID
= 1" and if I wanted to operate on each SensorReading as a single unit
(and not as the 4 'rows' it returns for each one), I'd either have to
aggregate the 4 CF, CQ pairs for each RowKey client side, or use
something like the WholeRowIterator.
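
As a rough illustration of the client-side option, the sketch below groups consecutive entries that share a row key into one dictionary per SensorReading. It assumes the entries arrive sorted by row, as a scan would return them; the tuple layout and the `group_by_row` helper are hypothetical, and this is only an analogue of what the WholeRowIterator achieves server side, not its actual encoding.

```python
from itertools import groupby

# Hypothetical scan results, already sorted by row key: (row, cf, cq, value).
entries = [
    ("1-20150101-1234", "Meta", "Timestamp", "123"),
    ("1-20150101-1234", "Sensor", "Barometer", "29.1"),
    ("1-20150101-1234", "Sensor", "Humidity", "40.2"),
    ("1-20150101-1234", "Sensor", "Temperature", "80.4"),
    ("1-20150101-1235", "Meta", "Timestamp", "124"),
    ("1-20150101-1235", "Sensor", "Barometer", "29.3"),
    ("1-20150101-1235", "Sensor", "Humidity", "41.0"),
    ("1-20150101-1235", "Sensor", "Temperature", "81.1"),
]


def group_by_row(sorted_entries):
    """Yield (row, {(cf, cq): value}) for each run of entries sharing a row key."""
    for row, group in groupby(sorted_entries, key=lambda e: e[0]):
        yield row, {(cf, cq): value for _row, cf, cq, value in group}


readings = list(group_by_row(entries))
```

Because `groupby` only merges adjacent items, this relies on the sorted order; unsorted input would need to be collected into a dictionary keyed by row instead.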

In addition, if I wanted to write a query like, "for DeviceID = 1 in
2015, return me SensorReadings where Temperature > 90, Humidity < 40,
Barometer > 31", I'd again have to either use the WholeRowIterator to
'see' each entire SensorReading in memory on the server for the compound
query, or I could take the intersection of the results of 3 parallel,
independent queries on the client side.
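
The two approaches to the compound query can be sketched side by side. This is plain Python over a hypothetical in-memory list of (row, cf, cq, value) tuples, with values kept as floats for brevity (in Accumulo they would be bytes); the extra readings, the `PREDICATES` table, and both function names are made up for illustration.

```python
# Hypothetical entries for DeviceID = 1 in 2015.
entries = [
    ("1-20150101-1234", "Sensor", "Temperature", 80.4),
    ("1-20150101-1234", "Sensor", "Humidity", 40.2),
    ("1-20150101-1234", "Sensor", "Barometer", 29.1),
    ("1-20150615-0042", "Sensor", "Temperature", 95.0),
    ("1-20150615-0042", "Sensor", "Humidity", 35.5),
    ("1-20150615-0042", "Sensor", "Barometer", 31.4),
    ("1-20150716-0077", "Sensor", "Temperature", 97.2),
    ("1-20150716-0077", "Sensor", "Humidity", 55.0),
    ("1-20150716-0077", "Sensor", "Barometer", 31.8),
]

# Temperature > 90, Humidity < 40, Barometer > 31
PREDICATES = {
    "Temperature": lambda v: v > 90,
    "Humidity": lambda v: v < 40,
    "Barometer": lambda v: v > 31,
}


def whole_row_query(all_entries):
    """Assemble each reading in full, then test every predicate against it."""
    rows = {}
    for row, _cf, cq, value in all_entries:
        rows.setdefault(row, {})[cq] = value
    return sorted(
        row for row, cols in rows.items()
        if all(cq in cols and pred(cols[cq]) for cq, pred in PREDICATES.items())
    )


def intersection_query(all_entries):
    """Run one single-column query per predicate, then intersect the row keys."""
    matches = [
        {row for row, _cf, cq, value in all_entries if cq == name and pred(value)}
        for name, pred in PREDICATES.items()
    ]
    return sorted(set.intersection(*matches))


by_whole_row = whole_row_query(entries)
by_intersection = intersection_query(entries)
```

Both return the same row keys; the difference is where the work happens. The whole-row version needs every column of a reading in memory at once, while the intersection version ships three partial result sets to the client before discarding most of them.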

Where I am going with this is, what are the thoughts around creating a
Java, Protobuf, Avro (etc) object with these 4 CF, CQ pairs as fields
and storing each SensorReading as a single 'Document'?

RowKey: DeviceID-YYYYMMDD
CF: ReadingID, Value: Protobuf(Timestamp=123, Temperature=80.4,
Humidity=40.2, Barometer=29.1)

This way you avoid having to use the WholeRowIterator and unless you
often have queries that only look at a tiny subset of your fields (let's
say just "Temperature"), the serialization costs seem similar since
Value is just bytes anyway.
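
A small sketch of what the 'Document' layout implies in practice, using JSON as a stand-in for Protobuf or Avro so the example stays self-contained. The function names and the (row, cf, value) tuple are assumptions for illustration, not part of any Accumulo API.

```python
import json


def encode_reading(device_id, day, reading_id, fields):
    """Pack one SensorReading into a single (row, cf, value-bytes) entry."""
    row = f"{device_id}-{day}"
    cf = str(reading_id)
    value = json.dumps(fields, sort_keys=True).encode("utf-8")
    return row, cf, value


def decode_reading(entry):
    """Unpack the value bytes back into a dict of fields."""
    row, cf, value = entry
    return row, cf, json.loads(value.decode("utf-8"))


def update_field(entry, name, new_value):
    """Changing one field is a read-modify-write of the whole value."""
    row, cf, fields = decode_reading(entry)
    fields[name] = new_value
    return row, cf, json.dumps(fields, sort_keys=True).encode("utf-8")


entry = encode_reading(
    1, "20150101", 1234,
    {"Timestamp": 123, "Temperature": 80.4, "Humidity": 40.2, "Barometer": 29.1},
)
updated = update_field(entry, "Temperature", 81.0)
```

Reading a whole SensorReading is a single decode, which is the appeal. The cost shows up in `update_field`: touching one attribute means fetching, decoding, and rewriting all of them.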

Appreciate folks' experience and wisdom here. Hope this makes sense,
happy to clarify.

Best.

-Mike
