You could use a server-side iterator that does the filtering and returns the protobuf value only for matching rows.
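A minimal sketch of that idea, written in plain Python so it runs without an Accumulo cluster. The table contents, the `filtering_scan` and `matches` names, and the use of JSON in place of Protobuf are all assumptions made for illustration; the first reading reuses the values from the quoted message and the second is made-up sample data.

```python
import json

# Hypothetical stand-in for a table in the document layout:
# (row, column family) -> serialized reading. JSON replaces Protobuf here
# only so the sketch has no external dependencies.
TABLE = {
    ("1-20150101", "1234"): json.dumps(
        {"timestamp": 1420070400, "temperature": 80.4,
         "humidity": 40.2, "barometer": 29.1}
    ).encode(),
    ("1-20150102", "1235"): json.dumps(
        {"timestamp": 1420156800, "temperature": 93.0,
         "humidity": 35.5, "barometer": 31.4}
    ).encode(),
}


def matches(reading):
    """Compound predicate from the original question."""
    return (
        reading["temperature"] > 90
        and reading["humidity"] < 40
        and reading["barometer"] > 31
    )


def filtering_scan(table, row_prefix, predicate):
    """Mimic a server-side filter: decode each value, emit only matches."""
    for (row, reading_id), value in sorted(table.items()):
        if not row.startswith(row_prefix):
            continue  # outside the scanned range
        reading = json.loads(value)
        if predicate(reading):
            # The serialized value is passed through untouched.
            yield row, reading_id, value


results = list(filtering_scan(TABLE, "1-2015", matches))
```

In Accumulo itself this would roughly correspond to extending the `Filter` iterator and deserializing the value inside `accept(Key, Value)`, so only matching entries cross the network.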
-Eric

On Fri, Sep 4, 2015 at 11:42 AM, Michael Moss <michael.m...@gmail.com> wrote:

> Hello, everyone.
>
> I'd love to hear folks' input on using the "natural" data model of
> Accumulo ("BigTable" style) vs more of a Document Model. I'll try to
> succinctly describe with a contrived example.
>
> Let's say I have one domain object I'd like to model, "SensorReadings". A
> single entry might look something like the following with 4 distinct CF, CQ
> pairs.
>
> RowKey: DeviceID-YYYYMMDD-ReadingID (e.g. 1-20150101-1234)
> CF: "Meta", CQ: "Timestamp", Value: <Some timestamp>
> CF: "Sensor", CQ: "Temperature", Value: 80.4
> CF: "Sensor", CQ: "Humidity", Value: 40.2
> CF: "Sensor", CQ: "Barometer", Value: 29.1
>
> I might do queries like "get me all SensorReadings for 2015 for DeviceID =
> 1" and if I wanted to operate on each SensorReading as a single unit (and
> not as the 4 'rows' it returns for each one), I'd either have to aggregate
> the 4 CF, CQ pairs for each RowKey client side, or use something like the
> WholeRowIterator.
>
> In addition, if I wanted to write a query like, "for DeviceID = 1 in 2015,
> return me SensorReadings where Temperature > 90, Humidity < 40, Barometer >
> 31", I'd again have to either use the WholeRowIterator to 'see' each entire
> SensorReading in memory on the server for the compound query, or I could
> take the intersection of the results of 3 parallel, independent queries on
> the client side.
>
> Where I am going with this is, what are the thoughts around creating a
> Java, Protobuf, Avro (etc) object with these 4 CF, CQ pairs as fields and
> storing each SensorReading as a single 'Document'?
>
> RowKey: DeviceID-YYYYMMDD
> CF: ReadingID Value: Protobuf(Timestamp=123, Temperature=80.4,
> Humidity=40.2, Barometer = 29.1)
>
> This way you avoid having to use the WholeRowIterator and unless you often
> have queries that only look at a tiny subset of your fields (let's say just
> "Temperature"), the serialization costs seem similar since Value is just
> bytes anyway.
>
> Appreciate folks' experience and wisdom here. Hope this makes sense, happy
> to clarify.
>
> Best.
>
> -Mike
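For comparison, the client-side aggregation that the quoted message describes for the "natural" layout can be sketched as below. This is only an illustration in plain Python: the `ENTRIES` list, the timestamps, the second reading, and the `group_rows` and `matches` names are hypothetical, and the grouping step stands in for what a whole-row iterator would do on the server.

```python
from itertools import groupby

# Entries as a scan of the "natural" layout might return them:
# (row, column family, column qualifier, value), already sorted by key.
ENTRIES = [
    ("1-20150101-1234", "Meta", "Timestamp", "1420070400"),
    ("1-20150101-1234", "Sensor", "Barometer", "29.1"),
    ("1-20150101-1234", "Sensor", "Humidity", "40.2"),
    ("1-20150101-1234", "Sensor", "Temperature", "80.4"),
    ("1-20150102-1235", "Meta", "Timestamp", "1420156800"),
    ("1-20150102-1235", "Sensor", "Barometer", "31.4"),
    ("1-20150102-1235", "Sensor", "Humidity", "35.5"),
    ("1-20150102-1235", "Sensor", "Temperature", "93.0"),
]


def group_rows(entries):
    """Collapse the consecutive entries of each row into one dict."""
    for row, cells in groupby(entries, key=lambda e: e[0]):
        yield row, {(cf, cq): value for _, cf, cq, value in cells}


def matches(cells):
    """Compound predicate; it needs every Sensor cell of the row at once."""
    return (
        float(cells[("Sensor", "Temperature")]) > 90
        and float(cells[("Sensor", "Humidity")]) < 40
        and float(cells[("Sensor", "Barometer")]) > 31
    )


hits = [row for row, cells in group_rows(ENTRIES) if matches(cells)]
```

The grouping relies on the entries arriving in sorted key order, which is why the predicate can only be evaluated once all cells of a row have been seen.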