I might be causing more confusion. Consider the following:
{"name":"Josh", "age":85}
If you stored the attribute name in the colf and the type (string or
int) in the colq, it works fine for the above document.
Now consider the following document, say where there were multiple
sources of my age and we didn't know which was reliable:
{"name":"Josh", "age":[40,85]}
In the aforementioned scheme, "rowid age:int -> 40" and "rowid age:int
-> 85" would collapse on one another. These are the Map (as in your
java.util.Map) semantics that Accumulo provides.
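To make the collapse concrete, here is a minimal sketch that models a table as a plain java.util.TreeMap rather than using the Accumulo client API. The class name, the key() helper, and the flattened string key are hypothetical, and the model deliberately ignores visibility and timestamps; it only illustrates the Map-style behavior described above.

```java
import java.util.Map;
import java.util.TreeMap;

public class KeyCollapseSketch {
    // Hypothetical helper: flattens row, colf, and colq into one sortable key.
    static String key(String row, String colf, String colq) {
        return row + " " + colf + ":" + colq;
    }

    // Writes both ages under the same key and returns what survives.
    static Map<String, String> writeAges() {
        Map<String, String> table = new TreeMap<>();
        table.put(key("rowid", "age", "int"), "40");
        table.put(key("rowid", "age", "int"), "85"); // same key: replaces 40
        return table;
    }

    public static void main(String[] args) {
        Map<String, String> table = writeAges();
        System.out.println(table.size());                          // 1
        System.out.println(table.get(key("rowid", "age", "int"))); // 85
    }
}
```

Only one entry survives, and it holds the value written last.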
If you have very distinct data types (which it appears you do), this
might not be of concern to you. Just be cognizant in your translation
from EMF to Key that you aren't creating duplicate Keys unexpectedly.
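One way to catch this during translation, sketched with plain Java collections only: collect the (row, colf, colq) of every entry you are about to write and flag any that repeat. The Entry class and the duplicateKeys() helper are hypothetical stand-ins for whatever the EMF walker actually produces, not part of Accumulo.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DuplicateKeyCheck {
    // Hypothetical stand-in for one translated attribute.
    static final class Entry {
        final String row, colf, colq, value;

        Entry(String row, String colf, String colq, String value) {
            this.row = row;
            this.colf = colf;
            this.colq = colq;
            this.value = value;
        }

        // The part of the key that must be unique within the table.
        String keyPart() {
            return row + " " + colf + ":" + colq;
        }
    }

    // Returns every key part that occurs more than once, in first-seen order.
    static Set<String> duplicateKeys(List<Entry> entries) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (Entry e : entries) {
            if (!seen.add(e.keyPart())) {
                duplicates.add(e.keyPart());
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<Entry> entries = List.of(
                new Entry("rowid", "name", "string", "Josh"),
                new Entry("rowid", "age", "int", "40"),
                new Entry("rowid", "age", "int", "85"));
        System.out.println(duplicateKeys(entries)); // [rowid age:int]
    }
}
```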
On 4/25/14, 10:53 AM, Geoffry Roberts wrote:
Ok Josh, you have me worried.
I am storing the object's name in the colfam, e.g. "patientId"; the
object's data type goes in the colq, e.g. "org.hl7.v3.II"; then the value
goes in the colval. I think the largest graph I'm likely to have is < 5k,
and you say I could have memory problems. This is a good topic. How then
can I estimate?
On Fri, Apr 25, 2014 at 10:17 AM, Josh Elser <[email protected]> wrote:
Not necessarily. If you are storing just the type in the colq and
have one value and type per document/row, you won't have a problem.
If you have more than one value in a type per document/row, the last
one you inserted will be what sticks (which is likely undesirable).
Of course, this is also assuming there isn't some other uniquely
identifying attribute in the colfam.
On 4/25/14, 9:55 AM, Geoffry Roberts wrote:
Thanks for the comments.
I'm using the qualifier to tell me the type of the value. Sounds like
I'm misusing it.
My EMF documents are running no more than 5k, so I gather a row will fit
into memory well enough.
On Fri, Apr 25, 2014 at 9:29 AM, Mike Drob <[email protected]> wrote:
Large rows are only an issue if you are going to try to put the
entire row in memory at once. As long as you have small enough
entries in the row, and can treat them individually, you
should be fine.
The qualifier is anything that you want to use to determine
uniqueness across keys. So yes, this sounds fine, although possibly
not fine-grained enough.
Mike
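As an illustration of what a finer-grained qualifier could look like, the sketch below appends a position index to the type so that repeated values of one type no longer share a key. The "type#index" convention, the class name, and the store() helper are assumptions made for this example only; the table is again modeled as a plain TreeMap, not the Accumulo client API.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FineGrainedQualifier {
    // Stores each value under "row colf:type#index" (hypothetical convention).
    static Map<String, String> store(String row, String colf, String type,
                                     List<String> values) {
        Map<String, String> table = new TreeMap<>();
        for (int i = 0; i < values.size(); i++) {
            String colq = type + "#" + i; // the index makes the qualifier unique
            table.put(row + " " + colf + ":" + colq, values.get(i));
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, String> table = store("rowid", "age", "int", List.of("40", "85"));
        System.out.println(table.size()); // 2
    }
}
```

The type can still be recovered by splitting the qualifier on the separator, at the cost of a small parsing step on read.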
On Fri, Apr 25, 2014 at 9:11 AM, Geoffry Roberts <[email protected]> wrote:
Interesting, multiple mutations that is. Are we talking
multiples on the same row id?
Upon reflection, I realized the embedded thing is nothing
special. I think I'll keep adding columns to a single mutation.
This will make for a wide row, but I'm not seeing that as a
problem. Am I being naive?
Another question, if I may. As I walk my graph, I must keep
track of the type of the value being persisted. I am using the
qualifier for this, putting in it a URI that indicates the type.
Is this a proper use for the qualifier?
Thanks for the discussion
On Thu, Apr 24, 2014 at 11:23 PM, William Slacum
<[email protected]> wrote:
Depending on your table schema, you'll probably want to
translate an object graph into multiple mutations.
On Thu, Apr 24, 2014 at 8:40 PM, David Medinets <[email protected]> wrote:
If the sub-document changes, you'll need to search the
values of every Accumulo entry?
On Thu, Apr 24, 2014 at 5:31 PM, Geoffry Roberts <[email protected]> wrote:
The use case is this: I am walking a complex object graph
and persisting what I find there. Said object graph
in my case is always EMF (Eclipse Modeling Framework)
compliant. An EMF graph can have in it references
to--brace yourself--a non-cross document containment
reference. When using Mongo, these were persisted as a
DBObject embedded into a containing DBObject. I'm trying
to decide whether I want to follow suit.
Any thoughts?
On Thu, Apr 24, 2014 at 4:03 PM, Sean Busbey <[email protected]> wrote:
Can you describe the use case more? Do you know
what the purpose for the embedded changes are?
On Thu, Apr 24, 2014 at 2:59 PM, Geoffry Roberts <[email protected]> wrote:
All,
I am in the throes of converting someone else's code from
MongoDB to Accumulo. I am seeing a situation where one
DBObject is being embedded into another DBObject. I
see that Mutation supports a method called getRow() that
returns a byte array. I gather I can use this to achieve a
similar result if I were so inclined.
Am I so inclined? I.e., is this the way we do things in
Accumulo?
DBObject, roughly speaking, is Mongo's counterpart to
Mutation.
Thanks mucho
--
There are ways and there are ways,
Geoffry Roberts
--
Sean