I might be causing more confusion. Consider the following:

{"name":"Josh", "age":85}

If you stored the attribute name in the colf and the type (string or int) in the colq, it works fine for the above document.

Now consider the following document, say, where there were multiple sources for my age and we didn't know which was reliable:

{"name":"Josh", "age":[40,85]}

In the aforementioned scheme, "rowid age:int -> 40" and "rowid age:int -> 85" would collapse on one another. These are the Map (as in your java.util.Map) semantics that Accumulo provides.
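
A minimal sketch of that collapse, using a plain java.util.TreeMap as a stand-in for a table. No Accumulo client is involved here; the class name, the method names and the "row colf:colq" string format are made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

class KeyCollapseSketch {
    // Builds a stand-in for an Accumulo Key from row, colf and colq (hypothetical format).
    static String key(String row, String colf, String colq) {
        return row + " " + colf + ":" + colq;
    }

    // Writes the {"name":"Josh", "age":[40,85]} document with only the type in the colq.
    static Map<String, String> store() {
        Map<String, String> table = new TreeMap<>();
        table.put(key("rowid", "name", "string"), "Josh");
        table.put(key("rowid", "age", "int"), "40");
        table.put(key("rowid", "age", "int"), "85"); // same key: replaces 40
        return table;
    }

    public static void main(String[] args) {
        // Prints {rowid age:int=85, rowid name:string=Josh}
        System.out.println(store());
    }
}
```

In Accumulo itself the two entries would differ only by timestamp, and as far as I know the default versioning iterator returns just the newest one, which gives the same end result as the map above.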

If you have very distinct data types (which it appears you do), this might not be of concern to you. Just be cognizant in your translation from EMF to Key that you aren't creating duplicate Keys unexpectedly.
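
One way to check for this during translation might look like the sketch below. It is plain Java and assumes each translated entry is available as a {row, colf, colq, value} array; that layout, the class name and the method name are hypothetical, not anything from EMF or Accumulo.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DuplicateKeyCheck {
    // Returns each "row colf:colq" key that occurs more than once in the entries.
    // Each entry is a hypothetical {row, colf, colq, value} array.
    // Assumes the fields contain no spaces or colons, so the joined key is unambiguous.
    static List<String> duplicateKeys(List<String[]> entries) {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String[] e : entries) {
            String key = e[0] + " " + e[1] + ":" + e[2];
            // add() returns false when the key was already present
            if (!seen.add(key) && !duplicates.contains(key)) {
                duplicates.add(key);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String[]> entries = new ArrayList<>();
        entries.add(new String[] {"rowid", "name", "string", "Josh"});
        entries.add(new String[] {"rowid", "age", "int", "40"});
        entries.add(new String[] {"rowid", "age", "int", "85"});
        // Prints [rowid age:int]
        System.out.println(duplicateKeys(entries));
    }
}
```

If the check reports anything, one option (an assumption on my part, not something your schema requires) is to put an extra distinguishing piece, such as a position index, alongside the type in the colq.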

On 4/25/14, 10:53 AM, Geoffry Roberts wrote:
Ok Josh, you have me worried.

I am storing the object's name in the colfam, e.g. "patientId"; the
object's data type goes in the colq, e.g. "org.hl7.v3.II"; then the value
in the colval.  I think the largest graph I'm likely to have is < 5k and
you say I could have memory problems.  This is a good topic.  How then can
I estimate?


On Fri, Apr 25, 2014 at 10:17 AM, Josh Elser wrote:

    Not necessarily. If you are storing just the type in the colq and
    have one value and type per document/row, you won't have a problem.
    If you have more than one value in a type per document/row, the last
    one you inserted will be what sticks (which is likely undesirable).

    Of course, this is also assuming there isn't some other uniquely
    identifying attribute in the colfam.


    On 4/25/14, 9:55 AM, Geoffry Roberts wrote:

        Thanks for the comments.

        I'm using the qualifier to tell me the type of the value. Sounds
        like I'm misusing it.

        My EMF documents are running no more than 5k so I gather a row
        will fit into memory well enough.


        On Fri, Apr 25, 2014 at 9:29 AM, Mike Drob wrote:

             Large rows are only an issue if you are going to try to put the
             entire row in memory at once. As long as you have small enough
             entries in the row, and can treat them individually, you
             should be fine.

             The qualifier is anything that you want to use to determine
             uniqueness across keys. So yes, this sounds fine, although
             possibly not fine-grained enough.

             Mike


             On Fri, Apr 25, 2014 at 9:11 AM, Geoffry Roberts wrote:

                 Interesting, multiple mutations that is.  Are we talking
                 multiples on the same row id?

                 Upon reflection, I realized the embedded thing is nothing
                 special.  I think I'll keep adding columns to a single
                 mutation.  This will make for a wide row, but I'm not
                 seeing that as a problem.  Am I being naive?

                 Another question if I may.  As I walk my graph, I must keep
                 track of the type of the value being persisted.  I am using
                 the qualifier for this, putting in it a URI that indicates
                 the type.  Is this a proper use for the qualifier?

                 Thanks for the discussion


                 On Thu, Apr 24, 2014 at 11:23 PM, William Slacum wrote:

                     Depending on your table schema, you'll probably want to
                     translate an object graph into multiple mutations.


                     On Thu, Apr 24, 2014 at 8:40 PM, David Medinets wrote:

                         If the sub-document changes, you'll need to search
                         the values of every Accumulo entry?


                         On Thu, Apr 24, 2014 at 5:31 PM, Geoffry Roberts wrote:

                             The use case is, I am walking a complex object
                             graph and persisting what I find there.  Said
                             object graph in my case is always EMF (Eclipse
                             Modeling Framework) compliant.  An EMF graph can
                             have in it references to--brace yourself--a
                             non-cross document containment reference.  When
                             using Mongo, these were persisted as a DBObject
                             embedded into a containing DBObject.  I'm trying
                             to decide whether I want to follow suit.

                             Any thoughts?


                             On Thu, Apr 24, 2014 at 4:03 PM, Sean Busbey wrote:

                                 Can you describe the use case more? Do you
                                 know what the purpose of the embedded
                                 changes is?


                                 On Thu, Apr 24, 2014 at 2:59 PM, Geoffry Roberts wrote:

                                     All,

                                     I am in the throes of converting someone
                                     else's code from MongoDB to Accumulo.  I
                                     am seeing a situation where one DBObject
                                     is being embedded into another DBObject.
                                     I see that Mutation supports a method
                                     called getRow() that returns a byte
                                     array.  I gather I can use this to
                                     achieve a similar result if I were so
                                     inclined.

                                     Am I so inclined?  i.e. Is this the way
                                     we do things in Accumulo?

                                     DBObject, roughly speaking, is Mongo's
                                     counterpart to Mutation.

                                     Thanks mucho

                                     --
                                     There are ways and there are ways,

                                     Geoffry Roberts




                                 --
                                 Sean




