Thanks for your email, Ryan.

Let me give you some more information on what I'm trying to do.
Essentially, I have to create a "sort of CMDB" system that stores not only
configuration data but also operational data (so I guess you could call it an
OMDB instead).

Either way, my company develops a meta-scheduler that can be used for HPC or 
Cloud environments. It will guarantee that your resources are used the best way 
possible, maximizing their usage, based on the policies you set up in it.

To do that, our software needs to be aware of how the environment looks and 
this is why an OMDB piece is very important for us (as it allows us to store 
information on the environment).

Also, our software talks to external resource managers through a protocol we
developed more than a dozen years ago called "WIKI" (not as in "Wikipedia", but
as in the Hawaiian word for "fast"). That protocol is heavily based on
key/value pairs, which is one of the reasons I was EXTREMELY excited to find
out that, with CouchDB's "view" functionality, I would be able to map document
attributes to more meaningful attributes that our software understands (e.g.
map the document's "available_cores" attribute to "ccores" [the "consumable
cores" parameter our software understands]).
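
To make that concrete, here is a rough sketch of the kind of mapping I mean.
It is plain Python for illustration only (a real CouchDB view would be a
JavaScript map function in a design document), and FIELD_MAP and to_wiki_pairs
are made-up names; only the available_cores -> ccores pair comes from the
example above.

```python
# Hypothetical table: document attribute name -> key name our software expects.
# Only "available_cores" -> "ccores" is taken from the text; extend as needed.
FIELD_MAP = {
    "available_cores": "ccores",
}


def to_wiki_pairs(doc):
    """Return (key, value) pairs for the mapped attributes present in doc."""
    return [(FIELD_MAP[name], doc[name]) for name in FIELD_MAP if name in doc]


# Unmapped attributes (such as "vendor" here) are simply left out.
node = {"_id": "node01", "available_cores": 4, "vendor": "acme"}
print(to_wiki_pairs(node))
```

A customer-specific attribute would then only need one more entry in the
table, without touching the stored documents.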

Another important thing to note is that resources can be of different types:
node (for bare-metal nodes), vm (for VMs running on nodes) and storage (we can
actually have more types, but those are enough to illustrate what I'm
talking about).

This is why I created those "big documents" instead of smaller ones!
For instance, each document would represent an entire node (procs, memory,
etc.).

So my idea was to have an external process initially populate the database with
documents representing ALL the nodes we are managing (which is why I started my
benchmarks with 100K increments), and OTHER external processes (e.g. other types
of resource managers) would update individual attributes in each document.

Let's imagine a document with id "node01":
These fields would be updated by an agent that collected some of the hardware 
specs:
        ccores: 4 // total cores on machine
        acores: 4 // available cores on machine
        cmemory: 4096 // total memory on machine
        amemory: 1024 // available memory
        cpuload: 94%
This field would be updated by our storage resource manager:
        GMETRIC["disk"]: 1000000
And, for instance, these fields would be updated by a network resource manager:
        GMETRIC["NETIO"]: { "in":100, "out":200 }

So, as you can see, different processes would manage the same document (just 
different attributes in it).
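
As a rough sketch of that pattern: each writer re-reads the document, merges
only its own attributes, and retries if another writer got in first. TinyStore
is an in-memory stand-in I made up for illustration (it is NOT CouchDB's API),
and update_fields and the integer revisions are assumptions as well.

```python
class ConflictError(Exception):
    """Raised when a write is based on a stale revision."""


class TinyStore:
    """In-memory stand-in for a revision-checked document store."""

    def __init__(self):
        self._docs = {}  # doc id -> (revision number, body dict)

    def get(self, doc_id):
        rev, body = self._docs.get(doc_id, (0, {}))
        return rev, dict(body)  # hand out a copy of the stored body

    def put(self, doc_id, rev, body):
        current_rev, _ = self._docs.get(doc_id, (0, {}))
        if rev != current_rev:  # someone else wrote in between
            raise ConflictError(doc_id)
        self._docs[doc_id] = (current_rev + 1, dict(body))
        return current_rev + 1


def update_fields(store, doc_id, fields, max_tries=5):
    """Merge fields into the document, retrying on revision conflicts."""
    for _ in range(max_tries):
        rev, body = store.get(doc_id)
        body.update(fields)  # touch only the attributes this writer owns
        try:
            return store.put(doc_id, rev, body)
        except ConflictError:
            continue  # re-read the latest revision and try again
    raise ConflictError(doc_id)


store = TinyStore()
update_fields(store, "node01", {"acores": 4, "cmemory": 4096})  # hardware agent
update_fields(store, "node01", {"GMETRIC": {"disk": 1000000}})  # storage manager
print(store.get("node01"))
```

With this shape, two writers touching the same field end up with the last one
winning, while writers touching different fields keep each other's changes.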

And the REALLY cool thing about views is that our customers could
VERY easily adapt the database so that it would store THEIR extra data and
shove it into a generic parameter that our software would understand (e.g. the
GMETRIC parameters are generic metrics).

So, based on these requirements, do you have any suggestions on how we should
store our data (keeping its structure easy enough for external consumers to
maintain without having to bust their heads figuring out the logic behind
the document attributes)? :o)

Thank you!
Luis Miguel Silva

On Apr 5, 2011, at 6:45 PM, Ryan Ramage <[email protected]> wrote:

> Luis,
> 
> Having the rev is very important when you update a doc. It lets you
> know that your piece of information is out of date. This is a good
> thing....
> 
> I am wondering if the way you are modeling your data is not leading
> you to do this update with less chance of conflict. See if you can
> break your docs into even smaller docs. For example, I noticed from a
> prior post you had a lot of Arrays in your docs. If multiple processes
> are changing that array, you might be better served by making each
> element in the array a separate doc.
> 
> Ryan
> 
> On Tue, Apr 5, 2011 at 4:41 PM, Luis Miguel Silva
> <[email protected]> wrote:
>> More or less!
>> 
>> The most common scenario will be:
>> - two or more processes writing to the same document, but only to a
>> specific attribute (not overwriting the whole document)
>> 
>> If, by any chance, two processes overwrite the same field, i'm ok with
>> the last one always winning.
>> 
>> Thanks,
>> Luis
>> 
>> On Tue, Apr 5, 2011 at 4:26 PM, Robert Newson <[email protected]> 
>> wrote:
>>> "Ideally, we would be able to update without specifying the _rev, just
>>> posting (or, in this case PUTting) to the document..."
>>> 
>>> So you want to blindly overwrite some unknown data?
>>> 
>>> B.
>>> 
>>> On 5 April 2011 22:57, Zachary Zolton <[email protected]> wrote:
>>>> Luis,
>>>> 
>>>> Checkout _update handlers:
>>>> 
>>>> http://wiki.apache.org/couchdb/Document_Update_Handlers
>>>> 
>>>> 
>>>> Cheers,
>>>> 
>>>> Zach
>>>> 
>>>> On Tue, Apr 5, 2011 at 4:46 PM, Luis Miguel Silva
>>>> <[email protected]> wrote:
>>>>> Dear all,
>>>>> 
>>>>> I'm trying to play around with updates and i'm bumping into some problems.
>>>>> 
>>>>> Let's imagine we have two clients that poll a document from the server at
>>>>> the same time and get the same _rev.
>>>>> Then one of them updates the doc based on the _rev it got:
>>>>> [root@xkitten ~]# curl -X PUT -d
>>>>> '{"_rev":"3-0d519bcf08130bf784f3c35d79760740","hello2":"fred2"}'
>>>>> http://localhost:5984/benchmark/test?conflicts=true
>>>>> {"ok":true,"id":"test","rev":"4-03640ebafbb4fcaf127844671f8e2de7"}
>>>>> Then another one tries to update the doc based on the same exact _rev:
>>>>> [root@xkitten ~]# curl -X PUT -d
>>>>> '{"_rev":"3-0d519bcf08130bf784f3c35d79760740","hello3":"fred3"}'
>>>>> http://localhost:5984/benchmark/test?conflicts=true
>>>>> {"error":"conflict","reason":"Document update conflict."}
>>>>> [root@xkitten ~]#
>>>>> 
>>>>> Is there a way to avoid this?! (like...make the update just create a
>>>>> new _rev or something)??
>>>>> 
>>>>> Ideally, we would be able to update without specifying the _rev, just
>>>>> posting (or, in this case PUTting) to the document...
>>>>> 
>>>>> Thoughts??
>>>>> 
>>>>> Thank you,
>>>>> Luis
>>>>> 
>>>> 
>>> 
>> 
