Hmm.. I'll take a stab at giving you some guidelines and suggestions. I hope it
helps but I'll warn you I haven't done anything with CouchDB in awhile and I'm
not testing my map/reduce and queries. Keep that in mind.
Something that raised an eyebrow is that you are trying to model your document
(known for being non-relational) data similar to your tabular relational model
data.
So something that you do with relational databases is normalize your data to
try and store it only once. You then have to join the data back together
piecemeal to get what you want. Document databases are usually naturally
denormalized in comparison and you have shallow and few relationships. The
documents can tend to represent the core ideas for your application and may
match more to how you work with the data.
Naturally the trade off is storage costs, update costs for the duplicated data,
but could provide simpler and easy to work with data model.
Another thing to keep in mind is that defining a map function (as in the map
part of a view) is defining the index you want to do range queries on. A map
function is not a query, but a clustered index on the key/value pair which
point back to the original document (you could say they act as a non-clustered
index in that manner). From your e-mail I wasn't sure if that was understood.
With that said, I would suggest storing the gallery information inside the
artwork document. This duplicates the data but I'm guessing you would rarely
modify the galleries information and more likely add new galleries or maybe
switch out the gallery info for an artwork document. You could do something
similar with the artist information considering that this might not change very
often as well. Also keep in mind that you don't have to use GUIDs for your
keys. If the artist's full name or something similar provides the uniqueness
you could use that.
Perhaps the artist and gallery documents stay the same but the artwork document
looks like:
{
"type": "artwork",
"title": "Guernica",
"medium": "Oil on canvas",
"dimensions": "349 cm x 776 cm (137.4 in x 305.5 in)",
"artist": "Pablo Picasso",
"birth_year": 1881,
"artist_id": "<some guid>",
"gallery_id": "<some guid>",
"gallery": "Museo Reina Sofia"
}
That's just a possibility. I wouldn't store all the info in the artwork
document but just the major bits you know are less likely to change. Storing a
document like this means you can retrieve all the relevant pieces of
information in one query against a map function defined as:
function (doc) {
if (doc.type === 'artwork') {
emit(doc._id, null);
}
}
Notice that I left out the document as a value for the emit() call. This is
because the document's id is always stored with the key/value pair. Emitting
the document for the value is a trade off of retrieve speed versus
storage/update costs.
Anyway you can then query this and ask CouchDB to retrieve the documents.
Something like this with paging:
http://server/db/_design/art/_view/artwork?start_key="<guid>"&limit=101&include_docs=true
You could even not have a gallery document if you didn't need any extra info
with them. You could then do a map/reduce on the artwork documents to get the
list of galleries your data represents.
"galleries" : {
"map": function (doc) {
if (doc.type === 'artwork') {
emit(doc.gallery, 1);
}
},
"reduce": "_sum"
}
This example query would retrieve the galleries and the number of artworks for
each:
http://server/db/_design/art/_view/galleries?group=true
Assuming that the name is unique enough to specify the gallery that is. If you
did need to add extra data you could give the gallery document an id of it's
name to link the two.
As I said before you want shallow and few relationships but you can retrieve
them together if you are trying to go the other direction of the relationship.
In this situation if you wanted to retrieve an artist with all their artworks.
A map function such as:
function (doc) {
if (doc.type === 'artist') {
emit([doc._id, 0], null);
}
else if (doc.type === 'artwork') {
emit([doc.artist_id, 1], null);
}
}
would allow queries to retrieve an artist and all their artworks. You just have
to know that the 0 is the artist and the 1 is the artwork in the results. I use
those numbers to simply order the artist first. A query on the view with the
artist's guid would allow you to retrieve all the artworks as well.
http://server/db/_design/art/_view/artist_and_works?key="[<guid>]"&include_docs=true
Ok, dang. I may have gone off on a tangent but I hope this helps with your
CouchDB understanding and data modeling. Here are some links as well that may
help.
Relationships in CouchDB:
https://wiki.apache.org/couchdb/EntityRelationship
Views intro:
https://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
View query options:
https://wiki.apache.org/couchdb/HTTP_view_API#Querying_Options
How your keys sort the data in a view:
https://wiki.apache.org/couchdb/View_collation
Remember you can retrieve multiple ids at once:
https://wiki.apache.org/couchdb/HTTP_Bulk_Document_API#Fetch_Multiple_Documents_With_a_Single_Request
The questions many don't think to ask, incomplete but a couple with good info:
https://wiki.apache.org/couchdb/FUQ
Also I swear MongoDB's doc's had a decent page on document database data
modeling but I can't seem to dig it up right now.
If there are any mistakes I hope others with chime in.
--
Sean Copenhaver
On Tuesday, December 27, 2011 at 4:31 PM, Steven Ringo wrote:
Hi all,
(I originally posted this on the mobile-couchbase list, but was
recommended to post it here instead, since this is more of a general
couchdb question)
I am struggling with something that I know would be quite trivial to do
in SQL, but seems to be a lot more complex to do in couchdb. Perhaps 20
years of experience with SQL is clouding my judgement and so I am trying
rather to break away from the SQL/relational mindset and understand how
this would be better designed in couch.
I am developing an iOS art gallery app, and wish to display information
about artworks, artists and galleries.
The menu screen requires a list of all artworks together with the
gallery they are currently housed at and the artist that created the
artwork:
* artwork title
* artwork medium
* artwork dimensions
* artist full_name
* gallery name,
* gallery city
for each artwork, respectively.
Imagine the following objects in my database (I am not including all the
fields for sake of brevity. i.e. only the ones that are important).
| Artist
------
_id
full_name
birth_year
birth_place
Gallery
-------
_id
name
city
Artwork
-------
_id
title
medium
dimensions
artist_id_ (foreign key)
gallery_id_ (foreign key)
|
In SQL it would be as easy as one query with two joins:
|SELECT
artwork.title,
artwork.medium,
artwork.dimensions,
artist.full_name,
gallery.name (http://gallery.name),
gallery.city
FROM
artwork INNER JOIN artist
ON artwork.artist_id_ = artist._id INNER JOIN gallery
ON artwork.gallery_id_ = gallery._id
|
From this I would be able to get the data I need in one query (i.e. one
call to the database).
| Guernica | Pablo Picasso | Museo Reina Sofia, Madrid, Spain
Mona Lisa | Leonardo da Vinci | Musée du Louvre, Paris, France
David | Michelangelo | Uffizi Gallery, Florence, Italy.
|
Using couchdb I am storing the data in a very similar way to the way I
would in a relational database, represented as something along the lines of:
|{
"type" : "Artwork",
"_id" : "d665afaedde12b44c5159cf1782c5c1e",
"_rev" : "1-446201af8a1ddfa3759164a01be899de",
"artist_id" : "d665afaedde12b44c5159cf1782c2d0b",
"gallery_id" : "d665afaedde12b44c5159cf1782d44af",
"title" : "Guernica",
"medium" : "Oil on canvas",
"dimensions" : "349 cm × 776 cm (137.4 in × 305.5 in)"
}
{
"type" : "Artist",
"_id" : "d665afaedde12b44c5159cf1782c2d0b",
"_rev" : "1-8fa8a6dd50d9d9072b08db1a840128b1",
"full_name" : "Pablo Picasso",
"birth_year" : "1881"
}
{
"type" : "Gallery",
"_id" : "d665afaedde12b44c5159cf1782d44af",
"_rev" : "1-27a139e40a7f88d6465812eec3c73a0f",
"name" : "Museo Reina Sofia"
}
|
To get the same result in couch, I would have to do one query to get all
the artworks, e.g.
|function(doc) {
if(doc.type&& doc.type == 'Artist') {
emit(doc._id,doc);
}
}
|
then I would need to do a |GET| for the _id of each |Gallery| and
|Artist| returned per |Artwork| respectively to fill in the missing
information. This is usually known as the |n+1| queries problem (in this
case |2n+1|).
I have always worked on the assumption that the database should be
leveraged to do as much of the heavy lifting as possible with as few
network hits as possible (i.e. SQL is always faster than client
manipulation). With a list of 120 artworks, this means 241 HTTP GETs to
the database.
I am aware of "linked documents" in views
(http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views#Linked_documents),
however this seems to only work with one related row.
I can't really see how "view collation"
(http://wiki.apache.org/couchdb/View_collation) would work here either.
With view collation the correlated information ends up being on
different rows, and I think that would be better suited for where say a
single Artists or Gallery is displayed with associated Artworks. My case
is the opposite, whereby Artworks plus corresponding information from
Artists and Galleries is required.
I am also aware that one could embed each gallery or artist in each
artwork or vice versa, but that seems to be a lot of duplication, and
would make it difficult to change artist or gallery information if
required. Maybe this is how it should be done?
For this small app, this is not going to be a performance issue. I am
trying rather to break away from a SQL/relational mindset and understand
how this would be better designed in a database like couch.
Thanks,
Steve