Hi all,
(I originally posted this on the mobile-couchbase list, but was
recommended to post it here instead, since this is more of a general
couchdb question)
I am struggling with something that I know would be quite trivial to do
in SQL, but seems to be a lot more complex to do in couchdb. Perhaps 20
years of experience with SQL is clouding my judgement and so I am trying
rather to break away from the SQL/relational mindset and understand how
this would be better designed in couch.
I am developing an iOS art gallery app, and wish to display information
about artworks, artists and galleries.
The menu screen requires a list of all artworks together with the
gallery they are currently housed at and the artist that created the
artwork:
* artwork title
* artwork medium
* artwork dimensions
* artist full_name
* gallery name,
* gallery city
for each artwork, respectively.
Imagine the following objects in my database (I am not including all the
fields for sake of brevity. i.e. only the ones that are important).
| Artist
------
_id
full_name
birth_year
birth_place
Gallery
-------
_id
name
city
Artwork
-------
_id
title
medium
dimensions
artist_id_ (foreign key)
gallery_id_ (foreign key)
|
In SQL it would be as easy as one query with two joins:
|SELECT
artwork.title,
artwork.medium,
artwork.dimensions,
artist.full_name,
gallery.name,
gallery.city
FROM
artwork INNER JOIN artist
ON artwork.artist_id_ = artist._id INNER JOIN gallery
ON artwork.gallery_id_ = gallery._id
|
From this I would be able to get the data I need in one query (i.e. one
call to the database).
| Guernica | Pablo Picasso | Museo Reina Sofia, Madrid, Spain
Mona Lisa | Leonardo da Vinci | Musée du Louvre, Paris, France
David | Michelangelo | Uffizi Gallery, Florence, Italy.
|
Using couchdb I am storing the data in a very similar way to the way I
would in a relational database, represented as something along the lines of:
|{
"type" : "Artwork",
"_id" : "d665afaedde12b44c5159cf1782c5c1e",
"_rev" : "1-446201af8a1ddfa3759164a01be899de",
"artist_id" : "d665afaedde12b44c5159cf1782c2d0b",
"gallery_id" : "d665afaedde12b44c5159cf1782d44af",
"title" : "Guernica",
"medium" : "Oil on canvas",
"dimensions" : "349 cm × 776 cm (137.4 in × 305.5 in)"
}
{
"type" : "Artist",
"_id" : "d665afaedde12b44c5159cf1782c2d0b",
"_rev" : "1-8fa8a6dd50d9d9072b08db1a840128b1",
"full_name" : "Pablo Picasso",
"birth_year" : "1881"
}
{
"type" : "Gallery",
"_id" : "d665afaedde12b44c5159cf1782d44af",
"_rev" : "1-27a139e40a7f88d6465812eec3c73a0f",
"name" : "Museo Reina Sofia"
}
|
To get the same result in couch, I would have to do one query to get all
the artworks, e.g.
|function(doc) {
if(doc.type&& doc.type == 'Artist') {
emit(doc._id,doc);
}
}
|
then I would need to do a |GET| for the _id of each |Gallery| and
|Artist| returned per |Artwork| respectively to fill in the missing
information. This is usually known as the |n+1| queries problem (in this
case |2n+1|).
I have always worked on the assumption that the database should be
leveraged to do as much of the heavy lifting as possible with as few
network hits as possible (i.e. SQL is always faster than client
manipulation). With a list of 120 artworks, this means 241 HTTP GETs to
the database.
I am aware of "linked documents" in views
(http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views#Linked_documents),
however this seems to only work with one related row.
I can't really see how "view collation"
(http://wiki.apache.org/couchdb/View_collation) would work here either.
With view collation the correlated information ends up being on
different rows, and I think that would be better suited for where say a
single Artists or Gallery is displayed with associated Artworks. My case
is the opposite, whereby Artworks plus corresponding information from
Artists and Galleries is required.
I am also aware that one could embed each gallery or artist in each
artwork or vice versa, but that seems to be a lot of duplication, and
would make it difficult to change artist or gallery information if
required. Maybe this is how it should be done?
For this small app, this is not going to be a performance issue. I am
trying rather to break away from a SQL/relational mindset and understand
how this would be better designed in a database like couch.
Thanks,
Steve