On Sun, 15 Nov 2015, at 07:46 PM, Dale Scott wrote:
> Hi, I've been lurking for a while and have a use case and architecture
> that I'd appreciate comments on. I've never personally built anything
> like this before.
>
> Without intentionally obfuscating, I have 128GB of data collected from an
> experiment, roughly equivalent to a large set of 640x480 PNG images.
> Images are independent and analyzed image-by-image by an image
> recognition algorithm. I was thinking of dividing the set of images into
> sub-sets by a scheduler and have a new EC2 instance analyze each sub-set.
>
> Are there any places in this scenario where couchdb would shine?
> Replicating a master couchdb image recognition library to each new EC2
> instance? Replicating the analysis results from each EC2 instance to a
> master couchdb database?
>
> Thanks!
> 
> ---
> 
> Dale R. Scott, P.Eng.
> 
> Transparency with Trust

Welcome Dale!

This sounds roughly like you have a message passing workflow:

- Jobs are inserted into the system
- N workers process Y jobs
- The results are stored (or collated...)

For a pure couchdb approach, see https://github.com/iriscouch/cqs and
https://github.com/jo/couch-daemon. The links in the latter may be
particularly interesting for your obfuscated use case. The general idea
is to have workers actively pull jobs from a couchdb database, update
each job doc with a time-stamped reservation, and run a reaper process
that returns slow workers' docs to the queue for another, hopefully
faster, worker to pick up. Using this with attachments may work well, or
you may prefer to keep the queue in a separate db from the raw data.
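
As a rough illustration of the reservation + reaper idea, here is a
minimal Python sketch. It does not talk to a real couchdb: DocStore is a
hypothetical in-memory stand-in that only mimics the "write fails if
your revision is stale" behaviour, with integer revisions instead of
couchdb's real _rev strings. The field names (state, worker,
reserved_at) and the 300 second lease are assumptions for the example,
not what cqs actually uses.

```python
class Conflict(Exception):
    """Raised on a write with a stale revision (couchdb answers 409)."""


class DocStore:
    """Hypothetical in-memory stand-in for a couchdb database."""

    def __init__(self):
        self._docs = {}

    def get(self, doc_id):
        return dict(self._docs[doc_id])

    def put(self, doc):
        current = self._docs.get(doc["_id"])
        # Reject the write unless the caller holds the latest revision.
        if current is not None and current["_rev"] != doc.get("_rev"):
            raise Conflict(doc["_id"])
        stored = dict(doc)
        stored["_rev"] = (current["_rev"] if current else 0) + 1
        self._docs[doc["_id"]] = stored
        return stored["_rev"]

    def all_docs(self):
        return [dict(d) for d in self._docs.values()]


LEASE_SECONDS = 300  # assumed: how long a worker may hold a job


def reserve(store, worker, now):
    """Claim the first queued job; return its id, or None if there is none."""
    for doc in store.all_docs():
        if doc["state"] != "queued":
            continue
        doc.update(state="reserved", worker=worker, reserved_at=now)
        try:
            store.put(doc)
        except Conflict:
            continue  # another worker claimed it first, try the next doc
        return doc["_id"]
    return None


def reap(store, now):
    """Return expired reservations to the queue; list the ids released."""
    released = []
    for doc in store.all_docs():
        expired = (doc["state"] == "reserved"
                   and now - doc["reserved_at"] > LEASE_SECONDS)
        if not expired:
            continue
        doc.update(state="queued", worker=None, reserved_at=None)
        try:
            store.put(doc)
        except Conflict:
            continue  # the doc changed underneath us, leave it alone
        released.append(doc["_id"])
    return released
```

The revision check is what keeps two workers from reserving the same
job: whoever writes second gets a conflict and moves on to the next doc.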

However, you may find that something like rabbitmq, or even a hosted
cloud equivalent (maybe AWS Lambda), is easier here. If you want to keep
the raw & generated attachments in related docs (or the same doc),
couchdb may be the better fit.

I know a number of people, e.g. jhs@, who have successfully (ab)used
couchdb as both a message queue and a backing store for this. It really
depends on whether you want to use couchdb for everything, or have other
needs that are better served by a real message queue architecture, with
couchdb storing and transferring the potentially large image/data
attachments instead of bloating the message queue.
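
To make the "don't bloat the message queue" point concrete, here is a
small Python sketch of the split. Both halves are stand-ins: a plain
dict plays the role of couchdb docs with attachments, and queue.Queue
plays the role of the real message queue. The names submit, work_one
and the doc_id message field are made up for the example.

```python
import queue

attachments = {}      # stand-in for couchdb docs holding the image data
jobs = queue.Queue()  # stand-in for the message queue


def submit(doc_id, image_bytes):
    """Store the large payload once, then enqueue only a pointer to it."""
    attachments[doc_id] = {"image": image_bytes, "result": None}
    jobs.put({"doc_id": doc_id})  # the message stays small


def work_one(analyze):
    """Take one message, fetch the payload by id, store the result."""
    msg = jobs.get()
    doc = attachments[msg["doc_id"]]
    doc["result"] = analyze(doc["image"])
    jobs.task_done()
    return msg["doc_id"]
```

The queue only ever carries ids, so its messages stay a few bytes each
no matter how large the images are.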

I think the tradeoff is largely around what else you need to do & how
much data you are sending around, and whether you need a full-blown
message queue system or can hack up the equivalents you need with
couchdb instead.

A+
Dave
