Hi Mathieu,

Sorry to jump in on this conversation late.  This is a bit verbose, but I've 
seen this question go by unanswered way too many times and decided to be 
proactive. 

*Long story short: CouchDB is ideal for this, even on big data sets.  It will 
be fast at scale.

* Details:  When working with dates in couchdb, I almost always find myself 
using the following pattern:

1) Store the date-time in either epoch seconds or a ISO std format, both of 
which are convenient to work with in javascript (for server-side views as well 
as client applications).  Your choice of ISO 8601 formatted works nicely with 
the JS Date class:

var d = new Date(Date.parse("2012-02-11T22:00:00"))

2) Then, in preparation for future reduces you will likely end up wanting, I'd 
use a compound key structured like:
[<userId>, year, month, day]

So, the map code would be:

function(doc){
    if (doc && doc.userId && doc.timeScheduled && doc.dollarValue) {
        var d = new Date(Date.parse("2012-02-11T22:00:00"));
        //note, Month runs [0,11]
        emit( [doc.userId, d.getFullYear(), d.getMonth(), d.getDate()], 
doc.dollarValue);
    }
}

where I've assumed that you may want to aggregate on some fictitious 
doc.dollarValue numerical field.  For that, you would add to your design 
document a builtin reduce function:

"reduce": "_stats" 

to get the count, sum, min value, max value, mean and std-dev.  Let's suppose 
we call this view "idByTime" and it  lives in the design_doc called "selectors".

3) Now, to query this for the SELECT you want you would do:

curl -X GET 
'http://demo.cloudant.com/dbname/_design/sectors/_view/idByTime?reduce=false&startkey=\["bob",2012,0,1\]&endkey=\["bob",2012,0,25\]'

to get the list of document ids that fall within Jan 1, 2012 and Jan 25, 2012 
for user id "bob".  

Now, if you want to get the full documents, you can just change that to:

curl -X GET 
'http://demo.cloudant.com/dbname/_design/sectors/_view/idByTime?reduce=false&startkey=\["bob",2012,0,1\]&endkey=\["bob",2012,0,25\]&include_docs=true'

4) Now, the real fun comes when you can use that same index to do query-time 
rollup that's super fast.  For this the thing you want to note is the 
group_level option at query time.  If you have a key of 'n' dimensions (n=4 in 
our case), then you can roll it up from dimensionality n=0 through n=4.  So, at 
full dimensionality:

curl -X GET 
'http://demo.cloudant.com/dbname/_design/sectors/_view/idByTime?group_level=4'

will give you the values for all users aggregated by day.  You can add startkey 
and endky just as before to slice into the range.  

Now if you want to roll it up by user/year/month:

curl -X GET 
'http://demo.cloudant.com/dbname/_design/sectors/_view/idByTime?group_level=3'

by user/year:

curl -X GET 
'http://demo.cloudant.com/dbname/_design/sectors/_view/idByTime?group_level=2'

by user:

curl -X GET 
'http://demo.cloudant.com/dbname/_design/sectors/_view/idByTime?group_level=1'

and ultimately roll up over all users:

curl -X GET 
'http://demo.cloudant.com/dbname/_design/sectors/_view/idByTime?group_level=0'

Note that group_level=0 => "group=false", and group_level = n => "group=true" 
in the view query options at:

http://wiki.apache.org/couchdb/HTTP_view_API#Querying_Options.

I prefer to just be explicit with the group_level and forget that 
group=true/false exists.

Thanks, Mike

p.s., apologies for any typos, I was cribbing this from some cloudant 
blog-posts in the making.



On Feb 13, 2012, at 11:11 AM, Mathieu Castonguay wrote:

> I tried that exact example with
> ?startKey=["26de9c438e5d1c0f075f2ae6ad0bcc82","2012-02-11T22:00:00"]&endkey=["26de9c438e5d1c0f075f2ae6ad0bcc82",{}]
> and I still get records in the past:
> 
> {"total_rows":3,"offset":0,"rows":[
> {"id":"344e921af796598bcd709ba973003c60","key":["26de9c438e5d1c0f075f2ae6ad0b39b2","2012-02-13T16:18:19.565+0000"],"value":"344e921af796598bcd709ba973003c60"},
> {"id":"344e921af796598bcd709ba973001d3f","key":["26de9c438e5d1c0f075f2ae6ad0bcc82","2012-02-10T21:44:14.920+0000"],"value":"344e921af796598bcd709ba973001d3f"},
> {"id":"344e921af796598bcd709ba973002c01","key":["26de9c438e5d1c0f075f2ae6ad0bcc82","2012-02-10T22:05:48.218+0000"],"value":"344e921af796598bcd709ba973002c01"}
> ]}
> 
> 
> The view's map function is:
> 
> function(doc) { if(doc.userId && doc.timeScheduled)
> {emit([doc.userId,doc.timeScheduled], doc._id)} }
> 
> 
> 
> 
> On Mon, Feb 13, 2012 at 1:55 PM, James Klo <[email protected]> wrote:
> 
>> Not sure how you are querying, but are you doing the equivalent to this?
>> startkey and endkey should be expressed as JSON
>> 
>> curl -g '
>> http://localhost:5984/orders/_design/Order/_view/by_users_after_time?startkey=[
>> "f98ba9a518650a6c15c566fc6f00c157","2012-01-01T11:40:52.280Z"]&endkey=["userid",{}]'
>> 
>> 
>> *
>> Jim Klo
>> Senior Software Engineer
>> Center for Software Engineering
>> SRI International
>> e. [email protected]
>> p. 805.542.9330 x121
>> m.  805.286.1350
>> f. 805.546.2444
>> *
>> 
>> On Feb 13, 2012, at 10:27 AM, Mathieu Castonguay wrote:
>> 
>> I tried reversing the keys with no luck. I still get timestamps that are in
>> the past (before the startKey).
>> 
>> On Sat, Feb 11, 2012 at 6:37 PM, James Klo <[email protected]> wrote:
>> 
>> Reverse the key. [userid, time]
>> 
>> 
>> CouchDB is all about understanding collation. Basically views are
>> 
>> sorted/grouped from left to right alphanumeric. See
>> 
>> http://wiki.apache.org/couchdb/View_collation for the finer details as
>> 
>> there are more rules than the basics I mention.
>> 
>> 
>> so the reversal sorts the view by userid first, then date as string.
>> 
>> Instead of sorting by dates then userids.
>> 
>> 
>> You do it this way because you know the exact userid, but not the exact
>> 
>> date. If you knew the exact date, but not the userid, what you have
>> 
>> currently would be better.
>> 
>> 
>> - Jim
>> 
>> 
>> 
>> Sent from my iPad
>> 
>> 
>> On Feb 11, 2012, at 1:54 PM, "Mathieu Castonguay" <
>> 
>> [email protected]> wrote:
>> 
>> 
>> I have a simple document named Order structure with the fields id, name,
>> 
>> userId and timeScheduled.
>> 
>> 
>> What I would like to do is create a view where I can find the
>> 
>> document.idfor those who's userId is some value and timeScheduledis
>> 
>> after a given date.
>> 
>> 
>> My view:
>> 
>> 
>> "by_users_after_time": {
>> 
>>     "map": "function(doc) { if (doc.userId && doc.timeScheduled) {
>> 
>> emit([doc.timeScheduled, doc.userId], doc._id); }}"
>> 
>> }
>> 
>> 
>> If I do
>> 
>> 
>> 
>> localhost:5984/orders/_design/Order/_view/by_users_after_time?startKey="[2012-01-01T11:40:52.280Z,f98ba9a518650a6c15c566fc6f00c157]"
>> 
>> I get every result back. Is there a way to access key[1] to do an if
>> 
>> doc.userId == key[1] or something along those lines and simply emit on
>> 
>> the
>> 
>> time?
>> 
>> 
>> This would be the SQL equivalent of select id from Order where userId =
>> 
>> "f98ba9a518650a6c15c566fc6f00c157" and timeScheduled >
>> 
>> 2012-01-01T11:40:52.280Z;
>> 
>> 
>> I did quite a few Google searches but I can't seem to find a good
>> 
>> tutorial
>> 
>> on working with multiple keys. It's also possible that my approach is
>> 
>> entirely flawed so any guidance would be appreciated.
>> 
>> 
>> Thank you,
>> 
>> 
>> Matt
>> 
>> 
>> 
>> 

Reply via email to