On Wed, Dec 12, 2012 at 05:07:08PM -0500, nicholas a. evans wrote:
> On Wed, Dec 12, 2012 at 4:03 PM, James Marca
> <[email protected]> wrote:
> > I feel your pain but cannot offer any help.  I also use your option 5:
> > I use node.js to manually store view output into a separate db, with
> > the doc _ids equal to the key of the view output, so that I can limit
> > updates to only those things that change.
>
> Thanks James.  Do you apply the changes incrementally, and if so how
> do you detect which view rows (in the source DB) have changes so you
> don't need to download the whole reduced/grouped view?  And to the
> point of my last email, how do you detect missing view rows in the
> source DB and delete them from the chained DB?
My app is somewhat of a special case, which is why I was willing to roll my own. I am averaging multiple imputation runs to generate a single estimate of a parameter. So when "something changes", yes, I have to re-collect all the view outputs, because a change means I cranked up the imputation engine and ran more estimates.

I get new data in yearly and monthly batches, and the data is large, so I tend to store my imputation outputs in one database per county per year. This hand-sharding means that when adding new data I don't have to worry about avoiding old data in the view collation step. If I'm just running a new month of data (hypothetical, I've only done years at a time so far), then I set the view parameters to run from the start time to the end time (my view keys are the detector id and the hour the data was collected) to limit the work I have to do.

Finally, given that my application is gathering real-world measurements, I never delete things, so worrying about rows in the collated db that should go away is also not an issue for me. If I had to do that, I would create a view on my collation db that spits out some unique key (detector, day, for example) that I can check against the source databases. Or else I'd update a field in the collated document on every write, say the date stamp of the update, then have a view that spits out docs sorted by that field, and run a background job to reap the ones that are out of date.

Sorry, that is probably totally unhelpful.

Regards,
James Marca
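
The collation and clean-up steps described above could be sketched roughly as follows. This is an illustration only, not code from the application discussed: the function name `planCollation`, the `value` field on the collated docs, and the use of the JSON-serialized view key as the doc `_id` are all assumptions. It compares the current rows of a source view with the docs already in the collated db, and returns only the docs that need to be written or deleted.

```javascript
// Hypothetical helper: plan the writes needed to bring a collated db
// in line with the current rows of a source view.
//   rows:     [{ key, value }]        rows from a grouped CouchDB view
//   existing: [{ _id, _rev, value }]  docs currently in the collated db
function planCollation(rows, existing) {
  const byId = new Map(existing.map((doc) => [doc._id, doc]));
  const seen = new Set();
  const docs = [];

  for (const row of rows) {
    // Assumption: the view key, serialized as JSON, is used as the doc _id.
    const id = JSON.stringify(row.key);
    seen.add(id);
    const old = byId.get(id);
    if (!old) {
      // Row has no collated doc yet: create one.
      docs.push({ _id: id, value: row.value });
    } else if (JSON.stringify(old.value) !== JSON.stringify(row.value)) {
      // Value changed: update in place, carrying the current _rev.
      // (String comparison is sensitive to object property order.)
      docs.push({ _id: id, _rev: old._rev, value: row.value });
    }
    // Unchanged rows are skipped, so nothing is rewritten needlessly.
  }

  // Collated docs whose key no longer appears in the source view
  // are marked for deletion.
  for (const doc of existing) {
    if (!seen.has(doc._id)) {
      docs.push({ _id: doc._id, _rev: doc._rev, _deleted: true });
    }
  }
  return docs;
}
```

The returned array has the shape CouchDB expects in the `docs` field of a `_bulk_docs` request, so creates, updates, and deletions can go out in one POST. Fetching `rows` and `existing` (for example with a start/end key range to limit the work) is left out of the sketch.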
