I like that idea, Riyad. I'll give it a shot. Thanks.

On Feb 6, 2014, at 1:04 PM, Riyad Kalla <[email protected]> wrote:
> Dan, I wonder if you would be better served by creating a View in your
> original DB that does all the needed manipulation to the docs and code up
> some form of manual replication where you take all the results from that
> view and copy them into your target data source?
>
> You wouldn't be able to use the built-in CouchDB replication, but at least
> you would have total control over the data leaving your master source (it
> sounds like in your case masking PII/sensitive data before it leaves is
> important, so this step might be handy).
>
> On Thu, Feb 6, 2014 at 11:06 AM, Jens Alfke <[email protected]> wrote:
>
>> On Feb 6, 2014, at 9:38 AM, Dan Santner <[email protected]> wrote:
>>
>>> I have the replication filtering down now but I'm wondering is there
>>> any way for me to change the doc before it copies to the source?
>>
>> Well, to take your question literally, you can of course change the
>> documents on the original database before starting the replication. Only
>> the latest revisions (with the redacted names) will be transferred.
>>
>> But I think you're asking for some kind of filter that would alter
>> documents while they're being replicated? I don't think that's feasible.
>> The document's revision ID is tied to its contents (it's based on a SHA-1
>> digest of the JSON) and you can't change the contents while leaving the
>> revision ID the same. But changing the rev ID in the middle of replication
>> would be really problematic because the replicator is transferring specific
>> revisions by their revIDs, and it would confuse it if it got a different
>> revID than the one it asked for.
>>
>>> The use case is I have production documents that I want to migrate
>>> somewhere else but change all the names to 'John Smith' before they land in
>>> the new destination. Also need to remove a couple other things that might
>>> be considered sensitive.
>>
>> The only good option I can think of is to keep the sensitive parts of the
>> data in separate documents. (The main doc would have a property that
>> contains the doc ID of the sensitive data.) Then you can run a filtered
>> replication that sends the regular documents but not the sensitive ones.
>>
>> --Jens
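
For reference, the redact-then-copy approach Riyad suggests could be sketched roughly as below. This is only an illustration under stated assumptions: the field names (`name`, `ssn`, `email`) and the helper names `redact` and `bulk_docs_payload` are hypothetical, since the thread does not describe the document schema. It assumes the source view is queried with `include_docs=true` and that the result is sent as the body of a `POST /{target_db}/_bulk_docs` request. The HTTP calls themselves are left out.

```python
import copy
import json

# Hypothetical field names; the thread does not describe the document schema.
NAME_FIELD = "name"
SENSITIVE_FIELDS = ("ssn", "email")


def redact(doc):
    """Return a masked copy of one CouchDB document, leaving the input untouched."""
    clean = copy.deepcopy(doc)
    # Drop the source revision: the content changes, so the target database
    # has to assign its own revision ID rather than reuse the original one.
    clean.pop("_rev", None)
    if NAME_FIELD in clean:
        clean[NAME_FIELD] = "John Smith"
    for field in SENSITIVE_FIELDS:
        clean.pop(field, None)
    return clean


def bulk_docs_payload(view_rows):
    """Build the JSON body for POST /{target_db}/_bulk_docs from view rows.

    Assumes the view was queried with include_docs=true, so each row
    carries the full document under the "doc" key.
    """
    docs = [redact(row["doc"]) for row in view_rows if row.get("doc")]
    return json.dumps({"docs": docs})
```

Because `_rev` is stripped, this only works as-is for a first copy into an empty target. Re-running it against documents that already exist there would need the target's current `_rev` for each document, which the sketch omits.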
