On 10/14/15 12:55 PM, jason kirtland wrote:
> If you can partition the rows numerically, this is trivially easily to
> implement using redis as the orchestrator.
>
> For example if you have integer PKs, you might have a loop like:
>
> offset = 0
> while offset < tablesize:
> for row in query[offset:batchsize]:
> migrate(row)
> commit()
> offset += batchsize
>
> With redis orchestrating, you use a key in redis and INCRBY to reliably
> distribute batches to an arbitrary number of workers on an arbitrary
> number of hosts.
>
> while True:
> offset = redis.incrby('migration-offset', batchsize)
> rows = query[offset:batchsize]
> if not rows:
> break
> for row in rows:
> migrate(row)
> commit()
>
> INCRBY is atomic and returns the adjusted value, so every invocation of
> this script that calls into redis and INCRBYs by, say, 1000, has its own
> chunk of 1000 to work on. For a starting value of -1000 and four
> invocations, you'd see 0, 1000, 2000 and 3000.
>
> I'll typically do this on one invocation, see that it's running well and
> that I chose a performant batch size, and then spin up additional
> workers on more cores until the migration hits the overall throughput
> required.
What am I missing that one wouldn't use say multiprocessing.Pool() to do
this kind of thing in the general sense? If we're only talking about
5-10 runners they could just as well be local forked processes.
>
>
>
> On Wed, Oct 14, 2015 at 9:32 AM, Jonathan Vanasco <[email protected]
> <mailto:[email protected]>> wrote:
>
> I have to run a script on 2MM objects to update the database. Not
> really a schema migration, more like changing the internal data
> representation in the fields.
>
> There's a bit of post-processing and bottlenecks involved, so doing
> everything one-at-a-time will take a few days.
>
> I'd like to split this out into 5-10 'task runners' that are each
> responsible for a a section of the database (ie, every 5th record).
> That should considerably drop the runtime.
>
> I thought I had seen a recipe for this somewhere, but checked and
> couldn't find anything. That leads me to question if this is a good
> idea or not. Anyone have thoughts/pointers?
>
> --
> You received this message because you are subscribed to the Google
> Groups "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it,
> send an email to [email protected]
> <mailto:[email protected]>.
> To post to this group, send email to [email protected]
> <mailto:[email protected]>.
> Visit this group at http://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.
>
>
> --
> You received this message because you are subscribed to the Google
> Groups "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to [email protected]
> <mailto:[email protected]>.
> To post to this group, send email to [email protected]
> <mailto:[email protected]>.
> Visit this group at http://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.