Keeping old models is one thing.  Keeping track of exactly which data you
trained with is another thing.

Since you often need access to both old and new models at the same time, it
is common to burn a serial number into the file containing each model and
simply keep all of them.  You also need to record which model resulted from
which data, using which build of the training software.
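
A minimal sketch of that bookkeeping in Python, assuming the model has
already been serialized to bytes.  The function names, the
`model-NNNNNN.bin` naming scheme and the `registry.jsonl` log are
hypothetical.  Here the serial number goes into the file name rather than
inside the serialized model, and each registry line ties a model to an
identifier for its training data and to the training build.

```python
import json
import os
from datetime import datetime, timezone


def next_serial(registry_path):
    """Return one more than the highest serial recorded so far (1 if none)."""
    if not os.path.exists(registry_path):
        return 1
    with open(registry_path) as f:
        serials = [json.loads(line)["serial"] for line in f if line.strip()]
    return max(serials, default=0) + 1


def save_model(model_bytes, model_dir, data_snapshot, training_build):
    """Write a model under a serial-numbered name and log its provenance.

    Hypothetical naming scheme and registry format, for illustration only.
    Old model files are never overwritten, so every past model stays usable.
    """
    os.makedirs(model_dir, exist_ok=True)
    registry_path = os.path.join(model_dir, "registry.jsonl")
    serial = next_serial(registry_path)
    filename = f"model-{serial:06d}.bin"
    # Mode "xb" raises FileExistsError instead of silently clobbering a model.
    with open(os.path.join(model_dir, filename), "xb") as f:
        f.write(model_bytes)
    record = {
        "serial": serial,
        "file": filename,
        "data_snapshot": data_snapshot,    # whatever identifies the training data
        "training_build": training_build,  # e.g. a version or commit of the trainer
        "created": datetime.now(timezone.utc).isoformat(),
    }
    # One JSON object per line: model -> data -> build.
    with open(registry_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

For example, `save_model(model_bytes, "models", "snap-2013-07-17",
"trainer-build-42")` (hypothetical values) writes `models/model-000001.bin`
on first use and appends a matching line to the registry.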

This leads to the question of how to make sure you know exactly what
training data you used.  If your data is relatively small, making a copy is
a fine idea.  As your input data grows, and especially in a production
setting where data arrives continuously, you may need something like a
snapshot, so that you don't use n times the storage to keep the data behind
n models, and so that you capture the training data at an exact moment in
time.
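
One way to get both properties is content-addressed storage, sketched below
in Python as an assumption rather than as the approach this reply
prescribes: each distinct file content is stored once under its SHA-256
hash, and a snapshot is just a small manifest mapping file paths to hashes.
The `snapshot` function and the `objects/` and `snapshots/` layout are
hypothetical, and the sketch assumes files are not modified while it runs.
A storage system with built-in snapshots would serve the same purpose.

```python
import hashlib
import json
import os
import shutil


def file_sha256(path):
    """Hash a file in 1 MiB chunks so large inputs need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(data_dir, store_dir):
    """Record the current contents of data_dir and return a snapshot id.

    Hypothetical layout: store_dir/objects/<sha256> holds file contents and
    store_dir/snapshots/<id>.json holds manifests.  Unchanged files are
    stored only once, so each new snapshot costs only the new or changed data.
    Assumes nothing writes to data_dir while the snapshot is taken.
    """
    objects = os.path.join(store_dir, "objects")
    snapshots = os.path.join(store_dir, "snapshots")
    os.makedirs(objects, exist_ok=True)
    os.makedirs(snapshots, exist_ok=True)

    manifest = {}
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            path = os.path.join(root, name)
            digest = file_sha256(path)
            target = os.path.join(objects, digest)
            if not os.path.exists(target):  # copy only content not seen before
                shutil.copyfile(path, target)
            manifest[os.path.relpath(path, data_dir)] = digest

    # The id is a hash of the manifest, so identical data yields the same id.
    body = json.dumps(manifest, sort_keys=True).encode()
    snap_id = hashlib.sha256(body).hexdigest()[:16]
    with open(os.path.join(snapshots, snap_id + ".json"), "wb") as f:
        f.write(body)
    return snap_id
```

The returned id is what you would record next to each model.  Rerunning the
function on unchanged data returns the same id without copying anything.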

On Wed, Jul 17, 2013 at 7:29 PM, Lee, Howon wrote:

> Hey, I'm planning to make some SGD logistic regression models, serialize
> them to save them and test my programs with these models.
>
> It seems pretty terrible to check them into my version control, because
> they're binaries.
>
> Is there a good way to keep track of versions of my models, revert them,
> etc, even though they're serialized and stuff? I've been thinking about
> making a separate repo just for these models. Does anybody have any
> experience and/or advice in this matter?
>
