This sounds like a fine solution.  Make sure that you lock down permissions
on the models after writing them, so that you know they are static, and you
should be fine.
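
To make the lock-down step concrete, here is a minimal sketch in Python against
a local file system, not HDFS itself. The function name, the file names
(model.bin, meta.json) and the timestamp format are assumptions made for
illustration; on a real cluster you would apply the equivalent permission
change through your Hadoop tooling.

```python
import json
import os
import stat
import time
from pathlib import Path


def save_model_readonly(root, model_bytes, metrics):
    """Write a model and its meta file into a new timestamped directory,
    then remove write permission so the files stay static."""
    # Hypothetical layout: <root>/<UTC timestamp>/{model.bin, meta.json}
    model_dir = Path(root) / time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())
    model_dir.mkdir(parents=True, exist_ok=False)

    (model_dir / "model.bin").write_bytes(model_bytes)
    (model_dir / "meta.json").write_text(json.dumps(metrics, indent=2))

    # Read-only for everyone: files first, then the directory itself.
    read_only = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH
    for path in model_dir.iterdir():
        os.chmod(path, read_only)
    # Directories keep the execute bit so they can still be listed.
    os.chmod(model_dir, read_only | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return model_dir
```

Refusing to reuse an existing directory (exist_ok=False) is a deliberate
choice in this sketch: a second write to the same timestamp fails loudly
instead of silently replacing a model.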

On the positive side, you can run map-reduce programs with the models as
input!
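
As a small-scale illustration of treating the stored models as input, the
sketch below walks a directory of models and builds an AUC history in plain
Python. It is not a Hadoop job. It assumes each model directory holds a JSON
file named meta.json with an "auc" field and that directory names are
fixed-width timestamps; those names are hypothetical, so adjust them to
whatever your meta files actually contain.

```python
import json
from pathlib import Path


def auc_history(root):
    """Return (directory name, AUC) pairs for every model under root, oldest first."""
    # "Map" step: read one (timestamp, AUC) record per meta file.
    records = []
    for meta_path in Path(root).glob("*/meta.json"):
        meta = json.loads(meta_path.read_text())
        records.append((meta_path.parent.name, meta["auc"]))
    # Fixed-width timestamp names (e.g. YYYYMMDDTHHMMSSZ) sort chronologically.
    return sorted(records)


def best_model(root):
    """'Reduce' step: pick the directory whose meta file reports the highest AUC."""
    return max(auc_history(root), key=lambda rec: rec[1])
```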


On Thu, Jul 18, 2013 at 12:14 AM, Johannes Schulte <
[email protected]> wrote:

> hi,
>
> We are just keeping them in HDFS: one timestamped directory per model, plus
> a meta file gathering some metrics like AUC, number of training examples,
> and class distribution. This makes it easy to generate reports from it on
> the fly, whereas this would be very hard with git (plus there is no added
> value).
>
> This might not be the best solution, but it's a cheap way to see model
> performance over time and better than no history.
>
>
> On Thu, Jul 18, 2013 at 7:15 AM, Ted Dunning <[email protected]>
> wrote:
>
> > Keeping old models is one thing.  Keeping track of exactly which data you
> > trained with is another thing.
> >
> > Since you often need access to both old and new models at the same time, it
> > is common to simply burn a serial number into the file containing the model
> > and simply keep all of them.  You need to keep a record as well of which
> > model resulted from which data using which build of the training software.
> >
> > This leads to the question of how you make sure you know what training data
> > you used.  If your data is relatively small, then making a copy is a fine
> > idea.  As your input data gets bigger and in a production setting where
> > data is coming in all the time, you may find that you need to start using
> > something like a snapshot so that you don't actually use n times the
> > storage for your data and so that you get an exact moment in time for the
> > training data.
> >
> >
> >
> >
> > On Wed, Jul 17, 2013 at 7:29 PM, Lee, Howon <[email protected]>
> > wrote:
> >
> > > Hey, I'm planning to make some sgd logistic regression models, serialize
> > > them to save them and test my programs with these models.
> > >
> > > It seems pretty terrible to check them into my version control, because
> > > they're binaries.
> > >
> > > Is there a good way to keep track of versions of my models, revert them,
> > > etc, even though they're serialized and stuff? I've been thinking about
> > > making a separate repo just for these models. Does anybody have any
> > > experience and/or advice in this matter?
> > >
> >
>
