On 04/05/2013 01:05 PM, Bill Power wrote:
> Lars: must have missed your response earlier. i guess i was hoping for
> convenient instead of good :-)
>
> i don't concede to some of your points though. that validation is
> significantly complicated is not true as presumably you just need to
> check for the feature dimension of each class. what's that? a loop and
> a shape check? hardly complicated or slow. i also don't know if i
> accept the memory issues as machine learning isn't exactly the most
> optimal in terms of memory and processing power is it? i would imagine
> this would add minimal extra data as you can delete the dict memory
> after joining it all up.
>
It does add lines that only serve what is really a special case. And loops are slow!
> my flow involves keeping my features in separate files for each class
> of data, and it was getting a bit annoying having to use a few extra
> lines before calling fit. for this process flow, realignment must
> always be performed with the current exposure of the fit methods. so
> where's the loss in wrapping it in an estimator function over doing
> the alignment myself as it has to be done anyway.
Yeah, but an inexperienced user will ask "why does it crash when my 3 GB
dataset fits into my 4 GB of RAM?"
>
> that symmetry would be broken is a good point so it is probably not
> appropriate to do the dict in the fit method, but perhaps it would
> make sense as a new method called "fit_dict" or something which might
> fit (hehe, get it?) in with the fit_transform and other fit_predict
> helper methods that are in kmeans, for example.
>
Never. Seriously. Adding a function for an unlikely edge case is never
going to happen.
>
> Andreas: this doesn't affect me at all when i am performing
> classification as i've already "figured out" the classifier that I
> need and the process is wrapped in higher level functions. it just
> annoys me that i have to do this when i'm investigating new models
> from the command line
>
Well you can do
def my_helper(clf):
    # clf is an already constructed estimator, so pass it through as-is
    return Pipeline([("dict_to_vect", DictToVect()),
                     ("classifier", clf)])
and can just do
svm = my_helper(SVC(C=10))
in the interactive shell.
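
For the alignment step itself (the "loop and a shape check" discussed above), here is a rough sketch of a standalone helper. It assumes the per-class features live in a dict mapping each class label to a 2-D array of shape (n_samples, n_features). The name stack_classes and that dict layout are my own assumptions for illustration, not anything that exists in scikit-learn.

```python
import numpy as np


def stack_classes(features_by_class):
    """Join {label: 2-D feature array} into a single (X, y) pair.

    Hypothetical helper: the name and the dict layout are assumptions.
    """
    labels = sorted(features_by_class)
    if not labels:
        raise ValueError("need at least one class")

    blocks = [np.asarray(features_by_class[label]) for label in labels]

    # The loop and shape check: every class must be 2-D and share
    # the same number of feature columns.
    n_features = blocks[0].shape[-1]
    for label, block in zip(labels, blocks):
        if block.ndim != 2 or block.shape[1] != n_features:
            raise ValueError(
                "class %r has shape %r, expected (n_samples, %d)"
                % (label, block.shape, n_features)
            )

    # Stacking copies the data, so the dict and the joined array
    # briefly coexist in memory.
    X = np.vstack(blocks)
    y = np.concatenate(
        [np.full(len(block), label) for label, block in zip(labels, blocks)]
    )
    return X, y
```

With that in a personal utilities module, the interactive session becomes X, y = stack_classes(features) followed by clf.fit(X, y), and nothing has to change in the estimators. Note the copy made by np.vstack: that is the memory overhead mentioned earlier in the thread.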
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general