Re: [Scikit-learn-general] Feature selection != feature elimination?

2016-03-14 Thread Joel Nothman
Currently there is no automatic mechanism for eliminating the generation of
features that are not selected downstream. It needs to be achieved manually.
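
One way to do this by hand can be sketched as follows. This is only an illustration under assumptions, not a built-in scikit-learn facility: `compute_features` is a hypothetical stand-in for whatever expensive feature generation happens upstream, and the selector, classifier, and synthetic data are arbitrary choices. After fitting, the selector's support mask tells you which columns survived, so at prediction time you can generate only those columns and call the final estimator directly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Synthetic data standing in for the full (expensive) feature matrix.
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)

pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=5)),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(X, y)

# Column indices the selector kept (sorted, same order the classifier saw).
kept = pipe.named_steps["select"].get_support(indices=True)


def compute_features(raw, columns):
    """Hypothetical feature generator that builds only the requested columns.

    Here it just slices an existing matrix; in a real system this is where
    the unselected features would simply never be computed.
    """
    return raw[:, columns]


# Prediction path that skips the selector: build only the kept columns
# and hand them straight to the fitted final estimator.
X_small = compute_features(X, kept)
fast_pred = pipe.named_steps["clf"].predict(X_small)
full_pred = pipe.predict(X)

print("kept columns:", kept)
print("predictions match:", np.array_equal(fast_pred, full_pred))
```

With this arrangement, only the list of kept columns and the fitted final estimator need to be serialized for deployment.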

On 15 March 2016 at 08:05, Philip Tully wrote:

> Hi,
>
> I'm trying to optimize the time it takes to make a prediction with my
> model(s). I realized that when I perform feature selection during the
> model fit(), the unselected features are likely still computed when I
> call predict() or predict_proba(). An optimization would then involve
> actually eliminating the features that aren't selected from my
> Pipeline altogether, instead of merely deselecting them.
>
> Does sklearn already do this automatically? Or does this readjustment
> need to be done manually before serialization?
>
> thanks,
> Philip
>
>
> --
> Transform Data into Opportunity.
> Accelerate data analysis in your applications with
> Intel Data Analytics Acceleration Library.
> Click to learn more.
> http://pubads.g.doubleclick.net/gampad/clk?id=278785231=/4140
> ___
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>


[Scikit-learn-general] Feature selection != feature elimination?

2016-03-14 Thread Philip Tully
Hi,

I'm trying to optimize the time it takes to make a prediction with my
model(s). I realized that when I perform feature selection during the
model fit(), the unselected features are likely still computed when I
call predict() or predict_proba(). An optimization would then involve
actually eliminating the features that aren't selected from my
Pipeline altogether, instead of merely deselecting them.

Does sklearn already do this automatically? Or does this readjustment
need to be done manually before serialization?

thanks,
Philip



Re: [Scikit-learn-general] performance/scalability of NMF

2016-03-14 Thread Tom DLT
Hi Roberto,

In 0.17, we added a coordinate descent solver for NMF, which is more
efficient than the previous projected gradient solver.
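
As a rough sketch of how the coordinate descent solver can be selected explicitly, assuming a recent scikit-learn where `NMF` accepts `solver="cd"`, the snippet below factorizes a small random sparse matrix. The data, the number of components, and the `init` choice are arbitrary illustration values, not benchmark settings.

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import NMF

# Small non-negative matrix with many zeros, stored in CSR format.
rng = np.random.RandomState(0)
X_dense = np.abs(rng.randn(100, 40))
X_dense[X_dense < 1.0] = 0.0
X_sparse = sparse.csr_matrix(X_dense)

# Explicitly request the coordinate descent solver.
model = NMF(n_components=5, solver="cd", init="nndsvd",
            random_state=0, max_iter=500)
W = model.fit_transform(X_sparse)   # shape (n_samples, n_components)
H = model.components_               # shape (n_components, n_features)

print("W shape:", W.shape, "H shape:", H.shape)
print("reconstruction error:", model.reconstruction_err_)
```

The same call works on a dense array, which makes it easy to time both input types on your own data.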

For performance on both dense and sparse data, I refer you to this
pull request for a better NMF benchmark.

About multithreading, the new solver releases the GIL (through Cython code)
for a large part of its run time.
The other main computational cost comes from NumPy dot products, whose speed
depends on your BLAS configuration.
Here is also a quick example for benchmarking multithreading
:

Best,

Tom

2016-03-11 12:26 GMT+01:00 Roberto Pagliari:

> Are there results about the performance and scalability of the
> scikit-learn implementation of NMF?
>
> According to this thread on SO
>
>
> http://stackoverflow.com/questions/18575846/non-negative-matrix-factorization-of-sparse-input
>
> there are scalability issues. I would be interested to know the biggest
> dataset NMF can handle and what the memory footprint is.
>
> Thank you,