Hi,
Sorry, I was not aware of the patches. I have used sklearn a lot, so I can
send a couple of patches in the next few days; hopefully this will not be a
problem.
Regarding my code:
I am writing some machine learning algorithms in Python for my sponsor
company. We work mainly with medium-sized data (typically 3000 samples and
10000 variables).
The algorithms are designed for a real-time production process, so they are
meant to be used with minimal user intervention (or by people with limited
knowledge of data analytics). I think some of them may be of interest for
sklearn. I have not pushed any of them yet because they are prototypes, so
the quality of the code is still quite low. I will certainly try to push
them once I have a better version.
Let me know if you need any other information or if there is anything else
I should do.
Thanks a lot,
Luca
Hi Luca.
>
> Have you had a look at the top of
>
> https://github.com/scikit-learn/scikit-learn/wiki/Google-summer-of-code-%28GSOC%29-2015
> ?
> For an application, it is expected that you submit some patches to the
> repo to get familiar with the codebase.
> What is your github handle (I might have overlooked it).
>
> Also, it would be great if you could push your code to github if you
> think people might be interested.
>
> Best,
> Andy
>
>
> On 03/06/2015 07:57 PM, Luca Puggini wrote:
> > Thanks a lot for the material provided on randomized PCA and random
> > forests; it will surely help me in my research.
> >
> > I talked with my supervisor and he said that I am free to apply for
> > this summer project.
> >
> > I have used GAMs quite a lot, and I have done some work related to
> > high-dimensional fault detection systems, and thus to metrics, but
> > apparently these topics are already taken.
> >
> > My understanding from the previous emails is that NIPALS PCA may be of
> > interest. On the same topic, I have a couple of algorithms that I think
> > could be useful.
> >
> > 1- Sparse principal component analysis via regularized low-rank matrix
> > approximation.
> > http://www.sciencedirect.com/science/article/pii/S0047259X07000887
> > This is basically the equivalent of the NIPALS algorithm for sparse PCA.
> > It is more efficient for high-dimensional problems and is particularly
> > useful because it makes it possible to avoid the initial SVD.
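[Editor's note: for readers unfamiliar with NIPALS, here is a rough sketch of the ordinary (non-sparse) NIPALS idea referred to above: the leading principal component is found by alternating regressions rather than a full SVD. The function name and toy data are my own; this is not the sparse variant from the cited paper.]

```python
import numpy as np

def nipals_first_component(X, n_iter=500, tol=1e-10):
    """Return scores t and loadings p of the first principal component
    of a centered matrix X, computed NIPALS-style (no full SVD)."""
    # Start from the highest-variance column as the initial score vector.
    t = X[:, np.argmax(X.var(axis=0))].copy()
    for _ in range(n_iter):
        p = X.T @ t / (t @ t)        # regress the columns of X on the scores
        p /= np.linalg.norm(p)       # normalize the loading vector
        t_new = X @ p                # update the scores
        if np.linalg.norm(t_new - t) < tol:
            t = t_new
            break
        t = t_new
    return t, p

# Toy data with one dominant direction so the iteration converges quickly.
rng = np.random.default_rng(0)
u = rng.standard_normal(200)
v = rng.standard_normal(10)
X = 5.0 * np.outer(u, v) + rng.standard_normal((200, 10))
X -= X.mean(axis=0)

t, p = nipals_first_component(X)
# Cross-check against the leading right singular vector from a full SVD.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
print(np.allclose(np.abs(p), np.abs(Vt[0]), atol=1e-6))
```

Only matrix-vector products are needed per iteration, which is the efficiency argument made above for high-dimensional problems.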
> >
> > 2- Feature Subset Selection and Ranking for Data Dimensionality
> > Reduction http://eprints.whiterose.ac.uk/1947/1/weihl3.pdf .
> >
> > This is a method for unsupervised feature selection. It is similar to
> > sparse PCA, but it is optimized to maximize the percentage of explained
> > variance with respect to the number of selected variables.
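[Editor's note: a hypothetical illustration of the objective described above, not the cited paper's exact algorithm: greedily select columns that maximize the fraction of total variance explained by a least-squares projection onto the selected columns. All names and data here are my own.]

```python
import numpy as np

def greedy_variance_selection(X, k):
    """Return indices of k greedily chosen columns of X and the fraction
    of total variance explained by projecting X onto those columns."""
    Xc = X - X.mean(axis=0)          # center the data
    total = (Xc ** 2).sum()          # total variance (up to a 1/n factor)
    selected = []
    best_score = 0.0
    for _ in range(k):
        best_j, best_j_score = None, -1.0
        for j in range(Xc.shape[1]):
            if j in selected:
                continue
            S = Xc[:, selected + [j]]
            # Variance captured by the least-squares projection of Xc
            # onto the span of the candidate column subset S.
            coef, *_ = np.linalg.lstsq(S, Xc, rcond=None)
            score = ((S @ coef) ** 2).sum()
            if score > best_j_score:
                best_j_score, best_j = score, j
        selected.append(best_j)
        best_score = best_j_score
    return selected, best_score / total

# Toy data driven by two latent factors plus small noise: two
# well-chosen columns should explain almost all of the variance.
rng = np.random.default_rng(1)
Z = rng.standard_normal((100, 2))
X = Z @ rng.standard_normal((2, 6)) + 0.1 * rng.standard_normal((100, 6))
idx, frac = greedy_variance_selection(X, 2)
print(idx, round(frac, 3))
```

This trades optimality for simplicity: a greedy subset is not guaranteed to be the best subset of size k, but it directly reflects the explained-variance-per-selected-variable criterion mentioned above.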
> >
> >
> > If these topics are not of interest, I will be happy to work on
> > - improving GMM, or
> > - global-optimization-based hyperparameter optimization.
> >
> > I am not familiar with these two topics, but they are close to my
> > research area, so I will be happy to study them.
> >
> >
> > Now my understanding is that the staff will contact me to discuss the
> > various topics further. Please feel free to contact me at my private
> > email; I am happy to share my CV and my Python code (research-quality
> > code).
> >
> >
> > Thanks a lot,
> > Luca
> >
>
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general