Something that hasn't been discussed in a while is semi-supervised
learning. Issue #1243
<https://github.com/scikit-learn/scikit-learn/issues/1243> suggests a
generic meta-estimator approach may be feasible, but there might be a few
different approaches available. More of an issue, in my opinion, is that the
API (and related conventions) for semi-supervised learning needs to be
tightened, even if only to allow external semi-supervised algorithm
implementations to fit into the scikit-learn framework.
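As one illustration of what a generic meta-estimator could look like, here is a minimal self-training sketch. It assumes the convention that unlabelled samples carry y == -1 and that the wrapped classifier implements predict_proba. The class name SelfTraining and its threshold / max_iter parameters are hypothetical, and this is not necessarily the approach discussed in issue #1243.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone


class SelfTraining(ClassifierMixin, BaseEstimator):
    """Hypothetical self-training meta-estimator; y == -1 marks unlabelled rows."""

    def __init__(self, base_estimator, threshold=0.9, max_iter=10):
        self.base_estimator = base_estimator
        self.threshold = threshold
        self.max_iter = max_iter

    def fit(self, X, y):
        X = np.asarray(X)
        y = np.array(y, copy=True)  # pseudo-labels are written into this copy
        for _ in range(self.max_iter):
            labelled = y != -1
            if labelled.all():
                break
            est = clone(self.base_estimator).fit(X[labelled], y[labelled])
            proba = est.predict_proba(X[~labelled])
            confident = proba.max(axis=1) >= self.threshold
            if not confident.any():
                break
            # Adopt the predicted class for confidently predicted unlabelled rows.
            rows = np.flatnonzero(~labelled)[confident]
            y[rows] = est.classes_[proba[confident].argmax(axis=1)]
        # Final fit on the original labels plus any adopted pseudo-labels.
        labelled = y != -1
        self.estimator_ = clone(self.base_estimator).fit(X[labelled], y[labelled])
        self.classes_ = self.estimator_.classes_
        self.transduction_ = y
        return self

    def predict(self, X):
        return self.estimator_.predict(X)

    def predict_proba(self, X):
        return self.estimator_.predict_proba(X)
```

Because the wrapper only relies on fit / predict_proba / classes_, any probabilistic classifier could be plugged in; that generality is the appeal of the meta-estimator route, while the -1 convention is exactly the kind of API detail that would need to be pinned down.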
For large collections of unlabelled data, partial_fit support is probably a
must. Even for smaller collections, I think our cross-validation strategies
need altering, as it makes no sense to use an unlabelled sample as a test
instance. Finally, if the unlabelled data is contiguous in the input, it
would be ideal not to copy that data, which boolean-mask indexing such as
X[y == -1] (and its inverse) will do.
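The two concerns above can be sketched as follows, assuming unlabelled samples are marked with y == -1. The helper names semi_supervised_splits and unlabelled_block are hypothetical: the first yields K-fold splits over the labelled samples only and always keeps unlabelled rows on the training side, the second returns a view (rather than a copy) when the unlabelled rows form one contiguous run.

```python
import numpy as np
from sklearn.model_selection import KFold


def semi_supervised_splits(y, n_splits=5, unlabelled=-1, random_state=0):
    """Yield (train, test) indices; unlabelled rows are always kept in train."""
    y = np.asarray(y)
    labelled = np.flatnonzero(y != unlabelled)
    unlabelled_rows = np.flatnonzero(y == unlabelled)
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    # Only labelled samples are split into folds, so they alone reach test sets.
    for train_pos, test_pos in kf.split(labelled):
        train = np.sort(np.concatenate([labelled[train_pos], unlabelled_rows]))
        yield train, labelled[test_pos]


def unlabelled_block(X, y, unlabelled=-1):
    """Return the unlabelled rows of X, as a view when they are one contiguous run."""
    rows = np.flatnonzero(np.asarray(y) == unlabelled)
    if rows.size and rows[-1] - rows[0] + 1 == rows.size:
        return X[rows[0]:rows[-1] + 1]  # basic slicing: a view, no copy
    return X[rows]  # fancy indexing: a copy, as with X[y == -1]
```

Since semi_supervised_splits yields (train, test) index pairs, it could be passed as the cv argument of, e.g., cross_val_score; the contiguity check is cheap, so falling back to a copy only costs memory when the layout requires it.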
Would improving semi-supervised techniques / support be an appropriate GSoC project?
On 13 February 2015 at 05:55, Milton Pividori <milto...@gmail.com> wrote:
> Hi, guys. My name is Milton Pividori and this is the first time I have
> written to this list. I'm a PhD student working on clustering, particularly
> consensus clustering. I'm relatively new to Python and am migrating
> legacy code from MATLAB. I plan to use scikit-learn as well as other
> libraries.
>
> After looking at the scikit-learn code and the mailing list, I didn't find
> any methods related to consensus clustering or cluster ensembles. I think
> the main paper on the topic is the one by Strehl and Ghosh (2002, JMLR, link
> <http://www.jmlr.org/papers/volume3/strehl02a/strehl02a.pdf>). I don't
> know if you have discussed it before, but I think it could be a good idea
> to have these consensus functions implemented in scikit-learn (the paper
> proposes three graph-based ones).
>
> I was thinking about how to implement them. These three consensus functions
> (CSPA, HGPA and MCLA) use METIS for graph partitioning. That could be an
> obstacle for scikit-learn, as a new dependency would be needed (I found
> Python bindings for it). It would also be necessary to implement some
> methods for ensemble generation with varying levels of diversity
> (generating different clustering partitions by varying algorithms, changing
> their parameters, or manipulating data with projections, subsampling or
> feature selection), but that's easier than implementing the consensus
> functions.
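To make the ensemble-plus-consensus idea concrete, here is a rough sketch under stated assumptions: ensemble members come from KMeans with a random number of clusters and a random feature subset (two of the diversity strategies mentioned above), the pairwise co-association matrix plays the role of CSPA's similarity matrix, and METIS is swapped for SciPy's average-linkage hierarchical clustering to avoid the extra dependency. The helper names are hypothetical, and results will differ from the METIS-based CSPA in the paper.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.cluster import KMeans


def generate_ensemble(X, n_members=20, k_range=(3, 6), feature_frac=0.75, seed=0):
    """Hypothetical helper: KMeans partitions diversified by k and feature subset."""
    rng = np.random.RandomState(seed)
    n_features = X.shape[1]
    n_keep = max(1, int(round(feature_frac * n_features)))
    partitions = []
    for _ in range(n_members):
        k = rng.randint(k_range[0], k_range[1])  # k drawn from [low, high)
        cols = rng.choice(n_features, size=n_keep, replace=False)
        km = KMeans(n_clusters=k, n_init=10, random_state=rng.randint(2**31 - 1))
        partitions.append(km.fit_predict(X[:, cols]))
    return np.array(partitions)  # shape (n_members, n_samples)


def co_association(partitions):
    """Fraction of ensemble members that put each pair of samples in one cluster."""
    n_members, n_samples = partitions.shape
    C = np.zeros((n_samples, n_samples))
    for labels in partitions:
        C += labels[:, None] == labels[None, :]
    return C / n_members


def consensus_labels(partitions, n_clusters):
    """CSPA-like consensus with METIS swapped for average-linkage clustering."""
    distance = 1.0 - co_association(partitions)
    np.fill_diagonal(distance, 0.0)  # squareform expects a zero diagonal
    Z = linkage(squareform(distance, checks=False), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust") - 1  # 0-based labels
```

The co-association matrix is O(n_samples^2) in memory, which is one reason a sparse graph partitioner such as METIS is attractive for larger data; the dense version above is only meant for small examples.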
>
> Well, it's just an idea. I would be glad to help with coding if this is
> of interest to the community.
>
> Regards,
>
> 2015-02-12 13:38 GMT-03:00 Sebastian Raschka <se.rasc...@gmail.com>:
>
>> What about adding multiclass support for "roc_auc" scoring with SVC in
>> grid search CV to the to-do list?
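One way to approximate this with existing pieces is a custom scoring callable that binarizes the labels one-vs-rest and macro-averages the per-class ROC AUC over SVC decision values. The function name ovr_macro_roc_auc is hypothetical; the sketch assumes at least three classes, decision_function_shape="ovr", and that every class appears in each test fold (stratified CV helps).

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize


def ovr_macro_roc_auc(estimator, X, y):
    """Macro-averaged one-vs-rest ROC AUC, usable as a GridSearchCV scoring callable.

    Assumes >= 3 classes and decision_function output of shape (n_samples, n_classes).
    """
    scores = estimator.decision_function(X)
    # One indicator column per class, in the same order as the score columns.
    Y = label_binarize(y, classes=estimator.classes_)
    return roc_auc_score(Y, scores, average="macro")
```

GridSearchCV accepts any callable with the (estimator, X, y) signature as scoring, so this can be passed directly, e.g. GridSearchCV(SVC(decision_function_shape="ovr"), {"C": [0.1, 1.0, 10.0]}, scoring=ovr_macro_roc_auc).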
>>
>> Best,
>> Sebastian
>>
>> On Feb 12, 2015, at 10:12 AM, Ronnie Ghose <ronnie.gh...@gmail.com>
>> wrote:
>>
>> +1 to partial_fit, -1 to GAMs and more probabilistic things in sklearn
>>
>> On Thu, Feb 12, 2015, 9:22 AM ragv ragv <rag...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Is there a good deal of interest in having GAMs implemented?
>>>
>>> The timeline for such a project would go something like:
>>>
>>> Before GSoC:
>>> * Implement SpAM
>>>
>>> Before midterm:
>>> * Help merge pyearth into scikit-learn
>>> * Implement Additive Model -> `AdditiveClassifier` /
>>> `AdditiveRegressor` (not sure if my naming here is correct)
>>>
>>> After midterm:
>>> * Implement GAMLSS
>>> * Implement LISO
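For readers less familiar with additive models, here is a minimal backfitting sketch of the y ≈ b + sum_j f_j(x_j) form. It is an assumption-laden illustration, not the proposed AdditiveRegressor: the class name BackfittingAdditiveRegressor is hypothetical, and each f_j is a low-degree polynomial fitted with numpy.polyfit as a stand-in for the penalized splines a real GAM (e.g. mgcv) would use.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin


class BackfittingAdditiveRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical additive model y ~ b + sum_j f_j(x_j), fitted by backfitting."""

    def __init__(self, degree=3, n_iter=20):
        self.degree = degree
        self.n_iter = n_iter

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        n_samples, n_features = X.shape
        self.intercept_ = y.mean()
        self.coefs_ = [np.zeros(self.degree + 1) for _ in range(n_features)]
        contrib = np.zeros((n_samples, n_features))  # f_j(x_j) for every sample
        for _ in range(self.n_iter):
            for j in range(n_features):
                # Partial residual: the target minus every term except f_j.
                partial = y - self.intercept_ - contrib.sum(axis=1) + contrib[:, j]
                coef = np.polyfit(X[:, j], partial, self.degree)
                fj = np.polyval(coef, X[:, j])
                offset = fj.mean()
                coef[-1] -= offset  # centre f_j so the intercept stays identifiable
                self.coefs_[j] = coef
                contrib[:, j] = fj - offset
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        out = np.full(X.shape[0], self.intercept_)
        for j, coef in enumerate(self.coefs_):
            out += np.polyval(coef, X[:, j])
        return out
```

Backfitting cycles through the features, each time refitting one smoother to the partial residual; a classifier variant would wrap this inside an iteratively reweighted least-squares loop, which is where most of the extra work in a real GAM implementation lies.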
>>>
>>> Kindly also see
>>> https://github.com/scikit-learn/scikit-learn/issues/3482 for
>>> references with citation counts.
>>>
>>> The CRAN package mgcv by Simon Wood (gam / bam) is mature and
>>> could be used as reference material too.
>>>
>>> On a scale of 0 to 100, how much importance / interest would there
>>> be in such a project for GSoC 2015?
>>>
>>> ------------------------------------------------------------
>>> ------------------
>>> Dive into the World of Parallel Programming. The Go Parallel Website,
>>> sponsored by Intel and developed in partnership with Slashdot Media, is
>>> your
>>> hub for all things parallel software development, from weekly thought
>>> leadership blogs to news, videos, case studies, tutorials and more. Take
>>> a
>>> look and join the conversation now. http://goparallel.sourceforge.net/
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> Scikit-learn-general@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>
>>
>
>
> --
> Milton Pividori
> Blog: www.miltonpividori.com.ar