On 10/05/2015 11:30 AM, Kyle Kastner wrote:
> preprocessing was done with straight numpy, and I am 90% sure there is
> a more "sklearn approved" way to do it using FeatureUnion, etc.
Nope, not really currently. Not nicely. ColumnTransformer is not merged yet.
Also OneHotEncoder is currently not i
I did a piece of that in the Titanic examples from the SciPy tutorial,
but it could definitely use a more thorough and clear example. This
version could probably be simplified/streamlined - much of my
preprocessing was done with straight numpy, and I am 90% sure there is
a more "sklearn approved" w
On 09/30/2015 05:53 PM, KAB wrote:
> s. And this is due to the special way scikit-learn requires the data
> to be presented to its objects. Last time I checked (I really don't
> know if there has been any change since then) one had to do some
> wrangling with pandas' data frames, however subtl
On 10/05/2015 05:59 AM, Dale Smith wrote:
> Ah, I should say that dealing with "too large" data should be referred to
> Andreas' tutorial at PyData 2015 (I think that's where I saw it) and the
> scikit-learn website. I don't see any reason to repeat it at PyData or PyCon.
> If necessary, I
5, 2015 8:35 AM
To: scikit-learn-general@lists.sourceforge.net
Subject: Re: [Scikit-learn-general] PyCon 2016 scikit-learn tutorial
Re “The Ins and Outs of a Machine Learning Pipeline — About the data that you
feed to a learning algorithm and how to analyze the results”. I'm referencing
the proposed list of t
-Original Message-
From: Sebastian Raschka [mailto:se.rasc...@gmail.com]
Sent: Wednesday, September 30, 2015 9:13 PM
To: scikit-learn-general@lists.sourceforge.net
Subject: Re: [Scikit-learn-general] PyCon 2016 scikit-learn tutorial
If there is interest, I could wo
Hi all,
Interesting discussion – thanks for the thoughts and ideas! I like the idea
of a Data Munging tutorial being separate from an ML tutorial. With only 3
hours, that seems more doable than trying to squeeze it all into one
session. Perhaps Stephan's idea of an image-focused ML tutorial could b
If people are planning to work on this, it would be good to check what
Andy and I presented at SciPy, which is based on what Jake and Olivier
did at PyCon (and what Andy, Jake and Gael did at SciPy 2013, etc.
etc.).
To Sebastian's points - we covered all of these nearly verbatim except
perhaps cla
I believe a “Data cleaning and preprocessing for data science”
(insert-snappier-title-here) tutorial would be a great addition to PyCon.
It’s a prerequisite for machine learning, that’s for sure. A machine learning
tutorial should probably not completely sweep it under the carpet, but treat it
in
If there is interest, I could work on something like
“The Ins and Outs of a Machine Learning Pipeline — About the data that you feed
to a learning algorithm and how to analyze the results”
covering the topics
Part 1:
- class label encoding
- feature encoding
- feature selection vs. dimensionali
I agree that data munging is not, strictly speaking, a machine learning
question, i.e., from the mathematical or computational point of view. But
there is no denying that most of the time spent on machine learning
actually goes to data munging. So surely dealing with data has
something to do with ma
I totally agree with Jake. However, I also think that a few general tutorials
on “preprocessing” of “clean” datasets (clean in the sense that missing values,
duplicates, and outliers have already been dealt with) could be useful to a broader,
interdisciplinary audience. For example:
- encoding class labels, enco
Hi,
The problem with including data munging in the tutorial is that it's not
really a machine learning question. Solutions are generally so
domain-specific that you can't present them in a way that would be generally
useful to an interdisciplinary audience. This is why most (all?) short
machine learn
Hello Jake and Andy,
If you would not mind some advice, I would suggest including examples
(or at least one) where you use data that is not built-in. I remember
the first several tutorials (if not all of them) relied completely on
built-in data sets and unapologetically ignored the big elephant in
Hi Jake.
I think the tutorial Kyle and I did based on the previous tutorials was
working quite well.
I think it would make sense to work off our SciPy ones and improve them
further.
I'd be happy to work on it.
We have some more exercises in a branch, and I have also improved
versions of some of