Pandas has a read_excel function that can load data from an Excel spreadsheet: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html
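A minimal sketch of that idea (the file name, sheet, and columns here are made up for illustration, and reading/writing .xlsx needs an Excel engine such as openpyxl): select a cell range with read_excel's sheet_name/usecols parameters and pass the resulting frame to a fitted estimator, instead of typing the values in by hand.

```python
# Illustrative sketch: round-trip a small table through an Excel file and
# feed a "cell range" (columns A:B of the first sheet) to a fitted model.
import pandas as pd
from sklearn.linear_model import LinearRegression

# Stand-in for an existing spreadsheet: two feature columns and a target.
df = pd.DataFrame({"x1": [1.0, 2.0, 3.0],
                   "x2": [2.0, 4.0, 6.0],
                   "y":  [3.0, 6.0, 9.0]})
df.to_excel("demo.xlsx", index=False)

# Select the range: columns A:B (x1, x2) of the first sheet.
X_new = pd.read_excel("demo.xlsx", sheet_name=0, usecols="A:B")

model = LinearRegression().fit(df[["x1", "x2"]], df["y"])
print(model.predict(X_new))  # one prediction per spreadsheet row
```

read_excel also accepts skiprows and nrows, so a rectangular block such as B2:D10 can be carved out of a larger sheet the same way.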
On Sun, Oct 6, 2019 at 1:57 AM Mike Smith <javaeur...@gmail.com> wrote:

Can I call an MS Excel cell range in a function such as model.predict(),
instead of typing the data in for each element?

On Sat, 5 Oct 2019 at 11:55 AM, Mike Smith <javaeur...@gmail.com> wrote:

> 1. Re: Can Scikit-learn decision tree (CART) have both continuous and
>    categorical features? (C W)

What I'd ask in reply to this is whether the results of the regression and
classification modules can be entered into a single input for one resultant
output.
On Sat, 5 Oct 2019 at 2:50 PM, C W <tmrs...@gmail.com> wrote:

Thanks, great material! I got pydotplus with graphviz to work.

Using the code on the sklearn website [1],
tree.plot_tree(clf.fit(iris.data, iris.target)) gives an error:

    AttributeError: module 'sklearn.tree' has no attribute 'plot_tree'

Both my colleague and I got the same error message. Per this post,
https://github.com/Microsoft/LightGBM/issues/1844, a PyPI update is needed.
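For reference, that AttributeError just means the installed scikit-learn predates 0.21, where tree.plot_tree was added. After upgrading, a minimal sketch like this plots directly with matplotlib (the Agg backend is used here only so it runs headless):

```python
# Minimal sketch: tree.plot_tree (scikit-learn >= 0.21) draws with
# matplotlib directly; no graphviz or intermediate png round-trip needed.
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this when working interactively
import matplotlib.pyplot as plt
from sklearn import tree
from sklearn.datasets import load_iris

iris = load_iris()
clf = tree.DecisionTreeClassifier().fit(iris.data, iris.target)

fig, ax = plt.subplots(figsize=(12, 8))
tree.plot_tree(clf, filled=True, ax=ax)
fig.savefig("iris_tree.png")
```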
[1] sklearn link:
https://scikit-learn.org/stable/modules/tree.html#classification

On Fri, Oct 4, 2019 at 11:52 PM Sebastian Raschka <m...@sebastianraschka.com> wrote:

The docs show a way such that you don't need to write it out as a png file,
using tree.plot_tree:
https://scikit-learn.org/stable/modules/tree.html#classification

I don't remember why, but I think I had problems with that in the past (I
think it didn't look so nice visually, but I don't remember), which is why
I still stick to graphviz. For my use cases, it's not much hassle -- it
used to be a bit of a hassle to get Graphviz working, but now you can do

    conda install pydotplus
    conda install graphviz

Coincidentally, I just made an example for a lecture I was teaching on Tue:
https://github.com/rasbt/stat479-machine-learning-fs19/blob/master/06_trees/code/06-trees_demo.ipynb

Best,
Sebastian

On Oct 4, 2019, at 10:09 PM, C W <tmrs...@gmail.com> wrote:

On a separate note, what do you use for plotting?

I found graphviz, but you have to first save it as a png on your computer.
That's a lot of work for just one plot. Is there something like matplotlib?

Thanks!

On Fri, Oct 4, 2019 at 9:42 PM Sebastian Raschka <m...@sebastianraschka.com> wrote:

Yeah, think of it more as a computational workaround for achieving the same
thing more efficiently (although it looks inelegant/weird) -- something
like that wouldn't be mentioned in textbooks.

Best,
Sebastian

On Oct 4, 2019, at 6:33 PM, C W <tmrs...@gmail.com> wrote:

Thanks Sebastian, I think I get it.

It's just that I have never seen it this way.
Quite different from what I'm used to in Elements of Statistical Learning.

On Fri, Oct 4, 2019 at 7:13 PM Sebastian Raschka <m...@sebastianraschka.com> wrote:

Not sure if there's a website for that. In any case, to explain this
differently: as discussed earlier, sklearn assumes continuous features for
decision trees. So, it will use a binary threshold for splitting along a
feature attribute. In other words, it cannot do something like

    if x == 1 then right child node
    else left child node

Instead, what it does is

    if x >= 0.5 then right child node
    else left child node

These are basically equivalent, as you can see when you plug in the values
0 and 1 for x.

Best,
Sebastian

On Oct 4, 2019, at 5:34 PM, C W <tmrs...@gmail.com> wrote:

I don't understand your answer.

Why, after one-hot encoding, does it still output greater than 0.5 or less
than? Does the sklearn website have a working example on categorical input?

Thanks!

On Fri, Oct 4, 2019 at 3:48 PM Sebastian Raschka <m...@sebastianraschka.com> wrote:

Like Nicolas said, the 0.5 is just a workaround, but it will do the right
thing on the one-hot encoded variables here. You will find that the
threshold is always at 0.5 for these variables.
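This is easy to verify with a minimal sketch: fit a depth-1 tree on a single 0/1 feature and inspect the learned threshold, which sklearn places at the midpoint of the two observed values.

```python
# Fit a decision stump on a binary feature and read out the split threshold.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.array([[0], [0], [1], [1]])
y = np.array([0, 0, 1, 1])

clf = DecisionTreeClassifier(max_depth=1).fit(X, y)
print(clf.tree_.threshold[0])  # midpoint between 0 and 1, i.e. 0.5
```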
I.e., what it will do is use the following conversion:

    treat as car_Audi=1 if car_Audi >= 0.5
    treat as car_Audi=0 if car_Audi < 0.5

or it may be

    treat as car_Audi=1 if car_Audi > 0.5
    treat as car_Audi=0 if car_Audi <= 0.5

(I forget which one sklearn uses, but either way, it will be fine.)

Best,
Sebastian

On Oct 4, 2019, at 1:44 PM, Nicolas Hug <nio...@gmail.com> wrote:

> But the decision tree is still mistaking the one-hot encoding for
> numerical input and splitting at 0.5. This is not right. Perhaps I'm
> doing something wrong?

You're not doing anything wrong, and neither is the tree. Trees don't
support categorical variables in sklearn, so everything is treated as
numerical.

This is why we do one-hot encoding: so that a set of numerical (one-hot
encoded) features can be treated as if they were just one categorical
feature.

Nicolas

On 10/4/19 2:01 PM, C W wrote:

Yes, you are right. It was 0.5 and 0.5 for the splits, not 1.5. So, a typo
on my part.

It looks like I did the one-hot encoding correctly. My new variable names
are: car_Audi, car_BMW, etc.

But the decision tree is still mistaking the one-hot encoding for numerical
input and splitting at 0.5. This is not right. Perhaps I'm doing something
wrong?

Is there a good toy example on the sklearn website? I only see this:
https://scikit-learn.org/stable/auto_examples/tree/plot_tree_regression.html
Thanks!

On Fri, Oct 4, 2019 at 1:28 PM Sebastian Raschka <m...@sebastianraschka.com> wrote:

Hi,

> The funny part is: the tree is taking the one-hot encoding (BMW=0,
> Toyota=1, Audi=2) as numerical values, not categories. The tree splits
> at 0.5 and 1.5.

That's not a one-hot encoding then.

For an Audi data point, it should be

    BMW=0
    Toyota=0
    Audi=1

for BMW

    BMW=1
    Toyota=0
    Audi=0

and for Toyota

    BMW=0
    Toyota=1
    Audi=0

The split threshold should then be at 0.5 for any of these features.

Based on your email, I think you were assuming that the DT does the one-hot
encoding internally, which it doesn't. In practice, it is hard to guess
what is a nominal and what is an ordinal variable, so you have to do the
one-hot encoding before you give the data to the decision tree.

Best,
Sebastian

On Oct 4, 2019, at 11:48 AM, C W <tmrs...@gmail.com> wrote:

I'm getting some funny results. I am fitting a regression decision tree;
the response variables are assigned to levels.

The funny part is: the tree is taking the one-hot encoding (BMW=0,
Toyota=1, Audi=2) as numerical values, not categories.

The tree splits at 0.5 and 1.5. Am I doing one-hot encoding wrong? How does
sklearn know internally that 0 vs. 1 is categorical, not numerical?
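To make the contrast concrete (an illustrative sketch, not from the original posts): an ordinal code puts all three brands in one 0/1/2 column, while a proper one-hot encoding, e.g. via pandas get_dummies, yields one 0/1 indicator column per brand -- the car_Audi/car_BMW layout described above.

```python
# One-hot encode a 'car' column with pandas: one 0/1 indicator per brand,
# rather than a single 0/1/2 ordinal code.
import pandas as pd

cars = pd.DataFrame({"car": ["Audi", "BMW", "Toyota"]})
onehot = pd.get_dummies(cars, columns=["car"], dtype=int)
print(onehot)  # columns car_Audi, car_BMW, car_Toyota with 0/1 entries
```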
In R, for instance, you do as.factor(), which explicitly states the data
type.

Thank you!

On Wed, Sep 18, 2019 at 11:13 AM Andreas Mueller <t3k...@gmail.com> wrote:

On 9/15/19 8:16 AM, Guillaume Lemaître wrote:

On Sat, 14 Sep 2019 at 20:59, C W <tmrs...@gmail.com> wrote:

> Thanks, Guillaume. ColumnTransformer looks pretty neat. I've also heard,
> though, that this pipeline can be tedious to set up? Specifying what you
> want for every feature is a pain.

[Guillaume] It would be interesting for us to know which part of the
pipeline is tedious to set up, so we can see whether we can improve
something there. Do you mean that you would like to automatically detect
which type each feature is (categorical/numerical) and apply a default
encoder/scaling, such as discussed here:
https://github.com/scikit-learn/scikit-learn/issues/10603#issuecomment-401155127

IMO, from a user's perspective, it would be cleaner in some cases, at the
cost of blindly applying a black box, which might be dangerous.

[Andreas] Also see
https://amueller.github.io/dabl/dev/generated/dabl.EasyPreprocessor.html#dabl.EasyPreprocessor
which basically does that.

On Sat, 14 Sep 2019, C W <tmrs...@gmail.com> wrote:

Javier,
Actually, you guessed right.
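For a mixed table like the ones in this thread, a ColumnTransformer setup might look like the following sketch (the column names and toy data are assumptions for illustration, not from the original posts):

```python
# One-hot encode the categorical columns and pass the numeric ones through,
# then feed everything to a decision tree in a single pipeline.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

df = pd.DataFrame({"Gender": ["Male", "Female", "Male"],
                   "Age": [30, 35, 50],
                   "Income": [10000, 9000, 12000],
                   "Car": ["BMW", "Toyota", "Audi"],
                   "Attendance": ["Yes", "No", "Yes"]})

X = df.drop(columns="Attendance")
y = df["Attendance"]

pre = ColumnTransformer(
    [("cat", OneHotEncoder(), ["Gender", "Car"])],
    remainder="passthrough",  # keep Age and Income as-is
)
clf = make_pipeline(pre, DecisionTreeClassifier()).fit(X, y)
print(clf.predict(X))
```

Listing the categorical columns by hand is the "tedious" part being discussed; tools like dabl's EasyPreprocessor try to infer the column types instead.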
My real data has only one numerical variable; it looks more like this:

    Gender  Date       Income  Car     Attendance
    Male    2019/3/01  10000   BMW     Yes
    Female  2019/5/02   9000   Toyota  No
    Male    2019/7/15  12000   Audi    Yes

I am predicting income using all the other, categorical, variables. Maybe
it is catboost!

Thanks,
M

On Sat, Sep 14, 2019 at 9:25 AM Javier López <jlo...@ende.cc> wrote:

If you have datasets with many categorical features, and perhaps many
categories, the tools in sklearn are quite limited, but there are
alternative implementations of boosted trees that are designed with
categorical features in mind. Take a look at catboost [1], which has an
sklearn-compatible API.

J

[1] https://catboost.ai/

On Sat, Sep 14, 2019 at 3:40 AM C W <tmrs...@gmail.com> wrote:

Hello all,

I'm very confused. Can the decision tree module handle both continuous and
categorical features in the dataset? In this case, it's just CART
(Classification and Regression Trees).

For example:

    Gender  Age  Income  Car     Attendance
    Male    30   10000   BMW     Yes
    Female  35    9000   Toyota  No
    Male    50   12000   Audi    Yes

According to the documentation at
https://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart,
it cannot!
It says: "scikit-learn implementation does not support categorical
variables for now".

Is this true? If not, can someone point me to an example? If yes, what do
people do?

Thank you very much!

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn