On Tue, Aug 18, 2020 at 6:53 PM Kevin Markham <ke...@dataschool.io> wrote:
> Hi Ram,
>
> > For a column with numbers written like "one", "two" and missing values
> > "?", I had to do two things: Change them to numbers (1, 2), and then,
> > instead of the missing values, add the most common element, or mean or
> > whatever. When I tried to use LabelEncoder to do the first part, it
> > complained about the missing values.
>
> LabelEncoder is not the right tool for this task. It does map strings to
> integers, but it's not a tool for mapping *particular* strings to
> *particular* integers. More generally: LabelEncoder is a tool for encoding
> a label, not a tool for data cleaning (which is how I would describe your
> task).
>
> > all the while I'm thinking "It would be so much simpler to just write my
> > own logic in a for-loop rather than try to get Pandas and scikit-learn
> > working together."
>
> I wouldn't describe this as a case in which "pandas and scikit-learn
> aren't working well together." Rather, I would describe this as a case of
> trying to use a scikit-learn function when what you actually need is a
> pandas function.
>
> Here's a solution to your problem in two lines of pandas code:
> df['col'] = df['col'].map({'one':1, 'two':2, '?':np.nan})
> df['col'] = df['col'].fillna(df['col'].mean())
>
> Showing you that there is a simple solution is not a critique of you.
> Rather, pandas and scikit-learn are complex tools with huge APIs, and it
> takes time to master them. And to be clear, I'm not critiquing the tools
> either: they are complex tools with huge APIs because they are addressing
> complex problems with lots of functional areas.

I understand, that makes sense. Thank you.

> > But it kind of felt like... What am I using a framework for to begin
> > with?
>
> I think you will find that pandas and scikit-learn can save you a lot of
> code, but it does require finding the right function or class. Learning
> these tools requires an investment of time, and many people have found that
> this investment is well worth it.
>
> However, solving your problems with custom code is always an option, and
> it's totally fine if that is your preferred option!
>
> Hope that helps,
>
> Kevin

Thanks for your help Kevin.
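
For the archive, here is a self-contained sketch of the two-line approach
quoted above, so it can be run as-is. The sample data and the names
filled_mean and filled_mode are made up for illustration; only the column
name 'col' and the map/fillna calls come from Kevin's message. The mode
variant is my assumption about how "the most common element" could be done.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: number words, with "?" marking a missing value.
df = pd.DataFrame({'col': ['one', 'two', '?', 'two', 'one', 'two']})

# Step 1: map particular strings to particular numbers; "?" becomes NaN.
df['col'] = df['col'].map({'one': 1, 'two': 2, '?': np.nan})

# Step 2, option A: fill the missing values with the column mean
# (mean() skips NaN by default).
filled_mean = df['col'].fillna(df['col'].mean())

# Step 2, option B: fill with the most common value instead.
# mode() returns a Series because ties are possible, so take the first entry.
filled_mode = df['col'].fillna(df['col'].mode().iloc[0])

print(filled_mean.tolist())
print(filled_mode.tolist())
```

Note that map() also turns any string that is not a key in the dict into
NaN, so the '?' entry mainly serves to make the intent explicit.
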
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn