Re: pandas loc on str lower for column comparison
Sayth Renshaw writes: >> >> That actually creates another error. >> >> A value is trying to be set on a copy of a slice from a DataFrame. >> Try using .loc[row_indexer,col_indexer] = value instead >> >> See the caveats in the documentation: >> http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy >> >> So tried this >> df['c'] = df.apply(lambda df1: df1['Current >> Team'].str.lower().str.strip() == df1['New >> Team'].str.lower().str.strip(), axis=1) >> >> Based on this SO answer https://stackoverflow.com/a/46570641 >> >> Thoughts? >> >> Sayth > > This works on an individual row > df2 = df1.loc[(df1['Current Team'] == df1['New Team']),'D'] = 'Wow' > > But how do I apply it to the whole new column and return the new dataset? > > Trying to use lambda but it cannot contain assigment > df2 = df1.apply(lambda df1: [ (df1['Current Team'] == df1['New Team']) > ]['D'] = 'succeed') > df2 > > Confused > > Sayth df1['Difference'] = df1['Current Team'].str.lower().str.strip() == df1['New Team'].str.lower().str.strip() works on whole columns, not only on an individual row. xls = pd.ExcelFile("Melbourne.xlsx") df = xls.parse('Sheet1', skiprows= 4) df1 = df[['UID','Name','New Leader','Current Team', 'New Team']].copy() df1['Difference'] = df1['Current Team'].str.lower().str.strip() == df1['New Team'].str.lower().str.strip() print(df1.head()) -- Piet van Oostrum WWW: http://piet.vanoostrum.org/ PGP key: [8DAE142BE17999C4] -- https://mail.python.org/mailman/listinfo/python-list
Re: pandas loc on str lower for column comparison
On Tuesday, 10 September 2019 12:56:36 UTC+10, Sayth Renshaw wrote: > On Friday, 6 September 2019 07:52:56 UTC+10, Piet van Oostrum wrote: > > Piet van Oostrum <> writes: > > > > > That would select ROWS 0,1,5,6,7, not columns. > > > To select columns 0,1,5,6,7, use two-dimensional indexes > > > > > > df1 = df.iloc[:, [0,1,5,6,7]] > > > > > > : selects all rows. > > > > And that also solves your original problem. > > > > This statement: > > > > df1['Difference'] = df1.loc['Current Team'].str.lower().str.strip() == > > df1.loc['New Team'].str.lower().str.strip() > > > > should not use .loc, because then you are selecting rows, not columns, but: > > > > df1['Difference'] = df1['Current Team'].str.lower().str.strip() == df1['New > > Team'].str.lower().str.strip() > > -- > > Piet van Oostrum <> > > WWW: http://piet.vanoostrum.org/ > > PGP key: [8DAE142BE17999C4] > > That actually creates another error. > > A value is trying to be set on a copy of a slice from a DataFrame. > Try using .loc[row_indexer,col_indexer] = value instead > > See the caveats in the documentation: > http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy > > So tried this > df['c'] = df.apply(lambda df1: df1['Current Team'].str.lower().str.strip() == > df1['New Team'].str.lower().str.strip(), axis=1) > > Based on this SO answer https://stackoverflow.com/a/46570641 > > Thoughts? > > Sayth This works on an individual row df2 = df1.loc[(df1['Current Team'] == df1['New Team']),'D'] = 'Wow' But how do I apply it to the whole new column and return the new dataset? Trying to use lambda but it cannot contain assigment df2 = df1.apply(lambda df1: [ (df1['Current Team'] == df1['New Team']) ]['D'] = 'succeed') df2 Confused Sayth -- https://mail.python.org/mailman/listinfo/python-list
Re: pandas loc on str lower for column comparison
On Friday, 6 September 2019 07:52:56 UTC+10, Piet van Oostrum wrote: > Piet van Oostrum <> writes: > > > That would select ROWS 0,1,5,6,7, not columns. > > To select columns 0,1,5,6,7, use two-dimensional indexes > > > > df1 = df.iloc[:, [0,1,5,6,7]] > > > > : selects all rows. > > And that also solves your original problem. > > This statement: > > df1['Difference'] = df1.loc['Current Team'].str.lower().str.strip() == > df1.loc['New Team'].str.lower().str.strip() > > should not use .loc, because then you are selecting rows, not columns, but: > > df1['Difference'] = df1['Current Team'].str.lower().str.strip() == df1['New > Team'].str.lower().str.strip() > -- > Piet van Oostrum <> > WWW: http://piet.vanoostrum.org/ > PGP key: [8DAE142BE17999C4] That actually creates another error. A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy So tried this df['c'] = df.apply(lambda df1: df1['Current Team'].str.lower().str.strip() == df1['New Team'].str.lower().str.strip(), axis=1) Based on this SO answer https://stackoverflow.com/a/46570641 Thoughts? Sayth -- https://mail.python.org/mailman/listinfo/python-list
Re: pandas loc on str lower for column comparison
That is actually consistent with Excel row, column. Can see why it works that way then. Thanks -- https://mail.python.org/mailman/listinfo/python-list
Re: pandas loc on str lower for column comparison
Piet van Oostrum writes: > That would select ROWS 0,1,5,6,7, not columns. > To select columns 0,1,5,6,7, use two-dimensional indexes > > df1 = df.iloc[:, [0,1,5,6,7]] > > : selects all rows. And that also solves your original problem. This statement: df1['Difference'] = df1.loc['Current Team'].str.lower().str.strip() == df1.loc['New Team'].str.lower().str.strip() should not use .loc, because then you are selecting rows, not columns, but: df1['Difference'] = df1['Current Team'].str.lower().str.strip() == df1['New Team'].str.lower().str.strip() -- Piet van Oostrum WWW: http://piet.vanoostrum.org/ PGP key: [8DAE142BE17999C4] -- https://mail.python.org/mailman/listinfo/python-list
Re: pandas loc on str lower for column comparison
Sayth Renshaw writes: > On Sunday, 1 September 2019 10:48:54 UTC+10, Sayth Renshaw wrote: >> I've created a share doc same structure anon data from my google drive. >> >> https://drive.google.com/file/d/0B28JfFTPNr_lckxQRnFTRF9UTEFYRUVqRWxCNVd1VEZhcVNr/view?usp=sharing >> >> Sayth > > I tried creating the df1 dataframe by using iloc instead of loc to avoid any > column naming issues. > > So i created a list of integers for iloc representing the columns in current > example. > > df1 = df.iloc[[0,1,5,6,7]] > > However, I ust be misunderstanding the docs > https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html#pandas.DataFrame.iloc > Allowed inputs are: > > An integer, e.g. 5. > A list or array of integers, e.g. [4, 3, 0]. > > Because while it works I appear to grab all columns 13 when I requested 5. > UID Name FTE Agent ID Current Leader New Leader Current Team New > Team Current Site New Site Unnamed: 10 Unnamed: 11 Unnamed: 12 > > How do I misunderstand iloc? > That would select ROWS 0,1,5,6,7, not columns. To select columns 0,1,5,6,7, use two-dimensional indexes df1 = df.iloc[:, [0,1,5,6,7]] : selects all rows. -- Piet van Oostrum WWW: http://piet.vanoostrum.org/ PGP key: [8DAE142BE17999C4] -- https://mail.python.org/mailman/listinfo/python-list
Re: pandas loc on str lower for column comparison
On Sunday, 1 September 2019 10:48:54 UTC+10, Sayth Renshaw wrote: > I've created a share doc same structure anon data from my google drive. > > https://drive.google.com/file/d/0B28JfFTPNr_lckxQRnFTRF9UTEFYRUVqRWxCNVd1VEZhcVNr/view?usp=sharing > > Sayth I tried creating the df1 dataframe by using iloc instead of loc to avoid any column naming issues. So i created a list of integers for iloc representing the columns in current example. df1 = df.iloc[[0,1,5,6,7]] However, I ust be misunderstanding the docs https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html#pandas.DataFrame.iloc Allowed inputs are: An integer, e.g. 5. A list or array of integers, e.g. [4, 3, 0]. Because while it works I appear to grab all columns 13 when I requested 5. UID NameFTE Agent IDCurrent Leader New Leader Current TeamNew TeamCurrent SiteNew SiteUnnamed: 10 Unnamed: 11 Unnamed: 12 How do I misunderstand iloc? Thanks, Sayth -- https://mail.python.org/mailman/listinfo/python-list
Re: pandas loc on str lower for column comparison
I've created a share doc same structure anon data from my google drive. https://drive.google.com/file/d/0B28JfFTPNr_lckxQRnFTRF9UTEFYRUVqRWxCNVd1VEZhcVNr/view?usp=sharing Sayth -- https://mail.python.org/mailman/listinfo/python-list
Re: pandas loc on str lower for column comparison
On Sunday, 1 September 2019 05:19:34 UTC+10, Piet van Oostrum wrote: > Sayth Renshaw writes: > > > But on both occasions I receive this error. > > > > # KeyError: 'the label [Current Team] is not in the [index]' > > > > if I test df1 before trying to create the new column it works just fine. > > > What do you mean by testing df1? > > And could it be that 'Current Team' is spelled differently in the assignment > than in the construction of df1? For example a difference in spaces, like a > triling space or a breaking vs. non-breaking space? Please check that both > are exactly the same. > > -- > Piet van Oostrum > WWW: http://piet.vanoostrum.org/ > PGP key: [8DAE142BE17999C4] Hi Version info 1.1.0 #scipy 0.23.4 #pandas RangeIndex: 35 entries, 0 to 34 Data columns (total 5 columns): UID 35 non-null object Name35 non-null object New Leader 35 non-null object Current Team35 non-null object New Team35 non-null object dtypes: object(5) memory usage: 1.4+ KB I had to anonymise the sheet. But same structure. tested produces same errors. import pandas as pd import scipy print(scipy.version.version) print(pd.__version__) xls = pd.ExcelFile("Melbourne-anon.xlsx") df = xls.parse('Sheet1', skiprows= 4) # df1 = df.loc[:, ('UID','Name','New Leader','Current Team', 'New Team')] # print(df['Current Team'] == df1['Current Team']) df1 = df.loc[:, ('UID','Name','New Leader','Current Team', 'New Team')] # df1['Difference'] = df1.loc['Current Team'].str.lower().str.strip() == df1.loc['New Team'].str.lower().str.strip() # df1 = df[['UID','Name','New Leader','Current Team', 'New Team']].copy() df1['Difference'] = df1.loc['Current Team'].str.lower().str.strip() == df1.loc['New Team'].str.lower().str.strip() df1.info() -- https://mail.python.org/mailman/listinfo/python-list
Re: pandas loc on str lower for column comparison
Sayth Renshaw writes: > But on both occasions I receive this error. > > # KeyError: 'the label [Current Team] is not in the [index]' > > if I test df1 before trying to create the new column it works just fine. > What do you mean by testing df1? And could it be that 'Current Team' is spelled differently in the assignment than in the construction of df1? For example a difference in spaces, like a triling space or a breaking vs. non-breaking space? Please check that both are exactly the same. -- Piet van Oostrum WWW: http://piet.vanoostrum.org/ PGP key: [8DAE142BE17999C4] -- https://mail.python.org/mailman/listinfo/python-list
Re: pandas loc on str lower for column comparison
Sayth Renshaw writes: > > I have tried both > > df1 = df.loc[:, ('UID','Name','New Leader','Current Team', 'New Team')] > df1['Difference'] = df1.loc['Current Team'].str.lower().str.strip() == > df1.loc['New Team'].str.lower().str.strip() > > and > > df1 = df[['UID','Name','New Leader','Current Team', 'New Team']].copy() > df1['Difference'] = df1.loc['Current Team'].str.lower().str.strip() == > df1.loc['New Team'].str.lower().str.strip() > > But on both occasions I receive this error. > > # KeyError: 'the label [Current Team] is not in the [index]' > > if I test df1 before trying to create the new column it works just fine. > > Sayth What does df1.info() produce? Which versions of numpy, scipy and pandas are you using? It would be helpful if you could make a copy of the original .xlsx file available. -- Piet van Oostrum WWW: http://piet.vanoostrum.org/ PGP key: [8DAE142BE17999C4] -- https://mail.python.org/mailman/listinfo/python-list
Re: pandas loc on str lower for column comparison
On Friday, 30 August 2019 00:49:32 UTC+10, Piet van Oostrum wrote: > Piet van Oostrum writes: > > > So the correct way to do this is to make df1 a copy rather than a view. > > > > df1 = df.loc[:, ('UID','Name','New Leader','Current Team', 'New Team')] > > Or maybe even make an explicit copy: > > df1 = df[['UID','Name','New Leader','Current Team', 'New Team']].copy() > -- > Piet van Oostrum > WWW: http://piet.vanoostrum.org/ > PGP key: [8DAE142BE17999C4] I have tried both df1 = df.loc[:, ('UID','Name','New Leader','Current Team', 'New Team')] df1['Difference'] = df1.loc['Current Team'].str.lower().str.strip() == df1.loc['New Team'].str.lower().str.strip() and df1 = df[['UID','Name','New Leader','Current Team', 'New Team']].copy() df1['Difference'] = df1.loc['Current Team'].str.lower().str.strip() == df1.loc['New Team'].str.lower().str.strip() But on both occasions I receive this error. # KeyError: 'the label [Current Team] is not in the [index]' if I test df1 before trying to create the new column it works just fine. Sayth -- https://mail.python.org/mailman/listinfo/python-list
Re: pandas loc on str lower for column comparison
On Friday, 30 August 2019 00:49:32 UTC+10, Piet van Oostrum wrote: > Piet van Oostrum > > > So the correct way to do this is to make df1 a copy rather than a view. > > > > df1 = df.loc[:, ('UID','Name','New Leader','Current Team', 'New Team')] > > Or maybe even make an explicit copy: > > df1 = df[['UID','Name','New Leader','Current Team', 'New Team']].copy() Thank you so much. Sayth -- https://mail.python.org/mailman/listinfo/python-list
Re: pandas loc on str lower for column comparison
Piet van Oostrum writes: > So the correct way to do this is to make df1 a copy rather than a view. > > df1 = df.loc[:, ('UID','Name','New Leader','Current Team', 'New Team')] Or maybe even make an explicit copy: df1 = df[['UID','Name','New Leader','Current Team', 'New Team']].copy() -- Piet van Oostrum WWW: http://piet.vanoostrum.org/ PGP key: [8DAE142BE17999C4] -- https://mail.python.org/mailman/listinfo/python-list
Re: pandas loc on str lower for column comparison
Sayth Renshaw writes: > Hi > > I am importing 4 columns into a dataframe from a spreadsheet. > > My goal is to create a 5th column with TRUE or False if column 4 (str) > matches column 3. > > Trying to leverage this answer https://stackoverflow.com/a/35940955/461887 > > This is my code > > import pandas as pd > > xls = pd.ExcelFile("Melbourne.xlsx") > df = xls.parse('Sheet1', skiprows= 4) > df1 = df[['UID','Name','New Leader','Current Team', 'New Team']] > df1['Difference'] = df1['Current > Team'].str.lower().str.replace('s/+',"") == df1['New > Team'].str.lower().str.replace('s/+',"") > > Which gives this error > > C:\Users\u369811\AppData\Local\Continuum\anaconda3\lib\site-packages\ipykernel_launcher.py:6: > SettingWithCopyWarning: > A value is trying to be set on a copy of a slice from a DataFrame. I think what happens is that df1 is not a copy of the subset of df that you want, but it is a VIEW on df instead. This means that df1 shares memory with df (for memory savings reasons). But if you would change this view by adding a column, where should it be put? In df? If so, where? So the correct way to do this is to make df1 a copy rather than a view. df1 = df.loc[:, ('UID','Name','New Leader','Current Team', 'New Team')] And than it should work. Except that the str.replace is wrong for what you want. It just replaces the literal string "s/+" with an empty string instead of white space. This was wrong in the stackoverflow post. To replace whitespace it should be str.replace('\\s+',"", regex=True). But simpler is to use str.strip(): df1['Difference'] = df1['Current Team'].str.lower().str.strip() == df1['New Team'].str.lower().str.strip() -- Piet van Oostrum WWW: http://piet.vanoostrum.org/ PGP key: [8DAE142BE17999C4] -- https://mail.python.org/mailman/listinfo/python-list