Re: pandas loc on str lower for column comparison

2019-09-10 Thread Piet van Oostrum
Sayth Renshaw  writes:

>> 
>> That actually creates another error.
>> 
>> A value is trying to be set on a copy of a slice from a DataFrame.
>> Try using .loc[row_indexer,col_indexer] = value instead
>> 
>> See the caveats in the documentation:
>> http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
>> 
>> So tried this
>> df['c'] = df.apply(lambda df1: df1['Current
>> Team'].str.lower().str.strip() == df1['New
>> Team'].str.lower().str.strip(), axis=1)
>> 
>> Based on this SO answer https://stackoverflow.com/a/46570641
>> 
>> Thoughts?
>> 
>> Sayth
>
> This works on an individual row
> df2 = df1.loc[(df1['Current Team'] == df1['New Team']),'D'] = 'Wow'
>
> But how do I apply it to the whole new column and return the new dataset?
>
> Trying to use lambda but it cannot contain assigment
> df2 = df1.apply(lambda df1: [ (df1['Current Team'] == df1['New Team'])  
> ]['D'] = 'succeed')
> df2
>
> Confused
>
> Sayth

df1['Difference'] = df1['Current Team'].str.lower().str.strip() ==
df1['New Team'].str.lower().str.strip()

works on whole columns, not only on an individual row.


xls = pd.ExcelFile("Melbourne.xlsx")
df = xls.parse('Sheet1', skiprows= 4)
df1 = df[['UID','Name','New Leader','Current Team', 'New Team']].copy()
df1['Difference'] = df1['Current Team'].str.lower().str.strip() == df1['New 
Team'].str.lower().str.strip()
print(df1.head())

-- 
Piet van Oostrum 
WWW: http://piet.vanoostrum.org/
PGP key: [8DAE142BE17999C4]
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: pandas loc on str lower for column comparison

2019-09-09 Thread Sayth Renshaw
On Tuesday, 10 September 2019 12:56:36 UTC+10, Sayth Renshaw  wrote:
> On Friday, 6 September 2019 07:52:56 UTC+10, Piet van Oostrum  wrote:
> > Piet van Oostrum <> writes:
> > 
> > > That would select ROWS 0,1,5,6,7, not columns.
> > > To select columns 0,1,5,6,7, use two-dimensional indexes
> > >
> > > df1 = df.iloc[:, [0,1,5,6,7]]
> > >
> > > : selects all rows.
> > 
> > And that also solves your original problem.
> > 
> > This statement:
> > 
> > df1['Difference'] = df1.loc['Current Team'].str.lower().str.strip() == 
> > df1.loc['New Team'].str.lower().str.strip()
> > 
> > should not use .loc, because then you are selecting rows, not columns, but:
> > 
> > df1['Difference'] = df1['Current Team'].str.lower().str.strip() == df1['New 
> > Team'].str.lower().str.strip()
> > -- 
> > Piet van Oostrum <>
> > WWW: http://piet.vanoostrum.org/
> > PGP key: [8DAE142BE17999C4]
> 
> That actually creates another error.
> 
> A value is trying to be set on a copy of a slice from a DataFrame.
> Try using .loc[row_indexer,col_indexer] = value instead
> 
> See the caveats in the documentation: 
> http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
> 
> So tried this
> df['c'] = df.apply(lambda df1: df1['Current Team'].str.lower().str.strip() == 
> df1['New Team'].str.lower().str.strip(), axis=1)
> 
> Based on this SO answer https://stackoverflow.com/a/46570641
> 
> Thoughts?
> 
> Sayth

This works on an individual row
df2 = df1.loc[(df1['Current Team'] == df1['New Team']),'D'] = 'Wow'

But how do I apply it to the whole new column and return the new dataset?

Trying to use lambda but it cannot contain assigment
df2 = df1.apply(lambda df1: [ (df1['Current Team'] == df1['New Team'])  ]['D'] 
= 'succeed')
df2

Confused

Sayth
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: pandas loc on str lower for column comparison

2019-09-09 Thread Sayth Renshaw
On Friday, 6 September 2019 07:52:56 UTC+10, Piet van Oostrum  wrote:
> Piet van Oostrum <> writes:
> 
> > That would select ROWS 0,1,5,6,7, not columns.
> > To select columns 0,1,5,6,7, use two-dimensional indexes
> >
> > df1 = df.iloc[:, [0,1,5,6,7]]
> >
> > : selects all rows.
> 
> And that also solves your original problem.
> 
> This statement:
> 
> df1['Difference'] = df1.loc['Current Team'].str.lower().str.strip() == 
> df1.loc['New Team'].str.lower().str.strip()
> 
> should not use .loc, because then you are selecting rows, not columns, but:
> 
> df1['Difference'] = df1['Current Team'].str.lower().str.strip() == df1['New 
> Team'].str.lower().str.strip()
> -- 
> Piet van Oostrum <>
> WWW: http://piet.vanoostrum.org/
> PGP key: [8DAE142BE17999C4]

That actually creates another error.

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: 
http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

So tried this
df['c'] = df.apply(lambda df1: df1['Current Team'].str.lower().str.strip() == 
df1['New Team'].str.lower().str.strip(), axis=1)

Based on this SO answer https://stackoverflow.com/a/46570641

Thoughts?

Sayth
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: pandas loc on str lower for column comparison

2019-09-05 Thread Sayth Renshaw
That is actually consistent with Excel row, column. Can see why it works that 
way then.

Thanks
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: pandas loc on str lower for column comparison

2019-09-05 Thread Piet van Oostrum
Piet van Oostrum  writes:

> That would select ROWS 0,1,5,6,7, not columns.
> To select columns 0,1,5,6,7, use two-dimensional indexes
>
> df1 = df.iloc[:, [0,1,5,6,7]]
>
> : selects all rows.

And that also solves your original problem.

This statement:

df1['Difference'] = df1.loc['Current Team'].str.lower().str.strip() == 
df1.loc['New Team'].str.lower().str.strip()

should not use .loc, because then you are selecting rows, not columns, but:

df1['Difference'] = df1['Current Team'].str.lower().str.strip() == df1['New 
Team'].str.lower().str.strip()
-- 
Piet van Oostrum 
WWW: http://piet.vanoostrum.org/
PGP key: [8DAE142BE17999C4]
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: pandas loc on str lower for column comparison

2019-09-05 Thread Piet van Oostrum
Sayth Renshaw  writes:

> On Sunday, 1 September 2019 10:48:54 UTC+10, Sayth Renshaw  wrote:
>> I've created a share doc same structure anon data from my google drive.
>> 
>> https://drive.google.com/file/d/0B28JfFTPNr_lckxQRnFTRF9UTEFYRUVqRWxCNVd1VEZhcVNr/view?usp=sharing
>> 
>> Sayth
>
> I tried creating the df1 dataframe by using iloc instead of loc to avoid any 
> column naming issues.
>
> So i created a list of integers for iloc representing the columns in current 
> example.
>
> df1 = df.iloc[[0,1,5,6,7]] 
>
> However, I ust be misunderstanding the docs
> https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html#pandas.DataFrame.iloc
> Allowed inputs are:
>
> An integer, e.g. 5.
> A list or array of integers, e.g. [4, 3, 0].
>
> Because while it works I appear to grab all columns 13 when I requested 5.
>   UID Name FTE Agent ID Current Leader New Leader Current Team New
> Team Current Site New Site Unnamed: 10 Unnamed: 11 Unnamed: 12
>
> How do I misunderstand iloc?
>
That would select ROWS 0,1,5,6,7, not columns.
To select columns 0,1,5,6,7, use two-dimensional indexes

df1 = df.iloc[:, [0,1,5,6,7]]

: selects all rows.
-- 
Piet van Oostrum 
WWW: http://piet.vanoostrum.org/
PGP key: [8DAE142BE17999C4]
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: pandas loc on str lower for column comparison

2019-09-04 Thread Sayth Renshaw
On Sunday, 1 September 2019 10:48:54 UTC+10, Sayth Renshaw  wrote:
> I've created a share doc same structure anon data from my google drive.
> 
> https://drive.google.com/file/d/0B28JfFTPNr_lckxQRnFTRF9UTEFYRUVqRWxCNVd1VEZhcVNr/view?usp=sharing
> 
> Sayth

I tried creating the df1 dataframe by using iloc instead of loc to avoid any 
column naming issues.

So i created a list of integers for iloc representing the columns in current 
example.

df1 = df.iloc[[0,1,5,6,7]] 

However, I ust be misunderstanding the docs 
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html#pandas.DataFrame.iloc
Allowed inputs are:

An integer, e.g. 5.
A list or array of integers, e.g. [4, 3, 0].

Because while it works I appear to grab all columns 13 when I requested 5.
UID NameFTE Agent IDCurrent Leader  New Leader  
Current TeamNew TeamCurrent SiteNew SiteUnnamed: 10 
Unnamed: 11 Unnamed: 12 

How do I misunderstand iloc?

Thanks,
Sayth
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: pandas loc on str lower for column comparison

2019-09-01 Thread Sayth Renshaw



I've created a share doc same structure anon data from my google drive.

https://drive.google.com/file/d/0B28JfFTPNr_lckxQRnFTRF9UTEFYRUVqRWxCNVd1VEZhcVNr/view?usp=sharing

Sayth
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: pandas loc on str lower for column comparison

2019-08-31 Thread Sayth Renshaw
On Sunday, 1 September 2019 05:19:34 UTC+10, Piet van Oostrum  wrote:
> Sayth Renshaw writes:
> 
> > But on both occasions I receive this error.
> >
> > # KeyError: 'the label [Current Team] is not in the [index]'
> >
> > if I test df1 before trying to create the new column it works just fine.
> >
> What do you mean by testing df1?
> 
> And could it be that 'Current Team' is spelled differently in the assignment 
> than in the construction of df1? For example a difference in spaces, like a 
> triling space or a breaking vs. non-breaking space? Please check that both 
> are exactly the same.
> 
> -- 
> Piet van Oostrum 
> WWW: http://piet.vanoostrum.org/
> PGP key: [8DAE142BE17999C4]

Hi

Version info

1.1.0 #scipy
0.23.4 #pandas

RangeIndex: 35 entries, 0 to 34
Data columns (total 5 columns):
UID 35 non-null object
Name35 non-null object
New Leader  35 non-null object
Current Team35 non-null object
New Team35 non-null object
dtypes: object(5)
memory usage: 1.4+ KB

I had to anonymise the sheet. But same structure. tested produces same errors.

import pandas as pd
import scipy

print(scipy.version.version)
print(pd.__version__)
xls = pd.ExcelFile("Melbourne-anon.xlsx")
df = xls.parse('Sheet1', skiprows= 4)
# df1 = df.loc[:, ('UID','Name','New Leader','Current Team', 'New Team')]
# print(df['Current Team'] == df1['Current Team'])
df1 = df.loc[:, ('UID','Name','New Leader','Current Team', 'New Team')] 
# df1['Difference'] = df1.loc['Current Team'].str.lower().str.strip() == 
df1.loc['New Team'].str.lower().str.strip()
# df1 = df[['UID','Name','New Leader','Current Team', 'New Team']].copy()
df1['Difference'] = df1.loc['Current Team'].str.lower().str.strip() == 
df1.loc['New Team'].str.lower().str.strip()
df1.info()


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: pandas loc on str lower for column comparison

2019-08-31 Thread Piet van Oostrum
Sayth Renshaw  writes:

> But on both occasions I receive this error.
>
> # KeyError: 'the label [Current Team] is not in the [index]'
>
> if I test df1 before trying to create the new column it works just fine.
>
What do you mean by testing df1?

And could it be that 'Current Team' is spelled differently in the assignment 
than in the construction of df1? For example a difference in spaces, like a 
triling space or a breaking vs. non-breaking space? Please check that both are 
exactly the same.

-- 
Piet van Oostrum 
WWW: http://piet.vanoostrum.org/
PGP key: [8DAE142BE17999C4]
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: pandas loc on str lower for column comparison

2019-08-30 Thread Piet van Oostrum
Sayth Renshaw  writes:

>
> I have tried both
>
> df1 = df.loc[:, ('UID','Name','New Leader','Current Team', 'New Team')] 
> df1['Difference'] = df1.loc['Current Team'].str.lower().str.strip() ==
> df1.loc['New Team'].str.lower().str.strip()
>
> and 
>
> df1 = df[['UID','Name','New Leader','Current Team', 'New Team']].copy()
> df1['Difference'] = df1.loc['Current Team'].str.lower().str.strip() ==
> df1.loc['New Team'].str.lower().str.strip()
>
> But on both occasions I receive this error.
>
> # KeyError: 'the label [Current Team] is not in the [index]'
>
> if I test df1 before trying to create the new column it works just fine.
>
> Sayth

What does df1.info() produce?
Which versions of numpy, scipy and pandas are you using?

It would be helpful if you could make a copy of the original .xlsx file 
available.
-- 
Piet van Oostrum 
WWW: http://piet.vanoostrum.org/
PGP key: [8DAE142BE17999C4]
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: pandas loc on str lower for column comparison

2019-08-29 Thread Sayth Renshaw
On Friday, 30 August 2019 00:49:32 UTC+10, Piet van Oostrum  wrote:
> Piet van Oostrum  writes:
> 
> > So the correct way to do this is to make df1 a copy rather than a view.
> >
> > df1 = df.loc[:, ('UID','Name','New Leader','Current Team', 'New Team')]
> 
> Or maybe even make an explicit copy:
> 
> df1 = df[['UID','Name','New Leader','Current Team', 'New Team']].copy()
> -- 
> Piet van Oostrum
> WWW: http://piet.vanoostrum.org/
> PGP key: [8DAE142BE17999C4]

I have tried both

df1 = df.loc[:, ('UID','Name','New Leader','Current Team', 'New Team')] 
df1['Difference'] = df1.loc['Current Team'].str.lower().str.strip() == 
df1.loc['New Team'].str.lower().str.strip()

and 

df1 = df[['UID','Name','New Leader','Current Team', 'New Team']].copy()
df1['Difference'] = df1.loc['Current Team'].str.lower().str.strip() == 
df1.loc['New Team'].str.lower().str.strip()

But on both occasions I receive this error.

# KeyError: 'the label [Current Team] is not in the [index]'

if I test df1 before trying to create the new column it works just fine.

Sayth
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: pandas loc on str lower for column comparison

2019-08-29 Thread Sayth Renshaw
On Friday, 30 August 2019 00:49:32 UTC+10, Piet van Oostrum  wrote:
> Piet van Oostrum 
> 
> > So the correct way to do this is to make df1 a copy rather than a view.
> >
> > df1 = df.loc[:, ('UID','Name','New Leader','Current Team', 'New Team')]
> 
> Or maybe even make an explicit copy:
> 
> df1 = df[['UID','Name','New Leader','Current Team', 'New Team']].copy()

Thank you so much.

Sayth

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: pandas loc on str lower for column comparison

2019-08-29 Thread Piet van Oostrum
Piet van Oostrum  writes:

> So the correct way to do this is to make df1 a copy rather than a view.
>
> df1 = df.loc[:, ('UID','Name','New Leader','Current Team', 'New Team')]

Or maybe even make an explicit copy:

df1 = df[['UID','Name','New Leader','Current Team', 'New Team']].copy()
-- 
Piet van Oostrum 
WWW: http://piet.vanoostrum.org/
PGP key: [8DAE142BE17999C4]
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: pandas loc on str lower for column comparison

2019-08-29 Thread Piet van Oostrum
Sayth Renshaw  writes:

> Hi
>
> I am importing 4 columns into a dataframe from a spreadsheet.
>
> My goal is to create a 5th column with TRUE or False if column 4 (str) 
> matches column 3.
>
> Trying to leverage this answer https://stackoverflow.com/a/35940955/461887
>
> This is my code 
>
> import pandas as pd
>
> xls = pd.ExcelFile("Melbourne.xlsx")
> df = xls.parse('Sheet1', skiprows= 4)
> df1 = df[['UID','Name','New Leader','Current Team', 'New Team']]
> df1['Difference'] = df1['Current
> Team'].str.lower().str.replace('s/+',"") == df1['New
> Team'].str.lower().str.replace('s/+',"")
>
> Which gives this error
>
> C:\Users\u369811\AppData\Local\Continuum\anaconda3\lib\site-packages\ipykernel_launcher.py:6:
> SettingWithCopyWarning:
> A value is trying to be set on a copy of a slice from a DataFrame.

I think what happens is that df1 is not a copy of the subset of df that you 
want, but it is a VIEW on df instead. This means that df1 shares memory with df 
(for memory savings reasons). But if you would change this view by adding a 
column, where should it be put? In df? If so, where?

So the correct way to do this is to make df1 a copy rather than a view.

df1 = df.loc[:, ('UID','Name','New Leader','Current Team', 'New Team')]

And than it should work.
Except that the str.replace is wrong for what you want.  It just replaces the 
literal string "s/+" with an empty string instead of white space. This was 
wrong in the stackoverflow post.
To replace whitespace it should be str.replace('\\s+',"", regex=True). But 
simpler is to use str.strip():

df1['Difference'] = df1['Current Team'].str.lower().str.strip() == df1['New 
Team'].str.lower().str.strip()
-- 
Piet van Oostrum 
WWW: http://piet.vanoostrum.org/
PGP key: [8DAE142BE17999C4]
-- 
https://mail.python.org/mailman/listinfo/python-list