Exposito, Pedro (RIS-MDW) wrote:

> This code does a "where" clause on a panda data frame...
> 
> Code:
> import pandas as pd;
> col_names = ['Name', 'Age', 'Weight', "Education"];
> # create panda dataframe
> x = pd.read_csv('test.dat', sep='|', header=None, names = col_names);
> # apply "where" condition
> z = x[ (x['Age'] == 55) ]
> # prints row WHERE age == 55
> print (z);
> 
> What is happening in this statement:
> z = x[ (x['Age'] == 55) ]
> 
> Thanks,

Let's take it apart into individual steps:

Make up example data:

>>> import pandas as pd
>>> x = pd.DataFrame([["Jim", 44], ["Sue", 55], ["Alice", 66]],
...                  columns=["Name", "Age"])
>>> x
    Name  Age
0    Jim   44
1    Sue   55
2  Alice   66

Have a look at the inner expression:

>>> x["Age"] == 55
0    False
1     True
2    False
Name: Age, dtype: bool

So this is basically a vector of boolean values, one per row. If you want 
more details: in numpy, operations involving a scalar and an array work via 
"broadcasting", i.e. the scalar is compared with every element of the array. 
In pure Python you would write something similar as

>>> [v == 55 for v in x["Age"]]
[False, True, False]
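
For comparison, here is a minimal sketch of the same comparison on a bare 
numpy array. The name `ages` is made up for the example; it is not part of 
the original code.

```python
import numpy as np

# hypothetical stand-in for the "Age" column
ages = np.array([44, 55, 66])

# the scalar 55 is broadcast, i.e. compared with every element in turn
mask = ages == 55

print(mask)        # [False  True False]
print(mask.dtype)  # bool
```

The pandas column behaves the same way, except that the result is a Series 
which also carries the row labels 0, 1, 2.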

Use a boolean list like this as an index (with two True entries here, so 
that the effect of the selection is easier to see):

>>> x[[False, True, True]]
    Name  Age
1    Sue   55
2  Alice   66

[2 rows x 2 columns]

This is again in line with numpy arrays -- if you pass an array of boolean 
values as an index, the values in the True positions are selected. In pure 
Python you could achieve that for a single column with

>>> index = [v == 55 for v in x["Age"]]
>>> index
[False, True, False]
>>> [v for b, v in zip(index, x["Age"]) if b]
[55]
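
Putting the pieces back together: the sketch below splits the original 
statement into its two steps, assuming the made-up example data from above 
rather than the contents of test.dat. The names `mask` and `z_loc` are only 
for illustration, and `x.loc[mask]` is an equivalent, more explicit spelling 
of the same row selection.

```python
import pandas as pd

# example data from above (made up, not the contents of test.dat)
x = pd.DataFrame([["Jim", 44], ["Sue", 55], ["Alice", 66]],
                 columns=["Name", "Age"])

# step 1: compare every value in the "Age" column with 55
mask = x["Age"] == 55

# step 2: keep only the rows where the mask is True
z = x[mask]          # same as x[x["Age"] == 55]
z_loc = x.loc[mask]  # explicit spelling, selects the same rows

print(z)
```

Only the row for Sue survives, and it keeps its original index label 1.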


-- 
https://mail.python.org/mailman/listinfo/python-list