Exposito, Pedro (RIS-MDW) wrote:

> This code does a "where" clause on a panda data frame...
>
> Code:
> import pandas as pd;
> col_names = ['Name', 'Age', 'Weight', "Education"];
> # create panda dataframe
> x = pd.read_csv('test.dat', sep='|', header=None, names = col_names);
> # apply "where" condition
> z = x[ (x['Age'] == 55) ]
> # prints row WHERE age == 55
> print (z);
>
> What is happening in this statement:
> z = x[ (x['Age'] == 55) ]
>
> Thanks,

Let's take it apart into individual steps:

Make up example data:

>>> import pandas as pd
>>> x = pd.DataFrame([["Jim", 44], ["Sue", 55], ["Alice", 66]], columns=["Name", "Age"])
>>> x
    Name  Age
0    Jim   44
1    Sue   55
2  Alice   66

Have a look at the inner expression:

>>> x["Age"] == 55
0    False
1     True
2    False

So this is basically a vector of boolean values. If you want more
details: in numpy, operations involving a scalar and an array work via
"broadcasting", that is, the scalar is compared with every element of the
array. In pure Python you would write something similar as

>>> [v == 55 for v in x["Age"]]
[False, True, False]

Use such a list of booleans as an index (this one has two True values, so
two rows come back):

>>> x[[False, True, True]]
    Name  Age
1    Sue   55
2  Alice   66

[2 rows x 2 columns]

This is again in line with numpy arrays: if you pass an array of boolean
values as an index, the rows in the True positions are selected. In pure
Python you could achieve that with

>>> index = [v == 55 for v in x["Age"]]
>>> index
[False, True, False]
>>> [v for b, v in zip(index, x["Age"]) if b]
[55]
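
Putting the two steps together on the made-up example data, here is a small sketch of what the statement from the question does. The names `mask` and `z_one_liner` are only mine for illustration, and the data is the three-row example rather than your test.dat:

```python
import pandas as pd

# Same made-up data as in the example session.
x = pd.DataFrame([["Jim", 44], ["Sue", 55], ["Alice", 66]],
                 columns=["Name", "Age"])

# Step 1: the comparison is applied element-wise and gives a boolean Series.
mask = x["Age"] == 55

# Step 2: indexing the frame with that Series keeps the rows where it is True.
z = x[mask]

# The statement from the question does both steps in one expression.
z_one_liner = x[(x["Age"] == 55)]

print(mask.tolist())
print(z)
```

The parentheses around the comparison in the original statement are optional; they only start to matter once you combine several conditions with `&` or `|`.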