On 7/4/2009 9:09 AM Steven Buck said...
Dear Python Tutor,
I'm doing econometric work and am a new user of Python. I have read
several of the tutorials, but haven't found them useful for a newbie
problem I've encountered.
I've used a module (StataTools) from http://presbrey.mit.edu/PyDTA to
get a Stata ".dta" file into Python. In Stata the data set is an N x K
matrix where N is the number of observations (households) and K is the
number of variables.
I gather it's now a list where each element of the list is an
observation (a vector) for one household. The name of my list is
"data"; I gather Python recognizes the first observation by: data[1] .
Example:
data = [X_1, X_2, X_3, . . . , X_N] where each X_i is a
vector of household characteristics, e.g. X_1 = (age_1, wage_1, . . . ,
residence_1).
I also have a list of variable names called "varname", although I'm not
sure whether the module I used to extract the ".dta" into Python also
created a correspondence between the varname list and the data list. The
Python interpreter won't print anything when I type one of the variable
names; I was hoping it would print out a vector of ages or the like.
Assuming you're working in the Python console, roughly following the
example on the PyDTA source website:
from PyDTA import Reader
dta = Reader(file('input.dta'))
fields = ','.join(['%s']*len(dta.variables()))
... you might try starting with dir(dta.variables) or help(dta.variables).
I didn't look, but the sources are available as well.
In any case, I'd like to make a scatter plot in pylab,
I think I'd use dictionaries along these lines:
wages = { age_1: [ X_1, X_15, X_3...],
          age_2: [ X_2, X_5... ],
        }
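A minimal sketch of that dictionary idea, in case it helps. The sample rows in `data` and the column positions `AGE` and `WAGE` are invented for illustration and are not taken from the actual .dta file; it also assumes each observation is a tuple, and it collects wage values (rather than whole observations) under each age.

```python
from collections import defaultdict

# Hypothetical sample: each tuple is one household (age, wage, residence).
data = [
    (34, 52000.0, "urban"),
    (51, 61000.0, "rural"),
    (34, 48000.0, "rural"),
]

# Assumed positions of the two variables inside each observation.
AGE, WAGE = 0, 1

# Map each distinct age to the list of wages observed at that age.
wages = defaultdict(list)
for row in data:
    wages[row[AGE]].append(row[WAGE])
```

Using `defaultdict(list)` avoids checking whether an age has been seen before appending to its list.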
but don't know how
to identify a variable in "data" (i.e. I'd like a vector listing the
ages and another vector listing the wages of households).
I think poking into dta.variables will answer this one.
HTH,
Emile
Perhaps I
need to run a subroutine to collect each relevant data point to create a
new list which I define as my variable of interest? From the above
example, I'd like to create a list such as: age = [age_1, age_2, . . . ,
age_N] and likewise for wages.
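One way to build such lists is a list comprehension over the observations. This is only a sketch under assumptions the thread has not confirmed: the sample `varname` and `data` values are made up, the helper `column` is hypothetical, and it assumes `varname` lists the variables in the same order as the fields inside each observation. Note also that Python lists are zero-indexed, so the first observation is `data[0]`, not `data[1]`.

```python
# Hypothetical stand-ins for the lists described in the question.
varname = ["age", "wage", "residence"]
data = [
    (34, 52000.0, "urban"),
    (51, 61000.0, "rural"),
    (29, 39000.0, "urban"),
]


def column(data, varname, name):
    """Return the values of one variable across all observations.

    Assumes varname is ordered the same way as the fields in each row.
    """
    i = varname.index(name)  # position of the variable within a row
    return [row[i] for row in data]


age = column(data, varname, "age")
wage = column(data, varname, "wage")

first_household = data[0]  # indexing starts at 0
```

If pylab is installed, the two lists could then be passed to `pylab.scatter(age, wage)` followed by `pylab.show()`.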
Any help you could offer would be very much appreciated. Also, this is
my first time using the python tutor, so let me know if I've used it
appropriately or if I should change/narrow the structure of my question.
Thanks
Steve
--
Steven Buck
Ph.D. Student
Department of Agricultural and Resource Economics
University of California, Berkeley
------------------------------------------------------------------------
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor