On 7/4/2009 9:09 AM Steven Buck said...
Dear Python Tutor,
I'm doing econometric work and am a new user of Python. I have read several of the tutorials, but haven't found them useful for a newbie problem I've encountered. I've used a module (StataTools) from (http://presbrey.mit.edu/PyDTA) to get a Stata ".dta" file into Python. In Stata the data set is an N x K matrix, where N is the number of observations (households) and K is the number of variables. I gather it's now a list where each element of the list is an observation (a vector) for one household. The name of my list is "data"; I gather Python refers to the first observation as data[1]. For example, data = [X_1, X_2, X_3, ..., X_N], where each X_i is a vector of household characteristics, e.g. X_1 = (age_1, wage_1, ..., residence_1).

I also have a list of variable names called "varname", although I'm not sure the module I used to extract the ".dta" into Python also created a correspondence between the varname list and the data list. The Python interpreter won't print anything when I type one of the variable names; I was hoping it would print out a vector of ages or the like.
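Two quick points on the setup described above. Python lists are zero-indexed, so the first household is data[0], not data[1]. And entries in a list such as varname are just strings, not Python variables, so typing a name at the prompt won't pull out a column. The link between varname and data is positional. A minimal sketch, assuming the i-th name in varname describes the i-th entry of every row (the sample values and the names 'age', 'wage', 'residence' are hypothetical stand-ins for what PyDTA actually returns):

```python
# Hypothetical stand-ins for the PyDTA output: 'varname' holds the K
# variable names, 'data' holds N rows (one per household).
varname = ['age', 'wage', 'residence']
data = [
    [34, 12.5, 'urban'],
    [51, 9.0, 'rural'],
    [28, 15.25, 'urban'],
]

# Zero-based indexing: the first household is data[0].
first = data[0]

# Pair each variable name with the matching entry of that row,
# assuming rows are ordered the same way as 'varname'.
first_household = dict(zip(varname, first))
print(first_household['age'])  # -> 34
```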

Assuming you're working in the Python console, roughly following the example on the PyDTA website:

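from PyDTA import Reader
# open the .dta file in binary mode and hand it to PyDTA's Reader
dta = Reader(open('input.dta', 'rb'))
# build a '%s,%s,...' template with one slot per variable
fields = ','.join(['%s']*len(dta.variables()))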

...you might try starting with dir(dta.variables) or help(dta.variables).

I didn't look, but the sources are available as well.


In any case, I'd like to make a scatter plot in pylab,

I think I'd use dictionaries along these lines:

  wages = { age_1: [ X_1, X_15, X_3...],
            age_2: [ X_2, X_5... ],
          }


but I don't know how to identify a variable in "data" (i.e., I'd like one vector listing the ages of the households and another listing their wages).
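One way to pull a single variable out as a list is to look up its position in varname and take that element from every row. This is a minimal sketch assuming each row in data follows the order of varname; the sample values and the helper name column are hypothetical.

```python
# Hypothetical sample data; in practice 'varname' and 'data' come from PyDTA.
varname = ['age', 'wage', 'residence']
data = [
    [34, 12.5, 'urban'],
    [51, 9.0, 'rural'],
    [28, 15.25, 'urban'],
]

def column(name, rows, names):
    """Return the values of variable `name` across all rows (households)."""
    idx = names.index(name)  # position of the variable within each row
    return [row[idx] for row in rows]

ages = column('age', data, varname)    # -> [34, 51, 28]
wages = column('wage', data, varname)  # -> [12.5, 9.0, 15.25]
```

list.index raises ValueError if the name isn't in varname, which is a quick way to catch a misspelled variable name.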

I think poking into dta.variables will answer this one.
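If the dictionary layout suggested above (households grouped by age) is what you want, collections.defaultdict from the standard library keeps the loop short. A sketch, again with hypothetical sample rows and assuming 'age' is the name that appears in varname:

```python
from collections import defaultdict

# Hypothetical sample data standing in for the PyDTA output.
varname = ['age', 'wage', 'residence']
data = [
    [34, 12.5, 'urban'],
    [51, 9.0, 'rural'],
    [34, 15.25, 'urban'],
]

age_idx = varname.index('age')

# Map each age to the list of household rows with that age,
# i.e. {age_1: [X_1, X_3, ...], age_2: [X_2, ...]}.
households_by_age = defaultdict(list)
for row in data:
    households_by_age[row[age_idx]].append(row)
```

For a scatter plot alone, two parallel lists are usually simpler; the dictionary is handier for per-age summaries such as a mean wage.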

HTH,

Emile

Perhaps I need to run a subroutine that collects each relevant data point into a new list, which I then define as my variable of interest? From the above example, I'd like to create a list such as age = [age_1, age_2, ..., age_N], and likewise for wages. Any help you could offer would be very much appreciated. Also, this is my first time using Python Tutor, so let me know if I've used it appropriately or if I should change or narrow the structure of my question. Thanks,
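Building every variable's list at once amounts to transposing the rows. A minimal sketch, assuming all rows have the same length and follow the order of varname (the sample values are made up): zip(*data) turns N rows into K columns, and pairing those columns with varname gives a lookup by name.

```python
# Hypothetical sample data; replace with the 'varname' and 'data' from PyDTA.
varname = ['age', 'wage', 'residence']
data = [
    [34, 12.5, 'urban'],
    [51, 9.0, 'rural'],
    [28, 15.25, 'urban'],
]

# zip(*data) regroups the rows into one tuple per variable;
# dict(zip(...)) labels each of those columns with its variable name.
columns = dict(zip(varname, zip(*data)))

age = list(columns['age'])    # -> [34, 51, 28]
wage = list(columns['wage'])  # -> [12.5, 9.0, 15.25]
```

With those two lists, pylab.scatter(age, wage) followed by pylab.show() should draw the scatter plot. Note that zip stops at the shortest row, so a ragged row would silently truncate every column.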
Steve

--
Steven Buck
Ph.D. Student
Department of Agricultural and Resource Economics
University of California, Berkeley


------------------------------------------------------------------------

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
