Excellent, thank you for the warning. I will look into dictionaries a lot more carefully now. I have some simple questions that I would like to ask straight away, but I will try to figure a few things out on my own first.
Thanks again!!

On Wed, May 9, 2012 at 11:16 AM, Andre' Walker-Loud wrote:

> dear anonymous questioner,
>
> > Excellent, thank you so much. I don't understand all the steps at this
> > stage so I will need some time to go through it carefully but it works
> > perfectly.
> >
> > Thanks again!
>
> words of caution - you will notice that the way I constructed the data
> file - I assumed what the input file would look like (they come
> chronologically and all data are present - no missing years or months).
> While this might be safe for a data file you constructed, and one that is
> short enough to read - this is generally a very bad habit - hence my
> encouraging you to figure out how to make dictionaries.
>
> Imagine how you would read a file that you got from a colleague, which was
> so long you can not look by eye and see that it is intact, or perhaps you
> know that in 1983, the month of June is missing as well as some other holes
> in the data. And perhaps your colleague decided, since those data are
> missing, just don't write the data to the file, so instead of having
>
> 1983 May 2.780009889
> 1983 June nan
> 1983 July 0.138150181
>
> you have
>
> 1983 May 2.780009889
> 1983 July 0.138150181
>
> now the little loop I showed you will fail to place the data in the
> correct place in your numpy array, and you will get all your averaging and
> other analysis wrong.
>
> Instead - you can create dictionaries for the years and months.
> Then when you read in the data, you can grab this info to correctly place
> it in the right spot
>
> # years and months are your dictionaries
> years = {'1972':0,'1973':1,....}
> months = {'Jan':0,'Feb':1,...,'Dec':11}
> data = open(your_data).readlines()
> for line in data:
>     year,month,dat = line.split()
>     y = int(('%('+year+')s') % years)
>     m = int(('%('+month+')s') % months)
>     rain_fall[y,m] = float(dat)
>
> [also - if someone knows how to use the dictionaries more appropriately
> here - please chime in]
>
> then you also have to think about what to do with the empty data sets.
> The initialization
>
> rain_fall = np.zeros([n_years,n_months])
>
> will have placed zeros everywhere - and if the data is missing - it won't
> get re-written. So that will make your analysis bogus also - so you have
> to walk through and replace the zeros with something else, like 'nan'. And
> then you could think about replacing missing data by averages - e.g. replace
> a missing June entry by the average over all the non-zero June data.
>
> I was just hoping to give you a working example that you could use to make
> a functioning well thought out example that can handle the exceptions which
> will arise (like missing data, or a data file with a string where a float
> should be etc.)
>
> Have fun!
>
> Andre
>
> On May 8, 2012, at 5:48 PM, questions anon wrote:
>
> > On Tue, May 8, 2012 at 4:41 PM, Andre' Walker-Loud wrote:
> > Hello anonymous questioner,
> >
> > first comment - you may want to look into hdf5 data structures
> >
> > http://www.hdfgroup.org/HDF5/
> >
> > and the python tools to play with them
> >
> > pytables - http://www.pytables.org/moin
> > h5py - http://code.google.com/p/h5py/
> >
> > I have personally used pytables more - but not for any good reason.
> > If you happen to have the Enthought python distribution - these come with
> > the package, as well as an installation of hdf5
> >
> > hdf5 is a very nice file format for storing large amounts of data
> > (binary) with descriptive meta-data. Also, numpy plays very nice with
> > hdf5. Given all your questions here, I suspect you would benefit from
> > learning about these and learning to play with them.
> >
> > Now to your specific question.
> >
> > > I would like to calculate summary statistics of rainfall based on year
> > > and month.
> > > I have the data in a text file (although could put in any format if it
> > > helps) extending over approx 40 years:
> > > YEAR MONTH MeanRain
> > > 1972 Jan 12.7083199
> > > 1972 Feb 14.17007142
> > > 1972 Mar 14.5659302
> > > 1972 Apr 1.508517302
> > > 1972 May 2.780009889
> > > 1972 Jun 1.609619287
> > > 1972 Jul 0.138150181
> > > 1972 Aug 0.214346148
> > > 1972 Sep 1.322102228
> > >
> > > I would like to be able to calculate the total rain annually:
> > >
> > > YEAR Annualrainfall
> > > 1972 400
> > > 1973 300
> > > 1974 350
> > > ....
> > > 2011 400
> > >
> > > and also the monthly mean rainfall for all years:
> > >
> > > YEAR MonthlyMeanRain
> > > Jan 13
> > > Feb 15
> > > Mar 8
> > > .....
> > > Dec 13
> > >
> > >
> > > Is this something I can easily do?
> >
> > Yes - this should be very easy.
> > Imagine importing all this data into a numpy array
> >
> > ===
> > import numpy as np
> >
> > data = open(your_data).readlines()
> > years = []
> > for line in data:
> >     if line.split()[0] not in years:
> >         years.append(line.split()[0])
> > months = ['Jan','Feb',....,'Dec']
> >
> > rain_fall = np.zeros([len(years),len(months)])
> > for y,year in enumerate(years):
> >     for m,month in enumerate(months):
> >         rain_fall[y,m] = float(data[ y * 12 + m].split()[2])
> >
> > # to get average per year - average over months - axis=1
> > print np.mean(rain_fall,axis=1)
> >
> > # to get average per month - average over years - axis=0
> > print np.mean(rain_fall,axis=0)
> >
> > ===
> >
> > now you should imagine doing this by setting up dictionaries, so that
> > you can request an average for year 1972 or for month March. That is why I
> > used the enumerate function before to walk the indices - so that you can
> > imagine building the dictionary simultaneously.
> >
> > years = {'1972':0, '1973':1, ....}
> > months = {'Jan':0,'Feb':1,...'Dec':11}
> >
> > then you can access and store the data to the array using these
> > dictionaries.
> >
> > print rain_fall[int('%(1984)s' % years), int('%(Mar)s' % months)]
> >
> >
> > Andre
> >
> > > I have started by simply importing the text file but data is not
> > > represented as time so that is probably my first problem and then I am
> > > not sure how to group them by month/year.
> > >
> > > textfile=r"textfile.txt"
> > > f=np.genfromtxt(textfile,skip_header=1)
> > >
> > > Any feedback will be greatly appreciated.
> > >
> > > _______________________________________________
> > > Tutor maillist - [email protected]
> > > To unsubscribe or change subscription options:
> > > http://mail.python.org/mailman/listinfo/tutor
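
On the question in the thread about using the dictionaries more directly: a dictionary can be indexed with the key itself (years['1984']), so the string-formatting round trip is not needed. Below is a minimal sketch of that idea, not a definitive implementation. It assumes Python 3 and a NumPy version that provides np.full and np.nanmean. The names load_rainfall, MONTHS, MONTH_INDEX and the sample lines are hypothetical, made up for illustration. It also assumes that month names can be matched on their first three letters, so that 'June' and 'Jun' land in the same column. The array starts out filled with nan rather than zeros, so a missing month stays visibly missing instead of being counted as zero rainfall.

```python
import numpy as np

# Hypothetical helper names; nothing here comes from the original emails.
MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
MONTH_INDEX = {name: i for i, name in enumerate(MONTHS)}


def load_rainfall(lines):
    """Parse 'YEAR MONTH VALUE' lines into (year_index, rain_fall).

    year_index maps an integer year to a row of rain_fall; months are
    the columns. Entries with no data in the input stay nan.
    """
    records = []
    for line in lines:
        parts = line.split()
        # Skip the header, blank lines and anything not shaped like data.
        if len(parts) != 3 or not parts[0].isdigit():
            continue
        year, month, value = parts
        month = month[:3]  # assumption: 'June' and 'Jun' mean the same month
        if month not in MONTH_INDEX:
            continue
        try:
            records.append((int(year), month, float(value)))
        except ValueError:
            continue  # a string where a float should be

    years = sorted({year for year, _, _ in records})
    year_index = {year: i for i, year in enumerate(years)}

    # Start from nan, not zeros, so holes in the data remain visible.
    rain_fall = np.full((len(years), len(MONTHS)), np.nan)
    for year, month, value in records:
        rain_fall[year_index[year], MONTH_INDEX[month]] = value
    return year_index, rain_fall


# Made-up sample with June 1983 left out of the file entirely.
sample = """YEAR MONTH MeanRain
1983 May 2.780009889
1983 July 0.138150181
1984 May 1.0
1984 June 3.0
1984 July 0.5
""".splitlines()

year_index, rain_fall = load_rainfall(sample)

# Plain dictionary lookups give the array indices directly.
may_1984 = rain_fall[year_index[1984], MONTH_INDEX['May']]

# Annual totals; note that nansum counts a missing month as zero.
annual_total = np.nansum(rain_fall, axis=1)

# Mean over all years for one month, ignoring missing entries.
june = rain_fall[:, MONTH_INDEX['Jun']]
june_mean = np.nanmean(june)

# Optionally replace missing June entries by the June average.
june_filled = np.where(np.isnan(june), june_mean, june)
```

Because each value is placed by looking up its own year and month, the order of the lines and any gaps in the file no longer matter. Calling np.nanmean(rain_fall, axis=0) gives the monthly means for all twelve months at once, but it warns and returns nan for any month that has no data in any year.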
