Excellent, thank you so much. I don't understand all the steps at this stage so I will need some time to go through it carefully but it works perfectly. Thanks again!
On Tue, May 8, 2012 at 4:41 PM, Andre' Walker-Loud <[email protected]> wrote:

> Hello anonymous questioner,
>
> first comment - you may want to look into hdf5 data structures
>
> http://www.hdfgroup.org/HDF5/
>
> and the python tools to play with them
>
> pytables - http://www.pytables.org/moin
> h5py - http://code.google.com/p/h5py/
>
> I have personally used pytables more - but not for any good reason. If
> you happen to have the Enthought python distribution - these come with the
> package, as well as an installation of hdf5
>
> hdf5 is a very nice file format for storing large amounts of data (binary)
> with descriptive meta-data. Also, numpy plays very nice with hdf5. Given
> all your questions here, I suspect you would benefit from learning about
> these and learning to play with them.
>
> Now to your specific question.
>
> > I would like to calculate summary statistics of rainfall based on year
> > and month.
> > I have the data in a text file (although could put in any format if it
> > helps) extending over approx 40 years:
> >
> > YEAR MONTH MeanRain
> > 1972 Jan 12.7083199
> > 1972 Feb 14.17007142
> > 1972 Mar 14.5659302
> > 1972 Apr 1.508517302
> > 1972 May 2.780009889
> > 1972 Jun 1.609619287
> > 1972 Jul 0.138150181
> > 1972 Aug 0.214346148
> > 1972 Sep 1.322102228
> >
> > I would like to be able to calculate the total rain annually:
> >
> > YEAR Annualrainfall
> > 1972 400
> > 1973 300
> > 1974 350
> > ....
> > 2011 400
> >
> > and also the monthly mean rainfall for all years:
> >
> > YEAR MonthlyMeanRain
> > Jan 13
> > Feb 15
> > Mar 8
> > .....
> > Dec 13
> >
> > Is this something I can easily do?
>
> Yes - this should be very easy.
> Imagine importing all this data into a numpy array
>
> ===
> import numpy as np
>
> # your_data is the path to the text file; [1:] skips the header line
> data = open(your_data).readlines()[1:]
> years = []
> for line in data:
>     if line.split()[0] not in years:
>         years.append(line.split()[0])
> months = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
>
> # assumes every year has all 12 months, listed in calendar order
> rain_fall = np.zeros([len(years),len(months)])
> for y,year in enumerate(years):
>     for m,month in enumerate(months):
>         rain_fall[y,m] = float(data[y * 12 + m].split()[2])
>
> # to get average per year - average over months - axis=1
> print(np.mean(rain_fall,axis=1))
>
> # to get average per month - average over years - axis=0
> print(np.mean(rain_fall,axis=0))
>
> ===
>
> now you should imagine doing this by setting up dictionaries, so that you
> can request an average for year 1972 or for month March. That is why I
> used the enumerate function before to walk the indices - so that you can
> imagine building the dictionary simultaneously.
>
> years = {'1972':0, '1973':1, ....}
> months = {'Jan':0,'Feb':1,...'Dec':11}
>
> then you can access and store the data to the array using these
> dictionaries.
>
> print(rain_fall[years['1984'], months['Mar']])
>
>
> Andre
>
> > I have started by simply importing the text file but data is not
> > represented as time so that is probably my first problem and then I am not
> > sure how to group them by month/year.
> >
> > textfile=r"textfile.txt"
> > f=np.genfromtxt(textfile,skip_header=1)
> >
> > Any feedback will be greatly appreciated.
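
For anyone reading this thread later: the grouping by year and by month discussed above can also be sketched with plain dictionaries and no numpy at all. This is only an illustration, not Andre's array-based method. The names SAMPLE and summarize are made up for this example, the 1973 row is invented test data, and it assumes that the "total rain annually" is simply the sum of the monthly MeanRain values for each year. Unlike the array version, it does not need every year to have all 12 months.

```python
from collections import defaultdict

# Hypothetical sample in the layout from the question: three 1972 rows copied
# from the thread, plus one invented 1973 row so that Jan has two years.
SAMPLE = """\
YEAR MONTH MeanRain
1972 Jan 12.7083199
1972 Feb 14.17007142
1972 Mar 14.5659302
1973 Jan 10.0
"""


def summarize(lines):
    """Return (annual_total, monthly_mean) dicts from whitespace-separated rows."""
    annual_total = defaultdict(float)    # year -> sum of that year's values
    monthly_values = defaultdict(list)   # month -> values collected from every year
    rows = iter(lines)
    next(rows, None)                     # skip the header row
    for line in rows:
        if not line.strip():
            continue                     # ignore blank lines
        year, month, rain = line.split()
        annual_total[int(year)] += float(rain)
        monthly_values[month].append(float(rain))
    # Mean over all years in which the month appears.
    monthly_mean = {m: sum(v) / len(v) for m, v in monthly_values.items()}
    return dict(annual_total), monthly_mean


annual_total, monthly_mean = summarize(SAMPLE.splitlines())
for year in sorted(annual_total):
    print(year, round(annual_total[year], 4))
for month, mean in monthly_mean.items():
    print(month, round(mean, 4))
```

To run it on a real file, something like summarize(open("textfile.txt")) should work, since a file object yields its lines one at a time.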
_______________________________________________
Tutor maillist - [email protected]
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
