On 09/24/2014 05:19 AM, questions anon wrote:
Ok, I am continuing to get stuck. I think I was asking the wrong question
so I have posted the entire script (see below).
What I really want to do is find the daily maximum for a dataset (e.g.
Temperature) that is contained in monthly netcdf files where the data are
separated by hour.
The major steps are:
1. Open the monthly netcdf files and use the timestamp to extract the hourly data for the first day.
2. Append the data for each hour of that day to a list, concatenate, find the max, and plot it.
3. Loop through and do the same for the following day.

I can do some of the steps separately, but I run into trouble working out how to loop through each hour, separate the data by day, and then repeat all the steps for the following day.

Any feedback will be greatly appreciated!



This is NOT the whole program. You don't seem to define ncvariablename, Dataset, np, Basemap, plt, and probably others.


oneday=[]
all_the_dates=[]
onedateperday=[]


#open folders and open relevant netcdf files that are monthly files containing hourly data across a region
for (path, dirs, files) in os.walk(MainFolder):
         for ncfile in files:

If there is more than one file, or more than one directory, there is no guarantee that this will give you the files in the order you want; you may need to do some sorting to make it work out. But on further study, you seem to assume you'll read everything in from all the files, and then process it. Have you estimated how big that might get, and whether you'll have enough memory to make it viable?
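For example, a deterministic walk can be had by sorting both lists that os.walk() hands you (sorting dirs in place also controls the descent order) -- a minimal sketch:

```python
import os

def walk_sorted(root):
    # os.walk() makes no promise about ordering; sort dirs in place so
    # the descent order is deterministic, and sort files before yielding.
    for path, dirs, files in os.walk(root):
        dirs.sort()
        for name in sorted(files):
            yield path, name
```
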

             if ncfile.endswith(ncvariablename+'.nc'):
                 print "dealing with ncfile:", os.path.join(path, ncfile)
                 ncfile=os.path.join(path,ncfile)
                 ncfile=Dataset(ncfile, 'r+', 'NETCDF4')
                 variable=ncfile.variables[ncvariablename][:,:,:]
                 TIME=ncfile.variables['time'][:]

You've given the name ncfile three different meanings in four lines: a bare filename, a full path, and an open Dataset. The close() happens to land on the Dataset here, but that kind of name reuse is fragile and confusing. Variables are cheap, use lots of them, and give them good names. In this particular case, maybe using a with statement would make sense.
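The pattern looks like this -- one name per meaning, and a with-block so the close happens automatically. Shown with a plain text file so it runs anywhere; recent versions of netCDF4.Dataset support the same protocol (with Dataset(path) as nc: ...), but check your version:

```python
def read_lines(path):
    # 'path' is the path, 'fh' is the open handle, 'lines' is the data.
    # The with-block closes fh even if an exception is raised inside it.
    with open(path) as fh:
        lines = fh.read().splitlines()
    return lines
```
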

                 ncfile.close()

                 #combine variable and time so I can make calculations based on hours or days
                 for v, time in zip(variable, TIME):

                     cdftime=utime('seconds since 1970-01-01 00:00:00')
                     ncfiletime=cdftime.num2date(time)
                     timestr=str(ncfiletime)
                     utc_dt = dt.strptime(timestr, '%Y-%m-%d %H:%M:%S')
                     au_tz = pytz.timezone('Australia/Sydney')
                     local_dt = utc_dt.replace(tzinfo=pytz.utc).astimezone(au_tz)
                     strp_local=local_dt.strftime('%Y-%m-%d_%H') #strips time down to date and hour
                     local_date=local_dt.strftime('%Y-%m-%d') #strips time down to just date

                     all_the_dates.append(local_date)

#make a list that strips down the dates to only have one date per day (rather than one for each hour)
onedateperday = sorted(set(all_the_dates))

#loop through each day and combine data (v) where the hours occur on the same day
for days in onedateperday:
     if strp_local.startswith(days):
         oneday.append(v)


And just where does v come from? You're no longer in the loop that says "for v, time in..."
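One way to keep each hourly array paired with its date is to group them as you read, inside the loop that produces them, rather than in a separate loop afterwards. A sketch, where records stands in for the (local_date, v) pairs your reading loop produces:

```python
from collections import defaultdict

def group_by_day(records):
    # records: iterable of (date_string, hourly_data) pairs.
    # Appending inside the loop keeps each v attached to its own date,
    # instead of only the last v surviving the loop.
    per_day = defaultdict(list)
    for local_date, v in records:
        per_day[local_date].append(v)
    return dict(per_day)
```
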

big_array=np.ma.dstack(oneday) #concatenate data
v_DailyMax=big_array.max(axis=2) # find max

#then go on to plot v_DailyMax for each day
map = Basemap(projection='merc',llcrnrlat=-40,urcrnrlat=-33,
         llcrnrlon=139.0,urcrnrlon=151.0,lat_ts=0,resolution='i')
map.drawcoastlines()
map.drawstates()
map.readshapefile(shapefile1, 'REGIONS')
x,y=map(*np.meshgrid(LON,LAT))
plottitle=ncvariablename+'v_DailyMax'+days
cmap=plt.cm.jet
CS = map.contourf(x,y,v_DailyMax, 15, cmap=cmap)
l,b,w,h =0.1,0.1,0.8,0.8
cax = plt.axes([l+w+0.025, b, 0.025, h])
plt.colorbar(CS,cax=cax, drawedges=True)
plt.savefig((os.path.join(OutputFolder, plottitle+'.png')))
plt.show()
plt.close()



It looks to me like you've jumbled together a bunch of different code fragments. Have you learned about writing functions yet? Or generators? By making everything global, you're masking a lot of the mistakes you've made. For example, v has a value, but it's just the data from the last record of the last file.

I think you need to back off and design this, without worrying much at first about how to code it. It's generally a bad idea to try to collect all data before processing it and winnowing it down. If you know something about the filenaming (or directory naming), and can assume that all data from one day will be processed before encountering the next day, you can greatly simplify things. Likewise if one hour is complete before the next begins.
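If you can rely on that ordering, itertools.groupby turns the whole thing into a small generator that never holds more than one day's data. A sketch, assuming the hourly records arrive as (date, value) pairs already sorted by date:

```python
from itertools import groupby

def daily_max(hourly):
    # hourly: iterable of (date_string, value) pairs, sorted by date.
    # groupby emits one group per consecutive run of equal dates, so only
    # one day's worth of values is ever in memory at a time.
    for day, group in groupby(hourly, key=lambda record: record[0]):
        yield day, max(value for _, value in group)
```
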

Is there one file per day? Is there one record per hour and are they in order? If some such assumptions can be made, then you might be able to factor the problem down to a set of less grandiose functions.

Whoever wrote Dataset() has produced some data from a single file that you can work with. What does that data look like, and how much of it are you really needing to save past the opening of the next one?

Sometimes when the data is large and not sorted, it pays off to make two or more passes through it. In the extreme, you might take one pass to make a sorted list of all the hours involved in all the files. Then for each of those hours, you'd take an additional pass looking for data related to that particular hour. Of course if you're talking years, you'll be opening each file many thousands of times. So if you know ANYTHING about the sorting of the data, you can make that more efficient.
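The two-pass idea can be sketched with a plain list standing in for re-reading the files; with real files, pass two would re-open and re-scan them, which is why knowing anything about the sort order pays off:

```python
def distinct_days(records):
    # Pass one: collect just the sorted set of day keys, nothing else.
    return sorted({day for day, _ in records})

def values_for_day(records, wanted):
    # Pass two: pull only the values belonging to one particular day.
    return [value for day, value in records if day == wanted]
```
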

I can't help you at all with numpy (presumably what np stands for), or the plotting stuff.



--
DaveA
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
