On Mon, 4 Oct 2010 08:33:07 am David Hutto wrote:
> I'm creating an app that charts/graphs data. The mapping of the
> graphs is the 'easy' part with matplotlib,
> and wx. My question relates to the alignment of the data to be
> processed.
>
> Let's say I have three sets of 24 hr graphs with the same time steps:
>
> -the position of the sun
> -the temp.
> -local powerplant energy consumption
>
>
> A human could perceive the relations that when it's wintertime, cold
> and the sun goes down, heaters are turned on
> and energy consumption goes up, and the opposite in summer when it
> the sun comes up.
> My problem is how to compare and make the program perceive the
> relation.
This is a statistics problem, not a programming problem. Or rather, parts of it *use* programming to solve the statistics problem.

My statistics isn't good enough to tell you how to find correlations between three variables, but I can break the problem into a simpler one: find the correlation between two variables, temperature and energy consumption.

Without understanding how the data was generated, I'm not entirely sure how to set the data up, but here's one approach:

(1) Plot the relationship between:

    x = temperature
    y = power consumption

where x is the independent variable and y is the dependent variable.

(2) Look at the graph. Can you see any obvious pattern? If all the data points are scattered randomly around the graph, then you can be fairly sure that there is no correlation, and you can go straight on to calculating the correlation coefficient to make sure.

(3) But if the graph clearly appears to be made of separate sections, AND those sections correspond to physical differences due to the time of day (position of the sun), then you need to break the data set into multiple data sets and work on each one individually.

E.g. if the graph forms a straight line pointing DOWN for the hours 11pm to 5am, and a straight line pointing UP for the hours 5am to 11pm, and you can think of a physical reason why this is plausible, then you would be justified in separating the data into two independent sets: 5am-11pm and 11pm-5am.

If you want the program to do this part for you, this is a VERY hard problem. You would essentially be writing an artificial intelligence system capable of picking out statistical correlations from data. Such software does exist. It tends to cost hundreds of thousands of dollars, or millions. Good luck writing your own!

(4) Otherwise, feel free to simplify the problem by just investigating the relationship between temperature and power consumption during (say) daylight hours.
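Steps (1) to (4) boil down to splitting the readings by time of day and then examining each subset on its own. Here is a minimal Python sketch of that, assuming each reading is a hypothetical `(hour, temperature, power)` tuple with the hour in 0..23. The function names `split_by_hours`, `pearson_r`, `least_squares_line` and `analyse` are illustrative only, not from any library. The sketch also computes the correlation coefficient and the least-squares regression line described in the later steps, so each subset can be checked separately. It does not draw the graphs or test whether a correlation is significant.

```python
import math


def split_by_hours(readings, start, end):
    """Split readings into (inside, outside), where inside means start <= hour < end.

    ASSUMPTION (hypothetical data layout): each reading is an
    (hour, temperature, power) tuple with hour in 0..23.
    """
    inside = [r for r in readings if start <= r[0] < end]
    outside = [r for r in readings if not start <= r[0] < end]
    return inside, outside


def _sums(xs, ys):
    """Return (mean_x, mean_y, sxy, sxx, syy) for paired samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return mean_x, mean_y, sxy, sxx, syy


def pearson_r(xs, ys):
    """Pearson correlation coefficient r, always between -1 and 1."""
    _, _, sxy, sxx, syy = _sums(xs, ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares line y = slope * x + intercept."""
    mean_x, mean_y, sxy, sxx, _ = _sums(xs, ys)
    if sxx == 0:
        raise ValueError("slope is undefined when x is constant")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def analyse(readings):
    """Treat 5am-11pm and 11pm-5am as independent data sets, as in step (3)."""
    day, night = split_by_hours(readings, 5, 23)
    results = {}
    for name, subset in (("5am-11pm", day), ("11pm-5am", night)):
        temps = [t for _, t, _ in subset]
        power = [p for _, _, p in subset]
        results[name] = (pearson_r(temps, power), least_squares_line(temps, power))
    return results
```

On Python 3.10 or later, `statistics.correlation` and `statistics.linear_regression` in the standard library compute the same two quantities; the hand-written versions above just make the arithmetic visible. To plot each subset for step (2), matplotlib's `pyplot.scatter(temps, power)` is enough.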
(5) However you decide to proceed, you should now have one (or more) x-y graphs. The first step is to decide whether there is any correlation at all. If there is not, you can stop there.

Calculate the correlation coefficient, r. r will be a number between -1 and 1. r=1 means a perfect positive correlation; r=-1 means a perfect negative correlation. r=0 means no (linear) correlation at all.

(6) Decide whether the correlation is meaningful. I don't remember how to do this -- consult your statistics textbooks. If it's not meaningful, then you are done -- there's no statistically valid relationship between the variables.

(7) Otherwise, you want to calculate the line of best fit (or possibly some other curve, but let's stick to straight lines for now) for the data. The line of best fit may be complicated to calculate, and it may not be justified statistically, so start off with something simpler which (hopefully!) is nearly as good -- a linear regression line. This calculates the straight line that minimises the sum of the squared vertical distances between the line and your data points (the "least squares" line).

(8) Technically, you can calculate a regression line for *any* data, even if it clearly doesn't form a line. That's why you check the correlation coefficient first, to decide whether a regression line is sensible or not.

By now any *real* statisticians reading this will be horrified :) What I've described is essentially the most basic, "Stats 101 for Dummies" level.

Have fun!

-- 
Steven D'Aprano
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor