Re: Distribution generator for simulator in C++
Dave Nulton wrote:

> Quite frankly Robert the details are proprietary. I suppose I could have been more descriptive, but I don't see what the shape of my distribution has to do with what it represents

Well, our expertise is also proprietary!-)

Robert W. Hayden
Department of Mathematics
Plymouth State College MSC#29
Plymouth, New Hampshire 03264 USA
Rural Route 1, Box 10
Ashland, NH 03217-9702
(603) 968-9914 (home)
[EMAIL PROTECTED] fax (603) 535-2943 (work)
Re: Distribution generator for simulator in C++
On Mon, 10 Jan 2000 18:33:47 -0800, "Dave and Kim Nulton" [EMAIL PROTECTED] wrote:

> Quite frankly Robert the details are proprietary. I suppose I could have been more descriptive, but I don't see what the shape of my distribution has to do with what it represents. I have received several email replies with various recommendations. I'll add transforming to the list. Not being a statistician by trade, the little hints I have received should provide a good starting point at the library. Thanks again

- just to be counted: I will side whole-heartedly with Robert, and I think the statisticians with consulting experience will be with us.

Robert describes what happens in the real world. When we *learn* what has generated the data, when we have pried out the news from people who were sure that we had no need of it, it has been -- too often -- revealing and important. Not 100% of the time, and maybe it is not much more than 25% of the time (but it has certainly been no less).

I don't want someone to say, "What a TOTAL idiot you have to be, to ignore XXX!" I don't want someone to say THAT about my advice. Or, in this case, you should not be able to say, "The folks on sci.stat.edu thought that a little bit of xxx would be okay", since the folks on sse want to warn you that good statistical advice is still an art; if you only provide a caricature of the data, you might get back only a caricature of an answer, no matter how inspired a guesser your advisor may be.

BUT YOU win (and deserve) a reputation as a flake, an unreliable screw-up, if you miss an important, *obvious* issue, even once per 10. Or something like that. And if you knew what was "obvious" then you wouldn't be asking for that advice.

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
Re: Distribution generator for simulator in C++
I see your point, Robert, and I hope you didn't think I was curt in my response (which I may have been). Your message was quite informative. I did meet an on-site statistician who has pledged to help me. I'm particularly interested in the square root transformation. I'm guessing it will compress the data. I'll look into it.

I could have provided a related example to the problem I'm trying to solve. It could best be represented by maintenance intervals on an automobile. Much of the data is random (depending on what you drive). At the same time, some of the data is non-random, in that you must change the oil periodically and that the older the car is, the more likely it is to break down, and for a longer period of time.

You have been quite helpful and I appreciate your time and interest in my problem. I will investigate your leads. You all can use me as an example of ignorant arrogance.

-dnult

Robert Dawson wrote in message 071a01bf5c3e$a131dab0$[EMAIL PROTECTED]...

> Dave Nulton wrote:
>
> > Quite frankly Robert the details are proprietary. I suppose I could have been more descriptive, but I don't see what the shape of my distribution has to do with what it represents
>
> To take the second point first, the origin of a dataset often contains valuable information relating to the plausibility of various models. For instance, it is a truism that "it takes money to make money". If I buy 100 shares of Wombat.Com and you buy 1000 shares, and the price goes up by $5 per share, I make $500 and you make $5000. Because of this inherently multiplicative structure, it is *very* common for financial data to respond well to a logarithmic transformation.
>
> On the other hand, "count" data may - depending on what's being counted and how - follow a "Poisson" model. In such a model, the events being counted happen independently and at random in a "window" of fixed size - calls per day to a help line, flaws per 1000 meters in recording tape, snowflakes landing on your tongue per minute... Such data, if the numbers are small, may require specialized regression techniques; with more data, a square root transformation often helps.
>
> If the data set is small or has any unusual features, it may be difficult to tell which transformation is appropriate just by looking at the data. The "story" of the data is important.
>
> There are many other examples. For instance, even with a simple 2x2 table in which the frequencies of two outcomes are compared under two situations, you need to know whether the trials are independent (in which case a two-sample z test would typically be used) or paired across treatments, in which case McNemar's test would be more appropriate.
>
> For such reasons, it is often impossible to give reliable statistical advice based on numbers _in_vacuo_. I cannot imagine members of many other professions attempting to do the equivalent - indeed, I would hazard a guess that in many cases professional associations would take a dim view of giving a professional opinion to a client/patient/whatever who insisted on withholding relevant information.
>
> I would suggest that if this dataset is important enough to warrant this level of secrecy, you find a statistician who is willing to sign an NDA, and that you pay the going rate for the consultation. (Don't ask me, I'm neither a professional statistician nor interested.) Trying to get advice, free or not, from people whom you do not trust enough to give even a basic explanation seems to me like a waste of your time and ours.
>
> -Robert Dawson
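The remark that a square root transformation often helps with Poisson-type counts can be checked numerically. The C++ sketch below is not from the thread; the names `sample_variance`, `Spread`, and `poisson_spread` are hypothetical, and the means, sample size, and seed are arbitrary choices. It simulates Poisson counts and compares the variance of the raw counts with the variance of their square roots. For a Poisson variable the raw variance equals the mean, while the variance of the square root settles near 1/4 once the mean is reasonably large, which is the sense in which the transformation evens out the spread.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Sample variance (n - 1 denominator) of a vector of values.
double sample_variance(const std::vector<double>& xs) {
    assert(xs.size() > 1);  // variance needs at least two values
    double mean = 0.0;
    for (double x : xs) mean += x;
    mean /= static_cast<double>(xs.size());
    double ss = 0.0;
    for (double x : xs) ss += (x - mean) * (x - mean);
    return ss / static_cast<double>(xs.size() - 1);
}

// Variance of simulated counts before and after the square root transform.
struct Spread {
    double raw;   // variance of the counts themselves
    double root;  // variance of sqrt(count)
};

// Hypothetical experiment: draw n Poisson counts with the given mean
// (illustrative values only, not the poster's data) and measure both spreads.
Spread poisson_spread(double mean, std::size_t n, unsigned seed) {
    std::mt19937 rng(seed);
    std::poisson_distribution<int> counts(mean);
    std::vector<double> raw, root;
    raw.reserve(n);
    root.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        const double c = counts(rng);
        raw.push_back(c);
        root.push_back(std::sqrt(c));
    }
    return {sample_variance(raw), sample_variance(root)};
}
```

Calling `poisson_spread(20.0, 100000, 1u)` and `poisson_spread(100.0, 100000, 1u)` should give raw variances near 20 and 100 but root variances that are both close to 0.25. Whether real counts behave this way depends on whether the Poisson assumptions (independent events, fixed window) actually hold, which is the point of the message above.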
Re: Distribution generator for simulator in C++
Dave Nulton wrote:

> I'm writing a simulator in C++. So far I have written a program to collect data from a database and hope to be able to generate an algorithm to return a random value with a distribution that matches my real world data.
>
> What I'm finding is that the data is UGLY. In order to generate a reasonable representation of the data, I'd need almost 3 million bins, and then most of the information would be crammed into the first 1000 or so bins. I've drawn an ASCII art representation below. I don't want to give up those flyers, because they sum up to a considerable amount. I'm modeling man loading in a manufacturing facility, so throwing out the flyers will really skew my simulator.
>
> Has anyone ever encountered such a problem? Better yet, can someone recommend a C++ algorithm to model my data? I'm thinking I may have to go to some sort of a logarithmic distribution, but it is important to base my simulator on real world data and not generic algorithms. I would be willing to fit a model if I knew of a good model and how to utilize it in C++.
>
> -dnult
>
> [ASCII art sketch of the distribution; layout lost in archiving]

Two points:

(1) I and others have said on various occasions: please, if you're asking for advice about a data set, tell people what it is. I'm not entirely sure that I understand the psychology of this practice, but the result is akin to going to the dentist and refusing to open your mouth.

(2) Try transforming. I don't know if this is good advice or not - see (1).

-Robert Dawson
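One common way to return random values that match observed data without building millions of histogram bins is to sample from the empirical quantile function: sort the observations once, draw a uniform number in [0, 1), and read off the matching quantile. The C++ sketch below is an illustration under assumptions, not a method anyone in the thread proposed. The class name `EmpiricalSampler` is hypothetical, the observations are assumed to fit in memory as `double`s, and linear interpolation between neighbouring order statistics is one choice among several.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper (not from the thread): draws values that follow the
// empirical distribution of a data set by inverting its sorted sample.
class EmpiricalSampler {
public:
    explicit EmpiricalSampler(std::vector<double> observations)
        : sorted_(std::move(observations)) {
        assert(!sorted_.empty());  // need at least one observation
        std::sort(sorted_.begin(), sorted_.end());
    }

    // Quantile for p in [0, 1], linearly interpolated between the two
    // neighbouring order statistics.
    double quantile(double p) const {
        const double pos = p * static_cast<double>(sorted_.size() - 1);
        const std::size_t lo = static_cast<std::size_t>(pos);
        const std::size_t hi = std::min(lo + 1, sorted_.size() - 1);
        const double frac = pos - static_cast<double>(lo);
        return sorted_[lo] + frac * (sorted_[hi] - sorted_[lo]);
    }

    // Draw one value: a uniform number in [0, 1) pushed through the quantiles.
    template <class Rng>
    double operator()(Rng& rng) const {
        std::uniform_real_distribution<double> unit(0.0, 1.0);
        return quantile(unit(rng));
    }

private:
    std::vector<double> sorted_;
};
```

A simulator would construct the sampler once from the database extract and then call it with a `std::mt19937` each time a value is needed. Because every observation keeps its weight, the flyers are reproduced roughly as often as they occurred. The sketch never returns a value beyond the observed minimum or maximum, and interpolation between widely spaced flyers produces values that were never observed, so whether this is acceptable depends on what the data represent.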
Re: Distribution generator for simulator in C++
Quite frankly, Robert, the details are proprietary. I suppose I could have been more descriptive, but I don't see what the shape of my distribution has to do with what it represents.

I have received several email replies with various recommendations. I'll add transforming to the list. Not being a statistician by trade, the little hints I have received should provide a good starting point at the library.

Thanks again

-dnult

Robert Dawson wrote in message 05c501bf5b6f$c3448b90$[EMAIL PROTECTED]...

> Dave Nulton wrote:
>
> snip
>
> (1) I and others have said on various occasions: please, if you're asking for advice about a data set, tell people what it is. I'm not entirely sure that I understand the psychology of this practice, but the result is akin to going to the dentist and refusing to open your mouth.
>
> (2) Try transforming. I don't know if this is good advice or not - see (1).
>
> -Robert Dawson