Re: Distribution generator for simulator in C++

2000-01-11 Thread Bob Hayden


Dave Nulton wrote:

 Quite frankly, Robert, the details are proprietary.  I suppose I could have
 been more descriptive, but I don't see what the shape of my distribution
 has to do with what it represents

Well, our expertise is also proprietary!-) 

Robert W. Hayden
Department of Mathematics
Plymouth State College MSC#29
Plymouth, New Hampshire 03264  USA
Rural Route 1, Box 10
Ashland, NH 03217-9702
(603) 968-9914 (home)
[EMAIL PROTECTED]
fax (603) 535-2943 (work)



Re: Distribution generator for simulator in C++

2000-01-11 Thread Rich Ulrich

On Mon, 10 Jan 2000 18:33:47 -0800, "Dave and Kim Nulton"
[EMAIL PROTECTED] wrote:

 Quite frankly, Robert, the details are proprietary.  I suppose I could have
 been more descriptive, but I don't see what the shape of my distribution
 has to do with what it represents.  I have received several email replies
 with various recommendations.  I'll add transforming to the list.  Not being
 a statistician by trade, the little hints I have received should provide a
 good starting point at the library. Thanks again

Just to be counted: I side wholeheartedly with Robert, and I
think the statisticians with consulting experience will be with us.

Robert describes what happens in the real world.  When we *learn* what
has generated the data, when we have pried out the news from people
who were sure that we had no need of it, it has too often been
revealing and important.  Not 100% of the time, and maybe not much
more than 25% of the time (but certainly no less).  I don't want
someone to say, "What a TOTAL idiot you have to be, to ignore XXX!"
I don't want someone to say THAT about my advice.

Or, in this case, you should not be able to say, "The folks on
sci.stat.edu thought that a little bit of xxx would be okay," since
the folks on sse want to warn you that good statistical advice is
still an art; if you only provide a caricature of the data, you might
get back only a caricature of an answer, no matter how inspired a
guesser your advisor may be.


But YOU win (and deserve) a reputation as a flake, an unreliable
screw-up, if you miss an important, *obvious* issue even once in 10.
Or something like that.  And if you knew what was "obvious," then you
wouldn't be asking for that advice.
-- 
Rich Ulrich, [EMAIL PROTECTED]




http://www.pitt.edu/~wpilib/index.html



Re: Distribution generator for simulator in C++

2000-01-11 Thread Dave and Kim Nulton

I see your point, Robert, and I hope you didn't think I was curt in my
response (which I may have been).  Your message was quite informative.  I
did meet an on-site statistician who has pledged to help me.  I'm
particularly interested in the square root transformation.  I'm guessing it
will compress the data.  I'll look into it.

I could have given an analogous example of the problem I'm trying to solve.
It is roughly like the maintenance intervals on an automobile.  Much of the
data is random (depending on what you drive).  At the same time, some of the
data is non-random: you must change the oil periodically, and the older the
car is, the more likely it is to break down, and for a longer period of time.

You have been quite helpful and I appreciate your time and interest in my
problem.  I will investigate your leads.  You all can use me as an example
of ignorant arrogance.

-dnult

Robert Dawson wrote in message
071a01bf5c3e$a131dab0$[EMAIL PROTECTED]...
Dave Nulton wrote:

 Quite frankly, Robert, the details are proprietary.  I suppose I could have
 been more descriptive, but I don't see what the shape of my distribution
 has to do with what it represents

To take the second point first, the origin of a dataset often contains
valuable information relating to the plausibility of various models. For
instance, it is a truism that "it takes money to make money". If I buy 100
shares of Wombat.Com and you buy 1000 shares, and the price goes up by $5
per share, I make $500 and you make $5000. Because of this inherently
multiplicative structure, it is *very* common for financial data to respond
well to a logarithmic transformation.

On the other hand, "count" data may - depending on what's being counted
and how - follow a "Poisson" model. In such a model, the events being
counted happen independently and at random in a "window" of fixed size:
calls per day to a help line, flaws per 1000 meters in recording tape,
snowflakes landing on your tongue per minute...  Such data, if the numbers
are small, may require specialized regression techniques; with more data, a
square root transformation often helps.

If the data set is small or has any unusual features, it may be
difficult to tell which transformation is appropriate just by looking at
the data.  The "story" of the data is important.

There are many other examples. For instance, even with a simple 2x2
table in which the frequencies of two outcomes are compared under two
situations, you need to know whether the trials are independent (in which
case a two-sample z test would typically be used) or paired across
treatments, in which case McNemar's test would be more appropriate.

For such reasons, it is often impossible to give reliable statistical
advice based on numbers _in_vacuo_. I cannot imagine members of many other
professions attempting to do the equivalent - indeed, I would hazard a guess
that in many cases professional associations would take a dim view of giving
a professional opinion to a client/patient/whatever who insisted on
withholding relevant information.

I would suggest that if this dataset is important enough to warrant this
level of secrecy, you find a statistician who is willing to sign an NDA, and
that you pay the going rate for the consultation. (Don't ask me; I'm neither
a professional statistician nor interested.) Trying to get advice, free or
not, from people whom you do not trust enough to give even a basic
explanation seems to me like a waste of your time and ours.

-Robert Dawson





Re: Distribution generator for simulator in C++

2000-01-10 Thread Robert Dawson

: Dave Nulton wrote:


 I'm writing a simulator in C++.  So far I have written a program to collect
 data from a database and hope to be able to generate an algorithm to return
 a random value with a distribution that matches my real world data.  What
 I'm finding is that the data is UGLY.  In order to generate a reasonable
 representation of the data, I'd need almost 3 million bins, and then most of
 the information would be crammed into the first 1000 or so bins.  I've drawn
 an ASCII art representation below.

 I don't want to give up those flyers, because they sum up to a considerable
 amount.  I'm modeling man loading in a manufacturing facility, so throwing
 out the flyers will really skew my simulator.

 Has anyone ever encountered such a problem?  Better yet, can someone
 recommend a C++ algorithm to model my data?  I'm thinking I may have to go
 to some sort of a logarithmic distribution, but it is important to base my
 simulator on real world data and not generic algorithms.  I would be willing
 to fit a model if I knew of a good model and how to utilize it in C++.

 -dnult
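 [ASCII-art sketch of the histogram, garbled in this archive: peaks at the
 left, followed by a long, sparse tail of scattered flyers to the right]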


Two points:

(1) I and others have said on various occasions:  please, if you're
asking for advice about a data set, tell people what it is.  I'm not
entirely sure that I understand the psychology of this practice, but the
result is akin to going to the dentist and refusing to open your mouth.

(2) Try transforming.  I don't know if this is good advice or not - see
(1).

-Robert Dawson







Re: Distribution generator for simulator in C++

2000-01-10 Thread Dave and Kim Nulton

Quite frankly, Robert, the details are proprietary.  I suppose I could have
been more descriptive, but I don't see what the shape of my distribution
has to do with what it represents.  I have received several email replies
with various recommendations.  I'll add transforming to the list.  Not being
a statistician by trade, the little hints I have received should provide a
good starting point at the library. Thanks again

-dnult
