Re: [R] Fitting inter-arrival time data

2003-07-01 Thread Adelchi Azzalini
On Tuesday 01 July 2003 05:16, M. Edward Borasky wrote:
> Unfortunately, the data are *non-negative*, not strictly positive. Zero is
> a valid and frequent inter-arrival time. It is, IIRC, the most likely value
> of a (negative) exponential distribution.

Not really. The density of a (negative) exponential distribution is highest at
0+, but the distribution is continuous, so an exact zero has probability zero:
you should have *no* observed zeros from that distribution.

If you have a non-negligible fraction of 0 values, then your data are reasonably
described as having a mixed distribution:
  (1) a discrete component at 0, and 
  (2) a continuous positive component.

Kernel (or similar) density estimation is appropriate for the continuous component
only.  Notice that the same remark applies to any procedure (parametric or 
non-parametric, using mixtures, etc.) which is based on continuous components only. 

It *looks* as if a wise procedure is to separate out the discrete and the continuous
components of your data, and handle them separately.  At the end you can merge
the two parts into
 Y = p * 0 + (1-p) * X
where p is the proportion of 0's, and X represents the continuous component of
the random variable.

best wishes,

Adelchi Azzalini

-- 
Adelchi Azzalini  [EMAIL PROTECTED]
Dipart.Scienze Statistiche, Università di Padova, Italia
http://azzalini.stat.unipd.it/

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] Fitting inter-arrival time data

2003-07-01 Thread Adelchi Azzalini
> the two parts into
>      Y = p * 0 + (1-p) * X
> where p is the proportion of 0's, and X represents the continuous
> component of the random variable.

I must correct myself... what I should have written is
   Y = I * 0 + (1-I) * X
where I is a Bernoulli random variable with probability p of success (i.e. I = 1)
and X represents the continuous component of the random variable.
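
The two-part model above can be sketched in base R as follows. This is only an illustration, not part of the original message: the simulated data, the value p = 0.3, and the exponential choice for X are all assumptions.

```r
set.seed(1)
n <- 5000
p_true <- 0.3                            # assumed probability of an exact zero
I <- rbinom(n, size = 1, prob = p_true)  # Bernoulli indicator: 1 means "zero"
X <- rexp(n, rate = 1 / 10)              # assumed continuous component (mean 10)
Y <- I * 0 + (1 - I) * X                 # the mixed random variable

# Handle the two components separately.
p_hat <- mean(Y == 0)                    # discrete part: proportion of zeros
x_pos <- Y[Y > 0]                        # continuous part: positive values only
rate_hat <- 1 / mean(x_pos)              # exponential MLE for the positive part

# Distribution function of Y for y >= 0 under the fitted two-part model
cdf_Y <- function(y) p_hat + (1 - p_hat) * pexp(y, rate = rate_hat)
```

Any density estimate (kernel, parametric, mixture) would then be applied to `x_pos` alone, with `p_hat` carried along as the point mass at zero.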




Re: [R] Fitting inter-arrival time data

2003-06-30 Thread Prof Brian Ripley
On Sun, 29 Jun 2003, M. Edward Borasky wrote:

> I have a collection of data which includes inter-arrival times of requests
> to a server. What I've done so far with it is use sm.density to explore
> the distribution, which found two large peaks. However, the peaks are made
> up of Gaussians, and that's not really correct, because the inter-arrival
> time can never be less than zero. In fact, the leftmost peak is centered at
> somewhere around ten seconds, and quite a bit of it extends into negative
> territory.
>
> What I'd like to do is fit this dataset to a mixture (sum) of exponentials,
> hyper-exponentials and hypo-exponentials. My preference is to use the
> well-known branching Erlang approximation (exponential stages) to the hyper-
> and hypo-exponentials. In this approximation, a distribution is specified by
> its mean and coefficient of variation.
>
> So far, what I've been able to come up with in a literature search has been
> something called the Expectation Maximization algorithm. And I haven't been
> able to locate R code for this. So my questions are:

> 1. Is EM the right way to go about this, or is there something better?

Even for normal mixtures, direct likelihood maximization was considered to 
be better in several studies.  The EM method converges notoriously slowly.

> 2. Is there some EM code in R that I could experiment with, or do I need to
> write my own?

It's not an algorithm (despite its common name), so it cannot be coded
generically.  There is EM code for normal mixtures in several places, e.g.
in packages emclust and mda.  Direct ML would be easier to code, I expect.
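
A minimal sketch of what direct likelihood maximization could look like in base R, using optim() on a two-component exponential mixture. This is an illustration only: the simulated data, the starting values, and the logit/log parametrization are assumptions, not code from MASS or from any package mentioned in this thread.

```r
set.seed(2)
# Simulated data (assumption): 70% short gaps (mean 2), 30% long gaps (mean 50)
n <- 4000
short <- rbinom(n, 1, 0.7) == 1
x <- ifelse(short, rexp(n, rate = 1 / 2), rexp(n, rate = 1 / 50))

# Negative log-likelihood on an unconstrained scale:
# theta[1] = logit of the mixing weight, theta[2:3] = log rates
negloglik <- function(theta, x) {
  w <- plogis(theta[1])
  r <- exp(theta[2:3])
  -sum(log(w * dexp(x, r[1]) + (1 - w) * dexp(x, r[2])))
}

# Crude starting values taken from sample quantiles (an assumption)
start <- c(0, log(1 / quantile(x, 0.25)), log(1 / quantile(x, 0.9)))
fit <- optim(start, negloglik, x = x, method = "BFGS")

w_hat <- plogis(fit$par[1])      # estimated weight of the fast component
rate_hat <- exp(fit$par[2:3])    # estimated rates (fast, slow)
```

Working with logit and log parameters keeps the weight in (0, 1) and the rates positive without needing a constrained optimizer.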

> 3. Is there a way this could be done using the existing R kernel density
> estimators and some kind of kernel that is zero for negative values of its
> argument?

No, but there are ways to do it by transforming the x scale.  Local
polynomial estimators (KernSmooth, locfit) will do better.  For all of
these see MASS (the book) and its on-line complements.
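
One common reading of "transforming the x scale" is a log transform: estimate the density of log(x) with an ordinary kernel estimator, then map it back. The sketch below assumes that reading, along with simulated, strictly positive data; exact zeros would have to be removed first, since log(0) is -Inf.

```r
set.seed(3)
x <- rexp(2000, rate = 1 / 10)   # assumed strictly positive data

z <- log(x)                      # move the data to the whole real line
dz <- density(z)                 # ordinary Gaussian-kernel estimate for z

# Back-transform: if z = log(x), then f_X(x) = f_Z(log x) / x
x_grid <- exp(dz$x)
f_x <- dz$y / x_grid

# Trapezoidal check that the back-transformed estimate integrates to about 1
k <- length(f_x)
area <- sum(diff(x_grid) * (f_x[-1] + f_x[-k]) / 2)
```

Because `x_grid` is exp() of the evaluation points, the estimate lives on x > 0 only and no mass leaks into negative territory.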

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK  Fax:  +44 1865 272595



Re: [R] Fitting inter-arrival time data

2003-06-30 Thread Adelchi Azzalini
On Monday 30 June 2003 01:23, M. Edward Borasky wrote:
> I have a collection of data which includes inter-arrival times of requests
> to a server. What I've done so far with it is use sm.density to explore
> the distribution, which found two large peaks. However, the peaks are made
> up of Gaussians, and that's not really correct, because the inter-arrival
> time can never be less than zero. In fact, the leftmost peak is centered at
> somewhere around ten seconds, and quite a bit of it extends into negative
> territory.

If your data are positive, you could use

  sm.density(..., positive=TRUE)

and possibly make use of the additional parameter delta for fine tuning.

best wishes,

Adelchi Azzalini




RE: [R] Fitting inter-arrival time data

2003-06-30 Thread M. Edward Borasky
Unfortunately, the data are *non-negative*, not strictly positive. Zero is a
valid and frequent inter-arrival time. It is, IIRC, the most likely value of
a (negative) exponential distribution.

-- 
M. Edward (Ed) Borasky
mailto:[EMAIL PROTECTED]
http://www.borasky-research.net
 
Suppose that tonight, while you sleep, a miracle happens - you wake up
tomorrow with what you have longed for! How will you discover that a miracle
happened? How will your loved ones? What will be different? What will you
notice? What do you need to explode into tomorrow with grace, power, love,
passion and confidence? -- L. Michael Hall, PhD


-----Original Message-----
[snip]

> If your data are positive, you could use
>
>   sm.density(..., positive=TRUE)



RE: [R] Fitting inter-arrival time data

2003-06-30 Thread M. Edward Borasky
Thanks!! It does look like the easiest thing is direct ML; the code for a
normal mixture is in the book, so all I have to do is modify that for a sum
of a hyper-exponential, for which I have an approximate mean and CV, and a
normal, for which I have an approximate mean and SD.

I have two big peaks, one near zero which is probably hyperexponential with
a CV about 3, and the other near 600 seconds (a refresh that happens every
ten minutes) which looks Gaussian with a very small standard deviation. I
think what I'm going to do is fit the two peaks using ML, since I know where
they are, then subtract them out and look at the structure of the residuals.
The stuff over 600 seconds is sparse and totally uninteresting. After I'm
done with this, I get to look at the distribution of the network traffic.
The good news is that I get those inter-arrival times to the nearest
microsecond. :)
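
For starting values, a hyperexponential can be matched to an approximate mean and CV in closed form. The sketch below uses the two-phase form with balanced means, which is an assumed simplification and not the branching Erlang approximation mentioned earlier in the thread. The function name `h2_from_mean_cv` is hypothetical, and the mean of 10 is an illustrative value; only the CV of 3 comes from the message.

```r
# Two-phase hyperexponential with balanced means (p1/rate1 == p2/rate2),
# matched to a target mean m and coefficient of variation cv (needs cv >= 1)
h2_from_mean_cv <- function(m, cv) {
  stopifnot(m > 0, cv >= 1)
  p1 <- (1 + sqrt((cv^2 - 1) / (cv^2 + 1))) / 2
  p <- c(p1, 1 - p1)
  list(p = p, rate = 2 * p / m)
}

h2 <- h2_from_mean_cv(m = 10, cv = 3)    # mean 10 is an assumed value

# Check the moments implied by the matched parameters
m1 <- sum(h2$p / h2$rate)                # mean
m2 <- sum(2 * h2$p / h2$rate^2)          # second moment
cv_check <- sqrt(m2 - m1^2) / m1         # recovered coefficient of variation
```

The resulting `h2$p` and `h2$rate` could serve as starting values for a direct ML fit of the peak near zero.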

-- 
M. Edward (Ed) Borasky
mailto:[EMAIL PROTECTED]
http://www.borasky-research.net
 


-----Original Message-----
[snip]

> For all of these see MASS (the book) and its on-line complements.



[R] Fitting inter-arrival time data

2003-06-29 Thread M. Edward Borasky
I have a collection of data which includes inter-arrival times of requests
to a server. What I've done so far with it is use sm.density to explore
the distribution, which found two large peaks. However, the peaks are made
up of Gaussians, and that's not really correct, because the inter-arrival
time can never be less than zero. In fact, the leftmost peak is centered at
somewhere around ten seconds, and quite a bit of it extends into negative
territory.

What I'd like to do is fit this dataset to a mixture (sum) of exponentials,
hyper-exponentials and hypo-exponentials. My preference is to use the
well-known branching Erlang approximation (exponential stages) to the hyper-
and hypo-exponentials. In this approximation, a distribution is specified by
its mean and coefficient of variation.

So far, what I've been able to come up with in a literature search has been
something called the Expectation Maximization algorithm. And I haven't been
able to locate R code for this. So my questions are:

1. Is EM the right way to go about this, or is there something better?
2. Is there some EM code in R that I could experiment with, or do I need to
write my own?
3. Is there a way this could be done using the existing R kernel density
estimators and some kind of kernel that is zero for negative values of its
argument? 

-- 
M. Edward (Ed) Borasky
mailto:[EMAIL PROTECTED]
http://www.borasky-research.net
 
