[R] Estimation of skewness from quantiles of near-normal distribution

2006-03-23 Thread Leif Kirschenbaum
I have summary statistics from many sets (10,000's) of near-normal continuous 
data.  From previously generated QQplots of these data I can visually see that 
most of them are normal with a few which are not normal.  I have the raw data 
for a few (700) of these sets.  I have applied several tests of normality, 
skew, and kurtosis to these sets to see which test might yield a parameter 
which identifies the sets which are visibly non-normal on the QQplot.  My 
conclusion thus far is that the skew is the best indicator of 
non-normality for these particular data.

Given that I do not have ready access to the sets (10,000's) of data, only to 
summary statistics which have been calculated on these sets, is there a method 
by which I may estimate the skew given the following summary statistics:
0.1% 1% 5% 10% 25% 75% 90% 95% 99% 99.9% mean median N sigma

N is usually about 900, and so I would discount the 0.1%, 1%, 99%, and 99.9% 
quantiles as unreliable due to noisiness in the distributions.

I know that, for instance, there are general rules for calculating sigma of a 
normal distribution given quantiles, and so am wondering if there are any 
general rules for calculating skew given a set of quantiles, mean, and sigma.  
I am currently thinking of trying polynomial fits on the QQplot using the raw 
data I have and then empirically trying to derive a relationship between the 
quantiles and the skew.
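
For what it is worth, two skewness measures that could be computed directly from 
the summary statistics listed above are Pearson's second coefficient, 
3*(mean - median)/sigma, and Bowley/Kelley-type coefficients built from symmetric 
quantile pairs. A rough sketch (the function and argument names are mine, and 
this is only a sketch, untested against the data):

skew.from.summary <- function(mean, median, sigma, q10, q25, q50, q75, q90) {
  pearson2 <- 3 * (mean - median) / sigma                 # Pearson's second coefficient
  bowley   <- ((q75 - q50) - (q50 - q25)) / (q75 - q25)   # quartile (Bowley) skewness
  kelley   <- ((q90 - q50) - (q50 - q10)) / (q90 - q10)   # decile (Kelley) skewness
  c(pearson2 = pearson2, bowley = bowley, kelley = kelley)
}

All three are zero for a symmetric distribution, so a threshold on their absolute 
values might serve to flag the visibly non-normal sets.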

Thank you for any ideas.

Leif Kirschenbaum
Senior Yield Engineer
Reflectivity, Inc.
(408) 737-8100 x307
[EMAIL PROTECTED]



[R] Combining plaintext and plotmath expressions

2006-03-02 Thread Leif Kirschenbaum
I searched the archives and did not find a solution, so I pose this question to 
those well-versed in the use of plotmath and expressions.

I have a list of strings in an external CSV file which I wish to use sometimes 
as plot axis labels and sometimes as plot titles.  These strings combine 
plaintext and a few mathematical expressions (Greek letters, subscripts). 
Moreover, I sometimes need to concatenate other plaintext with these strings. 
Thus far I have written my string list to look like this:

"X translation ("*mu*"m)"
"Rotation ("*degree*")"
"Mean ("*mu*"m)"
"Vb1-Vb2 "*(Omega)
H[2]O "Used"
Max P
# Defects

I read in the strings, and use one at a time in the variable "parmlabel".
I have another variable called "testlabel" which I also need to concatenate, 
and can be things like "distance" or "E-test". Sometimes I also concatenate 
another variable "oplabel" when the variable "op" is not zero-length.

I have kludged a solution using this function:


expr.label<-function(title=""){
## single-quote the character # so that it is not interpreted as a comment
  parmlabel<-gsub("#","'#'",cfg$parmlabel[icfg])
## make sure that a string like "Vb1-Vb2" is not interpreted as subtraction
  text1<-gsub("-","*'-'*",testlabel)
## if we need to prepend title text, do that, and make multiple spaces single
  if(nchar(title)) text1<-paste(gsub("  "," ",title)," ",text1,sep="")
## substitute to avoid interpretation of "for"
  text1<-gsub("for","f*or",text1,fixed=TRUE)
## substitute ~ for spaces to avoid errors in parsing
  text1<-gsub(" ","~",text1,fixed=TRUE)
## see if we have a parsable expression
  text2<-try(parse(text=parmlabel),silent=TRUE)
## if not parsable then concatenate another way
  if(inherits(text2,"try-error") || length(text2)==0){
    text2<-gsub(" ","~",parmlabel)
    exprtitle<-paste(text1,text2,sep="~")
  } else{ exprtitle<-paste('"',text1,'~"*',text2,sep='') }
## if the variable op is not zero-length, then append oplabel
  if(nchar(op)>0) exprtitle<-paste(exprtitle,'~',gsub(' ','~',oplabel),sep="")
  return(exprtitle)
}

As an example:

> parmlabel<-'"Vb1-Vb2 "*(Omega)'
> testlabel<-'E-test'
> op<-'dev lab'
> oplabel<-'Development Lab'
> expr.label("SPC Chart for")

I then use exprtitle in:

   plot(1:10, main= if(plotnum==1) parse(text=exprtitle) else NULL)

However:
  this code seems convoluted and ungainly
  I don't always get the results I want
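
For a one-off label, bquote() can splice a character value directly into a 
plotmath call, as in the sketch below, but I do not see how that extends to 
labels read from the CSV file:

> testlabel <- "E-test"
> plot(1:10, main = bquote("SPC Chart for" ~ .(testlabel) ~ "Vb1-Vb2" ~ (Omega)))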

Any suggestions?

Leif Kirschenbaum
Senior Yield Engineer
Reflectivity, Inc.
(408) 737-8100 x307
[EMAIL PROTECTED]



Re: [R] How do I "normalise" a power spectral density analysis?

2006-01-31 Thread Leif Kirschenbaum
I have done a fair bit of spectral analysis, and hadn't finished collecting my 
thoughts for a reply, so hadn't replied yet.

What exactly do you mean by normalize?
  I have not used the functions periodogram or spectrum; however, from the 
description for periodogram it appears that it returns the spectral density, 
which is already normalized by frequency, so you don't have to worry about 
changing the appearance of your periodogram or power spectrum if you change the 
time intervals of your data. With a normal Fourier Transform, not only do you 
need to complex square the terms but you also need to divide by a normalizing 
factor to give power-per-frequency-bin, which the periodogram appears to do.

  However, if you look in various textbooks, the definition of the Fourier 
Transform (FT) varies from author to author in the magnitude of the prepended 
scaling factor. Since the periodogram is related to the FT (periodogram 
ultimately uses the function mvfft), without examination of the code for 
periodogram you cannot know the scaling factor, which is almost always one of 
the following:
  1.0
  1/2
  1/(2*pi)
  SQRT(1/2)
  SQRT(1/(2*pi))
  In fact, if you obtain an FT (or FFT or DFT) from a piece of electronics (say 
an electronic spectrum analyzer), the prepending factor can vary from 
manufacturer to manufacturer.
  Fortunately, there is a strict relationship between the variance of your 
signal and the integrated spectral density. If your time signal is x(t), the 
spectral density is S(f), and fc = frequency(x)/2 the Nyquist cutoff frequency, 
then this may be expressed as:

  variance(x(t)) = constant * {Integral from 0 to fc of S(f)}

In R-code: let x be your time series, and "constant" be the unknown scaling 
factor (1/2, 1/2pi, etc.)
  p <- periodogram(x)
  var(x) == constant * sum(p[[1]])/length(p[[1]])

Or:
  constant = sum(p[[1]])/length(p[[1]])/var(x)

and we find that the appropriate scaling constant is 1.0.
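
  The same kind of check can be made against base R's fft() directly via 
Parseval's relation; a small sketch (using the population variance, i.e. dividing 
by N rather than N-1, with the mean removed so the DC term does not dominate):

set.seed(42)
x <- rnorm(1024)
x <- x - mean(x)              # remove the mean (the DC term)
N <- length(x)
mean(x^2)                     # population variance of x
sum(Mod(fft(x))^2) / N^2      # Parseval: (1/N^2) * sum |X_k|^2, equal to the line above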

  As regards plotting versus period, the periodogram returns arrays of spectral 
amplitude and frequencies. The frequencies are in inverse units of the 
intervals of your time series. i.e. if your time series is 1-point per day, 
then the frequencies are in 1/day units. Thus if you wish to plot amplitude 
versus period in weeks you have a little math to do.
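
  For instance, if p is the periodogram computed above (p[[2]] holding the 
frequencies in cycles/day for a 1-point-per-day series, p[[1]] the spectral 
density), a sketch of that conversion is:

  period.weeks <- 1/(7 * p[[2]])        # cycles/day -> period in weeks
  plot(period.weeks, p[[1]], type="l", log="xy",
       xlab="Period (weeks)", ylab="Spectral density")
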
  I believe that plotting is usually versus frequency since most observers are 
interested in how things vary versus frequency: multiple evenly spaced peaks on 
a linear frequency scale indicate the presence of harmonics; this is not so 
simply seen in a plot versus period. For example, peaks at 10 Hz, 20 Hz, 30 Hz, 
and 40 Hz fall at periods of 100 ms, 50 ms, 33.3 ms, and 25 ms, which are not 
evenly spaced.
  Additionally, there are all kinds of typical responses versus frequency (1/f, 
1/f^2) which are clearly seen in plots versus frequency as straight lines (log 
power vs log frequency), but would come out as curves in plots versus period.
  I can see how ecological studies may indeed be more interested in the periods.

  However, I would be wary of using the periodogram function, for if I 
calculate periodograms of the same sinewave but for differing lengths of the 
sample, the spectral density does not come out the same. All 4 of the plots 
produced by the code below should overlay, and yet as the time series becomes 
longer there appears to be an increasing offset of the magnitudes returned. 
(black-brown-blue-red)

x0<-ts(data=sin(2*pi*1.1*(1:1000)/10),frequency=10)
p0<-periodogram(x0)
var(x0)

x1<-ts(data=sin(2*pi*1.1*(1:10000)/10),frequency=10)
p1<-periodogram(x1)
var(x1)

x2<-ts(data=sin(2*pi*1.1*(1:100000)/10),frequency=10)
p2<-periodogram(x2)
var(x2)

x3<-ts(data=sin(2*pi*1.1*(1:1000000)/10),frequency=10)
p3<-periodogram(x3)
var(x3)

plot(p3[[2]],p3[[1]],col="red",type="p",pch=19,cex=0.05,log="y",
xlim=c(0.1,0.12),ylim=c(1e-30,1e-15))
points(p2[[2]],p2[[1]],type="l",col="blue")
points(p1[[2]],p1[[1]],type="l",col="brown")
points(p0[[2]],p0[[1]],type="o",col="black")


> Message: 113
> Date: Mon, 30 Jan 2006 17:45:58 -0800
> From: Spencer Graves <[EMAIL PROTECTED]>
> Subject: Re: [R] How do I "normalise" a power spectral density
>   analysis?
> To: Tom C Cameron <[EMAIL PROTECTED]>
> Cc: r-help@stat.math.ethz.ch
> Message-ID: <[EMAIL PROTECTED]>
> Content-Type: text/plain; charset=us-ascii; format=flowed
> 
> Since I have not seen a reply to this post, I will offer a comment, even 
> though I have not used spectral analysis myself and therefore have no 
> intuition about it.  First, from the definitions I read in the results 
> from, e.g., RSiteSearch("time series power spectral density") [e.g., 
> http://finzi.psych.upenn.edu/R/library/GeneTS/html/periodogram.html] and 
> "spectral analysis" in Venables and Ripley (2002) Modern Applied 
> Statistics with S (Springer), I see no reason why you couldn't plot the 
> spectrum vs. the period rather than the frequency.  Someone else 

Re: [R] Scientific notation in plots

2006-01-17 Thread Leif Kirschenbaum
You could also write your numbers using SI suffixes.
Below is a function I use for this purpose.
The option "near" allows you to require that numbers between 0.001 and 1000 are 
written as decimal; i.e. "0.010" appears as "0.010" instead of "10.0m".

##
## num2SI converts numbers to SI suffixed numbers
##
num2SI <- function(num, digits = 4, near = TRUE){
  power  <- signif(log10(abs(num)), digits = 6)
  power3 <- floor(power/3)
  power3[!is.finite(power3)] <- 0
  powers   <- c("y","z","a","f","p","n","u","m","","k","M","G","T","P")
  powerstr <- powers[power3 + 9]
  newnum <- signif(num/10^(power3*3), digits = digits)
  newnum[is.nan(newnum)] <- 0
  numstr <- paste(newnum, powerstr, sep = "")
  ## optionally leave numbers near 1 (0.001 < |x| < 1000) in plain decimal form
  nearidx <- near & (abs(num) > 0.001) & (abs(num) < 1000)
  nearidx[is.na(nearidx)] <- FALSE
  if(sum(nearidx))
    numstr[nearidx] <- sprintf("%.*f",
                               pmax(1, digits - floor(power[nearidx]) - 1, na.rm = TRUE),
                               num[nearidx])
  numstr[!is.finite(num)] <- paste(num[!is.finite(num)])
  return(numstr)
}
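
For example (the outputs in the comments are hand-checked against the function above):

num2SI(c(1.234e-5, 5678, 2.5e9))   # "12.34u" "5.678k" "2.5G"
num2SI(0.25)                       # "0.2500"  (near = TRUE keeps a plain decimal)
num2SI(0.25, near = FALSE)         # "250m"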

Leif Kirschenbaum
Senior Yield Engineer
Reflectivity, Inc.
(408) 737-8100 x307
[EMAIL PROTECTED] 

> Message: 6
> Date: Fri, 13 Jan 2006 12:49:54 +0100 (CET)
> From: Oddmund Nordgård <[EMAIL PROTECTED]>
> Subject: [R] Scientific notation in plots
> To: r-help@stat.math.ethz.ch
> Message-ID: <[EMAIL PROTECTED]>
> Content-Type: TEXT/PLAIN; charset=ISO-8859-1
> 
> 
> Is it possible to use scientific notation of numbers on the axis of plots
> without using the xEy notation? That means: a beautiful 1x10^3 instead of 1E3.
> Logarithmic scale, in my case.
> 
> Thank you very much!
> 
> Oddmund
> 
> **
> 
>   Oddmund Nordgård
> 
>   Department of Haematology and Oncology
>   Stavanger University Hospital
>   P.O. Box 8100
>   4068 STAVANGER
>   Phone: 51 51 89 34
>   Email: [EMAIL PROTECTED]



[R] Find last row (observation) for each combination of variables

2006-01-10 Thread Leif Kirschenbaum
Let's say I have a data.frame like
A   B   C   TS  other columns
1   1   1   12345
1   1   1   56789
1   2   1   23456
1   2   2   23457
2   4   7   23458
2   4   7   34567
2   4   7   45678

and I want the last row for each unique combination of A/B/C, where by "last" I 
mean greatest TS.
A   B   C   TS  other columns
1   1   1   56789
1   2   1   23456
1   2   2   23457
2   4   7   45678

I did this simply in SAS:
 proc sort data=DF;
   by A B C descending TS;
 run;
 proc sort data=DF NODUPKEY;
   by A B C;
 run;

I tried using "aggregate" to find the maximum TS for each combination of A/B/C, 
but it's slow.
I also tried "by" but it's also slow.
My current (faster) solution is:

 DF$abc<-paste(DF$A,DF$B,DF$C,sep="")
 abclist<-unique(DF$abc)
 numtest<-length(abclist)
 maxTS<-rep(0,numtest)
 for(i in 1:numtest){
  maxTS[i]<-max(DF$TS[DF$abc==abclist[i]],na.rm=TRUE)
 }
 maxTSdf<-data.frame(abc=I(abclist),maxTS=maxTS)
 DF<-merge(DF,maxTSdf,by="abc",all.x=TRUE)
 DF<-DF[DF$TS==DF$maxTS,,drop=TRUE]
 DF$maxTS<-NULL

This seems a bit lengthy for such a simple task.
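
One shorter idiom I am considering, mirroring the SAS sort-then-NODUPKEY 
approach, might be (a sketch, assuming TS is numeric):

 ## sort so the largest TS comes first within each A/B/C group,
 ## then keep the first (i.e. greatest-TS) row of each group
 DFs  <- DF[order(DF$A, DF$B, DF$C, -DF$TS), ]
 last <- DFs[!duplicated(DFs[, c("A","B","C")]), ]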

Any simpler suggestions?

-Leif K.

Leif Kirschenbaum
Senior Yield Engineer
Reflectivity, Inc.
(408) 737-8100 x307
[EMAIL PROTECTED]



Re: [R] "Missing value representation in Excel before

2006-01-10 Thread Leif Kirschenbaum
I reproduce from memory my exhaustive look into this issue.
RODBC uses the ODBC DLL's developed by Microsoft.
  These DLL's perform an automatic determination of column type based on the 
contents of the first N rows of cells in each column, where N is between 0 and 
16. N may be 
set in the Windows system registry, and there are a few other things that may 
be set in the system registry which control how the DLL parses an Excel 
spreadsheet. Unfortunately, the Microsoft DLL's do not always pay attention to 
the registry settings and do not always interpret them in the same manner.
  The end result is that no matter what you do with RODBC, and no matter how 
the authors of RODBC re-write it, some Excel spreadsheets will always be 
unreadable via RODBC given particular insidious combinations of data in some 
columns of your spreadsheet. (until such time as Microsoft fixes their DLL 
bugs, I mean features) I have some faint recollection that the Microsoft DLL 
incorrectly parses a column with non-empty rows due to some formatting issue of 
those particular columns, which I was unable to cure by re-formatting the 
source worksheet.
  I have had to resort to using the gdata package which runs a Perl script 
"xls2csv.pl", which converts an Excel spreadsheet to CSV, for a few Excel 
spreadsheets which exhibit the particular anomalies preventing use of RODBC.
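
For reference, a minimal sketch of that workaround (it assumes the gdata package 
and a working Perl installation; the file name here is hypothetical):

  library(gdata)
  ## read.xls() calls the bundled xls2csv.pl and then reads the resulting CSV
  df <- read.xls("problem_workbook.xls", sheet = 1)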

Leif Kirschenbaum
Senior Yield Engineer
Reflectivity, Inc.
(408) 737-8100 x307
[EMAIL PROTECTED] 

> Message: 21
> Date: Mon, 9 Jan 2006 18:06:49 +0100
> From: "Fredrik Lundgren" <[EMAIL PROTECTED]>
> Subject: Re: [R] "Missing value representation in Excel before
>   extraction to   R with RODBC"
> To: "Prof Brian Ripley" <[EMAIL PROTECTED]>,  "Petr Pikal"
>   <[EMAIL PROTECTED]>
> Cc: R-help 
> Message-ID: <[EMAIL PROTECTED]>
> Content-Type: text/plain; format=flowed; charset="iso-8859-1";
>   reply-type=response
> 
> Dear list,
> 
> Well, those columns in Excel that starts with NA (actually 8 
> NA's in my 
> case) is imported as all NA in R but if the columns starts 
> with at least 
> 3 cells with values (i.e not NA) the are imported correctly 
> to R. When 
> as.is=TRUE is used a simular conversion takes place but now 
> as all  
> and dates are represented as date-and-time.
> Is there any way to get this correct even when the Excel 
> columns start 
> with several NA's?
> 
> Sincerely
> Fredrik
> 
> 
> - Original Message - 
> From: "Prof Brian Ripley" <[EMAIL PROTECTED]>
> To: "Petr Pikal" <[EMAIL PROTECTED]>
> Cc: "Fredrik Lundgren" <[EMAIL PROTECTED]>; "R-help" 
> 
> Sent: Monday, January 09, 2006 9:36 AM
> Subject: Re: [R] "Missing value representation in Excel before 
> extraction to R with RODBC"
> 
> 
> > On Mon, 9 Jan 2006, Petr Pikal wrote:
> >
> >> Hi
> >>
> >> I believe it has something to do with the column identification
> >> decision. When R decides what is in a column it uses only 
> some values
> >> from the beginning of a file.
> >
> > Not R, Excel.  Excel tells ODBC what the column types are.
> >



Re: [R] Wikis etc.

2006-01-09 Thread Leif Kirschenbaum
To avoid spam on the R wikis pages:
  If we assume that anyone who we would want to be empowered to modify the R 
wiki pages is an R-user, would it be possible to somehow incorporate a function 
into the next R release which provides a user with a key/password?
  A new R function would generate a day-of-the-year dependent key: if you want 
to modify an R wiki page you need to enter the key for that day (This is not a 
proposal to make keys user specific: every R user worldwide would have the same 
key each day). Then only a person who has installed R would be able to run the 
function to get a key to modify R wiki pages. Of course anyone could read the 
wikis.
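
A hypothetical sketch of the sort of key function I have in mind (the scheme and 
all names are purely illustrative, and this is not meant as a serious security 
design):

  ## returns the same short key for every R user on a given calendar day
  wiki.key <- function(salt = "r-wiki") {
    day   <- format(Sys.Date(), "%Y-%j")              # year and day of year
    codes <- utf8ToInt(paste(salt, day, sep = "-"))
    sprintf("%05d", sum(codes * seq_along(codes)) %% 100000)
  }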

I suppose that, if we wanted, the key could also encode the O/S and R version 
being run, so that the modified wiki page would note which O/S and version the 
annotator is running; however, for ease of use I suggest that the key generated 
each day be short, for simplicity in typing it in.

I suppose a more complex solution would be for an R function to make a call to 
open a web-browser with a cookie or something set which thus allows the user to 
modify R wiki pages.

Leif Kirschenbaum
Senior Yield Engineer
Reflectivity, Inc.
(408) 737-8100 x307
[EMAIL PROTECTED]



Re: [R] A comment about R:

2006-01-05 Thread Leif Kirschenbaum
A few thoughts about R vs SAS:
I started learning SAS 8 years ago at IBM, I believe it was version 6.10.
I started with R 7 months ago.

Learning curve:
  I think I can do everything in R after 7 months that I could do in SAS after 
about 4 years.

Bugs:
  I suffered through several SAS version changes, 7.0, 7.1, 7.2, 8.0, 9.0 (I 
may have misquoted some version numbers). Every version change gave me 
headaches, as every version release (of an expensive commercially produced 
software set) had bugs which upset or crashed previously working code. I had 
code which ran fine under Windows 2000 and terribly under Windows XP. Most bugs 
I found were noted by SAS, but never fixed.
  With R I have encountered very few bugs, except for an occasional crash of R, 
which I usually ascribe to some bug in Windows XP.

Help:
  SAS help was OK. As others have mentioned, there is too much. I even had the 
set of printed manuals on my desk (stretching 4 feet or so), which were quite 
impenetrable. I had almost no support from colleagues: even within IBM the 
number of advanced SAS users was small.
  With R this mailing list has been of great help: for almost every issue I copy 
some code from the list and save it as an "R hint" file.
--> A REQUEST
I would say that I would appreciate a few more program examples with the help 
pages for some functions. For instance, "?Control" tells me about "if(cond) 
cons.expr  else  alt.expr", however an example of
   if(i==1) { print("one") 
   } else if(i==2) { print("two")
   } else if(i>2) { print("bigger than two") }
 at the end of that help section would have been very helpful for me a few 
months ago.

Functions:
  Writing my own functions in SAS was by use of macros, and usually depended 
heavily on macro substitution. Learning SAS's macro language, especially macro 
substitution, was very difficult and it took me years to be able to write 
complicated functions. Quite different situation in R. Some functions I have 
written by dint of copying code from other people's packages, which has been 
very helpful.
  I wanted to generate arbitrary k-values (the k-multiplier of sigma for a 
given alpha, beta, and N to establish confidence limits around a mean for small 
populations). I had a table from a years old microfiche book giving values but 
wanted to generate my own. I had to find the correct integrals to approximate 
the k-values and then write two SAS macros which iterated to the desired level 
of tolerance to generate values. I would guess that there is either an R base 
function or a package which will do this for me (when I need to start 
generating AQL tables). Given the utility of these numbers, I was disappointed 
with SAS.
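
In fact, the one-sided normal tolerance factor can be computed in one line from 
the noncentral t distribution; a sketch (the function name is mine; p is the 
coverage proportion and conf the confidence level):

 k.factor <- function(n, p = 0.99, conf = 0.95)
   qt(conf, df = n - 1, ncp = qnorm(p) * sqrt(n)) / sqrt(n)
 k.factor(10)   # approximately 3.98, in line with published one-sided k tables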

Data manipulation:
  All SAS data is in 2-dimensional datasets, which was very frustrating after 
having used variables, arrays, and matrices in BASIC, APL, FORTRAN, C, Pascal, 
and LabVIEW. SAS allows you to access only 1 row of a dataset at a time which 
was terribly horribly incomprehensibly frustrating. There were so many many 
problems I had to solve where I had to work around this SAS paradigm.
  In R, I can access all the elements of a matrix/dataframe at once, and I can 
use >2 dimensional matrices. In fact, the limitations of SAS I had ingrained 
from 7.5 years has sometimes made me forget how I can do something so easily in 
R, like be able to know when a value in a column of a dataframe changes:
  DF$marker <- c(FALSE, DF[2:nrow(DF),icol] != DF[1:(nrow(DF)-1),icol])
This was hard to do in SAS...and even after years it was sometimes buggy, 
keeping variable values from previous iterations of a SAS program.
  One very nice advantage with SAS is that after data is saved in libraries, 
there is a GUI showing all the libraries and the datasets inside the libraries 
with sizes and dates. While we can save Rdata objects in an external file, the 
base package doesn't seem to have the same capabilities as SAS.

Graphics:
  SAS graphics were quite mediocre, and generating customized labels was 
cumbersome. Porting code from one Windows platform to another produced 
unpredictable and sometimes unworkable results.
  It has been easier in R: I anticipate that I will be able to port R Windows 
code to *NIX and generate the same graphics.

Batch commands:
  I am working on porting some of my R code to our *NIX server to generate 
reports and graphs on a scheduled basis. Although a few at IBM did this with 
SAS, I would have found doing this fairly daunting.


-Leif

-
 Leif Kirschenbaum, Ph.D.
 Senior Yield Engineer
 Reflectivity
 [EMAIL PROTECTED]



[R] how to specify dev.print target by a variable?

2005-12-22 Thread Leif Kirschenbaum
I want to do the following:

  DEVw=500
  DEVh=350
  fname="my_plot"
  dev.print(file=fname, device=FOO, width=DEVw, height=DEVh, bg="transparent")

How do I do this such that I can specify FOO to be one of several choices? 
(GDD, PNG, postscript, etc.)
If I make FOO a character variable, then "dev.print" complains.
I tried a simple "substitute" but didn't get it to work...
I'm thinking it's going to involve a "do.call" and "substitute" but I'm not 
sure.
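
One direction might be to turn the character name into the device function with 
get(); a sketch (whether every device accepts file/width/height passed this way 
is an assumption on my part):

  FOO <- "png"   # or "postscript", "GDD", ...
  dev.print(device = get(FOO), file = fname, width = DEVw, height = DEVh,
            bg = "transparent")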

Using:

$platform[1] "i386-pc-mingw32"
$arch[1] "i386"
$os[1] "mingw32"
$system[1] "i386, mingw32"
$status[1] ""
$major[1] "2"
$minor[1] "2.0"
$year[1] "2005"
$month[1] "10"
$day[1] "06"
$"svn rev"[1] "35749"
$language[1] "R"

and also running the same code on:

$platform[1] "i686-redhat-linux-gnu"
$arch[1] "i686"
$os[1] "linux-gnu"
$system[1] "i686, linux-gnu"
$status[1] ""
$major[1] "2"
$minor[1] "0.0"
$year[1] "2004"
$month[1] "10"
$day[1] "04"
$language[1] "R"


-Leif S. Kirschenbaum, Ph.D.
 Yield Integration Engineer
 Reflectivity, Inc.



Re: [R] qcc

2005-11-30 Thread Leif Kirschenbaum
If you examine the code for the function "violating.runs" in the qcc 
package (type "violating.runs" at the prompt) you can deconstruct it to find 
that the code flags runs where there are .qcc.options$run.length or more points 
in a row on one side of the center (process mean).
However, the classic Shewhart rules dictate that a run of run.length or more 
monotonically increasing or decreasing points is also a run violation. I 
include here my modified version of the function "violating.runs" which also 
checks for these violations:

##
## Correct some typos in violating.runs from qcc package
## Added test for run.length of points monotonically increasing or decreasing
## The simplest way is to re-run the code but with "diffs"
## representing the sign of the difference from one point to the next
##
violating.runs <- function(object, run.length = qcc.options("run.length"))
{
  center     <- object$center
  statistics <- c(object$statistics, object$new.statistics)
  cl         <- object$limits
  violators  <- numeric()

  for (i in 1:2) {
    diffs <- statistics - center
    if (i == 2) {
      diffs <- c(0, diff(statistics))
      ## need to decrement run.length since we're now looking at
      ## differences between successive points
      run.length <- run.length - 1
    }
    diffs[diffs > 0] <- 1
    diffs[diffs < 0] <- -1
    runs <- rle(diffs)
    index.lengths <- (1:length(runs$lengths))[runs$lengths >= run.length]
    index.stats <- 1:length(statistics)
    vruns <- rep(runs$lengths >= run.length, runs$lengths)
    vruns.above <- (vruns & (diffs > 0))
    vruns.below <- (vruns & (diffs < 0))
    rvruns.above <- rle(vruns.above)
    rvruns.below <- rle(vruns.below)
    vbeg.above <- cumsum(rvruns.above$lengths)[rvruns.above$values] -
                  (rvruns.above$lengths - run.length)[rvruns.above$values]
    vend.above <- cumsum(rvruns.above$lengths)[rvruns.above$values]
    vbeg.below <- cumsum(rvruns.below$lengths)[rvruns.below$values] -
                  (rvruns.below$lengths - run.length)[rvruns.below$values]
    vend.below <- cumsum(rvruns.below$lengths)[rvruns.below$values]
    if (length(vbeg.above)) {
      for (j in 1:length(vbeg.above))
        violators <- c(violators, vbeg.above[j]:vend.above[j])
    }
    if (length(vbeg.below)) {
      for (j in 1:length(vbeg.below))
        violators <- c(violators, vbeg.below[j]:vend.below[j])
    }
  } ## end of for i in 1:2
  return(violators)
}
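
A quick usage sketch with simulated subgroup data (it assumes the qcc package is 
loaded and the modified function above has been sourced over the original):

qcc.data <- matrix(rnorm(100), ncol = 5)        # 20 subgroups of 5 observations
q <- qcc(qcc.data, type = "xbar", plot = FALSE)
violating.runs(q)                               # indices of points in violating runs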


> 3)Is there any more criterias made somewhere ?
The other criterion is implemented by the function "beyond.limits", which 
checks whether any points fall beyond the upper or lower control limits.

I believe that Prof. Scrucca wrote the qcc package as a tool to 
demonstrate process control to his students taking his statistics classes. 
Hence the statistics are excellent. For example, note the use of log-gamma and 
exponential functions in the function "sd.xbar" to calculate the constant "c4", 
which avoids the division of extremely large [~ 1E20] numbers:
c4 <- function(n) sqrt(2/(n-1)) * exp(lgamma(n/2) - lgamma((n-1)/2))

where a textbook might write it as:
c4 <- function(n) sqrt(2/(n-1)) * (gamma(n/2) / gamma((n-1)/2))
Thus I suggest that although qcc provides an excellent basis for 
constructing statistical control reports, you should be prepared to modify 
it for your purposes (as I am heavily doing).

P.S. I change .qcc.options using syntax such as 
".qcc.options$violating.runs<-7".

Leif S. Kirschenbaum, Ph.D.
Senior Yield Engineer
Reflectivity, Inc.
408-737-8100 x307
408-737-8153 (Fax)


> Date: Tue, 29 Nov 2005 02:08:57 +0200
> From: Tommi Viitanen <[EMAIL PROTECTED]>
> Subject: [R] qcc
> 
> violating.runs
[deleted]
> 
> that the criteria for the violating is 5 but
> 1)I cannot find "5" in the code of the function. Where is the "5" ?
> 2)What is the easiest way to change it ?
> 3)Is there any more criterias made somewhere ?
> 
> Yours sincerelly, Tommi Viitanen
> 
[deleted]
> --
> 
> Date: Tue, 29 Nov 2005 08:58:24 +0100
> From: Uwe Ligges <[EMAIL PROTECTED]>
> Subject: Re: [R] qcc
> To: Tommi Viitanen <[EMAIL PROTECTED]>
> 
> 
> See ?violating.runs:
> violating.runs(object, run.length = qcc.options("run.length"))
>  ^ ^^^ ^^^ ^^ ^^^ ^^^
> 
[deleted]
> 
> Uwe Ligges



[R] Listing object sizes without RODBC objects: contributed function

2005-09-27 Thread Leif Kirschenbaum
I include my workaround for object.size failing on RODBC objects.
Code adapted from Petr Pikal's code 
(http://tolstoy.newcastle.edu.au/R/help/04/06/1228.html).

ls.objects <- function(pos = 1, order.by, ...)
{
  napply <- function(names, fn) sapply(names, function(x) fn(get(x, pos = pos)))
  names <- ls(pos = pos, ...)
  obj.class <- napply(names, function(x) as.character(class(x))[1])
  ## drop RODBC channel objects, on which object.size fails;
  ## which() (rather than match()) copes with zero or several such objects
  obj.drop <- which(obj.class == "RODBC")
  if (length(obj.drop)) {
    obj.class <- obj.class[-obj.drop]
    names <- names[-obj.drop]
  }
  obj.mode <- napply(names, mode)
  obj.type <- ifelse(is.na(obj.class), obj.mode, obj.class)
  obj.size <- napply(names, object.size)
  obj.dim <- t(napply(names, function(x) as.numeric(dim(x))[1:2]))
  vec <- is.na(obj.dim)[, 1] & (obj.type != "function")
  obj.dim[vec, 1] <- napply(names, length)[vec]
  out <- data.frame(obj.type, obj.size, obj.dim)
  names(out) <- c("Type", "Size", "Rows", "Columns")
  if (!missing(order.by)) out <- out[order(out[[order.by]]), ]
  out
}
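
Usage is unchanged from Petr's original; for example, to list objects sorted by 
size:

ls.objects(order.by = "Size")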


Leif S. Kirschenbaum, Ph.D.
Senior Yield Engineer
Reflectivity, Inc.



[R] PowerPoint graph insertion

2005-09-09 Thread Leif Kirschenbaum
Yes:  I have Tufte's monograph on my desk. (along with 4 statistics texts)
Yes:  I am not the biggest fan of PowerPoint.
Yes:  I am using R to generate charts, plots, trends, etc. I have to summarize 
them each week.

  When I consider how to organize this data, my first thought is to generate an 
HTML file that links to the R-generated plots and organizes them in the 
required order.
However:
 * Each week we annotate one PowerPoint slide in the weekly presentation with 
action items -- we don't only use PP as a presentation tool. This is 
convenient, as then the action items resulting from particular data trends are 
associated in a single document with the plots of the data trends.
 * Other (non-R) users insert data into the weekly PP presentation: from other 
plotting software and images from various sources (microscope, SEM, TEM, etc.), 
which I cannot easily incorporate into a generated HTML file before-the-fact.
 * I'm not sure how to create an HTML file which allows one to page forward and 
backward through it easily, as with PowerPoint (a minor point; there is probably 
a way to write HTML that supports this)

So:
 Can R insert plots into an existing PowerPoint presentation?
(actually, I'd copy last week's presentation and then update with new plots)

 I'll guess that it cannot, as there probably is not a Microsoft supplied 
interface (ODBC or otherwise) with PowerPoint as there is with Excel.

-Leif Kirschenbaum, Ph.D.
 Sr. Yield Integration Engineer (I'm a physicist)
 [EMAIL PROTECTED]
