[R] penalized cox regression

2007-06-09 Thread carol white
Hi,
What is the function to fit a penalized Cox regression? frailtyPenal() in the 
frailtypack R package allows at most two strata. I want to use a function that 
penalizes all of my variables without stratifying them in advance.
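
One possible starting point (a sketch only, not a specific recommendation; the
data frame and variable names below are made up) is the ridge() penalty in the
survival package's coxph(), which penalizes every listed covariate without any
stratification:

library(survival)
## theta is a made-up value for the penalty tuning parameter
fit <- coxph(Surv(time, status) ~ ridge(x1, x2, x3, theta = 1), data = mydata)
## the "penalized" package (function penalized()) additionally offers an
## L1/lasso penalty, which can shrink coefficients exactly to zero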

Look forward to your reply

carol


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] comparing two vectors

2007-06-09 Thread gallon li
Suppose I have a vector A = c(1, 2, 3).

Now I want to compare each element of A with another vector L = c(0.5, 1.2)

and compute sum(A > 0.5) and sum(A > 1.2)

to get the result (3, 2).

How can I get this without writing a loop over the sums?
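
One loop-free possibility (a sketch using the vectors from the question):

A <- c(1, 2, 3)
L <- c(0.5, 1.2)
sapply(L, function(cut) sum(A > cut))   # 3 2
colSums(outer(A, L, ">"))               # same result via a 3-by-2 logical matrix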


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] What ECDF function?

2007-06-09 Thread Robert A LaBudde
At 06:36 PM 6/9/2007, Marco wrote:
>On 6/9/07, Robert A LaBudde <[EMAIL PROTECTED]> wrote:
> > At 12:57 PM 6/9/2007, Marco wrote:
> > >
>
>
>Hmmm I'm a bit confused, but very interested!
>So you don't use the R "ecdf", do you?

Only when an i/n edf is needed (some tests, such as ks.test(), are 
based on this). Also, I frequently do modeling in Excel as well, 
where you need to enter your own formulas.

>
> > Also remember that edfs are not very accurate, so the differences
> > between these formulae are difficult to justify in practice.
> >
>
>I will bear that in mind! My first interpretation was that using something
>different from i/n (e.g. i/(n+1)) might better bring out differences in the
>tails (maybe...)

The chief advantage of i/(n+1) is that you don't generate 1.0 as an 
abscissa, as you do with i/n. But the same is true of (i-0.5)/n, and 
it's more accurate.

Unless you need to do otherwise, just use ecdf(): it matches the theory 
for most uses, and it rarely matters that it is slightly less accurate 
than the other choices.
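
For example, with some made-up data the two conventions compare like this:

set.seed(1)
x <- sort(rnorm(10))
n <- length(x)
cbind(i_over_n = ecdf(x)(x),              # ecdf() gives i/n and reaches 1.0
      midpoint = (seq_len(n) - 0.5)/n)    # the (i - 0.5)/n positions never do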


Robert A. LaBudde, PhD, PAS, Dpl. ACAFS    e-mail: [EMAIL PROTECTED]
Least Cost Formulations, Ltd.              URL: http://lcfltd.com/
824 Timberlake Drive                       Tel: 757-467-0954
Virginia Beach, VA 23464-3239              Fax: 757-467-2947

"Vere scire est per causas scire"

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Tools For Preparing Data For Analysis

2007-06-09 Thread Gabor Grothendieck
That can be  elegantly handled in R through R's object oriented programming
by defining a class for the fancy input.  See this post:
  https://stat.ethz.ch/pipermail/r-help/2007-April/130912.html
for a simple example of that style.
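
A minimal sketch of that style (the toy "key=value" input format and every
name below are invented here, not taken from the linked post): give the fancy
input its own class plus an as.data.frame() method.

read.fancy <- function(file) {
  ## keep the raw lines, but tag the object with a class of its own
  structure(list(lines = readLines(file)), class = "fancy")
}
as.data.frame.fancy <- function(x, ...) {
  ## assume each line looks like "id=1,test=PLT,value=250"
  recs <- lapply(strsplit(x$lines, ",", fixed = TRUE), function(r) {
    kv <- strsplit(r, "=", fixed = TRUE)
    setNames(sapply(kv, "[", 2), sapply(kv, "[", 1))
  })
  as.data.frame(do.call(rbind, recs), stringsAsFactors = FALSE)
}
## usage: dat <- as.data.frame(read.fancy("labs.txt"))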


On 6/9/07, Robert Wilkins <[EMAIL PROTECTED]> wrote:
> Here are some examples of the type of data crunching you might have to do.
>
> In response to the requests by Christophe Pallier and Martin Stevens.
>
> Before I started developing Vilno, some six years ago, I had been working in
> the pharmaceutical industry for eight years (it's not easy to show you actual
> data, though, because it's all confidential of course).
>
> Lab data can be especially messy, especially if one clinical trial allows
> the physicians to use different labs. So let's consider lab data.
>
> Merge in normal ranges, into the lab data. This has to be done by lab-site
> and lab testcode(PLT for platelets, etc.), obviously. I've seen cases where
> you also need to match by sex and age. The sex column in the normal ranges
> could be: blank, F, M, or B ( B meaning for Both sexes). The age column in
> the normal ranges could be: blank, or something like "40 <55". Even worse,
> you could have an ageunits column in the normal ranges dataset: usually "Y",
> but if there are children in the clinical trial, you will have "D" or "M",
> for Days and Months. If the clinical trial is for adults, all rows with "D"
> or "M" should be tossed out at the start. Clearly the statistical programmer
> has to spend time looking at the data, before writing the program. Remember,
> all of these details can change any time you move to a new clinical trial.
>
> So for the lab data, you have to merge in the patient's date of birth,
> calculate age, and somehow relate that to the age-group column in the normal
> ranges dataset.
>
> (By the way, in clinical trial data preparation, the SAS datastep is much
> more useful and convenient, in my opinion, than the SQL SELECT syntax, at
> least 97% of the time. But in the middle of this program, when you merge the
> normal ranges into the lab data, you get a better solution with PROC SQL (
> just the SQL SELECT statement implemented inside SAS) This is because of the
> trickiness of the age match-up, and the SAS datastep does not do well with
> many-to-many joins.).
>
> Merge in various study drug administration dates into the lab data. Now, for
> each lab record, calculate treatment period ( or cycle number ), depending
> on the statistician's specifications and the way the clinical trial is
> structured.
>
> Different clinical sites chose to use different lab providers. So, for
> example, for Monocytes, you have 10 different units ( essentially 6 units,
> but spelling inconsistencies as well). The statistician has requested that
> you use standardized units in some of the listings ( % units, and only one
> type of non-% unit, for example ). At the same time, lab values need to be
> converted ( *1.61 , divide by 1000, etc. ). This can be very time consuming
> no matter what software you use, and, in my experience, when the SAS
> programmer asks for more clinical information or lab guidebooks, the
> response is incomplete, so he does a lot of guesswork. SAS programmers do
> not have expertise in lab science, hence the guesswork.
>
> Your program has to accommodate numeric values, "1.54", quasi-numeric values
> "<1", and non-numeric values "Trace".
>
> Your data listing is tight for space, so print "PROLONGED CELL CONT" as
> "PRCC".
>
> Once normal ranges are merged in, figure out which values are out-of-range
> and high , which are low, and which are within normal range. In the data
> listing, you may have "H" or "L" appended to the result value being printed.
>
> For each treatment period, you may need a unique lab record selected, in
> case there are two or three for the same treatment period. The statistician
> will tell the SAS programmer how. Maybe the averages of the results for that
> treatment period, maybe that lab record closest to the mid-point of the
> treatment period. This isn't for the data listing, but for a summary table.
>
> For the differentials ( monocytes, lymphocytes, etc) , merge in the WBC
> (total white blood cell count) values , to convert values between % units
> and absolute count units.
>
> When printing the values in the data listing, you need "H" or "L" to the
> right of the value. But you also need the values to be well lined up ( the
> decimal place ). This can be stupidly time consuming.
>
>
>
> AND ON AND ON AND ON .
>
> I think you see why clinical trials statisticians and SAS programmers enjoy
> lots of job security.

This could be readily handled in R using object-oriented programming: you
would specify a class for the strange input, as in the example linked above.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Tools For Preparing Data For Analysis

2007-06-09 Thread Robert Wilkins
Here are some examples of the type of data crunching you might have to do.

In response to the requests by Christophe Pallier and Martin Stevens.

Before I started developing Vilno, some six years ago, I had been working in
the pharmaceutical industry for eight years (it's not easy to show you actual
data, though, because it's all confidential of course).

Lab data can be especially messy, especially if one clinical trial allows
the physicians to use different labs. So let's consider lab data.

Merge in normal ranges, into the lab data. This has to be done by lab-site
and lab testcode(PLT for platelets, etc.), obviously. I've seen cases where
you also need to match by sex and age. The sex column in the normal ranges
could be: blank, F, M, or B ( B meaning for Both sexes). The age column in
the normal ranges could be: blank, or something like "40 <55". Even worse,
you could have an ageunits column in the normal ranges dataset: usually "Y",
but if there are children in the clinical trial, you will have "D" or "M",
for Days and Months. If the clinical trial is for adults, all rows with "D"
or "M" should be tossed out at the start. Clearly the statistical programmer
has to spend time looking at the data, before writing the program. Remember,
all of these details can change any time you move to a new clinical trial.

So for the lab data, you have to merge in the patient's date of birth,
calculate age, and somehow relate that to the age-group column in the normal
ranges dataset.

(By the way, in clinical trial data preparation, the SAS datastep is much
more useful and convenient, in my opinion, than the SQL SELECT syntax, at
least 97% of the time. But in the middle of this program, when you merge the
normal ranges into the lab data, you get a better solution with PROC SQL (
just the SQL SELECT statement implemented inside SAS) This is because of the
trickiness of the age match-up, and the SAS datastep does not do well with
many-to-many joins.).
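
In R, the same many-to-many merge plus age match-up might look roughly like
this (a sketch only; the data frames lab and ranges and the columns labsite,
testcode, age, agelow and agehigh are all invented for illustration):

merged <- merge(lab, ranges, by = c("labsite", "testcode"))   # many-to-many join
keep   <- merged$age >= merged$agelow & merged$age < merged$agehigh
merged <- merged[keep, ]                                      # age match-up as a filter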

Merge in various study drug administration dates into the lab data. Now, for
each lab record, calculate treatment period ( or cycle number ), depending
on the statistician's specifications and the way the clinical trial is
structured.

Different clinical sites chose to use different lab providers. So, for
example, for Monocytes, you have 10 different units ( essentially 6 units,
but spelling inconsistencies as well). The statistician has requested that
you use standardized units in some of the listings ( % units, and only one
type of non-% unit, for example ). At the same time, lab values need to be
converted ( *1.61 , divide by 1000, etc. ). This can be very time consuming
no matter what software you use, and, in my experience, when the SAS
programmer asks for more clinical information or lab guidebooks, the
response is incomplete, so he does a lot of guesswork. SAS programmers do
not have expertise in lab science, hence the guesswork.

Your program has to accommodate numeric values, "1.54", quasi-numeric values
"<1", and non-numeric values "Trace".

Your data listing is tight for space, so print "PROLONGED CELL CONT" as
"PRCC".

Once normal ranges are merged in, figure out which values are out-of-range
and high , which are low, and which are within normal range. In the data
listing, you may have "H" or "L" appended to the result value being printed.
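
A corresponding R sketch for the flag once the ranges are merged in (the
columns numvalue, low and high are again invented):

merged$flag <- ifelse(merged$numvalue > merged$high, "H",
                      ifelse(merged$numvalue < merged$low, "L", ""))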

For each treatment period, you may need a unique lab record selected, in
case there are two or three for the same treatment period. The statistician
will tell the SAS programmer how. Maybe the averages of the results for that
treatment period, maybe that lab record closest to the mid-point of the
treatment period. This isn't for the data listing, but for a summary table.

For the differentials ( monocytes, lymphocytes, etc) , merge in the WBC
(total white blood cell count) values , to convert values between % units
and absolute count units.

When printing the values in the data listing, you need "H" or "L" to the
right of the value. But you also need the values to be well lined up ( the
decimal place ). This can be stupidly time consuming.
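
In R, formatC() handles the decimal alignment; a sketch with the invented
columns from the snippets above:

paste(formatC(merged$numvalue, format = "f", digits = 2, width = 8), merged$flag)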



AND ON AND ON AND ON .

I think you see why clinical trials statisticians and SAS programmers enjoy
lots of job security.



On 6/8/07, Martin Henry H. Stevens <[EMAIL PROTECTED]> wrote:
>
> Is there an example available of this sort of problematic data that
> requires this kind of data screening and filtering? For many of us,
> this issue would be nice to learn about, and deal with within R. If a
> package could be created, that would be optimal for some of us. I
> would like to learn a tad more, if it were not too much effort for
> someone else to point me in the right direction?
> Cheers,
> Hank
> On Jun 8, 2007, at 8:47 AM, Douglas Bates wrote:
>
> > On 6/7/07, Robert Wilkins <[EMAIL PROTECTED]> wrote:
> >> As noted on the R-project web site itself ( www.r-project.org ->
> >> Manuals -> R Data Import/Export ), it can be cumbersome to prepare
> >> messy and dirty data for analysis 

Re: [R] What ECDF function?

2007-06-09 Thread Shiazy Fuzzy
On 6/9/07, Robert A LaBudde <[EMAIL PROTECTED]> wrote:
> At 12:57 PM 6/9/2007, Marco wrote:
> >
> >2.I found various version of P-P plot  where instead of using the
> >"ecdf" function use ((1:n)-0.5)/n
> >   After investigation I found there're different definition of ECDF
> >(note "i" is the rank):
> >   * Kaplan-Meier: i/n
> >   * modified Kaplan-Meier: (i-0.5)/n
> >   * Median Rank: (i-0.3)/(n+0.4)
> >   * Herd Johnson i/(n+1)
> >   * ...
> >   Furthermore, similar expressions are used by "ppoints".
> >   So,
> >   2.1 For P-P plot, what shall I use?
> >   2.2 In general why should I prefer one kind of CDF over another one?
> >
>
> This is an age-old debate in statistics. There are many different
> formulas, some of which are optimal for particular distributions.
>
> Using i/n (which I would call the Kolmogorov method), (i-1)/n or
> i/(n+1) is to be discouraged for general ECDF modeling. These
> correspond in quality to the rectangular rule method of integration
> of the bins, and assume only that the underlying density function is
> piecewise constant. There is no disadvantage to using these methods,
> however, if the pdf has multiple discontinuities.
>
> I tend to use (i-0.5)/n, which corresponds to integrating with the
> "midpoint rule", which is a 1-point Gaussian quadrature, and which is
> exact for linear behavior with derivative continuous. It's simple,
> it's accurate, and it is near optimal for a wide range of continuous
> alternatives.
>

Hmmm I'm a bit confused, but very interested!
So you don't use the R "ecdf", do you?

> The formula (i- 3/8)/(n + 1/4) is optimal for the normal
> distribution. However, it is equal to (i-0.5)/n to order 1/n^3, so
> there is no real benefit to using it. Similarly, there is a formula
> (i-.44)/(N+.12) for a Gumbel distribution. If you do know for sure
> (don't need to test) the form of the distribution, you're better off
> fitting that distribution function directly and not worrying about the edf.
>
> Also remember that edfs are not very accurate, so the differences
> between these formulae are difficult to justify in practice.
>

I will bear that in mind! My first interpretation was that using something
different from i/n (e.g. i/(n+1)) might better bring out differences in the
tails (maybe...)

Regards,

-- Marco

> 
> Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: [EMAIL PROTECTED]
> Least Cost Formulations, Ltd.URL: http://lcfltd.com/
> 824 Timberlake Drive Tel: 757-467-0954
> Virginia Beach, VA 23464-3239Fax: 757-467-2947
>
> "Vere scire est per causas scire"
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] problem with xlsreadwrite package

2007-06-09 Thread hassen62
Hi friends,
I have installed R 2.4.0 on my PC. I have an .xls file named dali in the 
directory C:/Program Files/R 2.4.0. Recently I installed xlsReadWrite 1.3.2, 
but when I ran the following lines:
>library(xlsReadWrite)
>read.xls(file, colNames = TRUE, sheet = 1, type = "data.frame", from = 1,
+  colClasses = NA)
I got the following messages in the R console:
Error in library(xlsReadWrite)
could not find function "read.xls"
Please help me, many thanks in advance.
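
The first error says that library(xlsReadWrite) itself failed, so read.xls()
was never loaded; fixing the package installation is the real issue. A sketch
of what should work once the package loads (the file path below is only a
guess based on the description above):

install.packages("xlsReadWrite")   # re-install if the package is broken
library(xlsReadWrite)              # this call must succeed before read.xls() exists
dat <- read.xls("C:/Program Files/R 2.4.0/dali.xls",
                colNames = TRUE, sheet = 1, type = "data.frame")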

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to plot vertical line

2007-06-09 Thread David Barron
abline(v=c(intercept1,intercept2,intercept3))
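
For example (a sketch using the housing data from MASS rather than the
poster's data; the cut-points of a polr fit live in its $zeta component):

library(MASS)
fit <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing,
            method = "probit")
plot(fit$lp, fit$fitted.values[, 1],
     xlab = "linear predictor", ylab = "P(lowest category)")
abline(v = fit$zeta)   # one vertical line per intercept/cut-point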

On 09/06/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I have a result from polr in which I fit a univariate variable (of ordinal
> data) with a probit link. What I would like to do is overlay the plot of my
> fitted values with the different intercepts for each level in my ordinal
> data. I can do something like:
>
> lines(rep(intercept1, 1000), seq(from = 0, to = max(fit), by = max(fit)/1000))
>
> where intercept1 is, for example, the intercept that separates the y = 1 and
> y = 2 labels, and max(fit) is the maximum of the overall fitted values or the
> maximum of all ordinal y labels. I'm wondering if there is a better way to do
> this? If you could let me know, I would really appreciate it. Thank you.
>
> - adschai
>
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
=
David Barron
Said Business School
University of Oxford
Park End Street
Oxford OX1 1HP

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] What ECDF function?

2007-06-09 Thread Robert A LaBudde
At 12:57 PM 6/9/2007, Marco wrote:
>
>2.I found various version of P-P plot  where instead of using the
>"ecdf" function use ((1:n)-0.5)/n
>   After investigation I found there're different definition of ECDF
>(note "i" is the rank):
>   * Kaplan-Meier: i/n
>   * modified Kaplan-Meier: (i-0.5)/n
>   * Median Rank: (i-0.3)/(n+0.4)
>   * Herd Johnson i/(n+1)
>   * ...
>   Furthermore, similar expressions are used by "ppoints".
>   So,
>   2.1 For P-P plot, what shall I use?
>   2.2 In general why should I prefer one kind of CDF over another one?
>

This is an age-old debate in statistics. There are many different 
formulas, some of which are optimal for particular distributions.

Using i/n (which I would call the Kolmogorov method), (i-1)/n or 
i/(n+1) is to be discouraged for general ECDF modeling. These 
correspond in quality to the rectangular rule method of integration 
of the bins, and assume only that the underlying density function is 
piecewise constant. There is no disadvantage to using these methods, 
however, if the pdf has multiple discontinuities.

I tend to use (i-0.5)/n, which corresponds to integrating with the 
"midpoint rule", which is a 1-point Gaussian quadrature, and which is 
exact for linear behavior with derivative continuous. It's simple, 
it's accurate, and it is near optimal for a wide range of continuous 
alternatives.

The formula (i - 3/8)/(n + 1/4) is optimal for the normal 
distribution. However, it is equal to (i - 0.5)/n to order 1/n^3, so 
there is no real benefit to using it. Similarly, there is a formula 
(i - 0.44)/(n + 0.12) for the Gumbel distribution. If you know the form of 
the distribution for sure (no need to test it), you're better off 
fitting that distribution function directly and not worrying about the edf.
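
(As a check, R's ppoints() follows exactly this logic: it uses the Blom
constant 3/8 for small samples and 0.5 otherwise.)

n <- 5
all.equal(ppoints(n), ((1:n) - 3/8)/(n + 1/4))   # TRUE for small samples (n <= 10)
n <- 20
all.equal(ppoints(n), ((1:n) - 0.5)/n)           # TRUE for n > 10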

Also remember that edfs are not very accurate, so the differences 
between these formulae are difficult to justify in practice.


Robert A. LaBudde, PhD, PAS, Dpl. ACAFS    e-mail: [EMAIL PROTECTED]
Least Cost Formulations, Ltd.              URL: http://lcfltd.com/
824 Timberlake Drive                       Tel: 757-467-0954
Virginia Beach, VA 23464-3239              Fax: 757-467-2947

"Vere scire est per causas scire"

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Lines in dotchart & dotplot ?

2007-06-09 Thread deepayan . sarkar
On 6/9/07, John Kane <[EMAIL PROTECTED]> wrote:
> Is it possible to use dotchart or dotplot and set the
> lines in such a way that they only extend from the
> left y-axis to the data point?

Yes (sort of) in dotplot at least. E.g.,

dotplot(VADeaths, groups = FALSE, type = c("p", "h"))
dotplot(VADeaths, groups = FALSE, type = c("p", "h"), origin = 0)

-Deepayan

> I seem to remember that Wm Cleveland did this in his
> 1985 book  "The elements of graphing data".
>
> In cases where one has a true starting or 0 point on
> the x-scale this layout seems to be very effective in
> displaying some data.
>
> I know that I can do it by simply plotting lines and
> points, but a more polished function than I am likely
> to produce would be nice.
>
> Thanks
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Lines in dotchart & dotplot ?

2007-06-09 Thread John Kane
Is it possible to use dotchart or dotplot and set the
lines in such a way that they only extend from the
left y-axis to the data point?  

I seem to remember that Wm Cleveland did this in his
1985 book  "The elements of graphing data".

In cases where one has a true starting or 0 point on
the x-scale this layout seems to be very effective in
displaying some data.

I know that I can do it by simply plotting lines and
points, but a more polished function than I am likely
to produce would be nice.

Thanks

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to plot vertical line

2007-06-09 Thread adschai
Hi,

I have a result from polr in which I fit a univariate variable (of ordinal
data) with a probit link. What I would like to do is overlay the plot of my
fitted values with the different intercepts for each level in my ordinal
data. I can do something like:

lines(rep(intercept1, 1000), seq(from = 0, to = max(fit), by = max(fit)/1000))

where intercept1 is, for example, the intercept that separates the y = 1 and
y = 2 labels, and max(fit) is the maximum of the overall fitted values or the
maximum of all ordinal y labels. I'm wondering if there is a better way to do
this? If you could let me know, I would really appreciate it. Thank you.

- adschai


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] What ECDF function?

2007-06-09 Thread Shiazy Fuzzy
Hello!

I want to plot a P-P plot. So I've implemented this function:

ppplot <- function(x, dist, ...)
{
  ## theoretical CDF, e.g. dist = "norm" gives pnorm
  ## (despite its name, "pdf" here holds the p<dist> function, i.e. the CDF)
  pdf <- get(paste("p", dist, sep = ""), mode = "function");
  x <- sort(x);
  ## theoretical probabilities on the x-axis, empirical CDF values on the y-axis
  plot(pdf(x, ...), ecdf(x)(x));
}

I have two questions:
1. Is it right to draw the reference line as follows:

xx <- pdf(x, ...);
yy <- ecdf(x)(x);
l <- lm(yy ~ xx)
abline(l$coefficients);

   or is there something better?

2. I found various versions of the P-P plot where, instead of using the
"ecdf" function, ((1:n) - 0.5)/n is used.
   After some investigation I found there are different definitions of the
ECDF (note "i" is the rank):
   * Kaplan-Meier: i/n
   * modified Kaplan-Meier: (i - 0.5)/n
   * Median Rank: (i - 0.3)/(n + 0.4)
   * Herd-Johnson: i/(n + 1)
   * ...
   Furthermore, similar expressions are used by "ppoints".
   So,
   2.1 For a P-P plot, what should I use?
   2.2 In general, why should I prefer one kind of CDF over another?

   (Note: this issue might also apply to Q-Q plots; in fact qqnorm uses
ppoints instead of ecdf.)

Thank you very much!!

Sincerely,

-- Marco

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] "R is not a validated software package.."

2007-06-09 Thread AJ Rossini
You've just opened up another bit of confusion.   A submission package has 
many pieces, and that cited one is just a small part of it.

As Frank has mentioned (though perhaps tritely), and Cody points out -- 

The only issue that a pharma company has to worry about is whether they know 
enough about a software language/application to trust it.  Period.  There are 
incantations and approaches for this, but working through a corporation of 
diverse skills and knowledge to justify the use of a particular tool 
(whether it is statistical, legal, chemical, etc.) within the corporation 
brings up the need to follow whatever implementation the company has done 
using the guidelines from FDA, EMEA, PMDA, etc.

But it's the company guidelines, and the company's due diligence that is 
important.

NOT the external view.   So the idea of computer systems validation, and the 
qualification of programming languages such as R and SAS is a good one, since 
it helps you understand where the weaknesses in computed results might be, 
for weighting results for internal decision making and knowledge management.

But much of it has to be done internally.  The side problem of "few people 
using S in clinical statistics" is related to the risks that a company has in 
having people that know about R, etc.  That is the practical issue, solvable 
when budgets exist and when it falls into strategy.

Sorry for getting on the soap-box, but I've just had a nasty week dealing with 
the fallout from simplified bullet points that skipped the important 
complexities, which were then handled badly by decisions made from those 
summaries.  Ugly, ugly nonsense.

And Computer Systems validation is a canonical area where the above paragraph 
holds.

best,
-tony.

On Friday 08 June 2007, Sicotte, Hugues Ph.D. wrote:
> I may have overstated  things a bit.
>
> See section VIII
> http://www.fda.gov/CDER/GUIDANCE/2396dft.htm
>
> If you are analyzing data, your statistical package does not necessarily
> have to be validated. You may have to show that the statistical methods
> are adequate/appropriate or that the results are reproduced with
> different software if you are using non-standard packages. By all
> tests, S-plus appears acceptable; I do not know about R.
>
> However, If your statistical method is an intricate part of a test, then
> you do have to validate the system.
> This is becoming increasingly relevant for theragnostics.
>
> .. Which is why I said
> "Should they need to use those results in a report [where] that will
> matter to the FDA.."
> (I added the where .. It makes more sense)
>
>
>
> -Original Message-
> From: Frank E Harrell Jr [mailto:[EMAIL PROTECTED]
> Sent: Friday, June 08, 2007 11:08 AM
> To: Sicotte, Hugues Ph.D.
> Cc: Wensui Liu; Giovanni Parrinello; r-help@stat.math.ethz.ch
> Subject: Re: [R] "R is not a validated software package.."
>
> Sicotte, Hugues Ph.D. wrote:
> > People, don't get angry at the pharma statistician, he is just trying to
> > abide by an FDA requirement that is designed to insure that tests perform
> > reliably the same. There is no point in getting into which product is
> > better. As far as the FDA rules are concerned a validated system beats a
> > "better" system any day of the week.
>
> There is no such requirement.
>
> > Here is your polite answer.
> > You can develop and try your software in R.
> > Should they need to use those results in a report that will matter to
> > the FDA, then you can work together with him to set up a validated
> > environment for S-plus. You then have to commit to port your code to
> > S-plus.
>
> That doesn't follow.  What matters is good statistical analysis practice
> no matter which environment you use.  Note that more errors are made in
> the data preparation / derived variables stage than are made by
> statistical software.
>
> Frank
>
> > As I assume that you do not work in a regulated environment, you
> > probably wouldn't have access to a validated SAS environment anyways. It
> > is not usually enough to install a piece of software, you have to
> > validate every step of the installation. Since AFAIK the FDA uses
> > S-plus, it would be to your pharma person's advantage to speed-up
> > submissions if they also had a validated S-plus environment.
> >
> > http://www.msmiami.com/custom/downloads/S-PLUSValidationdatasheet_Final.pdf
> >
> >
> > -Original Message-
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of Wensui Liu
> > Sent: Friday, June 08, 2007 9:24 AM
> > To: Giovanni Parrinello
> > Cc: r-help@stat.math.ethz.ch
> > Subject: Re: [R] "R is not a validated software package.."
> >
> > I like to know the answer as well.
> > To be honest, I really have hard time to understand the mentality of
> > clinical trial guys and rather believe it is something related to job
> > security.
> >
> > On 6/8/07, Giovanni Parrinello <[EMAIL PROTECTED]> wrote:
> >> Dear All,
> >> discussing with a statistician of a pharmaceutical company ...

Re: [R] "R is not a validated software package.."

2007-06-09 Thread AJ Rossini
On Friday 08 June 2007, Giovanni Parrinello wrote:
> Dear All,
> discussing with a statistician of a pharmaceutical company I received
> this answer about the statistical package that I have planned to use:
>
> As R is not a validated software package, we would like to ask if it
> would rather be possible for you to use SAS, SPSS or another approved
> statistical software system.
>
> Could someone suggest me a 'polite' answer?
> TIA
> Giovanni

You can't validate any complex software package, i.e. anything with the 
complexity of a computer programming language (SAS, R, S-PLUS, SPSS, Perl, 
Python, Ruby, Java).

You can qualify a software package, and validate code written in it.

As a "statistician" in a very large pharmaceutical company based in Basel 
which happens to be bigger than the other large pharma in Basel, I can say 
that we should have most of the paperwork done for qualification, at some 
point this year, for use as part of submission packages.  Whether it will be 
used is another matter, which will be driven by business needs :-).

So your colleague is right, only in the sense that whatever the company has 
approved is appropriate, and qualification in the "computer systems 
validation" context is expensive, time and man-power wise.  But that holds 
true for any software package.

Your colleague should have technically said:  "As R is not a qualified 
software package at my company, we would like to ask if it would be possible 
for you to use software which my company has approved and done 
the /risk-management/ paperwork for and gotten approval from our Clinical 
Quality group to use". 

This is an issue -- however, whether R could pass such a process is not in 
question; it clearly could be done if they wanted to do it.
 
best,
-tony

[EMAIL PROTECTED]
Muttenz, Switzerland.
"Commit early,commit often, and commit in a repository from which we can 
easily
roll-back your mistakes" (AJR, 4Jan05).


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to find how many modes in 2 dimensions case

2007-06-09 Thread Tony Plate
If you want to count the local maxima in the n x n matrix returned by 
kde2d, AND you know there are no ties, you could do something like the 
following:

 > set.seed(1)
 > x <- matrix(sample(10, 25, rep=TRUE), 5, 5)
 > x
     [,1] [,2] [,3] [,4] [,5]
[1,]    3    9    3    5   10
[2,]    4   10    2    8    3
[3,]    6    7    7   10    7
[4,]   10    7    4    4    2
[5,]    3    1    8    8    3
 > sum(x > cbind(0, x[,-5]) & x > cbind(x[,-1], 0) &
 +     x > rbind(x[-1,], 0) & x > rbind(0, x[-5,]))
[1] 4
 >

Just be careful that your counting formula matches your definition of 
"neighbor" (the above formula does not include diagonal neighbors).

And of course, ties make things more complicated (note that the above 
simple algorithm misses the local maximum consisting of two 8's in the 
last row.)
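
For completeness, a sketch (not from the original post) that also checks the
four diagonal neighbours, by padding the matrix with -Inf so that edge cells
compare against something smaller than any data value:

local.max <- function(x) {
  p <- matrix(-Inf, nrow(x) + 2, ncol(x) + 2)          # padded copy of x
  p[2:(nrow(x) + 1), 2:(ncol(x) + 1)] <- x
  res <- matrix(TRUE, nrow(x), ncol(x))
  for (di in -1:1) for (dj in -1:1) {
    if (di == 0 && dj == 0) next
    ## strictly greater than each of the eight shifted neighbours
    res <- res & (x > p[2:(nrow(x) + 1) + di, 2:(ncol(x) + 1) + dj])
  }
  sum(res)   # strict maxima only, so ties are still excluded
}
local.max(x)   # 4 for the matrix above (the diagonals change nothing here)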

-- Tony Plate


Patrick Wang wrote:
> Hi,
> 
> Does anyone know how to count the number of modes in 2 dimensions using
> kde2d function?
> 
> Thanks
> Pat
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] pnorm how to decide lower-tail true or false

2007-06-09 Thread Martin Maechler
> "CM" == Carmen Meier <[EMAIL PROTECTED]>
> on Fri, 08 Jun 2007 19:31:49 +0200 writes:

CM> Hi to all, maybe the last question was not clear enough.
CM> I did not find any hints on how to decide whether one
CM> should use lower.tail or not.  As it is an extra
CM> R feature (described in
CM> http://finzi.psych.upenn.edu/R/Rhelp02a/archive/66250.html
CM> ) I cannot find anything about it in any of my
CM> statistical books.

Yes, most "statistical books" do not consider numerical accuracy
which is the real issue here.
Note that R is much more than a "statistical package" and hence
to be appreciated properly needs much broader (applied)
mathematical, statistical and computer science knowledge ;-)

When  p ~= 1,  '1 - p' suffers from so-called cancellation
("Numerical Analysis 101").
If you already know that you will use "q := 1 - p",
rather compute 'q' directly than first compute p and then 1 - p,
losing all accuracy.

All of R's  p<dist>(..) functions have an argument 'lower.tail'
which is TRUE by default, since after all,

  p<dist>(x) = Prob_{<dist>}[X <= x]

measures the probability of the lower or left tail of the
<dist> distribution.
<dist> = norm  is just a special case.
If you really want

   q =  1 - p<dist>(x) = Prob_{<dist>}[X > x]

then you can get this directly via

   q <- p<dist>(x, lower.tail = FALSE, ...)

Simple example with R :

> pnorm(10)
[1] 1
> 1 - pnorm(10)
[1] 0
> pnorm(10, lower.tail=FALSE)
[1] 7.619853e-24


Regards,
Martin Maechler, ETH Zurich

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] write.table: last line should differ from usual eol

2007-06-09 Thread Gabor Grothendieck
Try this:


write.ilog <- function(X, file = "") {
    ## write one row as "[a,b,...]" followed by the row terminator z
    w <- function(x, z, file)
        cat("[", paste(x, collapse = ","), "]", z, sep = "", file = file)
    if (!identical(file, "")) {
        ## open a writable file connection (file(), not open(), returns one)
        file <- file(file, "w")
        on.exit(close(file))
    }
    cat("X=[", file = file)
    nr <- nrow(X)
    for (i in 1:nr) w(X[i, ], if (i == nr) "];\n" else ",\n", file)
    invisible(X)
}

X<-array(1:4,c(2,2))
write.ilog(X)




On 6/9/07, jim holtman <[EMAIL PROTECTED]> wrote:
> This will probably do it for you.  It is a function to create the output:
>
>
>
> write.array <- function(x,fileName){
>outFile <- file(fileName, 'w')
>cat(deparse(substitute(x)), "=[", sep='', file=outFile)
>for (i in 1:nrow(x)){
>cat('[', paste(x[i,], collapse=','), ']', file=outFile, sep='')
>if (i == nrow(x)) cat('];', file=outFile, sep='')
>else cat(',\n', file=outFile, sep='')
>}
>close(outFile)
> }
>
> # test data
> a <- matrix(1:25,5)
> write.array(a, '/tempxx.txt')
>
> Here is the output file:
>
> a=[[1,6,11,16,21],
> [2,7,12,17,22],
> [3,8,13,18,23],
> [4,9,14,19,24],
> [5,10,15,20,25]];
>
>
>
>
> I have a problem with writing an array to (for example) a .txt-file.
> Because of the .txt-file must be read from another programm (OPL ILOG),
> the syntax of the output must be from a special form:
>
> name_of_the_object = [  [1,2, ... ],
>   [1,...],
>   ... ];
>
> I think it's easier to understand with a small example:
>
> X<-array(1:4,c(2,2))
>
> should be written as:
> X = [[1,3],
> [2,4]];
>
>
> I have (until now) used the following:
>
> write("X=[[",file=filename)
> write.table(X,file=filename,sep=",",eol="],\n [", row.names=FALSE,
> col.names=FALSE,append=TRUE)
>
> which leads to the following output:
> X=[[
> 1,3],
> [2,4],
> [
>
> I hope you can help because it's very annoying to adjust the resulting
> .txt-file "by hand".
>
> Thanks a lot for your help!
> With nice greetings
>
> Andreas Gegg,
> mathematic-student on Catholic University of Eichstätt-Ingolstadt (Germany)
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem you are trying to solve?
>
>
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How do you do an e-mail post that is within an ongoing thread?

2007-06-09 Thread Gabor Grothendieck
In gmail just hit Reply to All at the bottom of the post you wish to
follow up on.

On 6/8/07, Robert Wilkins <[EMAIL PROTECTED]> wrote:
> That may sound like a stupid question, but if it confuses me, I'm sure
> it confuses others as well. I've tried to find that information on the
> R mail-group info pages, can't seem to find it. Is it something
> obvious?
>
> To begin a brand new discussion, you do your post as an e-mail sent to
>  r-help@stat.math.ethz.ch .
> As I am doing right now.
>
> How do I do an additional post that gets included in the
> "[R] Tools For Preparing Data For Analysis" thread, a thread which I
> started myself yesterday ( thanks for all the responses everybody )?
>
> There's got to be a real easy answer to that, since everybody else does that.
> (I'm using gmail, does it make a difference what e-mail host you use?).
>
> ---
>
>
> PS
> If you happen to be reading this, Christophe Pallier & Martin Stevens,
> I will respond to your request for examples shortly, once I figure
> this posting how-to out. My examples will come from data preparation
> problems in clinical trial data ( I worked for 8 years on clinical
> trial analysis before beginning work on Vilno ). I'll probably use lab
> data as an example because  lab data can be messy and difficult to
> work with.
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] open .r files with double-click

2007-06-09 Thread Duncan Murdoch
On 08/06/2007 2:52 PM, [EMAIL PROTECTED] wrote:
> Hi Folks,
> On Windows XP, R 2.5.0.
> 
> After reading the Installation for Windows and Windows FAQs,
> I cannot resolve this.
> 
> I set file types so that Rgui.exe will open .r files.
> 
> When I try to open a .r file by double-clicking, R begins to launch,
> but I get an error message saying
> 
> "Argument 'C:\Documents and Settings\Zoology\My Documents\trial.r' _ignored_"
> 
> I click OK, and then R GUI opens, but not the script file.
> 
> Is there a way to change this?

Not currently. See the appendix "Invoking R" of the Introduction manual 
for the current command line parameters, which don't include "open a 
script".  This would be a reasonable addition, and I'll add it at some 
point, sooner if someone else comes up with a convincing argument for 
the "right" command line parameter to do this.

It would be better if clicking on a second script opened a new window in 
the same session, but that takes more work; not sure I'll get to this.

Duncan Murdoch

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] open .r files with double-click

2007-06-09 Thread michael watson \(IAH-C\)
Hmmm.  Possibly your best bet is to create a batch file, runr.bat or something, 
and associate .r files with that.

The batch file would be something like:

"C:/Program Files/R/R-2.5.0/bin/Rgui.exe" --no-save < %1

(I think thats how you reference arguments in dos...)


-Original Message-
From: [EMAIL PROTECTED] on behalf of [EMAIL PROTECTED]
Sent: Fri 08/06/2007 7:52 PM
To: r-help@stat.math.ethz.ch
Subject: [R] open .r files with double-click
 
Hi Folks,
On Windows XP, R 2.5.0.

After reading the Installation for Windows and Windows FAQs,
I cannot resolve this.

I set file types so that Rgui.exe will open .r files.

When I try to open a .r file by double-clicking, R begins to launch,
but I get an error message saying

"Argument 'C:\Documents and Settings\Zoology\My Documents\trial.r' _ignored_"

I click OK, and then R GUI opens, but not the script file.

Is there a way to change this?

thanks,
Hank

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] write.table: last line should differ from usual eol

2007-06-09 Thread jim holtman
This will probably do it for you.  It is a function to create the output:



write.array <- function(x, fileName) {
    outFile <- file(fileName, 'w')
    ## "name=[" -- deparse(substitute(x)) picks up the name the caller used
    cat(deparse(substitute(x)), "=[", sep = '', file = outFile)
    for (i in 1:nrow(x)) {
        cat('[', paste(x[i, ], collapse = ','), ']', file = outFile, sep = '')
        ## last row gets the closing "];", every other row a comma and newline
        if (i == nrow(x)) cat('];', file = outFile, sep = '')
        else cat(',\n', file = outFile, sep = '')
    }
    close(outFile)
}

# test data
a <- matrix(1:25,5)
write.array(a, '/tempxx.txt')

Here is the output file:

a=[[1,6,11,16,21],
[2,7,12,17,22],
[3,8,13,18,23],
[4,9,14,19,24],
[5,10,15,20,25]];




I have a problem with writing an array to (for example) a .txt-file.
Because the .txt file must be read by another program (OPL ILOG),
the output syntax must have a special form:

name_of_the_object = [  [1,2, ... ],
   [1,...],
   ... ];

I think it's easier to understand with a small example:

X<-array(1:4,c(2,2))

should be written as:
X = [[1,3],
 [2,4]];


I have (until now) used the following:

write("X=[[",file=filename)
write.table(X,file=filename,sep=",",eol="],\n [", row.names=FALSE,
col.names=FALSE,append=TRUE)

which leads to the following output:
X=[[
1,3],
[2,4],
[

I hope you can help because it's very annoying to adjust the resulting
.txt-file "by hand".

Thanks a lot for your help!
With nice greetings

Andreas Gegg,
mathematic-student on Catholic University of Eichstätt-Ingolstadt (Germany)

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] character to time problem

2007-06-09 Thread John Kane
Thanks. I read your code too quickly.  I'll have a
look at the R News article. I read it last year but
apparently have forgotten just about all of it. :(


--- Gabor Grothendieck <[EMAIL PROTECTED]>
wrote:

> The code in my post uses "Date" class, not POSIX.
> sort.POSIXlt is never invoked.  Suggest you read the
> help desk article in R News 4/1 for more.
> 
> On 6/8/07, John Kane <[EMAIL PROTECTED]> wrote:
> > Looks much better. I seldom use dates for much and
> > didn't think to look at the sort.POSIXlt function.
> >
> > If I understand this correctly the sort.POSIXlt
> with
> > na.last = FALSE is dropping all the NAs.  Very
> nice.
> >
> >
> > --- Gabor Grothendieck <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Perhaps you want one of these:
> > >
> > > > sort(as.Date(aa$times, "%d/%m/%Y"))
> > > [1] "1995-03-02" "2001-05-12" "2007-02-14"
> > >
> > > > sort(as.Date(aa$times, "%d/%m/%Y"), na.last =
> > > TRUE)
> > > [1] "1995-03-02" "2001-05-12" "2007-02-14" NA
> > >NA
> > > [6] NA
> > >
> > >
> > > On 6/7/07, John Kane <[EMAIL PROTECTED]> wrote:
> > > > I am trying to clean up some dates and I am
> > > clearly
> > > > doing something wrong.  I have laid out an
> example
> > > > that seems to show what is happening with the
> > > "real"
> > > > data.  The  coding is lousy but it looks like
> it
> > > > should have worked.
> > > >
> > > > Can anyone suggest a) why I am getting that NA
> > > > appearing after the strptime() command and b)
> why
> > > the
> > > > NA is disappearing in the sort()? It happens
> with
> > > > na.rm=TRUE  and na.rm=FALSE
> > > >
> -
> > > > aa  <- data.frame( c("12/05/2001", " ",
> > > "30/02/1995",
> > > > NA, "14/02/2007", "M" ) )
> > > > names(aa)  <- "times"
> > > > aa[is.na(aa)] <- "M"
> > > > aa[aa==" "]  <- "M"
> > > > bb <- unlist(subset(aa, aa[,1] !="M"))
> > > > dates <- strptime(bb, "%d/%m/%Y")
> > > > dates
> > > > sort(dates)
> > > >
> --
> > > >
> > > > Session Info
> > > > R version 2.4.1 (2006-12-18)
> > > > i386-pc-mingw32
> > > >
> > > > locale:
> > > > LC_COLLATE=English_Canada.1252;
> > > > LC_CTYPE=English_Canada.1252;
> > > > LC_MONETARY=English_Canada.1252;
> > > > LC_NUMERIC=C;LC_TIME=English_Canada.1252
> > > >
> > > > attached base packages:
> > > > [1] "stats" "graphics"  "grDevices"
> "utils"
> > > > "datasets"  "methods"   "base"
> > > >
> > > > other attached packages:
> > > >  gdata   Hmisc
> > > > "2.3.1" "3.3-2"
> > > >
> > > >  (Yes I know I'm out of date but I don't like
> > > > upgrading just as I am finishing a project)
> > > >
> > > > Thanks
> > > >
> > > > __
> > > > R-help@stat.math.ethz.ch mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal,
> self-contained,
> > > reproducible code.
> > > >
> > >
> >
> >
> >
> >
> >
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] write.table: last line should differ from usual eol

2007-06-09 Thread Andreas Gegg
Dear R-Team,

I have a problem with writing an array to (for example) a .txt file.
Because the .txt file must be read by another program (OPL ILOG),
the output syntax must have a special form:

name_of_the_object = [  [1,2, ... ],
                        [1,...],
                        ... ];

I think it's easier to understand with a small example:

X<-array(1:4,c(2,2))

should be written as:
X = [[1,3],
  [2,4]];


I have (until now) used the following:

write("X=[[",file=filename)
write.table(X,file=filename,sep=",",eol="],\n [", row.names=FALSE,  
col.names=FALSE,append=TRUE)

which leads to the following output:
X=[[
1,3],
  [2,4],
  [

I hope you can help because it's very annoying to adjust the resulting  
.txt-file "by hand".

Thanks a lot for your help!
With nice greetings

Andreas Gegg,
mathematics student at the Catholic University of Eichstätt-Ingolstadt (Germany)

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How do you do an e-mail post that is within an ongoing thread?

2007-06-09 Thread Gavin Simpson
On Fri, 2007-06-08 at 20:39 -0500, Robert Wilkins wrote:
> That may sound like a stupid question, but if it confuses me, I'm sure
> it confuses others as well. I've tried to find that information on the
> R mail-group info pages, can't seem to find it. Is it something
> obvious?
> 
> To begin a brand new discussion, you do your post as an e-mail sent to
>  r-help@stat.math.ethz.ch .
> As I am doing right now.
> 
> How do I do an additional post that gets included in the
> "[R] Tools For Preparing Data For Analysis" thread, a thread which I
> started myself yesterday ( thanks for all the responses everybody )?

Just reply all (to the list and the sender of the email, plus any other
recipients in the CC list if appropriate) to the email you wish to
comment on. You can reply at any point in the thread and your email will
end up located in that position in the thread, i.e. underneath the
message you replied to in the thread.

The actual threading is dealt with by peoples own email software (and by
the software used to manage the archives), via some of the headers sent
along with your email, for example:

In-Reply-To: <[EMAIL PROTECTED]>
References: <[EMAIL PROTECTED]>

The long code there is the Message-Id header of the email that the reply
references.

Most emailers will hide all of these headers from you, but a good one
will allow you to look at all the headers or the actual source of the
email, where you will be able to see them, along with a lot of other
information about the message sent.

> 
> There's got to be a real easy answer to that, since everybody else does that.
> (I'm using gmail, does it make a difference what e-mail host you use?).

I've not used gmail much, but many people on the list do and they end up
in the correct place in the thread. Note that in Gmail to see the
headers for a message you can select "show original" from the little
drop down menu (down triangle) next to the reply button.

HTH

G
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson [t] +44 (0)20 7679 0522
ECRC  [f] +44 (0)20 7679 0565
UCL Department of Geography
Pearson Building  [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street
London, UK[w] http://www.ucl.ac.uk/~ucfagls/
WC1E 6BT  [w] http://www.freshwaters.org.uk/
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] "R is not a validated software package.."

2007-06-09 Thread Berwin A Turlach
G'day Marc,

On Fri, 08 Jun 2007 14:11:48 -0500
Marc Schwartz <[EMAIL PROTECTED]> wrote:

> On Fri, 2007-06-08 at 16:02 +0200, Giovanni Parrinello wrote:
> > Dear All,
> > discussing with a statistician of a pharmaceutical company I
> > received this answer about the statistical package that I have
> > planned to use:
> > 
> > As R is not a validated software package, we would like to ask if
> > it would rather be possible for you to use SAS, SPSS or another
> > approved statistical software system.
> > 
> > Could someone suggest me a 'polite' answer?
> > TIA
> > Giovanni
> > 
> 
> The polite answer is that there is no such thing as 'FDA approved'
> software for conducting clinical trials. The FDA does not approve,
> validate or otherwise endorse software.

I like this one. :)

My polite answer would have been: "Sure, can do.  If you pay for the
license of the finally agreed upon statistical software system and pay
for the time it takes me to learn it so that I can do the analysis
using that system instead of R."  Most clients I know would withdraw
such requests if they notice that it will probably double or triple
their bill. ;-)

Cheers,

Berwin

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.