Re: [R] kruskal-wallis, stratified

2010-04-13 Thread Heinz Tuechler

See the thread "stratified Wilcoxon available?" at

http://tolstoy.newcastle.edu.au/R/help/05/08/11143.html

Heinz

At 11:21 13.04.2010, Kay Cichini wrote:


hello everyone,

can anybody tell me if there is a kruskal-wallis, or another non-parametric
test, that can deal with multiple samples that are stratified?

thanks,
kay
--
View this message in context: 
http://n4.nabble.com/kruskal-wallis-stratified-tp1838210p1838210.html

Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




Re: [R] kruskal-wallis, stratified

2010-04-13 Thread Heinz Tuechler

Sorry for not being precise enough.

Here
http://tolstoy.newcastle.edu.au/R/help/05/08/11177.html

you should find the attachment

http://tolstoy.newcastle.edu.au/R/help/att-11177/KW.strat.2005.R

I used it, and it seems to work. In some cases some elements of 
weight may become Inf.


Heinz

At 12:29 13.04.2010, Kay Cichini wrote:


hello heinz,

i read the thread already. i think it applies only to 2-sample problems.

greetings,
kay


Re: [R] Can we get rid of bar charts with error bars?

2009-12-03 Thread Heinz Tuechler

Frank,

the example on http://biostat.mc.vanderbilt.edu/DynamitePlots is 
nice, and I agree with you. Just one minor question: would it be 
possible to mention, as "An article with nice dot plots", a paper 
which is freely available?


Heinz

At 14:56 03.12.2009, Frank E Harrell Jr wrote:
Bar charts with error bars are far inferior to dot charts and other 
types of displays.  One of many problems is demonstrated if you draw 
a bar chart displaying temperature in F then re-draw it on the 
degrees C scale.  See http://biostat.mc.vanderbilt.edu/DynamitePlots 
for much more information.  The error bars lull us into an 
assumption that symmetric confidence intervals are OK, among other things.


Frank

--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University



Re: [R] Building static HTML help pages in R 2.10.x on Windows

2009-12-27 Thread Heinz Tuechler

Instead of an answer, may I add a question:

c) can someone state that it is impossible to generate static HTML 
help pages under Windows?


At 21:40 22.12.2009, Steve Rowley wrote:

I upgraded to R2.10.1pat and discovered, along with everybody else,
that static HTML pages are no longer the default.  Fine; my tastes
would go the other way, but I'm happy to adapt.

However, I'd still like to build static HTML pages (for stable
bookmarking, use when R is not running, etc.).

I'm using the Windows installer, so the advice in the R Installation 
Admin guide (section 2.2, Help options) to use the configure option
--enable-prebuilt-html doesn't seem to apply.  I'm using
install.packages() rather than R CMD INSTALL, so I don't quite
understand how the --html arg to R CMD INSTALL can apply either.

So, can anybody point me to an example of either:

(a) how to build the static HTML help pages of all currently
installed packages under Windows,

or, failing that

(b) how to do this on Windows ab initio, from a clean install?

Thanks!
--
Steve Rowley s...@alum.mit.edu http://alum.mit.edu/www/sgr/



Re: [R] Applying function to parts of a matrix based on a factor

2010-01-13 Thread Heinz Tuechler

If your matrix were a data.frame, it could work like this:

df <- data.frame(age=1:100, sex=rep(1:2, 50))
with(df, by(age, sex, mean))

without the lapply, sapply etc. family.

h
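
A minimal sketch of the two suggestions in this thread (by() above, tapply() in the quoted reply), assuming the data are held in a data frame rather than a matrix. The object names df, means_by, means_tapply and means_df are illustrative only.

```r
# Illustrative data: ages 1..100 with alternating sex codes 1 and 2
df <- data.frame(age = 1:100, sex = rep(1:2, 50))

# by() returns an object of class "by", printed group by group
means_by <- with(df, by(age, sex, mean))

# tapply() returns a named array with one mean per level of sex
means_tapply <- with(df, tapply(age, sex, mean))

# aggregate() gives the same means as a data frame
means_df <- aggregate(age ~ sex, data = df, FUN = mean)
```

All three give the same group means; which one to use is mostly a matter of the output shape wanted.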

At 18:16 13.01.2010, Doran, Harold wrote:

with(yourdataframe, tapply(age,sex,mean))

-----Original Message-----
From: r-help-boun...@r-project.org 
[mailto:r-help-boun...@r-project.org] On Behalf Of John Sorkin

Sent: Wednesday, January 13, 2010 12:11 PM
To: r-help@r-project.org
Subject: [R] Applying function to parts of a matrix based on a factor

R 2.9
Windows XP

I have a matrix, Data, which contains a factor Sex and a continuous 
variable Age.

I want to get mean age by sex. I know I can do this with two statements,
mean(Data[Data[,"Sex"]=="Male","Age"]) and
mean(Data[Data[,"Sex"]=="Female","Age"])

I know this can be done in a single command, but I can't remember how. 
There is a function that allows another function to work within 
factors, something like
magicfunction(Data,Factor=Sex). n.b. I know the function I am 
looking for is not in the lapply, sapply etc. family


Please put me out of my misery (and senior moment) and remind me 
what function I should be using.





John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)



Re: [R] merging issue.........

2010-01-13 Thread Heinz Tuechler

Did you consider looking at the help page for merge?
h
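
As a sketch of what the help page for merge leads to: the all.x argument keeps every row of the first data frame and fills unmatched columns with NA. The data below are the values from the question as far as they can be read (the archive has run the columns together, so the exact split into id and trait is an assumption), and the names file1, file2 and merged are illustrative.

```r
file1 <- data.frame(id = c(1, 2, 3, 6, 7, 10, 11),
                    trait1 = c(10.2, 11.1, 9.7, 10.2, 8.9, 9.7, 10.2))
file2 <- data.frame(id = c(1, 2, 4, 5, 6, 12, 13),
                    trait2 = c(9.8, 10.8, 7.8, 9.8, 10.1, 10.2, 10.1))

# all.x = TRUE keeps all ids of file1; ids found only in file2 are dropped,
# and trait2 is NA where file2 has no matching id
merged <- merge(file1, file2, by = "id", all.x = TRUE)
merged
```

This corresponds to the SAS "if in1;" idiom in the question.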

At 22:01 13.01.2010, karena wrote:


hi, I have a question about merging two files.
For example, I have two files, the first file is like the following:

id   trait1
1    10.2
2    11.1
3    9.7
6    10.2
7    8.9
10   9.7
11   10.2

The second file is like the following:
id   trait2
1    9.8
2    10.8
4    7.8
5    9.8
6    10.1
12   10.2
13   10.1

now I want to merge the two files by the variable id, I only want to keep
the ids which show up in the first file. Even if an id does not show up in
the second file, it doesn't matter, I can keep the missing values. So my
question is: how can I merge the two files and keep only the rows whose id
shows up in the first file?
I know how to do it in SAS, just use the following code:
merge data1(in=in1) data2(in=in2);
by id;
if in1;

but I really have no idea about how to do it in R.

thank you in advance,

karean


Re: [R] BMDP and SAS (was R in clinical trials)

2010-02-22 Thread Heinz Tuechler
Once I suggested to BMDP to introduce a module-statement that would 
direct the syntax to the specified module (1L, 2L, ...), so that all 
syntax could reside in one job, but they did not like that idea.


Heinz

At 14:55 19.02.2010, Terry Therneau wrote:

  I used both BMDP and SAS in my earlier years, side by side.  At that
time the BMDP statistical methods were much more mature and
comprehensive: we treated them as the standard when the two packages
disagreed.  (It was a BMDP manual that clearly explained to me what the
hypothesis of Yates' weighted mean test is, something SAS decided to
call type III and eternally obfuscate by defining it in terms of a
computational algorithm).
  The BMDP programs had reasonable facilities for data manipulation ---
not as strong as SAS but reasonable.  However each analysis program was
a separate run, so you had to cut and paste your block of setup code
onto the front of each program's instructions.  Cut and paste with a
keypunch machine is not quite as simple as with a mouse; if you needed a
listing, some frequencies, 2-3 regressions, ... it got rather tedious.

  Terry Therneau



Re: [R] Output mean/median survival time from survfit

2008-02-05 Thread Heinz Tuechler
Maybe this thread is of use for you:
"How to access results of survival analysis", Xiaochun Li (06 May 2006)
http://tolstoy.newcastle.edu.au/R/help/06/05/26713.html

Heinz

At 21:28 04.02.2008, Xing Yuan wrote:
Hi all,

Does anybody know how to output the mean/median survival time from survfit?
Thank you very much!!!

Joe



Re: [R] Problem with cut

2008-02-22 Thread Heinz Tuechler
At 15:22 22.02.2008, Henrique Dallazuanna wrote:
Try this:

grep("330", levels(cc), value=TRUE)

Could you please explain in a little more detail 
how this answers the original question?
  "I would have expected 330 to fall into the (313,330] category.
  Can you please advise what I am doing wrong?"

Thank you

Heinz
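
A sketch of what seems to be going on in the original question, assuming the default behaviour of cut() with a single number of intervals: the breaks are equally spaced over the range of the data, and the interval labels are rounded to 3 significant digits (dig.lab = 3), so a break slightly below 330 is printed as 330. The names cc5 and brks are illustrative.

```r
vv <- seq(150, 346, by = 4)
cc <- cut(vv, 12)

# The label shows the break rounded to 3 significant digits
cc[vv == 330]

# Asking for more digits in the labels reveals the actual break points
cc5 <- cut(vv, 12, dig.lab = 5)
cc5[vv == 330]

# Equally spaced breaks over the range of vv (cut() additionally widens the
# two outermost breaks a little); the one just below 330 is about 329.67
brks <- seq(min(vv), max(vv), length.out = 13)
brks[12]
```

So 330 lies just above the break that is labelled 330, which is why it ends up in the last interval.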


On 22/02/2008, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
  Hi All,
 
   I might have misunderstood how cut works, but the following behaviour surprises
   me.

   vv <- seq(150, 346, by = 4)
   cc <- cut(vv, 12)
   cc[vv == 330]
   Result: [1] (330,346]

   I would have expected 330 to fall into the (313,330] category.

   Can you please advise what I am doing wrong?
 
   Many Thanks,
   Jussi Lehto
 


--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O



Re: [R] correlation between categorical data

2009-06-21 Thread Heinz Tuechler

At 07:40 21.06.2009, J Dougherty wrote:

[...]
There are other ways of regarding the FET.  Since it is precisely what it says
- an exact test - you can argue that you should avoid carrying over any
conclusions drawn about the small population the test was applied to and
employing them in a broader context.  In so far as the test is concerned, the
sample data and the contingency table it is arrayed in are the entire
universe.  In that sense, the FET can't be conservative or liberal.  It
isn't actually a hypothesis test and should not be thought of as one or used
in the place of one.

JDougherty


Could you give some reference supporting this view, which is surprising 
to me? I don't see a necessary connection between an exact test and 
the idea that it does not test a hypothesis.


Thanks,
Heinz



[R] survSplit with data.frame containing a Surv object

2009-07-13 Thread Heinz Tuechler

Dear All,

for years I have been struggling with Surv objects in data.frames. The 
following seems to have to do with it.
See below the modified example from the help page of survSplit. The 
original works as expected. If, however, a Surv object is added to 
the data.frame, each record gets doubled.

Is there some solution other than avoiding Surv objects in data.frames?

Thanks,
Heinz


require(survival)

## from the help page
aml3 <- survSplit(aml, cut=c(5,10,50), end="time", start="start",
                  event="status", episode="i")

summary(aml)
summary(aml3)

coxph(Surv(time,status)~x,data=aml)
## the same
coxph(Surv(start,time,status)~x,data=aml3)

## added to show doubling of records
aml.so <- aml
aml.so$surv.object <- with(aml, Surv(time, status))

aml3.so <- survSplit(aml.so, cut=c(5,10,50), end="time", start="start",
                     event="status", episode="i")
summary(aml3.so)

sessionInfo('survival')
R version 2.9.1 Patched (2009-07-07 r48910)
i386-pc-mingw32

locale:
LC_COLLATE=German_Switzerland.1252;LC_CTYPE=German_Switzerland.1252;LC_MONETARY=German_Switzerland.1252;LC_NUMERIC=C;LC_TIME=German_Switzerland.1252

attached base packages:
character(0)

other attached packages:
[1] survival_2.35-4

loaded via a namespace (and not attached):
[1] base_2.9.1  graphics_2.9.1  grDevices_2.9.1 methods_2.9.1
[5] splines_2.9.1   stats_2.9.1 utils_2.9.1



Re: [R] survSplit with data.frame containing a Surv object

2009-07-13 Thread Heinz Tuechler

At 20:18 13.07.2009, Charles C. Berry wrote:

On Mon, 13 Jul 2009, Heinz Tuechler wrote:


Dear All,

for years I have been struggling with Surv objects in data.frames. The 
following seems to have to do with it.
See below the modified example from the help page of survSplit. The 
original works as expected. If, however, a Surv object is added to 
the data.frame, each record gets doubled.

Is there some solution other than avoiding Surv objects in data.frames?


I think you can modify survSplit so that it will properly handle Surv objects.

Change this line:

newdata <- lapply(data, rep, ntimes + 1)

to this:

newdata <- lapply(data,
                  function(x) {
                      x <- as.matrix(x)
                      x[rep(1:nrow(x), ntimes + 1),]
                  })

or something similar that results in Surv objects being rep()'ed 
row-wise rather than element-wise and returned as objects of the right 
dimension (rather than as a vector).


Caveat: This works in the example you give, but I've not tested this 
extensively.
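
The difference between element-wise and row-wise repetition can be seen on a plain matrix. This is only a sketch: the matrix m stands in for a Surv object (how rep() treats real Surv objects may depend on the installed version of survival, so that part is an assumption), and the names m, flat and rows are illustrative.

```r
# Two-column matrix standing in for a Surv object (time, status)
m <- cbind(time = c(9, 13, 18), status = c(1, 1, 0))

# rep() treats the matrix as a plain vector: the result has 12 elements
# and no dim attribute, so the row structure is lost
flat <- rep(m, 2)

# Repeating row indices instead keeps one (time, status) pair per row
rows <- m[rep(seq_len(nrow(m)), 2), , drop = FALSE]
dim(rows)
```

A vector that is twice as long as expected would account for each record appearing doubled.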


HTH,

Chuck





Charles C. Berry(858) 534-2098
Dept of 
Family/Preventive Medicine

E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901


Thank you Chuck,

it seems to work also with my real data, but I noted that in the 
example aml$x, which is a factor, gets converted to character in 
aml3.so. Maybe, if I find the time, I should look at 
as.data.frame.matrix and rbind for Surv objects.
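
One possible way around the factor conversion, sketched under the assumption that only matrix-like columns need row-wise treatment: as.matrix() on a factor yields a character matrix, whereas plain indexing keeps the factor. The helper rep_rows is hypothetical and not part of the survival package; the matrix m again stands in for a Surv column.

```r
# Hypothetical helper: repeat a column by row index, treating only
# matrix-like columns specially so that factors keep their class
rep_rows <- function(x, idx) {
  if (is.matrix(x)) x[idx, , drop = FALSE] else x[idx]
}

cols <- list(
  x = factor(c("Maintained", "Nonmaintained", "Maintained")),
  m = cbind(time = c(9, 13, 18), status = c(1, 1, 0))
)

idx <- rep(seq_len(3), 2)
out <- lapply(cols, rep_rows, idx = idx)

is.factor(out$x)                 # the factor survives plain indexing
is.character(as.matrix(cols$x))  # as.matrix() turns it into character
```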


Thanks again,
Heinz



Re: [R] Two envelopes problem

2008-08-26 Thread Heinz Tuechler

Mark

My experience was similarly frustrating. Maybe formulating the 
problem a bit differently could help to clarify it.

State it like this:
Someone chooses an amount of money x. He puts 2x/3 of it in one 
envelope and x/3 in another. There is no assumption about the 
distribution of x.
If you choose one envelope your expectation is x/2; the envelope you hold 
is x/6 above or below that, so changing leads to a gain or a loss of x/3.
In my view there is no basis for a frequentist conditional 
expectation, conditional on the amount in the first envelope. Of 
course, after opening the first envelope and finding a, you know for 
sure that x can only be 3a or 3a/2, but to me there seems to be no 
basis to assign probabilities to these two alternatives.
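
The arithmetic of this formulation can be checked with a small sketch. The total x = 90 is an arbitrary illustrative value, and all object names are made up for the example.

```r
x <- 90                                    # illustrative total amount
envelopes <- c(2 * x / 3, x / 3)           # contents of the two envelopes
expectation <- mean(envelopes)             # x / 2 when picking at random
deviation <- abs(envelopes - expectation)  # each envelope is x / 6 from x / 2
swap_change <- abs(diff(envelopes))        # switching changes the holding by x / 3

# After opening an envelope and finding a, the total is either 3a or 3a/2
a <- envelopes[2]
candidates <- c(3 * a, 3 * a / 2)
```

Nothing in this calculation says which of the two candidates is more likely; that is the point made above.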

I am aware of the long-lasting discussion, and of course this will not end it.

Heinz



At 14:51 26.08.2008, Mark Leeds wrote:

Duncan: I think I see what you're saying but the strange thing is that if
you use the utility function log(x) rather than x, then the expected values
are equal. Somehow, if you are correct and I think you are, then taking the
log fixes the distribution of x, which is kind of odd to me. I'm sorry to
belabor this non-R-related discussion and I won't say anything more about it,
but I worked on and talked about this with someone for about a month a few
years ago and we gave up, so it's interesting for me to see this again.

   Mark
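
The remark about log utility can be reproduced with the numbers used earlier in the thread (10 observed, 5 or 20 in the other envelope). This sketch takes the naive 50/50 weighting as given, which is exactly the assumption under dispute; the object names are illustrative.

```r
a <- 10                     # amount observed in the first envelope
other <- c(a / 2, 2 * a)    # the two candidate contents of the other envelope

naive_swap <- mean(other)       # 12.5, which is larger than a
log_swap <- mean(log(other))    # the same as log(a)

all.equal(log_swap, log(a))
```

On the log scale the two candidates are symmetric around log(a), so the apparent gain from switching disappears.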

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
Behalf Of Duncan Murdoch
Sent: Tuesday, August 26, 2008 8:15 AM
To: Jim Lemon
Cc: r-help@r-project.org; Mario
Subject: Re: [R] Two envelopes problem

On 26/08/2008 7:54 AM, Jim Lemon wrote:
 Hi again,
 Oops, I meant the expected value of the swap is:

 5*0.5 + 20*0.5 = 12.5

 Too late, must get to bed.

But that is still wrong.  You want a conditional expectation,
conditional on the observed value (10 in this case).  The answer depends
on the distribution of the amount X, where the envelopes contain X and
2X.  For example, if you knew that X was at most 5, you would know you
had just observed 2X, and switching would be  a bad idea.
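
A sketch of such a conditional expectation, assuming purely for illustration that X is uniform on 1, ..., 5 and that either envelope is opened with probability 1/2. The helper expected_other is hypothetical.

```r
# Hypothetical helper: expected content of the other envelope, given that the
# opened one holds a, when the smaller amount X has a known discrete prior
expected_other <- function(a, support, prob) {
  prior <- function(v) {
    i <- match(v, support)
    if (is.na(i)) 0 else prob[i]
  }
  w_small <- prior(a)      # weight of "a is the smaller amount X"
  w_large <- prior(a / 2)  # weight of "a is the larger amount 2X"
  (w_small * 2 * a + w_large * a / 2) / (w_small + w_large)
}

support <- 1:5             # assumed prior: X uniform on 1, ..., 5
prob <- rep(1 / 5, 5)

expected_other(10, support, prob)  # 10 can only be 2X, so the other holds 5
expected_other(4, support, prob)   # 4 may be X or 2X: (8 + 2) / 2 = 5
expected_other(1, support, prob)   # 1 can only be X, so the other holds 2
```

With a proper prior the advice to switch depends on the observed amount, as described above.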

The paradox arises because people want to put a nonsensical Unif(0,
infinity) distribution on X.  The Wikipedia article points out that it
can also arise in cases where the distribution on X has infinite mean:
a mathematically valid but still nonsensical possibility.

Duncan Murdoch



Re: [R] Mode Vs Class

2008-04-09 Thread Heinz Tuechler
Congratulations, Bill, on this very clear and useful explanation.

Heinz

At 14:58 08.04.2008, [EMAIL PROTECTED] wrote:
'mode' is a mutually exclusive classification of objects according to
their basic structure.  The 'atomic' modes are numeric, complex,
character and logical.  Recursive objects have modes such as 'list' or
'function' or a few others.  An object has one and only one mode.

'class' is a property assigned to an object that determines how generic
functions operate with it.  It is not a mutually exclusive
classification.  If an object has no specific class assigned to it, such
as a simple numeric vector, its class is usually the same as its mode,
by convention.

Changing the mode of an object is often called 'coercion'.  The mode of
an object can change without necessarily changing the class.  e.g.

> x <- 1:16
> mode(x)
[1] "numeric"
> dim(x) <- c(4,4)
> mode(x)
[1] "numeric"
> class(x)
[1] "matrix"
> is.numeric(x)
[1] TRUE
> mode(x) <- "character"
> mode(x)
[1] "character"
> class(x)
[1] "matrix"

However:

> x <- factor(x)
> class(x)
[1] "factor"
> mode(x)
[1] "numeric"
 

At this stage, even though x has mode numeric again, its new class,
'factor', inhibits it being used in arithmetic operations.

In practice, mode is not used very much, other than to define a class
implicitly when no explicit class has been assigned.
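
The session above can be condensed into a short script. This is a sketch with illustrative object names (f, f_plus, codes); inherits() is used for the matrix check because the exact value of class() for a matrix may differ between R versions.

```r
x <- 1:16
dim(x) <- c(4, 4)
mode(x)                 # "numeric"
inherits(x, "matrix")   # TRUE

f <- factor(x)
mode(f)                 # "numeric": factors are stored as integer codes
class(f)                # "factor"

# Arithmetic is blocked by the class, not by the mode:
# this gives NAs together with a warning
f_plus <- suppressWarnings(f + 1)

# Dropping the class exposes the underlying integer codes again
codes <- unclass(f)
```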

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of Shubha Vishwanath Karanth
Sent: Tuesday, 8 April 2008 10:20 PM
To: [EMAIL PROTECTED]
Subject: [R] Mode Vs Class

Hi R,

Just came across the 'mode' of an object. What is the basic difference
between ?class and ?mode ... For example:

d <- data.frame(a = c(1,2), b = c(5,6))

class(d)

[1] "data.frame"

mode(d)

[1] "list"

But,

c <- c(2,3,5,6,7)

class(c)

[1] "numeric"

mode(c)

[1] "numeric"

Could anyone help me out...

Thanks,

shubha


Re: [R] Trend test for survival data

2008-04-22 Thread Heinz Tuechler
Dear Markus!

Since I did not see an answer yet, my suggestion is to use coxph with 
the groups variable numerically coded as the only independent variable.

Heinz
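
A sketch of this suggestion, assuming the lung data shipped with the survival package and treating ph.ecog (coded 0 to 3) as the ordered group variable purely for illustration. With a single numeric covariate the model has one coefficient, and its score test is commonly read as a test for trend across the ordered groups.

```r
library(survival)

# ph.ecog enters numerically, so the group effect is a single slope
fit <- coxph(Surv(time, status) ~ ph.ecog, data = lung)

# Score test on 1 degree of freedom, as reported by summary()
trend <- summary(fit)$sctest
trend
```

The group scores (here 0, 1, 2, 3) are part of the hypothesis; other spacings give a different test.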

At 13:39 21.04.2008, Markus Kreuz wrote:
Hello,
is there an R package that provides a log rank trend test
for survival data in >=3 treatment groups?
Or are there any comparable trend tests for survival data in R?

Thanks a lot
Markus

--
Dipl. Inf. Markus Kreuz
Universitaet Leipzig
Institut fuer medizinische Informatik, Statistik und Epidemiologie (IMISE)
Haertelstr. 16-18
D-04107 Leipzig

Tel. +49 341 97 16 276
Fax. +49 341 97 16 109
email: [EMAIL PROTECTED]



Re: [R] survival curves for time dependent covariates (was consultation)

2009-05-13 Thread Heinz Tuechler

At 14:50 12.05.2009, Terry Therneau wrote:

I'm writing to ask you how can I do Survival Curves using Time-dependent
covariates? Which packages do I need to Install?

  This is a very difficult problem statistically.  That is, there are not many
good ideas for what SHOULD be done.  Hence, there are no packages. Almost
everything you find in an applied paper (e.g. a medical journal) is wrong.

 Terry Therneau



Dear Terry,

in case it is not too much work for you, could you give some 
references to examples of wrong applications in applied medical 
papers?


Thanks,
Heinz



[R] Where to find a changelog for the survival package

2009-05-20 Thread Heinz Tuechler

Dear All,

for some days I have been trying to use version 2.35-4 of the survival 
package instead of version 2.31, which I had installed until now. Several 
changes in print.survfit, plot.survfit and seemingly in the structure 
of ratetables affect some of my syntax files.

Is there documentation of these changes somewhere, besides the code itself?

Thanks in advance,
Heinz



Re: [R] Where to find a changelog for the survival package

2009-05-20 Thread Heinz Tuechler
Thank you Richie. I had seen this before, but my impression is that 
it's not up to date. I gave a wrong version number in my previous 
post. I changed from 2.34-1 to 2.35-4. For example, the plot.survfit 
function lost its legend parameters, but I don't see this in the changelog.


Thanks again,
Heinz

At 14:53 20.05.2009, richard.cot...@hsl.gov.uk wrote:

 since some days I try to use the versions 2.35-4 of the survival
 package instead of versions 2.31, I had installed until now. Several
 changes in print.survfit, plot.survfit and seemingly in the structure
 of ratetabels effect some of my syntax files.
 Is there somewhere a documentation of these changes, besides the code
itself?

It's in the repository on R-Forge.  The latest version is here:
http://r-forge.r-project.org/plugins/scmsvn/viewcvs.php/pkg/survival/Changelog.09?rev=11234root=survivalview=markup

Regards,
Richie.

Mathematical Sciences Unit
HSL





Re: [R] Changelog for the survival package

2009-05-21 Thread Heinz Tuechler

Dear Terry,

first of all, thank you for your immense work. At the moment, I don't 
have a small reproducible example for the ratetable difficulty I 
have. I will work on it. Maybe the error message I get is of some 
information to you.


Error in match.ratetable(m[, rate], ratetable) :
  Data has a date type variable, but the reference ratetable is not a date for variable year


If I try str(survexp.ode), which is my ratetable, I get:
str(survexp.ode)
Error in `[.ratetable`(object, seq_len(iv.len)) : Invalid subscript
The same, however, is possible in version 2.34-1.

Try:
str(survexp.us)
Error in `[.ratetable`(object, seq_len(iv.len)) : Invalid subscript

But with unclass() it works
str(unclass(survexp.us))
 num [1:113, 1:2, 1:65] 1.58e-02 1.87e-03 3.01e-04 6.05e-05 1.52e-05 ...
 - attr(*, "dimnames")=List of 3
  ..$ : chr [1:113] "0-1d" "1-7d" "7-28d" "28-365d" ...
  ..$ : chr [1:2] "male" "female"
  ..$ : chr [1:65] "1940" "1941" "1942" "1943" ...
 - attr(*, "dimid")= chr [1:3] "age" "sex" "year"
 - attr(*, "type")= num [1:3] 2 1 4
 - attr(*, "cutpoints")=List of 3
  ..$ : num [1:113] 0 1 7 28 365 ...
  ..$ : NULL
  ..$ : int [1:65] -7305 -6939 -6574 -6209 -5844 -5478 -5113 -4748 -4383 -4017 ...
 - attr(*, "summary")=function (R)


Concerning the legend, I fully agree with you. It's just that I have 
several syntax files, where I made use of the legend parameters and 
so I noted the change. For these files I rebuilt your old plot.survfit().


Further I appreciate your new function survmean(). At the moment it 
seems to be intended as internal, and not documented in the help. 
Still, I use it to get the old form of the output and to get the 
output as an object. I think, with only right censored data, n.max 
and n.start are not informative.


To underline: I appreciate your changes; it is only a little difficult 
to recognize them correctly by trial and error.


Thanks,
Heinz

At 18:57 21.05.2009, Terry Therneau wrote:
  Several changes in print.survfit, plot.survfit and seemingly in the structure
  of ratetables affect some of my syntax files.
  Is there somewhere a documentation of these changes, besides the code itself?

 I agree, the Changelog.09 file is not as comprehensive as one would like.
Specific comments:

 1. The ratetables were recently changed to accommodate a new option.  I thought
that I had made them completely backwards compatible with the old -- please let
me know specifics if I overlooked something.
  The routines that make use of the rate tables can now use multiple date types,
but they still support the older 'date' class.

  2. My local code and the R code had gotten badly out of sync; I spent a
substantial fraction of my evenings re-merging them for over a year.  2/3 of the
changes were disjoint improvements in the two trees; these were easy to merge.
The hardest were survfit and its print/plot methods and some summary methods,
where both of us had worked towards the same goal but in not quite the same way.
  I had made 3x as many updates to survfit as the R tree, so used my (Mayo) code
as the base; almost all the others stayed closer to the R side.
  Feel free to ask me direct questions about any feature or change.  I can't
necessarily promise fast resolution, but will try.

  3. I don't understand putting legend or title options into a plot method,
since a separate call after the plot is so much more flexible.  They got pushed
to the bottom of my change list, and then completely forgotten.

  4. In the last few weeks issues with anova.coxph, and
predict.coxph/factors/newdata were raised.  The fixes were added to Rforge last
night, and include 2 new test cases to avoid future mishaps.

   Terry T.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] survfit, summary, and survmean (was Changelog for survival package)

2009-05-22 Thread Heinz Tuechler

Dear Terry,

sorry that I did not see this change, and thank you for it. It is very useful.

Heinz

At 15:28 22.05.2009, Terry Therneau wrote:


 Further I appreciate your new function survmean(). At the moment it
 seems to be intended as internal, and not documented in the help.

The computations done by print.survfit are now a part of the results
returned by summary.survfit.  See 'table' in the output list of
?summary.survfit.  Both call an internal survmean() function to ensure that
any future updates stay in synchrony.

This was a perennial (and justified) complaint with print.survfit.  Per the
standard, print(x) always returns x, so there was no way to get the results
of the print as an S object.
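
As a rough illustration of the point above, the printed summary numbers can be
pulled out as an ordinary object. This is only a sketch: it assumes a version
of the survival package in which summary.survfit returns the 'table' component
described above, and the aml data set and the object names fit and tab are
illustrative choices, not part of the original discussion.

```r
library(survival)

# Kaplan-Meier curves for the two arms of the aml example data
fit <- survfit(Surv(time, status) ~ x, data = aml)

# The per-stratum numbers shown by print(fit), available as a matrix
tab <- summary(fit)$table
tab
```

With a single curve (no strata) the same component comes back as a named
vector rather than a matrix.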

  Terry





Re: [R] Different results in calculating SD of 2 numbers

2008-01-16 Thread Heinz Tuechler
At 11:32 16.01.2008, Jim Lemon wrote:
(Ted Harding) wrote:
  On 16-Jan-08 08:45:04, Martin Maechler wrote:
 
 RM == Ron Michael [EMAIL PROTECTED]
 on Wed, 16 Jan 2008 00:14:56 -0800 (PST) writes:
 
 RM Hi all,
 RM Can anyone tell me why I am getting different results in
 calculating SD of 2 numbers ?
 
  (1.25-0.95)/2
 RM [1] 0.15
  sd(c(1.25, 0.95))
 RM [1] 0.2121320  # why it is different from 0.15?
 
 because  1 is different from 2 !
 If 2 was 1, then sqrt(2) == 1 as well, but actually I don't
 think the universe and we all would exist in that case 
 Martin Maechler, ETH
 
 
  Of course we would!! -- Since FALSE implies X is TRUE for any X.
 
  But FALSE would also imply that X is FALSE, so you are entitled
  to your view as well, Martin.
 
Then again, as pi might have been equal to 1 prior to the Big Bang, I
see no reason why sqrt(2) shouldn't have been equal to 1 as well. After
all, in those days we were all one...

Jim


Of course the question is off topic, but I like it. In my 
understanding mathematics is a theoretical model, that may or may not 
describe properly certain aspects of a reality. I cannot see, why a 
theoretical model should have any influence on our existence, as long 
as we don't apply it in an unreasonable way.
To believe in our existence or to prove it is a totally different case.

Heinz
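
For readers who come to this thread for the numerical answer rather than the
metaphysics: sd() divides the sum of squared deviations by n - 1, not by n, so
for two numbers it returns |a - b| / sqrt(2) rather than |a - b| / 2. A small
check (the variable names are only for illustration):

```r
a <- 1.25
b <- 0.95

half_range <- abs(a - b) / 2        # what the original poster computed
sd_formula <- abs(a - b) / sqrt(2)  # sample SD of two values (n - 1 = 1)

c(half_range = half_range, sd_formula = sd_formula, sd = sd(c(a, b)))
all.equal(sd(c(a, b)), sd_formula)
```

This is the sense of the remark that 1 is different from 2: the divisor is
n - 1 = 1, so a factor sqrt(2) appears where the poster expected 2.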



[R] How to keep attributes when dropping factor levels?

2009-03-19 Thread Heinz Tuechler

Dear All,

to drop unused factor levels two ways are outlined in R-help. In both 
cases a label attribute is lost.

The same happens, when using car:::recode.
Is there a simple way to avoid losing attributes?

Thanks,

Heinz

## example
ff <- factor(substring("statistics", 1:10, 1:10), levels=letters)
attributes(ff)$label <- 'test label'
attributes(ff)$label
gg <- ff[, drop=TRUE]
attributes(gg)$label
hh <- factor(ff)
attributes(hh)$label
ii <- car:::recode(ff, "'t'='s'")
attributes(ii)$label
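
One possible workaround, sketched here under the assumption that only the
label attribute needs to survive: save it before dropping the levels and put
it back afterwards. The helper name drop_levels_keep_label is hypothetical and
not part of base R or any package mentioned in this thread.

```r
# Drop unused levels but carry the "label" attribute across
drop_levels_keep_label <- function(f) {
  lab <- attr(f, "label")   # remember the label (NULL if absent)
  g <- factor(f)            # rebuilding the factor drops unused levels
  attr(g, "label") <- lab   # restore the label
  g
}

ff <- factor(substring("statistics", 1:10, 1:10), levels = letters)
attr(ff, "label") <- "test label"

gg <- drop_levels_keep_label(ff)
attr(gg, "label")
nlevels(gg)
```

The same pattern extends to other attributes by saving and restoring them by
name; copying attributes(f) wholesale would also restore the old levels, which
defeats the purpose.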

> version
               _
platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status         Patched
major          2
minor          8.1
year           2009
month          03
day            13
svn rev        48132
language       R
version.string R version 2.8.1 Patched (2009-03-13 r48132)
> sessionInfo()
R version 2.8.1 Patched (2009-03-13 r48132)
i386-pc-mingw32

locale:
LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=German_Austria.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] car_1.2-12



[R] factor, as.factor and levels

2009-04-08 Thread Heinz Tuechler

Dear All,

to my surprise as.factor does not accept a levels argument. Maybe I 
did not read the documentation well enough. See the example below. I 
wanted to use ch1 as factor in the newdata argument of survfit, so I 
assumed that I could write as.factor(ch1, levels=ch1), since the 
order should be kept.


But as.factor(ch1, levels=ch1) results in the error:

Error in as.factor(ch1, levels = ch1) :
  unused argument(s) (levels = c("low", "inter", "high"))

factor(ch1, levels=ch1) works as I expected.
Is it intended that as.factor does not use the levels argument?

Thanks,

Heinz

ch1 <- c('low', 'inter', 'high')
factor(ch1)
factor(ch1, levels=ch1)
as.factor(ch1, levels=ch1)

> version
               _
platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status         Patched
major          2
minor          8.1
year           2009
month          03
day            13
svn rev        48132
language       R
version.string R version 2.8.1 Patched (2009-03-13 r48132)
> sessionInfo()
R version 2.8.1 Patched (2009-03-13 r48132)
i386-pc-mingw32

locale:
LC_COLLATE=German_Switzerland.1252;LC_CTYPE=German_Switzerland.1252;LC_MONETARY=German_Switzerland.1252;LC_NUMERIC=C;LC_TIME=German_Switzerland.1252

attached base packages:
[1] splines   stats graphics  grDevices utils datasets  methods
[8] base

other attached packages:
[1] survival_2.34-1 car_1.2-12  gmodels_2.14.1  gdata_2.4.2
[5] Hmisc_3.5-2

loaded via a namespace (and not attached):
[1] cluster_1.11.12 grid_2.8.1  gtools_2.5.0-1  lattice_0.17-20
[5] MASS_7.2-46




Re: [R] factor, as.factor and levels

2009-04-08 Thread Heinz Tuechler

Thank you, Jim. I see, the fact that in the documentation you find only
as.factor(x) means that it does not accept more arguments.
Does as.factor have speed advantages over factor, or is there a
different reason for its existence?


Heinz



At 13:50 08.04.2009, jim holtman wrote:

as.factor does not accept levels as an argument.  use the first form
that you have

factor(ch1, levels=ch1)







--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?




Re: [R] factor, as.factor and levels

2009-04-08 Thread Heinz Tuechler
Jim - you are right, I should have looked before. So there is a
difference that should also affect the dropping of unused levels.


Thanks,

Heinz

At 15:31 08.04.2009, jim holtman wrote:

It is just a simple version of 'factor'.  The only speed advantage it
might have is that it checks to see if it is a factor first.  Here is
the definition:

> as.factor
function (x)
if (is.factor(x)) x else factor(x)
<environment: namespace:base>

You can always list out what the function does to get a better
understanding of how it works.
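
The consequence for unused levels can be seen in a short sketch (the factor f
below is made up for illustration): as.factor() hands an existing factor back
untouched, while factor() rebuilds it from the values actually present.

```r
# A factor with one declared but unused level ("inter")
f <- factor(c("low", "high"), levels = c("low", "inter", "high"))

levels(as.factor(f))  # already a factor, so returned unchanged
levels(factor(f))     # rebuilt, so the unused level is dropped
```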





Re: [R] (senza oggetto)

2009-04-16 Thread Heinz Tuechler

At 11:10 16.04.2009, giuseppef...@libero.it wrote:

Dear all, I have a database x,y,value imported in R with read.table:

dati <- read.table("dati.dat")

value is a categorical variable (land use) and I want to plot the same
land use in the same colour. Is this possible with R? Thanks a lot



Maybe, something like:

set.seed(726)
x <- runif(10)
y <- runif(10)
value <- sample(c('agric', 'urban', 'traffic'), 10, replace=TRUE)
plot(x, y, col=as.numeric(as.factor(value)),
     pch=as.numeric(as.factor(value)))

Heinz



[R] Umlaut read from csv-file

2008-11-06 Thread Heinz Tuechler

Dear All!

Reading character strings containing an umlaut from a csv-file, I find a
(to me) surprising behaviour in R 2.8.0 that I did not notice in R 2.7.2.

A comparison by == results in FALSE, while grep does find the agreement.
See the example below.
The crucial line is x=="div 1-2 Veränderungen", with the result
[1] FALSE in R 2.8.0 but [1] TRUE in R 2.7.2.

Thank you in advance for your help

Heinz Tüchler

# in R 2.8.0 patched

x0 <- "div 1-2 Veränderungen" # define a character string

write.csv(x0, 'chr.csv', row.names=FALSE) # write a csv-file with one line
rm(x0)

x <- read.csv('chr.csv', skip=0, header=TRUE, as.is=TRUE)$x # read in csv-file
x
x=="div 1-2 Veränderungen"
 [1] FALSE
grep("div 1-2 Veränderungen", x)
 [1] 1
grep("div 1-2 Veränderungen", x, value=TRUE)
 [1] "div 1-2 Veränderungen"

unlink('chr.csv') # delete file

Version:
 platform = i386-pc-mingw32
 arch = i386
 os = mingw32
 system = i386, mingw32
 status = Patched
 major = 2
 minor = 8.0
 year = 2008
 month = 11
 day = 04
 svn rev = 46830
 language = R
 version.string = R version 2.8.0 Patched (2008-11-04 r46830)

Windows XP (build 2600) Service Pack 2

Locale:
LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=German_Austria.1252

Search Path:
 .GlobalEnv, package:stats, package:graphics, 
package:grDevices, package:utils, 
package:datasets, package:methods, Autoloads, package:base



# in R 2.7.2 patched


x0 <- "div 1-2 Veränderungen" # define a character string

write.csv(x0, 'chr.csv', row.names=FALSE) # write a csv-file with one line
rm(x0)

x <- read.csv('chr.csv', skip=0, header=TRUE, as.is=TRUE)$x # read in csv-file
x
x=="div 1-2 Veränderungen"
 [1] TRUE
grep("div 1-2 Veränderungen", x)
 [1] 1
grep("div 1-2 Veränderungen", x, value=TRUE)
 [1] "div 1-2 Veränderungen"

unlink('chr.csv') # delete file

Version:
 platform = i386-pc-mingw32
 arch = i386
 os = mingw32
 system = i386, mingw32
 status = Patched
 major = 2
 minor = 7.2
 year = 2008
 month = 09
 day = 02
 svn rev = 46486
 language = R
 version.string = R version 2.7.2 Patched (2008-09-02 r46486)

Windows XP (build 2600) Service Pack 2

Locale:
LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=German_Austria.1252

Search Path:
 .GlobalEnv, package:stats, package:graphics, 
package:grDevices, package:utils, 
package:datasets, package:methods, Autoloads, package:base




Re: [R] Umlaut read from csv-file

2008-11-06 Thread Heinz Tuechler

Dear Prof.Ripley!

Thank you very much for your attention. In the given example, Encoding() or
the encoding parameter of read.csv solves the problem. I hope your patch will
also solve the problem when I read an SPSS file by spss.get(), since this
function has no encoding parameter and my real problem originated there.


many thanks

Heinz Tüchler

At 23:51 06.11.2008, you wrote:
Look at Encoding() on your two strings.  The results are different, and this
seems to be the root of the problem.  Adding encoding="latin1" to the
read.csv call is a workaround.
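
A small sketch of what "look at Encoding()" means in practice. It builds the
same text twice with different declared encodings instead of reading a file,
so that it does not depend on the locale; the variable names are illustrative,
and the final comparison assumes a current R in which the problem discussed in
this thread has been fixed.

```r
# The same text, declared in two different encodings
x_latin1 <- "Ver\xe4nderungen"      # byte 0xE4 is a-umlaut in Latin-1
Encoding(x_latin1) <- "latin1"      # declare how the bytes should be read
x_utf8 <- enc2utf8(x_latin1)        # re-encode the same text as UTF-8

Encoding(c(x_latin1, x_utf8))       # the declared encodings differ
x_latin1 == x_utf8                  # the comparison looks at the text itself
```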


It looks like there is a problem in the use of the CHARSXP cache: if I save
the session then x0 == x becomes true when I reload it, even though the
encodings remain different.


I've found the immediate cause and will change this in R-patched shortly.



--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595




[R] Encoding() and strsplit()

2008-11-06 Thread Heinz Tuechler

Dear All,

Encoding() goes beyond my understanding. See the example. I would expect
from reading the help for Encoding() that strsplit preserves the encoding
for each resulting element, but for simple letters it gets lost.
Also it seems that an Encoding() cannot be declared for simple letters.
They remain unknown in any case. In paste() latin1 seems to dominate unknown.
What kind of characteristic of an object is the encoding? It does not show
up as an attribute, and str() does not give me any hint either.

Where can I find some explanation regarding encoding?

Thanks

Heinz

###   Encoding() and strsplit
u <- 'abcäöü'
Encoding(u)
[1] "latin1"
Encoding(u) <- 'latin1' # to be sure about encoding
us <- strsplit(u, '')[[1]] # split in single strings
Encoding(us)
[1] "unknown" "unknown" "unknown" "latin1"  "latin1"  "latin1"
Encoding(us) <- rep('latin1', length(us))
Encoding(us)
[1] "unknown" "unknown" "unknown" "latin1"  "latin1"  "latin1"
pus <- paste(us[1], us[5], sep='')
Encoding(pus)
[1] "latin1"

Version:
 platform = i386-pc-mingw32
 arch = i386
 os = mingw32
 system = i386, mingw32
 status = Patched
 major = 2
 minor = 8.0
 year = 2008
 month = 11
 day = 04
 svn rev = 46830
 language = R
 version.string = R version 2.8.0 Patched (2008-11-04 r46830)

Windows XP (build 2600) Service Pack 2

Locale:
LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=German_Austria.1252

Search Path:
 .GlobalEnv, package:stats, package:graphics, 
package:grDevices, package:utils, 
package:datasets, package:methods, Autoloads, package:base




Re: [R] Encoding() and strsplit()

2008-11-07 Thread Heinz Tuechler

At 09:15 07.11.2008, Prof Brian Ripley wrote:

See the 'R Internals' manual.


Thank you, now I understand a little more.
My real problem, however, is a data frame produced by spss.get(). Is there a
simple way to mark all character strings in that data.frame (except ASCII
ones), including levels of factors, as latin1?


Heinz Tüchler



ASCII characters are not marked as Latin-1 nor UTF-8.
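
That statement can be checked directly. In this sketch (names a and b are
illustrative) the declaration is accepted for a string containing a non-ASCII
byte, but has no effect on a pure ASCII string, which matches the output of
the strsplit example:

```r
a <- "abc"
Encoding(a) <- "latin1"   # no effect: pure ASCII strings are never marked
Encoding(a)

b <- "\xe4"               # a-umlaut as a single Latin-1 byte
Encoding(b) <- "latin1"   # non-ASCII content, so the mark is kept
Encoding(b)
```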



Re: [R] Umlaut read from csv-file

2008-11-07 Thread Heinz Tuechler

At 13:34 07.11.2008, Peter Dalgaard wrote:

Heinz Tuechler wrote:
 Dear Prof.Ripley!

 Thank you very much for your attention. In the given example Encoding(),
 or the encoding parameter of read.csv solve the problem. I hope your
 patch will solve also the problem, when I read a spss file by
 spss.get(), since this function has no encoding parameter and my real
 problem originated there.

read.spss() (package foreign) does have a reencode argument, though; and
 this is called by spss.get(), so it looks like an easy hack to add it
there.


Thank you, that means I have to change spss.get to make it accept the
reencode argument and pass it to read.spss. At the moment I prefer to step
back to R 2.7.2 and to wait for a more general solution, because to me there
still seem to be strange effects of encoding.

In the following example the encoding gets lost by dumping and rereading,
even if I use the encoding parameter of source(). But maybe I don't
understand what this parameter should do.


Heinz Tüchler


us <- c("a", "b", "c", "ä", "ö", "ü")
Encoding(us)
[1] "unknown" "unknown" "unknown" "latin1"  "latin1"  "latin1"
dump('us', 'us_dump.txt')
rm(us)
source('us_dump.txt', encoding='latin1')
us
[1] "a" "b" "c" "ä" "ö" "ü"
Encoding(us)
[1] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"
unlink('us_dump.txt')






--
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907




Re: [R] Mismatch in logical result?

2008-11-07 Thread Heinz Tuechler

Maybe this?
http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f
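
In practice the advice in that FAQ entry comes down to comparing with a
tolerance instead of using ==. A minimal sketch using the numbers from the
question below (the tolerance 1e-9 is an arbitrary illustrative choice):

```r
x <- 1.4^2
print(x, digits = 17)        # show more digits of the value actually stored

isTRUE(all.equal(x, 1.96))   # comparison with a small default tolerance
abs(1.2^3 - 1.728) < 1e-9    # or with an explicit tolerance of your choice
```

all.equal() returns a character description rather than FALSE when the values
differ, which is why it is wrapped in isTRUE().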

At 11:23 07.11.2008, Shubha Vishwanath Karanth wrote:




Hi R,



I have certain checks which give FALSE but should actually be TRUE. Why does
this happen? Note that the equations that I am checking below are not even
cases of recurring decimals...




> 1.4^2 == 1.96
[1] FALSE

> 1.2^3==1.728
[1] FALSE





Thanks in advance, Shubha

Shubha Karanth | Amba Research

Ph +91 80 3980 8031 | Mob +91 94 4886 4510

Bangalore * Colombo * London * New York * San 
José * Singapore * www.ambaresearch.com






Re: [R] Umlaut read from csv-file

2008-11-07 Thread Heinz Tuechler

At 16:52 07.11.2008, Prof Brian Ripley wrote:

On Fri, 7 Nov 2008, Peter Dalgaard wrote:


Heinz Tuechler wrote:

Dear Prof.Ripley!

Thank you very much for your attention. In the given example Encoding(),
or the encoding parameter of read.csv solve the problem. I hope your
patch will solve also the problem, when I read a spss file by
spss.get(), since this function has no encoding parameter and my real
problem originated there.


read.spss() (package foreign) does have a reencode argument, though; and
this is called by spss.get(), so it looks like an easy hack to add it
there.


Yes, older software like spss.get needs to get updated for the
internationalization age.  Modifying it to have a ... argument passed to
read.spss would be a good idea (and future-proofing).

In cases like this it is likely that the SPSS file does contain its encoding
(although sometimes it does not and occasionally it is wrong), so it is
helpful to make use of the info if it is there.  However, the default is
read.spss(reencode=NA) because the problems of assuming that the info is
correct when it is not are worse.


The reason I tried the example below was to solve the encoding problem by
dumping and then re-sourcing a data.frame with the encoding parameter set to
latin1. As you can see, source(x, encoding='latin1') does not have the effect
I expected. Unfortunately I do not have any idea what I misunderstood
regarding the meaning of encoding='latin1'.


Heinz Tüchler


us <- c("a", "b", "c", "ä", "ö", "ü")
Encoding(us)
[1] "unknown" "unknown" "unknown" "latin1"  "latin1"  "latin1"
dump('us', 'us_dump.txt')
rm(us)
source('us_dump.txt', encoding='latin1')
us
[1] "a" "b" "c" "ä" "ö" "ü"
Encoding(us)
[1] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"
unlink('us_dump.txt')







Re: [R] Umlaut read from csv-file

2008-11-08 Thread Heinz Tuechler

At 08:01 08.11.2008, Prof Brian Ripley wrote:

We have no idea what you understood (you didn't tell us), but the help says

encoding: character vector.  The encoding(s) to be assumed when 'file'
  is a character string: see 'file'.  A possible value is
  'unknown': see the ‘Details’.

...
 This paragraph applies if 'file' is a filename (rather than a
 connection).  If 'encoding = "unknown"', an attempt is made to
 guess the encoding.  The result of 'localeToCharset()' is used as
 a guide.  If 'encoding' has two or more elements, they are tried
 in turn until the file/URL can be read without error in the trial
 encoding.

So source(encoding="latin1") says the file is encoded in Latin-1 and should
be re-encoded if necessary (e.g. in a UTF-8 locale).

Setting the Encoding of parsed character strings is not mentioned.

You could have written out a data frame with write.csv() and re-read it with
read.csv(encoding = "latin1"): that was the workaround you were given earlier
(not to use source).


Thank you for this explanation. I felt that I did 
not understand the help page of source(), and I 
hoped encoding='latin1' would have the same 
effect as in read.csv(). On reflection, I see 
that this would conflict with the primary functionality of source().
Earlier I tried writing the data.frame with 
write.csv and re-reading it. This works, but 
additional information such as labels() has to be transferred in a second step.
The best way I can imagine would be a 
function that marks every character string in 
the whole structure of a data.frame, including all attributes, as latin1.
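
A function of the kind wished for here could be sketched roughly as follows. This is only an illustration under stated assumptions: mark_latin1() is a hypothetical helper (it is not part of base R or of any package named in this thread), and it assumes that every non-ASCII string in the object really consists of Latin-1 bytes. It marks character vectors, factor levels, names and character-valued attributes such as a 'label'.

```r
# Hypothetical helper (not in base R): declare strings as Latin-1 throughout
# a data.frame, list, factor or character vector.
mark_latin1 <- function(x) {
  if (is.character(x)) {
    Encoding(x) <- "latin1"             # pure ASCII strings stay "unknown"
  } else if (is.factor(x)) {
    Encoding(levels(x)) <- "latin1"
  } else if (is.list(x)) {              # data.frames are lists of columns
    for (i in seq_along(x)) x[[i]] <- mark_latin1(x[[i]])
  }
  if (!is.null(names(x))) Encoding(names(x)) <- "latin1"
  # Other character-valued attributes, e.g. a 'label' attribute
  extra <- setdiff(names(attributes(x)),
                   c("names", "row.names", "class", "levels"))
  for (nm in extra) {
    if (is.character(attr(x, nm))) Encoding(attr(x, nm)) <- "latin1"
  }
  x
}

# Usage: a string holding the single Latin-1 byte 0xE4, initially unmarked
s <- rawToChar(as.raw(0xE4))
df <- data.frame(txt = c("a", s), stringsAsFactors = FALSE)
attr(df$txt, "label") <- s
out <- mark_latin1(df)
Encoding(out$txt)
Encoding(attr(out$txt, "label"))
```

Marking only declares the encoding; it does not convert any bytes, so it is appropriate only when the data are known to be Latin-1.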



On Sat, 8 Nov 2008, Heinz Tuechler wrote:


At 16:52 07.11.2008, Prof Brian Ripley wrote:

On Fri, 7 Nov 2008, Peter Dalgaard wrote:


Heinz Tuechler wrote:

Dear Prof.Ripley!
Thank you very much for your attention. In the given example Encoding(),
or the encoding parameter of read.csv solve the problem. I hope your
patch will solve also the problem, when I read a spss file by
spss.get(), since this function has no encoding parameter and my real
problem originated there.

read.spss() (package foreign) does have a reencode argument, though; and
this is called by spss.get(), so it looks like an easy hack to add it
there.
Yes, older software like spss.get needs to get 
updated for the internationalization 
age.  Modifying it to have a ... argument 
passed to read.spss would be a good idea (and future-proofing).
In cases like this it is likely that the SPSS 
file does contain its encoding (although 
sometimes it does not and occasionally it is 
wrong), so it is helpful to make use of the 
info if it is there.  However, the default is 
read.spss(reencode=NA) because the problems 
of assuming that the info is correct when it is not are worse.


The reason I tried the example below was to 
solve the encoding problem by dumping and then 
re-sourcing a data.frame with the encoding 
parameter set to latin1. As you can see, 
source(x, encoding='latin1') does not have the 
effect I expected. Unfortunately, I have no 
idea what I misunderstood regarding the meaning of encoding='latin1'.


Heinz Tüchler


us <- c('a', 'b', 'c', 'ä', 'ö', 'ü')
Encoding(us)
[1] "unknown" "unknown" "unknown" "latin1"  "latin1"  "latin1"
dump('us', 'us_dump.txt')
rm(us)
source('us_dump.txt', encoding='latin1')
us
[1] "a" "b" "c" "ä" "ö" "ü"
Encoding(us)
[1] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"
unlink('us_dump.txt')







[R] attr.all.equal() and all.equal(attributes(), attributes())

2008-11-09 Thread Heinz Tuechler

Dear All!

If I try to compare the attributes of two 
objects, I find a surprising behaviour of 
attr.all.equal(). With identical attributes I 
receive the answer NULL. If the attributes 
differ, the answer is as expected and the differences are shown.
all.equal(attributes(), attributes()), by contrast, 
returns TRUE if the attributes are equal.


See example:

v <- 1:5
attr(v, 'testattribute') <- 'testattribute v'
v_c <- v
attributes(v)
$testattribute
[1] "testattribute v"

attributes(v_c)
$testattribute
[1] "testattribute v"

all.equal(v, v_c)
[1] TRUE
attr.all.equal(v, v_c)
NULL   <--------- here is what I did not expect
all.equal(attributes(v), attributes(v_c))
[1] TRUE

attr(v_c, 'testattribute') <- 'testattribute v_c'
attr.all.equal(v, v_c)
[1] "Attributes: < Component 1: 1 string mismatch >"
all.equal(attributes(v), attributes(v_c))
[1] "Component 1: 1 string mismatch"

Thanks for your attention

Heinz Tüchler

Version:
 platform = i386-pc-mingw32
 arch = i386
 os = mingw32
 system = i386, mingw32
 status = Patched
 major = 2
 minor = 8.0
 year = 2008
 month = 11
 day = 04
 svn rev = 46830
 language = R
 version.string = R version 2.8.0 Patched (2008-11-04 r46830)

Windows XP (build 2600) Service Pack 2

Locale:
LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=German_Austria.1252

Search Path:
 .GlobalEnv, package:stats, package:graphics, 
package:grDevices, package:utils, 
package:datasets, package:methods, Autoloads, package:base




Re: [R] Compare objects

2008-11-09 Thread Heinz Tuechler

At 13:26 09.11.2008, Leon Yee wrote:

Hi, friends

   Is there any functions for object comparing? For example, I have two
list objects, and I want to know whether they are the same. Since the
the components of list are not necessary atomic, this kind of comparison
should be recursive. Does this kind of function exist?
   Thank you for your help!

Leon


see maybe:
all.equal()
identical()

Heinz
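
To illustrate the difference between the two functions named above, here is a small sketch with nested lists (the objects a, b, d and e are made up for the example). Both functions compare recursively; identical() is exact, while all.equal() tolerates tiny numerical differences and returns a description of the differences rather than FALSE.

```r
a <- list(id = 1L, sub = list(x = c(1.5, 2.5), tag = "first"))
b <- list(id = 1L, sub = list(x = c(1.5, 2.5), tag = "first"))
identical(a, b)            # exact, recursive comparison

# A tiny numerical difference: identical() is strict, all.equal() tolerant
d <- b
d$sub$x[2] <- 2.5 + 1e-10
identical(a, d)
isTRUE(all.equal(a, d))

# A real difference: all.equal() returns a description, not FALSE
e <- b
e$sub$tag <- "second"
all.equal(a, e)
```

Because all.equal() never returns FALSE, it is usually wrapped in isTRUE() when used inside if().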




Re: [R] Umlaut read from csv-file

2008-11-09 Thread Heinz Tuechler

At 06:25 09.11.2008, Prof Brian Ripley wrote:

On Sat, 8 Nov 2008, Heinz Tuechler wrote:


At 08:01 08.11.2008, Prof Brian Ripley wrote:

We have no idea what you understood (you didn't tell us), but the help says
encoding: character vector.  The encoding(s) to be assumed when 'file'
  is a character string: see 'file'.  A possible value is
  'unknown': see the ‘Details’.
...
 This paragraph applies if 'file' is a filename (rather than a
 connection).  If 'encoding = "unknown"', an attempt is made to
 guess the encoding.  The result of 'localeToCharset()' is used as
 a guide.  If 'encoding' has two or more elements, they are tried
 in turn until the file/URL can be read without error in the trial
 encoding.
So source(encoding="latin1") says the file is 
encoded in Latin-1 and should be re-encoded if 
necessary (e.g. in a UTF-8 locale).

Setting the Encoding of parsed character strings is not mentioned.
You could have written out a data frame with 
write.csv() and re-read it with 
read.csv(encoding = "latin1"): that was the 
workaround you were given earlier (not to use source).


Thank you for this explanation. I felt that I 
did not understand the help page of source(), 
and I hoped encoding='latin1' would have the 
same effect as in read.csv(). On reflection, 
I see that this would conflict with the primary functionality of source().
Earlier I tried writing the data.frame with 
write.csv and re-reading it. This works, but 
additional information such as labels() has to be transferred in a second step.
The best way I can imagine would be a 
function that marks every character string in 
the whole structure of a data.frame, including all attributes, as latin1.


I think it is possible that

con <- file("foo")
source(con, encoding="latin1")
close(foo)

will also do what you want, although that's an undocumented side effect.


You are right. It does work in my real data problem. Thank you.

(minor remark: I think close(foo) should be close(con))


But all of this should be unnecessary in 
R-patched (although it is possible that there 
are other quirks with unmarked strings lurking 
in the shadows, there are no other obvious changes from 2.7.2).






Re: [R] attr.all.equal() and all.equal(attributes(), attributes())

2008-11-09 Thread Heinz Tuechler

At 14:24 09.11.2008, Peter Dalgaard wrote:

Heinz Tuechler wrote:

Dear All!
If I try to compare the attributes of two 
objects, I find a surprising behaviour of 
attr.all.equal(). With identical attributes I 
receive the answer NULL. If the attributes 
differ, the answer is as expected and the differences are shown.
all.equal(attributes(), attributes()), by contrast, 
returns TRUE if the attributes are equal.


That _is_ as documented,  although if you want 
to quibble, it should probably also have been mentioned in the Value section.


Sorry, I admit that I read only the value section.
Heinz




I don't know if there's a rationale for it. The actual code goes

msg <- NULL
if () msg <- c(msg, "some text")
if (.
msg

so it returns NULL if none of the ifs are taken, 
but it could easily be changed to return


if (is.null(msg)) TRUE else msg

(easily depending, of course, on how much code 
actually depends on the current behaviour...)
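
Until such a change is made, the NULL return value can be handled in user code. The following is a minimal sketch; same_attributes() is a hypothetical wrapper name, not an existing function.

```r
# Hypothetical wrapper: TRUE if the attributes match, FALSE otherwise
same_attributes <- function(target, current) {
  is.null(attr.all.equal(target, current))
}

v <- 1:5
attr(v, "testattribute") <- "testattribute v"
v_c <- v
same_attributes(v, v_c)

attr(v_c, "testattribute") <- "testattribute v_c"
same_attributes(v, v_c)

# Alternative that does not rely on the NULL convention
isTRUE(all.equal(attributes(v), attributes(v_c)))
```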






--
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907




Re: [R] survival::survfit,plot.survfit

2009-02-27 Thread Heinz Tuechler

At 15:28 26.02.2009, Terry Therneau wrote:

 plot(survfit(fit)) should plot the survival-function for x=0 or
 equivalently beta'=0. This curve is independent of any covariates.

  This is not correct.  It plots the curve for a hypothetical 
subject with x = mean of each covariate.


Does this mean the curve corresponds to the one you would get based 
on the baseline hazard?


Heinz


  This is NOT the average survival of the data set.  Imagine a cohort made up
of 60 year old men and their 10 year old grandsons: the expected survival of
this cohort does not look like that for a 35 year old male.

Terry T
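
The point can be checked with a small sketch. It assumes a current version of the survival package, its bundled lung data set and a single numeric covariate; the object names are illustrative only. The default curve is compared with one requested explicitly at the reference value stored in fit$means, and with one at age = 0.

```r
library(survival)

fit <- coxph(Surv(time, status) ~ age, data = lung)

# Default: one curve for a pseudo-subject at the values stored in fit$means
s_default <- survfit(fit)

# The same curve, requested explicitly
s_mean <- survfit(fit, newdata = data.frame(age = unname(fit$means["age"])))

# A curve for a subject with age = 0 (rarely meaningful in practice)
s_zero <- survfit(fit, newdata = data.frame(age = 0))

all.equal(as.numeric(s_default$surv), as.numeric(s_mean$surv))
```

Whether the default curve is called "the baseline" therefore depends on whether the baseline is defined at covariate value zero or at the centring values.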



[R] How to write a Surv object to a csv-file?

2008-12-19 Thread Heinz Tuechler

Dear All,

trying to write a data.frame containing Surv objects to a csv-file, I get
Error in dimnames(X) <- list(dn[[1L]], unlist(collabs, use.names = FALSE)) :
  length of 'dimnames' [2] not equal to array extent.

See example below.

Maybe I overlooked something, but I expected 
that data.frames containing Surv objects could also be written to csv files.


Is there a better way to write to csv files?

Thanks,

Heinz Tüchler



###   write Surv-object in csv-file
library(survival)
## create example data
soa <- Surv(1:5, c(0, 0, 1, 0, 1))
df.soa <- data.frame(soa)
write.csv(df.soa, 'df.soa.csv')    ## works as I expected
read.csv('df.soa.csv')             ## works as I expected

df.soa2 <- data.frame(soa, soa2=soa)
write.csv(df.soa2, 'df.soa2.csv')  ## works as I expected
read.csv('df.soa2.csv')            ## works as I expected

char1 <- letters[1:5]
df.soac <- data.frame(soa, char1)
write.csv(df.soac, 'df.soac.csv')  ## generates the following error message:

Error in dimnames(X) <- list(dn[[1L]], unlist(collabs, use.names = FALSE)) :
  length of 'dimnames' [2] not equal to array extent

df.csoa <- data.frame(char1, soa)
write.csv(df.csoa, 'df.soac.csv')  ## generates the following error message:

Error in dimnames(X) <- list(dn[[1L]], unlist(collabs, use.names = FALSE)) :
  length of 'dimnames' [2] not equal to array extent


platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status         Patched
major          2
minor          8.0
year           2008
month          11
day            10
svn rev        46884
language       R
version.string R version 2.8.0 Patched (2008-11-10 r46884)
 sessionInfo()
R version 2.8.0 Patched (2008-11-10 r46884)
i386-pc-mingw32

locale:
LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=German_Austria.1252

attached base packages:
[1] splines   stats graphics  grDevices utils datasets  methods
[8] base

other attached packages:
[1] survival_2.34-1



Re: [R] How to write a Surv object to a csv-file?

2008-12-19 Thread Heinz Tuechler

Dear David!

Thank you for your response. I like csv files, 
because in that case I can easily compare 
different versions of similar data.frames. 
Similar in this case means that I may add a 
column or change some transformation command for 
one column. With dput it's rather difficult, and 
when I tried the compare package, I had no 
success comparing data.frames containing Surv objects.


Thanks again

Heinz

At 22:31 19.12.2008, David Winsemius wrote:


On Dec 19, 2008, at 2:04 PM, Heinz Tuechler wrote:


Dear All,

trying to write a data.frame containing Surv objects to a csv-file
I get
Error in dimnames(X) <- list(dn[[1L]], unlist(collabs, use.names =
FALSE)) :
 length of 'dimnames' [2] not equal to array extent.

See example below.

Maybe I overlooked something, but I expected that data.frames
containing Surv objects could also be written to csv files.

Is there a better way to write to csv files?


Yes, if the goal is creating an ASCII structure that can be recovered
by an R interpreter:

?dput
?dget

 dput(df.soac, "test")
 copy.df.soac <- dget("test")
 all.equal(df.soac, copy.df.soac)

 Doesn't give you a result that you would want to read with Excel,
but that does not appear to be your goal. You can examine it with a
text editor.

--
David Winsemius
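
The dput()/dget() round trip suggested above can be tried on a small stand-in data frame. This is a sketch only: the data frame df and the temporary file are made up for the illustration and do not contain a Surv column.

```r
df <- data.frame(id = 1:3, grp = c("a", "b", "c"), stringsAsFactors = FALSE)

f <- tempfile(fileext = ".R")
dput(df, file = f)          # writes an R expression that recreates df
copy <- dget(f)             # evaluates that expression again
all.equal(df, copy)
unlink(f)
```

The file is plain text and can be inspected in an editor, but it is R syntax rather than a table.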







Re: [R] How to write a Surv object to a csv-file?

2008-12-20 Thread Heinz Tuechler

Dear Charles,

yes, your solution does what I need.
Maybe, it offers also a way to use the compare package with Surv objects.

Thank you,

Heinz


At 23:30 19.12.2008, Charles C. Berry wrote:

On Fri, 19 Dec 2008, Heinz Tuechler wrote:


Dear David!

Thank you for your response. I like csv files, 
because in that case I can easily compare 
different versions of similar data.frames. 
Similar in this case means that I may add a 
column or change some transformation command 
for one column. With dput it's rather 
difficult, and when I tried the compare 
package, I had no success comparing data.frames containing Surv objects.


Heinz,

Is this good enough?


mat <- as.data.frame( lapply( df.soac, unclass ) )
write.csv(mat,'mat.csv')
read.csv('mat.csv')

  X soa.time soa.status char1
1 1        1          0     1
2 2        2          0     2
3 3        3          1     3
4 4        4          0     4
5 5        5          1     5


The bug seems to be in as.matrix.data.frame.

HTH,

Chuck
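
The workaround above can be extended into a full round trip. The following sketch assumes the survival package is installed; the names flat and back, the temporary file and the use of stringsAsFactors = FALSE and row.names = FALSE are choices made for the illustration, not part of the original suggestion.

```r
library(survival)

soa <- Surv(1:5, c(0, 0, 1, 0, 1))
df.soac <- data.frame(soa, char1 = letters[1:5], stringsAsFactors = FALSE)

# Flatten: the Surv column becomes two plain columns, soa.time and soa.status
flat <- as.data.frame(lapply(df.soac, unclass), stringsAsFactors = FALSE)

f <- tempfile(fileext = ".csv")
write.csv(flat, f, row.names = FALSE)
back <- read.csv(f, stringsAsFactors = FALSE)
unlink(f)

# Rebuild the Surv object after reading the file back
back$soa <- Surv(back$soa.time, back$soa.status)
names(back)
```

Since the flattened file contains only ordinary columns, two versions of it can be compared with any text diff tool.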





Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu               UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901




Re: [R] How many attributes are there of a variable?

2009-09-07 Thread Heinz Tuechler

Peng, based on a suggestion Frank made years ago (18.7.2006), I use 
one attribute that contains all further attributes I want to assign 
to variables. It is necessary to create your own class and subsetting 
method, so that this attribute does not get lost. Together with some 
functions, I use it for variable labels, value labels, missing-value 
definitions, etc.
It seems that, without protection by your own class and the corresponding 
subsetting method, you can never be sure whether an attribute survives subsetting.


Heinz
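
A minimal sketch of the idea: the class name 'kept' and the attribute name 'label' are hypothetical choices for this illustration, and the method shown is far simpler than what Hmisc does.

```r
# Plain vector: a custom attribute is dropped by subsetting
x <- 1:5
attr(x, "label") <- "Age in years"
attr(x[2:3], "label")                 # NULL

# Hypothetical class 'kept' whose "[" method restores the attribute
"[.kept" <- function(x, ...) {
  out <- NextMethod("[")
  attr(out, "label") <- attr(x, "label")
  class(out) <- oldClass(x)
  out
}

y <- structure(1:5, label = "Age in years", class = "kept")
attr(y[2:3], "label")
names(attributes(y))                  # lists which attributes an object carries
```

Other operations (c(), arithmetic, merging) may still drop the attribute, so further methods would be needed for full protection.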

At 23:21 06.09.2009, Frank E Harrell Jr wrote:

Peng,

You can create all the attributes you want, with one headache: R 
does not keep attributes across subsetting operations, so you need to 
write classes and [.something methods when attributes need to be 
kept or adjusted upon subsetting rows.


The Hmisc package uses attributes such as label, units, 
imputed.  You might look at the code to see how it did that.  For 
example, label(x) will use attr(x, 'label') to fetch the 'label' 
attribute.  There are attribute-setting functions there too.


Frank


Peng Yu wrote:

Hi,
According to the example below this email, attr(x,"names") is the same
as names(x). I am wondering how many attributes there are of a given
variable. How to find out what they are? Can I always use
some_attribute(x) instead of attr(x, "some_attribute")?
Regards,
Peng


x=c(1,2,3)
attr(x,"names")=c("a","b","c")
x

a b c
1 2 3

y=c(1,2,3)
names(y)=c("a","b","c")
y

a b c
1 2 3



--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University



Re: [R] What determines the unit of POSIXct differences?

2009-09-11 Thread Heinz Tuechler

Jim - Thank you very much for this explanation and the hint to use difftime.

Heinz

At 15:09 11.09.2009, jim holtman wrote:

'-' calls 'difftime' which, if you don't specify the units, makes the
following assumptions in the code:

 difftime
function (time1, time2, tz = "", units = c("auto", "secs", "mins",
    "hours", "days", "weeks"))
{
    time1 <- as.POSIXct(time1, tz = tz)
    time2 <- as.POSIXct(time2, tz = tz)
    z <- unclass(time1) - unclass(time2)
    units <- match.arg(units)
    if (units == "auto") {
        if (all(is.na(z)))
            units <- "secs"
        else {
            zz <- min(abs(z), na.rm = TRUE)
            if (is.na(zz) || zz < 60)
                units <- "secs"
            else if (zz < 3600)
                units <- "mins"
            else if (zz < 86400)
                units <- "hours"
            else units <- "days"
        }
    }
    switch(units, secs = structure(z, units = "secs", class = "difftime"),
        mins = structure(z/60, units = "mins", class = "difftime"),
        hours = structure(z/3600, units = "hours", class = "difftime"),
        days = structure(z/86400, units = "days", class = "difftime"),
        weeks = structure(z/(7 * 86400), units = "weeks", class = "difftime"))
}


You can use difftime explicitly so you can control the units.

 c(as.POSIXct('2009-09-01'), as.POSIXct('2009-10-11')) - as.POSIXct('2009-08-31')
Time differences in days
[1]  1 41

 difftime(c(as.POSIXct('2009-09-01'), as.POSIXct('2009-10-11')),
          as.POSIXct('2009-08-31'), units='sec')
Time differences in secs
[1]   86400 3542400
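
As a further sketch of controlling the units explicitly: the time zone "UTC" and the variable names below are assumptions added for the illustration, chosen so that the result does not depend on the local daylight-saving rules.

```r
t1 <- as.POSIXct("2009-09-01", tz = "UTC")
t0 <- as.POSIXct("2009-08-31", tz = "UTC")

d <- difftime(t1, t0, units = "secs")   # always seconds
units(d)
as.numeric(d)

# An existing difftime can be converted when extracting the number
as.numeric(t1 - t0, units = "hours")

# Or the units can be changed in place
units(d) <- "days"
d
```

Fixing the units in this way avoids the "auto" rule, under which the smallest absolute difference in the vector decides the unit for all elements.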




On Fri, Sep 11, 2009 at 7:50 AM, Heinz Tuechler tuech...@gmx.at wrote:
 Dear All,

 what determines if a difference between POSIXct objects gets expressed in
 days or seconds?
 In the following example, it's sometimes seconds, sometimes days.

 as.POSIXct('2009-09-01') - as.POSIXct(NA)
 Time difference of NA secs

 c(as.POSIXct('2009-09-01'), as.POSIXct(NA)) -
  c(as.POSIXct('2009-09-01'), as.POSIXct('2009-08-31'))
 Time differences in secs
 [1]  0 NA

 c(as.POSIXct('2009-09-01'), as.POSIXct(NA)) -
  as.POSIXct('2009-08-31')
 Time differences in days
 [1]  1 NA

 Thanks,
 Heinz





--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?




[R] How to assess object names within a function in lapply or l_ply?

2009-09-28 Thread Heinz Tuechler

Dear All,

to produce output of several columns of a data frame, I tried to use 
lapply and also l_ply. In both cases, I would like to print a header 
line containing also the name of the respective column in the data frame.


For example, I would like the following

lapply(data.frame(a=1:3, b=2:4), function(x) print(deparse(substitute(x))))

to produce:
[1] "a"
[1] "b"

and not what it actually does:
[1] "X[[1L]]"
[1] "X[[2L]]"
$a
[1] "X[[1L]]"

$b
[1] "X[[2L]]"

or with l_ply (plyr package)
l_ply(data.frame(a=1:3, b=2:4), function(x) print(deparse(substitute(x))))

to produce:
[1] "a"
[1] "b"

and not what it actually does:
[1] ".data[[i]]"
[1] ".data[[i]]"

Is this possible?

Thanks,
Heinz



Re: [R] How to assess object names within a function in lapply or l_ply?

2009-09-28 Thread Heinz Tuechler

Thank you, Henrique,

my example was simplified. In a more complex 
function I want to use the objects, not just 
their names. In your solution, I have to adapt 
the function itself, depending on the name of the 
data.frame, which I would like to avoid.


Thanks,
Heinz


At 13:36 28.09.2009, Henrique Dallazuanna wrote:

You can use names instead:

DF <- data.frame(a=1:3, b=2:4)
lapply(names(DF), function(x){
    print(x)
    DF[x]
})





--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O




Re: [R] How to assess object names within a function in lapply or l_ply?

2009-09-28 Thread Heinz Tuechler

Henrique,

based on your solution I found out how to avoid naming the object explicitly.

lapply(data.frame(a=1:3, b=2:4), function(x)
  names(eval(as.list(sys.call(-1))[[2]]))
   [as.numeric(gsub("[^0-9]", "", deparse(substitute(x))))]
)

Thanks,
Heinz

At 13:57 28.09.2009, Henrique Dallazuanna wrote:

Heinz,

Try this:

lapply(DF, function(x)names(DF)[as.numeric(gsub("[^0-9]", "",
deparse(substitute(x))))])

On Mon, Sep 28, 2009 at 8:43 AM, Heinz Tuechler tuech...@gmx.at wrote:
 Thank you, Henrique,

 my example was simplified. In a more complexe function I want to use the
 objects, not just their names. In your solution, I have to adapt the
 function itself, depending on the name of the 
data.frame, which I would like

 to avoid.

 Thanks,
 Heinz


 At 13:36 28.09.2009, Henrique Dallazuanna wrote:

 You can use names instead:

 DF <- data.frame(a=1:3, b=2:4)
 lapply(names(DF), function(x){
print(x)
DF[x]
})

 On Mon, Sep 28, 2009 at 8:22 AM, Heinz Tuechler tuech...@gmx.at wrote:
  Dear All,
 
  to produce output of several columns of a data frame, I tried to use
  lapply
  and also l_ply. In both cases, I would like to print a header line
  containing also the name of the respective column in the data frame.
 
  For example, I would like the following
 
  lapply(data.frame(a=1:3, b=2:4), function(x)
  print(deparse(substitute(x))))
 
  to produce:
  [1] a
  [1] b
 
  and not, what it actually does:
  [1] X[[1L]]
  [1] X[[2L]]
  $a
  [1] X[[1L]]
 
  $b
  [1] X[[2L]]
 
  or with l_ply (plyr package)
  l_ply(data.frame(a=1:3, b=2:4), function(x)
  print(deparse(substitute(x))))
 
  to produce:
  [1] a
  [1] b
 
  and not, what it actually does:
  [1] .data[[i]]
  [1] .data[[i]]
 
  Is this possible?
 
  Thanks,
  Heinz
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 



 --
 Henrique Dallazuanna
 Curitiba-Paraná-Brasil
 25° 25' 40 S 49° 16' 22 O






--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O




Re: [R] How to assess object names within a function in lapply or l_ply?

2009-09-28 Thread Heinz Tuechler

Hadley,

many thanks for your answer and for the enormous work you put into 
plyr, a really powerful package.
For now, I will solve my problem with a variable label attribute, I 
usually attach to columns in data frames. I asked the list, because I 
thought, I am overlooking something trivial, since lapply itself 
apparently knows the object names, as it labels the output by them. 
It just does not supply them to the function it calls.
Maybe deparse(substitute(x)) with the right environment would do it, 
but I did not find it.


Thanks,
Heinz


At 16:27 28.09.2009, hadley wickham wrote:

 or with l_ply (plyr package)
 l_ply(data.frame(a=1:3, b=2:4), function(x) print(deparse(substitute(x))))


The best way to do this is to supply both the object you want to
iterate over, and its names.  Unfortunately it's slightly difficult to
create a data structure of the correct form to do this with m_ply.

df <- data.frame(a=1:3, b=2:4)
input <- list(x = df, name = names(df))
inputdf <- structure(input,
  class = "data.frame",
  row.names = seq_along(input[[1]]))

m_ply(inputdf, function(x, name) {
  cat(name, "-\n")
  print(x)
})

I'll think about how to improve this for a future version.

Hadley


--
http://had.co.nz/




Re: [R] How to assess object names within a function in lapply or l_ply?

2009-09-28 Thread Heinz Tuechler

At 18:17 28.09.2009, hadley wickham wrote:

 many thanks for your answer and for the enormous work you put into plyr, a
 really powerful package.
 For now, I will solve my problem with a variable label attribute, I usually
 attach to columns in data frames. I asked the list, because I thought, I am
 overlooking something trivial, since lapply itself apparently knows the
 object names, as it labels the output by them. It just does not supply them
 to the function it calls.

lapply knows the names - the calling function doesn't - it takes the
output and then fixes up the names after it's run.

Hadley


A theoretical question, as you are not responsible for lapply: would 
you think that problems arise, if lapply would name each list object 
with its name as it calls the function in its body, instead of 
naming it X[[1L]], ... ?


Heinz


--
http://had.co.nz/




Re: [R] Different result of multiple regression in R and SPSS

2011-07-20 Thread Heinz Tuechler

At 19.07.2011 18:50 -0700, Spencer Graves wrote:

On 7/19/2011 4:04 PM, Bert Gunter wrote:
On Tue, Jul 19, 2011 at 3:45 PM, David Winsemius <dwinsem...@comcast.net> wrote:

On Jul 19, 2011, at 6:29 PM, J. wrote:


Thanks for the answer.

#

However, I am still curious about which result I should use? The result
from
R or the one from SPSS?

It is becoming apparent that you do not know how to use the results from
either system. The progress of science would be safer if you get some advice
from a person that knows what they are doing.

##
I nominate this for an R fortune.

-- Bert


None of us ever know what we're doing at some 
level.  We often think we do, and sometimes we 
get results more in spite of what we've done 
than because of it.  That of course increases 
our confidence and encourages us to repeat 
mistakes in contexts where we might not be so lucky.



Spencer



Wise!

Heinz



Why the results from two programs are different?

Different parametrizations. If I had to guess I would bet that the gender
coefficient in R is exactly twice that of the one from SPSS. They are
probably both correct in the context of their respective codings.

--
David Winsemius, MD
West Hartford, CT
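
The point about parametrizations can be illustrated within R alone. The sketch below uses simulated data (not the original poster's) and compares treatment coding with sum-to-zero coding for a two-level factor. Whether SPSS actually used the latter coding in this case is an assumption; the thread only offers it as a guess.

```r
# Simulated data, for illustration only.
set.seed(1)
gender <- factor(rep(c("f", "m"), each = 10))
y <- 5 + 2 * (gender == "m") + rnorm(20)

# R's default: treatment contrasts (0/1 dummy for "m").
fit_treat <- lm(y ~ gender)
# Sum-to-zero contrasts (+1 for "f", -1 for "m").
fit_sum <- lm(y ~ gender, contrasts = list(gender = "contr.sum"))

b_treat <- unname(coef(fit_treat)[2])   # mean(m) - mean(f)
b_sum <- unname(coef(fit_sum)[2])       # (mean(f) - mean(m)) / 2

# Same model, different parametrization: the fitted values agree.
same_fit <- all.equal(fitted(fit_treat), fitted(fit_sum))
```

Here b_treat equals -2 * b_sum, so the two coefficients differ by a factor of two (and in sign) although both describe the same fit.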




--
Spencer Graves, PE, PhD
President and Chief Technology Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph:  408-655-4567
web:  www.structuremonitoring.com



Re: [R] value.labels

2011-08-11 Thread Heinz Tuechler

At 11.08.2011 21:50 +0300, Zeki Çatav wrote:

Prş, 2011-08-11 tarihinde 19:27 +0200 saatinde, Uwe Ligges yazdı:

 On 11.08.2011 19:22, David Winsemius wrote:
 
  On Aug 11, 2011, at 11:42 AM, Uwe Ligges wrote:
 
 
 
  On 11.08.2011 16:10, zcatav wrote:
  Hello R people,
 
  I have a data.frame. Status variable has 3 values. 0-alive,
  1-dead and
  2-missed..
 .
 As I understood the question, just how to rename the levels was the
 original question.

 Uwe

I don't want to rename levels or converting from numeric to string. I
want to add each corresponding levels value, a label, as in SPSS.
Level 0 labeled with alive,
level 1 labeled with dead and
level 2 labeled with missed.


This is not possible with a factor, because 
factor levels can only be positive integers.


Heinz




--
Zeki Çatav




Re: [R] value.labels

2011-08-11 Thread Heinz Tuechler

At 12.08.2011 09:11 +1200, Rolf Turner wrote:

On 12/08/11 09:59, Heinz Tuechler wrote:

At 11.08.2011 21:50 +0300, Zeki Çatav wrote:

Prş, 2011-08-11 tarihinde 19:27 +0200 saatinde, Uwe Ligges yazdı:

 On 11.08.2011 19:22, David Winsemius wrote:
 
  On Aug 11, 2011, at 11:42 AM, Uwe Ligges wrote:
 
 
 
  On 11.08.2011 16:10, zcatav wrote:
  Hello R people,
 
  I have a data.frame. Status variable has 3 values. 0-alive,
  1-dead and
  2-missed..
 .
 As I understood the question, just how to rename the levels was the
 original question.

 Uwe

I don't want to rename levels or converting from numeric to string. I
want to add each corresponding levels value, a label, as in SPSS.
Level 0 labeled with alive,
level 1 labeled with dead and
level 2 labeled with missed.


This is not possible with a factor, because 
factor levels can only be positive integers.


That is just plain (ridiculously) wrong. RTFM.

cheers,

Rolf Turner


So, how would you construct a factor with levels 
0, 1, 2 and labels alive, dead, and missed, as the original post asked for?


Heinz



Re: [R] value.labels

2011-08-11 Thread Heinz Tuechler

At 12.08.2011 11:05 +1200, Rolf Turner wrote:

On 12/08/11 11:34, Heinz Tuechler wrote:

At 12.08.2011 09:11 +1200, Rolf Turner wrote:

On 12/08/11 09:59, Heinz Tuechler wrote:

At 11.08.2011 21:50 +0300, Zeki Çatav wrote:

Prş, 2011-08-11 tarihinde 19:27 +0200 saatinde, Uwe Ligges yazdı:

 On 11.08.2011 19:22, David Winsemius wrote:
 
  On Aug 11, 2011, at 11:42 AM, Uwe Ligges wrote:
 
 
 
  On 11.08.2011 16:10, zcatav wrote:
  Hello R people,
 
  I have a data.frame. Status variable has 3 values. 0-alive,
  1-dead and
  2-missed..
 .
 As I understood the question, just how to rename the levels was the
 original question.

 Uwe

I don't want to rename levels or converting from numeric to string. I
want to add each corresponding levels value, a label, as in SPSS.
Level 0 labeled with alive,
level 1 labeled with dead and
level 2 labeled with missed.


This is not possible with a factor, because 
factor levels can only be positive integers.


That is just plain (ridiculously) wrong. RTFM.

cheers,

Rolf Turner


So, how would you construct a factor with 
levels 0, 1, 2 and labels alive, dead, and 
missed, as the original post asked for?


Heinz


As I said, RTFM. But for completeness:

x <- sample(0:2,100,TRUE)
y <- factor(x,labels=c("alive","dead","missed"))

Duhhh.

cheers,

Rolf Turner


Maybe you would like to look at the structure.

str(y)
Factor w/ 3 levels "alive","dead",..: 3 1 2 2 3 3 2 1 1 1 ...

or

dput(y)
structure(c(3L, 1L, 2L, 2L, 3L, 3L, 2L, 1L, 1L, 1L, 2L, 1L, 3L,
1L, 2L, 3L, 2L, 2L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 2L, 1L, 2L, 1L,
1L, 3L, 1L, 2L, 3L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 3L,
2L, 1L, 2L, 3L, 3L, 2L, 1L, 1L, 2L, 3L, 3L, 2L, 1L, 3L, 1L, 1L,
2L, 2L, 1L, 2L, 1L, 2L, 3L, 3L, 3L, 2L, 3L, 3L, 1L, 2L, 3L, 2L,
3L, 1L, 3L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 3L, 2L, 2L, 1L, 1L, 1L,
2L, 3L, 3L, 2L, 1L, 2L, 3L), .Label = c("alive", "dead", "missed"
), class = "factor")


Anything else but positive integers?

Heinz
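
For readers comparing this with SPSS value labels, the distinction being argued over can be made concrete. The sketch below uses a short made-up status vector. It shows that the internal codes of a factor start at 1 regardless of the original values, and two ways to keep the original 0/1/2 values available. The names status, status_labels, orig and looked_up are illustrative only.

```r
status <- c(0L, 2L, 1L, 1L, 0L)   # original values, as coded in the data

# factor(): the labels become the levels, the internal codes are 1, 2, 3.
f <- factor(status, levels = 0:2, labels = c("alive", "dead", "missed"))
codes <- as.integer(f)

# Option 1: map the internal codes back to the original values.
orig <- (0:2)[codes]

# Option 2: keep the integer vector and look labels up in a named vector.
status_labels <- c("0" = "alive", "1" = "dead", "2" = "missed")
looked_up <- unname(status_labels[as.character(status)])
```

Neither option stores value and label in one object the way SPSS does; both keep the mapping next to the data.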



Re: [R] value.labels

2011-08-11 Thread Heinz Tuechler

At 12.08.2011 01:53 +0100, Heinz Tuechler wrote:

At 12.08.2011 11:05 +1200, Rolf Turner wrote:

On 12/08/11 11:34, Heinz Tuechler wrote:

At 12.08.2011 09:11 +1200, Rolf Turner wrote:

On 12/08/11 09:59, Heinz Tuechler wrote:

At 11.08.2011 21:50 +0300, Zeki Çatav wrote:

Prş, 2011-08-11 tarihinde 19:27 +0200 saatinde, Uwe Ligges yazdı:

 On 11.08.2011 19:22, David Winsemius wrote:
 
  On Aug 11, 2011, at 11:42 AM, Uwe Ligges wrote:
 
 
 
  On 11.08.2011 16:10, zcatav wrote:
  Hello R people,
 
  I have a data.frame. Status variable has 3 values. 0-alive,
  1-dead and
  2-missed..
 .
 As I understood the question, just how to rename the levels was the
 original question.

 Uwe

I don't want to rename levels or converting from numeric to string. I
want to add each corresponding levels value, a label, as in SPSS.
Level 0 labeled with alive,
level 1 labeled with dead and
level 2 labeled with missed.


This is not possible with a factor, because 
factor levels can only be positive integers.


That is just plain (ridiculously) wrong. RTFM.

cheers,

Rolf Turner


So, how would you construct a factor with 
levels 0, 1, 2 and labels alive, dead, and 
missed, as the original post asked for?


Heinz


As I said, RTFM. But for completeness:

x <- sample(0:2,100,TRUE)
y <- factor(x,labels=c("alive","dead","missed"))

Duhhh.

cheers,

Rolf Turner


Maybe you would like to look at the structure.

str(y)
Factor w/ 3 levels "alive","dead",..: 3 1 2 2 3 3 2 1 1 1 ...

or

dput(y)
structure(c(3L, 1L, 2L, 2L, 3L, 3L, 2L, 1L, 1L, 1L, 2L, 1L, 3L,
1L, 2L, 3L, 2L, 2L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 2L, 1L, 2L, 1L,
1L, 3L, 1L, 2L, 3L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 3L,
2L, 1L, 2L, 3L, 3L, 2L, 1L, 1L, 2L, 3L, 3L, 2L, 1L, 3L, 1L, 1L,
2L, 2L, 1L, 2L, 1L, 2L, 3L, 3L, 3L, 2L, 3L, 3L, 1L, 2L, 3L, 2L,
3L, 1L, 3L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 3L, 2L, 2L, 1L, 1L, 1L,
2L, 3L, 3L, 2L, 1L, 2L, 3L), .Label = c("alive", "dead", "missed"
), class = "factor")


Anything else but positive integers?

Heinz


To be fair, you can construct a factor containing zeros.

> b <- c(0L,0L,1L,1L,1L,2L,2L,2L,2L)
> b
[1] 0 0 1 1 1 2 2 2 2
> str(b)
 int [1:9] 0 0 1 1 1 2 2 2 2
> table(b)
b
0 1 2
2 3 4
> levels(b) <- letters[1:3]
> str(b)
 atomic [1:9] 0 0 1 1 1 2 2 2 2
 - attr(*, "levels")= chr [1:3] "a" "b" "c"
> class(b) <- 'factor'
> str(b)
 Factor w/ 3 levels "a","b","c": 0 0 1 1 1 2 2 2 2

But, if you print it, you get a warning.

> b
[1] a a a b b b b a a
Levels: a b c
Warning message:
In xx[] <- as.character(x) :
  number of items to replace is not a multiple of replacement length

And table() gives a wrong result.
> table(b)
b
a b c
3 4 0


If you take a numeric, not explicitly integer vector, you are less lucky.

> c <- c(0,0,1,1,1,2,2,2,2)
> str(c)
 num [1:9] 0 0 1 1 1 2 2 2 2
> levels(c) <- letters[1:3]
> str(c)
 atomic [1:9] 0 0 1 1 1 2 2 2 2
 - attr(*, "levels")= chr [1:3] "a" "b" "c"

Assigning class factor is rejected with an error.
> class(c) <- 'factor'
Error in class(c) <- "factor" :
  adding class "factor" to an invalid object


Heinz




[R] retain class after merge

2011-08-19 Thread Heinz Tuechler

Dear All,

is there a simple way to retain the class attribute of a column, if 
merging two data.frames?
When merging the example data.frames from help(merge) I am unable to 
keep the class attribute as set before merging (see below).
Two columns are assigned new classes before merge (myclass1, 
myclass2), but after merge the resulting column has class character.


best regards,

Heinz


## use character columns of names to get sensible sort order
authors <- data.frame(
    surname = I(c("Tukey", "Venables", "Tierney", "Ripley", "McNeil")),
    nationality = c("US", "Australia", "US", "UK", "Australia"),
    deceased = c("yes", rep("no", 4)))
books <- data.frame(
    name = I(c("Tukey", "Venables", "Tierney",
             "Ripley", "Ripley", "McNeil", "R Core")),
    title = c("Exploratory Data Analysis",
              "Modern Applied Statistics ...",
              "LISP-STAT",
              "Spatial Statistics", "Stochastic Simulation",
              "Interactive Data Analysis",
              "An Introduction to R"),
    other.author = c(NA, "Ripley", NA, NA, NA, NA,
                     "Venables & Smith"))

class(authors$surname) <- 'myclass1'
class(books$name) <- 'myclass2'
(m1 <- merge(authors, books, by.x = "surname", by.y = "name"))
class(m1$surname)

[1] "character"

> sessionInfo()
R version 2.13.1 Patched (2011-08-08 r56671)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252
[3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
[5] LC_TIME=German_Switzerland.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base
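
No answer to this question is recorded in this part of the archive. One possible workaround, offered only as a sketch, is to remember the class before merging and to set it again on the merged column. The small data frames below are invented for the example, and saved_class is an illustrative name. The sketch does not make merge() itself retain the attribute.

```r
# Invented example data, much smaller than the help(merge) example.
authors <- data.frame(surname = c("Tukey", "Ripley"),
                      nationality = c("US", "UK"),
                      stringsAsFactors = FALSE)
books <- data.frame(name = c("Tukey", "Ripley", "Ripley"),
                    title = c("Exploratory Data Analysis",
                              "Spatial Statistics",
                              "Stochastic Simulation"),
                    stringsAsFactors = FALSE)

class(authors$surname) <- "myclass1"

# Remember the class, merge, then put the class back on the key column.
saved_class <- class(authors$surname)
m1 <- merge(authors, books, by.x = "surname", by.y = "name")
class(m1$surname) <- saved_class
```

A cleaner route might be a subsetting method for the class, so that row selection keeps the attribute, but that goes beyond what was asked in the post.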



Re: [R] Feature request: rating/review system for R packages

2011-03-20 Thread Heinz Tuechler
It's unclear to me, why the rating/review system should relate to 
entire packages. Would it not be more informative, if single specific 
functions would be rated and reviewed?
I would like to see if + is rated better than -, or if more 
difficulties are reported for * than for /. I could then consider 
in the future to prefer sums over differences.


best,
Heinz

At 20.03.2011 19:03 +, Ben Bolker wrote:

Dieter Menne dieter.menne at menne-biomed.de writes:



  After pondering all the pros and cons regarding the usefulness of a
  rating/review system for R packages, don't you think it would make sense
  to implement such a thing?
 

 Or to look what is there, and how little it is filled:

 http://crantastic.org/

 Dieter

  If I were feeling a little more ambitious, I would write a contributed
popularity contest package (cf. http://lwn.net/Articles/75753/,
http://popcon.debian.org/) that did the following:

  * recorded information on a user's configuration and installed packages
and reported it *somewhere* (web server, etc.; R has plenty of communications
facilities built in)

  for more intrusive but complete information:

  * gave users an option to install a `hook' that would report at some
interval (regular? random?) which packages were actually loaded
(on Unix-alike machines one might be able to use the 'atime' feature
to guess when a package was *last* loaded even if it wasn't currently
in use)
  * gave users an option to contribute further information (country,
research field, etc.)
  * might pop up a window showing installed packages and offering users the
option to comment or to give ratings to particularly good or bad packages,
which would be sent to wherever ...

  This would be completely optional, but *if* word got around it
could collect a useful (albeit completely statistically unsound)
set of information.

  *If* I were writing this I would (a) be very clear in the package
description etc etc what information would be collected and stored,
where, and how it would be used; (b) carefully think about the tradeoffs
between annoying users and collecting more information; (c) consult
with the fine folks running CRANtastic to see if they wanted to somehow
integrate it into their infrastructure.

  The big advantage of this approach is that you don't need to convince
anyone from R-core to do anything, you just need to convince users to
install your package.

  Ben Bolker



Re: [R] Why do we have to turn factors into characters for various functions?

2010-12-12 Thread Heinz Tuechler

At 12.12.2010 00:48 +0200, Tal Galili wrote:

Hello dear R-help mailing list,

My question is *not* about how factors are implemented in R (which is, if I
understand correctly, that factors keeps numbers and assign levels to them).
My question *is* about why so many functions that work on factors don't
treat them as characters by default?

Here are two simple examples:
Example one turning the characters inside a factor into numeric:

x <- factor(4:6)
as.numeric(x) # output: 1 2 3
as.numeric(as.character(x)) # output: 4 5 6  # isn't this what we wanted?


Example two, using strsplit on a factor:

x <- factor(paste(letters[4:6], 4:6, sep="A"))
strsplit(x, "A") # will result in an error:  # Error in strsplit(x, "A") : non-character argument
strsplit(as.character(x), "A") # will work and split


So what is the reason this is the case?
Is it that implementing a switch of factors to characters as the default in
some of the basic function will cause old code to break?
Is it a better design in some other way?

I am curious to know the reason for this.


In my view the answer can be found implicitly in the language definition.

Factors are currently implemented using an integer array to specify 
the actual levels and a second array of names that are mapped to the 
integers. Rather unfortunately users often make use of the 
implementation in order to make some calculations easier.


It is the unfortunate use of factors that seems generally accepted, 
even if the language definition continues:


This, however, is an implementation issue and is not guaranteed to 
hold in all implementations of R.


Personally, like some others, I avoid factors, except in cases, where 
they represent a statistical concept.


Certainly I would agree with you that, if only reading the R 
Language Definition and not the documentation of the function 
factor, one would rather expect functions like as.numeric or strsplit 
to operate on the levels of a factor and not on the underlying, 
implementation specific, integer array.


Heinz
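
The two views of a factor discussed here, the internal integer codes and the levels, can be put side by side. The following is a small sketch with the values from the question; the variable names are illustrative.

```r
x <- factor(4:6)

codes <- as.integer(x)                         # internal codes: 1, 2, 3
via_character <- as.numeric(as.character(x))   # the level values: 4, 5, 6
via_levels <- as.numeric(levels(x))[x]         # same values, converting each level once

# String functions need the levels as character strings.
s <- factor(paste(letters[4:6], 4:6, sep = "A"))
parts <- strsplit(as.character(s), "A")
```

Indexing the converted levels by the factor, as in via_levels, converts each distinct level only once, which can matter for long factors with few levels.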




Thank you for your reading,
Tal

Contact
Details:---
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
--

[[alternative HTML version deleted]]



Re: [R] Why do we have to turn factors into characters for various functions?

2010-12-13 Thread Heinz Tuechler

Hello Petr,

don't want to convince you. If you like the following:

x <- factor(1:4, labels=c("one", "two", "three", "four"))

y <- factor(3:5, labels=c("three", "four", "five"))

data.frame(character=c(as.character(x), as.character(y)), numeric=c(x, y))

  character numeric
1   one   1
2   two   2
3 three   3
4  four   4
5 three   1
6  four   2
7  five   3

For me the behaviour of character vectors is easier to follow and 
less error prone.


cx <- c("one", "two", "three", "four")

cy <- c("three", "four", "five")

c(cx, cy)

[1] "one"   "two"   "three" "four"  "three" "four"  "five"



Anyway it is maybe more about personal habits than about bad factor
features


I agree with you regarding personal habits. It's not the features of 
factors. For me it's the rather inconsistent use in functions like 
c() or print().
If you print a factor, you see its levels, but if you combine it 
using c(), you combine the famous implementation specific underlying 
integer vector.


best regards,

Heinz
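
One way to combine two factors by their labels rather than by their codes is to go through character vectors and rebuild the factor. This is a sketch, with combined as an illustrative name. Newer R versions (4.1.0 and later, as far as I know) also let c() return a factor directly, so the behaviour shown in the data.frame above depends on the R version.

```r
x <- factor(1:4, labels = c("one", "two", "three", "four"))
y <- factor(3:5, labels = c("three", "four", "five"))

# Combine the labels, then rebuild a factor over the union of the levels.
combined <- factor(c(as.character(x), as.character(y)),
                   levels = union(levels(x), levels(y)))
```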

At 13.12.2010 08:50 +0100, Petr PIKAL wrote:

Hi

r-help-boun...@r-project.org napsal dne 12.12.2010 21:00:37:

 At 12.12.2010 00:48 +0200, Tal Galili wrote:
 Hello dear R-help mailing list,
 
 My question is *not* about how factors are implemented in R (which is,
if I
 understand correctly, that factors keeps numbers and assign levels to
them).
 My question *is* about why so many functions that work on factors don't
 treat them as characters by default?
 
 Here are two simple examples:
 Example one turning the characters inside a factor into numeric:
 
 x <- factor(4:6)
 as.numeric(x) # output: 1 2 3
 as.numeric(as.character(x)) # output: 4 5 6  # isn't this what we
wanted?
 
 
 Example two, using strsplit on a factor:
 
 x <- factor(paste(letters[4:6], 4:6, sep="A"))
 strsplit(x, "A") # will result in an error:  # Error in strsplit(x, "A") : non-character argument
 strsplit(as.character(x), "A") # will work and split
 
 
 So what is the reason this is the case?
 Is it that implementing a switch of factors to characters as the
default in
 some of the basic function will cause old code to break?
 Is it a better design in some other way?
 
 I am curious to know the reason for this.

 In my view the answer can be found implicitly in the language
definition.

 Factors are currently implemented using an integer array to specify
 the actual levels and a second array of names that are mapped to the
 integers. Rather unfortunately users often make use of the
 implementation in order to make some calculations easier.

 It is the unfortunate use of factors that seems generally accepted,
 even if the language definition continues:

 This, however, is an implementation issue and is not guaranteed to
 hold in all implementations of R.

 Personally, like some others, I avoid factors, except in cases, where
 they represent a statistical concept.

On the contrary, I find factors quite useful. Consider the possibility of
changing their levels:

> set.seed(111)
> x <- factor(sample(1:4, 20, replace=T), labels=c("one", "two", "three", "four"))
> x
 [1] three three two   three two   two   one   three two   one   three three
[13] one   one   one   two   one   four  two   three
Levels: one two three four
> levels(x)[3:4] <- "more"
> x
 [1] more more two  more two  two  one  more two  one  more more one  one  one
[16] two  one  more two  more
Levels: one two more

I believe that if x is character, it can be also done but factor way seems
to me more convenient. I also use point distinction in plots by
pch=as.numeric(some.factor) quite often.

Anyway it is maybe more about personal habits than about bad factor
features

Regards
Petr


 Certainly I would agree with you that, if only reading the R
 Language Definition and not the documentation of the function
 factor, one would rather expect functions like as.numeric or strsplit
 to operate on the levels of a factor and not on the underlying,
 implementation specific, integer array.

 Heinz



 Thank you for your reading,
 Tal
 
 Contact
 Details:---
 Contact me: tal.gal...@gmail.com |  972-52-7275845
 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew)
|
 www.r-statistics.com (English)

--- 
---

 
  [[alternative HTML version deleted]]
 

[R] survexp - unable to reproduce example

2010-12-20 Thread Heinz Tuechler

Dear All,

when I try to reproduce an example of survexp, taken from the help 
page of survdiff, I receive the error message

Error in floor(temp) : Non-numeric argument to mathematical function
.
It seems to come from match.ratetable. I think, it has to do with 
character variables in a ratetable.
I would be interested to know, if it works for others. With an older 
version of survival, it worked well.


best regards,

Heinz

> library(survival)
Loading required package: splines
> ## Example from help page of survdiff
> ## Expected survival for heart transplant patients based on
> ## US mortality tables
> expect <- survexp(futime ~ ratetable(age=(accept.dt - birth.dt),
+  sex=1,year=accept.dt,race="white"), jasa, cohort=FALSE,
+  ratetable=survexp.usr)
Error in floor(temp) : Non-numeric argument to mathematical function
> sessionInfo('survival')
R version 2.12.1 Patched (2010-12-18 r53869)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252
[3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
[5] LC_TIME=German_Switzerland.1252

attached base packages:
character(0)

other attached packages:
[1] survival_2.36-2

loaded via a namespace (and not attached):
[1] base_2.12.1  graphics_2.12.1  grDevices_2.12.1 methods_2.12.1
[5] splines_2.12.1   stats_2.12.1 tools_2.12.1 utils_2.12.1
> traceback()
2: match.ratetable(rdata, ratetable)
1: survexp(futime ~ ratetable(age = (accept.dt - birth.dt), sex = 1,
   year = accept.dt, race = "white"), jasa, cohort = FALSE,
   ratetable = survexp.usr)




[R] survexp - example produces error

2010-12-31 Thread Heinz Tuechler

Dear All,

reposting, because I did not find a solution, maybe someone could 
check the example below.


It's taken from the help page of survdiff. Executing it, gives the error

Error in floor(temp) : Non-numeric argument to mathematical function

best regards,

Heinz

library(survival)

## Example from help page of survdiff
## Expected survival for heart transplant patients based on
## US mortality tables
expect <-
  survexp(futime ~ ratetable(age=(accept.dt - birth.dt),
 sex=1,year=accept.dt,race="white"),
  jasa, cohort=FALSE,
  ratetable=survexp.usr)

Error in floor(temp) : Non-numeric argument to mathematical function

sessionInfo('survival')

R version 2.12.1 Patched (2010-12-18 r53869)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252
[3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
[5] LC_TIME=German_Switzerland.1252

attached base packages:
character(0)

other attached packages:
[1] survival_2.36-2

loaded via a namespace (and not attached):
[1] base_2.12.1  graphics_2.12.1  grDevices_2.12.1 methods_2.12.1
[5] splines_2.12.1   stats_2.12.1 tools_2.12.1 utils_2.12.1
> traceback()
2: match.ratetable(rdata, ratetable)
1: survexp(futime ~ ratetable(age = (accept.dt - birth.dt), sex = 1,
   year = accept.dt, race = "white"), jasa, cohort = FALSE,
   ratetable = survexp.usr)




Re: [R] survexp - example produces error

2010-12-31 Thread Heinz Tuechler

Thank you, Peter

after setting options(error=recover), see the output below, once for 
frame number 2, which I suspect to be the problem, once for frame number 1.


Heinz

> expect <-
+   survexp(futime ~ ratetable(age=(accept.dt - birth.dt),
+                              sex=1, year=accept.dt, race="white"),
+           jasa, cohort=FALSE,
+           ratetable=survexp.usr)
Error in floor(temp) : Non-numeric argument to mathematical function

Enter a frame number, or 0 to exit

1: survexp(futime ~ ratetable(age = (accept.dt - birth.dt), sex = 1, year = ac
2: match.ratetable(rdata, ratetable)

Selection: 2
Called from: top level
Browse[1]> temp
  [1] "white" "white" "white" "white" "white" "white" "white" "white" "white"
 [10] "white" "white" "white" "white" "white" "white" "white" "white" "white"
 [19] "white" "white" "white" "white" "white" "white" "white" "white" "white"
 [28] "white" "white" "white" "white" "white" "white" "white" "white" "white"
 [37] "white" "white" "white" "white" "white" "white" "white" "white" "white"
 [46] "white" "white" "white" "white" "white" "white" "white" "white" "white"
 [55] "white" "white" "white" "white" "white" "white" "white" "white" "white"
 [64] "white" "white" "white" "white" "white" "white" "white" "white" "white"
 [73] "white" "white" "white" "white" "white" "white" "white" "white" "white"
 [82] "white" "white" "white" "white" "white" "white" "white" "white" "white"
 [91] "white" "white" "white" "white" "white" "white" "white" "white" "white"
[100] "white" "white" "white" "white"
Browse[1]> Q


There is also 'temp' in frame number 1.

> expect <-
+   survexp(futime ~ ratetable(age=(accept.dt - birth.dt),
+                              sex=1, year=accept.dt, race="white"),
+           jasa, cohort=FALSE,
+           ratetable=survexp.usr)
Error in floor(temp) : Non-numeric argument to mathematical function

Enter a frame number, or 0 to exit

1: survexp(futime ~ ratetable(age = (accept.dt - birth.dt), sex = 1, year = ac
2: match.ratetable(rdata, ratetable)

Selection: 1
Called from: top level
Browse[1]> temp
 [1]   495   15   38   172  674   39   84   57  1527   80 13860
[16]  307   35   42   36   27 1031   50  732  218 1799 1400  262   71   34  851
[31]   76 1586 1571   11   99   654   52 1407 13211   44  9958 1141
[46]  979  284  101  187   60  941  148  342  915   67   68  841  583   77   31
[61]  669   29  619  595   89   16  544   20  514   95  481  444  427   79  333
[76]  396  109  369  206  185  339  264  164  179  130  108   30   10
Browse[1]> Q



At 31.12.2010 13:46 +0100, peter dalgaard wrote:


On Dec 31, 2010, at 10:21 , Heinz Tuechler wrote:

 Dear All,

 reposting, because I did not find a solution, maybe someone could 
check the example below.


 It's taken from the help page of survdiff. Executing it, gives the error

 Error in floor(temp) : Non-numeric argument to mathematical function

Hmm, it's not happening to me (Mac OSX) either with 2.12.1 or the 
current R-patched (r53892). Could be a platform issue (sounds 
unlikely), a local user issue, or a locale one.


Could you set options(error=recover) and find out what is the value 
of temp when the error occurs?




--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com




Re: [R] survexp - example produces error

2010-12-31 Thread Heinz Tuechler

Dear Peter, Dear All,

a further attempt led me to an answer. If I set 
options(stringsAsFactors=TRUE), which I usually have set to FALSE, no 
error occurs.

I am, however, not happy with this solution.

Heinz


Re: [R] survexp - example produces error

2010-12-31 Thread Heinz Tuechler

Follow up:

The critical line seems to be in survexp, around line 97:

rdata <- data.frame(eval(rcall, m))

If it is changed to:

old.stringsAsFactors <- options()$stringsAsFactors
options(stringsAsFactors=TRUE)
rdata <- data.frame(eval(rcall, m))  ### <- seems to be critical
options(stringsAsFactors=old.stringsAsFactors)

it seems to work.

Heinz
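
A minimal sketch of the underlying issue, using toy data only and not the actual survexp internals: whether a text column ends up as character or as factor depends on stringsAsFactors, and floor() fails on character data. Passing the argument to data.frame() directly, as assumed here, would avoid touching the global option at all. The object names (vals, as_char, as_factor) are made up for illustration.

```r
# Toy stand-in for the evaluated ratetable() call; all names are hypothetical.
vals <- list(age = c(50, 60, 70), race = rep("white", 3))

as_char   <- data.frame(vals, stringsAsFactors = FALSE)
as_factor <- data.frame(vals, stringsAsFactors = TRUE)

print(class(as_char$race))    # text kept as a character column
print(class(as_factor$race))  # text converted to a factor column

# floor() on a character vector raises an error; capture its message
msg <- tryCatch(floor(as_char$race), error = function(e) conditionMessage(e))
print(msg)
```

Setting the argument in the call itself makes the result independent of whatever the user has in options(), which is presumably why the global setting should not matter inside package code.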


Re: [R] CSV value not being read as it appears

2011-01-14 Thread Heinz Tuechler

At 14.01.2011 07:09 -0800, Peter Ehlers wrote:

On 2011-01-14 02:09, bgr...@dyson.brisnet.org.au wrote:

Brian,

Thanks. My response to David follows. I should add that this problem has
never occurred previously as far as I know (I have now checked the
previous report I was sent):


This problem occurs to me frequently. Like Philipp and David,
I too always check imported categorical variables. The worst
cases are trailing spaces (in quoted text).



These are still the best worst cases. My favourite worst cases 
are entries like 5-10 or similar that are transformed into dates, 
e.g. 05Oct2011. My problem is, however, that I don't know any other 
universally known format to exchange data with a medical colleague 
or with a social scientist.


Heinz



It is hardly R's fault that Excel users routinely commit
crimes against data.

Peter Ehlers


Hello David,

Thanks for your e-mail. The data was a report derived from a statewide
database, saved in EXCEL format, so the usual issue of the vagaries of
human data entry variation wasn't the issue as the data was an automated
report, which is run every three months. I would not have even noticed
this problem if I hadn't been double checking the numbers of people by
district. Visual inspection didn't reveal this problem - no white space
was obvious and the spelling was identical. Tabulation via R wouldn't have
detected this - I was obtaining the EXCEL totals via filter which I then
compared with R output. I'm hoping I can skip this step, in future, with
Jim's suggestion.

regards

Bob







On Fri, 14 Jan 2011, David Scott wrote:


As a further note, this is a reminder that whenever you get data via
a spreadsheet the first thing to do is examine it and clean up any
problems. A basic requirement is to tabulate any categorical
variable. Spreadsheets allow any sort of data to be entered, with no
controls. My experience is that those who enter data into
spreadsheets enter all sorts of variations of what a human would
wish to treat as the same ("Open", "Open ", "open", etc.), even when
told not to.


Another common problem is that they enter characters such as
non-breaking space or zero-width characters: we added support for
known encodings of NBSP to strip.white about five years ago.



David Scott

On 14/01/2011 4:03 p.m., Jim Holtman wrote:

try strip.white=TRUE to strip out white space

Sent from my iPad
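
A small sketch of how such stray blanks could be detected, assuming made-up data in place of the real file (csv_text and the column names are illustrative only). It tabulates the values with visible quotes, flags values that change when trimmed, and then re-reads with strip.white=TRUE as suggested above. trimws() is assumed to be available, which requires a newer R than the one used in this thread.

```r
# Made-up data mimicking the problem: one "Open" value has a leading blank.
csv_text <- "Date,Status\n10/02/2010,Open\n22/08/2007, Open\n01/03/2009,Closed\n"

dec <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Quoting the values makes stray blanks visible in the tabulation
print(table(encodeString(dec$Status, quote = '"')))

# Rows whose value changes when surrounding whitespace is trimmed
suspect <- dec$Status != trimws(dec$Status)
print(dec[suspect, ])

# Re-reading with strip.white = TRUE removes the blanks on import
dec2 <- read.csv(text = csv_text, strip.white = TRUE, stringsAsFactors = FALSE)
print(table(dec2$Status))
```

This only catches ordinary blanks; other invisible characters such as non-breaking spaces may need separate handling.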

On Jan 13, 2011, at 21:44, bgr...@dyson.brisnet.org.au wrote:



I have a frustrating issue which I am hoping someone may have a
suggestion about.

I am running XP and R 2.12.0 and saved an EXCEL file that I was sent
as a csv file.

The initial code I ran follows.

dec <- read.csv("g://FMH/FO30122010.csv", header=T)
dec.open <- subset(dec, Status == "Open")
table(dec.open$AMHS)

I was checking the output and noticed a difference between my manual
count and R output. Two subjects' rows were not being detected by the
subset command.

For the AMHS where there was a discrepancy I then ran:
wm <- subset(dec, AMHS == "WM")

The problem appears to be that there is a space before the 'Open'
value for two individuals, as per the example below.

10/02/2010  Open
22/08/2007   Open

Checking in EXCEL there does not appear to be a space and the format
is the same (e.g. 'general').  I resolved the problem by copying over the
values for the two individuals where I identified a problem.

Given this problem was not detected by visual scanning I would
appreciate advice on how this problem can be detected in future without
my having to manually check raw data against R output.

Any assistance is appreciated,

Bob






--
_
David Scott Department of Statistics
The University of Auckland, PB 92019
Auckland 1142,NEW ZEALAND
Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055
Email:  d.sc...@auckland.ac.nz,  Fax: +64 9 373 7018

Director of Consulting, Department of Statistics



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, 

[R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?

2013-10-29 Thread Heinz Tuechler

Dear All,

Is it known that source() works much faster in R 2.15.2 than in R 3.0.2?
In the example below I observe, e.g. for a data.frame with 10^7 rows, the 
following timings:


R version 2.15.2 Patched (2012-11-29 r61184)
length: 1e+07
   user  system elapsed
  62.04    0.22   62.26

R version 3.0.2 Patched (2013-10-27 r64116)
length: 1e+07
   user  system elapsed
 388.63  176.42  566.41

Is there a way to speed R version 3.0.2 up to the performance of R 
version 2.15.2?


best regards,

Heinz Tüchler


example:
sessionInfo()
sample.vec <-
  c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
    'named', 'file', 'or', 'URL', 'or', 'connection')
dmp.size <- c(10^(1:7))
set.seed(37)

for(i in dmp.size) {
  df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
  dump('df0', file='testdump')
  cat('length:', i, '\n')
  print(system.time(source('testdump', keep.source = FALSE,
                           encoding='')))
}

output for R version 2.15.2 Patched (2012-11-29 r61184):

> sessionInfo()

R version 2.15.2 Patched (2012-11-29 r61184)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252
[3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
[5] LC_TIME=German_Switzerland.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

> sample.vec <-
+   c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
+     'named', 'file', 'or', 'URL', 'or', 'connection')
> dmp.size <- c(10^(1:7))
> set.seed(37)
>
> for(i in dmp.size) {
+   df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
+   dump('df0', file='testdump')
+   cat('length:', i, '\n')
+   print(system.time(source('testdump', keep.source = FALSE,
+                            encoding='')))
+ }
length: 10
   user  system elapsed
      0       0       0
length: 100
   user  system elapsed
      0       0       0
length: 1000
   user  system elapsed
      0       0       0
length: 10000
   user  system elapsed
   0.02    0.00    0.01
length: 1e+05
   user  system elapsed
   0.21    0.00    0.20
length: 1e+06
   user  system elapsed
   4.47    0.04    4.51
length: 1e+07
   user  system elapsed
  62.04    0.22   62.26





output for R version 3.0.2 Patched (2013-10-27 r64116):

> sessionInfo()

R version 3.0.2 Patched (2013-10-27 r64116)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252
[3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
[5] LC_TIME=German_Switzerland.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

> sample.vec <-
+   c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
+     'named', 'file', 'or', 'URL', 'or', 'connection')
> dmp.size <- c(10^(1:7))
> set.seed(37)
>
> for(i in dmp.size) {
+   df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
+   dump('df0', file='testdump')
+   cat('length:', i, '\n')
+   print(system.time(source('testdump', keep.source = FALSE,
+                            encoding='')))
+ }
length: 10
   user  system elapsed
      0       0       0
length: 100
   user  system elapsed
      0       0       0
length: 1000
   user  system elapsed
      0       0       0
length: 10000
   user  system elapsed
   0.01    0.00    0.01
length: 1e+05
   user  system elapsed
   0.36    0.06    0.42
length: 1e+06
   user  system elapsed
   6.02    1.86    7.88
length: 1e+07
   user  system elapsed
 388.63  176.42  566.41






--
Heinz Tüchler +4317146261 / +436605653878



Re: [R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?

2013-10-30 Thread Heinz Tuechler
All was run on the identical machine in independent sessions. I did not 
restart Windows. I also tried 32-bit R 3.0.2 and it seemed slightly 
faster than 64-bit.
Using Process Explorer v15.23 
(http://technet.microsoft.com/de-de/sysinternals/bb896653) my impression 
was that R 3.0.2 manages memory in a different way than R 2.15.2. When 
sourcing a big file, the physical memory used grows steadily in R 2.15.2, 
whereas in R 3.0.2 it cycles between growing and shrinking.


best,
Heinz

On 30.10.2013 13:28, Carl Witthoft wrote:

Did you run the identical code on the identical machine, and did you verify
there were no other tasks running which might have limited the RAM available
to R?  And equally important, did you run these tests in the reverse order
(in case R was storing large objects from the first run, thus chewing up
RAM)?




Re: [R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?

2013-10-30 Thread Heinz Tuechler
Many thanks for confirming my impression. I use dump for storing large 
data.frames with a number of attributes for each variable. save/load is 
much faster, but I am unsure whether such files will be readable by R 
versions years later.
What format/functions would you suggest for data storage/transfer 
between different (future) R versions?


best regards,
Heinz

On 30.10.2013 20:11, William Dunlap wrote:

I see a big 2.15.2/3.0.2 speed difference in parse() (which is used by source())
when it is parsing long vectors of numeric data.  dump/source has never been an
efficient way of transferring data between different R sessions, but it is much
worse now for long vectors.   In 2.15.2 doubling the size of the vector (of lengths
in the range 10^4 to 10^7) makes the time to parse go up by a factor of c. 2.1.
In 3.0.2 that factor is more like 4.4.

       n elapsed-2.15.2 elapsed-3.0.2
    2048          0.003         0.018
    4096          0.006         0.065
    8192          0.013         0.254
   16384          0.025         1.067
   32768          0.050         4.114
   65536          0.100        16.236
  131072          0.219        66.013
  262144          0.808       291.883
  524288          2.022      1285.265
 1048576          4.918            NA
 2097152          9.857            NA
 4194304         22.916            NA
 8388608         49.671            NA
16777216        101.042            NA
33554432        512.719            NA

I tried this with 64-bit R on a Linux box.  The NA's represent sizes that did
not finish while I was at a 1 1/2 hour dentist's appointment.  The timing function
was:
   test <- function(n = 2^(11:25))
   {
       tf <- tempfile()
       on.exit(unlink(tf))
       t(sapply(n, function(n){
           dput(log(seq_len(n)), file=tf)
           print(c(n=n, system.time(parse(file=tf))[1:3]))
       }))
   }
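
As a rough check, the doubling factors can be recomputed from the elapsed times in the table above. This sketch only uses the first nine sizes, for which both versions finished; the variable names (elapsed_old, elapsed_new, doubling) are illustrative and not part of the original timing code.

```r
# Elapsed times copied from the table above (n = 2^11 ... 2^19)
n           <- 2^(11:19)
elapsed_old <- c(0.003, 0.006, 0.013, 0.025, 0.050, 0.100, 0.219, 0.808, 2.022)
elapsed_new <- c(0.018, 0.065, 0.254, 1.067, 4.114, 16.236, 66.013,
                 291.883, 1285.265)

# Ratio of each timing to the timing at half the size
doubling  <- function(x) x[-1] / x[-length(x)]
ratio_old <- doubling(elapsed_old)
ratio_new <- doubling(elapsed_new)
print(round(cbind(n = n[-1], ratio_old, ratio_new), 2))

# Slope of log2(time) against log2(n): about 1 would mean linear growth,
# about 2 quadratic growth
slope_new <- unname(coef(lm(log2(elapsed_new) ~ log2(n)))[2])
print(slope_new)
```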

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com




Re: [R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?

2013-10-31 Thread Heinz Tuechler

on/am 31.10.2013 09:12, Prof Brian Ripley wrote/hat geschrieben:

On 30/10/2013 21:15, William Dunlap wrote:

I have to defer to others for policy declarations like how long
the current format used by load and save should be readable.


You could also ask how long R will last 

R can still read (but not write) save() formats used in the 1990's.  We
would expect R to be able to read saves since R 1.0.0 for as long as R
exists.  And as R is Open Source, you would be able to compile it and
dump the objects you want for as long as suitable compilers and OSes
exist.  And of course R is not the only application which will read
the format.

There is no guarantee that source() will be able to parse dumps from
earlier versions of R, and that has not always been true.

People commenting on parse() speed should note the NEWS for R-devel:

 • The parser has been modified to use less memory.



Thank you for the hint.
It appears to me that source() in R-devel performs at about the same 
speed as in R 2.15.2.
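
For readers comparing the storage options raised in this thread, here is a minimal sketch of both round trips on a small, made-up data frame. saveRDS()/readRDS() are not mentioned in the thread; they are used here as an assumption about what a binary alternative to dump()/source() could look like, and the timings will differ by machine and R version.

```r
# Hypothetical example data; not the data from the original post.
set.seed(37)
df0 <- data.frame(x = sample(c("a", "b", "c"), 1e4, replace = TRUE),
                  stringsAsFactors = FALSE)

dump_file <- tempfile(fileext = ".R")
rds_file  <- tempfile(fileext = ".rds")

# Text route: dump() writes R code, source() has to parse it again.
dump("df0", file = dump_file)
env <- new.env()
t_source <- system.time(source(dump_file, local = env, keep.source = FALSE))

# Binary route: saveRDS() serializes the object, readRDS() restores it.
saveRDS(df0, rds_file)
t_rds <- system.time(df_rds <- readRDS(rds_file))

print(rbind(source = t_source[1:3], readRDS = t_rds[1:3]))

# Both routes should give back an equivalent data frame.
print(all.equal(env$df0, df0))
print(identical(df_rds, df0))

unlink(c(dump_file, rds_file))
```

The text route has the advantage that the file stays human-readable; whether that outweighs the parsing cost depends on the size of the data.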


Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com



-Original Message-
From: Heinz Tuechler [mailto:tuech...@gmx.at]
Sent: Wednesday, October 30, 2013 1:43 PM
To: William Dunlap
Cc: Carl Witthoft; r-help@r-project.org
Subject: Re: [R] big speed difference in source btw. R 2.15.2 and R
3.0.2 ?

Best thanks for confirming my impression. I use dump for storing large
data.frames with a number of attributes for each variable. save/load is
much faster, but I am unsure, if such files will be readable by R
versions years later.
What format/functions would you suggest for data storage/transfer
between different (future) R versions?

best regards,
Heinz

on/am 30.10.2013 20:11, William Dunlap wrote/hat geschrieben:

I see a big 2.15.2/3.0.2 speed difference in parse() (which is used
by source()) when it is parsing long vectors of numeric data.
dump/source has never been an efficient way of transferring data
between different R sessions, but it is much worse now for long
vectors.  In 2.15.2 doubling the size of the vector (of lengths in
the range 10^4 to 10^7) makes the time to parse go up by a factor of
c. 2.1.  In 3.0.2 that factor is more like 4.4.

       n elapsed-2.15.2 elapsed-3.0.2
    2048          0.003         0.018
    4096          0.006         0.065
    8192          0.013         0.254
   16384          0.025         1.067
   32768          0.050         4.114
   65536          0.100        16.236
  131072          0.219        66.013
  262144          0.808       291.883
  524288          2.022      1285.265
 1048576          4.918            NA
 2097152          9.857            NA
 4194304         22.916            NA
 8388608         49.671            NA
16777216        101.042            NA
33554432        512.719            NA

I tried this with 64-bit R on a Linux box.  The NA's represent sizes
that did not finish while I was at a 1 1/2 hour dentist's appointment.
The timing function was:
test <- function(n = 2^(11:25))
{
    tf <- tempfile()
    on.exit(unlink(tf))
    t(sapply(n, function(n){
        dput(log(seq_len(n)), file=tf)
        print(c(n=n, system.time(parse(file=tf))[1:3]))
    }))
}
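
The growth factors quoted above can be read off the table by dividing each elapsed time by the one before it. A small sketch using the numbers from the table (the helper name growth is made up for illustration); a factor near 2 per doubling corresponds to roughly linear time, a factor near 4 to roughly quadratic time.

```r
# Elapsed times copied from the table above (n doubles from row to row).
n <- 2^(11:25)
elapsed_2152 <- c(0.003, 0.006, 0.013, 0.025, 0.050, 0.100, 0.219, 0.808,
                  2.022, 4.918, 9.857, 22.916, 49.671, 101.042, 512.719)
elapsed_302  <- c(0.018, 0.065, 0.254, 1.067, 4.114, 16.236, 66.013,
                  291.883, 1285.265, rep(NA, 6))

# Hypothetical helper: ratio of each timing to the previous one.
growth <- function(x) x[-1] / x[-length(x)]

ratios <- data.frame(n = n[-1],
                     ratio_2.15.2 = round(growth(elapsed_2152), 2),
                     ratio_3.0.2  = round(growth(elapsed_302), 2))
print(ratios)

# Typical factor per doubling, ignoring the runs that did not finish.
print(median(growth(elapsed_2152)))
print(median(growth(elapsed_302), na.rm = TRUE))
```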

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com



-Original Message-
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] On

Behalf

Of Carl Witthoft
Sent: Wednesday, October 30, 2013 5:29 AM
To: r-help@r-project.org
Subject: Re: [R] big speed difference in source btw. R 2.15.2 and R
3.0.2 ?

Did you run the identical code on the identical machine, and did
you verify
there were no other tasks running which might have limited the RAM
available
to R?  And equally important, did you run these tests in the
reverse order
(in case R was storing large objects from the first run, thus
chewing up
RAM)?




Re: [R] Comparing Cox model with Competing Risk model

2013-03-16 Thread Heinz Tuechler

Dear Terry,

I would be very happy to hear about the vignette as soon as it is 
ready. Will you send a note to r-help, or will it be announced in some 
other way?


best regards,

Heinz

On 08.03.2013 15:12, Terry Therneau wrote:

-- begin included message --
I have a competing risk data where a patient may die from either AIDS or
Cancer. I want to compare the cox model for each of the event of interest
with a competing risk model. In the competing risk model the cumulative
incidence function is used directly.

-end inclusion ---
  If you do want to pursue the Fine-Gray model I would suggest using
software that already exists.  Find the Task Views tab on CRAN, and
follow it to survival and then look at the competing risks section.
There is a lot to offer.  I would trust it more than rolling your own
function.

  As an aside, modeling the subdistribution function is ONE way of
dealing with competing risks, but not everyone thinks that it is the
best way to proceed.  The model corresponds to a biology that I find
unlikely, though it makes for nice math.  Since the alternative is
discussed in a vignette that I haven't-yet-quite-written we won't pursue
that any further, however. :-)

Terry Therneau

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] correlation between categorical data

2015-01-28 Thread Heinz Tuechler

comment inline

David Winsemius wrote on 24.01.2015 21:08:


On Jan 23, 2015, at 5:54 PM, JohnDee wrote:


Heinz Tuechler wrote

At 07:40 21.06.2009, J Dougherty wrote:

[...]

There are other ways of regarding the FET.  Since it is precisely what
it says - an exact test - you can argue that you should avoid carrying
over any conclusions drawn about the small population the test was
applied to and employing them in a broader context.  In so far as the
test is concerned, the sample data and the contingency table it is
arrayed in are the entire universe.  In that sense, the FET can't be
conservative or liberal.  It isn't actually a hypothesis test and
should not be thought of as one or used in the place of one.



JDougherty


Could you give some reference, supporting this, for me, surprising
view? I don't see a necessary connection between an exact test and
the idea that it does not test a hypothesis.

Thanks,
Heinz







Fisher's Exact Test is a nonparametric test.  It tests the distribution in
the contingency table against the total possible arrangements and gives you
the precise likelihood of that many items being arranged in that manner.


That's not the way I understand the construction of the result. The statistic 
gives rather the ratio of the number of permutations as extreme or more extreme 
(as measured by the odds ratio) while holding the marginals constant which is 
then divided by the total number of possible permutations of the data.



  No
more and no less.  You could argue about the greater population from which
your sample is drawn, but FET makes no assumptions at all about any greater
sample universe.


It is conditional on the margins, so that is the description of the universe.


  Also, since the population being used in FET is strictly
limited to the members of the contingency table, the results are a subset of
a finite group of possible results that are relevant to that specific
arrangement of data.  You are not estimating parameters of a parent
population or making any assumptions about the parent distribution.  You can
designate a p value such as 0.05 as a level of significance, but there is
no error term in the FET result.  Fisher stated that the test DOES assume
a null hypothesis of independence to a hypergeometric distribution of the
cell members.  But that creates other issues if you are attempting to use
the results in conjunction with assumptions about a broader sample universe
than that in the test.  For instance you have to carry the assumption of a
hypergeometric distribution over in to the land of reality your sample is
drawn from and you then have to justify that.

In this respect I agree. A real world situation with a universe of fixed 
margins seems unusual to me.
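
The description above (tables at least as extreme as the observed one, with the margins held fixed) can be checked numerically. A minimal sketch for a made-up 2x2 table, assuming that R's two-sided fisher.test() sums the hypergeometric probabilities of all tables that are no more probable than the observed one:

```r
# Made-up 2x2 table; rows and columns are arbitrary categories.
tab <- matrix(c(3, 1, 1, 3), nrow = 2)

m <- sum(tab[, 1])   # first column total
n <- sum(tab[, 2])   # second column total
k <- sum(tab[1, ])   # first row total
x <- tab[1, 1]       # observed count in the top-left cell

# With all margins fixed, the top-left cell determines the whole table.
support <- max(0, k - n):min(k, m)
probs   <- dhyper(support, m, n, k)

# Two-sided p-value: total probability of tables that are no more
# probable than the observed one (small tolerance for rounding).
p_obs    <- probs[support == x]
p_manual <- sum(probs[probs <= p_obs * (1 + 1e-7)])

p_ft <- fisher.test(tab)$p.value
print(c(manual = p_manual, fisher.test = p_ft))
```

Everything in the calculation depends only on the margins and the observed cell, which is the sense in which the test is conditional on the margins.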




And this is off-topic on R-help.

Sorry for asking a question off-topic more than five years ago. A nice 
surprise to get an answer.

Thanks,
Heinz




--
View this message in context: 
http://r.789695.n4.nabble.com/correlation-between-categorical-data-tp888975p4702235.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius
Alameda, CA, USA






Re: [R] Probable Error in fmsb package

2015-12-31 Thread Heinz Tuechler



Anindya Sankar Dey wrote/hat geschrieben on/am 30.12.2015 07:35:

Hi All,

The fmsb package has a function called Variance Inflation Factor and it
states the definition of the function as follows:-

"To evaluate multicolinearity of multiple regression model, calculating the
variance inflation factor (VIF) from the result of lm(). If VIF is more
than 10, multicolinearity is strongly suggested.
"

The function computes the VIF of a model as 1/(1-R^2), where R^2 is the
coefficient of determination.

Nowhere in the literature have I come across this definition of VIF, as VIF
is always computed at the individual variable level. Though the structure is
almost the same, R^2 in the theoretical VIF is the partial correlation
coefficient.

I only became aware of this when lots of freshers from a non-statistics
background whom I interviewed for an analytics position answered that the
only definition of VIF they know is 1/(1 - Coeff. of Determination), and
that there is an R package which calculates VIF like that.

After some research I found that such a function does indeed exist in the
fmsb package.

Please help me understand: has an alternative definition of the Variance
Inflation Factor ever emerged in theory? Does it really make sense to have a
VIF at the model level, as it does not help in solving the problem of
multicollinearity during model building?

And if I am right, what steps should I take about it?



Dear Anindya,

to me it seems clear from the example on the help page that VIF() is not 
intended to be applied to the model of interest, but to separate models 
for each covariable.


The model of interest in the example is
# the target multiple regression model
res <- lm(Ozone ~ Wind+Temp+Solar.R, data=airquality)

The VIF is calculated on submodels for each covariate.
# checking multicolinearity for independent variables.
VIF(lm(Wind ~ Temp+Solar.R, data=airquality))
VIF(lm(Temp ~ Wind+Solar.R, data=airquality))
VIF(lm(Solar.R ~ Wind+Temp, data=airquality))

Does that agree with your usual definition of a variance inflation factor?
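
To make the per-covariate reading concrete, here is a minimal sketch in base R that does not need fmsb. The helper vif_from_lm is made up for illustration and simply applies 1/(1-R^2) to whatever lm fit it is given; it is not claimed to be the package's implementation.

```r
# Hypothetical helper: VIF computed from the R^2 of an auxiliary regression.
vif_from_lm <- function(fit) 1 / (1 - summary(fit)$r.squared)

# One auxiliary regression per covariate of the target model
# Ozone ~ Wind + Temp + Solar.R (airquality ships with R).
vifs <- c(
  Wind    = vif_from_lm(lm(Wind ~ Temp + Solar.R, data = airquality)),
  Temp    = vif_from_lm(lm(Temp ~ Wind + Solar.R, data = airquality)),
  Solar.R = vif_from_lm(lm(Solar.R ~ Wind + Temp, data = airquality))
)
print(round(vifs, 3))
```

Note that lm() drops rows with missing values separately for each call, so the auxiliary fits may use slightly different rows than the target model.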

best regards,

Heinz


Re: [R] p values from GLM

2016-04-02 Thread Heinz Tuechler


Bert Gunter wrote on 01.04.2016 23:46:

... of course, whether one **should** get them is questionable...

http://www.nature.com/news/statisticians-issue-warning-over-misuse-of-p-values-1.19503#/ref-link-1

This paper repeats the commonplace statement that a small p-value does 
not necessarily indicate an important finding. Agreed, but maybe I 
overlooked examples of important findings with large p-values.
If there are any, I would be happy to learn about them. 
Otherwise a small p-value is no guarantee of importance, but a prerequisite.


best regards,

Heinz



Cheers,
Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Fri, Apr 1, 2016 at 3:26 PM, Duncan Murdoch  wrote:

On 01/04/2016 6:14 PM, John Sorkin wrote:

How can I get the p values from a glm ? I want to get the p values so I
can add them to a custom report


   fitwean <- glm(data[,"JWean"]~data[,"Group"],data=data,
                  family=binomial(link ="logit"))
   summary(fitwean) # This lists the coefficients, SEs, z and p
                    # values, but I can't isolate the p values.
   names(summary(fitwean))  # I see the coefficients, but not the p values
   names(fitmens)  # p values are not found here.


Doesn't summary(fitwean) give a matrix? Then it's
colnames(summary(fitwean)$coefficients) you want, not names(fitwean).

Duncan Murdoch

P.S. If you had given a reproducible example, I'd try it myself.
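
Since no reproducible example was given, here is a minimal sketch of the suggestion above on simulated data (the data frame d and its columns are made up). The coefficient table of summary() is a matrix, so the p-values can be taken from its last column by name.

```r
# Simulated stand-in for the original data (not the poster's data).
set.seed(1)
d <- data.frame(group = gl(2, 25, labels = c("ctrl", "trt")))
d$wean <- rbinom(50, 1, ifelse(d$group == "trt", 0.7, 0.4))

fit <- glm(wean ~ group, data = d, family = binomial(link = "logit"))

# The coefficient table is a numeric matrix with named columns.
coefs <- coef(summary(fit))
print(colnames(coefs))

# For a binomial glm the p-value column is named "Pr(>|z|)".
pvals <- coefs[, "Pr(>|z|)"]
print(pvals)
```

For families with an estimated dispersion (for example gaussian), the test is a t-test and the column is named "Pr(>|t|)" instead, so selecting the last column by position may be more robust in a generic report.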





Thank you!
John

John David Sorkin M.D., Ph.D.
Professor of Medicine
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology and
Geriatric Medicine
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

Confidentiality Statement:
This email message, including any attachments, is for the sole use of the
intended recipient(s) and may contain confidential and privileged
information. Any unauthorized use, disclosure or distribution is prohibited.
If you are not the intended recipient, please contact the sender by reply
email and destroy all copies of the original message.









Re: [R] post_processor in rmarkdown not working

2017-09-07 Thread Heinz Tuechler

Are you sure that you want to read in the output_file in

text <- readLines(output_file, warn = FALSE)?

best regards,

Heinz

Thierry Onkelinx wrote/hat geschrieben on/am 06.09.2017 11:41:

Dear all,

I'm trying to write a post_processor() for a custom rmarkdown format. The
goal of the post_processor() is to modify the latex file before it is
compiled. For some reason the post_processor() is not run. The
post_processor() does work when I run it manually on the tex file.

Any suggestions on what I'm doing wrong? Below is the relevant snippet of
the code. The full code is available at
https://github.com/inbo/INBOmd/blob/post_processor/R/rsos_article.R
https://github.com/inbo/INBOmd/blob/post_processor/inst/rmarkdown/templates/rsos_article/skeleton/skeleton.Rmd
is an Rmd MWE that fails to compile because the post_processor() is not
run.

Best regards,

Thierry

  post_processor <- function(
    metadata, input_file, output_file, clean, verbose
  ) {
    text <- readLines(output_file, warn = FALSE)

    # set correct text in fmtext environment
    end_first_page <- grep("EndFirstPage", text) #nolint
    if (length(end_first_page) == 1) {
      maketitle <- grep("maketitle", text) #nolint
      text <- c(
        text[1:(maketitle - 1)],
        "\\begin{fmtext}",
        text[(maketitle + 1):(end_first_page - 1)],
        "\\end{fmtext}",
        "\\maketitle",
        text[(end_first_page + 1):length(text)]
      )
      writeLines(enc2utf8(text), output_file, useBytes = TRUE)
    }
    output_file
  }

  output_format(
    knitr = knitr_options(
      opts_knit = list(
        width = 60,
        concordance = TRUE
      ),
      opts_chunk = opts_chunk,
      knit_hooks = knit_hooks
    ),
    pandoc = pandoc_options(
      to = "latex",
      latex_engine = "xelatex",
      args = args,
      keep_tex = keep_tex
    ),
    post_processor = post_processor,
    clean_supporting = !keep_tex
  )



ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

[[alternative HTML version deleted]]




--
Heinz Tüchler +436605653878


Re: [R] boot.stepAIC fails with computed formula

2017-08-23 Thread Heinz Tuechler
It seems that if you build the formula as a character string, and 
postpone the "as.formula" into the lm call, it works.


instead of
frm1 <- as.formula(paste(trg,"~1"))
use
frm1a <- paste(trg,"~1")
and then
strt <- lm(as.formula(frm1a),dat)

regards,

Heinz

Stephen O'hagan wrote/hat geschrieben on/am 23.08.2017 12:07:

Until I get a fix that works, a work-around would be to rename the 'y1' column, 
used a fixed formula, and rename it back afterwards.

Thanks for your help.
SGO.

-Original Message-
From: Bert Gunter [mailto:bgunter.4...@gmail.com]
Sent: 22 August 2017 20:38
To: Stephen O'hagan 
Cc: r-help@r-project.org
Subject: Re: [R] boot.stepAIC fails with computed formula

OK, here's the problem. Continuing with your example:

strt1 <- lm(y1 ~1, dat)
strt2 <- lm(frm1,dat)



strt1


Call:
lm(formula = y1 ~ 1, data = dat)

Coefficients:
(Intercept)
  41.73


strt2


Call:
lm(formula = frm1, data = dat)

Coefficients:
(Intercept)
  41.73


Note that the formula objects of the lm object are different: strt2 does not 
evaluate the formula. So presumably boot.step.AIC does no evaluation and 
therefore gets confused with the errors you saw. So you need to get the 
evaluated formula into the lm object. This can be done, e.g. via:


strt2 <- eval(substitute(lm(form,data = dat), list(form = frm1)))


## yielding


strt2


Call:
lm(formula = y1 ~ 1, data = dat)

Coefficients:
(Intercept)
  41.73

So this looks like it should fix the problem, but alas no, the boot.stepAIC 
call still fails with the same error message. Here's why:


identical(strt$call, strt2$call)

[1] FALSE

So one might rightfully ask, what the heck is going on here?! Further digging:


str(strt$call)

 language lm(formula = y1 ~ 1, data = dat)


str(strt2$call)

 language lm(formula = y1 ~ 1, data = dat)

These certainly look identical! -- but of course they're not:


names(strt$call)

[1] ""        "formula" "data"

names(strt2$call)

[1] ""        "formula" "data"

So the difference must lie in the formula component, right? ...


strt$call$formula

y1 ~ 1

strt2$call$formula

y1 ~ 1

So, thus far, huhh? But..


class(strt2$call$formula)

[1] "formula"


class(strt$call$formula)

[1] "call"

So I think therein lies the critical difference that is screwing things up. 
NOTE: If I am wrong about this someone **PLEASE** correct me.
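
The difference described above can be reproduced without bootStepAIC. A minimal sketch on made-up data that only inspects what lm() stores in its call for three ways of passing the same formula; it does not show how boot.stepAIC uses that call.

```r
# Made-up data; only the stored call is of interest here.
dat  <- data.frame(y1 = c(1, 2, 3, 4), x1 = c(2, 1, 4, 3))
frm1 <- as.formula("y1 ~ 1")

# 1. Formula typed literally: the call holds the unevaluated expression.
strt_literal <- lm(y1 ~ 1, dat)

# 2. Formula passed through a variable: the call holds the variable's name.
strt_symbol <- lm(frm1, dat)

# 3. Formula object substituted into the call before evaluation.
strt_subst <- eval(substitute(lm(form, data = dat), list(form = frm1)))

print(class(strt_literal$call$formula))
print(class(strt_symbol$call$formula))
print(class(strt_subst$call$formula))

# All three deparse to something that looks alike or is a plain name,
# which is why printing the calls hides the difference.
print(deparse(strt_literal$call$formula))
print(deparse(strt_subst$call$formula))
```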

I see no clear workaround for this other than to explicitly avoid
passing a formula in the lm() call with y~1 or y ~ .   I think the
real fix is to make the  boot.stepAIC function smarter in how it handles its 
formula argument, and that is above my paygrade (and degree of interest) . You 
should probably email the maintainer, who may not monitor this list. But give 
it a day or so to give someone else a chance to correct me if I'm wrong.


HTH.

Cheers,

Bert
Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking 
things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Aug 22, 2017 at 8:17 AM, Stephen O'hagan  
wrote:

I'm trying to use boot.stepAIC for feature selection; I need to be able to 
specify the name of the dependent variable programmatically, but this appear to 
fail:

In R-Studio with MS R Open 3.4:

library(bootStepAIC)

#Fake data
n<-200

x1 <- runif(n, -3, 3)
x2 <- runif(n, -3, 3)
x3 <- runif(n, -3, 3)
x4 <- runif(n, -3, 3)
x5 <- runif(n, -3, 3)
x6 <- runif(n, -3, 3)
x7 <- runif(n, -3, 3)
x8 <- runif(n, -3, 3)
y1 <- 42+x3 + 2*x6 + 3*x8 + runif(n, -0.5, 0.5)

dat <- data.frame(x1,x2,x3,x4,x5,x6,x7,x8,y1)
#the real data won't have these names...

cn <- names(dat)
trg <- "y1"
xvars <- cn[cn!=trg]

frm1<-as.formula(paste(trg,"~1"))
frm2<-as.formula(paste(trg,"~ 1 + ",paste(xvars,collapse = "+")))

strt=lm(y1~1,dat) # boot.stepAIC Works fine

#strt=do.call("lm",list(frm1,data=dat)) ## boot.stepAIC FAILS ##

#strt=lm(frm1,dat) ## boot.stepAIC FAILS ##

limit<-5


stp=stepAIC(strt,direction='forward',steps=limit,
scope=list(lower=frm1,upper=frm2))

bst <- boot.stepAIC(strt,dat,B=50,alpha=0.05,direction='forward',steps=limit,
scope=list(lower=frm1,upper=frm2))

b1 <- bst$Covariates
ball <- data.frame(b1)
names(ball)=unlist(trg)

Any ideas?

Cheers,
SOH


[[alternative HTML version deleted]]






Re: [R] SE for all fixed factor effect in GLMM

2018-12-29 Thread Heinz Tuechler
maybe qvcalc https://cran.r-project.org/web/packages/qvcalc/index.html 
is useful for you.


Marc Girondot via R-help wrote/hat geschrieben on/am 30.12.2018 05:31:

Dear members,

Let us take an example of a simple GLMM with x and G as fixed factors and
R as a random factor:

(note that the question is the same with a GLM or even an LM):

x <- rnorm(100)
y <- rnorm(100)
G <- as.factor(sample(c("A", "B", "C", "D"), 100, replace = TRUE))
R <- as.factor(rep(1:25, 4))

library(lme4)

m <- lmer(y ~ x + G + (1 | R))
summary(m)$coefficients

I get the fixed effect fit and their SE


summary(m)$coefficients

               Estimate Std. Error    t value
(Intercept)  0.07264454  0.1952380  0.3720820
x           -0.02519892  0.1238621 -0.2034433
GB           0.10969225  0.3118371  0.3517614
GC          -0.09771555  0.2705523 -0.3611706
GD          -0.12944760  0.2740012 -0.4724344

The estimate for GA is not shown as it is fixed to 0. Normal, it is the
reference level.

But is there a way to get an SE for GA, or is it a nonsense question
because GA is fixed to 0?

__

I propose here a solution but I don't know if it is correct. It is based
on reordering levels and averaging se for all reordering:

G <- relevel(G, "A")
m <- lmer(y ~ x + G + (1 | R))
sA <- summary(m)$coefficients

G <- relevel(G, "B")
m <- lmer(y ~ x + G + (1 | R))
sB <- summary(m)$coefficients

G <- relevel(G, "C")
m <- lmer(y ~ x + G + (1 | R))
sC <- summary(m)$coefficients

G <- relevel(G, "D")
m <- lmer(y ~ x + G + (1 | R))
sD <- summary(m)$coefficients

# mean() averages only its first argument, so the values are combined with c()
seA <- mean(c(sB["GA", "Std. Error"], sC["GA", "Std. Error"],
              sD["GA", "Std. Error"]))
seB <- mean(c(sA["GB", "Std. Error"], sC["GB", "Std. Error"],
              sD["GB", "Std. Error"]))
seC <- mean(c(sA["GC", "Std. Error"], sB["GC", "Std. Error"],
              sD["GC", "Std. Error"]))
seD <- mean(c(sA["GD", "Std. Error"], sB["GD", "Std. Error"],
              sC["GD", "Std. Error"]))

seA; seB; seC; seD
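
A side note on the averaging step: mean() treats only its first argument as data, and further positional arguments are matched to trim and na.rm. A minimal sketch with made-up standard errors:

```r
# Made-up standard errors, for illustration only.
se <- c(0.31, 0.27, 0.29)

# Here 0.27 is taken as 'trim' and 0.29 as 'na.rm', so only 0.31 is averaged.
wrong <- mean(se[1], se[2], se[3])

# Combining the values into one vector averages all three.
right <- mean(c(se[1], se[2], se[3]))

print(c(wrong = wrong, right = right))
```

This only concerns the arithmetic; whether an average of standard errors across reference levels is a meaningful quantity is a separate question.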


Thanks,

Marc






Re: [R] Value Labels: SPSS Dataset to R

2020-02-07 Thread Heinz Tuechler

Maybe it helps searching at https://rseek.org/ for "SPSS to R transition
value labels".
In particular
https://cran.r-project.org/web/packages/expss/vignettes/labels-support.html
seems useful, as well as
https://www.r-bloggers.com/migrating-from-spss-to-r-rstats/

best regards,
Heinz

Jim Lemon wrote on 07.02.2020 22:58:

Hi Yawo,

From your recent post, you say you have coerced the variables to
factors. If so, perhaps:

as.character(x) is what you want.

If not, creating a new variable like this:

Scratch$new_race<-factor(as.character(Scratch$race),levels=c("WHITE","BLACK"))

may do it. Note the "levels" argument to get the numeric values in the
same order as the original.

Jim

On Sat, Feb 8, 2020 at 7:32 AM Yawo Kokuvi  wrote:


Thanks for all your assistance

Attached please is the Rdata scratch I have been using

-


head(Scratch, n=13)

# A tibble: 13 x 6
      ID marital           sex        race      paeduc speduc
 1     1 3 [DIVORCED]      1 [MALE]   1 [WHITE]     NA     NA
 2     2 1 [MARRIED]       1 [MALE]   1 [WHITE]     NA     NA
 3     3 3 [DIVORCED]      1 [MALE]   1 [WHITE]      4     NA
 4     4 4 [SEPARATED]     1 [MALE]   1 [WHITE]     16     NA
 5     5 3 [DIVORCED]      1 [MALE]   1 [WHITE]     18     NA
 6     6 1 [MARRIED]       2 [FEMALE] 1 [WHITE]     14     20
 7     7 1 [MARRIED]       2 [FEMALE] 2 [BLACK]     NA     12
 8     8 1 [MARRIED]       2 [FEMALE] 1 [WHITE]     NA     12
 9     9 3 [DIVORCED]      2 [FEMALE] 1 [WHITE]     11     NA
10    10 1 [MARRIED]       2 [FEMALE] 1 [WHITE]     16     12
11    11 5 [NEVER MARRIED] 2 [FEMALE] 2 [BLACK]     NA     NA
12    12 3 [DIVORCED]      2 [FEMALE] 2 [BLACK]     NA     NA
13    13 3 [DIVORCED]      2 [FEMALE] 2 [BLACK]     16     NA

-

and below is my script/command file.

#1: Load library and import SPSS dataset
library(haven)
Scratch <- read_sav("~/Desktop/Scratch.sav")

#2: save the dataset with a name
save(Scratch, file="Scratch.Rdata")

#3: install & load necessary packages for descriptive statistics
install.packages ("freqdist")
library (freqdist)

install.packages ("sjlabelled")
library (sjlabelled)

install.packages ("labelled")
library (labelled)

install.packages ("surveytoolbox")
library (surveytoolbox)

#4: Check the value labels of gender and marital status
Scratch$sex %>% attr('labels')
Scratch$marital %>% attr('labels')

#5:  Frequency Distribution and BarChart for Categorical/Ordinal Level
#    Variables such as Gender - SEX
freqdist(Scratch$sex)
barplot(table(Scratch$marital))

-

As you can see from above, I use the haven package to import the data
from SPSS.  Apparently, the haven function keeps the value labels, as the
attribute options in section #4 of my script shows.
The problem is that when I run frequency distribution for any of the
categorical variables like sex or marital status, only the numbers (1, 2,)
are displayed in the output.  The labels (male, female) for example are not.

Is there any way to force these to be shown in the output?  Is there a
global property that I have to set so that these value labels are reliably
displayed with every output?  I read I can declare them as factors using
the , but once I do so, how do I invoke them in my commands so
that the value labels show...

Sorry about all the noob questions, but hopefully I am able to get this
working.

Thanks in advance.


Thanks - cY
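
One base-R way to make the value labels appear in tables is to turn the labelled codes into a factor whose levels carry the label text. The sketch below imitates a haven-style variable with a plain numeric vector and a "labels" attribute (made-up data, not the Scratch file); with haven itself, as_factor() is meant to do this conversion in one step.

```r
# Made-up stand-in for a labelled SPSS variable: numeric codes plus a
# named "labels" attribute, similar to what section #4 of the script shows.
sex <- structure(c(1, 1, 2, 2, 2), labels = c(MALE = 1, FEMALE = 2))

labs <- attr(sex, "labels")

# Codes become the factor levels, label names become the displayed text.
sex_f <- factor(as.vector(sex), levels = unname(labs), labels = names(labs))

print(table(sex))    # shows the codes 1 and 2
print(table(sex_f))  # shows MALE and FEMALE
```

Once a variable is a factor, functions such as table() and barplot(table(...)) pick up the labels without any global setting.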


On Fri, Feb 7, 2020 at 1:14 PM  wrote:


I've never used it, but there is a labels function in haven...

On 7 Feb 2020 17:05, Bert Gunter  wrote:

What does your data look like after importing? -- see ?head and ?str to
tell us. Show us the code that failed to provide "labels." See the posting
guide below for how to post questions that are likely to elicit helpful
responses.

I know nothing about the haven package, but see ?factor or go through an R
tutorial or two to learn about factors, which may be part of the issue
here. R *generally* obtains whatever "label" info it needs from the object
being tabled -- see ?tabulate, ?table etc. -- if that's what you're doing.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Fri, Feb 7, 2020 at 8:28 AM Yawo Kokuvi  wrote:


Hello,

I am just transitioning from SPSS to R.

I used the haven library to import some of my spss data files to R.

However, when I run procedures such as frequencies or crosstabs, value
labels for categorical variables such as gender (1=male, 2=female) are
not shown. The same applies to many other outputs.

I am confused.

1. Is there a global setting that I can use to force all categorical
variables to display labels?

2. Or, are these 

Re: [R] My dream ...

2020-05-12 Thread Heinz Tuechler

Abby Spurdle wrote/hat geschrieben on/am 12.05.2020 10:38:

In my opinion the advantage of computers is not Artificial
Intelligence, but rather Artificial Patience (most AI that I have seen
is really doing a bunch of what I would consider to be boring, really
fast so people don't have to).  Leave the Intelligence to the people.


Hmmm...
https://en.wikipedia.org/wiki/Artificial_intelligence_in_video_games

Also, I found the following while searching for battle chess:
https://youtu.be/hBNG7444lOw

(Warning: Contains aggressive chess tactics).

Also, correct me if I'm wrong, but doesn't Emacs have historical
connections to AI research...?


Maybe a matter of definition, but admittedly I have to use a lot of my
intelligence for doing boring work.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] Some seemingly odd behavior of survfit in library survival

2020-03-17 Thread Heinz Tuechler

Jeff Newmiller wrote/hat geschrieben on/am 17.03.2020 05:39:

The coxph function appears to rely on finding the name of the data argument in 
the environment in which the formula was created. The lm function does not have 
this problem.

Oh, and df is the name of the F distribution density function, which explains why the 
error complained about a "closure".

new <- as.data.frame(list(t=1:4, d=rep(1,4), s=c(1,0,0,1)))
library(survival)
mine <- function(ff, df){
  fit <- coxph(as.formula(ff), data=df)
  survfit(fit, df)
}
mine("Surv(t,d)~s", new)



Therefore a workaround could be to use the character string for the
formula as argument and apply as.formula() within the function.

mine2 <- function(fstr, df){
  # build the formula here, so that its environment is this function's frame
  fit <- coxph(as.formula(fstr), data=df)
  out <- survfit(fit, df)
  out
}

mine2("Surv(t,d)~s", new)
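
To see why it may matter where the formula is created, here is a small
base-R sketch (no survival package needed). It assumes the explanation
quoted above is right, i.e. that the name df is looked up via the formula's
environment. The helper names df_seen_from, outer_lookup and inner_lookup
are made up for this illustration.

```r
# Base-R sketch; all helper names are hypothetical.
new <- data.frame(t = 1:4, d = rep(1, 4), s = c(1, 0, 0, 1))

# Report what the name `df` resolves to, seen from a formula's environment.
df_seen_from <- function(f) class(get("df", envir = environment(f)))[1]

# Formula built by the caller: its environment is the calling environment,
# where `df` normally resolves to stats::df, a function (closure).
outer_formula <- as.formula("Surv(t, d) ~ s")
outer_lookup <- function(ff, df) df_seen_from(ff)

# Formula built inside the function: its environment is the function's
# frame, where `df` is the data frame passed as an argument.
inner_lookup <- function(fstr, df) df_seen_from(as.formula(fstr))

outer_lookup(outer_formula, new)      # "function", unless a `df` object exists in the caller
inner_lookup("Surv(t, d) ~ s", new)   # "data.frame"
```

This only shows the lookup; it does not reproduce what coxph() or survfit()
do internally.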

Heinz


On March 16, 2020 7:23:26 PM PDT, John Kolassa  wrote:

I ran across an issue that looks like variable scoping in survfit is
not acting as I would expect.  Here's a minimal example:

new<-as.data.frame(list(t=1:4,d=rep(1,4),s=c(1,0,0,1)))
library(survival)
mine<-function(ff,df){
   fit<-coxph(ff,data=df)
   out<-survfit(fit,df)
}
mine(as.formula("Surv(t,d)~s"),new)

I would expect this to fit the proportional hazards regression model
using formula Surv(t,d)~s, using data set new, and then calculate a
separate fitted survival curve for each member of the data set.
Instead I get an error

Error in eval(predvars, data, env) :
 invalid 'envir' argument of type 'closure'

The code runs without error if I modify it by copying the data set new
to the local variable within the function mine before running:

new<-as.data.frame(list(t=1:4,d=rep(1,4),s=c(1,0,0,1)))
library(survival)
mine<-function(ff,df){
   fit<-coxph(ff,data=df)
   out<-survfit(fit,df)
}
df<-new
mine(as.formula("Surv(t,d)~s"),new)

which leads me to believe that there's some variable scoping error.
Can anyone point out what I'm doing wrong?  Thanks, John







Re: [R] How to convert European short dates to ISO format?

2020-06-10 Thread Heinz Tuechler

maybe
isoDates <- as.Date(oriDates, format = "%d/%m/%y")
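
A short sketch of that suggestion, using a few of the dates from the
question below. The variable isoChar is only introduced here for
illustration.

```r
# Day comes first in the input, so the format string is %d/%m/%y.
oriDates <- c("23/01/20", "31/01/20", "01/02/20", "07/02/20")
isoDates <- as.Date(oriDates, format = "%d/%m/%y")

# Date objects already print in ISO 8601 form; format() gives character strings.
isoChar <- format(isoDates, "%Y-%m-%d")
isoChar
# [1] "2020-01-23" "2020-01-31" "2020-02-01" "2020-02-07"

# With month first, "23" is not a valid month, hence the NAs in the question.
anyNA(as.Date(oriDates, format = "%m/%d/%y"))   # TRUE
```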

Heinz

Luigi Marongiu wrote/hat geschrieben on/am 10.06.2020 10:20:

Hello,
I have been trying to convert European short dates formatted as
dd/mm/yy into ISO 8601 format, but the function as.Date interprets them
as American ones (mm/dd/yy), thus I get:

```
oriDates = c("23/01/20", "24/01/20", "25/01/20", "26/01/20",
"27/01/20", "28/01/20", "29/01/20", "30/01/20",
 "31/01/20", "01/02/20", "02/02/20", "03/02/20",
"04/02/20", "05/02/20", "06/02/20", "07/02/20")
isoDates = as.Date(oriDates, format = "%m/%d/%y")

isoDates

 [1] NA           NA           NA           NA           NA           NA           NA
 [8] NA           NA           "2020-01-02" "2020-02-02" "2020-03-02" "2020-04-02" "2020-05-02"
[15] "2020-06-02" "2020-07-02"
```

How can I convert properly?





Re: [R] svglite with multiple files

2020-12-09 Thread Heinz Tuechler

Dear Bert,

of course I have read the posting guide, almost two decades ago, but
usually I focus on reading carefully a posting before replying.
To be more precise, I did *not* ask about the svglite-package, as it
works as described.
My question was: "Is there a simple solution to make svglite() work like
svg() to produce several files?"
To make it more explicit for you, I could add "using standard packages
distributed with R".
So maybe you or others have some ideas on how to solve that question in a
convenient way. As mentioned, I know complicated solutions, e.g. calling
svglite() before and dev.off() after every plot.
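
For the record, that per-plot workaround could be wrapped roughly as
follows. This is only a sketch: plot_to_files is a made-up helper, not part
of svglite or grDevices, and it assumes the device function takes the file
name as its first argument (as svg() and svglite() do).

```r
# Hypothetical helper: open a fresh device for each plot and close it afterwards.
# plot_funs is a list of functions, each drawing one plot.
plot_to_files <- function(plot_funs, pattern = "Rplot%03d.svg",
                          device = svglite::svglite, ...) {
  files <- sprintf(pattern, seq_along(plot_funs))
  for (i in seq_along(plot_funs)) {
    device(files[i], ...)                          # one device per file
    tryCatch(plot_funs[[i]](), finally = dev.off())  # always close the device
  }
  invisible(files)
}
```

With svglite installed, plot_to_files(list(function() plot(1), function()
plot(2), function() plot(3))) should write Rplot001.svg to Rplot003.svg.
It is still not as convenient as svg("Rplot%03d.svg").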

best,
Heinz


Bert Gunter wrote/hat geschrieben on/am 09.12.2020 17:35:

Sigh... Per the posting guide (which you have read, right?):

"For questions about functions in standard packages distributed with R (see
the FAQ Add-on packages in R
<http://cran.r-project.org/doc/FAQ/R-FAQ.html#Add-on-packages-in-R>), ask
questions on R-help. If the question relates to a *contributed package*,
e.g., one downloaded from CRAN, try contacting the package maintainer
first. You can also use find("functionname") and
packageDescription("packagename") to find this information. *Only* send
such questions to R-help or R-devel if you get no reply or need further
assistance. This applies to both requests for help and to bug reports."

This certainly sounds like a question for the svglite maintainer,
?maintainer, who might know about "tricks" that one could use. Though you
might get lucky here -- it's just that you should not expect to. If you
tried to contact the maintainer but received no response, do include that
info in your post.

Cheers,

Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Wed, Dec 9, 2020 at 3:33 AM Heinz Tuechler  wrote:


Dear All,

while svg() (package grDevices) can produce several files, svglite()
(package svglite) is limited to one file/page only (as documented in the
respective help page).
Is there a simple solution to make svglite() work like svg() to produce
several files?
Of course one could call svglite() before and dev.off() after every plot.

best regards,

Heinz

## example
svg("Rplot%03d.svg")
plot(1)
plot(2)
plot(3)
dev.off()
## three files Rplot001.svg, Rplot002.svg, Rplot003.svg are produced

library(svglite)
svglite("Rplot-lite.svg")
plot(1)
plot(2) ## as documented: Error in plot.new() : svglite only supports one page








[R] svglite with multiple files

2020-12-09 Thread Heinz Tuechler

Dear All,

while svg() (package grDevices) can produce several files, svglite()
(package svglite) is limited to one file/page only (as documented in the
respective help page).
Is there a simple solution to make svglite() work like svg() to produce
several files?
Of course one could call svglite() before and dev.off() after every plot.

best regards,

Heinz

## example
svg("Rplot%03d.svg")
plot(1)
plot(2)
plot(3)
dev.off()
## three files Rplot001.svg, Rplot002.svg, Rplot003.svg are produced

library(svglite)
svglite("Rplot-lite.svg")
plot(1)
plot(2) ## as documented: Error in plot.new() : svglite only supports one page



Re: [R] Inappropriate color name

2020-11-19 Thread Heinz Tuechler

inline - David Wright wrote on 19.11.2020 12:39:

Appropriation of Indian Red as 'Chestnut' (or other alternative) will
be viewed by some as 'making appropriate' the label for a colour, and
no doubt by other groups as cultural theft by excising reference to
its origin.

Seems the best option is to recognise the actual etymology carries no
semblance of offense whatsoever, and leave well alone.



One may remember that people who might feel offended by "Indian Red"
(Native Americans) make up less than 0.5 percent of all "Indians".
It is hardly the fault of the people of India that Native Americans were
called Indians by an Italian navigator who thought he had landed in India.



Re: [R] Defining partial list of variables

2021-01-05 Thread Heinz Tuechler

What about the Cs()-function in Hmisc?
library(Hmisc)
Cs(a,b,c)
[1] "a" "b" "c"
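
If Hmisc is not at hand, something similar can be sketched in base R. Note
that quote_names is a made-up name for this illustration and is not how
Hmisc implements Cs().

```r
# Hypothetical base-R stand-in: turn unquoted names into a character vector.
quote_names <- function(...) {
  args <- as.list(substitute(list(...)))[-1]   # the unevaluated arguments
  vapply(args, deparse, character(1))          # each symbol as a string
}

quote_names(a, c, e)
# [1] "a" "c" "e"
```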

Steven Yen wrote/hat geschrieben on/am 05.01.2021 13:29:

Thanks Eric. Yes, "unlist" makes a difference. Below, I am doing not
regression but summary to keep the example simple.

 > set.seed(123)
 > data<-matrix(runif(1:25),nrow=5)
 > colnames(data)<-c("x1","x2","x3","x4","x5"); data
            x1        x2        x3         x4        x5
[1,] 0.2875775 0.0455565 0.9568333 0.89982497 0.8895393
[2,] 0.7883051 0.5281055 0.4533342 0.24608773 0.6928034
[3,] 0.4089769 0.8924190 0.6775706 0.04205953 0.6405068
[4,] 0.8830174 0.5514350 0.5726334 0.32792072 0.9942698
[5,] 0.9404673 0.4566147 0.1029247 0.95450365 0.6557058
 > j<-strsplit(gsub("[\n ]","","x1,x3,x5"),",")
 > j<-unlist(j); j
[1] "x1" "x3" "x5"
 > summary(data[,j])
x1   x3   x5
  Min.   :0.2876   Min.   :0.1029   Min.   :0.6405
  1st Qu.:0.4090   1st Qu.:0.4533   1st Qu.:0.6557
  Median :0.7883   Median :0.5726   Median :0.6928
  Mean   :0.6617   Mean   :0.5527   Mean   :0.7746
  3rd Qu.:0.8830   3rd Qu.:0.6776   3rd Qu.:0.8895
  Max.   :0.9405   Max.   :0.9568   Max.   :0.9943

On 2021/1/5 下午 07:08, Eric Berger wrote:

wrap it in unlist

xx <- unlist(strsplit(  ))
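
A small sketch of why unlist() helps: strsplit() returns a list with one
element per input string, and unlist() flattens it to a plain character
vector. Since my.formula() is the poster's own function and not shown, the
base function reformulate() is used here as a stand-in.

```r
raw <- "hhsize,urban,male,
 gov,nongov,married"

xx <- strsplit(gsub("[\n ]", "", raw), ",")
class(xx)          # "list", hence the [[1]] in the printout

x <- unlist(xx)    # plain character vector of names
x
# [1] "hhsize"  "urban"   "male"    "gov"     "nongov"  "married"

# Stand-in for my.formula(): build a formula from the names.
eq <- reformulate(x, response = "cig")
eq
```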



On Tue, Jan 5, 2021 at 12:59 PM Steven Yen <st...@ntu.edu.tw> wrote:

Thanks Eric. Perhaps I should know when to stop. The approach
produces a slightly different variable list (note the [[1]]).
Consequently, I was not able to use xx in defining my regression
formula.

> x<-colnames(subset(mydata,select=c(
+    hhsize,urban,male,
+    age3045,age4659,age60, # age1529
+    highsc,tert,           # primary
+    gov,nongov,            # unemp
+    married))); x
 [1] "hhsize"  "urban"   "male"    "age3045" "age4659" "age60"   "highsc"  "tert"
 [9] "gov"     "nongov"  "married"
> xx<-strsplit(gsub("[\n ]","",
+"hhsize,urban,male,
+ age3045,age4659,age60,
+ highsc,tert,
+ gov,nongov,
+ married"
+ ),","); xx
[[1]]
 [1] "hhsize"  "urban"   "male"    "age3045" "age4659" "age60"   "highsc"  "tert"
 [9] "gov"     "nongov"  "married"

> eq1<-my.formula(y="cig",x=x); eq1
cig ~ hhsize + urban + male + age3045 + age4659 + age60 + highsc +
tert + gov + nongov + married
> eq2<-my.formula(y="cig",x=xx); eq2
cig ~ c("hhsize", "urban", "male", "age3045", "age4659", "age60",
"highsc", "tert", "gov", "nongov", "married")

On 2021/1/5 下午 06:01, Eric Berger wrote:

If your column names have no spaces the following should work

 x<-strsplit(gsub("[\n ]","",
 "hhsize,urban,male,
+ gov,nongov,married"),","); x

On Tue, Jan 5, 2021 at 11:47 AM Steven Yen <st...@ntu.edu.tw> wrote:

Here we go! BUT, it works great for a continuous line. With
line break(s), I got the nuisance "\n" inserted.

> x<-strsplit("hhsize,urban,male,gov,nongov,married",","); x
[[1]]
[1] "hhsize"  "urban"   "male"    "gov"     "nongov"  "married"

> x<-strsplit("hhsize,urban,male,
+ gov,nongov,married",","); x
[[1]]
[1] "hhsize"  "urban"   "male"    "\ngov"
[5] "nongov"  "married"

On 2021/1/5 下午 05:34, Eric Berger wrote:


zx<-strsplit("age,exercise,income,white,black,hispanic,base,somcol,grad,employed,unable,homeowner,married,divorced,widowed",",")



On Tue, Jan 5, 2021 at 11:01 AM Steven Yen <st...@ntu.edu.tw> wrote:

Thank you, Jeff. IMO, we are all here to make R work better to suit our
various needs. All I am asking is an easier way to define variable list
zx, differently from the way z0, x0, and treat are defined.

 > zx<-colnames(subset(mydata,select=c(
+
age,exercise,income,white,black,hispanic,base,somcol,grad,employed,
+ unable,homeowner,married,divorced,widowed)))
 > z0<-c("fruit","highblood")
 > x0<-c("vgood","poor")
 > treat<-"depression"
 > eq1 <-my.formula(y="depression",x=zx,z0)
 > eq2 <-my.formula(y="bmi", x=zx,x0)
 > eq2t<-my.formula(y="bmi", x=zx,treat)
 > eqs<-list(eq1,eq2); eqs
[[1]]
depression ~ age + exercise + income + white + black + hispanic +
    base + somcol + grad + employed + unable + homeowner + married +
    divorced + widowed + fruit + highblood

[[2]]
bmi ~ age + exercise + income + white + black + hispanic + base +
    somcol + grad + employed + unable + homeowner + married +
    divorced + widowed + vgood + poor

 > eqt<-list(eq1,eq2t); eqt
[[1]]
depression ~ age + exercise + income + white 

Re: [R] Defining partial list of variables

2021-01-05 Thread Heinz Tuechler

see below

Steven Yen wrote/hat geschrieben on/am 05.01.2021 08:14:

I constantly define variable lists from a data frame (e.g., to define a
regression equation). Line 3 below does just that. Placing each variable
name in quotation marks is too much work especially for a long list so I
do that with line 4. Is there an easier way to accomplish this, that is, to
define a list of variable names containing "a","c","e"? Thank you!


data<-as.data.frame(matrix(1:30,nrow=6))
colnames(data)<-c("a","b","c","d","e"); data


  a  b  c  d  e
1 1  7 13 19 25
2 2  8 14 20 26
3 3  9 15 21 27
4 4 10 16 22 28
5 5 11 17 23 29
6 6 12 18 24 30

x1<-c("a","c","e"); x1 # line 3

[1] "a" "c" "e"

x2<-colnames(subset(data,select=c(a,c,e))); x2 # line 4


[1] "a" "c" "e"


What about:
x3 <- names(data)[c(1,3,5)]
x3
[1] "a" "c" "e"

If I have to compile longer vectors of variable names I do it as follows:
First I use:
dput(names(data))
which prints the names as a vector:
c("a", "b", "c", "d", "e")
Then I paste that output into an assignment and delete the names I do not
need by hand, e.g.
x4 <- c("a", "b", "c", "d", "e")   # pasted output
x4 <- c("a", "c", "e")             # after editing
This is especially useful with long names, where I could easily make
typing errors.
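
Two variations on the same idea, in case it is easier to name the columns
to drop or to describe them by a pattern. The names x5 and x6 are only
introduced here; this is a sketch, not a recommendation over the approaches
above.

```r
data <- as.data.frame(matrix(1:30, nrow = 6))
colnames(data) <- c("a", "b", "c", "d", "e")

x3 <- names(data)[c(1, 3, 5)]                      # keep by position
x5 <- setdiff(names(data), c("b", "d"))            # drop by name
x6 <- grep("^[ace]$", names(data), value = TRUE)   # keep by pattern

x5
# [1] "a" "c" "e"
```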

regards,
Heinz



Re: [R] Translation of the charter

2021-11-02 Thread Heinz Tuechler

Greg,

Greg Minshall wrote/hat geschrieben on/am 02.11.2021 08:57:

Heinz,


x <- c("a","b","c")
lettersnum <- 1:length(letters[])
names(lettersnum) <- letters[]
lettersnum[x]

lettersnum[x]

a b c
1 2 3


i'm not sure if the following is obviously better, but one might do


b <- match(a, a)
names(b) <- a
b

a b c
1 2 3


cheers, Greg


You are right - match seems obviously better, but why not do

x <- c("a","b","c")
match(x, letters[])
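
A short sketch of how match() behaves here, including values that are not
lowercase letters. The extra elements "z" and "B" are added only for
illustration.

```r
x <- c("a", "b", "c", "z", "B")

match(x, letters)            # position in the alphabet, NA if not found
# [1]  1  2  3 26 NA

match(tolower(x), letters)   # ignore case
# [1]  1  2  3 26  2

# Keep the letters as names, as in the named-vector approach.
setNames(match(x, letters), x)
```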

best, Heinz



Re: [R] Translation of the charter

2021-11-02 Thread Heinz Tuechler

Alice wrote/hat geschrieben on/am 31.10.2021 07:33:

Dear members,

How can I translate a character to the underlying integer?
I tried this:


x <- c("a","b","c")



as.numeric(x)


[1] NA NA NA

Warning message:

NAs introduced by coercion


It didn't work.

Sorry for my newbie questions.


B.R.

Alice



Is this, what you are looking for:

x <- c("a","b","c")
lettersnum <- 1:length(letters[])
names(lettersnum) <- letters[]
lettersnum[x]
> lettersnum[x]
a b c
1 2 3

best,
Heinz



Re: [R] return value of {....}

2023-01-13 Thread Heinz Tuechler

09.01.2023 18:05:58 akshay kulkarni :

We are living in the 21st century world, and the R-core team might, I suppose,
have a definite reason ...



Maybe compatibility reasons with S and R versions from the 20th century?
But maybe, you would have expected some reason even then.

best regards,

Heinz



Re: [R] Simple Stacking of Two Columns

2023-04-03 Thread Heinz Tuechler

Jeff Newmiller wrote/hat geschrieben on/am 03.04.2023 18:26:

unname(unlist(NamesWide))

Why not:

NamesWide <- data.frame(Name1=c("Tom","Dick"),Name2=c("Larry","Curly"))
NamesLong <- data.frame(Names=with(NamesWide, c(Name1, Name2)))
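
A sketch comparing the two suggestions, plus stack() called on the whole
data frame rather than on single columns. stringsAsFactors = FALSE is set
explicitly so the result does not depend on the R version; NamesLong2 and
stacked are names made up for this illustration.

```r
NamesWide <- data.frame(Name1 = c("Tom", "Dick"), Name2 = c("Larry", "Curly"),
                        stringsAsFactors = FALSE)

# Concatenate the columns by name.
NamesLong <- data.frame(Names = with(NamesWide, c(Name1, Name2)),
                        stringsAsFactors = FALSE)

# Same result without naming the columns.
NamesLong2 <- data.frame(Names = unname(unlist(NamesWide)),
                         stringsAsFactors = FALSE)
identical(NamesLong, NamesLong2)   # TRUE

# stack() wants a data frame (or list), not two vectors; it also records
# the source column in `ind`.
stacked <- stack(NamesWide)
stacked$values
# [1] "Tom"   "Dick"  "Larry" "Curly"
```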



On April 3, 2023 8:08:59 AM PDT, "Sparks, John"  wrote:

Hi R-Helpers,

Sorry to bother you, but I have a simple task that I can't figure out how to do.

For example, I have some names in two columns

NamesWide<-data.frame(Name1=c("Tom","Dick"),Name2=c("Larry","Curly"))

and I simply want to get a single column
NamesLong<-data.frame(Names=c("Tom","Dick","Larry","Curly"))

NamesLong

 Names
1   Tom
2  Dick
3 Larry
4 Curly


Stack produces an error
NamesLong<-stack(NamesWide$Name1,NamesWide$Names2)
Error in if (drop) { : argument is of length zero

So does bind_rows

NamesLong<-dplyr::bind_rows(NamesWide$Name1,NamesWide$Name2)

Error in `dplyr::bind_rows()`:
! Argument 1 must be a data frame or a named atomic vector.
Run `rlang::last_error()` to see where the error occurred.

I tried making separate dataframes to get around the error in bind_rows but it 
puts the data in two different columns
Name1<-data.frame(c("Tom","Dick"))
Name2<-data.frame(c("Larry","Curly"))
NamesLong<-dplyr::bind_rows(Name1,Name2)

NamesLong

  c..TomDick.. c..LarryCurly..
1          Tom            <NA>
2         Dick            <NA>
3         <NA>           Larry
4         <NA>           Curly

gather makes no change to the data
NamesLong<-gather(NamesWide,Name1,Name2)

NamesLong

 Name1 Name2
1   Tom Larry
2  Dick Curly


Please help me solve what should be a very simple problem.

Thanks,
John Sparks








