Re: [R] How to create a string containing '\/' to be used with SED?

2008-11-26 Thread seanpor

Good morning,

You do not need to escape a forward slash / in R, but you do need to escape a
backslash when you're typing it... so to get a string which actually
contains blah\/blah you need to use "blah\\/blah"

http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-does-backslash-behave-strangely-inside-strings_003f
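A quick self-contained check of that escaping rule (nchar() counts the
characters actually stored, not the ones typed):

```r
s <- "blah\\/blah"  # typed with two backslashes; the string stores just one
nchar(s)            # 10 characters: b l a h \ / b l a h
cat(s, "\n")        # prints: blah\/blah
```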

Unless this is a very, very big file you shouldn't need to go out to sed, as
gsub() should work adequately... and probably quicker and cleaner.  So
something along the lines of... (UNTESTED!!! since I don't have a
reproducible example)

tmp1 <- readLines(configurationFile)
tmp1 <- gsub("^instance .*",
             paste("instance = ", data$instancePath, "/", data$newInstance, sep = ""),
             tmp1)


I'm working on 50 MB text files, doing all sorts of manipulations, and I
do it all inside R under Windows XP...  reading a 50 MB text file across the
100 Mbit network and doing a gsub() on most lines takes an elapsed 16 seconds
on this office desktop.
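A runnable toy version of the gsub() idea above, with the configuration file
contents and the data values faked (instancePath and newInstance mirror the
values in the original post; no extra "/" is pasted in here, since
instancePath already ends in one):

```r
# Stand-in for readLines(configurationFile)
tmp1 <- c("some = option",
          "instance = /old/path/old.tsp")
instancePath <- "/home/TSPFiles/TSPLIB/"
newInstance  <- "berlin52.tsp"

tmp1 <- gsub("^instance .*",
             paste("instance = ", instancePath, newInstance, sep = ""),
             tmp1)
tmp1[2]
# [1] "instance = /home/TSPFiles/TSPLIB/berlin52.tsp"
```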

hth...

Regards,
Sean


ikarus wrote:
 
 Hi guys,
 I've been struggling to find a solution to the following issue:
 I need to change strings in .ini files that are given as input to a
 program whose output is processed by R. The strings to be changed look
 like: 
 instance = /home/TSPFiles/TSPLIB/berlin52.tsp 
 
 I normally use sed for this kind of thing. So, inside R I'd like to write
 something like:
 
  command <- paste("sed -i 's/^instance .*/instance = ", data$instancePath,
                   data$newInstance, "/' ", configurationFile, sep = "")
  system(command)
 
 This will overwrite the line starting with "instance " with "instance =
 the_new_instance".
 In the example I gave, data$instancePath = "/home/TSPFiles/TSPLIB/" and 
 data$newInstance = "berlin52.tsp".
 
 The problem is that I need to pass the above path string to sed in the
 form:
 \/home\/TSPFiles\/TSPLIB\/ 
 
 However, I couldn't find a way to create such a string in R. I tried
 several different ways,
 but R always complains that '\/' is an unrecognized escape!
  
 Any suggestion? 
 
 Thanks!
 

-- 
View this message in context: 
http://www.nabble.com/How-to-create-a-string-containing-%27%5C-%27-to-be-used-with-SED--tp20694319p20696613.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] LaTeX and R-scripts/-results

2008-11-26 Thread Thomas Petzoldt

Oliver Bandel wrote:

Hello,

at some places I read about good interaction of
LaTeX and R.

Can you give me a starting point, where I can find
information about it?

Are there special LaTeX-packages for the support,
or does R have packages for support of LaTeX?
Or will an external Code-Generator be used?

TIA,
   Oliver



Hi Oliver,

you are right, LaTeX and R are perfect companions. Look for Sweave(*). 
You'll find an introduction by Friedrich Leisch in R News 2002, Vol. 2/3:


http://cran.r-project.org/doc/Rnews/Rnews_2002-3.pdf

and an entire homepage about it:

http://www.statistik.lmu.de/~leisch/Sweave/

HTH Thomas P.

(*) Many thanks to Friedrich Leisch for this great piece of software!



[R] multiple imputation with fit.mult.impute in Hmisc - how to replace NA with imputed value?

2008-11-26 Thread Charlie Brush

I am doing multiple imputation with Hmisc, and
can't figure out how to replace the NA values with
the imputed values.

Here's a general outline of the process:

> set.seed(23)
> library(mice)
> library(Hmisc)
> library(Design)
> d <- read.table("DailyDataRaw_01.txt", header = TRUE)
> length(d); length(d[,1])
[1] 43
[1] 2666
So for this data set there are 43 columns and 2666 rows.

Here is a piece of data.frame d:
> d[1:20,4:6]
 P01  P02  P03
1  0.1 0.16 0.16
2   NA 0.00 0.00
3   NA 0.60 0.04
4   NA 0.15 0.00
5   NA 0.00 0.00
6  0.7 0.00 0.75
7   NA 0.00 0.00
8   NA 0.00 0.00
9  0.0 0.00 0.00
10 0.0 0.00 0.00
11 0.0 0.00 0.00
12 0.0 0.00 0.00
13 0.0 0.00 0.00
14 0.0 0.00 0.00
15 0.0 0.00 0.03
16  NA 0.00 0.00
17  NA 0.01 0.00
18 0.0 0.00 0.00
19 0.0 0.00 0.00
20 0.0 0.00 0.00

These are daily precipitation values at NCDC stations, and
NA values at station P01 will be filled using multiple
imputation and data from highly correlated stations P02 and P08:

> f <- aregImpute(~ I(P01) + I(P02) + I(P08),
                  n.impute = 10, match = 'closest', data = d)

Iteration 13
> fmi <- fit.mult.impute(P01 ~ P02 + P08, ols, f, d)

Variance Inflation Factors Due to Imputation:

Intercept   P02   P08
   1.01  1.39  1.16

Rate of Missing Information:

Intercept   P02   P08
   0.01  0.28  0.14

d.f. for t-distribution for Tests of Single Coefficients:

Intercept   P02   P08
242291.18    116.05    454.95
> r <- apply(f$imputed$P01, 1, mean)
> r
    2     3     4     5     7     8    16    17   249   250   251
0.002 0.430 0.044 0.002 0.002 0.002 0.002 0.123 0.002 0.002 0.002
 252   253   254   255   256   257   258   259   260   261   262
1.033 0.529 1.264 0.611 0.002 0.513 0.085 0.002 0.705 0.840 0.719
 263   264   265   266   267   268   269   270   271   272   273
1.489 0.532 0.150 0.134 0.002 0.002 0.002 0.002 0.002 0.055 0.135
 274   275   276   277   278   279   280   281   282   283   284
0.009 0.002 0.002 0.002 0.008 0.454 1.676 1.462 0.071 0.002 1.029
 285   286   287   288   289   418   419   420   421   422   700
0.055 0.384 0.947 0.002 0.002 0.008 0.759 0.066 0.009 0.002 0.002

--
So far, this is working great.
Now, make a copy of d:
> dnew <- d

And then fill in the NA values in P01 with the values in r.

For example:
> for (i in 1:length(r)) {
+   dnew$P01[r[i,1]] <- r[i,2]
+ }
This doesn't work, because each 'piece' of r is two numbers:
> r[1]
    2
0.002
> r[1,1]
Error in r[1, 1] : incorrect number of dimensions

My question: how can I separate the two items in (for example)
r[1] to use the first part as an index and the second as a value,
and then use them to replace the NA values with the imputed values?

Or is there a better way to replace the NA values with the imputed values?
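For what it's worth, r as produced above is a named numeric vector rather
than a two-column object: the original row indices live in names(r) and the
imputed means are the values. A minimal sketch with toy data standing in for
d and r (the real r comes from apply(f$imputed$P01, 1, mean)):

```r
# Toy stand-in for the real objects
d <- data.frame(P01 = c(0.1, NA, NA, 0.7))
r <- c("2" = 0.430, "3" = 0.044)

dnew <- d
dnew$P01[as.integer(names(r))] <- r  # names give the indices, values the data
dnew$P01
# [1] 0.100 0.430 0.044 0.700
```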

Thanks in advance for any help.



Re: [R] LaTeX and R-scripts/-results

2008-11-26 Thread Philipp Pagel
On Wed, Nov 26, 2008 at 08:50:33AM +0100, Oliver Bandel wrote:
 at some places I read about good interaction of
 LaTeX and R.
 
 Can you give me a starting point, where I can find
 information about it?

Have a look at these:

Sweave()
xtable()  (package xtable)
latex()   (package Hmisc)

cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://mips.gsf.de/staff/pagel



Re: [R] Error in sqlCopy in RODBC

2008-11-26 Thread Dieter Menne



BKMooney wrote:
 
 I am trying to copy portions of tables from one SQL database to another,
 using sqlCopy in the RODBC package.
 
 ...
 I am currently getting an error:
 Error in sqlSave(destchannel, dataset, destination, verbose = verbose,  :
   table 'LocalTable' already exists
 

I can reproduce your error with my example file and fast=TRUE. You might try
fast=FALSE 
and append=TRUE when things like this happen. The following works for me:

library(RODBC)
channel = odbcConnectAccess("db.mdb")
sqlCopy(channel, "Select * from tab", "newtab", destchannel = channel,
  safer = TRUE, append = TRUE, rownames = FALSE, fast = FALSE)
odbcClose(channel)



-- 
View this message in context: 
http://www.nabble.com/Error-in-sqlCopy-in-RODBC-tp20691929p20697101.html
Sent from the R help mailing list archive at Nabble.com.



[R] eclipse and R

2008-11-26 Thread Ruud H. koning
Hello, I am trying to install Eclipse and R on an AMD64 machine running 
SuSE Linux 9.3. I have compiled R 2.8.0 with --enable-R-shlib and it 
seems that compilation was successful. After starting R, I installed 
the latest rJava package; from the output:

checking whether JRI is requested... yes
cp src/libjri.so libjri.so
It seems JRI support has been compiled successfully. However, when I try 
to open R from within Eclipse, I receive an error message:


Launching the R Console was cancelled, because it seems starting the 
Java process/R engine failed.

Please make sure that R package 'rJava' with JRI is installed.

I can open an R console from the command line, and attach the rJava 
library without problems. What am I doing wrong here?

Thanks, Ruud



[R] Running rtest - how to/ help

2008-11-26 Thread indian scorpio
I am trying to run the rTest example (XP, Eclipse) that comes with the code.
JRI etc. are used from the rJava package (i.e. I haven't built them myself).

The code doesn't run properly - it shows 'Creating Rengine (with arguments)'
and then gets terminated. I tried passing the argument --save; what else is
possible?

When I am trying to use run.bat
run.bat examples\rtest.java
it says
 '-cp' is not recognized as an internal or external command
C:\Documents and Settings\shlahoti\Desktop\JRI_0.4-1\JRIset PATH=C:\WINDOWS\sys
tem32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\system32\;C:\WINDOWS\system
32\Wbem;C:\Program Files\WinZip;C:\Program Files\Visual Networks\Dial Analysis\;
c:\Program Files\Common Files\Roxio Shared\DLLShared\;C:\Program Files\Windows I
maging\;C:\Program Files\Subversion\bin;C:\Graphviz2.16\Bin;C:\jEdit;C:\QuickT
ime\QTSystem\;C:\php;C:\Program Files\R\R-2.8.0\library\rJava\jri\;C:\Program Fi
les\R\R-2.8.0\bin\;\bin;\lib;\bin;\lib;\bin;\lib;\bin;\lib;\bin;\lib;\bin;\lib

C:\Documents and Settings\shlahoti\Desktop\JRI_0.4-1\JRI-cp .;examples;src/JRI.
jar rtest $
'-cp' is not recognized as an internal or external command,
operable program or batch file.



Re: [R] how to read .sps (SPSS file extension)?

2008-11-26 Thread Eik Vettorazzi
Maybe the importers of the memisc package will help you, but I have never 
tried them; see

 help(importers, package="memisc")

At first glance it seems that you have to split your file manually, 
but maybe there is another way.

hth.

livio finos schrieb:

sorry, you are completely right!
sps is not the extension for a portable file! Sorry for the time I made 
you spend.

Let me make my problem clearer.
I am exporting a dataset from LimeSurvey (a free software package for 
internet surveys). It works very well and allows exporting in different 
formats such as csv and excel. That is fine, but what I like about the spss 
format is the variable labels.
LimeSurvey claims to export to spss, but what it exports is an sps file, 
which is not a data format but SPSS syntax code. I only realized this just 
now, sorry. I attach an extract of the code below.
Do you have any suggestions on how to handle this? I think it would be 
great if we could improve the interoperability among free software.

thanks again..
  livio

NEW FILE.
FILE TYPE NESTED RECORD=1(A).
- RECORD TYPE 'A'.
- DATA LIST LIST / i0(A1) d1(N3) d2(DATETIME20.0) d3(A15) d4(N2) 
d5(N1) d6(N1) d7(N1) d8(N1) d9(N1) d10(N1) d11(N1) d12(N1) d13(N1) 
d14(N1) d15(N1) d16(N1) d17(N2) d18(N1) d19(N1) d20(N1) .


- RECORD TYPE 'B'.
- DATA LIST LIST / i1(A1) d21(N1) d22(N1) d23(N1) d24(N1) d25(N1) 
d26(N1) d27(N1) d28(N1) d29(N1) d30(N1) d31(N1) d32(N1) d33(N1) 
d34(N1) d35(N1) d36(N1) d37(N1) d38(A37) d39(N1) d40(N1) .


- RECORD TYPE 'C'.
- DATA LIST LIST / i2(A1) d41(N1) d42(N1) d43(N1) d44(N1) d45(N1) 
d46(N1) d47(N1) d48(N1) d49(N1) d50(N1) d51(N1) d52(N1) d53(N1) 
d54(N1) d55(N1) d56(N1) d57(N1) d58(N1) d59(N1) d60(N1) .


- RECORD TYPE 'D'.
- DATA LIST LIST / i3(A1) d61(N1) d62(N1) d63(N1) d64(N1) d65(N1) 
d66(N1) d67(N1) d68(N1) d69(N1) d70(N1) d71(N1) d72(N1) d73(N1) 
d74(N1) d75(N1) d76(N1) d77(N1) d78(N1) d79(N1) d80(N1) .


- RECORD TYPE 'E'.
- DATA LIST LIST / i4(A1) d81(N1) d82(N1) d83(N1) d84(N1) d85(N1) 
d86(N1) d87(N1) d88(N1) d89(N1) d90(N1) d91(N1) d92(N1) d93(N1) 
d94(N1) d95(N1) d96(N1) d97(N1) d98(N1) d99(N1) d100(N1) .


- RECORD TYPE 'F'.
- DATA LIST LIST / i5(A1) d101(N1) d102(N1) d103(N1) d104(N1) d105(N1) 
d106(N1) d107(N1) d108(N1) d109(N1) .

END FILE TYPE.

BEGIN DATA
A '8' '01-01-1980 00:00:00' 'it' '13' '1' '1' '' '1' '' '1' '1' '1' 
'1' '1' '4' '0' '' '1' '' '1'
B '' '1' '1' '1' '1' '0' '' '0' '' '2' '2' '1' '4' '2' '7' '3' '2' 
'0''2' '3'
C '3' '4' '4' '2' '2' '1' '4' '4' '2' '3' '1' '4' '1' '4' '1' '1' '4' 
'3' '4' '3'
D '1' '3' '1' '1' '1' '1' '1' '1' '1' '1' '1' '1' '2' '2' '2' '2' '3' 
'3' '3' '2'
E '2' '3' '2' '2' '0''0''0''0''0''0''0''3' '4' '4' '2' '1' '5' '2' '5' 
'4'

F '1' '2' '2' '2' '2' '1' '2' '2' '5'
A '9' '01-01-1980 00:00:00' 'it' '13' '2' '1' '' '1' '' '0' '' '1' '1' 
'1' '3' '0' '' '1' '' '1'
B '' '0' '' '1' '1' '0' '' '0' '' '2' '2' '1' '4' '3' '7' '3' '2' 
'0''2' '4'
C '3' '4' '4' '3' '3' '3' '4' '3' '2' '2' '1' '3' '1' '4' '1' '4' '4' 
'4' '4' '3'
D '1' '2' '1' '1' '1' '1' '1' '1' '1' '1' '1' '1' '3' '1' '3' '3' '3' 
'3' '3' '3'
E '3' '3' '3' '2' '0''0''0''0''0''0''0''3' '4' '2' '2' '5' '3' '5' '2' 
'5'

F '1' '5' '5' '5' '5' '5' '5' '2' '5'
A '10' '01-01-1980 00:00:00' 'it' '13' '1' '1' '' '1' '' '1' '2' '0' 
'' '0' '' '0' '' '1' '' '1'
B '' '1' '2' '0' '' '0' '' '0' '' '1' '2' '1' '4' '6' '7' '3' '2' 
'0''3' '3'
C '4' '4' '4' '4' '3' '4' '4' '3' '2' '2' '1' '4' '1' '4' '1' '3' '4' 
'4' '4' '1'
D '1' '1' '1' '1' '1' '1' '1' '1' '1' '1' '1' '3' '3' '3' '3' '3' '3' 
'3' '3' '3'
E '3' '3' '3' '2' '0''0''0''0''0''0''0''5' '4' '5' '2' '5' '5' '5' '5' 
'5'

F '1' '5' '5' '5' '5' '5' '5' '1' '5'
END DATA.
EXECUTE.

*Define Variable Properties.
VARIABLE LABELS d1 'Record ID'.
VARIABLE LABELS d2 'Data di completamento'.
VARIABLE LABELS d3 'Lingua di partenza'.
VARIABLE LABELS d4 'Età :'.
VARIABLE LABELS d5 'Sesso:'.
VARIABLE LABELS d6 '3 - Papà '.
VARIABLE LABELS d7 'Com'è composta la tua famiglia? - COMMENT'.
VARIABLE LABELS d8 '3 - Mamma'.
VARIABLE LABELS d9 'Com'è composta la tua famiglia? - COMMENT'.
VARIABLE LABELS d10 '3 - Fratelli n°'.
VARIABLE LABELS d11 'Com'è composta la tua famiglia? - COMMENT'.
VARIABLE LABELS d12 '3 - Sorelle n°'.
VARIABLE LABELS d13 'Com'è composta la tua famiglia? - COMMENT'.
VARIABLE LABELS d14 '3 - Nonni n°'.
VARIABLE LABELS d15 'Com'è composta la tua famiglia? - COMMENT'.
VARIABLE LABELS d16 '3 - Altre figure parentali (zii, cugini, ecc.) n°'.
VARIABLE LABELS d17 'Com'è composta la tua famiglia? - COMMENT'.
VARIABLE LABELS d18 '4 - Papà '.
VARIABLE LABELS d19 'Quali di queste persone vivono in casa con te? 
- COMMENT'.

VARIABLE LABELS d20 '4 - Mamma'.
VARIABLE LABELS d21 'Quali di queste persone vivono in casa con te? 
- COMMENT'.

VARIABLE LABELS d22 '4 - Fratelli n°'.
VARIABLE LABELS d23 'Quali di queste persone vivono in casa con te? 
- COMMENT'.

VARIABLE LABELS d24 '4 - Sorelle n°'.
VARIABLE LABELS d25 'Quali di queste persone vivono in casa con te? 
- COMMENT'.


Re: [R] how to read .sps (SPSS file extension)?

2008-11-26 Thread Eik Vettorazzi

sorry for the typo,

help(importer, package="memisc")

will do the trick.


Re: [R] construct a vector

2008-11-26 Thread Marc Schwartz
on 11/26/2008 07:11 AM axionator wrote:
 Hi all,
 I have an unknown number of vectors (>= 2), all of the same length. Out
 of these, I want to construct a new one as follows:
 having vectors u, v and w, the resulting vector z should have entries:
 z[1] = u[1], z[2] = v[1], z[3] = w[1]
 z[4] = u[2], z[5] = v[2], z[6] = w[2]
 ...
 i.e. go through the vectors u, v, w, taking each time the 1st, 2nd, ...
 elements and storing them consecutively in z.
 Is there an efficient way in R to do this?
 
 Thanks in advance
 Armin

Is this what you want?

u <- 1:10
v <- 11:20
w <- 21:30

z <- as.vector(rbind(u, v, w))

> z
 [1]  1 11 21  2 12 22  3 13 23  4 14 24  5 15 25  6 16 26  7 17 27  8
[23] 18 28  9 19 29 10 20 30


Essentially, we are creating a matrix from the 3 vectors:

> rbind(u, v, w)
  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
u    1    2    3    4    5    6    7    8    9    10
v   11   12   13   14   15   16   17   18   19    20
w   21   22   23   24   25   26   27   28   29    30

Then coercing that to a vector, taking advantage of the way in which
matrix elements are stored.

HTH,

Marc Schwartz



Re: [R] construct a vector

2008-11-26 Thread Richard . Cotton
 I have an unknown number of vectors (>= 2), all of the same length. Out
 of these, I want to construct a new one as follows:
 having vectors u, v and w, the resulting vector z should have entries:
 z[1] = u[1], z[2] = v[1], z[3] = w[1]
 z[4] = u[2], z[5] = v[2], z[6] = w[2]
 ...
 i.e. go through the vectors u, v, w, taking each time the 1st, 2nd, ...
 elements and storing them consecutively in z.
 Is there an efficient way in R to do this?

u <- 1:5
v <- (1:5) + 0.1
w <- (1:5) + 0.2
as.vector(rbind(u,v,w))
# [1] 1.0 1.1 1.2 2.0 2.1 2.2 3.0 3.1 3.2 4.0 4.1 4.2 5.0 5.1 5.2

Regards,
Richie.

Mathematical Sciences Unit
HSL






Re: [R] multiple imputation with fit.mult.impute in Hmisc - how to replace NA with imputed value?

2008-11-26 Thread Frank E Harrell Jr


You didn't state your goal, nor why fit.mult.impute does not do what you 
want.  But you can look inside fit.mult.impute to see how it retrieves 
the imputed values.  Also see the example in the documentation for transcan, 
in which the command impute(xt, imputation=1) is used to retrieve one of the 
multiple imputations.


Note that you can say library(Design) (omit the quotes) to access both 
Design and Hmisc.


Frank
--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University



Re: [R] construct a vector

2008-11-26 Thread Wacek Kusnierczyk
axionator wrote:
 Hi all,
 I have an unknown number of vectors (>= 2), all of the same length. Out
 of these, I want to construct a new one as follows:
 having vectors u, v and w, the resulting vector z should have entries:
 z[1] = u[1], z[2] = v[1], z[3] = w[1]
 z[4] = u[2], z[5] = v[2], z[6] = w[2]
 ...
 i.e. go through the vectors u, v, w, taking each time the 1st, 2nd, ...
 elements and storing them consecutively in z.
 Is there an efficient way in R to do this?

   

suppose you have your vectors collected into a list, say vs; then the
following will do:

as.vector(do.call(rbind, vs))
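For example:

```r
u <- 1:3; v <- 11:13; w <- 21:23
vs <- list(u, v, w)  # works for any number of vectors
as.vector(do.call(rbind, vs))
# [1]  1 11 21  2 12 22  3 13 23
```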

vQ



Re: [R] construct a vector

2008-11-26 Thread axionator
Thanks, works fine.

Armin



Re: [R] Question about Kolmogorov-Smirnov Test

2008-11-26 Thread Uwe Ligges



Ricardo Ríos wrote:

Hi wizards

I have the following code for a Kolmogorov-Smirnov Test:

z <- c(1.6,10.3,3.5,13.5,18.4,7.7,24.3,10.7,8.4,4.9,7.9,12,16.2,6.8,14.7)
ks.test(z, "pexp", 1/10)$statistic

The Kolmogorov-Smirnov statistic is:

   D
0.293383

However, I have calculated the Kolmogorov-Smirnov statistic with the
following R code:

z <- c(1.6,10.3,3.5,13.5,18.4,7.7,24.3,10.7,8.4,4.9,7.9,12,16.2,6.8,14.7)
a <- sort(z)
d <- pexp(a, rate = 1/10, lower.tail = TRUE, log.p = FALSE)
w <- numeric(length = length(a))
for (i in 1:length(a)) w[i] <- i/15
max(abs(w - d))

But I have obtained the following result:

[1] 0.2267163

Why are these results not equal?


w is calculated as follows:

w <- (seq(along = a) - 1)/length(a)
[ {0, ..., n-1} rather than {1, ..., n} ]
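Combining the two one-sided discrepancies (the empirical cdf compared both
just after and just before each jump) gives the statistic ks.test() reports:

```r
z <- c(1.6,10.3,3.5,13.5,18.4,7.7,24.3,10.7,8.4,4.9,7.9,12,16.2,6.8,14.7)
a <- sort(z)
d <- pexp(a, rate = 1/10)
n <- length(a)
D <- max(seq_len(n)/n - d,        # ecdf after each jump: i/n
         d - (seq_len(n) - 1)/n)  # ecdf before each jump: (i-1)/n
round(D, 6)
# [1] 0.293383
```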


Uwe Ligges



Thanks in advance





Re: [R] memory limit

2008-11-26 Thread seanpor

Good afternoon,

The short answer is yes; the long answer is "it depends".

It all depends on what you want to do with the data. I'm working with
dataframes of a couple of million lines on this plain desktop machine, and
for my purposes it works fine.  I read in text files, manipulate them,
convert them into dataframes, do some basic descriptive stats and tests on
them, a couple of columns at a time - all quick and simple in R.  There are
some packages which are set up to handle very large datasets, e.g. biglm
[1].

If you're using algorithms which require vast quantities of memory, then as
the previous emails in this thread suggest, you might need R running on
64-bit.

If you're working with a problem which is embarrassingly parallel [2], then
there are a variety of solutions - if you're in between, then the solutions
are much more data-dependent.

The flip question: how long would it take you to get up and running with the
functionality (tried and tested in R) you require if you're going to be
re-working things in C++?

I suggest that you have a look at R, possibly using a subset of your full
set to start with - you'll be amazed how quickly you can get up and running.

As suggested at the start of this email... it depends...

Best Regards,
Sean O'Riordain
Dublin

[1] http://cran.r-project.org/web/packages/biglm/index.html
[2] http://en.wikipedia.org/wiki/Embarrassingly_parallel


iwalters wrote:
 
 I'm currently working with very large datasets that consist of
 1,000,000+ rows.  Is it at all possible to use R for datasets this size,
 or should I rather consider C++/Java?
 
 
 

-- 
View this message in context: 
http://www.nabble.com/increasing-memory-limit-in-Windows-Server-2008-64-bit-tp20675880p20700590.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] How to measure the significant difference between two time series

2008-11-26 Thread Gerard M. Keogh
Q1: Quick answer

a) you need to remove the seasonality - there should be a tool in the time
series package to do this - though I'm not familiar enough with R to know it.
b) check the resulting series to see if it is stationary - the acf decays
quickly, i.e. within a couple of lags.
c) if the two series are stationary, then overlay the acf plots; if the second
plot lies within the CI of the first, they are the same.
d) on the seasonal parts of the two series, do regression on year vs b1*q1 +
b2*q2 + b3*q3 + b4*q4 and if the coefficients of the second are within the
2-SE band of the first, the seasonality is similar.

if c) and d) are not significant, the two series are similar.

Also, standardise all your series so that they have mean 0 (or 1) and
variance 1 before you start - it'll make the numerics and comparisons
better.
Also, before starting you may need to take logs if the series oscillations
increase with time.
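A rough sketch of steps a) and b) on simulated data (stl() and acf() are one
possible toolset; the thread does not commit to a specific package, and the
series here is entirely made up):

```r
# Simulated seasonal series: 3 "years" of 50 points each (illustrative only)
x1 <- ts(sin(2 * pi * (1:150) / 50) + rnorm(150, sd = 0.3), frequency = 50)
x1  <- (x1 - mean(x1)) / sd(x1)          # standardise: mean 0, variance 1
dec <- stl(x1, s.window = "periodic")    # a) remove the seasonality
rem <- dec$time.series[, "remainder"]    # deseasonalised remainder
a1  <- acf(rem, plot = FALSE)            # b) acf should decay within a few lags
```

For step c) one would compute the same deseasonalised remainder for the second
series and overlay the two acf plots with their confidence bands.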

Q2 - don't know what you mean by interannual variation - within year
possibly.

Hope this helps.

Gerard



   
From: Bin Zhao [EMAIL PROTECTED]
Sent by: [EMAIL PROTECTED]
To: r-help r-help@r-project.org
Date: 24/11/2008 04:13
Subject: [R] How to measure the significant difference between two time series




Dear R experts and statisticians,

I have some time series datasets: several years of vegetation indices
(about 50 data points per year) sampled from different stations. These
indices have similar dynamics with seasonal change.

My questions are,
1) How can I compare the difference among the indices, and how can I say
there is a significant difference between two time series? They don't have
the same values on the same dates, but they may have similar tendencies and
change curves.

2) I also want to compare the interannual variation of the vegetation index
in the same station. Of course, the index can be separated into several
sub-series, i.e., one year per sub-series. Now I have the same question as
1): how can I compare the significant difference between different years?

If there is a mature theoretical framework that can resolve my questions, is
there any package in R that can do this?

Any further advice is highly appreciated.


Regards,

Bin Zhao

2008-11-24 , 10:53:58

 [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



**
The information transmitted is intended only for the person or entity to which 
it is addressed and may contain confidential and/or privileged material. Any 
review, retransmission, dissemination or other use of, or taking of any action 
in reliance upon, this information by persons or entities other than the 
intended recipient is prohibited. If you received this in error, please contact 
the sender and delete the material from any computer.  It is the policy of the 
Department of Justice, Equality and Law Reform and the Agencies and Offices 
using its IT services to disallow the sending of offensive material.
Should you consider that the material contained in this message is offensive 
you should contact the sender immediately and also mailminder[at]justice.ie.

Is le haghaidh an duine nó an eintitis ar a bhfuil sí dírithe, agus le haghaidh 
an duine nó an eintitis sin amháin, a bheartaítear an fhaisnéis a tarchuireadh 
agus féadfaidh sé go bhfuil ábhar faoi rún agus/nó faoi phribhléid inti. 
Toirmisctear aon athbhreithniú, atarchur nó leathadh a dhéanamh ar an 
bhfaisnéis seo, aon úsáid eile a bhaint aisti nó aon ghníomh a dhéanamh ar a 
hiontaoibh, ag daoine nó ag eintitis seachas an faighteoir beartaithe. Má fuair 
tú é seo trí dhearmad, téigh i dteagmháil leis an seoltóir, le do 

Re: [R] plotting density for truncated distribution

2008-11-26 Thread Chris Andrews

Another option

mydata <- rnorm(10)
mydata <- mydata[mydata > 0]
plot(density(c(mydata, -mydata), from=0))

If you want the area under the curve to be one, you'll need to double the
density estimate

dx <- density(c(mydata, -mydata), from=0)
dx$y <- dx$y * 2
plot(dx)

Chris



Jeroen Ooms wrote:
 
 I am using density() to plot a density curves. However, one of my
 variables is truncated at zero, but has most of its density around zero. I
 would like to know how to plot this with the density function.
 
 The problem is that if I do this the regular way with density(), values near
 zero automatically get a very low value, because there are no observed
 values below zero. Furthermore there is some density below zero, although
 there are no observed values below zero. 
 
 This illustrates the problem: 
 
 mydata <- rnorm(10);
 mydata <- mydata[mydata > 0];
 plot(density(mydata));
 
 the 'real' density is exactly the right half of a normal distribution, so
 truncated at zero. However using the default options, the line seems to
 decrease with a nice curve at the left, with some density below zero. This
 is pretty confusing for the reader. I have tried decreasing the bw, which
 masks (but does not fix) some of the problem, but then the rest of the
 curve loses smoothness. I would like to make a plot of this data that
 looks like the right half of a normal distribution, while keeping the
 curve relatively smooth.
 
 Is there any way to specify this truncation in the density function, so
 that it will only use the positive domain to calculate density?
 

-- 
View this message in context: 
http://www.nabble.com/plotting-density-for-truncated-distribution-tp20684995p20699699.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Running rtest - how to/ help

2008-11-26 Thread indian scorpio
Well, after trying a couple of things:
1. rtest.java example with command line argument being --zero-init -
this is the error:
Creating Rengine (with arguments)
Rengine created, waiting for R
#
# An unexpected error has been detected by Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc005) at pc=0x6c733b9d, pid=3640, tid=5016
#
# Java VM: Java HotSpot(TM) Client VM (10.0-b19 mixed mode, sharing windows-x86)
# Problematic frame:
# C  [R.dll+0x33b9d]
#
# An error report file with more information is saved as:
# C:\workspaceVE\XTest\hs_err_pid3640.log
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

2. while for rtest2.java,
it remains the same, that is, it terminates after Creating Rengine (with
arguments), the difference being some window/frame which opens for a
fraction of a second

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices

2008-11-26 Thread jim holtman
Your time is being taken up in cor.test because you are calling it
100,000 times.  So grin and bear it with the amount of work you are
asking it to do.

Here I am only calling it 100 times:

 m1 <- matrix(rnorm(1), ncol=100)
 m2 <- matrix(rnorm(1), ncol=100)
 Rprof('/tempxx.txt')
 system.time(cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1, 
 function(y) { cor.test(x,y)$p.value }) }))
   user  system elapsed
   8.86    0.00    8.89


so my guess is that calling it 100,000 times will take:  100,000 *
0.0886 seconds or about 3 hours.

If you run Rprof, you will see it is spending most of its time there:

  0   8.8 root
  1.8.8 apply
  2. .8.8 FUN
  3. . .8.8 apply
  4. . . .8.7 FUN
  5. . . . .8.6 cor.test
  6. . . . . .8.4 cor.test.default
  7. . . . . . .2.4 match.arg
  8. . . . . . . .1.7 eval
  9. . . . . . . . .1.4 deparse
 10. . . . . . . . . .0.6 .deparseOpts
 11. . . . . . . . . . .0.2 pmatch
 11. . . . . . . . . . .0.1 sum
 10. . . . . . . . . .0.5 %in%
 11. . . . . . . . . . .0.3 match
 12. . . . . . . . . . . .0.3 is.factor
 13. . . . . . . . . . . . .0.3 inherits
  8. . . . . . . .0.2 formals
  9. . . . . . . . .0.2 sys.function
  7. . . . . . .2.1 cor
  8. . . . . . . .1.1 match.arg
  9. . . . . . . . .0.7 eval
 10. . . . . . . . . .0.6 deparse
 11. . . . . . . . . . .0.3 .deparseOpts
 12. . . . . . . . . . . .0.1 pmatch
 11. . . . . . . . . . .0.2 %in%
 12. . . . . . . . . . . .0.2 match
 13. . . . . . . . . . . . .0.1 is.factor
 14. . . . . . . . . . . . . .0.1 inherits
  9. . . . . . . . .0.1 formals
  8. . . . . . . .0.5 stopifnot
  9. . . . . . . . .0.2 match.call
  8. . . . . . . .0.1 pmatch
  8. . . . . . . .0.1 is.data.frame
  9. . . . . . . . .0.1 inherits
  7. . . . . . .1.5 paste
  8. . . . . . . .1.4 deparse
  9. . . . . . . . .0.6 .deparseOpts
 10. . . . . . . . . .0.3 pmatch
 10. . . . . . . . . .0.1 any
  9. . . . . . . . .0.6 %in%
 10. . . . . . . . . .0.6 match
 11. . . . . . . . . . .0.5 is.factor
 12. . . . . . . . . . . .0.4 inherits
 13. . . . . . . . . . . . .0.2 mode
  7. . . . . . .0.4 switch
  8. . . . . . . .0.1 qnorm
  7. . . . . . .0.2 pt
  5. . . . .0.1 $
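As an aside not raised in the thread: since cor.test's two-sided Pearson
p-value comes from a t statistic, the whole grid of p-values can be computed
vectorized from a single cor() call, avoiding the per-pair overhead profiled
above. A sketch on small illustrative matrices:

```r
m1 <- matrix(rnorm(10000), ncol = 100)    # 100 rows, 100 observations per row
m2 <- matrix(rnorm(10000), ncol = 100)
n  <- ncol(m1)                            # observations per correlation
r  <- cor(t(m1), t(m2))                   # all row-vs-row correlations at once
tstat   <- r * sqrt((n - 2) / (1 - r^2))  # the t statistic cor.test uses
pvalues <- 2 * pt(-abs(tstat), df = n - 2)
```

pvalues[i, j] then equals cor.test(m1[i,], m2[j,])$p.value for the default
two-sided Pearson test.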

On Tue, Nov 25, 2008 at 11:55 PM, Daren Tan [EMAIL PROTECTED] wrote:

 My two matrices are roughly the sizes of m1 and m2. I tried using two apply 
 and cor.test to compute the correlation p.values. More than an hour, and the 
 codes are still running. Please help to make it more efficient.

 m1 <- matrix(rnorm(10), ncol=100)
 m2 <- matrix(rnorm(1000), ncol=100)

 cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1, function(y) { 
 cor.test(x,y)$p.value }) })

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] construct a vector

2008-11-26 Thread axionator
Hi all,
I have an unknown number of vectors (>= 2), all of the same length. Out
of these, I want to construct a new one as follows:
having vectors u, v and w, the resulting vector z should have entries:
z[1] = u[1], z[2] = v[1], z[3] = w[1]
z[4] = u[2], z[5] = v[2], z[6] = w[2]
...
i.e. go through the vectors u, v, w, take each time the 1st, 2nd, ...
elements and store them consecutively in z.
Is there an efficient way in R to do this?

Thanks in advance
Armin
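One idiomatic answer (assuming the vectors are collected in a list, since
their number is unknown): bind them as rows and let R's column-major storage
do the interleaving.

```r
u <- 1:3; v <- 4:6; w <- 7:9
vecs <- list(u, v, w)                  # however many vectors there are
z <- as.vector(do.call(rbind, vecs))   # column-major read interleaves them
z                                      # 1 4 7 2 5 8 3 6 9
```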

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] how to check linearity in Cox regression

2008-11-26 Thread Terry Therneau
 On examining non-linearity of Cox coefficients with penalized splines -  I
 have not been able to dig up a completely clear description of the test
 performed in R or S-plus.

 One iron clad way to test is to fit a model that has the variable of
interest x as a linear term, then a second model with splines, and do a
likelihood ratio test with 2*(difference in log-likelihood) on (difference
in df) degrees of freedom.  With a penalized model this test is
conservative: the chi-square is not quite the right distribution; the true
dist has the same mean but smaller variance.

 The pspline function uses an evenly spaced set of symmetric basis
functions.  A neat consequence of this is that the Wald test for linear vs
'more general' is a test that the coefficients of the spline terms fall in a
linear series.  That is, a linear trend test on the coefficients.  This is
what coxph does.  As with the LR test, the chi-square dist is conservative.
I have not worked at putting in the more correct distribution.  See Eilers
and Marx, Statistical Science 1996.
 
  And what is the null for the non-linear test?

The linear test is "is a linear better than nothing"; the non-linear one is a
sequential test, "is the non-linear better than the linear".  The second test
of course depends on the total number of df you allowed for the pspline fit.
As a silly example, adding + pspline(x, df=200) would likely show that the
nonlinear term was not a significant addition, i.e., not worth 199 more
degrees of freedom.
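The LR test described in the first paragraph might be sketched as follows
(using the lung data shipped with the survival package purely for
illustration; extracting the effective df from the penalized fit via the $df
component is an assumption on my part):

```r
library(survival)
fit.lin <- coxph(Surv(time, status) ~ age, data = lung)          # linear term
fit.spl <- coxph(Surv(time, status) ~ pspline(age), data = lung) # spline term
lrt <- 2 * (fit.spl$loglik[2] - fit.lin$loglik[2])   # 2 * (difference in LL)
df  <- sum(fit.spl$df) - length(coef(fit.lin))       # difference in (effective) df
p   <- pchisq(lrt, df, lower.tail = FALSE)           # conservative, per the above
```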

Terry Therneau

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Chi-Square Test Disagreement

2008-11-26 Thread Andrew Choens
I was asked by my boss to do an analysis on a large data set, and I am
trying to convince him to let me use R rather than SPSS. I think Sweave
could make my life much much easier. To get me a little closer to this
goal, I ran my analysis through R and SPSS and compared the resulting
values. In all but one case, they were the same. Given the matrix

[,1] [,2]
[1,]  110  358
[2,]   71  312
[3,]   29  139
[4,]   31   77
[5,]   13   32

This is the output from R:
 chisq.test(test29)

Pearson's Chi-squared test

data:  test29
X-squared = 9.593, df = 4, p-value = 0.04787

But, the same data in SPSS generates a p value of .051. It's a small but
important difference. I played around and rescaled things, and tried
different values for B, but I never could get R to reach .051.

I'd like to know which program is correct - R or SPSS? I know, this is a
biased place to ask such a question. I also appreciate all input that
will help me use R more effectively. The difference could be the result
of my own ignorance.

thanks
--andy

-- 
Insert something humorous here.  :-)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] eclipse and R

2008-11-26 Thread Tobias Verbeke
Hi Ruud, 

I forwarded your message to the StatET (R in Eclipse) list;
there might be StatET users with a similar setup as yours
on that list (and the StatET developer is more likely to 
pick up your question there).

Best,
Tobias

Hello, I am trying to install Eclipse and R on an amd64 machine running 
Suse linux 9.3. I have compiled R 2.8.0 with --enable-R-shlib and it 
seems that compilation was successful. After starting R, I installed 
the latest rJava package, from the output:
checking whether JRI is requested... yes
cp src/libjri.so libjri.so
It seems JRI support has been compiled successfully. However, when I try 
to open R from within Eclipse, I receive an error message:

Launching the R Console was cancelled, because it seems starting the 
Java process/R engine failed.
Please make sure that R package 'rJava' with JRI is installed.

I can open an R console from the command line, and attach the rJava 
library without problems. What am I doing wrong here?
Thanks, Ruud

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Efficient passing through big data.frame and modifying select

2008-11-26 Thread Johannes Graumann
Marvelous! Thanks guys for your hints and time! Very smooth now!

Joh

On Wednesday 26 November 2008 03:41:49 Henrik Bengtsson wrote:
 Alright, here are another $.02: using 'use.names=FALSE' in unlist() is
 much faster than the default 'use.names=TRUE'. /Henrik

 On Tue, Nov 25, 2008 at 6:40 PM, Henrik Bengtsson [EMAIL PROTECTED] 
wrote:
  My $.02: Using argument 'fixed=TRUE' in strsplit() is much faster than
  the default 'fixed=FALSE'. /Henrik
 
  On Tue, Nov 25, 2008 at 1:02 PM, William Dunlap [EMAIL PROTECTED] wrote:
  -Original Message-
  From: William Dunlap
  Sent: Tuesday, November 25, 2008 9:16 AM
  To: '[EMAIL PROTECTED]'
  Subject: Re: [R] Efficient passing through big data.frame and
  modifying select fields
 
   Johannes Graumann johannes_graumann at web.de
   Tue Nov 25 15:16:01 CET 2008
  
   Hi all,
  
   I have relatively big data frames ( 1 rows by 80 columns)
   that need to be exposed to merge. Works marvelously well in
   general, but some fields of the data frames actually contain
   multiple ;-separated values encoded as a character string without
   defined order, which makes the fields not match each other.
  
   Example:
frame1[1,1]
  
   [1] some;thing
  
   frame2[2,1]
  
   [2] thing;some
  
   In order to enable merging/duplicate identification of columns
   containing these strings, I wrote the following function, which
   passes through the rows one by one, identifies ;-containing cells,
   splits and resorts them.
  
    ResortCombinedFields <- function(dframe){
      if(!is.data.frame(dframe)){
        stop("\"ResortCombinedFields\" input needs to be a data frame.")
      }
      for(row in seq(nrow(dframe))){
        for(mef in grep(";", dframe[row,])){
  
   I needed to add drop=TRUE to the above dframe[row,] for this to work.
  
          dframe[row,mef] <-
            paste(sort(unlist(strsplit(dframe[row,mef], ";"))), collapse=";")
        }
      }
      return(dframe)
    }
  
   works fine, but is horribly inefficient. How might this be
 
  tackled more elegantly?
 
   Thanks for any input, Joh
 
  It is usually faster to loop over columns of an data frame and use row
  subscripting, if needed, on individual columns.  E.g., the following
  2 are much quicker on a sample 1000 by 4 dataset I made with
 
  dframe <- data.frame(lapply(c(One=1,Two=2,Three=3),
     function(i)sapply(1:1000,
        function(i)
          paste(sample(LETTERS[1:5], size=sample(3,size=1), repl=FALSE),
                collapse=";"))),
     stringsAsFactors=FALSE)
  dframe$Four <- sample(LETTERS[1:5], size=nrow(dframe),
  replace=TRUE) # no ;'s in column Four
 
  The first function, f1, doesn't try to find which rows may
  need adjusting
  and the second, f2, does.
 
  f1 <- function(dframe){
    if(!is.data.frame(dframe)){
      stop("\"ResortCombinedFields\" input needs to be a data frame.")
    }
    for(icol in seq_len(ncol(dframe))){
      dframe[,icol] <- unlist(lapply(strsplit(dframe[,icol], ";"),
        function(parts) paste(sort(parts), collapse=";")))
    }
    return(dframe)
  }
 
  f2 <-
  function(dframe){
    if(!is.data.frame(dframe)){
      stop("\"ResortCombinedFields\" input needs to be a data frame.")
    }
    for(icol in seq_len(ncol(dframe))){
      col <- dframe[,icol]
      irow <- grep(";", col)
      if (length(irow)) {
        col[irow] <- unlist(lapply(strsplit(col[irow], ";"),
          function(parts) paste(sort(parts), collapse=";")))
        dframe[,icol] <- col
      }
    }
    return(dframe)
  }
 
  Times were
 
   unix.time(z <- ResortCombinedFields(dframe))
 
 user  system elapsed
2.526   0.022   2.559
 
   unix.time(f1z <- f1(dframe))
 
 user  system elapsed
0.509   0.000   0.508
 
   unix.time(f2z <- f2(dframe))
 
 user  system elapsed
0.259   0.004   0.264
 
   identical(z, f1z)
 
  [1] TRUE
 
   identical(z, f2z)
 
  [1] TRUE
 
  In R 2.7.0 (April 2008), f1() and f2() both take time proportional
  to nrow(dframe), while your original ResortCombinedFields() takes
  time proportional to the square of nrow(dframe).  E.g., for 50,000
  rows ResortCombinedFields takes 4252 seconds while f2 takes 14 seconds.
  It looks like 2.9 acts about the same.
 
  Bill Dunlap
  TIBCO Software Inc - Spotfire Division
  wdunlap tibco.com
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html and provide commented,
  minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Finding Stopping time

2008-11-26 Thread Debanjan Bhattacharjee
Can any one help me to solve problem in my code? I am actually trying to
find the stopping index N.
So first I generate random numbers from normals. There is no problem in
finding the first stopping index.
Now I want to find the second stopping index using obeservation starting
from the one after the first stopping index.
E.g. If my first stopping index was 5. I want to set 6th observation from
the generated normal variables as the first random
number, and I stop at second stopping index.

This is my code,


alpha <- 0.05
beta <- 0.07
a <- log((1-beta)/alpha)
b <- log(beta/(1-alpha))
theta1 <- 2
theta2 <- 3

cumsm <- function(n)
  {y <- NULL
   for(i in 1:n)
   {y[i] <- x[i]^2}
   s <- sum(y)
   return(s)
  }
psum <- function(p,q)
   {z <- NULL
    for(l in p:q)
    { z[l-p+1] <- x[l]^2}
    ps <- sum(z)
    return(ps)
   }
smm <- NULL
sm <- NULL
N <- NULL
Nout <- NULL
T <- NULL
k <- 0
x <- rnorm(100, theta1, theta1)
for(i in 1:length(x))
{
   sm[i] <- psum(1, i)
   T[i] <-
((i/2)*log(theta1/theta2)) + (((theta2-theta1)/(2*theta1*theta2))*sm[i]) - (i*(theta2-theta1)/2)
   if (T[i] <= b | T[i] >= a){N[1] <- i
  break}
}
for(j in 2:200)
{
 for(k in (N[j-1]+1):length(x))
{  smm[k] <- psum((N[j-1]+1), k)
   T[k] <-
((k/2)*log(theta1/theta2)) + (((theta2-theta1)/(2*theta1*theta2))*smm[k]) - (k*(theta2-theta1)/2)
   if (T[k] <= b | T[k] >= a){N[j] <- k
  break}
 }
}

But I cannot get the stopping index after the first one.
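For what it's worth, here is one way the restart could be structured, keeping
an explicit start index and counting observations since the last stop (an
illustrative rewrite using the constants above; the original intent may
differ in details):

```r
set.seed(1)
alpha <- 0.05; beta <- 0.07
a <- log((1 - beta) / alpha)          # upper boundary
b <- log(beta / (1 - alpha))          # lower boundary
theta1 <- 2; theta2 <- 3
x <- rnorm(1000, theta1, theta1)
N <- integer(0)                       # stopping indices
start <- 1
while (start <= length(x)) {
  s <- 0; stopped <- FALSE
  for (i in start:length(x)) {
    n <- i - start + 1                # observations since the last restart
    s <- s + x[i]^2
    T <- (n / 2) * log(theta1 / theta2) +
         ((theta2 - theta1) / (2 * theta1 * theta2)) * s -
         n * (theta2 - theta1) / 2
    if (T <= b | T >= a) {            # hit a boundary: record and restart
      N <- c(N, i); start <- i + 1; stopped <- TRUE; break
    }
  }
  if (!stopped) break                 # data exhausted without a decision
}
```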

Thanks
--

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices

2008-11-26 Thread David Winsemius
He might try rcorr from Hmisc instead. Using your test suite, it gives
about a 20% improvement on my MacPro:


 m1 <- matrix(rnorm(1), ncol=100)
 m2 <- matrix(rnorm(1), ncol=100)
 Rprof('/tempxx.txt')
 system.time(cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1,  
function(y) { rcorr(x,y)$P }) }))

   user  system elapsed
  4.221   0.049   4.289

 m1 <- matrix(rnorm(1), ncol=100)
 m2 <- matrix(rnorm(1), ncol=100)
 Rprof('/tempxx.txt')
 system.time(cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1,  
function(y) { cor.test(x,y)$p.value }) }))

   user  system elapsed
  5.328   0.038   5.355

I'm not a smart enough programmer to figure out whether there might be
an even more efficient method that takes advantage of rcorr's implicit
looping through a set of columns to produce an all-combinations
return.


--
David Winsemius, MD
Heritage Labs


On Nov 26, 2008, at 9:14 AM, jim holtman wrote:


Your time is being taken up in cor.test because you are calling it
100,000 times.  So grin and bear it with the amount of work you are
asking it to do.

Here I am only calling it 100 times:


m1 <- matrix(rnorm(1), ncol=100)
m2 <- matrix(rnorm(1), ncol=100)
Rprof('/tempxx.txt')
system.time(cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1,  
function(y) { cor.test(x,y)$p.value }) }))

   user  system elapsed
   8.86    0.00    8.89




so my guess is that calling it 100,000 times will take:  100,000 *
0.0886 seconds or about 3 hours.

If you run Rprof, you will see it is spending most of its time there:

 0   8.8 root
 1.8.8 apply
 2. .8.8 FUN
 3. . .8.8 apply
 4. . . .8.7 FUN
 5. . . . .8.6 cor.test
 6. . . . . .8.4 cor.test.default
 7. . . . . . .2.4 match.arg
 8. . . . . . . .1.7 eval
 9. . . . . . . . .1.4 deparse
10. . . . . . . . . .0.6 .deparseOpts
11. . . . . . . . . . .0.2 pmatch
11. . . . . . . . . . .0.1 sum
10. . . . . . . . . .0.5 %in%
11. . . . . . . . . . .0.3 match
12. . . . . . . . . . . .0.3 is.factor
13. . . . . . . . . . . . .0.3 inherits
 8. . . . . . . .0.2 formals
 9. . . . . . . . .0.2 sys.function
 7. . . . . . .2.1 cor
 8. . . . . . . .1.1 match.arg
 9. . . . . . . . .0.7 eval
10. . . . . . . . . .0.6 deparse
11. . . . . . . . . . .0.3 .deparseOpts
12. . . . . . . . . . . .0.1 pmatch
11. . . . . . . . . . .0.2 %in%
12. . . . . . . . . . . .0.2 match
13. . . . . . . . . . . . .0.1 is.factor
14. . . . . . . . . . . . . .0.1 inherits
 9. . . . . . . . .0.1 formals
 8. . . . . . . .0.5 stopifnot
 9. . . . . . . . .0.2 match.call
 8. . . . . . . .0.1 pmatch
 8. . . . . . . .0.1 is.data.frame
 9. . . . . . . . .0.1 inherits
 7. . . . . . .1.5 paste
 8. . . . . . . .1.4 deparse
 9. . . . . . . . .0.6 .deparseOpts
10. . . . . . . . . .0.3 pmatch
10. . . . . . . . . .0.1 any
 9. . . . . . . . .0.6 %in%
10. . . . . . . . . .0.6 match
11. . . . . . . . . . .0.5 is.factor
12. . . . . . . . . . . .0.4 inherits
13. . . . . . . . . . . . .0.2 mode
 7. . . . . . .0.4 switch
 8. . . . . . . .0.1 qnorm
 7. . . . . . .0.2 pt
 5. . . . .0.1 $

On Tue, Nov 25, 2008 at 11:55 PM, Daren Tan [EMAIL PROTECTED]  
wrote:


My two matrices are roughly the sizes of m1 and m2. I tried using  
two apply and cor.test to compute the correlation p.values. More  
than an hour, and the codes are still running. Please help to make  
it more efficient.


m1 <- matrix(rnorm(10), ncol=100)
m2 <- matrix(rnorm(1000), ncol=100)

cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1, function(y)  
{ cor.test(x,y)$p.value }) })


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] memory limit

2008-11-26 Thread iwalters

I'm currently working with very large datasets that consist of 1,000,000+
rows.  Is it at all possible to use R for datasets this size or should I
rather consider C++/Java.


-- 
View this message in context: 
http://www.nabble.com/increasing-memory-limit-in-Windows-Server-2008-64-bit-tp20675880p20699700.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Needs suggestions for choosing appropriate R packages

2008-11-26 Thread zhijie zhang
Dear all,
  I am thinking of fitting a multilevel dataset with R. I have found several
possible packages for my task, such as glmmPQL (MASS), glmm (repeated), and
glmer (lme4), et al. I am a little confused by these functions.
  Could anybody tell me which function/package is the correct one to analyse
my dataset?
My dataset is as follows:
the response variable P is binary variable (the subject is a patient or
not);
two explanatory variables X1 (age) and X2 (sex);
this dataset was sampled at three different levels -
district, school, individual - so it is regarded as a multilevel dataset.
I hope to fit the 3-level model(Y is binary variable):
Logit(Pijk)=(a0+b0k+c0jk)+b1*X1+b2*X2
  i-individual, first level;   j-school, 2nd level;   k-district,3rd level.
 I know that the GLIMMIX procedure in the latest SAS version 9.2 is a choice
for that, but unfortunately we do not have the latest version.
 R must have similar functions to do that; can anybody give me some
suggestions or help on analysing my dataset?
Q1: Which package/function is appropriate for my task?  Could you show me
some example code if possible?
Q2: Logit(Pijk)=(a0+b0k+c0jk)+(b1+b1j)*X1+b2*X2
  If the random effect is also specified on X1 as above, which
package/function is possible?
  Thanks a lot.
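For Q1, one candidate among the packages listed above is glmer from lme4; a
sketch on made-up data, with the district/school nesting written as
(1 | district/school). The variable names follow the question; everything
else here is invented for illustration:

```r
library(lme4)
set.seed(1)
d <- data.frame(
  district = factor(rep(1:5, each = 40)),   # 5 districts
  school   = factor(rep(1:20, each = 10)),  # 20 schools, nested in districts
  X1 = rnorm(200),                          # age (standardised)
  X2 = rbinom(200, 1, 0.5)                  # sex
)
d$P <- rbinom(200, 1, plogis(-0.5 + 0.3 * d$X1 + 0.2 * d$X2))
# Q1: random intercepts for district and school-within-district
m1 <- glmer(P ~ X1 + X2 + (1 | district/school), family = binomial, data = d)
# Q2: additionally a random slope for X1 at the school level (one option):
#   P ~ X1 + X2 + (1 | district) + (1 + X1 | district:school)
```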


-- 
With Kind Regards,

oooO:
(..):
:\.(:::Oooo::
::\_)::(..)::
:::)./:::
::(_/
:
[***]
ZhiJie Zhang ,PhD
Dept.of Epidemiology, School of Public Health,Fudan University
Office:Room 443, Building 8
Office Tel./Fax.:+86-21-54237410
Address:No. 138 Yi Xue Yuan Road,Shanghai,China
Postcode:200032
Email:[EMAIL PROTECTED] [EMAIL PROTECTED]
Website: www.statABC.com
[***]
oooO:
(..):
:\.(:::Oooo::
::\_)::(..)::
:::)./:::
::(_/
:

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] plm pakage

2008-11-26 Thread valeria pedrina
Hi everyone, I'm doing a panel data analysis and I want to run three
estimation methods against my available dataset: pooled OLS, random and fixed
effects. I have 9 individuals and 5 observations for each individual. This is
my code; what's wrong?

X <- cbind(y,x)
X <- data.frame(X)
ooo <- pdata.frame(X,9)
vedo <- plm(y~x, data=ooo)

and this is the error:
Errore in X.m[, coef.within, drop = F] : numero di dimensione errato
(Italian: wrong number of dimensions)
thanks

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] S4 object

2008-11-26 Thread Laurina Guerra
Hello, good day! Could someone explain to me, as simply as possible, what the
S4 object class is and how it works? A reply as soon as possible would be
appreciated. Thanks!


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Chi-Square Test Disagreement

2008-11-26 Thread Chuck Cleland
On 11/26/2008 9:51 AM, Andrew Choens wrote:
 I was asked by my boss to do an analysis on a large data set, and I am
 trying to convince him to let me use R rather than SPSS. I think Sweave
 could make my life much much easier. To get me a little closer to this
 goal, I ran my analysis through R and SPSS and compared the resulting
 values. In all but one case, they were the same. Given the matrix
 
 [,1] [,2]
 [1,]  110  358
 [2,]   71  312
 [3,]   29  139
 [4,]   31   77
 [5,]   13   32
 
 This is the output from R:
 chisq.test(test29)
 
   Pearson's Chi-squared test
 
 data:  test29
 X-squared = 9.593, df = 4, p-value = 0.04787
 
 But, the same data in SPSS generates a p value of .051. It's a small but
 important difference. I played around and rescaled things, and tried
 different values for B, but I never could get R to reach .051.
 
 I'd like to know which program is correct - R or SPSS? I know, this is a
 biased place to ask such a question. I also appreciate all input that
 will help me use R more effectively. The difference could be the result
 of my own ignorance.

  The SPSS p-value is for the Likelihood Ratio Chi-squared test, not
Pearson's.  For Pearson's Chi-squared test in SPSS (16.0.2), I get
p=0.04787, so the results do match if you do the same Chi-squared test.
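To see both numbers side by side in R, the likelihood-ratio (G-squared)
statistic that SPSS prints next to Pearson's can be computed by hand:

```r
test29 <- matrix(c(110, 71, 29, 31, 13,      # column 1
                   358, 312, 139, 77, 32),   # column 2
                 ncol = 2)
E  <- outer(rowSums(test29), colSums(test29)) / sum(test29)  # expected counts
G2 <- 2 * sum(test29 * log(test29 / E))      # likelihood-ratio chi-square
chisq.test(test29)$p.value                   # Pearson: ~0.048, as R reports
pchisq(G2, df = 4, lower.tail = FALSE)       # LR: ~0.051, matching SPSS
```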

 thanks
 --andy 

-- 
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices

2008-11-26 Thread Jorge Ivan Velez
Hi Daren,
Here is another approach that is a little bit faster, using your original
matrices.  My session info is at the end. I'm using a
2.4 GHz Core 2-Duo processor and 3 GB of RAM.

 # Data
 set.seed(123)
 m1 <- matrix(rnorm(10), ncol=100)
 m2 <- matrix(rnorm(10), ncol=100)
 colnames(m1)=paste('m1_',1:100,sep='')
 colnames(m2)=paste('m2_',1:100,sep='')

# Combinations
 combs=expand.grid(colnames(m1),colnames(m2))

# ---
# Option 1
#
system.time(apply(combs,1,function(x)
cor.test(m1[,x[1]],m2[,x[2]])$p.value) -> pvalues1)
#  user  system elapsed
#  8.12    0.01    8.20

# ---
# Option 2
#
require(Hmisc)
system.time(apply(combs,1,function(x)
rcorr(m1[,x[1]],m2[,x[2]])$P[2]) -> pvalues2)
#   user  system elapsed
#   7.00    0.00    7.02


HTH,

Jorge


# -  Session Info 
R version 2.8.0 Patched (2008-11-08 r46864)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base



On Tue, Nov 25, 2008 at 11:55 PM, Daren Tan [EMAIL PROTECTED] wrote:


 My two matrices are roughly the sizes of m1 and m2. I tried using two nested
 apply calls with cor.test to compute the correlation p-values. More than an
 hour later, the code is still running. Please help me make it more efficient.

 m1 <- matrix(rnorm(10), ncol=100)
 m2 <- matrix(rnorm(1000), ncol=100)

 cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1, function(y) {
 cor.test(x,y)$p.value }) })






[R] S4 slot containing either aov or NULL

2008-11-26 Thread Thomas Kaliwe

Dear listmembers,

I would like to define a class with a slot that takes either an object 
of class aov or NULL. I have been reading "S4 Classes in 15 pages, more 
or less" and "Lecture: S4 classes and methods".


#First I tried with list and NULL
setClass("listOrNULL")
setIs("list", "listOrNULL")
setIs("NULL", "listOrNULL")
#doesn't work

#defining a union class it works with list and NULL
setClassUnion("listOrNULL", c("list", "NULL"))
setClass("c1", representation(value = "listOrNULL"))
y1 = new("c1", value = NULL)
y2 = new("c1", value = list(a = 10))

#but it won't work with aov or NULL
setClassUnion("aovOrNULL", c("aov", "NULL"))
setClass("c1", representation(value = "aovOrNULL"))
y1 = new("c1", value = NULL)

#trying to assign an aov object to the slot doesn't work
utils::data(npk, package="MASS")
npk.aov <- aov(yield ~ block + N*P*K, npk)
y2 = new("c1", value = npk.aov)

Any ideas?
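
One likely explanation (untested against the exact setup above): aov is an S3 class, so S4 has to be told about it with setOldClass() before it can be a member of a class union. A sketch under that assumption; the container class name "aovContainer" is hypothetical, chosen so it doesn't clobber the "c1" above:

```r
library(methods)

# Register the S3 class "aov" (which inherits from "lm") for S4 dispatch
setOldClass(c("aov", "lm"))
setClassUnion("aovOrNULL", c("aov", "NULL"))
setClass("aovContainer", representation(value = "aovOrNULL"))

# npk lives in the datasets package in current R (it was in MASS in 2008)
npk.aov <- aov(yield ~ block + N * P * K, data = npk)

y1 <- new("aovContainer", value = NULL)     # NULL is accepted
y2 <- new("aovContainer", value = npk.aov)  # and so is an aov fit
```

The list/NULL union works without this step because "list" and "NULL" are basic classes already known to the methods package.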

Thank you

Thomas Kaliwe



[R] odfWeave and XML... on a Mac

2008-11-26 Thread Tubin

I'm trying out odfWeave in a Mac environment and getting some odd behavior.
Having figured out that the code snippets only work if they're in certain
fonts, I was able to get R to run a test document through and produce an
output document.  After running it, though, I get a warning message: 

Warning message:
In file.remove(styles_2.xml) :
  cannot remove file 'styles_2.xml', reason 'No such file or directory'

This message is interesting given that about 20 lines earlier I see:
Renaming styles_2.xml to styles.xml

If I run the test doc in results=verbatim mode, I see that warning but my
results appear in the appropriate places in the output document:

*output*
This is the basic text stuff.  Now I will try to input the other stuff:

[1] 5

And this is the after-text. 
*end*

If I run the test document in results=xml mode, though, the output is blank:

***output*

This is the basic text stuff.  Now I will try to input the other stuff:


And this is the after-text.
**end*

Earlier posts on this forum suggest that the solution may involve loading an
earlier build of XML.  Is that likely to work?  And if so - stupid question
I'm sure, but how do I do that?

Thanks in advance for the time and attention of people more experienced than
myself...


-- 
View this message in context: 
http://www.nabble.com/odfWeave-and-XML...-on-a-Mac-tp20702670p20702670.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices

2008-11-26 Thread hadley wickham
On Wed, Nov 26, 2008 at 8:14 AM, jim holtman [EMAIL PROTECTED] wrote:
 Your time is being taken up in cor.test because you are calling it
 100,000 times.  So grin and bear it with the amount of work you are
 asking it to do.

 Here I am only calling it 100 times:

 m1 <- matrix(rnorm(1), ncol=100)
 m2 <- matrix(rnorm(1), ncol=100)
 Rprof('/tempxx.txt')
 system.time(cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1, 
 function(y) { cor.test(x,y)$p.value }) }))
   user  system elapsed
   8.86    0.00    8.89


 so my guess is that calling it 100,000 times will take:  100,000 *
 0.0886 seconds, or about 2.5 hours.

You can make it ~3 times faster by vectorising the testing:

m1 <- matrix(rnorm(1), ncol=100)
m2 <- matrix(rnorm(1), ncol=100)

system.time(cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1,
function(y) { cor.test(x,y)$p.value })}))


system.time({
r <- apply(m1, 1, function(x) { apply(m2, 1, function(y) { cor(x,y) })})

df <- nrow(m1) - 2
t <- sqrt(df) * r / sqrt(1 - r ^ 2)
p <- pt(t, df)
p <- 2 * pmin(p, 1 - p)
})
})


all.equal(cor.pvalues, p)


You can make cor much faster by stripping away all the error-checking
code and calling the internal C function directly (suggested by the
Rprof output):


system.time({
r <- apply(m1, 1, function(x) { apply(m2, 1, function(y) { cor(x,y) })})
})

system.time({
r2 <- apply(m1, 1, function(x) { apply(m2, 1, function(y) {
.Internal(cor(x, y, 4L, FALSE)) })})
})

1.5s vs 0.2s on my computer.  Combining both changes gives me a ~25-fold
speed up - I suspect you can do even better if you think about
what calculations are being duplicated in the computation of the
correlations.

Hadley

-- 
http://had.co.nz/



Re: [R] Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices

2008-11-26 Thread Daren Tan

Out of desperation I wrote the following function, which Hadley beat me to
:P. Thanks everyone for the great help. 
 

cor.p.values <- function(r, n) {
  df <- n - 2
  STATISTIC <- c(sqrt(df) * r / sqrt(1 - r^2))
  p <- pt(STATISTIC, df)
  return(2 * pmin(p, 1 - p))
}
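
To sanity-check a function like this against cor.test, one can compute all pairwise correlations in a single cor() call and convert them in one vectorised step (the matrix sizes here are tiny and purely illustrative). Note that the c() inside the function drops the matrix dimensions, so the p-values come back as a plain vector in column-major order:

```r
cor.p.values <- function(r, n) {
  df <- n - 2
  STATISTIC <- c(sqrt(df) * r / sqrt(1 - r^2))
  p <- pt(STATISTIC, df)
  return(2 * pmin(p, 1 - p))
}

set.seed(1)
m1 <- matrix(rnorm(500), ncol = 5)  # 100 observations, 5 variables
m2 <- matrix(rnorm(500), ncol = 5)

r <- cor(m1, m2)                    # 5 x 5 matrix of all column pairs
p <- cor.p.values(r, n = nrow(m1))

# r[1, 2] is cor(m1[, 1], m2[, 2]); in column-major order that is element 6
p.ref <- cor.test(m1[, 1], m2[, 2])$p.value
all.equal(p[6], p.ref)  # TRUE
```

This matches cor.test's two-sided Pearson p-value because 2 * pmin(p, 1 - p) is exactly 2 * pt(-|t|, df).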

 Date: Wed, 26 Nov 2008 09:33:59 -0600
 From: [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Subject: Re: [R] Very slow: using double apply and cor.test to compute 
 correlation p.values for 2 matrices
 CC: [EMAIL PROTECTED]; [EMAIL PROTECTED]
 
 On Wed, Nov 26, 2008 at 8:14 AM, jim holtman  wrote:
 Your time is being taken up in cor.test because you are calling it
 100,000 times. So grin and bear it with the amount of work you are
 asking it to do.

 Here I am only calling it 100 times:

 m1 <- matrix(rnorm(1), ncol=100)
 m2 <- matrix(rnorm(1), ncol=100)
 Rprof('/tempxx.txt')
 system.time(cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1, 
 function(y) { cor.test(x,y)$p.value }) }))
 user system elapsed
 8.86 0.00 8.89


 so my guess is that calling it 100,000 times will take: 100,000 *
 0.0886 seconds, or about 2.5 hours.
 
 You can make it ~3 times faster by vectorising the testing:
 
 m1 <- matrix(rnorm(1), ncol=100)
 m2 <- matrix(rnorm(1), ncol=100)
 
 system.time(cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1,
 function(y) { cor.test(x,y)$p.value })}))
 
 
 system.time({
 r <- apply(m1, 1, function(x) { apply(m2, 1, function(y) { cor(x,y) })})
 
 df <- nrow(m1) - 2
 t <- sqrt(df) * r / sqrt(1 - r ^ 2)
 p <- pt(t, df)
 p <- 2 * pmin(p, 1 - p)
 })
 
 
 all.equal(cor.pvalues, p)
 
 
 You can make cor much faster by stripping away all the error-checking
 code and calling the internal C function directly (suggested by the
 Rprof output):
 
 
 system.time({
 r <- apply(m1, 1, function(x) { apply(m2, 1, function(y) { cor(x,y) })})
 })
 
 system.time({
 r2 <- apply(m1, 1, function(x) { apply(m2, 1, function(y) {
 .Internal(cor(x, y, 4L, FALSE)) })})
 })
 
 1.5s vs 0.2s on my computer. Combining both changes gives me a ~25-fold
 speed up - I suspect you can do even better if you think about
 what calculations are being duplicated in the computation of the
 correlations.
 
 Hadley
 
 -- 
 http://had.co.nz/



Re: [R] plotting density for truncated distribution

2008-11-26 Thread Jeroen Ooms

thank you, both solutions are really helpful!
-- 
View this message in context: 
http://www.nabble.com/plotting-density-for-truncated-distribution-tp20684995p20703469.html
Sent from the R help mailing list archive at Nabble.com.



[R] RES: S4 object

2008-11-26 Thread Rodrigo Aluizio
Take a look at the links you will find in this previous post to the list:

http://tolstoy.newcastle.edu.au/R/help/06/01/18259.html

I myself don't know anything about this subject.

Sorry, but you will probably find what you need there.

Best Wishes

Rodrigo.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
behalf of Laurina Guerra
Sent: Wednesday, 26 November 2008 11:43
To: r-help@r-project.org
Subject: [R] S4 object

Hello, good day! Could someone explain to me, as simply as possible, what
the S4 object class is and how it works? I would be grateful for an answer
as soon as possible. Thank you!





[R] RES: memory limit

2008-11-26 Thread Leandro Marino
It depends on the number of variables. If you are using 2 or 3 variables
you can do some things.

I suggest you read about the ff and ASOR packages; they manage the dataset
on disk and do some kind of I/O for you.
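
For a rough idea of what such packages automate, here is a minimal base-R sketch of chunked processing, so only one block of rows is in memory at a time (the file and chunk sizes are tiny stand-ins, purely illustrative):

```r
# Write a small CSV to stand in for the big file
f <- tempfile(fileext = ".csv")
write.csv(data.frame(x = rnorm(1000)), f, row.names = FALSE)

con <- file(f, open = "r")
header <- readLines(con, n = 1)

n <- 0; total <- 0
repeat {
  lines <- readLines(con, n = 200)              # read 200 rows at a time
  if (length(lines) == 0) break
  chunk <- read.csv(textConnection(c(header, lines)))
  n <- n + nrow(chunk)                          # accumulate running statistics
  total <- total + sum(chunk$x)
}
close(con)

c(rows = n, mean = total / n)  # same mean as reading the file in one go
```

A million rows of a few numeric columns also fits comfortably in RAM on most machines, so plain read.table is often fine before reaching for chunking.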

Regards,

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
behalf of iwalters
Sent: Wednesday, 26 November 2008 09:42
To: r-help@r-project.org
Subject: [R] memory limit


I'm currently working with very large datasets that consist of 1,000,000+
rows.  Is it at all possible to use R for datasets this size, or should I
rather consider C++/Java?


-- 
View this message in context:
http://www.nabble.com/increasing-memory-limit-in-Windows-Server-2008-64-bit-
tp20675880p20699700.html
Sent from the R help mailing list archive at Nabble.com.




[R] ts subscripting problem

2008-11-26 Thread andrew collier
hi,

i am having trouble getting a particular time series to plot. this is what i
have:

 class(irradiance)
[1] ts
 irradiance[1:30]
  197811   197812   197901   197902   197903   197904   197905   197906
1366.679 1366.729 1367.476 1367.739 1368.339 1367.883 1367.916 1367.055
  197907   197908   197909   197910   197911   197912   198001   198002
1367.484 1366.887 1366.935 1367.034 1366.997 1367.310 1367.041 1366.459
  198003   198004   198005   198006   198007   198008   198009   198010
1367.143 1366.553 1366.597 1366.854 1366.814 1366.901 1366.622 1366.669
  198011   198012   198101   198102   198103   198104
1365.874 1366.098 1367.141 1366.239 1366.323 1366.388
 plot(irradiance[1:30])
 plot(irradiance)
Error in dn[[2]] : subscript out of bounds

so, if i plot a subset of the data it works fine. but if i try to plot the
whole thing it breaks. the ts object was created using:

irradiance = ts(tapply(d$number, f, mean), freq = 12, start = c(1978, 11))

and other ts objects that i have defined using basically the same approach
work fine.

any ideas greatly appreciated!

cheers,
andrew.
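
Without the data this can't be verified, but one common cause of exactly this error is that tapply() returns a 1-dimensional array, and the dim attribute survives inside the ts object, confusing plot() (subsetting with [1:30] drops the attribute, which is why the subset plots fine). Stripping it with as.vector() is a cheap thing to try; a hedged reproduction with made-up numbers:

```r
set.seed(1)
x <- tapply(rnorm(30), factor(1:30), mean)  # 1-d array: dim(x) has length 1

# Dropping the dim attribute before calling ts() yields a plain ts object
irradiance <- ts(as.vector(x), frequency = 12, start = c(1978, 11))

is.null(dim(irradiance))  # TRUE: plot(irradiance) then behaves normally
```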




[R] survreg and pweibull

2008-11-26 Thread Andrew Beckerman

Dear all -

I have followed the thread, in which the reply was led by Thomas  
Lumley, about using pweibull to generate fitted survival curves for  
survreg models.


http://tolstoy.newcastle.edu.au/R/help/04/11/7766.html

Using the lung data set,

data(lung)
lung.wbs - survreg( Surv(time, status)~ 1, data=lung, dist='weibull')
curve(pweibull(x, scale=exp(coef(lung.wbs)), shape=1/lung.wbs 
$scale,lower.tail=FALSE),from=0, to=max(lung$time))

lines(survfit(Surv(time,status)~1, data=lung), col="red")

Assuming this is correct, why does the inflection point of this curve  
not match up to the exp(scale parameter)?  Am I wrong in assuming that  
the scale represents the inflection, and the shape adjusts the shape  
around this point?  I think I am perhaps confusing the scale and  
the median with the inflection point calculation?


One can visualise the mismatch with:

abline(v=exp(coef(lung.wbs)),lty=2)
abline(h=0.5,lty=2)

Many thanks for the clarification
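
On the scale/median question: for a Weibull survival function S(t) = exp(-(t/scale)^shape), the scale parameter is the time where S(t) = exp(-1) ≈ 0.368 (for any shape), not where S(t) = 0.5; the median is scale * log(2)^(1/shape). A sketch with made-up parameter values (not the actual lung fit):

```r
shape.w <- 1.3   # illustrative shape, assumed for this sketch
scale.w <- 400   # illustrative scale, assumed for this sketch

# Survival at t = scale is exp(-1) for every shape, so the vertical line
# at exp(coef(fit)) crosses the curve at ~0.368, not at 0.5
S.at.scale <- pweibull(scale.w, shape.w, scale.w, lower.tail = FALSE)

# The median survival time is scale * log(2)^(1/shape)
med <- scale.w * log(2)^(1 / shape.w)

c(S.at.scale = S.at.scale,                  # 0.3678794...
  median     = med,
  check      = qweibull(0.5, shape.w, scale.w))  # equals med
```

So the mismatch observed with abline(h=0.5) is expected: the two dashed lines cross the curve at different points by construction.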

R version 2.8.0 (2008-10-20)
i386-apple-darwin8.11.1
locale:
en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] splines   datasets  utils stats graphics  grDevices  
methods   base

other attached packages:
[1] survival_2.34-1 Hmisc_3.4-3 lattice_0.17-15 MASS_7.2-44
loaded via a namespace (and not attached):
[1] cluster_1.11.11 grid_2.8.0  tools_2.8.0

Andrew

-
Dr. Andrew Beckerman
Department of Animal and Plant Sciences, University of Sheffield,
Alfred Denny Building, Western Bank, Sheffield S10 2TN, UK
ph +44 (0)114 222 0026; fx +44 (0)114 222 0002
http://www.beckslab.staff.shef.ac.uk/

http://www.flickr.com/photos/apbeckerman/
http://www.warblefly.co.uk



[R] Second y-axis

2008-11-26 Thread Dr. Alireza Zolfaghari
Hi list,
In the following code, how can I place the percentage label away from the
numbers on the second y-axis (let's say everything should be inside the plot area)?

Thanks
Alireza

=
require(grid)
vp <- viewport(x=.1, y=.1, width=.6, height=.6, just=c("left", "bottom"))
pushViewport(vp)
plotDATA=data.frame(Loss=c(1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10),Level=c("AvgAll","AvgAll","AvgAll","AvgAll","AvgAll","AvgAll","AvgAll",
"AvgAll","AvgAll","AvgAll","AvgAll","AvgAll","GUL","GUL","GUL","GUL","GUL","GUL","GUL","GUL"),Line=c(1,2,3,4,5,6,7,8,9,10,11,12,1,2,3,4,5,6,7,8))
library(lattice)
xyplot( Loss ~ Line, data=plotDATA, t="p",
scales=list(relation="free", x=list(draw=TRUE, tick.number=12, labels=
1:12)),par.settings = list(clip = list(panel = "off")))
p <- xyplot( Loss ~ Line, data=plotDATA,
t="p",scales=list(relation="free",x=list(at = 1:12)),
panel=function(x,y,subscripts, groups,...){
panel.xyplot(subset(plotDATA, Level=="AvgAll")$Line,subset(plotDATA,
Level=="AvgAll")$Loss ,col=Lloydscolour("colIncP"),lwd=3,origin=0,...)
panel.axis(side = "right",
at=unique(plotDATA$Loss),labels=unique(plotDATA$Loss)/max(plotDATA$Loss)*100,outside=FALSE,ticks=TRUE,half=FALSE)
panel.axis(side = "right",
at=median(plotDATA$Loss),labels="Percentage",outside=FALSE,ticks=FALSE,half=FALSE,rot=90)
panel.axis(side = "right",
at=c(4,8),labels=c(200,400),outside=TRUE,ticks=TRUE,half=FALSE)
panel.barchart(subset(plotDATA,Level=="GUL" )$Line,
subset(plotDATA,Level=="GUL" )$Loss,box.ratio=1,horizontal = FALSE,stack =
TRUE,reference = TRUE,col="blue",border="blue")#,origin=0)
 }
 )
 )

 print(p,position = c(0.1, 0.1, 0.9, .9))
=




Re: [R] multiple imputation with fit.mult.impute in Hmisc - how to replace NA with imputed value?

2008-11-26 Thread Charlie Brush

Frank E Harrell Jr wrote:

Charlie Brush wrote:

I am doing multiple imputation with Hmisc, and
can't figure out how to replace the NA values with
the imputed values.

Here's a general outline of the process:

  set.seed(23)
  library(mice)
  library(Hmisc)
  library(Design)
  d <- read.table("DailyDataRaw_01.txt", header=T)
  length(d);length(d[,1])
[1] 43
[1] 2666
So for this data set, there are 43 columns and 2666 rows

Here is a piece of data.frame d:
  d[1:20,4:6]
 P01  P02  P03
1  0.1 0.16 0.16
2   NA 0.00 0.00
3   NA 0.60 0.04
4   NA 0.15 0.00
5   NA 0.00 0.00
6  0.7 0.00 0.75
7   NA 0.00 0.00
8   NA 0.00 0.00
9  0.0 0.00 0.00
10 0.0 0.00 0.00
11 0.0 0.00 0.00
12 0.0 0.00 0.00
13 0.0 0.00 0.00
14 0.0 0.00 0.00
15 0.0 0.00 0.03
16  NA 0.00 0.00
17  NA 0.01 0.00
18 0.0 0.00 0.00
19 0.0 0.00 0.00
20 0.0 0.00 0.00

These are daily precipitation values at NCDC stations, and
NA values at station P01 will be filled using multiple
imputation and data from highly correlated stations P02 and P08:

  f <- aregImpute(~ I(P01) + I(P02) + I(P08), 
n.impute=10, match='closest', data=d)

Iteration 13
  fmi <- fit.mult.impute( P01 ~ P02 + P08 , ols, f, d)

Variance Inflation Factors Due to Imputation:

Intercept   P02   P08
   1.01  1.39  1.16

Rate of Missing Information:

Intercept   P02   P08
   0.01  0.28  0.14

d.f. for t-distribution for Tests of Single Coefficients:

Intercept   P02   P08
242291.18    116.05    454.95
  r <- apply(f$imputed$P01,1,mean)
  r
    2     3     4     5     7     8    16    17   249   250   251
0.002 0.430 0.044 0.002 0.002 0.002 0.002 0.123 0.002 0.002 0.002
 252   253   254   255   256   257   258   259   260   261   262
1.033 0.529 1.264 0.611 0.002 0.513 0.085 0.002 0.705 0.840 0.719
 263   264   265   266   267   268   269   270   271   272   273
1.489 0.532 0.150 0.134 0.002 0.002 0.002 0.002 0.002 0.055 0.135
 274   275   276   277   278   279   280   281   282   283   284
0.009 0.002 0.002 0.002 0.008 0.454 1.676 1.462 0.071 0.002 1.029
 285   286   287   288   289   418   419   420   421   422   700
0.055 0.384 0.947 0.002 0.002 0.008 0.759 0.066 0.009 0.002 0.002

--
So far, this is working great.
Now, make a copy of d:
  dnew - d

And then fill in the NA values in P01 with the values in r

For example:
  for (i in 1:length(r)){
   dnew$P01[r[i,1]] <- r[i,2]
   }
This doesn't work, because each 'piece' of r is two numbers:
  r[1]
  2
0.002
  r[1,1]
Error in r[1, 1] : incorrect number of dimensions

My question: how can I separate the two items in (for example)
r[1] to use the first part as an index and the second as a value,
and then use them to replace the NA values with the imputed values?

Or is there a better way to replace the NA values with the imputed 
values?


Thanks in advance for any help.



You didn't state your goal, or why fit.mult.impute does not do what 
you want.  But you can look inside fit.mult.impute to see how it 
retrieves the imputed values.  Also see the example in the documentation 
for transcan, in which the command impute(xt, imputation=1) is used to 
retrieve one of the multiple imputations.


Note that you can say library(Design) (omit the quotes) to access both 
Design and Hmisc.


Frank

Thanks for your help.
My goal is to replace the NA values in the (copy of the) data frame with 
the means of the imputed values (which are now in variable 'r').
fit.mult.impute works fine. I just can't figure out the last step, 
taking the results of fit.mult.impute (which are in variable 'r') and 
replacing the NA values in the (copy of the) data frame.
A simple for loop doesn't work because the items in 'r' don't look like 
a normal vector, as for example r[1] returns

 2
0.002
Is there a command to replace the NA values in the data frame with the 
means of the imputed values?
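
For the purely mechanical part of the question (leaving aside whether single-value replacement is statistically advisable; see the follow-up in this thread): r is a named numeric vector, and its names are the row indices of the originally missing entries, so they can be converted and used as an index directly. A toy sketch with made-up numbers standing in for the real data:

```r
d <- data.frame(P01 = c(0.1, NA, NA, 0.7, 0.0))  # stand-in for the real data
r <- c("2" = 0.002, "3" = 0.430)                 # stand-in for apply(f$imputed$P01, 1, mean)

dnew <- d
dnew$P01[as.integer(names(r))] <- r              # names(r) are the row indices

dnew$P01  # 0.100 0.002 0.430 0.700 0.000
```

The confusing r[1, 1] error arises because r is a 1-dimensional named vector, not a two-column table; the "two numbers" printed are the name and the value.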


Thanks,
Charlie



Re: [R] multiple imputation with fit.mult.impute in Hmisc - how to replace NA with imputed value?

2008-11-26 Thread Frank E Harrell Jr

Charlie Brush wrote:

Frank E Harrell Jr wrote:

Charlie Brush wrote:

I am doing multiple imputation with Hmisc, and
can't figure out how to replace the NA values with
the imputed values.

Here's a general outline of the process:

  set.seed(23)
  library(mice)
  library(Hmisc)
  library(Design)
  d <- read.table("DailyDataRaw_01.txt", header=T)
  length(d);length(d[,1])
[1] 43
[1] 2666
So for this data set, there are 43 columns and 2666 rows

Here is a piece of data.frame d:
  d[1:20,4:6]
 P01  P02  P03
1  0.1 0.16 0.16
2   NA 0.00 0.00
3   NA 0.60 0.04
4   NA 0.15 0.00
5   NA 0.00 0.00
6  0.7 0.00 0.75
7   NA 0.00 0.00
8   NA 0.00 0.00
9  0.0 0.00 0.00
10 0.0 0.00 0.00
11 0.0 0.00 0.00
12 0.0 0.00 0.00
13 0.0 0.00 0.00
14 0.0 0.00 0.00
15 0.0 0.00 0.03
16  NA 0.00 0.00
17  NA 0.01 0.00
18 0.0 0.00 0.00
19 0.0 0.00 0.00
20 0.0 0.00 0.00

These are daily precipitation values at NCDC stations, and
NA values at station P01 will be filled using multiple
imputation and data from highly correlated stations P02 and P08:

  f <- aregImpute(~ I(P01) + I(P02) + I(P08), 
n.impute=10, match='closest', data=d)

Iteration 13
  fmi <- fit.mult.impute( P01 ~ P02 + P08 , ols, f, d)

Variance Inflation Factors Due to Imputation:

Intercept   P02   P08
   1.01  1.39  1.16

Rate of Missing Information:

Intercept   P02   P08
   0.01  0.28  0.14

d.f. for t-distribution for Tests of Single Coefficients:

Intercept   P02   P08
242291.18    116.05    454.95
  r <- apply(f$imputed$P01,1,mean)
  r
    2     3     4     5     7     8    16    17   249   250   251
0.002 0.430 0.044 0.002 0.002 0.002 0.002 0.123 0.002 0.002 0.002
 252   253   254   255   256   257   258   259   260   261   262
1.033 0.529 1.264 0.611 0.002 0.513 0.085 0.002 0.705 0.840 0.719
 263   264   265   266   267   268   269   270   271   272   273
1.489 0.532 0.150 0.134 0.002 0.002 0.002 0.002 0.002 0.055 0.135
 274   275   276   277   278   279   280   281   282   283   284
0.009 0.002 0.002 0.002 0.008 0.454 1.676 1.462 0.071 0.002 1.029
 285   286   287   288   289   418   419   420   421   422   700
0.055 0.384 0.947 0.002 0.002 0.008 0.759 0.066 0.009 0.002 0.002

--
So far, this is working great.
Now, make a copy of d:
  dnew - d

And then fill in the NA values in P01 with the values in r

For example:
  for (i in 1:length(r)){
   dnew$P01[r[i,1]] <- r[i,2]
   }
This doesn't work, because each 'piece' of r is two numbers:
  r[1]
  2
0.002
  r[1,1]
Error in r[1, 1] : incorrect number of dimensions

My question: how can I separate the two items in (for example)
r[1] to use the first part as an index and the second as a value,
and then use them to replace the NA values with the imputed values?

Or is there a better way to replace the NA values with the imputed 
values?


Thanks in advance for any help.



You didn't state your goal, or why fit.mult.impute does not do what 
you want.  But you can look inside fit.mult.impute to see how it 
retrieves the imputed values.  Also see the example in the documentation 
for transcan, in which the command impute(xt, imputation=1) is used to 
retrieve one of the multiple imputations.


Note that you can say library(Design) (omit the quotes) to access both 
Design and Hmisc.


Frank

Thanks for your help.
My goal is to replace the NA values in the (copy of the) data frame with 
the means of the imputed values (which are now in variable 'r').
fit.mult.impute works fine. I just can't figure out the last step, 
taking the results of fit.mult.impute (which are in variable 'r') and 
replacing the NA values in the (copy of the) data frame.
A simple for loop doesn't work because the items in 'r' don't look like 
a normal vector, as for example r[1] returns

 2
0.002
Is there a command to replace the NA values in the data frame with the 
means of the imputed values?


Thanks,
Charlie



Don't do that, as this would no longer be multiple imputation.  If you 
want single conditional mean imputation use transcan.


Frank


--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University



Re: [R] Chi-Square Test Disagreement

2008-11-26 Thread Berwin A Turlach
G'day Andy,

On Wed, 26 Nov 2008 14:51:50 +
Andrew Choens [EMAIL PROTECTED] wrote:

 I was asked by my boss to do an analysis on a large data set, and I am
 trying to convince him to let me use R rather than SPSS. 

Very laudable of you. :)

 This is the output from R:
  chisq.test(test29)
 
   Pearson's Chi-squared test
 
 data:  test29
 X-squared = 9.593, df = 4, p-value = 0.04787
 
 But, the same data in SPSS generates a p value of .051. It's a small
 but important difference. 

Chuck explained already the reason for this small difference.  I just
take issue with it being an important difference.  In my opinion, this
difference is not important at all.  It would only be important to
people who are still sticking to arbitrary cut-off points that are
mainly due to historical coincidences and the lack of computing power
at those times in history.  If somebody tells you that this difference
is important, ask him or her whether he or she will be willing to
finance you a room full of calculators (in the sense of Pearson's time)
and whether he or she wants you to do all your calculations and analyses
with these calculators in future.  Alternatively, you could ask the
person whether he or she would like the anaesthetist during his or her
next operation to use chloroform given his or her nostalgic penchant for
out-dated rituals/methods.

 I played around and rescaled things, and tried different values for
 B, but I never could get R to reach .051.

Well, I have no problem when using simulated p-values to get something
close to 0.051; look at the last try.  The second one might also be
noteworthy.  Unfortunately, I didn't save the seed beforehand.

 test29 <- matrix(c(110,358,71,312,29,139,31,77,13,32), byrow=TRUE,
 ncol=2)
 test29
 [,1] [,2]
[1,]  110  358
[2,]   71  312
[3,]   29  139
[4,]   31   77
[5,]   13   32
 chisq.test(test29, simul=TRUE)

Pearson's Chi-squared test with simulated p-value (based on 2000
replicates)

data:  test29 
X-squared = 9.593, df = NA, p-value = 0.04798

 chisq.test(test29, simul=TRUE)

Pearson's Chi-squared test with simulated p-value (based on 2000
replicates)

data:  test29 
X-squared = 9.593, df = NA, p-value = 0.05697

 chisq.test(test29, simul=TRUE, B=20000)

Pearson's Chi-squared test with simulated p-value (based on
20000 replicates)

data:  test29 
X-squared = 9.593, df = NA, p-value = 0.0463

 chisq.test(test29, simul=TRUE, B=20000)

Pearson's Chi-squared test with simulated p-value (based on
20000 replicates)

data:  test29 
X-squared = 9.593, df = NA, p-value = 0.0499

 chisq.test(test29, simul=TRUE, B=20000)

Pearson's Chi-squared test with simulated p-value (based on
20000 replicates)

data:  test29 
X-squared = 9.593, df = NA, p-value = 0.0486

 chisq.test(test29, simul=TRUE, B=20000)

Pearson's Chi-squared test with simulated p-value (based on
20000 replicates)

data:  test29 
X-squared = 9.593, df = NA, p-value = 0.05125
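
Since the seed wasn't saved above, here is a reproducible version of the same experiment (the seed value is arbitrary, chosen only so the simulated p-value can be replicated):

```r
test29 <- matrix(c(110, 358, 71, 312, 29, 139, 31, 77, 13, 32),
                 byrow = TRUE, ncol = 2)

set.seed(42)  # arbitrary seed, so the run is reproducible
out <- chisq.test(test29, simulate.p.value = TRUE, B = 20000)
out$p.value   # hovers around 0.05, in line with the runs above
```

With B = 20000 replicates the Monte Carlo standard error is about sqrt(0.05 * 0.95 / 20000) ≈ 0.0015, which is why the runs above scatter between roughly 0.046 and 0.057.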


Cheers,

Berwin

=== Full address =
Berwin A TurlachTel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability+65 6516 6650 (self)
Faculty of Science  FAX : +65 6872 3919   
National University of Singapore 
6 Science Drive 2, Blk S16, Level 7  e-mail: [EMAIL PROTECTED]
Singapore 117546http://www.stat.nus.edu.sg/~statba



Re: [R] Chi-Square Test Disagreement

2008-11-26 Thread Andrew Choens
On Thu, 2008-11-27 at 00:46 +0800, Berwin A Turlach wrote:
 Chuck explained already the reason for this small difference.  I just
 take issue about it being an important difference.  In my opinion,
 this difference is not important at all.  It would only be important
 to people who are still sticking to arbitrary cut-off points that are
 mainly due to historical coincidences and the lack of computing power
 at those time in history.  If somebody tells you that this difference
 is important, ask him or her whether he or she will be willing to
 finance you a room full of calculators (in the sense of Pearson's
 time) and whether he or she wants you to do all your calculations and
 analyses with these calculators in future.  Alternatively, you could
 ask the person whether he or she would like the anaesthetist during
 his or her next operation to use chloroform given his or her nostalgic
 penchant for out-dated rituals/methods.

Yes he did and when I realized the source of my confusion I was
appropriately chastised. I felt like a bit of a fool. Of course, I
should try comparing apples to apples. Oranges are another thing
entirely.

As to the importance of the difference, I am of two minds. On the one
hand I fully agree with you. It is an anachronistic approach. On the
other hand we don't all have the pleasure of working in a math
department where such subtleties are well understood.

I work for a consulting firm that advises state and local governments
(USA). I personally do try to expand my understanding of statistics and
math (I do not have a degree in math), but my clients do not. When I'm
working with someone from the government, it is sometimes easier to
simply tell them that relationship x is significant at a certain level
of certainty. Although I doubt they could really explain the details,
they have some basic understanding of what I am talking about.
Subtleties are sometimes lost on our public servants.

And, since I do work for government, if I ask for a roomful of
calculators, I might just get them. And really, what am I going to do
with a roomful of calculators?

--andy


-- 
Insert something humorous here.  :-)



[R] Problem with aovlmer.fnc in languageR

2008-11-26 Thread Mats Exter

Dear R list,

I have a recurring problem with the languageR package, specifically the 
aovlmer.fnc function. When I try to run the following code (from R. H. 
Baayen's textbook):



# Example 1:
library(languageR)
latinsquare.lmer <- lmer(RT ~ SOA + (1 | Word) + (1 | Subject),
                         data = latinsquare)
x <- pvals.fnc(latinsquare.lmer,
               withMCMC = TRUE)
aovlmer.fnc(latinsquare.lmer,
            mcmc = x$mcmc,
            which = c("SOAmedium", "SOAshort"))


I get the following error message (German locale):


Fehler in anova(object) : Calculated PWRSS for a LMM is negative


Invoking traceback yields the following result:


> traceback()
4: .Call(mer_update_projection, object)
3: anova(object)
2: anova(object)
1: aovlmer.fnc(latinsquare.lmer, mcmc = x$mcmc, which = c("SOAmedium",
       "SOAshort"))


By contrast, the following code (without the aovlmer.fnc command) runs 
without error:



# Example 2:
library(languageR)
latinsquare.lmer <- lmer(RT ~ SOA + (1 | Word) + (1 | Subject),
                         data = latinsquare)
pvals.fnc(latinsquare.lmer,
          withMCMC = TRUE)


Similarly, the following code (without the pvals.fnc command, and 
consequently running aovlmer.fnc *without* MCMC sampling) also runs 
without error:



# Example 3:
library(languageR)
latinsquare.lmer <- lmer(RT ~ SOA + (1 | Word) + (1 | Subject),
                         data = latinsquare)
aovlmer.fnc(latinsquare.lmer,
            noMCMC = TRUE)


However, the following code results in exactly the same error message as 
given above, which puzzles me because up to and including the pvals.fnc 
command, the code is basically the same as in Example 2 (which runs 
fine); and the code in the final aovlmer.fnc command is the same as in 
Example 3 (which also runs fine) and, crucially, does not refer to the 
object created by pvals.fnc (named x) at all:



# Example 4:
library(languageR)
latinsquare.lmer <- lmer(RT ~ SOA + (1 | Word) + (1 | Subject),
                         data = latinsquare)
x <- pvals.fnc(latinsquare.lmer,
               withMCMC = TRUE)
aovlmer.fnc(latinsquare.lmer,
            noMCMC = TRUE)


Trying to run the examples given in the languageR documentation, no 
error occurs for me with the example in ?pvals.fnc; however, the same 
error as above occurs when running the example given in ?aovlmer.fnc.


I am using Windows XP SP3, R version 2.8.0, and updated packages. The 
following is the result of sessionInfo:



> sessionInfo()
R version 2.8.0 (2008-10-20)
i386-pc-mingw32

locale:

LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods 
base


other attached packages:
[1] languageR_0.953    lme4_0.999375-27   Matrix_0.999375-16
[4] zipfR_0.6-5        lattice_0.17-17


loaded via a namespace (and not attached):
[1] grid_2.8.0  tools_2.8.0


To sum up, the error occurs when invoking aovlmer.fnc, but somehow seems 
to be caused by previous invocation of pvals.fnc. I have reproduced the 
error on several other computers (running on Windows XP SP2, Windows XP 
SP3, and Windows Vista) with fresh installations of R, so I do hope I 
have excluded any trivial blunders on my part. Any help is greatly 
appreciated.


Thank you very much in advance!

Mats Exter

--
Mats Exter
Institut für Linguistik
Allgemeine Sprachwissenschaft
Universität zu Köln
50923 Köln
Germany



Re: [R] S4 slot containing either aov or NULL

2008-11-26 Thread Matthias Kohl

Dear Thomas,

take a look at setOldClass ...

## register aov (an S3 class) as a formally defined class
setOldClass("aov")

## list and some others are special cases; cf.
class ? list

## now your code should work
setClassUnion("aovOrNULL", c("aov", "NULL"))
setClass("c1", representation(value = "aovOrNULL"))
y1 <- new("c1", value = NULL)

## assigning an aov object to the slot now works
utils::data(npk, package = "MASS")
npk.aov <- aov(yield ~ block + N*P*K, npk)
y2 <- new("c1", value = npk.aov)

Best,
Matthias

Thomas Kaliwe wrote:

Dear listmembers,

I would like to define a class with a slot that takes either an object 
of class aov or NULL. I have been reading "S4 Classes in 15 pages, more 
or less" and "Lecture: S4 classes and methods".


#First I tried with list and NULL
setClass("listOrNULL")
setIs("list", "listOrNULL")
setIs("NULL", "listOrNULL")
#doesn't work

#defining a union class it works with list and NULL
setClassUnion("listOrNULL", c("list", "NULL"))
setClass("c1", representation(value = "listOrNULL"))
y1 <- new("c1", value = NULL)
y2 <- new("c1", value = list(a = 10))

#but it won't work with aov or NULL
setClassUnion("aovOrNULL", c("aov", "NULL"))
setClass("c1", representation(value = "aovOrNULL"))
y1 <- new("c1", value = NULL)

#trying to assign an aov object to the slot doesn't work
utils::data(npk, package = "MASS")
npk.aov <- aov(yield ~ block + N*P*K, npk)
y2 <- new("c1", value = npk.aov)

Any ideas?

Thank you

Thomas Kaliwe



--
Dr. Matthias Kohl
www.stamats.de



Re: [R] Chi-Square Test Disagreement

2008-11-26 Thread Ted Harding
On 26-Nov-08 17:57:52, Andrew Choens wrote:
 [...]
 And, since I do work for government, if I ask for a roomful of
 calculators, I might just get them. And really, what am I going
 to do with a roomful of calculators?
 
 --andy
 Insert something humorous here.  :-)

Next time the launch of an incoming nuclear strike is detected,
set them to work as follows (following Karl Pearson's historical
precedent):

  Anti-aircraft guns all day long: Computing for the
 Ministry of Munitions
 JUNE BARROW GREEN (Open University)
   From January 1917 until March 1918 Pearson and his
   staff of mathematicians and human computers at the
   Drapers Biometric Laboratory worked tirelessly on
   the computing of ballistic charts, high-angle range
   tables and fuze-scales for AV Hill of the Anti-Aircraft
   Experimental Section. Things did not always go smoothly
   -- Pearson did not take kindly to the calculations of
   his staff being questioned -- and Hill sometimes had
   to work hard to keep the peace.

If you have enough of them (and Pearson undoubtedly did, so you
can quote that in your requisition request), then you might just
get the answer in time!

[ The above excerpted from http://tinyurl.com/6byoub ]

Good luck!
Ted.


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 26-Nov-08   Time: 18:35:25
-- XFMail --



Re: [R] Error in sqlCopy in RODBC

2008-11-26 Thread BKMooney

I tried your suggestion...

library(RODBC)
channel <- odbcConnectAccess("db.mdb")
sqlCopy(channel, "Select * from tab", "newtab", destchannel = channel,
        safer = TRUE, append = TRUE, rownames = FALSE, fast = FALSE)
odbcClose(channel)

however, I am still running into errors, both when appending to an existing
table, or creating a table if the destination table does not exist.  

The code I am using is: 

query <- "select * from tblHistorical where MyDate between '2008-11-21'
and '2008-11-25'"
sqlCopy(RemoteChannel, query, "NewTable", destchannel = LocalChannel,
        safer = TRUE, append = TRUE, rownames = FALSE, fast = FALSE)


If I run this when NewTable does not yet exist in the database, it returns
the error:  
ERROR: Could not SQLExecDirect 42000 102 [Microsoft][ODBC SQL Server
Driver][SQL Server]Incorrect syntax near '16'.


And when this runs when NewTable is already defined with correct data
types, I get the error:  
Error in sqlSave(destchannel, dataset, destination, verbose = verbose,  :
unable to append to table ‘NewTable’


It seems I can't get it to work in either scenario.  

Any insight anyone could provide would be greatly appreciated.  



Dieter Menne wrote:
 
 
 
 BKMooney wrote:
 
 I am trying to copy portions of tables from one SQL database to another,
 using sqlCopy in the RODBC package.
 
 ...
 I am currently getting an error:
 Error in sqlSave(destchannel, dataset, destination, verbose = verbose,  :
   table 'LocalTable' already exists
 
 
 I can reproduce your error with my example file and fast=TRUE. You might
 try fast=FALSE 
 and append=TRUE when things like this happen. The following works for me
 
 library(RODBC)
 channel <- odbcConnectAccess("db.mdb")
 sqlCopy(channel, "Select * from tab", "newtab", destchannel = channel,
   safer = TRUE, append = TRUE, rownames = FALSE, fast = FALSE)
 odbcClose(channel)
 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Error-in-sqlCopy-in-RODBC-tp20691929p20706667.html
Sent from the R help mailing list archive at Nabble.com.



[R] Smoothed 3D plots

2008-11-26 Thread Jorge Ivan Velez
DeaR list,

I'm trying to represent some information via 3D plots. My data and session
info are at the end of this message. So far, I have tried scatterplot3d
(scatterplot3d),
persp3d (rgl), persp (graphics) and scatter3d (Rcmdr), but none of them gave
me what I'd like to have as the final result (please see [1] for a similar 3D
plot, changing
PF by ypred, pdn by h4 and pup by h11).

In general terms, I would like to plot a smoothed surface with h4 and h11 in
the x and y axis, respectively, and y in the z axis. I think that a
smoothing procedure should
work, but I don't know how to get that to work in R.

I would really appreciate any help you can provide me.

Thanks in advance,

Jorge Velez

[1] http://www.meyersanalytics.com/pfpuppdn.gif


# ---
# Data
# ---
res <- structure(list(ypred = c(0.718230085350727, 0.654361814214854,
0.644756374003864, 0.652006785905442, 0.695903817103918, 0.812977870063322,
0.639339808892218, 0.701034703527025, 0.740110233651808, 0.664942946000872,
0.866847509624424, 0.652502006511821, 0.650617983864292, 0.713152860580267,
0.717738908917513, 0.73869721161, 0.666120460155577, 0.652242288736383,
0.596880009623886, 0.671197684926038, 0.45, 0.820174620312253,
0.665066954282449, 0.735164674202043, 0.696027875479989, 0.646119215131913,
0.644235192484384, 0.663550099786557), h4 = c(1, 0, 1, 0, 1,
1, 1, 0, 0, 0, 1, 2, 2, 1, 0, 0, 0, 0, 2, 0, 2, 1, 1, 0, 0, 0,
0, 1), h11 = c(0, 0, 1, 0, 1, 2, 1, 0, 1, 2, 2, 1, 1, 0, 2, 1,
2, 0, 1, 2, 2, 2, 0, 1, 1, 0, 0, 0)), .Names = c("y", "h4",
"h11"), class = "data.frame", row.names = c(NA, -28L))


# 
# Some plots
# 

# Option 1
require(Rcmdr)
with(res,
scatter3d(h4, ypred, h11, xlab = 'Chr4', zlab = 'Chr11',
          ylab = 'Ratio', fit = c("linear", "quadratic"))
)

# Option 2
require(scatterplot3d)
with(res,
scatterplot3d(h4,ypred,h11, xlab='Chr4',zlab='Chr11',ylab='Ratio')
)


# -
# Session info
# -
R version 2.8.0 Patched (2008-11-08 r46864)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] tcltk stats graphics  grDevices utils datasets  methods
base

other attached packages:
[1] rgl_0.81 mgcv_1.4-1   Rcmdr_1.4-4  car_1.2-9

[5] scatterplot3d_0.3-27 lattice_0.17-15

loaded via a namespace (and not attached):
[1] grid_2.8.0




Re: [R] weighted ftable

2008-11-26 Thread Andrew Choens
I didn't know that. That is exactly what I need.

On Tue, 2008-11-25 at 13:00 -0800, Thomas Lumley wrote:
 On Mon, 24 Nov 2008, Andrew Choens wrote:
 
  I need to do some fairly deep tables, and ftable() offers most of what I
  need, except for the weighting. With smaller samples, I've just used
  replicate to let me have a weighted data set, but with this data set,
  I'm afraid replicate is going to make my data set too big for R to
  handle comfortably.
 
  That being said, is there some way to weight my table (similar to
  wtd.table) but offer the nuanced control and depth of ftable?
 
 xtabs() will take a weight as the left-hand side of the formula, and its 
 output can then be processed by ftable().
 
   -thomas
 
 Thomas Lumley Assoc. Professor, Biostatistics
 [EMAIL PROTECTED] University of Washington, Seattle
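In code, Thomas's suggestion looks something like this (an untested sketch on made-up data; the data frame and all variable names here are hypothetical, not from the original question):

```r
## Weighted flat table: put the weight on the left-hand side of the
## xtabs() formula, then hand the resulting table to ftable()
df <- data.frame(sex    = rep(c("F", "M"), each = 4),
                 race   = rep(c("A", "B"), times = 4),
                 smoker = rep(c("yes", "no"), times = 4),
                 wt     = c(1.2, 0.8, 1.1, 0.9, 1.3, 0.7, 1.0, 1.0))
tab <- xtabs(wt ~ sex + race + smoker, data = df)  # weighted counts
ftable(tab, row.vars = c("sex", "race"))
```

This avoids replicating rows, since the weights are summed directly into the table cells.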
-- 
Insert something humorous here.  :-)



Re: [R] Chi-Square Test Disagreement

2008-11-26 Thread Andrew Choens

 Next time the launch of an incoming nuclear strike is detected,
 set them to work as follows (following Karl Pearson's historical
 precedent):
 
   Anti-aircraft guns all day long: Computing for the
  Ministry of Munitions
  JUNE BARROW GREEN (Open University)
From January 1917 until March 1918 Pearson and his
staff of mathematicians and human computers at the
Drapers Biometric Laboratory worked tirelessly on
the computing of ballistic charts, high-angle range
tables and fuze-scales for AV Hill of the Anti-Aircraft
Experimental Section. Things did not always go smoothly
-- Pearson did not take kindly to the calculations of
his staff being questioned -- and Hill sometimes had
to work hard to keep the peace.
 
 If you have enough of them (and Pearson undoubtedly did, so you
 can quote that in your requisition request), then you might just
 get the answer in time!
 
 [ The above excerpted from http://tinyurl.com/6byoub ]
 
 Good luck!
 Ted.
 

That is absolutely classic.


-- 
Insert something humorous here.  :-)



[R] Reshape with var as fun.aggregate

2008-11-26 Thread locklin . jason
I used to be able to use variance for fun.aggregate, but since I 
upgraded from R 2.6 to 2.7 (Ubuntu hardy to intrepid), it no longer 
works.

> library(reshape)
> data(french_fries)
> ffm <- melt(french_fries, id=1:4, na.rm = TRUE)
> cast(ffm, rep ~ treatment, length)
  rep   1   2   3
1   1 579 578 575
2   2 580 579 580
> cast(ffm, rep ~ treatment, mean)
  rep        1        2        3
1   1 3.207772 3.177336 3.024522
2   2 3.181207 3.115544 3.277759
> cast(ffm, rep ~ treatment, var)
Error in fun.aggregate(data$value[0]) : 'x' is empty

Any ideas?
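One possible workaround, offered as an untested sketch: the error message suggests cast() now probes fun.aggregate with an empty vector, so a wrapper that tolerates empty input may get past it.

```r
## Guarded variance: return NA on an empty vector instead of erroring
safe.var <- function(x) if (length(x) == 0) NA else var(x)
cast(ffm, rep ~ treatment, safe.var)
```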



[R] Creating a vector based on lookup function

2008-11-26 Thread PDXRugger

I am still searching for a solution to what i think is a simple problem i am
having with building a vector in a for loop.  I have built a more
understandable example so hopefully that will help..help you, help me, if
you know what i mean.  

dev=400
#test location model TAZs to reference
cands=c(101,105,109)

#Create Object of length of cands
candslength=length(cands)

#TEST TAZ Vector 
CANDS=(100:110)

#Test Dev Vector
TAZDEVS=c(120,220,320,420,520,620,720,820,920,1020,1120)

#Creates datframe of CANDS and DEVS
TAZLIST=data.frame(CANDS,TAZDEVS)

for(i in 1:candslength){

cand=cands[i]   

#Creates an object based on the value of cands in TAZLIST
TAZDet=TAZLIST[TAZLIST$CANDS==cand[i],2]

}

What i would like to see is TAZDet filled with the cands' corresponding
values found in TAZLIST's second column, TAZDEVS; they would be
120,520,920.  So if things worked the way i would like them, TAZDet would be
a vector containing these three values (120,520,920).  At this point it
gives me NAs.  Any help would be appreciated. Thanks guys and gals.


Cheers, 
JR
-- 
View this message in context: 
http://www.nabble.com/Creating-a-vector-based-on-lookup-function-tp20706588p20706588.html
Sent from the R help mailing list archive at Nabble.com.



[R] Request for Assistance in R with NonMem

2008-11-26 Thread Michael White
Hi
I am having some problems running a covariate analysis with my
colleague using R with the NonMem program we are using for a graduate
school project. R and NonMem run fine without adding in the
covariates, but the program is giving us a problem when the covariate
analysis is added. We think the problem is with the R code to run the
covariate data analysis. We have the control stream, R code (along
with the error), and data attached. If anyone can help we would really
appreciate it. Thank you very much.
Michael



R Code:
# load MItools from within R
library(MItools)

ProjectDir <- "~/EssentialsOpenCourseware/continuous_PKPD/Examples"
NMcom <- "nm6osxg77big.pl"
cov   <- c("AGE", "BW")

# run NONMEM using 3032.ctl
NONR(ProjectDir = ProjectDir,
     NMcom = NMcom,
     dvname = "Test (mcg/L)",
     diag = 1,
     covplt = 1,
     b = 3032,
     boot = 0,
     nochecksum = TRUE,
     grp = c("FLAG")
)

ERROR MESSAGE:
ld warning: for symbol _cm41_ tentative definition of size 32 from
/usr/local/nm6osxg77big/nm/nonmem.a(INITL.o) is is smaller than the
real definition of size 28 from /usr/local/nm6osxg77big/nm/BLKDAT.o
/usr/local/nm6osxg77big/test/nm6osxg77big.pl is starting
/Users/mpdubb/EssentialsOpenCourseware/continuous_PKPD/Examples/3033/nonmem.exe
/usr/local/nm6osxg77big/test/nm6osxg77big.pl is complete
No TAD item for run 3033.
Error in as.vector(x, mode) : invalid 'mode' argument
Run 3033 complete.
NONR complete.

CONTROL STREAM (in NONMEM):
$PROB RUN# 3033 EFFECT COMP POPULATION PK-PD MODEL 2
$INPUT C REPL ID ARM TIME AGE BW GNDR DV MDV FLAG AMT EVID
;FLAG 1 = PK, FLAG 2=PD,
;CMT 1=GUT, 2=CENTRAL
;DV =PK  ORDERED BY ID, TIME
$DATA ../projectRSD1.csv IGNORE=C
$SUBROUTINE ADVAN2 TRANS2 INFN=../MItoolsRunlogNM6.for
$ABB COMRES=8
$PK
FLA2=FLAG
TVKA=THETA(1)
TVCL=THETA(2)
TVV=THETA(3)
CL=TVCL*EXP(ETA(1))
V=TVV*EXP(ETA(2))
KA=TVKA*EXP(ETA(3))
S2=V
$ERROR
EMAX=THETA(4)*EXP(ETA(4))
EC50=THETA(5)*EXP(ETA(5))
N = THETA(6)*EXP(ETA(6))
EFF= (EMAX*F**N)/(EC50**N+F**N) + ERR(2)
CONC= F + F*ERR(1)
Y = CONC*(2-FLAG) + EFF*(FLAG-1)
IPRED=Y
  LAST
  COM(1)=G(1,1)
  COM(2)=G(2,1)
  COM(3)=G(3,1)
  COM(4)=G(4,1)
  COM(5)=G(5,1)
  COM(6)=G(6,1)
  COM(7)=HH(1,1)
  COM(8)=HH(2,1)
$THETA
(0, 1.7) ;KA
(0, 4.7) ;CL
(0, 38) ;V
(0.0, .5)   ; EMAX
(0.0, 100)  ; EC50
(0.0, 1, 10); N
$OMEGA
0.04
0.04
0.04
0.04
0.04
0.04
$SIGMA
0.04
5
$ESTIMATION MAXEVAL= NOABORT METHOD=0 POSTHOC PRINT=5
$COVARIANCE
$TABLE ID TIME  AGE BW GNDR TVKA TVV TVCL ETA1 ETA2 ETA3
EVID NOPRINT ONEHEADER FILE=TABLE.PAR
$TABLE NOPRINT FILE=../3033.TAB ONEHEADER
ID TIME AGE BW GNDR KA V CL TVKA TVCL TVV FLAG FLA2 EVID EFF CONC IPRED FLAG
$TABLE NOPRINT ONEHEADER FILE=../3033par.TAB
ID TIME  AGE BW GNDR KA V CL TVKA TVCL TVV EMAX EC50 ETA1 ETA2 ETA3
EVID
$TABLE ID TIME COM(1)=G11 COM(2)=G21
COM(3)=G31 COM(4)=G41 COM(5)=G51 COM(6)=G61 COM(7)=H11
COM(8)=H21 IPRED MDV NOPRINT ONEHEADER FILE=cwtab1.deriv


Re: [R] Smoothed 3D plots

2008-11-26 Thread Mark Difford

Hi Jorge,

 In general terms, I would like to plot a smoothed surface with h4 and h11
 in
 the x and y axis, respectively, and y in the z axis. I think that a 
 smoothing procedure should work, but I don't know how to get that to work in
 R.

There are quite a few options. If I were you I would have a look at package
locfit:

plot(locfit(y ~ lp(h4, h11), data = res), type = "persp", theta = -50,
     phi = 25, tick = "detailed")

See, amongst other options, ?locfit.raw

Regards, Mark.

PS: Don't worry about the warning messages. They refer to switches that
should be set for panel functions.
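As an untested alternative to locfit, the same kind of smoothed surface can be sketched with base loess() plus persp(), assuming the res data frame from the post (whose response column is named y via the .Names attribute); the grid resolution of 25 is an arbitrary choice:

```r
## Fit a local-regression surface and evaluate it on a regular grid
fit <- loess(y ~ h4 + h11, data = res, degree = 2)
h4.seq  <- seq(0, 2, length.out = 25)
h11.seq <- seq(0, 2, length.out = 25)
grd <- expand.grid(h4 = h4.seq, h11 = h11.seq)
z   <- matrix(predict(fit, grd), nrow = length(h4.seq))
persp(h4.seq, h11.seq, z, xlab = "Chr4", ylab = "Chr11",
      zlab = "Ratio", theta = -50, phi = 25, ticktype = "detailed")
```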


-- 
View this message in context: 
http://www.nabble.com/Smoothed-3D-plots-tp20706730p20708262.html
Sent from the R help mailing list archive at Nabble.com.



[R] SPSSyntax function

2008-11-26 Thread Adrian Dusa
Dear all,

I've created a function called spssyntax that creates a syntax for
variable labels and value labels in SPSS (in the .sps syntax format
that was recently discussed), using a list of such labels in R.
The entry list looks like this (between the ### signs):

###
labels <- list()

labels$varlab$id <- "ID"
labels$varlab$v1 <- "A label for the first variable"

labels$vallab$v1 <- c(
    "Very low" = 1,
    "Low" = 2,
    "Middle" = 3,
    "High" = 4,
    "Very high" = 5,
    "Not Applicable" = -1,
    "Not Answered" = -2)
###


And the result syntax file looks like this (again, between the ### signs):

###
VARIABLE LABELS
id "ID"
v1 "A label for the first variable"
.

VALUE LABELS
v1
1 "Very low"
2 "Low"
3 "Middle"
4 "High"
5 "Very high"
-1 "Not Applicable"
-2 "Not Answered"
/
.

MISSING VALUES
v1 (-1, -2)
.
###

Should there be any package interested in adopting this function
(maybe foreign?), I'd be happy to contribute it.
Best regards,
Adrian



[R] R and SPSS

2008-11-26 Thread Applejus

Hi,

I have a code in R. Could anyone give me the best possible way (or just
ways!) to integrate it in SPSS?

Thanks!
-- 
View this message in context: 
http://www.nabble.com/R-and-SPSS-tp20708367p20708367.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Creating a vector based on lookup function

2008-11-26 Thread Charles C. Berry

On Wed, 26 Nov 2008, PDXRugger wrote:



I am still searching for a solution to what i think is a simple problem i am
having with building a vector in a for loop.  I have built a more
understandable example so hopefully that will help..help you, help me, if
you know what i mean.

dev=400
#test location model TAZs to reference
cands=c(101,105,109)

#Create Object of length of cands
candslength=length(cands)

#TEST TAZ Vector
CANDS=(100:110)

#Test Dev Vector
TAZDEVS=c(120,220,320,420,520,620,720,820,920,1020,1120)

#Creates datframe of CANDS and DEVS
TAZLIST=data.frame(CANDS,TAZDEVS)

for(i in 1:candslength){

cand=cands[i]

#Creates an object based on the value of cands in TAZLIST
TAZDet=TAZLIST[TAZLIST$CANDS==cand[i],2]

}

What i would like to see is TAZDet filled with the cands corresponding
values found in TAZLIST's second column TAZDEVS, they would be
120,520,920.


No.

101 is the second element of CANDS
220 is the second element of TAZDEVS

So, 120 is not 'corresponding' in any sense I know of. et cetera.


 So if things worked the way i would like them TAZDet would be

a vector containing these three values (120,520,920).  At this point it
gives me NAs.


Which happens because you have an obvious bug. What is

TAZLIST$CANDS==cand[3]
??




Any help would be desirable, Thanks guys and gals.


Is this what you seek?

with(TAZLIST,TAZDEVS[match(cands,CANDS)])

HTH,

Chuck
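For the record, running Chuck's one-liner against the example data from the original post gives the following (note that 101, 105 and 109 are the 2nd, 6th and 10th elements of CANDS, which is why the matched values differ from the 120,520,920 the poster expected):

```r
## Rebuild the poster's lookup table and apply match() as suggested
CANDS   <- 100:110
TAZDEVS <- c(120, 220, 320, 420, 520, 620, 720, 820, 920, 1020, 1120)
TAZLIST <- data.frame(CANDS, TAZDEVS)
cands   <- c(101, 105, 109)
with(TAZLIST, TAZDEVS[match(cands, CANDS)])
# [1]  220  620 1020
```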





Cheers,
JR
--
View this message in context: 
http://www.nabble.com/Creating-a-vector-based-on-lookup-function-tp20706588p20706588.html
Sent from the R help mailing list archive at Nabble.com.




Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]  UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901



Re: [R] Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices

2008-11-26 Thread Thomas Lumley


You can do much better by doing the correlations as a matrix operation:

> system.time({
+   m1 <- scale(m1)
+   m2 <- scale(m2)
+   r <- crossprod(m1, m2)/100
+   df <- 100
+   tstat <- sqrt(df)*r/sqrt(1-r^2)
+   p <- pt(tstat, df)
+ })
   user  system elapsed
  0.025   0.004   0.028

There might be a factor of n/(n-1) missing somewhere, which would be 
fixable if you could bring yourself to care about it.
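As a sanity check of the vectorised formula against cor.test() for a single pair (a sketch: it uses df = n - 2, which is what cor.test() uses, and a two-sided p-value, whereas the block above returns a one-sided tail probability):

```r
set.seed(1)
x <- rnorm(100); y <- rnorm(100)
r  <- cor(x, y)
df <- length(x) - 2                     # 98, as in cor.test()
tstat <- sqrt(df) * r / sqrt(1 - r^2)
p <- 2 * pt(-abs(tstat), df)            # two-sided p-value
all.equal(p, cor.test(x, y)$p.value)    # TRUE
```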


-thomas



On Wed, 26 Nov 2008, Jorge Ivan Velez wrote:


Hi Daren,
Here is another aproach a little bit faster taking into account that I'm
using your original matrices.  My session info is at the end. I'm using a
2.4 GHz Core 2-Duo processor and 3 GB of RAM.

# Data
set.seed(123)
m1 <- matrix(rnorm(10), ncol=100)
m2 <- matrix(rnorm(10), ncol=100)
colnames(m1) <- paste('m1_', 1:100, sep="")
colnames(m2) <- paste('m2_', 1:100, sep="")

# Combinations
combs <- expand.grid(colnames(m1), colnames(m2))

# ---
# Option 1
#
system.time(apply(combs, 1, function(x)
    cor.test(m1[,x[1]], m2[,x[2]])$p.value) -> pvalues1)
#   user  system elapsed
#   8.12    0.01    8.20

# ---
# Option 2
#
require(Hmisc)
system.time(apply(combs, 1, function(x)
    rcorr(m1[,x[1]], m2[,x[2]])$P[2]) -> pvalues2)
#   user  system elapsed
#   7.00    0.00    7.02


HTH,

Jorge


# -  Session Info 
R version 2.8.0 Patched (2008-11-08 r46864)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base



On Tue, Nov 25, 2008 at 11:55 PM, Daren Tan [EMAIL PROTECTED] wrote:



My two matrices are roughly the sizes of m1 and m2. I tried using two apply
and cor.test to compute the correlation p.values. More than an hour, and the
codes are still running. Please help to make it more efficient.

m1 <- matrix(rnorm(10), ncol=100)
m2 <- matrix(rnorm(1000), ncol=100)

cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1, function(y) {
    cor.test(x, y)$p.value }) })




[[alternative HTML version deleted]]




Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle



Re: [R] memory limit

2008-11-26 Thread Stavros Macrakis
I routinely compute with a 2,500,000-row dataset with 16 columns,
which takes 410MB of storage; my Windows box has 4GB, which avoids
thrashing.  As long as I'm careful not to compute and save multiple
copies of the entire data frame (because 32-bit Windows R is limited
to about 1.5GB address space total, including any intermediate
results), R works impressively well and fast with this dataset for
selections, calculations, cross-tabs, plotting, etc.  For example,
simple single-column statistics and cross-tabs take < 1 sec., summary
of the whole thing takes 16 sec. A linear regression between two
numeric columns takes < 20 sec. Plotting of all 2.5M points takes a
while, but that is no surprise (and is usually pointless [sic]
anyway). I have not tried to do any compute-intensive statistical
calculations on the whole data set.

The main (but minor) annoyance with it is that it takes about 90 secs
to load into memory using R's native binary save format, so I tend
to keep the process lying around rather than re-starting and
re-loading for each analysis. Fortunately, garbage collection is very
effective in reclaiming unused storage as long as I'm careful to
remove unnecessary objects.

-s


On Wed, Nov 26, 2008 at 7:42 AM, iwalters [EMAIL PROTECTED] wrote:

 I'm currently working with very large datasets that consist of 1,000,000+
 rows.  Is it at all possible to use R for datasets this size, or should I
 rather consider C++/Java?


 --
 View this message in context: 
 http://www.nabble.com/increasing-memory-limit-in-Windows-Server-2008-64-bit-tp20675880p20699700.html
 Sent from the R help mailing list archive at Nabble.com.





[R] Installing packages based on the license

2008-11-26 Thread Ted Mendum
Hello,

I would like to have an automatic way to avoid installing packages that I cannot 
use due to license restrictions.

For example, the conf.design package is limited to non-commercial use, and 
since I work for a for-profit business, I cannot use it.

I found out about the license terms of conf.design by chance; I would like to 
avoid any possibility of a mistake in the future.

Is there some clever combination of grep and install.packages that I could use 
to limit my downloads to, say, GPL-only packages?

Thanks,

Ted


Ted Mendum | Senior Scientist
Warner Babcock Institute for Green Chemistry
66 Cummings Park, Woburn, MA 01801

p:   781-937-9000   
f:781-937-9001

[EMAIL PROTECTED]
www.warnerbabcock.com



Re: [R] R and SPSS

2008-11-26 Thread Liviu Andronic
Hello,

On Wed, Nov 26, 2008 at 9:25 PM, Applejus [EMAIL PROTECTED] wrote:
 I have a code in R. Could anyone give me the best possible way (or just
 ways!) to integrate it in SPSS?

I doubt you could do this, but at the least provide commented,
minimal, self-contained, reproducible code. It would help if you were
more specific.
Liviu




-- 
Do you know how to read?
http://www.alienetworks.com/srtest.cfm
Do you know how to write?
http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail



[R] SVM

2008-11-26 Thread simon abai bisrat


Hi All,

I fitted several classifiers in a two-class problem. I then used the package
'yaImpute' to apply my predictive models to asciigrids and thereby generate
probability maps. So far I have successfully used yaImpute to generate maps for
Random Forests, Classification Trees, Generalized Linear Models (GLMs) and
Generalized Additive Models (GAMs). But when I try to use it with two other
classifiers - support vector machine (SVM) and linear discriminant analysis
(LDA) - I get an error message, and I am not really sure what it is trying to
tell me. Can you please take a few minutes of your time to help me understand
what these error messages are?

I used the following piece of code to apply my models to grids once I fit the 
model for both LDA and SVM: 

AsciiGridPredict(lda.fit, xfiles = namelist, outfiles = as.character(outfile))
AsciiGridPredict(svm.fit, xfiles = namelist, outfiles = as.character(outfile))

I am getting the following error message for LDA: 

Rows per dot:  1  Rows to do: 163 
ToDo:
...
Done: .
First six lines of predicted data for map row:  2 
  predict.class predict.posterior.0 predict.posterior.1 predict.LD1
1            NA                   -                   -           -
2            NA                   -                   -           -
3            NA                   -                   -           -
4            NA                   -                   -           -
5            NA                   -                   -           -
6            NA                   -                   -           -
Error in AsciiGridImpute(object, xfiles, outfiles, xtypes = xtypes, lon =
lon,  : 
  predict is not present in the predicted data
In addition: Warning message:
In `[<-.factor`(`*tmp*`, ri, value = c("-", "-", "-", "-",  :
  invalid factor level, NAs generated

I am getting the following error message for SVM:

Rows per dot:  1  Rows to do: 163 
ToDo:
...
Done:
...
Legend of levels in output grids:
  predict
1   0
2   1
There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: In `[<-.factor`(`*tmp*`, ri, value = c("-", "-", "-",  ... :
  invalid factor level, NAs generated
2: In `[<-.factor`(`*tmp*`, ri, value = c("-", "-", "-",  ... :
  invalid factor level, NAs generated
3: In `[<-.factor`(`*tmp*`, ri, value = c("-", "-", "-",  ... :
  invalid factor level, NAs generated
4: In `[<-.factor`(`*tmp*`, ri, value = c("-", "-", "-",  ... :
  invalid factor level, NAs generated
5: In `[<-.factor`(`*tmp*`, ri, value = c("-", "-", "-",  ... :
  invalid factor level, NAs generated

I hope my writing is clear and my questions make sense.



  



[R] Estimates of coefficient variances and covariances from a multinomial logistic regression?

2008-11-26 Thread Grant Gillis
Hello and thanks in advance for any help,

I am using the 'multinom' function from the nnet package to calculate a
multinomial logistic regression.  I would like to get a matrix of the
estimated coefficient variances and covariances.  Am I missing some
easy way to extract these?

Grant




Re: [R] Installing packages based on the license

2008-11-26 Thread Charles C. Berry

On Wed, 26 Nov 2008, Ted Mendum wrote:


Hello,

I would like to have an automatic way to avoid installing packages that I cannot 
use due to license restrictions.

For example, the conf.design package is limited to non-commercial use, and 
since I work for a for-profit business, I cannot use it.

I found out about the license terms of conf.design by chance; I would like to 
avoid any possibility of a mistake in the future.

Is there some clever combination of grep and install.packages that I could use 
to limit my downloads to, say, GPL-only packages?


Something like:


txt <- readLines(url("http://cran.r-project.org/web/packages/ALS/index.html"))
aok <- regexpr("GPL", txt[grep("License", txt) + 1]) > 0


will work on CRAN as it is formatted now. Maybe roll it up in a function, 
say license.ok(), that returns the name of the package if aok, or prints 
the license and stops if not. Then install.packages(license.ok(pkg.name)) 
should do it. Of course, this does not handle dependencies. For that you 
might want to parse index.html with XML, follow the dependencies, and 
check them all out first.
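A minimal sketch of that idea (untested against a live mirror; the function name license.ok() comes from the suggestion above, and the scraping assumes the current CRAN page layout, where the license value sits on the line after the one matching "License"):

```r
## Sketch only: depends on the current CRAN index.html layout.
license.ok <- function(pkg) {
  u <- sprintf("http://cran.r-project.org/web/packages/%s/index.html", pkg)
  txt <- readLines(url(u))
  # License value is assumed to be on the line after the "License" label
  lic <- txt[grep("License", txt) + 1]
  if (any(regexpr("GPL", lic) > 0)) return(pkg)
  stop("not GPL: ", paste(lic, collapse = "; "))
}
## install.packages(license.ok("ALS"))
```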



HTH,

Chuck



Thanks,

Ted


Ted Mendum | Senior Scientist
Warner Babcock Institute for Green Chemistry
66 Cummings Park, Woburn, MA 01801

p:   781-937-9000  
f:781-937-9001

[EMAIL PROTECTED]
www.warnerbabcock.com




Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]  UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901



Re: [R] Estimates of coefficient variances and covariances from a multinomial logistic regression?

2008-11-26 Thread Charles C. Berry

On Wed, 26 Nov 2008, Grant Gillis wrote:


Hello and thanks in advance for any help,

I am using the 'multinom' function from the nnet package to calculate a
multinomial logistic regression.  I would like to get a matrix of the
estimated coefficient variances and covariances.  Am I missing some
easy way to extract these?


See

?vcov

Chuck




Grant





Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]  UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901



Re: [R] Installing packages based on the license

2008-11-26 Thread Duncan Murdoch

On 26/11/2008 4:43 PM, Ted Mendum wrote:

Hello,

I would like to have an automatic way to avoid installing packages that I cannot 
use due to license restrictions.

For example, the conf.design package is limited to non-commercial use, and 
since I work for a for-profit business, I cannot use it.

I found out about the license terms of conf.design by chance; I would like to 
avoid any possibility of a mistake in the future.

Is there some clever combination of grep and install.packages that I could use 
to limit my downloads to, say, GPL-only packages?


In theory, you should be able to look at the License field of 
available.packages(fields="License") and limit yourself to those with 
acceptable licenses.  However, I don't think CRAN reports the license.


However, after you have installed a set of packages, you can scan the 
set of licenses using this code:


installed.packages(fields="License")[,"License"]
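A sketch of what the filtering could look like if the repository did fill in the field (untested against a live mirror; `gpl_only` is just an illustrative name):

```r
## Sketch: filter the available-package matrix down to GPL-licensed names.
## The License column may be empty if the repository does not report it.
ap <- available.packages(fields = "License")
gpl_only <- ap[grepl("GPL", ap[, "License"]), "Package"]
```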

Duncan Murdoch



Thanks,

Ted


Ted Mendum | Senior Scientist
Warner Babcock Institute for Green Chemistry
66 Cummings Park, Woburn, MA 01801

p:   781-937-9000   
f:781-937-9001


[EMAIL PROTECTED]
www.warnerbabcock.com





Re: [R] memory limit

2008-11-26 Thread Henrik Bengtsson
On Wed, Nov 26, 2008 at 1:16 PM, Stavros Macrakis [EMAIL PROTECTED] wrote:
 I routinely compute with a 2,500,000-row dataset with 16 columns,
 which takes 410MB of storage; my Windows box has 4GB, which avoids
 thrashing.  As long as I'm careful not to compute and save multiple
 copies of the entire data frame (because 32-bit Windows R is limited
 to about 1.5GB address space total, including any intermediate
 results), R works impressively well and fast with this dataset for
 selections, calculations, cross-tabs, plotting, etc.  For example,
 simple single-column statistics and cross-tabs take < 1 sec., summary
 of the whole thing takes 16 sec. A linear regression between two
 numeric columns takes < 20 sec. Plotting of all 2.5M points takes a
 while, but that is no surprise (and is usually pointless [sic]
 anyway). I have not tried to do any compute-intensive statistical
 calculations on the whole data set.

 The main (but minor) annoyance with it is that it takes about 90 secs
 to load into memory using R's native binary save format, so I tend
 to keep the process lying around rather than re-starting and
 re-loading for each analysis. Fortunately, garbage collection is very
 effective in reclaiming unused storage as long as I'm careful to
 remove unnecessary objects.

FYI, objects saved with save(..., compress=FALSE) are notably faster
to read back.
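A quick way to try this yourself (sketch; the data frame here is a random stand-in for a real dataset, and timings will vary by machine):

```r
## Save the same object both compressed (the default) and uncompressed,
## then compare load times.
x <- data.frame(matrix(rnorm(1e6), ncol = 16))
f1 <- tempfile(fileext = ".RData")
f2 <- tempfile(fileext = ".RData")
save(x, file = f1)                    # compressed (default)
save(x, file = f2, compress = FALSE)  # larger on disk, faster to load
system.time(load(f1))
system.time(load(f2))
```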

/Henrik


-s


 On Wed, Nov 26, 2008 at 7:42 AM, iwalters [EMAIL PROTECTED] wrote:

 I'm currently working with very large datasets that consist of 1,000,000+
 rows.  Is it at all possible to use R for datasets this size, or should I
 rather consider C++/Java?


 --
 View this message in context: 
 http://www.nabble.com/increasing-memory-limit-in-Windows-Server-2008-64-bit-tp20675880p20699700.html
 Sent from the R help mailing list archive at Nabble.com.







[R] Drawing a tree in R

2008-11-26 Thread Severin Hacker
Hi,

I've been trying to make use of the dendrogram class to plot a tree. However,
my attempts have failed so far because I'm not sure how to build up the nested
list (where do I have to store the nodes?). The tree I want drawn, written in
the form (node, left child, right child), is: (root,(n1,1,(n2,2,3)),4)

If it is not easily possible to draw a tree with the help of the dendrogram
class, I'm open to any other method of drawing a tree in R!

Thanks!

-Severin



Re: [R] Reshape with var as fun.aggregate

2008-11-26 Thread hadley wickham
Hi Jason,

The reason you are getting this error is that the latest version of
reshape uses fun.aggregate(numeric(0)) to figure out what missing
values should be filled in with.  Unfortunately there was a brief
period in R versions when var(numeric(0)) returned an error rather
than a missing value.  You can work around the problem by manually
specifying fill = NA.
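In code, the workaround looks like this (sketch, assuming the reshape package and its french_fries example data are installed):

```r
library(reshape)
data(french_fries)
ffm <- melt(french_fries, id = 1:4, na.rm = TRUE)
# fill = NA skips the call to fun.aggregate(numeric(0)) that errors
# under affected R versions:
cast(ffm, rep ~ treatment, var, fill = NA)
```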

Regards,

Hadley

On Wed, Nov 26, 2008 at 1:05 PM,  [EMAIL PROTECTED] wrote:
 I used to be able to use variance for fun.aggregate, but since I
 upgraded from R 2.6 to 2.7 (Ubuntu hardy to intrepid), it no longer
 works.

 > library(reshape)
 > data(french_fries)
 > ffm <- melt(french_fries, id=1:4, na.rm = TRUE)
 > cast(ffm, rep ~ treatment, length)
   rep   1   2   3
 1   1 579 578 575
 2   2 580 579 580
 > cast(ffm, rep ~ treatment, mean)
   rep        1        2        3
 1   1 3.207772 3.177336 3.024522
 2   2 3.181207 3.115544 3.277759
 > cast(ffm, rep ~ treatment, var)
 Error in fun.aggregate(data$value[0]) : 'x' is empty

 Any ideas?





-- 
http://had.co.nz/



Re: [R] R and SPSS

2008-11-26 Thread Andrew Choens
On Wed, 2008-11-26 at 12:25 -0800, Applejus wrote:
 Hi,
 
 I have a code in R. Could anyone give me the best possible way (or just
 ways!) to integrate it in SPSS?
 
 Thanks!

You will need an SPSS registration, but go here and get the SPSS R
plugin.

http://www.spss.com/devcentral/

It lets you access R from within SPSS. Best of both worlds.

-- 
Insert something humorous here.  :-)



Re: [R] R and SPSS

2008-11-26 Thread David Winsemius


On Nov 26, 2008, at 7:57 PM, Andrew Choens wrote:



It lets you access R from within SPSS. Best of both worlds.

--
Insert something humorous here.  :-)


OK, I'll bite.

It lets you access R from within SPSS. Best of both worlds.




[R] Boundary value problem

2008-11-26 Thread karthik jayasurya
Hi,

   I want to solve an ordinary differential equation as a boundary value
problem. I already did it as an initial value problem, but now I have
constraints on variable values at more than one point. I searched with the
key words "boundary value problem" and "shooting method", but couldn't find
any help online. I am wondering if there is a package available out there to
do this? As far as I know, it is not in either the odesolve or deSolve
package.

thanks
karthik




[R] Business Data Sets

2008-11-26 Thread nmarti

Can someone please help me, or point me in the right direction, to find some
“Business” data sets.

In the next couple of weeks I will be teaching a “small business”
class/seminar, and my intention is to preach the importance of collecting
data on business operations and of analyzing that data.  E.g., sales can
be forecasted by calculating the correlations between x1, x2, and x3.  It’s
nothing complex; I’m just looking for some relatively straightforward data
sets.

So, are there any simple data sets within libraries and packages in R, or
on other websites that you know of?

Any help would be appreciated.

-- 
View this message in context: 
http://www.nabble.com/Business-Data-Sets-tp20711909p20711909.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] How to create a string containing '\/' to be used with SED?

2008-11-26 Thread ikarus

Thanks Sean!!

I followed you suggestion to use gsub() and it worked perfectly!
I still can't create a string with \/ inside (e.g., a <-
..\\/path\\/file
doesn't work: R complains, and the \\ are removed), but I don't care;
gsub() does the same job as sed, without using any system call.
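As an aside, gsub(..., fixed = TRUE) sidesteps the escaping question entirely by treating the pattern as a literal string rather than a regular expression (sketch using the path from the original post, with a made-up replacement):

```r
# With fixed = TRUE, no character in the pattern needs regex quoting:
gsub("/home/TSPFiles/TSPLIB/", "/tmp/new/",
     "instance = /home/TSPFiles/TSPLIB/berlin52.tsp", fixed = TRUE)
## [1] "instance = /tmp/new/berlin52.tsp"
```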



seanpor wrote:
 
 Good morning,
 
 You do not need to quote a forward slash / in R, but you do need to quote
 a backslash when you're inputting it... so to get a string which actually
 contains blah\/blah... you need to use blah\\/blah
 
 http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-does-backslash-behave-strangely-inside-strings_003f
 
 Unless this is a very very big file you shouldn't need to go out to sed,
 as gsub() should work adequately... and probably quicker and cleaner.  So
 something along the lines of.. (UNTESTED!!! since I don't have a
 reproduceable example)
 
 tmp1 <- readLines(configurationFile)
 tmp1 <- gsub("^instance .*", paste("instance = ", data$instancePath, "/",
 data$newInstance, sep = ""), tmp1)
 
 
 I'm working on 50mb text files, and doing all sorts of manipulations and I
 do it all inside R under windows XP...  reading a 50mb text file across
 the 100mb network and doing a gsub() on most lines takes an elapsed 16
 seconds on this office desktop.
 
 hth...
 
 Regards,
 Sean
 
 
 ikarus wrote:
 
 Hi guys,
 I've been struggling to find a solution to the following issue:
 I need to change strings in .ini files that are given in input to a
 program whose output is processed by R. The strings to be changed looks
 like: 
 instance = /home/TSPFiles/TSPLIB/berlin52.tsp 
 
 I normally use Sed for this kind of things. So, inside R I'd like to
 write something like:
 
  command <- paste("sed -i 's/^instance .*/instance = ",
 data$instancePath,
data$newInstance, "/' ", configurationFile, sep = "")
  system(command)
 
 This will overwrite the line starting with instance  using instance =
 the_new_instance
 In the example I gave, data$instancePath = /home/TSPFiles/TSPLIB/ and 
 data$newInstance = berlin52.tsp
 
 The problem is that I need to pass the above path string to sed in the
 form:
 \/home\/TSPFiles\/TSPLIB\/ 
 
 However, I couldn't find a way to create such a string in R. I tried in
 several different ways,
 but it always complains saying that '\/' is an unrecognized escape!
  
 Any suggestion? 
 
 Thanks!
 
 

-- 
View this message in context: 
http://www.nabble.com/How-to-create-a-string-containing-%27%5C-%27-to-be-used-with-SED--tp20694319p20711848.html
Sent from the R help mailing list archive at Nabble.com.



[R] base:::rbind

2008-11-26 Thread Fernando Saldanha
It seems that after loading the package timeSeries the function
base:::rbind is replaced by methods:::rbind

> identical(base:::rbind, methods:::rbind)
[1] FALSE
> rbb = base:::rbind
> library(timeSeries)
Loading required package: timeDate
> identical(base:::rbind, methods:::rbind)
[1] TRUE
> identical(rbb, base:::rbind)
[1] FALSE

Is there a way to access the original function base:::rbind after
loading timeSeries?

I am running R 2.8.0 under Vista Home Premium.

Thanks for any help.

FS



[R] ggplot2 problem

2008-11-26 Thread steve

I'm using ggplot2 2.0.8 and R 2.8.0

df = data.frame(Year = rep(1:5,2))
m = ggplot(df, aes(Year=Year))
m + geom_bar()

Error in get("calculate", env = ., inherits = TRUE)(., ...) :
  attempt to apply non-function



Re: [R] ggplot2 problem

2008-11-26 Thread Eric
aes() does not have an argument "Year", but it does have an argument "x",
so try:


df <- data.frame(Year = rep(1:5,2))
m <- ggplot(df, aes(x=Year))
m + geom_bar()

(It works for me.)

Eric



steve wrote:

I'm using ggplot2 2.0.8 and R 2.8.0

df = data.frame(Year = rep(1:5,2))
m = ggplot(df, aes(Year=Year))
m + geom_bar()

Error in get(calculate, env = ., inherits = TRUE)(., ...) :
  attempt to apply non-function





Re: [R] ggplot2 problem

2008-11-26 Thread steve

Yes - that's it.

 thank you

Steve



Re: [R] How to create a string containing '\/' to be used with SED?

2008-11-26 Thread seanpor

What is the problem, and what is the exact error message?  I can say

> fred <- "blah1\\/blah2\\/blah3"

and then the string looks like...

> cat('#', fred, '#\n', sep='')
#blah1\/blah2\/blah3#

If you just ask R to print it then it looks like...
> fred
[1] "blah1\\/blah2\\/blah3"


when you're playing with strings and regular expressions, it's vital to
understand the backslash quoting mechanism...
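A quick way to convince yourself of what the string actually contains is nchar() (sketch, reusing the string from above):

```r
# Each two-character escape "\\" in the source stores a single
# backslash, so the string really is blah1\/blah2\/blah3:
fred <- "blah1\\/blah2\\/blah3"
nchar(fred)         # 19 stored characters, not 21
substr(fred, 6, 7)  # the first backslash-slash pair
```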

Best regards,
Sean


ikarus wrote:
 
 I still can't create a string with inside \/  (e.g., a -
 ..\\/path\\/file
 doesn't work, R complains and the \\ are removed), ... snip
 
-- 
View this message in context: 
http://www.nabble.com/How-to-create-a-string-containing-%27%5C-%27-to-be-used-with-SED--tp20694319p20713699.html
Sent from the R help mailing list archive at Nabble.com.



[R] as.numeric in data.frame, but only where it is possible

2008-11-26 Thread Kinoko
Hi,

I would like to convert my character sequences in my matrix/
data.frame into numeric where it is possible.
I would also like to retain my alphabetic character strings in their
original forms.
"5.1"  ->  5.1
"hm"   ->  "hm"

k <- matrix(c("aa", "bb", 1, 2, 4.3, 0), nrow=2)
mode(k) <- "numeric"
# ln1  coerces numeric chars into character  and
# ln2  replaces alphabet chars with NA  (with warnings)
#   => OK as matrix can't have mixed types

k <- matrix(c("aa", "bb", 1, 2, 4.3, 0), nrow=2)
g <- as.data.frame(k, stringsAsFactos=FALSE)
g[,2]
# returns:
# [1] 1 2
# Levels: 1 2
# hm... let them be levels then...

sum(g[,2])
# going down in flames

g[,2] <- as.numeric(g[,2])
sum(g[,2])
# is the winner of the day,
# but I have a hunch that there must be some more trivial/general
# solution.

- I'm looking for something like as.data.frame(...,
as.numeric.if.possible=TRUE),
if possible.
- Mainly, I don't know in advance if a column has alphabetical or
numeric characters.
-  I have relatively large matrices (or a relatively slow PC) (max
100,000 x 100 cells), if that matters
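One base-R idiom that comes close to an as.numeric.if.possible (a sketch, not a definitive answer): utils::type.convert() converts a character vector to numeric when every element parses as a number, and otherwise leaves it as character (as.is = TRUE avoids factor conversion):

```r
g <- data.frame(a = c("aa", "bb"), b = c("1", "2"), c = c("4.3", "0"),
                stringsAsFactors = FALSE)
# Convert each column independently; only fully-numeric columns change:
g[] <- lapply(g, type.convert, as.is = TRUE)
sapply(g, class)  # a: character, b: integer, c: numeric
sum(g$b)          # 3
```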

*** *** *** *** ***
Another related problem of mine is to create data.frames with mixed
types (it should be possible, shouldn't it?).

d <- data.frame(b=seq(1,3))
d <- cbind(a,b)
typeof(d)
# d was coerced into character


any help is greatly appreciated,

gabor
