Re: [R] subset arg in subset(). was: converting result of substitute to 'ordidnary' expression

2010-06-26 Thread Bill.Venables
Here is another one that works:

> do.call(subset, list(dat, subsetexp))
    x  y
6   6  6
7   7  7
8   8  8
9   9  9
10 10 10
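A possible explanation of the behaviour asked about in the quoted message, offered as a sketch rather than a definitive account: subset.data.frame() captures its condition with substitute() and evaluates the result inside the data frame. A variable that merely holds an expression therefore arrives as a bare symbol, and evaluating that symbol returns the expression object, not a logical vector. The names r1, res and alt below are illustrative only.

```r
dat <- data.frame(x = 1:10, y = 1:10)
subsetexp <- expression(5 < x)

# What subset(dat, subsetexp) effectively does: the argument is captured
# as the symbol `subsetexp`, and evaluating that symbol just looks up the
# variable, yielding the expression object itself.
r1 <- eval(quote(subsetexp), dat, environment())
is.logical(r1)   # FALSE, hence "'subset' must evaluate to logical"

# do.call() puts the expression object itself into the call, so it is
# evaluated inside the data frame and gives a logical vector.
res <- do.call(subset, list(dat, subsetexp))

# Plain indexing avoids non-standard evaluation altogether.
alt <- dat[eval(subsetexp[[1]], dat), ]
```

Wrapping the argument in eval(), as in the second call of the quoted message, has the same effect: the expression is evaluated one level further, inside the data frame.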
  



-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Vadim Ogranovich
Sent: Saturday, 26 June 2010 11:13 AM
To: 'r-help@r-project.org'
Subject: [R] subset arg in subset(). was: converting result of substitute to 'ordidnary' expression

Dear R users,

Please disregard my previous post "converting result of substitute to
'ordidnary' expression". The problem I have has nothing to do with substitute.

Consider:

> dat <- data.frame(x=1:10, y=1:10)

> subsetexp <- expression(5<x)

> ## this does work
> subset(dat, eval(subsetexp))
    x  y
6   6  6
7   7  7
8   8  8
9   9  9
10 10 10

> ## and so does this
> subset(dat, 5<x)
    x  y
6   6  6
7   7  7
8   8  8
9   9  9
10 10 10

> ## but this doesn't work
> subset(dat, subsetexp)
Error in subset.data.frame(dat, subsetexp) :
  'subset' must evaluate to logical

Why did the last call fail, and why did it work with eval()?

Thank you very much for your help,
Vadim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] predict newdata question

2010-06-26 Thread Felipe Carrillo
Thanks Bill, that worked great!!
 
You ask:

# How can I use predict here, 'newdata' crashes
predict(m1,newdata=wolf$predicted);wolf # it doesn't work

To use predict() you need to give a fitted model object (here m1) and a
*data frame* to specify the values of the predictors for which you want
predictions.  Here wolf$predicted is not a data frame, it is a vector.

What I think you want is

pv <- predict(m1, newdata = wolf)

That will get you linear predictors.  To get probabilities you need to
say so, as

probs <- predict(m1, newdata = wolf, type = "response")

You can put these back into the data frame if you wish, e.g.

wolf <- within(wolf, {
    lpreds <- predict(m1, wolf)
    probs <- predict(m1, wolf, type = "response")
})

Now if you look at

head(wolf)

you will see two extra columns.
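The advice above can be tried on made-up data. This is only a sketch: the thread does not show m1 or wolf, so the example assumes a binomial glm and uses hypothetical names (d, fit, new) and simulated values.

```r
set.seed(1)

# Hypothetical training data: one predictor, binary response
d <- data.frame(x = rnorm(100))
d$y <- rbinom(100, 1, plogis(0.5 + d$x))

# Assumed model type: logistic regression
fit <- glm(y ~ x, family = binomial, data = d)

# newdata must be a data frame whose columns are named like the predictors
new <- data.frame(x = c(-1, 0, 1))

lp <- predict(fit, newdata = new)                     # linear predictors
pr <- predict(fit, newdata = new, type = "response")  # probabilities

# For a logit link the two are related by the inverse logit
all.equal(pr, plogis(lp))
```

Passing a bare vector such as new$x instead of the data frame new is the mistake described in the reply.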


-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Felipe Carrillo
Sent: Saturday, 26 June 2010 10:35 AM
To: r-h...@stat.math.ethz.ch
Subject: [R] predict newdata question

Hi:
I am using a subset of the below dataset to predict PRED_SUIT for
the whole dataset but I am having trouble with 'newdata'. The model
was created with 153 records and I want to predict for 208 records.

[lots of stuff omitted]

wolf$prob99<-(exp(wolf$predicted))/(1+exp(wolf$predicted))
head(wolf);dim(wolf)

 # How can I use predict here, 'newdata' crashes
 predict(m1,newdata=wolf$predicted);wolf  # it doesn't work

Thanks for any hints


Felipe D. Carrillo
Supervisory Fishery Biologist
Department of the Interior
US Fish & Wildlife Service
California, USA






Re: [R] Export Results

2010-06-26 Thread Tal Galili
See
?pdf
?png
?sink

There is also R2wd (about which I wrote here:
http://www.r-statistics.com/2010/05/exporting-r-output-to-ms-word-with-r2wd-an-example-session/
)

And there are also the brew, and Sweave packages (as Henrique
mentioned).
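As a minimal sketch of the base R functions listed above (sink() for text output, pdf() for graphs): the file names and the use of the built-in cars data set are assumptions made for illustration, and the files are written to a temporary directory.

```r
# Hypothetical output locations
out_txt <- file.path(tempdir(), "results.txt")
out_pdf <- file.path(tempdir(), "plots.pdf")

fit <- lm(dist ~ speed, data = cars)

# Divert printed results to a text file, then restore the console
sink(out_txt)
print(summary(fit))
sink()

# Open a PDF device, draw, and close the device to write the file
pdf(out_pdf)
plot(cars)
abline(fit)
dev.off()
```

png() works like pdf() but writes one bitmap file per page.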


Best,
Tal


Contact
Details:---
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
--




On Fri, Jun 25, 2010 at 6:58 PM, Pedro Mota Veiga motave...@net.sapo.pt wrote:


 Hi R users,
 How can I automatically export results and graphs to a file?
 Thanks in advance

 Pedro Mota Veiga



Re: [R] create group markers in original data frame ie.countinued... ? to calculate sth for groups defined betweenpoints in one variable (string), /separating/ spliting variable into groups by i.e. be

2010-06-26 Thread Eugeniusz Kałuża

Dear useRs and expeRts,
thanks.
I have found a way to add to the original data a column with markers, so that
every row shows which period in c2 it belongs to. In the code I could simply add:

  stacked_idx <- stack(idx)
  merge(stacked_idx, C.df, by.x=c('values'), by.y=c('c0'), all=TRUE)

Thanks for the suggestions,
Kaluza
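For comparison, here is a sketch of another way to build the three marker columns asked for in the quoted message below. It uses cumsum() on the non-missing entries of c2 instead of the stack()/merge() approach above; the data are copied from the quoted example (entered as plain vectors), and the helper names marks, grp, start and finish are illustrative only.

```r
c0 <- 1:17
c1 <- c(10, 20, 30, 40, 50, 10, 60, 20, 30, 40, 50, 30, 10, 0, NA, 20, 10.3444)
c2 <- c(NA, "A", NA, NA, "B", NA, NA, NA, NA, NA, NA, "C", NA, NA, NA, NA, "D")
C.df <- data.frame(c0, c1, c2, stringsAsFactors = FALSE)

marks <- C.df$c2[!is.na(C.df$c2)]       # the marks in order: A, B, C, D
grp <- cumsum(!is.na(C.df$c2))          # number of marks seen so far
grp[grp == 0] <- NA                     # rows before the first mark
grp <- pmin(grp, length(marks) - 1)     # the last mark closes the last interval

start  <- marks[grp]
finish <- marks[grp + 1]

C.df$in_sub_starting_from <- start
C.df$in_sub_finished_by   <- finish
C.df$in_sub_limited_by    <- ifelse(is.na(grp), NA_character_,
                                    paste(start, finish, sep = "-"))

C.df[C.df$c0 == 7, ]
```

Row 7 then carries the markers B, C and B-C, matching the hand-made vectors in the quoted message.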

-----Original Message-----
From: r-help-boun...@r-project.org on behalf of Eugeniusz Kaluza
Sent: Fri 2010-06-25 14:48
To: c
Subject: Re: [R] create group markers in original data frame ie.countinued... ?
to calculate sth for groups defined betweenpoints in one variable
(string),/separating/ spliting variable into groups by i.e. between A-B,B-C,
C-D, from: A, NA, NA, B, NA, NA, C, NA, NA, NA, D


Dear useRs,

To begin with: Joris Meys, thank you for explaining how to obtain results
for the groups lying between string marks in one variable of a data frame, as
in the example below (between START and STOP). I would like to round it off
by asking how each observation in the original data set can be marked with
its group.
 

# so firstly, below
# START ... working example of the solution proposed by: Joris Meys
# [jorism...@gmail.com]
# Same trick :
  c0 <- rbind( 1,  2 , 3, 4,  5, 6, 7, 8, 9,10,11,
  12,13,14,15,16,17 )
  c0
  c1 <- rbind(10, 20 ,30,40, 50,10,60,20,30,40,50,  30,10,
  0,NA,20,10.3444)
  c1
  c2 <- rbind(NA,"A",NA,NA,"B",NA,NA,NA,NA,NA,NA,"C",NA,NA,NA,NA,"D")
  c2
  # C.df is used below but its definition is not in this excerpt; presumably:
  C.df <- data.frame(c0, c1, c2)

  pos <- which(!is.na(C.df$c2))
  idx <- sapply(2:length(pos),function(i) pos[i-1]:(pos[i]-1))
  names(idx) <- sapply(2:length(pos),
  function(i) paste(C.df$c2[pos[i-1]],"-",C.df$c2[pos[i]]))
  out <- lapply(idx,function(i) summary(C.df[i,1:2]))
  out
# STOP ... above from: Sent: Thu 2010-06-24 18:02:  Joris Meys
# [jorism...@gmail.com]


# Thank you, it is done and works very well.

# - - - - - - - -- - - - - - -- - -
# Now I will try to finish my question: how to add a grouping symbol to the
# whole set, marking each observation with the name of the interval in which
# that observation is placed,
# to tell the observer that this observation is between ... A and B, to
# enable sorting and simple access using match:
in_sub_starting_from <- rbind(NA,"A","A","A","B","B","B","B","B","B","B","C","C","C","C","C","C")
in_sub_finished_by <- rbind(NA,"B","B","B","C","C","C","C","C","C","C","D","D","D","D","D","D")
in_sub_limited_by <- rbind(NA,"A-B","A-B","A-B","B-C","B-C","B-C","B-C","B-C","B-C","B-C","C-D","C-D","C-D","C-D","C-D","C-D")
C.df <- data.frame(c0,c1,c2,in_sub_starting_from,in_sub_finished_by,in_sub_limited_by)
C.df
#

# Therefore one more question:
# How is it possible to create these vectors automatically, having C.df$c2
# (and of course also C.df$c0, C.df$c1):
C.df$in_sub_starting_from
C.df$in_sub_finished_by
C.df$in_sub_limited_by
# to tell the observer that this observation is between ... A and B, to enable
# sorting and simple access using match


# For example, to make this kind of access to the data possible:
# take the 7th observation from any row of the data frame,
C.df$c0[7]
C.df$c1[c0==7]
# and then
# find in the same row, in in_sub_starting_from, which mark precedes that
# observation ...
C.df$in_sub_starting_from[c0==7]
# find in the same row, in in_sub_finished_by, which mark follows that
# observation ...
C.df$in_sub_finished_by[c0==7]
# find in the same row, in in_sub_limited_by, which marks this observation
# lies between ...
C.df$in_sub_limited_by[c0==7]
#

?

# Thanks for any advice, and perhaps an answer,
# I am waiting impatiently for my next chance to access the internet...

#

Sincerely,
Kaluza


and the beginning of this story:


-----Original Message-----
From: Eugeniusz Kaluza
Sent: Thu 2010-06-24 17:12
To: r-help@r-project.org
Subject: FW: [R] ?to calculate sth for groups defined between points in one
variable (string), / value separating/ spliting variable into groups by i.e.
between start, NA, NA, stop1, start2, NA, stop2

Dear useRs,

Thanks for the advice from Joris Meys.
Now I will try to think how to make it work for a less specific case,
to make the problem more general.
The result should then be displayed for every group between non-empty strings
in c2,
i.e. not only the result for:
 #mean:
       c1     c3     c4           c5
       20 Start1  Stop1 Start1-Stop1
 25.48585 Start2  Stop2 Start2-Stop2

but also for every group formed by the gap between two neighbouring strings in
c2, which contains only series of NA, NA, NA, separated from time to time by
one string, i.e.:
 #mean:
       c1     c3     c4           c5
       20 Start1  Stop1 Start1-Stop1
       ..  Stop1 Start2 Stop1-Start2
 25.48585 Start2  Stop2 Start2-Stop2

i.e.
to rewrite this, maybe as another, simpler version of the command
but also for every one group created by space between two closest strings in 
c2, that contains only seriess of Na, NA, NA, separated from time to time by 
one 

[R] become a member of R user community

2010-06-26 Thread Albert Lee, Ph.D.
How do I become a member of R user community?

Albert  Lee,
Ph.D. statistician




Re: [R] Wilcoxon signed rank test and its requirements

2010-06-26 Thread Atte Tenkanen

Greg Snow wrote on 25.6.2010 at 21.55:

 Let me see if I understand.  You actually have the data for the  
 whole population (the entire piece) but you have some pre-defined  
 sections that you want to see if they differ from the population,  
 or more meaningfully they are different from a randomly selected  
 set of measures.  Is that correct?

Exactly.


 If so, since you have the entire population of interest you can  
 create the actual sampling distribution (or a good approximation of  
 it).  Just take random samples from the population of the given  
 size (matching the subset you are interested in) and calculate the  
 means (or other value of interest), probably 10,000 to 1,000,000  
 samples.  Now compare the value from your predefined subset to the  
 set of random values you generated to see if it is in the tail or not.

Thank you! I will do this.

Is this kind of "Monte Carlo" evaluation (?) often used in
statistics? If it is, do you know any reference for it?
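The resampling procedure Greg describes could be sketched as follows. The data here are simulated stand-ins, not the tonal distances from the thread: popul, idx and n_sim are hypothetical names and values, chosen only to mimic a population of 2418 heavily tied values and a predefined section of about 275 of them.

```r
set.seed(42)

# Hypothetical population with many ties
popul <- sample(0:10, 2418, replace = TRUE)

# Hypothetical predefined section of the piece
idx <- 1001:1275
obs <- mean(popul[idx])

# Sampling distribution of the mean for random subsets of the same size
n_sim <- 10000
sim_means <- replicate(n_sim,
                       mean(sample(popul, length(idx), replace = FALSE)))

# Proportion of random subsets whose mean is at least as large as the
# observed one (upper tail; the +1 counts the observed subset itself)
p_upper <- (sum(sim_means >= obs) + 1) / (n_sim + 1)
p_upper
```

A single t.test() on one random sample does not do this; the point is to compare the observed value with many random subsets.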

Atte


 -- 
 Gregory (Greg) L. Snow Ph.D.
 Statistical Data Center
 Intermountain Healthcare
 greg.s...@imail.org
 801.408.8111


 -----Original Message-----
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Atte Tenkanen
 Sent: Thursday, June 24, 2010 11:04 PM
 To: David Winsemius
 Cc: R mailing list
 Subject: Re: [R] Wilcoxon signed rank test and its requirements

 The values come from this kind of process:
 The musical composition is segmented into so-called 'pitch-class
 segments' and these segments are compared with one reference set  
 with a
 distance function. Only some distance values are possible. These
 distance values can be averaged over music bars which produces  
 smoother
 distribution and the 'comparison curve' that illustrates the  
 distances
 according to the reference set through a musical piece result in more
 readable curve (see e.g. http://users.utu.fi/attenka/with6.jpg ),  
 but I
 would prefer to use original values.

 then, I want to pick only some regions from the piece and compare  
 those
 values of those regions, whether they are higher than the mean of all
 values.

 Atte

 On Jun 24, 2010, at 6:58 PM, Atte Tenkanen wrote:

 > Is there anything for me?
 >
 > There is a lot of data, n=2418, but there are also a lot of ties.
 > My sample n≈250-300

 I do not understand why there should be so many ties. You have not
 described the measurement process or units. ( ... although you offer a
 glimpse without much background later.)

 > I would like to test whether the mean of the sample differs
 > significantly from the population mean.

 Why? What is the purpose of this investigation? Why should the mean of
 a sample be that important?

 > The histogram of the population looks like in attached histogram,
 > what test should I use? No choices?
 >
 > This distribution comes from a musical piece and the values are
 > 'tonal distances'.
 >
 > http://users.utu.fi/attenka/Hist.png

 That picture does not offer much insight into the features of that
 measurement. It appears to have much more structure than I would
 expect for a sample from a smooth unimodal underlying population.

 --
 David.

 Atte

 On 06/24/2010 12:40 PM, David Winsemius wrote:

 On Jun 23, 2010, at 9:58 PM, Atte Tenkanen wrote:

 > Thanks. What I have had to ask is that
 >
 > how do you test that the data is symmetric enough?
 > If it is not, is it ok to use some data transformation?
 >
 > when it is said:
 >
 > "The Wilcoxon signed rank test does not assume that the data are
 > sampled from a Gaussian distribution. However it does assume that the
 > data are distributed symmetrically around the median. If the
 > distribution is asymmetrical, the P value will not tell you much about
 > whether the median is different than the hypothetical value."

 You are being misled. Simply finding a statement on a statistics
 software website, even one as reputable as Graphpad (???), does not mean
 that it is necessarily true. My understanding (confirmed reviewing
 "Nonparametric statistical methods for complete and censored data" by
 M. M. Desu, Damaraju Raghavarao) is that the Wilcoxon signed-rank test
 does not require that the underlying distributions be symmetric. The
 above quotation is highly inaccurate.


 To add to what David and others have said, look at the kernel that the
 U-statistic associated with the WSR test uses: the indicator (0/1) of
 xi + xj > 0.  So WSR tests H0:p=0.5 where p = the probability that the
 average of a randomly chosen pair of values is positive.  [If there are
 ties this probably needs to be worded as P[xi + xj > 0] = P[xi + xj < 0],
 i neq j.]

 Frank

 --
 Frank E Harrell Jr   Professor and Chairman        School of Medicine
                      Department of Biostatistics   Vanderbilt University



[R] Different standard errors from R and other software

2010-06-26 Thread Min Chen
Hi all,

Sorry to bother you. I'm estimating a discrete choice model in R using
the maxBFGS command. Since I wrote the log-likelihood myself, in order to
double check, I ran the same model in Limdep. It turns out that the
coefficient estimates are quite close; however, the standard errors are very
different. I also computed the Hessian and the outer product of the gradients
in R using the numDeriv package, but the results are still very different from
those in Limdep. Is it the routine that computes the inverse Hessian that
causes the difference? Thank you very much!
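One way to narrow this down is to compute both covariance estimators on a model whose answer is known. The sketch below makes several assumptions: it uses a simulated binary logit rather than the poster's model, base R's optim() instead of maxBFGS, and hypothetical names throughout. It compares standard errors from the inverse Hessian and from the outer product of gradients (OPG) with those reported by glm().

```r
set.seed(1)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * x))
X <- cbind(1, x)

# Negative log-likelihood of a binary logit
negll <- function(b) -sum(dbinom(y, 1, plogis(X %*% b), log = TRUE))

opt <- optim(c(0, 0), negll, method = "BFGS", hessian = TRUE)

# Standard errors from the inverse Hessian of the negative log-likelihood
se_hess <- sqrt(diag(solve(opt$hessian)))

# Standard errors from the outer product of the per-observation scores
p <- plogis(X %*% opt$par)
scores <- X * as.vector(y - p)
se_opg <- sqrt(diag(solve(crossprod(scores))))

# Reference values from glm()
se_glm <- sqrt(diag(vcov(glm(y ~ x, family = binomial))))

rbind(se_hess, se_opg, se_glm)
```

The Hessian-based values should track glm() closely, while the OPG values can differ noticeably in finite samples. If the two programs use different estimators by default, that alone could produce different standard errors; whether that is the case here cannot be told from the post.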

 Best wishes.


Min


-- 
Min Chen
Ph.D. Candidate
Department of Agricultural, Food, and Resource Economics
125 Cook Hall
Michigan State University



[R] Accessing matrix elements within a list

2010-06-26 Thread Maria P Petrova
Hi there, 

I cannot seem to figure out how to access the elements of a list if those
elements are matrices.
For example, I have the following list

df.list <- vector("list", 3)
and I have made each of the elements a matrix as follows

for(i in 1:3){
 assign(paste("s",i, sep=""),matrix(0, nrow = 20, ncol = 5, byrow
= FALSE, dimnames = NULL))
}

# and then insert them with a loop like this

# put matrices names in a vector
matrices<-c("s1","s2","s3")

# insert
for(i in 1:3){
df.list[[i]] <- matrices[i]
}

My problem is that I cannot access the first row of a matrix within the list.
The following does not work

df.list[[1]][1,]

Thanks for your help!
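A likely cause, sketched here with the names from the post: the loop stores the character strings "s1", "s2", "s3" in the list, not the matrices they name, so df.list[[1]] is a string and cannot be indexed by row. Looking each name up with get() stores the matrices themselves; df.list2 is a hypothetical name for an alternative that skips the intermediate variables.

```r
# Create the three matrices s1, s2, s3 as in the post
for (i in 1:3) {
  assign(paste("s", i, sep = ""), matrix(0, nrow = 20, ncol = 5))
}
matrices <- c("s1", "s2", "s3")

# Store the objects, not their names
df.list <- vector("list", 3)
for (i in 1:3) {
  df.list[[i]] <- get(matrices[i])
}

# First row of the first matrix
first_row <- df.list[[1]][1, ]

# Alternative: build the list directly, without assign()/get()
df.list2 <- lapply(1:3, function(i) matrix(0, nrow = 20, ncol = 5))
```

mget(matrices) would do the lookup in one call and return a named list.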



[R] Passing the parameter (file name) to png()

2010-06-26 Thread Maulik Shah
I am fitting a 3-parameter model to my response matrix and want to generate
item characteristic curves.
I want to specify the file name under which the item characteristic curve is
saved by passing it as an external parameter to the R batch script. The
following is the code I have written for this.

*R Script:*

library(ltm)
cmd_args = commandArgs();
for (arg in cmd_args) cat("  ", arg, "\n", sep="")
respmat <- read.table("C:\\rphp\\responsedata.txt")
fit3pl <- tpm(respmat)
cat("  ", arg, "\n", sep="")
b <- c("C:\\rphp\\",arg)
png(file=b, bg="transparent")
plot(fit3pl,items=c,lwd=3)
dev.off()
rm(respmat,fit3pl,b)
q()

Could you please help me with this? I get an error message when R executes
png().
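One probable cause, offered as a sketch: c("C:\\rphp\\", arg) yields a character vector of length two, whereas png() expects a single file name, so the pieces need to be joined with paste() or file.path(). The example below also uses commandArgs(trailingOnly = TRUE) so that only user-supplied arguments are seen. The default name "icc.png", the temporary directory (standing in for C:\\rphp\\) and the placeholder plot are assumptions made so the snippet runs on its own.

```r
# Only the arguments given after the script name / --args
args <- commandArgs(trailingOnly = TRUE)

# Hypothetical default when no file name is supplied
fname <- if (length(args) >= 1) args[1] else "icc.png"

# Join directory and file name into ONE string
out <- file.path(tempdir(), fname)

png(filename = out, bg = "transparent")
plot(1:10, lwd = 3)   # placeholder for plot(fit3pl, ...)
dev.off()
```

With the script saved as, say, script.R (a hypothetical name), it could be invoked as Rscript script.R curve1.png.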

Thanks and Regards,
Maulik



Re: [R] Wilcoxon signed rank test and its requirements

2010-06-26 Thread Atte Tenkanen

Greg Snow wrote on 25.6.2010 at 21.55:

 Let me see if I understand.  You actually have the data for the  
 whole population (the entire piece) but you have some pre-defined  
 sections that you want to see if they differ from the population,  
 or more meaningfully they are different from a randomly selected  
 set of measures.  Is that correct?

 If so, since you have the entire population of interest you can  
 create the actual sampling distribution (or a good approximation of  
 it).  Just take random samples from the population of the given  
 size (matching the subset you are interested in) and calculate the  
 means (or other value of interest), probably 10,000 to 1,000,000  
 samples.  Now compare the value from your predefined subset to the  
 set of random values you generated to see if it is in the tail or not.

Let me check: do you mean doing it this way?

t.test(sample(POPUL, length(SAMPLE), replace = FALSE), mu = mean(SAMPLE),
       alt = "less")

Atte



Re: [R] Euclidean Distance Matrix Analysis (EDMA) in R?

2010-06-26 Thread gokhanocakoglu

I am using EDMA to compare the interlandmark distances of two forms.
There is of course a program called EDMA, developed by Lele and
Richtsmeier, but the main point is that I am trying to solve a quite different
problem on the same data set using both R and EDMA. The data entry format of
the EDMA software differs from R's, so for every trial (every different data
set) I have to rearrange the data to suit EDMA.
I am now checking Julien Claude's book "Morphometrics with R"
(http://www.springer.com/statistics/life+sciences,+medicine+%26+health/book/978-0-387-77789-4);
there is a section about EDMA, and I am hoping to reach the same results in R
as with the EDMA software...


Re: [R] Wilcoxon signed rank test and its requirements

2010-06-26 Thread Atte Tenkanen

Atte Tenkanen wrote on 26.6.2010 at 5.15:


 Greg Snow wrote on 25.6.2010 at 21.55:

 Let me see if I understand.  You actually have the data for the  
 whole population (the entire piece) but you have some pre-defined  
 sections that you want to see if they differ from the population,  
 or more meaningfully they are different from a randomly selected  
 set of measures.  Is that correct?

 If so, since you have the entire population of interest you can  
 create the actual sampling distribution (or a good approximation  
 of it).  Just take random samples from the population of the given  
 size (matching the subset you are interested in) and calculate the  
 means (or other value of interest), probably 10,000 to 1,000,000  
 samples.  Now compare the value from your predefined subset to the  
 set of random values you generated to see if it is in the tail or  
 not.

 Let me check: do you mean doing it this way?

 t.test(sample(POPUL, length(SAMPLE), replace = FALSE), mu = mean(SAMPLE),
        alt = "less")

NO, this way:

t.test(POPUL[sample(1:length(POPUL), length(SAMPLE), replace = FALSE)],
       mu = mean(SAMPLE), alt = "less")

Atte



Re: [R] become a member of R user community

2010-06-26 Thread Ted Harding
On 25-Jun-10 21:46:13, Albert Lee, Ph.D. wrote:
 How do I become a member of R user community?
 
 Albert  Lee,
 Ph.D. statistician

1. By using R
2. By subscribing to the R-help mailing list and keeping in touch
   with the rest of us!

To subscribe your email address to the list, visit the R-help
info page at:

  https://stat.ethz.ch/mailman/listinfo/r-help

and follow the instructions under "Subscribing to R-help".

Welcome!
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 26-Jun-10   Time: 10:52:00
-- XFMail --



[R] Problem: RWinEdt and Windows 7

2010-06-26 Thread Johannes Reichl
Hi,
I can install RWinEdt if I start R with administrator rights, but it
does not paste my code to the console. I found advice in the link below
on how to manage the problem, but it did not work. Any other idea?
http://yusung.blogspot.com/2009/01/rwinedt-and-windows-vistawindow-7.html
Thanks a lot,
Johannes

From: Uwe Ligges ligges_at_statistik.tu-dortmund.de
Date: Sun, 08 Nov 2009 16:23:34 +0100


Aha, what is that blog post and what does not work for you? I haven't
got any report so far and do not have Windows 7 easily available yet. 
Best, 
Uwe Ligges 
Peter Flom wrote: 
 Good morning
 
 I just got a new computer with Windows 7. R works fine, but the
editor I am used to using RWinEdt does not. I did find one blog post
on how to get RWinEdt to work in Windows 7, but I could not get those
instructions to work either. 
 
 Is there a patch for RWinEdt? 
 
 If not, is there another good R editor that works under Windows 7? 
 
 I tried RSiteSearch with various combinations of Windows 7 and Editor
and so on, but found nothing. I also tried googling on these terms. 
 
 Thanks 
 
 Peter 
 
 Peter L. Flom, PhD 
 Statistical Consultant 
 Website: www DOT peterflomconsulting DOT com 
 Writing; http://www.associatedcontent.com/user/582880/peter_flom.html

 Twitter: @peterflom 
 

 
Dr. Johannes Reichl
Abteilung Energiewirtschaft
Energieinstitut an der Johannes Kepler Universität Linz
Altenberger Straße 69
A-4040 Linz
*
Tel.: +43-732-2468-5652
Fax: +43-732-2468-5651
Email: rei...@energieinstitut-linz.at 

Web: www.energieinstitut-linz.at
 www.energyefficiency.at
 



Re: [R] Passing the parameter (file name) to png()

2010-06-26 Thread jim holtman
b <- paste("C:\\rphp\\", arg, sep='')
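
To spell out why this helps (a minimal sketch; the fallback file name
"icc.png" is hypothetical and only there so the example runs without
command-line arguments): c() keeps the directory and the argument as two
separate strings, whereas paste() joins them into the single file name
that png() expects.

```r
# Sketch: build a single file name from a directory and a command-line argument.
args <- commandArgs(trailingOnly = TRUE)     # only the arguments after --args
# "icc.png" is a hypothetical fallback so the example also runs interactively
arg <- if (length(args) > 0) args[1] else "icc.png"

b_vec <- c("C:\\rphp\\", arg)                # two separate strings, not a path
b     <- paste("C:\\rphp\\", arg, sep = "")  # one string: directory + file name

length(b_vec)                                # 2
length(b)                                    # 1
# png(file = b, bg = "transparent")          # png() expects one file name
```

Using commandArgs(trailingOnly = TRUE) is a choice made for this sketch;
the original script loops over all of commandArgs() and uses the last
value of arg.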

On Sat, Jun 26, 2010 at 12:55 AM, Maulik Shah maulik.shah2...@gmail.com wrote:
 I am fitting a 3-parameter model to my response matrix and want to generate
 an item characteristic curve.
 I want to specify the file name used to save the item characteristic curve by
 passing it as an external parameter to the R batch script. The following is
 the code I have written for this.

 *R Script:*

 library(ltm)
 cmd_args = commandArgs();
 for (arg in cmd_args) cat("  ", arg, "\n", sep="")
 respmat <- read.table("C:\\rphp\\responsedata.txt")
 fit3pl <- tpm(respmat)
 cat("  ", arg, "\n", sep="")
 b <- c("C:\\rphp\\",arg)
 png(file=b, bg="transparent")
 plot(fit3pl,items=c,lwd=3)
 dev.off()
 rm(respmat,fit3pl,b)
 q()

 Could you please help me in doing so? I get an error message when R executes
 png().

 Thanks and Regards,
 Maulik





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] Accessing matrix elements within a list

2010-06-26 Thread jim holtman
First of all, take a look at the object you created:

> df.list <- vector("list", 3)
>
> for(i in 1:3){
+  assign(paste("s",i, sep=""),matrix(0, nrow = 20, ncol = 5, byrow
+ = FALSE, dimnames = NULL))
+ }
>
> # and then insert them with a loop like this
>
> # put matrices names in a vector
> matrices<-c("s1","s2","s3")
>
> # insert
> for(i in 1:3){
+ df.list[[i]] <- matrices[i]
+ }
>
> str(df.list)
List of 3
 $ : chr "s1"
 $ : chr "s2"
 $ : chr "s3"


You will see that it is a list of character strings, since that is what
is in 'matrices'. What you need to do is use 'get':

> df.list <- vector("list", 3)
>
> for(i in 1:3){
+  assign(paste("s",i, sep=""),matrix(0, nrow = 20, ncol = 5, byrow
+ = FALSE, dimnames = NULL))
+ }
>
> # and then insert them with a loop like this
>
> # put matrices names in a vector
> matrices<-c("s1","s2","s3")
>
> # insert
> for(i in 1:3){
+ df.list[[i]] <- get(matrices[i])
+ }
>
> str(df.list)
List of 3
 $ : num [1:20, 1:5] 0 0 0 0 0 0 0 0 0 0 ...
 $ : num [1:20, 1:5] 0 0 0 0 0 0 0 0 0 0 ...
 $ : num [1:20, 1:5] 0 0 0 0 0 0 0 0 0 0 ...
> df.list[[1]][1,]
[1] 0 0 0 0 0


Or, even better, just put them in the list in the first place:

> df.list <- vector("list", 3)
>
> for(i in 1:3){
+ df.list[[i]] <- matrix(0, nrow = 20, ncol = 5, byrow
+ = FALSE, dimnames = NULL)
+ }
>
> str(df.list)
List of 3
 $ : num [1:20, 1:5] 0 0 0 0 0 0 0 0 0 0 ...
 $ : num [1:20, 1:5] 0 0 0 0 0 0 0 0 0 0 ...
 $ : num [1:20, 1:5] 0 0 0 0 0 0 0 0 0 0 ...
> df.list[[1]][1,]
[1] 0 0 0 0 0
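
A more compact variant of the last approach, offered only as a sketch
(the names "s1" to "s3" are kept from the original post; naming the list
elements is an assumption about what is convenient, not something the
poster asked for): build the list with lapply() and name its elements,
so no separate s1, s2, s3 objects or get() calls are needed.

```r
# Sketch: create the three 20 x 5 zero matrices directly inside a named list.
df.list <- lapply(1:3, function(i) matrix(0, nrow = 20, ncol = 5))
names(df.list) <- paste("s", 1:3, sep = "")   # "s1", "s2", "s3"

df.list[[1]][1, ]      # first row of the first matrix, by position
df.list[["s2"]][1, ]   # first row of the second matrix, by name
```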




On Fri, Jun 25, 2010 at 5:29 PM, Maria P Petrova
mpetr...@u.washington.edu wrote:
 Hi there,

 I cannot seem to figure out how to access the elements of a list if those
 elements are matrices.
 For example, I have the following list

 df.list <- vector("list", 3)
 and I have made each of the elements a matrix as follows

 for(i in 1:3){
  assign(paste("s",i, sep=""),matrix(0, nrow = 20, ncol = 5, byrow
 = FALSE, dimnames = NULL))
 }

 # and then insert them with a loop like this

 # put matrices names in a vector
 matrices<-c("s1","s2","s3")

 # insert
 for(i in 1:3){
 df.list[[i]] <- matrices[i]
 }

 My question is that I cannot access the first row of the matrix within a list. The
 following does not work

 df.list [[1]][1,]

 Thanks for your help!





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



[R] Recursive indexing failed at level 2

2010-06-26 Thread Jim Hargreaves

Dear fellow R users,

I am replacing elements of a list like so:

pulse_subset[[1:20]]=unlist(pulse[i])[1:20]

where pulse is a list of lists, and pulse [i] has 20 values.

This gives the error "Recursive Indexing failed at level 2". But, 
interestingly, this instruction is part of a loop which has gone through 
about 200,000 iterations before giving this error.


Actual code:

pulse_subset[[1:(length(unlist(pulse[i])[as.numeric(peak_start[i]):as.numeric(peak_end[i])]))]] 
<- unlist(pulse[i])[as.numeric(peak_start[i]):as.numeric(peak_end[i])]


Error in 
pulse_subset[[1:(length(unlist(pulse[i])[as.numeric(peak_start[i]):as.numeric(peak_end[i])]))]] 
<- unlist(pulse[i])[as.numeric(peak_start[i]):as.numeric(peak_end[i])] :

  recursive indexing failed at level 2

If anyone could shed some light I'd be rather grateful.

Regards,
Jim Hargreaves



Re: [R] Recursive indexing failed at level 2

2010-06-26 Thread Duncan Murdoch

On 26/06/2010 7:53 AM, Jim Hargreaves wrote:

Dear fellow R users,

I am replacing elements of a list like so:

pulse_subset[[1:20]]=unlist(pulse[i])[1:20]
  


If pulse is a list, then pulse[i] is also a list, with one element.  I 
think you want pulse[[i]], which extracts element i.


If pulse_subset is a list, then pulse_subset[[1:20]] is equivalent to 
pulse_subset[[1]][[2]][[3]][[4]] ... [[20]], i.e. the syntax implies 
that it is a list containing a list etc, nested 20 levels deep.  The 
error message is telling you that it's not.  I'm not sure what your 
intention is in this case.
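
A small sketch of that behaviour, using toy lists invented for this
example: a vector index inside [[ ]] walks down nested levels, so it
works on a genuinely nested list and fails on a flat one.

```r
# Sketch: a vector inside [[ ]] is applied one nesting level at a time.
nested <- list(list(list("deep")))            # a list nested three levels deep
nested[[c(1, 1, 1)]]                          # same as nested[[1]][[1]][[1]]

flat <- as.list(1:20)                         # a flat list of 20 numbers
# flat[[1:20]] asks for 20 levels of nesting that do not exist, so it fails;
# tryCatch() captures the error instead of stopping the script.
res <- tryCatch(flat[[1:20]], error = function(e) e)
inherits(res, "error")

first_three <- flat[1:3]                      # single brackets select several elements
length(first_three)
```

The exact wording of the error may differ between R versions, so the
sketch only checks that an error occurs.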


Duncan Murdoch



Re: [R] Recursive indexing failed at level 2

2010-06-26 Thread Jim Hargreaves

On 06/26/2010 01:20 PM, Duncan Murdoch wrote:

On 26/06/2010 7:53 AM, Jim Hargreaves wrote:

Dear fellow R users,

I am replacing elements of a list like so:

pulse_subset[[1:20]]=unlist(pulse[i])[1:20]


If pulse is a list, then pulse[i] is also a list, with one element.  I 
think you want pulse[[i]], which extracts element i.
Ahh, I specified that pulse[i] has 20 values in my original mail. Basically 
pulse is a list 1000 elements long, with each element in pulse having 
between 1000 and 2000 elements of its own. Pulse is a list of lists. 
Also, as far as I am aware, [[ ]] should only be used when assigning 
values to elements of a list/vector.


unlist(pulse[1]) gives x1, x2, x3, x4, x5 etc. etc.




[R] Recoding dates to session id in a longitudinal dataset

2010-06-26 Thread John-Paul Bogers
Hi,

I'm fairly new to R but I have a large dataset (30 obs) containing
patient material. Some patients came 2-9 times during the three year
observation period. The patients are identified by a unique idnr, the
sessions can be distinguished using the session date. How can I recode the
date of the session to a session id (1-9). This would be necessary to obtain
information and do some analysis on the first occurrence of a specific
patient or to look for trends.
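
One possible way to do this, sketched on made-up data (the data frame
visits and the column names idnr and date are assumptions; the real
dataset may be laid out differently): rank the session dates within
each patient with ave(), so that the earliest visit of every patient
gets session id 1.

```r
# Sketch: number each patient's sessions in date order (toy data, hypothetical names).
visits <- data.frame(
  idnr = c(101, 102, 101, 103, 102, 101),
  date = as.Date(c("2008-03-01", "2008-04-15", "2009-01-10",
                   "2008-06-20", "2007-11-02", "2008-09-05"))
)

# ave() applies the function separately within each idnr group;
# ties.method = "first" keeps the ids distinct if two sessions share a date.
visits$session <- ave(as.numeric(visits$date), visits$idnr,
                      FUN = function(d) rank(d, ties.method = "first"))

first_visits <- visits[visits$session == 1, ]   # first occurrence per patient
visits
```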

Thanks

JP Bogers
University of Antwerp



Re: [R] Recursive indexing failed at level 2

2010-06-26 Thread Duncan Murdoch

On 26/06/2010 8:29 AM, Jim Hargreaves wrote:

On 06/26/2010 01:20 PM, Duncan Murdoch wrote:
  

On 26/06/2010 7:53 AM, Jim Hargreaves wrote:


Dear fellow R users,

I am replacing elements of a list like so:

pulse_subset[[1:20]]=unlist(pulse[i])[1:20]
  
If pulse is a list, then pulse[i] is also a list, with one element.  I 
think you want pulse[[i]], which extracts element i.

Ahh, I specified pulse[i] has 20 values in my original mail. 
But that could not be correct.  Take a look at length(pulse[i]).  
Assuming that i is a scalar value, length(pulse[i]) will be 1.  You 
really do want pulse[[i]].   You used unlist(pulse[i]) which is 
sometimes the same as pulse[[i]], but it really depends on what 
pulse[[i]] is.  unlist() is a very crude tool, and you should avoid it 
unless you really need it.
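
To make the distinction concrete, here is a sketch with a tiny stand-in
for pulse (the values are invented for the example):

```r
# Sketch: pulse[i] versus pulse[[i]] versus unlist(pulse[i]) on toy data.
pulse <- list(list(1.5, 2.5, 3.5), list(4.5, 5.5))   # hypothetical stand-in
i <- 1

length(pulse[i])      # 1: a list that still wraps element i
length(pulse[[i]])    # 3: element i itself
pulse[[i]][2:3]       # a sub-list of element i
unlist(pulse[i])      # a flattened numeric vector

# unlist() coerces everything to one common type, which can surprise you:
mixed <- list(list(1, "a"))
unlist(mixed[1])      # the number 1 becomes the string "1"
```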



Basically 
pulse is a list 1000 elements long, with each element in pulse having 
between 1000 and 2000 elements of its own. Pulse is a list of lists. 
  
Also, as far as I am aware, [[ ]] should only be used when assigning 
values to elements of a list/vector.
  


Whoever told you that was mistaken.

Duncan Murdoch



[R] Recoding session date into session id in a longitudinal dataset

2010-06-26 Thread JP Bogers
Hi,

I'm fairly new to R but I have a large dataset (30 obs) containing
patient material. Some patients came 2-9 times during the three year
observation period. The patients are identified by a unique idnr, the
sessions can be distinguished using the session date. How can I recode the
date of the session to a session id (1-9). This would be necessary to obtain
information and do some analysis on the first occurrence of a specific
patient or to look for trends.

Thanks

JP Bogers
University of Antwerp



Re: [R] Recursive indexing failed at level 2

2010-06-26 Thread Jim Hargreaves

Hi Duncan, list,

Thanks for the advice, but unfortunately that wasn't what was causing my 
problem. I'm still getting the "Recursive indexing failed at level 2" 
message even after replacing my unlist(pulse[i]) with pulse[[i]].


Error:

 
pulse_subset[[1:as.numeric(length(pulse[[i]][as.numeric(peak_start[i]):as.numeric(peak_end[i])]))]] 
<- pulse[[i]][as.numeric(peak_start[i]):as.numeric(peak_end[i])]
Error in 
pulse_subset[[1:as.numeric(length(pulse[[i]][as.numeric(peak_start[i]):as.numeric(peak_end[i])]))]] 
<- pulse[[i]][as.numeric(peak_start[i]):as.numeric(peak_end[i])] :

  recursive indexing failed at level 2

It's almost as if the length of pulse[[i]] is too small, but its length 
is 1001 and peak_start[i] and peak_end[i] are 192 and 208 respectively.


Also why would the problem crop up only after 200,000 runs?

Bizarre!

Regards,
Jim Hargreaves





Re: [R] Recursive indexing failed at level 2

2010-06-26 Thread Duncan Murdoch

Jim Hargreaves wrote:

Hi Duncan, list,

Thanks for the advice, but unfortunately that wasn't what was causing my 
problem. I'm still getting the Recursive indexing failed at level 2 
message even after replacing my unlist(pulse[i]) with pulse[[i]].
  


Read the second part of my first message, which explains the error.  You 
had two errors in the original expression, and have only fixed one.
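
For readers following along, the remaining problem is the [[1:n]] on the
left-hand side. A sketch of what the corrected assignment presumably
looks like, on invented data (the toy pulse object and the
pre-allocation of pulse_subset are assumptions; only the values 192 and
208 come from the thread):

```r
# Sketch: replace several list elements with single brackets, not [[ ]].
pulse <- list(as.list(seq(0.1, 100, length.out = 1001)))  # toy stand-in for pulse
peak_start <- 192
peak_end   <- 208
i <- 1

idx <- peak_start[i]:peak_end[i]               # positions 192 to 208
pulse_subset <- vector("list", length(idx))    # assumed pre-allocation

# [ ] with a vector index assigns element by element;
# [[ ]] with the same vector would be read as 17 levels of nesting.
pulse_subset[seq_along(idx)] <- pulse[[i]][idx]

length(pulse_subset)
```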


Duncan Murdoch



Re: [R] Recursive indexing failed at level 2

2010-06-26 Thread Jim Hargreaves

On 06/26/2010 02:07 PM, Duncan Murdoch wrote:

Jim Hargreaves wrote:

Hi Duncan, list,

Thanks for the advice, but unfortunately that wasn't what was causing 
my problem. I'm still getting the Recursive indexing failed at level 
2 message even after replacing my unlist(pulse[i]) with pulse[[i]].


Read the second part of my first message, which explains the error.  
You had two errors in the original expression, and have only fixed one.

Doh!

Working as intended now, thanks very much for your help!

Regards,
Jim Hargreaves




[R] several common sub-axes within multiple plot area

2010-06-26 Thread Karl Brand

Dear List,

I'd really appreciate tips or code demonstrating how I can achieve some 
common axis labels integrated into a multiple plot.


In my example (below), I'm trying to achieve:

-a single "Results 1 (Int)" centered between row 1 and row 2;
-a single "Results 2 (Int)" centered between row 2 and row 3; and,
-a single "Results 3 (Int)" centered at the bottom, i.e., below row 3.

I played with mtext() and par(oma=...) per this post:

https://stat.ethz.ch/pipermail/r-help/2004-October/059453.html

But I have so far failed to achieve my goal. Can I succeed with something 
combined with the 'high level' plot() function? Or do I need to get 
specific with some low level commands (help!)?

With big thanks in advance for any suggestions/examples.

cheers,

Karl

#my example:
dev.new()
plot.new()
par(mfrow=c(3,2))
#Graph 1:
plot(rnorm(20), rnorm(20),
 xlab = "Results 1 (Int)",
 ylab = "Variable A",
 main = "Factor X")
#Graph 2:
plot(rnorm(20), rnorm(20),
 xlab = "Results 1 (Int)",
 ylab = "Variable A",
 main = "Factor Y")
#Graph 3:
plot(rnorm(20), rnorm(20),
 xlab = "Results 2 (Int)",
 ylab = "Variable B")
#Graph 4:
plot(rnorm(20), rnorm(20),
 xlab = "Results 2 (Int)",
 ylab = "Variable B")
#Graph 5:
plot(rnorm(20), rnorm(20),
 xlab = "Results 3 (Int)",
 ylab = "Variable C")
#Graph 6:
plot(rnorm(20), rnorm(20),
 xlab = "Results 3 (Int)",
 ylab = "Variable C")
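
One possible approach, given only as a sketch (the function name
plot_shared_xlabels is made up for this example, and the margin
settings and line offset are guesses that may need tuning): leave xlab
empty in each panel and, after the second panel of every row, draw one
label with mtext(), positioned at the horizontal centre of the device
via grconvertX() and allowed to extend beyond the panel with xpd = NA.

```r
# Sketch: one shared x-axis label per row of a 3 x 2 panel layout.
plot_shared_xlabels <- function() {
  op <- par(mfrow = c(3, 2), mar = c(5, 4, 3, 1))
  on.exit(par(op))                              # restore settings afterwards
  x_labels <- paste("Results", 1:3, "(Int)")
  y_labels <- paste("Variable", c("A", "B", "C"))
  titles   <- c("Factor X", "Factor Y")
  for (r in 1:3) {
    for (k in 1:2) {
      plot(rnorm(20), rnorm(20), xlab = "", ylab = y_labels[r],
           main = if (r == 1) titles[k] else "")
    }
    # Horizontal centre of the device, expressed in the current (right-hand)
    # panel's user coordinates, so the label sits between the two columns.
    mid <- grconvertX(0.5, from = "ndc", to = "user")
    mtext(x_labels[r], side = 1, line = 3, at = mid, xpd = NA)
  }
  invisible(x_labels)
}
```

Calling plot_shared_xlabels() on an open graphics device draws the six
panels with three shared labels, each placed in the bottom margin of its
row.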


--
Karl Brand
Department of Genetics
Erasmus MC
Dr Molewaterplein 50
3015 GE Rotterdam
T +31 (0)10 704 3457 |F +31 (0)10 704 4743 |M +31 (0)642 777 268



Re: [R] integration of two normal density

2010-06-26 Thread Ravi Varadhan
I sent you a double integration solution a couple of days ago that answered 
your question.  You did not have the courtesy to acknowledge that.  Now you 
are asking a different question that is incorrectly formulated.  What you are 
doing is not multivariate integration.  You are just integrating a univariate 
function, which, as Prof. Venables pointed out, is not even a density.

Ravi.


Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvarad...@jhmi.edu


- Original Message -
From: Carrie Li carrieands...@gmail.com
Date: Friday, June 25, 2010 11:29 pm
Subject: [R]  integration of two normal density
To: r-help R-help@r-project.org


 Hello everyone,
  
  I have a question about integration of two density function
  Intuitively, I think the value after integration should be 1, but 
 they are
  not. Am I missing something here ?
  
   t=function(y){dnorm(y, mean=3)*dnorm(y/2, mean=1.5)}
   integrate(t, -Inf, Inf)
  0.3568248 with absolute error < 4.9e-06
  
  
  Also, is there any R function or package could do multivariate 
 integration ?
  
  Thanks for any suggestions!
  
  Carrie
  


Re: [R] integration of two normal density

2010-06-26 Thread Matt Shotwell
On Fri, 2010-06-25 at 23:28 -0400, Carrie Li wrote:
 Hello everyone,
 
 I have a question about integration of two density function
 Intuitively, I think the value after integration should be 1, but they are
 not. Am I missing something here ?
 
  t=function(y){dnorm(y, mean=3)*dnorm(y/2, mean=1.5)}
  integrate(t, -Inf, Inf)
 0.3568248 with absolute error < 4.9e-06

You've demonstrated (numerically) that the product of two normal density
functions isn't itself a pdf. (As functions of y, both factors are centred
at 3: dnorm(y, mean=3) has sd 1, and dnorm(y/2, mean=1.5) is twice the
density of a normal with mean 3 and sd 2.) However, you could make a
numerically normalized pdf by multiplying by 1/0.3568248.

 K <- integrate(t, -Inf, Inf)$value
 Kt <- function(y) 1/K * dnorm(y, 3) * dnorm(y/2, 1.5)
 integrate(Kt, -Inf, Inf)
1 with absolute error < 1.4e-05

Hence, the quantity you computed (K) is the normalization constant, with
some small error. Note that this strategy _may_ not always work. Here's
a good homework question: Can the product of two pdfs with identical
support always be normalized to form a new pdf?
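
For what it's worth, the constant K can be checked against a closed form. This is a sketch based on my own working, not something posted in the thread: as a function of y, dnorm(y/2, mean=1.5) equals 2*dnorm(y, mean=3, sd=2), and the integral of the product of two normal densities with a common mean and variances 1 and 4 is 1/sqrt(2*pi*(1+4)), so K should be 2/sqrt(10*pi). The names f, K_numeric and K_exact are only for illustration.

```r
# The integrand from the original post
f <- function(y) dnorm(y, mean = 3) * dnorm(y / 2, mean = 1.5)

# Numerical value, as in the thread
K_numeric <- integrate(f, -Inf, Inf)$value

# Closed form under the working described above
K_exact <- 2 / sqrt(10 * pi)

# Check the rescaling identity at a few arbitrary points
y <- c(0, 1.5, 3, 4.2)
same_shape <- all.equal(dnorm(y / 2, mean = 1.5),
                        2 * dnorm(y, mean = 3, sd = 2))

print(c(numeric = K_numeric, exact = K_exact))
```

If the two numbers agree to several decimals, the integrate() result is doing what Matt describes: it is the normalizing constant, not evidence of a bug.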

As for empirical multivariate integration, it's tough, especially if you
want to enumerate the area under the surface, which is exactly the
strategy of functions like 'integrate' (search Wikipedia for numerical
integration). This problem becomes increasingly difficult in additional
dimensions; the dreaded curse of dimensionality. On the bright side,
Bayesian statistical methods have to deal with this all the time, and we
have some good methods to compute numerical integrals. Check out Monte
Carlo integration, and Markov chain Monte Carlo methods.
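
A minimal sketch of plain Monte Carlo integration, in case it helps to see the idea. Nothing here comes from the thread beyond Carrie's integrand; the 2-D integrand exp(-(u^2 + v^2)) on the unit square is my own choice of test case because its exact value is known, and the variable names are illustrative.

```r
set.seed(42)
n <- 1e5

# 1-D: the integral of dnorm(y, 3) * dnorm(y/2, 1.5) over the real line
# equals E[dnorm(Y/2, 1.5)] when Y ~ N(3, 1), so average over draws of Y.
y <- rnorm(n, mean = 3)
mc_1d <- mean(dnorm(y / 2, mean = 1.5))

# 2-D: integrate exp(-(u^2 + v^2)) over [0,1] x [0,1] using uniform draws.
# The square has area 1, so the sample mean estimates the integral directly.
u <- runif(n)
v <- runif(n)
mc_2d <- mean(exp(-(u^2 + v^2)))

# Exact 2-D value for comparison: (integral of exp(-u^2) on [0,1])^2
exact_2d <- (sqrt(pi) * (pnorm(sqrt(2)) - 0.5))^2

print(c(mc_1d = mc_1d, mc_2d = mc_2d, exact_2d = exact_2d))
```

The same averaging works unchanged in higher dimensions, which is the appeal; the price is that the error shrinks only like 1/sqrt(n).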

-Matt

 
 
 Also, is there any R function or package could do multivariate integration ?
 
 Thanks for any suggestions!
 
 Carrie
 
-- 
Matthew S. Shotwell
Graduate Student
Division of Biostatistics and Epidemiology
Medical University of South Carolina
http://biostatmatt.com



Re: [R] Forcing scalar multiplication.

2010-06-26 Thread Uwe Ligges

 t(e$values * t(e$vectors))

Uwe Ligges
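
A small check of the one-liner above, as a sketch on a made-up symmetric 2 x 2 matrix (the matrix A and the names lhs, rhs, rhs_sweep and rhs_diag are assumptions for illustration). Multiplying e$values into t(e$vectors) recycles the eigenvalues down the rows of the transposed matrix, so each eigenvector is scaled by its own eigenvalue.

```r
# A made-up symmetric matrix, so the eigenvalues are real
A <- matrix(c(2, 1, 1, 3), nrow = 2)
e <- eigen(A)

# Left-hand side of A x = l x, for all eigenvectors at once
lhs <- A %*% e$vectors

# Uwe's expression: scale column j of e$vectors by e$values[j]
rhs <- t(e$values * t(e$vectors))

# Two equivalent spellings
rhs_sweep <- sweep(e$vectors, 2, e$values, `*`)
rhs_diag  <- e$vectors %*% diag(e$values)

print(all.equal(lhs, rhs))
```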

On 25.06.2010 20:42, rkevinbur...@charter.net wrote:

I am trying to check the results from an eigendecomposition and I need to force a 
scalar multiplication. The fundamental equation is: Ax = lx, where 'l' is the 
eigenvalue and x is the eigenvector corresponding to that eigenvalue. 'R' returns 
the eigenvalues as a vector (e <- eigen(A); e$values). So in order to 'check' 
the result I would multiply the eigenvalues ('l') by the eigenvectors. But unless 
I do it one by one (say e$values[1] * e$vectors[,1]) 'R' tries a matrix 
multiplication and that is not what I want.  I would like a matrix that is formed 
by the SCALAR multiplication of each of the values by the corresponding 
eigenvector. How can I force such a multiplication?

Thank you.

Kevin



[R] boot with strata: strata argument ignored?

2010-06-26 Thread Bryan Hanson
Hello All.  I must be missing the really obvious here:

mm <- function(d, i) median(d[i])
b1 <- boot(gravity$g, mm, R = 1000)
b1
b2 <- boot(gravity$g, mm, R = 1000, strata = gravity$series)
b2

Both b1 and b2 seem to have done (almost) the same thing, but it looks like
the strata argument in b2 has been ignored.  However, str(b1) vs str(b2)
does show that the strata have been noted correctly.  But b2$t is a 1000 x 1
array, not a 1000 x 8 array (gravity$series is a factor with 8 levels).

There is a more complex example in ?boot using the same data set that gives
a result that seems to make sense (2 levels in the factor, so $t has 2
columns).

I either misunderstand the expected behavior or I've missed some punctuation
or syntax detail.

TIA, Bryan

*
Bryan Hanson
Acting Chair
Professor of Chemistry & Biochemistry
DePauw University, Greencastle IN USA

 sessionInfo()
R version 2.11.0 (2010-04-22)
x86_64-apple-darwin9.8.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] datasets  tools grid  graphics  grDevices utils stats
[8] methods   base 

other attached packages:
 [1] boot_1.2-42        brew_1.0-3         faraway_1.0.4
 [4] GGally_0.2         xtable_1.5-6       mvbutils_2.5.1
 [7] ggplot2_0.8.7      digest_0.4.2       reshape_0.8.3
[10] proto_0.3-8        ChemoSpec_1.43     R.utils_1.4.0
[13] R.oo_1.7.2         R.methodsS3_1.2.0  rgl_0.91
[16] lattice_0.18-5     mvoutlier_1.4      plyr_0.1.9
[19] RColorBrewer_1.0-2 chemometrics_0.8   som_0.3-5
[22] robustbase_0.5-0-1 rpart_3.1-46       pls_2.1-0
[25] pcaPP_1.8-1        mvtnorm_0.9-9      nnet_7.3-1
[28] mclust_3.4.4       MASS_7.3-5         lars_0.9-7
[31] e1071_1.5-23       class_7.3-2



Re: [R] Calculating Summaries for each level of a Categorical variable

2010-06-26 Thread Corey Sparks

Did you try tapply?
?tapply

tapply(seq_len(nrow(RT)), RT$R, function(i) WA(RT$A[i], RT$C[i]))

or something like that

-
Corey Sparks, PhD
Assistant Professor
Department of Demography and Organization Studies
University of Texas at San Antonio
501 West Durango Blvd
Monterey Building 2.270C
San Antonio, TX 78207
210-458-3166
corey.sparks 'at' utsa.edu
https://rowdyspace.utsa.edu/users/ozd504/www/index.htm
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Calculating-Summaries-for-each-level-of-a-Categorical-variable-tp2269349p2269444.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Euclidean Distance Matrix Analysis (EDMA) in R?

2010-06-26 Thread Corey Sparks

I think the hardest thing about true EDMA (meaning the Richtsmeier and Lele
version) is the bootstrapping to get significance.  Have you tried their
software?
http://www.getahead.psu.edu/resource_new.html


-
Corey Sparks, PhD
Assistant Professor
Department of Demography and Organization Studies
University of Texas at San Antonio
501 West Durango Blvd
Monterey Building 2.270C
San Antonio, TX 78207
210-458-3166
corey.sparks 'at' utsa.edu
https://rowdyspace.utsa.edu/users/ozd504/www/index.htm
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Euclidean-Distance-Matrix-Analysis-EDMA-in-R-tp2266797p2269445.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] strange behaviour of CairoPNG

2010-06-26 Thread Thomas Steiner
Thank you Henrik for your answer.

I hope I am now in line with the posting guide and perhaps I get an
answer, thank you.

 sessionInfo()
R version 2.9.0 alpha (2009-03-23 r48200)
i386-pc-mingw32

locale:
LC_COLLATE=German_Austria.1252;LC_CTYPE=German_Austria.1252;LC_MONETARY=German_Austria.1252;LC_NUMERIC=C;LC_TIME=German_Austria.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] Cairo_1.4-4



2010/6/5 Henrik Bengtsson h...@stat.berkeley.edu:
 FYI, follow the information in the email footer:

 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html

 and make sure at a minimum to report your sessionInfo(). That
 increases your chances to get a response.

 /Henrik

 On Sat, Jun 5, 2010 at 11:42 AM, Thomas Steiner finbref.2...@gmail.com 
 wrote:
 OK, no reply. :-(
 I'm more offensive: this is a bug!
 the font-parameter of the text function does not work properly in the
 Cairo-package
 thomas


 2010/6/4 Thomas Steiner finbref.2...@gmail.com:
 Hi,
 could it be that the text() function gives different output for normal
 png() and CairoPNG()?
 See the following example and the attached images: the font=2 and
 font=3 seem to be exchanged!
 Thanks for help,
 Thomas

 library(Cairo)
 CairoPNG("Test-cairo.png",width=750,height=690)
 #png("Test-normal.png",width=750,height=690)

 plot(1,1,type="n",main="normal")
 text(1,1,"normal",adj=c(1,1))
 text(1,1,"bold",font=2,adj=c(-1,-1))
 text(1,1,"italic",font=3,adj=c(1,-1))
 text(1,1,"italicbold",font=4,adj=c(-1,1))

 dev.off()




[R] Calculating Summaries for each level of a Categorical variable

2010-06-26 Thread RaoulD

Hi,

I have a dataset which has a categorical variable R,a count variable C
(integer) and 4 or more numeric variables (A,T,W,H - integers) containing
measures for R. I would like to summarize each level of the variable R by
the average for A,T,W and H. 

I have written a function to calculate weighted averages using C as the
weight and this is given below. The function works perfectly but how do I
add the additional dimension I require to this function?

Dataset: RT=
R    A  T  W  H
R1  10 20 20 10
R2  60 20 50 10
R3  45 10 20 50
R4  68 50 20 10
R1  73 20 40 46
R3  25 30 10 54
R3  36 90 20 10
R2  29 10 30 30

# FUNCTION TO CALCULATE THE WEIGHTED AVERAGE FOR A WEIGHTED BY C
WA <- function(A, C) {
  sp_A <- c(A %*% C)    # sum of products A[i] * C[i]
  sum_C <- sum(C)       # total weight
  WA <- sp_A / sum_C
  return(WA)
}

I am trying to incorporate the additional step of calculating the weighted
average of A,T,W and H for each level of R. Need help with this.

Thanks in advance!
Raoul
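
One possible way to get the weighted averages per level of R, sketched with base R only. The post does not show the count column C, so the C values below are made up purely so the example runs; weighted.mean() stands in for the WA() function, and the names vars and res are illustrative.

```r
# Data from the post; C was not shown, so these counts are hypothetical
RT <- data.frame(
  R = c("R1", "R2", "R3", "R4", "R1", "R3", "R3", "R2"),
  C = c(2, 1, 3, 1, 4, 1, 2, 3),
  A = c(10, 60, 45, 68, 73, 25, 36, 29),
  T = c(20, 20, 10, 50, 20, 30, 90, 10),
  W = c(20, 50, 20, 20, 40, 10, 20, 30),
  H = c(10, 10, 50, 10, 46, 54, 10, 30)
)

vars <- c("A", "T", "W", "H")

# Split the rows by level of R, then take the C-weighted mean of each
# measure within every piece; t() puts levels in rows, measures in columns
res <- t(sapply(split(RT, RT$R), function(d)
  sapply(d[vars], weighted.mean, w = d$C)))

print(res)
```

The result is a matrix with one row per level of R and one column per measure, which is easy to turn back into a data frame with as.data.frame(res) if needed.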
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Calculating-Summaries-for-each-level-of-a-Categorical-variable-tp2269349p2269349.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Popularity of R, SAS, SPSS, Stata...

2010-06-26 Thread Muenchen, Robert A (Bob)


-Original Message-
From: Joris Meys [mailto:jorism...@gmail.com]
Sent: Friday, June 25, 2010 10:10 PM
To: Muenchen, Robert A (Bob)
Cc: Dario Solari; r-help@r-project.org
Subject: Re: [R] Popularity of R, SAS, SPSS, Stata...

I had taken the opposite tack with Google Trends by subtracting keywords
like:
SAS -shoes -airlines -sonar...
but never got as good results as that beautiful "X code for" search.
When you see the end-of-semester panic bumps in traffic, you know you're
nailing it!

 I have to eat those words already. The "R code for" search that showed a
 peak every December did not have quotes around it, so it was searching
 for those three words not the complete phrase. When you add the quotes,
 the peaks vanish.

Don't swallow! You're looking through search terms, not through web
pages. "R code for regression", "regression code R" etc. are all valid
searches, no quotation marks needed.

I wondered why those clear peaks had vanished when I added quotes.
Here's one that combines the search terms without the quotes. It shows
several March/April & October/November peaks:

http://www.google.com/insights/search/#q=r%20code%20for%2Br%20manual%2Br%20tutorial%2Br%20graph%2Csas%20code%20for%2Bsas%20manual%2Bsas%20tutorial%2Bsas%20graph%2Cspss%20code%20for%2Bspss%20manual%2Bspss%20tutorial%2Bspss%20graph%2Cstata%20code%20for%2Bstata%20manual%2Bstata%20tutorial%2Bstata%20graph%2Cs-plus%20code%20for%2Bs-plus%20manual%2Bs-plus%20tutorial%2Bs-plus%20graph&cmpt=q

I've been trying to make sense of Google Scholar searches. I'm obviously
missing something basic. Here are two searches on www.google.com:

sas - gets 68M hits
sas OR spss - gets 74.3M hits. A bigger number as OR would imply.

But when I do the same searches on scholar.google.com, here's what I
get:

sas - gets 4.6M hits
sas OR spss - gets 1.65M hits

How on earth can an OR get you less??

Thanks,
Bob


http://www.google.com/insights/search/#q=code%20for%20r%2Ccode%20for%20SAS%2Ccode%20for%20SPSS%2Ccode%20for%20matlab&cmpt=q

This one is nice too. You can see that the bump in the autumn semester
for R is replacing the one for Matlab. Then in the spring semester
Matlab stays high but R drops. And both the US and India always have a
very large search index, whereas the rest of the world is essentially
worthless. Which leads me to the conclusion that : 1) The results are
probably coming from google.com, excluding local versions, and 2) in
the US (and India), statistics is mainly taught in the autumn
semester. Given the fact that daylight has a beneficial effect on the
emotional well-being, the unpopularity of statistics is likely caused
by unfortunate scheduling.

Forget Excel. Google rocks! ;-)

Cheers
Joris


 Once you go the phrase route, you gain precision but end up with zero
 counts on various phrases. I avoided that by combining them with + to
 get enough to plot. The resulting graph shows SAS dominant until
 mid-2006 when SPSS takes the top position, followed by R, SAS, Stata in
 order:

 http://www.google.com/insights/search/#q=%22r%20code%20for%22%2B%22r%20manual%22%2B%22r%20tutorial%22%2B%22r%20graph%22%2C%22sas%20code%20for%22%2B%22sas%20manual%22%2B%22sas%20tutorial%22%2B%22sas%20graph%22%2C%22spss%20code%20for%22%2B%22spss%20manual%22%2B%22spss%20tutorial%22%2B%22spss%20graph%22%2C%22stata%20code%20for%22%2B%22stata%20manual%22%2B%22stata%20tutorial%22%2B%22stata%20graph%22%2C%22s-plus%20code%20for%22%2B%22s-plus%20manual%22%2Bs-plus%20tutorial%22%2B%22s-plus%20graph%22&cmpt=q

 This might be a good one to add to http://r4stats.com/popularity

 Bob


I see that there's a car, the R Code Mustang, that adding "for" gets rid
of.

Thanks for getting me back on a topic that I had given up on!

Bob

-Original Message-
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org]
On Behalf Of Joris Meys
Sent: Thursday, June 24, 2010 7:56 PM
To: Dario Solari
Cc: r-help@r-project.org
Subject: Re: [R] Popularity of R, SAS, SPSS, Stata...

Nice idea, but quite sensitive to search terms, if you compare your
result on "... code" with "... code for":
http://www.google.com/insights/search/#q=r%20code%20for%2Csas%20code%20for%2Cspss%20code%20for&cmpt=q

On Thu, Jun 24, 2010 at 10:48 PM, Dario Solari dario.sol...@gmail.com wrote:
 First: excuse my English.

 My opinion: a useful source for measuring popularity can be Google
 Insights for Search - http://www.google.com/insights/search/#

 Every person using software like R, SAS, SPSS needs first to learn
 it. So probably he makes a web search for a manual, a tutorial, a
 guide. One can measure the share of this kind of search query.
 This kind of result can be useful to determine trends of
 popularity.

 Example 1: R tutorial/manual/guide, SAS tutorial/manual/guide,
 SPSS tutorial/manual/guide

 http://www.google.com/insights/search/#q=%22r%20tutorial%22%2B%22r%20manual%22%2B%22r%20guide%22%2B%22r%20vignette%22%2C%22spss%20tutorial%22%2B

Re: [R] Popularity of R, SAS, SPSS, Stata...

2010-06-26 Thread Allan Engelhardt



On 26/06/10 16:07, Muenchen, Robert A (Bob) wrote:

I've been trying to make sense of Google Scholar searches. I'm obviously
missing something basic. Here are two searches on www.google.com:

sas - gets 68M hits
sas OR spss - gets 74.3M hits. A bigger number as OR would imply.

But when I do the same searches on scholar.google.com, here's what I
get:

sas - gets 4.6M hits
sas OR spss - gets 1.65M hits

How on earth can an OR get you less??
   


Because the search for SAS alone stems the words so you get hits on SA 
alone (SAS obviously (!) being the plural of SA), as you will see from 
the first few hits (hint: the matched word is highlighted in bold).  
With the OR you don't stem (weird but true).  Put quotes around the 
single search term to avoid (some of) the stemming:


SAS - 4.62M
"SAS" - 1.62M
"SPSS" - 0.635M
"SAS" OR "SPSS" - 1.52M

It is obviously still not right, but closer.  Happy reading of the 
articles by D. Sas, S.A.S. Eddington, etc.


Any follow-ups probably belong on a different mailing list - I think 
there are forums for Google search.



Allan



On Thu, Jun 24, 2010 at 10:48 PM, Dario Solari dario.sol...@gmail.com wrote:

First: excuse my English.

My opinion: a useful source for measuring popularity can be Google
Insights for Search - http://www.google.com/insights/search/#

Every person using software like R, SAS, SPSS needs first to learn
it. So probably he makes a web search for a manual, a tutorial, a
guide. One can measure the share of this kind of search query.
This kind of result can be useful to determine trends of
popularity.

Example 1: R tutorial/manual/guide, SAS tutorial/manual/guide,
SPSS tutorial/manual/guide

http://www.google.com/insights/search/#q=%22r%20tutorial%22%2B%22r%20manual%22%2B%22r%20guide%22%2B%22r%20vignette%22%2C%22spss%20tutorial%22%2B%22spss%20manual%22%2B%22spss%20guide%22%2C%22sas%20tutorial%22%2B%22sas%20manual%22%2B%22sas%20guide%22&cmpt=q

Example 2: R software, SAS software, SPSS software

http://www.google.com/insights/search/#q=%22r%20software%22%2C%22spss%20

Re: [R] Popularity of R, SAS, SPSS, Stata...

2010-06-26 Thread Dario Solari
Bob, I'm confused.
Did you try the search with Google Scholar or with Google Insights for Search?

---
Useful references for Google Insights for Search:

* matching terms:
http://www.google.com/support/insights/bin/answer.py?hl=en&answer=94777

* interpreting search volumes:
http://www.google.com/support/insights/bin/answer.py?hl=en&answer=92769

---
Useful references for Google Scholar:

http://scholar.google.com/intl/en/scholar/refinesearch.html

---

It seems that the OR option in Google Scholar doesn't work.
Try contacting the Google Scholar Support Centre:
http://www.google.com/support/scholar/bin/request.py?contact_type=general



Re: [R] boot with strata: strata argument ignored?

2010-06-26 Thread Charles C. Berry

On Sat, 26 Jun 2010, Bryan Hanson wrote:


Hello All.  I must be missing the really obvious here:

mm <- function(d, i) median(d[i])
b1 <- boot(gravity$g, mm, R = 1000)
b1
b2 <- boot(gravity$g, mm, R = 1000, strata = gravity$series)
b2

Both b1 and b2 seem to have done (almost) the same thing, but it looks like
the strata argument in b2 has been ignored.  However, str(b1) vs str(b2)
does show that the strata have been noted correctly.  But b2$t is a 1000 x 1
array, not a 1000 x 8 array (gravity$series is a factor with 8 levels).

There is a more complex example in ?boot using the same data set that gives
a result that seems to make sense (2 levels in the factor, so $t has 2
columns).

I either misunderstand the expected behavior or I've missed some punctuation
or syntax detail.


Your punctuation and syntax is OK.

Note:


SISWR <- function(x) sample(x, length(x), replace=TRUE)
# no strata
var(replicate(1000, median(SISWR(gravity$g))))

[1] 0.4588338

# now stratify on series
gsplit <- split(gravity$g, gravity$series)
var(replicate(1000, median(unlist(lapply(gsplit, SISWR)))))

[1] 0.3882272


sqrt(.45) # this agrees with b1

[1] 0.6708204

sqrt(.39) # this agrees with b2

[1] 0.6244998




The effect of stratification depends on the relative amount of variation 
within vs between strata. This suggests there is not a lot:



aov(g~series,gravity)

Call:
   aov(formula = g ~ series, data = gravity)

Terms:
                  series Residuals
Sum of Squares  2818.624  8239.376
Deg. of Freedom        7        73

Residual standard error: 10.62394
Estimated effects may be unbalanced





HTH,

Chuck
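
On the shape of b2$t: as far as I understand boot(), the strata argument only changes how the resampling indices are drawn (within each stratum); the number of columns in $t is set by what the statistic returns. The mm function returns one number, so $t has one column. A sketch of a statistic that returns one median per series follows; the name mm_by and the choice R = 200 are mine, not from the thread.

```r
library(boot)   # provides boot() and the gravity data set

set.seed(1)

# Statistic that returns one median per level of 'series'.
# d is the whole data frame and i the resampled row indices.
mm_by <- function(d, i) {
  as.vector(tapply(d$g[i], d$series[i], median))
}

# Resample rows within each series, so every series keeps its sample size
b3 <- boot(gravity, mm_by, R = 200, strata = gravity$series)

dim(b3$t)   # one row per replicate, one column per series
```

Because the strata are resampled separately, every replicate contains observations from all eight series, so none of the per-series medians come back as NA.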



TIA, Bryan

*
Bryan Hanson
Acting Chair
Professor of Chemistry & Biochemistry
DePauw University, Greencastle IN USA





Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901



Re: [R] predict newdata question

2010-06-26 Thread Felipe Carrillo

Thanks Bill, that worked great!!

You ask:

# How can I use predict here, 'newdata' crashes
predict(m1,newdata=wolf$predicted);wolf # it doesn't work

To use predict() you need to give a fitted model object (here m1) and a
*data frame* to specify the values of the predictors for which you want
predictions.  Here wolf$predicted is not a data frame, it is a vector.

What I think you want is

pv <- predict(m1, newdata = wolf)

That will get you linear predictors.  To get probabilities you need to
say so as

probs <- predict(m1, newdata = wolf, type = "response")

You can put these back into the data frame if you wish, e.g.

wolf <- within(wolf, {
    lpreds <- predict(m1, wolf)
    probs <- predict(m1, wolf, type = "response")
})

Now if you look at

head(wolf)

you will see two extra columns.
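
A self-contained sketch of the same point, since the wolf data are not shown here. The built-in mtcars data and the model am ~ wt are stand-ins I picked for illustration; they are not the poster's data or model. The thing to notice is that newdata is a data frame whose column name matches the predictor in the formula.

```r
# Stand-in logistic regression on built-in data (not the 'wolf' model)
m1 <- glm(am ~ wt, family = binomial, data = mtcars)

# newdata must be a data frame with a column named like the predictor
new_cars <- data.frame(wt = c(2, 3, 4))

# Default type = "link": values on the linear-predictor (logit) scale
lp <- predict(m1, newdata = new_cars)

# type = "response": values on the probability scale
pr <- predict(m1, newdata = new_cars, type = "response")

# The two are related through the inverse logit
print(cbind(new_cars, lp = lp, pr = pr, check = plogis(lp)))
```

This also shows why computing exp(x)/(1+exp(x)) by hand, as in the original question, is unnecessary once type = "response" is used.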


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Felipe Carrillo
Sent: Saturday, 26 June 2010 10:35 AM
To: r-h...@stat.math.ethz.ch
Subject: [R] predict newdata question

Hi:
I am using a subset of the below dataset to predict PRED_SUIT for
the whole dataset but I am having trouble with 'newdata'. The model
was created with 153 records and want to predict for 208 records.

[lots of stuff omitted]

wolf$prob99 <- (exp(wolf$predicted))/(1+exp(wolf$predicted))
head(wolf);dim(wolf)

# How can I use predict here, 'newdata' crashes
predict(m1,newdata=wolf$predicted);wolf  # it doesn't work

Thanks for any hints


Felipe D. Carrillo
Supervisory Fishery Biologist
Department of the Interior
US Fish & Wildlife Service
California, USA




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Popularity of R, SAS, SPSS, Stata...

2010-06-26 Thread Dario Solari
On 26 Jun, 17:19, Allan Engelhardt all...@cybaea.com wrote:
 On 26/06/10 16:07, Muenchen, Robert A (Bob) wrote:

  I've been trying to make sense of Google Scholar searches. I'm obviously
  missing something basic. Here are two searches on www.google.com:

  sas - gets 68M hits
  sas OR spss - gets 74.3M hits. A bigger number as OR would imply.

  But when I do the same searches on scholar.google.com, here's what I
  get:

  sas - gets 4.6M hits
  sas OR spss - gets 1.65M hits

  How on earth can an OR get you less??


Try these search terms (in Google Scholar):

SAS Institute, SPSS Inc, r project org...


 On 26 Jun, 17:19, Allan Engelhardt all...@cybaea.com wrote:
...
 It is obviously still not right, but closer.  Happy reading of the
 articles by D. Sas, S.A.S. Eddington, etc.

To avoid the "Happy reading of the articles by D. Sas, S.A.S.
Eddington, etc.", exclude that author name from the search:

SAS -author:sas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] become a member of R user community

2010-06-26 Thread Peter Ehlers

On 2010-06-26 3:52, (Ted Harding) wrote:

On 25-Jun-10 21:46:13, Albert Lee, Ph.D. wrote:

How do I become a member of R user community?

Albert  Lee,
Ph.D. statistician


1. By using R
2. By subscribing to the R-help mailing list and keeping in touch
with the rest of us!

To subscribe your email address to the list, visit the R-help
info page at:

   https://stat.ethz.ch/mailman/listinfo/r-help

and follow the instructions under Subscribing to R-help.

Welcome!
Ted.


Let me just add that, for very little money, you can also become a
supporting member of the R Foundation.
See the homepage 'Foundation' link or go directly to

   http://www.r-project.org/foundation/membership.html

Peter Ehlers

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to group a large list of strings into categories based on string similarity?

2010-06-26 Thread G FANG
Hi Martin,

Thanks a lot for your advice.

I tried the process you suggested, as below. It worked, but in a
different way than I planned.

library(Biostrings)
x <- c("ACTCCCGCCGTTCGCGCGCAGCATGATCCTG",
  "ACTCCCGCCGTTCGCGCGC",
  "CAGGATCATGCTGCGCGCGAACGGCGGGAGT",
  "CAGGATCATGCTGCGCGCGAANN",
  "NCAGGATCATGCTGCGCGCGAAN",
  "CAGGATCATGCTGCGCGCG",
  "NNNCAGGATCATGCTGCGCGCGAANNN")
names(x) <- seq_along(x)
dna <- DNAStringSet(x)
while (!all(width(dna) == width(dna <- trimLRPatterns("N", "N", dna)))) {}
names(dna)[order(dna)[rank(dna, ties.method="min")]]

The output is
1 2 3 4 4 6 4. This is the right answer after trimming the
N's, i.e. it shows which strings are the same when the N's are ignored.

But actually, the match I planned is position-to-position match, i.e.
1st and 2nd strings are the same except for the N's

So, the expected output is 1 1 2 2 3 2 4
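
A position-to-position comparison of this kind can be sketched in base R. This is only an illustration under stated assumptions: positions past the end of the shorter string are treated like N, each string is compared against the first member of every group found so far, and the helper names matches_with_N and assign_ids are made up here. Matching with wildcards is not transitive, so the grouping can depend on the order of the input, and the double loop is far too slow for 20 million strings as written.

```r
x <- c("ACTCCCGCCGTTCGCGCGCAGCATGATCCTG",
       "ACTCCCGCCGTTCGCGCGC",
       "CAGGATCATGCTGCGCGCGAACGGCGGGAGT",
       "CAGGATCATGCTGCGCGCGAANN",
       "NCAGGATCATGCTGCGCGCGAAN",
       "CAGGATCATGCTGCGCGCG",
       "NNNCAGGATCATGCTGCGCGCGAANNN")

# TRUE if a and b agree at every shared position where neither has an N;
# positions beyond the shorter string are ignored (treated like N)
matches_with_N <- function(a, b) {
  n  <- min(nchar(a), nchar(b))
  ca <- strsplit(substr(a, 1, n), "")[[1]]
  cb <- strsplit(substr(b, 1, n), "")[[1]]
  all(ca == cb | ca == "N" | cb == "N")
}

# Greedy grouping: a string joins the first group whose representative it
# matches, otherwise it starts a new group
assign_ids <- function(x) {
  ids  <- integer(length(x))
  reps <- character(0)            # representative (first string) of each group
  for (i in seq_along(x)) {
    hit <- 0L
    for (g in seq_along(reps)) {
      if (matches_with_N(x[i], reps[g])) { hit <- g; break }
    }
    if (hit == 0L) {
      reps <- c(reps, x[i])
      hit  <- length(reps)
    }
    ids[i] <- hit
  }
  list(ids = ids, reps = reps)
}

res <- assign_ids(x)
res$ids    # 1 1 2 2 3 2 4
res$reps   # mapping from group id to its representative string
```

For these seven strings the ids agree with the expected output above; res$reps gives the mapping from ids back to one string per group.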

Please advise.

Thanks!

--gang

On Wed, Jun 23, 2010 at 7:55 PM, Martin Morgan mtmor...@fhcrc.org wrote:
 On 06/23/2010 07:46 PM, Martin Morgan wrote:
 On 06/23/2010 06:55 PM, G FANG wrote:
 Hi,

 I want to group a large list (20 million) of strings into categories
 based on string similarity?

 The specific problem is: given a list of DNA sequence as below

 ACTCCCGCCGTTCGCGCGCAGCATGATCCTG
 ACTCCCGCCGTTCGCGCGC
 CAGGATCATGCTGCGCGCGAACGGCGGGAGT
 CAGGATCATGCTGCGCGCGAANN
 CAGGATCATGCTGCGCGCG
 ..
 .
 NNNCCGTTCGCGCGCAGCATGATCCTG
 CGCGCGCAGCATGATCCTG
 GCGCGCGAACGGCGGGAGT
 NNCGCGCAGCATGATCCTG
 NNNTGCGCGCGAACGGCGGGAGT
 NNTTCGCGCGCAGCATGATCCTG

 'N' is the missing letter

 It can be seen that some strings are the same except for those N's
 (i.e. N can match with any base)

 given this list of string, I want to have

 1) a vector corresponding to each row (string), for each string assign
 an id, such that similar strings (those only differ at N's) have the
 same id
 2) also get a mapping list from unique strings ('unique' in term of
 the same similarity defined above) to the ids

 I am a matlab user shifting to R. Please advise on efficient ways to do
 this.

 The Bioconductor Biostrings package has many tools for this sort of
 operation. See http://bioconductor.org/packages/release/Software.html

 Maybe a one-time install

    source('http://bioconductor.org/biocLite.R')
    biocLite('Biostrings')

 then

   library(Biostrings)
   x <- c("ACTCCCGCCGTTCGCGCGCAGCATGATCCTG",
         "ACTCCCGCCGTTCGCGCGC",
         "CAGGATCATGCTGCGCGCGAACGGCGGGAGT",
         "CAGGATCATGCTGCGCGCGAANN",
         "NCAGGATCATGCTGCGCGCGAAN",
         "CAGGATCATGCTGCGCGCG",
         "NNNCAGGATCATGCTGCGCGCGAANNN")
   names(x) <- seq_along(x)
   dna <- DNAStringSet(x)
   while (!all(width(dna) ==
               width(dna <- trimLRPatterns("N", "N", dna)))) {}
   names(dna)[rank(dna)]

 oops, maybe closer to

   names(dna)[order(dna)[rank(dna, ties.method="min")]]

 although there might be a faster way (e.g., match 8, 4, 2, 1 N's). Also,
 your sequences likely come from a fasta file (Biostrings::readFASTA) or
 a text file with a column of sequences (ShortRead::readXStringColumns)
 or from alignment software (ShortRead::readAligned /
 ShortRead::readFastq). If you go this route you'll want to address
 questions to the Bioconductor mailing list

   http://bioconductor.org/docs/mailList.html

 Martin

 Thanks!

 Gang





 --
 Martin Morgan
 Computational Biology / Fred Hutchinson Cancer Research Center
 1100 Fairview Ave. N.
 PO Box 19024 Seattle, WA 98109

 Location: Arnold Building M1 B861
 Phone: (206) 667-2793


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Export Results

2010-06-26 Thread Liviu Andronic
On Sat, Jun 26, 2010 at 7:42 AM, Tal Galili tal.gal...@gmail.com wrote:
 And there are also the brew, and Sweave packages (as Henrique
 mentioned).

Also, odfWeave and Sweave via LyX. I believe that this is FAQed.
Liviu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] boot with strata: strata argument ignored?

2010-06-26 Thread Bryan Hanson
Thanks Chuck, I understand much better what is going on with your example.
But I'm still uncertain why the b2$t array does not have the dimensions of R
x no. of strata.

Any further insight would be appreciated.  Bryan
*
Bryan Hanson
Acting Chair
Professor of Chemistry & Biochemistry
DePauw University, Greencastle IN USA



On 6/26/10 12:43 PM, Charles C. Berry cbe...@tajo.ucsd.edu wrote:

 On Sat, 26 Jun 2010, Bryan Hanson wrote:
 
 Hello All.  I must be missing the really obvious here:
 
 mm <- function(d, i) median(d[i])
 b1 <- boot(gravity$g, mm, R = 1000)
 b1
 b2 <- boot(gravity$g, mm, R = 1000, strata = gravity$series)
 b2
 
 Both b1 and b2 seem to have done (almost) the same thing, but it looks like
 the strata argument in b2 has been ignored.  However, str(b1) vs str(b2)
 does show that the strata have been noted correctly.  But b2$t is a 1000 x 1
 array, not a 1000 x 8 array (gravity$series is a factor with 8 levels).
 
 There is a more complex example in ?boot using the same data set that gives
 a result that seems to make sense (2 levels in the factor, so $t has 2
 columns).
 
 I either misunderstand the expected behavior or I've missed some punctuation
 or syntax detail.
 
 Your punctuation and syntax is OK.
 
 Note:
 
 SISWR <- function(x) sample(x,length(x),replace=TRUE)
 # no strata
 var(replicate(1000,median(SISWR(gravity$g))))
 [1] 0.4588338
 # now stratify on series
 gsplit <- split(gravity$g,gravity$series)
 var(replicate(1000,median(unlist(lapply(gsplit,SISWR)))))
 [1] 0.3882272
 
 sqrt(.45) # this agrees  with b1
 [1] 0.6708204
 sqrt(.39) # this agrees with b2
 [1] 0.6244998
 
 
 The effect of stratification depends on the relative amount of variation
 within vs between strata. This suggests there is not a lot:
 
 aov(g~series,gravity)
 Call:
 aov(formula = g ~ series, data = gravity)
 
 Terms:
                   series Residuals
 Sum of Squares  2818.624  8239.376
 Deg. of Freedom        7        73
 
 Residual standard error: 10.62394
 Estimated effects may be unbalanced
 
 
 
 HTH,
 
 Chuck
 
 
 TIA, Bryan
 
 *
 Bryan Hanson
 Acting Chair
 Professor of Chemistry & Biochemistry
 DePauw University, Greencastle IN USA
 
 sessionInfo()
 R version 2.11.0 (2010-04-22)
 x86_64-apple-darwin9.8.0
 
 locale:
 [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
 
 attached base packages:
 [1] datasets  tools grid  graphics  grDevices utils stats
 [8] methods   base
 
 other attached packages:
 [1] boot_1.2-42        brew_1.0-3         faraway_1.0.4
 [4] GGally_0.2         xtable_1.5-6       mvbutils_2.5.1
 [7] ggplot2_0.8.7      digest_0.4.2       reshape_0.8.3
 [10] proto_0.3-8        ChemoSpec_1.43     R.utils_1.4.0
 [13] R.oo_1.7.2         R.methodsS3_1.2.0  rgl_0.91
 [16] lattice_0.18-5     mvoutlier_1.4      plyr_0.1.9
 [19] RColorBrewer_1.0-2 chemometrics_0.8   som_0.3-5
 [22] robustbase_0.5-0-1 rpart_3.1-46       pls_2.1-0
 [25] pcaPP_1.8-1        mvtnorm_0.9-9      nnet_7.3-1
 [28] mclust_3.4.4       MASS_7.3-5         lars_0.9-7
 [31] e1071_1.5-23       class_7.3-2
 
 
 
 Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
 E mailto:cbe...@tajo.ucsd.edu               UC San Diego
 http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
 


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculating Summaries for each level of a Categorical variable

2010-06-26 Thread Christos Argyropoulos

Look at the summary.formula function inside package Hmisc
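
As a base-R alternative, a weighted average per level of R can be sketched with split() and weighted.mean(). This is only an illustration: the count column C is not included in the data posted below, so the C values here are hypothetical, and the name wavg is made up for the example.

```r
RT <- data.frame(
  R = c("R1", "R2", "R3", "R4", "R1", "R3", "R3", "R2"),
  C = c(2, 1, 3, 1, 4, 2, 1, 5),          # hypothetical counts used as weights
  A = c(10, 60, 45, 68, 73, 25, 36, 29),
  T = c(20, 20, 10, 50, 20, 30, 90, 10),
  W = c(20, 50, 20, 20, 40, 10, 20, 30),
  H = c(10, 10, 50, 10, 46, 54, 10, 30)
)

# For each level of R, compute the C-weighted mean of A, T, W and H
wavg <- sapply(split(RT, RT$R), function(g) {
  sapply(g[c("A", "T", "W", "H")], weighted.mean, w = g$C)
})

t(wavg)   # one row per level of R, one column per measure
```

weighted.mean(x, w) computes sum(x * w) / sum(w), which is the same quantity as the WA() function quoted below.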

Christos

 Date: Sat, 26 Jun 2010 05:17:34 -0700
 From: raoul.t.dso...@gmail.com
 To: r-help@r-project.org
 Subject: [R] Calculating Summaries for each level of a Categorical variable
 
 
 Hi,
 
 I have a dataset which has a categorical variable R,a count variable C
 (integer) and 4 or more numeric variables (A,T,W,H - integers) containing
 measures for R. I would like to summarize each level of the variable R by
 the average for A,T,W and H. 
 
 I have written a function to calculate weighted averages using C as the
 weight and this is given below. The function works perfectly but how do I
 add the additional dimension I require to this function?
 
 Dataset: RT=
 R    A   T   W   H
 R1   10  20  20  10
 R2   60  20  50  10
 R3   45  10  20  50
 R4   68  50  20  10
 R1   73  20  40  46
 R3   25  30  10  54
 R3   36  90  20  10
 R2   29  10  30  30
 
 # FUNCTION TO CALCULATE THE WEIGHTED AVERAGE FOR A WEIGHTED BY C
 WA <- function(A,C) {
  sp_A <- c(A %*% C)
  sum_C <- sum(C)
  WA <- sp_A/sum_C
  return(WA)
  }
 
 I am trying to incorporate the additional step of calculating the weighted
 average of A,T,W and H for each level of R. Need help with this.
 
 Thanks in advance!
 Raoul
 -- 
 View this message in context: 
 http://r.789695.n4.nabble.com/Calculating-Summaries-for-each-level-of-a-Categorical-variable-tp2269349p2269349.html
 Sent from the R help mailing list archive at Nabble.com.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] dynamic panelmodel pgmm

2010-06-26 Thread marco schuelke


Hi,

I want to estimate a dynamic panel data model with the following code, but
unfortunately I received the error message below.


form <- PB ~ Activity + Solvency + Cap_Int
dynpanel <- pgmm(dynformula(form, list(1,1,1,1)), data=panel[1:2185,1:37],
                 effect="twoways", model="onestep", index=c("Aktie","Datum"),
                 gmm.inst=~PB, lag.gmm=list(c(2,12)), transformation="ld")
Error in FUN(X[[1L]], ...) : subscript out of bounds

dim(panel)
[1] 34086    37



Best regards, Marco

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] optim() not finding optimal values

2010-06-26 Thread Derek Ogle
I am trying to use optim() to minimize a sum-of-squared deviations function 
based upon four parameters.  The basic function is defined as ...

SPsse <- function(par,B,CPE,SSE.only=TRUE)  {
  n <- length(B)     # get number of years of data
  B0 <- par["B0"]    # isolate B0 parameter
  K <- par["K"]      # isolate K parameter
  q <- par["q"]      # isolate q parameter
  r <- par["r"]      # isolate r parameter
  predB <- numeric(n)
  predB[1] <- B0
  for (i in 2:n) predB[i] <- predB[i-1]+r*predB[i-1]*(1-predB[i-1]/K)-B[i-1]
  predCPE <- q*predB
  sse <- sum((CPE-predCPE)^2)
  if (SSE.only) sse
  else list(sse=sse,predB=predB,predCPE=predCPE)
}

My call to optim() looks like this

# the data
d <- data.frame(
  catch=c(9,113300,155860,181128,198584,198395,139040,109969,71896,59314,62300,65343,76990,88606,118016,108250,108674),
  cpe=c(109.1,112.4,110.5,99.1,84.5,95.7,74.1,70.2,63.1,66.4,60.5,89.9,117.0,93.0,116.6,90.0,105.1))

pars <- c(80,100,0.0001,0.17)                          # put all parameters into one vector
names(pars) <- c("B0","K","q","r")                     # name the parameters
( SPoptim <- optim(pars,SPsse,B=d$catch,CPE=d$cpe) )   # run optim()


This produces parameter estimates, however, that are not at the minimum value
of the SPsse function.  For example, these parameter estimates produce a
smaller SPsse,

parsbox <- c(732506,1160771,0.0001484,0.4049)
names(parsbox) <- c("B0","K","q","r")
( res2 <- SPsse(parsbox,d$catch,d$cpe,SSE.only=FALSE) )

Setting the starting values near the parameters shown in parsbox even resulted
in a movement away from those parameter values (to a larger SSE).

( SPoptim2 <- optim(parsbox,SPsse,B=d$catch,CPE=d$cpe) )   # run optim()


This issue most likely has to do with my lack of understanding of 
optimization routines but I'm thinking that it may have to do with the 
optimization method used, tolerance levels in the optim algorithm, or the shape 
of the surface being minimized.
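
One thing that can be checked along these lines is parameter scaling, since the four parameters differ by many orders of magnitude. The sketch below is only an illustration on a toy objective (f, start, fit_raw and fit_scaled are made-up names and f merely stands in for SPsse); it uses the parscale entry of optim()'s control list, which makes optim() work on par/parscale.

```r
# Toy objective whose two parameters differ by ten orders of magnitude;
# its minimum is at p = c(1e6, 1e-4), where f is 0
f <- function(p) ((p[1] - 1e6) / 1e6)^2 + ((p[2] - 1e-4) / 1e-4)^2
start <- c(5e5, 2e-4)

# default Nelder-Mead on the raw parameters
fit_raw <- optim(start, f)

# same call, but optimizing on the rescaled parameters par/parscale
fit_scaled <- optim(start, f, control = list(parscale = c(1e6, 1e-4)))

rbind(raw    = c(fit_raw$par,    value = fit_raw$value),
      scaled = c(fit_scaled$par, value = fit_scaled$value))
```

If the unscaled fit stops at a clearly larger value than the scaled one, scaling is a likely contributor; whether that is what happens with SPsse would need to be checked on the real data.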

Ultimately I was hoping to provide an alternative method to fisheries 
biologists who use Excel's solver routine.

If anyone can offer any help or insight into my problem here I would be greatly 
appreciative.  Thank you in advance.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Wilcoxon signed rank test and its requirements

2010-06-26 Thread Greg Snow
No, I mean something like this, assuming that the iris dataset contains the full
population and we want to see if setosa have a different mean than the
population (the null would be that there is no difference in sepal width
between species, or that species tells nothing about sepal width):


out1 <- replicate( 10, mean(sample(iris$Sepal.Width, 50)) )
obs1 <- mean( iris$Sepal.Width[1:50] )

hist(out1, xlim=range(out1,obs1))
abline(v=obs1)

mean( out1 > obs1 )


I don't have a reference (other than a text book that defines sampling 
distributions).

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

From: Atte Tenkanen [mailto:atte...@utu.fi]
Sent: Friday, June 25, 2010 10:08 PM
To: Atte Tenkanen
Cc: Greg Snow; David Winsemius; R mailing list
Subject: Re: [R] Wilcoxon signed rank test and its requirements


Atte Tenkanen wrote on 26.6.2010 at 5.15:



Greg Snow wrote on 25.6.2010 at 21.55:


Let me see if I understand.  You actually have the data for the whole 
population (the entire piece) but you have some pre-defined sections that you 
want to see if they differ from the population, or more meaningfully they are 
different from a randomly selected set of measures.  Is that correct?

If so, since you have the entire population of interest you can create the 
actual sampling distribution (or a good approximation of it).  Just take random 
samples from the population of the given size (matching the subset you are 
interested in) and calculate the means (or other value of interest), probably 
10,000 to 1,000,000 samples.  Now compare the value from your predefined subset 
to the set of random values you generated to see if it is in the tail or not.

I check, so you mean doing it this way:

t.test(sample(POPUL, length(SAMPLE), replace = FALSE), mu=mean(SAMPLE),
       alt = "less")

NO, this way:

t.test(POPUL[sample(1:length(POPUL), length(SAMPLE), replace = FALSE)],
       mu=mean(SAMPLE), alt = "less")

Atte



Atte



--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Atte Tenkanen
Sent: Thursday, June 24, 2010 11:04 PM
To: David Winsemius
Cc: R mailing list
Subject: Re: [R] Wilcoxon signed rank test and its requirements

The values come from this kind of process:
The musical composition is segmented into so-called 'pitch-class
segments' and these segments are compared with one reference set with a
distance function. Only some distance values are possible. These
distance values can be averaged over music bars which produces smoother
distribution and the 'comparison curve' that illustrates the distances
according to the reference set through a musical piece result in more
readable curve (see e.g. http://users.utu.fi/attenka/with6.jpg ), but I
would prefer to use original values.

then, I want to pick only some regions from the piece and compare those
values of those regions, whether they are higher than the mean of all
values.

Atte

On Jun 24, 2010, at 6:58 PM, Atte Tenkanen wrote:

Is there anything for me?

There is a lot of data, n=2418, but there are also a lot of ties.
My sample n≈250-300


I do not understand why there should be so many ties. You have not
described the measurement process or units. ( ... although you offer
a glimpse without much background later.)

i would like to test, whether the mean of the sample differ
significantly from the population mean.

Why? What is the purpose of this investigation? Why should the mean
of

a sample be that important?


The histogram of the population looks like in attached histogram,
what test should I use? No choices?

This distribution comes from a musical piece and the values are
'tonal distances'.

http://users.utu.fi/attenka/Hist.png

That picture does not offer much insight into the features of that
measurement. It appears to have much more structure than I would
expect for a sample from a smooth unimodal underlying population.

--
David.


Atte

On 06/24/2010 12:40 PM, David Winsemius wrote:

On Jun 23, 2010, at 9:58 PM, Atte Tenkanen wrote:

Thanks. What I have had to ask is that

how do you test that the data is symmetric enough?
If it is not, is it ok to use some data transformation?

when it is said:

The Wilcoxon signed rank test does not assume that the data are
sampled from a Gaussian distribution. However it does assume
that

the
data are distributed symmetrically around the median. If the
distribution is asymmetrical, the P value will not tell you much

about
whether the median is different than the hypothetical value.

You are being misled. Simply finding a statement on a statistics
software website, even one as reputable as Graphpad (???), does
not
mean
that it is necessarily true. My understanding (confirmed
reviewing
Nonparametric 

Re: [R] boot with strata: strata argument ignored?

2010-06-26 Thread Charles C. Berry

On Sat, 26 Jun 2010, Bryan Hanson wrote:


Thanks Chuck, I understand much better what is going on with your example.
But I'm still uncertain why the b2$t array does not have the dimensions of R
x no. of strata.


Because the test statistic returned by mm() is a scalar. It has nothing to 
do with the use or number of strata.
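
To make this concrete, here is a small sketch using the gravity data from the boot package. It is only an illustration: mm_by is a made-up statistic that returns one median per series, chosen to show that the number of columns of $t follows the length of the statistic's return value.

```r
library(boot)

# scalar statistic: $t has one column, with or without strata
mm <- function(d, i) median(d[i])
b2 <- boot(gravity$g, mm, R = 200, strata = gravity$series)
dim(b2$t)

# vector statistic: one median per series, so $t has 8 columns
mm_by <- function(d, i) as.vector(tapply(d$g[i], d$series[i], median))
b3 <- boot(gravity, mm_by, R = 200, strata = gravity$series)
dim(b3$t)
```

The strata argument only changes how the indices i are resampled (within each series); it does not change the shape of the result.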


Look at what the first case in example( boot ) is doing:


ncol(boot(grav1, diff.means, R=999, stype="f")$t)

[1] 2

ncol(boot(grav1, diff.means, R=999, stype="f",strata=grav1[,1])$t)

[1] 2

diff.means(grav1,1:nrow(grav1))

[1] -4.100549 14.722902




Chuck





Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] optim() not finding optimal values

2010-06-26 Thread Nikhil Kaza
Your function is very irregular, so optim() is likely to return a
local minimum rather than the global minimum.


Try different methods (SANN, CG, BFGS) and see if you get the result
you need. As with all numerical optimisation, I would check the
sensitivity of the results to starting values.
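
A minimal sketch of such a comparison follows. It is only an illustration: the Rosenbrock function fr stands in for SPsse, and the starting values and the names methods, starts and res are made up for the example.

```r
# Toy objective (Rosenbrock function), standing in for SPsse
fr <- function(p) 100 * (p[2] - p[1]^2)^2 + (1 - p[1])^2

methods <- c("Nelder-Mead", "BFGS", "CG")
starts  <- list(c(-1.2, 1), c(2, 2), c(0, 0))

# run every method from every starting point and collect the results
res <- do.call(rbind, lapply(methods, function(m) {
  do.call(rbind, lapply(starts, function(s) {
    fit <- optim(s, fr, method = m)
    data.frame(method = m,
               start = paste(s, collapse = ","),
               value = fit$value,
               convergence = fit$convergence)
  }))
}))

res[order(res$value), ]   # smallest objective value first
```

If the best value changes a lot between rows, the fit is sensitive to the method or the starting point, and a convergence code other than 0 means that run did not report successful convergence.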



Nikhil Kaza
Asst. Professor,
City and Regional Planning
University of North Carolina

nikhil.l...@gmail.com


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Wilcoxon signed rank test and its requirements

2010-06-26 Thread Atte Tenkanen
Thanks! The results were similar to what the t.test p-values show (I have
four samples).
Thank you also for using the replicate function, which I didn't know.
Until now I have just used for-loops, which are not so beautiful... I
don't know about the speed. I'll have to test that.
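
A quick way to check is to time both versions on the same task. This is only a sketch: the number of replications and the names pop, n_rep, out_rep and out_for are made up for the example, and timings will differ between machines.

```r
pop   <- iris$Sepal.Width
n_rep <- 2000

set.seed(1)
t_rep <- system.time(
  out_rep <- replicate(n_rep, mean(sample(pop, 50)))
)

set.seed(1)
t_for <- system.time({
  out_for <- numeric(n_rep)              # preallocate the result vector
  for (k in seq_len(n_rep)) out_for[k] <- mean(sample(pop, 50))
})

all.equal(out_rep, out_for)              # same seed, same draws
c(replicate = t_rep[["elapsed"]], for_loop = t_for[["elapsed"]])
```

Both versions evaluate the same expression the same number of times, so any difference in elapsed time comes from loop overhead rather than from the sampling itself.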

Atte

Greg Snow wrote on 26.6.2010 at 23.30:

 No, I mean something like this, assuming that the iris dataset
 contains the full population and we want to see if setosa have a
 different mean than the population (the null would be that there is
 no difference in sepal width between species, or that species tells
 nothing about sepal width):


 out1 <- replicate( 10, mean(sample(iris$Sepal.Width, 50)) )
 obs1 <- mean( iris$Sepal.Width[1:50] )

 hist(out1, xlim=range(out1,obs1))
 abline(v=obs1)

 mean( out1 > obs1 )


 I don't have a reference (other than a text book that defines
 sampling distributions).

 --
 Gregory (Greg) L. Snow Ph.D.
 Statistical Data Center
 Intermountain Healthcare
 greg.s...@imail.org
 801.408.8111

 From: Atte Tenkanen [mailto:atte...@utu.fi]
 Sent: Friday, June 25, 2010 10:08 PM
 To: Atte Tenkanen
 Cc: Greg Snow; David Winsemius; R mailing list
 Subject: Re: [R] Wilcoxon signed rank test and its requirements


 Atte Tenkanen wrote on 26.6.2010 at 5.15:



 Greg Snow wrote on 25.6.2010 at 21.55:


 Let me see if I understand.  You actually have the data for the  
 whole population (the entire piece) but you have some pre-defined  
 sections that you want to see if they differ from the population,  
 or more meaningfully they are different from a randomly selected  
 set of measures.  Is that correct?

 If so, since you have the entire population of interest you can  
 create the actual sampling distribution (or a good approximation of  
 it).  Just take random samples from the population of the given  
 size (matching the subset you are interested in) and calculate the  
 means (or other value of interest), probably 10,000 to 1,000,000  
 samples.  Now compare the value from your predefined subset to the  
 set of random values you generated to see if it is in the tail or not.

 Let me check: so you mean doing it this way:

 t.test(sample(POPUL, length(SAMPLE), replace = FALSE), mu = mean(SAMPLE),
        alt = "less")

 NO, this way:

 t.test(POPUL[sample(1:length(POPUL), length(SAMPLE), replace = FALSE)],
        mu = mean(SAMPLE), alt = "less")

 Atte



 Atte



 -- 
 Gregory (Greg) L. Snow Ph.D.
 Statistical Data Center
 Intermountain Healthcare
 greg.s...@imail.org
 801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of Atte Tenkanen
 Sent: Thursday, June 24, 2010 11:04 PM
 To: David Winsemius
 Cc: R mailing list
 Subject: Re: [R] Wilcoxon signed rank test and its requirements

 The values come from this kind of process:
 The musical composition is segmented into so-called 'pitch-class
 segments' and these segments are compared with one reference set using a
 distance function. Only some distance values are possible. These
 distance values can be averaged over music bars, which produces a smoother
 distribution, and the 'comparison curve' that illustrates the distances
 to the reference set through a musical piece becomes more
 readable (see e.g. http://users.utu.fi/attenka/with6.jpg ), but I
 would prefer to use the original values.

 Then I want to pick only some regions from the piece and test whether
 the values of those regions are higher than the mean of all
 values.

 Atte

 On Jun 24, 2010, at 6:58 PM, Atte Tenkanen wrote:

 Is there anything for me?

 There is a lot of data, n=2418, but there are also a lot of ties.
 My sample n≈250-300


 I do not understand why there should be so many ties. You have not
 described the measurement process or units. ( ... although you offer
 a glimpse without much background later.)

 I would like to test whether the mean of the sample differs
 significantly from the population mean.

 Why? What is the purpose of this investigation? Why should the mean
 of a sample be that important?


 The histogram of the population looks like in attached histogram,
 what test should I use? No choices?

 This distribution comes from a musical piece and the values are
 'tonal distances'.

 http://users.utu.fi/attenka/Hist.png

 That picture does not offer much insight into the features of that
 measurement. It appears to have much more structure than I would
 expect for a sample from a smooth unimodal underlying population.

 --
 David.


 Atte

 On 06/24/2010 12:40 PM, David Winsemius wrote:

 On Jun 23, 2010, at 9:58 PM, Atte Tenkanen wrote:

 Thanks. What I have had to ask is that

 how do you test that the data is symmetric enough?
 If it is not, is it ok to use some data transformation?

 when it is said:

 The Wilcoxon signed rank test does not assume that the data are
 sampled from a Gaussian distribution. However it does assume
 that the data are

Re: [R] subset arg in subset(). was: converting result of substitute to 'ordidnary' expression

2010-06-26 Thread Vadim Ogranovich
It does work, thank you, but the literal 5 < x now needs to be quoted by
expression():

> do.call(subset, list(dat, expression(5 < x)))
    x  y
6   6  6
7   7  7
8   8  8
9   9  9
10 10 10

This is ok, but the standard subset(dat, 5 < x) looks more readable.

Anyway, thank you for your help, it's a nice paradigm.
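
For reference, a minimal sketch of why the variants in this thread behave differently. It assumes only base R and the dat and subsetexp objects from the original post; the res_* names are made up for illustration. subset() captures its second argument unevaluated and then evaluates it inside the data frame, so a bare variable that holds an expression evaluates to the expression object itself rather than to a logical vector.

```r
dat <- data.frame(x = 1:10, y = 1:10)
subsetexp <- expression(5 < x)

# Literal condition: captured by subset() and evaluated within dat
res_lit <- subset(dat, 5 < x)

# eval() is itself evaluated within dat, so the stored expression sees column x
res_eval <- subset(dat, eval(subsetexp))

# do.call() passes the expression object as the argument value, so what
# subset() captures is the expression itself, which it then evaluates within dat
res_call <- do.call(subset, list(dat, subsetexp))

# The bare name evaluates to an expression object, not a logical vector,
# so subset() stops with an error
res_bare <- try(subset(dat, subsetexp), silent = TRUE)
```

All three working forms should select the same rows; which one to use is mostly a matter of whether the condition is known when the code is written or is built at run time.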

Vadim

-Original Message-
From: bill.venab...@csiro.au [mailto:bill.venab...@csiro.au]
Sent: Saturday, June 26, 2010 1:08 AM
To: Vadim Ogranovich; r-help@r-project.org
Subject: RE: [R] subset arg in subset(). was: converting result of substitute 
to 'ordidnary' expression

Here is another one that works:

> do.call(subset, list(dat, subsetexp))
    x  y
6   6  6
7   7  7
8   8  8
9   9  9
10 10 10




-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Vadim Ogranovich
Sent: Saturday, 26 June 2010 11:13 AM
To: 'r-help@r-project.org'
Subject: [R] subset arg in subset(). was: converting result of substitute to 
'ordidnary' expression

Dear R users,

Please disregard my previous post "converting result of substitute to
'ordidnary' expression". The problem I have has nothing to do with substitute.

Consider:

> dat <- data.frame(x=1:10, y=1:10)

> subsetexp <- expression(5 < x)

> ## this does work
> subset(dat, eval(subsetexp))
    x  y
6   6  6
7   7  7
8   8  8
9   9  9
10 10 10

> ## and so does this
> subset(dat, 5 < x)
    x  y
6   6  6
7   7  7
8   8  8
9   9  9
10 10 10

> ## but this doesn't work
> subset(dat, subsetexp)
Error in subset.data.frame(dat, subsetexp) :
  'subset' must evaluate to logical

Why did the last expression fail, and why did it work with eval()?

Thank you very much for your help,
Vadim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



[R] package(pls) - extracting explained Y-variance

2010-06-26 Thread Christian Jebsen


Dear R-help users,

I'd like to use the R-package pls and want to extract the explained  
Y-variance to identify the important (PLS-) principal components in my  
model, related to the y-data. For explained X-variance there is a  
function: explvar(). If I understand it right, the summary()  
function gives an overview, where the y-variance is shown, but I can't  
extract it for plotting.


How can I do it, without pencil and paper?

Thank you very much for help,
Christian

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Different standard errors from R and other software

2010-06-26 Thread Joris Meys
If I understand correctly from their website, discrete choice models
are mostly generalized linear models with the common link functions
for discrete data? Apart from a few names I didn't recognize, all
analyses seem quite standard to me. So I wonder why you would write
the log-likelihood yourself for techniques that are implemented in R.

Unless I missed something pretty important, or you want to do a
specific analysis that wasn't clear to me, you should take a closer
look at the possibilities in R for generalized linear (mixed)
modelling and so on.

Binary choice translates to a simple glm with a logit link function.
Multinomial choice can be done with e.g. multinom() from nnet. Ordered
choice can be done with polr() from the MASS package. Also worth a
look are the packages mgcv, or gamm4 in case of big datasets. They
offer very flexible models that can include random terms, specific
variance-covariance structures and non-linear relations in the form of
splines.

Apologies if this is all obvious and known to you. In that case you
might want to specify what exactly it is you are comparing and how
exactly you calculated it yourself.
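
As a rough sketch of the point above, the following fits a binary logit with glm() and compares its standard errors with those obtained from a hand-written log-likelihood. The mtcars data and all object names are stand-ins chosen for illustration, not the original poster's model, and optim() is used here in place of maxBFGS.

```r
# Binary logit via glm(); mtcars is only a stand-in dataset
fit <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))
se_glm <- sqrt(diag(vcov(fit)))

# The same model with a hand-written negative log-likelihood and gradient
X <- cbind(1, mtcars$wt)
y <- mtcars$am
negll <- function(beta) {
  eta <- drop(X %*% beta)
  -sum(y * eta - log1p(exp(eta)))
}
neggr <- function(beta) {
  eta <- drop(X %*% beta)
  -drop(crossprod(X, y - plogis(eta)))
}
opt <- optim(c(0, 0), negll, neggr, method = "BFGS", hessian = TRUE)

# Standard errors from the inverse Hessian of the negative log-likelihood
se_opt <- sqrt(diag(solve(opt$hessian)))

rbind(glm = unname(se_glm), optim = se_opt)
```

If the two rows disagree noticeably in a real application, likely suspects are the sign convention of the objective (log-likelihood versus its negative), a numerically approximated Hessian, or the other program reporting outer-product-of-gradients or robust standard errors.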

Cheers
Joris

On Fri, Jun 25, 2010 at 11:47 PM, Min Chen chenmin0...@gmail.com wrote:
 Hi all,

    Sorry to bother you. I'm estimating a discrete choice model in R using
 the maxBFGS command. Since I wrote the log-likelihood myself, in order to
 double check, I run the same model in Limdep. It turns out that the
 coefficient estimates are quite close; however, the standard errors are very
 different. I also computed the hessian and outer product of the gradients in
 R using the numDeriv package, but the results are still very different from
 those in Limdep. Is it the routine to compute the inverse hessian that
 causes the difference? Thank you very much!

     Best wishes.


 Min


 --
 Min Chen
 Ph.D. Candidate
 Department of Agricultural, Food, and Resource Economics
 125 Cook Hall
 Michigan State University

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] use a data frame whose name is stored as a string variable?

2010-06-26 Thread Seth

Thanks!  Works like a charm.  -Seth
-- 
View this message in context: 
http://r.789695.n4.nabble.com/use-a-data-frame-whose-name-is-stored-as-a-string-variable-tp2269095p2269732.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Recursive indexing failed at level 2

2010-06-26 Thread Bill.Venables
Why do you use *double* square brackets on the left side of the replacement?

From the help info for "[[":

"The most important distinction between [, [[ and $ is that the [ can select
more than one element whereas the other two select a single element."

You seem to be selecting 20 elements.
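
A small sketch of the distinction, using made-up stand-ins for the poster's objects (pulse_subset is re-created here as an empty list, and new_values and nested are hypothetical):

```r
# Hypothetical stand-ins for the poster's objects
pulse_subset <- vector("list", 20)
new_values <- (1:20) / 10          # 20 made-up values

# Single brackets can replace many elements at once
pulse_subset[1:20] <- new_values

# Double brackets address ONE element; a vector index is applied recursively,
# so nested[[c(1, 2)]] means nested[[1]][[2]]
nested <- list(list("a", "b"), list("c"))
second_of_first <- nested[[c(1, 2)]]
```

This is presumably why a 20-element index inside [[ ]] is read as 20 levels of nesting rather than as 20 positions.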

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Jim Hargreaves
Sent: Saturday, 26 June 2010 9:54 PM
To: r-help@r-project.org
Subject: [R] Recursive indexing failed at level 2

Dear fellow R users,

I am replacing elements of a list like so:

pulse_subset[[1:20]]=unlist(pulse[i])[1:20]

where pulse is a list of lists, and pulse[i] has 20 values.

This gives the error "Recursive indexing failed at level 2". But,
interestingly, this instruction is part of a loop which has gone through
about 200,000 iterations before giving this error.

Actual code:

pulse_subset[[1:(length(unlist(pulse[i])[as.numeric(peak_start[i]):as.numeric(peak_end[i])]))]] <-
  unlist(pulse[i])[as.numeric(peak_start[i]):as.numeric(peak_end[i])]

Error in
pulse_subset[[1:(length(unlist(pulse[i])[as.numeric(peak_start[i]):as.numeric(peak_end[i])]))]] <-
  unlist(pulse[i])[as.numeric(peak_start[i]):as.numeric(peak_end[i])] :
   recursive indexing failed at level 2

If anyone could shed some light I'd be rather grateful.

Regards,
Jim Hargreaves

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Wilcoxon signed rank test and its requirements

2010-06-26 Thread Daniel Malter

Atte, note the similarity between what Greg described and a bootstrap. The
difference from a true bootstrap is that in Greg's version you subsample the
population (or in other instances the data). This is known as the subsampling
bootstrap and is discussed in Politis, Romano, and Wolf (1999).
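
To make the distinction concrete, here is a minimal sketch that reuses the iris setup from Greg's example. The seed, the 2000 replications and the object names are arbitrary choices for illustration.

```r
set.seed(42)                       # arbitrary seed, for reproducibility only
pop <- iris$Sepal.Width            # treated as the whole population
obs <- mean(pop[1:50])             # mean of the predefined subset (setosa)

# Subsampling: draws of size m < N WITHOUT replacement
sub_means <- replicate(2000, mean(sample(pop, 50, replace = FALSE)))

# Classical bootstrap: draws of size N WITH replacement
boot_means <- replicate(2000, mean(sample(pop, length(pop), replace = TRUE)))

# Proportion of subsample means at least as large as the observed subset mean
p_upper <- mean(sub_means >= obs)
```

Only the subsample means are compared with the observed value here, since they match the size of the predefined subset; the bootstrap means are shown for contrast.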

HTH,
Daniel
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Wilcoxon-signed-rank-test-and-its-requirements-tp2266165p2269775.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Ways to work with R and Postgres

2010-06-26 Thread 顾小波
Hi,

I am posting this message to the general r-help list hoping that someone in the
wider audience has suggestions.

There are three ways to integrate R and Postgres, especially on the 64-bit
Microsoft Windows platform:

1. via the RODBC package, which has 32-bit and 64-bit versions for Windows

2. via the RPostgres interface, which currently only has a 32-bit version

3. via plr for Greenplum, which only supports a few kinds of functionality, and
supports only specific versions of R.

Do you have any idea about the advantages and disadvantages of each, and the
differences among them?

Yours sincerely,

Xiaobo.Gu


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] optim() not finding optimal values

2010-06-26 Thread Ravi Varadhan
Derek,

The problem is that your function is poorly scaled.   You can see that the 
parameters vary over 10 orders of magnitude (from 1e-04 to 1e06).   You can get 
good convergence once you properly scale your function.  Here is how you do it:

par.scale <- c(1.e06, 1.e06, 1.e-06, 1.0)

SPoptim <- optim(pars, SPsse, B=d$catch, CPE=d$cpe, control=list(maxit=1500,
parscale=par.scale))

> SPoptim
$par
          B0            K            q            r 
7.329553e+05 1.160097e+06 1.484375e-04 4.050476e-01 

$value
[1] 1619.487

$counts
function gradient 
    1401       NA 

$convergence
[1] 0

$message
NULL


Hope this helps,
Ravi.



Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvarad...@jhmi.edu


- Original Message -
From: Derek Ogle do...@northland.edu
Date: Saturday, June 26, 2010 4:28 pm
Subject: [R] optim() not finding optimal values
To: R (r-help@R-project.org) r-help@r-project.org


 I am trying to use optim() to minimize a sum-of-squared deviations
 function based upon four parameters.  The basic function is defined as
 ...

  SPsse <- function(par,B,CPE,SSE.only=TRUE)  {
    n <- length(B)                             # get number of years of data
    B0 <- par["B0"]                            # isolate B0 parameter
    K <- par["K"]                              # isolate K parameter
    q <- par["q"]                              # isolate q parameter
    r <- par["r"]                              # isolate r parameter
    predB <- numeric(n)
    predB[1] <- B0
    for (i in 2:n) predB[i] <- predB[i-1]+r*predB[i-1]*(1-predB[i-1]/K)-B[i-1]
    predCPE <- q*predB
    sse <- sum((CPE-predCPE)^2)
    if (SSE.only) sse
      else list(sse=sse,predB=predB,predCPE=predCPE)
  }

  My call to optim() looks like this

  # the data
  d <- data.frame(catch=
    c(9,113300,155860,181128,198584,198395,139040,109969,71896,59314,62300,65343,76990,88606,118016,108250,108674),
    cpe=c(109.1,112.4,110.5,99.1,84.5,95.7,74.1,70.2,63.1,66.4,60.5,89.9,117.0,93.0,116.6,90.0,105.1))

  pars <- c(80,100,0.0001,0.17)                  # put all parameters into one vector
  names(pars) <- c("B0","K","q","r")              # name the parameters
  ( SPoptim <- optim(pars,SPsse,B=d$catch,CPE=d$cpe) )    # run optim()


  This produces parameter estimates, however, that are not at the
 minimum value of the SPsse function.  For example, these parameter
 estimates produce a smaller SPsse,

  parsbox <- c(732506,1160771,0.0001484,0.4049)
  names(parsbox) <- c("B0","K","q","r")
  ( res2 <- SPsse(parsbox,d$catch,d$cpe,SSE.only=FALSE) )

  Setting the starting values near the parameters shown in parsbox even
 resulted in a movement away from those parameter values (to a larger SSE).

  ( SPoptim2 <- optim(parsbox,SPsse,B=d$catch,CPE=d$cpe) )    # run optim()


  This issue most likely has to do with my lack of understanding of
 optimization routines but I'm thinking that it may have to do with the
 optimization method used, tolerance levels in the optim algorithm, or
 the shape of the surface being minimized.

  Ultimately I was hoping to provide an alternative method to fisheries
 biologists who use Excel's solver routine.

  If anyone can offer any help or insight into my problem here I would
 be greatly appreciative.  Thank you in advance.
  

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Ways to work with R and Postgres

2010-06-26 Thread Gabor Grothendieck
2010/6/27 顾小波 guxiaobo1...@gmail.com:
 Hi,

 I am posting this message to the general r-help list hoping that someone in the
 wider audience has suggestions.

 There are three ways to integrate R and Postgres, especially on the 64-bit
 Microsoft Windows platform:

 1. via the RODBC package, which has 32-bit and 64-bit versions for Windows

 2. via the RPostgres interface, which currently only has a 32-bit version

 3. via plr for Greenplum, which only supports a few kinds of functionality,
 and supports only specific versions of R.

 Do you have any idea about the advantages and disadvantages of each, and the
 differences among them?


There is also the RpgSQL package.  In addition the sqldf package uses
RpgSQL.  sqldf by default uses SQLite but if the RpgSQL package is
loaded then it defaults to PostgreSQL.  Here BOD is a built-in R
data.frame:

> library(sqldf)
Loading required package: DBI
Loading required package: RSQLite
Loading required package: RSQLite.extfuns
Loading required package: gsubfn
Loading required package: proto
Loading required package: chron
> library(RpgSQL)
Loading required package: RJDBC
> BOD
  Time demand
1    1    8.3
2    2   10.3
3    3   19.0
4    4   16.0
5    5   15.6
6    7   19.8
> sqldf('select regr_slope(demand, Time) slope,
+ regr_intercept(demand, Time) intercept,
+ corr(demand, Time) corr from BOD')
Loading required package: tcltk
Loading Tcl/Tk interface ... done
     slope intercept      corr
1 1.721429  8.521429 0.8030693

> coef(lm(demand ~ Time, BOD)); cor(BOD$Time, BOD$demand)
(Intercept)        Time 
   8.521429    1.721429 
[1] 0.8030693

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] optim() not finding optimal values

2010-06-26 Thread Ravi Varadhan
A slightly better scaling is the following:

par.scale <- c(1.e06, 1.e06, 1.e-05, 1)  # q is scaled differently

> SPoptim <- optim(pars, SPsse, B=d$catch, CPE=d$cpe, control=list(maxit=1500,
+ parscale=par.scale))
> SPoptim
$par
          B0            K            q            r 
7.320899e+05 1.159939e+06 1.485560e-04 4.051735e-01 

$value
[1] 1619.482

$counts
function gradient 
     585       NA 

$convergence
[1] 0

$message
NULL


Note that Nelder-Mead now converges in fewer than half as many function
evaluations (585 versus 1401) as under the previous scaling.
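
For readers without the original data, here is a self-contained toy sketch of the same idea. The objective f, the target values and the starting point are all made up for illustration; only optim() and its maxit and parscale control settings come from the messages above.

```r
# Hypothetical objective: squared relative distance to a made-up optimum
# whose two parameters differ by about ten orders of magnitude
target <- c(a = 7e5, b = 1.5e-4)
f <- function(p) sum(((p - target) / target)^2)

start <- c(a = 1e6, b = 1e-4)

# Unscaled: Nelder-Mead works directly on the raw parameters
fit_raw <- optim(start, f, control = list(maxit = 1500))

# parscale gives optim() the typical size of each parameter, so the
# search is carried out on par / parscale, where both are of order 1
fit_scaled <- optim(start, f,
                    control = list(maxit = 1500, parscale = c(1e6, 1e-4)))

c(raw = fit_raw$value, scaled = fit_scaled$value)
```

The parscale entries only need to be the right order of magnitude; they are not estimates of the solution.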

Ravi. 


Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvarad...@jhmi.edu


- Original Message -
From: Ravi Varadhan rvarad...@jhmi.edu
Date: Sunday, June 27, 2010 0:42 am
Subject: Re: [R] optim() not finding optimal values
To: Derek Ogle do...@northland.edu
Cc: R (r-help@R-project.org) r-help@r-project.org



Re: [R] Recoding dates to session id in a longitudinal dataset

2010-06-26 Thread John-Paul Bogers
-- Forwarded message --
From: John-Paul Bogers john-paul.bog...@ua.ac.be
Date: Sat, Jun 26, 2010 at 10:14 PM
Subject: Re: [R] Recoding dates to session id in a longitudinal dataset
To: jim holtman jholt...@gmail.com


Dear Jim,

The data concern HPV screening.
The data look as follows:
pat1 sampledate1 HPV16 0.3
pat2 sampledate2 HPV16 0
pat3 sampledate3 HPV16 0.5
pat1 sampledate4 HPV16 0.6
pat4 sampledate5 HPV16 0
pat2 sampledate6 HPV16 0
pat1 sampledate7 HPV16 0

What I would like is

pat1 1  HPV16 0.3
pat2 1  HPV16 0
pat3 1  HPV16 0.5
pat1 2  HPV16 0.6
pat4 1  HPV16 0
pat2 2  HPV16 0
pat1 3  HPV16 0

I would like to recode sampledate (a real date, in date format) to a session
sequence number (first sample of this patient, second sample of this patient, ...)

I hope this makes it clear.
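
One possible sketch in base R, assuming a data frame laid out as in the example above. The data frame name screen, the column names and the dates are invented for illustration.

```r
# Hypothetical data in the layout described above
screen <- data.frame(
  pat   = c("pat1", "pat2", "pat3", "pat1", "pat4", "pat2", "pat1"),
  date  = as.Date("2010-01-01") + c(0, 3, 5, 30, 31, 60, 90),
  HPV16 = c(0.3, 0, 0.5, 0.6, 0, 0, 0)
)

# Within each patient, rank the sample dates: 1 = first visit, 2 = second, ...
screen$session <- ave(as.numeric(screen$date), screen$pat,
                      FUN = function(d) rank(d, ties.method = "first"))

screen[, c("pat", "session", "HPV16")]
```

ave() returns the result in the original row order, so the rows do not need to be sorted by patient or date first. The first visit of every patient can then be selected with screen[screen$session == 1, ].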

Thanks

JP

PS: I answered this as a reply to your private mail, how do I get this on
the mailinglist?
 On Sat, Jun 26, 2010 at 7:59 PM, jim holtman jholt...@gmail.com wrote:

 It would be useful if you could provide an example of what the data
 looks like now and what you would like it to look like; otherwise it
 is impossible to help.

 On Sat, Jun 26, 2010 at 8:37 AM, John-Paul Bogers
 john-paul.bog...@ua.ac.be wrote:
  Hi,
 
  I'm fairly new to R but I have a large dataset (30 obs) containing
  patient material. Some patients came 2-9 times during the three year
  observation period. The patients are identified by a unique idnr, the
  sessions can be distinguished using the session date. How can I recode
  the date of the session to a session id (1-9)? This would be necessary to
  obtain information and do some analysis on the first occurrence of a specific
  patient or to look for trends.
 
  Thanks
 
  JP Bogers
  University of Antwerp
 
 [[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 



 --
 Jim Holtman
 Cincinnati, OH
 +1 513 646 9390

 What is the problem that you are trying to solve?


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.