Re: [R] cuhre usage ?? multidimensional integration

2011-10-09 Thread sevenfrost

I'm not very familiar with R. Could you recommend some relevant material?

Actually, I don't intend to loop from 1 to 1; I was just trying to see if it
works. It's supposed to be some len1.

I'm trying to solve the following example: 

There are 3 types of variables (x, w, t); they are not independent and have
different distributions.
Within each type there is more than one variable. I need to integrate over all
the variables.

The joint pdf is of the form:

f(x,w,t) = prod_i [ f1(w_i) f2(t_i) * ( prod_j f3(x_j) * prod_j f4(x_j*w_i - t_i) ) ]
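As a sketch (not the poster's actual model): an integrand of this product form
can be written as a single R function of the concatenated coordinate vector and
handed to cuhre() from the R2Cuba package. The densities f1..f4 below are
placeholders, and the finite box truncates any infinite ranges; argument and
component names follow R2Cuba's ?cuhre, so double-check against your installed
version.

```r
library(R2Cuba)  # provides cuhre() for multidimensional cubature

## Placeholder densities -- substitute the real f1..f4 here
f1 <- dnorm; f2 <- dnorm; f3 <- dnorm; f4 <- dnorm

## One w, one t, one x  =>  a 3-dimensional integral over v = c(w, t, x)
integrand <- function(v, ...) {
  w <- v[1]; t <- v[2]; x <- v[3]
  f1(w) * f2(t) * f3(x) * f4(x * w - t)
}

## Integrate over a finite box approximating the infinite range
res <- cuhre(ndim = 3, ncomp = 1, integrand,
             lower = rep(-4, 3), upper = rep(4, 3))
res$value
```

With more x's (or more (w, t) pairs), ndim grows accordingly and the products
inside integrand become prod() calls over the relevant slices of v.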


--
View this message in context: 
http://r.789695.n4.nabble.com/cuhre-usage-multidimensional-integration-tp3873478p3886287.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] The Sets package and element equality test

2011-10-09 Thread David Meyer


By going through the code, I came to a similar conclusion: it
seems the match.2.list function must be modified as follows
to make it work:


match.2.list <- function(l1, l2) {
  if (class(l1) == "element") {
    l1 <- l1[[1]]
    l2 <- l2[[1]]
  }
  length(l1) == length(l2)
}

because depending on whether it is called from the constructor or from
the %e% function, the assumptions about the input types are not the same.

No, this is not the problem; it's really the hashing code which
makes this fail. If you turn hashing off, then the original match.2.list
function will work as expected:


> sets_options("hash", FALSE)
> match.2.list <- function(l1,l2){
+  length(l1)==length(l2)
+ }
> s <- set(list(1,2), list(3,4))
> lset <- cset(s, matchfun = matchfun(match.2.list))
> lset
{list(2)}

> list(1,8) %e% lset
[1] TRUE



And also, if instead of working with standard objects I want to
work with S4 / refClass objects, I cannot build cset objects, because
I have to go first through the set construction, which prompts an error:


Currently, S4 objects are not supported at all as set elements. I will
have a look at this.


Best
David



Error in as.vector(x, "character") :
  cannot coerce type 'environment' to vector of type 'character'


So far, I don't know how to work around this latter issue.

Thanks again for the package and your help.

Regards

Johnny




On Sat, Oct 8, 2011 at 2:40 PM, David Meyer mey...@technikum-wien.at wrote:


Dear Johnny,

this is a bug in the hashing-code of sets. Use:


sets_options("hash", FALSE)
lset <- cset(s, matchfun = matchfun(match.2.list))

which will work.

Thanks for pointing this out!

David



Hi,

I tried to use the sets package yesterday, which seems to be a
very powerful package: thanks to the developers. I wanted to define
sets of lists where 2 lists are considered equal if they have the
same length. So, I implemented:

match.2.list <- function(l1, l2) {
  length(l1) == length(l2)
}

and then defined my cset as:

s <- set(list(1,2), list(3,4))
lset <- cset(s, matchfun(match.2.list))

so if I now do:
y <- list(3,4)
y %e% lset

I get the correct answer, which is TRUE.

But if I do:
x <- list(1,8)
x %e% lset

I now get FALSE, even though x is a list of length 2, and should
thus match any of the 2 lists in lset.

I must be doing something wrong; I checked with the doc, but I don't
understand.

-- 
Priv.-Doz. Dr. David Meyer

Institut für Wirtschaftsinformatik

Fachhochschule Technikum Wien
Höchstädtplatz 5, 1200 Wien
T: +43 1 333 40 77-394
F: +43 1 333 40 77-99 394
E: david.me...@technikum-wien.at
I: www.technikum-wien.at




--
Priv.-Doz. Dr. David Meyer
Institut für Wirtschaftsinformatik

Fachhochschule Technikum Wien
Höchstädtplatz 5, 1200 Wien
T: +43 1 333 40 77-394
F: +43 1 333 40 77-99 394
E: david.me...@technikum-wien.at
I: www.technikum-wien.at



[R] HWEBayes, swapping the homozygotes genotype frequencies

2011-10-09 Thread stat999
I evaluated the Bayes factor in the k=2 allele case with a triangular
prior under the null, as in the example in the help file:

> HWETriangBF2(nvec=c(88,10,2))
[1] 0.4580336

When I swap the n11 entry and n22 entry of nvec, I get a totally
different Bayes factor:

> HWETriangBF2(nvec=c(2,10,88))
[1] 5.710153
 

In my understanding, labeling the genotype frequencies as n11 or n22 is
arbitrary, so I was expecting the same value of the Bayes factor.

This is the case for the conjugate Dirichlet prior:

> DirichNormHWE(nvec=c(88,10,2), c(1,1))/DirichNormSat(nvec=c(88,10,2), c(1,1,1))
[1] 1.542047
> DirichNormHWE(nvec=c(2,10,88), c(1,1))/DirichNormSat(nvec=c(2,10,88), c(1,1,1))
[1] 1.542047

Could you explain why HWETriangBF2 is returning completely different
values of the Bayes factor?


--
View this message in context: 
http://r.789695.n4.nabble.com/HWEBayes-swapping-the-homozygotes-genotype-frequencies-tp3886313p3886313.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] round() and negative digits

2011-10-09 Thread Rolf Turner

On 09/10/11 00:18, Duncan Murdoch wrote:

On 11-10-07 5:26 PM, Carl Witthoft wrote:

Just wondering here -- I tested and found to my delight that

%  round(325.4,-2)
[1] 300

gave me exactly what I would have expected (and wanted).  Since it's not
explicitly mentioned in the documentation that negative 'digits' is
allowed, I just wanted to ask whether this behavior is intentional or a
happy turn of events.  I'm always paranoid that something not explicitly
documented might disappear in future revisions.



It is intentional, and one of the regression tests confirms that it's 
there, so it won't disappear by mistake, and would be very unlikely to 
disappear intentionally.


Uh, wouldn't it be *nice* to mention this --- not completely obvious ---
capability in the help file?

cheers,

Rolf



Re: [R] round() and negative digits

2011-10-09 Thread Rolf Turner

On 09/10/11 10:39, Duncan Murdoch wrote:

On 11-10-08 5:32 PM, Rolf Turner wrote:

On 09/10/11 00:18, Duncan Murdoch wrote:

On 11-10-07 5:26 PM, Carl Witthoft wrote:

Just wondering here -- I tested and found to my delight that

%  round(325.4,-2)
[1] 300

gave me exactly what I would have expected (and wanted).  Since 
it's not

explicitly mentioned in the documentation that negative 'digits' is
allowed, I just wanted to ask whether this behavior is intentional 
or a
happy turn of events.  I'm always paranoid that something not 
explicitly

documented might disappear in future revisions.



It is intentional, and one of the regression tests confirms that it's
there, so it won't disappear by mistake, and would be very unlikely to
disappear intentionally.


Uh, wouldn't it be *nice* to mention this --- not completely obvious ---
capability in the help file?


If we told you all of R's secrets, we'd have to kill you.


Fortune nomination?

cheers,

Rolf



[R] help with using last observation carried forward analysis for a clinical trial please

2011-10-09 Thread maspitze
Hi,
I have a series of id's with multiple visits and questionnaire scores.  This
is a clinical trial that will be analyzed using the last observation carried
forward method.  In other words, in order to comply with intent to treat
analysis when many subjects withdraw, data points for the last visit must be
generated and filled in with the last observation.  The ultimate goal is to
tabulate the difference in questionnaires between the start of the trial and
the end of the trial.  I am lost about how to do this.
Each subject had multiple visits, up to 13.  In general, they took a
questionnaire at each visit.  However, if a questionnaire was not completed
or the visit is missing, the data point does not exist.  To explain, I
created a table as analogy.

My goal is to take something that looks like the following:

ID  Visit   score
1   1   10
2   1   12
2   3   15
3   1   1
3   2   6
4   1   16
4   2   1
4   3   7
4   4   17

I think I then need to change it to this in order to perform LOCF in zoo:
ID  Visit   score
1   1   10
1   2   NA
1   3   NA
1   4   NA
2   1   12
2   2   NA
2   3   15
2   4   NA
3   1   1
3   2   6
3   3   NA
3   4   NA
4   1   16
4   2   1
4   3   7
4   4   17

then change to:
ID  Visit   score
1   1   10
1   2   10
1   3   10
1   4   10
2   1   12
2   2   12
2   3   15
2   4   15
3   1   1
3   2   6
3   3   6
3   4   6
4   1   16
4   2   1
4   3   7
4   4   17

I would then like to take visit 4 and subtract visit 1 to create the
difference in the questionnaire scores during the clinical trial.  I will
then compare this score by t.test between placebo and drug groups.

Would anyone please have some guidance about how to do this in R?  I would
be grateful for assistance.
Regards, Matt


--
View this message in context: 
http://r.789695.n4.nabble.com/help-with-using-last-observation-carried-forward-analysis-for-a-clinical-trial-please-tp3886396p3886396.html
Sent from the R help mailing list archive at Nabble.com.



[R] R-help tranform data

2011-10-09 Thread ricardosousa2000



Re: [R] Permutation or Bootstrap to obtain p-value for one sample

2011-10-09 Thread peter dalgaard

On Oct 8, 2011, at 16:04 , francy wrote:

 Hi, 
 
 I am having trouble understanding how to approach a simulation:
 
 I have a sample of n=250 from a population of N=2,000 individuals, and I
 would like to use either permutation test or bootstrap to test whether this
 particular sample is significantly different from the values of any other
 random samples of the same population. I thought I needed to take random
 samples (but I am not sure how many simulations I need to do) of n=250 from
 the N=2,000 population and maybe do a one-sample t-test to compare the mean
 score of all the simulated samples, + the one sample I am trying to prove
 that is different from any others, to the mean value of the population. But
 I don't know:
 (1) whether this one-sample t-test would be the right way to do it, and how
 to go about doing this in R
 (2) whether a permutation test or bootstrap methods are more appropriate
 
 This is the data frame that I have, which is to be sampled:
 df <-
 i.e.
 x y
 1 2
 3 4
 5 6
 7 8
 . .
 . .
 . .
 2,000
 
 I have this sample from df, and would like to test whether it has extreme
 values of y. 
 sample1 <-
 i.e.
 x y
 3 4
 7 8
 . .
 . .
 . .
 250
 
 For now I only have this: 
 
 R=999 #Number of simulations, but I don't know how many...
 t.values =numeric(R)   #creates a numeric vector with 999 elements, which
 will hold the results of each simulation. 
 for (i in 1:R) {
 sample1 <- df[sample(nrow(df), 250, replace=TRUE),] 
 
 But I don't know how to continue the loop: do I calculate the mean for each
 simulation and compare it to the population mean? 
 Any help you could give me would be very appreciated,
 Thank you. 

The straightforward way would be a permutation test, something like this

msamp <- mean(sample1$y)
mpop  <- mean(df$y)
msim  <- replicate(10000, mean(sample(df$y, 250)))

sum(abs(msim - mpop) >= abs(msamp - mpop))/10000

I don't really see a reason to do bootstrapping here. You say you want to test 
whether your sample could be a random sample from the population, so just 
simulate that sampling (which should be without replacement, like your sample 
is). Bootstrapping might come in if you want a confidence interval for the mean 
difference between your sample and the rest.
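As a sketch of that last remark (names illustrative: sample1 is the observed
sample of 250 drawn from df, and the rest of df plays the role of the
remaining population; this is a simple percentile bootstrap, not Peter's
prescription):

```r
## rows of df not in the observed sample (assumes rownames survived subsetting)
rest <- df[!rownames(df) %in% rownames(sample1), ]

## bootstrap the mean difference: resample each group with replacement
bdiff <- replicate(10000,
  mean(sample(sample1$y, replace = TRUE)) -
  mean(sample(rest$y,    replace = TRUE)))

## 95% percentile confidence interval for the mean difference
quantile(bdiff, c(0.025, 0.975))
```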

Instead of sampling means, you could put a full-blown t-test inside the 
replicate expression, like:

psim <- replicate(10000, {s <- sample(1:2000, 250); t.test(df$y[s],
df$y[-s])$p.value})

and then check whether the p value for your sample is small compared to the 
distribution of values in psim.

That'll take quite a bit longer, though; t.test() is a more complex beast than 
mean(). It is not obvious that it has any benefits either, unless you 
specifically wanted to investigate the behavior of the t test. 

(All code untested. Caveat emptor.)


-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] round() and negative digits

2011-10-09 Thread Ted Harding
On 09-Oct-11 00:46:58, Carl Witthoft wrote:
 
 On 10/8/11 6:11 PM, (Ted Harding) wrote:
 
 Carl Witthoft's serendipitous discovery is a nice example
 of how secrets can be guessed by wondering "what if ... ?".
 So probably you don't need to tell the secrets.

 Taking the negative digits to their logical extreme:

round(654.321,2)
# [1] 654.32
round(654.321,1)
# [1] 654.3
round(654.321,0)
# [1] 654
round(654.321,-1)
# [1] 650
round(654.321,-2)
# [1] 700
round(654.321,-3)
# [1] 1000
round(654.321,-4)
# [1] 0

 which is what you'd logically expect (but is it what you
 would intuitively expect?).

 Oh, oh, somebody's going all metaphysical on us.

Nor should one forget the rounding rules (not OS-dependent
in this case, I think ... ?):

  round(5000,-4)
  # [1] 0
  round(15000,-4)
  # [1] 20000

Ted.


E-Mail: (Ted Harding) ted.hard...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 09-Oct-11   Time: 08:59:58
-- XFMail --



[R] sapply(pred,cor,y=resp)

2011-10-09 Thread William Claster
Hello. I am wondering why I am getting NA for all entries
in cors=sapply(pred,cor,y=resp). I suppose that the columns in pred have NAs in
them. Is there some way to fix this? Thanks


> str(pred)
'data.frame':   200 obs. of  13 variables:
 $ mnO2: num  9.8 8 11.4 4.8 9 13.1 10.3 10.6 3.4 9.9 ...
 $ Cl  : num  60.8 57.8 40 77.4 55.4 ...
 $ NO3 : num  6.24 1.29 5.33 2.3 10.42 ...
 $ NH4 : num  578 370 346.7 98.2 233.7 ...
 $ oPO4: num  105 428.8 125.7 61.2 58.2 ...
 $ PO4 : num  170 558.8 187.1 138.7 97.6 ...
 $ Chla: num  50 1.3 15.6 1.4 10.5 ...
 $ a1  : num  0 1.4 3.3 3.1 9.2 15.1 2.4 18.2 25.4 17 ...
 $ a2  : num  0 7.6 53.6 41 2.9 14.6 1.2 1.6 5.4 0 ...
 $ a3  : num  0 4.8 1.9 18.9 7.5 1.4 3.2 0 2.5 0 ...
 $ a4  : num  0 1.9 0 0 0 0 3.9 0 0 2.9 ...
 $ a5  : num  34.2 6.7 0 1.4 7.5 22.5 5.8 5.5 0 0 ...
 $ a6  : num  8.3 0 0 0 4.1 12.6 6.8 8.7 0 0 ...
> str(y=resp)
Error in length(object) : 'object' is missing
> str(resp)
 num [1:200] 8 8.35 8.1 8.07 8.06 8.25 8.15 8.05 8.7 7.93 ...
> cors=sapply(pred,cor,y=resp)
> cors
mnO2   Cl  NO3  NH4 oPO4  PO4 Chla   a1   a2   a3   a4   a5   a6 
  NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA 
 



Re: [R] sapply(pred,cor,y=resp)

2011-10-09 Thread Joshua Wiley
Hi,

It is probably more confusing with several steps combined, but you are
correct that it is because there are NAs.  It is fairly common for R
functions to return NA if there are any NA values, unless you
explicitly set an argument saying what to do with missing values.  A quick
look at ?cor shows that there is such an argument with several
options.  Try adding ', use = "pairwise.complete.obs"' to your
sapply call.
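For instance, with toy data (not the poster's) the difference looks like this:

```r
## two predictor columns, each with one missing value
pred <- data.frame(a = c(1, 2, NA, 4), b = c(2, NA, 6, 8))
resp <- c(1, 2, 3, 4)

## without 'use', any NA makes cor() return NA for that column
sapply(pred, cor, y = resp)

## pairwise.complete.obs drops the incomplete pairs column by column
sapply(pred, cor, y = resp, use = "pairwise.complete.obs")
```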

Hope that helps,

Josh

On Sun, Oct 9, 2011 at 12:47 AM, William Claster dmfall2...@yahoo.com wrote:
 Hello. I am wondering why I am getting NA for all 
 in cors=sapply(pred,cor,y=resp). I suppose that each column in pred has NAs 
 in them. Is there some way to fix this? Thanks


 str(pred)
 'data.frame':   200 obs. of  13 variables:
  $ mnO2: num  9.8 8 11.4 4.8 9 13.1 10.3 10.6 3.4 9.9 ...
  $ Cl  : num  60.8 57.8 40 77.4 55.4 ...
  $ NO3 : num  6.24 1.29 5.33 2.3 10.42 ...
  $ NH4 : num  578 370 346.7 98.2 233.7 ...
  $ oPO4: num  105 428.8 125.7 61.2 58.2 ...
  $ PO4 : num  170 558.8 187.1 138.7 97.6 ...
  $ Chla: num  50 1.3 15.6 1.4 10.5 ...
  $ a1  : num  0 1.4 3.3 3.1 9.2 15.1 2.4 18.2 25.4 17 ...
  $ a2  : num  0 7.6 53.6 41 2.9 14.6 1.2 1.6 5.4 0 ...
  $ a3  : num  0 4.8 1.9 18.9 7.5 1.4 3.2 0 2.5 0 ...
  $ a4  : num  0 1.9 0 0 0 0 3.9 0 0 2.9 ...
  $ a5  : num  34.2 6.7 0 1.4 7.5 22.5 5.8 5.5 0 0 ...
  $ a6  : num  8.3 0 0 0 4.1 12.6 6.8 8.7 0 0 ...
 str(y=resp)
 Error in length(object) : 'object' is missing
 str(resp)
  num [1:200] 8 8.35 8.1 8.07 8.06 8.25 8.15 8.05 8.7 7.93 ...
 cors=sapply(pred,cor,y=resp)
 cors
 mnO2   Cl  NO3  NH4 oPO4  PO4 Chla   a1   a2   a3   a4   a5   a6
   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA


In the future, please post in plain text, not HTML (as the posting
guide requests).







-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, ATS Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/



Re: [R] help with using last observation carried forward analysis for a clinical trial please

2011-10-09 Thread David Winsemius


On Oct 8, 2011, at 8:43 PM, maspitze wrote:


Hi,
I have a series of id's with multiple visits and questionnaire  
scores.  This
is a clinical trial that will be analyzed using the last observation  
carried
forward method.  In other words, in order to comply with intent to  
treat
analysis when many subjects withdraw, data points for the last visit  
must be
generated and filled in with the last observation.  The ultimate
goal is to tabulate the difference in questionnaires between the
start of the trial and the end of the trial.  I am lost about how
to do this.
Each subject had multiple visits, up to 13.  In general, they took a
questionnaire at each visit.  However, if a questionnaire was not  
completed

or the visit is missing, the data point does not exist.  To explain, I
created a table as analogy.

My goal is to take something that looks like the following:

ID  Visit   score
1   1   10
2   1   12
2   3   15
3   1   1
3   2   6
4   1   16
4   2   1
4   3   7
4   4   17

I think I then need to change it to this in order to perform LOCF in zoo:
ID  Visit   score
1   1   10
1   2   NA
1   3   NA
1   4   NA
2   1   12
2   2   NA
2   3   15
2   4   NA
3   1   1
3   2   6
3   3   NA
3   4   NA
4   1   16
4   2   1
4   3   7
4   4   17



require(zoo)
dat2 <- merge(data.frame(ID = rep(1:4, each = 4), Visit = rep(1:4, 4)),
              dat, all.x = TRUE)

dat2$Vscr.fill <- na.locf(dat2$score)
dat2
lapply(split(dat2, dat2$ID), function(x) x$Vscr.fill[4] - x$Vscr.fill[1])

#
$`1`
[1] 0

$`2`
[1] 3

$`3`
[1] 5

$`4`
[1] 1

--
David



then change to:
ID  Visit   score
1   1   10
1   2   10
1   3   10
1   4   10
2   1   12
2   2   12
2   3   15
2   4   15
3   1   1
3   2   6
3   3   6
3   4   6
4   1   16
4   2   1
4   3   7
4   4   17

I would then like to take visit 4 and subtract visit 1 to create the
difference in the questionnaire scores during the clinical trial.  I  
will

then compare this score by t.test between placebo and drug groups.

Would anyone please have some guidance about how to do this in r?  I  
would

be grateful for assistance.
Regards, Matt


--
View this message in context: 
http://r.789695.n4.nabble.com/help-with-using-last-observation-carried-forward-analysis-for-a-clinical-trial-please-tp3886396p3886396.html
Sent from the R help mailing list archive at Nabble.com.



David Winsemius, MD
West Hartford, CT



Re: [R] round() and negative digits

2011-10-09 Thread Duncan Murdoch

On 11-10-09 4:00 AM, (Ted Harding) wrote:

On 09-Oct-11 00:46:58, Carl Witthoft wrote:


On 10/8/11 6:11 PM, (Ted Harding) wrote:


Carl Witthoft's serendipitous discovery is a nice example
of how secrets can be guessed by wondering "what if ... ?".
So probably you don't need to tell the secrets.

Taking the negative digits to their logical extreme:

round(654.321,2)
# [1] 654.32
round(654.321,1)
# [1] 654.3
round(654.321,0)
# [1] 654
round(654.321,-1)
# [1] 650
round(654.321,-2)
# [1] 700
round(654.321,-3)
# [1] 1000
round(654.321,-4)
# [1] 0

which is what you'd logically expect (but is it what you
would intuitively expect?).


Oh, oh, somebody's going all metaphysical on us.


Nor should one forget the rounding rules (not OS-dependent
in this case, I think ... ?):

   round(5000,-4)
   # [1] 0
   round(15000,-4)
   # [1] 20000


The intention is that those are not OS dependent, but since they rely on 
exact representations, there could be differences: not all platforms 
support the extended-real 80-bit intermediate representations.  (If 
you were rounding to 0 d.p., they should all agree on a "round to even" 
rule.  Rounding to -4 d.p. involves dividing by 10^4, and that could 
lead to errors.)
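The round-to-even rule Duncan mentions is easiest to see at 0 d.p.:

```r
## IEC 60559 "round half to even" at zero digits: halves go to the
## nearest even integer, not always upward
round(c(0.5, 1.5, 2.5, 3.5))   # 0 2 2 4

## negative digits scale by a power of 10 first (here 15000/1e4 = 1.5,
## which rounds to 2), then scale back
round(15000, -4)               # 20000
```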


Duncan Murdoch



Re: [R] round() and negative digits

2011-10-09 Thread Prof Brian Ripley

On Sat, 8 Oct 2011, Duncan Murdoch wrote:


On 11-10-07 5:26 PM, Carl Witthoft wrote:

Just wondering here -- I tested and found to my delight that

%  round(325.4,-2)
[1] 300

gave me exactly what I would have expected (and wanted).  Since it's not
explicitly mentioned in the documentation that negative 'digits' is
allowed, I just wanted to ask whether this behavior is intentional or a
happy turn of events.  I'm always paranoid that something not explicitly
documented might disappear in future revisions.



It is intentional, and one of the regression tests confirms that it's there, 
so it won't disappear by mistake, and would be very unlikely to disappear 
intentionally.


It needs careful documentation though (as do the corner cases of 
signif).  Things like

> round(325.4, -3)
[1] 0
> signif(325.4, -3)
[1] 300
> signif(325.4, 0)
[1] 300

may not be what you expect.

Sometimes it is better not to document things than to try to give precise 
details which may get changed *and* there will be useRs who misread them 
(and maybe even file bug reports on their misreadings).  The source is 
the ultimate documentation.


--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [R] Generalized Additive Models: How to create publication-ready regression tables

2011-10-09 Thread Emilio López
You may build your customized matrix by merging the components of the
objects before calling the xtable function:

my.matrix <- rbind(model$coefficients, [vector containing errors])
xtable(my.matrix)

(I'm sorry, I don't know exactly where the standard errors are stored /
how to compute them.)
You can paste parentheses before and after the numbers with the paste
function.
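A sketch of that idea, assuming a fitted mgcv::gam object (the model and data
below are illustrative, and the p.table component is where summary.gam reports
the parametric coefficients and their standard errors; check ?summary.gam for
your version):

```r
library(mgcv)
library(xtable)

## hypothetical model -- replace with your own fit
fit <- gam(mpg ~ s(hp) + wt, data = mtcars)

## parametric terms: columns include Estimate and Std. Error
co <- summary(fit)$p.table

## coefficients with standard errors in parentheses on the row below
tab <- rbind(Coef = sprintf("%.3f", co[, "Estimate"]),
             SE   = paste0("(", sprintf("%.3f", co[, "Std. Error"]), ")"))
colnames(tab) <- rownames(co)

xtable(tab)  # LaTeX table ready for a publication draft
```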

Best,
Emilio



2011/10/8 davidyeager dyea...@gmail.com

 Thanks!

 Yes, that produces tables that are formatted in the same way as the gam
 output.  I'm hoping to have publication-ready tables that have standard
 errors in parentheses listed below coefficients.  Do you know of a method
 to
 do that?

 David

 --
 View this message in context:
 http://r.789695.n4.nabble.com/Generalized-Additive-Models-How-to-create-publication-ready-regression-tables-tp3884432p3885238.html
 Sent from the R help mailing list archive at Nabble.com.






[R] help with statistics in R - how to measure the effect of users in groups

2011-10-09 Thread gj
Hi,

I'm a newbie to R, and my knowledge of statistics is mostly self-taught. My
problem is how to measure the effect of users in groups. I can calculate a
particular attribute for a user in a group, but my hypothesis is that the
users' attributes are not independent of each other and that a user's
attribute depends on the group, i.e. that users' behaviour changes based on
the group.

Let me give an example:

users   Group 1   Group 2   Group 3
u1      10        5         n/a
u2      6         n/a       4
u3      5         2         3

For example, I want to be able to show that u1's behaviour is different in
group 1 than in the other groups, and that the particular thing about Group 1
is that users in Group 1 tend to have a higher value of the attribute under
measurement.


Hence, can I use R to test my hypothesis? I'm willing to learn, so if this is
very simple, just point me in the direction of any online resources about it.
At the moment I don't even know how to define this class of problems; that
would be a start.
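This class of problems is usually framed as a mixed-effects model with crossed
random effects for user and group. A hedged sketch with the lme4 package (data
frame in long format; names and toy values are illustrative, not the poster's
data):

```r
library(lme4)

## long format: one row per (user, group) observation -- toy data
d <- data.frame(user  = c("u1","u1","u2","u2","u3","u3","u3"),
                group = c("g1","g2","g1","g3","g1","g2","g3"),
                value = c(10, 5, 6, 4, 5, 2, 3))

## crossed random effects: the group variance component measures how
## much belonging to a group shifts a user's attribute
m <- lmer(value ~ 1 + (1 | user) + (1 | group), data = d)
summary(m)
```

Searching for "crossed random effects" or "multilevel models" should turn up
introductory material for this family of models.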

Regards
Gawesh




[R] Finding solution

2011-10-09 Thread Bogaso Christofer
Dear all, I have a system of simultaneous equations with 2 unknowns as
follows:

 

x*y + (1-x) = 0.05

x*(y - .5)^2 + (1-x)*0.6 = 0.56^2

 

Of course I can do it manually; however, I am wondering whether there is any
direct way in R to get the solution of this system.

 

Thanks and regards,

 





Re: [R] Finding solution

2011-10-09 Thread peter dalgaard

On Oct 9, 2011, at 14:02 , Bogaso Christofer wrote:

 Dear all, I have a system of simultaneous equations with 2 unknowns as
 follows:
 
 
 
 x*y + (1-x) = 0.05
 
 x*(y - .5)^2 + (1-x)*0.6 = 0.56^2
 
 
 
 Ofcourse I can do it manually however wondering whether there is any direct
 way in R available to get the solution of this system?


Not really (I can't vouch for all 3000+ packages, though...)

You can sometimes get away with converting to a minimization problem:

> f <- function(xy, x=xy[1], y=xy[2])
+     (x*y + (1-x) - 0.05)^2 + (x*(y - .5)^2 + (1-x)*0.6 - 0.56^2)^2
> optim(par=c(0,0), f, method="BFGS")
$par
[1]  0.91674574 -0.03627402

$value
[1] 5.351777e-13

$counts
function gradient 
      39       13 

$convergence
[1] 0

$message
NULL


Notice that there is some risk of falling into a local minimum which has 
nothing to do with the solution. Always check that the minimum actually comes 
out as zero.
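An alternative, assuming the nleqslv package is acceptable, is to hand the two
equations directly to a root finder instead of squaring them (function name
and starting values below are illustrative):

```r
library(nleqslv)

## the system rewritten as g(x, y) = 0
g <- function(p) {
  x <- p[1]; y <- p[2]
  c(x*y + (1 - x) - 0.05,
    x*(y - 0.5)^2 + (1 - x)*0.6 - 0.56^2)
}

## start away from points where the Jacobian is singular
nleqslv(c(0.5, 0.5), g)$x   # root; compare with the optim() solution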


 
 
 
 Thanks and regards,
 
 
 
 
 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] Permutation or Bootstrap to obtain p-value for one sample

2011-10-09 Thread peter dalgaard

On Oct 9, 2011, at 12:00 , francesca casalino wrote:

 Thank you very much to both Ken and Peter for the very helpful explanations. 
 
 Just to understand this better (sorry for repeating, but I am also new to 
 statistics, so please correct me where I am wrong):
 
 Ken' method:
 Random sampling of the mean, and then using these means to construct a 
 distribution of means (the 'null' distribution), and I can then use this 
 normal distribution and compare the population mean to my mean using, for 
 example, z-score.
 Of note: The initial distributions are not normal, so I thought I needed to 
 base my calculations on the median, but I can use the mean to construct a 
 normal distribution. 
 This would be defined a bootstrap test.  
 
 Peter's method:
 Random sampling of the mean, and then comparing each sampled mean with the 
 population mean and see if it is higher or equal to the difference between my 
 sample and the population mean. This is a permutation test, but to actually 
 get CI and a p-value I would need bootstrap method. 
 
 Did I understand this correctly? 
 
 I tried to start with Ken's approach for now, and followed his steps, but: 
 
 1) I get a lot of NaN in the sampling distribution; is this normal? 
 2) I think I am doing something wrong again when I try to find a p-value. This 
 is what I did:
 
 nreps=1 
 mean.dist=rep(NA,nreps) 
 
 for(replication in 1:nreps) 
 { 
 my.sample=sample(population$Y, 250, replace=F) 
 #Peter mentioned that this sampling should be without replacement, so I went 
 for that---
 
 mean.for.rep=mean(my.sample) #mean for this replication 
 mean.dist[replication]=mean.for.rep #store the mean 
 } 
 
 hist(mean.dist, main="Null Dist of Means", col="chartreuse")
  #Show the means in a nifty color
 
 mean_dist= mean(mean.dist, na.rm=TRUE)
 sd_pop= sd(mean.dist, na.rm=TRUE)
 
 mean_sample= mean(population$Y, na.rm=TRUE)
 

You mean sample$Y, I hope.


 z_stat= (mean_sample - mean_dist)/(sd_pop/sqrt(2089))
 p_value= 2*pnorm(-abs(z_stat))
 
 Is this correct? 

No. First, notice that replicate() does all the red tape of the for loop for 
you. It's not a grave error to do what you're doing, but I did give you code, 
so maybe you should try it...

Worse: The whole point of doing a permutation test is to compare the observed 
value directly to the actual permutation distribution. So you take your 
observed mean and see whether it falls in the middle or in the tails of the 
histogram. The p value can be estimated as the relative frequency of simulated 
means falling as far or further out in the tails as the observed one. 

You do not want to approximate the simulated distribution with a normal 
distribution. If that's what you wanted, you could do it with no simulation at 
all -- there are simple formulas relating the mean and variance of the sample 
mean to the mean and variance of the population, and you'd more or less wind up 
reinventing the two-sample t-test.


 THANK YOU SO MUCH FOR ALL YOUR HELP!! 
 
 2011/10/9 Ken Hutchison vicvoncas...@gmail.com
 Hi Francy, 
   A bootstrap test would likely be sufficient for this problem, but a 
 one-sample t-test isn't advisable or necessary in my opinion. If you use a 
 t-test multiple times you are making assumptions about the distribution of 
 your data; more importantly, your probability of Type 1 error will be 
 increased with each test. So, a valid thing to do would be to sample 
 (computation for this problem won't be expensive so do alotta reps) and 
 compare your mean to the null distribution of means. I.E.
 
 nreps=1
 mean.dist=rep(NA,nreps)
 
 for(replication in 1:nreps)
 {
 my.sample=sample(population, 2500, replace=T) 
 #replace could be false, depends on preference
 mean.for.rep=mean(my.sample) #mean for this replication
 mean.dist[replication]=mean.for.rep #store the mean
 }
 
 hist(mean.dist, main="Null Dist of Means", col="chartreuse")
  #Show the means in a nifty color
 
 You can then perform various tests given the null distribution, or infer from 
 where your sample mean lies within the distribution or something to that 
 effect. Also, if the distribution is normal, which is somewhat likely since 
 it is a distribution of means: (shapiro.test or require(nortest) ad.test will 
 let you know) you should be able to make inference from that using parametric 
 methods (once) which will fit the truth a bit better than a t.test.
 Hope that's helpful, 
Ken Hutchison
 
 
 On Sat, Oct 8, 2011 at 10:04 AM, francy francy.casal...@gmail.com wrote:
 Hi,
 
 I am having trouble understanding how to approach a simulation:
 
 I have a sample of n=250 from a population of N=2,000 individuals, and I
 would like to use either permutation test or bootstrap to test whether this
 particular sample is significantly different from the values of any other
 random samples of the same population. I thought I needed to take random
 samples (but I am not sure how many simulations I need to do) of n=250 from
 the N=2,000 

[R] strucchange Nyblom-Hansen Test?

2011-10-09 Thread buehlerman
I want to apply the Nyblom-Hansen test with the strucchange package, but I don't
know which is the correct way, or what the difference is between the following
two approaches (leading to different results):


data(longley)

# 1. Approach:
sctest(Employed ~ Year + GNP.deflator + GNP + Armed.Forces, data = longley,
type = "Nyblom-Hansen")

#results in:
#Score-based CUSUM test with mean L2 norm
#
#data:  Employed ~ Year + GNP.deflator + GNP + Armed.Forces 
#f(efp) = 0.8916, p-value = 0.4395

#2. Approach:
sctest(gefp(Employed ~ Year + GNP.deflator + GNP + Armed.Forces, data =
longley), functional = meanL2BB)

#results in:
#M-fluctuation test
#
#data:  gefp(Employed ~ Year + GNP.deflator + GNP + Armed.Forces, data =
longley) 
#f(efp) = 0.8165, p-value = 0.3924


I could not find any examples or further remarks on the first approach with
sctest(..., type = "Nyblom-Hansen").
Maybe the first approach, unlike the second, is not a joint test for all
coefficients?

Thank you in advance for your help!

--
View this message in context: 
http://r.789695.n4.nabble.com/strucchange-Nyblom-Hansen-Test-tp3887208p3887208.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Permutation or Bootstrap to obtain p-value for one sample

2011-10-09 Thread francesca casalino
Thank you very much to both Ken and Peter for the very helpful
explanations.

Just to understand this better (sorry for repeating but I am also new in
statistics…so please correct me where I am wrong):

Ken's method:
Random sampling of the mean, and then using these means to construct a
distribution of means (the 'null' distribution); I can then use this
normal distribution and compare the population mean to my mean using, for
example, a z-score.
Of note: the initial distributions are not normal, so I thought I needed to
base my calculations on the median, but I can use the mean to construct a
normal distribution.
This would be defined as a bootstrap test.

Peter's method:
Random sampling of the mean, and then comparing each sampled mean with the
population mean to see if it is higher than or equal to the difference between
my sample and the population mean. This is a permutation test, but to
actually get a CI and a p-value I would need the bootstrap method.

Did I understand this correctly?

I tried to start with Ken's approach for now, and followed his steps, but:

1) I get a lot of NaN in the sampling distribution, is this normal?
2) I think I am doing again something wrong when I try to find a
p-value…This is what I did:

nreps=1
mean.dist=rep(NA,nreps)

for(replication in 1:nreps)
{
my.sample=sample(population$Y, 250, replace=F)
#Peter mentioned that this sampling should be without replacement, so I went
for that---

mean.for.rep=mean(my.sample) #mean for this replication
mean.dist[replication]=mean.for.rep #store the mean
}

hist(mean.dist, main="Null Dist of Means", col="chartreuse")
 #Show the means in a nifty color

mean_dist= mean(mean.dist, na.rm=TRUE)
sd_pop= sd(mean.dist, na.rm=TRUE)

mean_sample= mean(population$Y, na.rm=TRUE)

z_stat= (mean_sample - mean_dist)/(sd_pop/sqrt(2089))
p_value= 2*pnorm(-abs(z_stat))

Is this correct?
THANK YOU SO MUCH FOR ALL YOUR HELP!!

2011/10/9 Ken Hutchison vicvoncas...@gmail.com

 Hi Francy,
   A bootstrap test would likely be sufficient for this problem, but a
 one-sample t-test isn't advisable or necessary in my opinion. If you use a
 t-test multiple times you are making assumptions about the distribution of
 your data; more importantly, your probability of Type 1 error will be
 increased with each test. So, a valid thing to do would be to sample
 (computation for this problem won't be expensive so do alotta reps) and
 compare your mean to the null distribution of means. I.E.

 nreps=1
 mean.dist=rep(NA,nreps)

 for(replication in 1:nreps)
 {
 my.sample=sample(population, 2500, replace=T)
 #replace could be false, depends on preference
 mean.for.rep=mean(my.sample) #mean for this replication
 mean.dist[replication]=mean.for.rep #store the mean
 }

 hist(mean.dist, main="Null Dist of Means", col="chartreuse")
  #Show the means in a nifty color

 You can then perform various tests given the null distribution, or infer
 from where your sample mean lies within the distribution or something to
 that effect. Also, if the distribution is normal, which is somewhat likely
 since it is a distribution of means: (shapiro.test or require(nortest)
 ad.test will let you know) you should be able to make inference from that
 using parametric methods (once) which will fit the truth a bit better than a
 t.test.
 Hope that's helpful,
Ken Hutchison


 On Sat, Oct 8, 2011 at 10:04 AM, francy francy.casal...@gmail.com wrote:

 Hi,

 I am having trouble understanding how to approach a simulation:

 I have a sample of n=250 from a population of N=2,000 individuals, and I
 would like to use either permutation test or bootstrap to test whether
 this
 particular sample is significantly different from the values of any other
 random samples of the same population. I thought I needed to take random
 samples (but I am not sure how many simulations I need to do) of n=250
 from
 the N=2,000 population and maybe do a one-sample t-test to compare the
 mean
 score of all the simulated samples, + the one sample I am trying to prove
 that is different from any others, to the mean value of the population.
 But
 I don't know:
 (1) whether this one-sample t-test would be the right way to do it, and
 how
 to go about doing this in R
 (2) whether a permutation test or bootstrap methods are more appropriate

 This is the data frame that I have, which is to be sampled:
 df <-
 i.e.
 x y
 1 2
 3 4
 5 6
 7 8
 . .
 . .
 . .
 2,000

 I have this sample from df, and would like to test whether it is has
 extreme
 values of y.
 sample1 <-
 i.e.
 x y
 3 4
 7 8
 . .
 . .
 . .
 250

 For now I only have this:

 R=999 #Number of simulations, but I don't know how many...
 t.values =numeric(R) #creates a numeric vector with 999 elements,
 which
 will hold the results of each simulation.
 for (i in 1:R) {
 sample1 <- df[sample(nrow(df), 250, replace=TRUE),]

 But I don't know how to continue the loop: do I calculate the mean for
 each
 simulation and compare it to the population mean?
 Any help you could 

Re: [R] Finding solution

2011-10-09 Thread Bert Gunter
R is not the right tool for all things. This looks like a job for a computer
algebra system.

That said, R **does** have at least one interface to such a system. See the
Ryacas package (check my capitalization, which may be wrong). HelpeRs may
provide you with others.
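A purely numeric route also exists in base R: treat the system as a least-squares problem and minimise the squared residuals with optim(). A sketch only (the starting values are an arbitrary guess):

```r
# Numeric sketch in base R: minimise the sum of squared residuals
# of the two equations; this is zero exactly at a solution.
f <- function(p) {
  x <- p[1]; y <- p[2]
  r1 <- x * y + (1 - x) - 0.05
  r2 <- x * (y - 0.5)^2 + (1 - x) * 0.6 - 0.56^2
  r1^2 + r2^2
}
sol <- optim(c(0.5, 0.5), f)   # starting values are a guess
sol$par                        # approximate (x, y)
```

With a different starting point, optim() may land on a different root of the system, so it pays to try several.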

-- Bert

On Sun, Oct 9, 2011 at 5:02 AM, Bogaso Christofer 
bogaso.christo...@gmail.com wrote:

 Dear all, I have a system of simultaneous equations with 2 unknowns as
 follows:



 x*y + (1-x) = 0.05

 x*(y - .5)^2 + (1-x)*0.6 = 0.56^2



 Of course I can do it manually; however, I am wondering whether there is any
 direct way in R available to get the solution of this system?



 Thanks and regards,






[R] pdIdent in smoothing regression model

2011-10-09 Thread Lei Liu

Hi there,

I am reading the 2004 paper Smoothing with mixed model software in 
Journal of Statistical Software, by Ngo and Wand. I tried to run 
their first example in Section 2.1 using R but I had some problems. 
Here is the code:


library(nlme)
fossil <- read.table("fossil.dat", header = TRUE)
x <- fossil$age
y <- 10*fossil$strontium.ratio
knots <- seq(94, 121, length = 25)
n <- length(x)
X <- cbind(rep(1, n), x)
Z <- outer(x, knots, "-")
Z <- Z*(Z > 0)
fit <- lme(y ~ -1 + X, random = pdIdent(~ -1 + Z))

When I ran the code

fit <- lme(y ~ -1 + X, random = pdIdent(~ -1 + Z))

I got an error message:

Error in getGroups.data.frame(dataMix, groups) :   Invalid formula for groups

I was really puzzled. I asked Dr. Ngo and he said they did it in 
S-PLUS, not R. Does anyone know how to do it in R? Thanks!
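[A commonly suggested workaround for this error is to give lme() the explicit constant grouping factor that S-PLUS supplied implicitly. A sketch only, with simulated data standing in for fossil.dat, which isn't available here:]

```r
library(nlme)

# Simulated stand-in for the fossil data
set.seed(1)
x <- seq(94, 121, length = 200)
y <- sin(x / 5) + rnorm(200, sd = 0.1)

knots <- seq(94, 121, length = 25)
n <- length(x)
X <- cbind(rep(1, n), x)
Z <- outer(x, knots, "-")
Z <- Z * (Z > 0)

# R's lme() requires an explicit grouping structure; a constant dummy
# group gives the single-level penalized-spline fit.
group <- rep(1, n)
fit <- lme(y ~ -1 + X, random = list(group = pdIdent(~ -1 + Z)))
```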


Lei Liu
Associate Professor
Division of Biostatistics
Department of Public Health Sciences
University of Virginia School of Medicine

http://people.virginia.edu/~ll9f/



Re: [R] Permutation or Bootstrap to obtain p-value for one sample

2011-10-09 Thread Bert Gunter
I may be speaking out of turn here, but I would prefer not to see R-help
turn into a tutorial site for basic statistics. Such sites already exist
(e.g. http://stats.stackexchange.com/).

I realize that there is occasionally reason to venture down this path a way
within legitimate R contexts, but this seems to me to have gone way beyond.

Just my opinion, of course.

-- Bert

On Sun, Oct 9, 2011 at 3:00 AM, francesca casalino 
francy.casal...@gmail.com wrote:

 Thank you very much to both Ken and Peter for the very helpful
 explanations.

 Just to understand this better (sorry for repeating but I am also new in
 statistics…so please correct me where I am wrong):

 Ken' method:
 Random sampling of the mean, and then using these means to construct a
 distribution of means (the 'null' distribution), and I can then use this
 normal distribution and compare the population mean to my mean using, for
 example, z-score.
 Of note: The initial distributions are not normal, so I thought I needed to
 base my calculations on the median, but I can use the mean to construct a
 normal distribution.
 This would be defined a bootstrap test.

 Peter's method:
 Random sampling of the mean, and then comparing each sampled mean with the
 population mean and see if it is higher or equal to the difference between
 my sample and the population mean. This is a permutation test, but to
 actually get CI and a p-value I would need bootstrap method.

 Did I understand this correctly?

 I tried to start with Ken's approach for now, and followed his steps, but:

 1) I get a lot of NaN in the sampling distribution, is this normal?
 2) I think I am doing again something wrong when I try to find a
 p-value…This is what I did:

 nreps=1
 mean.dist=rep(NA,nreps)

 for(replication in 1:nreps)
 {
 my.sample=sample(population$Y, 250, replace=F)
 #Peter mentioned that this sampling should be without replacement, so I
 went
 for that---

 mean.for.rep=mean(my.sample) #mean for this replication
 mean.dist[replication]=mean.for.rep #store the mean
 }

 hist(mean.dist, main="Null Dist of Means", col="chartreuse")
  #Show the means in a nifty color

 mean_dist= mean(mean.dist, na.rm=TRUE)
 sd_pop= sd(mean.dist, na.rm=TRUE)

 mean_sample= mean(population$Y, na.rm=TRUE)

 z_stat= (mean_sample - mean_dist)/(sd_pop/sqrt(2089))
 p_value= 2*pnorm(-abs(z_stat))

 Is this correct?
 THANK YOU SO MUCH FOR ALL YOUR HELP!!

 2011/10/9 Ken Hutchison vicvoncas...@gmail.com

  Hi Francy,
A bootstrap test would likely be sufficient for this problem, but a
  one-sample t-test isn't advisable or necessary in my opinion. If you use
 a
  t-test multiple times you are making assumptions about the distribution
 of
  your data; more importantly, your probability of Type 1 error will be
  increased with each test. So, a valid thing to do would be to sample
  (computation for this problem won't be expensive so do alotta reps) and
  compare your mean to the null distribution of means. I.E.
 
  nreps=1
  mean.dist=rep(NA,nreps)
 
  for(replication in 1:nreps)
  {
  my.sample=sample(population, 2500, replace=T)
  #replace could be false, depends on preference
  mean.for.rep=mean(my.sample) #mean for this replication
  mean.dist[replication]=mean.for.rep #store the mean
  }
 
  hist(mean.dist, main="Null Dist of Means", col="chartreuse")
   #Show the means in a nifty color
 
  You can then perform various tests given the null distribution, or infer
  from where your sample mean lies within the distribution or something to
  that effect. Also, if the distribution is normal, which is somewhat
 likely
  since it is a distribution of means: (shapiro.test or require(nortest)
  ad.test will let you know) you should be able to make inference from that
  using parametric methods (once) which will fit the truth a bit better
 than a
  t.test.
  Hope that's helpful,
 Ken Hutchison
 
 
  On Sat, Oct 8, 2011 at 10:04 AM, francy francy.casal...@gmail.com
 wrote:
 
  Hi,
 
  I am having trouble understanding how to approach a simulation:
 
  I have a sample of n=250 from a population of N=2,000 individuals, and I
  would like to use either permutation test or bootstrap to test whether
  this
  particular sample is significantly different from the values of any
 other
  random samples of the same population. I thought I needed to take random
  samples (but I am not sure how many simulations I need to do) of n=250
  from
  the N=2,000 population and maybe do a one-sample t-test to compare the
  mean
  score of all the simulated samples, + the one sample I am trying to
 prove
  that is different from any others, to the mean value of the population.
  But
  I don't know:
  (1) whether this one-sample t-test would be the right way to do it, and
  how
  to go about doing this in R
  (2) whether a permutation test or bootstrap methods are more appropriate
 
  This is the data frame that I have, which is to be sampled:
  df-
  i.e.
  x y
  1 2
  3 4
  5 6
  7 8
  . .
  . .
  . .
  

Re: [R] strucchange Nyblom-Hansen Test?

2011-10-09 Thread Achim Zeileis

On Sun, 9 Oct 2011, buehlerman wrote:

I want to apply the Nyblom-Hansen test with the strucchange package, but I 
don't know which is the correct way, or what the difference is between the 
following two approaches (leading to different results):


The difference is that sctest(formula, type = "Nyblom-Hansen") applies the 
Nyblom-Hansen test statistic to a model which assesses both coefficients 
_and_ error variance.


The approach via functional = meanL2BB, on the other hand, allows to apply 
the same type of test statistic to the score functions of any model. In 
your case, where you used the default fit = glm in gefp(), a linear 
regression model is used where the error variance is _not_ included as a 
full model parameter but only as a nuisance parameter. Hence, the 
difference.


Of course, one may also add another score function for the error variance. 
In example("DJIA", package = "strucchange") I provide a function normlm() 
with a corresponding estfun() method. If you load these, you can do:


R> sctest(gefp(Employed ~ Year + GNP.deflator + GNP + Armed.Forces, data = 
longley, fit = normlm), functional = meanL2BB)


M-fluctuation test

data:  gefp(Employed ~ Year + GNP.deflator + GNP + Armed.Forces, data = longley,
  fit = normlm)
f(efp) = 0.8916, p-value = 0.4395

which leads to the same output as sctest(formula, type = "Nyblom-Hansen").

Finally, instead of using gefp(..., fit = normlm), you could have also 
used efp(..., type = "Score-CUSUM"):


R> sctest(efp(Employed ~ Year + GNP.deflator + GNP + Armed.Forces, data = 
longley, type = "Score-CUSUM"), functional = meanL2)


Score-based CUSUM test with mean L2 norm

data:  efp(Employed ~ Year + GNP.deflator + GNP + Armed.Forces, data = 
longley, type = "Score-CUSUM")

f(efp) = 0.8916, p-value = 0.4395

I hope that this clarifies more than adding to the confusion ;-)

The reason for the various approaches is that efp() was always confined to 
the linear model and gefp() then extended it to arbitrary estimating 
function-based models. And for the linear model this provides the option 
of treating the variance as a nuisance parameter or as a full model 
parameter.


Hope that helps,
Z




data(longley)

# 1. Approach:
sctest(Employed ~ Year + GNP.deflator + GNP + Armed.Forces, data = longley,
type = "Nyblom-Hansen")

#results in:
#Score-based CUSUM test with mean L2 norm
#
#data:  Employed ~ Year + GNP.deflator + GNP + Armed.Forces
#f(efp) = 0.8916, p-value = 0.4395

#2. Approach:
sctest(gefp(Employed ~ Year + GNP.deflator + GNP + Armed.Forces, data =
longley), functional = meanL2BB)

#results in:
#M-fluctuation test
#
#data:  gefp(Employed ~ Year + GNP.deflator + GNP + Armed.Forces, data =
longley)
#f(efp) = 0.8165, p-value = 0.3924


I could not find any examples or further remarks on the first approach with
sctest(..., type = "Nyblom-Hansen").
Maybe the first approach, unlike the second, is not a joint test for all
coefficients?

Thank you in advance for your help!

--
View this message in context: 
http://r.789695.n4.nabble.com/strucchange-Nyblom-Hansen-Test-tp3887208p3887208.html
Sent from the R help mailing list archive at Nabble.com.



[R] Substract previous element

2011-10-09 Thread Alaios
Dear all,
I have a matrix with data and I want to subtract from every value the previous 
element.

Let's assume that my vector (matrix) is c <- c(1,2,3,4,5)
I want to get remove_previous <- c(0,1,2,3,4).

How can I do that efficiently in R?

I would like to thank you in advance for your help
B.R
Alex



Re: [R] Generalized Additive Models: How to create publication-ready regression tables

2011-10-09 Thread Frank Harrell
In many cases a flexible parametric fit, using regression splines, will
result in a fit that is as good as a gam, with similar regression shapes. 
The rms package has a latex method that will represent such fits in
interpretable algebraic form.  latex(fit) does that, and print(fit,
latex=TRUE) will give a nice table though not formatted in the way you
described.
Frank

Emilio López wrote:
 
 You may build your customized matrix merging the components of the
 objects
 before calling the xtable function:
 
 my.matrix <- rbind(model$coefficients, [vector containing errors])
 xtable(my.matrix)
 
 (I'm sorry I don't know exactly where the standard errors are stored / how
 to compute them)
 You can paste parentheses before and after the number with the paste
 function.
 
 Best,
 Emilio
 
 
 Maybe you can manipulate the standard errors in the object as text, pasting
 the parentheses before and after with the paste function (before you call
 the xtable function)
 
 2011/10/8 davidyeager <dyeager@>
 
 Thanks!

 Yes, that produces tables that are formatted in the same way as the gam
 output.  I'm hoping to have publication-ready tables that have standard
 errors in parentheses listed below coefficients.  Do you know of a method
 to
 do that?

 David

 --
 View this message in context:
 http://r.789695.n4.nabble.com/Generalized-Additive-Models-How-to-create-publication-ready-regression-tables-tp3884432p3885238.html
 Sent from the R help mailing list archive at Nabble.com.

 


-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Generalized-Additive-Models-How-to-create-publication-ready-regression-tables-tp3884432p3887519.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] axes3d/bbox3d - axis values not fixed

2011-10-09 Thread Ben qant
Excellent! Thank you!

Ben

On Sat, Oct 8, 2011 at 3:36 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote:

 On 11-10-08 11:04 AM, Ben qant wrote:

 Thank you!

 Sorry, I have a couple more questions:
 1) How to I turn off the box shading completely? I figured out how to
 lighten it up to a grey color with col=c(white...  . I looked in the
 package pdf, but I didn't see anything.
 2) I read about the zunit in the package pdf, but no matter what I change
 it
 to it doesn't seem to change anything.

 Here is where I am at right now:

 x <- 1:10
 y <- 1:10
 z <- matrix(outer(x-5,y-5) + rnorm(100), 10, 10)
 open3d()
 persp3d(x, y, z, col="red",
 alpha=0.7, aspect=c(1,1,1), xlab='', ylab='', zlab='z', axes=FALSE)
 bbox3d(xat=c(5, 6), xlab=c("a", "b"), yat=c(2,4,6), zunit=10,
 col=c("white","black"))


 You need to play with the material properties of the bounding box.  I
 think this gives what you want:


 bbox3d(xat=c(5, 6), xlab=c("a", "b"), yat=c(2,4,6), zunit=10,
  col="black", front="line", back="line", lit=FALSE)

 I see different axes for zunit=5, zunit=10, zunit=20.  Not sure why you
 don't.

 Duncan

  Thank you for your help!

 Ben


  On Sat, Oct 8, 2011 at 5:09 AM, Duncan Murdoch
  murdoch.dun...@gmail.com wrote:

  On 11-10-07 2:32 PM, Ben qant wrote:

 Hello,

 I'm using the rgl package and plotting a plot with it. I'd like to have

 all

 the axes values auto-hide, but I want to plot a series of characters

 instead

 of the values of the measurement for 2 of the axes. So in the end I will
 have one axis (z actually) behave per normal (auto-hide) and I'd like
 the
 other two axes to be custom character vectors that auto-hide.


 The axes in rgl are a little messy.  It's been on my todo list for a long
 time to fix them, but there are a lot of details to handle, and only so
 much
 time.

 Essentially there are two separate systems for axes:  the bbox3d
 system,
 and the axis3d system.  The former is the ones that appear and
 disappear,
 the latter is really just lines and text added to the plot.



 Example:
  x <- 1:10
  y <- 1:10
  z <- matrix(outer(x-5,y-5) + rnorm(100), 10, 10)
  open3d()
  persp3d(x, y, z, col="red", alpha=0.7,
  aspect=c(1,1,0.5), xlab='', ylab='', zlab='', axes=FALSE)


 For the above, axes=F for demonstration purposes only. Now when I call:
 axes3d()
 ...the axis values hide/behave the way I want, but I want my own

 characters

 in there instead of the values that default.


 You want to use the bbox3d() call.  For example,

  bbox3d(xat=c(5, 10), xlab=c("V", "X"), yat=c(2,4,6), zunit=10)

 for three different types of labels on the three axes.  It would be nice
 if
 axis3d had the same options as bbox3d, but so far it doesn't.

 Duncan Murdoch



 Trying again:
 open3d()
  persp3d(x, y, z, col="red", alpha=0.7,
  aspect=c(1,1,0.5), xlab='', ylab='', zlab='', axes=FALSE)

 axis3d('x',labels='test')
 ...puts in a custom character label 'test', but I loose the

 behavior/hiding.



 Also, then how do I get the values for the z axis to populate with the
 default values and auto-hide with the other two custom string axes
 auto-hiding?

 I'm pretty sure I need to use bbox3d(), but I'm not having any luck.

 I'm new'ish to R and very new to the rgl package. Hopefully that makes
 sense.

 Thanks for your help!

 ben











Re: [R] Substract previous element

2011-10-09 Thread R. Michael Weylandt michael.weyla...@gmail.com
Take a look at diff(), though offhand I don't know what it does to matrices.

M

On Oct 9, 2011, at 11:00 AM, Alaios ala...@yahoo.com wrote:

 Dear all,
 I have a matrix with data and I want to subtract from every value the 
 previous element.
 
 Let's assume that my vector (matrix) is c <- c(1,2,3,4,5)
 I want to get remove_previous <- c(0,1,2,3,4).
 
 How can I do that efficiently in R?
 
 I would like to thank you in advance for your help
 B.R
 Alex
 


Re: [R] Substract previous element

2011-10-09 Thread Jeff Newmiller
Vectors are not matrices in R, though matrices are special cases of vectors.

See ?diff for a solution that works with vectors.
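For reference, diff() in fact also operates column-wise on matrices, so both cases are covered:

```r
v <- c(1, 2, 3, 4, 5)
diff(v)          # successive differences: 1 1 1 1
c(0, diff(v))    # prepend 0 to keep the original length

m <- matrix(1:6, nrow = 2)
diff(m)          # on a matrix, diff() differences within each column
```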
---
Jeff Newmiller The . . Go Live...
DCN:jdnew...@dcn.davis.ca.us Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

Alaios ala...@yahoo.com wrote:

Dear all,
I have a matrix with data and I want to subtract from every value the previous 
element.

Let's assume that my vector (matrix) is c <- c(1,2,3,4,5)
I want to get remove_previous <- c(0,1,2,3,4).

How can I do that efficiently in R?

I would like to thank you in advance for your help
B.R
Alex





[R] variable name question

2011-10-09 Thread Deepankar Basu
Hi All,

This is surely an easy question, but somehow I am not able to get it.

I am using R 2.13.2 and have a data set where variable names like this
appear:

pci1990, pci1991, ... , pci2009.

pci1990 has data on per capita income for 1990, pci1991 has data on per
capita income for 1991, and so on.

I would like to create the logarithm of per capita for each of the year and
could do so in STATA with the following commands:

forvalues number = 1990/2009 {
gen lpci`number' = log(pci`number')
}

What would be the corresponding set of commands in R?

Thanks a lot in advance.

Deepankar



[R] convert apply to lappy

2011-10-09 Thread Alaios
Dear all I want to convert an apply to lapply. The reason for that is that there 
is a function mclapply that uses exactly the same format as the lapply function.

My code looks like that

mean_power_per_tip <- function(data) {
    return((apply(data[,],2,MeanTip)));
}

where data is a [m,n] matrix.

I would like to thank you in advance for your help

B.R
Alex

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] variable name question

2011-10-09 Thread R. Michael Weylandt
This is surely an easy question but somehow I am not being able to get it.

get() is the key -- it takes a string and returns the object with that
string as its name. assign() goes the other way.

Try this:

for (i in 1990:2009) {
varName = paste("pci", i, sep = "")
assign(varName, log(get(varName)))
}

That said, the standard advice is that its usually more R-ish to keep
all your data in a list, data frame or, for this case, a matrix.
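As a sketch of that advice (the data frame `d` and its columns are invented here to mirror the post, and paste(..., sep = "") is used since the poster's R 2.13 predates paste0()):

```r
# toy stand-in for the poster's data: one pci column per year
d <- data.frame(pci1990 = c(100, 200), pci1991 = c(110, 210), pci1992 = c(120, 220))

years <- 1990:1992
lpci <- log(d[paste("pci", years, sep = "")])   # log() maps over every column at once
names(lpci) <- paste("lpci", years, sep = "")   # rename pci* -> lpci*
d <- cbind(d, lpci)                             # attach alongside the originals
```

No get()/assign() juggling needed once the series live in one frame.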

Michael

On Sun, Oct 9, 2011 at 11:46 AM, Deepankar Basu basu...@gmail.com wrote:
 Hi All,

 This is surely an easy question but somehow I am not being able to get it.

 I am using R 2.13.2 and have a data set where variable names like this
 appear:

 pci1990, pci1991, ... , pci2009.

 pci1990 has data on per capita income for 1990, pci1991 has data on per
 capita income for 1991, and so on.

 I would like to create the logarithm of per capita for each of the year and
 could do so in STATA with the following commands:

 forvalues number = 1990/2009 {
    gen lpci`number' = log(pci`number')
 }

 What would be the corresponding set of commands in R?

 Thanks a lot in advance.

 Deepankar

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] fast or space-efficient lookup?

2011-10-09 Thread ivo welch
Dear R experts---I am struggling with memory and speed issues.  Advice
would be appreciated.

I have a long data set (of financial stock returns, with stock name
and trading day).  All three variables, stock return, id and day, are
irregular.  About 1.3GB in object.size (200MB on disk).  now, I need
to merge the main data set with some aggregate data (e.g., the SP500
market rate of return, with a day index) from the same day.  this
market data set is not a big data set (object.size=300K, 5 columns,
12000 rows).

let's say my (dumb statistical) plan is to run one grand regression,
where the individual rate of return is y and the market rate of return
is x.  the following should work without a problem:

combined <- merge( main, aggregate.data, by="day", all.x=TRUE, all.y=FALSE )
lm( stockreturn ~ marketreturn, data=combined )

alas, the merge is neither space-efficient nor fast.  in fact, I run
out of memory on my 16GB linux machine.  my guess is that by whittling
it down, I could work it (perhaps doing it in chunks, and then
rbinding it), but this is painful.

in perl, I would define a hash with the day as key and the market
return as value, and then loop over the main data set to supplement
it.

is there a recommended way of doing such tasks in R, either super-fast
(so that I merge many many times) or space efficient (so that I merge
once and store the results)?
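For what it's worth, R's match() gives exactly the perl-hash pattern described above without materializing a merged frame (a sketch only; the column names are guesses from the post):

```r
# toy stand-ins for the two data sets described in the post
main <- data.frame(day = c(2, 1, 2, 3),
                   stockreturn = c(0.01, -0.02, 0.03, 0.00))
aggregate.data <- data.frame(day = 1:3,
                             marketreturn = c(0.005, -0.010, 0.002))

# hash-style lookup: one market return per row of main, keyed by day
main$marketreturn <- aggregate.data$marketreturn[match(main$day, aggregate.data$day)]
```

match() is vectorized and allocates only one extra column, so it should be much lighter than merge() on a 1.3GB table.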

sincerely,

/iaw


Ivo Welch (ivo.we...@gmail.com)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] convert apply to lappy

2011-10-09 Thread Joshua Wiley
Hi Alex,

If data is a matrix, probably the easiest option would be:

tips <- as.data.frame(data)

mclapply(tips, foo)

By the way, I would recommend not using 'data' (which is also a
function) as the name of the object storing your data.  If your data
set has many columns and performance is an issue I might convert it to
a list instead of a data frame.  Note that if you wanted the
equivalent of apply(tips, 1, foo), you could transpose your matrix
first:  as.data.frame(t(data)).  lapply works on columns of a data
frame because each column is basically an element of a list (list
apply).
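A minimal check of that equivalence on a toy matrix:

```r
m <- matrix(1:6, nrow = 2)                      # toy stand-in for the poster's matrix
a1 <- apply(m, 2, mean)                         # column means via apply
a2 <- unlist(lapply(as.data.frame(m), mean))    # the same via lapply over columns
all.equal(unname(a2), a1)                       # TRUE
```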

Cheers,

Josh

On Sun, Oct 9, 2011 at 8:47 AM, Alaios ala...@yahoo.com wrote:
 Dear all I want to convert an apply to lapply. The reason for that is that 
 there is a function mclapply that uses exactly the same format as the lapply 
 function.

 My code looks like that

 mean_power_per_tip <- function(data) {
     return((apply(data[,],2,MeanTip)));
 }

 where data is a [m,n] matrix.

 I would like to thank you in advance for your help

 B.R
 Alex

        [[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, ATS Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Multiple levelplot with title

2011-10-09 Thread Richard O. Legendi

Hi all,

I'm new to R and to the mailing list, so please bear with me :-)

I would like to create multiple levelplots on the same chart with a nice 
main title with something like this:


  print(levelplot(matrix(c(1,2,3,4), 2, 2)), split=c(1, 1, 2, 1))
  print(levelplot(matrix(c(1,2,3,4), 2, 2)), split=c(2, 1, 2, 1),
newpage=FALSE)

I found a trick:

  mtext("Test", outer = TRUE, cex = 1.5)

here:

  https://stat.ethz.ch/pipermail/r-help/2008-July/168163.html

but it doesn't work for me.

Could anyone please give me some pointers on what I should read in order to 
understand why this isn't working as I expect?


I managed to find a workaround using panel.text(), but I don't really 
like it since it requires fixed x/y coordinates and does not scale 
if the picture is resized.


panel.text(x=20, y=110, "Test")
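Another option worth trying (a sketch, not tested against the original layout): lattice draws with grid, so after printing the panels a title can be placed in "npc" units, which are fractions of the page and therefore survive resizing:

```r
library(lattice)
library(grid)

print(levelplot(matrix(c(1, 2, 3, 4), 2, 2)), split = c(1, 1, 2, 1))
print(levelplot(matrix(c(1, 2, 3, 4), 2, 2)), split = c(2, 1, 2, 1),
      newpage = FALSE)

# y = 0.97 of the page height, independent of the device size
grid.text("Test", y = unit(0.97, "npc"), gp = gpar(cex = 1.5))
```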

Thanks in advance!
Richard

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] ANOVA from imported data has only 1 degree of freedom

2011-10-09 Thread shardman
Hi,

I'm trying to analyse some data I have imported into R from a .csv file but
when I carry out the aov command the results show only one degree of freedom
when there should be 14. Does anyone know why? I'd really appreciate some
help, the data is pasted below.


/The imported table looks like this:/

   Order      Transect Sample Abundance
1  Coleoptera        1      1        13
2  Coleoptera        1      2        12
3  Coleoptera        1      3        11
4  Coleoptera        1      4        13
5  Coleoptera        1      5         6
6  Coleoptera        2      1        18
7  Coleoptera        2      2        18
8  Coleoptera        2      3        16
9  Coleoptera        2      4        21
10 Coleoptera        2      5        11
11 Coleoptera        3      1        19
12 Coleoptera        3      2        16
13 Coleoptera        3      3         9
14 Coleoptera        3      4        32
15 Coleoptera        3      5        29

/The command I am using is this:/

anova2 <- aov(Abundance ~ Transect, data = beetle)

/The results come out like this:/

            Df Sum Sq Mean Sq F value  Pr(>F)
Transect     1 250.00 250.000  7.2394 0.01852 *
Residuals   13 448.93  34.533
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Cheers,
Sam


--
View this message in context: 
http://r.789695.n4.nabble.com/ANOVA-from-imported-data-has-only-1-degree-of-freedom-tp3887528p3887528.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Permutation or Bootstrap to obtain p-value for one sample

2011-10-09 Thread Tim Hesterberg
I'll concur with Peter Dalgaard that 
* a permutation test is the right thing to do - your problem is equivalent
  to a two-sample test,
* don't bootstrap, and
* don't bother with t-statistics
but I'll elaborate a bit on on why, including 
* two approaches to the whole problem - and how your approach relates
  to the usual approach,
* an interesting tidbit about resampling t statistics.

First, I'm assuming that your x variable is irrelevant, only y matters,
and that sample1 is a proper subset of df.  I'll also assume that you
want to look for differences in the mean, rather than arbitrary differences
(in which case you might use e.g. a Kolmogorov-Smirnov test).

There are two general approaches to this problem:
(1) two-sample problem, sample1$y vs df$y[rows other than sample 1]
(2) the approach you outlined, thinking of sample1$y as part of df$y.

Consider (1), and call the two data sets y1 and y2
The basic permutation test approach is
* compute the test statistic theta(y1, y2), e.g. mean(y1)-mean(y2)
* repeat 10000 (or 9999) times:
  draw a sample of size n1 from the pooled data, call that Y1, call the rest Y2
  compute theta(Y1, Y2)
* P-value for a one-sided test is (1 + k) / (1 + 10000)
  where k is the number of permutation samples with theta(Y1,Y2) >= theta(y1,y2)

The test statistic could be
  mean(y1) - mean(y2)
  mean(y1)
  sum(y1)
  t-statistic (pooled variance)
  P-value for a t-test (pooled variance)
  mean(y1) - mean(pooled data)
  t-statistic (unpooled variance)
  P-value for a t-test (unpooled variance)
  median(y1) - median(y2)
  ...
The first six of those are equivalent - they give exactly the same P-value
for the permutation test.  The reason is that those test statistics
are monotone transformations of each other, given the data.
Hence, doing the pooled-variance t calculations gains nothing.

Now consider your approach (2).  That is equivalent to the permutation
test using the test statistic:  mean(y1) - mean(pooled data).

Why not a bootstrap?  E.g. pool the data and draw samples of size
n1 and n2 from the pooled data, independently and with replacement.
That is similar to the permutation test, but less accurate.  Probably
the easiest way to see this is to suppose there is 1 outlier in the pooled data.
In any permutation iteration there is exactly 1 outlier among the two samples.
With bootstrapping, there could be 0, 1, 2, ...
The permutation test answers the question - given that there is exactly
1 outlier in my combined data, what is the probability that random chance
would give a difference as large as I observed.  The bootstrap would
answer some other question.
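The two-sample procedure sketched above, with mean(y1) - mean(y2) as the statistic, might look like this in R (the function and argument names are mine, not from any package):

```r
perm_test <- function(y1, y2, R = 9999) {
  obs    <- mean(y1) - mean(y2)             # observed statistic
  pooled <- c(y1, y2)
  n1     <- length(y1)
  stats  <- replicate(R, {
    idx <- sample(length(pooled), n1)       # relabel: draw a new "sample 1"
    mean(pooled[idx]) - mean(pooled[-idx])
  })
  (1 + sum(stats >= obs)) / (1 + R)         # one-sided p-value
}
```

Swapping in any of the six equivalent statistics listed above leaves the p-value unchanged.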

Tim Hesterberg
NEW!  Mathematical Statistics with Resampling and R, Chihara & Hesterberg
http://www.amazon.com/Mathematical-Statistics-Resampling-Laura-Chihara/dp/1118029852/ref=sr_1_1?ie=UTF8
http://home.comcast.net/~timhesterberg
 (resampling, water bottle rockets, computers to Guatemala, shower = 2650 light 
bulbs, ...)


On Oct 8, 2011, at 16:04 , francy wrote:

 Hi,

 I am having trouble understanding how to approach a simulation:

 I have a sample of n=250 from a population of N=2,000 individuals, and I
 would like to use either permutation test or bootstrap to test whether this
 particular sample is significantly different from the values of any other
 random samples of the same population. I thought I needed to take random
 samples (but I am not sure how many simulations I need to do) of n=250 from
 the N=2,000 population and maybe do a one-sample t-test to compare the mean
 score of all the simulated samples, + the one sample I am trying to prove
 that is different from any others, to the mean value of the population. But
 I don't know:
 (1) whether this one-sample t-test would be the right way to do it, and how
 to go about doing this in R
 (2) whether a permutation test or bootstrap methods are more appropriate

 This is the data frame that I have, which is to be sampled:
 df <-
 i.e.
 x y
 1 2
 3 4
 5 6
 7 8
 . .
 . .
 . .
 2,000

 I have this sample from df, and would like to test whether it is has extreme
 values of y.
 sample1 <-
 i.e.
 x y
 3 4
 7 8
 . .
 . .
 . .
 250

 For now I only have this:

 R=999 #Number of simulations, but I don't know how many...
 t.values =numeric(R)  #creates a numeric vector with 999 elements, which
 will hold the results of each simulation.
 for (i in 1:R) {
 sample1 <- df[sample(nrow(df), 250, replace=TRUE),]

 But I don't know how to continue the loop: do I calculate the mean for each
 simulation and compare it to the population mean?
 Any help you could give me would be very appreciated,
 Thank you.

The straightforward way would be a permutation test, something like this

msamp <- mean(sample1$y)
mpop <- mean(df$y)
msim <- replicate(10000, mean(sample(df$y, 250)))

sum(abs(msim - mpop) >= abs(msamp - mpop))/10000

I don't really see a reason to do bootstrapping here. You say you want to test 
whether your sample could be a random sample from the population, so just 
simulate that sampling 

[R] variable name question

2011-10-09 Thread deepankar

Hi All,

This is surely an easy question but somehow I am not being able to get it.

I am using R 2.13.2 and have a data set where variable names like this 
appear:


pci1990, pci1991, ... , pci2009.

pci1990 has data on per capita income for 1990, pci1991 has data on 
per capita income for 1991, and so on.


I would like to create the logarithm of per capita for each of the year 
and could do so in STATA with the following commands:


forvalues number = 1990/2009 {
gen lpci`number' = log(pci`number')
}

What would be the corresponding set of commands in R?

Thanks a lot in advance.

Deepankar

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] cuhre usage ?? multidimensional integration

2011-10-09 Thread R. Michael Weylandt
Have you taken a look at the provided introductory materials? Type
help.start() to get "An Introduction to R", which I believe is provided
as part of every pre-packaged version of R, so it's most likely
already on your machine.

Read that and then we can sort through how to correctly implement your
function.

Michael

On Sat, Oct 8, 2011 at 7:28 PM, sevenfrost linshuan...@gmail.com wrote:

 I'm not quite familiar with R. Could you recommend some relevant material?

 Actually, I don't intend to loop from 1 to 1. I'm just trying to see if it
 works. It's supposed to be some len > 1.

 I'm trying to solve the following example:

 there are 3 types of variables(x,w,t), they are not independent and have
 different distribution.
 Within each type, there are more than 1 variables. I need to integrate all
 the variable.

 joint pdf is like f(x,w,t) = ∏_i [f1(w_i) f2(t_i)] (∏_j f3(x_j) ∏_j f4(x_j*w_i - t_i))


 --
 View this message in context: 
 http://r.789695.n4.nabble.com/cuhre-usage-multidimensional-integration-tp3873478p3886287.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Permutation or Bootstrap to obtain p-value for one sample

2011-10-09 Thread francesca casalino
Dear Peter and Tim,

Thank you very much for taking the time to explain this to me! It is much
more clear now.
And sorry for using the space here maybe inappropriately, I really hope this
is OK and gets posted, I think it is really important that non-statisticians
like myself get a good idea of the concepts behind the functions of R. I am
really grateful you went through this with me.

-f

2011/10/9 Tim Hesterberg timhesterb...@gmail.com

 I'll concur with Peter Dalgaard that
 * a permutation test is the right thing to do - your problem is equivalent
  to a two-sample test,
 * don't bootstrap, and
 * don't bother with t-statistics
 but I'll elaborate a bit on on why, including
 * two approaches to the whole problem - and how your approach relates
  to the usual approach,
 * an interesting tidbit about resampling t statistics.

 First, I'm assuming that your x variable is irrelevant, only y matters,
 and that sample1 is a proper subset of df.  I'll also assume that you
 want to look for differences in the mean, rather than arbitrary differences
 (in which case you might use e.g. a Kolmogorov-Smirnov test).

 There are two general approaches to this problem:
 (1) two-sample problem, sample1$y vs df$y[rows other than sample 1]
 (2) the approach you outlined, thinking of sample1$y as part of df$y.

 Consider (1), and call the two data sets y1 and y2
 The basic permutation test approach is
 * compute the test statistic theta(y1, y2), e.g. mean(y1)-mean(y2)
 * repeat 10000 (or 9999) times:
  draw a sample of size n1 from the pooled data, call that Y1, call the rest
 Y2
  compute theta(Y1, Y2)
 * P-value for a one-sided test is (1 + k) / (1 + 10000)
  where k is the number of permutation samples with theta(Y1,Y2) >=
 theta(y1,y2)

 The test statistic could be
  mean(y1) - mean(y2)
  mean(y1)
  sum(y1)
  t-statistic (pooled variance)
  P-value for a t-test (pooled variance)
  mean(y1) - mean(pooled data)
  t-statistic (unpooled variance)
  P-value for a t-test (unpooled variance)
  median(y1) - median(y2)
  ...
 The first six of those are equivalent - they give exactly the same P-value
 for the permutation test.  The reason is that those test statistics
 are monotone transformations of each other, given the data.
 Hence, doing the pooled-variance t calculations gains nothing.

 Now consider your approach (2).  That is equivalent to the permutation
 test using the test statistic:  mean(y1) - mean(pooled data).

 Why not a bootstrap?  E.g. pool the data and draw samples of size
 n1 and n2 from the pooled data, independently and with replacement.
 That is similar to the permutation test, but less accurate.  Probably
 the easiest way to see this is to suppose there is 1 outlier in the pooled
 data.
 In any permutation iteration there is exactly 1 outlier among the two
 samples.
 With bootstrapping, there could be 0, 1, 2, ...
 The permutation test answers the question - given that there is exactly
 1 outlier in my combined data, what is the probability that random chance
 would give a difference as large as I observed.  The bootstrap would
 answer some other question.

 Tim Hesterberg
 NEW!  Mathematical Statistics with Resampling and R, Chihara & Hesterberg

 http://www.amazon.com/Mathematical-Statistics-Resampling-Laura-Chihara/dp/1118029852/ref=sr_1_1?ie=UTF8
 http://home.comcast.net/~timhesterberg
  (resampling, water bottle rockets, computers to Guatemala, shower = 2650
 light bulbs, ...)


 On Oct 8, 2011, at 16:04 , francy wrote:
 
  Hi,
 
  I am having trouble understanding how to approach a simulation:
 
  I have a sample of n=250 from a population of N=2,000 individuals, and I
  would like to use either permutation test or bootstrap to test whether
 this
  particular sample is significantly different from the values of any
 other
  random samples of the same population. I thought I needed to take random
  samples (but I am not sure how many simulations I need to do) of n=250
 from
  the N=2,000 population and maybe do a one-sample t-test to compare the
 mean
  score of all the simulated samples, + the one sample I am trying to
 prove
  that is different from any others, to the mean value of the population.
 But
  I don't know:
  (1) whether this one-sample t-test would be the right way to do it, and
 how
  to go about doing this in R
  (2) whether a permutation test or bootstrap methods are more appropriate
 
  This is the data frame that I have, which is to be sampled:
  df <-
  i.e.
  x y
  1 2
  3 4
  5 6
  7 8
  . .
  . .
  . .
  2,000
 
  I have this sample from df, and would like to test whether it is has
 extreme
  values of y.
  sample1 <-
  i.e.
  x y
  3 4
  7 8
  . .
  . .
  . .
  250
 
  For now I only have this:
 
  R=999 #Number of simulations, but I don't know how many...
  t.values =numeric(R)  #creates a numeric vector with 999 elements, which
  will hold the results of each simulation.
  for (i in 1:R) {
  sample1 <- df[sample(nrow(df), 250, replace=TRUE),]
 
  But I don't know how to continue 

Re: [R] Substract previous element

2011-10-09 Thread Alaios
Thanks a lot for the help




From: Jeff Newmiller jdnew...@dcn.davis.ca.us

Sent: Sunday, October 9, 2011 6:35 PM
Subject: Re: [R] Substract previous element


Vectors are not matrices in R, though matrices are special cases of vectors.

See ?diff for a solution that works with vectors.
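A quick sketch of that, with one note: the expected output in the original post, c(0,1,2,3,4), is x minus its *first* element, whereas "subtract the previous element" is what diff() computes:

```r
x <- c(1, 2, 3, 4, 5)

c(0, diff(x))   # each value minus the previous one, padded to length 5
x - x[1]        # reproduces the c(0, 1, 2, 3, 4) shown in the post
```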
---
Jeff Newmiller               The .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.



Dear all,
I have a matrix with data and I want to subtract from every value the 
previous element.

Let's assume that my vector (matrix) is c <- c(1,2,3,4,5)
I want to get remove_previous <- c(0,1,2,3,4).

How I can do that efficiently in R?

I would like to thank you in advance for your help
B.R
Alex

[[alternative HTML version deleted]]



R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] apply to a matrix and insert in the middle of an array

2011-10-09 Thread Maas James Dr (MED)
If possible I'd like to produce a function that applies a formula to a column 
in a matrix (essentially calculating the mse) and then inserts it between 
values of an array ...

confusing I know, here is an example of what I'm trying to accomplish:

## create a matrix
(a - matrix(c(3,6,4,8,5,9,12,15),nrow=4))
## get the mean of all columns
(b - apply(a,2,mean))
## calculate the mse of column 1
(c - (sd(a[,1])/sqrt(length(a[,1]
## calculate the mse of column 2
(d - (sd(a[,2])/sqrt(length(a[,2]
## now create a new vector with each mean value and its corresponding
## mse right beside it
(e - c(b[1],c,b[2],d))

Would anyone have any suggestions how to accomplish this using an apply 
statement?
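One way to get the interleaved vector in a single apply call (a sketch using the example data above; note this computes sd/sqrt(n), i.e. the standard error, matching the formulas in the post):

```r
a <- matrix(c(3, 6, 4, 8, 5, 9, 12, 15), nrow = 4)

# each column yields c(mean, se); c() then flattens column by column,
# so every mean is followed immediately by its standard error
e <- c(apply(a, 2, function(x) c(mean(x), sd(x) / sqrt(length(x)))))
```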

Thanks a bunch,

J
===
Dr. Jim Maas
University of East Anglia

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ANOVA from imported data has only 1 degree of freedom

2011-10-09 Thread David Winsemius


On Oct 9, 2011, at 11:10 AM, shardman wrote:


Hi,

I'm trying to analyse some data I have imported into R from a .csv  
file but
when I carry out the aov command the results show only one degree of  
freedom
when there should be 14. Does anyone know why? I'd really appreciate  
some

help, the data is pasted below.



To me this appears likely to be homework. If so then please read your  
college's rules regarding  academic honesty. In any case, please read  
the Posting Guide.




/The imported table looks like this:/

   Order      Transect Sample Abundance
1  Coleoptera        1      1        13
2  Coleoptera        1      2        12
3  Coleoptera        1      3        11
4  Coleoptera        1      4        13
5  Coleoptera        1      5         6
6  Coleoptera        2      1        18
7  Coleoptera        2      2        18
8  Coleoptera        2      3        16
9  Coleoptera        2      4        21
10 Coleoptera        2      5        11
11 Coleoptera        3      1        19
12 Coleoptera        3      2        16
13 Coleoptera        3      3         9
14 Coleoptera        3      4        32
15 Coleoptera        3      5        29

/The command I am using is this:/

anova2 <- aov(Abundance ~ Transect, data = beetle)

/The results come out like this:/

            Df Sum Sq Mean Sq F value  Pr(>F)
Transect     1 250.00 250.000  7.2394 0.01852 *
Residuals   13 448.93  34.533
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Cheers,
Sam


--
View this message in context: 
http://r.789695.n4.nabble.com/ANOVA-from-imported-data-has-only-1-degree-of-freedom-tp3887528p3887528.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Problem with twitteR package

2011-10-09 Thread Steven Oliver
Hey Guys,

I just started fooling around with the twitteR package in order to get a record 
of all tweets from a single public account. When I run userTimeline, I get the 
default 20 most recent tweets just fine. However, when I specify an arbitrary 
number of tweets (as described in the documentation from June 14th, 2011), I 
get the following warning:

> bjaTweets <- userTimeline("BeijingAir", n=50)
Warning message:
In mapCurlOptNames(names(.els), asNames = TRUE) :
  Unrecognized CURL options: n

Does anyone familiar with the twitteR package know what is going on with 
options? Alternatively, if there are any other simple means for getting this 
sort of data?

Steve
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fast or space-efficient lookup?

2011-10-09 Thread Patrick Burns

I think you are looking for the 'data.table'
package.

On 09/10/2011 17:31, ivo welch wrote:

Dear R experts---I am struggling with memory and speed issues.  Advice
would be appreciated.

I have a long data set (of financial stock returns, with stock name
and trading day).  All three variables, stock return, id and day, are
irregular.  About 1.3GB in object.size (200MB on disk).  now, I need
to merge the main data set with some aggregate data (e.g., the SP500
market rate of return, with a day index) from the same day.  this
market data set is not a big data set (object.size=300K, 5 columns,
12000 rows).

let's say my (dumb statistical) plan is to run one grand regression,
where the individual rate of return is y and the market rate of return
is x.  the following should work without a problem:

combined <- merge( main, aggregate.data, by="day", all.x=TRUE, all.y=FALSE )
lm( stockreturn ~ marketreturn, data=combined )

alas, the merge is neither space-efficient nor fast.  in fact, I run
out of memory on my 16GB linux machine.  my guess is that by whittling
it down, I could work it (perhaps doing it in chunks, and then
rbinding it), but this is painful.

in perl, I would define a hash with the day as key and the market
return as value, and then loop over the main data set to supplement
it.

is there a recommended way of doing such tasks in R, either super-fast
(so that I merge many many times) or space efficient (so that I merge
once and store the results)?

sincerely,

/iaw


Ivo Welch (ivo.we...@gmail.com)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Patrick Burns
pbu...@pburns.seanet.com
twitter: @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of 'Some hints for the R beginner'
and 'The R Inferno')

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] two-dimensional Kolmogorov-Smirnow (2DKS) test

2011-10-09 Thread Beaulieu . Jake
The reference below describes the utility of the two-dimensional 
Kolmogorov-Smirnov (2DKS) test for detecting relationships in bivariate 
data.  If this test has been implemented in R I would love to know about 
it!

Thanks,
jake


Garvey, J. E., E.A. Marschall, and R.A. Wright (1998). From star charts 
to stoneflies: detecting relationships in continuous bivariate data. 
Ecology 79(2): 442-447.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] apply to a matrix and insert in the middle of an array

2011-10-09 Thread David Winsemius


On Oct 9, 2011, at 1:04 PM, Maas James Dr (MED) wrote:

If possible I'd like to produce a function that applies a formula to  
a column in a matrix (essentially calculating the mse) and then  
inserts it between values of a an array ...


confusing I know, here is an example of what I'm trying to accomplish:

## create a matrix
(a - matrix(c(3,6,4,8,5,9,12,15),nrow=4))
## get the mean of all columns
(b - apply(a,2,mean))
## calculate the mse of column 1
(c - (sd(a[,1])/sqrt(length(a[,1]
## calculate the mse of column 2
(d - (sd(a[,2])/sqrt(length(a[,2]


The sd function, like the var function, returns its values based on  
columns, so sd(a) gives you a vector of column standard deviations.  
Constructing an interleaving vector of column indices can be done with  
matrix manipulations:


c(b, sd(a))[
  c(
    matrix(1:(2*length(b)), 2, byrow=TRUE)  # create a row-oriented matrix
  )  # the interleaving matrix is then extracted as a vector by columns
]
# [1]  5.250000  2.217356 10.250000  4.272002

(You could also use the apply method, returning function(x)  
{ c(mean(x), sd(x)) } from each column, and then wrap the c() function  
around that:)


c( apply(a, 2, function(x){ c(mean(x), sd(x)) }) )
# [1]  5.250000  2.217356 10.250000  4.272002


## now create a new vector with each mean value and its corresponding
## mse right beside it
(e <- c(b[1], c, b[2], d))

Would anyone have any suggestions how to accomplish this using an  
apply statement?


Thanks a bunch,

J
===
Dr. Jim Maas
University of East Anglia


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] variable name question

2011-10-09 Thread Carlos Ortega
Hello,

In R you just need to take the log of the whole data.frame where you
have your pci* columns and store it in a new variable.
You do not need a for loop:

log.df <- log(your_data_frame)
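One caveat worth adding: if the frame also contains non-pci columns (an id or year, say), log() on the whole frame will fail or transform too much, so one might restrict it first (the data and names below are illustrative, not from the post):

```r
# illustrative frame with an id column mixed in
your_data_frame <- data.frame(id = 1:2,
                              pci1990 = c(100, 200),
                              pci1991 = c(110, 210))

pci.cols <- grep("^pci", names(your_data_frame), value = TRUE)  # just the pci* columns
log.df <- log(your_data_frame[pci.cols])
names(log.df) <- sub("^pci", "lpci", names(log.df))             # pci1990 -> lpci1990
```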

Regards,
Carlos Ortega
www.qualityexcellence.es


On Sun, Oct 9, 2011 at 5:34 PM, deepankar db...@econs.umass.edu wrote:

 Hi All,

 This is surely an easy question but somehow I am not being able to get it.

 I am using R 2.13.2 and have a data set where variable names like this
 appear:

 pci1990, pci1991, ... , pci2009.

 pci1990 has data on per capita income for 1990, pci1991 has data on per
 capita income for 1991, and so on.

 I would like to create the logarithm of per capita for each of the year and
 could do so in STATA with the following commands:

 forvalues number = 1990/2009 {
gen lpci`number' = log(pci`number')
 }

 What would be the corresponding set of commands in R?

 Thanks a lot in advance.

 Deepankar



Re: [R] two-dimensional Kolmogorov-Smirnow (2DKS) test

2011-10-09 Thread David Winsemius


On Oct 9, 2011, at 1:46 PM, beaulieu.j...@epamail.epa.gov wrote:


The reference below describes the utility of the two-dimensional
Kolmogorov-Smirnow (2DKS) test for detecting relationships in  
bivariate
data.  If this test has been implemented in R I would love to know  
about

it!



I have not been able to find an R implementation. (Tried RSiteSearch  
and RSeek with "2d kolmogorov" and pursued promising links and  
citations.) This 2005 article suggests that it is not a simple (or  
even a well-defined) problem:


http://bura.brunel.ac.uk/bitstream/2438/1166/1/acat2007.pdf

This effort at a parallelized solution reports little improvement,  
perhaps supporting the Lopes et al. conclusion that the Cooke  
algorithm is not O(n*log(n)) after all.


And this is a page with a link to some Matlab code:

http://www.subcortex.net/research/code/testing-for-differences-in-multidimensional-distributions



Thanks,
jake


Garvey, J. E., E.A. Marschall, and R.A. Wright (1998). From star  
charts

to stoneflies: detecting relationships in continuous bivariate data.
Ecology 79(2): 442-447.


--

David Winsemius, MD
West Hartford, CT



Re: [R] variable name question

2011-10-09 Thread David Winsemius


On Oct 9, 2011, at 2:24 PM, Carlos Ortega wrote:


Hello,

In R you just need to take the log of the whole data.frame  
where you

have your pci* columns and store it in a new variable.
You do not need to use a for loop:

log.df <- log(your_data_frame)


Possibly with a selection for the column names in question:

log.pci.df <- log( your_data_frame[ , grep("^pci",  
names(your_data_frame)) ] )


You can also cbind() your identifiers and covariates columns to the  
result.
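Putting the pieces together, a minimal sketch (toy data frame and column names assumed, not from the thread) that also reproduces the STATA-style lpci* names:

```r
## toy data frame standing in for the poster's data
df <- data.frame(id = 1:2, pci1990 = c(100, 200), pci1991 = c(110, 210))
## select the pci* columns, log them, and rename with an lpci prefix
pci.cols <- grep("^pci", names(df))
log.pci.df <- log(df[, pci.cols])
names(log.pci.df) <- sub("^pci", "lpci", names(log.pci.df))
## bind identifiers and the new columns back together
df <- cbind(df, log.pci.df)
names(df)
#[1] "id"       "pci1990"  "pci1991"  "lpci1990" "lpci1991"
```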


--
David.


Regards,
Carlos Ortega
www.qualityexcellence.es


On Sun, Oct 9, 2011 at 5:34 PM, deepankar db...@econs.umass.edu  
wrote:



Hi All,

This is surely an easy question but somehow I am not being able to  
get it.


I am using R 2.13.2 and have a data set where variable names like  
this

appear:

pci1990, pci1991, ... , pci2009.

pci1990 has data on per capita income for 1990, pci1991 has  
data on per

capita income for 1991, and so on.

I would like to create the logarithm of per capita for each of the  
year and

could do so in STATA with the following commands:

forvalues number = 1990/2009 {
  gen lpci`number' = log(pci`number')
}

What would be the corresponding set of commands in R?

Thanks a lot in advance.

Deepankar



David Winsemius, MD
West Hartford, CT



Re: [R] ANOVA from imported data has only 1 degree of freedom

2011-10-09 Thread shardman
Hi David,

Thanks for your message.

I can assure you this is not homework. I'm working on an ecology project and
am trying to analyse the results from the fieldwork. I don't want other
people to do the work for me; I was just hoping someone might be able to spot
where I have made a mistake. I'm still teaching myself to use R after having
used SPSS for a long time.

I did read the posting guidelines but couldn't see any reason not to ask for
help. I'm sorry if I've got it wrong,
Yours,
Sam

--
View this message in context: 
http://r.789695.n4.nabble.com/ANOVA-from-imported-data-has-only-1-degree-of-freedom-tp3887528p3887979.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] barplots

2011-10-09 Thread Carlos Ortega
Hi,

Another way to do that is with the function barchart() in the lattice package.
barchart() takes a formula relating your variables, with an option to
specify groups.
Check the examples (they are under the xyplot help) to apply them to your case.

Regards,
Carlos Ortega
www.qualityexcellence.es


On Thu, Oct 6, 2011 at 11:16 PM, Daniel Winkler winkler...@gmail.com wrote:

 Hello,

 I have somewhat of a weird data set and am attempting to create a barplot
 with it.

 I have 8 columns with different variables and their percentages. I have 1
 column with representations of 4 different treatments the variables
 undergo.
 I also have 1 column with year the data was recorded. I want to create a
 bar
 plot that plots all 8 variables grouped by treatment and year.

 I've tried creating subsets of each of the variables, no luck.

 My variables are arranged similar to this:

 Vascular.plants Solid.rock Gravel.and.cobbles
            72.5        1.0                  5
            67.5        2.0                  6
            67.5        2.5                  9
            59.0        2.0                 25



Re: [R] ANOVA from imported data has only 1 degree of freedom

2011-10-09 Thread David Winsemius


On Oct 9, 2011, at 2:08 PM, shardman wrote:


Hi David,

Thanks for your message.

I can assure you this is not homework. I'm working on an ecology  
project and
am trying to analyse the results from the fieldwork. I don't want  
other
people to do the work for me I was just hoping someone might be able  
to spot
where I have made a mistake, I'm still teaching myself to use R  
after having

used SPSS for a long time.

I did read the posting guidelines but couldn't see any reason not to  
ask for

help, I'm sorry if I've got it wrong,


Please read the Posting Guide again. Nabble is not the standard venue  
where people read this list, and you seem to have missed the part where  
the PG asked you to include relevant context in replies. It also  
asks you to describe your scientific domain (which you have now done)  
and academic position (at least in part so the list can remain  
non-homework oriented).


My memory from your initial posting was that you had numeric  
identifiers for your groups and did not wrap the group variable name in  
factor(), so it was treated as a continuous variable. Doesn't SPSS  
have some mechanism to flag a variable as discrete?


Perhaps (from memory) something along these lines:
aov(outcome ~ factor(group), data=dat)

(Further comments from memory of earlier posting.)
You also were asking why there were not 14 df in the output but there  
were since 1+13 = 14. Surely you were not expecting 14 df to be  
associated with the grouping variable when there were only 3 groups?





Yours,
Sam

--
View this message in context: 
http://r.789695.n4.nabble.com/ANOVA-from-imported-data-has-only-1-degree-of-freedom-tp3887528p3887979.html
Sent from the R help mailing list archive at Nabble.com.



David Winsemius, MD
West Hartford, CT



Re: [R] Multiple levelplot with title

2011-10-09 Thread Carlos Ortega
Hi,

Use function ltext() instead, also available in lattice package.

Regards,
Carlos Ortega
www.qualityexcellence.es
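Since lattice draws through the grid engine, another option (a sketch of my own, not from the thread, assuming the two-panel split layout from the question) is to add the overall title with grid.text() after both panels are printed:

```r
library(lattice)
library(grid)
## print the two panels side by side, as in the question
print(levelplot(matrix(c(1, 2, 3, 4), 2, 2)), split = c(1, 1, 2, 1))
print(levelplot(matrix(c(1, 2, 3, 4), 2, 2)), split = c(2, 1, 2, 1),
      newpage = FALSE)
## draw a main title near the top of the whole page; npc units
## rescale with the device, unlike fixed panel coordinates
grid.text("Test", y = unit(0.98, "npc"), gp = gpar(cex = 1.5))
```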


On Sun, Oct 9, 2011 at 6:30 PM, Richard O. Legendi 
richard.lege...@gmail.com wrote:

 Hi all,

 I'm new to R and to the mailing list, so please bear with me :-)

 I would like to create multiple levelplots on the same chart with a nice
 main title with something like this:

  print(levelplot(matrix(c(1,2,3,4), 2, 2)), split=c(1, 1, 2, 1))
  print(levelplot(matrix(c(1,2,3,4), 2, 2)), split=c(2, 1, 2, 1),
newpage=FALSE)

 I found a trick:

  mtext("Test", outer = TRUE, cex = 1.5)

 here:

  
 https://stat.ethz.ch/pipermail/r-help/2008-July/168163.html

 but it doesn't work for me.

 Could anyone please show me some pointers on what I should read in order to
 get an insight into why this isn't working as I expect?

 What I managed was to find a workaround using panel.text(), but I don't
 really like it since it requires fixed x/y coordinates and does not scale if
 the picture is resized.

panel.text(x=20, y=110, "Test")

 Thanks in advance!
 Richard



Re: [R] Bartlett's Test of Sphericity

2011-10-09 Thread Fernando Henrique Ferraz Pereira da Rosa
You could also check this function I implemented a while back:

http://www.fernandohrosa.com.br/en/P/sphericity-test-for-covariance-matrices-in-r-sphericity-test/

On Fri, Jun 17, 2011 at 4:43 PM, thibault grava thibault.gr...@gmail.com wrote:

 Hello Dear R user,

 I want to conduct a principal components analysis and I need to run two
 tests to check whether I can do it or not. I found how to run the KMO
 test; however, I cannot find an R function for Bartlett's test of
 sphericity. Does somebody know if it exists?

 Thanks for your help!

 Thibault





-- 
Though this be randomness, yet there is structure in't.
Fernando H Rosa - Statistician
http://www.fernandohrosa.com.br / http://www.feferraz.net - Estatística,
Matemática e Computação
BankReview.com.br http://www.bankreview.com.br/ - Escolha melhor seus
serviços financeiros!
AprendaAlemao.net http://aprendaalemao.net/ - Seu ponto de partida para
melhorar seu Alemão!
@fhrosa



Re: [R] Connecting points over missing observations in Lattice

2011-10-09 Thread Gabor Grothendieck
On Sat, Oct 8, 2011 at 5:59 AM, Allan Sikk a.s...@ucl.ac.uk wrote:
 Hello,

 I'm trying to plot connected time series of two variables in a lattice plot:
 xyplot(y1 + y2 ~ t,  data=size, type="b")

 y2 has missing data for some of the observations and some points are
 therefore not connected. It would make theoretical sense to connect the
 points - is there a way of doing that? (Without filling in the observations
 using package 'zoo'.)


Break it up into lines and points and draw each separately (first
three approaches) or use a custom panel (approach 4):

# set up data
library(zoo)
library(lattice)
DF <- with(anscombe, data.frame(x = 1:11, y1, y2))
DF[2, 2] <- DF[3, 3] <- NA
z <- read.zoo(DF, FUN = identity)

# approach 1
xyplot(cbind(na.approx(z), z), type = list("l", "l", "p", "p"), screen
= 1, col = 1:2)

# approach 2
xyplot(na.approx(z), screen = 1, col = 1:2)
trellis.focus()
panel.points(z[, 1], col = 1)
panel.points(z[, 2], col = 2)
trellis.unfocus()

# approach 3
xyplot(z, screen = 1, type = "n")
trellis.focus()
panel.points(na.omit(z[, 1]), col = 1, type = "o")
panel.points(na.omit(z[, 2]), col = 2, type = "o")
trellis.unfocus()

# approach 4
xyplot(y1 + y2 ~ x, DF, panel = panel.superpose,
   panel.groups = function(x, y, subscripts, groups, ..., group.number)
  with(na.omit(data.frame(x, y)), panel.lines(x, y, type = "o", col
= group.number)))

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] expression set (Bioconductor) problem

2011-10-09 Thread Juliet Hannah
Note that exprs returns a matrix, so we can manipulate that just as we
would for any other type of matrix. There is also a Bioconductor
mailing list, which may be helpful.
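Since exprs() returns a plain matrix, the averaging asked about below is ordinary matrix work. A toy sketch with made-up matrices standing in for exprs(present.10) and exprs(present.48):

```r
## two toy expression matrices, 3 features x 2 samples each
e10 <- matrix(1:6,  nrow = 3,
              dimnames = list(paste0("g", 1:3), c("s1", "s2")))
e48 <- matrix(7:12, nrow = 3,
              dimnames = list(paste0("g", 1:3), c("s3", "s4")))
## average across all four samples: one value per feature
one.sample <- rowMeans(cbind(e10, e48))
one.sample
# g1  g2  g3
#5.5 6.5 7.5
```

The result could presumably be wrapped back into an ExpressionSet with Biobase's constructor (something like ExpressionSet(assayData = cbind(mean = one.sample))), but check the Biobase documentation for the exact signature.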

On Thu, Oct 6, 2011 at 4:56 AM, Clayton K Collings ccoll...@purdue.edu wrote:
 Hello R people,

dim(exprs(estrogenrma))

 I have an expressionSet with 8 samples and 12695 features (genes)


 estrogenrma$estrogen
  present present absent absent present present absent absent
 estrogenrma$time.h
  10 10 10 10 48 48 48 48

 present <- grep("present", as.character(estrogenrma$estrogen))
 absent  <- grep("absent", as.character(estrogenrma$estrogen))
 ten <- grep("10", as.character(estrogenrma$time.h))
 fortyeight  <- grep("48", as.character(estrogenrma$time.h))

 present.10 <- estrogenrma[, intersect(present, ten)]
 present.48 <- estrogenrma[, intersect(present, fortyeight)]
 absent.10 <- estrogenrma[, intersect(absent, ten)]
 absent.48 <- estrogenrma[, intersect(absent, fortyeight)]


 present.10, present.48, absent.10, and absent.48 are four expression sets
 with two samples and 12695 features.

 How can I make 2 new expression sets, each with 12695 features and one 
 sample,
 where
 expressionset1 = (present.10 + present.48) / 2
 expressionset2 = (absent.10 + absent.48) / 2
 ?

 Thanks,
 Clayton

 - Original Message -
 From: Tal Galili tal.gal...@gmail.com
 To: SML paral...@lafn.org
 Cc: r-help@r-project.org
 Sent: Thursday, October 6, 2011 4:09:43 AM
 Subject: Re: [R] Mean(s) from values in different row?

 One way for doing it would be to combine the columns using paste and then
 use tapply to get the means.

 For example:

 set.seed(32341)
 a1 = sample(c("a","b"), 100, replace = TRUE)
 a2 = sample(c("a","b"), 100, replace = TRUE)
 y = rnorm(100)
 tapply(y, paste(a1, a2), mean)
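For the specific winter-months layout in the quoted question, a hedged sketch along the same lines (toy matrix and month names are my assumptions):

```r
set.seed(1)
## one row per year, one column per month, as in the question
m <- matrix(rnorm(36), nrow = 3,
            dimnames = list(paste0("YEAR-", 1:3), month.abb))
## mean of Dec(year i) with Jan and Feb of year i+1
winter <- sapply(1:(nrow(m) - 1), function(i)
  mean(c(m[i, "Dec"], m[i + 1, c("Jan", "Feb")])))
winter  # one value per Dec/Jan/Feb transition
```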



 Contact
 Details:---
 Contact me: tal.gal...@gmail.com |  972-52-7275845
 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
 www.r-statistics.com (English)
 --




 On Thu, Oct 6, 2011 at 8:40 AM, SML paral...@lafn.org wrote:

 Hello:

 Is there a way to get a mean from values stored in different rows?

 The data looks like this:
  YEAR-1, JAN, FEB, ..., DEC
  YEAR-2, JAN, FEB, ..., DEC
  YEAR-3, JAN, FEB, ..., DEC

 What I want is the mean(s) for just the consecutive winter months:
  YEAR-1.DEC, YEAR-2.JAN, YEAR-2.FEB
  YEAR-2.DEC, YEAR-3.JAN, YEAR-3.FEB
  etc.

 Thanks.



[R] Mittag-Leffler function

2011-10-09 Thread Eric Ferreira
Dear colleagues,

Do you know any R code implemented for the Mittag-Leffler function
(generalisation of the exponential function, fractional calculus) ?

Thanks in advance,

-- 
Dr Eric B Ferreira
Exact Sciences Department
Federal University of Alfenas
Brazil



Re: [R] variable name question

2011-10-09 Thread Rolf Turner

On 10/10/11 04:53, R. Michael Weylandt wrote:

SNIP

Try this:

for (i in 1990:2009) {
  varName = paste("pci", i, collapse = "")
  assign(varName, log(get(varName)))
}


SNIP

I believe that ``sep=""'' is needed here rather than collapse.

Try:
paste("junk", 42, collapse="")

You get

[1] "junk 42"

with a space in it.  Here paste is using the default value of sep, 
namely " ",

and is not using collapse at all, since there is nothing to collapse (the
value is a scalar to start with).

Whereas

paste("junk", 42, sep="")

gives

[1] "junk42"

which is what is wanted.
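As a side note (my addition, not part of Rolf's reply), sprintf() is an equivalent base-R way to build such names:

```r
## numeric formatting into a template; equivalent to paste(..., sep = "")
sprintf("pci%d", 1990:1992)
#[1] "pci1990" "pci1991" "pci1992"
```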

cheers,

Rolf Turner

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] variable name question

2011-10-09 Thread Deepankar Basu
Thanks for both comments. Indeed the sep = "" is needed.

On Sun, Oct 9, 2011 at 9:43 PM, Rolf Turner rolf.tur...@xtra.co.nz wrote:

 On 10/10/11 04:53, R. Michael Weylandt wrote:

 SNIP

  Try this:

  for (i in 1990:2009) {
  varName = paste("pci", i, collapse = "")
  assign(varName, log(get(varName)))
  }

  SNIP

 I believe that ``sep=""'' is needed here rather than collapse.

 Try:
paste("junk", 42, collapse="")

 You get

[1] "junk 42"

 with a space in it.  Here paste is using the default value of sep, namely 
 " ",
 and is not using collapse at all, since there is nothing to collapse (the
 value is a scalar to start with).

 Whereas

paste("junk", 42, sep="")

 gives

[1] "junk42"

 which is what is wanted.

cheers,

Rolf Turner




Re: [R] ANOVA from imported data has only 1 degree of freedom

2011-10-09 Thread Ben Bolker
shardman samuelhardman at hotmail.com writes:

 
 Hi,
 
 I'm trying to analyse some data I have imported into R from a .csv file but
 when I carry out the aov command the results show only one degree of freedom
 when there should be 14. Does anyone know why? I'd really appreciate some
 help, the data is pasted below.
 
 /The imported table looks ike this this:/
 
 Order Transect Sample Abundance
 1  Coleoptera1  113
[snip]
 15 Coleoptera3  529
 
 /The command I am using is this:/
 
 anova2 <- aov(Abundance ~ Transect, data=beetle)
 

I was going to tell you to read up on the distinction between
factors and numeric variables in statistical models, but I can't
immediately find a reference online (this is bound to be
in Peter Dalgaard's book or any other basic-stats-with-R
text).

str() will tell you whether the columns are numeric or factors.

Try

beetle <- transform(beetle, Transect=factor(Transect))

[or ?read.csv and use colClasses explicitly to specify
the classes of the columns]

before running your anova.
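A toy illustration (made-up data, not the poster's) of the df difference:

```r
set.seed(42)
## 3 transects coded numerically, 5 samples each
beetle <- data.frame(Transect = rep(1:3, each = 5),
                     Abundance = rpois(15, 10))
## numeric Transect: fitted as a single slope, so only 1 df
anova(aov(Abundance ~ Transect, data = beetle))
## factor Transect: 3 groups, so 2 df
anova(aov(Abundance ~ factor(Transect), data = beetle))
```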



Re: [R] fast or space-efficient lookup?

2011-10-09 Thread ivo welch
hi patrick.  thanks.  I think you are right..

combined <- merge( main, aggregate.data, by="day", all.x=TRUE, all.y=FALSE )
lm( stockreturn ~ marketreturn, data=combined )

becomes something like

main <- as.data.table(main)
setkey(main, mmdd)
aggregate.data <- as.data.table(aggregate.data)
setkey(aggregate.data, mmdd)
main <- main[ aggregate.data ]

this is fast and memory efficient.
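The perl-style hash lookup mentioned in the quoted post also has a base-R counterpart: an environment keyed by day (a sketch with made-up dates and values):

```r
## environments are hash tables in R
mkt <- new.env(hash = TRUE)
assign("2011-10-07", 0.005, envir = mkt)
assign("2011-10-10", 0.012, envir = mkt)
## look up the market return for each observation's day
days <- c("2011-10-07", "2011-10-10", "2011-10-07")
marketreturn <- vapply(days, function(d) get(d, envir = mkt), numeric(1))
marketreturn
#2011-10-07 2011-10-10 2011-10-07
#     0.005      0.012      0.005
```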

for me, on the plus side, data.table is a data.frame, so it can be
used easily elsewhere.  on the plus side, data.table is
super-efficient.  on the minus side, data.table often has very strange
syntax.  for example, main[aggregate.data] is counterintuitive.
passing of functions in a slot that should be a tensor index is also
strange.  I would much prefer it if all non-tensor functionality was
in functions, and not in arguments following [ ].

I have written this before:  Given that applied end users of
statistics typically use data.frame as their main container for data
sets, data.frame should be as efficient and tuned as possible.  cell
assignments should be fast.  indexing and copying should be fast.  it
would give R a whole lot more appeal.  the functionality in data.table
should be core functionality, not requiring an add-on with strange
syntax.  just my 5 cents...of course, the R developers are saints,
putting in a lot of effort with no compensation, so complaining is unfair.
 and thanks to Matthew Dowle for writing data.table, without which I
couldn't do this AT ALL.

regards,

/iaw


Ivo Welch (ivo.we...@gmail.com)





On Sun, Oct 9, 2011 at 10:42 AM, Patrick Burns pbu...@pburns.seanet.com wrote:
 I think you are looking for the 'data.table'
 package.

 On 09/10/2011 17:31, ivo welch wrote:

 Dear R experts---I am struggling with memory and speed issues.  Advice
 would be appreciated.

 I have a long data set (of financial stock returns, with stock name
 and trading day).  All three variables, stock return, id and day, are
 irregular.  About 1.3GB in object.size (200MB on disk).  now, I need
 to merge the main data set with some aggregate data (e.g., the SP500
 market rate of return, with a day index) from the same day.  this
 market data set is not a big data set (object.size=300K, 5 columns,
 12000 rows).

 let's say my (dumb statistical) plan is to run one grand regression,
 where the individual rate of return is y and the market rate of return
 is x.  the following should work without a problem:

  combined <- merge( main, aggregate.data, by="day", all.x=TRUE, all.y=FALSE
  )
 lm( stockreturn ~ marketreturn, data=combined )

 alas, the merge is neither space-efficient nor fast.  in fact, I run
 out of memory on my 16GB linux machine.  my guess is that by whittling
 it down, I could work it (perhaps doing it in chunks, and then
 rbinding it), but this is painful.

 in perl, I would define a hash with the day as key and the market
 return as value, and then loop over the main data set to supplement
 it.

 is there a recommended way of doing such tasks in R, either super-fast
 (so that I merge many many times) or space efficient (so that I merge
 once and store the results)?

 sincerely,

 /iaw

 
 Ivo Welch (ivo.we...@gmail.com)



 --
 Patrick Burns
 pbu...@pburns.seanet.com
 twitter: @portfolioprobe
 http://www.portfolioprobe.com/blog
 http://www.burns-stat.com
 (home of 'Some hints for the R beginner'
 and 'The R Inferno')




Re: [R] fast or space-efficient lookup?

2011-10-09 Thread Steve Lianoglou
Hi Ivo,

I'll just be brief, but regarding data.table's syntax: one person's
strange is another's intuitive :-)

Note also that data.table also provides a `merge.data.table` function
which works pretty much as you expect merge.data.frame to. The
exception being that its `by` argument will, by default, work on the
intersection of keys between the two data.tables being merged, and not
all columns of the same name between the two tables.

Hope that's helpful.

-steve

On Sun, Oct 9, 2011 at 10:44 PM, ivo welch ivo.we...@gmail.com wrote:
 hi patrick.  thanks.  I think you are right..

 combined <- merge( main, aggregate.data, by="day", all.x=TRUE, all.y=FALSE )
 lm( stockreturn ~ marketreturn, data=combined )

 becomes something like

 main <- as.data.table(main)
 setkey(main, mmdd)
 aggregate.data <- as.data.table(aggregate.data)
 setkey(aggregate.data, mmdd)
 main <- main[ aggregate.data ]

 this is fast and memory efficient.

 for me, on the plus side, data.table is a data.frame, so it can be
 used easily elsewhere.  on the plus side, data.table is
 super-efficient.  on the minus side, data.table often has very strange
 syntax.  for example, main[aggregate.data] is counterintuitive.
 passing of functions in a slot that should be a tensor index is also
 strange.  I would much prefer it if all non-tensor functionality was
 in functions, and not in arguments following [ ].

 I have written this before:  Given that applied end users of
 statistics typically use data.frame as their main container for data
 sets, data.frame should be as efficient and tuned as possible.  cell
 assignments should be fast.  indexing and copying should be fast.  it
 would give R a whole lot more appeal.  the functionality in data.table
 should be core functionality, not requiring an add-on with strange
 syntax.  just my 5 cents...of course, the R developers are saints,
 putting in a lot of effort with no compensation, so complaining is unfair.
  and thanks to Matthew Dowle for writing data.table, without which I
 couldn't do this AT ALL.

 regards,

 /iaw

 
 Ivo Welch (ivo.we...@gmail.com)





 On Sun, Oct 9, 2011 at 10:42 AM, Patrick Burns pbu...@pburns.seanet.com 
 wrote:
 I think you are looking for the 'data.table'
 package.

 On 09/10/2011 17:31, ivo welch wrote:

 Dear R experts---I am struggling with memory and speed issues.  Advice
 would be appreciated.

 I have a long data set (of financial stock returns, with stock name
 and trading day).  All three variables, stock return, id and day, are
 irregular.  About 1.3GB in object.size (200MB on disk).  now, I need
 to merge the main data set with some aggregate data (e.g., the SP500
 market rate of return, with a day index) from the same day.  this
 market data set is not a big data set (object.size=300K, 5 columns,
 12000 rows).

 let's say my (dumb statistical) plan is to run one grand regression,
 where the individual rate of return is y and the market rate of return
 is x.  the following should work without a problem:

  combined <- merge( main, aggregate.data, by="day", all.x=TRUE, all.y=FALSE
  )
 lm( stockreturn ~ marketreturn, data=combined )

 alas, the merge is neither space-efficient nor fast.  in fact, I run
 out of memory on my 16GB linux machine.  my guess is that by whittling
 it down, I could work it (perhaps doing it in chunks, and then
 rbinding it), but this is painful.

 in perl, I would define a hash with the day as key and the market
 return as value, and then loop over the main data set to supplement
 it.

 is there a recommended way of doing such tasks in R, either super-fast
 (so that I merge many many times) or space efficient (so that I merge
 once and store the results)?

 sincerely,

 /iaw

 
 Ivo Welch (ivo.we...@gmail.com)



 --
 Patrick Burns
 pbu...@pburns.seanet.com
 twitter: @portfolioprobe
 http://www.portfolioprobe.com/blog
 http://www.burns-stat.com
 (home of 'Some hints for the R beginner'
 and 'The R Inferno')






-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



Re: [R] HWEBayes, swapping the homozygotes genotype frequencies

2011-10-09 Thread Aaron Mackey
Without really knowing this code, I can guess that it may be the
triangular prior at work.  Bayes Factors are notorious for being sensitive
to the prior.  Presumably, the prior somehow prefers to see the rarer allele
as the BB, and not the AA homozygous genotype (this is a common
assumption: that AA is the reference, and thus the major, more frequent,
allele).

-Aaron

On Sat, Oct 8, 2011 at 7:52 PM, stat999 yumik091...@gmail.com wrote:

  I evaluated the Bayes factor in the k=2 allele case with a triangular
 prior under the null as in the example in the help file:


 HWETriangBF2(nvec=c(88,10,2))
 [1] 0.4580336

 When I swap the n11 entry and the n22 entry of nvec, I receive a totally
 different Bayes factor:

 
  HWETriangBF2(nvec=c(2,10,88))
 [1] 5.710153
 

 In my understanding, defining the genotype frequency as n11 or n22 is
 arbitrary,
 so I was expecting the same value of the Bayes factor.

 This is the case for conjugate Dirichlet prior:
 DirichNormHWE(nvec=c(88,10,2), c(1,1))/DirichNormSat(nvec=c(88,10,2),
 c(1,1,1))
 [1] 1.542047
 DirichNormHWE(nvec=c(2,10,88), c(1,1))/DirichNormSat(nvec=c(2,10,88),
 c(1,1,1))
 [1] 1.542047

 Could you explain why HWETriangBF2 is returning completely different
 values of the Bayes factor?


 --
 View this message in context:
 http://r.789695.n4.nabble.com/HWEBayes-swapping-the-homozygotes-genotype-frequencies-tp3886313p3886313.html
 Sent from the R help mailing list archive at Nabble.com.
