Re: [R] Tinn-R user guide (latex sources) available on GitHub

2013-11-27 Thread Joris Meys
Thank you for doing this!


On Wed, Nov 27, 2013 at 11:22 AM, Jose Claudio Faria joseclaudio.fa...@gmail.com wrote:

 Dear users,

 The Tinn-R User Guide is written entirely in LaTeX, and the idea
 behind making it available on GitHub is to gather contributions
 from multiple users.

 If you find something that you would like to add or improve:
 please, feel free to make it better.

 This User Guide has been developed under both OSes, Windows and Linux.

 Under Windows we have been using Tinn-R as the editor and MiKTeX as the compiler.

 Under Linux we have been using Vim (with the LaTeX-Box plugin) as the editor
 and TeX Live as the compiler.

 Link: https://github.com/jcfaria/Tinn-R-User-Guide

 Regards,
 --
 ///\\\///\\\///\\\///\\\///\\\///\\\///\\\///\\\
 Jose Claudio Faria
 Estatistica
 UESC/DCET/Brasil
 joseclaudio.faria at gmail.com
 Telefones:
 55(73)3680.5545 - UESC
 55(73)9100.7351 - TIM
 55(73)8817.6159 - OI
 ///\\\///\\\///\\\///\\\///\\\///\\\///\\\///\\\

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



Re: [R] Should there be an R-beginners list?

2013-11-27 Thread Joris Meys
StackOverflow certainly has its merits, although I somewhat miss the good ol'
Oxford sarcasm gems you find on this list.

This said: a beginners' list is a bad, bad idea. The first rule in my classes is:
RTFI (Read The Fucking Internetzz). Anybody using R should be able to do a
basic Google search. A beginners' list is not going to help them learn
that.

If beginners make the effort to follow the posting guidelines, netiquette,
or any other guide to getting help on the internet, they can safely use
this list.

Cheers
Joris




On Wed, Nov 27, 2013 at 9:47 AM, Rolf Turner r.tur...@auckland.ac.nzwrote:

 On 11/25/13 09:04, Rich Shepard wrote:

 On Sun, 24 Nov 2013, Yihui Xie wrote:

 Mailing lists are good for a smaller group of people, and especially
 good for focused discussions on development (including bug
 reports). The better place for questions is a web forum.


   I disagree. Mailing lists push messages to subscribers, while web fora
 require one to use a browser, log in, then pull messages. Not nearly as
 convenient.


 Well expressed Rich.  I agree with you completely.

 cheers,

 Rolf Turner








Re: [R] Problems when Apply a script to a list

2010-08-27 Thread Joris Meys
Where exactly did you put the sink() statement? I tried it with 1000
data frames and had no problem whatsoever.

Cheers
Joris

On Fri, Aug 27, 2010 at 6:56 AM,  ev...@aueb.gr wrote:
 Joris,
 thank you very much for your help.
 It is very helpful for me.
 I still have a problem with the sink stack, although I put sink() after
 my last printed result.
 I want to ask you one more question. Do you know how I can write, to the
 sink output file, a message saying which set of the list the results
 belong to? When I used for instead of apply I had
 cat("\n***\nEstimation\n\nDataset Sim : ", i)
 Thank you in advance
 Evgenia
 Joris Meys writes:

 Answers below.
 On Thu, Aug 26, 2010 at 11:20 AM, Evgenia ev...@aueb.gr wrote:

 Dear users,
 ***I have a function f to simulate data from a model (the example below is
 used only to show my problems)
 f <- function(n, mean1){
   a <- matrix(rnorm(n, mean1, sd = 1), ncol = 5)
   b <- matrix(runif(n), ncol = 5)
   data <- rbind(a, b)
   out <- data
   out
 }
 *I want to simulate 1000 datasets (here only 5) so I use
 S <- list()
 for (i in 1:5){
   S[[i]] <- f(n = 10, mean1 = 0)
 }
 **I have a very complicated function for estimation of a model which I
 want to apply to each one of the above simulated datasets
 fun <- function(data){
   data <- as.matrix(data)
   sink('Example.txt', append = TRUE)
   cat("\n***\nEstimation\n\nDataset Sim : ", i)
   d <- data %*% t(data)
   s <- solve(d)
   print(s)
   out <- s
   out
 }
 results <- list()
 for (i in 1:5){ results[[i]] <- fun(data = S[[i]]) }

 My questions are:
 1) For some datasets the system is computationally singular, and this causes
 execution of the for loop to stop. This way I only have results up to the
 point where the problem happens. How can I skip such a step and obtain
 results for all other datasets for which the function fun is applicable?

 See ?try, or ?tryCatch.
 I'd do something along the lines of
 for(i in 1:5){
     tmp <- try(fun(data = S[[i]]))
     results[[i]] <- if (is(tmp, "try-error")) NA else tmp
 }
 Alternatively, you could also use lapply:
 results <- lapply(S, function(x){
    tmp <- try(fun(data = x))
    if (is(tmp, "try-error")) NA else tmp
 })

 2) After some runs of my program, I receive the error message Error in
 sink("output.txt") : sink stack is full. How can I solve this problem, as I
 want to have the results of my program for 1000 datasets?

 That is because you never close the sink: every call opens a new
 diversion, but none is ever closed. Add sink() (with no arguments) after
 the last line you want in the file. This closes the current diversion and
 flushes the output to the file. Otherwise the sink stack keeps growing
 until it is full.

 3) Is using for the correct way to run my program over a list?

 See the lapply solution.

 Thanks a lot
 Evgenia
 --
 View this message in context:
 http://r.789695.n4.nabble.com/Problems-when-Apply-a-script-to-a-list-tp2339403p2339403.html
 Sent from the R help mailing list archive at Nabble.com.








Re: [R] Change value of a slot of an S4 object within a method.

2010-08-26 Thread Joris Meys
OK, I admit it. It will never win a beauty prize, but I won't either, so
we go pretty well together.

Seriously, I am aware this is about the worst way to solve such a
problem. But as I said, rewriting the class definitions wasn't an
option any more. I'll definitely take your advice for the next
project.

Cheers
Joris

On Wed, Aug 25, 2010 at 9:36 PM, Steve Lianoglou
mailinglist.honey...@gmail.com wrote:
 Howdy,

 My eyes start to gloss over on their first encounter of `substitute`
 ... add enough `eval`s and (even) an `expression` (!) to that, and
 you'll probably see me running for the hills ... but hey, if it makes
 sense to you, more power to you ;-)

 Glad you found a fix that works,

 -steve

 --
 Steve Lianoglou
 Graduate Student: Computational Systems Biology
  | Memorial Sloan-Kettering Cancer Center
  | Weill Medical College of Cornell University
 Contact Info: http://cbio.mskcc.org/~lianos/contact






Re: [R] Problems when Apply a script to a list

2010-08-26 Thread Joris Meys
Answers below.

On Thu, Aug 26, 2010 at 11:20 AM, Evgenia ev...@aueb.gr wrote:

 Dear users,

 ***I have a function f to simulate data from a model (the example below is
 used only to show my problems)

 f <- function(n, mean1){
   a <- matrix(rnorm(n, mean1, sd = 1), ncol = 5)
   b <- matrix(runif(n), ncol = 5)
   data <- rbind(a, b)
   out <- data
   out
 }

 *I want to simulate 1000 datasets (here only 5) so I use
 S <- list()

 for (i in 1:5){
   S[[i]] <- f(n = 10, mean1 = 0)
 }

 **I have a very complicated function for estimation of a model which I
 want to apply to each one of the above simulated datasets

 fun <- function(data){
   data <- as.matrix(data)
   sink('Example.txt', append = TRUE)
   cat("\n***\nEstimation\n\nDataset Sim : ", i)
   d <- data %*% t(data)
   s <- solve(d)
   print(s)
   out <- s
   out
 }
 results <- list()
 for (i in 1:5){ results[[i]] <- fun(data = S[[i]]) }


 My questions are:
 1) For some datasets the system is computationally singular, and this causes
 execution of the for loop to stop. This way I only have results up to the
 point where the problem happens. How can I skip such a step and obtain
 results for all other datasets for which the function fun is applicable?

See ?try, or ?tryCatch.

I'd do something along the lines of
for(i in 1:5){
  tmp <- try(fun(data = S[[i]]))
  results[[i]] <- if (is(tmp, "try-error")) NA else tmp
}

Alternatively, you could also use lapply:
results <- lapply(S, function(x){
  tmp <- try(fun(data = x))
  if (is(tmp, "try-error")) NA else tmp
})


 2) After some runs of my program, I receive the error message Error in
 sink("output.txt") : sink stack is full. How can I solve this problem, as I
 want to have the results of my program for 1000 datasets?

That is because you never close the sink: every call opens a new
diversion, but none is ever closed. Add sink() (with no arguments) after
the last line you want in the file. This closes the current diversion and
flushes the output to the file. Otherwise the sink stack keeps growing
until it is full.
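One way to make the sink bookkeeping foolproof is to open and close the diversion inside the function itself with on.exit(), so it is released even when solve() fails on a singular matrix. A minimal sketch, with an illustrative file name and a simplified body (this is not the poster's actual fun()):

```r
# Open a diversion and guarantee it is closed again on exit, even on error.
log_result <- function(data, i, file = "Example.txt") {
  sink(file, append = TRUE)        # push one diversion onto the sink stack
  on.exit(sink(), add = TRUE)      # popped automatically, error or not
  cat("\n***\nEstimation\n\nDataset Sim :", i, "\n")
  s <- solve(data %*% t(data))
  print(s)
  invisible(s)
}

set.seed(1)
m <- matrix(rnorm(20), ncol = 5)
res <- log_result(m, 1, file = tempfile())
sink.number()   # back to 0: the stack never fills up
```

Passing the dataset index i as an argument also answers the labelling question above: the cat() line tags each block of output with the dataset it belongs to.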


 3) Is using for the correct way to run my program over a list?
See the lapply solution.
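The try() pattern can be checked end to end with a toy version of the problem: a few simulated datasets, one of them made deliberately singular. All names below are illustrative, not from the original post:

```r
set.seed(2)
# five datasets; the third is rank-deficient, so solve() will fail on it
S <- replicate(5, matrix(rnorm(20), ncol = 5), simplify = FALSE)
S[[3]] <- matrix(1, nrow = 4, ncol = 5)

fun <- function(data) solve(data %*% t(data))

results <- lapply(S, function(x) {
  tmp <- try(fun(x), silent = TRUE)
  if (inherits(tmp, "try-error")) NA else tmp
})

sapply(results, is.matrix)   # FALSE only for the singular dataset
```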


 Thanks a lot

 Evgenia

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Problems-when-Apply-a-script-to-a-list-tp2339403p2339403.html
 Sent from the R help mailing list archive at Nabble.com.







[R] Passing arguments between S4 methods fails within a function:bug? example with raster package.

2010-08-26 Thread Joris Meys
Dear all,

This problem came up initially while debugging a function, but it
seems to be a more general problem of R. I hope I'm wrong, but I can't
find another explanation. Let me illustrate with the raster package.

For a RasterLayer object (which inherits from Raster), there is a
method xyValues defined for the signature
(object = "RasterLayer", xy = "matrix"). There is also a method for the
signature (object = "Raster", xy = "vector"). The only thing this method
does is turn xy into a matrix and then pass on to the next method
using callGeneric. Arguments are passed along.
Now this all works smoothly, as long as you stay in the global environment :

require(raster)

a <- raster()
a[] <- 1:ncell(a)

origin <- c(-80, 50)
eff.dist <- 10

> unlist(xyValues(a, xy = origin, buffer = eff.dist))
[1] 14140 14141 14500 14501

Now let's make a very basic test function :

test <- function(x, orig.point){
  eff.distance <- 10
  p <- unlist(xyValues(x, xy = orig.point, buffer = eff.distance))
  return(p)
}

This gives the following result :

> test(a, origin)
Error in .local(object, xy, ...) : object 'eff.distance' not found

Huh? Apparently eff.distance got lost somewhere in the parse tree (am
I saying this correctly?).

The funny thing is when we change origin to a matrix :

> origin <- matrix(origin, ncol = 2)

> unlist(xyValues(a, xy = origin, buffer = eff.dist))
[1] 14140 14141 14500 14501

> test(a, origin)
[1] 14140 14141 14500 14501

It all works again! So something goes wrong with passing the arguments
from one method to another using callGeneric. Is this a bug in R or am
I missing something obvious?

The relevant code from the raster package :

setMethod("xyValues", signature(object = 'Raster', xy = 'vector'),
  function(object, xy, ...) {
    if (length(xy) == 2) {
      callGeneric(object, matrix(xy, ncol = 2), ...)
    } else {
      stop('xy coordinates should be a two-column matrix or data.frame,
            or a vector of two numbers.')
    }
  }
)

setMethod("xyValues", signature(object = 'RasterLayer', xy = 'matrix'),
  function(object, xy, method = 'simple', buffer = NULL, fun = NULL,
           na.rm = TRUE) {

    if (dim(xy)[2] != 2) {
      stop('xy has wrong dimensions; it should have 2 columns')
    }

    if (!is.null(buffer)) {
      return( .xyvBuf(object, xy, buffer, fun, na.rm = na.rm) )
    }

    if (method == 'bilinear') {
      return(.bilinearValue(object, xy))
    } else if (method == 'simple') {
      cells <- cellFromXY(object, xy)
      return(.readCells(object, cells))
    } else {
      stop('invalid method argument. Should be simple or bilinear.')
    }
  }
)
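The dispatch pattern can be reproduced without raster in a few lines, which makes it easier to probe where an argument might get lost. Everything below (the generic rowNorms and both methods) is a made-up stand-in for xyValues, not raster code:

```r
library(methods)

setGeneric("rowNorms", function(xy, ...) standardGeneric("rowNorms"))

# "matrix" method: the real workhorse, with an extra argument.
setMethod("rowNorms", "matrix", function(xy, scale = 1) {
  sqrt(rowSums(xy^2)) * scale
})

# "numeric" method: reshape and re-dispatch, like the Raster/vector method.
setMethod("rowNorms", "numeric", function(xy, ...) {
  callGeneric(matrix(xy, ncol = length(xy)), ...)
})

# Call through a function with a local variable, the situation from the post.
test_local <- function(v) {
  local.scale <- 10
  rowNorms(v, scale = local.scale)
}
test_local(c(3, 4))
```

If callGeneric misbehaves, an easy workaround is to call the generic by name instead, i.e. rowNorms(matrix(xy, ncol = length(xy)), ...); in the raster case that would mean replacing callGeneric(object, matrix(xy, ncol = 2), ...) with xyValues(object, matrix(xy, ncol = 2), ...).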





[R] Change value of a slot of an S4 object within a method.

2010-08-25 Thread Joris Meys
Dear all,

I have an S4 class with a slot extra, which is a list. I want to be
able to add an element called name to that list, containing the
object value (which can be a vector, a data frame, or another S4
object).

Obviously

setMethod("add.extra", signature = c("PM10Data", "character", "vector"),
  function(object, name, value){
    object@extra[[name]] <- value
  }
)

Contrary to what I would expect, the line :
eval(eval(substitute(expression(object@extra[[name]] <- value))))

gives the error :
Error in object@extra[[name]] <- value : object 'object' not found

Substitute apparently doesn't work any more in S4 methods...

I found a work-around by calling the initializer every time, but this
hurts performance. Returning the object is not an option either, as I'd
have to remember to assign it each time to the original name.

Basically I'm trying to do some call by reference with S4, but don't
see how I should do that. How would I be able to tackle this problem
in an efficient and elegant way?

Thank you in advance
Cheers
Joris



Re: [R] Change value of a slot of an S4 object within a method.

2010-08-25 Thread Joris Meys
Hi Steve,

thanks for the tip.  I'll definitely take a closer look at your
solution for implementation for future use.  But right now I don't
have the time to start rewriting my class definitions.

Luckily, I found where exactly things were going wrong. After reading
the documentation about evaluation in R, I figured out that I have to
explicitly specify the environment where substitute should look, as
parent.frame(1). I still don't understand completely why exactly, but
it does the job.

Thus :
eval(eval(substitute(expression(object@extra[[name]] <- value))))

should become :

eval(
   eval(
      substitute(
         expression(object@extra[[name]] <- value),
         env = parent.frame(1)
      )
   )
)

Tried it out and it works.
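As an aside for later readers: since R 2.12, Reference Classes (setRefClass) give genuine pass-by-reference semantics, which sidesteps the substitute/eval gymnastics entirely. A sketch with a hypothetical class, not the poster's actual PM10Data definition:

```r
library(methods)

PM10DataRef <- setRefClass("PM10DataRef",
  fields = list(extra = "list"),
  methods = list(
    add.extra = function(name, value) {
      # <<- assigns to the field of this very object: no copy, no reassignment
      extra[[name]] <<- value
    }
  )
)

d <- PM10DataRef$new(extra = list())
d$add.extra("station", c(12.4, 15.1))
d$extra$station   # the object itself was modified in place
```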

Cheers
Joris

On Wed, Aug 25, 2010 at 6:21 PM, Steve Lianoglou
mailinglist.honey...@gmail.com wrote:
 Hi Joris,

 On Wed, Aug 25, 2010 at 11:56 AM, Joris Meys jorism...@gmail.com wrote:
 Dear all,

 I have an S4 class with a slot extra, which is a list. I want to be
 able to add an element called name to that list, containing the
 object value (which can be a vector, a data frame, or another S4
 object).

 Obviously

 setMethod("add.extra", signature = c("PM10Data", "character", "vector"),
   function(object, name, value){
     object@extra[[name]] <- value
   }
 )

 Contrary to what I would expect, the line :
 eval(eval(substitute(expression(object@extra[[name]] <- value))))

 gives the error :
 Error in object@extra[[name]] <- value : object 'object' not found

 Substitute apparently doesn't work any more in S4 methods...

 I found a work-around by calling the initializer every time, but this
 hurts performance. Returning the object is not an option either, as I'd
 have to remember to assign it each time to the original name.

 Basically I'm trying to do some call by reference with S4, but don't
 see how I should do that. How would I be able to tackle this problem
 in an efficient and elegant way?

 In lots of my own S4 classes I define a slot called .cache, which is
 an environment, for this exact purpose.

 Using this solution for your scenario might look something like this:

 setMethod("add.extra", signature = c("PM10Data", "character", "vector"),
 function(object, name, value) {
   object@.cache$extra[[name]] <- value
 })

 I'm not sure what your entire problem looks like, but to get your
 extra list, or a value from it, you could:

 setMethod("extra", signature = "PM10Data",
 function(object, name = NULL) {
   if (!is.null(name)) {
     object@.cache$extra[[name]]
   } else {
     object@.cache$extra
   }
 })

 ... or something like that.

 The last thing you have to be careful of is that you have to make sure
 that each new("PM10Data") object initializes its *own* cache:

 setClass("PM10Data", representation(..., .cache = 'environment'))
 setMethod("initialize", "PM10Data",
 function(.Object, ..., .cache = new.env()) {
   callNextMethod(.Object, .cache = .cache, ...)
 })

 Does that make sense?
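Steve's environment-slot pattern can be condensed into a self-contained, runnable sketch; the class name and slots below are illustrative stand-ins:

```r
library(methods)

setClass("CacheDemo", representation(.cache = "environment"))

# Give every new object its own environment, with an empty extra list in it.
setMethod("initialize", "CacheDemo", function(.Object, ..., .cache = new.env()) {
  .cache$extra <- list()
  callNextMethod(.Object, .cache = .cache, ...)
})

setGeneric("add.extra", function(object, name, value) standardGeneric("add.extra"))
setMethod("add.extra", "CacheDemo", function(object, name, value) {
  # environments are not copied on assignment, so this mutates the
  # caller's object as well: poor man's call by reference
  object@.cache$extra[[name]] <- value
  invisible(object)
})

x <- new("CacheDemo")
add.extra(x, "a", 1:3)
x@.cache$extra$a   # visible without re-assigning x
```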



[R] strange error : isS4(x) in gamm function (mgcv package). Variable in data-frame not recognized???

2010-07-28 Thread Joris Meys
Dear all,

I run a gamm with the following call :

result <- try(gamm(values ~ s(VM) + s(RH) + s(TT) + s(PP) + RF + weekend +
  s(day) + s(julday), correlation = corCAR1(form = ~ day|month), data = tmp))

with mgcv version 1.6.2

No stress about the data, the error is not data-related. I get :

Error in isS4(x) : object 'VM' not found

What gives? I did define the data frame to be used, and the data frame
contains a variable VM :

 str(tmp)
'data.frame':   4014 obs. of  12 variables:
 $ values : num  73.45 105.45 74.45 41.45 -4.55 ...
 $ dates  :Class 'Date'  num [1:4014] 9862 9863 9864 9865 9866 ...
 $ year   : num  -5.65 -5.65 -5.65 -5.65 -5.65 ...
 $ day: num  -178 -177 -176 -175 -174 ...
 $ month  : Factor w/ 156 levels 1996-April,1996-August,..: 17 17
17 17 17 17 17 17 17 17 ...
 $ julday : num  -2241 -2240 -2239 -2238 -2237 ...
 $ weekend: num  -0.289 -0.289 -0.289 0.711 0.711 ...
 $ VM : num  0.139 -1.451 0.349 0.839 -0.611 ...
 $ RH : num  55.2 61.4 59.8 64.1 60.7 ...
 $ TT : num  -23.4 -23.6 -19.5 -16.1 -15.3 ...
 $ PP : num  6.17 4.27 -4.93 -9.23 -2.63 ...
 $ RF : Ord.factor w/ 3 levels None2.5mm..: 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, means)= num

Any idea what I'm missing here?

Cheers
Joris




Re: [R] strange error : isS4(x) in gamm function (mgcv package). Variable in data-frame not recognized???

2010-07-28 Thread Joris Meys
Dear all,

it gets even weirder. After restarting R, the code I used works
just fine. The call is generated in a function that I debugged using
browser(). The problem is solved, but I have no clue whatsoever how that
error came about. It must have something to do with namespaces, but
the origin is obscure. I tried to reproduce the error, but didn't
succeed.

Does anybody have an idea where I should look for a cause?

Cheers
Joris

On Wed, Jul 28, 2010 at 1:16 PM, Joris Meys jorism...@gmail.com wrote:
 Dear all,

 I run a gamm with the following call :

 result <- try(gamm(values ~ s(VM) + s(RH) + s(TT) + s(PP) + RF + weekend +
   s(day) + s(julday), correlation = corCAR1(form = ~ day|month), data = tmp))

 with mgcv version 1.6.2

 No stress about the data, the error is not data-related. I get :

 Error in isS4(x) : object 'VM' not found

 What gives? I did define the data frame to be used, and the data frame
 contains a variable VM :

 str(tmp)
 'data.frame':   4014 obs. of  12 variables:
  $ values : num  73.45 105.45 74.45 41.45 -4.55 ...
  $ dates  :Class 'Date'  num [1:4014] 9862 9863 9864 9865 9866 ...
  $ year   : num  -5.65 -5.65 -5.65 -5.65 -5.65 ...
  $ day    : num  -178 -177 -176 -175 -174 ...
  $ month  : Factor w/ 156 levels 1996-April,1996-August,..: 17 17
 17 17 17 17 17 17 17 17 ...
  $ julday : num  -2241 -2240 -2239 -2238 -2237 ...
  $ weekend: num  -0.289 -0.289 -0.289 0.711 0.711 ...
  $ VM     : num  0.139 -1.451 0.349 0.839 -0.611 ...
  $ RH     : num  55.2 61.4 59.8 64.1 60.7 ...
  $ TT     : num  -23.4 -23.6 -19.5 -16.1 -15.3 ...
  $ PP     : num  6.17 4.27 -4.93 -9.23 -2.63 ...
  $ RF     : Ord.factor w/ 3 levels None2.5mm..: 1 1 1 1 1 1 1 1 1 1 ...
  - attr(*, means)= num

 Any idea what I'm missing here?

 Cheers
 Joris




Re: [R] strange error : isS4(x) in gamm function (mgcv package). Variable in data-frame not recognized???

2010-07-28 Thread Joris Meys
Follow-up:

I finally managed to more or less reproduce the error. The origin
lay in the fact that I accidentally loaded a function while being in
browser mode debugging that very function, so something went very much
wrong with the namespaces. Teaches me right...

Cheers
Joris

On Wed, Jul 28, 2010 at 2:27 PM, Joris Meys jorism...@gmail.com wrote:
 Dear all,

 it gets even weirder. After restarting R, the code I used works
 just fine. The call is generated in a function that I debugged using
 browser(). The problem is solved, but I have no clue whatsoever how that
 error came about. It must have something to do with namespaces, but
 the origin is obscure. I tried to reproduce the error, but didn't
 succeed.

 Does anybody have an idea where I should look for a cause?

 Cheers
 Joris

 On Wed, Jul 28, 2010 at 1:16 PM, Joris Meys jorism...@gmail.com wrote:
 Dear all,

 I run a gamm with the following call :

 result <- try(gamm(values ~ s(VM) + s(RH) + s(TT) + s(PP) + RF + weekend +
   s(day) + s(julday), correlation = corCAR1(form = ~ day|month), data = tmp))

 with mgcv version 1.6.2

 No stress about the data, the error is not data-related. I get :

 Error in isS4(x) : object 'VM' not found

 What gives? I did define the data frame to be used, and the data frame
 contains a variable VM :

 str(tmp)
 'data.frame':   4014 obs. of  12 variables:
  $ values : num  73.45 105.45 74.45 41.45 -4.55 ...
  $ dates  :Class 'Date'  num [1:4014] 9862 9863 9864 9865 9866 ...
  $ year   : num  -5.65 -5.65 -5.65 -5.65 -5.65 ...
  $ day    : num  -178 -177 -176 -175 -174 ...
  $ month  : Factor w/ 156 levels 1996-April,1996-August,..: 17 17
 17 17 17 17 17 17 17 17 ...
  $ julday : num  -2241 -2240 -2239 -2238 -2237 ...
  $ weekend: num  -0.289 -0.289 -0.289 0.711 0.711 ...
  $ VM     : num  0.139 -1.451 0.349 0.839 -0.611 ...
  $ RH     : num  55.2 61.4 59.8 64.1 60.7 ...
  $ TT     : num  -23.4 -23.6 -19.5 -16.1 -15.3 ...
  $ PP     : num  6.17 4.27 -4.93 -9.23 -2.63 ...
  $ RF     : Ord.factor w/ 3 levels None2.5mm..: 1 1 1 1 1 1 1 1 1 1 
 ...
  - attr(*, means)= num

 Any idea what I'm missing here?

 Cheers
 Joris




Re: [R] R^2 in loess and predict?

2010-07-09 Thread Joris Meys
I do not agree with your interpretation of the adjusted R^2. The R^2
is no more than the ratio of the explained variance to the total
variance, expressed in sums of squares. The adjusted R^2 is adjusted
for the degrees of freedom, and can only be used for model selection
purposes. Its interpretation for the final model is hard, and it is
definitely not a measure of how well the model describes the population.

For a loess regression this can be calculated as well. But loess
is a local regression technique, highly flexible, and highly dependent
on the window you use. In these cases, R^2 (or any other goodness-of-fit
measure) tells you even less: you can get an R^2 of 1 if you choose
the window small enough.
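Both points can be illustrated numerically. The sketch below computes the raw R^2 of a loess fit from its residuals, and shows that shrinking the span inflates it (simulated data, not from the original question):

```r
set.seed(1)
x <- runif(100)
y <- sin(2 * pi * x) + rnorm(100, sd = 0.3)

# raw R^2 = 1 - RSS/TSS, computed from the loess residuals
r2_loess <- function(span) {
  fit <- loess(y ~ x, span = span)
  1 - sum(residuals(fit)^2) / sum((y - mean(y))^2)
}

r2_wide   <- r2_loess(0.75)  # default-ish window
r2_narrow <- r2_loess(0.15)  # much smaller window
c(wide = r2_wide, narrow = r2_narrow)  # the narrow window scores "better"
```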

If you want to do inference with nonlinear regression techniques, I
strongly suggest you use generalized additive models, e.g. from the
package mgcv. There you can use the framework of likelihood-ratio
tests to assess goodness of fit by comparing models.
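A sketch of the suggested model-comparison workflow in mgcv, on simulated data (this assumes mgcv is installed; it ships with R as a recommended package):

```r
library(mgcv)

set.seed(7)
x <- runif(300)
y <- sin(2 * pi * x) + rnorm(300, sd = 0.3)

m_lin    <- gam(y ~ x)      # linear predictor
m_smooth <- gam(y ~ s(x))   # smooth predictor

anova(m_lin, m_smooth, test = "F")  # compare the nested fits
AIC(m_lin, m_smooth)                # the smooth model should win here
```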

Cheers
Joris

On Fri, Jul 9, 2010 at 10:42 AM, Ralf B ralf.bie...@gmail.com wrote:
 Parametric regression produces R^2 as a measure of how well the model
 predicts the sample, and adjusted R^2 as a measure of how well it
 models the population. What is the equivalent for non-parametric
 regression (e.g. the loess function)?

 Ralf



Re: [R] Non-parametric regression

2010-07-09 Thread Joris Meys
Just to be correct: gam is mentioned on the page Tal linked to, but it
is a semi-parametric approach using maximum likelihood. The advice stays
valid though.

Another thing: you detected non-normality. But can you use a Poisson
distribution, for example? The framework of generalized linear models
and generalized additive models allows you to deal with non-normality
of your data.
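For instance, count data can be handled directly with a Poisson family in mgcv's gam; a sketch with simulated counts (all names illustrative):

```r
library(mgcv)

set.seed(42)
x <- runif(200, 1, 10)
y <- rpois(200, lambda = exp(0.3 + 0.2 * x))  # counts, clearly non-normal

# Poisson GAM: log link by default, smooth effect of x
fit <- gam(y ~ s(x), family = poisson)
summary(fit)$dev.expl   # proportion of deviance explained
```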

In any case, I suggest you contact a statistician nearby for guidance.

Cheers
Joris

On Fri, Jul 9, 2010 at 10:26 AM, Tal Galili tal.gal...@gmail.com wrote:
 From reviewing the first page of Google results for "non-parametric
 regression R", I hope this link will prove useful:

 http://socserv.mcmaster.ca/jfox/Courses/Oxford-2005/R-nonparametric-regression.html



 Contact
 Details:---
 Contact me: tal.gal...@gmail.com |  972-52-7275845
 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
 www.r-statistics.com (English)
 --




 On Fri, Jul 9, 2010 at 11:01 AM, Ralf B ralf.bie...@gmail.com wrote:

 I have two data sets, each a vector of 1000 numbers, each vector
 representing a distribution (i.e. 1000 numbers, each of which
 represents a frequency at one point on a scale between 1 and 1000).
 For simplification, here is a short version with only 5 points.


 a <- c(8, 10, 8, 12, 4)
 b <- c(7, 11, 8, 10, 5)

 Leaving the obvious discussion about causality aside for a moment, I
 would like to see how well I can predict b from a using a regression.
 Since I do not know anything about the distribution type and have already
 discovered non-normality, I cannot use parametric regression or
 anything GLM for that matter.

 How should I proceed in using non-parametric regression to model
 vector a and see how well it predicts b? Perhaps you could extend the
 given lines into a short example script to give me an idea? Are there
 any other options?

 Best,
 Ralf

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] split with list

2010-07-09 Thread Joris Meys
One solution is to set these unwanted entries to the empty string:

repor$`9853312`[1:2, 2:3] <- ""
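Put together with the split/transpose step from the original post, the whole suggestion can be sketched on a toy reconstruction of the posted data (values taken from the example):

```r
# toy reconstruction of the posted data frame
rep <- data.frame(CFISCALE = rep(c("9853312", "9653421"), each = 3),
                  RAGSOCB  = rep(c("astra", "geox"), each = 3),
                  ANNO     = rep(2005:2007, 2),
                  VAR1     = c(6, 78, 55, 35, 24, 54),
                  VAR2     = c(45, 45, 76, 89, 33, 55))

# one transposed character matrix per firm, as in the original post
repor <- lapply(split(rep, rep$CFISCALE), t)

# blank the repeated CFISCALE/RAGSOCB entries in columns 2 and 3
repor$`9853312`[1:2, 2:3] <- ""
repor$`9853312`
```

The blanked matrix can then be passed to xtable as planned.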

Cheers
Joris

On Fri, Jul 9, 2010 at 12:18 PM, n.via...@libero.it n.via...@libero.it wrote:

 Dear List, I would like to ask you something concerning a better print of the
 R output:
 I have a big data frame which has the following structure:
 CFISCALE   RAGSOCB   ANNO   VAR1   VAR2
 9853312    astra     2005      6     45
 9853312    astra     2006     78     45
 9853312    astra     2007     55     76
 9653421    geox      2005     35     89
 9653421    geox      2006     24     33
 9653421    geox      2007     54     55


 The first thing I did is to split my data frame for CFISCALE. The result is 
 that R has transformed my data frame into a list. The second step was to 
 transpose each element of my list.
 repo=split(rep,rep$CFISCALE)
 repor=lapply(repo,function(x){
 t(x)})


 When I print my list the format is the following
 $`9853312`
                    1         2         3
 CFISCALE     9853312   9853312   9853312
 RAGSOCB      astra     astra     astra
 ANNO         2005      2006      2007
 VAR1         6         78        55
 VAR2         45        45        76

 Is there a way to remove the first row (I mean the 1, 2, 3 header) and to have
 just one CFISCALE and RAGSOCB?
 For the second problem I tried to use unique, but it seems that it doesn't work
 for lists. So what I would like to get is:
 $`9853312`

 CFISCALE     9853312
 RAGSOCB      astra
 ANNO         2005      2006      2007
 VAR1         6         78        55
 VAR2         45        45        76


 This is because I next run xtable on my list in order to get a table in
 LaTeX, which I would like to be in a nice format.
 Thanks a lot for your attention!





        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] random sample from arrays

2010-07-09 Thread Joris Meys
Could you elaborate?

Both

 x <- 1:4
 set <- matrix(nrow = 50, ncol = 11)
 for(i in c(1:11)){
   set[,i] <- sample(x, 50)
   print(c(i, "-", set), quote = FALSE)
 }

and

 x <- 1:4
 set <- matrix(nrow = 50, ncol = 11)
 for(i in c(1:50)){
   set[i,] <- sample(x, 11)
   print(c(i, "-", set), quote = FALSE)
 }

run perfectly fine on my computer.
Cheers


On Fri, Jul 9, 2010 at 3:10 PM, Assa Yeroslaviz fry...@gmail.com wrote:
 Hi Joris,
 I guess I did it wrong again,
 but your example didn't work either. I still get the error message.

 But replicate works just fine. I can even replicate the whole array
 lines.

 THX

 Assa

 On Thu, Jul 8, 2010 at 15:20, Joris Meys jorism...@gmail.com wrote:

 Don't know what exactly you're trying to do, but you make a matrix
 with 11 columns and 50 rows, then treat it as a vector. On top of
 that, you try to fill 50 rows/columns with 50 values. Of course that
 doesn't work. Did you check the warning messages when running the
 code?

 Either do :

  for(i in c(1:11)){
    set[,i] <- sample(x, 50)
    print(c(i, "-", set), quote = FALSE)
  }

 or

  for(i in c(1:50)){
    set[i,] <- sample(x, 11)
    print(c(i, "-", set), quote = FALSE)
  }

 Or just forget about the loop altogether and do :

 set <- replicate(11, sample(x, 50))
 or
 set <- t(replicate(50, sample(x, 11)))

 cheers

 On Thu, Jul 8, 2010 at 8:04 AM, Assa Yeroslaviz fry...@gmail.com wrote:
  Hello R users,
 
  I'm trying to extract random samples from a big array I have.
 
  I have a data frame of over 40k lines and would like to produce around
  50
  random sample of around 200 lines each from this array.
 
  this is the matrix
       ID    xxx_1c   xxx__2c   xxx__3c   xxx__4c   xxx__5T   xxx__6T   xxx__7T   xxx__8T    yyy_1c    yyy_2c
   1 A_512  2.150295  2.681759  2.177138  2.142790  2.115344  2.013047  2.115634  2.189372  1.643328  1.563523
   2 A_134 12.832488 12.596373 12.882581 12.987091 11.956149 11.994779 11.650336 11.995504 13.024494 12.776322
   3 A_152  2.063276  2.160961  2.067549  2.059732  2.656416  2.075775  2.033982  2.111937  1.606340  1.548940
   4 A_163  9.570761 10.448615  9.432859  9.732615 10.354234 10.993279  9.160038  9.104121 10.079177  9.828757
   5 A_184  3.574271  4.680859  4.517047  4.047096  3.623668  3.021356  3.559434  3.156093  4.308437  4.045098
   6 A_199  7.593952  7.454087  7.513013  7.449552  7.345718  7.367068  7.410085  7.022582  7.668616  7.953706
   ...
 
  I tried to do it with a for loop:
 
  genelist <- read.delim("/user/R/raw_data.txt")
  rownames(genelist) <- genelist[,1]
  genes <- rownames(genelist)
 
  x <- 1:4
  set <- matrix(nrow = 50, ncol = 11)

  for(i in c(1:50)){
     set[i] <- sample(x, 50)
     print(c(i, "-", set), quote = FALSE)
  }
 
  which basically does the trick, but I just can't save the results outside
  the loop.
  After having the random sets of lines it wasn't a problem to extract the
  line from the arrays using subset.
 
  genSet1 <- sample(x, 50)
  random1 <- genes %in% genSet1
  subsetGenelist <- subset(genelist, random1)
 
 
  Is there a different way of creating these random vectors, or of saving the
  loop results outside the loop so I can work with them?
 
  Thanks a lot
 
  Assa
 
         [[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 



 --
 Joris Meys
 Statistical consultant

 Ghent University
 Faculty of Bioscience Engineering
 Department of Applied mathematics, biometrics and process control

 tel : +32 9 264 59 87
 joris.m...@ugent.be
 ---
 Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php





-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] relation in aggregated data

2010-07-08 Thread Joris Meys
Depending on the data and the research question, a meta-analytic
approach might be appropriate. You can see every campaign as a
study. See the package metafor for example. You can only draw very
general conclusions, but at least your inference will be closer to
correct.
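A minimal sketch of that idea with metafor (the per-campaign correlations and sample sizes below are invented, purely for illustration):

```r
library(metafor)   # assumed installed

# hypothetical per-campaign summaries: correlation ri from ni paired values
dat <- data.frame(campaign = 1:5,
                  ri = c(0.42, 0.55, 0.31, 0.60, 0.48),
                  ni = c(30, 28, 32, 25, 27))

# Fisher's z transform of each correlation, then a random-effects
# model pooling the campaigns
dat <- escalc(measure = "ZCOR", ri = ri, ni = ni, data = dat)
res <- rma(yi, vi, data = dat)
summary(res)
```

The pooled estimate describes the overall association across campaigns, with the between-campaign heterogeneity estimated explicitly.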

Cheers
Joris

On Thu, Jul 8, 2010 at 10:03 AM, Petr PIKAL petr.pi...@precheza.cz wrote:
 Thank you

 Actually when I do this myself I always try to make day or week averages
 if possible. However this was done by one of my colleagues and basically
 the aggregation was done on basis of campaigns. There is 4 to 6 campaigns
 per year and sometimes there is apparent relationship in aggregated data
 sometimes there is not. My opinion is that I cannot say much about the exact
 relations until I have other clues, like expected underlying laws
 of physics.

 Thanks again

 Best regards
 Petr



 Joris Meys jorism...@gmail.com napsal dne 07.07.2010 17:33:55:

 Your examples are pretty extreme... Combining 120 data points into 4
 points is of course never going to give a result. Try :

 fac <- rep(1:8, each=15)
 xprum <- tapply(x, fac, mean)
 yprum <- tapply(y, fac, mean)
 plot(xprum, yprum)

 Relation is not obvious, but visible.

 Yes, you lose information. Yes, your hypothesis changes. But in the
 case you describe, averaging the x-values for every day (so you get an
 average linked to 1 y value) seems like a possibility, given you take
 that into account when formulating the hypothesis. Optimally, you
 should take the standard error on the average into account for the
 analysis, but this is complicated, often not done and in most cases
 ignoring this issue is not influencing the results to that extent it
 becomes important.

 YMMV

 Cheers

 On Wed, Jul 7, 2010 at 4:24 PM, Petr PIKAL petr.pi...@precheza.cz
 wrote:
  Dear all
 
  My question is more on statistics than on R; however, it can be
  demonstrated in R. It is about the pros and cons of trying to find a
  relationship from aggregated data. I have two variables which can be
  related, and I measure them regularly during some time (let's say a
  year), but I cannot measure them at the same time (e.g. I cannot
  measure x and the respective value of y; usually I have 3 or more
  values of x and only one value of y per day).
 
  I can make aggregated values (let's say quarterly). My questions are:

  1.      Is such an approach sound? Can I use it?
  2.      What could be the problems?
  3.      Is there any other method to inspect variables which can be
  related but which you cannot measure directly at the same time?
 
  My opinion is that it is not very sound to inspect aggregated values, and
  there can be many traps, especially if there are only a few aggregated
  values. Below you can see my examples.
 
  If you have some opinion on this issue, please let me know.
 
  Best regards
  Petr
 
  Let us have a relation x/y
 
  set.seed(555)
  x <- rnorm(120)
  y <- 5*x + 3 + rnorm(120)
  plot(x, y)
 
  As you can see from the plot, there is a clear relation. Now I
  make a factor for aggregation.
 
  fac <- rep(1:4, each=30)
 
  xprum <- tapply(x, fac, mean)
  yprum <- tapply(y, fac, mean)
  plot(xprum, yprum)
 
  Relationship is completely gone. Now let us make other fake data
 
  xn <- runif(120)*rep(1:4, each=30)
  yn <- runif(120)*rep(1:4, each=30)
  plot(xn, yn)
 
  There is no visible relation; xn and yn are independent but related to
  the aggregation factor.
 
  xprumn <- tapply(xn, fac, mean)
  yprumn <- tapply(yn, fac, mean)
  plot(xprumn, yprumn)
 
  Here you can see a perfect relation which is only due to the aggregation
  factor.
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 



 --
 Joris Meys
 Statistical consultant

 Ghent University
 Faculty of Bioscience Engineering
 Department of Applied mathematics, biometrics and process control

 tel : +32 9 264 59 87
 joris.m...@ugent.be
 ---
 Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php





-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] package installation for Windows 7

2010-07-08 Thread Joris Meys
Hi,

I am running Windows 7 and R 2.11.1, and everything is installing just
fine for me. Did you install R in the Program Files folder? If so,
uninstall and try to re-install R in another folder (e.g.
c:\R\R2.11.1\ like on my computer). I noticed in the past that the
access control of Windows treats the Program Files folder different
compared to other folders.

Cheers
Joris

On Thu, Jul 8, 2010 at 1:15 PM, Dave Bickel
davidbickel.com+re...@gmail.com wrote:
 Neither biocLite nor the GUI menus can install packages on my system. Here
 is relevant output:

 version
 _
 platform i386-pc-mingw32
 arch i386
 os mingw32
 system i386, mingw32
 status
 major 2
 minor 11.1
 year 2010
 month 05
 day 31
 svn rev 52157
 language R
 version.string R version 2.11.1 (2010-05-31)
 source("http://bioconductor.org/biocLite.R")
 BioC_mirror = http://www.bioconductor.org
 Change using chooseBioCmirror().
 biocLite("ALL")
 Using R version 2.11.1, biocinstall version 2.6.7.
 Installing Bioconductor version 2.6 packages:
 [1] ALL
 Please wait...

 Warning in install.packages(pkgs = pkgs, repos = repos, ...) :
 argument 'lib' is missing: using '\Users\dbickel/R/win-library/2.11'
 Error in if (ok) { : missing value where TRUE/FALSE needed
 source("http://bioconductor.org/biocLite.R")
 BioC_mirror = http://www.bioconductor.org
 Change using chooseBioCmirror().
 Warning messages:
 1: In safeSource() : Redefining ‘biocinstall’
 2: In safeSource() : Redefining ‘biocinstallPkgGroups’
 3: In safeSource() : Redefining ‘biocinstallRepos’
 biocLite()
 Using R version 2.11.1, biocinstall version 2.6.7.
 Installing Bioconductor version 2.6 packages:
 [1] affy affydata affyPLM affyQCReport
 [5] annaffy annotate Biobase biomaRt
 [9] Biostrings DynDoc gcrma genefilter
 [13] geneplotter GenomicRanges hgu95av2.db limma
 [17] marray multtest vsn xtable
 Please wait...

 Warning in install.packages(pkgs = pkgs, repos = repos, ...) :
 argument 'lib' is missing: using '\Users\dbickel/R/win-library/2.11'
 Error in if (ok) { : missing value where TRUE/FALSE needed
 biocLite()
 Using R version 2.11.1, biocinstall version 2.6.7.
 Installing Bioconductor version 2.6 packages:
 [1] affy affydata affyPLM affyQCReport
 [5] annaffy annotate Biobase biomaRt
 [9] Biostrings DynDoc gcrma genefilter
 [13] geneplotter GenomicRanges hgu95av2.db limma
 [17] marray multtest vsn xtable
 Please wait...

 Warning in install.packages(pkgs = pkgs, repos = repos, ...) :
 argument 'lib' is missing: using '\Users\dbickel/R/win-library/2.11'
 Error in if (ok) { : missing value where TRUE/FALSE needed
 library(Biobase)
 Error in library(Biobase) : there is no package called 'Biobase'
 biocLite("Biobase")
 Using R version 2.11.1, biocinstall version 2.6.7.
 Installing Bioconductor version 2.6 packages:
 [1] Biobase
 Please wait...

 Warning in install.packages(pkgs = pkgs, repos = repos, ...) :
 argument 'lib' is missing: using '\Users\dbickel/R/win-library/2.11'
 Error in if (ok) { : missing value where TRUE/FALSE needed
 source("http://bioconductor.org/biocLite.R")
 BioC_mirror = http://www.bioconductor.org
 Change using chooseBioCmirror().
 Warning messages:
 1: In safeSource() : Redefining ‘biocinstall’
 2: In safeSource() : Redefining ‘biocinstallPkgGroups’
 3: In safeSource() : Redefining ‘biocinstallRepos’
 biocLite("Biobase")
 Using R version 2.11.1, biocinstall version 2.6.7.
 Installing Bioconductor version 2.6 packages:
 [1] Biobase
 Please wait...

 Warning in install.packages(pkgs = pkgs, repos = repos, ...) :
 argument 'lib' is missing: using '\Users\dbickel/R/win-library/2.11'
 Error in if (ok) { : missing value where TRUE/FALSE needed

 utils:::menuInstallLocal() # Install package(s) from local zip files...
 Error in if (ok) { : missing value where TRUE/FALSE needed

 utils:::menuInstallPkgs() # Install package(s)...
 --- Please select a CRAN mirror for use in this session ---
 Error in if (ok) { : missing value where TRUE/FALSE needed


 I would appreciate any assistance.

 David

 --
 David R. Bickel, PhD
 Associate Professor
 Ottawa Institute of Systems Biology
 Biochem., Micro. and I. Department
 Mathematics and Statistics Department
 University of Ottawa
 451 Smyth Road
 Ottawa, Ontario K1H 8M5

 http://www.statomics.com

 Office Tel: (613) 562-5800 ext. 8670
 Office Fax: (613) 562-5185
 Office Room: RGN 4510F (Follow the signs to the elevator, and take it to the
 fourth floor. Turn left and go all the way to the end of the hall, and enter
 the door to the OISB area.)
 Lab Tel.: (613) 562-5800 ext. 8304
 Lab Room: RGN 4501T

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9

Re: [R] how to determine order of arima bt acf and pacf

2010-07-08 Thread Joris Meys
Did you read these already?

http://www.statoek.wiso.uni-goettingen.de/veranstaltungen/zeitreihen/sommer03/ts_r_intro.pdf

http://www.barigozzi.eu/ARIMA.pdf
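As a hedged illustration of the usual reading of these plots (the series below is simulated, not your data):

```r
set.seed(42)
x <- arima.sim(model = list(ar = 0.7), n = 200)  # an AR(1) series

acf(x)    # for an AR(p) process the ACF tails off gradually...
pacf(x)   # ...while the PACF cuts off after lag p (here: lag 1)

# so an ARIMA(1,0,0) is a reasonable first candidate
fit <- arima(x, order = c(1, 0, 0))
fit
```

The mirror-image rule applies for MA(q): the ACF cuts off after lag q and the PACF tails off.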

Cheers
Joris

On Thu, Jul 8, 2010 at 11:45 AM, vijaysheegi vijay.she...@gmail.com wrote:

 Hi R community,
  I am new to R and have some doubts on ACF and PACF: applying acf and pacf to
  a data frame or table, how do I determine the ARIMA order which suits our example?
  Please assist me on this, and also please give me a link for the same so that I
  can also try to understand the solution.

 Thanks in advance
 vijaysheegi
 student


 --
 View this message in context: 
 http://r.789695.n4.nabble.com/how-to-determine-order-of-arima-bt-acf-and-pacf-tp2282053p2282053.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] strsplit(dia ma, \\b) splits characterwise

2010-07-08 Thread Joris Meys
I guess this is expected behaviour, although counterintuitive. "\b"
represents an empty string indicating a word boundary, but it is coerced
to character and thus becomes simply the empty string. This means the output
you get is the same as
 strsplit("dia ma", "", perl=TRUE)
[[1]]
[1] "d" "i" "a" " " "m" "a"

I'd use the separating character as split in strsplit, e.g.

 strsplit("dia ma", "\\s")
[[1]]
[1] "dia" "ma"

If you need the space in the list as well, you'll have to work around it I guess.

 test <- as.vector(gregexpr("\\b", "dia ma", perl=TRUE)[[1]])
 test
[1] 1 4 5 7
 apply(embed(test, 2), 1, function(x) substr("dia ma", x[2], x[1]-1))
[1] "dia" " "   "ma"

It would be nice if special sequences like "\b" were recognized by
strsplit as well, though.
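For reuse, the gregexpr-based workaround can be wrapped in a small helper (the function name is made up):

```r
# split a string at \b word boundaries, keeping the separators
split_at_boundaries <- function(s) {
  # boundary positions from gregexpr, plus one past the end of the string
  pos <- unique(c(as.integer(gregexpr("\\b", s, perl = TRUE)[[1]]),
                  nchar(s) + 1L))
  # each piece runs from one boundary up to just before the next
  substring(s, head(pos, -1), tail(pos, -1) - 1L)
}

split_at_boundaries("dia ma")   # "dia" " "   "ma"
```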

Cheers
Joris

On Thu, Jul 8, 2010 at 10:15 AM, Suharto Anggono Suharto Anggono
suharto_angg...@yahoo.com wrote:
 "\b" is a word boundary.
 But, unexpectedly, strsplit("dia ma", "\\b") splits character by character.

 strsplit("dia ma", "\\b")
 [[1]]
 [1] "d" "i" "a" " " "m" "a"

 strsplit("dia ma", "\\b", perl=TRUE)
 [[1]]
 [1] "d" "i" "a" " " "m" "a"


 How can that be?

 This is the output of 'gregexpr'.

 gregexpr("\\b", "dia ma")
 [[1]]
 [1] 1 2 3 4 5 6
 attr(,"match.length")
 [1] 0 0 0 0 0 0

 gregexpr("\\b", "dia ma", perl=TRUE)
 [[1]]
 [1] 1 4 5 7
 attr(,"match.length")
 [1] 0 0 0 0


 The output from gregexpr("\\b", "dia ma", perl=TRUE) is what I expect. I
 expect 'strsplit' to split at those points.

 This is in Windows. R was installed from binary.

 sessionInfo()
 R version 2.11.1 (2010-05-31)
 i386-pc-mingw32

 locale:
 [1] LC_COLLATE=English_United States.1252
 [2] LC_CTYPE=English_United States.1252
 [3] LC_MONETARY=English_United States.1252
 [4] LC_NUMERIC=C
 [5] LC_TIME=English_United States.1252

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base



 R 2.8.1 shows the same 'strsplit' behavior, but the behavior of the default
 'gregexpr' (i.e. perl=FALSE) is different.

 strsplit("dia ma", "\\b")
 [[1]]
 [1] "d" "i" "a" " " "m" "a"

 strsplit("dia ma", "\\b", perl=TRUE)
 [[1]]
 [1] "d" "i" "a" " " "m" "a"

 gregexpr("\\b", "dia ma")
 [[1]]
 [1] 1 4 5 7
 attr(,"match.length")
 [1] 0 0 0 0

 gregexpr("\\b", "dia ma", perl=TRUE)
 [[1]]
 [1] 1 4 5 7
 attr(,"match.length")
 [1] 0 0 0 0

 sessionInfo()
 R version 2.8.1 (2008-12-22)
 i386-pc-mingw32

 locale:
 LC_COLLATE=English_United States.1252;LC_CTYPE=English_United 
 States.1252;LC_MON
 ETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United 
 States.1252


 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base




 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] package installation for Windows 7

2010-07-08 Thread Joris Meys
As far as my experience goes, there is no need whatsoever to run R as
an administrator for installation of any package, including
BioConductor, provided you stay away from the Program Files folder.
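One way to make that concrete (the path below is only an example) is to keep a personal library outside Program Files and install there explicitly:

```r
# personal library outside Program Files (example path)
lib <- "C:/R/library"
dir.create(lib, recursive = TRUE, showWarnings = FALSE)
.libPaths(c(lib, .libPaths()))      # put it first on the search path

install.packages("xtable", lib = lib)
library(xtable, lib.loc = lib)
```

With the personal library first in `.libPaths()`, subsequent `install.packages()` calls default to it and no administrator rights are needed.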

Cheers
Joris

On Thu, Jul 8, 2010 at 2:55 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote:
 On 08/07/2010 7:15 AM, Dave Bickel wrote:

 Neither biocLite nor the GUI menus can install packages on my system.

 This is probably a permissions problem.  Are you running R as an
 administrator when you try the install?  If not, you won't be able to
 install to the default location, but you should be able to set up your own
 library somewhere else, and install there.

 The error message should be fixed; we shouldn't give

 Error in if (ok) { : missing value where TRUE/FALSE needed


 but I doubt if that's the source of your problem.

 BTW, in future Bioconductor questions should go to their list.  In this case
 it looks as though it's not a BioC problem, but that just means there's no
 need to list all the biocLite.R material at the start:  short simple bug
 reports are easier to deal with.

 Duncan Murdoch

 Here is relevant output:

   version
 _
 platform i386-pc-mingw32
 arch i386
 os mingw32
 system i386, mingw32
 status
 major 2
 minor 11.1
 year 2010
 month 05
 day 31
 svn rev 52157
 language R
 version.string R version 2.11.1 (2010-05-31)
  source("http://bioconductor.org/biocLite.R")
 BioC_mirror = http://www.bioconductor.org
 Change using chooseBioCmirror().
  biocLite("ALL")
 Using R version 2.11.1, biocinstall version 2.6.7.
 Installing Bioconductor version 2.6 packages:
 [1] ALL
 Please wait...

 Warning in install.packages(pkgs = pkgs, repos = repos, ...) :
 argument 'lib' is missing: using '\Users\dbickel/R/win-library/2.11'
 Error in if (ok) { : missing value where TRUE/FALSE needed
  source("http://bioconductor.org/biocLite.R")
 BioC_mirror = http://www.bioconductor.org
 Change using chooseBioCmirror().
 Warning messages:
 1: In safeSource() : Redefining ‘biocinstall’
 2: In safeSource() : Redefining ‘biocinstallPkgGroups’
 3: In safeSource() : Redefining ‘biocinstallRepos’
   biocLite()
 Using R version 2.11.1, biocinstall version 2.6.7.
 Installing Bioconductor version 2.6 packages:
 [1] affy affydata affyPLM affyQCReport
 [5] annaffy annotate Biobase biomaRt
 [9] Biostrings DynDoc gcrma genefilter
 [13] geneplotter GenomicRanges hgu95av2.db limma
 [17] marray multtest vsn xtable
 Please wait...

 Warning in install.packages(pkgs = pkgs, repos = repos, ...) :
 argument 'lib' is missing: using '\Users\dbickel/R/win-library/2.11'
 Error in if (ok) { : missing value where TRUE/FALSE needed
   biocLite()
 Using R version 2.11.1, biocinstall version 2.6.7.
 Installing Bioconductor version 2.6 packages:
 [1] affy affydata affyPLM affyQCReport
 [5] annaffy annotate Biobase biomaRt
 [9] Biostrings DynDoc gcrma genefilter
 [13] geneplotter GenomicRanges hgu95av2.db limma
 [17] marray multtest vsn xtable
 Please wait...

 Warning in install.packages(pkgs = pkgs, repos = repos, ...) :
 argument 'lib' is missing: using '\Users\dbickel/R/win-library/2.11'
 Error in if (ok) { : missing value where TRUE/FALSE needed
   library(Biobase)
 Error in library(Biobase) : there is no package called 'Biobase'
  biocLite("Biobase")
 Using R version 2.11.1, biocinstall version 2.6.7.
 Installing Bioconductor version 2.6 packages:
 [1] Biobase
 Please wait...

 Warning in install.packages(pkgs = pkgs, repos = repos, ...) :
 argument 'lib' is missing: using '\Users\dbickel/R/win-library/2.11'
 Error in if (ok) { : missing value where TRUE/FALSE needed
  source("http://bioconductor.org/biocLite.R")
 BioC_mirror = http://www.bioconductor.org
 Change using chooseBioCmirror().
 Warning messages:
 1: In safeSource() : Redefining ‘biocinstall’
 2: In safeSource() : Redefining ‘biocinstallPkgGroups’
 3: In safeSource() : Redefining ‘biocinstallRepos’
  biocLite("Biobase")
 Using R version 2.11.1, biocinstall version 2.6.7.
 Installing Bioconductor version 2.6 packages:
 [1] Biobase
 Please wait...

 Warning in install.packages(pkgs = pkgs, repos = repos, ...) :
 argument 'lib' is missing: using '\Users\dbickel/R/win-library/2.11'
 Error in if (ok) { : missing value where TRUE/FALSE needed

   utils:::menuInstallLocal() # Install package(s) from local zip
 files...
 Error in if (ok) { : missing value where TRUE/FALSE needed

   utils:::menuInstallPkgs() # Install package(s)...
 --- Please select a CRAN mirror for use in this session ---
 Error in if (ok) { : missing value where TRUE/FALSE needed


 I would appreciate any assistance.

 David



 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied

Re: [R] random sample from arrays

2010-07-08 Thread Joris Meys
Don't know what exactly you're trying to do, but you make a matrix
with 11 columns and 50 rows, then treat it as a vector. On top of
that, you try to fill 50 rows/columns with 50 values. Of course that
doesn't work. Did you check the warning messages when running the
code?

Either do :

 for(i in c(1:11)){
   set[,i] <- sample(x, 50)
   print(c(i, "-", set), quote = FALSE)
 }

or

 for(i in c(1:50)){
   set[i,] <- sample(x, 11)
   print(c(i, "-", set), quote = FALSE)
 }

Or just forget about the loop altogether and do :

set <- replicate(11, sample(x, 50))
or
set <- t(replicate(50, sample(x, 11)))
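For the stated goal — 50 random samples of row numbers from a big table — a sketch along the same lines (the sizes are taken approximately from the post):

```r
set.seed(123)
n_rows    <- 40000              # rows in the big array (approximate)
n_samples <- 50                 # number of random samples wanted
size      <- 200                # lines per sample

# one column of row indices per sample; the result persists after the call
idx <- replicate(n_samples, sample(n_rows, size))
dim(idx)                        # 200 rows, 50 columns

# e.g. extract the rows for the first sample from a data frame 'genelist'
# subset1 <- genelist[idx[, 1], ]
```

Because `replicate()` returns its value instead of printing inside a loop, nothing is lost once the loop body finishes.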

cheers

On Thu, Jul 8, 2010 at 8:04 AM, Assa Yeroslaviz fry...@gmail.com wrote:
 Hello R users,

 I'm trying to extract random samples from a big array I have.

 I have a data frame of over 40k lines and would like to produce around 50
 random sample of around 200 lines each from this array.

 this is the matrix
      ID    xxx_1c   xxx__2c   xxx__3c   xxx__4c   xxx__5T   xxx__6T   xxx__7T   xxx__8T    yyy_1c    yyy_2c
  1 A_512  2.150295  2.681759  2.177138  2.142790  2.115344  2.013047  2.115634  2.189372  1.643328  1.563523
  2 A_134 12.832488 12.596373 12.882581 12.987091 11.956149 11.994779 11.650336 11.995504 13.024494 12.776322
  3 A_152  2.063276  2.160961  2.067549  2.059732  2.656416  2.075775  2.033982  2.111937  1.606340  1.548940
  4 A_163  9.570761 10.448615  9.432859  9.732615 10.354234 10.993279  9.160038  9.104121 10.079177  9.828757
  5 A_184  3.574271  4.680859  4.517047  4.047096  3.623668  3.021356  3.559434  3.156093  4.308437  4.045098
  6 A_199  7.593952  7.454087  7.513013  7.449552  7.345718  7.367068  7.410085  7.022582  7.668616  7.953706
  ...

 I tried to do it with a for loop:

 genelist <- read.delim("/user/R/raw_data.txt")
 rownames(genelist) <- genelist[,1]
 genes <- rownames(genelist)

 x <- 1:4
 set <- matrix(nrow = 50, ncol = 11)

 for(i in c(1:50)){
    set[i] <- sample(x, 50)
    print(c(i, "-", set), quote = FALSE)
 }

 which basically does the trick, but I just can't save the results outside
 the loop.
 After having the random sets of lines it wasn't a problem to extract the
 line from the arrays using subset.

 genSet1 <- sample(x, 50)
 random1 <- genes %in% genSet1
 subsetGenelist <- subset(genelist, random1)


 Is there a different way of creating these random vectors, or of saving the
 loop results outside the loop so I can work with them?

 Thanks a lot

 Assa

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using nlm or optim

2010-07-08 Thread Joris Meys
Without data I can't check, but try :
mle(nll,start=list(c=0.01,z=2.1,s=200),fixed=list(V=Var,M=Mean))

With a random dataset I get :
 Mean <- rnorm(136)
 Var <- 1 + rnorm(136)^2
 mle(nll, start=list(c=0.01, z=2.1, s=200), fixed=list(V=Var, M=Mean))
Error in optim(start, f, method = method, hessian = TRUE, ...) :
  initial value in 'vmmin' is not finite

This might be just a data problem, but again, I'm not sure.
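One common fix is to keep the data out of nll's argument list entirely, so mle only sees the parameters to optimize; a sketch with simulated data (all numbers invented, and s treated as the error variance):

```r
library(stats4)

set.seed(7)
M <- runif(136, 1, 10)                   # stand-in for the Mean variable
V <- 2 * M^1.5 + rnorm(136, sd = 4)      # stand-in for the Variance variable

Predict <- function(M, c, z) c * M^z

# the data (M, V) are found in the enclosing environment, not passed to mle()
nll <- function(c, z, s) {
  n <- length(V)
  r <- V - Predict(M, c, z)
  0.5 * n * log(2 * pi) + 0.5 * n * log(s) + sum(r^2) / (2 * s)
}

fit <- mle(nll, start = list(c = 1, z = 1.5, s = 20))
coef(fit)
```

The optimizer may wander into s <= 0 and produce NaN warnings; reparameterizing with log(s) as the free parameter is the usual remedy.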

Cheers
Joris

On Thu, Jul 8, 2010 at 3:11 AM, Anita Narwani anitanarw...@gmail.com wrote:
 Hello,
 I am trying to use nlm to estimate the parameters that minimize the
 following function:

 Predict <- function(M,c,z){
 + v = c*M^z
 + return(v)
 + }

 M is a variable and c and z are parameters to be estimated.

 I then write the negative loglikelihood function assuming normal errors:

 nll <- function(M,V,c,z,s){
 n <- length(Mean)
 logl <- -.5*n*log(2*pi) -.5*n*log(s) - (1/(2*s))*sum((V-Predict(Mean,c,z))^2)
 return(-logl)
 }

 When I put the Mean and Variance (variables with 136 observations) into this
 function, and estimates for c,z, and s, it outputs the estimate for the
 normal negative loglikelihood given the data, so I know that this works.

 However, I am unable to use mle to estimate the parameters c, z, and s. I do
 not know how or where the data i.e. Mean (M) and Variance (V) should enter
 into the mle function. I have tried variations on

 mle(nll,start=list(c=0.01,z=2.1,s=200)) including
 mle(nll,start=list(M=Mean,V=Var, c=0.01,z=2.1,s=200))

 I keep getting errors and am quite certain that I just have a syntax error
 in the script because I don't know how to enter the variables into MLE.

 Thanks for your help,
 Anita.



Re: [R] package installation for Windows 7

2010-07-08 Thread Joris Meys
On Thu, Jul 8, 2010 at 3:39 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote:
 On 08/07/2010 9:26 AM, David Bickel wrote:

 Yes, the User into which I logged in before launching RGui is an
 Administrator. Correct, the problem is not limited to Bioconductor packages.


  On Windows 7, it's not enough to have the user be an administrator; you need
  to run programs specifically with administrative rights.  You do this by
  right-clicking on the icon and choosing "Run as administrator".

The reason for this is that Program Files is a protected directory:
changes can only be made by programs that are granted administrator
rights (which is not the same as running a program from an
administrator account). Without those rights, R cannot write to the
default package folder, which also lives under Program Files. Changing
Rprofile.site becomes a hassle as well, as do many other tweaks.

In practice, about the only programs that should end up there are
Microsoft-related applications. If there's no strict need to place a
program there, I stay away from that folder as far as possible. Vista
has the same problem, by the way, and obviously the same solutions.
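A way around the whole issue (my suggestion, not from the thread) is to install packages into a personal library outside Program Files, so no administrator rights are needed. The sketch below uses a temporary directory as a stand-in for a real path such as "C:/Users/yourname/R/library":

```r
# stand-in for a writable personal library path outside Program Files
lib <- file.path(tempdir(), "Rlibs")
dir.create(lib, showWarnings = FALSE)
.libPaths(c(lib, .libPaths()))   # personal library is now searched first
.libPaths()[1]
# install.packages("somePackage", lib = lib)  # would then work without admin rights
```

Putting the .libPaths() call in a user-level .Rprofile makes the setting permanent.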

Cheers
Joris



Re: [R] Different goodness of fit tests leads to contradictory conclusions

2010-07-07 Thread Joris Meys
The first two options are GOF tests for ungrouped data; the latter two
can only be used for grouped data. In my experience, the G^2
test is more influenced by this than the X^2 test (it more easily gives
-wrongly- significant deviations when used for ungrouped data).

If you started from binary data, you can only use the Hosmer tests.

Cheers
Joris

On Wed, Jul 7, 2010 at 4:00 PM, Kiyoshi Sasaki skiyoshi2...@yahoo.com wrote:
 I am trying to test goodness of fit for my logistic regression using 
 several options as shown below: the Hosmer-Lemeshow test (whose function I 
 borrowed from a previous post), the Hosmer-le Cessie omnibus lack-of-fit test 
 (also borrowed from a previous post), the Pearson chi-square test, and the 
 deviance test.  All the tests except the deviance test produced p-values well 
 above 0.05.  Would anyone please teach me why the deviance test produces a 
 p-value so small (0.001687886) that it suggests lack of fit, while the other 
 tests suggest good fit? Did I do something wrong?

 Thank you for your time and help!

 Kiyoshi


 mod.fit <- glm(formula = no.NA$repcnd ~  no.NA$svl, family = binomial(link = 
 "logit"), data =  no.NA, na.action = na.exclude, control = list(epsilon = 
 0.0001, maxit = 50, trace = FALSE))

 # Option 1: Hosmer-Lemeshow test
 mod.fit <- glm(formula = no.NA$repcnd ~  no.NA$svl, family = binomial(link = 
 "logit"), data =  no.NA, na.action = na.exclude, control = list(epsilon = 
 0.0001, maxit = 50, trace = FALSE))

    hosmerlem <- function (y, yhat, g = 10)
 {
 cutyhat <- cut(yhat, breaks = quantile(yhat, probs = seq(0, 1, 1/g)), 
 include.lowest = TRUE)
 obs <- xtabs(cbind(1 - y, y) ~ cutyhat)
 expect <- xtabs(cbind(1 - yhat, yhat) ~ cutyhat)
 chisq <- sum((obs - expect)^2/expect)
 P <- 1 - pchisq(chisq, g - 2)
 c("X^2" = chisq, "Df" = g - 2, "P(Chi)" = P)
 }

 hosmerlem(no.NA$repcnd, fitted(mod.fit))
  X^2       Df       P(Chi)
 7.8320107    8.000    0.4500497


 # Option 2 - Hosmer-le Cessie omnibus lack-of-fit test:
 library(Design)
 lrm.GOF <- lrm(formula = no.NA$repcnd ~  no.NA$svl, data =  no.NA, y = T, x 
 = T)
 resid(lrm.GOF, type = "gof")
 Sum of squared errors     Expected value|H0                    SD
            48.3487115            48.3017399             0.1060826
                     Z                     P
             0.4427829             0.6579228

 # Option 3 - Pearson chi-square p-value:
 pp <- sum(resid(lrm.GOF, type = "pearson")^2)
 1-pchisq(pp, mod.fit$df.resid)
 [1] 0.4623282


 # Option 4 - Deviance (G^2) test:
 1-pchisq(mod.fit$deviance, mod.fit$df.resid)
 [1] 0.001687886










Re: [R] relation in aggregated data

2010-07-07 Thread Joris Meys
Your examples are pretty extreme... Combining 120 data points into 4
points is of course never going to give a result. Try :

fac <- rep(1:8,each=15)
xprum <- tapply(x, fac, mean)
yprum <- tapply(y, fac, mean)
plot(xprum, yprum)

Relation is not obvious, but visible.

Yes, you lose information. Yes, your hypothesis changes. But in the
case you describe, averaging the x-values for every day (so you get an
average linked to 1 y value) seems like a possibility, given you take
that into account when formulating the hypothesis. Optimally, you
should take the standard error on the average into account for the
analysis, but this is complicated, often not done and in most cases
ignoring this issue is not influencing the results to that extent it
becomes important.
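One simple way to carry at least the group sizes into the analysis (my suggestion, not something from the thread) is to weight the regression on the aggregated means by how many observations each mean summarizes; with equal group sizes, as here, it reduces to the unweighted fit, but the mechanism generalizes to unequal groups:

```r
set.seed(555)
x <- rnorm(120)
y <- 5*x + 3 + rnorm(120)
fac <- rep(1:8, each = 15)
xprum <- tapply(x, fac, mean)
yprum <- tapply(y, fac, mean)
n <- tapply(x, fac, length)              # group sizes (all 15 here)
fit <- lm(yprum ~ xprum, weights = n)    # weight each mean by its group size
coef(fit)                                # slope close to the true value 5
```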

YMMV

Cheers

On Wed, Jul 7, 2010 at 4:24 PM, Petr PIKAL petr.pi...@precheza.cz wrote:
 Dear all

 My question is more on statistics than on R, however it can be
 demonstrated by R. It is about pros and cons trying to find a relationship
 by aggregated data. I can have two variables which can be related and I
 measure them regularly during some time (let say a year) but I can not
 measure them in a same time - (e.g. I can not measure x and respective
 value of y, usually I have 3 or more values of x and only one value of y
 per day).

 I can make a aggregated values (let say quarterly). My questions are:

 1.      Is such approach sound? Can I use it?
 2.      What could be the problems
 3.      Is there any other method to inspect variables which can be
 related but you can not directly measure them in a same time?

 My opinion is, that it is not much sound to inspect aggregated values and
 there can be many traps especially if there are only few aggregated
 values. Below you can see my examples.

 If you have some opinion on this issue, please let me know.

 Best regards
 Petr

 Let us have a relation x/y

 set.seed(555)
 x <- rnorm(120)
 y <- 5*x+3+rnorm(120)
 plot(x, y)

 As you can see there is clear relation which can be seen from plot. Now I
 make a factor for aggregation.

 fac <- rep(1:4,each=30)

 xprum <- tapply(x, fac, mean)
 yprum <- tapply(y, fac, mean)
 plot(xprum, yprum)

 Relationship is completely gone. Now let us make other fake data

 xn <- runif(120)*rep(1:4, each=30)
 yn <- runif(120)*rep(1:4, each=30)
 plot(xn,yn)

 There is no visible relation, xn and yn are independent but related to
 aggregation factor.

 xprumn <- tapply(xn, fac, mean)
 yprumn <- tapply(yn, fac, mean)
 plot(xprumn, yprumn)

 Here you can see perfect relation which is only due to aggregation factor.







Re: [R] problems with write.table, involving loops paste statement

2010-07-07 Thread Joris Meys
To be precise: according to my testing, everything is written to all folders.

There is absolutely no need whatsoever to use l_ply. In any case,
take as much as possible outside the loop, like the library statement
and the max.col computation.

Following is _a_ solution, not the most optimal, but as close as
feasible to your code :

A_var_df <- data.frame(index=1:length(seq(1.0, -0.9, by= -0.1)),
from=seq(1.0, -0.9, by= -0.1), to=seq(0.9, -1.0, by= -0.1))

# I make some data and make sure I can adjust the number of dirs and the steps
Dchr1 <- matrix(rep(1:100,each=10),ncol=100)
dirs <- 20
max.col <- ncol(Dchr1)
steps = ceiling(max.col/dirs)

cols <- seq(1, max.col, by=steps)

for(i in 1:length(A_var_df[,1]))
{
  k <- cols[i]
 print(as.data.frame(Dchr1[,k:min(k+steps, max.col)]))
 print(paste("./A_", A_var_df[i,1], "/k.csv", sep=""))
# I use print here just to show which dataframe is going to which directory,
# you can reconstruct the write.table yourself.
}
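Turning the print() placeholders back into an actual write step could look like the sketch below (a self-contained toy: a temporary directory stands in for the real A_1 ... A_20 folder tree, and the column counts are scaled down):

```r
# toy stand-ins; base replaces the real A_1 ... A_20 folder tree
base  <- tempdir()
Dchr1 <- matrix(rep(1:100, each = 10), ncol = 100)
steps <- 5
cols  <- seq(1, ncol(Dchr1), by = steps)
for (i in seq_along(cols)) {
  dir.create(file.path(base, paste0("A_", i)), showWarnings = FALSE)
  k <- cols[i]
  write.table(as.data.frame(Dchr1[, k:min(k + steps - 1, ncol(Dchr1))]),
              file = file.path(base, paste0("A_", i), "k.csv"),
              sep = ",", row.names = FALSE, quote = FALSE)
}
length(list.files(base, pattern = "^k\\.csv$", recursive = TRUE))  # 20
```

Because the folder index i advances together with the column index k, each chunk lands in its own folder rather than every folder receiving the last chunk.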

Cheers
Joris

On Wed, Jul 7, 2010 at 9:32 PM, CC turtysm...@gmail.com wrote:
 Hi!

 I want to write portions of my data (3573 columns at a time) to twenty
 folders I have available titled A_1 to A_20 such that the first 3573
 columns will go to folder A_1, next 3573 to folder A_2 and so on.

 This code below ensures that the data is written into all 20 folders, but
 only the last iteration of the loop (last 3573 columns) is being written
 into ALL of the folders (A_1 to A_20) rather than the sequential order that
 I would like.

 How can I fix this?

 Thank you!

 *

 Code:

 A_var_df <- data.frame(index=1:length(seq(1.0, -0.9, by= -0.1)),
 from=seq(1.0, -0.9, by= -0.1), to=seq(0.9, -1.0, by= -0.1))

 for(i in 1:length(A_var_df[,1]))
 {
 library(plyr)
 max.col <- ncol(Dchr1)
 l_ply(seq(1, max.col, by=3573), function(k)
 write.table(as.data.frame(Dchr1[,k:min(k+3572, max.col)]), paste("./A_",
 A_var_df[i,1], "/k.csv", sep=""), sep=",", row.names=F, quote=F) )
 }

 **

 --
 Thanks,
 CC







Re: [R] when all I have is a contingency table....

2010-07-07 Thread Joris Meys
See Ted's answer for the histogram (I'd go with barplot).

For most statistical procedures there is a weighted version (e.g.
weighted.mean() for the mean). Your counts are valid weights for most
procedures.
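A small sketch of both routes, using the table from the question: either expand the table back into raw observations with rep(), after which any standard procedure applies, or feed the counts directly as weights.

```r
tab <- data.frame(Value = 1:6, Count = c(10, 8, 12, 9, 14, 7))
x <- rep(tab$Value, tab$Count)   # expand the table into the 60 raw observations
mean(x)                          # 3.5, identical to weighted.mean(tab$Value, tab$Count)
barplot(tab$Count, names.arg = tab$Value)   # bar chart of the frequencies
```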

Cheers
Joris

On Wed, Jul 7, 2010 at 10:39 PM, Andrei Zorine zoav1...@gmail.com wrote:
 Hello,
 I just need a hint here:
 Suppose I have no raw data, but only a frequency table I have, and I
 want to run basic statistical procedures with it, like histogram,
 descriptive statistics, etc. How do I do this with R?
 For example, how do I plot a histogram for this table for a sample of size 60?

 Value   Count
 1       10
 2       8
 3       12
 4       9
 5       14
 6       7


 Thanks,
 A.Z.







Re: [R] acf

2010-07-06 Thread Joris Meys
Your question is a bit confusing. acfresidfit is an object whose
origin we don't know. With your test file, I arrive at the first
correlations (but with integer headings) :

 residfit <- read.table("fileree2_test_out.txt")
 acf(residfit)
 acfresid <- acf(residfit)
 acfresid

Autocorrelations of series ‘residfit’, by lag

     0      1      2      3      4      5      6      7      8      9     10
 1.000 -0.015  0.010  0.099  0.048 -0.014 -0.039 -0.019  0.040  0.018  0.042
    11     12     13     14     15     16     17     18     19     20     21
 0.078 -0.029  0.028 -0.016 -0.021 -0.109  0.000 -0.038 -0.006  0.015 -0.032
    22     23     24     25
-0.002  0.014 -0.226 -0.030

Could you please check where the object acfresidfit is coming from and
how you generated it?
Cheers
Joris

On Tue, Jul 6, 2010 at 9:47 AM, nuncio m nunci...@gmail.com wrote:
 Hi list,
          I have the following code to compute the acf of a time series
 acfresid <- acf(residfit), where residfit is the series
 when I type acfresid at the prompt the follwoing is displayed

 Autocorrelations of series ‘residfit’, by lag

 0.0000 0.0833 0.1667 0.2500 0.3333 0.4167 0.5000 0.5833 0.6667 0.7500 0.8333

  1.000 -0.015  0.010  0.099  0.048 -0.014 -0.039 -0.019  0.040  0.018  0.042

 0.9167 1.0000 1.0833 1.1667 1.2500 1.3333 1.4167 1.5000 1.5833 1.6667 1.7500

  0.078 -0.029  0.028 -0.016 -0.021 -0.109  0.000 -0.038 -0.006  0.015 -0.032

 1.8333 1.9167 2.0000 2.0833
 -0.002  0.014 -0.226 -0.030
 Residfit is a time-series object at monthly interval (0.0833). Here I
 understand R computed the correlations at lags 0 to 2 years.

 What is surprising to me is
 if I type acfresidfit at the prompt the following is displayed

 Autocorrelations of series ‘residfit’, by lag

     0      1      2      3      4      5      6      7      8      9     10

  1.000 -0.004  0.011  0.041 -0.056  0.019 -0.052 -0.027 -0.008 -0.012 -0.034

    11     12     13     14     15     16     17     18     19     20     21

  0.024 -0.005  0.006 -0.045  0.031 -0.035 -0.011 -0.021 -0.020 -0.010 -0.007

    22     23     24     25
 -0.038  0.017  0.051  0.038
 From the header I understand both are autocorrelations computed at the same
 lags, but the correlations are different.

 Where am I going wrong, and which is the correct one?

 file residfit is also attached(filename-fileree2_test_out.txt)
 Thanks
 nuncio
 --
 Nuncio.M
 Research Scientist
 National Center for Antarctic and Ocean research
 Head land Sada
 Vasco da Gamma
 Goa-403804








Re: [R] timeseries

2010-07-06 Thread Joris Meys
Difficult to guess why, but I reckon you should use ts() instead of
as.ts. Otherwise set the tsp-attribute correctly. Eg :

 x <- cumsum(1+round(rnorm(20),2))
 as.ts(x)
Time Series:
Start = 1
End = 20
Frequency = 1
 [1]  0.87  3.51  4.08  4.20  3.25  4.63  6.30  6.89  9.28  9.93 10.19
 9.97 10.20 11.51 11.52 11.53 12.97 14.04 17.02 18.47

 tseries <- ts(x,frequency=12,start=c(2004,3))
 tseries
   Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
2004  0.87  3.51  4.08  4.20  3.25  4.63  6.30  6.89  9.28  9.93
2005 10.19  9.97 10.20 11.51 11.52 11.53 12.97 14.04 17.02 18.47

 tsp(x) <- c(2004+2/12,2005.75,12)
 as.ts(x)
   Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
2004  0.87  3.51  4.08  4.20  3.25  4.63  6.30  6.89  9.28  9.93
2005 10.19  9.97 10.20 11.51 11.52 11.53 12.97 14.04 17.02 18.47

See ?ts to get the options right. I suggest using the function ts()
instead of assigning the tsp attribute.
Cheers
Joris


On Mon, Jul 5, 2010 at 9:35 AM, nuncio m nunci...@gmail.com wrote:
 Dear useRs,
 I am trying to construct a time series using the as.ts function. Surprisingly,
 when I plot
 the data the x axis does not show the time in years; however, if I use
 ts(data), the time in years is shown on the
 x axis.  Why do the two commands give different results?
 Thanks
 nuncio


 --
 Nuncio.M
 Research Scientist
 National Center for Antarctic and Ocean research
 Head land Sada
 Vasco da Gamma
 Goa-403804







Re: [R] calculation on series with different time-steps

2010-07-06 Thread Joris Meys
Look at ?ifelse and ?abs, e.g. :

data_frame$new_column_in_dataframe <- ifelse(stage$julian_day ==
baro$julian_day & abs(stage$time -
baro$hour) <= 30,
stage$stage.cm - baro$pressure, NA )
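One caveat: the two series have different lengths (5-minute vs hourly), so an element-wise ifelse() would silently recycle the shorter one. An alternative sketch (my suggestion, with toy data standing in for the real stage and baro frames) first looks up the nearest hourly pressure for each stage reading:

```r
set.seed(7)
# toy data: stage every 5 minutes, pressure every hour (times in minutes)
stage <- data.frame(time = seq(0, 115, by = 5), stage.cm = 100 + rnorm(24))
baro  <- data.frame(hour = c(0, 60), pressure = c(10, 12))
# index of the nearest pressure reading for each stage time
idx  <- sapply(stage$time, function(t) which.min(abs(baro$hour - t)))
near <- abs(stage$time - baro$hour[idx]) <= 30
stage$corrected <- ifelse(near, stage$stage.cm - baro$pressure[idx], NA)
sum(is.na(stage$corrected))   # 5 readings have no pressure within 30 minutes
```

With idx aligning the two series, the ifelse() then operates on vectors of equal length.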

Cheers
Joris


On Thu, Jul 1, 2010 at 10:09 PM, Jeana Lee jeana@colorado.edu wrote:
 Hello,

 I have two series, one with stream stage measurements every 5 minutes, and
 the other with barometric pressure measurements every hour.  I want to
 subtract each barometric pressure measurement from the 12 stage measurements
 closest in time to it (6 stage measurements on either side of the hour).

 I want to do something like the following, but I don't know the syntax.

 If the Julian day of the stage measurement is equal to the Julian day of
 the pressure measurement, AND the absolute value of the difference between
 the time of the stage measurement and the hour of the pressure measurement
 is less than or equal to 30 minutes, then subtract the pressure measurement
 from the stage measurement (and put it in a new column in the stage data
 frame).

                     if ( stage$julian_day == baro$julian_day  &  |stage$time -
 baro$hour| <= 30 )
                     then (stage$stage.cm - baro$pressure)

 Can you help me?

 Thanks,
 JL







Re: [R] Help With ANOVA

2010-07-06 Thread Joris Meys
We're missing Samp1 etc., so we aren't able to test the code.
Where did you get the other p-value?
Cheers
Joris

On Tue, Jul 6, 2010 at 3:08 PM, Amit Patel amitrh...@yahoo.co.uk wrote:
 Hi I needed some help with ANOVA

 I have a problem with My ANOVA
 analysis. I have a dataset with a known ANOVA p-value, however I can
 not seem to re-create it in R.

 I have created a list (zzzanova) which contains
 1)Intensity Values
 2)Group Number (6 Different Groups)
 3)Sample Number (54 different samples)
 this is created by the script in Appendix 1

 I then conduct ANOVA with the command
 zzz.aov <- aov(Intensity ~ Group, data = zzzanova)

 I get a p-value of
 Pr(>F)1
 0.9483218

 The
 expected p-value is 0.00490, so I feel I may be using ANOVA incorrectly
 or have put in a wrong formula. I am trying to do an ANOVA analysis
 across all 6 Groups. Is there something wrong with my formula? I think I
 have made a mistake in the formula rather than anything else.




 APPENDIX 1

 datalist <- c(-4.60517, -4.60517, -4.60517, -4.60517, -4.60517, -4.60517, 
 -4.60517, 3.003749, -4.60517,
    2.045314, 2.482557, -4.60517, -4.60517, -4.60517, -4.60517, 1.592743, 
 -4.60517,
    -4.60517, 0.91328, -4.60517, -4.60517, 1.827744, 2.457795, 0.355075, 
 -4.60517, 2.39127,
    2.016987, 2.319903, 1.146683, -4.60517, -4.60517, -4.60517, 1.846162, 
 -4.60517, 2.121427, 1.973118,
    -4.60517, 2.251568, -4.60517, 2.270724, 0.70338, 0.963816, -4.60517,  
 0.023703, -4.60517,
    2.043382, 1.070586, 2.768289, 1.085169, 0.959334, -0.02428, -4.60517, 
 1.371895, 1.533227)

 zzzanova <-
 structure(list(Intensity = c(t(Samp1), t(Samp2), t(Samp3), t(Samp4)),
 Group = structure(c(1,1,1,1,1,1,1,1,1,
         2,2,2,2,2,2,2,2,
         3,3,3,3,3,3,3,3,3,
         4,4,4,4,4,4,4,4,4,4,
         5,5,5,5,5,5,5,5,5,
         6,6,6,6,6,6,6,6,6), .Label = c("Group1", "Group2", "Group3", 
 "Group4", "Group5", "Group6"), class = "factor"),
    Sample = structure(c( 1, 2, 3, 4, 5, 6, 7, 8, 9,
    10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
    20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
    30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 
 40,41,42,43,44,45,46,47,48,49,50,51,52,53,54)
 ))
 , .Names = c("Intensity",
 "Group", "Sample"), row.names =
 c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
 51, 52, 53, 54), class = "data.frame")










Re: [R] Help With ANOVA

2010-07-06 Thread Joris Meys
I still can't reproduce your example. The aov output gives me the following :

 anova(aov(Intensity ~ Group, data = zzzanova))
Analysis of Variance Table

Response: Intensity
  Df Sum Sq Mean Sq F value  Pr(>F)
Group  5  98.85  19.771  2.1469 0.07576 .
Residuals 48 442.03   9.209
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Next to that I noticed you have truncated data, which has implications
for the analysis as well. If you use a Kruskal-Wallis test, the
p-value becomes larger :

 kruskal.test(Intensity ~ Group, data = zzzanova)

Kruskal-Wallis rank sum test

data:  Intensity by Group
Kruskal-Wallis chi-squared = 6.6955, df = 5, p-value = 0.2443

Which is to be expected, as you have almost 50% truncated data. So a
p-value of 0.005 seems very wrong to me.

Cheers
Joris

On Tue, Jul 6, 2010 at 7:11 PM, Amit Patel amitrh...@yahoo.co.uk wrote:
 Hi Joris

 Sorry i had a misprint in the appendix code in the last email

 datalist <- c(-4.60517, -4.60517, -4.60517, -4.60517, -4.60517, -4.60517, 
 -4.60517, 3.003749, -4.60517,
    2.045314, 2.482557, -4.60517, -4.60517, -4.60517, -4.60517, 1.592743, 
 -4.60517,
    -4.60517, 0.91328, -4.60517, -4.60517, 1.827744, 2.457795, 0.355075, 
 -4.60517, 2.39127,
    2.016987, 2.319903, 1.146683, -4.60517, -4.60517, -4.60517, 1.846162, 
 -4.60517, 2.121427, 1.973118,
    -4.60517, 2.251568, -4.60517, 2.270724, 0.70338, 0.963816, -4.60517,  
 0.023703, -4.60517,
    2.043382, 1.070586, 2.768289, 1.085169, 0.959334, -0.02428, -4.60517, 
 1.371895, 1.533227)

 zzzanova <-
 structure(list(Intensity = datalist,
 Group = structure(c(1,1,1,1,1,1,1,1,1,
         2,2,2,2,2,2,2,2,
         3,3,3,3,3,3,3,3,3,
         4,4,4,4,4,4,4,4,4,4,
         5,5,5,5,5,5,5,5,5,
         6,6,6,6,6,6,6,6,6), .Label = c("Group1", "Group2", "Group3", 
 "Group4", "Group5", "Group6"), class = "factor"),
    Sample = structure(c( 1, 2, 3, 4, 5, 6, 7, 8, 9,
    10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
    20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
    30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 
 40,41,42,43,44,45,46,47,48,49,50,51,52,53,54)
 ))
 , .Names = c("Intensity",
 "Group", "Sample"), row.names =
 c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
 51, 52, 53, 54), class = "data.frame")


 Thanks for your reply





 - Original Message 
 From: Joris Meys jorism...@gmail.com
 To: Amit Patel amitrh...@yahoo.co.uk
 Cc: r-help@r-project.org
 Sent: Tue, 6 July, 2010 17:04:40
 Subject: Re: [R] Help With ANOVA

 We're missing the samp1 etc. in order to be able to test the code.
 Where did you get the other p-value?
 Cheers
 Joris

 On Tue, Jul 6, 2010 at 3:08 PM, Amit Patel amitrh...@yahoo.co.uk wrote:
 Hi I needed some help with ANOVA

 I have a problem with My ANOVA
 analysis. I have a dataset with a known ANOVA p-value, however I can
 not seem to re-create it in R.

 I have created a list (zzzanova) which contains
 1)Intensity Values
 2)Group Number (6 Different Groups)
 3)Sample Number (54 different samples)
 this is created by the script in Appendix 1

 I then conduct ANOVA with the command
 zzz.aov <- aov(Intensity ~ Group, data = zzzanova)

 I get a p-value of
 Pr(>F)1
 0.9483218

 The
 expected p-value is 0.00490 so I feel I maybe using ANOVA incorrectly
 or have put in a wrong formula. I am trying to do an ANOVA analysis
 across all 6 Groups. Is there something wrong with my formula. But I think I
 have made a mistake in the formula rather than anything else.





Re: [R] PCA and Regression

2010-07-06 Thread Joris Meys
PCA components are orthogonal by definition, so no, that doesn't make
sense at all. Do yourself a favor and get a book on multivariate data
analysis. Two books come to mind: obviously the one by Hair and
colleagues, called Multivariate Data Analysis, easily found in a
university library (or on the internet).
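The orthogonality is easy to check numerically (a small illustration with random data, not the questioner's questionnaire scores):

```r
set.seed(1)
dat <- matrix(rnorm(200), ncol = 4)   # 50 observations, 4 variables
p <- prcomp(dat)
cor(p$x[, 1], p$x[, 2])   # essentially zero: the score vectors are uncorrelated
```

So regressing the scores of one component on those of another has, by construction, nothing to explain.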

The second one is An R and S-Plus Companion to Multivariate Analysis
by Everitt. This one you have less chance of finding in the library,
but you can find it for sale online.

This is a nice introduction as well :
https://netfiles.uiuc.edu/miguez/www/Teaching/MultivariateRGGobi.pdf

Never use a chainsaw without reading the manual.

Cheers
Joris

On Tue, Jul 6, 2010 at 9:07 PM, Marino Taussig De Bodonia, Agnese
agnese.marin...@imperial.ac.uk wrote:
 Hello,

 I am currently analyzing responses to questionnaires about general attitudes. 
 I have performed a PCA on my data, and have retained two Principal 
 Components. Now I would like to use the scores of both principal 
 components in a multiple regression. I would like to know if it makes sense to 
 use the scores of one principal component to explain the variance in the 
 scores of another principal component:

 lm(scores of principal component 1~scores of principal component 2+ age, 
 gender, etc..)

 Please could you let me know if this is statistically sound?

 Thank you in advance for you time,

 Agnese




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] S4 classes and debugging - Is there a summary?

2010-07-02 Thread Joris Meys
Dear all,

I'm getting more and more frustrated with the whole S4 thing and I'm
looking for a more advanced summary on how to deal with them. Often I
get error messages that don't make sense at all, or the code is not
doing what I think it would do. Far too often inspecting the code
requires me to go to the source, which doesn't really help in easily
finding the bit of code you're interested in.

Getting the code with getAnywhere() doesn't always work. debug()
doesn't work. Using trace() and browser() is not an option, as I can't
find the correct method.

eg :
library(raster)
> getAnywhere("xyValues")
A single object matching ‘xyValues’ was found
It was found in the following places
  package:raster
  namespace:raster
with value

standardGeneric for "xyValues" defined from package "raster"

function (object, xy, ...)
standardGeneric("xyValues")
<environment: 0x04daf14c>
Methods may be defined for arguments: object, xy
Use showMethods("xyValues") for currently available ones.
> showMethods("xyValues")
Function: xyValues (package raster)
object="Raster", xy="data.frame"
object="Raster", xy="SpatialPoints"
object="Raster", xy="vector"
object="RasterLayer", xy="matrix"
object="RasterStackBrick", xy="matrix"

And now...?

Is there an overview that actually explains how you get the
information you're looking for without strolling through the complete
source? Sorry if I sound frustrated, but this is costing me huge
amounts of time, to that extent that I rather write a custom function
than use one in an S4 package if I'm not absolutely sure about what it
does and how it achieves it.

Cheers
Joris


-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] S4 classes and debugging - Is there a summary?

2010-07-02 Thread Joris Meys
Correction: trace() of course works when specifying the signature, e.g.:

trace("xyValues", tracer = browser, signature = c(object = "RasterLayer", xy = "matrix"))

allows me to browse through the function. I should have specified the
signature; I overlooked that. Still, if there's a manual on S4 that
anybody would like to mention, please go ahead. I find bits and pieces
on the internet, but I haven't found an overview yet that deals with
all the peculiarities of these classes.
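For the archives, the full cycle looks like this (assuming the raster package is loaded; the commented-out call stands in for whatever code dispatches to that method):

```r
library(raster)
# trace the method for one specific signature; browser() is entered
# whenever a call dispatches to exactly that method
trace("xyValues", tracer = browser,
      signature = c(object = "RasterLayer", xy = "matrix"))
# ... run code that calls xyValues on a RasterLayer with a matrix ...
# remove the trace again when done
untrace("xyValues", signature = c(object = "RasterLayer", xy = "matrix"))
```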

Cheers
Joris

On Fri, Jul 2, 2010 at 2:05 PM, Joris Meys jorism...@gmail.com wrote:
 Dear all,

 I'm getting more and more frustrated with the whole S4 thing and I'm
 looking for a more advanced summary on how to deal with them. Often I
 get error messages that don't make sense at all, or the code is not
 doing what I think it would do. Far too often inspecting the code
 requires me to go to the source, which doesn't really help in easily
 finding the bit of code you're interested in.

 Getting the code with getAnywhere() doesn't always work. Debug()
 doesn't work. Using trace() and browser() is not an option, as I can't
 find the correct method.

 eg :
 library(raster)
 getAnywhere("xyValues")
 A single object matching ‘xyValues’ was found
 It was found in the following places
  package:raster
  namespace:raster
 with value

 standardGeneric for "xyValues" defined from package "raster"

 function (object, xy, ...)
 standardGeneric("xyValues")
 <environment: 0x04daf14c>
 Methods may be defined for arguments: object, xy
 Use showMethods("xyValues") for currently available ones.
 showMethods("xyValues")
 Function: xyValues (package raster)
 object="Raster", xy="data.frame"
 object="Raster", xy="SpatialPoints"
 object="Raster", xy="vector"
 object="RasterLayer", xy="matrix"
 object="RasterStackBrick", xy="matrix"

 And now...?

 Is there an overview that actually explains how you get the
 information you're looking for without strolling through the complete
 source? Sorry if I sound frustrated, but this is costing me huge
 amounts of time, to that extent that I rather write a custom function
 than use one in an S4 package if I'm not absolutely sure about what it
 does and how it achieves it.

 Cheers
 Joris


 --
 Joris Meys
 Statistical consultant

 Ghent University
 Faculty of Bioscience Engineering
 Department of Applied mathematics, biometrics and process control

 tel : +32 9 264 59 87
 joris.m...@ugent.be
 ---
 Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] S4 classes and debugging - Is there a summary?

2010-07-02 Thread Joris Meys
On Fri, Jul 2, 2010 at 4:01 PM, Martin Morgan mtmor...@fhcrc.org wrote:

  selectMethod("xyValues", c("RasterLayer", "matrix"))

 would be my choice.

Thanks, that's a discovery. I guess I misunderstood the help pages on that one.

 I don't really have the right experience, but Chambers' 2008 Software
 for Data Analysis... and Gentleman's 2008 R Programming for
 Bioinformatics... books would be where I'd start. ?Methods and ?Classes
 are, I think, under-used.

I did read the posting guide ;-) but I have to admit the help pages
are not always that clear, which is one of the reasons why S4 gets me
frustrated so easily. I also have Chambers' book, and Gentleman's book
is here at the department as well. I'll go through the relevant
chapters again.

Thanks
Joris

-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] left end or right end

2010-07-01 Thread Joris Meys
First of all, read the posting guide carefully :
http://www.R-project.org/posting-guide.html
Your question is far from clear. When you say that the lengths of P
and Q are different, do you mean the length of the data or the
difference between start and end? That makes a world of difference.

Regarding the statistical test, that depends on what your data
represents. Is it possible for P to fall close to both the left and
the right end? For example:
P-
Q   ---

You should also specify which test you want to use. Then people on the
list will be able to tell you whether it is available in R. You can of
course construct your own test with the tools R provides, but again,
that requires a lot more information. Apart from that, the list is not
actually intended for statistical advice, but for advice on R code.
Maybe somebody will join in with some statistical guidance, but if you
don't know what to do, you had better consult a statistician at your
department.

Cheers
Joris

On Thu, Jul 1, 2010 at 1:53 PM, ravikumar sukumar
ravikumarsuku...@gmail.com wrote:
 Dear all,
 I am a biologist. I have two sets of distance P(start1, end1) and Q(start2,
 end2).
 The distance will be like this.
 P         
 Q  

 I want to know whether P falls closely to the right end or left  end of Q.
  P and Q are of different lengths for each data point. There are more than
 1 pairs of P and Q.
 Is there any test or function in R to bring a statistically significant
 conclusion.

 Thanks for all,
 Suku

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] run R

2010-06-30 Thread Joris Meys
Welcome to a new world. You're using the _programming language_ R,
with thousands of packages, so first of all you should read the
introductory material thoroughly:

http://cran.r-project.org/doc/manuals/R-intro.pdf
http://cran.r-project.org/doc/contrib/Owen-TheRGuide.pdf

Regarding your question: check the help file of source() :
?source

If you have a script, say foo.R, in a folder c:\bar, do:

source("c:/bar/foo.R")

Mind the slashes instead of backslashes. I assume you're on a
Windows system...

Cheers
Joris

On Wed, Jun 30, 2010 at 3:08 PM,  jorge.conr...@cptec.inpe.br wrote:


      Hi,


       I'm starting to use R and I have some .R scripts. How
       can I run these .R scripts?


      Conrado

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Use of processor by R 32bit on a 64bit machine

2010-06-29 Thread Joris Meys
Dear all,

I've recently purchased a new 64-bit system with an Intel i7 quad-core
processor. As I understood (maybe wrongly), the 32-bit version of R is
to date more stable than the 64-bit one, so I installed the 32-bit
version and have been happily using it ever since. Now I'm running a
whole lot of models, which goes smoothly, and out of curiosity I
checked how much processor time I'm using. I would have expected 25%
(one core), since on my old dual core R uses 50% of the total
processor capacity. Funnily enough, it turns out that R is currently
using only 12-13% of my CPU, which is about half of what I expected.

Did I miss something somewhere? Should I change some settings? I'm
running on a Windows 7 enterprise. I looked around already, but I have
the feeling I overlooked something.

Cheers
Joris

sessionInfo()
R version 2.10.1 (2009-12-14)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
States.1252LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C   LC_TIME=English_United
States.1252

attached base packages:
[1] grDevices datasets  splines   graphics  stats tcltk utils
   methods   base

other attached packages:
[1] svSocket_0.9-48 TinnR_1.0.3 R2HTML_2.0.0Hmisc_3.7-0
survival_2.35-7

loaded via a namespace (and not attached):
[1] cluster_1.12.3 grid_2.10.1lattice_0.18-3 svMisc_0.9-57  tools_2.10.1


-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Use of processor by R 32bit on a 64bit machine

2010-06-29 Thread Joris Meys
*slaps forehead*
Thanks. So out it goes, that hyperthreading. Who invented
hyperthreading on a quad-core anyway?


Cheers
Joris

2010/6/29 Uwe Ligges lig...@statistik.tu-dortmund.de:


 On 29.06.2010 15:30, Joris Meys wrote:

 Dear all,

 I've recently purchased a new 64bit system with an intel i7 quadcore
 processor. As I understood (maybe wrongly) that to date the 32bit
 version of R is more stable than the 64bit, I installed the 32bit
 version and am happily using it ever since. Now I'm running a whole
 lot of models, which goes smoothly, and I thought out of curiosity to
 check how much processor I'm using. I would have thought I used 25%
 (being one core), as on my old dual core R uses 50% of the total
 processor capacity. Funny, it turns out that R is currently using only
 12-13% of my cpu, which is about half of what I expected.


 An Intel Core i7 quad-core has 8 virtual cores, since it supports
 hyperthreading. R uses one of these virtual cores. Note that 2 virtual
 cores won't be twice as fast, since they run on the same physical
 core. Hence this is expected.

 Uwe Ligges



 Did I miss something somewhere? Should I change some settings? I'm
 running on a Windows 7 enterprise. I looked around already, but I have
 the feeling I overlooked something.

 Cheers
 Joris

 sessionInfo()
 R version 2.10.1 (2009-12-14)
 i386-pc-mingw32

 locale:
 [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
 States.1252    LC_MONETARY=English_United States.1252
 [4] LC_NUMERIC=C                           LC_TIME=English_United
 States.1252

 attached base packages:
 [1] grDevices datasets  splines   graphics  stats     tcltk     utils
    methods   base

 other attached packages:
 [1] svSocket_0.9-48 TinnR_1.0.3     R2HTML_2.0.0    Hmisc_3.7-0
 survival_2.35-7

 loaded via a namespace (and not attached):
 [1] cluster_1.12.3 grid_2.10.1    lattice_0.18-3 svMisc_0.9-57
  tools_2.10.1






-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] (New) Popularity of R, SAS, SPSS, Stata...

2010-06-28 Thread Joris Meys
Dear Robert,

I've tried to access that link, but to no avail. It seems the server
r4stats.com is down, as it doesn't respond. This link got me to the
site:
http://sites.google.com/site/r4statistics/popularity

Cheers
Joris

On Mon, Jun 28, 2010 at 3:52 PM, Muenchen, Robert A (Bob)
muenc...@utk.edu wrote:
 Greeting Listserv Readers,

 At http://r4stats.com/popularity I have added plots, data, and/or
 discussion of:

 1. Scholarly impact of each package across the years
 2. The number of subscribers to some of the listservs
 3. How popular each package is among Google searches across the years
 4. Survey results from a Rexer Analytics poll
 5. Survey results from a KDnuggests poll
 6. A rudimentary analysis of the software skills that employers are
 seeking

 Thanks very much to all the folks who helped on this project including:
 John Fox, Marc Schwartz, Duncan Murdoch, Martin Weiss, John (Jiangtang)
 HU, Andre Wielki, Kjetil Halvorsen, Dario Solari, Joris Meys, Keo
 Ormsby, Karl Rexer, and Gregory Piatetsky-Shapiro.

 If anyone can think of other angles, please let me know.

 Cheers,
 Bob

 =
  Bob Muenchen (pronounced Min'-chen), Manager
  Research Computing Support
  Voice: (865) 974-5230
  Email: muenc...@utk.edu
  Web:   http://oit.utk.edu/research,
  News:  http://oit.utk.edu/research/news.php
  Feedback: http://oit.utk.edu/feedback/
 =

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Different standard errors from R and other software

2010-06-26 Thread Joris Meys
If I understand correctly from their website, discrete choice models
are mostly generalized linear models with the common link functions
for discrete data? Apart from a few names I didn't recognize, all
analyses seem quite standard to me. So I wonder why you would write
the log-likelihood yourself for techniques that are implemented in R.

Unless I missed something pretty important, or you want to do a
specific analysis that wasn't clear to me, you should take a closer
look at the possibilities in R for generalized linear (mixed)
modelling and so on.

Binary choice translates to a simple glm with a logit link.
Multinomial choice can be done with e.g. multinom() from nnet; ordered
choice can be done with polr() from the MASS package. Nice ones to
look at are the packages mgcv, or gamm4 in the case of big datasets.
They offer very flexible models that can include random terms,
specific variance-covariance structures and non-linear relations in
the form of splines.
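For instance, a binary choice model is essentially a one-liner with glm(); the data below are simulated purely for illustration, with made-up covariate names:

```r
set.seed(2)
price  <- runif(200, 1, 10)
income <- rnorm(200, 50, 10)
# simulate choices from a true logit model in which only price matters
choice <- rbinom(200, size = 1, prob = plogis(-1 + 0.3 * price))
# binary choice = logistic regression
fit <- glm(choice ~ price + income, family = binomial(link = "logit"))
summary(fit)$coefficients
```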

Apologies if this is all obvious and known to you. In that case you
might want to specify what exactly it is you are comparing and how
exactly you calculated it yourself.

Cheers
Joris

On Fri, Jun 25, 2010 at 11:47 PM, Min Chen chenmin0...@gmail.com wrote:
 Hi all,

    Sorry to bother you. I'm estimating a discrete choice model in R using
 the maxBFGS command. Since I wrote the log-likelihood myself, in order to
 double check, I run the same model in Limdep. It turns out that the
 coefficient estimates are quite close; however, the standard errors are very
 different. I also computed the hessian and outer product of the gradients in
 R using the numDeriv package, but the results are still very different from
 those in Limdep. Is it the routine to compute the inverse hessian that
 causes the difference? Thank you very much!

     Best wishes.


 Min


 --
 Min Chen
 Ph.D. Candidate
 Department of Agricultural, Food, and Resource Economics
 125 Cook Hall
 Michigan State University

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Simple qqplot question

2010-06-25 Thread Joris Meys
Sorry, I missed the two-variable thing. Go with the lm solution then;
you can tweak the plot yourself (the prediction intervals are easily
obtained via predict(lm.object, interval = "prediction")). The
function qq.plot uses robust regression, but in your case ordinary
regression will do.
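A sketch of that route with two simulated samples (names invented): qqplot() can return the matched quantiles, on which you then fit, plot, and band the line.

```r
set.seed(1)
a <- rnorm(1000)
b <- rnorm(1000, mean = 0.2)
q <- qqplot(a, b, plot.it = FALSE)   # matched quantiles as a list(x, y)
fit <- lm(y ~ x, data = q)
plot(q, xlab = "sample 1", ylab = "sample 2")
abline(fit, col = "red")
# dashed prediction interval around the fitted line
new <- data.frame(x = sort(q$x))
pred <- predict(fit, newdata = new, interval = "prediction")
lines(new$x, pred[, "lwr"], lty = 2)
lines(new$x, pred[, "upr"], lty = 2)
```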

Regarding the shapes: this just indicates that both tails are shorter
than expected, so you have a kurtosis smaller than 3 (or a negative
excess kurtosis, depending on whether you apply the correction or not)

Cheers
Joris

On Fri, Jun 25, 2010 at 4:10 AM, Ralf B ralf.bie...@gmail.com wrote:
 Short rep: I have two distributions, data and data2; each build from
 about 3 million data points; they appear similar when looking at
 densities and histograms. I plotted qqplots for further eye-balling:

 qqplot(data, data2, xlab = 1, ylab = 2)

 and get an almost perfect diagonal line which means they are in fact
 very alike. Now I tried to check normality using qqnorm -- and I think
 I am doing something wrong here:

 qqnorm(data, main = Q-Q normality plot for 1)
 qqnorm(data2, main = Q-Q normality plot for 2)

 I am getting perfect S-shaped curves (??) for both distributions. Am I
 missing something here?

 |
 |                               *  *   *  *
 |                           *
 |                        *
 |                    *
 |               *
 |            *
 |         *
 | * * *
 |-

 Thanks, Ralf




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Wilcoxon signed rank test and its requirements

2010-06-25 Thread Joris Meys
As a remark on your histogram: use fewer breaks! This histogram tells
you nothing. An interesting function is ?density, e.g.:

x-rnorm(250)
hist(x,freq=F)
lines(density(x),col=red)

See also this ppt, a very nice and short introduction to graphics in R :
http://csg.sph.umich.edu/docs/R/graphics-1.pdf

2010/6/25 Atte Tenkanen atte...@utu.fi:
 Is there anything for me?

 There is a lot of data, n=2418, but there are also a lot of ties.
 My sample n≈250-300

You should think about the central limit theorem. Actually, you can
just use a t-test to compare means, as with those sample sizes the
mean is almost certainly normally distributed.

 i would like to test, whether the mean of the sample differ significantly 
 from the population mean.

According to probability theory, this will be the case in 5% of the
cases if you repeat your sampling infinitely. But as David asked: why
on earth do you want to test that?

cheers
Joris

-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Euclidean Distance Matrix Analysis (EDMA) in R?

2010-06-25 Thread Joris Meys
I've been looking around myself, but I couldn't find any. Maybe
somebody will chime in to direct you to the correct places. I also
checked the papers, and it seems not too hard to implement. If I find
some time, I'll take a look at it next week.

For the other two gentlemen, check:

http://www.getahead.psu.edu/PDF/EuclideanDistanceMatrixAnalysis.pdf
http://www.getahead.psu.edu/PDF/no.1.pdf

Cheers
Joris

On Fri, Jun 25, 2010 at 8:30 AM, gokhanocakoglu ocako...@uludag.edu.tr wrote:

 In fact, Euclidean Distance Matrix Analysis (EDMA) of form is a coordinate
 free approach to the analysis of form using landmark data which was
 developed by  Subhash Lele and Joan Richstmeier. They also developed a
 computer program (http://www.getahead.psu.edu/comment/edma.asp) that allow
 to perform several techniques including EDMA I-II but I wonder is there a
 package or code available in R to perform EDMA...

 thanks
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Euclidean-Distance-Matrix-Analysis-EDMA-in-R-tp2266797p2268018.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Optimizing given two vectors of data

2010-06-25 Thread Joris Meys
optim() works on a vector of _parameters_, not of data. You supply a
(likelihood) function, give initial values for the parameters, and get
the optimized parameters back. See ?optim and the examples therein;
they include an optimization using multiple data columns.
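A minimal sketch of the idea for the Arrhenius case (all names, the simulated data, and the likelihood are invented for illustration): both data vectors are simply handed to the negative log-likelihood through optim()'s ... argument, while optim() itself only sees the parameter vector.

```r
set.seed(7)
temps <- runif(50, 300, 400)                     # temperatures in Kelvin
times <- rexp(50, rate = exp(5 - 2000 / temps))  # simulated failure times
# negative log-likelihood: exponential failure times with an
# Arrhenius-type rate exp(a - b / temperature); par = c(a, b)
negll <- function(par, time, temp) {
  rate <- exp(par[1] - par[2] / temp)
  -sum(dexp(time, rate = rate, log = TRUE))
}
# both data vectors travel through optim()'s ... argument
fit <- optim(c(0, 0), negll, time = times, temp = temps)
fit$par   # estimates of a and b
```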

Cheers
Joris

On Fri, Jun 25, 2010 at 8:12 AM, confusedSoul ruchir_402...@infosys.com wrote:

 I am trying to estimate an Arrhenius-exponential model in R.  I have one
 vector of data containing failure times, and another containing
 corresponding temperatures.  I am trying to optimize a maximum likelihood
 function given BOTH these vectors.  However, the optim command takes only
 one such vector parameter.

 How can I pass both vectors into the function?
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Optimizing-given-two-vectors-of-data-tp2268002p2268002.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] i want create script

2010-06-25 Thread Joris Meys
Please read the posting guide :  http://www.R-project.org/posting-guide.html

Your question is very vague. One could assume you're completely new to
R and want the commands to read a csv file (see ?read.csv) and to
write out a table (e.g. ?write.table to write your predicted data in a
text format).

My guess is that you want to run this script in the shell without having
to open R, similar to a perl script. For this, take a look at:
http://cran.r-project.org/doc/manuals/R-intro.html#Scripting-with-R
http://projects.uabgrid.uab.edu/r-group/wiki/CommandLineProcessing

Cheers
Joris

On Fri, Jun 25, 2010 at 8:26 AM, vijaysheegi vijay.she...@gmail.com wrote:

 Hi R community,
 I want to create a script which will take a .csv table as input, do
 some prediction, and return the output to some file. The input is an
 Excel sheet containing some tables of data; the output should be a
 table of predicted data. Will someone help me in this regard...
 Thanks in advance.

 I am using R on Windows. Please advise on the procedure to create an
 Rscript.


 Regards
 -
 Vijay
 Research student
 Bangalore
 India
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/i-want-create-script-tp2268011p2268011.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Wilcoxon signed rank test and its requirements

2010-06-25 Thread Joris Meys
2010/6/25 Frank E Harrell Jr f.harr...@vanderbilt.edu:
 The central limit theorem doesn't help.  It just addresses type I error,
 not power.

 Frank

I don't think I stated otherwise. I am aware of the fact that the
Wilcoxon has an asymptotic relative efficiency greater than 1 compared
to the t-test in the case of skewed distributions. Apologies if I
caused more confusion.

The problem with the Wilcoxon is twofold, if I understood this data
correctly:
- there are quite a few ties
- the Wilcoxon assumes under the null that the distributions are the
same, not only their location. The influence of unequal variances
and/or shapes of the distributions is amplified in the case of unequal
sample sizes.

The central limit theorem means that:
- the t-test will do correct inference in the presence of ties
- unequal variances can be taken into account using the modified
t-test, both in the case of equal and unequal sample sizes

For these reasons, I would personally use the t-test for comparing two
samples from the described population. Your mileage may vary.
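Concretely, the modified t-test referred to is Welch's unequal-variances t-test, which is R's default; a small sketch with simulated samples of unequal size and variance:

```r
set.seed(3)
x <- rnorm(250, mean = 0.0, sd = 1)
y <- rnorm(300, mean = 0.2, sd = 2)
t.test(x, y)                    # Welch test: var.equal = FALSE by default
t.test(x, y, var.equal = TRUE)  # classical pooled-variance t-test
```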

Cheers
Joris


 On 06/25/2010 04:29 AM, Joris Meys wrote:
 As a remark on your histogram : use less breaks! This histogram tells
 you nothing. An interesting function is ?density , eg :

 x-rnorm(250)
 hist(x,freq=F)
 lines(density(x),col=red)

 See also this ppt, a very nice and short introduction to graphics in R :
 http://csg.sph.umich.edu/docs/R/graphics-1.pdf

 2010/6/25 Atte Tenkanenatte...@utu.fi:
 Is there anything for me?

 There is a lot of data, n=2418, but there are also a lot of ties.
 My sample n≈250-300

 You should think about the central limit theorem. Actually, you can
 just use a t-test to compare means, as with those sample sizes the
 mean is almost certainly normally distributed.

 i would like to test, whether the mean of the sample differ significantly 
 from the population mean.

 According to probability theory, this will be in 5% of the cases if
 you repeat your sampling infinitly. But as David asked: why on earth
 do you want to test that?

 cheers
 Joris



 --
 Frank E Harrell Jr   Professor and ChairmanSchool of Medicine
 Department of Biostatistics   Vanderbilt University




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Euclidean Distance Matrix Analysis (EDMA) in R?

2010-06-25 Thread Joris Meys
Thanks for the link, very interesting book. Yet, I couldn't find the
part about EDMA. It would have surprised me anyway, as the input of
multidimensional scaling is one matrix with euclidean distances
between your observations, whereas in EDMA the data consist of a
number of distance matrices.

Quite a different thing if you ask me. Neither cmdscale nor isoMDS or
its derivated functions (eg metaMDS in the vegan package) are going to
be of any help.

Now I come to think of it, vegan has a procrustes function, but I'm
not sure if it is generalized to be of use in EDMA.

Cheers
Joris

On Fri, Jun 25, 2010 at 7:42 PM, Kjetil Halvorsen
kjetilbrinchmannhalvor...@gmail.com wrote:
 There is a freely downloadable and very relevant ( readable) book at
 https://ccrma.stanford.edu/~dattorro/mybook.html
 Convex Optimization and Euclidean Distance geometry, and it indeed names 
 EDMA
 as a form of multidimensional scaling (or maybe in the oposite way).
 You should have a look
 at the codes for multidimensional scaling in R.

 Kjetil

 On Fri, Jun 25, 2010 at 6:25 AM, gokhanocakoglu ocako...@uludag.edu.tr 
 wrote:

 thanks for your interests Joris



 Gokhan OCAKOGLU
 Uludag University
 Faculty of Medicine
 Department of Biostatistics
 http://www20.uludag.edu.tr/~biostat/ocakoglui.htm

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Euclidean-Distance-Matrix-Analysis-EDMA-in-R-tp2266797p2268257.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



Re: [R] Average 2 Columns when possible, or return available value

2010-06-25 Thread Joris Meys
Just want to add that if you want to clean out the NA rows in a matrix
or data frame, take a look at ?complete.cases. Can be handy to use
with big datasets. I got curious, so I just ran the codes given here
on a big dataset, before and after removing NA rows. I have to be
honest, this is surely an illustration of the power of rowMeans. I'm
amazed myself.

DF <- data.frame(
  A=rep(DF$A,1),
  B=rep(DF$B,1)
)

 system.time(apply(DF,1,mean,na.rm=TRUE))
   user  system elapsed
  13.26    0.06   13.46

 system.time(matrix(rowMeans(DF, na.rm=TRUE), ncol=1))
   user  system elapsed
   0.03    0.00    0.03

 system.time(t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean,
+ na.rm=TRUE)[,-1]))
+ )

Timing stopped at: 227.84 1.03 249.31  -- I got impatient and pressed the escape

 DF <- DF[complete.cases(DF),]

 system.time(apply(DF,1,mean,na.rm=TRUE))
   user  system elapsed
   0.39    0.00    0.39

 system.time(matrix(rowMeans(DF, na.rm=TRUE), ncol=1))
   user  system elapsed
   0.01    0.00    0.02

 system.time(t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean,
+ na.rm=TRUE)[,-1]))
+ )
   user  system elapsed
  10.01    0.07   13.40
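Coming back to Eric's actual question in the quoted message below: the fast rowMeans route needs one small extra step, because rows where both values are NA come back as NaN rather than NA. A sketch, reusing the DF from the quoted message:

```r
# Average the available values per row; all-NA rows give NaN
res <- rowMeans(DF, na.rm = TRUE)
res[is.nan(res)] <- NA   # map NaN back to NA
res
```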

Cheers
Joris


On Sat, Jun 26, 2010 at 1:08 AM, emorway emor...@engr.colostate.edu wrote:

 Forum,

 Using the following data:

 DF<-read.table(textConnection("A B
 22.60 NA
  NA NA
  NA NA
  NA NA
  NA NA
  NA NA
  NA NA
  NA NA
 102.00 NA
  19.20 NA
  19.20 NA
  NA NA
  NA NA
  NA NA
  11.80 NA
  7.62 NA
  NA NA
  NA NA
  NA NA
  NA NA
  NA NA
  75.00 NA
  NA NA
  18.30 18.2
  NA NA
  NA NA
  8.44 NA
  18.00 NA
  NA NA
  12.90 NA"),header=T)
 closeAllConnections()

 The second column is a duplicate reading of the first column, and when two
 values are available, I would like to average column 1 and 2 (example code
 below).  But if there is only one reading, I would like to retain it, but I
 haven't found a good way to exclude NA's using the following code:

 t(as.matrix(aggregate(t(as.matrix(DF)),list(rep(1:1,each=2)),mean)[,-1]))

 Currently, row 24 is the only row with a returned value.  I'd like the
 result to return column A if it is the only available value, and average
 where possible.  Of course, if both columns are NA, NA is the only possible
 result.

 The result I'm after would look like this (row 24 is an avg):

  22.60
    NA
    NA
    NA
    NA
    NA
    NA
    NA
 102.00
  19.20
  19.20
    NA
    NA
    NA
  11.80
  7.62
    NA
    NA
    NA
    NA
    NA
  75.00
    NA
  18.25
    NA
    NA
  8.44
  18.00
    NA
  12.90

 This is a small example from a much larger data frame, so if you're
 wondering what the deal is with list(), that will come into play for the
 larger problem I'm trying to solve.

 Respectfully,
 Eric
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Average-2-Columns-when-possible-or-return-available-value-tp2269049p2269049.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



Re: [R] All a column to a data frame with a specific condition

2010-06-25 Thread Joris Meys
merge(sum_plan_a,data,by="plan_a")
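For completeness, a base-R sketch for the data in the quoted message below, using ?ave instead of the reshape machinery (one possible alternative, not necessarily what the OP needs for the larger problem):

```r
plan_a <- c('apple','orange','apple','apple','pear','bread')
plan_b <- c('bread','bread','orange','bread','bread','yogurt')
value <- 1:6
data <- data.frame(plan_a, plan_b, value)

# ave() returns, per row, the sum of value within that row's plan_a group
data$sum_plan_a <- ave(data$value, data$plan_a, FUN = sum)
data
```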

Cheers
Joris


On Sat, Jun 26, 2010 at 2:27 AM, Yi liuyi.fe...@gmail.com wrote:
 Hi, folks,

 Please first look at the codes:

 plan_a=c('apple','orange','apple','apple','pear','bread')
 plan_b=c('bread','bread','orange','bread','bread','yogurt')
 value=1:6
 data=data.frame(plan_a,plan_b,value)
 library(plyr)
 library(reshape)
 mm=melt(data, id=c('plan_a','plan_b'))
 sum_plan_a=cast(mm,plan_a~variable,sum)

 ### I would like to add a new column to the data.frame named 'data',  with
 the same sum of value for the same type of plan_a
 ### The result should come up like this:

   plan_a  plan_b  value  sum_plan_a
 1  apple  bread      1        8
 2 orange  bread     2        2
 3  apple orange     3        8
 4  apple  bread      4        8
 5   pear  bread      5         5
 6  bread yogurt     6         6

 Any tips?

 Thank you.

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



Re: [R] Popularity of R, SAS, SPSS, Stata...

2010-06-25 Thread Joris Meys
I had taken the opposite tack with Google Trends by subtracting
 keywords
like:
SAS -shoes -airlines -sonar...
but never got as good results as that beautiful X code for search.
When you see the end-of-semester panic bumps in traffic, you know
 you're
nailing it!

 I have to eat those words already. The R code for search that showed a
 peak every December did not have quotes around it, so it was searching
 for those three words not the complete phrase. When you add the quotes,
 the peaks vanish.

Don't swallow! You're looking through search terms, not through web
pages. R code for regression, regression code R etc. are all valid
searches, no quotation marks needed.

http://www.google.com/insights/search/#q=code%20for%20r%2Ccode%20for%20SAS%2Ccode%20for%20SPSS%2Ccode%20for%20matlabcmpt=q

This one is nice too. You can see that the bump in the autumn semester
for R is replacing the one for Matlab. Then in the spring semester
Matlab stays high but R drops. And both the US and India always have a
very large search index, whereas the rest of the world is essentially
negligible. Which leads me to the conclusion that: 1) the results are
probably coming from google.com, excluding local versions, and 2) in
the US (and India), statistics is mainly taught in the autumn
semester. Given the fact that daylight has a beneficial effect on
emotional well-being, the unpopularity of statistics is likely caused
by unfortunate scheduling.

Forget Excel. Google rocks! ;-)

Cheers
Joris


 Once you go the phrase route, you gain precision but end up with zero
 counts on various phrases. I avoided that by combining them with + to
 get enough to plot. The resulting graph shows SAS dominant until
 mid-2006 when SPSS takes the top position, followed by R, SAS, Stata in
 order:

 http://www.google.com/insights/search/#q=%22r%20code%20for%22%2B%22r%20m
 anual%22%2B%22r%20tutorial%22%2B%22r%20graph%22%2C%22sas%20code%20for%22
 %2B%22sas%20manual%22%2B%22sas%20tutorial%22%2B%22sas%20graph%22%2C%22sp
 ss%20code%20for%22%2B%22spss%20manual%22%2B%22spss%20tutorial%22%2B%22sp
 ss%20graph%22%2C%22stata%20code%20for%22%2B%22stata%20manual%22%2B%22sta
 ta%20tutorial%22%2B%22stata%20graph%22%2C%22s-plus%20code%20for%22%2B%22
 s-plus%20manual%22%2Bs-plus%20tutorial%22%2B%22s-plus%20graph%22cmpt=q

 This might be a good one to add to http://r4stats.com/popularity

 Bob


I see that there's a car, the R Code Mustang, that adding "for" gets
 rid of.

Thanks for getting me back on a topic that I had given up on!

Bob

-Original Message-
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org]
On Behalf Of Joris Meys
Sent: Thursday, June 24, 2010 7:56 PM
To: Dario Solari
Cc: r-help@r-project.org
Subject: Re: [R] Popularity of R, SAS, SPSS, Stata...

Nice idea, but quite sensitive to search terms, if you compare your
result on ... code with ... code for:
http://www.google.com/insights/search/#q=r%20code%20for%2Csas%20code%2
 0
f
or%2Cspss%20code%20forcmpt=q

On Thu, Jun 24, 2010 at 10:48 PM, Dario Solari
 dario.sol...@gmail.com
wrote:
 First: excuse for my english

 My opinion: a useful font for measuring popoularity can be Google
 Insights for Search - http://www.google.com/insights/search/#

 Every person using a software like R, SAS, SPSS needs first to learn
 it. So probably he make a web-search for a manual, a tutorial, a
 guide. One can measure the share of this kind of serach query.
 This kind of results can be useful to determine trends of
 popularity.

 Example 1: R tutorial/manual/guide, SAS tutorial/manual/guide,
 SPSS tutorial/manual/guide

http://www.google.com/insights/search/#q=%22r%20tutorial%22%2B%22r%20m
 a
n
ual%22%2B%22r%20guide%22%2B%22r%20vignette%22%2C%22spss%20tutorial%22%
 2
B
%22spss%20manual%22%2B%22spss%20guide%22%2C%22sas%20tutorial%22%2B%22s
 a
s
%20manual%22%2B%22sas%20guide%22cmpt=q

 Example 2: R software, SAS software, SPSS software

http://www.google.com/insights/search/#q=%22r%20software%22%2C%22spss%
 2
0
software%22%2C%22sas%20software%22cmpt=q

 Example 3: R code, SAS code, SPSS code

http://www.google.com/insights/search/#q=%22r%20code%22%2C%22spss%20co
 d
e
%22%2C%22sas%20code%22cmpt=q

 Example 4: R graph, SAS graph, SPSS graph

http://www.google.com/insights/search/#q=%22r%20graph%22%2C%22spss%20g
 r
a
ph%22%2C%22sas%20graph%22cmpt=q

 Example 5: R regression, SAS regression, SPSS regression

http://www.google.com/insights/search/#q=%22r%20regression%22%2C%22sps
 s
%
20regression%22%2C%22sas%20regression%22cmpt=q

 Some example are cross-software (learning needs - Example1), other
can
 be biased by the tarditional use of that software (in SPSS usually
you
 don't manipulate graph, i think)

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
guide.html
 and provide commented, minimal, self-contained, reproducible code.




--
Joris Meys

Re: [R] count data with a specific range

2010-06-24 Thread Joris Meys
see ?levels eg:

x <- rnorm(10)
y <- cut(x, c(-10, 0, 10))
levels(y) <- c("-10-0", "0-10")
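Putting ?cut and ?table together gives the count-by-range layout asked for in the quoted message below (a sketch using the OP's simulated data; values outside (-10, 10] would become NA):

```r
x1 <- rnorm(10, mean = 2)
x2 <- rnorm(20, mean = -1)
x3 <- rnorm(15, mean = 3)

breaks <- c(-10, 0, 10)
labs <- c("-10-0", "0-10")

# One column of counts per variable, one row per range
counts <- sapply(list(x1 = x1, x2 = x2, x3 = x3),
                 function(x) table(cut(x, breaks, labels = labs)))
counts
barplot(counts, beside = TRUE, legend.text = rownames(counts))
```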

cheers
Joris

On Thu, Jun 24, 2010 at 4:14 AM, Yi liuyi.fe...@gmail.com wrote:
 Yeap. It works. Just to make the result more beautiful.

 One more question.

 The interval is showns as (0,10].

 Is there a way to change it into the format 0-10?
 Thanks.

 On Wed, Jun 23, 2010 at 6:12 PM, Joris Meys jorism...@gmail.com wrote:

 see ?cut

 Cheers
 Joris

 On Thu, Jun 24, 2010 at 2:57 AM, Yi liuyi.fe...@gmail.com wrote:
  I would like to prepare the data for barplot. But I only have the data
  frame
  now.
 
  x1=rnorm(10,mean=2)
  x2=rnorm(20,mean=-1)
  x3=rnorm(15,mean=3)
  data=data.frame(x1,x2,x3)
 
  If there a way to put data within a specific range? The expected result
  is
  as follows:
   range    x1    x2    x3
   -10-0     2     5     1   (# points in this range)
   0-10      7     9     6
  ...
 
  I know the table function but I do not know how to deal with the range
  issue.
 
  Thanks in advance.
 
  Yi
 
         [[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 



 --
 Joris Meys
 Statistical consultant

 Ghent University
 Faculty of Bioscience Engineering
 Department of Applied mathematics, biometrics and process control

 tel : +32 9 264 59 87
 joris.m...@ugent.be
 ---
 Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php





-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



Re: [R] Question on WLS (gls vs lm)

2010-06-24 Thread Joris Meys
Isn't that exactly what you would expect when using a _generalized_
least squares compared to a normal least squares? GLS is not the same
as WLS.

http://www.aiaccess.net/English/Glossaries/GlosMod/e_gm_least_squares_generalized.htm

Cheers
Joris

On Thu, Jun 24, 2010 at 9:16 AM, Stats Wolf stats.w...@gmail.com wrote:
 Hi all,

 I understand that gls() uses generalized least squares, but I thought
 that maybe optimum weights from gls might be used as weights in lm (as
 shown below), but apparently this is not the case. See:

 library(nlme)
 f1 <- gls(Petal.Width ~ Species / Petal.Length, data = iris,  weights
 = varIdent(form = ~ 1 | Species))
 aa <- attributes(summary(f1)$modelStruct$varStruct)$weights
 f2 <- lm(Petal.Width ~ Species / Petal.Length, data = iris, weights = aa)

 summary(f1)$tTable; summary(f2)

 So, the two models with the very same weights do differ (in terms of
 standard errors). Could you please explain why? Are these different
 types of weights?

 Many thanks in advance,
 Stats Wolf

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



Re: [R] Wilcoxon signed rank test and its requirements

2010-06-24 Thread Joris Meys
One way of looking at it is doing a sign test after subtraction of
the mean. For a symmetric distribution the mean coincides with the
median, so after centering you expect to have about as many values
above as below zero. There is a sign test somewhere in one of the
packages, but it's easily done using binom.test as well:

 set.seed(12345)
 x1 <- rnorm(100)
 x2 <- rpois(100,2)

  binom.test((sum(x1-mean(x1)>0)),length(x1))

Exact binomial test

data:  (sum(x1 - mean(x1) > 0)) and length(x1)
number of successes = 56, number of trials = 100, p-value = 0.2713
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.4571875 0.6591640
sample estimates:
probability of success
  0.56

  binom.test((sum(x2-mean(x2)>0)),length(x2))

Exact binomial test

data:  (sum(x2 - mean(x2) > 0)) and length(x2)
number of successes = 37, number of trials = 100, p-value = 0.01203
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.2755666 0.4723516
sample estimates:
probability of success
  0.37

Cheers
Joris

On Thu, Jun 24, 2010 at 4:16 AM, Atte Tenkanen atte...@utu.fi wrote:
 PS.

 Maybe I can somehow try to transform the data and check it, for example, using
 the skewness function of the timeDate package?

 Thanks. What I have had to ask is that

 how do you test that the data is symmetric enough?
 If it is not, is it ok to use some data transformation?

 when it is said:

 The Wilcoxon signed rank test does not assume that the data are
 sampled from a Gaussian distribution. However it does assume that the
 data are distributed symmetrically around the median. If the
 distribution is asymmetrical, the P value will not tell you much about
 whether the median is different than the hypothetical value.

  On Wed, Jun 23, 2010 at 10:27 PM, Atte Tenkanen atte...@utu.fi wrote:
   Hi all,
  
   I have a distribution, and take a sample of it. Then I compare
 that
  sample with the mean of the population like here in Wilcoxon signed

  rank test with continuity correction:
  
    wilcox.test(Sample,mu=mean(All), alt="two.sided")
  
          Wilcoxon signed rank test with continuity correction
  
   data:  AlphaNoteOnsetDists
   V = 63855, p-value = 0.0002093
   alternative hypothesis: true location is not equal to 0.4115136
  
    wilcox.test(Sample,mu=mean(All), alt="greater")
  
          Wilcoxon signed rank test with continuity correction
  
   data:  AlphaNoteOnsetDists
   V = 63855, p-value = 0.0001047
   alternative hypothesis: true location is greater than 0.4115136
  
   What assumptions are needed for the population?
 
  wikipedia says:
  The Wilcoxon signed-rank test is a _non-parametric_ statistical
  hypothesis test for... 
  it also talks about the assumptions.
 
   What can we say according these results?
   p-value for the less is 0.999.
 
  That the p-value for less and greater seem to sum up to one, and that
  the p-value of greater is half of that for two-sided. You shouldn't
  ask what we can say. You should ask yourself What was the question
  and is this test giving me an answer on that question?
 
  Cheers
  Joris
 
  --
  Joris Meys
  Statistical consultant
 
  Ghent University
  Faculty of Bioscience Engineering
  Department of Applied mathematics, biometrics and process control
 
  tel : +32 9 264 59 87
  joris.m...@ugent.be
  ---
  Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



Re: [R] Question on WLS (gls vs lm)

2010-06-24 Thread Joris Meys
Indeed, WLS is a special case of GLS, where the error covariance
matrix is a diagonal matrix. OLS is a special case of GLS, where the
error is considered homoscedastic and all weights are equal to 1. And
I now realized that the varIdent() indeed makes a diagonal covariance
matrix, so the results should be the same in fact. Sorry for missing
that one.

A closer inspection shows that the results don't differ too much. The
fitting method differs between both functions; lm.wfit uses the QR
decomposition, whereas gls() uses restricted maximum likelihood. In
Asymptopia, they should give the same result.

Cheers
Joris


On Thu, Jun 24, 2010 at 12:54 PM, Stats Wolf stats.w...@gmail.com wrote:
 Thanks for reply.

 Yes, they do differ, but does not gls() with the weights argument
 (correlation being unchanged) make the special version of GLS, as this
 sentence from the page you provided says: The method leading to this
 result is called Generalized Least Squares estimation (GLS), of which
 WLS is just a special case?

 Best,
 Stats Wolf



 On Thu, Jun 24, 2010 at 12:49 PM, Joris Meys jorism...@gmail.com wrote:
 Isn't that exactly what you would expect when using a _generalized_
 least squares compared to a normal least squares? GLS is not the same
 as WLS.

 http://www.aiaccess.net/English/Glossaries/GlosMod/e_gm_least_squares_generalized.htm

 Cheers
 Joris

 On Thu, Jun 24, 2010 at 9:16 AM, Stats Wolf stats.w...@gmail.com wrote:
 Hi all,

 I understand that gls() uses generalized least squares, but I thought
 that maybe optimum weights from gls might be used as weights in lm (as
 shown below), but apparently this is not the case. See:

 library(nlme)
 f1 <- gls(Petal.Width ~ Species / Petal.Length, data = iris,  weights
 = varIdent(form = ~ 1 | Species))
 aa <- attributes(summary(f1)$modelStruct$varStruct)$weights
 f2 <- lm(Petal.Width ~ Species / Petal.Length, data = iris, weights = aa)

 summary(f1)$tTable; summary(f2)

 So, the two models with the very same weights do differ (in terms of
 standard errors). Could you please explain why? Are these different
 types of weights?

 Many thanks in advance,
 Stats Wolf

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




 --
 Joris Meys
 Statistical consultant

 Ghent University
 Faculty of Bioscience Engineering
 Department of Applied mathematics, biometrics and process control

 tel : +32 9 264 59 87
 joris.m...@ugent.be
 ---
 Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php





-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



Re: [R] ?to calculate sth for groups defined between points in one variable (string), / value separating/ spliting variable into groups by i.e. between start, NA, NA, stop1, start2, NA, stop2

2010-06-24 Thread Joris Meys
On Thu, Jun 24, 2010 at 1:18 PM, Eugeniusz Kaluza
eugeniusz.kal...@polsl.pl wrote:

 Dear useRs,

 Thanks for any advices

 # I do not know where are the examples how to mark groups
 #  based on signal occurence in the additional variable: cf. variable c2,
 # How to calculate different calculations for groups defined by (split by 
 occurence of c2 characteristic data)


 #First example of simple data
 #mexample   1      2    3  4     5  6  7  8  9  10 11       12 13 14 15 16 17
 c0<-rbind( 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11, 12,13,14,15,16,17)
 c0
 c1<-rbind(10, 20,30,40, 50,10,60,20,30,40,50, 30,10, 0,NA,20,10.3444)
 c1
 c2<-rbind(NA,"Start1",NA,NA,"Stop1",NA,NA,NA,NA,NA,NA,"Start2",NA,NA,NA,NA,"Stop2")
 c2
 C.df<-data.frame(cbind(c0,c1,c2))
 colnames(C.df)<-c("c0","c1","c2")
 C.df

 # preparation of the form explaining the needed result (the next lines are
 # not needed as such; they only explain how to obtain the final result)
  c3<-rbind(NA,"Start1","Start1","Start1","Start1","Start2","Start2","Start2","Start2","Start2","Start2","Start2","Start2","Start2","Start2","Start2","Start2")
  c4<-rbind(NA,"Stop1","Stop1","Stop1","Stop1","Stop2","Stop2","Stop2","Stop2","Stop2","Stop2","Stop2","Stop2","Stop2","Stop2","Stop2","Stop2")
  C.df<-data.frame(cbind(c0,c1,c2,c3,c4))
  colnames(C.df)<-c("c0","c1","c2","c3","c4")
  C.df$c5<-paste(C.df$c3,C.df$c4,sep="-")
  C.df

Now this is something I don't get. The list Start2-Stop2 starts way
before Start2, actually at Stop1. Sure that's what you want?

I took the liberty of showing how to get the data between start and
stop for every entry, and how to apply functions to it. If you don't
get the code, look at
?lapply
?apply
?grep

I also adjusted your example, as you caused all variables to be
factors by using the cbind in the data.frame function. Never do this
unless you're really sure you have to. But I can't think of a case
where that would be beneficial...

...
C.df<-data.frame(c0,c1,c2)
C.df

# find positions
Start <- grep("Start", C.df$c2)
Stop <- grep("Stop", C.df$c2)

# create indices
idx <- apply(cbind(Start,Stop),1,function(i) i[1]:i[2])
names(idx) <- paste("Start",1:length(Start),"-Stop",1:length(Start),sep="")

# Apply the function summary and get a list back named by the interval.
out <- lapply(idx,function(i) summary(C.df[i,1:2]))
out

If you really need to start Start2 right after Stop1, you can use a
similar approach.
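That similar approach could look like this (a sketch building on the Start/Stop positions found with grep above; untested for edge cases such as data after the last Stop):

```r
# Let each block run from just after the previous Stop
# (row 1 for the first block) up to its own Stop
Start2 <- c(1, head(Stop, -1) + 1)
idx2 <- apply(cbind(Start2, Stop), 1, function(i) i[1]:i[2])
names(idx2) <- paste("Block", seq_along(Stop), sep = "")
lapply(idx2, function(i) summary(C.df[i, 1:2]))
```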

Cheers
Joris

 # NEEDED RESULTS
  # needed result
 # for Stat1-Stop1: mean(20,30,40,50)
 # for Stat2-Stop2: mean(c(10,60,20,30,40,50,30,10,0,NA,20,10.3444), na.rm=T)
 #mean:
         c1     c3    c4           c5
         20  Start1 Stop1 Start1-Stop1
   25.48585  Start2 Stop2 Start2-Stop2

 #sum
 # for Stat1-Stop1: sum(20,30,40,50)
 # for Stat2-Stop2: sum(c(10,60,20,30,40,50,30,10,0,NA,20,10.3444), na.rm=T)
 #sum:
         c1     c3    c4           c5
        140  Start1 Stop1 Start1-Stop1
   280.3444  Start2 Stop2 Start2-Stop2

 # for Stat1-Stop1: max(20,30,40,50)
 # for Stat2-Stop2: max(c(10,60,20,30,40,50,30,10,0,NA,20,10.3444), na.rm=T)
 #max:
         c1     c3    c4           c5
        50  Start1 Stop1 Start1-Stop1
        60  Start2 Stop2 Start2-Stop2

 # place of max (in Start1-Stop1): 4th element in group Start1-Stop1
 # place of max (in Start2-Stop2): 2nd element in group Start2-Stop2

        c0     c3    c4           c5
         4  Start1 Stop1 Start1-Stop1
         2  Start2 Stop2 Start2-Stop2


 Thanks for any suggestion,
 Kaluza

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



Re: [R] Question on WLS (gls vs lm)

2010-06-24 Thread Joris Meys
Thanks!  I was getting confused as well, Wolf really had a point.

I have to admit that this is all a bit counterintuitive. I would
expect those weights to be the inverse of the variances, as GLS uses
the inverse of the variance-covariance matrix. I finally found in the
help files ?nlme::varWeights the needed information, after you pointed
out the problem and after strolling through quite a part of the help
files...

Does this mean that the GLS algorithm uses 1/sd as weights (I can't
imagine that...), or is this one of those things in R that are just
like that?

Cheers
Joris

On Thu, Jun 24, 2010 at 3:20 PM, Viechtbauer Wolfgang (STAT)
wolfgang.viechtba...@stat.unimaas.nl wrote:
 The weights in 'aa' are the inverse standard deviations. But you want to use 
 the inverse variances as the weights:

 aa <- (attributes(summary(f1)$modelStruct$varStruct)$weights)^2

 And then the results are essentially identical.

 Best,

 --
 Wolfgang Viechtbauer                        http://www.wvbauer.com/
 Department of Methodology and Statistics    Tel: +31 (0)43 388-2277
 School for Public Health and Primary Care   Office Location:
 Maastricht University, P.O. Box 616         Room B2.01 (second floor)
 6200 MD Maastricht, The Netherlands         Debyeplein 1 (Randwyck)


 Original Message
 From: r-help-boun...@r-project.org
 [mailto:r-help-boun...@r-project.org] On Behalf Of Stats Wolf Sent:
 Thursday, June 24, 2010 15:00 To: Joris Meys
 Cc: r-help@r-project.org
 Subject: Re: [R] Question on WLS (gls vs lm)

 Indeed, they should give the same results, and hence I was worried to
 see that the results were not that same. Suffice it to look at
 standard errors and p-values: they do differ, and the differences are
 not really that small.


 Thanks,
 Stats Wolf



 On Thu, Jun 24, 2010 at 2:39 PM, Joris Meys jorism...@gmail.com
 wrote:
 Indeed, WLS is a special case of GLS, where the error covariance
 matrix is a diagonal matrix. OLS is a special case of GLS, where the
 error is considered homoscedastic and all weights are equal to 1. And
 I now realized that the varIdent() indeed makes a diagonal covariance
 matrix, so the results should be the same in fact. Sorry for missing
 that one.

 A closer inspection shows that the results don't differ too much. The
 fitting method differs between both functions; lm.wfit uses the QR
 decomposition, whereas gls() uses restricted maximum likelihood. In
 Asymptopia, they should give the same result.

 Cheers
 Joris


 On Thu, Jun 24, 2010 at 12:54 PM, Stats Wolf stats.w...@gmail.com
 wrote:
 Thanks for reply.

 Yes, they do differ, but does not gls() with the weights argument
 (correlation being unchanged) make the special version of GLS, as
 this sentence from the page you provided says: The method leading
 to this result is called Generalized Least Squares estimation
 (GLS), of which WLS is just a special case?

 Best,
 Stats Wolf



 On Thu, Jun 24, 2010 at 12:49 PM, Joris Meys jorism...@gmail.com
 wrote:
 Isn't that exactly what you would expect when using a _generalized_
 least squares compared to a normal least squares? GLS is not the
 same as WLS.

 http://www.aiaccess.net/English/Glossaries/GlosMod/e_gm_least_squares_generalized.htm

 Cheers
 Joris

 On Thu, Jun 24, 2010 at 9:16 AM, Stats Wolf stats.w...@gmail.com
 wrote:
 Hi all,

 I understand that gls() uses generalized least squares, but I
 thought that maybe optimum weights from gls might be used as
 weights in lm (as shown below), but apparently this is not the
 case. See:

 library(nlme)
 f1 <- gls(Petal.Width ~ Species / Petal.Length, data = iris,
           weights = varIdent(form = ~ 1 | Species))
 aa <- attributes(summary(f1)$modelStruct$varStruct)$weights
 f2 <- lm(Petal.Width ~ Species / Petal.Length, data = iris,
          weights = aa)

 summary(f1)$tTable; summary(f2)

 So, the two models with the very same weights do differ (in terms
 of standard errors). Could you please explain why? Are these
 different types of weights?

 Many thanks in advance,
 Stats Wolf

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




 --
 Joris Meys
 Statistical consultant

 Ghent University
 Faculty of Bioscience Engineering
 Department of Applied mathematics, biometrics and process control

 tel : +32 9 264 59 87
 joris.m...@ugent.be
 ---
 Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php





 --
 Joris Meys
 Statistical consultant

 Ghent University
 Faculty of Bioscience Engineering
 Department of Applied mathematics, biometrics and process control

 tel : +32 9 264 59 87
 joris.m...@ugent.be
 ---
 Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php


 __
 R-help@r-project.org mailing list
 https

Re: [R] How to say if error

2010-06-24 Thread Joris Meys
An old-fashioned and I guess also advised-against method would be to
use try() itself, eg :

set.seed(1)
x <- rnorm(1:10)
y <- letters[1:10]
z <- rnorm(1:10)

for (i in list(x, y, z)) {
  cc <- try(sum(i), silent = TRUE)
  if (is(cc, "try-error")) next
  print(cc)
}

Set silent=FALSE if you want to see the error messages. See also ?try (and ?is).
Cheers
Joris

On Thu, Jun 24, 2010 at 3:34 PM, Duncan Murdoch
murdoch.dun...@gmail.com wrote:
 On 24/06/2010 7:06 AM, Paul Chatfield wrote:

 I've had a look at the conditions in base and I can't get the ones to work
 I've looked at but it is all new to me.
 For example, I can work the examples for tryCatch, but it won't print a
 finally message for me when I apply it to my model.  Even if I could get
 this to work, I think it would still cause a break e.g.
 for (j in 1:10)
 {tryCatch(ifelse(j==5, stop(j), j), finally = print("oh dear"))}

 Thanks for the suggestion though - any others?


 I think you don't want to use finally, which is just code that's guaranteed
 to be executed at the end.  You want to catch the errors and continue.  For
 example,

 for (j in 1:10)
 { tryCatch(ifelse(j==5, stop(j), print(j)), error=function(e) {print(caught
 error); print(e)}) }

 Duncan Murdoch

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] If-error-resume-next

2010-06-24 Thread Joris Meys
http://r.789695.n4.nabble.com/How-to-say-if-error-td2266619.html#a2266619

Cheers

On Thu, Jun 24, 2010 at 4:30 PM, Christofer Bogaso
bogaso.christo...@gmail.com wrote:
 In VBA there is a syntax if error resume next which sometimes acts as very
 beneficial on ignoring error. Is there any similar kind of function/syntax
 here in R as well?

 Thanks for your attention

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Averaging half hourly data to hourly

2010-06-24 Thread Joris Meys
See ?aggregate
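A minimal sketch of what ?aggregate offers here (made-up data; the timestamps and values are an assumption, not the OP's): group the half-hourly values by their hour and average (or sum) each group.

```r
# Half-hourly timestamps and simulated values
times <- seq(from = as.POSIXct("2010-06-24 00:00", tz = "UTC"),
             by = "30 min", length.out = 8)
vals  <- c(1, 3, 2, 4, 5, 7, 6, 8)

hour <- format(times, "%Y-%m-%d %H")   # grouping key: truncate to the hour
hourly <- aggregate(vals, by = list(hour = hour), FUN = mean)
hourly$x                                # 2 3 6 7
```

Use FUN = sum instead of FUN = mean to total the two half hours rather than average them.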

Cheers
J

On Thu, Jun 24, 2010 at 10:45 AM, Jennifer Wright
j.wright...@sms.ed.ac.uk wrote:
 Hi all,

 I have some time-series data in half hourly time steps and I want to be able
 to either average or sum the two half hours together into hours. Any help
 greatly appreciated.

 Thanks,

 Jenny


 --
 The University of Edinburgh is a charitable body, registered in
 Scotland, with registration number SC005336.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to say if error

2010-06-24 Thread Joris Meys
You could do that using the options, eg :

set.seed(1)
x <- rnorm(1:10)
y <- letters[1:10]
z <- rnorm(1:10)

warn <- getOption("warn")
options(warn = 2)
for (i in list(x, y, z)) {
  cc <- try(mean(i), silent = TRUE)
  if (is(cc, "try-error")) next
  print(cc)
}
options(warn = warn)

see ?options under warn
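An alternative that records the warning without converting it into an error is withCallingHandlers() (a sketch, not from the thread; `fit_with_flag` is a made-up helper name):

```r
# Evaluate an expression, record whether it warned, and keep its value
fit_with_flag <- function(expr) {
  warned <- FALSE
  value <- withCallingHandlers(expr,
    warning = function(w) {
      warned <<- TRUE                  # note that a warning occurred...
      invokeRestart("muffleWarning")   # ...then suppress it and continue
    })
  list(value = value, warned = warned)
}

res <- fit_with_flag(log(-1))   # log(-1) warns ("NaNs produced") and returns NaN
res$warned                      # TRUE
```

This keeps the computed value (here NaN) while flagging which iterations warned, which matches the "save results from ee and not from ff" idea below.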

Cheers
Joris

On Thu, Jun 24, 2010 at 5:12 PM, Paul Chatfield
p.s.chatfi...@reading.ac.uk wrote:

 On a similar issue, how can you detect a warning in a loop - e.g. the
 following gives a warning, so I'd like to set up code to recognise that and
 then carry on in a loop

 x <- rnorm(2); y <- c(1, 0)
 ff <- glm(y/23 ~ x, family = binomial)

 so this would be incorporated into a loop that might be

 x <- rnorm(10); y <- rep(c(1, 0), 5)
 for (i in 1:10)
 {ee <- glm(y ~ x, family = binomial)
 ff <- glm(y/23 ~ x, family = binomial)}

 from which I would recognise the warning in ff and not those in ee, saving
 results from ee and not from ff. The last bit would be easy, adding a line like
 if(there_is_a_warning_message) {newvector <- NA} else {use results}, but how do
 you detect the warning message?

 Thanks all for your feedback so far,

 Paul
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/How-to-say-if-error-tp2266619p2267140.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] PD: ?to calculate sth for groups defined between points in one variable (string), / value separating/ spliting variable into groups by i.e. between start, NA, NA, stop1, start2, NA, stop2

2010-06-24 Thread Joris Meys
Same trick :
c0 <- rbind(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
c0
c1 <- rbind(10, 20, 30, 40, 50, 10, 60, 20, 30, 40, 50, 30, 10, 0, NA, 20, 10.3444)
c1
c2 <- rbind(NA, "A", NA, NA, "B", NA, NA, NA, NA, NA, NA, "C", NA, NA, NA, NA, "D")
c2
C.df <- data.frame(c0, c1, c2)
C.df

pos <- which(!is.na(C.df$c2))

idx <- sapply(2:length(pos), function(i) pos[i-1]:(pos[i]-1))
names(idx) <- sapply(2:length(pos),
                     function(i) paste(C.df$c2[pos[i-1]], "-", C.df$c2[pos[i]], sep = ""))

out <- lapply(idx, function(i) summary(C.df[i, 1:2]))
out



2010/6/24 Eugeniusz Kałuża eugeniusz.kal...@polsl.pl:
 Dear useRs,

 Thanks for the advice from Joris Meys.
 Now I will try to think how to make it work for a less specific case,
 to make the problem more general.
 Then the result should be displayed for every group between non-empty strings
 in c2,
 i.e. not only the result for:
  #mean:
          c1     c3    c4           c5
          20  Start1 Stop1 Start1-Stop1
    25.48585  Start2 Stop2 Start2-Stop2

 but also for every group created by the space between the two closest strings in
 c2, which contains only series of NA, NA, NA, separated from time to time by
 one string, i.e.:
  #mean:
          c1     c3    c4           c5
          20 Start1 Stop1 Start1-Stop1
          .. Stop1 Start2 Stop1-Start2
    25.48585  Start2 Stop2 Start2-Stop2

 i.e.
 to rewrite this maybe for another simpler version of command

 but also for every group created by the space between the two closest strings in
 c2, which contains only series of NA, NA, NA, separated from time to time by
 one string: A, NA, NA, NA, NA, B, NA, NA, NA, C, NA, NA, NA, NA, D, NA, NA,
 i.e.:
  #mean:
          c1     c3    c4           c5
          20      A     B          A-B
          ..      B     C          B-C
    25.48585      C     D          C-D
 ...


 Looking for a more general method (function), grouping between these letters in
 c2,
 I will now try to study the solution proposed by Joris Meys.
 Thanks for the immediate answer,
 Kaluza




 -----Original Message-----
 From: Joris Meys [mailto:jorism...@gmail.com]
 Sent: Thu 2010-06-24 15:14
 To: Eugeniusz Kałuża
 Cc: r-help@r-project.org
 Subject: Re: [R] ?to calculate sth for groups defined between points in one
 variable (string), / value separating/ spliting variable into groups by i.e.
 between start, NA, NA, stop1, start2, NA, stop2

 On Thu, Jun 24, 2010 at 1:18 PM, Eugeniusz Kaluza
 eugeniusz.kal...@polsl.pl wrote:

 Dear useRs,

 Thanks for any advices

 # I do not know where are the examples how to mark groups
 #  based on signal occurence in the additional variable: cf. variable c2,
 # How to calculate different calculations for groups defined by (split by 
 occurence of c2 characteristic data)


 #First example of simple data
 #mexample   1      2    3  4     5  6  7  8  9  10 11       12 13 14 15 16 17
 c0 <- rbind(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
 c0
 c1 <- rbind(10, 20, 30, 40, 50, 10, 60, 20, 30, 40, 50, 30, 10, 0, NA, 20, 10.3444)
 c1
 c2 <- rbind(NA, "Start1", NA, NA, "Stop1", NA, NA, NA, NA, NA, NA, "Start2",
             NA, NA, NA, NA, "Stop2")
 c2
 C.df <- data.frame(cbind(c0, c1, c2))
 colnames(C.df) <- c("c0", "c1", "c2")
 C.df

 # preparation of form for explaining the further needed result (the next 3 lines
 # are not needed indeed; they are only to explain how to obtain the final result)
 c3 <- rbind(NA, "Start1", "Start1", "Start1", "Start1",
             "Start2", "Start2", "Start2", "Start2", "Start2", "Start2",
             "Start2", "Start2", "Start2", "Start2", "Start2", "Start2")
 c4 <- rbind(NA, "Stop1", "Stop1", "Stop1", "Stop1",
             "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2",
             "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2")
 C.df <- data.frame(cbind(c0, c1, c2, c3, c4))
 colnames(C.df) <- c("c0", "c1", "c2", "c3", "c4")
 C.df$c5 <- paste(C.df$c3, C.df$c4, sep = "-")
 C.df

 Now this is something I don't get. The list Start2-Stop2 starts way
 before Start2, actually at Stop1. Sure that's what you want?

 I took the liberty of showing how to get the data between start and
 stop for every entry, and how to apply functions to it. If you don't
 get the code, look at
 ?lapply
 ?apply
 ?grep

 I also adjusted your example, as you caused all variables to be
 factors by using the cbind in the data.frame function. Never do this
 unless you're really sure you have to. But I can't think of a case
 where that would be beneficial...

 ...
 C.df-data.frame(c0,c1,c2)
 C.df

 # find positions
 Start <- grep("Start", C.df$c2)
 Stop <- grep("Stop", C.df$c2)

 # create indices
 idx <- apply(cbind(Start, Stop), 1, function(i) i[1]:i[2])
 names(idx) <- paste("Start", 1:length(Start), "-Stop", 1:length(Start), sep = "")

 # Apply the function summary and get a list back named by the interval.
 out <- lapply(idx, function(i) summary(C.df[i, 1:2]))
 out

 If you really need to start Start2 right after Stop1, you can use a
 similar approach.

 Cheers
 Joris

 # NEEDED RESULTS
 # needed result
 # for Start1-Stop1: mean(c(20,30,40,50))
 # for Start2-Stop2: mean(c(10,60,20,30,40,50,30,10,0,NA,20,10.3444), na.rm=T)
 #mean:
         c1     c3    c4           c5
         20  Start1 Stop1 Start1-Stop1
   25.48585

Re: [R] Wilcoxon signed rank test and its requirements

2010-06-24 Thread Joris Meys
I do agree that one should not rely solely on sources like Wikipedia
and GraphPad, although they contain a lot of valuable information.

This said, it is not too difficult to illustrate why, in the case of
the one-sample signed rank test, the differences should not be too far
from symmetric. It just takes some reflection on how the
statistic is calculated. If you have an asymmetrical distribution, you
have a lot of small differences with a negative sign and a lot of
large differences with a positive sign if you test against the median
or mean. Hence the sum of ranks for one side will be higher than for
the other, eventually leading to a significant result.

An extreme example :

 set.seed(100)
 y <- rnorm(100, 1, 2)^2
 wilcox.test(y, mu = median(y))

Wilcoxon signed rank test with continuity correction

data:  y
V = 3240.5, p-value = 0.01396
alternative hypothesis: true location is not equal to 1.829867

 wilcox.test(y, mu = mean(y))

Wilcoxon signed rank test with continuity correction

data:  y
V = 1763, p-value = 0.008837
alternative hypothesis: true location is not equal to 5.137409

Which brings us to the question of what location is actually tested in
the Wilcoxon test. For the measure of location to be the mean (or
median), one has to assume that the distribution of the differences is
rather symmetrical, which implies your data have to be distributed
somewhat symmetrically. The test is robust against violations of this
-implicit- assumption, but in more extreme cases skewness does matter.

Cheers
Joris

On Thu, Jun 24, 2010 at 7:40 PM, David Winsemius dwinsem...@comcast.net wrote:


 You are being misled. Simply finding a statement on a statistics software
 website, even one as reputable as GraphPad (???), does not mean that it is
 necessarily true. My understanding (confirmed by reviewing "Nonparametric
 Statistical Methods for Complete and Censored Data" by M. M. Desu and
 Damaraju Raghavarao) is that the Wilcoxon signed-rank test does not require
 that the underlying distributions be symmetric. The above quotation is highly
 inaccurate.

 --
 David.



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Wilcoxon signed rank test and its requirements

2010-06-24 Thread Joris Meys
On Fri, Jun 25, 2010 at 12:17 AM, David Winsemius
dwinsem...@comcast.net wrote:

 On Jun 24, 2010, at 6:09 PM, Joris Meys wrote:

 I do agree that one should not trust solely on sources like wikipedia
 and graphpad, although they contain a lot of valuable information.

 This said, it is not too difficult to illustrate why, in the case of
 the one-sample signed rank test,

 That is a key point. I was assuming that you were using the paired sample
 version of the WSRT and I may have been misleading the OP. For the
 one-sample situation, the assumption of symmetry is needed but for the
 paired sampling version of the test, the location shift becomes the tested
 hypothesis, and no assumptions about the form of the hypothesis are made
 except that they be the same.

I believe you mean the form of the distributions. The assumption that
the distributions of both samples are the same (or similar; it is a
robust test) implies that the differences x_i - y_i are more or less
symmetrically distributed. The key point here is that we're not talking
about the distribution of the populations/samples (as done in the OP) but
about the distribution of the differences. I may not have been clear
enough on that one.

Cheers
Joris

 Any consideration of median or mean (which
 will be the same in the case of symmetric distributions) gets lost in the
 paired test case.

 --
 David.


 the differences should be not to far
 away from symmetrical. It just needs some reflection on how the
 statistic is calculated. If you have an asymmetrical distribution, you
 have a lot of small differences with a negative sign and a lot of
 large differences with a positive sign if you test against the median
 or mean. Hence the sum of ranks for one side will be higher than for
 the other, leading eventually to a significant result.

 An extreme example :

 set.seed(100)
  y <- rnorm(100, 1, 2)^2
 wilcox.test(y,mu=median(y))

       Wilcoxon signed rank test with continuity correction

 data:  y
 V = 3240.5, p-value = 0.01396
 alternative hypothesis: true location is not equal to 1.829867

 wilcox.test(y,mu=mean(y))

       Wilcoxon signed rank test with continuity correction

 data:  y
 V = 1763, p-value = 0.008837
 alternative hypothesis: true location is not equal to 5.137409

 Which brings us to the question what location is actually tested in
 the wilcoxon test. For the measure of location to be the mean (or
 median), one has to assume that the distribution of the differences is
 rather symmetrical, which implies your data has to be distributed
 somewhat symmetrical. The test is robust against violations of this
 -implicit- assumption, but in more extreme cases skewness does matter.

 Cheers
 Joris

 On Thu, Jun 24, 2010 at 7:40 PM, David Winsemius dwinsem...@comcast.net
 wrote:


 You are being misled. Simply finding a statement on a statistics software
 website, even one as reputable as Graphpad (???), does not mean that it
 is
 necessarily true. My understanding (confirmed reviewing Nonparametric
 statistical methods for complete and censored data by M. M. Desu,
 Damaraju
 Raghavarao, is that the Wilcoxon signed-rank test does not require that
 the
 underlying distributions be symmetric. The above quotation is highly
 inaccurate.

 --
 David.



 --
 Joris Meys
 Statistical consultant

 Ghent University
 Faculty of Bioscience Engineering
 Department of Applied mathematics, biometrics and process control

 tel : +32 9 264 59 87
 joris.m...@ugent.be
 ---
 Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php





-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] help, bifurcation diagram efficiency

2010-06-24 Thread Joris Meys
Basically, don't write loops. Think vectors, matrices,... The R
Inferno of Patrick Burns contains a lot of valuable information on
optimizing code :

http://lib.stat.cmu.edu/S/Spoetry/Tutor/R_inferno.pdf
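As an illustration of the loop-versus-vector point (a sketch, not the OP's model: `X` and `Y` here are simulated stand-ins for the lsoda output columns), the inner point-collecting loop can be replaced by a single logical index:

```r
set.seed(1)
X <- rnorm(1e4, sd = 0.01)
Y <- rnorm(1e4)

# Loop version (slow): test each element one by one, grow the result vector
loop_res <- c()
for (j in 200:length(X)) {
  if (abs(X[j]) < 0.001) loop_res <- c(loop_res, Y[j])
}

# Vectorized version (fast): one logical index selects the same elements
keep <- seq_along(X) >= 200 & abs(X) < 0.001
vec_res <- Y[keep]

all.equal(loop_res, vec_res)  # TRUE
```

In the bifurcation script, plotting all selected points with one points() call per alpha (rather than one call per time step) avoids both the per-element test and thousands of graphics calls.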

Cheers
Joris


On Thu, Jun 24, 2010 at 7:51 PM, Tyler Massaro tyler.mass...@gmail.com wrote:
 Hello all -

 This code will run, but it bogs down my computer when I run it for finer and
 finer time increments and more generations.  I was wondering if there is a
 better way to write my loops so that this wouldn't happen.  Thanks!

 -Tyler

 #
 # Bifurcation diagram
 # Using Braaksma system of equations
 # We have however used a Fourier analysis
 # to get a forcing function similar to
 # cardiac action potential...
 #

 require(odesolve)

 # We get this s_of_t function from Maple ws
 s_of_t = function(t)
 {
   (1/10) * (((1/2) + (1/2) * sin((1/4)*pi*t) / abs(sin((1/4)*pi*t))) * (
     6.588315815*sin((1/4)*pi*t) - 1.697435362*sin((1/2)*pi*t) -
     1.570296922*sin((3/4)*pi*t) + 0.3247901958*sin(pi*t) +
     0.7962749105*sin((5/4)*pi*t) + 0.07812230515*sin((3/2)*pi*t) -
     0.3424877143*sin((7/4)*pi*t) - 0.1148306748*sin(2*pi*t) +
     0.1063696962*sin((9/4)*pi*t) + 0.02812403009*sin((5/2)*pi*t)))
 }

 ModBraaksma = function(t, n, p)
 {
   dx.dt = (1/0.01) * (n[2] - ((1/2)*n[1]^2 + (1/3)*n[1]^3))
   dy.dt = -(n[1] + p["alpha"]) + 0.032 * s_of_t(t)
   list(c(dx.dt, dy.dt))
 }

 initial.values = c(0.1, -0.02)

 alphamin = 0.01
 alphamax = 0.02

 alphas = seq(alphamin, alphamax, by = 0.1)

 TimeInterval = 100

 times = seq(0.001, TimeInterval, by = 0.001)

 plot(1, xlim = c(alphamin, alphamax), ylim = c(0, 0.3), type = "n",
      xlab = "Values of alpha", ylab = "Approximate loop size for a limit cycle",
      main = "Bifurcation Diagram")

 for (i in 1:length(alphas)) {
   params = c(alpha = alphas[i])
   out = lsoda(initial.values, times, ModBraaksma, params)
   X = out[, 2]
   Y = out[, 3]
   for (j in 200:length(times)) {
     if (abs(X[j]) < 0.001) {
       points(alphas[i], Y[j], pch = ".")
     }
   }
 }

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Simple qqplot question

2010-06-24 Thread Joris Meys
Also take a look at qq.plot in the package car. Gives you exactly
what you want.
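In base graphics the ideal-match reference is simply the identity line, since two identical distributions have equal quantiles. A minimal sketch (simulated data; `datax`/`datay` stand in for the OP's vectors):

```r
set.seed(2010)
datax <- rnorm(200)
datay <- rnorm(200, mean = 0.1)   # a similar but shifted distribution

qq <- qqplot(datax, datay)        # draws the plot and returns the quantile pairs
abline(0, 1)                      # ideal line: identical distributions fall on y = x
```

The vertical distances from the plotted points to that line show how far the two distributions diverge at each quantile.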
Cheers
Joris

On Fri, Jun 25, 2010 at 12:55 AM, Ralf B ralf.bie...@gmail.com wrote:
 More details...

 I have two distributions which are very similar. I have plotted
 density plots already from the two distributions. In addition,
 I created a qqplot that shows an almost straight line. What I want is a
 line that represents the ideal case in which the two
 distributions match perfectly. I would use this line to see how much
 the errors diverge at different stages of the plot.

 Ralf



 On Thu, Jun 24, 2010 at 5:56 PM, stephen sefick ssef...@gmail.com wrote:
 You are going to have to define the question a little better.  Also,
 please provide a reproducible example.

 On Thu, Jun 24, 2010 at 4:44 PM, Ralf B ralf.bie...@gmail.com wrote:
 I am a beginner in R, so please don't step on me if this is too
 simple. I have two data sets datax and datay for which I created a
 qqplot

 qqplot(datax,datay)

 but now I want a line that indicates the perfect match so that I can
 see how much the plot diverges from the ideal. This ideal, however, is
 not normal, so I think qqnorm and qqline cannot be applied.

 Perhaps you can help?

 Ralf

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




 --
 Stephen Sefick
 
 | Auburn University                                   |
 | Department of Biological Sciences           |
 | 331 Funchess Hall                                  |
 | Auburn, Alabama                                   |
 | 36849                                                    |
 |___|
 | sas0...@auburn.edu                             |
 | http://www.auburn.edu/~sas0025             |
 |___|

 Let's not spend our time and resources thinking about things that are
 so little or so large that all they really do for us is puff us up and
 make us feel like gods.  We are mammals, and have not exhausted the
 annoying little problems of being mammals.

                                                                -K. Mullis


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Install package automatically if not there?

2010-06-24 Thread Joris Meys
I've never seen R Perl'd this nicely before.

On Thu, Jun 24, 2010 at 10:32 PM, Albert-Jan Roskam fo...@yahoo.com wrote:
 require(pkg) || install.packages(pkg)

 Cheers!!

 Albert-Jan



 ~~

 All right, but apart from the sanitation, the medicine, education, wine, 
 public order, irrigation, roads, a fresh water system, and public health, 
 what have the Romans ever done for us?

 ~~

 --- On Thu, 6/24/10, Ralf B ralf.bie...@gmail.com wrote:

 From: Ralf B ralf.bie...@gmail.com
 Subject: [R] Install package automatically if not there?
 To: r-help@r-project.org r-help@r-project.org
 Date: Thursday, June 24, 2010, 9:25 PM

 Hi fans,

 is it possible for a script to check whether a library has been installed?
 I want to automatically install it if it is missing, to avoid scripts
 crashing when running on a new machine...

 Ralf

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Install package automatically if not there?

2010-06-24 Thread Joris Meys
If you want to make changes more permanent, you should take a look at
the Rprofile.site file. This one gets loaded in R at the startup of
the console. You can set the CRAN there if you want too.
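Setting the mirror in Rprofile.site might look like this (a sketch; the exact path is `R_HOME/etc/Rprofile.site`, or `~/.Rprofile` for a per-user version, and the mirror URL is just an example):

```r
# Runs at every console start-up; local() keeps the temporaries out of the workspace
local({
  r <- getOption("repos")
  r["CRAN"] <- "https://cran.r-project.org"   # pick your preferred mirror
  options(repos = r)
})
```

With a mirror set this way, install.packages() no longer prompts for one.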

Cheers
Joris

On Fri, Jun 25, 2010 at 1:32 AM, Joshua Wiley jwiley.ps...@gmail.com wrote:
 Hello Ralf,

 Glad it works for you.  As far as avoiding any prompting if packages
 are out of date: I am not sure.  It honestly seems like a risky idea
 to me to have old packages overwritten without the user knowing.
 However, I added a few lines of code here to set the CRAN mirror, and
 to revert it back to whatever it was when the function ends, to make it
 as unobtrusive as possible.


 load.fun <- function(x) {
   old.repos <- getOption("repos")
   on.exit(options(repos = old.repos)) # resets the repos option when the function exits
   new.repos <- old.repos
   new.repos["CRAN"] <- "http://cran.stat.ucla.edu" # set your favorite CRAN mirror here
   options(repos = new.repos)
   x <- as.character(substitute(x))
   if (isTRUE(x %in% .packages(all.available = TRUE))) {
     eval(parse(text = paste("require(", x, ")", sep = "")))
   } else {
     update.packages() # recommended before installing so that
                       # dependencies are the latest version
     eval(parse(text = paste("install.packages('", x, "')", sep = "")))
     eval(parse(text = paste("require(", x, ")", sep = "")))
   }
 }

 Best regards,

 Josh

 On Thu, Jun 24, 2010 at 3:34 PM, Ralf B ralf.bie...@gmail.com wrote:
 Thanks, works great.

 By the way, is there a way to go even further and avoid prompting
 (e.g. picking a particular server by default and updating without
 further prompts)? I could understand if such a feature does not exist for
 security reasons...

 Thanks again,
 Ralf

 On Thu, Jun 24, 2010 at 5:12 PM, Joshua Wiley jwiley.ps...@gmail.com wrote:
 On Thu, Jun 24, 2010 at 1:51 PM, Joshua Wiley jwiley.ps...@gmail.com 
 wrote:
 Hello Ralf,

 This is a little function that you may find helpful.  If the package
 is already installed, it just loads it, otherwise it updates the
 existing packages and then installs the required package.  As in
 require(), 'x' does not need to be quoted.

 load.fun <- function(x) {
   x <- as.character(substitute(x))
   if (isTRUE(x %in% .packages(all.available = TRUE))) {
     eval(parse(text = paste("require(", x, ")", sep = "")))
   } else {
     update.packages() # recommended before installing so that
                       # dependencies are the latest version
     eval(parse(text = paste("install.packages('", x, "')", sep = "")))
   }
 }

 I miscopied the last line; it should be

 ###
 load.fun <- function(x) {
   x <- as.character(substitute(x))
   if (isTRUE(x %in% .packages(all.available = TRUE))) {
     eval(parse(text = paste("require(", x, ")", sep = "")))
   } else {
     update.packages() # recommended before installing so that
                       # dependencies are the latest version
     eval(parse(text = paste("install.packages('", x, "')", sep = "")))
     eval(parse(text = paste("require(", x, ")", sep = "")))
   }
 }
 ###


 HTH,

 Josh

 On Thu, Jun 24, 2010 at 12:25 PM, Ralf B ralf.bie...@gmail.com wrote:
 Hi fans,

 is it possible for a script to check if a library has been installed?
 I want to automatically install it if it is missing to avoid scripts
 to crash when running on a new machine...

 Ralf

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




 --
 Joshua Wiley
 Ph.D. Student
 Health Psychology
 University of California, Los Angeles




 --
 Joshua Wiley
 Ph.D. Student
 Health Psychology
 University of California, Los Angeles





 --
 Joshua Wiley
 Ph.D. Student
 Health Psychology
 University of California, Los Angeles

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Popularity of R, SAS, SPSS, Stata...

2010-06-24 Thread Joris Meys
Nice idea, but quite sensitive to search terms, if you compare your
result on "... code" with "... code for":
http://www.google.com/insights/search/#q=r%20code%20for%2Csas%20code%20for%2Cspss%20code%20forcmpt=q

On Thu, Jun 24, 2010 at 10:48 PM, Dario Solari dario.sol...@gmail.com wrote:
 First: excuse my English

 My opinion: a useful source for measuring popularity can be Google
 Insights for Search - http://www.google.com/insights/search/#

 Every person using software like R, SAS, or SPSS first needs to learn
 it, so they will probably do a web search for a manual, a tutorial, or a
 guide. One can measure the share of this kind of search query.
 These kinds of results can be useful to determine trends of
 popularity.

 Example 1: R tutorial/manual/guide, SAS tutorial/manual/guide,
 SPSS tutorial/manual/guide
 http://www.google.com/insights/search/#q=%22r%20tutorial%22%2B%22r%20manual%22%2B%22r%20guide%22%2B%22r%20vignette%22%2C%22spss%20tutorial%22%2B%22spss%20manual%22%2B%22spss%20guide%22%2C%22sas%20tutorial%22%2B%22sas%20manual%22%2B%22sas%20guide%22cmpt=q

 Example 2: R software, SAS software, SPSS software
 http://www.google.com/insights/search/#q=%22r%20software%22%2C%22spss%20software%22%2C%22sas%20software%22cmpt=q

 Example 3: R code, SAS code, SPSS code
 http://www.google.com/insights/search/#q=%22r%20code%22%2C%22spss%20code%22%2C%22sas%20code%22cmpt=q

 Example 4: R graph, SAS graph, SPSS graph
 http://www.google.com/insights/search/#q=%22r%20graph%22%2C%22spss%20graph%22%2C%22sas%20graph%22cmpt=q

 Example 5: R regression, SAS regression, SPSS regression
 http://www.google.com/insights/search/#q=%22r%20regression%22%2C%22spss%20regression%22%2C%22sas%20regression%22cmpt=q

 Some examples are cross-software (learning needs - Example 1); others can
 be biased by the traditional use of that software (in SPSS you usually
 don't manipulate graphs, I think)

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Popularity of R, SAS, SPSS, Stata...

2010-06-24 Thread Joris Meys
dangit, tab in the way...

On Fri, Jun 25, 2010 at 1:56 AM, Joris Meys jorism...@gmail.com wrote:
 Nice idea, but quite sensitive to search terms, if you compare your
 result on "... code" with "... code for":
 http://www.google.com/insights/search/#q=r%20code%20for%2Csas%20code%20for%2Cspss%20code%20forcmpt=q

This one actually shows quite nicely when students start panicking for
their projects.
Cheers
Joris

 On Thu, Jun 24, 2010 at 10:48 PM, Dario Solari dario.sol...@gmail.com wrote:
  First: excuse my English

  My opinion: a useful source for measuring popularity can be Google
  Insights for Search - http://www.google.com/insights/search/#

  Every person using software like R, SAS, or SPSS first needs to learn
  it, so they will probably do a web search for a manual, a tutorial, or a
  guide. One can measure the share of this kind of search query.
  These kinds of results can be useful to determine trends of
  popularity.

 Example 1: R tutorial/manual/guide, SAS tutorial/manual/guide,
 SPSS tutorial/manual/guide
 http://www.google.com/insights/search/#q=%22r%20tutorial%22%2B%22r%20manual%22%2B%22r%20guide%22%2B%22r%20vignette%22%2C%22spss%20tutorial%22%2B%22spss%20manual%22%2B%22spss%20guide%22%2C%22sas%20tutorial%22%2B%22sas%20manual%22%2B%22sas%20guide%22cmpt=q

 Example 2: R software, SAS software, SPSS software
 http://www.google.com/insights/search/#q=%22r%20software%22%2C%22spss%20software%22%2C%22sas%20software%22cmpt=q

 Example 3: R code, SAS code, SPSS code
 http://www.google.com/insights/search/#q=%22r%20code%22%2C%22spss%20code%22%2C%22sas%20code%22cmpt=q

 Example 4: R graph, SAS graph, SPSS graph
 http://www.google.com/insights/search/#q=%22r%20graph%22%2C%22spss%20graph%22%2C%22sas%20graph%22cmpt=q

 Example 5: R regression, SAS regression, SPSS regression
 http://www.google.com/insights/search/#q=%22r%20regression%22%2C%22spss%20regression%22%2C%22sas%20regression%22cmpt=q

  Some examples are cross-software (learning needs - Example 1); others can
  be biased by the traditional use of that software (in SPSS you usually
  don't manipulate graphs, I think)

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




 --
 Joris Meys
 Statistical consultant

 Ghent University
 Faculty of Bioscience Engineering
 Department of Applied mathematics, biometrics and process control

 tel : +32 9 264 59 87
 joris.m...@ugent.be
 ---
 Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] write a loop for tallies

2010-06-24 Thread Joris Meys
It would help if you placed r <- 0; s <- 0 etc. outside the loop. Same
goes for cat(...). And get rid of the sum(r), sum(s) and so on; that's
doing nothing (r, s, ... are single numbers).

This said :

See Peter Langfelder's response.

Cheers
Joris

 # see ?table for a better approach
 r <- 0
 s <- 0
 t <- 0
 u <- 0
 v <- 0

 a <- for (i in 1:length(n)) {
 ifelse(n[i] == 3000, r <- r+1,
 ifelse(n[i] == 4000, s <- s+1,
 ifelse(n[i] == 5000, t <- t+1,
 ifelse(n[i] == 6000, u <- u+1,
 ifelse(n[i] == 7000, v <- v+1, NA)))))
 }
 cat("r = ", r, "\n")
 cat("s = ", s, "\n")
 cat("t = ", t, "\n")
 cat("u = ", u, "\n")
 cat("v = ", v, "\n")
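As the comment in the quoted code already hints, table() does the whole
tally in one expression; a sketch with made-up data n:

```r
n <- c(3000, 4000, 4000, 7000, 3000, 5000)   # made-up example data

# factor() fixes the full set of levels, so values absent from n
# (here 6000) still appear with a count of 0
tallies <- table(factor(n, levels = c(3000, 4000, 5000, 6000, 7000)))
print(tallies)
```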


-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] save scores from sem

2010-06-23 Thread Joris Meys
I should have specified: lavaan is not familiar to me. I'm also not
familiar enough with the sem package to tell you how to obtain the
scores, but all information regarding the fit is in the object. With
that, it shouldn't be too difficult to get the scores you want. This
paper might give you some more information, in case you didn't know it
yet :

http://socserv.mcmaster.ca/jfox/Misc/sem/SEM-paper.pdf

On a side note, sem with a single latent variable might be seen as a
factor analysis with one component, but definitely not as a PCA. A PCA
is constructed based on the total variance, rendering an orthogonal
space with as many dimensions as there are variables. Not so for an FA,
as the matrix used to calculate the eigenvectors and eigenvalues is a
reduced matrix, in essence only taking into account part of the
variation in the data for calculation of the loadings. This makes PCA
absolutely defined, but FA only up to a rotation.
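That difference in determinacy is easy to see in base R; the dataset
(mtcars) and the choice of two factors below are purely illustrative:

```r
# PCA: the components are fully determined by the data (up to sign)
pca <- prcomp(mtcars, scale. = TRUE)

# FA: the loadings depend on the chosen rotation; the fit is the same,
# only the orientation within the factor subspace changes
fa_none    <- factanal(mtcars, factors = 2, rotation = "none")
fa_varimax <- factanal(mtcars, factors = 2, rotation = "varimax")

# The two loading matrices differ by an orthogonal rotation
round(unclass(fa_none$loadings) - unclass(fa_varimax$loadings), 2)
```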

On a second side note, using the saved scores in some subsequent
analysis is in my view completely against the idea behind sem.
Structural equation modelling combines those observed variables
exactly to be able to take the variation on the combined latent
variable into account. If you use those latent variables as input in a
second analysis, you lose the information regarding the variation.

Cheers
Joris



On Wed, Jun 23, 2010 at 9:53 AM, Steve Powell st...@promente.net wrote:
 Dear Joris,
 thanks for your reply - it is the sem case which interests me. Suppose
 for example I use sem to construct a CFA for a set of variables, with
 a single latent variable, then this could be equivalent to a PCA with
 a single component, couldn't it? From the PCA I could save the
 scores as new variables; is there an equivalent with sem? This would
 be particularly useful if e.g. in sem I let some of the errors covary
 and then wanted to use the saved scores in some subsequent analysis.

 By the way, lavaan is at cran.r-project.org/web/packages/lavaan/index.html

 Best Wishes
 Steve

 www.promente.org | skype stevepowell99 | +387 61 215 997




 On Tue, Jun 22, 2010 at 7:08 PM, Joris Meys jorism...@gmail.com wrote:
 PCA and factor analysis is implemented in the core R distribution, no
 extra packages needed. When using princomp, they're in the object.

  pr.c <- princomp(USArrests, scale=T)
  pr.c$scores # gives you the scores

 see ?princomp

 When using factanal, you can ask for regression scores or Bartlett
 scores. See ?factanal.
 Mind you, you will get different (i.e. more correct) results than
 those obtained by SPSS.

 I don't understand what you mean with scores in the context of
 structural equation modelling. Lavaan is unknown to me.

 Cheers
 Joris

 On Tue, Jun 22, 2010 at 3:11 PM, Steve Powell st...@promente.net wrote:
  Dear expeRts,
 sorry for such a newbie question -
 in PCA/factor analysis e.g. in SPSS it is possible to save scores from the
 factors. Is it analogously possible to save the implied scores from the
 latent variables in a measurement model or structural model e.g. using the
 sem or lavaan packages, to use in further analyses?
 Best wishes
 Steve Powell

 www.promente.org | skype stevepowell99 | +387 61 215 997

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




 --
 Joris Meys
 Statistical consultant

 Ghent University
 Faculty of Bioscience Engineering
 Department of Applied mathematics, biometrics and process control

 tel : +32 9 264 59 87
 joris.m...@ugent.be
 ---
 Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php





-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] (help) This is an R workspace memory processing question

2010-06-23 Thread Joris Meys
http://lmgtfy.com/?q=R+large+data+sets

Cheers
Joris

On Wed, Jun 23, 2010 at 6:38 AM, 나여나 dllm...@hanmail.net wrote:

    Hi all!


   This is an R workspace memory processing question.


    Is there a method in R that can process 10GB of data in 500MB units?


   my work environment :

   R version : 2.11.1
   OS : WinXP Pro sp3


   Thanks and best regards.


   Park, Young-Ju

   from Korea.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question about a program

2010-06-23 Thread Joris Meys
Most of the computation time is in the function qmvnorm. You can win a
little bit by optimizing the code, but the gain is relatively small.
You can also decrease the interval used to evaluate qmvnorm to win some
speed there. As you look for the upper tail, there is no need to evaluate
the function at negative numbers. E.g.:

pair_kFWER2 <- function(m, alpha, rho, k, pvaluessort){
library(mvtnorm)
cc_z <- numeric(m)
Var <- matrix(c(1,rho,rho,1), nrow=2, ncol=2, byrow=T)

lpart <- 1:k  # first part
hpart <- (k+1):m  # second part

# make first part of the cc_z
cc_z[lpart] <- qmvnorm((k*(k-1))/(m*(m-1))*alpha, tail="upper",
interval=c(0,4), sigma=Var)$quantile

# make second part of the cc_z
cc_z[hpart] <- sapply(hpart, function(i){
qmvnorm((k*(k-1))/((m-i+k)*(m-i+k-1))*alpha, interval=c(0,4),
tail="upper", sigma=Var)$quantile
})

crit_cons <- 1-pnorm(cc_z)

# tests whether the values of crit_cons[i] are smaller than
# the values pvaluessort[i] and returns a vector
# which says at which positions that does not occur;
# tail takes the last position. I guess that's what you wanted
out <- tail(which(!(crit_cons < pvaluessort)), 1)
return(out)
}

 system.time(replicate(100,pair_kFWER(m,alpha,rho,k,pvaluessort)))
    user  system elapsed
    5.97    0.04    6.04

 system.time(replicate(100,pair_kFWER2(m,alpha,rho,k,pvaluessort)))
    user  system elapsed
    4.43    0.00    4.44

Check whether the function works correctly. It gives the same result
as the other one in my tests; only in the case where your function
returns 0 does the modified one return integer(0).

Cheers
Joris


On Wed, Jun 23, 2010 at 2:21 PM, li li hannah@gmail.com wrote:
 Dear all,
   I have the following program for a multiple comparison procedure.
 There are two functions for the two steps. First step is to calculate the
 critical values,
 while the second step is the actual procedure [see below: program with two
 functions].
   This works fine. However, I want to put them into one function
 for the convenience
 of later use [see below: program with one function]. Somehow the big
 function works extremely
 slowly.  For example I chose
 m <- 10
 rho <- 0.1
 k <- 2
 alpha <- 0.05
 pvaluessort <- sort(1-pnorm(rnorm(10)))

 Is there anything that I did wrong?
  Thank you!
                    Hannah


 ##Program with two functions
 ## first step
 library(mvtnorm)
 cc_f <- function(m, rho, k, alpha) {

 cc_z <- numeric(m)
 var <- matrix(c(1,rho,rho,1), nrow=2, ncol=2, byrow=T)
 for (i in 1:m){
   if (i <= k) {cc_z[i] <- qmvnorm((k*(k-1))/(m*(m-1))*alpha, tail="upper",
 sigma=var)$quantile} else
               {cc_z[i] <- qmvnorm((k*(k-1))/((m-i+k)*(m-i+k-1))*alpha,
 tail="upper", sigma=var)$quantile}
               }
 cc <- 1-pnorm(cc_z)
 return(cc)
                             }
 ## second step
 pair_kFWER=function(m,crit_cons,pvaluessort){
 k=0
 while((k < m) && (crit_cons[m-k] < pvaluessort[m-k])){
 k=k+1
 }
 return(m-k)
 }
 ###
 ##Program with one function ##

 pair_kFWER=function(m,alpha,rho,k,pvaluessort){
 library(mvtnorm)
 cc_z <- numeric(m)
 var <- matrix(c(1,rho,rho,1), nrow=2, ncol=2, byrow=T)
 for (i in 1:m){
   if (i <= k) {cc_z[i] <- qmvnorm((k*(k-1))/(m*(m-1))*alpha, tail="upper",
 sigma=var)$quantile} else
               {cc_z[i] <- qmvnorm((k*(k-1))/((m-i+k)*(m-i+k-1))*alpha,
 tail="upper", sigma=var)$quantile}
               }
 crit_cons <- 1-pnorm(cc_z)

 k=0
 while((k < m) && (crit_cons[m-k] < pvaluessort[m-k])){
 k=k+1
 }
 return(m-k)
 }
 #

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Comparing distributions

2010-06-23 Thread Joris Meys
A qqplot would indeed help. ?ks.test for more formal testing, but be
aware: you should also think about what you call "similar
distributions". See the following example:

set.seed(12345)
x1 <- c(rnorm(100),rnorm(150,3.3,0.7))
x2 <- c(rnorm(140,1,1.2),rnorm(110,3.3,0.6))
x3 <- c(rnorm(140,2,1.2),rnorm(110,4.3,0.6))
d1 <- density(x1)
d2 <- density(x2)
d3 <- density(x3)

xlim <- 1.2*c(min(x1,x2,x3),max(x1,x2,x3))
ylim <- 1.2*c(0,max(d1$y,d2$y,d3$y))

op <- par(mfrow=c(1,3))
plot(d1,xlim=xlim,ylim=ylim)
lines(d2,col="red")
lines(d3,col="blue")
qqplot(x1,x2)
qqplot(x2,x3)
par(op)

# formal testing
ks.test(x1,x2)
ks.test(x2,x3)

# relocate x3
x3b <- x3 - mean(x3-x2)
x3c <- x3 - mean(x3-x1)

# formal testing
ks.test(x2,x3b)
ks.test(x1,x3c)

# test location
t.test(x2-x1)
t.test(x3-x2)
t.test(x3-x1)

Cheers
Joris

On Wed, Jun 23, 2010 at 9:33 PM, Ralf B ralf.bie...@gmail.com wrote:
 I am trying to do something in R and would appreciate a push into the
 right direction. I hope some of you experts can help.

 I have two distributions obtained from 1 datapoints each (about
 1 datapoints each, non-normal with multi-modal shape (when
 eye-balling densities) but other than that I know little about their
 distribution). When plotting the two distributions together I can see
 that the two densities are alike with a certain distance to each other
 (e.g. 50 units on the X axis). I tried to plot a simplified picture of
 the density plot below:




 |
 |                                                         *
 |                                                      *     *
 |                                                   *    +   *
 |                                              *     +     +  *
 |                     *        +           *   +            +  *
 |                 *        +*     +   *  +                   + *
 |              *       +       *     +                           +*
 |           *       +                                               +*
 |        *       +                                                    +*
 |     *      +                                                          + *
 |  *      +                                                               + *
 |___


 What I would like to do is to formally test their similarity or
 otherwise measure it more reliably than just showing and discussing a
 plot. Is there a general approach other then using a Mann-Whitney test
 which is very strict and seems to assume a perfect match. Is there a
 test that takes in a certain 'band' (e.g. 50,100, 150 units on X) or
 are there any other similarity measures that could give me a statistic
 about how close these two distributions are to each other? All I can
 say from eye-balling is that they seem to follow each other, and it
 appears that one distribution is shifted by an amount from the other.
 Any ideas?

 Ralf

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Wilcoxon signed rank test and its requirements

2010-06-23 Thread Joris Meys
On Wed, Jun 23, 2010 at 10:27 PM, Atte Tenkanen atte...@utu.fi wrote:
 Hi all,

 I have a distribution, and take a sample of it. Then I compare that sample 
 with the mean of the population like here in Wilcoxon signed rank test with 
 continuity correction:

 wilcox.test(Sample, mu=mean(All), alt="two.sided")

        Wilcoxon signed rank test with continuity correction

 data:  AlphaNoteOnsetDists
 V = 63855, p-value = 0.0002093
 alternative hypothesis: true location is not equal to 0.4115136

 wilcox.test(Sample, mu=mean(All), alt = "greater")

        Wilcoxon signed rank test with continuity correction

 data:  AlphaNoteOnsetDists
 V = 63855, p-value = 0.0001047
 alternative hypothesis: true location is greater than 0.4115136

 What assumptions are needed for the population?

Wikipedia says:
"The Wilcoxon signed-rank test is a _non-parametric_ statistical
hypothesis test for..."
It also talks about the assumptions.

 What can we say according to these results?
 The p-value for "less" is 0.999.

That the p-values for "less" and "greater" seem to sum up to one, and that
the p-value of "greater" is half of that for "two-sided". You shouldn't
ask what we can say. You should ask yourself: "What was the question,
and is this test giving me an answer to that question?"

Cheers
Joris

-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Probabilities from survfit.coxph:

2010-06-23 Thread Joris Meys
On Wed, Jun 23, 2010 at 9:03 PM, Parminder Mankoo pkmanko...@gmail.com wrote:
 Hello:

 In the example below (or for censored data) using survfit.coxph, can
 anyone point me to a link or a pdf as to how the probabilities appearing in
 bold under summary(pred$surv) are calculated? Do these represent
 a cumulative probability distribution in time (not including censored times)?

Predicted survival. Hence the column name "survival"...

Cheers
Joris

 Thanks very much,
 parmee

 fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian)
 pred <- survfit(fit, newdata=data.frame(age=60))
 summary(pred)

  time n.risk n.event survival std.err lower 95% CI upper 95% CI
   59     26       1    *0.978*  0.0240        0.932        1.000
  115     25       1    *0.952*  0.0390        0.878        1.000
  156     24       1    *0.917*  0.0556        0.814        1.000
  268     23       1    *0.880*  0.0704        0.752        1.000
  329     22       1    *0.818*  0.0884        0.662        1.000
  353     21       1    *0.760*  0.0991        0.588        0.981
  365     20       1    *0.698*  0.1079        0.516        0.945
  431     17       1    *0.623*  0.1187        0.429        0.905
  464     15       1    *0.549*  0.1248        0.352        0.858
  475     14       1    *0.480*  0.1267        0.286        0.805
  563     12       1    *0.382*  0.1332        0.193        0.757
  638     11       1    *0.297*  0.1292        0.127        0.697

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] list operation

2010-06-23 Thread Joris Meys
Another variation on the same theme :

lst=list(m=c('a','b','c'),n=c('c','a'),l=c('a','bc'))
set <- c('a','c')

f <- function(lst,set) sapply(lst, function(x) sum(set %in% x)==length(set))
i <- f(lst,set)
names(i[i])

Doesn't serve anybody but keeps my mind fresh.

For long lists, you might benefit from first calculating the length of
set, to avoid having to do that n times for a list of length n.
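
A sketch of that precomputation (f2 is just an illustrative name):

```r
lst <- list(m = c('a','b','c'), n = c('c','a'), l = c('a','bc'))
set <- c('a','c')

f2 <- function(lst, set) {
  n_set <- length(set)   # computed once, not once per list element
  keep <- vapply(lst, function(x) sum(set %in% x) == n_set, logical(1))
  names(lst)[keep]
}

f2(lst, set)   # "m" "n"
```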

Cheers
Joris

On Wed, Jun 23, 2010 at 11:01 PM, Peter Alspach
peter.alsp...@plantandfood.co.nz wrote:
 Tena koe Yu

 One possibility:

 lst[sapply(lst, function(x) length(x[x%in% c('a','c')])==2)]

 HTH ...

 Peter Alspach

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of Yuan Jian
 Sent: Thursday, 24 June 2010 1:35 a.m.
 To: r-help@r-project.org
 Subject: [R] list operation

 Hi,

 it seems a simple problem, but I can not find a clear way.
 I have a list:
 lst=list(m=c('a','b','c'),n=c('c','a'),l=c('a','bc'))
  lst
 $m
 [1] "a" "b" "c"
 $n
 [1] "c" "a"
 $l
 [1] "a"  "bc"

 How can I get the list elements that include a given subset? For example,
 for the given subset {'a','c'}, the answer should be 'm' and 'n'.

 thanks
 Yu



       [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Beginning Eigen System question.

2010-06-23 Thread Joris Meys
Which other Linear Algebra system, and which function did you use in R?
Cheers
Joris

On Thu, Jun 24, 2010 at 12:32 AM,  rkevinbur...@charter.net wrote:
 Forgive me if I misunderstand a basic eigensystem, but when I present the
 following matrix to most any other linear algebra system:

  1  3  1
  1  2  2
  1  1  3

 I get an answer like:

 //$values
 //[1]  5.00e+00  1.00e+00 -5.536207e-16

 //$vectors
 //           [,1]       [,2]       [,3]
 //[1,] 0.5773503 -0.8451543 -0.9428090
 //[2,] 0.5773503 -0.1690309  0.2357023
 //[3,] 0.5773503  0.5070926  0.2357023

 But R gives me:

 //$values
 //[1]  5.00e+00  1.00e+00 -5.536207e-16

 //$vectors
 //           [,1]       [,2]       [,3]
 //[1,] -0.5773503 -0.8451543 -0.9428090
 //[2,] -0.5773503 -0.1690309  0.2357023
 //[3,] -0.5773503  0.5070926  0.2357023

 The only difference seems to be the sign on the first eigen vector. What am I 
 missing?

 Kevin

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Beginning Eigen System question.

2010-06-23 Thread Joris Meys
On Thu, Jun 24, 2010 at 12:41 AM, Joris Meys jorism...@gmail.com wrote:
 Which other Linear Algebra system, and which function did you use in R?
 Cheers
 Joris

Never mind, of course you used eigen()...
Eigenvectors are only determined up to a constant. If I'm not mistaken
(but check the help files on it), R normalizes the eigenvectors (and
so does your other linear algebra system). This makes the eigenvectors
defined only up to the sign.

Cheers
Joris
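
A quick numeric check of that: eigen() returns unit-length vectors, and
both v and -v satisfy A v = lambda v, so the sign is arbitrary (using the
matrix from the question):

```r
A <- matrix(c(1, 3, 1,
              1, 2, 2,
              1, 1, 3), nrow = 3, byrow = TRUE)
e <- eigen(A)
v <- e$vectors[, 1]

sqrt(sum(v^2))                          # unit length: 1
max(abs(A %*%  v - e$values[1] *  v))   # ~ 0: v is an eigenvector
max(abs(A %*% -v - e$values[1] * -v))   # ~ 0: so is -v
```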

 On Thu, Jun 24, 2010 at 12:32 AM,  rkevinbur...@charter.net wrote:
  Forgive me if I misunderstand a basic eigensystem, but when I present the
  following matrix to most any other linear algebra system:

  1  3  1
  1  2  2
  1  1  3

 I get an answer like:

 //$values
 //[1]  5.00e+00  1.00e+00 -5.536207e-16

 //$vectors
 //           [,1]       [,2]       [,3]
 //[1,] 0.5773503 -0.8451543 -0.9428090
 //[2,] 0.5773503 -0.1690309  0.2357023
 //[3,] 0.5773503  0.5070926  0.2357023

 But R gives me:

 //$values
 //[1]  5.00e+00  1.00e+00 -5.536207e-16

 //$vectors
 //           [,1]       [,2]       [,3]
 //[1,] -0.5773503 -0.8451543 -0.9428090
 //[2,] -0.5773503 -0.1690309  0.2357023
 //[3,] -0.5773503  0.5070926  0.2357023

 The only difference seems to be the sign on the first eigen vector. What am 
 I missing?

 Kevin

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




 --
 Joris Meys
 Statistical consultant

 Ghent University
 Faculty of Bioscience Engineering
 Department of Applied mathematics, biometrics and process control

 tel : +32 9 264 59 87
 joris.m...@ugent.be
 ---
 Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] problem to building R (datasets)

2010-06-23 Thread Joris Meys
Why compile from source? 2.11.1 installs fine on XP from the binary,
so that's the more obvious solution.

Cheers
Joris

On Thu, Jun 24, 2010 at 12:39 AM, Geun Seop Lee clarmas...@gmail.com wrote:

  Dear all,

  While I was trying to build the R source, I found an error in the datasets
  package (there was no error before that)

 ../../../library/datasets/R/datasets is unchanged
 Error in dir.create(Rdatadir, showWarnings = FALSE) :
   file name conversion problem
 Calls: Anonymous - Anonymous - dir.create
 Execution halted
 make[2]: *** [all] Error 1
 make[1]: *** [R] Error 1
 make: *** [all] Error 2

 And it was caused by

  @$(ECHO) "tools:::data2LazyLoadDB(\"$(pkg)\", compress=3)" | \
     R_DEFAULT_PACKAGES=NULL LC_ALL=C $(R_EXE) > /dev/null
 at the Makefile.win in the src/datasets directory
 I am using Window XP and tried to compile 2.11.1 version.

 I can't imagine how I can solve this problem. Any hints or suggestions
 will be appreciated.

 Thank you.

 Lee.



        [[alternative HTML version deleted]]








Re: [R] count data with a specific range

2010-06-23 Thread Joris Meys
see ?cut

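
For instance (the break points below are chosen for illustration only, not
taken from the original post):

```r
# Bin each variable with cut() and tabulate the counts per range
set.seed(1)
x1 <- rnorm(10, mean = 2)
x2 <- rnorm(20, mean = -1)
x3 <- rnorm(15, mean = 3)
breaks <- c(-10, 0, 10)
counts <- sapply(list(x1 = x1, x2 = x2, x3 = x3),
                 function(x) table(cut(x, breaks)))
counts                                     # rows: ranges, columns: variables
barplot(t(counts), beside = TRUE, legend.text = TRUE)
```

table(cut(...)) produces exactly the range/count layout asked for, and
barplot() then takes the resulting matrix directly.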
Cheers
Joris

On Thu, Jun 24, 2010 at 2:57 AM, Yi liuyi.fe...@gmail.com wrote:
 I would like to prepare the data for barplot. But I only have the data frame
 now.

 x1=rnorm(10,mean=2)
 x2=rnorm(20,mean=-1)
 x3=rnorm(15,mean=3)
 data=data.frame(x1,x2,x3)

 If there a way to put data within a specific range? The expected result is
 as follows:
  range       x1    x2    x3
  -10 to 0     2     5     1   (# points in this range)
  0 to 10      7     9     6
  ...

 I know the table function but I do not know how to deal with the range
 issue.

 Thanks in advance.

 Yi

        [[alternative HTML version deleted]]







Re: [R] {Spam?} Re: mgcv, testing gamm vs lme, which degrees of freedom?

2010-06-22 Thread Joris Meys
On Mon, Jun 21, 2010 at 10:01 PM, Bert Gunter gunter.ber...@gene.com wrote:

 1. This discussion probably belongs on the sig-mixed-models list.

Correct, but I'll keep it here now for archive purposes.


 2. Your claim is incorrect, I think. The structure of the random errors =
 model covariance can be parameterized in various ways, and one can try to
 test significance of nested parameterizations (for a particular fixed
 effects parameterization). Whether you can do so meaningfully especially in
 the gamm context --  is another issue, but if you check e.g. Bates and
 Pinheiro, anova for different random effects parameterizations is advocated,
 and is implemented in the anova.lme() nlme function.

It is a matter of debate and interpretation. Mind you that I never
said it can't be done. I just wanted to point out that in most
cases, it shouldn't be done. As somebody else noted, both Wood and
Pinheiro and Bates suggest testing the random components _if the fixed
effects are the same_. Yet, even then it gets difficult to state the
correct hypothesis. In Carlo's example, H0 is actually not
correct:

1) The 7 extra random components are not really 7 extra random
components, as they are definitely not independent, but actually all
part of the same spline fit.

2) According to my understanding, the degrees of freedom for a
likelihood are determined by the number of parameters fitted, not the
number of variables in the model. The parameters linked to the random
effect are part of the variance structure, and the number of
parameters to be fitted in this variance structure is not dependent on
the number of knots, but on the number of smooths. Indeed, in the gamm
model the variance of the parameters b for the random effect is
considered to be equal for all b's related to the same smooth.

3) It appears to me that you violate the restriction both
Pinheiro/Bates and Wood put on the testing of models with LR: you can
only do so if none of the variance parameters is restricted to the
edge of the feasible parameter space. Again, as I see it, the model
with fewer knots implies the assumption that some variance components
are zero.

Hence, you cannot use an LR test to compare both models. The anova.lme
won't carry out the test anyway:

> library(mgcv)
> library(nlme)
> set.seed(123)

> x  <- runif(100, 1, 10)                       # regressor
> b0 <- rep(rnorm(10, mean=1, sd=2), each=10)   # random intercept
> id <- rep(1:10, each=10)                      # identifier
> y  <- b0 + x - 0.1 * x^3 + rnorm(100, 0, 1)   # dependent variable

> f1 <- gamm(y ~ s(x, k=3, bs="cr"), random = list(id=~1), method="ML")
> f2 <- gamm(y ~ s(x, k=10, bs="cr"), random = list(id=~1), method="ML")

> anova(f1$lme, f2$lme)
       Model df      AIC      BIC    logLik
f1$lme     1  5 466.6232 479.6491 -228.3116
f2$lme     2  5 347.6293 360.6551 -168.8146

If you change the random term not related to the splines, it does
carry out the test. In this case, you can test the random effects as
explained by Pinheiro/Bates.

> f3 <- gamm(y ~ s(x, k=10, bs="cr"), method="ML")

> anova(f2$lme, f3$lme)
       Model df      AIC      BIC    logLik   Test  L.Ratio p-value
f2$lme     1  5 347.6293 360.6551 -168.8146
f3$lme     2  4 446.2704 456.6910 -219.1352 1 vs 2 100.6411  <.0001

As an illustration, the variance of the random effects:
> f1$lme
...
Random effects:
 Formula: ~Xr.1 - 1 | g.1
            Xr.1
StdDev: 163.8058

 Formula: ~1 | id %in% g.1
        (Intercept) Residual
StdDev:    1.686341 2.063269

Number of Observations: 100
Number of Groups:
        g.1 id %in% g.1
          1          10

> f2$lme
...
Random effects:
 Formula: ~Xr.1 - 1 | g.1
 Structure: pdIdnot
           Xr.11    Xr.12    Xr.13    Xr.14    Xr.15    Xr.16    Xr.17    Xr.18
StdDev: 2.242349 2.242349 2.242349 2.242349 2.242349 2.242349 2.242349 2.242349
...

Clearly, the SD for all parameters b is exactly the same, and hence
there is only one parameter in the likelihood function. So both models won't
differ in df when using ML estimation. This does not mean that the
b-coefficients for the random effects cannot be obtained. They are
just not part of the minimization in the likelihood function. Or, as
Wood said about random effects: you don't estimate them, although you
might want to predict them.

Cheers
Joris



Re: [R] Popularity of R, SAS, SPSS, Stata...

2010-06-22 Thread Joris Meys
Hehe,

You do have a point in not calling R a statistical language. It is
indeed far more than that. Yet, I don't agree that statistics is done
by stuffy professors. I wish it were so, but alas, the last time I looked
at my paycheck I had to conclude that I might be stuffy, but I'm far
from being paid like a professor...

Cheers
Joris

On Tue, Jun 22, 2010 at 11:34 AM, Patrick Burns
pbu...@pburns.seanet.com wrote:
 I'll expand my statement slightly.

 Yes, Peter, you are the archetypical
 stuffy professor.  The truth hurts.

 By any reasonable metric that I've
 thought of my company name is at least
 one-third statistics, from which a
 common (and I think correct) inference
 would be that I'm not anti-statistics.


 There are two aspects of why I think
 that R should not be called a statistical
 program: marketing and reality.

 Marketing

 Identifying with the most dreaded experience
 in university is not so good for sales.
 (Reducing stuffiness might reduce the root
 problem here.)

 Reality

 R really is used for more than statistics.
 Almost all of my use of R is outside the
 realm of statistics.  Maybe the field of
 statistics should have claim on a lot of
 that, but as of now that isn't the case.

 A Fusion

 R's real competition is not SAS or SPSS, but
 Excel.  As Brian has pointed out before,
 the vast majority of statistics is actually
 done in Excel.  Is Excel a statistics program?
 I don't think many people think that -- neither
 statisticians nor non-statisticians.

 Pat


 On 21/06/2010 10:32, Joris Meys wrote:

 On Mon, Jun 21, 2010 at 11:15 AM, Patrick Burns
 pbu...@pburns.seanet.com  wrote:

 (Statistics is what stuffy professors
 do, I just look at my data and try to
 figure out what it means.)

 Often those stuffy professors have a reason to do so. When they want
 an objective view on the data for example, or an objective measure of
 the significance of a hypothesis. But you're right, who cares about
 objectiveness these days? It doesn't sell you a paper, does it?

 Cheers
 Joris



 --
 Patrick Burns
 pbu...@pburns.seanet.com
 http://www.burns-stat.com
 (home of 'Some hints for the R beginner'
 and 'The R Inferno')






Re: [R] {Spam?} RE: {Spam?} RE: {Spam?} Re: mgcv, testing gamm vs lme, which degrees of freedom?

2010-06-22 Thread Joris Meys
Hi Carlos,

there is no possible way you can compare both models using a classical
statistical framework, be it ML, REML or otherwise. The assumptions
are violated. Regarding the df, see my previous mail.

In your case, I'd resort to the AIC/BIC criteria, and if prediction is
the main focus, compare the predictive power of both models using a
cross-validation approach. Wood also suggests in his book an MCMC
approach for more difficult comparisons.

Cheers
Joris

On Tue, Jun 22, 2010 at 1:31 AM, Carlo Fezzi c.fe...@uea.ac.uk wrote:
 Hi Christos,

 thanks for your kind reply, I agree entirely with your interpretation.

 In the first model comparison, however, anova does seem to work
 according to our interpretation, i.e. the df are equal in the two models.
 My intuition is that the anova command does a fixed-effects test rather
 than a random-effects one. This is the result I get:

 anova(f1$lme,f2$lme)

       Model df      AIC      BIC    logLik
 f1$lme     1  5 466.6232 479.6491 -228.3116
 f2$lme     2  5 347.6293 360.6551 -168.8146

 Hence I was not sure our interpretation was correct.

 On your second point I totally agree about the appeal of
 GAMs... however, I am working in a specific application where the quadratic
 function is the established benchmark, and I think that testing against it
 will show even more strongly the appeal of a gamm approach. Any idea
 which bases could work?

 Finally, thanks for the tip regarding gamm4; unfortunately I need to fit a
 bivariate smooth, so I cannot use it.

 Best wishes,

 Carlo






Re: [R] Project

2010-06-22 Thread Joris Meys
Presuming you're talking about Perl, I might be able to help with
specific issues, but read the posting guide before you actually call
upon our help.

http://www.R-project.org/posting-guide.html

One book I definitely can recommend is Phil Spectors Data Manipulation with R

http://www.springer.com/statistics/computanional+statistics/book/978-0-387-74730-9

It covers most of what you need to know about data structures in R, as they're
quite different from Perl's. Pay attention to vectorization and the
use of indices in R, as both concepts differ substantially from Perl.
In essence, avoid for-loops as much as you can. One other source you
might look at is the R Inferno by Patrick Burns:

http://lib.stat.cmu.edu/S/Spoetry/Tutor/R_inferno.pdf

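
A tiny sketch of that difference (the data here are made up):

```r
x <- 1:10

# Perl-style habit: explicit element-by-element loop
squares <- numeric(length(x))
for (i in seq_along(x)) squares[i] <- x[i]^2

# Idiomatic R: operate on the whole vector at once
squares_vec <- x^2

# Logical indexing replaces filtering loops
evens <- x[x %% 2 == 0]

identical(squares, squares_vec)  # TRUE
```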
Cheers
Joris

On Tue, Jun 22, 2010 at 4:35 AM, Colton Hunt coltonhu...@gmail.com wrote:
 I am an intern in Virginia Beach and I am currently working on a project
 that involves converting a script of Pearl to R. The code takes one input
 text file and outputs six text files. If you think you can help, I will be
 able to provide more detailed information. Thank you for your time!

 Colton Hunt







Re: [R] How do you install R package?

2010-06-22 Thread Joris Meys
open the R console.
type:
?install.packages

press enter.

read.

say "Doh, that's easy."
do what you read.

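In short (the package name below is just an example):

```r
install.packages("ggplot2")  # download and install the package from CRAN
library(ggplot2)             # load it for use in the current session
```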
cheers
Joris

On Tue, Jun 22, 2010 at 5:38 PM, Amy Li a...@ucsd.edu wrote:
 Hi I'm a new user. I need to install some new packages. Can someone show me?

 Amy

        [[alternative HTML version deleted]]







Re: [R] replication time series using permutations of the each y values

2010-06-22 Thread Joris Meys
Read the posting guide please. What's your data structure? That's
quite important. As I see it, I can easily get a matrix with what you
want by:
x1 <- rep(c("a", "b"), each=3)
x2 <- rep(c("c", "d", "f"), times=2)
x3 <- rep("g", 6)
x4 <- rep("h", 6)
result <- rbind(x1, x2, x3, x4)

But that's not what you want probably.
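
If the goal really is every combination, expand.grid() enumerates them
directly (the option names below are illustrative, mirroring the example
values in the post):

```r
# One row per combination of the candidate values at each position
opts <- list(s1 = c("a", "b"), s2 = c("c", "d", "f"), s3 = "g", s4 = "h")
combos <- expand.grid(opts, stringsAsFactors = FALSE)
nrow(combos)  # 2 * 3 * 1 * 1 = 6 candidate series
t(combos)     # each column is one length-4 series
```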
Cheers
Joris

2010/6/22 Josué Polanco jom...@gmail.com:
 Dear All,

 I'm trying to create this:
 I've these data (a, b,c, etc. are numbers) :

 1  a b
 2  c d f
 3  g
 4  h
   ...
 N

 I'm trying to create all the time series permutations (length = N),
 taking into account
 the possibilities for each value. For example (N=4),

 1  a
 2  c
 3  g
 4  h

 next,
 1   b
 2   c
 3   g
 4   h

 next
 1   a
 2   d
 3   g
 4   h

 and so on. For this example, there are 2*3 different (possible)
 time series, but
 I need to do, e.g., N = 100, with 2^14 different (possible)
 time series. I am wondering
 if somebody could give me a hint (suggestion) on how to make this using a
 function in R.

 thank you so much

 best regards

 --
 Josue Polanco







Re: [R] Duplicate dates in zoo objects

2010-06-22 Thread Joris Meys
Try this:
> x.Date <- as.Date("2003-02-01") + c(1, 3, 7, 7, 14) - 1
> x <- zoo(1:5, x.Date)
> x
2003-02-01 2003-02-03 2003-02-07 2003-02-07 2003-02-14
         1          2          3          4          5

> i <- match(unique(index(x)), rev(index(x)))
> x[i]
2003-02-01 2003-02-03 2003-02-07 2003-02-14
         1          2          4          5

Cheers
Joris


On Tue, Jun 22, 2010 at 4:35 PM, Research risk2...@ath.forthnet.gr wrote:
 Hello,

 I have a zoo time series read from an excel file which has some dates the
 same, such as the following example:

 02/10/1995     4925.5
 30/10/1995     4915.9
 23/01/1996     4963.5
 23/01/1996     5009.2
 04/03/1996     5031.9     # here
 04/03/1996     5006.5     # here
 03/04/1996     5069.2
 03/05/1996     5103.7
 31/05/1996     5107.1
 01/07/1996     5153.1
 02/08/1996     5151.7

 Is there a simple way to keep the last  price of the ones that have the same
 dates?

 04/03/1996    5031.9
 04/03/1996    5006.5

 i.e., keep only the 04/03/1996    5006.5  price and discard the previous
 one... Is there an implicit function that does that or do I need some sort
 of recursive algorithm?

 You can try a solution on this example (for convenience):

 x.Date <- as.Date("2003-02-01") + c(1, 3, 7, 7, 14) - 1
 x <- zoo(rnorm(5), x.Date)

 Zoo object  has 2 prices with same dates.

 Many thanks in advance,
 Costas







Re: [R] Duplicate dates in zoo objects

2010-06-22 Thread Joris Meys
Oops, I was too fast. I meant :

i <- length(x) - match(unique(index(x)), rev(index(x))) + 1

(one has to reverse the indices again)

But in any case, the aggregate-way is indeed far cleaner. Thx for
pointing that out.
Cheers
Joris

On Tue, Jun 22, 2010 at 7:24 PM, Achim Zeileis achim.zeil...@uibk.ac.at wrote:
 On Tue, 22 Jun 2010, Joris Meys wrote:

 Try this :

 x.Date <- as.Date("2003-02-01") + c(1, 3, 7, 7, 14) - 1

 x <- zoo(1:5, x.Date)

 x

 2003-02-01 2003-02-03 2003-02-07 2003-02-07 2003-02-14
          1          2          3          4          5

 i <- match(unique(index(x)), rev(index(x)))

 x[i]

 2003-02-01 2003-02-03 2003-02-07 2003-02-14
        1          2          4          5

 Note that this happens to do the right thing in this particular toy example
 but not more generally. Simply adding a single observation will make it
 fail. The aggregate() solution I posted previously is preferred:

 R> x.Date <- as.Date("2003-02-01") + c(1, 3, 7, 7, 14, 15) - 1
 R> x <- zoo(1:6, x.Date)
 Warning message:
 In zoo(1:6, x.Date) :
   some methods for zoo objects do not work if the index entries in
 'order.by' are not unique
 R> x
 2003-02-01 2003-02-03 2003-02-07 2003-02-07 2003-02-14 2003-02-15
          1          2          3          4          5          6
 R> aggregate(x, time(x), tail, 1)
 2003-02-01 2003-02-03 2003-02-07 2003-02-14 2003-02-15
          1          2          4          5          6
 R> i <- match(unique(index(x)), rev(index(x)))
 R> x[i]
 2003-02-01 2003-02-03 2003-02-07 2003-02-14 2003-02-15
          1          2          3          5          6

 Best,
 Z

 Cheers
 Joris


 On Tue, Jun 22, 2010 at 4:35 PM, Research risk2...@ath.forthnet.gr
 wrote:

 Hello,

 I have a zoo time series read from an excel file which has some dates the
 same, such as the following example:

 02/10/1995     4925.5
 30/10/1995     4915.9
 23/01/1996     4963.5
 23/01/1996     5009.2
 04/03/1996     5031.9     # here
 04/03/1996     5006.5     # here
 03/04/1996     5069.2
 03/05/1996     5103.7
 31/05/1996     5107.1
 01/07/1996     5153.1
 02/08/1996     5151.7

 Is there a simple way to keep the last  price of the ones that have the
 same
 dates?

 04/03/1996    5031.9
 04/03/1996    5006.5

 i.e., keep only the 04/03/1996    5006.5  price and discard the
 previous
 one... Is there an implicit function that does that or do I need some
 sort
 of recursive algorithm?

 You can try a solution on this example (for convenience):

 x.Date - as.Date(2003-02-01) + c(1, 3, 7, 7, 14) - 1
 x - zoo(rnorm(5), x.Date)

 Zoo object  has 2 prices with same dates.

 Many thanks in advance,
 Costas












Re: [R] save scores from sem

2010-06-22 Thread Joris Meys
PCA and factor analysis are implemented in the core R distribution, no
extra packages needed. When using princomp, the scores are stored in the object:

 pr.c <- princomp(USArrests, cor = TRUE)  # cor=TRUE analyses the scaled variables
 pr.c$scores                              # gives you the scores

see ?princomp

When using factanal, you can ask for regression scores or Bartlett
scores. See ?factanal.
Mind you, you will get different (i.e., more correct) results than
those obtained by SPSS.

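A minimal factanal() sketch (the data are simulated; 'Bartlett' would work in
place of 'regression'):

```r
set.seed(7)
X <- matrix(rnorm(600), 100, 6)
X[, 1:3] <- X[, 1:3] + rnorm(100)      # induce one common factor
fa <- factanal(X, factors = 1, scores = "regression")
head(fa$scores)                        # one factor score per observation
```

The scores come back as a matrix with one column per factor, ready to use as
input in further analyses.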
I don't understand what you mean by scores in the context of
structural equation modelling. lavaan is unknown to me.

Cheers
Joris

On Tue, Jun 22, 2010 at 3:11 PM, Steve Powell st...@promente.net wrote:
  Dear expeRts,
 sorry for such a newbie question -
 in PCA/factor analysis e.g. in SPSS it is possible to save scores from the
 factors. Is it analogously possible to save the implied scores from the
 latent variables in a measurement model or structural model e.g. using the
 sem or lavaan packages, to use in further analyses?
 Best wishes
 Steve Powell

 www.promente.org | skype stevepowell99 | +387 61 215 997







Re: [R] glm

2010-06-22 Thread Joris Meys
On Tue, Jun 22, 2010 at 1:00 AM, Samuel Okoye samu...@yahoo.com wrote:
 Hi,

 I have the following data

 data1 <- data.frame(count = c(0,1,1,2,4,5,13,16,14), weeks = 1:9,
     treat = c(rep("1mg",3), rep("5mg",3), rep("10mg",3)))
 and I am using

 library(splines)

 to fit

 glm.m <- glm(count ~ bs(weeks) + as.factor(treat), family = poisson, data = data1)

 and I am interested in predicting the count variable for weeks 10, 11 and
 12 with treat 10mg and 15mg.

bad luck for you.

newdat <- data.frame(
  weeks = rep(10:12, each = 2),
  treat = rep(c("5mg", "10mg"), times = 3)
)

preds <- predict(glm.m, type = "response", newdata = newdat, se.fit = TRUE)
cbind(newdat, preds)

gives, as expected:
Warning message:
In bs(weeks, degree = 3L, knots = numeric(0), Boundary.knots = c(1L,  :
  some 'x' values beyond boundary knots may cause ill-conditioned bases

  weeks treat       fit    se.fit residual.scale
1    10   5mg  5.934881  5.205426              1
2    10  10mg 12.041639  9.514347              1
3    11   5mg  4.345165  6.924663              1
4    11  10mg  8.816168 15.805171              1
5    12   5mg  2.781063  8.123436              1
6    12  10mg  5.642667 18.221007              1


Watch the standard errors on the predicted values. No, you shouldn't
predict outside your data space, especially when using splines. And
when you're interested in 15mg, well, you shouldn't put treatment in as a
factor to start with.

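A sketch of what treating dose as numeric could look like (the conversion from
the treat labels is illustrative, and the extrapolation caveat above still
applies):

```r
data1 <- data.frame(count = c(0, 1, 1, 2, 4, 5, 13, 16, 14), weeks = 1:9,
                    treat = c(rep("1mg", 3), rep("5mg", 3), rep("10mg", 3)))
data1$dose <- as.numeric(sub("mg", "", data1$treat))   # 1, 5, 10
glm.m2 <- glm(count ~ weeks + dose, family = poisson, data = data1)
newdat <- data.frame(weeks = rep(10:12, each = 2),
                     dose  = rep(c(10, 15), times = 3))
predict(glm.m2, type = "response", newdata = newdat, se.fit = TRUE)
```

With dose as a number, 15mg is at least a meaningful input, though it remains
an extrapolation beyond the observed doses and weeks.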
Cheers
Joris



Re: [R] Verify the linear regression model used in R ( fundamental theory)

2010-06-22 Thread Joris Meys
It's normally always specified, except in the case of least-squares
linear regression (lm), where it is considered obvious.

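The distinction can be made concrete (simulated data): the coefficient
estimates coincide, and only the error-variance estimate differs by the
divisor.

```r
set.seed(3)
x <- 1:20
y <- 2 + 0.5 * x + rnorm(20)
fit <- lm(y ~ x)
rss <- sum(resid(fit)^2)
sigma2_ls <- rss / (length(y) - 2)  # least-squares (unbiased) estimate, as lm reports
sigma2_ml <- rss / length(y)        # ML estimate under iid N(0, sigma^2) errors
all.equal(summary(fit)$sigma^2, sigma2_ls)  # TRUE
```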
Cheers
Joris

On Wed, Jun 23, 2010 at 1:46 AM, Yi liuyi.fe...@gmail.com wrote:
 Hi, folks,

 As I understand, Least-squares Estimate (second-moment assumption) and the
 Method of Maximum Likelihood (full distribtuion assumption) are used for
 linear regression.

 I do ?lm, but the help file does not tell me the model employed in R. But
 in the book 'Introductory Statistics with R', it indicates R estimates the
 parameters using the method of least squares. However, it assumes the error
 is iid N(0, sigma^2).

 Am I correct? Is there any general way (like RSiteSearch() ) to find out
 what the model (theory) is for specific function? Let's say how to find out
 the assumption and the model used for rlm.

 Thanks

 Yi

        [[alternative HTML version deleted]]







Re: [R] How to 'understand' R functions besides reading R codes

2010-06-22 Thread Joris Meys
There is:
1) read the help files
2) read the vignette (see ?vignette )
3) look it up at www.rseek.org
4) check google
5) look at the references mentioned in the help files (or do this first if
you're not familiar with the method)
6) ask here

If still nothing :
use another function/method.

Cheers
Joris

On Wed, Jun 23, 2010 at 2:29 AM, Yi liuyi.fe...@gmail.com wrote:
 Apologize for not being clearer earlier. I would like to ask again. Thank
 Joris and Markleeds for response.

 Two examples:

 1. Function 'var'. In R, it is the sum of squares divided by (n-1), not by
 n. (I know this from R class.)

 2. Function 'lm'. In R, the residual sum of squares is divided by (n-2), not
 by n, the same as in the least-squares estimate. But the assumption
 follows that of the method of maximum likelihood.  (I know this by looking
 up the book 'Introductory Statistics with R'.)

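 Both divisor conventions mentioned above can be checked directly:

```r
x <- c(2, 4, 6, 8)
n <- length(x)
sum((x - mean(x))^2) / (n - 1)  # 6.666667, matches var(x)
var(x)                          # same: the divisor is n - 1, not n
```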
 But not all functions are explained in the books.

 I am wondering whether there is a general way to figure out how an R function
 works and what the underlying theory is, besides looking at the code
 (type var or lm at the R prompt), because R code can be too difficult to
 understand.

 Thanks

 Yi

        [[alternative HTML version deleted]]







Re: [R] Mahalanobis distance

2010-06-22 Thread Joris Meys
Say X is your data matrix with the variables; then you could do:
 X <- matrix(rnorm(2100), 300, 7)
 S <- var(X)
 dist <- as.dist(
   apply(X, 1, function(i) {
     mahalanobis(X, i, S)
   })
 )

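To get from that dist object to the MDS plot asked about, a sketch with
simulated data:

```r
set.seed(42)
X <- matrix(rnorm(2100), 300, 7)
S <- var(X)
# Note: mahalanobis() returns *squared* distances; take sqrt(D) if
# plain distances are wanted before the MDS step.
D <- as.dist(apply(X, 1, function(i) mahalanobis(X, i, S)))
mds <- cmdscale(D, k = 2)            # classical MDS, 2 dimensions
plot(mds, xlab = "MDS 1", ylab = "MDS 2")
```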
Cheers
Joris

On Tue, Jun 22, 2010 at 11:41 PM, yoo hoo freesuccess2...@yahoo.com wrote:
 I am a new R user. I have a question about Mahalanobis distance. I
 have 300 rows and 7 columns; the columns are different measurements, the 300
 rows are genes. Since the genes can be
 classified into 4 categories, I used dist() with Euclidean distance and
 cmdscale to do an MDS plot, but found out Mahalanobis distance may be
 better. How do I use mahalanobis() to generate a similar dist object which I
 can use for the MDS plot? The second question is whether I should calculate
 the mean of every category for every measurement first and do a 4*4 distance
 matrix, or whether I should calculate pairwise distances first and then find
 category means, since I only care about the relative positions of the 4
 categories in the MDS plot. Thank you very much.



        [[alternative HTML version deleted]]







Re: [R] compute coefficient of determination (R-squared) for GLM (maximum likelihood)

2010-06-21 Thread Joris Meys
It is not available for a reason. The correct way would be to use lm()
instead, if possible; that function reports an R² in its summary. In
the case of glm, and if you're absolutely sure about what you're
doing, you can use one of the approximations used when looking
at prediction only, realizing very well that you can't possibly use R² to
compare models with a different number of variables, and that
R² doesn't mean what you think it does when using a link
function:

x <- 1:100
y <- 1:100 + rnorm(100)

mod <- glm(y ~ x)

# possibility 1
R2 <- cor(y, predict(mod))^2

# possibility 2
R2 <- 1 - sum((y - predict(mod))^2) / sum((y - mean(y))^2)

In the case where you use a link function, you should work on the
transformed data: convert the values of y, and use
predict(mod, type = "link") for a correct estimate.
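As a quick sketch (hypothetical data, Gaussian glm with identity link), both approximations coincide with the R-squared that summary.lm() reports, which illustrates why lm() is the preferred route when it applies:

```r
set.seed(1)
x <- 1:100
y <- 1:100 + rnorm(100)

# same model fitted two ways: gaussian glm() and lm()
mod_glm <- glm(y ~ x)
mod_lm  <- lm(y ~ x)

# possibility 1: squared correlation of observed and predicted
R2_cor <- cor(y, predict(mod_glm))^2

# possibility 2: 1 - SSE/SST
R2_ss <- 1 - sum((y - predict(mod_glm))^2) / sum((y - mean(y))^2)

# both equal the R-squared from summary.lm()
all.equal(R2_cor, summary(mod_lm)$r.squared)
all.equal(R2_ss,  summary(mod_lm)$r.squared)
```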

Cheers
Joris

On Mon, Jun 21, 2010 at 12:00 AM, elaine kuo elaine.kuo...@gmail.com wrote:
 Dear,

 I want to compute the coefficient of determination (R-squared) to complement AIC
 for model selection of
 multivariable GLMs.

 However, I found this is not a built-in function in glm(), nor is it
 available through reviewing the questions in the R-help archive.
 Please kindly help, and thanks a lot.

 Elaine








Re: [R] Popularity of R, SAS, SPSS, Stata...

2010-06-21 Thread Joris Meys
On Mon, Jun 21, 2010 at 11:15 AM, Patrick Burns
pbu...@pburns.seanet.com wrote:

 (Statistics is what stuffy professors
 do, I just look at my data and try to
 figure out what it means.)

Often those stuffy professors have a reason to do so. When they want
an objective view of the data, for example, or an objective measure of
the significance of a hypothesis. But you're right, who cares about
objectivity these days? It doesn't sell you a paper, does it?

Cheers
Joris




Re: [R] Replacing elements of a list over a certain threshold

2010-06-21 Thread Joris Meys
magic code:
example[example > threshold] <- threshold
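As an alternative sketch on the poster's own example data, pmin() caps the values in a single vectorized call (unlist() first, since the data sit in a list):

```r
example <- list(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1)
threshold <- 5

# pmin() takes the element-wise minimum of the values and the threshold
capped <- pmin(unlist(example), threshold)
capped   # 1 2 3 4 5 5 5 5 5 5 5 5 5 5 5 4 3 2 1
```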

Cheers
Joris

On Mon, Jun 21, 2010 at 12:34 PM, Jim Hargreaves ja...@ipec.co.uk wrote:
 Dear List,

 I have a list of length ~1000 filled with numerics. I need to replace the
 elements of this list that are above a certain numerical threshold with the
 value of the threshold.

 e.g
 example=list(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1)
 threshold=5
 magic code goes here
 example=(1, 2, 3, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 3, 2, 1).

 I have written a crude script that achieves this but it's very slow. Is
 there a way to do this using some R function?

 Crude script: http://pastebin.com/3KSfi8nD







Re: [R] Replacing elements of a list over a certain threshold

2010-06-21 Thread Joris Meys
You shouldn't use sapply/lapply for this but use the indices

 set.seed(1)
 r <- round(runif(10, 1, 10))
 threshold <- 5
 head(r)
[1] 3 4 6 9 3 9

 system.time( r[r > threshold] <- threshold )
   user  system elapsed
      0       0       0

 head(r)
[1] 3 4 5 5 3 5
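As a sketch (hypothetical larger vector), one can verify that logical indexing gives exactly the same result as the element-wise sapply approach from the quoted message:

```r
set.seed(1)
r <- round(runif(1e4, 1, 10))
threshold <- 5

# logical indexing, as above
r_idx <- r
r_idx[r_idx > threshold] <- threshold

# element-wise sapply, as in the quoted message
r_sap <- sapply(r, function(x) if (x > threshold) threshold else x)

identical(r_idx, r_sap)   # TRUE
```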



On Mon, Jun 21, 2010 at 2:03 PM, Christos Argyropoulos
argch...@hotmail.com wrote:

 Hi,
 You should use the sapply/lapply for such operations.

 r <- round(runif(10, 1, 10))
 head(r)
 [1] 3 7 6 3 2 8
 filt <- function(x, thres) ifelse(x < thres, x, thres)
 system.time(r2 <- sapply(r, filt, thres = 5))
   user  system elapsed
   3.36    0.00    3.66
 head(r2)
 [1] 3 5 5 3 2 5


 To return a list, replace sapply with lapply

 Christos



 Date: Mon, 21 Jun 2010 11:34:01 +0100
 From: ja...@ipec.co.uk
 To: r-help@r-project.org
 Subject: [R]  Replacing elements of a list over a certain threshold

 Dear List,

 I have a list of length ~1000 filled with numerics. I need to replace
 the elements of this list that are above a certain numerical threshold
 with the value of the threshold.

 e.g
 example=list(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1)
 threshold=5
 magic code goes here
 example=(1, 2, 3, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 3, 2, 1).

 I have written a crude script that achieves this but it's very slow. Is
 there a way to do this using some R function?

 Crude script: http://pastebin.com/3KSfi8nD







