Re: [R] Question: how to obtain the clusters of genes (basically the ones in the row dendrograms) from an object obtained by heatmap.2 function

2010-09-17 Thread Sunny Srivastava
Thanks, Michael, for your help. At least it's good to know that there is no
function that does what I want. I will definitely try to code something
that fulfills my requirements.

Regards,
S.

On Fri, Sep 17, 2010 at 2:32 AM, Michael Bedward
michael.bedw...@gmail.com wrote:

 Hello Sunny,

 Defining groups at a given height can be done with the cut function
 (see ?cut.dendrogram).

 Unfortunately, cut doesn't give you the option of specifying a number
 of clusters rather than a height in the way that cutree does for
 hclust objects. Perhaps someone else here knows of a package that
 includes a function to do this (I did a quick sos search but didn't
 find anything obvious).  Otherwise you could examine the cut function
 and try to hack a version that takes number of groups as an argument.
 You can see the function's code with the command getS3method("cut",
 "dendrogram").
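
 For example, a small untested sketch (ht4 and row.scaled.fc2 as in your post
 below; the as.hclust() route is an extra workaround not mentioned above, and
 h = 2 is just a placeholder height):

 grp <- cut(ht4$rowDendrogram, h = 2)$lower   # sub-dendrograms below height 2
 lapply(grp, labels)                          # genes in each of those clusters
 hc <- as.hclust(ht4$rowDendrogram)           # back-convert to hclust (assumes a clean conversion)
 cutree(hc, k = 12)                           # membership vector for exactly 12 clusters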

 Sorry I can't help more. Being a simple soul I find hclust objects so
 much easier to deal with !

 Michael


 On 17 September 2010 11:30, Sunny Srivastava research.b...@gmail.com
 wrote:
  Hello R-Helpers,
  I have a question about extracting the clusters of genes after we make
 the
  heatmap (say ht4) using the heatmap.2 function. Basically, I want to get
 the
  clusters which are shown as row dendrogram in the heatmap.
 
  I understand that ht4$rowDendrogram is an object of class dendrogram and it
  contains details of all the nodes and branches, but let's say I want to know
  the number of clusters and the genes in each cluster if I terminated the
  tree (dendrogram) at a particular height. Also, if I know that I want 12
  clusters, how do I know at which height I should terminate the tree (or
  branching structure)?
 
  I am sorry if I am not clear. Please let me know if you need any further
  clarifications. Thanks in advance for your help.
 
  Best Regards,
  S.
 
  PS: I had posted this question on the Bioconductor mailing list, but no one
  responded with an answer to my problem (some suggested using another
  package). Since this is probably more related to R, I am reposting here.
 
 
 
  Below is the dump of the matrix *row.scaled.fc2* and the object *ht4*
 which
  was obtained by using the heatmap.2 function.
 
 
 -
  Dump of the matrix row.scaled.fc2 of dimension 47x4 for which I need to get
  the clusters
 
 -
  row.scaled.fc2 <- structure(c(1.28370586906697, 0.045882176168387, -1.36708804887146,
 0.521861081643557, 0.931917365795697, -1.26811754842825,
  -0.72130612803134,
 0.997233560332114, 1.10914280037357, 0.906822512746599,
  0.124305385892705,
 0.243716750638903, -0.81506628597585, 0.9281945227055,
  -1.02514155647985,
 -0.0148828263869010, 0.610771143828774, -1.31512789127346,
  -1.03419747081316,
 -1.37364737258546, 1.25426184614502, -0.901983912371582,
  1.39208493297302,
 1.46330419386939, 1.46904838309704, 1.33893188130515,
  1.19407808189758,
 1.2218547353343, 1.19698274357976, 1.18155526998177,
  0.841732283108634,
 0.747807260442244, 0.714318042078153, -1.33532716080095,
  -0.313607205847584,
 0.355541486307312, -0.116351310506438, 0.77912190137299,
  1.19372966187956,
 -1.46614749631243, 1.05871763558761, -0.943184299406566,
  1.03714731356991,
 1.25047276064487, 0.851530489918317, 0.97326112450597,
  0.776853817614179,
 0.254354524536168, 1.31978778177031, 1.03174081073449,
  1.03284070831524,
 0.653353551741362, -0.215733545477378, -0.966047927590969,
  0.652368565446036,
 0.536560120952493, 0.807139899513123, 1.26763097889282,
  1.28335333872251,
 1.45704025225707, 0.57691754078049, 1.07113369815538,
  0.610158458070122,
 -0.762088920575592, 1.00819322156949, 1.14148232415467,
  0.297815716619546,
 0.143195107796418, -0.0065855621849476, 0.062650188298147,
  -0.177601977084224,
 -0.437288024655434, 0.178377570495840, 0.447251122498145,
  0.400521563178456,
 0.441487949431983, 0.46509369129, 0.754248218272813,
  0.657576754588525,
 0.832332574891687, -0.194585070239614, 1.09572866565514,
  1.04256940502540,
 0.583290457043162, 0.947182223637108, 0.453501818870319,
  0.362539212141846,
 0.64658837487362, 0.778492522245523, 0.406650195058153,
  -0.113538768459753,
 0.596257630693658, 0.652082611403661, 0.731202922578465,
  -0.540351240198989,
 -0.280636135117373, 0.0957282195118376,
 -0.301771114678491,
  -0.319287162711085,
 

Re: [R] Question: how to obtain the clusters of genes (basically the ones in the row dendrograms) from an object obtained by heatmap.2 function

2010-09-17 Thread Michael Bedward
Cool. If you get some code working and don't mind sharing it please
post it here.

Michael

On 17 September 2010 16:49, Sunny Srivastava research.b...@gmail.com wrote:
 Thanks, Michael, for your help. At least it's good to know that there is no
 function that does what I want. I will definitely try to code something
 that fulfills my requirements.
 Regards,
 S.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question about converting a matrix to a dataframe

2010-09-16 Thread Michael Green
Use as.data.frame instead. It does what you want it to do.

newdata.df <- as.data.frame(stocks1[1:5,2:5])

Cheers,
Michael

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of Leigh E. Lommen
Sent: 16. september 2010 17:18
To: r-help@r-project.org
Subject: [R] question about converting a matrix to a dataframe

First I have a matrix called stocks1:

 

> class(stocks1)

[1] "matrix"

 

Here are the first 5 rows of the last 4 columns:

> stocks1[1:5,2:5]

  [,1]  [,2]  [,3]  [,4] 

[1,] 80.73 31.95 25.4  25.69

[2,] 83.66 31.95 27.12 25.2 

[3,] 83.27 32.93 28.74 26.29

[4,] 83.9  34.07 29.77 26.6 

[5,] 82.74 35.18 30.24 27.41

 

Now, why can't I convert this into a data frame?  It automatically
converts the columns into one long row??

> newdata.df <- data.frame(stocks1[1:5,2:5])

> newdata.df

   X1.1  X1.2  X1.3 X1.4  X1.5  X1.6  X1.7  X1.8  X1.9 X1.10 X1.11 X1.12


1 80.73 83.66 83.27 83.9 82.74 31.95 31.95 32.93 34.07 35.18  25.4 27.12

 

  X1.13 X1.14 X1.15 X1.16 X1.17 X1.18 X1.19 X1.20 

1 28.74 29.77 30.24 25.69  25.2 26.29  26.6 27.41

 

 

Regards,

Leigh


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question about converting a matrix to a dataframe

2010-09-16 Thread Ivan Calandra
  Hi,

Well, it works for me:

x <- matrix(1:20, nrow=5, ncol=4)
data.frame(x[1:5,2:4])
   X1 X2 X3
1  6 11 16
2  7 12 17
3  8 13 18
4  9 14 19
5 10 15 20

Maybe with as.data.frame(), or set the drop argument to FALSE:
data.frame(x[1:5,2:4,drop=FALSE])
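
For completeness, a tiny sketch of why the drop argument matters (a
hypothetical single-column subset, not Leigh's data):

m <- matrix(1:20, nrow=5, ncol=4)
str(m[1:5, 2])                 # drops to a plain vector
str(m[1:5, 2, drop=FALSE])     # stays a 5 x 1 matrix, so data.frame() keeps the shape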

Not sure why it doesn't work for you. Check the output of str(stocks1)

HTH,
Ivan

On 9/16/2010 17:18, Leigh E. Lommen wrote:
 First I have a matrix called stocks1:



 class(stocks1)
 [1] matrix



 Here are the first 5 rows of the last 4 columns:

 stocks1[1:5,2:5]
[,1]  [,2]  [,3]  [,4]

 [1,] 80.73 31.95 25.4  25.69

 [2,] 83.66 31.95 27.12 25.2

 [3,] 83.27 32.93 28.74 26.29

 [4,] 83.9  34.07 29.77 26.6

 [5,] 82.74 35.18 30.24 27.41



 Now, why can't I convert this into a dataframe?  It automatically
 converts the columns into 1 long row??

 newdata.df <- data.frame(stocks1[1:5,2:5])
 newdata.df
 X1.1  X1.2  X1.3 X1.4  X1.5  X1.6  X1.7  X1.8  X1.9 X1.10 X1.11 X1.12


 1 80.73 83.66 83.27 83.9 82.74 31.95 31.95 32.93 34.07 35.18  25.4 27.12



X1.13 X1.14 X1.15 X1.16 X1.17 X1.18 X1.19 X1.20

 1 28.74 29.77 30.24 25.69  25.2 26.29  26.6 27.41



 Regards,

 Leigh


   [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


-- 
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de

**
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question on optim

2010-09-15 Thread Hey Sky
Thanks, Ravi and Nash.

I will read the new package and may use it after I am familiar with it. I may
bother both of you when I have questions. Thanks for that in advance.

Nan
from Montreal






Hi Nan,
You can take a look at the optimx package on CRAN.  John Nash and I wrote this
package to help lay and sophisticated users alike.  This package unifies various
optimization algorithms in R for smooth, box-constrained optimization. It has
features for checking objective function, gradient (and hessian) specifications.
It checks for potential problems due to poor scaling; checks feasibility of
starting values.  It provides diagnostics (KKT conditions) on whether or not a
local optimum has been located.  It also allows the user to run various
optimization algorithms in one simple call, which is essentially identical to an
optim call. This feature can be especially useful for developers to benchmark
different algorithms and choose the best one for their class of problems.

http://cran.r-project.org/web/packages/optimx/index.html 
Ravi.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question on staRt package

2010-09-14 Thread Uwe Ligges



On 14.09.2010 12:25, Paulo Teles wrote:


Dear Sirs,

I have been using the package staRt but it has disappeared from the latest R 
versions. I emailed the author but he never replied. Is it possible to let me 
know if that package has been removed or if it has been replaced by another or 
what happened? The latest version is 1.1.12.
I would like to know if I can keep using it or not. If it has been permanently 
removed, I should look for an alternative.



I cannot remember the reason why this has been archived. If it was not
due to a maintainer's request to do so, then probably because the
package no longer passes the checks and the maintainer was
unresponsive.
In any case, you can get the old source package from the archives and
install and check it yourself.
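
For example (a sketch only; the archive URL pattern is the standard CRAN
layout, and 1.1.12 is the version mentioned above):

url <- "http://cran.r-project.org/src/contrib/Archive/staRt/staRt_1.1.12.tar.gz"
download.file(url, destfile = "staRt_1.1.12.tar.gz")
install.packages("staRt_1.1.12.tar.gz", repos = NULL, type = "source")
## and check it from the command line with:  R CMD check staRt_1.1.12.tar.gz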


Since the package is GPL 2 according to the DESCRIPTION file, you could 
take over maintainership and submit a new version to CRAN.


Best,
Uwe Ligges





Thank you very much

Paulo Teles


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question: Form a new list with the index replicated equal to the number of elements in that index

2010-09-13 Thread Phil Spector

Sunny -
   I don't think mapply is needed:


 lapply(1:length(l1), function(x) rep(x, length(l1[[x]])))

[[1]]
[1] 1 1 1

[[2]]
[1] 2

[[3]]
[1] 3 3

- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu


On Mon, 13 Sep 2010, Sunny Srivastava wrote:


Dear R-Helpers,
I have a list l1 like:

l1[[1]]
a b c

l1[[2]]
d

l1[[3]]
e f

I want an output res like:

res[[1]]
1 1 1

res[[2]]
2

res[[3]]
3 3

Essentially, I want to replicate each index as many times as there are
elements at that index.

Below is what I do to accomplish this:

l1 <- list(c("a", "b", "c"), "d", c("e", "f"))
res <- mapply(rep, seq_along(l1), times=(lapply(l1, length)))

Is there a more elegant way of doing this (possibly using one (l/m)apply
function)?

Thanks in advance for the help.

Best Regards,
S.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question: Form a new list with the index replicated equal to the number of elements in that index

2010-09-13 Thread Henrique Dallazuanna
Try this:

relist(rep(1:length(l1), sapply(l1, length)), l1)
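
For readers new to relist(), a step-by-step sketch of what that one-liner
does (using the l1 from the question below):

flat <- rep(1:length(l1), sapply(l1, length))  # 1 1 1 2 3 3: each index repeated by length
relist(flat, l1)                               # reshape the flat vector to l1's list structure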


On Mon, Sep 13, 2010 at 3:41 PM, Sunny Srivastava
research.b...@gmail.com wrote:

 Dear R-Helpers,
 I have a list l1 like:

 l1[[1]]
 a b c

 l1[[2]]
 d

 l1[[3]]
 e f

 I want an output res like:

 res[[1]]
 1 1 1

 res[[2]]
 2

 res[[3]]
 3 3

 Essentially, I want to replicate each index equal to the number of elements
 present in that index.

 Below is what I do to accomplish this:

 l1 <- list(c("a", "b", "c"), "d", c("e", "f"))
 res <- mapply(rep, seq_along(l1), times=(lapply(l1, length)))

 Is there a more elegant way of doing this (possibly using one (l/m)apply
 function)?

 Thanks in advance for the help.

 Best Regards,
 S.

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question: Form a new list with the index replicated equal to the number of elements in that index

2010-09-13 Thread Sunny Srivastava
Thank you very much list!

On Mon, Sep 13, 2010 at 5:04 PM, Phil Spector spec...@stat.berkeley.edu wrote:

 Sunny -
   I don't think mapply is needed:

    lapply(1:length(l1), function(x) rep(x, length(l1[[x]])))

 [[1]]
 [1] 1 1 1

 [[2]]
 [1] 2

 [[3]]
 [1] 3 3

- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu



 On Mon, 13 Sep 2010, Sunny Srivastava wrote:

  Dear R-Helpers,
 I have a list l1 like:

 l1[[1]]
 a b c

 l1[[2]]
 d

 l1[[3]]
 e f

 I want an output res like:

 res[[1]]
 1 1 1

 res[[2]]
 2

 res[[3]]
 3 3

 Essentially, I want to replicate each index equal to the number of
 elements
 present in that index.

 Below is what I do to accomplish this:

 l1 - list(c(a, b, c), d, c(e, f))
 res - mapply(rep, seq_along(l1),times=(lapply(l1, length)))

 Is there a more elegant way of doing this (possibly using one (l/m)apply
 function)?

 Thanks in advance for the help.

 Best Regards,
 S.

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question: Form a new list with the index replicated equal to the number of elements in that index

2010-09-13 Thread Peter Dalgaard
On 09/13/2010 08:41 PM, Sunny Srivastava wrote:
 Dear R-Helpers,
 I have a list l1 like:
 
 l1[[1]]
 a b c
 
 l1[[2]]
 d
 
 l1[[3]]
 e f
 
 I want an output res like:
 
 res[[1]]
 1 1 1
 
 res[[2]]
 2
 
 res[[3]]
 3 3
 
 Essentially, I want to replicate each index equal to the number of elements
 present in that index.
 
 Below is what I do to accomplish this:
 
 l1 - list(c(a, b, c), d, c(e, f))
 res - mapply(rep, seq_along(l1),times=(lapply(l1, length)))
 
 Is there a more elegant way of doing this (possibly using one (l/m)apply
 function)?

Don't know about elegance, but how about

lapply(seq_along(l1), function(i) rep(i, length(l1[[i]])))

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question on optim

2010-09-08 Thread Ravi Varadhan
Hi Nan,

You can take a look at the optimx package on CRAN.  John Nash and I wrote
this package to help lay and sophisticated users alike.  This package
unifies various optimization algorithms in R for smooth, box-constrained
optimization. It has features for checking objective function, gradient (and
hessian) specifications.  It checks for potential problems due to poor
scaling; checks feasibility of starting values.  It provides diagnostics
(KKT conditions) on whether or not a local optimum has been located.  It
also allows the user to run various optimization algorithms in one simple
call, which is essentially identical to an optim call. This feature can be
especially useful for developers to benchmark different algorithms and
choose the best one for their class of problems.

http://cran.r-project.org/web/packages/optimx/index.html
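
For example, a minimal sketch of such a call (untested here; the toy
objective is only for illustration, and the interface is assumed to mirror
optim's):

library(optimx)
fr <- function(x) sum((x - c(1, 2))^2)        # toy quadratic objective
optimx(par = c(0, 0), fn = fr,
       method = c("Nelder-Mead", "BFGS"))     # several optimizers in one call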

Ravi.

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Hey Sky
Sent: Tuesday, September 07, 2010 2:48 PM
To: Ben Bolker; r-h...@stat.math.ethz.ch
Subject: Re: [R] question on optim

thanks. Ben

after read your email, I realized the initial value of w[5]=0 is a stupid 
mistake. and I have changed it. but I am sorry I cannot reproduce the
result, 
convergence, as you get. the error message is non-finite finite difference 
value [12]. any suggestion about it?

and could you plz recommend some R books on optimization, such as tips for
setup 
gradient and others, or common mistakes?  thanks

Nan







- Original Message 
From: Ben Bolker bbol...@gmail.com
To: r-h...@stat.math.ethz.ch
Sent: Tue, September 7, 2010 11:15:43 AM
Subject: Re: [R] question on "optim"

Hey Sky heyskywalker at yahoo.com writes:

 I do not know how to describe my question. I am a new user for R and
 write the 
 following code for a dynamic labor economics model and use OPTIM to get 
 optimizations and parameter values. the following code does not work due
to 
 the equation:
 
wden[,i]-dnorm((1-regw[,i])/w[5])/w[5]
 
 where w[5] is one of the parameters (together with vector a, b and other 
 elements in vector w) need to be estimated. if I
  delete the w[5] from the upper 
 equation. that is:
 
  wden[,i]-dnorm(1-regw[,i])
 
 optim will give me the estimated parameters.

  Thank you for the reproducible example!

  The problem is that you are setting the initial value of w[5]
to zero, and then trying to divide by it ...

  I find that


guess-rep(0,times=npar)
guess[16] - 1

system.time(r1-optim(guess,myfunc1,data=mydata, method=BFGS,hessian=TRUE,
  control=list(trace=TRUE)))

seems to work OK (I have no idea if the answers are sensible are not ...)

If you're going to be doing a lot of this it might be wise to see
if you can specify the gradient of your objective function for R --
it will speed up and stabilize the fitting considerably.

  By the way, you should be careful with this function: if we try
this with Nelder-Mead instead, it appears to head to a set of
parameters that lead to some sort of singularity in the objective
function:

system.time(r2-optim(guess,myfunc1,data=mydata, 
   method=Nelder-Mead,hessian=FALSE,
  control=list(trace=TRUE,maxit=5000)))

## still thinks it hasn't converged, but objective function is
##   much smaller

## plot 'slice' through objective space where 0 corresponds to
##  fit-1 parameters and 1 corresponds to fit-2 parameters;
## adapted from emdbook::calcslice
range - seq(-0.1,1.1,length=400)
slicep - seq(range[1], range[2], length = 400)
slicepars - t(sapply(slicep, function(x) (1 - x) * r1$par +  x * r2$par))
v - apply(slicepars, 1, myfunc1)
plot(range,v,type=l)


  Ideally, you should be able to look at the parameters of fit #2
and figure out (a) what the result means in terms of labor economics
and (b) how to keep the objective function from going there, or at
least identifying when it does.

  Ben Bolker

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question on optim

2010-09-08 Thread Paulo Barata


Dear R-list members,

I am using R 2.11.1 on Windows XP. When I try to install package
optimx through the GUI menu Packages / Install packages, this
package does not appear in the list that opens up (chosen from the
Austria CRAN site). The package is listed on Austria's CRAN web
page, but today (8 September 2010) it does not show in the list
obtained through the menu.

Thank you.

Paulo Barata

(Rio de Janeiro - Brazil)

--
--


On 8/9/2010 11:01, Ravi Varadhan wrote:

Hi Nan,

You can take a look at the optimx package on CRAN.  John Nash and I wrote
this package to help lay and sophisticated users alike.  This package
unifies various optimization algorithms in R for smooth, box-constrained
optimization. It has features for checking objective function, gradient (and
hessian) specifications.  It checks for potential problems due to poor
scaling; checks feasibility of starting values.  It provides diagnostics
(KKT conditions) on whether or not a local optimum has been located.  It
also allows the user to run various optimization algorithms in one simple
call, which is essentially identical to optim call. This feature can be
especially useful for developers to benchmark different algorithms and
choose the best one for their class of problems.

http://cran.r-project.org/web/packages/optimx/index.html

Ravi.

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Hey Sky
Sent: Tuesday, September 07, 2010 2:48 PM
To: Ben Bolker; r-h...@stat.math.ethz.ch
Subject: Re: [R] question on optim

thanks. Ben

after read your email, I realized the initial value of w[5]=0 is a stupid
mistake. and I have changed it. but I am sorry I cannot reproduce the
result,
convergence, as you get. the error message is non-finite finite difference
value [12]. any suggestion about it?

and could you plz recommend some R books on optimization, such as tips for
setup
gradient and others, or common mistakes?  thanks

Nan







- Original Message 
From: Ben Bolker bbol...@gmail.com
To: r-h...@stat.math.ethz.ch
Sent: Tue, September 7, 2010 11:15:43 AM
Subject: Re: [R] question on "optim"

Hey Sky heyskywalker at yahoo.com writes:


I do not know how to describe my question. I am a new user for R and
write the
following code for a dynamic labor economics model and use OPTIM to get
optimizations and parameter values. the following code does not work due

to

the equation:

wden[,i]-dnorm((1-regw[,i])/w[5])/w[5]

where w[5] is one of the parameters (together with vector a, b and other
elements in vector w) need to be estimated. if I
  delete the w[5] from the upper
equation. that is:

  wden[,i]-dnorm(1-regw[,i])

optim will give me the estimated parameters.


   Thank you for the reproducible example!

   The problem is that you are setting the initial value of w[5]
to zero, and then trying to divide by it ...

   I find that


guess-rep(0,times=npar)
guess[16]- 1

system.time(r1-optim(guess,myfunc1,data=mydata, method=BFGS,hessian=TRUE,
   control=list(trace=TRUE)))

seems to work OK (I have no idea if the answers are sensible are not ...)

If you're going to be doing a lot of this it might be wise to see
if you can specify the gradient of your objective function for R --
it will speed up and stabilize the fitting considerably.

   By the way, you should be careful with this function: if we try
this with Nelder-Mead instead, it appears to head to a set of
parameters that lead to some sort of singularity in the objective
function:

system.time(r2-optim(guess,myfunc1,data=mydata,
method=Nelder-Mead,hessian=FALSE,
   control=list(trace=TRUE,maxit=5000)))

## still thinks it hasn't converged, but objective function is
##   much smaller

## plot 'slice' through objective space where 0 corresponds to
##  fit-1 parameters and 1 corresponds to fit-2 parameters;
## adapted from emdbook::calcslice
range- seq(-0.1,1.1,length=400)
slicep- seq(range[1], range[2], length = 400)
slicepars- t(sapply(slicep, function(x) (1 - x) * r1$par +  x * r2$par))
v- apply(slicepars, 1, myfunc1)
plot(range,v,type=l)


   Ideally, you should be able to look at the parameters of fit #2
and figure out (a) what the result means in terms of labor economics
and (b) how to keep the objective function from going there, or at
least identifying when it does.

   Ben Bolker

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting

Re: [R] question on optim

2010-09-08 Thread Ravi Varadhan
The Windows build is not on CRAN at the moment. It might be there in a few
days.  In the meantime you can get the Windows binary from the optimizer
project on R-Forge:

https://r-forge.r-project.org/R/?group_id=395

Ravi.

-Original Message-
From: Paulo Barata [mailto:pbar...@infolink.com.br] 
Sent: Wednesday, September 08, 2010 4:26 PM
To: Ravi Varadhan
Cc: 'Hey Sky'; 'Ben Bolker'; r-h...@stat.math.ethz.ch
Subject: Re: [R] question on optim


Dear R-list members,

I am using R 2.11.1 on Windows XP. When I try to install package
optimx through the GUI menu Packages / Install packages, this
package does not appear in the list that opens up (chosen from the
Austria CRAN site). The package is listed on Austria's CRAN web
page, but today (8 September 2010) it does not show in the list
obtained through the menu.

Thank you.

Paulo Barata

(Rio de Janeiro - Brazil)

--
--


On 8/9/2010 11:01, Ravi Varadhan wrote:
 Hi Nan,

 You can take a look at the optimx package on CRAN.  John Nash and I
wrote
 this package to help lay and sophisticated users alike.  This package
 unifies various optimization algorithms in R for smooth, box-constrained
 optimization. It has features for checking objective function, gradient
(and
 hessian) specifications.  It checks for potential problems due to poor
 scaling; checks feasibility of starting values.  It provides diagnostics
 (KKT conditions) on whether or not a local optimum has been located.  It
 also allows the user to run various optimization algorithms in one simple
 call, which is essentially identical to optim call. This feature can be
 especially useful for developers to benchmark different algorithms and
 choose the best one for their class of problems.

 http://cran.r-project.org/web/packages/optimx/index.html

 Ravi.

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On
 Behalf Of Hey Sky
 Sent: Tuesday, September 07, 2010 2:48 PM
 To: Ben Bolker; r-h...@stat.math.ethz.ch
 Subject: Re: [R] question on optim

 thanks. Ben

 after read your email, I realized the initial value of w[5]=0 is a stupid
 mistake. and I have changed it. but I am sorry I cannot reproduce the
 result,
 convergence, as you get. the error message is non-finite finite difference
 value [12]. any suggestion about it?

 and could you plz recommend some R books on optimization, such as tips for
 setup
 gradient and others, or common mistakes?  thanks

 Nan







 - Original Message 
 From: Ben Bolker bbol...@gmail.com
 To: r-h...@stat.math.ethz.ch
 Sent: Tue, September 7, 2010 11:15:43 AM
 Subject: Re: [R] question on "optim"

 Hey Sky heyskywalker at yahoo.com writes:

 I do not know how to describe my question. I am a new user for R and
 write the
 following code for a dynamic labor economics model and use OPTIM to get
 optimizations and parameter values. the following code does not work due
 to
 the equation:

 wden[,i]-dnorm((1-regw[,i])/w[5])/w[5]

 where w[5] is one of the parameters (together with vector a, b and other
 elements in vector w) need to be estimated. if I
   delete the w[5] from the upper
 equation. that is:

   wden[,i]-dnorm(1-regw[,i])

 optim will give me the estimated parameters.

Thank you for the reproducible example!

The problem is that you are setting the initial value of w[5]
 to zero, and then trying to divide by it ...

I find that


 guess-rep(0,times=npar)
 guess[16]- 1

 system.time(r1-optim(guess,myfunc1,data=mydata,
method=BFGS,hessian=TRUE,
control=list(trace=TRUE)))

 seems to work OK (I have no idea if the answers are sensible are not ...)

 If you're going to be doing a lot of this it might be wise to see
 if you can specify the gradient of your objective function for R --
 it will speed up and stabilize the fitting considerably.

By the way, you should be careful with this function: if we try
 this with Nelder-Mead instead, it appears to head to a set of
 parameters that lead to some sort of singularity in the objective
 function:

 system.time(r2-optim(guess,myfunc1,data=mydata,
 method=Nelder-Mead,hessian=FALSE,
control=list(trace=TRUE,maxit=5000)))

 ## still thinks it hasn't converged, but objective function is
 ##   much smaller

 ## plot 'slice' through objective space where 0 corresponds to
 ##  fit-1 parameters and 1 corresponds to fit-2 parameters;
 ## adapted from emdbook::calcslice
 range- seq(-0.1,1.1,length=400)
 slicep- seq(range[1], range[2], length = 400)
 slicepars- t(sapply(slicep, function(x) (1 - x) * r1$par +  x * r2$par))
 v- apply(slicepars, 1, myfunc1)
 plot(range,v,type=l)


Ideally, you should be able to look at the parameters of fit #2
 and figure out (a) what the result means in terms of labor economics
 and (b) how to keep the objective function from going

Re: [R] question on "optim"

2010-09-07 Thread Ben Bolker
Hey Sky heyskywalker at yahoo.com writes:

 I do not know how to describe my question. I am a new user for R and
 write the 
 following code for a dynamic labor economics model and use OPTIM to get 
 optimizations and parameter values. the following code does not work due to 
 the equation:
 
    wden[,i] <- dnorm((1-regw[,i])/w[5])/w[5]
 
 where w[5] is one of the parameters (together with vector a, b and other 
 elements in vector w) need to be estimated. if I
  delete the w[5] from the upper 
 equation. that is:
 
  wden[,i] <- dnorm(1-regw[,i])
 
 optim will give me the estimated parameters.

  Thank you for the reproducible example!

  The problem is that you are setting the initial value of w[5]
to zero, and then trying to divide by it ...

  I find that


guess <- rep(0, times=npar)
guess[16] <- 1

system.time(r1 <- optim(guess, myfunc1, data=mydata, method="BFGS", hessian=TRUE,
  control=list(trace=TRUE)))

seems to work OK (I have no idea if the answers are sensible or not ...)

 If you're going to be doing a lot of this it might be wise to see
if you can specify the gradient of your objective function for R --
it will speed up and stabilize the fitting considerably.
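
  (A tiny illustration of supplying a gradient, with a made-up quadratic
objective rather than myfunc1:)

f  <- function(p) sum((p - 3)^2)        # toy objective
gr <- function(p) 2 * (p - 3)           # its analytic gradient
optim(c(0, 0), f, gr, method = "BFGS")  # uses the supplied gradient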

  By the way, you should be careful with this function: if we try
this with Nelder-Mead instead, it appears to head to a set of
parameters that lead to some sort of singularity in the objective
function:

system.time(r2 <- optim(guess, myfunc1, data=mydata,
   method="Nelder-Mead", hessian=FALSE,
  control=list(trace=TRUE, maxit=5000)))

## still thinks it hasn't converged, but objective function is
##   much smaller

## plot 'slice' through objective space where 0 corresponds to
##  fit-1 parameters and 1 corresponds to fit-2 parameters;
## adapted from emdbook::calcslice
range <- seq(-0.1, 1.1, length=400)
slicep <- seq(range[1], range[2], length = 400)
slicepars <- t(sapply(slicep, function(x) (1 - x) * r1$par + x * r2$par))
v <- apply(slicepars, 1, myfunc1)
plot(range, v, type="l")


  Ideally, you should be able to look at the parameters of fit #2
and figure out (a) what the result means in terms of labor economics
and (b) how to keep the objective function from going there, or at
least identifying when it does.

  Ben Bolker

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question on optim

2010-09-07 Thread Hey Sky
Sorry, there is a typo in the following code: there is no w[5]*acwrk[,i] in the
regw equation. The right one should be as follows:


   regw[,i] <- w[1] + w[2]*eta[m] + exp(w[3]+w[4]*eta[m])*actr[,i]




 


- Original Message 
From: Hey Sky heyskywal...@yahoo.com
To: R r-help@r-project.org
Sent: Tue, September 7, 2010 10:38:55 AM
Subject: [R] question on optim

Hey, R users

I do not know how to describe my question. I am a new user of R and wrote the
following code for a dynamic labor economics model, using optim to get the
optimization and parameter values. The following code does not work due to
the equation:

   wden[,i] <- dnorm((1-regw[,i])/w[5])/w[5]

where w[5] is one of the parameters (together with vectors a and b and the other
elements of vector w) that need to be estimated. If I delete w[5] from the above
equation, that is:

 wden[,i] <- dnorm(1-regw[,i])

optim will give me the estimated parameters. But the parameter w[5] is a key one
for me.


Now my question is: why does the first equation not work, and how can I correct
it to make it work?

I hope I made the question clear; thanks for any suggestions.

Nan
from Montreal








#the function 
myfunc1 <- function(par, data) {

# define the parameter matrix used in following part
vbar2 <- matrix(0,n,nt)
vbar3 <- matrix(0,n,nt)
v8   <- matrix(0,n,nt)
regw <- matrix(0,n,nt)
wden <- matrix(0,n,nt)
lia  <- matrix(0,n,nt)
ccl  <- matrix(1,n,ns)
eta  <- c(0,0)

# setup the parts for loglikelihood
q1  <- exp(par[1])
pr1 <- q1/(1+q1)
pr2 <- 1-pr1

eta[2] <- par[2]
a <- par[3:6]
b <- par[7:11]
w <- par[12:npar]

for(m in 1:ns){
 for(i in 1:nt){
   regw[,i] <- w[1]+ w[2]*eta[m]+exp(w[3]+w[4]*eta[m])*actr[,i]+w[5]*acwrk[,i]
    vbar2[,i] = a[1]+ eta[m]+regw[,i]*a[2]+acwrk[,i]*a[3]+actr[,i]*a[4]
    vbar3[,i] = b[1]+b[2]*eta[m]+regw[,i]*b[3]+acwrk[,i]*b[4]+actr[,i]*b[5]
    v8[,i]    = 1+exp(vbar2[,i])+exp(vbar3[,i])

   for(j in 1:n){
    if (home[j,i]==1) lia[j,i] = 1/v8[j,i]
    if (wrk[j,i]==1)  lia[j,i] = exp(vbar2[j,i])/v8[j,i]
    if (tr[j,i]==1)   lia[j,i] = exp(vbar3[j,i])/v8[j,i]
   }

   wden[,i] <- dnorm((1-regw[,i])/w[5])/w[5]
   ccl[,m]  <- lia[,i]*ccl[,m]*wden[,i]
 }
}

func <- pr1*ccl[,1]+pr2*ccl[,2]
f <- sum(log(func))
return(-f)
}


#***
# main program ** gen random data and get the optimization **

nt <- 16    # number of periods
ns <- 2     # number of person types
n  <- 50    # number of people
npar <- 17  # number of parameters

tr    <- matrix(0,n,nt)
wrk   <- matrix(0,n,nt)
home  <- matrix(0,n,nt)
actr  <- matrix(0,n,nt)
acwrk <- matrix(0,n,nt)

for(i in 1:nt){
  tr[,i]   <- round(runif(n))
  home[,i] <- round(runif(n))
}

for(i in 1:nt){
 for(k in 1:n){
  if(tr[k,i]==1 && home[k,i]==1) home[k,i] = 0
  wrk[k,i] <- 1-tr[k,i]-home[k,i]
 }
}

actr[,1]  <- tr[,1]
acwrk[,1] <- wrk[,1]
for(j in 2:nt){
actr[,j]  <- actr[,j-1]+tr[,j]
acwrk[,j] <- acwrk[,j-1]+wrk[,j]
}

mydata <- cbind(tr,wrk,home,actr,acwrk)

guess <- rep(0,times=npar)
system.time(r1 <- optim(guess, myfunc1, data=mydata, method="BFGS", hessian=T))
r1$par




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question on optim

2010-09-07 Thread Ravi Varadhan
Hi,

I do not see how `data' is used in your objective function.

The objective function is not even evaluable at the initial guess.  

> myfunc1(guess, mydata)
[1] NaN

I also think that some of the parameters may have to be constrained, for 
example, par[1]  0.  At a minimum, make sure that the obj fn is correctly 
specified before we can start tackling other issues.

Ravi.


Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvarad...@jhmi.edu


- Original Message -
From: Hey Sky heyskywal...@yahoo.com
Date: Tuesday, September 7, 2010 10:40 am
Subject: [R] question on optim
To: R r-help@r-project.org


 Hey, R users
  
  I do not know how to describe my question. I am a new user for R and 
 write the 
  following code for a dynamic labor economics model and use OPTIM to 
 get 
  optimizations and parameter values. the following code does not work 
 due to 
  the equation:
  
     wden[,i]-dnorm((1-regw[,i])/w[5])/w[5]
  
  where w[5] is one of the parameters (together with vector a, b and 
 other 
  elements in vector w) need to be estimated. if I delete the w[5] from 
 the upper 
  equation. that is:
  
   wden[,i]-dnorm(1-regw[,i])
  
  optim will give me the estimated parameters. but the paramter w[5] is 
 a key one 
  for me. 
  
  
  now my questions is: what reason lead to the first equation does not 
 work and 
  the way to correct it to make it work?
  
   I wish I made the queston clear and thanks for any suggestion.
  
  Nan
  from Montreal
  
  
  
  
  
  
  
  
  #the function 
  myfunc1-function(par,data) {
  
  # define the parameter matrix used in following part
  vbar2-matrix(0,n,nt)
  vbar3-matrix(0,n,nt)
  v8 -matrix(0,n,nt)
  regw-matrix(0,n,nt)
  wden-matrix(0,n,nt)
  lia-matrix(0,n,nt)
  ccl-matrix(1,n,ns)
  eta-c(0,0)
  
  # setup the parts for loglikelihood
  q1-exp(par[1])
  pr1-q1/(1+q1)
  pr2-1-pr1
  
  eta[2]-par[2]
  a-par[3:6]
  b-par[7:11]
  w-par[12:npar]
  
  for(m in 1:ns){
   for(i in 1:nt){
     regw[,i]-w[1]+ w[2]*eta[m]+exp(w[3]+w[4]*eta[m])*actr[,i]+w[5]*acwrk[,i]
      vbar2[,i]=a[1]+ eta[m]+regw[,i]*a[2]+acwrk[,i]*a[3]+actr[,i]*a[4]
      vbar3[,i]=b[1]+b[2]*eta[m]+regw[,i]*b[3]+acwrk[,i]*b[4]+actr[,i]*b[5]
      v8[,i]=1+exp(vbar2[,i])+exp(vbar3[,i])
   
     for(j in 1:n){
      if (home[j,i]==1) lia[j,i]=1/v8[j,i]
    if (wrk[j,i]==1) lia[j,i]=exp(vbar2[j,i])/v8[j,i]
      if (tr[j,i]==1) lia[j,i]=exp(vbar3[j,i])/v8[j,i]
   }
  
     wden[,i]-dnorm((1-regw[,i])/w[5])/w[5]
     ccl[,m]-lia[,i]*ccl[,m]*wden[,i]
   }
  }
  
  func-pr1*ccl[,1]+pr2*ccl[,2]
  f-sum(log(func))
  return(-f)
  }
  
  
  #***
  # main program ** gen random data and get the optimization **
  
  nt-16    # number of periods
  ns-2 # number of person type
  n-50 # number of people
  npar-17 # number of parameters
  
  tr-matrix(0,n,nt)
  wrk-matrix(0,n,nt)
  home-matrix(0,n,nt)
  actr-matrix(0,n,nt)
  acwrk-matrix(0,n,nt)
  
  for(i in 1:nt){
    tr[,i]-round(runif(n))
    home[,i]-round(runif(n))
  }
  
  for(i in 1:nt){
   for(k in 1:n){
    if(tr[k,i]==1  home[k,i]==1) home[k,i]=0
    wrk[k,i]- 1-tr[k,i]-home[k,i]
   }
  }
  
  actr[,1]-tr[,1]
  acwrk[,1]-wrk[,1]
  for(j in 2:nt){
  actr[,j]-actr[,j-1]+tr[,j]
  acwrk[,j]-acwrk[,j-1]+wrk[,j]
  }
  
  mydata-cbind(tr,wrk,home,actr,acwrk)
  
  guess-rep(0,times=npar)
  system.time(r1-optim(guess,myfunc1,data=mydata, method=BFGS,hessian=T))
  r1$par
  
  
  
  
  __
  R-help@r-project.org mailing list
  
  PLEASE do read the posting guide 
  and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question on optim

2010-09-07 Thread Hey Sky
Thanks, Ben.

After reading your email, I realized the initial value of w[5]=0 was a stupid
mistake, and I have changed it. But I am sorry, I cannot reproduce the result
(convergence) that you get; the error message is "non-finite finite difference
value [12]". Any suggestion about it?

And could you please recommend some R books on optimization, such as tips for
setting up gradients and other things, or common mistakes?  Thanks.

Nan







- Original Message 
From: Ben Bolker bbol...@gmail.com
To: r-h...@stat.math.ethz.ch
Sent: Tue, September 7, 2010 11:15:43 AM
Subject: Re: [R] question on "optim"

Hey Sky heyskywalker at yahoo.com writes:

 I do not know how to describe my question. I am a new user for R and
 write the 
 following code for a dynamic labor economics model and use OPTIM to get 
 optimizations and parameter values. the following code does not work due to 
 the equation:
 
wden[,i]-dnorm((1-regw[,i])/w[5])/w[5]
 
 where w[5] is one of the parameters (together with vector a, b and other 
 elements in vector w) need to be estimated. if I
  delete the w[5] from the upper 
 equation. that is:
 
  wden[,i]-dnorm(1-regw[,i])
 
 optim will give me the estimated parameters.

  Thank you for the reproducible example!

  The problem is that you are setting the initial value of w[5]
to zero, and then trying to divide by it ...

  I find that


guess-rep(0,times=npar)
guess[16] - 1

system.time(r1-optim(guess,myfunc1,data=mydata, method=BFGS,hessian=TRUE,
  control=list(trace=TRUE)))

seems to work OK (I have no idea if the answers are sensible are not ...)

If you're going to be doing a lot of this it might be wise to see
if you can specify the gradient of your objective function for R --
it will speed up and stabilize the fitting considerably.

  By the way, you should be careful with this function: if we try
this with Nelder-Mead instead, it appears to head to a set of
parameters that lead to some sort of singularity in the objective
function:

system.time(r2-optim(guess,myfunc1,data=mydata, 
   method=Nelder-Mead,hessian=FALSE,
  control=list(trace=TRUE,maxit=5000)))

## still thinks it hasn't converged, but objective function is
##   much smaller

## plot 'slice' through objective space where 0 corresponds to
##  fit-1 parameters and 1 corresponds to fit-2 parameters;
## adapted from emdbook::calcslice
range - seq(-0.1,1.1,length=400)
slicep - seq(range[1], range[2], length = 400)
slicepars - t(sapply(slicep, function(x) (1 - x) * r1$par +  x * r2$par))
v - apply(slicepars, 1, myfunc1)
plot(range,v,type=l)


  Ideally, you should be able to look at the parameters of fit #2
and figure out (a) what the result means in terms of labor economics
and (b) how to keep the objective function from going there, or at
least identifying when it does.

  Ben Bolker

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question on optim

2010-09-07 Thread Ben Bolker
On 10-09-07 02:48 PM, Hey Sky wrote:
 thanks. Ben

 after read your email, I realized the initial value of w[5]=0 is a
 stupid mistake. and I have changed it. but I am sorry I cannot
 reproduce the result, convergence, as you get. the error message is
 non-finite finite difference value [12]. any suggestion about
 it?

When I 'fix' the objective function as you specified in your
second message, I get into trouble too.


 and could you plz recommend some R books on optimization, such as
 tips for setup gradient and others, or common mistakes?  thanks

 Nan


  I'm afraid I don't know of a great reference, although pp. 340-341
of http://www.math.mcmaster.ca/~bolker/emdbook/book.pdf do give some
basic trouble-shooting suggestions (nothing about gradients, though,
but the example in ?optim does use gradients).  I would say that my
best _general_ advice is to understand what all the parameters of
your model mean: what are reasonable starting values and reasonable
ranges?  Use control=list(parscale=...) to tell R the approximate expected
order of magnitude of each parameter. You may be able to keep the
model from getting into trouble by using method="L-BFGS-B" and
bounding the parameters within a reasonable range.  For more general
troubleshooting: try different optimization methods (in particular,
Nelder-Mead doesn't need to compute finite differences); trap
non-finite values (NA, NaN, Inf, -Inf) that occur in the function and report
them and/or stop the function (see ?browser) and/or replace them with
large finite values; use control=list(trace=TRUE) or add cat() statements
to your function to see where the optimizer is trying to go.
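
  For instance, a hedged sketch of that kind of call (the bounds and scales
below are placeholders, not values checked against this model):

r3 <- optim(guess, myfunc1, data = mydata,
            method = "L-BFGS-B",
            lower = rep(-5, npar), upper = rep(5, npar),     # placeholder box constraints
            control = list(trace = 1, parscale = rep(1, npar)),
            hessian = TRUE)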

  good luck,
Ben Bolker



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question regarding significance of a covariate in a coxme survival model

2010-09-05 Thread Teresa Iglesias


David Winsemius wrote:
 
 That is different than my understanding of AIC. I thought that the AIC  
 and BIC both took as input the difference in -2LL and then adjusted  
 those differences for the differences in number of degrees of freedom. 
 
 

David! Your words make sense to me now. Sorry for the lapse.
A very smart professor took the time out to school me. I misunderstood the
output from coxme. I see now that it's giving the LL for the NULL model and
for the model I have specified and the AIC output is the difference between
the full model and the NULL. So the numbers all make sense to me and in fact
the p-values are in agreement in that they decrease as the specified model
is an improvement over the NULL. I am using only the AIC values and Akaike
weights in my analyses so the p-values are not my basis for reaching a
conclusion. I was distressed over the seeming disagreement between the AIC,
the p-values and my results using lmer (ignoring that the data were
censored), and bar graphs illustrating a clear difference when coxme was
telling me otherwise.  To spell it out for others like me that need to see
the numbers add up...

Given this output from coxme:
---
  NULL Integrated   Penalized
Log-likelihood -119.8470  -112.1598   -108.1663

                  Chisq   df   p          AIC    BIC
Integrated loglik 15.37 2.00 0.00045863 11.37  8.05
Penalized loglik 23.36 7.06 0.00153710  9.25 -2.49


AIC = -2*(LL) + 2*df

NULL:       -2*(-119.8470)              = 239.694
Integrated: -2*(-112.1598) + (2 * 2)    = 228.3196
Penalized:  -2*(-108.1663) + (2 * 7.06) = 230.4526

Subtract the integrated model's AIC from the NULL model's AIC and you
get the stated AIC for the integrated model in the output (same for the
penalized model).

239.694 - 228.3196 = 11.3744

So the larger (positive range) the AIC, in the coxme output, the better that
model does compared to the NULL model. Incidentally, we see that the p-value
decreases with an increase in the coxme AIC and so there is no
disagreement. 
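
The same arithmetic can be checked directly in R (a small sketch; the values
are copied from the coxme output above):

ll  <- c(null = -119.8470, integrated = -112.1598, penalized = -108.1663)
df  <- c(null = 0, integrated = 2, penalized = 7.06)
aic <- -2 * ll + 2 * df                          # 239.69 228.32 230.45
aic["null"] - aic[c("integrated", "penalized")]  # 11.37 and 9.24 (9.25 in the output uses the unrounded df)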

Thank you very smart professor! 

-Teresa Iglesias
Davis, Ca
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


-- 
View this message in context: 
http://r.789695.n4.nabble.com/Question-regarding-significance-of-a-covariate-in-a-coxme-survival-model-tp2313880p2527739.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question regarding significance of a covariate in a coxme survival

2010-08-31 Thread Norm Matloff
In 2010-08-30, C. Peng button...@hotmail.com wrote:

 What statistical measure(s) tend to be answering ALL(?) question of
 practical interest?

None.  All I had said was that significance testing doesn't really
answer any questions of practical interest.  Unfortunately, that doesn't
mean there's something to answer all such questions.

In the regression case brought up by the original poster (that was Cox
regression, but the principle is the same), prediction-oriented measures
such as AIC or cross-validation directly address the question of interest,
if that question is predictive ability.

In general, one should at the very least form confidence intervals
instead of significance tests.  In cases in which tests are
demanded, e.g. medical journals, replace "instead of" by "in addition
to".
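
As a hedged illustration of that suggestion (using the lung data shipped with
the survival package, not the original poster's data):

library(survival)
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)
exp(confint(fit))   # 95% confidence intervals for the hazard ratios
exp(coef(fit))      # point estimates, for comparison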

Norm

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question regarding significance of a covariate in a coxme survival

2010-08-30 Thread C. Peng

What statistical measure(s) tend to be answering ALL(?) questions of practical
interest?

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Re-Question-regarding-significance-of-a-covariate-in-a-coxme-survival-tp2399386p2399577.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question regarding significance of a covariate in a coxme survival model

2010-08-29 Thread C. Peng

The likelihood ratio test is more reliable when one model is nested in the
other. This is true for your case.
AIC/SBC are usually used when two models are in a hierarchical structure.
Please also note that any decision made based on AIC/SBC scores is
very subjective, since no sampling distribution can be used to make a
rigorous decision. Regarding the magnitudes of the log-likelihood and
AIC/SBC, I would say the author must have used a modified version in coxme(),
since several different modified AIC/SBC scores are in use in practice.


My suggestion would be to use the LR test for your case:

For the integrated likelihood:

LL.small.model = - 467.3549 (including lifedxm)
LL.large.model = - 471. (excluding lifedxm)
DF.diff = 3 - 1 = 2

LR: -2*(- 471. + 467.3549) = 7.9568
p-value: 1-pchisq(7.9568,2) = 0.01871556


For the penalized likelihood:

LPL.small.model = -435.2096 (including lifedxm)
LPL.large.model = -436.0478 (excluding lifedxm)
DF.diff = 3 - 1 = 2

PLR: -2*(- 436.0478 + 435.2096) = 1.6764
p-value: 1-pchisq(1.6764,2) = 0.4324883

Two different likelihood methods produce different results; which one you
should use depends on which likelihood makes more sense to you (or which
likelihood is better).

HTH 

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Question-regarding-significance-of-a-covariate-in-a-coxme-survival-model-tp2313880p2399114.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question regarding significance of a covariate in a coxme survival model

2010-08-29 Thread Cheng Peng

My suggestion:

If you compare model 1 and model 2 with model 0, respectively, the (penalized)
likelihood ratio test is valid.
If you compare model 2 with model 3, the (penalized) likelihood ratio test
is invalid. You may want to use AIC/SBC to make a subjective decision.


-- 
View this message in context: 
http://r.789695.n4.nabble.com/Question-regarding-significance-of-a-covariate-in-a-coxme-survival-model-tp2313880p2399095.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question regarding significance of a covariate in a coxme survival

2010-08-29 Thread Norm Matloff
Using a p-value to make any kind of decision is questionable to begin
with, and especially unreliable in choosing covariates in regression.
Old studies, e.g. by Walls and Weeks and by Bendel and Afifi, have shown
that if predictive ability is the criterion of interest and one wishes
to use p-values for deciding whether to include a covariate, one should
set the p-value bar very large, at 0.25 and even 0.40.

By contrast, methods such as AIC are aimed at avoiding overfitting, by
penalizing models with large numbers of covariates.  Same for Mallows'
Cp, cross validation etc.

So, the p-value and AIC are answering quite different questions, and
thus should not be expected to give the same or even similar results.
But, worse than that, many point out that p-values tend not to be
answering ANY question of practical interest.

It's a shame that the use of p-values is so entrenched.  I can expand on
this, with references, if there is interest.

Norm Matloff
Professor of Computer Science (formerly Statistics)
University of California, Davis

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question regarding significance of a covariate in a coxme survival

2010-08-29 Thread cheng peng

What statistical measure(s) tend to answer ALL (?) questions of practical
interest?


-- 
View this message in context: 
http://r.789695.n4.nabble.com/Re-Question-regarding-significance-of-a-covariate-in-a-coxme-survival-tp2399386p2399524.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question regarding significance of a covariate in a coxme survival model

2010-08-28 Thread Teresa Iglesias
Christopher David Desjardins desja004 at umn.edu writes:

 
 Hi,
 I am running a Cox Mixed Effects Hazard model using the library coxme. I 
 am trying to model time to onset (age_sym1) of thought problems (e.g. 
 hearing voices) (sym1).  As I have siblings in my dataset,  I have 
 decided to account for this by including a random effect for family 
 (famid). My covariate of interest is Mother's diagnosis where a 0 is 
 bipolar, 1 is control, and 2 is major depression.  I am trying to fit 
 the following model.
 
 thorp1 <- coxme(Surv(age_sym1, sym1) ~ lifedxm + (1 | famid), data = 
 bip.surv)
 
 Which provides the following output:
 
 -
   thorp1
 Cox mixed-effects model fit by maximum likelihood
Data: bip.surv
events, n = 99, 189 (3 observations deleted due to missingness)
Iterations= 10 54
  NULL Integrated Penalized
 Log-likelihood -479.0372 -467.3549 -435.2096
                    Chisq    df       p        AIC     BIC
 Integrated loglik 23.36 3.00 3.3897e-05 17.36 9.58
   Penalized loglik 87.66 47.27 3.2374e-04 -6.88 -129.54
 Model: Surv(age_sym1, sym1) ~ lifedxm + (1 | famid)
 Fixed coefficients
   coef exp(coef) se(coef)  z  p
 lifedxmCONTROL -1.0576257 0.3472794 0.3872527 -2.73 0.0063
 lifedxmMAJOR   -0.6329957 0.5309987 0.3460847 -1.83 0.0670
 Random effects
   Group Variable Std DevVariance
   famid Intercept 0.980 0.9757033
 
 --
 
 So it appears that there is a difference in hazard rates associated with 
 Mother's diagnosis between BIPOLAR and CONTROL. Just to be safe, I fit a 
 model without this covariate present and decided to compare the models 
 with AIC and see if fit decreased with this covariate not in the model.
 
 --
   thorp1.n <- coxme(Surv(age_sym1, sym1) ~ (1 | famid), data = bip.surv)
   thorp1.n
 Cox mixed-effects model fit by maximum likelihood
Data: bip.surv
events, n = 99, 189 (3 observations deleted due to missingness)
Iterations= 10 54
  NULL Integrated Penalized
 Log-likelihood -479.0372 -471. -436.0478
                    Chisq    df       p        AIC     BIC
 Integrated loglik 15.41 1.00 0.8663 13.41 10.81
   Penalized loglik 85.98 49.48 0.00099915 -12.97 -141.37
 Model:  Surv(age_sym1, sym1) ~ (1 | famid)
 Random effects
   Group Variable Std Dev Variance
   famid Intercept 1.050949 1.104495
 
 
 The problem is that the AIC for the model w/o lifedxm is 13.41 and the 
 model w/ lifedxm is 17.36. So fit actually improved without lifedxm in 
 the model but really the fit is indistinguishable if I use Burnham  
 Anderson, 2002's criteria.
 
 So my conundrum is this - The AIC and the p-values appear to disagree. 
 The p-value would suggest that lifedxm should be included in the model 
 and the AIC says it doesn't improve fit. In general, I tend to favor the 
 AIC (I usually work with large sample sizes and multilevel models) but I 
 am just beginning with survival models and I am curious if a survival 
 model guru might suggest whether lifedxm needs to be in the model or not 
 based on my results here? Would people generally use the AIC in this 
 situation?  Also, I can't run anova() on this models because of the 
 random effect.
 
 I am happy to provide the data if necessary.
 
 Please cc me on a reply.
 
 Thanks,
 Chris
 
 

Hi Chris,
Did you get an answer to why the p-value and AIC score disagreed? 
I am getting the same results with my own data. They're not small 
disagreements either. The AIC score is telling me the opposite of 
what the p-value and the parameter coef are saying.  
The p-value and the coef for the predictor variable are in agreement.

I've also noticed in some published papers with tables containing 
p-values and AIC scores that the chisq p-value and AIC score 
are in opposition too. This makes me think I'm missing something 
and that this all actually makes sense.

However given that AIC = − 2 (log L) + 2K (where K is the number of parameters)
and seeing as how the log-likelihood scores below are in the hundreds, shouldn't
the AIC score be in the hundreds as well?? 


--

model0 (intercept only model)
   NULL Integrated Penalized
Log-likelihood -119.8470  -117.9749 -113.1215

                   Chisq    df        p       AIC     BIC
Integrated loglik  3.74 1.00 0.052989  1.74   0.08
Penalized loglik 13.45 7.06 0.063794 -0.68 -12.43


Random effects
Group   Variable  Std Dev   Variance 
SiteIntercept0.6399200 0.4094977


_
model1 (before vs after)
 NULL Integrated Penalized
Log-likelihood -119.8470  -112.1598 -108.1663

                   Chisq    df    p       AIC    BIC
Integrated loglik 15.37 2.00 

Re: [R] Question regarding significance of a covariate in a coxme survival model

2010-08-28 Thread David Winsemius


On Aug 27, 2010, at 4:32 PM, Teresa Iglesias wrote:


Christopher David Desjardins desja004 at umn.edu writes:



Hi,
I am running a Cox Mixed Effects Hazard model using the library  
coxme. I

am trying to model time to onset (age_sym1) of thought problems (e.g.
hearing voices) (sym1).  As I have siblings in my dataset,  I have
decided to account for this by including a random effect for family
(famid). My covariate of interest is Mother's diagnosis where a 0 is
bipolar, 1 is control, and 2 is major depression.  I am trying to fit
the following model.

thorp1 - coxme(Surv(age_sym1, sym1) ~ lifedxm + (1 | famid), data =
bip.surv)

Which provides the following output:

-

thorp1

Cox mixed-effects model fit by maximum likelihood
  Data: bip.surv
  events, n = 99, 189 (3 observations deleted due to missingness)
  Iterations= 10 54
NULL Integrated Penalized
Log-likelihood -479.0372 -467.3549 -435.2096
  Chisqdf  p   AIC BIC
Integrated loglik 23.36 3.00 3.3897e-05 17.36 9.58
 Penalized loglik 87.66 47.27 3.2374e-04 -6.88 -129.54
Model: Surv(age_sym1, sym1) ~ lifedxm + (1 | famid)
Fixed coefficients
 coef exp(coef) se(coef)  z  p
lifedxmCONTROL -1.0576257 0.3472794 0.3872527 -2.73 0.0063
lifedxmMAJOR   -0.6329957 0.5309987 0.3460847 -1.83 0.0670
Random effects
 Group Variable Std DevVariance
 famid Intercept 0.980 0.9757033

--

So it appears that there is a difference in hazard rates associated  
with
Mother's diagnosis between BIPOLAR and CONTROL. Just to be safe, I  
fit a
model without this covariate present and decided to compare the  
models
with AIC and see if fit decreased with this covariate not in the  
model.


--
  thorp1.n - coxme(Surv(age_sym1, sym1) ~ (1 | famid), data =  
bip.surv)

thorp1.n

Cox mixed-effects model fit by maximum likelihood
  Data: bip.surv
  events, n = 99, 189 (3 observations deleted due to missingness)
  Iterations= 10 54
NULL Integrated Penalized
Log-likelihood -479.0372 -471. -436.0478
  Chisqdf  pAIC BIC
Integrated loglik 15.41 1.00 0.8663 13.41 10.81
 Penalized loglik 85.98 49.48 0.00099915 -12.97 -141.37
Model:  Surv(age_sym1, sym1) ~ (1 | famid)
Random effects
 Group Variable Std Dev Variance
 famid Intercept 1.050949 1.104495


The problem is that the AIC for the model w/o lifedxm is 13.41 and  
the
model w/ lifedxm is 17.36. So fit actually improved without lifedxm  
in

the model but really the fit is indistinguishable if I use Burnham 
Anderson, 2002's criteria.

So my conundrum is this - The AIC and the p-values appear to  
disagree.
The p-value would suggest that lifedxm should be included in the  
model
and the AIC says it doesn't improve fit. In general, I tend to  
favor the
AIC (I usually work with large sample sizes and multilevel models)  
but I

am just beginning with survival models and I am curious if a survival
model guru might suggest whether lifedxm needs to be in the model  
or not

based on my results here? Would people generally use the AIC in this
situation?  Also, I can't run anova() on this models because of the
random effect.

I am happy to provide the data if necessary.

Please cc me on a reply.

Thanks,
Chris


Hi Chris,
Did you get an answer to why the p-value and AIC score disagreed?
I am getting the same results with my own data. They're not small
disagreements either. The AIC score is telling me the opposite of
what the p-value and the parameter coef are saying.
The p-value and the coef for the predictor variable are in agreement.

I've also noticed in some published papers with tables containing
p-values and AIC scores that the chisq p-value and AIC score
are in opposition too. This makes me think I'm missing something
and that this all actually makes sense.

However given that AIC = − 2 (log L) + 2K (where K is the number of  
parameters)
and seeing as how the log-likelihood scores below are in the  
hundreds, shouldn't

the AIC score be in the hundreds as well??


That is different than my understanding of AIC. I thought that the AIC  
and BIC both took as input the difference in -2LL and then adjusted  
those differences for the differences in number of degrees of freedom.  
That perspective would inform one that the absolute value of the  
deviance ( = -2LL)  is immaterial and only differences are subject to  
interpretation. The p-values are calculated as Wald tests, which are  
basically first order approximations and have lower credence than  
model level comparisons. The OP also seemed to have ignored the fact  
that the penalized estimates of AIC and BIC went in the opposite  
direction from what he might have hoped.
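For what it is worth, the integrated-loglik line of the first printout is
consistent with that reading: treating the reported AIC/BIC as the chi-square
against the NULL fit minus the usual penalties reproduces the quoted numbers.
A rough check (values copied from the output above; the relative-to-NULL
convention is my assumption here):

ll.null <- -479.0372
ll.int  <- -467.3549
df      <- 3               # 2 fixed coefficients + 1 random-effect variance
events  <- 99

chisq <- 2 * (ll.int - ll.null)   # 23.36
chisq - 2 * df                    # 17.36  (matches the printed AIC)
chisq - df * log(events)          #  9.58  (matches the printed BIC)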



--


Re: [R] Question regarding significance of a covariate in a coxme survival model

2010-08-28 Thread Teresa Iglesias
On Sat, Aug 28, 2010 at 8:38 PM, David Winsemius dwinsem...@comcast.netwrote:


 On Aug 27, 2010, at 4:32 PM, Teresa Iglesias wrote:

  Christopher David Desjardins desja004 at umn.edu writes:


 Hi,
 I am running a Cox Mixed Effects Hazard model using the library coxme. I
 am trying to model time to onset (age_sym1) of thought problems (e.g.
 hearing voices) (sym1).  As I have siblings in my dataset,  I have
 decided to account for this by including a random effect for family
 (famid). My covariate of interest is Mother's diagnosis where a 0 is
 bipolar, 1 is control, and 2 is major depression.  I am trying to fit
 the following model.

 thorp1 - coxme(Surv(age_sym1, sym1) ~ lifedxm + (1 | famid), data =
 bip.surv)

 Which provides the following output:

 -

 thorp1

 Cox mixed-effects model fit by maximum likelihood
  Data: bip.surv
  events, n = 99, 189 (3 observations deleted due to missingness)
  Iterations= 10 54
NULL Integrated Penalized
 Log-likelihood -479.0372 -467.3549 -435.2096
  Chisqdf  p   AIC BIC
 Integrated loglik 23.36 3.00 3.3897e-05 17.36 9.58
  Penalized loglik 87.66 47.27 3.2374e-04 -6.88 -129.54
 Model: Surv(age_sym1, sym1) ~ lifedxm + (1 | famid)
 Fixed coefficients
 coef exp(coef) se(coef)  z  p
 lifedxmCONTROL -1.0576257 0.3472794 0.3872527 -2.73 0.0063
 lifedxmMAJOR   -0.6329957 0.5309987 0.3460847 -1.83 0.0670
 Random effects
  Group Variable Std DevVariance
  famid Intercept 0.980 0.9757033

 --

 So it appears that there is a difference in hazard rates associated with
 Mother's diagnosis between BIPOLAR and CONTROL. Just to be safe, I fit a
 model without this covariate present and decided to compare the models
 with AIC and see if fit decreased with this covariate not in the model.

 --
  thorp1.n - coxme(Surv(age_sym1, sym1) ~ (1 | famid), data = bip.surv)

 thorp1.n

 Cox mixed-effects model fit by maximum likelihood
  Data: bip.surv
  events, n = 99, 189 (3 observations deleted due to missingness)
  Iterations= 10 54
NULL Integrated Penalized
 Log-likelihood -479.0372 -471. -436.0478
  Chisqdf  pAIC BIC
 Integrated loglik 15.41 1.00 0.8663 13.41 10.81
  Penalized loglik 85.98 49.48 0.00099915 -12.97 -141.37
 Model:  Surv(age_sym1, sym1) ~ (1 | famid)
 Random effects
  Group Variable Std Dev Variance
  famid Intercept 1.050949 1.104495
 

 The problem is that the AIC for the model w/o lifedxm is 13.41 and the
 model w/ lifedxm is 17.36. So fit actually improved without lifedxm in
 the model but really the fit is indistinguishable if I use Burnham 
 Anderson, 2002's criteria.

 So my conundrum is this - The AIC and the p-values appear to disagree.
 The p-value would suggest that lifedxm should be included in the model
 and the AIC says it doesn't improve fit. In general, I tend to favor the
 AIC (I usually work with large sample sizes and multilevel models) but I
 am just beginning with survival models and I am curious if a survival
 model guru might suggest whether lifedxm needs to be in the model or not
 based on my results here? Would people generally use the AIC in this
 situation?  Also, I can't run anova() on this models because of the
 random effect.

 I am happy to provide the data if necessary.

 Please cc me on a reply.

 Thanks,
 Chris


 Hi Chris,
 Did you get an answer to why the p-value and AIC score disagreed?
 I am getting the same results with my own data. They're not small
 disagreements either. The AIC score is telling me the opposite of
 what the p-value and the parameter coef are saying.
 The p-value and the coef for the predictor variable are in agreement.

 I've also noticed in some published papers with tables containing
 p-values and AIC scores that the chisq p-value and AIC score
 are in opposition too. This makes me think I'm missing something
 and that this all actually makes sense.

 However given that AIC = − 2 (log L) + 2K (where K is the number of
 parameters)
 and seeing as how the log-likelihood scores below are in the hundreds,
 shouldn't
 the AIC score be in the hundreds as well??


 That is different than my understanding of AIC. I thought that the AIC and
 BIC both took as input the difference in -2LL and then adjusted those
 differences for the differences in number of degrees of freedom. That
 perspective would inform one that the absolute value of the deviance ( =
 -2LL)  is immaterial and only differences are subject to interpretation. The
 p-values are calculated as Wald tests, which are basically first order
 approximations and have lower credence than model level comparisons. The OP
 also seemed to have ignored the fact that the penalized estimates of AIC and
 

Re: [R] question

2010-08-20 Thread r.ookie
I'm not sure I understand exactly what you're asking but look at the truncated 
normal distribution.
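For the case where the cut-off is known, a minimal sketch using the inverse
CDF (a and n below are just illustrative values; CRAN packages such as
truncnorm wrap this up for you):

a <- 1.5
n <- 1000
upper <- qnorm(runif(n, pnorm(a), 1))    # draws from X | X >  a
lower <- qnorm(runif(n, 0, pnorm(-a)))   # draws from X | X < -a
range(upper); range(lower)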

On Aug 20, 2010, at 5:13 PM, solafah bh wrote:

Hello
 
I want to know how can i sampling from upper and lower tail of normal 
distribution , in two cases , if i know the upper and lower bounds of 
distribution and if i do not.
 
Regards



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question about unwanted addresses in contact list

2010-08-18 Thread Erik Iverson



Scott Compton wrote:

Hi,

It looks like about 200+ addresses from people who post on this list
have been added to my contact list without my permission. Has this
happened to other people? I use Mac OS. This is quite disconcerting
and suggests a virus floating around on this distribution list.



Very likely, your unstated mailer collects all addresses it 'sees'
and puts them in its address book.  If you don't like this behavior,
it might be able to be turned off.  But that depends on your
particular mail program, and not R.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question about unwanted addresses in contact list

2010-08-18 Thread Scott Compton
It doesn't happen to other lists that I belong to, just the R lists. But I will 
check.
Thanks for the tip.

Best, 
Scott

On Aug 18, 2010, at 12:23 PM, Erik Iverson wrote:

 
 
 Scott Compton wrote:
 Hi,
 It looks like about 200+ addresses from people who post on this list
 have been added to my contact list without my permission. Has this
 happened to other people? I use Mac OS. This is quite disconcerting
 and suggests a virus floating around on this distribution list.
 
 Very likely, your unstated mailer collects all addresses it 'sees'
 and puts them in its address book.  If you don't like this behavior,
 it might be able to be turned off.  But that depends on your
 particular mail program, and not R.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question about unwanted addresses in contact list

2010-08-18 Thread Corey Gallon
The suggestion that a virus is floating around on this distribution list is 
baseless.  All source code distributed on the list is embedded in messages in 
plain text -- i.e. there is no distribution of binary files as part of the 
mailing list.  How would a virus propagate under these circumstances?

Your complaint is almost certainly the result of your email client and its 
particular configuration.

* Sent from my BlackBerry handheld device

-Original Message-
From: Scott Compton scomp...@duke.edu
Sender: r-help-boun...@r-project.org
Date: Wed, 18 Aug 2010 12:32:15 
To: Erik Iversoner...@ccbr.umn.edu
Cc: r-help@r-project.org
Subject: Re: [R] question about unwanted addresses in contact list

It doesn't happen to other lists that I belong to, just the R lists. But I will 
check.
Thanks for the tip.

Best, 
Scott

On Aug 18, 2010, at 12:23 PM, Erik Iverson wrote:

 
 
 Scott Compton wrote:
 Hi,
 It looks like about 200+ addresses from people who post on this list
 have been added to my contact list without my permission. Has this
 happened to other people? I use Mac OS. This is quite disconcerting
 and suggests a virus floating around on this distribution list.
 
 Very likely, your unstated mailer collects all addresses it 'sees'
 and puts them in its address book.  If you don't like this behavior,
 it might be able to be turned off.  But that depends on your
 particular mail program, and not R.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question about bayesian model selection for quantile regression

2010-08-17 Thread Xin__ Li
Hi All:
The package MuMIn can be used to select a model based on AIC or AICc.
The code is as follows:

library(MuMIn)
data(Cement)
lm1 <- lm(y ~ ., data = Cement)
dd <- dredge(lm1, rank = "AIC")
print(dd)

If I want to select the model by BIC, what code do I need to use? And when
selecting the best model based on AIC, what are the differences between the
function dredge in package MuMIn and the function stepAIC in package
MASS?

Many thanks

best wishes

XIN LI

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question on contour function

2010-08-12 Thread ba ba
Duncan and David, thank you so much.

You are right. We can use

   z1 <- outer(x, y, function(x,y) x^2 + 3*y^2)

rather than

   xy <- meshgrid(x,y)
   z2 <- xy$x^2 + 3*xy$y^2

to get the right answer.  I ran both versions on my computer and found that z2
is the transpose of z1.

So I guess, in order to obtain the expected result, there are at least two
ways:

   x <- seq(-1,1,0.1)
   y <- seq(-1,1,0.1)
   z <- outer(x, y, FUN = function(x,y) x^2 + 3*y^2)
   contour(x, y, z, col = "blue", xlab = "x", ylab = "y")

or

   require(RTOMO)
   x <- seq(-1,1,0.1)
   y <- seq(-1,1,0.1)
   xy <- meshgrid(x,y)
   z <- xy$x^2 + 3*xy$y^2
   z <- t(z)
   contour(x, y, z, col = "blue", xlab = "x", ylab = "y")

Of course, the first method is better since it only uses the base function.
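A quick check of the orientation convention, without needing RTOMO (a small
sketch; contour() expects z[i, j] = f(x[i], y[j]), which is why the
meshgrid-style matrix has to be transposed):

x <- seq(-1, 1, 0.1)
y <- seq(-1, 1, 0.1)
z1 <- outer(x, y, function(x, y) x^2 + 3*y^2)   # orientation contour() wants
z2 <- outer(y, x, function(y, x) x^2 + 3*y^2)   # meshgrid-like orientation
all.equal(z1, t(z2))                            # TRUE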

David Lee

On 12 August 2010 01:54, David Winsemius dwinsem...@comcast.net wrote:


 On Aug 11, 2010, at 11:16 AM, ba ba wrote:

  Dear All,

 I tried to plot contour lines using R function contour, but got the
 results
 which are not expected.

 require(RTOMO)
 x - seq(-1,1,0.1)
 y - seq(-1,1,0.1)
 xy - meshgrid(x,y)

 z - xy$x^2+ 3*xy$y^2
 contour(x,y,z,col=blue,xlab=x,ylab=y)

 The above code gave me the contour graph for z=3*x^2+y^2 rather than
 z=x^2+3*y^2. Is anyone know the reason?


 Because contour was expecting a matrix of z values for z and you gave it a
 list created by a function you did not understand?

  meshgrid
 function (a, b)
 {
return(list(x = outer(b * 0, a, FUN = +), y = outer(b,
a * 0, FUN = +)))
 }

 Instead:
 Use the base function outer():


  x - seq(-1,1,0.1)
  y - seq(-1,1,0.1)
  xy - outer(x,y, FUN=function(x,y) x^2+ 3*y^2)
 
 
  contour(x,y,xy,col=blue,xlab=x,ylab=y)

 --
 David Winsemius, MD
 West Hartford, CT



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question about bayesian model selection for quantile regression

2010-08-11 Thread Xin__ Li

 Hello:

  I am trying to use the function bms to select the best model (the one with
the biggest posterior probability) for different quantiles. I have one return
and 16 variables. However, I can only do the selection for the linear model;
how do I change the code for quantile regression?


  bma1 = bms(rq(y ~ ., tau = 0.25, data = EQT), burn = 1000, iter = 3000, mcmc = "bd")
  Error in mpparam[[1]] : subscript out of bounds


The code for the linear model is successful:


  bma1 = bms(EQT, burn = 1000, iter = 3000, mcmc = "bd")
  where EQT is my data

   beta.draws.bma(bma1[1:5])

How do I use Monte Carlo to select the best model for each quantile? Or which
other function should be used?



 Best wishes
 XIN LI


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question on contour function

2010-08-11 Thread Duncan Murdoch

On 11/08/2010 11:16 AM, ba ba wrote:

Dear All,

I tried to plot contour lines using R function contour, but got the results
which are not expected.

require(RTOMO)
x <- seq(-1,1,0.1)
y <- seq(-1,1,0.1)
xy <- meshgrid(x,y)

z <- xy$x^2 + 3*xy$y^2
contour(x, y, z, col = "blue", xlab = "x", ylab = "y")

The above code gave me the contour graph for z = 3*x^2 + y^2 rather than
z = x^2 + 3*y^2. Does anyone know the reason? Thank you very much in advance.


A problem in RTOMO, or a misuse of it?  If I do the calculation of z as

z <- outer(x, y, function(x,y) x^2 + 3*y^2)

I get the expected result.

Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question on contour function

2010-08-11 Thread David Winsemius


On Aug 11, 2010, at 11:16 AM, ba ba wrote:


Dear All,

I tried to plot contour lines using R function contour, but got the  
results

which are not expected.

require(RTOMO)
x <- seq(-1,1,0.1)
y <- seq(-1,1,0.1)
xy <- meshgrid(x,y)

z <- xy$x^2 + 3*xy$y^2
contour(x, y, z, col = "blue", xlab = "x", ylab = "y")

The above code gave me the contour graph for z = 3*x^2 + y^2 rather than
z = x^2 + 3*y^2. Does anyone know the reason?


Because contour was expecting a matrix of z values for z and you gave  
it a list created by a function you did not understand?


> meshgrid
function (a, b)
{
    return(list(x = outer(b * 0, a, FUN = "+"), y = outer(b,
        a * 0, FUN = "+")))
}

Instead:
Use the base function outer():

> x <- seq(-1,1,0.1)
> y <- seq(-1,1,0.1)
> xy <- outer(x, y, FUN = function(x,y) x^2 + 3*y^2)

> contour(x, y, xy, col = "blue", xlab = "x", ylab = "y")

--
David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question about SVM in e1071

2010-08-05 Thread Pau Carrio Gaspar
Jack,

sorry for the late answer.
I agree that my last post is misleading. Here is a new try:

"Increasing the value of C (...) forces the creation of a more accurate
model", and that model may not generalise well (try to imagine the feature
space with the two mapped sets very far from each other). A model that fits
the training data better is obtained by adding more SVs (until we get a convex
hull of the data), and this is done by reducing the soft margin (i.e.
decreasing C); again, that may not generalise well, so maybe you can write a
small program with cross-validation to check.
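A quick numeric illustration of that trade-off (a sketch with e1071 and the
built-in iris data, not the breast-cancer set from the link; with a linear
kernel the support-vector count typically drops as cost grows):

library(e1071)
costs <- c(0.01, 1, 100)
nsv <- sapply(costs, function(co) {
    m <- svm(Species ~ ., data = iris, kernel = "linear", cost = co)
    nrow(m$SV)              # number of support vectors at this cost
})
data.frame(cost = costs, n.SV = nsv)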

"Here is another question: is the complexity of the boundary determined by
number of total SVs (bounded SV + free SV) or free SVs only?"
What do you mean by "complexity of the boundary"?

Regards
Pau

2010/7/28 Jack Luo jluo.rh...@gmail.com

 Pau,

 Sorry for getting back to you on this again. I am getting confused about
 your interpretation of 3). It is obvious from your code that increasing C
 results in a *smaller* number of SVs; this seems to contradict your
 interpretation "Increasing the value of C (...) forces the creation of
 a more accurate model. A more accurate model is obtained by adding more SVs."
 In addition, I have learned that the number of SVs increases as C
 decreases because there are many bounded SVs (whose alpha = C; remember
 0 < alpha <= C); the SVs with alpha smaller than C are called free SVs.
 Here is another question: is the complexity of the boundary determined by
 number of total SVs (bounded SV + free SV) or free SVs only?

 Thanks a bunch,

 -Jack


 On Thu, Jul 15, 2010 at 4:17 AM, Pau Carrio Gaspar paucar...@gmail.comwrote:

 Hi Jack,

 to 1) and 2) there are telling you the same. I recommend you to read the
 first sections of the article it is very well writen and clear. There you
 will read about duality.

 to 3) I interpret the scatter plot so: * Increasing the value of C (...)
 forces the creation of a more accurate model.* A more accurate model is
 done my adding more SV ( till we get a convex hull of the data )

 hope it helps
 Regards
 Pau

 2010/7/14 Jack Luo jluo.rh...@gmail.com

 Pau,

 Thanks a lot for your email, I found it very helpful. Please see below
 for my reply, thanks.

 -Jack

 On Wed, Jul 14, 2010 at 10:36 AM, Pau Carrio Gaspar paucar...@gmail.com
  wrote:

  Hello Jack,

 1 ) why do you thought that  larger C is prone to overfitting than
 smaller C ?


   *There is some statement in the link http://www.dtreg.com/svm.htm

 To allow some flexibility in separating the categories, SVM models have
 a cost parameter, C, that controls the trade off between allowing
 training errors and forcing rigid margins. It creates a soft margin that
 permits some misclassifications. Increasing the value of
 C increases the cost of misclassifying points and forces the creation of
 a more accurate model that may not generalize well.

 My understanding is that this means larger C may not generalize well
 (prone to overfitting).
 *

 2 ) if you look at the formulation of the quadratic program problem you
 will see that  C rules the error of the cutting plane  ( and overfitting
 ). Therfore for hight C you allow that the cutting plane cuts worse the
 set, so SVM needs less points to build it. a proper explanation is in
 Kristin P. Bennett and Colin Campbell, Support Vector Machines: Hype or
 Hallelujah?, SIGKDD Explorations, 2,2, 2000, 1-13.
 http://www.idi.ntnu.no/emner/it3704/lectures/papers/Bennett_2000_Support.pdf

 *Could you be more specific about this? I don't quite understand. *


 3) you might find usefull this plots:

 library(e1071)
 m1 <- matrix( c(
 0,0,0,1,1,2, 1, 2,3,2,3, 3, 0,
 1,2,3,0, 1, 2, 3,
 1,2,3,2,3,3, 0, 0,0,1, 1, 2, 4,
 4,4,4,0, 1, 2, 3,
 1,1,1,1,1,1,-1,-1,  -1,-1,-1,-1, 1 ,1,1,1,
  1, 1,-1,-1
 ), ncol = 3 )

 Y = m1[,3]
 X = m1[,1:2]

 df = data.frame( X , Y )

 par(mfcol=c(4,2))
 for( cost in c( 1e-3 ,1e-2 ,1e-1, 1e0,  1e+1, 1e+2 ,1e+3)) {
 # cost <- 1
 model.svm <- svm( Y ~ . , data = df , type = "C-classification",
 kernel = "linear", cost = cost,
  scale = FALSE )
 # print(model.svm$SV)

 plot(x=0, ylim=c(0,5), xlim=c(0,3), main = paste("cost:", cost, "#SV:",
 nrow(model.svm$SV) ))
 points(m1[m1[,3] > 0, 1], m1[m1[,3] > 0, 2], pch=3, col="green")
 points(m1[m1[,3] < 0, 1], m1[m1[,3] < 0, 2], pch=4, col="blue")
 points(model.svm$SV[,1], model.svm$SV[,2], pch=18, col="red")
 }
 *
 *

 *Thanks a lot for the code, I really appreciate it. I've run it, but I
 am not sure how should I interpret the scatter plot, although it is obvious
 that number of SVs decreases with cost increasing. *


 Regards
 Pau


 2010/7/14 Jack Luo jluo.rh...@gmail.com

 Hi,

 I have a question about the parameter C (cost) in svm function in
 e1071. I
 thought larger C is prone to overfitting than smaller C, and hence
 leads to
 more support vectors. However, using the Wisconsin breast cancer
 example on
 the link:
 

Re: [R] question!!!!

2010-08-04 Thread Michael Bedward
Hi

 I made some design matrix X(in linear regression model)

 I there any method to normalize X?

You can normalize a matrix column-wise like this:

# m is a matrix
apply(m, 2, function(x) x / max(x) )

Or normalize row-wise like this:
t(apply(m, 1, function(x) x / max(x) ))

I'm sure there are more elegant ways to do it but those work for me.
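Another route to the same column-wise result (a one-liner with sweep(),
assuming no column maximum is zero):

sweep(m, 2, apply(m, 2, max), "/")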

Michael

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question regarding S4 objects and reading in excel data with RODBC

2010-08-04 Thread Duncan Murdoch

On 04/08/2010 5:38 AM, Sander wrote:

L.S.

I am trying to get data from an excel sheet using the RODBC library, quite
like the example on page 34 of the 'Data manipulation with R' book written
by Phil Spector. It seems very straightforward, yet I get the same error
every time I try it, with different Excel sheets and trying to tweak the code
a bit. This is what I do:

library(RODBC)

sheet = 'X:\\example.xls'
con = odbcConnectExcel(sheet)

tbls = sqlTables(con)
tbls$TABLE_NAME
qry = paste("SELECT * FROM '", tbls@TABLE_NAME[1], "'", sep = "")


The error I get is:
Error in paste("SELECT * FROM '", tbls@TABLE_NAME[1], "'", sep = "") :
  trying to get slot "TABLE_NAME" from an object (class "data.frame") that
is not an S4 object


tbls is not an S4 object, so you can't use the @ notation with it.  In 
some older versions of R @ would extract an attribute (because that's 
where S4 slots were stored), but current versions don't allow this. 
Without testing, I would guess you can replace


tbls@TABLE_NAME[1]

with

attr(tbls, "TABLE_NAME")[1]

but it's possible the name is wrong.
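Since sqlTables() returns an ordinary data frame here (your tbls$TABLE_NAME
line already prints the names), a sketch of the same query built with the
column accessor instead:

qry <- paste("SELECT * FROM '", tbls$TABLE_NAME[1], "'", sep = "")
result <- sqlQuery(con, qry)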

Duncan Murdoch



Does anyone know what I'm doing wrong here? I have to admit that I don't
know much about sql.


Best regards,

Sander van Kuijk


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question!!!!

2010-08-04 Thread Greg Snow
?scale
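For example (a one-line sketch, with X standing for the design matrix from
the question):

Xs <- scale(X)   # centre each column of X and divide by its standard deviation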

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of leepama
 Sent: Tuesday, August 03, 2010 10:55 PM
 To: r-help@r-project.org
 Subject: [R] question
 
 
 I made some design matrix X(in linear regression model)
 
 I there any method to normalize X?
 --
 View this message in context: http://r.789695.n4.nabble.com/question-
 tp2312938p2312938.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question!!

2010-08-01 Thread Nikhil Kaza

?replicate
?apply
?sapply
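For example (a sketch, assuming the ex1() from the quoted message below
returns one n x p design matrix):

Xlist <- replicate(10, ex1(n = 50, p = 4, rho = 0.5), simplify = FALSE)
length(Xlist)     # 10 design matrices
dim(Xlist[[1]])   # 50 x 4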

Nikhil Kaza
Asst. Professor,
City and Regional Planning
University of North Carolina

nikhil.l...@gmail.com

On Aug 1, 2010, at 2:42 AM, leepama wrote:



hi!! I made many codes during these days..
I study Statistics.

Please, one more question!!

library(MASS)                        # for mvrnorm
ex1 <- function(n, p, rho) {
    muvec1 <- rep(0, p)              # zero mean vector
    A <- diag(p)
    for (i in 1:p) {
        for (j in 1:p) {
            A[i, j] <- rho^(abs(i - j))   # AR(1)-type correlation matrix
        }
    }
    X <- mvrnorm(n, muvec1, A)       # draw once the covariance matrix is filled
    return(X)
}

This code generates a design matrix in a linear regression model,
but it generates only one data set.

Is there any method to generate several data sets (design matrices) in R?

Please give me help..
--
View this message in context: 
http://r.789695.n4.nabble.com/question-tp2309277p2309277.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question about SVM in e1071

2010-07-28 Thread Jack Luo
Pau,

Sorry for getting back to you on this again. I am getting confused about
your interpretation of 3). It is obvious from your code that increasing C
results in a *smaller* number of SVs; this seems to contradict your
interpretation "Increasing the value of C (...) forces the creation of a
more accurate model. A more accurate model is obtained by adding more SVs."
In addition, I have learned that the number of SVs increases as C
decreases because there are many bounded SVs (whose alpha = C; remember
0 < alpha <= C); the SVs with alpha smaller than C are called free SVs.
Here is another question: is the complexity of the boundary determined by
number of total SVs (bounded SV + free SV) or free SVs only?

Thanks a bunch,

-Jack

On Thu, Jul 15, 2010 at 4:17 AM, Pau Carrio Gaspar paucar...@gmail.comwrote:

 Hi Jack,

 to 1) and 2) there are telling you the same. I recommend you to read the
 first sections of the article it is very well writen and clear. There you
 will read about duality.

 to 3) I interpret the scatter plot so: * Increasing the value of C (...)
 forces the creation of a more accurate model.* A more accurate model is
 done my adding more SV ( till we get a convex hull of the data )

 hope it helps
 Regards
 Pau

 2010/7/14 Jack Luo jluo.rh...@gmail.com

 Pau,

 Thanks a lot for your email, I found it very helpful. Please see below for
 my reply, thanks.

 -Jack

 On Wed, Jul 14, 2010 at 10:36 AM, Pau Carrio Gaspar 
 paucar...@gmail.comwrote:

  Hello Jack,

 1 ) why do you thought that  larger C is prone to overfitting than
 smaller C ?


   *There is some statement in the link http://www.dtreg.com/svm.htm

 To allow some flexibility in separating the categories, SVM models have
 a cost parameter, C, that controls the trade off between allowing
 training errors and forcing rigid margins. It   creates a soft marginthat 
 permits some misclassifications. Increasing the value of
 C increases the cost of misclassifying points and forces the creation of
 a more accurate model that may not generalize well.

 My understanding is that this means larger C may not generalize well
 (prone to overfitting).
 *

 2 ) if you look at the formulation of the quadratic program problem you
 will see that  C rules the error of the cutting plane  ( and overfitting
 ). Therfore for hight C you allow that the cutting plane cuts worse the
 set, so SVM needs less points to build it. a proper explanation is in
 Kristin P. Bennett and Colin Campbell, Support Vector Machines: Hype or
 Hallelujah?, SIGKDD Explorations, 2,2, 2000, 1-13.
 http://www.idi.ntnu.no/emner/it3704/lectures/papers/Bennett_2000_Support.pdf

 *Could you be more specific about this? I don't quite understand. *


 3) you might find usefull this plots:

 library(e1071)
 m1 <- matrix( c(
 0,0,0,1,1,2, 1, 2,3,2,3, 3, 0,
 1,2,3,0, 1, 2, 3,
 1,2,3,2,3,3, 0, 0,0,1, 1, 2, 4, 4,4,4,
 0, 1, 2, 3,
 1,1,1,1,1,1,-1,-1,  -1,-1,-1,-1, 1 ,1,1,1, 1,
 1,-1,-1
 ), ncol = 3 )

 Y = m1[,3]
 X = m1[,1:2]

 df = data.frame( X , Y )

 par(mfcol=c(4,2))
 for( cost in c( 1e-3 ,1e-2 ,1e-1, 1e0,  1e+1, 1e+2 ,1e+3)) {
 # cost <- 1
 model.svm <- svm( Y ~ . , data = df , type = "C-classification",
 kernel = "linear", cost = cost,
  scale = FALSE )
 # print(model.svm$SV)

 plot(x=0, ylim=c(0,5), xlim=c(0,3), main = paste("cost:", cost, "#SV:",
 nrow(model.svm$SV) ))
 points(m1[m1[,3] > 0, 1], m1[m1[,3] > 0, 2], pch=3, col="green")
 points(m1[m1[,3] < 0, 1], m1[m1[,3] < 0, 2], pch=4, col="blue")
 points(model.svm$SV[,1], model.svm$SV[,2], pch=18, col="red")
 }
 *
 *

 *Thanks a lot for the code, I really appreciate it. I've run it, but I am
 not sure how should I interpret the scatter plot, although it is obvious
 that number of SVs decreases with cost increasing. *


 Regards
 Pau


 2010/7/14 Jack Luo jluo.rh...@gmail.com

 Hi,

 I have a question about the parameter C (cost) in svm function in e1071.
 I
 thought larger C is prone to overfitting than smaller C, and hence leads
 to
 more support vectors. However, using the Wisconsin breast cancer example
 on
 the link:
 http://planatscher.net/svmtut/svmtut.html
 I found that the largest cost have fewest support vectors, which is
 contrary
 to what I think. please see the scripts below:
 Am I misunderstanding something here?

 Thanks a bunch,

 -Jack

  model1 - svm(databctrain, classesbctrain, kernel = linear, cost =
 0.01)
  model2 - svm(databctrain, classesbctrain, kernel = linear, cost =
 1)
  model3 - svm(databctrain, classesbctrain, kernel = linear, cost =
 100)
  model1

 Call:
 svm.default(x = databctrain, y = classesbctrain, kernel = linear,
cost = 0.01)


 Parameters:
   SVM-Type:  C-classification
  SVM-Kernel:  linear
   cost:  0.01
  gamma:  0.111

 Number of Support Vectors:  99

  model2

 Call:
 svm.default(x = databctrain, y = classesbctrain, kernel = linear,
cost = 1)


 Parameters:
   SVM-Type:  

Re: [R] Question regarding panel data diagnostic

2010-07-26 Thread Setlhare Lekgatlhamang
Dear Lexi,
Thanks a lot for your prompt answers,
The issue i'm confronted to is the following: i have a panel data N=17
T=5  (annual observations) and wanted to check for stationarity  to
avoid a spurious regression, but the question is: do I have the right
to do so??? Is it statistically correct? If not, is there any alternative
method to verify that our regression is correct?

Thanks again

Ama
==
Dear Ama,
I copy my reply to the list, in case someone needs it.

Spurious regression occurs when correlation between time series
variables results from their common trends - the variables tend to move
together over some cycle. However, it may be difficult to decipher whether
or not the variables in your model have significant trends; also trends
differ (see Enders 1995 Time Series Econometrics). So to deal with this,
you must perform formal integration tests.

If the variables have unit root (ie, non-stationary) then you cannot
model the variables in their levels. You must transform them by
appropriate differencing. Then you can model using a dynamic model or
error-correction model (ecm) (if the variables are cointegrated). Use of
ecm makes sense only if the time span of your data is long enough - it
is a long run concept.
Long enough depends on the phenomenon under study. If theory suggests
that equilibrium could occur within the time span of your data (17 years
in your case - this is long enough in most cases), then concepts of
cointegration and ecm are relevant.

Hope this helps.
Lexi

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of Setlhare Lekgatlhamang
Sent: Saturday, July 24, 2010 1:01 PM
To: amatoallah ouchen; r-help@r-project.org
Subject: Re: [R] Question regarding panel data diagnostic


Let me correct an omission in my response below. The last sentence
should read But if the data are 10 quarterly or monthly values, these
techniques are not relevant.

Cheers
Lexi

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of Setlhare Lekgatlhamang
Sent: Saturday, July 24, 2010 12:54 PM
To: amatoallah ouchen; r-help@r-project.org
Subject: Re: [R] Question regarding panel data diagnostic

My thought is this:
It depends on what you have in the panel. Are your data cross-section
data observed over ten years for, say, 3 countries (or regions within
the same country)? If so, yes you can perform integration properties
(what people usually call unit root test) and then test for
cointegration. But if the data are quarterly or monthly, these
techniques are not relevant.

Hope this helps.
Lexi

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of amatoallah ouchen
Sent: Friday, July 23, 2010 12:18 AM
To: r-help@r-project.org
Subject: [R] Question regarding panel data diagnostic

Good day R-listers,
I'm currently working on a panel data analysis (N=17, T=5), in order
to check for the spurious regression problem, i have to  test for
stationarity, but I've read somewhere that I needn't test for it as
 my T<10. What do you think? If yes, is there any other test I have
to  perform in such case (a kind of cointegration test for small T?)

Any hint would be highly appreciated.

Ama.
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



DISCLAIMER:\ Sample Disclaimer added in a VBScript.\\ .{{dropped:15}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question regarding panel data diagnostic

2010-07-26 Thread Setlhare Lekgatlhamang
Oops, I misread your email in respect of the number of years you have
for your data. Anyways, my comments still hold.

Lexi

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of Setlhare Lekgatlhamang
Sent: Monday, July 26, 2010 8:59 AM
To: amatoallah ouchen; r-help@r-project.org
Subject: Re: [R] Question regarding panel data diagnostic

Dear Lexi,
Thanks a lot for your prompt answers,
The issue i'm confronted to is the following: i have a panel data N=17
T=5  (annual observations) and wanted to check for stationarity  to
avoid a spurious regression, but the question is do i' have the right
do do so??? it's statistically correct? if no is there any alternative
method to verify if our regression is correct?

Thanks again

Ama
==
Dear Ama,
I copy my reply to the list, in case someone needs it.

Spurious regression occurs when correlation between time series
variables results from their common trends - the variables tend to move
together over some cycle. However, it may difficult to decipher whether
or not the variables in your model have significant trends; also trends
differ (see Enders 1995 Time Series Econometrics). So to deal with this,
you must perform formal integration tests.

If the variables have unit root (ie, non-stationary) then you cannot
model the variables in their levels. You must transform them by
appropriate differencing. Then you can model using a dynamic model or
error-correction model (ecm) (if the variables are cointegrated). Use of
ecm makes sense only if the time span of your data is long enough - it
is a long run concept.
Long enough depends on the phenomenon under study. If theory suggests
that equilibrium could occur within the time span of your data (17 years
in your case - this is long enough in most cases), then concepts of
cointegration and ecm are relevant.

Hope this helps.
Lexi

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of Setlhare Lekgatlhamang
Sent: Saturday, July 24, 2010 1:01 PM
To: amatoallah ouchen; r-help@r-project.org
Subject: Re: [R] Question regarding panel data diagnostic


Let me correct an omission in my response below. The last sentence
should read But if the data are 10 quarterly or monthly values, these
techniques are not relevant.

Cheers
Lexi

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of Setlhare Lekgatlhamang
Sent: Saturday, July 24, 2010 12:54 PM
To: amatoallah ouchen; r-help@r-project.org
Subject: Re: [R] Question regarding panel data diagnostic

My thought is this:
It depends on what you have in the panel. Are your data cross-section
data observed over ten years for, say, 3 countries (or regions within
the same country)? If so, yes you can perform integration properties
(what people usually call unit root test) and then test for
cointegration. But if the data are quarterly or monthly, these
techniques are not relevant.

Hope this helps.
Lexi

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of amatoallah ouchen
Sent: Friday, July 23, 2010 12:18 AM
To: r-help@r-project.org
Subject: [R] Question regarding panel data diagnostic

Good day R-listers,
I'm currently working on a panel data analysis (N=17, T=5), in order
to check for the spurious regression problem, i have to  test for
stationarity but i've read somewhere  that i needn't to test for it as
 my T10 , what do you think? if yes  is there any other test  i have
to  perform in such case (a kind of cointegration test for small T?)

Any hint would be highly appreciated.

Ama.
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



DISCLAIMER:\ Sample Disclaimer added in a VBScript.\\\ {{dropped:15}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question regarding panel data diagnostic

2010-07-24 Thread Setlhare Lekgatlhamang
My thought is this:
It depends on what you have in the panel. Are your data cross-section data 
observed over ten years for, say, 3 countries (or regions within the same 
country)? If so, yes you can perform integration properties (what people 
usually call unit root test) and then test for cointegration. But if the data 
are quarterly or monthly, these techniques are not relevant.

Hope this helps.
Lexi

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of amatoallah ouchen
Sent: Friday, July 23, 2010 12:18 AM
To: r-help@r-project.org
Subject: [R] Question regarding panel data diagnostic

Good day R-listers,
I'm currently working on a panel data analysis (N=17, T=5), in order
to check for the spurious regression problem, i have to  test for
stationarity but i've read somewhere  that i needn't to test for it as
 my T10 , what do you think? if yes  is there any other test  i have
to  perform in such case (a kind of cointegration test for small T?)

Any hint would be highly appreciated.

Ama.
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



DISCLAIMER:\ Sample Disclaimer added in a VBScript.\ ...{{dropped:3}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question regarding panel data diagnostic

2010-07-24 Thread Setlhare Lekgatlhamang
Let me correct an omission in my response below. The last sentence
should read But if the data are 10 quarterly or monthly values, these
techniques are not relevant.

Cheers
Lexi

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of Setlhare Lekgatlhamang
Sent: Saturday, July 24, 2010 12:54 PM
To: amatoallah ouchen; r-help@r-project.org
Subject: Re: [R] Question regarding panel data diagnostic

My thought is this:
It depends on what you have in the panel. Are your data cross-section
data observed over ten years for, say, 3 countries (or regions within
the same country)? If so, yes you can perform integration properties
(what people usually call unit root test) and then test for
cointegration. But if the data are quarterly or monthly, these
techniques are not relevant.

Hope this helps.
Lexi

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of amatoallah ouchen
Sent: Friday, July 23, 2010 12:18 AM
To: r-help@r-project.org
Subject: [R] Question regarding panel data diagnostic

Good day R-listers,
I'm currently working on a panel data analysis (N=17, T=5), in order
to check for the spurious regression problem, i have to  test for
stationarity but i've read somewhere  that i needn't to test for it as
 my T10 , what do you think? if yes  is there any other test  i have
to  perform in such case (a kind of cointegration test for small T?)

Any hint would be highly appreciated.

Ama.
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



DISCLAIMER:\ Sample Disclaimer added in a VBScript.\ ..{{dropped:14}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question about a perceived irregularity in R syntax

2010-07-23 Thread Duncan Murdoch

Nordlund, Dan (DSHS/RDA) wrote:

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
project.org] On Behalf Of Peter Dalgaard
Sent: Thursday, July 22, 2010 3:13 PM
To: Pat Schmitz
Cc: r-help@r-project.org
Subject: Re: [R] Question about a perceived irregularity in R syntax

Pat Schmitz wrote:


Both vector query's can select the values from the data.frame as
  

written,


however in the first form assigning a value to said selected numbers
  

fails.


 Can you explain the reason this fails?

dat <- data.frame(index = 1:10, Value = c(1:4, NA, 6, NA, 8:10))

dat$Value[dat$Value == NA] <- 1   # Why does this fail to work,
dat$Value[dat$Value %in% NA] <- 1 # while this does work?


# Particularly when str() results in an equivalent class
dat <- data.frame(index = 1:10, Value = c(1:4, NA, 6, NA, 8:10))
str(dat$Value[dat$Value %in% NA])
str(dat$Value[dat$Value == NA])
  

1. NA and "NA" are very different things
2. check out is.na() and its help page





I also would have suggested is.na to do the replacement.  What surprised me was that

dat$Value[dat$Value %in% NA] <- 1

actually worked.  I guess I always assumed that if

 NA == NA
[1] NA

then an attempt to compare NA to elements in a vector would also return NA, but
not so.

 NA %in% c(1,NA,3)
[1] TRUE

Learned something new today,


I suspect that's not intentional, though I'm not sure it should be 
fixed.  According to the usual convention the result should be a logical NA.


Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question about a perceived irregularity in R syntax

2010-07-23 Thread Duncan Murdoch

On 23/07/2010 7:14 AM, Duncan Murdoch wrote:

Nordlund, Dan (DSHS/RDA) wrote:
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of Peter Dalgaard
 Sent: Thursday, July 22, 2010 3:13 PM
 To: Pat Schmitz
 Cc: r-help@r-project.org
 Subject: Re: [R] Question about a perceived irregularity in R syntax

 Pat Schmitz wrote:
  Both vector queries can select the values from the data.frame as written,
  however in the first form assigning a value to the selected entries fails.
  Can you explain the reason this fails?

  dat <- data.frame(index = 1:10, Value = c(1:4, NA, 6, NA, 8:10))

  dat$Value[dat$Value == NA] <- 1 # Why does this fail to work,
  dat$Value[dat$Value %in% NA] <- 1 # while this does work?

  # Particularly when str() results in an equivalent class
  dat <- data.frame(index = 1:10, Value = c(1:4, NA, 6, NA, 8:10))
  str(dat$Value[dat$Value %in% NA])
  str(dat$Value[dat$Value == NA])

 1. NA and NA are very different things

 2. check out is.na() and its help page


 I also would have suggested is.na to do the replacement.  What surprised me was that

 dat$Value[dat$Value %in% NA] <- 1

 actually worked.  I guess I always assumed that if

  NA == NA
  [1] NA

 then an attempt to compare NA to elements in a vector would also return NA,
 but not so.

  NA %in% c(1,NA,3)
  [1] TRUE


 Learned something new today,

I suspect that's not intentional, though I'm not sure it should be 
fixed.  According to the usual convention the result should be a logical NA.


Oops, not true. The behaviour is clearly documented in ?match:

Exactly what matches what is to some extent a matter of
definition. For all types, ‘NA’ matches ‘NA’ and no other
value. For real and complex values, ‘NaN’ values are regarded
as matching any other ‘NaN’ value, but not matching ‘NA’.

Thanks to Brian Ripley (the author of that paragraph) for pointing this 
out to me. Not sure how I missed it on my first reading, but the fact 
that it preceded my morning coffee might be a contributing factor.
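
A couple of quick illustrations of that paragraph (not from the original message):

match(NA, c(1, NA, 3))   # 2: NA matches NA
NA %in% c(1, NA, 3)      # TRUE, which is why the %in% assignment works
NaN %in% c(NA, NaN)      # TRUE: NaN matches NaN ...
NA %in% NaN              # FALSE: ... but NA does not match NaN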


Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question about a perceived irregularity in R syntax

2010-07-22 Thread Peter Dalgaard
Pat Schmitz wrote:
 Both vector queries can select the values from the data.frame as written,
 however in the first form assigning a value to the selected entries fails.
  Can you explain the reason this fails?
 
 dat <- data.frame(index = 1:10, Value = c(1:4, NA, 6, NA, 8:10))
 
 dat$Value[dat$Value == NA] <- 1 # Why does this fail to work,
 dat$Value[dat$Value %in% NA] <- 1 # while this does work?
 
 
 # Particularly when str() results in an equivalent class
 dat <- data.frame(index = 1:10, Value = c(1:4, NA, 6, NA, 8:10))
 str(dat$Value[dat$Value %in% NA])
 str(dat$Value[dat$Value == NA])

1. NA and NA are very different things
2. check out is.na() and its help page
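
For completeness, the is.na() route looks like this (a sketch using the example
data from the question):

dat <- data.frame(index = 1:10, Value = c(1:4, NA, 6, NA, 8:10))
dat$Value[is.na(dat$Value)] <- 1   # the idiomatic NA replacement
dat$Value
#  [1]  1  2  3  4  1  6  1  8  9 10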



-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question about a perceived irregularity in R syntax

2010-07-22 Thread Nordlund, Dan (DSHS/RDA)
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of Peter Dalgaard
 Sent: Thursday, July 22, 2010 3:13 PM
 To: Pat Schmitz
 Cc: r-help@r-project.org
 Subject: Re: [R] Question about a perceived irregularity in R syntax
 
 Pat Schmitz wrote:
  Both vector queries can select the values from the data.frame as written,
  however in the first form assigning a value to the selected entries fails.
   Can you explain the reason this fails?
 
  dat <- data.frame(index = 1:10, Value = c(1:4, NA, 6, NA, 8:10))
 
  dat$Value[dat$Value == NA] <- 1 # Why does this fail to work,
  dat$Value[dat$Value %in% NA] <- 1 # while this does work?
 
 
  # Particularly when str() results in an equivalent class
  dat <- data.frame(index = 1:10, Value = c(1:4, NA, 6, NA, 8:10))
  str(dat$Value[dat$Value %in% NA])
  str(dat$Value[dat$Value == NA])
 
 1. NA and NA are very different things
 2. check out is.na() and its help page
 
 

I also would have suggested is.na to do the replacement.  What surprised me was 
that 

dat$Value[dat$Value %in% NA] <- 1

actually worked.  I guess I always assumed that if 

 NA == NA
[1] NA

then an attempt to compare NA to elements in a vector would also return NA, but 
not so.

 NA %in% c(1,NA,3)
[1] TRUE


Learned something new today,

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question about allocMatrix error message

2010-07-21 Thread Phil Spector

Moohwan -
   It appears that you are trying to calculate a 100,000
by 100,000 matrix when all you want are the diagonal
elements.  Here's one approach that might work:


s = t(dev)%*%dev/(nr-1)
sinv = solve(s)
part = sinv%*%t(dev)
t2 = dev[,1]*part[1,] + dev[,2]*part[2,]
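
For what it's worth, the same diagonal can also be computed in one step, which
generalizes beyond two columns; a small self-contained check on simulated data
standing in for the 100,000 x 2 matrix:

set.seed(1)
dev  <- matrix(rnorm(10 * 2), ncol = 2)   # small stand-in for the real 'dev'
nr   <- nrow(dev)
s    <- t(dev) %*% dev / (nr - 1)
sinv <- solve(s)
part <- sinv %*% t(dev)
t2a  <- dev[, 1] * part[1, ] + dev[, 2] * part[2, ]   # the approach above
t2b  <- rowSums((dev %*% sinv) * dev)                 # same values, any number of columns
all.equal(t2a, t2b)                                   # TRUE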

- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu


On Wed, 21 Jul 2010, Moohwan Kim wrote:


Dear R family,

I faced a technical problem in r coding.
#s=t(dev)%*%dev/(nr-1) # dev (100,000 by 2) stands for
deviation from the mean
#sinv=solve(s)
#t2=diag(dev%*%sinv%*%t(dev))

I got an error message at t2 statement:
Error in diag(dev %*% si %*% t(dev)) : allocMatrix: too many elements specified

Please let me know if there is a way to overcome this problem.

best
moohwan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question from day2 beginner

2010-07-20 Thread maxcheese

Thanks for the explanation!
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Question-from-day2-beginner-tp2293221p2294990.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question about sign of probit estimates

2010-07-20 Thread peter dalgaard

On Jul 20, 2010, at 2:39 AM, Nita Umashankar wrote:

 Hello,
 I am getting some results from my Probit estimation in R that are in the
 opposite direction of what I hypothesized.  In SAS, the default is the
 probability that y=0 (instead of 1), so one needs to type the keyword
 descending to get P(y=1).  Is the same true for R?  Is the default set to
 P(0)?

Why not just try it?

 y <- c(1,0,0)
 glm(y~1,binomial(probit))

Call:  glm(formula = y ~ 1, family = binomial(probit)) 

Coefficients:
(Intercept)  
-0.4307  

Intercept is less than zero so it's probit(P(1))==qnorm(1/3).
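
A quick numeric check of that reading (illustrative):

pnorm(-0.4307)   # ~ 0.3333, i.e. the fitted P(y = 1), not P(y = 0)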

(BTW, I don't think what SAS does is a well-defined concept. There are three 
procedures that will do probit regression - fortunately PROC CATMOD will not... 
- and they each have more than one way of specifying the response variable.) 


 Thank you in advance.
 
 Nita Umashankar
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question from day2 beginner

2010-07-19 Thread Patrick Burns

Try:

ar(logrvar, aic=TRUE)$resid

The problem you are running into is that
'resid' is a generic function: there is no
'resid' method for the objects that 'ar'
returns, and the default method does not find
the component it expects (the residuals are
stored in the 'resid' component, not 'residuals').

Where you used the 'summary' command, you
might instead have used 'str'.
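
A self-contained illustration, using the built-in 'lh' series as a stand-in for
'logrvar' (which isn't shown):

fit <- ar(lh, aic = TRUE)   # 'lh' ships with R, standing in for logrvar
resid(fit)                  # NULL: the default method looks for fit$residuals
head(fit$resid)             # the residuals live in the 'resid' component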

The 'hints' document mentioned in my signature
might help you with such questions.

Welcome to the R world.

On 18/07/2010 18:16, maxcheese wrote:


Hello, I just started learning R and have a very basic question:

When I fit an ar model and try to extract the residuals it returns NULL.  Could someone
explain what's going on here?  Does that mean the model is null?  But it
seems the residuals have length equal to the number of observations of the original series,
according to the model summary.  I am learning by myself and have no one to ask
here, so please allow me to ask this very basic question...  Thank you very
much.


summary(ar(logrvar, aic=TRUE))


  Length Class  Mode
order1  -none- numeric
ar  40  -none- numeric
var.pred 1  -none- numeric
x.mean   1  -none- numeric
aic 44  -none- numeric
n.used   1  -none- numeric
order.max1  -none- numeric
partialacf  43  -none- numeric
resid20731  -none- numeric
method   1  -none- character
series   1  -none- character
frequency1  -none- numeric
call 3  -none- call
asy.var.coef  1600  -none- numeric


resid(ar(logrvar, aic=TRUE))

NULL


--
Patrick Burns
pbu...@pburns.seanet.com
http://www.burns-stat.com
(home of 'Some hints for the R beginner'
and 'The R Inferno')

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question about KLdiv and large datasets

2010-07-18 Thread Ralf B
Is the 'eps' argument part of KLdiv (I was not able to find it in the
help pages) or part of a general environment (such as the graphics
parameters in 'par')? I am asking so that I can read about what it
actually does, to resolve the question you already raised about its
reliability...

Ralf

On Fri, Jul 16, 2010 at 10:41 AM, Peter Ehlers ehl...@ucalgary.ca wrote:
 On 2010-07-16 7:56, Ralf B wrote:

 Hi all,

 when running KL on a small data set, everything is fine:

 require(flexmix)
 n <- 20
 a <- rnorm(n)
 b <- rnorm(n)
 mydata <- cbind(a,b)
 KLdiv(mydata)

 however, when this dataset increases

 require(flexmix)
 n <- 1000
 a <- rnorm(n)
 b <- rnorm(n)
 mydata <- cbind(a,b)
 KLdiv(mydata)


 KL seems to be not defined. Can somebody explain what is going on?

 Thanks,
 Ralf

 Ralf,

 You can adjust the 'eps=' argument. But I don't know
 what this will do to the reliability of the results.

 KLdiv(mydata, eps = 1e-7)

  -Peter Ehlers


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question about KLdiv and large datasets

2010-07-18 Thread Peter Ehlers

I just answered this but realize that I did so off-list.
So, for completeness, here's what I said:

I think I see the problem. From ?KLdiv, you're getting the
modeltools help page. What you need is the flexmix help
page for KLdiv. Just get to the flexmix index page
(you can do ?flexmix and then click on the 'Index' link at
the bottom). Then select KLdiv-method. Here you will find
the eps argument described. Its default value is eps=10^-4
and the description says:

eps Probabilities below this threshold are replaced by this 
threshold for numerical stability.


It's this comment that makes me doubt the reliability for
very small eps values.
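
For reference, a minimal side-by-side (illustrative only; the values depend on
the simulated data):

library(flexmix)
set.seed(42)
mydata <- cbind(a = rnorm(1000), b = rnorm(1000))
KLdiv(mydata)               # default eps = 1e-4
KLdiv(mydata, eps = 1e-7)   # smaller floor on the estimated densities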

Cheers,
Peter


On 2010-07-18 20:50, Ralf B wrote:

Is the 'eps' argument part of KLdiv (was not able to find that in the
help pages) or part of a general environment (such as the graphics
parameters 'par' ) ? I am asking so that I can read about it what it
actually does to resolve the question you already raised about its
reliability...

Ralf

On Fri, Jul 16, 2010 at 10:41 AM, Peter Ehlersehl...@ucalgary.ca  wrote:

On 2010-07-16 7:56, Ralf B wrote:


Hi all,

when running KL on a small data set, everything is fine:

require(flexmix)
n <- 20
a <- rnorm(n)
b <- rnorm(n)
mydata <- cbind(a,b)
KLdiv(mydata)

however, when this dataset increases

require(flexmix)
n <- 1000
a <- rnorm(n)
b <- rnorm(n)
mydata <- cbind(a,b)
KLdiv(mydata)


KL seems to be not defined. Can somebody explain what is going on?

Thanks,
Ralf


Ralf,

You can adjust the 'eps=' argument. But I don't know
what this will do to the reliability of the results.

KLdiv(mydata, eps = 1e-7)

  -Peter Ehlers



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question about KLdiv and large datasets

2010-07-16 Thread Peter Ehlers

On 2010-07-16 7:56, Ralf B wrote:

Hi all,

when running KL on a small data set, everything is fine:

require(flexmix)
n <- 20
a <- rnorm(n)
b <- rnorm(n)
mydata <- cbind(a,b)
KLdiv(mydata)

however, when this dataset increases

require(flexmix)
n <- 1000
a <- rnorm(n)
b <- rnorm(n)
mydata <- cbind(a,b)
KLdiv(mydata)


KL seems to be not defined. Can somebody explain what is going on?

Thanks,
Ralf


Ralf,

You can adjust the 'eps=' argument. But I don't know
what this will do to the reliability of the results.

KLdiv(mydata, eps = 1e-7)

  -Peter Ehlers

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question about string handling....

2010-07-16 Thread karena

hey, guys, all these methods work perfectly. thank you!!
-- 
View this message in context: 
http://r.789695.n4.nabble.com/question-about-string-handling-tp2289178p2291497.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question regarding varImpPlot results vs. model$importance data on package RandomForest

2010-07-15 Thread Allan Engelhardt
Use the source, Luke. varImpPlot calls 
randomForest:::importance.randomForest (yeah, that is three colons) and 
reading about the scale= parameter in help(importance, 
package=randomForest) should enlighten you.  For the impatient, try


varImpPlot(mtcars.rf, scale=FALSE)
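
The complementary check from the table side (illustrative, using the mtcars.rf
fit from the example quoted below): the scaled importances match the default
plot, while the raw $importance component corresponds to scale = FALSE.

importance(mtcars.rf, scale = TRUE)   # matches the default varImpPlot()
mtcars.rf$importance                  # the raw values shown with scale = FALSE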


Hope this helps a little.

Allan

On 14/07/10 00:46, Mike Williamson wrote:

Hi everyone,

 I have another Random Forest package question:

- my (presumably incorrect) understanding of the varImpPlot is that it
should plot the % increase in MSE and IncNodePurity exactly as can be
found from the importance section of the model results.
   - However, the plot does not, in fact, match the importance section
   of the random forest model.

 E.g., if you use the example given in the ?randomForest, you will see
the plot showing the highest few %IncMSE values around 17 or 18%.  But if
you look at the $importance, it is 9.7, 9.4, 7.7, and 7.3.  Perhaps more
importantly, for the plot, it will show wt is highest %MSE, then disp,
then cyl, then hp; whereas the $importance will show wt, then disp,
then hp, then cyl.  And the ratios look somewhat different, too.
 Here is the code for that example:

library(randomForest)
set.seed(4543)
data(mtcars)
mtcars.rf <- randomForest(mpg ~ ., data=mtcars, ntree=1000,
keep.forest=FALSE,
importance=TRUE)
varImpPlot(mtcars.rf)

 I am using version 2.11.1 of 'R' and version 4.5-35 of Random Forest.

 I don't really care or need for the varImpPlot to work just right.  But
I am not sure which is accurate:  the varImpPlot or the $importance
section.  Which should I trust more, especially when they disagree
appreciably?

  Thanks!
  Mike



Telescopes and bathyscaphes and sonar probes of Scottish lakes,
Tacoma Narrows bridge collapse explained with abstract phase-space maps,
Some x-ray slides, a music score, Minard's Napoleanic war:
The most exciting frontier is charting what's already here.
   -- xkcd

--
Help protect Wikipedia. Donate now:
http://wikimediafoundation.org/wiki/Support_Wikipedia/en

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question about SVM in e1071

2010-07-15 Thread Pau Carrio Gaspar
Hi Jack,

Re 1) and 2): they are telling you the same thing. I recommend you read the
first sections of the article; it is very well written and clear. There you
will read about duality.

Re 3): I interpret the scatter plots like this: "Increasing the value of C (...)
forces the creation of a more accurate model." A more accurate model is obtained
by adding more SVs (until we get a convex hull of the data).

hope it helps
Regards
Pau
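
As an editorial aside (not part of the original exchange), cost can also be
chosen by cross-validation instead of reasoning from support-vector counts; a
sketch using e1071's tune.svm, whose formula interface follows the package
examples:

library(e1071)
data(iris)
set.seed(1)
## 10-fold cross-validation over a grid of cost values (linear kernel)
tuned <- tune.svm(Species ~ ., data = iris, kernel = "linear",
                  cost = 10^(-2:2))
summary(tuned)
tuned$best.parameters   # the cost with the lowest CV error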

2010/7/14 Jack Luo jluo.rh...@gmail.com

 Pau,

 Thanks a lot for your email, I found it very helpful. Please see below for
 my reply, thanks.

 -Jack

 On Wed, Jul 14, 2010 at 10:36 AM, Pau Carrio Gaspar 
 paucar...@gmail.comwrote:

  Hello Jack,

 1 ) why do you thought that  larger C is prone to overfitting than
 smaller C ?


   *There is some statement in the link http://www.dtreg.com/svm.htm

 To allow some flexibility in separating the categories, SVM models have a
 cost parameter, C, that controls the trade off between allowing training
 errors and forcing rigid margins. It   creates a soft margin that permits
 some misclassifications. Increasing the value of C increases the cost of
 misclassifying points and forces the creation of a more accurate model that
 may not generalize well.

 My understanding is that this means larger C may not generalize well (prone
 to overfitting).
 *

 2 ) if you look at the formulation of the quadratic program problem you
 will see that  C rules the error of the cutting plane  ( and overfitting
 ). Therfore for hight C you allow that the cutting plane cuts worse the
 set, so SVM needs less points to build it. a proper explanation is in
 Kristin P. Bennett and Colin Campbell, Support Vector Machines: Hype or
 Hallelujah?, SIGKDD Explorations, 2,2, 2000, 1-13.
 http://www.idi.ntnu.no/emner/it3704/lectures/papers/Bennett_2000_Support.pdf

 *Could you be more specific about this? I don't quite understand. *


 3) you might find usefull this plots:

 library(e1071)
 m1 <- matrix( c(
 0,0,0,1,1,2, 1, 2,3,2,3, 3, 0,
 1,2,3,0, 1, 2, 3,
 1,2,3,2,3,3, 0, 0,0,1, 1, 2, 4, 4,4,4,
 0, 1, 2, 3,
 1,1,1,1,1,1,-1,-1,  -1,-1,-1,-1, 1 ,1,1,1, 1,
 1,-1,-1
 ), ncol = 3 )

 Y = m1[,3]
 X = m1[,1:2]

 df = data.frame( X , Y )

 par(mfcol=c(4,2))
 for( cost in c( 1e-3 ,1e-2 ,1e-1, 1e0,  1e+1, 1e+2 ,1e+3)) {
 #cost <- 1
 model.svm <- svm( Y ~ . , data = df ,  type = "C-classification" , kernel
 = "linear", cost = cost,
  scale =FALSE )
 #print(model.svm$SV)

 plot(x=0,ylim=c(0,5), xlim=c(0,3),main= paste( "cost: ",cost, " #SV: ",
 nrow(model.svm$SV) ))
 points(m1[m1[,3]>0,1], m1[m1[,3]>0,2], pch=3, col="green")
 points(m1[m1[,3]<0,1], m1[m1[,3]<0,2], pch=4, col="blue")
 points(model.svm$SV[,1],model.svm$SV[,2], pch=18 , col = "red")
 }
 *
 *

 *Thanks a lot for the code, I really appreciate it. I've run it, but I am
 not sure how should I interpret the scatter plot, although it is obvious
 that number of SVs decreases with cost increasing. *


 Regards
 Pau


 2010/7/14 Jack Luo jluo.rh...@gmail.com

 Hi,

 I have a question about the parameter C (cost) in svm function in e1071.
 I
 thought larger C is prone to overfitting than smaller C, and hence leads
 to
 more support vectors. However, using the Wisconsin breast cancer example
 on
 the link:
 http://planatscher.net/svmtut/svmtut.html
 I found that the largest cost have fewest support vectors, which is
 contrary
 to what I think. please see the scripts below:
 Am I misunderstanding something here?

 Thanks a bunch,

 -Jack

  model1 <- svm(databctrain, classesbctrain, kernel = "linear", cost =
 0.01)
  model2 <- svm(databctrain, classesbctrain, kernel = "linear", cost = 1)
  model3 <- svm(databctrain, classesbctrain, kernel = "linear", cost =
 100)
  model1

 Call:
 svm.default(x = databctrain, y = classesbctrain, kernel = linear,
cost = 0.01)


 Parameters:
   SVM-Type:  C-classification
  SVM-Kernel:  linear
   cost:  0.01
  gamma:  0.111

 Number of Support Vectors:  99

  model2

 Call:
 svm.default(x = databctrain, y = classesbctrain, kernel = linear,
cost = 1)


 Parameters:
   SVM-Type:  C-classification
  SVM-Kernel:  linear
   cost:  1
  gamma:  0.111

 Number of Support Vectors:  46

  model3

 Call:
 svm.default(x = databctrain, y = classesbctrain, kernel = linear,
cost = 100)


 Parameters:
   SVM-Type:  C-classification
  SVM-Kernel:  linear
   cost:  100
  gamma:  0.111

 Number of Support Vectors:  44

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list

Re: [R] question about SVM in e1071

2010-07-14 Thread Pau Carrio Gaspar
 Hello Jack,

1) Why did you think that larger C is more prone to overfitting than smaller
C?

2) If you look at the formulation of the quadratic programming problem you will
see that C controls the error of the cutting plane (and overfitting).
Therefore for high C you allow the cutting plane to cut the set worse,
so the SVM needs fewer points to build it. A proper explanation is in Kristin P.
Bennett and Colin Campbell, "Support Vector Machines: Hype or Hallelujah?",
SIGKDD Explorations, 2, 2, 2000, 1-13.
http://www.idi.ntnu.no/emner/it3704/lectures/papers/Bennett_2000_Support.pdf

3) You might find these plots useful:

library(e1071)
m1 <- matrix( c(
0,0,0,1,1,2, 1, 2,3,2,3, 3, 0, 1,2,3,
0, 1, 2, 3,
1,2,3,2,3,3, 0, 0,0,1, 1, 2, 4, 4,4,4,0,
1, 2, 3,
1,1,1,1,1,1,-1,-1,  -1,-1,-1,-1, 1 ,1,1,1, 1,
1,-1,-1
), ncol = 3 )

Y = m1[,3]
X = m1[,1:2]

df = data.frame( X , Y )

par(mfcol=c(4,2))
for( cost in c( 1e-3 ,1e-2 ,1e-1, 1e0,  1e+1, 1e+2 ,1e+3)) {
#cost <- 1
model.svm <- svm( Y ~ . , data = df ,  type = "C-classification" , kernel =
"linear", cost = cost,
 scale =FALSE )
#print(model.svm$SV)

plot(x=0,ylim=c(0,5), xlim=c(0,3),main= paste( "cost: ",cost, " #SV: ",
nrow(model.svm$SV) ))
points(m1[m1[,3]>0,1], m1[m1[,3]>0,2], pch=3, col="green")
points(m1[m1[,3]<0,1], m1[m1[,3]<0,2], pch=4, col="blue")
points(model.svm$SV[,1],model.svm$SV[,2], pch=18 , col = "red")
}


Regards
Pau


2010/7/14 Jack Luo jluo.rh...@gmail.com

 Hi,

 I have a question about the parameter C (cost) in the svm function in e1071. I
 thought larger C is more prone to overfitting than smaller C, and hence leads to
 more support vectors. However, using the Wisconsin breast cancer example on
 the link:
 http://planatscher.net/svmtut/svmtut.html
 I found that the largest cost has the fewest support vectors, which is
 contrary to what I thought. Please see the scripts below:
 Am I misunderstanding something here?

 Thanks a bunch,

 -Jack

  model1 <- svm(databctrain, classesbctrain, kernel = "linear", cost =
 0.01)
  model2 <- svm(databctrain, classesbctrain, kernel = "linear", cost = 1)
  model3 <- svm(databctrain, classesbctrain, kernel = "linear", cost = 100)
  model1

 Call:
 svm.default(x = databctrain, y = classesbctrain, kernel = linear,
cost = 0.01)


 Parameters:
   SVM-Type:  C-classification
  SVM-Kernel:  linear
   cost:  0.01
  gamma:  0.111

 Number of Support Vectors:  99

  model2

 Call:
 svm.default(x = databctrain, y = classesbctrain, kernel = linear,
cost = 1)


 Parameters:
   SVM-Type:  C-classification
  SVM-Kernel:  linear
   cost:  1
  gamma:  0.111

 Number of Support Vectors:  46

  model3

 Call:
 svm.default(x = databctrain, y = classesbctrain, kernel = linear,
cost = 100)


 Parameters:
   SVM-Type:  C-classification
  SVM-Kernel:  linear
   cost:  100
  gamma:  0.111

 Number of Support Vectors:  44

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question about SVM in e1071

2010-07-14 Thread Jack Luo
Pau,

Thanks a lot for your email, I found it very helpful. Please see below for
my reply, thanks.

-Jack

On Wed, Jul 14, 2010 at 10:36 AM, Pau Carrio Gaspar paucar...@gmail.comwrote:

  Hello Jack,

 1 ) why do you thought that  larger C is prone to overfitting than smaller
 C ?


  *There is some statement in the link http://www.dtreg.com/svm.htm

To allow some flexibility in separating the categories, SVM models have a
cost parameter, C, that controls the trade off between allowing training
errors and forcing rigid margins. It   creates a soft margin that permits
some misclassifications. Increasing the value of C increases the cost of
misclassifying points and forces the creation of a more accurate model that
may not generalize well.

My understanding is that this means larger C may not generalize well (prone
to overfitting).
*
2 ) if you look at the formulation of the quadratic program problem you will
see that  C rules the error of the cutting plane  ( and overfitting ).
Therfore for hight C you allow that the cutting plane cuts worse the set,
so SVM needs less points to build it. a proper explanation is in Kristin P.
Bennett and Colin Campbell, Support Vector Machines: Hype or Hallelujah?,
SIGKDD Explorations, 2,2, 2000, 1-13.
http://www.idi.ntnu.no/emner/it3704/lectures/papers/Bennett_2000_Support.pdf

*Could you be more specific about this? I don't quite understand. *


 3) you might find usefull this plots:

 library(e1071)
 m1 <- matrix( c(
 0,0,0,1,1,2, 1, 2,3,2,3, 3, 0,
 1,2,3,0, 1, 2, 3,
 1,2,3,2,3,3, 0, 0,0,1, 1, 2, 4, 4,4,4,
 0, 1, 2, 3,
 1,1,1,1,1,1,-1,-1,  -1,-1,-1,-1, 1 ,1,1,1, 1,
 1,-1,-1
 ), ncol = 3 )

 Y = m1[,3]
 X = m1[,1:2]

 df = data.frame( X , Y )

 par(mfcol=c(4,2))
 for( cost in c( 1e-3 ,1e-2 ,1e-1, 1e0,  1e+1, 1e+2 ,1e+3)) {
 #cost <- 1
 model.svm <- svm( Y ~ . , data = df ,  type = "C-classification" , kernel =
 "linear", cost = cost,
  scale =FALSE )
 #print(model.svm$SV)

 plot(x=0,ylim=c(0,5), xlim=c(0,3),main= paste( "cost: ",cost, " #SV: ",
 nrow(model.svm$SV) ))
 points(m1[m1[,3]>0,1], m1[m1[,3]>0,2], pch=3, col="green")
 points(m1[m1[,3]<0,1], m1[m1[,3]<0,2], pch=4, col="blue")
 points(model.svm$SV[,1],model.svm$SV[,2], pch=18 , col = "red")
 }
 *
 *

*Thanks a lot for the code, I really appreciate it. I've run it, but I am
not sure how should I interpret the scatter plot, although it is obvious
that number of SVs decreases with cost increasing. *


 Regards
 Pau


 2010/7/14 Jack Luo jluo.rh...@gmail.com

 Hi,

 I have a question about the parameter C (cost) in svm function in e1071. I
 thought larger C is prone to overfitting than smaller C, and hence leads
 to
 more support vectors. However, using the Wisconsin breast cancer example
 on
 the link:
 http://planatscher.net/svmtut/svmtut.html
 I found that the largest cost have fewest support vectors, which is
 contrary
 to what I think. please see the scripts below:
 Am I misunderstanding something here?

 Thanks a bunch,

 -Jack

  model1 <- svm(databctrain, classesbctrain, kernel = "linear", cost =
 0.01)
  model2 <- svm(databctrain, classesbctrain, kernel = "linear", cost = 1)
  model3 <- svm(databctrain, classesbctrain, kernel = "linear", cost =
 100)
  model1

 Call:
 svm.default(x = databctrain, y = classesbctrain, kernel = linear,
cost = 0.01)


 Parameters:
   SVM-Type:  C-classification
  SVM-Kernel:  linear
   cost:  0.01
  gamma:  0.111

 Number of Support Vectors:  99

  model2

 Call:
 svm.default(x = databctrain, y = classesbctrain, kernel = linear,
cost = 1)


 Parameters:
   SVM-Type:  C-classification
  SVM-Kernel:  linear
   cost:  1
  gamma:  0.111

 Number of Support Vectors:  46

  model3

 Call:
 svm.default(x = databctrain, y = classesbctrain, kernel = linear,
cost = 100)


 Parameters:
   SVM-Type:  C-classification
  SVM-Kernel:  linear
   cost:  100
  gamma:  0.111

 Number of Support Vectors:  44

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question about string handling....

2010-07-14 Thread Gabor Grothendieck
On Wed, Jul 14, 2010 at 2:21 PM, karena dr.jz...@gmail.com wrote:

 Hi,

 I have a data.frame as following:
 var1         var2
 1           ab_c_(ok)
 2           okf789(db)_c
 3           jojfiod(90).gt
 4           ij_(78)__op
 5           (iojfodjfo)_ab

 what I want is to create a new variable called var3. the value of var3 is
 the content in the Parentheses. so var3 would be:
 var3
 ok
 db
 90
 78
 iojfodjfo


Here are several alternatives.  The gsub solution matches everything
up to the ( as well as everything after the ) and replaces each with
nothing.  The strsplit solution splits each value into three fields
(everything before the (, everything within the (), and everything
after the )) and then picks off the second.  The strapply solution
matches everything from ( to ) and returns everything between them.
The code below works whether DF$var2 is factor or character, but if you
know it is character you can drop the as.character in #2 and #3.

# 1
gsub(".*[(]|[)].*", "", DF$var2)

# 2
sapply(strsplit(as.character(DF$var2), "[()]"), "[", 2)

# 3
library(gsubfn)
strapply(as.character(DF$var2), "[(](.*)[)]", simplify = TRUE)
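
A quick check of the first alternative on the example data (illustrative):

DF <- data.frame(var1 = 1:5,
                 var2 = c("ab_c_(ok)", "okf789(db)_c", "jojfiod(90).gt",
                          "ij_(78)__op", "(iojfodjfo)_ab"))
gsub(".*[(]|[)].*", "", DF$var2)
# [1] "ok"        "db"        "90"        "78"        "iojfodjfo"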

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question about string handling....

2010-07-14 Thread Wu Gong

Try this:

text <- 'var1 var2
1 ab_c_(ok)
2 okf789(db)_c
3 jojfiod(90).gt
4 ij_(78)__op
5 (iojfodjfo)_ab'

df <- read.table(textConnection(text), head=T, sep=" ", quote="")
df$var3 <- gsub("(.*\\()(.*)(\\).*)", "\\2", df$var2)


-
A R learner.
-- 
View this message in context: 
http://r.789695.n4.nabble.com/question-about-string-handling-tp2289178p2289327.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question about string handling....

2010-07-14 Thread Henrique Dallazuanna
Another option could be:

df$var3 <- gsub(".*\\((.*)\\).*", "\\1", df$var2)

On Wed, Jul 14, 2010 at 3:21 PM, karena dr.jz...@gmail.com wrote:


 Hi,

 I have a data.frame as following:
 var1 var2
 1   ab_c_(ok)
 2   okf789(db)_c
 3   jojfiod(90).gt
 4   ij_(78)__op
 5   (iojfodjfo)_ab

 what I want is to create a new variable called var3. the value of var3 is
 the content in the Parentheses. so var3 would be:
 var3
 ok
 db
 90
 78
 iojfodjfo

 how to do this?

 thanks,

 karena
 --
 View this message in context:
 http://r.789695.n4.nabble.com/question-about-string-handling-tp2289178p2289178.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question about food sampling analysis

2010-07-14 Thread kMan
Dear Sarah,

[snip...]
I know that samples within each facility cannot be treated as independent,
so I need an approach that accounts for (1) clustering within facilities
and

You could just use lm() and some planning. The data from within a specific
facility can be fit with a model to generate parameters that are compared
between facilities. Not too practical though - assuming the 57 production
facilities each have their own analytical lab, you'll have 57 different fits
to get your parameters from for use in your between-facility test. Questions about
dependent data are fairly common, so it should be relatively straightforward
to get a solution and/or an idea for a suitable package from the
archives.

(2) the different number of samples taken at each facility.
It's a waste of time to worry about that. You'll be comparing aggregate
values between groups, and you'll have too few data-points within a group to
detect within effects... 

[snip...]

Sincerely,
KeithC.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question about food sampling analysis

2010-07-13 Thread Richard Weeks
Hi Sarah,

We regularly undertake work in the food sector and have developed many
custom built solutions. To be more specific, the statistics we employ is
that of sensory analysis and we regularly use the sensominer package in
R.

Regards,

Richard Weeks

Mangosolutions
data analysis that delivers


Mail: rwe...@mango-solutions.com
T: +44 (0)1249 767700
F: +44 (0)1249 767707
M: +44 (0)7500 040365

Unit 2 Greenways Business Park
Bellinger Close
Chippenham
Wilts
SN15 1BN
UK
 

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of Sarah Henderson
Sent: 12 July 2010 22:42
To: R List
Subject: [R] Question about food sampling analysis

Greetings to all, and my apologies for a question that is mostly about
statistics and secondarily about R.  I have just started a new job that
(this week, apparently) requires statistical knowledge beyond my
training
(as an epidemiologist).

The problem:

- We have 57 food production facilities in three categories
- Samples of 4-6 different foods were tested for listeria at each
facility
- I need to describe the presence of listeria in food (1) overall and
(2) by
facility category.

I know that samples within each facility cannot be treated as
independent,
so I need an approach that accounts for (1) clustering within facilities
and
(2) the different number of samples taken at each facility.  If someone
could kindly point me towards the right type of analysis for this and/or
its
associated R functions/packages, I would greatly appreciate it.

Many thanks,

Sarah

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question about food sampling analysis

2010-07-13 Thread Sarah Henderson
Greetings to all --

As always, I am grateful for the help of the R community.  The generosity of
other R users never ceases to impress me.  Just to recap the responses I
have had to this question, in case they can help anyone else down the line:

Robert LaBudde suggested Applied Survey Data Analysis by Heeringa et al. as
a go-to reference

Ruben Roa suggested glm() with binomial family for modeling Listeria
presence/absence, glm() with Gamma family for modeling Listeria
concentrations, or the betareg() function of the betareg package for
modeling Listeria proportions.

Jim Lemon suggested the lme family of functions in the lme4 package, and Modern
Applied Statistics with S as a good reference (which, thankfully, I have in
my library).

Richard Weeks replied to the whole group suggesting sensory analysis with
the sensominer package.
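
To put one of those suggestions into code: a minimal sketch of the mixed-model
route (lme4) on simulated stand-in data, purely to show the model form; the
column names and data are hypothetical:

library(lme4)
## simulated stand-in: 57 facilities in 3 categories, 4-6 samples each,
## listeria presence/absence per sample (purely illustrative)
set.seed(1)
n_samp      <- sample(4:6, 57, replace = TRUE)
facility_id <- rep(seq_len(57), times = n_samp)
fac_cat     <- sample(c("A", "B", "C"), 57, replace = TRUE)
dat <- data.frame(listeria = rbinom(sum(n_samp), 1, 0.2),
                  facility = factor(facility_id),
                  category = factor(fac_cat[facility_id]))

## a random intercept per facility accounts for within-facility clustering;
## unequal numbers of samples per facility are handled automatically
fit <- glmer(listeria ~ category + (1 | facility),
             family = binomial, data = dat)
summary(fit)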

Thanks again for your time, everyone.

Sarah


On Tue, Jul 13, 2010 at 3:07 AM, Richard Weeks
rwe...@mango-solutions.comwrote:

 Hi Sarah,

 We regularly undertake work in the food sector and have developed many
 custom built solutions. To be more specific, the statistics we employ is
 that of sensory analysis and we regularly use the sensominer package in
 R.

 Regards,

 Richard Weeks

 Mangosolutions
 data analysis that delivers


 Mail: rwe...@mango-solutions.com
 T: +44 (0)1249 767700
 F: +44 (0)1249 767707
 M: +44 (0)7500 040365

 Unit 2 Greenways Business Park
 Bellinger Close
 Chippenham
 Wilts
 SN15 1BN
 UK


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On Behalf Of Sarah Henderson
 Sent: 12 July 2010 22:42
 To: R List
 Subject: [R] Question about food sampling analysis

 Greetings to all, and my apologies for a question that is mostly about
 statistics and secondarily about R.  I have just started a new job that
 (this week, apparently) requires statistical knowledge beyond my
 training
 (as an epidemiologist).

 The problem:

 - We have 57 food production facilities in three categories
 - Samples of 4-6 different foods were tested for listeria at each
 facility
 - I need to describe the presence of listeria in food (1) overall and
 (2) by
 facility category.

 I know that samples within each facility cannot be treated as
 independent,
 so I need an approach that accounts for (1) clustering within facilities
 and
 (2) the different number of samples taken at each facility.  If someone
 could kindly point me towards the right type of analysis for this and/or
 its
 associated R functions/packages, I would greatly appreciate it.

 Many thanks,

 Sarah

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question about lpSolve package

2010-07-06 Thread David Winsemius


On Jul 6, 2010, at 3:32 PM, Xiaoxi Gao wrote:



Hello R users,

I have two quick questions while using the lpSolve package for linear
programming. (1) The printed result contains both characters and numbers,
e.g., "Success: the objective function is 40.5", but I only need the
number; can I store just the number?


?lp
?lp.object   # or just use str()

(2) How do I set bounds on the variables? E.g., all variables are
positive.


It appears you need to take time to re-read the help page and work  
through the examples.
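
For the two concrete questions, a minimal sketch (not from the original reply;
the components used are documented in ?lp.object):

library(lpSolve)
## tiny illustrative LP: maximise 3x + 2y  s.t.  x + y <= 4,  x <= 3
## (lp() already assumes all decision variables are >= 0)
res <- lp(direction = "max",
          objective.in = c(3, 2),
          const.mat = rbind(c(1, 1), c(1, 0)),
          const.dir = c("<=", "<="),
          const.rhs = c(4, 3))
res$objval     # just the number: 11
res$solution   # the variable values: 3 1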





--

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question concerning VGAM

2010-07-05 Thread Martin Maechler
 Martin Spindler martin.spind...@gmx.de
 on Mon, 5 Jul 2010 07:48:42 +0200 writes:

 Hello everyone,
 using the VGAM package and the following code

 

 library(VGAM)

 bp1 <- vglm(cbind(daten$anzahl_b, daten$deckung_b) ~ ., binom2.rho,
 data=daten1)

 summary(bp1)

 coef(bp1, matrix=TRUE)

 

 produced this error message:

 
 error in object$coefficients : $ operator not defined for this S4 class

 

 I am a bit confused because some days ago this error message did not show up
 and everything was fine.

 Thank you very much in advance for your help.
 
 Best,

 

 Martin


 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
--- PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
--- and provide commented, minimal, self-contained, reproducible code.

Hmm, and which part of the two lines above did you not
understand?

example(vglm)

already contains uses of coef() which do work fine;
so it must be you, or your setup which breaks things.

Martin Maechler, ETH Zurich

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question regarding panel data analysis

2010-07-01 Thread Liviu Andronic
Hello

On Thu, Jul 1, 2010 at 1:12 AM, amatoallah ouchen at.ouc...@gmail.com wrote:
 serious  issue for me . I'm currently running a panel data analysis
 i've used the plm package to perform the Tests of poolability as
 results intercepts and coefficients are assumed different. so my

The above is not very clear and you should probably give us more
information, such as minimal code and results (see the posting guide).

Otherwise please refrain from sending the same message  4 times to the
list. Regards
Liviu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question on WLS (gls vs lm)

2010-06-24 Thread Joris Meys
Isn't that exactly what you would expect when using a _generalized_
least squares compared to a normal least squares? GLS is not the same
as WLS.

http://www.aiaccess.net/English/Glossaries/GlosMod/e_gm_least_squares_generalized.htm

Cheers
Joris

On Thu, Jun 24, 2010 at 9:16 AM, Stats Wolf stats.w...@gmail.com wrote:
 Hi all,

 I understand that gls() uses generalized least squares, but I thought
 that maybe optimum weights from gls might be used as weights in lm (as
 shown below), but apparently this is not the case. See:

 library(nlme)
 f1 <- gls(Petal.Width ~ Species / Petal.Length, data = iris,  weights
 = varIdent(form = ~ 1 | Species))
 aa <- attributes(summary(f1)$modelStruct$varStruct)$weights
 f2 <- lm(Petal.Width ~ Species / Petal.Length, data = iris, weights = aa)

 summary(f1)$tTable; summary(f2)

 So, the two models with the very same weights do differ (in terms of
 standard errors). Could you please explain why? Are these different
 types of weights?

 Many thanks in advance,
 Stats Wolf

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question on WLS (gls vs lm)

2010-06-24 Thread Stats Wolf
Thanks for reply.

Yes, they do differ, but doesn't gls() with the weights argument
(correlation being unchanged) give exactly that special case of GLS, as this
sentence from the page you provided says: "The method leading to this
result is called Generalized Least Squares estimation (GLS), of which
WLS is just a special case"?

Best,
Stats Wolf



On Thu, Jun 24, 2010 at 12:49 PM, Joris Meys jorism...@gmail.com wrote:
 Isn't that exactly what you would expect when using a _generalized_
 least squares compared to a normal least squares? GLS is not the same
 as WLS.

 http://www.aiaccess.net/English/Glossaries/GlosMod/e_gm_least_squares_generalized.htm

 Cheers
 Joris

 On Thu, Jun 24, 2010 at 9:16 AM, Stats Wolf stats.w...@gmail.com wrote:
 Hi all,

 I understand that gls() uses generalized least squares, but I thought
 that maybe optimum weights from gls might be used as weights in lm (as
 shown below), but apparently this is not the case. See:

 library(nlme)
  f1 <- gls(Petal.Width ~ Species / Petal.Length, data = iris,  weights
  = varIdent(form = ~ 1 | Species))
  aa <- attributes(summary(f1)$modelStruct$varStruct)$weights
  f2 <- lm(Petal.Width ~ Species / Petal.Length, data = iris, weights = aa)

 summary(f1)$tTable; summary(f2)

 So, the two models with the very same weights do differ (in terms of
 standard errors). Could you please explain why? Are these different
 types of weights?

 Many thanks in advance,
 Stats Wolf

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




 --
 Joris Meys
 Statistical consultant

 Ghent University
 Faculty of Bioscience Engineering
 Department of Applied mathematics, biometrics and process control

 tel : +32 9 264 59 87
 joris.m...@ugent.be
 ---
 Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question on WLS (gls vs lm)

2010-06-24 Thread Joris Meys
Indeed, WLS is a special case of GLS, where the error covariance
matrix is a diagonal matrix. OLS is a special case of GLS, where the
error is considered homoscedastic and all weights are equal to 1. And
I now realized that the varIdent() indeed makes a diagonal covariance
matrix, so the results should be the same in fact. Sorry for missing
that one.

A closer inspection shows that the results don't differ too much. The
fitting method differs between both functions; lm.wfit uses the QR
decomposition, whereas gls() uses restricted maximum likelihood. In
Asymptopia, they should give the same result.

Cheers
Joris


On Thu, Jun 24, 2010 at 12:54 PM, Stats Wolf stats.w...@gmail.com wrote:
 Thanks for reply.

 Yes, they do differ, but does not gls() with the weights argument
 (correlation being unchanged) make the special version of GLS, as this
 sentence from the page you provided says: The method leading to this
 result is called Generalized Least Squares estimation (GLS), of which
 WLS is just a special case?

 Best,
 Stats Wolf



 On Thu, Jun 24, 2010 at 12:49 PM, Joris Meys jorism...@gmail.com wrote:
 Isn't that exactly what you would expect when using a _generalized_
 least squares compared to a normal least squares? GLS is not the same
 as WLS.

 http://www.aiaccess.net/English/Glossaries/GlosMod/e_gm_least_squares_generalized.htm

 Cheers
 Joris

 On Thu, Jun 24, 2010 at 9:16 AM, Stats Wolf stats.w...@gmail.com wrote:
 Hi all,

 I understand that gls() uses generalized least squares, but I thought
 that maybe optimum weights from gls might be used as weights in lm (as
 shown below), but apparently this is not the case. See:

 library(nlme)
  f1 <- gls(Petal.Width ~ Species / Petal.Length, data = iris,  weights
  = varIdent(form = ~ 1 | Species))
  aa <- attributes(summary(f1)$modelStruct$varStruct)$weights
  f2 <- lm(Petal.Width ~ Species / Petal.Length, data = iris, weights = aa)

 summary(f1)$tTable; summary(f2)

 So, the two models with the very same weights do differ (in terms of
 standard errors). Could you please explain why? Are these different
 types of weights?

 Many thanks in advance,
 Stats Wolf

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




 --
 Joris Meys
 Statistical consultant

 Ghent University
 Faculty of Bioscience Engineering
 Department of Applied mathematics, biometrics and process control

 tel : +32 9 264 59 87
 joris.m...@ugent.be
 ---
 Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php





-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question on WLS (gls vs lm)

2010-06-24 Thread Stats Wolf
Indeed, they should give the same results, and hence I was worried to
see that the results were not the same. Suffice it to look at
standard errors and p-values: they do differ, and the differences are
not really that small.


Thanks,
Stats Wolf



On Thu, Jun 24, 2010 at 2:39 PM, Joris Meys jorism...@gmail.com wrote:
 Indeed, WLS is a special case of GLS, where the error covariance
 matrix is a diagonal matrix. OLS is a special case of GLS, where the
 error is considered homoscedastic and all weights are equal to 1. And
 I now realized that the varIdent() indeed makes a diagonal covariance
 matrix, so the results should be the same in fact. Sorry for missing
 that one.

 A closer inspection shows that the results don't differ too much. The
 fitting method differs between both functions; lm.wfit uses the QR
 decomposition, whereas gls() uses restricted maximum likelihood. In
 Asymptopia, they should give the same result.

 Cheers
 Joris


 On Thu, Jun 24, 2010 at 12:54 PM, Stats Wolf stats.w...@gmail.com wrote:
 Thanks for reply.

 Yes, they do differ, but does not gls() with the weights argument
 (correlation being unchanged) make the special version of GLS, as this
 sentence from the page you provided says: The method leading to this
 result is called Generalized Least Squares estimation (GLS), of which
 WLS is just a special case?

 Best,
 Stats Wolf



 On Thu, Jun 24, 2010 at 12:49 PM, Joris Meys jorism...@gmail.com wrote:
 Isn't that exactly what you would expect when using a _generalized_
 least squares compared to a normal least squares? GLS is not the same
 as WLS.

 http://www.aiaccess.net/English/Glossaries/GlosMod/e_gm_least_squares_generalized.htm

 Cheers
 Joris

 On Thu, Jun 24, 2010 at 9:16 AM, Stats Wolf stats.w...@gmail.com wrote:
 Hi all,

 I understand that gls() uses generalized least squares, but I thought
 that maybe optimum weights from gls might be used as weights in lm (as
 shown below), but apparently this is not the case. See:

 library(nlme)
  f1 <- gls(Petal.Width ~ Species / Petal.Length, data = iris,  weights
  = varIdent(form = ~ 1 | Species))
  aa <- attributes(summary(f1)$modelStruct$varStruct)$weights
  f2 <- lm(Petal.Width ~ Species / Petal.Length, data = iris, weights = aa)

 summary(f1)$tTable; summary(f2)

 So, the two models with the very same weights do differ (in terms of
 standard errors). Could you please explain why? Are these different
 types of weights?

 Many thanks in advance,
 Stats Wolf

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




 --
 Joris Meys
 Statistical consultant

 Ghent University
 Faculty of Bioscience Engineering
 Department of Applied mathematics, biometrics and process control

 tel : +32 9 264 59 87
 joris.m...@ugent.be
 ---
 Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php





 --
 Joris Meys
 Statistical consultant

 Ghent University
 Faculty of Bioscience Engineering
 Department of Applied mathematics, biometrics and process control

 tel : +32 9 264 59 87
 joris.m...@ugent.be
 ---
 Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question on WLS (gls vs lm)

2010-06-24 Thread Viechtbauer Wolfgang (STAT)
The weights in 'aa' are the inverse standard deviations. But you want to use 
the inverse variances as the weights:

aa <- (attributes(summary(f1)$modelStruct$varStruct)$weights)^2

And then the results are essentially identical.
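
Putting the whole correction together (a sketch that simply restates the
thread's code with the squared weights):

library(nlme)
f1 <- gls(Petal.Width ~ Species / Petal.Length, data = iris,
          weights = varIdent(form = ~ 1 | Species))
aa <- (attributes(summary(f1)$modelStruct$varStruct)$weights)^2  # inverse variances
f2 <- lm(Petal.Width ~ Species / Petal.Length, data = iris, weights = aa)
summary(f1)$tTable           # gls results
summary(f2)$coefficients     # lm results: now essentially the same standard errors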

Best,

--
Wolfgang Viechtbauerhttp://www.wvbauer.com/
Department of Methodology and StatisticsTel: +31 (0)43 388-2277
School for Public Health and Primary Care   Office Location:
Maastricht University, P.O. Box 616 Room B2.01 (second floor)
6200 MD Maastricht, The Netherlands Debyeplein 1 (Randwyck)


Original Message
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] On Behalf Of Stats Wolf Sent:
Thursday, June 24, 2010 15:00 To: Joris Meys
Cc: r-help@r-project.org
Subject: Re: [R] Question on WLS (gls vs lm)

 Indeed, they should give the same results, and hence I was worried to
 see that the results were not that same. Suffice it to look at
 standard errors and p-values: they do differ, and the differences are
 not really that small.


 Thanks,
 Stats Wolf



 On Thu, Jun 24, 2010 at 2:39 PM, Joris Meys jorism...@gmail.com
 wrote:
 Indeed, WLS is a special case of GLS, where the error covariance
 matrix is a diagonal matrix. OLS is a special case of GLS, where the
 error is considered homoscedastic and all weights are equal to 1. And
 I now realized that the varIdent() indeed makes a diagonal covariance
 matrix, so the results should be the same in fact. Sorry for missing
 that one.

 A closer inspection shows that the results don't differ too much. The
 fitting method differs between both functions; lm.wfit uses the QR
 decomposition, whereas gls() uses restricted maximum likelihood. In
 Asymptopia, they should give the same result.

 Cheers
 Joris


 On Thu, Jun 24, 2010 at 12:54 PM, Stats Wolf stats.w...@gmail.com
 wrote:
 Thanks for reply.

 Yes, they do differ, but does not gls() with the weights argument
 (correlation being unchanged) make the special version of GLS, as
 this sentence from the page you provided says: The method leading
 to this result is called Generalized Least Squares estimation
 (GLS), of which WLS is just a special case?

 Best,
 Stats Wolf



 On Thu, Jun 24, 2010 at 12:49 PM, Joris Meys jorism...@gmail.com
 wrote:
 Isn't that exactly what you would expect when using a _generalized_
 least squares compared to a normal least squares? GLS is not the
 same as WLS.

 http://www.aiaccess.net/English/Glossaries/GlosMod/e_gm_least_square
 s_generalized.htm

 Cheers
 Joris

 On Thu, Jun 24, 2010 at 9:16 AM, Stats Wolf stats.w...@gmail.com
 wrote:
 Hi all,

 I understand that gls() uses generalized least squares, but I
 thought that maybe optimum weights from gls might be used as
 weights in lm (as shown below), but apparently this is not the
 case. See:

 library(nlme)
 f1 <- gls(Petal.Width ~ Species / Petal.Length, data = iris,
  weights = varIdent(form = ~ 1 | Species))
 aa <- attributes(summary(f1)$modelStruct$varStruct)$weights
 f2 <- lm(Petal.Width ~ Species / Petal.Length, data = iris,
  weights = aa)

 summary(f1)$tTable; summary(f2)

 So, the two models with the very same weights do differ (in terms
 of standard errors). Could you please explain why? Are these
 different types of weights?

 Many thanks in advance,
 Stats Wolf

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




 --
 Joris Meys
 Statistical consultant

 Ghent University
 Faculty of Bioscience Engineering
 Department of Applied mathematics, biometrics and process control

 tel : +32 9 264 59 87
 joris.m...@ugent.be
 ---
 Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php





 --
 Joris Meys
 Statistical consultant

 Ghent University
 Faculty of Bioscience Engineering
 Department of Applied mathematics, biometrics and process control

 tel : +32 9 264 59 87
 joris.m...@ugent.be
 ---
 Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question on WLS (gls vs lm)

2010-06-24 Thread Joris Meys
Thanks! I was getting confused as well; Wolf really had a point.

I have to admit that this is all a bit counterintuitive. I would
expect those weights to be the inverse of the variances, as GLS uses
the inverse of the variance-covariance matrix. I finally found in the
help files ?nlme::varWeights the needed information, after you pointed
out the problem and after strolling through quite a part of the help
files...

Does this mean that the GLS algorithm uses 1/sd as weights (I can't
imagine that...), or is this one of those things in R that just are the
way they are?

Cheers
Joris

On Thu, Jun 24, 2010 at 3:20 PM, Viechtbauer Wolfgang (STAT)
wolfgang.viechtba...@stat.unimaas.nl wrote:
 The weights in 'aa' are the inverse standard deviations. But you want to use 
 the inverse variances as the weights:

 aa <- (attributes(summary(f1)$modelStruct$varStruct)$weights)^2

 And then the results are essentially identical.

 Best,

 --
 Wolfgang Viechtbauer                        http://www.wvbauer.com/
 Department of Methodology and Statistics    Tel: +31 (0)43 388-2277
 School for Public Health and Primary Care   Office Location:
 Maastricht University, P.O. Box 616         Room B2.01 (second floor)
 6200 MD Maastricht, The Netherlands         Debyeplein 1 (Randwyck)


 Original Message
 From: r-help-boun...@r-project.org
 [mailto:r-help-boun...@r-project.org] On Behalf Of Stats Wolf Sent:
 Thursday, June 24, 2010 15:00 To: Joris Meys
 Cc: r-help@r-project.org
 Subject: Re: [R] Question on WLS (gls vs lm)

 Indeed, they should give the same results, and hence I was worried to
 see that the results were not the same. Suffice it to look at
 standard errors and p-values: they do differ, and the differences are
 not really that small.


 Thanks,
 Stats Wolf



 On Thu, Jun 24, 2010 at 2:39 PM, Joris Meys jorism...@gmail.com
 wrote:
 Indeed, WLS is a special case of GLS, where the error covariance
 matrix is a diagonal matrix. OLS is a special case of GLS, where the
 error is considered homoscedastic and all weights are equal to 1. And
 I now realized that the varIdent() indeed makes a diagonal covariance
 matrix, so the results should be the same in fact. Sorry for missing
 that one.

 A closer inspection shows that the results don't differ too much. The
 fitting method differs between both functions; lm.wfit uses the QR
 decomposition, whereas gls() uses restricted maximum likelihood. In
 Asymptopia, they should give the same result.

 Cheers
 Joris


 On Thu, Jun 24, 2010 at 12:54 PM, Stats Wolf stats.w...@gmail.com
 wrote:
 Thanks for reply.

 Yes, they do differ, but does not gls() with the weights argument
 (correlation being unchanged) make the special version of GLS, as
 this sentence from the page you provided says: The method leading
 to this result is called Generalized Least Squares estimation
 (GLS), of which WLS is just a special case?

 Best,
 Stats Wolf



 On Thu, Jun 24, 2010 at 12:49 PM, Joris Meys jorism...@gmail.com
 wrote:
 Isn't that exactly what you would expect when using a _generalized_
 least squares compared to a normal least squares? GLS is not the
 same as WLS.

 http://www.aiaccess.net/English/Glossaries/GlosMod/e_gm_least_square
 s_generalized.htm

 Cheers
 Joris

 On Thu, Jun 24, 2010 at 9:16 AM, Stats Wolf stats.w...@gmail.com
 wrote:
 Hi all,

 I understand that gls() uses generalized least squares, but I
 thought that maybe optimum weights from gls might be used as
 weights in lm (as shown below), but apparently this is not the
 case. See:

 library(nlme)
 f1 - gls(Petal.Width ~ Species / Petal.Length, data = iris,
 weights = varIdent(form = ~ 1 | Species)) aa -
 attributes(summary(f1)$modelStruct$varStruct)$weights
 f2 - lm(Petal.Width ~ Species / Petal.Length, data = iris,
 weights = aa)

 summary(f1)$tTable; summary(f2)

 So, the two models with the very same weights do differ (in terms
 of standard errors). Could you please explain why? Are these
 different types of weights?

 Many thanks in advance,
 Stats Wolf

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




 --
 Joris Meys
 Statistical consultant

 Ghent University
 Faculty of Bioscience Engineering
 Department of Applied mathematics, biometrics and process control

 tel : +32 9 264 59 87
 joris.m...@ugent.be
 ---
 Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php





 --
 Joris Meys
 Statistical consultant

 Ghent University
 Faculty of Bioscience Engineering
 Department of Applied mathematics, biometrics and process control

 tel : +32 9 264 59 87
 joris.m...@ugent.be
 ---
 Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php


 __
 R-help@r-project.org mailing list
 https

Re: [R] Question on WLS (gls vs lm)

2010-06-24 Thread Gabor Grothendieck
On Thu, Jun 24, 2010 at 9:20 AM, Viechtbauer Wolfgang (STAT)
wolfgang.viechtba...@stat.unimaas.nl wrote:
 The weights in 'aa' are the inverse standard deviations. But you want to use 
 the inverse variances as the weights:

 aa <- (attributes(summary(f1)$modelStruct$varStruct)$weights)^2

 And then the results are essentially identical.


We might now ask how we could have found Wolfgang's answer via
calculation. Let's redo the gls calculation of variance from scratch
by iterated re-weighted least squares (just one iteration here) and
compare that to the gls aa calculated by the original poster:

# estimate beta
fm <- lm(Petal.Width ~ Species / Petal.Length, iris)

# estimate variance
v <- fitted(lm(resid(fm)^2 ~ Species, iris))
v <- v/v[1]

# compare to aa from original poster
lm(log(aa) ~ log(v))

The last line gives:

(Intercept)   log(v)
-4.212e-07   -5.000e-01

which suggests:  aa = 1/sqrt(variance)
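
As a rough follow-up sketch (continuing from fm and v above), one more
reweighting step with the inverse relative variances should land close to
the gls fit:

fm2 <- lm(Petal.Width ~ Species / Petal.Length, data = iris, weights = 1/v)
summary(fm2)$coefficients  # compare with summary(f1)$tTable from the gls fit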

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question about a program

2010-06-23 Thread Jim Holtman

use Rprof to determine where your function is spending time.
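
Something along these lines, for example (a minimal sketch, reusing the
object names from your post):

Rprof("prof.out")
pair_kFWER(m, alpha, rho, k, pvaluessort)   # the slow call
Rprof(NULL)
summaryRprof("prof.out")$by.self            # where the time goes, by function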

What is the problem you are trying to solve?

Sent from my iPhone.

On Jun 23, 2010, at 5:21, li li hannah@gmail.com wrote:


Dear all,
  I have the following program for a multiple comparison procedure.
There are two functions for the two steps. The first step calculates the
critical values, while the second step is the actual procedure [see below:
program with two functions].
  This works fine. However, I want to put them into one function for the
convenience of later use [see below: program with one function]. Somehow the
big function works extremely slowly. For example I chose

m <- 10
rho <- 0.1
k <- 2
alpha <- 0.05
pvaluessort <- sort(1 - pnorm(rnorm(10)))

Is there anything that I did wrong?
Thank you!
   Hannah


##Program with two functions
## first step
library(mvtnorm)
cc_f <- function(m, rho, k, alpha) {

cc_z <- numeric(m)
var <- matrix(c(1,rho,rho,1), nrow=2, ncol=2, byrow=T)
for (i in 1:m){
  if (i <= k) {cc_z[i] <- qmvnorm((k*(k-1))/(m*(m-1))*alpha,
tail="upper", sigma=var)$quantile} else
  {cc_z[i] <- qmvnorm((k*(k-1))/((m-i+k)*(m-i+k-1))*alpha,
tail="upper", sigma=var)$quantile}
  }
cc <- 1-pnorm(cc_z)
return(cc)
}
## second step
pair_kFWER=function(m,crit_cons,pvaluessort){
k=0
while((k < m) && (crit_cons[m-k] < pvaluessort[m-k])){
k=k+1
}
return(m-k)
}
###
##Program with one function ##

pair_kFWER=function(m,alpha,rho,k,pvaluessort){
library(mvtnorm)
cc_z <- numeric(m)
var <- matrix(c(1,rho,rho,1), nrow=2, ncol=2, byrow=T)
for (i in 1:m){
  if (i <= k) {cc_z[i] <- qmvnorm((k*(k-1))/(m*(m-1))*alpha,
tail="upper", sigma=var)$quantile} else
  {cc_z[i] <- qmvnorm((k*(k-1))/((m-i+k)*(m-i+k-1))*alpha,
tail="upper", sigma=var)$quantile}
  }
crit_cons <- 1-pnorm(cc_z)

k=0
while((k < m) && (crit_cons[m-k] < pvaluessort[m-k])){
k=k+1
}
return(m-k)
}
#

   [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question about a program

2010-06-23 Thread Joris Meys
Most of the computation time is spent in the function qmvnorm. You can win a
little bit by optimizing the code, but the gain is relatively small.
You can also decrease the interval used to evaluate qmvnorm to win some
speed there. As you look for the upper tail, there is no need to evaluate the
function at negative numbers. E.g.:

pair_kFWER2=function(m,alpha,rho,k,pvaluessort){
library(mvtnorm)
cc_z <- numeric(m)
Var <- matrix(c(1,rho,rho,1), nrow=2, ncol=2, byrow=T)

lpart <- 1:k      # first part
hpart <- (k+1):m  # second part

# make first part of the cc_z
cc_z[lpart] <- qmvnorm((k*(k-1))/(m*(m-1))*alpha, tail="upper",
interval=c(0,4), sigma=Var)$quantile

# make second part of the cc_z
cc_z[hpart] <- sapply(hpart, function(i){
qmvnorm((k*(k-1))/((m-i+k)*(m-i+k-1))*alpha, interval=c(0,4),
tail="upper", sigma=Var)$quantile
})

crit_cons <- 1-pnorm(cc_z)

# does the test whether values of crit_cons[i] are smaller than
# values pvaluessort[i] and returns a vector
# which says at which positions this does not occur
# tail takes the last position. I guess that's what you wanted
out <- tail(which(!(crit_cons < pvaluessort)), 1)
return(out)
}

 system.time(replicate(100, pair_kFWER(m, alpha, rho, k, pvaluessort)))
   user  system elapsed
   5.97    0.04    6.04

 system.time(replicate(100, pair_kFWER2(m, alpha, rho, k, pvaluessort)))
   user  system elapsed
   4.43    0.00    4.44

Check whether the function works correctly. It gives the same result
as the other one in my tests. Only in the case where your function
returns 0 does the modified one return integer(0).

Cheers
Joris


On Wed, Jun 23, 2010 at 2:21 PM, li li hannah@gmail.com wrote:
 Dear all,
   I have the following program for a multiple comparison procedure.
 There are two functions for the two steps. First step is to calculate the
 critical values,
 while the second step is the actual procedure [see below: program with two
 functions].
   This works fine. However, I want to put them into one function
 for the convenience
 of later use [see below: program with one function]. Somehow the big
 function works extremely
 slowly.  For example I chose
 m <- 10
 rho <- 0.1
 k <- 2
 alpha <- 0.05
 pvaluessort <- sort(1 - pnorm(rnorm(10)))

 Is there anything that I did wrong?
  Thank you!
                    Hannah


 ##Program with two functions
 ## first step
 library(mvtnorm)
 cc_f <- function(m, rho, k, alpha) {

 cc_z <- numeric(m)
 var <- matrix(c(1,rho,rho,1), nrow=2, ncol=2, byrow=T)
 for (i in 1:m){
   if (i <= k) {cc_z[i] <- qmvnorm((k*(k-1))/(m*(m-1))*alpha, tail="upper",
 sigma=var)$quantile} else
               {cc_z[i] <- qmvnorm((k*(k-1))/((m-i+k)*(m-i+k-1))*alpha,
 tail="upper", sigma=var)$quantile}
               }
 cc <- 1-pnorm(cc_z)
 return(cc)
                             }
 ## second step
 pair_kFWER=function(m,crit_cons,pvaluessort){
 k=0
 while((k < m) && (crit_cons[m-k] < pvaluessort[m-k])){
 k=k+1
 }
 return(m-k)
 }
 ###
 ##Program with one function ##

 pair_kFWER=function(m,alpha,rho,k,pvaluessort){
 library(mvtnorm)
 cc_z <- numeric(m)
 var <- matrix(c(1,rho,rho,1), nrow=2, ncol=2, byrow=T)
 for (i in 1:m){
   if (i <= k) {cc_z[i] <- qmvnorm((k*(k-1))/(m*(m-1))*alpha, tail="upper",
 sigma=var)$quantile} else
               {cc_z[i] <- qmvnorm((k*(k-1))/((m-i+k)*(m-i+k-1))*alpha,
 tail="upper", sigma=var)$quantile}
               }
 crit_cons <- 1-pnorm(cc_z)

 k=0
 while((k < m) && (crit_cons[m-k] < pvaluessort[m-k])){
 k=k+1
 }
 return(m-k)
 }
 #

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question about boosting(Adaboosting. M1)

2010-06-19 Thread Joris Meys
Changbin,

The weights don't have to sum up to one. These are the weights of the
trees in the bag used to combine them into the final fit, and if I'm
not mistaken expressed as the logit of the error for the respective
trees.

If you use a method, be sure you understand it. If you don't
understand those weights, I reckon there's a whole lot of other things
you don't understand about the method. A good start is running and
analyzing the example in the help file. Those weights don't sum up to
one either, but the authors don't seem to mind...

There are references given in the help files, and you should
understand what they say before you apply the algorithm. You're not
going to handle a chainsaw without reading the manual carefully
either.

Cheers
Joris


On Sat, Jun 19, 2010 at 11:35 PM, Changbin Du changb...@gmail.com wrote:
 HI, Guys,

 I am trying to use the AdaBoost.M1 algorithm to integrate three models.
 I found that the sum of the weights for each model is not equal to one.

 How to deal with this?

 Thanks, any response or suggestions are appreciated!


 --
 Sincerely,
 Changbin
 --

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question in R

2010-06-18 Thread Joris Meys
See below.

On Fri, Jun 18, 2010 at 7:11 PM, li li hannah@gmail.com wrote:
 Dear all,
   I am trying to calculate certain critical values from bivariate normal
 distribution (please see the
 function below).

 m <- 10
 rho <- 0.1
 k <- 2
 alpha <- 0.05
 ## calculate critical constants
 cc_z <- numeric(m)
 var <- matrix(c(1,rho,rho,1), nrow=2, ncol=2, byrow=T)
 for (i in 1:m){
   if (i <= k) {cc_z[i] <- qmvnorm((k*(k-1))/(m*(m-1))*alpha, tail="upper",
 sigma=var)$quantile} else
               {cc_z[i] <- qmvnorm((k*(k-1))/((m-i+k)*(m-i+k-1))*alpha,
 tail="upper", sigma=var)$quantile}
               }



 After the critical constants cc_z are calculated, I wanted to check whether
 they are correct.


 ##check whether cc_z is correct
  pmvnorm(lower=c(cc_z[1], cc_z[1]),
 upper=Inf,sigma=var)-(k*(k-1))/(n*(n-1))

Shouldn't this be
 pmvnorm(lower=c(cc_z[1], cc_z[1]),
+ upper=Inf,sigma=var)-(k*(k-1))/(m*(m-1))*alpha
[1] -5.87e-09
attr(,"error")
[1] 1e-15
attr(,"msg")
[1] "Normal Completion"

This still gives a bit of an error, but you have to take into account
as well that the underlying algorithms use randomized quasi-MC
methods, and that floating point issues can play a role here as well. So it
looks to me like your calculations are correct.
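
If you want to reduce the Monte Carlo error itself, you can also pass a
tighter tolerance to the integration algorithm, along these lines (a sketch;
the exact settings are a matter of taste):

pmvnorm(lower=c(cc_z[1], cc_z[1]), upper=Inf, sigma=var,
        algorithm=GenzBretz(maxpts=100000, abseps=1e-8))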

Cheers
Joris


-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question regarding print

2010-06-17 Thread jim holtman
cat(out, '\n')

On Thu, Jun 17, 2010 at 10:19 AM, Adolf STIPS
adolf.st...@jrc.ec.europa.eu wrote:

 Hi,

 Does anybody know how to have output from print, without the leading [1]?
 (Or must I use cat/write?)

out="r15"
print(out,quote=FALSE)
 [1] r15

 And I definitely do not want the leading [1] as I want to construct a table
 from this.

 Ciao, Adolf




 
 Adolf Stips (new email: adolf.st...@jrc.ec.europa.eu)
 Global Environment Monitoring unit
 CEC Joint Research Centre, TP 272
 I-21027 Ispra, Italy
 Tel: +39-0332-789876
 Fax: +39-0332-789034
 I know that I do not know, but even that not for sure!
 
 The views expressed are purely those of the writer and may not in any
 circumstances be regarded as stating an official position of the European
 Commission.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question regarding print

2010-06-17 Thread Greg Snow
The cat function is probably the best approach, but if you really feel the 
need to use print then you can just assign blank names (now it will be a named 
vector and slower in heavy calculations, but the printing is different).  Try 
something like:

 names(x) <- rep('', length(x))
 print(x)
# or
 x

This of course leads to the question of how to prevent the blank line(s).  The
best answer there is to use the cat function (or some other specialized
function to print your objects (which will probably use cat)).
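
For example, a small sketch of the cat-based route:

out <- "r15"
cat(out, "\n")               # prints: r15  (no [1] index)
writeLines(c("r15", "r16"))  # several values, one per line, still no indices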

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of Adolf STIPS
 Sent: Thursday, June 17, 2010 8:20 AM
 To: r-help@r-project.org
 Subject: [R] Question regarding print
 
 
 Hi,
 
 Does anybody know how to have output from print, without the leading
 [1]?
 (Or must I use cat/write?)
 
  out="r15"
 print(out,quote=FALSE)
 [1] r15
 
 And I definitely do not want the leading [1] as I want to construct a
 table
 from this.
 
 Ciao, Adolf
 
 
 
 
 
 Adolf Stips (new email: adolf.st...@jrc.ec.europa.eu)
 Global Environment Monitoring unit
 CEC Joint Research Centre, TP 272
 I-21027 Ispra, Italy
 Tel: +39-0332-789876
 Fax: +39-0332-789034
 I know that I do not know, but even that not for sure!
 
 The views expressed are purely those of the writer and may not in any
 circumstances be regarded as stating an official position of the
 European
 Commission.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question

2010-06-16 Thread Joris Meys
Two possibilities : rescale your random vector, or resample to get
numbers within the range. But neither of these solutions will give you
a true exponential distribution. I am not aware of truncated
exponential distributions that are available in R, but somebody else
might know more about that.

# possibility I : rescaling
rsample <- rexp(5)
lim <- 0.8
rsample <- rsample*lim/max(rsample)
rsample

# possibility II : resampling
rsample <- rexp(5)
while(sum(rsample > lim) > 0) {
  rsample <- ifelse(rsample > lim, rexp(length(rsample)), rsample)
}
rsample

Cheers
Joris

On Wed, Jun 16, 2010 at 12:00 PM, Assieh Rashidi
assiehrash...@yahoo.com wrote:

 Dear Mr.
 for writing program about Gibbs sampling, i have a question.
 if i want to generate data from Exponential distribution but range of X is 
 restricted, how can i do?
 regards,
 A.Rashidi





        [[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question about user define function

2010-06-16 Thread Allan Engelhardt


On 15/06/10 21:39, GL wrote:

Have the following function that is called by the statement below. Trying to
return the two dataframes, but instead get one large list including both
tables.

ReadInputDataFrames <- function() {

   dbs.this = read.delim("this.txt", header = TRUE, sep = "\t",
                         quote = "\"", dec = ".")
   dbs.that = read.delim("that.txt", header = TRUE, sep = "\t",
                         quote = "\"", dec = ".")
   c(this = dbs.this, patdb = dbs.that)

}


If you really want to return two dataframes, then

return(list(this = dbs.this, that = dbs.that))

More likely, you want to return all the data in one dataframe.  If they 
have the same structure (columns and column names) then you want


return(rbind(dbs.this, dbs.that))

If you want something else, provide an example.
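
As an aside, a small illustration (with made-up data frames d1 and d2) of why
c() produced one large list:

d1 <- data.frame(a = 1:2, b = 3:4)
d2 <- data.frame(a = 5:6, b = 7:8)
length(c(this = d1, that = d2))                 # 4: c() splices the columns into one list
str(list(this = d1, that = d2), max.level = 1)  # two data frames, as intended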

Hope this helps a little.

Allan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question

2010-06-16 Thread Matt Shotwell
Since there is a simple closed form for the truncated exponential CDF,
you can use inverse transform sampling. I believe this is quite common
in survival analysis methods. The first step is to compute and write an
R function to compute the inverse CDF for the truncated exponential,
say 

itexp <- function(u, m, t) { -log(1 - u*(1 - exp(-t*m)))/m }

where u is the quantile, m is the rate, and t is the level of
truncation. Next, we draw from the truncated exponential with something
like

rtexp <- function(n, m, t) { itexp(runif(n), m, t) }

Check it out with 

texp <- rtexp(1, 1, pi)
hist(texp)
summary(texp)
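
And a rough sanity check of the draws against the truncated-exponential CDF
(a sketch, assuming rate m = 1 and truncation t = pi as above):

x <- rtexp(10000, 1, pi)
u <- seq(0, pi, length.out = 200)
theo <- (1 - exp(-u)) / (1 - exp(-pi))  # truncated-exponential CDF
max(abs(ecdf(x)(u) - theo))             # should be small at this sample size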

Matt Shotwell
Graduate Student
Div. Biostatistics and Epidemiology
Medical University of South Carolina

On Wed, 2010-06-16 at 11:11 -0400, Joris Meys wrote:
 Two possibilities : rescale your random vector, or resample to get
 numbers within the range. But neither of these solutions will give you
 a true exponential distribution. I am not aware of truncated
 exponential distributions that are available in R, but somebody else
 might know more about that.
 
 # possibility I : rescaling
  rsample <- rexp(5)
  lim <- 0.8
  rsample <- rsample*lim/max(rsample)
  rsample

  # possibility II : resampling
  rsample <- rexp(5)
  while(sum(rsample > lim) > 0) {
    rsample <- ifelse(rsample > lim, rexp(length(rsample)), rsample)
  }
 rsample
 
 Cheers
 Joris
 
 On Wed, Jun 16, 2010 at 12:00 PM, Assieh Rashidi
 assiehrash...@yahoo.com wrote:
 
  Dear Mr.
  for writing program about Gibbs sampling, i have a question.
  if i want to generate data from Exponential distribution but range of X is 
  restricted, how can i do?
  regards,
  A.Rashidi
 
 
 
 
 
 [[alternative HTML version deleted]]
 
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 
 
 


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question about user define function

2010-06-15 Thread David Winsemius


On Jun 15, 2010, at 4:39 PM, GL wrote:



Have the following function that is called by the statement below.  
Trying to
return the two dataframes, but instead get one large list including  
both

tables.

ReadInputDataFrames <- function() {

  dbs.this = read.delim("this.txt", header = TRUE, sep = "\t",
                        quote = "\"", dec = ".")
  dbs.that = read.delim("that.txt", header = TRUE, sep = "\t",
                        quote = "\"", dec = ".")


## Possible that you want to cbind() them rather than c() them


  cbind(this = dbs.this, patdb = dbs.that)

}


Which would require that they have the same number of rows.


?cbind




Called by:

c <- ReadInputDataFrames()

Results:

str(c) yields a list of 106 items: $this.variable1..53, $that.variable1..53




David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question about mean

2010-06-09 Thread Peter Langfelder
apply(iris[, -5], 2, tapply, iris$Species, mean)

On Wed, Jun 9, 2010 at 3:43 PM, SH.Chou cls3...@gmail.com wrote:

 Hi there:
 I have a question about generating mean value of a data.frame. Take
 iris data for example, if I have a data.frame looking like the following:
 -
    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
 1           5.1         3.5          1.4         0.2  setosa
 2           4.9         3.0          1.4         0.2  setosa
 3           4.7         3.2          1.3         0.2  setosa
 .           ...         ...          ...         ...     ...
 ---
 There are three different species in this table. I want to make a table and
 calculate the mean value for each species, as in the following table:

 -
                  Sepal.Length Sepal.Width Petal.Length Petal.Width
 mean.setosa             5.006       3.428        1.462       0.246
 mean.versicolor         5.936       2.770        4.260       1.326
 mean.virginica          6.588       2.974        5.552       2.026
 -
 Is there any short syntax that can do it? I mean shorter than the code I
 wrote, as follows:

 attach(iris)
 mean.setosa <- mean(iris[Species=="setosa", 1:4])
 mean.versicolor <- mean(iris[Species=="versicolor", 1:4])
 mean.virginica <- mean(iris[Species=="virginica", 1:4])
 data.mean <- rbind(mean.setosa, mean.versicolor, mean.virginica)
 detach(iris)
 --

 Thanks a million!!!


 --
 =
 Shih-Hsiung, Chou
 System Administrator / PH.D Student at
 Department of Industrial Manufacturing
 and Systems Engineering
 Kansas State University

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question about mean

2010-06-09 Thread Bill.Venables
Here is an alternative

with(iris, rowsum(iris[, -5], Species)/table(Species))

 

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Peter Langfelder
Sent: Thursday, 10 June 2010 12:27 PM
To: SH.Chou
Cc: r-help@r-project.org
Subject: Re: [R] question about mean

apply(iris[, -5], 2, tapply, iris$Species, mean)

On Wed, Jun 9, 2010 at 3:43 PM, SH.Chou cls3...@gmail.com wrote:

 Hi there:
 I have a question about generating mean value of a data.frame. Take
 iris data for example, if I have a data.frame looking like the following:
 -
Sepal.Length Sepal.Width Petal.Length Petal.WidthSpecies
 15.1   3.5  1.4
0.2 setosa
 24.9   3.0  1.4
0.2 setosa
 34.7   3.2   1.3
   0.2 setosa
 . .   .  .
 .  .
 . .   .  .
.   .
 . .   .  .
.   .
 ---
 There are three different species in this table. I want to make a table and
 calculate mean value for each specie as the following table:

 -
 Sepal.Length Sepal.Width Petal.Length
 Petal.Width
 mean.setosa5.0063.428 1.462
  0.246
 mean.versicolor   5.936 2.770 4.260
  1.326
 mean.virginica  6.5882.974 5.552
  2.026
 -
 Is there any short syntax can do it?? I mean shorter than the code I wrote
 as following:

 attach(iris)
 mean.setosa <- mean(iris[Species=="setosa", 1:4])
 mean.versicolor <- mean(iris[Species=="versicolor", 1:4])
 mean.virginica <- mean(iris[Species=="virginica", 1:4])
 data.mean <- rbind(mean.setosa, mean.versicolor, mean.virginica)
 detach(iris)
 --

 Thanks a million!!!


 --
 =
 Shih-Hsiung, Chou
 System Administrator / PH.D Student at
 Department of Industrial Manufacturing
 and Systems Engineering
 Kansas State University

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question about mean

2010-06-09 Thread Phil Spector

One possibility is


aggregate(iris[,-5],list(iris[,5]),mean)

     Group.1 Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa        5.006       3.428        1.462       0.246
2 versicolor        5.936       2.770        4.260       1.326
3  virginica        6.588       2.974        5.552       2.026

- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu


On Wed, 9 Jun 2010, SH.Chou wrote:


Hi there:
I have a question about generating mean value of a data.frame. Take
iris data for example, if I have a data.frame looking like the following:
-
   Sepal.Length Sepal.Width Petal.Length Petal.WidthSpecies
15.1   3.5  1.4
   0.2 setosa
24.9   3.0  1.4
   0.2 setosa
34.7   3.2   1.3
  0.2 setosa
. .   .  .
.  .
. .   .  .
   .   .
. .   .  .
   .   .
---
There are three different species in this table. I want to make a table and
calculate mean value for each specie as the following table:

-
Sepal.Length Sepal.Width Petal.Length
Petal.Width
mean.setosa5.0063.428 1.462
 0.246
mean.versicolor   5.936 2.770 4.260
 1.326
mean.virginica  6.5882.974 5.552
 2.026
-
Is there any short syntax can do it?? I mean shorter than the code I wrote
as following:

attach(iris)
mean.setosa <- mean(iris[Species=="setosa", 1:4])
mean.versicolor <- mean(iris[Species=="versicolor", 1:4])
mean.virginica <- mean(iris[Species=="virginica", 1:4])
data.mean <- rbind(mean.setosa, mean.versicolor, mean.virginica)
detach(iris)
--

Thanks a million!!!


--
=
Shih-Hsiung, Chou
System Administrator / PH.D Student at
Department of Industrial Manufacturing
and Systems Engineering
Kansas State University

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question about avoid the for loop

2010-06-03 Thread Jorge Ivan Velez
Hi Carrie,

Here are two options:

# Option 1
d <- data.frame(x, t)
y <- with(d, ifelse(t == 0, rbinom(2, 1, 0.2), rbinom(3, 1, 0.8)))
y

# Option 2  -- more general case, e.g. you do not know
#  how many 0's and 1's you have within each strata
spd <- with(d, split(d, x))
do.call(c, lapply(spd, function(comp)
with(comp, ifelse(t == 0, rbinom(sum(t==0), 1, 0.2),
rbinom(sum(t!=0), 1, 0.8)))))

HTH,
Jorge


On Thu, Jun 3, 2010 at 11:49 AM, Carrie Li  wrote:

 Dear R-helpers,

 I would like to generate a binary random variable within each sub-stratum
 of a two-level stratification. Here is a simple example.


 ## x is the first level strata index, here I have 3 strata.
 x=c(rep(1,5), rep(2,5), rep(3,5))

 ## within x, there is a second strata indexed by t=0 and t=1
 t=rep(c(0,0,1,1,1),3)


 ## and within strata i and t=0 and t=1, I generate the random binomial
 variable respectively, and save in y
 y=rep(NA, length(x))
 for (i in 1:3)
 {
 y[(x==i)&(t==0)]=rbinom(2, 1, 0.2)
 y[(x==i)&(t==1)]=rbinom(3, 1, 0.8)
 }


 My question: is there any way to avoid the for loop, since the first-level
 stratification has thousands of strata? (Within each x stratum, t only has
 2 levels, 0 and 1.)

 Thanks for any help!

 Carrie

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question about avoid the for loop

2010-06-03 Thread Carrie Li
Thanks! Jorge
Just one more question that I don't get even after checking the help:
for Option 1, why does ifelse, using just with(d, ...), work on the strata
indexed by x automatically?
Since in with() we didn't specify that the strata are indexed by x, what if
you have another categorical variable in the data?
Thanks again!



On Thu, Jun 3, 2010 at 12:21 PM, Jorge Ivan Velez
jorgeivanve...@gmail.comwrote:

 Hi Carrie,

 Here are two options:

 # Option 1
 d <- data.frame(x, t)
 y <- with(d, ifelse(t == 0, rbinom(2, 1, 0.2), rbinom(3, 1, 0.8)))
 y

 # Option 2  -- more general case, e.g. you do not know
 #  how many 0's and 1's you have within each strata
 spd <- with(d, split(d, x))
 do.call(c, lapply(spd, function(comp)
 with(comp, ifelse(t == 0, rbinom(sum(t==0), 1, 0.2),
 rbinom(sum(t!=0), 1, 0.8)))))

 HTH,
 Jorge


 On Thu, Jun 3, 2010 at 11:49 AM, Carrie Li  wrote:

 Dear R-helpers,

 I would like to generate a binary random variable within a stratum's
 stratum. Here is a simple example.


 ## x is the first level strata index, here I have 3 strata.
 x=c(rep(1,5), rep(2,5), rep(3,5))

 ## within x, there is a second strata indexed by t=0 and t=1
 t=rep(c(0,0,1,1,1),3)


 ## and within strata i and t=0 and t=1, I generate the random binomial
 variable respectively, and save in y
 y=rep(NA, length(x))
 for (i in 1:3)
 {
 y[(x==i)&(t==0)]=rbinom(2, 1, 0.2)
 y[(x==i)&(t==1)]=rbinom(3, 1, 0.8)
 }


 My question: is there any way to avoid the for loop, since I have the
 first
 level strata has thousands of strata. (Within each x stratum, the t only
 has
 2 levels, 0 and 1 )

 Thanks for any help!

 Carrie

[[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question about avoid the for loop

2010-06-03 Thread Carrie Li
Hi Jorge,

I found a problem.
I just wanted to check whether the answer is random, so I changed the code as follows:

 d <- data.frame(x, t)
 y <- with(d, ifelse(t == 0, rbinom(2, 1, 0.5), rnorm(3)))

 cbind(x, t,y)
  x t y
 [1,] 1 0 0.000
 [2,] 1 0 0.000
 [3,] 1 1 0.8920037
 [4,] 1 1 1.6695435
 [5,] 1 1 0.8289429
 [6,] 2 0 0.000
 [7,] 2 0 0.000
 [8,] 2 1 0.8289429
 [9,] 2 1 0.8920037
[10,] 2 1 1.6695435
[11,] 3 0 0.000
[12,] 3 0 0.000
[13,] 3 1 1.6695435
[14,] 3 1 0.8289429
[15,] 3 1 0.8920037

So, notice that the normal random variable only takes 3 values, and the same
goes for the binary part: it's all 0 0 1 1 1 for each stratum. Is there a way
to make them completely random and different across the strata?

Thank you again!
Carrie



On Thu, Jun 3, 2010 at 7:24 PM, Carrie Li carrieands...@gmail.com wrote:

 Thanks! Jorge
 Just one more question I don't get it even after checking help
 For option, why just using with(d,...), ifelse works on stratum indexed by
 x automatically ?
 Since in with, we didn't specify the stratum is indexed by x, what if you
 have another categorical variable in the data ?
 Thanks again!




 On Thu, Jun 3, 2010 at 12:21 PM, Jorge Ivan Velez 
 jorgeivanve...@gmail.com wrote:

 Hi Carrie,

 Here are two options:

 # Option 1
 d - data.frame(x, t)
 y - with(d, ifelse(t == 0, rbinom(2, 1, 0.2), rbinom(3, 1, 0.8)))
 y

 # Option 2  -- more general case, e.g. you do not know
 #  how many 0's and 1's you have within each strata
 spd - with(d, split(d, x))
 do.call(c, lapply(spd, function(comp)
 with(comp, ifelse(t == 0, rbinom(sum(t==0), 1, 0.2),
 rbinom(sum(t!=0), 1, 0.8)

 HTH,
 Jorge


 On Thu, Jun 3, 2010 at 11:49 AM, Carrie Li  wrote:

  Dear R-helpers,

 I would like to generate a binary random variable within a stratum's
 stratum. Here is a simple example.


 ## x is the first level strata index, here I have 3 strata.
 x=c(rep(1,5), rep(2,5), rep(3,5))

 ## within x, there is a second strata indexed by t=0 and t=1
 t=rep(c(0,0,1,1,1),3)


 ## and within strata i and t=0 and t=1, I generate the random binomial
 variable respectively, and save in y
 y=rep(NA, length(x))
 for (i in 1:3)
 {
 y[(x==i)&(t==0)]=rbinom(2, 1, 0.2)
 y[(x==i)&(t==1)]=rbinom(3, 1, 0.8)
 }


 My question: is there any way to avoid the for loop, since I have the
 first
 level strata has thousands of strata. (Within each x stratum, the t only
 has
 2 levels, 0 and 1 )

 Thanks for any help!

 Carrie

[[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question about avoid the for loop

2010-06-03 Thread Jorge Ivan Velez
Hi Carrie,

It works just fine in this case because you have the same number of 0's and
1's within each stratum. If that were not the case, option 1 would not
work. That's why I provided you a second option.
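
If you want every row to get its own independent draw regardless of the group
sizes, one simple sketch is to vectorise the probability rather than the
draws:

d <- data.frame(x, t)
p <- ifelse(d$t == 0, 0.2, 0.8)  # per-row success probability
y <- rbinom(nrow(d), 1, p)       # one independent Bernoulli draw per row
cbind(d, y)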

Best,
Jorge


On Thu, Jun 3, 2010 at 7:24 PM, Carrie Li  wrote:

 Thanks! Jorge
 Just one more question I don't get it even after checking help
 For option, why just using with(d,...), ifelse works on stratum indexed by
 x automatically ?
 Since in with, we didn't specify the stratum is indexed by x, what if you
 have another categorical variable in the data ?
 Thanks again!




 On Thu, Jun 3, 2010 at 12:21 PM, Jorge Ivan Velez 
 jorgeivanve...@gmail.com wrote:

 Hi Carrie,

 Here are two options:

 # Option 1
 d - data.frame(x, t)
 y - with(d, ifelse(t == 0, rbinom(2, 1, 0.2), rbinom(3, 1, 0.8)))
 y

 # Option 2  -- more general case, e.g. you do not know
 #  how many 0's and 1's you have within each strata
 spd - with(d, split(d, x))
 do.call(c, lapply(spd, function(comp)
 with(comp, ifelse(t == 0, rbinom(sum(t==0), 1, 0.2),
 rbinom(sum(t!=0), 1, 0.8)

 HTH,
 Jorge


 On Thu, Jun 3, 2010 at 11:49 AM, Carrie Li  wrote:

  Dear R-helpers,

 I would like to generate a binary random variable within a stratum's
 stratum. Here is a simple example.


 ## x is the first level strata index, here I have 3 strata.
 x=c(rep(1,5), rep(2,5), rep(3,5))

 ## within x, there is a second strata indexed by t=0 and t=1
 t=rep(c(0,0,1,1,1),3)


 ## and within strata i and t=0 and t=1, I generate the random binomial
 variable respectively, and save in y
 y=rep(NA, length(x))
 for (i in 1:3)
 {
 y[(x==i)&(t==0)]=rbinom(2, 1, 0.2)
 y[(x==i)&(t==1)]=rbinom(3, 1, 0.8)
 }


 My question: is there any way to avoid the for loop, since I have the
 first
 level strata has thousands of strata. (Within each x stratum, the t only
 has
 2 levels, 0 and 1 )

 Thanks for any help!

 Carrie

[[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question about avoid the for loop

2010-06-03 Thread Carrie Li
Yes, in my case here, each stratum has the same number of 0's and 1's. But I
want the y to be randomly generated within each stratum, so y should show
some difference across the strata. (At least for the rnorm part we would see
the randomness much more clearly.)
(I hope what I am asking here is clear to you.)

Using option 2 you provided previously
the output:

 cbind(x, t,y)
  x t y
 [1,] 1 0 0.000
 [2,] 1 0 0.000
 [3,] 1 1 0.8920037
 [4,] 1 1 1.6695435
 [5,] 1 1 0.8289429
 [6,] 2 0 0.000
 [7,] 2 0 0.000
 [8,] 2 1 0.8289429
 [9,] 2 1 0.8920037
[10,] 2 1 1.6695435
[11,] 3 0 0.000
[12,] 3 0 0.000
[13,] 3 1 1.6695435
[14,] 3 1 0.8289429
[15,] 3 1 0.8920037


For the rnorm part, it seems to take only 3 values: 1.66954, 0.82894, and 0.8920



On Thu, Jun 3, 2010 at 8:16 PM, Jorge Ivan Velez
jorgeivanve...@gmail.comwrote:

 Hi Carrie,

 It works just fine in this case because you have the same number of 0's and
 1's within each strata. If that would not be the case, option 1 would not
 work. That's why I provided you a second option.

 Best,
 Jorge


 On Thu, Jun 3, 2010 at 7:24 PM, Carrie Li  wrote:

 Thanks! Jorge
 Just one more question I don't get it even after checking help
 For option, why just using with(d,...), ifelse works on stratum indexed by
 x automatically ?
 Since in with, we didn't specify the stratum is indexed by x, what if you
 have another categorical variable in the data ?
 Thanks again!




 On Thu, Jun 3, 2010 at 12:21 PM, Jorge Ivan Velez 
 jorgeivanve...@gmail.com wrote:

 Hi Carrie,

 Here are two options:

 # Option 1
 d - data.frame(x, t)
 y - with(d, ifelse(t == 0, rbinom(2, 1, 0.2), rbinom(3, 1, 0.8)))
 y

 # Option 2  -- more general case, e.g. you do not know
 #  how many 0's and 1's you have within each strata
 spd - with(d, split(d, x))
 do.call(c, lapply(spd, function(comp)
 with(comp, ifelse(t == 0, rbinom(sum(t==0), 1, 0.2),
 rbinom(sum(t!=0), 1, 0.8)

 HTH,
 Jorge


 On Thu, Jun 3, 2010 at 11:49 AM, Carrie Li  wrote:

  Dear R-helpers,

 I would like to generate a binary random variable within a stratum's
 stratum. Here is a simple example.


 ## x is the first level strata index, here I have 3 strata.
 x=c(rep(1,5), rep(2,5), rep(3,5))

 ## within x, there is a second strata indexed by t=0 and t=1
 t=rep(c(0,0,1,1,1),3)


 ## and within strata i and t=0 and t=1, I generate the random binomial
 variable respectively, and save in y
 y=rep(NA, length(x))
 for (i in 1:3)
 {
 y[(x==i)&(t==0)]=rbinom(2, 1, 0.2)
 y[(x==i)&(t==1)]=rbinom(3, 1, 0.8)
 }


 My question: is there any way to avoid the for loop, since I have the
 first
 level strata has thousands of strata. (Within each x stratum, the t only
 has
 2 levels, 0 and 1 )

 Thanks for any help!

 Carrie

[[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.






[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

