Re: [R] Unsubscribe please

2013-04-17 Thread Pascal Oettli

Hello,

Don't reply only to me.

1) Filter the unwanted mails,
2) It takes a few days to unsubscribe you.

Regards,
Pascal


On 04/17/2013 02:59 PM, bert verleysen (beverconsult) wrote:

I did this, but I still receive too many mails

Bert Verleysen
00 32 (0)477 874 272



searching together for generative organising


-----Original Message-----
From: Pascal Oettli [mailto:kri...@ymail.com]
Sent: Wednesday, 17 April 2013 6:33
To: Bert Verleysen (beverconsult)
CC: R-help@r-project.org
Subject: Re: [R] Unsubscribe please

Hi,

Do it yourself:
https://stat.ethz.ch/mailman/listinfo/r-help

Hint:
Bottom of the page ("To unsubscribe from R-help")

Regards,
Pascal


On 04/17/2013 06:33 AM, Bert Verleysen (beverconsult) wrote:



Sent from my iPad
Bert Verleysen
00 32 (0)477 874 272
www.beverconsult.be

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide

http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.







__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Transformation of a variable in a dataframe

2013-04-17 Thread David Winsemius


On Apr 16, 2013, at 10:33 PM, jpm miao wrote:


Hi,
  I have a dataframe with two variables, A and B. I transform the two
variables, name them C and D, and save them in a dataframe dfcd. However, I
wonder why I can't call them by dfcd$C and dfcd$D?


Because you never assigned anything to dfcd$C. It's going to be more
successful if you use:


dfab[["A"]]*2
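
For example, a minimal (untested) sketch along those lines, using the data
from your message, that keeps the names C and D:

dfab <- data.frame(A = c(1, 2, 3), B = c(4, 6, 7))
# name the new columns explicitly when building the new data frame
dfcd <- data.frame(C = dfab[["A"]] * 2, D = dfab[["B"]] * 3)
dfcd$C
# [1] 2 4 6
dfcd$D
# [1] 12 18 21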



  Thanks,

Miao


A = c(1,2,3)
B = c(4,6,7)
dfab <- data.frame(A, B)
C = dfab["A"]*2
D = dfab["B"]*3
dfcd <- data.frame(C, D)
dfcd

 A  B
1 2 12
2 4 18
3 6 21

dfcd$C

NULL

dfcd$A

[1] 2 4 6

[[alternative HTML version deleted]]



David Winsemius, MD
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Merging big data.frame

2013-04-17 Thread avinash sahu
Hi all,

I am trying to merge 2 big data.frames. The problem is that merge is memory
intensive, so R runs out of memory with the error: cannot allocate vector of size
360.1 Mb. To overcome this, I am exploring the option of using the data.table
package. But it is not helping in terms of memory: the merge in data.table is
fast but not memory efficient, and a similar error occurs.
My inputs are
inp1
 V1 V2
1  a i1
2  a i2
3  a i3
4  a i4
5  b i5
6  c i6

inp2
  V1 V2
1  a  x
2  b  x
3  a  y
4  c  z

I want  merge(x = inp1, y = inp2, by.x = "V1", by.y = "V1")
so the output

   V1 V2.x V2.y
1   a   i1    x
2   a   i1    y
3   a   i2    x
4   a   i2    y
5   a   i3    x
6   a   i3    y
7   a   i4    x
8   a   i4    y
9   b   i5    x
10  c   i6    z

Is there a way to do this without using merge or data.table? Or is there
any other solution that is more efficient and uses less memory?

thanks
avi

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Merging big data.frame

2013-04-17 Thread Jeff Newmiller
check out the sqldf package
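
A rough, untested sketch of the idea, using the column names from your example
(sqldf does the join in SQLite, and its dbname argument lets the work run
through an on-disk database rather than RAM):

library(sqldf)
out <- sqldf("SELECT a.V1, a.V2 AS V2x, b.V2 AS V2y
              FROM inp1 a JOIN inp2 b ON a.V1 = b.V1",
             dbname = tempfile())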
---------------------------------------------------------------------------
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.

avinash sahu avinash.s...@gmail.com wrote:

Hi all,

I am trying to merge 2 big data.frame. The problem is merge is memory
intensive so R is going out of memory error: cannot allocate vector of
size
360.1 Mb. To overcome this, I am exploring option of using data.table
package. But its not helping in term of memory as merge in data.table
is
fast but not memory efficient. Similar error is coming.
My inputs are
inp1
 V1 V2
1  a i1
2  a i2
3  a i3
4  a i4
5  b i5
6  c i6

inp2
  V1 V2
1  a  x
2  b  x
3  a  y
4  c  z

I want  merge(x=inp1, y=inp2, by.x=V1, by.y=V1)
so the output

 V1 V2.x V2.y
1   a   i1x
2   a   i1y
3   a   i2x
4   a   i2y
5   a   i3x
6   a   i3y
7   a   i4x
8   a   i4y
9   b   i5x
10  c   i6z

Is there a way to do this without using merge in data.table? or Is
there
any other solution to do this in more efficient and less memory ?

thanks
avi

   [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Merge

2013-04-17 Thread Farnoosh
Thanks a lot:)

Sent from my iPad

On Apr 16, 2013, at 10:15 PM, arun smartpink...@yahoo.com wrote:

 Hi Farnoosh,
 You can use either ?merge() or plyr's ?join()
 DataA <- read.table(text="
 ID v1
 1 10
 2 1
 3 22
 4 15
 5 3
 6 6
 7 8
 ", sep="", header=TRUE)
 
 DataB <- read.table(text="
 ID v2
 2 yes
 5 no
 7 yes
 ", sep="", header=TRUE, stringsAsFactors=FALSE)
 
 merge(DataA, DataB, by="ID", all.x=TRUE)
 #  ID v1   v2
 #1  1 10 NA
 #2  2  1  yes
 #3  3 22 NA
 #4  4 15 NA
 #5  5  3   no
 #6  6  6 NA
 #7  7  8  yes
  library(plyr)
  join(DataA, DataB, by="ID", type="left")
 #  ID v1   v2
 #1  1 10 NA
 #2  2  1  yes
 #3  3 22 NA
 #4  4 15 NA
 #5  5  3   no
 #6  6  6 NA
 #7  7  8  yes
 A.K.
 
 
 
 
 
 
 From: farnoosh sheikhi farnoosh...@yahoo.com
 To: smartpink...@yahoo.com smartpink...@yahoo.com 
 Sent: Wednesday, April 17, 2013 12:52 AM
 Subject: Merge
 
 
 
 Hi Arun,
 
 I want to merge a data set with another data frame with 2 columns and keep 
 the sample size of the DataA.
 
  DataA          DataB          DataCombine
  ID v1          ID V2          ID v1 v2
  1  10          2  yes         1  10 NA
  2  1           5  no          2  1  yes
  3  22          7  yes         3  22 NA
  4  15                         4  15 NA
  5  3                          5  3  no
  6  6                          6  6  NA
  7  8                          7  8  yes
 
 
 Thanks a lot for your help and time.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] the joy of spreadsheets (off-topic)

2013-04-17 Thread Shane Carey
Can you resend this link please?

Thanks


On Tue, Apr 16, 2013 at 10:33 PM, Jim Lemon j...@bitwrit.com.au wrote:

 On 04/17/2013 03:25 AM, Sarah Goslee wrote:

 ...
 Ouch.

 (Note: I know nothing about the site, the author of the article, or
 the study in question. I was pointed to it by someone else. But if
 true: highly problematic.)

 Sarah

  There seem to be three major problems described here, and only one is
 marginally related to Excel (and similar spreadsheets). Cherry picking data
 is all too common. Almost anyone who reviews papers for publication will
 have encountered it, and there are excellent books describing examples that
 have had great influence on public policy.

 Similarly, applying obscure and sometimes inappropriate statistical
 methods that produce the desired results when nothing else will appears
 with depressing frequency.

 The final point does relate to Excel and any application that hides what
 is going on to the casual observer. I will treasure this URL to give to
 anyone who chastises my moaning when I have to perform some task in Excel.
 It is not an error in the application (although these certainly exist) but
 a salutary caution to those who think that if a reasonable-looking number
 appears in a cell, it must be the correct answer. I have found not one, but
 two such errors in the simple calculation of a birthday age from the date
 of birth and date of death.

 Jim

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Shane

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] I don't understand the 'order' function

2013-04-17 Thread Patrick Burns

There is a blog post about this:

http://www.portfolioprobe.com/2012/07/26/r-inferno-ism-order-is-not-rank/

And proof that it is possible to confuse them
even when you know the difference.

Pat

On 16/04/2013 19:10, Julio Sergio wrote:

Julio Sergio juliosergio at gmail.com writes:



I thought I've understood the 'order' function, using simple examples like:


Thanks to you all!... As Sarah said, what was damaged was my understanding (
;-) )... and as Duncan said, I was confusing 'order' with 'rank',
thanks! Now I understand the 'order' function.

   -Sergio

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Patrick Burns
pbu...@pburns.seanet.com
twitter: @burnsstat @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of:
 'Impatient R'
 'The R Inferno'
 'Tao Te Programming')

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] the joy of spreadsheets (off-topic)

2013-04-17 Thread peter dalgaard

On Apr 17, 2013, at 10:16 , Shane Carey wrote:

 Can you resend this link please?
 

Psst:

https://stat.ethz.ch/pipermail/r-help/2013-April/351669.html


-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Creating a vector with repeating dates

2013-04-17 Thread Katherine Gobin
Dear R forum

I have a data.frame

df = data.frame(dates = c("4/15/2013", "4/14/2013", "4/13/2013", "4/12/2013"),
values = c(47, 38, 56, 92))

I need to create a vector by repeating the dates as

"Current_date", "4/15/2013", "4/14/2013", "4/13/2013", "4/12/2013", "Current_date",
"4/15/2013", "4/14/2013", "4/13/2013", "4/12/2013", "Current_date", "4/15/2013",
"4/14/2013", "4/13/2013", "4/12/2013"

i.e. I need to create a new vector as given below which I need to use for some 
other purpose.

Current_date
4/15/2013
4/14/2013
4/13/2013
4/12/2013
Current_date
4/15/2013
4/14/2013
4/13/2013
4/12/2013
Current_date
4/15/2013
4/14/2013
4/13/2013
4/12/2013

Is it possible to construct such a column?

Regards

Katherine



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Creating a vector with repeating dates

2013-04-17 Thread andrija djurovic
?rep


On Wed, Apr 17, 2013 at 11:11 AM, Katherine Gobin katherine_go...@yahoo.com
 wrote:

 Dear R forum

 I have a data.frame

 df = data.frame(dates = c(4/15/2013, 4/14/2013, 4/13/2013,
 4/12/2013), values = c(47, 38, 56, 92))

 I need to to create a vector by repeating the dates as

 Current_date, 4/15/2013, 4/14/2013, 4/13/2013, 4/12/2013,
 Current_date, 4/15/2013, 4/14/2013, 4/13/2013, 4/12/2013, Current_date,
 4/15/2013, 4/14/2013, 4/13/2013, 4/12/2013

 i.e. I need to create a new vector as given below which I need to use for
 some other purpose.

 Current_date
 4/15/2013
 4/14/2013
 4/13/2013
 4/12/2013
 Current_date
 4/15/2013
 4/14/2013
 4/13/2013
 4/12/2013
 Current_date
 4/15/2013
 4/14/2013
 4/13/2013
 4/12/2013

 Is it possible to construct such a
  column?

 Regards

 Katherine



 [[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Understanding why a GAM can't suppress an intercept

2013-04-17 Thread Simon Wood

hi Andrew.

gam does suppress the intercept, it's just that this doesn't force the 
smooth through the intercept in the way that you would like. Basically 
for the parameteric component of the model '-1' behaves exactly like it 
does in 'lm' (it's using the same code). The smooths are 'added on' to 
the parametric component of the model, with sum to zero constraints to 
force identifiability.


There is a solution to forcing a spline through a particular point at
http://r.789695.n4.nabble.com/Use-pcls-in-quot-mgcv-quot-package-to-achieve-constrained-cubic-spline-td4660966.html
(i.e. the R help thread Re: [R] Use pcls in mgcv package to achieve 
constrained cubic spline)
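
A minimal (untested) sketch of the behaviour, with made-up data:

library(mgcv)
set.seed(1)
x <- runif(200); y <- 2*x + rnorm(200, sd = 0.1)
m <- gam(y ~ s(x) - 1)          # intercept dropped from the parametric part
predict(m, data.frame(x = 0))   # generally not 0; the smooth still sums to zero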


best,
Simon

On 16/04/13 22:36, Andrew Crane-Droesch wrote:

  Dear List,

I've just tried to specify a GAM without an intercept -- I've got one
of the (rare) cases where it is appropriate for E(y) - 0 as X -0.
Naively running a GAM with the -1 appended to the formula and the
calling predict.gam, I see that the model isn't behaving as expected.

I don't understand why this would be.  Google turns up this old R help
thread: http://r.789695.n4.nabble.com/GAM-without-intercept-td4645786.html

Simon writes:

 *Smooth terms are constrained to sum to zero over the covariate
 values. **
 **This is an identifiability constraint designed to avoid
 confounding with **
 **the intercept (particularly important if you have more than one
 smooth). *
 If you remove the intercept from you model altogether (m2) then the
 smooth will still sum to zero over the covariate values, which in
 your
 case will mean that the smooth is quite a long way from the data.
 When
 you include the intercept (m1) then the intercept is effectively
 shifting the constrained curve up towards the data, and you get a
 nice fit.

Why?  I haven't read Simon's book in great detail, though I have read
Ruppert et al.'s Semiparametric Regression.  I don't see a reason why
a penalized spline model shouldn't equal the intercept (or zero) when
all of the regressors equals zero.

Is anyone able to help with a bit of intuition?  Or relevant passages
from a good description of why this would be the case?

Furthermore, why does the -1 formula specification work if it
doesn't work as intended by for example lm?

Many thanks,
Andrew






[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603   http://people.bath.ac.uk/sw283

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Creating a vector with repeating dates

2013-04-17 Thread Katherine Gobin
Dear Andrija Djurovic,

Thanks for the suggestion. I am aware of rep. However, here I need to repeat
not only dates, but also the string "Current_date". Thus, I need to create a vector
(to be included in some other data.frame) with the name, say, dt, which will
contain

dt
Current_date

4/15/2013

4/14/2013

4/13/2013

4/12/2013

Current_date

4/15/2013

4/14/2013

4/13/2013

4/12/2013

Current_date

4/15/2013

4/14/2013

4/13/2013

4/12/2013

So this is a combination of dates and a string. Hence, I am just wondering whether
it is possible to create such a vector or not.

Regards

Katherine


--- On Wed, 17/4/13, andrija djurovic djandr...@gmail.com wrote:

From: andrija djurovic djandr...@gmail.com
Subject: Re: [R] Creating a vector with repeating dates
To: Katherine Gobin katherine_go...@yahoo.com
Cc: r-help@r-project.org r-help@r-project.org
Date: Wednesday, 17 April, 2013, 10:14 AM

?rep

On Wed, Apr 17, 2013 at 11:11 AM, Katherine Gobin katherine_go...@yahoo.com 
wrote:

Dear R forum



I have a data.frame



df = data.frame(dates = c(4/15/2013, 4/14/2013, 4/13/2013, 4/12/2013), 
values = c(47, 38, 56, 92))



I need to to create a vector by repeating the dates as



Current_date, 4/15/2013, 4/14/2013, 4/13/2013, 4/12/2013,  Current_date, 
4/15/2013, 4/14/2013, 4/13/2013, 4/12/2013, Current_date, 4/15/2013, 4/14/2013, 
4/13/2013, 4/12/2013



i.e. I need to create a new vector as given below which I need to use for some 
other purpose.



Current_date

4/15/2013

4/14/2013

4/13/2013

4/12/2013

Current_date

4/15/2013

4/14/2013

4/13/2013

4/12/2013

Current_date

4/15/2013

4/14/2013

4/13/2013

4/12/2013



Is it possible to construct such a

 column?



Regards



Katherine







        [[alternative HTML version deleted]]




__

R-help@r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help

PLEASE do read the posting guide http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.





[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Creating a vector with repeating dates

2013-04-17 Thread Rui Barradas

Hello,

Try the following.

rep(c("Current_date", as.character(df$dates)), 3)


Hope this helps,

Rui Barradas

Em 17-04-2013 10:11, Katherine Gobin escreveu:

Dear R forum

I have a data.frame

df = data.frame(dates = c(4/15/2013, 4/14/2013, 4/13/2013, 4/12/2013), 
values = c(47, 38, 56, 92))

I need to to create a vector by repeating the dates as

Current_date, 4/15/2013, 4/14/2013, 4/13/2013, 4/12/2013,  Current_date, 
4/15/2013, 4/14/2013, 4/13/2013, 4/12/2013, Current_date, 4/15/2013, 4/14/2013, 4/13/2013, 4/12/2013

i.e. I need to create a new vector as given below which I need to use for some 
other purpose.

Current_date
4/15/2013
4/14/2013
4/13/2013
4/12/2013
Current_date
4/15/2013
4/14/2013
4/13/2013
4/12/2013
Current_date
4/15/2013
4/14/2013
4/13/2013
4/12/2013

Is it possible to construct such a
  column?

Regards

Katherine



[[alternative HTML version deleted]]



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Creating a vector with repeating dates

2013-04-17 Thread andrija djurovic
Hi.

Here are some examples that can maybe help you:

a <- "Current date"
b <- Sys.Date() - 1:5
a
b

class(a)
class(b)

c(a,b)
mode(b)
as.numeric(b)
class(c(a,b))

c(a, as.character(b))
class(c(a,b))
class(c(a,as.character(b)))

Hope this helps.


On Wed, Apr 17, 2013 at 11:21 AM, Katherine Gobin katherine_go...@yahoo.com
 wrote:

 Dear Andrija Djurovic,

 Thanks for the suggestion. Ia m aware of rep. However, here I need to
 repeat not only dates, but a string Current_date. Thus, I need to create
 a vector ( to be included in some other data.frame) with the name say dt
 which will contain

 dt
 Current_date
 4/15/2013
 4/14/2013
 4/13/2013
 4/12/2013
 Current_date
 4/15/2013
 4/14/2013
 4/13/2013
 4/12/2013
 Current_date
 4/15/2013
 4/14/2013
 4/13/2013
 4/12/2013

 So this is combination of dates and a string. Hence, I am just wondering
 if it is possible to create such a vector or not?

 Regards

 Katherine


 --- On *Wed, 17/4/13, andrija djurovic djandr...@gmail.com* wrote:


 From: andrija djurovic djandr...@gmail.com
 Subject: Re: [R] Creating a vector with repeating dates
 To: Katherine Gobin katherine_go...@yahoo.com
 Cc: r-help@r-project.org r-help@r-project.org
 Date: Wednesday, 17 April, 2013, 10:14 AM

 ?rep


 On Wed, Apr 17, 2013 at 11:11 AM, Katherine Gobin
 katherine_go...@yahoo.com wrote:

 Dear R forum

 I have a data.frame

 df = data.frame(dates = c(4/15/2013, 4/14/2013, 4/13/2013,
 4/12/2013), values = c(47, 38, 56, 92))

 I need to to create a vector by repeating the dates as

 Current_date, 4/15/2013, 4/14/2013, 4/13/2013, 4/12/2013,
 Current_date, 4/15/2013, 4/14/2013, 4/13/2013, 4/12/2013, Current_date,
 4/15/2013, 4/14/2013, 4/13/2013, 4/12/2013

 i.e. I need to create a new vector as given below which I need to use for
 some other purpose.

 Current_date
 4/15/2013
 4/14/2013
 4/13/2013
 4/12/2013
 Current_date
 4/15/2013
 4/14/2013
 4/13/2013
 4/12/2013
 Current_date
 4/15/2013
 4/14/2013
 4/13/2013
 4/12/2013

 Is it possible to construct such a
  column?

 Regards

 Katherine



 [[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Creating a vector with repeating dates

2013-04-17 Thread Jim Lemon

On 04/17/2013 07:11 PM, Katherine Gobin wrote:

Dear R forum

I have a data.frame

df = data.frame(dates = c(4/15/2013, 4/14/2013, 4/13/2013, 4/12/2013), 
values = c(47, 38, 56, 92))

I need to to create a vector by repeating the dates as

Current_date, 4/15/2013, 4/14/2013, 4/13/2013, 4/12/2013,  Current_date, 
4/15/2013, 4/14/2013, 4/13/2013, 4/12/2013, Current_date, 4/15/2013, 4/14/2013, 4/13/2013, 4/12/2013

i.e. I need to create a new vector as given below which I need to use for some 
other purpose.

Current_date
4/15/2013
4/14/2013
4/13/2013
4/12/2013
Current_date
4/15/2013
4/14/2013
4/13/2013
4/12/2013
Current_date
4/15/2013
4/14/2013
4/13/2013
4/12/2013

Is it possible to construct such a
  column?


Hi Katherine,
How about:

rep(c("Current date", paste(4, 15:12, 2013, sep="/")), 3)

Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] I don't understand the 'order' function

2013-04-17 Thread peter dalgaard

On Apr 17, 2013, at 10:41 , Patrick Burns wrote:

 There is a blog post about this:
 
 http://www.portfolioprobe.com/2012/07/26/r-inferno-ism-order-is-not-rank/
 
 And proof that it is possible to confuse them
 even when you know the difference.

It usually helps to remember that x[order(x)] is sort(x)  (and that x[rank(x)] 
is nothing of the sort).

It's somewhat elusive, but not impossible to realize that the two are inverses 
(if no ties). Duncan M. indicated it nicely earlier in the thread: rank() is 
how to permute ordered data to get those observed, order is how to permute the 
data to put them in order. 

They are inverses in terms of composition of permutations, not as 
transformations of sets of integers: rank(order(x)) and order(rank(x)) are both 
equal to order(x), whereas

> x <- rnorm(5)
> rank(x)
[1] 4 3 5 2 1
> order(x)
[1] 5 4 2 1 3
> ## permutation matrix
> P <- matrix(0,5,5); diag(P[,order(x)]) <- 1
> P %*% 1:5
     [,1]
[1,]    5
[2,]    4
[3,]    2
[4,]    1
[5,]    3
> P2 <- matrix(0,5,5); diag(P2[,rank(x)]) <- 1
> P2 %*% 1:5
     [,1]
[1,]    4
[2,]    3
[3,]    5
[4,]    2
[5,]    1
> P %*% P2
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    0    0    0
[2,]    0    1    0    0    0
[3,]    0    0    1    0    0
[4,]    0    0    0    1    0
[5,]    0    0    0    0    1

Or, as Duncan put it: 
rank(x)[order(x)]  and order(x)[rank(x)] are 1:length(x).

The thing that tends to blow my mind is that order(order(x))==rank(x). I can't 
seem to get my intuition to fathom it, although there's a fairly easy proof in 
that 

1:N == sort(order(x)) == order(x)[order(order(x))] == order(x)[rank(x)]
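
A quick numerical check of these identities (no ties, so rank() stays integer-valued):

set.seed(1)
x <- rnorm(7)
all(order(order(x)) == rank(x))   # TRUE
all(rank(x)[order(x)] == 1:7)     # TRUE
all(order(x)[rank(x)] == 1:7)     # TRUE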

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Spatial Ananlysis: zero.policy=TRUE doesn't work for no neighbour regions??

2013-04-17 Thread Roger Bivand
Molo kurz_m at uni-hohenheim.de writes:

 
...
 *As there are some regions without neighbours in my data I use the following
 code to create the Weights Matrix:*
 
  W_Matrix <- nb2listw(location_nbq, style="W", zero.policy=TRUE)
 W_Matrix
 
 *And get this Output:*
 
...
 (Error in print.listw(list(style = "W", neighbours = list(c(23L, 31L, 42L
  :
   regions with no neighbours found, use zero.policy=TRUE)
 
 As I use zero.policy=TRUE I just don't understand what I'm doing wrong...
 My question would be: How could I create a Weights Matrix allowing for
 no-neighbour areas? 

You have not grasped the fact that your object W_Matrix has been created
correctly, but that spdep:::print.listw also needs a zero.policy=TRUE, so:

print(W_Matrix, zero.policy=TRUE)

will work. If you want to set this globally for all subsequent function
calls in your current session, use set.ZeroPolicyOption(TRUE).

Hope this clarifies,

Roger

PS. Consider posting questions of this kind to R-sig-geo

 
 Thanks
 Michael

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Problem with DateVisit-gives wrong year?

2013-04-17 Thread Pancho Mulongeni
Hi, I have the following factor of dates that I want to convert to the Date class
so I can extract the month:
 test.date
 [1] 14/05/2012 14/05/2012 14/05/2012 14/05/2012 14/05/2012 14/05/2012
 [7] 14/05/2012 14/05/2012 14/05/2012 14/05/2012
201 Levels: 01/10/2012 01/11/2012 01/12/2012 02/07/2012 ... 28/09/2012
I use code below
ntest.date <- as.Date(test.date, '%d/%m/%y')

but the output has the wrong year, and the reverse order
ntest.date
 [1] 2020-05-14 2020-05-14 2020-05-14 2020-05-14 2020-05-14
 [6] 2020-05-14 2020-05-14 2020-05-14 2020-05-14 2020-05-14

What am I doing wrong?
I dare not say the word 'bug'
Thanks
Pancho Mulongeni
Research Assistant
PharmAccess Foundation
1 Fouché Street
Windhoek West
Windhoek
Namibia
 
Tel:   +264 61 419 000
Fax:  +264 61 419 001/2
Mob: +264 81 4456 286

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Regularized Regressions

2013-04-17 Thread Christos Giannoulis
Hi all,

I would greatly appreciate it if someone would be so kind as to share a
package or method that uses a regularized regression approach that balances
regression model performance against model complexity.

That said, I would be most grateful if there is an R package that combines
Ridge (sum of squared coefficients), Lasso (sum of absolute coefficients),
and Best Subsets (number of coefficients) as methods of regularized
regression.

Sincerely,

Christos Giannoulis

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with DateVisit-gives wrong year?

2013-04-17 Thread Jim Lemon

On 04/17/2013 09:18 PM, Pancho Mulongeni wrote:

Hi I have the following factor of dates that I want to converted to Date class 
so I can extract the month

test.date

  [1] 14/05/2012 14/05/2012 14/05/2012 14/05/2012 14/05/2012 14/05/2012
  [7] 14/05/2012 14/05/2012 14/05/2012 14/05/2012
201 Levels: 01/10/2012 01/11/2012 01/12/2012 02/07/2012 ... 28/09/2012
I use code below
ntest.date-as.Date(test.date,'%d/%m/%y')

but the output has the wrong year, and the reverse order
ntest.date
  [1] 2020-05-14 2020-05-14 2020-05-14 2020-05-14 2020-05-14
  [6] 2020-05-14 2020-05-14 2020-05-14 2020-05-14 2020-05-14

What am I doing wrong?
I dare not say the word 'bug'


Hey Pancho,
It is not a bug, it is the (letter) case. Try:

ntest.date <- as.Date(test.date, "%d/%m/%Y")

Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with DateVisit-gives wrong year?

2013-04-17 Thread arun
HI,
test.date <- rep(c("14/05/2012", "01/10/2012", "28/09/2012"), each=6)
 as.Date(test.date, "%d/%m/%Y")
# [1] "2012-05-14" "2012-05-14" "2012-05-14" "2012-05-14" "2012-05-14"
 #[6] "2012-05-14" "2012-10-01" "2012-10-01" "2012-10-01" "2012-10-01"
#[11] "2012-10-01" "2012-10-01" "2012-09-28" "2012-09-28" "2012-09-28"
#[16] "2012-09-28" "2012-09-28" "2012-09-28"
 test.date1 <- factor(test.date)
 as.Date(test.date1, "%d/%m/%Y")
# [1] "2012-05-14" "2012-05-14" "2012-05-14" "2012-05-14" "2012-05-14"
 #[6] "2012-05-14" "2012-10-01" "2012-10-01" "2012-10-01" "2012-10-01"
#[11] "2012-10-01" "2012-10-01" "2012-09-28" "2012-09-28" "2012-09-28"
#[16] "2012-09-28" "2012-09-28" "2012-09-28"
A.K.




- Original Message -
From: Pancho Mulongeni p.mulong...@namibia.pharmaccess.org
To: r-help@r-project.org r-help@r-project.org
Cc: 
Sent: Wednesday, April 17, 2013 7:18 AM
Subject: [R] Problem with DateVisit-gives wrong year?

Hi I have the following factor of dates that I want to converted to Date class 
so I can extract the month
 test.date
[1] 14/05/2012 14/05/2012 14/05/2012 14/05/2012 14/05/2012 14/05/2012
[7] 14/05/2012 14/05/2012 14/05/2012 14/05/2012
201 Levels: 01/10/2012 01/11/2012 01/12/2012 02/07/2012 ... 28/09/2012
I use code below
ntest.date-as.Date(test.date,'%d/%m/%y')

but the output has the wrong year, and the reverse order
ntest.date
[1] 2020-05-14 2020-05-14 2020-05-14 2020-05-14 2020-05-14
[6] 2020-05-14 2020-05-14 2020-05-14 2020-05-14 2020-05-14

What am I doing wrong?
I dare not say the word 'bug'
Thanks
Pancho Mulongeni
Research Assistant
PharmAccess Foundation
1 Fouché Street
Windhoek West
Windhoek
Namibia
 
Tel:   +264 61 419 000
Fax:  +264 61 419 001/2
Mob: +264 81 4456 286

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Transformation of a variable in a dataframe

2013-04-17 Thread arun
Hi,

You may also try:

dfab <- data.frame(A, B)
library(plyr)
 dfcd <- subset(mutate(dfab, C=A*2, D=B*3), select=-c(A,B))
#or
 dfcd1 <- subset(within(dfab, {D <- B*3; C <- A*2}), select=-c(A,B))
dfcd$C
#[1] 2 4 6
 dfcd$D
#[1] 12 18 21
A.K.



- Original Message -
From: jpm miao miao...@gmail.com
To: r-help r-help@r-project.org
Cc: 
Sent: Wednesday, April 17, 2013 1:33 AM
Subject: [R] Transformation of a variable in a dataframe

HI,
   I have a dataframe with two variable A, B. I transform the two variable
and name them as C, D and save it in a dataframe  dfcd. However, I wonder
why can't I call them by dfcd$C and dfcd$D?

   Thanks,

Miao

 A=c(1,2,3)
 B=c(4,6,7)
 dfab-data.frame(A,B)
 C=dfab[A]*2
 D=dfab[B]*3
 dfcd-data.frame(C,D)
 dfcd
  A  B
1 2 12
2 4 18
3 6 21
 dfcd$C
NULL
 dfcd$A
[1] 2 4 6

    [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] the joy of spreadsheets (off-topic)

2013-04-17 Thread Kevin Wright
On Tue, Apr 16, 2013 at 4:33 PM, Jim Lemon j...@bitwrit.com.au wrote:

 On 04/17/2013 03:25 AM, Sarah Goslee wrote:

The final point does relate to Excel and any application that hides what is
 going on to the casual observer. I will treasure this URL to give to anyone
 who chastises my moaning when I have to perform some task in Excel. It is
 not an error in the application (although these certainly exist) but a
 salutory caution to those who think that if a reasonable looking number
 appears in a cell, it must be the correct answer. I have found not one, but
 two such errors in the simple calculation of a birthday age from the date
 of birth and date of death.

 Jim


So there (maybe) was a bug in Excel.  Maybe hidden from the casual
observer.  And since Excel is not R, and we are R snobs, Excel is evil,
right?  But, wait.  Is it easier for a casual observer to detect a flaw
in the formula in Excel, or to find an incorrect array index in an R
script?  All ye who want to cast stones upon the interface of Excel should
ask yourselves if you have ever had a bug in R code.

Kevin (no fan of Excel either)



 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Kevin Wright

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] remove higher order interaction terms

2013-04-17 Thread Liviu Andronic
Dear all,
Consider the model below:

 x <- lm(mpg ~ cyl * disp * hp * drat, mtcars)
 summary(x)

Call:
lm(formula = mpg ~ cyl * disp * hp * drat, data = mtcars)

Residuals:
Min  1Q  Median  3Q Max
-3.5725 -0.6603  0.0108  1.1017  2.6956

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.070e+03  3.856e+02   2.776  0.01350 *
cyl  -2.084e+02  7.196e+01  -2.896  0.01052 *
disp -6.760e+00  3.700e+00  -1.827  0.08642 .
hp   -9.302e+00  3.295e+00  -2.823  0.01225 *
drat -2.824e+02  1.073e+02  -2.633  0.01809 *
cyl:disp  1.065e+00  5.034e-01   2.116  0.05038 .
cyl:hp1.587e+00  5.296e-01   2.996  0.00855 **
disp:hp   7.422e-02  3.461e-02   2.145  0.04769 *
cyl:drat  5.652e+01  2.036e+01   2.776  0.01350 *
disp:drat 1.824e+00  1.011e+00   1.805  0.08990 .
hp:drat   2.600e+00  9.226e-01   2.819  0.01236 *
cyl:disp:hp  -1.050e-02  4.518e-03  -2.323  0.03368 *
cyl:disp:drat-2.884e-01  1.392e-01  -2.071  0.05484 .
cyl:hp:drat  -4.428e-01  1.504e-01  -2.945  0.00950 **
disp:hp:drat -2.070e-02  9.568e-03  -2.163  0.04600 *
cyl:disp:hp:drat  2.923e-03  1.254e-03   2.331  0.03317 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.245 on 16 degrees of freedom
Multiple R-squared: 0.9284, Adjusted R-squared: 0.8612
F-statistic: 13.83 on 15 and 16 DF,  p-value: 2.007e-06


Is there a straightforward way to remove the highest order interaction
terms? Say:
cyl:disp:hp
cyl:disp:drat
cyl:hp:drat
disp:hp:drat
cyl:disp:hp:drat

I know I could do this:
 x <- lm(mpg ~ cyl * disp * hp * drat - cyl:disp:hp - cyl:disp:drat -
 cyl:hp:drat - disp:hp:drat - cyl:disp:hp:drat, mtcars)

But I was hoping for a more elegant solution. Regards,
Liviu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] remove higher order interaction terms

2013-04-17 Thread Marc Schwartz

On Apr 17, 2013, at 7:23 AM, Liviu Andronic landronim...@gmail.com wrote:

 Dear all,
 Consider the model below:
 
 x - lm(mpg ~ cyl * disp * hp * drat, mtcars)
 summary(x)
 
 Call:
 lm(formula = mpg ~ cyl * disp * hp * drat, data = mtcars)
 
 Residuals:
Min  1Q  Median  3Q Max
 -3.5725 -0.6603  0.0108  1.1017  2.6956
 
 Coefficients:
   Estimate Std. Error t value Pr(|t|)
 (Intercept)   1.070e+03  3.856e+02   2.776  0.01350 *
 cyl  -2.084e+02  7.196e+01  -2.896  0.01052 *
 disp -6.760e+00  3.700e+00  -1.827  0.08642 .
 hp   -9.302e+00  3.295e+00  -2.823  0.01225 *
 drat -2.824e+02  1.073e+02  -2.633  0.01809 *
 cyl:disp  1.065e+00  5.034e-01   2.116  0.05038 .
 cyl:hp1.587e+00  5.296e-01   2.996  0.00855 **
 disp:hp   7.422e-02  3.461e-02   2.145  0.04769 *
 cyl:drat  5.652e+01  2.036e+01   2.776  0.01350 *
 disp:drat 1.824e+00  1.011e+00   1.805  0.08990 .
 hp:drat   2.600e+00  9.226e-01   2.819  0.01236 *
 cyl:disp:hp  -1.050e-02  4.518e-03  -2.323  0.03368 *
 cyl:disp:drat-2.884e-01  1.392e-01  -2.071  0.05484 .
 cyl:hp:drat  -4.428e-01  1.504e-01  -2.945  0.00950 **
 disp:hp:drat -2.070e-02  9.568e-03  -2.163  0.04600 *
 cyl:disp:hp:drat  2.923e-03  1.254e-03   2.331  0.03317 *
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
 Residual standard error: 2.245 on 16 degrees of freedom
 Multiple R-squared: 0.9284,   Adjusted R-squared: 0.8612
 F-statistic: 13.83 on 15 and 16 DF,  p-value: 2.007e-06
 
 
 Is there a straightforward way to remove the highest order interaction
 terms? Say:
 cyl:disp:hp
 cyl:disp:drat
 cyl:hp:drat
 disp:hp:drat
 cyl:disp:hp:drat
 
 I know I could do this:
 x - lm(mpg ~ cyl * disp * hp * drat - cyl:disp:hp - cyl:disp:drat - 
 cyl:hp:drat - disp:hp:drat - cyl:disp:hp:drat, mtcars)
 
 But I was hoping for a more elegant solution. Regards,
 Liviu


If you only want up to say second order interactions:

 summary(lm(mpg ~ (cyl + disp + hp + drat) ^ 2, data = mtcars))

Call:
lm(formula = mpg ~ (cyl + disp + hp + drat)^2, data = mtcars)

Residuals:
Min  1Q  Median  3Q Max 
-3.5487 -1.6998  0.0894  1.2366  4.6138 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)  
(Intercept)  9.816e+01  4.199e+01   2.338   0.0294 *
cyl -1.656e+01  1.226e+01  -1.351   0.1910  
disp 1.333e-03  1.634e-01   0.008   0.9936  
hp  -1.936e-01  2.260e-01  -0.857   0.4014  
drat-8.913e+00  8.745e+00  -1.019   0.3197  
cyl:disp 2.134e-02  1.071e-02   1.992   0.0595 .
cyl:hp   3.074e-02  1.970e-02   1.560   0.1337  
cyl:drat 2.590e+00  2.601e+00   0.996   0.3307  
disp:hp -3.846e-04  3.906e-04  -0.985   0.3359  
disp:drat   -3.518e-02  3.951e-02  -0.890   0.3834  
hp:drat  1.210e-02  5.432e-02   0.223   0.8259  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.717 on 21 degrees of freedom
Multiple R-squared:  0.8623,Adjusted R-squared:  0.7967 
F-statistic: 13.15 on 10 and 21 DF,  p-value: 6.237e-07


This is covered in ?formula 
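
For instance, one (untested) way to see exactly which terms the ^ operator expands to:

attr(terms(mpg ~ (cyl + disp + hp + drat)^2), "term.labels")
# the main effects plus all two-way interactions, and nothing higher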

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] compare two different data frames

2013-04-17 Thread arun
Hi,
dat1 <- read.table(text="
V1    V2
A  1
B  2
C  1
D  3
", sep="", header=TRUE, stringsAsFactors=FALSE)
dat2 <- read.table(text="
V3    V2
AAA   1
BBB   2
CCC   3
", sep="", header=TRUE, stringsAsFactors=FALSE)
library(plyr)
join(dat1, dat2, by="V2", type="full")
#  V1 V2  V3
#1  A  1 AAA
#2  B  2 BBB
#3  C  1 AAA
#4  D  3 CCC
merge(dat1,dat2)
#  V2 V1  V3
#1  1  A AAA
#2  1  C AAA
#3  2  B BBB
#4  3  D CCC
A.K.

Dear R-users, 

I have the following 2 files; 

A 

V1    V2 
A      1 
B      2 
C      1 
D      3 

B 

V1         V2 
AAA       1 
BBB       2 
CCC       3 


I want to get this output 

C 
V1    V2     V3 
A      1       AAA 
B      2       BBB 
C      1       AAA 
D      3       CCC 

I want to compare A$V2 with B$V2, if it is the same, then append B$V1 to A. 

How to ?? 

Please

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] remove higher order interaction terms

2013-04-17 Thread Liviu Andronic
On Wed, Apr 17, 2013 at 2:33 PM, Marc Schwartz marc_schwa...@me.com wrote:
 If you only want up to say second order interactions:

 summary(lm(mpg ~ (cyl + disp + hp + drat) ^ 2, data = mtcars))

This is what I was looking for. Thank you so much.


 This is covered in ?formula

Indeed. I have tried to parse ?formula on several occasions in the past few
years, but never quite grasped it fully.

Thanks again,
Liviu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] On matrix calculation

2013-04-17 Thread Christofer Bogaso
Hello again,

Let say I have a matrix:

Mat <- matrix(1:12, 4, 3)

And a vector:

Vec <- 5:8

Now I want to do following:

Each element of row-i in 'Mat' will be divided by i-th element of Vec

Is there any direct way of doing that?

Thanks for your help

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Regularized Regressions

2013-04-17 Thread Christos Giannoulis
Merhaba, Hello to you too Mehmet (Yasu ki sena)

Thank you for your email and especially for sharing this package. I
appreciate it.

However, my feeling is that this package does not have the third component
of Best Subsets (pls correct me if I am wrong). It uses only a combination
of Ridge and Lasso.

If you happen to know any other package that uses all of them, I would
greatly appreciate it if you would be so kind as to share it. I tried to
search the CRAN lists but I am not sure I can find something like that.
That's why I was asking the R community.

Thank you again for your prompt response!

Cheers

Christos

On Wed, Apr 17, 2013 at 8:16 AM, Suzen, Mehmet msu...@gmail.com wrote:

 Yasu,

 Try Elastic nets:
 http://cran.r-project.org/web/packages/pensim/index.html

 There some other packages supporting elastic nets: Just search the CRAN

 Cheers,
 Mehmet


 On 17 April 2013 13:19, Christos Giannoulis cgiann...@gmail.com wrote:
  Hi all,
 
  I would greatly appreciate if someone was so kind and share with us a
  package or method that uses a regularized regression approach that
 balances
  a regression model performance and model complexity.
 
  That said I would be most grateful is there is an R-package that combines
  Ridge (sum of squares coefficients), Lasso: Sum of absolute coefficients
  and Best Subsets: Number of coefficients as methods of regularized
  regression.
 
  Sincerely,
 
  Christos Giannoulis
 
  [[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] On matrix calculation

2013-04-17 Thread arun
Hi,
Try:
sweep(Mat, 1, Vec, "/")
#          [,1] [,2]     [,3]
#[1,] 0.2000000    1 1.800000
#[2,] 0.3333333    1 1.666667
#[3,] 0.4285714    1 1.571429
#[4,] 0.5000000    1 1.500000


do.call(rbind, lapply(seq_len(nrow(Mat)), function(i) Mat[i,]/Vec[i]))
#          [,1] [,2]     [,3]
#[1,] 0.2000000    1 1.800000
#[2,] 0.3333333    1 1.666667
#[3,] 0.4285714    1 1.571429
#[4,] 0.5000000    1 1.500000
A.K.



- Original Message -
From: Christofer Bogaso bogaso.christo...@gmail.com
To: r-help r-help@r-project.org
Cc: 
Sent: Wednesday, April 17, 2013 8:39 AM
Subject: [R] On matrix calculation

Hello again,

Let say I have a matrix:

Mat - matrix(1:12, 4, 3)

And a vector:

Vec - 5:8

Now I want to do following:

Each element of row-i in 'Mat' will be divided by i-th element of Vec

Is there any direct way to doing that?

Thanks for your help

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to change the date into an interval of date?

2013-04-17 Thread arun
Hi,
Try:
evt_c.1 <- read.table(text="
patient_id   responsed_at
1    2010-5
1    2010-7
1    2010-8
1    2010-9
2    2010-5
2    2010-6
2    2010-7
", sep="", header=TRUE, stringsAsFactors=FALSE)
lst1 <- split(evt_c.1, evt_c.1$patient_id)
 res <- do.call(rbind, lapply(lst1, function(x)
{x1 <- as.numeric(gsub(".*\\-", "", x[,2])); x$t <- c(0, cumsum(diff(x1))); x}))
 row.names(res) <- 1:nrow(res)
 res
#  patient_id responsed_at t
#1  1   2010-5 0
#2  1   2010-7 2
#3  1   2010-8 3
#4  1   2010-9 4
#5  2   2010-5 0
#6  2   2010-6 1
#7  2   2010-7 2

#or
library(plyr)
res2 <- mutate(evt_c.1, t = ave(as.numeric(gsub(".*\\-", "", responsed_at)),
 patient_id, FUN=function(x) c(0, cumsum(diff(x)))))
res2
#  patient_id responsed_at t
#1  1   2010-5 0
#2  1   2010-7 2
#3  1   2010-8 3
#4  1   2010-9 4
#5  2   2010-5 0
#6  2   2010-6 1
#7  2   2010-7 2
 identical(res,res2)
#[1] TRUE

A.K.



 From: GUANGUAN LUO guanguan...@gmail.com
To: arun smartpink...@yahoo.com 
Sent: Wednesday, April 17, 2013 8:32 AM
Subject: Re: how to change the date into an interval of date?
 


thank you, and now i've got a table like this
 dput(head(evt_c.1,5)) structure(list(responsed_at = c(2010-05, 2010-07, 
 2010-08, 
2010-10, 2010-11), patient_id = c(2L, 2L, 2L, 2L, 2L), number = c(1, 
2, 3, 4, 5), response_id = c(77L, 1258L, 2743L, 4499L, 6224L),  session_id = 
c(2L, 61L, 307L, 562L, 809L), login = c(3002,  3002, 3002, 3002, 3002), 
clinique_basdai.fatigue = c(4, 5,  5, 6, 4), 

which i want is to add a column t, for example
now my table is like this:
patient_id   responsed_at
12010-5
12010-7
12010-8
12010-9
22010-5
22010-6
22010-7 

after add the column t

paient_id responsed_att
12010-5   0
12010-7   2
12010-8   3
12010-9   4
22010-5   0
22010-6   1
22010-7   2 




Le 17 avril 2013 14:23, arun smartpink...@yahoo.com a écrit :

Hi,
format() is one way.
library(zoo)
 as.yearmon(dat1$responsed_at)
#[1] May 2010 Jul 2010 Aug 2010 Oct 2010 Nov 2010 Dec 2010
 #[7] Jan 2011 Feb 2011 Mar 2011 Apr 2011 Jun 2011 Jul 2011
#[13] Aug 2011 Sep 2011 Oct 2011 Nov 2011 Dec 2011 Jan 2012
#[19] Mar 2012 May 2010
A.K.






__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] On matrix calculation

2013-04-17 Thread Berend Hasselman

On 17-04-2013, at 14:39, Christofer Bogaso bogaso.christo...@gmail.com wrote:

 Hello again,
 
 Let say I have a matrix:
 
 Mat - matrix(1:12, 4, 3)
 
 And a vector:
 
 Vec - 5:8
 
 Now I want to do following:
 
 Each element of row-i in 'Mat' will be divided by i-th element of Vec
 
 Is there any direct way to doing that?


What have you tried?

Because of recycling, the column-major storage of matrices, and the fact that
length(Vec) == nrow(Mat),

Mat / Vec

will do.
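
A quick check of the equivalence (sketch):

Mat <- matrix(1:12, 4, 3)
Vec <- 5:8
all.equal(Mat / Vec, sweep(Mat, 1, Vec, "/"))  # TRUE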

Berend
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] the joy of spreadsheets (off-topic)

2013-04-17 Thread Gabor Grothendieck
On Tue, Apr 16, 2013 at 1:25 PM, Sarah Goslee sarah.gos...@gmail.com wrote:
 Given that we occasionally run into problems with comparing Excel
 results to R results, and other spreadsheet-induced errors, I thought
 this might be of interest.

 http://www.nextnewdeal.net/rortybomb/researchers-finally-replicated-reinhart-rogoff-and-there-are-serious-problems

 The punchline:

 If this error turns out to be an actual mistake Reinhart-Rogoff made,
 well, all I can hope is that future historians note that one of the
 core empirical points providing the intellectual foundation for the
 global move to austerity in the early 2010s was based on someone
 accidentally not updating a row formula in Excel.

 Ouch.

 (Note: I know nothing about the site, the author of the article, or
 the study in question. I was pointed to it by someone else. But if
 true: highly problematic.)


Herndon, Ash and Pollin (HAP), the authors of the critique, found that
in the highest debt category the Excel error in Reinhart and Rogoff
(RR) was -0.3 percentage points, compared to a total error (from that
plus RR's other 2 mistakes) of -2.3 percentage points.  See Figure 1
of HAP. Thus aside from the dubiousness of attributing the coding
error in Excel to Excel itself it was not the main source of the
discrepancy.

Also even if one backs out all three errors that they found, the key
conclusion that GDP growth is declining with debt still occurs (but to
a lesser extent) as pointed out by RR in an initial responding email
reported by Bloomberg News.

The key takeaway here is really unrelated to Excel but rather is that
until data and analyses are shared or made public so that the analysis
can be reproduced one cannot have any real confidence in research
results.

RR
http://www.nber.org/papers/w15639.pdf

HAP
http://www.peri.umass.edu/fileadmin/pdf/working_papers/working_papers_301-350/WP322.pdf

Bloomberg News
http://www.bloomberg.com/news/2013-04-16/reinhart-rogoff-paper-cited-by-ryan-faulted-for-serious-errors-.html

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with DateVisit-gives wrong year?

2013-04-17 Thread Pancho Mulongeni
Thank you,
I see this was due to me using %y instead of %Y,
See ?strptime
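
For the record, a small illustration of the difference (sketch):

as.Date("14/05/2012", "%d/%m/%Y")  # "2012-05-14"
as.Date("14/05/2012", "%d/%m/%y")  # "2020-05-14" -- %y only consumes the "20"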

-Original Message-
From: arun [mailto:smartpink...@yahoo.com] 
Sent: 17 April 2013 12:31
To: Pancho Mulongeni
Cc: R help
Subject: Re: [R] Problem with DateVisit-gives wrong year?

HI,
test.date- rep(c(14/05/2012,01/10/2012,28/09/2012),each=6)
 as.Date(test.date,%d/%m/%Y)
# [1] 2012-05-14 2012-05-14 2012-05-14 2012-05-14 2012-05-14
 #[6] 2012-05-14 2012-10-01 2012-10-01 2012-10-01 2012-10-01
#[11] 2012-10-01 2012-10-01 2012-09-28 2012-09-28 2012-09-28
#[16] 2012-09-28 2012-09-28 2012-09-28
 test.date1- factor(test.date)
 as.Date(test.date1,%d/%m/%Y)
# [1] 2012-05-14 2012-05-14 2012-05-14 2012-05-14 2012-05-14
 #[6] 2012-05-14 2012-10-01 2012-10-01 2012-10-01 2012-10-01
#[11] 2012-10-01 2012-10-01 2012-09-28 2012-09-28 2012-09-28
#[16] 2012-09-28 2012-09-28 2012-09-28
A.K.




- Original Message -
From: Pancho Mulongeni p.mulong...@namibia.pharmaccess.org
To: r-help@r-project.org r-help@r-project.org
Cc: 
Sent: Wednesday, April 17, 2013 7:18 AM
Subject: [R] Problem with DateVisit-gives wrong year?

Hi I have the following factor of dates that I want to converted to Date class 
so I can extract the month
 test.date
[1] 14/05/2012 14/05/2012 14/05/2012 14/05/2012 14/05/2012 14/05/2012 [7] 
14/05/2012 14/05/2012 14/05/2012 14/05/2012
201 Levels: 01/10/2012 01/11/2012 01/12/2012 02/07/2012 ... 28/09/2012 I use 
code below
ntest.date-as.Date(test.date,'%d/%m/%y')

but the output has the wrong year, and the reverse order ntest.date [1] 
2020-05-14 2020-05-14 2020-05-14 2020-05-14 2020-05-14
[6] 2020-05-14 2020-05-14 2020-05-14 2020-05-14 2020-05-14

What am I doing wrong?
I dare not say the word 'bug'
Thanks
Pancho Mulongeni
Research Assistant
PharmAccess Foundation
1 Fouché Street
Windhoek West
Windhoek
Namibia
 
Tel:   +264 61 419 000
Fax:  +264 61 419 001/2
Mob: +264 81 4456 286

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to change the date into an interval of date?

2013-04-17 Thread arun
Hi,
I hope this is what you are looking for:
library(plyr)
 
mutate(evt_c.1, t = ave(as.numeric(gsub(".*\\-", "", responsed_at)), patient_id,
 gsub("-.*", "", responsed_at), FUN=function(x) c(0, cumsum(diff(x)))))
#   patient_id responsed_at t
#1   1   2010-5 0
#2   1   2010-7 2
#3   1   2010-8 3
#4   1   2010-9 4
#5   1  2010-12 7
#6   1   2011-1 0
#7   1   2011-2 1
#8   2   2010-5 0
#9   2   2010-6 1
#10  2   2010-7 2
#11  3   2010-1 0
#12  3   2010-2 1
#13  3   2010-4 3
#14  3   2010-5 4
#15  4  2011-01 0
#16  4  2011-03 2
#17  5  2012-04 0
#18  5  2012-06 2
A.K.




 From: GUANGUAN LUO guanguan...@gmail.com
To: arun smartpink...@yahoo.com 
Sent: Wednesday, April 17, 2013 9:21 AM
Subject: Re: how to change the date into an interval of date?
 


evt_c.1- read.table(text=
patient_id   responsed_at
1    2010-5
1    2010-7
1    2010-8
1    2010-9
1    2010-12
1    2011-1
1    2011-2
2    2010-5
2    2010-6
2    2010-7
3    2010-1
3    2010-2
3    2010-4
3    2010-5
4    2011-01
4    2011-03
5    2012-04
5    2012-06
,sep=,header=TRUE,
stringsAsFactors=FALSE)

mutate(evt_c.11,t=ave(as.numeric(gsub(.*\\-,,responsed_at)),patient_id,FUN=function(x)
 c(0,cumsum(diff(x)  patient_id responsed_at  t
1   1   2010-5  0
2   1   2010-7  2
3   1   2010-8  3
4   1   2010-9  4
5   1  2010-12  7
6   1   2011-1 -4
7   1   2011-2 -3
8   2   2010-5  0
9   2   2010-6  1
10  2   2010-7  2
11  3   2010-1  0
12  3   2010-2  1
13  3   2010-4  3
14  3   2010-5  4
15  4  2011-01  0
16  4  2011-03  2
17  5  2012-04  0
18  5  2012-06  2


This is my problem.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Package VIF

2013-04-17 Thread Jaap van Wyk


Hi

Could you perhaps help me? I would like to use the package
VIF but cannot get results.


I attach the .csv file and my R code. What do I have to do? Any help
is greatly appreciated.


library(VIF)
coal <- read.csv("e:/freekvif/cqa1.csv", header=TRUE)
y <- as.numeric(coal$AI)
x <- as.matrix(cbind(coal$Gyp, coal$Pyrite, coal$Sid, coal$Calcite,
coal$Dol, coal$Apatite, coal$Kaol, coal$Quartz, coal$Mica, coal$Micro,
coal$Rutile))

myd <- list(y = y, x = x)
vif.sel <- vif(myd$y, myd$x, subsize=11, trace=FALSE)
vif.sel$select

Response from R:
logical(0)

Thank you so much,
Jacob
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Regularized Regressions

2013-04-17 Thread Levi Waldron
Perhaps I am wrong, but I think there are only a few packages supporting
Elastic Net, and none of them also perform Best Subsets.
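(For what it's worth, a minimal elastic-net sketch with the glmnet package and made-up data — it illustrates the ridge/lasso mixing being discussed, though, as said, not best subsets:

library(glmnet)
set.seed(1)
x <- matrix(rnorm(100 * 10), 100, 10)   # toy predictors (made up)
y <- rnorm(100)                         # toy response (made up)
fit <- cv.glmnet(x, y, alpha = 0.5)     # alpha = 0 is ridge, 1 is lasso, in between is elastic net
coef(fit, s = "lambda.min")
)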


On Wed, Apr 17, 2013 at 8:46 AM, Christos Giannoulis cgiann...@gmail.comwrote:

 Merhaba, Hello to you too Mehmet (Yasu ki sena)

 Thank you for your email and especially for sharing this package. I
 appreciate it.

 However, my feeling is that this package does not have the third component
 of Best Subsets (pls correct me if I am wrong). It uses only a combination
 of Ridge and Lasso.

 If you happen to know any other package that uses all of them, I would
 greatly appreciate it if you were so kind as to share it. I tried to
 search the CRAN lists but I am not sure I can find something like that.
 That's why I was asking the R-community

 Thank you again for your prompt response!

 Cheers

 Christos

 On Wed, Apr 17, 2013 at 8:16 AM, Suzen, Mehmet msu...@gmail.com wrote:

  Yasu,
 
  Try Elastic nets:
  http://cran.r-project.org/web/packages/pensim/index.html
 
  There some other packages supporting elastic nets: Just search the CRAN
 
  Cheers,
  Mehmet
 
 
  On 17 April 2013 13:19, Christos Giannoulis cgiann...@gmail.com wrote:
   Hi all,
  
   I would greatly appreciate if someone was so kind and share with us a
   package or method that uses a regularized regression approach that
  balances
   a regression model performance and model complexity.
  
   That said I would be most grateful is there is an R-package that
 combines
   Ridge (sum of squares coefficients), Lasso: Sum of absolute
 coefficients
   and Best Subsets: Number of coefficients as methods of regularized
   regression.
  
   Sincerely,
  
   Christos Giannoulis
  
   [[alternative HTML version deleted]]
  
   __
   R-help@r-project.org mailing list
   https://stat.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
   and provide commented, minimal, self-contained, reproducible code.
 

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] odfWeave: Some questions about potential formatting options

2013-04-17 Thread Max Kuhn
Paul,

#1: I've never tried but you might be able to escape the required tags in
your text (e.g. in html you could write out the <b> in your text).

#3: Which output? Is this in text?

#2: It may be possible and maybe easy to implement. So if you want to dig
into it, have at it. For me, I'm completely buried for
the foreseeable future and won't be able to pay much attention to it. To be
honest, odfWeave has been fairly neglected by me and lately I've had
thoughts of orphaning the package :-/

Thanks,

Max



On Tue, Apr 16, 2013 at 1:15 PM, Paul Miller pjmiller...@yahoo.com wrote:

 Hi Milan and Max,

 Thanks to each of you for your reply to my post. Thus far, I've managed to
 find answers to some of the questions I asked initially.

 I am now able to control the justification of the leftmost column in my
 tables, as well as to add borders to the top and bottom. I also downloaded
 Milan's revised version of odfWeave at the link below, and found that it
 does a nice job of controlling column widths.

 http://nalimilan.perso.neuf.fr/transfert/odfWeave.tar.gz

 There are some other things I'm still struggling with though.

 1. Is it possible to get odfTableCaption and odfFigureCaption to make the
 titles they produce bold? I understand it might be possible to accomplish
 this by changing something in the styles but am not sure what. If someone
 can give me a hint, I can likely do the rest.

 2. Is there any way to get odfFigureCaption to put titles at the top of
 the figure instead of the bottom? I've noticed that odfTableCaption is able
 to do this but apparently not odfFigureCaption.

 3. Is it possible to add special characters to the output? Below is a
 sample Kaplan-Meier analysis. There's a footnote in there that reads Note:
 X2(1) = xx.xx, p = .. Is there any way to make the X a lowercase Chi
 and to superscript the 2? I did quite a bit of digging on this topic. It
 sounds like it might be difficult, especially if one is using Windows as I
 am.

 Thanks,

 Paul

 ##
  Get data 
 ##

  Load packages 

 require(survival)
 require(MASS)

  Sample analysis 

 attach(gehan)
 gehan.surv <- survfit(Surv(time, cens) ~ treat, data = gehan, conf.type =
 "log-log")
 print(gehan.surv)

 survTable <- summary(gehan.surv)$table
 survTable <- data.frame(Treatment = rownames(survTable), survTable,
 row.names=NULL)
 survTable <- subset(survTable, select = -c(records, n.max))

 ##
  odfWeave 
 ##

  Load odfWeave 

 require(odfWeave)

  Modify StyleDefs 

 currentDefs <- getStyleDefs()

 currentDefs$firstColumn$type <- "Table Column"
 currentDefs$firstColumn$columnWidth <- "5 cm"
 currentDefs$secondColumn$type <- "Table Column"
 currentDefs$secondColumn$columnWidth <- "3 cm"

 currentDefs$ArialCenteredBold$fontSize <- "10pt"
 currentDefs$ArialNormal$fontSize <- "10pt"
 currentDefs$ArialCentered$fontSize <- "10pt"
 currentDefs$ArialHighlight$fontSize <- "10pt"

 currentDefs$ArialLeftBold <- currentDefs$ArialCenteredBold
 currentDefs$ArialLeftBold$textAlign <- "left"

 currentDefs$cgroupBorder <- currentDefs$lowerBorder
 currentDefs$cgroupBorder$topBorder <- "0.0007in solid #00"

 setStyleDefs(currentDefs)

  Modify ImageDefs 

 imageDefs <- getImageDefs()
 imageDefs$dispWidth <- 5.5
 imageDefs$dispHeight <- 5.5
 setImageDefs(imageDefs)

  Modify Styles 

 currentStyles <- getStyles()
 currentStyles$figureFrame <- "frameWithBorders"
 setStyles(currentStyles)

  Set odt table styles 

 tableStyles <- tableStyles(survTable, useRowNames = FALSE, header = "")
 tableStyles$headerCell[1,] <- "cgroupBorder"
 tableStyles$header[,1] <- "ArialLeftBold"
 tableStyles$text[,1] <- "ArialNormal"
 tableStyles$cell[2,] <- "lowerBorder"

  Weave odt source file 

 fp <- "N:/Studies/HCRPC1211/Report/odfWeaveTest/"
 inFile <- paste(fp, "testWeaveIn.odt", sep="")
 outFile <- paste(fp, "testWeaveOut.odt", sep="")
 odfWeave(inFile, outFile)

 ##
  Contents of .odt source file 
 ##

 Here is a sample Kaplan-Meier table.

 <<testKMTable, echo=FALSE, results = xml>>=
 odfTableCaption("A Sample Kaplan-Meier Analysis Table")
 odfTable(survTable, useRowNames = FALSE, digits = 3,
 colnames = c("Treatment", "Number", "Events", "Median", "95% LCL", "95% UCL"),
 colStyles = c("firstColumn", "secondColumn", "secondColumn",
 "secondColumn", "secondColumn", "secondColumn"),
 styles = tableStyles)
 odfCat("Note: X2(1) = xx.xx, p = .")
 @

 Here is a sample Kaplan-Meier graph.

 <<testKMFig, echo=FALSE, fig = TRUE>>=
 odfFigureCaption("A Sample Kaplan-Meier Analysis Graph", label = "Figure")
 plot(gehan.surv, xlab = "Time", ylab = "Survivorship")
 @





-- 

Max

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, 

Re: [R] how to change the date into an interval of date?

2013-04-17 Thread arun
Hi,
Try this:
library(mondate)
mutate(evt_c.1,t=ave(round(as.numeric(mondate(paste(evt_c.1[,2],"-01",sep="")))),patient_id,FUN=function(x)
 c(0,cumsum(diff(x)))))
 #  patient_id responsed_at t
#1   1   2010-5 0
#2   1   2010-7 2
#3   1   2010-8 3
#4   1   2010-9 4
#5   1  2010-12 7
#6   1   2011-1 8
#7   1   2011-2 9
#8   2   2010-5 0
#9   2   2010-6 1
#10  2   2010-7 2
#11  3   2010-1 0
#12  3   2010-2 1
#13  3   2010-4 3
#14  3   2010-5 4
#15  4  2011-01 0
#16  4  2011-03 2
#17  5  2012-04 0
#18  5  2012-06 2
If it change:
evt_c.1$responsed_at[6:7] <- c("2011-05","2011-07")
 
mutate(evt_c.1,t=ave(round(as.numeric(mondate(paste(evt_c.1[,2],"-01",sep="")))),patient_id,FUN=function(x)
 c(0,cumsum(diff(x)))))
#   patient_id responsed_at  t
#1   1   2010-5  0
#2   1   2010-7  2
#3   1   2010-8  3
#4   1   2010-9  4
#5   1  2010-12  7
#6   1  2011-05 12
#7   1  2011-07 14
#8   2   2010-5  0
#9   2   2010-6  1
#10  2   2010-7  2
#11  3   2010-1  0
#12  3   2010-2  1
#13  3   2010-4  3
#14  3   2010-5  4
#15  4  2011-01  0
#16  4  2011-03  2
#17  5  2012-04  0
#18  5  2012-06  2

A.K.
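
(An alternative sketch in plain base R, with no mondate dependency — it turns each year-month string into a month count and subtracts the first value within each patient:

mon <- function(s) {                       # "2010-5" -> 2010*12 + 5
  p <- as.integer(unlist(strsplit(s, "-")))
  12 * p[1] + p[2]
}
evt_c.1$t <- ave(sapply(evt_c.1$responsed_at, mon),
                 evt_c.1$patient_id,
                 FUN = function(x) x - x[1])
)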




 From: GUANGUAN LUO guanguan...@gmail.com
To: arun smartpink...@yahoo.com 
Sent: Wednesday, April 17, 2013 9:25 AM
Subject: Re: how to change the date into an interval of date?
 


mutate(evt_c.11,t=ave(as.numeric(gsub(.*\\-,,responsed_at)),patient_id,FUN=function(x)
 c(0,cumsum(diff(x)  patient_id responsed_at  t
1   1   2010-5  0
2   1   2010-7  2
3   1   2010-8  3
4   1   2010-9  4
5   1  2010-12  7
6   1   2011-1  8
7   1   2011-2  9
8   2   2010-5  0
9   2   2010-6  1
10  2   2010-7  2
11  3   2010-1  0
12  3   2010-2  1
13  3   2010-4  3
14  3   2010-5  4
15  4  2011-01  0
16  4  2011-03  2
17  5  2012-04  0
18  5  2012-06  2
this is the order i want. you are so kind-hearted.

GG

2013/4/17 arun smartpink...@yahoo.com

Alright, Sorry, I misunderstood.  So, what do you want your result to be at 
2011-1.  Is it 0?







 From: GUANGUAN LUO guanguan...@gmail.com
To: arun smartpink...@yahoo.com 
Sent: Wednesday, April 17, 2013 9:21 AM

Subject: Re: how to change the date into an interval of date?
 


evt_c.1- read.table(text=
patient_id   responsed_at
1    2010-5
1    2010-7
1    2010-8
1    2010-9
1    2010-12
1    2011-1
1    2011-2
2    2010-5
2    2010-6
2    2010-7
3    2010-1
3    2010-2
3    2010-4
3    2010-5
4    2011-01
4    2011-03
5    2012-04
5    2012-06
,sep=,header=TRUE,
stringsAsFactors=FALSE)

mutate(evt_c.11,t=ave(as.numeric(gsub(.*\\-,,responsed_at)),patient_id,FUN=function(x)
 c(0,cumsum(diff(x)  patient_id responsed_at  t
1   1   2010-5  0
2   1   2010-7  2
3   1   2010-8  3
4   1   2010-9  4
5   1  2010-12  7
6   1   2011-1 -4
7   1   2011-2 -3
8   2   2010-5  0
9   2   2010-6  1
10  2   2010-7  2
11  3   2010-1  0
12  3   2010-2  1
13  3   2010-4  3
14  3   2010-5  4
15  4  2011-01  0
16  4  2011-03  2
17  5  2012-04  0
18  5  2012-06  2


This is my problem.




2013/4/17 arun smartpink...@yahoo.com

If this is not what your problem, please provide a dataset like below and 
explain where is the problem?





- Original Message -
From: arun smartpink...@yahoo.com
To: GUANGUAN LUO guanguan...@gmail.com
Cc:
Sent: Wednesday, April 17, 2013 9:17 AM
Subject: Re: how to change the date into an interval of date?

Hi,
I am not sure I understand your question:
evt_c.1- read.table(text=
patient_id   responsed_at
1    2010-5
1    2010-7
1    2010-8
1    2010-9
2    2010-5
2    2010-6
2    2010-7
3    2010-1
3    2010-2
3    2010-4
3    2010-5
4    2011-01
4    2011-03
5    2012-04
5    2012-06
,sep=,header=TRUE,stringsAsFactors=FALSE)
 
mutate(evt_c.1,t=ave(as.numeric(gsub(.*\\-,,responsed_at)),patient_id,FUN=function(x)
 c(0,cumsum(diff(x)
   patient_id responsed_at t
1   1   2010-5 0
2   1   2010-7 2
3  

[R] Bug in VGAM z value and coefficient ?

2013-04-17 Thread Olivier Merle
Dear,


When I multiply the y of a regression by 10, I would expect the
coefficient to be multiplied by 10 and the z value to stay constant. Here is
some reproducible code to support the case.

*Ex 1*
library(mvtnorm)
library(VGAM)
set.seed(1)
x=rmvnorm(1000,sigma=matrix(c(1,0.75,0.75,1),2,2))
summary(vglm(y~x,family=studentt2,data=data.frame(y=x[,1],x=x[,2])))
summary(vglm(y~x,family=studentt2,data=data.frame(y=x[,1]*10,x=x[,2])))
summary(vglm(y~x,family=cauchy1,data=data.frame(y=x[,1],x=x[,2])))
summary(vglm(y~x,family=cauchy1,data=data.frame(y=x[,1]*10,x=x[,2])))
summary(vglm(y~x,family=hypersecant,data=data.frame(y=x[,1],x=x[,2])))
summary(vglm(y~x,family=hypersecant,data=data.frame(y=x[,1]*10,x=x[,2])))

*Ex 2*
library(VGAM)
tdata <- data.frame(x2 = runif(nn <- 1000))
tdata <- transform(tdata, y1 = rt(nn, df = exp(exp(0.5 - x2))),
  y2 = rt(nn, df = exp(exp(0.5 - x2))))
fit1 <- vglm(y1 ~ x2, studentt, tdata, trace = TRUE)
tdata$y1=tdata$y1*100
fit2 <- vglm(y1 ~ x2, studentt, tdata, trace = TRUE)
coef(fit1, matrix = TRUE)
coef(fit2, matrix = TRUE)

I also feel that the VGAM package (vglm function) often doesn't converge and just
stops. Do you know a reliable package with a lot of available distributions?


Thanks,

Olivier

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Error: could not find function invlogit and bayesglm

2013-04-17 Thread S'dumo Masango
I have installed the arm package and its dependents (e.g MATRIX, etc), but
cannot use the functions invlogit and bayesglm because it gives me the
error message Error: could not find function invlogit or Error: could not
find function invlogit. What could be the problem.

 

Regards

 

Carrington


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Creating a vector with repeating dates

2013-04-17 Thread Katherine Gobin
Dear Sir,

Thanks a lot for your valuable suggestions and help. 

Regards

Katherine


--- On Wed, 17/4/13, Jim Lemon j...@bitwrit.com.au wrote:

From: Jim Lemon j...@bitwrit.com.au
Subject: Re: [R] Creating a vector with repeating dates
To: Katherine Gobin katherine_go...@yahoo.com
Cc: r-help@r-project.org
Date: Wednesday, 17 April, 2013, 10:35 AM

On 04/17/2013 07:11 PM, Katherine Gobin wrote:
 Dear R forum

 I have a data.frame

 df = data.frame(dates = c("4/15/2013", "4/14/2013", "4/13/2013", 
 "4/12/2013"), values = c(47, 38, 56, 92))

 I need to to create a vector by repeating the dates as

 Current_date, 4/15/2013, 4/14/2013, 4/13/2013, 4/12/2013,  Current_date, 
 4/15/2013, 4/14/2013, 4/13/2013, 4/12/2013, Current_date, 4/15/2013, 
 4/14/2013, 4/13/2013, 4/12/2013

 i.e. I need to create a new vector as given below which I need to use for 
 some other purpose.

 Current_date
 4/15/2013
 4/14/2013
 4/13/2013
 4/12/2013
 Current_date
 4/15/2013
 4/14/2013
 4/13/2013
 4/12/2013
 Current_date
 4/15/2013
 4/14/2013
 4/13/2013
 4/12/2013

 Is it possible to construct such a
   column?

Hi Katherine,
How about:

rep(c("Current date",paste(4,15:12,2013,sep="/")),3)

Jim
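
(If Current_date is meant to be today's actual date rather than a literal label, a small variation — the format string is assumed to match the month/day/year style above:

rep(c(format(Sys.Date(), "%m/%d/%Y"), paste(4, 15:12, 2013, sep="/")), 3)
)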


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Anova unbalanced

2013-04-17 Thread paladini

Hello everybody,
I have got a data set with about 400 companies. Each company has a 
score for its enviroment comportment between 0 and 100. These companies 
belong to about 15 different countries. I have e.g. 70 companies from 
the UK and 5 from Luxembourg, so the data set is pretty unbalanced and I 
want to do an ANOVA. Something like aov(enviromentscore~country). But the 
aov function is just for a balanced design.
So I wonder if I can use fit=lm(enviromentscore~country), anova(fit) 
instead? Would this be okay or can it also only be used with balanced 
data?


Thanking you in anticipation, best regards


Claudia

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Pathological case for the function 'brunch' of CAPER?

2013-04-17 Thread Xavier Prudent
Dear R-enthusiasts,

While using the regression function 'brunch' of CAPER (with R v2.15.4),

in a simple case (binary variable Yes/No vs. a continuous variable)

I ended with an unexplained error:

Error in if (any(stRes  robust)) { :
  missing value where TRUE/FALSE needed


I simplified my code so that you can run it, just copy everything in a
directory and run
source("analysis.R")

  Code:
http://iktp.tu-dresden.de/~prudent/Divers/R/analysis.R

  Tree:
http://iktp.tu-dresden.de/~prudent/Divers/R/vertebrates.tree

  Data:
http://iktp.tu-dresden.de/~prudent/Divers/R/data.txt

The source of the error is the pruning, (particularly for these tips:

cavPor3, myoLuc1)

but after searching around I still have no clue of what is happening.

Any hint is welcome!

Thanks in advance,

Xavier

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error: could not find function invlogit and bayesglm

2013-04-17 Thread John Kane
Have you loaded it?

library(arm)

John Kane
Kingston ON Canada


 -Original Message-
 From: masan...@uniswa.sz
 Sent: Wed, 17 Apr 2013 10:08:39 +0200
 To: r-help@r-project.org
 Subject: [R] Error: could not find function invlogit and bayesglm
 
 I have installed the arm package and its dependents (e.g MATRIX, etc),
 but
 cannot use the functions invlogit and bayesglm because it gives me
 the
 error message Error: could not find function invlogit or Error: could
 not
 find function invlogit. What could be the problem.
 
 
 
 Regards
 
 
 
 Carrington
 
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error: could not find function invlogit and bayesglm

2013-04-17 Thread Jorge I Velez
Hi Carrington,

You also need the boot package (see
http://stat.ethz.ch/R-manual/R-patched/library/boot/html/inv.logit.html )
 As for the other function, please load the arm package, e.g.,

require(arm)
require(boot)

and then you will be able to use the functions mentioned below.

HTH,
Jorge.-


On Wed, Apr 17, 2013 at 6:08 PM, S'dumo Masango masan...@uniswa.sz wrote:

 I have installed the arm package and its dependents (e.g MATRIX, etc), but
 cannot use the functions invlogit and bayesglm because it gives me the
 error message Error: could not find function invlogit or Error: could
 not
 find function invlogit. What could be the problem.



 Regards



 Carrington


 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error: could not find function invlogit and bayesglm

2013-04-17 Thread Berend Hasselman

On 17-04-2013, at 10:08, S'dumo Masango masan...@uniswa.sz wrote:

 I have installed the arm package and its dependents (e.g MATRIX, etc), but
 cannot use the functions invlogit and bayesglm because it gives me the
 error message Error: could not find function invlogit or Error: could not
 find function invlogit. What could be the problem.
 
 
Have you done 

library(arm)

etc?

Berend
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Overlay two stat_ecdf() plots

2013-04-17 Thread PIKAL Petr
Hi

Your files have only one column, so the melted data is virtually the same.

When I read them as test1 and test2 I can do

plot(ecdf(test1$Down))
plot(ecdf(test2$Up), add=T, col=2)

Or, using the previously stated ggplot2 package,

test <- rbind(melt(test1),melt(test2))

p <- ggplot(test, aes(x=value, colour=variable))
p+stat_ecdf()

gives me 2 curves.
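
(A self-contained variant of the same idea, with made-up data standing in for the two attached files and columns named Down and Up as in them:

library(ggplot2)
library(reshape2)
test1 <- data.frame(Down = rnorm(100))
test2 <- data.frame(Up   = rnorm(100, mean = 1))
test  <- rbind(melt(test1), melt(test2))   # melt() gives columns 'variable' and 'value'
ggplot(test, aes(x = value, colour = variable)) + stat_ecdf()
)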

What is your problem?

Petr

BTW. please use preferably

dput(your.data)

for providing data for us.

From: Robin Mjelle [mailto:robinmje...@gmail.com]
Sent: Tuesday, April 16, 2013 11:09 PM
To: PIKAL Petr
Subject: Re: [R] Overlay two stat_ecdf() plots


Dear Petr,
I have attached the two tables that I want to plot using stat_ecdf().
To plot one of the table I use:

Down <- 
read.table("FC_For_top100Down_RegulatedMiRNATargetsClean.csv",sep="",header=T)
Down.m <- melt(Down,variable.name="DownFC")
ggplot(Down.m, aes(value)) + stat_ecdf()
This workes fine, but how do I plot both files in one plot?

On Tue, Apr 16, 2013 at 9:45 AM, PIKAL Petr 
petr.pi...@precheza.cz wrote:
Hi

Do you mean ecdf? If yes just ose add option in plot.

plot(ecdf(rnorm(100, 1,2)))
plot(ecdf(rnorm(100, 2,2)), add=TRUE, col=2)

If not please specify from where is ecdf_stat or stat_ecdf which, as you 
indicate, are the same functions.

Regrdas
Petr




 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Robin Mjelle
 Sent: Monday, April 15, 2013 1:10 PM
 To: r-help@r-project.org
 Subject: [R] Overlay two stat_ecdf() plots

 I want to plot two scdf-plots in the same graph.
 I have two input tables with one column each:

  Targets <- read.table("/media/", sep="", header=T)
  NonTargets <- read.table("/media/...", sep="", header=T)

  head(Targets)
 V1
 1 3.160514
 2 6.701948
 3 4.093844
 4 1.992014
 5 1.604751
 6 2.076802

  head(NonTargets)
  V1
 1  3.895934
 2  1.990506
 3 -1.746919
 4 -3.451477
 5  5.156554
 6  1.195109

  Targets.m - melt(Targets)
  head(Targets.m)
   variablevalue
 1   V1 3.160514
 2   V1 6.701948
 3   V1 4.093844
 4   V1 1.992014
 5   V1 1.604751
 6   V1 2.076802

  NonTargets.m - melt(NonTargets)
  head(NonTargets.m)
   variable value
 1   V1  3.895934
 2   V1  1.990506
 3   V1 -1.746919
 4   V1 -3.451477
 5   V1  5.156554
 6   V1  1.195109


 How do I proceed to plot them in one Graph using ecdf_stat()

   [[alternative HTML version deleted]]

 __
 R-help@r-project.orgmailto:R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Mancova with R

2013-04-17 Thread Rémi Lesmerises
Dear all,

I'm trying to compare two sets of variables, the first set is composed 
exclusively of numerical variables and the second regroups factors and 
numerical variables. I can't use a Manova because of this inclusion of 
numerical variables in the second set. The solution should be to perform a 
Mancova, but I didn't find any package that allows this type of test.

I've already looked in this forum and on the net to find answers, but the only 
thing I've found is the following:


lm(as.matrix(Y) ~  x+z)
x and z could be numerical and factors. The problem with that is that it actually 
only performs a succession of lm (or glm) fits, one for each numerical variable 
contained in the Y matrix. It is not a true MANCOVA that does a significance test 
(most often a Wald test) for the overall comparison of the two sets. Such a test is 
available in SPSS and SAS, but I really want to stay in R! Does someone have any 
idea?

Thanks in advance for your help!
 
Rémi Lesmerises, biol. M.Sc.,
Candidat Ph.D. en Biologie
Université du Québec à Rimouski
remilesmeri...@yahoo.ca

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] plot 2 y axis

2013-04-17 Thread John Kane
Excel is hardly the epitome of good graphing.  There are a couple of ways to do 
what you want and Jim has shown you one, but ...

What about using two panels to present the data, as in

opar <- par(mfrow = c(2, 1))
plot(dat1$Date, dat1$Weight, col = "red", xlab = "", ylab = "Weight")
plot(dat1$Date, dat1$Height, col = "blue", xlab = "Date", ylab = "Height")
par(opar)   # restore the previous par() settings
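
(For the two-axis layout that was asked about, a minimal base-graphics sketch, assuming the data frame is called dat1 as above; whether two y axes are good practice is another matter:

par(mar = c(5, 4, 4, 4) + 0.1)                        # leave room for the right axis
plot(dat1$Date, dat1$Weight, type = "b", col = "red",
     xlab = "Date", ylab = "Weight")
par(new = TRUE)                                       # overlay a second plot
plot(dat1$Date, dat1$Height, type = "b", col = "blue",
     axes = FALSE, xlab = "", ylab = "")
axis(4)                                               # right-hand axis for Height
mtext("Height", side = 4, line = 3)
)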


John Kane
Kingston ON Canada


 -Original Message-
 From: ye...@lbl.gov
 Sent: Tue, 16 Apr 2013 15:35:29 -0700
 To: r-help@r-project.org
 Subject: [R] plot 2 y axis
 
 Hi,
 
 I want to plot two variables on the same graph but with two y axis just
 like what you can do in Excel. I searched online that seems like you can
 not achieve that in ggplot. So is there anyway I can do it in a nice way
 in
 basic plot?
 
 Suppose my data looks like this:
 
 Weight  Height  Date
 0.1     0.3     1
 0.2     0.4     2
 0.3     0.8     3
 0.6     1       4
 
 I want to have Date as the X axis, Weight as the left y axis and Height
 as the right y axis.
 
 Thanks.
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] spatial graph and its boundary

2013-04-17 Thread Ondřej Mikula
Dear r-helpers,
I have a graph created using the library 'spatgraphs'.

library(spatstat)
library(spatgraphs)
xy <- rbind(c(28.39, -16.27), c(30.62, -20.13), c(32.25, -28.7), c(22.43,
-27.22), c(27.5, -21.17), c(31.22, -24.52), c(17.93, -26.92), c(18.72,
-17.95), c(24.15, -17.82), c(29.23, -22.85))
ow <- owin(xrange=range(xy[,1]), yrange=range(xy[,2]))
pp <- ppp(x=xy[,1],y=xy[,2],n=nrow(xy),window=ow)
gg <- spatgraph(pp=pp, type="gabriel")
plot(gg, asp=1, pp=pp, add=FALSE)
lines(xy[c(1,2,10,6,3,4,7,8,9,1),],col=2,lwd=2)

and now I need to automatically extract polygon corresponding to its outer
boundary, highlighted here in red. Are you aware of such function in R? I
looked for it hard but with no success.
Thank you in advance for any hint.

Ondřej

-- 
Ondřej Mikula

Institute of Animal Physiology and Genetics
Academy of Sciences of the Czech Republic
Veveri 97, 60200 Brno, Czech Republic

Institute of Vertebrate Biology
Academy of Sciences of the Czech Republic
Studenec 122, 67502 Konesin, Czech Republic

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] use of names() within lapply()

2013-04-17 Thread Ivan Alves
Dear all,

List g has 2 elements

 names(g)
[1] 2009-10-07 2012-02-29

and the list plot

lapply(g, plot, main=names(g))

results in equal plot titles with both list names, whereas distinct titles 
names(g[1]) and names(g[2]) are sought. Clearly, lapply is passing 'g' instead 
of consecutively passing g[1] and then g[2] to process the additional 'main' 
argument to plot.  help(lapply) is silent on how to pass parameters 
element-wise.  Any suggestion would be appreciated.

Kind regards,
Ivan
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Mancova with R

2013-04-17 Thread John Fox
Dear Remi,

Take a look at the Anova() function in the car package. In your case, you could 
use

Anova(lm(as.matrix(Y) ~  x + z))

or, for more detail,

summary(Anova(lm(as.matrix(Y) ~  x + z)))

I hope this helps,
 John


John Fox
Sen. William McMaster Prof. of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/

On Wed, 17 Apr 2013 07:47:27 -0700 (PDT)
 Rémi Lesmerises remilesmeri...@yahoo.ca wrote:
 Dear all,
 
 I'm trying to compare two sets of variables, the first set is composed 
 exclusively of numerical variables and the second regroups factors and 
 numerical variables. I can't use a Manova because of this inclusion of 
 numerical variables in the second set. The solution should be to perform a 
 Mancova, but I didn't find any package that allow this type of test.
 
 I've already looked in this forum and on the net to find answers, but the 
 only thing I've found is the following:
 
 
 lm(as.matrix(Y) ~  x+z)
 x and z could be numerical and factors. The problem with that is it actually 
 only perform a succession of lm (or glm), one for each numerical variable 
 contained in the Y matrix. It is not a true MANCOVA that do a significance 
 test (most often a Wald test) for the overall two sets comparison. Such a 
 test is available in SPSS and SAS, but I really want to stay in R! Someone 
 have any idea?
 
 Thanks in advance for your help!
  
 Rémi Lesmerises, biol. M.Sc.,
 Candidat Ph.D. en Biologie
 Université du Québec à Rimouski
 remilesmeri...@yahoo.ca
 
   [[alternative HTML version deleted]]


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] use of names() within lapply()

2013-04-17 Thread Duncan Murdoch

On 17/04/2013 11:04 AM, Ivan Alves wrote:

Dear all,

List g has 2 elements

 names(g)
[1] 2009-10-07 2012-02-29

and the list plot

lapply(g, plot, main=names(g))

results in equal plot titles with both list names, whereas distinct titles 
names(g[1]) and names(g[2]) are sought. Clearly, lapply is passing 'g' in stead 
of consecutively passing g[1] and then g[2] to process the additional 'main'  
argument to plot.  help(lapply) is mute as to what to element-wise pass 
parameters.  Any suggestion would be appreciated.


I think you want mapply rather than lapply, or you could do lapply on a 
vector of indices.  For example,


mapply(plot, g, main=names)

or

lapply(1:2, function(i) plot(g[[i]], main=names(g)[i]))

Duncan Murdoch
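
(One detail worth flagging: in the mapply() call above, 'names' should presumably be names(g) — passing the builtin function itself is what produces the 'builtin' is not subsettable error reported later in this thread:

mapply(plot, g, main = names(g))   # titles vary element-wise
)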

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] use of names() within lapply()

2013-04-17 Thread arun
Hi,
Try:
set.seed(25)
g <- list(sample(1:40,20,replace=TRUE),sample(40:60,20,replace=TRUE))
names(g) <- c("2009-10-07","2012-02-29")
pdf("Trialnew.pdf")
 lapply(seq_along(g),function(i) plot(g[[i]],main=names(g)[i]))
dev.off()
A.K.




- Original Message -
From: Ivan Alves pap:
u...@me.com
To: R-help@r-project.org R-help@r-project.org
Cc: 
Sent: Wednesday, April 17, 2013 11:04 AM
Subject: [R] use of names() within lapply()

Dear all,

List g has 2 elements

 names(g)
[1] 2009-10-07 2012-02-29

and the list plot

lapply(g, plot, main=names(g))

results in equal plot titles with both list names, whereas distinct titles 
names(g[1]) and names(g[2]) are sought. Clearly, lapply is passing 'g' in stead 
of consecutively passing g[1] and then g[2] to process the additional 'main'  
argument to plot.  help(lapply) is mute as to what to element-wise pass 
parameters.  Any suggestion would be appreciated.

Kind regards,
Ivan
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] proper way to handle obsolete function names

2013-04-17 Thread Jannis

Dear R community,


what would be the proper R way to handle obsolete function names? I have 
created several packages with functions and sometimes would like to 
change the name of a function, but would like to create a mechanism so that 
other scripts or functions using the old name still work.



Cheers
Jannis
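
(One common approach, sketched here with hypothetical names oldName/newName rather than anything from the packages in question, is to keep the old name as a thin wrapper that warns via base R's .Deprecated():

newName <- function(x, ...) {
  x * 2                      # the real implementation (placeholder)
}

oldName <- function(x, ...) {
  .Deprecated("newName")     # warns that oldName is obsolete and points to newName
  newName(x, ...)
}
)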

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to change the date into an interval of date?

2013-04-17 Thread arun
Hi.
No problem.
cc:ing to Rhelp.
A.K.





 From: GUANGUAN LUO guanguan...@gmail.com

Sent: Wednesday, April 17, 2013 10:25 AM
Subject: Re: how to change the date into an interval of date?



Thank you so much . That is exactly the things i want.
GG




Hi,
Try this:
library(mondate)
mutate(evt_c.1,t=ave(round(as.numeric(mondate(paste(evt_c.1[,2],-01,sep=,patient_id,FUN=function(x)
 c(0,cumsum(diff(x)

 #  patient_id responsed_at t
#1   1   2010-5 0
#2   1   2010-7 2
#3   1   2010-8 3
#4   1   2010-9 4

#5   1  2010-12 7
#6   1   2011-1 8
#7   1   2011-2 9
#8   2   2010-5 0
#9   2   2010-6 1
#10  2   2010-7 2
#11  3   2010-1 0
#12  3   2010-2 1
#13  3   2010-4 3
#14  3   2010-5 4
#15  4  2011-01 0
#16  4  2011-03 2
#17  5  2012-04 0
#18  5  2012-06 2
If it change:
evt_c.1$responsed_at[6:7]- c(2011-05,2011-07)
 
mutate(evt_c.1,t=ave(round(as.numeric(mondate(paste(evt_c.1[,2],-01,sep=,patient_id,FUN=function(x)
 c(0,cumsum(diff(x)

#   patient_id responsed_at  t
#1   1   2010-5  0
#2   1   2010-7  2
#3   1   2010-8  3
#4   1   2010-9  4
#5   1  2010-12  7
#6   1  2011-05 12
#7   1  2011-07 14

#8   2   2010-5  0
#9   2   2010-6  1
#10  2   2010-7  2
#11  3   2010-1  0
#12  3   2010-2  1
#13  3   2010-4  3
#14  3   2010-5  4
#15  4  2011-01  0
#16  4  2011-03  2
#17  5  2012-04  0
#18  5  2012-06  2


A.K.




 From: GUANGUAN LUO guanguan...@gmail.com

Sent: Wednesday, April 17, 2013 9:25 AM

Subject: Re: how to change the date into an interval of date?



mutate(evt_c.11,t=ave(as.numeric(gsub(.*\\-,,responsed_at)),patient_id,FUN=function(x)
 c(0,cumsum(diff(x)  patient_id responsed_at  t
1           1       2010-5  0
2           1       2010-7  2
3           1       2010-8  3
4           1       2010-9  4
5           1      2010-12  7
6           1       2011-1  8
7           1       2011-2  9
8           2       2010-5  0
9           2       2010-6  1
10          2       2010-7  2
11          3       2010-1  0
12          3       2010-2  1
13          3       2010-4  3
14          3       2010-5  4
15          4      2011-01  0
16          4      2011-03  2
17          5      2012-04  0
18          5      2012-06  2
this is the order i want. you are so kind-hearted.

GG



Alright, Sorry, I misunderstood.  So, what do you want your result to be at 
2011-1.  Is it 0?







 From: GUANGUAN LUO guanguan...@gmail.com

Sent: Wednesday, April 17, 2013 9:21 AM

Subject: Re: how to change the date into an interval of date?



evt_c.1- read.table(text=
patient_id   responsed_at
1    2010-5
1    2010-7
1    2010-8
1    2010-9
1    2010-12
1    2011-1
1    2011-2
2    2010-5
2    2010-6
2    2010-7
3    2010-1
3    2010-2
3    2010-4
3    2010-5
4    2011-01
4    2011-03
5    2012-04
5    2012-06
,sep=,header=TRUE,
stringsAsFactors=FALSE)

mutate(evt_c.11,t=ave(as.numeric(gsub(.*\\-,,responsed_at)),patient_id,FUN=function(x)
 c(0,cumsum(diff(x)  patient_id responsed_at  t
1           1       2010-5  0
2           1       2010-7  2
3           1       2010-8  3
4           1       2010-9  4
5           1      2010-12  7
6           1       2011-1 -4
7           1       2011-2 -3
8           2       2010-5  0
9           2       2010-6  1
10          2       2010-7  2
11          3       2010-1  0
12          3       2010-2  1
13          3       2010-4  3
14          3       2010-5  4
15          4      2011-01  0
16          4      2011-03  2
17          5      2012-04  0
18          5      2012-06  2


This is my problem.






If this is not what your problem, please provide a dataset like below and 
explain where is the problem?





- Original Message -

To: GUANGUAN LUO guanguan...@gmail.com
Cc:
Sent: Wednesday, April 17, 2013 9:17 AM
Subject: Re: how to change the date into an interval of date?

Hi,
I am not sure I understand your question:
evt_c.1- read.table(text=
patient_id   responsed_at
1    2010-5
1    2010-7
1    2010-8
1    2010-9
2    2010-5
2    2010-6
2    2010-7
3    2010-1
3    2010-2
3    2010-4
3    2010-5
4    2011-01
4    2011-03
5    2012-04
5    2012-06
,sep=,header=TRUE,stringsAsFactors=FALSE)
 
mutate(evt_c.1,t=ave(as.numeric(gsub(.*\\-,,responsed_at)),patient_id,FUN=function(x)
 

Re: [R] Mancova with R

2013-04-17 Thread Rémi Lesmerises
Dear John,

Thanks for your comments! But when I tried your suggestion, the output was as 
the following:


 Response Dist_arbre :
            Df     Sum Sq    Mean Sq F value    Pr(F)    
Poids        1 0.00010398 0.00010398  6.2910 0.0364733 *  
Age          1 0.5202 0.5202  3.1476 0.1139652    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ 
’ 1 

I have the P-value but not the direction of the relationship, information that 
I had with  lm(as.matrix(Y) ~  x+z). I could combine the results of these 
two tests, but it seems inelegant to me. Moreover I didn't have a total 
significance test as with a true MANCOVA. 

An idea?!

Rémi Lesmerises, biol. M.Sc.,
Candidat Ph.D. en Biologie
Université du Québec à Rimouski
300, allée des Ursulines
remilesmeri...@yahoo.ca




 De : John Fox j...@mcmaster.ca
À : Rémi Lesmerises remilesmeri...@yahoo.ca 
Cc : r-help@r-project.org r-help@r-project.org 
Envoyé le : mercredi 17 avril 2013 10h54
Objet : Re: [R] Mancova with R


Dear Remi,

Take a look at the Anova() function in the car package. In your case, you could 
use

Anova(lm(as.matrix(Y) ~  x + z))

or, for more detail,

summary(Anova(lm(as.matrix(Y) ~  x + z)))

I hope this helps,
John


John Fox
Sen. William McMaster Prof. of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/
    
On Wed, 17 Apr 2013 07:47:27 -0700 (PDT)
Rémi Lesmerises remilesmeri...@yahoo.ca wrote:
 Dear all,
 
 I'm trying to compare two sets of variables, the first set is composed 
 exclusively of numerical variables and the second regroups factors and 
 numerical variables. I can't use a Manova because of this inclusion of 
 numerical variables in the second set. The solution should be to perform a 
 Mancova, but I didn't find any package that allow this type of test.
 
 I've already looked in this forum and on the net to find answers, but the 
 only thing I've found is the following:
 
 
 lm(as.matrix(Y) ~  x+z)
 x and z could be numerical and factors. The problem with that is it actually 
 only perform a succession of lm (or glm), one for each numerical variable 
 contained in the Y matrix. It is not a true MANCOVA that do a significance 
 test (most often a Wald test) for the overall two sets comparison. Such a 
 test is available in SPSS and SAS, but I really want to stay in R! Someone 
 have any idea?
 
 Thanks in advance for your help!
  
 Rémi Lesmerises, biol. M.Sc.,
 Candidat Ph.D. en Biologie
 Université du Québec à Rimouski
 remilesmeri...@yahoo.ca
 
     [[alternative HTML version deleted]]
 
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] use of names() within lapply()

2013-04-17 Thread Ivan Alves
Dear Duncan and A.K.
Many thanks for your super quick help. The modified lapply did the trick; 
mapply died with an error Error in dots[[2L]][[1L]] : object of type 'builtin' 
is not subsettable.
Kind regards,
Ivan
On 17 Apr 2013, at 17:12, Duncan Murdoch murdoch.dun...@gmail.com wrote:

 On 17/04/2013 11:04 AM, Ivan Alves wrote:
 Dear all,
 
 List g has 2 elements
 
  names(g)
 [1] 2009-10-07 2012-02-29
 
 and the list plot
 
 lapply(g, plot, main=names(g))
 
 results in equal plot titles with both list names, whereas distinct titles 
 names(g[1]) and names(g[2]) are sought. Clearly, lapply is passing 'g' in 
 stead of consecutively passing g[1] and then g[2] to process the additional 
 'main'  argument to plot.  help(lapply) is mute as to what to element-wise 
 pass parameters.  Any suggestion would be appreciated.
 
 I think you want mapply rather than lapply, or you could do lapply on a 
 vector of indices.  For example,
 
 mapply(plot, g, main=names)
 
 or
 
 lapply(1:2, function(i) plot(g[[i]], main=names(g)[i]))
 
 Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Mancova with R

2013-04-17 Thread John Fox
Dear Remi,

On Wed, 17 Apr 2013 08:23:07 -0700 (PDT)
 Rémi Lesmerises remilesmeri...@yahoo.ca wrote:
 Dear John,
 
 Thanks for your comments! But when I tried your suggestion, the output was as 
 the following:
 
 
  Response Dist_arbre :
             Df     Sum Sq    Mean Sq F value    Pr(F)    
 Poids        1 0.00010398 0.00010398  6.2910 0.0364733 *  
 Age          1 0.5202 0.5202  3.1476 0.1139652    
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
 
 I have the P-value but not the direction of the relationship, information 
 that I had with  lm(as.matrix(Y) ~  x+z). I could combine the results of 
 these two tests, but it seems inelegant to me. Moreover I didn't have a total 
 significance test as with a true MANCOVA. 
 
 An idea?!

Yes, try what I suggested. The output that you show here isn't from the the 
Anova() function in the car package. As well, you might find it useful to read 
the on-line appendix on multivariate linear models, at 
http://socserv.socsci.mcmaster.ca/jfox/Books/Companion/appendix/Appendix-Multivariate-Linear-Models.pdf,
 from the book with which the car package is associated.

Best,
 John

 
 Rémi Lesmerises, biol. M.Sc.,
 Candidat Ph.D. en Biologie
 Université du Québec à Rimouski
 300, allée des Ursulines
 remilesmeri...@yahoo.ca
 
 
 
 
  De : John Fox j...@mcmaster.ca
 À : Rémi Lesmerises remilesmeri...@yahoo.ca 
 Cc : r-help@r-project.org r-help@r-project.org 
 Envoyé le : mercredi 17 avril 2013 10h54
 Objet : Re: [R] Mancova with R
  
 
 Dear Remi,
 
 Take a look at the Anova() function in the car package. In your case, you 
 could use
 
 Anova(lm(as.matrix(Y) ~  x + z))
 
 or, for more detail,
 
 summary(Anova(lm(as.matrix(Y) ~  x + z)))
 
 I hope this helps,
 John
 
 
 John Fox
 Sen. William McMaster Prof. of Social Statistics
 Department of Sociology
 McMaster University
 Hamilton, Ontario, Canada
 http://socserv.mcmaster.ca/jfox/
     
 On Wed, 17 Apr 2013 07:47:27 -0700 (PDT)
 Rémi Lesmerises remilesmeri...@yahoo.ca wrote:
  Dear all,
  
  I'm trying to compare two sets of variables, the first set is composed 
  exclusively of numerical variables and the second regroups factors and 
  numerical variables. I can't use a Manova because of this inclusion of 
  numerical variables in the second set. The solution should be to perform a 
  Mancova, but I didn't find any package that allow this type of test.
  
  I've already looked in this forum and on the net to find answers, but the 
  only thing I've found is the following:
  
  
  lm(as.matrix(Y) ~  x+z)
  x and z could be numerical and factors. The problem with that is it 
  actually only perform a succession of lm (or glm), one for each numerical 
  variable contained in the Y matrix. It is not a true MANCOVA that do a 
  significance test (most often a Wald test) for the overall two sets 
  comparison. Such a test is available in SPSS and SAS, but I really want to 
  stay in R! Someone have any idea?
  
  Thanks in advance for your help!
   
  Rémi Lesmerises, biol. M.Sc.,
  Candidat Ph.D. en Biologie
  Université du Québec à Rimouski
  remilesmeri...@yahoo.ca
  
      [[alternative HTML version deleted]]
  


John Fox
Sen. William McMaster Prof. of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Anova unbalanced

2013-04-17 Thread Jose Iparraguirre
Dear Claudia,

Your question has been posed on many previous occasions. 
The (short) answer has always been the same: have a look at the Anova function 
in the car package but before doing that, get a copy of John Fox's Applied 
Regression Analysis and Generalized Linear Models book.
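(A minimal sketch of that suggestion, assuming the data sit in a data frame called dat with columns enviromentscore and country:

library(car)
fit <- lm(enviromentscore ~ country, data = dat)
anova(fit)               # overall F test; with a single factor this is fine, balanced or not
Anova(fit, type = "II")  # car's Anova, useful once more factors are added
)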
Best,

José


José Iparraguirre
Chief Economist
Age UK

T 020 303 31482
E jose.iparragui...@ageuk.org.uk
Twitter @jose.iparraguirre@ageuk


Tavis House, 1- 6 Tavistock Square
London, WC1H 9NB
www.ageuk.org.uk | ageukblog.org.uk | @ageukcampaigns 




-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of paladini
Sent: 17 April 2013 10:47
To: r-help@r-project.org
Subject: [R] Anova unbalanced

Hello everybody,
I have got a data set with about 400 companies. Each company has a 
score for its enviroment comportment between 0 and 100. These companies 
belong to  about 15 different countries. I have e.g. 70 companies from 
UK and 5 from Luxembourg,- so the data set is pretty unbalanced and I 
want to do an ANOVA. Somthing like aov(enviromentscore~country). But the 
aov function is just for a balanced design.
So I wonder if I can use fit=lm(enviromentscore~country), anova (fit) 
instead? Would this be okay or can it also only be used with balanced 
data?

Thanking you in anticipation, best regards


Claudia

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Regularized Regressions

2013-04-17 Thread Christos Giannoulis
Hi Levi,

Thank you very much for your reply and concern.

There was a package till 2011...or even 2012...entitled Generalized Path
Seeking Regression in R which was using a combination of all of these
methods...as a way to regularize regression.

However, I presently can't find it. Thus, I was wondering if someone in the
R-community could inform us if there is an upgraded version of it with a
different name or another package which does the same thing.

Thank you again very much and my apologies for populating the mail messages
of the community.

Cheers

Christos

On Wed, Apr 17, 2013 at 9:50 AM, Levi Waldron
lwaldron.resea...@gmail.comwrote:

 Perhaps I am wrong, but I think there are only a few packages supporting
 Elastic Net, and none of them also perform Best Subsets.


 On Wed, Apr 17, 2013 at 8:46 AM, Christos Giannoulis 
 cgiann...@gmail.comwrote:

 Merhaba, Hello to you too Mehmet (Yasu ki sena)

 Thank you for your email and especially for sharing this package. I
 appreciate it.

 However, my feeling is that this package does not have the third component
 of Best Subsets (pls correct me if I am wrong). It uses only a combination
 of Ridge and Lasso.

 If you happen to know any other packages that uses all of them I would
 greatly welcome and appreciate if you were so kind and share it. I tried
 to
 search the cran lists but I am not sure I can found something like that.
 That's why I was asking the R-community

 Thank you again for your prompt response!

 Cheers

 Christos

 On Wed, Apr 17, 2013 at 8:16 AM, Suzen, Mehmet msu...@gmail.com wrote:

  Yasu,
 
  Try Elastic nets:
  http://cran.r-project.org/web/packages/pensim/index.html
 
  There some other packages supporting elastic nets: Just search the CRAN
 
  Cheers,
  Mehmet
 
 
  On 17 April 2013 13:19, Christos Giannoulis cgiann...@gmail.com
 wrote:
   Hi all,
  
   I would greatly appreciate if someone was so kind and share with us a
   package or method that uses a regularized regression approach that
  balances
   a regression model performance and model complexity.
  
   That said I would be most grateful is there is an R-package that
 combines
   Ridge (sum of squares coefficients), Lasso: Sum of absolute
 coefficients
   and Best Subsets: Number of coefficients as methods of regularized
   regression.
  
   Sincerely,
  
   Christos Giannoulis
  
   [[alternative HTML version deleted]]
  
   __
   R-help@r-project.org mailing list
   https://stat.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
   and provide commented, minimal, self-contained, reproducible code.
 

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] odfWeave: Some questions about potential formatting options

2013-04-17 Thread Milan Bouchet-Valat
Le mardi 16 avril 2013 à 10:15 -0700, Paul Miller a écrit :
 Hi Milan and Max,
 
 Thanks to each of you for your reply to my post. Thus far, I've
 managed to find answers to some of the questions I asked initially.
 
 I am now able to control the justification of the leftmost column in
 my tables, as well as to add borders to the top and bottom. I also
 downloaded Milan's revised version of odfWeave at the link below, and
 found that it does a nice job of controlling column widths.
 
 http://nalimilan.perso.neuf.fr/transfert/odfWeave.tar.gz
 
 There are some other things I'm still struggling with though. 
 
 1. Is it possible to get odfTableCaption and odfFigureCaption to make
 the titles they produce bold? I understand it might be possible to
 accomplish this by changing something in the styles but am not sure
 what. If someone can give me a hint, I can likely do the rest.
Just right-click on a caption and choose Edit paragraph style... (in
the template document). Or edit the styles called Table and
Illustration.

 2. Is there any way to get odfFigureCaption to put titles at the top
 of the figure instead of the bottom? I've noticed that odfTableCaption
 is able to do this but apparently not odfFigureCaption.
No idea.

 3. Is it possible to add special characters to the output? Below is a
 sample Kaplan-Meier analysis. There's a footnote in there that reads
 Note: X2(1) = xx.xx, p = .. Is there any way to make the X a
 lowercase Chi and to superscript the 2? I did quite a bit of digging
 on this topic. It sounds like it might be difficult, especially if one
 is using Windows as I am.
For the Chi you can copy the Unicode character χ from e.g. LibreOffice
and use it in the string passed to odfCat() and friends. If that does
not work on Windows, you can also use the escape code \u03C7.

For the ², you can either use the Unicode character (code \u00B2), or
try to insert ODF markup to put a 2 as an exponent (I did not test
the second option).
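
(For instance, an untested sketch of the escape-code route for the footnote shown further down:

odfCat("Note: \u03C7\u00B2(1) = xx.xx, p = .")
)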


Regards

 Thanks,
 
 Paul 
 
 ##
  Get data 
 ##
 
  Load packages 
 
 require(survival)
 require(MASS)
 
  Sample analysis 
 
 attach(gehan)
 gehan.surv - survfit(Surv(time, cens) ~ treat, data= gehan, conf.type = 
 log-log)
 print(gehan.surv)
 
 survTable - summary(gehan.surv)$table
 survTable - data.frame(Treatment = rownames(survTable), survTable, 
 row.names=NULL)
 survTable - subset(survTable, select = -c(records, n.max))
 
 ##
  odfWeave 
 ##
 
  Load odfWeave 
 
 require(odfWeave)
 
  Modify StyleDefs 
 
 currentDefs - getStyleDefs()
 
 currentDefs$firstColumn$type - Table Column
 currentDefs$firstColumn$columnWidth - 5 cm
 currentDefs$secondColumn$type - Table Column
 currentDefs$secondColumn$columnWidth - 3 cm
 
 currentDefs$ArialCenteredBold$fontSize - 10pt
 currentDefs$ArialNormal$fontSize - 10pt
 currentDefs$ArialCentered$fontSize - 10pt
 currentDefs$ArialHighlight$fontSize - 10pt
 
 currentDefs$ArialLeftBold - currentDefs$ArialCenteredBold
 currentDefs$ArialLeftBold$textAlign - left
 
 currentDefs$cgroupBorder - currentDefs$lowerBorder
 currentDefs$cgroupBorder$topBorder - 0.0007in solid #00
 
 setStyleDefs(currentDefs)
 
  Modify ImageDefs 
 
 imageDefs - getImageDefs()
 imageDefs$dispWidth - 5.5
 imageDefs$dispHeight- 5.5
 setImageDefs(imageDefs)
 
  Modify Styles 
 
 currentStyles <- getStyles()
 currentStyles$figureFrame <- "frameWithBorders"
 setStyles(currentStyles)
 
  Set odt table styles 
 
 tableStyles <- tableStyles(survTable, useRowNames = FALSE, header = "")
 tableStyles$headerCell[1,] <- "cgroupBorder"
 tableStyles$header[,1] <- "ArialLeftBold"
 tableStyles$text[,1] <- "ArialNormal"
 tableStyles$cell[2,] <- "lowerBorder"
 
  Weave odt source file 
 
 fp <- "N:/Studies/HCRPC1211/Report/odfWeaveTest/"
 inFile <- paste(fp, "testWeaveIn.odt", sep = "")
 outFile <- paste(fp, "testWeaveOut.odt", sep = "")
 odfWeave(inFile, outFile)
 
 ##
  Contents of .odt source file 
 ##
 
 Here is a sample Kaplan-Meier table.
 
 <<testKMTable, echo=FALSE, results = "xml">>=
 odfTableCaption("A Sample Kaplan-Meier Analysis Table")
 odfTable(survTable, useRowNames = FALSE, digits = 3,
 colnames = c("Treatment", "Number", "Events", "Median", "95% LCL", "95% UCL"),
 colStyles = c("firstColumn", "secondColumn", "secondColumn", 
 "secondColumn", "secondColumn", "secondColumn"),
 styles = tableStyles)
 odfCat("Note: X2(1) = xx.xx, p = .")
 @
 
 Here is a sample Kaplan-Meier graph.
 
 <<testKMFig, echo=FALSE, fig = TRUE>>=
 odfFigureCaption("A Sample Kaplan-Meier Analysis Graph", label = "Figure")
 plot(gehan.surv, xlab = "Time", ylab = "Survivorship")
 @


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Best way to calculate averages of Blocks in an matrix?

2013-04-17 Thread Keith S Weintraub
Folks,
  I recently was given a simulated data set like the following subset:

sim_sub <- structure(list(V11 = c(0.01, 0, 0, 0.01, 0, 0.01, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0), V12 = c(0, 0, 0, 0.01, 0.03, 0, 
0, 0, 0, 0, 0, 0.01, 0, 0.01, 0, 0, 0, 0, 0, 0.04), V13 = c(0, 
0, 0, 0.01, 0, 0, 0, 0, 0, 0.01, 0, 0, 0, 0, 0.01, 0, 0, 0, 0, 
0.01), V14 = c(0, 0.01, 0.01, 0.01, 0.01, 0, 0, 0, 0, 0.03, 0, 
0, 0.01, 0.01, 0.04, 0.01, 0.02, 0, 0.01, 0.03), V15 = c(0, 0.01, 
0, 0, 0.01, 0, 0, 0, 0.01, 0.02, 0.01, 0, 0, 0.01, 0, 0, 0, 0.01, 
0.01, 0.04), V16 = c(0, 0, 0, 0.03, 0.02, 0.01, 0, 0, 0.02, 0.02, 
0, 0.02, 0.02, 0, 0.01, 0.01, 0, 0, 0.03, 0.01), V17 = c(0, 0.01, 
0, 0.01, 0, 0, 0, 0.01, 0.05, 0.03, 0, 0.01, 0, 0.02, 0.02, 0, 
0, 0.01, 0.02, 0.04), V18 = c(0, 0.01, 0, 0.03, 0.03, 0, 0, 0, 
0.02, 0.01, 0, 0.02, 0.01, 0.02, 0.03, 0.02, 0, 0, 0.04, 0.04
), V19 = c(0, 0.01, 0.01, 0.02, 0.07, 0, 0, 0, 0.04, 0.01, 0.02, 
0, 0, 0, 0.04, 0, 0, 0, 0, 0.05), V20 = c(0, 0, 0, 0.01, 0.04, 
0.01, 0, 0, 0.02, 0.04, 0.01, 0, 0.02, 0, 0.03, 0, 0.02, 0.01, 
0.03, 0.03)), .Names = c(V11, V12, V13, V14, V15, V16, 
V17, V18, V19, V20), row.names = c(NA, 20L), class = data.frame)

 sim_sub
V11  V12  V13  V14  V15  V16  V17  V18  V19  V20
1  0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
2  0.00 0.00 0.00 0.01 0.01 0.00 0.01 0.01 0.01 0.00
3  0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.01 0.00
4  0.01 0.01 0.01 0.01 0.00 0.03 0.01 0.03 0.02 0.01
5  0.00 0.03 0.00 0.01 0.01 0.02 0.00 0.03 0.07 0.04
6  0.01 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.01
7  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
8  0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00
9  0.00 0.00 0.00 0.00 0.01 0.02 0.05 0.02 0.04 0.02
10 0.00 0.00 0.01 0.03 0.02 0.02 0.03 0.01 0.01 0.04
11 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.02 0.01
12 0.00 0.01 0.00 0.00 0.00 0.02 0.01 0.02 0.00 0.00
13 0.00 0.00 0.00 0.01 0.00 0.02 0.00 0.01 0.00 0.02
14 0.00 0.01 0.00 0.01 0.01 0.00 0.02 0.02 0.00 0.00
15 0.00 0.00 0.01 0.04 0.00 0.01 0.02 0.03 0.04 0.03
16 0.00 0.00 0.00 0.01 0.00 0.01 0.00 0.02 0.00 0.00
17 0.00 0.00 0.00 0.02 0.00 0.00 0.00 0.00 0.00 0.02
18 0.00 0.00 0.00 0.00 0.01 0.00 0.01 0.00 0.00 0.01
19 0.00 0.00 0.00 0.01 0.01 0.03 0.02 0.04 0.00 0.03
20 0.00 0.04 0.01 0.03 0.04 0.01 0.04 0.04 0.05 0.03

Every 5 rows represents one block of simulated data.

What would be the best way to average the blocks?

My way was to reshape sim_sub, average over the columns and then reshape back 
like so:

 matrix(colSums(matrix(t(sim_sub), byrow = TRUE, ncol = 50)), byrow = TRUE, 
 ncol = 10)/4
   [,1]   [,2]   [,3]   [,4]   [,5]  [,6]   [,7]   [,8]   [,9]  [,10]
[1,] 0.0050 0. 0. 0.0025 0.0025 0.005 0. 0.0050 0.0050 0.0050
[2,] 0. 0.0025 0. 0.0075 0.0025 0.005 0.0050 0.0075 0.0025 0.0050
[3,] 0. 0. 0. 0.0050 0.0025 0.005 0.0050 0.0025 0.0025 0.0075
[4,] 0.0025 0.0050 0.0025 0.0075 0.0075 0.020 0.0250 0.0275 0.0150 0.0150
[5,] 0. 0.0175 0.0075 0.0275 0.0175 0.015 0.0225 0.0275 0.0425 0.0350


How bad is t(sim_sub) in the above?

Thanks for your time,
KW

--

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] use of names() within lapply()

2013-04-17 Thread arun
Dear Ivan,
No problem.
If you want it in a single plot:


matplot(do.call(cbind,g),ylab="value",pch=1:2,main="Some 
plot",col=c("red","orange"),type="o")
 
legend("topleft",inset=.01,lty=c(1,1),title="Plot",col=c("red","orange"),names(g),horiz=TRUE)
A.K.



 From: Ivan Alves papu...@me.com
To: Duncan Murdoch murdoch.dun...@gmail.com; arun smartpink...@yahoo.com 
Cc: R-help@r-project.org R-help@r-project.org 
Sent: Wednesday, April 17, 2013 11:33 AM
Subject: Re: [R] use of names() within lapply()
 

Dear Duncan and A.K.
Many thanks for your super quick help. The modified lapply did the trick; 
mapply died with an error "Error in dots[[2L]][[1L]] : object of type 'builtin' 
is not subsettable".
Kind regards,
Ivan
On 17 Apr 2013, at 17:12, Duncan Murdoch murdoch.dun...@gmail.com wrote:

 On 17/04/2013 11:04 AM, Ivan Alves wrote:
 Dear all,
 
 List g has 2 elements
 
  names(g)
 [1] 2009-10-07 2012-02-29
 
 and the list plot
 
 lapply(g, plot, main=names(g))
 
 results in equal plot titles with both list names, whereas distinct titles 
 names(g[1]) and names(g[2]) are sought. Clearly, lapply is passing 'g' instead 
 of consecutively passing g[1] and then g[2] to process the additional 
 'main' argument to plot.  help(lapply) is mute as to how to pass parameters 
 element-wise.  Any suggestion would be appreciated.
 
 I think you want mapply rather than lapply, or you could do lapply on a 
 vector of indices.  For example,
 
 mapply(plot, g, main=names)
 
 or
 
 lapply(1:2, function(i) plot(g[[i]], main=names(g)[i]))
 
 Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Best way to calculate averages of Blocks in an matrix?

2013-04-17 Thread arun
do.call(rbind,lapply(split(sim_sub,((seq_len(nrow(sim_sub))-1)%/% 
5)+1),colMeans))
  #  V11   V12   V13   V14   V15  V16   V17   V18   V19   V20
#1 0.004 0.008 0.002 0.008 0.004 0.01 0.004 0.014 0.022 0.010
#2 0.002 0.000 0.002 0.006 0.006 0.01 0.018 0.006 0.010 0.014
#3 0.000 0.004 0.002 0.012 0.004 0.01 0.010 0.016 0.012 0.012
#4 0.000 0.008 0.002 0.014 0.012 0.01 0.014 0.020 0.010 0.018
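
For comparison, a rowsum()-based sketch that yields the same block means (it
relies on every block having exactly 5 rows, as in this example):

blk <- (seq_len(nrow(sim_sub)) - 1) %/% 5 + 1   # block index 1,1,1,1,1,2,...
rowsum(data.matrix(sim_sub), blk) / 5           # per-block column sums / block size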
A.K.



- Original Message -
From: Keith S Weintraub kw1...@gmail.com
To: r-help@r-project.org r-help@r-project.org
Cc: 
Sent: Wednesday, April 17, 2013 12:54 PM
Subject: [R] Best way to calculate averages of Blocks in an matrix?

Folks,
  I recently was given a simulated data set like the following subset:

sim_sub <- structure(list(V11 = c(0.01, 0, 0, 0.01, 0, 0.01, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0), V12 = c(0, 0, 0, 0.01, 0.03, 0, 
0, 0, 0, 0, 0, 0.01, 0, 0.01, 0, 0, 0, 0, 0, 0.04), V13 = c(0, 
0, 0, 0.01, 0, 0, 0, 0, 0, 0.01, 0, 0, 0, 0, 0.01, 0, 0, 0, 0, 
0.01), V14 = c(0, 0.01, 0.01, 0.01, 0.01, 0, 0, 0, 0, 0.03, 0, 
0, 0.01, 0.01, 0.04, 0.01, 0.02, 0, 0.01, 0.03), V15 = c(0, 0.01, 
0, 0, 0.01, 0, 0, 0, 0.01, 0.02, 0.01, 0, 0, 0.01, 0, 0, 0, 0.01, 
0.01, 0.04), V16 = c(0, 0, 0, 0.03, 0.02, 0.01, 0, 0, 0.02, 0.02, 
0, 0.02, 0.02, 0, 0.01, 0.01, 0, 0, 0.03, 0.01), V17 = c(0, 0.01, 
0, 0.01, 0, 0, 0, 0.01, 0.05, 0.03, 0, 0.01, 0, 0.02, 0.02, 0, 
0, 0.01, 0.02, 0.04), V18 = c(0, 0.01, 0, 0.03, 0.03, 0, 0, 0, 
0.02, 0.01, 0, 0.02, 0.01, 0.02, 0.03, 0.02, 0, 0, 0.04, 0.04
), V19 = c(0, 0.01, 0.01, 0.02, 0.07, 0, 0, 0, 0.04, 0.01, 0.02, 
0, 0, 0, 0.04, 0, 0, 0, 0, 0.05), V20 = c(0, 0, 0, 0.01, 0.04, 
0.01, 0, 0, 0.02, 0.04, 0.01, 0, 0.02, 0, 0.03, 0, 0.02, 0.01, 
0.03, 0.03)), .Names = c(V11, V12, V13, V14, V15, V16, 
V17, V18, V19, V20), row.names = c(NA, 20L), class = data.frame)

 sim_sub
    V11  V12  V13  V14  V15  V16  V17  V18  V19  V20
1  0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
2  0.00 0.00 0.00 0.01 0.01 0.00 0.01 0.01 0.01 0.00
3  0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.01 0.00
4  0.01 0.01 0.01 0.01 0.00 0.03 0.01 0.03 0.02 0.01
5  0.00 0.03 0.00 0.01 0.01 0.02 0.00 0.03 0.07 0.04
6  0.01 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.01
7  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
8  0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00
9  0.00 0.00 0.00 0.00 0.01 0.02 0.05 0.02 0.04 0.02
10 0.00 0.00 0.01 0.03 0.02 0.02 0.03 0.01 0.01 0.04
11 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.02 0.01
12 0.00 0.01 0.00 0.00 0.00 0.02 0.01 0.02 0.00 0.00
13 0.00 0.00 0.00 0.01 0.00 0.02 0.00 0.01 0.00 0.02
14 0.00 0.01 0.00 0.01 0.01 0.00 0.02 0.02 0.00 0.00
15 0.00 0.00 0.01 0.04 0.00 0.01 0.02 0.03 0.04 0.03
16 0.00 0.00 0.00 0.01 0.00 0.01 0.00 0.02 0.00 0.00
17 0.00 0.00 0.00 0.02 0.00 0.00 0.00 0.00 0.00 0.02
18 0.00 0.00 0.00 0.00 0.01 0.00 0.01 0.00 0.00 0.01
19 0.00 0.00 0.00 0.01 0.01 0.03 0.02 0.04 0.00 0.03
20 0.00 0.04 0.01 0.03 0.04 0.01 0.04 0.04 0.05 0.03

Every 5 rows represents one block of simulated data.

What would be the best way to average the blocks?

My way was to reshape sim_sub, average over the columns and then reshape back 
like so:

 matrix(colSums(matrix(t(sim_sub), byrow = TRUE, ncol = 50)), byrow = TRUE, 
 ncol = 10)/4
       [,1]   [,2]   [,3]   [,4]   [,5]  [,6]   [,7]   [,8]   [,9]  [,10]
[1,] 0.0050 0. 0. 0.0025 0.0025 0.005 0. 0.0050 0.0050 0.0050
[2,] 0. 0.0025 0. 0.0075 0.0025 0.005 0.0050 0.0075 0.0025 0.0050
[3,] 0. 0. 0. 0.0050 0.0025 0.005 0.0050 0.0025 0.0025 0.0075
[4,] 0.0025 0.0050 0.0025 0.0075 0.0075 0.020 0.0250 0.0275 0.0150 0.0150
[5,] 0. 0.0175 0.0075 0.0275 0.0175 0.015 0.0225 0.0275 0.0425 0.0350


How bad is t(sim_sub) in the above?

Thanks for your time,
KW

--

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Merge

2013-04-17 Thread Janesh Devkota
Hi, I have a quick question here. Let's say he has three data frames and he
needs to combine those three data frames using merge. Can we simply use
merge to join three data frames? I remember I had some problems using merge
for more than two data frames.

Thanks.


On Wed, Apr 17, 2013 at 1:05 AM, Farnoosh farnoosh...@yahoo.com wrote:

 Thanks a lot:)

 Sent from my iPad

 On Apr 16, 2013, at 10:15 PM, arun smartpink...@yahoo.com wrote:

  Hi Farnoosh,
  YOu can use either ?merge() or ?join()
  DataA <- read.table(text="
  ID v1
  1 10
  2 1
  3 22
  4 15
  5 3
  6 6
  7 8
  ",sep="",header=TRUE)
 
  DataB <- read.table(text="
  ID v2
  2 yes
  5 no
  7 yes
  ",sep="",header=TRUE,stringsAsFactors=FALSE)
 
  merge(DataA,DataB,by="ID",all.x=TRUE)
  #  ID v1   v2
  #1  1 10 NA
  #2  2  1  yes
  #3  3 22 NA
  #4  4 15 NA
  #5  5  3   no
  #6  6  6 NA
  #7  7  8  yes
   library(plyr)
   join(DataA,DataB,by="ID",type="left")
  #  ID v1   v2
  #1  1 10 NA
  #2  2  1  yes
  #3  3 22 NA
  #4  4 15 NA
  #5  5  3   no
  #6  6  6 NA
  #7  7  8  yes
  A.K.
 
 
 
 
 
  
  From: farnoosh sheikhi farnoosh...@yahoo.com
  To: smartpink...@yahoo.com smartpink...@yahoo.com
  Sent: Wednesday, April 17, 2013 12:52 AM
  Subject: Merge
 
 
 
  Hi Arun,
 
  I want to merge a data set with another data frame with 2 columns and
 keep the sample size of the DataA.
 
  DataA  DataB  DataCombine
  ID v1  ID V2  ID v1 v2
  1 10  2 yes  1 10 NA
  2 1  5 no  2 1 yes
  3 22  7 yes  3 22 NA
  4 15 4 15 NA
  5 3 5 3 no
  6 6 6 6 NA
  7 8 7 8 yes
 
 
  Thanks a lot for your help and time.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Best way to calculate averages of Blocks in an matrix?

2013-04-17 Thread arun
Also,
do.call(rbind,lapply(split(sim_sub,rep(1:(1+nrow(sim_sub)/5),each=5)[seq_len(nrow(sim_sub))]),colMeans))
#    V11   V12   V13   V14   V15  V16   V17   V18   V19   V20
#1 0.004 0.008 0.002 0.008 0.004 0.01 0.004 0.014 0.022 0.010
#2 0.002 0.000 0.002 0.006 0.006 0.01 0.018 0.006 0.010 0.014
#3 0.000 0.004 0.002 0.012 0.004 0.01 0.010 0.016 0.012 0.012
#4 0.000 0.008 0.002 0.014 0.012 0.01 0.014 0.020 0.010 0.018
A.K.




- Original Message -
From: arun smartpink...@yahoo.com
To: Keith S Weintraub kw1...@gmail.com
Cc: R help r-help@r-project.org
Sent: Wednesday, April 17, 2013 1:04 PM
Subject: Re: [R] Best way to calculate averages of Blocks in an matrix?

do.call(rbind,lapply(split(sim_sub,((seq_len(nrow(sim_sub))-1)%/% 
5)+1),colMeans))
  #  V11   V12   V13   V14   V15  V16   V17   V18   V19   V20
#1 0.004 0.008 0.002 0.008 0.004 0.01 0.004 0.014 0.022 0.010
#2 0.002 0.000 0.002 0.006 0.006 0.01 0.018 0.006 0.010 0.014
#3 0.000 0.004 0.002 0.012 0.004 0.01 0.010 0.016 0.012 0.012
#4 0.000 0.008 0.002 0.014 0.012 0.01 0.014 0.020 0.010 0.018
A.K.



- Original Message -
From: Keith S Weintraub kw1...@gmail.com
To: r-help@r-project.org r-help@r-project.org
Cc: 
Sent: Wednesday, April 17, 2013 12:54 PM
Subject: [R] Best way to calculate averages of Blocks in an matrix?

Folks,
  I recently was given a simulated data set like the following subset:

sim_sub <- structure(list(V11 = c(0.01, 0, 0, 0.01, 0, 0.01, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0), V12 = c(0, 0, 0, 0.01, 0.03, 0, 
0, 0, 0, 0, 0, 0.01, 0, 0.01, 0, 0, 0, 0, 0, 0.04), V13 = c(0, 
0, 0, 0.01, 0, 0, 0, 0, 0, 0.01, 0, 0, 0, 0, 0.01, 0, 0, 0, 0, 
0.01), V14 = c(0, 0.01, 0.01, 0.01, 0.01, 0, 0, 0, 0, 0.03, 0, 
0, 0.01, 0.01, 0.04, 0.01, 0.02, 0, 0.01, 0.03), V15 = c(0, 0.01, 
0, 0, 0.01, 0, 0, 0, 0.01, 0.02, 0.01, 0, 0, 0.01, 0, 0, 0, 0.01, 
0.01, 0.04), V16 = c(0, 0, 0, 0.03, 0.02, 0.01, 0, 0, 0.02, 0.02, 
0, 0.02, 0.02, 0, 0.01, 0.01, 0, 0, 0.03, 0.01), V17 = c(0, 0.01, 
0, 0.01, 0, 0, 0, 0.01, 0.05, 0.03, 0, 0.01, 0, 0.02, 0.02, 0, 
0, 0.01, 0.02, 0.04), V18 = c(0, 0.01, 0, 0.03, 0.03, 0, 0, 0, 
0.02, 0.01, 0, 0.02, 0.01, 0.02, 0.03, 0.02, 0, 0, 0.04, 0.04
), V19 = c(0, 0.01, 0.01, 0.02, 0.07, 0, 0, 0, 0.04, 0.01, 0.02, 
0, 0, 0, 0.04, 0, 0, 0, 0, 0.05), V20 = c(0, 0, 0, 0.01, 0.04, 
0.01, 0, 0, 0.02, 0.04, 0.01, 0, 0.02, 0, 0.03, 0, 0.02, 0.01, 
0.03, 0.03)), .Names = c(V11, V12, V13, V14, V15, V16, 
V17, V18, V19, V20), row.names = c(NA, 20L), class = data.frame)

 sim_sub
    V11  V12  V13  V14  V15  V16  V17  V18  V19  V20
1  0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
2  0.00 0.00 0.00 0.01 0.01 0.00 0.01 0.01 0.01 0.00
3  0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.01 0.00
4  0.01 0.01 0.01 0.01 0.00 0.03 0.01 0.03 0.02 0.01
5  0.00 0.03 0.00 0.01 0.01 0.02 0.00 0.03 0.07 0.04
6  0.01 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.01
7  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
8  0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00
9  0.00 0.00 0.00 0.00 0.01 0.02 0.05 0.02 0.04 0.02
10 0.00 0.00 0.01 0.03 0.02 0.02 0.03 0.01 0.01 0.04
11 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.02 0.01
12 0.00 0.01 0.00 0.00 0.00 0.02 0.01 0.02 0.00 0.00
13 0.00 0.00 0.00 0.01 0.00 0.02 0.00 0.01 0.00 0.02
14 0.00 0.01 0.00 0.01 0.01 0.00 0.02 0.02 0.00 0.00
15 0.00 0.00 0.01 0.04 0.00 0.01 0.02 0.03 0.04 0.03
16 0.00 0.00 0.00 0.01 0.00 0.01 0.00 0.02 0.00 0.00
17 0.00 0.00 0.00 0.02 0.00 0.00 0.00 0.00 0.00 0.02
18 0.00 0.00 0.00 0.00 0.01 0.00 0.01 0.00 0.00 0.01
19 0.00 0.00 0.00 0.01 0.01 0.03 0.02 0.04 0.00 0.03
20 0.00 0.04 0.01 0.03 0.04 0.01 0.04 0.04 0.05 0.03

Every 5 rows represents one block of simulated data.

What would be the best way to average the blocks?

My way was to reshape sim_sub, average over the columns and then reshape back 
like so:

 matrix(colSums(matrix(t(sim_sub), byrow = TRUE, ncol = 50)), byrow = TRUE, 
 ncol = 10)/4
       [,1]   [,2]   [,3]   [,4]   [,5]  [,6]   [,7]   [,8]   [,9]  [,10]
[1,] 0.0050 0. 0. 0.0025 0.0025 0.005 0. 0.0050 0.0050 0.0050
[2,] 0. 0.0025 0. 0.0075 0.0025 0.005 0.0050 0.0075 0.0025 0.0050
[3,] 0. 0. 0. 0.0050 0.0025 0.005 0.0050 0.0025 0.0025 0.0075
[4,] 0.0025 0.0050 0.0025 0.0075 0.0075 0.020 0.0250 0.0275 0.0150 0.0150
[5,] 0. 0.0175 0.0075 0.0275 0.0175 0.015 0.0225 0.0275 0.0425 0.0350


How bad is t(sim_sub) in the above?

Thanks for your time,
KW

--

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Best way to calculate averages of Blocks in an matrix?

2013-04-17 Thread Rui Barradas

Hello,

Try the following.


blocks <- rep(1:(1 + nrow(sim_sub) %/% 5), each = 5)[seq_len(nrow(sim_sub))]
aggregate(sim_sub, list(blocks), FUN = mean)


Hope this helps,

Rui Barradas

Em 17-04-2013 18:04, arun escreveu:

do.call(rbind,lapply(split(sim_sub,((seq_len(nrow(sim_sub))-1)%/% 
5)+1),colMeans))
   #  V11   V12   V13   V14   V15  V16   V17   V18   V19   V20
#1 0.004 0.008 0.002 0.008 0.004 0.01 0.004 0.014 0.022 0.010
#2 0.002 0.000 0.002 0.006 0.006 0.01 0.018 0.006 0.010 0.014
#3 0.000 0.004 0.002 0.012 0.004 0.01 0.010 0.016 0.012 0.012
#4 0.000 0.008 0.002 0.014 0.012 0.01 0.014 0.020 0.010 0.018
A.K.



- Original Message -
From: Keith S Weintraub kw1...@gmail.com
To: r-help@r-project.org r-help@r-project.org
Cc:
Sent: Wednesday, April 17, 2013 12:54 PM
Subject: [R] Best way to calculate averages of Blocks in an matrix?

Folks,
   I recently was given a simulated data set like the following subset:

sim_sub <- structure(list(V11 = c(0.01, 0, 0, 0.01, 0, 0.01, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0), V12 = c(0, 0, 0, 0.01, 0.03, 0,
0, 0, 0, 0, 0, 0.01, 0, 0.01, 0, 0, 0, 0, 0, 0.04), V13 = c(0,
0, 0, 0.01, 0, 0, 0, 0, 0, 0.01, 0, 0, 0, 0, 0.01, 0, 0, 0, 0,
0.01), V14 = c(0, 0.01, 0.01, 0.01, 0.01, 0, 0, 0, 0, 0.03, 0,
0, 0.01, 0.01, 0.04, 0.01, 0.02, 0, 0.01, 0.03), V15 = c(0, 0.01,
0, 0, 0.01, 0, 0, 0, 0.01, 0.02, 0.01, 0, 0, 0.01, 0, 0, 0, 0.01,
0.01, 0.04), V16 = c(0, 0, 0, 0.03, 0.02, 0.01, 0, 0, 0.02, 0.02,
0, 0.02, 0.02, 0, 0.01, 0.01, 0, 0, 0.03, 0.01), V17 = c(0, 0.01,
0, 0.01, 0, 0, 0, 0.01, 0.05, 0.03, 0, 0.01, 0, 0.02, 0.02, 0,
0, 0.01, 0.02, 0.04), V18 = c(0, 0.01, 0, 0.03, 0.03, 0, 0, 0,
0.02, 0.01, 0, 0.02, 0.01, 0.02, 0.03, 0.02, 0, 0, 0.04, 0.04
), V19 = c(0, 0.01, 0.01, 0.02, 0.07, 0, 0, 0, 0.04, 0.01, 0.02,
0, 0, 0, 0.04, 0, 0, 0, 0, 0.05), V20 = c(0, 0, 0, 0.01, 0.04,
0.01, 0, 0, 0.02, 0.04, 0.01, 0, 0.02, 0, 0.03, 0, 0.02, 0.01,
0.03, 0.03)), .Names = c(V11, V12, V13, V14, V15, V16,
V17, V18, V19, V20), row.names = c(NA, 20L), class = data.frame)


sim_sub

 V11  V12  V13  V14  V15  V16  V17  V18  V19  V20
1  0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
2  0.00 0.00 0.00 0.01 0.01 0.00 0.01 0.01 0.01 0.00
3  0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.01 0.00
4  0.01 0.01 0.01 0.01 0.00 0.03 0.01 0.03 0.02 0.01
5  0.00 0.03 0.00 0.01 0.01 0.02 0.00 0.03 0.07 0.04
6  0.01 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.01
7  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
8  0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00
9  0.00 0.00 0.00 0.00 0.01 0.02 0.05 0.02 0.04 0.02
10 0.00 0.00 0.01 0.03 0.02 0.02 0.03 0.01 0.01 0.04
11 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.02 0.01
12 0.00 0.01 0.00 0.00 0.00 0.02 0.01 0.02 0.00 0.00
13 0.00 0.00 0.00 0.01 0.00 0.02 0.00 0.01 0.00 0.02
14 0.00 0.01 0.00 0.01 0.01 0.00 0.02 0.02 0.00 0.00
15 0.00 0.00 0.01 0.04 0.00 0.01 0.02 0.03 0.04 0.03
16 0.00 0.00 0.00 0.01 0.00 0.01 0.00 0.02 0.00 0.00
17 0.00 0.00 0.00 0.02 0.00 0.00 0.00 0.00 0.00 0.02
18 0.00 0.00 0.00 0.00 0.01 0.00 0.01 0.00 0.00 0.01
19 0.00 0.00 0.00 0.01 0.01 0.03 0.02 0.04 0.00 0.03
20 0.00 0.04 0.01 0.03 0.04 0.01 0.04 0.04 0.05 0.03

Every 5 rows represents one block of simulated data.

What would be the best way to average the blocks?

My way was to reshape sim_sub, average over the columns and then reshape back 
like so:


matrix(colSums(matrix(t(sim_sub), byrow = TRUE, ncol = 50)), byrow = TRUE, ncol 
= 10)/4

[,1]   [,2]   [,3]   [,4]   [,5]  [,6]   [,7]   [,8]   [,9]  [,10]
[1,] 0.0050 0. 0. 0.0025 0.0025 0.005 0. 0.0050 0.0050 0.0050
[2,] 0. 0.0025 0. 0.0075 0.0025 0.005 0.0050 0.0075 0.0025 0.0050
[3,] 0. 0. 0. 0.0050 0.0025 0.005 0.0050 0.0025 0.0025 0.0075
[4,] 0.0025 0.0050 0.0025 0.0075 0.0075 0.020 0.0250 0.0275 0.0150 0.0150
[5,] 0. 0.0175 0.0075 0.0275 0.0175 0.015 0.0225 0.0275 0.0425 0.0350


How bad is t(sim_sub) in the above?

Thanks for your time,
KW

--

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Mancova with R

2013-04-17 Thread peter dalgaard

On Apr 17, 2013, at 16:47 , Rémi Lesmerises wrote:

 Dear all,
 
 I'm trying to compare two sets of variables: the first set is composed 
 exclusively of numerical variables and the second combines factors and 
 numerical variables. I can't use a MANOVA because of this inclusion of 
 numerical variables in the second set. The solution should be to perform a 
 MANCOVA, but I didn't find any package that allows this type of test.
 
 I've already looked in this forum and on the net to find answers, but the 
 only thing I've found is the following:
 
 
 lm(as.matrix(Y) ~  x+z)
 x and z could be numerical and factors. The problem with that is it actually 
 only performs a succession of lm (or glm) fits, one for each numerical variable 
 contained in the Y matrix. It is not a true MANCOVA that does a significance 
 test (most often a Wald test) for the overall comparison of the two sets. Such a 
 test is available in SPSS and SAS, but I really want to stay in R! Does anyone 
 have any idea?

You can fit two models and compare them with (say)

fit1 - lm(as.matrix(Y) ~  x+z)
fit2 - lm(as.matrix(Y) ~  x)
anova(fit1, fit2, test="Wilks")

or, removing terms sequentially:

anova(fit1, test="Wilks")
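
A self-contained illustration of the same idea with made-up data (the variable
names below are hypothetical, not from the original question):

set.seed(1)
Y <- matrix(rnorm(60), ncol = 3)           # three numeric responses
x <- rnorm(20)                             # numeric covariate
z <- factor(rep(c("a", "b"), each = 10))   # factor covariate
fit1 <- lm(Y ~ x + z)
fit2 <- lm(Y ~ x)
anova(fit1, fit2, test = "Wilks")          # multivariate test for dropping z
anova(fit1, test = "Wilks")                # sequential multivariate tests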

 
 Thanks in advance for your help!
  
 Rémi Lesmerises, biol. M.Sc.,
 Candidat Ph.D. en Biologie
 Université du Québec à Rimouski
 remilesmeri...@yahoo.ca
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Merge

2013-04-17 Thread arun


HI Janesh,

YOu can use:
library(plyr)
?join_all() 

#From the help page:

 dfs <- list(
   a = data.frame(x = 1:10, a = runif(10)),
   b = data.frame(x = 1:10, b = runif(10)),
   c = data.frame(x = 1:10, c = runif(10))
 )
 join_all(dfs)
 join_all(dfs, "x")

 join_all(dfs, "x")
#    x a b c
#1   1 0.7113766 0.1348978 0.1153703
#2   2 0.2520057 0.7249154 0.2362936
#3   3 0.5670157 0.8166805 0.3049683
#4   4 0.7441726 0.4929165 0.6779029
#5   5 0.5616914 0.5272339 0.6202915
#6   6 0.2858429 0.1203205 0.8399356
#7   7 0.9910520 0.1251815 0.4729418
#8   8 0.7079778 0.5465055 0.8951371
#9   9 0.0564100 0.1837211 0.6451289
#10 10 0.7169663 0.1328287 0.2467554
 Reduce(function(...) merge(...,by="x"),dfs)
#    x a b c
#1   1 0.7113766 0.1348978 0.1153703
#2   2 0.2520057 0.7249154 0.2362936
#3   3 0.5670157 0.8166805 0.3049683
#4   4 0.7441726 0.4929165 0.6779029
#5   5 0.5616914 0.5272339 0.6202915
#6   6 0.2858429 0.1203205 0.8399356
#7   7 0.9910520 0.1251815 0.4729418
#8   8 0.7079778 0.5465055 0.8951371
#9   9 0.0564100 0.1837211 0.6451289
#10 10 0.7169663 0.1328287 0.2467554
A.K.



 From: Janesh Devkota janesh.devk...@gmail.com
To: Farnoosh farnoosh...@yahoo.com 
Cc: arun smartpink...@yahoo.com; R help r-help@r-project.org 
Sent: Wednesday, April 17, 2013 1:05 PM
Subject: Re: [R] Merge
 


Hi, I have a quick question here. Lets say he has three data frames and he 
needs to combine those three data frame using merge. Can we simply use merge to 
join three data frames ? I remember I had some problem using merge for more 
than two dataframes. 

Thanks.



On Wed, Apr 17, 2013 at 1:05 AM, Farnoosh farnoosh...@yahoo.com wrote:

Thanks a lot:)

Sent from my iPad


On Apr 16, 2013, at 10:15 PM, arun smartpink...@yahoo.com wrote:

 Hi Farnoosh,
 YOu can use either ?merge() or ?join()
 DataA- read.table(text=
 ID     v1
 1     10
 2     1
 3     22
 4     15
 5     3
 6     6
 7     8
 ,sep=,header=TRUE)

 DataB- read.table(text=
 ID v2
 2 yes
 5 no
 7 yes
 ,sep=,header=TRUE,stringsAsFactors=FALSE)

 merge(DataA,DataB,by=ID,all.x=TRUE)
 #  ID v1   v2
 #1  1 10 NA
 #2  2  1  yes
 #3  3 22 NA
 #4  4 15 NA
 #5  5  3   no
 #6  6  6 NA
 #7  7  8  yes
  library(plyr)
  join(DataA,DataB,by=ID,type=left)
 #  ID v1   v2
 #1  1 10 NA
 #2  2  1  yes
 #3  3 22 NA
 #4  4 15 NA
 #5  5  3   no
 #6  6  6 NA
 #7  7  8  yes
 A.K.





 
 From: farnoosh sheikhi farnoosh...@yahoo.com
 To: smartpink...@yahoo.com smartpink...@yahoo.com
 Sent: Wednesday, April 17, 2013 12:52 AM
 Subject: Merge



 Hi Arun,

 I want to merge a data set with another data frame with 2 columns and keep 
 the sample size of the DataA.

 DataA  DataB  DataCombine
 ID v1  ID V2  ID v1 v2
 1 10  2 yes  1 10 NA
 2 1  5 no  2 1 yes
 3 22  7 yes  3 22 NA
 4 15     4 15 NA
 5 3     5 3 no
 6 6     6 6 NA
 7 8     7 8 yes


 Thanks a lot for your help and time.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] mgcv: how select significant predictor vars when using gam(...select=TRUE) using automatic optimization

2013-04-17 Thread Jan Holstein
I have 11 possible predictor variables and use them to model quite a few
target variables. 
In search of a consistent and possibly non-manual way to identify
the significant predictor variables out of the eleven, I thought the option
select=TRUE might do the trick.

Example: (here only 4 predictors) 
first is vanilla with select=F

 fit1 <- gam(target~s(mgs)+s(gsd)+s(mud)+s(ssCmax),family=quasi(link=log),data=wspe1,select=F)
 summary(fit1)

Family: quasi 
Link function: log 
Formula:
target ~ s(mgs) + s(gsd) + s(mud) + s(ssCmax)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)  
(Intercept)   -34.57  20.47  -1.689   0.0913 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
Approximate significance of smooth terms:
edf Ref.df  F  p-value
s(mgs)    2.335  2.623  0.260    0.829
s(gsd)    6.868  7.506 13.955  <2e-16 ***
s(mud)    8.990  9.000 11.727  <2e-16 ***
s(ssCmax) 6.770  6.978  6.664 7.68e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

R-sq.(adj) =  0.402   Deviance explained = 40.4%
GCV score = 8.8563e+05  Scale est. = 8.8053e+05  n = 4511



then turn select=TRUE




fit2 <- gam(target~s(mgs)+s(gsd)+s(mud)+s(ssCmax),family=quasi(link=log),data=wspe1,select=TRUE)
 summary(fit2)

Family: quasi 
Link function: log 

Formula:
target ~ s(mgs) + s(gsd) + s(mud) + s(ssCmax)
Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.1585     1.7439   0.091    0.928
Approximate significance of smooth terms:
            edf Ref.df     F p-value
s(mgs)    2.456      8 24.50  <2e-16 ***
s(gsd)    7.272      9 14.33  <2e-16 ***
s(mud)    7.678      9 20.38  <2e-16 ***
s(ssCmax) 6.556      9 14.36  <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
R-sq.(adj) =  0.397   Deviance explained =   40%
GCV score = 8.9209e+05  Scale est. = 8.8715e+05  n = 4511

I seem to not fully understand how to work with select.
The predictor mgs is obviously not significant, as seen from fit1
(above), yet here it appears as significant. Why was it not dropped? How are
non-significant predictors identified? 





--
View this message in context: 
http://r.789695.n4.nabble.com/mgcv-how-select-significant-predictor-vars-when-using-gam-select-TRUE-using-automatic-optimization-tp4664510.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Merge

2013-04-17 Thread Janesh Devkota
Hello Arun,

Thank you so much for the prompt reply. I have one simple question here.
Does the three dots (...) in the Reduce function mean we are applying it to
three data frames here? So, if we were to combine four, would that be
four dots?

Thanks.


On Wed, Apr 17, 2013 at 12:16 PM, arun smartpink...@yahoo.com wrote:



 HI Janesh,

 YOu can use:
 library(plyr)
 ?join_all()

 #From the help page:

  dfs - list(
a = data.frame(x = 1:10, a = runif(10)),
b = data.frame(x = 1:10, b = runif(10)),
c = data.frame(x = 1:10, c = runif(10))
  )
  join_all(dfs)
  join_all(dfs, x)

  join_all(dfs, x)
 #x a b c
 #1   1 0.7113766 0.1348978 0.1153703
 #2   2 0.2520057 0.7249154 0.2362936
 #3   3 0.5670157 0.8166805 0.3049683
 #4   4 0.7441726 0.4929165 0.6779029
 #5   5 0.5616914 0.5272339 0.6202915
 #6   6 0.2858429 0.1203205 0.8399356
 #7   7 0.9910520 0.1251815 0.4729418
 #8   8 0.7079778 0.5465055 0.8951371
 #9   9 0.0564100 0.1837211 0.6451289
 #10 10 0.7169663 0.1328287 0.2467554
  Reduce(function(...) merge(...,by=x),dfs)
 #x a b c
 #1   1 0.7113766 0.1348978 0.1153703
 #2   2 0.2520057 0.7249154 0.2362936
 #3   3 0.5670157 0.8166805 0.3049683
 #4   4 0.7441726 0.4929165 0.6779029
 #5   5 0.5616914 0.5272339 0.6202915
 #6   6 0.2858429 0.1203205 0.8399356
 #7   7 0.9910520 0.1251815 0.4729418
 #8   8 0.7079778 0.5465055 0.8951371
 #9   9 0.0564100 0.1837211 0.6451289
 #10 10 0.7169663 0.1328287 0.2467554
 A.K.


 
  From: Janesh Devkota janesh.devk...@gmail.com
 To: Farnoosh farnoosh...@yahoo.com
 Cc: arun smartpink...@yahoo.com; R help r-help@r-project.org
 Sent: Wednesday, April 17, 2013 1:05 PM
 Subject: Re: [R] Merge



 Hi, I have a quick question here. Lets say he has three data frames and he
 needs to combine those three data frame using merge. Can we simply use
 merge to join three data frames ? I remember I had some problem using merge
 for more than two dataframes.

 Thanks.



 On Wed, Apr 17, 2013 at 1:05 AM, Farnoosh farnoosh...@yahoo.com wrote:

 Thanks a lot:)
 
 Sent from my iPad
 
 
 On Apr 16, 2013, at 10:15 PM, arun smartpink...@yahoo.com wrote:
 
  Hi Farnoosh,
  YOu can use either ?merge() or ?join()
  DataA- read.table(text=
  ID v1
  1 10
  2 1
  3 22
  4 15
  5 3
  6 6
  7 8
  ,sep=,header=TRUE)
 
  DataB- read.table(text=
  ID v2
  2 yes
  5 no
  7 yes
  ,sep=,header=TRUE,stringsAsFactors=FALSE)
 
  merge(DataA,DataB,by=ID,all.x=TRUE)
  #  ID v1   v2
  #1  1 10 NA
  #2  2  1  yes
  #3  3 22 NA
  #4  4 15 NA
  #5  5  3   no
  #6  6  6 NA
  #7  7  8  yes
   library(plyr)
   join(DataA,DataB,by=ID,type=left)
  #  ID v1   v2
  #1  1 10 NA
  #2  2  1  yes
  #3  3 22 NA
  #4  4 15 NA
  #5  5  3   no
  #6  6  6 NA
  #7  7  8  yes
  A.K.
 
 
 
 
 
  
  From: farnoosh sheikhi farnoosh...@yahoo.com
  To: smartpink...@yahoo.com smartpink...@yahoo.com
  Sent: Wednesday, April 17, 2013 12:52 AM
  Subject: Merge
 
 
 
  Hi Arun,
 
  I want to merge a data set with another data frame with 2 columns and
 keep the sample size of the DataA.
 
  DataA  DataB  DataCombine
  ID v1  ID V2  ID v1 v2
  1 10  2 yes  1 10 NA
  2 1  5 no  2 1 yes
  3 22  7 yes  3 22 NA
  4 15 4 15 NA
  5 3 5 3 no
  6 6 6 6 NA
  7 8 7 8 yes
 
 
  Thanks a lot for your help and time.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Merge

2013-04-17 Thread arun
No, you don't have to use four dots.
Please check these links for further details:


http://stackoverflow.com/questions/5890576/usage-of-three-dots-or-dot-dot-dot-in-functions
http://cran.r-project.org/doc/manuals/R-lang.pdf
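
In short, the dots are R's ellipsis mechanism, not a count of data frames:
Reduce() merges two objects at a time, so the same call works for any number of
frames. A small sketch, reusing the dfs list from the earlier message and adding
a made-up fourth frame:

dfs4 <- c(dfs, list(d = data.frame(x = 1:10, d = runif(10))))
Reduce(function(...) merge(..., by = "x"), dfs4)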
A.K.



 From: Janesh Devkota janesh.devk...@gmail.com
To: arun smartpink...@yahoo.com 
Cc: R help r-help@r-project.org; Farnoosh farnoosh...@yahoo.com 
Sent: Wednesday, April 17, 2013 1:23 PM
Subject: Re: [R] Merge
 


Hello Arun, 

Thank you so much for the prompt reply. I have one simple question here. DOes 
three dots (...) in the reduce function means we are applying for three 
dataframes here ? So, if we were to combine four would that dots be four dots ? 

Thanks.



On Wed, Apr 17, 2013 at 12:16 PM, arun smartpink...@yahoo.com wrote:



HI Janesh,

YOu can use:
library(plyr)
?join_all()

#From the help page:

 dfs - list(
   a = data.frame(x = 1:10, a = runif(10)),
   b = data.frame(x = 1:10, b = runif(10)),
   c = data.frame(x = 1:10, c = runif(10))
 )
 join_all(dfs)
 join_all(dfs, x)

 join_all(dfs, x)
#    x a b c
#1   1 0.7113766 0.1348978 0.1153703
#2   2 0.2520057 0.7249154 0.2362936
#3   3 0.5670157 0.8166805 0.3049683
#4   4 0.7441726 0.4929165 0.6779029
#5   5 0.5616914 0.5272339 0.6202915
#6   6 0.2858429 0.1203205 0.8399356
#7   7 0.9910520 0.1251815 0.4729418
#8   8 0.7079778 0.5465055 0.8951371
#9   9 0.0564100 0.1837211 0.6451289
#10 10 0.7169663 0.1328287 0.2467554
 Reduce(function(...) merge(...,by=x),dfs)
#    x a b c
#1   1 0.7113766 0.1348978 0.1153703
#2   2 0.2520057 0.7249154 0.2362936
#3   3 0.5670157 0.8166805 0.3049683
#4   4 0.7441726 0.4929165 0.6779029
#5   5 0.5616914 0.5272339 0.6202915
#6   6 0.2858429 0.1203205 0.8399356
#7   7 0.9910520 0.1251815 0.4729418
#8   8 0.7079778 0.5465055 0.8951371
#9   9 0.0564100 0.1837211 0.6451289
#10 10 0.7169663 0.1328287 0.2467554
A.K.




 From: Janesh Devkota janesh.devk...@gmail.com
To: Farnoosh farnoosh...@yahoo.com
Cc: arun smartpink...@yahoo.com; R help r-help@r-project.org
Sent: Wednesday, April 17, 2013 1:05 PM
Subject: Re: [R] Merge




Hi, I have a quick question here. Lets say he has three data frames and he 
needs to combine those three data frame using merge. Can we simply use merge 
to join three data frames ? I remember I had some problem using merge for more 
than two dataframes. 

Thanks.



On Wed, Apr 17, 2013 at 1:05 AM, Farnoosh farnoosh...@yahoo.com wrote:

Thanks a lot:)

Sent from my iPad


On Apr 16, 2013, at 10:15 PM, arun smartpink...@yahoo.com wrote:

 Hi Farnoosh,
 YOu can use either ?merge() or ?join()
 DataA- read.table(text=
 ID     v1
 1     10
 2     1
 3     22
 4     15
 5     3
 6     6
 7     8
 ,sep=,header=TRUE)

 DataB- read.table(text=
 ID v2
 2 yes
 5 no
 7 yes
 ,sep=,header=TRUE,stringsAsFactors=FALSE)

 merge(DataA,DataB,by=ID,all.x=TRUE)
 #  ID v1   v2
 #1  1 10 NA
 #2  2  1  yes
 #3  3 22 NA
 #4  4 15 NA
 #5  5  3   no
 #6  6  6 NA
 #7  7  8  yes
  library(plyr)
  join(DataA,DataB,by=ID,type=left)
 #  ID v1   v2
 #1  1 10 NA
 #2  2  1  yes
 #3  3 22 NA
 #4  4 15 NA
 #5  5  3   no
 #6  6  6 NA
 #7  7  8  yes
 A.K.





 
 From: farnoosh sheikhi farnoosh...@yahoo.com
 To: smartpink...@yahoo.com smartpink...@yahoo.com
 Sent: Wednesday, April 17, 2013 12:52 AM
 Subject: Merge



 Hi Arun,

 I want to merge a data set with another data frame with 2 columns and keep 
 the sample size of the DataA.

 DataA  DataB  DataCombine
 ID v1  ID V2  ID v1 v2
 1 10  2 yes  1 10 NA
 2 1  5 no  2 1 yes
 3 22  7 yes  3 22 NA
 4 15     4 15 NA
 5 3     5 3 no
 6 6     6 6 NA
 7 8     7 8 yes


 Thanks a lot for your help and time.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] proper way to handle obsolete function names

2013-04-17 Thread R. Michael Weylandt michael.weyla...@gmail.com


On Apr 17, 2013, at 10:17 AM, Jannis bt_jan...@yahoo.de wrote:

 Dear R community,
 
 
 what would be the proper R way to handle obsolete function names? I have 
 created several packages with functions and sometimes would like to change 
 the name of a function, but would like to create a mechanism so that other 
 scripts or functions using the old name still work.

It sounds like you want .Deprecate

?.Deprecate

Michael

 
 
 Cheers
 Jannis
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] mgcv: how select significant predictor vars when using gam(...select=TRUE) using automatic optimization

2013-04-17 Thread Simon Wood

Jan,

What mgcv version are you using, please? (Older versions have a poor 
p-value approximation when select=TRUE, but of course it's possible that 
you've managed to break the newer approximation as well)
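
(For reference, a quick way to check the installed version: packageVersion("mgcv").)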


The 'select=TRUE' option adds a penalty to each smooth, to allow it to 
be penalized out of the model altogether via optimization of the 
smoothing parameter selection criterion. Usually it is better to use 
REML for smoothing parameter selection in this case using 
'method=REML' as an option to gam. This is because REML is less prone 
to undersmoothing than GCV. So 'select=TRUE' is not selecting on the 
basis of the p-values, themselves, but obviously this sort of 
discrepancy should not be happening.
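
A minimal sketch of that suggestion, reusing the formula and data names from the
question (untested here, since the data are not available):

fit2b <- gam(target ~ s(mgs) + s(gsd) + s(mud) + s(ssCmax),
             family = quasi(link = "log"), data = wspe1,
             select = TRUE, method = "REML")
summary(fit2b)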


best,
Simon

On 17/04/13 15:50, Jan Holstein wrote:

I have 11 possible predictor variables and use them to model quite a few
target variables.
In search for a consistent manner and possibly non-manual manner to identify
the significant predictor vars out of the eleven I thought the option
select=T might do.

Example: (here only 4 pedictors)
first is vanilla with select=F


fit1 <- gam(target~s(mgs)+s(gsd)+s(mud)+s(ssCmax),family=quasi(link=log),data=wspe1,select=F)
summary(fit1)


Family: quasi
Link function: log
Formula:
target ~ s(mgs) + s(gsd) + s(mud) + s(ssCmax)
Parametric coefficients:
 Estimate Std. Error t value Pr(>|t|)
(Intercept)   -34.57  20.47  -1.689   0.0913 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
 edf Ref.df  F  p-value
s(mgs)    2.335  2.623  0.260    0.829
s(gsd)    6.868  7.506 13.955  <2e-16 ***
s(mud)    8.990  9.000 11.727  <2e-16 ***
s(ssCmax) 6.770  6.978  6.664 7.68e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.402   Deviance explained = 40.4%
GCV score = 8.8563e+05  Scale est. = 8.8053e+05  n = 4511



then turn select=TRUE




fit2 <- gam(target~s(mgs)+s(gsd)+s(mud)+s(ssCmax),family=quasi(link=log),data=wspe1,select=TRUE)

summary(fit2)


Family: quasi
Link function: log

Formula:
target ~ s(mgs) + s(gsd) + s(mud) + s(ssCmax)
Parametric coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)    0.1585     1.7439   0.091    0.928
Approximate significance of smooth terms:
 edf Ref.df F p-value
s(mgs)    2.456      8 24.50  <2e-16 ***
s(gsd)    7.272      9 14.33  <2e-16 ***
s(mud)    7.678      9 20.38  <2e-16 ***
s(ssCmax) 6.556      9 14.36  <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) =  0.397   Deviance explained =   40%
GCV score = 8.9209e+05  Scale est. = 8.8715e+05  n = 4511

I seem to not fully understand how to work with select.
The predictor mgs is obviously not significant, as seen from fit1
(above), yet here it appears as significant. Why was it not dropped? How are
non-significant predictors identified?





--
View this message in context: 
http://r.789695.n4.nabble.com/mgcv-how-select-significant-predictor-vars-when-using-gam-select-TRUE-using-automatic-optimization-tp4664510.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603   http://people.bath.ac.uk/sw283

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] proper way to handle obsolete function names

2013-04-17 Thread Peter Langfelder
On Wed, Apr 17, 2013 at 10:36 AM, R. Michael Weylandt
michael.weyla...@gmail.com

 It sounds like you want .Deprecate

 ?.Deprecate


Perhaps you meant Deprecated?

?Deprecated
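
A small sketch of the usual pattern (the function names here are made up):

new_name <- function(x) x^2          # the renamed function
old_name <- function(...) {
  .Deprecated("new_name")            # warns that old_name is deprecated
  new_name(...)                      # forward to the new name
}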

Best,

Peter

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] simulation\bootstrap of list factors

2013-04-17 Thread Berg, Tobias van den
Dear R experts,

I am trying to simulate a list containing data matrices. Unfortunately, I can't 
get it to work.

A small example:

n=5
nbootstrap=2

  subsets <- list()
  for (i in 1:n){
subsets[[i]] <- rnorm(5, mean=80, sd=1)


for (j in 1:nbootstrap){
  test <- list()
  test[[j]] <- subsets[[i]]
  }
  }

How can I get test to be 2 simulation rounds, each with 5 matrices?
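
One possible restructuring, sketched here for reference (it builds nbootstrap
rounds, each holding n simulated vectors, using the n and nbootstrap defined
above):

test <- lapply(seq_len(nbootstrap), function(j)
  lapply(seq_len(n), function(i) rnorm(5, mean = 80, sd = 1)))
length(test)       # 2 rounds
length(test[[1]])  # 5 simulated draws in each round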

Kind regards, Tobias




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Unsubscribe please

2013-04-17 Thread David Winsemius

On Apr 16, 2013, at 11:02 PM, Pascal Oettli wrote:

 Hello,
 
 Don't reply only to me.
 
 1) Filter the unwanted mails,

There is also an option to get R-help postings in digest format.

-- 
David.

 2) It takes few days to unsubscribe you.
 
 Regards,
 Pascal
 
 
 On 04/17/2013 02:59 PM, bert verleysen (beverconsult) wrote:
 I did this, but still I receive to much mails
 
 Bert Verleysen
 00 32 (0)477 874 272
 
 
 
 samen zoekend naar generatief organiseren
 
 
 -Oorspronkelijk bericht-
 Van: Pascal Oettli [mailto:kri...@ymail.com]
 Verzonden: woensdag 17 april 2013 6:33
 Aan: Bert Verleysen (beverconsult)
 CC: R-help@r-project.org
 Onderwerp: Re: [R] Unsubscribe please
 
 Hi,
 
 Do it yourself:
 https://stat.ethz.ch/mailman/listinfo/r-help
 
 Hint:
 Bbottom of the page (To unsubscribe from R-help)
 
 Regards,
 Pascal
 
 
 On 04/17/2013 06:33 AM, Bert Verleysen (beverconsult) wrote:
 
 
 Verstuurd vanaf mijn iPad
 Bert Verleysen
 00 32 (0)477 874 272
 www.beverconsult.be
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 
 
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] use of names() within lapply()

2013-04-17 Thread Duncan Murdoch

On 17/04/2013 11:33 AM, Ivan Alves wrote:

Dear Duncan and A.K.
Many thanks for your super quick help. The modified lapply did the trick, mapply died 
with a error Error in dots[[2L]][[1L]] : object of type 'builtin' is not 
subsettable.


That's due to a typo:  I should have said

mapply(plot, g, main=names(g))
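
An equivalent sketch with Map(), which likewise pairs each list element with its
name:

Map(function(x, nm) plot(x, main = nm), g, names(g))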


Duncan Murdoch

Kind regards,
Ivan
On 17 Apr 2013, at 17:12, Duncan Murdoch murdoch.dun...@gmail.com wrote:

 On 17/04/2013 11:04 AM, Ivan Alves wrote:
 Dear all,

 List g has 2 elements

  names(g)
 [1] 2009-10-07 2012-02-29

 and the list plot

 lapply(g, plot, main=names(g))

 results in equal plot titles with both list names, whereas distinct titles 
names(g[1]) and names(g[2]) are sought. Clearly, lapply is passing 'g' instead of 
consecutively passing g[1] and then g[2] to process the additional 'main' argument 
to plot.  help(lapply) is mute as to how to pass parameters element-wise.  Any 
suggestion would be appreciated.

 I think you want mapply rather than lapply, or you could do lapply on a 
vector of indices.  For example,

 mapply(plot, g, main=names)

 or

 lapply(1:2, function(i) plot(g[[i]], main=names(g)[i]))

 Duncan Murdoch



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Best way to calculate averages of Blocks in an matrix?

2013-04-17 Thread David Winsemius

On Apr 17, 2013, at 9:54 AM, Keith S Weintraub wrote:

 Folks,
  I recently was given a simulated data set like the following subset:
 
 sim_sub <- structure(list(V11 = c(0.01, 0, 0, 0.01, 0, 0.01, 0, 0, 0, 0, 
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), V12 = c(0, 0, 0, 0.01, 0.03, 0, 
 0, 0, 0, 0, 0, 0.01, 0, 0.01, 0, 0, 0, 0, 0, 0.04), V13 = c(0, 
 0, 0, 0.01, 0, 0, 0, 0, 0, 0.01, 0, 0, 0, 0, 0.01, 0, 0, 0, 0, 
 0.01), V14 = c(0, 0.01, 0.01, 0.01, 0.01, 0, 0, 0, 0, 0.03, 0, 
 0, 0.01, 0.01, 0.04, 0.01, 0.02, 0, 0.01, 0.03), V15 = c(0, 0.01, 
 0, 0, 0.01, 0, 0, 0, 0.01, 0.02, 0.01, 0, 0, 0.01, 0, 0, 0, 0.01, 
 0.01, 0.04), V16 = c(0, 0, 0, 0.03, 0.02, 0.01, 0, 0, 0.02, 0.02, 
 0, 0.02, 0.02, 0, 0.01, 0.01, 0, 0, 0.03, 0.01), V17 = c(0, 0.01, 
 0, 0.01, 0, 0, 0, 0.01, 0.05, 0.03, 0, 0.01, 0, 0.02, 0.02, 0, 
 0, 0.01, 0.02, 0.04), V18 = c(0, 0.01, 0, 0.03, 0.03, 0, 0, 0, 
 0.02, 0.01, 0, 0.02, 0.01, 0.02, 0.03, 0.02, 0, 0, 0.04, 0.04
 ), V19 = c(0, 0.01, 0.01, 0.02, 0.07, 0, 0, 0, 0.04, 0.01, 0.02, 
 0, 0, 0, 0.04, 0, 0, 0, 0, 0.05), V20 = c(0, 0, 0, 0.01, 0.04, 
 0.01, 0, 0, 0.02, 0.04, 0.01, 0, 0.02, 0, 0.03, 0, 0.02, 0.01, 
 0.03, 0.03)), .Names = c(V11, V12, V13, V14, V15, V16, 
 V17, V18, V19, V20), row.names = c(NA, 20L), class = data.frame)
 
 sim_sub
V11  V12  V13  V14  V15  V16  V17  V18  V19  V20
 1  0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
 2  0.00 0.00 0.00 0.01 0.01 0.00 0.01 0.01 0.01 0.00
 3  0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.01 0.00
 4  0.01 0.01 0.01 0.01 0.00 0.03 0.01 0.03 0.02 0.01
 5  0.00 0.03 0.00 0.01 0.01 0.02 0.00 0.03 0.07 0.04
 6  0.01 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.01
 7  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
 8  0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00
 9  0.00 0.00 0.00 0.00 0.01 0.02 0.05 0.02 0.04 0.02
 10 0.00 0.00 0.01 0.03 0.02 0.02 0.03 0.01 0.01 0.04
 11 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.02 0.01
 12 0.00 0.01 0.00 0.00 0.00 0.02 0.01 0.02 0.00 0.00
 13 0.00 0.00 0.00 0.01 0.00 0.02 0.00 0.01 0.00 0.02
 14 0.00 0.01 0.00 0.01 0.01 0.00 0.02 0.02 0.00 0.00
 15 0.00 0.00 0.01 0.04 0.00 0.01 0.02 0.03 0.04 0.03
 16 0.00 0.00 0.00 0.01 0.00 0.01 0.00 0.02 0.00 0.00
 17 0.00 0.00 0.00 0.02 0.00 0.00 0.00 0.00 0.00 0.02
 18 0.00 0.00 0.00 0.00 0.01 0.00 0.01 0.00 0.00 0.01
 19 0.00 0.00 0.00 0.01 0.01 0.03 0.02 0.04 0.00 0.03
 20 0.00 0.04 0.01 0.03 0.04 0.01 0.04 0.04 0.05 0.03
 
 Every 5 rows represents one block of simulated data.
 
 What would be the best way to average the blocks?

This answers the posed question:

   tapply( data.matrix(sim_sub),  rep( rep(1:4, each=5), each=10) ,mean)
 1  2  3  4 
0.0030 0.0070 0.0106 0.0144 


Your code following suggests that you do not want the average values within 
blocks but within blocks AND ALSO within columns (although how you get 5 rows 
of 5 blocks from a 20 row input object is unclear to me)

 data.frame( lapply(sim_sub, function(col) tapply(col, rep(1:4, each=5), mean) 
  ) )
V11   V12   V13   V14   V15  V16   V17   V18   V19   V20
1 0.004 0.008 0.002 0.008 0.004 0.01 0.004 0.014 0.022 0.010
2 0.002 0.000 0.002 0.006 0.006 0.01 0.018 0.006 0.010 0.014
3 0.000 0.004 0.002 0.012 0.004 0.01 0.010 0.016 0.012 0.012
4 0.000 0.008 0.002 0.014 0.012 0.01 0.014 0.020 0.010 0.018

From your code I am guessing a typo of 5 for 4?

 
 My way was to reshape sim_sub, average over the columns and then reshape back 
 like so:
 
 matrix(colSums(matrix(t(sim_sub), byrow = TRUE, ncol = 50)), byrow = TRUE, 
 ncol = 10)/4
   [,1]   [,2]   [,3]   [,4]   [,5]  [,6]   [,7]   [,8]   [,9]  [,10]
 [1,] 0.0050 0. 0. 0.0025 0.0025 0.005 0. 0.0050 0.0050 0.0050
 [2,] 0. 0.0025 0. 0.0075 0.0025 0.005 0.0050 0.0075 0.0025 0.0050
 [3,] 0. 0. 0. 0.0050 0.0025 0.005 0.0050 0.0025 0.0025 0.0075
 [4,] 0.0025 0.0050 0.0025 0.0075 0.0075 0.020 0.0250 0.0275 0.0150 0.0150
 [5,] 0. 0.0175 0.0075 0.0275 0.0175 0.015 0.0225 0.0275 0.0425 0.0350
 
 
 How bad is t(sim_sub) in the above?

The whole matrix( matrix( t(.), ... )) approach seems kind of tortured, but to 
your question, t() is a fairly efficient function.

-- 

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Best way to calculate averages of Blocks in an matrix?

2013-04-17 Thread arun



 tapply(t(data.matrix(sim_sub)),rep( rep(1:4, each=5), each=10),mean) 
   #  1  2  3  4 
#0.0086 0.0074 0.0082 0.0108 

unlist(lapply(split(sim_sub,((seq_len(nrow(sim_sub))-1)%/%5)+1),function(x) 
mean(unlist(x))))
#    1  2  3  4 
#0.0086 0.0074 0.0082 0.0108 
A.K.

- Original Message -
From: David Winsemius dwinsem...@comcast.net
To: Keith S Weintraub kw1...@gmail.com
Cc: r-help@r-project.org r-help@r-project.org
Sent: Wednesday, April 17, 2013 4:05 PM
Subject: Re: [R] Best way to calculate averages of Blocks in an matrix?


On Apr 17, 2013, at 9:54 AM, Keith S Weintraub wrote:

 Folks,
  I recently was given a simulated data set like the following subset:
 
 sim_sub <- structure(list(V11 = c(0.01, 0, 0, 0.01, 0, 0.01, 0, 0, 0, 0, 
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), V12 = c(0, 0, 0, 0.01, 0.03, 0, 
 0, 0, 0, 0, 0, 0.01, 0, 0.01, 0, 0, 0, 0, 0, 0.04), V13 = c(0, 
 0, 0, 0.01, 0, 0, 0, 0, 0, 0.01, 0, 0, 0, 0, 0.01, 0, 0, 0, 0, 
 0.01), V14 = c(0, 0.01, 0.01, 0.01, 0.01, 0, 0, 0, 0, 0.03, 0, 
 0, 0.01, 0.01, 0.04, 0.01, 0.02, 0, 0.01, 0.03), V15 = c(0, 0.01, 
 0, 0, 0.01, 0, 0, 0, 0.01, 0.02, 0.01, 0, 0, 0.01, 0, 0, 0, 0.01, 
 0.01, 0.04), V16 = c(0, 0, 0, 0.03, 0.02, 0.01, 0, 0, 0.02, 0.02, 
 0, 0.02, 0.02, 0, 0.01, 0.01, 0, 0, 0.03, 0.01), V17 = c(0, 0.01, 
 0, 0.01, 0, 0, 0, 0.01, 0.05, 0.03, 0, 0.01, 0, 0.02, 0.02, 0, 
 0, 0.01, 0.02, 0.04), V18 = c(0, 0.01, 0, 0.03, 0.03, 0, 0, 0, 
 0.02, 0.01, 0, 0.02, 0.01, 0.02, 0.03, 0.02, 0, 0, 0.04, 0.04
 ), V19 = c(0, 0.01, 0.01, 0.02, 0.07, 0, 0, 0, 0.04, 0.01, 0.02, 
 0, 0, 0, 0.04, 0, 0, 0, 0, 0.05), V20 = c(0, 0, 0, 0.01, 0.04, 
 0.01, 0, 0, 0.02, 0.04, 0.01, 0, 0.02, 0, 0.03, 0, 0.02, 0.01, 
 0.03, 0.03)), .Names = c(V11, V12, V13, V14, V15, V16, 
 V17, V18, V19, V20), row.names = c(NA, 20L), class = data.frame)
 
 sim_sub
    V11  V12  V13  V14  V15  V16  V17  V18  V19  V20
 1  0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
 2  0.00 0.00 0.00 0.01 0.01 0.00 0.01 0.01 0.01 0.00
 3  0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.01 0.00
 4  0.01 0.01 0.01 0.01 0.00 0.03 0.01 0.03 0.02 0.01
 5  0.00 0.03 0.00 0.01 0.01 0.02 0.00 0.03 0.07 0.04
 6  0.01 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.01
 7  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
 8  0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00
 9  0.00 0.00 0.00 0.00 0.01 0.02 0.05 0.02 0.04 0.02
 10 0.00 0.00 0.01 0.03 0.02 0.02 0.03 0.01 0.01 0.04
 11 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.02 0.01
 12 0.00 0.01 0.00 0.00 0.00 0.02 0.01 0.02 0.00 0.00
 13 0.00 0.00 0.00 0.01 0.00 0.02 0.00 0.01 0.00 0.02
 14 0.00 0.01 0.00 0.01 0.01 0.00 0.02 0.02 0.00 0.00
 15 0.00 0.00 0.01 0.04 0.00 0.01 0.02 0.03 0.04 0.03
 16 0.00 0.00 0.00 0.01 0.00 0.01 0.00 0.02 0.00 0.00
 17 0.00 0.00 0.00 0.02 0.00 0.00 0.00 0.00 0.00 0.02
 18 0.00 0.00 0.00 0.00 0.01 0.00 0.01 0.00 0.00 0.01
 19 0.00 0.00 0.00 0.01 0.01 0.03 0.02 0.04 0.00 0.03
 20 0.00 0.04 0.01 0.03 0.04 0.01 0.04 0.04 0.05 0.03
 
 Every 5 rows represents one block of simulated data.
 
 What would be the best way to average the blocks?

This answers the posed question:

   tapply( data.matrix(sim_sub),  rep( rep(1:4, each=5), each=10) ,mean)
     1      2      3      4 
0.0030 0.0070 0.0106 0.0144 


Your code following suggests that you do not want the average values within 
blocks but within blocks AND ALSO within columns (although how you get 5 rows 
of 5 blocks from a 20 row input object is unclear to me)

 data.frame( lapply(sim_sub, function(col) tapply(col, rep(1:4, each=5), mean) 
  ) )
    V11   V12   V13   V14   V15  V16   V17   V18   V19   V20
1 0.004 0.008 0.002 0.008 0.004 0.01 0.004 0.014 0.022 0.010
2 0.002 0.000 0.002 0.006 0.006 0.01 0.018 0.006 0.010 0.014
3 0.000 0.004 0.002 0.012 0.004 0.01 0.010 0.016 0.012 0.012
4 0.000 0.008 0.002 0.014 0.012 0.01 0.014 0.020 0.010 0.018

From your code I am guessing a typo of 5 for 4?

 
 My way was to reshape sim_sub, average over the columns and then reshape back 
 like so:
 
 matrix(colSums(matrix(t(sim_sub), byrow = TRUE, ncol = 50)), byrow = TRUE, 
 ncol = 10)/4
       [,1]   [,2]   [,3]   [,4]   [,5]  [,6]   [,7]   [,8]   [,9]  [,10]
 [1,] 0.0050 0. 0. 0.0025 0.0025 0.005 0. 0.0050 0.0050 0.0050
 [2,] 0. 0.0025 0. 0.0075 0.0025 0.005 0.0050 0.0075 0.0025 0.0050
 [3,] 0. 0. 0. 0.0050 0.0025 0.005 0.0050 0.0025 0.0025 0.0075
 [4,] 0.0025 0.0050 0.0025 0.0075 0.0075 0.020 0.0250 0.0275 0.0150 0.0150
 [5,] 0. 0.0175 0.0075 0.0275 0.0175 0.015 0.0225 0.0275 0.0425 0.0350
 
 
 How bad is t(sim_sub) in the above?

The whole matrix( matrix( t(.), ... )) approach seems kind of tortured, but to 
your question, t() is a fairly efficient function.
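
An alternative that avoids the double reshape (just a sketch, assuming the goal 
is block-by-column means over blocks of 5 consecutive rows): rowsum() sums rows 
within groups, so dividing by the block size gives the means directly:

 block <- rep(1:4, each = 5)               # block id for each of the 20 rows
 rowsum(data.matrix(sim_sub), block) / 5   # 4 x 10 matrix of block-by-column means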

-- 

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Full Information Maximum Likelihood estimation method for multivariate sample selection problem

2013-04-17 Thread Champak Ishram
Dear R experts/ users

The Full Information Maximum Likelihood (FIML) estimation approach is
considered more robust than the Seemingly Unrelated Regression (SUR)
approach for analysing data from a multivariate sample selection problem.
The zero cases in my dependent variables arise from three sources:
irrelevant options, not choosing because of negative utility, and no use
during the reported time period. FIML can address the estimation problem
associated with cross-equation correlation of the errors.
I am interested in learning and applying the FIML method of estimation. I
searched R resources on the internet but could not find material that
specifically addresses my questions. I request R experts/users to address
the following queries.

Q.1. Which R package (e.g. lavaan, mvnmle, stat4 or sem) is appropriate
for analysing the multivariate sample selection problem using the FIML
estimation method?

Q.2. How should the code be formulated to execute the FIML method?

Q.3. What is the right method, similar to the log-likelihood ratio, to
determine variable stability in the model?

Q.4. My original dependent variables are measured as percentages. Do I
need to transform them into any other specific functional form?

I attempted to formulate the data in the following structure.

Selection equation

ws = c(w1, w2, w3)
# values of dependent variables in selection equations are binary (1 and 0)

zs = c(z1, z2, z3, z4, z5)
# z1, z2, z3 are continuous and z4 and z5 are dummy explanatory variables
# in the selection equation

Level equation (extent of particular option use)

ys = c(y1, y2, y3)
# values of dependent variables are percentages with some zero cases

xs = c(x1, x2, x3, x4, x5)
# x1, x2, x3 are continuous and x4 and x5 are dummy explanatory variables
# in the level equation


Note: The variables in both the selection and level equations are mostly the same.
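
For illustration only, the kind of single selection/level pair I can already
handle is sketched below. This is just an assumed starting point using the
sampleSelection package's selection() with method = "ml" (a Heckman-type model
for one pair of equations, not the full multivariate FIML system I am asking
about); "mydata" and the formulas are placeholders:

 library(sampleSelection)
 fit1 <- selection(selection = w1 ~ z1 + z2 + z3 + z4 + z5,
                   outcome   = y1 ~ x1 + x2 + x3 + x4 + x5,
                   data = mydata, method = "ml")
 summary(fit1)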



Advance thanks for helping me.

Champak Ishram

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Multi-core processing in glmulti

2013-04-17 Thread M.-O. Adams
Dear list,

I am trying to do an automated model selection of a glmm (function glmer;
package: lme4) containing a large number of predictors. As far as I
understand, glmulti is able to divide the process into chunks and proceed
by parallel processing on multiple cores (I sketch the kind of workflow I
mean after my questions below). Unfortunately this does not seem to work,
and I could not really find any advice on the matter on other forums.
Specifically, I have the following questions:

1) Does parallel processing only work for exhaustive screening
(glmulti(..., method = "h", ...)) or also for the genetic algorithm
(glmulti(..., method = "g", ...))?

2) Do I need to invoke another package designed for parallel processing
(e.g. package::parallel or package::snow) to set up the necessary
computational clusters before calling glmulti, or can glmulti address the
different cores of my PC on its own?
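
For context, the split-and-combine workflow I had imagined is sketched below.
This is only a guess at the intended usage -- it assumes glmulti's chunk/chunks
arguments and its consensus() function split and merge the candidate set the
way I understood the documentation, and it uses parallel::mclapply; the formula
and "mydata" are placeholders:

 library(parallel)
 library(glmulti)
 fits <- mclapply(1:4, function(i) {
   glmulti(y ~ x1 + x2 + x3 + x4, data = mydata, family = binomial,
           method = "h", chunk = i, chunks = 4)  # i-th quarter of the candidate set
 }, mc.cores = 4)
 best <- consensus(fits)                         # merge the partial results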

Any help would be greatly appreciated!

cheers,

marc



--
View this message in context: 
http://r.789695.n4.nabble.com/Multi-core-processing-in-glmulti-tp4664546.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] t-statistic for independent samples

2013-04-17 Thread David Arnold
Hi,

Typical things you read when new to stats are cautions about using a
t-statistic when comparing independent samples. You are steered toward a
pooled test or Welch's approximation of the degrees of freedom in order to
make the distribution a t-distribution. However, most texts give no
explanation of why you have to do this.

So I thought I would try a little experiment, which is outlined here.

Distribution of differences of independent samples
http://msemac.redwoods.edu/~darnold/math15/R/chapter11/DistributionForTwoIndependentSamplesPartII.html
  

As you can see at the above link, I see no evidence in these images of why you
need a pooled or Welch's test.

Anyone care to comment? Or should I put this on Stack Exchange?

D.




--
View this message in context: 
http://r.789695.n4.nabble.com/t-statistic-for-independent-samples-tp4664553.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] t-statistic for independent samples

2013-04-17 Thread Kevin E. Thorpe


On 04/17/2013 06:24 PM, David Arnold wrote:

Hi,

Typical things you read when new to stats are cautions about using a
t-statistic when comparing independent samples. You are steered toward a
pooled test or welch's approximation of the degrees of freedom in order to
make the distribution a t-distribution. However, most texts give no
information why you have to do this.

So I thought I try a little experiment which is outlined here.

Distribution of differences of independent samples
http://msemac.redwoods.edu/~darnold/math15/R/chapter11/DistributionForTwoIndependentSamplesPartII.html

As you can see in the above link, I see no evidence why you need a pooled or
Welch's in these images.

Anyone care to comment? Or should I put this on Stack Exchange?

D.


Admittedly, I just skimmed the page, but one thing stands out.  Your 
standard deviations are really quite close to each other.  Try your 
simulations again with variance ratios exceeding 2 and see what happens.
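
For instance, a quick sketch of the kind of check I mean (group sizes and SDs
made up for illustration):

set.seed(1)
pvals <- replicate(10000, {
  x1 <- rnorm(7, 100, 25)    # smaller group, larger SD
  x2 <- rnorm(14, 100, 10)   # larger group, smaller SD
  c(pooled = t.test(x1, x2, var.equal = TRUE)$p.value,
    welch  = t.test(x1, x2)$p.value)
})
rowMeans(pvals < 0.05)  # pooled type I error drifts well away from 0.05; Welch stays close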



--
Kevin E. Thorpe
Head of Biostatistics,  Applied Health Research Centre (AHRC)
Li Ka Shing Knowledge Institute of St. Michael's
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.tho...@utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] t-statistic for independent samples

2013-04-17 Thread Jay Kerns
Dear David,

On Wed, Apr 17, 2013 at 6:24 PM, David Arnold dwarnol...@suddenlink.net wrote:
 Hi,

[snip]


 D.

Before posting to StackExchange, check out the Wikipedia entry for
Behrens-Fisher problem.

Cheers,
Jay


-- 
G. Jay Kerns, Ph.D.
Youngstown State University
http://people.ysu.edu/~gkerns/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] t-statistic for independent samples

2013-04-17 Thread David Arnold
OK, although the variance ratio was already 2.25 to 1, I tried sigma1=10,
sigma2=25, which makes the ratio of the variances 6.25 to 1.

Still no change. See: 
http://msemac.redwoods.edu/~darnold/math15/R/chapter11/DistributionForTwoIndependentSamplesPartII.html

D.



--
View this message in context: 
http://r.789695.n4.nabble.com/t-statistic-for-independent-samples-tp4664553p4664556.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Understanding why a GAM can't suppress an intercept

2013-04-17 Thread Andrew Crane-Droesch
Simon,

Many thanks as always for your help.

I see and appreciate the example that you cited, but I'm having a hard 
time generalizing it to a multivariate case.  A bit about my context -- 
my dataset is response ratios; the log of a treatment over a control.  
One of my explanatory variables is treatment intensity.  When intensity 
goes to zero, the expectation of the response ratio should also go to 
zero.  Here is the model that I would like to fit:

model = (ResponseRatio~
 +s(as.factor(study),bs="re",by=intensity)
 +s(intensity)
 +s(x1,by=intensity)
 +s(x2,by=intensity)
 +te(x1,x2,by=intensity)
 +te(x1,intensity)
)

Here is the example that you gave:

library(mgcv)
set.seed(0)
n <- 100
x <- runif(n)*4-1; x <- sort(x)
f <- exp(4*x)/(1+exp(4*x)); y <- f+rnorm(100)*0.1; plot(x,y)
dat <- data.frame(x=x,y=y)

## Create a spline basis and penalty, making sure there is a knot
## at the constraint point, (0 here, but could be anywhere)
knots <- data.frame(x=seq(-1,3,length=9)) ## create knots
## set up smoother...
sm <- smoothCon(s(x,k=9,bs="cr"),dat,knots=knots)[[1]]

## 3rd parameter is value of spline at knot location 0,
## set it to 0 by dropping...
X <- sm$X[,-3]        ## spline basis
S <- sm$S[[1]][-3,-3] ## spline penalty
off <- y*0 + .6       ## offset term to force curve through (0, .6)

## fit spline constrained through (0, .6)...
b <- gam(y ~ X - 1 + offset(off),paraPen=list(X=list(S)))
lines(x,predict(b))

## compare to unconstrained fit...

b.u <- gam(y ~ s(x,k=9),data=dat,knots=knots)
lines(x,predict(b.u))

*My question*:  how can I extend your example to more than one smooth term, 
and to several smooth interactions, in the context of a model where 
E[ResponseRatio] must equal 0 when intensity equals zero?

I am not sure that I understand exactly what is going on when you call 
smoothCon to specify the `sm`.  I've written my own penalized spline 
code from scratch, but it is far less sophisticated than mgcv, and is 
basically a ridge regression where I optimize to get a lambda after 
specifying a model with a lot of knots.  mgcv clearly has a lot more 
going on, and is far preferable to my rudimentary code for its handling 
of tensors and random effects.
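
(To show where my understanding currently is, here is the trivial inspection I 
have been doing on the object returned by your smoothCon() call -- nothing more 
than poking at its pieces:)

str(sm, max.level = 1)          # smooth object: $X (basis matrix), $S (list of penalty matrices), ...
dim(sm$X)                       # n rows (data points) by k = 9 columns (basis functions)
plot(x, sm$X[, 3], type = "l")  # the column dropped above: the basis function tied to the knot at 0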

(Also, for prediction, how can I do by=dum AND by=intensity at the 
same time?)

Many thanks,
Andrew

PS:  I am aware that interacting my model with an intensity variable 
makes my model quite heteroskedastic.  I am thinking of using a cluster 
wild bootstrap to construct confidence intervals.  If a better way 
forward immediately comes to your mind -- especially if it's 
computationally cheaper -- I'd greatly appreciate it if you could share it.

On 04/17/2013 02:16 AM, Simon Wood wrote:
 hi Andrew.

 gam does suppress the intercept, it's just that this doesn't force the 
 smooth through the intercept in the way that you would like. Basically 
 for the parametric component of the model '-1' behaves exactly like 
 it does in 'lm' (it's using the same code). The smooths are 'added on' 
 to the parametric component of the model, with sum to zero constraints 
 to force identifiability.

 There is a solution to forcing a spline through a particular point at
 http://r.789695.n4.nabble.com/Use-pcls-in-quot-mgcv-quot-package-to-achieve-constrained-cubic-spline-td4660966.html
  

 (i.e. the R help thread Re: [R] Use pcls in mgcv package to achieve 
 constrained cubic spline)

 best,
 Simon

 On 16/04/13 22:36, Andrew Crane-Droesch wrote:
   Dear List,

 I've just tried to specify a GAM without an intercept -- I've got one
 of the (rare) cases where it is appropriate for E(y) - 0 as X -0.
 Naively running a GAM with the -1 appended to the formula and the
 calling predict.gam, I see that the model isn't behaving as expected.

 I don't understand why this would be.  Google turns up this old R help
 thread: 
 http://r.789695.n4.nabble.com/GAM-without-intercept-td4645786.html

 Simon writes:

  *Smooth terms are constrained to sum to zero over the covariate
  values. **
  **This is an identifiability constraint designed to avoid
  confounding with **
  **the intercept (particularly important if you have more than one
  smooth). *
  If you remove the intercept from you model altogether (m2) then 
 the
  smooth will still sum to zero over the covariate values, which in
  your
  case will mean that the smooth is quite a long way from the data.
  When
  you include the intercept (m1) then the intercept is effectively
  shifting the constrained curve up towards the data, and you get a
  nice fit.

 Why?  I haven't read Simon's book in great detail, though I have read
 Ruppert et al.'s Semiparametric Regression.  I don't see a reason why
 a penalized spline model shouldn't equal the intercept (or zero) when
 all of the regressors equals zero.

 Is anyone able to help with a bit of intuition?  Or relevant passages
 from a good description of why this would be the case?

 Furthermore, why does the -1 

[R] Memory usage reported by gc() differs from 'top'

2013-04-17 Thread Christian Brechbühler
In help(gc) I read, "...the primary purpose of calling 'gc' is for the
report on memory usage."
What memory usage does gc() report?  And more importantly, which memory
usage does it NOT report?  Because I see one answer from gc():

   used  (Mb) gc trigger   (Mb) max used  (Mb)
Ncells 14875922 794.5   21754962 1161.9 17854776 953.6
Vcells 59905567 457.1   84428913  644.2 72715009 554.8

(That's about 1.5g max used, 1.8g trigger.)
And a different answer from an OS utility, 'top':

  PID USER   PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6210 brech  20   0 18.2g 7.2g 2612 S    1 93.4  16:26.73 R

So the R process is holding on to 18.2g of memory, but it only seems to
account for about 1.5g of it.
Where is the rest?
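
For reference, this is roughly how I have been cross-checking R's own
accounting against the OS view (a small illustrative sketch, not the actual
workload that produced the numbers above):

gc(reset = TRUE)                      # also resets the "max used" columns
m <- matrix(rnorm(1e6), ncol = 100)   # allocate roughly 8 MB of doubles
gc()                                  # Vcells "used" grows by about 1e6 cells
print(object.size(m), units = "Mb")   # size of a single object
head(sort(sapply(ls(), function(nm) object.size(get(nm))), decreasing = TRUE))  # biggest objects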

I tried searching the archives, and found answers like "just buy more RAM",
which doesn't exactly answer my question.  And come on, 18g is pretty big;
sure, it doesn't fit in my RAM (only 7.2g are resident), but that's beside
the point.

The huge memory demand is specific to R version 2.15.3 Patched (2013-03-13
r62500) -- Security Blanket.  The same test runs without issues under R
version 2.15.1 beta (2012-06-11 r59557) -- Roasted Marshmallows.

I appreciate any insights you can share into R's memory management, and
gc() in particular.
/Christian

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] t-statistic for independent samples

2013-04-17 Thread Thomas Lumley
I just looked more carefully at your code.

You are computing the unequal-variance (Welch) version of the t-test, so
that's why there isn't a problem.  Compare it with the equal-variance
t-test, using the pooled variance estimate, which does have a problem, as
below

-thomas

tstat4 <- function() {
    n1 = 7
    mu1 = 100
    sigma1 = 25
    n2 = 14
    mu2 = 100
    sigma2 = 10

    x1 = rnorm(n1, mu1, sigma1)
    x1bar = mean(x1)
    s1 = sd(x1)

    x2 = rnorm(n2, mu2, sigma2)
    x2bar = mean(x2)
    s2 = sd(x2)

    t = ((x1bar - x2bar) - (mu1 - mu2))/sqrt(s1^2/n1 + s2^2/n2)

    t2 = ((x1bar - x2bar) - (mu1 - mu2))/
         sqrt(((n1-1)*s1^2 + (n2-1)*s2^2)/(n1+n2-2)*(1/n1+1/n2))

    return(c(t, t2))
}

tstats4 = replicate(1, tstat4())

hist(tstats4[1,], breaks = "scott", prob = TRUE, xlim = c(-4, 4), ylim = c(0, 0.4))
x = seq(-4, 4, length = 200)
y = dt(x, df = 48)
lines(x, y, type = "l", col = "red")

hist(tstats4[2,], breaks = "scott", prob = TRUE, xlim = c(-4, 4), ylim = c(0, 0.4))
x = seq(-4, 4, length = 200)
y = dt(x, df = 48)
lines(x, y, type = "l", col = "red")
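
As a quick cross-check, the same two statistics come straight out of t.test()
(a small sketch):

set.seed(1)
x1 <- rnorm(7, 100, 25); x2 <- rnorm(14, 100, 10)
t.test(x1, x2)$statistic                    # Welch statistic: same formula as t above
t.test(x1, x2, var.equal = TRUE)$statistic  # pooled statistic: same formula as t2 above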


On Thu, Apr 18, 2013 at 12:28 PM, David Arnold dwarnol...@suddenlink.netwrote:

 OK,although the variance ratio was already 2.25 to 1,  tried sigma1=10,
 sigma2=25, which makes the ratios of the variances 6.25 to 1.

 Still no change. See:

 http://msemac.redwoods.edu/~darnold/math15/R/chapter11/DistributionForTwoIndependentSamplesPartII.html

 D.



 --
 View this message in context:
 http://r.789695.n4.nabble.com/t-statistic-for-independent-samples-tp4664553p4664556.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Using different function (parameters) with apply

2013-04-17 Thread Sachinthaka Abeywardana
Hi All,

I have the following problem (read the commented bit below):

a <- matrix(1:9, nrow=3)

a
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

div <- 1:3

apply(a, 2, function(x) x/div)  ## want to divide each column by div -
                                ## instead each row is divided ##
     [,1] [,2] [,3]
[1,]    1  4.0    7
[2,]    1  2.5    4
[3,]    1  2.0    3

apply(a, 1, function(x) x/div)  ## Changing Margin from 2 to 1 does
                                ## something completely weird ##
     [,1] [,2] [,3]
[1,] 1.00 2.00    3
[2,] 2.00 2.50    3
[3,] 2.33 2.67    3
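
To be explicit, the result I am hoping for is each column j of a divided by
div[j]. I suspect something like sweep() is the idiomatic way (a guess on my
part, shown only to make the intent clear):

sweep(a, 2, div, "/")   # divide column j by div[j]
t(t(a) / div)           # same result: t() makes div recycle along the columns of a
a %*% diag(1/div)       # matrix-algebra version of the same thing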


Any thoughts?


Thanks,

Sachin

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.