date:20120702


Hello,

Try

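myfun2 <- function(DF, FUN){
    # each column of the transposed data holds the (x, y) pair of one row of DF
    x <- lapply(as.data.frame(t(DF[-1])), function(x) FUN(x[1], x[2]))
    # name the results after the ID column (works for factor and character IDs)
    names(x) <- as.character(DF[[1]])
    x
}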

myfun2(myframe, myfun)
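
As an alternative sketch (not part of the reply above), Map() from the apply family can walk the x and y columns in parallel. It assumes, as in the question, that the first column holds the IDs and that the value columns are named x and y; myfun3 is a hypothetical name.

```r
myframe <- data.frame(ID = c("first", "second"), x = c(1, 2), y = c(3, 4))
myfun <- function(x, y) x + y

# hypothetical helper: apply FUN to the x and y columns row by row
myfun3 <- function(DF, FUN) {
    res <- Map(FUN, DF$x, DF$y)           # one list element per row
    setNames(res, as.character(DF[[1]]))  # name elements after the ID column
}

mylist <- myfun3(myframe, myfun)
str(mylist)
```

Map() is a thin wrapper around mapply() that always returns a list, which matches the list output asked for.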

Hope this helps,

Rui Barradas

On 02-07-2012 07:02, Onur Uncu wrote:

Hi All

I have a dataframe:

myframe <- data.frame(ID=c("first","second"), x=c(1,2), y=c(3,4))

And I have a function myfun:

myfun <- function(x,y) x+y

I would like to write a function myfun2 that takes myframe and myfun
as parameters and returns a list as below:

mylist
$first
[1] 4

$second
[1] 6

Could you please help me with this? Doesn't seem like the apply
family of functions were intended for this case.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




Re: [R] Insert row in specific location between data frames


Hello,

When I asked you to dput() your datasets, I meant all of the output 
of dput(), so that we can copy it and paste it into an R session. It is the 
easiest way of recreating exact copies of the objects.


Like this, with parts of the output elided, it's unusable.

Now, as far as I can see, you have a data.frame called groupA with 25 
rows and a vector of 24 elements. After including an NA in the 11th position, 
the vector length becomes 25. This part has already been answered.


Then you want to put that vector as a column of groupA. You do NOT need 
'with', this will do:


groupA$predict_SO2 <- predict_SO2_a
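
A minimal sketch of both steps, inserting the NA and then attaching the vector as a column. It uses made-up stand-in data, since the real predict_SO2_a and groupA are not fully reproduced in this thread:

```r
# stand-in data: 24 predictions and a 25-row data frame
predict_SO2_a <- seq_len(24) / 2
groupA <- data.frame(id = 1:25)

# append() inserts the NA after position 10, so it becomes the 11th element
predict_SO2_a <- append(predict_SO2_a, NA, after = 10)
length(predict_SO2_a)   # 25

# plain assignment is enough; with() is not needed
groupA$predict_SO2 <- predict_SO2_a
```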


Then you want to merge this resulting data.frame with another 
data.frame, groupB, right? But merge returns a data.frame with 0 rows.


What went wrong? The common columns combined don't have the same values.

Why not? Because column 'Date' is a factor, not a date. The labels, 
i.e., the date values, might be equal, but the way the factors are 
coded is not.


Use the following.


x <- with(groupA, levels(Date)[Date])
x <- as.Date(x, format="%d/%m/%Y")
groupA$Date <- x

And the same for groupB. Only then try to merge them.
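
To illustrate the conversion on both data frames, here is a small sketch with made-up data. The columns a and b and the helper to_date are hypothetical; only the Date handling follows the code above.

```r
# made-up stand-ins for groupA and groupB; Date is stored as a factor in both
groupA <- data.frame(Date = factor(c("3/9/1997", "7/9/1997")), a = 1:2)
groupB <- data.frame(Date = factor(c("7/9/1997", "1/9/1997")), b = 3:4)

# hypothetical helper: factor labels -> Date, using the day/month/year format
to_date <- function(f) as.Date(levels(f)[f], format = "%d/%m/%Y")

groupA$Date <- to_date(groupA$Date)
groupB$Date <- to_date(groupB$Date)

# merge on the converted column only
merged <- merge(groupA, groupB, by = "Date")
merged
```

Naming the key explicitly with by = "Date" also avoids merging on every column the two data frames happen to share.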

And next time paste the output of dput(), ALL of it.

Hope this helps,

Rui Barradas

On 02-07-2012 05:39, pigpigmeow wrote:

First, I have predict_SO2_a, which contains 24 values. I want to insert
NA in the 11th row, so that predict_SO2_a has 25 values.
After inserting the row, I want to use the with function to add it to the
data.frame:
groupA$predict_SO2 <- with(groupA, predict_SO2_a)

dput(predict_SO2_a)
c(39.7932308121176, 30.25257753285, 32.4675835451901, 31.9415094289634,
27.9083195877186, 11.5941369504695, 9.36812510512633, 12.3190926962636,
...
14.9134904913096, 33.8462160039482, 16.6586503422101, 11.0312717522444,
22.3102431270508, 15.1408236735915, 10.6875527887638, 11.3294850253127,
13.9037966719703, 28.6603710864312)


dput(groupA)

structure(list(Date = structure(c(15L, 21L, 23L, 20L, 9L, 10L,
2L, 11L, 22L, 6L, 7L, 16L, 17L, 24L, 26L, 12L, 18L, 19L, 14L,
8L, 25L, 3L, 4L, 5L, 13L), .Label = c(, 1/9/2001, 10/9/2010,
11/9/2010, 12/9/2010, 14/9/2002, 15/9/2002, 15/9/2009,
19/9/1999, 2/9/2000, 2/9/2001, 2/9/2008, 21/9/2010,
24/9/2008, 3/9/1997, 3/9/2003, 3/9/2005, 3/9/2008,
5/9/2008, 6/9/1998, 7/9/1997, 7/9/2001, 8/9/1997, 8/9/2006,
8/9/2010, 9/9/2006), class = factor), pressure = c(-8.110989011,
-5.910989011, -3.510989011, -4.732967033, -5.737362637, -7.607692308,
-9.675824176, -9.075824176, -5.575824176, -6.169230769, -8.169230769,
-9.207692308, -9.197802198, -4.884615385, -3.684615385, -3.132967033,
-3.332967033, -3.232967033, -9.532967033, -8.537362637, -6.869230769,
-6.869230769, -3.869230769, -2.069230769, -5.369230769), maxtemp =
c(2.056043956,
0.756043956, 1.556043956, 2.216483516, 1.995604396, 2.346153846,
1.97032967, 0.17032967, 1.57032967, 0.747252747, -0.352747253,
0.672527473, 1.985714286, 1.452747253, 0.352747253, 1.568131868,
3.068131868, 1.368131868, 0.168131868, 1.987912088, 5.187912088,
3.987912088, -0.812087912, 1.587912088, -1.112087912), avetemp =
c(2.540659341,
0.440659341, 1.340659341, 1.287912088, 2.278021978, 2.2, 1.962637363,
0.962637363, 1.562637363, 1.482417582, 0.682417582, 1.089010989,
2.103296703, 1.989010989, 0.589010989, 2.087912088, 2.287912088,
1.787912088, 1.287912088, 1.330769231, 5.237362637, 3.43736263
.
ratio = c(1.53920929073912,
...
2.08020364225369, 2.5845449621267, 4.68646633242131, 0.93343593089835,
1.18698605729367, 1.19133323040343, 1.9902213063946, 2.09049362040035
)), .Names = c(Date, pressure, maxtemp, avetemp, mintemp,
RH, solar, windspeed, transport, angle, rainfall,
RSP, Ozone, NO2, NOX, SO2, CO, newRSP, newOzone,
newNO2, newNOX, newSO2, predict_RSP, predict_NO2, predict_NOX,
ratio), row.names = c(NA, 25L), class = data.frame)/

Finally, I want to combine groupA with groupB; groupB contains ..

/dput(groupB)
structure(list(Date = structure(c(1L, 16L, 20L, 27L, 32L, 34L,
35L, 7L, 11L, 21L, 30L, 17L, 8L, 2L, 28L, 3L, 18L, 22L, 24L,
29L, 31L, 23L, 25L, 4L, 26L, 12L, 13L, 15L, 19L, 5L, 6L, 33L,
9L, 10L, 14L), .Label = c(1/9/1997, 1/9/2004, 1/9/2006,
1/9/2008, 10/11/2009, 11/11/2009, 11/9/1999, 12/10/2003,
13/9/2010, 17/9/2010, 18/9/1999, 18/9/2008, 18/9/2009,
18/9/2010, 19/9/2009, 2/9/1997, 2/9/2002, 2/9/2006,
20/9/2009, 26/11/1997, 3/10/2000, 3/9/2006, 3/9/2007,
4/9/2006, 4/9/2007, 4/9/2008, 5/9/1998, 5/9/2004, 5/9/2006,
6/9/2001, 6/9/2006, 7/9/1998, 7/9/2010, 8/9/1998, 9/9/1998
), class = factor), pressure = c(-8.310989011, -8.710989011,
-1.710989011, -4.732967033, -2.932967033, -2.732967033, -5.432967033,
-6.637362637, -7.237362637, -1.707692308, -6.475824176, -3.869230769,
-3.507692308, -8.098901099, -10.6989011, -7.184615385, 
  ratio = c(1.94158182541644, 2.12248234979731,
1.87302150800523, 2.61289013672199, 2.97067043253228, 2.85053235533923,
2.51886435993509, 1.87829582620638, 
2.9380496638884, 1.40686764084479,

Re: [R] table function in a matrix

2012-07-02 Thread Sarah Auburn

Dear Petr,
Thanks for your help. Sorry one more query for one of my datasets which has NAs 
(missing genotypes). Is there any way in which I can count NAs?
Many thanks!
Sarah

From: Sarah Auburn saub...@yahoo.com
To: Petr Savicky savi...@cs.cas.cz 
Cc: r-help@r-project.org r-help@r-project.org 
Sent: Thursday, 7 June 2012, 23:24
Subject: Re: [R] table function in a matrix

Perfect, thank you!

From: Petr Savicky savi...@cs.cas.cz
To: r-help@r-project.org 
Sent: Thursday, 7 June 2012, 19:42
Subject: Re: [R] table function in a matrix

On Wed, Jun 06, 2012 at 11:02:46PM -0700, Sarah Auburn wrote:
 Hi,
 I am trying to get a summary of the counts of different variables for each 
 sample in a matrix of the form m below to generate an output as shown. 
 (Ultimately I want to generate a stacked barchart for each sample). I am only 
 able to get the table function to work on one sample (column) at a time. 
 Any help appreciated.
 Thank you
 Sarah
 a <- c("A", "A", "B", "B", "C", "A", "C", "D", "A", "D", "C", "A", "D", "C", 
 "A", "C")
 m <- matrix(a, nrow=4)
 m
      [,1] [,2] [,3] [,4]
 [1,] "A"  "C"  "A"  "D" 
 [2,] "A"  "A"  "D"  "C" 
 [3,] "B"  "C"  "C"  "A" 
 [4,] "B"  "D"  "A"  "C" 

 output needed (so that I can use the barplot(t(output)) function):
      A B C D
 [,1] 2 2 0 0
 [,2] 1 0 2 1
 [,3] 2 0 1 1
 [,4] 1 0 2 1

Hi.

Try the following.

  a <- c("A", "A", "B", "B", "C", "A", "C", "D", "A", "D", "C", "A", "D", "C", 
  "A", "C")
  m <- matrix(a, nrow=4)
  tab <- function(x) { table(factor(x, levels=LETTERS[1:4])) }
  t(apply(m, 2, tab))

       A B C D
  [1,] 2 2 0 0
  [2,] 1 0 2 1
  [3,] 2 0 1 1
  [4,] 1 0 2 1

Factors are used to ensure that all the tables have the same length,
even if some letters are missing.

Hope this helps.

Petr Savicky.


Re: [R] Heat Maps

2012-07-02 Thread Joseph Clark


The image function should do it, something like this:

image( x, y, outer(x,y,u), col=[some vector of colors] )

You can add a breaks parameter, but you need one more break than you have 
colors (include the start and end point). Or just let it be automatic.

I have used colorpanel() from the gplots library to generate graduated color 
shades. This is based on actual code I'm using for one of my heatmaps:

library(gplots) # for colorpanel
z <- outer(x,y,u)
image( x, y, z, col=colorpanel(10,"steelblue","white"), 
breaks=quantile(z,seq(0,1,by=0.1)) )
box() # for appearances
par(new=TRUE)  # you want the lines on top of the colors, so do the contour plot second
contour(...)

It will produce ten colors, with steelblue for the lowest value and white for 
the highest value, and one shade for each decile. You can omit the breaks term.

 I have drawn indifference curves using the program below (Contour Plot)
 
 
 u <- function(x, y) x^0.5 + y^0.5
 x <- seq(0, 1000, by=1)
 y <- seq(0, 1000, by=1)
 a <- c(10, 20, 30)
 contour(x, y, outer(x, y, u), levels=a, col="blue")
 
 
 
 
 Now, can anybody please tell me how to draw heat maps,
 and to do so on the same plot as the indifference curves (contour)?
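
The heat map with the indifference curves drawn on top can be sketched in base R alone. This is a minimal sketch, not the code from either message: it assumes a coarser grid than in the question and uses heat.colors() from grDevices in place of gplots::colorpanel().

```r
u <- function(x, y) x^0.5 + y^0.5
x <- seq(0, 1000, by = 10)   # coarser grid than in the question, to keep outer() small
y <- seq(0, 1000, by = 10)
z <- outer(x, y, u)

# ten colours need eleven breaks; here one colour per decile of z
brks <- quantile(z, seq(0, 1, by = 0.1))
image(x, y, z, col = heat.colors(10), breaks = brks)

# add = TRUE draws the indifference curves on top of the existing image
contour(x, y, z, levels = c(10, 20, 30), col = "blue", add = TRUE)
box()
```

Using add = TRUE keeps both layers on the same axes, so no second plotting region has to be lined up by hand.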
 
  

Re: [R] table function in a matrix

2012-07-02 Thread Thaler,Thorn,LAUSANNE,Applied Mathematics


Hello,

See the difference.


a <- b <- c("A", "A", "B", "B", "C", "A", "C", "D", "A", "D", "C", "A", 
"D", "C", "A", "C")

a[3] <- NA

table(a)
table(a, exclude=NULL) # always include NA
table(b, exclude=NULL) # always include NA

# more flexible
table(b, useNA="always")
table(b, useNA="ifany")
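
Combined with the per-column approach from earlier in this thread, NA counts for every sample can be sketched as follows. It assumes the same 4 x 4 matrix layout with one value set to NA for illustration.

```r
a <- c("A", "A", "B", "B", "C", "A", "C", "D", "A", "D", "C", "A",
       "D", "C", "A", "C")
a[3] <- NA                      # pretend one genotype is missing
m <- matrix(a, nrow = 4)

# fixed levels keep every column's table the same length; useNA adds an <NA> count
tab <- function(x) table(factor(x, levels = LETTERS[1:4]), useNA = "always")
counts <- t(apply(m, 2, tab))
counts
```

Each row of counts is one sample (column of m), and the last column holds the number of NAs in that sample.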

Hope this helps,

Rui Barradas

On 02-07-2012 07:27, Sarah Auburn wrote:

Dear Petr,
Thanks for your help. Sorry one more query for one of my datasets which has NAs 
(missing genotypes). Is there any way in which I can count NAs?
Many thanks!
Sarah


Re: [R] Binary Quadratic Opt?

2012-07-02 Thread khris

Hi Menkes, 

Thanks for the reply, but a license that is free only for academic use won't 
work for me. A GNU license (or a more permissive one) is required. 

Everything else is fine.
Khris.

On Jun 30, 2012, at 7:21 PM, menkes [via R] wrote:

 Hi Khris, 
 
 If all your variables are binary then you may want to check CPLEX and/or 
 Gurobi (both provide a free academic license). 
 http://www-01.ibm.com/software/integration/optimization/cplex-optimizer/
 http://www.gurobi.com/products/additional-products-using-gurobi/r
 
 The algorithms that CPLEX and Gurobi use for quadratic programming are 
 designed to work with convex objective functions, with the one exception when 
 all variables are binary.  In that case CPLEX and Gurobi apply some 
 transformation that in certain cases will allow you to solve binary quadratic 
 optimization problems. 
 
 Regards, 
 Menkes 
 



--
View this message in context: 
http://r.789695.n4.nabble.com/Binary-Quadratic-Opt-tp4633521p4635082.html
Sent from the R help mailing list archive at Nabble.com.

Re: [R] About Error message

2012-07-02 Thread Petr PIKAL

 
 Hi again!
 I have a question about R.
 I fitted GAMs in a previous version of R with the mgcv package and saved the
 workspace. This workspace contains different models, and I want to make
 predictions with these GAMs.
 
 However, I installed a new version of R and used the same workspace. When I type
 summary(models), this error message is shown:
 Error in Predict.matrix.cr.smooth(object, dk$data) :  F is missing from cr
 smooth - refit model with current mgcv.
 
 This workspace worked normally with the previous version of R. What's wrong?!

Hi

Maybe some packages are missing (not installed) in the new installation. Try 
installing all the packages you used in your previous work and then 
start R again.

Regards
Petr

 Thanks in advance.
 
 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/About-Error-message-tp4634955.html
 Sent from the R help mailing list archive at Nabble.com.
 

Re: [R] geom_boxplot

2012-07-02 Thread Petr PIKAL

Hi

In the new ggplot2 version, the following works too:

p + geom_boxplot(aes(fill = factor(cyl))) +
   labs(fill = "Cylinders") + ylab("Miles per Gallon") + xlab("Number of Cylinders")

Regards
Petr

 
 Yes you can do all of the things you want. 
 
 Below is a start, to give you an idea of how to approach some of it.
 
 library(ggplot2)
 p <- ggplot(mtcars, aes(factor(cyl), mpg))
 p <- p + geom_boxplot(aes(fill = factor(cyl))) +
   labs(fill = "Cylinders") +
   scale_y_continuous("Miles per Gallon") +
   scale_x_discrete("Number of Cylinders")
 p
 
 
 Have a look at 
ackoverflow.com/questions/3606697/how-to-set-x-axis-limits-
 in-ggplot2-r-plots for x and y axes limits. 
 
 It took me a while to realise it but, generally, I find that it is not too
 hard to find examples of what you need by just googling something like
 "ggplot2 set x and y limits" or "ggplot2 geom_bar colour" and so on.
 
 The terms ggplot2 and geom_XXX are pretty unique on the internet, and search
 results are usually not too bad.
 
 You may also want to subscribe to the ggplot2 group on Google Groups.
 
 Best wishes
 
 
 John Kane
 Kingston ON Canada
 
 
  -Original Message-
  From: hannah@gmail.com
  Sent: Sun, 1 Jul 2012 08:39:20 -0400
  To: r-help@r-project.org
  Subject: Re: [R] geom_boxplot
  
  Also, is it possible to change ylim?
  
  2012/7/1 li li hannah@gmail.com
  
  Dear all,
I have a few questions regarding the boxplot output from the
  geom_boxplot function.
  Attached is the output I get. Below are my questions:
  
1. How can I define the xlab and ylab myself?
   Also I would like to remove factor(variable)
  line on the right side.
  
2. How can I define the colors of the boxplots myself.
   For example, I want to use blue for
  LR, green for pair and purple for BR1.
Thanks so much!
  Hannah
  
  

Re: [R] VIM package - how to get the underlying code

2012-07-02 Thread Uwe Ligges




On 01.07.2012 21:19, Mathias Worni wrote:

Dear R-users,

I am using R on a Mac using the latest version of R (2.15.1) working with
R-studio. To perform multiple imputation for a dataset with some missing
values, I am using the VIM package (http://goo.gl/rfGfr). While everything
is working fine also with the GUI, I wonder if anybody knows how to get the
code for the diagrams you can create using the GUI.



Just download the source version of the VIM package and take a look into 
the code?


Best,
Uwe Ligges



Thanks,
Mathias


Re: [R] sqlSave()

2012-07-02 Thread cindy.dol

Hi,

I tried your example but I have an error message:
sqlUpdate( channel = con, dat = tbl, tablename = "myNewTable", index = "ID" )
Error in sqlUpdate(channel = con, dat = tbl, tablename = "myNewTable",  : 
  [RODBC] Failed exec in Update

I work with this:
sessionInfo() 
R version 2.15.0 (2012-03-30)
Platform: i386-pc-mingw32/i386 (32-bit)

attached base packages:
[1] tcltk stats graphics  grDevices utils datasets  methods  
[8] base 

other attached packages:
 [1] sqldf_0.4-6.4 gsubfn_0.6-3  proto_0.3-9.2
 [4] chron_2.3-42  RSQLite.extfuns_0.0.1 RSQLite_0.11.1   
 [7] RODBC_1.3-5   RJDBC_0.2-0   rJava_0.9-3  
[10] DBI_0.2-5   

Do you know what the problem is?

Thank you for your answer  

--
View this message in context: 
http://r.789695.n4.nabble.com/sqlSave-tp892040p4635087.html
Sent from the R help mailing list archive at Nabble.com.


[R] ggplot: dodge positions

Dear all,

I want to get a series of boxplots (grouped by two factors) and I want to 
overlay the original observations and the following code does almost what I 
want:

library(ggplot)
ddf <- data.frame(x=factor(rep(LETTERS[1:4], each=30)), y = runif(120,0,10), 
grp = factor(rep(rep(1:3, 10), 4)))
ggplot(ddf, aes(x, y, colour=grp)) + geom_boxplot() + geom_point()

Yet the position of the points and the position of the boxes on the x-axis is 
not the same. I would like that the points are shifted accordingly, such that 
they line up with the boxplots. I tried position_dodge:

ggplot(ddf, aes(x, y, colour=grp)) + geom_boxplot() + 
geom_point(aes(ymax=max(y)), position = position_dodge(width=.75))

but that did not really help, as all points are now dodged and I just want to 
have a fixed offset for each subgroup of points such that the boxplot and the 
points are aligned. Any ideas?


Kind Regards,

Thorn Thaler
Mathematician

Applied Mathematics 
Nestec Ltd,
Nestlé Research Center
PO Box 44 
CH-1000 Lausanne 26
Phone: +41 21 785 8220
Fax: +41 21 785 9486


[R] Getting R In Torrents instead of Download Zip Files

2012-07-02 Thread Ajay Ohri

Dear List,

Sorry for the intrusion. I live in an area with erratic internet download speeds.
Can we get R as torrents instead of only as downloadable Zip files, so that we can
resume downloads when they break?

Sincerely,

A Ohri

Websites-
Technology
http://decisionstats.com


Re: [R] ggplot: dodge positions


Hello,

Though I'm not the most fluent user of ggplot, I've seen no problem with 
the graph, each subgroup of points is over each boxplot.


Maybe what made the difference was the use of ggplot2,
not ggplot.

Hope this helps,

Rui Barradas

On 02-07-2012 10:43, Thaler,Thorn,LAUSANNE,Applied Mathematics wrote:

Dear all,

I want to get a series of boxplots (grouped by two factors) and I want to 
overlay the original observations and the following code does almost what I 
want:

library(ggplot)
ddf <- data.frame(x=factor(rep(LETTERS[1:4], each=30)), y = runif(120,0,10), 
grp = factor(rep(rep(1:3, 10), 4)))
ggplot(ddf, aes(x, y, colour=grp)) + geom_boxplot() + geom_point()

Yet the position of the points and the position of the boxes on the x-axis is 
not the same. I would like that the points are shifted accordingly, such that 
they line up with the boxplots. I tried position_dodge:

ggplot(ddf, aes(x, y, colour=grp)) + geom_boxplot() + 
geom_point(aes(ymax=max(y)), position = position_dodge(width=.75))

but that did not really help, as all points are now dodged and I just want to 
have a fixed offset for each subgroup of points such that the boxplot and the 
points are aligned. Any ideas?


Kind Regards,

Thorn Thaler
Mathematician

Applied Mathematics
Nestec Ltd,
Nestlé Research Center
PO Box 44
CH-1000 Lausanne 26
Phone: +41 21 785 8220
Fax: +41 21 785 9486


[R] Quantitative analysis of treatment effects

2012-07-02 Thread syrvn

Hello!

I need to analyse a biological study. The study design is rather simple. It
includes 3 groups: 1 control group and 2 test groups. The two test groups are
treated with the same drug but at different doses. Each group has
approximately 14 observations and I look at around 600 variables. I already
used a one-way ANOVA to determine the significant hits between the groups.
However, I now want to know if there is a quantitative effect between the
treatment doses. In other words, is the treatment effect (the difference
between the control group and a test group) bigger when you use a lower or a
higher dose of the same drug? I also need to include 1 or 2 covariates in
the analysis.

Unfortunately, I do not know if there is a statistical test/technique in R
which I can use to answer this question.
Any advice is very much appreciated.

syrvn

--
View this message in context: 
http://r.789695.n4.nabble.com/Quantitative-analysis-of-treatment-effects-tp4635094.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] plot.prcomp() call/eval

2012-07-02 Thread Thaler,Thorn,LAUSANNE,Applied Mathematics

Then let's go with this:

http://pastebin.com/6dtGCrpA

as an example of what I do. If you've got a better idea, let's hear it :)


On 29.06.2012, at 17:30, Joshua Wiley wrote:

 On Fri, Jun 29, 2012 at 1:20 AM, Jessica Streicher
 j.streic...@micromata.de wrote:
 Hm.. i attached a file with the code, but it doesn't show up somehow..
 
 Non-text files are scrubbed, and only certain file extensions are
 allowed (I forget which; I know that .R is *not* allowed, although I
 think that .txt and maybe .log are).


Re: [R] Quantitative analysis of treatment effects

2012-07-02 Thread syrvn

I found a nice presentation which I think addresses my question/problem.

See slide 3 here:

http://www.ispor.org/meetings/atlanta0510/presentations/IP1-CookJohnR.pdf

--
View this message in context: 
http://r.789695.n4.nabble.com/Quantitative-analysis-of-treatment-effects-tp4635094p4635095.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] ggplot: dodge positions

Yes indeed. Sorry for the typo, I just added the library(ggplot) thing 
afterwards and I did not check for the spelling. So it should read as 
library(ggplot2) and there the issue is still unsettled.

Thx for pointing that out.

KR,

-Thorn

 -Original Message-
 From: Rui Barradas [mailto:ruipbarra...@sapo.pt]
 Sent: Monday, 2 July 2012 12:20
 To: Thaler,Thorn,LAUSANNE,Applied Mathematics
 Cc: r-help@r-project.org
 Subject: Re: [R] ggplot: dodge positions
 
 Hello,
 
 Though I'm not the most fluent user of ggplot, I've seen no problem
 with
 the graph, each subgroup of points is over each boxplot.
 
 Maybe what made the difference was the use of ggplot2,
 not ggplot.
 
 Hope this helps,
 
 Rui Barradas
 

Re: [R] Getting R In Torrents instead of Download Zip Files

2012-07-02 Thread peter dalgaard


On Jul 2, 2012, at 12:17 , Ajay Ohri wrote:

 Dear List,
 
 Sorry for intrusion. I live in area of erratic internet download speeds.
 Can we get R in torrents instead of just download Zip files so we can
 resume downloads when broken

Only if someone is willing to put up the manpower to create the server-side 
infrastructure, I expect.

However, don't browsers and FTP clients have stop/resume functionality built in 
these days? OSX Safari certainly does.

-pd


 
 Sincerely,
 
 A Ohri
 
 Websites-
 Technology
 http://decisionstats.com
 

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com


[R] error to convert a Compute A^-1 B from Matlab to R using solve(A, B)

2012-07-02 Thread gianni lavaredo

Dear Researchers,

I need to convert the following Matlab code to R:

a = [x y ones(size(x))];
b = [-(x.^2+y.^2)];
a\b

ans =

   -9.9981
  -16.4966
   -7.6646

my solution in R is:

a = cbind(x,y,rep(1,length(x)))
b = cbind(-(x^2+y^2))

> head(a)
            x        y  
[1,] 14.45319 5.065726 1
[2,] 14.99478 5.173893 1
[3,] 14.64158 5.616916 1
[4,] 14.61803 6.624069 1
[5,] 14.19997 6.794587 1
[6,] 15.08174 8.224843 1
> head(b)
          [,1]
[1,] -234.5564
[2,] -251.6125
[3,] -245.9255
[4,] -257.5652
[5,] -247.8057
[6,] -295.1068

following MATLAB/ R Reference
http://cran.r-project.org/doc/contrib/Hiebeler-matlabR.pdf
the a\b could be converted by solve(a,b) but i get the following error:

> solve(a,b)
Error in solve.default(a, b) : 'b' must be compatible with 'a'

thanks for any help
Gianni


Re: [R] error to convert a Compute A^-1 B from Matlab to R using solve(A, B)

Have a look at ?solve and see:

a    a square numeric or complex matrix containing the coefficients of
     the linear system.

Your a isn't square.

The help also mentions qr.solve for non-square matrices.
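
A minimal sketch of that suggestion, using made-up points rather than the data from the question: the points below are constructed to lie on a circle with centre (5, 8) and radius 9, so the expected coefficients are known in advance. For a non-square a, qr.solve() returns the least-squares solution, which is what MATLAB's a\b computes for an overdetermined system.

```r
# Made-up points on the circle x^2 + y^2 + D*x + E*y + F = 0
# with centre (5, 8) and radius 9, i.e. D = -10, E = -16, F = 8.
theta <- seq(0, 2 * pi, length.out = 25)[-25]
x <- 5 + 9 * cos(theta)
y <- 8 + 9 * sin(theta)

a <- cbind(x, y, 1)        # 24 x 3, so not square
b <- -(x^2 + y^2)

# solve(a, b) fails here because 'a' is not square;
# qr.solve() computes the least-squares solution instead.
coefs <- qr.solve(a, b)
print(coefs)               # approximately -10, -16, 8
```

With real, noisy data the coefficients will only approximate the underlying circle.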

greetings
Jessi


Re: [R] Getting R In Torrents instead of Download Zip Files

2012-07-02 Thread Ajay Ohri

Dear Prof D

No server-side architecture is needed for BitTorrent. It runs peer to
peer: you host just one file (the torrent file), and your torrent client
searches for peers (other people who have that torrent file/data).

To kick-start it, you need to host the files on one server, as the mother
seed. That also needs just a modification to the search list within your
local computer's BitTorrent client (as in, add cran.at to the list of
sources instead of other sources).

You may want to ask Ubuntu/Debian how and why they do it. I may be completely
wrong here on the technical details, but I think it does help people from
the developing world with erratic bandwidth, which is where I come from.

Sincerely,

Ajay Ohri

Websites-
Technology
http://decisionstats.com





Re: [R] SPATSTAT: Minimum points for a Ripley K to be sensible?

2012-07-02 Thread Adrian.Baddeley

Sebastian Pucilowski s.pucilow...@student.unimelb.edu.au writes:

  What are the minimum number of points in a point pattern before a
  clustering analysis using a Ripley K function loses any meaning?

It depends on your definition of `meaningful'.

The K-function doesn't become meaningless (undefined) until there are fewer 
than 2 points. 

But if your dataset contained only 10 points, for example, you could type
plot(envelope(runifpoint(10), Kest, nsim=1000, nrank=25))
to see the pointwise 95% prediction intervals for the K-function (as grey
shading) from a Poisson process with a mean of 10 points.

To be detectably different from a Poisson process, a dataset of 10 points
would need a K-function that goes outside these intervals somewhere.
So it would need to be an extremely clustered pattern.

Try different values in place of 10 and adjust to suit your definition.

Adrian Baddeley


[R] enquiry

2012-07-02 Thread Karan Anand

Hi,
I am new to R. Could you please tell me how to read dates in the
1951-52, 1952-52 format in R?


[R] Rgraphviz package question

2012-07-02 Thread Jiayi Hou

Hi all,

I have a question regarding the Rgraphviz package. Does anybody know how to
add an x-axis and a y-axis with scales to the plot generated by renderGraph()?
Or is there an alternative way to do so using plot() instead?

Thanks,

-- 
Jiayi Hou
Ph.D Candidate
Department of Biostatistics
School of Medicine
Virginia Commonwealth University
Tel:(804)-828-2879(office)
 (804)-274-8757(cell)


Re: [R] Adjusting length of series

2012-07-02 Thread Lekgatlhamang, lexi Setlhare

Hi David and AK,
I have been trying to implement your suggestions since yesterday, but I 
encountered some challenges.

As for David's suggestion, I could only implement it after some 
modifications. Using an abridged version of my data, I dput my dataset and 
then show my steps below.

> dput(ydata)
structure(c(68.10004, -34.80002, 90.39996, 
54.60004, -172.3, 51.80002, 175, 79.80002, 
-35.70007, 130.5, 116.8, -67.5, 164.5, 514.8, -326.1, 
98.40005, 160.2, 53.19998, 283.6, -111.6, 127.8, 
-17.30002, 286.3, NA, NA, -102.9001, 125.2, -35.79993, 
-226.9001, 224.1, 123.2, -95.19998, -115.5001, 
166.2001, -13.69998, -184.3, 232, 350.3, -840.9001, 
424.5001, 61.79993, -107, 230.4001, -395.2001, 
239.4001, -145.1, 303.6, NA, NA, NA, 228.1, -160., 
-191.1001, 451.0001, -100.9001, -218.4, 
-20.30011, 281.7002, -179.9001, -170.6, 
416.3, 118.3, -1191.2, 1265.4, -362.7002, -168.7999, 
337.4001, -625.6001, 634.6001, -384.5001, 
448.7001, NA, NA, -164.45784099, 17.079353995, 
95.976788009, 680.23816699, -491.34869099, -274.694009, 
-256.332907, 469.62296, -146.431891, -41.077201995, -106.970104, 
757.68826399, -1689.214533, 2320.098952, -1446.97942, 516.384521, 
-375.27765099, 293.86702999, 417.845195, 278.198807, 
-968.59203399, -314.195986, NA, NA, NA, 181.53719499, 
78.897434013, 584.26137898, -1171.586858, 216.65468199, 
18.361101998, 725.955867, -616.054851, 105.35468901, 
-65.892902005, 864.65836799, -2446.902797, 4009.313485, 
-3767.078372, 1963.363941, -891.66217199, 669.14468099, 
123.978165, -139.646388, -1246.790841, 654.396048, NA, 4937, 
5005.1, 4970.3, 5060.7, 5115.3, 4943, 4994.8, 5169.8, 5249.6, 
5213.9, 5344.4, 5461.2, 5393.7, 5558.2, 6073, 5746.9, 5845.3, 
6005.5, 6058.7, 6342.3, 6230.7, 6358.5, 6341.2, 6627.5, 4187.5, 
4296.004835, 4240.051829, 4201.178177, 4258.281313, 4995.622616, 
5241.615228, 5212.913831, 4927.879527, 5112.468183, 5150.624948, 
5147.704511, 5037.81397, 5685.611693, 4644.194883, 5922.877025, 
5754.579747, 6102.66699, 6075.476582, 6342.153204, 7026.675021, 
7989.395645, 7983.524235, 7663.456839), .Dim = c(24L, 7L), .Dimnames = list(
    NULL, c("DCred1", "DCred2", "DCred3", "DBoBC2", "DBoBC3", 
    "CredL1", "BoBCL1")), .Tsp = c(2001.083, 2003, 12
), class = c("mts", "ts"))

NB: the NAs in the dataset emanated from lagging or differencing the series

David's suggestion
> df <- data.frame(DCred1,DCred2,DCred3,DBoBC2,DBoBC3,CredL1,BoBCL1)
Error in data.frame(DCred1, DCred2, DCred3, DBoBC2, DBoBC3, CredL1, BoBCL1) : 
  arguments imply differing number of rows: 23, 22, 21, 24

So I modified as follows:
> length(DCred3)  # finding the minimum length of various series
[1] 21

# Then dataframe construction
> dframe <- data.frame(Dcre1=DCred1[1:21],Dcre2=DCred2[1:21],Dcre3=DCred3[1:21],
+ Dbobc2=DBoBC2[1:21],Dbobc3=DBoBC3[1:21],CredL=CredL1[1:21],BoBCL=BoBCL1[1:21])
# Then estimated regression
> regCred <- lm(Dcre1~Dcre2+Dcre3+Dbobc2+Dbobc3+CredL+BoBCL, data=dframe)
> summary(regCred)
# Worked well as shown by results below
Call:
lm(formula = Dcre1 ~ Dcre2 + Dcre3 + Dbobc2 + Dbobc3 + CredL + 
    BoBCL, data = dframe)
Residuals:
    Min      1Q  Median      3Q     Max 
-69.516 -27.695  -8.085  13.851 107.276 
Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 159.32304  157.15209   1.014 0.327873    
Dcre2        -0.75527    0.17262  -4.375 0.000634 ***
Dcre3        -0.21006    0.08656  -2.427 0.029329 *  
Dbobc2        0.05111    0.06565   0.779 0.449197    
Dbobc3        0.03106    0.03510   0.885 0.391108    
CredL        -0.10967    0.04933  -2.223 0.043177 *  
BoBCL         0.09756    0.03097   3.150 0.007087 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
Residual standard error: 52.3 on 14 degrees of freedom
Multiple R-squared: 0.9331,     Adjusted R-squared: 0.9044 
F-statistic: 32.55 on 6 and 14 DF,  p-value: 1.911e-07 

This is good, but couldn't I code the process for my 15 variable model?
Perhaps that is where the use of
Dcr <- lapply(..., function(x) ...)
comes in?
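
A minimal sketch of how that step might generalise, assuming the series are collected in a named list. The list `series` and its random contents are hypothetical stand-ins, not the data above; only the lengths are taken from the error message. The idea is to find the shortest length and cut every series to it with lapply(). Whether the first n observations of each series line up in time is a separate question that this sketch does not address.

```r
# Hypothetical stand-ins with the lengths reported in the error message
set.seed(1)
series <- list(DCred1 = rnorm(23), DCred2 = rnorm(22),
               DCred3 = rnorm(21), CredL1 = rnorm(24))

# Length of the shortest series
n <- min(vapply(series, length, integer(1)))

# Keep the first n values of every series and bind them as columns
dframe <- as.data.frame(lapply(series, function(x) x[seq_len(n)]))

dim(dframe)   # 21 rows, 4 columns
```

The same two lines work unchanged for 15 series, since nothing in them depends on the number of list elements.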

AK, if you can spare some minutes, please use my dput data to illustrate the 
suggestion you made. I searched for the lapply function (using ??lapply) but could 
not get a handle on how to use it in my case. My dput data is as shown below.

         DCred1 DCred2 DCred3 DBoBC2 DBoBC3 CredL1   BoBCL1
Feb 2001   68.1     NA     NA     NA     NA 4937.0 4187.500
Mar 2001  -34.8

[R] R sub query

2012-07-02 Thread Sarah Auburn

Hello,
I would like to substitute a substring of characters defined by a specific 
start and end sequence,
i.e. in the example matrix below, I would like to substitute ".:X:" with "", 
where X varies in sequence...

m <- matrix(c(".:0:0,0", ".:2:0,2", ".:194:193,1", ".:56:0,56", ".:58:50,8",
              ".:13:0,13", ".:114:114,0", ".:75:75,0"), nrow=2)

output required:
     [,1] [,2]  [,3] [,4]
[1,] 0,0  193,1 50,8 114,0
[2,] 0,2  0,56  0,13 75,0
 
Thank you for any help
Sarah

[R] degree of freedom GLM

2012-07-02 Thread Jennifer Kaiser

Hi,
I have a problem with the degrees of freedom (df).
I read in a big csv file:

Tabelle <- read.csv("C:\\Users\\Public\\Documents\\Bachelorarbeit\\eingabe8_durchnummeriert.csv",
                    header = TRUE, sep = ";")

Then I try this:

> ygamma <- glm(Tabelle$sb_ek_ber ~ 1 + Tabelle$FAHRL_C + Tabelle$NUTZKREIS +
+               Tabelle$schw_drittel_c, family = Gamma)

> anova(ygamma, test = "Chisq")

Analysis of Deviance Table

Model: Gamma, link: inverse

Response: Tabelle$sb_ek_ber

Terms added sequentially (first to last)


                       Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                                 1236805   35451551              
Tabelle$FAHRL_C         1    33987   1236804   35417564 0.0018493 ** 
Tabelle$NUTZKREIS       1    48903   1236803   35368661 0.0001880 ***
Tabelle$schw_drittel_c  1    47328   1236802   35321334 0.0002388 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

> str(Tabelle)
'data.frame':   1236806 obs. of  9 variables:
 $ Alter_Jüngster_C_inkl_AlterNutz: int  1 1 1 1 1 1 1 1 1 1 ...
 $ ALTERKAU_C                     : int  1 2 2 1 3 3 3 4 1 1 ...
 $ FAHRL_C                        : int  1 2 1 3 4 3 3 1 5 1 ...
 $ NUTZKREIS                      : int  1 2 2 2 2 2 2 1 1 2 ...
 $ RKL_U12                        : int  1 1 1 2 3 4 4 3 5 6 ...
 $ SF_Sonder_aufgefüllt           : int  1 2 3 4 4 4 4 5 6 7 ...
 $ schw_drittel_c                 : int  1 2 3 4 3 3 3 3 1 1 ...
 $ sb_ek_ber                      : num  0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 ...
 $ JE_gewichtet                   : num  0.384 3.952 3.952 2.81 3.952 ...

I don't understand why the df are always 1.

it would be great if you could help me.

Re: [R] table function in a matrix

2012-07-02 Thread Sarah Auburn

Thank you

From: Rui Barradas ruipbarra...@sapo.pt
To: Sarah Auburn saub...@yahoo.com 
Cc: r-help@r-project.org 
Sent: Monday, 2 July 2012, 17:39
Subject: Re: [R] table function in a matrix

Hello,

See the difference.

a <- b <- c("A", "A", "B", "B", "C", "A", "C", "D", "A", "D", "C", "A", 
"D", "C", "A", "C")
a[3] <- NA

table(a)
table(a, exclude=NULL) # always include NA
table(b, exclude=NULL) # always include NA

# more flexible
table(b, useNA="always")
table(b, useNA="ifany")

Hope this helps,

Rui Barradas

Em 02-07-2012 07:27, Sarah Auburn escreveu:
 Dear Petr,
 Thanks for your help. Sorry one more query for one of my datasets which has 
 NAs (missing genotypes). Is there any way in which I can count NAs?
 Many thanks!
 Sarah

 From: Sarah Auburn saub...@yahoo.com
 To: Petr Savicky savi...@cs.cas.cz
 Cc: r-help@r-project.org r-help@r-project.org
 Sent: Thursday, 7 June 2012, 23:24
 Subject: Re: [R] table function in a matrix

 Perfect, thank you!

 From: Petr Savicky savi...@cs.cas.cz
 To: r-help@r-project.org
 Sent: Thursday, 7 June 2012, 19:42
 Subject: Re: [R] table function in a matrix

 On Wed, Jun 06, 2012 at 11:02:46PM -0700, Sarah Auburn wrote:
 Hi,
 I am trying to get a summary of the counts of different variables for each 
 sample in a matrix of the form m below to generate an output as shown. 
 (Ultimately I want to generate a stacked barchart for each sample). I am 
 only able to get the table function to work on one sample (column) at a 
 time. Any help appreciated.
 Thank you
 Sarah

 a <- c("A", "A", "B", "B", "C", "A", "C", "D", "A", "D", "C", "A", "D", "C", 
 "A", "C")
 m <- matrix(a, nrow=4)
 m
      [,1] [,2] [,3] [,4]
 [1,] "A"  "C"  "A"  "D" 
 [2,] "A"  "A"  "D"  "C" 
 [3,] "B"  "C"  "C"  "A" 
 [4,] "B"  "D"  "A"  "C" 

 output needed (so that I can use the barplot(t(output)) function):
      A B C D
 [,1] 2 2 0 0
 [,2] 1 0 2 1
 [,3] 2 0 1 1
 [,4] 1 0 2 1

 Hi.

 Try the following.

    a <- c("A", "A", "B", "B", "C", "A", "C", "D", "A", "D", "C", "A", "D", "C", 
           "A", "C")
    m <- matrix(a, nrow=4)
    tab <- function(x) { table(factor(x, levels=LETTERS[1:4])) }
    t(apply(m, 2, tab))

         A B C D
    [1,] 2 2 0 0
    [2,] 1 0 2 1
    [3,] 2 0 1 1
    [4,] 1 0 2 1

 Factors are used to ensure that all the tables have the same length,
 even if some letters are missing.

 Hope this helps.

 Petr Savicky.


Re: [R] Pie Chart in map

2012-07-02 Thread hgerritsen

The package mapplots lets you plot pie charts on a map and vary their size.

http://r.789695.n4.nabble.com/file/n4635089/pie.png 

--
View this message in context: 
http://r.789695.n4.nabble.com/Pie-Chart-in-map-tp2318816p4635089.html
Sent from the R help mailing list archive at Nabble.com.


[R] interpolation to new points between geo coordinates

2012-07-02 Thread Jan Näs

Hi

I have a data set with geo coordinates and values for each coordinate.
I want to interpolate the values to new positions on a finer grid,
also geo coordinates.
I have looked at the fields package (interp.surface) and the akima
package (interp) but can't quite figure out what I am doing wrong, or whether
these functions suit my needs.

I have two data sets:

grid_1:

   lat    lon    value
1  56.5   11.1   53
2  56.6   11.1   53.1
3  56.7   11.12  52.1
4  56.5   11.2   52.9
...etc.

and a new grid

grid_2:

   lat    lon
1  55.52  11.11
2  55.53  11.115
3  55.54  11.12
...etc.


And I want interpolated values for grid_2.
Any ideas?
/Jan


[R] Decrete value check in a matrix

2012-07-02 Thread Rantony

Hi All,

Here i have an Dataframe (or) Matrix like this,

MyMatrix <-
ABC   XYZ
---   ---
1     2.5
3.4   4
5     6
5.6   6.7

Here i need to check each column value having decrete value or not ?.
If that particular coulmn-value having decrete value, then the result should
be
TRUE/FALSE respectively in the result column.
Finally, i need to get the result as  Dataframe (or) Matrix form like this

ABC   XYZ   ABC_RESULT   XYZ_RESULT
---   ---   ----------   ----------
1     2.5   TRUE         FALSE
3.4   4     FALSE        TRUE
5     6     TRUE         TRUE
5.6   6.7   FALSE        FALSE

- Can any one help me fast ? 

Antony.

--
View this message in context: 
http://r.789695.n4.nabble.com/Decrete-value-check-in-a-matrix-tp4635090.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] zero dimension error while soft thresholding

2012-07-02 Thread Deeksha Malhan

> powers = c(c(1:12), seq(from = 10, to=20, by=2))
> sft = pickSoftThreshold(datExpr, powerVector = powers, verbose = 5)
pickSoftThreshold: calculating connectivity for given powers...
   ..working on genes 1 through 1000 of  631066
Error in cor(data, data[, c(startG:endG)], use = "p") :
  'x' has a zero dimension.
In addition: Warning message:
In is.na(cols) : is.na() applied to non-(list or vector) of type 'NULL'

I am getting the error shown above while using the soft thresholding code.
Please help me.


Re: [R] Plotting bar graph over a geographical map

2012-07-02 Thread hgerritsen

You could try the library mapplots, see example below: 
http://r.789695.n4.nabble.com/file/n4635091/xy.png 

--
View this message in context: 
http://r.789695.n4.nabble.com/Plotting-bar-graph-over-a-geographical-map-tp4346925p4635091.html
Sent from the R help mailing list archive at Nabble.com.


[R] Error() model is singular - what does that mean

2012-07-02 Thread zetwal

Hello

I have some test data from a within-subject experiment that looks like this:
Subject  Task-Kind  Data-Kind  Time-Taken  Correct
1        A          Data1      5           1
1        A          Data1      3           0
1        A          Data1      1           1
1        A          Data2      8           1
1        A          Data2      7           0
1        A          Data2      5           0
1        A          Data3      2           1
1        A          Data3      7           0
1        A          Data3      5           0
1        A          Data3      6           0
1        B          Data1      3           1
1        B          Data1      1           1
1        B          Data1      3           0
1        B          Data2      9           0
1        B          Data2      8           1
1        B          Data2      5           0
1        B          Data3      2           1
1        B          Data3      7           2
1        B          Data3      5           3
1        B          Data3      6           0
1        C          Data1      3           1
1        C          Data1      1           1
1        C          Data1      3           0
1        C          Data2      9           0
1        C          Data2      8           1
1        C          Data2      5           0
1        C          Data3      2           1
1        C          Data3      7           2
1        C          Data3      5           3
1        C          Data3      6           0
2        A          Data1      5           1
2        A          Data1      3           0
2        A          Data1      1           1
2        A          Data2      8           1
2        A          Data2      7           0
2        A          Data2      5           0
2        A          Data3      2           1
2        A          Data3      7           0
2        A          Data3      5           0
2        A          Data3      6           0
2        B          Data1      3           1
2        B          Data1      1           1
2        B          Data1      3           0
2        B          Data2      9           0
2        B          Data2      8           1
2        B          Data2      5           0
2        B          Data3      2           1
2        B          Data3      7           2
2        B          Data3      5           3
2        B          Data3      6           0
2        C          Data1      3           1
2        C          Data1      1           1
2        C          Data1      3           0
2        C          Data2      9           0
2        C          Data2      8           1
2        C          Data2      5           0
2        C          Data3      2           1
2        C          Data3      7           2
2        C          Data3      5           3
2        C          Data3      6           0
.
.
.

some notes:
there are 20 subjects
there are 5 different kinds of tasks
There are 5 different kinds of data
and there are several different variations for a certain kind of task and
kind of data which is why for Subject = 1   Task-Kind=A  and Data-Kind=Data1 
we have 3 different results.

The measured parameters are time to complete the task and whether it was
correct or not (0 implies correct and 1 implies not correct)

I am computing the anova as follows:
aov.ex =
aov(Correct~Task-Kind*Data-Kind+Error(Subject/(Task-Kind*Data-Kind)),data=allDataRaw.xp)

since I want to see how the result is affected by the different kinds of
data as well as the the kind of task and I get a warning message saying:
Error() model is singular

I would be very grateful if someone could please tell me what does this
mean.
Thanks
Pascal

--
View this message in context: 
http://r.789695.n4.nabble.com/Error-model-is-singular-what-does-that-mean-tp4635103.html
Sent from the R help mailing list archive at Nabble.com.


[R] apply with multiple conditions

2012-07-02 Thread pguilha

Hello all,

I have written a for loop to act on a dataframe with close to 3million rows
and 6 columns and I would like to pass it to apply() to speed the process up
(I let the loop run for 2 days before stopping it and it had only gone
through 200,000 rows) but I am really struggling to find a way to pass the
arguments. Below are the loop and the head of the dataframe I am working on.
Any hints would be much appreciated, thank you! (I have searched for this
but could not find any other posts doing quite what I want)
Paul

x <- as.numeric(all.tf7[1,2])
for (i in 2:nrow(all.tf7)) {
  if (all.tf7[i,1]==all.tf7[i-1,1] && (all.tf7[i,2]-x)<115341)
    all.tf7[i,6] <- all.tf7[i-1,6]
  else if (all.tf7[i,1]==all.tf7[i-1,1] && (all.tf7[i,2]-x)>=115341) {
    all.tf7[i,6] <- (all.tf7[i-1,6]+1)
    x <- as.numeric(all.tf7[i,2]) }
  else if (all.tf7[i,1]!=all.tf7[i-1,1])  {
    all.tf7[i,6] <- (all.tf7[i-1,6]+1)
    x <- as.numeric(all.tf7[i,2]) }
}

#the aim here is to attribute a bin number to each row so that I can then
split the dataframe according to those bins.


chrom  chromStart  chromEnd  name          cumsum  bin
chr1   10089       10309     ZBTB33        10089   1
chr1   10132       10536     TAF7_(SQ-8)   20221   1
chr1   10133       10362     Pol2-4H8      30354   1
chr1   10148       10418     MafF_(M8194)  40502   1
chr1   10382       10578     ZBTB33        50884   1
chr1   16132       16352     CTCF          67016   1

--
View this message in context: 
http://r.789695.n4.nabble.com/apply-with-multiple-conditions-tp4635098.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Decrete value check in a matrix

You are not asking for a Decrete [sic] (discrete) value check but rather whether 
the numbers are integers.  

Try this:

# from the ?is.integer help page
is.wholenumber <-
function(x, tol = .Machine$double.eps^0.5)  abs(x - round(x)) < tol


aa  <- data.frame( na = c( 1, 3.4, 5, 5.6), nb = c(2.4, 4, 6, 6.7))
ww  <- data.frame(  is.wholenumber(aa))
cbind(aa, ww)

John Kane
Kingston ON Canada



Re: [R] Error() model is singular - what does that mean

Just looking at it, I would try renaming Task-Kind, Data-Kind and Time-Taken.
Those names are ambiguous in the formula:

Task-Kind vs Task - Kind

Though that might not be the error at hand :)
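
A small sketch of why such names cause trouble (the data frame `d` and its values are made up for illustration): inside a formula, Task-Kind is parsed as Task minus Kind, so R looks for two separate variables. Renaming the columns, here with make.names(), is one way to avoid the ambiguity.

```r
# 'Task-Kind' in a formula is read as Task minus Kind
f <- Correct ~ Task-Kind
all.vars(f)                      # "Correct" "Task" "Kind"

# Made-up data frame with the problematic column names
d <- data.frame(1:2, c("A", "B"), c(1, 0))
names(d) <- c("Subject", "Task-Kind", "Correct")

# make.names() turns them into syntactically valid names
names(d) <- make.names(names(d))
names(d)                         # "Subject" "Task.Kind" "Correct"
```

After renaming, the formula can refer to Task.Kind directly.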



Re: [R] binary tree

2012-07-02 Thread Robert Baer


On 6/27/2012 3:20 AM, Peppino wrote:

Hi, I am new to R.

I have to build a binary tree with R. I'm very confused and was wondering if
anyone had any R sample code they would share.

Can anybody help me?

You might want to look at the R Task view for phylogenetics:
http://cran.r-project.org/web/views/Phylogenetics.html.

The ape package may be some help depending on what you want to do.

Rob


Bye

Giuseppe

--
View this message in context: 
http://r.789695.n4.nabble.com/binary-tree-tp4634593.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Decrete value check in a matrix

2012-07-02 Thread Marc Schwartz

It need not be that complicated:

> aa == round(aa)
        na    nb
[1,]  TRUE FALSE
[2,] FALSE  TRUE
[3,]  TRUE  TRUE
[4,] FALSE FALSE


> cbind(aa, Result = aa == round(aa))
   na  nb Result.na Result.nb
1 1.0 2.4      TRUE     FALSE
2 3.4 4.0     FALSE      TRUE
3 5.0 6.0      TRUE      TRUE
4 5.6 6.7     FALSE     FALSE
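
One caveat, sketched here with a made-up value rather than the data above: the exact comparison treats a number that is whole only up to floating-point rounding as not whole, whereas a tolerance-based check such as the is.wholenumber() example from ?is.integer accepts it. Which behaviour is wanted depends on how the numbers were produced.

```r
# Tolerance-based check, as given on the ?is.integer help page
is.wholenumber <- function(x, tol = .Machine$double.eps^0.5) abs(x - round(x)) < tol

x <- sqrt(2)^2          # mathematically 2, but not exactly 2 in floating point

x == round(x)           # FALSE: exact comparison
is.wholenumber(x)       # TRUE: the difference is within the tolerance
```

For values typed in or read from a file, such as 1, 3.4, 5 and 5.6, both checks agree.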


Regards,

Marc Schwartz

On Jul 2, 2012, at 7:46 AM, John Kane wrote:

 You are not asking for a Decrete [sic]  (descrete) value check but rather if 
 the numbers are intergers.  
 
 Try this:
 
 # from the ?is.integer help page
 is.wholenumber -
function(x, tol = .Machine$double.eps^0.5)  abs(x - round(x))  tol
 
 
 aa  - data.frame( na = c( 1, 3.4, 5, 5.6), nb = c(2.4, 4, 6, 6.7))
 ww  - data.frame(  is.wholenumber(aa))
 cbind(aa, ww)
 
 John Kane
 Kingston ON Canada
 
 
 -Original Message-
 From: antony.akk...@ge.com
 Sent: Mon, 2 Jul 2012 03:04:48 -0700 (PDT)
 To: r-help@r-project.org
 Subject: [R] Decrete value check in a matrix
 
 Hi All,
 
 Here I have a data frame (or matrix) like this:
 
 MyMatrix -
 ABC  XYZ
 ---  ---
 1    2.5
 3.4  4
 5    6
 5.6  6.7
 
 I need to check whether each value in each column is a discrete (whole-number)
 value or not. The result for each value should be TRUE or FALSE in the
 corresponding result column. Finally, I need the result as a data frame
 (or matrix) like this:
 
 ABC  XYZ  ABC_RESULT  XYZ_RESULT
 ---  ---  ----------  ----------
 1    2.5  TRUE        FALSE
 3.4  4    FALSE       TRUE
 5    6    TRUE        TRUE
 5.6  6.7  FALSE       FALSE
 
 - Can anyone help me quickly?
 
 Antony.



Re: [R] R sub query

2012-07-02 Thread jim holtman

Will this do:

> m <- matrix(c(".:0:0,0", ".:2:0,2", ".:194:193,1", ".:56:0,56", ".:58:50,8",
+               ".:13:0,13", ".:114:114,0", ".:75:75,0"), nrow = 2)
>
> m
     [,1]      [,2]          [,3]        [,4]
[1,] ".:0:0,0" ".:194:193,1" ".:58:50,8" ".:114:114,0"
[2,] ".:2:0,2" ".:56:0,56"   ".:13:0,13" ".:75:75,0"
>
> sub("^\\.:[^:]*:", "", m)
     [,1]  [,2]    [,3]   [,4]
[1,] "0,0" "193,1" "50,8" "114,0"
[2,] "0,2" "0,56"  "0,13" "75,0"
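
In case the pattern is not obvious, here is the same call again as a small annotated sketch. The piece-by-piece reading of the regular expression in the comments is my own; the variable name res is only for illustration.

```r
m <- matrix(c(".:0:0,0", ".:2:0,2", ".:194:193,1", ".:56:0,56", ".:58:50,8",
              ".:13:0,13", ".:114:114,0", ".:75:75,0"), nrow = 2)

# ^       anchor the match at the start of each string
# \\.:    a literal dot followed by a colon
# [^:]*   any run of characters that are not a colon (the varying X)
# :       the colon that closes the prefix
res <- sub("^\\.:[^:]*:", "", m)

# sub() keeps the attributes of its input, so res is still a 2 x 4 matrix
res
```

Because the pattern is anchored with ^, only the leading ".:X:" prefix is removed and sub() (first match only) is sufficient; gsub() is not needed here.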



On Mon, Jul 2, 2012 at 4:15 AM, Sarah Auburn saub...@yahoo.com wrote:
 Hello,
 I would like to substitute a substring of characters defined by a specific
 start and end sequence.
 i.e. in the example matrix below, I would like to substitute ".:X:" with "",
 where X varies in sequence...

 m <- matrix(c(".:0:0,0", ".:2:0,2", ".:194:193,1", ".:56:0,56", ".:58:50,8",
               ".:13:0,13", ".:114:114,0", ".:75:75,0"), nrow=2)

 output required:
      [,1] [,2]  [,3] [,4]
 [1,] 0,0  193,1 50,8 114,0
 [2,] 0,2  0,56  0,13 75,0

 Thank you for any help
 Sarah
         [[alternative HTML version deleted]]






-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


Re: [R] ggplot: dodge positions

Can you expand a bit on what is wrong with the dodge option?  From what I see 
it looks lovely, with the points exactly lined up with the boxplots for each 
group, but perhaps I don't understand exactly what you want.



John Kane
Kingston ON Canada


 -Original Message-
 From: thorn.tha...@rdls.nestle.com
 Sent: Mon, 2 Jul 2012 11:43:03 +0200
 To: r-help@r-project.org
 Subject: [R] ggplot: dodge positions
 
 Dear all,
 
 I want to get a series of boxplots (grouped by two factors) and I want to
 overlay the original observations and the following code does almost what
 I want:
 
 library(ggplot)
 ddf <- data.frame(x=factor(rep(LETTERS[1:4], each=30)), y =
 runif(120,0,10), grp = factor(rep(rep(1:3, 10), 4)))
 ggplot(ddf, aes(x, y, colour=grp)) + geom_boxplot() + geom_point()
 
 Yet the position of the points and the position of the boxes on the
 x-axis is not the same. I would like that the points are shifted
 accordingly, such that they line up with the boxplots. I tried
 position_dodge:
 
 ggplot(ddf, aes(x, y, colour=grp)) + geom_boxplot() +
 geom_point(aes(ymax=max(y)), position = position_dodge(width=.75))
 
 but that did not really help, as all points are now dodged and I just
 want to have a fixed offset for each subgroup of points such that the
 boxplot and the points are aligned. Any ideas?
 
 
 Kind Regards,
 
 Thorn Thaler
 Mathematician
 
 Applied Mathematics
 Nestec Ltd,
 Nestlé Research Center
 PO Box 44
 CH-1000 Lausanne 26
 Phone: +41 21 785 8220
 Fax: +41 21 785 9486



Re: [R] enquiry

2012-07-02 Thread jim holtman

Try this:

> as.Date('1951-52', format = "%Y-%j")
[1] "1951-02-21"



On Mon, Jul 2, 2012 at 5:44 AM, Karan Anand anand.kara...@gmail.com wrote:
 Hi,
 I am new to using R. Could you please tell me how to read dates in the format
 1951-52, 1952-52 in R?

         [[alternative HTML version deleted]]




-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


Re: [R] Error() model is singular - what does that mean

Also, try googling for "R model is singular"; there seem to have been a lot 
of people with that particular error.

On 02.07.2012, at 14:56, Jessica Streicher wrote:

 Just looking at it, I would try renaming Task-Kind, Data-Kind and Time-Taken.
 Those are ambiguous in the formula:
 
 Task-Kind vs Task - Kind
 
 Though that might not be the error at hand :)
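
 To illustrate the renaming: inside a formula, Task-Kind is read as "Task
 minus Kind", so the columns need syntactic names first. The sketch below
 uses make.names() for that and then fits the same kind of model on a small
 made-up data set. The data frame d, its size and the random Correct values
 are purely hypothetical stand-ins for allDataRaw.xp; nothing here says
 whether the renaming alone removes the singularity warning.

```r
# Column names as they appear in the posted data
raw_names <- c("Subject", "Task-Kind", "Data-Kind", "Time-Taken", "Correct")

# make.names() replaces characters that are invalid in R names with "."
clean_names <- make.names(raw_names)
clean_names

# Hypothetical balanced toy data: 4 subjects x 3 tasks x 3 data kinds x 2 reps
set.seed(1)
d <- expand.grid(Subject   = factor(1:4),
                 Task.Kind = c("A", "B", "C"),
                 Data.Kind = c("Data1", "Data2", "Data3"),
                 rep       = 1:2)
d$Correct <- rbinom(nrow(d), 1, 0.5)

# Same model structure as in the question, written with the cleaned names
fit <- aov(Correct ~ Task.Kind * Data.Kind +
             Error(Subject / (Task.Kind * Data.Kind)), data = d)
summary(fit)
```

 Note that Subject is a factor here; if it is stored as a number, aov()
 treats it as a numeric covariate, which is worth checking as well.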
 
 
 On 02.07.2012, at 14:15, zetwal wrote:
 
 Hello
 
 I have some test data from a within-subject experiment that looks like this:
 Subject  Task-Kind  Data-Kind  Time-Taken  Correct
 1        A          Data1      5           1
 1        A          Data1      3           0
 1        A          Data1      1           1
 1        A          Data2      8           1
 1        A          Data2      7           0
 1        A          Data2      5           0
 1        A          Data3      2           1
 1        A          Data3      7           0
 1        A          Data3      5           0
 1        A          Data3      6           0
 1        B          Data1      3           1
 1        B          Data1      1           1
 1        B          Data1      3           0
 1        B          Data2      9           0
 1        B          Data2      8           1
 1        B          Data2      5           0
 1        B          Data3      2           1
 1        B          Data3      7           2
 1        B          Data3      5           3
 1        B          Data3      6           0
 1        C          Data1      3           1
 1        C          Data1      1           1
 1        C          Data1      3           0
 1        C          Data2      9           0
 1        C          Data2      8           1
 1        C          Data2      5           0
 1        C          Data3      2           1
 1        C          Data3      7           2
 1        C          Data3      5           3
 1        C          Data3      6           0
 2        A          Data1      5           1
 2        A          Data1      3           0
 2        A          Data1      1           1
 2        A          Data2      8           1
 2        A          Data2      7           0
 2        A          Data2      5           0
 2        A          Data3      2           1
 2        A          Data3      7           0
 2        A          Data3      5           0
 2        A          Data3      6           0
 2        B          Data1      3           1
 2        B          Data1      1           1
 2        B          Data1      3           0
 2        B          Data2      9           0
 2        B          Data2      8           1
 2        B          Data2      5           0
 2        B          Data3      2           1
 2        B          Data3      7           2
 2        B          Data3      5           3
 2        B          Data3      6           0
 2        C          Data1      3           1
 2        C          Data1      1           1
 2        C          Data1      3           0
 2        C          Data2      9           0
 2        C          Data2      8           1
 2        C          Data2      5           0
 2        C          Data3      2           1
 2        C          Data3      7           2
 2        C          Data3      5           3
 2        C          Data3      6           0
 .
 .
 .
 
 some notes:
 there are 20 subjects
 there are 5 different kinds of tasks
 There are 5 different kinds of data
 and there are several different variations for a certain kind of task and
 kind of data which is why for Subject = 1   Task-Kind=A  and Data-Kind=Data1 
 we have 3 different results.
 
 The measured parameters are time to complete the task and whether it was
 correct or not (0 implies correct and 1 implies not correct)
 
 I am computing the anova as follows:
 aov.ex =
 aov(Correct~Task-Kind*Data-Kind+Error(Subject/(Task-Kind*Data-Kind)),data=allDataRaw.xp)
 
 since I want to see how the result is affected by the different kinds of
 data as well as the kind of task, and I get a warning message saying:
 Error() model is singular
 
 I would be very grateful if someone could please tell me what this
 means.
 Thanks
 Pascal
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Error-model-is-singular-what-does-that-mean-tp4635103.html
 Sent from the R help mailing list archive at Nabble.com.
 

Re: [R] geom_boxplot

2012-07-02 Thread Thaler,Thorn,LAUSANNE,Applied Mathematics

Thanks Petr,

I suppose this means I have to reread that set of changes again.  I think I 
noticed it and promptly forgot it.

John Kane
Kingston ON Canada


 -Original Message-
 From: petr.pi...@precheza.cz
 Sent: Mon, 2 Jul 2012 10:41:20 +0200
 To: jrkrid...@inbox.com
 Subject: Re: [R] geom_boxplot
 
 Hi
 
 In the new ggplot2 version the following works too:
 
 p + geom_boxplot(aes(fill = factor(cyl))) +
   labs(fill = "Cylinders") + ylab("Miles per Gallon") +
   xlab("Number of Cylinders")
 
 Regards
 Petr
 
 
 Yes you can do all of the things you want.
 
 Below is a start, to give you an idea of how to approach some of it.
 
 library(ggplot2)
 p <- ggplot(mtcars, aes(factor(cyl), mpg))
 p <- p + geom_boxplot(aes(fill = factor(cyl))) +
   labs(fill = "Cylinders") +
   scale_y_continuous("Miles per Gallon") +
   scale_x_discrete("Number of Cylinders")
 p
 
 
 Have a look at
 ackoverflow.com/questions/3606697/how-to-set-x-axis-limits-
 in-ggplot2-r-plots for x and y axes limits.
 
 It took me a while to realise it but, generally, I find that it is not too
 hard to find examples of what you need by just googling something like
 "ggplot2 set x and y limits" or "ggplot2 geom_bar colour" and so on.
 
 The terms ggplot2 and geom_XXX are pretty unique on the internet, and search
 results usually are not too bad.
 
 You may also want to subscribe to the ggplot2 group on Google Groups.
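
 As a rough sketch of the remaining questions (custom box colours, y limits,
 dropping the legend), something along the following lines should work with a
 reasonably recent ggplot2. It uses mtcars because the original data were not
 posted; the colour assignment to the cyl levels and the y range of 5 to 40
 are arbitrary choices for illustration, and theme() assumes a ggplot2 release
 that has replaced the older opts() interface.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_boxplot(aes(fill = factor(cyl))) +
  # one colour per factor level, matched by level name
  scale_fill_manual(values = c("4" = "blue", "6" = "green", "8" = "purple")) +
  # axis titles
  labs(x = "Number of Cylinders", y = "Miles per Gallon") +
  # zoom the y axis without dropping any observations
  coord_cartesian(ylim = c(5, 40)) +
  # remove the legend on the right-hand side
  theme(legend.position = "none")
p
```

 coord_cartesian() only zooms the view; ylim() would instead discard points
 outside the range before the boxplot statistics are computed.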
 
 Best wishes
 
 
 John Kane
 Kingston ON Canada
 
 
 -Original Message-
 From: hannah@gmail.com
 Sent: Sun, 1 Jul 2012 08:39:20 -0400
 To: r-help@r-project.org
 Subject: Re: [R] geom_boxplot
 
 Also, is it possible to change ylim?
 
 2012/7/1 li li hannah@gmail.com
 
 Dear all,
   I have a few questions regarding the boxplot output from the
 geom_boxplot function.
 Attached is the output I get. Below are my questions:
 
   1. How can I define the xlab and ylab myself?
  Also I would like to remove factor(variable)
 line on the right side.
 
   2. How can I define the colors of the boxplots myself.
  For example, I want to use blue for
 LR, green for pair and purple for BR1.
   Thanks so much!
 Hannah
 
 
[[alternative HTML version deleted]]
 

Re: [R] ggplot: dodge positions

I guess it works with ggplot but not with ggplot2. I'm using only the latter 
but had a typo in my first post. So the code (which does not do what I want) is:

library(ggplot2)
ddf <- data.frame(x=factor(rep(LETTERS[1:4], each=30)), y =
 runif(120,0,10), grp = factor(rep(rep(1:3, 10), 4)))
ggplot(ddf, aes(x, y, colour=grp)) + geom_boxplot() + geom_point()

Thinking of it, I would need to find out which offset ggplot uses to dodge the 
nested factors. If I knew the exact quantity, I could do something like

geom_point(aes(x = offset.used.by.geom_boxplot))

So how are the exact positions on the x-axis for geom_boxplot determined? Any 
ideas?
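
One way to think about the offsets, as a sketch only: if a dodge of total width w is shared equally among n groups, group i is centred at w * ((i - 0.5) / n - 0.5) relative to the category position. The function name dodge_offsets and the column x_pos are made up for this illustration, and the width of 0.75 is simply the value tried earlier in this thread; whether it matches what geom_boxplot uses in a given ggplot2 version is an assumption to verify.

```r
# Centre of each of n_groups equal slots inside a dodge of the given width,
# relative to the category position (hypothetical helper)
dodge_offsets <- function(n_groups, width = 0.75) {
  i <- seq_len(n_groups)
  width * ((i - 0.5) / n_groups - 0.5)
}

offsets <- dodge_offsets(3)
offsets

set.seed(42)
ddf <- data.frame(x = factor(rep(LETTERS[1:4], each = 30)),
                  y = runif(120, 0, 10),
                  grp = factor(rep(rep(1:3, 10), 4)))

# Numeric x position of every point: category index plus its group's offset
ddf$x_pos <- as.numeric(ddf$x) + offsets[as.integer(ddf$grp)]
head(ddf)
```

With three groups the offsets come out as -0.25, 0 and 0.25, so every point of a given group within a category gets the same x position.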

Thanks for the help, anyways.

KR,

-Thorn



Re: [R] R sub query

2012-07-02 Thread Petr PIKAL

Hi

I am not at all an expert in regular expressions but

gsub("^[[:punct:]]+[[:digit:]]+:", "", m)

produces the output you want. Maybe by chance :-)

Regards
Petr

 

Re: [R] apply with multiple conditions

2012-07-02 Thread Jean V Adams

Paul,

My interpretation is that you are trying to assign a new bin number to a 
row every time the variable chrom changes and every time the variable 
chromStart changes by 115341 or more.  Is that right?  If so, you don't 
need a loop at all.  Check out the code below.  I made a couple changes to 
the all.tf7 example data frame so that it would have two changes in bin 
number, one based on the chrom variable and one based on the chromStart 
variable.

Jean

all.tf7 <- data.frame(
  chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
  chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
  chromEnd = c(10309, 10536, 10362, 10418, 210578, 216352),
  name = c("ZBTB33", "TAF7_(SQ-8)", "Pol2-4H8", "MafF_(M8194)",
           "ZBTB33", "CTCF"),
  cumsum = c(10089, 20221, 30354, 40502, 50884, 67016),
  bin = rep(NA, 6),
  stringsAsFactors = FALSE  # keep chrom as character for the comparison below
)

# assign a new bin every time chrom changes and every time chromStart
# changes by 115341 or more
L <- nrow(all.tf7)
prev.chrom <- c(NA, all.tf7$chrom[-L])
delta.start <- c(NA, all.tf7$chromStart[-1] - all.tf7$chromStart[-L])
new.bin <- is.na(prev.chrom) | all.tf7$chrom != prev.chrom |
  delta.start >= 115341
all.tf7$bin <- cumsum(new.bin)
all.tf7
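
One caveat: the code above compares each chromStart with the previous row, whereas the loop in the question compares it with the start of the current bin. If that distinction matters, a sketch along the following lines keeps the loop's logic but works on plain vectors instead of indexing the data frame row by row, which is usually where most of the time goes. The function name assign_bins and its argument names are hypothetical.

```r
# Bin rows so that a new bin starts when chrom changes or when chromStart is
# max_span or more beyond the start of the current bin
assign_bins <- function(chrom, start, max_span = 115341) {
  n <- length(start)
  bin <- integer(n)
  if (n == 0L) return(bin)
  bin[1] <- 1L
  anchor <- start[1]                 # chromStart of the first row in the bin
  for (i in seq_len(n)[-1]) {
    if (chrom[i] != chrom[i - 1] || start[i] - anchor >= max_span) {
      bin[i] <- bin[i - 1] + 1L      # open a new bin
      anchor <- start[i]
    } else {
      bin[i] <- bin[i - 1]           # stay in the current bin
    }
  }
  bin
}

chrom <- c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2")
start <- c(10089, 10132, 10133, 10148, 210382, 216132)
bins <- assign_bins(chrom, start)
bins
```

With a data frame this would be called as all.tf7$bin <- assign_bins(as.character(all.tf7$chrom), all.tf7$chromStart).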


pguilha paul.guilha...@gmail.com wrote on 07/02/2012 06:25:13 AM:

 Hello all,
 
 I have written a for loop to act on a data frame with close to 3 million rows
 and 6 columns, and I would like to pass it to apply() to speed the process up
 (I let the loop run for 2 days before stopping it and it had only gone
 through 200,000 rows), but I am really struggling to find a way to pass the
 arguments. Below are the loop and the head of the data frame I am working on.
 Any hints would be much appreciated, thank you! (I have searched for this
 but could not find any other posts doing quite what I want.)
 Paul
 
 x <- as.numeric(all.tf7[1,2])
 for (i in 2:nrow(all.tf7)) {
   if (all.tf7[i,1] == all.tf7[i-1,1] && (all.tf7[i,2] - x) < 115341)
     all.tf7[i,6] <- all.tf7[i-1,6]
   else if (all.tf7[i,1] == all.tf7[i-1,1] && (all.tf7[i,2] - x) >= 115341) {
     all.tf7[i,6] <- (all.tf7[i-1,6] + 1)
     x <- as.numeric(all.tf7[i,2]) }
   else if (all.tf7[i,1] != all.tf7[i-1,1]) {
     all.tf7[i,6] <- (all.tf7[i-1,6] + 1)
     x <- as.numeric(all.tf7[i,2]) }
 }
 
 # the aim here is to attribute a bin number to each row so that I can then
 # split the data frame according to those bins.
 
 
 chrom  chromStart  chromEnd  name          cumsum  bin
 chr1   10089       10309     ZBTB33        10089   1
 chr1   10132       10536     TAF7_(SQ-8)   20221   1
 chr1   10133       10362     Pol2-4H8      30354   1
 chr1   10148       10418     MafF_(M8194)  40502   1
 chr1   10382       10578     ZBTB33        50884   1
 chr1   16132       16352     CTCF          67016   1
[[alternative HTML version deleted]]


Re: [R] enquiry

2012-07-02 Thread jim holtman

In order to do any conversion, you have to know the format of the data
that is being input.  So which is it:  does 52 represent the day of
the year, or the week of the year?  It does make a big difference, and
until you know what 1952-52 means, it is hard to say which way
you should do the conversion.
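
To make the difference concrete, here is a small sketch of the two readings. Which one applies to the actual data is not known from this thread. For the week-of-year reading I am assuming US-style week numbers (%U) and arbitrarily picking Monday (%u = 1) as the day within the week, because a week number alone does not identify a single date. As far as I know, ?strptime documents %V as accepted but ignored on input, which would explain why the %V attempt quoted below returns the current month and day.

```r
x <- c("1951-52", "1952-52")

# Reading 1: the second field is the day of the year (%j)
as_day_of_year <- as.Date(x, format = "%Y-%j")
as_day_of_year

# Reading 2: the second field is the week of the year.
# Append a weekday (here Monday, an arbitrary choice) so that the
# string identifies one particular date.
as_week_of_year <- as.Date(paste0(x, "-1"), format = "%Y-%U-%u")
as_week_of_year
```

Under the first reading day 52 falls on 21 February; under the second the dates land near the end of the year, so the two interpretations are nowhere near each other.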

On Mon, Jul 2, 2012 at 9:12 AM, arun smartpink...@yahoo.com wrote:
 Hi Jim,

 I tried,

 dat2 <- as.Date(dat1, format = "%Y-%V")
 dat2
 [1] "1951-07-02" "1952-07-02"

 But, if the format is for -wk or -yy, then, not sure how this will 
 help.

 A.K.








-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


Re: [R] ggplot: dodge positions

I don't think I was clear. Sorry.  What I was referring to was

ggplot(ddf, aes(x, y, colour=grp)) + geom_boxplot() + 
geom_point(aes(ymax=max(y)),
position = position_dodge(width=.75))

which gives me http://www.mediafire.com/i/?fdurpq6e6l8cu35, which is what I 
thought you wanted.

I have no idea how the x-axis positions of the boxplots are determined.  It may 
be relatively clear in the code, but I don't really have the knowledge to ferret 
it out.

Sorry that I cannot be of more help.

John Kane
Kingston ON Canada



Re: [R] Decrete value check in a matrix

2012-07-02 Thread Thaler,Thorn,LAUSANNE,Applied Mathematics



Hi,

Try this:

dat1 <- read.table(text = "
ABC  XYZ
1    2.5
3.4  4
5    6
5.6  6.7
", sep = "", header = TRUE)

dat1[dat1$ABC == as.integer(dat1$ABC), "ABC_RESULT"] <- TRUE
dat1[dat1$XYZ == as.integer(dat1$XYZ), "XYZ_RESULT"] <- TRUE
dat1[is.na(dat1)] <- FALSE
dat1
  ABC XYZ ABC_RESULT XYZ_RESULT
1 1.0 2.5       TRUE      FALSE
2 3.4 4.0      FALSE       TRUE
3 5.0 6.0       TRUE       TRUE
4 5.6 6.7      FALSE      FALSE

A.K.


From: Akkara, Antony (GE Energy, Non-GE) antony.akk...@ge.com
To: arun smartpink...@yahoo.com 
Sent: Monday, July 2, 2012 7:29 AM
Subject: Decrete value check in a matrix 


Hi Arun,

Can you please help me,

Here I have a data frame (or matrix) like this:

MyMatrix -

ABC  XYZ
---  ---
1    2.5
3.4  4
5    6
5.6  6.7

I need to check whether each value in each column is a discrete (whole-number)
value or not. The result for each value should be TRUE or FALSE in the
corresponding result column.
Finally, I need the result as a data frame (or matrix) like this:

ABC  XYZ  ABC_RESULT  XYZ_RESULT
---  ---  ----------  ----------
1    2.5  TRUE        FALSE
3.4  4    FALSE       TRUE
5    6    TRUE        TRUE
5.6  6.7  FALSE       FALSE

- Can anyone suggest a solution quickly? It is urgent, that is why.

Antony.  


Re: [R] ggplot: dodge positions

Unfortunately I can't see your example as the page is blocked by our firewall. 
Anyway, if I try the dodge code, the points are shifted, yet each of them is 
shifted by a different offset. The green points, for instance, are indeed 
closer to the green boxplot, yet they are not aligned: the green points all 
seem to have different positions on the x-axis, whereas all the green points 
for x == "A" should align exactly with the green boxplot for A. Am I clearer now? 

KR,

-Thorn



Re: [R] Decrete value check in a matrix

2012-07-02 Thread Rantony

Good. It's working fine.

Thank you, John!

From: John Kane [via R]
[mailto:ml-node+s789695n4635113...@n4.nabble.com] 
Sent: Monday, July 02, 2012 6:18 PM
To: Akkara, Antony (GE Energy, Non-GE)
Subject: Re: Decrete value check in a matrix

You are not asking for a Decrete [sic]  (descrete) value check but
rather if the numbers are intergers.   

Try this: 

# from the ?is.integer help page 
is.wholenumber - 
function(x, tol = .Machine$double.eps^0.5)  abs(x - round(x))  tol 

aa  - data.frame( na = c( 1, 3.4, 5, 5.6), nb = c(2.4, 4, 6, 6.7)) 
ww  - data.frame(  is.wholenumber(aa)) 
cbind(aa, ww) 

John Kane 
Kingston ON Canada 

 -Original Message- 
 From: [hidden email] 
 Sent: Mon, 2 Jul 2012 03:04:48 -0700 (PDT) 
 To: [hidden email] 
 Subject: [R] Decrete value check in a matrix 

 Hi All, 

 Here i have an Dataframe (or) Matrix like this, 

 MyMatrix - 
 ABC  XYZ 
 ----- 
 1  2.5 
 3.4   4 
 5  6 
 5.6  6.7 

 Here i need to check each column value having decrete value or not ?. 
 If that particular coulmn-value having decrete value, then the result 
 should 
 be 
 TRUE/FALSE respectively in the result column. 
 Finally, i need to get the result as  Dataframe (or) Matrix form like 
 this 

 ABC  XYZ  ABC_RESULT   XYZ_RESULT 
 -----  

 1  2.5 TRUE   FALSE 
 3.4   4FALSE TRUE 
 5  6TRUETRUE 
 5.6  6.7  FALSE FALSE 

 - Can any one help me fast ? 

 Antony. 


__ 
[hidden email] mailing list 
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code. 


--
View this message in context: 
http://r.789695.n4.nabble.com/Decrete-value-check-in-a-matrix-tp4635090p4635118.html
Sent from the R help mailing list archive at Nabble.com.
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Specify model with polynomial interaction terms up to degree n

2012-07-02 Thread YTP

I would like to specify a model with all polynomial interaction terms between
two variables, say, up to degree 6. For example, terms like a^6 + (a^5 *
b^1)  +  (a^4 * b^2) + ... and so on.  The documentation states

The ^ operator indicates crossing to the specified degree.

so I would expect a model specified as y ~ (a+b)^6 to produce these terms.
However doing this only returns four slope coefficients, for Intercept, a,
b, and a:b.  Does anyone know how to produce the desired result? Thanks in
advance.

--
View this message in context: 
http://r.789695.n4.nabble.com/Specify-model-with-polynomial-interaction-terms-up-to-degree-n-tp4635130.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Binary Quadratic Opt?

2012-07-02 Thread khris

Hi, Petr,

 
 Hi Khris: 
 
 If i understand the problem correctly, you have a list of (x,y) coordinates, 
 where 
 some sensor is located, but you do not know, which sensor is there. The 
 database 
 contains data for each sensor identified in some way, but you do not know the 
 mapping between sensor identifiers from the database and the (x,y) 
 coordinates. 
 Is this correct? 

Yes.

 
  So I modelled the problem as inexact match between 2 Graphs. Since the best 
  package on Graphs i.e. iGraph does not have any function for Graph matching 
 
 I think, the problem is close to 
 
   http://en.wikipedia.org/wiki/Graph_isomorphism
 
 You have estimates of the distances between the sensors using identifiers 
 from the database. So, you know, which pairs of sensors are close. This is 
 one graph. The other graph is the graph of closeness between the known (x,y) 
 coordinates. You want to find a mapping between the vertices of these two 
 graphs, which preserves edges.

Yes, I agree the problem belongs to the graph-theoretic domain. To be more
precise, it is inexact graph matching, which generalizes the Graph Isomorphism
problem. Let me define the problem more formally.

We have 2 weighted undirected graphs. In one graph I know the distance of
every vertex from every other vertex, whereas in the other graph I know only
which vertices are close to a given vertex, i.e. its neighboring vertices. So
the distance matrix of the second graph is incompletely known. The question is
whether I can find the best alignment between the 2 graphs.

Ex: For G1 I know the complete distance matrix. For G2, if there are four
vertices, say (v1, v2, v3, v4), then I know the edge weights (v1,v2) and
(v1,v3) but have no information on the edge weight (v1,v4). Similarly, I know
about (v2,v3) but have no information about the edge weights (v2,v4) or (v3,v4).

So I was thinking of not modelling it as a general inexact graph matching
problem, for then the complexity is n^4. It seems the best way to model the
solution is to consider only edges of length 1 unit, i.e. the edges from every
vertex to its closest neighbors, and not every edge from the given vertex. This
will bring down the complexity from n^4 to 6*6*n^2, assuming every vertex has
at most 6 neighboring vertices. Quadratic complexity seems manageable. Of
course, the solution now becomes a lot more sensitive to errors in graph G2.
In the best case, if I have no errors in G2, i.e. for every vertex I correctly
know its closest neighbors in the rectangular grid, then optimizing the
distance between G1 and G2 should give me the correct alignment. This seems to
be the best approach under the current circumstances.

As far as implementation goes, I think I still have to use an optimization
package, since there is no readily and freely available function for inexact
graph matching.

Petr, how do you feel about it? I appreciate your feedback.

Regards
Khris.

 
  I converted the Inexact graph matching problem to Binary Quadratic Opt 
  Problem. Since there is no specialized package for Binary Quadratic Opt, 
  based on your input I converted it  into Binary Linear Opt problem. 
 
 The problem of graph isomorphism is hard in general, but if one of the 
 graphs is a rectangular grid, which does not have too many automorphisms, 
 the problem is not too hard. Try, for example, the following approach. 
 
 Look for small groups of sensors that form connected subgraphs which
 have the form of small pieces of the rectangular grid. If you have such
 a small subgraph, look for nodes that can be added to the subgraph to make
 it a larger piece of the grid.
 
 To start, the algorithm can choose any sensor, say S_0. Find all its 
 neighbours. 
 There should be at most 4 neighbours (in an ideal grid). Call the group of 
 these neighbours S_1. Then, find sensors, which are neighbours to at least 
 two members of S_1. Call them group S_2. The connections between S_0, S_1 
 and S_2 should form a pattern like 
 
2 - 1 - 2 
|   |   | 
1 - 0 - 1 
|   |   | 
2 - 1 - 2 
 
 The digits 0, 1, 2 distinguish elements of S_0, S_1, S_2. Continue this in 
 order to enlarge this recognised pattern. 
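
 A minimal sketch of the first two layers of this construction, assuming the
 closeness graph is available as a 0/1 adjacency matrix. The helper
 grid_adjacency() and the names s0, s1, s2 are hypothetical and only serve the
 illustration; an ideal 3 x 3 grid stands in for real sensor data.

```r
# Hypothetical helper: 0/1 adjacency matrix of an ideal nr x nc rectangular grid.
# Two positions are neighbours when their Manhattan distance is exactly 1.
grid_adjacency <- function(nr, nc) {
  xy <- expand.grid(x = seq_len(nc), y = seq_len(nr))
  d <- as.matrix(dist(xy, method = "manhattan"))
  (d == 1) * 1L
}

B <- grid_adjacency(3, 3)

s0 <- 5                                   # starting sensor: centre of the 3 x 3 grid
s1 <- which(B[s0, ] == 1)                 # S_1: all neighbours of S_0
shared <- colSums(B[s1, , drop = FALSE])  # number of S_1 members each node touches
s2 <- setdiff(which(shared >= 2), c(s0, s1))  # S_2: neighbours of at least two S_1 members

unname(s1)   # the four edge midpoints
unname(s2)   # the four corners
```

 On real data the adjacency matrix would come from the estimated sensor
 distances, and missing or spurious edges would require keeping several
 candidate patterns, as described below.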
 
 If the grid is not ideal, the process may require to maintain several 
 candidate connected patterns and choose those, which can be extended 
 with further sensors and discard those, which cannot. 
 
 Another approach is as follows. Choose a random mapping between the 
 sensors and (x,y). Define a measure of the quality of the mapping. 
 For example, the number of matching edges minus the number of non-matching 
 edges. Then, use local search to maximize the quality. For example, in each 
 step, exchange two sensors in a way, which increases the quality. 
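
 The local search described above could be sketched as follows. This is only
 an illustration under stated assumptions: both graphs are given as 0/1
 adjacency matrices A (sensors) and B (grid positions), the quality measure is
 the number of matching edges minus the number of non-matching edges, and the
 function names match_quality(), local_search() and grid_adjacency() are made
 up for this example. A swap-based search like this can stop in a local
 optimum, so it may need several random restarts.

```r
# Hypothetical helper: adjacency matrix of an ideal nr x nc grid
grid_adjacency <- function(nr, nc) {
  xy <- expand.grid(x = seq_len(nc), y = seq_len(nr))
  (as.matrix(dist(xy, method = "manhattan")) == 1) * 1L
}

# perm[i] is the grid position assigned to sensor i
match_quality <- function(perm, A, B) {
  Bp <- B[perm, perm]
  up <- upper.tri(A)
  sum(A[up] == 1 & Bp[up] == 1) - sum(A[up] != Bp[up])
}

# Start from a random mapping and keep swapping two sensors while that improves the quality
local_search <- function(A, B) {
  n <- nrow(A)
  perm <- sample(n)
  best <- match_quality(perm, A, B)
  improved <- TRUE
  while (improved) {
    improved <- FALSE
    for (i in seq_len(n - 1)) {
      for (j in (i + 1):n) {
        cand <- perm
        cand[c(i, j)] <- perm[c(j, i)]
        q <- match_quality(cand, A, B)
        if (q > best) {
          perm <- cand
          best <- q
          improved <- TRUE
        }
      }
    }
  }
  list(perm = perm, quality = best)
}

# Toy example: a 2 x 3 grid whose positions are relabelled by an unknown permutation
set.seed(42)
B <- grid_adjacency(2, 3)
truth <- sample(nrow(B))
A <- B[truth, truth]
res <- local_search(A, B)
res$quality
```

 The loop terminates because every accepted swap strictly increases a quality
 that is bounded above by the number of edges.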
 
 Do you think that some of these approaches is applicable to your situation? 
 
 Petr. 
 
 __ 
 [hidden email] mailing list

Re: [R] enquiry

Hi Jim,

I tried,

dat2 <- as.Date(dat1, format = "%Y-%V")
> dat2
[1] "1951-07-02" "1952-07-02"

But, if the format is for -wk or -yy, then, not sure how this will help.
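
A minimal sketch of two possible readings of strings such as "1951-52", since the question does not say what the second field means. Note that ?strptime documents %V as accepted but ignored on input, which is why the call above falls back to the current month and day. The variable names and the example vector x are assumptions for illustration only.

```r
x <- c("1951-52", "1952-53")

# Reading 1: the second field is a day of the year (as in Jim Holtman's reply)
as_day_of_year <- as.Date(x[1], format = "%Y-%j")

# Reading 2: the string is a year range such as 1951-1952;
# keep the starting year and derive the ending year from it
start_year <- as.integer(substr(x, 1, 4))
end_year <- start_year + 1L

data.frame(x, start_year, end_year)
```

Which reading is appropriate depends on the data source, so it is worth checking a few values against the original records before converting the whole column.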

A.K.




- Original Message -
From: jim holtman jholt...@gmail.com
To: Karan Anand anand.kara...@gmail.com
Cc: r-help@r-project.org
Sent: Monday, July 2, 2012 9:04 AM
Subject: Re: [R] enquiry

try this:

> as.Date('1951-52', format = "%Y-%j")
[1] "1951-02-21"



On Mon, Jul 2, 2012 at 5:44 AM, Karan Anand anand.kara...@gmail.com wrote:
 hi,
     i am new  to using r .so if you can pls  tell me how  to read 1951-52
 ,1952-52     date format in r

         [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] ggplot: dodge positions


Damn firewalls.
However, as I see the graph, the points are lined up exactly on the centreline 
of each boxplot. So, for example, the lowest outlier on the grp1 boxplot for A 
is exactly where it should be, at the end of the whisker.  Real outliers for 
grp1 for D are exactly above the 'non-existent' whisker for that boxplot.  So 
it looks to me as if my version of the plot is what you want.

Any chance that I can email you directly and get an attachment through?
John Kane
Kingston ON Canada


 -Original Message-
 From: thorn.tha...@rdls.nestle.com
 Sent: Mon, 2 Jul 2012 15:40:48 +0200
 To: jrkrid...@inbox.com, r-help@r-project.org
 Subject: RE: [R] ggplot: dodge positions
 
 Unfortunately I can't see your example as the page is blocked by our
 firewall. Anyways, if I try the dodge code, the points are shifted, yet
 they are all shifted by another offset. It makes that the green points
 for instance are indeed closer to the green boxplot, yet they are not
 aligned meaning that all green plots seem to have a different position on
 the x-axis, while all the green points for x == A should align exactly
 with A. Am I clearer now?
 
 KR,
 
 -Thorn
 
 
 -Original Message-
 From: John Kane [mailto:jrkrid...@inbox.com]
 Sent: Montag, 2. Juli 2012 15:21
 To: Thaler,Thorn,LAUSANNE,Applied Mathematics; r-help@r-project.org
 Subject: RE: [R] ggplot: dodge positions
 
 I don't think I was clear. Sorry.  What I was referring to was the
 
 ggplot(ddf, aes(x, y, colour=grp)) + geom_boxplot() +
 geom_point(aes(ymax=max(y)),
 position = position_dodge(width=.75))
 
 which is giving me http://www.mediafire.com/i/?fdurpq6e6l8cu35 which
 was what I thought you wanted.
 
 I have no idea how the x axis points on the boxplot are  determined.
 It may be relatively clear in the code but I don't really have the
 knowledge to ferret it out.
 
 Sorry that I cannot be of more help.
 
 John Kane
 Kingston ON Canada
 
 
 -Original Message-
 From: thorn.tha...@rdls.nestle.com
 Sent: Mon, 2 Jul 2012 15:10:33 +0200
 To: jrkrid...@inbox.com, r-help@r-project.org
 Subject: RE: [R] ggplot: dodge positions
 
 I guess it works with ggplot but not with ggplot2. I'm using only the
 latter but had a typo in my first post. So the code (which does not
 do
 what I want) is:
 
 library(ggplot2)
 ddf <- data.frame(x=factor(rep(LETTERS[1:4], each=30)), y =
  runif(120,0,10), grp = factor(rep(rep(1:3, 10), 4)))
 ggplot(ddf, aes(x, y, colour=grp)) + geom_boxplot() + geom_point()
 
 Thinking of it, I would need to find out which offset ggplot uses to
 dodge the nested factors. If I knew the exact quantity, I could do
 something like
 
 geom_point(aes(x = offset.used.by.geom_boxplot))
 
 So how are the exact positions on the x-axis for geom_boxplot
 determined?
 Any ideas?
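
 A rough sketch of how the dodged x positions can be computed by hand. It
 assumes that position_dodge() splits the dodge width evenly among the groups
 at each x value and centres every group in its slot, which matches my
 understanding of ggplot2 but is not taken from this thread. The helper name
 dodge_offsets() is hypothetical, and 0.75 is the width used in the
 position_dodge() call quoted in the thread.

```r
# Hypothetical helper: horizontal offsets of n_groups dodged groups
# that share one x position, for a given total dodge width
dodge_offsets <- function(n_groups, width = 0.75) {
  idx <- seq_len(n_groups)
  width * ((idx - 0.5) / n_groups - 0.5)
}

offsets <- dodge_offsets(3)   # three levels of grp
offsets                       # about -0.25, 0, 0.25

# Factor level "A" is drawn at x = 1, so its three groups sit near 0.75, 1, 1.25
1 + offsets
```

 In practice it is usually simpler to pass the same position_dodge(width = ...)
 to both layers than to compute the offsets manually.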
 
 Thanks for the help, anyways.
 
 KR,
 
 -Thorn
 
 
 -Original Message-
 From: John Kane [mailto:jrkrid...@inbox.com]
 Sent: Montag, 2. Juli 2012 15:04
 To: Thaler,Thorn,LAUSANNE,Applied Mathematics; r-help@r-project.org
 Subject: RE: [R] ggplot: dodge positions
 
 Can you expand a bit on what is wrong with the dodge option?  From
 what I see it looks lovely, with the points exactly lined up with the
 boxplots for each group, but perhaps I don't understand exactly what
 you want.
 
 
 
 John Kane
 Kingston ON Canada
 
 
 -Original Message-
 From: thorn.tha...@rdls.nestle.com
 Sent: Mon, 2 Jul 2012 11:43:03 +0200
 To: r-help@r-project.org
 Subject: [R] ggplot: dodge positions
 
 Dear all,
 
 I want to get a series of boxplots (grouped by two factors) and I
 want to
 overlay the original observations and the following code does
 almost
 what
 I want:
 
 library(ggplot)
 ddf <- data.frame(x=factor(rep(LETTERS[1:4], each=30)), y =
 runif(120,0,10), grp = factor(rep(rep(1:3, 10), 4)))
 ggplot(ddf, aes(x, y, colour=grp)) + geom_boxplot() + geom_point()
 
 Yet the position of the points and the position of the boxes on the
 x-axis is not the same. I would like that the points are shifted
 accordingly, such that they line up with the boxplots. I tried
 position_dodge:
 
 ggplot(ddf, aes(x, y, colour=grp)) + geom_boxplot() +
 geom_point(aes(ymax=max(y)), position = position_dodge(width=.75))
 
 but that did not really help, as all points are now dodged and I
 just
 want to have a fixed offset for each subgroup of points such that
 the
 boxplot and the points are aligned. Any ideas?
 
 
 Kind Regards,
 
 Thorn Thaler
 Mathematician
 
 Applied Mathematics
 Nestec Ltd,
 Nestlé Research Center
 PO Box 44
 CH-1000 Lausanne 26
 Phone: +41 21 785 8220
 Fax: +41 21 785 9486
 
 

Re: [R] Decrete value check in a matrix

2012-07-02 Thread Thaler,Thorn,LAUSANNE,Applied Mathematics

Glad it works.  So far we seem to have at least three ways to do it.  R is 
amazing!

John Kane
Kingston ON Canada



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] apply with multiple conditions

2012-07-02 Thread Paul Guilhamon

Thanks for your reply Jean,

I think your interpretation is correct, but when I run your code I end
up with the data frame below, and obviously the bins created there don't
correspond to a chromStart change of 115341 or more:

  chrom chromStart chromEnd         name cumsum bin
1  chr1      10089    10309       ZBTB33  10089   1
2  chr1      10132    10536  TAF7_(SQ-8)  20221   2
3  chr2      10133    10362     Pol2-4H8  30354   3
4  chr2      10148    10418 MafF_(M8194)  40502   4
5  chr2     210382   210578       ZBTB33  50884   5
6  chr2     216132   216352         CTCF  67016   6

The first two rows should have the same bin number (same chrom,
<115341 diff), then rows 3 & 4 should be in another bin (different chrom
from rows 1 & 2, <115341 diff), and rows 5 & 6 in another one (same chrom
but >115341 difference between row 4 and row 5).

It seems the new.bin line of your code isn't quite doing what it
should, but I can't pinpoint the error there...
Paul
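
One possible explanation, not verified against the session in question: if chrom is stored as a factor, c(NA, all.tf7$chrom[-L]) returns the integer level codes rather than the labels, so the comparison with chrom is TRUE on every row and every row starts a new bin. A minimal sketch that sidesteps this by converting to character first is given below; the function name assign_bins() and its arguments are assumptions for illustration. Like the vectorised code under discussion, it compares consecutive rows, which is not identical to the original loop (the loop measures the distance from the first row of the current bin).

```r
# Hypothetical helper: start a new bin whenever chrom changes or
# chromStart jumps by `gap` or more relative to the previous row
assign_bins <- function(chrom, start, gap = 115341) {
  chrom <- as.character(chrom)          # works for both factor and character input
  n <- length(chrom)
  if (n == 0L) return(integer(0))
  chrom_changed <- c(TRUE, chrom[-1] != chrom[-n])
  big_jump <- c(FALSE, diff(start) >= gap)
  cumsum(chrom_changed | big_jump)
}

# Example data from the thread, with chrom deliberately stored as a factor
chrom <- factor(c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"))
chromStart <- c(10089, 10132, 10133, 10148, 210382, 216132)

bins <- assign_bins(chrom, chromStart)
bins
```

The resulting vector can be assigned to all.tf7$bin and then used with split().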


On 2 July 2012 14:19, Jean V Adams jvad...@usgs.gov wrote:
 Paul,

 My interpretation is that you are trying to assign a new bin number to a row
 every time the variable chrom changes and every time the variable chromStart
 changes by 115341 or more.  Is that right?  If so, you don't need a loop at
 all.  Check out the code below.  I made a couple changes to the all.tf7
 example data frame so that it would have two changes in bin number, one
 based on the chrom variable and one based on the chromStart variable.

 Jean

 all.tf7 <- data.frame(
 chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
 chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
 chromEnd = c(10309, 10536, 10362, 10418, 210578, 216352),
 name = c("ZBTB33", "TAF7_(SQ-8)", "Pol2-4H8", "MafF_(M8194)",
 "ZBTB33", "CTCF"),
 cumsum = c(10089, 20221, 30354, 40502, 50884, 67016),
 bin = rep(NA, 6)
 )

 # assign a new bin every time chrom changes and every time chromStart
 # changes by 115341 or more
 L <- nrow(all.tf7)
 prev.chrom <- c(NA, all.tf7$chrom[-L])
 delta.start <- c(NA, all.tf7$chromStart[-1] - all.tf7$chromStart[-L])
 new.bin <- is.na(prev.chrom) | all.tf7$chrom != prev.chrom | delta.start >=
 115341
 all.tf7$bin <- cumsum(new.bin)
 all.tf7


 pguilha paul.guilha...@gmail.com wrote on 07/02/2012 06:25:13 AM:

 Hello all,

 I have written a for loop to act on a dataframe with close to 3million
 rows
 and 6 columns and I would like to pass it to apply() to speed the process
 up
 (I let the loop run for 2 days before stopping it and it had only gone
 through 200,000 rows) but I am really struggling to find a way to pass the
 arguments. Below are the loop and the head of the dataframe I am working
 on.
 Any hints would be much appreciated, thank you! (I have searched for this
 but could not find any other posts doing quite what I want)
 Paul

 x <- as.numeric(all.tf7[1,2])
 for (i in 2:nrow(all.tf7)) {
   if (all.tf7[i,1]==all.tf7[i-1,1] && (all.tf7[i,2]-x)<115341)
 all.tf7[i,6] <- all.tf7[i-1,6]
   else if (all.tf7[i,1]==all.tf7[i-1,1] && (all.tf7[i,2]-x)>=115341) {
 all.tf7[i,6] <- (all.tf7[i-1,6]+1)
 x <- as.numeric(all.tf7[i,2]) }
   else if (all.tf7[i,1]!=all.tf7[i-1,1])  {
 all.tf7[i,6] <- (all.tf7[i-1,6]+1)
 x <- as.numeric(all.tf7[i,2]) }
 }

 #the aim here is to attribute a bin number to each row so that I can then
 split the dataframe according to those bins.


 chrom chromStart chromEnd         name cumsum bin
 chr1       10089    10309       ZBTB33  10089   1
 chr1       10132    10536  TAF7_(SQ-8)  20221   1
 chr1       10133    10362     Pol2-4H8  30354   1
 chr1       10148    10418 MafF_(M8194)  40502   1
 chr1       10382    10578       ZBTB33  50884   1
 chr1       16132    16352         CTCF  67016   1

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] ggplot: dodge positions

Well, that is exactly what I wanted to have. And you were right, it had 
something to do with package versions. So I updated R and the plot looks 
exactly the way I wanted  it. Thanks a lot for your help and time.

KR,

-Thorn


 -Original Message-
 From: John Kane [mailto:jrkrid...@inbox.com]
 Sent: Montag, 2. Juli 2012 15:58
 To: Thaler,Thorn,LAUSANNE,Applied Mathematics
 Subject: RE: [R] ggplot: dodge positions
 
 Let's hope it makes it.  Just in case my version is okay let me give
 you my sessionInfo in case we have some subtle difference in settings
 that is having an effect.  Of course it may be the wrong graph.
 
 sessionInfo()
 R version 2.15.1 (2012-06-22)
 Platform: i686-pc-linux-gnu (32-bit)
 
 locale:
  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_CA.UTF-8LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_CA.UTF-8LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=C LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
 
 attached base packages:
 [1] grid  stats graphics  grDevices utils datasets  methods
 [8] base
 
 other attached packages:
 [1] RColorBrewer_1.0-5 plyr_1.7.1 reshape2_1.2.1
 scales_0.2.1
 [5] ggplot2_0.9.1
 
 loaded via a namespace (and not attached):
  [1] colorspace_1.1-1 dichromat_1.2-4  digest_0.5.2 labeling_0.1
  [5] MASS_7.3-18  memoise_0.1  munsell_0.3  proto_0.3-9.2
  [9] stringr_0.6  tools_2.15.1
 
 John Kane
 Kingston ON Canada
 
 
  -Original Message-
  From: thorn.tha...@rdls.nestle.com
  Sent: Mon, 2 Jul 2012 15:53:50 +0200
  To: jrkrid...@inbox.com
  Subject: RE: [R] ggplot: dodge positions
 
  Yes, that would be nice if you could send the graph directly to me.
  Thanks a billion for your help and your time.
 
  KR,
 
  -Thorn
 
 

Re: [R] Adjusting length of series



On Jul 2, 2012, at 5:13 AM, Lekgatlhamang, lexi Setlhare wrote:


Hi David and AK,
I have been trying to implement your suggestions since yesterday,  
but I encountered some challenges.


As for David's suggestions, I could only implement them after some  
modifications. Using an abridged version of my data, I dput() my  
dataset and then show my steps below.


Well, your initial question (why the $ referencing did not work) is  
now answered. This is not a dataframe but rather a 'ts' classed object  
and there is no `$` method for such objects. They are really matrices  
with some extra attributes.


> ydata$BoBCL1
Error in ydata$BoBCL1 : $ operator is invalid for atomic vectors

As I understood it, you were able to get useful analyses using the  
formula methods for lm on these objects, but were just having  
difficulty with the $ operator. So the answer is ... don't do that.
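
A minimal sketch of the alternatives to `$` for a multivariate time series, using a small made-up 'mts' object (the values and the name ydata_small are invented for illustration and are not the poster's data). Columns of an 'mts' matrix can be selected by name with matrix indexing, and as.data.frame() gives an object on which `$` does work.

```r
# Toy monthly 'mts' object with two named columns (values are made up)
ydata_small <- ts(cbind(DCred1 = c(1, 2, NA, 4), BoBCL1 = c(5, 6, 7, 8)),
                  start = c(2001, 2), frequency = 12)

# `$` fails because the object is an atomic matrix with extra attributes
dollar_fails <- inherits(try(ydata_small$BoBCL1, silent = TRUE), "try-error")

# Matrix-style indexing by column name works and keeps the time series attributes
bob <- ydata_small[, "BoBCL1"]

# Converting to a data frame allows `$` and makes it easy to drop incomplete rows
dframe <- na.omit(as.data.frame(ydata_small))
dframe$BoBCL1
```

Dropping rows with na.omit() discards the time index, so this route suits regression on aligned columns rather than further time series work.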

--
David.




dput(ydata)

structure(c(68.10004, -34.80002, 90.39996,
54.60004, -172.3, 51.80002, 175, 79.80002,
-35.70007, 130.5, 116.8, -67.5, 164.5, 514.8, -326.1,
98.40005, 160.2, 53.19998, 283.6, -111.6, 127.8,
-17.30002, 286.3, NA, NA, -102.9001, 125.2,  
-35.79993,

-226.9001, 224.1, 123.2, -95.19998, -115.5001,
166.2001, -13.69998, -184.3, 232, 350.3,  
-840.9001,
424.5001, 61.79993, -107, 230.4001,  
-395.2001,

239.4001, -145.1, 303.6, NA, NA, NA, 228.1, -160.,
-191.1001, 451.0001, -100.9001, -218.4,
-20.30011, 281.7002, -179.9001, -170.6,
416.3, 118.3, -1191.2, 1265.4, -362.7002, -168.7999,
337.4001, -625.6001, 634.6001,  
-384.5001,

448.7001, NA, NA, -164.45784099, 17.079353995,
95.976788009, 680.23816699, -491.34869099, -274.694009,
-256.332907, 469.62296, -146.431891, -41.077201995, -106.970104,
757.68826399, -1689.214533, 2320.098952, -1446.97942, 516.384521,
-375.27765099, 293.86702999, 417.845195, 278.198807,
-968.59203399, -314.195986, NA, NA, NA, 181.53719499,
78.897434013, 584.26137898, -1171.586858, 216.65468199,
18.361101998, 725.955867, -616.054851, 105.35468901,
-65.892902005, 864.65836799, -2446.902797, 4009.313485,
-3767.078372, 1963.363941, -891.66217199, 669.14468099,
123.978165, -139.646388, -1246.790841, 654.396048, NA, 4937,
5005.1, 4970.3, 5060.7, 5115.3, 4943, 4994.8, 5169.8, 5249.6,
5213.9, 5344.4, 5461.2, 5393.7, 5558.2, 6073, 5746.9, 5845.3,
6005.5, 6058.7, 6342.3, 6230.7, 6358.5, 6341.2, 6627.5, 4187.5,
4296.004835, 4240.051829, 4201.178177, 4258.281313, 4995.622616,
5241.615228, 5212.913831, 4927.879527, 5112.468183, 5150.624948,
5147.704511, 5037.81397, 5685.611693, 4644.194883, 5922.877025,
5754.579747, 6102.66699, 6075.476582, 6342.153204, 7026.675021,
7989.395645, 7983.524235, 7663.456839), .Dim = c(24L, 7L), .Dimnames = list(
NULL, c("DCred1", "DCred2", "DCred3", "DBoBC2", "DBoBC3",
"CredL1", "BoBCL1")), .Tsp = c(2001.083, 2003, 12
), class = c("mts", "ts"))

NB: the NAs in the dataset emanated from lagging or differencing the  
series


David's suggestion
> df <- data.frame(DCred1,DCred2,DCred3,DBoBC2,DBoBC3,CredL1,BoBCL1)
Error in data.frame(DCred1, DCred2, DCred3, DBoBC2, DBoBC3, CredL1, BoBCL1) :
  arguments imply differing number of rows: 23, 22, 21, 24

So I modified as follows:
> length(DCred3)  # finding the minimum length of various series
[1] 21

# Then dataframe construction
> dframe <- data.frame(Dcre1=DCred1[1:21],Dcre2=DCred2[1:21],Dcre3=DCred3[1:21],
+ Dbobc2=DBoBC2[1:21],Dbobc3=DBoBC3[1:21],CredL=CredL1[1:21],BoBCL=BoBCL1[1:21])

# Then estimated regression
> regCred <- lm(Dcre1~Dcre2+Dcre3+Dbobc2+Dbobc3+CredL+BoBCL, data=dframe)

summary(regCred)

# Worked well as shown by results below
Call:
lm(formula = Dcre1 ~ Dcre2 + Dcre3 + Dbobc2 + Dbobc3 + CredL +
    BoBCL, data = dframe)

Residuals:
    Min      1Q  Median      3Q     Max
-69.516 -27.695  -8.085  13.851 107.276

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 159.32304  157.15209   1.014 0.327873
Dcre2        -0.75527    0.17262  -4.375 0.000634 ***
Dcre3        -0.21006    0.08656  -2.427 0.029329 *
Dbobc2        0.05111    0.06565   0.779 0.449197
Dbobc3        0.03106    0.03510   0.885 0.391108
CredL        -0.10967    0.04933  -2.223 0.043177 *
BoBCL         0.09756    0.03097   3.150 0.007087 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 52.3 on 14 degrees of freedom
Multiple R-squared: 0.9331, Adjusted R-squared: 0.9044
F-statistic: 32.55 on 6 and 14 DF,  p-value: 1.911e-07

This is good, but couldn't I code the process for my 15 variable  
model?

Perhaps that is where the use of
Dcr <- lapply(..., function(x) ...)
comes in?
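
A minimal sketch of how lapply() could automate the truncation for many series, assuming the series are collected in a named list and that cutting each one to the first n_min values is the intended alignment (as in the manual [1:21] indexing above). The list series, its simulated contents and the names y, x1, x2 are invented for illustration.

```r
set.seed(1)

# Simulated stand-ins for series of unequal length (not the poster's data)
series <- list(y = rnorm(24), x1 = rnorm(23), x2 = rnorm(21))

# Shortest length across all series
n_min <- min(sapply(series, length))

# Truncate every series to the common length and bind into a data frame
dframe <- as.data.frame(lapply(series, function(s) s[seq_len(n_min)]))

# `y ~ .` regresses y on all remaining columns, so the formula
# does not have to be typed out for a 15-variable model
fit <- lm(y ~ ., data = dframe)
coef(fit)
```

Whether the first n_min observations are the right ones to keep depends on where the lagging and differencing introduced the missing values, so the alignment should be checked before relying on the fit.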

AK, if you spare some minutes,

Re: [R] R sub query



On Jul 2, 2012, at 4:15 AM, Sarah Auburn wrote:


Hello,
I would like to substitute a substring of characters defined by a  
specific start and end sequence.
i.e. in the example matrix below, I would like to substitute ".:X:"  
with "", where X varies in sequence...


m <- matrix(c(".:0:0,0", ".:2:0,2", ".:194:193,1", ".:56:0,56",
".:58:50,8", ".:13:0,13", ".:114:114,0", ".:75:75,0"), nrow=2)


sub("\\..+\\:", "", m)
     [,1]  [,2]    [,3]   [,4]
[1,] "0,0" "193,1" "50,8" "114,0"
[2,] "0,2" "0,56"  "0,13" "75,0"

You should also look at Holtman's since he is better at this than I am  
but I didn't really understand how his version worked. Mine is really  
in three parts. The first entry '\\.' matches the leading dot and it  
could have been '^\\.' to avoid any confusion with decimal points. The  
second entry is '.+' which is anything until the third entry '\\:'  
which ends up matching the last ':' since these are greedy expressions.


You could also have done it with \\.\\:.+\\:

(Now that I look at his again ^\\.:[^:]*: , I find that I can learn  
something from it, as often happens when I read his contributions. To  
my surprise the ':' character does not need to be escaped but can be  
and the interior of his expression '[^:]' is a negative character- 
class. It matches anything other than ':' and the '*' following it  
lets that anything be of any length. And then he didn't need to escape  
the trailing ':'.)
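
The two patterns can be compared side by side. This is a small sketch; the variable names greedy and negated, and the extra test string ".:1:2:3", are made up here for illustration.

```r
m <- matrix(c(".:0:0,0", ".:2:0,2", ".:194:193,1", ".:56:0,56",
              ".:58:50,8", ".:13:0,13", ".:114:114,0", ".:75:75,0"), nrow = 2)

# Greedy version: '.+' runs to the LAST ':' in each string
greedy <- sub("\\..+\\:", "", m)

# Negated character class: '[^:]*' stops at the FIRST ':' after ".:"
negated <- sub("^\\.:[^:]*:", "", m)

identical(greedy, negated)          # TRUE for this matrix

# The two differ as soon as a string has an extra ':' further along
sub("\\..+\\:", "", ".:1:2:3")      # "3"
sub("^\\.:[^:]*:", "", ".:1:2:3")   # "2:3"
```

For the data in the question both give the required output; the negated class is the safer choice if later fields could themselves contain a colon.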


--
David.


output required:
 [,1]  [,2]  [,3][,4]
[1,] 0,0 193,1 50,8 114,0
[2,] 0,2 0,56   0,13 75,0

Thank you for any help
Sarah
[[alternative HTML version deleted]]

--
David Winsemius, MD
West Hartford, CT


Re: [R] Specify model with polynomial interaction terms up to degree n



On Jul 2, 2012, at 9:29 AM, YTP wrote:

I would like to specify a model with all polynomial interaction  
terms between
two variables, say, up to degree 6. For example, terms like a^6 +  
(a^5 *

b^1)  +  (a^4 * b^2) + ... and so on.  The documentation states

The ^ operator indicates crossing to the specified degree.

so I would expect a model specified as y ~ (a+b)^6 to produce these  
terms.
However doing this only returns four slope coefficients, for  
Intercept, a,
b, and a:b.  Does anyone know how to produce the desired result?  
Thanks in

advance.


You might try:

poly(a,6)*poly(b,6)

(untested   ... and it looks somewhat dangerous to me.)
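
A quick way to see what each formula expands to is to inspect its terms. In this sketch the data frame d and its columns are simulated purely for illustration.

```r
# In a formula, '^' means crossing, and a:a collapses to a,
# so (a + b)^6 has no terms beyond a, b and a:b
labs <- attr(terms(y ~ (a + b)^6), "term.labels")
labs                                   # "a" "b" "a:b"

# poly(a, 6) * poly(b, 6) expands to 6 + 6 + 36 slope columns
set.seed(1)
d <- data.frame(y = rnorm(100), a = rnorm(100), b = rnorm(100))
fit <- lm(y ~ poly(a, 6) * poly(b, 6), data = d)
length(coef(fit))                      # 49, including the intercept
```

Note that the product form includes terms such as a^6 * b^6, so the total degree goes up to 12 rather than stopping at 6.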

--
David Winsemius, MD
West Hartford, CT


Re: [R] apply with multiple conditions

2012-07-02 Thread Jean V Adams

Paul,

Are you submitting the exact code that I included in my previous e-mail?
When I submit that code, I get this ...

  chrom chromStart chromEnd         name cumsum bin
1  chr1      10089    10309       ZBTB33  10089   1
2  chr1      10132    10536  TAF7_(SQ-8)  20221   1
3  chr2      10133    10362     Pol2-4H8  30354   2
4  chr2      10148    10418 MafF_(M8194)  40502   2
5  chr2     210382   210578       ZBTB33  50884   3
6  chr2     216132   216352         CTCF  67016   3

Jean


Paul Guilhamon paul.guilha...@gmail.com wrote on 07/02/2012 08:59:00 AM:

 Thanks for your reply Jean,
 
 I think your interpretation is correct but when I run your code I end
 up with the below dataframe and obviously the bins created there don't
 correspond to a chromStart change of 115341:
 
   chrom chromStart chromEnd         name cumsum bin
 1  chr1      10089    10309       ZBTB33  10089   1
 2  chr1      10132    10536  TAF7_(SQ-8)  20221   2
 3  chr2      10133    10362     Pol2-4H8  30354   3
 4  chr2      10148    10418 MafF_(M8194)  40502   4
 5  chr2     210382   210578       ZBTB33  50884   5
 6  chr2     216132   216352         CTCF  67016   6
 
 the first two rows should have the same bin number (same chrom,
 <115341 diff), then rows 3&4 should be in another bin (different chrom
 from rows 1&2, <115341 diff), and rows 5&6 in another one (same chrom
 but >115341 difference between row 4 and row 5).
 
 it seems the new.bin line of your code isn't quite doing what it
 should but I can't pinpoint the error there...
 Paul
 
 
 On 2 July 2012 14:19, Jean V Adams jvad...@usgs.gov wrote:
  Paul,
 
  My interpretation is that you are trying to assign a new bin number to a
  row every time the variable chrom changes and every time the variable
  chromStart changes by 115341 or more.  Is that right?  If so, you don't
  need a loop at all.  Check out the code below.  I made a couple of changes
  to the all.tf7 example data frame so that it would have two changes in bin
  number, one based on the chrom variable and one based on the chromStart
  variable.
 
  Jean
 
  all.tf7 <- data.frame(
  chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
  chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
  chromEnd = c(10309, 10536, 10362, 10418, 210578, 216352),
  name = c("ZBTB33", "TAF7_(SQ-8)", "Pol2-4H8", "MafF_(M8194)",
  "ZBTB33", "CTCF"),
  cumsum = c(10089, 20221, 30354, 40502, 50884, 67016),
  bin = rep(NA, 6)
  )
 
  # assign a new bin every time chrom changes and every time chromStart
  # changes by 115341 or more
  L <- nrow(all.tf7)
  prev.chrom <- c(NA, all.tf7$chrom[-L])
  delta.start <- c(NA, all.tf7$chromStart[-1] - all.tf7$chromStart[-L])
  new.bin <- is.na(prev.chrom) | all.tf7$chrom != prev.chrom |
  delta.start >= 115341
  all.tf7$bin <- cumsum(new.bin)
  all.tf7
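
The same idea can be wrapped in a function. This is a minimal sketch: assign_bins and its gap argument are names invented here, not part of the thread. It coerces chrom with as.character() first, on the assumption that chrom may be stored as a factor; comparing a factor against its integer codes can mark every row as different, which would give one bin per row.

```r
# Hypothetical helper: new bin when chrom changes or when the start
# position jumps by 'gap' or more relative to the previous row
assign_bins <- function(chrom, start, gap = 115341) {
  chrom <- as.character(chrom)           # guard against factor input
  n <- length(chrom)
  prev.chrom <- c(NA, chrom[-n])
  delta.start <- c(NA, diff(start))
  new.bin <- is.na(prev.chrom) | chrom != prev.chrom | delta.start >= gap
  cumsum(new.bin)
}

chrom <- factor(c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"))
start <- c(10089, 10132, 10133, 10148, 210382, 216132)
bins <- assign_bins(chrom, start)
bins   # 1 1 2 2 3 3
```

Like the vectorised code above, this compares each row with the row just before it, whereas the original loop measures the distance from the first row of the current bin; the two can disagree when positions creep upward in small steps.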
 
 
  pguilha paul.guilha...@gmail.com wrote on 07/02/2012 06:25:13 AM:
 
  Hello all,
 
  I have written a for loop to act on a dataframe with close to 3 million
  rows and 6 columns and I would like to pass it to apply() to speed the
  process up (I let the loop run for 2 days before stopping it and it had
  only gone through 200,000 rows) but I am really struggling to find a way
  to pass the arguments. Below are the loop and the head of the dataframe I
  am working on.
  Any hints would be much appreciated, thank you! (I have searched for this
  but could not find any other posts doing quite what I want)
  Paul
 
  x <- as.numeric(all.tf7[1,2])
  for (i in 2:nrow(all.tf7)) {
    if (all.tf7[i,1]==all.tf7[i-1,1] && (all.tf7[i,2]-x)<115341)
      all.tf7[i,6] <- all.tf7[i-1,6]
    else if (all.tf7[i,1]==all.tf7[i-1,1] && (all.tf7[i,2]-x)>=115341) {
      all.tf7[i,6] <- (all.tf7[i-1,6]+1)
      x <- as.numeric(all.tf7[i,2]) }
    else if (all.tf7[i,1]!=all.tf7[i-1,1])  {
      all.tf7[i,6] <- (all.tf7[i-1,6]+1)
      x <- as.numeric(all.tf7[i,2]) }
  }
 
  #the aim here is to attribute a bin number to each row so that I can then
  #split the dataframe according to those bins.
 
 
  chrom chromStart chromEnd name cumsum bin
  chr1  10089 10309   ZBTB33  10089   1
  chr1  10132 10536  TAF7_(SQ-8)  20221   1
  chr1  10133 10362Pol2-4H8  30354   1
  chr1  10148 10418  MafF_(M8194)  40502   1
  chr1  10382 10578ZBTB33  50884   1
  chr1  16132 16352CTCF  67016   1


Re: [R] interpolation to new points between geo coordinates

2012-07-02 Thread Jon Olav Skoien


Jan,

There are a lot of packages that can help you; the best one depends on 
your needs (with or without prediction uncertainty, format of results, 
different options) and the size of your problem.

CRAN has a Spatial Task View
http://cran.r-project.org/web/views/Spatial.html
with a short description of most packages dealing with spatial data. I 
think the functions you mentioned should be able to solve your problem, 
but I don't have experience with either of them. It is impossible to know 
what you are doing wrong, as you did not post any error messages.
For increasing the resolution of your data, you can also try 
disaggregate or resample in the raster package. gstat, with automap or 
intamap as simpler interfaces, can also be used for geostatistical 
interpolation to the higher-resolution grid, which also gives you a 
prediction uncertainty. In general you should be careful when 
interpolating lat-lon data; consider using spTransform to get 
projected coordinates if you use any of the geostatistical methods.
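
To make the task concrete, here is a base-R sketch of inverse distance weighting. It is a deliberately simple stand-in, not the method used by any of the packages named above; the function idw, its power argument and the target coordinates are made up for illustration, and distances are taken directly in degrees, which is exactly the lat-lon shortcut cautioned against above.

```r
# Hypothetical helper: inverse-distance-weighted average of z at (xo, yo)
idw <- function(x, y, z, xo, yo, power = 2) {
  vapply(seq_along(xo), function(i) {
    d <- sqrt((x - xo[i])^2 + (y - yo[i])^2)
    if (any(d == 0)) return(mean(z[d == 0]))   # exact hit on a data point
    w <- 1 / d^power
    sum(w * z) / sum(w)
  }, numeric(1))
}

grid_1 <- data.frame(lat = c(56.5, 56.6, 56.7, 56.5),
                     lon = c(11.1, 11.1, 11.12, 11.2),
                     value = c(53, 53.1, 52.1, 52.9))
# Illustrative target points (not the poster's grid_2)
grid_2 <- data.frame(lat = c(56.52, 56.53, 56.54),
                     lon = c(11.11, 11.115, 11.12))

grid_2$value <- idw(grid_1$lon, grid_1$lat, grid_1$value,
                    grid_2$lon, grid_2$lat)
grid_2
```

Unlike kriging in gstat, this gives no prediction uncertainty, so it is only useful as a first look.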


For spatial questions you will generally get a quicker response from the 
r-sig-geo mailing list.


Best wishes,
Jon


On 02-Jul-12 10:47, Jan Näs wrote:

Hi

I have a data set with geo coordinates and values for each coordinate.
I want to interpolate the values to new positions on a finer grid,
also geo coordinates.
I have looked at the fields package (interp.surface) and the akima
package (interp) but can't quite figure out what I am doing wrong, or if
these functions suit my needs.

I have two data sets:

grid_1:

   lat  lon  value
1 56.5  11.1  53
2 56.6 11.1 53.1
3 56.7 11.12 52.1
4 56.5 11.2 52.9
...etc.

and a new grid

grid_2
   lat  lon
1 55.52 11.11
2 55.53 11.115
3 55.54 11.12
...etc.


And I want interpolated values for grid_2.
Any ideas?
/Jan




--
Jon Olav Skøien
Joint Research Centre - European Commission
Institute for Environment and Sustainability (IES)
Land Resource Management Unit

Via Fermi 2749, TP 440,  I-21027 Ispra (VA), ITALY

jon.sko...@jrc.ec.europa.eu
Tel:  +39 0332 789206

Disclaimer: Views expressed in this email are those of the individual and do 
not necessarily represent official views of the European Commission.


Re: [R] Specify model with polynomial interaction terms up to degree n



On Jul 2, 2012, at 10:51 AM, David Winsemius wrote:



On Jul 2, 2012, at 9:29 AM, YTP wrote:

I would like to specify a model with all polynomial interaction  
terms between
two variables, say, up to degree 6. For example, terms like a^6 +  
(a^5 *

b^1)  +  (a^4 * b^2) + ... and so on.  The documentation states

The ^ operator indicates crossing to the specified degree.

so I would expect a model specified as y ~ (a+b)^6 to produce these  
terms.
However doing this only returns four slope coefficients, for  
Intercept, a,
b, and a:b.  Does anyone know how to produce the desired result?  
Thanks in

advance.


You might try:

poly(a,6)*poly(b,6)

(untested   ... and it looks somewhat dangerous to me.)


Well, now it's tested and succeeds at least numerically. Also tested

( poly(a,6) +poly(b,6) )^2 with identical results.

Whether this is wise practice remains in doubt:

dfrm <- data.frame(out=rnorm(100), a=rnorm(100), b=rnorm(100) )
anova(lm( out ~ (poly(a,6) +poly(b,6) )^2, data=dfrm) )
#---
Analysis of Variance Table

Response: out
                      Df Sum Sq Mean Sq F value  Pr(>F)
poly(a, 6)             6 12.409 2.06810  3.0754 0.01202 *
poly(b, 6)             6  5.321 0.88675  1.3187 0.26596
poly(a, 6):poly(b, 6) 36 41.091 1.14142  1.6974 0.04069 *
Residuals             51 34.295 0.67246
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

--

David Winsemius, MD
West Hartford, CT


[R] using na.locf from package zoo to fill NA gaps

2012-07-02 Thread jeff6868

Hi everybody,

I have a small question about the function na.locf from the package zoo.
I saw in the help that this function is able to fill NA gaps with the last
value before the NA gap (or with the next value).
But is it possible to fill my NA gaps according to the last AND the next
value at the same time?
Actually, I want R to fill my gaps with the method of na.locf only if the
last value before the gap and the next value after the gap are identical.
Here's an example: imagine this small DF:

df <- data.frame(x1=c(1:3,NA,NA,NA,6:9))

In this case, the last value before the NAs (3) and the next value after
them (6) are different, so I don't want R to fill this gap.

But if I have a DF like this:

df2 <- data.frame(x2=c(1:3,NA,NA,NA,3:6))

The last and next values (3) are identical, so in this case I want R to
fill my gap with 3, as the na.locf function would do:
na.locf(df2)

But as you understood, I want to do this only if the last and next values
are identical. If they're not, I want to keep my NA gap.

Do you have any idea how I can do this (maybe something to add to na.locf,
or maybe another, better function)?

Thank you very much!



Re: [R] using na.locf from package zoo to fill NA gaps

2012-07-02 Thread Gabor Grothendieck

On Mon, Jul 2, 2012 at 11:17 AM, jeff6868
geoffrey_kl...@etu.u-bourgogne.fr wrote:
 Hi everybody,

 I have a small question about the function na.locf from the package zoo.
 I saw in the help that this function is able to fill NA gaps with the last
 value before the NA gap (or with the next value).
 But it is possible to fill my NA gaps according to the last AND the next
 value at the same time?
 Actually, I want R to fill my gaps with the method of na.locf only if the
 last value before the gap and the next value after the gap are identical.
 Here's an example: imagine this small DF:

 df <- data.frame(x1=c(1:3,NA,NA,NA,6:9))

 In this case, the last value before NA (3) and the next value after NA
 (6) are different, so I don't want him to fill this gap.

 But if I have a DF like this:

 df2 <- data.frame(x2=c(1:3,NA,NA,NA,3:6))

 The last and next value (3) are identical, so in this case I want him to
 fill my gap with 3 as would do the na.locf function:
 na.locf(df2)

 But as you understood, I want to do this only if last and next value are
 identical. If they're not, I want to keep my NA gap.

 Have you any idea how I can do this (maybe something to add to na.locf or
 maybe another better function to do this)?


Try doing it forwards and backwards and only replacing if they are the same:

library(zoo)

na.locf.ifeq <- function(x) {
ix <- na.locf(x) == na.locf(x, fromLast = TRUE) & is.na(x)
replace(x, ix, na.locf(x)[ix])
}

# test 1
x1 <- c(1, 2, 3, NA, NA, NA, 6, 7, 8, 9)
na.locf.ifeq(x1)

# test 2
x2 <- c(1, 2, 3, NA, NA, NA, 3, 4, 5, 6)
na.locf.ifeq(x2)
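
For readers without zoo installed, the same forwards-and-backwards idea can be sketched in base R. The helpers locf and fill_if_equal are names invented here; they are not zoo functions and make no claim to match na.locf in every corner case. The sketch also leaves leading and trailing NA runs untouched, on the assumption that a gap with only one neighbour should not be filled.

```r
# Hypothetical base-R helper: carry the last non-NA value forward
locf <- function(x, fromLast = FALSE) {
  if (fromLast) return(rev(locf(rev(x))))
  idx <- cummax(seq_along(x) * !is.na(x))   # position of last non-NA so far, 0 if none
  c(NA, x)[idx + 1]
}

# Fill an NA only when the values on both sides of its gap agree
fill_if_equal <- function(x) {
  fwd <- locf(x)
  bwd <- locf(x, fromLast = TRUE)
  ok <- is.na(x) & !is.na(fwd) & !is.na(bwd) & fwd == bwd
  x[ok] <- fwd[ok]
  x
}

fill_if_equal(c(1, 2, 3, NA, NA, NA, 6, 7, 8, 9))   # gap kept: 3 != 6
fill_if_equal(c(1, 2, 3, NA, NA, NA, 3, 4, 5, 6))   # gap filled with 3
fill_if_equal(c(NA, 2, NA, 2, NA))                  # only the middle NA is filled
```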


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: [R] Heat Maps

2012-07-02 Thread David L Carlson

Something like this?

 image(x, y, outer(x, y, u), breaks=c(0, a), col=heat.colors(3))
 contour(x, y, outer(x, y, u),levels=a, col="blue", add=TRUE)

--
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Akhil dua
 Sent: Monday, July 02, 2012 2:26 AM
 To: Joseph Clark
 Cc: r-help@r-project.org
 Subject: Re: [R] Heat Maps
 
 Thanks Joseph
 
 
 
 but see i am not able to get heat maps with this code \
 can u please give me the full codes to generate heat map on the same
 graph
 where i have drawn contour lines
 
   [[alternative HTML version deleted]]
 


[R] Fit circle with R

2012-07-02 Thread gianni lavaredo

Dear Researchers,

I wrote two functions to fit a circle to noisy data.

1- fitCircle() is derived from the MATLAB code of *Izhak Bucher* at the
link
http://www.mathworks.com/matlabcentral/fileexchange/5557-circle-fit/content/circfit.m
2- CircleFitByPratt() is derived from the MATLAB code of *Nikolai Chernov* at the
link
http://www.mathworks.com/matlabcentral/fileexchange/22643-circle-fit-pratt-method/content/CircleFitByPratt.m,
based on:
*V. Pratt, Direct least-squares fitting of algebraic surfaces, Computer
Graphics, Vol. 21, pages 145-152 (1987)*

I am looking for new methods to compare and improve my analysis, because
the error increases as the number of points used in the functions decreases.

Thanks for all suggestions
Gianni


Here are the functions with an example

# fitCircle, returns:
# xf,yf = centre of the fitted circle
# Rf = radius of the fitted circle
# Cf = circumference of the fitted circle
# Af = Area of the fitted circle

fitCircle <- function(x,y){
 # least-squares solution of x^2 + y^2 + a1*x + a2*y + a3 = 0
 a = qr.solve(cbind(x,y,rep(1,length(x))),cbind(-(x^2+y^2)))
 xf = -.5*a[1]
 yf = -.5*a[2]
 Rf  =  sqrt((a[1]^2+a[2]^2)/4-a[3])
Cf = 2*pi*Rf
Af = pi*(Rf^2)
m <- cbind(xf,yf,Rf,Cf,Af)
return(m)}

# CircleFitByPratt, returns:
#   [,1] and [,2]  = centre of the fitted circle
# [,3] = radius of fitted circle


CircleFitByPratt <- function(x,y){

n <- length(x)
centroid <- cbind(mean(x),mean(y))

Mxx=0; Myy=0; Mxy=0; Mxz=0; Myz=0; Mzz=0;

for(i in 1:n){
Xi <- x[[i]] - centroid[1]
Yi <- y[[i]] - centroid[2]
Zi <- (Xi*Xi) + (Yi*Yi)
Mxy = Mxy + Xi*Yi;
Mxx = Mxx + Xi*Xi;
Myy = Myy + Yi*Yi;
Mxz = Mxz + Xi*Zi;
Myz = Myz + Yi*Zi;
Mzz = Mzz + Zi*Zi;
}

Mxx = Mxx/n
Myy = Myy/n
Mxy = Mxy/n
Mxz = Mxz/n
Myz = Myz/n
Mzz = Mzz/n

# computing the coefficients of the characteristic polynomial
Mz = Mxx + Myy;
Cov_xy = Mxx*Myy - Mxy*Mxy;
Mxz2 = Mxz*Mxz;
Myz2 = Myz*Myz;

A2 = 4*Cov_xy - 3*Mz*Mz - Mzz;
A1 = Mzz*Mz + 4*Cov_xy*Mz - Mxz2 - Myz2 - Mz*Mz*Mz;
A0 = Mxz2*Myy + Myz2*Mxx - Mzz*Cov_xy - 2*Mxz*Myz*Mxy + Mz*Mz*Cov_xy;
A22 = A2 + A2;

# Newton's method starting at x=0
epsilon=1e-12;
ynew=1e+20;
IterMax=20;
xnew = 0;
iter=1:IterMax

for (i in 1:IterMax){
yold = ynew;
ynew = A0 + xnew*(A1 + xnew*(A2 + 4.*xnew*xnew));
if (abs(ynew) > abs(yold)){
print('Newton-Pratt goes wrong direction: |ynew| > |yold|')
xnew = 0;
break
}
Dy = A1 + xnew*(A22 + 16*xnew*xnew);
xold = xnew;
xnew = xold - ynew/Dy;
if (abs((xnew-xold)/xnew) < epsilon) {break}
if(iter[[i]] >= IterMax){
print('Newton-Pratt will not converge');
xnew = 0;
}
if(xnew < 0.){
# print() takes a single object, so build the message with paste()
print(paste('Newton-Pratt negative root:  x=',xnew));
}
}

#computing the circle parameters

DET = xnew*xnew - xnew*Mz + Cov_xy;
Center = cbind(Mxz*(Myy-xnew)-Myz*Mxy , Myz*(Mxx-xnew)-Mxz*Mxy)/DET/2;

# the radius uses both centre offsets (Center*Center' in the MATLAB
# original), not Center[2] alone
Par = cbind(Center+centroid , sqrt(sum(Center^2)+Mz+2*xnew));
return(Par)
}

#EXAMPLE
library(plotrix)

# Create a Circle of radius=10, centre=5,5
R = 10; x_c = 5; y_c = 5;
thetas = seq(0,pi,(pi/64));
xs = x_c + R*cos(thetas)
ys = y_c + R*sin(thetas)
# Now add some random noise
mult = 0.5;
xs = xs+mult*rnorm(length(xs));
ys = ys+mult*rnorm(length(ys));
plot(xs,ys,pch=19,cex=0.5,col="red",xlim=c(-10,20),ylim=c(-10,20),asp=1)
# real circle
draw.circle(x_c,y_c,radius=10,border="black")
points(x_c,y_c,pch=4,col="black")

CPrat <- CircleFitByPratt(xs,ys)
draw.circle(CPrat[1],CPrat[2],radius=CPrat[3],border="blue")
points(CPrat[1],CPrat[2],pch=4,col="blue")

MyC <- fitCircle(xs,ys)
draw.circle(MyC[1],MyC[2],radius=MyC[3],border="green")
points(MyC[1],MyC[2],pch=4,col="green")

# Select fewer points
points(xs[20:49],ys[20:49])


MyC1 <- fitCircle(xs[20:49],ys[20:49])
draw.circle(MyC1[1],MyC1[2],radius=MyC1[3],border="blue",lty=2,lwd=2)

CPrat1 <- CircleFitByPratt(xs[20:49],ys[20:49])
draw.circle(CPrat1[1],CPrat1[2],radius=CPrat1[3],border="green",lty=2,lwd=2)
points(CPrat[1],CPrat[2],pch=4,col="red")
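
As a sanity check on the algebraic fit, a noise-free arc should be recovered essentially exactly, however short it is. The sketch below restates the least-squares step on its own; fit_circle_lsq is a name made up here, and the 45-degree arc is an arbitrary choice for illustration.

```r
# Solve x^2 + y^2 + a1*x + a2*y + a3 = 0 in the least-squares sense;
# the centre is (-a1/2, -a2/2) and the radius sqrt((a1^2 + a2^2)/4 - a3)
fit_circle_lsq <- function(x, y) {
  a <- qr.solve(cbind(x, y, 1), -(x^2 + y^2))
  c(xc = -a[[1]] / 2,
    yc = -a[[2]] / 2,
    r  = sqrt((a[[1]]^2 + a[[2]]^2) / 4 - a[[3]]))
}

# Ten exact points on a short arc of the circle with centre (5, 5), radius 10
theta <- seq(0, pi / 4, length.out = 10)
fit <- fit_circle_lsq(5 + 10 * cos(theta), 5 + 10 * sin(theta))
fit
```

If this check passes but the noisy short-arc fits drift, that points to noise sensitivity on short arcs rather than to a coding error in the least-squares step.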

[[alternative HTML version deleted]]


[R] Undocumented behavior around daylight savings time?

2012-07-02 Thread Samuel Brown

Apologies for the intrusion. I am a lurker on list.

I have been working to convert a digitized signal from a matlab file into R
for analysis and other applications. R.matlab is working fine, and it is
easy to convert the matlab date-time number (days since year 0) into R
date-time numbers (seconds since 1970-01-01).

Unfortunately, when I cast the R date-time number into POSIXct format it
seems to adjust silently by one hour to reflect daylight savings time, but
I have been unable to suppress that behavior. (The problem is that the
matlab date is already in MDT, and I don't want to have to write my own
code to suppress the added hour only when local DST rules apply.)

To use the example of the R date-time number 1340717324, this is the
behavior I observe:

> as.POSIXct(1340717324,origin='1970-01-01')

[1] "2012-06-26 14:28:44 MDT"

> as.POSIXct(1340717324,origin='1970-01-01',tz='')

[1] "2012-06-26 14:28:44 MDT"

> as.POSIXct(1340717324,origin='1970-01-01',tz='America/Denver')

[1] "2012-06-26 14:28:44 MDT"

> as.POSIXct(1340717324,origin='1970-01-01',tz='MST')

[1] "2012-06-26 13:28:44 MST"

> as.POSIXct(1340717324,origin='1970-01-01',tz='UTC')

[1] "2012-06-26 13:28:44 UTC"

I was ultimately able to solve the problem by casting into and out of a
character string, but that seems risky/error prone.

> rdates='%Y-%m-%d %H:%M:%S'

> as.POSIXct(strptime(as.character(as.POSIXct(1340717324,origin='1970-01-01',tz='UTC')),format=rdates))
[1] "2012-06-26 13:28:44 MDT"

I have read the various help entries and even investigated the lubridate
package, but none indicate why exactly the extra hour is being added or how
to suppress it. (Note that I tried as.POSIXlt with various settings of
isdst, none of which worked).

I'm using R version 2.13.1 (2011-07-08) on x86_64-apple-darwin9.8.0.



-- 


Samuel Brown

[[alternative HTML version deleted]]


Re: [R] using na.locf from package zoo to fill NA gaps

2012-07-02 Thread jeff6868

Seems to work very well!
Thank you very much Gabor!


[R] save conditions in a list

2012-07-02 Thread Christof Kluß

Hi

how would you save conditions like

a = day  100; b = val  50; c = year == 2012

in a list? I like to have variables like day, val, year and a list
of conditions list(a,b,c). Then I want to check if a & b & c is true or
if a | b | c is true or similar things.

Greetings
Christof
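
One possible approach is to store each condition unevaluated with quote() and evaluate it later against whatever values are current. In this sketch the comparison operators (>, <) are assumptions, since they are not legible in the question, and check() is a helper name made up here.

```r
# Conditions stored as unevaluated calls (operators assumed for illustration)
conds <- list(a = quote(day > 100),
              b = quote(val < 50),
              c = quote(year == 2012))

# Hypothetical helper: evaluate every condition in 'data', then combine
check <- function(conds, data, combine = `&`) {
  Reduce(combine, lapply(conds, eval, envir = data))
}

d <- list(day = 150, val = 20, year = 2011)
check(conds, d)        # FALSE: a and b hold, c does not
check(conds, d, `|`)   # TRUE: at least one condition holds
```

Because the conditions are only evaluated inside check(), the same list can be reused after day, val or year change.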


Re: [R] Specify model with polynomial interaction terms up to degree n

2012-07-02 Thread Bert Gunter

Inline below.

-- Bert

On Mon, Jul 2, 2012 at 8:04 AM, David Winsemius dwinsem...@comcast.net wrote:


 On Jul 2, 2012, at 10:51 AM, David Winsemius wrote:


 On Jul 2, 2012, at 9:29 AM, YTP wrote:

  I would like to specify a model with all polynomial interaction terms
 between
 two variables, say, up to degree 6. For example, terms like a^6 + (a^5 *
 b^1)  +  (a^4 * b^2) + ... and so on.  The documentation states

 The ^ operator indicates crossing to the specified degree.

 so I would expect a model specified as y ~ (a+b)^6 to produce these
 terms.
 However doing this only returns four slope coefficients, for Intercept,
 a,
 b, and a:b.  Does anyone know how to produce the desired result? Thanks
 in
 advance.


 You might try:

 poly(a,6)*poly(b,6)

 (untested   ... and it looks somewhat dangerous to me.)


 Well, now it's tested and succeeds at least numerically. Also tested

 ( poly(a,6) +poly(b,6) )^2 with identical results.

 Whether this is wise practice remains in doubt:


No it doesn't. It isn't.

-- Bert


 dfrm <- data.frame(out=rnorm(100), a=rnorm(100), b=rnorm(100) )
 anova(lm( out ~ (poly(a,6) +poly(b,6) )^2, data=dfrm) )
 #---
 Analysis of Variance Table

 Response: out
                       Df Sum Sq Mean Sq F value  Pr(>F)
 poly(a, 6)             6 12.409 2.06810  3.0754 0.01202 *
 poly(b, 6)             6  5.321 0.88675  1.3187 0.26596
 poly(a, 6):poly(b, 6) 36 41.091 1.14142  1.6974 0.04069 *
 Residuals             51 34.295 0.67246
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 --

 David Winsemius, MD
 West Hartford, CT





-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


Re: [R] Specify model with polynomial interaction terms up to degree n


Hello,

Another way is to cbind the vectors 'a' and 'b', but this needs argument 
'raw' set to TRUE.


poly(cbind(a, b), 6, raw=TRUE)

To the OP: is this time series related? With 6 being a lag or test 
(e.g., Tsay, 1986) order? I'm asking this because package nlts has a 
function for this test up to order 5 and it uses poly().
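
A short sketch of what the cbind form produces, using simulated a and b (made up here for illustration). With a two-column matrix, poly() keeps only the terms whose total degree is at most 6, which is the set of terms the original question lists.

```r
set.seed(42)
a <- rnorm(100)
b <- rnorm(100)

P <- poly(cbind(a, b), degree = 6, raw = TRUE)

ncol(P)             # 27 terms a^i * b^j with 1 <= i + j <= 6
head(colnames(P))   # names give the exponents as "i.j"

# Column "3.2" holds a^3 * b^2
all.equal(as.numeric(P[, "3.2"]), a^3 * b^2)
```

By contrast, poly(a,6)*poly(b,6) in a formula gives 48 slope terms and goes up to a^6 * b^6.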


Hope this helps,

Rui Barradas

Em 02-07-2012 16:04, David Winsemius escreveu:


On Jul 2, 2012, at 10:51 AM, David Winsemius wrote:



On Jul 2, 2012, at 9:29 AM, YTP wrote:


I would like to specify a model with all polynomial interaction terms
between
two variables, say, up to degree 6. For example, terms like a^6 + (a^5 *
b^1)  +  (a^4 * b^2) + ... and so on.  The documentation states

The ^ operator indicates crossing to the specified degree.

so I would expect a model specified as y ~ (a+b)^6 to produce these
terms.
However doing this only returns four slope coefficients, for
Intercept, a,
b, and a:b.  Does anyone know how to produce the desired result?
Thanks in
advance.


You might try:

poly(a,6)*poly(b,6)

(untested   ... and it looks somewhat dangerous to me.)


Well, now it's tested and succeeds at least numerically. Also tested

( poly(a,6) +poly(b,6) )^2 with identical results.

Whether this is wise practice remains in doubt:

dfrm <- data.frame(out=rnorm(100), a=rnorm(100), b=rnorm(100) )
anova(lm( out ~ (poly(a,6) +poly(b,6) )^2, data=dfrm) )
#---
Analysis of Variance Table

Response: out
                      Df Sum Sq Mean Sq F value  Pr(>F)
poly(a, 6)             6 12.409 2.06810  3.0754 0.01202 *
poly(b, 6)             6  5.321 0.88675  1.3187 0.26596
poly(a, 6):poly(b, 6) 36 41.091 1.14142  1.6974 0.04069 *
Residuals             51 34.295 0.67246
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1




Re: [R] Undocumented behavior around daylight savings time?

2012-07-02 Thread Jeff Newmiller

Set your default local timezone (at least while converting to POSIXt types):

Sys.setenv(TZ="Etc/GMT+7")
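
A sketch of the underlying distinction, assuming the number really is a count of seconds since 1970-01-01: an epoch value is a single instant, and tz only changes how that instant is displayed. To attach a different zone to the same wall-clock reading, the text has to be re-parsed in that zone, which is the string round trip from the question written out step by step. The variable names are made up here; the zone name America/Denver is taken from the thread.

```r
x <- 1340717324

# Same instant, two displays
t_utc <- as.POSIXct(x, origin = "1970-01-01", tz = "UTC")
format(t_utc, tz = "UTC", usetz = TRUE)              # "2012-06-26 13:28:44 UTC"
format(t_utc, tz = "America/Denver", usetz = TRUE)   # "2012-06-26 07:28:44 MDT"

# Keep the wall-clock reading 13:28:44 but label it as Denver time:
# this is a different instant, six hours later in UTC terms
wall <- format(t_utc, "%Y-%m-%d %H:%M:%S", tz = "UTC")
t_local <- as.POSIXct(wall, tz = "America/Denver")
as.numeric(t_local) - x                              # 21600 seconds
```

Passing tz explicitly in every call avoids depending on the session's TZ setting, which is the thing Sys.setenv() changes globally.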
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

Samuel Brown samuelbr...@gmail.com wrote:

Apologies for the intrusion. I am a lurker on list.

I have been working to convert a digitized signal from a matlab file
into R
for analysis and other applications. R.matlab is working fine, and it
is
easy to convert the matlab date-time number (days since year 0) into R
date-time numbers (seconds since 1970-01-01).

Unfortunately, when I cast the R date-time number into POSIXct format
it
seems to adjust silently by one hour to reflect daylight savings time,
but
I have been unable to suppress that behavior. (The problem is that the
matlab date is already in MDT, and I don't want to have to write my own
code to suppress the added hour only when local DST rules apply.)

To use the example of the R date-time number 1340717324, this is the
behavior I observe:

 as.POSIXct(1340717324,origin='1970-01-01')

[1] 2012-06-26 14:28:44 MDT

 as.POSIXct(1340717324,origin='1970-01-01',tz='')

[1] 2012-06-26 14:28:44 MDT

 as.POSIXct(1340717324,origin='1970-01-01',tz='America/Denver')

[1] 2012-06-26 14:28:44 MDT

 as.POSIXct(1340717324,origin='1970-01-01',tz='MST')

[1] 2012-06-26 13:28:44 MST

 as.POSIXct(1340717324,origin='1970-01-01',tz='UTC')

[1] 2012-06-26 13:28:44 UTC
I was ultimately able to solve the problem by casting into and out of a
character string, but that seems risky/error prone.

rdates='%Y-%m-%d %H:%M:%S'

as.POSIXct(strptime(as.character(as.POSIXct(1340717324,origin='1970-01-01',tz='UTC')),format=rdates))
[1] 2012-06-26 13:28:44 MDT

I have read the various help entries and even investigated the
lubridates
package, but none indicate why exactly the extra hour is being added or
how
to suppress it. (Note that I tried as.POSIXlt with various settings of
isdst, none of which worked).

I'm using R version 2.13.1 (2011-07-08) on
x86_64-apple-darwin9.8.0.



-- 


Samuel Brown

   [[alternative HTML version deleted]]


Re: [R] Insert row in specific location between data frames

2012-07-02 Thread pigpigmeow

I have already followed your steps, but it still does not work.
When I merge groupA and groupB, this error message is shown:
Error in rbind(deparse.level, ...) : replacement has length zero


[R] Specifying Transfer Function in Time series Intervention model

2012-07-02 Thread Subhadip Nath

Hi Team, 

I am running ARIMAX with the TSA package. My code is
fit2 <- arimax(yseries, order = c(1,0,1), xtransf =
data.frame(X1var), transfer=list(c(1,0)))
My questions are:
1st Q -- If I need to take the difference of X1var, what should I do? What
I am doing is submitting R code such as
X1vard <- diff(X1var)
and then including X1vard in xtransf. Likewise, if I need to take the
difference of yseries:
yseriesd <- diff(yseries)
Here I have to keep the ARIMAX order = c(1,0,1); order = c(1,1,1)
gives an error. I hope this is the correct procedure.

2nd Q -- In transfer = list(c(1,0)), 1 is the autoregressive operator
and 0 is the moving average operator. I am not able to change the value
of the MA operator. I receive the error message
Error in optim(init[mask], armaCSS, method = "BFGS", hessian = FALSE,  : 
  initial value in 'vmmin' is not finite
I am using R version 2.15.1.

3rd Q -- If I need to use lagged values of X1var, how do I incorporate them
in the model?
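
Independently of how arimax() treats its arguments, differencing and lagging by hand both change which observations line up, so it helps to check lengths first. The sketch below uses base R only; the toy vectors and the helper lag_vec are made up for illustration and say nothing about the TSA package itself.

```r
# Toy series (invented values)
y  <- c(10, 12, 15, 14, 18, 21)
x1 <- c(1, 3, 2, 5, 4, 6)

# First differences: each result is one element shorter than its input,
# so difference both series to keep them aligned
dy  <- diff(y)
dx1 <- diff(x1)
length(dy) == length(dx1)   # TRUE

# Hypothetical helper: lag by k steps, padding the start with NA
lag_vec <- function(x, k = 1) c(rep(NA, k), head(x, -k))
x1_lag1 <- lag_vec(x1, 1)
x1_lag1                     # NA 1 3 2 5 4
```

Rows that contain the padding NA would need to be dropped from both the response and the regressors before fitting.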

Warm regards,
Subhadip


--
View this message in context: 
http://r.789695.n4.nabble.com/Specifying-Transfer-Function-in-Time-series-Intervention-model-tp4635133.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Adjusting length of series



Hello,

The class of your data is not data.frame.
Suppose I call your data ydat1:

str(ydat1)

 mts [1:24, 1:7] 68.1 -34.8 90.4 54.6 -172.3 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:7] "DCred1" "DCred2" "DCred3" "DBoBC2" ...
 - attr(*, "tsp")= num [1:3] 2001 2003 12
 - attr(*, "class")= chr [1:2] "mts" "ts"

ydat2<-data.frame(ydat1)

str(ydat2)
'data.frame':    24 obs. of  7 variables:
 $ DCred1: num  68.1 -34.8 90.4 54.6 -172.3 ...
 $ DCred2: num  NA -102.9 125.2 -35.8 -226.9 ...
 $ DCred3: num  NA NA 228 -161 -191 ...
 $ DBoBC2: num  NA -164.5 17.1 96 680.2 ...
 $ DBoBC3: num  NA NA 181.5 78.9 584.3 ...
 $ CredL1: num  4937 5005 4970 5061 5115 ...
 $ BoBCL1: num  4188 4296 4240 4201 4258 ...

#Since you wanted only to do lm for these columns, I guess it doesn't really 
#matter whether you have month and year in the dataset.
 #With NAs
 regCred<-lm(DCred1~DCred2+DCred3+DBoBC2+DBoBC3+CredL1+BoBCL1,data=ydat2)
 summary(regCred)

Call:
lm(formula = DCred1 ~ DCred2 + DCred3 + DBoBC2 + DBoBC3 + CredL1 + 
    BoBCL1, data = ydat2)

Residuals:
        Min          1Q      Median          3Q         Max 
-124.988463  -33.133975    7.971083   23.607953   76.813601 

Coefficients:
                 Estimate    Std. Error  t value   Pr(>|t|)    
(Intercept) -538.61375718  205.91179535 -2.61575   0.020344 *  
DCred2         0.96401908    0.15623660  6.17025 2.4337e-05 ***
DCred3        -0.25720355    0.08983607 -2.86303   0.012524 *  
DBoBC2        -0.11222347    0.07828182 -1.43358   0.173646    
DBoBC3         0.04564621    0.03825169  1.19331   0.252578    
CredL1         0.18499925    0.06565456  2.81777   0.013693 *  
BoBCL1        -0.07682710    0.03406916 -2.25503   0.040666 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 54.44479 on 14 degrees of freedom
  (3 observations deleted due to missingness)
Multiple R-squared: 0.9324472,    Adjusted R-squared: 0.903496 
F-statistic: 32.20757 on 6 and 14 DF,  p-value: 2.046024e-07 
 #Without NAs
 ydat3<-na.omit(ydat2)
 regCred<-lm(DCred1~DCred2+DCred3+DBoBC2+DBoBC3+CredL1+BoBCL1,data=ydat3)
 summary(regCred)

Call:
lm(formula = DCred1 ~ DCred2 + DCred3 + DBoBC2 + DBoBC3 + CredL1 + 
    BoBCL1, data = ydat3)

Residuals:
        Min          1Q      Median          3Q         Max 
-124.988463  -33.133975    7.971083   23.607953   76.813601 

Coefficients:
                 Estimate    Std. Error  t value   Pr(>|t|)    
(Intercept) -538.61375718  205.91179535 -2.61575   0.020344 *  
DCred2         0.96401908    0.15623660  6.17025 2.4337e-05 ***
DCred3        -0.25720355    0.08983607 -2.86303   0.012524 *  
DBoBC2        -0.11222347    0.07828182 -1.43358   0.173646    
DBoBC3         0.04564621    0.03825169  1.19331   0.252578    
CredL1         0.18499925    0.06565456  2.81777   0.013693 *  
BoBCL1        -0.07682710    0.03406916 -2.25503   0.040666 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 54.44479 on 14 degrees of freedom
Multiple R-squared: 0.9324472,    Adjusted R-squared: 0.903496 
F-statistic: 32.20757 on 6 and 14 DF,  p-value: 2.046024e-

#Same result
Not sure what you meant by (This is good, but couldn't I code the process for 
my 15 variable model?)
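
The identical results with and without na.omit() are expected when R's na.action option is at its usual default of "na.omit", because lm() then drops incomplete rows on its own. The sketch below illustrates this on made-up data (the data frame d and all object names are hypothetical). It also shows the y ~ . shorthand, which regresses the response on every other column and may be what is wanted for a model with many variables.

```r
# Toy data with missing values in two predictors (hypothetical example).
set.seed(42)
d <- data.frame(y = rnorm(20), x1 = rnorm(20), x2 = rnorm(20))
d$x1[1]   <- NA
d$x2[1:2] <- NA

# lm() on the raw data versus lm() on the rows kept by na.omit().
fit_default <- lm(y ~ x1 + x2, data = d)
fit_omitted <- lm(y ~ x1 + x2, data = na.omit(d))

# 'y ~ .' uses all remaining columns as predictors, so the formula
# does not have to list every variable by hand.
fit_all <- lm(y ~ ., data = d)

same_fit     <- isTRUE(all.equal(coef(fit_default), coef(fit_omitted)))
same_formula <- isTRUE(all.equal(coef(fit_default), coef(fit_all)))
n_used       <- nobs(fit_default)  # rows actually used in the fit
```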


A.K.


From: Lekgatlhamang, lexi Setlhare lexisetlh...@yahoo.com
To: arun smartpink...@yahoo.com 
Cc: R help r-help@r-project.org 
Sent: Monday, July 2, 2012 5:13 AM
Subject: Re: [R]  Adjusting length of series


Hi David and AK,
I have been trying to implement your suggestions since yesterday, but I 
encountered some challenges.

As for David's suggestion, I could only implement it after some 
modifications. Using an abridged version of my data, I dput my dataset and then 
show my steps below.

 dput(ydata)
structure(c(68.10004, -34.80002, 90.39996, 
54.60004, -172.3, 51.80002, 175, 79.80002, 
-35.70007, 130.5, 116.8, -67.5, 164.5, 514.8, -326.1, 
98.40005, 160.2, 53.19998, 283.6, -111.6, 127.8, 
-17.30002, 286.3, NA, NA, -102.9001, 125.2, -35.79993, 
-226.9001, 224.1, 123.2,
-95.19998, -115.5001, 
166.2001, -13.69998, -184.3, 232, 350.3, -840.9001, 
424.5001, 61.79993, -107, 230.4001, -395.2001, 
239.4001, -145.1, 303.6, NA, NA, NA, 228.1, -160., 
-191.1001, 451.0001, -100.9001, -218.4, 
-20.30011, 281.7002, -179.9001, -170.6, 
416.3, 118.3, -1191.2, 1265.4, -362.7002, -168.7999, 
337.4001, -625.6001, 634.6001, -384.5001, 
448.7001, NA, NA, -164.45784099, 17.079353995, 
95.976788009, 680.23816699, -491.34869099, -274.694009, 
-256.332907, 469.62296, -146.431891, -41.077201995, -106.970104, 
757.68826399, -1689.214533, 2320.098952, -1446.97942,

Re: [R] turning R expressions into functions?

2012-07-02 Thread Jochen Voß

Dear Thomas,

Many thanks for your answer.

On Sat, Jun 30, 2012 at 10:22:52AM +0900, Thomas Lumley wrote:
  1) good: If I run the following using Rscript
 
   test1 <- function(e1) {
    e1 <- substitute(e1)
    FuncIt(100, e1)
   }
 
   f <- test1(rnorm(1))
   print(f)
 
  then I get the following output:
 
   function ()
   {
      for (funcit.i in 1:100) {
          rnorm(1)
      }
   }
   <environment: 0x102260c28>
 
  This is what I want.  But why do I need the extra substitute
  in test1?  I only found by experiment that this is needed.
 
 You don't.  You need an extra quote() in the argument.
 [...]
 You can get around this using substitute(), which extracts the
 unevaluated code from the formal argument, but it's probably a bad
 idea, since the user of the function should expect all the arguments
 to be evaluated.

I want my final function to work like system.time, i.e.
the user should not have to type quote() all the time
when calling the top-level function of my measuring
mechanism.

Is there a way to do the quoting inside the top-level function
call?
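
One way to do the quoting inside the top-level function, sketched here under assumptions: the names repeat_fn and time_it are hypothetical and are not part of the poster's timeit.R. The top-level function captures its argument once with substitute(), then hands the resulting language object to a helper that takes an already-quoted expression, so the caller never types quote().

```r
# Helper: takes an already-quoted expression and builds a function that
# evaluates it k times in the environment 'env'.
repeat_fn <- function(k, expr, env) {
  f <- function() NULL
  body(f) <- bquote(for (i in seq_len(.(k))) .(expr))
  environment(f) <- env
  f
}

# Top-level function: captures its argument unevaluated, in the same
# spirit as system.time(), so the user does not need quote().
time_it <- function(expr, k = 100) {
  quoted <- substitute(expr)               # quote the caller's code once
  f <- repeat_fn(k, quoted, parent.frame())
  system.time(f())
}

# Usage: the expression is run k times in the caller's environment.
counter <- 0
elapsed <- time_it(counter <<- counter + 1, k = 50)
```

The design choice is that only the outermost function uses substitute(); every inner function receives an ordinary language object and evaluates its arguments normally.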

Many thanks,
Jochen Voss
-- 
http://seehuhn.de/


Re: [R] apply with multiple conditions

2012-07-02 Thread pguilha

Jean, that's exactly what it should be, but yes I copied and pasted
from your email so I don't see how I could have introduced an error in
there
paul

On 2 July 2012 15:57, Jean V Adams [via R]
ml-node+s789695n4635144...@n4.nabble.com wrote:
 Paul,

 Are you submitting the exact code that I included in my previous e-mail?
 When I submit that code, I get this ...

   chrom chromStart chromEnd         name cumsum bin
 1  chr1      10089    10309       ZBTB33  10089   1
 2  chr1      10132    10536  TAF7_(SQ-8)  20221   1
 3  chr2      10133    10362     Pol2-4H8  30354   2
 4  chr2      10148    10418 MafF_(M8194)  40502   2
 5  chr2     210382   210578       ZBTB33  50884   3
 6  chr2     216132   216352         CTCF  67016   3

 Jean


 Paul Guilhamon [hidden email] wrote on 07/02/2012 08:59:00 AM:

 Thanks for your reply Jean,

 I think your interpretation is correct but when I run your code I end
 up with the below dataframe and obviously the bins created there don't
 correspond to a chromStart change of 115341:

   chrom chromStart chromEnd         name cumsum bin
 1  chr1      10089    10309       ZBTB33  10089   1
 2  chr1      10132    10536  TAF7_(SQ-8)  20221   2
 3  chr2      10133    10362     Pol2-4H8  30354   3
 4  chr2      10148    10418 MafF_(M8194)  40502   4
 5  chr2     210382   210578       ZBTB33  50884   5
 6  chr2     216132   216352         CTCF  67016   6

 the first two rows should have the same bin number (same chrom,
 <115341 diff), then rows 3&4 should be in another bin (different chrom
 from rows 1&2, <115341 diff), and rows 5&6 in another one (same chrom
 but >115341 difference between row 4 and row 5).

 it seems the new.bin line of your code isn't quite doing what it
 should but I can't pinpoint the error there...
 Paul


 On 2 July 2012 14:19, Jean V Adams [hidden email] wrote:
  Paul,
 
  My interpretation is that you are trying to assign a new bin number to a row
  every time the variable chrom changes and every time the variable chromStart
  changes by 115341 or more.  Is that right?  If so, you don't need a loop at
  all.  Check out the code below.  I made a couple changes to the all.tf7
  example data frame so that it would have two changes in bin number, one
  based on the chrom variable and one based on the chromStart variable.
 
  Jean
 
  all.tf7 <- data.frame(
  chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
  chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
  chromEnd = c(10309, 10536, 10362, 10418, 210578, 216352),
  name = c("ZBTB33", "TAF7_(SQ-8)", "Pol2-4H8", "MafF_(M8194)",
  "ZBTB33", "CTCF"),
  cumsum = c(10089, 20221, 30354, 40502, 50884, 67016),
  bin = rep(NA, 6)
  )
 
  # assign a new bin every time chrom changes and every time chromStart
  # changes by 115341 or more
  L <- nrow(all.tf7)
  prev.chrom <- c(NA, all.tf7$chrom[-L])
  delta.start <- c(NA, all.tf7$chromStart[-1] - all.tf7$chromStart[-L])
  new.bin <- is.na(prev.chrom) | all.tf7$chrom != prev.chrom |
    delta.start >= 115341
  all.tf7$bin <- cumsum(new.bin)
  all.tf7
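
A possible explanation for the two different outputs in this thread, offered as an assumption rather than something the thread confirms: if chrom is stored as a factor, then in R versions of that time c(NA, factor) produced integer codes, and comparing factor labels with integer codes is always unequal, so every row starts a new bin. The sketch below reproduces both results; the helper bins() and the vector names are hypothetical, and as.integer() is used to emulate the old behaviour independently of the R version.

```r
chrom_chr  <- c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2")
chromStart <- c(10089, 10132, 10133, 10148, 210382, 216132)
L <- length(chrom_chr)

# Same rule as in the code above: new bin when chrom changes or when
# chromStart jumps by 115341 or more relative to the previous row.
bins <- function(chrom, prev.chrom) {
  delta.start <- c(NA, chromStart[-1] - chromStart[-L])
  new.bin <- is.na(prev.chrom) | chrom != prev.chrom | delta.start >= 115341
  cumsum(new.bin)
}

# Character column: previous and current values are compared like with like.
bins_character <- bins(chrom_chr, c(NA, chrom_chr[-L]))

# Factor column with the previous values reduced to integer codes:
# "chr1" != 1 is TRUE for every row, so each row gets its own bin.
chrom_fac <- factor(chrom_chr)
bins_factor <- bins(chrom_fac, c(NA, as.integer(chrom_fac)[-L]))
```

If this is the cause, building the data frame with stringsAsFactors = FALSE, or converting with as.character(all.tf7$chrom) before the comparison, should give the bins 1, 1, 2, 2, 3, 3.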
 
 
  pguilha [hidden email] wrote on 07/02/2012 06:25:13 AM:
 
  Hello all,
 
  I have written a for loop to act on a dataframe with close to 3 million rows
  and 6 columns and I would like to pass it to apply() to speed the process up
  (I let the loop run for 2 days before stopping it and it had only gone
  through 200,000 rows) but I am really struggling to find a way to pass the
  arguments. Below are the loop and the head of the dataframe I am working on.
  Any hints would be much appreciated, thank you! (I have searched for this
  but could not find any other posts doing quite what I want)
  Paul
 
  x <- as.numeric(all.tf7[1,2])
  for (i in 2:nrow(all.tf7)) {
    if (all.tf7[i,1]==all.tf7[i-1,1] && (all.tf7[i,2]-x)<115341)
      all.tf7[i,6] <- all.tf7[i-1,6]
    else if (all.tf7[i,1]==all.tf7[i-1,1] && (all.tf7[i,2]-x)>=115341) {
      all.tf7[i,6] <- (all.tf7[i-1,6]+1)
      x <- as.numeric(all.tf7[i,2]) }
    else if (all.tf7[i,1]!=all.tf7[i-1,1])  {
      all.tf7[i,6] <- (all.tf7[i-1,6]+1)
      x <- as.numeric(all.tf7[i,2]) }
  }
 
  #the aim here is to attribute a bin number to each row so that I can then
  #split the dataframe according to those bins.
 
 
  chrom chromStart chromEnd         name cumsum bin
  chr1       10089    10309       ZBTB33  10089   1
  chr1       10132    10536  TAF7_(SQ-8)  20221   1
  chr1       10133    10362     Pol2-4H8  30354   1
  chr1       10148    10418 MafF_(M8194)  40502   1
  chr1       10382    10578       ZBTB33  50884   1
  chr1       16132    16352         CTCF  67016   1


Re: [R] turning R expressions into functions?

2012-07-02 Thread Jochen Voß

Dear Greg,

many thanks for your answer.

On Sat, Jun 30, 2012 at 11:39:07AM -0600, Greg Snow wrote:
 Look at the replicate function, it takes an expression (does not need
 a function) and runs that expression the specified number of times.
 Will that accomplish what you want without needing to worry about
 substitute, quote, eval, etc.?

Yes, this is very similar to what I want to achieve.
One of the main differences is that 'replicate' builds
up a list of all call results, and for my measurements
I want to avoid the resulting (time and memory) overhead.
But I did look at the implementation of 'replicate'
and this is where I took the trick of using
eval.parent and substitute from.

All the best,
Jochen
-- 
http://seehuhn.de/


Re: [R] turning R expressions into functions?

2012-07-02 Thread Jochen Voß

Dear Dirk,

On Sat, Jun 30, 2012 at 01:28:13PM -0500, Dirk Eddelbuettel wrote:
 And also look at the existing benchmark packages 'rbenchmark' and
 'microbenchmark':

Many thanks for pointing out these packages, I wasn't aware of these.

R> library(microbenchmark)
R> x <- 5; microbenchmark( 1/x, x^-1 )
Unit: nanoseconds
  expr min    lq median    uq  max
1  1/x 296 322.5    341 364.0 6298
2 x^-1 516 548.5    570 591.5 5422

My own code (current version attached, comments would be very welcome)
is much more chatty:

R> source("timeit.R")
R> x <- 5; TimeIt(1/x, x^-1)
tuning ...
measuring 10*1466753 samples for each expression ...
  |==| 100%

execution time comparison:
1/x    (0.000571 ± 1.48e-05) ms/call
x^-1   (0.000864 ± 9.69e-06) ms/call
CI for difference: [-0.00031, -0.000275] ms/call

'1/x' is about 33.9% faster (p=2.75e-11)

One of the things I would love to add to my package would be the
ability to compare more than two expressions in one call.  But
unfortunately, I haven't found out so far whether (and if so, how) it
is possible to extract the elements of a ... object without
evaluating them.
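
For the question about the elements of a ... object, one commonly used base R idiom is substitute(list(...)), which returns the unevaluated call so that its arguments can be picked apart as language objects. A minimal sketch, with the hypothetical name capture_exprs:

```r
# Capture every argument passed through '...' without evaluating it.
capture_exprs <- function(...) {
  # substitute(list(...)) yields the call list(arg1, arg2, ...);
  # dropping the first element (the symbol 'list') leaves the arguments.
  as.list(substitute(list(...)))[-1L]
}

# None of these expressions is evaluated, not even the call to stop().
exprs  <- capture_exprs(1/x, x^-1, stop("never evaluated"))
labels <- vapply(exprs, function(e) paste(deparse(e), collapse = " "),
                 character(1))
```

Each element of exprs could then be handed to a function that expects an already-quoted expression; match.call(expand.dots = FALSE)$... is an alternative that behaves similarly.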

Many thanks,
Jochen
-- 
http://seehuhn.de/
# timeit.R - pairwise comparison for the execution time of R expressions
#
# Copyright (c) 2012  Jochen Voss  v...@seehuhn.de
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see http://www.gnu.org/licenses/.
#
# --
#
# This file provides the R command 'TimeIt' to compare the execution
# time of two R expressions.

FuncIt <- function(k, expr) {
  # Return a function which executes an expression k times.
  #
  # Args:
  #   k: The number of times 'expr' is executed.
  #   expr: An R expression.
  #
  # Returns:
  #   An R function, executing 'expr' in a loop.
  k <- as.numeric(k)
  expr <- eval.parent(substitute(expr))
  fn <- eval(substitute(function() { for (funcit.i in 1:k) { expr } }))
  return(fn)
}

TuneIt <- function(expr, max.seconds=1) {
  # Determine the approximate cost of calling an R expression in a
  # loop.  This function tries loops of different length and then uses
  # linear interpolation to get the result.
  #
  # Args:
  #   expr: The R expression to test.
  #   max.seconds: How much time (approximately) to use for
  #                measurements, in seconds.  This should be much
  #                larger than the resolution of 'system.time'.
  #                Default is 1.
  #
  # Returns:
  #   A vector 'x' of length 2, such that the execution time for 'k'
  #   iterations is approximately 'x[1] + k * x[2]'.
  kk <- c()
  tt <- c()
  k <- 1
  repeat {
    f <- FuncIt(k, expr)
    t <- system.time(f())[1]
    kk <- c(kk, k)
    tt <- c(tt, t)
    if (t > max.seconds / 3) break
    k <- 2 * k
  }
  if (k > 1) {
    fit <- lm(tt ~ kk)
    return(coefficients(fit))
  } else {
    return(c(0, tt))
  }
}

TimeIt <- function(ex1, ex2, total.time=30, verbose=TRUE) {
  # Compare the execution time of two R expressions.
  #
  # Args:
  #   ex1: The first R expression to evaluate
  #   ex2: The second R expression to evaluate
  #   total.time: How much time (approximately) to spend on
  #               measuring, in seconds.  Longer times lead to more
  #               accurate measurements and allow to detect smaller
  #               differences in run time.  Default is 30 seconds.
  #
  # Returns:
  #   An object of class 'TimeIt', summarising the difference in
  #   execution time of the two expressions.
  start <- proc.time()[1]

  ex1 <- substitute(ex1)
  ex2 <- substitute(ex2)

  if (verbose) {
    cat("tuning ...\n")
  }
  # Use at most 20% or 10 seconds (whatever is smaller) of our time
  # budget for tuning.
  tune.time <- min(.2*total.time, 10)
  c1 <- TuneIt(ex1, tune.time / 2)
  c2 <- TuneIt(ex2, tune.time / 2)
  mid <- proc.time()[1] - start
  total.time <- total.time - mid

  block.min <- 1
  block.target <- total.time^(1/4) * block.min^(3/4)
  c <- c1 + c2
  block.k <- max(round((block.target - c[1]) / c[2]), 1)
  f1 <- FuncIt(block.k, ex1)
  f2 <- FuncIt(block.k, ex2)
  ex1.time <- c1[1] + block.k * c1[2]
  ex2.time <- c2[1] + block.k * c2[2]
  pair.time <- ex1.time + ex2.time
  n <- max(round(total.time / pair.time), 2)

  if (verbose) {
    cat("measuring ", n, "*", block.k,
        " samples for each expression ...\n", sep="")
    flush.console()
    progress <-

[R] vectorization with subset?

2012-07-02 Thread dlv04c

Hello,

I have a data frame (68,000 rows) of scores (V4) for a series of [genomic]
coordinates ranges (V2 to V3).



I also have a data frame (1.2 million rows) of single [genomic] coordinates.  



For each genomic coordinate (in coord), I would like to determine the
average of all scores whose genomic ranges (in scores) encompass the
coordinate (in coord). To accomplish this, I tried:



The function works, but is extremely slow.

It would take about 4 days for this to finish for a single data set, and I
have 64 data sets.

Why does the rate at which coordinate averages are calculated increase when
coord is smaller, but not when scores is smaller?

How can I accomplish the same thing more efficiently?
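
Since the original code did not survive in this archive, the following is only a sketch of one vectorized approach, under the assumption that scores has range columns V2 (start) and V3 (end, inclusive) and a score column V4, and that the coordinates are a plain numeric vector. The function name mean_covering and the toy values are hypothetical. The idea is that the ranges covering a coordinate x are those that have started by x minus those that ended before x, and both counts and score sums can be read off sorted cumulative sums with findInterval(), avoiding a subset per coordinate.

```r
# Mean score of all ranges [start, end] that contain each coordinate in x.
mean_covering <- function(x, start, end, score) {
  os <- order(start)
  oe <- order(end)
  cs_start <- c(0, cumsum(score[os]))  # score totals in order of start
  cs_end   <- c(0, cumsum(score[oe]))  # score totals in order of end

  n_started <- findInterval(x, start[os])                  # start <= x
  n_ended   <- findInterval(x, end[oe], left.open = TRUE)  # end < x

  total <- cs_start[n_started + 1] - cs_end[n_ended + 1]
  count <- n_started - n_ended
  ifelse(count > 0, total / count, NA_real_)  # NA where nothing covers x
}

# Toy data standing in for the two data frames.
scores <- data.frame(V2 = c(1, 5, 10), V3 = c(8, 12, 20), V4 = c(2, 4, 6))
coord  <- c(3, 6, 11, 25)
avg    <- mean_covering(coord, scores$V2, scores$V3, scores$V4)
```

The cost is dominated by two sorts and two binary searches per coordinate, so it should scale to millions of coordinates; for genomic data split over chromosomes, the function would need to be applied per chromosome.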

Thanks,

Dan

--
View this message in context: 
http://r.789695.n4.nabble.com/vectorization-with-subset-tp4635156.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Adjusting length of series

2012-07-02 Thread Lekgatlhamang, lexi Setlhare

Noted David, and thanks very much.
Lexi
 


 From: David Winsemius dwinsem...@comcast.net


Sent: Monday, July 2, 2012 4:26 PM
Subject: Re: [R] Adjusting length of series
  

On Jul 2, 2012, at 5:13 AM, Lekgatlhamang, lexi Setlhare wrote:

 Hi David and AK,
 I have been trying to implement your suggestions since yesterday, but I 
 encountered some challenges.
 
 As for David's suggestion, I could only implement it after some 
 modifications. Using an abridged version of my data, I dput my dataset and 
 then show my steps below.

Well, your initial question (why the $ referencing did not work) is now 
answered. This is not a dataframe but rather a 'ts' classed object and there is 
no `$` method for such objects. They are really matrices with some extra 
attributes.

 ydata$BoBCL1
Error in ydata$BoBCL1 : $ operator is invalid for atomic vectors

As I understood it, you were able to get useful analyses using the formula 
methods for lm on these objects, but were just having difficulty with the $ 
operator. So the answer is ... don't do that.
--David.
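
A small illustration of the point above, on made-up data (the object m and its column names are hypothetical): a multivariate ts object is a matrix with extra attributes, so columns are reached with matrix-style indexing, and $ only becomes available after conversion to a data frame.

```r
# A small two-column monthly series standing in for 'ydata'.
m <- ts(cbind(a = 1:6, b = 11:16), start = c(2001, 1), frequency = 12)

# '$' fails on a matrix-like object; capture the error instead of stopping.
dollar_result <- tryCatch(m$b, error = function(e) e)

# Matrix-style indexing works and keeps the time-series attributes.
col_b <- m[, "b"]

# After conversion to a data frame, '$' works as usual.
df <- data.frame(m)
df_b <- df$b
```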

 
 dput(ydata)
 structure(c(68.10004, -34.80002, 90.39996,
 54.60004, -172.3, 51.80002, 175, 79.80002,
 -35.70007, 130.5, 116.8, -67.5, 164.5, 514.8, -326.1,
 98.40005, 160.2, 53.19998, 283.6, -111.6, 127.8,
 -17.30002, 286.3, NA, NA, -102.9001, 125.2, -35.79993,
 -226.9001, 224.1, 123.2, -95.19998, -115.5001,
 166.2001, -13.69998, -184.3, 232, 350.3, -840.9001,
 424.5001, 61.79993, -107, 230.4001, -395.2001,
 239.4001, -145.1, 303.6, NA, NA, NA, 228.1, -160.,
 -191.1001, 451.0001, -100.9001, -218.4,
 -20.30011, 281.7002, -179.9001, -170.6,
 416.3, 118.3, -1191.2, 1265.4, -362.7002, -168.7999,
 337.4001, -625.6001, 634.6001, -384.5001,
 448.7001, NA, NA, -164.45784099, 17.079353995,
 95.976788009, 680.23816699, -491.34869099, -274.694009,
 -256.332907, 469.62296, -146.431891, -41.077201995, -106.970104,
 757.68826399, -1689.214533, 2320.098952, -1446.97942, 516.384521,
 -375.27765099, 293.86702999, 417.845195, 278.198807,
 -968.59203399, -314.195986, NA, NA, NA, 181.53719499,
 78.897434013, 584.26137898, -1171.586858, 216.65468199,
 18.361101998, 725.955867, -616.054851, 105.35468901,
 -65.892902005, 864.65836799, -2446.902797, 4009.313485,
 -3767.078372, 1963.363941, -891.66217199, 669.14468099,
 123.978165, -139.646388, -1246.790841, 654.396048, NA, 4937,
 5005.1, 4970.3, 5060.7, 5115.3, 4943, 4994.8, 5169.8, 5249.6,
 5213.9, 5344.4, 5461.2, 5393.7, 5558.2, 6073, 5746.9, 5845.3,
 6005.5, 6058.7, 6342.3, 6230.7, 6358.5, 6341.2, 6627.5, 4187.5,
 4296.004835, 4240.051829, 4201.178177, 4258.281313, 4995.622616,
 5241.615228, 5212.913831, 4927.879527, 5112.468183, 5150.624948,
 5147.704511, 5037.81397, 5685.611693, 4644.194883, 5922.877025,
 5754.579747, 6102.66699, 6075.476582, 6342.153204, 7026.675021,
 7989.395645, 7983.524235, 7663.456839), .Dim = c(24L, 7L), .Dimnames = list(
     NULL, c("DCred1", "DCred2", "DCred3", "DBoBC2", "DBoBC3",
     "CredL1", "BoBCL1")), .Tsp = c(2001.083, 2003, 12
 ), class = c("mts", "ts"))
 
 NB: the NAs in the dataset emanated from lagging or differencing the series
 
 David's suggestion
  df<-data.frame(DCred1,DCred2,DCred3,DBoBC2,DBoBC3,CredL1,BoBCL1)
 Error in data.frame(DCred1, DCred2, DCred3, DBoBC2, DBoBC3, CredL1, BoBCL1) :
   arguments imply differing number of rows: 23, 22, 21, 24
 
 So I modified as follows:
 length(DCred3)  # finding the minimum length of various series
 [1] 21
 
 # Then dataframe construction
 dframe<- data.frame(Dcre1=DCred1[1:21],Dcre2=DCred2[1:21],Dcre3=DCred3[1:21],
 Dbobc2=DBoBC2[1:21],Dbobc3=DBoBC3[1:21],CredL=CredL1[1:21],BoBCL=BoBCL1[1:21])
 # Then estimated regression
 regCred<- lm(Dcre1~Dcre2+Dcre3+Dbobc2+Dbobc3+CredL+BoBCL, data=dframe)
 summary(regCred)
 # Worked well as shown by results below
 Call:
 lm(formula = Dcre1 ~ Dcre2 + Dcre3 + Dbobc2 + Dbobc3 + CredL +
     BoBCL, data = dframe)
 Residuals:
     Min      1Q  Median      3Q     Max
 -69.516 -27.695  -8.085  13.851 107.276
 Coefficients:
              Estimate Std. Error t value Pr(>|t|)
 (Intercept) 159.32304  157.15209   1.014 0.327873
 Dcre2        -0.75527    0.17262  -4.375 0.000634 ***
 Dcre3        -0.21006    0.08656  -2.427 0.029329 *
 Dbobc2        0.05111    0.06565   0.779 0.449197
 Dbobc3        0.03106    0.03510   0.885 0.391108
 CredL        -0.10967    0.04933  -2.223 0.043177 *
 BoBCL         0.09756    0.03097   3.150 0.007087 **
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 Residual standard

Re: [R] R sub query

Hi,

Either of these should work:
m<-matrix(c(".:0:0,0", ".:2:0,2", ".:194:193,1", ".:56:0,56", ".:58:50,8", 
".:13:0,13",  ".:114:114,0", ".:75:75,0"), nrow=2)

 gsub("^\\.:[[:digit:]]+:","",m)
     [,1]  [,2]    [,3]   [,4]   
[1,] "0,0" "193,1" "50,8" "114,0"
[2,] "0,2" "0,56"  "0,13" "75,0" 


 gsub("^\\.:\\d+:","",m)
     [,1]  [,2]    [,3]   [,4]   
[1,] "0,0" "193,1" "50,8" "114,0"
[2,] "0,2" "0,56"  "0,13" "75,0" 
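
As a self-contained check of the pattern, here is a small sketch that builds the matrix, strips the prefix, and compares the result with the required output. The names cleaned and expected are hypothetical. Because the pattern is anchored with ^ it can match at most once per string, so sub() is sufficient; the matrix dimensions are preserved because the result keeps the attributes of the input.

```r
m <- matrix(c(".:0:0,0", ".:2:0,2", ".:194:193,1", ".:56:0,56",
              ".:58:50,8", ".:13:0,13", ".:114:114,0", ".:75:75,0"),
            nrow = 2)

# Remove a leading ".:<digits>:" from every element.
cleaned <- sub("^\\.:[0-9]+:", "", m)

# The output requested in the question, filled column by column.
expected <- matrix(c("0,0", "0,2", "193,1", "0,56",
                     "50,8", "0,13", "114,0", "75,0"), nrow = 2)
matches_request <- identical(cleaned, expected)
```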


A.K.



- Original Message -
From: Sarah Auburn saub...@yahoo.com
To: r-help@r-project.org r-help@r-project.org
Cc: 
Sent: Monday, July 2, 2012 4:15 AM
Subject: [R] R sub query

Hello,
I would like to substitute a substring of characters defined by a specific 
start and end sequence, 
i.e. in the example matrix below, I would like to substitute ".:X:" with "", 
where X varies in sequence...
 
m<-matrix(c(".:0:0,0", ".:2:0,2", ".:194:193,1", ".:56:0,56", ".:58:50,8", 
".:13:0,13",  ".:114:114,0", ".:75:75,0"), nrow=2)
 
output required:
     [,1]  [,2]    [,3]   [,4]   
[1,] "0,0" "193,1" "50,8" "114,0"
[2,] "0,2" "0,56"  "0,13" "75,0" 
 
Thank you for any help
Sarah

Re: [R] residuals from lm

2012-07-02 Thread chuck.01

FYI:
As you are likely thinking:  this doesn't belong here (It just occurred to
me), I am questioning what I did, not the output of lm()
But if someone knows why I am wrong please let me know.



chuck.01 wrote
 
 Hi, 
 I was playing around with something else and I noticed this matrix code
 for residuals in a linear model doesn't say what lm() says.  Please tell
 me if I am completely misguided here. 
 
 data(mtcars)
 Y <- as.matrix(mtcars[,1])
 X <- as.matrix(mtcars[,c(2:11)])
 
 # shouldn't this: 
 H <- X %*% solve(t(X) %*% X) %*% t(X)
 (diag(dim(H)[1]) - H) %*% Y
 
 # be equal to this:
 residuals(lm(Y~X)) 
 
 # ???  
 # thanks
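
One likely reason for the mismatch, stated as an assumption since the thread does not resolve it: lm(Y ~ X) adds an intercept, while the hat matrix above is built from X without a column of ones. The sketch below (object names are hypothetical) adds the intercept column by hand and compares the residuals, and also checks the no-intercept case against lm(Y ~ 0 + X).

```r
Y <- as.matrix(mtcars[, 1])
X <- as.matrix(mtcars[, 2:11])

# Hat-matrix residuals for a given design matrix.
hat_resid <- function(design, y) {
  H <- design %*% solve(t(design) %*% design) %*% t(design)
  as.vector((diag(nrow(H)) - H) %*% y)
}

# With an explicit column of ones, to match lm()'s default intercept.
res_with_intercept <- hat_resid(cbind(Intercept = 1, X), Y)
res_lm             <- as.vector(residuals(lm(Y ~ X)))

# Without the intercept, to match a model fitted through the origin.
res_no_intercept <- hat_resid(X, Y)
res_lm_origin    <- as.vector(residuals(lm(Y ~ 0 + X)))

# Largest absolute discrepancy in each case (small, up to rounding error).
diff_with    <- max(abs(res_with_intercept - res_lm))
diff_without <- max(abs(res_no_intercept - res_lm_origin))
```

Explicitly inverting t(X) %*% X is kept here only to mirror the original code; lm() itself uses a QR decomposition, which is numerically more stable.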
 


--
View this message in context: 
http://r.789695.n4.nabble.com/residuals-from-lm-tp4635155p4635161.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Adjusting length of series

2012-07-02 Thread Lekgatlhamang, lexi Setlhare

Thanks very much A.K. I have to admit that my problem was not clearly stated, 
with the structure of my data provided. Now all is well.
Cheers
Lexi
 




Re: [R] Adjusting length of series



Hi,

One more thing,
ydat1: original dataset


 ydat2<-data.frame(ydat1)

#Not sure how you did this step on the original data, because:
dframe<- data.frame(Dcre1=DCred1[1:21],Dcre2=DCred2[1:21],Dcre3=DCred3[1:21],
 Dbobc2=DBoBC2[1:21],Dbobc3=DBoBC3[1:21],CredL=CredL1[1:21],BoBCL=BoBCL1[1:21])

I am getting errors for that step, when I used ydat1.


head(ydat1)
[1]   68.1  -34.8   90.4   54.6 -172.3   51.8


 head(ydat2)
  DCred1 DCred2 DCred3 DBoBC2  DBoBC3 CredL1   BoBCL1
1   68.1 NA NA NA  NA 4937.0 4187.500
2  -34.8 -102.9 NA -164.45784  NA 5005.1 4296.005
3   90.4  125.2  228.1   17.07935   181.53719 4970.3 4240.052
4   54.6  -35.8 -161.0   95.97679    78.89743 5060.7 4201.178
5 -172.3 -226.9 -191.1  680.23817   584.26138 5115.3 4258.281
6   51.8  224.1  451.0 -491.34869 -1171.58686 4943.0 4995.623




#I analyzed [1:21] again in ydat2.

dframe<-data.frame(Dcre1=ydat2$DCred1[1:21],Dcre2=ydat2$DCred2[1:21],
 Dcre3=ydat2$DCred3[1:21],Dbobc2=ydat2$DBoBC2[1:21],Dbobc3=ydat2$DBoBC3[1:21],
 CredL=ydat2$CredL1[1:21],BoBCL=ydat2$BoBCL1[1:21])
But the results are a bit different from those in my earlier post, because here the 
NAs are still present in different rows.  So, the observations in those rows 
will be deleted when it is analyzed.

regCred<- lm(Dcre1~Dcre2+Dcre3+Dbobc2+Dbobc3+CredL+BoBCL, data=dframe)
 summary(regCred)

Call:
lm(formula = Dcre1 ~ Dcre2 + Dcre3 + Dbobc2 + Dbobc3 + CredL + 
    BoBCL, data = dframe)

Residuals:
     Min       1Q   Median       3Q      Max 
-118.687  -25.568   -5.334   35.035   69.992 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -485.42427  209.47952  -2.317 0.038958 *  
Dcre2          0.95097    0.18156   5.238 0.000209 ***
Dcre3         -0.28676    0.10787  -2.658 0.020852 *  
Dbobc2        -0.09512    0.09334  -1.019 0.328278    
Dbobc3         0.03199    0.04933   0.648 0.528936    
CredL          0.14825    0.07193   2.061 0.061645 .  
BoBCL         -0.04844    0.04333  -1.118 0.285540

---
A.K.






From: Lekgatlhamang, lexi Setlhare lexisetlh...@yahoo.com
To: arun smartpink...@yahoo.com 
Cc: R help r-help@r-project.org 
Sent: Monday, July 2, 2012 11:43 AM
Subject: Re: [R]  Adjusting length of series


Thanks very much A.K. I have to admit that my problem was not clearly stated, 
with the structure of my data provided. Now all is well.

Cheers
Lexi

From: arun smartpink...@yahoo.com
To: Lekgatlhamang, lexi Setlhare lexisetlh...@yahoo.com 
Cc: R help r-help@r-project.org 
Sent: Monday, July 2, 2012 4:40 PM
Subject: Re: [R]  Adjusting length of series



Hello,

Your data is not of class data.frame.
Suppose I call your data ydat1:

str(ydat1)

 mts [1:24, 1:7] 68.1 -34.8 90.4 54.6 -172.3 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:7] "DCred1" "DCred2" "DCred3" "DBoBC2" ...
 - attr(*, "tsp")= num [1:3] 2001 2003 12
 - attr(*, "class")= chr [1:2] "mts" "ts"

ydat2 <- data.frame(ydat1)

str(ydat2)
'data.frame':    24 obs. of  7 variables:
 $ DCred1: num  68.1 -34.8 90.4 54.6 -172.3 ...
 $ DCred2: num  NA -102.9 125.2 -35.8 -226.9 ...
 $ DCred3: num  NA NA 228 -161 -191 ...
 $ DBoBC2: num  NA -164.5 17.1 96 680.2 ...
 $ DBoBC3: num  NA NA 181.5 78.9 584.3 ...
 $ CredL1: num  4937 5005 4970 5061 5115 ...
 $ BoBCL1: num  4188 4296 4240 4201 4258 ...

# Since you only wanted to run lm on these columns, I guess it doesn't really
# matter whether you have month and year in the dataset.
 # With NAs
 regCred <- lm(DCred1~DCred2+DCred3+DBoBC2+DBoBC3+CredL1+BoBCL1,data=ydat2)
 summary(regCred)

Call:
lm(formula = DCred1 ~ DCred2 + DCred3 + DBoBC2 + DBoBC3 + CredL1 + 
    BoBCL1, data = ydat2)

Residuals:
        Min          1Q      Median          3Q         Max 
-124.988463  -33.133975    7.971083   23.607953   76.813601 

Coefficients:
                 Estimate    Std. Error  t value   Pr(>|t|)    
(Intercept) -538.61375718  205.91179535 -2.61575   0.020344 *  
DCred2         0.96401908    0.15623660  6.17025 2.4337e-05 ***
DCred3        -0.25720355    0.08983607 -2.86303   0.012524 *  
DBoBC2        -0.11222347    0.07828182 -1.43358   0.173646    
DBoBC3         0.04564621    0.03825169  1.19331   0.252578    
CredL1         0.18499925    0.06565456  2.81777   0.013693 *  
BoBCL1        -0.07682710    0.03406916 -2.25503   0.040666 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 54.44479 on 14 degrees of freedom
  (3 observations deleted due to missingness)
Multiple R-squared: 0.9324472,    Adjusted R-squared: 0.903496 
F-statistic: 32.20757 on 6 and 14 DF,  p-value: 2.046024e-07 

 # Without NAs
 ydat3 <- na.omit(ydat2)
 regCred <- lm(DCred1~DCred2+DCred3+DBoBC2+DBoBC3+CredL1+BoBCL1,data=ydat3)
 summary(regCred)

Call:
lm(formula = DCred1 ~ DCred2 + DCred3 + DBoBC2 + DBoBC3 + CredL1 + 
    BoBCL1, data = ydat3)

Residuals:
    Min  1Q  Median  3Q Max 
-124.988463  -33.133975    7.971083   23.607953

[R] residuals from lm

2012-07-02 Thread chuck.01

Hi, 
I was playing around with something else and I noticed this matrix code for
residuals in a linear model doesn't say what lm() says.  Please tell me if I
am completely misguided here. 

data(mtcars)
Y <- as.matrix(mtcars[,1])
X <- as.matrix(mtcars[,c(2:11)])

# shouldn't this: 
H <- X %*% solve(t(X) %*% X) %*% t(X)
(diag(dim(H)[1]) - H) %*% Y

# be equal to this:
residuals(lm(Y~X)) 

# ???  
# thanks
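
One likely reason for the mismatch: lm(Y ~ X) fits an intercept by default, whereas X as built above has no column of ones, so the hat matrix describes a regression through the origin. A minimal sketch of the comparison with the intercept column added (the names X1, res_matrix and res_lm are introduced here only for illustration):

```r
data(mtcars)
Y <- as.matrix(mtcars[, 1])
X <- as.matrix(mtcars[, 2:11])

# Add the column of ones that lm() includes implicitly.
X1 <- cbind(1, X)

# Hat matrix and residuals from the matrix formula.
H <- X1 %*% solve(t(X1) %*% X1) %*% t(X1)
res_matrix <- as.numeric((diag(nrow(H)) - H) %*% Y)

# Residuals from lm(), stripped of names and dimensions for comparison.
res_lm <- as.numeric(residuals(lm(Y ~ X)))

# Compare up to numerical error from inverting t(X1) %*% X1.
all.equal(res_matrix, res_lm, tolerance = 1e-6)
```

Alternatively, fitting lm(Y ~ X - 1) removes the intercept so that it corresponds to the original H.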

--
View this message in context: 
http://r.789695.n4.nabble.com/residuals-from-lm-tp4635155.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] predicting expected number of events using a coxph model

2012-07-02 Thread agittens

Peter Dalgaard-2 wrote

I fit a coxph model:

coxphfit <- coxph(Surv(sampledLifetime, !sampledCensoredQ) ~ curpbc6 +
prevpbc6, sampledTimeSeries)

Now I'm trying to predict the expected number of events using a new
dataset.
The documentation suggests that

coxPred <- predict(coxphfit, newdata = testTimeSeries, type="expected")

will do what I want, but I get the error

Error in model.frame.default(data = testTimeSeries, formula =
Surv(sampledLifetime, :
variable lengths differ (found for 'curpbc6')

when I do this. The dataframes sampledTimeSeries and testTimeSeries were
constructed by taking rows from a larger dataframe, so they have the same
data.

What am I doing incorrectly?

Most likely referring to a variable not in testTimeSeries. (I kind of
suspect that unlike predict.lm, predict.coxph does not ignore the left
hand side of formulas. Does testTimeSeries contain a sampledLifetime
column?)

No, I did not have the lifetime and censored data in the dataframe.

Per your idea, I put the sampledLifetime and sampledCensoredQ variables
in the dataframe sampledTimeSeries and left the rest of the code the same.
Now when I try with the new data set,

coxPred <- predict(coxphfit, newdata = testTimeSeries, type="expected")

I get different errors. If I use testTimeSeries without the lifetime and
censor indicator columns (which shouldn't be required for prediction), then
I get the same error as before. If I put in these columns, then I get the
error

Error in predict.coxph(coxphfit, newdata = testTimeSeries, type =
"expected", :
object 'x' not found
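
For reference, here is a minimal sketch of the call pattern discussed above, using the lung data that ships with the survival package as a stand-in (the model, the covariates and the name new_rows are assumptions for illustration, not the poster's setup). It assumes a reasonably recent version of survival; it does not explain the 'x' not found error reported here. The point it illustrates is that for type = "expected" the new data carries the response columns as well, since the expected number of events depends on each subject's follow-up time.

```r
library(survival)

# Fit on the built-in lung data (a stand-in; not the poster's data).
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# newdata keeps the response columns (time, status) alongside the covariates.
new_rows <- lung[1:5, c("time", "status", "age", "sex")]

expected_events <- predict(fit, newdata = new_rows, type = "expected")
expected_events
```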

--
View this message in context:
http://r.789695.n4.nabble.com/predicting-expected-number-of-events-using-a-coxph-model-tp4634935p4635168.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Error() model is singular - what does that mean

2012-07-02 Thread Bert Gunter

WARNING: Not tested in the absence of data provided by dput() to allow
easy input into R

1. Change the names of the inputs by removing the dash: a name containing a
dash is not a legitimate R name and could/should be causing problems in the
aov() call, since the names are not quoted.

2. The model specification is wrong. It should be:
aov(Correct~TaskKind*DataKind+Error(Subject),data=allDataRaw.xp)
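
The two steps above can be sketched on a toy data frame. Everything below (the data, the seed, the object names toy and fit) is invented for illustration and is not the poster's data; only the column names and the model formula follow the thread.

```r
# Toy data invented for illustration; 4 subjects, 2 task kinds, 2 data kinds,
# 3 repetitions per cell.
set.seed(1)
toy <- data.frame(
  Subject     = factor(rep(1:4, each = 12)),
  "Task-Kind" = rep(rep(c("A", "B"), each = 6), times = 4),
  "Data-Kind" = rep(rep(c("Data1", "Data2"), each = 3), times = 8),
  check.names = FALSE   # keep the dashed names as they would arrive from a file
)
toy$Correct <- rbinom(nrow(toy), 1, 0.5)

# Step 1: remove the dashes so the names are syntactically valid in a formula.
names(toy) <- gsub("-", "", names(toy), fixed = TRUE)

# Step 2: the model specification suggested above.
fit <- aov(Correct ~ TaskKind * DataKind + Error(Subject), data = toy)
summary(fit)
```

Without the renaming, Task-Kind inside a formula is parsed as Task minus Kind, which is the ambiguity pointed out elsewhere in this thread.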

-- Bert



On Mon, Jul 2, 2012 at 6:04 AM, Jessica Streicher
j.streic...@micromata.de wrote:

 Also, try googling for "R model is singular"; there seem to have been
 a lot of people with that particular error.

 On 02.07.2012, at 14:56, Jessica Streicher wrote:

  Just looking at it, I would try renaming Task-Kind, Data-Kind and
  Time-Taken.
  Those are ambiguous in the formula.
 
  Task-Kind vs Task - Kind
 
  Though that might not be the error at hand :)
 
 
  On 02.07.2012, at 14:15, zetwal wrote:
 
  Hello
 
  I have some test data from a within-subject experiment that looks like
  this:

  Subject  Task-Kind  Data-Kind  Time-Taken  Correct
  1        A          Data1      5           1
  1        A          Data1      3           0
  1        A          Data1      1           1
  1        A          Data2      8           1
  1        A          Data2      7           0
  1        A          Data2      5           0
  1        A          Data3      2           1
  1        A          Data3      7           0
  1        A          Data3      5           0
  1        A          Data3      6           0
  1        B          Data1      3           1
  1        B          Data1      1           1
  1        B          Data1      3           0
  1        B          Data2      9           0
  1        B          Data2      8           1
  1        B          Data2      5           0
  1        B          Data3      2           1
  1        B          Data3      7           2
  1        B          Data3      5           3
  1        B          Data3      6           0
  1        C          Data1      3           1
  1        C          Data1      1           1
  1        C          Data1      3           0
  1        C          Data2      9           0
  1        C          Data2      8           1
  1        C          Data2      5           0
  1        C          Data3      2           1
  1        C          Data3      7           2
  1        C          Data3      5           3
  1        C          Data3      6           0
  2        A          Data1      5           1
  2        A          Data1      3           0
  2        A          Data1      1           1
  2        A          Data2      8           1
  2        A          Data2      7           0
  2        A          Data2      5           0
  2        A          Data3      2           1
  2        A          Data3      7           0
  2        A          Data3      5           0
  2        A          Data3      6           0
  2        B          Data1      3           1
  2        B          Data1      1           1
  2        B          Data1      3           0
  2        B          Data2      9           0
  2        B          Data2      8           1
  2        B          Data2      5           0
  2        B          Data3      2           1
  2        B          Data3      7           2
  2        B          Data3      5           3
  2        B          Data3      6           0
  2        C          Data1      3           1
  2        C          Data1      1           1
  2        C          Data1      3           0
  2        C          Data2      9           0
  2        C          Data2      8           1
  2        C          Data2      5           0
  2        C          Data3      2           1
  2        C          Data3      7           2
  2        C          Data3      5           3
  2        C          Data3      6           0
  .
  .
  .
 
  some notes:
  there are 20 subjects
  there are 5 different kinds of tasks
  There are 5 different kinds of data
  and there are several different variations for a certain kind of task and
  kind of data, which is why for Subject = 1, Task-Kind = A and
  Data-Kind = Data1 we have 3 different results.
 
  The measured parameters are time to complete the task and whether it was
  correct or not (0 implies correct and 1 implies not correct)
 
  I am computing the anova as follows:
  aov.ex = aov(Correct~Task-Kind*Data-Kind+Error(Subject/(Task-Kind*Data-Kind)),data=allDataRaw.xp)
 
  since I want to see how the result is affected by the different kinds of
  data as well as the kind of task, and I get a warning message saying:
  Error() model is singular
 
  I would be very grateful if someone could please tell me what this
  means.
  Thanks
  Pascal
 
  --
  View this message in context:
 http://r.789695.n4.nabble.com/Error-model-is-singular-what-does-that-mean-tp4635103.html
  Sent from the R help mailing list archive at Nabble.com.
 




-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


[R] 'init.win' error when installing from source

2012-07-02 Thread Erin Hodgess

Dear R People:

I'm installing R 2.15.1 on a 32-bit Windows machine from source.

I'm getting a strange error about init.win (please see below)

Does this look familiar to anyone, please?

Thanks,
Erin



Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

c:\R\R-2.15.1\src\gnuwin32>make all recommended
make all recommended
make[1]: `MkRules' is up to date.
make[4]: Nothing to be done for `svnonly'.
installing C headers
make[2]: Nothing to be done for `all'.
make[2]: `libRblas.dll.a' is up to date.
make[5]: Nothing to be done for `svnonly'.
installing C headers
make --no-print-directory -C ../extra/intl CFLAGS='-O3 -Wall -pedantic
-mtune=core2' -f Makefile.win
make --no-print-directory -C ../appl CFLAGS='-O3 -Wall -pedantic
-mtune=core2' FFLAGS='-O3 -mtune=core2' -f Makefile.win
make --no-print-directory -C ../nmath CFLAGS='-O3 -Wall -pedantic
-mtune=core2' FFLAGS='-O3 -mtune=core2' -f Makefile.win
make --no-print-directory -C ../main CFLAGS='-O3 -Wall -pedantic
-mtune=core2' FFLAGS='-O3 -mtune=core2' malloc-DEFS='-DLEA_MALLOC' -f
Makefile.win
make --no-print-directory -C ./getline CFLAGS='-O3 -Wall -pedantic -mtune=core2'
make[4]: `gl.a' is up to date.
make -f Makefile.win makeMakedeps
make -f Makefile.win libpcre.a
make[5]: `libpcre.a' is up to date.
make[4]: Nothing to be done for `all'.
make -f Makefile.win makeMakedeps
make -f Makefile.win libtre.a
make[5]: `libtre.a' is up to date.
make[4]: Nothing to be done for `all'.
make[5]: `stamp' is up to date.
make[5]: `liblzma.a' is up to date.
make[3]: `R.dll' is up to date.
cp R.dll ../../bin/i386
make[3]: Nothing to be done for `all'.
make --no-print-directory -C front-ends
make[2]: `COPYRIGHTS' is up to date.
make --no-print-directory -C ../modules -f Makefile.win \
  CFLAGS='-O3 -Wall -pedantic -mtune=core2' FFLAGS='-O3 -mtune=core2'
make[5]: *** No rule to make target `init_win.o', needed by
`../../../bin/i386/Rlapack.dll'.  Stop.
make[4]: *** [all] Error 2
make[3]: *** [all] Error 1
make[2]: *** [rmodules] Error 2
make[1]: *** [rbuild] Error 2
make: *** [all] Error 2

c:\R\R-2.15.1\src\gnuwin32>

Thanks,
Erin




-- 
Erin Hodgess
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com


Re: [R] vectorization with subset?