Re: [R] generating phi using function()

2015-03-31 Thread Daniel Nordlund

On 3/30/2015 12:30 PM, T.Riedle wrote:

Hi,
I am struggling with the following function:

phi <- function(w1, w2, j, k, K){
  zaehler <- (k/K)^(w1-1)*(1-k/K)^(w2-1)
  nenner <- sum( ((1:K)/K)^(w1-1)*(1-(1:K)/K)^(w2-1))
  return( zaehler/nenner )
}

phi(c(1, 1), 44L, 1)

Error in phi(c(1, 1), 44L, 1) : argument "k" is missing, with no default


Hence, I have changed the function to

phi <- function(w, k, K){
  w1 <- w[1]
  w2 <- w[2]
  zaehler <- (k/K)^(w1-1)*(1-k/K)^(w2-1)
  nenner <- sum( ((1:K)/K)^(w1-1)*(1-(1:K)/K)^(w2-1))
  return( zaehler/nenner )
}

Unfortunately, when running the MIDAS regression I get the following error
message:

m22.phi <- midas_r(rv ~ mls(rvh, 1:max.lag+h1, 1, phi), start = list(rvh = c(1, 1)))

Error in X[, inds] %*% fun(st) : non-conformable arguments

I guess the problem is w, but I cannot find a way to produce the formula
shown in the attached file, where the exponents are w1 and w2,
respectively.

Thanks for your help


From: jlu...@ria.buffalo.edu [mailto:jlu...@ria.buffalo.edu]
Sent: 30 March 2015 16:01
To: T.Riedle
Cc: r-help@r-project.org; R-help
Subject: Re: [R] generating phi using function()

Your function phi has 5 arguments with no defaults.  Your call only has 3 
arguments.  Hence the error message.
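A minimal sketch of this point (phi5 is just a local name for the five-argument version quoted above): the three values in phi(c(1, 1), 44L, 1) are matched by position to w1, w2 and j, so k and K are never supplied, and because R evaluates arguments lazily the error only surfaces when the body first uses k.

```r
# The five-argument version from the post, renamed phi5 to avoid clashes
phi5 <- function(w1, w2, j, k, K) {
  zaehler <- (k / K)^(w1 - 1) * (1 - k / K)^(w2 - 1)
  nenner <- sum(((1:K) / K)^(w1 - 1) * (1 - (1:K) / K)^(w2 - 1))
  zaehler / nenner
}

# Positional matching: w1 = c(1, 1), w2 = 44L, j = 1; k and K stay missing
msg <- tryCatch(phi5(c(1, 1), 44L, 1), error = function(e) conditionMessage(e))
print(msg)

# Supplying every argument (here by name) avoids the error
ok <- phi5(w1 = 2, w2 = 3, j = 1, k = 5, K = 10)
print(ok)
```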

From: T.Riedle tr...@kent.ac.uk
Sent by: R-help r-help-boun...@r-project.org
Date: 03/29/2015 08:59 AM
To: r-help@r-project.org
Subject: [R] generating phi using function()

Hi everybody,
I am trying to generate the formula shown in the attachment. My formula so far 
looks as follows:

phi <- function(w1, w2, j, k, K){
zaehler <- (k/K)^(w1-1)*(1-k/K)^(w2-1)
nenner <- sum( ((1:K)/K)^(w1-1)*(1-(1:K)/K)^(w2-1))
return( zaehler/nenner )
}

Unfortunately, something must be wrong here, as I get the following message when
running a MIDAS regression:

m22.phi <- midas_r(rv ~ mls(rvh, 1:max.lag+h1, 1, phi), start = list(rvh = c(1, 1)))
Error in phi(c(1, 1), 44L, 1) : argument "K" is missing, with no default
Called from: .rs.breakOnError(TRUE)
Browse[1]> K <- 125
Browse[1]> 125

Could anybody look into my phi formula and tell me what is wrong with it?

Thanks in advance.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.






You haven't provided a reproducible example, so it is difficult to
troubleshoot errors.  However, it is not obvious to me that the error
message has anything to do with the parameter w in the phi function.  In
reading the documentation for midas_r(), the help says of the formula
argument:


formula 

formula for restricted MIDAS regression or midas_r object. Formula must 
include fmls function


Your formula does not include the fmls() function; it uses mls().  So I
think your problem may have to do with how you are calling the midas_r
function, and how the parameters are created and passed to phi().



Unfortunately, I can't be of much more help,

Dan

--
Daniel Nordlund
Bothell, WA USA
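A sketch under stated assumptions, not code posted in the thread. The traceback phi(c(1, 1), 44L, 1) suggests that midas_r() calls the weight function as fun(parameters, number of lags, frequency ratio), which matches the (p, d, m) form of midasr's own weight functions; treat that as an assumption. If it holds, X[, inds] %*% fun(st) needs one weight per lag column, while the two-parameter phi(w, k, K) returns a single number for that call, which would fit the "non-conformable arguments" error. The version below returns d normalised weights and assumes K equals the number of lags d.

```r
# Hypothetical weight function in (p, d, m) form; m is accepted but unused
phi <- function(p, d, m) {
  w1 <- p[1]
  w2 <- p[2]
  k <- seq_len(d)                  # lags 1, ..., d
  K <- d                           # assumption: K equals the number of lags
  zaehler <- (k / K)^(w1 - 1) * (1 - k / K)^(w2 - 1)
  zaehler / sum(zaehler)           # nenner is the sum of the same terms
}

# The call from the traceback now yields 44 weights that sum to 1
w <- phi(c(1, 1), 44L, 1)
print(length(w))
print(sum(w))

# The two-parameter version from the post returns one number for that call
phi_old <- function(w, k, K) {
  w1 <- w[1]; w2 <- w[2]
  (k / K)^(w1 - 1) * (1 - k / K)^(w2 - 1) /
    sum(((1:K) / K)^(w1 - 1) * (1 - (1:K) / K)^(w2 - 1))
}
print(length(phi_old(c(1, 1), 44L, 1)))
```

At k = K the factor (1 - k/K) is zero, so any w2 < 1 makes the last weight infinite; midasr also provides a normalised beta weight function, nbeta(), which may be a useful comparison before relying on this sketch.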



Re: [R] Multiple Plots using ggplot

2015-03-31 Thread Jeff Newmiller
This is no better because (a) you are still posting using HTML format, and (b) 
using printed output loses the internal representation of the data. The dput 
function is very helpful for solving this. [1]

[1] 
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

On March 30, 2015 10:56:48 PM PDT, Frederic Ntirenganya ntfr...@gmail.com 
wrote:
Hi Stephen,

Sorry, the data came through badly.
Here is the head of the data.

> head(data)
        Date Number.of.Rain.Days Total.rain Start.of.Rain..i. Start.of.Rain..ii. Start.of.Rain..iii. Start.Rain..iv.
1 1952-01-01                  86   1139.952                92                239                 112             112
2 1953-01-01                  96    977.646                98                 98                 112             112
3 1954-01-01                 114   1382.014                92                 92                 120             120
4 1955-01-01                 119   1323.086               100                100                 125             174
5 1956-01-01                 123   1266.444                92                 92                 119             119
6 1957-01-01                 124   1235.964                92                 92                 112             112



Frederic Ntirenganya
Maseno University,
African Maths Initiative,
Kenya.
Mobile:(+254)718492836
Email: fr...@aims.ac.za
https://sites.google.com/a/aims.ac.za/fredo/

On Mon, Mar 30, 2015 at 5:34 PM, stephen sefick ssef...@gmail.com
wrote:

 Hi Frederic,

 Can you provide a minimal reproducible example including either real data
 (dput), or simulated data that mimics your situation? This will allow more
 people to help.

 Stephen

 On Mon, Mar 30, 2015 at 8:39 AM, Frederic Ntirenganya ntfr...@gmail.com
 wrote:

 Dear All,

 I want to plot several columns of a data frame that has many columns, using
 the ggplot function. I want to plot only str1, str2 and str3, and I failed to
 make it work. What I want is to compare str1, str2 and str3 by plotting
 vertical lines. I also need to add points to the plot to be able to
 separate them.


 Here is what the data look like and how I tried to do it.

 Date      NumberofRaindays TotalRains str1 str2 str3
 1/1/1952  86               1360.5     92   120  112
 1/1/1953  96               1100       98   100  110
 ...

 df1 <- data.frame(data)
 df1
 df2 <- melt(df1, id = 'Date', variable_name = 'start of Rains')
 df2

 ggplot(df2, aes(Date, value)) + geom_line(aes(colour = "red"), type = "h")

 Kindly any help is welcome. Thanks

 Regards,
 Frederic.





 --
 Stephen Sefick
 **
 Auburn University
 Biological Sciences
 331 Funchess Hall
 Auburn, Alabama
 36849
 **
 sas0...@auburn.edu
 http://www.auburn.edu/~sas0025
 **

 Let's not spend our time and resources thinking about things that are so
 little or so large that all they really do for us is puff us up and make us
 feel like gods.  We are mammals, and have not exhausted the annoying little
 problems of being mammals.

 -K. Mullis

 A big computer, a complex algorithm and a long time does not equal
 science.

   -Robert Gentleman




Re: [R] matrix manipulation question

2015-03-31 Thread Stéphane Adamowicz
Many thanks,

Stéphane

On 30 Mar 2015, at 10:42, peter dalgaard pda...@gmail.com wrote:

 
 On 30 Mar 2015, at 09:59 , Stéphane Adamowicz 
 stephane.adamow...@avignon.inra.fr wrote:
 
 
 However, in order to help me understand, would you be so kind as to give me 
 a matrix or data.frame example where « complete.cases(X)== T » or « 
 complete.cases(X)== TRUE » would give some unwanted result ?
 
 The standard problem with T for TRUE is if T has been used for some other 
 purpose, like a time variable. E.g., T <- 0; complete.cases(X)==T.
 
 complete.cases()==TRUE is just silly, like (x==0)==TRUE or
 ((x==0)==TRUE)==TRUE.
 
 (However, notice that x==TRUE is different from as.logical(x) if x is 
 numeric, so ifelse(x,y,z) may differ from ifelse(x==TRUE,y,z).) 
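A small sketch of both points above; X and x are made-up data.

```r
# Made-up data: rows 2 and 3 each contain one NA
X <- data.frame(a = c(1, NA, 3), b = c("u", "v", NA))
cc <- complete.cases(X)               # TRUE FALSE FALSE

# T is an ordinary variable, so it can be masked
T <- 0
masked <- cc == T                     # now a comparison with 0: flags incomplete rows
rm(T)                                 # drop the mask; T refers to TRUE again

# x == TRUE coerces TRUE to 1; as.logical() treats any non-zero value as TRUE
x <- c(0, 1, 2)
as.logical(x)                         # FALSE TRUE TRUE
x == TRUE                             # FALSE TRUE FALSE
ifelse(x, "yes", "no")                # "no" "yes" "yes"
ifelse(x == TRUE, "yes", "no")        # "no" "yes" "no"
```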
 
 -- 
 Peter Dalgaard, Professor,
 Center for Statistics, Copenhagen Business School
 Solbjerg Plads 3, 2000 Frederiksberg, Denmark
 Phone: (+45)38153501
 Email: pd@cbs.dk  Priv: pda...@gmail.com


Re: [R] Multiple Plots using ggplot

2015-03-31 Thread Frederic Ntirenganya
 Hi All,

Sorry that the data were not in good shape. This is what my data look like.

I want to plot several columns of a data frame that has many columns,
using the ggplot function. I want to plot only Start.of.Rain..i.,
Start.of.Rain..ii. and Start.of.Rain..iii., and I failed to make it work.
What I want is to compare Start.of.Rain..i., Start.of.Rain..ii. and
Start.of.Rain..iii. by plotting vertical lines. I also need to add
points to the plot to be able to separate them. The x-axis must be the
Date column. Thanks!

Here is what the data look like and how I tried to do it.



Date        Number.of.Rain.Days  Total.rain  Start.of.Rain..i.  Start.of.Rain..ii.  Start.of.Rain..iii.
1952-01-01   86                  1139.952    92                 239                 11
1953-01-01   96                   977.646    98                  98                 11
1954-01-01  114                  1382.014    92                  92                 12
1955-01-01  119                  1323.086   100                 100                 12
1956-01-01  123                  1266.444    92                  92                 11
1957-01-01  124                  1235.964    92                  92                 11


Here is how I tried to solve the problem.

df1 <- data.frame(data)
df1
df2 <- melt(df1, id = 'Date', variable_name = 'start of Rains')
df2

ggplot(df2, aes(Date, value)) + geom_line(aes(colour = "red"), type = "h")

Kindly any help is welcome. Thanks

Regards,
Frederic.


Re: [R] Multiple Plots using ggplot

2015-03-31 Thread Jeff Newmiller
By failing to take the advice given to you, you make it harder to help you. 
Learn to control your email program to send plain text, and learn to use the 
dput function.

With regard to this function call:

 ggplot(df2, aes(Date,value)) +

I highly recommend using named parameters in the aes call. Also, if you want 
different values of variable to be plotted with different colors, you should 
map that column to the colour dimension:

ggplot(df2, aes(x=Date,y=value,colour=variable)) +

The type argument applies to base graphics rather than ggplot graphics, and
you should never put fixed values inside the aes call. Since colour has already
been taken care of, you can call geom_line with no arguments:

geom_line()

So all together then:

ggplot(df2, aes(x=Date,y=value,colour=variable)) +
geom_line()

but I cannot test it because you have not followed my other advice.
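A sketch of this suggestion applied to the six rows Frederic posted later with dput(). Assumptions: base R's stack() stands in for melt() so the example needs no reshape package (its long columns are called values and ind rather than value and variable), the plotting step only runs when ggplot2 is installed, and geom_point() is added because Frederic also asked for points.

```r
# Date and the three Start.of.Rain columns from the posted dput(head(data))
data <- data.frame(
  Date = as.Date(c(-6575, -6209, -5844, -5479, -5114, -4748), origin = "1970-01-01"),
  Start.of.Rain..i.   = c(92L, 98L, 92L, 100L, 92L, 92L),
  Start.of.Rain..ii.  = c(239L, 98L, 92L, 100L, 92L, 92L),
  Start.of.Rain..iii. = c(112L, 112L, 120L, 125L, 119L, 112L)
)

# Long format: stack() returns columns "values" and "ind" (the source column)
cols <- c("Start.of.Rain..i.", "Start.of.Rain..ii.", "Start.of.Rain..iii.")
long <- stack(data[cols])
long$Date <- rep(data$Date, times = length(cols))

# Map the series to colour inside aes(); fixed settings belong outside aes()
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(long, ggplot2::aes(x = Date, y = values, colour = ind)) +
    ggplot2::geom_line() +
    ggplot2::geom_point()
  print(p)
}
```

For the vertical-bar look of base graphics' type = "h", adding geom_linerange(aes(ymin = 0, ymax = values)) is one option to try.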
[R] MethComp exported object namespace error

2015-03-31 Thread Kylie Lange
Hi everyone,

I am using the MCmcmc function of the MethComp package and receive the 
following error:

Error: 'coda.samples' is not an exported object from 'namespace:coda'

I emailed the package author last week but haven't had a reply. I have
installed JAGS 3.4.0 as required by MethComp. I am using R 3.1.2 and the
MethComp version currently on CRAN (1.22.1). I am not a regular R user, so I
haven't had any luck making sense of the error, though there are references to
namespaces in the package check results here:
http://cran.itam.mx/web/checks/check_results_bxc_at_steno.dk.html#MethComp. I'm
not sure if that's relevant.

Any suggestions would be appreciated. Apologies if I haven't provided any 
required information.

The following shows my code and error (code taken from the package author's 
text 'Comparing Clinical Measurement Methods', Bendix Carstensen, section 
7.5.3) :

library(MethComp)
data(ox)
ox <- Meth(ox)
m3 <- MCmcmc(ox, IxR=TRUE, n.iter=5)

Comparison of 2 methods, using 354 measurements on 61 items, with up to 3 
replicate measurements, (replicate values are in the set: 1 2 3 ) 
( 2 * 61 * 3 = 366 ): 

No. items with measurements on each method:
         #Replicates
Method     1   2   3 #Items #Obs: 354 Values:  min  med  max
  CO       1   4  56     61   177             22.2 78.6 93.5
  pulse    1   4  56     61   177             24.0 75.0 94.0

Simulation run of a model with 
- method by item and item by replicate interaction: 
- using 4 chains run for 5 iterations (of which 25000 are burn-in), 
- monitoring every 25 values of the chain: 
- giving a posterior sample of 4000 observations.

Loading required package: coda
Linked to JAGS 3.4.0
Loaded modules: basemod,bugs
Initialization and burn-in:
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
   Graph Size: 2868

Initializing model

  |++| 100%
Sampling:
Error: 'coda.samples' is not an exported object from 'namespace:coda'


Thanks,
Kylie.



Re: [R] Error in lm() with very small (close to zero) regressor

2015-03-31 Thread RiGui
I found a fix to my problem using fastLm() from the RcppEigen package, with
either the Jacobi singular value decomposition (SVD) (method 4) or the method
based on the eigenvalue-eigenvector decomposition of X'X (method 5) of the
fastLm function:



install.packages("RcppEigen")
library(RcppEigen)

n_obs <- 1500
y  <- rnorm(n_obs, 10, 2.89)
x1 <- rnorm(n_obs, 0.01235657, 0.45)
x2 <- rnorm(n_obs, 10, 3.21)
X  <- cbind(x1, x2)

bFE <- fastLm(y ~ x1 + x2, method = 4)
bFE

Call:
fastLm.formula(formula = y ~ x1 + x2, method = 4)

Coefficients:
(Intercept)  x1  x2 
9.94832839474159414 0.12293 0.00440078989949841 
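For comparison, a base-R sketch of what an SVD-based least-squares solve does: with X = U D V', the coefficients are V D^-1 U'y. This only illustrates the technique; it is not RcppEigen's implementation. The intercept column is added to X explicitly, and a seed is set so the comparison with lm() is reproducible.

```r
set.seed(1)
n_obs <- 1500
y  <- rnorm(n_obs, 10, 2.89)
x1 <- rnorm(n_obs, 0.01235657, 0.45)
x2 <- rnorm(n_obs, 10, 3.21)
X  <- cbind(1, x1, x2)                 # intercept column included explicitly

s <- svd(X)                            # X = U %*% diag(d) %*% t(V)
beta_svd <- drop(s$v %*% (crossprod(s$u, y) / s$d))

beta_lm <- unname(coef(lm(y ~ x1 + x2)))
print(all.equal(beta_svd, beta_lm))    # both solve the same least-squares problem
```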


Best,

Raluca





--
View this message in context: 
http://r.789695.n4.nabble.com/Error-in-lm-with-very-small-close-to-zero-regressor-tp4705185p4705328.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Multiple Plots using ggplot

2015-03-31 Thread Frederic Ntirenganya
Hi All,

Thanks for the help. I want to plot some of the columns on the same graph,
not all of them. Sorry, I failed to follow the instructions. Here is the
output of *dput()*, but I don't know how it works.

> dput(head(data))
structure(list(Date = structure(c(-6575, -6209, -5844, -5479,
-5114, -4748), class = "Date"), Number.of.Rain.Days = c(86L,
96L, 114L, 119L, 123L, 124L), Total.rain = c(1139.952, 977.646,
1382.014, 1323.086, 1266.444, 1235.964), Start.of.Rain..i. = c(92L,
98L, 92L, 100L, 92L, 92L), Start.of.Rain..ii. = c(239L, 98L,
92L, 100L, 92L, 92L), Start.of.Rain..iii. = c(112L, 112L, 120L,
125L, 119L, 112L), Start.Rain..iv. = c(112L, 112L, 120L, 174L,
119L, 112L), End.of.Rain.Season = c(228L, 229L, 240L, 228L, 228L,
228L)), .Names = c("Date", "Number.of.Rain.Days", "Total.rain",
"Start.of.Rain..i.", "Start.of.Rain..ii.", "Start.of.Rain..iii.",
"Start.Rain..iv.", "End.of.Rain.Season"), row.names = c(NA, 6L
), class = "data.frame")

I think I need the subset function, then melt. Here is the approach I used:

d <- subset(df1,
select=c(Date,Start.of.Rain..i.,Start.of.Rain..ii.,Start.of.Rain..iii.))
d
d2 <- melt(d, id = 'Date', variable_name = 'Start')

ggplot(d2, aes(Date, value)) + geom_line(aes(colour = start), type = "h")

but the error is:

Don't know how to automatically pick scale for object of type function. Defaulting to continuous
Error in data.frame(colour = function (x, ...)  :
  arguments imply differing number of rows: 0, 183


Thanks,

Frederic.




On Tue, Mar 31, 2015 at 4:20 PM, stephen sefick ssef...@gmail.com wrote:

 Your data and post are still not provided in one of the formats described
 here:
 http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example.
 I am unsure of what you want to do, but I have made a reproducible example
 that might help.

 zz <- "Date Number.of.Rain.Days Total.rain Start.of.Rain..i. Start.of.Rain..ii. Start.of.Rain..iii.
 1952-01-01  86 1139.952  92 239 11
 1953-01-01  96  977.646  98  98 11
 1954-01-01 114 1382.014  92  92 12
 1955-01-01 119 1323.086 100 100 12
 1956-01-01 123 1266.444  92  92 11
 1957-01-01 124 1235.964  92  92 11"

 library(reshape)
 library(ggplot2)

 Data <- read.table(text = zz, header = TRUE)

 df1 <- data.frame(Data)

 df2 <- melt(df1, id = c('Date', 'Number.of.Rain.Days'))

 df3 <- df2[-grep("Total.rain", df2$variable), ]

 qplot(Date, value, data = df3) + facet_wrap(~variable)


Re: [R] Multiple Plots using ggplot

2015-03-31 Thread stephen sefick
The error message is very informative. You named a column in the melted
data "Start", and told ggplot to use "start". start is a function. R is
case sensitive.
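A minimal illustration of this point, using a made-up stand-in for the melted data whose id column is named Start: names are case sensitive, and since no column is called start, ggplot falls back to the enclosing environment, where start is the stats function start(). That would explain the message about an object of type function. (If melt() ignored variable_name, as John Kane suggests later in the thread, the column would instead be called variable.)

```r
# Made-up stand-in for the melted data, with the id column named "Start"
d2 <- data.frame(Start = factor(c("i", "ii", "iii")), value = c(92, 239, 112))

"Start" %in% names(d2)    # TRUE
"start" %in% names(d2)    # FALSE: names are case sensitive
is.function(start)        # TRUE: start() is a function in the stats package
```

With this naming, aes(colour = Start) refers to the column.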


Re: [R] Plotting using tapply function output

2015-03-31 Thread John Kane
Reproducibility
https://github.com/hadley/devtools/wiki/Reproducibility
 
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example


John Kane
Kingston ON Canada


 -Original Message-
 From: amc5...@gmail.com
 Sent: Mon, 30 Mar 2015 16:07:05 -0700
 To: r-help@r-project.org
 Subject: [R] Plotting using tapply function output
 
 Hello,
 
 I am trying to plot the hourly standard deviation of wind speeds from
 13 different measured locations over many years. I imported the data
 using readLines into a data frame called finalData. Using tapply, I
 determined the standard deviation of the wind speed (ws) for each hour
 (hour) from every location (stn) using this command line:
 
 statHour = tapply(finalData$ws,list(finalData$stn,finalData$hour),sd)
 
 I want to plot the standard deviation for each hour of the day, with
 hours as the x-axis and the standard deviation for the y-axis, and
 each station as a different color.  I've managed to get a boxplot of
 this, but ideally, I'd like a scatter plot to determine the variations
 between each instrument throughout the day.  The boxplot command is
 this:
 
 boxplot(statHour, names=colnames(statHour), xlab='Hour of the Day',
 ylab='Standard Deviation of Wind Speed')
 
 I also tried to make a data frame from the tapply output, but the hours
 end up as the column names instead of as a column in the data frame.
 Please help!!
 
 I have R version 3.1.1
 
 Thanks a lot,
 Alexandra
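A minimal sketch of two ways to get the scatter plot, assuming finalData has the columns ws, hour and stn described above; the simulated data below are made up purely so the example runs. The tapply() result is a matrix with stations in rows and hours in columns, so matplot() can draw one coloured series per station after transposing it, and as.data.frame(as.table(...)) turns the matrix into a long data frame with the hour as a column rather than as column names.

```r
# Made-up stand-in for finalData: 3 stations, 24 hours, 10 readings per hour.
set.seed(1)
finalData <- data.frame(
  stn  = rep(c("S1", "S2", "S3"), each = 240),
  hour = rep(rep(0:23, each = 10), times = 3),
  ws   = abs(rnorm(720, mean = 5, sd = 2))
)

# Same call as in the question: rows are stations, columns are hours.
statHour <- tapply(finalData$ws, list(finalData$stn, finalData$hour), sd)

# Scatter plot: matplot() draws one series per column, so transpose the matrix
# to get one coloured series per station.
hours <- as.numeric(colnames(statHour))
cols  <- seq_len(nrow(statHour))
matplot(hours, t(statHour), type = "p", pch = 19, col = cols,
        xlab = "Hour of the Day", ylab = "Standard Deviation of Wind Speed")
legend("topright", legend = rownames(statHour), col = cols, pch = 19)

# Long data frame: one row per station and hour, with hour as a column.
statLong <- as.data.frame(as.table(statHour))
names(statLong) <- c("stn", "hour", "sd")
statLong$hour <- as.numeric(as.character(statLong$hour))
head(statLong)
```

The long form is also what ggplot2 expects if you later prefer ggplot(statLong, aes(hour, sd, colour = stn)) + geom_point().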
 
 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




Re: [R] Multiple Plots using ggplot

2015-03-31 Thread John Kane
Hi Frederic,

Thanks for sending the data in dput() format. All it does is convert a data set
into a standardized format (a perfect copy) that anyone with R can read. People
have different setups and defaults for reading data, so what you read into R
as a character variable may be a factor when I read it in, and we can have
serious problems just trying to decide what the data look like.
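A small illustration of that round trip, using a made-up two-column data frame: dput() writes R code that rebuilds the object, and reading it back with dget() (or by pasting the printed text into R) restores the same column types, so a factor stays a factor.

```r
# A made-up data frame with one factor column and one numeric column.
df <- data.frame(site = c("a", "b"), rain = c(1.5, 2), stringsAsFactors = TRUE)

dput(df)                      # prints the structure(list(...)) text to paste in an email

tf <- tempfile(fileext = ".R")
dput(df, file = tf)           # write the same text to a file
copy <- dget(tf)              # read it back, as a helper on the list would

str(copy)                     # site is still a factor, rain is still numeric
```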

I had a look at your code and it is confused. See my comments below.

d <- subset(df1,
            select = c(Date, Start.of.Rain..i., Start.of.Rain..ii., Start.of.Rain..iii.))

d 

d2 <- melt(d, id = 'Date', variable_name = 'Start')

# You do not have any variable in your data.frame called “Start”

# reshape2 seems to have just ignored variable_name = 'Start' and did the melt
# based on id = 'Date'. Strange, I would have expected an error, but it worked!

d2 <- melt(d, id = 'Date') will give you exactly the same result.

ggplot(d2, aes(Date, value)) + geom_line(aes(colour = start), type = "h")

Again, you do not have a variable (column name) called 'start'. You have three
column names (variables) in d2: Date, variable and value.

ggplot(d2, aes(Date, value)) + geom_line(aes(colour = start), type = "h")

Point one, you have no variable called start.  

Point two, what is type = "h" doing here? As far as I can see, it is not an
option in geom_line. See ?geom_line for this point.

 I think you are confusing basic graphics commands (type =)  with ggplot 
commands. Have a look at 
http://www.cookbook-r.com/Graphs/Shapes_and_line_types/ for some examples that 
show the differences.

Below is what I think you may be trying to do (note I use dat1 for the 
data.frame rather than your df1).

###==
library(reshape2)  # melt()
library(ggplot2)

dat1 <- structure(list(Date = structure(c(-6575, -6209, -5844, -5479,
-5114, -4748), class = "Date"), Number.of.Rain.Days = c(86L,
96L, 114L, 119L, 123L, 124L), Total.rain = c(1139.952, 977.646,
1382.014, 1323.086, 1266.444, 1235.964), Start.of.Rain..i. = c(92L,
98L, 92L, 100L, 92L, 92L), Start.of.Rain..ii. = c(239L, 98L,
92L, 100L, 92L, 92L), Start.of.Rain..iii. = c(112L, 112L, 120L,
125L, 119L, 112L), Start.Rain..iv. = c(112L, 112L, 120L, 174L,
119L, 112L), End.of.Rain.Season = c(228L, 229L, 240L, 228L, 228L,
228L)), .Names = c("Date", "Number.of.Rain.Days", "Total.rain",
"Start.of.Rain..i.", "Start.of.Rain..ii.", "Start.of.Rain..iii.",
"Start.Rain..iv.", "End.of.Rain.Season"), row.names = c(NA, 6L
), class = "data.frame")

dd <- subset(dat1,
             select = c(Date, Start.of.Rain..i., Start.of.Rain..ii., Start.of.Rain..iii.))

d2 <- melt(dd, id = 'Date')

ggplot(d2, aes(Date,value)) + geom_line(aes(colour = variable))

ggplot(d2, aes(Date, value)) +
   geom_histogram(position = "dodge", stat = "identity", aes(fill = variable))

##
John Kane
Kingston ON Canada



Re: [R] Calculating Kendall's tau

2015-03-31 Thread Bert Gunter
This sounds like homework. Homework is discouraged on this list (but
you might get lucky).

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom.
Clifford Stoll





Re: [R] Error in lm() with very small (close to zero) regressor

2015-03-31 Thread William Dunlap
If you really want your coefficient estimates to be scale-equivariant you
should test those methods for such a thing.  E.g., here are functions that
let you check how scaling one predictor affects the estimated coefficients
- they should give the same results for any scale factor.

library(RcppEigen)  # fastLm(), used in g() and h()

f <-
function (scale=1, n=100, data=data.frame(Y=seq_len(n),
X1=sqrt(seq_len(n)), X2=log(seq_len(n))))
{
cf <- coef(lm(data=data, Y ~ X1 + I(X2/scale)))
cf * c(1, 1, 1/scale)
}
g <-
function (scale=1, n=100, data=data.frame(Y=seq_len(n),
X1=sqrt(seq_len(n)), X2=log(seq_len(n))))
{
cf <- coef(fastLm(data=data, Y ~ X1 + I(X2/scale), method=4))
cf * c(1, 1, 1/scale)
}
h <-
function (scale=1, n=100, data=data.frame(Y=seq_len(n),
X1=sqrt(seq_len(n)), X2=log(seq_len(n))))
{
cf <- coef(fastLm(data=data, Y ~ X1 + I(X2/scale), method=5))
cf * c(1, 1, 1/scale)
}

See how they compare for scale factors between 10^-15 and 10^15.  lm() is
looking pretty good.
> options(digits=4)
> scale <- 10 ^ seq(-15,15,by=5)
> sapply(scale, f)
               [,1]    [,2]    [,3]    [,4]    [,5]    [,6]    [,7]
(Intercept)  -9.393  -9.393  -9.393  -9.393  -9.393  -9.393  -9.393
X1           19.955  19.955  19.955  19.955  19.955  19.955  19.955
I(X2/scale) -20.372 -20.372 -20.372 -20.372 -20.372 -20.372 -20.372
> sapply(scale, g)
                 [,1]    [,2]    [,3]    [,4]    [,5]    [,6]       [,7]
(Intercept) 0.000e+00  -9.393  -9.393  -9.393  -9.393  -9.393 -3.126e+01
X1          2.772e-29  19.955  19.955  19.955  19.955  19.955  1.218e+01
I(X2/scale) 1.474e+01 -20.372 -20.372 -20.372 -20.372 -20.372 -2.892e-29
> sapply(scale, h)
                 [,1]      [,2]    [,3]    [,4]    [,5]       [,6]       [,7]
(Intercept) 0.000e+00 3.807e-20  -9.395  -9.393  -9.393 -3.126e+01 -3.126e+01
X1          2.945e-29 2.772e-19  19.954  19.955  19.955  1.218e+01  1.218e+01
I(X2/scale) 1.474e+01 1.474e+01 -20.369 -20.372 -20.372 -2.892e-19  6.596e-30
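The same kind of check can be run on a design like the one in Raluca's message quoted below. This is a minimal sketch that copies her simulation parameters; the seed and the 1e-12 shrink factor are arbitrary illustration choices. Shrinking x1 by a factor s should multiply its lm() coefficient by 1/s and leave the other coefficients unchanged.

```r
# Simulated design copied from the post below; seed and s are arbitrary.
set.seed(123)
n_obs <- 1500
y  <- rnorm(n_obs, 10, 2.89)
x1 <- rnorm(n_obs, 0.01235657, 0.45)
x2 <- rnorm(n_obs, 10, 3.21)

s <- 1e-12                                   # make x1 "very close to zero"
b_orig   <- coef(lm(y ~ x1 + x2))
b_scaled <- coef(lm(y ~ I(x1 * s) + x2))

# Undo the scaling: multiplying the x1 coefficient back by s should recover
# the unscaled estimate if lm() is scale-equivariant for this design.
b_back <- b_scaled * c(1, s, 1)
all.equal(unname(b_orig), unname(b_back))
```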



Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Mar 31, 2015 at 5:10 AM, RiGui raluca@business.uzh.ch wrote:

 I found a fix to my problem using fastLm() from the package RcppEigen,
 with either the Jacobi singular value decomposition (SVD) (method 4) or
 the method based on the eigenvalue-eigenvector decomposition of X'X
 (method 5 of fastLm).



 install.packages("RcppEigen")
 library(RcppEigen)

 n_obs <- 1500
 y  <- rnorm(n_obs, 10, 2.89)
 x1 <- rnorm(n_obs, 0.01235657, 0.45)
 x2 <- rnorm(n_obs, 10, 3.21)
 X  <- cbind(x1, x2)



 bFE <- fastLm(y ~ x1 + x2, method = 4)
 bFE

 Call:
 fastLm.formula(formula = y ~ x1 + x2, method = 4)

 Coefficients:
 (Intercept)  x1  x2
 9.94832839474159414 0.12293 0.00440078989949841


 Best,

 Raluca







Re: [R] Calculating Kendall's tau

2015-03-31 Thread Bert Gunter
OK.

But always reply to the list (which I am ccing here) so that everyone
knows -- and re-submit your OP in **PLAIN TEXT**, not html, as this is
a plain text  list and html typically garbles everything.

Also, reading and following the posting guide (see end of this email)
generally improves your chance of getting useful help.

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom.
Clifford Stoll




On Tue, Mar 31, 2015 at 9:24 AM, Desta Yoseph desta...@yahoo.com wrote:
 Dear Bert,
 It is not homework. My real work involves 10,360 samples, but if someone
 showed me how to do it for the 31-sample dataset, I could manage the larger
 one. Hopefully this explains why I really want someone's help.
 Cheers




[R] Calculating Kendall's tau

2015-03-31 Thread Desta Yoseph via R-help
I am analyzing trends using the Mann-Kendall test for 31 independent samples;
each sample has a 34-year dataset. I need to find Kendall's “tau” for each
sample. The data are arranged column-wise (I attached the data). To find
Kendall's tau, I wrote this R script:

desta <- read.csv("rainfall.csv", header=T, sep=",")
require(Kendall)
MK <- function(y) {
  nc <- ncol(y)
  MannKendalltau <- numeric(nc)
  for (i in 2:nc) {
    MannKendalltau[i] <- MannKendall(y[, i])
  }
  MannKendalltau
}
MK(desta)

The displayed result shows both “tau” and the “2-sided p-value” in a
disorganized way, but I want only the “tau” values, presented in an orderly
manner. Can anyone tell me how to get the “tau” values displayed neatly? Here
is my sample result:

[[1]]
[1] 0

[[2]]
[1] 0.4352941
attr(,"Csingle")
[1] TRUE

[[3]]
[1] 0.5462185
attr(,"Csingle")
[1] TRUE

[[4]]
[1] 0.4218487
attr(,"Csingle")
[1] TRUE

Thank you for your guidance.
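Judging from the printed output, each MannKendall() result (a list) is being assigned into one element of a numeric vector, so the vector turns into a list that keeps only the first component (tau) with its "Csingle" attribute. A minimal sketch that returns a plain named vector of tau values, one per column, follows. It assumes, as the loop starting at 2 suggests, that the first column of the file is the year; the simulated data stand in for rainfall.csv. It uses base R's cor(method = "kendall") between each series and time, which should agree with MannKendall()'s tau when a series has no ties (with ties the two may differ slightly, so compare before relying on it). The guarded block at the end shows the Kendall-package route of keeping only the $tau component.

```r
# Made-up stand-in for rainfall.csv: a year column followed by 31 series of
# 34 annual values (the real file's layout is an assumption).
set.seed(42)
desta <- data.frame(year = 1981:2014,
                    matrix(rnorm(34 * 31), nrow = 34,
                           dimnames = list(NULL, paste0("st", 1:31))))

# Kendall's tau between each series and time, returned as a plain named vector.
mk_tau <- function(y, time = seq_len(nrow(y))) {
  sapply(y, function(z) cor(time, z, method = "kendall"))
}

taus <- mk_tau(desta[, -1])   # skip the year column, like the loop from 2:nc
round(taus, 3)

# With the Kendall package installed, keep only the $tau component instead.
if (requireNamespace("Kendall", quietly = TRUE)) {
  taus_pkg <- sapply(desta[, -1],
                     function(z) as.numeric(Kendall::MannKendall(z)$tau))
  print(round(taus_pkg, 3))
}
```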


Re: [R] changing column labels for data frames inside a list

2015-03-31 Thread r-help
 Date: Mon, 30 Mar 2015 09:54:39 -0400
 From: Vikram Chhatre crypticline...@gmail.com
 To: r-help@r-project.org
 Subject: [R] changing column labels for data frames inside a list

  summary(mygenfreqt)
   Length Class  Mode
 dat1.str 59220  -none- numeric
 dat2.str 59220  -none- numeric
 dat3.str 59220  -none- numeric

  head(mylist[[1]])
             1     2     3     4     5     6     7     8     9    10  11    12
 L0001.1 0.60 0.500 0.325 0.675 0.600 0.500 0.500 0.375 0.550 0.475 0.3 0.275
 L0001.2 0.40 0.500 0.675 0.325 0.400 0.500 0.500 0.625 0.450 0.525 0.6 0.725

 I want to change 1:12 to pop1:pop12

 mylist <- lapply(mylist, function(e) colnames(e) <- paste0('pop',1:12))

 What this is doing is replacing the data frames with just names
 pop1:pop12.  I just want to replace the column labels.

 Thanks for any suggestions.

Some readers have already replied, but here is another option that exploits 
lapply()'s ... parameter.  First, we make a reproducible example.

(lista <- list(mtcars, mtcars))

Now, we get the unique number of columns of the data frames in the variable 
lista.

(n.cols <- unique(sapply(lista, ncol)))

Finally, we call lapply() and `colnames<-` to change the column names of both 
data frames in lista.  See lapply()'s ... parameter (?lapply).

(lista <- lapply(X = lista, FUN = `colnames<-`, paste0("pop", seq_len(n.cols))))
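Why the original call returned only names: in R, an assignment such as colnames(e) <- ... evaluates to its right-hand side, so the anonymous function returned the vector of new names instead of e. A minimal sketch of the usual fix, returning the modified data frame explicitly (the two small data frames are made up for illustration):

```r
# Two made-up 2 x 12 data frames standing in for the list in the question.
mylist <- list(dat1 = data.frame(matrix(runif(24), nrow = 2)),
               dat2 = data.frame(matrix(runif(24), nrow = 2)))

# Modify the copy inside the function, then return it as the last expression.
mylist <- lapply(mylist, function(e) {
  colnames(e) <- paste0("pop", seq_len(ncol(e)))
  e
})
names(mylist$dat1)
```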



Re: [R] how to deal with changing weighting functions

2015-03-31 Thread Adams, Jean
Can you give a concrete simple example of inputs with expected results?  Is
phi a function?  Of omega 1 and 2?  Is the summation over everything
through V_d-k?

On Mon, Mar 30, 2015 at 2:58 PM, T.Riedle tr...@kent.ac.uk wrote:

 Hi everybody,
 Does anybody have an idea how I can generate tau according to the attached
 formula? The point is that phi changes with k, and I thought I could do it
 with a for loop in R, but I am not sure how.

 Could anyone help me?
 Thanks in advance.
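The attached formula is not reproduced here, so the following is only a sketch under two loudly stated assumptions: that phi is a normalized beta-type weight, phi(k) proportional to (k/K)^(w1 - 1) * (1 - k/K)^(w2 - 1) as in the phi() function discussed in this thread, and that tau at day d is a weighted sum over lags, tau_d = sum over k = 1..K of phi(k) * V[d - k]. The helper tau_at() and the series V are hypothetical. Because phi() is vectorized over k, no for loop is needed.

```r
# Normalized beta-type weights (assumed form); vectorized over k.
phi <- function(w, k, K) {
  num <- (k / K)^(w[1] - 1) * (1 - k / K)^(w[2] - 1)
  den <- sum(((1:K) / K)^(w[1] - 1) * (1 - (1:K) / K)^(w[2] - 1))
  num / den
}

# Hypothetical aggregation: tau_d = sum_{k=1}^{K} phi(k) * V[d - k].
tau_at <- function(V, d, w, K) {
  k <- 1:K
  sum(phi(w, k, K) * V[d - k])
}

V <- as.numeric(1:50)                  # made-up series
weights <- phi(c(1, 5), 1:10, 10)      # weights for lags 1..10
sum(weights)                           # weights sum to 1 by construction
tau_at(V, d = 20, w = c(1, 1), K = 10) # equal weights: mean of V[10:19]
```

With w = c(1, 1) every weight equals 1/K, so tau is simply the average of the last K values, which is a handy sanity check.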




[R] Randomly interleaving data frames while preserving order

2015-03-31 Thread Kevin E. Thorpe

Hello.

I am trying to simulate recruitment in a randomized trial. Suppose I 
have three streams (strata) of patients represented by these data frames.


df1 <- data.frame(strat=rep(1,10),id=1:10,pid=1001:1010)
df2 <- data.frame(strat=rep(2,10),id=1:10,pid=2001:2010)
df3 <- data.frame(strat=rep(3,10),id=1:10,pid=3001:3010)

What I need to do is construct a data frame with all of these combined 
where the order of selection from one of the three data frames is 
randomized but once a stratum is selected patients are selected 
sequentially from that data frame.


To see what I'm looking to achieve, suppose the first five subjects were 
to come, in order, from strata (data frames) 1, 2, 1, 3 and 2. The 
expected result should look like this:


rbind(df1[1,],df2[1,],df1[2,],df3[1,],df2[2,])
   strat id  pid
1  1  1 1001
2  2  1 2001
21 1  2 1002
4  3  1 3001
22 2  2 2002

I hope what I'm trying to accomplish makes sense. Maybe I'm missing 
something obvious, but I really have no idea at the moment how to 
achieve this elegantly. Since I need to simulate many trial recruitments 
it needs to be general and compact.


I appreciate any advice.

Kevin

--
Kevin E. Thorpe
Head of Biostatistics,  Applied Health Research Centre (AHRC)
Li Ka Shing Knowledge Institute of St. Michael's
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.tho...@utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016
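One compact way to do this, sketched under the assumption that every patient in every stratum is eventually recruited and every ordering of stratum labels is equally likely: shuffle one stratum label per patient to get the recruitment order, then let the k-th occurrence of a stratum pick that stratum's k-th row. The helper name interleave() is hypothetical, not an existing function.

```r
df1 <- data.frame(strat = rep(1, 10), id = 1:10, pid = 1001:1010)
df2 <- data.frame(strat = rep(2, 10), id = 1:10, pid = 2001:2010)
df3 <- data.frame(strat = rep(3, 10), id = 1:10, pid = 3001:3010)

# Interleave any number of data frames in a random stratum order while keeping
# each data frame's own rows in their original sequence.
interleave <- function(...) {
  dfs    <- list(...)
  sizes  <- vapply(dfs, nrow, integer(1))
  pooled <- do.call(rbind, dfs)
  # One stratum label per patient, shuffled: this is the recruitment order.
  labels <- rep(seq_along(dfs), times = sizes)
  draw   <- labels[sample.int(length(labels))]
  # The k-th time stratum s is drawn, take the k-th row of that stratum.
  within <- ave(seq_along(draw), draw, FUN = seq_along)
  offset <- c(0L, cumsum(sizes))[draw]
  out <- pooled[offset + within, ]
  rownames(out) <- NULL
  out
}

set.seed(2015)
recruited <- interleave(df1, df2, df3)
head(recruited)
```

If strata should instead be drawn with unequal probabilities, or recruitment can stop before a stratum is exhausted, the shuffled label vector would need to be generated differently; the row-picking step stays the same.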



Re: [R] Multiple Plots using ggplot

2015-03-31 Thread Frederic Ntirenganya
Hi John,

Sorry for providing useless data earlier.
Here I am interested only in the Tmin and Tmax columns. I want to use the same
approach as with the previous data, plotting both on the same graph rather
than on separate graphs. Thanks

dput(head(BUTemp))
structure(list(Year = c(1971L, 1971L, 1971L, 1971L, 1971L, 1971L
), Month = c(2L, 2L, 2L, 2L, 2L, 2L), Day = 1:6, Rain = c(0,
0, 0, 0, 0, 0), Tmax = c(24.3, 25, 25.6, 26.5, 27.8, 27.5), Tmin = c(13.5,
13.2, 12.7, 12.7, 12.2, 14)), .Names = c("Year", "Month", "Day",
"Rain", "Tmax", "Tmin"), row.names = c(NA, 6L), class = "data.frame")
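A minimal sketch for the Tmin/Tmax case, using the BUTemp rows from the dput() output above. It assumes Year, Month and Day should be combined into a Date for the x-axis, and it does the reshape in base R (melt() from reshape2, as used earlier in this thread, would give the same Date/variable/value layout). Mapping colour to variable inside aes() draws both series on one graph with a legend; the plot is only drawn if ggplot2 is installed.

```r
BUTemp <- structure(list(Year = c(1971L, 1971L, 1971L, 1971L, 1971L, 1971L),
  Month = c(2L, 2L, 2L, 2L, 2L, 2L), Day = 1:6, Rain = c(0, 0, 0, 0, 0, 0),
  Tmax = c(24.3, 25, 25.6, 26.5, 27.8, 27.5),
  Tmin = c(13.5, 13.2, 12.7, 12.7, 12.2, 14)),
  .Names = c("Year", "Month", "Day", "Rain", "Tmax", "Tmin"),
  row.names = c(NA, 6L), class = "data.frame")

# Build a proper Date column for the x-axis.
BUTemp$Date <- as.Date(sprintf("%d-%02d-%02d", BUTemp$Year, BUTemp$Month, BUTemp$Day))

# Long format: one row per date and variable (Tmax or Tmin).
temps <- data.frame(Date     = rep(BUTemp$Date, 2),
                    variable = rep(c("Tmax", "Tmin"), each = nrow(BUTemp)),
                    value    = c(BUTemp$Tmax, BUTemp$Tmin))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(temps, aes(Date, value, colour = variable)) +
    geom_line() + geom_point()
  print(p)
}
```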

Regards,

Frederic.



Frederic Ntirenganya
Maseno University,
African Maths Initiative,
Kenya.
Mobile:(+254)718492836
Email: fr...@aims.ac.za
https://sites.google.com/a/aims.ac.za/fredo/

On Tue, Mar 31, 2015 at 4:46 PM, Frederic Ntirenganya ntfr...@gmail.com
wrote:

 Hi All,

 Thanks for the help. I want to plot some of the columns on the same graph
 not all of them. Sorry, I failed to follow the instructions. Here is the
 output of *dput()* but I don't know how it works.

 dput(head(data))
 structure(list(Date = structure(c(-6575, -6209, -5844, -5479,
 -5114, -4748), class = "Date"), Number.of.Rain.Days = c(86L,
 96L, 114L, 119L, 123L, 124L), Total.rain = c(1139.952, 977.646,
 1382.014, 1323.086, 1266.444, 1235.964), Start.of.Rain..i. = c(92L,
 98L, 92L, 100L, 92L, 92L), Start.of.Rain..ii. = c(239L, 98L,
 92L, 100L, 92L, 92L), Start.of.Rain..iii. = c(112L, 112L, 120L,
 125L, 119L, 112L), Start.Rain..iv. = c(112L, 112L, 120L, 174L,
 119L, 112L), End.of.Rain.Season = c(228L, 229L, 240L, 228L, 228L,
 228L)), .Names = c("Date", "Number.of.Rain.Days", "Total.rain",
 "Start.of.Rain..i.", "Start.of.Rain..ii.", "Start.of.Rain..iii.",
 "Start.Rain..iv.", "End.of.Rain.Season"), row.names = c(NA, 6L
 ), class = "data.frame")

 I think I need the subset function and then melt. Here is the approach I used:

 d <- subset(df1,
             select=c(Date,Start.of.Rain..i.,Start.of.Rain..ii.,Start.of.Rain..iii.))
 d
 d2 <- melt(d, id = 'Date', variable_name = 'Start')

 ggplot(d2, aes(Date,value)) + geom_line(aes(colour = start), type = "h")

  but the error is:

 Don't know how to automatically pick scale for object of type function. Defaulting to continuous
 Error in data.frame(colour = function (x, ...)  :
   arguments imply differing number of rows: 0, 183


 Thanks,

 Frederic.



 Frederic Ntirenganya
 Maseno University,
 African Maths Initiative,
 Kenya.
 Mobile:(+254)718492836
 Email: fr...@aims.ac.za
 https://sites.google.com/a/aims.ac.za/fredo/

 On Tue, Mar 31, 2015 at 4:20 PM, stephen sefick ssef...@gmail.com wrote:

 Your data and post are still not provided in one of the formats described
 here:
 http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example.
 I am unsure of what you want to do, but I have made a reproducible example
 that might help.

 zz <- "Date Number.of.Rain.Days Total.rain Start.of.Rain..i. Start.of.Rain..ii. Start.of.Rain..iii.
  1952-01-01  86 1139.952  92 239 112
  1953-01-01  96  977.646  98  98 112
  1954-01-01 114 1382.014  92  92 120
  1955-01-01 119 1323.086 100 100 125
  1956-01-01 123 1266.444  92  92 119
  1957-01-01 124 1235.964  92  92 112"

 library(reshape)
 library(ggplot2)

 Data <- read.table(text=zz, header = TRUE)

 df1 <- data.frame(Data)

 df2 <- melt(df1, id = c('Date', 'Number.of.Rain.Days'))

 df3 <- df2[-grep("Total.rain", df2$variable),]

 qplot(Date,value, data=df3) +facet_wrap(~variable)

 On Tue, Mar 31, 2015 at 2:55 AM, Frederic Ntirenganya ntfr...@gmail.com
 wrote:

  Hi All,

 Sorry that my data were not in good shape. This is what my data look like.

 I want to plot multiple columns of a data frame on one graph using ggplot.
 I want to plot only Start.of.Rain..i., Start.of.Rain..ii. and
 Start.of.Rain..iii., but I have not managed it. What I want is to compare
 Start.of.Rain..i., Start.of.Rain..ii. and Start.of.Rain..iii. by plotting
 vertical lines. I also need to add points to the plot to be able to tell
 them apart. The x-axis must be the Date column. Thanks!

 Here is what the data look like and how I tried to do it.



 Date       Number.of.Rain.Days Total.rain Start.of.Rain..i. Start.of.Rain..ii. Start.of.Rain..iii.
 1952-01-01                  86   1139.952                92                239                 112
 1953-01-01                  96    977.646                98                 98                 112
 1954-01-01                 114   1382.014                92                 92                 120
 1955-01-01                 119   1323.086               100                100                 125
 1956-01-01                 123   1266.444                92                 92                 119
 1957-01-01                 124   1235.964                92                 92                 112


 Here is how I tried to solve the problem.

 df1 <- data.frame(data)
 df1
 df2 <- melt(df1, id = 'Date', variable_name = 'start of Rains')
 df2

 ggplot(df2, aes(Date,value)) + geom_line(aes(colour = "red"), type = "h")

 Kindly any help is welcome. Thanks

 Regards,
 Frederic.

 

Re: [R] data.frame: data-driven column selections that vary by row??

2015-03-31 Thread John Kane
I think we need some data and code 
Reproducibility
https://github.com/hadley/devtools/wiki/Reproducibility
 
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example



John Kane
Kingston ON Canada


 -Original Message-
 From: r...@catwhisker.org
 Sent: Mon, 30 Mar 2015 06:50:59 -0700
 To: r-help@r-project.org
 Subject: [R] data.frame: data-driven column selections that vary by row??
 
 Sorry if that's confusing: I'm probably confused. :-(
 
 I am collecting and trying to analyze data regarding performance of
 computer systems.
 
 After extracting the data from its repository, I have created and
 used a Perl script to generate a (relatively) simple CSV, each
 record of which contains:
 * a POSIXct timestamp
 * a hostname
 * a collection of metrics for the interval identified by the timestamp,
   and specific to the host in question, as well as some factors to
   group the hosts (e.g., whether it's in a control vs. a test
   group; a broad categorization of how the host is provisioned; which
   version of the software it was running at the time...).  (Each
   metric and factor is in a uniquely-named column.)
 
 As extracted from the repository, there were several records for each
 such hostname/timestamp pair -- e.g., there would be separate records
 for:
 * Input bandwidth utilization for network interface 1
 * Output bandwidth utilization for network interface 1
 * Input bandwidth utilization for network interface 2
 * Output bandwidth utilization for network interface 2
 
 (And the same field would be used for each of these -- the
 interpretation being driven by the content of other fields in the
 record.)
 
 Working with the data as described (immediately) above directly in R
 seemed... daunting, at best: thus the excursion into Perl.
 
 And for some of the data, what I have works well enough.
 
 But now I also want to analyze information from disk drives, and things
 get messy (as far as I can see).
 
 First, each disk drive has a collection of 17 metrics (such as
 busy_pct, kb_per_transfer_read, and transfers_per_second_write),
 as well as a factor (dev_type).  Each also has a device name that is
 unique within the host where it resides (e.g. da1, da2, da3).
 (The dev_type factor identifies whether the drive is a solid-state
 device or a spinning disk.)
 
 I have thus made the corresponding columns unique by pasting the drive
 name and the name of the metric (or factor), separating the two with
 _ (e.g. da7_busy_pct; ada0_mb_per_second_write;
 ada4_queue_length).  I am not certain that's the best thing I could
 have done -- and I'm open to changing the approach.
 
 The challenge for me is that different (classes of) machines are
 provisioned differently; some consequences of that:
 * While da1 may be a spinning disk on host A, that has no bearing on
   whether or not the da1 on host B is a spinning disk or an SSD.
 * Host C may not even have a da1 device.
 * Host D may be of a type that normally has a da1, but in this case,
   the drive has failed and has been disabled (so host D won't report
   anything about da1).
 
 (I'm not too bothered about the non-reporting case, but cite it so we
 all know about it.)
 
 I expect I will want to be using groupings:
 * All disk devices -- this one is easy.
 * All SSD devices (excluding spinning disks).
 * All spinning disks (excluding SSDs).
 
 I'm having trouble with the latter two (though, certainly, if I solve
 one, the other is also solved).
 
 Also, for some  of the metrics, I will want to sum them; for others,
 I will want to do other things -- find minima or maxima, or average
 them.  So pre-calculating such aggregates in the Perl script isn't
 something that appeals to me.
 
 Finally (as far as complications go), I'm trying to write the code in
 such a way that if we deploy a new configuration of machine that has
 (say) twice as many drives as the biggest one we presently deploy, the
 code Just Works -- I shouldn't need to update the code merely to adapt
 to another hardware configuration.
 
 I have been able to write a function that takes the data.frame obtained
 by reading the above-cited CSV, and generates a data.frame with a row
 for each host, and depicts the dev_type for each device for that host;
 here's an abbreviated (and slightly redacted) copy of its output to
 illustrate some of the above:
 
        ada0 ada1 ada2 ada3 ada4 ada5 da30 da31 da32 da33 da34 da35 da36 da3
 host_A  ssd  ssd  hdd  hdd  hdd  hdd  hdd  hdd  hdd  hdd  hdd  hdd  hdd  hdd
 host_B  ssd  ssd  hdd  hdd  hdd  hdd  hdd  hdd  hdd  hdd  hdd  hdd  hdd  hdd
 host_G  ssd  ssd  ssd  ssd  ssd  ssd                                     ssd
 host_H  ssd  ssd  ssd  ssd  ssd  ssd                                     ssd
 host_M  ssd  ssd  ssd  ssd  ssd  ssd                                     ssd
 host_N  ssd  ssd  ssd  ssd  ssd  ssd                                     ssd
 
 (That function is written with the explicit assumption(!) that for the
 period covered by a given set of input data, a given host's
 configuration remains static: we won't have drives changing type
 mid-stream.)
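Along the lines of the long-format suggestions elsewhere in this thread, here is a minimal base-R sketch. It assumes device columns are named <device>_<metric> as described, and that a device a host does not have carries an empty or NA dev_type; the tiny data frame is invented for illustration. Because dev_type travels with each row after reshaping, the SSD/HDD classification can differ from host to host, and the code never lists device names explicitly, so a host with more drives needs no code change.

```r
# Invented wide data in the described layout: one row per host and timestamp.
wide <- data.frame(
  timestamp    = c(1426892400L, 1426892400L),
  hostname     = c("host_A", "host_C"),
  da1_busy_pct = c(79.1, 12.5),
  da1_dev_type = c("hdd", "ssd"),
  da2_busy_pct = c(62.8, NA),
  da2_dev_type = c("ssd", ""),          # host_C has no da2
  stringsAsFactors = FALSE
)

dev_cols <- grep("^[a-z]+[0-9]+_", names(wide), value = TRUE)
device   <- sub("_.*$", "", dev_cols)              # e.g. "da1"
metric   <- sub("^[a-z]+[0-9]+_", "", dev_cols)    # e.g. "busy_pct", "dev_type"

# One row per host, timestamp and device; the metrics become plain columns.
long <- do.call(rbind, lapply(unique(device), function(d) {
  keep <- device == d
  out  <- wide[, c("timestamp", "hostname", dev_cols[keep])]
  names(out) <- c("timestamp", "hostname", metric[keep])
  out$device <- d
  out
}))
long <- long[!is.na(long$dev_type) & long$dev_type != "", ]  # drop absent devices

# Any aggregate per device class; swap mean for sum, min or max as needed.
by_class <- aggregate(busy_pct ~ hostname + dev_type, data = long, FUN = mean)
by_class
```

The SSD-only or HDD-only groupings are then just subset(long, dev_type == "ssd") or subset(long, dev_type == "hdd").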

Re: [R] data.frame: data-driven column selections that vary by row??

2015-03-31 Thread Tom Wright
Nice clean-up!!!

On Tue, 2015-03-31 at 14:19 -0400, Ista Zahn wrote:
 library(tidyr)
 library(dplyr)
 bw <- gather(bw, key = tmp, value = value,
              matches("^d[a-z]+[0-9]+"))
 bw <- separate(bw, tmp, c("disc", "var"), "_", extra = "merge")
 bw <- spread(bw, var, value)



[R] How to obtain a cross tab count of unique values

2015-03-31 Thread Walter Anderson
I have a data frame that shows all of the parks (including duplicates)
that are impacted by each project's 'footprint':

PROJECT PARKNAME
A   PRK A
A   PRK B
A   PRK A
B   PRK C
B   PRK A
C   PRK B
C   PRK D
...

What I need is a cross tabulation that shows me the number of unique
parks for each project. If I use the standard table(df$PROJECT), it
reports:

A 3
B 2
C 2
...

where I need it to ignore duplicates and report:

A 2
B 2
C 2
...

Anyone have any suggestions on how to do this within the R paradigm?
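Two base-R ways to count distinct parks per project, sketched on the example rows above and assuming the columns are named PROJECT and PARKNAME as shown: drop duplicate project/park pairs with unique() before calling table(), or count distinct names within each project with tapply(). Both should report 2 for projects A, B and C.

```r
parks <- data.frame(
  PROJECT  = c("A", "A", "A", "B", "B", "C", "C"),
  PARKNAME = c("PRK A", "PRK B", "PRK A", "PRK C", "PRK A", "PRK B", "PRK D")
)

# Remove duplicate project/park rows first, then count rows per project.
counts <- table(unique(parks)$PROJECT)
counts

# Equivalent: number of distinct park names within each project.
counts2 <- tapply(parks$PARKNAME, parks$PROJECT, function(p) length(unique(p)))
counts2
```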

Walter Anderson



Re: [R] data.frame: data-driven column selections that vary by row??

2015-03-31 Thread Tom Wright
Not entirely sure I understand your problem here (your first email was a
lot of reading).

Would it make sense to add an extra column device_name?

Thus ending up with something like:
Host    Device  Type
host_A  ada0    ssd
host_A  ada1    ssd
host_A  ada2    hdd
...
host_N  da3     ssd


You could then subset this data frame:
subset(data, Type == "ssd" & Device == "ada0")

On Tue, 2015-03-31 at 10:22 -0700, David Wolfskill wrote:

Re: [R] idiom for constructing data frame

2015-03-31 Thread Ista Zahn
You can make it as elegant as you want, e.g.,

make.empty.df <- function(nrow, ncol, names) {
  if (length(names) != ncol) stop("Length of names must equal the number of columns")
  data.frame(matrix(NA, nrow, ncol, dimnames = list(NULL, names)))
}


Best,
Ista

On Tue, Mar 31, 2015 at 2:37 PM, Sarah Goslee sarah.gos...@gmail.com wrote:


Re: [R] idiom for constructing data frame

2015-03-31 Thread William Dunlap
You can use structure() to attach the names to a list that is input to
data.frame.
E.g.,

dfNames <- c("First", "Second Name")
data.frame(lapply(structure(dfNames, names=dfNames),
function(name)rep(NA_real_, 5)))


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Mar 31, 2015 at 11:37 AM, Sarah Goslee sarah.gos...@gmail.com
wrote:



Re: [R] idiom for constructing data frame

2015-03-31 Thread Sarah Goslee
Hi,

Duncan Murdoch suggested:

 The matrix() function has a dimnames argument, so you could do this:

 names <- c("strat", "id", "pid")
 data.frame(matrix(NA, nrow=10, ncol=3, dimnames=list(NULL, names)))

That's a definite improvement, thanks. But no way to skip matrix()? It
just seems unRlike, although since it's only full of NA values there
are no coercion issues with column types or anything, so it doesn't
hurt. It's just inelegant. :)

Sarah
-- 
Sarah Goslee
http://www.functionaldiversity.org



[R] Does fitCopula work for amhCopula and joeCopula?

2015-03-31 Thread Laura Gianfagna
Good evening. This is part of my routine, which calculates the copula
parameter and log-likelihood for each pair of rows of a data matrix, choosing,
for each pair, the copula that gives the maximum likelihood. If I run the
computation with only:

f <- frankCopula(2,2)
g <- gumbelCopula(2,2)
c <- claytonCopula(2,2)

the program works correctly and gives the expected results.

If I also insert:

a <- amhCopula(1,2)
j <- joeCopula(2,2)

then the program no longer works.

I tried it on samples such as:

n <- 1000
f <- frankCopula(20,2)
x_1 <- rCopula(n,f)
f <- gumbelCopula(50,2)
x_2 <- rCopula(n,f)
f <- joeCopula(70,2)
x_3 <- rCopula(n,f)
x <- cbind(x_1, x_2, x_3)
data <- t(x)
dim <- dim(data)[1]

Here is the relevant part of the code of Routine_Copula:

Routine_Copula <- function(data,dim){

  library(copula)
  library(gtools)

  n <- dim(data)[1];  # number of rows of the input matrix
  m <- dim(data)[2];  # number of columns of the input matrix

  # Probability integral transform of the data
  ecdf <- matrix(0,n,m);
  for (i in 1:n){
    e <- matrix(data[i,],m,1);
    #ecdf[i,] <- pobs(e);
    ecdf[i,] <- pobs(e, na.last=TRUE);
    # na.last controls the treatment of NAs. If TRUE, missing values in
    # the data are put last; if FALSE, they are put first; if NA, they are
    # removed; if "keep", they are kept with rank NA.
  }

  f <- frankCopula(2,2)
  g <- gumbelCopula(2,2)
  c <- claytonCopula(2,2)
  a <- amhCopula(1,2)
  j <- joeCopula(2,2)





[….]


  for (j in 1:n_comb){
    input <- t(ecdf[comb[,j],])

    try(summary <- fitCopula(f,input,method='mpl',start=2),silent=TRUE);
    resmatpar[j,1] <- summary@estimate;
    resmatllk[j,1] <- summary@loglik;

    try(summary <- fitCopula(g,input,method='mpl',start=2),silent=TRUE);
    resmatpar[j,2] <- summary@estimate;
    resmatllk[j,2] <- summary@loglik;

    try(summary <- fitCopula(c,input,method='mpl',start=2),silent=TRUE);
    resmatpar[j,3] <- summary@estimate;
    resmatllk[j,3] <- summary@loglik;

    try(summary <- fitCopula(a,input,method='mpl',start=1),silent=TRUE);
    resmatpar[j,4] <- summary@estimate;
    resmatllk[j,4] <- summary@loglik;

    try(summary <- fitCopula(j,input,method='mpl',start=2),silent=TRUE);
    resmatpar[j,5] <- summary@estimate;
    resmatllk[j,5] <- summary@loglik;

    d <- c(resmatllk[j,1],resmatllk[j,2],resmatllk[j,3],resmatllk[j,4],resmatllk[j,5]);

    copchoice[j] <- which(d==max(d));
    param[j] <- resmatpar[j,copchoice[j]];
    loglik[j] <- resmatllk[j,copchoice[j]];
  }


Thank you

Laura Gianfagna
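Two features of the routine as posted may be worth checking. These are observations about the code, not a confirmed diagnosis of the copula error. First, j is assigned joeCopula(2,2) and then reused as the loop index in for (j in 1:n_comb), so fitCopula(j, ...) receives an integer rather than a copula. Second, try(summary <- fitCopula(...), silent = TRUE) leaves summary unchanged when a fit fails, so the previous fit's estimate and log-likelihood are silently stored again. The base-R sketch below reproduces both effects without the copula package; the string stored in j and the names summary_val and fit_or_na are placeholders.

```r
# Stand-in for j <- joeCopula(2, 2): any object stored in j
j <- "joe copula object"
n_comb <- 3
for (j in 1:n_comb) {
  # inside the loop, j is the loop index, not the copula
}
print(j)  # the copula object has been replaced by the last index

# try() leaves the old value in place when the new call fails
summary_val <- 10
try(summary_val <- stop("fit failed"), silent = TRUE)
print(summary_val)  # still 10: the previous result is silently reused

# tryCatch() can return NA instead, so a failed fit stays visible
fit_or_na <- tryCatch(stop("fit failed"), error = function(e) NA_real_)
print(fit_or_na)
```

In the routine itself, giving the Joe copula object a name that is not used as the loop index, and storing NA when a fit fails, would separate a genuine fitting problem for the AMH or Joe copula from these two effects.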



[R] Fwd: non-conformable arguments

2015-03-31 Thread Soheila Khodakarim
Dear All,

I want to run a neural network on my data.

I ran this code:

#load mydata
dim(mydata)
# 20 3111
library(neuralnet)
fm <- as.formula(paste("resp ~", paste(colnames(mydata)[1:3110],
collapse="+")))
out <- neuralnet(fm,data=mydata, hidden = 4, lifesign = "minimal",
linear.output = FALSE, threshold = 0.1)
#load testset
dim(testset)
# 20 3111
out.results <- compute(out, testset)
Error in neurons[[i]] %*% weights[[i]] : non-conformable arguments

What should I do now?

Regards,
Soheila
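One likely cause, offered as a guess: the formula uses only colnames(mydata)[1:3110] as covariates, but testset is passed to compute() with all 3111 columns, including resp. Assuming compute() prepends a bias column of 1s to the covariates and multiplies by a first weight matrix with one row per covariate plus one, an extra column would make the two matrices non-conformable, and compute(out, testset[, 1:3110]) would be the fix. The sketch below does not call neuralnet; it only mimics that assumed first matrix product with small made-up dimensions (5 covariates instead of 3110), and weights1, x_all and x_cov are hypothetical names.

```r
set.seed(1)
n_cov    <- 5   # stands in for the 3110 covariates
n_hidden <- 4   # hidden = 4 in the post

# Assumed shape of the first weight matrix: (bias + covariates) x hidden units
weights1 <- matrix(rnorm((n_cov + 1) * n_hidden), nrow = n_cov + 1)

# A test set holding the covariates *and* the response column, like testset
testset <- data.frame(matrix(runif(20 * n_cov), nrow = 20))
testset$resp <- rbinom(20, 1, 0.5)

# Including resp gives one column too many, so %*% fails
x_all <- cbind(1, as.matrix(testset))
fails <- inherits(try(x_all %*% weights1, silent = TRUE), "try-error")

# Keeping only the covariate columns makes the product conformable
x_cov  <- cbind(1, as.matrix(testset[, 1:n_cov]))
hidden <- x_cov %*% weights1
dim(hidden)
```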



Re: [R] Randomly interleaving data frames while preserving order

2015-03-31 Thread Duncan Murdoch
On 31/03/2015 1:05 PM, Kevin E. Thorpe wrote:
 Hello.
 
 I am trying to simulate recruitment in a randomized trial. Suppose I 
 have three streams (strata) of patients represented by these data frames.
 
 df1 <- data.frame(strat=rep(1,10),id=1:10,pid=1001:1010)
 df2 <- data.frame(strat=rep(2,10),id=1:10,pid=2001:2010)
 df3 <- data.frame(strat=rep(3,10),id=1:10,pid=3001:3010)
 
 What I need to do is construct a data frame with all of these combined 
 where the order of selection from one of the three data frames is 
 randomized but once a stratum is selected patients are selected 
 sequentially from that data frame.
 
 To see what I'm looking to achieve, suppose the first five subjects were 
 to come, in order, from strata (data frames) 1, 2, 1, 3 and 2. The 
 expected result should look like this:
 
 rbind(df1[1,],df2[1,],df1[2,],df3[1,],df2[2,])
 strat id  pid
 1  1  1 1001
 2  2  1 2001
 21 1  2 1002
 4  3  1 3001
 22 2  2 2002
 
 I hope what I'm trying to accomplish makes sense. Maybe I'm missing 
 something obvious, but I really have no idea at the moment how to 
 achieve this elegantly. Since I need to simulate many trial recruitments 
 it needs to be general and compact.
 
 I appreciate any advice.

How about something like this:

# Permute an ordered vector of selections:
sel <- sample(c(rep(1, nrow(df1)), rep(2, nrow(df2)), rep(3, nrow(df3))))

# Create an empty dataframe to hold the results
df <- data.frame(strat=NA, id=NA, pid=NA)[rep(1, length(sel)),]

# Put the original dataframes into the appropriate slots:
df[sel == 1,] <- df1
df[sel == 2,] <- df2
df[sel == 3,] <- df3

# Clean up the rownames
rownames(df) <- NULL

Duncan Murdoch
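A sketch generalizing this idea to a list of any number of strata. Instead of filling an empty data frame, it stacks the strata and computes, for each draw, the row index of the next unused patient in the chosen stratum. interleave_strata, pool, offsets and within are hypothetical names; df1 to df3 are Kevin's example strata.

```r
# Kevin's three strata
df1 <- data.frame(strat = rep(1, 10), id = 1:10, pid = 1001:1010)
df2 <- data.frame(strat = rep(2, 10), id = 1:10, pid = 2001:2010)
df3 <- data.frame(strat = rep(3, 10), id = 1:10, pid = 3001:3010)

interleave_strata <- function(dflist) {
  sizes <- vapply(dflist, nrow, integer(1))
  # Stratum s appears sizes[s] times; shuffle to get a random draw order
  pool <- rep(seq_along(dflist), sizes)
  sel  <- pool[sample.int(length(pool))]
  # All strata stacked in their original order
  stacked <- do.call(rbind, dflist)
  # Row offset of each drawn stratum within 'stacked'
  offsets <- c(0, cumsum(sizes))[sel]
  # k-th draw from a stratum takes that stratum's k-th row
  within <- ave(sel, sel, FUN = seq_along)
  out <- stacked[offsets + within, ]
  rownames(out) <- NULL
  out
}

set.seed(123)
res <- interleave_strata(list(df1, df2, df3))
head(res)
```

Shuffling with pool[sample.int(length(pool))] rather than sample(pool) avoids sample()'s special case when its argument is a single number.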



Re: [R] idiom for constructing data frame

2015-03-31 Thread Sarah Goslee
I just snagged this from Duncan Murdoch's reply to the same question:

# Create an empty dataframe to hold the results
df <- data.frame(strat=NA, id=NA, pid=NA)[rep(1, length(sel)),]

This skips matrix(), but how do I set the column names programmatically
within a function?

Sarah, still sure I'm missing something obvious


On Tue, Mar 31, 2015 at 1:46 PM, Sarah Goslee sarah.gos...@gmail.com wrote:
 Hi folks,

 I KNOW there has to be a way to do this more elegantly, but I
 consistently fail to come up with it, as I was just reminded while
 writing an example for a query on this list.

 What's a nifty way to construct a data frame of a given size? The only
 way I know of is to use matrix(), e.g.

 data.frame(matrix(NA, nrow=10, ncol=3))

 and then to set the colnames in a second step.

 This comes up a lot when pre-allocating a data frame before using a
 loop: I know the size and column names, but want an empty structure to
 fill later.

 Sarah


-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] Randomly interleaving data frames while preserving order

2015-03-31 Thread Nordlund, Dan (DSHS/RDA)
 -Original Message-
 From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Kevin
 E. Thorpe
 Sent: Tuesday, March 31, 2015 10:53 AM
 To: Duncan Murdoch
 Cc: R Help Mailing List
 Subject: Re: [R] Randomly interleaving data frames while preserving
 order
 

Another option would be to stack your strata and then sample from the combined
data frame, something like this:

sample_size <- 10
population <- rbind(df1,df2,df3)
sim.sample <- population[sample(nrow(population),sample_size, replace=FALSE),]

Hope this is helpful,

Dan

Daniel J. Nordlund, PhD
Research and Data Analysis Division
Services & Enterprise Support Administration
Washington State Department of Social and Health Services




Re: [R] Randomly interleaving data frames while preserving order

2015-03-31 Thread Kevin E. Thorpe

On 03/31/2015 01:44 PM, Duncan Murdoch wrote:


Thanks Duncan.

Once you see the solution it is indeed obvious.

Kevin

--
Kevin E. Thorpe
Head of Biostatistics,  Applied Health Research Centre (AHRC)
Li Ka Shing Knowledge Institute of St. Michael's
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.tho...@utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016



Re: [R] data.frame: data-driven column selections that vary by row??

2015-03-31 Thread David Wolfskill
On Tue, Mar 31, 2015 at 07:11:28AM -0800, John Kane wrote:
 I think we need some data and code 
 Reproducibility
 https://github.com/hadley/devtools/wiki/Reproducibility
  
 http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
 

I apologize for failing to provide that.

Here is a quite small subset of the data (with a few edits to reduce
excess verbosity in names of things) that still illustrates the
challenge I perceive:

 dput(bw)
structure(list(timestamp = c(1426892400L, 1426892400L, 1426892400L, 
1426892400L, 1426892400L, 1426892400L, 1426892460L, 1426892460L, 
1426892460L, 1426892460L, 1426892460L, 1426892460L, 1426892520L, 
1426892520L, 1426892520L, 1426892520L, 1426892520L, 1426892520L
), hostname = c("c001", "c002", "c021", "c022", "c041", "c051", 
"c001", "c002", "c021", "c022", "c041", "c051", "c001", "c002", 
"c021", "c022", "c041", "c051"), health = c(0.054937499983, 
0.25058541667, 1, 1, 0.577784167075767, 0.546805261621527, 
0.1599375, 0.24954375, 1, 1, 0.582307554123614, 0.558298168996525, 
0.2813125, 0.27087708333, 1, 1, 0.579231349457365, 0.542973020177151
), hw = c(1.9, 1.9, 1.4, 1.4, 1.5, 1.5, 1.9, 1.9, 1.4, 1.4, 1.5, 
1.5, 1.9, 1.9, 1.4, 1.4, 1.5, 1.5), fw = structure(c(1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), .Label = "2015Q1.2", class = "factor"), role = structure(c(1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L), .Label = c("control", "test"), class = "factor"), type = structure(c(3L, 
3L, 1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L, 
2L), .Label = c("D", "F", "H"), class = "factor"), da20_busy_pct = c(79.1, 
62.8, NA, NA, NA, NA, 75, 64.8, NA, NA, NA, NA, 72.2, 74.5, NA, 
NA, NA, NA), da20_dev_type = structure(c(2L, 2L, 1L, 1L, 1L, 
1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("", 
"hdd"), class = "factor"), da20_kb_per_xfer_read = c(727.23, 
665.81, NA, NA, NA, NA, 737.04, 691.38, NA, NA, NA, NA, 721.71, 
668.96, NA, NA, NA, NA), da20_kb_per_xfer_write = c(0, 0, NA, 
NA, NA, NA, 0, 0, NA, NA, NA, NA, 0, 0, NA, NA, NA, NA), da20_mb_per_sec_read = c(39.77, 
31.21, NA, NA, NA, NA, 36.71, 32.41, NA, NA, NA, NA, 35.94, 37.24, 
NA, NA, NA, NA), da20_mb_per_sec_write = c(0, 0, NA, NA, NA, 
NA, 0, 0, NA, NA, NA, NA, 0, 0, NA, NA, NA, NA), da20_ms_per_xactn_read = c(43.5, 
31.6, NA, NA, NA, NA, 35.7, 30.2, NA, NA, NA, NA, 32.7, 34.6, 
NA, NA, NA, NA), da20_ms_per_xactn_write = c(0, 0, NA, NA, NA, 
NA, 0, 0, NA, NA, NA, NA, 0, 0, NA, NA, NA, NA), da20_Q_length = c(0, 
0, NA, NA, NA, NA, 2, 0, NA, NA, NA, NA, 1, 1, NA, NA, NA, NA
), da20_xfers_per_sec_other = c(0, 0, NA, NA, NA, NA, 0, 0, NA, 
NA, NA, NA, 0, 0, NA, NA, NA, NA), da20_xfers_per_sec_read = c(56, 
48, NA, NA, NA, NA, 51, 48, NA, NA, NA, NA, 51, 57, NA, NA, NA, 
NA), da20_xfers_per_sec_write = c(0, 0, NA, NA, NA, NA, 0, 0, 
NA, NA, NA, NA, 0, 0, NA, NA, NA, NA), da2_busy_pct = c(84.5, 
81.8, 29.5, 26.7, 55.5, 50.9, 80.6, 79.7, 29.2, 27.3, 58.8, 50.2, 
74.6, 79.3, 29.4, 26.6, 55.4, 50.1), da2_dev_type = structure(c(2L, 
2L, 3L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 3L, 
3L), .Label = c("", "hdd", "ssd"), class = "factor"), da2_kb_per_xfer_read = c(690.67, 
686.63, 613.78, 587, 571.64, 553.27, 692.26, 660.05, 612.01, 
594.28, 560.16, 566.41, 672.68, 670.25, 604.64, 592.16, 565.02, 
564.43), da2_kb_per_xfer_write = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0), da2_mb_per_sec_read = c(44.52, 41.57, 
134.26, 120.38, 252.88, 229.09, 41.24, 39.96, 132.68, 123.61, 
268.04, 227.34, 37.44, 39.93, 133.45, 120.28, 251.06, 225.99), 
da2_mb_per_sec_write = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0), da2_ms_per_xactn_read = c(49.1, 47.8, 
2, 1.8, 2.6, 2.4, 40.3, 43.9, 2, 1.8, 2.8, 2.4, 37.1, 40.9, 
1.9, 1.8, 2.6, 2.4), da2_ms_per_xactn_write = c(0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), da2_Q_length = c(0, 
2, 0, 1, 3, 0, 3, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 3), da2_xfers_per_sec_other = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), da2_xfers_per_sec_read = c(66, 
62, 224, 210, 453, 424, 61, 62, 222, 213, 490, 411, 57, 61, 
226, 208, 455, 410), da2_xfers_per_sec_write = c(0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("timestamp", 
"hostname", "health", "hw", "fw", "role", "type", "da20_busy_pct", 
"da20_dev_type", "da20_kb_per_xfer_read", "da20_kb_per_xfer_write", 
"da20_mb_per_sec_read", "da20_mb_per_sec_write", "da20_ms_per_xactn_read", 
"da20_ms_per_xactn_write", "da20_Q_length", "da20_xfers_per_sec_other", 
"da20_xfers_per_sec_read", "da20_xfers_per_sec_write", "da2_busy_pct", 
"da2_dev_type", "da2_kb_per_xfer_read", "da2_kb_per_xfer_write", 
"da2_mb_per_sec_read", "da2_mb_per_sec_write", "da2_ms_per_xactn_read", 
"da2_ms_per_xactn_write", "da2_Q_length", "da2_xfers_per_sec_other", 
"da2_xfers_per_sec_read", "da2_xfers_per_sec_write"), class = "data.frame", 
row.names = c(1L, 
2L, 7L, 8L, 13L, 16L, 19L, 20L, 25L, 26L, 31L, 34L, 37L, 38L, 
43L, 44L, 49L, 52L))
 dim(bw)
[1] 18 31

(In the current case, 

[R] idiom for constructing data frame

2015-03-31 Thread Sarah Goslee
Hi folks,

I KNOW there has to be a way to do this more elegantly, but I
consistently fail to come up with it, as I was just reminded while
writing an example for a query on this list.

What's a nifty way to construct a data frame of a given size? The only
way I know of is to use matrix(), e.g.

data.frame(matrix(NA, nrow=10, ncol=3))

and then to set the colnames in a second step.

This comes up a lot when pre-allocating a data frame before using a
loop: I know the size and column names, but want an empty structure to
fill later.

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org
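One more way to skip matrix(), close in spirit to the lapply() answer from William Dunlap in this thread but letting each column keep its own type: pass a named list with one NA of the desired type per column, and replicate each element nrow times. empty_df and proto are hypothetical names; the column names follow Kevin's strata example.

```r
# Hypothetical helper: preallocate 'nrow' rows of NA, one column per
# element of 'proto', each column keeping the type of its prototype
empty_df <- function(nrow, proto) {
  data.frame(lapply(proto, function(x) rep(x, nrow)),
             check.names = FALSE, stringsAsFactors = FALSE)
}

df <- empty_df(10, list(strat = NA_real_, id = NA_integer_, pid = NA_integer_))
str(df)
```

check.names = FALSE keeps names such as "Second Name" exactly as given instead of converting them to syntactic names.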



Re: [R] idiom for constructing data frame

2015-03-31 Thread Duncan Murdoch

On 31/03/2015 1:52 PM, Sarah Goslee wrote:

I just snagged this from Duncan Murdoch's reply to the same question:

# Create an empty dataframe to hold the results
df <- data.frame(strat=NA, id=NA, pid=NA)[rep(1, length(sel)),]

This skips matrix(), but how do I set the column names programmatically
within a function?

Sarah, still sure I'm missing something obvious


The matrix() function has a dimnames argument, so you could do this:

names <- c("strat", "id", "pid")
data.frame(matrix(NA, nrow=10, ncol=3, dimnames=list(NULL, names)))

Duncan Murdoch



On Tue, Mar 31, 2015 at 1:46 PM, Sarah Goslee sarah.gos...@gmail.com wrote:


Re: [R] data.frame: data-driven column selections that vary by row??

2015-03-31 Thread Ista Zahn
Hi David,

I suggest reading http://www.jstatsoft.org/v59/i10, then:

library(tidyr)
library(dplyr)
bw <- gather(bw, key = tmp, value = value, matches("^d[a-z]+[0-9]+"))
bw <- separate(bw, tmp, c("disc", "var"), "_", extra = "merge")
bw <- spread(bw, var, value)

Best,
Ista

On Tue, Mar 31, 2015 at 1:22 PM, David Wolfskill r...@catwhisker.org wrote:

Re: [R] Randomly interleaving data frames while preserving order

2015-03-31 Thread Sarah Goslee
That's a fun one. Here's one possible approach. (Note that it can be
done without using a loop, but I find that a loop here increases
readability.)

I wrote it to work on a list of data frames. If the selection is
random, I'd set it up so that size is passed to the function, but
selection is generated within the function using sample().

recruitment <- function(dflist, selection) {
  results <- data.frame(matrix(NA, nrow=length(selection),
                               ncol=ncol(dflist[[1]])))
  colnames(results) <- colnames(dflist[[1]])
  for(i in unique(selection)) {
    results[selection == i, ] <- dflist[[i]][seq_len(sum(selection == i)),]
  }
  results
}


# and your example:


df1 <- data.frame(strat=rep(1,10),id=1:10,pid=1001:1010)
df2 <- data.frame(strat=rep(2,10),id=1:10,pid=2001:2010)
df3 <- data.frame(strat=rep(3,10),id=1:10,pid=3001:3010)

dfall <- list(df1, df2, df3)

touse <- c(1, 2, 1, 3, 1)
# could be generated using sample given the size argument
# touse <- sample(seq_along(dfall), size=5, replace=TRUE)

> recruitment(dfall, touse)
  strat id  pid
1     1  1 1001
2     2  1 2001
3     1  2 1002
4     3  1 3001
5     1  3 1003
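Sarah's remark about passing only a size and drawing the selection inside the function could look roughly like the sketch below. It is an illustration, not code from the thread: the name `recruit_random()` is hypothetical, the loop is the same one used in `recruitment()`, and `size` is kept at 10 here so that no stratum (10 patients each) can be drawn more often than it has rows; beyond that, the extra positions would come back as NA.

```r
# Hypothetical wrapper: draw the stratum order at random, then fill the
# rows sequentially from each stratum (same idea as recruitment()).
recruit_random <- function(dflist, size) {
  # Stratum chosen for each recruitment slot, drawn with replacement
  selection <- sample(seq_along(dflist), size = size, replace = TRUE)
  results <- data.frame(matrix(NA, nrow = size, ncol = ncol(dflist[[1]])))
  colnames(results) <- colnames(dflist[[1]])
  for (i in unique(selection)) {
    # The k-th time stratum i is chosen, take its k-th row
    results[selection == i, ] <- dflist[[i]][seq_len(sum(selection == i)), ]
  }
  results
}

df1 <- data.frame(strat = rep(1, 10), id = 1:10, pid = 1001:1010)
df2 <- data.frame(strat = rep(2, 10), id = 1:10, pid = 2001:2010)
df3 <- data.frame(strat = rep(3, 10), id = 1:10, pid = 3001:3010)

set.seed(123)  # arbitrary seed so the draw can be repeated
trial <- recruit_random(list(df1, df2, df3), size = 10)
trial
```

Each call gives a new random interleaving, so simulating many recruitments is just a matter of calling it repeatedly (e.g. with `replicate()` or `lapply()`).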

Sarah

On Tue, Mar 31, 2015 at 1:05 PM, Kevin E. Thorpe
kevin.tho...@utoronto.ca wrote:
 Hello.

 I am trying to simulate recruitment in a randomized trial. Suppose I have
 three streams (strata) of patients represented by these data frames.

 df1 <- data.frame(strat=rep(1,10),id=1:10,pid=1001:1010)
 df2 <- data.frame(strat=rep(2,10),id=1:10,pid=2001:2010)
 df3 <- data.frame(strat=rep(3,10),id=1:10,pid=3001:3010)

 What I need to do is construct a data frame with all of these combined where
 the order of selection from one of the three data frames is randomized but
 once a stratum is selected patients are selected sequentially from that data
 frame.

 To see what I'm looking to achieve, suppose the first five subjects were to
 come, in order, from strata (data frames) 1, 2, 1, 3 and 2. The expected
 result should look like this:

 > rbind(df1[1,],df2[1,],df1[2,],df3[1,],df2[2,])
    strat id  pid
 1      1  1 1001
 2      2  1 2001
 21     1  2 1002
 4      3  1 3001
 22     2  2 2002

 I hope what I'm trying to accomplish makes sense. Maybe I'm missing
 something obvious, but I really have no idea at the moment how to achieve
 this elegantly. Since I need to simulate many trial recruitments it needs to
 be general and compact.

 I appreciate any advice.

 Kevin


-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Randomly interleaving data frames while preserving order

2015-03-31 Thread Tom Wright
samples <- sample(c(rep(1,10),rep(2,10),rep(3,10)),30)
samples[samples==1] <- 1001:1010
samples[samples==2] <- 2001:2010
samples[samples==3] <- 3001:3010

fullDf <- rbind(df1,df2,df3)

fullDf[sort(order(samples),index.return=TRUE)$ix,]
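The last line works because `sort(order(x), index.return = TRUE)$ix` is the rank of each element of `x`, i.e. `order(order(x))`. Since `fullDf` is already sorted by pid and `samples` now holds pids, that rank is exactly the row of `fullDf` to take, which is the same as looking the pids up with `match()`. A minimal check of that equivalence; the seed and the names `idx_sort`, `idx_rank` and `idx_match` are ours, not Tom's:

```r
# Same three strata as in the thread
df1 <- data.frame(strat = rep(1, 10), id = 1:10, pid = 1001:1010)
df2 <- data.frame(strat = rep(2, 10), id = 1:10, pid = 2001:2010)
df3 <- data.frame(strat = rep(3, 10), id = 1:10, pid = 3001:3010)
fullDf <- rbind(df1, df2, df3)   # rows already sorted by pid

set.seed(42)  # arbitrary seed, only to make the draw repeatable
samples <- sample(c(rep(1, 10), rep(2, 10), rep(3, 10)), 30)
samples[samples == 1] <- 1001:1010
samples[samples == 2] <- 2001:2010
samples[samples == 3] <- 3001:3010

idx_sort  <- sort(order(samples), index.return = TRUE)$ix  # Tom's version
idx_rank  <- order(order(samples))                         # rank of each pid
idx_match <- match(samples, fullDf$pid)                    # direct lookup

identical(idx_sort, idx_rank)    # expected TRUE
identical(idx_rank, idx_match)   # expected TRUE
head(fullDf[idx_match, ])
```

The `match()` form does not depend on `fullDf` being sorted, which may make it a little easier to reuse.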

On Tue, 2015-03-31 at 13:05 -0400, Kevin E. Thorpe wrote:
 Hello.
 
 I am trying to simulate recruitment in a randomized trial. Suppose I 
 have three streams (strata) of patients represented by these data frames.
 

 
 What I need to do is construct a data frame with all of these combined 
 where the order of selection from one of the three data frames is 
 randomized but once a stratum is selected patients are selected 
 sequentially from that data frame.
 
 To see what I'm looking to achieve, suppose the first five subjects were 
 to come, in order, from strata (data frames) 1, 2, 1, 3 and 2. The 
 expected result should look like this:
 
 > rbind(df1[1,],df2[1,],df1[2,],df3[1,],df2[2,])
    strat id  pid
 1      1  1 1001
 2      2  1 2001
 21     1  2 1002
 4      3  1 3001
 22     2  2 2002
 
 I hope what I'm trying to accomplish makes sense. Maybe I'm missing 
 something obvious, but I really have no idea at the moment how to 
 achieve this elegantly. Since I need to simulate many trial recruitments 
 it needs to be general and compact.
 
 I appreciate any advice.
 
 Kevin




Re: [R] How to obtain a cross tab count of unique values

2015-03-31 Thread Tom Wright
table(unique(df)$PROJECT)

On Tue, 2015-03-31 at 14:51 -0500, Walter Anderson wrote:
 I have a data frame that shows all of the parks (including duplicates)
 that are impacted by a project's 'footprint':
 
 PROJECT PARKNAME
 A   PRK A
 A   PRK B
 A   PRK A
 B   PRK C
 B   PRK A
 C   PRK B
 C   PRK D
 ...
 
 What I need is a cross tabulation that shows me the number of unique
 parks for each project.  If I use the standard table(df$PROJECT), it
 reports:
 
 A 3
 B 2
 C 2
 ...
 
 where I need it to ignore duplicates and report:
 
 A 2
 B 2
 C 2
 ...
 
 Anyone have any suggestions on how to do this within the R paradigm?
 
 Walter Anderson
 


Re: [R] Debug package options

2015-03-31 Thread Keith S Weintraub
Duncan,
Thanks for the help.

Since I am the only person using this machine, and I couldn’t figure out
anywhere else to put the option statement, I put it in the file
Rprofile.site in
C:\Program Files\R\R-3.1.2\etc

The option that I wanted was:
options(debug.font = "Consolas 12")

Which allowed me to have the right size font and Tk window to be able to do 
debugging using the debug package.

In case you are interested I use Windows 7 on my Mac via Parallels.

Thanks again,
Best,
KW



 On Mar 30, 2015, at 2:05 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote:
 
 On 30/03/2015 1:50 PM, Keith S Weintraub wrote:
 Folks,
 
 I would like to change some of the options for the Tk window that pops up when 
 using the debug package.
 
 I know how to change the options: e.g. options(debug.font = "Courier 12 italic").
 
 Is there a way to “preset” these in my environment so when debug starts up I 
 have all the options set up the way I want them?
 
 Do I do this in a .First file? Does the .First file have to load the debug 
 package every time I start up R?
 
 No need to do my work for me. Just point me to the right doc.
 
 See the ?Startup help topic.  You probably want to use one of the
 profile files rather than .First, because .First needs to be in a
 workspace, and you shouldn't be loading a workspace every time.
 
 Duncan Murdoch
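For reference, ?Startup describes two profile files that fit Duncan's suggestion: a site-wide `Rprofile.site` in R's `etc` directory (what Keith ended up using, taken from the R_PROFILE environment variable if that is set) and a per-user `.Rprofile`, which R looks for first in the working directory and then in the home directory unless R_PROFILE_USER names another file. The sketch below only prints where those default files would be and sets the option in the running session; it writes nothing to disk.

```r
# Where R would look for the two startup profiles described in ?Startup.
# Nothing is written to disk here; these are just the default paths.
site_profile <- file.path(R.home("etc"), "Rprofile.site")  # unless R_PROFILE is set
user_profile <- file.path(path.expand("~"), ".Rprofile")    # unless R_PROFILE_USER is set
                                                           # (a .Rprofile in the working
                                                           # directory is used first)
site_profile
user_profile

# The single line that would go into either file. Setting the option does
# not require the debug package to be attached yet.
options(debug.font = "Consolas 12")
getOption("debug.font")
```

On a single-user machine either file works; the per-user one has the advantage of surviving an upgrade to a new R version, since it does not live inside the R installation directory.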


Re: [R] Multiple Plots using ggplot

2015-03-31 Thread stephen sefick
Your data and post are still not provided in one of the formats described
here:
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example.
I am unsure of what you want to do, but I have made a reproducible example
that might help.

zz <- "Date Number.of.Rain.Days Total.rain Start.of.Rain..i. Start.of.Rain..ii. Start.of.Rain..iii.
1952-01-01  86 1139.952  92 239 11
1953-01-01  96  977.646  98  98 11
1954-01-01 114 1382.014  92  92 12
1955-01-01 119 1323.086 100 100 12
1956-01-01 123 1266.444  92  92 11
1957-01-01 124 1235.964  92  92 11"

library(reshape)
library(ggplot2)

Data <- read.table(text=zz, header = TRUE)

df1 <- data.frame(Data)

df2 <- melt(df1, id = c('Date', 'Number.of.Rain.Days'))

df3 <- df2[-grep("Total.rain", df2$variable),]

qplot(Date, value, data=df3) + facet_wrap(~variable)
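Frederic's stated goal (one vertical line per Start.of.Rain column for each year, with points on top so the three series can be told apart, and Date on the x-axis) might be approached as in the sketch below. It is a hedged illustration rather than code from the thread: it types in the six rows shown in the thread, stacks the three Start.of.Rain columns with base R instead of `reshape::melt()`, and draws the lines with ggplot2's `geom_linerange()`; the dodge width of 150 (days) is an arbitrary choice.

```r
# The six rows shown in the thread, typed in directly (Date as a real Date)
rain <- data.frame(
  Date = as.Date(c("1952-01-01", "1953-01-01", "1954-01-01",
                   "1955-01-01", "1956-01-01", "1957-01-01")),
  Number.of.Rain.Days = c(86, 96, 114, 119, 123, 124),
  Total.rain          = c(1139.952, 977.646, 1382.014,
                          1323.086, 1266.444, 1235.964),
  Start.of.Rain..i.   = c(92, 98, 92, 100, 92, 92),
  Start.of.Rain..ii.  = c(239, 98, 92, 100, 92, 92),
  Start.of.Rain..iii. = c(11, 11, 12, 12, 11, 11)
)

# Keep only the Start.of.Rain columns and stack them into long format:
# one row per (Date, variable) pair
start_cols <- grep("^Start\\.of\\.Rain", names(rain), value = TRUE)
long <- data.frame(
  Date     = rep(rain$Date, times = length(start_cols)),
  variable = factor(rep(start_cols, each = nrow(rain)), levels = start_cols),
  value    = unlist(rain[start_cols], use.names = FALSE)
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Shift the three series sideways within each year so that equal
  # values are not drawn on top of each other
  dodge <- position_dodge(width = 150)
  p <- ggplot(long, aes(x = Date, colour = variable)) +
    # vertical line from 0 up to each value (like plot(type = "h"))
    geom_linerange(aes(ymin = 0, ymax = value), position = dodge) +
    # point at the top of each line
    geom_point(aes(y = value), position = dodge) +
    labs(y = "Start of rain", colour = NULL)
  print(p)
}
```

Dodging is what keeps series with identical values (e.g. 92 and 92) visible; facetting by `variable`, as in the reply above, is the alternative when the series should be compared panel by panel instead.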

On Tue, Mar 31, 2015 at 2:55 AM, Frederic Ntirenganya ntfr...@gmail.com
wrote:

  Hi All,

 Sorry for the shape of the data, which was not good enough. This is what my
 data look like.

 I want to make multiple plots with the ggplot function from a data frame
 with many columns. I want to plot only Start.of.Rain..i.,
 Start.of.Rain..ii. and Start.of.Rain..iii., and I failed to do it. What I
 want is to compare Start.of.Rain..i., Start.of.Rain..ii. and
 Start.of.Rain..iii. by plotting vertical lines. I also need to add points
 to the plot to be able to tell them apart. The x-axis must be the Date
 column. Thanks!

 Here is how the data look like and how I tried to make it.



       Date Number.of.Rain.Days Total.rain Start.of.Rain..i. Start.of.Rain..ii. Start.of.Rain..iii.
 1952-01-01                  86   1139.952                92                239                  11
 1953-01-01                  96    977.646                98                 98                  11
 1954-01-01                 114   1382.014                92                 92                  12
 1955-01-01                 119   1323.086               100                100                  12
 1956-01-01                 123   1266.444                92                 92                  11
 1957-01-01                 124   1235.964                92                 92                  11


 Here is how I tried to solve the problem.

 df1 <- data.frame(data)
 df1
 df2 <- melt(df1, id = 'Date', variable_name = 'start of Rains')
 df2

 ggplot(df2, aes(Date, value)) + geom_line(aes(colour = "red"), type = "h")

 Kindly any help is welcome. Thanks

 Regards,
 Frederic.

 Frederic Ntirenganya
 Maseno University,
 African Maths Initiative,
 Kenya.
 Mobile:(+254)718492836
 Email: fr...@aims.ac.za
 https://sites.google.com/a/aims.ac.za/fredo/

 On Tue, Mar 31, 2015 at 9:24 AM, Jeff Newmiller jdnew...@dcn.davis.ca.us
 wrote:

 This is no better because (a) you are still posting using HTML format,
 and (b) using printed output loses the internal representation of the data.
 The dput function is very helpful for solving this. [1]

 [1]
 http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

 ---
 Jeff NewmillerThe .   .  Go
 Live...
 DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live
 Go...
   Live:   OO#.. Dead: OO#..  Playing
 Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
 /Software/Embedded Controllers)   .OO#.   .OO#.
 rocks...1k

 ---
 Sent from my phone. Please excuse my brevity.

 On March 30, 2015 10:56:48 PM PDT, Frederic Ntirenganya 
 ntfr...@gmail.com wrote:
 Hi Stephen,
 
 Sorry, the data came through badly.
 Here is the head of the data.
 
 > head(data)
         Date Number.of.Rain.Days Total.rain Start.of.Rain..i. Start.of.Rain..ii. Start.of.Rain..iii. Start.Rain..iv.
 1 1952-01-01                  86   1139.952                92                239                 112             112
 2 1953-01-01                  96    977.646                98                 98                 112             112
 3 1954-01-01                 114   1382.014                92                 92                 120             120
 4 1955-01-01                 119   1323.086               100                100                 125             174
 5 1956-01-01                 123   1266.444                92                 92                 119             119
 6 1957-01-01                 124   1235.964                92                 92                 112             112
 
 
 
 Frederic Ntirenganya
 Maseno University,
 African Maths Initiative,
 Kenya.
 Mobile:(+254)718492836
 Email: fr...@aims.ac.za
 https://sites.google.com/a/aims.ac.za/fredo/
 
 On Mon, Mar 30, 2015 at 5:34 PM, stephen sefick ssef...@gmail.com
 wrote:
 
  Hi Frederic,
 
  Can you provide a minimal reproducible example including either real data
  (dput), or simulated data that mimics your situation? This will allow more
  people to help.
 
  Stephen
 
  On Mon, Mar 30, 2015 at 8:39 AM, Frederic Ntirenganya
 ntfr...@gmail.com
  wrote:
 

Re: [R] Multiple Plots using ggplot

2015-03-31 Thread John Kane
The data you supplied is still in a useless format.

Please send it to us in dput format (and don't post in html)

Here is a complete example of creating a data.frame and converting it to a
format that readers on R-help can use:

##=Start Example===##
# Simple example data set in a data.frame
data1 <- data.frame(xx = 1:20, yy = sample(letters[1:26], 20, replace = TRUE),
                    zz = rnorm(20))

dput(data1)  # convert to dput() format for transferring to other users
 
# dput() result. Copy and paste back into your editor
structure(list(xx = 1:20, yy = structure(c(6L, 3L, 7L, 12L, 1L, 
1L, 2L, 7L, 9L, 6L, 8L, 7L, 9L, 5L, 4L, 10L, 11L, 4L, 8L, 11L
), .Label = c("a", "f", "g", "h", "i", "j", "k", "o", "p", "u", 
"w", "z"), class = "factor"), zz = c(0.379202224643519, 
-0.293649882956148, 2.27761155645142, 0.0378126031936277, 0.518138385757923, 
1.11655160886907, -1.64262245261915, 1.11341365979718, -0.184737977758355, 
0.439361470235051, 1.2597110753159, -0.795425331570368, 0.974654694801041, 
-0.309087884123705, -1.55929705211554, 0.147715827800676, -0.542626171203849, 
0.745294589678554, -0.254290052908619, 0.939894889209173)), .Names = c("xx", 
"yy", "zz"), row.names = c(NA, -20L), class = "data.frame")

#  Read data back into standard R format, calling the data dat1

dat1 <- structure(list(xx = 1:20, yy = structure(c(6L, 3L, 7L, 12L, 1L, 
1L, 2L, 7L, 9L, 6L, 8L, 7L, 9L, 5L, 4L, 10L, 11L, 4L, 8L, 11L
), .Label = c("a", "f", "g", "h", "i", "j", "k", "o", "p", "u", 
"w", "z"), class = "factor"), zz = c(0.379202224643519, 
-0.293649882956148, 2.27761155645142, 0.0378126031936277, 0.518138385757923, 
1.11655160886907, -1.64262245261915, 1.11341365979718, -0.184737977758355, 
0.439361470235051, 1.2597110753159, -0.795425331570368, 0.974654694801041, 
-0.309087884123705, -1.55929705211554, 0.147715827800676, -0.542626171203849, 
0.745294589678554, -0.254290052908619, 0.939894889209173)), .Names = c("xx", 
"yy", "zz"), row.names = c(NA, -20L), class = "data.frame")

dat1
##=End Example===##
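The point of dput() (Jeff's item (b) as well) can be checked with a round trip: the dput() text, evaluated again, gives back the same object, column types and factor levels included, which printed output does not guarantee. A minimal sketch; the seed and column contents are arbitrary, and `all.equal()` is used rather than `identical()` because by default dput() writes doubles with about 15 significant digits, so the last bits may differ:

```r
set.seed(1)  # arbitrary seed so the example is repeatable
data1 <- data.frame(xx = 1:20,
                    yy = factor(sample(letters, 20, replace = TRUE)),
                    zz = rnorm(20))

# What dput() prints, captured as one string (this is what you would paste)
txt <- paste(capture.output(dput(data1)), collapse = "\n")

# What a reader does with the pasted text: evaluate it
dat1 <- eval(parse(text = txt))

str(dat1)                # int, Factor and num columns, as in data1
all.equal(data1, dat1)   # expected TRUE
```

The explicit `factor()` keeps the example's column types the same whatever the `stringsAsFactors` default of the R version in use.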

John Kane
Kingston ON Canada


 -Original Message-
 From: ntfr...@gmail.com
 Sent: Tue, 31 Mar 2015 10:55:11 +0300
 To: jdnew...@dcn.davis.ca.us
 Subject: Re: [R] Multiple Plots using ggplot
 
  Hi All,
 
 Sorry for the shape of the data, which was not good enough. This is what my
 data look like.
 
 I want to make multiple plots with the ggplot function from a data frame
 with many columns. I want to plot only Start.of.Rain..i.,
 Start.of.Rain..ii. and Start.of.Rain..iii., and I failed to do it. What I
 want is to compare Start.of.Rain..i., Start.of.Rain..ii. and
 Start.of.Rain..iii. by plotting vertical lines. I also need to add points
 to the plot to be able to tell them apart. The x-axis must be the Date
 column. Thanks!
 
 Here is how the data look like and how I tried to make it.
 
 
 
       Date Number.of.Rain.Days Total.rain Start.of.Rain..i. Start.of.Rain..ii. Start.of.Rain..iii.
 1952-01-01                  86   1139.952                92                239                  11
 1953-01-01                  96    977.646                98                 98                  11
 1954-01-01                 114   1382.014                92                 92                  12
 1955-01-01                 119   1323.086               100                100                  12
 1956-01-01                 123   1266.444                92                 92                  11
 1957-01-01                 124   1235.964                92                 92                  11
 
 
 Here is how I tried to solve the problem.
 
 df1 <- data.frame(data)
 df1
 df2 <- melt(df1, id = 'Date', variable_name = 'start of Rains')
 df2
 
 ggplot(df2, aes(Date, value)) + geom_line(aes(colour = "red"), type = "h")
 
 Kindly any help is welcome. Thanks
 
 Regards,
 Frederic.
 
 Frederic Ntirenganya
 Maseno University,
 African Maths Initiative,
 Kenya.
 Mobile:(+254)718492836
 Email: fr...@aims.ac.za
 https://sites.google.com/a/aims.ac.za/fredo/
 
 On Tue, Mar 31, 2015 at 9:24 AM, Jeff Newmiller
 jdnew...@dcn.davis.ca.us
 wrote:
 
 This is no better because (a) you are still posting using HTML format, and
 (b) using printed output loses the internal representation of the data. The
 dput function is very helpful for solving this. [1]
 
 [1]
 http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
 
 On March 30, 2015 10:56:48 PM PDT, Frederic Ntirenganya
 ntfr...@gmail.com
 wrote:
 Hi Stephen,
 
 Sorry, the data came through badly.
 Here is the head of the data.
 
 > head(data)
         Date Number.of.Rain.Days Total.rain Start.of.Rain..i. Start.of.Rain..ii. Start.of.Rain..iii. Start.Rain..iv.
 1 1952-01-01                  86   1139.952                92                239                 112             112
 2 1953-01-01  

Re: [R] Using matlab code in R

2015-03-31 Thread Jeff Newmiller
The Posting Guide recommends searching the archives before posting. Consider 
[1] and learn.

[1] https://stat.ethz.ch/pipermail/r-help/2007-March/127981.html

On March 31, 2015 1:47:49 PM PDT, T.Riedle tr...@kent.ac.uk wrote:
Hi everybody,
I have some MATLAB code that I would like to use for my empirical
analysis. Unfortunately, I am not familiar with MATLAB, and it would be
great if there were a tool to translate the MATLAB code into R so that
I can work with the code in R.
Is there such a tool or package in R?

Kind regards,
T.



Re: [R] idiom for constructing data frame

2015-03-31 Thread Sarah Goslee
On Tue, Mar 31, 2015 at 6:35 PM, Richard M. Heiberger r...@temple.edu wrote:
 I got rid of the extra column.

 data.frame(r=seq(8), foo=NA, bar=NA, row.names="r")

Brilliant!

After much fussing, including a disturbing detour into nested lapply
statements from which I barely emerged with my sanity (arguable, I
suppose), here is a one-liner that creates a data frame with an arbitrary
number of rows, given an existing data frame as a template for the
number and names of the columns:


n <- 8
df1 <- data.frame(A=runif(9), B=runif(9))

do.call(data.frame, setNames(c(list(seq(n), "r"), as.list(rep(NA,
ncol(df1)))), c("r", "row.names", colnames(df1))))

It's not elegant, but it is fairly R-ish. I should probably stop
hunting for an elegant solution now.
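For the "existing data frame as a template" case there is one more base-R idiom, not posted in the thread and offered here only as a sketch: index the template with integer NA row numbers. Each NA index yields an all-NA row, and every column keeps its class (numeric, character, integer, factor with its levels), whereas the NA-filled constructions above produce logical columns. The template's contents below are made up for illustration.

```r
# Template with columns of different classes (made-up contents)
df1 <- data.frame(A = runif(9), B = letters[1:9], C = 1:9,
                  D = factor(c("x", "y", "x", "y", "x", "y", "x", "y", "x")),
                  stringsAsFactors = FALSE)
n <- 8

# Integer NA row indices: one all-NA row per index, classes preserved
empty <- df1[rep(NA_integer_, n), , drop = FALSE]
rownames(empty) <- NULL   # otherwise the row names are "NA", "NA.1", ...

str(empty)
```

Using `NA_integer_` rather than a plain (logical) `NA` matters: a logical index is recycled to the number of rows of the template, so the result would have nrow(df1) rows instead of n.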

Thanks, everyone!

Sarah


 Rich

 On Tue, Mar 31, 2015 at 6:18 PM, Sven E. Templer sven.temp...@gmail.com 
 wrote:
 If you don't mind an extra column, you could use something similar to:

 data.frame(r=seq(8),foo=NA,bar=NA)

 If you do, here is another approach (see function body):

 empty.frame <- function (r = 1, n = 1, fill = NA_real_) {
   data.frame(setNames(lapply(rep(fill, length(n)), rep, times=r), n))
 }
 empty.frame()
 empty.frame(, seq(3))
 empty.frame(8, c("foo", "bar"))

 I could not put it in one line either, without retyping at least one
 argument (n in this case).
 So I suggest a function is the way to go for a simplified syntax ...

 Thanks to all for the ideas!
 Sven

 On 31 March 2015 at 20:55, William Dunlap wdun...@tibco.com wrote:

 You can use structure() to attach the names to a list that is input to
 data.frame.
 E.g.,

 dfNames <- c("First", "Second Name")
 data.frame(lapply(structure(dfNames, names=dfNames),
 function(name)rep(NA_real_, 5)))


 Bill Dunlap
 TIBCO Software
 wdunlap tibco.com

 On Tue, Mar 31, 2015 at 11:37 AM, Sarah Goslee sarah.gos...@gmail.com
 wrote:

  Hi,
 
  Duncan Murdoch suggested:
 
   The matrix() function has a dimnames argument, so you could do this:
  
   names <- c("strat", "id", "pid")
   data.frame(matrix(NA, nrow=10, ncol=3, dimnames=list(NULL, names)))
 
  That's a definite improvement, thanks. But no way to skip matrix()? It
  just seems unRlike, although since it's only full of NA values there
  are no coercion issues with column types or anything, so it doesn't
  hurt. It's just inelegant. :)
 
  Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] idiom for constructing data frame

2015-03-31 Thread Sven E. Templer
If you don't mind an extra column, you could use something similar to:

data.frame(r=seq(8),foo=NA,bar=NA)

If you do, here is another approach (see function body):

empty.frame <- function (r = 1, n = 1, fill = NA_real_) {
  data.frame(setNames(lapply(rep(fill, length(n)), rep, times=r), n))
}
empty.frame()
empty.frame(, seq(3))
empty.frame(8, c("foo", "bar"))

I could not put it in one line either, without retyping at least one
argument (n in this case).
So I suggest a function is the way to go for a simplified syntax ...

Thanks to all for the ideas!
Sven

On 31 March 2015 at 20:55, William Dunlap wdun...@tibco.com wrote:

 You can use structure() to attach the names to a list that is input to
 data.frame.
 E.g.,

 dfNames <- c("First", "Second Name")
 data.frame(lapply(structure(dfNames, names=dfNames),
 function(name)rep(NA_real_, 5)))


 Bill Dunlap
 TIBCO Software
 wdunlap tibco.com

 On Tue, Mar 31, 2015 at 11:37 AM, Sarah Goslee sarah.gos...@gmail.com
 wrote:

  Hi,
 
  Duncan Murdoch suggested:
 
   The matrix() function has a dimnames argument, so you could do this:
  
   names <- c("strat", "id", "pid")
   data.frame(matrix(NA, nrow=10, ncol=3, dimnames=list(NULL, names)))
 
  That's a definite improvement, thanks. But no way to skip matrix()? It
  just seems unRlike, although since it's only full of NA values there
  are no coercion issues with column types or anything, so it doesn't
  hurt. It's just inelegant. :)
 
  Sarah
  --
  Sarah Goslee
  http://www.functionaldiversity.org
 


Re: [R] How to obtain a cross tab count of unique values

2015-03-31 Thread Rui Barradas

Hello,

Try the following.

table(unique(df)$PROJECT)


And please note that 'df' is the name of an R function; use something else.

Hope this helps,

Rui Barradas

On 31-03-2015 20:51, Walter Anderson wrote:

I have a data frame that shows all of the parks (including duplicates)
that are impacted by a project's 'footprint':

PROJECT PARKNAME
A   PRK A
A   PRK B
A   PRK A
B   PRK C
B   PRK A
C   PRK B
C   PRK D
...

What I need is a cross tabulation that shows me the number of unique
parks for each project.  If I use the standard table(df$PROJECT), it
reports:

A 3
B 2
C 2
...

where I need it to ignore duplicates and report:

A 2
B 2
C 2
...

Anyone have any suggestions on how to do this within the R paradigm?

Walter Anderson



Re: [R] How to obtain a cross tab count of unique values

2015-03-31 Thread Sarah Goslee
Sure: tell R you want unique rows.

> mydf <- data.frame(PROJECT=c("A","A","A","B","B","C","C"), PARKNAME=c("PRK A",
+ "PRK B", "PRK A", "PRK C", "PRK A", "PRK B", "PRK D"),
+ stringsAsFactors=FALSE)
> mydf
  PROJECT PARKNAME
1       A    PRK A
2       A    PRK B
3       A    PRK A
4       B    PRK C
5       B    PRK A
6       C    PRK B
7       C    PRK D

> mydf.unique <- unique(mydf)
> table(mydf.unique$PROJECT)

A B C 
2 2 2 
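One caveat with the unique-rows approach: it removes duplicate rows across all columns, so if the real data frame has further columns (for example a hypothetical impact ID), they should be dropped first; otherwise two rows for the same park that differ elsewhere are both counted. A sketch of that, plus a `tapply()` alternative that counts distinct park names within each project directly:

```r
mydf <- data.frame(PROJECT  = c("A", "A", "A", "B", "B", "C", "C"),
                   PARKNAME = c("PRK A", "PRK B", "PRK A", "PRK C",
                                "PRK A", "PRK B", "PRK D"),
                   stringsAsFactors = FALSE)

# Unique (PROJECT, PARKNAME) pairs, then count rows per project.
# Subsetting to the two columns guards against extra columns.
counts_table <- table(unique(mydf[, c("PROJECT", "PARKNAME")])$PROJECT)

# Alternative: number of distinct park names within each project
counts_tapply <- tapply(mydf$PARKNAME, mydf$PROJECT,
                        function(x) length(unique(x)))

counts_table
counts_tapply
```

Both give 2 parks for each of A, B and C with this data; the `tapply()` form also extends easily to other per-group summaries.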

Please provide reproducible data yourself in the future.

Sarah

On Tue, Mar 31, 2015 at 3:51 PM, Walter Anderson wandrso...@gmail.com wrote:
 I have a data frame that shows all of the parks (including duplicates)
 that are impacted by a project's 'footprint':

 PROJECT PARKNAME
 A   PRK A
 A   PRK B
 A   PRK A
 B   PRK C
 B   PRK A
 C   PRK B
 C   PRK D
 ...

 What I need is a cross tabulation that shows me the number of unique
 parks for each project.  If I use the standard table(df$PROJECT), it
 reports:

 A 3
 B 2
 C 2
 ...

 where I need it to ignore duplicates and report:

 A 2
 B 2
 C 2
 ...

 Anyone have any suggestions on how to do this within the R paradigm?

 Walter Anderson

-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] idiom for constructing data frame

2015-03-31 Thread Richard M. Heiberger
I got rid of the extra column.

data.frame(r=seq(8), foo=NA, bar=NA, row.names="r")
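This works because the row.names argument of data.frame() may be a character string naming one of the supplied columns: that column is used as the row names and then dropped, which is why the extra column disappears. A quick check:

```r
# row.names = "r" names one of the supplied columns; data.frame() moves
# that column into the row names and drops it
empty <- data.frame(r = seq(8), foo = NA, bar = NA, row.names = "r")

dim(empty)            # 8 rows, 2 columns
names(empty)          # "foo" "bar"
rownames(empty)       # "1" "2" ... "8"
sapply(empty, class)  # both columns are logical NA
```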

Rich

On Tue, Mar 31, 2015 at 6:18 PM, Sven E. Templer sven.temp...@gmail.com wrote:
 If you don't mind an extra column, you could use something similar to:

 data.frame(r=seq(8),foo=NA,bar=NA)

 If you do, here is another approach (see function body):

 empty.frame <- function (r = 1, n = 1, fill = NA_real_) {
   data.frame(setNames(lapply(rep(fill, length(n)), rep, times=r), n))
 }
 empty.frame()
 empty.frame(, seq(3))
 empty.frame(8, c("foo", "bar"))

 I could not put it in one line either, without retyping at least one
 argument (n in this case).
 So I suggest a function is the way to go for a simplified syntax ...

 Thanks to all for the ideas!
 Sven

 On 31 March 2015 at 20:55, William Dunlap wdun...@tibco.com wrote:

 You can use structure() to attach the names to a list that is input to
 data.frame.
 E.g.,

 dfNames <- c("First", "Second Name")
 data.frame(lapply(structure(dfNames, names=dfNames),
 function(name)rep(NA_real_, 5)))


 Bill Dunlap
 TIBCO Software
 wdunlap tibco.com

 On Tue, Mar 31, 2015 at 11:37 AM, Sarah Goslee sarah.gos...@gmail.com
 wrote:

  Hi,
 
  Duncan Murdoch suggested:
 
   The matrix() function has a dimnames argument, so you could do this:
  
   names <- c("strat", "id", "pid")
   data.frame(matrix(NA, nrow=10, ncol=3, dimnames=list(NULL, names)))
 
  That's a definite improvement, thanks. But no way to skip matrix()? It
  just seems unRlike, although since it's only full of NA values there
  are no coercion issues with column types or anything, so it doesn't
  hurt. It's just inelegant. :)
 
  Sarah
  --
  Sarah Goslee
  http://www.functionaldiversity.org
 


Re: [R] Can not load Rcmdr

2015-03-31 Thread a b
I have a similar issue with tcl.

I am using R on a Linux server.  Rcmdr installed OK, but it won't run:

> R.Version()
$platform
[1] "x86_64-unknown-linux-gnu"

$arch
[1] "x86_64"

$os
[1] "linux-gnu"

$system
[1] "x86_64, linux-gnu"

$status
[1] ""

$major
[1] "3"

$minor
[1] "1.0"

$year
[1] "2014"

$month
[1] "04"

$day
[1] "10"

$`svn rev`
[1] "65387"

$language
[1] "R"

$version.string
[1] "R version 3.1.0 (2014-04-10)"

$nickname
[1] "Spring Dance"

> library(Rcmdr)
Error : .onAttach failed in attachNamespace() for 'Rcmdr', details:
  call: structure(.External(.C_dotTcl, ...), class = "tclObj")
  error: [tcl] Invalid state name hover.

Error: package or namespace load failed for 'Rcmdr'
> 


This is kind of frustrating because I don't have admin privileges to install
RStudio on this server, either.

I guess it's time to use Emacs.





[R] Calculating different PCAs in R

2015-03-31 Thread im db
Dear All, I want to use the princomp() function in R in order to calculate
a Principal Component Analysis. In different papers, I have seen PCA 1,
PCA 2, PCA 11, etc. Would you please tell me how I can calculate the
different PCAs in R? At the moment I just use this line:
eigenVectors <- pca$loadings
But I don’t know if it is correct to use loadings. Thank you in advance.
Best regards,
Iman Dabbaghi
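If "PCA 1", "PCA 2", ... in those papers mean the first, second, ... principal components (an assumption, since the papers are not shown), then in princomp() the columns of `loadings` are the eigenvectors (the coefficients that define each component), while `scores` holds the value of each component for every observation. A short sketch using the USArrests data set that ships with R:

```r
# USArrests ships with R: 50 US states, 4 numeric variables
pca <- princomp(USArrests, cor = TRUE)   # PCA on the correlation matrix

# Columns of the loadings matrix are the eigenvectors: column 1 holds the
# coefficients that define the first component, column 2 the second, ...
eigenVectors <- unclass(pca$loadings)
eigenVectors[, 1]

# The component values for each observation are in pca$scores:
# column 1 is the first component, column 2 the second, and so on
pc1 <- pca$scores[, 1]
pc2 <- pca$scores[, 2]
head(cbind(pc1, pc2))

# Standard deviation and proportion of variance for each component
summary(pca)

# Scores = centred and scaled data multiplied by the eigenvectors
manual <- scale(USArrests, center = pca$center, scale = pca$scale) %*% eigenVectors
all.equal(unname(manual), unname(pca$scores))   # expected TRUE
```

So using `pca$loadings` for the eigenvectors is consistent with princomp's documentation; whether a paper's "PCA k" refers to the k-th column of loadings or of scores depends on that paper.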


[R] Using matlab code in R

2015-03-31 Thread T.Riedle
Hi everybody,
I have some MATLAB code that I would like to use for my empirical analysis. 
Unfortunately, I am not familiar with MATLAB, and it would be great if there were 
a tool to translate the MATLAB code into R so that I can work with the code 
in R.
Is there such a tool or package in R?

Kind regards,
T.



Re: [R] idiom for constructing data frame

2015-03-31 Thread Henrik Bengtsson
I've got dataFrame() in R.utils for this purpose, e.g.

> df <- dataFrame(colClasses=c(a="integer", b="double", c="character"), nrow=10L)
> str(df)
'data.frame':   10 obs. of  3 variables:
 $ a: int  0 0 0 0 0 0 0 0 0 0
 $ b: num  0 0 0 0 0 0 0 0 0 0
 $ c: chr  "" "" "" "" ...

Related: You can use the colClasses() function to generate the
'colClasses' argument dynamically, e.g.

> cols <- colClasses("idc")
> names(cols) <- c("a", "b", "c")
> str(cols)
 Named chr [1:3] "integer" "double" "character"
 - attr(*, "names")= chr [1:3] "a" "b" "c"

> cols <- colClasses(sprintf("c2d%di", 4))
> df <- dataFrame(colClasses=cols, nrow=10L)
> str(df)
'data.frame':   10 obs. of  7 variables:
 $ : chr  "" "" "" "" ...
 $ : num  0 0 0 0 0 0 0 0 0 0
 $ : num  0 0 0 0 0 0 0 0 0 0
 $ : int  0 0 0 0 0 0 0 0 0 0
 $ : int  0 0 0 0 0 0 0 0 0 0
 $ : int  0 0 0 0 0 0 0 0 0 0
 $ : int  0 0 0 0 0 0 0 0 0 0

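To show how a compact format string such as sprintf("c2d%di", 4), i.e. "c2d4i", maps to the seven column classes seen above, here is a hypothetical base-R helper. It is a sketch, not the R.utils implementation of colClasses(); the letter codes (c = character, d = double, i = integer, l = logical) and the "optional count before a letter" rule are assumptions inferred from the output shown.

```r
# Hypothetical helper, NOT R.utils::colClasses(): expands a compact
# format string into a character vector of column classes.
# Assumed codes: c = character, d = double, i = integer, l = logical;
# an optional count before a letter repeats that class.
expandColClasses <- function(fmt) {
  codes <- c(c = "character", d = "double", i = "integer", l = "logical")
  # Tokens look like "c", "2d", "4i": optional digits, then one letter
  tokens <- regmatches(fmt, gregexpr("[0-9]*[a-z]", fmt))[[1]]
  counts <- sub("[a-z]$", "", tokens)
  counts <- as.integer(ifelse(counts == "", "1", counts))
  types <- sub("^[0-9]*", "", tokens)
  unname(rep(codes[types], times = counts))
}

expandColClasses(sprintf("c2d%di", 4))
# "character" "double" "double" "integer" "integer" "integer" "integer"
```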

dataFrame() is basically implemented as:

dataFrame <- function(colClasses, nrow=1L, ...) {
  df <- vector("list", length=length(colClasses))
  names(df) <- names(colClasses)
  for (kk in seq(along=df)) {
    df[[kk]] <- vector(colClasses[kk], length=nrow)
  }
  attr(df, "row.names") <- seq(length=nrow)
  class(df) <- "data.frame"
  df
} # dataFrame()

/Henrik
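The messages quoted below build an all-NA data frame from a list of names or from a template data frame. A different base-R idiom, not proposed in the thread, is to index the template with integer NA row numbers: each NA index yields a row of NAs, and every column keeps its class from the template. A minimal sketch, reusing the n and df1 template from Sarah's quoted message:

```r
# Template data frame, as in the quoted example
n <- 8
df1 <- data.frame(A = runif(9), B = runif(9))

# n integer NA row indices give n all-NA rows with the template's
# column names and column classes
empty <- df1[rep(NA_integer_, n), , drop = FALSE]
rownames(empty) <- NULL  # replace the "NA", "NA.1", ... row names with 1..n
str(empty)
```

Note that NA_integer_ matters here: a logical NA index would be recycled to the template's 9 rows instead of giving n rows.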

On Tue, Mar 31, 2015 at 4:42 PM, Sarah Goslee sarah.gos...@gmail.com wrote:
 On Tue, Mar 31, 2015 at 6:35 PM, Richard M. Heiberger r...@temple.edu wrote:
 I got rid of the extra column.

 data.frame(r=seq(8), foo=NA, bar=NA, row.names="r")

 Brilliant!

 After much fussing, including a disturbing detour into nested lapply
 statements from which I barely emerged with my sanity (arguable, I
 suppose), here is a one-liner that creates a data frame with an arbitrary
 number of rows, using an existing data frame as a template for the number
 and names of the columns:


 n <- 8
 df1 <- data.frame(A=runif(9), B=runif(9))

 do.call(data.frame, setNames(c(list(seq(n), "r"), as.list(rep(NA,
 ncol(df1)))), c("r", "row.names", colnames(df1))))

 It's not elegant, but it is fairly R-ish. I should probably stop
 hunting for an elegant solution now.

 Thanks, everyone!

 Sarah


 Rich

 On Tue, Mar 31, 2015 at 6:18 PM, Sven E. Templer sven.temp...@gmail.com 
 wrote:
 If you don't mind an extra column, you could use something similar to:

 data.frame(r=seq(8),foo=NA,bar=NA)

 If you do, here is another approach (see function body):

 empty.frame <- function (r = 1, n = 1, fill = NA_real_) {
   data.frame(setNames(lapply(rep(fill, length(n)), rep, times=r), n))
 }
 empty.frame()
 empty.frame(, seq(3))
 empty.frame(8, c("foo", "bar"))

 I could not fit it on one line either without retyping at least one
 argument (n in this case), so I suggest a function is the way to go
 for a simpler syntax.

 Thanks to all for the ideas!
 Sven

 On 31 March 2015 at 20:55, William Dunlap wdun...@tibco.com wrote:

 You can use structure() to attach the names to a list that is input to
 data.frame.
 E.g.,

 dfNames <- c("First", "Second Name")
 data.frame(lapply(structure(dfNames, names=dfNames),
 function(name)rep(NA_real_, 5)))


 Bill Dunlap
 TIBCO Software
 wdunlap tibco.com

 On Tue, Mar 31, 2015 at 11:37 AM, Sarah Goslee sarah.gos...@gmail.com
 wrote:

  Hi,
 
  Duncan Murdoch suggested:
 
   The matrix() function has a dimnames argument, so you could do this:
  
   names <- c("strat", "id", "pid")
   data.frame(matrix(NA, nrow=10, ncol=3, dimnames=list(NULL, names)))
 
  That's a definite improvement, thanks. But no way to skip matrix()? It
  just seems unRlike, although since it's only full of NA values there
  are no coercion issues with column types or anything, so it doesn't
  hurt. It's just inelegant. :)
 
  Sarah

 --
 Sarah Goslee
 http://www.functionaldiversity.org
