Re: [R] Get means of matrix

2015-11-18 Thread Dennis Murphy
Hi:

Here's another way to look at the problem. Instead of manually adding
a new column after k datasets have been read in, read your individual
data files into a list, as long as they all have the same variable
names and the same class (in this case, data.frame). Then create a
vector of names for the list components and use 'apply family' logic
to get the column means, returning the combined results to a data
frame or matrix. Here's a toy example to illustrate the point.
Firstly, three data frames are created and saved to external files:

# Create some artificial data and ship to external files

d1 <- data.frame(x1 = rpois(10, 20), x2 = rpois(10, 23), x3 = rpois(10, 25))
d2 <- data.frame(x1 = rpois(10, 20), x2 = rpois(10, 23), x3 = rpois(10, 25))
d3 <- data.frame(x1 = rpois(10, 20), x2 = rpois(10, 23), x3 = rpois(10, 25))

write.csv(d1, file = "d1.csv", row.names = TRUE, quote = FALSE)
write.csv(d2, file = "d2.csv", row.names = TRUE, quote = FALSE)
write.csv(d3, file = "d3.csv", row.names = TRUE, quote = FALSE)

###
# Now, read them back in and store them in a list object

# Vector of file names to process
files <- paste0("d", 1:3, ".csv")

# Create the list of data frames and assign names to list components
L <- lapply(files, function(x) read.csv(x, header = TRUE))
names(L) <- paste0("d", 1:3)

# Compute column means from each list component and row bind them
# Method 1: base R
do.call(rbind, lapply(L, colMeans))


# Method 2: plyr package
library(plyr)
ldply(L, colMeans)


Dennis

On Wed, Nov 18, 2015 at 2:19 AM, Jesús Para Fernández
 wrote:
> Hi everyone
>
> I have a dataframe "data" wich is the result of join multiple csv (400 rows 
> and 600cols every csv). The "data" dataframe has n rows and m columns (20 
> rows and 600 cols) , and I have add a new colum, "csvdata", in which I 
> specify the number of csv at wich those data belong.
>
> So, the dataframe "data" looks like:
>
> x1x2 x3xncsvdata
> 21   2332121
> 27   2139141
> 24   2230111
> ..
> 21   2432   19 2
> 27   2139142
> ..
> 27   22 3011n
>
>
>
> I want to store into a matrix the mean values of different substes of data of 
> every csv, for example:
>
> region1,1 (rows 1:20,columns 1:20) for every "csvdata" value
> region 2,1 (rows 21:40,columns 1:20) para every "csvdata" value
> 
>
> And so on for hole data.frame.
>
> I have tryed:
>
> area1<-tapply(as.matrix(data[1:20,1]),datos$csvdata,mean,na.rm=T)
> area2<-tapply(as.matrix(data[1:20,1]),datos$csvdata,mean,na.rm=T)
>
> But this error is the output I obtain:
>
> Error in tapply(data[1:30, ], datos$nueva, mean, na.rm = T) :
>   arguments must have same length
>
> I´m sure that it is not very complex to do it, but I have no idea of how to 
> do it.
>
> Thanks for all.
>
>
> [[alternative HTML version deleted]]
>
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Exporting column names with spaces with write.csv() XXXX

2015-09-05 Thread Dennis Murphy
...or, when trying to read it back into R,

read.csv(header = TRUE, text = '
"no good","no good at all"
1,4
2,5
3,6')

  no.good no.good.at.all
1   1  4
2   2  5
3   3  6


read.csv(header = TRUE, check.names = FALSE, text = '
"no good","no good at all"
1,4
2,5
3,6')

  no good no good at all
1   1  4
2   2  5
3   3  6


Thanks for the MWE, Jim.

Dennis

On Sat, Sep 5, 2015 at 7:57 PM, Jim Lemon  wrote:
> Hi Dan,
> I don't get this behavior with R-3.2.2:
>
> da.df<-data.frame(1:3,4:6)
> names(da.df)<-c("no good","no good at all")
>> da.df
>   no good no good at all
> 1   1  4
> 2   2  5
> 3   3  6
> write.csv(da.df,file="gloop.csv",row.names=FALSE)
>
> gives me this:
>
> "no good","no good at all"
> 1,4
> 2,5
> 3,6
>
> Are you sure that R didn't add the periods when you created the data frame?
> They would then be written into the CSV file.
>
> Jim
>
>
>
> On Sun, Sep 6, 2015 at 12:09 PM, Dan Abner  wrote:
>
>> Hi all,
>>
>> I have a number of columns with spaces in the column names exactly as
>> I want them in R. When I go to export the data table to csv using
>> write.csv(), the fn adds periods in place of the spaces.
>>
>> Is there a way to suppress the addition of the periods?
>>
>> Thanks,
>>
>> Dan
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] if else statement for rain data to define zero for dry and one to wet

2015-06-06 Thread Dennis Murphy
I'm sorry, but I have to take issue with this particular use case of
ifelse(). When the goal is to generate a logical vector, ifelse() is
very inefficient. It's better to apply a logical condition directly to
the object in question and multiply the result by 1 to make it
numeric/integer rather than logical.

To illustrate this, consider the following toy example. The function
f1 replicates the suggestion to apply ifelse() columnwise (with the
additional overhead of preallocating storage for the result), whereas
the function f2 applies the logical condition on the matrix itself
using vectorization, with the recognition that a matrix is an atomic
vector with a dim attribute.

set.seed(5290)

# 1000 x 1000 matrix
m - matrix(sample(c(0, 0.05, 0.2), 1e6, replace = TRUE), ncol = 1000)

f1 - function(mat)
  {
 newmat - matrix(NA, ncol = ncol(mat), nrow = nrow(mat))
 for(i in seq_len(ncol(mat)))
 newmat[, i] - ifelse(mat[, i]  0.1, 1, 0)
 newmat
  }

f2 - function(mat) 1 * (mat  0.1)


On my system, I got

 system.time(m1 - f1(m))
   user  system elapsed
   0.140.000.14

 system.time(m2 - f2(m))
   user  system elapsed
   0.010.000.01

 identical(m1, m2)
[1] TRUE

The all too common practice of using  ifelse(condition, 1, 0) on an
atomic vector is easily replaced by 1 * (condition), where the result
of condition is a logical atomic object coerced to numeric.

To reduce memory, one should better define f2 as

f2 - function(mat) 1L * (mat  0.1)

but doing so in this example no longer creates identical objects since

 typeof(m1)
[1] double

Thus, f1 is not only inefficient in terms of execution time, it's also
inefficient in terms of storage.

Given several recent warnings in this forum about the inefficiency of
ifelse() and the dozens of times I've seen the idiom implemented in f1
as a solution over the last several years (to which I have likely
contributed in my distant past as an R-helper), I felt compelled to
say something about this practice, which BTW extends not just to 0/1
return values but to
0/x return values, where x is a nonzero real number.

Dennis


On Sat, Jun 6, 2015 at 12:50 AM, Jim Lemon drjimle...@gmail.com wrote:
 Hi rosalinazairimah,
 I think the problem is that you are using if instead of ifelse. Try this:

 wet_dry-function(x,thresh=0.1) {
  for(column in 1:dim(x)[2]) x[,column]-ifelse(x[,column]=thresh,1,0)
  return(x)
 }
 wet_dry(dt)

 and see what you get.

 Also, why can I read your message perfectly while everybody else can't?

 Jim

 -Original Message-
 From: roslina...@gmail.com
 Sent: Fri, 5 Jun 2015 16:49:08 +0800
 To: r-help@r-project.org
 Subject: [R] if else statement for rain data to define zero for dry and
 one to wet

 Dear r-users,

 I have a set of rain data:

 X1950 X1951 X1952 X1953 X1954 X1955 X1956 X1957 X1958 X1959 X1960 X1961
 X1962

 1   0.0   0.0  14.3   0.0  13.5  13.2   4.0 0   3.3 0 0   0.0


 2   0.0   0.0  21.9   0.0  10.9   6.6   2.1 0   0.0 0 0   0.0


 3  25.3   6.7  18.6   0.8   2.3   0.0   8.0 0   0.0 0 0  11.0


 4  12.7   3.4  37.2   0.9   8.4   0.0   5.8 0   0.0 0 0   5.5


 5   0.0   0.0  58.3   3.6  21.1   4.2   3.0 0   0.0 0 0  15.9


 I would like to go through each column and define each cell with value
 greater than 0.1 mm will be 1 and else zero. Hence I would like to attach
 the rain data and the category side by side:


 1950   state

 1 0.00

 2 0.00

 3 25.3   1

 4 12.7   1

 5 0.00


 ...


 This is my code:


 wet_dry  - function(dt)

 { cl   - length(dt)

   tresh  - 0.1


   for (i in 1:cl)

   {  xi - dt[,i]

  if (xi  tresh ) 0 else 1

   }

 dd - cbind(dt,xi)

 dd

 }


 wet_dry(dt)


 Results:

 wet_dry(dt)

X1950 X1951 X1952 X1953 X1954 X1955 X1956 X1957 X1958 X1959 X1960
 X1961
 X1962 X1963 X1964 X1965 X1966 X1967 X1968 X1969 X1970 X1971 X1972 X1973
 X1974 X1975 X1976 X1977

 10.0   0.0  14.3   0.0  13.5  13.2   4.0   0.0   3.3   0.0   0.0
 0.0
   4.2   0.0   2.2   0.0   4.4   5.1 0   7.2   0.0   0.0   0.0   5.1
 0   0.0 0   0.3

 20.0   0.0  21.9   0.0  10.9   6.6   2.1   0.0   0.0   0.0   0.0
 0.0
   8.4   0.0   4.0   0.0   4.9   0.7 0   0.0   0.0   0.0   0.0   5.4
 0   3.3 0   0.3

 3   25.3   6.7  18.6   0.8   2.3   0.0   8.0   0.0   0.0   0.0   0.0
 11.0
   4.2   0.0   2.0   0.0  14.2  17.1 0   0.0   0.0   0.0   0.0   2.1
 0   1.7 0   4.4

 4   12.7   3.4  37.2   0.9   8.4   0.0   5.8   0.0   0.0   0.0   0.0
 5.5
   0.0   0.0   5.4   0.0   6.4  14.9 0  10.1   2.9 143.4   0.0   6.1
 0   0.0 0  33.5


 It does not work and give me the original data.  Why is that?


 Thank you so much for your help.

   [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 

Re: [R] geom_text in ggplot (position)

2015-05-13 Thread Dennis Murphy
This problem is discussed in Winston Chang's R Graphics Handbook,
section 3.9. Adapting his code to this example:

test  - data.frame(variables =  c(PE_35, PE_49),
value1=c(13,3),
value2=c(75,31),
value3=c(7,17),
value4 =c(5,49))

library(reshape2)
library(ggplot2)
library(plyr)

m - melt(test, variables)
m$cO - gl(2, 2, length = nrow(m), labels = c(A, B))
m$cat - gl(2, 4, labels = c(0, 1))


names(m)[3] - recuento

mm - ddply(m, .(variables, cat), mutate,
   pos = cumsum(recuento) - 0.5 * recuento)   ## this
is the key part

ggplot(mm, aes(x = cat, y = recuento, fill = cO)) +
geom_bar(stat = identity) +
geom_text(aes(y = pos, label = recuento)) +
facet_wrap(~ variables)

Dennis

On Tue, May 12, 2015 at 9:50 AM, AURORA GONZALEZ VIDAL
aurora.gonzal...@um.es wrote:
 Hello everybody.

 I have an esthetic question. I have managed to create a stacked and
 grouped bar plot but I don't manage with putting the text in the middle of
 the bar plots. Do you know how to write the numbers in that position?

 Thank you so much.

 Example code:

 test  - data.frame(variables =  c(PE_35, PE_49),
 value1=c(13,3),
 value2=c(75,31),
 value3=c(7,17),
 value4 =c(5,49))

 library(reshape2) # for melt

 melted - melt(test, variables)
 melted$cO - c(A,A,B,B,A,A,B,B)

 melted$cat - ''
 melted[melted$variable == 'value1' | melted$variable == 'value2',]$cat -
 0
 melted[melted$variable == 'value3' | melted$variable == 'value4',]$cat -
 1

 names(melted)[3] - recuento

 library(ggplot2)

 ggplot(melted, aes(x = cat, y = recuento,ymax=max(recuento)*1.05, fill =
 cO)) +
   geom_bar(stat = 'identity', position = 'stack', col=black) +
 facet_grid(~ variables)+
   geom_text(aes(label = recuento), size = 5, hjust = 0.5, vjust = 1,
 position =stack)


 --
 Aurora González Vidal

 Sección Apoyo Estadístico.
 Servicio de Apoyo a la Investigación (SAI).
 Vicerrectorado de Investigación.
 Universidad de Murcia
 Edif. SACE . Campus de Espinardo.
 30100 Murcia

 @. aurora.gonzal...@um.es
 T. 868 88 7315
 F. 868 88 7302
 www.um.es/sai
 www.um.es/ae

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R code/package for calculation of Wasserstein distance between two densities

2015-04-20 Thread Dennis Murphy
Hi Ranjan:

Try this:

library(sos)
findFn(Wasserstein)

It appears there are three packages that might be relevant:
HistDAWass, transport and TDA.

HTH,
Dennis

On Sat, Apr 18, 2015 at 8:06 PM, Ranjan Maitra
maitra.mbox.igno...@inbox.com wrote:
 Dear friends,

 Before reinventing the wheel, I was wondering if anyone can point me to code 
 for calculating the Wasserstein distance between two densities. I am 
 particularly interested in mixture densities (in functional form). I know 
 that we have the earthmovers distance in R via the emdist package but it 
 appears to me upon a quick look that this can not handle densities in 
 functional form. So, I was wondering if anyone had any ideas on code for this 
 problem.

 Many thanks and best wishes,
 Ranjan

 
 Can't remember your password? Do you need a strong and secure password?
 Use Password manager! It stores your passwords  protects your account.

 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] transpose a data frame according to a specific variable

2015-02-09 Thread Dennis Murphy
One way is to use the reshape2 package:

library(reshape2)
dcast(DF, id ~ Year, value.var = Day)


Dennis

On Mon, Feb 9, 2015 at 7:47 AM, jeff6868
geoffrey_kl...@etu.u-bourgogne.fr wrote:
 Dear R-users,

 I would like to transpose a large data.frame according to a specific column.
 Here's a reproductible example, it will be more understandable.

 At the moment, my data.frame looks like this example:

 DF - data.frame(id=c(A,A,A,B,B,B,C,C,C),
 Year=c(2001,2002,2003,2002,2003,2004,2000,2001,2002),
 Day=c(120,90,54,18,217,68,164,99,48))

 I would like it being transformed to this (fake example again, still just
 for being understandable):

 finalDF -
 data.frame(id=c(A,B,C),2000=c(NA,NA,164),2001=c(120,NA,99),
 2002=c(90,18,48),2003=c(54,217,NA),2004=c(NA,68,NA))

 Any ideas for doing this easily? I haven't found any good answer on the web.

 Thanks for the help!




 --
 View this message in context: 
 http://r.789695.n4.nabble.com/transpose-a-data-frame-according-to-a-specific-variable-tp4702971.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] color heatmap according to value ranges

2015-02-06 Thread Dennis Murphy
Hi:

You get a gradient for your response variable because it is numeric.
You need to convert it to factor. (Discrete color sets need to be
mapped to discrete variables - a factor is one way to generate a
discrete variable.)  Here is one approach to get what you appear to be
looking for.

DF - data.frame(a = rep(1:3, each = 3),
 b = 1:3, d = 1:9)
DF$e - cut(DF$d, c(0, 2, 3, 5, 10), rightmost = TRUE,
 labels = c(1-2, 2-3, 3-5, 5))

library(ggplot2)
ggplot(DF, aes(x = a, y = b, fill = e)) +
   geom_tile() +
   scale_fill_manual(values = c(darkblue, blue, lightblue, white))

You need a border of some kind around the plot since the right side of
the graph is the same color as the default background. If you're OK
with the default graph above, that's fine. If you want to expand it to
the edges, you need to draw your own border, something like

ggplot(DF, aes(x = a, y = b, fill = e)) +
   geom_tile() +
   scale_fill_manual(values = c(darkblue, blue, lightblue, white)) +
  # eliminate the padding from each scale
   scale_x_continuous(expand = c(0, 0)) +
   scale_y_continuous(expand = c(0, 0)) +
 # draw the border
   theme(panel.border = element_rect(colour = black, fill = NA))

The problem that remains is an inability to distinguish the legend key
color in the last category from the plot background. The simplest
workaround is to change the color of the last category in the scale
specification - one suggestion is a light gray, but you can always
substitute another color:

ggplot(DF, aes(x = a, y = b, fill = e)) +
   geom_tile() +
   scale_fill_manual(values = c(darkblue, blue, lightblue, grey90)) +
   scale_x_continuous(expand = c(0, 0)) +
   scale_y_continuous(expand = c(0, 0)) +
   theme(panel.border = element_rect(colour = black, fill = NA))

Dennis

On Fri, Feb 6, 2015 at 7:13 AM,  n.hub...@ncmls.ru.nl wrote:
 Probably the simplest thing there is, but I can't get it to work:



 Example for my data:



 a - c(1,1,1,2,2,2,3,3,3)

 b - c(1,2,3,1,2,3,1,2,3)

 c - c(1,2,3,4,5,6,7,8,9)

 df - data.frame(cbind(a,b,c))



 I create a heat map with c being the values:



 ggplot(df, aes(df$a, df$b, fill = df$c)) + geom_raster()



 problem:

 The color coding is automatically a gradient. However, I would like to color 
 in 4 fixed colors dependent on the value in c. For example:



 if c=2 color darkblue

 if 2c=3 color blue

 if 3c=5 color lightblue

 if c5 color white



 In addition I would like to show a legend that illustrates this color coding.



 It must be very easy, but I just can't figure it out. Only find commands that 
 make gradients...



 Thanks a lot in advance!




 __

 Dr. Nina C. Hubner
 scientist quantitative proteomics

 Department of Molecular Biology, Radboud University Nijmegen, The Netherlands
 e-mail: n.hub...@ncmls.ru.nl
 tel: +31-24-3613655

 Visiting address:
 Department of Molecular Biology, RIMLS, 2nd floor
 Geert Grooteplein 26/28
 6525 GA Nijmegen
 The Netherlands



 The Radboud University Medical Centre is listed in the Commercial Register of 
 the Chamber of Commerce under file number 41055629.

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Paste every two columns together

2015-01-28 Thread Dennis Murphy
Hi:

Don't know about performance, but this is fairly simple for operating
on atomic vectors:

x - c(A, A, G, T, C, G)
apply(embed(x, 2), 1, paste0, collapse = )
[1] AA GA TG CT GC

Check the help page of embed() for details.

Dennis

On Wed, Jan 28, 2015 at 3:55 PM, Kate Ignatius kate.ignat...@gmail.com wrote:
 I have genetic data as follows (simple example, actual data is much larger):

 comb =

 ID1 A A T G C T G C G T C G T A

 ID2 G C T G C C T G C T G T T T

 And I wish to get an output like this:

 ID1 AA TG CT GC GT CG TA

 ID2 GC TG CC TG CT GT TT

 That is, paste every two columns together.

 I have this code, but I get the error:

 Error in seq.default(2, nchar(x), 2) : 'to' must be of length 1

 conc - function(x) {
   s - seq(2, nchar(x), 2)
   paste0(x[s], x[s+1])
 }

 combn - as.data.frame(lapply(comb, conc), stringsAsFactors=FALSE)

 Thanks in advance!

 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to determine power in my analysis?

2014-11-08 Thread Dennis Murphy
Hi Kristi:

I think this paper elucidates the problem Bert mentioned. A thorough
and careful reading of the last two sections should clarify what
post-hoc power is and is not.

http://www.stat.uiowa.edu/files/stat/techrep/tr378.pdf

Dennis

On Sat, Nov 8, 2014 at 11:25 AM, Kristi Glover
kristi.glo...@hotmail.com wrote:
 Hi Bert, Thanks for the message. So far I know we can test whether my sample 
 size in my analysis is enough or not. It is also post hoc property. For 
 example, we can calculate standard deviations, error variance  etc in the 
 data sets, and then we can use them to determine whether the sample size was 
 enough or not with certain level of alpha and power. we can do it is some of 
 the statistical programs, but I was not aware in R. thanks KG

 Date: Sat, 8 Nov 2014 10:55:56 -0800
 Subject: Re: [R] how to determine power in my analysis?
 From: gunter.ber...@gene.com
 To: kristi.glo...@hotmail.com
 CC: r-h...@stat.math.ethz.ch

 Kristi:

 Power is a prespecified property of the design, not a post hoc
 property of the analysis (SAS procedures notwithstanding). So you're a
 day late and a dollar short.

 I suggest you consult with a local statistician about such matters, as
 you appear to be out of your depth.

 Cheers,
 Bert

 Bert Gunter
 Genentech Nonclinical Biostatistics
 (650) 467-7374

 Data is not information. Information is not knowledge. And knowledge
 is certainly not wisdom.
 Clifford Stoll




 On Sat, Nov 8, 2014 at 3:49 AM, Kristi Glover kristi.glo...@hotmail.com 
 wrote:
  Hi R Users,
  I was trying to determine whether I have enough samples and power in my 
  analysis. Would you mind to provide some hints?.  I found a several 
  packages for power analysis but did not find any example data. I have two 
  sites and each site has 4 groups. I wanted to test whether there was an 
  effect of restoration activities and sites on the observed value. I used a 
  two way factorial ANOVA and now I wanted to test the power of the analysis 
  (whether the sample sizes are enough for the analysis? what are the alpha 
  and power in the analysis using this data set? if it is not enough, how 
  much samples should be collected for alpha 0.05 and power=0.8 and 0.9 for 
  the analysis (two way factorial analysis).
  The example data:data-structure(list(observedValue = c(0.08, 0.53, 0.14, 
  0.66, 0.37, 0.88, 0.84, 0.46, 0.3, 0.61, 0.75, 0.82, 0.67, 0.37, 0.95, 
  0.73, 0.74, 0.69, 0.06, 0.97, 0.97, 0.07, 0.75, 0.68, 0.53, 0.72, 0.34, 
  0.12, 0.49, 0.77, 0.45, 0.07, 0.97, 0.34, 0.68, 0.48, 0.65, 0.7, 0.57, 
  0.66, 0.4, 0.29, 0.88, 0.36, 0.68, 0.32, 0.8, 0, 0.11, 0.48, 0.85, 0.94, 
  0.12, 0.12, 0, 0.89, 0.66, 0.2, 0.57, 0.09, 0.27, 0.81, 0.53, 0.09, 0.5, 
  0.41, 0.89, 0.47, 0.39, 0.85, 0.71, 0.89, 0.01, 0.71, 0.42, 0.72, 0.62, 
  0.3, 0.56, 0.99, 0.97, 0.03, 0.09, 0.27, 0.27, 0.94, 0.23, 0.97, 0.81, 
  0.95), condition = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
  2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
  3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 
  1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 
  3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
  4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c(goo!
 d!
  , !
   medium, poor, verygood), class = factor), areas = structure(c(1L, 
  1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
  1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
  1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
  2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
  2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), 
  .Label = c(Restored, unrestored), class = factor)), .Names = 
  c(observedValue, condition, areas), class = data.frame, row.names 
  = c(NA, -90L))
  test= aov(observedValue~condition*areas,data=data)summary(test)
  power of the analysis?
  thanks for your help.
  Sincerely, KG
 
  [[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide 
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ggplot barplot error

2014-05-15 Thread Dennis Murphy
Hi:

There are several options, but you shot yourself in the foot by using
cbind() to combine objects of different classes into a matrix, so
everything ends up being a factor because a matrix is an atomic object
with a dim attribute, and atomic objects must have a common class, so
Flows and Category are coerced to factors. What you want is to combine
your atomic objects into a data frame, since that is what ggplot()
requires:

#--
Date - c(2014-04-01,2014-04-02,2014-04-03,2014-04-04,2014-04-07,
  2014-04-09,2014-04-10,2014-04-01,2014-04-02,2014-04-03,
  2014-04-04,2014-04-07,2014-04-09,2014-04-10)

Flows - c(479.6, 187.2,  148.6,   41.5,  123.5,  176.3,   68.3,  401.5,
   -164.2, -195.5, 35.1, -224.0,  -58.6, -138.9)

Category - c(Equity, Equity, Equity, Equity, Equity, Equity,
  Equity, Debt, Debt,   Debt,   Debt,   Debt,
  Debt,  Debt)

# Use data.frame, not cbind
DF - data.frame(Date, Flows, Category)

#-

Your ggplot() code will work, but probably not the way you intended.
Here are a couple of versions of your initial plot:

#--
library(ggplot2)

# Initial plot, stacked
ggplot(data = DF, aes(x = Date, y = Flows, fill = Category)) +
geom_bar(stat=identity)

# Initial plot, dodged
ggplot(data = DF, aes(x = Date, y = Flows, fill = Category)) +
geom_bar(stat=identity, position = dodge)
#--


I'm guessing what you want is to 'stack' the dodged version above; if
so, you need to plot the two groups separately and create a fill
factor 'on the fly':

#---
# Stacking the dodged plot - requires separate calls
# to each subset so that zero is the anchor

ggplot(data = DF, mapping = aes(x = Date, y = Flows)) +
geom_bar(data = subset(DF, Category == Equity),
 stat = identity, aes(fill = Equity)) +
geom_bar(data = subset(DF, Category == Debt),
 stat = identity, aes(fill = Debt)) +
scale_fill_manual(Category,
  values = c(Equity = blue, Debt = red))
#---


If you want to plot on the actual dates rather than the given dates,
and perhaps to change the date format since the year is common, here
is one way to do it:

#---
# Convert to Date format so that we can plot by actual date
DF$Date - as.Date(as.character(DF$Date))

# Need scales package to change date format
library(scales)

ggplot(data = DF, mapping = aes(x = Date, y = Flows)) +
geom_bar(data = subset(DF, Category == Equity),
 stat = identity, aes(fill = Equity)) +
geom_bar(data = subset(DF, Category == Debt),
 stat = identity, aes(fill = Debt)) +
scale_fill_manual(Category,
  values = c(Equity = blue, Debt = red)) +
scale_x_date(labels = date_format(%m-%d))
#---



I hope this covers your use case.

Dennis

On Thu, May 15, 2014 at 4:32 AM, Vikram Bahure
economics.vik...@gmail.com wrote:
 Hi,

 I am facing issue with ggplot plotting. I have given the data and the code
 below. I have also attached the graph below.

 Problem:
 On date *2014-04-07*, we have debt and equity flow, but it does not shows
 equity flow. Ideally what I would like is the stack for debt/equity starts
 from X axis.

 *## Data*
 *Date -
 c(2014-04-01,2014-04-02,2014-04-03,2014-04-04,2014-04-07,*
 *
 2014-04-09,2014-04-10,2014-04-01,2014-04-02,2014-04-03,*
 *  2014-04-04,2014-04-07,2014-04-09,2014-04-10)*
 *Flows - c(479.6,187.2,  148.6,   41.5,  123.5,  176.3,   68.3,  401.5,*
 *   -164.2, -195.5, 35.1, -224.0,  -58.6, -138.9)*
 *Category - c(Equity, Equity, Equity, Equity, Equity, Equity,*
 *  Equity, Debt, Debt,   Debt,   Debt,   Debt,*
 *  Debt,  Debt)*
 *data - cbind(Date,Flows,Category)*

 *## GGplot*
 *ggplot(data = fii.df, aes(x = Date, y = FII, fill = Category)) +
 geom_bar(stat=identity) *

 Thanks in advance.

 Regards
 Vikram Bahure

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to plot data using ggplot

2014-04-04 Thread Dennis Murphy
1. Look into the ggmap package if you want to overlay your data onto a map.
2. Re your color scale representation, define 'appropriately'. Do you
mean a continuous range expressible in a colorbar or a discrete range,
and if the latter, what intervals did you have in mind?

Dennis

On Fri, Apr 4, 2014 at 2:42 AM, drunkenphd enrr...@fimif.upt.al wrote:
 Hi,
 I have a list of cities and their coordinates, and also for each city I have
 a variable varA which I want to represent in a map using ggplot.
 For example :

 CityA lat 22.93977  lon 46.70663varA 545

 CityB lat 23.93977  lon 46.70663varA 122

 VarA values begin from 0 to 3000.
 I want the color scale to represent  this range appropriately.
 Can you help
 Regards



 --
 View this message in context: 
 http://r.789695.n4.nabble.com/How-to-plot-data-using-ggplot-tp4688168.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] creating table with sequences of numbers based on the table

2014-03-13 Thread Dennis Murphy
Less coding with plyr:

tab - read.table(text=pop Freq
1   1   30
2   2   25
3   3   30
4   4   30
5   5   30
6   6   30
7   7   30,sep=,header=TRUE)

# Function to do the work on each row
f - function(pop, Freq) data.frame(ind = seq_len(Freq))

library(plyr)
u - mdply(tab, f)[, -2]

Dennis

On Thu, Mar 13, 2014 at 8:01 AM, arun smartpink...@yahoo.com wrote:
 Hi,
 Try:
 Either

 tab - read.table(text=pop Freq
 1   1   30
 2   2   25
 3   3   30
 4   4   30
 5   5   30
 6   6   30
 7   7   30,sep=,header=TRUE)

 indx - rep(1:nrow(tab),tab$Freq)
 tab1 - transform(tab[indx,],ind=ave(seq_along(indx),indx,FUN=seq_along))[,-2]
 #or
 tab2 -  transform(tab[indx,],ind=unlist(sapply(tab$Freq,seq)))[,-2]
 identical(tab1,tab2)
 #[1] TRUE
 #or
 tab3 - transform(tab[indx,], ind= 
 with(tab,seq_len(sum(Freq))-rep(cumsum(c(0L,Freq[-length(Freq)])),Freq)))[,-2]
 identical(tab1,tab3)
 #[1] TRUE

 A.K.


 I have a problem with transfering one table to another automatically. From 
 table like this:

 tab
   pop Freq
 1   1   30
 2   2   25
 3   3   30
 4   4   30
 5   5   30
 6   6   30
 7   7   30

 I want to use number of individuals (freq) and then in next
 table just list them with following numbers (depending on total number
 of individuals)
 Like this:
 in
 popind

 1  1
 1  2
 1  3
 1  4
 .   .
 .   .
 1  30
 2  1
 2  2
 2  3
 2  4
 .   .
 2  25
 3  1
 3  2
 .   .
 .   .

 How can i do it? I think i have to use loops but so far I failed.
 Thank you in advance,
 Best,
 Malgorzata Gazda

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Math symbols in ggplot facets

2014-02-22 Thread Dennis Murphy
Hi:

You were close...

library(ggplot2)
m - mpg

# Set the factor labels with plotmath code (note the ==)
m$drv - factor(m$drv, labels = c(sigma[0] == sqrt(2),
  sigma[0] == 2 * sqrt(2),
  sigma[0] == 3 * sqrt(2)))

ggplot(m, aes(x = displ, y = cty)) + geom_point() +
  facet_grid(. ~ drv, labeller = label_parsed)

Dennis

On Sat, Feb 22, 2014 at 3:24 AM, Lars Bishop lars...@gmail.com wrote:
 Hello,

 I would like to show in my facet labels the equivalent in LaTex of
 $\sigma_{0}= \sqrt{2}$. I think I'm close below, but not yet as it shows
 $(\sigma_{0}, \sqrt{2})$

 m - mpg
 levels(m$drv) - c(sigma[0]=sqrt(2), sigma[0]=2 * sqrt(2), sigma[0]= 3
 * sqrt(2))

 ggplot(m, aes(x = displ, y = cty)) + geom_point() +
   facet_grid(. ~ drv, labeller = label_parsed)


 Thanks,
 Lars.

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Legend text populated with calculated values and super script?

2014-02-07 Thread Dennis Murphy
Here's a bquote version:

x=c(1,2,3,4);  y=c(1,2,3,4); z=c(1.25,1.5,2.5,3.5)

# first stats based on data, used to populate legend
wdt_n = 50;  wdt_mbias = 0.58
wdt_mae = 2.1;  wdt_R2 = 0.85
# second stats based on data, used to populate legend
spas_n = 50; spas_mbias = 0.58
spas_mae = 2.1; spas_R2 = 0.85

wleg - bquote(paste(WDT (, N == .(wdt_n), , ,
   Bias == .(wdt_mbias), , ,
   MAE == .(wdt_mae), , ,
   R^2 == .(wdt_R2), )))
sleg - bquote(paste(SPAS (, N == .(spas_n), , ,
   Bias == .(spas_mbias),
   , , MAE == .(spas_mae),
   , , R^2 == .(spas_R2), )))

plot(x,y, col=red1, pch=1); lines(x,z, type=p, col=green4,pch=3)

legend(topleft, legend = as.expression(c(sleg, wleg)),
col=c(red1,green4), pch=c(1,3),
  cex=0.85)

Dennis

On Fri, Feb 7, 2014 at 4:58 PM, David Winsemius dwinsem...@comcast.net wrote:

 On Feb 7, 2014, at 7:54 AM, Douglas M. Hultstrand wrote:

 Hello,

 I am trying to generate a plot legend that contains calculated summary
 statistics, one statistic is R^2.  I have tried several variations using
 the commands expression and bqoute as stated on the R help pages.  I
 have not been able to get the R^2 super script correct along with the
 calculated statistics.

 I provided an example below, what I want is the legend (wdt_leg and
 spas_leg) as below but with the R^2 as superscript.

 # Example Data
 x=c(1,2,3,4);  y=c(1,2,3,4); z=c(1.25,1.5,2.5,3.5)

 # first stats based on data, used to populate legend
 wdt_n = 50;  wdt_mbias = 0.58
 wdt_mae = 2.1;  wdt_R2 = 0.85
 # second stats based on data, used to populate legend
 spas_n = 50; spas_mbias = 0.58
 spas_mae = 2.1; spas_R2 = 0.85

 # create legend
 wdt_leg - paste(WDT (N = , wdt_n,, Bias = ,wdt_mbias,, MAE =
 ,wdt_mae, , R2 = , wdt_R2, ), sep=)
 spas_leg - paste(SPAS (N = , spas_n,, Bias = ,spas_mbias,, MAE =
 ,spas_mae, , R2 = , spas_R2, ), sep=)

 # create plot
 plot(x,y, col=red1, pch=1); lines(x,z, type=p, col=green4,pch=3)
 leg.txt - c(spas_leg, wdt_leg)
 legend(topleft, legend = leg.txt, col=c(red1,green4), pch=c(1,3),
 cex=0.85)

 sublist -
 structure(list(wdt_n = 50, wdt_mbias = 0.58, wdt_mae = 2.1, wdt_R2 = 0.85,
 spas_n = 50, spas_mbias = 0.58, spas_mae = 2.1, spas_R2 = 0.85), .Names = 
 c(wdt_n,
 wdt_mbias, wdt_mae, wdt_R2, spas_n, spas_mbias, spas_mae,
 spas_R2))

 legends -c(
   substitute(
 atop(WDT * group((, list(N == wdt_n, Bias == wdt_mbias, MAE == wdt_mae 
 ), ) ),
   R^2 == wdt_R2 ) ,  env=sublist),
   substitute(
 atop(SPAS * group((, list(N == spas_n, Bias == spas_mbias, MAE == 
 spas_mae ), )),
   R^2 == spas_R2 ), env=sublist)  )

 # I tried with: as.expression( lapply( exprlist, function(e) bquote(e) ) ) 
 but failed repeatedly.
 # In order to get `substitute` to cook the bacon, you need to use real 
 expressions, not text.

 plot(x,y, col=red1, pch=1); lines(x,z, type=p, col=green4,pch=3)
 legend(topleft,
legend = as.expression(legends),
col=c(red1,green4), pch=c(1,3),
cex=0.85)


 --
 David Winsemius
 Alameda, CA, USA

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] passing variable names to dplyr

2014-01-28 Thread Dennis Murphy
David:

I privately suggested he post to manipulatr because Hadley is more
likely to see his question there first than in R-help. He originally
posted here, noted the cross-posting and referral at manipulatr and
responded back to this list when he got a successful reply from
Hadley. I don't see that he's done anything wrong; in fact, he's been
exceptionally polite. If you want to yell at anybody, yell at me.

Dennis

On Tue, Jan 28, 2014 at 3:33 PM, David Winsemius dwinsem...@comcast.net wrote:

 On Jan 27, 2014, at 7:45 AM, Bos, Roger wrote:

 All,

 I would like to figure out how to pass variable names to the dplyr function 
 mutate.  For example, this works because hp is one of the variable names on 
 mtcars:

 mutate(mtcars, scale(hp))

 Let's says I want to pass in the target variable instead of hard-coding the 
 name, as follows:

 target - hp
 mutate(mtcars, scale(target))

 That dones't work.  I read somewhere about using lapply, but that suggestion 
 didn't work for me either:

 target - lapply(hp, as.symbol)
 mutate(mtcars, scale(target))

 Does anyone know how to do this?

 You cross-posted this to the manipulatr newsgroup (where it was addressed). 
 Crossposting is a practice which is not appreciated in R lists.

 --

 David Winsemius
 Alameda, CA, USA

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] demonstrating R in introductory class using point-and-click software

2014-01-15 Thread Dennis Murphy
Hi Ranjan:

I think this is an important and well-posed question. AFAIR, sciviews
provides a fairly sophisticated GUI for R, but I haven't looked at it
in a while. The two packages I'd suggest you look at are John Fox's R
Commander (package Rcmdr...and note all the plug-ins!) and Ian
Fellows' Deducer package, both of which use a more sophisticated GUI
than the R GUI console in Windows.

Deducer runs on top of Java; it needs both rJava and JGR. If you're
using 64-bit R, you also need 64-bit Java installed; if you're content
with 32-bit R, then 32-bit Java is all you need. I know that Deducer
runs pretty smoothly on 32-bit R, but for 64-bit, you'll need to be a
bit more vigilant about the Java interface. For classroom purposes, I
would think 32-bit R is sufficient.

R Commander uses Tcl/Tk to program the GUI, and since a version of
Tcl/Tk comes bundled with binary versions of R, you don't have to
worry too much about the interface.

Take notice that both packages have plug-in packages that you can load
on top of the standard GUI.

I find it a little disheartening that in this era, engineers in
training are averse to programming, though. Sigh..

HTH,
Dennis

On Wed, Jan 15, 2014 at 7:22 PM, Ranjan Maitra
maitra.mbox.igno...@inbox.com wrote:
 Dear friends,

 OK, I did not think that it would ever come down to this, but I am
 here with a question on what would be the best point-and-click approach
 to using R in the classroom in a way that the students can also follow
 and exhibit (on their own).

 So let me explain: I am teaching an introductory-level statistics class
 for introductory first- and second-year civil and industrial
 engineering students. This is a basic class following the book (not
 important): Basic Engineering Data Collection and Analysis by Stephen B.
 Vardeman and John Marcus. The class is very basic, and has
 traditionally relied on JMP and Excel (less prevalent) to illustrate
 data examples. I don't want to use either because I am a proponent of
 OSS, and also because I find these two too cumbersome to handle. Also,
 I don't think I have the time (and the students do not have the
 inclination, I am told) to handle even basic interactive programming.
 So, I was wondering if people with more experience would have
 suggestions on what would be best to use.

 I apologize if this has been discussed quite a bit here, but as I said
 before, I did not think that it would come to this, so I basically did
 not pay much attention.

 Thanks very much for suggestions and experiences!
 Best wishes,
 Ranjan



 --
 Important Notice: This mailbox is ignored: e-mails are set to be
 deleted on receipt. Please respond to the mailing list if appropriate.
 For those needing to send personal or professional e-mail, please use
 appropriate addresses.

 
 GET FREE SMILEYS FOR YOUR IM  EMAIL - Learn more at 
 http://www.inbox.com/smileys
 Works with AIM®, MSN® Messenger, Yahoo!® Messenger, ICQ®, Google Talk™ and 
 most webmails

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help with removing extra legend elements in ggplot

2013-11-19 Thread Dennis Murphy
The additional element comes from this code:

geom_point(aes(color = VegType, shape = VegType, size = 10))

Take the size argument outside the aes() statement and the legend will
disappear:

geom_point(aes(color = VegType, shape = VegType), size = 10)

The aes() statement maps a variable to a plot aesthetic. In this case
you're mapping VegType to color and shape. You want to *set* the size
aesthetic to a constant value, and that is done by assigning the value
10 to the size aesthetic outside of aes().

Dennis

On Tue, Nov 19, 2013 at 4:35 PM, Matthew Van Scoyoc sco...@gmail.com wrote:
 No dice. I still get the 10 legend element.
 Thanks for the quick reply.
 Cheers,

 MVS
 =
 Matthew Van Scoyoc
 Graduate Research Assistant, Ecology
 Wildland Resources Department http://www.cnr.usu.edu/wild/  Ecology
 Center http://www.usu.edu/ecology/
 Quinney College of Natural Resources http://cnr.usu.edu/
 Utah State University
 Logan, UT

 mvansco...@aggiemail.usu.eduhttps://sites.google.com/site/scoyoc/
 =
 Think SNOW!


 On Tue, Nov 19, 2013 at 5:12 PM, David Winsemius 
 dwinsem...@comcast.netwrote:


 On Nov 19, 2013, at 3:44 PM, Matthew Van Scoyoc wrote:

  I can't get the fine tuning right with my legend. I get an extra legend
  element 10 which is the point size in my plot. Can someone help me get
 rid
  of this extra element? Additionally I would also like to reduce the size
 of
  the legend.
 
  If you want to reproduce my figure you can  download my data in csv
 format
  here
  
 https://github.com/scoyoc/EcoSiteDelineation/blob/master/VegNMDS_scores.csv
 
  .
 
  Here is my code...
 
  veg.nmds.sc = read.csv(VegNMDS_scores.csv, header = T)
 
  nmds.fig = ggplot(data = veg.nmds.sc, aes(x = NMDS1, y = NMDS2))
  nmds.fig + geom_point(aes(color = VegType, shape = VegType, size = 10))
 +
  scale_colour_manual(name = Vegetation Type,
  values = c(blue, magenta, gray50, red,
  cyan3,
 green4, gold)) +
  scale_shape_manual(name = Vegetation Type, values = c(15, 16, 17, 18,
  15, 16, 17)) +
  theme_bw() +
  theme(panel.background = element_blank(), panel.grid.major =
  element_blank(),
panel.grid.minor = element_blank(),
legend.key = element_rect(color = white)
)
 
  I have been messing around with
  theme(..., legend.key.size = unit(1, cm))
  but I keep getting the error could not find function unit. I'm not sure
  why, isn't unit supposed to be part of the legend.key argument?

 Try this workaround to what sounds like a bug:

 library(grid)

 # then repeat the call.

 --

 David Winsemius
 Alameda, CA, USA



 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] optimization

2013-11-16 Thread Dennis Murphy
 There are lots of errors in your code. In particular, the optimization
 routines do not like functions that ignore the parameters.

I would like to nominate this delicious riposte as a fortune
candidate. Anyone to second the motion?

Dennis

On Sat, Nov 16, 2013 at 1:26 PM, Prof J C Nash (U30A) nas...@uottawa.ca wrote:
 There are lots of errors in your code. In particular, the optimization
 routines do not like functions that ignore the parameters.

 And you have not provided out or out1 to the optimizer -- they are
 returned as elements of func(), but not correctly.

 Please try some of the examples for optim or optimx to learn how to
 structure your problem.

 JN


 On 13-11-16 06:00 AM, r-help-requ...@r-project.org wrote:
 Message: 19
 Date: Fri, 15 Nov 2013 09:17:47 -0800 (PST)
 From: IZHAK shabsogh ishaqb...@yahoo.com
 To: r-help@r-project.org r-help@r-project.org
 Subject: [R] optimization
 Message-ID:
   1384535867.58595.yahoomail...@web142506.mail.bf1.yahoo.com
 Content-Type: text/plain

 x1-c(5.548,4.896,1.964,3.586,3.824,3.111,3.607,3.557,2.989,18.053,3.773,1.253,2.094,2.726,1.758,5.011,2.455,0.913,0.890,2.468,4.168,4.810,34.319,1.531,1.481,2.239,4.204,3.463,1.727)
 y-c(2.590,3.770,1.270,1.445,3.290,0.930,1.600,1.250,3.450,1.096,1.745,1.060,0.890,2.755,1.515,4.770,2.220,0.590,0.530,1.910,4.010,1.745,1.965,2.555,0.770,0.720,1.730,2.860,0.760)
 x2-c(0.137,2.499,0.419,1.699,0.605,0.677,0.159,1.699,0.340,2.899,0.082,0.425,0.444,0.225,0.241,0.099,0.644,0.266,0.351,0.027,0.030,3.400,1.499,0.351,0.082,0.518,0.471,0.036,0.721)
 k-rep(1,29)
 x-data.frame(k,x1,x2)

 freg-function(y,x1,x2){
   reg- rlm(y ~ x1 + x2 , data=x)
   return(reg)
 }


  func - function(x1,x2,b){
   fit-freg(y,x1,x2)
 b-c(coef(fit))
   dv-1+ b[2]*x2^b[3]
 dv1-b[2]*x2^b[3]*log(x2)
   out - ( x1/(1+ b[2]*x2^b[3]))
   out1-  c(-x1*x2^b[3]/dv^2,-x1* dv1/dv^2)
 return(list( out,out1))
}
 optim(par=c(b[2],b[3]), fn=out, gr =out1,
 method = c(BFGS),
 lower = -Inf, upper = Inf,
 control = list(), hessian = T)
 can someone help me try running this code because i try many occasion but 
 prove  abortive. the aim is
 to optimize the parameter of the model that is parameter estimates using 
 optimization

   [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] matrix values linked to vector index

2013-10-11 Thread Dennis Murphy
Attempting to follow the OP's conditions and assuming I understood
them correctly, here is one way to wrap this up into a function:

makeMat - function(x)
{
stopifnot(is.integer(x))
nr - length(x)
nc - max(x)

# Initialize a matrix of zeros
m - matrix(0, nr, nc)
# Conditionally replace with ones
for(i in seq_len(nr)) if(x[i] != 0)  m[i, 1:x[i]] - 1
m
}

## Examples:
x1 - 1:3
x2 - as.integer(c(2, 0, 4, 3, 1))
x3 - c(2, 1, 2.2)

makeMat(x1)
makeMat(x2)
makeMat(x3)
makeMat(4:6)


On Fri, Oct 11, 2013 at 9:49 AM, arun smartpink...@yahoo.com wrote:
 Hi,

 In the example you showed:

 m1- matrix(0,length(vec),max(vec))
 1*!upper.tri(m1)

 #or
  m1[!upper.tri(m1)] -  rep(rep(1,length(vec)),vec)

 #But, in a case like below, perhaps:
 vec1- c(3,4,5)

  m2- matrix(0,length(vec1),max(vec1))
  indx - 
 cbind(rep(seq_along(vec1),vec1),unlist(tapply(vec1,list(vec1),FUN=seq),use.names=FALSE))
 m2[indx]- 1
  m2
 # [,1] [,2] [,3] [,4] [,5]
 #[1,]11100
 #[2,]11110
 #[3,]11111




 A.K.


 Hi-

 I'd like to create a matrix of 0's and 1's where the number of
 1's in each row defined by the value indexed in another vector, and
 where the (value-1) is back-filled by 0's.

 For example, given the following vector:
 vec= c(1,2,3)

 I'd like to produce a matrix with dimensions (length(vec), max(vec)):

 1,0,0
 1,1,0
 1,1,1

 Thank you!

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] FW: Library update from version

2013-09-30 Thread Dennis Murphy
In this case the OP probably does need to reinstall contributed
packages since going from 2.15.x to 3.0.y entails a major version
change in R.

Dennis

On Mon, Sep 30, 2013 at 11:49 AM, Steve Lianoglou
lianoglou.st...@gene.com wrote:
 Hi,

 On Mon, Sep 30, 2013 at 11:12 AM, Cem Girit gi...@comcast.net wrote:
 Hello,



 I recently installed version 3.0.1 of R on to a computer. I
 have a working installation for a Statconn application using R version
 2.15.0 on another computer. I have many libraries under this old
 installation. Can I just copy them into the new library from the old, or do
 I install each one of them under the new R?

 No, you shouldn't do that.

 Also how can I get a list of
 differences in two libraries so that I can use this list to update the new
 one?

 A bit of googling could have provided several answers.

 This post in particular has a few answers from some people who know
 what they're doing w/ R, so probably a good place to start:

 http://stackoverflow.com/questions/1401904/painless-way-to-install-a-new-version-of-r-on-windows

 HTH,
 -steve

 --
 Steve Lianoglou
 Computational Biologist
 Bioinformatics and Computational Biology
 Genentech

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Combine multiple tables into one

2013-05-01 Thread Dennis Murphy
Isn't this just a block diagonal matrix?

library(pracma)
blkdiag(table1, table2)
 [,1] [,2] [,3] [,4]
[1,]1100
[2,]1200
[3,]0001
[4,]0004


Dennis

On Wed, May 1, 2013 at 5:41 PM, David Winsemius dwinsem...@comcast.net wrote:
 add2blocks - function(m1, m2) { res - cbind(m1, matrix(0, dim(m2)[1], 
 dim(m2)[2]) )
 res - rbind(res,  cbind( matrix(0, dim(m1)[1], 
 dim(m1)[2]), m2) ) }

  new - add2block(table1, table2)
  new
 #---
  [,1] [,2] [,3] [,4]
 row11100
 row21200
 row30001
 row40004

 On May 1, 2013, at 12:37 PM, arun wrote:



 Hi,
 May be this helps:
 dat1- as.data.frame(table1)
  dat2- as.data.frame(table2)
 names(dat2)-c(V3,V4)
 library(plyr)
 res-join(dat1,dat2,type=full)
  res[is.na(res)]- 0
  res
 #  V1 V2 V3 V4
 #1  1  1  0  0
 #2  1  2  0  0
 #3  0  0  0  1
 #4  0  0  0  4
  combinedtable-as.matrix(res)
  colnames(combinedtable)- NULL
  combinedtable
 # [,1] [,2] [,3] [,4]
 #[1,]1100
 #[2,]1200
 #[3,]0001
 #[4,]0004
 A.K.



 Hi,

  I am trying to combine multiple tables into one, where the
 elements that are created as a result of the merge to be filled with
 zeroes.

 In other words, to go from this:

 #Create tables to combine
 row1 - c(1,1)
 row2 - c(1,2)
 row3 - c(0,1)
 row4 - c(0,4)
 table1 - rbind(row1,row2)
 table2 - rbind(row3, row4)
 table1
   [,1] [,2]
 row111
 row212
 table2
   [,1] [,2]
 row301
 row404

 To this:

 #What the combined table should look like if things worked out
 newrow1 - c(1,1,0,0)
 newrow2 - c(1,2,0,0)
 newrow3 - c(0,0,0,1)
 newrow4 - c(0,0,0,4)
 combinedtable - rbind(newrow1, newrow2, newrow3, newrow4)
 combinedtable
   [,1] [,2] [,3] [,4]
 newrow11100
 newrow21200
 newrow30001
 newrow40004

 I tried merge() but it
 doesn't make any sense here, and I also tried to melt() the orginal data
 that table1 and table2 are created from to re-created a combined
 table from one step further back from this but couldn't get it to form
 the right table.

 If that would be an easier solution, here is the starting point
 that table1 and table2 are created from and also the create.table()
 function I wrote to create them:

 create.table - function(a,b){
 obs - unique(c(a,b))
 squ - matrix(rep(0,length(obs)^2),ncol=length(obs))
 for(i in 1:length(obs)){
 for(j in 1:length(obs)){
 squ[i,j] - sum((a==obs[i])*(b==obs[j]), na.rm=TRUE)
 }
 }
 squ
 }

 #Mock data
 sampleid - c(1:5)
 Q1before - c(0,0,1,0,1)
 Q1after - c(0,1,0,0,1)
 Q2before - c(1,0,0,0,0)
 Q2after - c(0,0,0,0,0)

 #This is what my real data looks like
 #It may be easier to reformat this df to a format that could then
 use my create.table() function to get to the combined table endpoint?
 mydf - as.data.frame(cbind(sampleid,Q1before, Q1after, Q2before, Q2after))
 create.table(mydf$Q1before, mydf$Q1after)
 create.table(mydf$Q2before, mydf$Q2after)

 I hope at least some of this makes sense. Thank you for your input, you 
 guys are always the best!

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 David Winsemius
 Alameda, CA, USA

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Arranging two different types of ggplot2 plots with axes lined up

2013-04-19 Thread Dennis Murphy
library(ggplot2)
library(gridExtra)

#generate test precipitation data
year-c(2000,2001,2002,2003,2004)
precip-c(46,100,80,74,20)
yp-data.frame(year, precip)

#generate test fecal coliform data
year2-c(2000,2000,2000,2000,2000,2000,2000,2000,2000,2000,
2001,2001,2001,2001,2001,2001,2001,2001,2001,2001,
2002,2002,2002,2002,2002,2002,2002,2002,2002,2002,
2003,2003,2003,2003,2003,2003,2003,2003,2003,2003,
2004,2004,2004,2004,2004,2004,2004,2004,2004,2004)
fc-sample(1:1000, 50)
yfc-data.frame(year2, fc)

#make test precipitation plot
yp_plot-ggplot(yp, aes(factor(year), y=precip)) + geom_point() +
 geom_line(aes(group = 1)) +
 labs(title = Site X \n , x = ,
  y = Annual Precipitation (in.) \n ) +
 theme(axis.text.x = element_text(angle=45, hjust=1, vjust=1),
   axis.text.y = element_text(angle = 90, hjust = 0.5, vjust = 0.5))

#make test fecal coliform plot
yfc_plot-ggplot(yfc) + geom_boxplot() + aes(x=factor(year2), y=fc) +
  theme(axis.text.x = element_text(angle=45, hjust=1, vjust=1),
axis.text.y = element_text(angle = 90, hjust = 0.5,
vjust = 0.5)) +
  xlab( \n Date) + ylab(Fecal coliforms (cfu/100 mL) \n ) +
  geom_smooth(stat='smooth', aes(group=1), size=1.5) +
  scale_y_log10()

#arrange plots together
grid.arrange(yp_plot, yfc_plot, ncol=1)

On Fri, Apr 19, 2013 at 8:49 AM, Saalem Adera saalemad...@gmail.com wrote:
 Hi all,

 Thanks for the quick replies.

 Dennis - I tried rotating the y-axis tick labels 90 degrees and while the
 x-axes became the same width, the x-axis values still didn't line up with
 each other.  So maybe I need to be more clear - how can I get the x-axis
 tick values to line up?  For example, I want the 2000 x-axis tick in the
 upper and lower plots to be on top of each other (along with all the other
 x-axis ticks).

 Andrés - I tried the multiplot() function that you suggested, with the same
 result as when I used the grid.arrange() function - the x-axis tick values
 still don't line up.

 The reason I want the x-axis tick values to line up is so that with a quick
 look at the plots, a viewer will be able to understand the relationship
 between the data depicted by the boxplots and the data depicted by the line
 plot.

 Any other ideas on how to make this work?

 Thanks,
 Saalem



 On Thu, Apr 18, 2013 at 2:22 PM, Andrés Aragón Martínez
 armand...@gmail.com wrote:

 Hi Saalem,

 Check the following:

 http://www.cookbook-r.com/Graphs/Multiple_graphs_on_one_page_(ggplot2)/


 Regards,


 Andrés AM

 El 18/04/2013, a las 09:47, Saalem Adera saalemad...@gmail.com escribió:

  Hi all,
 
  I want to arrange two ggplot2 plots on the same page with their x-axes
  lined up - even though one is a boxplot and the other is a line plot.
  Is
  there a simple way to do this?  I know I could do this using facetting
  if
  they were both the same type of plot (for example, if they were both
  boxplots), but I haven't been able to figure it out for two different
  types
  of plots.
 
  Below is my test case:
 
  library(ggplot2)
  library(gridExtra)
 
  #generate test precipitation data
  year-c(2000,2001,2002,2003,2004)
  precip-c(46,100,80,74,20)
  yp-data.frame(year, precip)
 
  #generate test fecal coliform data
  year2-c(2000,2000,2000,2000,2000,2000,2000,2000,2000,2000,
  2001,2001,2001,2001,2001,2001,2001,2001,2001,2001,
  2002,2002,2002,2002,2002,2002,2002,2002,2002,2002,
  2003,2003,2003,2003,2003,2003,2003,2003,2003,2003,
  2004,2004,2004,2004,2004,2004,2004,2004,2004,2004)
  fc-sample(1:1000, 50)
  yfc-data.frame(year2, fc)
 
  #make test precipitation plot
  yp_plot-ggplot(yp) + geom_point() + geom_line() + aes(year, y=precip) +
  opts(title=Site X \n , axis.text.x=theme_text(angle=45, hjust=1,
  vjust=1)) +
  ylab(Annual Precipitation (in.) \n ) + xlab()
 
  #make test fecal coliform plot
  yfc_plot-ggplot(yfc) + geom_boxplot() + aes(x=as.factor(year2), y=fc) +
  opts(axis.text.x=theme_text(angle=45, hjust=1, vjust=1)) +
  xlab( \n Date) + ylab(Fecal coliforms (cfu/100 mL) \n ) +
  geom_smooth(stat='smooth', aes(group=1), size=1.5) + scale_y_log10()
 
  #arrange plots together
  grid.arrange(yp_plot, yfc_plot, ncol=1)
 
 
  You can see that I got the plot areas to line up using grid.arrange(),
  but
  the x-axes are still off.  I'd really appreciate any help I can get.
 
  Thanks,
  Saalem
 
[[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ggplot2: less than equal sign

2013-03-28 Thread Dennis Murphy
Hi:

Here's a way you could do it entirely within ggplot2. The annotation
functions have a parse = argument which allows you to pass character
string representations of math expressions, but there is no such thing
in the scale functions, so you need a different approach.

library(ggplot2)
df - data.frame(vis=c(0,0,1,1) , count=c(10,15,20,10) , grp=c(0,1,0,1))

# Generate a list of expressions that will become the legend labels
lbs - list(expression(x = 10), expression(x  10))

ggplot(df, aes(x = factor(vis), y=count , fill=factor(grp)))  +
   geom_bar(stat =identity) +
   scale_fill_manual(breaks = levels(factor(df$grp)),
 values = c(blue, orange),
 labels = lbs)

The specifications in scale_fill_manual() are:
- breaks: the values to go on the horizontal axis
- values: the vector of fill colors for each level of grp
- labels: the legend labels
Notice that the labels = takes a list of expressions as its argument.
This approach gives you more control over the legend and choice of
fill colors at the expense of a couple of lines of code. To change the
axis and legend titles, one can use the labs() function; e.g.,

last_plot() + labs(x = Visibility, y = Frequency, fill = Threshold)

Dennis

On Thu, Mar 28, 2013 at 2:22 PM, soon yi soon...@ymail.com wrote:
 Hi

 I am trying to add a less than equal sign to a plot. I have previously done
 this using unicode but is not working in this instance. Any suggestions
 would be great

 thanks

 example code:

 library(ggplot2)
 df-data.frame(vis=c(0,0,1,1) , count=c(10,15,20,10) , grp=c(0,1,0,1))
 df$grp -factor(df$grp ,levels=c(0,1) , labels =c(x \u2265 10 , x  10))
 ggplot(df, aes(x = factor(vis), y=count , fill=grp))  + geom_bar(stat =
 identity)




 --
 View this message in context: 
 http://r.789695.n4.nabble.com/ggplot2-less-than-equal-sign-tp4662784.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] merging or joining 2 dataframes: merge, rbind.fill, etc.?

2013-02-27 Thread Dennis Murphy
Hi:

The other day I ran 100K simulations, each of which returned a 20 x 4
data frame. I stored these in a list object. When attempting to rbind
them into a single large data frame, my first thought was to try plyr:

library(plyr)
bigD - ldply(L, rbind)   # where L is the list object

I quit at around a half hour. Ditto for do.call(rbind, L). [Sorry, I
didn't time it - these are approximate times.] I then checked to see
if the data.table package could do this, and lo and behold, I
discovered the rbindlist() function. When applied to my list object,
it ran correctly in under a second. Here's the actual example with
some names changed to mask the application:

g - gs[1:10]   # gs is a list of lists
 length(g)
[1] 10
 class(g)
[1] list
 dim(g[[1]])
[1] 20  4
 dim(g[[10]])
[1] 20  4
 library(data.table)
 system.time(bigD - rbindlist(g))
   user  system elapsed
   0.450.020.47
 dim(bigD)
[1] 200   4
 class(bigD)
[1] data.table data.frame

Dennis

On Tue, Feb 26, 2013 at 7:05 PM, David Kulp dk...@fiksu.com wrote:
 On Feb 26, 2013, at 9:33 PM, Anika Masters anika.mast...@gmail.com wrote:

 Thanks Arun and David.  Another issue I am running into are memory
 issues when one of the data frames I'm trying to rbind to or merge
 with are very large.  (This is a repetitive  problem, as I am trying
 to merge/rbind thousands of small dataframes into a single very
 large dataframe.)



 I'm thinking of creating a function that creates an empty dataframe to
 which I can add data, but will need to first determine and ensure that
 each dataframe has the exact same columns, in the exact same
 location.



 Before I write any new code, is there any pre-existing functions or
 code that might solve this problem of merging small or medium sized
 dataframes with a very large dataframe.)

 Consider plyr. Memory issues can be a problem, but it's a piece of
 cake to write a one liner that iterates over a list of data frames and
 returns them all rbind'd together.  Or just: do.call(rbind,
 list.of.data.frames).

 If memory is a serious problem then I think it's best to write your
 own code that appends each row by index - which avoids copying entire
 data frames in memory.


 On Tue, Feb 26, 2013 at 2:00 PM, David L Carlson dcarl...@tamu.edu wrote:
 Clumsy but it doesn't require any packages:

 merge2 - function(x, y) {
 if(all(union(names(x), names(y)) == intersect(names(x), names(y{
rbind(x, y)
} else merge(x, y, all=TRUE)
 }
 merge2(df1, df2)
 df3 - df1
 merge2(df1, df3)

 --
 David L Carlson
 Associate Professor of Anthropology
 Texas AM University
 College Station, TX 77843-4352




 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of arun
 Sent: Tuesday, February 26, 2013 1:14 PM
 To: Anika Masters
 Cc: R help
 Subject: Re: [R] merging or joining 2 dataframes: merge, rbind.fill,
 etc.?

 Hi,

 You could also try:
 library(gtools)
 smartbind(df2,df1)
 #  a  b  d
 #1 7 99 12
 #2 7 99 12


 When df1!=df2
 smartbind(df1,df2)
 #   a  b  d  x  y  c
 #1  7 99 12 NA NA NA
 #2 NA 34 88 12 44 56
 A.K.




 - Original Message -
 From: Anika Masters anika.mast...@gmail.com
 To: r-help@r-project.org
 Cc:
 Sent: Tuesday, February 26, 2013 1:55 PM
 Subject: [R] merging or joining 2 dataframes: merge, rbind.fill, etc.?

 #I want to merge or join 2 dataframes (df1  df2) into a 3rd
 (mydf).  I want the 3rd dataframe to contain 1 row for each row in df1
  df2, and all the columns in both df1  df2. The solution should
 work even if the 2 dataframes are identical, and even if the 2
 dataframes do not have the same column names.  The rbind.fill function
 seems to work.  For learning purposes, are there other good ways to
 solve this problem, using merge or other functions other than
 rbind.fill?

 #e.g. These 3 examples all seem to work correctly and as I hoped:

 df1 - data.frame(matrix(data=c(7, 99, 12) ,  nrow=1 ,  dimnames =
 list( NULL ,  c('a' , 'b' , 'd') ) ) )
 df2 - data.frame(matrix(data=c(88, 34, 12, 44, 56) ,  nrow=1 ,
 dimnames = list( NULL ,  c('d' , 'b' , 'x' ,  'y', 'c') ) ) )
 mydf - merge(df2, df1, all.y=T, all.x=T)
 mydf

 #e.g. this works:
 library(reshape)
 mydf - rbind.fill(df1, df2)
 mydf

 #This works:
 library(reshape)
 mydf - rbind.fill(df1, df2)
 mydf

 #But this does not (the 2 dataframes are identical)
 df1 - data.frame(matrix(data=c(7, 99, 12) ,  nrow=1 ,  dimnames =
 list( NULL ,  c('a' , 'b' , 'd') ) ) )
 df2 - df1
 mydf - merge(df2, df1, all.y=T, all.x=T)
 mydf

 #Any way to get mere to work for this final example? Any other good
 solutions?

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.


 __
 

Re: [R] Plot a Matrix as an Image with ggplot

2013-02-15 Thread Dennis Murphy
Hi:

See if the following works for you:

library(reshape2)
library(ggplot2)
tdm - melt(testData)

ggplot(tdm, aes(x = Var2, y = Var1, fill = factor(value))) +
labs(x = MHz, y = Threshold, fill = Value) +
geom_raster() +
scale_fill_manual(breaks = levels(factor(tdm$value)),
  values = c(white, black)) +
theme(plot.background = element_rect(fill = grey90),
  legend.background = element_rect(fill = grey90)) +
scale_x_continuous(expand = c(0, 0)) +
scale_y_continuous(expand = c(0, 0))

It was necessary to create a different plot background color because
you wanted to remove the padding, which blended much of the plot
boundary with the original white background, and had to do something
similar to the legend background so that you could see the white
legend box.

The expand = c(0, 0) argument to the two scale functions answers your
second question, and converting value from numeric to vector answers
the first.

Dennis

On Fri, Feb 15, 2013 at 3:24 PM, Alaios ala...@yahoo.com wrote:
 You were right,
 the following two work


 testData-matrix(data=round(runif(25)),nrow=5,ncol=5,dimnames=list(Var1=c(1:5),Var2=c(1:5)))
  ggplot(melt(testData), aes(Var2,Var1, fill=value))+xlab(MHz) + 
 ylab(Threshold) +geom_raster() + scale_fill_gradient(low=#FF, 
 high=#00)



 still there are two minor unresolved issues.

 a. How to reduce the extra space in the plot between legend and the tiles?
 b. How to make color bar depict only the two values of the game, 0 and 1 , 
 instead of 0, 0.25, 0.50, 0.75, 1?

 I would like to thank you in advanec for your help
 Regards
 Alex


 - Original Message -
 From: Dennis Murphy djmu...@gmail.com
 To: Alaios ala...@yahoo.com
 Cc:
 Sent: Friday, February 15, 2013 10:19 AM
 Subject: Re: [R] Plot a Matrix as an Image with ggplot

 Your aesthetic is fill, not color. Change scale_color_gradient to
 scale_fill_gradient and you'll get what you expect.

 D.

 On Thu, Feb 14, 2013 at 11:31 PM, Alaios ala...@yahoo.com wrote:


 Hi,
 thanks
 I changed slightly the code to be reproducible from everyone . I have  tried 
 ggplot
 but I need a bit of help to tweak it a bit

 So you can run the following in your computer

   
 testData-matrix(data=round(runif(25)),nrow=5,ncol=5,dimnames=list(1:5,1:5))

   p-ggplot(melt(testData), aes(Var2,Var1, fill=value))+xlab(MHz) + 
 ylab(Threshold) +geom_raster()


   p


 What I want to improve is
 a: make the colorbar so only two specific colors lets say black and white 
 and only two values 0 and 1 and somewhere the string as title of the color 
 bar text to appear.

 I have tried something like
 p+   scale_color_gradient(low=red,high=blue)
 Fehler in unit(tic_pos.c, mm) : 'x' and 'units' must have length  0


 ggplot(melt(testData), aes(Var2,Var1, fill=value,colorbin=2))+xlab(MHz) + 
 ylab(Threshold) +geom_raster()
 but this did not affect colorbar entries.


 b. reduce/remove the grayish border that appears between the legend and the 
 image plot

 Could you please help me with these two?

 Regards
 Alex



 
 From: John Kane jrkrid...@inbox.com
 To: Alaios ala...@yahoo.com; R help r-help@r-project.org
 Sent: Thursday, February 14, 2013 4:35 PM
 Subject: RE: [R] Plot a Matrix as an Image with ggplot

 The R-help list is rather picky about what attached. None of your 
 attachments arrived.

 The str() info is useful but please supply some sample data

 The easiest way to supply data  is to use the dput() function.  Example with 
 your file named testfile:
 dput(testfile)

 Then copy the output and paste into your email.  For large data sets, you 
 can just supply a representative sample.  Usually,
 dput(head(testfile, 100)) will be sufficient.

 How are you writing the code/or what format is the original email. Your code 
 in the body of the text is badly messed up -- you probably need to post only 
 in text. HTML etc is automatically dropped.




 John Kane
 Kingston ON Canada


 -Original Message-
 From: ala...@yahoo.com
 Sent: Thu, 14 Feb 2013 07:15:05 -0800 (PST)
 To: r-help@r-project.org
 Subject: [R] Plot a Matrix as an Image with ggplot

 Dear all,
 I am trying to plot a matrix I have  as an image

 str(matrixToPlot)
  num [1:21, 1:66] 0 0 0 0 0 0 0 0 0 0 .


  that contains only 0s and 1s,


 where the xlabel will be Labeled as
 str(xLabel)
  num [1:66] 1e+09 1e+09 1e+09 1e+09 1e+09 ...

 and the yLabels will be labeled as
 str(yLabel)
  num [1:21] -88 -87 -86 -85 -84 -83 -82 -81 -80 -79 ...


 I have found on the internet that I can do something like that with
 ggplot2.
 For example
 you can run the following



 library(reshape2)library(ggplot2)m
 =matrix(rnorm(20),5)ggplot(melt(m),aes(Var1,Var2,fill=value))+geom_raster()

 What I see missing here is to get my matrix and transform it to a data
 frame with labels for x and y axis so as ggplot can print the values
 nice. If you get the idea this matrix will be printed as a two tile

Re: [R] Why replacement has length zero? And How can I fix it?

2013-02-02 Thread Dennis Murphy
Hi:

No offense, but this is code is inefficient on several levels.
Firstly, a loop is unnecessary (see below); secondly,  ifelse() is a
vectorized function and you're attempting to apply it to each row of
DataSet. Try this instead, given

 dput(DataSet)
structure(list(Age = c(-0.552450789929175, -2.06988485889524,
-1.1290497440272, -0.268777167923272, -0.012748181465016, 0.554761979685086,
0.179401201802416, 0.0973216900180107, -0.914782595075178, -1.53708818506571
), Gender = c(-0.0705367578252014, 0.969426004829301, 0.0556317051882303,
1.33716753655214, -0.638382962845772, 0.480939398350137, -0.755053441639217,
1.25393329162734, 1.45933385954704, 0.672871276697159), SES =
c(-1.88102131807556,
0.529181612783923, 0.271835979074789, -0.656415898217512, -0.517622943602359,
0.639094817790355, -1.08205962796926, -0.193116856300625, 0.707374091321319,
1.98180909069287), ISS = c(-1.74208565286036, -1.235721877648,
0.459828219284939, -0.87781916182417, -0.429262433047679, 0.379335629940134,
0.949088546564315, 0.689758637278236, -1.14832161185134, 1.40363651773545
), IQTe = c(-0.266523341317252, -0.32477349599139, -0.75723700635683,
0.74556355929622, -0.238794158234391, 0.0906173696194765, 0.331200954821995,
-0.373009866084321, 0.417237160703897, 0.755603172535694), PreTe =
c(-0.743400287678305,
-1.1501727014007, -0.668278397501659, -0.0766957410591086, -0.232360222985187,
0.320050787942651, 0.474114337341715, -0.0181639478062117, -0.0905594374421625,
0.517614989060291)), .Names = c(Age, Gender, SES, ISS,
IQTe, PreTe), row.names = c(NA, -10L), class = data.frame)

u - runif(10)
 u
 [1] 0.72 0.57 0.71 0.64 0.38 0.53 0.14 0.14 0.82 0.12

# Remember two things:
#  (i) ifelse() is a vectorized function;
#  (ii) logical statements return either TRUE (1) or FALSE (0).
#  As a result,

# less efficient:
IAP0 - ifelse(DataSet$SES  0, ifelse(u  0.75, 1, 0),
 ifelse(u = 0.25, 1, 0))
# more efficient:
IAP1 - ifelse(DataSet$SES  0, u  0.75, u = 0.25) * 1
IAP1
 [1] 0 0 0 0 0 0 1 1 1 0
 identical(IAP0, IAP1)
[1] TRUE

# Verification:
cbind(DataSet$SES, u, IAP1)

This result conforms to what one would expect with u as the realized
uniform random vector. Moreover, the code is transparent: if SES  0,
then apply the test u  0.75, else apply the test u = 0.25). The
post-multiplication by 1 ensures that the result is numeric rather
than logical.

Dennis


On Sat, Feb 2, 2013 at 12:57 PM, soon yi soon...@ymail.com wrote:
 Hi

 for the loop section runif needs curved brackets

 Try

 IAP -NA
 for (i in 1:Sample.Size){
 if (DataSet$SES[i]0)   {
 IAP[i] -  ifelse(runif(1)0.75, 1, 0) # High SES, higher 
 chance to be in
 Treatment #
 }
 else  {
 IAP[i] - ifelse(runif(1)=0.25, 1, 0) # Low SES, lower 
 chance to be in
 Treatment #
  }
 } # End loop #
 IAP




 IAP
 zjiaqi19880219 wrote
 Hi, all,

 I am working on a project to run a simulation. I am now working on the
 main body of the simulation body, I generate the data and all work well
 before I identify a variable IAP, there is always an error message as
 replacement has length zero, and I do not know how to fix it. Can anyone
 help?

 Here is the code:

 library (mvtnorm)

 options(digits=2)

 # Set variable name #
 namelist1 - c(Age, Gender, SES, ISS, IQTe, PreTe)

 # Generate Data #
 Sample.Size - 10 # Variable Sample Size #
 CorMat - matrix(c(1.0 ,0.0 ,0.0 , 0.5, 0.2, 0.5,
   0.0, 1.0 ,0.0, 0.0, 0.0, 0.0,
   0.0 ,0.0 ,1.0, 0.5, 0.4, 0.5,
   0.5, 0.0, 0.5, 1.0, 0.4, 0.7,
   0.2, 0.0, 0.4, 0.4, 1.0, 0.9,
   0.5, 0.0, 0.5, 0.7, 0.9, 1.0),ncol=6) 
 # Correlation Matrix #

 DataSet - rmvnorm(Sample.Size, mean=c(0,0,0,0,0,0), sigma=CorMat) # Draw
 Sample.Size covariate set all mean 0 #
 DataSet - as.data.frame(DataSet)
 names(DataSet) - namelist1
 DataSet

 # For Treatment IAP #
 IAP - matrix(0, nrow=Sample.Size)
 for (i in 1:Sample.Size){
   IAP[i] - if (DataSet$SES0)
   {
   ifelse(runif[1]0.75, 1, 0) # High SES, higher chance to be in 
 Treatment
 #
   }
   IAP[i] - if (DataSet$SES=0)
   {
   ifelse(runif[1]=0.25, 1, 0) # Low SES, lower chance to be in 
 Treatment
 #
   }
 } # End loop #
 IAP

 Error in IAP[i] - if (DataSet$SES  0) { : replacement has length zero
 In addition: Warning message:
 In if (DataSet$SES  0) { :
   the condition has length  1 and only the first element will be used

 Thanks!





 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Why-replacement-has-length-zero-And-How-can-I-fix-it-tp4657373p4657380.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do 

Re: [R] facet plot

2013-02-01 Thread Dennis Murphy
Hi:

Try

DF - structure(list(Gene = structure(1:5, .Label = c(NM_001001130,
NM_001001144, NM_001001152, NM_001001160, NM_001001176
), class = factor), T0h = c(68L, 0L, 79L, 1L, 0L), T0.25h = c(95L,
1L, 129L, 1L, 0L), T0.5h = c(56L, 4L, 52L, 2L, 0L), T1h = c(43L,
0L, 50L, 0L, 0L), T2h = c(66L, 1L, 24L, 0L, 0L), T3h = c(62L,
1L, 45L, 1L, 0L), T6h = c(68L, 1L, 130L, 0L, 0L), T12h = c(90L,
4L, 154L, 0L, 0L), T24h = c(63L, 1L, 112L, 0L, 0L), T48h = c(89L,
2L, 147L, 1L, 0L)), .Names = c(Gene, T0h, T0.25h, T0.5h,
T1h, T2h, T3h, T6h, T12h, T24h, T48h), class =
data.frame, row.names = c(1, 2, 3, 4, 5))

library(reshape2)
library(ggplot2)

# stack the time-related variables by Gene
Dfm - melt(DF, id = Gene)
# extract the numeric time values
Dfm$time - as.numeric(gsub([a-zA-Z], , as.character(Dfm$variable)))

ggplot(Dfm, aes(x = time, y = value)) + geom_point() + facet_wrap(~ Gene)


Dennis

On Thu, Jan 31, 2013 at 2:37 AM, Jose Iparraguirre
jose.iparragui...@ageuk.org.uk wrote:
 Hi Robin,

 You didn't provide the list with a clear structure of your data, but I hope I 
 understood it properly. If not, it'd be easy for you to adapt what follows.

 I guess your data looks like this:

Gene  T0h T0.25h T0.5h  T1h  T2h  T3h  T6h T12h T24h T48h
 1  NM_001001130   68 9556   43   66   62   68   90   63   89
 2  NM_0010011440  1 40111412
 3  NM_001001152   7912952   50   24   45  130  154  112  147
 4  NM_0010011601  1 20010001
 5  NM_0010011760  0 00000000
 ...

 If this is the case, first reshape your data (say, gene.df) to a long format:

 Gene Time
 1   NM_001001130   68
 2   NM_0010011440
 3   NM_001001152   79
 4   NM_0010011601
 5   NM_0010011760
 6   NM_0010011771
 7   NM_0010011780
 8   NM_0010011790
 9   NM_001001182 4691
 ...

 Then, you create the ggplot:

 sp - ggplot(gene.df, aes(x=Gene, y=Time)) + geom_point(shape=1)

 And use the function facet_wrap():

 sp + facet_wrap( ~ Gene,ncol=3)

 You can set a different number of columns and a different shape for the 
 markers, of course.

 Hope this helps.

 Regards,

 José


 José Iparraguirre
 Chief Economist
 Age UK


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf Of Robin Mjelle
 Sent: 31 January 2013 09:06
 To: r-help@r-project.org
 Subject: [R] facet plot

 Hi all,

 I want to plot a facet plot with column names as x and column values as y.
 One plot for each row. here is part of my dataset:

 Gene T0h T0.25h T0.5h T1h T2h T3h T6h T12h T24h T48h  NM_001001130 68 95
 56 43 66 62 68 90 63 89  NM_001001144 0 1 4 0 1 1 1 4 1 2  NM_001001152 79
 129 52 50 24 45 130 154 112 147  NM_001001160 1 1 2 0 0 1 0 0 0 1
 NM_001001176 0 0 0 0 0 0 0 0 0 0  NM_001001177 1 3 2 3 0 1 1 0 2 0
 NM_001001178 0 0 0 0 0 0 0 0 0 0  NM_001001179 0 0 0 0 0 0 0 0 0 0
 NM_001001180 1 0 0 1 0 0 0 0 0 0  NM_001001181 350 539 424 470 441 451 554
 553 419 548  NM_001001182 4691 6458 5686 6761 5944 4486 6030 5883 6009 7552
 NM_001001183 22 23 12 21 25 17 34 40 33 32  NM_001001184 138 147 111 100 54
 61 61 77 57 99  NM_001001185 2 3 4 3 4 5 9 2 4 4  NM_001001186 231 337 187
 168 148 178 275 319 181 279

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 Wrap Up and Run 10k is back!

 Also, new for 2013 – 2km intergenerational walks at selected venues. So 
 recruit a buddy, dust off the trainers and beat the winter blues by
 signing up now:

 http://www.ageuk.org.uk/10k

  Milton Keynes | Oxford | Sheffield | Crystal Palace | Exeter 
 | Harewood House, Leeds |
  Tatton Park, Cheshire | Southampton | 
 Coventry



 Age UK Improving later life

 http://www.ageuk.org.uk




 ---
 Age UK is a registered charity and company limited by guarantee, (registered 
 charity number 1128267, registered company number 6825798).
 Registered office: Tavis House, 1-6 Tavistock Square, London WC1H 9NA.

 For the purposes of promoting Age UK Insurance, Age UK is an Appointed 
 Representative of Age UK Enterprises Limited, Age UK is an Introducer
 Appointed Representative of JLT Benefit Solutions Limited and Simplyhealth 
 Access for the purposes of introducing potential annuity and health
 cash plans customers respectively.  Age UK Enterprises Limited, JLT Benefit 
 Solutions Limited and Simplyhealth Access are all authorised and
 regulated by the Financial Services Authority.
 --

 This email and any files transmitted with it are confidential and intended 
 solely for the use of the individual 

Re: [R] How to select a subset data to do a barplot in ggplot2

2012-12-13 Thread Dennis Murphy
Hi:

The simplest way to do it is to modify the input data frame by taking
out the records not having status live or dead and then redefining the
factor in the new data frame to get rid of the removed levels. Calling
your input data frame DF rather than data,

DF - structure(list(FID = c(1L, 1L, 2L, 2L, 2L, 2L, 6L, 6L, 10L, 10L,
10L, 11L, 11L, 11L, 12L, 17L, 17L, 17L), IID = c(4621L, 4628L,
4631L, 4632L, 4633L, 4634L, 4675L, 4679L, 4716L, 4719L, 4721L,
4726L, 4728L, 4730L, 4732L, 4783L, 4783L, 4784L), STATUS = structure(c(2L,
1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 3L, 3L, 2L, 2L, 2L,
2L), .Label = c(dead, live, nosperm), class = factor)), .Names
= c(FID,
IID, STATUS), class = data.frame, row.names = c(NA, -18L
))

# The right hand side above came from dput(DF), where DF was created by
# DF - read.table(textConnection(your posted data), header = TRUE)
# Consider using dput() to represent your data in the future.

# Retain the records with status live or dead only
DF2 - DF[DF$STATUS %in% c(live, dead), ]

# This does not get rid of the original levels...
levels(DF2$STATUS)
# ...so redefine the factor
DF2$STATUS - factor(DF2$STATUS)

 str(DF2)
'data.frame':   16 obs. of  3 variables:
 $ FID   : int  1 1 2 2 2 2 6 6 10 10 ...
 $ IID   : int  4621 4628 4631 4632 4633 4634 4675 4679 4716 4719 ...
 $ STATUS: Factor w/ 2 levels dead,live: 2 1 2 2 2 2 2 1 1 2 ...

# now plot:

# (1) FID numeric
ggplot(DF2, aes(x = FID, fill = STATUS)) + geom_bar()

# (2) FID factor
ggplot(DF2, aes(x = factor(FID), fill = STATUS)) + geom_bar()

The second one makes more sense to me, but you may have reasons to
prefer the first.

Dennis

On Thu, Dec 13, 2012 at 4:38 AM, Yao He yao.h.1...@gmail.com wrote:
 FID IID STATUS
 14621live
 14628dead
 24631live
 24632live
 24633live
 24634live
 64675live
 64679dead
 104716dead
 104719live
 104721dead
 114726live
 114728nosperm
 114730nosperm
 124732live
 174783live
 174783live
 174784live

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] aggregate() runs out of memory

2012-09-14 Thread Dennis Murphy
Hi:

This should give you some idea of what Steve is talking about:

library(data.table)
dt - data.table(x = sample(10, 1000, replace = TRUE),
  y = rnorm(1000), key = x)
dt[, .N, by = x]
system.time(dt[, .N, by = x])

...on my system, dual core 8Gb RAM running Win7 64-bit,
 system.time(dt[, .N, by = x])
   user  system elapsed
   0.120.020.14

.N is an optimized function to find the number of rows of each data subset.
Much faster than aggregate(). It might take a little longer because you
have more columns that suck up space, but you get the idea. It's also about
5-6 times faster if you set a key variable in the data table than if you
don't.

Dennis

On Fri, Sep 14, 2012 at 12:26 PM, Sam Steingold s...@gnu.org wrote:

 I have a large data.frame Z (2,424,185,944 bytes, 10,256,441 rows, 17
 columns).
 I want to get the result of
 table(aggregate(Z$V1, FUN = length, by = list(id=Z$V2))$x)
 alas, aggregate has been running for ~30 minute, RSS is 14G, VIRT is
 24.3G, and no end in sight.
 both V1 and V2 are characters (not factors).
 Is there anything I could do to speed this up?
 Thanks.

 --
 Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X
 11.0.11103000
 http://www.childpsy.net/ http://www.PetitionOnline.com/tap12009/
 http://dhimmi.com http://think-israel.org http://iris.org.il
 WinWord 6.0 UNinstall: Not enough disk space to uninstall WinWord

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fill binary matrices with random 1s

2011-11-29 Thread Dennis Murphy
Hi:

Here's one approach. I assume that your first 1000 matrices have a
single 1 in each matrix, the next set of 1000 have two 1's, ..., and
the last one has 99 1's. (No point in doing all 1's since they're all
constant.) If that's the case, then try the following.

# Each row represents a different 'density' of 1's
# upper triangle of m is 0
m - matrix(0, 100, 100)
m[lower.tri(m)] - 1
diag(m) - 1
m - m[-100, ]   # remove row of all 1's

# Functions to operate on a single matrix 
# Function to permute a vector of 0's and 1's
# and reshape it into a 10 x 10 matrix
randomMatrix - function(x) matrix(sample(x), 10, 10)

# Generate a 10 x 10 x 1000 array
marray - function(x) replicate(1000, randomMatrix(x))

# Create a vector of names to which to assign the results
# of simulating from each row of m:
arraynames - paste('array', 1:99, sep = '')

# apply the marray() function to each row of m and assign
# to the corresponding index of arraynames
for(i in seq_along(arraynames)) assign(arraynames[i], marray(m[i, ]))

HTH,
Dennis


On Tue, Nov 29, 2011 at 4:32 AM, Grant McDonald
grantforacco...@hotmail.co.uk wrote:

 Dear all, I am finding difficulty in the following, I would like to
 create an empty matrix e.g. 10x10 of 0s and sequentially fill this
 matrix with randomly placed a 1s until it is saturated. Producing 100
 matrices of sequentially increasing density., This process needs to be
 randomized 1000 times., I assume i should run this along the following
 lines, 1) Create 1000 matrices all zeros, 2) add a random 1 to all
 matrices, 3) run function on all 1000 matrices and output results to a
 vector table (i.e. calculate density of matric at each step for all 100 
 matrices.
 )., 4) add another 1 to the previous 1000 matrices in a
 random position., repeat till all matrices saturated., I have looked
 through histories on random fill algorithms but all packages I can find
 nothing as simple as the random fill I am looking for., sorry for
 bothering, Thank you for any help in advance.


 Something that starts along the lines of the following? Sorry this example is 
 atrocious.

 matrixfill - function(emptymatrix, K=fullmatrix, time=100, from=0, to=time)

 {

 N - numeric(time+1)

 N[1] - emptymatrix

 for (i in 1:time) N[i+1] - N[i]+place random 1 in a random xy position 
 until K.
 Calculate Density of matrix

 
        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help needed in reproducing a plot

2011-11-29 Thread Dennis Murphy
Hi:

This looks like the one:

library('lattice')
data(Oats, package = 'MEMSS')
print(xyplot(yield ~ nitro | Block, Oats, groups = Variety, type = c(g, b),
 auto.key = list(lines = TRUE, space = 'top', columns = 3),
 xlab = Nitrogen concentration (cwt/acre),
 ylab = Yield (bushels/acre),
 aspect = 'xy'))

Look at the source code of the implementation document in the same
place that you found the pdf. I found it from the html help page for
the lme4 package, but there are other places you could find it.) The
code for the plots in the document are contained therein.

Dennis

On Tue, Nov 29, 2011 at 7:43 AM, syrvn ment...@gmx.net wrote:
 Hello,


 can anybody tell me how to produce a plot like the one in

 http://cran.r-project.org/web/packages/lme4/vignettes/Implementation.pdf

 on page 13, Figure 6?

 The data is stored in:

 library(nlme)
 data(Oats)


 Cheers

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Help-needed-in-reproducing-a-plot-tp4119603p4119603.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] cumsum in 3d arrays

2011-11-28 Thread Dennis Murphy
Hi:

Could you supply a small reproducible example with the output that you
expect? For example, what output would you expect from the following:

a - array(1:24, c(2, 2, 3))

?
Dennis

On Mon, Nov 28, 2011 at 12:32 AM, zloncaric
zlonca...@biologija.unios.hr wrote:
 Thank you for your time and help.

 I'm quite aware that this problem seems as basic stuff, and of course I've
 read several R and matlab manuals, and also consulted the D. Hiebeler,
 Matlab / R Reference and several others, but none of the suggested solutions
 seems to give me the right result.
 I've tried the combination of apply and cumsum but the result is wrong.





 --
 View this message in context: 
 http://r.789695.n4.nabble.com/cumsum-in-3d-arrays-tp4110470p4114510.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] simplify source code

2011-11-26 Thread Dennis Murphy
Hi:

Here's one way you could do it. I manufactured some fake data with a
simple model to illustrate. This assumes you are using the same model
formula with the same starting values and remaining arguments for each
response.

dg - data.frame(x = 1:10, y1 = sort(abs(rnorm(10))),
  y2 = sort(abs(rnorm(10))), y3 = sort(abs(rnorm(10

# Model: y = b0 + b1 exp(x/theta)
vars - c('y1', 'y2', 'y3')

# Function to create the model formula by plugging in the
# response y and run the model
mfun - function(y) {
 form - as.formula(paste(y, 'cbind(1, exp(x/th))', sep = ' ~ '))
 nls(form, data = dg, start = list(th = 0.3), algorithm = 'plinear')
}

# Generate a list of model objects:
mlist - lapply(vars, mfun)

# To see what they contain:
str(mlist[[1]])
str(summary(mlist[[1]]))

# Extract a few features from each:
# The first two return matrices, the third returns a list

do.call(rbind, lapply(mlist, function(m) coef(m)))
do.call(rbind, lapply(mlist, function(m) deviance(m)))
lapply(mlist, function(m) summary(m)$cov.unscaled)


To get more control over the output format, the plyr package can come
in handy. For example, to get data frames for the first two
extractions above, one would do

library('plyr')
ldply(mlist, function(m) coef(m))
ldply(mlist, function(m) deviance(m))

# ldply() means list input, data frame output (ld).
# For the third extraction, one has a list input and a list output:
llply(mlist, function(m) summary(m)$cov.unscaled)

HTH,
Dennis

On Sat, Nov 26, 2011 at 2:30 PM, Christof Kluß ckl...@email.uni-kiel.de wrote:
 Hi

 I would like to shorten

 mod1 - nls(ColName2 ~ ColName1, data = table, ...)
 mod2 - nls(ColName3 ~ ColName1, data = table, ...)
 mod3 - nls(ColName4 ~ ColName1, data = table, ...)
 ...

 is there something like

 cols = c(ColName2,ColName3,ColName4,...)

 for i in ...
  mod[i-1] - nls(ColName[i] ~ ColName1, data = table, ...)

 I am looking forward to help

 Christof

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] data.table merge equivalent for all.x

2011-11-26 Thread Dennis Murphy
Hi:

There may well be a more efficient way to do this, but here's one take.

library('data.table')
# Want to merge by D in the end, so set D as part of the key:
t1 - data.table(ID = 11:20, A = rnorm(10), D = 1:10, key = ID, D)
t2 - data.table(ID2 = 1:10, D = rep(1:5, 2), B = rnorm(10), key = ID2, D)

# The J expression produces sums of B (the non-key variable) for each D group
# .SD denotes 'sub-data'.  The result 'junk' is a data table.
junk - t2[, lapply(.SD, sum), by = D]

tables()   # junk has no key
# set a key for junk so that it can be merged
setkey(junk, 'D')
# t1 and junk have a common key variable D, so the left join is
merge(t1, junk, by = 'D', all.x = TRUE)

# check against
t1
junk

HTH,
Dennis


On Sat, Nov 26, 2011 at 3:59 PM, ONKELINX, Thierry
thierry.onkel...@inbo.be wrote:
 Dear all,

 I'm trying to use data.table to summarise a table and merge it to another 
 table. Here is what I would like to do, but by using data.table() in a proper 
 way.

 library(data.table)
 tab1 - data.table(ID = 11:20, A = rnorm(10), D = 1:10, key = ID)
 tab2 - data.table(ID2 = 1:10, D = rep(1:5, 2), B = rnorm(10), key = ID2)
 junk - aggregate(tab2[, B], by = list(D = tab2[, D]), FUN = sum)
 merge(tab1, junk, by = D, all.x = TRUE)

 This my attempt using data.table()

 junk - tab2[, mean(B), by = D]
 tab1[junk]

 Best regards,

 Thierry



        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Copula Fitting Using R

2011-11-25 Thread Dennis Murphy
Hi:

This is the type of question for which the sos package can come to the rescue:

library('sos')
findFn('Gumbel Clayton copula')

It appears that the QRMlib package would be a reasonable place to start.

Dennis

On Thu, Nov 24, 2011 at 7:29 PM, cahaya iman qaisfay...@gmail.com wrote:
 Hi,

 Is anybody using Copula package for fitting copulas to own data?
 I have two marginals Log Normal with (parameters 1.17 and 0.76) and Gamma (
 2.7 and 1.05)

 Which package I should use to fit Gumbel and Clayton Copulas?

 Thanks,
 fayyad

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Looping and paste

2011-11-24 Thread Dennis Murphy
Hi:

There are two good reasons why the loop solution is not efficient in
this (and related) problem(s):

(i) There is more code and less transparency;
(ii) the vectorized solution is four times faster.

Here are the two proposed functions:

# Vectorized version
m1 - function(v) paste(v, ' to ', v + 50, ' mN', sep = '')

# Loop version:
m2 - function(v) {
  out - rep(NA, length(v))
  for(i in seq_along(v)) out[i] - paste(v[i], ' to ', v[i] + 50,
' mN', sep = '')
  out
}
BndY - seq(from = 18900, to = 19700, by = 50)

 identical(m1(BndY), m2(BndY))
[1] TRUE

# Put them to the test:
 system.time(replicate(1, m1(BndY)))
   user  system elapsed
   0.670.000.67
 system.time(replicate(1, m2(BndY)))
   user  system elapsed
   2.670.002.67

The vectorized version is four times faster and produces the same
output as the loop version. Experiments with a longer test vector (501
elements) maintained the timing ratio.

Dennis


On Wed, Nov 23, 2011 at 7:00 PM, markm0705 markm0...@gmail.com wrote:
 Thank you

 On Thu, Nov 24, 2011 at 7:31 AM, B77S [via R] 
 ml-node+s789695n4102066...@n4.nabble.com wrote:

 out - vector(list)
 Ylab - for(i in 1:length(BndY))
 {
 out[i] - paste(BndY[i], to ,BndY[i],mN)
 }

 Ylab - do.call(c, out)





  markm0705 wrote
 Dear R helpers

 I'm trying to make up some labels for plot from this vector

 BndY-seq(from = 18900,to= 19700, by = 50)

 using

 Ylab-for(i in BndY) {c((paste(i, to ,i+50,mN)))}

 but the vector created is NULL

 However if i use

 for(i in BndY) {print(c(paste(i, to ,i+50,mN)))}

 I can see the for loop is making the labels I'm looking for but not sure
 on my error in assigning them to a vector

 Thanks in advance



 --
  If you reply to this email, your message will be added to the discussion
 below:
 http://r.789695.n4.nabble.com/Looping-and-paste-tp4101892p4102066.html
 To unsubscribe from Looping and paste, click 
 herehttp://r.789695.n4.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_codenode=4101892code=bWFya20wNzA1QGdtYWlsLmNvbXw0MTAxODkyfDExNDQyODMxMDM=
 .
 NAMLhttp://r.789695.n4.nabble.com/template/NamlServlet.jtp?macro=macro_viewerid=instant_html%21nabble%3Aemail.namlbase=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.InstantMailNamespacebreadcrumbs=instant+emails%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml



 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Looping-and-paste-tp4101892p4102553.html
 Sent from the R help mailing list archive at Nabble.com.
        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] AlgDesign - $D $A $Ge $Dea

2011-11-24 Thread Dennis Murphy
See
?AlgDesign::optFederov

and look for the 'Value' section, where these elements of the output
list are defined.

Dennis

On Wed, Nov 23, 2011 at 7:15 PM, 蓁蓁 李 tszhenzh...@hotmail.com wrote:

 Hi,

 I am wondering how I should interpreate the output of optFederov() in 
 AlgDesign.

 Specially I want to know what is $D, $A, $Ge and $Dea, which one I can use as 
 an efficiency to say how good the optimal design is.

 I only know when a orthogonal design comes, $D = 1.

 I red the pdf document    --   vignette(AlgDesign)

 [Just type: vignette(AlgDesign) in R, you will get this pdf document]

 On page 20 section 4.2.1.3, it says: the D and G efficiencies of the 
 following ... are 98% and 89%.

 I tried the code as shown on page 20 a few times and output showed different 
 designs but the figure of $D, $A, $Ge and $Dea are similar. I don't see $D is 
 close to 0.98, but $Ge is 0.89. One of the output as below:




 dat = gen.factorial(3, 3, center = T, varNames = c(A, B, C))
 desC = optFederov(~quad(A,B,C), dat, nT = 14, evaluateI = T, nR = 100)
 desC
 $D
 [1] 0.4630447

 $A
 [1] 3.22

 $I
 [1] 9.945833

 $Ge
 [1] 0.893

 $Dea
 [1] 0.887

 $design
    A  B  C
 1  -1 -1 -1
 3   1 -1 -1
 5   0  0 -1
 7  -1  1 -1
 9   1  1 -1
 11  0 -1  0
 13 -1  0  0
 15  1  0  0
 17  0  1  0
 19 -1 -1  1
 21  1 -1  1
 23  0  0  1
 25 -1  1  1
 27  1  1  1

 $rows
  [1]  1  3  5  7  9 11 13 15 17 19 21 23 25 27

 Thanks very much!
 J

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] dataframe indexing by number of cases per group

2011-11-24 Thread Dennis Murphy
A very similar question was asked a couple of days ago - see the
thread titled Removing rows in dataframe w'o duplicated values - in
particular, the responses by Dimitris Rizopoulos and David Winsemius.
The adaptation to this problem is

df[ave(as.numeric(df$group), as.numeric(df$group), FUN = length)  4, ]
   groupx
1  A 3.903747
2  A 3.599547
3  A 2.449991
4  A 2.740639
5  A 4.268988
6  B 8.649600
7  B 5.493841
8  B 1.892154
9  B 6.781754
10 B 1.459250
11 B 6.749522

HTH,
Dennis

On Thu, Nov 24, 2011 at 4:02 AM, Johannes Radinger jradin...@gmx.at wrote:
 Hello,

 assume we have following dataframe:

 group -c(rep(A,5),rep(B,6),rep(C,4))
 x - c(runif(5,1,5),runif(6,1,10),runif(4,2,15))
 df - data.frame(group,x)

 Now I want to select all cases (rows) for those groups
 which have more or equal 5 cases (so I want to select
 all cases of group A and B).
 How can I use the indexing for such questions?

 df[??]... I think it is probably quite easy but I really
 don't know how to do that at the moment.

 maybe someone can help me...

 /johannes
 --

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Thank you

2011-11-24 Thread Dennis Murphy
Well said. +1

Dennis

On Thu, Nov 24, 2011 at 6:43 AM, Bert Gunter gunter.ber...@gene.com wrote:
 ... and while I am at it, as this is the U.S. Thanksgiving...

 My sincere thanks to the many R developers and documenters who
 contribute large amounts of their personal time and effort to
 developing, improving, and enhancing the accessibility of R for data
 analysis and science. I believe it is fair to say that R has had as
 much or more impact than Gosset's Student's T, and I fear that
 academics who do much of this work do not receive the professional
 recognition they deserve. I continue to be amazed and humbled by their
 high quality and consummate professionalismism -- I could not live
 without R.

 Kind regards and best wishes to all,

 -- Bert

 --

 Bert Gunter
 Genentech Nonclinical Biostatistics

 Internal Contact Info:
 Phone: 467-7374
 Website:
 http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lines and points in xyplot()

2011-11-23 Thread Dennis Murphy
Hi:

Try this:

library('lattice')
xyplot(y ~ x,
  type = c('g', 'p'),
  panel = function(x, y, ...){
  panel.xyplot(x, y, ...)
  panel.lines(x, predict(fm), col = 'black', lwd = 2)
  }
 )

HTH,
Dennis

On Wed, Nov 23, 2011 at 9:18 AM, Doran, Harold hdo...@air.org wrote:
 Given the following data, I want a scatterplot with the data points and the 
 predictions from the regression.

 Sigma - matrix(c(1,.6,1,.6), 2)
 mu - c(0,0)
 dat - mvrnorm(5000, mu, Sigma)

 x - dat[,1] * 50 + 200
 y - dat[,2] * 50 + 200

 fm - lm(y ~ x)

 ### This gives the regression line, but not the data
 xyplot(y ~ x,
               type = c('g', 'p'),
               panel = function(x, y){
               panel.lines(x, predict(fm))
               }
 )

 ### This gives both data but as point
 xyplot(y + predict(fm) ~ x,
               type = c('g', 'p'),
               )

 I know I can add an abline easily, but my problem is a bit more complex and 
 the code above is just an example. What is the best way for the predicted 
 data to form a solid line and let the data points remain as points

 Harold

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] zeros to NA's - faster

2011-11-23 Thread Dennis Murphy
Matrix multiplication, maybe?

 all_data %*% iu
 a b c
[1,] 1 4 0
[2,] 2 5 0
[3,] 3 6 0

I have no idea if this is a general solution or not, but it works in
this case. If you need something else, perhaps a more realistic
example would help.

Dennis

On Wed, Nov 23, 2011 at 11:41 AM, Ben quant ccqu...@gmail.com wrote:
 Hello,

 Is there a faster way to do this? Basically, I'd like to NA all values in
 all_data if there are no 1's in the same column of the other matrix, iu.
 Put another way, I want to replace values in the all_data columns if values
 in the same column in iu are all 0. This is pretty slow for me, but works:

  all_data = matrix(c(1:9),3,3)
  colnames(all_data) = c('a','b','c')
 all_data
     a b c
 [1,] 1 4 7
 [2,] 2 5 8
 [3,] 3 6 9
  iu = matrix(c(1,0,0,0,1,0,0,0,0),3,3)
  colnames(iu) = c('a','b','c')
 iu
     a b c
 [1,] 1 0 0
 [2,] 0 1 0
 [3,] 0 0 0

    fun = function(x,d){
      vals = d[,x]
      i = iu[,x]
      if(!any(i==1)){
        vals = rep(NA,times=length(vals))
      }else{
        vals
      }
      vals
    }
    all_data = sapply(colnames(iu),fun,all_data)
 all_data
     a b  c
 [1,] 1 4 NA
 [2,] 2 5 NA
 [3,] 3 6 NA

 ...again, this work, but is slow for a large number of columns. Have
 anything faster?

 Thanks,

 ben

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Removing rows in dataframe w'o duplicated values

2011-11-22 Thread Dennis Murphy
Hi:

Here's one way:

do.call(rbind, lapply(L, function(d) if(nrow(d)  1) return(d)))
id value value2
1.1  1 5  1
1.2  1 6  4
1.3  1 7  3
3.5  3 5  4
3.6  3 4  3

HTH,
Dennis

On Tue, Nov 22, 2011 at 9:43 AM, AC Del Re de...@wisc.edu wrote:
 Hi,

 Is there an easy way to remove dataframe rows without duplicated values of
 a specified column ('id')?  e.g.,

 dat - data.frame(id = c(1,1,1,2,3,3), value = c(5,6,7,4,5,4), value2 =
 c(1,4,3,3,4,3))
 dat

  id value value2
 1  1     5      1
 2  1     6      4
 3  1     7      3
 4  2     4      3
 5  3     5      4
 6  3     4      3


 This is sample data and the real data has hundreds of rows. In this
 case, only row 4 does not have a duplicated id and I would like to
 remove it without using:


 dat$id[4] - NULL


 Any help is appreciated!


 AC

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Removing rows in dataframe w'o duplicated values

2011-11-22 Thread Dennis Murphy
Sorry, you need this first:

L - split(dat, dat$id)
do.call(rbind, lapply(L, function(d) if(nrow(d)  1) return(d)))

D.

On Tue, Nov 22, 2011 at 10:38 AM, Dennis Murphy djmu...@gmail.com wrote:
 Hi:

 Here's one way:

 do.call(rbind, lapply(L, function(d) if(nrow(d)  1) return(d)))
    id value value2
 1.1  1     5      1
 1.2  1     6      4
 1.3  1     7      3
 3.5  3     5      4
 3.6  3     4      3

 HTH,
 Dennis

 On Tue, Nov 22, 2011 at 9:43 AM, AC Del Re de...@wisc.edu wrote:
 Hi,

 Is there an easy way to remove dataframe rows without duplicated values of
 a specified column ('id')?  e.g.,

 dat - data.frame(id = c(1,1,1,2,3,3), value = c(5,6,7,4,5,4), value2 =
 c(1,4,3,3,4,3))
 dat

  id value value2
 1  1     5      1
 2  1     6      4
 3  1     7      3
 4  2     4      3
 5  3     5      4
 6  3     4      3


 This is sample data and the real data has hundreds of rows. In this
 case, only row 4 does not have a duplicated id and I would like to
 remove it without using:


 dat$id[4] - NULL


 Any help is appreciated!


 AC

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R ignores number only with a nine under 10000

2011-11-21 Thread Dennis Murphy
Hi:

Strictly a guess, but the following might be helpful. The call below
assumes that the referent data frame is test1, which consists of a
single column named x. Modify as appropriate.

test_lab - with(test1, cut(x, c(0, 18954, 37791, 56951, 75944, 84885, 113835),
 labels = c('I8456', 'ref', 'Cyprus1', 'KE3870', 'KE3873',
'KE3926OT')))

cut() creates a factor from a numeric variable. The second argument
consists of the cut points and the third argument generates the labels
to be associated with values falling between the cut points. See ?cut
for more details, and pay attention to the options.

The object test_lab is a vector external to test1; if you want it to
be a column of test1, then add it to the data frame in one of the
usual ways.

HTH,
Dennis

On Mon, Nov 21, 2011 at 7:42 AM, set asta...@hotmail.com wrote:
 Hello R users,

 I'm trying to replace numerical values in a datamatrix with strings. R does
 this except for numbers under 1 starting with a 9 (eg 98, 970, 9504
 etc). This is really weird and I wondered whether someone had encountered
 such a problem or knows the solution. I'm using the next script:

 test_1 - read.table(5+ref_15clusters3.csv, header = TRUE, sep = ,,
 colClasses = numeric)
 test_1[test_1  94885  test_1 = 113835] = KE3926OT
 test_1[test_1 != 0  test_1 = 18954] = I8456
 test_1[test_1  75944  test_1 = 94885] = KE3873
 test_1[test_1  56951  test_1 = 75944] = KE3870
 test_1[test_1  37991  test_1 = 56951] = Cyprus1
 test_1[test_1  18954  test_1 = 37991] = ref
 write.table(test_1, file = test_replace7.txt, quote = FALSE, sep=\t)

 Thanks,
 Set

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/R-ignores-number-only-with-a-nine-under-1-tp4091936p4091936.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Adding two or more columns of a data frame for each row when NAs are present.

2011-11-20 Thread Dennis Murphy
Hi:

Does this work for you?

 yy
  Q20 Q21 Q22 Q23 Q24
1   0   1   2   3   4
2   1  NA   2   3   4
3   2   1   2   3   4

rowSums(yy, na.rm = TRUE)
 1  2  3
10 10 12

# Use a subset of the variables in yy:
selectVars - paste('Q', c(20, 21, 24), sep = '')
rowSums(yy[, selectVars], na.rm = TRUE)
1 2 3
5 5 7

HTH,
Dennis
On Sun, Nov 20, 2011 at 12:38 PM, Ian Strang hamame...@ntlworld.com wrote:

 I am fairly new to R and would like help with the problem below. I am trying
 to sum and count several rows in the data frame yy below. All works well as
 in example 1. When I try to add the columns, with an NA in Q21, I get as NA
 as mySum. I would like NA to be treated as O, or igored.
 I wrote a function to try to count an NA element as 0, Example 3 function.
 It works with a few warnings, Example 4, but still gives NA instead of the
 addition when there is an NA in an element.

 In Example 6  7, I tried using sum() but it just sums the whole data frame,
 I think,

 How do I add together several columns giving the result for each row in
 mySum? NA should be treated as a 0. Please, note, I do not want to sum all
 the columns, as I think rowSums would do, just the selected ones.

 Thanks for your help.
 Ian,

 yy - read.table( header = T, sep=,, text =     ## to create a data
 frame
 + Q20, Q21, Q22, Q23, Q24
 +  0,1, 2,3,4
 +  1,NA,2,3,4
 +  2,1, 2,3,4)
 +  yy
  Q20 Q21 Q22 Q23 Q24
 1   0   1    2   3   4
 2   1  NA   2   3   4
 3   2   1    2   3   4

 x - transform( yy,     ## Example 1
 +   mySum = as.numeric(Q20) + as.numeric(Q22) + as.numeric(Q24),
 +   myCount =
 as.numeric(!is.na(Q20))+as.numeric(!is.na(Q21))+as.numeric(!is.na(Q24))
 + )
 + x
  Q20 Q21 Q22 Q23 Q24 mySum myCount
 1   0   1    2   3   4     6       3
 2   1  NA   2   3   4     7       2
 3   2   1    2   3   4     8       3

 + x - transform( yy,      Example 2
 +   mySum = as.numeric(Q20) + as.numeric(Q21) + as.numeric(Q24),
 +   myCount =
 as.numeric(!is.na(Q20))+as.numeric(!is.na(Q21))+as.numeric(!is.na(Q24))
 + )
 + x
  Q20 Q21 Q22 Q23 Q24 mySum myCount
 1   0   1    2   3   4     5       3
 2   1  NA   2   3   4    NA       2
 3   2   1    2   3   4     7       3

 NifAvail - function(x) { if (is.na(x)) x-0 else x - x   ###
 Example 3
 +   return(as.numeric(x))
 + } #end function
 + NifAvail(5)
 [1] 5
 + NifAvail(NA)
 [1] 0

 x - transform( yy,
 +   mySum = NifAvail(Q20) + NifAvail(Q22) + NifAvail(Q24),
  ### Example 4
 +   myCount =
 as.numeric(!is.na(Q20))+as.numeric(!is.na(Q21))+as.numeric(!is.na(Q24))
 + )
 Warning messages:
 1: In if (is.na(x)) x - 0 else x - x :
  the condition has length  1 and only the first element will be used
 2: In if (is.na(x)) x - 0 else x - x :
  the condition has length  1 and only the first element will be used
 3: In if (is.na(x)) x - 0 else x - x :
  the condition has length  1 and only the first element will be used
 x
  Q20 Q21 Q22 Q23 Q24 mySum myCount
 1   0   1    2   3   4     6       3
 2   1  NA   2   3   4     7       2
 3   2   1    2   3   4     8       3
 x - transform( yy,
 +   mySum = NifAvail(Q20) + NifAvail(Q21) + NifAvail(Q24),
  Example 5
 +   myCount =
 as.numeric(!is.na(Q20))+as.numeric(!is.na(Q21))+as.numeric(!is.na(Q24))
 + )
 Warning messages:
 1: In if (is.na(x)) x - 0 else x - x :
  the condition has length  1 and only the first element will be used
 2: In if (is.na(x)) x - 0 else x - x :
  the condition has length  1 and only the first element will be used
 3: In if (is.na(x)) x - 0 else x - x :
  the condition has length  1 and only the first element will be used
 x
  Q20 Q21 Q22 Q23 Q24 mySum myCount
 1   0   1    2   3   4     5       3
 2   1  NA   2   3   4    NA       2
 3   2   1    2   3   4     7       3


 x - transform( yy,                                        
 Example 6
 +   mySum = sum(as.numeric(Q20), as.numeric(Q21), as.numeric(Q23), na.rm=T),
 +   myCount =
 as.numeric(!is.na(Q20))+as.numeric(!is.na(Q21))+as.numeric(!is.na(Q24))
 + )
 + x
  Q20 Q21 Q22 Q23 Q24 mySum myCount
 1   0   1    2   3   4    14       3
 2   1  NA   2   3   4    14       2
 3   2   1    2   3   4    14       3

 x - transform( yy,                                       #
 Example 7
 +   mySum = sum(as.numeric(Q20), as.numeric(Q22), as.numeric(Q23), na.rm=T),
 +   myCount =
 as.numeric(!is.na(Q20))+as.numeric(!is.na(Q21))+as.numeric(!is.na(Q24))
 + )
 + x
  Q20 Q21 Q22 Q23 Q24 mySum myCount
 1   0   1    2   3   4    18       3
 2   1  NA   2   3   4    18       2
 3   2   1    2   3   4    18       3

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list

Re: [R] Continuasly Compunded Returns with quantmod-data

2011-11-20 Thread Dennis Murphy
Hi:

Start by writing a small utility function to compute the CCR (not to
be confused with the rock band):

ccr - function(x) 100 * (log(x[-1]) - log(x[-length(x)]))

# Small test:
x - round(rnorm(10, 5, 0.5), 2)
ccr(x)  # Works on vectors...

# Load up the stock data:
library(quantmod)
getSymbols(GOOG,from=2011-11-01)
GOOG1-GOOG[,1]

# Try it directly:
 ccr(GOOG[, 1])
   GOOG.Open
2011-11-02 0
2011-11-03 0
2011-11-04 0
2011-11-07 0
2011-11-08 0
2011-11-09 0
2011-11-10 0
2011-11-11 0
2011-11-14 0
2011-11-15 0
2011-11-16 0
2011-11-17 0

## Hmm, no go. Try this instead:

 apply(GOOG1, 2, ccr)
 GOOG.Open
2011-11-02  0.82403900
2011-11-03  0.35839274
2011-11-04  1.10123942
2011-11-07 -0.03033316
2011-11-08  2.60843853
2011-11-09 -0.78136988
2011-11-10  0.27598990
2011-11-11 -0.76704898
2011-11-14  1.10809039
2011-11-15  0.78637365
2011-11-16 -0.11756255
2011-11-17 -0.33220719
2011-11-18 -1.32834757

If you look at the structure of GOOG1 [str(GOOG1)], you'll see that it
is a one column matrix, so apply() is one way to go, especially if you
want to run the function on multiple columns of a matrix.

HTH,
Dennis

On Sun, Nov 20, 2011 at 3:52 PM, barb mainze...@hotmail.com wrote:
 Hey guys,

 i want to calculate the continuasly compounded returns for stock prices.

 Formula for CCR:
 R_t = ln(P_t/P_{t-1})*100

 With R:

 First i have to modify the vectors, so that they have the same length
 and we start at the second observation.

 log(GOOG1[-1]/GOOG1[1:length(GOOG1)-1])*100

 That does work with normal vectors.

 My Questions:

 1) I want to use this for stock prices.


 so i use:

 library(quantmod)
 getSymbols(GOOG,from=2011-11-01)
 GOOG1-GOOG[,1]


 If i use my formula i get only the value 1 for every observation :(



 Thanks for your time and help!
 I appreciate it

 Regards
 Tonio

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Continuasly-Compunded-Returns-with-quantmod-data-tp4090014p4090014.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Deleting multiple rows from a data matrix based on exp value

2011-11-20 Thread Dennis Murphy
Without a reproducible example this is just a guess, but try

Matrix[apply(Matrix, 1, function(x) any(x  1.11)), ]

This will retain all rows of Matrix where at least one value in a row
is above the threshold 1.11. If that doesn't do what you want, please
provide a small reproducible example and a clearer statement of the
desired output. It's not at all clear to me what 'exp. values' means -
I can devise at least three meanings off the top of my head, none of
which may conform to what you mean.

HTH,
Dennis

On Sun, Nov 20, 2011 at 2:45 PM, Peter Davidsen pkdavid...@gmail.com wrote:
 Dear List,

 I have a data matrix that consists of ~4500 rows and 25 columns (i.e.
 an exprSet object that I converted via the 'exprs' function into a
 data matrix)

 Now I want to remove/delete the rows where all exp. values in that
 particular row are below or equal to a specific cut-off value (e.g
 1.11)

 I have tried using several commands to address this issue:
Matrix[rowSums(Matrix = 1.11) = 1.11, ]

 or

Matrix[ !apply(Matrix=1.11,1,all), ]


 The above commands seem to work fine when I generate a small test matrix 
 like:
M - matrix(c(2,5,8,0.4,0.8,0.5,4,12,3), nrow=3, byrow=T)

 However, non of the two commands decrease the number of rows in my real 
 matrix!

 Any help would be appreciated

 Kind regards,
 Peter

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Apply functions along layers of a data matrix

2011-11-18 Thread Dennis Murphy
Hi:

Here are two ways to do it; further solutions can be found in the doBy
and data.table packages, among others.

library('plyr')
ddply(daf, .(id), colwise(mean, c('v1', 'v2', 'v3', 'v4')))

aggregate(cbind(v1, v2, v3, v4) ~ id, data = daf, FUN = mean)

# Result of each:
  id v1 v2 v3 v4
1  1  6 21 36 51
2  2  7 22 37 52
3  3  8 23 38 53
4  4  9 24 39 54
5  5 10 25 40 55

Dennis

On Fri, Nov 18, 2011 at 5:05 AM,  saschav...@gmail.com wrote:
 Hello

 How can I apply functions along layers of a data matrix?

 Example:

 daf - data.frame(
  'id' = rep(1:5, 3),
  matrix(1:60, nrow=15, dimnames=list( NULL, paste('v', 1:4, sep='') )),
  rep = rep(1:3, each=5)
 )

 The data frame daf contains 3 repetitions/layers (rep) of 4 variables of 5
 persons (id). For some reason, I want to calculate various statistics (e.g.,
 mean, median) *along* the repetitions. The mean calculation, for example,
 would produce the means of daf[1, 'v1'] *along* the 3 repetition:

 (daf[1, 'v1'] + daf[6, 'v1'] + daf[11, 'v1']) / 3

 That is to say, each of the calculations would result in a data frame with 4
 variables (and the id) of the 5 persons:

  id v1 v2 v3 v4
 1  1  6 21 36 51
 2  2  7 22 37 52
 3  3  8 23 38 53
 4  4  9 24 39 54
 5  5 10 25 40 55

 Currently, I do this in a loop, but I was wondering about a quick and
 ressource-friendly way to achieve this?

 Thanks
 *S*

 --
 Sascha Vieweg, saschav...@gmail.com

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] couting events by subject with black out windows

2011-11-18 Thread Dennis Murphy
Hi:

Here's a Q  D solution that could be improved. It uses the plyr
package. Starting from your data1 data frame,

library('plyr')
dseq - seq(as.Date('2010-01-01'), as.Date('2011-06-05'), by = '30 days')
# Use the cut() function to create a factor whose levels are demarcated
# by the dates in dseq:
# See ?cut for labeling options
data1[['tf']] - cut(data1$Date, dseq)
ddply(subset(data1, event == 1L), .(tf),  summarise, Date.min = min(Date))

  tf   Date.min
1 2010-01-01 2010-01-01
2 2010-01-31 2010-02-12
3 2010-05-01 2010-05-03
4 2011-03-27 2011-04-21

The value of tf is the left endpoint of the time interval.

This isn't your desired output in two respects: (1) summarise won't
carry along extra variables, so ID gets dropped; (2) you have
2010-03-01 as the first date of a 30-day period, but according to the
way I defined the 30-day intervals, Mar. 1 is the last day of an
interval, so that's why it's not included [2010-2-12 precedes it]. You
can always change the definitions. If you group by months instead, you
get the output you expected.

Hope this is enough to get you started..
Dennis



On Fri, Nov 18, 2011 at 3:22 PM, Chris Conner connerpha...@yahoo.com wrote:
 I large datset that includes subjects(ID), Dates and events that need to be 
 counted.  Not every date includes an event, and I need to only count one 
 event per 30days, per subject.  So in essence, I need to create a 30-day 
 black out period during which time an event cannot be counted for each 
 subject.  The reason is that a rule has been set up, whereby a subject can 
 only be counted once per 30 day period (the 30 day window includes the day 
 the event of interest is counted).

 The solution should count only the following events per subject(per the 
 30-day blackout rule):

 ID Date
 auto1 1/1/2010
 auto2 2/12/2010
 auto2 4/21/2011
 auto3 3/1/2010
 auto3 5/3/2010

 I have created a multistep process to do this, but it is extremely clumsy 
 (detailed below).  I have to believe that one of you has a much more elegant 
 solution.  Thank you all in advance for any help

 ## example data
 data1 - structure(list(ID = structure(c(2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L,3L, 
 4L, 4L, 4L, 4L, 4L), .Label = c(, auto1, auto2, auto3), class = 
 factor), Date = structure(c(14610, 14610, 14627,14680, 14652, 14660, 14725, 
 15085, 15086, 14642, 14669, 14732,14747, 14749), class = Date), event = 
 c(1L, 1L, 1L, 0L, 1L,1L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L)), .Names = c(ID, 
 Date,event), class = data.frame, row.names = c(NA, 14L))
 ## remove non events
 data2 - data1[data1$event==1,]
 library(doBy)
 ## create a table of first events
 step1 - summaryBy(Date~ID, data = data2, FUN=min)
 step1$Date30 - step1$Date.min+30
 step2 - merge(data2, step1, by.x=ID, by.y=ID)
 ## use an ifelse to essentially remove any events that shouldn't be 
 counted
 step2$event - ifelse(as.numeric(step2$Date) = step2$Date.min  
 as.numeric(step2$Date) = step2$Date30, 0, step2$event)
 ## basically repeat steps above until I get an error (no more events)
 data3 - step2[step2$event==1,]
 data3- data3[,1:3]
 step3 - summaryBy(Date~ID, data = data3, FUN=min)
 step3$Date30 - step3$Date.min+30
 step4 - merge(data3, step3, by.x=ID, by.y=ID)
 step4$event - ifelse(as.numeric(step4$Date) = step4$Date.min  
 as.numeric(step4$Date) = step4$Date30, 0, step4$event)
 ## then I rbind the keepers
 ## in this case steps 1 and 3 above
 final - rbind(step1,step3)
 ## then reformat
 final - final[,1:2]
 final$Date.min - as.Date(final$Date.min,origin=1970-01-01)
 ## again, extremely clumsy, but it works...  HELP! :)
        [[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] reshape data.frame

2011-11-18 Thread Dennis Murphy
This is very straightforward using the reshape2 package:

library('reshape2')
dc - dcast(a, name ~ year, value_var = 'amount')

  name 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984
1a123456789   10   NA   NA   NA   NA
2b   11   12   13   14   15   16   17   18   19   20   21   22   23   24
  1985
1   NA
2   25

dcast() returns a data frame; a companion function acast() returns a
matrix. If you want to change the names afterward, use

names(dc)[-1] - paste('X', 1971:1985, sep = '.')

HTH,
Dennis

On Fri, Nov 18, 2011 at 4:04 PM, Trevor Davies davies.tre...@gmail.com wrote:
 A late friday afternoon coding question.  I'm having a hard time thinking
 of the correct search terms for what I want to do.

 If I have a df like this:

  a -
 data.frame(name=c(rep('a',10),rep('b',15)),year=c(1971:1980,1971:1985),amount=1:25)
   name year amount
 1     a 1971      1
 2     a 1972      2
 3     a 1973      3
 4     a 1974      4
 5     a 1975      5
 6     a 1976      6
 7     a 1977      7
 8     a 1978      8
 9     a 1979      9
 10    a 1980     10
 11    b 1971     11
 12    b 1972     12
 13    b 1973     13
 14    b 1974     14
 15    b 1975     15
 16    b 1976     16
 17    b 1977     17
 18    b 1978     18
 19    b 1979     19
 20    b 1980     20
 21    b 1981     21
 22    b 1982     22
 23    b 1983     23
 24    b 1984     24
 25    b 1985     25


 and I'd like to reshape it so it is like this:
  X.1971 X.1972 X.1973 X.1974 X.1975 X.1976 X.1977 X.1978 X.1979 X.1980
 X.1981
 a      1      2      3      4      5      6      7      8      9     10
 NA
 b     11     12     13     14     15     16     17     18     19     20
 21
  X.1982 X.1983 X.1984 X.1985
 a     NA     NA     NA     NA
 b     22     23     24     25

 Thanks for the assist.

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] split list of characters in groups of 2

2011-11-17 Thread Dennis Murphy
Hi:

Here's one way:

apply(matrix(var.names, ncol = 2, byrow = TRUE), 1, function(x)
paste(x[1], x[2], sep = ','))
[1] a,b c,d e,f

HTH,
Dennis

On Wed, Nov 16, 2011 at 9:46 PM, B77S bps0...@auburn.edu wrote:
 hi,

 If i have a list of things, like this

 var.names - c(a, b, c, d, e, f)

 how can i get this:

 a, b, c, d, e, f

 thanks ahead of time.


 --
 View this message in context: 
 http://r.789695.n4.nabble.com/split-list-of-characters-in-groups-of-2-tp4079031p4079031.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Introducing \n's so that par.strip.text can produce multiline strips in lattice

2011-11-17 Thread Dennis Murphy
Hi:

This worked for me - I needed to modify some of the strip labels to
improve the appearance a bit and also reduced the strip font size a
bit to accommodate the lengths of the strings. The main thing was to
change \\n to \n.

Firstly, I created a new variable called Indic as a character variable
and then did some minor surgery on three of the strings:

Indic - as.character(imports$Indicator)
Indic[3 + 6 *(0:5)] - Chemicals and related\n   products imports
Indic[4 + 6 *(0:5)] - Pearls, semiprecious \nprecious stones imports
Indic[5 + 6 *(0:5)] - Metaliferrous ores \nmetal scrap imports

# Read Indic into the imports data frame as a factor:
imports$Indic - factor(Indic)

# Redo the plot:
barchart(X03/1000 ~ time | Indic,
 data = imports[which(imports$time != 1), ],
 horiz = FALSE,
 scales = list(x = list(rot=45, labels=paste(Mar,2007:2011))),
 par.strip.text=list(lineheight=1, lines=2, cex = 0.8))

Dennis

On Wed, Nov 16, 2011 at 11:25 PM, Ashim Kapoor ashimkap...@gmail.com wrote:
 Dear all,

 I have the following data, which has \\n in place of \n. I introduced \n's
 in the csv file so that I could use it in barchart in lattice. When I did
 that and read it into R using read.csv, it read it as \\n. My question is
 how do I introduce \n in the middle of a long string of quoted text so
 that lattice can make multiline strips. Hitting Enter which is supposed to
 introduce \n's does'nt work because when I goto the middle of the line and
 press enter Open Office thinks that I am done with editing my text and
 takes me to the next line.


 dput(imports)
 structure(list(Indicator = structure(c(5L, 4L, 2L, 12L, 8L, 7L,
 5L, 4L, 2L, 12L, 8L, 7L, 5L, 4L, 2L, 12L, 8L, 7L, 5L, 4L, 2L,
 12L, 8L, 7L, 5L, 4L, 2L, 12L, 8L, 7L, 5L, 4L, 2L, 12L, 8L, 7L
 ), .Label = c(, Chemicals and related\\n products imports,
 Coal export, Gold imports, Gold  silver imports, Iron ore export,
 Iron  steel imports, Metaliferrous ores  metal scrap imports,
 Mica export, Ores  minerals\\nexport, Other ores \\nminerals
 export,
 Pearls precious \\n semiprecious stones imports, Processed minerals\\n
 export
 ), class = factor), Units = structure(c(2L, 2L, 2L, 2L, 2L,
 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c(,
 Rs.crore), class = factor), Expression = structure(c(2L,
 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
 2L, 2L, 2L), .Label = c(, Ival), class = factor), time = c(7,
 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 10, 10, 10,
 10, 10, 10, 11, 11, 11, 11, 11, 11, 1, 1, 1, 1, 1, 1), X03 = c(66170.46,
 65337.72, 62669.86, 33870.17, 36779.35, 27133.25, 71829.14, 67226.04,
 75086.89, 29505.61, 31750.99, 32961.26, 104786.39, 95323.8, 134276.63,
 76263, 36363.61, 41500.36, 140440.36, 135877.91, 111269.69, 76678.27,
 36449.89, 36808.06, 162253.77, 154346.72, 124895.76, 142437.03,
 42872.16, 43881.85, 109096.024, 103622.438, 101639.766, 71750.816,
 36843.2, 36456.956), id = c(1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L,
 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L,
 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L)), row.names = c(1.7,
 2.7, 3.7, 4.7, 5.7, 6.7, 1.8, 2.8, 3.8, 4.8,
 5.8, 6.8, 1.9, 2.9, 3.9, 4.9, 5.9, 6.9, 1.10,
 2.10, 3.10, 4.10, 5.10, 6.10, 1.11, 2.11, 3.11,
 4.11, 5.11, 6.11, 1.1, 2.1, 3.1, 4.1, 5.1, 6.1
 ), .Names = c(Indicator, Units, Expression, time, X03,
 id), class = data.frame, reshapeLong = structure(list(varying =
 structure(list(
    X03 = c(X03.07, X03.08, X03.09, X03.10, X03.11,
    X03.1)), .Names = X03, v.names = X03, times = c(7,
 8, 9, 10, 11, 1)), v.names = X03, idvar = id, timevar = time), .Names
 = c(varying,
 v.names, idvar, timevar)))


 On which I want to run

 barchart(X03/1000~time|Indicator,
         data=imports[which(imports$time!=1),],
         horiz=F,
         scales=list(x=list(rot=45,labels=paste(Mar,2007:2011))),
         par.strip.text=list(lineheight=1,lines=2))

 Many thanks,
 Ashim.

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Theil decomposition

2011-11-16 Thread Dennis Murphy
Looking at the help pages in the ineq package, the Theil() function
simply returns the value of the index. You can look at the source code
of the relevant functions to see what they actually compute:

library('ineq')
ineq # the main function
Theil# for the Theil index

Neither function is long - ineq() is a wrapper for the index of
choice. The code for Theil is very straightforward.

It appears that if you want to decompose the index, you'll have to
roll your own function to do so. If you know how to compute it
componentwise, you could write a function to get the necessary
information in each group, and then use that information to compute
the overall index.

Here's a reproducible example using the chickwts data that computes
means and sample sizes by feed type, then uses the result to compute a
weighted mean:

library('plyr')
(v - ddply(chickwts, .(feed), summarise,
  m = mean(weight), n = length(weight)))
with(v, weighted.mean(m, n))
# [1] 261.3099

This is a template of how you might produce the components you need
for the Theil index and them pull them together into a weighted sum or
product, if that's what you need to do.

HTH,
Dennis

On Tue, Nov 15, 2011 at 10:35 PM, Kitty Lee lee.ki...@yahoo.com wrote:
 I came across the package 'ineq' that computes a variety of inequality 
 measures (e.g. gini, theil etc). I want to compute the Theil index (racial 
 segregation) and decompose the total into sub-components (by geog levels). I 
 think the package doesn't report the decomposition (correct me if I'm wrong). 
 Just wonder is that available elsewhere?



 K.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] geom_bar with missing data in package ggplot

2011-11-16 Thread Dennis Murphy
Hi:

Here's one way, but it puts the two countries side by side rather than
stacked (I'm not a big fan of stacked bar charts except in certain
contexts). The first version uses the original data, but one can see
immediately that there is no distinction between NA and 0:

ggplot(g, aes(x = Date, y = value, fill = var2)) +
geom_bar(position = 'dodge', stat = 'identity') +
facet_wrap(~ variable, nrow = 1) +
scale_fill_manual('Country', breaks = levels(g$var2),
 values = c('red', 'blue')) +
opts(legend.position = c(0.87, 0.88),
 legend.background = theme_rect(fill = 'white'))

To compensate, I copied the data to a new object g2 and imputed a
small negative value to replace the zero:

g2 - g
g2$value[8] - -0.01
ggplot(g2, aes(x = Date, y = value, fill = var2)) +
geom_bar(position = 'dodge', stat = 'identity') +
facet_wrap(~ variable, nrow = 1) +
scale_fill_manual('Country', breaks = levels(g2$var2),
 values = c('red', 'blue')) +
opts(legend.position = c(0.87, 0.88),
 legend.background = theme_rect(fill = 'white'))

An additional improvement could be made by keeping the original data
and adding some text that indicates where the NAs reside; to do this,
we need to offset the date a bit to decently locate the text:

ggplot(g, aes(x = Date, y = value, fill = var2)) +
geom_bar(position = 'dodge', stat = 'identity') +
facet_wrap(~ variable, nrow = 1) +
scale_fill_manual('Country', breaks = levels(g$var2),
 values = c('red', 'blue')) +
geom_text(aes(x = as.Date('2001-3-31'), y = 1, label = 'NA'),
   size = 6) +
opts(legend.position = c(0.87, 0.88),
 legend.background = theme_rect(fill = 'white'))

HTH,
Dennis



On Wed, Nov 16, 2011 at 1:55 AM, Aidan Corcoran
aidan.corcora...@gmail.com wrote:
 Dear all,

 I was hoping someone could help with a ggplot question. I would like
 to generate a faceted bar chart, but missing data are causing
 problems.

 g-structure(list(Date = structure(c(11322, 11687, 12052, 11322,
 11687, 12052, 11322, 11687, 12052, 11322, 11687, 12052), class = Date),
    variable = c(Govt Revenues to GDP, Govt Revenues to GDP,
    Govt Revenues to GDP, Govt Revenues to GDP, Govt Revenues to GDP,
    Govt Revenues to GDP, Structural Budget Position, Structural
 Budget Position,
    Structural Budget Position, Structural Budget Position,
    Structural Budget Position, Structural Budget Position
    ), var2 = c(United States, United States, United States,
    Japan, Japan, Japan, United States, United States,
    United States, Japan, Japan, Japan), value = c(NA,
    34.288, 31.831, 29.636, 30.539, 29.093, NA, 0, -2.7, -7.4,
    -5.7, -7)), .Names = c(Date, variable, var2, value
 ), row.names = c(21L, 22L, 23L, 169L, 170L, 171L, 206L, 207L,
 208L, 354L, 355L, 356L), class = data.frame)

 gp - ggplot(g, aes(Date, value))
 gp- gp + geom_line()
 gp -gp + facet_grid(var2 ~ variable)
 gp

 this works, but trying to get a bar chart version

 gp - ggplot(g, aes(Date, value))
 gp- gp + geom_bar(stat=identity)
 gp -gp + facet_grid(var2 ~ variable)
 gp

 gives the error
 Error in if (!is.null(data$ymin)  !all(data$ymin == 0))
 warning(Stacking not well defined when ymin != 0,  :
  missing value where TRUE/FALSE needed

 Is there something I can do to have a gap for missing data, as happens
 with the line version?

 More generally, I may also have missing data between present data: e.g.

 is.na(g[5,4])-TRUE

 and I would like if possible to simply see gaps at these points.

 Thanks for any help.

 Aidan

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] create list of names where two df contain == values

2011-11-16 Thread Dennis Murphy
Hi:

I think you're overthinking this problem. As is usually the case in R,
a vectorized solution is clearer and provides more easily understood
code.

It's not obvious to me exactly what you want, so we'll try a couple of
variations on the same idea. Equality of floating point numbers is a
difficult computational problem (see R FAQ 7.31), but if it makes
sense to define a threshold difference between floating numbers that
practically equates to zero, then you're in business. In your example,
the difference in numb1 for letter h in the two data frames is far
from zero, so define 'equal' to be a difference  10 ^{-6}. Then:

# Return the entire matching data frame
df.1[abs(df.1$numb1 - df.2$numb1)  0.01, ]
   Letters numb1 extra.colid
1a 0.3735462 1 CG234
2b 1.1836433 2 CG232
3c 0.1643714 3 CG441
4d 2.5952808 4 CG128
5e 1.3295078 5 CG125
6f 0.1795316 6 CG182
7g 1.4874291 7 CG982
9i 1.5757814 9 CG282
10   j 0.694611610 CG154

# Return the matching letters only as a vector:
df.1[abs(df.1$numb1 - df.2$numb1)  0.01, 'Letters' ]

If you want the latter object to remain a data frame, use drop = FALSE
as an extra argument after 'Letters'. If you want to create a list
object such that each letter comprises a different list component,
then the following will do - the as.character() part coerces the
factor Letters into a character object:

as.list(as.character(df.1[abs(df.1$numb1 - df.2$numb1)  0.01,
 'Letters' ]))

HTH,
Dennis


On Wed, Nov 16, 2011 at 5:03 AM, Rob Griffin robgriffin...@hotmail.com wrote:
 Hello again... sorry to be posting yet again, but I hadn't anticipated this
 problem.

 I am trying to now put the names found in one column in data frame 1 (lets
 call it df.1[,1]) in to a list from the rows where the values in df.1[,2]
 match values in a column of another dataframe (df.2[3])
 I tried to write this function so that it put the list of names (called
 Iffy) where the 2 criteria (df.1[141] and df.2[21]) matched but I think its
 too complex for a beginner R-enthusiast

 ify-function(x,y,a,b,c) if(x[[,a]]==y[[,b]]) {list(x[[,c]])} else {NULL}
 Iffy-apply(  df.1,  1,  FUN=ify,  x=df.1,  y=df.2,  a=2,  b=3,  c=1  )

 But this didn't work... Error in FUN(newX[, i], ...) : unused argument(s)
 (newX[, i])


 Here is a dataset that replicates the problem, you'll notice the h
 criteria values are different between the two dataframes and therefore it
 would produce a list  of the 9 letters where the two criteria columns
 matched (a,b,c,d,e,f,g,i,j):



 df.1-data.frame(rep(letters[1:10]))
 colnames(df.1)[1]-(Letters)
 set.seed(1)
 df.1$numb1-rnorm(10,1,1)
 df.1$extra.col-c(1,2,3,4,5,6,7,8,9,10)
 df.1$id-c(CG234,CG232,CG441,CG128,CG125,CG182,CG982,CG541,CG282,CG154)
 df.1

 df.2-data.frame(rep(letters[1:10]))
 colnames(df.2)[1]-(Letters)
 set.seed(1)
 df.2$extra.col-c(1,2,3,4,5,6,7,8,9,10)
 df.2$numb1-rnorm(10,1,1)
 df.2$id-c(CG234,CG232,CG441,CG128,CG125,CG182,CG982,CG541,CG282,CG154)
 df.2[8,3]-12

 df.1
 df.2




 Your patience is much appreciated,
 Rob

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] access sublists by a variable

2011-11-16 Thread Dennis Murphy
Hi:

Try dict[[a]] rather than dict$a, and read the section on lists in the
Introduction to R manual to learn how to properly index them.

HTH,
Dennis

On Wed, Nov 16, 2011 at 7:16 AM, Antje Gruner n...@allesjetzt.net wrote:
 Dear list,

 I'd like to access the elements of a list within another list with the
 help of a variable.

 dict - list(  24 = c(1,2,3,6,12,24,48,72,96,120,144,168,720),
          168 = c(1,12,24,48,72,96,120,144,168,336,504,672),
         8760 = c(1,24,168,730,4380,8760)
        )

 dict$24 works fine, but

 a - 24
 dict$a

 returns  NULL

 Unfortunately dict[a] does not work for me, because I see no possibility
 to access the inner elements of list 24 in that case
 i.e. dict[a][1] returns the whole list and not the first element.

 What is the correct syntax to access those elements with the help of a
 variable?
 Thanks in advance

 Antje

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] equal spacing of the polygons in levelplot key (lattice)

2011-11-16 Thread Dennis Murphy
OK, how about this instead?

# library('lattice')
levs - as.vector(quantile(volcano,c(0,0.1,0.5,0.9,0.99,1)))
levq - seq(min(levs), max(levs), length = 6)
levelplot(volcano, at = levs,
   colorkey = list(at = levq,
 labels = list(at = levq,
   labels = levs) ))

Dennis

On Wed, Nov 16, 2011 at 10:27 AM, Andy Bunn andy.b...@wwu.edu wrote:


 -Original Message-
 From: Dennis Murphy [mailto:djmu...@gmail.com]
 Sent: Tuesday, November 15, 2011 8:54 PM
 To: Andy Bunn
 Cc: r-help@r-project.org
 Subject: Re: [R] equal spacing of the polygons in levelplot key
 (lattice)

 Hi:

 Does this work?

 Thanks Dennis.

 This almost works. Is there a way to make the rectangles in the key the same 
 size? In this example five rectangles of the same area evenly arrayed? Can 
 the key be coerced into being categorical?

 The data I want to work with are not spatial but it occurs to me that this is 
 a common mapping task (e.g., in this example you might want to label these 
 colors 'low', 'kind of low', 'medium low', etc. or map land covers or such.) 
 I'll look at the sp or raster plotting equivalent.







 # library('lattice')
 levs - as.vector(quantile(volcano,c(0,0.1,0.5,0.9,0.99,1)))
 levelplot(volcano, at = levs,
             colorkey = list(labels = list(at = levs,
                                                    labels = levs) ))

 HTH,
 Dennis

 On Tue, Nov 15, 2011 at 1:12 PM, Andy Bunn andy.b...@wwu.edu wrote:
  Given the example:
  R (levs - quantile(volcano,c(0,0.1,0.5,0.9,0.99,1)))
    0%  10%  50%  90%  99% 100%
    94  100  124  170  189  195
  R levelplot(volcano,at=levs)
 
  How can I make the key categorical with the size of the divisions
 equally spaced in the key? E.g., five equal size rectangles with labels
 at levs c(100,124,170,189,195)?
 
  Apologies if this is obvious.
 
  -A
 
  R version
                 _
   platform       i386-pc-mingw32
   arch           i386
   os             mingw32
   system         i386, mingw32
   status
   major          2
   minor          14.0
   year           2011
   month          10
   day            31
   svn rev        57496
   language       R
   version.string R version 2.14.0 (2011-10-31)
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
  and provide commented, minimal, self-contained, reproducible code.
 



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] boxplot strange behavior

2011-11-16 Thread Dennis Murphy
And you got a reply on the ggplot2 list, which is why you're asked not
to cross-post.

For those who are wondering, geom_boxplot() in the ggplot2 package
will by default plot outside points along the same line as the boxplot
whiskers at their actual values. The gentleman jittered the original
points in a separate call to geom_point(), so the result is that the
outside points are plotted twice - once along the whisker lines (in
black) from geom_boxplot() and once as jittered points with alpha
transparency from geom_point(). Since jitter can be positive or
negative in either the horizontal direction, confusion ensues...

Dennis

On Wed, Nov 16, 2011 at 11:16 AM, Giovanni Azua
azuag...@student.ethz.ch wrote:
 Hello,

 I generate box plots from my data like this:

 qplot(x=xxx,y=column,data=data,geom=boxplot) + xlab(xxx) + ylab(ylabel) + 
 theme_bw() + scale_y_log10() + geom_jitter(alpha=I(1/10))

 The problem is that I see lot of points above the maximum at the same level 
 as some outliers. It looks very weird as I expected the outliers to be few 
 and specially not see any points other than the outliers below the minimum or 
 above the maximum indicator. Can anyone explain what's going on?

 Attached I send a snapshot, see the second level having some outliers, 
 separate from the outliers there are also some points which seem not to be 
 outliers and that are above the maximum indicator? is this a bug or am I 
 missing anything?

 TIA,
 Best regards,
 Giovanni

 PS: I sent this to the ggplot2 list too (sorry for the double post but I am 
 kind of under pressure with this)

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] equal spacing of the polygons in levelplot key (lattice)

2011-11-15 Thread Dennis Murphy
Hi:

Does this work?

# library('lattice')
levs - as.vector(quantile(volcano,c(0,0.1,0.5,0.9,0.99,1)))
levelplot(volcano, at = levs,
colorkey = list(labels = list(at = levs,
   labels = levs) ))

HTH,
Dennis

On Tue, Nov 15, 2011 at 1:12 PM, Andy Bunn andy.b...@wwu.edu wrote:
 Given the example:
 R (levs - quantile(volcano,c(0,0.1,0.5,0.9,0.99,1)))
   0%  10%  50%  90%  99% 100%
   94  100  124  170  189  195
 R levelplot(volcano,at=levs)

 How can I make the key categorical with the size of the divisions equally 
 spaced in the key? E.g., five equal size rectangles with labels at levs 
 c(100,124,170,189,195)?

 Apologies if this is obvious.

 -A

 R version
                _
  platform       i386-pc-mingw32
  arch           i386
  os             mingw32
  system         i386, mingw32
  status
  major          2
  minor          14.0
  year           2011
  month          10
  day            31
  svn rev        57496
  language       R
  version.string R version 2.14.0 (2011-10-31)

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Convert full matrix back to lower triangular matrix

2011-11-15 Thread Dennis Murphy
Hi:

(1) Here is why your e-mails look mangled on this list:
  [[alternative HTML version deleted]]
 R-help is a text-only list, so please change your mailer's
settings to send ASCII text rather than HTML.

(2) The print method you see displayed in dd1 is equivalent to the following:

ddm * lower.tri(ddm)

Note that lower.tri(ddm) produces a logical matrix; in R, when you
multiply a numeric object by a logical, the latter is converted to 0's
and 1's. In this case, the lower triangular elements are retained
while the other elements are coerced to 0.

HTH,
Dennis


On Tue, Nov 15, 2011 at 2:55 PM, Juliet Ndukum jpnts...@yahoo.com wrote:
 Given a vector; ab = seq(0.5,1, by=0.1) ab[1] 0.5 0.6 0.7 0.8 0.9 1.0
 The euclidean distance between the vector elements is given by the lower 
 triangular matrix  dd1 =
  dist(ab,euclidean) dd1    1   2   3   4   52 0.1                3 0.2 0.1 
            4 0.3 0.2 0.1        5 0.4 0.3 0.2 0.1    6 0.5 0.4 0.3 0.2 0.1
 Convert the lower triangular matrix to a full matrix ddm = as.matrix(dd1) 
 ddm    1   2   3   4   5   61 0.0 0.1 0.2 0.3 0.4 0.52 0.1 0.0 0.1 0.2 0.3 
 0.43 0.2 0.1 0.0 0.1 0.2 0.34 0.3 0.2 0.1
  0.0 0.1 0.25 0.4 0.3 0.2 0.1 0.0 0.16 0.5 0.4 0.3 0.2 0.1 0.0
 I would be grateful if someone could provide me with a code to convert ddm to 
 the lower triangular matrix as before i.e. dd1
 above.
 Your help will be greatly appreciated.JN
 NB: I apologize for the poor output of the previous email. Thanks for  your 
 understanding.
        [[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] global optimisation with inequality constraints

2011-11-15 Thread Dennis Murphy
(1) Please do not hijack another thread to ask an off-topic question -
start a new one instead.
(2) Your question refers to a topic called 'restricted least squares'.
Search the R-help archives on that subject and you should get a number
of answers. Someone answered a similar question here within the past
two weeks.
(3) Weighted least squares is not restricted least squares. They have
different purposes.

Good luck.
Dennis

On Tue, Nov 15, 2011 at 10:24 PM, geeknick geekn...@rocketmail.com wrote:
 Please advice on the package I should use to run a linear regression model
 (weighted least squared) with linear equality constraint. I initially tried
 constrOptim but it turned out that it only supported
 http://solve-graph-linear-inequalities.blogspot.com/2011/11/blog-post.html
 linear inequality  constraint. Thank you very much in advance.

 Cheers,

 NICK

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/global-optimisation-with-inequality-constraints-tp3799258p4075455.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] max min values within dataframe

2011-11-14 Thread Dennis Murphy
Groupwise data summarization is a very common task, and it is worth
learning the various ways to do it in R. Josh showed you one way to
use aggregate() from the base package and Michael showed you one way
of using the plyr package to do the same; another way would be

ddply(df, .(Patient, Region), summarise, max = max(Score), min = min(Score))

to save on writing an explicit function. Similarly, if you have a
version of R = 2.11.0, the aggregate() function now has a nice
formula interface, so Josh's code could also be written as

aggregate(Score ~ Patient + Region, data = df, FUN = range)

with a subsequent renaming of the variables as shown.

Other packages that could perform this task with ease include the doBy
package, the data.table package, the remix package, the Hmisc package
and, if you are comfortable with SQL, the sqldf package. For relative
novices, the doBy package is a very nice place to start because it
comes with a well written vignette and the function names correspond
well with the tasks they perform (e.g., summaryBy(), transformBy()).
The plyr and data.table packages are more general and more powerful in
terms of the types of tasks to which each is suited. Unlike
aggregate() and doBy:::summaryBy(), these packages can process
multivariable functions. As noted above, if you have an SQL
background, sqldf operates on R data objects as though they were SQL
tables, which is advantageous in complex data extraction tasks.
Package remix is useful if you want to organize results into a tabular
form that is reminiscent of SAS.

HTH,
Dennis

On Mon, Nov 14, 2011 at 8:10 AM, B Laura gm.spam2...@gmail.com wrote:
 dear R-team

 I need to find the min, max values for each patient from dataset and keep
 the output of it as a dataframe with the following columns
  - Patient nr
  - Region (remains same per patient)
  - Min score
  - Max score


    Patient Region Score Time
 1        1      X    19   28
 2        1      X    20  126
 3        1      X    22  100
 4        1      X    25  191
 5        2      Y    12    1
 6        2      Y    12    2
 7        2      Y    25    4
 8        2      Y    26    7
 9        3      X     6    1
 10       3      X     6    4
 11       3      X    21   31
 12       3      X    22   68
 13       3      X    23   31
 14       3      X    24   38
 15       3      X    21   15
 16       3      X    22   24
 17       3      X    23   15
 18       3      X    24  243
 19       3      X    25   77
 20       4      Y     6    5
 21       4      Y    22   28
 22       4      Y    23   75
 23       4      Y    24   19
 24       5      Y    23    3
 25       5      Y    24    1
 26       5      Y    23   33
 27       5      Y    24   13
 28       5      Y    25   42
 29       5      Y    26   21
 30       5      Y    27    4
 31       6      Y    24    4
 32       6      Y    32    8

 So far I could find the min and max values for each patient, but the output
 of it is not (yet) what I need.

 Patient.nr = unique(Patient)
 aggregate(Score, list(Patient), max)
  Group.1  x
 1       1 25
 2       2 26
 3       3 25
 4       4 24
 5       5 27
 6       6 32

 aggregate(Score, list(Patient), min)
  Group.1  x
 1       1 19
 2       2 12
 3       3  6
 4       4  6
 5       5 23
 6       6 24
 I would like to do same but writing this new information (min, max values)
 in a dataframe with following columns
  - Patient nr
 - Region (remains same per patient)
 - Min score
 - Max score

 Can anybody help me with this?

 Thanks
 Laura

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Adding units to levelplot's colorkey

2011-11-14 Thread Dennis Murphy
You don't show code or a reproducible example, so I guess you want a
general answer. Use the draw.colorkey() function inside the
levelplot() call. It takes an argument key =, which accepts a list of
arguments, including space, col, at, labels, tick.number, width and
height (see p. 155 of the Lattice book). More specifically, the labels
argument also accepts a list of values, with components at, labels,
cex, col, font, fontface and fontfamily. You could attach the units as
labels with a combination of at = and labels = inside the (outer)
labels argument, something like

myAt - c(1, 3, 5, 7, 9)
myLab - paste(myAt, 'cm')

levelplot( ...
   draw.colorkey(key = list(labels = list(at = myAt, labels = myLab)),
...)

If that doesn't work, then please provide a reproducible example.

HTH,
Dennis

On Mon, Nov 14, 2011 at 12:34 PM, Carlisle Thacker
carlisle.thac...@noaa.gov wrote:
 How to add units (e.g. cm) to the color key of a lattice levelplot?

 The plots looks fantastic, but it would be nice to indicate somewhere near
 the end of the color key that the values associated with its colors are in
 centimeters or some other physical units.

 The only thing I find is the possibility to specify the labels so that one
 explicitly includes the units.  That leaves little flexibility for
 positioning where this information appears.  Is there a better way?

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Generate the distribution

2011-11-13 Thread Dennis Murphy
Google is an amazing resource for getting information. Try Googling
'simulation in R' - I got several useful hits on the first page.

HTH,
Dennis

On Sun, Nov 13, 2011 at 7:41 AM, Anban nino.z...@gmail.com wrote:
 Hi everyone,

 i really need some help with one task. I simply cant understand what i
 really have to do.

 The task is:

 Generate the distribution of maximum on samples of size 200 from beta with
 shape parameters 5 and 5 distribution. Plot a histogram of simulated values
 and overlay at least one distribution curve that you think might be
 suitable.

 Im rookie with simulations, so i need yours help.

 Tnx

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Generate-the-distribution-tp4036755p4036755.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] 2^k*r (with replications) experimental design question

2011-11-13 Thread Dennis Murphy
I'm guessing you have nine replicates of a 2^5 factorial design with a
couple of missing values. If so, define a variable to designate the
replicates and use it as a blocking factor in the ANOVA. If you want
to treat the replicates as a random rather than a fixed factor, then
look into the nlme or lme4 packages.

HTH,
Dennis

On Sun, Nov 13, 2011 at 4:33 PM, Giovanni Azua brave...@gmail.com wrote:
 Hello,

 I have one replication (r=1 of the 2^k*r) of a 2^k experimental design in the 
 context of performance analysis i.e. my response variables are Throughput and 
 Response Time. I use the aov function and the results look ok:

 str(throughput)
 'data.frame':   286 obs. of  7 variables:
  $ Time          : int  6 7 8 9 10 11 12 13 14 15 ...
  $ Throughput    : int  42 44 33 41 43 40 37 40 42 37 ...
  $ No_databases  : Factor w/ 2 levels 1,4: 1 1 1 1 1 1 1 1 1 1 ...
  $ Partitioning  : Factor w/ 2 levels sharding,replication: 1 1 1 1 1 1 1 
 1 1 1 ...
  $ No_middlewares: Factor w/ 2 levels 2,4: 1 1 1 1 1 1 1 1 1 1 ...
  $ Queue_size    : Factor w/ 2 levels 40,100: 1 1 1 1 1 1 1 1 1 1 ...
  $ No_clients    : Factor w/ 1 level 128: 1 1 1 1 1 1 1 1 1 1 ...
 head(throughput)
  Time Throughput No_databases Partitioning No_middlewares Queue_size
 1    6         42            1     sharding              2         40
 2    7         44            1     sharding              2         40
 3    8         33            1     sharding              2         40
 4    9         41            1     sharding              2         40
 5   10         43            1     sharding              2         40
 6   11         40            1     sharding              2         40

 throughput.aov - 
 aov(Throughput~No_databases+Partitioning+No_middlewares+Queue_size,data=throughput)
 summary(throughput.aov)
                              Df    Sum Sq  Mean Sq F value    Pr(F)
 No_databases       1    28488651 28488651 53.4981 2.713e-12 ***
 Partitioning            1    71687    71687  0.1346  0.713966
 No_middlewares   1     5624454  5624454 10.5620  0.001295 **
 Queue_size          1     50892    50892  0.0956  0.757443
 Residuals             281 149637226   532517
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


 This is somehow what I expected and I am happy, it is saying that the 
 Throughput is significatively affected firstly by the number of database 
 instances and secondly by the number of middleware instances.

 The problem is that I need to integrate multiple replications of this same 
 2^k so I can also account for experimental error i.e. the _r_ of 2^k*r but I 
 can't see how to integrate the _r_ term into the data and into the aov 
 function parameters. Can anyone advice?

 TIA,
 Best regards,
 Giovanni
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] beanplot without log scale?

2011-11-11 Thread Dennis Murphy
Hi:

Try the log =  argument to beanplot(). Here's an example:

library('beanplot')
v - exp(runif(1000, 0, 10))
beanplot(v)
beanplot(v, log = ')

HTH,
Dennis

On Thu, Nov 10, 2011 at 1:27 PM, Twila Moon twi...@uw.edu wrote:
 Is it possible to force beanplot not to use a log scale? I want to be able to 
 create multiple beanplots of different data on the same specified y-axis.

 When I try to force a ylim, I get the following...
 beanplot(reg5vel,ylim=(c(0,12000)))
 log=y selected
 Warning message:
 In plot.window(xlim = c(0.5, 7.5), ylim = c(0, 12000), log = y) :
  nonfinite axis limits [GScale(-inf,4.07918,2, .); log=1]

 Thanks,
 Twila

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] multivariate modeling codes

2011-11-11 Thread Dennis Murphy
Hi:

R-Bloggers picked up on this web site that contains several
interesting posts re usage of R, including this one on using a
Bayesian approach to multivariate mixed effects models using the
MCMCglmm package:
http://www.quantumforest.com/2011/11/coming-out-of-the-bayesian-closet-multivariate-version/

With multiple responses in a longitudinal study, this might be of
interest to you. A similar question re multivariate response was asked
a few days ago on R-sig-mixed-models, where Kevin Wright posted the
above link. I'd suggest that future questions on this subject be moved
to that group.

HTH,
Dennis

On Fri, Nov 11, 2011 at 12:25 PM, yurirouge lilysh...@msn.com wrote:
 HI,

 I am relatively new to R and would appreciate some help or directions for
 this.
 I am trying to model 3 longitudinal outcomes jointly and to identify some
 predictors for these 3 joint outcomes (all continuous). I am trying to find
 some codes that I may modify to do this but cannot seem to find anything.

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/multivariate-modeling-codes-tp4032624p4032624.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Combining Overlapping Data

2011-11-11 Thread Dennis Murphy
Hi:

This doesn't sort the data by strain level, but I think it does what
you're after. It helps if strain is either a factor or character
vector in each data frame.

h - function(x, y) {
   tbx - table(x$strain)
   tby - table(y$strain)
  # Select the strains who have more than one member
  # in each data frame
   mgrps - intersect(names(tbx[tbx  0]),
  names(tby[tby  0]))
  # concatenate the data with common strains
   rbind(subset(x, gp %in% mgrps),
 subset(y, gp %in% mgrps))
   }

# Result:
dc - h(x, y)

HTH,
Dennis

On Fri, Nov 11, 2011 at 1:07 PM, kickout kyle.ko...@gmail.com wrote:
 I've scoured the archives but have found no concrete answer to my question.

 Problem: Two data sets

 1st data set(x) = 20,000 rows
 2nd data set(y) = 5,000 rows

 Both have the same column names, the column of interest to me is a variable
 called strain.

 For example, a strain named Chab1405 appears in x 150 times and in y 25
 times...
 strain Chab1999 only appears 200 times in x and none in y (so i dont want
 that retained).


 I want to create a new data frame that has all 175 measurements for
 Chab1405 and any other 'strain' that appears in both the two data sets..
 but not strains that appear in only one data set...So i want the
 intersection of two data sets (maybe?).

 I've tried x %in% y, but that only gives TRUE/FALSE


 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Combining-Overlapping-Data-tp4032719p4032719.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ggplot2 - regression statistics how to display on plot

2011-11-10 Thread Dennis Murphy
Hi:

Here's an example of how one might do this in a specific example using
geom_text().

# Some fake data:
df - data.frame(x = 1:10, y = 0.5 + (1:10) + rnorm(10))

# Fit a linear model to the data and save the model object:
mod - lm(y ~ x, data = df)

# Create a list of character strings - the first component
# produces the fitted model, the second produces a
# string to compute R^2, but in plotmath syntax.

rout - list(paste('Fitted model: ', round(coef(mod)[1], 3), ' + ',
   round(coef(mod)[2], 3), ' x', sep = ''),
  paste('R^2 == ', round(summary(mod)[['r.squared']], 3),
sep = '')  )

# This looks ugly, but I'm using the round() function to make the
# equations look more sane. coef(mod) extracts the model
# coefficients (intercept, then slope), and
# summary(mod)[['r.squared']] extracts R^2.

# See what they look like:
rout

# Notice that the first component of rout is simply a text string
# that can be passed as is, but the second string needs to be
# wrapped inside an expression, which is what parse = TRUE
# in geom_text() does.

# Now construct the plot; given the (x, y) extent of the data, the
# coordinates make sense, but you have to adapt them to
# your data.

ggplot(df, aes(x, y)) +
   geom_smooth(method = 'lm') +
   geom_text(aes(x = 2, y = 10, label = rout[[1]]), hjust = 0) +
   geom_text(aes(x = 2, y = 9.5, label = rout[[2]]), hjust = 0, parse = TRUE )

hjust = 0 makes the text strings flush left, parse = TRUE in the
second call converts the second string in rout into an expression.

This is fine if your plot is a one-off or two-off deal; the code above
gives you a sense of the programming required. OTOH, if you need to do
this a lot, then you're better off with a function that can automate
the process, which Bryan was kind enough to provide.

I might add that there is a dedicated ggplot2 listserv on google:
ggpl...@googlegroups.com, for which questions like these are well
suited.

HTH,
Dennis

On Thu, Nov 10, 2011 at 5:45 AM, Durant, James T. (ATSDR/DTEM/PRMSB)
h...@cdc.gov wrote:
 Hello -

 So I am trying to use ggplot2 to show a linear regression between two 
 variables, but I want to also show the fit of the line on the graph as well.

  I am using ggplot2 for other graphics in what I am working on, so even 
 though this would be a fairly easy thing to do in Excel, I would prefer to do 
 it in R to keep my look and feel, and I think ggplot2 is just cooler.

 Here is a sample script of what I am trying to accomplish:

 df-NULL
 df$x-rnorm(100)
 df$y-rnorm(100)
 df-data.frame(df)

 ggplot(df, aes(x=x,y=y))+geom_point()+geom_smooth(method=lm)


 # would like to be able to showr squared and slope/intercept of lm

 VR

 Jim


 James T. Durant, MSPH CIH
 Emergency Response Coordinator
 US Agency for Toxic Substances and Disease Registry
 Atlanta, GA 30341
 770-378-1695





        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] newbie's question : xyplot legend with a white background

2011-11-10 Thread Dennis Murphy
Hi:

Add a background = 'color' argument to key():

library('lattice')
xyplot(1~1,
  panel = function(x,y, ...) {
  panel.xyplot(x,y)
  panel.abline(v=seq(0,1.4,by=0.1))
  panel.abline(h=seq(0,1.4,by=0.1))
}
  ,key = list (x=0.2, y=0.88,
   points = list(col=c(red,green),pch=20, cex=2),
   text=list(c(a,z)),
   background = 'white'))

Dennis

On Thu, Nov 10, 2011 at 8:09 AM, PtitBleu ptit_b...@yahoo.fr wrote:
 Hello,

 Sorry in advance for adding a silly question on this forum but I haven't
 found the right keywords to find a solution to this basic problem.
 I'm just looking a way to have a white background behind the legend to hide
 the grid.

 Thanks in advance.


 The silly example for my silly question:

 xyplot(1~1,
       panel = function(x,y, ...) {
           panel.xyplot(x,y)
                   panel.abline(v=seq(0,1.4,by=0.1))
                   panel.abline(h=seq(0,1.4,by=0.1))
                 }
 ,key=list(x=0.2, y=0.88, points=list(col=c(red,green),pch=20, cex=2),
 text=list(c(a,z)))
 )


 --
 View this message in context: 
 http://r.789695.n4.nabble.com/newbie-s-question-xyplot-legend-with-a-white-background-tp4024152p4024152.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] counting columns that match criteria

2011-11-10 Thread Dennis Murphy
Hi:

Here's a toy example:

# Default var names are V1-V20:
u - as.data.frame(matrix(rpois(100, 3), ncol = 20))
u - transform(u,
ngt1 = apply(u[, c('V1', 'V4', 'V9', 'V15')], 1, function(x) sum(x  1)) )
u

HTH,
Dennis

On Thu, Nov 10, 2011 at 7:24 AM, JL Villanueva jlpost...@gmail.com wrote:
 Hi,

 I am a little new in R but I'm finding it extremely useful :)

 Here's my tiny question:

 I've got a table with a lot of columns. What I am interested now is to
 evaluate how many of 4 columns have a value greater than 1.
 I think it can be done with subset() but it will take a very long condition
 and become unfeasible if I want to compare more than 4 columns.

 I put here a small example

 Col1  Col2  Col3  Col 4
 1        1       1       1             -0 columns greater than 1
 2        1       1       1             -1 column greater than 1
 4        1       4      1              -2 columns greater than 1
 3        3       3       3             -3 columns greater than 1

 Then I want to filter by that number, my idea is to create a new column
 storing the number calculated and subset() by it.

 Any hints?
 Thanks in advance

 JL

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] plotting a function with given formula in ggplot2

2011-11-10 Thread Dennis Murphy
Hi:

Borrowing from this thread, courtesy of Brian Diggs:
http://groups.google.com/group/ggplot2/browse_thread/thread/478f9e61d41b4678/ed323c497db61156?lnk=gstq=stat_function#ed323c497db61156

...here's a small reproducible example:

ddf - data.frame(x = 1:10, y = 0.4 + 0.6 * (1:10) + rnorm(10))
# Find the linear model coefficients
lmc - coef(lm(y ~ x, data = ddf))
# Create a function to produce the fitted line
lmeq - function(x) lmc[1] + lmc[2] * x

# Construct the ggplot() and use stat_function():
ggplot(ddf, aes(x = x, y = y)) +
geom_point() +
stat_function(fun = lmeq, colour = 'red', size = 1)

HTH,
Dennis

On Thu, Nov 10, 2011 at 10:47 AM, Curiouslearn curiousle...@gmail.com wrote:
 Hi All,

 I have a scatter plot produced using ggplot2 and I want to add the
 regression line to this scatter plot. I suppose I can use
 geom_smooth() to do this, but for the sake of learning ( I am new both
 to R and ggplot2), I want to try and add it as a function (something
 that curve() does in the standard R plotting). I did some search and
 found that stat_function() can be used for this. But somehow it is not
 working. The following is my code. Can you please tell me where I am
 going wrong and what the correct code would be:

 reg1 - lm(kid_score ~ mom_hs, data=iqdata)

 scatter  -  ggplot() + geom_point(data=iqdata,

 aes(x=as.factor(mom_hs), y=kid_score) )
                               + xlab(Mother's High School Status)
                               + ylab(Children's Test Score)
                               + stat_function(fun=function(x)
 coef(reg1)[1] + coef(reg1)[2] * x)

 I understand that you probably do not have the data I am using (I am
 trying this out from Gelman Hill's book on Multilevel models). But I
 was hoping you may be able to point out the error without the data.

 Thanks for your help.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Listing tables together from random samples from a generated population?

2011-11-10 Thread Dennis Murphy
Hi:

There are two ways you could go about this: lists or arrays. It's
pretty easy to generate an array, a little more work to get the list.
I'm assuming the objective is to extract a chi-square statistic from
each table, so I'll show a couple of ways to do that, too.

library('plyr')

## Start with the data:
y-data.frame(gender=sample(c('Male', 'Female'), size=10,
replace=TRUE, c(0.5, 0.5)),
 tea=sample(c('Yes', 'No'),  size=10,
replace=TRUE, c(0.5, 0.5)))
## Function to produce a table:
tabfun - function(d) table(d[sample(seq_len(nrow(d)), 100), ])
x2stat - function(m) chisq.test(m)$statistic

## Array version:

tbarr - replicate(100, tabfun(y))
# X^2 statistics using apply() from base R and
# aaply() from plyr:
u1 - apply(tablist, 3, x2stat)
u2 - aaply(tablist, 3, x2stat)

## List version:

tblst - vector('list', 100)
for(i in seq_along(tblst)) tblst[[i]] - tabfun(y)

v1 - unname(do.call(c, lapply(tblst, x2stat)))
v2 - laply(tblst, x2stat)

From here, it's easy to do the histogram :)

HTH,
Dennis


On Thu, Nov 10, 2011 at 12:48 PM, Simon Kiss sjk...@gmail.com wrote:
 .
 HI there,
 I'd like to show demonstrate how the chi-squared distribution works, so I've 
 come up with a sample data frame of two categorical variables
 y-data.frame(gender=sample(c('Male', 'Female'), size=10, replace=TRUE, 
 c(0.5, 0.5)), tea=sample(c('Yes', 'No'), size=10, replace=TRUE, c(0.5, 
 0.5)))

 And I'd like to create a list of 100 different samples of those two variables 
 and the resulting 2X2 contingency tables

 table(.y[sample(nrow(.y), 100), ])

 How would I combine these 100 tables into a list? I'd like to be able to go 
 in and find some of the extreme values to show how the sampling distribution 
 of the chi-square values.

 I can already get a histogram of 100 different chi-squared values that shows 
 the distribution nicely (see below), but I'd like to actually show the 
 underlying tables, for demonstration's sake.

  .z-vector()
 for (i in 1:100) {
 .z-c(.z, chisq.test(table(.y[sample(nrow(.y), 200), ]))$statistic)
 }
 hist(.z, xlab='Chi-Square Value', main=Chi-Squared Values From 100 different 
 samples asking\nabout gender and tea/coffee drinking)
 abline(v=3.84, lty=2)

 Thank you in advance,
 Simon Kiss

 *
 Simon J. Kiss, PhD
 Assistant Professor, Wilfrid Laurier University
 73 George Street
 Brantford, Ontario, Canada
 N3T 2C9
 Cell: +1 905 746 7606

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] why NA coefficients

2011-11-08 Thread Dennis Murphy
The cell mean mu_{12} is non-estimable because it has no data in the
cell. How can you estimate something that's not there (at least
without imputation :)? Every parametric function that involves mu_{12}
will also be non-estimable - in particular,  the interaction term and
the population marginal means . That's why you get the NA estimates
and the warning. All this follows from the linear model theory
described in, for example, Milliken and Johnson (1992), Analysis of
Messy Data, vol. 1, ch. 13.

Here's an example from Milliken and Johnson (1992) to illustrate:
  B1 B2   B3
T1  2, 6   8, 6
T23  14  12, 9
T36   9

Assume a cell means model E(Y_{ijk}) = \mu_{ij}, where the cell means
are estimated by the cell averages.

From M  J (p. 173, whence this example is taken):
Whenever treatment combinations are missing, certain
hypotheses cannot be tested without making some
additional assumptions about the parameters in the model.
Hypotheses involving parameters corresponding to the
missing cells generally cannot be tested. For example,
for the data [above] it is not possible to estimate any
linear combinations (or to test any hypotheses) that
involve parameters \mu_{12} and \mu_{33} unless one
is willing to make some assumptions about them.

They continue:
One common assumption is that there is no
interactions between the levels of T and the levels of B.
In our opinion, this assumption should not be made
without some supporting experimental evidence.

In other words, removing the interaction term makes the
non-estimability problem disappear, but it's a copout unless there is
some tangible scientific justification for an additive rather than an
interaction model.

For the above data, M  J note that it is not possible to estimate all
of the expected marginal means - in particular, one cannot estimate
the population marginal means $\bar{\mu}_{1.}$, $\bar{\mu}_{3.}$,
$\bar{\mu}_{.2}$ or $\bar{\mu}_{.3}$. OTOH, $\bar{\mu}_{2.}$ and
$\bar{\mu}_{.1}$ since these functions of the parameters involve terms
associated with the means of the missing cells. Moreover, any
hypotheses involving parametric functions that contain non-estimable
cell means are not testable. In this example, the test of equal row
population marginal means is not testable because $\bar{\mu}_{1.}$ and
$\bar{\mu}_{3.}$ are not estimable.

[Aside: if the term parametric function is not familiar, in this
context it refers to linear combinations of model parameters.  In the
M  J example, $\bar{\mu}_{1.} = \mu_{11} + \mu_{12} + \mu_{13}$ is a
parametric function.]

Hopefully this sheds some light on the situation.

Dennis

On Mon, Nov 7, 2011 at 10:17 PM, array chip arrayprof...@yahoo.com wrote:
 Hi Dennis,
 The cell mean mu_12 from the model involves the intercept and factor 2:
 Coefficients:
   (Intercept) factor(treat)2
 factor(treat)3
  0.429244
 0.499982   0.352971
    factor(treat)4 factor(treat)5
 factor(treat)6
     -0.204752
 0.142042   0.044155
    factor(treat)7 factor(group)2
 factor(treat)2:factor(group)2
     -0.007775
 -0.337907  -0.208734
 factor(treat)3:factor(group)2  factor(treat)4:factor(group)2
 factor(treat)5:factor(group)2
     -0.195138
 0.800029   0.227514
 factor(treat)6:factor(group)2  factor(treat)7:factor(group)2
  0.331548 NA
 So mu_12 = 0.429244-0.337907 = 0.091337. This can be verified by:
 predict(fit,data.frame(list(treat=1,group=2)))
  1
 0.09133691
 Warning message:
 In predict.lm(fit, data.frame(list(treat = 1, group = 2))) :
   prediction from a rank-deficient fit may be misleading

 But as you can see, it gave a warning about rank-deficient fit... why this
 is a rank-deficient fit?
 Because treat 1_group 2 has no cases, so why it is still estimable while
 on the contrary, treat 7_group 2 which has 2 cases is not?
 Thanks
 John




 
 From: Dennis Murphy djmu...@gmail.com
 To: array chip arrayprof...@yahoo.com
 Sent: Monday, November 7, 2011 9:29 PM
 Subject: Re: [R] why NA coefficients

 Hi John:

 What is the estimate of the cell mean \mu_{12}? Which model effects
 involve that cell mean? With this data arrangement, the expected
 population marginal means of treatment 1 and group 2 are not estimable
 either, unless you're willing to assume a no-interaction model.

 Chapters 13 and 14 of Milliken and Johnson's Analysis of Messy Data
 (vol. 1) cover this topic in some detail, but it assumes you're
 familiar with the matrix form of a linear statistical model. Both
 chapters cover the two-way model with interaction - Ch.13 from the
 cell means model approach and Ch. 14 from the model effects approach.
 Because this was written

Re: [R] ggplot2 reorder factors for faceting

2011-11-08 Thread Dennis Murphy
Hi:

(1) Here is one way to reorganize the levels of a factor:
plotData[['infection']] - factor(plotData[['infection']],
   levels = c('InfA', 'InfC', 'InfB', 'InfD'))

Do this ahead of the call to ggplot(), preferably after plotData is defined.

relevel() resets the baseline category of a factor, but here you want
to make multiple changes.

(2) You probably want a better title for the legend. Assuming you want
'Scale' as the title, you can add the following to labs:

labs(..., fill = 'Scale')

HTH,
Dennis


On Tue, Nov 8, 2011 at 3:51 AM, Iain Gallagher
iaingallag...@btopenworld.com wrote:


 Dear List

 I am trying to draw a heatmap using ggplot2. In this heatmap I have faceted 
 my data by 'infection' of which I have four. These four infections break down 
 into two types and I would like to reorder the 'infection' column of my data 
 to reflect this.

 Toy example below:

 library(ggplot2)

 # test data for ggplot reordering
 genes - (rep (c(rep('a',4), rep('b',4), rep('c',4), rep('d',4), rep('e',4), 
 rep('f',4)) ,4))
 fcData - rnorm(96)
 times - rep(rep(c(2,6,24,48),6),4)
 infection - c(rep('InfA', 24), rep('InfB', 24), rep('InfC', 24), rep('InfD', 
 24))
 infType - c(rep('M', 24), rep('D',24), rep('M', 24), rep('D', 24))

 # data is long format for ggplot2
 plotData - as.data.frame(cbind(genes, as.numeric(fcData), as.numeric(times), 
 infection, infType))

 hp2 - ggplot(plotData, aes(factor(times), genes)) + geom_tile(aes(fill = 
 scale(as.numeric(fcData + facet_wrap(~infection, ncol=4)

 # set scale
 hp2 - hp2 + scale_fill_gradient2(name=NULL, low=#0571B0, mid=#F7F7F7, 
 high=#CA0020, midpoint=0, breaks=NULL, labels=NULL, limits=NULL, 
 trans=identity)

 # set up text (size, colour etc etc)
 hp2 - hp2 + labs(x = Time, y = ) + scale_y_discrete(expand = c(0, 0)) + 
 opts(axis.ticks = theme_blank(), axis.text.x = theme_text(size = 10, angle = 
 360, hjust = 0, colour = grey25), axis.text.y = theme_text(size=10, colour 
 = 'gray25'))

 hp2 - hp2 + theme_bw()

 In the resulting plot I would like infections infA and infC plotted next to 
 each other and likewise for infB and infD. I have a column in the data - 
 infType - which I could use to reorder the infection column but so far I have 
 no luck getting this to work.

 Could someone give me a pointer to the best way to reorder the infection 
 factor and accompanying data into the order I would like?

 Best

 iain

 sessionInfo()
 R version 2.13.2 (2011-09-30)
 Platform: x86_64-pc-linux-gnu (64-bit)

 locale:
  [1] LC_CTYPE=en_GB.utf8   LC_NUMERIC=C
  [3] LC_TIME=en_GB.utf8    LC_COLLATE=en_GB.utf8
  [5] LC_MONETARY=C LC_MESSAGES=en_GB.utf8
  [7] LC_PAPER=en_GB.utf8   LC_NAME=C
  [9] LC_ADDRESS=C  LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C

 attached base packages:
 [1] grid  stats graphics  grDevices utils datasets  methods
 [8] base

 other attached packages:
 [1] ggplot2_0.8.9 proto_0.3-9.2 reshape_0.8.4 plyr_1.6

 loaded via a namespace (and not attached):
 [1] digest_0.5.0 tools_2.13.2


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lapply to list of variables

2011-11-08 Thread Dennis Murphy
Hi:

Here's another way of doing this on the simplified version of your example:

L - vector('list', 3)  # initialize a list of three components
## populate it
for(i in seq_along(L)) L[[i]] - rnorm(20, 10, 3)
## name the components
names(L) - c('Monday', 'Tuesday', 'Wednesday')
## replace values = 10 with NA
lapply(L, function(x) replace(x, x = 10, NA)

If you have a large number of atomic objects, you could create a
vector of the object names, make a list out of them (e.g., L -
list(objnames)) and then mimic the lapply() as above. replace() only
works on vectors, though - Uwe's solution is more general.

HTH,
Dennis

On Tue, Nov 8, 2011 at 8:59 AM, Ana rrast...@gmail.com wrote:
 Hi

 Can someone help me with this?

 How can I apply a function to a list of variables.

 something like this


 listvar=list(Monday,Tuesday,Wednesday)
 func=function(x){x[which(x=10)]=NA}

 lapply(listvar, func)

 were
 Monday=[213,56,345,33,34,678,444]
 Tuesday=[213,56,345,33,34,678,444]
 ...

 in my case I have a neverending list of vectors.

 Thanks!

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sampling with conditions

2011-11-08 Thread Dennis Murphy
In addition to Dan's quite valid concern,  the final sample is not
truly 'random' - the first k - 1 elements are randomly chosen, but the
last is determined so that the constraint is met.

Dennis

On Tue, Nov 8, 2011 at 9:59 AM, Nordlund, Dan (DSHS/RDA)
nord...@dshs.wa.gov wrote:
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of SarahJoyes
 Sent: Tuesday, November 08, 2011 5:57 AM
 To: r-help@r-project.org
 Subject: Re: [R] Sampling with conditions

 That is exactly what I want, and it's so simple!
 Thanks so much!


 Sarah,

 I want to point out that my post was qualified by something like.  I am not 
 sure it is exactly what you want.  Since you didn't quote my post, let me 
 show my suggestion and then express my concern.

 n - matrix(0,nrow=5, ncol=10)
 repeat{
  c1 - sample(0:10, 4, replace=TRUE)
  if(sum(c1) = 10) break
 }
 n[,1] - c(c1,10-sum(c1))
 n

 This nominally meets your criteria, but it will tend to result in larger 
 digits being under-represented.  For example, you unlikely to get a result 
 like c(0,8,0,0,2) or (9,0,0,1,0).

 That may be OK for your purposes, but I wanted to point it out.

 You could use something like

 n - matrix(0,nrow=5, ncol=10)
 c1 - rep(0,4)
 for(i in 1:4){
  upper - 10-sum(c1)
  c1[i] - sample(0:upper, 1, replace=TRUE)
  if(sum(c1) == 10) break
 }
 n[,1] - c(c1,10-sum(c1))
 n

 if that would suit your purposes better.


 Good luck,

 Dan

 Daniel J. Nordlund
 Washington State Department of Social and Health Services
 Planning, Performance, and Accountability
 Research and Data Analysis Division
 Olympia, WA 98504-5204


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] nesting scale_manual caracteristics in ggplot

2011-11-08 Thread Dennis Murphy
Hi:

The problem is that you use different sets of labels in each
scale_*_manual. To get all of the scales to merge into one, you need
to have the same title, same breaks and same labels for the legend.
This gets you a single legend:

ggplot(data = df, aes(x = year, y = total, colour = treatment,
  linetype=treatment)) +
geom_point(aes(shape = treatment)) +
facet_wrap(~country) +
scale_colour_manual(breaks=c('A','B','C','D','E','F','G'),
values=c('A'='black','B'='black', 'C'='grey',
  'D'='grey','E'='red','F'='grey', 'G'='red'),
labels=c('A: Line 1','B: Line 2','C: Line3','D: Line 4',
 'E: Line 5','F: Line 6','G: Line 7')) +
scale_linetype_manual(breaks=c('A','B','C','D','E','F','G'),
values=c('A'='solid','B'='dotted', 'C'='solid','D'='dashed',
 'E'='dashed','F'='dotted', 'G'='dotted'),
labels=c('A: Line 1','B: Line 2','C: Line3','D: Line 4',
 'E: Line 5','F: Line 6','G: Line 7')) +
scale_shape_manual(breaks=c('A','B','C','D','E','F','G'),
labels=c('A: Line 1','B: Line 2','C: Line3','D: Line 4',
 'E: Line 5','F: Line 6','G: Line 7'),
values = c(0, 1, 2, 3, 4, 5, 6)) +
theme_bw() +
geom_abline(aes(intercept = Intercept, slope = Slope,
colour = treatment, linetype=treatment),
data = lines)

In your original, you had different line numbers as labels in
scale_colour_manual() and scale_shape_manual(). If you want to
maintain that, then you'll have to have separate legends for color and
shape.

HTH,
Dennis

On Tue, Nov 8, 2011 at 11:22 AM, Sigrid s.stene...@gmail.com wrote:
 Hi there,
 I am having a little problem with combining three scale_manual commands in a
 facet plot.  I am not able to combine the three different characteristics,
 instead ending up with three different descriptions next to the graph for
 the same geom.  I would like to see two separate labels (not three); one
 describing lines 1-7 and the other 8-14. For each of the treatments (A-B) I
 want a combination of color, line type and symbol.  How do I do this?

 Here are my codes (Feel free to modify the example to make it easier to work
 with. I was not able to do this while keeping the problem I wanted help
 with)
 df -structure(list(year = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L,
 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), treatment =
 structure(c(1L,
 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L,
 6L, 7L, 7L, 7L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L,
 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 1L, 1L, 1L, 2L, 2L, 2L, 3L,
 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 1L, 1L,
 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L,
 7L, 7L, 7L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L,
 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L,
 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 1L, 1L, 2L,
 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L,
 7L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L,
 6L, 6L, 6L, 7L, 7L, 7L), .Label = c(A, B, C, D, E,
 F, G), class = factor), total = c(135L, 118L, 121L, 64L,
 53L, 49L, 178L, 123L, 128L, 127L, 62L, 129L, 126L, 99L, 183L,
 45L, 57L, 45L, 72L, 30L, 71L, 123L, 89L, 102L, 60L, 44L, 59L,
 124L, 145L, 126L, 103L, 67L, 97L, 66L, 76L, 108L, 36L, 48L, 41L,
 69L, 47L, 57L, 167L, 136L, 176L, 85L, 36L, 82L, 222L, 149L, 171L,
 145L, 122L, 192L, 136L, 164L, 154L, 46L, 57L, 57L, 70L, 55L,
 102L, 111L, 152L, 204L, 41L, 46L, 103L, 156L, 148L, 155L, 103L,
 124L, 176L, 111L, 142L, 187L, 43L, 52L, 75L, 64L, 91L, 78L, 196L,
 314L, 265L, 44L, 39L, 98L, 197L, 273L, 274L, 89L, 91L, 74L, 91L,
 112L, 98L, 140L, 90L, 121L, 120L, 161L, 83L, 230L, 266L, 282L,
 35L, 53L, 57L, 315L, 332L, 202L, 90L, 79L, 89L, 67L, 116L, 109L,
 44L, 68L, 75L, 29L, 52L, 52L, 253L, 203L, 87L, 105L, 234L, 152L,
 247L, 243L, 144L, 167L, 165L, 95L, 300L, 128L, 125L, 84L, 183L,
 88L, 153L, 185L, 175L, 226L, 216L, 118L, 118L, 94L, 224L, 259L,
 176L, 175L, 147L, 197L, 141L, 176L, 187L, 87L, 92L, 148L, 86L,
 139L, 122L), country = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L,
 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
 1L, 1L, 1L, 2L, 2L, 2L, 2L, 

Re: [R] window?

2011-11-08 Thread Dennis Murphy
The ets() function in the forecast package requires either a numeric
vector or a Time-Series object (produced from ts()). The frequency
argument in ts() refers to the time duration between observations;
e.g., frequency = 7 means that the data are weekly; frequency = 12
means that the data are monthly; frequency = 4 means that the data are
quarterly. You can see this from the examples on the help page of ts:
?ts at the R prompt.

The example associated with the forecast::ets() function uses the
USAccDeaths data:

data(USAccDeaths)
USAccDeaths   ## monthly data for six years
# Simulate the same structure with ts:
u - ts(rnorm(72), start = c(1973, 1), frequency = 12)
u

# Evidently you want to produce a multivariate series;
# here's one way with monthly frequency:
v - ts(matrix(rnorm(106), ncol = 2), start = c(2001, 1), frequency = 12)
v

Is that more or less what you were after?

Dennis

On Tue, Nov 8, 2011 at 2:04 PM, Kevin Burton rkevinbur...@charter.net wrote:
 I expect the frequency to be set to what I set it at and the window to
 return all of the data in the window from the original time series. The
 error is not because it is prime. I can generate a time series with just 52
 values (or 10) and it still occurs. I am building these objects for use with
 the 'forecast' packages and one of the methods 'ets' cannot handle a
 frequency above 24 so I set it (or try to) to 1. Will 'window' take z zoo or
 xts object? Can I convert from zoo or xts to ts?

 -Original Message-
 From: R. Michael Weylandt [mailto:michael.weyla...@gmail.com]
 Sent: Tuesday, November 08, 2011 2:28 PM
 To: Kevin Burton
 Cc: r-help@r-project.org
 Subject: Re: [R] window?

 I'm not entirely sure that your request makes sense: what do you expect the
 frequency to be? It makes sense to me as is...Might your troubles be because
 53 is prime?

 More generally, most people don't like working with the raw ts class and
 prefer the zoo or xts packages because they are much more pleasant for most
 time series work. You might want to take a look into those.

 Michael

 On Tue, Nov 8, 2011 at 3:18 PM, Kevin Burton rkevinbur...@charter.net
 wrote:
 This doesn't seem to work:



 d - rnorm(2*53)

 ds - ts(d, frequency=53, start=c(2000,10))

 dswin - window(ds, start=c(2001,1), end=c(2001,10), frequency=1)

 dswin

 Time Series:

 Start = 2001

 End = 2001

 Frequency = 1

 [1] 1.779409



 dswin - window(ds, start=c(2001,1), end=c(2001,10))

 dswin

 Time Series:

 Start = c(2001, 1)

 End = c(2001, 10)

 Frequency = 53

  [1]  1.7794090  0.6916779 -0.6641813 -0.7426889 -0.5584049 -0.2312959

 [7] -0.0183454 -1.0026301  0.4534920  0.6058198





 The problem is that when the frequency is specified only one value
 shows up in the window. When no frequency is specified I get all 10
 values but now the time series has a frequency that I don't want.



 Comments?



 Kevin




        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] nesting scale_manual caracteristics in ggplot

2011-11-08 Thread Dennis Murphy
As I mentioned earlier, if you use different labels/breaks/titles for
the legends, you'll get separate legends in a ggplot. I'm assuming you
want to keep your original labels in scale_shape_manual(); there's no
problem if you do that. The code I provided showed you how to align
all three legends; if you want one to be different, change its labels
and a second legend will appear. Couldn't be simpler ;)

Dennis

On Tue, Nov 8, 2011 at 1:20 PM, Sigrid s.stene...@gmail.com wrote:
 Hi Dennis,

 Thank you, it looks much better now the three characteristics are combined
 in the label. However, I would like to keep two different sets of labels on
 the side as I want to describe lines for each facet (high and low). Do you
 think this is possible, without breaking up the characteristics again?

 Cheers


 --
 View this message in context: 
 http://r.789695.n4.nabble.com/nesting-scale-manual-caracteristics-in-ggplot-tp4017171p4017523.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sorting Panel Data by Time

2011-11-08 Thread Dennis Murphy
Here's another approach using the plyr and data.table packages, where
df is the name I gave to your example data:

# plyr
library('plyr')
ddply(df, .(TIME), mutate, L1 = sort(S1))

# Another way with the data.table package:
library('data.table')
dt - data.table(df, key = 'TIME')
dt[, list(X1, S1, L1 = sort(S1)), by = 'TIME']

HTH,
Dennis

On Tue, Nov 8, 2011 at 11:58 AM, economicurtis curtisesj...@gmail.com wrote:
 I have panel data in the following form:

 TIME   X1     S1
  1       1      0.99
  1       2      0.50
  1       3      0.01
  2       3      0.99
  2       1      0.99
  2       2      0.25
  3       3      0.75
  3       2      0.50
  3       1      0.25
 ...      ...     ..

 And desire a new vector of observations in which one column (S1 above) is
 sorted for each second from least to largest.

 That is, a new vector (L1 below) of the form:

 TIME   X1     S1       L1
  1       1      0.99    0.01
  1       2      0.50    0.50
  1       3      0.01    0.99
  2       3      0.99    0.25
  2       1      0.99    0.99
  2       2      0.25    0.99
  3       3      0.75    0.25
  3       2      0.50    0.50
  3       1      0.25    0.75
 ...      ...     ..    .

 Sorry for the NOOB question, but any help would be great.

 Curtis Kephart

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Sorting-Panel-Data-by-Time-tp4017271p4017271.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] plot separate groups with plotmeans()

2011-11-08 Thread Dennis Murphy
Hi:

Here are a couple of ways using the plyr and ggplot2 packages:

library('plyr')
library('ggplot2')

# Create a summary data frame that computes the mean
# and standard deviation at each time/group combination
xsumm - ddply(x, .(Time, Group), summarise, m = mean(Score), s = sd(Score))

# Create a new time variable to appose the two groups at
# each time (in case that was the intent):
xsumm2 - mutate(xsumm, Time2 = rep(1:5, each = 2))

# Plot 1: Plot all 10 times in one panel, using color to
#separate the two groups
ggplot(xsumm, aes(x = Time, y = m)) +
   geom_errorbar(aes(ymin = m - s, ymax = m + s, colour = Group),
 size = 1) +
   geom_point(size = 3, colour = 'red') +
   geom_line(aes(group = 1),  colour = 'red', size = 1) +
   scale_colour_manual(breaks = c('A', 'B'),
 values = c('blue', 'orange')) +
   scale_x_continuous(breaks = 1L:10L) +
   ylab('Score')

# Plot 2: Separate the two groups into different panels
ggplot(xsumm2, aes(x = Time2, y = m)) +
   geom_errorbar(aes(ymin = m - s, ymax = m + s),
 colour = 'blue', size = 1, width = 0.4) +
   geom_point(size = 3, colour = 'red') +
   geom_line(aes(group = 1), colour = 'red', size = 1) +
   labs(x = 'Time', y = 'Score') +
   facet_wrap(~ Group, ncol = 1)

# Plot 3: Put the groups side by side at each time
# This one is a little trickier...
ggplot(xsumm2, aes(x = Time2, y = m)) +
   geom_errorbar(aes(ymin = m - s, ymax = m + s, colour = Group),
 position = position_dodge(width = 0.4),
 size = 1, width = 0.4) +
   geom_point(size = 3, colour = 'red',
  position = position_dodge(width = 0.4)) +
   geom_line(aes(group = Group), colour = 'red', size  = 1,
  position = position_dodge(width = 0.4)) +
   labs(x = 'Time', y = 'Score') +
   scale_colour_manual(breaks = c('A', 'B'),
 values = c('blue', 'orange'))

You might have to be careful about saving the last plot, especially if
you want to shrink the size or change the aspect ratio - it might
upset the alignment of the bars, lines and points. If you save it at
the standard size (7'' x 7'') you should be fine. See the ggsave()
function in the ggplot2 package for details (?ggplot2::ggsave once the
package is installed).

If this is new to you, go to http://had.co.nz/ggplot2/ for the
package's web page; the on-line help files (with many examples) start
as you scroll toward the bottom of the page. Click on the book's
website (linked from the above site) to access the chapter on qplots,
the entry level chapter of the ggplot2 book.

This would be harder to do in the lattice package because it doesn't
have a built-in error bar panel function (by design, I think :).

HTH,
Dennis

On Tue, Nov 8, 2011 at 4:19 PM, Jon Zadra jr...@virginia.edu wrote:
 Hi,

 I often use plotmeans() from the gplots package to quickly visualize a
 pattern of change.  I would like to be able to plot separate lines for
 different groups, but the function gives an error when a grouping variable
 is included in the formula argument.

 For instance,
 require(gplots)
 x - data.frame(Score=rnorm(100), Time=rep(1:10, 10),
 Group=factor(rep(c(A,B),50)))
 plotmeans(Score ~ Time, x) #works fine
 plotmeans(Score ~ Time * Group, x)
 Error in .subset2(x, i, exact = exact) :
  attempt to select more than one element
 plotmeans(Score ~ Time | Group, x)
 Error in ns - 1 : non-numeric argument to binary operator
 In addition: Warning message:
 In Ops.factor(Time, Group) : | not meaningful for factors

 The help for plotmeans() states that it accepts a formula object including a
 grouping variable.  There is no other grouping argument listed.  Any help is
 appreciated, including pointing to a more robust function for plotting
 means.

 Thanks,

 Jon
 --
 Jon Zadra
 Department of Psychology
 University of Virginia
 P.O. Box 400400
 Charlottesville VA 22904
 (434) 982-4744
 email: za...@virginia.edu
 http://www.google.com/calendar/embed?src=jzadra%40gmail.com
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistric regression: model revision

2011-11-07 Thread Dennis Murphy
Since you didn't provide a reproducible example, here are a couple of
possibilities to check, but I have utterly no idea if they're
applicable to your problem or not:

 * does costdis1 consist of 0's and 1's?
 * is costdis1 a factor?

In the first model, you treat costdis1 as a pure quadratic and in the
second model, it is a linear term. The two models are not nested.
Modeling a term as a pure quadratic is a very strong assumption - the
more usual practice is to fit both a linear and quadratic term in
costdis1 to allow more flexibility in the fitted surface, but that
would require costdis1 to be numeric.

HTH,
Dennis

On Mon, Nov 7, 2011 at 7:58 AM, Sally Ann Sims sallys...@earthlink.net wrote:
 Hello,

 I am working on fitting a logistic regression model to my dataset.  I removed 
 the squared term in the second version of the model, but my model output is 
 exactly the same.

 Model version 1:  
 GRP_GLM-glm(HB_NHB~elev+costdis1^2,data=glm_1,family=binomial(link=logit))
 summary(GRP_GLM)


 Model version 2:  
 QM_1-glm(HB_NHB~elev+costdis1,data=glm_2,family=binomial(link=logit))
 summary(QM_1)


 The call in version 2 has changed:
 Call:
 glm(formula = HB_NHB ~ elev + costdis1, family = binomial(link = logit),
    data = glm_2)
 But I’m getting the exact same results as I did in the model where costdis1 
 is squared.

 Any ideas what I might do to correct this?  Thank you.

 Sally
        [[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Correction in error

2011-11-07 Thread Dennis Murphy
Hi:

In your function call, x[1, 1] = theta = 0. In the first line of the
loop, your rbinom() call works out to be
  x[2, 1] - rbinom(x[1, 3], 1, x[1, 1])   = rbinom(10, 1, 0)

That likely accounts for the error message:
Error in x[t, 1] - rbinom(x[t - 1, 3], 1, x[t - 1, 1]) :
  replacement has length zero

HTH,
Dennis

On Mon, Nov 7, 2011 at 12:10 PM, Gyanendra Pokharel
gyanendra.pokha...@gmail.com wrote:
 Hello R community, following is my code and it shows error, can some one
 fix this error and explain why this occurs?

 gibbs -function(m,n, theta = 0, lambda = 1){
    alpha - 1.5
    beta - 1.5
    gamma - 1.5
    x- array(0,c(m+1, 3))
    x[1,1] - theta
    x[1,2] - lambda
    x[1,3]- n
    for(t in 2:m+1){
        x[t,1] - rbinom(x[t-1,3], 1, x[t-1,1])
        x[t,2]-rbeta(m, x[t-1,1] + alpha, tx[t-1,3] - x[t-1,1] + beta)
        x[t,3] - rpois(x[t-1,3] - x[t-1,1],(1 - x[t-1,2])*gamma)
    }
    x
 }
 gibbs(100, 10)

 Gyn

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Correlation between matrices

2011-11-06 Thread Dennis Murphy
Hi:

On Sat, Nov 5, 2011 at 11:06 PM, Kaiyin Zhong kindlych...@gmail.com wrote:
 Thank you Dennis, your tips are really helpful.
 I don't quite understand the lm(y~mouse) part; my intention was -- in
 pseudo code -- lm(y(Enzyme) ~ y(each elem)).

As I said in my first response, I didn't quite understand what you
were trying to regress so I used the mouse as a way of showing you how
the code works. I think I understand what you want now, though.

I'll create a data set in two ways: the first assumes you have the
data as constructed in your original post and the second generates
random numbers after erecting a 'scaffold' data frame. The game is to
separate the enzyme data from the element data and put them into the
final data frame as separate columns. Then the regression is easy if
that's what you need to do.

# Method 1: Generate the data as you did into separate data frames

elem0 - c('Cu', 'Fe', 'Zn', 'Ca', 'Enzyme')
regions = c('cortex', 'hippocampus', 'brain_stem', 'mid_brain',
'cerebellum')

# Creates five 5 x 5 data frames with names V1-V5:
for (n in c('Cu', 'Fe', 'Zn', 'Ca', 'Enzyme')) {
   assign(n, as.data.frame(replicate(5, rnorm(5
 }

# Stack the chemical element data using melt() from
# the reshape2 package:
library('reshape2')
d1 - rbind(melt(Cu), melt(Zn), melt(Fe), melt(Ca))
# Relabel V1 - V5 with brain region names, add a factor
# to distinguish individual elements and tack on the melted
# Enzyme data so that it repeats in each element block
d1 - within(d1, {
variable - factor(d1$variable, labels = regions)
elem - factor(rep(elem0[1:4], each = 25))
Enzyme - melt(Enzyme)[, 2]
  } )
# Plot the data using lattice and latticeExtra:
library('lattice')
library('latticeExtra')
p - xyplot(Enzyme ~ value | variable + elem,
   data = d1, type = c('p', 'r'))
useOuterStrips(p)

###
## Method 2: Generate the random data after setting
## up the element/region/mouse combinations
##

# Generate a data frame from the combinations of
# mice, regions and elem:

library('ggplot2')

mice - paste('mouse', 1:5, sep = '')
regions = c('cortex', 'hippocampus', 'brain_stem', 'mid_brain',
'cerebellum')
elem - elem0[1:4]
d0 - data.frame(expand.grid(mice = mice,
 regions = regions, elem = elem))
d0 - within(d0, {
value - rnorm(100)   # generate element values
Enzyme - rnorm(25)  # generate enzyme values
  } )

# the Enzyme values are recycled through all element blocks.

# You can either adapt the lattice code above to plot d0, or you
# can do the following to get an analogous plot in ggplot2.
# It's easier to compute the slopes and intercepts and put
# them into a data frame that ggplot() can import, so that's
# what we'll do first.

# A function to return regression coefficients from a
# generic data frame. Since this function goes into ddply(),
# the argument df is a (generic) data frame and the output
# will be converted to a one-line data frame.

coefun - function(df) coef(lm(Enzyme ~ value, data = df))

# Apply the function to all regions * elem combinations.
# Output is a data frame of coefficients corresponding to
# each region/element combination

coefs - ddply(d0, .(regions, elem), coefun)
# Rename the columns
names(coefs) - c('regions', 'elem', 'b0', 'b1')

# Generate the plot using package ggplot2:
ggplot(d0, aes(x = val, y = Enzyme)) +
   geom_point(size = 2.5) +
   geom_abline(data = coefs, aes(intercept = b0, slope = b1),
 size = 1) +
   xlab() +
   facet_grid(elem ~ regions)


 In addition, attach(d) seems necessary before using lm(y~mouse), and
 since d$mouse has a length 125, while each elem for each region has a
 length 5, it generates the following error:

You should never need to use attach() - use the data = argument in
lm() instead, where the value of data is the name of a data frame.
It's always easier to use the modeling functions in R having formula
interfaces with data frames.

 coefs = ddply(d, .(regions, elem), coefun)
 Error in model.frame.default(formula = y ~ mouse, drop.unused.levels = TRUE) :
  variable lengths differ (found for 'mouse')

You're clearly doing something here that's messing up the structure of
the data. Study what the code (and its output) above are telling you,
particularly if you're not familiar with plyr, lattice and/or ggplot2.
Writing functions to insert into a **ply() function in plyr can be
tricky. If you continue to have problems, please provide a
reproducible example as you did here.

HTH,
Dennis


 On Sun, Nov 6, 2011 at 12:53 PM, Dennis Murphy djmu...@gmail.com wrote:

 Hi:

 I don't think you want to keep these objects separate; it's better to
 combine everything into a data frame. Here's a variation of your
 example - the x variable ends up being a mouse, but you may have
 another variable that's more appropriate to plot so take

Re: [R] Correlation between matrices

2011-11-05 Thread Dennis Murphy
Hi:

I don't think you want to keep these objects separate; it's better to
combine everything into a data frame. Here's a variation of your
example - the x variable ends up being a mouse, but you may have
another variable that's more appropriate to plot so take this as a
starting point. One plot uses the ggplot2 package, the other uses the
lattice and latticeExtra packages.

library('ggplot2')
regions = c('cortex', 'hippocampus', 'brain_stem', 'mid_brain',
'cerebellum')
mice = paste('mouse', 1:5, sep='')
elem - c('Cu', 'Fe', 'Zn', 'Ca', 'Enzyme')

# Generate a data frame from the combinations of
# mice, regions and elem:
d - data.frame(expand.grid(mice = mice, regions = regions,
elem = elem), y = rnorm(125))
# Create a numeric version of mice
d$mouse - as.numeric(d$mice)

# A function to return regression coefficients
coefun - function(df) coef(lm(y ~ mouse), data = df)
# Apply to all regions * elem combinations
coefs - ddply(d, .(regions, elem), coefun)
names(coefs) - c('regions', 'elem', 'b0', 'b1')

# Generate the plot using package ggplot2:
ggplot(d, aes(x = mouse, y = y)) +
   geom_point(size = 2.5) +
   geom_abline(data = coefs, aes(intercept = b0, slope = b1),
 size = 1) +
   facet_grid(elem ~ regions)

# Same plot in lattice:
library('lattice')
library('latticeExtra')
p - xyplot(y ~ mouse | elem + regions, data = d, type = c('p', 'r'),
 layout = c(5, 5))


HTH,
Dennis

On Sat, Nov 5, 2011 at 10:49 AM, Kaiyin Zhong kindlych...@gmail.com wrote:
 regions = c('cortex', 'hippocampus', 'brain_stem', 'mid_brain',
 'cerebellum')
 mice = paste('mouse', 1:5, sep='')
 for (n in c('Cu', 'Fe', 'Zn', 'Ca', 'Enzyme')) {
 +   assign(n, as.data.frame(replicate(5, rnorm(5
 + }
 names(Cu) = names(Zn) = names(Fe) = names(Ca) = names(Enzyme) = regions
 row.names(Cu) = row.names(Zn) = row.names(Fe) = row.names(Ca) =
 row.names(Enzyme) = mice
 Cu
           cortex hippocampus brain_stem  mid_brain cerebellum
 mouse1 -0.5436573 -0.31486713  0.1039148 -0.3908665 -1.0849112
 mouse2  1.4559136  1.75731752 -2.1195118 -0.9894767  0.3609033
 mouse3 -0.6735427 -0.04666507  0.9641000  0.4683339  0.7419944
 mouse4  0.6926557 -0.47820023  1.3560802  0.9967562 -1.3727874
 mouse5  0.2371585  0.20031393 -1.4978517  0.7535148  0.5632443
 Zn
            cortex hippocampus brain_stem  mid_brain  cerebellum
 mouse1 -0.66424043   0.6664478  1.1983546  0.0319403  0.41955740
 mouse2 -1.14510448   1.5612235  0.3210821  0.4094753  1.01637466
 mouse3 -0.85954416   2.8275458 -0.6922565 -0.8182307 -0.06961242
 mouse4  0.03606034  -0.7177256  0.7067217  0.2036655 -0.25542524
 mouse5  0.67427572   0.6171704  0.1044267 -1.8636174 -0.07654666
 Fe
           cortex hippocampus  brain_stem  mid_brain cerebellum
 mouse1  1.8337008   2.0884261  0.29730413 -1.6884804  0.8336137
 mouse2 -0.2734139  -0.5728439  0.63791556 -0.6232828 -1.1352224
 mouse3 -0.4795082   0.1627235  0.21775206  1.0751584 -0.5581422
 mouse4  1.7125147  -0.5830600  1.40597896 -0.2815305  0.3776360
 mouse5 -0.3469067  -0.4813120 -0.09606797  1.0970077 -1.1234038
 Ca
           cortex hippocampus  brain_stem   mid_brain cerebellum
 mouse1 -0.7663354   0.8595091  1.33803798 -1.17651576  0.8299963
 mouse2 -0.7132260  -0.2626811  0.08025079 -2.40924271  0.7883005
 mouse3 -0.7988904  -0.1144639 -0.65901136  0.42462227  0.7068755
 mouse4  0.3880393   0.5570068 -0.49969135  0.06633009 -1.3497228
 mouse5  1.0077684   0.6023264 -0.57387762  0.25919461 -0.9337281
 Enzyme
           cortex hippocampus  brain_stem  mid_brain cerebellum
 mouse1  1.3430936   0.5335819 -0.56992947  1.3565803 -0.8323391
 mouse2  1.0520850  -1.0201124  0.8965  1.4719880  1.0854768
 mouse3 -0.2802482   0.6863323 -1.37483570 -0.7790174  0.2446761
 mouse4 -0.1916415  -0.4566571  1.93365932  1.3493848  0.2130424
 mouse5 -1.0349593  -0.1940268 -0.07216321 -0.2968288  1.7406905

 In each anatomic region, I would like to calculate the correlation between
 Enzyme activity and each of the concentrations of Cu, Zn, Fe, and Ca, and
 do a scatter plot with a tendency line, organizing those plots into a grid.
 See the image below for the desired effect:
 http://postimage.org/image/62brra6jn/
 How can I achieve this?

 Thank you in advance.

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Is it possible to vectorize/accelerate this?

2011-11-03 Thread Dennis Murphy
Hi:

You're doing the right thing in R by pre-allocating memory for the
result, but ifelse() is a vectorized function and your loop is
operating elementwise, so if-else is more appropriate. Try

for (i in 2:100){
b_vec[i] - if(abs(b_vec[i-1] + a_vec[i])  1) a_vec[i] else
b_vec[i-1] + a_vec[i]
  }

If speed is an issue, then I echo Michael's suggestion to write a
C(++) function and call it within R. The inline package is good for
this kind of thing.

HTH,
Dennis


On Thu, Nov 3, 2011 at 12:10 PM, hihi v.p.m...@freemail.hu wrote:
 Dear Members,

 I work on a simulaton experiment but it has an bottleneck. It's quite fast 
 because of R and vectorizing, but it has a very slow for loop. The adjacent 
 element of a vector (in terms of index number) depends conditionally on the 
 former value of itself. Like a simple cumulating function (eg. cumsum) but 
 with condition. Let's show me an example:
 a_vec = rnorm(100)
 b_vec = rep(0, 100)
 b_vec[1]=a_vec[1]
 for (i in 2:100){b_vec[i]=ifelse(abs(b_vec[i-1]+a_vec[i])1, a_vec[i], 
 b_vec[i-1]+a_vec[i])}
 print(b_vec)

 (The behaviour is like cumsum's, but when the value would excess 1.0 then it 
 has another value from a_vec.)
 Is it possible to make this faster? I experienced that my way is even slower 
 than in Excel! Programming in C would my last try...
 Any suggestions?

 Than you,
 Peter

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] round up a number to 10^4

2011-11-01 Thread Dennis Murphy
Works in the newly released 2.14.0:

 X = c(60593.23, 71631.17, 75320.1)
 round(X, -4)
[1] 6 7 8

Dennis


On Tue, Nov 1, 2011 at 10:07 AM, Wendy wendy2.q...@gmail.com wrote:
 Hi all,

 I have a list of numbers, e.g., X = c(60593.23, 71631.17, 75320.1), and want
 to round them  so the output is Y = c(6, 8, 8). I tried
 Y-round(X,-4), but it gives me Y = c(6, 7, 8). Do anybody know
 how to round up a number to 10^4?

 Thank you in advance.

 Wendy


 --
 View this message in context: 
 http://r.789695.n4.nabble.com/round-up-a-number-to-10-4-tp3964394p3964394.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Removal/selecting specific rows in a dataframe conditional on 2 columns

2011-11-01 Thread Dennis Murphy
Does this work?

library('plyr')
# Function to return a data frame if it has one row, else return NULL:
f - function(d) if(nrow(d) == 1L) d else NULL
 ddply(RV09, .(set, month), f)
  record.t trip set month stratum NAFO unit.area time dur.set distance
15  913   110 351   3O   R31 1044  179
25  913   310 340   3O   Q31 1800  189
35  913   410 340   3O   Q32 2142  179

ddply() is an apply-like function that takes a data frame as input and
a data frame as output (hence the dd). The first argument is the data
frame name, the second argument the set of grouping variables and the
third is the function to be called (in this application).

HTH,
Dennis

On Tue, Nov 1, 2011 at 10:16 AM, Aurelie Cosandey Godin god...@dal.ca wrote:
 Dear list,

 After reading different mails, blogs, and tried a few different codes without 
 any success, I am asking your help!
 I have the following data frame where each row represent a survey unit with 
 the following variables:

 names(RV09)
  [1] record.t  trip      set       month     stratum   NAFO
  [7] unit.area time      dur.set   distance  operation mean.d
 [13] min.d     max.d     temp.d    slat      slong     spp
 [19] number    weight    elat      elong

 Each survey unit generates one set record, denoted by a 5 in column 
 record.t. Each species identified in this particular survey unit generates 
 an additional set record, denoted by a 6.

 unique(RV09$record.t)
 [1] 5 6

 Each survey unit are identified by a specific trip and set number, so if 
 there is a 5 record type with no associated 6 records, it means that no 
 species were observed in that survey unit. I would like to be able to select 
 all and only these survey units, which represent my zeros.

 So as an exemple, in this trip number 913, set 1, 3, and 4 would be part of 
 my zeros data.frame as they appear with no record.t 6, such that no species 
 were observed in this survey unit.

 head(RV09)
   record.t trip set month stratum NAFO unit.area time dur.set distance
 585        5  913   1    10     351   3O       R31 1044      17        9
 586        5  913   2    10     351   3O       R31 1440      17        9
 587        6  913   2    10     351   3O       R31 1440      17        9
 588        5  913   3    10     340   3O       Q31 1800      18        9
 589        5  913   4    10     340   3O       Q32 2142      17        9

 Any tips on how extract this zero data.frame in R?
 Thank you very much in advance!

 Best,
 ~Aurelie


 Aurelie Cosandey-Godin
 Ph.D. student, Department of Biology
 Industrial Graduate Fellow, WWF-Canada
 Dalhousie University | Email: god...@dal.ca


        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Counting entries to create a new table

2011-11-01 Thread Dennis Murphy
Hi:

After cleaning up your data, here's one way using the plyr  and
reshape packages:

d - read.csv(textConnection(
Individual, A, B, C, D
Day1, 1,1,1,1
Day2, 1,3,4,2
Day3, 3,,6,4), header = TRUE)
closeAllConnections()

d
library('plyr')
library('reshape')
# Stack the variables
dm - melt(d, id = 'Individual')
# Convert the new value column to factor as follows:
dm$value - factor(dm$value, levels = c(1:7, NA), exclude = NULL)
# Use ddply() in conjunction with tabulate():
ddply(dm, .(variable), function(d) tabulate(d$value, nbins = 8))
  variable V1 V2 V3 V4 V5 V6 V7 V8
1A  2  0  1  0  0  0  0  0
2B  1  0  1  0  0  0  0  1
3C  1  0  0  1  0  1  0  0
4D  1  1  0  1  0  0  0  0

This returns a data frame. If you want a matrix instead, use the
daply() function rather than ddply() and leave everything else the
same.

HTH,
Dennis

On Tue, Nov 1, 2011 at 1:05 PM, Empty Empty phytophthor...@yahoo.com wrote:
 Hi,

 I am an R novice and I am trying to do something that it seems should be 
 fairly simple, but I can't quite figure it out and I must not be using the 
 right words when I search for answers.

 I have a dataset with a number of individuals and observations for each day 
 (7 possible codes plus missing data)
 So it looks something like this

 Individual A, B, C, D
 Day1 1,1,1,1
 Day 2 1,3,4,2
 Day3 3,,6,4
 (I've also tried transposing it so that individuals are rows and days are 
 columns)

 I want to summarize the total observation codes by individual so that I end 
 up with a table something like this:

 ,  1, 2 ,3 ,4, 5, 6,7, missing
 A  2,0,1,0,0,0,0,0
 B  1,0,1,0,0,0,0,1
 C  1,0,0,1,0,1,0,0
 D 1,1,0,1,0,0,0,0

 If I can get that I'll be happy! Part two is
  subsetting which days I include in the counts so that I could, say look at 
 days 1-30 and 30-60 separately - create two different tables from the same 
 original table. (Or I can do it manually and start with different subsets of 
 the data).

 Thanks so much for any help.
        [[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] predict for a cv.glmnet returns an error

2011-11-01 Thread Dennis Murphy
Hi:

Here's what I got when I ran your code:

library('glmnet')
 x=matrix(rnorm(100*20),100,20)
 y=rnorm(100)
 cv.fit=cv.glmnet(x,y)
 predict(cv.fit,newx=x[1:5,])
 1
[1,] 0.1213114
[2,] 0.1213114
[3,] 0.1213114
[4,] 0.1213114
[5,] 0.1213114
 coef(cv.fit)
21 x 1 sparse Matrix of class dgCMatrix
1
(Intercept) 0.1213114
V1  0.000
V2  0.000
V3  0.000
V4  0.000
V5  0.000
V6  0.000
V7  0.000
V8  0.000
V9  0.000
V10 0.000
V11 0.000
V12 0.000
V13 0.000
V14 0.000
V15 0.000
V16 0.000
V17 0.000
V18 0.000
V19 0.000
V20 0.000

### Check against the versions of the packages listed below:
 sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] grid  stats graphics  grDevices utils datasets  methods
[8] base

other attached packages:
[1] glmnet_1.7.1   Matrix_1.0-1   lattice_0.20-0 ggplot2_0.8.9  proto_0.3-9.2
[6] reshape_0.8.4  plyr_1.6

loaded via a namespace (and not attached):
[1] tools_2.14.0


Dennis

On Tue, Nov 1, 2011 at 3:34 PM, asafw assafweinst...@gmail.com wrote:

 Hi there,

 I am trying to use predict() with an object returned by cv.glmnet(), and get
 the following error:
 no applicable method for 'predict' applied to an object of class cv.glmnet

 What's wrong?

 my code:

 x=matrix(rnorm(100*20),100,20)
 y=rnorm(100)
 cv.fit=cv.glmnet(x,y)
 predict(cv.fit,newx=x[1:5,])
 coef(cv.fit)

 Thanks so much,

 Asaf

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/predict-for-a-cv-glmnet-returns-an-error-tp3965744p3965744.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] building a subscript programatically

2011-11-01 Thread Dennis Murphy
Here's a hack, but perhaps you might want to rethink what type of
output you want.

# Function:
g - function(arr, lastSubscript = 1) {
   n - length(dim(arr))
   commas - paste(rep(',', n - 1), collapse = '')
   .call - paste('arr[', commas, lastSubscript, ']', sep = '')
   eval(parse(text = .call))
   }

# Examples:
a1 - array(1:8, c(2, 2, 2))
a2 - array(1:16, c(2, 2, 2, 2))
a3 - array(1:32, c(2, 2, 2, 2, 2))
g(a1, 2)
g(a2, 2)
g(a3, 2)

Notice the subscripting in the last two examples - if you only want
one submatrix returned, then try this:

h - function(arr, lastSubscript = c(1)) {
   n - length(dim(arr))
   subs - if(length(lastSubscript)  1)
  paste(lastSubscript, collapse = ',') else lastSubscript
   .call - paste('arr[,,', subs, ']', sep = '')
   eval(parse(text = .call))
   }
h(a2, c(1, 1))
h(a3, c(2, 1, 1))

These functions have some ugly code, but I think it does what you were
looking for. Hopefully someone can devise a more elegant solution.

Dennis

HTH

2011/11/1 Ernest Adrogué nfdi...@gmail.com:
 Hi,

 On ocasion, you need to subscript an array that has an arbitrary
 (ie. not known in advance) number of dimensions. How do you deal with
 these situations?
 It appears that it is not possible use a list as an index, for
 instance this fails:

 x - array(NA, c(2,2,2))
 x[list(TRUE,TRUE,2)]
 Error in x[list(TRUE, TRUE, 2)] : invalid subscript type 'list'

 The only way I know is using do.call() but it's rather ugly. There
 must be a better way!!

 do.call('[', c(list(x), TRUE, TRUE, 2))
     [,1] [,2]
 [1,]   NA   NA
 [2,]   NA   NA

 Any idea?

 Regards,
 Ernest

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] jarquebera_test_results

2011-10-30 Thread Dennis Murphy
Hi:

I'd double check the form of the upper bound of the sequence. In the
first case (i = 1), it uses the subvector loghozamok[1:220]; when i =
2, it uses the subvector loghozamok[21:260], etc. In the last case, it
uses loghozamok[1181:1420]. If instead you want to split the vector
into 60 groups of size 20, then this is easier (assuming that
loghozamok has length 1200):

gp - rep(1:60, each = 20)
# Returns a vector
do.call(c, lapply(split(loghozamok, gp),
 function(z) unname(tseries::jarque.bera.test(z)$p.value)))

This works when gp and loghozamok have the same length.

# Small test:
x - rnorm(100)
gp - rep(1:5, each = 20)
do.call(c, lapply(split(x, gp),
 function(z) unname(tseries::jarque.bera.test(z)$p.value)))

1 2 3 4 5
0.5849843 0.6745735 0.9453412 0.5978477 0.7207138

HTH,
Dennis

On Sun, Oct 30, 2011 at 10:23 AM, Szűcs Ákos szucsa...@t-online.hu wrote:
 Hi!
 I got a loop where i print out the results of Jarque Bera tests, but I have
 to put, the p-values in a vector. Can you help me how to do it in an
 effective way and not just typing in the results to a vector? Thanks a lot,
 here is the code:
 for(i in 1:60){
 print(jarque.bera.test(loghozamok[((20*(i-1))+1):(20*(i+11))]))}

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Normality tests on groups of rows in a data frame, grouped based on content in other columns

2011-10-30 Thread Dennis Murphy
Hi:

Here are a few ways (untested, so caveat emptor):

# plyr package
library('plyr')
ddply(df, .(Plant, Tissue, Gene), summarise, ntest =
shapiro.test(ExpressionLevel))

# data.table package
library('data.table')
dt - data.table(df, key = 'Plant, Tissue, Gene')
dt[, list(ntest = shapiro.test(ExpressionLevel)), by = key(dt)]

# aggregate() function
aggregate(ExpressionLevel ~ Plant + Tissue + Gene, data = df, FUN =
shapiro.test)

# doBy package:
summaryBy(ExpressionLevel ~ Plant + Tissue + Gene, data = df, FUN =
shapiro.test)

There are others, too...

HTH,
Dennis

2011/10/30 Joel Fürstenberg-Hägg jo...@life.ku.dk:
 Dear R users,

 I have a data frame in the form below, on which I would like to make 
 normality tests on the values in the ExpressionLevel column.

 head(df)
  ID Plant  Tissue  Gene ExpressionLevel
 1  1 p1     t1      g1   366.53
 2  2 p1     t1      g2     0.57
 3  3 p1     t1      g3    11.81
 4  4 p1     t2      g1   498.43
 5  5 p1     t2      g2     2.14
 6  6 p1     t2      g3     7.85

 I would like to make the tests on every group according to the content of the 
 Plant, Tissue and Gene columns.

 My first problem is how to run a function for all these sub groups.
 I first thought of making subsets:

 group1 - subset(df, Plant==p1  Tissue==t1  Gene==g1)
 group2 - subset(df, Plant==p1  Tissue==t1  Gene==g2)
 group3 - subset(df, Plant==p1  Tissue==t1  Gene==g3)
 group4 - subset(df, Plant==p1  Tissue==t2  Gene==g1)
 group5 - subset(df, Plant==p1  Tissue==t2  Gene==g2)
 group6 - subset(df, Plant==p1  Tissue==t2  Gene==g3) etc...

 But that would be very time consuming and I would like to be able to use the 
 code for other data frames...
 I have also tried to store these in a list, which I am looping through, 
 running the tests, something like this:

 alist=list(group1, group2, group3, group4, group5, group6)
 for(i in alist)
 {
  print(shapiro.test(i$ExpressionLevel))
  print(pearson.test(i$ExpressionLevel))
  print(pearson.test(i$ExpressionLevel, adjust=FALSE))
 }

 But, there must be an easier and more elegant way of doing this... I found 
 the example below at 
 http://stackoverflow.com/questions/4716152/why-do-r-objects-not-print-in-a-function-or-a-for-loop.
  I think might be used for the printing of the results, but I do not know how 
 to adjust for my data frame, since the functions are applied on several 
 columns instead of certain rows in one column.

 DF - data.frame(A = rnorm(100), B = rlnorm(100))

 obj2 - lapply(DF, shapiro.test)

 tab2 - lapply(obj, function(x) c(W = unname(x$statistic), p.value = 
 x$p.value))
 tab2 - data.frame(do.call(rbind, tab2))
 printCoefmat(tab2, has.Pvalue = TRUE)

 Finally, I have found several different functions for testing for normality, 
 but which one(s) should I choose? As far as I can see in the help files they 
 only differ in the minimum number of samples required.

 Thanks in advance!

 Kind regards,

 Joel






        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Faster process for creating a matrix based on matrix element comparison

2011-10-28 Thread Dennis Murphy
 dat
 [,1] [,2] [,3]
[1,]001
[2,]010
[3,]011
[4,]   NA11

1 - is.na(dat)

 [,1] [,2] [,3]
[1,]111
[2,]111
[3,]111
[4,]011

D.

On Fri, Oct 28, 2011 at 11:40 AM, Evgenia ev...@aueb.gr wrote:
 I have matrix data
 data-matrix(cbind(c(0,0,0),c(NA,0,1),c(1,1,1),c(0,1,1)),ncol=3)
 and I want to create a new matrix by checking each element of the data and
 put value 0 if i have NA and 1 otherwise.
 For this reason i made the function below
 pdata-matrix(NA,ncol=ncol(data),nrow=nrow(data))

 pdata-sapply(1:nrow(data),function(i) sapply (1:ncol(data),function(j)
 if (is.na(data[i,j])) {pdata[i,j]-0} else {pdata[i,j]-1}
 ))
 pdata-matrix(t(pdata),ncol=ncol(data),nrow=nrow(data))

 with pdata

 pdata
     [,1] [,2] [,3]
 [1,]    1    1    1
 [2,]    1    1    1
 [3,]    1    1    1
 [4,]    0    1    1

 But in my case I have a matrix with 5 rows and 10 colums
 and the creation of pdata takes alot of time.
 Could anyone have any suggestion to make pdata faster?
 Thanks




 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Faster-process-for-creating-a-matrix-based-on-matrix-element-comparison-tp3948841p3948841.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] show.ci='FALSE' ignored in simple.lm

2011-10-27 Thread Dennis Murphy
Hi:

A trip to package sos reveals that simple.lm() is in the UsingR
package. Looking at the code of this function, it plots the (x, y)
pairs and the fitted least squares line without an option to suppress
the plot. Here's a slight hack of the function; it adds a new argument
plot with default action TRUE; after the linear model is fit, an if
statement tests to see if the plot is desired; if FALSE, it simply
returns the (print method of the) fitted object.

simple.lm2 - function (x, y, plot = TRUE, show.residuals = FALSE,
  show.ci = FALSE, conf.level = 0.95, pred = FALSE)
{
op - par()
ord - order(x)
x - x[ord]
y - y[ord]
tmp.lm - lm(y ~ x)
if(!plot) return(tmp.lm) else {

if (show.residuals) {
par(mfrow = c(2, 2))
}
plot(x, y)
abline(tmp.lm)

if (show.ci) {
xvals - seq(min(x), max(y), length = 15)
curve(predict(tmp.lm, data.frame(x = x), level = conf.level,
interval = confidence)[, 3], add = TRUE)
curve(predict(tmp.lm, data.frame(x = x), level = conf.level,
interval = confidence)[, 2], add = TRUE)
curve(predict(tmp.lm, data.frame(x = x), level = conf.level,
interval = prediction)[, 3], lty = 3, add = TRUE)
curve(predict(tmp.lm, data.frame(x = x), level = conf.level,
interval = prediction)[, 2], lty = 3, add = TRUE)
}
coeffs - floor(tmp.lm$coeff * 100 + 0.5)/100
plusorminus - c(+)
if (coeffs[1]  0)
plusorminus - c()
title(paste(y = , coeffs[2], x , plusorminus, coeffs[1]))
if (show.residuals) {
Fitted - fitted.values(tmp.lm)
Residuals - residuals(tmp.lm)
plot(Fitted, Residuals)
abline(h = 0)
title(Residuals vs. fitted)
hist(Residuals, main = hist of residuals)
qqnorm(Residuals, main = normal plot of residuals)
qqline(Residuals)
}
if (pred) {
print(predict(tmp.lm, data.frame(x = pred)))
}
  }
tmp.lm
}


#
# Simple test:

plot(x - 1:10)
y - 5*x + rnorm(10,0,1)

tmp1 - simple.lm2(x, y, plot = FALSE)
summary(tmp1)
# Produces the plot
simple.lm2(x, y)



HTH,
Dennis


On Thu, Oct 27, 2011 at 7:35 AM, David Winsemius dwinsem...@comcast.net wrote:

 On Oct 27, 2011, at 6:58 AM, Elinor Zeev wrote:

 Hi,

 I am using PostgreSQL 9.1 and want to run


  loglinear_simple-simple.lm(x$p_overPnot,x$logQuantity,show.ci
 =FALSE,conf.level=confidenceLevel)

 without showing the graph,  however the show.ci=FALSE is ignored.

 My guess (and it is a guess since you did not specify a package for
 simple.lm)  is that specifying a conf.level is over-riding the show.ci
 parameter. Why would one specify a value for conf.level if one did not
 want a confidence level?

 I have quite few packages loaded and I still get:
 ?simple.lm
 No documentation for 'simple.lm' in specified packages and libraries:
 you could try '??simple.lm'


 --

 David Winsemius, MD
 West Hartford, CT

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] help with parallel processing code

2011-10-27 Thread Dennis Murphy
Did you load the class package before calling lda()?

Dennis

On Thu, Oct 27, 2011 at 10:14 AM, 1Rnwb sbpuro...@gmail.com wrote:
 my modification gives me error
  rows- c(1:nrow(mat))
  scores - c()
  labels -c()
  itr-1000
  chnksz-ceiling(itr/getDoParWorkers())
 smpopt=list(chunkSize=chnksz)
  foreach(icount(itr),.combine=cbind,.options.smp=smpopts)%dopar%
 +  {
 +  train - sample(rows, length(rows)-1)
 +  label =0 ; if (mat$Self_T1D[-train] == N) label = 1  #need the value
 for this line, should it be 'N' or 'Y'
 +  z - lda(mat[train,4] ~ mat[train,1:3])
 +  score = predict(z, mat[-train, ])$pos[1]
 +  scores - c(scores, score)
 +  labels- c(labels, label)
 +  }
 Error in { : task 1 failed - could not find function lda

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/help-with-parallel-processing-code-tp3944303p3945243.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Syntax Check: rshape2 melt()

2011-10-27 Thread Dennis Murphy
Try this, based on your small example:

 tds.a - read.table(textConnection(
+  site   sampdate param quant
+ 1UDS-O 2006-12-06   TDS 10800
+ 4   STC-FS 1996-06-14  Cond   280
+ 7UDS-O 2007-10-04Mg  1620
+ 9UDS-O 2007-10-04   SO4  7580
+ 19 JCM-10B 2007-06-21Ca79
+ 20 JCM-10B 2007-06-21Cl   114), header = TRUE, stringsAsFactors = FALSE)
 closeAllConnections()

# Define param so that all of its levels are represented:
tds.a - within(tds.a, {
   param = factor(param, levels = c('TDS', 'Cond', 'Mg', 'Ca',
'Cl', 'Na', 'SO4'))
   sampdate = as.Date(sampdate)  } )
library('reshape2')
dcast(tds.a, site + sampdate ~ param, value_var = 'quant')

# Result:
 site   sampdate   TDS Cond   Mg Ca  Cl  SO4
1 JCM-10B 2007-06-21NA   NA   NA 79 114   NA
2  STC-FS 1996-06-14NA  280   NA NA  NA   NA
3   UDS-O 2006-12-06 10800   NA   NA NA  NA   NA
4   UDS-O 2007-10-04NA   NA 1620 NA  NA 7580

HTH,
Dennis

On Thu, Oct 27, 2011 at 8:26 AM, Rich Shepard rshep...@appl-ecosys.com wrote:
  This is my first excursion into using reshape2 and I want to ensure that
 the melt() function call is syntactically correct.

  The unmodifed data frame is organized this way:

 head(tds.anal)
      site   sampdate param quant
 1    UDS-O 2006-12-06   TDS 10800
 4   STC-FS 1996-06-14  Cond   280
 7    UDS-O 2007-10-04    Mg  1620
 9    UDS-O 2007-10-04   SO4  7580
 19 JCM-10B 2007-06-21    Ca    79
 20 JCM-10B 2007-06-21    Cl   114

  What I want looks like this:

    site    sampdate  TDS   Cond   Mg  Ca   Cl  Na  SO4
   UDS-O  2006-12-06  10800  NA   1620 NA   NA  NA  7580

 with the actual data for each param, of course.

  I've read the reshape.pdf, reshape2.pdf, the ?melt help page, and the
 ?melt.data.frame help page. I'm still unclear on the differences among
 measure.vars, variable.name, and value.name. After several attempts I have
 what may be what the melted tds.anal should look like:

 m.tds.anal - melt(tds.anal, id.vars = c('site', 'sampdate', 'param'), \
 measure.vars = 'quant', value.name = 'quant', na.rm = F)

 head(m.tds.anal)

     site   sampdate param variable quant
 1   UDS-O 2006-12-06   TDS    quant 10800
 2  STC-FS 1996-06-14  Cond    quant   280
 3   UDS-O 2007-10-04    Mg    quant  1620
 4   UDS-O 2007-10-04   SO4    quant  7580
 5 JCM-10B 2007-06-21    Ca    quant    79
 6 JCM-10B 2007-06-21    Cl    quant   114

  Is the melt() function call correct? Should the melted result look like
 the unmelted (long form in Paul Dalgaard's book) data with the additional
 'variable' column containing 'quant' for each row?

 Rich

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help with a scatter plot

2011-10-26 Thread Dennis Murphy
Hi:

The sort of thing you appear to want is fairly straightforward to in
lattice and ggplot2, as both have ways to automate conditioning plots.
Since you were looking at ggpot2, let's consider that problem. You
don't really show enough data to provide a useful demonstration, but
let's see if we can capture the essential structure.

 I want to create a plot where the colors of the hits represent the Product
 (A,B,C..), the character represents the color (X for yellow, box for green,
 etc..), the X axis is the price and the Y axis is the number (0-5) from the
 different Stores (A,B,C,D). I've thought either to create a matrix of 4
 plots ( for the 4 stores) or in some creative way combine them into one
 plot?

The first step is to melt the data so that Store becomes a factor
variable and its corresponding values are assigned to another
variable. To that end, one can invoke the very useful melt() function
in the reshape2 package:

library('reshape2')
mdata - melt(mydata, id = c('Product', 'Color', 'Price'))

This creates a new data frame with variables Product, Color, Price,
variable and value. variable contains StoreA, ... StoreD as factor
levels and value is a numeric variable consisting of the corresponding
values. For its structure, see
str(mdata)

If you want to change StoreA - StoreD to A - D, say, then you could
optionally do

mdata$variable - factor(mdata$variable, labels = LETTERS[1:4])


Assuming that you've done enough reading to understand what aesthetics
are about, the problem is essentially this:

x = Price
y = value
shape = Color
faceting variable = variable (the stores)

So a template of a ggplot2 graph for the melted data might look something like

library('ggplot2')
ggplot(mdata, aes(x = Price, y = value, shape = Color)) +
geom_point() +
facet_wrap( ~ variable, ncol = 2) +
   scale_shape_manual('Color', breaks = levels(mdata$Color),
 values = c(4, 0, 2, 8),
 labels = c('Blue',
'Green', 'Red', 'Yellow'))

This assigns  x, box, triangle and asterisk as shapes via their
numeric codes (see Hadley Wickham's ggplot2 book, p. 197 for the
reference). The labels = argument lets you change the letters B, G, R,
Y (which would comprise the default labels) to something more
evocative.

If for some odd reason you wanted to add corresponding colors to the
shapes, you could also do that, as follows:

ggplot(mdata, aes(x = Price, y = value, shape = Color, colour = Color)) +
geom_point() +
facet_wrap( ~ variable, ncol = 2) +
   scale_shape_manual('Color', breaks = levels(mdata$Color),
 values = c(4, 0, 2, 8),
 labels = c('Blue',
'Green', 'Red', 'Yellow')) +
   scale_colour_manual('Color', breaks = levels(mdata$Color),
 values = c('blue',
'green', 'red', 'yellow'),
 labels = c('Blue',
'Green', 'Red', 'Yellow'))

This should color the shapes in the graph and provide one (merged)
legend with colored shapes as symbols.

HTH,
Dennis


On Tue, Oct 25, 2011 at 11:48 PM, RanRL rnr...@gmail.com wrote:
 Hi everyone,

 I have some data about a market research which I want to arrange in one plot
 for easy viewing,
 the data looks something like:

 Product        Color        StoreA  StoreB  StoreC  StoreD      Price

 ProdA           R               NA        4.33     2         4.33         35
                  G                NA        4.33     2         4.33
 35
                   B               NA        4.33     2         3.76
 58
                  Y                NA         3.72    3         5.33
 23

 ProdB           B                5.44       NA      4.22      3.76        87

 ProdC           G                 4.77      3.22    4.77     2.10         65
                   B                 ...           ...     ...        ...
 ..

 And so on...

 I want to create a plot where the colors of the hits represent the Product
 (A,B,C..), the characther represent the color (X for yellow, box for green,
 etc..), the X axis is the price and the Y axis is the number (0-5) from the
 different Stores (A,B,C,D). I've thought either to create a matrix of 4
 plots ( for the 4 stores) or in some creative way combine them into one
 plot?

 Please help me or point me in the right direction as to which functions to
 look into, I've been playing around with ggplot for a few days, but can't
 seem to wrap my head around it yet...

 Thanks

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Help-with-a-scatter-plot-tp3939585p3939585.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, 

Re: [R] dotPlot with diagonal

2011-10-26 Thread Dennis Murphy
Let's see: there is a dotPlot() function in each of the following packages:
BHH2, caret, mosaic, qualityTools
Would you be kind enough to share which of these packages (if any) you
are using?

Dennis

On Wed, Oct 26, 2011 at 4:25 AM, Jörg Reuter jo...@reuter.at wrote:
 Hi,
 I want draw a dotPlot. All works fine:
 (Seq - matrix(c(1, 1, 6, 1, 2, 2, 5, 4, 3, 3, 4, 3,
 4, 4, 3, 2, 5, 5, 2, 5, 6, 6, 1, 6), ncol = 6))
 dotPlot(Seq[1,], Seq[2,], main = Sequenz 1 und
 Sequenz 2, asp = 1)
 Is there a way to draw a small diagonal, begin at (0/0) to (6/6)
 (perhaps in red??) or must I use gimp? I have many dotPlots, so it is
 fine if R can do this.

 Thanks Joerg

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


  1   2   3   4   5   6   7   8   9   10   >