Re: [R-es] Sustitución NAs

2015-10-30 Thread Carlos Ortega
Hi,

Here is one way to do it.
Note that, because of the way you supplied the data, there are blanks before
and after each string. Rather than change that, instead of using
"is.na(datIn$COLEGIO)"
I had to use what you see below: datIn$COLEGIO == " NA "

#-
> Lines <- "
+ | NOMBRE  | LOCALIDAD | COLEGIO |
+ | JUAN | SANTANDER | A |
+ | ALBERTO | LA RIOJA | C  |
+ | MANUEL | MADRID | B |
+ | MARTA | MADRID | NA |
+ | IRENE | VALLADOLID | NA |
+ | LUCAS | LA RIOJA | C |
+ | LUIS | LA RIOJA | NA |
+ | ALBA | MADRID | B |
+ | ANTONIO | MADRID | NA |
+ | JOSE | VALLADOLID | D |
+ | JUAN | LA RIOJA | C |
+ "
>
> datIn <- read.table(textConnection(Lines), as.is=TRUE, header=TRUE, sep="|")
> datIn <- datIn[, -c(1,5)]
> datIn
     NOMBRE    LOCALIDAD COLEGIO
1      JUAN    SANTANDER       A
2   ALBERTO     LA RIOJA       C
3    MANUEL       MADRID       B
4     MARTA       MADRID      NA
5     IRENE   VALLADOLID      NA
6     LUCAS     LA RIOJA       C
7      LUIS     LA RIOJA      NA
8      ALBA       MADRID       B
9   ANTONIO       MADRID      NA
10     JOSE   VALLADOLID       D
11     JUAN     LA RIOJA       C
>
> datIn$COLEGIO <- ifelse(datIn$LOCALIDAD == " MADRID " & datIn$COLEGIO == " NA ", "B", datIn$COLEGIO)
> datIn$COLEGIO <- ifelse(datIn$LOCALIDAD == " LA RIOJA " & datIn$COLEGIO == " NA ", "C", datIn$COLEGIO)
> datIn$COLEGIO <- ifelse(datIn$LOCALIDAD == " VALLADOLID " & datIn$COLEGIO == " NA ", "D", datIn$COLEGIO)
>  datIn
     NOMBRE    LOCALIDAD COLEGIO
1      JUAN    SANTANDER       A
2   ALBERTO     LA RIOJA       C
3    MANUEL       MADRID       B
4     MARTA       MADRID       B
5     IRENE   VALLADOLID       D
6     LUCAS     LA RIOJA       C
7      LUIS     LA RIOJA       C
8      ALBA       MADRID       B
9   ANTONIO       MADRID       B
10     JOSE   VALLADOLID       D
11     JUAN     LA RIOJA       C


#-
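
A possible variation on the approach above, offered only as a sketch: if the blanks are stripped while reading (strip.white=TRUE), the " NA " strings become real NAs, so is.na() can be used, and the three ifelse() calls can be replaced by one lookup. The name "lookup" is an illustrative assumption, not part of the original post.

```r
Lines <- "
| NOMBRE  | LOCALIDAD | COLEGIO |
| JUAN | SANTANDER | A |
| ALBERTO | LA RIOJA | C  |
| MANUEL | MADRID | B |
| MARTA | MADRID | NA |
| IRENE | VALLADOLID | NA |
| LUCAS | LA RIOJA | C |
| LUIS | LA RIOJA | NA |
| ALBA | MADRID | B |
| ANTONIO | MADRID | NA |
| JOSE | VALLADOLID | D |
| JUAN | LA RIOJA | C |
"

# strip.white = TRUE removes the blanks around each field, so "NA" is
# recognised as a missing value while the text is being read
datIn <- read.table(text = Lines, sep = "|", header = TRUE,
                    as.is = TRUE, strip.white = TRUE)
datIn <- datIn[, c("NOMBRE", "LOCALIDAD", "COLEGIO")]

# Hypothetical lookup: default COLEGIO for each LOCALIDAD
lookup <- c("MADRID" = "B", "LA RIOJA" = "C", "VALLADOLID" = "D")

# Fill only the rows that are missing and have a default defined
miss <- is.na(datIn$COLEGIO) & datIn$LOCALIDAD %in% names(lookup)
datIn$COLEGIO[miss] <- unname(lookup[datIn$LOCALIDAD[miss]])
datIn
```

Rows whose LOCALIDAD has no entry in the lookup (SANTANDER here) are left untouched.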

Regards,
Carlos Ortega
www.qualityexcellence.es

On 30 October 2015 at 9:00, jose luis wrote:

>
> Hello everyone. I am posting this simple table in case someone can give me
> a hand. The COLEGIO column contains some NAs.
>
>
> | NOMBRE  | LOCALIDAD | COLEGIO |
> | JUAN | SANTANDER | A |
> | ALBERTO | LA RIOJA | C  |
> | MANUEL | MADRID | B |
> | MARTA | MADRID | NA |
> | IRENE | VALLADOLID | NA |
> | LUCAS | LA RIOJA | C |
> | LUIS | LA RIOJA | NA |
> | ALBA | MADRID | B |
> | ANTONIO | MADRID | NA |
> | JOSE | VALLADOLID | D |
> | JUAN | LA RIOJA | C |
>
>
>
>
> So, I am looking for a command that does the following: if LOCALIDAD is
> MADRID, replace any NAs in the COLEGIO variable with the letter B; for
> LA RIOJA, replace them with the letter C; and for VALLADOLID, with the
> letter D. Regards,
> Jose Luis
>
>
>
>
>
> [[alternative HTML version deleted]]
>
> ___
> R-help-es mailing list
> R-help-es@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-help-es
>





[R-es] Sustitución NAs

2015-10-30 Thread jose luis

Hello everyone. I am posting this simple table in case someone can give me a
hand. The COLEGIO column contains some NAs.


| NOMBRE  | LOCALIDAD | COLEGIO |
| JUAN | SANTANDER | A |
| ALBERTO | LA RIOJA | C  |
| MANUEL | MADRID | B |
| MARTA | MADRID | NA |
| IRENE | VALLADOLID | NA |
| LUCAS | LA RIOJA | C |
| LUIS | LA RIOJA | NA |
| ALBA | MADRID | B |
| ANTONIO | MADRID | NA |
| JOSE | VALLADOLID | D |
| JUAN | LA RIOJA | C |




So, I am looking for a command that does the following: if LOCALIDAD is MADRID,
replace any NAs in the COLEGIO variable with the letter B; for LA RIOJA,
replace them with the letter C; and for VALLADOLID, with the letter D.
Regards,
Jose Luis




   


Re: [R] ggplot2: Controlling width of line

2015-10-30 Thread Jeff Newmiller
Not sure why you are making this so complicated. In what way is the 
following not meeting your expectations?


ggplot( data=matz
  , aes( x = X1
   , y = value
   , col=X2
   , lty=X2
   , shape=X2
   , size=mylwd
   )
  ) +
   geom_line() +
   geom_point( size = 3 ) +
   scale_linetype_manual( values = ltyvect ) +
   scale_color_manual( values = colvect ) +
   scale_size_continuous( range = c( 0.1, 2 ) ) +
   theme( legend.title = element_blank() )


On Fri, 30 Oct 2015, Brian Smith wrote:


Hi,

I was trying to increase the size of certain lines in my plot (samples 'B'
and 'D' in example below). However, when I try to modify the line size, I
seem to screw up the linetypes. Also, is there a way to reflect the line
size in the legend?

Here is some sample code for illustration:

library(reshape)
library(ggplot2)
matx <- matrix(sample(1:1000),4,5)
colnames(matx) <-  LETTERS[1:5]
rownames(matx) <- 1:4

subset1 <- c('B','D')

ltyvect <- c("solid","longdash","longdash","solid","solid")
colvect <- c("red","black","orange","blue","lightblue")
lwdvect <- rep(1,ncol(matx))

## For subset of samples, increase line width size
fmakelwd <- function(set1,subset1,vals1,val2=2){
   idx <- set1 %in% subset1
   vals1[idx] <- val2
   return(vals1)
}

maty <- melt(matx)
set1 <- maty$X2
vals1 <- rep(1,length(set1))

mylwd <- fmakelwd(set1,subset1,vals1,val2=1.5)
matz <- data.frame(maty,mylwd)

# code without trying to modify line size

p <- ggplot(data=matz,aes(x=X1, y = value,col=X2,lty=X2,shape=X2))
p <- p + geom_line(aes(group = X2)) + geom_point(aes(shape =
factor(X2)),size=3) +
   scale_linetype_manual(values = ltyvect) +
   scale_color_manual(values = colvect) +
   theme(legend.title = element_blank())
p


#  modifying line size

p <- ggplot(data=matz,aes(x=X1, y =
value,col=X2,lty=X2,shape=X2,size=mylwd))
p <- p + geom_line(aes(group = X2,size=mylwd)) + geom_point(aes(shape =
factor(X2)),size=3) +
   scale_linetype_manual(values = ltyvect) +
   scale_color_manual(values = colvect) +
   scale_size(range=c(0.1, 2), guide=FALSE) +
   theme(legend.title = element_blank())
p


#


thanks!!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



---
Jeff NewmillerThe .   .  Go Live...
DCN:Basics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k



Re: [R] how to work with time of day (independent of date)

2015-10-30 Thread Jeff Newmiller
Sys.setenv( TZ="Etc/GMT+8" )

executed before converting to POSIXct works for me, though using that string 
with the tz parameter also works. You should read ?Sys.timezone. For windows, 
look at the files in C:\Program Files\R\R-3.2.2\share\zoneinfo and note that 
PST is not defined though PST8PDT is.
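
A minimal sketch of the difference between a fixed-offset zone and a daylight-saving-aware one, assuming the system zoneinfo database provides both "Etc/GMT+8" and "PST8PDT". The July timestamp is made up for illustration.

```r
# "Etc/GMT+8" uses the POSIX sign convention: it is 8 hours BEHIND UTC
# all year round, i.e. Pacific Standard Time with no daylight saving
x <- as.POSIXct("2015-07-01 12:00:00", tz = "Etc/GMT+8")
format(x, "%z")   # "-0800", even in July

# The same wall-clock time in a zone that observes daylight saving
y <- as.POSIXct("2015-07-01 12:00:00", tz = "PST8PDT")

# The two instants are one hour apart because PST8PDT is on daylight
# time in July
difftime(x, y, units = "hours")
```

Passing tz= explicitly, as here, avoids changing the TZ environment variable for the whole session.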
Sent from my phone. Please excuse my brevity.

On October 30, 2015 12:30:22 PM PDT, Clint Bowman  wrote:
>Bill,
>
>Your final words, "changes in spring and fall", remind me of a problem 
>I have yet to solve.  Most of my data is logged in standard time (no 
>daylight times) but often I see the note "daylight time encountered 
>switching to UTC" even when I've specified tz="PST".
>
>I hope I've been missing something simple--any suggestions?
>
>TIA
>
>Clint
>
>Clint Bowman   INTERNET:   cl...@ecy.wa.gov
>Air Quality ModelerINTERNET:   cl...@math.utah.edu
>Department of Ecology  VOICE:  (360) 407-6815
>PO Box 47600   FAX:(360) 407-7534
>Olympia, WA 98504-7600
>
> USPS:   PO Box 47600, Olympia, WA 98504-7600
> Parcels:300 Desmond Drive, Lacey, WA 98503-1274
>
>On Fri, 30 Oct 2015, William Dunlap wrote:
>
>> You can use difftime objects to get the amount of time since the start of
>> the current day.  E.g.,
>>  > dateTime <- as.POSIXlt( c("2015-10-29 00:50:00",
>>  + "2015-10-29 09:30:00", "2015-10-29 21:10:00", "2015-10-30 00:50:00",
>>  + "2015-10-30 09:30:00", "2015-10-30 21:10:00", "2015-10-31 00:50:00",
>>  + "2015-10-31 10:30:00"))
>>  > date <- trunc(dateTime, units="days")
>>  > sinceMidnight <- difftime(dateTime, date, units="mins")
>>  > sinceMidnight
>>  Time differences in mins
>>  [1]   50  570 1270   50  570 1270   50  630
>>
>> I use difftime(x, y, units=) instead of the similar x-y because the latter
>> chooses the units based on how far apart x and y are, while the former
>> gives me consistent units:
>>  > dateTime[1] - date[1]
>>  Time difference of 50 mins
>>  > as.numeric(.Last.value)
>>  [1] 50
>>  > dateTime[5:6] - date[5:6]
>>  Time differences in hours
>>  [1]  9.5 21.16667
>>  > as.numeric(.Last.value)
>>  [1]  9.5 21.16667
>>
>> Depending on what you are using this for, you might want to compute time
>> since 3am of the current day so you don't get discontinuities for most
>> times when the time changes in spring and fall.
>>
>>
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>>
>> On Fri, Oct 30, 2015 at 10:35 AM, Daniel Nordlund wrote:
>>
>>> I have a data frame with date/times represented as character strings and
>>> a value at that date/time.  I want to get the mean value for each time
>>> of day, across days, and then plot time of day on the x-axis and means on
>>> the y-axis.  R doesn't appear to have a built-in time-of-day type
>>> (independent of a date), unless I have missed something. What is the best
>>> way to create a time variable so that I can aggregate and plot by time of
>>> day, with time labelled in HH:MM format?  My current approach is to convert
>>> all date/times to the same date.  I can then manage the rest of what I want
>>> with ggplot2.  But I am wondering if there is an easier/better way to
>>> deal with time of day.
>>>
>>> Here is a sample data frame.
>>>
>>> df <- structure(list(date = structure(1:8, .Label = c("2015-10-29 00:50:00",
>>> "2015-10-29 09:30:00", "2015-10-29 21:10:00", "2015-10-30 00:50:00",
>>> "2015-10-30 09:30:00", "2015-10-30 21:10:00", "2015-10-31 00:50:00",
>>> "2015-10-31 10:30:00"), class = "factor"), value = c(88L, 17L,
>>> 80L, 28L, 23L, 39L, 82L, 79L)), .Names = c("date", "value"),
>>> row.names = c(NA, -8L), class = "data.frame")
>>>
>>>
>>> Any suggestions appreciated.
>>>
>>> Dan
>>>
>>> --
>>> Daniel Nordlund
>>> Bothell, WA  USA
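
For the time-of-day question quoted above, a small base-R sketch that combines the difftime idea with an HH:MM label and then averages by time of day. The sample values are the ones from the quoted data frame; the column names "tod" and "mins" are illustrative assumptions.

```r
df <- data.frame(
  date = c("2015-10-29 00:50:00", "2015-10-29 09:30:00", "2015-10-29 21:10:00",
           "2015-10-30 00:50:00", "2015-10-30 09:30:00", "2015-10-30 21:10:00",
           "2015-10-31 00:50:00", "2015-10-31 10:30:00"),
  value = c(88L, 17L, 80L, 28L, 23L, 39L, 82L, 79L),
  stringsAsFactors = FALSE
)

# Parse in UTC so that no daylight-saving jump can shift the clock time
dateTime <- as.POSIXct(as.character(df$date), tz = "UTC")

# Time of day as an HH:MM label and as minutes since midnight
df$tod  <- format(dateTime, "%H:%M")
df$mins <- as.numeric(difftime(dateTime, trunc(dateTime, units = "days"),
                               units = "mins"))

# Mean value for each time of day, across days
means <- aggregate(value ~ tod, data = df, FUN = mean)
means
```

The numeric "mins" column is convenient as a continuous x-axis, with the HH:MM strings used only as labels.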
>>>

Re: [R] User-defined functions in dplyr

2015-10-30 Thread Axel Urbiz
So in this case, "create_bins" returns a vector and I still get the same
error.


create_bins <- function(x, nBins)
{
  Breaks <- unique(quantile(x$pred, probs = seq(0, 1, 1/nBins)))
  bin <- cut(x$pred, breaks = Breaks, include.lowest = TRUE)
  bin
}


### Using dplyr (fails)
nBins = 10
by_group <- dplyr::group_by(df, models)
res_dplyr <- dplyr::summarize(by_group, create_bins, nBins)
Error: not a vector
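
One way this might be made to work, sketched under assumptions: summarize() expects named expressions that reduce each group to a single value, whereas here every row needs its own bin, which is what mutate() on grouped data provides. The rewritten create_bins() below takes the pred vector directly, a change from the version above.

```r
library(dplyr)

set.seed(4)
df <- data.frame(pred = rnorm(100),
                 models = gl(2, 50, 100, labels = c("model1", "model2")))

# Vector in, vector out: receives the pred values of one group at a time
create_bins <- function(pred, nBins) {
  Breaks <- unique(quantile(pred, probs = seq(0, 1, 1 / nBins)))
  # as.character() sidesteps combining factors whose levels differ by group
  as.character(cut(pred, breaks = Breaks, include.lowest = TRUE))
}

nBins <- 10
res_dplyr <- df %>%
  group_by(models) %>%
  mutate(bin = create_bins(pred, nBins)) %>%
  ungroup()

head(res_dplyr)
```

The result keeps all 100 rows and adds a "bin" column computed within each model, which is what the plyr version produced.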

On Thu, Oct 29, 2015 at 8:28 PM, Jeff Newmiller 
wrote:

> You are jumping the gun (your other email did get through) and you are
> posting using HTML (which does not come through on the list). Some time
> (re)reading the Posting Guide mentioned at the bottom of all emails on this
> list seems to be in order.
>
> The error is actually quite clear. You should return a vector from your
> function, not a data frame.
>
> On October 29, 2015 4:55:19 PM MST, Axel Urbiz 
> wrote:
> >Hello,
> >
> >Sorry, resending this question as the prior was not sent properly.
> >
> >I’m using the plyr package below to add a variable named "bin" to my
> >original data frame "df" with the user-defined function "create_bins".
> >I'd
> >like to get similar results using dplyr instead, but failing to do so.
> >
> >set.seed(4)
> >df <- data.frame(pred = rnorm(100), models = gl(2, 50, 100, labels =
> >c("model1", "model2")))
> >
> >
> >### Using plyr (works fine)
> >create_bins <- function(x, nBins)
> >{
> >  Breaks <- unique(quantile(x$pred, probs = seq(0, 1, 1/nBins)))
> >  dfB <-  data.frame(pred = x$pred,
> >bin = cut(x$pred, breaks = Breaks, include.lowest =
> >TRUE))
> >  dfB
> >}
> >
> >nBins = 10
> >res_plyr <- plyr::ddply(df, plyr::.(models), create_bins, nBins)
> >head(res_plyr)
> >
> >### Using dplyr (fails)
> >
> >by_group <- dplyr::group_by(df, models)
> >res_dplyr <- dplyr::summarize(by_group, create_bins, nBins)
> >Error: not a vector
> >
> >
> >Any help would be much appreciated.
> >
> >Best,
> >Axel.
> >
>
>


[R-es] Seleccionar dos tklistbox a la vez

2015-10-30 Thread Jesús Para Fernández
Hi,

Using the tcltk library, I want to be able to keep a selection in two
listboxes at the same time, but it only lets me select in one or the other,
not both at once.

I have tried changing selectmode to its different options (multiple,
browse, single), but when I go to select in the second one, the selection in
the first is cleared.

The code:

library(tcltk)
tt<-tktoplevel()
datos<<-c("uno","sdos","tres")

tl<-tklistbox(tt,height=4,selectmode="browse",background="white")
for(i in 1:3){tkinsert(tl,"end",datos[i])}
tkselection.set(tl,0)
tkpack(tl)

tl2<-tklistbox(tt,height=4,selectmode="browse",background="white")
for(i in 1:3){tkinsert(tl2,"end",datos[i])}
tkselection.set(tl2,2)
tkpack(tl2)


Thanks!!
Jesús
  

Re: [R-es] Seleccionar dos tklistbox a la vez

2015-10-30 Thread Jesús Para Fernández

It was something silly: the fix is to add

exportselection="FALSE"

Thanks anyway
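
A sketch of how the option might be applied to the code from the original question. The helper make_listbox() is a hypothetical name introduced here. The window is only created in an interactive session, since Tk needs a display.

```r
library(tcltk)

# Hypothetical helper: builds a listbox that keeps its own selection when
# another widget is clicked (exportselection = FALSE)
make_listbox <- function(parent, items, selected = 0) {
  lb <- tklistbox(parent, height = 4, selectmode = "browse",
                  exportselection = FALSE, background = "white")
  for (item in items) tkinsert(lb, "end", item)
  tkselection.set(lb, selected)
  tkpack(lb)
  lb
}

if (interactive()) {
  tt <- tktoplevel()
  datos <- c("uno", "sdos", "tres")
  tl  <- make_listbox(tt, datos, selected = 0)
  tl2 <- make_listbox(tt, datos, selected = 2)
}
```

With the option set on both listboxes, selecting in one no longer clears the selection in the other.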
From: j.para.fernan...@hotmail.com
To: r-help-es@r-project.org
Date: Fri, 30 Oct 2015 12:22:18 +0100
Subject: [R-es] Seleccionar dos tklistbox a la vez


Re: [R] Achieve independent fine user control of ggplot geom settings when using groups in multiple geom's

2015-10-30 Thread Hadley Wickham
I'd recommend reading the ggplot2 book - learning more about how
scales work in ggplot2 will help you understand why this isn't
possible.
Hadley

On Thu, Oct 29, 2015 at 6:31 PM, sbihorel
 wrote:
> Thank you for your reply,
>
> I may accept your point about the mapping consistency when the different
> geom's use the same data source. However, as pointed out in my example code,
> this does not have to be the case. Hence my question about the geom-specific
> control of group-dependent graphical settings.
>
> Sebastien
>
>
> On 10/29/2015 4:49 PM, Jeff Newmiller wrote:
>>
>> I think a fundamental design principle of ggplot is that mapping of values
>> to visual representation are consistent within a single plot, so reassigning
>> color mapping for different elements would not be supported.
>>
>> That being said, it is possible to explicitly control specific attributes
>> within a single geom outside of the mapping, though this usually does break
>> mappings in the legend.
>>
>>
>> On October 29, 2015 11:27:55 AM MST, sbihorel
>>  wrote:
>>>
>>> Thank you for your reply.
>>>
>>> I do not have any specific data/geom/grouping in mind, rather a
>>> framework in which users would just pile layer after layer of geoms
>>> on top of each other, each defined with specific settings. A minimum
>>> realistic scenario would be a geom_point followed by a geom_smooth or a
>>> geom_path using different colors...
>>>
>>> Sebastien
>>>
>>> On 10/29/2015 1:34 PM, Ista Zahn wrote:

 I would say in a word, 'no'. What you seem to be implying is that you
 want multiple color scales, multiple shape scales, etc. As far as I
 know there is no support for that in ggplot2.

 Perhaps if you show us what you're actually trying to accomplish
 someone can suggest a solution or at least a work-around.

 Best,
 Ista

 On Thu, Oct 29, 2015 at 12:26 PM, sbihorel
  wrote:
>
> Hello,
>
> Before I get to my question, I want to make clear that the topic of my
> present post is similar to posts I recently submitted to the list. Although
> I appreciate the replies I got, I believe that I did not correctly frame
> these previous posts to get to the bottom of things.
> I also want to make clear that the code example that I have inserted in this
> post is meant to illustrate my points/questions and does not reflect a
> particular interest in the data or the sequence of ggplot geom's used
> (unless otherwise mentioned). Actually, I purposefully used meaningless junk
> data, geom's sequence, and settings, so that we agree the plot is ugly and
> that we, hopefully, don't get hung up on specifics and start discussing
> the merit of one approach vs another.
>
> So here are my questions:
>
> 1- Can a user independently control the settings of each geom used in a
> ggplot call sequence when grouping is required?
>
> By control, I mean: the user defines the graphical settings (groups, symbol
> shapes, colors, fill colors, line types, size scales, and alpha) and does
> not let ggplot choose these settings from some theme default.
> By independently, I mean: the set of graphical settings can be totally
> different from one group to the next and from one geom to the next.
>
> If this fine control can be achieved, how would you go about it (please, be
> assured that I already spent hours miserably failing to get to anything
> remotely productive, so your help would be really appreciated)?
>
> library(dplyr)
> library(tidyr)
> library(ggplot2)
> set.seed(1234)
> dummy <- data.frame(dummy = numeric())
> data <- data.frame(x1 = rep(-2:2, each = 80) + rnorm(4000, sd = 0.1),
>                    g1 = rep(1:4, each = 1000))
> data <- data %>% mutate(y1 = -x1^2 + 2*x1 - 2 + g1 + rnorm(4000, sd = 0.25))
> data2 <- data %>% select(x2=x1, y2=y1, g2=g1) %>% mutate(x2=-x2)
> data3 <- data.frame(x3 = sample(seq(-2, 2, by = 0.1), 20, replace 

Re: [R] monte carlo simulations in permanova in vegan package

2015-10-30 Thread Sean Porter
Thank you Jari,

It seems now that my question is morphing more into a statistical one, and
perhaps not appropriate for R-help list, so apologies. Yes we are talking
about the latest versions of the vegan and permute packages. 

When there are an insufficient number of permutations available due to low
sample sizes apparently an alternative is to use the result given in
Anderson & Robinson (2003) regarding the asymptotic permutation of the
numerator (or denominator) of the test statistic under permutation. And I
quote from Anderson et al. 2008 "It is demonstrated that each of the sums of
squares has, under permutation, an asymptotic distribution that is a linear
form in chi-square variables, where the coefficients are actually the
eigenvalues from a PCO of the resemblance matrix. Thus, chi-square variables
can be drawn randomly and independently, using Monte Carlo sampling, and
these can be combined with the eigenvalues to construct the asymptotic
permutation distribution for each of the numerator and denominator and,
thus,  for the entire pseudo-F statistic, in the event that too few actual
unique permutations exist."

Anderson, Gorley & Clarke. 2008. PERMANOVA+ for PRIMER: Guide to software
and statistical models.
Anderson & Robinson 2003. Generalised discriminant analysis based on
distances. Australian and New Zealand Journal of Statistics. 45: 301-318

I am sure you already know this! The above is what I am trying to do in the
vegan package, though.

Apologies if I am missing something and if what you have said still applies
(that it is not appropriate to exceed the possible number of permutations). I
am not a statistician, so any help/clarity would be welcome.


Regards, sean

 


-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Jari Oksanen
Sent: 29 October 2015 03:23 PM
To: r-h...@stat.math.ethz.ch
Subject: Re: [R] monte carlo simulations in permanova in vegan package

Sean Porter  ori.org.za> writes:

> I am trying to run a PERMANOVA in the vegan package with an 
> appropriate number of permutations (see example below), ideally .
> Obviously that number of permutations does not exists so I would like 
> to use Monte Carlo permutation tests to derive the probability value, 
> as is done in the commercial package PERMANOVA+ for PRIMER. How can I 
> adapt my code so that adonis will do so ? Many thanks, Sean
[...clip...]
> 
> > permanova <- adonis(species ~ time, data = time, permutations=999,
> method="bray")
> 
> 'nperm' > set of all permutations; Resetting 'nperm'.
> 
I assume we are talking about the latest version of vegan and permute
packages. In that case you really should switch to complete enumeration if
your request exceeds the number of distinct permutations. As people have told
you, you should be satisfied with that because there are no more distinct
permutations. Alternatively, you need more data.
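
To illustrate why the request gets reset, a rough sketch: with unrestricted permutations of n observations there are n! possible orderings, so a small sample can have fewer than the 999 requested. The sample size below is a made-up value, not taken from the original data.

```r
# Hypothetical sample size, for illustration only
n_obs <- 6
n_requested <- 999

# Number of orderings of n_obs observations under free permutation (6! = 720)
n_distinct <- factorial(n_obs)

# When the request exceeds what exists, complete enumeration is all there is
exhaustive <- n_requested > n_distinct
c(n_distinct = n_distinct, exhaustive = exhaustive)
```

Restricted designs (blocks, strata) would have fewer permutations still, so this count is an upper bound.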

If by Monte Carlo you mean sampling with replacement instead of permutation,
so that the same observation can appear several times and therefore some
other unit is missing, then there are two pieces of advice:

1. You should not do so.
2. If you want to do so, you can generate your resampling matrices by hand
and use that matrix as the argument of permutations=. See the documentation
(?adonis), which explains how to do so.

Cheers, Jari Oksanen



Re: [R] How to direct R to read commands from a text file

2015-10-30 Thread Boris Steipe
I should think this thread contains all you need:
   http://stackoverflow.com/questions/18306362/run-r-script-from-command-line


B.
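
In short, the linked thread points to Rscript. The sketch below demonstrates the same command from within R so that it is self-contained; the script contents are a made-up stand-in for R_commands.txt, and it ends with quit(save = "no") rather than "yes" only to avoid writing a workspace file during the demonstration.

```r
# Write a tiny stand-in for R_commands.txt (contents are hypothetical)
script <- file.path(tempdir(), "R_commands.txt")
writeLines(c('cat("hello from script\\n")',
             'quit(save = "no")'),
           script)

# Equivalent to typing this at the bash prompt:
#   Rscript R_commands.txt
# (R CMD BATCH R_commands.txt is an older alternative that writes the
#  output to a .Rout file)
rscript <- file.path(R.home("bin"), "Rscript")
out <- system2(rscript, shQuote(script), stdout = TRUE)
out
```

From bash itself, only the one-line Rscript command in the comment is needed.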


On Oct 30, 2015, at 11:07 PM, Gregory Coats  wrote:

> All of the R commands that I want to issue are in a text file that concludes 
> with the R command quit(save = "yes"), and is called R_commands.txt. I can 
> start R, and then manually issue
> source("R_commands.txt").
> But I would prefer to issue, from the bash command line, a one line command, 
> directing R to start, execute all of the R commands in R_commands.txt, and 
> then quit. How do I do that?
> Greg Coats


Re: [R] If else

2015-10-30 Thread Ista Zahn
Using numeric for missing sounds like asking for trouble. But if you
must, something like

mydata$confusingWillCauseProblemsLater <-
  ifelse(
    is.na(mydata$sex),
    0,
    as.numeric(factor(mydata$sex,
                      levels = c("M", "F"))))

should do it.

Best,
Ista

On Fri, Oct 30, 2015 at 9:15 PM, Val  wrote:
> Hi all,
> I am trying to change character to numeric but have a problem
>
> mydata <- read.table(header=TRUE, text='
>  id  sex
>   1  NA
>   2  NA
>   3  M
>   4  F
>   5  M
>   6  F
>   7  F
> ')
>
> if sex is missing then sex=0;
> if sex is"M" then sex=1;
> if sex is"F" then sex=2;
>
> Any help please ?
>


[R] How to direct R to read commands from a text file

2015-10-30 Thread Gregory Coats
All of the R commands that I want to issue are in a text file that concludes 
with the R command quit(save = "yes"), and is called R_commands.txt. I can 
start R, and then manually issue
source("R_commands.txt").
But I would prefer to issue, from the bash command line, a one line command, 
directing R to start, execute all of the R commands in R_commands.txt, and then 
quit. How do I do that?
Greg Coats


[R] If else

2015-10-30 Thread Val
Hi all,
I am trying to change character to numeric but have a problem

mydata <- read.table(header=TRUE, text='
 id  sex
  1  NA
  2  NA
  3  M
  4  F
  5  M
  6  F
  7  F
')

if sex is missing then sex=0;
if sex is"M" then sex=1;
if sex is"F" then sex=2;

Any help please ?



Re: [R] If else

2015-10-30 Thread Val
I am trying to change mydata$sex from character to numeric.
I want the output to look like
   id  sex
    1  NA   0
    2  NA   0
    3  M    1
    4  F    2
    5  M    1
    6  F    2
    7  F    2

mydata$sex1 <- 0
if(mydata$sex =="M " ){
  mydata$sex1<-1
} else {
  mydata$sex1<-2
}

mydata$sex1

Warning message:
In if (mydata$sex == "M ") { :
  the condition has length > 1 and only the first element will be used
> mydata$sex1
[1] 2 2 2 2 2 2 2 2
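
The warning arises because if() looks at a single condition, while mydata$sex is a whole column. A minimal sketch of a vectorised alternative with nested ifelse(), assuming the data are read with stringsAsFactors = FALSE and that the values carry no trailing blank (note the "M " in the comparison above):

```r
mydata <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
 id  sex
  1  NA
  2  NA
  3  M
  4  F
  5  M
  6  F
  7  F
")

# ifelse() is evaluated element by element: missing -> 0, "M" -> 1, else 2
mydata$sex1 <- ifelse(is.na(mydata$sex), 0,
                      ifelse(mydata$sex == "M", 1, 2))
mydata
```

The is.na() test has to come first, because comparing NA with "M" yields NA rather than FALSE.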

>


On Fri, Oct 30, 2015 at 8:28 PM, Ista Zahn  wrote:

> Using numeric for missing sounds like asking for trouble. But if you
> must, something like
>
> mydata$confusingWillCauseProblemsLater <-
>   ifelse(
>     is.na(mydata$sex),
>     0,
>     as.numeric(factor(mydata$sex,
>                       levels = c("M", "F"))))
>
> should do it.
>
> Best,
> Ista
>
> On Fri, Oct 30, 2015 at 9:15 PM, Val  wrote:
> > Hi all,
> > I am trying to change character to numeric but have a problem
> >
> > mydata <- read.table(header=TRUE, text='
> >  id  sex
> >   1  NA
> >   2  NA
> >   3  M
> >   4  F
> >   5  M
> >   6  F
> >   7  F
> > ')
> >
> > if sex is missing then sex=0;
> > if sex is"M" then sex=1;
> > if sex is"F" then sex=2;
> >
> > Any help please ?
> >


Re: [R] [FORGED] If else

2015-10-30 Thread Rolf Turner

On 31/10/15 14:15, Val wrote:

Hi all,
I am trying to change character to numeric but have a problem

mydata <- read.table(header=TRUE, text='
 id  sex
  1  NA
  2  NA
  3  M
  4  F
  5  M
  6  F
  7  F
')

if sex is missing then sex=0;
if sex is"M" then sex=1;
if sex is"F" then sex=2;

Any help please ?


sex <- c(NA,NA,"M","F","M","F","F")

# 1.
match(sex,c(NA,"M","F"))-1

# 2.
as.numeric(factor(sex,exclude=NULL,levels=c(NA,"M","F")))-1

cheers,

Rolf Turner

P. S. As others have told you, converting character to numeric is highly 
ill-advised.


R. T.

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276



[R] A simple crop/clip of a png map

2015-10-30 Thread Jim Burke
I have a 5.76" x 5.75" png image which I would like to crop to some inch
size. To use as a report header (after placing some title text on it).

So how to use R to crop a nice rectangle from my image?

Thanks for your thoughts
Jim Burke



[R] Nested ANOVA yields surprising results

2015-10-30 Thread Daniel Wagenaar

Dear R users:

All textbook references that I consult say that in a nested ANOVA (e.g., 
A/B), the F statistic for factor A should be calculated as


F_A = MS_A / MS_(B within A).

But when I run this simple example:

set.seed(1)
A <- factor(rep(1:3, each=4))
B <- factor(rep(1:2, 3, each=2))
Y <- rnorm(12)
anova(lm(Y ~ A/B))

I get this result:

  Analysis of Variance Table

  Response: Y
            Df Sum Sq Mean Sq F value Pr(>F)
  A          2 0.4735 0.23675  0.2845 0.7620
  A:B        3 1.7635 0.58783  0.7064 0.5823
  Residuals  6 4.9931 0.83218

Evidently, R calculates the F value for A as MS_A / MS_Residuals.

While it is straightforward enough to calculate what I think is the 
correct result from the table, I am surprised that R doesn't give me 
that answer directly. Does anybody know if R's behavior is intentional, 
and if so, why? Equally importantly, is there a straightforward way to 
make R give the answer I expect, that is:


     Df Sum Sq Mean Sq F value Pr(>F)
  A   2 0.4735 0.23675  0.4028 0.6999

The students in my statistics class would be much happier if they didn't 
have to type things like


  a <- anova(...)
  F <- a$`Mean Sq`[1] / a$`Mean Sq`[2]
  P <- 1 - pf(F, a$Df[1], a$Df[2])

(They are not R programmers (yet).) And to be honest, I would find it 
easier to read those results directly from the table as well.
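
For reference, the hand calculation described above can be written out in full. This is only a sketch under the assumptions of this post: row 1 of the anova table is A and row 2 is A:B, and the names F_A and P_A are made up for illustration.

```r
# Same simulated data as in the example above
set.seed(1)
A <- factor(rep(1:3, each = 4))
B <- factor(rep(1:2, 3, each = 2))
Y <- rnorm(12)

a <- anova(lm(Y ~ A/B))

# Textbook nested F statistic: MS_A / MS_(B within A)
F_A <- a$`Mean Sq`[1] / a$`Mean Sq`[2]

# Upper-tail probability with (Df_A, Df_(B within A)) degrees of freedom
P_A <- pf(F_A, a$Df[1], a$Df[2], lower.tail = FALSE)

print(c(F = F_A, P = P_A))
```

The ratio uses mean squares rather than sums of squares, which is what reproduces the F value of 0.4028 in the desired table.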


Thanks,

Daniel Wagenaar

--
Daniel A. Wagenaar, PhD
Assistant Professor
Department of Biological Sciences
McMicken College of Arts and Sciences
University of Cincinnati
Cincinnati, OH 45221
Phone: +1 (513) 556-9757
Email: daniel.wagen...@uc.edu
Web: http://www.danielwagenaar.net



[R] How to add legend to 2 different data frame overplot?

2015-10-30 Thread C W
Hi,

I am trying to add a legend to an overplot, something like this:

ggplot() +
  geom_density(data = df1, aes(x = x), fill = "green", show_guide = FALSE) +
  geom_area(data = df2, aes(x = x), fill = "yellow", show_guide = FALSE) +
  scale_color_manual(values = c("green", "yellow"), labels = c('df1', 'df2'))

But the legend doesn't actually show up when I plot it.  How should I fix
this?



Re: [R] Achieve independent fine user control of ggplot geom settings when using groups in multiple geom's

2015-10-30 Thread sbihorel

Thanks Hadley,

I will certainly read your book. Unfortunately, what you just confirmed 
as the developer of ggplot means that ggplot is a non-starter for what I 
want to build. Too bad, I was starting to appreciate some of its 
advantages over lattice.


About your book, in case I do not find a proper box on which to build it 
from source, I was wondering when it would become available in hard copy.


Sebastien

On 10/30/2015 07:34, Hadley Wickham wrote:

I'd recommend reading the ggplot2 book - learning more about how
scales work in ggplot2 will help you understand why this isn't
possible.
Hadley

On Thu, Oct 29, 2015 at 6:31 PM, sbihorel
 wrote:

Thanks for your reply,

I may accept your point about the mapping consistency when the different
geom's use the same data source. However, as pointed out in my example code,
this does not have to be the case. Hence my question about the geom-specific
control of group-dependent graphical settings.

Sebastien


On 10/29/2015 4:49 PM, Jeff Newmiller wrote:

I think a fundamental design principle of ggplot is that mappings of values
to visual representation are consistent within a single plot, so reassigning
the color mapping for different elements would not be supported.

That being said, it is possible to explicitly control specific attributes
within a single geom outside of the mapping, though this usually does break
mappings in the legend.


On October 29, 2015 11:27:55 AM MST, sbihorel
 wrote:

Thank you for your reply.

I do not have any specific data/geom/grouping in mind, but rather a
framework in which users would just pile layer after layer of geoms on
top of each other, each defined with specific settings. A minimal
realistic scenario would be a geom_point followed by a geom_smooth or a
geom_path using different colors...

Sebastien

On 10/29/2015 1:34 PM, Ista Zahn wrote:

I would say in a word, 'no'. What you seem to be implying is that you
want multiple color scales, multiple shape scales, etc. As far as I
know there is no support for that in ggplot2.

Perhaps if you show us what you're actually trying to accomplish
someone can suggest a solution or at least a work-around.

Best,
Ista

On Thu, Oct 29, 2015 at 12:26 PM, sbihorel
 wrote:

Hello,

Before I get to my question, I want to make clear that the topic of my
present post is similar to posts I recently submitted to the list. Although
I appreciate the replies I got, I believe that I did not correctly frame
these previous posts to get to the bottom of things.
I also want to make clear that the code example that I have inserted in this
post is meant to illustrate my points/questions and does not reflect a
particular interest in the data or the sequence of ggplot geoms used
(except where otherwise mentioned). Actually, I purposefully used meaningless
junk data, geom sequence, and settings, so that we agree the plot is ugly and
that we, hopefully, don't get hung up on specifics and start discussing
the merit of one approach vs another.

So here are my questions:

1- Can a user independently control the settings of each geom used in a
ggplot call sequence when grouping is required?

By control, I mean: the user defines the graphical settings (groups, symbol
shapes, colors, fill colors, line types, size scales, and alpha) and does
not let ggplot choose these settings from some theme default.
By independently, I mean: the set of graphical settings can be totally
different from one group to the next and from one geom to the next.

If this fine control can be achieved, how would you go about it (please be
assured that I already spent hours miserably failing to get to anything
remotely productive, so your help would be really appreciated)?

library(dplyr)
library(tidyr)
library(ggplot2)
set.seed(1234)
dummy <- data.frame(dummy = numeric())
data <- data.frame(x1 = rep(-2:2, each = 80) + rnorm(4000, sd = 0.1),
                   g1 = rep(1:4, each = 1000))
data <- data %>% mutate(y1 = -x1^2 + 2*x1 - 2 + g1 + rnorm(4000, sd = 0.25))
data2 <- data %>% select(x2=x1, y2=y1, g2=g1) %>% mutate(x2=-x2)
data3 <- data.frame(x3 = sample(seq(-2, 2, by = 0.1), 20, replace = TRUE),
                    y3 = runif(20, min=-8, max=4),
                    g3 = rep(1:4, each = 5)) %>% group_by(g3) %>%
  arrange(x3)

gplot <- 

[R-es] Paquete que autocargue

2015-10-30 Thread Jesús Para Fernández
Hi,

I am building a package and would like to create a desktop shortcut that,
when clicked, opens R, loads the package [ library(mipaquete) ] and runs
the function inicio().

Is it possible to do this?

Thanks
  

___
R-help-es mailing list
R-help-es@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-help-es

[R] ggplot2: Controlling width of line

2015-10-30 Thread Brian Smith
Hi,

I was trying to increase the size of certain lines in my plot (samples 'B'
and 'D' in example below). However, when I try to modify the line size, I
seem to screw up the linetypes. Also, is there a way to reflect the line
size in the legend?

Here is some sample code for illustration:

library(reshape)
library(ggplot2)
matx <- matrix(sample(1:1000),4,5)
colnames(matx) <-  LETTERS[1:5]
rownames(matx) <- 1:4

subset1 <- c('B','D')

ltyvect <- c("solid","longdash","longdash","solid","solid")
colvect <- c("red","black","orange","blue","lightblue")
lwdvect <- rep(1,ncol(matx))

## For subset of samples, increase line width size
fmakelwd <- function(set1,subset1,vals1,val2=2){
idx <- set1 %in% subset1
vals1[idx] <- val2
return(vals1)
}

maty <- melt(matx)
set1 <- maty$X2
vals1 <- rep(1,length(set1))

mylwd <- fmakelwd(set1,subset1,vals1,val2=1.5)
matz <- data.frame(maty,mylwd)

# code without trying to modify line size

p <- ggplot(data=matz,aes(x=X1, y = value,col=X2,lty=X2,shape=X2))
p <- p + geom_line(aes(group = X2)) + geom_point(aes(shape =
factor(X2)),size=3) +
scale_linetype_manual(values = ltyvect) +
scale_color_manual(values = colvect) +
theme(legend.title = element_blank())
p


#  modifying line size

p <- ggplot(data=matz,aes(x=X1, y =
value,col=X2,lty=X2,shape=X2,size=mylwd))
p <- p + geom_line(aes(group = X2,size=mylwd)) + geom_point(aes(shape =
factor(X2)),size=3) +
scale_linetype_manual(values = ltyvect) +
scale_color_manual(values = colvect) +
scale_size(range=c(0.1, 2), guide=FALSE) +
theme(legend.title = element_blank())
p


#


thanks!!



Re: [R] Nested effects (was: "no subject")

2015-10-30 Thread Rolf Turner

On 31/10/15 03:32, Wagenaar, Daniel (wagenadl) wrote:

Dear R users:

All textbook references that I consult say that in a nested ANOVA
(e.g., A/B), the F statistic for factor A should be calculated as F_A =
MS_A / MS_(B within A). But when I run this simple example:

set.seed(1)
A = factor(rep(1:3, each=4))
B = factor(rep(1:2, 3, each=2))
Y = rnorm(12)
anova(lm(Y ~ A/B))

I get this result:




Analysis of Variance Table

Response: Y
          Df Sum Sq Mean Sq F value Pr(>F)
A          2 0.4735 0.23675  0.2845 0.7620
A:B        3 1.7635 0.58783  0.7064 0.5823
Residuals  6 4.9931 0.83218

Evidently, R calculates the F value for A as MS_A / MS_Residuals.
While it is straightforward enough to calculate what I think is the
correct result from the table, I am surprised that R doesn't give me
that answer directly. Does anybody know if R's behavior is intentional,
and if so, why? And, perhaps most importantly, how to get the "textbook"
result in the most straightforward way? (I'd like to be able to give my
students a simple procedure...)


The formula that you specify is based upon factor "B" being a *random* 
effect.  The lm() function handles *fixed* effects only, and thus treats 
"B" as a fixed effect --- whether this makes any sense or not is another 
story.  (IMHO only random effects make sense as nested effects.)


Kevin Wright has already told you how to get what you want/need using 
aov() and the Error() function.  This works only for balanced designs, 
essentially.  For more complicated designs you will need to dive into 
the nlme and lme4 packages.  For which you will need *lots* of patience, 
determination, and luck! :-)


cheers,

Rolf Turner

P. S. Please provide a useful *subject line* in your posts to this list.

R. T.

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276



Re: [R] User-defined functions in dplyr

2015-10-30 Thread William Dunlap
The error message is not very helpful and the stack trace is pretty
inscrutable as well
> dplyr::group_by(df, models) %>% dplyr::summarize(create_bins)
Error: not a vector
> traceback()
14: stop(list(message = "not a vector", call = NULL, cppstack = NULL))
13: .Call("dplyr_summarise_impl", PACKAGE = "dplyr", df, dots)
12: summarise_impl(.data, dots)
11: summarise_.tbl_df(.data, .dots = lazyeval::lazy_dots(...))
10: summarise_(.data, .dots = lazyeval::lazy_dots(...))
9: dplyr::summarize(., create_bins)
8: function_list[[k]](value)
7: withVisible(function_list[[k]](value))
6: freduce(value, `_function_list`)
5: `_fseq`(`_lhs`)
4: eval(expr, envir, enclos)
3: eval(quote(`_fseq`(`_lhs`)), env, env)
2: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
1: dplyr::group_by(df, models) %>% dplyr::summarize(create_bins)


It does not mean that your function, create_bins, does not return a vector;
the sum function gives the same result. help(summarize,package="dplyr")
says:
 ...: Name-value pairs of summary functions like ‘min()’, ‘mean()’,
  ‘max()’ etc.
It apparently means calls to summary functions, not summary functions
themselves.  The examples in the help file show the proper usage.

Use a call to your function and you will see it works better:
   > dplyr::group_by(df, models) %>% dplyr::summarize(create_bins(pred,nBins))
   Error: $ operator is invalid for atomic vectors
The traceback again is not very useful, because the call information was
stripped by dplyr (by the call=NULL in the call to stop()):
   > traceback()
   14: stop(list(message = "$ operator is invalid for atomic vectors",
   call = NULL, cppstack = NULL))
   13: .Call("dplyr_summarise_impl", PACKAGE = "dplyr", df, dots)
However, it is clear that the fault is in your function, which is expecting a
data.frame x with a column called pred but gets pred itself.  Change x to
xpred in the argument list and x$pred to xpred in the body of the function.

You will run into more problems because your function returns a vector
the length of its input but summarize expects a summary function - one
that returns a scalar for any size vector input.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Oct 30, 2015 at 4:04 AM, Axel Urbiz  wrote:

> So in this case, "create_bins" returns a vector and I still get the same
> error.
>
>
> create_bins <- function(x, nBins)
> {
>   Breaks <- unique(quantile(x$pred, probs = seq(0, 1, 1/nBins)))
>   bin <- cut(x$pred, breaks = Breaks, include.lowest = TRUE)
>   bin
> }
>
>
> ### Using dplyr (fails)
> nBins = 10
> by_group <- dplyr::group_by(df, models)
> res_dplyr <- dplyr::summarize(by_group, create_bins, nBins)
> Error: not a vector
>
> On Thu, Oct 29, 2015 at 8:28 PM, Jeff Newmiller 
> wrote:
>
> > You are jumping the gun (your other email did get through) and you are
> > posting using HTML (which does not come through on the list). Some time
> > (re)reading the Posting Guide mentioned at the bottom of all emails on
> this
> > list seems to be in order.
> >
> > The error is actually quite clear. You should return a vector from your
> > function, not a data frame.
> >
> >
> > On October 29, 2015 4:55:19 PM MST, Axel Urbiz 
> > wrote:
> > >Hello,
> > >
> > >Sorry, resending this question as the prior was not sent properly.
> > >
> > >I’m using the plyr package below to add a variable named "bin" to my
> > >original data frame "df" with the user-defined function "create_bins".
> > >I'd
> > >like to get similar results using dplyr instead, but failing to do so.
> > >
> > >set.seed(4)
> > >df <- data.frame(pred = rnorm(100), models = gl(2, 50, 100, labels =
> > >c("model1", "model2")))
> > >
> > >
> > >### Using plyr (works fine)
> > >create_bins <- function(x, nBins)
> > >{
> > >  Breaks <- unique(quantile(x$pred, probs = seq(0, 1, 1/nBins)))
> > >  dfB <-  data.frame(pred = x$pred,
> > >bin = cut(x$pred, breaks = Breaks, include.lowest =
> > >TRUE))
> > >  dfB
> > >}
> > >
> > >nBins = 10
> > >res_plyr <- plyr::ddply(df, plyr::.(models), create_bins, nBins)
> > >head(res_plyr)
> > >
> > >### Using dplyr (fails)
> > >
> > >by_group <- dplyr::group_by(df, models)
> > >res_dplyr <- dplyr::summarize(by_group, create_bins, nBins)
> > >Error: not a vector
> > >
> > >
> > >Any help would be much appreciated.
> > >
> > 

[R] (no subject)

2015-10-30 Thread Wagenaar, Daniel (wagenadl)
Dear R users:

All textbook references that I consult say that in a nested ANOVA (e.g., A/B), 
the F statistic for factor A should be calculated as F_A = MS_A / MS_(B within 
A). But when I run this simple example:

set.seed(1)
A = factor(rep(1:3, each=4))
B = factor(rep(1:2, 3, each=2))
Y = rnorm(12)
anova(lm(Y ~ A/B))

I get this result:

Analysis of Variance Table

Response: Y
          Df Sum Sq Mean Sq F value Pr(>F)
A          2 0.4735 0.23675  0.2845 0.7620
A:B        3 1.7635 0.58783  0.7064 0.5823
Residuals  6 4.9931 0.83218

Evidently, R calculates the F value for A as MS_A / MS_Residuals. While it is 
straightforward enough to calculate what I think is the correct result from the 
table, I am surprised that R doesn't give me that answer directly. Does anybody 
know if R's behavior is intentional, and if so, why? And, perhaps most 
importantly, how to get the "textbook" result in the most straightforward way? 
(I'd like to be able to give my students a simple procedure...)

Thanks,

Daniel Wagenaar

-- 
Daniel A. Wagenaar, PhD
Assistant Professor
Department of Biological Sciences
McMicken College of Arts and Sciences
University of Cincinnati
Cincinnati, OH 45221
Phone: +1 (513) 556-9757
Email: daniel.wagen...@uc.edu
Web: http://www.danielwagenaar.net


Re: [R] User-defined functions in dplyr

2015-10-30 Thread William Dunlap
dplyr::mutate is probably what you want instead of dplyr::summarize:

create_bins3 <- function (xpred, nBins)
{
Breaks <- unique(quantile(xpred, probs = seq(0, 1, 1/nBins)))
bin <- cut(xpred, breaks = Breaks, include.lowest = TRUE)
bin
}
dplyr::group_by(df, models) %>% dplyr::mutate(Bin=create_bins3(pred,nBins))
#Source: local data frame [100 x 3]
#Groups: models [2]
#
# pred models   Bin
#(dbl) (fctr)(fctr)
#1   0.2167549 model1 (0.167,0.577]
#2  -0.5424926 model1   (-0.869,-0.481]
...
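
If dplyr is not available, a similar per-group binning can be sketched in base R with ave(). This is an alternative under stated assumptions, not part of the dplyr answer above: bin_id is a made-up column name, and it stores the integer bin index rather than the interval label.

```r
# Data as in the original question
set.seed(4)
df <- data.frame(pred = rnorm(100),
                 models = gl(2, 50, 100, labels = c("model1", "model2")))
nBins <- 10

create_bins3 <- function(xpred, nBins) {
  Breaks <- unique(quantile(xpred, probs = seq(0, 1, 1/nBins)))
  cut(xpred, breaks = Breaks, include.lowest = TRUE)
}

# ave() applies FUN within each level of models and returns a vector
# aligned with the original rows. as.integer() makes explicit that only
# the bin index is kept, since the interval labels differ between groups.
df$bin_id <- ave(df$pred, df$models,
                 FUN = function(v) as.integer(create_bins3(v, nBins)))

# Cross-tabulate to see how many rows fall in each bin per model
print(table(df$models, df$bin_id))
```

Like mutate(), ave() returns one value per input row, which is why it suits a function that returns a vector the length of its input.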


Bill Dunlap
TIBCO Software
wdunlap tibco.com


[R] Error: Invalid First Argument in DPlyr

2015-10-30 Thread Abraham Mathew
I'm getting an "invalid first argument" error for the following. However,
con is an actual connection and is set up properly. So what does this error
actually refer to?

library(dplyr)
con <- RSQLServer::src_sqlserver("***", database = "***")

myData <- con %>%
  tbl("table") %>%
  group_by( work_dt, campaign, ad_group, matchtype, keyword ) %>%
  select( work_dt, campaign, ad_group, matchtype, keyword,
impressions, clicks, cost ) %>%
  filter(site_id %in% c(6932,6946,6948,6949,6951,6952,6953,6954,
6955,6964,6978,6979,7061,7260,7272,7329,
7791,7794,7850,7858,7983)) %>%
  filter(work_dt >= as.Date("2014-10-01 00:00:00") & work_dt <
as.Date("2014-10-02 00:00:00")) %>%
  summarise(
sum_impressions = sum(impressions),
sum_clicks = sum(clicks),
sum_cost = sum(cost),
  ) %>%
  collect()

This code produces:

Error in exists(name, env) : invalid first argument



> exists(con)
Error in exists(con) : invalid first argument
> exists("con")
[1] TRUE
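
The two calls to exists() hint at what the message means: exists() wants the name of an object as a character string, not the object itself. A small sketch, using a plain list as a hypothetical stand-in for the connection (it does not diagnose the dplyr pipeline above):

```r
# Hypothetical stand-in; the real con is an RSQLServer source
con <- list(dummy = TRUE)

# Passing the object itself raises an error, captured here as a string
msg <- tryCatch(exists(con), error = function(e) conditionMessage(e))
print(msg)

# Passing the object's name as a string works
found <- exists("con")
print(found)
```

So the error may come from some internal code calling exists() with something that is not a string, rather than from con being invalid.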





-- 


Abraham Mathew
Data Ninja and Statistical Modeler
Minneapolis, MN
720-648-0108
@abmathewks
Analytics_Blog



Re: [R] (no subject)

2015-10-30 Thread Kevin Wright
Maybe you want

summary(aov(Y ~ A + Error(A:B)))
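
A sketch of how this call relates to the textbook F statistic, reusing the data from the original post. The names fit, tab, F_A and F_manual are illustrative only, and aov() may warn that the Error() model is singular for this formula.

```r
# Data from the original post
set.seed(1)
A <- factor(rep(1:3, each = 4))
B <- factor(rep(1:2, 3, each = 2))
Y <- rnorm(12)

# A is tested in the A:B error stratum instead of against the residuals
fit <- aov(Y ~ A + Error(A:B))
s <- summary(fit)
print(s)

# The first stratum of the summary is the A:B stratum;
# its table holds the F test for A
tab <- s[[1]][[1]]
F_A <- tab[["F value"]][1]

# Same quantity computed by hand from the lm() fit
a <- anova(lm(Y ~ A/B))
F_manual <- a[["Mean Sq"]][1] / a[["Mean Sq"]][2]
print(c(aov = F_A, manual = F_manual))
```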

Kevin


On Fri, Oct 30, 2015 at 9:32 AM, Wagenaar, Daniel (wagenadl) <
wagen...@ucmail.uc.edu> wrote:

> Dear R users:
>
> All textbook references that I consult say that in a nested ANOVA (e.g.,
> A/B), the F statistic for factor A should be calculated as F_A = MS_A /
> MS_(B within A). But when I run this simple example:
>
> set.seed(1)
> A = factor(rep(1:3, each=4))
> B = factor(rep(1:2, 3, each=2))
> Y = rnorm(12)
> anova(lm(Y ~ A/B))
>
> I get this result:
>
> Analysis of Variance Table
>
> Response: Y
>   Df Sum Sq Mean Sq F value Pr(>F)
> A  2 0.4735 0.23675  0.2845 0.7620
> A:B3 1.7635 0.58783  0.7064 0.5823
> Residuals  6 4.9931 0.83218
>
> Evidently, R calculates the F value for A as MS_A / MS_Residuals. While it
> is straightforward enough to calculate what I think is the correct result
> from the table, I am surprised that R doesn't give me that answer directly.
> Does anybody know if R's behavior is intentional, and if so, why? And,
> perhaps most importantly, how to get the "textbook" result in the most
> straightforward way? (I'd like to be able to give my students a simple
> procedure...)
>
> Thanks,
>
> Daniel Wagenaar
>
> --
> Daniel A. Wagenaar, PhD
> Assistant Professor
> Department of Biological Sciences
> McMicken College of Arts and Sciences
> University of Cincinnati
> Cincinnati, OH 45221
> Phone: +1 (513) 556-9757
> Email: daniel.wagen...@uc.edu
> Web: http://www.danielwagenaar.net
>



-- 
Kevin Wright



Re: [R] how to work with time of day (independent of date)

2015-10-30 Thread Daniel Nordlund

On 10/30/2015 11:17 AM, Mark Leeds wrote:

Daniel: Just to complete my solution, here's the code for doing the
mean. Didn't expect this to take 3 emails !!! Have a good weekend.

temp <- tapply(f$value, f$justtimes, mean)
finalDF <- data.frame(chrontimes = times(rownames(temp)), values = temp)
plot(values ~ chrontimes, data = finalDF)





On Fri, Oct 30, 2015 at 2:09 PM, Mark Leeds wrote:

Hi Daniel: I forgot that you wanted the mean so my code doesn't do
exactly what you asked for but you can use jim's code for that part.
His substring approach is also good but maybe
the chron approach is more general ? Sorry for confusion.




On Fri, Oct 30, 2015 at 2:07 PM, Mark Leeds wrote:

Hi Daniel:  Assuming that you don't have to deal with time
zones, then you can use a chron object, which has a separate
field for the time.  See below for how to convert to just times.
I sent privately in order to not keep others from sending since
there may be other ways. But, if you're okay with just this,
then you can just send to list to close out thread. No credit
needed. All the best.


library(chron)

f <- structure(list(date = structure(1:8, .Label = c("2015-10-29 00:50:00",
"2015-10-29 09:30:00", "2015-10-29 21:10:00", "2015-10-30 00:50:00",
"2015-10-30 09:30:00", "2015-10-30 21:10:00", "2015-10-31 00:50:00",
"2015-10-31 10:30:00"), class = "factor"), value = c(88L, 17L,
80L, 28L, 23L, 39L, 82L, 79L)), .Names = c("date", "value"),
row.names = c(NA, -8L), class = "data.frame")

print(f)

f$dateandtimes <-
as.chron(as.POSIXct(as.character(f$date), format = "%Y-%m-%d %H:%M:%S"))
print(f)

f$justtimes <- times(as.numeric(f$dateandtimes) %% 1)
print(f)

plot(value ~ justtimes, data = f)

On Fri, Oct 30, 2015 at 1:35 PM, Daniel Nordlund wrote:

I have a data frame with date/times represented as
character strings and a value at that date/time.  I
want to get the mean value for each time of day, across
days, and then plot time of day on the x-axis and means on
the y-axis.  R doesn't appear to have a built-in time-of-day
type (independent of a date), unless I have missed
something. What is the best way to create a time variable so
that I can aggregate and plot by time of day, with time
labelled in HH:MM format?  My current approach is to convert
all date/times to the same date.  I can then manage the rest
of what I want with ggplot2.  But I am wondering if there
is an easier/better way to deal with time of day.

Here is a sample data frame.

df <- structure(list(date = structure(1:8, .Label = c("2015-10-29 00:50:00",
"2015-10-29 09:30:00", "2015-10-29 21:10:00", "2015-10-30 00:50:00",
"2015-10-30 09:30:00", "2015-10-30 21:10:00", "2015-10-31 00:50:00",
"2015-10-31 10:30:00"), class = "factor"), value = c(88L, 17L,
80L, 28L, 23L, 39L, 82L, 79L)), .Names = c("date", "value"),
row.names = c(NA, -8L), class = "data.frame")


Any suggestions appreciated.

Dan

--
Daniel Nordlund
Bothell, WA  USA







Thanks to all who responded (both on and off list).  Several useful 
suggestions were presented.  It looks like using the chron package may 
get me what I want, but I will play with all the solutions to see what 
works best for me.



Dan

--
Daniel Nordlund
Bothell, WA  USA



Re: [R] how to work with time of day (independent of date)

2015-10-30 Thread Clint Bowman

Bill,

Your final words, "changes in spring and fall", remind me of a problem 
I have yet to solve.  Most of my data is logged in standard time (no 
daylight time) but often I see the note "daylight time encountered 
switching to UTC" even when I've specified tz="PST".


I hope I've been missing something simple--any suggestions?

TIA

Clint

Clint BowmanINTERNET:   cl...@ecy.wa.gov
Air Quality Modeler INTERNET:   cl...@math.utah.edu
Department of Ecology   VOICE:  (360) 407-6815
PO Box 47600FAX:(360) 407-7534
Olympia, WA 98504-7600

USPS:   PO Box 47600, Olympia, WA 98504-7600
Parcels:300 Desmond Drive, Lacey, WA 98503-1274

On Fri, 30 Oct 2015, William Dunlap wrote:


You can use difftime objects to get the amount of time since the start of
the current day.  E.g.,
 > dateTime <- as.POSIXlt( c("2015-10-29 00:50:00",
 + "2015-10-29 09:30:00", "2015-10-29 21:10:00", "2015-10-30 00:50:00",
 + "2015-10-30 09:30:00", "2015-10-30 21:10:00", "2015-10-31 00:50:00",
 + "2015-10-31 10:30:00"))
 > date <- trunc(dateTime, units="days")
 > sinceMidnight <- difftime(dateTime, date, units="mins")
 > sinceMidnight
 Time differences in mins
 [1]   50  570 1270   50  570 1270   50  630

I use difftime(x, y, units=) instead of the similar x-y because the latter
chooses the units based on how far apart x and y are, while the former
gives me consistent units:
 > dateTime[1] - date[1]
 Time difference of 50 mins
 > as.numeric(.Last.value)
 [1] 50
 > dateTime[5:6] - date[5:6]
 Time differences in hours
 [1]  9.5 21.16667
 > as.numeric(.Last.value)
 [1]  9.5 21.16667

Depending on what you are using this for, you might want to compute time
since 3am of the current day so you don't get discontinuities for most
times when the time changes in spring and fall.



Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Oct 30, 2015 at 10:35 AM, Daniel Nordlund 
wrote:


I have a data frame with date/times represented as charaacter strings and
and a value at that date/time.  I want to get the mean value for each time
of day, across days, and then plot time of day on the x-axis and means on
the y-axis.  R doesn't appear to have a built-in time of day time type
(independent of a date), unless I have missed something. What is the best
way to create a time variable so that I can aggregate and plot by time of
day, with time labelled in HH:MM format.  My current approach is to convert
all date/times to the same date.  I can then manage the rest of what I want
with ggplot2.  But I am  wondering if there is an easier/better way to do
deal with time of day.

Here is a sample data frame.

df <- structure(list(date = structure(1:8, .Label = c("2015-10-29
00:50:00",
"2015-10-29 09:30:00", "2015-10-29 21:10:00", "2015-10-30 00:50:00",
"2015-10-30 09:30:00", "2015-10-30 21:10:00", "2015-10-31 00:50:00",
"2015-10-31 10:30:00"), class = "factor"), value = c(88L, 17L,
80L, 28L, 23L, 39L, 82L, 79L)), .Names = c("date", "value"), row.names =
c(NA,
-8L), class = "data.frame")


Any suggestions appreciated.

Dan

--
Daniel Nordlund
Bothell, WA  USA

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] Error: Invalid First Argument in DPlyr

2015-10-30 Thread Jeff Newmiller
You need to divide and conquer: find out which step is breaking the pipe by
terminating it early at various points. If the problem is still not clear
once you know which step is broken, then give us a reproducible example.

I am not familiar with RSQLServer specifically, but the version of dplyr that I 
have installed (0.4.3) does not have a variant of the tbl function that is 
adapted to it.
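
A minimal sketch of the divide-and-conquer idea, written in base R so that it
runs without a database. Each pipeline stage is a plain function (hypothetical
stand-ins for the dplyr verbs in the question), and the helper
first_failing_step (a made-up name) applies them one at a time and reports
where the first error occurs. The toy data frame and its values are
assumptions for illustration; this does not claim to reproduce the RSQLServer
error.

```r
# Toy data standing in for the remote table (assumption: column names are
# borrowed from the question, values are invented).
toy <- data.frame(
  work_dt = as.Date(c("2014-10-01", "2014-10-01", "2014-10-02")),
  site_id = c(6932, 6946, 9999),
  clicks  = c(3, 5, 7)
)

# Each stage is an ordinary function of a data frame, in the same order
# as the verbs in the question (select before filter).
steps <- list(
  select    = function(d) d[, c("work_dt", "clicks"), drop = FALSE],
  filter    = function(d) subset(d, site_id %in% c(6932, 6946)),
  summarise = function(d) aggregate(clicks ~ work_dt, data = d, FUN = sum)
)

# Apply the stages in order and stop at the first one that raises an error.
first_failing_step <- function(data, steps) {
  for (nm in names(steps)) {
    result <- tryCatch(steps[[nm]](data), error = function(e) e)
    if (inherits(result, "error")) {
      return(list(step = nm, message = conditionMessage(result)))
    }
    data <- result
  }
  list(step = NA_character_, message = NA_character_)
}

res <- first_failing_step(toy, steps)
print(res$step)
print(res$message)
```

In this toy version the stage named "filter" is the one reported, because the
earlier "select" stage dropped site_id. Whether the same ordering matters for
the error in the question is not established here.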
---
Jeff Newmiller                        The     .       .  Go Live...
DCN:                                  Basics: ##.#.   ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.   #.O#.  with
/Software/Embedded Controllers)               .OO#.   .OO#.  rocks...1k
---
Sent from my phone. Please excuse my brevity.

On October 30, 2015 7:07:45 AM PDT, Abraham Mathew  
wrote:
>I'm getting an "invalid first argument" error for the following.
>However,
>con is an actual connection and is set up properly. So what does this
>error
>actually refer to?
>
>library(dplyr)
>con <- RSQLServer::src_sqlserver("***", database = "***")
>
>myData <- con %>%
>  tbl("table") %>%
>  group_by( work_dt, campaign, ad_group, matchtype, keyword ) %>%
>  select( work_dt, campaign, ad_group, matchtype, keyword,
>impressions, clicks, cost ) %>%
>  filter(site_id %in% c(6932,6946,6948,6949,6951,6952,6953,6954,
>6955,6964,6978,6979,7061,7260,7272,7329,
>7791,7794,7850,7858,7983)) %>%
>  filter(work_dt >= as.Date("2014-10-01 00:00:00") & work_dt <
>as.Date("2014-10-02 00:00:00")) %>%
>  summarise(
>sum_impressions = sum(impressions),
>sum_clicks = sum(clicks),
>sum_cost = sum(cost),
>  ) %>%
>  collect()
>
>This code produces:
>
>Error in exists(name, env) : invalid first argument
>
>
>
>> exists(con)
>Error in exists(con) : invalid first argument
>> exists("con")
>[1] TRUE



[R] how to work with time of day (independent of date)

2015-10-30 Thread Daniel Nordlund
I have a data frame with date/times represented as character strings
and a value at each date/time.  I want to get the mean value for
each time of day, across days, and then plot time of day on the x-axis
and means on the y-axis.  R doesn't appear to have a built-in time-of-day
type (independent of a date), unless I have missed something.
What is the best way to create a time variable so that I can aggregate
and plot by time of day, with time labelled in HH:MM format?  My current
approach is to convert all date/times to the same date.  I can then
manage the rest of what I want with ggplot2.  But I am wondering if
there is an easier/better way to deal with time of day.


Here is a sample data frame.

df <- structure(list(date = structure(1:8, .Label = c("2015-10-29 00:50:00",
"2015-10-29 09:30:00", "2015-10-29 21:10:00", "2015-10-30 00:50:00",
"2015-10-30 09:30:00", "2015-10-30 21:10:00", "2015-10-31 00:50:00",
"2015-10-31 10:30:00"), class = "factor"), value = c(88L, 17L,
80L, 28L, 23L, 39L, 82L, 79L)), .Names = c("date", "value"), row.names = c(NA,
-8L), class = "data.frame")


Any suggestions appreciated.

Dan

--
Daniel Nordlund
Bothell, WA  USA



Re: [R] monte carlo simulations in permanova in vegan package

2015-10-30 Thread Sean Porter
Thank you Jari,

It seems now that my question is morphing more into a statistical one, and
is perhaps not appropriate for the R-help list, so apologies. Yes, we are
talking about the latest versions of the vegan and permute packages.

When there is an insufficient number of permutations available due to low
sample sizes, an alternative is apparently to use the result given in
Anderson & Robinson (2003) regarding the asymptotic distribution of the
numerator (or denominator) of the test statistic under permutation. I
quote from Anderson et al. (2008): "It is demonstrated that each of the sums of
squares has, under permutation, an asymptotic distribution that is a linear
form in chi-square variables, where the coefficients are actually the
eigenvalues from a PCO of the resemblance matrix. Thus, chi-square variables
can be drawn randomly and independently, using Monte Carlo sampling, and
these can be combined with the eigenvalues to construct the asymptotic
permutation distribution for each of the numerator and denominator and,
thus,  for the entire pseudo-F statistic, in the event that too few actual
unique permutations exist."

Anderson, Gorley & Clarke. 2008. PERMANOVA+ for PRIMER: Guide to software
and statistical models.
Anderson & Robinson 2003. Generalised discriminant analysis based on
distances. Australian and New Zealand Journal of Statistics. 45: 301-318

I am sure you already know this! The above is what I am trying to do in the
vegan package, though.

Apologies if I am missing something and if what you have said still applies
(that it is not appropriate to exceed the possible number of permutations).
I am not a statistician, so any help or clarity would be welcome.


Regards, sean

 


-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Jari Oksanen
Sent: 29 October 2015 03:23 PM
To: r-h...@stat.math.ethz.ch
Subject: Re: [R] monte carlo simulations in permanova in vegan package

Sean Porter  ori.org.za> writes:

> I am trying to run a PERMANOVA in the vegan package with an 
> appropriate number of permutations (see example below), ideally . 
> Obviously that number of permutations does not exist so I would like 
> to use Monte Carlo permutation tests to derive the probability value, 
> as is done in the commercial package PERMANOVA+ for PRIMER. How can I 
> adapt my code so that adonis will do so ? Many thanks, Sean
[...clip...]
> 
> > permanova <- adonis(species ~ time, data = time, permutations=999,
> method="bray")
> 
> 'nperm' > set of all permutations; Resetting 'nperm'.
> 
I assume we are talking about the latest version of vegan and permute
packages. In that case you really should switch to complete enumeration if
your request exceeds the number of distinct permutations. As people have told
you, you should be satisfied with that because there are no more distinct
permutations. Alternatively, you need more data.

If by Monte Carlo you mean sampling with replacement instead of
permutation, so that the same observation can appear several times and
therefore some other unit is missing, then there are two pieces of advice:

1. You should not do so.
2. If you want to do so, you can generate your resampling matrices by hand
and use that matrix as the argument of permutations=. See the documentation
(?adonis), which tells how to do so.
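
As a rough illustration of point 2 only (point 1 still stands), a resampling
matrix can be built by hand in base R. The sketch below assumes a
hypothetical design with 6 sampling units and 999 resamples; each row is one
set of row indices drawn with replacement. The adonis call is left as a
comment because it needs vegan and the poster's own data, and ?adonis should
be checked for the exact form the permutations= matrix must take.

```r
set.seed(42)    # only so the illustration is repeatable
n_obs  <- 6     # hypothetical number of sampling units
n_sets <- 999   # number of resampled index sets

# Each row holds one resample of the indices 1..n_obs, drawn WITH
# replacement, so an index may repeat and others may be absent.
resamples <- t(replicate(n_sets, sample.int(n_obs, n_obs, replace = TRUE)))

dim(resamples)

# Hypothetical use, following the advice above (requires vegan and your data):
# adonis(species ~ time, data = time, permutations = resamples, method = "bray")
```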

Cheers, Jari Oksanen



Re: [R] how to work with time of day (independent of date)

2015-10-30 Thread jim holtman
Is this what you want?

> df <- structure(list(date = structure(1:8, .Label = c("2015-10-29 00:50:00",
+ "2015-10-29 09:30:00", "2015-10-29 21:10:00", "2015-10-30 00:50:00",
+ "2015-10-30 09:30:00", "2015-10-30 21:10:00", "2015-10-31 00:50:00",
+ "2015-10-31 10:30:00"), class = "factor"), value = c(88L, 17L,
+ 80L, 28L, 23L, 39L, 82L, 79L)), .Names = c("date", "value"), row.names = c(NA,
+ -8L), class = "data.frame")
>
> # extract just the time and summarize by it
> tapply(df$value, substring(df$date, 12, 16), mean)
00:50 09:30 10:30 21:10
 66.0  20.0  79.0  59.5
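
For the plotting half of the question, one possible follow-up is sketched
below. It assumes the same sample data (rebuilt here as a plain data frame),
keeps the HH:MM string as the grouping key, and adds a numeric
minutes-since-midnight column so the x-axis can be spaced by time while
still being labelled HH:MM. The column names tod and minutes are
illustrative choices, and the base-graphics call is guarded so it only runs
in an interactive session.

```r
df <- data.frame(
  date = c("2015-10-29 00:50:00", "2015-10-29 09:30:00", "2015-10-29 21:10:00",
           "2015-10-30 00:50:00", "2015-10-30 09:30:00", "2015-10-30 21:10:00",
           "2015-10-31 00:50:00", "2015-10-31 10:30:00"),
  value = c(88L, 17L, 80L, 28L, 23L, 39L, 82L, 79L),
  stringsAsFactors = FALSE
)

# Time of day as an HH:MM string (characters 12 to 16 of the timestamp).
df$tod <- substring(df$date, 12, 16)

# Mean value per time of day, as a data frame rather than a named vector.
means <- aggregate(value ~ tod, data = df, FUN = mean)

# Numeric position for plotting: minutes since midnight.
hm <- do.call(rbind, strsplit(as.character(means$tod), ":", fixed = TRUE))
means$minutes <- as.integer(hm[, 1]) * 60L + as.integer(hm[, 2])

print(means)

if (interactive()) {
  plot(means$minutes, means$value, type = "b", xaxt = "n",
       xlab = "Time of day", ylab = "Mean value")
  axis(1, at = means$minutes, labels = means$tod)
}
```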



Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.




Re: [R] how to work with time of day (independent of date)

2015-10-30 Thread William Dunlap
You can use difftime objects to get the amount of time since the start of
the current day.  E.g.,
  > dateTime <- as.POSIXlt( c("2015-10-29 00:50:00",
  + "2015-10-29 09:30:00", "2015-10-29 21:10:00", "2015-10-30 00:50:00",
  + "2015-10-30 09:30:00", "2015-10-30 21:10:00", "2015-10-31 00:50:00",
  + "2015-10-31 10:30:00"))
  > date <- trunc(dateTime, units="days")
  > sinceMidnight <- difftime(dateTime, date, units="mins")
  > sinceMidnight
  Time differences in mins
  [1]   50  570 1270   50  570 1270   50  630

I use difftime(x, y, units=) instead of the similar x-y because the latter
chooses the units based on how far apart x and y are, while the former gives
me consistent units:
  > dateTime[1] - date[1]
  Time difference of 50 mins
  > as.numeric(.Last.value)
  [1] 50
  > dateTime[5:6] - date[5:6]
  Time differences in hours
  [1]  9.5 21.16667
  > as.numeric(.Last.value)
  [1]  9.5 21.16667

Depending on what you are using this for, you might want to compute time
since 3am of the current day so you don't get discontinuities for most times
when the time changes in spring and fall.
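
A small sketch of the 3am idea, under the assumption that the data are in
the "America/Los_Angeles" zone, where daylight time ended on 2015-11-01. The
two timestamps are invented for the comparison; the 3am reference is built
from the calendar date rather than by adding three hours to midnight, since
the latter would be off on a changeover day.

```r
tz <- "America/Los_Angeles"   # assumed zone for this illustration
dateTime <- as.POSIXct(c("2015-10-31 09:30:00", "2015-11-01 09:30:00"), tz = tz)

# Start of each calendar day in local time.
midnight <- as.POSIXct(trunc(dateTime, units = "days"))

# 03:00 local time on the same calendar day, built from the date string.
threeAM <- as.POSIXct(paste(format(dateTime, "%Y-%m-%d"), "03:00:00"), tz = tz)

sinceMidnight <- difftime(dateTime, midnight, units = "mins")
since3am      <- difftime(dateTime, threeAM,  units = "mins")

as.numeric(sinceMidnight)
as.numeric(since3am)
```

On the day the clocks fall back, 09:30 is 630 minutes after midnight instead
of the usual 570, but it is 390 minutes after 3am on both days.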



Bill Dunlap
TIBCO Software
wdunlap tibco.com




Re: [R] how to work with time of day (independent of date)

2015-10-30 Thread William Dunlap
I get confused by this also, but I believe your time zone is US/Pacific,
which specifies both the offset from UTC and the dates on which we switch
between 'standard' (winter) and 'daylight savings' (summer) time.  I think
you would have to create a new time zone entry that is always 8 hours behind
UTC, or whatever, if you want to use standard time at all times.

I usually lie and use tz="UTC" when using data in local standard time (e.g.,
tide tables in the US).
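
Two ways of keeping standard-time data away from daylight-time rules are
sketched below. The first assumes the system's time zone database provides
the fixed-offset zone "Etc/GMT+8", which is 8 hours behind UTC all year (the
sign in these zone names is inverted). The second is the tz="UTC" approach
described above. The timestamps are invented and fall on the 2015 spring
changeover date in the US.

```r
# Fixed-offset zone: always 8 hours behind UTC, with no daylight-time rules.
std <- "Etc/GMT+8"

# 02:30 on 2015-03-08 is skipped in US/Pacific (clocks spring forward), but
# it is an ordinary time in the fixed-offset zone.
x <- as.POSIXct(c("2015-03-08 01:30:00", "2015-03-08 02:30:00",
                  "2015-03-08 03:30:00"), tz = std)
format(x, "%Y-%m-%d %H:%M %z")
diff(x)   # consecutive times stay one hour apart

# Labelling standard-time data as UTC: differences between times are
# unaffected, only the absolute instant is shifted by the zone offset.
y <- as.POSIXct(c("2015-03-08 01:30:00", "2015-03-08 03:30:00"), tz = "UTC")
difftime(y[2], y[1], units = "hours")
```

The fixed-offset zone keeps the absolute instants correct, which matters if
the data are later compared with timestamps from other zones. The UTC label
is simpler when only time differences within the data set are needed.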


Bill Dunlap
TIBCO Software
wdunlap tibco.com



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do