Re: [R] User-defined functions in dplyr

2015-11-02 Thread Axel Urbiz
Actually, the results are not the same. It looks like in the code below (see 
"Attempt using dplyr"), the function create_bins2 is not being applied 
separately to each group defined by group_by. That is surprising to me, or I'm 
misunderstanding dplyr.

### Create some data

set.seed(4)
df <- data.frame(pred = rnorm(100), models = gl(2, 50, 100, labels = 
c("model1", "model2")))

### This is the code using plyr, which I'd like to change using dplyr

create_bins <- function(x, nBins) {
  Breaks <- unique(quantile(x$pred, probs = seq(0, 1, 1/nBins)))
  dfB <- data.frame(pred = x$pred,
                    bin = cut(x$pred, breaks = Breaks,
                              include.lowest = TRUE))
  dfB
}

nBins = 10
res_plyr <- plyr::ddply(df, plyr::.(models), create_bins, nBins)
head(res_plyr)

### Attempt using dplyr

create_bins2 <- function (pred, nBins) {
  Breaks <- unique(quantile(pred, probs = seq(0, 1, 1/nBins)))
  bin <- cut(pred, breaks = Breaks, include.lowest = TRUE)
  bin
}

res_dplyr <- dplyr::mutate(dplyr::group_by(df, models),
  bin=create_bins2(pred, nBins))


identical(res_plyr, as.data.frame(res_dplyr))
# [1] FALSE
#levels(res_dplyr$bin) == levels(res_plyr$bin)
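
One way to see where the two results can diverge is to compare the factor levels that per-group binning produces. The sketch below uses only base R on the same simulated data; bin_one_group is a hypothetical helper name, not a plyr or dplyr function.

```r
# Same simulated data as in the post above.
set.seed(4)
df <- data.frame(pred = rnorm(100),
                 models = gl(2, 50, 100, labels = c("model1", "model2")))
nBins <- 10

# Hypothetical helper: bin one group's predictions at that group's own quantiles.
bin_one_group <- function(pred, nBins) {
  breaks <- unique(quantile(pred, probs = seq(0, 1, 1 / nBins)))
  cut(pred, breaks = breaks, include.lowest = TRUE)
}

# One factor per model; each carries its own set of interval labels.
bins_by_model <- lapply(split(df$pred, df$models), bin_one_group, nBins = nBins)
levels_by_model <- lapply(bins_by_model, levels)

# TRUE would mean both groups happened to get identical interval labels.
same_levels <- identical(levels_by_model$model1, levels_by_model$model2)
print(levels_by_model)
print(same_levels)
```

Because each model gets its own breaks, any step that later combines the per-group factors into one column has to reconcile two different level sets.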

Thanks,
Axel.



> On Oct 30, 2015, at 12:19 PM, William Dunlap  wrote:
> 
> dplyr::mutate is probably what you want instead of dplyr::summarize:
> 
> create_bins3 <- function (xpred, nBins) 
> {
> Breaks <- unique(quantile(xpred, probs = seq(0, 1, 1/nBins)))
> bin <- cut(xpred, breaks = Breaks, include.lowest = TRUE)
> bin
> }
> dplyr::group_by(df, models) %>% dplyr::mutate(Bin=create_bins3(pred,nBins))
> #Source: local data frame [100 x 3]
> #Groups: models [2]
> #
> #         pred models             Bin
> #        (dbl) (fctr)          (fctr)
> #1   0.2167549 model1   (0.167,0.577]
> #2  -0.5424926 model1 (-0.869,-0.481]
> ...
> 
> 
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com 
> On Fri, Oct 30, 2015 at 9:06 AM, William Dunlap  > wrote:
> The error message is not very helpful and the stack trace is pretty 
> inscrutable as well
> > dplyr::group_by(df, models) %>% dplyr::summarize(create_bins)
> Error: not a vector
> > traceback()
> 14: stop(list(message = "not a vector", call = NULL, cppstack = NULL))
> 13: .Call("dplyr_summarise_impl", PACKAGE = "dplyr", df, dots)
> 12: summarise_impl(.data, dots)
> 11: summarise_.tbl_df(.data, .dots = lazyeval::lazy_dots(...))
> 10: summarise_(.data, .dots = lazyeval::lazy_dots(...))
> 9: dplyr::summarize(., create_bins)
> 8: function_list[[k]](value)
> 7: withVisible(function_list[[k]](value))
> 6: freduce(value, `_function_list`)
> 5: `_fseq`(`_lhs`)
> 4: eval(expr, envir, enclos)
> 3: eval(quote(`_fseq`(`_lhs`)), env, env)
> 2: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
> 1: dplyr::group_by(df, models) %>% dplyr::summarize(create_bins)
> 
> 
> It does not mean that your function, create_bins, does not return a vector --
> the sum function gives the same result. help(summarize,package="dplyr")
> says:
>  ...: Name-value pairs of summary functions like ‘min()’, ‘mean()’,
>   ‘max()’ etc.
> It apparently means calls to summary functions, not summary functions
> themselves.  The examples in the help file show the proper usage.
> 
> Use a call to your function and you will see it works better
>> dplyr::group_by(df, models) %>% dplyr::summarize(create_bins(pred,nBins))
>Error: $ operator is invalid for atomic vectors
> The traceback again is not very useful, because the call information was
> stripped by dplyr (by the call=NULL in the call to stop()):  
>   > traceback()
>   14: stop(list(message = "$ operator is invalid for atomic vectors", 
>   call = NULL, cppstack = NULL))
>   13: .Call("dplyr_summarise_impl", PACKAGE = "dplyr", df, dots)
> However it is clear that the fault is in your function, which is expecting a
> data.frame x with a column called pred but gets pred itself.  Change x to 
> xpred
> in the argument list and x$pred to xpred in the body of the function.
> 
> You will run into more problems because your function returns a vector
> the length of its input but summarize expects a summary function - one
> that returns a scalar for any size vector input.
> 
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com 
> 
> On Fri, Oct 30, 2015 at 4:04 AM, Axel Urbiz  > wrote:
> So in this case, "create_bins" returns a vector and I still get the same
> error.
> 
> 
> create_bins <- function(x, nBins)
> {
>   Breaks <- unique(quantile(x$pred, probs = seq(0, 1, 1/nBins)))
>   bin <- cut(x$pred, breaks = Breaks, include.lowest = TRUE)
>   bin
> }
> 
> 
> ### Using dplyr (fails)
> nBins = 10
> by_group <- dplyr::group_by(df, models)
> res_dplyr <- dplyr::summarize(by_group, create_bins, nBins)
> Error: not a vector
> 
> On Thu, Oct 29, 2015 at 8:28 PM, Jeff Newmiller 

Re: [R] User-defined functions in dplyr

2015-11-02 Thread William Dunlap
dplyr::mutate does not combine factor variables well across groups.  The
result seems to take its levels from those computed for the first group,
and mutate does not check whether later groups have different levels.

> data.frame(group=rep(c("A","B","C"),each=2),
+            value=rep(c("X","Y","Z"),3:1)) %>%
+   dplyr::group_by(group) %>%
+   dplyr::mutate(fv=factor(value))
Source: local data frame [6 x 3]
Groups: group [3]

   group  value     fv
  (fctr) (fctr) (fctr)
1      A      X      X
2      A      X      X
3      B      X      X
4      B      Y     NA
5      C      Y      X
6      C      Z     NA
> levels(.Last.value$fv)
[1] "X"
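
A possible workaround, sketched here on the assumption that plain character labels are acceptable inside the grouped step, is to return a character vector from mutate and rebuild the factor once afterwards. The data frame name d and the result name res are illustrative only.

```r
library(dplyr)

# Same toy data as in the example above.
d <- data.frame(group = rep(c("A", "B", "C"), each = 2),
                value = rep(c("X", "Y", "Z"), 3:1))

# Return character labels from the grouped step so that no per-group
# factor levels have to be reconciled when the groups are put back together.
res <- d %>%
  group_by(group) %>%
  mutate(fv = as.character(factor(value))) %>%
  ungroup()

# Rebuild the factor once, over the union of all labels seen.
res$fv <- factor(res$fv)
print(levels(res$fv))
```

The same idea would apply to the binning function in this thread: wrap the result of cut() in as.character() before returning it.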



Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, Nov 2, 2015 at 5:38 PM, Axel Urbiz  wrote:

> Actually, the results are not the same. Looks like in the code below (see
> "using dplyr”), the function create_bins2 is not being applied separately
> to each "group_by" variable. That is surprising to me, or I'm
> misunderstanding dplyr.
>
> ### Create some data
>
> set.seed(4)
> df <- data.frame(pred = rnorm(100), models = gl(2, 50, 100, labels =
> c("model1", "model2")))
>
> ### This is the code using plyr, which I'd like to change using dplyr
>
> create_bins <- function(x, nBins) {
>   Breaks <- unique(quantile(x$pred, probs = seq(0, 1, 1/nBins)))
>   dfB <-  data.frame(pred = x$pred,
> bin = cut(x$pred, breaks = Breaks,
> include.lowest = TRUE))
>   dfB
> }
>
> nBins = 10
> res_plyr <- plyr::ddply(df, plyr::.(models), create_bins, nBins)
> head(res_plyr)
>
> ### Attempt using dplyr
>
> create_bins2 <- function (pred, nBins) {
>   Breaks <- unique(quantile(pred, probs = seq(0, 1, 1/nBins)))
>   bin <- cut(pred, breaks = Breaks, include.lowest = TRUE)
>   bin
> }
>
> res_dplyr <- dplyr::mutate(dplyr::group_by(df, models),
>   bin=create_bins2(pred, nBins))
>
>
> identical(res_plyr, as.data.frame(res_dplyr))
> [1] FALSE
> #levels(res_dplyr$bin) == levels(res_plyr$bin)
>
> Thanks,
> Axel.
>
>
>
> On Oct 30, 2015, at 12:19 PM, William Dunlap  wrote:
>
> dplyr::mutate is probably what you want instead of dplyr::summarize:
>
> create_bins3 <- function (xpred, nBins)
> {
> Breaks <- unique(quantile(xpred, probs = seq(0, 1, 1/nBins)))
> bin <- cut(xpred, breaks = Breaks, include.lowest = TRUE)
> bin
> }
> dplyr::group_by(df, models) %>% dplyr::mutate(Bin=create_bins3(pred,nBins))
> #Source: local data frame [100 x 3]
> #Groups: models [2]
> #
> # pred models   Bin
> #(dbl) (fctr)(fctr)
> #1   0.2167549 model1 (0.167,0.577]
> #2  -0.5424926 model1   (-0.869,-0.481]
> ...
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Fri, Oct 30, 2015 at 9:06 AM, William Dunlap  wrote:
>
>> The error message is not very helpful and the stack trace is pretty
>> inscrutable as well
>> > dplyr::group_by(df, models) %>% dplyr::summarize(create_bins)
>> Error: not a vector
>> > traceback()
>> 14: stop(list(message = "not a vector", call = NULL, cppstack = NULL))
>> 13: .Call("dplyr_summarise_impl", PACKAGE = "dplyr", df, dots)
>> 12: summarise_impl(.data, dots)
>> 11: summarise_.tbl_df(.data, .dots = lazyeval::lazy_dots(...))
>> 10: summarise_(.data, .dots = lazyeval::lazy_dots(...))
>> 9: dplyr::summarize(., create_bins)
>> 8: function_list[[k]](value)
>> 7: withVisible(function_list[[k]](value))
>> 6: freduce(value, `_function_list`)
>> 5: `_fseq`(`_lhs`)
>> 4: eval(expr, envir, enclos)
>> 3: eval(quote(`_fseq`(`_lhs`)), env, env)
>> 2: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
>> 1: dplyr::group_by(df, models) %>% dplyr::summarize(create_bins)
>>
>>
>> It does not mean that your function, create_bins, does not return a
>> vector --
>> the sum function gives the same result. help(summarize,package="dplyr")
>> says:
>>  ...: Name-value pairs of summary functions like ‘min()’, ‘mean()’,
>>   ‘max()’ etc.
>> It apparently means calls to summary functions, not summary functions
>> themselves.  The examples in the help file show the proper usage.
>>
>> Use a call to your function and you will see it works better
>>> dplyr::group_by(df, models) %>%
>> dplyr::summarize(create_bins(pred,nBins))
>>Error: $ operator is invalid for atomic vectors
>> The traceback again is not very useful, because the call information was
>> stripped by dplyr (by the call=NULL in the call to stop()):
>>   > traceback()
>>   14: stop(list(message = "$ operator is invalid for atomic vectors",
>>   call = NULL, cppstack = NULL))
>>   13: .Call("dplyr_summarise_impl", PACKAGE = "dplyr", df, dots)
>> However it is clear that the fault is in your function, which is
>> expecting a
>> data.frame x with a column called pred but gets pred itself.  Change x to
>> xpred
>> in the argument list and x$pred to xpred in the body of the function.
>>
>> You will run into more problems because your function returns a vector
>> the length of its 

Re: [R] User-defined functions in dplyr

2015-11-02 Thread Axel Urbiz
Nice example of the issue, Bill. Thank you.

Is this a known issue? Are there plans to fix it?

Thanks again,
Axel.


> On Nov 2, 2015, at 8:58 PM, William Dunlap  wrote:
> 
> dplyr::mutate does not collapse factor variables well.  They seem to get 
> their levels from the levels
> computed for the first group and mutate does not check for them having 
> different levels.
> 
> > data.frame(group=rep(c("A","B","C"),each=2), value=rep(c("X","Y","Z"),3:1)) %>%
> +   dplyr::group_by(group) %>% dplyr::mutate(fv=factor(value))
> Source: local data frame [6 x 3]
> Groups: group [3]
> 
>    group  value     fv
>   (fctr) (fctr) (fctr)
> 1      A      X      X
> 2      A      X      X
> 3      B      X      X
> 4      B      Y     NA
> 5      C      Y      X
> 6      C      Z     NA
> > levels(.Last.value$fv)
> [1] "X"
> 
> 
> 
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com 
> On Mon, Nov 2, 2015 at 5:38 PM, Axel Urbiz  > wrote:
> Actually, the results are not the same. Looks like in the code below (see 
> "using dplyr”), the function create_bins2 is not being applied separately to 
> each "group_by" variable. That is surprising to me, or I'm misunderstanding 
> dplyr.
> 
> ### Create some data
> 
> set.seed(4)
> df <- data.frame(pred = rnorm(100), models = gl(2, 50, 100, labels = 
> c("model1", "model2")))
> 
> ### This is the code using plyr, which I'd like to change using dplyr
> 
> create_bins <- function(x, nBins) {
>   Breaks <- unique(quantile(x$pred, probs = seq(0, 1, 1/nBins)))
>   dfB <-  data.frame(pred = x$pred,
> bin = cut(x$pred, breaks = Breaks, 
> include.lowest = TRUE))
>   dfB
> }
> 
> nBins = 10
> res_plyr <- plyr::ddply(df, plyr::.(models), create_bins, nBins)
> head(res_plyr)
> 
> ### Attempt using dplyr
> 
> create_bins2 <- function (pred, nBins) {
>   Breaks <- unique(quantile(pred, probs = seq(0, 1, 1/nBins)))
>   bin <- cut(pred, breaks = Breaks, include.lowest = TRUE)
>   bin
> }
> 
> res_dplyr <- dplyr::mutate(dplyr::group_by(df, models),
>   bin=create_bins2(pred, nBins))
> 
> 
> identical(res_plyr, as.data.frame(res_dplyr))
> [1] FALSE
> #levels(res_dplyr$bin) == levels(res_plyr$bin)
> 
> Thanks,
> Axel.
> 
> 
> 
>> On Oct 30, 2015, at 12:19 PM, William Dunlap > > wrote:
>> 
>> dplyr::mutate is probably what you want instead of dplyr::summarize:
>> 
>> create_bins3 <- function (xpred, nBins) 
>> {
>> Breaks <- unique(quantile(xpred, probs = seq(0, 1, 1/nBins)))
>> bin <- cut(xpred, breaks = Breaks, include.lowest = TRUE)
>> bin
>> }
>> dplyr::group_by(df, models) %>% dplyr::mutate(Bin=create_bins3(pred,nBins))
>> #Source: local data frame [100 x 3]
>> #Groups: models [2]
>> #
>> # pred models   Bin
>> #(dbl) (fctr)(fctr)
>> #1   0.2167549 model1 (0.167,0.577]
>> #2  -0.5424926 model1   (-0.869,-0.481]
>> ...
>> 
>> 
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com 
>> On Fri, Oct 30, 2015 at 9:06 AM, William Dunlap > > wrote:
>> The error message is not very helpful and the stack trace is pretty 
>> inscrutable as well
>> > dplyr::group_by(df, models) %>% dplyr::summarize(create_bins)
>> Error: not a vector
>> > traceback()
>> 14: stop(list(message = "not a vector", call = NULL, cppstack = NULL))
>> 13: .Call("dplyr_summarise_impl", PACKAGE = "dplyr", df, dots)
>> 12: summarise_impl(.data, dots)
>> 11: summarise_.tbl_df(.data, .dots = lazyeval::lazy_dots(...))
>> 10: summarise_(.data, .dots = lazyeval::lazy_dots(...))
>> 9: dplyr::summarize(., create_bins)
>> 8: function_list[[k]](value)
>> 7: withVisible(function_list[[k]](value))
>> 6: freduce(value, `_function_list`)
>> 5: `_fseq`(`_lhs`)
>> 4: eval(expr, envir, enclos)
>> 3: eval(quote(`_fseq`(`_lhs`)), env, env)
>> 2: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
>> 1: dplyr::group_by(df, models) %>% dplyr::summarize(create_bins)
>> 
>> 
>> It does not mean that your function, create_bins, does not return a vector --
>> the sum function gives the same result. help(summarize,package="dplyr")
>> says:
>>  ...: Name-value pairs of summary functions like ‘min()’, ‘mean()’,
>>   ‘max()’ etc.
>> It apparently means calls to summary functions, not summary functions
>> themselves.  The examples in the help file show the proper usage.
>> 
>> Use a call to your function and you will see it works better
>>> dplyr::group_by(df, models) %>% 
>> dplyr::summarize(create_bins(pred,nBins))
>>Error: $ operator is invalid for atomic vectors
>> The traceback again is not very useful, because the call information was
>> stripped by dplyr (by the call=NULL in the call to stop()):  
>>   > traceback()
>>   14: stop(list(message = "$ operator is invalid for atomic vectors", 
>>   call = NULL, 

Re: [R] User-defined functions in dplyr

2015-10-30 Thread Axel Urbiz
So in this case, "create_bins" returns a vector and I still get the same
error.


create_bins <- function(x, nBins)
{
  Breaks <- unique(quantile(x$pred, probs = seq(0, 1, 1/nBins)))
  bin <- cut(x$pred, breaks = Breaks, include.lowest = TRUE)
  bin
}


### Using dplyr (fails)
nBins = 10
by_group <- dplyr::group_by(df, models)
res_dplyr <- dplyr::summarize(by_group, create_bins, nBins)
Error: not a vector

On Thu, Oct 29, 2015 at 8:28 PM, Jeff Newmiller 
wrote:

> You are jumping the gun (your other email did get through) and you are
> posting using HTML (which does not come through on the list). Some time
> (re)reading the Posting Guide mentioned at the bottom of all emails on this
> list seems to be in order.
>
> The error is actually quite clear. You should return a vector from your
> function, not a data frame.
> ---
> Jeff NewmillerThe .   .  Go Live...
> DCN:Basics: ##.#.   ##.#.  Live
> Go...
>   Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
> /Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
> ---
> Sent from my phone. Please excuse my brevity.
>
> On October 29, 2015 4:55:19 PM MST, Axel Urbiz 
> wrote:
> >Hello,
> >
> >Sorry, resending this question as the prior was not sent properly.
> >
> >I’m using the plyr package below to add a variable named "bin" to my
> >original data frame "df" with the user-defined function "create_bins".
> >I'd
> >like to get similar results using dplyr instead, but failing to do so.
> >
> >set.seed(4)
> >df <- data.frame(pred = rnorm(100), models = gl(2, 50, 100, labels =
> >c("model1", "model2")))
> >
> >
> >### Using plyr (works fine)
> >create_bins <- function(x, nBins)
> >{
> >  Breaks <- unique(quantile(x$pred, probs = seq(0, 1, 1/nBins)))
> >  dfB <-  data.frame(pred = x$pred,
> >bin = cut(x$pred, breaks = Breaks, include.lowest =
> >TRUE))
> >  dfB
> >}
> >
> >nBins = 10
> >res_plyr <- plyr::ddply(df, plyr::.(models), create_bins, nBins)
> >head(res_plyr)
> >
> >### Using dplyr (fails)
> >
> >by_group <- dplyr::group_by(df, models)
> >res_dplyr <- dplyr::summarize(by_group, create_bins, nBins)
> >Error: not a vector
> >
> >
> >Any help would be much appreciated.
> >
> >Best,
> >Axel.
> >
> >   [[alternative HTML version deleted]]
> >
> >__
> >R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >https://stat.ethz.ch/mailman/listinfo/r-help
> >PLEASE do read the posting guide
> >http://www.R-project.org/posting-guide.html
> >and provide commented, minimal, self-contained, reproducible code.
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] User-defined functions in dplyr

2015-10-30 Thread William Dunlap
The error message is not very helpful, and the stack trace is pretty
inscrutable as well:
> dplyr::group_by(df, models) %>% dplyr::summarize(create_bins)
Error: not a vector
> traceback()
14: stop(list(message = "not a vector", call = NULL, cppstack = NULL))
13: .Call("dplyr_summarise_impl", PACKAGE = "dplyr", df, dots)
12: summarise_impl(.data, dots)
11: summarise_.tbl_df(.data, .dots = lazyeval::lazy_dots(...))
10: summarise_(.data, .dots = lazyeval::lazy_dots(...))
9: dplyr::summarize(., create_bins)
8: function_list[[k]](value)
7: withVisible(function_list[[k]](value))
6: freduce(value, `_function_list`)
5: `_fseq`(`_lhs`)
4: eval(expr, envir, enclos)
3: eval(quote(`_fseq`(`_lhs`)), env, env)
2: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
1: dplyr::group_by(df, models) %>% dplyr::summarize(create_bins)


The message does not mean that your function, create_bins, fails to return a
vector: passing the sum function itself gives the same error.
help(summarize,package="dplyr") says:
 ...: Name-value pairs of summary functions like ‘min()’, ‘mean()’,
  ‘max()’ etc.
It apparently means calls to summary functions, not the summary functions
themselves.  The examples in the help file show the proper usage.

Use a call to your function and you will get a more informative error:
   > dplyr::group_by(df, models) %>%
       dplyr::summarize(create_bins(pred,nBins))
   Error: $ operator is invalid for atomic vectors
The traceback again is not very useful, because the call information was
stripped by dplyr (by the call=NULL in the call to stop()):
  > traceback()
  14: stop(list(message = "$ operator is invalid for atomic vectors",
  call = NULL, cppstack = NULL))
  13: .Call("dplyr_summarise_impl", PACKAGE = "dplyr", df, dots)
However, it is clear that the fault is in your function, which expects a
data.frame x with a column called pred but gets pred itself.  Change x to
xpred in the argument list and x$pred to xpred in the body of the function.

You will run into more problems because your function returns a vector
the length of its input but summarize expects a summary function - one
that returns a scalar for any size vector input.
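
To illustrate the difference between a function and a call to it, here is a minimal sketch that passes summarize a call to a scalar-valued helper. pred_spread is a hypothetical name chosen for this example; the data are the ones from the original post.

```r
library(dplyr)

set.seed(4)
df <- data.frame(pred = rnorm(100),
                 models = gl(2, 50, 100, labels = c("model1", "model2")))

# Hypothetical summary function: takes a vector, returns a single number.
pred_spread <- function(xpred) {
  max(xpred) - min(xpred)
}

# summarize() is given *calls* (pred_spread(pred), n()), not bare function
# names, and each call yields one value per group.
res <- df %>%
  group_by(models) %>%
  summarize(spread = pred_spread(pred), n = n())
print(res)
```

A function such as create_bins, which returns one value per input row, does not fit this pattern, which is why it belongs in mutate rather than summarize.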

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Oct 30, 2015 at 4:04 AM, Axel Urbiz  wrote:

> So in this case, "create_bins" returns a vector and I still get the same
> error.
>
>
> create_bins <- function(x, nBins)
> {
>   Breaks <- unique(quantile(x$pred, probs = seq(0, 1, 1/nBins)))
>   bin <- cut(x$pred, breaks = Breaks, include.lowest = TRUE)
>   bin
> }
>
>
> ### Using dplyr (fails)
> nBins = 10
> by_group <- dplyr::group_by(df, models)
> res_dplyr <- dplyr::summarize(by_group, create_bins, nBins)
> Error: not a vector
>
> On Thu, Oct 29, 2015 at 8:28 PM, Jeff Newmiller 
> wrote:
>
> > You are jumping the gun (your other email did get through) and you are
> > posting using HTML (which does not come through on the list). Some time
> > (re)reading the Posting Guide mentioned at the bottom of all emails on
> this
> > list seems to be in order.
> >
> > The error is actually quite clear. You should return a vector from your
> > function, not a data frame.
> >
> ---
> > Jeff NewmillerThe .   .  Go
> Live...
> > DCN:Basics: ##.#.   ##.#.  Live
> > Go...
> >   Live:   OO#.. Dead: OO#..  Playing
> > Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
> > /Software/Embedded Controllers)   .OO#.   .OO#.
> rocks...1k
> >
> ---
> > Sent from my phone. Please excuse my brevity.
> >
> > On October 29, 2015 4:55:19 PM MST, Axel Urbiz 
> > wrote:
> > >Hello,
> > >
> > >Sorry, resending this question as the prior was not sent properly.
> > >
> > >I’m using the plyr package below to add a variable named "bin" to my
> > >original data frame "df" with the user-defined function "create_bins".
> > >I'd
> > >like to get similar results using dplyr instead, but failing to do so.
> > >
> > >set.seed(4)
> > >df <- data.frame(pred = rnorm(100), models = gl(2, 50, 100, labels =
> > >c("model1", "model2")))
> > >
> > >
> > >### Using plyr (works fine)
> > >create_bins <- function(x, nBins)
> > >{
> > >  Breaks <- unique(quantile(x$pred, probs = seq(0, 1, 1/nBins)))
> > >  dfB <-  data.frame(pred = x$pred,
> > >bin = cut(x$pred, breaks = Breaks, include.lowest =
> > >TRUE))
> > >  dfB
> > >}
> > >
> > >nBins = 10
> > >res_plyr <- plyr::ddply(df, plyr::.(models), create_bins, nBins)
> > >head(res_plyr)
> > >
> > >### Using dplyr (fails)
> > >
> > >by_group <- dplyr::group_by(df, models)
> > >res_dplyr <- dplyr::summarize(by_group, create_bins, nBins)
> > >Error: not a vector
> > >
> > >
> > >Any help would be much appreciated.
> > >
> > 

Re: [R] User-defined functions in dplyr

2015-10-30 Thread William Dunlap
dplyr::mutate is probably what you want instead of dplyr::summarize:

create_bins3 <- function (xpred, nBins)
{
    Breaks <- unique(quantile(xpred, probs = seq(0, 1, 1/nBins)))
    bin <- cut(xpred, breaks = Breaks, include.lowest = TRUE)
    bin
}
dplyr::group_by(df, models) %>% dplyr::mutate(Bin=create_bins3(pred,nBins))
#Source: local data frame [100 x 3]
#Groups: models [2]
#
#         pred models             Bin
#        (dbl) (fctr)          (fctr)
#1   0.2167549 model1   (0.167,0.577]
#2  -0.5424926 model1 (-0.869,-0.481]
...
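
A quick way to check that mutate evaluates the function within each group is to return the bin number instead of the interval label and then count rows per model and bin. This is only a sketch: bin_index is a hypothetical variant of create_bins3, and the column name bin_id is made up for the example.

```r
library(dplyr)

set.seed(4)
df <- data.frame(pred = rnorm(100),
                 models = gl(2, 50, 100, labels = c("model1", "model2")))
nBins <- 10

# Hypothetical variant of create_bins3: same breaks, but returns the
# integer position of the bin rather than a factor of interval labels.
bin_index <- function(xpred, nBins) {
  breaks <- unique(quantile(xpred, probs = seq(0, 1, 1 / nBins)))
  as.integer(cut(xpred, breaks = breaks, include.lowest = TRUE))
}

res <- df %>%
  group_by(models) %>%
  mutate(bin_id = bin_index(pred, nBins)) %>%
  ungroup()

# If the quantiles are computed per group, every model should have its
# rows spread evenly over the bins.
counts <- table(res$models, res$bin_id)
print(counts)
```

Returning integers also avoids combining factors with different levels, at the cost of losing the interval labels.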


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Oct 30, 2015 at 9:06 AM, William Dunlap  wrote:

> The error message is not very helpful and the stack trace is pretty
> inscrutable as well
> > dplyr::group_by(df, models) %>% dplyr::summarize(create_bins)
> Error: not a vector
> > traceback()
> 14: stop(list(message = "not a vector", call = NULL, cppstack = NULL))
> 13: .Call("dplyr_summarise_impl", PACKAGE = "dplyr", df, dots)
> 12: summarise_impl(.data, dots)
> 11: summarise_.tbl_df(.data, .dots = lazyeval::lazy_dots(...))
> 10: summarise_(.data, .dots = lazyeval::lazy_dots(...))
> 9: dplyr::summarize(., create_bins)
> 8: function_list[[k]](value)
> 7: withVisible(function_list[[k]](value))
> 6: freduce(value, `_function_list`)
> 5: `_fseq`(`_lhs`)
> 4: eval(expr, envir, enclos)
> 3: eval(quote(`_fseq`(`_lhs`)), env, env)
> 2: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
> 1: dplyr::group_by(df, models) %>% dplyr::summarize(create_bins)
>
>
> It does not mean that your function, create_bins, does not return a vector
> --
> the sum function gives the same result. help(summarize,package="dplyr")
> says:
>  ...: Name-value pairs of summary functions like ‘min()’, ‘mean()’,
>   ‘max()’ etc.
> It apparently means calls to summary functions, not summary functions
> themselves.  The examples in the help file show the proper usage.
>
> Use a call to your function and you will see it works better
>> dplyr::group_by(df, models) %>%
> dplyr::summarize(create_bins(pred,nBins))
>Error: $ operator is invalid for atomic vectors
> The traceback again is not very useful, because the call information was
> stripped by dplyr (by the call=NULL in the call to stop()):
>   > traceback()
>   14: stop(list(message = "$ operator is invalid for atomic vectors",
>   call = NULL, cppstack = NULL))
>   13: .Call("dplyr_summarise_impl", PACKAGE = "dplyr", df, dots)
> However it is clear that the fault is in your function, which is expecting
> a
> data.frame x with a column called pred but gets pred itself.  Change x to
> xpred
> in the argument list and x$pred to xpred in the body of the function.
>
> You will run into more problems because your function returns a vector
> the length of its input but summarize expects a summary function - one
> that returns a scalar for any size vector input.
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Fri, Oct 30, 2015 at 4:04 AM, Axel Urbiz  wrote:
>
>> So in this case, "create_bins" returns a vector and I still get the same
>> error.
>>
>>
>> create_bins <- function(x, nBins)
>> {
>>   Breaks <- unique(quantile(x$pred, probs = seq(0, 1, 1/nBins)))
>>   bin <- cut(x$pred, breaks = Breaks, include.lowest = TRUE)
>>   bin
>> }
>>
>>
>> ### Using dplyr (fails)
>> nBins = 10
>> by_group <- dplyr::group_by(df, models)
>> res_dplyr <- dplyr::summarize(by_group, create_bins, nBins)
>> Error: not a vector
>>
>> On Thu, Oct 29, 2015 at 8:28 PM, Jeff Newmiller > >
>> wrote:
>>
>> > You are jumping the gun (your other email did get through) and you are
>> > posting using HTML (which does not come through on the list). Some time
>> > (re)reading the Posting Guide mentioned at the bottom of all emails on
>> this
>> > list seems to be in order.
>> >
>> > The error is actually quite clear. You should return a vector from your
>> > function, not a data frame.
>> >
>> ---
>> > Jeff NewmillerThe .   .  Go
>> Live...
>> > DCN:Basics: ##.#.   ##.#.  Live
>> > Go...
>> >   Live:   OO#.. Dead: OO#..  Playing
>> > Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
>> > /Software/Embedded Controllers)   .OO#.   .OO#.
>> rocks...1k
>> >
>> ---
>> > Sent from my phone. Please excuse my brevity.
>> >
>> > On October 29, 2015 4:55:19 PM MST, Axel Urbiz 
>> > wrote:
>> > >Hello,
>> > >
>> > >Sorry, resending this question as the prior was not sent properly.
>> > >
>> > >I’m using the plyr package below to add a variable named "bin" to my
>> > >original data frame "df" with the user-defined function "create_bins".
>> > >I'd
>> > >like to get similar results 

Re: [R] User-defined functions in dplyr

2015-10-29 Thread Jeff Newmiller
You are jumping the gun (your other email did get through) and you are posting 
using HTML (which does not come through on the list). Some time (re)reading the 
Posting Guide mentioned at the bottom of all emails on this list seems to be in 
order.

The error is actually quite clear. You should return a vector from your 
function, not a data frame.
---
Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

On October 29, 2015 4:55:19 PM MST, Axel Urbiz  wrote:
>Hello,
>
>Sorry, resending this question as the prior was not sent properly.
>
>I’m using the plyr package below to add a variable named "bin" to my
>original data frame "df" with the user-defined function "create_bins".
>I'd
>like to get similar results using dplyr instead, but failing to do so.
>
>set.seed(4)
>df <- data.frame(pred = rnorm(100), models = gl(2, 50, 100, labels =
>c("model1", "model2")))
>
>
>### Using plyr (works fine)
>create_bins <- function(x, nBins)
>{
>  Breaks <- unique(quantile(x$pred, probs = seq(0, 1, 1/nBins)))
>  dfB <-  data.frame(pred = x$pred,
>bin = cut(x$pred, breaks = Breaks, include.lowest =
>TRUE))
>  dfB
>}
>
>nBins = 10
>res_plyr <- plyr::ddply(df, plyr::.(models), create_bins, nBins)
>head(res_plyr)
>
>### Using dplyr (fails)
>
>by_group <- dplyr::group_by(df, models)
>res_dplyr <- dplyr::summarize(by_group, create_bins, nBins)
>Error: not a vector
>
>
>Any help would be much appreciated.
>
>Best,
>Axel.
>
>   [[alternative HTML version deleted]]
>
>__
>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.


[R] User-defined functions in dplyr

2015-10-29 Thread Axel Urbiz
Hello,

Sorry, resending this question as the prior was not sent properly.

I’m using the plyr package below to add a variable named "bin" to my
original data frame "df" with the user-defined function "create_bins". I'd
like to get similar results using dplyr instead, but I am failing to do so.

set.seed(4)
df <- data.frame(pred = rnorm(100), models = gl(2, 50, 100, labels =
c("model1", "model2")))


### Using plyr (works fine)
create_bins <- function(x, nBins)
{
  Breaks <- unique(quantile(x$pred, probs = seq(0, 1, 1/nBins)))
  dfB <- data.frame(pred = x$pred,
                    bin = cut(x$pred, breaks = Breaks,
                              include.lowest = TRUE))
  dfB
}

nBins = 10
res_plyr <- plyr::ddply(df, plyr::.(models), create_bins, nBins)
head(res_plyr)

### Using dplyr (fails)

by_group <- dplyr::group_by(df, models)
res_dplyr <- dplyr::summarize(by_group, create_bins, nBins)
Error: not a vector


Any help would be much appreciated.

Best,
Axel.


[R] User-defined functions in dplyr

2015-10-29 Thread Axel Urbiz
Hello,

I’m using the plyr package to add a variable named "bin" to my original
data frame "df" with a user-defined function "create_bins". I'd like to get
similar results using dplyr instead, but failing to do so.

set.seed(4)
df <- data.frame(pred = rnorm(100), models = gl(2, 50, 100, labels =
c("model1", "model2")))

### Using plyr (works fine)
create_bins <- function(x, nBins)
{
  Breaks <- unique(quantile(x$pred, probs = seq(0, 1, 1/nBins)))
  dfB <- data.frame(pred = x$pred,
                    bin = cut(x$pred, breaks = Breaks,
                              include.lowest = TRUE))
  dfB
}

nBins = 10
res_plyr <- plyr::ddply(df, plyr::.(models), create_bins, nBins)
head(res_plyr)

### Using dplyr (fails)

by_group <- dplyr::group_by(df, models)
res_dplyr <- dplyr::summarize(by_group, create_bins, nBins)
Error: not a vector

Any help would be much appreciated.

Best,
Axel.
