Re: [R] Interquartile Range

2016-04-19 Thread Michael Artz
I already found a solution. You suggested I try to find a non-hacky
solution, which was not really my priority. I should have declined
politely, which I will do now. Or, if you just want me to post reproducible
code because you are bored or because you like solving problems, then let me
know and I will accommodate. You have been helpful and I wouldn't mind in
that case.  Also, IQR was not a help from the beginning. If it supplies one
value, then it's not even a candidate to be helpful for my problem. I
already talked about the format I was looking for.  I don't think I violated
any posting guideline; I asked for help, and people pointed me in a
direction and it helped me. Thanks again, I appreciate it.
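
For reference, a minimal sketch of the distinction being argued about here: IQR() returns a single number, while quantile() with two probabilities returns the pair of quartiles that a "low-high" label needs. The toy vector and the helper name range_label are assumptions made for illustration, not anything from the thread.

```r
# Toy data; any numeric vector would do.
x <- c(1, 2, 3, 4, 5, 6, 7, 8)

# stats::IQR() returns one number: the 75th minus the 25th percentile.
iqr_value <- IQR(x)

# quantile() with two probabilities returns both quartiles.
q <- quantile(x, c(0.25, 0.75), names = FALSE)

# The two agree: IQR(x) is the difference of the quartiles.
stopifnot(isTRUE(all.equal(iqr_value, q[2] - q[1])))

# A "low-high" label as one character string (hypothetical helper name).
range_label <- function(x, probs = c(0.25, 0.75)) {
  q <- round(quantile(x, probs, names = FALSE), 0)
  paste(q[1], q[2], sep = "-")
}

range_label(x)
```

So a label of this form has to be built from quantile(), not from IQR(), which only reports the width of the interval.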
On Apr 19, 2016 10:53 PM, "Bert Gunter" <bgunter.4...@gmail.com> wrote:

> ???
>
> IQR returns a single number.
>
> > IQR(rnorm(10))
> [1] 1.090168
>
> To your 2nd response:
> "I could have used average, min, max, they all would have returned the
> same thing., "
>
> I can only respond: huh?? Are all your values identical?
>
> You really need to provide a small reproducible example as requested
> by the posting guide -- I certainly don't get it, and I'm done
> guessing. Maybe others will see what I am missing and say something
> useful. I clearly can't.
>
> Cheers,
> Bert
>
>
>
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Tue, Apr 19, 2016 at 5:29 PM, Michael Artz <michaelea...@gmail.com>
> wrote:
> > Again, IQR returns both a .25 and a .75 value and it failed, which is
> > why I didn't use it before. Also, the first function just returns the
> > same value repeating.  Since they are the same, before the second call,
> > using the mode function is just a way to grab one value. I could have
> > used average, min, max, they all would have returned the same thing.
> >
> > Mike
> >
> > On Tue, Apr 19, 2016 at 7:24 PM, Marc Schwartz <marc_schwa...@me.com>
> wrote:
> >>
> >> Hi,
> >>
> >> Jumping into this thread mainly on the point of the mode of the
> >> distribution, while also supporting Bert's comments below on theory.
> >>
> >> If the vector 'x' that is being passed to this function is an integer
> >> vector, then a tabulation of the integers can yield a 'mode', presuming
> >> of course that there is only one unique mode. You may have to decide how
> >> you want to handle a multi-modal discrete distribution.
> >>
> >> If the vector 'x' is continuous (e.g. contains floating point values),
> >> then a tabulation is going to be problematic for a variety of reasons.
> >>
> >> In that case, prior discussions on this point have yielded the following
> >> estimation of the mode of a continuous distribution by using:
> >>
> >> Mode <- function(x) {
> >>   D <- density(x)
> >>   D$x[which.max(D$y)]
> >> }
> >>
> >> where the second line of the function gets you the value of 'x' at the
> >> maximum of the density estimate. Of course, there is still the
> >> possibility of a multi-modal distribution and the nuances of which kernel
> >> is used, etc., etc.
> >>
> >> Food for thought.
> >>
> >> Regards,
> >>
> >> Marc Schwartz
> >>
> >>
> >> > On Apr 19, 2016, at 7:07 PM, Bert Gunter <bgunter.4...@gmail.com>
> wrote:
> >> >
> >> > Well, instead of your functions try:
> >> >
> >> > Mode <- function(x) {
> >> > tabx <- table(x)
> >> > tabx[which.max(tabx)]
> >> > }
> >> >
> >> > and use R's IQR function instead of yours.
> >> >
> >> > ... so I still don't get why you want to return a character string
> >> > instead of a value for the IQR;
> >> > and the mode of a sample defined as above is generally a bad estimator
> >> > of the mode of the distribution. To say more than that would take me
> >> > too far afield. Post on stats.stackexchange.com if you want to know
> >> > why (if it's even relevant).
> >> >
> >> > Cheers,
> >> > Bert
> >> > Bert Gunter
> >> >
> >> > "The trouble with having an open mind is that people keep coming along
> >> > and sticking things into it."
> >> >
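
The three Mode variants quoted above can be compared side by side. This is only a sketch: the function names mode_unique, mode_table and mode_density and the toy vectors are assumptions for illustration. One detail worth noting is that tabx[which.max(tabx)] yields the count, named by the modal value, so the sketch takes names() to recover the value itself.

```r
# Variant 1: most frequent value, returned with its original type.
mode_unique <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Variant 2: table()-based. tabx[which.max(tabx)] is the *count*, named by
# the modal value, so take names() to get the value itself (as character).
mode_table <- function(x) {
  tabx <- table(x)
  names(tabx)[which.max(tabx)]
}

# Variant 3: kernel density estimate, intended for continuous data.
mode_density <- function(x) {
  d <- density(x)
  d$x[which.max(d$y)]
}

ints <- c(1, 2, 2, 3, 2, 5)
mode_unique(ints)   # numeric
mode_table(ints)    # character

cont <- c(1, 2, 2.1, 2.2, 2.3, 5)
mode_density(cont)  # location of the highest density peak
```

With ties, which.max() picks the first maximum, so none of these variants reports a multi-modal sample as such.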

Re: [R] Interquartile Range

2016-04-19 Thread Michael Artz
Again, IQR returns both a .25 and a .75 value and it failed, which is
why I didn't use it before. Also, the first function just returns the same
value repeating.  Since they are the same, before the second call, using
the mode function is just a way to grab one value. I could have used
average, min, max; they all would have returned the same thing.

Mike

On Tue, Apr 19, 2016 at 7:24 PM, Marc Schwartz <marc_schwa...@me.com> wrote:

> Hi,
>
> Jumping into this thread mainly on the point of the mode of the
> distribution, while also supporting Bert's comments below on theory.
>
> If the vector 'x' that is being passed to this function is an integer
> vector, then a tabulation of the integers can yield a 'mode', presuming of
> course that there is only one unique mode. You may have to decide how you
> want to handle a multi-modal discrete distribution.
>
> If the vector 'x' is continuous (e.g. contains floating point values),
> then a tabulation is going to be problematic for a variety of reasons.
>
> In that case, prior discussions on this point, have yielded the following
> estimation of the mode of a continuous distribution by using:
>
> Mode <- function(x) {
>   D <- density(x)
>   D$x[which.max(D$y)]
> }
>
> where the second line of the function gets you the value of 'x' at the
> maximum of the density estimate. Of course, there is still the possibility
> of a multi-modal distribution and the nuances of which kernel is used,
> etc., etc.
>
> Food for thought.
>
> Regards,
>
> Marc Schwartz
>
>
> > On Apr 19, 2016, at 7:07 PM, Bert Gunter <bgunter.4...@gmail.com> wrote:
> >
> > Well, instead of your functions try:
> >
> > Mode <- function(x) {
> > tabx <- table(x)
> > tabx[which.max(tabx)]
> > }
> >
> > and use R's IQR function instead of yours.
> >
> > ... so I still don't get why you want to return a character string
> > instead of a value for the IQR;
> > and the mode of a sample defined as above is generally a bad estimator
> > of the mode of the distribution. To say more than that would take me
> > too far afield. Post on stats.stackexchange.com if you want to know
> > why (if it's even relevant).
> >
> > Cheers,
> > Bert
> > Bert Gunter
> >
> > "The trouble with having an open mind is that people keep coming along
> > and sticking things into it."
> > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >
> >
> > On Tue, Apr 19, 2016 at 4:25 PM, Michael Artz <michaelea...@gmail.com>
> wrote:
> >> Hi,
> >>  Here is what I am doing
> >>
> >> notGroupedAll <- ddply(data
> >> ,~groupColumn
> >> ,summarise
> >> ,col1_mean=mean(col1)
> >> ,col2_mode=Mode(col2) #Function I wrote for getting the mode, shown below
> >> ,col3_Range=myIqr(col3)
> >> )
> >>
> >> groupedAll <- ddply(data
> >> ,~groupColumn
> >> ,summarise
> >> ,col1_mean=mean(col1)
> >> ,col2_mode=Mode(col2) #Function I wrote for getting the mode, shown below
> >> ,col3_Range=Mode(col3)
> >> )
> >>
> >> #custom Mode function
> >> Mode <- function(x) {
> >>  ux <- unique(x)
> >>  ux[which.max(tabulate(match(x, ux)))]
> >> }
> >>
> >> #the range function
> >> myIqr <- function(x) {
> >>  paste(round(quantile(x,0.375),0),round(quantile(x,0.625),0),sep="-")
> >> }
> >>
> >>
> >> Here is what I am doing!! :)
> >>
> >>
> >>
> >> On Tue, Apr 19, 2016 at 2:57 PM, William Dunlap <wdun...@tibco.com>
> wrote:
> >>>
> >>> If you show us, not just tell us about, a self-contained example
> >>> someone might show you a non-hacky way of getting the job done.
> >>> (I don't see an argument to plyr::ddply called 'transform'.)
> >>>
> >>> Bill Dunlap
> >>> TIBCO Software
> >>> wdunlap tibco.com
> >>>
> >>> On Tue, Apr 19, 2016 at 12:18 PM, Michael Artz <michaelea...@gmail.com
> >
> >>> wrote:
> >>>>
> >>>> Oh thanks for that clarification Bert!  Hope you enjoyed your
> coffee!  I
> >>>> ended up just using the transform argument in the ddply f

Re: [R] Interquartile Range

2016-04-19 Thread Michael Artz
Hi,
  Here is what I am doing

notGroupedAll <- ddply(data
 ,~groupColumn
 ,summarise
 ,col1_mean=mean(col1)
 ,col2_mode=Mode(col2) #Function I wrote for getting the mode, shown below
 ,col3_Range=myIqr(col3)
 )

groupedAll <- ddply(data
 ,~groupColumn
 ,summarise
 ,col1_mean=mean(col1)
 ,col2_mode=Mode(col2) #Function I wrote for getting the mode, shown below
 ,col3_Range=Mode(col3)
 )

#custom Mode function
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

#the range function
myIqr <- function(x) {
  paste(round(quantile(x,0.375),0),round(quantile(x,0.625),0),sep="-")
}


Here is what I am doing!! :)
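
A self-contained version of this kind of grouped summary can be sketched in base R, without plyr. Everything here is an assumption for illustration: the toy data frame, the helper name range_label, and the choice of the .25 and .75 quantiles (the post above uses .375 and .625, which the probs argument would allow).

```r
# Hypothetical data, mirroring the column names used in the post.
data <- data.frame(groupColumn = rep(1:5, 1:5), col1 = 2^(0:14))

# One character string per group: "lower-upper" (helper name is illustrative).
range_label <- function(x, probs = c(0.25, 0.75)) {
  q <- round(quantile(x, probs, names = FALSE), 0)
  paste(q[1], q[2], sep = "-")
}

# Split by group, summarise each piece, and bind the rows back together.
pieces <- split(data, data$groupColumn)
res <- do.call(rbind, lapply(pieces, function(g) {
  data.frame(groupColumn = g$groupColumn[1],
             col1_mean   = mean(g$col1),
             col1_range  = range_label(g$col1),
             stringsAsFactors = FALSE)
}))
res
```

Because each piece is summarised once, the result has one row per group, so no second pass with a Mode function is needed to collapse repeated values.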



On Tue, Apr 19, 2016 at 2:57 PM, William Dunlap <wdun...@tibco.com> wrote:

> If you show us, not just tell us about, a self-contained example
> someone might show you a non-hacky way of getting the job done.
> (I don't see an argument to plyr::ddply called 'transform'.)
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Tue, Apr 19, 2016 at 12:18 PM, Michael Artz <michaelea...@gmail.com>
> wrote:
>
>> Oh thanks for that clarification Bert!  Hope you enjoyed your coffee!  I
>> ended up just using the transform argument in the ddply function.  It
>> worked and it repeated, then I called a mode function in another call to
>> ddply that summarised.  Kinda hacky but oh well!
>>
>> On Tue, Apr 19, 2016 at 12:31 PM, Bert Gunter <bgunter.4...@gmail.com>
>> wrote:
>>
>>> ... and I'm getting another cup of coffee...
>>>
>>> -- Bert
>>> Bert Gunter
>>>
>>> "The trouble with having an open mind is that people keep coming along
>>> and sticking things into it."
>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>
>>>
>>> On Tue, Apr 19, 2016 at 10:30 AM, Bert Gunter <bgunter.4...@gmail.com>
>>> wrote:
>>> > NO NO  -- I am wrong! The paste() expression is of course evaluated.
>>> > It's just that a character string is returned of the form "something -
>>> > something".
>>> >
>>> > I apologize for the confusion.
>>> >
>>> > -- Bert
>>> >
>>> >
>>> >
>>> >
>>> > Bert Gunter
>>> >
>>> > "The trouble with having an open mind is that people keep coming along
>>> > and sticking things into it."
>>> > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>> >
>>> >
>>> > On Tue, Apr 19, 2016 at 10:25 AM, Bert Gunter <bgunter.4...@gmail.com>
>>> wrote:
>>> >> To be precise:
>>> >>
>>> >> paste(round(quantile(x,0.25),0),round(quantile(x,0.75),0),sep="-")
>>> >>
>>> >> is an expression that evaluates to a character string:
>>> >> "round(quantile(x,.25),0) - round(quantile(x,0.75),0)"
>>> >>
>>> >> no matter what the argument of your function, x. Hence
>>> >>
>>> >> return(paste(...)) will return this exact character string and never
>>> >> evaluates x.
>>> >>
>>> >>
>>> >> Cheers,
>>> >> Bert
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> Bert Gunter
>>> >>
>>> >> "The trouble with having an open mind is that people keep coming along
>>> >> and sticking things into it."
>>> >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>> >>
>>> >>
>>> >> On Tue, Apr 19, 2016 at 8:34 AM, William Dunlap via R-help
>>> >> <r-help@r-project.org> wrote:
>>> >>>> That didn't work Jim!
>>> >>>
>>> >>> It always helps to say how the suggestion did not work.  Jim's
>>> >>> function had a typo in it - was that the problem?  Or did you not
>>> >>> change the call to ddply to use that function.  Here is something
>>> >>> that might "work" for you:
>>> >>>
>>> >>>  library(plyr)
>>> >>>
>>> >>>  data <- data.frame(gr

Re: [R] Interquartile Range

2016-04-19 Thread Michael Artz
Oh thanks for that clarification Bert!  Hope you enjoyed your coffee!  I
ended up just using the transform argument in the ddply function.  It
worked and it repeated, then I called a mode function in another call to
ddply that summarised.  Kinda hacky but oh well!

On Tue, Apr 19, 2016 at 12:31 PM, Bert Gunter <bgunter.4...@gmail.com>
wrote:

> ... and I'm getting another cup of coffee...
>
> -- Bert
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Tue, Apr 19, 2016 at 10:30 AM, Bert Gunter <bgunter.4...@gmail.com>
> wrote:
> > NO NO  -- I am wrong! The paste() expression is of course evaluated.
> > It's just that a character string is returned of the form "something -
> > something".
> >
> > I apologize for the confusion.
> >
> > -- Bert
> >
> >
> >
> >
> > Bert Gunter
> >
> > "The trouble with having an open mind is that people keep coming along
> > and sticking things into it."
> > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >
> >
> > On Tue, Apr 19, 2016 at 10:25 AM, Bert Gunter <bgunter.4...@gmail.com>
> wrote:
> >> To be precise:
> >>
> >> paste(round(quantile(x,0.25),0),round(quantile(x,0.75),0),sep="-")
> >>
> >> is an expression that evaluates to a character string:
> >> "round(quantile(x,.25),0) - round(quantile(x,0.75),0)"
> >>
> >> no matter what the argument of your function, x. Hence
> >>
> >> return(paste(...)) will return this exact character string and never
> >> evaluates x.
> >>
> >>
> >> Cheers,
> >> Bert
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> Bert Gunter
> >>
> >> "The trouble with having an open mind is that people keep coming along
> >> and sticking things into it."
> >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >>
> >>
> >> On Tue, Apr 19, 2016 at 8:34 AM, William Dunlap via R-help
> >> <r-help@r-project.org> wrote:
> >>>> That didn't work Jim!
> >>>
> >>> It always helps to say how the suggestion did not work.  Jim's
> >>> function had a typo in it - was that the problem?  Or did you not
> >>> change the call to ddply to use that function.  Here is something
> >>> that might "work" for you:
> >>>
> >>>  library(plyr)
> >>>
> >>>  data <- data.frame(groupColumn=rep(1:5,1:5), col1=2^(0:14))
> >>>  myIqr <- function(x) {
> >>>  paste(round(quantile(x,0.25),0),round(quantile(x,0.75),0),sep="-")
> >>>  }
> >>>  ddply(data, ~groupColumn, summarise, col1_myIqr=myIqr(col1),
> >>> col1_IQR=stats::IQR(col1))
> >>>  #  groupColumn col1_myIqr col1_IQR
> >>>  #1           1        1-1        0
> >>>  #2           2        2-4        1
> >>>  #3           3      12-24       12
> >>>  #4           4    112-320      208
> >>>  #5           5  2048-8192     6144
> >>>
> >>> The important point is that
> >>> paste(round(quantile(x,0.25),0),round(quantile(x,0.75),0),sep="-")
> >>> is not a function, it is an expression.   ddply wants functions.
> >>>
> >>>
> >>> Bill Dunlap
> >>> TIBCO Software
> >>> wdunlap tibco.com
> >>>
> >>> On Tue, Apr 19, 2016 at 7:56 AM, Michael Artz <michaelea...@gmail.com>
> >>> wrote:
> >>>
> >>>> That didn't work Jim!
> >>>>
> >>>> Thanks anyway
> >>>>
> >>>> On Mon, Apr 18, 2016 at 9:02 PM, Jim Lemon <drjimle...@gmail.com>
> wrote:
> >>>>
> >>>> > Hi Michael,
> >>>> > At a guess, try this:
> >>>> >
> >>>> > iqr<-function(x) {
> >>>> >
> >>>>
> return(paste(round(quantile(x,0.25),0),round(quantile(x,0.75),0),sep="-")
> >>>> > }
> >>>> >
> >>>> > .col3_Range=iqr(datat$tenure)
> >>>> >
> >>>> > Jim
> >>>> >
> >>>> >
>

Re: [R] Interquartile Range

2016-04-19 Thread Michael Artz
Hi, that did not work for me either.  The value I got returned from that
function was " - "  :(. Thanks for the reply though.

On Tue, Apr 19, 2016 at 10:34 AM, William Dunlap <wdun...@tibco.com> wrote:

> > That didn't work Jim!
>
> It always helps to say how the suggestion did not work.  Jim's
> function had a typo in it - was that the problem?  Or did you not
> change the call to ddply to use that function.  Here is something
> that might "work" for you:
>
>  library(plyr)
>
>  data <- data.frame(groupColumn=rep(1:5,1:5), col1=2^(0:14))
>  myIqr <- function(x) {
>  paste(round(quantile(x,0.25),0),round(quantile(x,0.75),0),sep="-")
>  }
>  ddply(data, ~groupColumn, summarise, col1_myIqr=myIqr(col1),
> col1_IQR=stats::IQR(col1))
>  #  groupColumn col1_myIqr col1_IQR
>  #1           1        1-1        0
>  #2           2        2-4        1
>  #3           3      12-24       12
>  #4           4    112-320      208
>  #5           5  2048-8192     6144
>
> The important point is that
> paste(round(quantile(x,0.25),0),round(quantile(x,0.75),0),sep="-")
> is not a function, it is an expression.   ddply wants functions.
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Tue, Apr 19, 2016 at 7:56 AM, Michael Artz <michaelea...@gmail.com>
> wrote:
>
>> That didn't work Jim!
>>
>> Thanks anyway
>>
>> On Mon, Apr 18, 2016 at 9:02 PM, Jim Lemon <drjimle...@gmail.com> wrote:
>>
>> > Hi Michael,
>> > At a guess, try this:
>> >
>> > iqr<-function(x) {
>> >
>> return(paste(round(quantile(x,0.25),0),round(quantile(x,0.75),0),sep="-")
>> > }
>> >
>> > .col3_Range=iqr(datat$tenure)
>> >
>> > Jim
>> >
>> >
>> >
>> > On Tue, Apr 19, 2016 at 11:15 AM, Michael Artz <michaelea...@gmail.com>
>> > wrote:
>> > > Hi,
>> > >   I am trying to show an interquartile range while grouping values
>> using
>> > > the function ddply().  So my function call now is like
>> > >
>> > > groupedAll <- ddply(data
>> > >  ,~groupColumn
>> > >  ,summarise
>> > >  ,col1_mean=mean(col1)
>> > >  ,col2_mode=Mode(col2) #Function I wrote for getting
>> the
>> > > mode shown below
>> > >
>> > >  ,col3_Range=paste(as.character(round(quantile(datat$tenure,c(.25,
>> > > as.character(round(quantile(data$tenure,c(.75, sep = "-")
>> > >  )
>> > >
>> > > #custom Mode function
>> > > Mode <- function(x) {
>> > >   ux <- unique(x)
>> > >   ux[which.max(tabulate(match(x, ux)))]
>> > > }
>> > >
>> > > I am not sre what is going wrong on my interquartile range function,
>> it
>> > > works on its own outside of ddply()
>> > >
>> > > [[alternative HTML version deleted]]
>> > >
>> > > __
>> > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > > https://stat.ethz.ch/mailman/listinfo/r-help
>> > > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > > and provide commented, minimal, self-contained, reproducible code.
>> >
>>
>>
>
>



Re: [R] Interquartile Range

2016-04-19 Thread Michael Artz
Hi Bert,

I understand the difference between a character string and a number. I need
to return a character string; that is a requirement.  It needs to be in
that format.  Getting the range with IQR is trivial; I already tried it. The
grouping function accepts only one return value, and IQR returns two.
Thanks for the reply though, sir.
On Apr 19, 2016 10:20 AM, "Bert Gunter" <bgunter.4...@gmail.com> wrote:

> Are you aware that there *already is* a function that does this?
>
> ?IQR
>
> (also your "function" iqr" is just a character string and would have
> to be parsed and evaluated to become a function. But this is a
> TERRIBLE way to do things in R as it completely circumvents R's
> central functional programming paradigm).
>
> Cheers,
> Bert
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Tue, Apr 19, 2016 at 7:56 AM, Michael Artz <michaelea...@gmail.com>
> wrote:
> > That didn't work Jim!
> >
> > Thanks anyway
> >
> > On Mon, Apr 18, 2016 at 9:02 PM, Jim Lemon <drjimle...@gmail.com> wrote:
> >
> >> Hi Michael,
> >> At a guess, try this:
> >>
> >> iqr<-function(x) {
> >>
> return(paste(round(quantile(x,0.25),0),round(quantile(x,0.75),0),sep="-")
> >> }
> >>
> >> .col3_Range=iqr(datat$tenure)
> >>
> >> Jim
> >>
> >>
> >>
> >> On Tue, Apr 19, 2016 at 11:15 AM, Michael Artz <michaelea...@gmail.com>
> >> wrote:
> >> > Hi,
> >> >   I am trying to show an interquartile range while grouping values
> using
> >> > the function ddply().  So my function call now is like
> >> >
> >> > groupedAll <- ddply(data
> >> >  ,~groupColumn
> >> >  ,summarise
> >> >  ,col1_mean=mean(col1)
> >> >  ,col2_mode=Mode(col2) #Function I wrote for getting
> the
> >> > mode shown below
> >> >
> >> >  ,col3_Range=paste(as.character(round(quantile(datat$tenure,c(.25,
> >> > as.character(round(quantile(data$tenure,c(.75, sep = "-")
> >> >  )
> >> >
> >> > #custom Mode function
> >> > Mode <- function(x) {
> >> >   ux <- unique(x)
> >> >   ux[which.max(tabulate(match(x, ux)))]
> >> > }
> >> >
> >> > I am not sre what is going wrong on my interquartile range function,
> it
> >> > works on its own outside of ddply()
> >> >
> >>
> >
>



Re: [R] Interquartile Range

2016-04-19 Thread Michael Artz
That didn't work Jim!

Thanks anyway

On Mon, Apr 18, 2016 at 9:02 PM, Jim Lemon <drjimle...@gmail.com> wrote:

> Hi Michael,
> At a guess, try this:
>
> iqr<-function(x) {
>  return(paste(round(quantile(x,0.25),0),round(quantile(x,0.75),0),sep="-")
> }
>
> .col3_Range=iqr(datat$tenure)
>
> Jim
>
>
>
> On Tue, Apr 19, 2016 at 11:15 AM, Michael Artz <michaelea...@gmail.com>
> wrote:
> > Hi,
> >   I am trying to show an interquartile range while grouping values using
> > the function ddply().  So my function call now is like
> >
> > groupedAll <- ddply(data
> >  ,~groupColumn
> >  ,summarise
> >  ,col1_mean=mean(col1)
> >  ,col2_mode=Mode(col2) #Function I wrote for getting the
> > mode shown below
> >
> >  ,col3_Range=paste(as.character(round(quantile(datat$tenure,c(.25,
> > as.character(round(quantile(data$tenure,c(.75, sep = "-")
> >  )
> >
> > #custom Mode function
> > Mode <- function(x) {
> >   ux <- unique(x)
> >   ux[which.max(tabulate(match(x, ux)))]
> > }
> >
> > I am not sre what is going wrong on my interquartile range function, it
> > works on its own outside of ddply()
> >
>



[R] Interquartile Range

2016-04-18 Thread Michael Artz
Hi,
  I am trying to show an interquartile range while grouping values using
the function ddply().  So my function call now is like

groupedAll <- ddply(data
 ,~groupColumn
 ,summarise
 ,col1_mean=mean(col1)
 ,col2_mode=Mode(col2) #Function I wrote for getting the
mode shown below

 ,col3_Range=paste(as.character(round(quantile(datat$tenure,c(.25,
as.character(round(quantile(data$tenure,c(.75, sep = "-")
 )

#custom Mode function
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

I am not sure what is going wrong with my interquartile range function; it
works on its own outside of ddply().
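
One possible explanation for the symptom, offered as an assumption rather than a diagnosis: an expression such as data$tenure refers to the whole column of the original data frame, not to the rows of the current group, so every group would receive the same label. The sketch below shows the difference in base R; the data frame d and the helper label are made up for illustration.

```r
# Hypothetical data frame with a grouping column and a numeric column.
d <- data.frame(g = rep(c("a", "b"), each = 4),
                tenure = c(1, 2, 3, 4, 12, 20, 28, 40))

label <- function(x) {
  q <- round(quantile(x, c(0.25, 0.75), names = FALSE), 0)
  paste(q[1], q[2], sep = "-")
}

# Referring to the whole column ignores the grouping: one value, repeated.
whole <- rep(label(d$tenure), length(unique(d$g)))

# Applying the function to each group's slice gives one label per group.
per_group <- tapply(d$tenure, d$g, label)

whole
per_group
```

Inside a grouped summary, using the bare column name (tenure rather than data$tenure) is what lets the expression see only the current group's values.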



Re: [R] Decision Tree and Random Forrest

2016-04-15 Thread Michael Artz
Thanks Bill, that will give the result I would like; however, the example I
used is not the actual data I'm working with.  I have 25 or so columns,
each with 1-5 factors, and 4 of them are numerical.
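
For the small three-predictor golf example, the expand.grid() suggestion quoted below can be sketched end to end. This is an illustration under stated assumptions: the data are simulated, and a base R logistic regression (glm) stands in for "fittedModel", which the thread never specifies. With 25 columns the full grid would grow very large, so this only fits the toy case.

```r
set.seed(1)  # reproducible toy data; all values below are made up

n <- 300
golf <- data.frame(
  Humidity       = factor(sample(c("High", "Medium", "Low"), n, replace = TRUE)),
  Pending_Chores = factor(sample(c("Taxes", "None", "Laundry", "Car Maintenance"),
                                 n, replace = TRUE)),
  Wind           = factor(sample(c("High", "Low"), n, replace = TRUE))
)
# Made-up response: playing is less likely with high wind or pending taxes.
p <- 0.7 - 0.3 * (golf$Wind == "High") - 0.2 * (golf$Pending_Chores == "Taxes")
golf$Play <- rbinom(n, 1, p)

# Stand-in for "fittedModel": a logistic regression from base R.
fittedModel <- glm(Play ~ Humidity + Pending_Chores + Wind,
                   data = golf, family = binomial)

# Every combination of factor levels, then one predicted probability each.
newdata <- expand.grid(
  Humidity       = levels(golf$Humidity),
  Pending_Chores = levels(golf$Pending_Chores),
  Wind           = levels(golf$Wind)
)
newdata$ProbabilityOfPlayingGolf <- predict(fittedModel, newdata = newdata,
                                            type = "response")
head(newdata)
```

Each row of newdata then reads as one rule: a combination of levels and the model's estimated probability for it.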

On Fri, Apr 15, 2016 at 5:44 PM, William Dunlap <wdun...@tibco.com> wrote:

> Since you only have 3 predictors, each categorical with a small number of
> categories, you can use expand.grid to make a data.frame containing all
> possible combinations and give that the predict method for your model to
> get all possible predictions.
>
> Something like the following untested code.
> newdata <- expand.grid(
>     Humidity = levels(Humidity),             # (High, Medium, Low)
>     Pending_Chores = levels(Pending_Chores), # (Taxes, None, Laundry, Car Maintenance)
>     Wind = levels(Wind))                     # (High, Low)
> newdata$ProbabilityOfPlayingGolf <- predict(fittedModel, newdata=newdata)
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Fri, Apr 15, 2016 at 3:09 PM, Michael Artz <michaelea...@gmail.com>
> wrote:
>
>> I need the output to have groups and the probability that any given record
>> in that group has of being in the response class. Just like my email in
>> the beginning, I need output that looks like: if A and if B and if C, then
>> 77% it will be D.  The examples you provided are just simply not similar.
>> They are different and would take interpretation to get what I need.
>> On Apr 14, 2016 1:26 AM, "Sarah Goslee" <sarah.gos...@gmail.com> wrote:
>>
>> > So. Given that the second and third panels of the first figure in the
>> > first link I gave show a decision tree with decision rules at each split
>> > and the number of samples at each direction, what _exactly_ is your
>> > problem?
>> >
>> >
>> >
>> > On Wednesday, April 13, 2016, Michael Eugene <far...@hotmail.com>
>> wrote:
>> >
>> >> I still need the output to match the requirement in my original post,
>> >> with decision rules ("clusters") and a probability attached to them.
>> >> The examples are sort of similar.  You just provided links to general
>> >> info about trees.
>> >>
>> >>
>> >>
>> >> Sent from my Verizon, Samsung Galaxy smartphone
>> >>
>> >>
>> >>  Original message 
>> >> From: Sarah Goslee <sarah.gos...@gmail.com>
>> >> Date: 4/13/16 8:04 PM (GMT-06:00)
>> >> To: Michael Artz <michaelea...@gmail.com>
>> >> Cc: "r-help@r-project.org" <R-help@r-project.org>
>> >> Subject: Re: [R] Decision Tree and Random Forrest
>> >>
>> >>
>> >>
>> >> On Wednesday, April 13, 2016, Michael Artz <michaelea...@gmail.com>
>> >> wrote:
>> >>
>> >> That's great that you are familiar, and thanks for responding.  Have you
>> >> ever done what I am referring to? I have already spent time going
>> >> through links and tutorials about decision trees and random forests and
>> >> have even used them both before.
>> >>
>> >> Then what specifically is your problem? Both of the tutorials I
>> >> provided show worked examples, as does even the help for rpart. If none
>> >> of those, or your extensive reading, work for your project you will have
>> >> to be a lot more specific about why not.
>> >>
>> >> Sarah
>> >>
>> >>
>> >>
>> >> Mike
>> >> On Apr 13, 2016 5:32 PM, "Sarah Goslee" <sarah.gos...@gmail.com>
>> wrote:
>> >>
>> >> It sounds like you want classification or regression trees. rpart does
>> >> exactly what you describe.
>> >>
>> >> Here's an overview:
>> >> http://www.statmethods.net/advstats/cart.html
>> >>
>> >> But there are a lot of other ways to do the same thing in R, for
>> instance:
>> >> http://www.r-bloggers.com/a-brief-tour-of-the-trees-and-forests/
>> >>
>> >> You can get the same kind of information from random forests, but it's
>> >> less straightforward. If you want a clear set of rules as in your golf
>> >> example, then you need rpart or similar.
>> >>
>> >> Sarah
>> >>
>> >> On Wed, Apr 13, 2016 at 6:02 PM, Michael Artz <michaelea...@gmail.com>
>> >> wrote:
>> >> > Ah yes I w

Re: [R] Decision Tree and Random Forrest

2016-04-15 Thread Michael Artz
I need the output to have groups and the probability that any given record in
that group has of being in the response class. Just like my email in
the beginning, I need output that looks like: if A and if B and if C, then
77% it will be D.  The examples you provided are just simply not similar.
They are different and would take interpretation to get what I need.
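
A rough sketch of how rules of the form "IF ... THEN x%" might be read off a single tree with rpart. The data are simulated and the variable names are assumptions taken from the golf example; the probabilities are simply the share of "Yes" among the training records that land in each leaf, which is one reasonable reading of the request rather than the only one.

```r
library(rpart)  # recommended package, shipped with most R installations

set.seed(1)  # made-up data
n <- 400
golf <- data.frame(
  Humidity = factor(sample(c("High", "Medium", "Low"), n, replace = TRUE)),
  Wind     = factor(sample(c("High", "Low"), n, replace = TRUE))
)
p <- ifelse(golf$Wind == "High", 0.2, 0.8)
golf$Play <- factor(ifelse(runif(n) < p, "Yes", "No"))

fit <- rpart(Play ~ Humidity + Wind, data = golf, method = "class")

# fit$where gives, for each record, the row of fit$frame for its leaf.
leaf_rows <- sort(unique(fit$where))
leaf_ids  <- as.numeric(rownames(fit$frame)[leaf_rows])

# Share of "Yes" among the training records in each leaf (same order).
prob_yes <- as.vector(tapply(golf$Play == "Yes", fit$where, mean))

# path.rpart() returns the split conditions leading to each node.
paths <- path.rpart(fit, nodes = leaf_ids, print.it = FALSE)

for (i in seq_along(leaf_ids)) {
  conds <- paste(paths[[i]][-1], collapse = " AND ")  # drop the "root" entry
  cat(sprintf("IF %s THEN P(Play = Yes) = %.0f%%\n", conds, 100 * prob_yes[i]))
}
```

Each printed line corresponds to one leaf, so the set of lines partitions the records into groups with a probability attached to each.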
On Apr 14, 2016 1:26 AM, "Sarah Goslee" <sarah.gos...@gmail.com> wrote:

> So. Given that the second and third panels of the first figure in the
> first link I gave show a decision tree with decision rules at each split
> and the number of samples at each direction, what _exactly_ is your
> problem?
>
>
>
> On Wednesday, April 13, 2016, Michael Eugene <far...@hotmail.com> wrote:
>
>> I still need the output to match the requirement in my original post, with
>> decision rules ("clusters") and a probability attached to them.  The examples
>> are sort of similar.  You just provided links to general info about trees.
>>
>>
>>
>> Sent from my Verizon, Samsung Galaxy smartphone
>>
>>
>> ---- Original message 
>> From: Sarah Goslee <sarah.gos...@gmail.com>
>> Date: 4/13/16 8:04 PM (GMT-06:00)
>> To: Michael Artz <michaelea...@gmail.com>
>> Cc: "r-help@r-project.org" <R-help@r-project.org>
>> Subject: Re: [R] Decision Tree and Random Forrest
>>
>>
>>
>> On Wednesday, April 13, 2016, Michael Artz <michaelea...@gmail.com>
>> wrote:
>>
>> That's great that you are familiar, and thanks for responding.  Have you
>> ever done what I am referring to? I have already spent time going through
>> links and tutorials about decision trees and random forests and have even
>> used them both before.
>>
>> Then what specifically is your problem? Both of the tutorials I provided
>> show worked examples, as does even the help for rpart. If none of those, or
>> your extensive reading, work for your project you will have to be a lot
>> more specific about why not.
>>
>> Sarah
>>
>>
>>
>> Mike
>> On Apr 13, 2016 5:32 PM, "Sarah Goslee" <sarah.gos...@gmail.com> wrote:
>>
>> It sounds like you want classification or regression trees. rpart does
>> exactly what you describe.
>>
>> Here's an overview:
>> http://www.statmethods.net/advstats/cart.html
>>
>> But there are a lot of other ways to do the same thing in R, for instance:
>> http://www.r-bloggers.com/a-brief-tour-of-the-trees-and-forests/
>>
>> You can get the same kind of information from random forests, but it's
>> less straightforward. If you want a clear set of rules as in your golf
>> example, then you need rpart or similar.
>>
>> Sarah
>>
>> On Wed, Apr 13, 2016 at 6:02 PM, Michael Artz <michaelea...@gmail.com>
>> wrote:
>> > Ah yes I will have to use the predict function.  But the predict
>> > function will not really get me there.  Take the example where I have a
>> > model predicting whether or not I will play golf (this is the dependent
>> > variable), and there are three independent variables: Humidity (High,
>> > Medium, Low), Pending_Chores (Taxes, None, Laundry, Car Maintenance) and
>> > Wind (High, Low).  I would like rules that apply to any record matching
>> > them, e.g. (IF humidity = high AND pending_chores = None AND Wind = High
>> > THEN there is a 77% probability that play_golf is YES).  I was thinking
>> > that random forest would weight the rules somehow on the collection of
>> > trees and give a probability.  But if that doesn't make sense, then can
>> > you just tell me how to get the decision rules with one tree and I will
>> > work from that.
>> >
>> > Mike
>> >
>> > On Wed, Apr 13, 2016 at 4:30 PM, Bert Gunter <bgunter.4...@gmail.com>
>> wrote:
>> >
>> >> I think you are missing the point of random forests. But if you just
>> >> want to predict using the forest, there is a predict() method that you
>> >> can use. Other than that, I certainly don't understand what you mean.
>> >> Maybe someone else might.
>> >>
>> >> Cheers,
>> >> Bert
>> >>
>> >>
>> >> Bert Gunter
>> >>
>> >> "The trouble with having an open mind is that people keep coming along
>> >> and sticking things into it."
>> >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic str

Re: [R] Decision Tree and Random Forrest

2016-04-13 Thread Michael Artz
That's great that you are familiar, and thanks for responding.  Have you ever
done what I am referring to? I have already spent time going through links
and tutorials about decision trees and random forests and have even used
them both before.

Mike
On Apr 13, 2016 5:32 PM, "Sarah Goslee" <sarah.gos...@gmail.com> wrote:

It sounds like you want classification or regression trees. rpart does
exactly what you describe.

Here's an overview:
http://www.statmethods.net/advstats/cart.html

But there are a lot of other ways to do the same thing in R, for instance:
http://www.r-bloggers.com/a-brief-tour-of-the-trees-and-forests/

You can get the same kind of information from random forests, but it's
less straightforward. If you want a clear set of rules as in your golf
example, then you need rpart or similar.

Sarah
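
For the golf-style rules asked about in this thread, one way to list a rule
for each leaf of a single rpart tree is sketched below. It is only an
illustration: it uses the kyphosis data that ships with rpart rather than the
poster's data, and the names fit, leaves, rules, leaf_id and prob_present are
made up for this example. path.rpart() returns the splits on the way to each
requested node, and the class share in each leaf stands in for the "77%
probability" in the question.

```r
library(rpart)

# Classification tree on the kyphosis data that ships with rpart
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# Node numbers of the terminal nodes (leaves)
leaves <- as.numeric(rownames(fit$frame)[fit$frame$var == "<leaf>"])

# Splits on the path from the root to each leaf, one character vector per leaf
rules <- path.rpart(fit, nodes = leaves, print.it = FALSE)

# Share of training rows with Kyphosis == "present" in each leaf
leaf_id <- rownames(fit$frame)[fit$where]
prob_present <- vapply(split(kyphosis$Kyphosis == "present", leaf_id),
                       mean, numeric(1))

# Print one IF ... THEN line per leaf (the first path element is "root")
for (node in names(rules)) {
  conditions <- paste(rules[[node]][-1], collapse = " AND ")
  cat("IF", conditions, "THEN P(present) =",
      round(prob_present[[node]], 2), "\n")
}
```

The probabilities here are plain training-set proportions per leaf, so they
describe the fitted tree rather than out-of-sample performance.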

On Wed, Apr 13, 2016 at 6:02 PM, Michael Artz <michaelea...@gmail.com>
wrote:
> Ah yes I will have to use the predict function.  But the predict function
> will not get me there really.  If I can take the example that I have a
> model predicting whether or not I will play golf (this is the dependent
> value), and there are three independent variables Humidity(High, Medium,
> Low), Pending_Chores(Taxes, None, Laundry, Car Maintenance) and Wind (High,
> Low).  I would like rules like where any record that follows these rules
> (IF humidity = high AND pending_chores = None AND Wind = High THEN 77%
> there is probability that play_golf is YES).  I was thinking that random
> forrest would weight the rules somehow on the collection of trees and give
> a probability.  But if that doesnt make sense, then can you just tell me
> how to get the decsion rules with one tree and I will work from that.
>
> Mike
>
> Mike
>
> On Wed, Apr 13, 2016 at 4:30 PM, Bert Gunter <bgunter.4...@gmail.com>
wrote:
>
>> I think you are missing the point of random forests. But if you just
>> want to predict using the forest, there is a predict() method that you
>> can use. Other than that, I certainly don't understand what you mean.
>> Maybe someone else might.
>>
>> Cheers,
>> Bert
>>
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>>
>> On Wed, Apr 13, 2016 at 2:11 PM, Michael Artz <michaelea...@gmail.com>
>> wrote:
>> > OK, is there a way to do it with a decision tree?  I just need to make the
>> > decision rules. Perhaps I can pick one of the trees used by the random
>> > forest.  I am already somewhat familiar with random forests with respect
>> > to bagging, feature sampling, taking the mode from the leaf nodes, and
>> > it being an ensemble technique of many trees.  I am just working from the
>> > perspective that I need decision rules, working backward from that,
>> > and I need to do it in R.
>> >
>> > On Wed, Apr 13, 2016 at 4:08 PM, Bert Gunter <bgunter.4...@gmail.com>
>> wrote:
>> >>
>> >> Nope.
>> >>
>> >> Random forests are not decision trees -- they are ensembles (forests)
>> >> of trees. You need to go back and read up on them so you understand
>> >> how they work. The Hastie/Tibshirani/Friedman "The Elements of
>> >> Statistical Learning" has a nice explanation, but I'm sure there are
>> >> lots of good web resources, too.
>> >>
>> >> Cheers,
>> >> Bert
>> >>
>> >>
>> >> Bert Gunter
>> >>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Decision Tree and Random Forrest

2016-04-13 Thread Michael Artz
Ah yes, I will have to use the predict function.  But the predict function
will not really get me there.  Take the example of a model predicting
whether or not I will play golf (the dependent variable), with three
independent variables: Humidity (High, Medium, Low), Pending_Chores (Taxes,
None, Laundry, Car Maintenance) and Wind (High, Low).  I would like rules
that apply to any record matching them, such as: IF humidity = High AND
pending_chores = None AND wind = High THEN there is a 77% probability that
play_golf is YES.  I was thinking that random forest would weight the rules
somehow across the collection of trees and give a probability.  But if that
doesn't make sense, can you just tell me how to get the decision rules
from one tree, and I will work from that.

Mike

On Wed, Apr 13, 2016 at 4:30 PM, Bert Gunter <bgunter.4...@gmail.com> wrote:

> I think you are missing the point of random forests. But if you just
> want to predict using the forest, there is a predict() method that you
> can use. Other than that, I certainly don't understand what you mean.
> Maybe someone else might.
>
> Cheers,
> Bert
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Wed, Apr 13, 2016 at 2:11 PM, Michael Artz <michaelea...@gmail.com>
> wrote:
> > Ok is there a way to do  it with decision tree?  I just need to make the
> > decision rules. Perhaps I can pick one of the trees used with Random
> > Forrest.  I am somewhat familiar already with Random Forrest with
> respective
> > to bagging and feature sampling and getting the mode from the leaf nodes
> and
> > it being an ensemble technique of many trees.  I am just working from the
> > perspective that I need decision rules, and I am working backward form
> that,
> > and I need to do it in R.
> >
> > On Wed, Apr 13, 2016 at 4:08 PM, Bert Gunter <bgunter.4...@gmail.com>
> wrote:
> >>
> >> Nope.
> >>
> >> Random forests are not decision trees -- they are ensembles (forests)
> >> of trees. You need to go back and read up on them so you understand
> >> how they work. The Hastie/Tibshirani/Friedman "The Elements of
> >> Statistical Learning" has a nice explanation, but I'm sure there are
> >> lots of good web resources, too.
> >>
> >> Cheers,
> >> Bert
> >>
> >>
> >> Bert Gunter
> >>
> >> "The trouble with having an open mind is that people keep coming along
> >> and sticking things into it."
> >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >>
> >>
> >> On Wed, Apr 13, 2016 at 1:40 PM, Michael Artz <michaelea...@gmail.com>
> >> wrote:
> >> > Hi I'm trying to get the top decision rules from a decision tree.
> >> > Eventually I will like to do this with R and Random Forrest.  There
> has
> >> > to
> >> > be a way to output the decsion rules of each leaf node in an easily
> >> > readable way. I am looking at the randomforrest and rpart packages
> and I
> >> > dont see anything yet.
> >> > Mike
> >> >
> >
> >
>


Re: [R] Decision Tree and Random Forrest

2016-04-13 Thread Michael Artz
Also, that being said, just because random forests are not the same thing as
decision trees does not mean that you can't get decision rules from a random
forest.
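
As a rough sketch of what that could look like with the randomForest package:
getTree() returns the split table of one tree in the forest, and predict()
with type = "prob" returns the share of trees voting for each class. The iris
data and the names rf, tree1 and votes are stand-ins chosen for this example,
not anything from the original question.

```r
library(randomForest)

set.seed(1)
rf <- randomForest(Species ~ ., data = iris, ntree = 50)

# Split table for the first tree in the forest; each row is one node
tree1 <- getTree(rf, k = 1, labelVar = TRUE)
head(tree1)

# Forest-level class probabilities: fraction of trees voting for each class
votes <- predict(rf, newdata = iris[c(1, 51, 101), ], type = "prob")
votes
```

A single tree taken from the forest was grown on a bootstrap sample with
random feature subsets, so its rules need not match the forest's overall
prediction.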

On Wed, Apr 13, 2016 at 4:11 PM, Michael Artz <michaelea...@gmail.com>
wrote:

> Ok is there a way to do  it with decision tree?  I just need to make the
> decision rules. Perhaps I can pick one of the trees used with Random
> Forrest.  I am somewhat familiar already with Random Forrest with
> respective to bagging and feature sampling and getting the mode from the
> leaf nodes and it being an ensemble technique of many trees.  I am just
> working from the perspective that I need decision rules, and I am working
> backward form that, and I need to do it in R.
>
> On Wed, Apr 13, 2016 at 4:08 PM, Bert Gunter <bgunter.4...@gmail.com>
> wrote:
>
>> Nope.
>>
>> Random forests are not decision trees -- they are ensembles (forests)
>> of trees. You need to go back and read up on them so you understand
>> how they work. The Hastie/Tibshirani/Friedman "The Elements of
>> Statistical Learning" has a nice explanation, but I'm sure there are
>> lots of good web resources, too.
>>
>> Cheers,
>> Bert
>>
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>>
>> On Wed, Apr 13, 2016 at 1:40 PM, Michael Artz <michaelea...@gmail.com>
>> wrote:
>> > Hi I'm trying to get the top decision rules from a decision tree.
>> > Eventually I will like to do this with R and Random Forrest.  There has
>> to
>> > be a way to output the decsion rules of each leaf node in an easily
>> > readable way. I am looking at the randomforrest and rpart packages and I
>> > dont see anything yet.
>> > Mike
>> >
>>
>
>


Re: [R] Decision Tree and Random Forrest

2016-04-13 Thread Michael Artz
OK, is there a way to do it with a decision tree?  I just need to make the
decision rules. Perhaps I can pick one of the trees used by the random
forest.  I am already somewhat familiar with random forests with respect to
bagging, feature sampling, taking the mode from the leaf nodes, and it being
an ensemble technique of many trees.  I am just working from the perspective
that I need decision rules, working backward from that, and I need to do it
in R.

On Wed, Apr 13, 2016 at 4:08 PM, Bert Gunter <bgunter.4...@gmail.com> wrote:

> Nope.
>
> Random forests are not decision trees -- they are ensembles (forests)
> of trees. You need to go back and read up on them so you understand
> how they work. The Hastie/Tibshirani/Friedman "The Elements of
> Statistical Learning" has a nice explanation, but I'm sure there are
> lots of good web resources, too.
>
> Cheers,
> Bert
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Wed, Apr 13, 2016 at 1:40 PM, Michael Artz <michaelea...@gmail.com>
> wrote:
> > Hi I'm trying to get the top decision rules from a decision tree.
> > Eventually I will like to do this with R and Random Forrest.  There has
> to
> > be a way to output the decsion rules of each leaf node in an easily
> > readable way. I am looking at the randomforrest and rpart packages and I
> > dont see anything yet.
> > Mike
> >
>


[R] Decision Tree and Random Forrest

2016-04-13 Thread Michael Artz
Hi, I'm trying to get the top decision rules from a decision tree.
Eventually I would like to do this in R with a random forest.  There has to
be a way to output the decision rules of each leaf node in an easily
readable way. I am looking at the randomForest and rpart packages and I
don't see anything yet.
Mike


[R] No color in plotting

2016-04-13 Thread Michael Artz
Hi, I am having a problem with plot() and ggplot().  When I call one of
these functions, the plotting area starts to look as though it is working,
but nothing ever becomes visible, unless it is a dendrogram.  With the bar
chart, the plotting area just had an x and y axis and nothing else. I tried
a bar chart with ggplot and I tried to plot a tree result from rpart().  I
couldn't see anything plotted.  Is there some way I should be
troubleshooting this?  I'm thinking it's an R config I did or didn't do.  I
really have no idea though.


[R] Dissimilarity matrix and number clusters determination

2016-04-11 Thread Michael Artz
Hi,
  I already have a dissimilarity matrix and I am passing the results to
elbow.batch() to get an optimal number of clusters.  Am I reading
the output below correctly that I should have 17 clusters?

code:
top150 <- sampleset[1:150,]
{cluster1 <- daisy(top150
   , metric = c("gower")
   , stand = TRUE
   , type = list(symm = 1))
}

dist.obj <- dist(cluster1)
hclust.obj <- hclust(dist.obj)
css.obj <- css.hclust(dist.obj,hclust.obj)
elbow.obj <- elbow.batch(css.obj)

[1] "A \"good\" k=17 (EV=0.80) is detected when the EV is no less than
0.8\nand the increment of EV is no more than 0.01 for a bigger k.\n"
attr(,"class")
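
One thing that may be worth checking in the code above: daisy() already
returns a dissimilarity object, so calling dist() on it would, as far as I
can tell, compute Euclidean distances between the rows of the dissimilarity
matrix rather than reuse the Gower dissimilarities. A hedged sketch of
clustering on the dissimilarities directly follows; iris stands in for
sampleset, and d, hc and groups are names invented for this example. It does
not use the css.hclust() or elbow.batch() steps, only the k = 17 that the
output reports.

```r
library(cluster)

# Gower dissimilarities on mixed-type data (iris stands in for sampleset)
d <- daisy(iris, metric = "gower")

# daisy() output already inherits from "dist"
inherits(d, "dist")

# Cluster on the dissimilarities themselves rather than on dist(d)
hc <- hclust(as.dist(d))

# Cut the dendrogram at the k reported by the elbow output
groups <- cutree(hc, k = 17)
table(groups)
```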


Re: [R] why data frame's logical index isnt working

2016-04-07 Thread Michael Artz
I don't get it. I thought the double index was to indicate an individual
element within a column (vector)?
I will stop using data.frame, thanks a lot!
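
A small sketch of the distinction, under the assumption that the goal was to
flag rows where a column equals a given value. The data frame dat, its status
column and the flag column are hypothetical names for this example. On a data
frame, [[ with a name extracts the whole column as a vector; an element is
picked by indexing that vector again.

```r
dat <- data.frame(status = c("ok", "ConditionMet", "ok"),
                  stringsAsFactors = FALSE)

# Two string literals are compared here, so the result is a single FALSE
"status" == "ConditionMet"

# [[ on a data frame extracts a whole column as a vector
status_col <- dat[["status"]]

# A second index then picks one element of that vector
second_value <- dat[["status"]][2]

# Comparing the column to a value gives one TRUE/FALSE per row,
# which can be used as a logical index for the assignment
dat$flag <- NA_real_
dat$flag[dat[["status"]] == "ConditionMet"] <- 1
dat
```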

On Thu, Apr 7, 2016 at 9:29 PM, David Winsemius <dwinsem...@comcast.net>
wrote:

>
> > On Apr 7, 2016, at 6:46 PM, Michael Artz <michaelea...@gmail.com> wrote:
> >
> > data.frame.$columnToAdd["CurrentColumnName" == "ConditionMet"] <- 1
> >
> > Can someone please explain to me why the above command gives all NAs to
> > columnToAdd?  I thought this was possible in R to do logical expression
> in
> > the index of a data frame
>
> It is possible, but please execute this at a console line and then read
> ?"[" to see what is happening:
>
> "CurrentColumnName" == "ConditionMet"  # almost surely FALSE
>
> Let's assume your dataframe were named 'dat'.
>
> Perhaps you meant to write:
>
> dat$colToAdd[ dat[["CurrentColumnName"]] == dat[["ConditionMet"]] ] <- 1
>
> And do please stop naming your dataframes "data.frame".
>
>
> >
>
> David Winsemius
> Alameda, CA, USA
>
>


Re: [R] simple question on data frames assignment

2016-04-07 Thread Michael Artz
Why am I better off with true and false?
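
One possible reading of the advice, sketched with the colordata example from
this thread: a logical column can be counted, averaged and used for
subsetting directly, whereas a numeric 0/1 column used as an index is treated
as row positions. The column name is_blue and the result names are invented
for this illustration.

```r
colordata <- data.frame(id = 1:5,
                        color = c("blue", "red", "green", "blue", "orange"))

# Logical response: TRUE where the color is blue
colordata$is_blue <- colordata$color == "blue"

n_blue     <- sum(colordata$is_blue)    # TRUE counts as 1, so this is a count
share_blue <- mean(colordata$is_blue)   # proportion of blue rows

# Logical vectors subset rows directly and can be negated with !
blue_rows     <- colordata[colordata$is_blue, ]
not_blue_rows <- colordata[!colordata$is_blue, ]

# A numeric 0/1 vector used the same way is read as row positions:
# c(1, 0, 0, 1, 0) selects row 1 twice and nothing else
by_position <- colordata[c(1, 0, 0, 1, 0), ]
```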

On Thu, Apr 7, 2016 at 8:41 AM, Hadley Wickham <h.wick...@gmail.com> wrote:

> == is also vectorised, and you're better off with TRUE and FALSE
> rather than 1 and 0, so I'd recommend:
>
> colordata$response <- colordata$color == 'blue'
>
> Hadley
>
> On Thu, Apr 7, 2016 at 6:52 AM, David Barron <dnbar...@gmail.com> wrote:
> > ifelse is vectorised, so just use that without the loop.
> >
> > colordata$response <- ifelse(colordata$color == 'blue', 1, 0)
> >
> > David
> >
> > On 7 April 2016 at 12:41, Michael Artz <michaelea...@gmail.com> wrote:
> >
> >> Hi I'm not sure how to ask this, but its a very easy question to answer
> for
> >> an R person.
> >>
> >> What is an easy way to check for a column value and then assigne a new
> >> column a value based on that old column value?
> >>
> >> For example, Im doing
> >>  colordata <- data.frame(id = c(1,2,3,4,5), color = c("blue", "red",
> >> "green", "blue", "orange"))
> >>  for (i in 1:nrow(colordata)){
> >>colordata$response[i] <- ifelse(colordata[i,"color"] == "blue", 1, 0)
> >>  }
> >>
> >> which works,  but I don't want to use the for loop I want to "vecotrize"
> >> this.  How would this be implemented?
> >>
> >>
> >
>
>
>
> --
> http://hadley.nz
>


Re: [R] simple question on data frames assignment

2016-04-07 Thread Michael Artz
FYI, this statement returned the following error:

'Error in "Yes" + 0 : non-numeric argument to binary operator'
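
The error is consistent with operator precedence: in R, binary + binds more
tightly than ==, so the string is added to 0 before the comparison happens.
A minimal sketch with the colordata example from this thread (the column
names response and response_int are chosen for this illustration):

```r
colordata <- data.frame(id = 1:5,
                        color = c("blue", "red", "green", "blue", "orange"))

# This fails: 'blue' + 0 is evaluated first, because + binds tighter than ==
# colordata$response <- colordata$color == 'blue' + 0

# Parenthesise the comparison, then add 0 to turn TRUE/FALSE into 1/0
colordata$response <- (colordata$color == "blue") + 0

# as.integer() is an equivalent, more explicit conversion
colordata$response_int <- as.integer(colordata$color == "blue")
colordata
```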

On Thu, Apr 7, 2016 at 8:43 AM, <ruipbarra...@sapo.pt> wrote:

> Hello,
>
> Or even simpler, without ifelse,
>
> colordata$response <- colordata$color == 'blue' + 0
>
> Hope this helps,
>
> Rui Barradas
>
>
> Citando David Barron <dnbar...@gmail.com>:
>
> ifelse is vectorised, so just use that without the loop.
>
> colordata$response <- ifelse(colordata$color == 'blue', 1, 0)
>
> David
>
> On 7 April 2016 at 12:41, Michael Artz <michaelea...@gmail.com> wrote:
>
> Hi I'm not sure how to ask this, but its a very easy question to answer for
> an R person.
>
> What is an easy way to check for a column value and then assigne a new
> column a value based on that old column value?
>
> For example, Im doing
> colordata <- data.frame(id = c(1,2,3,4,5), color = c("blue", "red",
> "green", "blue", "orange"))
> for (i in 1:nrow(colordata)){
>colordata$response[i] <- ifelse(colordata[i,"color"] == "blue", 1, 0)
> }
>
> which works,  but I don't want to use the for loop I want to "vecotrize"
> this.  How would this be implemented?
>
>
>
>
>


[R] why data frame's logical index isnt working

2016-04-07 Thread Michael Artz
data.frame.$columnToAdd["CurrentColumnName" == "ConditionMet"] <- 1

Can someone please explain to me why the above command gives all NAs to
columnToAdd?  I thought it was possible in R to use a logical expression in
the index of a data frame.


Re: [R] simple question on data frames assignment

2016-04-07 Thread Michael Artz
It all makes so much sense now

On Thu, Apr 7, 2016 at 10:04 AM, Jeff Newmiller <jdnew...@dcn.davis.ca.us>
wrote:

> lapply(colordata2[ -1 ], f )
>
> When you put the parentheses on, you are calling the function yourself
> before lapply gets a chance. The error pops up because you are giving a
> vector of numbers (the answer f gave you) to the second argument of lapply
> instead of a function.
> --
> Sent from my phone. Please excuse my brevity.
>
> On April 7, 2016 7:31:18 AM PDT, Michael Artz <michaelea...@gmail.com>
> wrote:
>>
>> If you are not using an anonymous function and say you had written the
>> function out
>>
>> The below gives me the error > 'f(colordata2$color1)' is not a function,
>> character or symbol'  But then how is the anonymous function working?
>>
>>
>> f <- function(col){
>>   ifelse(col == 'blue', 1, 0)
>> }
>> responses <- lapply(colordata2[ -1 ], f(colordata2$color1) )
>>
>> 'f(colordata2$color1)' is not a function, character or symbol'
>>
>> then how could you then use this fuction in lapply if not for the
>> anonymous function?
>>
>> On Thu, Apr 7, 2016 at 8:17 AM, Jeff Newmiller <jdnew...@dcn.davis.ca.us>
>> wrote:
>>
>>> Lapply is not a vectorized function. It is compact to read, but it would
>>> not be worth using for this calculation.
>>>
>>> However, if your data frame had multiple color columns in your data
>>> frame that you wanted to make responses for then you might want to use
>>> lapply as a more compact version of a for loop to repeat this operation.
>>>
>>> colordata2 <- data.frame(id = c(1,2,3,4,5), color1 = c("blue", "red",
>>> "green", "blue", "orange"), color2 = c("orange", "green",
>>> "blue", "red", "red"))
>>> responses <- lapply( colordata2[ -1 ], function(col) { ifelse(col ==
>>> 'blue', 1, 0) } )
>>> names(responses) <- names( colordata2 )[-1]
>>>
>>> where each of the columns other than the first is handed in turn to the
>>> anonymous function that does the response calculation. The result is a data
>>> frame (list of columns) with no column names, so I give the new columns
>>> names based on the old column names. You could choose different names, e.g.
>>>
>>> names(responses) <- paste0( "response", 1:2 )
>>>
>>> but you have to be careful to fix that code whenever you change the
>>> colordata2 data frame to have more columns.
>>> --
>>> Sent from my phone. Please excuse my brevity.
>>>
>>> On April 7, 2016 4:57:04 AM PDT, Michael Artz <michaelea...@gmail.com>
>>> wrote:
>>>>
>>>> Thaks so much!  And how would you incorporate lapply() here?
>>>>
>>>> On Thu, Apr 7, 2016 at 6:52 AM, David Barron <dnbar...@gmail.com> wrote:
>>>>
>>>>  ifelse is vectorised, so just use that without the loop.
>>>>>
>>>>>  colordata$response <- ifelse(colordata$color == 'blue', 1, 0)
>>>>>
>>>>>  David
>>>>>
>>>>>  On 7 April 2016 at 12:41, Michael Artz <michaelea...@gmail.com> wrote:
>>>>>
>>>>>  Hi I'm not sure how to ask this, but its a very easy question to answer
>>>>>>  for
>>>>>>  an R person.
>>>>>>
>>>>>>  What is an easy way to check for a column value and
>>>>>> then assigne a new
>>>>>>  column a value based on that old column value?
>>>>>>
>>>>>>  For example, Im doing
>>>>>>   colordata <- data.frame(id = c(1,2,3,4,5),
>>>>>> color = c("blue", "red",
>>>>>>  "green", "blue", "orange"))
>>>>>>   for (i in 1:nrow(colordata)){
>>>>>> colordata$response[i] <- ifelse(colordata[i,"color"] == "blue", 1, 0)
>>>>>>   }
>>>>>>
>>>>>>  which works,  but I don't want to use the for loop I want to "vecotrize"
>>>>>>  this.  How would this be implemented?
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>


Re: [R] simple question on data frames assignment

2016-04-07 Thread Michael Artz
Suppose you are not using an anonymous function and have written the
function out instead.

The code below gives me the error 'f(colordata2$color1)' is not a function,
character or symbol.  But then how is the anonymous function working?


f <- function(col){
  ifelse(col == 'blue', 1, 0)
}
responses <- lapply(colordata2[ -1 ], f(colordata2$color1) )

So how could you use this function in lapply without the anonymous
function?
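
A short sketch of the difference, reusing colordata2 and f from this thread:
lapply expects the function object itself as its second argument and calls
it once per column. Writing f(colordata2$color1) calls f immediately and
hands lapply the resulting numbers, which is what the error message is
complaining about.

```r
colordata2 <- data.frame(id = 1:5,
                         color1 = c("blue", "red", "green", "blue", "orange"),
                         color2 = c("orange", "green", "blue", "red", "red"))

f <- function(col) {
  ifelse(col == "blue", 1, 0)
}

# Pass f itself (no parentheses); lapply calls it on each column in turn
responses <- lapply(colordata2[-1], f)

# lapply returns a list that keeps the column names,
# so it converts to a data frame directly
response_df <- as.data.frame(responses)
response_df
```

The anonymous version works for the same reason: function(col) { ... } is an
expression that evaluates to a function, not to the result of calling one.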

On Thu, Apr 7, 2016 at 8:17 AM, Jeff Newmiller <jdnew...@dcn.davis.ca.us>
wrote:

> Lapply is not a vectorized function. It is compact to read, but it would
> not be worth using for this calculation.
>
> However, if your data frame had multiple color columns in your data frame
> that you wanted to make responses for then you might want to use lapply as
> a more compact version of a for loop to repeat this operation.
>
> colordata2 <- data.frame(id = c(1,2,3,4,5), color1 = c("blue", "red",
> "green", "blue", "orange"), color2 = c("orange", "green",
> "blue", "red", "red"))
> responses <- lapply( colordata2[ -1 ], function(col) { ifelse(col ==
> 'blue', 1, 0) } )
> names(responses) <- names( colordata2 )[-1]
>
> where each of the columns other than the first is handed in turn to the
> anonymous function that does the response calculation. The result is a data
> frame (list of columns) with no column names, so I give the new columns
> names based on the old column names. You could choose different names, e.g.
>
> names(responses) <- paste0( "response", 1:2 )
>
> but you have to be careful to fix that code whenever you change the
> colordata2 data frame to have more columns.
> --
> Sent from my phone. Please excuse my brevity.
>
> On April 7, 2016 4:57:04 AM PDT, Michael Artz <michaelea...@gmail.com>
> wrote:
>>
>> Thaks so much!  And how would you incorporate lapply() here?
>>
>> On Thu, Apr 7, 2016 at 6:52 AM, David Barron <dnbar...@gmail.com> wrote:
>>
>>  ifelse is vectorised, so just use that without the loop.
>>>
>>>  colordata$response <- ifelse(colordata$color == 'blue', 1, 0)
>>>
>>>  David
>>>
>>>  On 7 April 2016 at 12:41, Michael Artz <michaelea...@gmail.com> wrote:
>>>
>>>  Hi I'm not sure how to ask this, but its a very easy question to answer
>>>>  for
>>>>  an R person.
>>>>
>>>>  What is an easy way to check for a column value and then assigne a new
>>>>  column a value based on that old column value?
>>>>
>>>>  For example, Im doing
>>>>   colordata <- data.frame(id = c(1,2,3,4,5),
>>>> color = c("blue", "red",
>>>>  "green", "blue", "orange"))
>>>>   for (i in 1:nrow(colordata)){
>>>> colordata$response[i] <- ifelse(colordata[i,"color"] == "blue", 1, 0)
>>>>   }
>>>>
>>>>  which works,  but I don't want to use the for loop I want to "vecotrize"
>>>>  this.  How would this be implemented?
>>>>
>>>
>>>
>>>
>>
>>
>>


Re: [R] simple question on data frames assignment

2016-04-07 Thread Michael Artz
Thanks so much!  And how would you incorporate lapply() here?

On Thu, Apr 7, 2016 at 6:52 AM, David Barron <dnbar...@gmail.com> wrote:

> ifelse is vectorised, so just use that without the loop.
>
> colordata$response <- ifelse(colordata$color == 'blue', 1, 0)
>
> David
>
> On 7 April 2016 at 12:41, Michael Artz <michaelea...@gmail.com> wrote:
>
>> Hi, I'm not sure how to ask this, but it's a very easy question to answer
>> for an R person.
>>
>> What is an easy way to check a column value and then assign a new
>> column a value based on that old column value?
>>
>> For example, I'm doing
>>  colordata <- data.frame(id = c(1,2,3,4,5), color = c("blue", "red",
>>                          "green", "blue", "orange"))
>>  for (i in 1:nrow(colordata)){
>>    colordata$response[i] <- ifelse(colordata[i,"color"] == "blue", 1, 0)
>>  }
>>
>> which works, but I don't want to use the for loop; I want to "vectorize"
>> this.  How would this be implemented?
>>
>
>



[R] simple question on data frames assignment

2016-04-07 Thread Michael Artz
Hi, I'm not sure how to ask this, but it's a very easy question to answer for
an R person.

What is an easy way to check a column value and then assign a new
column a value based on that old column value?

For example, I'm doing
 colordata <- data.frame(id = c(1,2,3,4,5), color = c("blue", "red",
                         "green", "blue", "orange"))
 for (i in 1:nrow(colordata)){
   colordata$response[i] <- ifelse(colordata[i,"color"] == "blue", 1, 0)
 }

which works, but I don't want to use the for loop; I want to "vectorize"
this.  How would this be implemented?
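
A minimal sketch of a loop-free version, using the same colordata example. Both forms operate on the whole color column in one call; the response2 column name is only for illustration.

```r
colordata <- data.frame(id = c(1, 2, 3, 4, 5),
                        color = c("blue", "red", "green", "blue", "orange"))

# ifelse() is vectorized: it tests every element of the column at once
colordata$response <- ifelse(colordata$color == "blue", 1, 0)

# Equivalent for a 0/1 flag: convert the logical comparison to integer
colordata$response2 <- as.integer(colordata$color == "blue")
```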



Re: [R] p values from GLM

2016-04-02 Thread Michael Artz
Maybe it's not the article itself for sale.  Sometimes a company will
charge a fee to have access to its knowledge base.  Not because it owns all
of the content, but because the articles, publications, etc have been
tracked down and centralized.  This is also the whole idea behind paying a
company a few dollars to do a semi-extensive background check.


On Sat, Apr 2, 2016 at 11:51 AM, Spencer Graves <
spencer.gra...@effectivedefense.org> wrote:

>
>
> On 4/2/2016 11:07 AM, David Winsemius wrote:
>
>> On Apr 1, 2016, at 5:01 PM, Duncan Murdoch wrote:
>>>
>>> On 01/04/2016 6:46 PM, Bert Gunter wrote:
>>>
>>>> ... of course, whether one **should** get them is questionable...
>>>>
>>> They're just statistics.  How could it hurt to look at them?
>>>
>> Like Rolf, I thought that this utterance on April 1 deserved fortune
>> enshrinement. It reminded me of one of my favorite articles: "P-Values are
>> Random Variables".
>>
>> Unfortunately a legal copy of that paper is still behind a corporate
>> firewall for which you would need to fork over USD 50.00, but a google
>> search for "P-Values are Random Variables The American Statistician" should
>> yield options for the less squeamish. (My copy was obtained when I did have
>> legal access.)
>>
>
>
>   How much money did or do the authors of that paper receive in
> royalties?
>
>
>   That's important, because the purpose of US copyright law is, "To
> promote the Progress of Science and useful Arts, by securing for limited
> Times to Authors and Inventors the exclusive Right to their respective
> Writings and Discoveries."  (E.g., Wikipedia, "Copyright law of the United
> States", "https://en.wikipedia.org/wiki/Copyright_law_of_the_United_States".)
> Very few if any refereed academic papers are written for financial gain:
> Lawrence Lessig said that congressional representatives rarely hear
> counterarguments to the garbage they get from corporate lobbyists.  The
> Trans Pacific Partnership (TPP, and probably also the Transatlantic Trade
> and Investment Partnership) will strengthen the rights of corporations in
> this area.  If you think that will limit the progress of science and the
> useful arts, as I do, I suggest you contact your elected representatives
> and tell them so -- if you are a citizen of a country with elected
> representatives.  I think we should also ask the American Statistical
> Association how much money they make from that and what it would take to
> put all that material in the public domain.  I think professional
> organizations should come out strongly against these provisions of US
> copyright law and trade agreements that strengthen rather than weaken the
> stranglehold that major corporations have on the intellectual heritage of
> humanity.
>
>
>   This relates to R, because R is based on an assumption that the
> dissemination of publications, articles and software for which the authors
> are not remunerated from copyright proceeds should not be limited by
> pre-internet rules that unnecessarily stifle the distribution of knowledge
> and, with it, improvements in productivity and economic growth.
>
>
>   Best Wishes,
>   Spencer Graves
>
>
>



Re: [R] Could not find function even though I have all necessary packages

2016-03-28 Thread Michael Artz
Thank you everyone I got it!

All I needed was to install munsell.  I had made a typo when I first tried
to install it.
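
For anyone hitting the same error, a small sketch of how to check which packages in a dependency chain are actually loadable. The helper name missing_packages is made up for illustration; requireNamespace() and install.packages() are standard R functions.

```r
# Return the packages from `pkgs` that cannot be loaded from the library
missing_packages <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

# munsell is the package named in the error message above
needed <- c("munsell", "ggplot2", "caret")
to_install <- missing_packages(needed)
if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install them
}
```

Checking the names programmatically avoids the silent failure that a mistyped package name causes.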

On Mon, Mar 28, 2016 at 12:01 PM, Michael Artz <michaelea...@gmail.com>
wrote:

> Thanks.  sessionInfo() did not show it.
>
> This is the error when I try library(caret)
>
>
> > library(caret)
> Loading required package: ggplot2
> Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck
> = vI[[j]]) :
>   there is no package called ‘munsell’
> Error: package ‘ggplot2’ could not be loaded
>
> I tried install.packages("ggplot2") and then I ran
> library(ggplot2) and it gave me this error
>
> Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck
> = vI[[j]]) :
>   there is no package called ‘munsell’
> Error: package or namespace load failed for ‘ggplot2’
>
>
>
> On Mon, Mar 28, 2016 at 11:57 AM, Jeff Newmiller <jdnew...@dcn.davis.ca.us
> > wrote:
>
>> Post plain text only please.
>>
>> Are you sure it loaded? Verify with sessionInfo()...
>> --
>> Sent from my phone. Please excuse my brevity.
>>
>> On March 28, 2016 9:21:56 AM PDT, Michael Artz <michaelea...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>   I am getting the error,
>>>
>>> Error: could not find function "createDataPartition"
>>>
>>> when I do the code
>>> dataFrame_data <- createDataPartition(data$colA, p=.7, list=FALSE)
>>>
>>> even though I have run already
>>>
>>> install.packages("caret", dependencies = c("Depends", "Imports",
>>> "Suggests"))
>>> and
>>> install.packages("caret")
>>>
>>> those worked and I then ran
>>> library(caret)
>>>
>>> does anyone know why I'm unable to use this function?
>>>
>


Re: [R] Could not find function even though I have all necessary packages

2016-03-28 Thread Michael Artz
Thanks.  sessionInfo() did not show it.

This is the error when I try library(caret)


> library(caret)
Loading required package: ggplot2
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck
= vI[[j]]) :
  there is no package called ‘munsell’
Error: package ‘ggplot2’ could not be loaded

I tried install.packages("ggplot2") and then I ran
library(ggplot2) and it gave me this error

Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck
= vI[[j]]) :
  there is no package called ‘munsell’
Error: package or namespace load failed for ‘ggplot2’



On Mon, Mar 28, 2016 at 11:57 AM, Jeff Newmiller <jdnew...@dcn.davis.ca.us>
wrote:

> Post plain text only please.
>
> Are you sure it loaded? Verify with sessionInfo()...
> --
> Sent from my phone. Please excuse my brevity.
>
> On March 28, 2016 9:21:56 AM PDT, Michael Artz <michaelea...@gmail.com>
> wrote:
>
>> Hi,
>>   I am getting the error,
>>
>> Error: could not find function "createDataPartition"
>>
>> when I do the code
>> dataFrame_data <- createDataPartition(data$colA, p=.7, list=FALSE)
>>
>> even though I have run already
>>
>> install.packages("caret", dependencies = c("Depends", "Imports",
>> "Suggests"))
>> and
>> install.packages("caret")
>>
>> those worked and I then ran
>> library(caret)
>>
>> does anyone know why I'm unable to use this function?
>>


[R] Could not find function even though I have all necessary packages

2016-03-28 Thread Michael Artz
Hi,
  I am getting the error,

Error: could not find function "createDataPartition"

when I run the code
dataFrame_data <- createDataPartition(data$colA, p=.7, list=FALSE)

even though I have already run

install.packages("caret", dependencies = c("Depends", "Imports",
"Suggests"))
and
install.packages("caret")

Those worked, and I then ran
library(caret)

Does anyone know why I'm unable to use this function?



Re: [R] Logistic Regression output baseline (reference) category

2016-03-25 Thread Michael Artz
Hi,
  I have now read an introductory text on regression and I think I do
understand what the intercept is doing.  However, my original question is
still unanswered.  I understand that the intercept term is the constant
that each other term is measured against.  I think baseline is a good word
for it.  However, it does not represent any one of the x variables by
itself.  Is there a way in R to extrapolate the individual x variable
intercepts from the equation somehow?
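
A hedged sketch of how the reference level behaves under R's default treatment contrasts. The first level of each factor has no row of its own because its coefficient is fixed at zero and absorbed into the intercept; refitting with a different reference via relevel() gives the former baseline a row. The built-in mtcars data stands in for the real model here, so the variable names are not the poster's.

```r
# Stand-in data: mtcars ships with R; `cyl` plays the categorical predictor
dat <- transform(mtcars, cyl = factor(cyl))

fit <- glm(am ~ cyl, family = binomial, data = dat)
levels(dat$cyl)[1]                      # reference level, absorbed by the intercept
summary(fit)$coefficients[, "z value"]

# Refit with a different reference to get a row for the old baseline level
dat$cyl <- relevel(dat$cyl, ref = "8")
refit <- glm(am ~ cyl, family = binomial, data = dat)
summary(refit)$coefficients[, "z value"]
```

Each z value then tests a level against whichever level is currently the reference, so the numbers change with the choice of baseline.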


On Tue, Mar 15, 2016 at 8:26 PM, David Winsemius <dwinsem...@comcast.net>
wrote:

>
> > On Mar 15, 2016, at 1:27 PM, Michael Artz <michaelea...@gmail.com>
> wrote:
> >
> > Hi,
> >   I am trying to use the summary from the glm function as a data source.  I
> > am using the call sink() then
> > summary(logisticRegModel)$coefficients then sink().
>
> Since it's a matrix you may need to locate a function that writes matrices
> to files. I seem to remember that the MASS package has one.
>
> >  The independent
> > variables are categorical and thus there is always a baseline value for
> > every category that is omitted from the glm output.
>
> Well, it's not really omitted, so much as shared among all variables. For
> further reading in the help pages consult:
>
> ?model.matrix
> ?contrasts
> ?contr.treatment
>
> But you probably need to supplement that with an introductory text that
> covers R regression defaults.
>
> >  I am interested in how
> > to get the Z column for all of the categorical values.
>
> The Z column? You meant the "z value" column. Again, since it's a matrix
> you need to use column indexing with "["
>
> summary(logisticRegModel)$coefficients[  , "z value"]
>
> Read up on the summary function for glm objects at:
>
> ?summary.glm
>
>
> >  I don't see any row
> > for the reference category.
>
> What do you imagine the (Intercept) row to be doing? If you are having
> difficulty understanding this (which is not really an R-specific issue)
> there are probably already several explanations to similar questions on:
>
> http://stats.stackexchange.com/
>
>
> >
> > How can I get this Z value in the output?
>
> Asked and answered.
>
> >
>
> David Winsemius
> Alameda, CA, USA
>
>



[R] Logistic Regression output baseline (reference) category

2016-03-15 Thread Michael Artz
Hi,
   I am trying to use the summary from the glm function as a data source. I
am using the call sink() then
summary(logisticRegModel)$coefficients then sink().  The independent
variables are categorical and thus there is always a baseline value for
every category that is omitted from the glm output.  I am interested in how
to get the Z column for all of the categorical values.  I don't see any row
for the reference category.  How can I get this Z value in the output?
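
As an alternative to sink(), a minimal sketch that treats the coefficient table as the numeric matrix it is and writes it to a CSV file. The model below is a stand-in fitted on the built-in mtcars data, and the temporary file path is only for illustration.

```r
# Stand-in model on built-in data; substitute your own fitted glm object
fit <- glm(am ~ factor(cyl), family = binomial, data = mtcars)

coefs <- summary(fit)$coefficients      # numeric matrix, one row per term
z <- coefs[, "z value"]                 # select the column by name

# Write the whole table to CSV; the row names carry the term labels
out_file <- tempfile(fileext = ".csv")
write.csv(coefs, out_file)
back <- read.csv(out_file, row.names = 1, check.names = FALSE)
```

Reading the file back with check.names = FALSE keeps column names such as "z value" intact.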



Re: [R] Prediction from a rank deficient fit may be misleading

2016-03-10 Thread Michael Artz
Here are the results of the logistic regression model.  Is it because of the
NA values?

Call:
glm(formula = TARGET_A ~ Contract + Dependents + DeviceProtection +
    gender + InternetService + MonthlyCharges + MultipleLines +
    OnlineBackup + OnlineSecurity + PaperlessBilling + Partner +
    PaymentMethod + PhoneService + SeniorCitizen + StreamingMovies +
    StreamingTV + TechSupport + tenure + TotalCharges,
    family = binomial(link = "logit"), data = churn_training)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.8943  -0.6867  -0.2863   0.7378   3.4259

Coefficients: (7 not defined because of singularities)
                                       Estimate Std. Error z value Pr(>|z|)
(Intercept)                           1.0664928  1.7195494   0.620   0.5351
ContractOne year                     -0.6874005  0.1314227  -5.230 1.69e-07 ***
ContractTwo year                     -1.2775385  0.2101193  -6.080 1.20e-09 ***
DependentsYes                        -0.1485301  0.1095348  -1.356   0.1751
DeviceProtectionNo internet service  -1.5547306  0.9661837  -1.609   0.1076
DeviceProtectionYes                   0.0459115  0.2114253   0.217   0.8281
genderMale                           -0.0350970  0.0776896  -0.452   0.6514
InternetServiceFiber optic            1.4800374  0.9545398   1.551   0.1210
InternetServiceNo                            NA         NA      NA       NA
MonthlyCharges                       -0.0324614  0.0379646  -0.855   0.3925
MultipleLinesNo phone service         0.0808745  0.7736359   0.105   0.9167
MultipleLinesYes                      0.3990450  0.2131343   1.872   0.0612 .
OnlineBackupNo internet service              NA         NA      NA       NA
OnlineBackupYes                      -0.0328892  0.2081145  -0.158   0.8744
OnlineSecurityNo internet service            NA         NA      NA       NA
OnlineSecurityYes                    -0.2760602  0.2132917  -1.294   0.1956
PaperlessBillingYes                   0.3509944  0.0890884   3.940 8.15e-05 ***
PartnerYes                            0.0306815  0.0940650   0.326   0.7443
PaymentMethodCredit card (automatic) -0.0710923  0.1377252  -0.516   0.6057
PaymentMethodElectronic check         0.3074078  0.1137939   2.701   0.0069 **
PaymentMethodMailed check            -0.0201076  0.1377539  -0.146   0.8839
PhoneServiceYes                              NA         NA      NA       NA
SeniorCitizen                         0.1856454  0.1023527   1.814   0.0697 .
StreamingMoviesNo internet service           NA         NA      NA       NA
StreamingMoviesYes                    0.5260087  0.3899615   1.349   0.1774
StreamingTVNo internet service               NA         NA      NA       NA
StreamingTVYes                        0.4781321  0.3905777   1.224   0.2209
TechSupportNo internet service               NA         NA      NA       NA
TechSupportYes                       -0.2511197  0.2181612  -1.151   0.2497
tenure                               -0.0702813  0.0077113  -9.114  < 2e-16 ***
TotalCharges                          0.0004276  0.0000874   4.892 9.97e-07 ***
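
The NA rows are coefficients that could not be estimated ("not defined because of singularities"), which is what makes the fit rank-deficient. A likely reading of the table, offered as an assumption rather than a diagnosis, is that every "No internet service" dummy marks the same customers as InternetServiceNo, so those columns are redundant. The sketch below reproduces the effect on simulated data; all variable names and values in it are invented.

```r
set.seed(1)
n <- 200
internet <- factor(sample(c("DSL", "Fiber optic", "No"), n, replace = TRUE))
# Customers without internet can only have "No internet service" here
backup <- factor(ifelse(internet == "No", "No internet service",
                        sample(c("No", "Yes"), n, replace = TRUE)))
churn <- rbinom(n, 1, 0.3)

fit <- glm(churn ~ internet + backup, family = binomial)

# Coefficients reported as NA are the aliased (redundant) columns
aliased <- names(coef(fit))[is.na(coef(fit))]
aliased
# The fit is rank-deficient when rank < number of design columns
c(rank = fit$rank, columns = ncol(model.matrix(fit)))
```

Dropping or recoding the redundant levels removes both the NA rows and the predict() warning.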

On Thu, Mar 10, 2016 at 4:05 PM, David Winsemius <dwinsem...@comcast.net>
wrote:

>
> > On Mar 10, 2016, at 8:08 AM, Michael Artz <michaelea...@gmail.com>
> wrote:
> >
> > HI all,
> > I have the following error -
> >> resultVector <- predict(logitregressmodel, dataset1, type='response')
> > Warning message:
> > In predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type ==
> :
> >  prediction from a rank-deficient fit may be misleading
>
> It wasn't an R error. It was an R warning. Was the `summary` output on
> logitregressmodel informative? Does the resultVector look sensible given
> its inputs?
>
>
> > I have seen on internet that there may be some collinearity in the data
> and
> > this is causing that.  How can I be sure?
>
> Do some diagnostics. After looking carefully at the output of
> summary(logitregressmodel)  and perhaps summary(dataset1) if it was the
> original input to the modeling functions, and then you could move on to
> looking at cross-correlations on things you think are continuous and
> crosstabs on factor variables and the condition number on the full data
> matrix.
>
> Lots of stuff turns up on search for "detecting collinearity condition
> number in r"
>
> >
> > Thanks
> >
>
> David Winsemius
> A

[R] Prediction from a rank deficient fit may be misleading

2016-03-10 Thread Michael Artz
Hi all,
I have the following error -
  >  resultVector <- predict(logitregressmodel, dataset1, type='response')
Warning message:
In predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type ==  :
  prediction from a rank-deficient fit may be misleading

I have read online that collinearity in the data may be causing this.
How can I be sure?

Thanks
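
One way to check, sketched on simulated data rather than the poster's dataset: compare the rank of the design matrix with its number of columns, and look at its condition number. The variables x1, x2 and x3 are invented for the illustration.

```r
set.seed(42)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2                      # exact linear combination of x1 and x2
X <- model.matrix(~ x1 + x2 + x3)

qr(X)$rank < ncol(X)               # TRUE means the design matrix is rank-deficient
kappa(X, exact = TRUE)             # a very large condition number signals collinearity
round(cor(cbind(x1, x2, x3)), 2)   # pairwise correlations can miss this case
```

The rank check catches exact dependence among several columns even when no single pair of predictors is highly correlated.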
