Re: [R] Grouping by Date and showing count of failures by date

2023-09-30 Thread Duncan Murdoch
There's a package called "pivottabler" which exports PivotTable: 
http://pivottabler.org.uk/reference/PivotTable.html .
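
A minimal sketch of that interface, assuming the pivottabler package is 
installed and that failuredf holds one row per failure; the likely missing 
piece in the attempt quoted below is the library() call:

library(pivottabler)
pt <- PivotTable$new()
pt$addData(failuredf)
pt$addColumnDataGroups("FAILDATE")
pt$defineCalculation(calculationName = "FailCounts", summariseExpression = "n()")
pt$renderPivot()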


Duncan Murdoch

On 30/09/2023 7:11 a.m., John Kane wrote:

To follow up on Rui Barradas's post, I do not think PivotTable is an R
command.

You may be thinking of the "pivot_longer" and "pivot_wider" functions in
the {tidyr} package which is part of {tidyverse}.

On Sat, 30 Sept 2023 at 07:03, Rui Barradas  wrote:


On 29/09/2023 at 21:29, Paul Bernal wrote:

Dear friends,

Hope you are doing great. I am attaching the dataset I am working with
because, when I tried to dput() it, I was not able to copy the entire
result from dput(), so I apologize in advance for that.

I am interested in creating a column named Failure_Date_Period that has the
FAILDATE but formatted as YYYY_MM. Then I want to count the number of
failures (given by column WONUM) and just have a dataframe that has the
FAILDATE and the count of WONUM.

I tried this:
pt <- PivotTable$new()
pt$addData(failuredf)
pt$addColumnDataGroups("FAILDATE")
pt <- PivotTable$new()
pt$addData(failuredf)
pt$addColumnDataGroups("FAILDATE")
pt$defineCalculation(calculationName = "FailCounts",
summariseExpression="n()")
pt$renderPivot()

but I was not successful. Bottom line, I need to create a new dataframe
that has the number of failures by FAILDATE, but in YYYY-MM format.

Any help and/or guidance will be greatly appreciated.

Kind regards,
Paul
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide

http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.

Hello,

No data is attached. Maybe try

dput(head(failuredf, 30))

?

And where can we find non-base PivotTable? Please start the scripts with
calls to library() when using non-base functionality.

Hope this helps,

Rui Barradas


--
This e-mail was checked for viruses by AVG antivirus software.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.






__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping by Date and showing count of failures by date

2023-09-30 Thread Ebert,Timothy Aaron
In this sort of post it would help if we knew the package that was being used 
for the example. I found one option.
https://cran.r-project.org/web/packages/pivottabler/vignettes/v00-vignettes.html

There may be a way to create a custom data type that would be a date but 
restricted to a yyyy-mm format. I do not know how to do this.
Could you work with the date as a string in a yyyy-mm format? The issue is 
that R will not handle the string as a date.
A third option would be to look at the lubridate package, which can be installed 
by itself or as part of tidyverse. I do not promise that this is a solution, 
but it could be.
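
As a hedged illustration of the second and third options (assuming failuredf 
has a FAILDATE column that is, or can be parsed as, a Date, and one row per 
failure):

# base R: derive a "yyyy-mm" string and count failures per period
failuredf$Failure_Date_Period <- format(as.Date(failuredf$FAILDATE), "%Y-%m")
fail_counts <- aggregate(WONUM ~ Failure_Date_Period, data = failuredf, FUN = length)
names(fail_counts)[2] <- "FailCounts"

# lubridate: floor_date() keeps the result as a Date truncated to the month
# library(lubridate)
# failuredf$Failure_Date_Period <- floor_date(as.Date(failuredf$FAILDATE), "month")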



-Original Message-
From: R-help  On Behalf Of John Kane
Sent: Saturday, September 30, 2023 7:11 AM
To: Rui Barradas 
Cc: Paul Bernal ; R 
Subject: Re: [R] Grouping by Date and showing count of failures by date

[External Email]

To follow up on Rui Barradas's post, I do not think PivotTable is an R command.

You may be thinking of the "pivot_longer" and "pivot_wider" functions in the 
{tidyr} package which is part of {tidyverse}.

On Sat, 30 Sept 2023 at 07:03, Rui Barradas  wrote:

> On 29/09/2023 at 21:29, Paul Bernal wrote:
> > Dear friends,
> >
> > Hope you are doing great. I am attaching the dataset I am working
> > with because, when I tried to dput() it, I was not able to copy the
> > entire result from dput(), so I apologize in advance for that.
> >
> > I am interested in creating a column named Failure_Date_Period that
> > has
> the
> > FAILDATE but formatted as YYYY_MM. Then I want to count the number
> > of failures (given by column WONUM) and just have a dataframe that
> > has the FAILDATE and the count of WONUM.
> >
> > I tried this:
> > pt <- PivotTable$new()
> > pt$addData(failuredf)
> > pt$addColumnDataGroups("FAILDATE")
> > pt <- PivotTable$new()
> > pt$addData(failuredf)
> > pt$addColumnDataGroups("FAILDATE")
> > pt$defineCalculation(calculationName = "FailCounts",
> > summariseExpression="n()")
> > pt$renderPivot()
> >
> > but I was not successful. Bottom line, I need to create a new
> > dataframe that has the number of failures by FAILDATE, but in YYYY-MM 
> > format.
> >
> > Any help and/or guidance will be greatly appreciated.
> >
> > Kind regards,
> > Paul
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> Hello,
>
> No data is attached. Maybe try
>
> dput(head(failuredf, 30))
>
> ?
>
> And where can we find non-base PivotTable? Please start the scripts
> with calls to library() when using non-base functionality.
>
> Hope this helps,
>
> Rui Barradas
>
>
> --
> This e-mail was checked for viruses by AVG antivirus software.
> www.avg.com
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html

Re: [R] Grouping by Date and showing count of failures by date

2023-09-30 Thread John Kane
To follow up on Rui Barradas's post, I do not think PivotTable is an R
command.

You may be thinking of the "pivot_longer" and "pivot_wider" functions in
the {tidyr} package which is part of {tidyverse}.
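
For the counting itself, a hedged tidyverse sketch (assuming FAILDATE is a 
Date column in failuredf) could be:

library(dplyr)

failuredf %>%
  mutate(Failure_Date_Period = format(FAILDATE, "%Y-%m")) %>%
  count(Failure_Date_Period, name = "FailCounts")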

On Sat, 30 Sept 2023 at 07:03, Rui Barradas  wrote:

> On 29/09/2023 at 21:29, Paul Bernal wrote:
> > Dear friends,
> >
> > Hope you are doing great. I am attaching the dataset I am working with
> > because, when I tried to dput() it, I was not able to copy the entire
> > result from dput(), so I apologize in advance for that.
> >
> > I am interested in creating a column named Failure_Date_Period that has
> the
> > FAILDATE but formatted as YYYY_MM. Then I want to count the number of
> > failures (given by column WONUM) and just have a dataframe that has the
> > FAILDATE and the count of WONUM.
> >
> > I tried this:
> > pt <- PivotTable$new()
> > pt$addData(failuredf)
> > pt$addColumnDataGroups("FAILDATE")
> > pt <- PivotTable$new()
> > pt$addData(failuredf)
> > pt$addColumnDataGroups("FAILDATE")
> > pt$defineCalculation(calculationName = "FailCounts",
> > summariseExpression="n()")
> > pt$renderPivot()
> >
> > but I was not successful. Bottom line, I need to create a new dataframe
> > that has the number of failures by FAILDATE, but in YYYY-MM format.
> >
> > Any help and/or guidance will be greatly appreciated.
> >
> > Kind regards,
> > Paul
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> Hello,
>
> No data is attached. Maybe try
>
> dput(head(failuredf, 30))
>
> ?
>
> And where can we find non-base PivotTable? Please start the scripts with
> calls to library() when using non-base functionality.
>
> Hope this helps,
>
> Rui Barradas
>
>
> --
> This e-mail was checked for viruses by AVG antivirus software.
> www.avg.com
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
John Kane
Kingston ON Canada

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping by Date and showing count of failures by date

2023-09-30 Thread Rui Barradas

On 29/09/2023 at 21:29, Paul Bernal wrote:

Dear friends,

Hope you are doing great. I am attaching the dataset I am working with
because, when I tried to dput() it, I was not able to copy the entire
result from dput(), so I apologize in advance for that.

I am interested in creating a column named Failure_Date_Period that has the
FAILDATE but formatted as YYYY_MM. Then I want to count the number of
failures (given by column WONUM) and just have a dataframe that has the
FAILDATE and the count of WONUM.

I tried this:
pt <- PivotTable$new()
pt$addData(failuredf)
pt$addColumnDataGroups("FAILDATE")
pt <- PivotTable$new()
pt$addData(failuredf)
pt$addColumnDataGroups("FAILDATE")
pt$defineCalculation(calculationName = "FailCounts",
summariseExpression="n()")
pt$renderPivot()

but I was not successful. Bottom line, I need to create a new dataframe
that has the number of failures by FAILDATE, but in YYYY-MM format.

Any help and/or guidance will be greatly appreciated.

Kind regards,
Paul
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

No data is attached. Maybe try

dput(head(failuredf, 30))

?

And where can we find non-base PivotTable? Please start the scripts with 
calls to library() when using non-base functionality.


Hope this helps,

Rui Barradas


--
This e-mail was checked for viruses by AVG antivirus software.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Grouping by Date and showing count of failures by date

2023-09-30 Thread Paul Bernal
Dear friends,

Hope you are doing great. I am attaching the dataset I am working with
because, when I tried to dput() it, I was not able to copy the entire
result from dput(), so I apologize in advance for that.

I am interested in creating a column named Failure_Date_Period that has the
FAILDATE but formatted as YYYY_MM. Then I want to count the number of
failures (given by column WONUM) and just have a dataframe that has the
FAILDATE and the count of WONUM.

I tried this:
pt <- PivotTable$new()
pt$addData(failuredf)
pt$addColumnDataGroups("FAILDATE")
pt <- PivotTable$new()
pt$addData(failuredf)
pt$addColumnDataGroups("FAILDATE")
pt$defineCalculation(calculationName = "FailCounts",
summariseExpression="n()")
pt$renderPivot()

but I was not successful. Bottom line, I need to create a new dataframe
that has the number of failures by FAILDATE, but in YYYY-MM format.

Any help and/or guidance will be greatly appreciated.

Kind regards,
Paul
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping Question

2020-03-22 Thread Chris Evans
Here's a very "step by step" example with dplyr as I'm trying to teach myself 
the Tidyverse way of being

library(dplyr)

# Serial  Measurement  Meas_test  Serial_test
# 1       17           fail       fail
# 1       16           pass       fail
# 2       12           pass       pass
# 2       8            pass       pass
# 2       10           pass       pass
# 3       19           fail       fail
# 3       13           pass       pass

dat <- as.data.frame(list(Serial = c(1,1,2,2,2,3,3),
  Measurement = c(17, 16, 12, 8, 10, 19, 13),
  Meas_test = c("fail", "pass", "pass", "pass", "pass", 
"fail", "pass")))

dat %>%
  group_by(Serial) %>%
  summarise(Serial_test = sum(Meas_test == "fail")) %>%
  mutate(Serial_test = if_else(Serial_test > 0, 1, 0),
 Serial_test = factor(Serial_test,
  levels = 0:1,
  labels = c("pass", "fail"))) -> groupedDat

dat %>%
  left_join(groupedDat) # add -> dat at the end to pipe the result back into dat

Gives:

  Serial Measurement Meas_test Serial_test
1      1          17      fail        fail
2      1          16      pass        fail
3      2          12      pass        pass
4      2           8      pass        pass
5      2          10      pass        pass
6      3          19      fail        fail
7      3          13      pass        fail
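
A more compact variant of the same grouping idea (a hedged sketch against the 
dat defined above):

dat %>%
  group_by(Serial) %>%
  mutate(Serial_test = if_else(any(Meas_test == "fail"), "fail", "pass")) %>%
  ungroup()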

It would be easier for us if you had used dput() to share your data, but thanks for the 
minimal example!

Chris

- Original Message -
> From: "Ivan Krylov" 
> To: "Thomas Subia via R-help" 
> Cc: "Thomas Subia" 
> Sent: Sunday, 22 March, 2020 07:24:15
> Subject: Re: [R] Grouping Question

> On Sat, 21 Mar 2020 20:01:30 -0700
> Thomas Subia via R-help  wrote:
> 
>> Serial_test is a pass, when all of the Meas_test are pass for a given
>> serial. Else Serial_test is a fail.
> 
> Use by/tapply in base R or dplyr::group_by if you prefer tidyverse
> packages.
> 
> --
> Best regards,
> Ivan
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Chris Evans  Visiting Professor, University of Sheffield 

I do some consultation work for the University of Roehampton 
 and other places
but  remains my main Email address.  I have a work web site 
at:
   https://www.psyctc.org/psyctc/
and a site I manage for CORE and CORE system trust at:
   http://www.coresystemtrust.org.uk/
I have "semigrated" to France, see: 
   https://www.psyctc.org/pelerinage2016/semigrating-to-france/ 
That page will also take you to my blog which started with earlier joys in 
France and Spain!

If you want to book to talk, I am trying to keep that to Thursdays and my diary 
is at:
   https://www.psyctc.org/pelerinage2016/ceworkdiary/
Beware: French time, generally an hour ahead of UK.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping Question

2020-03-22 Thread Ivan Krylov
On Sat, 21 Mar 2020 20:01:30 -0700
Thomas Subia via R-help  wrote:

> Serial_test is a pass, when all of the Meas_test are pass for a given
> serial. Else Serial_test is a fail.

Use by/tapply in base R or dplyr::group_by if you prefer tidyverse
packages.
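
For instance, a minimal base R sketch along those lines, assuming dat holds the
Serial and Meas_test columns from the original post:

# TRUE for a serial only when every measurement passed
all_pass <- tapply(dat$Meas_test == "pass", dat$Serial, all)

# map the per-serial result back onto the rows
dat$Serial_test <- ifelse(all_pass[as.character(dat$Serial)], "pass", "fail")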

-- 
Best regards,
Ivan

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Grouping Question

2020-03-21 Thread Thomas Subia via R-help
Colleagues,

Here is my dataset.

Serial  Measurement  Meas_test  Serial_test
1       17           fail       fail
1       16           pass       fail
2       12           pass       pass
2       8            pass       pass
2       10           pass       pass
3       19           fail       fail
3       13           pass       pass

If a measurement is less than or equal to 16, then Meas_test is pass. Else
Meas_test is fail
This is easy to code.

Serial_test is a pass, when all of the Meas_test are pass for a given
serial. Else Serial_test is a fail.
I'm at a loss to figure out how to do this in R.

Some guidance would be appreciated.

All the best,

Thomas Subia

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping by 3 variable and renaming groups

2018-05-26 Thread Jeff Reichman
Rui

Your first code worked just fine.

Jeff

-Original Message-
From: Rui Barradas <ruipbarra...@sapo.pt> 
Sent: Saturday, May 26, 2018 8:30 AM
To: reichm...@sbcglobal.net; 'R-help' <r-help@r-project.org>
Subject: Re: [R] Grouping by 3 variable and renaming groups

Hello,

Sorry, but I think my first answer is wrong.
You probably want something along the lines of


sp <- split(priceStore_Grps, priceStore_Grps$StorePC)
res <- lapply(seq_along(sp), function(i){
     sp[[i]]$StoreID <- paste("Store", i, sep = "_")
     sp[[i]]
})
res <- do.call(rbind, res)
row.names(res) <- NULL


Hope this helps,

Rui Barradas

On 5/26/2018 2:22 PM, Rui Barradas wrote:
> Hello,
> 
> See if this is it:
> 
> priceStore_Grps$StoreID <- paste("Store", 
> seq_len(nrow(priceStore_Grps)), sep = "_")
> 
> 
> Hope this helps,
> 
> Rui Barradas
> 
> On 5/26/2018 2:03 PM, Jeff Reichman wrote:
>> ALCON
>>
>>
>> I'm trying to figure out how to rename groups in a data frame after 
>> groups by selected variabels.  I am using the dplyr library to group 
>> my data by 3 variables as follows
>>
>>
>> # group by lat (StoreX)/long (StoreY)
>>
>> priceStore <- LapTopSales[,c(4,5,15,16)]
>>
>> priceStore <- priceStore[complete.cases(priceStore), ]  # keep only 
>> non NA records
>>
>> priceStore_Grps <- priceStore %>%
>>
>>group_by(StorePC, StoreX, StoreY) %>%
>>
>>summarize(meanPrice=(mean(RetailPrice)))
>>
>>
>> which results in .
>>
>>
>>> priceStore_Grps
>>
>> # A tibble: 15 x 4
>>
>> # Groups:   StorePC, StoreX [?]
>>
>> StorePC  StoreX StoreY meanPrice
>>
>> 
>>
>> 1 CR7 8LE  532714 168302  472.
>>
>> 2 E2 0RY   535652 182961  520.
>>
>> 3 E7 8NW   541428 184515  467.
>>
>> 4 KT2 5AU  517917 170243  522.
>>
>> 5 N17 6QA  533788 189994  523.
>>
>>
>> Which is fine, but I then want to give each group (e.g. CR7 8LE  
>> 532714
>> 168302) a unique identifier (say) Store 1, 2, 3 or some other unique 
>> identifier.
>>
>>
>> StorePC  StoreX StoreY meanPrice
>>
>> 
>>
>> 1 CR7 8LE  532714 168302  472.   Store 1
>>
>> 2 E2 0RY   535652 182961  520.   Store 2
>>
>> 3 E7 8NW   541428 184515  467.   Store 3
>>
>> 4 KT2 5AU  517917 170243  522.   Store 4
>>
>> 5 N17 6QA  533788 189994  523.   Store 5
>>
>>
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see 
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see 
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping by 3 variable and renaming groups

2018-05-26 Thread Jeff Reichman
Rui

That did it 

Jeff

-Original Message-
From: Rui Barradas <ruipbarra...@sapo.pt> 
Sent: Saturday, May 26, 2018 8:23 AM
To: reichm...@sbcglobal.net; 'R-help' <r-help@r-project.org>
Subject: Re: [R] Grouping by 3 variable and renaming groups

Hello,

See if this is it:

priceStore_Grps$StoreID <- paste("Store", seq_len(nrow(priceStore_Grps)), sep = 
"_")


Hope this helps,

Rui Barradas

On 5/26/2018 2:03 PM, Jeff Reichman wrote:
> ALCON
> 
>   
> 
> I'm trying to figure out how to rename groups in a data frame after groups
> by selected variabels.  I am using the dplyr library to group my data by 3
> variables as follows
> 
>   
> 
> # group by lat (StoreX)/long (StoreY)
> 
> priceStore <- LapTopSales[,c(4,5,15,16)]
> 
> priceStore <- priceStore[complete.cases(priceStore), ]  # keep only non NA
> records
> 
> priceStore_Grps <- priceStore %>%
> 
>group_by(StorePC, StoreX, StoreY) %>%
> 
>summarize(meanPrice=(mean(RetailPrice)))
> 
>   
> 
> which results in .
> 
>   
> 
>> priceStore_Grps
> 
> # A tibble: 15 x 4
> 
> # Groups:   StorePC, StoreX [?]
> 
> StorePC  StoreX StoreY meanPrice
> 
> 
> 
> 1 CR7 8LE  532714 168302  472.
> 
> 2 E2 0RY   535652 182961  520.
> 
> 3 E7 8NW   541428 184515  467.
> 
> 4 KT2 5AU  517917 170243  522.
> 
> 5 N17 6QA  533788 189994  523.
> 
>   
> 
> Which is fine, but I then want to give each group (e.g. CR7 8LE  532714
> 168302) a unique identifier (say) Store 1, 2, 3 or some other unique
> identifier.
> 
>   
> 
> StorePC  StoreX StoreY meanPrice
> 
> 
> 
> 1 CR7 8LE  532714 168302  472.   Store 1
> 
> 2 E2 0RY   535652 182961  520.   Store 2
> 
> 3 E7 8NW   541428 184515  467.   Store 3
> 
> 4 KT2 5AU  517917 170243  522.   Store 4
> 
> 5 N17 6QA  533788 189994  523.   Store 5
> 
>   
> 
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping by 3 variable and renaming groups

2018-05-26 Thread Rui Barradas

Hello,

Sorry, but I think my first answer is wrong.
You probably want something along the lines of


sp <- split(priceStore_Grps, priceStore_Grps$StorePC)
res <- lapply(seq_along(sp), function(i){
sp[[i]]$StoreID <- paste("Store", i, sep = "_")
sp[[i]]
})
res <- do.call(rbind, res)
row.names(res) <- NULL
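
A more compact alternative, assuming StorePC alone identifies a store, would be 
to number the groups directly:

priceStore_Grps$StoreID <- paste("Store",
    as.integer(factor(priceStore_Grps$StorePC)), sep = "_")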


Hope this helps,

Rui Barradas

On 5/26/2018 2:22 PM, Rui Barradas wrote:

Hello,

See if this is it:

priceStore_Grps$StoreID <- paste("Store", 
seq_len(nrow(priceStore_Grps)), sep = "_")



Hope this helps,

Rui Barradas

On 5/26/2018 2:03 PM, Jeff Reichman wrote:

ALCON


I'm trying to figure out how to rename groups in a data frame after 
groups
by selected variabels.  I am using the dplyr library to group my data 
by 3

variables as follows


# group by lat (StoreX)/long (StoreY)

priceStore <- LapTopSales[,c(4,5,15,16)]

priceStore <- priceStore[complete.cases(priceStore), ]  # keep only 
non NA

records

priceStore_Grps <- priceStore %>%

   group_by(StorePC, StoreX, StoreY) %>%

   summarize(meanPrice=(mean(RetailPrice)))


which results in .



priceStore_Grps


# A tibble: 15 x 4

# Groups:   StorePC, StoreX [?]

    StorePC  StoreX StoreY meanPrice

        <chr>     <dbl>  <dbl>     <dbl>

1 CR7 8LE  532714 168302  472.

2 E2 0RY   535652 182961  520.

3 E7 8NW   541428 184515  467.

4 KT2 5AU  517917 170243  522.

5 N17 6QA  533788 189994  523.


Which is fine, but I then want to give each group (e.g. CR7 8LE  532714
168302) a unique identifier (say) Store 1, 2, 3 or some other unique
identifier.


    StorePC  StoreX StoreY meanPrice

        <chr>     <dbl>  <dbl>     <dbl>

1 CR7 8LE  532714 168302  472.   Store 1

2 E2 0RY   535652 182961  520.   Store 2

3 E7 8NW   541428 184515  467.   Store 3

4 KT2 5AU  517917 170243  522.   Store 4

5 N17 6QA  533788 189994  523.   Store 5



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping by 3 variable and renaming groups

2018-05-26 Thread Rui Barradas

Hello,

See if this is it:

priceStore_Grps$StoreID <- paste("Store", 
seq_len(nrow(priceStore_Grps)), sep = "_")



Hope this helps,

Rui Barradas

On 5/26/2018 2:03 PM, Jeff Reichman wrote:

ALCON

  


I'm trying to figure out how to rename groups in a data frame after groups
by selected variabels.  I am using the dplyr library to group my data by 3
variables as follows

  


# group by lat (StoreX)/long (StoreY)

priceStore <- LapTopSales[,c(4,5,15,16)]

priceStore <- priceStore[complete.cases(priceStore), ]  # keep only non NA
records

priceStore_Grps <- priceStore %>%

   group_by(StorePC, StoreX, StoreY) %>%

   summarize(meanPrice=(mean(RetailPrice)))

  


which results in .

  


priceStore_Grps


# A tibble: 15 x 4

# Groups:   StorePC, StoreX [?]

StorePC  StoreX StoreY meanPrice

<chr>    <dbl>  <dbl>     <dbl>

1 CR7 8LE  532714 168302  472.

2 E2 0RY   535652 182961  520.

3 E7 8NW   541428 184515  467.

4 KT2 5AU  517917 170243  522.

5 N17 6QA  533788 189994  523.

  


Which is fine, but I then want to give each group (e.g. CR7 8LE  532714
168302) a unique identifier (say) Store 1, 2, 3 or some other unique
identifier.

  


StorePC  StoreX StoreY meanPrice

<chr>    <dbl>  <dbl>     <dbl>

1 CR7 8LE  532714 168302  472.   Store 1

2 E2 0RY   535652 182961  520.   Store 2

3 E7 8NW   541428 184515  467.   Store 3

4 KT2 5AU  517917 170243  522.   Store 4

5 N17 6QA  533788 189994  523.   Store 5

  



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Grouping by 3 variable and renaming groups

2018-05-26 Thread Jeff Reichman
ALCON

 

I'm trying to figure out how to rename groups in a data frame after groups
by selected variabels.  I am using the dplyr library to group my data by 3
variables as follows

 

# group by lat (StoreX)/long (StoreY)

priceStore <- LapTopSales[,c(4,5,15,16)]

priceStore <- priceStore[complete.cases(priceStore), ]  # keep only non NA
records

priceStore_Grps <- priceStore %>%

  group_by(StorePC, StoreX, StoreY) %>%

  summarize(meanPrice=(mean(RetailPrice)))

 

which results in .

 

> priceStore_Grps

# A tibble: 15 x 4

# Groups:   StorePC, StoreX [?]

   StorePC  StoreX StoreY meanPrice

   <chr>    <dbl>  <dbl>     <dbl>

1 CR7 8LE  532714 168302  472.

2 E2 0RY   535652 182961  520.

3 E7 8NW   541428 184515  467.

4 KT2 5AU  517917 170243  522.

5 N17 6QA  533788 189994  523.

 

Which is fine, but I then want to give each group (e.g. CR7 8LE  532714
168302) a unique identifier (say) Store 1, 2, 3 or some other unique
identifier.

 

   StorePC  StoreX StoreY meanPrice

   <chr>    <dbl>  <dbl>     <dbl>

1 CR7 8LE  532714 168302  472.   Store 1

2 E2 0RY   535652 182961  520.   Store 2

3 E7 8NW   541428 184515  467.   Store 3

4 KT2 5AU  517917 170243  522.   Store 4

5 N17 6QA  533788 189994  523.   Store 5

 


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping in R

2015-06-18 Thread PIKAL Petr
Hi

We can only guess what you really want.

Maybe this.

set.seed(111)
cust <- sample(letters[1:5], 500, replace = TRUE)
value <- sample(1:1000, 500)
month <- sample(1:12, 500, replace = TRUE)
dat <- data.frame(cust, value, month)
dat.ag <- aggregate(dat$value, list(dat$month, dat$cust), sum)

 head(dat.ag)
  Group.1 Group.2    x
1       1       a 2444
2       2       a 6234
3       3       a 6082
4       4       a 3691
5       5       a 3044
6       6       a 3534

dput(dat.ag)
structure(list(Group.1 = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L,
12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L,
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L,
6L, 7L, 8L, 9L, 10L, 11L, 12L), Group.2 = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("a", "b",
"c", "d", "e"), class = "factor"), x = c(2444L, 6234L, 6082L,
3691L, 3044L, 3534L, 7444L, 1819L, 2295L, 4774L, 3659L, 1159L,
6592L, 1272L, 8245L, 2324L, 5189L, 3935L, 2945L, 2386L, 2796L,
2869L, 3142L, 4657L, 4411L, 6223L, 3266L, 3842L, 6056L, 7472L,
3879L, 7135L, 4544L, 4498L, 2703L, 3409L, 2748L, 2288L, 2654L,
4995L, 4626L, 5543L, 2162L, 4681L, 5853L, 6229L, 3001L, 5274L,
3852L, 2635L, 5643L, 2809L, 2988L, 3756L, 5180L, 2997L, 4883L,
4208L, 2669L, 3151L)), .Names = c("Group.1", "Group.2", "x"), row.names = c(NA,
-60L), class = "data.frame")


But maybe something different. Who knows?

If you wanted grouping by value use

?cut or ?findInterval
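
For example, a hedged sketch of binning customers into 10 groups by total 
booked weight, using simulated stand-in data (replace with one row per 
customer and their real totals):

set.seed(1)
cust_summary <- data.frame(customer = paste0("C", 1:55000),
                           total_weight = rexp(55000, rate = 1/100))

# 10 groups based on deciles of total weight
cust_summary$weight_group <- cut(cust_summary$total_weight,
    breaks = quantile(cust_summary$total_weight, probs = seq(0, 1, 0.1)),
    include.lowest = TRUE,
    labels = paste("Group", 1:10))

table(cust_summary$weight_group)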

Cheers
Petr


 -Original Message-
 From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Shivi82
 Sent: Thursday, June 18, 2015 9:22 AM
 To: r-help@r-project.org
 Subject: [R] Grouping in R

 Hi All,

 I am working on a data where the total row count is 25+ and have
 approx.
 20 variables. One of the var on which i need to summarize the data is
 Consignor i.e. seller name.

 Now the issue here is after deleting all the duplicate names i still
 have 55000 unique customer name and i am not sure on how to summarize
 the data.

 Is there a possibility that i could create 8 or 10 groups based on the
 weight or booking they made from our company and eventually all 55000
 customers would fall under these 10 groups. Then it could be easier for
 me to analyze in which group there is a variance on a month on month
 level.




 --
 View this message in context: http://r.789695.n4.nabble.com/Grouping-
 in-R-tp4708800.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.



[R] Grouping in R

2015-06-18 Thread Shivi82
Hi All,

I am working on a data where the total row count is 25+ and have approx.
20 variables. One of the var on which i need to summarize the data is
Consignor i.e. seller name. 

Now the issue here is after deleting all the duplicate names i still have
55000 unique customer name and i am not sure on how to summarize the data.

Is there a possibility that i could create 8 or 10 groups based on the
weight or booking they made from our company and eventually all 55000
customers would fall under these 10 groups. Then it could be easier for me
to analyze in which group there is a variance on a month on month level.




--
View this message in context: 
http://r.789695.n4.nabble.com/Grouping-in-R-tp4708800.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] grouping explanatory variables into sets for GLMM

2014-04-03 Thread Maria Kernecker
Dear all, 

I am trying to run a GLMM following the procedure described by Rhodes et al. 
(Ch. 21) in the Zuur book Mixed effects models and extensions in R . Like in 
his example, I have four sets of explanatory variables: 
1. Land use - 1 variable, factor (forest or agriculture)
2. Location - 1 variable, factor (riparian or upland)
3. Agricultural management - 3 variables that are binary (0 or 1 for till, 
manure, annual crop)
4. Vegetation patterns - 4 variables that are continuous (# of plant species in 
4 different functional guilds)

How do I create these sets?  I would like to build my model with these sets 
only instead of listing every variable. 

Also: is there a way of running all possible models with the different 
combinations of these sets and/or variables, sort of like running ordistep for 
ordinations?

Thanks a bunch in advance for your help!
Maria  

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping explanatory variables into sets for GLMM

2014-04-03 Thread Bert Gunter
Have you read An Introduction to R (or other online tutorial)? If
not, please do so before posting further here. It sounds like you are
missing very basic knowledge -- on factors -- which you need to learn
about before proceeding.

?factor

gives you the answer you seek, I believe.

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom.
H. Gilbert Welch




On Thu, Apr 3, 2014 at 6:54 AM, Maria Kernecker
maria.kernec...@mail.mcgill.ca wrote:
 Dear all,

 I am trying to run a GLMM following the procedure described by Rhodes et al. 
 (Ch. 21) in the Zuur book Mixed effects models and extensions in R . Like in 
 his example, I have four sets of explanatory variables:
 1. Land use - 1 variable, factor (forest or agriculture)
 2. Location - 1 variable, factor (riparian or upland)
 3. Agricultural management - 3 variables that are binary (0 or 1 for till, 
 manure, annual crop)
 4. Vegetation patterns - 4 variables that are continuous (# of plant species 
 in 4 different functional guilds)

 How do I create these sets?  I would like to build my model with these 
 sets only instead of listing every variable.

 Also: is there a way of running all possible models with the different 
 combinations of these sets and/or variables, sort of like running ordistep 
 for ordinations?

 Thanks a bunch in advance for your help!
 Maria

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping explanatory variables into sets for GLMM

2014-04-03 Thread Bert Gunter
Unless there is reason to keep the conversation private, always reply
to the list. How will anyone else know that my answer wasn't
satisfactory?

1. I don't intend to go through your references. A minimal
reproducible example of what you wish to do and what you tried would
help.

2. Have you read An Intro to R?

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom.
H. Gilbert Welch




On Thu, Apr 3, 2014 at 5:14 PM, Maria Kernecker, PhD
mkernec...@gmail.com wrote:
 Thanks for getting back to me.

 It seems I didn't write my question clearly and that it was misunderstood - 
 even if it is easy to answer: I would like to reduce the number of 
 explanatory variables in my model by using sets or categories that these 
 variables belong to, like Rhodes et al. did in their chapter, or like Lentini 
 et al. 2012 did in their paper.

 Factor is not the answer I am looking for, unfortunately.

 On Apr 3, 2014, at 11:28 AM, Bert Gunter wrote:

 Have you read An Introduction to R (or other online tutorial)? If
 not, please do so before posting further here. It sounds like you are
 missing very basic knowledge -- on factors -- which you need to learn
 about before proceeding.

 ?factor

 gives you the answer you seek, I believe.

 Cheers,
 Bert

 Bert Gunter
 Genentech Nonclinical Biostatistics
 (650) 467-7374

 Data is not information. Information is not knowledge. And knowledge
 is certainly not wisdom.
 H. Gilbert Welch




 On Thu, Apr 3, 2014 at 6:54 AM, Maria Kernecker
 maria.kernec...@mail.mcgill.ca wrote:
 Dear all,

 I am trying to run a GLMM following the procedure described by Rhodes et 
 al. (Ch. 21) in the Zuur book Mixed effects models and extensions in R . 
 Like in his example, I have four sets of explanatory variables:
 1. Land use - 1 variable, factor (forest or agriculture)
 2. Location - 1 variable, factor (riparian or upland)
 3. Agricultural management - 3 variables that are binary (0 or 1 for till, 
 manure, annual crop)
 4. Vegetation patterns - 4 variables that are continuous (# of plant 
 species in 4 different functional guilds)

 How do I create these sets?  I would like to build my model with these 
 sets only instead of listing every variable.

 Also: is there a way of running all possible models with the different 
 combinations of these sets and/or variables, sort of like running ordistep 
 for ordinations?

 Thanks a bunch in advance for your help!
 Maria

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping explanatory variables into sets for GLMM

2014-04-03 Thread Don McKenzie
Reading the Intro, as Bert suggests, would likely solve some of your problems. 
If you think about how many combinations it would take, using only one variable 
from each group in any one model, you would see that the number of individual 
models (12) is not so onerous that you couldn’t specify them one at a time.

On Apr 3, 2014, at 8:55 PM, Bert Gunter gunter.ber...@gene.com wrote:

 Unless there is reason to keep the conversation private, always reply
 to the list. How will anyone else know that my answer wasn't
 satisfactory?
 
 1. I don't intend to go through your references. A minimal
 reproducible example of what you wish to do and what you tried would
 help.
 
 2. Have you read An Intro to R?
 
 Cheers,
 Bert
 
 Bert Gunter
 Genentech Nonclinical Biostatistics
 (650) 467-7374
 
 Data is not information. Information is not knowledge. And knowledge
 is certainly not wisdom.
 H. Gilbert Welch
 
 
 
 
 On Thu, Apr 3, 2014 at 5:14 PM, Maria Kernecker, PhD
 mkernec...@gmail.com wrote:
 Thanks for getting back to me.
 
 It seems I didn't write my question clearly and that it was misunderstood - 
 even if it is easy to answer: I would like to reduce the number of 
 explanatory variables in my model by using sets or categories that these 
 variables belong to, like Rhodes et al. did in their chapter, or like 
 Lentini et al. 2012 did in their paper.
 
 Factor is not the answer I am looking for, unfortunately.
 
 On Apr 3, 2014, at 11:28 AM, Bert Gunter wrote:
 
 Have you read An Introduction to R (or other online tutorial)? If
 not, please do so before posting further here. It sounds like you are
 missing very basic knowledge -- on factors -- which you need to learn
 about before proceeding.
 
 ?factor
 
 gives you the answer you seek, I believe.
 
 Cheers,
 Bert
 
 Bert Gunter
 Genentech Nonclinical Biostatistics
 (650) 467-7374
 
 Data is not information. Information is not knowledge. And knowledge
 is certainly not wisdom.
 H. Gilbert Welch
 
 
 
 
 On Thu, Apr 3, 2014 at 6:54 AM, Maria Kernecker
 maria.kernec...@mail.mcgill.ca wrote:
 Dear all,
 
 I am trying to run a GLMM following the procedure described by Rhodes et 
 al. (Ch. 21) in the Zuur book Mixed effects models and extensions in R . 
 Like in his example, I have four sets of explanatory variables:
 1. Land use - 1 variable, factor (forest or agriculture)
 2. Location - 1 variable, factor (riparian or upland)
 3. Agricultural management - 3 variables that are binary (0 or 1 for till, 
 manure, annual crop)
 4. Vegetation patterns - 4 variables that are continuous (# of plant 
 species in 4 different functional guilds)
 
 How do I create these sets?  I would like to build my model with these 
 sets only instead of listing every variable.
 
 Also: is there a way of running all possible models with the different 
 combinations of these sets and/or variables, sort of like running ordistep 
 for ordinations?
 
 Thanks a bunch in advance for your help!
 Maria
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

Don McKenzie
Research Ecologist
Pacific WIldland Fire Sciences Lab
US Forest Service

Affiliate Professor
School of Environmental and Forest Sciences 
College of the Environment
University of Washington
d...@uw.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Grouping on a Distance Matrix

2014-02-13 Thread Dario Strbenac
Hello,

I'm looking for a function that groups elements below a certain distance 
threshold, based on a distance matrix. In other words, I'd like to group 
samples without using a standard clustering algorithm on the distance matrix. 
For example, let the distance matrix be :

     A     B     C     D
A    0  0.03  0.77  1.12
B 0.03     0  1.59  1.11
C 0.77  1.59     0  0.09
D 1.12  1.11  0.09     0

Two clusters would be found with a cutoff of 0.1. The first contains A,B. The 
second has C,D. Is there an efficient function that does this ? I can think of 
how to do this recursively, but am hoping it's already been considered.

--
Dario Strbenac
PhD Student
University of Sydney
Camperdown NSW 2050
Australia
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping on a Distance Matrix

2014-02-13 Thread Bert Gunter
You need to re-think. What you said is nonsense. Use an appropriate
clustering algorithm.
(a can be near b; b can be near c; but a is not near c, using near =
closer than threshold)
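
For instance, single-linkage hierarchical clustering cut at the threshold 
captures exactly that chained notion of nearness; a minimal sketch, assuming 
the distance matrix from the question is stored in m:

m <- matrix(c(0,    0.03, 0.77, 1.12,
              0.03, 0,    1.59, 1.11,
              0.77, 1.59, 0,    0.09,
              1.12, 1.11, 0.09, 0),
            nrow = 4, dimnames = list(LETTERS[1:4], LETTERS[1:4]))

hc <- hclust(as.dist(m), method = "single")
cutree(hc, h = 0.1)
# A B C D
# 1 1 2 2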

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom.
H. Gilbert Welch




On Thu, Feb 13, 2014 at 12:00 AM, Dario Strbenac
dstr7...@uni.sydney.edu.au wrote:
 Hello,

 I'm looking for a function that groups elements below a certain distance 
 threshold, based on a distance matrix. In other words, I'd like to group 
 samples without using a standard clustering algorithm on the distance matrix. 
 For example, let the distance matrix be :

   A B C D
 A 0  0.03  0.77  1.12
 B  0.03 0  1.59  1.11
 C  0.77  1.59 0  0.09
 D  1.12  1.11  0.09 0

 Two clusters would be found with a cutoff of 0.1. The first contains A,B. The 
 second has C,D. Is there an efficient function that does this ? I can think 
 of how to do this recursively, but am hoping it's already been considered.

 --
 Dario Strbenac
 PhD Student
 University of Sydney
 Camperdown NSW 2050
 Australia
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Grouping commands so that variablas are removed automatically - like functions

2014-01-20 Thread Rainer M Krug
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hi

I would like to group commands, so that after a group of commands has
been executed, the variables defined in that group are automatically
deleted.

My reasoning: I have a longer sript which is used to load data, do
analysis and plot graphs, all part of a document (in org-mode /
emacs).

I have several datasets which are loaded, and each one is quite
big. So after doing one part of the job (e.g. analysing the data and
storing the results) I want to delete all variables used to free space
and to avoid having these variables being used in the next block and
still having the old (for this block invalid) values.

I can't use rm(list=ls()) as I have some variables as constants
which do not change over the whole document and also some functions
defined.

I could put each block in a function and then call the function and
delete it afterwards, but this is as I see it abusing functions.

I don't want to keep track manually of the variables.

Therefore my question:

Can I do something like:

x <- 15

{ # here begins the block
a <- 1:100
b <- 4:400
} # here ends the block

# here a and b are not defined anymore
# but x is still defined

{} is great for grouping the commands, but the variables are not
deleted afterwards.

Am I missing a language feature in R?

Rainer

- -- 
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa

Tel :   +33 - (0)9 53 10 27 44
Cell:   +33 - (0)6 85 62 59 98
Fax :   +33 - (0)9 58 10 27 44

Fax (D):+49 - (0)3 21 21 25 22 44

email:  rai...@krugs.de

Skype:  RMkrug
-BEGIN PGP SIGNATURE-
Version: GnuPG/MacGPG2 v2.0.22 (Darwin)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJS3SDSAAoJENvXNx4PUvmCEAwH/jBCuQLRpRcPu+PSrUBsck8v
49q3f0wAZqhyfjMQvRnLSECAfQN4GHI1WvXcuC9R8Z0eokL7gAqMnJSgWd61Un0F
I+yClK1qbhpCwR8WV4nDXTuEW5rb5d8a1iHRPxXXSi/vdJZL3imWMsfvGTpgIhVw
Dbi7+BSh52ZFEZPIyTm2+4qBfQA2ZaY3AEPTjBdB4iL603S+lpgmm1mAInFHFx5g
0CzzY3feTWreD+EATXMGofTDaoxR5vuLvIRvv+PA/Ehz/hVnQah2xriL4NR+pIHz
7WbqiReJ8H1ruAgtW6o8CmQRMArHmk0oBy1vYQvwB7SZ8/DOyKkArKBy8tGx/J0=
=dBo5
-END PGP SIGNATURE-

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping commands so that variablas are removed automatically - like functions

2014-01-20 Thread jim holtman
Check out the use of the 'local' function:


 gc()
 used (Mb) gc trigger (Mb) max used (Mb)
Ncells 199420 10.7 407500 21.8   35 18.7
Vcells 308004  2.4 786432  6.0   786424  6.0
 result <- local({
+ a <- rnorm(100)  # big objects
+ b <- rnorm(100)
+ mean(a + b)  # return value
+ })

 result
[1] 0.0001819203
 gc()
 used (Mb) gc trigger (Mb) max used (Mb)
Ncells 199666 10.7 407500 21.8   35 18.7
Vcells 308780  2.4    2975200 22.7  3710863 28.4


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Mon, Jan 20, 2014 at 8:12 AM, Rainer M Krug rai...@krugs.de wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Hi

 I would like to group commands, so that after a group of commands has
 been executed, the variables defined in that group are automatically
 deleted.

 My reasoning: I have a longer sript which is used to load data, do
 analysis and plot graphs, all part of a document (in org-mode /
 emacs).

 I have several datasets which are loaded, and each one is quite
 big. So after doing one part of the job (e.g. analysing the data and
 storing the results) I want to delete all variables used to free space
 and to avoid having these variables being used in the next block and
 still having the old (for this block invalid) values.

 I can't use rm(list=ls()) as I have some variables as constants
 which do not change over the whole document and also some functions
 defined.

 I could put each block in a function and then call the function and
 delete it afterwards, but this is as I see it abusing functions.

 I don't want to keep track manually of the variables.

 Therefore my question:

 Can I do something like:

 x <- 15

 { # here begins the block
 a <- 1:100
 b <- 4:400
 } # here ends the block

 # here a and b are not defined anymore
 # but x is still defined

 {} is great for grouping the commands, but the variables are not
 deleted afterwards.

 Am I missing a language feature in R?

 Rainer

 - --
 Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
 Biology, UCT), Dipl. Phys. (Germany)

 Centre of Excellence for Invasion Biology
 Stellenbosch University
 South Africa

 Tel :   +33 - (0)9 53 10 27 44
 Cell:   +33 - (0)6 85 62 59 98
 Fax :   +33 - (0)9 58 10 27 44

 Fax (D):+49 - (0)3 21 21 25 22 44

 email:  rai...@krugs.de

 Skype:  RMkrug
 -BEGIN PGP SIGNATURE-
 Version: GnuPG/MacGPG2 v2.0.22 (Darwin)
 Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

 iQEcBAEBAgAGBQJS3SDSAAoJENvXNx4PUvmCEAwH/jBCuQLRpRcPu+PSrUBsck8v
 49q3f0wAZqhyfjMQvRnLSECAfQN4GHI1WvXcuC9R8Z0eokL7gAqMnJSgWd61Un0F
 I+yClK1qbhpCwR8WV4nDXTuEW5rb5d8a1iHRPxXXSi/vdJZL3imWMsfvGTpgIhVw
 Dbi7+BSh52ZFEZPIyTm2+4qBfQA2ZaY3AEPTjBdB4iL603S+lpgmm1mAInFHFx5g
 0CzzY3feTWreD+EATXMGofTDaoxR5vuLvIRvv+PA/Ehz/hVnQah2xriL4NR+pIHz
 7WbqiReJ8H1ruAgtW6o8CmQRMArHmk0oBy1vYQvwB7SZ8/DOyKkArKBy8tGx/J0=
 =dBo5
 -END PGP SIGNATURE-

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping commands so that variablas are removed automatically - like functions

2014-01-20 Thread Rainer M Krug
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1



On 01/20/14, 14:27 , jim holtman wrote:
 Check out the use of the 'local' function:

True - have completely forgotten the local function.

Thanks,

Rainer

 
 
  gc()
           used (Mb) gc trigger (Mb) max used (Mb)
 Ncells 199420 10.7     407500 21.8       35 18.7
 Vcells 308004  2.4     786432  6.0   786424  6.0
  result <- local({
 + a <- rnorm(100)  # big objects
 + b <- rnorm(100)
 + mean(a + b)  # return value
 + })
 
  result
 [1] 0.0001819203
  gc()
           used (Mb) gc trigger (Mb) max used (Mb)
 Ncells 199666 10.7     407500 21.8       35 18.7
 Vcells 308780  2.4    2975200 22.7  3710863 28.4
 
 
 Jim Holtman Data Munger Guru
 
 What is the problem that you are trying to solve? Tell me what you
 want to do, not how you want to do it.
 
 
 On Mon, Jan 20, 2014 at 8:12 AM, Rainer M Krug rai...@krugs.de
 wrote: Hi
 
 I would like to group commands, so that after a group of commands
 has been executed, the variables defined in that group are
 automatically deleted.
 
 My reasoning: I have a longer sript which is used to load data, do 
 analysis and plot graphs, all part of a document (in org-mode / 
 emacs).
 
 I have several datasets which are loaded, and each one is quite 
 big. So after doing one part of the job (e.g. analysing the data
 and storing the results) I want to delete all variables used to
 free space and to avoid having these variables being used in the
 next block and still having the old (for this block invalid)
 values.
 
 I can't use rm(list=ls()) as I have some variables as constants 
 which do not change over the whole document and also some
 functions defined.
 
 I could put each block in a function and then call the function
 and delete it afterwards, but this is as I see it abusing
 functions.
 
 I don't want to keep track manually of the variables.
 
 Therefore my question:
 
 Can I do something like:
 
 x <- 15
 
 { # here begins the block
 a <- 1:100
 b <- 4:400
 } # here ends the block
 
 # here a and b are not defined anymore
 # but x is still defined
 
 {} is great for grouping the commands, but the variables are not 
 deleted afterwards.
 
 Am I missing a language feature in R?
 
 Rainer
 
 
 __ 
 R-help@r-project.org mailing list 
 https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the
 posting guide http://www.R-project.org/posting-guide.html and
 provide commented, minimal, self-contained, reproducible code.

- -- 
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa

Tel :   +33 - (0)9 53 10 27 44
Cell:   +33 - (0)6 85 62 59 98
Fax :   +33 - (0)9 58 10 27 44

Fax (D):+49 - (0)3 21 21 25 22 44

email:  rai...@krugs.de

Skype:  RMkrug

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping Matrix by Columns; OHLC Data

2013-09-26 Thread arun
HI,
May be this helps:

set.seed(24)
mat1 <- matrix(sample(1:60, 30*24, replace=TRUE), ncol=24)
colnames(mat1) <- rep(c("O","H","L","C"), 6)
indx <- seq_along(colnames(mat1))
n <- length(unique(colnames(mat1)))
res <- lapply(split(indx, (indx-1)%%n + 1), function(i) mat1[,i])
lapply(res, head, 2)
#$`1`
#  O  O  O  O  O  O
#[1,] 18 56 51 24 24 52
#[2,] 14 31 60 12 43 34
#
#$`2`
#  H  H  H  H  H  H
#[1,] 20  6  4 23 10  2
#[2,] 15 37 22 52 30 42
#
#$`3`
#  L  L  L  L  L  L
#[1,] 30 25 29  1 57 16
#[2,] 15 23 15 10 44 60
#
#$`4`
#  C  C  C  C  C  C
#[1,] 20 13  8 44  5 13
#[2,] 45 17 35  8 25 12

A.K.



Motivation: 

Bring in data containing a number of columns divisible by 4. 
This data contains several different assets and the columns correspond 
to Open,High,Low,Close, Open,High,Low,Close, etc. (thus divisible by 
4). Where I am getting this data from, the header is not labelled 
Open,High,Low,Close, but rather just has the asset symbol. 

The end goal is to have each Open,High,Low,Close,  as its own 
OHLC object, to be run through different volatility functions (via 
QuantMod ) 

I believe I am best served by first grouping the original data 
so that each asset is its own object, with 4 columns. Then I can rename 
the columns to be: 
colnames(function$asset) <- c("Open", "High", "Low", "Close") 

I've attempted to use split, but am having trouble with split along the 
columns. 

Obviously I could manipulate the indexing, with something like 
data[i:i+4] and use a loop. Maybe this indexing approach would work with
 use of apply(). 


Previously, I've been using Mathematica for most of my data 
manipulation, and there I would partition the entire data set i.e. 
Matrix, into   column# / 4 separate objects.  So, in that case I have a 3
 dimensional object. I'd then call the object by its 3rd dimension index
 # [][#]. 

I'm having trouble doing that here. Any thoughts, or at the least some help 
grouping the data by column? 

For the sake of possible examples, let's say the dimensions of my data are n.rows 
= 30, n.col = 24 


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping Matrix by Columns; OHLC Data

2013-09-26 Thread arun
Hi Jake.

Sorry, I misunderstood what you wanted.
Instead of this:

lapply(split(indx,(indx-1)%%n+1),function(i) mat1[,i])

If I use:
res1 <- lapply(split(indx, (indx-1)%/%n + 1), function(i) mat1[,i])

# or
lapply(split(indx, as.numeric(gl(ncol(mat1), n, ncol(mat1)))),
       function(i) mat1[,i])



 lapply(res1,head,2)[1:2]
#$`1`
 #     O  H  L  C
#[1,] 18 20 30 20
#[2,] 14 15 15 45
#
#$`2`
 #     O  H  L  C
#[1,] 56  6 25 13
#[2,] 31 37 23 17

A.K.




So, I got it worked out. Thanks for your input. I see that you used a 
mod, which worked well for the application you solved, and an 
application that will likely come up again. Anyway, here is the 
solution I was looking for: 


set.seed(24) 
mat1 <- matrix(sample(1:60, 30*24, replace=TRUE), ncol=24) 
colnames(mat1) <- rep(c("O","H","L","C"), 6) 
indx <- seq_along(colnames(mat1)) 
n <- length(unique(colnames(mat1))) 


res <- lapply(split(indx, rep(1:6, each = 4, times = 1)), function(i) mat1[,i]) 
## rep(1:6, each = 4, times = 1) 
## [1] 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5 6 6 6 6 

lapply(res,head,2) 


$`1` 
      O  H  L  C 
[1,] 18 20 30 20 
[2,] 14 15 15 45 

$`2` 
      O  H  L  C 
[1,] 56  6 25 13 
[2,] 31 37 23 17 

$`3` 
      O  H  L  C 
[1,] 51  4 29  8 
[2,] 60 22 15 35 

$`4` 
      O  H  L  C 
[1,] 24 23  1 44 
[2,] 12 52 10  8 

$`5` 
      O  H  L  C 
[1,] 24 10 57  5 
[2,] 43 30 44 25 

$`6` 
      O  H  L  C 
[1,] 52  2 16 13 
[2,] 34 42 60 12 

Thanks again 
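A slightly more general sketch of the same split, not hard-coded to six assets
and with the renaming step folded in; it assumes the columns really do arrive in
repeating Open, High, Low, Close order:

n.per.asset <- 4
grp <- rep(seq_len(ncol(mat1) / n.per.asset), each = n.per.asset)
res <- lapply(split(seq_len(ncol(mat1)), grp), function(i) {
    block <- mat1[, i]
    colnames(block) <- c("Open", "High", "Low", "Close")  # names OHLC tools expect
    block
})
head(res[[1]], 2)

Each element of res can then be passed on to the volatility functions, possibly
after attaching a date index, which this simulated matrix does not have.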


- Original Message -
From: arun smartpink...@yahoo.com
To: R help r-help@r-project.org
Cc: 
Sent: Thursday, September 26, 2013 5:15 PM
Subject: Re: Grouping Matrix by Columns; OHLC Data

HI,
May be this helps:

set.seed(24)
mat1 <- matrix(sample(1:60, 30*24, replace=TRUE), ncol=24)
colnames(mat1) <- rep(c("O","H","L","C"), 6)
indx <- seq_along(colnames(mat1))
n <- length(unique(colnames(mat1)))
res <- lapply(split(indx, (indx-1)%%n + 1), function(i) mat1[,i])
lapply(res, head, 2)
#$`1`
#  O  O  O  O  O  O
#[1,] 18 56 51 24 24 52
#[2,] 14 31 60 12 43 34
#
#$`2`
#  H  H  H  H  H  H
#[1,] 20  6  4 23 10  2
#[2,] 15 37 22 52 30 42
#
#$`3`
#  L  L  L  L  L  L
#[1,] 30 25 29  1 57 16
#[2,] 15 23 15 10 44 60
#
#$`4`
#  C  C  C  C  C  C
#[1,] 20 13  8 44  5 13
#[2,] 45 17 35  8 25 12

A.K.



Motivation: 

Bring in data containing a number of columns divisable by 4. 
This data contains several different assets and the columns correspond 
to Open,High,Low,Close, Open,High,Low,Close,  etc (thus divisible by
4). From where I am getting this data, the header is not labled as 
Open,High,Low,Close, but rather just has the asset symbol. 

The end goal is to have each Open,High,Low,Close,  as its own 
OHLC object, to be run through different volatility functions (via 
QuantMod ) 

I believe i am best served by first grouping the original data 
so that each asset is its own object, with 4 columns. Then i can rename 
the columns to be: 
colnames(function$asset) -c(Open, High,Low, Close) 

I've attempted to use split, but am having trouble with split along the 
columns. 

Obviously I could manipulate the indexing, with something like 
data[i:i+4] and use a loop. Maybe this indexing approach would work with
use of apply(). 


Previously, I've been using Mathematica for most of my data 
manipulation, and there I would partition the entire data set i.e. 
Matrix, into   column# / 4 separate objects.  So, in that case I have a 3
dimensional object. I'd then call the object by its 3rd dimension index
# [][#]. 

I'm having trouble doing that here. Any thoughts, or at the least  helping me 
to group the data by column. 

For the sake of possible examples, lets say the dimensions of my data is n.rows 
= 30, n.col = 24 


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Grouping variables by a irregular time interval

2013-09-21 Thread Raoni Rodrigues
Hello all,

I have a very large data frame (more than 5 million lines) as below (dput
example at the end of mail):

Station Antenna Tag            DateTime Power Events
      1       2 999 22/07/2013 11:00:21    17      1
      1       2 999 22/07/2013 11:33:47    31      1
      1       2 999 22/07/2013 11:34:00    19      1
      1       2 999 22/07/2013 11:34:16    53      1
      1       2 999 22/07/2013 11:43:20    15      1
      1       2 999 22/07/2013 11:43:35    17      1

To each Tag, in each Antenna, in each Station, I need to create a 10 min
interval and sum the number of Events and mean of Power in the time
interval, as below (complete wanted output at the end of mail).

Station Antenna Tag       StartDateTime         EndDateTime Power Events
      1       2 999 22/07/2013 11:00:21 22/07/2013 11:00:21    17      1
      1       2 999 22/07/2013 11:34:16 22/07/2013 11:43:35    27      5
      1       2 999 22/07/2013 11:44:35 22/07/2013 11:45:40    17     14
      2       1   1 25/07/2013 14:19:45 25/07/2013 14:20:39    65      4
      2       1   2 25/07/2013 14:20:13 25/07/2013 14:25:14    21      3
      2       1   4 25/07/2013 14:20:46 25/07/2013 14:20:46    28      1

Showing the start and end points of each interval is optional, not necessary. I
put both in to show the irregular time intervals: look at Tag 999: the first
interval is between 11:00 and 11:10, the second between 11:34 and 11:44, and the
third between 11:44 and 11:45.

First I tried a for-loop, without success. After that, I tried this code:

require (plyr)

ddply(data, .(Station, Antenna, Tag, cut(data$DateTime, "10 min")),
      summarise, Power = round(mean(Power), 0), Events = sum(Events))

It is almost what I want, but because cut() divides into regular time intervals,
which in some cases is not what I have, it splits a single group of observations in two.

Any ideas to solve this issue?

R version 3.0.1 (2013-05-16) -- Good Sport
Platform: x86_64-w64-mingw32/x64 (64-bit)
Windows 7 Professional

Thanks in advance,

Raoni
-- 
Raoni Rosa Rodrigues
Research Associate of Fish Transposition Center CTPeixes
Universidade Federal de Minas Gerais - UFMG
Brasil
rodrigues.ra...@gmail.com


##complete data dput

structure(list(Station = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Antenna = c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), Tag = c(999L, 999L, 999L, 999L, 999L, 999L,
999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L,
999L, 999L, 999L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 4L, 18L, 18L, 18L,
21L, 22L, 36L, 36L, 36L, 36L, 36L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L), DateTime = structure(c(3L,
4L, 5L, 5L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 18L, 19L, 19L, 19L, 19L, 20L, 23L, 19L, 17L, 17L,
17L, 23L, 18L, 1L, 1L, 1L, 2L, 2L, 9L, 9L, 10L, 10L, 10L, 10L,
10L, 11L, 11L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 13L, 13L,
13L, 13L, 14L, 14L, 14L, 14L, 14L, 14L, 15L, 15L, 15L, 15L, 15L,
15L, 15L, 16L, 16L, 16L, 16L, 18L, 19L, 21L, 21L, 21L, 21L, 21L,
22L, 22L, 22L, 22L, 22L, 23L, 24L, 24L, 24L, 24L, 24L, 24L, 25L,
25L, 25L, 25L, 25L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 27L, 27L,
27L, 27L, 27L, 27L, 28L, 28L, 28L, 28L, 28L), .Label = c(19/06/2013
22:15,
19/06/2013 22:16, 22/07/2013 11:00, 22/07/2013 11:33, 22/07/2013
11:34,
22/07/2013 11:43, 22/07/2013 11:44, 22/07/2013 11:45, 25/07/2013
14:10,
25/07/2013 14:11, 25/07/2013 14:12, 25/07/2013 14:13, 25/07/2013
14:14,
25/07/2013 14:15, 25/07/2013 14:16, 25/07/2013 14:17, 25/07/2013
14:18,
25/07/2013 14:19, 25/07/2013 14:20, 25/07/2013 14:21, 25/07/2013
14:23,
25/07/2013 14:24, 25/07/2013 14:25, 25/07/2013 14:26, 25/07/2013
14:27,
25/07/2013 14:28, 25/07/2013 14:29, 25/07/2013 14:30), class =
factor),
Power = c(17L, 31L, 19L, 53L, 15L, 17L, 21L, 12L, 15L, 
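No worked answer appears in this thread. One possible reading of the irregular
10-minute windows (a new window starts whenever an observation falls more than
10 minutes after the first observation of the current window) is sketched below;
it assumes DateTime has already been parsed with as.POSIXct and the column names
are as shown above, so it is an illustration rather than code from the thread:

library(plyr)

# assign window ids within one Station/Antenna/Tag group
window_id <- function(times, width = 10 * 60) {
    id <- integer(length(times))
    g <- 1L
    start <- times[1]
    for (i in seq_along(times)) {
        if (as.numeric(difftime(times[i], start, units = "secs")) > width) {
            g <- g + 1L          # gap from window start exceeds 10 min: new window
            start <- times[i]
        }
        id[i] <- g
    }
    id
}

res <- ddply(data, .(Station, Antenna, Tag), function(d) {
    d <- d[order(d$DateTime), ]
    d$win <- window_id(d$DateTime)
    ddply(d, .(win), summarise,
          StartDateTime = min(DateTime),
          EndDateTime   = max(DateTime),
          Power         = round(mean(Power), 0),
          Events        = sum(Events))
})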

Re: [R] Grouping variables by a irregular time interval

2013-09-21 Thread Raoni Rodrigues
Arun caught my attention that I committed a mistake with example data set.
I send now the correct, with same text explain my problem.

Sorry all of you for the confusion.

I have a very large data frame (more than 5 million lines) as below (dput
example at the end of mail):

  Station Antenna Tag            DateTime Power Events
1       1       2 999 22/07/2013 11:00:21    17      1
2       1       2 999 22/07/2013 11:33:47    31      1
3       1       2 999 22/07/2013 11:34:00    19      1
4       1       2 999 22/07/2013 11:34:16    53      1
5       1       2 999 22/07/2013 11:43:20    15      1
6       1       2 999 22/07/2013 11:43:35    17      1

To each Tag, in each Antenna, in each Station, I need to create a 10 min
interval and sum the number of Events and mean of Power in the time
interval, as below (complete wanted output at the end of mail).

  Station Antenna Tag       StartDateTime         EndDateTime Power Events
1       1       2 999 22/07/2013 11:00:21 22/07/2013 11:00:21    17      1
2       1       2 999 22/07/2013 11:34:16 22/07/2013 11:43:35    27      5
3       1       2 999 22/07/2013 11:44:35 22/07/2013 11:45:40    17     14
4       2       1   1 25/07/2013 14:19:45 25/07/2013 14:20:39    65      4
5       2       1   2 25/07/2013 14:20:13 25/07/2013 14:25:14    21      3
6       2       1   4 25/07/2013 14:20:46 25/07/2013 14:20:46    28      1

Show start and end points of each interval is optional, not necessary. I
put both to show the irregular time interval (look at tag 999).

First I tried a for-loop, without success. After that, I tried this code:

require (plyr)

ddply(data, .(Station, Antenna, Tag, cut(data$DateTime, "10 min")),
      summarise, Power = round(mean(Power), 0), Events = sum(Events))

Is almost what I want, because cut() divided in regular time intervals, but
in some cases I do not have this, and it split a unique observation in two.

Any ideas to solve this issue?

R version 3.0.1 (2013-05-16) -- Good Sport
Platform: x86_64-w64-mingw32/x64 (64-bit)
Windows 7 Professional

Thanks in advance,

Raoni
-- 
Raoni Rosa Rodrigues
Research Associate of Fish Transposition Center CTPeixes
Universidade Federal de Minas Gerais - UFMG
Brasil
rodrigues.ra...@gmail.com


##complete data dput

structure(list(Station = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Antenna = c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), Tag = c(999L, 999L, 999L, 999L, 999L, 999L,
999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L,
999L, 999L, 999L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 4L, 18L, 18L, 18L,
21L, 22L, 36L, 36L, 36L, 36L, 36L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L), DateTime = structure(c(6L,
7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L,
20L, 21L, 22L, 23L, 24L, 25L, 68L, 70L, 72L, 73L, 71L, 75L, 86L,
74L, 64L, 64L, 65L, 87L, 67L, 1L, 2L, 3L, 4L, 5L, 26L, 27L, 28L,
29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L,
42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L, 54L,
55L, 56L, 57L, 58L, 59L, 60L, 61L, 62L, 63L, 66L, 69L, 76L, 77L,
78L, 79L, 80L, 81L, 82L, 83L, 84L, 85L, 88L, 89L, 90L, 91L, 92L,
93L, 94L, 95L, 96L, 97L, 98L, 99L, 100L, 101L, 102L, 103L, 104L,
105L, 106L, 107L, 108L, 109L, 110L, 111L, 112L, 113L, 114L, 115L,
116L, 117L), .Label = c(19/06/2013 22:15:49, 19/06/2013 22:15:54,
19/06/2013 22:15:59, 19/06/2013 22:16:24, 19/06/2013 22:16:29,
22/07/2013 11:00:21, 22/07/2013 11:33:47, 22/07/2013 11:34:00,
22/07/2013 11:34:16, 22/07/2013 11:43:20, 22/07/2013 11:43:35,
22/07/2013 11:44:35, 22/07/2013 11:44:41, 22/07/2013 11:44:42,
22/07/2013 11:44:43, 22/07/2013 11:44:44, 22/07/2013 11:44:59,
22/07/2013 11:45:11, 22/07/2013 11:45:29, 22/07/2013 11:45:30,
22/07/2013 11:45:31, 22/07/2013 11:45:35, 22/07/2013 11:45:37,

[R] Grouping variables by a irregular time interval

2013-09-20 Thread Raoni Rodrigues
Hello all,

I have a very large data frame (more than 5 million lines) as below (dput
example at the end of mail):

  Station Antenna Tag            DateTime Power Events
1       1       2 999 22/07/2013 11:00:21    17      1
2       1       2 999 22/07/2013 11:33:47    31      1
3       1       2 999 22/07/2013 11:34:00    19      1
4       1       2 999 22/07/2013 11:34:16    53      1
5       1       2 999 22/07/2013 11:43:20    15      1
6       1       2 999 22/07/2013 11:43:35    17      1

To each Tag, in each Antenna, in each Station, I need to create a 10 min
interval and sum the number of Events and mean of Power in the time
interval, as below (complete wanted output at the end of mail).

  Station Antenna Tag       StartDateTime         EndDateTime Power Events
1       1       2 999 22/07/2013 11:00:21 22/07/2013 11:00:21    17      1
2       1       2 999 22/07/2013 11:34:16 22/07/2013 11:43:35    27      5
3       1       2 999 22/07/2013 11:44:35 22/07/2013 11:45:40    17     14
4       2       1   1 25/07/2013 14:19:45 25/07/2013 14:20:39    65      4
5       2       1   2 25/07/2013 14:20:13 25/07/2013 14:25:14    21      3
6       2       1   4 25/07/2013 14:20:46 25/07/2013 14:20:46    28      1

Show start and end points of each interval is optional, not necessary. I
put both to show the irregular time interval (look at tag 999)

First I tried a for-loop, without success. After that, I tried this code:

require (plyr)

ddply(data, .(Station, Antenna, Tag, cut(data$DateTime, "10 min")),
      summarise, Power = round(mean(Power), 0), Events = sum(Events))

Is almost what I want, because cut() divided in regular time intervals, but
in some cases I do not have this, and it split a unique observation in two.

Any ideas to solve this issue?

R version 3.0.1 (2013-05-16) -- Good Sport
Platform: x86_64-w64-mingw32/x64 (64-bit)
Windows 7 Professional

Thanks in advance,

Raoni
-- 
Raoni Rosa Rodrigues
Research Associate of Fish Transposition Center CTPeixes
Universidade Federal de Minas Gerais - UFMG
Brasil
rodrigues.ra...@gmail.com


##complete data dput

structure(list(Station = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Antenna = c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), Tag = c(999L, 999L, 999L, 999L, 999L, 999L,
999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L,
999L, 999L, 999L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 4L, 18L, 18L, 18L,
21L, 22L, 36L, 36L, 36L, 36L, 36L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L), DateTime = structure(c(3L,
4L, 5L, 5L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 18L, 19L, 19L, 19L, 19L, 20L, 23L, 19L, 17L, 17L,
17L, 23L, 18L, 1L, 1L, 1L, 2L, 2L, 9L, 9L, 10L, 10L, 10L, 10L,
10L, 11L, 11L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 13L, 13L,
13L, 13L, 14L, 14L, 14L, 14L, 14L, 14L, 15L, 15L, 15L, 15L, 15L,
15L, 15L, 16L, 16L, 16L, 16L, 18L, 19L, 21L, 21L, 21L, 21L, 21L,
22L, 22L, 22L, 22L, 22L, 23L, 24L, 24L, 24L, 24L, 24L, 24L, 25L,
25L, 25L, 25L, 25L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 27L, 27L,
27L, 27L, 27L, 27L, 28L, 28L, 28L, 28L, 28L), .Label = c(19/06/2013
22:15,
19/06/2013 22:16, 22/07/2013 11:00, 22/07/2013 11:33, 22/07/2013
11:34,
22/07/2013 11:43, 22/07/2013 11:44, 22/07/2013 11:45, 25/07/2013
14:10,
25/07/2013 14:11, 25/07/2013 14:12, 25/07/2013 14:13, 25/07/2013
14:14,
25/07/2013 14:15, 25/07/2013 14:16, 25/07/2013 14:17, 25/07/2013
14:18,
25/07/2013 14:19, 25/07/2013 14:20, 25/07/2013 14:21, 25/07/2013
14:23,
25/07/2013 14:24, 25/07/2013 14:25, 25/07/2013 14:26, 25/07/2013
14:27,
25/07/2013 14:28, 25/07/2013 14:29, 25/07/2013 14:30), class =
factor),
Power = c(17L, 31L, 19L, 53L, 15L, 17L, 21L, 12L, 15L, 22L,
19L, 15L, 13L, 14L, 15L, 12L, 23L, 19L, 16L, 20L, 30L, 37L,
25L, 167L, 24L, 14L, 

Re: [R] grouping followed by finding frequent patterns in R

2013-03-10 Thread Bert Gunter
1. Please cc the list, as I have here, unless your comments are off topic.

2. Use dput() (?dput) to include **small** amounts of data in your
message, as attachments are generally stripped from r-help.

3. I have no experience with itemsets or the arules package, but a
quick glance at the docs there said that your data argument must be in
a specific form coercible into an S4 transactions class. I suspect
that neither your initial data frame nor the list deriving from split
is, but maybe someone familiar with the package can tell you for sure.
That's why you need to cc to the list.
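A minimal sketch of that coercion, assuming arules is loaded, that CIN is the
transaction id and TRN_TYP the item, and that items must be character with
duplicates removed within each CIN before coercing (the support threshold is
just an example value):

library(arules)

baskets <- split(as.character(file$TRN_TYP), file$CIN)
baskets <- lapply(baskets, unique)          # duplicated items are not allowed

trans    <- as(baskets, "transactions")     # S4 transactions object
itemsets <- eclat(trans, parameter = list(supp = 0.05))
inspect(itemsets)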

-- Bert


On Sun, Mar 10, 2013 at 7:04 AM, Dhiman Biswas crazydh...@gmail.com wrote:
 Dear Bert,

 My intention is to mine frequent itemsets of TRN_TYP for individual CIN out
 of that data.
 But the problem is using eclat after splitting gives the following error:

 Error in eclat(list) : internal error in trio library

 PS: I have attached my dataset.


 On Sat, Mar 9, 2013 at 8:27 PM, Bert Gunter gunter.ber...@gene.com wrote:

 I **suggest** that you explain what you wish to accomplish using a
 reproducible example rather than telling us what packages you think
 you should use. I believe you are making things too complicated; e.g.
 what do you mean by frequent patterns?  Moreover, basket format is
 rather unclear -- and may well be unnecessary. But using lists, it
 could be simply accomplished by

 ?split  ## as in
 the_list <- with(yourdata, split(TYP, CIN.TRN))
 
 or possibly
 
 the_list <- with(yourdata, tapply(TYP, CIN.TRN, FUN = table))

 Of course, these may be irrelevant and useless, but without knowing
 your purpose ...?

 -- Bert

 On Sat, Mar 9, 2013 at 4:37 AM, Dhiman Biswas crazydh...@gmail.com
 wrote:
  I have a data in the following form :
  CIN      TRN_TYP
  9079954        1
  9079954        2
  9079954        3
  9079954        4
  9079954        5
  9079954        4
  9079954        5
  9079954        6
  9079954        7
  9079954        8
  9079954        9
  9079954        9
  ..
  ..
  ..
  there are 100 types of CIN (9079954,12441087,15246633,...) and
  respective
  TRN_TYP
 
  first of all, I want this data to be grouped into basket format:
  9079954   1, 2, 3, 4, 5, 
  12441087  19, 14, 21, 3, 7, ...
  .
  .
  .
  and then apply eclat from arules package to find frequent patterns.
 
  1) I ran the following code:
  file <- read.csv("D:/R/Practice/Data_Input_NUM.csv")
  file <- file[!duplicated(file),]
  eclat(split(file$TRN_TYP, file$CIN))
 
  but it gave me the following error:
  Error in asMethod(object) : can not coerce list with transactions with
  duplicated items
 
  2) I ran this code:
  file <- read.csv("D:/R/Practice/Data_Input_NUM.csv")
  file_new <- file[,c(3,6)] # because my file Data_Input_NUM has many other
  columns as well, so I am selecting only CIN and TRN_TYP
  file_new <- file_new[!duplicated(file_new),]
  eclat(split(file_new$TRN_TYP, file_new$CIN))
 
  but again:
  Error in eclat(split(file_new$TRN_TYP, file_new$CIN)) :
internal error in trio library
 
  PLEASE HELP
 
  [[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.



 --

 Bert Gunter
 Genentech Nonclinical Biostatistics

 Internal Contact Info:
 Phone: 467-7374
 Website:

 http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm





-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] grouping followed by finding frequent patterns in R

2013-03-09 Thread Dhiman Biswas
I have a data in the following form :
CIN      TRN_TYP
9079954        1
9079954        2
9079954        3
9079954        4
9079954        5
9079954        4
9079954        5
9079954        6
9079954        7
9079954        8
9079954        9
9079954        9
..
..
..
there are 100 types of CIN (9079954,12441087,15246633,...) and respective
TRN_TYP

first of all, I want this data to be grouped into basket format:
9079954   1, 2, 3, 4, 5, 
12441087  19, 14, 21, 3, 7, ...
.
.
.
and then apply eclat from arules package to find frequent patterns.

1) I ran the following code:
file <- read.csv("D:/R/Practice/Data_Input_NUM.csv")
file <- file[!duplicated(file),]
eclat(split(file$TRN_TYP, file$CIN))

but it gave me the following error:
Error in asMethod(object) : can not coerce list with transactions with
duplicated items

2) I ran this code:
file <- read.csv("D:/R/Practice/Data_Input_NUM.csv")
file_new <- file[,c(3,6)] # because my file Data_Input_NUM has many other
columns as well, so I am selecting only CIN and TRN_TYP
file_new <- file_new[!duplicated(file_new),]
eclat(split(file_new$TRN_TYP, file_new$CIN))

but again:
Error in eclat(split(file_new$TRN_TYP, file_new$CIN)) :
  internal error in trio library

PLEASE HELP

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping followed by finding frequent patterns in R

2013-03-09 Thread Bert Gunter
I **suggest** that you explain what you wish to accomplish using a
reproducible example rather than telling us what packages you think
you should use. I believe you are making things too complicated; e.g.
what do you mean by frequent patterns?  Moreover, basket format is
rather unclear -- and may well be unnecessary. But using lists, it
could be simply accomplished by

?split  ## as in
the_list <- with(yourdata, split(TYP, CIN.TRN))

or possibly

the_list <- with(yourdata, tapply(TYP, CIN.TRN, FUN = table))

Of course, these may be irrelevant and useless, but without knowing
your purpose ...?

-- Bert

On Sat, Mar 9, 2013 at 4:37 AM, Dhiman Biswas crazydh...@gmail.com wrote:
 I have a data in the following form :
 CIN      TRN_TYP
 9079954        1
 9079954        2
 9079954        3
 9079954        4
 9079954        5
 9079954        4
 9079954        5
 9079954        6
 9079954        7
 9079954        8
 9079954        9
 9079954        9
 ..
 ..
 ..
 there are 100 types of CIN (9079954,12441087,15246633,...) and respective
 TRN_TYP

 first of all, I want this data to be grouped into basket format:
 9079954   1, 2, 3, 4, 5, 
 12441087  19, 14, 21, 3, 7, ...
 .
 .
 .
 and then apply eclat from arules package to find frequent patterns.

 1) I ran the following code:
 file <- read.csv("D:/R/Practice/Data_Input_NUM.csv")
 file <- file[!duplicated(file),]
 eclat(split(file$TRN_TYP, file$CIN))

 but it gave me the following error:
 Error in asMethod(object) : can not coerce list with transactions with
 duplicated items

 2) I ran this code:
 file <- read.csv("D:/R/Practice/Data_Input_NUM.csv")
 file_new <- file[,c(3,6)] # because my file Data_Input_NUM has many other
 columns as well, so I am selecting only CIN and TRN_TYP
 file_new <- file_new[!duplicated(file_new),]
 eclat(split(file_new$TRN_TYP, file_new$CIN))

 but again:
 Error in eclat(split(file_new$TRN_TYP, file_new$CIN)) :
   internal error in trio library

 PLEASE HELP

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] grouping elements of a data frame

2013-01-15 Thread Nuri Alpay Temiz
Hi everyone,

I have a question on selecting and grouping elements of a data frame. For 
example:

A.df- [ a c 0.9
 b  x 0.8
 b z 0.5
 c y 0.9
 c x 0.7
 c z 0.6]


I want to create a list of a data frame that gives me the unique values of 
column 1 of A.df so that i can create intersects. That is:

B[a]- [ c 0.9]

B[b]- [ x 0.8
 z 0.5]

B[c]- [ y 0.9
 x 0.7
 z 0.6]


B[c] n B[b] - c(x,z)


How can I accomplish this?

Thanks,
Al
 
 
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping elements of a data frame

2013-01-15 Thread David Winsemius

On Jan 15, 2013, at 9:10 AM, Nuri Alpay Temiz wrote:

 Hi everyone,
 
 I have a question on selecting and grouping elements of a data frame. For 
 example:
 
 A.df- [ a c 0.9
 b  x 0.8
 b z 0.5
 c y 0.9
 c x 0.7
 c z 0.6]

That is not R code. Matlab?, Python? 

 
 
 I want to create a list of a data frame that gives me the unique values of 
 column 1 of A.df so that i can create intersects. That is:
 
 B[a]- [ c 0.9]
 
 B[b]- [ x 0.8
 z 0.5]
 
 B[c]- [ y 0.9
 x 0.7
 z 0.6]
 
 
 B[c] n B[b] - c(x,z)
 

That's some sort of coded message? We are supposed to know what the n 
operation will do when assigned a vector?


Assuming your really do have a dataframe named B:

intersect(B$c, B$b)

Please code up examples in R in the future.

-- 

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping elements of a data frame

2013-01-15 Thread arun
Hi,
Try this:
The last part was not clear.
A.df <- read.table(text="
a c 0.9
b x 0.8
b z 0.5
c y 0.9
c x 0.7
c z 0.6
", sep="", header=FALSE, stringsAsFactors=FALSE)
lst1 <- split(A.df[,-1], A.df$V1)
lst1
#$a
#  V2  V3
#1  c 0.9
#
#$b
#  V2  V3
#2  x 0.8
#3  z 0.5
#
#$c
#  V2  V3
#4  y 0.9
#5  x 0.7
#6  z 0.6
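For the last part of the question (written there as B[c] n B[b]), a small
follow-up sketch using the lst1 built above:

intersect(lst1[["c"]]$V2, lst1[["b"]]$V2)
# [1] "x" "z"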


A.K.



- Original Message -
From: Nuri Alpay Temiz alpayte...@outlook.com
To: R-help@r-project.org
Cc: 
Sent: Tuesday, January 15, 2013 12:10 PM
Subject: [R] grouping elements of a data frame

Hi everyone,

I have a question on selecting and grouping elements of a data frame. For 
example:

A.df- [ a c 0.9
             b  x 0.8
             b z 0.5
             c y 0.9
             c x 0.7
             c z 0.6]


I want to create a list of a data frame that gives me the unique values of 
column 1 of A.df so that i can create intersects. That is:

B[a]- [ c 0.9]

B[b]- [ x 0.8
             z 0.5]

B[c]- [ y 0.9
             x 0.7
             z 0.6]


B[c] n B[b] - c(x,z)


How can I accomplish this?

Thanks,
Al
                    
            
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Grouping distances

2012-06-11 Thread Jhope
Hi R-listers, 

I am trying to group my HTL data, this is a column of data of Distances to
the HTL data = turtlehatch. I would like to create an Index of distances
(0-5m, 6-10, 11-15, 16-20... up to 60). And then create a new file with this
HTLIndex in a column. 

So far I have gotten this far: 

HTL.index <- function (values, weights=c(0, 5, 10, 15, 20, 25, 30, 35, 40,
45, 50, 55, 60)) { 
hope <- values * weights 
return (apply(hope, 1, sum)/apply(values, 1, sum)) 
} 
write.csv(turtlehatch, HTLIndex, row.names=FALSE) 
 

But I do not seem to be able to create a new column  in a new file. 

Please advise, Jean 

--
View this message in context: 
http://r.789695.n4.nabble.com/Grouping-distances-tp4632985.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Grouping distances

2012-06-11 Thread Jhope
Hi R-listers, 

I am trying to group my HTL data, this is a column of data of Distances to
the HTL data = turtlehatch. I would like to create an Index of distances
(0-5m, 6-10, 11-15, 16-20... up to 60). And then create a new file with this
HTLIndex in a column. 

So far I have gotten this far:

HTL.index <- function (values, weights=c(0, 5, 10, 15, 20, 25, 30, 35, 40,
45, 50, 55, 60)) {
hope <- values * weights
return (apply(hope, 1, sum)/apply(values, 1, sum))
}
write.csv(turtlehatch, HTLIndex, row.names=FALSE)


But I do not seem to be able to create a new column  in a new file. 

Please advise, Jean



--
View this message in context: 
http://r.789695.n4.nabble.com/Grouping-distances-tp4632984.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping distances

2012-06-11 Thread Rui Barradas

Hello,

It's easy to create a new column. Since you haven't said where nor the 
type of data structure you are using, I'll try to answer to both.

Suppose that 'x' is a matrix. Then

newcolumn <- newvalues
x2 <- cbind(x, newcolumn)  # new column added to x, result in x2

Suppose that 'y' is a data.frame. Then the same would do it, or

y$newcolumn <- newvalues

Now, I believe that the new values come from your function. If so, you 
must assign the function value to some variable outside the function.


htlindex <- HTL.index(...etc...)  # 'htlindex' is the 'newvalues' above


Two extra notes.
One, rowSums() does what your apply() instructions do.

Second, first you multiply then you divide, to give 'weights'. I think 
this is just an example, not the real function.


Hope this helps,

Rui Barradas
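The binning step itself is not shown anywhere in the thread; a minimal sketch
with cut(), assuming the distance column in turtlehatch is called HTL (the real
column name is not given in the post) and that 5 m classes up to 60 m are wanted:

breaks <- seq(0, 60, by = 5)                      # 0-5, 5-10, ..., 55-60
turtlehatch$HTLIndex <- cut(turtlehatch$HTL, breaks = breaks,
                            include.lowest = TRUE)
write.csv(turtlehatch, "turtlehatch_with_index.csv", row.names = FALSE)

write.csv() then stores the data frame, now including the new HTLIndex column,
in a new file (the output file name above is only an example).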

Em 11-06-2012 07:01, Jhope escreveu:

Hi R-listers,

I am trying to group my HTL data, this is a column of data of Distances to
the HTL data = turtlehatch. I would like to create an Index of distances
(0-5m, 6-10, 11-15, 16-20... up to 60). And then create a new file with this
HTLIndex in a column.

So far I have gotten this far:

HTL.index <- function (values, weights=c(0, 5, 10, 15, 20, 25, 30, 35, 40,
45, 50, 55, 60)) {
hope <- values * weights
return (apply(hope, 1, sum)/apply(values, 1, sum))
}
write.csv(turtlehatch, HTLIndex, row.names=FALSE)


But I do not seem to be able to create a new column  in a new file.

Please advise, Jean

--
View this message in context: 
http://r.789695.n4.nabble.com/Grouping-distances-tp4632985.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping distances

2012-06-11 Thread Jhope
Thank you Rui, 

I am trying to create a column in the data file turtlehatch.csv

Saludos, Jean

--
View this message in context: 
http://r.789695.n4.nabble.com/Grouping-distances-tp4632985p4632989.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] grouping function

2012-05-08 Thread Geoffrey Smith
Hello, I would like to write a function that makes a grouping variable for
some panel data .  The grouping variable is made conditional on the begin
year and the end year.  Here is the code I have written so far.

name <- c(rep('Frank',5), rep('Tony',5), rep('Edward',5));
begin <- c(seq(1990,1994), seq(1991,1995), seq(1992,1996));
end <- c(seq(1995,1999), seq(1995,1999), seq(1996,2000));

df - data.frame(name, begin, end);
df;

#This is the part I am stuck on;

makegroup <- function(x,y) {
 group <- 0
 if (x <= 1990 & y > 1990) {group==1}
 if (x <= 1991 & y > 1991) {group==2}
 if (x <= 1992 & y > 1992) {group==3}
 return(x,y)
}

makegroup(df$begin,df$end);

#I am looking for output where each observation belongs to a group
conditional on the begin year and end year.  I would also like to use a for
loop for programming accuracy as well;

Thank you!  Geoff

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping function

2012-05-08 Thread Sarah Goslee
Hi,

On Tue, May 8, 2012 at 2:17 PM, Geoffrey Smith g...@asu.edu wrote:
 Hello, I would like to write a function that makes a grouping variable for
 some panel data .  The grouping variable is made conditional on the begin
 year and the end year.  Here is the code I have written so far.

 name <- c(rep('Frank',5), rep('Tony',5), rep('Edward',5));
 begin <- c(seq(1990,1994), seq(1991,1995), seq(1992,1996));
 end <- c(seq(1995,1999), seq(1995,1999), seq(1996,2000));

 df - data.frame(name, begin, end);
 df;

Thanks for providing reproducible data. Two minor points: you don't
need ; at the end of lines, and calling your data frame df is
confusing because there's a df() function.

 #This is the part I am stuck on;

 makegroup <- function(x,y) {
  group <- 0
  if (x <= 1990 & y > 1990) {group==1}
  if (x <= 1991 & y > 1991) {group==2}
  if (x <= 1992 & y > 1992) {group==3}
  return(x,y)
 }

 makegroup(df$begin,df$end);

 #I am looking for output where each observation belongs to a group
 conditional on the begin year and end year.  I would also like to use a for
 loop for programming accuracy as well;

This isn't a clear specification:
(1990, 1994), for instance, fits into all three groups. Do you want to
extend this to more start years, or are you only interested in those
three? Assuming end is always >= start, you don't even need to
consider the end years in your grouping.

Here are two methods, one that looks like your pseudocode, and one
that is more R-ish. They give different results because of different
handling of cases that fit all three groups. Rearranging the
statements in makegroup1() from broadest to most restrictive would
make it give the same result as makegroup2().


makegroup1 <- function(x,y) {
 group <- numeric(length(x))
 group[x <= 1990 & y > 1990] <- 1
 group[x <= 1991 & y > 1991] <- 2
 group[x <= 1992 & y > 1992] <- 3
 group
}

makegroup2 <- function(x, y) {
   ifelse(x <= 1990 & y > 1990, 1,
     ifelse(x <= 1991 & y > 1991, 2,
       ifelse(x <= 1992 & y > 1992, 3, 0)))
}

 makegroup1(df$begin,df$end)
 [1] 3 3 3 0 0 3 3 0 0 0 3 0 0 0 0
 makegroup2(df$begin,df$end)
 [1]  1  2  3 NA NA  2  3 NA NA NA  3 NA NA NA NA
 df


But really, it's a better idea to develop an unambiguous statement of
your desired output.

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping function

2012-05-08 Thread Sarah Goslee
Sorry, yes: I changed it before posting to more closely match the
default value in the pseudocode. That's a very minor issue: the
very last value in the nested ifelse() statements is what's used by
default.

Sarah

On Tue, May 8, 2012 at 2:46 PM, arun smartpink...@yahoo.com wrote:
 HI Sarah,

 I ran the same code from your reply email.  For makegroup2, the results 
 are 0 in place of NA.

 > makegroup1 <- function(x,y) {
 + group <- numeric(length(x))
 + group[x <= 1990 & y > 1990] <- 1
 + group[x <= 1991 & y > 1991] <- 2
 + group[x <= 1992 & y > 1992] <- 3
 + group
 + }
 > makegroup2 <- function(x, y) {
 +   ifelse(x <= 1990 & y > 1990, 1,
 +     ifelse(x <= 1991 & y > 1991, 2,
 +       ifelse(x <= 1992 & y > 1992, 3, 0)))
 + }
 > makegroup1(df$begin,df$end)
  [1] 3 3 3 0 0 3 3 0 0 0 3 0 0 0 0
 > makegroup2(df$begin,df$end)
  [1] 1 2 3 0 0 2 3 0 0 0 3 0 0 0 0


 A. K.




 - Original Message -
 From: Sarah Goslee sarah.gos...@gmail.com
 To: g...@asu.edu
 Cc: r-help@r-project.org r-help@r-project.org
 Sent: Tuesday, May 8, 2012 2:33 PM
 Subject: Re: [R] grouping function

 Hi,

 On Tue, May 8, 2012 at 2:17 PM, Geoffrey Smith g...@asu.edu wrote:
 Hello, I would like to write a function that makes a grouping variable for
 some panel data .  The grouping variable is made conditional on the begin
 year and the end year.  Here is the code I have written so far.

 name <- c(rep('Frank',5), rep('Tony',5), rep('Edward',5));
 begin <- c(seq(1990,1994), seq(1991,1995), seq(1992,1996));
 end <- c(seq(1995,1999), seq(1995,1999), seq(1996,2000));

 df - data.frame(name, begin, end);
 df;

 Thanks for providing reproducible data. Two minor points: you don't
 need ; at the end of lines, and calling your data frame df is
 confusing because there's a df() function.

 #This is the part I am stuck on;

  makegroup <- function(x,y) {
   group <- 0
   if (x <= 1990 & y > 1990) {group==1}
   if (x <= 1991 & y > 1991) {group==2}
   if (x <= 1992 & y > 1992) {group==3}
   return(x,y)
  }

 makegroup(df$begin,df$end);

 #I am looking for output where each observation belongs to a group
 conditional on the begin year and end year.  I would also like to use a for
 loop for programming accuracy as well;

 This isn't a clear specification:
  (1990, 1994), for instance, fits into all three groups. Do you want to
  extend this to more start years, or are you only interested in those
  three? Assuming end is always >= start, you don't even need to
  consider the end years in your grouping.

 Here are two methods, one that looks like your pseudocode, and one
 that is more R-ish. They give different results because of different
 handling of cases that fit all three groups. Rearranging the
 statements in makegroup1() from broadest to most restrictive would
 make it give the same result as makegroup2().


  makegroup1 <- function(x,y) {
  group <- numeric(length(x))
  group[x <= 1990 & y > 1990] <- 1
  group[x <= 1991 & y > 1991] <- 2
  group[x <= 1992 & y > 1992] <- 3
  group
  }

  makegroup2 <- function(x, y) {
     ifelse(x <= 1990 & y > 1990, 1,
        ifelse(x <= 1991 & y > 1991, 2,
        ifelse(x <= 1992 & y > 1992, 3, 0)))
  }

 makegroup1(df$begin,df$end)
 [1] 3 3 3 0 0 3 3 0 0 0 3 0 0 0 0
 makegroup2(df$begin,df$end)
 [1]  1  2  3 NA NA  2  3 NA NA NA  3 NA NA NA NA
 df


 But really, it's a better idea to develop an unambiguous statement of
 your desired output.

 Sarah


-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping function

2012-05-08 Thread arun
HI Sarah,

I ran the same code from your reply email.  For makegroup2, the results are 
0 in place of NA.

> makegroup1 <- function(x,y) {
+ group <- numeric(length(x))
+ group[x <= 1990 & y > 1990] <- 1
+ group[x <= 1991 & y > 1991] <- 2
+ group[x <= 1992 & y > 1992] <- 3
+ group
+ }
> makegroup2 <- function(x, y) {
+   ifelse(x <= 1990 & y > 1990, 1,
+     ifelse(x <= 1991 & y > 1991, 2,
+       ifelse(x <= 1992 & y > 1992, 3, 0)))
+ }
> makegroup1(df$begin,df$end)
 [1] 3 3 3 0 0 3 3 0 0 0 3 0 0 0 0
> makegroup2(df$begin,df$end)
 [1] 1 2 3 0 0 2 3 0 0 0 3 0 0 0 0


A. K.




- Original Message -
From: Sarah Goslee sarah.gos...@gmail.com
To: g...@asu.edu
Cc: r-help@r-project.org r-help@r-project.org
Sent: Tuesday, May 8, 2012 2:33 PM
Subject: Re: [R] grouping function

Hi,

On Tue, May 8, 2012 at 2:17 PM, Geoffrey Smith g...@asu.edu wrote:
 Hello, I would like to write a function that makes a grouping variable for
 some panel data .  The grouping variable is made conditional on the begin
 year and the end year.  Here is the code I have written so far.

 name <- c(rep('Frank',5), rep('Tony',5), rep('Edward',5));
 begin <- c(seq(1990,1994), seq(1991,1995), seq(1992,1996));
 end <- c(seq(1995,1999), seq(1995,1999), seq(1996,2000));

 df - data.frame(name, begin, end);
 df;

Thanks for providing reproducible data. Two minor points: you don't
need ; at the end of lines, and calling your data frame df is
confusing because there's a df() function.

 #This is the part I am stuck on;

  makegroup <- function(x,y) {
   group <- 0
   if (x <= 1990 & y > 1990) {group==1}
   if (x <= 1991 & y > 1991) {group==2}
   if (x <= 1992 & y > 1992) {group==3}
   return(x,y)
  }

 makegroup(df$begin,df$end);

 #I am looking for output where each observation belongs to a group
 conditional on the begin year and end year.  I would also like to use a for
 loop for programming accuracy as well;

This isn't a clear specification:
(1990, 1994), for instance, fits into all three groups. Do you want to
extend this to more start years, or are you only interested in those
three? Assuming end is always >= start, you don't even need to
consider the end years in your grouping.

Here are two methods, one that looks like your pseudocode, and one
that is more R-ish. They give different results because of different
handling of cases that fit all three groups. Rearranging the
statements in makegroup1() from broadest to most restrictive would
make it give the same result as makegroup2().


makegroup1 <- function(x,y) {
group <- numeric(length(x))
group[x <= 1990 & y > 1990] <- 1
group[x <= 1991 & y > 1991] <- 2
group[x <= 1992 & y > 1992] <- 3
group
}

makegroup2 <- function(x, y) {
   ifelse(x <= 1990 & y > 1990, 1,
      ifelse(x <= 1991 & y > 1991, 2,
      ifelse(x <= 1992 & y > 1992, 3, 0)))
}

 makegroup1(df$begin,df$end)
[1] 3 3 3 0 0 3 3 0 0 0 3 0 0 0 0
 makegroup2(df$begin,df$end)
[1]  1  2  3 NA NA  2  3 NA NA NA  3 NA NA NA NA
 df


But really, it's a better idea to develop an unambiguous statement of
your desired output.

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping and/or splitting

2012-04-04 Thread Berend Hasselman

On 04-04-2012, at 07:15, Ashish Agarwal wrote:

 Yes. I was missing the DROP argument.
 But now the problem is splitting is causing some weird ordering of groups.

Why weird?

 See below:
 
 DF <- read.table(text="
 Houseid,Personid,Tripid,taz
 1,1,1,4
 1,1,2,7
 2,1,1,96
 2,1,2,4
 2,1,3,2
 2,2,1,58
 3,1,5,7
 ", header=TRUE, sep=",")
 aa <- split(DF, DF[, 1:2], drop=TRUE)
 
 Now the result is that aa[3] is (3,1) and not (2,2). Why? How can I
 preserve the ascending order?
 

Try this

aa[order(names(aa))]

Berend
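An alternative sketch that avoids reordering afterwards: build the grouping
factor explicitly with interaction() and its lex.order argument (using the DF
from the example above):

f  <- with(DF, interaction(Houseid, Personid, drop = TRUE, lex.order = TRUE))
aa <- split(DF, f)
names(aa)
# [1] "1.1" "2.1" "2.2" "3.1"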

 aa[3]
 $`3.1`
   Houseid Personid Tripid taz
 7       3        1      5   7
 aa[4]
 $`2.2`
   Houseid Personid Tripid taz
 6       2        2      1  58
 
 
 On Wed, Apr 4, 2012 at 6:29 AM, Rui Barradas rui1...@sapo.pt wrote:
 
 Hello,
 
 
 Ashish Agarwal wrote
 
 I have a dataframe imported from csv file below:
 
 Houseid,Personid,Tripid,taz
 1,1,1,4
 1,1,2,7
 2,1,1,96
 2,1,2,4
 2,1,3,2
 2,2,1,58
 
 There are three groups identified based on the combination of first and
 second columns. How do I split this data frame?
 
 I tried
  aa <- split(inpfil, inpfil[,1:2])
 but it has problems.
 
 Output desired is
 
 aa[1]
 Houseid,Personid,Tripid,taz
 1,1,1,4
 1,1,2,7
 aa[2]
 Houseid,Personid,Tripid,taz
 2,1,1,96
 2,1,2,4
 2,1,3,2
 aa[3]
 Houseid,Personid,Tripid,taz
 2,2,1,58
 
  [[alternative HTML version deleted]]
 
 __
 R-help@ mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.htmlhttp://www.r-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 
 
 Any of the following three works with me.
 
 
  DF <- read.table(text="
  Houseid,Personid,Tripid,taz
  1,1,1,4
  1,1,2,7
  2,1,1,96
  2,1,2,4
  2,1,3,2
  2,2,1,58
  ", header=TRUE, sep=",")
 
 DF
 
 split(DF, DF[, 1:2], drop=TRUE)
 split(DF, list(DF$Houseid, DF$Personid), drop=TRUE)
 with(DF, split(DF, list(Houseid, Personid), drop=TRUE))
 
 The argument 'drop' defaults to FALSE. Was that the problem?
 
 Hope this helps,
 
 Rui Barrada
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping and/or splitting

2012-04-04 Thread Ashish Agarwal
Thanks a ton!
It was weird because, to my mind, the ordering should have been ascending by default.
Anyway, your workaround and Weidong's method are both good
solutions.
On Wed, Apr 4, 2012 at 12:10 PM, Berend Hasselman b...@xs4all.nl wrote:


 On 04-04-2012, at 07:15, Ashish Agarwal wrote:

  Yes. I was missing the DROP argument.
  But now the problem is splitting is causing some weird ordering of
 groups.

 Why weird?

  See below:
 
  DF <- read.table(text="
  Houseid,Personid,Tripid,taz
  1,1,1,4
  1,1,2,7
  2,1,1,96
  2,1,2,4
  2,1,3,2
  2,2,1,58
  3,1,5,7
  ", header=TRUE, sep=",")
  aa <- split(DF, DF[, 1:2], drop=TRUE)
 
  Now the result is that aa[3] is (3,1) and not (2,2). Why? How can I
  preserve the ascending order?
 

 Try this

 aa[order(names(aa))]

 Berend

  aa[3]
  $`3.1`
    Houseid Personid Tripid taz
  7       3        1      5   7
  aa[4]
  $`2.2`
    Houseid Personid Tripid taz
  6       2        2      1  58


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] grouping

2012-04-03 Thread Val
Hi all,

Assume that I have the following 10 data points.
 x=c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)

sort x  and get the following
  y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)

I want to group the sorted data points (y) into an equal number of
observations per group. In this case there will be three groups.  The first
two groups will have three observations and the third will have four
observations.

group 1  = 34, 45, 46
group 2  = 66, 78, 125
group 3  = 193, 209, 242,297

Finally I want to calculate the group mean

group 1  =  42
group 2  =  87
group 3  =  234

Can anyone help me out?

In SAS I used to do it using proc rank.

thanks in advance

Val

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread David Winsemius


On Apr 3, 2012, at 8:47 AM, Val wrote:


Hi all,

Assume that I have the following 10 data points.
x=c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)

sort x  and get the following
 y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)


The methods below do not require a sorting step.



I want to  group the sorted  data point (y)  into  equal number of
observation per group. In this case there will be three groups.  The  
first

two groups  will have three observation  and the third will have four
observations

group 1  = 34, 45, 46
group 2  = 66, 78, 125
group 3  = 193, 209, 242,297

Finally I want to calculate the group mean

group 1  =  42
group 2  =  87
group 3  =  234


I hope those weren't answers from SAS.



Can anyone help me out?



I usually do this with Hmisc::cut2 since it has a `g = n` parameter  
that auto-magically calls the quantile splitting criterion but this is  
done in base R.


split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1)) ,  
include.lowest=TRUE) )

$`[36,65.9]`
[1] 36 45 46

$`(65.9,189]`
[1]  66  78 125

$`(189,297]`
[1] 193 209 242 297


 lapply( split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1)) ,  
include.lowest=TRUE) ), mean)

$`[36,65.9]`
[1] 42.3

$`(65.9,189]`
[1] 89.7

$`(189,297]`
[1] 235.25

Or to get a table instead of a list:
 tapply( x, cut(x, quantile(x, prob=c(0, .333, .66 ,1)) ,  
include.lowest=TRUE) , mean)

 [36,65.9] (65.9,189]  (189,297]
  42.3   89.7  235.25000


In SAS I used to do it using proc rank.


?quantile isn't equivalent to  Proc Rank but it will provide a useful  
basis for splitting or tabling functions.




thanks in advance

Val

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread R. Michael Weylandt
Ignoring the fact your desired answers are wrong, I'd split the
separating part and the group means parts into three steps:

i) quantile() can help you get the split points,
ii)  findInterval() can assign each y to a group
iii) then ave() or tapply() will do group-wise means

Something like:

y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297) # You need a c here.
ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))
tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean)

You could also use cut2 from the Hmisc package to combine findInterval
and quantile into a single step.

Depending on your desired output.

Hope that helps,
Michael

On Tue, Apr 3, 2012 at 8:47 AM, Val valkr...@gmail.com wrote:
 Hi all,

 Assume that I have the following 10 data points.
  x=c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)

 sort x  and get the following
  y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)

 I want to  group the sorted  data point (y)  into  equal number of
 observation per group. In this case there will be three groups.  The first
 two groups  will have three observation  and the third will have four
 observations

 group 1  = 34, 45, 46
 group 2  = 66, 78, 125
 group 3  = 193, 209, 242,297

 Finally I want to calculate the group mean

 group 1  =  42
 group 2  =  87
 group 3  =  234

 Can anyone help me out?

 In SAS I used to do it using proc rank.

 thanks in advance

 Val

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread Giovanni Petris
Probably something along the following lines:

 x <- c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)
 sorted <- c(36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)
 tapply(sorted, INDEX = (seq_along(sorted) - 1) %/% 3, FUN = mean)
        0         1         2         3 
 42.33333  89.66667 214.66667 297.00000 

Hope this helps,
Giovanni
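
Note that the %/% 3 index above puts the 10th value in a fourth group of its own. A small added sketch (not from the original message) of one way to fold the remainder into the last group, capping the index with pmin():

 sorted <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297)
 idx <- pmin((seq_along(sorted) - 1) %/% 3, 2)   # 0 0 0 1 1 1 2 2 2 2
 tapply(sorted, idx, FUN = mean)
        0         1         2 
 42.33333  89.66667 235.25000 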

On Tue, 2012-04-03 at 08:47 -0400, Val wrote:
 Hi all,
 
 Assume that I have the following 10 data points.
  x=c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)
 
 sort x  and get the following
   y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)
 
 I want to  group the sorted  data point (y)  into  equal number of
 observation per group. In this case there will be three groups.  The first
 two groups  will have three observation  and the third will have four
 observations
 
 group 1  = 34, 45, 46
 group 2  = 66, 78, 125
 group 3  = 193, 209, 242,297
 
 Finally I want to calculate the group mean
 
 group 1  =  42
 group 2  =  87
 group 3  =  234
 
 Can anyone help me out?
 
 In SAS I used to do it using proc rank.
 
 thanks in advance
 
 Val
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 

Giovanni Petris  gpet...@uark.edu
Associate Professor
Department of Mathematical Sciences
University of Arkansas - Fayetteville, AR 72701
Ph: (479) 575-6324, 575-8630 (fax)
http://definetti.uark.edu/~gpetris/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread K. Elo

Hi!

Maybe not the most elegant solution, but works:

for(i in seq(1,length(data)-(length(data) %% 3), 3)) { 
ifelse((length(data)-i)>3, { print(sort(data)[ c(i:(i+2)) ]); 
print(mean(sort(data)[ c(i:(i+2)) ])) }, { print(sort(data)[ 
c(i:length(data)) ]); print(mean(sort(data)[ c(i:length(data)) ])) } ) }


Produces:

[1] 36 45 46
[1] 42.33333
[1]  66  78 125
[1] 89.66667
[1] 193 209 242 297
[1] 235.25

HTH,
Kimmo

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread Val
Thank you all (David, Michael, Giovanni)  for your prompt response.

First, there was a typo in the group mean: it was 89.6, not 87.

For a small data set and few groupings I can use  prob=c(0, .333, .66 ,1)
to group in to three groups in this case. However,  if I want to extend the
number of groupings say 10 or 15 then do I have to figure it out the
  split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))

Is there a short cut for that?


Thanks











On Tue, Apr 3, 2012 at 9:13 AM, R. Michael Weylandt 
michael.weyla...@gmail.com wrote:

 Ignoring the fact your desired answers are wrong, I'd split the
 separating part and the group means parts into three steps:

 i) quantile() can help you get the split points,
 ii)  findInterval() can assign each y to a group
 iii) then ave() or tapply() will do group-wise means

 Something like:

 y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297) # You need a c here.
 ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))
 tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean)

 You could also use cut2 from the Hmisc package to combine findInterval
 and quantile into a single step.

 Depending on your desired output.

 Hope that helps,
 Michael

 On Tue, Apr 3, 2012 at 8:47 AM, Val valkr...@gmail.com wrote:
  Hi all,
 
  Assume that I have the following 10 data points.
   x=c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)
 
  sort x  and get the following
   y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)
 
  I want to  group the sorted  data point (y)  into  equal number of
  observation per group. In this case there will be three groups.  The
 first
  two groups  will have three observation  and the third will have four
  observations
 
  group 1  = 34, 45, 46
  group 2  = 66, 78, 125
  group 3  = 193, 209, 242,297
 
  Finally I want to calculate the group mean
 
  group 1  =  42
  group 2  =  87
  group 3  =  234
 
  Can anyone help me out?
 
  In SAS I used to do it using proc rank.
 
  thanks in advance
 
  Val
 
 [[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread R. Michael Weylandt
Use cut2 as I suggested and David demonstrated.

Michael

On Tue, Apr 3, 2012 at 9:31 AM, Val valkr...@gmail.com wrote:
 Thank you all (David, Michael, Giovanni)  for your prompt response.

 First there was a typo error for the group mean it was 89.6 not 87.

 For a small data set and few groupings I can use  prob=c(0, .333, .66 ,1) to
 group in to three groups in this case. However,  if I want to extend the
 number of groupings say 10 or 15 then do I have to figure it out the
   split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))

 Is there a short cut for that?


 Thanks











 On Tue, Apr 3, 2012 at 9:13 AM, R. Michael Weylandt
 michael.weyla...@gmail.com wrote:

 Ignoring the fact your desired answers are wrong, I'd split the
 separating part and the group means parts into three steps:

 i) quantile() can help you get the split points,
 ii)  findInterval() can assign each y to a group
 iii) then ave() or tapply() will do group-wise means

 Something like:

 y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297) # You need a c here.
 ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))
 tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean)

 You could also use cut2 from the Hmisc package to combine findInterval
 and quantile into a single step.

 Depending on your desired output.

 Hope that helps,
 Michael

 On Tue, Apr 3, 2012 at 8:47 AM, Val valkr...@gmail.com wrote:
  Hi all,
 
  Assume that I have the following 10 data points.
   x=c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)
 
  sort x  and get the following
   y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)
 
  I want to  group the sorted  data point (y)  into  equal number of
  observation per group. In this case there will be three groups.  The
  first
  two groups  will have three observation  and the third will have four
  observations
 
  group 1  = 34, 45, 46
  group 2  = 66, 78, 125
  group 3  = 193, 209, 242,297
 
  Finally I want to calculate the group mean
 
  group 1  =  42
  group 2  =  87
  group 3  =  234
 
  Can anyone help me out?
 
  In SAS I used to do it using proc rank.
 
  thanks in advance
 
  Val
 
         [[alternative HTML version deleted]]

 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread Petr Savicky
On Tue, Apr 03, 2012 at 09:31:29AM -0400, Val wrote:
 Thank you all (David, Michael, Giovanni)  for your prompt response.
 
 First there was a typo error for the group mean it was 89.6 not 87.
 
 For a small data set and few groupings I can use  prob=c(0, .333, .66 ,1)
 to group in to three groups in this case. However,  if I want to extend the
 number of groupings say 10 or 15 then do I have to figure it out the
   split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))
 
 Is there a short cut for that?

Hi.

There may be better ways for the whole task, but specifically
c(0, .333, .66 ,1) can be obtained as

  seq(0, 1, length=3+1)

Hope this helps.

Petr Savicky.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread David Winsemius


On Apr 3, 2012, at 9:32 AM, R. Michael Weylandt wrote:


Use cut2 as I suggested and David demonstrated.


Agree that Hmisc::cut2 is extremely handy and I also like the fact
that the closed ends of intervals are on the left side (which is not
the same behavior as cut()), which has the other effect of setting
include.lowest = TRUE, which is not the default for cut() either (to my
continued amazement).


But let me add the method I use when doing it by hand:

cut(x, quantile(x, prob=seq(0, 1, length=ngrps+1)), include.lowest=TRUE)
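
A quick usage sketch (added for illustration; ngrps is just a placeholder for the number of groups you want and is not defined in the original message):

 x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
 ngrps <- 3
 grp <- cut(x, quantile(x, prob=seq(0, 1, length=ngrps+1)), include.lowest=TRUE)
 tapply(x, grp, mean)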

--
David.




Michael

On Tue, Apr 3, 2012 at 9:31 AM, Val valkr...@gmail.com wrote:

Thank you all (David, Michael, Giovanni)  for your prompt response.

First there was a typo error for the group mean it was 89.6 not 87.

For a small data set and few groupings I can use  prob=c(0, .333, . 
66 ,1) to
group in to three groups in this case. However,  if I want to  
extend the

number of groupings say 10 or 15 then do I have to figure it out the
  split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))

Is there a short cut for that?


Thanks











On Tue, Apr 3, 2012 at 9:13 AM, R. Michael Weylandt
michael.weyla...@gmail.com wrote:


Ignoring the fact your desired answers are wrong, I'd split the
separating part and the group means parts into three steps:

i) quantile() can help you get the split points,
ii)  findInterval() can assign each y to a group
iii) then ave() or tapply() will do group-wise means

Something like:

y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297) # You need a c here.

ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))
tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean)

You could also use cut2 from the Hmisc package to combine  
findInterval

and quantile into a single step.

Depending on your desired output.

Hope that helps,
Michael

On Tue, Apr 3, 2012 at 8:47 AM, Val valkr...@gmail.com wrote:

Hi all,

Assume that I have the following 10 data points.
 x=c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)

sort x  and get the following
 y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)

I want to  group the sorted  data point (y)  into  equal number of
observation per group. In this case there will be three groups.   
The

first
two groups  will have three observation  and the third will have  
four

observations

group 1  = 34, 45, 46
group 2  = 66, 78, 125
group 3  = 193, 209, 242,297

Finally I want to calculate the group mean

group 1  =  42
group 2  =  87
group 3  =  234

Can anyone help me out?

In SAS I used to do it using proc rank.

thanks in advance

Val

   [[alternative HTML version deleted]]




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread David L Carlson
Or just replace c(0, .333, .667, 1) with 

n <- 10
split(x, cut(x, quantile(x, prob= c(0, 1:(n-1)/n, 1)), include.lowest=TRUE))

where n is the number of groups you want.
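
For illustration (an added example), the prob vector this builds for n = 5:

 n <- 5
 c(0, 1:(n-1)/n, 1)
 [1] 0.0 0.2 0.4 0.6 0.8 1.0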

--
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of R. Michael Weylandt
Sent: Tuesday, April 03, 2012 8:32 AM
To: Val
Cc: r-help@r-project.org
Subject: Re: [R] grouping

Use cut2 as I suggested and David demonstrated.

Michael

On Tue, Apr 3, 2012 at 9:31 AM, Val valkr...@gmail.com wrote:
 Thank you all (David, Michael, Giovanni)  for your prompt response.

 First there was a typo error for the group mean it was 89.6 not 87.

 For a small data set and few groupings I can use  prob=c(0, .333, .66 ,1)
to
 group in to three groups in this case. However,  if I want to extend the
 number of groupings say 10 or 15 then do I have to figure it out the
   split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))

 Is there a short cut for that?


 Thanks











 On Tue, Apr 3, 2012 at 9:13 AM, R. Michael Weylandt
 michael.weyla...@gmail.com wrote:

 Ignoring the fact your desired answers are wrong, I'd split the
 separating part and the group means parts into three steps:

 i) quantile() can help you get the split points,
 ii)  findInterval() can assign each y to a group
 iii) then ave() or tapply() will do group-wise means

 Something like:

 y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297) # You need a c here.
 ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))
 tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean)

 You could also use cut2 from the Hmisc package to combine findInterval
 and quantile into a single step.

 Depending on your desired output.

 Hope that helps,
 Michael

 On Tue, Apr 3, 2012 at 8:47 AM, Val valkr...@gmail.com wrote:
  Hi all,
 
  Assume that I have the following 10 data points.
   x=c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)
 
  sort x  and get the following
   y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)
 
  I want to  group the sorted  data point (y)  into  equal number of
  observation per group. In this case there will be three groups.  The
  first
  two groups  will have three observation  and the third will have four
  observations
 
  group 1  = 34, 45, 46
  group 2  = 66, 78, 125
  group 3  = 193, 209, 242,297
 
  Finally I want to calculate the group mean
 
  group 1  =  42
  group 2  =  87
  group 3  =  234
 
  Can anyone help me out?
 
  In SAS I used to do it using proc rank.
 
  thanks in advance
 
  Val
 
         [[alternative HTML version deleted]]

 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread Val
David W and all,

Thank you very much for your help.

Here is the final output that I want, in the form of a data frame. The data
frame should contain x, group and group_mean in the following way:

x       group   group mean
46       1        42.3
125     2        89.6
36       1        42.3
193     3        235.25
209     3        235.25
78       2        89.6
66       2        89.6
242     3        235.25
297     3        235.25
45       1        42.3

Thanks a lot








On Tue, Apr 3, 2012 at 9:51 AM, David Winsemius dwinsem...@comcast.netwrote:


 On Apr 3, 2012, at 9:32 AM, R. Michael Weylandt wrote:

  Use cut2 as I suggested and David demonstrated.


 Agree that Hmisc::cut2 is extremely handy and I also like that fact that
 the closed ends of intervals are on the left side (which is not the same
 behavior as cut()), which has the otehr effect of setting include.lowest =
 TRUE which is not the default for cut() either (to my continued amazement).

 But let me add the method I use when doing it by hand:

 cut(x, quantile(x, prob=seq(0, 1, length=ngrps+1)), include.lowest=TRUE)

 --
 David.




 Michael

 On Tue, Apr 3, 2012 at 9:31 AM, Val valkr...@gmail.com wrote:

 Thank you all (David, Michael, Giovanni)  for your prompt response.

 First there was a typo error for the group mean it was 89.6 not 87.

 For a small data set and few groupings I can use  prob=c(0, .333, .66
 ,1) to
 group in to three groups in this case. However,  if I want to extend the
 number of groupings say 10 or 15 then do I have to figure it out the
  split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))

 Is there a short cut for that?


 Thanks











 On Tue, Apr 3, 2012 at 9:13 AM, R. Michael Weylandt
 michael.weyla...@gmail.com wrote:


 Ignoring the fact your desired answers are wrong, I'd split the
 separating part and the group means parts into three steps:

 i) quantile() can help you get the split points,
 ii)  findInterval() can assign each y to a group
 iii) then ave() or tapply() will do group-wise means

 Something like:

 y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297) # You need a c here.
 ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))
 tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean)

 You could also use cut2 from the Hmisc package to combine findInterval
 and quantile into a single step.

 Depending on your desired output.

 Hope that helps,
 Michael

 On Tue, Apr 3, 2012 at 8:47 AM, Val valkr...@gmail.com wrote:

 Hi all,

 Assume that I have the following 10 data points.
  x=c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)

 sort x  and get the following
  y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)

 I want to  group the sorted  data point (y)  into  equal number of
 observation per group. In this case there will be three groups.  The
 first
 two groups  will have three observation  and the third will have four
 observations

 group 1  = 34, 45, 46
 group 2  = 66, 78, 125
 group 3  = 193, 209, 242,297

 Finally I want to calculate the group mean

 group 1  =  42
 group 2  =  87
 group 3  =  234

 Can anyone help me out?

 In SAS I used to do it using proc rank.

 thanks in advance

 Val

   [[alternative HTML version deleted]]



 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


 David Winsemius, MD
 West Hartford, CT



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread David Winsemius

On Apr 3, 2012, at 10:11 AM, Val wrote:

 David W and all,

 Thank you very much for your help.

 Here is the final output that I want in the form of data frame. The  
 data frame should contain  x, group and group_ mean in the following  
 way

 x   group   group mean
 46   142.3
 125 289.6
 36   142.3
 193 3235.25
 209 3235.25
 78   289.6
 66   289.6
 242 3235.25
 297 3235.25
 45   142.3

If you want group means in a vector of the same length as x, then instead
of using tapply as done in earlier solutions you should use `ave`.
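
A minimal added sketch of what that looks like for the requested data frame (the .333/.66 cut points are the ones used earlier in the thread):

 x   <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
 grp <- cut(x, quantile(x, prob=c(0, .333, .66, 1)), include.lowest=TRUE, labels=FALSE)
 data.frame(x, group = grp, group_mean = ave(x, grp))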

-- 
DW



 Thanks a lot








 On Tue, Apr 3, 2012 at 9:51 AM, David Winsemius dwinsem...@comcast.net 
  wrote:

 On Apr 3, 2012, at 9:32 AM, R. Michael Weylandt wrote:

 Use cut2 as I suggested and David demonstrated.

 Agree that Hmisc::cut2 is extremely handy and I also like that fact  
 that the closed ends of intervals are on the left side (which is not  
 the same behavior as cut()), which has the otehr effect of setting  
 include.lowest = TRUE which is not the default for cut() either (to  
 my continued amazement).

 But let me add the method I use when doing it by hand:

 cut(x, quantile(x, prob=seq(0, 1, length=ngrps+1)),  
 include.lowest=TRUE)

 -- 
 David.




 Michael

 On Tue, Apr 3, 2012 at 9:31 AM, Val valkr...@gmail.com wrote:
 Thank you all (David, Michael, Giovanni)  for your prompt response.

 First there was a typo error for the group mean it was 89.6 not 87.

 For a small data set and few groupings I can use  prob=c(0, .333, . 
 66 ,1) to
 group in to three groups in this case. However,  if I want to extend  
 the
 number of groupings say 10 or 15 then do I have to figure it out the
  split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))

 Is there a short cut for that?


 Thanks











 On Tue, Apr 3, 2012 at 9:13 AM, R. Michael Weylandt
 michael.weyla...@gmail.com wrote:

 Ignoring the fact your desired answers are wrong, I'd split the
 separating part and the group means parts into three steps:

 i) quantile() can help you get the split points,
 ii)  findInterval() can assign each y to a group
 iii) then ave() or tapply() will do group-wise means

 Something like:

 y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297) # You need a c here.
 ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))
 tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean)

 You could also use cut2 from the Hmisc package to combine findInterval
 and quantile into a single step.

 Depending on your desired output.

 Hope that helps,
 Michael

 On Tue, Apr 3, 2012 at 8:47 AM, Val valkr...@gmail.com wrote:
 Hi all,

 Assume that I have the following 10 data points.
  x=c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)

 sort x  and get the following
  y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)

 I want to  group the sorted  data point (y)  into  equal number of
 observation per group. In this case there will be three groups.  The
 first
 two groups  will have three observation  and the third will have four
 observations

 group 1  = 34, 45, 46
 group 2  = 66, 78, 125
 group 3  = 193, 209, 242,297

 Finally I want to calculate the group mean

 group 1  =  42
 group 2  =  87
 group 3  =  234

 Can anyone help me out?

 In SAS I used to do it using proc rank.

 thanks in advance

 Val

   [[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 David Winsemius, MD
 West Hartford, CT



David Winsemius, MD
West Hartford, CT


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread Val
Hi All,

On the same data  points
x=c(46, 125 , 36 ,193, 209, 78, 66, 242 , 297,45 )

I want to have the following output as a data frame

x       group   group mean
46       1        42.3
125     2        89.6
36       1        42.3
193     3        235.25
209     3        235.25
78       2        89.6
66       2        89.6
242     3        235.25
297     3        235.25
45       1        42.3

I tried the following code


dat <- data.frame(xc=split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))))
gxc <- with(dat, tapply(xc, group, mean))
dat$gxc <- gxce[as.character(dat$group)]
txc=dat$gxc

it did not work for me.













On Tue, Apr 3, 2012 at 10:15 AM, David Winsemius dwinsem...@comcast.netwrote:


 On Apr 3, 2012, at 10:11 AM, Val wrote:

 David W and all,

 Thank you very much for your help.

 Here is the final output that I want in the form of data frame. The data
 frame should contain  x, group and group_ mean in the following way

 x   group   group mean
 46   142.3
 125 289.6
 36   142.3
 193 3235.25
 209 3235.25
 78   289.6
 66   289.6
 242 3235.25
 297 3235.25
 45   142.3


 I you want group means in a vector the same length as x then instead of
 using tapply as done in earlier solutions you should use `ave`.

 --
 DW



 Thanks a lot








 On Tue, Apr 3, 2012 at 9:51 AM, David Winsemius dwinsem...@comcast.netwrote:


 On Apr 3, 2012, at 9:32 AM, R. Michael Weylandt wrote:

  Use cut2 as I suggested and David demonstrated.


 Agree that Hmisc::cut2 is extremely handy and I also like that fact that
 the closed ends of intervals are on the left side (which is not the same
 behavior as cut()), which has the otehr effect of setting include.lowest =
 TRUE which is not the default for cut() either (to my continued amazement).

 But let me add the method I use when doing it by hand:

 cut(x, quantile(x, prob=seq(0, 1, length=ngrps+1)), include.lowest=TRUE)

 --
 David.




 Michael

 On Tue, Apr 3, 2012 at 9:31 AM, Val valkr...@gmail.com wrote:

 Thank you all (David, Michael, Giovanni)  for your prompt response.

 First there was a typo error for the group mean it was 89.6 not 87.

 For a small data set and few groupings I can use  prob=c(0, .333, .66
 ,1) to
 group in to three groups in this case. However,  if I want to extend the
 number of groupings say 10 or 15 then do I have to figure it out the
  split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))

 Is there a short cut for that?


 Thanks











 On Tue, Apr 3, 2012 at 9:13 AM, R. Michael Weylandt
 michael.weyla...@gmail.com wrote:


 Ignoring the fact your desired answers are wrong, I'd split the
 separating part and the group means parts into three steps:

 i) quantile() can help you get the split points,
 ii)  findInterval() can assign each y to a group
 iii) then ave() or tapply() will do group-wise means

 Something like:

 y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297) # You need a c here.
 ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))
 tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean)

 You could also use cut2 from the Hmisc package to combine findInterval
 and quantile into a single step.

 Depending on your desired output.

 Hope that helps,
 Michael

 On Tue, Apr 3, 2012 at 8:47 AM, Val valkr...@gmail.com wrote:

 Hi all,

 Assume that I have the following 10 data points.
  x=c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)

 sort x  and get the following
  y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)

 I want to  group the sorted  data point (y)  into  equal number of
 observation per group. In this case there will be three groups.  The
 first
 two groups  will have three observation  and the third will have four
 observations

 group 1  = 34, 45, 46
 group 2  = 66, 78, 125
 group 3  = 193, 209, 242,297

 Finally I want to calculate the group mean

 group 1  =  42
 group 2  =  87
 group 3  =  234

 Can anyone help me out?

 In SAS I used to do it using proc rank.

 thanks in advance

 Val

   [[alternative HTML version deleted]]



 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


 David Winsemius, MD
 West Hartford, CT



 David Winsemius, MD
 West Hartford, CT




Re: [R] grouping

2012-04-03 Thread Petr Savicky
On Tue, Apr 03, 2012 at 02:21:36PM -0400, Val wrote:
 Hi All,
 
 On the same data  points
 x=c(46, 125 , 36 ,193, 209, 78, 66, 242 , 297,45 )
 
 I want to have have the following output  as data frame
 
 x   group   group mean
 46   142.3
 125 289.6
 36   142.3
 193 3235.25
 209 3235.25
 78   289.6
 66   289.6
 242 3235.25
 297 3235.25
 45   142.3
 
 I tried the following code
 
 
 dat - data.frame(xc=split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1
 gxc - with(dat, tapply(xc, group, mean))
 dat$gxc - gxce[as.character(dat$group)]
 txc=dat$gxc
 
 it did not work for me.

David Winsemius suggested to use ave() when you asked this
question for the first time. Can you have a look at it?

Petr Savicky.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread Val
I did look at it; the result is below.

x=c(46, 125 , 36 ,193, 209, 78, 66, 242 , 297,45 )

#lapply( split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1)) ,
include.lowest=TRUE) ), mean)
  ave( split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1)) ,
include.lowest=TRUE) ), mean)

 ave( split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1)) ,
include.lowest=TRUE) ), mean)
$`[36,74]`
[1] NA

$`(74,197]`
[1] NA

$`(197,297]`
[1] NA

There were 11 warnings (use warnings() to see them)





On Tue, Apr 3, 2012 at 2:35 PM, Petr Savicky savi...@cs.cas.cz wrote:

 On Tue, Apr 03, 2012 at 02:21:36PM -0400, Val wrote:
  Hi All,
 
  On the same data  points
  x=c(46, 125 , 36 ,193, 209, 78, 66, 242 , 297,45 )
 
  I want to have have the following output  as data frame
 
  x   group   group mean
  46   142.3
  125 289.6
  36   142.3
  193 3235.25
  209 3235.25
  78   289.6
  66   289.6
  242 3235.25
  297 3235.25
  45   142.3
 
  I tried the following code
 
 
  dat - data.frame(xc=split(x, cut(x, quantile(x, prob=c(0, .333, .66
 ,1
  gxc - with(dat, tapply(xc, group, mean))
  dat$gxc - gxce[as.character(dat$group)]
  txc=dat$gxc
 
  it did not work for me.

 David Winsemius suggested to use ave(), when you asked this
 question for the first time. Can you have look at it?

 Petr Savicky.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread Val
On Tue, Apr 3, 2012 at 2:53 PM, Berend Hasselman b...@xs4all.nl wrote:


 On 03-04-2012, at 20:21, Val wrote:

  Hi All,
 
  On the same data  points
  x=c(46, 125 , 36 ,193, 209, 78, 66, 242 , 297,45 )
 
  I want to have have the following output  as data frame
 
  x   group   group mean
  46   142.3
  125 289.6
  36   142.3
  193 3235.25
  209 3235.25
  78   289.6
  66   289.6
  242 3235.25
  297 3235.25
  45   142.3
 
  I tried the following code
 
 
  dat - data.frame(xc=split(x, cut(x, quantile(x, prob=c(0, .333, .66
 ,1
  gxc - with(dat, tapply(xc, group, mean))
  dat$gxc - gxce[as.character(dat$group)]
  txc=dat$gxc
 
  it did not work for me.
 

 I'm not surprised.

 In the line dat <- there are 5 opening parentheses and 4 closing )'s.
 In the line dat$gxc <- you reference an object gxce. Where was it created?

 So I tried this

  dat <- data.frame(x, group=findInterval(x, quantile(x, prob=c(0, .333, .66, 1)), all.inside=TRUE))
  dat$gmean <- ave(dat$x, as.factor(dat$group))
  dat
      x group     gmean
 1   46     1  42.33333
 2  125     2  89.66667
 3   36     1  42.33333
 4  193     3 235.25000
 5  209     3 235.25000
 6   78     2  89.66667
 7   66     2  89.66667
 8  242     3 235.25000
 9  297     3 235.25000
 10  45     1  42.33333


Thank you very much. It is working now. There was a typo: gxce. In the
actual R code it was correct, gxc.




 Berend



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread Berend Hasselman

On 03-04-2012, at 21:02, Val wrote:

 
 
 On Tue, Apr 3, 2012 at 2:53 PM, Berend Hasselman b...@xs4all.nl wrote:
 
 On 03-04-2012, at 20:21, Val wrote:
 
  Hi All,
 
  On the same data  points
  x=c(46, 125 , 36 ,193, 209, 78, 66, 242 , 297,45 )
 
  I want to have have the following output  as data frame
 
  x   group   group mean
  46   142.3
  125 289.6
  36   142.3
  193 3235.25
  209 3235.25
  78   289.6
  66   289.6
  242 3235.25
  297 3235.25
  45   142.3
 
  I tried the following code
 
 
  dat <- data.frame(xc=split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))))
  gxc <- with(dat, tapply(xc, group, mean))
  dat$gxc <- gxce[as.character(dat$group)]
  txc=dat$gxc
 
  it did not work for me.
 
 
 I'm not surprised.
 
 In the line dat <- there are 5 opening parentheses and 4 closing )'s.
 In the line dat$gxc <- you reference an object gxce. Where was it created?
 
 So I tried this
 
  dat <- data.frame(x, group=findInterval(x, quantile(x, prob=c(0, .333, .66, 1)), all.inside=TRUE))
  dat$gmean <- ave(dat$x, as.factor(dat$group))

And the as.factor is not necessary. This will do

dat$gmean <- ave(dat$x, dat$group)

Berend

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread Berend Hasselman

On 03-04-2012, at 20:21, Val wrote:

 Hi All,
 
 On the same data  points
 x=c(46, 125 , 36 ,193, 209, 78, 66, 242 , 297,45 )
 
 I want to have have the following output  as data frame
 
 x   group   group mean
 46   142.3
 125 289.6
 36   142.3
 193 3235.25
 209 3235.25
 78   289.6
 66   289.6
 242 3235.25
 297 3235.25
 45   142.3
 
 I tried the following code
 
 
 dat <- data.frame(xc=split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))))
 gxc <- with(dat, tapply(xc, group, mean))
 dat$gxc <- gxce[as.character(dat$group)]
 txc=dat$gxc
 
 it did not work for me.
 

I'm not surprised.

In the line dat <- there are 5 opening parentheses and 4 closing )'s.
In the line dat$gxc <- you reference an object gxce. Where was it created?

So I tried this

 dat <- data.frame(x, group=findInterval(x, quantile(x, prob=c(0, .333, .66, 1)), all.inside=TRUE))
 dat$gmean <- ave(dat$x, as.factor(dat$group))
 dat
     x group     gmean
1   46     1  42.33333
2  125     2  89.66667
3   36     1  42.33333
4  193     3 235.25000
5  209     3 235.25000
6   78     2  89.66667
7   66     2  89.66667
8  242     3 235.25000
9  297     3 235.25000
10  45     1  42.33333

Berend

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread R. Michael Weylandt
Please take a look at my first reply to you:

ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))

Then read ?ave for an explanation of the syntax. ave takes two
vectors, the first being the data to be averaged, the second being an
index to split by. You don't want to use split() here.
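
A minimal added sketch of the corrected call, reusing the x from the thread and the 0.33/0.66 cut points suggested earlier:

 x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
 grp <- findInterval(x, quantile(x, c(0.33, 0.66)))
 ave(x, grp)            # one group mean per element of x
 tapply(x, grp, mean)   # one mean per group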

Michael

On Tue, Apr 3, 2012 at 2:50 PM, Val valkr...@gmail.com wrote:
 I did look at it the result  is below,

 x=c(46, 125 , 36 ,193, 209, 78, 66, 242 , 297,45 )

 #lapply( split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1)) ,
 include.lowest=TRUE) ), mean)
  ave( split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1)) ,
 include.lowest=TRUE) ), mean)

 ave( split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1)) ,
 include.lowest=TRUE) ), mean)
 $`[36,74]`
 [1] NA

 $`(74,197]`
 [1] NA

 $`(197,297]`
 [1] NA

 There were 11 warnings (use warnings() to see them)





 On Tue, Apr 3, 2012 at 2:35 PM, Petr Savicky savi...@cs.cas.cz wrote:

 On Tue, Apr 03, 2012 at 02:21:36PM -0400, Val wrote:
  Hi All,
 
  On the same data  points
  x=c(46, 125 , 36 ,193, 209, 78, 66, 242 , 297,45 )
 
  I want to have have the following output  as data frame
 
  x       group   group mean
  46       1        42.3
  125     2        89.6
  36       1        42.3
  193     3        235.25
  209     3        235.25
  78       2        89.6
  66       2        89.6
  242     3        235.25
  297     3        235.25
  45       1        42.3
 
  I tried the following code
 
 
  dat - data.frame(xc=split(x, cut(x, quantile(x, prob=c(0, .333, .66
 ,1
  gxc - with(dat, tapply(xc, group, mean))
  dat$gxc - gxce[as.character(dat$group)]
  txc=dat$gxc
 
  it did not work for me.

 David Winsemius suggested to use ave(), when you asked this
 question for the first time. Can you have look at it?

 Petr Savicky.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Grouping and/or splitting

2012-04-03 Thread Ashish Agarwal
I have a dataframe imported from csv file below:

Houseid,Personid,Tripid,taz
1,1,1,4
1,1,2,7
2,1,1,96
2,1,2,4
2,1,3,2
2,2,1,58

There are three groups identified based on the combination of first and
second columns. How do I split this data frame?

I tried
aa <- split(inpfil, inpfil[,1:2])
but it has problems.

Output desired is

aa[1]
 Houseid,Personid,Tripid,taz
1,1,1,4
1,1,2,7
aa[2]
 Houseid,Personid,Tripid,taz
2,1,1,96
2,1,2,4
2,1,3,2
aa[3]
 Houseid,Personid,Tripid,taz
2,2,1,58

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping and/or splitting

2012-04-03 Thread Weidong Gu
how about

split(inpfil, paste(inpfil[,1],inpfil[,2],sep=','))

Weidong Gu

On Tue, Apr 3, 2012 at 6:42 PM, Ashish Agarwal
ashish.agarw...@gmail.com wrote:
 I have a dataframe imported from csv file below:

 Houseid,Personid,Tripid,taz
 1,1,1,4
 1,1,2,7
 2,1,1,96
 2,1,2,4
 2,1,3,2
 2,2,1,58

 There are three groups identified based on the combination of first and
 second columns. How do I split this data frame?

 I tried
 aa - split(inpfil, inpfil[,1:2])
 but it has problems.

 Output desired is

 aa[1]
  Houseid,Personid,Tripid,taz
 1,1,1,4
 1,1,2,7
 aa[2]
  Houseid,Personid,Tripid,taz
 2,1,1,96
 2,1,2,4
 2,1,3,2
 aa[3]
  Houseid,Personid,Tripid,taz
 2,2,1,58

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping and/or splitting

2012-04-03 Thread Rui Barradas
Hello,


Ashish Agarwal wrote
 
 I have a dataframe imported from csv file below:
 
 Houseid,Personid,Tripid,taz
 1,1,1,4
 1,1,2,7
 2,1,1,96
 2,1,2,4
 2,1,3,2
 2,2,1,58
 
 There are three groups identified based on the combination of first and
 second columns. How do I split this data frame?
 
 I tried
 aa - split(inpfil, inpfil[,1:2])
 but it has problems.
 
 Output desired is
 
 aa[1]
  Houseid,Personid,Tripid,taz
 1,1,1,4
 1,1,2,7
 aa[2]
  Houseid,Personid,Tripid,taz
 2,1,1,96
 2,1,2,4
 2,1,3,2
 aa[3]
  Houseid,Personid,Tripid,taz
 2,2,1,58
 
   [[alternative HTML version deleted]]
 
 __
 R-help@ mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 


Any of the following three works for me.


DF <- read.table(text="
Houseid,Personid,Tripid,taz
1,1,1,4
1,1,2,7
2,1,1,96
2,1,2,4
2,1,3,2
2,2,1,58
", header=TRUE, sep=",")

DF

split(DF, DF[, 1:2], drop=TRUE)
split(DF, list(DF$Houseid, DF$Personid), drop=TRUE)
with(DF, split(DF, list(Houseid, Personid), drop=TRUE))

The argument 'drop' defaults to FALSE. Was that the problem?

Hope this helps,

Rui Barradas


--
View this message in context: 
http://r.789695.n4.nabble.com/Grouping-and-or-splitting-tp4530410p4530624.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping and/or splitting

2012-04-03 Thread Ashish Agarwal
Yes, I was missing the drop argument.
But now the problem is that splitting causes some weird ordering of the groups.
See below:

DF <- read.table(text="
Houseid,Personid,Tripid,taz
1,1,1,4
1,1,2,7
2,1,1,96
2,1,2,4
2,1,3,2
2,2,1,58
3,1,5,7
", header=TRUE, sep=",")
aa <- split(DF, DF[, 1:2], drop=TRUE)

Now the result is that aa[3] is (3,1) and not (2,2). Why? How can I
preserve the ascending order?

 aa[3]
$`3.1`
  Houseid Personid Tripid taz
7   31  5   7
 aa[4]
$`2.2`
  Houseid Personid Tripid taz
6   22  1  58


On Wed, Apr 4, 2012 at 6:29 AM, Rui Barradas rui1...@sapo.pt wrote:

 Hello,


 Ashish Agarwal wrote
  
  I have a dataframe imported from csv file below:
 
  Houseid,Personid,Tripid,taz
  1,1,1,4
  1,1,2,7
  2,1,1,96
  2,1,2,4
  2,1,3,2
  2,2,1,58
 
  There are three groups identified based on the combination of first and
  second columns. How do I split this data frame?
 
  I tried
  aa - split(inpfil, inpfil[,1:2])
  but it has problems.
 
  Output desired is
 
  aa[1]
   Houseid,Personid,Tripid,taz
  1,1,1,4
  1,1,2,7
  aa[2]
   Houseid,Personid,Tripid,taz
  2,1,1,96
  2,1,2,4
  2,1,3,2
  aa[3]
   Houseid,Personid,Tripid,taz
  2,2,1,58
 
[[alternative HTML version deleted]]
 
  __
  R-help@ mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.htmlhttp://www.r-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 


 Any of the following three works with me.


 DF - read.table(text=
 Houseid,Personid,Tripid,taz
 1,1,1,4
 1,1,2,7
 2,1,1,96
 2,1,2,4
 2,1,3,2
 2,2,1,58
 , header=TRUE, sep=,)

 DF

 split(DF, DF[, 1:2], drop=TRUE)
 split(DF, list(DF$Houseid, DF$Personid), drop=TRUE)
 with(DF, split(DF, list(Houseid, Personid), drop=TRUE))

 The argument 'drop' defaults to FALSE. Was that the problem?

 Hope this helps,

 Rui Barrada

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Grouping together a time variable

2012-02-09 Thread Abraham Mathew
I have the following variable, time, which is a character variable and it's
structured as follows.

 head(as.character(dat$time), 30)
 [1] "00:00:01" "00:00:16" "00:00:24" "00:00:25" "00:00:25" "00:00:40" "00:01:50" "00:01:54" "00:02:33" "00:02:43" "00:03:22"
[12] "00:03:31" "00:03:41" "00:03:42" "00:03:43" "00:04:04" "00:05:09" "00:05:17" "00:05:19" "00:05:21" "00:05:22" "00:05:22"
[23] "00:05:28" "00:05:44" "00:05:54" "00:06:54" "00:06:54" "00:07:10" "00:08:15" "00:08:26"


What I am trying to do is group the data into one-hour increments. So
5:01-6:00am, 6:01-7:00am, 7:01-8:00am,
and so forth.

However, I'm not sure if there's a simple route to do this in R or how to
do it.
Can anyone point me in the right direction?

-- 
*Abraham Mathew
Statistical Analyst
www.amathew.com
720-648-0108
@abmathewks*

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping together a time variable

2012-02-09 Thread R. Michael Weylandt
Perhaps cut.POSIXt (which is a generic so you can just call cut)
depending on the unstated form of your time object.
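
A small added sketch of that idea, assuming the time strings are first turned into POSIXct by pasting on an arbitrary date (the date and values here are illustrative only):

 tm <- c("00:00:01", "00:01:50", "01:05:17", "05:09:00")
 tt <- as.POSIXct(paste("2012-02-09", tm))
 table(cut(tt, breaks = "1 hour"))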

Michael

On Thu, Feb 9, 2012 at 12:15 PM, Abraham Mathew abmathe...@gmail.com wrote:
 I have the following variable, time, which is a character variable and it's
 structured as follows.

 head(as.character(dat$time), 30) [1] 00:00:01 00:00:16 00:00:24 
 00:00:25 00:00:25 00:00:40 00:01:50 00:01:54 00:02:33 00:02:43 
 00:03:22
 [12] 00:03:31 00:03:41 00:03:42 00:03:43 00:04:04 00:05:09
 00:05:17 00:05:19 00:05:21 00:05:22 00:05:22
 [23] 00:05:28 00:05:44 00:05:54 00:06:54 00:06:54 00:07:10
 00:08:15 00:08:26


 What I am trying to do is group the data into one hour increment. So
 5:01-6:00am, 6:01-7:00am, 7:01-8:00a,
 and so forth.

 However, I'm not sure if there's a simple route to do this in R or how to
 do it.
 Can anyone point me in the right direction?

 --
 *Abraham Mathew
 Statistical Analyst
 www.amathew.com
 720-648-0108
 @abmathewks*

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Grouping miliseconds By Hours

2012-02-05 Thread Hasan Diwan
I have a list of numbers corresponding to timestamps, a sample of which follows:
c(1327211358, 1327221999, 1327527296, 1327555433, 1327701042,
1327761389, 1327780993, 1327815670, 1327822964, 1327897497, 1327897527,
1327937072, 1327938300, 1327957589, 1328044466, 1328127921, 1328157588,
1328213951, 1328236836, 1328300276, 1328335936, 1328429102)

I would like to group these into hours. In other words, something like:
c( 2012-01-31 21:14:26 PST 2012-02-01 20:25:21 PST
 2012-02-02 04:39:48 PST 2012-02-02 20:19:11 PST
2012-02-03 02:40:36 PST 2012-02-03 20:17:56 PST
2012-02-04 06:12:16 PST 2012-02-05 08:05:02 PST)
Hour  Hits
21      1
20      3
4        1
2        1
6        1
8        1

How would I do this without too much pain (from a CPU perspective)?
This is a subset of a million entries and I would rather not go
through these manually... So, any advice? Many thanks! -- H
--
Sent from my mobile device
Envoyait de mon portable

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping miliseconds By Hours

2012-02-05 Thread jim holtman
Is this what you are after:

 x <- c(1327211358, 1327221999, 1327527296, 1327555433, 1327701042,
+ 1327761389, 1327780993, 1327815670, 1327822964, 1327897497, 1327897527,
+ 1327937072, 1327938300, 1327957589, 1328044466, 1328127921, 1328157588,
+ 1328213951, 1328236836, 1328300276, 1328335936, 1328429102)

 x <- as.POSIXct(x, origin = '1970-1-1')
 x
 [1] "2012-01-22 05:49:18 EST" "2012-01-22 08:46:39 EST" "2012-01-25 21:34:56 EST"
 [4] "2012-01-26 05:23:53 EST" "2012-01-27 21:50:42 EST" "2012-01-28 14:36:29 EST"
 [7] "2012-01-28 20:03:13 EST" "2012-01-29 05:41:10 EST" "2012-01-29 07:42:44 EST"
[10] "2012-01-30 04:24:57 EST" "2012-01-30 04:25:27 EST" "2012-01-30 15:24:32 EST"
[13] "2012-01-30 15:45:00 EST" "2012-01-30 21:06:29 EST" "2012-01-31 21:14:26 EST"
[16] "2012-02-01 20:25:21 EST" "2012-02-02 04:39:48 EST" "2012-02-02 20:19:11 EST"
[19] "2012-02-03 02:40:36 EST" "2012-02-03 20:17:56 EST" "2012-02-04 06:12:16 EST"
[22] "2012-02-05 08:05:02 EST"
 table(format(x, "%H"))

02 04 05 06 07 08 14 15 20 21
 1  3  3  1  1  2  1  2  4  4
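
To get the requested Hour/Hits layout as a data frame, one added option is:

 hits <- as.data.frame(table(format(x, "%H")))
 names(hits) <- c("Hour", "Hits")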




On Sun, Feb 5, 2012 at 4:54 AM, Hasan Diwan hasan.di...@gmail.com wrote:
 I have a list of numbers corresponding to timestamps, a sample of which 
 follows:
 c(1327211358, 1327221999, 1327527296, 1327555433, 1327701042,
 1327761389, 1327780993, 1327815670, 1327822964, 1327897497, 1327897527,
 1327937072, 1327938300, 1327957589, 1328044466, 1328127921, 1328157588,
 1328213951, 1328236836, 1328300276, 1328335936, 1328429102)

 I would like to group these into hours. In other words, something like:
 c( 2012-01-31 21:14:26 PST 2012-02-01 20:25:21 PST
  2012-02-02 04:39:48 PST 2012-02-02 20:19:11 PST
 2012-02-03 02:40:36 PST 2012-02-03 20:17:56 PST
 2012-02-04 06:12:16 PST 2012-02-05 08:05:02 PST)
 Hour  Hits
 21      1
 20      3
 4        1
 2        1
 6        1
 8        1

 How would I do this without too much pain (from a CPU perspective)?
 This is a subset of a million entries and I would rather not go
 through these manually... So, any advice? Many thanks! -- H
 --
 Sent from my mobile device
 Envoyait de mon portable

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping miliseconds By Hours

2012-02-05 Thread David Winsemius


On Feb 5, 2012, at 9:54 AM, jim holtman wrote:


Is this what you are after:


x - c(1327211358, 1327221999, 1327527296, 1327555433, 1327701042,
+ 1327761389, 1327780993, 1327815670, 1327822964, 1327897497,  
1327897527,
+ 1327937072, 1327938300, 1327957589, 1328044466, 1328127921,  
1328157588,

+ 1328213951, 1328236836, 1328300276, 1328335936, 1328429102)


x - as.POSIXct(x, origin = '1970-1-1')
x

[1] 2012-01-22 05:49:18 EST 2012-01-22 08:46:39 EST 2012-01-25
21:34:56 EST
[4] 2012-01-26 05:23:53 EST 2012-01-27 21:50:42 EST 2012-01-28
14:36:29 EST
[7] 2012-01-28 20:03:13 EST 2012-01-29 05:41:10 EST 2012-01-29
07:42:44 EST
[10] 2012-01-30 04:24:57 EST 2012-01-30 04:25:27 EST 2012-01-30
15:24:32 EST
[13] 2012-01-30 15:45:00 EST 2012-01-30 21:06:29 EST 2012-01-31
21:14:26 EST
[16] 2012-02-01 20:25:21 EST 2012-02-02 04:39:48 EST 2012-02-02
20:19:11 EST
[19] 2012-02-03 02:40:36 EST 2012-02-03 20:17:56 EST 2012-02-04
06:12:16 EST
[22] 2012-02-05 08:05:02 EST

table(format(x, %H))


02 04 05 06 07 08 14 15 20 21
1  3  3  1  1  2  1  2  4  4


It's possible that you may not realize that jim holtman has implicitly
given you a handle on doing operations on such groups, since you could
use the value of format(x, "%H") as the indexing argument in tapply,
ave, or aggregate.
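
For instance (an added sketch, reusing jim holtman's x from above), the same counts come out of tapply, which then generalizes to other group-wise summaries:

 tapply(x, format(x, "%H"), length)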


--
David.








On Sun, Feb 5, 2012 at 4:54 AM, Hasan Diwan hasan.di...@gmail.com  
wrote:
I have a list of numbers corresponding to timestamps, a sample of  
which follows:

c(1327211358, 1327221999, 1327527296, 1327555433, 1327701042,
1327761389, 1327780993, 1327815670, 1327822964, 1327897497,  
1327897527,
1327937072, 1327938300, 1327957589, 1328044466, 1328127921,  
1328157588,

1328213951, 1328236836, 1328300276, 1328335936, 1328429102)

I would like to group these into hours. In other words, something  
like:

c( 2012-01-31 21:14:26 PST 2012-02-01 20:25:21 PST
 2012-02-02 04:39:48 PST 2012-02-02 20:19:11 PST
2012-02-03 02:40:36 PST 2012-02-03 20:17:56 PST
2012-02-04 06:12:16 PST 2012-02-05 08:05:02 PST)
Hour  Hits
21  1
20  3
41
21
61
81

How would I do this without too much pain (from a CPU perspective)?
This is a subset of a million entries and I would rather not go
through these manually... So, any advice? Many thanks! -- H
--
Sent from my mobile device
Envoyait de mon portable

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping clusters from dendrograms

2011-11-03 Thread plangfelder
Hi Julia, 

sorry for the very late reply, your original email was posted while I was on
hiatus from R-help. I'm the author of the dynamicTreeCut package. I
recommend that you try using the hybrid method using the cutreeDynamic
function. What you observed is a known problem of the tree method (which, by
the way, was the reason I developed the Hybrid method). 

Using the hybrid method is simple, for example:

cut2 <- cutreeDynamic(dendro, distM = combo2,
maxTreeHeight=1, deepSplit=2, minModuleSize=1)

You can play with the argument deepSplit to obtain finer or coarser modules.

HTH,

Peter 

--
View this message in context: 
http://r.789695.n4.nabble.com/Grouping-clusters-from-dendrograms-tp2316521p3988526.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping variables in a data frame

2011-08-27 Thread Liviu Andronic
On Sat, Aug 27, 2011 at 7:26 AM, Andra Isan andra_i...@yahoo.com wrote:
 Hi All,

 I have a data frame as follow:

 user_id time age location gender
 .

 and I learn a logistic regression to learn the weights (glm with family= 
 (link = logit))), my response value is either zero or one. I would like to 
 group the users based on user_id and time and see the y values and predicted 
 y values at the same time. Or plot them some how. Is there any way to somehow 
 group them together so that I can learn more about my data by grouping them?

It's very difficult to help you because you haven't followed the
posting guide. But I suspect you're looking for the following:

require(plyr)
Loading required package: plyr
data(mtcars)
## considering 'gear' as 'id' and 'carb' as 'time'
ddply(mtcars, .(gear, carb), function(x) mean(x$hp))
   gear carb    V1
1     3    1 104.0
2     3    2 162.5
3     3    3 180.0
4     3    4 228.0
5     4    1  72.5
6     4    2  79.5
7     4    4 116.5
8     5    2 102.0
9     5    4 264.0
10    5    6 175.0
11    5    8 335.0

This will compute the mean of 'hp' for each group of id & time.
Liviu
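For completeness, a base-R sketch that gives the same per-group means without plyr (a minimal
example on mtcars, not your data; with your frame it would be something along the lines of
aggregate(y ~ user_id + time, data = yourdata, FUN = mean)):

## mean hp for every gear/carb combination, base R only
aggregate(hp ~ gear + carb, data = mtcars, FUN = mean)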


 I would like to get these at the end
 user_id time y predicted_y

 Thanks a lot,
 Andra

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Do you know how to read?
http://www.alienetworks.com/srtest.cfm
http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader
Do you know how to write?
http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Grouping variables in a data frame

2011-08-26 Thread Andra Isan
Hi All, 

I have a data frame as follows:

user_id time age location gender 
.

and I fit a logistic regression to learn the weights (glm with family =
binomial(link = "logit")); my response value is either zero or one. I would like to
group the users based on user_id and time and see the y values and predicted y
values at the same time, or plot them somehow. Is there any way to group them
together so that I can learn more about my data?

I would like to get these at the end
user_id time y predicted_y

Thanks a lot,
Andra
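A minimal sketch of one way to get that layout, with hedged assumptions: the fitted model is
called fit and the modelling data frame dat has columns user_id, time and y (these names are
assumptions for illustration, not from the post):

## observed and predicted responses side by side, one row per observation
out <- data.frame(user_id     = dat$user_id,
                  time        = dat$time,
                  y           = dat$y,
                  predicted_y = fitted(fit))  # response-scale fitted values

## then summarise by group, e.g. mean observed and predicted per user_id/time
aggregate(cbind(y, predicted_y) ~ user_id + time, data = out, FUN = mean)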

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping columns

2011-07-21 Thread Geophagus
Hi @ all,
both possibilities are working very well.
Thanks a lot for the fast help!

Best Greetinx from the Earth Eater Geophagus 

--
View this message in context: 
http://r.789695.n4.nabble.com/Grouping-columns-tp3681018p3683076.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping data

2011-07-20 Thread Dieter Menne

adolfpf wrote:
 
 How do I group my data in dolf the same way the data Orthodont are
 grouped.
 
 show(dolf)
    distance   age Subjectt Sex
  1  6.83679 22.01       F1   F
  2  6.63245 23.04       F1   F
  3 11.58730 39.26       M2   M
 
 

I know that many examples in that excellent book use grouped data, but the
concept of grouped data is more confusing than helpful. I only got started
using nlme/lme when I realized that everything could be done without grouped
data. Too bad, many examples in Pinheiro/Bates rely on the concept (but they no
longer do in the coming lme4).

So I suggest that you try to solve the problem with vanilla data frames
instead of grouped ones. In most cases, it only means that you have to put
the formula into the lme(..) call instead of relying on some hidden
defaults.

Dieter
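A hedged sketch of both routes, shown on the Orthodont data that ships with nlme (swap in dolf
and its Subjectt column for the data in the question):

library(nlme)

## (a) plain data frame: give the grouping in the lme() call itself
fit1 <- lme(distance ~ age, random = ~ 1 | Subject,
            data = as.data.frame(Orthodont))

## (b) if a groupedData object really is wanted, nlme can construct one
orth_grouped <- groupedData(distance ~ age | Subject,
                            data = as.data.frame(Orthodont))
fit2 <- lme(distance ~ age, data = orth_grouped)  # grouping taken from the object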







--
View this message in context: 
http://r.789695.n4.nabble.com/grouping-data-tp3679803p3680115.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] grouping data

2011-07-20 Thread adolfpf
All the examples in 'nlme' are in Grouped Data: distance ~ age | Subject
format.

How do I group my data in dolf the same way the data Orthodont are
grouped.

 show(dolf)
  distance   age Subjectt Sex
1  6.83679 22.01       F1   F
2  6.63245 23.04       F1   F
3 11.58730 39.26       M2   M


 show(Orthodont)
Grouped Data: distance ~ age | Subject
  distance age Subject  Sex
1     26.0   8     M01 Male
2     25.0  10     M01 Male
3     29.0  12     M01 Male


--
View this message in context: 
http://r.789695.n4.nabble.com/grouping-data-tp3679803p3679803.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping columns

2011-07-20 Thread Brad Patrick Schneid
untested because I don't have access to your data, but this should work. 

b13.NEW <- b13[, c("Gesamt", "Wasser", "Boden", "Luft", "Abwasser",
                   "Gefährliche Abfälle", "nicht gefährliche Abfälle")]







Geophagus wrote:
 
 *Hi @ all,
 I have a question concerning the possibility of grouping the columns of a
 matrix.
 R orders the columns alphabetically.
 What can I do to order the columns according to my own specification?
 
 The script is the following:*
 
 # R script: counts xyz
 
 # read in the source file
 b <- read.csv2("Z:/int/xyz.csv", header = TRUE)
 
 # generate subsets for the individual years
 b1 <- subset(b, jahr == 2007)
 b2 <- subset(b, jahr == 2008)
 b3 <- subset(b, jahr == 2009)
 
 # tapply for the individual years on the respective BranchenID
 b1_1 <- tapply(b1$betriebs_id, b1$umweltkompartiment, length)
 b1_2 <- tapply(b2$betriebs_id, b2$umweltkompartiment, length)
 b1_3 <- tapply(b3$betriebs_id, b3$umweltkompartiment, length)
 
 # combine the results
 b11 <- rbind(b1_1, b1_2, b1_3)
 Gesamt <- apply(X = b11, MARGIN = 1, sum)
 b13 <- cbind(Gesamt, b11)
 b13
      Gesamt Abwasser Boden Gefährliche Abfälle Luft nicht gefährliche Abfälle Wasser
 b1_1   9832      432    18                3147 2839                      1592   1804
 b1_2  10271      413    28                3360 2920                      1715   1835
 b1_3   9983      404    21                3405 2741                      1691   1721
 
 *Now I want to have the following order of the columns:
 Gesamt, Wasser, Boden, Luft, Abwasser, Gefährliche Abfälle, nicht
 gefährliche Abfälle
 
 Thanks a lot for your answers!
 Fak*
 


--
View this message in context: 
http://r.789695.n4.nabble.com/Grouping-columns-tp3681018p3681121.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Grouping columns

2011-07-20 Thread Geophagus
*Hi @ all,
I have a question concerning the possibility of grouping the columns of a
matrix.
R orders the columns alphabetically.
What can I do to order the columns according to my own specification?

The script is the following:*

 # R script: counts xyz
 
 # read in the source file
 b <- read.csv2("Z:/int/xyz.csv", header = TRUE)
 
 # generate subsets for the individual years
 b1 <- subset(b, jahr == 2007)
 b2 <- subset(b, jahr == 2008)
 b3 <- subset(b, jahr == 2009)
 
 # tapply for the individual years on the respective BranchenID
 b1_1 <- tapply(b1$betriebs_id, b1$umweltkompartiment, length)
 b1_2 <- tapply(b2$betriebs_id, b2$umweltkompartiment, length)
 b1_3 <- tapply(b3$betriebs_id, b3$umweltkompartiment, length)
 
 # combine the results
 b11 <- rbind(b1_1, b1_2, b1_3)
 Gesamt <- apply(X = b11, MARGIN = 1, sum)
 b13 <- cbind(Gesamt, b11)
 b13
     Gesamt Abwasser Boden Gefährliche Abfälle Luft nicht gefährliche Abfälle Wasser
b1_1   9832      432    18                3147 2839                      1592   1804
b1_2  10271      413    28                3360 2920                      1715   1835
b1_3   9983      404    21                3405 2741                      1691   1721

*Now I want to have the following order of the columns:
Gesamt, Wasser, Boden, Luft, Abwasser, Gefährliche Abfälle, nicht
gefährliche Abfälle

Thanks a lot for your answers!
Fak*



--
View this message in context: 
http://r.789695.n4.nabble.com/Grouping-columns-tp3681018p3681018.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping columns

2011-07-20 Thread David Winsemius


On Jul 20, 2011, at 10:42 AM, Geophagus wrote:


*Hi @ all,
I have a question concerning the possibility of grouping the columns of a
matrix.
R orders the columns alphabetically.
What can I do to order the columns according to my own specification?


Dear Earth Eater;

You can create a factor whose levels are ordered to your
specification. Your column umweltkompartiment obviously has those
levels. This might also offer advantages in situations where there was
not complete representation of all levels in all the files.


So your tapply() calls could have been of this form:

b1_1 <- tapply(b1$betriebs_id,
               factor(b1$umweltkompartiment,
                      levels = c("Gesamt", "Wasser", "Boden", "Luft", "Abwasser",
                                 "Gefährliche Abfälle", "nicht gefährliche Abfälle")),
               length)

# The code would be more compact if you created a vector of the levels and used
# it as an argument to factor():

faclevs <- c("Gesamt", "Wasser", "Boden", "Luft", "Abwasser",
             "Gefährliche Abfälle", "nicht gefährliche Abfälle")

b1_1 <- tapply(b1$betriebs_id,
               factor(b1$umweltkompartiment, levels = faclevs),
               length)
# lather, rinse, repeat x 3
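A tiny self-contained illustration of the same idea, with made-up values rather than the
poster's data: the order of the factor levels, not alphabetical order, determines the order of
the tapply()/table() output.

komp    <- c("Wasser", "Luft", "Boden", "Wasser", "Abwasser", "Luft")
faclevs <- c("Wasser", "Boden", "Luft", "Abwasser")
table(komp)                            # columns come out alphabetically
table(factor(komp, levels = faclevs))  # columns come out in the specified order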
--
David.


The script is the following:*


# R script: counts xyz

# read in the source file
b <- read.csv2("Z:/int/xyz.csv", header = TRUE)

# generate subsets for the individual years
b1 <- subset(b, jahr == 2007)
b2 <- subset(b, jahr == 2008)
b3 <- subset(b, jahr == 2009)

# tapply for the individual years on the respective BranchenID
b1_1 <- tapply(b1$betriebs_id, b1$umweltkompartiment, length)
b1_2 <- tapply(b2$betriebs_id, b2$umweltkompartiment, length)
b1_3 <- tapply(b3$betriebs_id, b3$umweltkompartiment, length)

# combine the results
b11 <- rbind(b1_1, b1_2, b1_3)
Gesamt <- apply(X = b11, MARGIN = 1, sum)
b13 <- cbind(Gesamt, b11)
b13
     Gesamt Abwasser Boden Gefährliche Abfälle Luft nicht gefährliche Abfälle Wasser
b1_1   9832      432    18                3147 2839                      1592   1804
b1_2  10271      413    28                3360 2920                      1715   1835
b1_3   9983      404    21                3405 2741                      1691   1721

*Now I want to have the following order of the columns:
Gesamt, Wasser, Boden, Luft, Abwasser, Gefährliche Abfälle, nicht
gefährliche Abfälle

Thanks a lot for your answers!
Fak*



--
View this message in context: 
http://r.789695.n4.nabble.com/Grouping-columns-tp3681018p3681018.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Grouping data in ranges in table

2011-03-05 Thread Jason Rupert
Working with the built in R data set Orange, e.g. with(Orange, table(age, 
circumference)). 

 
How should I go about grouping the ages and circumferences in the 
following ranges and having them display as such in a table?
age range:
118 - 664
1004 - 1372
1582

circumference range:
30-58
62- 115
120-142
145-177
179-214

Thanks for any feedback and insights, as I am hoping for an output that looks 
something like the following:
   circumference range
   30-58 62- 115  145-177
age range
118 - 664 ...
1004 - 1372 ...
1582


Thanks a ton.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping data in ranges in table

2011-03-05 Thread Greg Snow
?cut

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Jason Rupert
 Sent: Saturday, March 05, 2011 3:38 PM
 To: R Project Help
 Subject: [R] Grouping data in ranges in table
 
 Working with the built in R data set Orange, e.g. with(Orange,
 table(age,
 circumference)).
 
 
  How should I go about grouping the ages and circumferences in the
 following ranges and having them display as such in a table?
 age range:
 118 - 664
 1004 - 1372
 1582
 
 circumference range:
 30-58
 62- 115
 120-142
 145-177
 179-214
 
  Thanks for any feedback and insights, as I am hoping for an output that
 looks
 something like the following:
circumference range
30-58 62- 115  145-177
 age range
 118 - 664 ...
 1004 - 1372 ...
 1582
 
 
 Thanks a ton.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping data in ranges in table

2011-03-05 Thread Jorge Ivan Velez
Hi Jason,

Something along the lines of

with(Orange, table(cut(age, breaks = c(118, 664, 1004, 1372, 1582, Inf)),
                   cut(circumference, breaks = c(30, 58, 62, 115,
                                                 145, 179, 214))))

should get you started.

HTH,
Jorge
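Building on that, a hedged sketch that also labels the bins so the table headings look like the
ranges in the question; the exact break points are a judgment call, since the requested ranges
leave gaps between them:

age_breaks  <- c(-Inf, 664, 1372, Inf)
circ_breaks <- c(-Inf, 58, 115, 142, 177, Inf)
with(Orange,
     table("age range" = cut(age, age_breaks,
                             labels = c("118-664", "1004-1372", "1582")),
           "circumference range" = cut(circumference, circ_breaks,
                                       labels = c("30-58", "62-115", "120-142",
                                                  "145-177", "179-214"))))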


On Sat, Mar 5, 2011 at 5:38 PM, Jason Rupert  wrote:

 Working with the built in R data set Orange, e.g. with(Orange, table(age,
 circumference)).


 How should I go about grouping the ages and circumferences in the
 following ranges and having them display as such in a table?
 age range:
 118 - 664
 1004 - 1372
 1582

 circumference range:
 30-58
 62- 115
 120-142
 145-177
 179-214

 Thanks for any feedback and insights, as I am hoping for an output that looks
 something like the following:
   circumference range
   30-58 62- 115  145-177
 age range
 118 - 664 ...
 1004 - 1372 ...
 1582


 Thanks a ton.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] grouping data

2011-03-04 Thread Steve Hong
Hi R-list,

I have a data set with plot locations and observations and want to label
them based on locations.  For example, I have GPS information (x and y) as
follows:

 x
  [1] -87.85092 -87.85092 -87.85092 -87.85093 -87.85093 -87.85093 -87.85094
  [8] -87.85094 -87.85094 -87.85096 -87.85095 -87.85095 -87.85095 -87.85096
 [15] -87.85096 -87.85096 -87.85096 -87.85088 -87.85088 -87.85087 -87.85087
 [22] -87.85087 -87.85087 -87.85086 -87.85086 -87.85086 -87.85085 -87.85086
 [29] -87.85085 -87.85085 -87.85084 -87.85084 -87.85084 -87.85084 -87.85075
 [36] -87.85075 -87.85076 -87.85076 -87.85077 -87.85076 -87.85076 -87.85076
 [43] -87.85077 -87.85077 -87.85077 -87.85077 -87.85077 -87.85077 -87.85070
 [50] -87.85072 -87.85073 -87.85075 -87.85078 -87.85079 -87.85082 -87.85084
 [57] -87.85077 -87.85078 -87.85078 -87.85078 -87.85078 -87.85078 -87.85079
 [64] -87.85079 -87.85080 -87.85080 -87.85071 -87.85071 -87.85071 -87.85070
 [71] -87.85071 -87.85079 -87.85071 -87.85070 -87.85070 -87.85069 -87.85069
 [78] -87.85069 -87.85069 -87.85068 -87.85068 -87.85068 -87.85067 -87.85059
 [85] -87.85060 -87.85060 -87.85060 -87.85061 -87.85061 -87.85061 -87.85061
 [92] -87.85061 -87.85062 -87.85062 -87.85062 -87.85062 -87.85063 -87.85063
 [99] -87.85063 -87.85055 -87.85055 -87.85055 -87.85054 -87.85054 -87.85053
[106] -87.85053 -87.85053 -87.85053 -87.85053 -87.85052 -87.85052 -87.85052
[113] -87.85052 -87.85051 -87.85051 -87.85043 -87.85043 -87.85044 -87.85044
[120] -87.85044 -87.85045 -87.85045 -87.85045 -87.85045 -87.85046 -87.85046
[127] -87.85046 -87.85046 -87.85047 -87.85047 -87.85039 -87.85039 -87.85038
[134] -87.85038 -87.85038 -87.85037 -87.85037 -87.85037 -87.85037 -87.85036
[141] -87.85036 -87.85036 -87.85035 -87.85035 -87.85035 -87.85027 -87.85027
[148] -87.85027 -87.85027 -87.85028 -87.85028 -87.85028 -87.85029 -87.85029
[155] -87.85029 -87.85029 -87.85029 -87.85030 -87.85030 -87.85030 -87.85022
[162] -87.85022 -87.85022 -87.85021 -87.85021 -87.85021 -87.85020 -87.85020
[169] -87.85020 -87.85020 -87.85019 -87.85019 -87.85019 -87.85019 -87.85011
[176] -87.85011 -87.85011 -87.85011 -87.85012 -87.85012 -87.85012 -87.85012
[183] -87.85013 -87.85013 -87.85013 -87.85014 -87.85014 -87.85014 -87.85006
[190] -87.85006 -87.85006 -87.85005 -87.85005 -87.85004 -87.85004 -87.85004
[197] -87.85004 -87.85003 -87.85003 -87.85003 -87.85002 -87.85003 -87.84994
[204] -87.84994 -87.84995 -87.84995 -87.84995 -87.84995 -87.84996 -87.84996
[211] -87.84996 -87.84996 -87.84996 -87.84996 -87.84996 -87.84996 -87.84996
[218] -87.84996 -87.84996 -87.84996 -87.84996 -87.84990 -87.84991 -87.84993
[225] -87.84995 -87.84998 -87.84999 -87.85001 -87.85003 -87.84996 -87.84998
[232] -87.84997 -87.84998 -87.84989 -87.84990 -87.84989 -87.84989 -87.84988
[239] -87.84988 -87.84988 -87.84988 -87.84988 -87.84987 -87.84987 -87.84987
[246] -87.84987 -87.84978 -87.84978 -87.84979 -87.84979 -87.84979 -87.84979
[253] -87.84979 -87.84980 -87.84980 -87.84981 -87.84980 -87.84981 -87.84981
[260] -87.84973 -87.84973 -87.84973 -87.84972 -87.84972 -87.84972 -87.84971
[267] -87.84971 -87.84971 -87.84970 -87.84970 -87.84970 -87.84963 -87.84963
[274] -87.84963 -87.84963 -87.84963 -87.84964 -87.84964 -87.84965 -87.84964
[281] -87.84964 -87.84965 -87.84957 -87.84957 -87.84956 -87.84956 -87.84958
[288] -87.84958
 y
  [1] 33.90342 33.90335 33.90328 33.90321 33.90314 33.90308 33.90301
33.90294
  [9] 33.90287 33.90280 33.90274 33.90267 33.90260 33.90253 33.90246
33.90240
 [17] 33.90233 33.90232 33.90239 33.90245 33.90252 33.90259 33.90266
33.90273
 [25] 33.90279 33.90286 33.90293 33.90300 33.90307 33.90314 33.90321
33.90327
 [33] 33.90334 33.90339 33.90337 33.90335 33.90328 33.90321 33.90319
33.90318
 [41] 33.90317 33.90316 33.90315 33.90313 33.90312 33.90310 33.90309
33.90307
 [49] 33.90314 33.90314 33.90314 33.90314 33.90314 33.90314 33.90314
33.90314
 [57] 33.90300 33.90294 33.90287 33.90280 33.90273 33.90266 33.90252
33.90245
 [65] 33.90239 33.90232 33.90231 33.90237 33.90245 33.90251 33.90258
33.90259
 [73] 33.90265 33.90272 33.90279 33.90286 33.90292 33.90299 33.90306
33.90313
 [81] 33.90320 33.90326 33.90334 33.90332 33.90327 33.90320 33.90314
33.90307
 [89] 33.90300 33.90293 33.90286 33.90279 33.90272 33.90265 33.90258
33.90252
 [97] 33.90245 33.90238 33.90231 33.90231 33.90237 33.90243 33.90250
33.90257
[105] 33.90264 33.90271 33.90278 33.90285 33.90292 33.90298 33.90306
33.90312
[113] 33.90319 33.90326 33.90329 33.90326 33.90319 33.90312 33.90306
33.90299
[121] 33.90292 33.90286 33.90279 33.90272 33.90265 33.90258 33.90251
33.90245
[129] 33.90237 33.90231 33.90230 33.90236 33.90243 33.90250 33.90257
33.90264
[137] 33.90271 33.90277 33.90284 33.90291 33.90298 33.90305 33.90311
33.90319
[145] 33.90325 33.90323 33.90319 33.90312 33.90305 33.90299 33.90291
33.90285
[153] 33.90278 33.90272 33.90264 33.90257 33.90250 33.90243 33.90237
33.90230
[161] 33.90229 33.90235 33.90243 33.90250 33.90256 33.90263 33.90270
33.90277
[169] 33.90283 33.90290 33.90297 33.90304 33.90311 

Re: [R] grouping data

2011-03-04 Thread Joshua Wiley
Hi Steve,

Just test whether y is greater than the predicted y (i.e., your line).

## function using the model coefficients*
f <- function(x) {82.9996 + (0.5589 * x)}
## Find group membership
group <- ifelse(y > f(x), "A", "B")

*Note that depending how accurate this needs to be, you will probably
want to use the model itself rather than just reading from the
printout like I did.  If you need to do that, take a look at ?predict
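A short, reproducible sketch of that predict()-based version, on simulated data standing in for
the GPS example (the coefficients and sample size here are made up):

set.seed(1)
xsim  <- runif(50, 0, 100)
ysim  <- 83 + 0.56 * xsim + rnorm(50, sd = 10)
fm1   <- lm(ysim ~ xsim)
group <- ifelse(ysim > predict(fm1), "A", "B")  # above / below the fitted line
table(group)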

For future reference, it would be easier for readers if you provided
your data via something like: dput(x) that can be copied directly into
the R console.  Also, if you are generating random data (rnorm()), you
can use set.seed() so that we can replicate exactly what you get.

HTH,

Josh

On Fri, Mar 4, 2011 at 1:39 PM, Steve Hong empti...@gmail.com wrote:
 Hi R-list,

 I have a data set with plot locations and observations and want to label
 them based on locations.  For example, I have GPS information (x and y) as
 follows:
[snip]
 (fm1 - lm(ysim~xsim))
 Call:
 lm(formula = ysim ~ xsim)
 Coefficients:
 (Intercept)         xsim
    82.9996       0.5589

 I overlapped fitted line on the plot.

 abline(fm1)
 My question is:
 As you can see in the plot, how can I label (or re-group) those in upper
 diagonal as (say) 'A' and the others in lower diagonal as 'B'?

 Thanks a lot in advance!!!

 Steve

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping and counting in dataframe

2011-02-27 Thread zem
Does nobody have any idea?
I have already tried tapply(d, gr, ...) but I have problems with the
choice of the function ... also I am not really sure whether that is the
right direction with tapply ...
It would be really great if somebody came up with a new suggestion.

10x

-- 
View this message in context: 
http://r.789695.n4.nabble.com/grouping-and-counting-in-dataframe-tp3325476p3327240.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping and counting in dataframe

2011-02-27 Thread jim holtman
Here is one solution; mine differs from your expected output since there should
always be at least one item in the range, namely the value itself:

  tm gr
1  12345  1
2  42352  3
3  12435  1
4  67546  2
5  24234  2
6  76543  4
7  31243  2
8  13334  3
9  64562  3
10 64123  3
d$ct <- ave(d$tm, d$gr, FUN = function(x){
    # determine count in the range
    sapply(x, function(a) sum((x >= a - 500) & (x <= a + 500)))
})

 d
  tm gr ct
1  12345  1  2
2  42352  3  1
3  12435  1  2
4  67546  2  1
5  24234  2  1
6  76543  4  1
7  31243  2  1
8  13334  3  1
9  64562  3  2
10 64123  3  2
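If the real data set is large, here is a hedged sketch of a variant that avoids the quadratic
sapply() inside each group by sorting the group once and using findInterval(); it assumes
integer timestamps and, like the version above, lets each element count itself:

tm <- c(12345, 42352, 12435, 67546, 24234, 76543, 31243, 13334, 64562, 64123)
gr <- c(1, 3, 1, 2, 2, 4, 2, 3, 3, 3)
d  <- data.frame(tm, gr)

d$ct <- ave(d$tm, d$gr, FUN = function(x) {
    s <- sort(x)
    # count of group members in [a - 500, a + 500] for each a in x
    findInterval(x + 500, s) - findInterval(x - 501, s)
})
d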


On Sat, Feb 26, 2011 at 5:10 PM, zem zmanol...@gmail.com wrote:
 sry,
 new try:

 tm <- c(12345, 42352, 12435, 67546, 24234, 76543, 31243, 13334, 64562, 64123)
 gr <- c(1, 3, 1, 2, 2, 4, 2, 3, 3, 3)
 d  <- data.frame(time = tm, gr = gr)

 where tm are Unix times and gr is the factor to group by.
 I have a scalar, for example k = 500.
 Now I need to calculate, for every row, how many examples in the same group
 are in the interval [i-500; i+500], where i is the current tm element, like this:

d
    time gr ct
 1  12345  1  2
 2  42352  3  0
 3  12435  1  2
 4  67546  2  0
 5  24234  2  0
 6  76543  4  0
 7  31243  2  0
 8  13334  3  0
 9  64562  3  2
 10 64123  3  2

 i hope that was a better illustration of my problem

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/grouping-and-counting-in-dataframe-tp3325476p3326338.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping and counting in dataframe

2011-02-26 Thread zem
sry, 
new try: 

tm <- c(12345, 42352, 12435, 67546, 24234, 76543, 31243, 13334, 64562, 64123)
gr <- c(1, 3, 1, 2, 2, 4, 2, 3, 3, 3)
d  <- data.frame(time = tm, gr = gr)

where tm are Unix times and gr is the factor to group by.
I have a scalar, for example k = 500.
Now I need to calculate, for every row, how many examples in the same group
are in the interval [i-500; i+500], where i is the current tm element, like this:

d
time gr ct
1  12345  1  2
2  42352  3  0
3  12435  1  2
4  67546  2  0
5  24234  2  0
6  76543  4  0
7  31243  2  0
8  13334  3  0
9  64562  3  2
10 64123  3  2

i hope that was a better illustration of my problem

-- 
View this message in context: 
http://r.789695.n4.nabble.com/grouping-and-counting-in-dataframe-tp3325476p3326338.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] grouping and counting in dataframe

2011-02-25 Thread zem

hi all,

i have a little problem: i have some code written, but it is too slow...

i have a data frame with a column of time series and a grouping column.
It really does not matter what kind of data is in the first column; it can
be a random number like this:
x  <- rnorm(10)
gr <- c(1, 3, 1, 2, 2, 4, 2, 3, 3, 3)
x  <- cbind(x, gr)

now, for every row i, i have to look, within its group, at how many of the
values in x[,1] are in a range of x[i,1] plus/minus k (k is another number)

thanks in advance

-- 
View this message in context: 
http://r.789695.n4.nabble.com/grouping-and-counting-in-dataframe-tp3325476p3325476.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping and counting in dataframe

2011-02-25 Thread David Winsemius


On Feb 25, 2011, at 8:28 PM, zem wrote:



hi all,

i have a little problem: i have some code written, but it is too slow...

i have a data frame with a column of time series and a grouping column.
It really does not matter what kind of data is in the first column; it can
be a random number like this:
x  <- rnorm(10)
gr <- c(1, 3, 1, 2, 2, 4, 2, 3, 3, 3)
x  <- cbind(x, gr)


That is not a dataframe. It is a matrix. And not all time series  
objects are the same, so you should not assume that any old two column  
object will respond the same way to R functions.




now, for every row i, i have to look, within its group, at how many of the
values in x[,1] are in a range of x[i,1] plus/minus k (k is another number)


You may find that the function findInterval is useful. I cannot
determine what your goal is from the description, and there is no
complete example with a specification of what the correct output would
be, as you should have seen requested in the Posting Guide.





--

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Grouping by factors in R

2011-02-08 Thread Christopher R. Dolanc
I'm having a hard time figuring out how to group results by certain 
factors in R.  I have data with the following headings:


[1] Time  Plot  LatCatElevation ElevCat   
AspectAspCatSlope

[9] SlopeCat  Species   SizeClass Stems

and I'm trying to use a GLM to test differences in Stems for different 
categories/factors - most importantly, I want to group things so that I 
see results by SizeClass and then by Species.  This is pretty easy 
in SAS using the Group By command, but in R, I haven't figured it out.


I've tried using the following code:

 stems139GLM <- glm(Stems ~ Time | SizeClass | Species,
                    family = poisson, data = stems139)


but R gives me this message:

Error in pmax(exp(eta), .Machine$double.eps) :
  cannot mix 0-length vectors with others
In addition: Warning messages:
1: In Ops.factor(Time, SizeClass) : | not meaningful for factors
2: In Ops.factor(Time | SizeClass, Species) : | not meaningful for factors

I'd appreciate any help.

Thanks.

--
Christopher R. Dolanc
PhD Candidate
Ecology Graduate Group
University of California, Davis
Lab Phone: (530) 752-2644 (Barbour lab)un

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping by factors in R

2011-02-08 Thread Dennis Murphy
Hi:

One approach would be to use dlply() from the plyr package to generate the
models and assign the results to a list, something like the following:

library(plyr)
# function to run the GLM in each data subset - the argument is a generic
# data subset d
gfun  <- function(d) glm(Stems ~ Time, data = d, family = poisson)
mlist <- dlply(stems139, .(SizeClass, Species), gfun)

To see the result, try mlist[[1]] or summary(mlist[[1]]) to execute the
print and summary methods on the first fitted model. Each output list object
from glm() is a list component of mlist, so mlist is actually a list of
lists.

You can extract various pieces from mlist by using ldply() with a suitable
extraction function or by use of the do.call/lapply combination.
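A hedged, self-contained illustration of that extraction step, with mtcars standing in for
stems139 (gear plays the role of the grouping factor, carb the count response):

library(plyr)
mlist <- dlply(mtcars, .(gear),
               function(d) glm(carb ~ wt, data = d, family = poisson))
ldply(mlist, coef)                                           # one row of coefficients per group
ldply(mlist, function(m) coef(summary(m))[, "Std. Error"])   # matching standard errors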

All of this is untested since no minimal example was provided per
instructions in the Posting Guide...

HTH,
Dennis


On Tue, Feb 8, 2011 at 11:54 AM, Christopher R. Dolanc crdol...@ucdavis.edu
 wrote:

 I'm having a hard time figuring out how to group results by certain factors
 in R.  I have data with the following headings:

 [1] Time  Plot  LatCatElevation ElevCat   Aspect
  AspCatSlope
 [9] SlopeCat  Species   SizeClass Stems

 and I'm trying to use a GLM to test differences in Stems for different
 categories/factors - most importantly, I want to group things so that I see
 results by SizeClass and then by Species.  This is pretty easy in SAS
 using the Group By command, but in R, I haven't figured it out.

 I've tried using the following code:

  stems139GLM <- glm(Stems ~ Time | SizeClass | Species, family=poisson,
  data=stems139)

 but R gives me this message:

 Error in pmax(exp(eta), .Machine$double.eps) :
  cannot mix 0-length vectors with others
 In addition: Warning messages:
 1: In Ops.factor(Time, SizeClass) : | not meaningful for factors
 2: In Ops.factor(Time | SizeClass, Species) : | not meaningful for factors

 I'd appreciate any help.

 Thanks.

 --
 Christopher R. Dolanc
 PhD Candidate
 Ecology Graduate Group
 University of California, Davis
 Lab Phone: (530) 752-2644 (Barbour lab)un

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping by factors in R

2011-02-08 Thread Christopher R. Dolanc
I'm working on getting this to work - need to figure out how to extract 
pieces properly.

In the meantime, I may have figured out an alternate method to group
the factors with the following:

  stems139$SpeciesF <- factor(stems139$Species)

  stems139GLM <- glm(Stems ~ Time*SizeClassF*Species, family = poisson,
                     data = stems139)

  summary(stems139GLM)

Call:
glm(formula = Stems ~ Time * SizeClassF * Species, family = poisson,
    data = stems139)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-4.2308  -1.0107  -0.6786  -0.3393  16.7415

Coefficients:
                     Estimate Std. Error z value Pr(>|z|)
(Intercept)         -0.671794   0.118678  -5.661 1.51e-08 ***
TimeVTM             -0.573800   0.197698  -2.902 0.003703 **
SizeClassF2         -0.766172   0.210684  -3.637 0.000276 ***
SizeClassF3         -1.960095   0.337764  -5.803 6.51e-09 ***
SizeClassF4         -2.653242   0.462693  -5.734 9.79e-09 ***
SpeciesABMA          1.824095   0.127895  14.262  < 2e-16 ***
SpeciesJUOC         -0.088293   0.171666  -0.514 0.607022
SpeciesPIAL          1.947920   0.126856  15.355  < 2e-16 ***
SpeciesPICO          2.863407   0.122018  23.467  < 2e-16 ***
SpeciesPIJE         -0.525010   0.194664  -2.697 0.006997 **
SpeciesPIMO          0.372049   0.154251   2.412 0.015866 *
SpeciesTSME          1.919405   0.127085  15.103  < 2e-16 ***
TimeVTM:SizeClassF2 -0.620122   0.411567  -1.507 0.131879
TimeVTM:SizeClassF3  0.756122   0.471612   1.603 0.108875
TimeVTM:SizeClassF4  0.910273   0.618014   1.473 0.140778

The problem now, though, is that R for some reason does not list the first
factor level in the output.  Why would this be?

On 2/8/2011 2:21 PM, Dennis Murphy wrote:
 Hi:

 One approach would be to use dlply() from the plyr package to generate 
 the models and assign the results to a list, something like the following:

 library(plyr)
 # function to run the GLM in each data subset - the argument is a
 # generic data subset d
 gfun  <- function(d) glm(Stems ~ Time, data = d, family = poisson)
 mlist <- dlply(stems139, .(SizeClass, Species), gfun)

 To see the result, try mlist[[1]] or summary(mlist[[1]]) to execute 
 the print and summary methods on the first fitted model. Each output 
 list object from glm() is a list component of mlist, so mlist is 
 actually a list of lists.

 You can extract various pieces from mlist by using ldply() with a 
 suitable extraction function or by use of the do.call/lapply combination.

 All of this is untested since no minimal example was provided per 
 instructions in the Posting Guide...

 HTH,
 Dennis


 On Tue, Feb 8, 2011 at 11:54 AM, Christopher R. Dolanc 
 crdol...@ucdavis.edu mailto:crdol...@ucdavis.edu wrote:

 I'm having a hard time figuring out how to group results by
 certain factors in R.  I have data with the following headings:

 [1] Time  Plot  LatCatElevation ElevCat  
 AspectAspCatSlope
 [9] SlopeCat  Species   SizeClass Stems

 and I'm trying to use a GLM to test differences in Stems for
 different categories/factors - most importantly, I want to group
 things so that I see results by SizeClass and then by Species.
  This is pretty easy in SAS using the Group By command, but in
 R, I haven't figured it out.

 I've tried using the following code:

  stems139GLM <- glm(Stems ~ Time | SizeClass | Species,
 family=poisson, data=stems139)

 but R gives me this message:

 Error in pmax(exp(eta), .Machine$double.eps) :
  cannot mix 0-length vectors with others
 In addition: Warning messages:
 1: In Ops.factor(Time, SizeClass) : | not meaningful for factors
 2: In Ops.factor(Time | SizeClass, Species) : | not meaningful for
 factors

 I'd appreciate any help.

 Thanks.

 -- 
 Christopher R. Dolanc
 PhD Candidate
 Ecology Graduate Group
 University of California, Davis
 Lab Phone: (530) 752-2644 (Barbour lab)un

 __
 R-help@r-project.org mailto:R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Christopher R. Dolanc
PhD Candidate
Ecology Graduate Group
University of California, Davis
Lab Phone: (530) 752-2644 (Barbour lab)


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] grouping question

2010-10-29 Thread will phillips

Hello

I have what is probably a very simple grouping question; however, given my
limited exposure to R, I have not found a solution yet despite my research
efforts and wild attempts at what I thought might produce some sort of
result.

I have a very simple list of integers that range between 1 and 24.  These
correspond to hours of the day.

I am trying to create a grouping of Day and Night with 
Day = 6 to 17.99
Night = 1 to 5.59  and  18 to 24

Using the cut() command I can create the segments, but I have not found a
combine-type command to merge the two night segments.  No luck with
if/else either.

Any help would be greatly appreciated

Thank you

Will


-- 
View this message in context: 
http://r.789695.n4.nabble.com/grouping-question-tp3019922p3019922.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping question

2010-10-29 Thread jim holtman
try this:

 x
 [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 y <- cut(x, breaks=c(-Inf, 6, 18, Inf), labels=c('a','b','c'))
 levels(y) <- c('night','day','night')
 y
 [1] night night night night night night night day   day   day   day
day   day   day   day   day   day   day
[19] day   night night night night night night
Levels: night day



On Fri, Oct 29, 2010 at 8:56 PM, will phillips will.phill...@q.com wrote:

 Hello

 I have what is probably a very simple grouping question; however, given my
 limited exposure to R, I have not found a solution yet despite my research
 efforts and wild attempts at what I thought might produce some sort of
 result.

 I have a very simple list of integers that range between 1 and 24.  These
 correspond to hours of the day.

 I am trying to create a grouping of Day and Night with
 Day = 6 to 17.99
 Night = 1 to 5.59  and  18 to 24

 Using the cut() command I can create the segments, but I have not found a
 combine-type command to merge the two night segments.  No luck with
 if/else either.

 Any help would be greatly appreciated

 Thank you

 Will


 --
 View this message in context: 
 http://r.789695.n4.nabble.com/grouping-question-tp3019922p3019922.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping question

2010-10-29 Thread Jorge Ivan Velez
Hi Will,

One way would be:

 x
 [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
24
 factor(ifelse(x > 6 & x < 18, 'day', 'night'))
 [1] night night night night night night night day   day   day   day   day
day   day   day
[16] day   day   day   night night night night night night night
Levels: day night

HTH,
Jorge


On Fri, Oct 29, 2010 at 8:56 PM, will phillips  wrote:


 Hello

  I have what is probably a very simple grouping question; however, given my
 limited exposure to R, I have not found a solution yet despite my research
 efforts and wild attempts at what I thought might produce some sort of
 result.

 I have a very simple list of integers that range between 1 and 24.  These
 correspond to hours of the day.

 I am trying to create a grouping of Day and Night with
 Day = 6 to 17.99
 Night = 1 to 5.59  and  18 to 24

  Using the cut() command I can create the segments, but I have not found a
  combine-type command to merge the two night segments.  No luck with
 if/else either.

 Any help would be greatly appreciated

 Thank you

 Will


 --
 View this message in context:
 http://r.789695.n4.nabble.com/grouping-question-tp3019922p3019922.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping question

2010-10-29 Thread will phillips

Hello Jim

Wow.  I tried cut but I see you have an interim step with labels a,b,c and
then levels night and day.  I was really close to this: I had labels
night,day,night and it wouldn't let me duplicate labels.  I am very grateful
for your input.

Will
-- 
View this message in context: 
http://r.789695.n4.nabble.com/grouping-question-tp3019922p3019950.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping question

2010-10-29 Thread will phillips

Hello Jorge,

Thank you for the reply.  I tried a few different things with if/else but
couldn't get them to go.  I really appreciate your feedback.  I learned
something new from this

Will
-- 
View this message in context: 
http://r.789695.n4.nabble.com/grouping-question-tp3019922p3019952.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

