Re: [R] Grouping by Date and showing count of failures by date

2023-09-30 Thread Duncan Murdoch
There's a package called "pivottabler" which exports PivotTable: 
http://pivottabler.org.uk/reference/PivotTable.html .


Duncan Murdoch
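For reference, a sketch of how the counting could work with pivottabler once a year-month column exists. It reuses the calls from Paul's attempt plus `addRowDataGroups()`; the toy `failuredf` below is a made-up stand-in, since the real data was never posted.

```r
library(pivottabler)

# Made-up stand-in for the unposted failuredf
failuredf <- data.frame(
  WONUM    = 1:4,
  FAILDATE = as.Date(c("2023-01-05", "2023-01-20", "2023-02-03", "2023-03-01"))
)

# Derive the year-month period before pivoting
failuredf$Failure_Date_Period <- format(failuredf$FAILDATE, "%Y-%m")

pt <- PivotTable$new()
pt$addData(failuredf)
pt$addRowDataGroups("Failure_Date_Period")         # one group per YYYY-MM
pt$defineCalculation(calculationName = "FailCounts",
                     summariseExpression = "n()")  # rows (failures) per period
pt$renderPivot()
```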

On 30/09/2023 7:11 a.m., John Kane wrote:

To follow up on Rui Barradas's post, I do not think PivotTable is an R
command.

You may be thinking of the "pivot_longer" and "pivot_wider" functions in
the {tidyr} package which is part of {tidyverse}.

On Sat, 30 Sept 2023 at 07:03, Rui Barradas  wrote:


At 21:29 on 29/09/2023, Paul Bernal wrote:

Dear friends,

Hope you are doing great. I am attaching the dataset I am working with
because, when I tried to dput() it, I was not able to copy the entire
result from dput(), so I apologize in advance for that.

I am interested in creating a column named Failure_Date_Period that has the
FAILDATE but formatted as YYYY-MM. Then I want to count the number of
failures (given by column WONUM) and just have a dataframe that has the
FAILDATE and the count of WONUM.

I tried this:
pt <- PivotTable$new()
pt$addData(failuredf)
pt$addColumnDataGroups("FAILDATE")
pt <- PivotTable$new()
pt$addData(failuredf)
pt$addColumnDataGroups("FAILDATE")
pt$defineCalculation(calculationName = "FailCounts",
summariseExpression="n()")
pt$renderPivot()

but I was not successful. Bottom line, I need to create a new dataframe
that has the number of failures by FAILDATE, but in YYYY-MM format.

Any help and/or guidance will be greatly appreciated.

Kind regards,
Paul
__
[email protected] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

No data is attached. Maybe try

dput(head(failuredf, 30))

?

And where can we find non-base PivotTable? Please start the scripts with
calls to library() when using non-base functionality.

Hope this helps,

Rui Barradas


--
This e-mail has been checked for viruses by AVG antivirus software.
www.avg.com



Re: [R] Grouping by Date and showing count of failures by date

2023-09-30 Thread Ebert,Timothy Aaron
In this sort of post it would help if we knew the package that was being used 
for the example. I found one option.
https://cran.r-project.org/web/packages/pivottabler/vignettes/v00-vignettes.html

There may be a way to create a custom data type that would be a date but
restricted to a yyyy-mm format; I do not know how to do this.
Could you work with the date as a string in yyyy-mm format? The issue is
that R will not handle the string as a date.
A third option would be to look at the lubridate package, which can be
installed by itself or as part of tidyverse. I do not promise that this is a
solution, but it could be.
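A base R sketch of the string route described above: format() the date down to year-month, then count with aggregate(). The data frame is a made-up stand-in for the unposted failuredf.

```r
# Made-up stand-in for the unposted failuredf
failuredf <- data.frame(
  WONUM    = 1:6,
  FAILDATE = as.Date(c("2023-01-05", "2023-01-20", "2023-02-03",
                       "2023-02-17", "2023-02-28", "2023-03-01"))
)

# Year-month as a plain string, e.g. "2023-01"
failuredf$Failure_Date_Period <- format(failuredf$FAILDATE, "%Y-%m")

# One row per period, counting the WONUM entries in it
fail_counts <- aggregate(WONUM ~ Failure_Date_Period, data = failuredf,
                         FUN = length)
names(fail_counts)[2] <- "FailCounts"
fail_counts
#   Failure_Date_Period FailCounts
# 1             2023-01          2
# 2             2023-02          3
# 3             2023-03          1
```

If real dates are needed later for ordering or plotting, lubridate::floor_date(FAILDATE, "month") gives a proper Date at the first of each month instead of a string.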



-Original Message-
From: R-help  On Behalf Of John Kane
Sent: Saturday, September 30, 2023 7:11 AM
To: Rui Barradas 
Cc: Paul Bernal ; R 
Subject: Re: [R] Grouping by Date and showing count of failures by date

[External Email]

To follow up on Rui Barradas's post, I do not think PivotTable is an R command.

You may be thinking of the "pivot_longer" and "pivot_wider" functions in the 
{tidyr} package which is part of {tidyverse}.

On Sat, 30 Sept 2023 at 07:03, Rui Barradas  wrote:

> At 21:29 on 29/09/2023, Paul Bernal wrote:
> > Dear friends,
> >
> > Hope you are doing great. I am attaching the dataset I am working
> > with because, when I tried to dput() it, I was not able to copy the
> > entire result from dput(), so I apologize in advance for that.
> >
> > I am interested in creating a column named Failure_Date_Period that
> > has
> the
> > FAILDATE but formatted as YYYY-MM. Then I want to count the number
> > of failures (given by column WONUM) and just have a dataframe that
> > has the FAILDATE and the count of WONUM.
> >
> > I tried this:
> > pt <- PivotTable$new()
> > pt$addData(failuredf)
> > pt$addColumnDataGroups("FAILDATE")
> > pt <- PivotTable$new()
> > pt$addData(failuredf)
> > pt$addColumnDataGroups("FAILDATE")
> > pt$defineCalculation(calculationName = "FailCounts",
> > summariseExpression="n()")
> > pt$renderPivot()
> >
> > but I was not successful. Bottom line, I need to create a new
> > dataframe that has the number of failures by FAILDATE, but in YYYY-MM
> > format.
> >
> > Any help and/or guidance will be greatly appreciated.
> >
> > Kind regards,
> > Paul
> > __
> > [email protected] mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> Hello,
>
> No data is attached. Maybe try
>
> dput(head(failuredf, 30))
>
> ?
>
> And where can we find non-base PivotTable? Please start the scripts
> with calls to library() when using non-base functionality.
>
> Hope this helps,
>
> Rui Barradas
>
>
>
> __
> [email protected] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html

Re: [R] Grouping by Date and showing count of failures by date

2023-09-30 Thread John Kane
To follow up on Rui Barradas's post, I do not think PivotTable is an R
command.

You may be thinking of the "pivot_longer" and "pivot_wider" functions in
the {tidyr} package which is part of {tidyverse}.

On Sat, 30 Sept 2023 at 07:03, Rui Barradas  wrote:

> At 21:29 on 29/09/2023, Paul Bernal wrote:
> > Dear friends,
> >
> > Hope you are doing great. I am attaching the dataset I am working with
> > because, when I tried to dput() it, I was not able to copy the entire
> > result from dput(), so I apologize in advance for that.
> >
> > I am interested in creating a column named Failure_Date_Period that has
> the
> > FAILDATE but formatted as YYYY-MM. Then I want to count the number of
> > failures (given by column WONUM) and just have a dataframe that has the
> > FAILDATE and the count of WONUM.
> >
> > I tried this:
> > pt <- PivotTable$new()
> > pt$addData(failuredf)
> > pt$addColumnDataGroups("FAILDATE")
> > pt <- PivotTable$new()
> > pt$addData(failuredf)
> > pt$addColumnDataGroups("FAILDATE")
> > pt$defineCalculation(calculationName = "FailCounts",
> > summariseExpression="n()")
> > pt$renderPivot()
> >
> > but I was not successful. Bottom line, I need to create a new dataframe
> > that has the number of failures by FAILDATE, but in YYYY-MM format.
> >
> > Any help and/or guidance will be greatly appreciated.
> >
> > Kind regards,
> > Paul
> > __
> > [email protected] mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> Hello,
>
> No data is attached. Maybe try
>
> dput(head(failuredf, 30))
>
> ?
>
> And where can we find non-base PivotTable? Please start the scripts with
> calls to library() when using non-base functionality.
>
> Hope this helps,
>
> Rui Barradas
>
>
>
> __
> [email protected] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
John Kane
Kingston ON Canada

[[alternative HTML version deleted]]



Re: [R] Grouping by Date and showing count of failures by date

2023-09-30 Thread Rui Barradas

At 21:29 on 29/09/2023, Paul Bernal wrote:

Dear friends,

Hope you are doing great. I am attaching the dataset I am working with
because, when I tried to dput() it, I was not able to copy the entire
result from dput(), so I apologize in advance for that.

I am interested in creating a column named Failure_Date_Period that has the
FAILDATE but formatted as YYYY-MM. Then I want to count the number of
failures (given by column WONUM) and just have a dataframe that has the
FAILDATE and the count of WONUM.

I tried this:
pt <- PivotTable$new()
pt$addData(failuredf)
pt$addColumnDataGroups("FAILDATE")
pt <- PivotTable$new()
pt$addData(failuredf)
pt$addColumnDataGroups("FAILDATE")
pt$defineCalculation(calculationName = "FailCounts",
summariseExpression="n()")
pt$renderPivot()

but I was not successful. Bottom line, I need to create a new dataframe
that has the number of failures by FAILDATE, but in YYYY-MM format.

Any help and/or guidance will be greatly appreciated.

Kind regards,
Paul

Hello,

No data is attached. Maybe try

dput(head(failuredf, 30))

?

And where can we find non-base PivotTable? Please start the scripts with 
calls to library() when using non-base functionality.


Hope this helps,

Rui Barradas





Re: [R] Grouping Question

2020-03-21 Thread Chris Evans
Here's a very "step by step" example with dplyr, as I'm trying to teach myself 
the Tidyverse way of being:

library(dplyr)

# Serial  Measurement  Meas_test  Serial_test
# 1       17           fail       fail
# 1       16           pass       fail
# 2       12           pass       pass
# 2       8            pass       pass
# 2       10           pass       pass
# 3       19           fail       fail
# 3       13           pass       fail

dat <- as.data.frame(list(Serial = c(1,1,2,2,2,3,3),
  Measurement = c(17, 16, 12, 8, 10, 19, 13),
  Meas_test = c("fail", "pass", "pass", "pass", "pass", 
"fail", "pass")))

dat %>%
  group_by(Serial) %>%
  summarise(Serial_test = sum(Meas_test == "fail")) %>%
  mutate(Serial_test = if_else(Serial_test > 0, 1, 0),
 Serial_test = factor(Serial_test,
  levels = 0:1,
  labels = c("pass", "fail"))) -> groupedDat

dat %>%
  left_join(groupedDat) # add `-> dat` at the end to pipe the result back into dat

Gives:

  Serial Measurement Meas_test Serial_test
1      1          17      fail        fail
2      1          16      pass        fail
3      2          12      pass        pass
4      2           8      pass        pass
5      2          10      pass        pass
6      3          19      fail        fail
7      3          13      pass        fail

It would have been easier for us if you had used dput() to share your data, but 
thanks for the minimal example!

Chris

- Original Message -
> From: "Ivan Krylov" 
> To: "Thomas Subia via R-help" 
> Cc: "Thomas Subia" 
> Sent: Sunday, 22 March, 2020 07:24:15
> Subject: Re: [R] Grouping Question

> On Sat, 21 Mar 2020 20:01:30 -0700
> Thomas Subia via R-help  wrote:
> 
>> Serial_test is a pass, when all of the Meas_test are pass for a given
>> serial. Else Serial_test is a fail.
> 
> Use by/tapply in base R or dplyr::group_by if you prefer tidyverse
> packages.
> 
> --
> Best regards,
> Ivan
> 
> __
> [email protected] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Chris Evans  Visiting Professor, University of Sheffield 

I do some consultation work for the University of Roehampton 
 and other places
but  remains my main Email address.  I have a work web site 
at:
   https://www.psyctc.org/psyctc/
and a site I manage for CORE and CORE system trust at:
   http://www.coresystemtrust.org.uk/
I have "semigrated" to France, see: 
   https://www.psyctc.org/pelerinage2016/semigrating-to-france/ 
That page will also take you to my blog which started with earlier joys in 
France and Spain!

If you want to book to talk, I am trying to keep that to Thursdays and my diary 
is at:
   https://www.psyctc.org/pelerinage2016/ceworkdiary/
Beware: French time, generally an hour ahead of UK.



Re: [R] Grouping Question

2020-03-21 Thread Ivan Krylov
On Sat, 21 Mar 2020 20:01:30 -0700
Thomas Subia via R-help  wrote:

> Serial_test is a pass, when all of the Meas_test are pass for a given
> serial. Else Serial_test is a fail.

Use by/tapply in base R or dplyr::group_by if you prefer tidyverse
packages.
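A base R sketch of that suggestion using ave(), whose result lines up with the original rows (the data is Thomas's example):

```r
dat <- data.frame(Serial      = c(1, 1, 2, 2, 2, 3, 3),
                  Measurement = c(17, 16, 12, 8, 10, 19, 13),
                  Meas_test   = c("fail", "pass", "pass", "pass",
                                  "pass", "fail", "pass"))

# A serial passes only when every one of its measurements passed
dat$Serial_test <- ave(dat$Meas_test == "pass", dat$Serial,
                       FUN = function(ok) if (all(ok)) "pass" else "fail")
```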

-- 
Best regards,
Ivan



Re: [R] Grouping by 3 variable and renaming groups

2018-05-26 Thread Jeff Reichman
Rui

Your first code worked just fine.

Jeff

-Original Message-
From: Rui Barradas  
Sent: Saturday, May 26, 2018 8:30 AM
To: [email protected]; 'R-help' 
Subject: Re: [R] Grouping by 3 variable and renaming groups

Hello,

Sorry, but I think my first answer is wrong.
You probably want something along the lines of


sp <- split(priceStore_Grps, priceStore_Grps$StorePC)
res <- lapply(seq_along(sp), function(i){
 sp[[i]]$StoreID <- paste("Store", i, sep = "_")
 sp[[i]]
})
res <- do.call(rbind, res)
row.names(res) <- NULL
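The same labelling can also be done in one step, without splitting and re-binding, by letting factor() number the distinct StorePC values. The small data frame below is a made-up stand-in for priceStore_Grps:

```r
# Made-up stand-in for priceStore_Grps
priceStore_Grps <- data.frame(StorePC   = c("CR7 8LE", "E2 0RY", "CR7 8LE"),
                              meanPrice = c(472, 520, 475))

# factor() assigns one integer code per distinct StorePC, so rows sharing
# a postcode share a StoreID
priceStore_Grps$StoreID <- paste("Store",
                                 as.integer(factor(priceStore_Grps$StorePC)),
                                 sep = "_")
```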


Hope this helps,

Rui Barradas

On 5/26/2018 2:22 PM, Rui Barradas wrote:
> Hello,
> 
> See if this is it:
> 
> priceStore_Grps$StoreID <- paste("Store", 
> seq_len(nrow(priceStore_Grps)), sep = "_")
> 
> 
> Hope this helps,
> 
> Rui Barradas
> 
> On 5/26/2018 2:03 PM, Jeff Reichman wrote:
>> ALCON
>>
>>
>> I'm trying to figure out how to rename groups in a data frame after 
>> groups by selected variabels.  I am using the dplyr library to group 
>> my data by 3 variables as follows
>>
>>
>> # group by lat (StoreX)/long (StoreY)
>>
>> priceStore <- LapTopSales[,c(4,5,15,16)]
>>
>> priceStore <- priceStore[complete.cases(priceStore), ]  # keep only 
>> non NA records
>>
>> priceStore_Grps <- priceStore %>%
>>
>>group_by(StorePC, StoreX, StoreY) %>%
>>
>>summarize(meanPrice=(mean(RetailPrice)))
>>
>>
>> which results in .
>>
>>
>>> priceStore_Grps
>>
>> # A tibble: 15 x 4
>>
>> # Groups:   StorePC, StoreX [?]
>>
>> StorePC  StoreX StoreY meanPrice
>>
>> 
>>
>> 1 CR7 8LE  532714 168302  472.
>>
>> 2 E2 0RY   535652 182961  520.
>>
>> 3 E7 8NW   541428 184515  467.
>>
>> 4 KT2 5AU  517917 170243  522.
>>
>> 5 N17 6QA  533788 189994  523.
>>
>>
>> Which is fine, but I then want to give each group (e.g. CR7 8LE  
>> 532714
>> 168302) a unique identifier (say) Store 1, 2, 3 or some other unique 
>> identifier.
>>
>>
>> StorePC  StoreX StoreY meanPrice
>>
>> 
>>
>> 1 CR7 8LE  532714 168302  472.   Store 1
>>
>> 2 E2 0RY   535652 182961  520.   Store 2
>>
>> 3 E7 8NW   541428 184515  467.   Store 3
>>
>> 4 KT2 5AU  517917 170243  522.   Store 4
>>
>> 5 N17 6QA  533788 189994  523.   Store 5
>>
>>
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> [email protected] mailing list -- To UNSUBSCRIBE and more, see 
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> 
> __
> [email protected] mailing list -- To UNSUBSCRIBE and more, see 
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



Re: [R] Grouping by 3 variable and renaming groups

2018-05-26 Thread Jeff Reichman
Rui

That did it 

Jeff

-Original Message-
From: Rui Barradas  
Sent: Saturday, May 26, 2018 8:23 AM
To: [email protected]; 'R-help' 
Subject: Re: [R] Grouping by 3 variable and renaming groups

Hello,

See if this is it:

priceStore_Grps$StoreID <- paste("Store", seq_len(nrow(priceStore_Grps)),
                                 sep = "_")


Hope this helps,

Rui Barradas

On 5/26/2018 2:03 PM, Jeff Reichman wrote:
> ALCON
> 
>   
> 
> I'm trying to figure out how to rename groups in a data frame after groups
> by selected variabels.  I am using the dplyr library to group my data by 3
> variables as follows
> 
>   
> 
> # group by lat (StoreX)/long (StoreY)
> 
> priceStore <- LapTopSales[,c(4,5,15,16)]
> 
> priceStore <- priceStore[complete.cases(priceStore), ]  # keep only non NA
> records
> 
> priceStore_Grps <- priceStore %>%
> 
>group_by(StorePC, StoreX, StoreY) %>%
> 
>summarize(meanPrice=(mean(RetailPrice)))
> 
>   
> 
> which results in .
> 
>   
> 
>> priceStore_Grps
> 
> # A tibble: 15 x 4
> 
> # Groups:   StorePC, StoreX [?]
> 
> StorePC  StoreX StoreY meanPrice
> 
> 
> 
> 1 CR7 8LE  532714 168302  472.
> 
> 2 E2 0RY   535652 182961  520.
> 
> 3 E7 8NW   541428 184515  467.
> 
> 4 KT2 5AU  517917 170243  522.
> 
> 5 N17 6QA  533788 189994  523.
> 
>   
> 
> Which is fine, but I then want to give each group (e.g. CR7 8LE  532714
> 168302) a unique identifier (say) Store 1, 2, 3 or some other unique
> identifier.
> 
>   
> 
> StorePC  StoreX StoreY meanPrice
> 
> 
> 
> 1 CR7 8LE  532714 168302  472.   Store 1
> 
> 2 E2 0RY   535652 182961  520.   Store 2
> 
> 3 E7 8NW   541428 184515  467.   Store 3
> 
> 4 KT2 5AU  517917 170243  522.   Store 4
> 
> 5 N17 6QA  533788 189994  523.   Store 5
> 
>   
> 
> 
>   [[alternative HTML version deleted]]
> 
> __
> [email protected] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 



Re: [R] Grouping by 3 variable and renaming groups

2018-05-26 Thread Rui Barradas

Hello,

Sorry, but I think my first answer is wrong.
You probably want something along the lines of


sp <- split(priceStore_Grps, priceStore_Grps$StorePC)
res <- lapply(seq_along(sp), function(i){
  sp[[i]]$StoreID <- paste("Store", i, sep = "_")
  sp[[i]]
})
res <- do.call(rbind, res)
row.names(res) <- NULL


Hope this helps,

Rui Barradas

On 5/26/2018 2:22 PM, Rui Barradas wrote:

Hello,

See if this is it:

priceStore_Grps$StoreID <- paste("Store", 
seq_len(nrow(priceStore_Grps)), sep = "_")



Hope this helps,

Rui Barradas

On 5/26/2018 2:03 PM, Jeff Reichman wrote:

ALCON


I'm trying to figure out how to rename groups in a data frame after 
groups
by selected variabels.  I am using the dplyr library to group my data 
by 3

variables as follows


# group by lat (StoreX)/long (StoreY)

priceStore <- LapTopSales[,c(4,5,15,16)]

priceStore <- priceStore[complete.cases(priceStore), ]  # keep only 
non NA

records

priceStore_Grps <- priceStore %>%

   group_by(StorePC, StoreX, StoreY) %>%

   summarize(meanPrice=(mean(RetailPrice)))


which results in .



priceStore_Grps


# A tibble: 15 x 4

# Groups:   StorePC, StoreX [?]

    StorePC  StoreX StoreY meanPrice

        

1 CR7 8LE  532714 168302  472.

2 E2 0RY   535652 182961  520.

3 E7 8NW   541428 184515  467.

4 KT2 5AU  517917 170243  522.

5 N17 6QA  533788 189994  523.


Which is fine, but I then want to give each group (e.g. CR7 8LE  532714
168302) a unique identifier (say) Store 1, 2, 3 or some other unique
identifier.


    StorePC  StoreX StoreY meanPrice

        

1 CR7 8LE  532714 168302  472.   Store 1

2 E2 0RY   535652 182961  520.   Store 2

3 E7 8NW   541428 184515  467.   Store 3

4 KT2 5AU  517917 170243  522.   Store 4

5 N17 6QA  533788 189994  523.   Store 5



[[alternative HTML version deleted]]

__
[email protected] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.



__
[email protected] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.




Re: [R] Grouping by 3 variable and renaming groups

2018-05-26 Thread Rui Barradas

Hello,

See if this is it:

priceStore_Grps$StoreID <- paste("Store", 
seq_len(nrow(priceStore_Grps)), sep = "_")



Hope this helps,

Rui Barradas

On 5/26/2018 2:03 PM, Jeff Reichman wrote:

ALCON

  


I'm trying to figure out how to rename groups in a data frame after groups
by selected variabels.  I am using the dplyr library to group my data by 3
variables as follows

  


# group by lat (StoreX)/long (StoreY)

priceStore <- LapTopSales[,c(4,5,15,16)]

priceStore <- priceStore[complete.cases(priceStore), ]  # keep only non NA
records

priceStore_Grps <- priceStore %>%

   group_by(StorePC, StoreX, StoreY) %>%

   summarize(meanPrice=(mean(RetailPrice)))

  


which results in .

  


priceStore_Grps


# A tibble: 15 x 4

# Groups:   StorePC, StoreX [?]

StorePC  StoreX StoreY meanPrice



1 CR7 8LE  532714 168302  472.

2 E2 0RY   535652 182961  520.

3 E7 8NW   541428 184515  467.

4 KT2 5AU  517917 170243  522.

5 N17 6QA  533788 189994  523.

  


Which is fine, but I then want to give each group (e.g. CR7 8LE  532714
168302) a unique identifier (say) Store 1, 2, 3 or some other unique
identifier.

  


StorePC  StoreX StoreY meanPrice



1 CR7 8LE  532714 168302  472.   Store 1

2 E2 0RY   535652 182961  520.   Store 2

3 E7 8NW   541428 184515  467.   Store 3

4 KT2 5AU  517917 170243  522.   Store 4

5 N17 6QA  533788 189994  523.   Store 5

  



[[alternative HTML version deleted]]

__
[email protected] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] Grouping in R

2015-06-18 Thread PIKAL Petr
Hi

We can only guess what you really want.

Maybe this.

set.seed(111)
cust<-sample(letters[1:5], 500, replace =T)
value<-sample(1:1000, 500)
month<-sample(1:12, 500, replace=T)
dat<-data.frame(cust, value, month)
dat.ag<-aggregate(dat$value, list(dat$month, dat$cust), sum)

> head(dat.ag)
  Group.1 Group.2    x
1   1   a 2444
2   2   a 6234
3   3   a 6082
4   4   a 3691
5   5   a 3044
6   6   a 3534

dput(dat.ag)
structure(list(Group.1 = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L,
12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L,
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L,
6L, 7L, 8L, 9L, 10L, 11L, 12L), Group.2 = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("a", "b",
"c", "d", "e"), class = "factor"), x = c(2444L, 6234L, 6082L,
3691L, 3044L, 3534L, 7444L, 1819L, 2295L, 4774L, 3659L, 1159L,
6592L, 1272L, 8245L, 2324L, 5189L, 3935L, 2945L, 2386L, 2796L,
2869L, 3142L, 4657L, 4411L, 6223L, 3266L, 3842L, 6056L, 7472L,
3879L, 7135L, 4544L, 4498L, 2703L, 3409L, 2748L, 2288L, 2654L,
4995L, 4626L, 5543L, 2162L, 4681L, 5853L, 6229L, 3001L, 5274L,
3852L, 2635L, 5643L, 2809L, 2988L, 3756L, 5180L, 2997L, 4883L,
4208L, 2669L, 3151L)), .Names = c("Group.1", "Group.2", "x"), row.names = c(NA,
-60L), class = "data.frame")
>

But maybe something different. Who knows?

If you wanted grouping by value use

?cut or ?findInterval
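A sketch of the cut() idea: bin each customer's total weight into a fixed number of bands, so 55,000 customers collapse into ten groups. All names here are made up for illustration.

```r
set.seed(1)

# Hypothetical per-customer total booked weight
cust_weight <- data.frame(cust   = paste0("C", 1:20),
                          weight = round(runif(20, 1, 1000)))

# Ten equal-width weight bands; every customer falls into exactly one
cust_weight$grp <- cut(cust_weight$weight, breaks = 10,
                       labels = paste0("G", 1:10))

table(cust_weight$grp)  # customers per band, ready for month-on-month summaries
```

findInterval() would do the same job with hand-chosen break points (say, commercially meaningful weight classes) instead of equal-width bands.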

Cheers
Petr


> -Original Message-
> From: R-help [mailto:[email protected]] On Behalf Of Shivi82
> Sent: Thursday, June 18, 2015 9:22 AM
> To: [email protected]
> Subject: [R] Grouping in R
>
> Hi All,
>
> I am working on a data where the total row count is 25+ and have
> approx.
> 20 variables. One of the var on which i need to summarize the data is
> Consignor i.e. seller name.
>
> Now the issue here is after deleting all the duplicate names i still
> have 55000 unique customer name and i am not sure on how to summarize
> the data.
>
> Is there a possibility that i could create 8 or 10 groups based on the
> weight or booking they made from our company and eventually all 55000
> customers would fall under these 10 groups. Then it could be easier for
> me to analyze in which group there is a variance on a month on month
> level.
>
>
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Grouping-
> in-R-tp4708800.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> [email protected] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.



This e-mail and any documents attached to it may be confidential and are 
intended only for its intended recipients.
If you received this e-mail by mistake, please immediately inform its sender. 
Delete the contents of this e-mail with all attachments and its copies from 
your system.
If you are not the intended recipient of this e-mail, you are not authorized to 
use, disseminate, copy or disclose this e-mail in any manner.
The sender of this e-mail shall not be liable for any possible damage caused by 
modifications of the e-mail or by delay with transfer of the e-mail.

Re: [R] grouping explanatory variables into "sets" for GLMM

2014-04-03 Thread Don McKenzie
Reading the Intro, as Bert suggests, would likely solve some of your problems. 
If you think about how many combinations it would take, using only one variable 
from each group in any one model, you would see that the number of individual 
models (12) is not so onerous that you couldn’t specify them one at a time.
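The 12 models Don mentions (1 land-use x 1 location x 3 management x 4 vegetation variables, one variable from each set) can also be enumerated rather than typed by hand. The variable names below are invented stand-ins for Maria's, and `y` is a placeholder response:

```r
# One entry per "set"; each candidate model takes exactly one variable per set
sets <- list(landuse    = "land_use",
             location   = "location",
             management = c("till", "manure", "annual_crop"),
             vegetation = c("guild1", "guild2", "guild3", "guild4"))

combos   <- expand.grid(sets, stringsAsFactors = FALSE)  # 1*1*3*4 = 12 rows
formulas <- apply(combos, 1, function(v) reformulate(v, response = "y"))

length(formulas)  # 12 candidate fixed-effect structures to fit one at a time
```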

On Apr 3, 2014, at 8:55 PM, Bert Gunter  wrote:

> Unless there is reason to keep the conversation private, always reply
> to the list. How will anyone else know that my answer wasn't
> satisfactory?
> 
> 1. I don't intend to go through your references. A minimal
> reproducible example of what you wish to do and what you tried would
> help.
> 
> 2. Have you read An Intro to R?
> 
> Cheers,
> Bert
> 
> Bert Gunter
> Genentech Nonclinical Biostatistics
> (650) 467-7374
> 
> "Data is not information. Information is not knowledge. And knowledge
> is certainly not wisdom."
> H. Gilbert Welch
> 
> 
> 
> 
> On Thu, Apr 3, 2014 at 5:14 PM, Maria Kernecker, PhD
>  wrote:
>> Thanks for getting back to me.
>> 
>> It seems I didn't write my question clearly and that it was misunderstood - 
>> even if it is easy to answer: I would like to reduce the number of 
>> explanatory variables in my model by using "sets" or categories that these 
>> variables belong to, like Rhodes et al. did in their chapter, or like 
>> Lentini et al. 2012 did in their paper.
>> 
>> Factor is not the answer I am looking for, unfortunately.
>> 
>> On Apr 3, 2014, at 11:28 AM, Bert Gunter wrote:
>> 
>>> Have you read "An Introduction to R" (or other online tutorial)? If
>>> not, please do so before posting further here. It sounds like you are
>>> missing very basic knowledge -- on factors -- which you need to learn
>>> about before proceeding.
>>> 
>>> ?factor
>>> 
>>> gives you the answer you seek, I believe.
>>> 
>>> Cheers,
>>> Bert
>>> 
>>> Bert Gunter
>>> Genentech Nonclinical Biostatistics
>>> (650) 467-7374
>>> 
>>> "Data is not information. Information is not knowledge. And knowledge
>>> is certainly not wisdom."
>>> H. Gilbert Welch
>>> 
>>> 
>>> 
>>> 
>>> On Thu, Apr 3, 2014 at 6:54 AM, Maria Kernecker
>>>  wrote:
 Dear all,
 
 I am trying to run a GLMM following the procedure described by Rhodes et 
 al. (Ch. 21) in the Zuur book Mixed effects models and extensions in R . 
 Like in his example, I have four "sets" of explanatory variables:
 1. Land use - 1 variable, factor (forest or agriculture)
 2. Location - 1 variable, factor (riparian or upland)
 3. Agricultural management - 3 variables that are binary (0 or 1 for till, 
 manure, annual crop)
 4. Vegetation patterns - 4 variables that are continuous (# of plant 
 species in 4 different functional guilds)
 
 How do I create these "sets"?  I would like to build my model with these 
 "sets" only instead of listing every variable.
 
 Also: is there a way of running all possible models with the different 
 combinations of these sets and/or variables, sort of like running ordistep 
 for ordinations?
 
 Thanks a bunch in advance for your help!
 Maria
 
 __
 [email protected] mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
>> 
> 
> __
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Don McKenzie
Research Ecologist
Pacific WIldland Fire Sciences Lab
US Forest Service

Affiliate Professor
School of Environmental and Forest Sciences 
College of the Environment
University of Washington
[email protected]

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping explanatory variables into "sets" for GLMM

2014-04-03 Thread Bert Gunter
Unless there is reason to keep the conversation private, always reply
to the list. How will anyone else know that my answer wasn't
satisfactory?

1. I don't intend to go through your references. A minimal
reproducible example of what you wish to do and what you tried would
help.

2. Have you read An Intro to R?

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
H. Gilbert Welch




On Thu, Apr 3, 2014 at 5:14 PM, Maria Kernecker, PhD
 wrote:
> Thanks for getting back to me.
>
> It seems I didn't write my question clearly and that it was misunderstood - 
> even if it is easy to answer: I would like to reduce the number of 
> explanatory variables in my model by using "sets" or categories that these 
> variables belong to, like Rhodes et al. did in their chapter, or like Lentini 
> et al. 2012 did in their paper.
>
> Factor is not the answer I am looking for, unfortunately.
>
> On Apr 3, 2014, at 11:28 AM, Bert Gunter wrote:
>
>> Have you read "An Introduction to R" (or other online tutorial)? If
>> not, please do so before posting further here. It sounds like you are
>> missing very basic knowledge -- on factors -- which you need to learn
>> about before proceeding.
>>
>> ?factor
>>
>> gives you the answer you seek, I believe.
>>
>> Cheers,
>> Bert
>>
>> Bert Gunter
>> Genentech Nonclinical Biostatistics
>> (650) 467-7374
>>
>> "Data is not information. Information is not knowledge. And knowledge
>> is certainly not wisdom."
>> H. Gilbert Welch
>>
>>
>>
>>
>> On Thu, Apr 3, 2014 at 6:54 AM, Maria Kernecker
>>  wrote:
>>> Dear all,
>>>
>>> I am trying to run a GLMM following the procedure described by Rhodes et 
>>> al. (Ch. 21) in the Zuur book Mixed effects models and extensions in R . 
>>> Like in his example, I have four "sets" of explanatory variables:
>>> 1. Land use - 1 variable, factor (forest or agriculture)
>>> 2. Location - 1 variable, factor (riparian or upland)
>>> 3. Agricultural management - 3 variables that are binary (0 or 1 for till, 
>>> manure, annual crop)
>>> 4. Vegetation patterns - 4 variables that are continuous (# of plant 
>>> species in 4 different functional guilds)
>>>
>>> How do I create these "sets"?  I would like to build my model with these 
>>> "sets" only instead of listing every variable.
>>>
>>> Also: is there a way of running all possible models with the different 
>>> combinations of these sets and/or variables, sort of like running ordistep 
>>> for ordinations?
>>>
>>> Thanks a bunch in advance for your help!
>>> Maria
>>>
>>> __
>>> [email protected] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping explanatory variables into "sets" for GLMM

2014-04-03 Thread Bert Gunter
Have you read "An Introduction to R" (or other online tutorial)? If
not, please do so before posting further here. It sounds like you are
missing very basic knowledge -- on factors -- which you need to learn
about before proceeding.

?factor

gives you the answer you seek, I believe.

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
H. Gilbert Welch




On Thu, Apr 3, 2014 at 6:54 AM, Maria Kernecker
 wrote:
> Dear all,
>
> I am trying to run a GLMM following the procedure described by Rhodes et al. 
> (Ch. 21) in the Zuur book Mixed effects models and extensions in R . Like in 
> his example, I have four "sets" of explanatory variables:
> 1. Land use - 1 variable, factor (forest or agriculture)
> 2. Location - 1 variable, factor (riparian or upland)
> 3. Agricultural management - 3 variables that are binary (0 or 1 for till, 
> manure, annual crop)
> 4. Vegetation patterns - 4 variables that are continuous (# of plant species 
> in 4 different functional guilds)
>
> How do I create these "sets"?  I would like to build my model with these 
> "sets" only instead of listing every variable.
>
> Also: is there a way of running all possible models with the different 
> combinations of these sets and/or variables, sort of like running ordistep 
> for ordinations?
>
> Thanks a bunch in advance for your help!
> Maria
>
> __
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping on a Distance Matrix

2014-02-13 Thread Bert Gunter
You need to re-think. What you said is nonsense. Use an appropriate
clustering algorithm.
(a can be near b; b can be near c; but a is not near c, using "near" =
closer than threshold)
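A minimal sketch of one appropriate algorithm, assuming single-linkage clustering is acceptable: its chaining behavior is exactly the transitive grouping that a fixed distance threshold implies, and on the poster's example matrix a cut at 0.1 recovers {A,B} and {C,D}.

```r
# Single-linkage clustering, cut at the distance threshold.
m <- matrix(c(0.00, 0.03, 0.77, 1.12,
              0.03, 0.00, 1.59, 1.11,
              0.77, 1.59, 0.00, 0.09,
              1.12, 1.11, 0.09, 0.00),
            nrow = 4, dimnames = list(LETTERS[1:4], LETTERS[1:4]))
fit <- hclust(as.dist(m), method = "single")
cutree(fit, h = 0.1)
# A B C D
# 1 1 2 2
```

Whether chaining is acceptable depends on the application, which is the point of the caveat above.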

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
H. Gilbert Welch




On Thu, Feb 13, 2014 at 12:00 AM, Dario Strbenac
 wrote:
> Hello,
>
> I'm looking for a function that groups elements below a certain distance 
> threshold, based on a distance matrix. In other words, I'd like to group 
> samples without using a standard clustering algorithm on the distance matrix. 
> For example, let the distance matrix be :
>
>      A     B     C     D
> A 0.00  0.03  0.77  1.12
> B 0.03  0.00  1.59  1.11
> C 0.77  1.59  0.00  0.09
> D 1.12  1.11  0.09  0.00
>
> Two clusters would be found with a cutoff of 0.1. The first contains A,B. The 
> second has C,D. Is there an efficient function that does this ? I can think 
> of how to do this recursively, but am hoping it's already been considered.
>
> --
> Dario Strbenac
> PhD Student
> University of Sydney
> Camperdown NSW 2050
> Australia
> __
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping commands so that variables are removed automatically - like functions

2014-01-20 Thread Rainer M Krug



On 01/20/14, 14:27 , jim holtman wrote:
> Check out the use of the 'local' function:

True - have completely forgotten the "local" function.

Thanks,

Rainer

> 
> 
>> gc()
> used (Mb) gc trigger (Mb) max used (Mb)
> Ncells 199420 10.7 407500 21.8   35 18.7
> Vcells 308004  2.4 786432  6.0   786424  6.0
>> result <- local({
> + a <- rnorm(100)  # big objects
> + b <- rnorm(100)
> + mean(a + b)  # return value
> + })
>> 
>> result
> [1] 0.0001819203
>> gc()
> used (Mb) gc trigger (Mb) max used (Mb)
> Ncells 199666 10.7 407500 21.8   35 18.7
> Vcells 308780  2.4  2975200 22.7  3710863 28.4
>> 
> 
> Jim Holtman Data Munger Guru
> 
> What is the problem that you are trying to solve? Tell me what you
> want to do, not how you want to do it.
> 
> 
> On Mon, Jan 20, 2014 at 8:12 AM, Rainer M Krug 
> wrote: Hi
> 
> I would like to group commands, so that after a group of commands
> has been executed, the variables defined in that group are
> automatically deleted.
> 
> My reasoning: I have a longer script which is used to load data, do 
> analysis and plot graphs, all part of a document (in org-mode / 
> emacs).
> 
> I have several datasets which are loaded, and each one is quite 
> big. So after doing one part of the job (e.g. analysing the data
> and storing the results) I want to delete all variables used to
> free space and to avoid having these variables being used in the
> next block and still having the old (for this block invalid)
> values.
> 
> I can't use rm(list=ls()) as I have some variables as "constants" 
> which do not change over the whole document and also some
> functions defined.
> 
> I could put each block in a function and then call the function
> and delete it afterwards, but this is as I see it abusing
> functions.
> 
> I don't want to keep track manually of the variables.
> 
> Therefore my question:
> 
> Can I do something like:
> 
> x <- 15
> 
> { # here begins the block
> a <- 1:100
> b <- 4:400
> } # here ends the block
> 
> # here a and b are not defined anymore
> # but x is still defined
> 
> {} is great for grouping the commands, but the variables are not 
> deleted afterwards.
> 
> Am I missing a language feature in R?
> 
> Rainer
> 
>> 
>> __ 
>> [email protected] mailing list 
>> https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the
>> posting guide http://www.R-project.org/posting-guide.html and
>> provide commented, minimal, self-contained, reproducible code.

- -- 
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa

Tel :   +33 - (0)9 53 10 27 44
Cell:   +33 - (0)6 85 62 59 98
Fax :   +33 - (0)9 58 10 27 44

Fax (D):+49 - (0)3 21 21 25 22 44

email:  [email protected]

Skype:  RMkrug

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping commands so that variables are removed automatically - like functions

2014-01-20 Thread jim holtman
Check out the use of the 'local' function:


> gc()
 used (Mb) gc trigger (Mb) max used (Mb)
Ncells 199420 10.7 407500 21.8   35 18.7
Vcells 308004  2.4 786432  6.0   786424  6.0
> result <- local({
+ a <- rnorm(100)  # big objects
+ b <- rnorm(100)
+ mean(a + b)  # return value
+ })
>
> result
[1] 0.0001819203
> gc()
 used (Mb) gc trigger (Mb) max used (Mb)
Ncells 199666 10.7 407500 21.8   35 18.7
Vcells 308780  2.4  2975200 22.7  3710863 28.4
>

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
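As the transcript above shows, `local()` gives exactly the block scoping asked about in the quoted message below: temporaries live in their own environment and disappear afterwards, while outer variables stay visible. A minimal standalone sketch:

```r
# local() evaluates the block in its own environment; a and b never
# reach the enclosing (here: global) environment.
x <- 15
res <- local({
  a <- 1:100        # temporary, gone when the block ends
  b <- 4:400
  sum(a) + x        # the block's last value is returned
})
res          # 5065
exists("a")  # FALSE
x            # 15: still defined
```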


On Mon, Jan 20, 2014 at 8:12 AM, Rainer M Krug  wrote:
>
> Hi
>
> I would like to group commands, so that after a group of commands has
> been executed, the variables defined in that group are automatically
> deleted.
>
> My reasoning: I have a longer script which is used to load data, do
> analysis and plot graphs, all part of a document (in org-mode /
> emacs).
>
> I have several datasets which are loaded, and each one is quite
> big. So after doing one part of the job (e.g. analysing the data and
> storing the results) I want to delete all variables used to free space
> and to avoid having these variables being used in the next block and
> still having the old (for this block invalid) values.
>
> I can't use rm(list=ls()) as I have some variables as "constants"
> which do not change over the whole document and also some functions
> defined.
>
> I could put each block in a function and then call the function and
> delete it afterwards, but this is as I see it abusing functions.
>
> I don't want to keep track manually of the variables.
>
> Therefore my question:
>
> Can I do something like:
>
> x <- 15
>
> { #here begins the block
> a <- 1:100
> b <- 4:400
> } # here ends the block
>
> # here a and b are not defined anymore
> # but x is still defined
>
> {} is great for grouping the commands, but the variables are not
> deleted afterwards.
>
> Am I missing a language feature in R?
>
> Rainer
>
> - --
> Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
> Biology, UCT), Dipl. Phys. (Germany)
>
> Centre of Excellence for Invasion Biology
> Stellenbosch University
> South Africa
>
> Tel :   +33 - (0)9 53 10 27 44
> Cell:   +33 - (0)6 85 62 59 98
> Fax :   +33 - (0)9 58 10 27 44
>
> Fax (D):+49 - (0)3 21 21 25 22 44
>
> email:  [email protected]
>
> Skype:  RMkrug
>
> __
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping Matrix by Columns; OHLC Data

2013-09-26 Thread arun
Hi Jake.

Sorry, I misunderstood what you wanted.
Instead of this:

lapply(split(indx,(indx-1)%%n+1),function(i) mat1[,i])

If I use:
res1<- lapply(split(indx,(indx-1)%/%n+1),function(i) mat1[,i])

#or
lapply(split(indx, as.numeric(gl(ncol(mat1), n, ncol(mat1)))), function(i) 
mat1[,i])



lapply(res1, head, 2)[1:2]
#$`1`
#      O  H  L  C
#[1,] 18 20 30 20
#[2,] 14 15 15 45
#
#$`2`
#      O  H  L  C
#[1,] 56  6 25 13
#[2,] 31 37 23 17

A.K.




So, I got it worked out. Thanks for your input. I see that you used a 
mod, which worked well for the application you solved, and one 
that will likely come up again. Anyway, here is the 
solution I was looking for: 


set.seed(24) 
 mat1<- matrix(sample(1:60,30*24,replace=TRUE),ncol=24) 
colnames(mat1)<- rep(c("O","H","L","C"),6) 
indx<-seq_along(colnames(mat1)) 
n<- length(unique(colnames(mat1))) 


res <-lapply(split(indx,rep(1:6,each = 4, times = 1)),function(i) mat1[,i]) 
##rep(1:6,each = 4, times = 1) 
## [1] 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5 6 6 6 6 

lapply(res,head,2) 


$`1` 
      O  H  L  C 
[1,] 18 20 30 20 
[2,] 14 15 15 45 

$`2` 
      O  H  L  C 
[1,] 56  6 25 13 
[2,] 31 37 23 17 

$`3` 
      O  H  L  C 
[1,] 51  4 29  8 
[2,] 60 22 15 35 

$`4` 
      O  H  L  C 
[1,] 24 23  1 44 
[2,] 12 52 10  8 

$`5` 
      O  H  L  C 
[1,] 24 10 57  5 
[2,] 43 30 44 25 

$`6` 
      O  H  L  C 
[1,] 52  2 16 13 
[2,] 34 42 60 12 

Thanks again 
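The renaming step described in the quoted motivation below (each four-column group becomes its own object with Open/High/Low/Close names) can be sketched as follows; the quantmod-specific conversion to an OHLC/xts object is left out:

```r
set.seed(24)
mat1 <- matrix(sample(1:60, 30 * 24, replace = TRUE), ncol = 24)
colnames(mat1) <- rep(c("O", "H", "L", "C"), 6)
# One index group of 4 consecutive columns per asset.
groups <- split(seq_len(ncol(mat1)), rep(seq_len(ncol(mat1) / 4), each = 4))
assets <- lapply(groups, function(i) {
  m <- mat1[, i]
  colnames(m) <- c("Open", "High", "Low", "Close")
  m
})
length(assets)         # 6 assets, each a 30 x 4 matrix
colnames(assets[[1]])  # "Open" "High" "Low" "Close"
```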


- Original Message -
From: arun 
To: R help 
Cc: 
Sent: Thursday, September 26, 2013 5:15 PM
Subject: Re: Grouping Matrix by Columns; OHLC Data

HI,
May be this helps:

set.seed(24)
 mat1<- matrix(sample(1:60,30*24,replace=TRUE),ncol=24)
colnames(mat1)<- rep(c("O","H","L","C"),6)
indx<-seq_along(colnames(mat1))
n<- length(unique(colnames(mat1)))
 res<- lapply(split(indx,(indx-1)%%n+1),function(i) mat1[,i])
lapply(res,head,2)
#$`1`
#  O  O  O  O  O  O
#[1,] 18 56 51 24 24 52
#[2,] 14 31 60 12 43 34
#
#$`2`
#  H  H  H  H  H  H
#[1,] 20  6  4 23 10  2
#[2,] 15 37 22 52 30 42
#
#$`3`
#  L  L  L  L  L  L
#[1,] 30 25 29  1 57 16
#[2,] 15 23 15 10 44 60
#
#$`4`
#  C  C  C  C  C  C
#[1,] 20 13  8 44  5 13
#[2,] 45 17 35  8 25 12

A.K.



Motivation: 

Bring in data containing a number of columns divisible by 4. 
This data contains several different assets and the columns correspond 
to Open,High,Low,Close, Open,High,Low,Close,  etc (thus divisible by
4). From where I am getting this data, the header is not labeled as 
Open,High,Low,Close, but rather just has the asset symbol. 

The end goal is to have each Open,High,Low,Close set as its own 
OHLC object, to be run through different volatility functions (via 
quantmod). 

I believe I am best served by first grouping the original data 
so that each asset is its own object, with 4 columns. Then i can rename 
the columns to be: 
colnames(function$asset) <-c("Open", "High","Low", "Close") 

I've attempted to use split, but am having trouble with split along the 
columns. 

Obviously I could manipulate the indexing, with something like 
data[i:i+4] and use a loop. Maybe this indexing approach would work with
use of apply(). 


Previously, I've been using Mathematica for most of my data 
manipulation, and there I would partition the entire data set i.e. 
Matrix, into   column# / 4 separate objects.  So, in that case I have a 3
dimensional object. I'd then call the object by its 3rd dimension index
# [][#]. 

I'm having trouble doing that here. Any thoughts, or at the least  helping me 
to group the data by column. 

For the sake of possible examples, lets say the dimensions of my data is n.rows 
= 30, n.col = 24 


__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping Matrix by Columns; OHLC Data

2013-09-26 Thread arun
HI,
May be this helps:

set.seed(24)
 mat1<- matrix(sample(1:60,30*24,replace=TRUE),ncol=24)
colnames(mat1)<- rep(c("O","H","L","C"),6)
indx<-seq_along(colnames(mat1))
n<- length(unique(colnames(mat1)))
 res<- lapply(split(indx,(indx-1)%%n+1),function(i) mat1[,i])
lapply(res,head,2)
#$`1`
#  O  O  O  O  O  O
#[1,] 18 56 51 24 24 52
#[2,] 14 31 60 12 43 34
#
#$`2`
#  H  H  H  H  H  H
#[1,] 20  6  4 23 10  2
#[2,] 15 37 22 52 30 42
#
#$`3`
#  L  L  L  L  L  L
#[1,] 30 25 29  1 57 16
#[2,] 15 23 15 10 44 60
#
#$`4`
#  C  C  C  C  C  C
#[1,] 20 13  8 44  5 13
#[2,] 45 17 35  8 25 12

A.K.



Motivation: 

Bring in data containing a number of columns divisible by 4. 
This data contains several different assets and the columns correspond 
to Open,High,Low,Close, Open,High,Low,Close,  etc (thus divisible by
 4). From where I am getting this data, the header is not labeled as 
Open,High,Low,Close, but rather just has the asset symbol. 

The end goal is to have each Open,High,Low,Close set as its own 
OHLC object, to be run through different volatility functions (via 
quantmod). 

I believe I am best served by first grouping the original data 
so that each asset is its own object, with 4 columns. Then i can rename 
the columns to be: 
colnames(function$asset) <-c("Open", "High","Low", "Close") 

I've attempted to use split, but am having trouble with split along the 
columns. 

Obviously I could manipulate the indexing, with something like 
data[i:i+4] and use a loop. Maybe this indexing approach would work with
 use of apply(). 


Previously, I've been using Mathematica for most of my data 
manipulation, and there I would partition the entire data set i.e. 
Matrix, into   column# / 4 separate objects.  So, in that case I have a 3
 dimensional object. I'd then call the object by its 3rd dimension index
 # [][#]. 

I'm having trouble doing that here. Any thoughts, or at the least  helping me 
to group the data by column. 

For the sake of possible examples, lets say the dimensions of my data is n.rows 
= 30, n.col = 24 


__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping variables by an irregular time interval

2013-09-21 Thread Raoni Rodrigues
Arun brought to my attention that I made a mistake with the example data
set. I am now sending the correct one, with the same text explaining my problem.

Sorry all of you for the confusion.

I have a very large data frame (more than 5 million lines) as below (dput
example at the end of mail):

  Station Antenna Tag            DateTime Power Events
1       1       2 999 22/07/2013 11:00:21    17      1
2       1       2 999 22/07/2013 11:33:47    31      1
3       1       2 999 22/07/2013 11:34:00    19      1
4       1       2 999 22/07/2013 11:34:16    53      1
5       1       2 999 22/07/2013 11:43:20    15      1
6       1       2 999 22/07/2013 11:43:35    17      1

For each Tag, in each Antenna, in each Station, I need to create 10-minute
intervals and compute the sum of Events and the mean of Power in each
interval, as below (complete wanted output at the end of mail).

  Station Antenna Tag       StartDateTime         EndDateTime Power Events
1       1       2 999 22/07/2013 11:00:21 22/07/2013 11:00:21    17      1
2       1       2 999 22/07/2013 11:34:16 22/07/2013 11:43:35    27      5
3       1       2 999 22/07/2013 11:44:35 22/07/2013 11:45:40    17     14
4       2       1   1 25/07/2013 14:19:45 25/07/2013 14:20:39    65      4
5       2       1   2 25/07/2013 14:20:13 25/07/2013 14:25:14    21      3
6       2       1   4 25/07/2013 14:20:46 25/07/2013 14:20:46    28      1

Showing the start and end points of each interval is optional, not
necessary. I put both in to show the irregular time intervals (look at tag
999).
First I tried a for-loop, without success. After that, I tried this code:

require (plyr)

ddply (data, .(Station, Antenna, Tag, cut(data$DateTime, "10 min")),
summarise, Power = round (mean(Power), 0), Events = sum (Events))

It is almost what I want, but cut() divides the data into regular time
intervals; in some cases I do not have this, and it splits a single
observation into two.
Any ideas to solve this issue?
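One reading of the requirement — each window opens at its first event and closes a fixed number of seconds later, with the next window opening at the next event after that — avoids cut()'s fixed grid. A minimal sketch (the function name and the width default are my own; it would be applied within each Station/Antenna/Tag group, e.g. via split() or plyr, before summarising):

```r
# Assign window ids: a window starts at its first event and spans at
# most `width` seconds; the next event after that opens a new window.
window_id <- function(times, width = 600) {
  stopifnot(!is.unsorted(times))
  id <- integer(length(times))
  id[1] <- 1L
  start <- times[1]
  for (i in seq_along(times)[-1]) {
    if (as.numeric(difftime(times[i], start, units = "secs")) > width) {
      start <- times[i]
      id[i] <- id[i - 1L] + 1L
    } else {
      id[i] <- id[i - 1L]
    }
  }
  id
}

t <- as.POSIXct(c("2013-07-22 11:00:21", "2013-07-22 11:33:47",
                  "2013-07-22 11:43:20"), tz = "UTC")
window_id(t)  # 1 2 2: the third event is within 10 min of the second
```

The resulting id column can then replace the cut() term in the ddply() call, so Power and Events are summarised per irregular window.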

R version 3.0.1 (2013-05-16) -- "Good Sport"
Platform: x86_64-w64-mingw32/x64 (64-bit)
Windows 7 Professional

Thanks in advance,

Raoni
-- 
Raoni Rosa Rodrigues
Research Associate of Fish Transposition Center CTPeixes
Universidade Federal de Minas Gerais - UFMG
Brasil
[email protected]


##complete data dput

structure(list(Station = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Antenna = c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), Tag = c(999L, 999L, 999L, 999L, 999L, 999L,
999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L,
999L, 999L, 999L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 4L, 18L, 18L, 18L,
21L, 22L, 36L, 36L, 36L, 36L, 36L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L), DateTime = structure(c(6L,
7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L,
20L, 21L, 22L, 23L, 24L, 25L, 68L, 70L, 72L, 73L, 71L, 75L, 86L,
74L, 64L, 64L, 65L, 87L, 67L, 1L, 2L, 3L, 4L, 5L, 26L, 27L, 28L,
29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L,
42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L, 54L,
55L, 56L, 57L, 58L, 59L, 60L, 61L, 62L, 63L, 66L, 69L, 76L, 77L,
78L, 79L, 80L, 81L, 82L, 83L, 84L, 85L, 88L, 89L, 90L, 91L, 92L,
93L, 94L, 95L, 96L, 97L, 98L, 99L, 100L, 101L, 102L, 103L, 104L,
105L, 106L, 107L, 108L, 109L, 110L, 111L, 112L, 113L, 114L, 115L,
116L, 117L), .Label = c("19/06/2013 22:15:49", "19/06/2013 22:15:54",
"19/06/2013 22:15:59", "19/06/2013 22:16:24", "19/06/2013 22:16:29",
"22/07/2013 11:00:21", "22/07/2013 11:33:47", "22/07/2013 11:34:00",
"22/07/2013 11:34:16", "22/07/2013 11:43:20", "22/07/2013 11:43:35",
"22/07/2013 11:44:35", "22/07/2013 11:44:41", "22/07/2013 11:44:42",
"22/07/2013 11:44:43", "22/07/2013 11:44:44", "22/07/2013 11:44:59",
"22/07/2013 11:45:11", "22/07/2013 11:45:29", "22/07/2013 11:45:30",
"22/07/2013 11:45:31

Re: [R] grouping followed by finding frequent patterns in R

2013-03-10 Thread Bert Gunter
1. Please cc to the list, as I have here, unless your comments are off topic.

2. Use dput() (?dput) to include **small** amounts of data in your
message, as attachments are generally stripped from r-help.

3. I have no experience with itemsets or the arules package, but a
quick glance at the docs there said that your data argument must be in
a specific form coercible into an S4 "transactions" class. I suspect
that neither your initial data frame nor the list deriving from split
is, but maybe someone familiar with the package can tell you for sure.
That's why you need to cc to the list.

-- Bert


On Sun, Mar 10, 2013 at 7:04 AM, Dhiman Biswas  wrote:
> Dear Bert,
>
> My intention is to mine frequent itemsets of TRN_TYP for individual CIN out
> of that data.
> But the problem is using eclat after splitting gives the following error:
>
> Error in eclat(list) : internal error in trio library
>
> PS: I have attached my dataset.
>
>
> On Sat, Mar 9, 2013 at 8:27 PM, Bert Gunter  wrote:
>>
>> I **suggest** that you explain what you wish to accomplish using a
>> reproducible example rather than telling us what packages you think
>> you should use. I believe you are making things too complicated; e.g.
>> what do you mean by "frequent patterns"?  Moreover, "basket format" is
>> rather unclear -- and may well be unnecessary. But using lists, it
>> could be simply accomplished by
>>
>> ?split  ## as in
>> the_list <- with(yourdata, split(TYP,  CIN.TRN))
>>
>> or possibly
>>
>> the_list <- with(yourdata, tapply(TYP,CIN.TRN, FUN = table))
>>
>> Of course, these may be irrelevant and useless, but without knowing
>> your purpose ...?
>>
>> -- Bert
>>
>> On Sat, Mar 9, 2013 at 4:37 AM, Dhiman Biswas 
>> wrote:
>> > I have a data in the following form :
>> > CIN TRN_TYP
>> > 9079954 1
>> > 9079954 2
>> > 9079954 3
>> > 9079954 4
>> > 9079954 5
>> > 9079954 4
>> > 9079954 5
>> > 9079954 6
>> > 9079954 7
>> > 9079954 8
>> > 9079954 9
>> > 9079954 9
>> > ..
>> > ..
>> > ..
>> > there are 100 types of CIN (9079954,12441087,15246633,...) and
>> > respective
>> > TRN_TYP
>> >
>> > first of all, I want this data to be grouped into basket format:
>> > 9079954   1, 2, 3, 4, 5, ...
>> > 12441087  19, 14, 21, 3, 7, ...
>> > .
>> > .
>> > .
>> > and then apply eclat from arules package to find frequent patterns.
>> >
>> > 1) I ran the following code:
>> > file<-read.csv("D:/R/Practice/Data_Input_NUM.csv")
>> > file <- file[!duplicated(file),]
>> > eclat(split(file$TRN_TYP,file$CIN))
>> >
>> > but it gave me the following error:
>> > Error in asMethod(object) : can not coerce list with transactions with
>> > duplicated items
>> >
>> > 2) I ran this code:
>> > file<-read.csv("D:/R/Practice/Data_Input_NUM.csv")
>> > file_new<-file[,c(3,6)] # because my file Data_Input_NUM has many other
>> > columns as well, so I am selecting only CIN and TRN_TYP
>> > file_new <- file_new[!duplicated(file_new),]
>> > eclat(split(file_new$TRN_TYP,file_new$CIN))
>> >
>> > but again:
>> > Error in eclat(split(file_new$TRN_TYP, file_new$CIN)) :
>> >   internal error in trio library
>> >
>> > PLEASE HELP
>> >
>> > [[alternative HTML version deleted]]
>> >
>> > __
>> > [email protected] mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>> --
>>
>> Bert Gunter
>> Genentech Nonclinical Biostatistics
>>
>> Internal Contact Info:
>> Phone: 467-7374
>> Website:
>>
>> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
>
>



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping followed by finding frequent patterns in R

2013-03-09 Thread Bert Gunter
I **suggest** that you explain what you wish to accomplish using a
reproducible example rather than telling us what packages you think
you should use. I believe you are making things too complicated; e.g.
what do you mean by "frequent patterns"?  Moreover, "basket format" is
rather unclear -- and may well be unnecessary. But using lists, it
could be simply accomplished by

?split  ## as in
the_list <- with(yourdata, split(TYP,  CIN.TRN))

or possibly

the_list <- with(yourdata, tapply(TYP,CIN.TRN, FUN = table))

Of course, these may be irrelevant and useless, but without knowing
your purpose ...?
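
A minimal sketch of the split()/tapply() suggestion above. The data frame and its values are made up for illustration; only the column names CIN and TRN_TYP come from the question:

```r
# Toy stand-in for the poster's data: one row per (customer, transaction type)
yourdata <- data.frame(
  CIN     = c(9079954, 9079954, 9079954, 12441087, 12441087),
  TRN_TYP = c(1, 2, 3, 19, 14)
)

# One list element per CIN, holding that customer's transaction types
the_list <- with(yourdata, split(TRN_TYP, CIN))

# Or tabulate how often each transaction type occurs per CIN
the_table <- with(yourdata, tapply(TRN_TYP, CIN, FUN = table))
```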

-- Bert

On Sat, Mar 9, 2013 at 4:37 AM, Dhiman Biswas  wrote:
> I have a data in the following form :
> CIN TRN_TYP
> 9079954 1
> 9079954 2
> 9079954 3
> 9079954 4
> 9079954 5
> 9079954 4
> 9079954 5
> 9079954 6
> 9079954 7
> 9079954 8
> 9079954 9
> 9079954 9
> ..
> ..
> ..
> there are 100 types of CIN (9079954,12441087,15246633,...) and respective
> TRN_TYP
>
> first of all, I want this data to be grouped into basket format:
> 9079954   1, 2, 3, 4, 5, 
> 12441087  19, 14, 21, 3, 7, ...
> .
> .
> .
> and then apply eclat from arules package to find frequent patterns.
>
> 1) I ran the following code:
> file<-read.csv("D:/R/Practice/Data_Input_NUM.csv")
> file <- file[!duplicated(file),]
> eclat(split(file$TRN_TYP,file$CIN))
>
> but it gave me the following error:
> Error in asMethod(object) : can not coerce list with transactions with
> duplicated items
>
> 2) I ran this code:
> file<-read.csv("D:/R/Practice/Data_Input_NUM.csv")
> file_new<-file[,c(3,6)] # because my file Data_Input_NUM has many other
> columns as well, so I am selecting only CIN and TRN_TYP
> file_new <- file_new[!duplicated(file_new),]
> eclat(split(file_new$TRN_TYP,file_new$CIN))
>
> but again:
> Error in eclat(split(file_new$TRN_TYP, file_new$CIN)) :
>   internal error in trio library
>
> PLEASE HELP
>
> [[alternative HTML version deleted]]
>
> __
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping elements of a data frame

2013-01-15 Thread arun
Hi,
Try this (the last part of your question was not clear):
A.df<-read.table(text="
        a c 0.9
    b  x 0.8
    b z 0.5
    c y 0.9
    c x 0.7
    c z 0.6
",sep="",header=FALSE,stringsAsFactors=FALSE)
 lst1<-split(A.df[,-1],A.df$V1)
lst1
#$a
#  V2  V3
#1  c 0.9
#
#$b
#  V2  V3
#2  x 0.8
#3  z 0.5
#
#$c
#  V2  V3
#4  y 0.9
#5  x 0.7
#6  z 0.6


A.K.



- Original Message -
From: Nuri Alpay Temiz 
To: [email protected]
Cc: 
Sent: Tuesday, January 15, 2013 12:10 PM
Subject: [R] grouping elements of a data frame

Hi everyone,

I have a question on selecting and grouping elements of a data frame. For 
example:

A.df<- [ a c 0.9
             b  x 0.8
             b z 0.5
             c y 0.9
             c x 0.7
             c z 0.6]


I want to create a list of a data frame that gives me the unique values of 
column 1 of A.df so that i can create intersects. That is:

B[a]<- [ c 0.9]

B[b]<- [ x 0.8
             z 0.5]

B[c]<- [ y 0.9
             x 0.7
             z 0.6]


B[c] n B[b] <- c(x,z)


How can I accomplish this?

Thanks,
Al
                    
            
__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping elements of a data frame

2013-01-15 Thread David Winsemius

On Jan 15, 2013, at 9:10 AM, Nuri Alpay Temiz wrote:

> Hi everyone,
> 
> I have a question on selecting and grouping elements of a data frame. For 
> example:
> 
> A.df<- [ a c 0.9
> b  x 0.8
> b z 0.5
> c y 0.9
> c x 0.7
> c z 0.6]

That is not R code. Matlab? Python?

> 
> 
> I want to create a list of a data frame that gives me the unique values of 
> column 1 of A.df so that i can create intersects. That is:
> 
> B[a]<- [ c 0.9]
> 
> B[b]<- [ x 0.8
> z 0.5]
> 
> B[c]<- [ y 0.9
> x 0.7
> z 0.6]
> 
> 
> B[c] n B[b] <- c(x,z)
> 

That's some sort of coded message? We are supposed to know what the "n" 
operation will do when assigned a vector?


Assuming you really do have a data frame named B:

intersect(B$c, B$b)

Please code up examples in R in the future.
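
Tying the two answers together: if B is the list produced by split() (as in arun's reply in this thread), the intersection asked about can be sketched as follows (the data values are the ones from the question):

```r
A.df <- read.table(text = "
a c 0.9
b x 0.8
b z 0.5
c y 0.9
c x 0.7
c z 0.6
", header = FALSE, stringsAsFactors = FALSE)

# One data frame per unique value of column 1
B <- split(A.df[, -1], A.df$V1)

# Elements common to groups 'c' and 'b' (the "B[c] n B[b]" in the question)
intersect(B[["c"]]$V2, B[["b"]]$V2)   # "x" "z"
```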

-- 

David Winsemius
Alameda, CA, USA

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping distances

2012-06-11 Thread Jhope
Thank you Rui, 

I am trying to create a column in the data file turtlehatch.csv

Saludos, Jean

--
View this message in context: 
http://r.789695.n4.nabble.com/Grouping-distances-tp4632985p4632989.html
Sent from the R help mailing list archive at Nabble.com.

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping distances

2012-06-11 Thread Rui Barradas

Hello,

It's easy to create a new column. Since you haven't said where nor the 
type of data structure you are using, I'll try to answer both.

Suppose that 'x' is a matrix. Then

newcolumn <- newvalues
x2 <- cbind(x, newcolumn)  # new column added to x, result in x2

Suppose that 'y' is a data.frame. Then the same would do it, or

y$newcolumn <- newvalues

Now, I believe that the new values come from your function. If so, you 
must assign the function value to some variable outside the function.


htlindex <- HTL.index(...etc...)  # 'htlindex' is the 'newvalues' above


Two extra notes.
One, rowSums() does what your apply() instructions do.

Second, first you multiply then you divide, to give 'weights'. I think 
this is just an example, not the real function.


Hope this helps,

Rui Barradas

Em 11-06-2012 07:01, Jhope escreveu:

Hi R-listers,

I am trying to group my HTL data, this is a column of data of "Distances to
the HTL" data = turtlehatch. I would like to create an Index of distances
(0-5m, 6-10, 11-15, 16-20... up to 60). And then create a new file with this
HTLIndex in a column.

So far I have gotten this far:

HTL.index <- function (values, weights=c(0, 5, 10, 15, 20, 25, 30, 35, 40,
45, 50, 55, 60)) {
hope <-values * weights
return (apply(hope, 1, sum)/apply(values, 1, sum))
}
write.csv(turtlehatch, "HTLIndex", row.names=FALSE)


But I do not seem to be able to create a new column in a new file.

Please advise, Jean

--
View this message in context: 
http://r.789695.n4.nabble.com/Grouping-distances-tp4632985.html
Sent from the R help mailing list archive at Nabble.com.

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping function

2012-05-08 Thread arun
HI Sarah,

I ran the same code from your reply email.  For makegroup2, the results are
0 in place of NA.

> makegroup1 <- function(x,y) {
+ group <- numeric(length(x))
+ group[x <= 1990 & y > 1990] <- 1
+ group[x <= 1991 & y > 1991] <- 2
+ group[x <= 1992 & y > 1992] <- 3
+ group
+ }
> makegroup2 <- function(x, y) {
+   ifelse(x <= 1990 & y > 1990, 1,
+   ifelse(x <= 1991 & y > 1991, 2,
+ ifelse(x <= 1992 & y > 1992, 3, 0)))
+ }
> makegroup1(df$begin,df$end)
 [1] 3 3 3 0 0 3 3 0 0 0 3 0 0 0 0
> makegroup2(df$begin,df$end)
 [1] 1 2 3 0 0 2 3 0 0 0 3 0 0 0 0


A. K.




- Original Message -
From: Sarah Goslee 
To: [email protected]
Cc: "[email protected]" 
Sent: Tuesday, May 8, 2012 2:33 PM
Subject: Re: [R] grouping function

Hi,

On Tue, May 8, 2012 at 2:17 PM, Geoffrey Smith  wrote:
> Hello, I would like to write a function that makes a grouping variable for
> some panel data .  The grouping variable is made conditional on the begin
> year and the end year.  Here is the code I have written so far.
>
> name <- c(rep('Frank',5), rep('Tony',5), rep('Edward',5));
> begin <- c(seq(1990,1994), seq(1991,1995), seq(1992,1996));
> end <- c(seq(1995,1999), seq(1995,1999), seq(1996,2000));
>
> df <- data.frame(name, begin, end);
> df;

Thanks for providing reproducible data. Two minor points: you don't
need ; at the end of lines, and calling your data frame df is
confusing because there's a df() function.

> #This is the part I am stuck on;
>
> makegroup <- function(x,y) {
>  group <- 0
>  if (x <= 1990 & y > 1990) {group==1}
>  if (x <= 1991 & y > 1991) {group==2}
>  if (x <= 1992 & y > 1992) {group==3}
>  return(x,y)
> }
>
> makegroup(df$begin,df$end);
>
> #I am looking for output where each observation belongs to a group
> conditional on the begin year and end year.  I would also like to use a for
> loop for programming accuracy as well;

This isn't a clear specification:
1990, 1994 for instance fits into all three groups. Do you want to
extend this to more start years, or are you only interested in those
three? Assuming end is always >= start, you don't even need to
consider the end years in your grouping.

Here are two methods, one that "looks like" your pseudocode, and one
that is more R-ish. They give different results because of different
handling of cases that fit all three groups. Rearranging the
statements in makegroup1() from broadest to most restrictive would
make it give the same result as makegroup2().


makegroup1 <- function(x,y) {
group <- numeric(length(x))
group[x <= 1990 & y > 1990] <- 1
group[x <= 1991 & y > 1991] <- 2
group[x <= 1992 & y > 1992] <- 3
group
}

makegroup2 <- function(x, y) {
   ifelse(x <= 1990 & y > 1990, 1,
      ifelse(x <= 1991 & y > 1991, 2,
      ifelse(x <= 1992 & y > 1992, 3, 0)))
}

> makegroup1(df$begin,df$end)
[1] 3 3 3 0 0 3 3 0 0 0 3 0 0 0 0
> makegroup2(df$begin,df$end)
[1]  1  2  3 NA NA  2  3 NA NA NA  3 NA NA NA NA
> df


But really, it's a better idea to develop an unambiguous statement of
your desired output.

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping function

2012-05-08 Thread Sarah Goslee
Sorry, yes: I changed it before posting to more closely match the
default value in the pseudocode. That's a very minor issue: the very
last value in the nested ifelse() statements is what's used by
default.

Sarah

On Tue, May 8, 2012 at 2:46 PM, arun  wrote:
> HI Sarah,
>
> I ran the same code from your reply email.  For makegroup2, the results
> are 0 in place of NA.
>
>> makegroup1 <- function(x,y) {
> + group <- numeric(length(x))
> + group[x <= 1990 & y > 1990] <- 1
> + group[x <= 1991 & y > 1991] <- 2
> + group[x <= 1992 & y > 1992] <- 3
> + group
> + }
>> makegroup2 <- function(x, y) {
> +   ifelse(x <= 1990 & y > 1990, 1,
> +   ifelse(x <= 1991 & y > 1991, 2,
> + ifelse(x <= 1992 & y > 1992, 3, 0)))
> + }
>> makegroup1(df$begin,df$end)
>  [1] 3 3 3 0 0 3 3 0 0 0 3 0 0 0 0
>> makegroup2(df$begin,df$end)
>  [1] 1 2 3 0 0 2 3 0 0 0 3 0 0 0 0
>
>
> A. K.
>
>
>
>
> - Original Message -
> From: Sarah Goslee 
> To: [email protected]
> Cc: "[email protected]" 
> Sent: Tuesday, May 8, 2012 2:33 PM
> Subject: Re: [R] grouping function
>
> Hi,
>
> On Tue, May 8, 2012 at 2:17 PM, Geoffrey Smith  wrote:
>> Hello, I would like to write a function that makes a grouping variable for
>> some panel data.  The grouping variable is made conditional on the begin
>> year and the end year.  Here is the code I have written so far.
>>
>> name <- c(rep('Frank',5), rep('Tony',5), rep('Edward',5));
>> begin <- c(seq(1990,1994), seq(1991,1995), seq(1992,1996));
>> end <- c(seq(1995,1999), seq(1995,1999), seq(1996,2000));
>>
>> df <- data.frame(name, begin, end);
>> df;
>
> Thanks for providing reproducible data. Two minor points: you don't
> need ; at the end of lines, and calling your data frame df is
> confusing because there's a df() function.
>
>> #This is the part I am stuck on;
>>
>> makegroup <- function(x,y) {
>>  group <- 0
>>  if (x <= 1990 & y > 1990) {group==1}
>>  if (x <= 1991 & y > 1991) {group==2}
>>  if (x <= 1992 & y > 1992) {group==3}
>>  return(x,y)
>> }
>>
>> makegroup(df$begin,df$end);
>>
>> #I am looking for output where each observation belongs to a group
>> conditional on the begin year and end year.  I would also like to use a for
>> loop for programming accuracy as well;
>
> This isn't a clear specification:
> 1990, 1994 for instance fits into all three groups. Do you want to
> extend this to more start years, or are you only interested in those
> three? Assuming end is always >= start, you don't even need to
> consider the end years in your grouping.
>
> Here are two methods, one that "looks like" your pseudocode, and one
> that is more R-ish. They give different results because of different
> handling of cases that fit all three groups. Rearranging the
> statements in makegroup1() from broadest to most restrictive would
> make it give the same result as makegroup2().
>
>
> makegroup1 <- function(x,y) {
> group <- numeric(length(x))
> group[x <= 1990 & y > 1990] <- 1
> group[x <= 1991 & y > 1991] <- 2
> group[x <= 1992 & y > 1992] <- 3
> group
> }
>
> makegroup2 <- function(x, y) {
>    ifelse(x <= 1990 & y > 1990, 1,
>       ifelse(x <= 1991 & y > 1991, 2,
>       ifelse(x <= 1992 & y > 1992, 3, 0)))
> }
>
>> makegroup1(df$begin,df$end)
> [1] 3 3 3 0 0 3 3 0 0 0 3 0 0 0 0
>> makegroup2(df$begin,df$end)
> [1]  1  2  3 NA NA  2  3 NA NA NA  3 NA NA NA NA
>> df
>
>
> But really, it's a better idea to develop an unambiguous statement of
> your desired output.
>
> Sarah
>

-- 
Sarah Goslee
http://www.functionaldiversity.org

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping function

2012-05-08 Thread Sarah Goslee
Hi,

On Tue, May 8, 2012 at 2:17 PM, Geoffrey Smith  wrote:
> Hello, I would like to write a function that makes a grouping variable for
> some panel data.  The grouping variable is made conditional on the begin
> year and the end year.  Here is the code I have written so far.
>
> name <- c(rep('Frank',5), rep('Tony',5), rep('Edward',5));
> begin <- c(seq(1990,1994), seq(1991,1995), seq(1992,1996));
> end <- c(seq(1995,1999), seq(1995,1999), seq(1996,2000));
>
> df <- data.frame(name, begin, end);
> df;

Thanks for providing reproducible data. Two minor points: you don't
need ; at the end of lines, and calling your data frame df is
confusing because there's a df() function.

> #This is the part I am stuck on;
>
> makegroup <- function(x,y) {
>  group <- 0
>  if (x <= 1990 & y > 1990) {group==1}
>  if (x <= 1991 & y > 1991) {group==2}
>  if (x <= 1992 & y > 1992) {group==3}
>  return(x,y)
> }
>
> makegroup(df$begin,df$end);
>
> #I am looking for output where each observation belongs to a group
> conditional on the begin year and end year.  I would also like to use a for
> loop for programming accuracy as well;

This isn't a clear specification:
1990, 1994 for instance fits into all three groups. Do you want to
extend this to more start years, or are you only interested in those
three? Assuming end is always >= start, you don't even need to
consider the end years in your grouping.

Here are two methods, one that "looks like" your pseudocode, and one
that is more R-ish. They give different results because of different
handling of cases that fit all three groups. Rearranging the
statements in makegroup1() from broadest to most restrictive would
make it give the same result as makegroup2().


makegroup1 <- function(x,y) {
 group <- numeric(length(x))
 group[x <= 1990 & y > 1990] <- 1
 group[x <= 1991 & y > 1991] <- 2
 group[x <= 1992 & y > 1992] <- 3
 group
}

makegroup2 <- function(x, y) {
   ifelse(x <= 1990 & y > 1990, 1,
  ifelse(x <= 1991 & y > 1991, 2,
   ifelse(x <= 1992 & y > 1992, 3, 0)))
}

> makegroup1(df$begin,df$end)
 [1] 3 3 3 0 0 3 3 0 0 0 3 0 0 0 0
> makegroup2(df$begin,df$end)
 [1]  1  2  3 NA NA  2  3 NA NA NA  3 NA NA NA NA
> df


But really, it's a better idea to develop an unambiguous statement of
your desired output.
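
If only the begin year matters (end is always >= begin in the example data), the same grouping can also be sketched with cut(). This mirrors makegroup2(), with 0 instead of NA for ungrouped rows; makegroup3 is a hypothetical name:

```r
makegroup3 <- function(x) {
  # Intervals: <=1990 -> 1, <=1991 -> 2, <=1992 -> 3, later -> 4
  g <- cut(x, breaks = c(-Inf, 1990, 1991, 1992, Inf), labels = FALSE)
  ifelse(g > 3, 0, g)   # anything after 1992 gets group 0
}

begin <- c(seq(1990, 1994), seq(1991, 1995), seq(1992, 1996))
makegroup3(begin)   # 1 2 3 0 0 2 3 0 0 0 3 0 0 0 0
```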

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping and/or splitting

2012-04-03 Thread Ashish Agarwal
Thanks a ton!
It seemed weird because I expected the groups to come out in ascending
order by default. Anyway, your workaround and Weidong's method are both
good solutions.
On Wed, Apr 4, 2012 at 12:10 PM, Berend Hasselman  wrote:

>
> On 04-04-2012, at 07:15, Ashish Agarwal wrote:
>
> > Yes. I was missing the DROP argument.
> > But now the problem is splitting is causing some weird ordering of
> groups.
>
> Why weird?
>
> > See below:
> >
> > DF <- read.table(text="
> > Houseid,Personid,Tripid,taz
> > 1,1,1,4
> > 1,1,2,7
> > 2,1,1,96
> > 2,1,2,4
> > 2,1,3,2
> > 2,2,1,58
> > 3,1,5,7
> > ", header=TRUE, sep=",")
> > aa <- split(DF, DF[, 1:2], drop=TRUE)
> >
> > Now the result is that aa[3] is (3,1) and not (2,2). Why? How can I
> > preserve the ascending order?
> >
>
> Try this
>
> aa[order(names(aa))]
>
> Berend
>
> >> aa[3]
> > $`3.1`
> >   Houseid Personid Tripid taz
> > 7       3        1      5   7
> >> aa[4]
> > $`2.2`
> >   Houseid Personid Tripid taz
> > 6       2        2      1  58
>

[[alternative HTML version deleted]]

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping and/or splitting

2012-04-03 Thread Berend Hasselman

On 04-04-2012, at 07:15, Ashish Agarwal wrote:

> Yes. I was missing the DROP argument.
> But now the problem is splitting is causing some weird ordering of groups.

Why weird?

> See below:
> 
> DF <- read.table(text="
> Houseid,Personid,Tripid,taz
> 1,1,1,4
> 1,1,2,7
> 2,1,1,96
> 2,1,2,4
> 2,1,3,2
> 2,2,1,58
> 3,1,5,7
> ", header=TRUE, sep=",")
> aa <- split(DF, DF[, 1:2], drop=TRUE)
> 
> Now the result is that aa[3] is (3,1) and not (2,2). Why? How can I
> preserve the ascending order?
> 

Try this

aa[order(names(aa))]
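
A self-contained sketch of that fix: split() orders the groups by the levels of the interaction of the two factors, not by sorted group name, so reordering by name restores ascending order (with single-digit ids; for wider ids a numeric sort would be safer):

```r
DF <- read.table(text = "
Houseid,Personid,Tripid,taz
1,1,1,4
1,1,2,7
2,1,1,96
2,1,2,4
2,1,3,2
2,2,1,58
3,1,5,7
", header = TRUE, sep = ",")

aa <- split(DF, DF[, 1:2], drop = TRUE)
names(aa)                  # "1.1" "2.1" "3.1" "2.2" -- interaction order
aa <- aa[order(names(aa))]
names(aa)                  # "1.1" "2.1" "2.2" "3.1" -- ascending
```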

Berend

>> aa[3]
> $`3.1`
>   Houseid Personid Tripid taz
> 7       3        1      5   7
>> aa[4]
> $`2.2`
>   Houseid Personid Tripid taz
> 6       2        2      1  58
> 
> 
> On Wed, Apr 4, 2012 at 6:29 AM, Rui Barradas  wrote:
> 
>> Hello,
>> 
>> 
>> Ashish Agarwal wrote
>>> 
>>> I have a dataframe imported from csv file below:
>>> 
>>> Houseid,Personid,Tripid,taz
>>> 1,1,1,4
>>> 1,1,2,7
>>> 2,1,1,96
>>> 2,1,2,4
>>> 2,1,3,2
>>> 2,2,1,58
>>> 
>>> There are three groups identified based on the combination of first and
>>> second columns. How do I split this data frame?
>>> 
>>> I tried
>>> aa <- split(inpfil, inpfil[,1:2])
>>> but it has problems.
>>> 
>>> Output desired is
>>> 
>>> aa[1]
>>> Houseid,Personid,Tripid,taz
>>> 1,1,1,4
>>> 1,1,2,7
>>> aa[2]
>>> Houseid,Personid,Tripid,taz
>>> 2,1,1,96
>>> 2,1,2,4
>>> 2,1,3,2
>>> aa[3]
>>> Houseid,Personid,Tripid,taz
>>> 2,2,1,58
>>> 
>>>  [[alternative HTML version deleted]]
>>> 
>>> __
>>> R-help@ mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>> 
>> 
>> 
>> Any of the following three works with me.
>> 
>> 
>> DF <- read.table(text="
>> Houseid,Personid,Tripid,taz
>> 1,1,1,4
>> 1,1,2,7
>> 2,1,1,96
>> 2,1,2,4
>> 2,1,3,2
>> 2,2,1,58
>> ", header=TRUE, sep=",")
>> 
>> DF
>> 
>> split(DF, DF[, 1:2], drop=TRUE)
>> split(DF, list(DF$Houseid, DF$Personid), drop=TRUE)
>> with(DF, split(DF, list(Houseid, Personid), drop=TRUE))
>> 
>> The argument 'drop' defaults to FALSE. Was that the problem?
>> 
>> Hope this helps,
>> 
>> Rui Barradas
> 
>   [[alternative HTML version deleted]]
> 
> __
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping and/or splitting

2012-04-03 Thread Ashish Agarwal
Yes. I was missing the DROP argument.
But now the problem is splitting is causing some weird ordering of groups.
See below:

DF <- read.table(text="
Houseid,Personid,Tripid,taz
1,1,1,4
1,1,2,7
2,1,1,96
2,1,2,4
2,1,3,2
2,2,1,58
3,1,5,7
", header=TRUE, sep=",")
aa <- split(DF, DF[, 1:2], drop=TRUE)

Now the result is that aa[3] is (3,1) and not (2,2). Why? How can I
preserve the ascending order?

> aa[3]
$`3.1`
  Houseid Personid Tripid taz
7       3        1      5   7
> aa[4]
$`2.2`
  Houseid Personid Tripid taz
6       2        2      1  58


On Wed, Apr 4, 2012 at 6:29 AM, Rui Barradas  wrote:

> Hello,
>
>
> Ashish Agarwal wrote
>  >
> > I have a dataframe imported from csv file below:
> >
> > Houseid,Personid,Tripid,taz
> > 1,1,1,4
> > 1,1,2,7
> > 2,1,1,96
> > 2,1,2,4
> > 2,1,3,2
> > 2,2,1,58
> >
> > There are three groups identified based on the combination of first and
> > second columns. How do I split this data frame?
> >
> > I tried
> > aa <- split(inpfil, inpfil[,1:2])
> > but it has problems.
> >
> > Output desired is
> >
> > aa[1]
> >  Houseid,Personid,Tripid,taz
> > 1,1,1,4
> > 1,1,2,7
> > aa[2]
> >  Houseid,Personid,Tripid,taz
> > 2,1,1,96
> > 2,1,2,4
> > 2,1,3,2
> > aa[3]
> >  Houseid,Personid,Tripid,taz
> > 2,2,1,58
> >
> >   [[alternative HTML version deleted]]
> >
> > __
> > R-help@ mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
> Any of the following three works with me.
>
>
> DF <- read.table(text="
> Houseid,Personid,Tripid,taz
> 1,1,1,4
> 1,1,2,7
> 2,1,1,96
> 2,1,2,4
> 2,1,3,2
> 2,2,1,58
> ", header=TRUE, sep=",")
>
> DF
>
> split(DF, DF[, 1:2], drop=TRUE)
> split(DF, list(DF$Houseid, DF$Personid), drop=TRUE)
> with(DF, split(DF, list(Houseid, Personid), drop=TRUE))
>
> The argument 'drop' defaults to FALSE. Was that the problem?
>
> Hope this helps,
>
> Rui Barradas

[[alternative HTML version deleted]]

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping and/or splitting

2012-04-03 Thread Rui Barradas
Hello,


Ashish Agarwal wrote
> 
> I have a dataframe imported from csv file below:
> 
> Houseid,Personid,Tripid,taz
> 1,1,1,4
> 1,1,2,7
> 2,1,1,96
> 2,1,2,4
> 2,1,3,2
> 2,2,1,58
> 
> There are three groups identified based on the combination of first and
> second columns. How do I split this data frame?
> 
> I tried
> aa <- split(inpfil, inpfil[,1:2])
> but it has problems.
> 
> Output desired is
> 
> aa[1]
>  Houseid,Personid,Tripid,taz
> 1,1,1,4
> 1,1,2,7
> aa[2]
>  Houseid,Personid,Tripid,taz
> 2,1,1,96
> 2,1,2,4
> 2,1,3,2
> aa[3]
>  Houseid,Personid,Tripid,taz
> 2,2,1,58
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@ mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 


Any of the following three works with me.


DF <- read.table(text="
Houseid,Personid,Tripid,taz
1,1,1,4
1,1,2,7
2,1,1,96
2,1,2,4
2,1,3,2
2,2,1,58 
", header=TRUE, sep=",")

DF

split(DF, DF[, 1:2], drop=TRUE)
split(DF, list(DF$Houseid, DF$Personid), drop=TRUE)
with(DF, split(DF, list(Houseid, Personid), drop=TRUE))

The argument 'drop' defaults to FALSE. Was that the problem?

Hope this helps,

Rui Barradas


--
View this message in context: 
http://r.789695.n4.nabble.com/Grouping-and-or-splitting-tp4530410p4530624.html
Sent from the R help mailing list archive at Nabble.com.

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping and/or splitting

2012-04-03 Thread Weidong Gu
how about

split(inpfil, paste(inpfil[,1],inpfil[,2],sep=','))

Weidong Gu

On Tue, Apr 3, 2012 at 6:42 PM, Ashish Agarwal
 wrote:
> I have a dataframe imported from csv file below:
>
> Houseid,Personid,Tripid,taz
> 1,1,1,4
> 1,1,2,7
> 2,1,1,96
> 2,1,2,4
> 2,1,3,2
> 2,2,1,58
>
> There are three groups identified based on the combination of first and
> second columns. How do I split this data frame?
>
> I tried
> aa <- split(inpfil, inpfil[,1:2])
> but it has problems.
>
> Output desired is
>
> aa[1]
>  Houseid,Personid,Tripid,taz
> 1,1,1,4
> 1,1,2,7
> aa[2]
>  Houseid,Personid,Tripid,taz
> 2,1,1,96
> 2,1,2,4
> 2,1,3,2
> aa[3]
>  Houseid,Personid,Tripid,taz
> 2,2,1,58
>
>        [[alternative HTML version deleted]]
>
> __
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread R. Michael Weylandt
Please take a look at my first reply to you:

ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))

Then read ?ave for an explanation of the syntax. ave takes two
vectors, the first being the data to be averaged, the second being an
index to split by. You don't want to use split() here.

Michael

On Tue, Apr 3, 2012 at 2:50 PM, Val  wrote:
I did look at it; the result is below.
>
> x=c(46, 125 , 36 ,193, 209, 78, 66, 242 , 297,45 )
>
> #lapply( split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1)) ,
> include.lowest=TRUE) ), mean)
>  ave( split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1)) ,
> include.lowest=TRUE) ), mean)
>
>> ave( split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1)) ,
> include.lowest=TRUE) ), mean)
> $`[36,74]`
> [1] NA
>
> $`(74,197]`
> [1] NA
>
> $`(197,297]`
> [1] NA
>
> There were 11 warnings (use warnings() to see them)
>
>
>
>
>
> On Tue, Apr 3, 2012 at 2:35 PM, Petr Savicky  wrote:
>
>> On Tue, Apr 03, 2012 at 02:21:36PM -0400, Val wrote:
>> > Hi All,
>> >
>> > On the same data  points
>> > x=c(46, 125 , 36 ,193, 209, 78, 66, 242 , 297,45 )
>> >
>> > I want to have have the following output  as data frame
>> >
>> > x       group   group mean
>> > 46       1        42.3
>> > 125     2        89.6
>> > 36       1        42.3
>> > 193     3        235.25
>> > 209     3        235.25
>> > 78       2        89.6
>> > 66       2        89.6
>> > 242     3        235.25
>> > 297     3        235.25
>> > 45       1        42.3
>> >
>> > I tried the following code
>> >
>> >
>> > dat <- data.frame(xc=split(x, cut(x, quantile(x, prob=c(0, .333, .66
>> ,1
>> > gxc <- with(dat, tapply(xc, group, mean))
>> > dat$gxc <- gxce[as.character(dat$group)]
>> > txc=dat$gxc
>> >
>> > it did not work for me.
>>
>> David Winsemius suggested to use ave(), when you asked this
>> question for the first time. Can you have look at it?
>>
>> Petr Savicky.
>>
>> __
>> [email protected] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>        [[alternative HTML version deleted]]
>
> __
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread Berend Hasselman

On 03-04-2012, at 20:21, Val wrote:

> Hi All,
> 
> On the same data  points
> x=c(46, 125 , 36 ,193, 209, 78, 66, 242 , 297,45 )
> 
> I want to have have the following output  as data frame
> 
> x    group  group mean
> 46     1     42.3
> 125    2     89.6
> 36     1     42.3
> 193    3     235.25
> 209    3     235.25
> 78     2     89.6
> 66     2     89.6
> 242    3     235.25
> 297    3     235.25
> 45     1     42.3
> 
> I tried the following code
> 
> 
> dat <- data.frame(xc=split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1
> gxc <- with(dat, tapply(xc, group, mean))
> dat$gxc <- gxce[as.character(dat$group)]
> txc=dat$gxc
> 
> it did not work for me.
> 

I'm not surprised.

In the line dat <- there are 5 opening parentheses and 4 closing )'s.
In the line dat$gxc <- you reference an object gxce. Where was it created?

So I tried this

> dat <- data.frame(x, group=findInterval(x, quantile(x, prob=c(0, .333, .66 
> ,1)), all.inside=TRUE))
> dat$gmean <- ave(dat$x, as.factor(dat$group))
> dat
     x group     gmean
1   46     1  42.33333
2  125     2  89.66667
3   36     1  42.33333
4  193     3 235.25000
5  209     3 235.25000
6   78     2  89.66667
7   66     2  89.66667
8  242     3 235.25000
9  297     3 235.25000
10  45     1  42.33333

Berend

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread Berend Hasselman

On 03-04-2012, at 21:02, Val wrote:

> 
> 
> On Tue, Apr 3, 2012 at 2:53 PM, Berend Hasselman  wrote:
> 
> On 03-04-2012, at 20:21, Val wrote:
> 
> > Hi All,
> >
> > On the same data  points
> > x=c(46, 125 , 36 ,193, 209, 78, 66, 242 , 297,45 )
> >
> > I want to have have the following output  as data frame
> >
> > x    group  group mean
> > 46     1     42.3
> > 125    2     89.6
> > 36     1     42.3
> > 193    3     235.25
> > 209    3     235.25
> > 78     2     89.6
> > 66     2     89.6
> > 242    3     235.25
> > 297    3     235.25
> > 45     1     42.3
> >
> > I tried the following code
> >
> >
> > dat <- data.frame(xc=split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1
> > gxc <- with(dat, tapply(xc, group, mean))
> > dat$gxc <- gxce[as.character(dat$group)]
> > txc=dat$gxc
> >
> > it did not work for me.
> >
> 
> I'm not surprised.
> 
> In the line dat <- there are 5 opening parentheses and 4 closing )'s.
> In the line dat$gxc <- you reference an object gxce. Where was it created?
> 
> So I tried this
> 
> > dat <- data.frame(x, group=findInterval(x, quantile(x, prob=c(0, .333, .66 
> > ,1)), all.inside=TRUE))
> > dat$gmean <- ave(dat$x, as.factor(dat$group))

And the as.factor() is not necessary. This will do:

dat$gmean <- ave(dat$x, dat$group)
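
Putting the thread's pieces together, a minimal end-to-end sketch (same data and names as above):

```r
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)

# Bin each value by the sample tertiles; all.inside = TRUE keeps the
# minimum and maximum inside the first and last interval.
dat <- data.frame(
  x,
  group = findInterval(x, quantile(x, prob = c(0, .333, .66, 1)),
                       all.inside = TRUE)
)

# ave() returns each group's mean, replicated to the length of x.
dat$gmean <- ave(dat$x, dat$group)
```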

Berend

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread Val
On Tue, Apr 3, 2012 at 2:53 PM, Berend Hasselman  wrote:

>
> On 03-04-2012, at 20:21, Val wrote:
>
> > Hi All,
> >
> > On the same data  points
> > x=c(46, 125 , 36 ,193, 209, 78, 66, 242 , 297,45 )
> >
> > I want to have the following output as a data frame
> >
> > x    group   group mean
> > 46   1       42.3
> > 125  2       89.6
> > 36   1       42.3
> > 193  3       235.25
> > 209  3       235.25
> > 78   2       89.6
> > 66   2       89.6
> > 242  3       235.25
> > 297  3       235.25
> > 45   1       42.3
> >
> > I tried the following code
> >
> >
> > dat <- data.frame(xc=split(x, cut(x, quantile(x, prob=c(0, .333, .66
> ,1
> > gxc <- with(dat, tapply(xc, group, mean))
> > dat$gxc <- gxce[as.character(dat$group)]
> > txc=dat$gxc
> >
> > it did not work for me.
> >
>
> I'm not surprised.
>
> In the line dat <- there are 5 opening parentheses and 4 closing )'s.
> In the line dat$gxc <- you reference an object gxce. Where was it created?
>
> So I tried this
>
> > dat <- data.frame(x, group=findInterval(x, quantile(x, prob=c(0, .333,
> .66 ,1)), all.inside=TRUE))
> > dat$gmean <- ave(dat$x, as.factor(dat$group))
> > dat
>      x group     gmean
> 1   46     1  42.33333
> 2  125     2  89.66667
> 3   36     1  42.33333
> 4  193     3 235.25000
> 5  209     3 235.25000
> 6   78     2  89.66667
> 7   66     2  89.66667
> 8  242     3 235.25000
> 9  297     3 235.25000
> 10  45     1  42.33333
>
>
Thank you very much. It is working now. There was a typo in my email:
"gxce" should have been "gxc", which is what the R code actually used.




> Berend
>
>

[[alternative HTML version deleted]]

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread Val
I did look at it; the result is below.

x=c(46, 125 , 36 ,193, 209, 78, 66, 242 , 297,45 )

#lapply( split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1)) ,
include.lowest=TRUE) ), mean)
  ave( split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1)) ,
include.lowest=TRUE) ), mean)

> ave( split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1)) ,
include.lowest=TRUE) ), mean)
$`[36,74]`
[1] NA

$`(74,197]`
[1] NA

$`(197,297]`
[1] NA

There were 11 warnings (use warnings() to see them)





On Tue, Apr 3, 2012 at 2:35 PM, Petr Savicky  wrote:

> On Tue, Apr 03, 2012 at 02:21:36PM -0400, Val wrote:
> > Hi All,
> >
> > On the same data  points
> > x=c(46, 125 , 36 ,193, 209, 78, 66, 242 , 297,45 )
> >
> > I want to have the following output as a data frame
> >
> > x    group   group mean
> > 46   1       42.3
> > 125  2       89.6
> > 36   1       42.3
> > 193  3       235.25
> > 209  3       235.25
> > 78   2       89.6
> > 66   2       89.6
> > 242  3       235.25
> > 297  3       235.25
> > 45   1       42.3
> >
> > I tried the following code
> >
> >
> > dat <- data.frame(xc=split(x, cut(x, quantile(x, prob=c(0, .333, .66
> ,1
> > gxc <- with(dat, tapply(xc, group, mean))
> > dat$gxc <- gxce[as.character(dat$group)]
> > txc=dat$gxc
> >
> > it did not work for me.
>
> David Winsemius suggested using ave() when you asked this
> question the first time. Could you have a look at it?
>
> Petr Savicky.
>
> __
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread Petr Savicky
On Tue, Apr 03, 2012 at 02:21:36PM -0400, Val wrote:
> Hi All,
> 
> On the same data  points
> x=c(46, 125 , 36 ,193, 209, 78, 66, 242 , 297,45 )
> 
> I want to have the following output as a data frame
> 
> x    group   group mean
> 46   1       42.3
> 125  2       89.6
> 36   1       42.3
> 193  3       235.25
> 209  3       235.25
> 78   2       89.6
> 66   2       89.6
> 242  3       235.25
> 297  3       235.25
> 45   1       42.3
> 
> I tried the following code
> 
> 
> dat <- data.frame(xc=split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1
> gxc <- with(dat, tapply(xc, group, mean))
> dat$gxc <- gxce[as.character(dat$group)]
> txc=dat$gxc
> 
> it did not work for me.

David Winsemius suggested using ave() when you asked this
question the first time. Could you have a look at it?

Petr Savicky.

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread Val
Hi All,

On the same data  points
x=c(46, 125 , 36 ,193, 209, 78, 66, 242 , 297,45 )

I want to have the following output as a data frame

x    group   group mean
46   1       42.3
125  2       89.6
36   1       42.3
193  3       235.25
209  3       235.25
78   2       89.6
66   2       89.6
242  3       235.25
297  3       235.25
45   1       42.3

I tried the following code


dat <- data.frame(xc=split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1
gxc <- with(dat, tapply(xc, group, mean))
dat$gxc <- gxce[as.character(dat$group)]
txc=dat$gxc

it did not work for me.













On Tue, Apr 3, 2012 at 10:15 AM, David Winsemius wrote:

>
> On Apr 3, 2012, at 10:11 AM, Val wrote:
>
> David W and all,
>
> Thank you very much for your help.
>
> Here is the final output that I want, in the form of a data frame. The data
> frame should contain x, group and group mean in the following way:
>
> x    group   group mean
> 46   1       42.3
> 125  2       89.6
> 36   1       42.3
> 193  3       235.25
> 209  3       235.25
> 78   2       89.6
> 66   2       89.6
> 242  3       235.25
> 297  3       235.25
> 45   1       42.3
>
>
> If you want group means in a vector the same length as x, then instead of
> using tapply as done in earlier solutions you should use `ave`.
>
> --
> DW
>
>
>
> Thanks a lot
>
>
>
>
>
>
>
>
> On Tue, Apr 3, 2012 at 9:51 AM, David Winsemius wrote:
>
>>
>> On Apr 3, 2012, at 9:32 AM, R. Michael Weylandt wrote:
>>
>>  Use cut2 as I suggested and David demonstrated.
>>>
>>
>> Agree that Hmisc::cut2 is extremely handy and I also like the fact that
>> the closed ends of intervals are on the left side (which is not the same
>> behavior as cut()), which has the other effect of setting include.lowest =
>> TRUE, which is not the default for cut() either (to my continued amazement).
>>
>> But let me add the method I use when doing it "by hand":
>>
>> cut(x, quantile(x, prob=seq(0, 1, length=ngrps+1)), include.lowest=TRUE)
>>
>> --
>> David.
>>
>>
>>
>>
>>> Michael
>>>
>>> On Tue, Apr 3, 2012 at 9:31 AM, Val  wrote:
>>>
 Thank you all (David, Michael, Giovanni)  for your prompt response.

 First, there was a typo in the group mean: it should be 89.6, not 87.

 For a small data set and few groupings I can use prob=c(0, .333, .66, 1) to
 group into three groups in this case. However, if I want to extend the
 number of groupings say 10 or 15 then do I have to figure it out the
  split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))

 Is there a short cut for that?


 Thanks











 On Tue, Apr 3, 2012 at 9:13 AM, R. Michael Weylandt
  wrote:

>
> Ignoring the fact your desired answers are wrong, I'd split the
> separating part and the group means parts into three steps:
>
> i) quantile() can help you get the split points,
> ii)  findInterval() can assign each y to a group
> iii) then ave() or tapply() will do group-wise means
>
> Something like:
>
> y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297) # You need a "c"
> here.
> ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))
> tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean)
>
> You could also use cut2 from the Hmisc package to combine findInterval
> and quantile into a single step.
>
> Depending on your desired output.
>
> Hope that helps,
> Michael
>
> On Tue, Apr 3, 2012 at 8:47 AM, Val  wrote:
>
>> Hi all,
>>
>> Assume that I have the following 10 data points.
>>  x=c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)
>>
>> sort x  and get the following
>>  y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)
>>
>> I want to  group the sorted  data point (y)  into  equal number of
>> observation per group. In this case there will be three groups.  The
>> first
>> two groups  will have three observation  and the third will have four
>> observations
>>
>> group 1  = 34, 45, 46
>> group 2  = 66, 78, 125
>> group 3  = 193, 209, 242,297
>>
>> Finally I want to calculate the group mean
>>
>> group 1  =  42
>> group 2  =  87
>> group 3  =  234
>>
>> Can anyone help me out?
>>
>> In SAS I used to do it using proc rank.
>>
>> thanks in advance
>>
>> Val
>>
>>   [[alternative HTML version deleted]]
>>
>
>
>> __
>> [email protected] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

Re: [R] grouping

2012-04-03 Thread David Winsemius

On Apr 3, 2012, at 10:11 AM, Val wrote:

> David W and all,
>
> Thank you very much for your help.
>
> Here is the final output that I want, in the form of a data frame. The
> data frame should contain x, group and group mean in the following
> way:
>
> x    group   group mean
> 46   1       42.3
> 125  2       89.6
> 36   1       42.3
> 193  3       235.25
> 209  3       235.25
> 78   2       89.6
> 66   2       89.6
> 242  3       235.25
> 297  3       235.25
> 45   1       42.3

If you want group means in a vector the same length as x, then instead
of using tapply as done in earlier solutions you should use `ave`.

-- 
DW


>
> Thanks a lot
>
>
>
>
>
>
>
>
> On Tue, Apr 3, 2012 at 9:51 AM, David Winsemius  > wrote:
>
> On Apr 3, 2012, at 9:32 AM, R. Michael Weylandt wrote:
>
> Use cut2 as I suggested and David demonstrated.
>
> Agree that Hmisc::cut2 is extremely handy and I also like the fact
> that the closed ends of intervals are on the left side (which is not
> the same behavior as cut()), which has the other effect of setting
> include.lowest = TRUE, which is not the default for cut() either (to
> my continued amazement).
>
> But let me add the method I use when doing it "by hand":
>
> cut(x, quantile(x, prob=seq(0, 1, length=ngrps+1)),  
> include.lowest=TRUE)
>
> -- 
> David.
>
>
>
>
> Michael
>
> On Tue, Apr 3, 2012 at 9:31 AM, Val  wrote:
> Thank you all (David, Michael, Giovanni)  for your prompt response.
>
> First there was a typo error for the group mean it was 89.6 not 87.
>
> For a small data set and few groupings I can use prob=c(0, .333, .66, 1) to
> group into three groups in this case. However, if I want to extend
> the
> number of groupings say 10 or 15 then do I have to figure it out the
>  split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))
>
> Is there a short cut for that?
>
>
> Thanks
>
>
>
>
>
>
>
>
>
>
>
> On Tue, Apr 3, 2012 at 9:13 AM, R. Michael Weylandt
>  wrote:
>
> Ignoring the fact your desired answers are wrong, I'd split the
> separating part and the group means parts into three steps:
>
> i) quantile() can help you get the split points,
> ii)  findInterval() can assign each y to a group
> iii) then ave() or tapply() will do group-wise means
>
> Something like:
>
> y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297) # You need a "c"  
> here.
> ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))
> tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean)
>
> You could also use cut2 from the Hmisc package to combine findInterval
> and quantile into a single step.
>
> Depending on your desired output.
>
> Hope that helps,
> Michael
>
> On Tue, Apr 3, 2012 at 8:47 AM, Val  wrote:
> Hi all,
>
> Assume that I have the following 10 data points.
>  x=c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)
>
> sort x  and get the following
>  y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)
>
> I want to  group the sorted  data point (y)  into  equal number of
> observation per group. In this case there will be three groups.  The
> first
> two groups  will have three observation  and the third will have four
> observations
>
> group 1  = 34, 45, 46
> group 2  = 66, 78, 125
> group 3  = 193, 209, 242,297
>
> Finally I want to calculate the group mean
>
> group 1  =  42
> group 2  =  87
> group 3  =  234
>
> Can anyone help me out?
>
> In SAS I used to do it using proc rank.
>
> thanks in advance
>
> Val
>
>   [[alternative HTML version deleted]]
>
>
> __
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> __
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius, MD
> West Hartford, CT
>
>

David Winsemius, MD
West Hartford, CT


[[alternative HTML version deleted]]

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread Val
David W and all,

Thank you very much for your help.

Here is the final output that I want, in the form of a data frame. The data
frame should contain x, group and group mean in the following way:

x    group   group mean
46   1       42.3
125  2       89.6
36   1       42.3
193  3       235.25
209  3       235.25
78   2       89.6
66   2       89.6
242  3       235.25
297  3       235.25
45   1       42.3

Thanks a lot








On Tue, Apr 3, 2012 at 9:51 AM, David Winsemius wrote:

>
> On Apr 3, 2012, at 9:32 AM, R. Michael Weylandt wrote:
>
>  Use cut2 as I suggested and David demonstrated.
>>
>
> Agree that Hmisc::cut2 is extremely handy and I also like the fact that
> the closed ends of intervals are on the left side (which is not the same
> behavior as cut()), which has the other effect of setting include.lowest =
> TRUE, which is not the default for cut() either (to my continued amazement).
>
> But let me add the method I use when doing it "by hand":
>
> cut(x, quantile(x, prob=seq(0, 1, length=ngrps+1)), include.lowest=TRUE)
>
> --
> David.
>
>
>
>
>> Michael
>>
>> On Tue, Apr 3, 2012 at 9:31 AM, Val  wrote:
>>
>>> Thank you all (David, Michael, Giovanni)  for your prompt response.
>>>
>>> First, there was a typo in the group mean: it should be 89.6, not 87.
>>>
>>> For a small data set and few groupings I can use prob=c(0, .333, .66, 1) to
>>> group into three groups in this case. However, if I want to extend the
>>> number of groupings say 10 or 15 then do I have to figure it out the
>>>  split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))
>>>
>>> Is there a short cut for that?
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Apr 3, 2012 at 9:13 AM, R. Michael Weylandt
>>>  wrote:
>>>

 Ignoring the fact your desired answers are wrong, I'd split the
 separating part and the group means parts into three steps:

 i) quantile() can help you get the split points,
 ii)  findInterval() can assign each y to a group
 iii) then ave() or tapply() will do group-wise means

 Something like:

 y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297) # You need a "c"
 here.
 ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))
 tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean)

 You could also use cut2 from the Hmisc package to combine findInterval
 and quantile into a single step.

 Depending on your desired output.

 Hope that helps,
 Michael

 On Tue, Apr 3, 2012 at 8:47 AM, Val  wrote:

> Hi all,
>
> Assume that I have the following 10 data points.
>  x=c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)
>
> sort x  and get the following
>  y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)
>
> I want to  group the sorted  data point (y)  into  equal number of
> observation per group. In this case there will be three groups.  The
> first
> two groups  will have three observation  and the third will have four
> observations
>
> group 1  = 34, 45, 46
> group 2  = 66, 78, 125
> group 3  = 193, 209, 242,297
>
> Finally I want to calculate the group mean
>
> group 1  =  42
> group 2  =  87
> group 3  =  234
>
> Can anyone help me out?
>
> In SAS I used to do it using proc rank.
>
> thanks in advance
>
> Val
>
>   [[alternative HTML version deleted]]
>


> __
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

>>>
>>>
>> __
>> [email protected] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> David Winsemius, MD
> West Hartford, CT
>
>

[[alternative HTML version deleted]]

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread David L Carlson
Or just replace c(0, .333, .667, 1) with 

n <- 10
split(x, cut(x, quantile(x, prob= c(0, 1:(n-1)/n, 1)), include.lowest=TRUE))

where n is the number of groups you want.
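
As a hedged illustration on the thread's data with n = 3, sapply() over the resulting list then gives one mean per group (note that cut()'s right-closed intervals place 66 in the first bin here, so the split differs slightly from the findInterval() grouping shown earlier in the thread):

```r
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
n <- 3  # number of groups

# Equally spaced quantile breaks: 0, 1/3, 2/3, 1
grps <- split(x, cut(x, quantile(x, prob = c(0, 1:(n - 1)/n, 1)),
                     include.lowest = TRUE))

sapply(grps, mean)  # one mean per quantile bin
```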

--
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352



-Original Message-
From: [email protected] [mailto:[email protected]] On
Behalf Of R. Michael Weylandt
Sent: Tuesday, April 03, 2012 8:32 AM
To: Val
Cc: [email protected]
Subject: Re: [R] grouping

Use cut2 as I suggested and David demonstrated.

Michael

On Tue, Apr 3, 2012 at 9:31 AM, Val  wrote:
> Thank you all (David, Michael, Giovanni)  for your prompt response.
>
> First, there was a typo in the group mean: it should be 89.6, not 87.
>
> For a small data set and few groupings I can use prob=c(0, .333, .66, 1) to
> group into three groups in this case. However, if I want to extend the
> number of groupings say 10 or 15 then do I have to figure it out the
>   split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))
>
> Is there a short cut for that?
>
>
> Thanks
>
>
>
>
>
>
>
>
>
>
>
> On Tue, Apr 3, 2012 at 9:13 AM, R. Michael Weylandt
>  wrote:
>>
>> Ignoring the fact your desired answers are wrong, I'd split the
>> separating part and the group means parts into three steps:
>>
>> i) quantile() can help you get the split points,
>> ii)  findInterval() can assign each y to a group
>> iii) then ave() or tapply() will do group-wise means
>>
>> Something like:
>>
>> y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297) # You need a "c"
here.
>> ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))
>> tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean)
>>
>> You could also use cut2 from the Hmisc package to combine findInterval
>> and quantile into a single step.
>>
>> Depending on your desired output.
>>
>> Hope that helps,
>> Michael
>>
>> On Tue, Apr 3, 2012 at 8:47 AM, Val  wrote:
>> > Hi all,
>> >
>> > Assume that I have the following 10 data points.
>> >  x=c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)
>> >
>> > sort x  and get the following
>> >  y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)
>> >
>> > I want to  group the sorted  data point (y)  into  equal number of
>> > observation per group. In this case there will be three groups.  The
>> > first
>> > two groups  will have three observation  and the third will have four
>> > observations
>> >
>> > group 1  = 34, 45, 46
>> > group 2  = 66, 78, 125
>> > group 3  = 193, 209, 242,297
>> >
>> > Finally I want to calculate the group mean
>> >
>> > group 1  =  42
>> > group 2  =  87
>> > group 3  =  234
>> >
>> > Can anyone help me out?
>> >
>> > In SAS I used to do it using proc rank.
>> >
>> > thanks in advance
>> >
>> > Val
>> >
>> >        [[alternative HTML version deleted]]
>>
>> >
>> > __
>> > [email protected] mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>
>

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] grouping

2012-04-03 Thread David Winsemius


On Apr 3, 2012, at 9:32 AM, R. Michael Weylandt wrote:


Use cut2 as I suggested and David demonstrated.


Agree that Hmisc::cut2 is extremely handy and I also like the fact
that the closed ends of intervals are on the left side (which is not
the same behavior as cut()), which has the other effect of setting
include.lowest = TRUE, which is not the default for cut() either (to my
continued amazement).


But let me add the method I use when doing it "by hand":

cut(x, quantile(x, prob=seq(0, 1, length=ngrps+1)), include.lowest=TRUE)
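
Run concretely (ngrps = 3 picked for this sketch; the table() call is my addition, just to show the resulting bin sizes):

```r
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
ngrps <- 3

# Quantile breaks at 0, 1/3, 2/3, 1; include.lowest keeps min(x) in bin 1.
bins <- cut(x, quantile(x, prob = seq(0, 1, length = ngrps + 1)),
            include.lowest = TRUE)

table(bins)  # observations per quantile bin
```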

--
David.




Michael

On Tue, Apr 3, 2012 at 9:31 AM, Val  wrote:

Thank you all (David, Michael, Giovanni)  for your prompt response.

First, there was a typo in the group mean: it should be 89.6, not 87.

For a small data set and few groupings I can use prob=c(0, .333, .66, 1) to
group into three groups in this case. However, if I want to extend the

number of groupings say 10 or 15 then do I have to figure it out the
  split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))

Is there a short cut for that?


Thanks











On Tue, Apr 3, 2012 at 9:13 AM, R. Michael Weylandt
 wrote:


Ignoring the fact your desired answers are wrong, I'd split the
separating part and the group means parts into three steps:

i) quantile() can help you get the split points,
ii)  findInterval() can assign each y to a group
iii) then ave() or tapply() will do group-wise means

Something like:

y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297)  # You need a "c" here.

ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))
tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean)

You could also use cut2 from the Hmisc package to combine  
findInterval

and quantile into a single step.

Depending on your desired output.

Hope that helps,
Michael

On Tue, Apr 3, 2012 at 8:47 AM, Val  wrote:

Hi all,

Assume that I have the following 10 data points.
 x=c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)

sort x  and get the following
 y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)

I want to group the sorted data points (y) into an equal number of
observations per group. In this case there will be three groups. The
first two groups will have three observations and the third will have
four observations.

group 1  = 34, 45, 46
group 2  = 66, 78, 125
group 3  = 193, 209, 242,297

Finally I want to calculate the group mean

group 1  =  42
group 2  =  87
group 3  =  234

Can anyone help me out?

In SAS I used to do it using proc rank.

thanks in advance

Val

   [[alternative HTML version deleted]]




__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread Petr Savicky
On Tue, Apr 03, 2012 at 09:31:29AM -0400, Val wrote:
> Thank you all (David, Michael, Giovanni)  for your prompt response.
> 
> First, there was a typo in the group mean: it should be 89.6, not 87.
> 
> For a small data set and few groupings I can use  prob=c(0, .333, .66 ,1)
> to group into three groups in this case. However, if I want to extend the
> number of groupings say 10 or 15 then do I have to figure it out the
>   split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))
> 
> Is there a short cut for that?

Hi.

There may be better ways for the whole task, but specifically
c(0, .333, .66 ,1) can be obtained as

  seq(0, 1, length=3+1)

Hope this helps.

Petr Savicky.

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread R. Michael Weylandt
Use cut2 as I suggested and David demonstrated.

Michael

On Tue, Apr 3, 2012 at 9:31 AM, Val  wrote:
> Thank you all (David, Michael, Giovanni)  for your prompt response.
>
> First, there was a typo in the group mean: it should be 89.6, not 87.
>
> For a small data set and few groupings I can use  prob=c(0, .333, .66 ,1) to
> group into three groups in this case. However, if I want to extend the
> number of groupings say 10 or 15 then do I have to figure it out the
>   split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))
>
> Is there a short cut for that?
>
>
> Thanks
>
>
>
>
>
>
>
>
>
>
>
> On Tue, Apr 3, 2012 at 9:13 AM, R. Michael Weylandt
>  wrote:
>>
>> Ignoring the fact your desired answers are wrong, I'd split the
>> separating part and the group means parts into three steps:
>>
>> i) quantile() can help you get the split points,
>> ii)  findInterval() can assign each y to a group
>> iii) then ave() or tapply() will do group-wise means
>>
>> Something like:
>>
>> y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297) # You need a "c" here.
>> ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))
>> tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean)
>>
>> You could also use cut2 from the Hmisc package to combine findInterval
>> and quantile into a single step.
>>
>> Depending on your desired output.
>>
>> Hope that helps,
>> Michael
>>
>> On Tue, Apr 3, 2012 at 8:47 AM, Val  wrote:
>> > Hi all,
>> >
>> > Assume that I have the following 10 data points.
>> >  x=c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)
>> >
>> > sort x  and get the following
>> >  y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)
>> >
>> > I want to  group the sorted  data point (y)  into  equal number of
>> > observation per group. In this case there will be three groups.  The
>> > first
>> > two groups  will have three observation  and the third will have four
>> > observations
>> >
>> > group 1  = 34, 45, 46
>> > group 2  = 66, 78, 125
>> > group 3  = 193, 209, 242,297
>> >
>> > Finally I want to calculate the group mean
>> >
>> > group 1  =  42
>> > group 2  =  87
>> > group 3  =  234
>> >
>> > Can anyone help me out?
>> >
>> > In SAS I used to do it using proc rank.
>> >
>> > thanks in advance
>> >
>> > Val
>> >
>> >        [[alternative HTML version deleted]]
>>
>> >
>> > __
>> > [email protected] mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>
>

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread Val
Thank you all (David, Michael, Giovanni)  for your prompt response.

First, there was a typo in the group mean: it should be 89.6, not 87.

For a small data set and few groupings I can use prob=c(0, .333, .66, 1)
to group into three groups in this case. However, if I want to extend the
number of groupings to, say, 10 or 15, do I have to write out the
probabilities by hand in
  split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))

Is there a shortcut for that?


Thanks











On Tue, Apr 3, 2012 at 9:13 AM, R. Michael Weylandt <
[email protected]> wrote:

> Ignoring the fact your desired answers are wrong, I'd split the
> separating part and the group means parts into three steps:
>
> i) quantile() can help you get the split points,
> ii)  findInterval() can assign each y to a group
> iii) then ave() or tapply() will do group-wise means
>
> Something like:
>
> y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297) # You need a "c" here.
> ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))
> tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean)
>
> You could also use cut2 from the Hmisc package to combine findInterval
> and quantile into a single step.
>
> Depending on your desired output.
>
> Hope that helps,
> Michael
>
> On Tue, Apr 3, 2012 at 8:47 AM, Val  wrote:
> > Hi all,
> >
> > Assume that I have the following 10 data points.
> >  x=c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)
> >
> > sort x  and get the following
> >  y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)
> >
> > I want to  group the sorted  data point (y)  into  equal number of
> > observation per group. In this case there will be three groups.  The
> first
> > two groups  will have three observation  and the third will have four
> > observations
> >
> > group 1  = 34, 45, 46
> > group 2  = 66, 78, 125
> > group 3  = 193, 209, 242,297
> >
> > Finally I want to calculate the group mean
> >
> > group 1  =  42
> > group 2  =  87
> > group 3  =  234
> >
> > Can anyone help me out?
> >
> > In SAS I used to do it using proc rank.
> >
> > thanks in advance
> >
> > Val
> >
> >[[alternative HTML version deleted]]
> >
> > __
> > [email protected] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread K. Elo

Hi!

Maybe not the most elegant solution, but works:

for(i in seq(1,length(data)-(length(data) %% 3), 3)) { 
ifelse((length(data)-i)>3, { print(sort(data)[ c(i:(i+2)) ]); 
print(mean(sort(data)[ c(i:(i+2)) ])) }, { print(sort(data)[ 
c(i:length(data)) ]); print(mean(sort(data)[ c(i:length(data)) ])) } ) }


Produces:

[1] 36 45 46
[1] 42.33333
[1]  66  78 125
[1] 89.66667
[1] 193 209 242 297
[1] 235.25

HTH,
Kimmo

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread Giovanni Petris
Probably something along the following lines:

> x <- c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)
> sorted <- c(36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)
> tapply(sorted, INDEX = (seq_along(sorted) - 1) %/% 3, FUN = mean)
0 1 2 3 
 42.3  89.7 214.7 297.0 

Hope this helps,
Giovanni

On Tue, 2012-04-03 at 08:47 -0400, Val wrote:
> Hi all,
> 
> Assume that I have the following 10 data points.
>  x=c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)
> 
> sort x  and get the following
>   y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)
> 
> I want to  group the sorted  data point (y)  into  equal number of
> observation per group. In this case there will be three groups.  The first
> two groups  will have three observation  and the third will have four
> observations
> 
> group 1  = 34, 45, 46
> group 2  = 66, 78, 125
> group 3  = 193, 209, 242,297
> 
> Finally I want to calculate the group mean
> 
> group 1  =  42
> group 2  =  87
> group 3  =  234
> 
> Can anyone help me out?
> 
> In SAS I used to do it using proc rank.
> 
> thanks in advance
> 
> Val
> 
>   [[alternative HTML version deleted]]
> 
> __
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 

Giovanni Petris  
Associate Professor
Department of Mathematical Sciences
University of Arkansas - Fayetteville, AR 72701
Ph: (479) 575-6324, 575-8630 (fax)
http://definetti.uark.edu/~gpetris/

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread R. Michael Weylandt
Ignoring the fact your desired answers are wrong, I'd split the
grouping and the group-means calculation into three steps:

i) quantile() can help you get the split points,
ii)  findInterval() can assign each y to a group
iii) then ave() or tapply() will do group-wise means

Something like:

y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297) # You need a "c" here.
ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))
tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean)

You could also use cut2 from the Hmisc package to combine findInterval
and quantile into a single step.

Depending on your desired output.
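[Editor's sketch of that Hmisc route, assuming the Hmisc package is installed; the `g` argument asks cut2 for quantile groups:]

```r
# Hedged sketch, not from the original thread: cut2(y, g = 3) bins y into
# (roughly) equal-count quantile groups in a single call.
library(Hmisc)
y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297)
tapply(y, cut2(y, g = 3), mean)
```

The exact group boundaries depend on cut2's quantile handling, so they need not reproduce the 3/3/4 split the poster asked for.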

Hope that helps,
Michael

On Tue, Apr 3, 2012 at 8:47 AM, Val  wrote:
> Hi all,
>
> Assume that I have the following 10 data points.
>  x=c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)
>
> sort x  and get the following
>  y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)
>
> I want to  group the sorted  data point (y)  into  equal number of
> observation per group. In this case there will be three groups.  The first
> two groups  will have three observation  and the third will have four
> observations
>
> group 1  = 34, 45, 46
> group 2  = 66, 78, 125
> group 3  = 193, 209, 242,297
>
> Finally I want to calculate the group mean
>
> group 1  =  42
> group 2  =  87
> group 3  =  234
>
> Can anyone help me out?
>
> In SAS I used to do it using proc rank.
>
> thanks in advance
>
> Val
>
>        [[alternative HTML version deleted]]
>
> __
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread David Winsemius


On Apr 3, 2012, at 8:47 AM, Val wrote:


Hi all,

Assume that I have the following 10 data points.
x=c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)

sort x  and get the following
 y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)


The methods below do not require a sorting step.



I want to  group the sorted  data point (y)  into  equal number of
observation per group. In this case there will be three groups.  The  
first

two groups  will have three observation  and the third will have four
observations

group 1  = 34, 45, 46
group 2  = 66, 78, 125
group 3  = 193, 209, 242,297

Finally I want to calculate the group mean

group 1  =  42
group 2  =  87
group 3  =  234


I hope those weren't answers from SAS.



Can anyone help me out?



I usually do this with Hmisc::cut2, since it has a `g = ` parameter
that auto-magically applies the quantile splitting criterion, but the
following is done in base R.


split(x, cut(x, quantile(x, prob = c(0, .333, .66, 1)), include.lowest = TRUE))

$`[36,65.9]`
[1] 36 45 46

$`(65.9,189]`
[1]  66  78 125

$`(189,297]`
[1] 193 209 242 297


> lapply(split(x, cut(x, quantile(x, prob = c(0, .333, .66, 1)), include.lowest = TRUE)), mean)

$`[36,65.9]`
[1] 42.3

$`(65.9,189]`
[1] 89.7

$`(189,297]`
[1] 235.25

Or to get a table instead of a list:
> tapply(x, cut(x, quantile(x, prob = c(0, .333, .66, 1)), include.lowest = TRUE), mean)

 [36,65.9] (65.9,189]  (189,297]
  42.33333   89.66667  235.25000


In SAS I used to do it using proc rank.


?quantile isn't equivalent to  Proc Rank but it will provide a useful  
basis for splitting or tabling functions.
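[Editor's addition] A base-R sketch that reproduces the poster's exact 3/3/4 split directly, rather than emulating Proc Rank:

```r
# Sort, assign explicit group sizes of 3, 3 and 4, then take group means.
x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
g <- rep(1:3, c(3, 3, 4))
sapply(split(sort(x), g), mean)
# group means: 42.33, 89.67, 235.25
```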




thanks in advance

Val

[[alternative HTML version deleted]]

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping together a time variable

2012-02-09 Thread R. Michael Weylandt
Perhaps cut.POSIXt (which is a generic so you can just call cut)
depending on the unstated form of your time object.

Michael

On Thu, Feb 9, 2012 at 12:15 PM, Abraham Mathew  wrote:
> I have the following variable, time, which is a character variable and it's
> structured as follows.
>
>> head(as.character(dat$time), 30)
>  [1] "00:00:01" "00:00:16" "00:00:24" "00:00:25" "00:00:25" "00:00:40" "00:01:50" "00:01:54" "00:02:33" "00:02:43" "00:03:22"
> [12] "00:03:31" "00:03:41" "00:03:42" "00:03:43" "00:04:04" "00:05:09" "00:05:17" "00:05:19" "00:05:21" "00:05:22" "00:05:22"
> [23] "00:05:28" "00:05:44" "00:05:54" "00:06:54" "00:06:54" "00:07:10" "00:08:15" "00:08:26"
>
>
> What I am trying to do is group the data into one hour increment. So
> 5:01-6:00am, 6:01-7:00am, 7:01-8:00a,
> and so forth.
>
> However, I'm not sure if there's a simple route to do this in R or how to
> do it.
> Can anyone point me in the right direction?
>
> --
> *Abraham Mathew
> Statistical Analyst
> www.amathew.com
> 720-648-0108
> @abmathewks*
>
>        [[alternative HTML version deleted]]
>
> __
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping miliseconds By Hours

2012-02-05 Thread David Winsemius


On Feb 5, 2012, at 9:54 AM, jim holtman wrote:


Is this what you are after:


x <- c(1327211358, 1327221999, 1327527296, 1327555433, 1327701042,
+ 1327761389, 1327780993, 1327815670, 1327822964, 1327897497,  
1327897527,
+ 1327937072, 1327938300, 1327957589, 1328044466, 1328127921,  
1328157588,

+ 1328213951, 1328236836, 1328300276, 1328335936, 1328429102)


x <- as.POSIXct(x, origin = '1970-1-1')
x

[1] "2012-01-22 05:49:18 EST" "2012-01-22 08:46:39 EST" "2012-01-25
21:34:56 EST"
[4] "2012-01-26 05:23:53 EST" "2012-01-27 21:50:42 EST" "2012-01-28
14:36:29 EST"
[7] "2012-01-28 20:03:13 EST" "2012-01-29 05:41:10 EST" "2012-01-29
07:42:44 EST"
[10] "2012-01-30 04:24:57 EST" "2012-01-30 04:25:27 EST" "2012-01-30
15:24:32 EST"
[13] "2012-01-30 15:45:00 EST" "2012-01-30 21:06:29 EST" "2012-01-31
21:14:26 EST"
[16] "2012-02-01 20:25:21 EST" "2012-02-02 04:39:48 EST" "2012-02-02
20:19:11 EST"
[19] "2012-02-03 02:40:36 EST" "2012-02-03 20:17:56 EST" "2012-02-04
06:12:16 EST"
[22] "2012-02-05 08:05:02 EST"

table(format(x, "%H"))


02 04 05 06 07 08 14 15 20 21
1  3  3  1  1  2  1  2  4  4


It's possible that you may not realize that Jim Holtman has implicitly
given you a handle on doing operations on such groups, since you could
use the value of format(x, "%H") as the indexing argument in tapply,
ave, or aggregate.
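[Editor's sketch] For instance, the same "%H" labels can drive a group-wise computation (timestamps below are a subset of Jim's example):

```r
x <- as.POSIXct(c(1327211358, 1327221999, 1327527296, 1327555433),
                origin = "1970-01-01", tz = "UTC")
# Average minutes-past-the-hour within each hour group:
tapply(as.numeric(format(x, "%M")), format(x, "%H"), mean)
```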


--
David.








On Sun, Feb 5, 2012 at 4:54 AM, Hasan Diwan   
wrote:
I have a list of numbers corresponding to timestamps, a sample of  
which follows:

c(1327211358, 1327221999, 1327527296, 1327555433, 1327701042,
1327761389, 1327780993, 1327815670, 1327822964, 1327897497,  
1327897527,
1327937072, 1327938300, 1327957589, 1328044466, 1328127921,  
1328157588,

1328213951, 1328236836, 1328300276, 1328335936, 1328429102)

I would like to group these into hours. In other words, something  
like:

c( "2012-01-31 21:14:26 PST" "2012-02-01 20:25:21 PST"
 "2012-02-02 04:39:48 PST" "2012-02-02 20:19:11 PST"
"2012-02-03 02:40:36 PST" "2012-02-03 20:17:56 PST"
"2012-02-04 06:12:16 PST" "2012-02-05 08:05:02 PST")
Hour  Hits
21      1
20      3
4       1
2       1
6       1
8       1

How would I do this without too much pain (from a CPU perspective)?
This is a subset of a million entries and I would rather not go
through these manually... So, any advice? Many thanks! -- H
--
Sent from my mobile device
Envoyait de mon portable

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping miliseconds By Hours

2012-02-05 Thread jim holtman
Is this what you are after:

> x <- c(1327211358, 1327221999, 1327527296, 1327555433, 1327701042,
+ 1327761389, 1327780993, 1327815670, 1327822964, 1327897497, 1327897527,
+ 1327937072, 1327938300, 1327957589, 1328044466, 1328127921, 1328157588,
+ 1328213951, 1328236836, 1328300276, 1328335936, 1328429102)
>
> x <- as.POSIXct(x, origin = '1970-1-1')
> x
 [1] "2012-01-22 05:49:18 EST" "2012-01-22 08:46:39 EST" "2012-01-25
21:34:56 EST"
 [4] "2012-01-26 05:23:53 EST" "2012-01-27 21:50:42 EST" "2012-01-28
14:36:29 EST"
 [7] "2012-01-28 20:03:13 EST" "2012-01-29 05:41:10 EST" "2012-01-29
07:42:44 EST"
[10] "2012-01-30 04:24:57 EST" "2012-01-30 04:25:27 EST" "2012-01-30
15:24:32 EST"
[13] "2012-01-30 15:45:00 EST" "2012-01-30 21:06:29 EST" "2012-01-31
21:14:26 EST"
[16] "2012-02-01 20:25:21 EST" "2012-02-02 04:39:48 EST" "2012-02-02
20:19:11 EST"
[19] "2012-02-03 02:40:36 EST" "2012-02-03 20:17:56 EST" "2012-02-04
06:12:16 EST"
[22] "2012-02-05 08:05:02 EST"
> table(format(x, "%H"))

02 04 05 06 07 08 14 15 20 21
 1  3  3  1  1  2  1  2  4  4
>
>


On Sun, Feb 5, 2012 at 4:54 AM, Hasan Diwan  wrote:
> I have a list of numbers corresponding to timestamps, a sample of which 
> follows:
> c(1327211358, 1327221999, 1327527296, 1327555433, 1327701042,
> 1327761389, 1327780993, 1327815670, 1327822964, 1327897497, 1327897527,
> 1327937072, 1327938300, 1327957589, 1328044466, 1328127921, 1328157588,
> 1328213951, 1328236836, 1328300276, 1328335936, 1328429102)
>
> I would like to group these into hours. In other words, something like:
> c( "2012-01-31 21:14:26 PST" "2012-02-01 20:25:21 PST"
>  "2012-02-02 04:39:48 PST" "2012-02-02 20:19:11 PST"
> "2012-02-03 02:40:36 PST" "2012-02-03 20:17:56 PST"
> "2012-02-04 06:12:16 PST" "2012-02-05 08:05:02 PST")
> Hour  Hits
> 21      1
> 20      3
> 4        1
> 2        1
> 6        1
> 8        1
>
> How would I do this without too much pain (from a CPU perspective)?
> This is a subset of a million entries and I would rather not go
> through these manually... So, any advice? Many thanks! -- H
> --
> Sent from my mobile device
> Envoyait de mon portable
>
> __
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping clusters from dendrograms

2011-11-03 Thread plangfelder
Hi Julia, 

sorry for the very late reply, your original email was posted while I was on
hiatus from R-help. I'm the author of the dynamicTreeCut package. I
recommend that you try using the "hybrid" method using the cutreeDynamic
function. What you observed is a known problem of the tree method (which, by
the way, was the reason I developed the Hybrid method). 

Using the hybrid method is simple, for example as 

cut2 <- cutreeDynamic(dendro, distM = combo2,
                      maxTreeHeight = 1, deepSplit = 2, minModuleSize = 1)

You can play with the argument deepSplit to obtain finer or coarser modules.

HTH,

Peter 

--
View this message in context: 
http://r.789695.n4.nabble.com/Grouping-clusters-from-dendrograms-tp2316521p3988526.html
Sent from the R help mailing list archive at Nabble.com.

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping variables in a data frame

2011-08-27 Thread Liviu Andronic
On Sat, Aug 27, 2011 at 7:26 AM, Andra Isan  wrote:
> Hi All,
>
> I have a data frame as follow:
>
> user_id time age location gender
> .
>
> and I learn a logistic regression to learn the weights (glm with family= 
> (link = logit))), my response value is either zero or one. I would like to 
> group the users based on user_id and time and see the y values and predicted 
> y values at the same time. Or plot them some how. Is there any way to somehow 
> group them together so that I can learn more about my data by grouping them?
>
It's very difficult to help you because you haven't followed the
posting guide. But I suspect you're looking for the following:

> require(plyr)
Loading required package: plyr
> data(mtcars)
> ##considering 'gear' as 'id' and 'carb' as time
> ddply(mtcars, .(gear, carb), function(x) mean(x$hp))
   gear carb    V1
1     3    1 104.0
2     3    2 162.5
3     3    3 180.0
4     3    4 228.0
5     4    1  72.5
6     4    2  79.5
7     4    4 116.5
8     5    2 102.0
9     5    4 264.0
10    5    6 175.0
11    5    8 335.0

This will compute the mean of 'hp' for each group of id & time.
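[Editor's note] The same group means can be had in base R, without plyr:

```r
# One row per gear/carb combination, mean hp in each group.
aggregate(hp ~ gear + carb, data = mtcars, FUN = mean)
```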
Liviu


> I would like to get these at the end
> user_id time y predicted_y
>
> Thanks a lot,
> Andra
>
> __
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Do you know how to read?
http://www.alienetworks.com/srtest.cfm
http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader
Do you know how to write?
http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping columns

2011-07-21 Thread Geophagus
Hi @ all,
both possibilities are working very fine.
Thanks a lot for the fast help!

Best Greetinx from the "Earth Eater" Geophagus 

--
View this message in context: 
http://r.789695.n4.nabble.com/Grouping-columns-tp3681018p3683076.html
Sent from the R help mailing list archive at Nabble.com.

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping columns

2011-07-20 Thread David Winsemius


On Jul 20, 2011, at 10:42 AM, Geophagus wrote:


*Hi @ all,
I have a question concerning the possibilty of grouping the columns  
of a

matrix.
R groups the columns alphabetically.
What can I do to group the columns in my specifications?


Dear Earth Eater;

You can create a factor whose levels are ordered to your  
specification. Your columns: "umweltkompartiment" obviously has those  
levels. This might also offer advantages in situations where there was  
not complete representation of all levels in all the files


So your tapply() calls could have been of this form:

b1_1 <- tapply(b1$betriebs_id,
               factor(b1$umweltkompartiment,
                      levels = c("Gesamt", "Wasser", "Boden", "Luft", "Abwasser",
                                 "Gefährliche Abfälle", "nicht gefährliche Abfälle")),
               length)
# code would be more compact if you created a factor vector and used it
# as an argument to factor:


faclevs <- c("Gesamt", "Wasser", "Boden", "Luft", "Abwasser",
             "Gefährliche Abfälle", "nicht gefährliche Abfälle")

b1_1 <- tapply(b1$betriebs_id,
               factor(b1$umweltkompartiment, levels = faclevs),
               length)
<< lather, rinse, repeat x 3>>
--
David.


The script is the following:*


#R-Skript: Anzahl xyz

#Quelldatei einlesen
b<-read.csv2("Z:/int/xyz.csv", header=TRUE)

#Teilmengen für die Einzeljahre generieren
b1<-subset(b,jahr=="2007")
b2<-subset(b,jahr=="2008")
b3<-subset(b,jahr=="2009")

#tapply für die Einzeljahre auf die jeweilige BranchenID
b1_1<-tapply(b1$betriebs_id,b1$umweltkompartiment,length)
b1_2<-tapply(b2$betriebs_id,b2$umweltkompartiment,length)
b1_3<-tapply(b3$betriebs_id,b3$umweltkompartiment,length)

#Verbinden der Ergebnisse
b11<-rbind(b1_1,b1_2,b1_3)
Gesamt<-apply(X=b11,MARGIN=1, sum)
b13<-cbind(Gesamt,b11)
b13

     Gesamt Abwasser Boden Gefährliche Abfälle Luft nicht gefährliche Abfälle Wasser
b1_1   9832      432    18                3147 2839                      1592   1804
b1_2  10271      413    28                3360 2920                      1715   1835
b1_3   9983      404    21                3405 2741                      1691   1721

*Now I want to have the following order of the columns:
Gesamt, Wasser, Boden, Luft, Abwasser, Gefährliche Abfälle, nicht
gefährliche Abfälle

Thanks a lot for your answers!
Fak*



--
View this message in context: 
http://r.789695.n4.nabble.com/Grouping-columns-tp3681018p3681018.html
Sent from the R help mailing list archive at Nabble.com.

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping columns

2011-07-20 Thread Brad Patrick Schneid
untested because I don't have access to your data, but this should work. 

b13.NEW <- b13[, c("Gesamt", "Wasser", "Boden", "Luft", "Abwasser",
"Gefährliche Abfälle", "nicht gefährliche Abfälle")] 







Geophagus wrote:
> 
> *Hi @ all,
> I have a question concerning the possibilty of grouping the columns of a
> matrix.
> R groups the columns alphabetically. 
> What can I do to group the columns in my specifications?
> 
> The script is the following:*
> 
>> #R-Skript: Anzahl xyz
>> 
>> #Quelldatei einlesen
>> b<-read.csv2("Z:/int/xyz.csv", header=TRUE) 
>> 
>> #Teilmengen für die Einzeljahre generieren
>> b1<-subset(b,jahr=="2007")
>> b2<-subset(b,jahr=="2008")
>> b3<-subset(b,jahr=="2009")
>> 
>> #tapply für die Einzeljahre auf die jeweilige BranchenID
>> b1_1<-tapply(b1$betriebs_id,b1$umweltkompartiment,length)
>> b1_2<-tapply(b2$betriebs_id,b2$umweltkompartiment,length)
>> b1_3<-tapply(b3$betriebs_id,b3$umweltkompartiment,length)
>> 
>> #Verbinden der Ergebnisse
>> b11<-rbind(b1_1,b1_2,b1_3)
>> Gesamt<-apply(X=b11,MARGIN=1, sum)
>> b13<-cbind(Gesamt,b11)
>> b13
>      Gesamt Abwasser Boden Gefährliche Abfälle Luft nicht gefährliche Abfälle Wasser
> b1_1   9832      432    18                3147 2839                      1592   1804
> b1_2  10271      413    28                3360 2920                      1715   1835
> b1_3   9983      404    21                3405 2741                      1691   1721
> 
> *Now I want to have the following order of the columns:
> Gesamt, Wasser, Boden, Luft, Abwasser, Gefährliche Abfälle, nicht
> gefährliche Abfälle
> 
> Thanks a lot for your answers!
> Fak*
> 


--
View this message in context: 
http://r.789695.n4.nabble.com/Grouping-columns-tp3681018p3681121.html
Sent from the R help mailing list archive at Nabble.com.

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping data

2011-07-20 Thread Dieter Menne

adolfpf wrote:
> 
> How do I "group" my data in "dolf" the same way the data "Orthodont" are
> grouped.
> 
>> show(dolf)
>   distance   age Subjectt Sex
> 1  6.83679 22.01       F1   F
> 2  6.63245 23.04       F1   F
> 3 11.58730 39.26       M2   M
> 
> 

I know that many samples in that excellent book use grouped data, but the
concept of "grouped data" is more confusing than helpful. I only got started
using nlme/lme when I realized that everything could be done without grouped
data. Too bad, many examples in Pinheiro/Bates rely on the concept (but no
longer do in the coming lme4).

So I suggest that you try to solve the problem with vanilla data frames
instead of grouped ones. In most cases, it only means that you have to put
the formula into the lme(..) call instead of relying on some hidden
defaults.
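[Editor's sketch of Dieter's point] With nlme, a plain data frame plus an explicit grouping formula in the random term replaces the groupedData object:

```r
library(nlme)
# Orthodont ships with nlme; the grouping is stated in the formula itself
# rather than hidden in a groupedData attribute.
fm <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)
summary(fm)
```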

Dieter







--
View this message in context: 
http://r.789695.n4.nabble.com/grouping-data-tp3679803p3680115.html
Sent from the R help mailing list archive at Nabble.com.

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping data in ranges in table

2011-03-05 Thread Jorge Ivan Velez
Hi Jason,

Something along the lines of

with(Orange, table(cut(age, breaks = c(118, 664, 1004, 1372, 1582, Inf)),
                   cut(circumference, breaks = c(30, 58, 62, 115,
                                                 145, 179, 214))))

should get you started.

HTH,
Jorge


On Sat, Mar 5, 2011 at 5:38 PM, Jason Rupert <> wrote:

> Working with the built in R data set Orange, e.g. with(Orange, table(age,
> circumference)).
>
>
> How should I go about about grouping the ages and circumferences in the
> following ranges and having them display as such in a table?
> age range:
> 118 - 664
> 1004 - 1372
> 1582
>
> circumference range:
> 30-58
> 62- 115
> 120-142
> 145-177
> 179-214
>
> Thanks for any feedback and insights, as I hoping for an output that looks
> something like the following:
>   circumference range
>   30-58 62- 115  145-177
> age range
> 118 - 664 ...
> 1004 - 1372 ...
> 1582
>
>
> Thanks a ton.
>
> __
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping data in ranges in table

2011-03-05 Thread Greg Snow
?cut

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[email protected]
801.408.8111


> -Original Message-
> From: [email protected] [mailto:r-help-bounces@r-
> project.org] On Behalf Of Jason Rupert
> Sent: Saturday, March 05, 2011 3:38 PM
> To: R Project Help
> Subject: [R] Grouping data in ranges in table
> 
> Working with the built in R data set Orange, e.g. with(Orange,
> table(age,
> circumference)).
> 
> 
> How should I go about about grouping the ages and circumferences in the
> following ranges and having them display as such in a table?
> age range:
> 118 - 664
> 1004 - 1372
> 1582
> 
> circumference range:
> 30-58
> 62- 115
> 120-142
> 145-177
> 179-214
> 
> Thanks for any feedback and insights, as I hoping for an output that
> looks
> something like the following:
>circumference range
>30-58 62- 115  145-177
> age range
> 118 - 664 ...
> 1004 - 1372 ...
> 1582
> 
> 
> Thanks a ton.
> 
> __
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping data

2011-03-04 Thread Joshua Wiley
Hi Steve,

Just test whether y is greater than the predicted y (i.e., your line).

## function using the model coefficients*
f <- function(x) {82.9996 + (.5589 * x)}
## Find group membership
group <- ifelse(y > f(x), "A", "B")

*Note that depending on how accurate this needs to be, you will probably
want to use the model itself rather than just reading from the
printout like I did.  If you need to do that, take a look at ?predict
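[Editor's sketch of that predict() route, with simulated stand-in data since the poster's seed is unknown:]

```r
set.seed(1)                                   # hypothetical seed
xsim <- rnorm(100, mean = 50, sd = 10)        # stand-in for the poster's data
ysim <- 83 + 0.56 * xsim + rnorm(100, sd = 5)
fm1  <- lm(ysim ~ xsim)
# Points above the fitted line get "A", those below get "B":
group <- ifelse(ysim > predict(fm1), "A", "B")
table(group)
```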

For future reference, it would be easier for readers if you provided
your data via something like: dput(x) that can be copied directly into
the R console.  Also, if you are generating random data (rnorm()), you
can use set.seed() so that we can replicate exactly what you get.

HTH,

Josh

On Fri, Mar 4, 2011 at 1:39 PM, Steve Hong  wrote:
> Hi R-list,
>
> I have a data set with plot locations and observations and want to label
> them based on locations.  For example, I have GPS information (x and y) as
> follows:
[snip]
>> (fm1 <- lm(ysim~xsim))
> Call:
> lm(formula = ysim ~ xsim)
> Coefficients:
> (Intercept)         xsim
>    82.9996       0.5589
>
> I overlapped fitted line on the plot.
>
>> abline(fm1)
> My question is:
> As you can see in the plot, how can I label (or re-group) those in upper
> diagonal as (say) 'A' and the others in lower diagonal as 'B'?
>
> Thanks a lot in advance!!!
>
> Steve
>
>        [[alternative HTML version deleted]]
>
> __
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping and counting in dataframe

2011-02-27 Thread jim holtman
Here is one solution; mine differs since there should be at least one
item in the range which would be itself:

  tm gr
1  12345  1
2  42352  3
3  12435  1
4  67546  2
5  24234  2
6  76543  4
7  31243  2
8  13334  3
9  64562  3
10 64123  3
> d$ct <- ave(d$tm, d$gr, FUN = function(x){
+ # determine count in the range
+ sapply(x, function(a) sum((x >= a - 500) & (x <= a + 500)))
+ })
>
> d
  tm gr ct
1  12345  1  2
2  42352  3  1
3  12435  1  2
4  67546  2  1
5  24234  2  1
6  76543  4  1
7  31243  2  1
8  13334  3  1
9  64562  3  2
10 64123  3  2


On Sat, Feb 26, 2011 at 5:10 PM, zem  wrote:
> sry,
> new try:
>
> tm<-c(12345,42352,12435,67546,24234,76543,31243,13334,64562,64123)
> gr<-c(1,3,1,2,2,4,2,3,3,3)
> d<-data.frame(cbind(tm,gr))
>
> where tm are unix times and gr the factor grouping by
> i have a skalar for example k=500
> now i need to calculate in for every row how much examples in the group are
> in the interval [i-500;i+500] and i is the active tm-element, like this:
>
>>d
>    tm gr ct
> 1  12345  1  2
> 2  42352  3  0
> 3  12435  1  2
> 4  67546  2  0
> 5  24234  2  0
> 6  76543  4  0
> 7  31243  2  0
> 8  13334  3  0
> 9  64562  3  2
> 10 64123  3  2
>
> i hope that was a better illustration of my problem
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/grouping-and-counting-in-dataframe-tp3325476p3326338.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping and counting in dataframe

2011-02-27 Thread zem
does nobody have any idea? 
i have already tried tapply(d, gr, ...) but i have problems with the
choice of the function ...  also i am not really sure if that is the right
direction with tapply ... 
it'll be really great when somebody comes with a new suggestion..

10x

-- 
View this message in context: 
http://r.789695.n4.nabble.com/grouping-and-counting-in-dataframe-tp3325476p3327240.html
Sent from the R help mailing list archive at Nabble.com.

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping and counting in dataframe

2011-02-26 Thread zem
sry, 
new try: 

tm<-c(12345,42352,12435,67546,24234,76543,31243,13334,64562,64123) 
gr<-c(1,3,1,2,2,4,2,3,3,3) 
d<-data.frame(cbind(tm,gr))

where tm are unix times and gr the factor grouping by
i have a skalar for example k=500
now i need to calculate in for every row how much examples in the group are
in the interval [i-500;i+500] and i is the active tm-element, like this: 

>d
tm gr ct
1  12345  1  2
2  42352  3  0
3  12435  1  2
4  67546  2  0
5  24234  2  0
6  76543  4  0
7  31243  2  0
8  13334  3  0
9  64562  3  2
10 64123  3  2

i hope that was a better illustration of my problem

-- 
View this message in context: 
http://r.789695.n4.nabble.com/grouping-and-counting-in-dataframe-tp3325476p3326338.html
Sent from the R help mailing list archive at Nabble.com.

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping and counting in dataframe

2011-02-25 Thread David Winsemius


On Feb 25, 2011, at 8:28 PM, zem wrote:



Hi all,

I have a little problem: I have some code written, but it is too slow...

I have a dataframe with a column of time series and a grouping column.
It really does not matter what kind of data the first column holds; it
can be random numbers like this:

x <- rnorm(10)
gr <- c(1,3,1,2,2,4,2,3,3,3)
x <- cbind(x,gr)


That is not a dataframe. It is a matrix. And not all time series  
objects are the same, so you should not assume that any old two column  
object will respond the same way to R functions.




Now, for every row i, I have to count how many values of x[,1] in the
same group lie in the range x[i,1] +/- k (k is another number).


You may find the function findInterval useful. I cannot determine what
your goal is from the description, and there is no complete example with
a specification of what the correct output would be, as you should have
seen requested in the Posting Guide.
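For illustration, a minimal findInterval() sketch with made-up numbers (the window trick below is valid for integer-valued times):

```r
# findInterval() locates each query value in a sorted vector; the
# difference of two positions counts the elements falling in a window.
v <- sort(c(12345, 12435, 13334, 24234))   # sorted times (made up)
findInterval(c(12000, 12400), v)           # positions below/within v

# count of elements in [12000, 12500] (integer-valued v assumed):
ct <- findInterval(12500, v) - findInterval(12000 - 1, v)
ct
```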





--

David Winsemius, MD
West Hartford, CT

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping by factors in R

2011-02-08 Thread Christopher R. Dolanc
I'm working on getting this to work - need to figure out how to extract 
pieces properly.

In the meantime, I may have figured out an alternate method to group 
the factors, as follows:

>  stems139$SpeciesF <- factor(stems139$Species)

>  stems139GLM <- glm(Stems ~ Time*SizeClassF*Species, family=poisson, 
data=stems139)

>  summary(stems139GLM)

Call:

glm(formula = Stems ~ Time * SizeClassF * Species, family = poisson,

data = stems139)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-4.2308   -1.0107   -0.6786   -0.3393   16.7415

Coefficients:
                      Estimate Std. Error z value Pr(>|z|)
(Intercept)          -0.671794   0.118678  -5.661 1.51e-08 ***
TimeVTM              -0.573800   0.197698  -2.902 0.003703 **
SizeClassF2          -0.766172   0.210684  -3.637 0.000276 ***
SizeClassF3          -1.960095   0.337764  -5.803 6.51e-09 ***
SizeClassF4          -2.653242   0.462693  -5.734 9.79e-09 ***
SpeciesABMA           1.824095   0.127895  14.262  < 2e-16 ***
SpeciesJUOC          -0.088293   0.171666  -0.514 0.607022
SpeciesPIAL           1.947920   0.126856  15.355  < 2e-16 ***
SpeciesPICO           2.863407   0.122018  23.467  < 2e-16 ***
SpeciesPIJE          -0.525010   0.194664  -2.697 0.006997 **
SpeciesPIMO           0.372049   0.154251   2.412 0.015866 *
SpeciesTSME           1.919405   0.127085  15.103  < 2e-16 ***
TimeVTM:SizeClassF2  -0.620122   0.411567  -1.507 0.131879
TimeVTM:SizeClassF3   0.756122   0.471612   1.603 0.108875
TimeVTM:SizeClassF4   0.910273   0.618014   1.473 0.140778

The problem now though, is that R for some reason does not list factor 1 
in the output.  Why would this be?
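(A likely explanation, sketched with a toy factor: under R's default treatment contrasts the first level of each factor is the reference category and is absorbed into the intercept, so it gets no row of its own in the coefficient table.)

```r
# With the default treatment contrasts, the first factor level has no
# column of its own: it is the baseline encoded by the intercept.
f <- factor(c("A", "B", "C"))
model.matrix(~ f)
colnames(model.matrix(~ f))   # no "fA" column; level A is the reference
```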

On 2/8/2011 2:21 PM, Dennis Murphy wrote:
> Hi:
>
> One approach would be to use dlply() from the plyr package to generate 
> the models and assign the results to a list, something like the following:
>
> library(plyr)
> # function to run the GLM in each data subset - the argument is a 
> generic data subset d
> gfun <- function(d) glm(Stems ~ Time, data = d, family = poisson)
> mlist <- dlply(stems139, .(SizeClass, Species), gfun)
>
> To see the result, try mlist[[1]] or summary(mlist[[1]]) to execute 
> the print and summary methods on the first fitted model. Each output 
> list object from glm() is a list component of mlist, so mlist is 
> actually a list of lists.
>
> You can extract various pieces from mlist by using ldply() with a 
> suitable extraction function or by use of the do.call/lapply combination.
>
> All of this is untested since no minimal example was provided per 
> instructions in the Posting Guide...
>
> HTH,
> Dennis
>
>
> On Tue, Feb 8, 2011 at 11:54 AM, Christopher R. Dolanc 
> mailto:[email protected]>> wrote:
>
> I'm having a hard time figuring out how to group results by
> certain factors in R.  I have data with the following headings:
>
> [1] "Time"  "Plot"  "LatCat""Elevation" "ElevCat"  
> "Aspect""AspCat""Slope"
> [9] "SlopeCat"  "Species"   "SizeClass" "Stems"
>
> and I'm trying to use a GLM to test differences in "Stems" for
> different categories/factors - most importantly, I want to group
> things so that I see results by "SizeClass" and then by "Species".
>  This is pretty easy in SAS using the "Group By" command, but in
> R, I haven't figured it out.
>
> I've tried using the following code:
>
> > stems139GLM <- glm(Stems ~ Time | SizeClass | Species,
> family=poisson, data=stems139)
>
> but R gives me this message:
>
> Error in pmax(exp(eta), .Machine$double.eps) :
>  cannot mix 0-length vectors with others
> In addition: Warning messages:
> 1: In Ops.factor(Time, SizeClass) : | not meaningful for factors
> 2: In Ops.factor(Time | SizeClass, Species) : | not meaningful for
> factors
>
> I'd appreciate any help.
>
> Thanks.
>
> -- 
> Christopher R. Dolanc
> PhD Candidate
> Ecology Graduate Group
> University of California, Davis
> Lab Phone: (530) 752-2644 (Barbour lab)
>
> __
> [email protected]  mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>

-- 
Christopher R. Dolanc
PhD Candidate
Ecology Graduate Group
University of California, Davis
Lab Phone: (530) 752-2644 (Barbour lab)


[[alternative HTML version deleted]]

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping by factors in R

2011-02-08 Thread Dennis Murphy
Hi:

One approach would be to use dlply() from the plyr package to generate the
models and assign the results to a list, something like the following:

library(plyr)
# function to run the GLM in each data subset - the argument is a generic
data subset d
gfun <- function(d) glm(Stems ~ Time, data = d, family = poisson)
mlist <- dlply(stems139, .(SizeClass, Species), gfun)

To see the result, try mlist[[1]] or summary(mlist[[1]]) to execute the
print and summary methods on the first fitted model. Each output list object
from glm() is a list component of mlist, so mlist is actually a list of
lists.

You can extract various pieces from mlist by using ldply() with a suitable
extraction function or by use of the do.call/lapply combination.

All of this is untested since no minimal example was provided per
instructions in the Posting Guide...
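The do.call/lapply combination mentioned above might look like this, a base-R sketch with made-up data standing in for stems139:

```r
# Fit one Poisson GLM per group, then row-bind the coefficients
# into a matrix with one row per group.
set.seed(1)
d <- data.frame(g = rep(c("a", "b"), each = 20),
                x = rnorm(40),
                y = rpois(40, lambda = 2))

mlist <- lapply(split(d, d$g),
                function(s) glm(y ~ x, data = s, family = poisson))
coefs <- do.call(rbind, lapply(mlist, coef))  # one coefficient row per group
coefs
```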

HTH,
Dennis


On Tue, Feb 8, 2011 at 11:54 AM, Christopher R. Dolanc  wrote:

> I'm having a hard time figuring out how to group results by certain factors
> in R.  I have data with the following headings:
>
> [1] "Time"  "Plot"  "LatCat""Elevation" "ElevCat"   "Aspect"
>  "AspCat""Slope"
> [9] "SlopeCat"  "Species"   "SizeClass" "Stems"
>
> and I'm trying to use a GLM to test differences in "Stems" for different
> categories/factors - most importantly, I want to group things so that I see
> results by "SizeClass" and then by "Species".  This is pretty easy in SAS
> using the "Group By" command, but in R, I haven't figured it out.
>
> I've tried using the following code:
>
> > stems139GLM <- glm(Stems ~ Time | SizeClass | Species, family=poisson,
> data=stems139)
>
> but R gives me this message:
>
> Error in pmax(exp(eta), .Machine$double.eps) :
>  cannot mix 0-length vectors with others
> In addition: Warning messages:
> 1: In Ops.factor(Time, SizeClass) : | not meaningful for factors
> 2: In Ops.factor(Time | SizeClass, Species) : | not meaningful for factors
>
> I'd appreciate any help.
>
> Thanks.
>
> --
> Christopher R. Dolanc
> PhD Candidate
> Ecology Graduate Group
> University of California, Davis
> Lab Phone: (530) 752-2644 (Barbour lab)
>
> __
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping question

2010-10-29 Thread will phillips

Hello Jorge,

Thank you for the reply.  I tried a few different things with if/else but
couldn't get them to work.  I really appreciate your feedback - I learned
something new from this.

Will
-- 
View this message in context: 
http://r.789695.n4.nabble.com/grouping-question-tp3019922p3019952.html
Sent from the R help mailing list archive at Nabble.com.

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping question

2010-10-29 Thread will phillips

Hello Jim

Wow.  I tried cut, but I see you have an interim step with labels a, b, c
and then levels night and day.  I was really close to this: I had labels
night, day, night and it wouldn't let me use duplicate labels.  I am very
grateful for your input.

Will
-- 
View this message in context: 
http://r.789695.n4.nabble.com/grouping-question-tp3019922p3019950.html
Sent from the R help mailing list archive at Nabble.com.

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping question

2010-10-29 Thread Jorge Ivan Velez
Hi Will,

One way would be:

> x
 [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
24
> factor(ifelse(x>6 & x<18, 'day', 'night'))
 [1] night night night night night night night day   day   day   day   day
day   day   day
[16] day   day   day   night night night night night night night
Levels: day night

HTH,
Jorge


On Fri, Oct 29, 2010 at 8:56 PM, will phillips <> wrote:

>
> Hello
>
> I have what is probably a very simple grouping question however, given my
> limited exposure to R, I have not found a solution yet despite my research
> efforts and wild attempts at what I thought "might" produce some sort of
> result.
>
> I have a very simple list of integers that range between 1 and 24.  These
> correspond to hours of the day.
>
> I am trying to create a grouping of Day and Night with
> Day = 6 to 17.99
> Night = 1 to 5.59  and  18 to 24
>
> Using the Cut command I can create the segments but I have not found a
> "combine" type of command to merger the two "night" segments.  No luck with
> if/else either.
>
> Any help would be greatly appreciated
>
> Thank you
>
> Will
>
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/grouping-question-tp3019922p3019922.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping question

2010-10-29 Thread jim holtman
try this:

> x
 [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
> y <- cut(x, breaks=c(-Inf,6,18, Inf), labels=c('a','b','c'))
> levels(y) <- c('night','day','night')
> y
 [1] night night night night night night night day   day   day   day
day   day   day   day   day   day   day
[19] day   night night night night night night
Levels: night day
>
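Another way to merge the two "night" intervals, sketched here, is to cut with labels = FALSE and index into a lookup vector, which sidesteps the duplicate-labels restriction:

```r
# cut(..., labels = FALSE) returns interval codes 1..3; indexing a
# lookup vector maps intervals 1 and 3 both to "night" in one step.
x <- 0:24
lab <- c("night", "day", "night")[cut(x, breaks = c(-Inf, 6, 18, Inf),
                                      labels = FALSE)]
table(lab)
```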


On Fri, Oct 29, 2010 at 8:56 PM, will phillips  wrote:
>
> Hello
>
> I have what is probably a very simple grouping question however, given my
> limited exposure to R, I have not found a solution yet despite my research
> efforts and wild attempts at what I thought "might" produce some sort of
> result.
>
> I have a very simple list of integers that range between 1 and 24.  These
> correspond to hours of the day.
>
> I am trying to create a grouping of Day and Night with
> Day = 6 to 17.99
> Night = 1 to 5.59  and  18 to 24
>
> Using the Cut command I can create the segments but I have not found a
> "combine" type of command to merger the two "night" segments.  No luck with
> if/else either.
>
> Any help would be greatly appreciated
>
> Thank you
>
> Will
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/grouping-question-tp3019922p3019922.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping dataframe entries using a categorical variable

2010-09-17 Thread Phil Spector

Bastien -
   In what way did

subset(yourdataframe,ESS %in% softwood)

not work?
- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 [email protected]



On Fri, 17 Sep 2010, Bastien Ferland-Raymond wrote:


Dear R Users,

I have a problem which I think you might be able to help.  I have a dataframe which I'm 
trying to "filter" following different groups I specified.  It's a little hard 
to explain, so here is an example:

My dataframe:

  ESS DHP
1  EPB  22
2  SAB  10
3  SAB  20
4  BOJ  14
5  ERS  28
11 SAB  10
12 SAB  22
13 BOJ  26
20 SAB  10
21 SAB  22
22 BOJ  32
29 SAB  14
30 SAB  22
38 SAB  14
47 SAB  18

I'm trying to filter it by selecting a subgroup of ESS, for example:
softwood<- c("EPB","SAB")

So I can obtain:
NEW dataframe:
  ESS DHP
1  EPB  22
2  SAB  10
3  SAB  20
11 SAB  10
12 SAB  22
20 SAB  10
21 SAB  22
29 SAB  14
30 SAB  22
38 SAB  14
47 SAB  18

(my real groups are actually bigger and so are my dataframe but you get the 
idea).

I have looked at subset and aggregate but it doesn't work and the loop would be totally 
inefficient. I'm sure there is a function in R that does something like that but I 
couldn't find the proper "keyword" to search for it.

Thanks for your help,

Bastien
__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping dataframe entries using a categorical variable

2010-09-17 Thread Ista Zahn
Hi Bastien,

You can use match(), or the convenience function %in%, like this
(assuming your data.frame is named "dat"):

subset(dat, ESS %in% c("EPB","SAB"))

dat[dat$ESS %in% c("EPB","SAB"), ]

best,
Ista

On Fri, Sep 17, 2010 at 1:02 PM, Bastien Ferland-Raymond
 wrote:
>  Dear R Users,
>
> I have a problem which I think you might be able to help.  I have a dataframe 
> which I'm trying to "filter" following different groups I specified.  It's a 
> little hard to explain, so here is an example:
>
> My dataframe:
>
>   ESS DHP
> 1  EPB  22
> 2  SAB  10
> 3  SAB  20
> 4  BOJ  14
> 5  ERS  28
> 11 SAB  10
> 12 SAB  22
> 13 BOJ  26
> 20 SAB  10
> 21 SAB  22
> 22 BOJ  32
> 29 SAB  14
> 30 SAB  22
> 38 SAB  14
> 47 SAB  18
>
> I'm trying to filter it by selecting a subgroup of ESS, for example:
>  softwood<- c("EPB","SAB")
>
> So I can obtain:
> NEW dataframe:
>   ESS DHP
> 1  EPB  22
> 2  SAB  10
> 3  SAB  20
> 11 SAB  10
> 12 SAB  22
> 20 SAB  10
> 21 SAB  22
> 29 SAB  14
> 30 SAB  22
> 38 SAB  14
> 47 SAB  18
>
> (my real groups are actually bigger and so are my dataframe but you get the 
> idea).
>
> I have looked at subset and aggregate but it doesn't work and the loop would 
> be totally inefficient. I'm sure there is a function in R that does something 
> like that but I couldn't find the proper "keyword" to search for it.
>
> Thanks for your help,
>
> Bastien
> __
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping sets of data, performing function and re-assigning values

2010-08-27 Thread Joshua Wiley
Hi Johnny,

Something like this

rbind(NA, dat.med)[as.numeric(dat$image.group), ]

should do the trick (with the data you provided and Ista's code).  The
key is that dat.med has a different row for each level of the factor
image.group (and in the same order).  The idea is to convert the
factor created by cut that shows which row belongs to which group into
numbers (2, 2, 2, 3, 3, 3, etc.), and use that to select the
appropriate rows from dat.med.

Since level 1 had no data (so no row in dat.med), I just added an NA
row in using rbind().  Supposing levels 1, 13, and 15 were missing,
you would need to insert rows in the appropriate positions for my code
to work.  This is because if you have a factor with 3 levels (like you
do), when converted to numbers they will be 1, 2, and 3.  Even if
there is no actual data for level 1, the numeric conversions of levels
2 and 3 will still be 2 and 3.  So, you need to make sure that row 2
in dat.med, matches level 2 in image.group, and so on for every other
level.
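An alternative that sidesteps the row-matching issue entirely is ave(), which returns a per-group summary already aligned to the original rows; a sketch with toy data in place of the real file:

```r
# ave() broadcasts a group statistic back onto every row, so no manual
# rbind()/indexing is needed to line medians up with ImageNumber.
dat <- data.frame(ImageNumber = c(1, 1, 2, 2, 3),
                  Measurement = c(1, 2, 2, 4, 3))

# group images 1-3 together, as in the cut() step above
dat$image.group  <- cut(dat$ImageNumber, breaks = c(0, 3), labels = FALSE)
dat$group.median <- ave(dat$Measurement, dat$image.group, FUN = median)
dat
```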

HTH,

Josh



On Fri, Aug 27, 2010 at 12:26 PM, Johnny Tkach  wrote:
> HI Ista,
>
> Thanks for the help.  The 'cut' function seems to do the trick .
>
> I'm not sure why you suggested this line of code:
>> ddply(dat, .(image.group), transform, measure.median = median(Measurement))
>
> I think I might have confused the issue by putting a 'Measurement' column in 
> my example in the body of the e-mail, while there is no such column in the 
> actual data.
>
> The second ddply function on the cut data file seems to do the trick for 
> taking the median of the relevant data. However, I still have one more 
> question.  Would it be possible to assign the median data back to the 
> original a.ImageNumber number.  In this situation, the same data would be 
> associated with images 1 through 3 and another set associated with images 4 
> through 6 and so on.
>
> For example (again, I use 'Measurement' just as a generic column):
>
> ImageNumber     Measurement
> 1                               1
> 1                               2
> 1                               3
> 2                               2
> 2                               2
> 3                               4
> 3                               3
> 3                               3
> 3                               4
>
> where the median of all the 'Measurement' data is 3 and the output would be:
>
> ImageNumber     Measurement
> 1                               3
> 1                               3
> 1                               3
> 2                               3
> 2                               3
> 3                               3
> 3                               3
> 3                               3
> 3                               3
>
> or
>
> ImageNumber     Measurement
> 1                               3
> 2                               3
> 3                               3
>
> I really appreciate your help with this.
>
> JT
>
> Johnny Tkach, PhD
> Donnelly CCBR, Rm. 1230
> Department of Biochemistry
> University of Toronto
> 160 College Street
> M5S 3E1
>
> phone - 416 946 5774
> fax - 416 978 8548
> [email protected]
>
> "Beauty's just another word I'm never certain how to spell"
>
>
>
>
> On Aug 27, 2010, at 2:01 PM, Ista Zahn wrote:
>
>> Hi Johnny,
>>
>> If I understand correctly, I think you can use cut() to create a grouping 
>> variable, and then calculate your summaries based on that. Something like
>>
>> dat <- read.csv("~/Downloads/exampledata.csv")
>>
>> dat$image.group <- cut(dat$a.ImageNumber, breaks = seq(0, 
>> max(dat$a.ImageNumber), by = 3))
>> library(plyr)
>> ddply(dat, .(image.group), transform, measure.median = median(Measurement))
>>
>> dat.med <- ddply(dat, .(image.group), summarize,
>>       a.AreaShape_Area.median = median(a.AreaShape_Area),
>>       a.Intensity_IntegratedIntensity_OrigRFP.median = 
>> median(a.Intensity_IntegratedIntensity_OrigRFP),
>>       a.Intensity_IntegratedIntensity_OrigGFP.median = 
>> median(a.Intensity_IntegratedIntensity_OrigGFP),
>>       b.Intensity_MeanIntensity_OrigGFP.median = 
>> median(b.Intensity_MeanIntensity_OrigGFP),
>>       EstCytoIntensity.median = median(EstCytoIntensity),
>>       TotalIntensity.median = median(TotalIntensity),
>>       NucToCytoRatio.median = median(NucToCytoRatio)
>>       )
>>
>> Best,
>> Ista
>> On Fri, Aug 27, 2010 at 5:28 PM, Johnny Tkach  
>> wrote:
>> Hi all,
>>
>>
>> Since I could not attach a file to my original e-mail request, for those who 
>> want to look at an example of a data file I am working with, please use this 
>> link:
>>
>> http://dl.dropbox.com/u/4637975/exampledata.csv
>>
>> Thanks again,
>>
>> Johnny.
>>
>> __
>> [email protected] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>> --
>> Ista Zahn
>> Graduate student

Re: [R] Grouping sets of data, performing function and re-assigning values

2010-08-27 Thread Johnny Tkach
HI Ista,

Thanks for the help.  The 'cut' function seems to do the trick .

I'm not sure why you suggested this line of code:
> ddply(dat, .(image.group), transform, measure.median = median(Measurement))

I think I might have confused the issue by putting a 'Measurement' column in my 
example in the body of the e-mail, while there is no such column in the actual 
data.

The second ddply function on the cut data file seems to do the trick for taking 
the median of the relevant data. However, I still have one more question.  
Would it be possible to assign the median data back to the original 
a.ImageNumber number.  In this situation, the same data would be associated 
with images 1 through 3 and another set associated with images 4 through 6 and 
so on.

For example (again, I use 'Measurement' just as a generic column):

ImageNumber Measurement
1   1
1   2
1   3
2   2
2   2
3   4
3   3
3   3
3   4

where the median of all the 'Measurement' data is 3 and the output would be:

ImageNumber Measurement
1   3
1   3
1   3
2   3
2   3
3   3
3   3
3   3
3   3

or

ImageNumber Measurement
1   3
2   3
3   3

I really appreciate your help with this.

JT

Johnny Tkach, PhD
Donnelly CCBR, Rm. 1230 
Department of Biochemistry
University of Toronto
160 College Street
M5S 3E1

phone - 416 946 5774
fax - 416 978 8548
[email protected]

"Beauty's just another word I'm never certain how to spell"




On Aug 27, 2010, at 2:01 PM, Ista Zahn wrote:

> Hi Johnny,
> 
> If I understand correctly, I think you can use cut() to create a grouping 
> variable, and then calculate your summaries based on that. Something like
> 
> dat <- read.csv("~/Downloads/exampledata.csv")
> 
> dat$image.group <- cut(dat$a.ImageNumber, breaks = seq(0, 
> max(dat$a.ImageNumber), by = 3))
> library(plyr)
> ddply(dat, .(image.group), transform, measure.median = median(Measurement))
> 
> dat.med <- ddply(dat, .(image.group), summarize,
>   a.AreaShape_Area.median = median(a.AreaShape_Area),
>   a.Intensity_IntegratedIntensity_OrigRFP.median = 
> median(a.Intensity_IntegratedIntensity_OrigRFP),
>   a.Intensity_IntegratedIntensity_OrigGFP.median = 
> median(a.Intensity_IntegratedIntensity_OrigGFP),
>   b.Intensity_MeanIntensity_OrigGFP.median = 
> median(b.Intensity_MeanIntensity_OrigGFP),
>   EstCytoIntensity.median = median(EstCytoIntensity),
>   TotalIntensity.median = median(TotalIntensity),
>   NucToCytoRatio.median = median(NucToCytoRatio)
>   )
> 
> Best,
> Ista
> On Fri, Aug 27, 2010 at 5:28 PM, Johnny Tkach  
> wrote:
> Hi all,
> 
> 
> Since I could not attach a file to my original e-mail request, for those who 
> want to look at an example of a data file I am working with, please use this 
> link:
> 
> http://dl.dropbox.com/u/4637975/exampledata.csv
> 
> Thanks again,
> 
> Johnny.
> 
> __
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> 
> 
> -- 
> Ista Zahn
> Graduate student
> University of Rochester
> Department of Clinical and Social Psychology
> http://yourpsyche.org


[[alternative HTML version deleted]]

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping sets of data, performing function and re-assigning values

2010-08-27 Thread Ista Zahn
Hi Johnny,

If I understand correctly, I think you can use cut() to create a grouping
variable, and then calculate your summaries based on that. Something like

dat <- read.csv("~/Downloads/exampledata.csv")

dat$image.group <- cut(dat$a.ImageNumber, breaks = seq(0,
max(dat$a.ImageNumber), by = 3))
library(plyr)
ddply(dat, .(image.group), transform, measure.median = median(Measurement))

dat.med <- ddply(dat, .(image.group), summarize,
  a.AreaShape_Area.median = median(a.AreaShape_Area),
  a.Intensity_IntegratedIntensity_OrigRFP.median =
median(a.Intensity_IntegratedIntensity_OrigRFP),
  a.Intensity_IntegratedIntensity_OrigGFP.median =
median(a.Intensity_IntegratedIntensity_OrigGFP),
  b.Intensity_MeanIntensity_OrigGFP.median =
median(b.Intensity_MeanIntensity_OrigGFP),
  EstCytoIntensity.median = median(EstCytoIntensity),
  TotalIntensity.median = median(TotalIntensity),
  NucToCytoRatio.median = median(NucToCytoRatio)
  )

Best,
Ista
On Fri, Aug 27, 2010 at 5:28 PM, Johnny Tkach wrote:

> Hi all,
>
>
> Since I could not attach a file to my original e-mail request, for those
> who want to look at an example of a data file I am working with, please use
> this link:
>
> http://dl.dropbox.com/u/4637975/exampledata.csv
>
> Thanks again,
>
> Johnny.
>
> __
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org

[[alternative HTML version deleted]]

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping sets of data, performing function and re-assigning values

2010-08-27 Thread Johnny Tkach
Hi all,


Since I could not attach a file to my original e-mail request, for those who 
want to look at an example of a data file I am working with, please use this 
link:

http://dl.dropbox.com/u/4637975/exampledata.csv

Thanks again,

Johnny.

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping and stacking bar plot for categorical variables

2010-07-20 Thread Jim Lemon

On 07/19/2010 11:36 PM, Simon Kiss wrote:

Hi all,
I have a series of cateogiral variables that look just like this:

welfare=sample(c("less", "same", "more"), 1000, replace=TRUE)
education=sample(c("less", "same", "more"), 1000, replace=TRUE)
defence=sample(c("less", "same", "more"), 1000, replace=TRUE)
egp=sample(c("salariat", "routine non-manual", "self-employed, farmers",
"skilled labour, foremen", "unskilled labour", "social and cultural
specialists"), 1000, replace=TRUE)

welfare, education and defence are responses to a series of questions
about whether or not the respondent supports, less, the same or more
spending on an issue.

egp is a class category.

What I would like is a barplot that is both stacked and grouped. The
x-axis categories should be the egp class category. Within each class
category I would like a cluster of stacked bars that show the
distribution of spending support for each issue.

Can anyone suggest something?


Hi Simon,
You might also like to look at the barNest function in the plotrix 
package, not stacked, but nested.
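For reference, base barplot() can stack or cluster but not both at once; a rough workaround sketch is one stacked plot per issue, using toy data as in the original post:

```r
# Build three-way counts and draw one stacked barplot for a single
# issue (welfare); repeating per issue approximates the grouped layout.
set.seed(1)
welfare <- sample(c("less", "same", "more"), 1000, replace = TRUE)
egp     <- sample(c("salariat", "routine non-manual"), 1000, replace = TRUE)

tab <- table(welfare, egp)   # rows become stacked segments per class
barplot(tab, legend.text = rownames(tab),
        main = "Support for welfare spending by class")
```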


Jim

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping and stacking bar plot for categorical variables

2010-07-19 Thread David Winsemius


On Jul 19, 2010, at 9:36 AM, Simon Kiss wrote:


Hi all,
I have a series of cateogiral variables that look just like this:

welfare=sample(c("less", "same", "more"), 1000, replace=TRUE)
education=sample(c("less", "same", "more"), 1000, replace=TRUE)
defence=sample(c("less", "same", "more"), 1000, replace=TRUE)
egp=sample(c("salariat", "routine non-manual", "self-employed,  
farmers", "skilled labour, foremen", "unskilled labour", "social and  
cultural specialists"), 1000, replace=TRUE)


welfare, education and defence are responses to a series of  
questions about whether or not the respondent supports, less, the  
same or more spending on an issue.


egp is a class category.

What I would like is a barplot that is both stacked and grouped.   
The x-axis categories should be the egp class category.  Within each  
class category I would like a cluster of stacked bars that show the  
distribution of spending support for each issue.


Can anyone suggest something?


Learn to search:

RSiteSearch("stacked barchart")

Once you are there you can also ask for prior years' r-help searching  
which will provide a large number of worked examples since this has  
been a frequently asked (and answered) question.


--
David.


Yours, Simon Kiss

*
Simon J. Kiss, PhD
SSHRC and DAAD Post-Doctoral Fellow
John F. Kennedy Institute of North America Studies
Free University of Berlin
Lansstraße 7-9
14195 Berlin, Germany
Cell: +49 (0)1525-300-2812,
Web: http://www.jfki.fu-berlin.de/index.html

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping rows of data by day

2010-04-19 Thread Henrique Dallazuanna
Try this:

aggregate(DF[c('data1','data2')], list(gsub('\\..*', '', DF$time)), FUN =
sum)

On Mon, Apr 19, 2010 at 12:00 PM, jennyed  wrote:

>
> Hi all,
>
> I have a set of data in hourly time steps with each row identified as
> time  data column1 data column2
> 1, 1.042, 1.083, 1.125, 1.167, 1.208, 1.25, ... and so on (the
> time column is in fractions of a day)
>
> I want to be able to group the data by day. I managed to do this using:
>
> Day1H = hourlydata[c(1:24),]
>
> but I'd like to be able to create groups for each day without doing this
> manually for each set of 24 rows.
>
> Any suggestions greatly appreciated
>
> Thanks
>
>
> --
> View this message in context:
> http://n4.nabble.com/Grouping-rows-of-data-by-day-tp2016063p2016063.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

[[alternative HTML version deleted]]

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping data in a data frame: is there an efficient way to do it?

2009-09-02 Thread milton ruser
Hi there,

I think the option of 30 seconds is ok because it is less than the time
each one spent reading the messages :-) Just kidding...

bests

milton

On Wed, Sep 2, 2009 at 8:01 PM, Leo Alekseyev  wrote:

> Thanks everyone for the useful suggestions.  The bottleneck might be
> memory limitations of my machine (3.2GHz, 2 GB) and the fact that I am
> aggregating on a field that is a string.  Using the suggested
> as.data.frame(table(my.df$my.field)) I do get a speedup, but the
> computation still takes 30 seconds.  For the sake of comparison, I did
> write the "counting up rows with common values" function using a Perl
> hash (it's only 5 lines of Perl) and it takes 15 seconds to run -- a
> 2x speedup.  Not yet sure if it's worth the hassle.
>
> --Leo
>
> On Wed, Sep 2, 2009 at 4:28 PM, David M
> Smith wrote:
> > You may want to try using isplit (from the iterators package). Combined
> with
> > foreach, it's an efficient way of iterating through a data frame by
> groups
> > of rows defined by common values of a columns (which I think is what
> you're
> > after). You can speed things up further if you have a multiprocessor
> system
> > with the doMC package to run iterations in parallel. There's an example
> > here:
> >
> http://blog.revolution-computing.com/2009/08/blockprocessing-a-data-frame-with-isplit.html
> > Hope this helps,
> > # David Smith
> > On Wed, Sep 2, 2009 at 3:39 PM, Leo Alekseyev  wrote:
> >>
> >> I have a data frame with about 10^6 rows; I want to group the data
> >> according to entries in one of the columns and do something with it.
> >> For instance, suppose I want to count up the number of elements in
> >> each group.  I tried something like aggregate(my.df$my.field,
> >> list(my.df$my.field), length) but it seems to be very slow.  Likewise,
> >> the split() function was slow (I killed it before it completed).  Is
> >> there a way to efficiently accomplish this in R?..  I am almost
> >> tempted to write an external Perl/Python script entering every row
> >> into a hashtable keyed by my.field and iterating over the keys...
> >> Might this be faster?..
> >>
> >> __
> >> [email protected] mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> >
> >
> > --
> > David M Smith 
> > Director of Community, REvolution Computing www.revolution-computing.com
> > Tel: +1 (206) 577-4778 x3203 (San Francisco, USA)
> >
> > Check out our upcoming events schedule at
> > www.revolution-computing.com/events
> >
> >
>
> __
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping data in a data frame: is there an efficient way to do it?

2009-09-02 Thread Leo Alekseyev
Thanks everyone for the useful suggestions.  The bottleneck might be
memory limitations of my machine (3.2GHz, 2 GB) and the fact that I am
aggregating on a field that is a string.  Using the suggested
as.data.frame(table(my.df$my.field)) I do get a speedup, but the
computation still takes 30 seconds.  For the sake of comparison, I did
write the "counting up rows with common values" function using a Perl
hash (it's only 5 lines of Perl) and it takes 15 seconds to run -- a
2x speedup.  Not yet sure if it's worth the hassle.

--Leo

On Wed, Sep 2, 2009 at 4:28 PM, David M
Smith wrote:
> You may want to try using isplit (from the iterators package). Combined with
> foreach, it's an efficient way of iterating through a data frame by groups
> of rows defined by common values of a columns (which I think is what you're
> after). You can speed things up further if you have a multiprocessor system
> with the doMC package to run iterations in parallel. There's an example
> here:
> http://blog.revolution-computing.com/2009/08/blockprocessing-a-data-frame-with-isplit.html
> Hope this helps,
> # David Smith
> On Wed, Sep 2, 2009 at 3:39 PM, Leo Alekseyev  wrote:
>>
>> I have a data frame with about 10^6 rows; I want to group the data
>> according to entries in one of the columns and do something with it.
>> For instance, suppose I want to count up the number of elements in
>> each group.  I tried something like aggregate(my.df$my.field,
>> list(my.df$my.field), length) but it seems to be very slow.  Likewise,
>> the split() function was slow (I killed it before it completed).  Is
>> there a way to efficiently accomplish this in R?..  I am almost
>> tempted to write an external Perl/Python script entering every row
>> into a hashtable keyed by my.field and iterating over the keys...
>> Might this be faster?..
>>
>> __
>> [email protected] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> David M Smith 
> Director of Community, REvolution Computing www.revolution-computing.com
> Tel: +1 (206) 577-4778 x3203 (San Francisco, USA)
>
> Check out our upcoming events schedule at
> www.revolution-computing.com/events
>
>

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping data in a data frame: is there an efficient way to do it?

2009-09-02 Thread David M Smith
You may want to try using isplit (from the iterators package). Combined with
foreach, it's an efficient way of iterating through a data frame by groups
of rows defined by common values of a columns (which I think is what you're
after). You can speed things up further if you have a multiprocessor system
with the doMC package to run iterations in parallel. There's an example
here:
http://blog.revolution-computing.com/2009/08/blockprocessing-a-data-frame-with-isplit.html

Hope this helps,
# David Smith

On Wed, Sep 2, 2009 at 3:39 PM, Leo Alekseyev  wrote:

> I have a data frame with about 10^6 rows; I want to group the data
> according to entries in one of the columns and do something with it.
> For instance, suppose I want to count up the number of elements in
> each group.  I tried something like aggregate(my.df$my.field,
> list(my.df$my.field), length) but it seems to be very slow.  Likewise,
> the split() function was slow (I killed it before it completed).  Is
> there a way to efficiently accomplish this in R?..  I am almost
> tempted to write an external Perl/Python script entering every row
> into a hashtable keyed by my.field and iterating over the keys...
> Might this be faster?..
>
> __
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
David M Smith 
Director of Community, REvolution Computing www.revolution-computing.com
Tel: +1 (206) 577-4778 x3203 (San Francisco, USA)

Check out our upcoming events schedule at
www.revolution-computing.com/events

[[alternative HTML version deleted]]

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping data in a data frame: is there an efficient way to do it?

2009-09-02 Thread jim holtman
Takes 0.6 seconds on my slow laptop:

> n <- 1e6
> x <- data.frame(a=sample(LETTERS, n, TRUE))
> system.time(print(tapply(x$a, x$a, length)))
    A     B     C     D     E     F     G     H     I     J     K     L     M     N     O     P     Q
38555 38349 38647 38271 38456 38352 38644 38679 38575 38730 38471 38379 38540 38413 38365 38501 38555
    R     S     T     U     V     W     X     Y     Z
38379 38417 38326 38509 38238 38395 38625 38175 38454
   user  system elapsed
   0.59    0.02    0.63
>




On Wed, Sep 2, 2009 at 6:39 PM, Leo Alekseyev wrote:
> I have a data frame with about 10^6 rows; I want to group the data
> according to entries in one of the columns and do something with it.
> For instance, suppose I want to count up the number of elements in
> each group.  I tried something like aggregate(my.df$my.field,
> list(my.df$my.field), length) but it seems to be very slow.  Likewise,
> the split() function was slow (I killed it before it completed).  Is
> there a way to efficiently accomplish this in R?..  I am almost
> tempted to write an external Perl/Python script entering every row
> into a hashtable keyed by my.field and iterating over the keys...
> Might this be faster?..
>
> __
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping data in a data frame: is there an efficient way todo it?

2009-09-02 Thread Bert Gunter
table() and xtabs() are fast only because they are just doing counts. If you
want the general case, you need ?tapply. aggregate() is basically a wrapper
for lapply, so you may have the same performance issues with tapply; try it
to see. They are essentially doing, in R code, the sort of hash table you
describe.
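For per-group summaries beyond counts, tapply() is the base tool meant
here; a minimal sketch on made-up data:

```r
set.seed(42)
df <- data.frame(key = sample(letters[1:3], 20, replace = TRUE),
                 val = runif(20))

# Named vector: mean of 'val' for each level of 'key'
group_means <- tapply(df$val, df$key, mean)
```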

Whether Perl or Python would be faster I cannot say -- but are you including
the time required to develop and debug the script in your assessment?

Bert Gunter
Genentech Nonclinical Biostatistics


-Original Message-
From: [email protected] [mailto:[email protected]] On
Behalf Of David Winsemius
Sent: Wednesday, September 02, 2009 3:59 PM
To: Leo Alekseyev
Cc: [email protected]
Subject: Re: [R] Grouping data in a data frame: is there an efficient way
todo it?

table is reasonably fast. I have more than 4 X 10^6 records and a 2D  
table takes very little time:

  nUA <- with (TRdta, table(URwbc, URrbc)) # both URwbc and URrbc are  
factors
  nUA

This does the same thing and took about 5 seconds just now:

xtabs( ~ URwbc + URrbc, data=TRdta)

On Sep 2, 2009, at 6:39 PM, Leo Alekseyev wrote:

> I have a data frame with about 10^6 rows; I want to group the data
> according to entries in one of the columns and do something with it.
> For instance, suppose I want to count up the number of elements in
> each group.  I tried something like aggregate(my.df$my.field,
> list(my.df$my.field), length) but it seems to be very slow.  Likewise,
> the split() function was slow (I killed it before it completed).  Is
> there a way to efficiently accomplish this in R?..  I am almost
> tempted to write an external Perl/Python script entering every row
> into a hashtable keyed by my.field and iterating over the keys...
> Might this be faster?..



David Winsemius, MD
Heritage Laboratories
West Hartford, CT

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping data in a data frame: is there an efficient way to do it?

2009-09-02 Thread David Winsemius
table is reasonably fast. I have more than 4 X 10^6 records and a 2D  
table takes very little time:


 nUA <- with (TRdta, table(URwbc, URrbc)) # both URwbc and URrbc are  
factors

 nUA

This does the same thing and took about 5 seconds just now:

xtabs( ~ URwbc + URrbc, data=TRdta)

On Sep 2, 2009, at 6:39 PM, Leo Alekseyev wrote:


I have a data frame with about 10^6 rows; I want to group the data
according to entries in one of the columns and do something with it.
For instance, suppose I want to count up the number of elements in
each group.  I tried something like aggregate(my.df$my.field,
list(my.df$my.field), length) but it seems to be very slow.  Likewise,
the split() function was slow (I killed it before it completed).  Is
there a way to efficiently accomplish this in R?..  I am almost
tempted to write an external Perl/Python script entering every row
into a hashtable keyed by my.field and iterating over the keys...
Might this be faster?..




David Winsemius, MD
Heritage Laboratories
West Hartford, CT

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping data in dataframe

2009-07-15 Thread Timo Schneider
Am Mittwoch, den 15.07.2009, 00:42 -0500 schrieb [email protected]:

Hi!

> Hi: I think aggregate does what you want. you had 34 in one of your
> columns but I think you meant it to be 33.
> 
> DF <- read.table(textConnection("ExpA ExpB ExpC Size
> 1 12 23 33 1
> 2 12 24 29 1
> 3 10 22 34 1
> 4 25 50 60 2
> 5 24 53 62 2
> 6 21 49 61 2"),header=TRUE)
> 
> print(DF)
> print(str(DF))
> 
> aggregate(DF,list(DF$Size),median)

Yes, thanks to you and all the other people who helped! The aggregate
function is exactly what I was looking for. Thanks for the help!

Regards,
Timo

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping data in dataframe

2009-07-15 Thread John Kane

Another approach is to use the reshape package 

--Assuming your data.frame is called xx
--
library(reshape)
mm <- melt(xx, id=c("Size")) ; mm
cast(mm, Size ~variable, median)
--

--- On Tue, 7/14/09, Timo Schneider  wrote:

> From: Timo Schneider 
> Subject: [R] Grouping data in dataframe
> To: "[email protected]" 
> Received: Tuesday, July 14, 2009, 11:56 PM
> Hello,
> 
> I have a dataframe (obtained from read.table()) which looks
> like
> 
>  
>    ExpA   ExpB   ExpC   Size
> 1    12     23     33      1
> 2    12     24     29      1
> 3    10     22     34      1
> 4    25     50     60      2
> 5    24     53     62      2
> 6    21     49     61      2
> 
> now I want to take all rows that have the same value in the
> "Size"
> column and apply a function to the columns of these rows
> (for example
> median()). The result should be a new dataframe with the
> medians of the
> groups, like this:
> 
>  
>    ExpA   ExpB   ExpC   Size
> 1    12     23     34      1
> 2    24     50     61      2
> 
> I tried to play with the functions by() and tapply() but I
> didn't get
> the results I wanted so far, so any help on this would be
> great!
> 
> The reason why I am having this problem: (I explain this
> just to make
> sure I don't do something against the nature of R.)
> 
> I am doing 3 similar experiments, A,B,C and I change a
> parameter in the
> experiment (size). Every experiment is done multiple times
> and I need
> the median or average over all experiments that are the
> same. Should I
> preprocess my data files so that they are completely
> different? Or is it
> fine the way it is and I just overlooked the simple
> solution to the
> problem described above?
> 
> Regards,
> Timo
> 
> __
> [email protected]
> mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained,
> reproducible code.
> 



__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping data in dataframe

2009-07-14 Thread Dieter Menne



Timo Schneider wrote:
> 
> 
> I have a dataframe (obtained from read.table()) which looks like
> 
>  ExpA   ExpB   ExpC   Size
> 1    12     23     33      1
> 2    12     24     29      1
> 3    10     22     34      1
> 4    25     50     60      2
> 5    24     53     62      2
> 6    21     49     61      2
> 
> now I want to take all rows that have the same value in the "Size"
> column and apply a function to the columns of these rows (for example
> median()). The result should be a new dataframe with the medians of the
> groups, like this:
> 
> 

Besides the mentioned aggregate, you could use one of the functions in
package plyr.

Dieter

-- 
View this message in context: 
http://www.nabble.com/Grouping-data-in-dataframe-tp24491539p24492807.html
Sent from the R help mailing list archive at Nabble.com.

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping data in dataframe

2009-07-14 Thread Moshe Olshansky

Try ?aggregate
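For concreteness, a sketch of what ?aggregate leads to, using the
poster's own numbers:

```r
DF <- data.frame(ExpA = c(12, 12, 10, 25, 24, 21),
                 ExpB = c(23, 24, 22, 50, 53, 49),
                 ExpC = c(33, 29, 34, 60, 62, 61),
                 Size = c(1, 1, 1, 2, 2, 2))

# Median of each experiment column within each Size group
res <- aggregate(DF[c("ExpA", "ExpB", "ExpC")],
                 by = list(Size = DF$Size), FUN = median)
```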

--- On Wed, 15/7/09, Timo Schneider  wrote:

> From: Timo Schneider 
> Subject: [R] Grouping data in dataframe
> To: "[email protected]" 
> Received: Wednesday, 15 July, 2009, 1:56 PM
> Hello,
> 
> I have a dataframe (obtained from read.table()) which looks
> like
> 
>  
>    ExpA   ExpB   ExpC   Size
> 1    12     23     33      1
> 2    12     24     29      1
> 3    10     22     34      1
> 4    25     50     60      2
> 5    24     53     62      2
> 6    21     49     61      2
> 
> now I want to take all rows that have the same value in the
> "Size"
> column and apply a function to the columns of these rows
> (for example
> median()). The result should be a new dataframe with the
> medians of the
> groups, like this:
> 
>  
>    ExpA   ExpB   ExpC   Size
> 1    12     23     34      1
> 2    24     50     61      2
> 
> I tried to play with the functions by() and tapply() but I
> didn't get
> the results I wanted so far, so any help on this would be
> great!
> 
> The reason why I am having this problem: (I explain this
> just to make
> sure I don't do something against the nature of R.)
> 
> I am doing 3 similar experiments, A,B,C and I change a
> parameter in the
> experiment (size). Every experiment is done multiple times
> and I need
> the median or average over all experiments that are the
> same. Should I
> preprocess my data files so that they are completely
> different? Or is it
> fine the way it is and I just overlooked the simple
> solution to the
> problem described above?
> 
> Regards,
> Timo
> 
> __
> [email protected]
> mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained,
> reproducible code.
>

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping data in dataframe

2009-07-14 Thread Peter Alspach
Tena koe Timo

?aggregate

HTH ...

Peter Alspach 

> -Original Message-
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Timo Schneider
> Sent: Wednesday, 15 July 2009 3:56 p.m.
> To: [email protected]
> Subject: [R] Grouping data in dataframe
> 
> Hello,
> 
> I have a dataframe (obtained from read.table()) which looks like
> 
>  ExpA   ExpB   ExpC   Size
> 1    12     23     33      1
> 2    12     24     29      1
> 3    10     22     34      1
> 4    25     50     60      2
> 5    24     53     62      2
> 6    21     49     61      2
> 
> now I want to take all rows that have the same value in the "Size"
> column and apply a function to the columns of these rows (for 
> example median()). The result should be a new dataframe with 
> the medians of the groups, like this:
> 
>  ExpA   ExpB   ExpC   Size
> 1    12     23     34      1
> 2    24     50     61      2
> 
> I tried to play with the functions by() and tapply() but I 
> didn't get the results I wanted so far, so any help on this 
> would be great!
> 
> The reason why I am having this problem: (I explain this just 
> to make sure I don't do something against the nature of R.)
> 
> I am doing 3 similar experiments, A,B,C and I change a 
> parameter in the experiment (size). Every experiment is done 
> multiple times and I need the median or average over all 
> experiments that are the same. Should I preprocess my data 
> files so that they are completely different? Or is it fine 
> the way it is and I just overlooked the simple solution to 
> the problem described above?
> 
> Regards,
> Timo
> 
> __
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

The contents of this e-mail are confidential and may be subject to legal 
privilege.
 If you are not the intended recipient you must not use, disseminate, 
distribute or
 reproduce all or any part of this e-mail or attachments.  If you have received 
this
 e-mail in error, please notify the sender and delete all material pertaining 
to this
 e-mail.  Any opinion or views expressed in this e-mail are those of the 
individual
 sender and may not represent those of The New Zealand Institute for Plant and
 Food Research Limited.

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping multiple runs of multiple datasets in lattice's xyplot

2009-04-28 Thread Deepayan Sarkar
On Mon, Apr 27, 2009 at 12:33 PM, Daniel Kornhauser
 wrote:
> Hi:
>
> I don't know if my explanation below is clear, so afterwards, I wrote a
> small a self contained annotated example that generates two plots.
> I execute simulations with different parameters settings that create several
> datasets, and for each parameter setting (or dataset) I can obtain different
> results by choosing a different random seed.
>
> I am stuck grouping data from the datasets and runs in an xyplot of lattice.
> First, I would like to group my data *by datasets* but only using color.
> Then, I would like to "subgroup" the data points into to draw  individual
> line *by runs*.
> Does anybody have any suggestions on how to do this ?
>
>               Thanks
>                                                Daniel.
>
> ### START EXAMPLE
> library("lattice")
> model.run = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3,
> 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5,
> 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8,
> 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 11, 11,
> 11, 11, 11, 11, 12, 12, 12, 12, 12, 12)
>
> model.dataset = c(2, 1, 2, 1, 2, 1, 2, 1, 2, 2, 2, 1, 2, 1, 2, 1, 2, 1, 2,
> 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2,
> 2, 2, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1,
> 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)
>
> model.conditional.var = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,
> 33, 33, 33, 33, 33, 33, 33, 33, 66, 66, 66, 66, 33, 33, 33, 33, 33, 33, 66,
> 66, 66, 66, 66, 66, 66, 66, 66, 66, 99, 99,99, 99, 66, 66, 66, 66, 66, 66,
> 99, 99, 99, 99, 66, 66, 66, 66, 66, 66, 99, 99, 99, 99, 99, 99, 99, 99, 99,
> 99, 99, 99, 99, 99, 99, 99, 99, 99)
>
> model.x = c(0, 0, 1, 1, 2, 2, 3, 3, 4, 5, 0, 0, 1, 1, 2, 2, 3, 3, 4, 5, 0,
> 1, 2, 3, 4, 5, 0, 1, 2, 3, 0, 0, 1, 1, 2, 2, 3, 3, 4, 5, 0, 1, 2, 3, 4, 5,
> 0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 0, 1,2, 3, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 0, 1,
> 2, 3, 4, 5, 0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4,
> 5, 0, 1, 2, 3, 4, 5)
>
> model.y = c(50, 50, 50, 51, 50, 55, 50, 59, 49, 46, 50, 50, 49, 53, 51, 55,
> 53, 54, 53, 52, 50, 50, 52, 52, 52, 46, 83, 82, 83, 87, 83, 83, 85, 89, 85,
> 93, 89, 97, 92, 92, 83, 85, 85, 88, 90, 90, 116, 117, 115, 117, 83, 86, 88,
> 91, 91, 96, 116, 116, 118, 123, 116, 118, 121,122, 120, 121, 149, 151, 154,
> 158, 116, 118, 120, 123, 128, 131, 149, 148, 150, 153, 116, 116, 118, 118,
> 122, 126, 149, 147, 145, 149, 157, 163, 149, 156, 165, 175, 180, 180, 149,
> 150, 152, 157, 165, 169)
>
> dat = data.frame(run = model.run, dataset = model.dataset,
> conditional.var = model.conditional.var, x = model.x, y = model.y)

myColors <- c("red", "blue")

xyplot(y ~ x | factor(conditional.var),
   data = dat,
   groups = interaction(dataset, run),
   col = myColors,
   ylab =  NULL,
   type = "l",
   key = list(text = list(c("1", "2")),
  lines = list(col = myColors),
  space = "right"))

The trick is to specify just two colors, and create the interaction()
in the right order so that the colors get recycled properly.

-Deepayan
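The level ordering this trick relies on can be checked directly:
interaction() orders its levels with the first factor varying fastest, so
two colours recycle across the dataset index within each run. A small
check on made-up factors:

```r
# The first argument varies fastest in interaction()'s level ordering
levs <- interaction(rep(1:2, 3), rep(1:3, each = 2))
head(levels(levs), 4)   # dataset index cycles first: "1.1" "2.1" "1.2" "2.2"
```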

>
> a = xyplot(y ~ x | conditional.var,
>        main = "Correctly grouped colors",
>        sub =  "I would like to group multipe data sets with the colors
> shown here, but I don't want the data points to be grouped in a single
> lines",
>        groups = dataset, # Grouping by Data Set
>        ylab =  NULL,
>        data = dat,
>        type = "l",
>        auto.key = list(space = "right"))
> print(a)
>
> dev.new()
> b = xyplot(y ~ x | conditional.var,
>        main = "Correctly grouped lines",
>        sub = "I would like group two data sets with the lines shown here,
> but not with an individual color for each line",
>        data = dat,
>        groups = run,    # Grouping by Run
>        ylab =  NULL,
>        type = "l",
>        auto.key = list(space = "right"))
> print(b)
> ### END EXAMPLE
>
>        [[alternative HTML version deleted]]
>
> __
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping of data frames to calculate means

2009-04-06 Thread Thomas Lumley

On Mon, 6 Apr 2009, Steve Murray wrote:



Dear R Users,

I have 120 data frames of the format table_198601, table_198602... 
table_198612, table_198701, table_198702... table_198712 through to 
table_199512 (ie. the first 4 digits are years which vary from 1986 to 
1995, and the final two digits are months which vary from 01 to 12 for 
each year).


I simply hope to find the means of column 3 of each of the 120 tables 
without having to type out mean(table_198601[3]) etc etc each time. How 
would I go about doing this? And how would I go about finding the mean 
of all the January months (01) from say 1986 to 1990?


Put all the tables in a list, then you can iterate over them, eg
 lapply(the_list, function(table) mean(table[,3]))

You might also want a list of lists to maintain the annual structure, so 
that


januaries <- lapply(list_list, function(year) mean(year[[1]][,3]))

-thomas
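A runnable sketch of the list approach (table names and contents are
stand-ins; the poster's real 120 tables would already exist in the
workspace):

```r
set.seed(1)
# Stand-in: create 24 of the tables named table_YYYYMM
for (y in 1986:1987) for (m in 1:12)
  assign(sprintf("table_%d%02d", y, m),
         data.frame(a = 1:3, b = 4:6, c = rnorm(3)),
         envir = globalenv())

# Collect them into a list by name, then summarise column 3 of each
nms <- as.vector(sapply(1986:1987,
                        function(y) sprintf("table_%d%02d", y, 1:12)))
the_list   <- mget(nms, envir = globalenv())
col3_means <- sapply(the_list, function(tab) mean(tab[[3]]))
```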


Finally, I hope to be able to plot (as a scatter graph) the values of 
column 1 against the mean of those from column 3 for all the months in 
the period 1989 to 1990 and then 1991 to 1995.


Any help offered would be very much appreciated.

Thanks,

Steve


__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Thomas Lumley   Assoc. Professor, Biostatistics
University of Washington, Seattle

__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping Numbers

2009-03-25 Thread Jason Rupert

Jorge, 

Thank you very much for your post. 

I tried the below with a few modifications:
# First case
N<-10
X<-rnorm(N)
step_size<-1

# Groups
g<-rep(1:(N/step_size),each=step_size)

# The result
tmp_output<-tapply(X,g,mean)

length_tmp_output<-length(tmp_output)
tmp_x_vals<-rep(step_size,length_tmp_output)
plot(tmp_x_vals, tmp_output)

for(ii in 1:val_size)
{   
step_size<-ii

# Groups
g<-rep(1:(N/step_size),each=step_size)

# The result
tmp_output<-tapply(X,g,mean)

length_tmp_output<-length(tmp_output)
tmp_x_vals<-rep(step_size,length_tmp_output)
points(tmp_x_vals, tmp_output)
}

However, when I change the step_size to 100, I receive the following error:
"Error in tapply(X, g, mean) : arguments must have same length"

Do you have any idea why the for loop crashes?  

I figured it would run smooth since it runs fine prior to the loop. 

Thanks for any insights. 
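The error arises because rep(1:(N/step_size), each = step_size) only has
length N when step_size divides N exactly; otherwise tapply() receives
vectors of different lengths. A grouping vector that always matches X
(a sketch, not from the thread):

```r
set.seed(1)
N <- 10
X <- rnorm(N)
step_size <- 3                                       # does not divide N evenly

g_bad <- rep(1:(N %/% step_size), each = step_size)  # length 9, not 10
g_ok  <- ceiling(seq_along(X) / step_size)           # 1 1 1 2 2 2 3 3 3 4

# The last group simply has fewer elements
group_means <- tapply(X, g_ok, mean)
```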




--- On Tue, 3/24/09, Jorge Ivan Velez  wrote:

> From: Jorge Ivan Velez 
> Subject: Re: [R] Grouping Numbers
> To: [email protected]
> Cc: [email protected]
> Date: Tuesday, March 24, 2009, 10:28 PM
> Dear Jason,
> Try this:
> 
> # First case
> N<-10
> X<-rnorm(N)
> 
> # Groups
> g<-rep(1:(N/10),each=10)
> 
> # The result
> tapply(X,g,mean)
> 
> 
> For the second case, just change each=10 by each=100 and
> run again the code
> above.
> 
> HTH,
> 
> Jorge
> 
> 
> On Tue, Mar 24, 2009 at 10:52 PM, Jason Rupert
> wrote:
> 
> >
> > Ugh...This should be very simple, but evidently I am
> not searching for the
> > proper term.
> >
> > Given the below:
> > val_size<-10
> > x_vals<-rnorm(val_size)
> >
> > I would like to group them according to the following
> > x_vals_mean_tmp[1]<-mean(x_vals[1:10])
> > x_vals_mean_tmp[2]<-mean(x_vals[11:20])
> > ...
> > x_vals_mean_tmp[n]<-mean(x_vals[1:10])
> >
> > Then,
> > I would like to group them according to the following
> > x_vals_mean_tmp[1]<-mean(x_vals[1:100])
> > x_vals_mean_tmp[2]<-mean(x_vals[101:200])
> > ...
> > x_vals_mean_tmp[m]<-mean(x_vals[99901:100000])
> > etc.
> >
> > I'm pretty sure I can come up with a loop to do
> this, but wondering if
> > there is something that will allow me to break up the
> x_vals vector
> > according to a certain step size.  I looked at split
> and cut, but those did
> > not appear to accomplish what is needed.
> >
> > Thanks again.
> >



Re: [R] Grouping Numbers

2009-03-24 Thread Jorge Ivan Velez
Dear Jason,
Try this:

# First case
N<-10
X<-rnorm(N)

# Groups (n is the group size)
n<-10
g<-rep(1:(N/n),each=n)

# The result
tapply(X,g,mean)


For the second case, use groups of 100 instead of 10 and run the code above
again.
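[The same grouping can also be checked against a matrix-based alternative; a sketch, assuming the group size n divides length(X) exactly:

```r
N <- 1000
n <- 100                                   # group size
X <- rnorm(N)
g <- rep(1:(N/n), each = n)
means_tapply <- tapply(X, g, mean)
# Equivalent: reshape X so each group is one column, then take column means
means_matrix <- colMeans(matrix(X, nrow = n))
all.equal(as.numeric(means_tapply), means_matrix)  # TRUE
```
]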

HTH,

Jorge


On Tue, Mar 24, 2009 at 10:52 PM, Jason Rupert wrote:

>
> Ugh...This should be very simple, but evidently I am not searching for the
> proper term.
>
> Given the below:
> val_size<-10
> x_vals<-rnorm(val_size)
>
> I would like to group them according to the following
> x_vals_mean_tmp[1]<-mean(x_vals[1:10])
> x_vals_mean_tmp[2]<-mean(x_vals[11:20])
> ...
> x_vals_mean_tmp[n]<-mean(x_vals[1:10])
>
> Then,
> I would like to group them according to the following
> x_vals_mean_tmp[1]<-mean(x_vals[1:100])
> x_vals_mean_tmp[2]<-mean(x_vals[101:200])
> ...
> x_vals_mean_tmp[m]<-mean(x_vals[99901:100000])
> etc.
>
> I'm pretty sure I can come up with a loop to do this, but wondering if
> there is something that will allow me to break up the x_vals vector
> according to a certain step size.  I looked at split and cut, but those did
> not appear to accomplish what is needed.
>
> Thanks again.
>




Re: [R] Grouping Numbers

2009-03-24 Thread David Winsemius

Look at the seq function's help page.
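[One way to combine seq() with cut() for this, as a sketch (variable names are illustrative, not from the original post):

```r
x_vals <- rnorm(100)
breaks <- seq(0, length(x_vals), by = 10)    # 0, 10, 20, ..., 100
grp <- cut(seq_along(x_vals), breaks)        # interval labels (0,10], (10,20], ...
x_vals_mean <- tapply(x_vals, grp, mean)     # one mean per block of 10
```
]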

--  
David Winsemius

On Mar 24, 2009, at 10:52 PM, Jason Rupert wrote:



Ugh...This should be very simple, but evidently I am not searching  
for the proper term.


Given the below:
val_size<-10
x_vals<-rnorm(val_size)

I would like to group them according to the following
x_vals_mean_tmp[1]<-mean(x_vals[1:10])
x_vals_mean_tmp[2]<-mean(x_vals[11:20])
...
x_vals_mean_tmp[n]<-mean(x_vals[1:10])

Then,
I would like to group them according to the following
x_vals_mean_tmp[1]<-mean(x_vals[1:100])
x_vals_mean_tmp[2]<-mean(x_vals[101:200])
...
x_vals_mean_tmp[m]<-mean(x_vals[99901:100000])
etc.

I'm pretty sure I can come up with a loop to do this, but wondering  
if there is something that will allow me to break up the x_vals  
vector according to a certain step size.  I looked at split and cut,  
but those did not appear to accomplish what is needed.


Thanks again.




Heritage Laboratories
West Hartford, CT



Re: [R] Grouping stripchart markers

2009-02-14 Thread Dimitris Rizopoulos
well, for a simple stripchart (i.e., default method) you can use 
something like the following:


x <- rnorm(9)
cl <- c(1,1,1,2,2,2,3,3,3)

plot(range(x), c(0, 1), type = "n", yaxt = "n", ann = FALSE)
points(x, rep(0.5, length(x)), pch = cl)
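
Alternatively, a formula-interface sketch: stripchart() then splits by group itself, with pch recycled one symbol per group (note this draws each class on its own row rather than on a single line):

```r
d <- data.frame(data = rnorm(9), class = c(1,1,1,2,2,2,3,3,3))
stripchart(data ~ class, data = d, pch = c(3, 0, 1))  # cross, square, circle
```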


I hope it helps.

Best,
Dimitris


Sam Player wrote:
If I have a dataset grouped into 3 classes, how can I construct a 
stripchart where the markers are different for each class, e.g. a cross 
for 1, square for 2 and circle for 3. In the example below I try to 
define the marker type by the class but this only changes all the 
markers from the default squares into circles. I am assuming I need to 
set pch as an argument in par but am unsure how to proceed.


data <- rnorm(9)
class <- c(1,1,1,2,2,2,3,3,3)
x <- data.frame(data,class)
stripchart(x[,1], pch=x[,2])

Thanks,

Sam.




--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014


