Re: [R] Grouping by Date and showing count of failures by date

2023-09-30 Thread Duncan Murdoch
There's a package called "pivottabler" which exports PivotTable: 
http://pivottabler.org.uk/reference/PivotTable.html .
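
A minimal sketch of that interface, assuming the pivottabler package is 
installed and that failuredf holds one row per failure; the likely missing 
piece in the attempt quoted below is the library() call:

library(pivottabler)
pt <- PivotTable$new()
pt$addData(failuredf)
pt$addColumnDataGroups("FAILDATE")
pt$defineCalculation(calculationName = "FailCounts", summariseExpression = "n()")
pt$renderPivot()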


Duncan Murdoch

On 30/09/2023 7:11 a.m., John Kane wrote:

To follow up on Rui Barradas's post, I do not think PivotTable is an R
command.

You may be thinking of the "pivot_longer" and "pivot_wider" functions in
the {tidyr} package which is part of {tidyverse}.

On Sat, 30 Sept 2023 at 07:03, Rui Barradas  wrote:


On 29/09/2023 at 21:29, Paul Bernal wrote:

Dear friends,

Hope you are doing great. I am attaching the dataset I am working with
because, when I tried to dput() it, I was not able to copy the entire
result from dput(), so I apologize in advance for that.

I am interested in creating a column named Failure_Date_Period that has the
FAILDATE but formatted as YYYY_MM. Then I want to count the number of
failures (given by column WONUM) and just have a dataframe that has the
FAILDATE and the count of WONUM.

I tried this:
pt <- PivotTable$new()
pt$addData(failuredf)
pt$addColumnDataGroups("FAILDATE")
pt <- PivotTable$new()
pt$addData(failuredf)
pt$addColumnDataGroups("FAILDATE")
pt$defineCalculation(calculationName = "FailCounts",
summariseExpression="n()")
pt$renderPivot()

but I was not successful. Bottom line, I need to create a new dataframe
that has the number of failures by FAILDATE, but in YYYY-MM format.

Any help and/or guidance will be greatly appreciated.

Kind regards,
Paul
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide

http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.

Hello,

No data is attached. Maybe try

dput(head(failuredf, 30))

?

And where can we find non-base PivotTable? Please start the scripts with
calls to library() when using non-base functionality.

Hope this helps,

Rui Barradas


--
This e-mail was checked for viruses by AVG antivirus software.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.






__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping by Date and showing count of failures by date

2023-09-30 Thread Ebert,Timothy Aaron
In this sort of post it would help if we knew the package that was being used 
for the example. I found one option.
https://cran.r-project.org/web/packages/pivottabler/vignettes/v00-vignettes.html

There may be a way to create a custom data type that would be a date but 
restricted to a yyyy-mm format. I do not know how to do this.
Could you work with the date as a string in a yyyy-mm format? The issue is 
that R will not handle the string as a date.
A third option would be to look at the lubridate package, which can be installed 
by itself or as part of tidyverse. I do not promise that this is a solution, 
but it could be.
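
As a hedged illustration of the second and third options (assuming failuredf 
has a FAILDATE column that is, or can be parsed as, a Date, and one row per 
failure):

# base R: derive a "yyyy-mm" string and count failures per period
failuredf$Failure_Date_Period <- format(as.Date(failuredf$FAILDATE), "%Y-%m")
fail_counts <- aggregate(WONUM ~ Failure_Date_Period, data = failuredf, FUN = length)
names(fail_counts)[2] <- "FailCounts"

# lubridate: floor_date() keeps the result as a Date truncated to the month
# library(lubridate)
# failuredf$Failure_Date_Period <- floor_date(as.Date(failuredf$FAILDATE), "month")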



-Original Message-
From: R-help  On Behalf Of John Kane
Sent: Saturday, September 30, 2023 7:11 AM
To: Rui Barradas 
Cc: Paul Bernal ; R 
Subject: Re: [R] Grouping by Date and showing count of failures by date

[External Email]

To follow up on Rui Barradas's post, I do not think PivotTable is an R command.

You may be thinking of the "pivot_longer" and "pivot_wider" functions in the 
{tidyr} package which is part of {tidyverse}.

On Sat, 30 Sept 2023 at 07:03, Rui Barradas  wrote:

> On 29/09/2023 at 21:29, Paul Bernal wrote:
> > Dear friends,
> >
> > Hope you are doing great. I am attaching the dataset I am working
> > with because, when I tried to dput() it, I was not able to copy the
> > entire result from dput(), so I apologize in advance for that.
> >
> > I am interested in creating a column named Failure_Date_Period that
> > has
> the
> > FAILDATE but formatted as YYYY_MM. Then I want to count the number
> > of failures (given by column WONUM) and just have a dataframe that
> > has the FAILDATE and the count of WONUM.
> >
> > I tried this:
> > pt <- PivotTable$new()
> > pt$addData(failuredf)
> > pt$addColumnDataGroups("FAILDATE")
> > pt <- PivotTable$new()
> > pt$addData(failuredf)
> > pt$addColumnDataGroups("FAILDATE")
> > pt$defineCalculation(calculationName = "FailCounts",
> > summariseExpression="n()")
> > pt$renderPivot()
> >
> > but I was not successful. Bottom line, I need to create a new
> > dataframe that has the number of failures by FAILDATE, but in YYYY-MM 
> > format.
> >
> > Any help and/or guidance will be greatly appreciated.
> >
> > Kind regards,
> > Paul
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> Hello,
>
> No data is attached. Maybe try
>
> dput(head(failuredf, 30))
>
> ?
>
> And where can we find non-base PivotTable? Please start the scripts
> with calls to library() when using non-base functionality.
>
> Hope this helps,
>
> Rui Barradas
>
>
> --
> This e-mail was checked for viruses by AVG antivirus software.
> www.avg.com
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html

Re: [R] Grouping by Date and showing count of failures by date

2023-09-30 Thread John Kane
To follow up on Rui Barradas's post, I do not think PivotTable is an R
command.

You may be thinking of the "pivot_longer" and "pivot_wider" functions in
the {tidyr} package which is part of {tidyverse}.
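
For the counting itself, a hedged tidyverse sketch (assuming FAILDATE is a 
Date column in failuredf) could be:

library(dplyr)

failuredf %>%
  mutate(Failure_Date_Period = format(FAILDATE, "%Y-%m")) %>%
  count(Failure_Date_Period, name = "FailCounts")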

On Sat, 30 Sept 2023 at 07:03, Rui Barradas  wrote:

> On 29/09/2023 at 21:29, Paul Bernal wrote:
> > Dear friends,
> >
> > Hope you are doing great. I am attaching the dataset I am working with
> > because, when I tried to dput() it, I was not able to copy the entire
> > result from dput(), so I apologize in advance for that.
> >
> > I am interested in creating a column named Failure_Date_Period that has
> the
> > FAILDATE but formatted as YYYY_MM. Then I want to count the number of
> > failures (given by column WONUM) and just have a dataframe that has the
> > FAILDATE and the count of WONUM.
> >
> > I tried this:
> > pt <- PivotTable$new()
> > pt$addData(failuredf)
> > pt$addColumnDataGroups("FAILDATE")
> > pt <- PivotTable$new()
> > pt$addData(failuredf)
> > pt$addColumnDataGroups("FAILDATE")
> > pt$defineCalculation(calculationName = "FailCounts",
> > summariseExpression="n()")
> > pt$renderPivot()
> >
> > but I was not successful. Bottom line, I need to create a new dataframe
> > that has the number of failures by FAILDATE, but in YYYY-MM format.
> >
> > Any help and/or guidance will be greatly appreciated.
> >
> > Kind regards,
> > Paul
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> Hello,
>
> No data is attached. Maybe try
>
> dput(head(failuredf, 30))
>
> ?
>
> And where can we find non-base PivotTable? Please start the scripts with
> calls to library() when using non-base functionality.
>
> Hope this helps,
>
> Rui Barradas
>
>
> --
> This e-mail was checked for viruses by AVG antivirus software.
> www.avg.com
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
John Kane
Kingston ON Canada

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping by Date and showing count of failures by date

2023-09-30 Thread Rui Barradas

On 29/09/2023 at 21:29, Paul Bernal wrote:

Dear friends,

Hope you are doing great. I am attaching the dataset I am working with
because, when I tried to dput() it, I was not able to copy the entire
result from dput(), so I apologize in advance for that.

I am interested in creating a column named Failure_Date_Period that has the
FAILDATE but formatted as YYYY_MM. Then I want to count the number of
failures (given by column WONUM) and just have a dataframe that has the
FAILDATE and the count of WONUM.

I tried this:
pt <- PivotTable$new()
pt$addData(failuredf)
pt$addColumnDataGroups("FAILDATE")
pt <- PivotTable$new()
pt$addData(failuredf)
pt$addColumnDataGroups("FAILDATE")
pt$defineCalculation(calculationName = "FailCounts",
summariseExpression="n()")
pt$renderPivot()

but I was not successful. Bottom line, I need to create a new dataframe
that has the number of failures by FAILDATE, but in YYYY-MM format.

Any help and/or guidance will be greatly appreciated.

Kind regards,
Paul
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

No data is attached. Maybe try

dput(head(failuredf, 30))

?

And where can we find non-base PivotTable? Please start the scripts with 
calls to library() when using non-base functionality.


Hope this helps,

Rui Barradas


--
This e-mail was checked for viruses by AVG antivirus software.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Grouping by Date and showing count of failures by date

2023-09-30 Thread Paul Bernal
Dear friends,

Hope you are doing great. I am attaching the dataset I am working with
because, when I tried to dput() it, I was not able to copy the entire
result from dput(), so I apologize in advance for that.

I am interested in creating a column named Failure_Date_Period that has the
FAILDATE but formatted as YYYY_MM. Then I want to count the number of
failures (given by column WONUM) and just have a dataframe that has the
FAILDATE and the count of WONUM.

I tried this:
pt <- PivotTable$new()
pt$addData(failuredf)
pt$addColumnDataGroups("FAILDATE")
pt <- PivotTable$new()
pt$addData(failuredf)
pt$addColumnDataGroups("FAILDATE")
pt$defineCalculation(calculationName = "FailCounts",
summariseExpression="n()")
pt$renderPivot()

but I was not successful. Bottom line, I need to create a new dataframe
that has the number of failures by FAILDATE, but in YYYY-MM format.

Any help and/or guidance will be greatly appreciated.

Kind regards,
Paul
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping Question

2020-03-22 Thread Chris Evans
Here's a very "step by step" example with dplyr as I'm trying to teach myself 
the Tidyverse way of being

library(dplyr)

# Serial  Measurement  Meas_test  Serial_test
# 1       17           fail       fail
# 1       16           pass       fail
# 2       12           pass       pass
# 2       8            pass       pass
# 2       10           pass       pass
# 3       19           fail       fail
# 3       13           pass       pass

dat <- as.data.frame(list(Serial = c(1,1,2,2,2,3,3),
  Measurement = c(17, 16, 12, 8, 10, 19, 13),
  Meas_test = c("fail", "pass", "pass", "pass", "pass", 
"fail", "pass")))

dat %>%
  group_by(Serial) %>%
  summarise(Serial_test = sum(Meas_test == "fail")) %>%
  mutate(Serial_test = if_else(Serial_test > 0, 1, 0),
 Serial_test = factor(Serial_test,
  levels = 0:1,
  labels = c("pass", "fail"))) -> groupedDat

dat %>%
  left_join(groupedDat) # add -> dat at the end to pipe the result back into dat

Gives:

  Serial Measurement Meas_test Serial_test
1      1          17      fail        fail
2      1          16      pass        fail
3      2          12      pass        pass
4      2           8      pass        pass
5      2          10      pass        pass
6      3          19      fail        fail
7      3          13      pass        fail
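
A more compact variant of the same grouping idea (a hedged sketch against the 
dat defined above):

dat %>%
  group_by(Serial) %>%
  mutate(Serial_test = if_else(any(Meas_test == "fail"), "fail", "pass")) %>%
  ungroup()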

It would be easier for us if you had used dput() to share your data, but thanks for the 
minimal example!

Chris

- Original Message -
> From: "Ivan Krylov" 
> To: "Thomas Subia via R-help" 
> Cc: "Thomas Subia" 
> Sent: Sunday, 22 March, 2020 07:24:15
> Subject: Re: [R] Grouping Question

> On Sat, 21 Mar 2020 20:01:30 -0700
> Thomas Subia via R-help  wrote:
> 
>> Serial_test is a pass, when all of the Meas_test are pass for a given
>> serial. Else Serial_test is a fail.
> 
> Use by/tapply in base R or dplyr::group_by if you prefer tidyverse
> packages.
> 
> --
> Best regards,
> Ivan
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Chris Evans  Visiting Professor, University of Sheffield 

I do some consultation work for the University of Roehampton 
 and other places
but  remains my main Email address.  I have a work web site 
at:
   https://www.psyctc.org/psyctc/
and a site I manage for CORE and CORE system trust at:
   http://www.coresystemtrust.org.uk/
I have "semigrated" to France, see: 
   https://www.psyctc.org/pelerinage2016/semigrating-to-france/ 
That page will also take you to my blog which started with earlier joys in 
France and Spain!

If you want to book to talk, I am trying to keep that to Thursdays and my diary 
is at:
   https://www.psyctc.org/pelerinage2016/ceworkdiary/
Beware: French time, generally an hour ahead of UK.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping Question

2020-03-22 Thread Ivan Krylov
On Sat, 21 Mar 2020 20:01:30 -0700
Thomas Subia via R-help  wrote:

> Serial_test is a pass, when all of the Meas_test are pass for a given
> serial. Else Serial_test is a fail.

Use by/tapply in base R or dplyr::group_by if you prefer tidyverse
packages.
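
For instance, a minimal base R sketch along those lines, assuming dat holds the
Serial and Meas_test columns from the original post:

# TRUE for a serial only when every measurement passed
all_pass <- tapply(dat$Meas_test == "pass", dat$Serial, all)

# map the per-serial result back onto the rows
dat$Serial_test <- ifelse(all_pass[as.character(dat$Serial)], "pass", "fail")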

-- 
Best regards,
Ivan

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Grouping Question

2020-03-21 Thread Thomas Subia via R-help
Colleagues,

Here is my dataset.

Serial  Measurement  Meas_test  Serial_test
1       17           fail       fail
1       16           pass       fail
2       12           pass       pass
2       8            pass       pass
2       10           pass       pass
3       19           fail       fail
3       13           pass       pass

If a measurement is less than or equal to 16, then Meas_test is pass. Else
Meas_test is fail
This is easy to code.

Serial_test is a pass, when all of the Meas_test are pass for a given
serial. Else Serial_test is a fail.
I'm at a loss to figure out how to do this in R.

Some guidance would be appreciated.

All the best,

Thomas Subia

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping by 3 variable and renaming groups

2018-05-26 Thread Jeff Reichman
Rui

Your first code worked just fine.

Jeff

-Original Message-
From: Rui Barradas <ruipbarra...@sapo.pt> 
Sent: Saturday, May 26, 2018 8:30 AM
To: reichm...@sbcglobal.net; 'R-help' <r-help@r-project.org>
Subject: Re: [R] Grouping by 3 variable and renaming groups

Hello,

Sorry, but I think my first answer is wrong.
You probably want something along the lines of


sp <- split(priceStore_Grps, priceStore_Grps$StorePC)
res <- lapply(seq_along(sp), function(i){
     sp[[i]]$StoreID <- paste("Store", i, sep = "_")
     sp[[i]]
})
res <- do.call(rbind, res)
row.names(res) <- NULL


Hope this helps,

Rui Barradas

On 5/26/2018 2:22 PM, Rui Barradas wrote:
> Hello,
> 
> See if this is it:
> 
> priceStore_Grps$StoreID <- paste("Store", 
> seq_len(nrow(priceStore_Grps)), sep = "_")
> 
> 
> Hope this helps,
> 
> Rui Barradas
> 
> On 5/26/2018 2:03 PM, Jeff Reichman wrote:
>> ALCON
>>
>>
>> I'm trying to figure out how to rename groups in a data frame after 
>> groups by selected variabels.  I am using the dplyr library to group 
>> my data by 3 variables as follows
>>
>>
>> # group by lat (StoreX)/long (StoreY)
>>
>> priceStore <- LapTopSales[,c(4,5,15,16)]
>>
>> priceStore <- priceStore[complete.cases(priceStore), ]  # keep only 
>> non NA records
>>
>> priceStore_Grps <- priceStore %>%
>>
>>group_by(StorePC, StoreX, StoreY) %>%
>>
>>summarize(meanPrice=(mean(RetailPrice)))
>>
>>
>> which results in .
>>
>>
>>> priceStore_Grps
>>
>> # A tibble: 15 x 4
>>
>> # Groups:   StorePC, StoreX [?]
>>
>> StorePC  StoreX StoreY meanPrice
>>
>> 
>>
>> 1 CR7 8LE  532714 168302  472.
>>
>> 2 E2 0RY   535652 182961  520.
>>
>> 3 E7 8NW   541428 184515  467.
>>
>> 4 KT2 5AU  517917 170243  522.
>>
>> 5 N17 6QA  533788 189994  523.
>>
>>
>> Which is fine, but I then want to give each group (e.g. CR7 8LE  
>> 532714
>> 168302) a unique identifier (say) Store 1, 2, 3 or some other unique 
>> identifier.
>>
>>
>> StorePC  StoreX StoreY meanPrice
>>
>> 
>>
>> 1 CR7 8LE  532714 168302  472.   Store 1
>>
>> 2 E2 0RY   535652 182961  520.   Store 2
>>
>> 3 E7 8NW   541428 184515  467.   Store 3
>>
>> 4 KT2 5AU  517917 170243  522.   Store 4
>>
>> 5 N17 6QA  533788 189994  523.   Store 5
>>
>>
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see 
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see 
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping by 3 variable and renaming groups

2018-05-26 Thread Jeff Reichman
Rui

That did it 

Jeff

-Original Message-
From: Rui Barradas <ruipbarra...@sapo.pt> 
Sent: Saturday, May 26, 2018 8:23 AM
To: reichm...@sbcglobal.net; 'R-help' <r-help@r-project.org>
Subject: Re: [R] Grouping by 3 variable and renaming groups

Hello,

See if this is it:

priceStore_Grps$StoreID <- paste("Store", seq_len(nrow(priceStore_Grps)), sep = 
"_")


Hope this helps,

Rui Barradas

On 5/26/2018 2:03 PM, Jeff Reichman wrote:
> ALCON
> 
>   
> 
> I'm trying to figure out how to rename groups in a data frame after groups
> by selected variabels.  I am using the dplyr library to group my data by 3
> variables as follows
> 
>   
> 
> # group by lat (StoreX)/long (StoreY)
> 
> priceStore <- LapTopSales[,c(4,5,15,16)]
> 
> priceStore <- priceStore[complete.cases(priceStore), ]  # keep only non NA
> records
> 
> priceStore_Grps <- priceStore %>%
> 
>group_by(StorePC, StoreX, StoreY) %>%
> 
>summarize(meanPrice=(mean(RetailPrice)))
> 
>   
> 
> which results in .
> 
>   
> 
>> priceStore_Grps
> 
> # A tibble: 15 x 4
> 
> # Groups:   StorePC, StoreX [?]
> 
> StorePC  StoreX StoreY meanPrice
> 
> 
> 
> 1 CR7 8LE  532714 168302  472.
> 
> 2 E2 0RY   535652 182961  520.
> 
> 3 E7 8NW   541428 184515  467.
> 
> 4 KT2 5AU  517917 170243  522.
> 
> 5 N17 6QA  533788 189994  523.
> 
>   
> 
> Which is fine, but I then want to give each group (e.g. CR7 8LE  532714
> 168302) a unique identifier (say) Store 1, 2, 3 or some other unique
> identifier.
> 
>   
> 
> StorePC  StoreX StoreY meanPrice
> 
> 
> 
> 1 CR7 8LE  532714 168302  472.   Store 1
> 
> 2 E2 0RY   535652 182961  520.   Store 2
> 
> 3 E7 8NW   541428 184515  467.   Store 3
> 
> 4 KT2 5AU  517917 170243  522.   Store 4
> 
> 5 N17 6QA  533788 189994  523.   Store 5
> 
>   
> 
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping by 3 variable and renaming groups

2018-05-26 Thread Rui Barradas

Hello,

Sorry, but I think my first answer is wrong.
You probably want something along the lines of


sp <- split(priceStore_Grps, priceStore_Grps$StorePC)
res <- lapply(seq_along(sp), function(i){
sp[[i]]$StoreID <- paste("Store", i, sep = "_")
sp[[i]]
})
res <- do.call(rbind, res)
row.names(res) <- NULL
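
A more compact alternative, assuming StorePC alone identifies a store, would be 
to number the groups directly:

priceStore_Grps$StoreID <- paste("Store",
    as.integer(factor(priceStore_Grps$StorePC)), sep = "_")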


Hope this helps,

Rui Barradas

On 5/26/2018 2:22 PM, Rui Barradas wrote:

Hello,

See if this is it:

priceStore_Grps$StoreID <- paste("Store", 
seq_len(nrow(priceStore_Grps)), sep = "_")



Hope this helps,

Rui Barradas

On 5/26/2018 2:03 PM, Jeff Reichman wrote:

ALCON


I'm trying to figure out how to rename groups in a data frame after 
groups
by selected variabels.  I am using the dplyr library to group my data 
by 3

variables as follows


# group by lat (StoreX)/long (StoreY)

priceStore <- LapTopSales[,c(4,5,15,16)]

priceStore <- priceStore[complete.cases(priceStore), ]  # keep only 
non NA

records

priceStore_Grps <- priceStore %>%

   group_by(StorePC, StoreX, StoreY) %>%

   summarize(meanPrice=(mean(RetailPrice)))


which results in .



priceStore_Grps


# A tibble: 15 x 4

# Groups:   StorePC, StoreX [?]

    StorePC  StoreX StoreY meanPrice

        <chr>     <dbl>  <dbl>     <dbl>

1 CR7 8LE  532714 168302  472.

2 E2 0RY   535652 182961  520.

3 E7 8NW   541428 184515  467.

4 KT2 5AU  517917 170243  522.

5 N17 6QA  533788 189994  523.


Which is fine, but I then want to give each group (e.g. CR7 8LE  532714
168302) a unique identifier (say) Store 1, 2, 3 or some other unique
identifier.


    StorePC  StoreX StoreY meanPrice

        <chr>     <dbl>  <dbl>     <dbl>

1 CR7 8LE  532714 168302  472.   Store 1

2 E2 0RY   535652 182961  520.   Store 2

3 E7 8NW   541428 184515  467.   Store 3

4 KT2 5AU  517917 170243  522.   Store 4

5 N17 6QA  533788 189994  523.   Store 5



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping by 3 variable and renaming groups

2018-05-26 Thread Rui Barradas

Hello,

See if this is it:

priceStore_Grps$StoreID <- paste("Store", 
seq_len(nrow(priceStore_Grps)), sep = "_")



Hope this helps,

Rui Barradas

On 5/26/2018 2:03 PM, Jeff Reichman wrote:

ALCON

  


I'm trying to figure out how to rename groups in a data frame after groups
by selected variabels.  I am using the dplyr library to group my data by 3
variables as follows

  


# group by lat (StoreX)/long (StoreY)

priceStore <- LapTopSales[,c(4,5,15,16)]

priceStore <- priceStore[complete.cases(priceStore), ]  # keep only non NA
records

priceStore_Grps <- priceStore %>%

   group_by(StorePC, StoreX, StoreY) %>%

   summarize(meanPrice=(mean(RetailPrice)))

  


which results in .

  


priceStore_Grps


# A tibble: 15 x 4

# Groups:   StorePC, StoreX [?]

StorePC  StoreX StoreY meanPrice

<chr>    <dbl>  <dbl>     <dbl>

1 CR7 8LE  532714 168302  472.

2 E2 0RY   535652 182961  520.

3 E7 8NW   541428 184515  467.

4 KT2 5AU  517917 170243  522.

5 N17 6QA  533788 189994  523.

  


Which is fine, but I then want to give each group (e.g. CR7 8LE  532714
168302) a unique identifier (say) Store 1, 2, 3 or some other unique
identifier.

  


StorePC  StoreX StoreY meanPrice

<chr>    <dbl>  <dbl>     <dbl>

1 CR7 8LE  532714 168302  472.   Store 1

2 E2 0RY   535652 182961  520.   Store 2

3 E7 8NW   541428 184515  467.   Store 3

4 KT2 5AU  517917 170243  522.   Store 4

5 N17 6QA  533788 189994  523.   Store 5

  



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Grouping by 3 variable and renaming groups

2018-05-26 Thread Jeff Reichman
ALCON

 

I'm trying to figure out how to rename groups in a data frame after groups
by selected variabels.  I am using the dplyr library to group my data by 3
variables as follows

 

# group by lat (StoreX)/long (StoreY)

priceStore <- LapTopSales[,c(4,5,15,16)]

priceStore <- priceStore[complete.cases(priceStore), ]  # keep only non NA
records

priceStore_Grps <- priceStore %>%

  group_by(StorePC, StoreX, StoreY) %>%

  summarize(meanPrice=(mean(RetailPrice)))

 

which results in .

 

> priceStore_Grps

# A tibble: 15 x 4

# Groups:   StorePC, StoreX [?]

   StorePC  StoreX StoreY meanPrice

   <chr>    <dbl>  <dbl>     <dbl>

1 CR7 8LE  532714 168302  472.

2 E2 0RY   535652 182961  520.

3 E7 8NW   541428 184515  467.

4 KT2 5AU  517917 170243  522.

5 N17 6QA  533788 189994  523.

 

Which is fine, but I then want to give each group (e.g. CR7 8LE  532714
168302) a unique identifier (say) Store 1, 2, 3 or some other unique
identifier.

 

   StorePC  StoreX StoreY meanPrice

   <chr>    <dbl>  <dbl>     <dbl>

1 CR7 8LE  532714 168302  472.   Store 1

2 E2 0RY   535652 182961  520.   Store 2

3 E7 8NW   541428 184515  467.   Store 3

4 KT2 5AU  517917 170243  522.   Store 4

5 N17 6QA  533788 189994  523.   Store 5

 


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping in R

2015-06-18 Thread PIKAL Petr
Hi

We can only guess what you really want.

Maybe this.

set.seed(111)
cust <- sample(letters[1:5], 500, replace = TRUE)
value <- sample(1:1000, 500)
month <- sample(1:12, 500, replace = TRUE)
dat <- data.frame(cust, value, month)
dat.ag <- aggregate(dat$value, list(dat$month, dat$cust), sum)

 head(dat.ag)
  Group.1 Group.2    x
1       1       a 2444
2       2       a 6234
3       3       a 6082
4       4       a 3691
5       5       a 3044
6       6       a 3534

dput(dat.ag)
structure(list(Group.1 = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L,
12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L,
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L,
6L, 7L, 8L, 9L, 10L, 11L, 12L), Group.2 = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("a", "b",
"c", "d", "e"), class = "factor"), x = c(2444L, 6234L, 6082L,
3691L, 3044L, 3534L, 7444L, 1819L, 2295L, 4774L, 3659L, 1159L,
6592L, 1272L, 8245L, 2324L, 5189L, 3935L, 2945L, 2386L, 2796L,
2869L, 3142L, 4657L, 4411L, 6223L, 3266L, 3842L, 6056L, 7472L,
3879L, 7135L, 4544L, 4498L, 2703L, 3409L, 2748L, 2288L, 2654L,
4995L, 4626L, 5543L, 2162L, 4681L, 5853L, 6229L, 3001L, 5274L,
3852L, 2635L, 5643L, 2809L, 2988L, 3756L, 5180L, 2997L, 4883L,
4208L, 2669L, 3151L)), .Names = c("Group.1", "Group.2", "x"), row.names = c(NA,
-60L), class = "data.frame")


But maybe something different. Who knows?

If you wanted grouping by value use

?cut or ?findInterval
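
For example, a hedged sketch of binning customers into 10 groups by total 
booked weight, using simulated stand-in data (replace with one row per 
customer and their real totals):

set.seed(1)
cust_summary <- data.frame(customer = paste0("C", 1:55000),
                           total_weight = rexp(55000, rate = 1/100))

# 10 groups based on deciles of total weight
cust_summary$weight_group <- cut(cust_summary$total_weight,
    breaks = quantile(cust_summary$total_weight, probs = seq(0, 1, 0.1)),
    include.lowest = TRUE,
    labels = paste("Group", 1:10))

table(cust_summary$weight_group)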

Cheers
Petr


 -Original Message-
 From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Shivi82
 Sent: Thursday, June 18, 2015 9:22 AM
 To: r-help@r-project.org
 Subject: [R] Grouping in R

 Hi All,

 I am working on a data where the total row count is 25+ and have
 approx.
 20 variables. One of the var on which i need to summarize the data is
 Consignor i.e. seller name.

 Now the issue here is after deleting all the duplicate names i still
 have 55000 unique customer name and i am not sure on how to summarize
 the data.

 Is there a possibility that i could create 8 or 10 groups based on the
 weight or booking they made from our company and eventually all 55000
 customers would fall under these 10 groups. Then it could be easier for
 me to analyze in which group there is a variance on a month on month
 level.




 --
 View this message in context: http://r.789695.n4.nabble.com/Grouping-
 in-R-tp4708800.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.



[R] Grouping in R

2015-06-18 Thread Shivi82
Hi All,

I am working on a data where the total row count is 25+ and have approx.
20 variables. One of the var on which i need to summarize the data is
Consignor i.e. seller name. 

Now the issue here is after deleting all the duplicate names i still have
55000 unique customer name and i am not sure on how to summarize the data.

Is there a possibility that i could create 8 or 10 groups based on the
weight or booking they made from our company and eventually all 55000
customers would fall under these 10 groups. Then it could be easier for me
to analyze in which group there is a variance on a month on month level.




--
View this message in context: 
http://r.789695.n4.nabble.com/Grouping-in-R-tp4708800.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] grouping explanatory variables into sets for GLMM

2014-04-03 Thread Maria Kernecker
Dear all, 

I am trying to run a GLMM following the procedure described by Rhodes et al. 
(Ch. 21) in the Zuur book Mixed effects models and extensions in R . Like in 
his example, I have four sets of explanatory variables: 
1. Land use - 1 variable, factor (forest or agriculture)
2. Location - 1 variable, factor (riparian or upland)
3. Agricultural management - 3 variables that are binary (0 or 1 for till, 
manure, annual crop)
4. Vegetation patterns - 4 variables that are continuous (# of plant species in 
4 different functional guilds)

How do I create these sets?  I would like to build my model with these sets 
only instead of listing every variable. 

Also: is there a way of running all possible models with the different 
combinations of these sets and/or variables, sort of like running ordistep for 
ordinations?

Thanks a bunch in advance for your help!
Maria  

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping explanatory variables into sets for GLMM

2014-04-03 Thread Bert Gunter
Have you read An Introduction to R (or other online tutorial)? If
not, please do so before posting further here. It sounds like you are
missing very basic knowledge -- on factors -- which you need to learn
about before proceeding.

?factor

gives you the answer you seek, I believe.

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom.
H. Gilbert Welch




On Thu, Apr 3, 2014 at 6:54 AM, Maria Kernecker
maria.kernec...@mail.mcgill.ca wrote:
 Dear all,

 I am trying to run a GLMM following the procedure described by Rhodes et al. 
 (Ch. 21) in the Zuur book Mixed effects models and extensions in R . Like in 
 his example, I have four sets of explanatory variables:
 1. Land use - 1 variable, factor (forest or agriculture)
 2. Location - 1 variable, factor (riparian or upland)
 3. Agricultural management - 3 variables that are binary (0 or 1 for till, 
 manure, annual crop)
 4. Vegetation patterns - 4 variables that are continuous (# of plant species 
 in 4 different functional guilds)

 How do I create these sets?  I would like to build my model with these 
 sets only instead of listing every variable.

 Also: is there a way of running all possible models with the different 
 combinations of these sets and/or variables, sort of like running ordistep 
 for ordinations?

 Thanks a bunch in advance for your help!
 Maria

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping explanatory variables into sets for GLMM

2014-04-03 Thread Bert Gunter
Unless there is reason to keep the conversation private, always reply
to the list. How will anyone else know that my answer wasn't
satisfactory?

1. I don't intend to go through your references. A minimal
reproducible example of what you wish to do and what you tried would
help.

2. Have you read An Intro to R?

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom.
H. Gilbert Welch




On Thu, Apr 3, 2014 at 5:14 PM, Maria Kernecker, PhD
mkernec...@gmail.com wrote:
 Thanks for getting back to me.

 It seems I didn't write my question clearly and that it was misunderstood - 
 even if it is easy to answer: I would like to reduce the number of 
 explanatory variables in my model by using sets or categories that these 
 variables belong to, like Rhodes et al. did in their chapter, or like Lentini 
 et al. 2012 did in their paper.

 Factor is not the answer I am looking for, unfortunately.

 On Apr 3, 2014, at 11:28 AM, Bert Gunter wrote:

 Have you read An Introduction to R (or other online tutorial)? If
 not, please do so before posting further here. It sounds like you are
 missing very basic knowledge -- on factors -- which you need to learn
 about before proceeding.

 ?factor

 gives you the answer you seek, I believe.

 Cheers,
 Bert

 Bert Gunter
 Genentech Nonclinical Biostatistics
 (650) 467-7374

 Data is not information. Information is not knowledge. And knowledge
 is certainly not wisdom.
 H. Gilbert Welch




 On Thu, Apr 3, 2014 at 6:54 AM, Maria Kernecker
 maria.kernec...@mail.mcgill.ca wrote:
 Dear all,

 I am trying to run a GLMM following the procedure described by Rhodes et 
 al. (Ch. 21) in the Zuur book Mixed effects models and extensions in R . 
 Like in his example, I have four sets of explanatory variables:
 1. Land use - 1 variable, factor (forest or agriculture)
 2. Location - 1 variable, factor (riparian or upland)
 3. Agricultural management - 3 variables that are binary (0 or 1 for till, 
 manure, annual crop)
 4. Vegetation patterns - 4 variables that are continuous (# of plant 
 species in 4 different functional guilds)

 How do I create these sets?  I would like to build my model with these 
 sets only instead of listing every variable.

 Also: is there a way of running all possible models with the different 
 combinations of these sets and/or variables, sort of like running ordistep 
 for ordinations?

 Thanks a bunch in advance for your help!
 Maria

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping explanatory variables into sets for GLMM

2014-04-03 Thread Don McKenzie
Reading the Intro, as Bert suggests, would likely solve some of your problems. 
If you think about how many combinations it would take, using only one variable 
from each group in any one model, you would see that the number of individual 
models (12) is not so onerous that you couldn’t specify them one at a time.

On Apr 3, 2014, at 8:55 PM, Bert Gunter gunter.ber...@gene.com wrote:

 Unless there is reason to keep the conversation private, always reply
 to the list. How will anyone else know that my answer wasn't
 satisfactory?
 
 1. I don't intend to go through your references. A minimal
 reproducible example of what you wish to do and what you tried would
 help.
 
 2. Have you read An Intro to R?
 
 Cheers,
 Bert
 
 Bert Gunter
 Genentech Nonclinical Biostatistics
 (650) 467-7374
 
 Data is not information. Information is not knowledge. And knowledge
 is certainly not wisdom.
 H. Gilbert Welch
 
 
 
 
 On Thu, Apr 3, 2014 at 5:14 PM, Maria Kernecker, PhD
 mkernec...@gmail.com wrote:
 Thanks for getting back to me.
 
 It seems I didn't write my question clearly and that it was misunderstood - 
 even if it is easy to answer: I would like to reduce the number of 
 explanatory variables in my model by using sets or categories that these 
 variables belong to, like Rhodes et al. did in their chapter, or like 
 Lentini et al. 2012 did in their paper.
 
 Factor is not the answer I am looking for, unfortunately.
 
 On Apr 3, 2014, at 11:28 AM, Bert Gunter wrote:
 
 Have you read An Introduction to R (or other online tutorial)? If
 not, please do so before posting further here. It sounds like you are
 missing very basic knowledge -- on factors -- which you need to learn
 about before proceeding.
 
 ?factor
 
 gives you the answer you seek, I believe.
 
 Cheers,
 Bert
 
 Bert Gunter
 Genentech Nonclinical Biostatistics
 (650) 467-7374
 
 Data is not information. Information is not knowledge. And knowledge
 is certainly not wisdom.
 H. Gilbert Welch
 
 
 
 
 On Thu, Apr 3, 2014 at 6:54 AM, Maria Kernecker
 maria.kernec...@mail.mcgill.ca wrote:
 Dear all,
 
 I am trying to run a GLMM following the procedure described by Rhodes et 
 al. (Ch. 21) in the Zuur book Mixed effects models and extensions in R . 
 Like in his example, I have four sets of explanatory variables:
 1. Land use - 1 variable, factor (forest or agriculture)
 2. Location - 1 variable, factor (riparian or upland)
 3. Agricultural management - 3 variables that are binary (0 or 1 for till, 
 manure, annual crop)
 4. Vegetation patterns - 4 variables that are continuous (# of plant 
 species in 4 different functional guilds)
 
 How do I create these sets?  I would like to build my model with these 
 sets only instead of listing every variable.
 
 Also: is there a way of running all possible models with the different 
 combinations of these sets and/or variables, sort of like running ordistep 
 for ordinations?
 
 Thanks a bunch in advance for your help!
 Maria
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

Don McKenzie
Research Ecologist
Pacific WIldland Fire Sciences Lab
US Forest Service

Affiliate Professor
School of Environmental and Forest Sciences 
College of the Environment
University of Washington
d...@uw.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Grouping on a Distance Matrix

2014-02-13 Thread Dario Strbenac
Hello,

I'm looking for a function that groups elements below a certain distance 
threshold, based on a distance matrix. In other words, I'd like to group 
samples without using a standard clustering algorithm on the distance matrix. 
For example, let the distance matrix be :

     A     B     C     D
A    0  0.03  0.77  1.12
B 0.03     0  1.59  1.11
C 0.77  1.59     0  0.09
D 1.12  1.11  0.09     0

Two clusters would be found with a cutoff of 0.1. The first contains A,B. The 
second has C,D. Is there an efficient function that does this ? I can think of 
how to do this recursively, but am hoping it's already been considered.

--
Dario Strbenac
PhD Student
University of Sydney
Camperdown NSW 2050
Australia
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping on a Distance Matrix

2014-02-13 Thread Bert Gunter
You need to re-think. What you said is nonsense. Use an appropriate
clustering algorithm.
(a can be near b; b can be near c; but a is not near c, using near =
closer than threshold)
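
For instance, single-linkage hierarchical clustering cut at the threshold 
captures exactly that chained notion of nearness; a minimal sketch, assuming 
the distance matrix from the question is stored in m:

m <- matrix(c(0,    0.03, 0.77, 1.12,
              0.03, 0,    1.59, 1.11,
              0.77, 1.59, 0,    0.09,
              1.12, 1.11, 0.09, 0),
            nrow = 4, dimnames = list(LETTERS[1:4], LETTERS[1:4]))

hc <- hclust(as.dist(m), method = "single")
cutree(hc, h = 0.1)
# A B C D
# 1 1 2 2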

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom.
H. Gilbert Welch




On Thu, Feb 13, 2014 at 12:00 AM, Dario Strbenac
dstr7...@uni.sydney.edu.au wrote:
 Hello,

 I'm looking for a function that groups elements below a certain distance 
 threshold, based on a distance matrix. In other words, I'd like to group 
 samples without using a standard clustering algorithm on the distance matrix. 
 For example, let the distance matrix be :

   A B C D
 A 0  0.03  0.77  1.12
 B  0.03 0  1.59  1.11
 C  0.77  1.59 0  0.09
 D  1.12  1.11  0.09 0

 Two clusters would be found with a cutoff of 0.1. The first contains A,B. The 
 second has C,D. Is there an efficient function that does this ? I can think 
 of how to do this recursively, but am hoping it's already been considered.

 --
 Dario Strbenac
 PhD Student
 University of Sydney
 Camperdown NSW 2050
 Australia
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Grouping commands so that variablas are removed automatically - like functions

2014-01-20 Thread Rainer M Krug
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hi

I would like to group commands, so that after a group of commands has
been executed, the variables defined in that group are automatically
deleted.

My reasoning: I have a longer sript which is used to load data, do
analysis and plot graphs, all part of a document (in org-mode /
emacs).

I have several datasets which are loaded, and each one is quite
big. So after doing one part of the job (e.g. analysing the data and
storing the results) I want to delete all variables used to free space
and to avoid having these variables being used in the next block and
still having the old (for this block invalid) values.

I can't use rm(list=ls()) as I have some variables as constants
which do not change over the whole document and also some functions
defined.

I could put each block in a function and then call the function and
delete it afterwards, but this is as I see it abusing functions.

I don't want to keep track manually of the variables.

Therefore my question:

Can I do something like:

x <- 15

{ # here begins the block
a <- 1:100
b <- 4:400
} # here ends the block

# here a and b are not defined anymore
# but x is still defined

{} is great for grouping the commands, but the variables are not
deleted afterwards.

Am I missing a language feature in R?

Rainer

- -- 
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa

Tel :   +33 - (0)9 53 10 27 44
Cell:   +33 - (0)6 85 62 59 98
Fax :   +33 - (0)9 58 10 27 44

Fax (D):+49 - (0)3 21 21 25 22 44

email:  rai...@krugs.de

Skype:  RMkrug
-BEGIN PGP SIGNATURE-
Version: GnuPG/MacGPG2 v2.0.22 (Darwin)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJS3SDSAAoJENvXNx4PUvmCEAwH/jBCuQLRpRcPu+PSrUBsck8v
49q3f0wAZqhyfjMQvRnLSECAfQN4GHI1WvXcuC9R8Z0eokL7gAqMnJSgWd61Un0F
I+yClK1qbhpCwR8WV4nDXTuEW5rb5d8a1iHRPxXXSi/vdJZL3imWMsfvGTpgIhVw
Dbi7+BSh52ZFEZPIyTm2+4qBfQA2ZaY3AEPTjBdB4iL603S+lpgmm1mAInFHFx5g
0CzzY3feTWreD+EATXMGofTDaoxR5vuLvIRvv+PA/Ehz/hVnQah2xriL4NR+pIHz
7WbqiReJ8H1ruAgtW6o8CmQRMArHmk0oBy1vYQvwB7SZ8/DOyKkArKBy8tGx/J0=
=dBo5
-END PGP SIGNATURE-

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping commands so that variablas are removed automatically - like functions

2014-01-20 Thread jim holtman
Check out the use of the 'local' function:


 gc()
 used (Mb) gc trigger (Mb) max used (Mb)
Ncells 199420 10.7 407500 21.8   35 18.7
Vcells 308004  2.4 786432  6.0   786424  6.0
 result <- local({
+ a <- rnorm(100)  # big objects
+ b <- rnorm(100)
+ mean(a + b)  # return value
+ })

 result
[1] 0.0001819203
 gc()
 used (Mb) gc trigger (Mb) max used (Mb)
Ncells 199666 10.7 407500 21.8   35 18.7
Vcells 308780  2.4    2975200 22.7  3710863 28.4


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Mon, Jan 20, 2014 at 8:12 AM, Rainer M Krug rai...@krugs.de wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Hi

 I would like to group commands, so that after a group of commands has
 been executed, the variables defined in that group are automatically
 deleted.

 My reasoning: I have a longer sript which is used to load data, do
 analysis and plot graphs, all part of a document (in org-mode /
 emacs).

 I have several datasets which are loaded, and each one is quite
 big. So after doing one part of the job (e.g. analysing the data and
 storing the results) I want to delete all variables used to free space
 and to avoid having these variables being used in the next block and
 still having the old (for this block invalid) values.

 I can't use rm(list=ls()) as I have some variables as constants
 which do not change over the whole document and also some functions
 defined.

 I could put each block in a function and then call the function and
 delete it afterwards, but this is as I see it abusing functions.

 I don't want to keep track manually of the variables.

 Therefore my question:

 Can I do something like:

 x <- 15

 { # here begins the block
 a <- 1:100
 b <- 4:400
 } # here ends the block

 # here a and b are not defined anymore
 # but x is still defined

 {} is great for grouping the commands, but the variables are not
 deleted afterwards.

 Am I missing a language feature in R?

 Rainer

 - --
 Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
 Biology, UCT), Dipl. Phys. (Germany)

 Centre of Excellence for Invasion Biology
 Stellenbosch University
 South Africa

 Tel :   +33 - (0)9 53 10 27 44
 Cell:   +33 - (0)6 85 62 59 98
 Fax :   +33 - (0)9 58 10 27 44

 Fax (D):+49 - (0)3 21 21 25 22 44

 email:  rai...@krugs.de

 Skype:  RMkrug
 -BEGIN PGP SIGNATURE-
 Version: GnuPG/MacGPG2 v2.0.22 (Darwin)
 Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

 iQEcBAEBAgAGBQJS3SDSAAoJENvXNx4PUvmCEAwH/jBCuQLRpRcPu+PSrUBsck8v
 49q3f0wAZqhyfjMQvRnLSECAfQN4GHI1WvXcuC9R8Z0eokL7gAqMnJSgWd61Un0F
 I+yClK1qbhpCwR8WV4nDXTuEW5rb5d8a1iHRPxXXSi/vdJZL3imWMsfvGTpgIhVw
 Dbi7+BSh52ZFEZPIyTm2+4qBfQA2ZaY3AEPTjBdB4iL603S+lpgmm1mAInFHFx5g
 0CzzY3feTWreD+EATXMGofTDaoxR5vuLvIRvv+PA/Ehz/hVnQah2xriL4NR+pIHz
 7WbqiReJ8H1ruAgtW6o8CmQRMArHmk0oBy1vYQvwB7SZ8/DOyKkArKBy8tGx/J0=
 =dBo5
 -END PGP SIGNATURE-

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping commands so that variablas are removed automatically - like functions

2014-01-20 Thread Rainer M Krug
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1



On 01/20/14, 14:27 , jim holtman wrote:
 Check out the use of the 'local' function:

True - have completely forgotten the local function.

Thanks,

Rainer

 
 
  gc()
           used (Mb) gc trigger (Mb) max used (Mb)
 Ncells 199420 10.7     407500 21.8       35 18.7
 Vcells 308004  2.4     786432  6.0   786424  6.0
  result <- local({
 + a <- rnorm(100)  # big objects
 + b <- rnorm(100)
 + mean(a + b)  # return value
 + })
 
  result
 [1] 0.0001819203
  gc()
           used (Mb) gc trigger (Mb) max used (Mb)
 Ncells 199666 10.7     407500 21.8       35 18.7
 Vcells 308780  2.4    2975200 22.7  3710863 28.4
 
 
 Jim Holtman Data Munger Guru
 
 What is the problem that you are trying to solve? Tell me what you
 want to do, not how you want to do it.
 
 
 On Mon, Jan 20, 2014 at 8:12 AM, Rainer M Krug rai...@krugs.de
 wrote: Hi
 
 I would like to group commands, so that after a group of commands
 has been executed, the variables defined in that group are
 automatically deleted.
 
 My reasoning: I have a longer sript which is used to load data, do 
 analysis and plot graphs, all part of a document (in org-mode / 
 emacs).
 
 I have several datasets which are loaded, and each one is quite 
 big. So after doing one part of the job (e.g. analysing the data
 and storing the results) I want to delete all variables used to
 free space and to avoid having these variables being used in the
 next block and still having the old (for this block invalid)
 values.
 
 I can't use rm(list=ls()) as I have some variables as constants 
 which do not change over the whole document and also some
 functions defined.
 
 I could put each block in a function and then call the function
 and delete it afterwards, but this is as I see it abusing
 functions.
 
 I don't want to keep track manually of the variables.
 
 Therefore my question:
 
 Can I do something like:
 
 x <- 15
 
 { # here begins the block
 a <- 1:100
 b <- 4:400
 } # here ends the block
 
 # here a and b are not defined anymore
 # but x is still defined
 
 {} is great for grouping the commands, but the variables are not 
 deleted afterwards.
 
 Am I missing a language feature in R?
 
 Rainer
 
 
 __ 
 R-help@r-project.org mailing list 
 https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the
 posting guide http://www.R-project.org/posting-guide.html and
 provide commented, minimal, self-contained, reproducible code.

- -- 
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa

Tel :   +33 - (0)9 53 10 27 44
Cell:   +33 - (0)6 85 62 59 98
Fax :   +33 - (0)9 58 10 27 44

Fax (D):+49 - (0)3 21 21 25 22 44

email:  rai...@krugs.de

Skype:  RMkrug

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping Matrix by Columns; OHLC Data

2013-09-26 Thread arun
HI,
May be this helps:

set.seed(24)
mat1 <- matrix(sample(1:60, 30*24, replace=TRUE), ncol=24)
colnames(mat1) <- rep(c("O","H","L","C"), 6)
indx <- seq_along(colnames(mat1))
n <- length(unique(colnames(mat1)))
res <- lapply(split(indx, (indx-1)%%n + 1), function(i) mat1[,i])
lapply(res, head, 2)
#$`1`
#  O  O  O  O  O  O
#[1,] 18 56 51 24 24 52
#[2,] 14 31 60 12 43 34
#
#$`2`
#  H  H  H  H  H  H
#[1,] 20  6  4 23 10  2
#[2,] 15 37 22 52 30 42
#
#$`3`
#  L  L  L  L  L  L
#[1,] 30 25 29  1 57 16
#[2,] 15 23 15 10 44 60
#
#$`4`
#  C  C  C  C  C  C
#[1,] 20 13  8 44  5 13
#[2,] 45 17 35  8 25 12

A.K.



Motivation: 

Bring in data containing a number of columns divisible by 4. 
This data contains several different assets and the columns correspond 
to Open,High,Low,Close, Open,High,Low,Close, etc. (thus divisible by 
4). Where I am getting this data from, the header is not labelled 
Open,High,Low,Close, but rather just has the asset symbol. 

The end goal is to have each Open,High,Low,Close,  as its own 
OHLC object, to be run through different volatility functions (via 
QuantMod ) 

I believe I am best served by first grouping the original data 
so that each asset is its own object, with 4 columns. Then I can rename 
the columns to be: 
colnames(function$asset) <- c("Open", "High", "Low", "Close") 

I've attempted to use split, but am having trouble with split along the 
columns. 

Obviously I could manipulate the indexing, with something like 
data[i:i+4] and use a loop. Maybe this indexing approach would work with
 use of apply(). 


Previously, I've been using Mathematica for most of my data 
manipulation, and there I would partition the entire data set i.e. 
Matrix, into   column# / 4 separate objects.  So, in that case I have a 3
 dimensional object. I'd then call the object by its 3rd dimension index
 # [][#]. 

I'm having trouble doing that here. Any thoughts, or at the least some help 
grouping the data by column? 

For the sake of possible examples, let's say the dimensions of my data are n.rows 
= 30, n.col = 24 


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping Matrix by Columns; OHLC Data

2013-09-26 Thread arun
Hi Jake.

Sorry, I misunderstood what you wanted.
Instead of this:

lapply(split(indx,(indx-1)%%n+1),function(i) mat1[,i])

If I use:
res1 <- lapply(split(indx, (indx-1)%/%n + 1), function(i) mat1[,i])

# or
lapply(split(indx, as.numeric(gl(ncol(mat1), n, ncol(mat1)))),
       function(i) mat1[,i])



 lapply(res1,head,2)[1:2]
#$`1`
 #     O  H  L  C
#[1,] 18 20 30 20
#[2,] 14 15 15 45
#
#$`2`
 #     O  H  L  C
#[1,] 56  6 25 13
#[2,] 31 37 23 17

A.K.




So, I got it worked out. Thanks for your input. I see that you used a 
mod, which worked well for the application you solved, and an 
application that will likely come up again. Anyway, here is the 
solution I was looking for: 


set.seed(24) 
mat1 <- matrix(sample(1:60, 30*24, replace=TRUE), ncol=24) 
colnames(mat1) <- rep(c("O","H","L","C"), 6) 
indx <- seq_along(colnames(mat1)) 
n <- length(unique(colnames(mat1))) 


res <- lapply(split(indx, rep(1:6, each = 4, times = 1)), function(i) mat1[,i]) 
## rep(1:6, each = 4, times = 1) 
## [1] 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5 6 6 6 6 

lapply(res,head,2) 


$`1` 
      O  H  L  C 
[1,] 18 20 30 20 
[2,] 14 15 15 45 

$`2` 
      O  H  L  C 
[1,] 56  6 25 13 
[2,] 31 37 23 17 

$`3` 
      O  H  L  C 
[1,] 51  4 29  8 
[2,] 60 22 15 35 

$`4` 
      O  H  L  C 
[1,] 24 23  1 44 
[2,] 12 52 10  8 

$`5` 
      O  H  L  C 
[1,] 24 10 57  5 
[2,] 43 30 44 25 

$`6` 
      O  H  L  C 
[1,] 52  2 16 13 
[2,] 34 42 60 12 

Thanks again 
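A slightly more general sketch of the same split, not hard-coded to six assets
and with the renaming step folded in; it assumes the columns really do arrive in
repeating Open, High, Low, Close order:

n.per.asset <- 4
grp <- rep(seq_len(ncol(mat1) / n.per.asset), each = n.per.asset)
res <- lapply(split(seq_len(ncol(mat1)), grp), function(i) {
    block <- mat1[, i]
    colnames(block) <- c("Open", "High", "Low", "Close")  # names OHLC tools expect
    block
})
head(res[[1]], 2)

Each element of res can then be passed on to the volatility functions, possibly
after attaching a date index, which this simulated matrix does not have.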


- Original Message -
From: arun smartpink...@yahoo.com
To: R help r-help@r-project.org
Cc: 
Sent: Thursday, September 26, 2013 5:15 PM
Subject: Re: Grouping Matrix by Columns; OHLC Data

HI,
May be this helps:

set.seed(24)
mat1 <- matrix(sample(1:60, 30*24, replace=TRUE), ncol=24)
colnames(mat1) <- rep(c("O","H","L","C"), 6)
indx <- seq_along(colnames(mat1))
n <- length(unique(colnames(mat1)))
res <- lapply(split(indx, (indx-1)%%n + 1), function(i) mat1[,i])
lapply(res, head, 2)
#$`1`
#  O  O  O  O  O  O
#[1,] 18 56 51 24 24 52
#[2,] 14 31 60 12 43 34
#
#$`2`
#  H  H  H  H  H  H
#[1,] 20  6  4 23 10  2
#[2,] 15 37 22 52 30 42
#
#$`3`
#  L  L  L  L  L  L
#[1,] 30 25 29  1 57 16
#[2,] 15 23 15 10 44 60
#
#$`4`
#  C  C  C  C  C  C
#[1,] 20 13  8 44  5 13
#[2,] 45 17 35  8 25 12

A.K.



Motivation: 

Bring in data containing a number of columns divisable by 4. 
This data contains several different assets and the columns correspond 
to Open,High,Low,Close, Open,High,Low,Close,  etc (thus divisible by
4). From where I am getting this data, the header is not labled as 
Open,High,Low,Close, but rather just has the asset symbol. 

The end goal is to have each Open,High,Low,Close,  as its own 
OHLC object, to be run through different volatility functions (via 
QuantMod ) 

I believe i am best served by first grouping the original data 
so that each asset is its own object, with 4 columns. Then i can rename 
the columns to be: 
colnames(function$asset) -c(Open, High,Low, Close) 

I've attempted to use split, but am having trouble with split along the 
columns. 

Obviously I could manipulate the indexing, with something like 
data[i:i+4] and use a loop. Maybe this indexing approach would work with
use of apply(). 


Previously, I've been using Mathematica for most of my data 
manipulation, and there I would partition the entire data set i.e. 
Matrix, into   column# / 4 separate objects.  So, in that case I have a 3
dimensional object. I'd then call the object by its 3rd dimension index
# [][#]. 

I'm having trouble doing that here. Any thoughts, or at the least  helping me 
to group the data by column. 

For the sake of possible examples, lets say the dimensions of my data is n.rows 
= 30, n.col = 24 


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Grouping variables by a irregular time interval

2013-09-21 Thread Raoni Rodrigues
Hello all,

I have a very large data frame (more than 5 million lines) as below (dput
example at the end of mail):

Station Antenna Tag            DateTime Power Events
      1       2 999 22/07/2013 11:00:21    17      1
      1       2 999 22/07/2013 11:33:47    31      1
      1       2 999 22/07/2013 11:34:00    19      1
      1       2 999 22/07/2013 11:34:16    53      1
      1       2 999 22/07/2013 11:43:20    15      1
      1       2 999 22/07/2013 11:43:35    17      1

To each Tag, in each Antenna, in each Station, I need to create a 10 min
interval and sum the number of Events and mean of Power in the time
interval, as below (complete wanted output at the end of mail).

Station Antenna Tag       StartDateTime         EndDateTime Power Events
      1       2 999 22/07/2013 11:00:21 22/07/2013 11:00:21    17      1
      1       2 999 22/07/2013 11:34:16 22/07/2013 11:43:35    27      5
      1       2 999 22/07/2013 11:44:35 22/07/2013 11:45:40    17     14
      2       1   1 25/07/2013 14:19:45 25/07/2013 14:20:39    65      4
      2       1   2 25/07/2013 14:20:13 25/07/2013 14:25:14    21      3
      2       1   4 25/07/2013 14:20:46 25/07/2013 14:20:46    28      1

Showing the start and end points of each interval is optional, not necessary. I
put both in to show the irregular time intervals: look at Tag 999: the first
interval is between 11:00 and 11:10, the second between 11:34 and 11:44, and the
third between 11:44 and 11:45.

First I tried a for-loop, without success. After that, I tried this code:

require (plyr)

ddply(data, .(Station, Antenna, Tag, cut(data$DateTime, "10 min")),
      summarise, Power = round(mean(Power), 0), Events = sum(Events))

It is almost what I want, but because cut() divides into regular time intervals,
which in some cases is not what I have, it splits a single group of observations in two.

Any ideas to solve this issue?

R version 3.0.1 (2013-05-16) -- Good Sport
Platform: x86_64-w64-mingw32/x64 (64-bit)
Windows 7 Professional

Thanks in advance,

Raoni
-- 
Raoni Rosa Rodrigues
Research Associate of Fish Transposition Center CTPeixes
Universidade Federal de Minas Gerais - UFMG
Brasil
rodrigues.ra...@gmail.com


##complete data dput

structure(list(Station = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Antenna = c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), Tag = c(999L, 999L, 999L, 999L, 999L, 999L,
999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L,
999L, 999L, 999L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 4L, 18L, 18L, 18L,
21L, 22L, 36L, 36L, 36L, 36L, 36L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L), DateTime = structure(c(3L,
4L, 5L, 5L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 18L, 19L, 19L, 19L, 19L, 20L, 23L, 19L, 17L, 17L,
17L, 23L, 18L, 1L, 1L, 1L, 2L, 2L, 9L, 9L, 10L, 10L, 10L, 10L,
10L, 11L, 11L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 13L, 13L,
13L, 13L, 14L, 14L, 14L, 14L, 14L, 14L, 15L, 15L, 15L, 15L, 15L,
15L, 15L, 16L, 16L, 16L, 16L, 18L, 19L, 21L, 21L, 21L, 21L, 21L,
22L, 22L, 22L, 22L, 22L, 23L, 24L, 24L, 24L, 24L, 24L, 24L, 25L,
25L, 25L, 25L, 25L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 27L, 27L,
27L, 27L, 27L, 27L, 28L, 28L, 28L, 28L, 28L), .Label = c(19/06/2013
22:15,
19/06/2013 22:16, 22/07/2013 11:00, 22/07/2013 11:33, 22/07/2013
11:34,
22/07/2013 11:43, 22/07/2013 11:44, 22/07/2013 11:45, 25/07/2013
14:10,
25/07/2013 14:11, 25/07/2013 14:12, 25/07/2013 14:13, 25/07/2013
14:14,
25/07/2013 14:15, 25/07/2013 14:16, 25/07/2013 14:17, 25/07/2013
14:18,
25/07/2013 14:19, 25/07/2013 14:20, 25/07/2013 14:21, 25/07/2013
14:23,
25/07/2013 14:24, 25/07/2013 14:25, 25/07/2013 14:26, 25/07/2013
14:27,
25/07/2013 14:28, 25/07/2013 14:29, 25/07/2013 14:30), class =
factor),
Power = c(17L, 31L, 19L, 53L, 15L, 17L, 21L, 12L, 15L, 
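No worked answer appears in this thread. One possible reading of the irregular
10-minute windows (a new window starts whenever an observation falls more than
10 minutes after the first observation of the current window) is sketched below;
it assumes DateTime has already been parsed with as.POSIXct and the column names
are as shown above, so it is an illustration rather than code from the thread:

library(plyr)

# assign window ids within one Station/Antenna/Tag group
window_id <- function(times, width = 10 * 60) {
    id <- integer(length(times))
    g <- 1L
    start <- times[1]
    for (i in seq_along(times)) {
        if (as.numeric(difftime(times[i], start, units = "secs")) > width) {
            g <- g + 1L          # gap from window start exceeds 10 min: new window
            start <- times[i]
        }
        id[i] <- g
    }
    id
}

res <- ddply(data, .(Station, Antenna, Tag), function(d) {
    d <- d[order(d$DateTime), ]
    d$win <- window_id(d$DateTime)
    ddply(d, .(win), summarise,
          StartDateTime = min(DateTime),
          EndDateTime   = max(DateTime),
          Power         = round(mean(Power), 0),
          Events        = sum(Events))
})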

Re: [R] Grouping variables by a irregular time interval

2013-09-21 Thread Raoni Rodrigues
Arun caught my attention that I committed a mistake with example data set.
I send now the correct, with same text explain my problem.

Sorry all of you for the confusion.

I have a very large data frame (more than 5 million lines) as below (dput
example at the end of mail):

  Station Antenna Tag            DateTime Power Events
1       1       2 999 22/07/2013 11:00:21    17      1
2       1       2 999 22/07/2013 11:33:47    31      1
3       1       2 999 22/07/2013 11:34:00    19      1
4       1       2 999 22/07/2013 11:34:16    53      1
5       1       2 999 22/07/2013 11:43:20    15      1
6       1       2 999 22/07/2013 11:43:35    17      1

To each Tag, in each Antenna, in each Station, I need to create a 10 min
interval and sum the number of Events and mean of Power in the time
interval, as below (complete wanted output at the end of mail).

  Station Antenna Tag       StartDateTime         EndDateTime Power Events
1       1       2 999 22/07/2013 11:00:21 22/07/2013 11:00:21    17      1
2       1       2 999 22/07/2013 11:34:16 22/07/2013 11:43:35    27      5
3       1       2 999 22/07/2013 11:44:35 22/07/2013 11:45:40    17     14
4       2       1   1 25/07/2013 14:19:45 25/07/2013 14:20:39    65      4
5       2       1   2 25/07/2013 14:20:13 25/07/2013 14:25:14    21      3
6       2       1   4 25/07/2013 14:20:46 25/07/2013 14:20:46    28      1

Show start and end points of each interval is optional, not necessary. I
put both to show the irregular time interval (look at tag 999).

First I tried a for-loop, without success. After that, I tried this code:

require (plyr)

ddply(data, .(Station, Antenna, Tag, cut(data$DateTime, "10 min")),
      summarise, Power = round(mean(Power), 0), Events = sum(Events))

Is almost what I want, because cut() divided in regular time intervals, but
in some cases I do not have this, and it split a unique observation in two.

Any ideas to solve this issue?

R version 3.0.1 (2013-05-16) -- Good Sport
Platform: x86_64-w64-mingw32/x64 (64-bit)
Windows 7 Professional

Thanks in advance,

Raoni
-- 
Raoni Rosa Rodrigues
Research Associate of Fish Transposition Center CTPeixes
Universidade Federal de Minas Gerais - UFMG
Brasil
rodrigues.ra...@gmail.com


##complete data dput

structure(list(Station = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Antenna = c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), Tag = c(999L, 999L, 999L, 999L, 999L, 999L,
999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L,
999L, 999L, 999L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 4L, 18L, 18L, 18L,
21L, 22L, 36L, 36L, 36L, 36L, 36L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L), DateTime = structure(c(6L,
7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L,
20L, 21L, 22L, 23L, 24L, 25L, 68L, 70L, 72L, 73L, 71L, 75L, 86L,
74L, 64L, 64L, 65L, 87L, 67L, 1L, 2L, 3L, 4L, 5L, 26L, 27L, 28L,
29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L,
42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L, 54L,
55L, 56L, 57L, 58L, 59L, 60L, 61L, 62L, 63L, 66L, 69L, 76L, 77L,
78L, 79L, 80L, 81L, 82L, 83L, 84L, 85L, 88L, 89L, 90L, 91L, 92L,
93L, 94L, 95L, 96L, 97L, 98L, 99L, 100L, 101L, 102L, 103L, 104L,
105L, 106L, 107L, 108L, 109L, 110L, 111L, 112L, 113L, 114L, 115L,
116L, 117L), .Label = c(19/06/2013 22:15:49, 19/06/2013 22:15:54,
19/06/2013 22:15:59, 19/06/2013 22:16:24, 19/06/2013 22:16:29,
22/07/2013 11:00:21, 22/07/2013 11:33:47, 22/07/2013 11:34:00,
22/07/2013 11:34:16, 22/07/2013 11:43:20, 22/07/2013 11:43:35,
22/07/2013 11:44:35, 22/07/2013 11:44:41, 22/07/2013 11:44:42,
22/07/2013 11:44:43, 22/07/2013 11:44:44, 22/07/2013 11:44:59,
22/07/2013 11:45:11, 22/07/2013 11:45:29, 22/07/2013 11:45:30,
22/07/2013 11:45:31, 22/07/2013 11:45:35, 22/07/2013 11:45:37,

[R] Grouping variables by a irregular time interval

2013-09-20 Thread Raoni Rodrigues
Hello all,

I have a very large data frame (more than 5 million lines) as below (dput
example at the end of mail):

  Station Antenna Tag            DateTime Power Events
1       1       2 999 22/07/2013 11:00:21    17      1
2       1       2 999 22/07/2013 11:33:47    31      1
3       1       2 999 22/07/2013 11:34:00    19      1
4       1       2 999 22/07/2013 11:34:16    53      1
5       1       2 999 22/07/2013 11:43:20    15      1
6       1       2 999 22/07/2013 11:43:35    17      1

To each Tag, in each Antenna, in each Station, I need to create a 10 min
interval and sum the number of Events and mean of Power in the time
interval, as below (complete wanted output at the end of mail).

  Station Antenna Tag       StartDateTime         EndDateTime Power Events
1       1       2 999 22/07/2013 11:00:21 22/07/2013 11:00:21    17      1
2       1       2 999 22/07/2013 11:34:16 22/07/2013 11:43:35    27      5
3       1       2 999 22/07/2013 11:44:35 22/07/2013 11:45:40    17     14
4       2       1   1 25/07/2013 14:19:45 25/07/2013 14:20:39    65      4
5       2       1   2 25/07/2013 14:20:13 25/07/2013 14:25:14    21      3
6       2       1   4 25/07/2013 14:20:46 25/07/2013 14:20:46    28      1

Show start and end points of each interval is optional, not necessary. I
put both to show the irregular time interval (look at tag 999)

First I tried a for-loop, without success. After that, I tried this code:

require (plyr)

ddply(data, .(Station, Antenna, Tag, cut(data$DateTime, "10 min")),
      summarise, Power = round(mean(Power), 0), Events = sum(Events))

Is almost what I want, because cut() divided in regular time intervals, but
in some cases I do not have this, and it split a unique observation in two.

Any ideas to solve this issue?

R version 3.0.1 (2013-05-16) -- Good Sport
Platform: x86_64-w64-mingw32/x64 (64-bit)
Windows 7 Professional

Thanks in advance,

Raoni
-- 
Raoni Rosa Rodrigues
Research Associate of Fish Transposition Center CTPeixes
Universidade Federal de Minas Gerais - UFMG
Brasil
rodrigues.ra...@gmail.com


##complete data dput

structure(list(Station = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Antenna = c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), Tag = c(999L, 999L, 999L, 999L, 999L, 999L,
999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L, 999L,
999L, 999L, 999L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 4L, 18L, 18L, 18L,
21L, 22L, 36L, 36L, 36L, 36L, 36L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L,
48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L), DateTime = structure(c(3L,
4L, 5L, 5L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 18L, 19L, 19L, 19L, 19L, 20L, 23L, 19L, 17L, 17L,
17L, 23L, 18L, 1L, 1L, 1L, 2L, 2L, 9L, 9L, 10L, 10L, 10L, 10L,
10L, 11L, 11L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 13L, 13L,
13L, 13L, 14L, 14L, 14L, 14L, 14L, 14L, 15L, 15L, 15L, 15L, 15L,
15L, 15L, 16L, 16L, 16L, 16L, 18L, 19L, 21L, 21L, 21L, 21L, 21L,
22L, 22L, 22L, 22L, 22L, 23L, 24L, 24L, 24L, 24L, 24L, 24L, 25L,
25L, 25L, 25L, 25L, 26L, 26L, 26L, 26L, 26L, 26L, 26L, 27L, 27L,
27L, 27L, 27L, 27L, 28L, 28L, 28L, 28L, 28L), .Label = c(19/06/2013
22:15,
19/06/2013 22:16, 22/07/2013 11:00, 22/07/2013 11:33, 22/07/2013
11:34,
22/07/2013 11:43, 22/07/2013 11:44, 22/07/2013 11:45, 25/07/2013
14:10,
25/07/2013 14:11, 25/07/2013 14:12, 25/07/2013 14:13, 25/07/2013
14:14,
25/07/2013 14:15, 25/07/2013 14:16, 25/07/2013 14:17, 25/07/2013
14:18,
25/07/2013 14:19, 25/07/2013 14:20, 25/07/2013 14:21, 25/07/2013
14:23,
25/07/2013 14:24, 25/07/2013 14:25, 25/07/2013 14:26, 25/07/2013
14:27,
25/07/2013 14:28, 25/07/2013 14:29, 25/07/2013 14:30), class =
factor),
Power = c(17L, 31L, 19L, 53L, 15L, 17L, 21L, 12L, 15L, 22L,
19L, 15L, 13L, 14L, 15L, 12L, 23L, 19L, 16L, 20L, 30L, 37L,
25L, 167L, 24L, 14L, 

Re: [R] grouping followed by finding frequent patterns in R

2013-03-10 Thread Bert Gunter
1. Please cc the list, as I have here, unless your comments are off topic.

2. Use dput() (?dput) to include **small** amounts of data in your
message, as attachments are generally stripped from r-help.

3. I have no experience with itemsets or the arules package, but a
quick glance at the docs there said that your data argument must be in
a specific form coercible into an S4 transactions class. I suspect
that neither your initial data frame nor the list deriving from split
is, but maybe someone familiar with the package can tell you for sure.
That's why you need to cc to the list.
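A minimal sketch of that coercion, assuming arules is loaded, that CIN is the
transaction id and TRN_TYP the item, and that items must be character with
duplicates removed within each CIN before coercing (the support threshold is
just an example value):

library(arules)

baskets <- split(as.character(file$TRN_TYP), file$CIN)
baskets <- lapply(baskets, unique)          # duplicated items are not allowed

trans    <- as(baskets, "transactions")     # S4 transactions object
itemsets <- eclat(trans, parameter = list(supp = 0.05))
inspect(itemsets)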

-- Bert


On Sun, Mar 10, 2013 at 7:04 AM, Dhiman Biswas crazydh...@gmail.com wrote:
 Dear Bert,

 My intention is to mine frequent itemsets of TRN_TYP for individual CIN out
 of that data.
 But the problem is using eclat after splitting gives the following error:

 Error in eclat(list) : internal error in trio library

 PS: I have attached my dataset.


 On Sat, Mar 9, 2013 at 8:27 PM, Bert Gunter gunter.ber...@gene.com wrote:

 I **suggest** that you explain what you wish to accomplish using a
 reproducible example rather than telling us what packages you think
 you should use. I believe you are making things too complicated; e.g.
 what do you mean by frequent patterns?  Moreover, basket format is
 rather unclear -- and may well be unnecessary. But using lists, it
 could be simply accomplished by

 ?split  ## as in
 the_list <- with(yourdata, split(TYP, CIN.TRN))
 
 or possibly
 
 the_list <- with(yourdata, tapply(TYP, CIN.TRN, FUN = table))

 Of course, these may be irrelevant and useless, but without knowing
 your purpose ...?

 -- Bert

 On Sat, Mar 9, 2013 at 4:37 AM, Dhiman Biswas crazydh...@gmail.com
 wrote:
  I have a data in the following form :
  CIN      TRN_TYP
  9079954        1
  9079954        2
  9079954        3
  9079954        4
  9079954        5
  9079954        4
  9079954        5
  9079954        6
  9079954        7
  9079954        8
  9079954        9
  9079954        9
  ..
  ..
  ..
  there are 100 types of CIN (9079954,12441087,15246633,...) and
  respective
  TRN_TYP
 
  first of all, I want this data to be grouped into basket format:
  9079954   1, 2, 3, 4, 5, 
  12441087  19, 14, 21, 3, 7, ...
  .
  .
  .
  and then apply eclat from arules package to find frequent patterns.
 
  1) I ran the following code:
  file <- read.csv("D:/R/Practice/Data_Input_NUM.csv")
  file <- file[!duplicated(file),]
  eclat(split(file$TRN_TYP, file$CIN))
 
  but it gave me the following error:
  Error in asMethod(object) : can not coerce list with transactions with
  duplicated items
 
  2) I ran this code:
  file <- read.csv("D:/R/Practice/Data_Input_NUM.csv")
  file_new <- file[,c(3,6)] # because my file Data_Input_NUM has many other
  columns as well, so I am selecting only CIN and TRN_TYP
  file_new <- file_new[!duplicated(file_new),]
  eclat(split(file_new$TRN_TYP, file_new$CIN))
 
  but again:
  Error in eclat(split(file_new$TRN_TYP, file_new$CIN)) :
internal error in trio library
 
  PLEASE HELP
 
  [[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.



 --

 Bert Gunter
 Genentech Nonclinical Biostatistics

 Internal Contact Info:
 Phone: 467-7374
 Website:

 http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm





-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] grouping followed by finding frequent patterns in R

2013-03-09 Thread Dhiman Biswas
I have a data in the following form :
CIN      TRN_TYP
9079954        1
9079954        2
9079954        3
9079954        4
9079954        5
9079954        4
9079954        5
9079954        6
9079954        7
9079954        8
9079954        9
9079954        9
..
..
..
there are 100 types of CIN (9079954,12441087,15246633,...) and respective
TRN_TYP

first of all, I want this data to be grouped into basket format:
9079954   1, 2, 3, 4, 5, 
12441087  19, 14, 21, 3, 7, ...
.
.
.
and then apply eclat from arules package to find frequent patterns.

1) I ran the following code:
file <- read.csv("D:/R/Practice/Data_Input_NUM.csv")
file <- file[!duplicated(file),]
eclat(split(file$TRN_TYP, file$CIN))

but it gave me the following error:
Error in asMethod(object) : can not coerce list with transactions with
duplicated items

2) I ran this code:
file <- read.csv("D:/R/Practice/Data_Input_NUM.csv")
file_new <- file[,c(3,6)] # because my file Data_Input_NUM has many other
columns as well, so I am selecting only CIN and TRN_TYP
file_new <- file_new[!duplicated(file_new),]
eclat(split(file_new$TRN_TYP, file_new$CIN))

but again:
Error in eclat(split(file_new$TRN_TYP, file_new$CIN)) :
  internal error in trio library

PLEASE HELP

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping followed by finding frequent patterns in R

2013-03-09 Thread Bert Gunter
I **suggest** that you explain what you wish to accomplish using a
reproducible example rather than telling us what packages you think
you should use. I believe you are making things too complicated; e.g.
what do you mean by frequent patterns?  Moreover, basket format is
rather unclear -- and may well be unnecessary. But using lists, it
could be simply accomplished by

?split  ## as in
the_list <- with(yourdata, split(TYP, CIN.TRN))

or possibly

the_list <- with(yourdata, tapply(TYP, CIN.TRN, FUN = table))

Of course, these may be irrelevant and useless, but without knowing
your purpose ...?

-- Bert

On Sat, Mar 9, 2013 at 4:37 AM, Dhiman Biswas crazydh...@gmail.com wrote:
 I have a data in the following form :
 CIN      TRN_TYP
 9079954        1
 9079954        2
 9079954        3
 9079954        4
 9079954        5
 9079954        4
 9079954        5
 9079954        6
 9079954        7
 9079954        8
 9079954        9
 9079954        9
 ..
 ..
 ..
 there are 100 types of CIN (9079954,12441087,15246633,...) and respective
 TRN_TYP

 first of all, I want this data to be grouped into basket format:
 9079954   1, 2, 3, 4, 5, 
 12441087  19, 14, 21, 3, 7, ...
 .
 .
 .
 and then apply eclat from arules package to find frequent patterns.

 1) I ran the following code:
 file <- read.csv("D:/R/Practice/Data_Input_NUM.csv")
 file <- file[!duplicated(file),]
 eclat(split(file$TRN_TYP, file$CIN))

 but it gave me the following error:
 Error in asMethod(object) : can not coerce list with transactions with
 duplicated items

 2) I ran this code:
 file <- read.csv("D:/R/Practice/Data_Input_NUM.csv")
 file_new <- file[,c(3,6)] # because my file Data_Input_NUM has many other
 columns as well, so I am selecting only CIN and TRN_TYP
 file_new <- file_new[!duplicated(file_new),]
 eclat(split(file_new$TRN_TYP, file_new$CIN))

 but again:
 Error in eclat(split(file_new$TRN_TYP, file_new$CIN)) :
   internal error in trio library

 PLEASE HELP

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] grouping elements of a data frame

2013-01-15 Thread Nuri Alpay Temiz
Hi everyone,

I have a question on selecting and grouping elements of a data frame. For 
example:

A.df- [ a c 0.9
 b  x 0.8
 b z 0.5
 c y 0.9
 c x 0.7
 c z 0.6]


I want to create a list of a data frame that gives me the unique values of 
column 1 of A.df so that i can create intersects. That is:

B[a]- [ c 0.9]

B[b]- [ x 0.8
 z 0.5]

B[c]- [ y 0.9
 x 0.7
 z 0.6]


B[c] n B[b] - c(x,z)


How can I accomplish this?

Thanks,
Al
 
 
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping elements of a data frame

2013-01-15 Thread David Winsemius

On Jan 15, 2013, at 9:10 AM, Nuri Alpay Temiz wrote:

 Hi everyone,
 
 I have a question on selecting and grouping elements of a data frame. For 
 example:
 
 A.df- [ a c 0.9
 b  x 0.8
 b z 0.5
 c y 0.9
 c x 0.7
 c z 0.6]

That is not R code. Matlab?, Python? 

 
 
 I want to create a list of a data frame that gives me the unique values of 
 column 1 of A.df so that i can create intersects. That is:
 
 B[a]- [ c 0.9]
 
 B[b]- [ x 0.8
 z 0.5]
 
 B[c]- [ y 0.9
 x 0.7
 z 0.6]
 
 
 B[c] n B[b] - c(x,z)
 

That's some sort of coded message? We are supposed to know what the n 
operation will do when assigned a vector?


Assuming your really do have a dataframe named B:

intersect(B$c, B$b)

Please code up examples in R in the future.

-- 

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping elements of a data frame

2013-01-15 Thread arun
Hi,
Try this:
The last part was not clear.
A.df <- read.table(text="
a c 0.9
b x 0.8
b z 0.5
c y 0.9
c x 0.7
c z 0.6
", sep="", header=FALSE, stringsAsFactors=FALSE)
lst1 <- split(A.df[,-1], A.df$V1)
lst1
#$a
#  V2  V3
#1  c 0.9
#
#$b
#  V2  V3
#2  x 0.8
#3  z 0.5
#
#$c
#  V2  V3
#4  y 0.9
#5  x 0.7
#6  z 0.6
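For the last part of the question (written there as B[c] n B[b]), a small
follow-up sketch using the lst1 built above:

intersect(lst1[["c"]]$V2, lst1[["b"]]$V2)
# [1] "x" "z"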


A.K.



- Original Message -
From: Nuri Alpay Temiz alpayte...@outlook.com
To: R-help@r-project.org
Cc: 
Sent: Tuesday, January 15, 2013 12:10 PM
Subject: [R] grouping elements of a data frame

Hi everyone,

I have a question on selecting and grouping elements of a data frame. For 
example:

A.df- [ a c 0.9
             b  x 0.8
             b z 0.5
             c y 0.9
             c x 0.7
             c z 0.6]


I want to create a list of a data frame that gives me the unique values of 
column 1 of A.df so that i can create intersects. That is:

B[a]- [ c 0.9]

B[b]- [ x 0.8
             z 0.5]

B[c]- [ y 0.9
             x 0.7
             z 0.6]


B[c] n B[b] - c(x,z)


How can I accomplish this?

Thanks,
Al
                    
            
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Grouping distances

2012-06-11 Thread Jhope
Hi R-listers, 

I am trying to group my HTL data, this is a column of data of Distances to
the HTL data = turtlehatch. I would like to create an Index of distances
(0-5m, 6-10, 11-15, 16-20... up to 60). And then create a new file with this
HTLIndex in a column. 

So far I have gotten this far: 

HTL.index <- function (values, weights=c(0, 5, 10, 15, 20, 25, 30, 35, 40,
45, 50, 55, 60)) { 
hope <- values * weights 
return (apply(hope, 1, sum)/apply(values, 1, sum)) 
} 
write.csv(turtlehatch, HTLIndex, row.names=FALSE) 
 

But I do not seem to be able to create a new column  in a new file. 

Please advise, Jean 

--
View this message in context: 
http://r.789695.n4.nabble.com/Grouping-distances-tp4632985.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Grouping distances

2012-06-11 Thread Jhope
Hi R-listers, 

I am trying to group my HTL data, this is a column of data of Distances to
the HTL data = turtlehatch. I would like to create an Index of distances
(0-5m, 6-10, 11-15, 16-20... up to 60). And then create a new file with this
HTLIndex in a column. 

So far I have gotten this far:

HTL.index <- function (values, weights=c(0, 5, 10, 15, 20, 25, 30, 35, 40,
45, 50, 55, 60)) {
hope <- values * weights
return (apply(hope, 1, sum)/apply(values, 1, sum))
}
write.csv(turtlehatch, HTLIndex, row.names=FALSE)


But I do not seem to be able to create a new column  in a new file. 

Please advise, Jean



--
View this message in context: 
http://r.789695.n4.nabble.com/Grouping-distances-tp4632984.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping distances

2012-06-11 Thread Rui Barradas

Hello,

It's easy to create a new column. Since you haven't said where nor the 
type of data structure you are using, I'll try to answer to both.

Suppose that 'x' is a matrix. Then

newcolumn <- newvalues
x2 <- cbind(x, newcolumn)  # new column added to x, result in x2

Suppose that 'y' is a data.frame. Then the same would do it, or

y$newcolumn <- newvalues

Now, I believe that the new values come from your function. If so, you 
must assign the function value to some variable outside the function.


htlindex <- HTL.index(...etc...)  # 'htlindex' is the 'newvalues' above


Two extra notes.
One, rowSums() does what your apply() instructions do.

Second, first you multiply then you divide, to give 'weights'. I think 
this is just an example, not the real function.


Hope this helps,

Rui Barradas
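The binning step itself is not shown anywhere in the thread; a minimal sketch
with cut(), assuming the distance column in turtlehatch is called HTL (the real
column name is not given in the post) and that 5 m classes up to 60 m are wanted:

breaks <- seq(0, 60, by = 5)                      # 0-5, 5-10, ..., 55-60
turtlehatch$HTLIndex <- cut(turtlehatch$HTL, breaks = breaks,
                            include.lowest = TRUE)
write.csv(turtlehatch, "turtlehatch_with_index.csv", row.names = FALSE)

write.csv() then stores the data frame, now including the new HTLIndex column,
in a new file (the output file name above is only an example).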

Em 11-06-2012 07:01, Jhope escreveu:

Hi R-listers,

I am trying to group my HTL data, this is a column of data of Distances to
the HTL data = turtlehatch. I would like to create an Index of distances
(0-5m, 6-10, 11-15, 16-20... up to 60). And then create a new file with this
HTLIndex in a column.

So far I have gotten this far:

HTL.index <- function (values, weights=c(0, 5, 10, 15, 20, 25, 30, 35, 40,
45, 50, 55, 60)) {
hope <- values * weights
return (apply(hope, 1, sum)/apply(values, 1, sum))
}
write.csv(turtlehatch, HTLIndex, row.names=FALSE)


But I do not seem to be able to create a new column  in a new file.

Please advise, Jean

--
View this message in context: 
http://r.789695.n4.nabble.com/Grouping-distances-tp4632985.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping distances

2012-06-11 Thread Jhope
Thank you Rui, 

I am trying to create a column in the data file turtlehatch.csv

Saludos, Jean

--
View this message in context: 
http://r.789695.n4.nabble.com/Grouping-distances-tp4632985p4632989.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] grouping function

2012-05-08 Thread Geoffrey Smith
Hello, I would like to write a function that makes a grouping variable for
some panel data .  The grouping variable is made conditional on the begin
year and the end year.  Here is the code I have written so far.

name <- c(rep('Frank',5), rep('Tony',5), rep('Edward',5));
begin <- c(seq(1990,1994), seq(1991,1995), seq(1992,1996));
end <- c(seq(1995,1999), seq(1995,1999), seq(1996,2000));

df - data.frame(name, begin, end);
df;

#This is the part I am stuck on;

makegroup <- function(x,y) {
 group <- 0
 if (x <= 1990 & y > 1990) {group==1}
 if (x <= 1991 & y > 1991) {group==2}
 if (x <= 1992 & y > 1992) {group==3}
 return(x,y)
}

makegroup(df$begin,df$end);

#I am looking for output where each observation belongs to a group
conditional on the begin year and end year.  I would also like to use a for
loop for programming accuracy as well;

Thank you!  Geoff

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping function

2012-05-08 Thread Sarah Goslee
Hi,

On Tue, May 8, 2012 at 2:17 PM, Geoffrey Smith g...@asu.edu wrote:
 Hello, I would like to write a function that makes a grouping variable for
 some panel data .  The grouping variable is made conditional on the begin
 year and the end year.  Here is the code I have written so far.

 name <- c(rep('Frank',5), rep('Tony',5), rep('Edward',5));
 begin <- c(seq(1990,1994), seq(1991,1995), seq(1992,1996));
 end <- c(seq(1995,1999), seq(1995,1999), seq(1996,2000));

 df - data.frame(name, begin, end);
 df;

Thanks for providing reproducible data. Two minor points: you don't
need ; at the end of lines, and calling your data frame df is
confusing because there's a df() function.

 #This is the part I am stuck on;

 makegroup <- function(x,y) {
  group <- 0
  if (x <= 1990 & y > 1990) {group==1}
  if (x <= 1991 & y > 1991) {group==2}
  if (x <= 1992 & y > 1992) {group==3}
  return(x,y)
 }

 makegroup(df$begin,df$end);

 #I am looking for output where each observation belongs to a group
 conditional on the begin year and end year.  I would also like to use a for
 loop for programming accuracy as well;

This isn't a clear specification:
(1990, 1994), for instance, fits into all three groups. Do you want to
extend this to more start years, or are you only interested in those
three? Assuming end is always >= start, you don't even need to
consider the end years in your grouping.

Here are two methods, one that looks like your pseudocode, and one
that is more R-ish. They give different results because of different
handling of cases that fit all three groups. Rearranging the
statements in makegroup1() from broadest to most restrictive would
make it give the same result as makegroup2().


makegroup1 <- function(x,y) {
 group <- numeric(length(x))
 group[x <= 1990 & y > 1990] <- 1
 group[x <= 1991 & y > 1991] <- 2
 group[x <= 1992 & y > 1992] <- 3
 group
}

makegroup2 <- function(x, y) {
   ifelse(x <= 1990 & y > 1990, 1,
     ifelse(x <= 1991 & y > 1991, 2,
       ifelse(x <= 1992 & y > 1992, 3, 0)))
}

 makegroup1(df$begin,df$end)
 [1] 3 3 3 0 0 3 3 0 0 0 3 0 0 0 0
 makegroup2(df$begin,df$end)
 [1]  1  2  3 NA NA  2  3 NA NA NA  3 NA NA NA NA
 df


But really, it's a better idea to develop an unambiguous statement of
your desired output.

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping function

2012-05-08 Thread Sarah Goslee
Sorry, yes: I changed it before posting to more closely match the
default value in the pseudocode. That's a very minor issue: the
very last value in the nested ifelse() statements is what's used by
default.

Sarah

On Tue, May 8, 2012 at 2:46 PM, arun smartpink...@yahoo.com wrote:
 HI Sarah,

 I ran the same code from your reply email.  For makegroup2, the results 
 are 0 in place of NA.

 > makegroup1 <- function(x,y) {
 + group <- numeric(length(x))
 + group[x <= 1990 & y > 1990] <- 1
 + group[x <= 1991 & y > 1991] <- 2
 + group[x <= 1992 & y > 1992] <- 3
 + group
 + }
 > makegroup2 <- function(x, y) {
 +   ifelse(x <= 1990 & y > 1990, 1,
 +     ifelse(x <= 1991 & y > 1991, 2,
 +       ifelse(x <= 1992 & y > 1992, 3, 0)))
 + }
 > makegroup1(df$begin,df$end)
  [1] 3 3 3 0 0 3 3 0 0 0 3 0 0 0 0
 > makegroup2(df$begin,df$end)
  [1] 1 2 3 0 0 2 3 0 0 0 3 0 0 0 0


 A. K.




 - Original Message -
 From: Sarah Goslee sarah.gos...@gmail.com
 To: g...@asu.edu
 Cc: r-help@r-project.org r-help@r-project.org
 Sent: Tuesday, May 8, 2012 2:33 PM
 Subject: Re: [R] grouping function

 Hi,

 On Tue, May 8, 2012 at 2:17 PM, Geoffrey Smith g...@asu.edu wrote:
 Hello, I would like to write a function that makes a grouping variable for
 some panel data .  The grouping variable is made conditional on the begin
 year and the end year.  Here is the code I have written so far.

 name <- c(rep('Frank',5), rep('Tony',5), rep('Edward',5));
 begin <- c(seq(1990,1994), seq(1991,1995), seq(1992,1996));
 end <- c(seq(1995,1999), seq(1995,1999), seq(1996,2000));

 df - data.frame(name, begin, end);
 df;

 Thanks for providing reproducible data. Two minor points: you don't
 need ; at the end of lines, and calling your data frame df is
 confusing because there's a df() function.

 #This is the part I am stuck on;

  makegroup <- function(x,y) {
   group <- 0
   if (x <= 1990 & y > 1990) {group==1}
   if (x <= 1991 & y > 1991) {group==2}
   if (x <= 1992 & y > 1992) {group==3}
   return(x,y)
  }

 makegroup(df$begin,df$end);

 #I am looking for output where each observation belongs to a group
 conditional on the begin year and end year.  I would also like to use a for
 loop for programming accuracy as well;

 This isn't a clear specification:
  (1990, 1994), for instance, fits into all three groups. Do you want to
  extend this to more start years, or are you only interested in those
  three? Assuming end is always >= start, you don't even need to
  consider the end years in your grouping.

 Here are two methods, one that looks like your pseudocode, and one
 that is more R-ish. They give different results because of different
 handling of cases that fit all three groups. Rearranging the
 statements in makegroup1() from broadest to most restrictive would
 make it give the same result as makegroup2().


  makegroup1 <- function(x,y) {
  group <- numeric(length(x))
  group[x <= 1990 & y > 1990] <- 1
  group[x <= 1991 & y > 1991] <- 2
  group[x <= 1992 & y > 1992] <- 3
  group
  }

  makegroup2 <- function(x, y) {
     ifelse(x <= 1990 & y > 1990, 1,
        ifelse(x <= 1991 & y > 1991, 2,
        ifelse(x <= 1992 & y > 1992, 3, 0)))
  }

 makegroup1(df$begin,df$end)
 [1] 3 3 3 0 0 3 3 0 0 0 3 0 0 0 0
 makegroup2(df$begin,df$end)
 [1]  1  2  3 NA NA  2  3 NA NA NA  3 NA NA NA NA
 df


 But really, it's a better idea to develop an unambiguous statement of
 your desired output.

 Sarah


-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping function

2012-05-08 Thread arun
HI Sarah,

I ran the same code from your reply email.  For makegroup2, the results are 
0 in place of NA.

> makegroup1 <- function(x,y) {
+ group <- numeric(length(x))
+ group[x <= 1990 & y > 1990] <- 1
+ group[x <= 1991 & y > 1991] <- 2
+ group[x <= 1992 & y > 1992] <- 3
+ group
+ }
> makegroup2 <- function(x, y) {
+   ifelse(x <= 1990 & y > 1990, 1,
+     ifelse(x <= 1991 & y > 1991, 2,
+       ifelse(x <= 1992 & y > 1992, 3, 0)))
+ }
> makegroup1(df$begin,df$end)
 [1] 3 3 3 0 0 3 3 0 0 0 3 0 0 0 0
> makegroup2(df$begin,df$end)
 [1] 1 2 3 0 0 2 3 0 0 0 3 0 0 0 0


A. K.




- Original Message -
From: Sarah Goslee sarah.gos...@gmail.com
To: g...@asu.edu
Cc: r-help@r-project.org r-help@r-project.org
Sent: Tuesday, May 8, 2012 2:33 PM
Subject: Re: [R] grouping function

Hi,

On Tue, May 8, 2012 at 2:17 PM, Geoffrey Smith g...@asu.edu wrote:
 Hello, I would like to write a function that makes a grouping variable for
 some panel data .  The grouping variable is made conditional on the begin
 year and the end year.  Here is the code I have written so far.

 name <- c(rep('Frank',5), rep('Tony',5), rep('Edward',5));
 begin <- c(seq(1990,1994), seq(1991,1995), seq(1992,1996));
 end <- c(seq(1995,1999), seq(1995,1999), seq(1996,2000));

 df - data.frame(name, begin, end);
 df;

Thanks for providing reproducible data. Two minor points: you don't
need ; at the end of lines, and calling your data frame df is
confusing because there's a df() function.

 #This is the part I am stuck on;

  makegroup <- function(x,y) {
   group <- 0
   if (x <= 1990 & y > 1990) {group==1}
   if (x <= 1991 & y > 1991) {group==2}
   if (x <= 1992 & y > 1992) {group==3}
   return(x,y)
  }

 makegroup(df$begin,df$end);

 #I am looking for output where each observation belongs to a group
 conditional on the begin year and end year.  I would also like to use a for
 loop for programming accuracy as well;

This isn't a clear specification:
(1990, 1994), for instance, fits into all three groups. Do you want to
extend this to more start years, or are you only interested in those
three? Assuming end is always >= start, you don't even need to
consider the end years in your grouping.

Here are two methods, one that looks like your pseudocode, and one
that is more R-ish. They give different results because of different
handling of cases that fit all three groups. Rearranging the
statements in makegroup1() from broadest to most restrictive would
make it give the same result as makegroup2().


makegroup1 <- function(x,y) {
group <- numeric(length(x))
group[x <= 1990 & y > 1990] <- 1
group[x <= 1991 & y > 1991] <- 2
group[x <= 1992 & y > 1992] <- 3
group
}

makegroup2 <- function(x, y) {
   ifelse(x <= 1990 & y > 1990, 1,
      ifelse(x <= 1991 & y > 1991, 2,
      ifelse(x <= 1992 & y > 1992, 3, 0)))
}

 makegroup1(df$begin,df$end)
[1] 3 3 3 0 0 3 3 0 0 0 3 0 0 0 0
 makegroup2(df$begin,df$end)
[1]  1  2  3 NA NA  2  3 NA NA NA  3 NA NA NA NA
 df


But really, it's a better idea to develop an unambiguous statement of
your desired output.

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping and/or splitting

2012-04-04 Thread Berend Hasselman

On 04-04-2012, at 07:15, Ashish Agarwal wrote:

 Yes. I was missing the DROP argument.
 But now the problem is splitting is causing some weird ordering of groups.

Why weird?

 See below:
 
 DF <- read.table(text="
 Houseid,Personid,Tripid,taz
 1,1,1,4
 1,1,2,7
 2,1,1,96
 2,1,2,4
 2,1,3,2
 2,2,1,58
 3,1,5,7
 ", header=TRUE, sep=",")
 aa <- split(DF, DF[, 1:2], drop=TRUE)
 
 Now the result is that aa[3] is (3,1) and not (2,2). Why? How can I
 preserve the ascending order?
 

Try this

aa[order(names(aa))]

Berend
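An alternative sketch that avoids reordering afterwards: build the grouping
factor explicitly with interaction() and its lex.order argument (using the DF
from the example above):

f  <- with(DF, interaction(Houseid, Personid, drop = TRUE, lex.order = TRUE))
aa <- split(DF, f)
names(aa)
# [1] "1.1" "2.1" "2.2" "3.1"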

 aa[3]
 $`3.1`
   Houseid Personid Tripid taz
 7       3        1      5   7
 aa[4]
 $`2.2`
   Houseid Personid Tripid taz
 6       2        2      1  58
 
 
 On Wed, Apr 4, 2012 at 6:29 AM, Rui Barradas rui1...@sapo.pt wrote:
 
 Hello,
 
 
 Ashish Agarwal wrote
 
 I have a dataframe imported from csv file below:
 
 Houseid,Personid,Tripid,taz
 1,1,1,4
 1,1,2,7
 2,1,1,96
 2,1,2,4
 2,1,3,2
 2,2,1,58
 
 There are three groups identified based on the combination of first and
 second columns. How do I split this data frame?
 
 I tried
  aa <- split(inpfil, inpfil[,1:2])
 but it has problems.
 
 Output desired is
 
 aa[1]
 Houseid,Personid,Tripid,taz
 1,1,1,4
 1,1,2,7
 aa[2]
 Houseid,Personid,Tripid,taz
 2,1,1,96
 2,1,2,4
 2,1,3,2
 aa[3]
 Houseid,Personid,Tripid,taz
 2,2,1,58
 
  [[alternative HTML version deleted]]
 
 __
 R-help@ mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.htmlhttp://www.r-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 
 
 Any of the following three works with me.
 
 
  DF <- read.table(text="
  Houseid,Personid,Tripid,taz
  1,1,1,4
  1,1,2,7
  2,1,1,96
  2,1,2,4
  2,1,3,2
  2,2,1,58
  ", header=TRUE, sep=",")
 
 DF
 
 split(DF, DF[, 1:2], drop=TRUE)
 split(DF, list(DF$Houseid, DF$Personid), drop=TRUE)
 with(DF, split(DF, list(Houseid, Personid), drop=TRUE))
 
 The argument 'drop' defaults to FALSE. Was that the problem?
 
 Hope this helps,
 
 Rui Barrada
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping and/or splitting

2012-04-04 Thread Ashish Agarwal
Thanks a ton!
It was weird because, to my mind, the ordering should have been ascending by default.
Anyway, your workaround and Weidong's method are both good
solutions.
On Wed, Apr 4, 2012 at 12:10 PM, Berend Hasselman b...@xs4all.nl wrote:


 On 04-04-2012, at 07:15, Ashish Agarwal wrote:

  Yes. I was missing the DROP argument.
  But now the problem is splitting is causing some weird ordering of
 groups.

 Why weird?

  See below:
 
  DF <- read.table(text="
  Houseid,Personid,Tripid,taz
  1,1,1,4
  1,1,2,7
  2,1,1,96
  2,1,2,4
  2,1,3,2
  2,2,1,58
  3,1,5,7
  ", header=TRUE, sep=",")
  aa <- split(DF, DF[, 1:2], drop=TRUE)
 
  Now the result is that aa[3] is (3,1) and not (2,2). Why? How can I
  preserve the ascending order?
 

 Try this

 aa[order(names(aa))]

 Berend

  aa[3]
  $`3.1`
    Houseid Personid Tripid taz
  7       3        1      5   7
  aa[4]
  $`2.2`
    Houseid Personid Tripid taz
  6       2        2      1  58


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] grouping

2012-04-03 Thread Val
Hi all,

Assume that I have the following 10 data points.
 x=c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)

sort x  and get the following
  y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)

I want to group the sorted data points (y) into an equal number of
observations per group. In this case there will be three groups.  The first
two groups will have three observations and the third will have four
observations.

group 1  = 34, 45, 46
group 2  = 66, 78, 125
group 3  = 193, 209, 242,297

Finally I want to calculate the group mean

group 1  =  42
group 2  =  87
group 3  =  234

Can anyone help me out?

In SAS I used to do it using proc rank.

thanks in advance

Val

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread David Winsemius


On Apr 3, 2012, at 8:47 AM, Val wrote:


Hi all,

Assume that I have the following 10 data points.
x=c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)

sort x  and get the following
 y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)


The methods below do not require a sorting step.



I want to  group the sorted  data point (y)  into  equal number of
observation per group. In this case there will be three groups.  The  
first

two groups  will have three observation  and the third will have four
observations

group 1  = 34, 45, 46
group 2  = 66, 78, 125
group 3  = 193, 209, 242,297

Finally I want to calculate the group mean

group 1  =  42
group 2  =  87
group 3  =  234


I hope those weren't answers from SAS.



Can anyone help me out?



I usually do this with Hmisc::cut2 since it has a `g = n` parameter  
that auto-magically calls the quantile splitting criterion but this is  
done in base R.


split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1)) ,  
include.lowest=TRUE) )

$`[36,65.9]`
[1] 36 45 46

$`(65.9,189]`
[1]  66  78 125

$`(189,297]`
[1] 193 209 242 297


 lapply( split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1)) ,  
include.lowest=TRUE) ), mean)

$`[36,65.9]`
[1] 42.3

$`(65.9,189]`
[1] 89.7

$`(189,297]`
[1] 235.25

Or to get a table instead of a list:
 tapply( x, cut(x, quantile(x, prob=c(0, .333, .66 ,1)) ,  
include.lowest=TRUE) , mean)

 [36,65.9] (65.9,189]  (189,297]
  42.3   89.7  235.25000


In SAS I used to do it using proc rank.


?quantile isn't equivalent to  Proc Rank but it will provide a useful  
basis for splitting or tabling functions.




thanks in advance

Val

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread R. Michael Weylandt
Ignoring the fact your desired answers are wrong, I'd split the
separating part and the group means parts into three steps:

i) quantile() can help you get the split points,
ii)  findInterval() can assign each y to a group
iii) then ave() or tapply() will do group-wise means

Something like:

y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297) # You need a c here.
ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))
tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean)

You could also use cut2 from the Hmisc package to combine findInterval
and quantile into a single step.

Depending on your desired output.

Hope that helps,
Michael

On Tue, Apr 3, 2012 at 8:47 AM, Val valkr...@gmail.com wrote:
 Hi all,

 Assume that I have the following 10 data points.
  x=c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)

 sort x  and get the following
  y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)

 I want to  group the sorted  data point (y)  into  equal number of
 observation per group. In this case there will be three groups.  The first
 two groups  will have three observation  and the third will have four
 observations

 group 1  = 34, 45, 46
 group 2  = 66, 78, 125
 group 3  = 193, 209, 242,297

 Finally I want to calculate the group mean

 group 1  =  42
 group 2  =  87
 group 3  =  234

 Can anyone help me out?

 In SAS I used to do it using proc rank.

 thanks in advance

 Val

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread Giovanni Petris
Probably something along the following lines:

 x <- c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)
 sorted <- c(36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)
 tapply(sorted, INDEX = (seq_along(sorted) - 1) %/% 3, FUN = mean)
        0         1         2         3 
 42.33333  89.66667 214.66667 297.00000 

Hope this helps,
Giovanni
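
Note that the %/% 3 index above puts the 10th value in a fourth group of its own. A small added sketch (not from the original message) of one way to fold the remainder into the last group, capping the index with pmin():

 sorted <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297)
 idx <- pmin((seq_along(sorted) - 1) %/% 3, 2)   # 0 0 0 1 1 1 2 2 2 2
 tapply(sorted, idx, FUN = mean)
        0         1         2 
 42.33333  89.66667 235.25000 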

On Tue, 2012-04-03 at 08:47 -0400, Val wrote:
 Hi all,
 
 Assume that I have the following 10 data points.
  x=c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)
 
 sort x  and get the following
   y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)
 
 I want to  group the sorted  data point (y)  into  equal number of
 observation per group. In this case there will be three groups.  The first
 two groups  will have three observation  and the third will have four
 observations
 
 group 1  = 34, 45, 46
 group 2  = 66, 78, 125
 group 3  = 193, 209, 242,297
 
 Finally I want to calculate the group mean
 
 group 1  =  42
 group 2  =  87
 group 3  =  234
 
 Can anyone help me out?
 
 In SAS I used to do it using proc rank.
 
 thanks in advance
 
 Val
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 

Giovanni Petris  gpet...@uark.edu
Associate Professor
Department of Mathematical Sciences
University of Arkansas - Fayetteville, AR 72701
Ph: (479) 575-6324, 575-8630 (fax)
http://definetti.uark.edu/~gpetris/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread K. Elo

Hi!

Maybe not the most elegant solution, but works:

for(i in seq(1,length(data)-(length(data) %% 3), 3)) { 
ifelse((length(data)-i)>3, { print(sort(data)[ c(i:(i+2)) ]); 
print(mean(sort(data)[ c(i:(i+2)) ])) }, { print(sort(data)[ 
c(i:length(data)) ]); print(mean(sort(data)[ c(i:length(data)) ])) } ) }


Produces:

[1] 36 45 46
[1] 42.33333
[1]  66  78 125
[1] 89.66667
[1] 193 209 242 297
[1] 235.25

HTH,
Kimmo

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread Val
Thank you all (David, Michael, Giovanni)  for your prompt response.

First, there was a typo in the group mean: it was 89.6, not 87.

For a small data set and few groupings I can use  prob=c(0, .333, .66 ,1)
to group in to three groups in this case. However,  if I want to extend the
number of groupings say 10 or 15 then do I have to figure it out the
  split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))

Is there a short cut for that?


Thanks











On Tue, Apr 3, 2012 at 9:13 AM, R. Michael Weylandt 
michael.weyla...@gmail.com wrote:

 Ignoring the fact your desired answers are wrong, I'd split the
 separating part and the group means parts into three steps:

 i) quantile() can help you get the split points,
 ii)  findInterval() can assign each y to a group
 iii) then ave() or tapply() will do group-wise means

 Something like:

 y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297) # You need a c here.
 ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))
 tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean)

 You could also use cut2 from the Hmisc package to combine findInterval
 and quantile into a single step.

 Depending on your desired output.

 Hope that helps,
 Michael

 On Tue, Apr 3, 2012 at 8:47 AM, Val valkr...@gmail.com wrote:
  Hi all,
 
  Assume that I have the following 10 data points.
   x=c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)
 
  sort x  and get the following
   y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)
 
  I want to  group the sorted  data point (y)  into  equal number of
  observation per group. In this case there will be three groups.  The
 first
  two groups  will have three observation  and the third will have four
  observations
 
  group 1  = 34, 45, 46
  group 2  = 66, 78, 125
  group 3  = 193, 209, 242,297
 
  Finally I want to calculate the group mean
 
  group 1  =  42
  group 2  =  87
  group 3  =  234
 
  Can anyone help me out?
 
  In SAS I used to do it using proc rank.
 
  thanks in advance
 
  Val
 
 [[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread R. Michael Weylandt
Use cut2 as I suggested and David demonstrated.

Michael

On Tue, Apr 3, 2012 at 9:31 AM, Val valkr...@gmail.com wrote:
 Thank you all (David, Michael, Giovanni)  for your prompt response.

 First there was a typo error for the group mean it was 89.6 not 87.

 For a small data set and few groupings I can use  prob=c(0, .333, .66 ,1) to
 group in to three groups in this case. However,  if I want to extend the
 number of groupings say 10 or 15 then do I have to figure it out the
   split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))

 Is there a short cut for that?


 Thanks











 On Tue, Apr 3, 2012 at 9:13 AM, R. Michael Weylandt
 michael.weyla...@gmail.com wrote:

 Ignoring the fact your desired answers are wrong, I'd split the
 separating part and the group means parts into three steps:

 i) quantile() can help you get the split points,
 ii)  findInterval() can assign each y to a group
 iii) then ave() or tapply() will do group-wise means

 Something like:

 y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297) # You need a c here.
 ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))
 tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean)

 You could also use cut2 from the Hmisc package to combine findInterval
 and quantile into a single step.

 Depending on your desired output.

 Hope that helps,
 Michael

 On Tue, Apr 3, 2012 at 8:47 AM, Val valkr...@gmail.com wrote:
  Hi all,
 
  Assume that I have the following 10 data points.
   x=c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)
 
  sort x  and get the following
   y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)
 
  I want to  group the sorted  data point (y)  into  equal number of
  observation per group. In this case there will be three groups.  The
  first
  two groups  will have three observation  and the third will have four
  observations
 
  group 1  = 34, 45, 46
  group 2  = 66, 78, 125
  group 3  = 193, 209, 242,297
 
  Finally I want to calculate the group mean
 
  group 1  =  42
  group 2  =  87
  group 3  =  234
 
  Can anyone help me out?
 
  In SAS I used to do it using proc rank.
 
  thanks in advance
 
  Val
 
         [[alternative HTML version deleted]]

 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread Petr Savicky
On Tue, Apr 03, 2012 at 09:31:29AM -0400, Val wrote:
 Thank you all (David, Michael, Giovanni)  for your prompt response.
 
 First there was a typo error for the group mean it was 89.6 not 87.
 
 For a small data set and few groupings I can use  prob=c(0, .333, .66 ,1)
 to group in to three groups in this case. However,  if I want to extend the
 number of groupings say 10 or 15 then do I have to figure it out the
   split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))
 
 Is there a short cut for that?

Hi.

There may be better ways for the whole task, but specifically
c(0, .333, .66 ,1) can be obtained as

  seq(0, 1, length=3+1)

Hope this helps.

Petr Savicky.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread David Winsemius


On Apr 3, 2012, at 9:32 AM, R. Michael Weylandt wrote:


Use cut2 as I suggested and David demonstrated.


Agree that Hmisc::cut2 is extremely handy and I also like the fact
that the closed ends of intervals are on the left side (which is not
the same behavior as cut()), which has the other effect of setting
include.lowest = TRUE, which is not the default for cut() either (to my
continued amazement).


But let me add the method I use when doing it by hand:

cut(x, quantile(x, prob=seq(0, 1, length=ngrps+1)), include.lowest=TRUE)
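
A quick usage sketch (added for illustration; ngrps is just a placeholder for the number of groups you want and is not defined in the original message):

 x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
 ngrps <- 3
 grp <- cut(x, quantile(x, prob=seq(0, 1, length=ngrps+1)), include.lowest=TRUE)
 tapply(x, grp, mean)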

--
David.




Michael

On Tue, Apr 3, 2012 at 9:31 AM, Val valkr...@gmail.com wrote:

Thank you all (David, Michael, Giovanni)  for your prompt response.

First there was a typo error for the group mean it was 89.6 not 87.

For a small data set and few groupings I can use  prob=c(0, .333, . 
66 ,1) to
group in to three groups in this case. However,  if I want to  
extend the

number of groupings say 10 or 15 then do I have to figure it out the
  split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))

Is there a short cut for that?


Thanks











On Tue, Apr 3, 2012 at 9:13 AM, R. Michael Weylandt
michael.weyla...@gmail.com wrote:


Ignoring the fact your desired answers are wrong, I'd split the
separating part and the group means parts into three steps:

i) quantile() can help you get the split points,
ii)  findInterval() can assign each y to a group
iii) then ave() or tapply() will do group-wise means

Something like:

y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297) # You need a c here.

ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))
tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean)

You could also use cut2 from the Hmisc package to combine  
findInterval

and quantile into a single step.

Depending on your desired output.

Hope that helps,
Michael

On Tue, Apr 3, 2012 at 8:47 AM, Val valkr...@gmail.com wrote:

Hi all,

Assume that I have the following 10 data points.
 x=c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)

sort x  and get the following
 y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)

I want to  group the sorted  data point (y)  into  equal number of
observation per group. In this case there will be three groups.   
The

first
two groups  will have three observation  and the third will have  
four

observations

group 1  = 34, 45, 46
group 2  = 66, 78, 125
group 3  = 193, 209, 242,297

Finally I want to calculate the group mean

group 1  =  42
group 2  =  87
group 3  =  234

Can anyone help me out?

In SAS I used to do it using proc rank.

thanks in advance

Val

   [[alternative HTML version deleted]]




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread David L Carlson
Or just replace c(0, .333, .667, 1) with 

n <- 10
split(x, cut(x, quantile(x, prob= c(0, 1:(n-1)/n, 1)), include.lowest=TRUE))

where n is the number of groups you want.
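
For illustration (an added example), the prob vector this builds for n = 5:

 n <- 5
 c(0, 1:(n-1)/n, 1)
 [1] 0.0 0.2 0.4 0.6 0.8 1.0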

--
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of R. Michael Weylandt
Sent: Tuesday, April 03, 2012 8:32 AM
To: Val
Cc: r-help@r-project.org
Subject: Re: [R] grouping

Use cut2 as I suggested and David demonstrated.

Michael

On Tue, Apr 3, 2012 at 9:31 AM, Val valkr...@gmail.com wrote:
 Thank you all (David, Michael, Giovanni)  for your prompt response.

 First there was a typo error for the group mean it was 89.6 not 87.

 For a small data set and few groupings I can use  prob=c(0, .333, .66 ,1)
to
 group in to three groups in this case. However,  if I want to extend the
 number of groupings say 10 or 15 then do I have to figure it out the
   split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))

 Is there a short cut for that?


 Thanks











 On Tue, Apr 3, 2012 at 9:13 AM, R. Michael Weylandt
 michael.weyla...@gmail.com wrote:

 Ignoring the fact your desired answers are wrong, I'd split the
 separating part and the group means parts into three steps:

 i) quantile() can help you get the split points,
 ii)  findInterval() can assign each y to a group
 iii) then ave() or tapply() will do group-wise means

 Something like:

 y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297) # You need a c here.
 ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))
 tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean)

 You could also use cut2 from the Hmisc package to combine findInterval
 and quantile into a single step.

 Depending on your desired output.

 Hope that helps,
 Michael

 On Tue, Apr 3, 2012 at 8:47 AM, Val valkr...@gmail.com wrote:
  Hi all,
 
  Assume that I have the following 10 data points.
   x=c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)
 
  sort x  and get the following
   y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)
 
  I want to  group the sorted  data point (y)  into  equal number of
  observation per group. In this case there will be three groups.  The
  first
  two groups  will have three observation  and the third will have four
  observations
 
  group 1  = 34, 45, 46
  group 2  = 66, 78, 125
  group 3  = 193, 209, 242,297
 
  Finally I want to calculate the group mean
 
  group 1  =  42
  group 2  =  87
  group 3  =  234
 
  Can anyone help me out?
 
  In SAS I used to do it using proc rank.
 
  thanks in advance
 
  Val
 
         [[alternative HTML version deleted]]

 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread Val
David W and all,

Thank you very much for your help.

Here is the final output that I want, in the form of a data frame. The data
frame should contain x, group and group_mean in the following way:

x       group   group mean
46       1        42.3
125     2        89.6
36       1        42.3
193     3        235.25
209     3        235.25
78       2        89.6
66       2        89.6
242     3        235.25
297     3        235.25
45       1        42.3

Thanks a lot








On Tue, Apr 3, 2012 at 9:51 AM, David Winsemius dwinsem...@comcast.netwrote:


 On Apr 3, 2012, at 9:32 AM, R. Michael Weylandt wrote:

  Use cut2 as I suggested and David demonstrated.


 Agree that Hmisc::cut2 is extremely handy and I also like that fact that
 the closed ends of intervals are on the left side (which is not the same
 behavior as cut()), which has the otehr effect of setting include.lowest =
 TRUE which is not the default for cut() either (to my continued amazement).

 But let me add the method I use when doing it by hand:

 cut(x, quantile(x, prob=seq(0, 1, length=ngrps+1)), include.lowest=TRUE)

 --
 David.




 Michael

 On Tue, Apr 3, 2012 at 9:31 AM, Val valkr...@gmail.com wrote:

 Thank you all (David, Michael, Giovanni)  for your prompt response.

 First there was a typo error for the group mean it was 89.6 not 87.

 For a small data set and few groupings I can use  prob=c(0, .333, .66
 ,1) to
 group in to three groups in this case. However,  if I want to extend the
 number of groupings say 10 or 15 then do I have to figure it out the
  split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))

 Is there a short cut for that?


 Thanks











 On Tue, Apr 3, 2012 at 9:13 AM, R. Michael Weylandt
 michael.weyla...@gmail.com wrote:


 Ignoring the fact your desired answers are wrong, I'd split the
 separating part and the group means parts into three steps:

 i) quantile() can help you get the split points,
 ii)  findInterval() can assign each y to a group
 iii) then ave() or tapply() will do group-wise means

 Something like:

 y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297) # You need a c here.
 ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))
 tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean)

 You could also use cut2 from the Hmisc package to combine findInterval
 and quantile into a single step.

 Depending on your desired output.

 Hope that helps,
 Michael

 On Tue, Apr 3, 2012 at 8:47 AM, Val valkr...@gmail.com wrote:

 Hi all,

 Assume that I have the following 10 data points.
  x=c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)

 sort x  and get the following
  y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)

 I want to  group the sorted  data point (y)  into  equal number of
 observation per group. In this case there will be three groups.  The
 first
 two groups  will have three observation  and the third will have four
 observations

 group 1  = 34, 45, 46
 group 2  = 66, 78, 125
 group 3  = 193, 209, 242,297

 Finally I want to calculate the group mean

 group 1  =  42
 group 2  =  87
 group 3  =  234

 Can anyone help me out?

 In SAS I used to do it using proc rank.

 thanks in advance

 Val

   [[alternative HTML version deleted]]



 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


 David Winsemius, MD
 West Hartford, CT



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread David Winsemius

On Apr 3, 2012, at 10:11 AM, Val wrote:

 David W and all,

 Thank you very much for your help.

 Here is the final output that I want in the form of data frame. The  
 data frame should contain  x, group and group_ mean in the following  
 way

 x   group   group mean
 46   142.3
 125 289.6
 36   142.3
 193 3235.25
 209 3235.25
 78   289.6
 66   289.6
 242 3235.25
 297 3235.25
 45   142.3

If you want group means in a vector of the same length as x, then instead
of using tapply as done in earlier solutions you should use `ave`.
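
A minimal added sketch of what that looks like for the requested data frame (the .333/.66 cut points are the ones used earlier in the thread):

 x   <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
 grp <- cut(x, quantile(x, prob=c(0, .333, .66, 1)), include.lowest=TRUE, labels=FALSE)
 data.frame(x, group = grp, group_mean = ave(x, grp))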

-- 
DW



 Thanks a lot








 On Tue, Apr 3, 2012 at 9:51 AM, David Winsemius dwinsem...@comcast.net 
  wrote:

 On Apr 3, 2012, at 9:32 AM, R. Michael Weylandt wrote:

 Use cut2 as I suggested and David demonstrated.

 Agree that Hmisc::cut2 is extremely handy and I also like that fact  
 that the closed ends of intervals are on the left side (which is not  
 the same behavior as cut()), which has the otehr effect of setting  
 include.lowest = TRUE which is not the default for cut() either (to  
 my continued amazement).

 But let me add the method I use when doing it by hand:

 cut(x, quantile(x, prob=seq(0, 1, length=ngrps+1)),  
 include.lowest=TRUE)

 -- 
 David.




 Michael

 On Tue, Apr 3, 2012 at 9:31 AM, Val valkr...@gmail.com wrote:
 Thank you all (David, Michael, Giovanni)  for your prompt response.

 First there was a typo error for the group mean it was 89.6 not 87.

 For a small data set and few groupings I can use  prob=c(0, .333, . 
 66 ,1) to
 group in to three groups in this case. However,  if I want to extend  
 the
 number of groupings say 10 or 15 then do I have to figure it out the
  split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))

 Is there a short cut for that?


 Thanks











 On Tue, Apr 3, 2012 at 9:13 AM, R. Michael Weylandt
 michael.weyla...@gmail.com wrote:

 Ignoring the fact your desired answers are wrong, I'd split the
 separating part and the group means parts into three steps:

 i) quantile() can help you get the split points,
 ii)  findInterval() can assign each y to a group
 iii) then ave() or tapply() will do group-wise means

 Something like:

 y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297) # You need a c here.
 ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))
 tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean)

 You could also use cut2 from the Hmisc package to combine findInterval
 and quantile into a single step.

 Depending on your desired output.

 Hope that helps,
 Michael

 On Tue, Apr 3, 2012 at 8:47 AM, Val valkr...@gmail.com wrote:
 Hi all,

 Assume that I have the following 10 data points.
  x=c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)

 sort x  and get the following
  y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)

 I want to  group the sorted  data point (y)  into  equal number of
 observation per group. In this case there will be three groups.  The
 first
 two groups  will have three observation  and the third will have four
 observations

 group 1  = 34, 45, 46
 group 2  = 66, 78, 125
 group 3  = 193, 209, 242,297

 Finally I want to calculate the group mean

 group 1  =  42
 group 2  =  87
 group 3  =  234

 Can anyone help me out?

 In SAS I used to do it using proc rank.

 thanks in advance

 Val

   [[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 David Winsemius, MD
 West Hartford, CT



David Winsemius, MD
West Hartford, CT


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread Val
Hi All,

On the same data  points
x=c(46, 125 , 36 ,193, 209, 78, 66, 242 , 297,45 )

I want to have the following output as a data frame

x       group   group mean
46       1        42.3
125     2        89.6
36       1        42.3
193     3        235.25
209     3        235.25
78       2        89.6
66       2        89.6
242     3        235.25
297     3        235.25
45       1        42.3

I tried the following code


dat <- data.frame(xc=split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))))
gxc <- with(dat, tapply(xc, group, mean))
dat$gxc <- gxce[as.character(dat$group)]
txc=dat$gxc

it did not work for me.













On Tue, Apr 3, 2012 at 10:15 AM, David Winsemius dwinsem...@comcast.netwrote:


 On Apr 3, 2012, at 10:11 AM, Val wrote:

 David W and all,

 Thank you very much for your help.

 Here is the final output that I want in the form of data frame. The data
 frame should contain  x, group and group_ mean in the following way

 x   group   group mean
 46   142.3
 125 289.6
 36   142.3
 193 3235.25
 209 3235.25
 78   289.6
 66   289.6
 242 3235.25
 297 3235.25
 45   142.3


 I you want group means in a vector the same length as x then instead of
 using tapply as done in earlier solutions you should use `ave`.

 --
 DW



 Thanks a lot








 On Tue, Apr 3, 2012 at 9:51 AM, David Winsemius dwinsem...@comcast.netwrote:


 On Apr 3, 2012, at 9:32 AM, R. Michael Weylandt wrote:

  Use cut2 as I suggested and David demonstrated.


 Agree that Hmisc::cut2 is extremely handy and I also like that fact that
 the closed ends of intervals are on the left side (which is not the same
 behavior as cut()), which has the otehr effect of setting include.lowest =
 TRUE which is not the default for cut() either (to my continued amazement).

 But let me add the method I use when doing it by hand:

 cut(x, quantile(x, prob=seq(0, 1, length=ngrps+1)), include.lowest=TRUE)

 --
 David.




 Michael

 On Tue, Apr 3, 2012 at 9:31 AM, Val valkr...@gmail.com wrote:

 Thank you all (David, Michael, Giovanni)  for your prompt response.

 First there was a typo error for the group mean it was 89.6 not 87.

 For a small data set and few groupings I can use  prob=c(0, .333, .66
 ,1) to
 group in to three groups in this case. However,  if I want to extend the
 number of groupings say 10 or 15 then do I have to figure it out the
  split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))

 Is there a short cut for that?


 Thanks











 On Tue, Apr 3, 2012 at 9:13 AM, R. Michael Weylandt
 michael.weyla...@gmail.com wrote:


 Ignoring the fact your desired answers are wrong, I'd split the
 separating part and the group means parts into three steps:

 i) quantile() can help you get the split points,
 ii)  findInterval() can assign each y to a group
 iii) then ave() or tapply() will do group-wise means

 Something like:

 y <- c(36, 45, 46, 66, 78, 125, 193, 209, 242, 297) # You need a c here.
 ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))
 tapply(y, findInterval(y, quantile(y, c(0.33, 0.66))), mean)

 You could also use cut2 from the Hmisc package to combine findInterval
 and quantile into a single step.

 Depending on your desired output.

 Hope that helps,
 Michael

 On Tue, Apr 3, 2012 at 8:47 AM, Val valkr...@gmail.com wrote:

 Hi all,

 Assume that I have the following 10 data points.
  x=c(  46, 125 , 36 ,193, 209, 78, 66, 242 , 297 , 45)

 sort x  and get the following
  y= (36 , 45 , 46,  66, 78,  125,193, 209, 242, 297)

 I want to  group the sorted  data point (y)  into  equal number of
 observation per group. In this case there will be three groups.  The
 first
 two groups  will have three observation  and the third will have four
 observations

 group 1  = 34, 45, 46
 group 2  = 66, 78, 125
 group 3  = 193, 209, 242,297

 Finally I want to calculate the group mean

 group 1  =  42
 group 2  =  87
 group 3  =  234

 Can anyone help me out?

 In SAS I used to do it using proc rank.

 thanks in advance

 Val

   [[alternative HTML version deleted]]



 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


 David Winsemius, MD
 West Hartford, CT



 David Winsemius, MD
 West Hartford, CT




Re: [R] grouping

2012-04-03 Thread Petr Savicky
On Tue, Apr 03, 2012 at 02:21:36PM -0400, Val wrote:
 Hi All,
 
 On the same data  points
 x=c(46, 125 , 36 ,193, 209, 78, 66, 242 , 297,45 )
 
 I want to have have the following output  as data frame
 
 x   group   group mean
 46   142.3
 125 289.6
 36   142.3
 193 3235.25
 209 3235.25
 78   289.6
 66   289.6
 242 3235.25
 297 3235.25
 45   142.3
 
 I tried the following code
 
 
 dat - data.frame(xc=split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1
 gxc - with(dat, tapply(xc, group, mean))
 dat$gxc - gxce[as.character(dat$group)]
 txc=dat$gxc
 
 it did not work for me.

David Winsemius suggested to use ave() when you asked this
question for the first time. Can you have a look at it?

Petr Savicky.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread Val
I did look at it; the result is below.

x=c(46, 125 , 36 ,193, 209, 78, 66, 242 , 297,45 )

#lapply( split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1)) ,
include.lowest=TRUE) ), mean)
  ave( split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1)) ,
include.lowest=TRUE) ), mean)

 ave( split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1)) ,
include.lowest=TRUE) ), mean)
$`[36,74]`
[1] NA

$`(74,197]`
[1] NA

$`(197,297]`
[1] NA

There were 11 warnings (use warnings() to see them)





On Tue, Apr 3, 2012 at 2:35 PM, Petr Savicky savi...@cs.cas.cz wrote:

 On Tue, Apr 03, 2012 at 02:21:36PM -0400, Val wrote:
  Hi All,
 
  On the same data  points
  x=c(46, 125 , 36 ,193, 209, 78, 66, 242 , 297,45 )
 
  I want to have have the following output  as data frame
 
  x   group   group mean
  46   142.3
  125 289.6
  36   142.3
  193 3235.25
  209 3235.25
  78   289.6
  66   289.6
  242 3235.25
  297 3235.25
  45   142.3
 
  I tried the following code
 
 
  dat - data.frame(xc=split(x, cut(x, quantile(x, prob=c(0, .333, .66
 ,1
  gxc - with(dat, tapply(xc, group, mean))
  dat$gxc - gxce[as.character(dat$group)]
  txc=dat$gxc
 
  it did not work for me.

 David Winsemius suggested to use ave(), when you asked this
 question for the first time. Can you have look at it?

 Petr Savicky.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread Val
On Tue, Apr 3, 2012 at 2:53 PM, Berend Hasselman b...@xs4all.nl wrote:


 On 03-04-2012, at 20:21, Val wrote:

  Hi All,
 
  On the same data  points
  x=c(46, 125 , 36 ,193, 209, 78, 66, 242 , 297,45 )
 
  I want to have have the following output  as data frame
 
  x   group   group mean
  46   142.3
  125 289.6
  36   142.3
  193 3235.25
  209 3235.25
  78   289.6
  66   289.6
  242 3235.25
  297 3235.25
  45   142.3
 
  I tried the following code
 
 
  dat - data.frame(xc=split(x, cut(x, quantile(x, prob=c(0, .333, .66
 ,1
  gxc - with(dat, tapply(xc, group, mean))
  dat$gxc - gxce[as.character(dat$group)]
  txc=dat$gxc
 
  it did not work for me.
 

 I'm not surprised.

 In the line dat <- there are 5 opening parentheses and 4 closing )'s.
 In the line dat$gxc <- you reference an object gxce. Where was it created?

 So I tried this

  dat <- data.frame(x, group=findInterval(x, quantile(x, prob=c(0, .333, .66, 1)), all.inside=TRUE))
  dat$gmean <- ave(dat$x, as.factor(dat$group))
  dat
      x group     gmean
 1   46     1  42.33333
 2  125     2  89.66667
 3   36     1  42.33333
 4  193     3 235.25000
 5  209     3 235.25000
 6   78     2  89.66667
 7   66     2  89.66667
 8  242     3 235.25000
 9  297     3 235.25000
 10  45     1  42.33333


Thank you very much. It is working now. There was a typo: gxce. In the
actual R code it was correct, gxc.




 Berend



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread Berend Hasselman

On 03-04-2012, at 21:02, Val wrote:

 
 
 On Tue, Apr 3, 2012 at 2:53 PM, Berend Hasselman b...@xs4all.nl wrote:
 
 On 03-04-2012, at 20:21, Val wrote:
 
  Hi All,
 
  On the same data  points
  x=c(46, 125 , 36 ,193, 209, 78, 66, 242 , 297,45 )
 
  I want to have have the following output  as data frame
 
  x   group   group mean
  46   142.3
  125 289.6
  36   142.3
  193 3235.25
  209 3235.25
  78   289.6
  66   289.6
  242 3235.25
  297 3235.25
  45   142.3
 
  I tried the following code
 
 
  dat <- data.frame(xc=split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))))
  gxc <- with(dat, tapply(xc, group, mean))
  dat$gxc <- gxce[as.character(dat$group)]
  txc=dat$gxc
 
  it did not work for me.
 
 
 I'm not surprised.
 
 In the line dat <- there are 5 opening parentheses and 4 closing )'s.
 In the line dat$gxc <- you reference an object gxce. Where was it created?
 
 So I tried this
 
  dat <- data.frame(x, group=findInterval(x, quantile(x, prob=c(0, .333, .66, 1)), all.inside=TRUE))
  dat$gmean <- ave(dat$x, as.factor(dat$group))

And the as.factor is not necessary. This will do

dat$gmean <- ave(dat$x, dat$group)

Berend

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread Berend Hasselman

On 03-04-2012, at 20:21, Val wrote:

 Hi All,
 
 On the same data  points
 x=c(46, 125 , 36 ,193, 209, 78, 66, 242 , 297,45 )
 
 I want to have have the following output  as data frame
 
 x   group   group mean
 46   142.3
 125 289.6
 36   142.3
 193 3235.25
 209 3235.25
 78   289.6
 66   289.6
 242 3235.25
 297 3235.25
 45   142.3
 
 I tried the following code
 
 
 dat <- data.frame(xc=split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1))))
 gxc <- with(dat, tapply(xc, group, mean))
 dat$gxc <- gxce[as.character(dat$group)]
 txc=dat$gxc
 
 it did not work for me.
 

I'm not surprised.

In the line dat <- there are 5 opening parentheses and 4 closing )'s.
In the line dat$gxc <- you reference an object gxce. Where was it created?

So I tried this

 dat <- data.frame(x, group=findInterval(x, quantile(x, prob=c(0, .333, .66, 1)), all.inside=TRUE))
 dat$gmean <- ave(dat$x, as.factor(dat$group))
 dat
     x group     gmean
1   46     1  42.33333
2  125     2  89.66667
3   36     1  42.33333
4  193     3 235.25000
5  209     3 235.25000
6   78     2  89.66667
7   66     2  89.66667
8  242     3 235.25000
9  297     3 235.25000
10  45     1  42.33333

Berend

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping

2012-04-03 Thread R. Michael Weylandt
Please take a look at my first reply to you:

ave(y, findInterval(y, quantile(y, c(0.33, 0.66))))

Then read ?ave for an explanation of the syntax. ave takes two
vectors, the first being the data to be averaged, the second being an
index to split by. You don't want to use split() here.
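
A minimal added sketch of the corrected call, reusing the x from the thread and the 0.33/0.66 cut points suggested earlier:

 x <- c(46, 125, 36, 193, 209, 78, 66, 242, 297, 45)
 grp <- findInterval(x, quantile(x, c(0.33, 0.66)))
 ave(x, grp)            # one group mean per element of x
 tapply(x, grp, mean)   # one mean per group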

Michael

On Tue, Apr 3, 2012 at 2:50 PM, Val valkr...@gmail.com wrote:
 I did look at it the result  is below,

 x=c(46, 125 , 36 ,193, 209, 78, 66, 242 , 297,45 )

 #lapply( split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1)) ,
 include.lowest=TRUE) ), mean)
  ave( split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1)) ,
 include.lowest=TRUE) ), mean)

 ave( split(x, cut(x, quantile(x, prob=c(0, .333, .66 ,1)) ,
 include.lowest=TRUE) ), mean)
 $`[36,74]`
 [1] NA

 $`(74,197]`
 [1] NA

 $`(197,297]`
 [1] NA

 There were 11 warnings (use warnings() to see them)





 On Tue, Apr 3, 2012 at 2:35 PM, Petr Savicky savi...@cs.cas.cz wrote:

 On Tue, Apr 03, 2012 at 02:21:36PM -0400, Val wrote:
  Hi All,
 
  On the same data  points
  x=c(46, 125 , 36 ,193, 209, 78, 66, 242 , 297,45 )
 
  I want to have have the following output  as data frame
 
  x       group   group mean
  46       1        42.3
  125     2        89.6
  36       1        42.3
  193     3        235.25
  209     3        235.25
  78       2        89.6
  66       2        89.6
  242     3        235.25
  297     3        235.25
  45       1        42.3
 
  I tried the following code
 
 
  dat - data.frame(xc=split(x, cut(x, quantile(x, prob=c(0, .333, .66
 ,1
  gxc - with(dat, tapply(xc, group, mean))
  dat$gxc - gxce[as.character(dat$group)]
  txc=dat$gxc
 
  it did not work for me.

 David Winsemius suggested to use ave(), when you asked this
 question for the first time. Can you have look at it?

 Petr Savicky.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Grouping and/or splitting

2012-04-03 Thread Ashish Agarwal
I have a dataframe imported from csv file below:

Houseid,Personid,Tripid,taz
1,1,1,4
1,1,2,7
2,1,1,96
2,1,2,4
2,1,3,2
2,2,1,58

There are three groups identified based on the combination of first and
second columns. How do I split this data frame?

I tried
aa <- split(inpfil, inpfil[,1:2])
but it has problems.

Output desired is

aa[1]
 Houseid,Personid,Tripid,taz
1,1,1,4
1,1,2,7
aa[2]
 Houseid,Personid,Tripid,taz
2,1,1,96
2,1,2,4
2,1,3,2
aa[3]
 Houseid,Personid,Tripid,taz
2,2,1,58

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping and/or splitting

2012-04-03 Thread Weidong Gu
how about

split(inpfil, paste(inpfil[,1],inpfil[,2],sep=','))

Weidong Gu

On Tue, Apr 3, 2012 at 6:42 PM, Ashish Agarwal
ashish.agarw...@gmail.com wrote:
 I have a dataframe imported from csv file below:

 Houseid,Personid,Tripid,taz
 1,1,1,4
 1,1,2,7
 2,1,1,96
 2,1,2,4
 2,1,3,2
 2,2,1,58

 There are three groups identified based on the combination of first and
 second columns. How do I split this data frame?

 I tried
 aa - split(inpfil, inpfil[,1:2])
 but it has problems.

 Output desired is

 aa[1]
  Houseid,Personid,Tripid,taz
 1,1,1,4
 1,1,2,7
 aa[2]
  Houseid,Personid,Tripid,taz
 2,1,1,96
 2,1,2,4
 2,1,3,2
 aa[3]
  Houseid,Personid,Tripid,taz
 2,2,1,58

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping and/or splitting

2012-04-03 Thread Rui Barradas
Hello,


Ashish Agarwal wrote
 
 I have a dataframe imported from csv file below:
 
 Houseid,Personid,Tripid,taz
 1,1,1,4
 1,1,2,7
 2,1,1,96
 2,1,2,4
 2,1,3,2
 2,2,1,58
 
 There are three groups identified based on the combination of first and
 second columns. How do I split this data frame?
 
 I tried
 aa - split(inpfil, inpfil[,1:2])
 but it has problems.
 
 Output desired is
 
 aa[1]
  Houseid,Personid,Tripid,taz
 1,1,1,4
 1,1,2,7
 aa[2]
  Houseid,Personid,Tripid,taz
 2,1,1,96
 2,1,2,4
 2,1,3,2
 aa[3]
  Houseid,Personid,Tripid,taz
 2,2,1,58
 
   [[alternative HTML version deleted]]
 
 __
 R-help@ mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 


Any of the following three works for me.


DF <- read.table(text="
Houseid,Personid,Tripid,taz
1,1,1,4
1,1,2,7
2,1,1,96
2,1,2,4
2,1,3,2
2,2,1,58
", header=TRUE, sep=",")

DF

split(DF, DF[, 1:2], drop=TRUE)
split(DF, list(DF$Houseid, DF$Personid), drop=TRUE)
with(DF, split(DF, list(Houseid, Personid), drop=TRUE))

The argument 'drop' defaults to FALSE. Was that the problem?

Hope this helps,

Rui Barradas


--
View this message in context: 
http://r.789695.n4.nabble.com/Grouping-and-or-splitting-tp4530410p4530624.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping and/or splitting

2012-04-03 Thread Ashish Agarwal
Yes, I was missing the drop argument.
But now the problem is that splitting causes some weird ordering of the groups.
See below:

DF <- read.table(text="
Houseid,Personid,Tripid,taz
1,1,1,4
1,1,2,7
2,1,1,96
2,1,2,4
2,1,3,2
2,2,1,58
3,1,5,7
", header=TRUE, sep=",")
aa <- split(DF, DF[, 1:2], drop=TRUE)

Now the result is that aa[3] is (3,1) and not (2,2). Why? How can I
preserve the ascending order?

 aa[3]
$`3.1`
  Houseid Personid Tripid taz
7   31  5   7
 aa[4]
$`2.2`
  Houseid Personid Tripid taz
6   22  1  58


On Wed, Apr 4, 2012 at 6:29 AM, Rui Barradas rui1...@sapo.pt wrote:

 Hello,


 Ashish Agarwal wrote
  
  I have a dataframe imported from csv file below:
 
  Houseid,Personid,Tripid,taz
  1,1,1,4
  1,1,2,7
  2,1,1,96
  2,1,2,4
  2,1,3,2
  2,2,1,58
 
  There are three groups identified based on the combination of first and
  second columns. How do I split this data frame?
 
  I tried
  aa - split(inpfil, inpfil[,1:2])
  but it has problems.
 
  Output desired is
 
  aa[1]
   Houseid,Personid,Tripid,taz
  1,1,1,4
  1,1,2,7
  aa[2]
   Houseid,Personid,Tripid,taz
  2,1,1,96
  2,1,2,4
  2,1,3,2
  aa[3]
   Houseid,Personid,Tripid,taz
  2,2,1,58
 
[[alternative HTML version deleted]]
 
  __
  R-help@ mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.htmlhttp://www.r-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 


 Any of the following three works with me.


 DF - read.table(text=
 Houseid,Personid,Tripid,taz
 1,1,1,4
 1,1,2,7
 2,1,1,96
 2,1,2,4
 2,1,3,2
 2,2,1,58
 , header=TRUE, sep=,)

 DF

 split(DF, DF[, 1:2], drop=TRUE)
 split(DF, list(DF$Houseid, DF$Personid), drop=TRUE)
 with(DF, split(DF, list(Houseid, Personid), drop=TRUE))

 The argument 'drop' defaults to FALSE. Was that the problem?

 Hope this helps,

 Rui Barrada

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Grouping together a time variable

2012-02-09 Thread Abraham Mathew
I have the following variable, time, which is a character variable and it's
structured as follows.

 head(as.character(dat$time), 30)
 [1] "00:00:01" "00:00:16" "00:00:24" "00:00:25" "00:00:25" "00:00:40" "00:01:50" "00:01:54" "00:02:33" "00:02:43" "00:03:22"
[12] "00:03:31" "00:03:41" "00:03:42" "00:03:43" "00:04:04" "00:05:09" "00:05:17" "00:05:19" "00:05:21" "00:05:22" "00:05:22"
[23] "00:05:28" "00:05:44" "00:05:54" "00:06:54" "00:06:54" "00:07:10" "00:08:15" "00:08:26"


What I am trying to do is group the data into one-hour increments. So
5:01-6:00am, 6:01-7:00am, 7:01-8:00am,
and so forth.

However, I'm not sure if there's a simple route to do this in R or how to
do it.
Can anyone point me in the right direction?

-- 
*Abraham Mathew
Statistical Analyst
www.amathew.com
720-648-0108
@abmathewks*

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping together a time variable

2012-02-09 Thread R. Michael Weylandt
Perhaps cut.POSIXt (which is a generic so you can just call cut)
depending on the unstated form of your time object.
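
A small added sketch of that idea, assuming the time strings are first turned into POSIXct by pasting on an arbitrary date (the date and values here are illustrative only):

 tm <- c("00:00:01", "00:01:50", "01:05:17", "05:09:00")
 tt <- as.POSIXct(paste("2012-02-09", tm))
 table(cut(tt, breaks = "1 hour"))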

Michael

On Thu, Feb 9, 2012 at 12:15 PM, Abraham Mathew abmathe...@gmail.com wrote:
 I have the following variable, time, which is a character variable and it's
 structured as follows.

 head(as.character(dat$time), 30) [1] 00:00:01 00:00:16 00:00:24 
 00:00:25 00:00:25 00:00:40 00:01:50 00:01:54 00:02:33 00:02:43 
 00:03:22
 [12] 00:03:31 00:03:41 00:03:42 00:03:43 00:04:04 00:05:09
 00:05:17 00:05:19 00:05:21 00:05:22 00:05:22
 [23] 00:05:28 00:05:44 00:05:54 00:06:54 00:06:54 00:07:10
 00:08:15 00:08:26


 What I am trying to do is group the data into one hour increment. So
 5:01-6:00am, 6:01-7:00am, 7:01-8:00a,
 and so forth.

 However, I'm not sure if there's a simple route to do this in R or how to
 do it.
 Can anyone point me in the right direction?

 --
 *Abraham Mathew
 Statistical Analyst
 www.amathew.com
 720-648-0108
 @abmathewks*

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Grouping miliseconds By Hours

2012-02-05 Thread Hasan Diwan
I have a list of numbers corresponding to timestamps, a sample of which follows:
c(1327211358, 1327221999, 1327527296, 1327555433, 1327701042,
1327761389, 1327780993, 1327815670, 1327822964, 1327897497, 1327897527,
1327937072, 1327938300, 1327957589, 1328044466, 1328127921, 1328157588,
1328213951, 1328236836, 1328300276, 1328335936, 1328429102)

I would like to group these into hours. In other words, something like:
c( 2012-01-31 21:14:26 PST 2012-02-01 20:25:21 PST
 2012-02-02 04:39:48 PST 2012-02-02 20:19:11 PST
2012-02-03 02:40:36 PST 2012-02-03 20:17:56 PST
2012-02-04 06:12:16 PST 2012-02-05 08:05:02 PST)
Hour  Hits
21      1
20      3
4        1
2        1
6        1
8        1

How would I do this without too much pain (from a CPU perspective)?
This is a subset of a million entries and I would rather not go
through these manually... So, any advice? Many thanks! -- H
--
Sent from my mobile device
Envoyait de mon portable

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping miliseconds By Hours

2012-02-05 Thread jim holtman
Is this what you are after:

 x <- c(1327211358, 1327221999, 1327527296, 1327555433, 1327701042,
+ 1327761389, 1327780993, 1327815670, 1327822964, 1327897497, 1327897527,
+ 1327937072, 1327938300, 1327957589, 1328044466, 1328127921, 1328157588,
+ 1328213951, 1328236836, 1328300276, 1328335936, 1328429102)

 x <- as.POSIXct(x, origin = '1970-1-1')
 x
 [1] "2012-01-22 05:49:18 EST" "2012-01-22 08:46:39 EST" "2012-01-25 21:34:56 EST"
 [4] "2012-01-26 05:23:53 EST" "2012-01-27 21:50:42 EST" "2012-01-28 14:36:29 EST"
 [7] "2012-01-28 20:03:13 EST" "2012-01-29 05:41:10 EST" "2012-01-29 07:42:44 EST"
[10] "2012-01-30 04:24:57 EST" "2012-01-30 04:25:27 EST" "2012-01-30 15:24:32 EST"
[13] "2012-01-30 15:45:00 EST" "2012-01-30 21:06:29 EST" "2012-01-31 21:14:26 EST"
[16] "2012-02-01 20:25:21 EST" "2012-02-02 04:39:48 EST" "2012-02-02 20:19:11 EST"
[19] "2012-02-03 02:40:36 EST" "2012-02-03 20:17:56 EST" "2012-02-04 06:12:16 EST"
[22] "2012-02-05 08:05:02 EST"
 table(format(x, "%H"))

02 04 05 06 07 08 14 15 20 21
 1  3  3  1  1  2  1  2  4  4
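
To get the requested Hour/Hits layout as a data frame, one added option is:

 hits <- as.data.frame(table(format(x, "%H")))
 names(hits) <- c("Hour", "Hits")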




On Sun, Feb 5, 2012 at 4:54 AM, Hasan Diwan hasan.di...@gmail.com wrote:
 I have a list of numbers corresponding to timestamps, a sample of which 
 follows:
 c(1327211358, 1327221999, 1327527296, 1327555433, 1327701042,
 1327761389, 1327780993, 1327815670, 1327822964, 1327897497, 1327897527,
 1327937072, 1327938300, 1327957589, 1328044466, 1328127921, 1328157588,
 1328213951, 1328236836, 1328300276, 1328335936, 1328429102)

 I would like to group these into hours. In other words, something like:
 c( 2012-01-31 21:14:26 PST 2012-02-01 20:25:21 PST
  2012-02-02 04:39:48 PST 2012-02-02 20:19:11 PST
 2012-02-03 02:40:36 PST 2012-02-03 20:17:56 PST
 2012-02-04 06:12:16 PST 2012-02-05 08:05:02 PST)
 Hour  Hits
 21      1
 20      3
 4        1
 2        1
 6        1
 8        1

 How would I do this without too much pain (from a CPU perspective)?
 This is a subset of a million entries and I would rather not go
 through these manually... So, any advice? Many thanks! -- H
 --
 Sent from my mobile device
 Envoyait de mon portable

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping miliseconds By Hours

2012-02-05 Thread David Winsemius


On Feb 5, 2012, at 9:54 AM, jim holtman wrote:


Is this what you are after:


x - c(1327211358, 1327221999, 1327527296, 1327555433, 1327701042,
+ 1327761389, 1327780993, 1327815670, 1327822964, 1327897497,  
1327897527,
+ 1327937072, 1327938300, 1327957589, 1328044466, 1328127921,  
1328157588,

+ 1328213951, 1328236836, 1328300276, 1328335936, 1328429102)


x - as.POSIXct(x, origin = '1970-1-1')
x

[1] 2012-01-22 05:49:18 EST 2012-01-22 08:46:39 EST 2012-01-25
21:34:56 EST
[4] 2012-01-26 05:23:53 EST 2012-01-27 21:50:42 EST 2012-01-28
14:36:29 EST
[7] 2012-01-28 20:03:13 EST 2012-01-29 05:41:10 EST 2012-01-29
07:42:44 EST
[10] 2012-01-30 04:24:57 EST 2012-01-30 04:25:27 EST 2012-01-30
15:24:32 EST
[13] 2012-01-30 15:45:00 EST 2012-01-30 21:06:29 EST 2012-01-31
21:14:26 EST
[16] 2012-02-01 20:25:21 EST 2012-02-02 04:39:48 EST 2012-02-02
20:19:11 EST
[19] 2012-02-03 02:40:36 EST 2012-02-03 20:17:56 EST 2012-02-04
06:12:16 EST
[22] 2012-02-05 08:05:02 EST

table(format(x, %H))


02 04 05 06 07 08 14 15 20 21
1  3  3  1  1  2  1  2  4  4


It's possible that you may not realize that jim holtman has implicitly
given you a handle on doing operations on such groups, since you could
use the value of format(x, "%H") as the indexing argument in tapply,
ave, or aggregate.
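
For instance (an added sketch, reusing jim holtman's x from above), the same counts come out of tapply, which then generalizes to other group-wise summaries:

 tapply(x, format(x, "%H"), length)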


--
David.








On Sun, Feb 5, 2012 at 4:54 AM, Hasan Diwan hasan.di...@gmail.com  
wrote:
I have a list of numbers corresponding to timestamps, a sample of  
which follows:

c(1327211358, 1327221999, 1327527296, 1327555433, 1327701042,
1327761389, 1327780993, 1327815670, 1327822964, 1327897497,  
1327897527,
1327937072, 1327938300, 1327957589, 1328044466, 1328127921,  
1328157588,

1328213951, 1328236836, 1328300276, 1328335936, 1328429102)

I would like to group these into hours. In other words, something  
like:

c( 2012-01-31 21:14:26 PST 2012-02-01 20:25:21 PST
 2012-02-02 04:39:48 PST 2012-02-02 20:19:11 PST
2012-02-03 02:40:36 PST 2012-02-03 20:17:56 PST
2012-02-04 06:12:16 PST 2012-02-05 08:05:02 PST)
Hour  Hits
21  1
20  3
41
21
61
81

How would I do this without too much pain (from a CPU perspective)?
This is a subset of a million entries and I would rather not go
through these manually... So, any advice? Many thanks! -- H
--
Sent from my mobile device
Envoyait de mon portable

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping clusters from dendrograms

2011-11-03 Thread plangfelder
Hi Julia, 

sorry for the very late reply, your original email was posted while I was on
hiatus from R-help. I'm the author of the dynamicTreeCut package. I
recommend that you try using the hybrid method using the cutreeDynamic
function. What you observed is a known problem of the tree method (which, by
the way, was the reason I developed the Hybrid method). 

Using the hybrid method is simple, for example:

cut2 <- cutreeDynamic(dendro, distM = combo2,
maxTreeHeight=1, deepSplit=2, minModuleSize=1)

You can play with the argument deepSplit to obtain finer or coarser modules.

HTH,

Peter 

--
View this message in context: 
http://r.789695.n4.nabble.com/Grouping-clusters-from-dendrograms-tp2316521p3988526.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping variables in a data frame

2011-08-27 Thread Liviu Andronic
On Sat, Aug 27, 2011 at 7:26 AM, Andra Isan andra_i...@yahoo.com wrote:
 Hi All,

 I have a data frame as follow:

 user_id time age location gender
 .

 and I learn a logistic regression to learn the weights (glm with family= 
 (link = logit))), my response value is either zero or one. I would like to 
 group the users based on user_id and time and see the y values and predicted 
 y values at the same time. Or plot them some how. Is there any way to somehow 
 group them together so that I can learn more about my data by grouping them?

It's very difficult to help you because you haven't followed the
posting guide. But I suspect you're looking for the following:

require(plyr)
Loading required package: plyr
data(mtcars)
## considering 'gear' as 'id' and 'carb' as 'time'
ddply(mtcars, .(gear, carb), function(x) mean(x$hp))
   gear carb    V1
1     3    1 104.0
2     3    2 162.5
3     3    3 180.0
4     3    4 228.0
5     4    1  72.5
6     4    2  79.5
7     4    4 116.5
8     5    2 102.0
9     5    4 264.0
10    5    6 175.0
11    5    8 335.0

This will compute the mean of 'hp' for each group of id & time.
Liviu
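For completeness, a base-R sketch that gives the same per-group means without plyr (a minimal
example on mtcars, not your data; with your frame it would be something along the lines of
aggregate(y ~ user_id + time, data = yourdata, FUN = mean)):

## mean hp for every gear/carb combination, base R only
aggregate(hp ~ gear + carb, data = mtcars, FUN = mean)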


 I would like to get these at the end
 user_id time y predicted_y

 Thanks a lot,
 Andra

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Do you know how to read?
http://www.alienetworks.com/srtest.cfm
http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader
Do you know how to write?
http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Grouping variables in a data frame

2011-08-26 Thread Andra Isan
Hi All, 

I have a data frame as follows:

user_id time age location gender 
.

and I fit a logistic regression to learn the weights (glm with family =
binomial(link = "logit")); my response value is either zero or one. I would like to
group the users based on user_id and time and see the y values and predicted y
values at the same time, or plot them somehow. Is there any way to group them
together so that I can learn more about my data?

I would like to get these at the end
user_id time y predicted_y

Thanks a lot,
Andra
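A minimal sketch of one way to get that layout, with hedged assumptions: the fitted model is
called fit and the modelling data frame dat has columns user_id, time and y (these names are
assumptions for illustration, not from the post):

## observed and predicted responses side by side, one row per observation
out <- data.frame(user_id     = dat$user_id,
                  time        = dat$time,
                  y           = dat$y,
                  predicted_y = fitted(fit))  # response-scale fitted values

## then summarise by group, e.g. mean observed and predicted per user_id/time
aggregate(cbind(y, predicted_y) ~ user_id + time, data = out, FUN = mean)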

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping columns

2011-07-21 Thread Geophagus
Hi @ all,
both possibilities are working very well.
Thanks a lot for the fast help!

Best Greetinx from the Earth Eater Geophagus 

--
View this message in context: 
http://r.789695.n4.nabble.com/Grouping-columns-tp3681018p3683076.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping data

2011-07-20 Thread Dieter Menne

adolfpf wrote:
 
 How do I group my data in dolf the same way the data Orthodont are
 grouped.
 
 show(dolf)
    distance   age Subjectt Sex
  1  6.83679 22.01       F1   F
  2  6.63245 23.04       F1   F
  3 11.58730 39.26       M2   M
 
 

I know that many examples in that excellent book use grouped data, but the
concept of grouped data is more confusing than helpful. I only got started
using nlme/lme when I realized that everything could be done without grouped
data. Too bad, many examples in Pinheiro/Bates rely on the concept (but they no
longer do in the coming lme4).

So I suggest that you try to solve the problem with vanilla data frames
instead of grouped ones. In most cases, it only means that you have to put
the formula into the lme(..) call instead of relying on some hidden
defaults.

Dieter
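A hedged sketch of both routes, shown on the Orthodont data that ships with nlme (swap in dolf
and its Subjectt column for the data in the question):

library(nlme)

## (a) plain data frame: give the grouping in the lme() call itself
fit1 <- lme(distance ~ age, random = ~ 1 | Subject,
            data = as.data.frame(Orthodont))

## (b) if a groupedData object really is wanted, nlme can construct one
orth_grouped <- groupedData(distance ~ age | Subject,
                            data = as.data.frame(Orthodont))
fit2 <- lme(distance ~ age, data = orth_grouped)  # grouping taken from the object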







--
View this message in context: 
http://r.789695.n4.nabble.com/grouping-data-tp3679803p3680115.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] grouping data

2011-07-20 Thread adolfpf
All the examples in 'nlme' are in Grouped Data: distance ~ age | Subject
format.

How do I group my data in dolf the same way the data Orthodont are
grouped.

 show(dolf)
  distance   age Subjectt Sex
1  6.83679 22.01       F1   F
2  6.63245 23.04       F1   F
3 11.58730 39.26       M2   M


 show(Orthodont)
Grouped Data: distance ~ age | Subject
  distance age Subject  Sex
1     26.0   8     M01 Male
2     25.0  10     M01 Male
3     29.0  12     M01 Male


--
View this message in context: 
http://r.789695.n4.nabble.com/grouping-data-tp3679803p3679803.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping columns

2011-07-20 Thread Brad Patrick Schneid
untested because I don't have access to your data, but this should work. 

b13.NEW <- b13[, c("Gesamt", "Wasser", "Boden", "Luft", "Abwasser",
                   "Gefährliche Abfälle", "nicht gefährliche Abfälle")]







Geophagus wrote:
 
 *Hi @ all,
 I have a question concerning the possibility of grouping the columns of a
 matrix.
 R orders the columns alphabetically.
 What can I do to order the columns according to my own specification?
 
 The script is the following:*
 
 # R script: counts xyz
 
 # read in the source file
 b <- read.csv2("Z:/int/xyz.csv", header = TRUE)
 
 # generate subsets for the individual years
 b1 <- subset(b, jahr == 2007)
 b2 <- subset(b, jahr == 2008)
 b3 <- subset(b, jahr == 2009)
 
 # tapply for the individual years on the respective BranchenID
 b1_1 <- tapply(b1$betriebs_id, b1$umweltkompartiment, length)
 b1_2 <- tapply(b2$betriebs_id, b2$umweltkompartiment, length)
 b1_3 <- tapply(b3$betriebs_id, b3$umweltkompartiment, length)
 
 # combine the results
 b11 <- rbind(b1_1, b1_2, b1_3)
 Gesamt <- apply(X = b11, MARGIN = 1, sum)
 b13 <- cbind(Gesamt, b11)
 b13
      Gesamt Abwasser Boden Gefährliche Abfälle Luft nicht gefährliche Abfälle Wasser
 b1_1   9832      432    18                3147 2839                      1592   1804
 b1_2  10271      413    28                3360 2920                      1715   1835
 b1_3   9983      404    21                3405 2741                      1691   1721
 
 *Now I want to have the following order of the columns:
 Gesamt, Wasser, Boden, Luft, Abwasser, Gefährliche Abfälle, nicht
 gefährliche Abfälle
 
 Thanks a lot for your answers!
 Fak*
 


--
View this message in context: 
http://r.789695.n4.nabble.com/Grouping-columns-tp3681018p3681121.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Grouping columns

2011-07-20 Thread Geophagus
*Hi @ all,
I have a question concerning the possibility of grouping the columns of a
matrix.
R orders the columns alphabetically.
What can I do to order the columns according to my own specification?

The script is the following:*

 # R script: counts xyz
 
 # read in the source file
 b <- read.csv2("Z:/int/xyz.csv", header = TRUE)
 
 # generate subsets for the individual years
 b1 <- subset(b, jahr == 2007)
 b2 <- subset(b, jahr == 2008)
 b3 <- subset(b, jahr == 2009)
 
 # tapply for the individual years on the respective BranchenID
 b1_1 <- tapply(b1$betriebs_id, b1$umweltkompartiment, length)
 b1_2 <- tapply(b2$betriebs_id, b2$umweltkompartiment, length)
 b1_3 <- tapply(b3$betriebs_id, b3$umweltkompartiment, length)
 
 # combine the results
 b11 <- rbind(b1_1, b1_2, b1_3)
 Gesamt <- apply(X = b11, MARGIN = 1, sum)
 b13 <- cbind(Gesamt, b11)
 b13
     Gesamt Abwasser Boden Gefährliche Abfälle Luft nicht gefährliche Abfälle Wasser
b1_1   9832      432    18                3147 2839                      1592   1804
b1_2  10271      413    28                3360 2920                      1715   1835
b1_3   9983      404    21                3405 2741                      1691   1721

*Now I want to have the following order of the columns:
Gesamt, Wasser, Boden, Luft, Abwasser, Gefährliche Abfälle, nicht
gefährliche Abfälle

Thanks a lot for your answers!
Fak*



--
View this message in context: 
http://r.789695.n4.nabble.com/Grouping-columns-tp3681018p3681018.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping columns

2011-07-20 Thread David Winsemius


On Jul 20, 2011, at 10:42 AM, Geophagus wrote:


*Hi @ all,
I have a question concerning the possibility of grouping the columns of a
matrix.
R orders the columns alphabetically.
What can I do to order the columns according to my own specification?


Dear Earth Eater;

You can create a factor whose levels are ordered to your
specification. Your column umweltkompartiment obviously has those
levels. This might also offer advantages in situations where there was
not complete representation of all levels in all the files.


So your tapply() calls could have been of this form:

b1_1 <- tapply(b1$betriebs_id,
               factor(b1$umweltkompartiment,
                      levels = c("Gesamt", "Wasser", "Boden", "Luft", "Abwasser",
                                 "Gefährliche Abfälle", "nicht gefährliche Abfälle")),
               length)

# The code would be more compact if you created a vector of the levels and used
# it as an argument to factor():

faclevs <- c("Gesamt", "Wasser", "Boden", "Luft", "Abwasser",
             "Gefährliche Abfälle", "nicht gefährliche Abfälle")

b1_1 <- tapply(b1$betriebs_id,
               factor(b1$umweltkompartiment, levels = faclevs),
               length)
# lather, rinse, repeat x 3
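A tiny self-contained illustration of the same idea, with made-up values rather than the
poster's data: the order of the factor levels, not alphabetical order, determines the order of
the tapply()/table() output.

komp    <- c("Wasser", "Luft", "Boden", "Wasser", "Abwasser", "Luft")
faclevs <- c("Wasser", "Boden", "Luft", "Abwasser")
table(komp)                            # columns come out alphabetically
table(factor(komp, levels = faclevs))  # columns come out in the specified order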
--
David.


The script is the following:*


# R script: counts xyz

# read in the source file
b <- read.csv2("Z:/int/xyz.csv", header = TRUE)

# generate subsets for the individual years
b1 <- subset(b, jahr == 2007)
b2 <- subset(b, jahr == 2008)
b3 <- subset(b, jahr == 2009)

# tapply for the individual years on the respective BranchenID
b1_1 <- tapply(b1$betriebs_id, b1$umweltkompartiment, length)
b1_2 <- tapply(b2$betriebs_id, b2$umweltkompartiment, length)
b1_3 <- tapply(b3$betriebs_id, b3$umweltkompartiment, length)

# combine the results
b11 <- rbind(b1_1, b1_2, b1_3)
Gesamt <- apply(X = b11, MARGIN = 1, sum)
b13 <- cbind(Gesamt, b11)
b13
     Gesamt Abwasser Boden Gefährliche Abfälle Luft nicht gefährliche Abfälle Wasser
b1_1   9832      432    18                3147 2839                      1592   1804
b1_2  10271      413    28                3360 2920                      1715   1835
b1_3   9983      404    21                3405 2741                      1691   1721

*Now I want to have the following order of the columns:
Gesamt, Wasser, Boden, Luft, Abwasser, Gefährliche Abfälle, nicht
gefährliche Abfälle

Thanks a lot for your answers!
Fak*



--
View this message in context: 
http://r.789695.n4.nabble.com/Grouping-columns-tp3681018p3681018.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Grouping data in ranges in table

2011-03-05 Thread Jason Rupert
Working with the built in R data set Orange, e.g. with(Orange, table(age, 
circumference)). 

 
How should I go about grouping the ages and circumferences in the 
following ranges and having them display as such in a table?
age range:
118 - 664
1004 - 1372
1582

circumference range:
30-58
62- 115
120-142
145-177
179-214

Thanks for any feedback and insights, as I am hoping for an output that looks 
something like the following:
   circumference range
   30-58 62- 115  145-177
age range
118 - 664 ...
1004 - 1372 ...
1582


Thanks a ton.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping data in ranges in table

2011-03-05 Thread Greg Snow
?cut

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Jason Rupert
 Sent: Saturday, March 05, 2011 3:38 PM
 To: R Project Help
 Subject: [R] Grouping data in ranges in table
 
 Working with the built in R data set Orange, e.g. with(Orange,
 table(age,
 circumference)).
 
 
  How should I go about grouping the ages and circumferences in the
 following ranges and having them display as such in a table?
 age range:
 118 - 664
 1004 - 1372
 1582
 
 circumference range:
 30-58
 62- 115
 120-142
 145-177
 179-214
 
  Thanks for any feedback and insights, as I am hoping for an output that
 looks
 something like the following:
circumference range
30-58 62- 115  145-177
 age range
 118 - 664 ...
 1004 - 1372 ...
 1582
 
 
 Thanks a ton.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping data in ranges in table

2011-03-05 Thread Jorge Ivan Velez
Hi Jason,

Something along the lines of

with(Orange, table(cut(age, breaks = c(118, 664, 1004, 1372, 1582, Inf)),
                   cut(circumference, breaks = c(30, 58, 62, 115,
                                                 145, 179, 214))))

should get you started.

HTH,
Jorge
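Building on that, a hedged sketch that also labels the bins so the table headings look like the
ranges in the question; the exact break points are a judgment call, since the requested ranges
leave gaps between them:

age_breaks  <- c(-Inf, 664, 1372, Inf)
circ_breaks <- c(-Inf, 58, 115, 142, 177, Inf)
with(Orange,
     table("age range" = cut(age, age_breaks,
                             labels = c("118-664", "1004-1372", "1582")),
           "circumference range" = cut(circumference, circ_breaks,
                                       labels = c("30-58", "62-115", "120-142",
                                                  "145-177", "179-214"))))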


On Sat, Mar 5, 2011 at 5:38 PM, Jason Rupert  wrote:

 Working with the built in R data set Orange, e.g. with(Orange, table(age,
 circumference)).


 How should I go about grouping the ages and circumferences in the
 following ranges and having them display as such in a table?
 age range:
 118 - 664
 1004 - 1372
 1582

 circumference range:
 30-58
 62- 115
 120-142
 145-177
 179-214

 Thanks for any feedback and insights, as I am hoping for an output that looks
 something like the following:
   circumference range
   30-58 62- 115  145-177
 age range
 118 - 664 ...
 1004 - 1372 ...
 1582


 Thanks a ton.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] grouping data

2011-03-04 Thread Steve Hong
Hi R-list,

I have a data set with plot locations and observations and want to label
them based on locations.  For example, I have GPS information (x and y) as
follows:

 x
  [1] -87.85092 -87.85092 -87.85092 -87.85093 -87.85093 -87.85093 -87.85094
  [8] -87.85094 -87.85094 -87.85096 -87.85095 -87.85095 -87.85095 -87.85096
 [15] -87.85096 -87.85096 -87.85096 -87.85088 -87.85088 -87.85087 -87.85087
 [22] -87.85087 -87.85087 -87.85086 -87.85086 -87.85086 -87.85085 -87.85086
 [29] -87.85085 -87.85085 -87.85084 -87.85084 -87.85084 -87.85084 -87.85075
 [36] -87.85075 -87.85076 -87.85076 -87.85077 -87.85076 -87.85076 -87.85076
 [43] -87.85077 -87.85077 -87.85077 -87.85077 -87.85077 -87.85077 -87.85070
 [50] -87.85072 -87.85073 -87.85075 -87.85078 -87.85079 -87.85082 -87.85084
 [57] -87.85077 -87.85078 -87.85078 -87.85078 -87.85078 -87.85078 -87.85079
 [64] -87.85079 -87.85080 -87.85080 -87.85071 -87.85071 -87.85071 -87.85070
 [71] -87.85071 -87.85079 -87.85071 -87.85070 -87.85070 -87.85069 -87.85069
 [78] -87.85069 -87.85069 -87.85068 -87.85068 -87.85068 -87.85067 -87.85059
 [85] -87.85060 -87.85060 -87.85060 -87.85061 -87.85061 -87.85061 -87.85061
 [92] -87.85061 -87.85062 -87.85062 -87.85062 -87.85062 -87.85063 -87.85063
 [99] -87.85063 -87.85055 -87.85055 -87.85055 -87.85054 -87.85054 -87.85053
[106] -87.85053 -87.85053 -87.85053 -87.85053 -87.85052 -87.85052 -87.85052
[113] -87.85052 -87.85051 -87.85051 -87.85043 -87.85043 -87.85044 -87.85044
[120] -87.85044 -87.85045 -87.85045 -87.85045 -87.85045 -87.85046 -87.85046
[127] -87.85046 -87.85046 -87.85047 -87.85047 -87.85039 -87.85039 -87.85038
[134] -87.85038 -87.85038 -87.85037 -87.85037 -87.85037 -87.85037 -87.85036
[141] -87.85036 -87.85036 -87.85035 -87.85035 -87.85035 -87.85027 -87.85027
[148] -87.85027 -87.85027 -87.85028 -87.85028 -87.85028 -87.85029 -87.85029
[155] -87.85029 -87.85029 -87.85029 -87.85030 -87.85030 -87.85030 -87.85022
[162] -87.85022 -87.85022 -87.85021 -87.85021 -87.85021 -87.85020 -87.85020
[169] -87.85020 -87.85020 -87.85019 -87.85019 -87.85019 -87.85019 -87.85011
[176] -87.85011 -87.85011 -87.85011 -87.85012 -87.85012 -87.85012 -87.85012
[183] -87.85013 -87.85013 -87.85013 -87.85014 -87.85014 -87.85014 -87.85006
[190] -87.85006 -87.85006 -87.85005 -87.85005 -87.85004 -87.85004 -87.85004
[197] -87.85004 -87.85003 -87.85003 -87.85003 -87.85002 -87.85003 -87.84994
[204] -87.84994 -87.84995 -87.84995 -87.84995 -87.84995 -87.84996 -87.84996
[211] -87.84996 -87.84996 -87.84996 -87.84996 -87.84996 -87.84996 -87.84996
[218] -87.84996 -87.84996 -87.84996 -87.84996 -87.84990 -87.84991 -87.84993
[225] -87.84995 -87.84998 -87.84999 -87.85001 -87.85003 -87.84996 -87.84998
[232] -87.84997 -87.84998 -87.84989 -87.84990 -87.84989 -87.84989 -87.84988
[239] -87.84988 -87.84988 -87.84988 -87.84988 -87.84987 -87.84987 -87.84987
[246] -87.84987 -87.84978 -87.84978 -87.84979 -87.84979 -87.84979 -87.84979
[253] -87.84979 -87.84980 -87.84980 -87.84981 -87.84980 -87.84981 -87.84981
[260] -87.84973 -87.84973 -87.84973 -87.84972 -87.84972 -87.84972 -87.84971
[267] -87.84971 -87.84971 -87.84970 -87.84970 -87.84970 -87.84963 -87.84963
[274] -87.84963 -87.84963 -87.84963 -87.84964 -87.84964 -87.84965 -87.84964
[281] -87.84964 -87.84965 -87.84957 -87.84957 -87.84956 -87.84956 -87.84958
[288] -87.84958
 y
  [1] 33.90342 33.90335 33.90328 33.90321 33.90314 33.90308 33.90301
33.90294
  [9] 33.90287 33.90280 33.90274 33.90267 33.90260 33.90253 33.90246
33.90240
 [17] 33.90233 33.90232 33.90239 33.90245 33.90252 33.90259 33.90266
33.90273
 [25] 33.90279 33.90286 33.90293 33.90300 33.90307 33.90314 33.90321
33.90327
 [33] 33.90334 33.90339 33.90337 33.90335 33.90328 33.90321 33.90319
33.90318
 [41] 33.90317 33.90316 33.90315 33.90313 33.90312 33.90310 33.90309
33.90307
 [49] 33.90314 33.90314 33.90314 33.90314 33.90314 33.90314 33.90314
33.90314
 [57] 33.90300 33.90294 33.90287 33.90280 33.90273 33.90266 33.90252
33.90245
 [65] 33.90239 33.90232 33.90231 33.90237 33.90245 33.90251 33.90258
33.90259
 [73] 33.90265 33.90272 33.90279 33.90286 33.90292 33.90299 33.90306
33.90313
 [81] 33.90320 33.90326 33.90334 33.90332 33.90327 33.90320 33.90314
33.90307
 [89] 33.90300 33.90293 33.90286 33.90279 33.90272 33.90265 33.90258
33.90252
 [97] 33.90245 33.90238 33.90231 33.90231 33.90237 33.90243 33.90250
33.90257
[105] 33.90264 33.90271 33.90278 33.90285 33.90292 33.90298 33.90306
33.90312
[113] 33.90319 33.90326 33.90329 33.90326 33.90319 33.90312 33.90306
33.90299
[121] 33.90292 33.90286 33.90279 33.90272 33.90265 33.90258 33.90251
33.90245
[129] 33.90237 33.90231 33.90230 33.90236 33.90243 33.90250 33.90257
33.90264
[137] 33.90271 33.90277 33.90284 33.90291 33.90298 33.90305 33.90311
33.90319
[145] 33.90325 33.90323 33.90319 33.90312 33.90305 33.90299 33.90291
33.90285
[153] 33.90278 33.90272 33.90264 33.90257 33.90250 33.90243 33.90237
33.90230
[161] 33.90229 33.90235 33.90243 33.90250 33.90256 33.90263 33.90270
33.90277
[169] 33.90283 33.90290 33.90297 33.90304 33.90311 

Re: [R] grouping data

2011-03-04 Thread Joshua Wiley
Hi Steve,

Just test whether y is greater than the predicted y (i.e., your line).

## function using the model coefficients*
f <- function(x) {82.9996 + (0.5589 * x)}
## Find group membership
group <- ifelse(y > f(x), "A", "B")

*Note that depending how accurate this needs to be, you will probably
want to use the model itself rather than just reading from the
printout like I did.  If you need to do that, take a look at ?predict
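A short, reproducible sketch of that predict()-based version, on simulated data standing in for
the GPS example (the coefficients and sample size here are made up):

set.seed(1)
xsim  <- runif(50, 0, 100)
ysim  <- 83 + 0.56 * xsim + rnorm(50, sd = 10)
fm1   <- lm(ysim ~ xsim)
group <- ifelse(ysim > predict(fm1), "A", "B")  # above / below the fitted line
table(group)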

For future reference, it would be easier for readers if you provided
your data via something like: dput(x) that can be copied directly into
the R console.  Also, if you are generating random data (rnorm()), you
can use set.seed() so that we can replicate exactly what you get.

HTH,

Josh

On Fri, Mar 4, 2011 at 1:39 PM, Steve Hong empti...@gmail.com wrote:
 Hi R-list,

 I have a data set with plot locations and observations and want to label
 them based on locations.  For example, I have GPS information (x and y) as
 follows:
[snip]
 (fm1 - lm(ysim~xsim))
 Call:
 lm(formula = ysim ~ xsim)
 Coefficients:
 (Intercept)         xsim
    82.9996       0.5589

 I overlapped fitted line on the plot.

 abline(fm1)
 My question is:
 As you can see in the plot, how can I label (or re-group) those in upper
 diagonal as (say) 'A' and the others in lower diagonal as 'B'?

 Thanks a lot in advance!!!

 Steve

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping and counting in dataframe

2011-02-27 Thread zem
Does nobody have any idea?
I have already tried tapply(d, gr, ...) but I have problems with the
choice of the function ... also I am not really sure whether that is the
right direction with tapply ...
It would be really great if somebody came up with a new suggestion.

10x

-- 
View this message in context: 
http://r.789695.n4.nabble.com/grouping-and-counting-in-dataframe-tp3325476p3327240.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping and counting in dataframe

2011-02-27 Thread jim holtman
Here is one solution; mine differs from your expected output since there should
always be at least one item in the range, namely the value itself:

  tm gr
1  12345  1
2  42352  3
3  12435  1
4  67546  2
5  24234  2
6  76543  4
7  31243  2
8  13334  3
9  64562  3
10 64123  3
d$ct <- ave(d$tm, d$gr, FUN = function(x){
    # determine count in the range
    sapply(x, function(a) sum((x >= a - 500) & (x <= a + 500)))
})

 d
  tm gr ct
1  12345  1  2
2  42352  3  1
3  12435  1  2
4  67546  2  1
5  24234  2  1
6  76543  4  1
7  31243  2  1
8  13334  3  1
9  64562  3  2
10 64123  3  2
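If the real data set is large, here is a hedged sketch of a variant that avoids the quadratic
sapply() inside each group by sorting the group once and using findInterval(); it assumes
integer timestamps and, like the version above, lets each element count itself:

tm <- c(12345, 42352, 12435, 67546, 24234, 76543, 31243, 13334, 64562, 64123)
gr <- c(1, 3, 1, 2, 2, 4, 2, 3, 3, 3)
d  <- data.frame(tm, gr)

d$ct <- ave(d$tm, d$gr, FUN = function(x) {
    s <- sort(x)
    # count of group members in [a - 500, a + 500] for each a in x
    findInterval(x + 500, s) - findInterval(x - 501, s)
})
d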


On Sat, Feb 26, 2011 at 5:10 PM, zem zmanol...@gmail.com wrote:
 sry,
 new try:

 tm <- c(12345, 42352, 12435, 67546, 24234, 76543, 31243, 13334, 64562, 64123)
 gr <- c(1, 3, 1, 2, 2, 4, 2, 3, 3, 3)
 d  <- data.frame(time = tm, gr = gr)

 where tm are Unix times and gr is the factor to group by.
 I have a scalar, for example k = 500.
 Now I need to calculate, for every row, how many examples in the same group
 are in the interval [i-500; i+500], where i is the current tm element, like this:

d
    time gr ct
 1  12345  1  2
 2  42352  3  0
 3  12435  1  2
 4  67546  2  0
 5  24234  2  0
 6  76543  4  0
 7  31243  2  0
 8  13334  3  0
 9  64562  3  2
 10 64123  3  2

 i hope that was a better illustration of my problem

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/grouping-and-counting-in-dataframe-tp3325476p3326338.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping and counting in dataframe

2011-02-26 Thread zem
sry, 
new try: 

tm <- c(12345, 42352, 12435, 67546, 24234, 76543, 31243, 13334, 64562, 64123)
gr <- c(1, 3, 1, 2, 2, 4, 2, 3, 3, 3)
d  <- data.frame(time = tm, gr = gr)

where tm are Unix times and gr is the factor to group by.
I have a scalar, for example k = 500.
Now I need to calculate, for every row, how many examples in the same group
are in the interval [i-500; i+500], where i is the current tm element, like this:

d
time gr ct
1  12345  1  2
2  42352  3  0
3  12435  1  2
4  67546  2  0
5  24234  2  0
6  76543  4  0
7  31243  2  0
8  13334  3  0
9  64562  3  2
10 64123  3  2

i hope that was a better illustration of my problem

-- 
View this message in context: 
http://r.789695.n4.nabble.com/grouping-and-counting-in-dataframe-tp3325476p3326338.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] grouping and counting in dataframe

2011-02-25 Thread zem

hi all,

i have a little problem: i have some code written, but it is too slow...

i have a data frame with a column of time series and a grouping column.
It really does not matter what kind of data is in the first column; it can
be a random number like this:
x  <- rnorm(10)
gr <- c(1, 3, 1, 2, 2, 4, 2, 3, 3, 3)
x  <- cbind(x, gr)

now, for every row i, i have to look, within its group, at how many of the
values in x[,1] are in a range of x[i,1] plus/minus k (k is another number)

thanks in advance

-- 
View this message in context: 
http://r.789695.n4.nabble.com/grouping-and-counting-in-dataframe-tp3325476p3325476.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping and counting in dataframe

2011-02-25 Thread David Winsemius


On Feb 25, 2011, at 8:28 PM, zem wrote:



hi all,

i have a little problem: i have some code written, but it is too slow...

i have a data frame with a column of time series and a grouping column.
It really does not matter what kind of data is in the first column; it can
be a random number like this:
x  <- rnorm(10)
gr <- c(1, 3, 1, 2, 2, 4, 2, 3, 3, 3)
x  <- cbind(x, gr)


That is not a dataframe. It is a matrix. And not all time series  
objects are the same, so you should not assume that any old two column  
object will respond the same way to R functions.




now, for every row i, i have to look, within its group, at how many of the
values in x[,1] are in a range of x[i,1] plus/minus k (k is another number)


You may find that the function findInterval is useful. I cannot
determine what your goal is from the description, and there is no
complete example with a specification of what the correct output would
be, as you should have seen requested in the Posting Guide.





--

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Grouping by factors in R

2011-02-08 Thread Christopher R. Dolanc
I'm having a hard time figuring out how to group results by certain 
factors in R.  I have data with the following headings:


[1] Time  Plot  LatCatElevation ElevCat   
AspectAspCatSlope

[9] SlopeCat  Species   SizeClass Stems

and I'm trying to use a GLM to test differences in Stems for different 
categories/factors - most importantly, I want to group things so that I 
see results by SizeClass and then by Species.  This is pretty easy 
in SAS using the Group By command, but in R, I haven't figured it out.


I've tried using the following code:

 stems139GLM <- glm(Stems ~ Time | SizeClass | Species,
                    family = poisson, data = stems139)


but R gives me this message:

Error in pmax(exp(eta), .Machine$double.eps) :
  cannot mix 0-length vectors with others
In addition: Warning messages:
1: In Ops.factor(Time, SizeClass) : | not meaningful for factors
2: In Ops.factor(Time | SizeClass, Species) : | not meaningful for factors

I'd appreciate any help.

Thanks.

--
Christopher R. Dolanc
PhD Candidate
Ecology Graduate Group
University of California, Davis
Lab Phone: (530) 752-2644 (Barbour lab)un

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping by factors in R

2011-02-08 Thread Dennis Murphy
Hi:

One approach would be to use dlply() from the plyr package to generate the
models and assign the results to a list, something like the following:

library(plyr)
# function to run the GLM in each data subset - the argument is a generic
# data subset d
gfun  <- function(d) glm(Stems ~ Time, data = d, family = poisson)
mlist <- dlply(stems139, .(SizeClass, Species), gfun)

To see the result, try mlist[[1]] or summary(mlist[[1]]) to execute the
print and summary methods on the first fitted model. Each output list object
from glm() is a list component of mlist, so mlist is actually a list of
lists.

You can extract various pieces from mlist by using ldply() with a suitable
extraction function or by use of the do.call/lapply combination.
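A hedged, self-contained illustration of that extraction step, with mtcars standing in for
stems139 (gear plays the role of the grouping factor, carb the count response):

library(plyr)
mlist <- dlply(mtcars, .(gear),
               function(d) glm(carb ~ wt, data = d, family = poisson))
ldply(mlist, coef)                                           # one row of coefficients per group
ldply(mlist, function(m) coef(summary(m))[, "Std. Error"])   # matching standard errors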

All of this is untested since no minimal example was provided per
instructions in the Posting Guide...

HTH,
Dennis


On Tue, Feb 8, 2011 at 11:54 AM, Christopher R. Dolanc crdol...@ucdavis.edu
 wrote:

 I'm having a hard time figuring out how to group results by certain factors
 in R.  I have data with the following headings:

 [1] Time  Plot  LatCatElevation ElevCat   Aspect
  AspCatSlope
 [9] SlopeCat  Species   SizeClass Stems

 and I'm trying to use a GLM to test differences in Stems for different
 categories/factors - most importantly, I want to group things so that I see
 results by SizeClass and then by Species.  This is pretty easy in SAS
 using the Group By command, but in R, I haven't figured it out.

 I've tried using the following code:

  stems139GLM <- glm(Stems ~ Time | SizeClass | Species, family=poisson,
  data=stems139)

 but R gives me this message:

 Error in pmax(exp(eta), .Machine$double.eps) :
  cannot mix 0-length vectors with others
 In addition: Warning messages:
 1: In Ops.factor(Time, SizeClass) : | not meaningful for factors
 2: In Ops.factor(Time | SizeClass, Species) : | not meaningful for factors

 I'd appreciate any help.

 Thanks.

 --
 Christopher R. Dolanc
 PhD Candidate
 Ecology Graduate Group
 University of California, Davis
 Lab Phone: (530) 752-2644 (Barbour lab)un

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping by factors in R

2011-02-08 Thread Christopher R. Dolanc
I'm working on getting this to work - need to figure out how to extract 
pieces properly.

In the meantime, I may have figured out an alternate method to group
the factors with the following:

  stems139$SpeciesF <- factor(stems139$Species)

  stems139GLM <- glm(Stems ~ Time*SizeClassF*Species, family = poisson,
                     data = stems139)

  summary(stems139GLM)

Call:
glm(formula = Stems ~ Time * SizeClassF * Species, family = poisson,
    data = stems139)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-4.2308  -1.0107  -0.6786  -0.3393  16.7415

Coefficients:
                     Estimate Std. Error z value Pr(>|z|)
(Intercept)         -0.671794   0.118678  -5.661 1.51e-08 ***
TimeVTM             -0.573800   0.197698  -2.902 0.003703 **
SizeClassF2         -0.766172   0.210684  -3.637 0.000276 ***
SizeClassF3         -1.960095   0.337764  -5.803 6.51e-09 ***
SizeClassF4         -2.653242   0.462693  -5.734 9.79e-09 ***
SpeciesABMA          1.824095   0.127895  14.262  < 2e-16 ***
SpeciesJUOC         -0.088293   0.171666  -0.514 0.607022
SpeciesPIAL          1.947920   0.126856  15.355  < 2e-16 ***
SpeciesPICO          2.863407   0.122018  23.467  < 2e-16 ***
SpeciesPIJE         -0.525010   0.194664  -2.697 0.006997 **
SpeciesPIMO          0.372049   0.154251   2.412 0.015866 *
SpeciesTSME          1.919405   0.127085  15.103  < 2e-16 ***
TimeVTM:SizeClassF2 -0.620122   0.411567  -1.507 0.131879
TimeVTM:SizeClassF3  0.756122   0.471612   1.603 0.108875
TimeVTM:SizeClassF4  0.910273   0.618014   1.473 0.140778

The problem now, though, is that R for some reason does not list the first
factor level in the output.  Why would this be?

On 2/8/2011 2:21 PM, Dennis Murphy wrote:
 Hi:

 One approach would be to use dlply() from the plyr package to generate 
 the models and assign the results to a list, something like the following:

 library(plyr)
 # function to run the GLM in each data subset - the argument is a
 # generic data subset d
 gfun  <- function(d) glm(Stems ~ Time, data = d, family = poisson)
 mlist <- dlply(stems139, .(SizeClass, Species), gfun)

 To see the result, try mlist[[1]] or summary(mlist[[1]]) to execute 
 the print and summary methods on the first fitted model. Each output 
 list object from glm() is a list component of mlist, so mlist is 
 actually a list of lists.

 You can extract various pieces from mlist by using ldply() with a 
 suitable extraction function or by use of the do.call/lapply combination.

 All of this is untested since no minimal example was provided per 
 instructions in the Posting Guide...

 HTH,
 Dennis


 On Tue, Feb 8, 2011 at 11:54 AM, Christopher R. Dolanc 
 crdol...@ucdavis.edu mailto:crdol...@ucdavis.edu wrote:

 I'm having a hard time figuring out how to group results by
 certain factors in R.  I have data with the following headings:

 [1] Time  Plot  LatCatElevation ElevCat  
 AspectAspCatSlope
 [9] SlopeCat  Species   SizeClass Stems

 and I'm trying to use a GLM to test differences in Stems for
 different categories/factors - most importantly, I want to group
 things so that I see results by SizeClass and then by Species.
  This is pretty easy in SAS using the Group By command, but in
 R, I haven't figured it out.

 I've tried using the following code:

  stems139GLM <- glm(Stems ~ Time | SizeClass | Species,
 family=poisson, data=stems139)

 but R gives me this message:

 Error in pmax(exp(eta), .Machine$double.eps) :
  cannot mix 0-length vectors with others
 In addition: Warning messages:
 1: In Ops.factor(Time, SizeClass) : | not meaningful for factors
 2: In Ops.factor(Time | SizeClass, Species) : | not meaningful for
 factors

 I'd appreciate any help.

 Thanks.

 -- 
 Christopher R. Dolanc
 PhD Candidate
 Ecology Graduate Group
 University of California, Davis
 Lab Phone: (530) 752-2644 (Barbour lab)un

 __
 R-help@r-project.org mailto:R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Christopher R. Dolanc
PhD Candidate
Ecology Graduate Group
University of California, Davis
Lab Phone: (530) 752-2644 (Barbour lab)


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] grouping question

2010-10-29 Thread will phillips

Hello

I have what is probably a very simple grouping question; however, given my
limited exposure to R, I have not found a solution yet despite my research
efforts and wild attempts at what I thought might produce some sort of
result.

I have a very simple list of integers that range between 1 and 24.  These
correspond to hours of the day.

I am trying to create a grouping of Day and Night with 
Day = 6 to 17.99
Night = 1 to 5.59  and  18 to 24

Using the cut() command I can create the segments, but I have not found a
combine-type command to merge the two night segments.  No luck with
if/else either.

Any help would be greatly appreciated

Thank you

Will


-- 
View this message in context: 
http://r.789695.n4.nabble.com/grouping-question-tp3019922p3019922.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping question

2010-10-29 Thread jim holtman
try this:

 x
 [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 y <- cut(x, breaks=c(-Inf, 6, 18, Inf), labels=c('a','b','c'))
 levels(y) <- c('night','day','night')
 y
 [1] night night night night night night night day   day   day   day
day   day   day   day   day   day   day
[19] day   night night night night night night
Levels: night day



On Fri, Oct 29, 2010 at 8:56 PM, will phillips will.phill...@q.com wrote:

 Hello

 I have what is probably a very simple grouping question; however, given my
 limited exposure to R, I have not found a solution yet despite my research
 efforts and wild attempts at what I thought might produce some sort of
 result.

 I have a very simple list of integers that range between 1 and 24.  These
 correspond to hours of the day.

 I am trying to create a grouping of Day and Night with
 Day = 6 to 17.99
 Night = 1 to 5.59  and  18 to 24

 Using the cut() command I can create the segments, but I have not found a
 combine-type command to merge the two night segments.  No luck with
 if/else either.

 Any help would be greatly appreciated

 Thank you

 Will


 --
 View this message in context: 
 http://r.789695.n4.nabble.com/grouping-question-tp3019922p3019922.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping question

2010-10-29 Thread Jorge Ivan Velez
Hi Will,

One way would be:

 x
 [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
24
 factor(ifelse(x > 6 & x < 18, 'day', 'night'))
 [1] night night night night night night night day   day   day   day   day
day   day   day
[16] day   day   day   night night night night night night night
Levels: day night

HTH,
Jorge


On Fri, Oct 29, 2010 at 8:56 PM, will phillips  wrote:


 Hello

  I have what is probably a very simple grouping question; however, given my
 limited exposure to R, I have not found a solution yet despite my research
 efforts and wild attempts at what I thought might produce some sort of
 result.

 I have a very simple list of integers that range between 1 and 24.  These
 correspond to hours of the day.

 I am trying to create a grouping of Day and Night with
 Day = 6 to 17.99
 Night = 1 to 5.59  and  18 to 24

  Using the cut() command I can create the segments, but I have not found a
  combine-type command to merge the two night segments.  No luck with
 if/else either.

 Any help would be greatly appreciated

 Thank you

 Will


 --
 View this message in context:
 http://r.789695.n4.nabble.com/grouping-question-tp3019922p3019922.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping question

2010-10-29 Thread will phillips

Hello Jim

Wow.  I tried cut but I see you have an interim step with labels a,b,c and
then levels night and day.  I was really close to this: I had labels
night,day,night and it wouldn't let me duplicate labels.  I am very grateful
for your input.

Will
-- 
View this message in context: 
http://r.789695.n4.nabble.com/grouping-question-tp3019922p3019950.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grouping question

2010-10-29 Thread will phillips

Hello Jorge,

Thank you for the reply.  I tried a few different things with if/else but
couldn't get them to go.  I really appreciate your feedback.  I learned
something new from this

Will
-- 
View this message in context: 
http://r.789695.n4.nabble.com/grouping-question-tp3019922p3019952.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

