[R] Counting non-empty levels of a factor

2009-11-08 Thread sylvain willart
Hi everyone,

I've been struggling with a small problem for a while, and I'm wondering if
anyone could help...

I have a dataset (from the retail industry) that indicates which brands
are present in a panel of 500 stores:

store , brand
1 , B1
1 , B2
1 , B3
2 , B1
2 , B3
3 , B2
3 , B3
3 , B4

I would like to know how many brands are present in each store,

I tried:
result <- aggregate(MyData$brand , by=list(MyData$store) , nlevels)

but I got:
Group.1 x
1 , 4
2 , 4
3 , 4

which is not the result I expected.
I would like to get something like:
Group.1 x
1 , 3
2 , 2
3 , 3

Looking around, I found I can drop the empty levels of a factor using:
problem.factor <- problem.factor[,drop=TRUE]
But this solution isn't handy for me, as I have many stores and would have to
subset my data for each store before dropping the empty levels.

I also can't simply count the rows for each store (N), because the same
brand can appear several times in each store (several products for the
same brand, and/or several weeks of observation).

I used to do this calculation using SAS with:
proc freq data = MyData noprint ; by store ;
 tables  brand / out = result ;
run ;
(the nice thing was that I got a dataset I could merge with MyData)

Any idea how to do that in R?

Thanks in advance,

Kind regards,

Sylvain Willart,
PhD Marketing,
IAE Lille, France

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Counting non-empty levels of a factor

2009-11-08 Thread David Winsemius


On Nov 8, 2009, at 8:38 AM, sylvain willart wrote:

> I would like to know how many brands are present in each store,
>
> I tried:
> result <- aggregate(MyData$brand , by=list(MyData$store) , nlevels)
>
> but I got:
> Group.1 x
> 1 , 4
> 2 , 4
> 3 , 4
>
> I would like to get something like:
> Group.1 x
> 1 , 3
> 2 , 2
> 3 , 3

Try:

result <- aggregate(MyData$brand , by=list(MyData$store) , length)

Quick, easy and generalizes to other situations. The factor levels got  
carried along identically, but length counts the number of elements in  
the list returned by tapply.


> Looking around, I found I can delete empty levels of factor using:
> problem.factor <- problem.factor[,drop=TRUE]


Re-applying the function factor() gives the same result, so you
could have done this:

result <- aggregate(MyData$brand , by=list(MyData$store) ,
                    function(x) nlevels(factor(x)))

result
  Group.1 x
1       1 3
2       2 2
3       3 3
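
For anyone who wants to run this, here is a self-contained sketch. The data frame is retyped from the sample in the question, and it assumes brand is stored as a factor. The length(unique(x)) variant is an alternative that was not suggested above; it should give the same counts without going through factor levels.

```r
# Sample data retyped from the question (assumption: brand is a factor)
MyData <- data.frame(
  store = c(1, 1, 1, 2, 2, 3, 3, 3),
  brand = factor(c("B1", "B2", "B3", "B1", "B3", "B2", "B3", "B4"))
)

# Re-applying factor() inside each group drops the levels unused in that group
result <- aggregate(MyData$brand, by = list(MyData$store),
                    FUN = function(x) nlevels(factor(x)))

# Alternative (not from the thread): count the distinct values directly
result2 <- aggregate(MyData$brand, by = list(MyData$store),
                     FUN = function(x) length(unique(x)))

print(result)
print(result2)
```

Both versions ignore repeated rows for the same store and brand, which is what the question asks for.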






David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] Counting non-empty levels of a factor

2009-11-08 Thread David Winsemius


On Nov 8, 2009, at 9:11 AM, David Winsemius wrote:

> Try:
>
> result <- aggregate(MyData$brand , by=list(MyData$store) , length)
>
> Quick, easy and generalizes to other situations. The factor levels
> got carried along identically, but length counts the number of
> elements in the list returned by tapply.


Which may not have been what you asked for, as the following demonstrates.
You probably want the second solution:

mydata2 <- rbind(MyData, MyData)
result <- aggregate(mydata2$brand , by=list(mydata2$store) , length)
result
  Group.1 x
1       1 6
2       2 4
3       3 6

result <- aggregate(mydata2$brand , by=list(mydata2$store) ,
                    function(x) nlevels(factor(x)))

result
  Group.1 x
1       1 3
2       2 2
3       3 3
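
Since the question mentions merging the counts back onto MyData (as the SAS output allowed), here is a rough sketch of that step. The column name n_brands and the object names counts and merged are my own choices for illustration, and the data frame is retyped from the sample in the question.

```r
# Sample data retyped from the question, duplicated to mimic repeated rows
MyData <- data.frame(
  store = c(1, 1, 1, 2, 2, 3, 3, 3),
  brand = factor(c("B1", "B2", "B3", "B1", "B3", "B2", "B3", "B4"))
)
mydata2 <- rbind(MyData, MyData)

# Naming the list element makes the grouping column come out as "store"
counts <- aggregate(mydata2$brand, by = list(store = mydata2$store),
                    FUN = function(x) nlevels(factor(x)))
names(counts)[2] <- "n_brands"   # hypothetical column name

# Attach the per-store count to every row of the original data
merged <- merge(mydata2, counts, by = "store")
print(head(merged))
```

Each row of merged then carries the number of distinct brands for its store, regardless of how often a brand is repeated.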





David Winsemius, MD
Heritage Laboratories
West Hartford, CT



Re: [R] Counting non-empty levels of a factor

2009-11-08 Thread sylvain willart
Thanks a lot for those solutions.
Both work great, and they do slightly different (but both very
interesting) things.
Moreover, I learned about the length() function ... one more to add to
my personal cheat sheet.
Kind regards



Re: [R] Counting non-empty levels of a factor

2009-11-08 Thread John Kane
With xx as your sample data, will this work?  See ?addmargins

jj <- table(xx)

addmargins(jj, 2)

# or for both margins

addmargins(jj, c(1,2))

# or

apply(jj, 1, sum)
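
One caveat, sketched below under the assumption that xx is a two-column data frame of store and brand: the row margins count rows, so they only equal the number of brands when each store/brand pair appears once. Comparing the table cells to zero before summing counts distinct brands instead. The names xx2, row_totals and n_brands are made up for this illustration.

```r
# Sample data retyped from the question, duplicated to mimic repeated rows
xx <- data.frame(
  store = c(1, 1, 1, 2, 2, 3, 3, 3),
  brand = c("B1", "B2", "B3", "B1", "B3", "B2", "B3", "B4")
)
xx2 <- rbind(xx, xx)

jj <- table(xx2)                  # store x brand contingency table

row_totals <- apply(jj, 1, sum)   # rows per store (inflated by repeats)
n_brands   <- rowSums(jj > 0)     # brands with at least one row per store

print(row_totals)
print(n_brands)
```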

