[R] Counting non-empty levels of a factor
Hi everyone,

I have been struggling with a small problem for a while, and I'm wondering if anyone could help.

I have a dataset (from the retailing industry) that indicates which brands are present in a panel of 500 stores:

store , brand
1 , B1
1 , B2
1 , B3
2 , B1
2 , B3
3 , B2
3 , B3
3 , B4

I would like to know how many brands are present in each store. I tried:

    result <- aggregate(MyData$brand , by=list(MyData$store) , nlevels)

but I got:

Group.1 x
1 , 4
2 , 4
3 , 4

which is not the result I expected. I would like to get something like:

Group.1 x
1 , 3
2 , 2
3 , 3

Looking around, I found I can delete empty levels of a factor using:

    problem.factor <- problem.factor[,drop=TRUE]

But this solution isn't handy for me, as I have many stores and would have to make a subset of my data for each store before dropping the empty levels.

I can't simply count the lines for each store (N) either, because the same brand can appear several times in each store (several products for the same brand, and/or several weeks of observation).

I used to do this calculation in SAS with:

    proc freq data = MyData noprint ;
    by store ;
    tables brand / out = result ;
    run ;

(the cool thing was that I got a data set I could merge with MyData)

Any idea for doing that in R?

Thanks in advance,
Kind regards,

Sylvain Willart,
PhD Marketing,
IAE Lille, France

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Counting non-empty levels of a factor
On Nov 8, 2009, at 8:38 AM, sylvain willart wrote:

> I would like to know how many brands are present in each store. I tried:
>
>     result <- aggregate(MyData$brand , by=list(MyData$store) , nlevels)
>
> but I got:
>
> Group.1 x
> 1 , 4
> 2 , 4
> 3 , 4
>
> which is not the result I expected. I would like to get something like:
>
> Group.1 x
> 1 , 3
> 2 , 2
> 3 , 3

Try:

    result <- aggregate(MyData$brand , by=list(MyData$store) , length)

Quick, easy, and generalizes to other situations. The factor levels got carried along identically, but length counts the number of elements in the list returned by tapply.

> Looking around, I found I can delete empty levels of a factor using:
>
>     problem.factor <- problem.factor[,drop=TRUE]

If you reapply the function factor, you get the same result. So you could have done this:

    result <- aggregate(MyData$brand , by=list(MyData$store) , function(x) nlevels(factor(x)))
    result
      Group.1 x
    1       1 3
    2       2 2
    3       3 3

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
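For readers who want to try this, here is a minimal sketch that rebuilds the sample data from the table in the original post and applies the nlevels(factor(x)) idea. It assumes brand is stored as a factor; the column names store and n_brands on the result are chosen here for illustration and do not come from the thread.

```r
# Rebuild the sample data shown in the original post
MyData <- data.frame(
  store = c(1, 1, 1, 2, 2, 3, 3, 3),
  brand = factor(c("B1", "B2", "B3", "B1", "B3", "B2", "B3", "B4"))
)

# A subset of a factor keeps every level of the full factor,
# which is why plain nlevels() reports 4 for each store
levels_in_store2 <- nlevels(MyData$brand[MyData$store == 2])

# Re-applying factor() inside each group drops the unused levels first
result <- aggregate(MyData$brand, by = list(MyData$store),
                    FUN = function(x) nlevels(factor(x)))
names(result) <- c("store", "n_brands")  # illustrative names
result
```

On this sample the per-store counts should match the 3, 2, 3 the poster was hoping for.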
Re: [R] Counting non-empty levels of a factor
On Nov 8, 2009, at 9:11 AM, David Winsemius wrote:

> Try:
>
>     result <- aggregate(MyData$brand , by=list(MyData$store) , length)
>
> Quick, easy, and generalizes to other situations. The factor levels got carried along identically, but length counts the number of elements in the list returned by tapply.

That may not have been what you asked for, as the following demonstrates. You probably want the second solution:

    mydata2 <- rbind(MyData, MyData)
    result <- aggregate(mydata2$brand , by=list(mydata2$store) , length)
    result
      Group.1 x
    1       1 6
    2       2 4
    3       3 6

    result <- aggregate(mydata2$brand , by=list(mydata2$store) , function(x) nlevels(factor(x)))
    result
      Group.1 x
    1       1 3
    2       2 2
    3       3 3

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
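The difference between counting rows and counting distinct brands can be checked on a small reconstruction of the sample data. The sketch below uses tapply with length(unique(x)) as one alternative way to count distinct values; that variant is swapped in here for comparison and was not proposed in the thread. The names n_rows and n_brands are illustrative.

```r
# Sample data rebuilt from the original post
MyData <- data.frame(
  store = c(1, 1, 1, 2, 2, 3, 3, 3),
  brand = factor(c("B1", "B2", "B3", "B1", "B3", "B2", "B3", "B4"))
)

# Duplicate every row to mimic a brand observed several times in a store
mydata2 <- rbind(MyData, MyData)

# Rows per store: grows with repeated observations
n_rows <- tapply(mydata2$brand, mydata2$store, length)

# Distinct brands per store: unaffected by the repetition
n_brands <- tapply(mydata2$brand, mydata2$store,
                   function(x) length(unique(x)))

n_rows
n_brands
```

With repeated observations in the real panel (several products or weeks per brand), only the second count answers the original question.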
Re: [R] Counting non-empty levels of a factor
Thanks a lot for those solutions.

Both are working great, and they do slightly different (but both very interesting) things.

Moreover, I learned about the length() function ... one more to add to my personal cheat sheet.

Kind regards

2009/11/8 David Winsemius dwinsem...@comcast.net:

> That may not have been what you asked for, as the following demonstrates. You probably want the second solution:
>
>     mydata2 <- rbind(MyData, MyData)
>     result <- aggregate(mydata2$brand , by=list(mydata2$store) , length)
>     result <- aggregate(mydata2$brand , by=list(mydata2$store) , function(x) nlevels(factor(x)))
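The original question also mentioned that the SAS output could be merged back into MyData. One possible way to get the same effect in R is sketched below, under the assumption that a per-store count column attached to every row is what is wanted. The names counts, n_brands and merged are hypothetical, and the sample data is rebuilt from the post.

```r
# Sample data rebuilt from the original post
MyData <- data.frame(
  store = c(1, 1, 1, 2, 2, 3, 3, 3),
  brand = factor(c("B1", "B2", "B3", "B1", "B3", "B2", "B3", "B4"))
)

# One row per store with the number of distinct brands;
# naming the grouping list gives the key column the name "store"
counts <- aggregate(MyData["brand"], by = list(store = MyData$store),
                    FUN = function(x) nlevels(factor(x)))
names(counts)[2] <- "n_brands"  # illustrative name

# Attach the per-store count to every row of the original data
merged <- merge(MyData, counts, by = "store")
merged
```

Because aggregate returns an ordinary data frame, it plays the role of the out= data set and can be joined with merge on the shared store column.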
Re: [R] Counting non-empty levels of a factor
With xx as your sample data, will this work? See ?addmargins

    jj <- table(xx)
    addmargins(jj, 2)
    # or for both margins
    addmargins(jj, c(1,2))

or

    apply(jj, 1, sum)

--- On Sun, 11/8/09, sylvain willart sylvain.will...@gmail.com wrote:

> From: sylvain willart sylvain.will...@gmail.com
> Subject: [R] Counting non-empty levels of a factor
> To: r-help@r-project.org
> Received: Sunday, November 8, 2009, 8:38 AM
>
> I would like to know how many brands are present in each store
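A small sketch of the table-based approach, assuming xx is a two-column data frame of store and brand rebuilt from the original post. Note that the row sums of the table count observations per store; the rowSums(jj > 0) line is an addition made here (not part of the message above) to count brands that occur at least once.

```r
# xx is assumed to be the sample data from the original post
xx <- data.frame(
  store = c(1, 1, 1, 2, 2, 3, 3, 3),
  brand = factor(c("B1", "B2", "B3", "B1", "B3", "B2", "B3", "B4"))
)

jj <- table(xx)            # stores in rows, brands in columns

# Adds a "Sum" column holding the number of observations per store
with_margin <- addmargins(jj, 2)

# Brands with at least one observation in each store
n_brands <- rowSums(jj > 0)

with_margin
n_brands
```

On the sample data both give the same numbers because no brand is repeated within a store; with repeated observations only the rowSums(jj > 0) line would still count distinct brands.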