Re: [R] Dataframe Manipulation
Hi Hemant,

data_help <-
  data_help %>%
  # Add a dummy index for each purchase to keep track of the purchase,
  # since it would otherwise disappear later on. You could also use the row number.
  mutate(Purchase_ID = 1:n()) %>%
  # For each purchase ID
  group_by(Purchase_ID) %>%
  # Call the split_items function, which returns a data.frame
  do(split_items(.))

cat_help %>%
  # Make the data.frame long: the column names are gathered in a dummy
  # column (Foo) and the items (the content of each column) in another
  # column called Item
  gather("Foo", "Item") %>%
  filter(!is.na(Item)) %>%
  left_join(data_help, by = "Item") %>%
  group_by(Foo, Purchase_ID) %>%
  # Combine the items for each purchase and item type and make a wide data.frame
  summarise(Item = paste(Item, collapse = ", ")) %>%
  spread(key = "Foo", value = "Item")

I suggest that you read the book [R for Data Science](http://r4ds.had.co.nz/) by Garrett Grolemund and Hadley Wickham.

Best wishes,
Ulrik

On Mon, 4 Sep 2017, 09:31 Hemant Sain wrote:

> Hello Ulrik,
> Can you please explain what this code does and how it works? I am not
> able to understand it. If you can explain it, I can reuse it in future
> with a little modification.
>
> Thanks
>
> [...]
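The steps described in the comments can be tried on toy data. The following is a minimal sketch, not Ulrik's code: the two small tables are hypothetical stand-ins for the confidential CSV files, the newer tidyr verbs `separate_rows()`, `pivot_longer()` and `pivot_wider()` are swapped in for `do(split_items(.))`, `gather()` and `spread()`, and the join starts from the purchases rather than from the category table, so categories that were never purchased do not produce a column.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-ins for data_help.csv and cat_help.csv
data_help <- tibble(
  Items_purchased_on_Receipts = c("banana,milk,sugar", "Fries,Soap")
)
cat_help <- tibble(
  Fruits     = c("banana", "Mango"),
  Dairy      = c("milk", "sugar"),
  Snacks     = c("Fries", NA),
  Toiletries = c("Soap", NA)
)

# One row per purchased item, remembering which receipt it came from
purchases <- data_help %>%
  mutate(Purchase_ID = row_number()) %>%
  separate_rows(Items_purchased_on_Receipts, sep = ",") %>%
  rename(Item = Items_purchased_on_Receipts)

# One row per (Category, Item) pair, dropping the empty cells
categories <- cat_help %>%
  pivot_longer(everything(), names_to = "Category", values_to = "Item",
               values_drop_na = TRUE)

# Attach the category to every purchased item, collapse the items per
# purchase and category, then spread the categories into columns
result <- purchases %>%
  left_join(categories, by = "Item") %>%
  group_by(Purchase_ID, Category) %>%
  summarise(Item = paste(Item, collapse = ", "), .groups = "drop") %>%
  pivot_wider(names_from = Category, values_from = Item)

print(result)
```

Printing `purchases` and `categories` before the join shows the two long intermediate tables that the comments in the mail describe.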
Re: [R] Dataframe Manipulation
Hello Ulrik,
Can you please explain what this code does and how it works? I am not able to understand it. If you can explain it, I can reuse it in future with a little modification.

Thanks

data_help <-
  data_help %>%
  mutate(Purchase_ID = 1:n()) %>%
  group_by(Purchase_ID) %>%
  do(split_items(.))

cat_help %>%
  gather("Foo", "Item") %>%
  filter(!is.na(Item)) %>%
  left_join(data_help, by = "Item") %>%
  group_by(Foo, Purchase_ID) %>%
  summarise(Item = paste(Item, collapse = ", ")) %>%
  spread(key = "Foo", value = "Item")

On 31 August 2017 at 13:17, Ulrik Stervbo wrote:

> Hi Hemant,
>
> the solution is really quite similar, and the logic is identical:
>
> [...]
Re: [R] Dataframe Manipulation
Hi Hemant,

the solution is really quite similar, and the logic is identical:

library(readr)
library(dplyr)
library(stringr)
library(tidyr)

data_help <- read_csv("data_help.csv")
cat_help <- read_csv("cat_help.csv")

# Helper function to split the Items and create a data_frame
split_items <- function(items){
  x <- items$Items_purchased_on_Receipts %>%
    str_split(pattern = ",") %>%
    unlist(use.names = FALSE)

  data_frame(Item = x, Purchase_ID = items$Purchase_ID)
}

data_help <-
  data_help %>%
  mutate(Purchase_ID = 1:n()) %>%
  group_by(Purchase_ID) %>%
  do(split_items(.))

cat_help %>%
  gather("Foo", "Item") %>%
  filter(!is.na(Item)) %>%
  left_join(data_help, by = "Item") %>%
  group_by(Foo, Purchase_ID) %>%
  summarise(Item = paste(Item, collapse = ", ")) %>%
  spread(key = "Foo", value = "Item")

HTH
Ulrik

On Wed, 30 Aug 2017 at 13:22 Hemant Sain wrote:

> by using these two tables we have to create third table in this format
> where categories will be on the top and transaction will be in the rows,
>
> [...]
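One detail of `split_items` is easy to miss: splitting on a bare "," keeps any space that follows the comma, and an item with a leading space will not match its category in the later join. Whether the real receipts contain such spaces is unknown, since the data were not posted; the receipt string below is a hypothetical example, and the sketch uses base R `strsplit()` and `trimws()` rather than stringr.

```r
# Hypothetical receipt with inconsistent spacing after the commas
receipt <- "banana, milk,sugar"

# Splitting on "," alone keeps the leading space of " milk"
raw <- unlist(strsplit(receipt, ",", fixed = TRUE))

# trimws() removes leading and trailing whitespace from every item
clean <- trimws(raw)

print(raw)
print(clean)
```

If this applies to the real data, wrapping `x` in `trimws()` inside `split_items` would be one way to handle it.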
Re: [R] Dataframe Manipulation
Using these two tables, we have to create a third table in this format, where the categories are at the top and the transactions are in the rows.

On 30 August 2017 at 16:42, Hemant Sain wrote:

> Hello Ulrik,
> Can you please check this code once more on the following data set? It
> does not give me the same output as the previous demo data set, because
> the splitting is done on the basis of the quantity, and in the real data
> set the quantity is missing. Please use the following data set and help
> me out. Please consider this my final email; I won't bother you again,
> but this is about my job, so please help me.
>
> Note: the file I'm attaching is very confidential.
>
> On 30 August 2017 at 15:02, Ulrik Stervbo wrote:
>
>> Hi Hemant,
>>
>> Does this help you along?
>>
>> [...]
Re: [R] Dataframe Manipulation
Hi Hemant,

Does this help you along?

table_1 <- textConnection("Item_1;Item_2;Item_3
1KG banana;300ML milk;1kg sugar
2Large Corona_Beer;2pack Fries;
2 Lux_Soap;1kg sugar;")

table_1 <- read.csv(table_1, sep = ";", na.strings = "",
                    stringsAsFactors = FALSE, check.names = FALSE)

table_2 <- textConnection("Toiletries;Fruits;Beverages;Snacks;Vegetables;Clothings;Dairy Products
Soap;banana;Corona_Beer;King Burger;Pumpkin;Adidas Sport Tshirt XL;milk
Shampoo;Mango;Red Label Whisky;Fries;Potato;Nike Shorts Black L;Butter
Showergel;Oranges;grey Cocktail;cheese pizza;Tomato;Puma Jersy red M;sugar
Lux_Soap;;2 Large corona Beer;;Cheese;Toothpaste")

table_2 <- read.csv(table_2, sep = ";", na.strings = "",
                    stringsAsFactors = FALSE, check.names = FALSE)

library(tidyr)
library(dplyr)

table_2 <- gather(table_2, "Category", "Item")

table_1 <- gather(table_1, "Foo", "Item") %>%
  filter(!is.na(Item))

table_1 <- separate(table_1, col = "Item", into = c("Quantity", "Item"),
                    sep = " ")

table_3 <- left_join(table_1, table_2, by = "Item") %>%
  mutate(Item = paste(Quantity, Item)) %>%
  select(-Quantity)

table_3 %>%
  group_by(Foo, Category) %>%
  summarise(Item = paste(Item, collapse = ", ")) %>%
  spread(key = "Category", value = "Item")

You need to figure out how to handle words written with different cases and how to get the quantity in a universal way. For the code above, I corrected these things by hand in the example data.

HTH
Ulrik

On Wed, 30 Aug 2017 at 10:16 Hemant Sain wrote:

> Hey PIKAL,
> It's not a homework [...]
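The two open points in this reply, mixed letter case and quantities written in different ways, could be approached with a regular expression. The following is a rough sketch in base R under stated assumptions: the `items` vector is a hypothetical sample modelled on the demo data, and the pattern assumes a quantity is a run of digits, optionally followed directly by letters, and separated from the item name by whitespace. A quantity whose unit is written as a separate word (for example "1 kg sugar") would not be split correctly by it.

```r
# Hypothetical sample of item strings, with and without a quantity
items <- c("1KG banana", "300ML milk", "2Large Corona_Beer", "2 Lux_Soap", "Mango")

# Digits, optional unit letters, whitespace, then the item name
pattern <- "^([0-9]+[A-Za-z]*)[[:space:]]+(.*)$"
has_qty <- grepl(pattern, items)

parsed <- data.frame(
  # Quantity is NA when the string does not start with a number
  Quantity = ifelse(has_qty, sub(pattern, "\\1", items), NA_character_),
  Item     = ifelse(has_qty, sub(pattern, "\\2", items), items),
  stringsAsFactors = FALSE
)

# Lower-cased, trimmed key to join on, so "Mango" and "mango" match
parsed$Key <- tolower(trimws(parsed$Item))

print(parsed)
```

Joining on `Key` instead of `Item` keeps the original spelling available for the final table while making the match case-insensitive.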
Re: [R] Dataframe Manipulation
Hey PIKAL,
It's not homework, and that is not the real dataset: I have signed an NDA for my company, so I cannot share the original data file. I'm working on a market basket analysis task, but I'm not able to convert my existing data table to an appropriate format so that I can apply the Apriori algorithm using R. It is very important for me to get this done, because I'm an intern, and if I don't get it done they are not going to hire me as a full-time employee.
I tried everything by myself but was not able to get it done.
Your precious 10-15 minutes could save my coming years, so if you can, please help me through this.
I want another dataset based on the first two datasets I have mentioned.

Thanks

On 30 August 2017 at 12:49, PIKAL Petr wrote:

> Hi
>
> It seems to me like homework, there is no homework policy on this help
> list.
>
> [...]
Re: [R] Dataframe Manipulation
Hi

This looks like homework to me, and this help list has a no-homework policy.

What do you want to do with your table 3? It seems futile to me.

Anyway, some combination of melt, merge, cast and regular expressions could be employed for such a task, but it could be rather tricky.

But be aware that "Suger" does not match "sugar" (and I wonder why sugar is a dairy product), and that you mix uppercase and lowercase letters, which could also be problematic when matching words.

Cheers
Petr

> -----Original Message-----
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Hemant Sain
> Sent: Wednesday, August 30, 2017 8:28 AM
> To: r-help@r-project.org
> Subject: [R] Dataframe Manipulation
>
> I want to do a market basket analysis and I'm trying to create a dataset
> for that. I have two tables. The first table contains the daily
> transactions of products; each row shows the items purchased by a
> customer. The second table contains the parent groups under which those
> products fall; for example, under the fruit category there are several
> fruits like mango, banana, apple etc.
> I want to create a third table in which the parent groups, which can be
> extracted from Table 2, are the headers, and each row represents a
> transaction with the products listed by name. If there is no transaction
> for a parent category, the cell should be filled with NA. Please help me
> with R or C/C++ code (R would be preferred). I'm attaching all three
> tables for better reference: I have the first two tables and I want to
> get a table like Table 3.
>
> Tables are explained in the attached doc.
>
> --
> hemantsain.com

This e-mail and any documents attached to it may be confidential and are intended only for its intended recipients.
If you received this e-mail by mistake, please immediately inform its sender. Delete the contents of this e-mail with all attachments and its copies from your system.
If you are not the intended recipient of this e-mail, you are not authorized to use, disseminate, copy or disclose this e-mail in any manner.
The sender of this e-mail shall not be liable for any possible damage caused by modifications of the e-mail or by delay with transfer of the email.

In case that this e-mail forms part of business dealings:
- the sender reserves the right to end negotiations about entering into a contract in any time, for any reason, and without stating any reasoning.
- if the e-mail contains an offer, the recipient is entitled to immediately accept such offer; The sender of this e-mail (offer) excludes any acceptance of the offer on the part of the recipient containing any amendment or variation. - the sender insists on that the respective contract is concluded only upon an express mutual agreement on all its aspects. - the sender of this e-mail informs that he/she is not authorized to enter into any contracts on behalf of the company except for cases in which he/she is expressly authorized to do so in writing, and such authorization or power of attorney is submitted to the recipient or the person represented by the recipient, or the existence of such authorization is known to the recipient of the person represented by the recipient. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
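Petr's warning about spelling and case can be illustrated with a small base R sketch. The items and categories below are made up for illustration (they are not the poster's data), and the helper name norm is hypothetical. The point is only that matching on a normalised key (trimmed and lower-cased) handles case and spacing differences, while a misspelling such as "Suger" still fails to match and has to be corrected separately.

```r
# Hypothetical lookup table: item -> parent category (not the original data)
cat_lookup <- data.frame(
  Item     = c("Mango", "banana", "Milk", "Sugar"),
  Category = c("Fruit", "Fruit", "Dairy", "Grocery"),
  stringsAsFactors = FALSE
)

# Hypothetical purchased items with inconsistent case, spacing and spelling
purchased <- c("mango", " BANANA", "milk", "Suger")

# Normalise both sides before matching: trim whitespace, then lower-case
norm <- function(x) tolower(trimws(x))

idx <- match(norm(purchased), norm(cat_lookup$Item))
result <- data.frame(Item = purchased,
                     Category = cat_lookup$Category[idx],
                     stringsAsFactors = FALSE)
print(result)

# Items that still fail to match need manual correction
unmatched <- purchased[is.na(idx)]
print(unmatched)
```

Normalising once, up front, keeps any later merge or join step free of ad hoc case handling.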
Re: [R] dataframe manipulation
Perfect! Thanks a lot, A.K.!

On Fri, Dec 13, 2013 at 4:21 PM, arun smartpink...@yahoo.com wrote:

> Hi,
> Try:
>
>     d[match(unique(d$fac), d$fac), ]
>
> A.K.
>
> On Friday, December 13, 2013 4:17 PM, Gang Chen gangch...@gmail.com wrote:
>
>> Suppose I have a dataframe defined as
>>
>>     L3 <- LETTERS[1:3]
>>     (d <- data.frame(cbind(x = 1, y = 1:10), fac = sample(L3, 10, replace = TRUE)))
>>
>>         x  y fac
>>     1   1  1   C
>>     2   1  2   A
>>     3   1  3   B
>>     4   1  4   C
>>     5   1  5   B
>>     6   1  6   B
>>     7   1  7   A
>>     8   1  8   A
>>     9   1  9   B
>>     10  1 10   A
>>
>> I want to extract those rows that are the first occurrences for each level
>> of factor 'fac', which are basically the first three rows above. How can I
>> achieve that?
>>
>> The real dataframe is more complicated than the example above, and I can't
>> simply select the rows by exhaustively listing all the levels of factor
>> 'fac' like the following:
>>
>>     d[d$fac=='A' | d$fac=='B' | d$fac=='C', ]
>>
>> Thanks,
>> Gang
Re: [R] dataframe manipulation
What about:

    lapply(levels(d$fac), function(x) head(d[d$fac == x, ], 1))

Thanks for the reproducible example. If you put set.seed(123) before the call to sample, then everyone who tries it will get the same data frame d.

Sarah

On Fri, Dec 13, 2013 at 4:15 PM, Gang Chen gangch...@gmail.com wrote:

> Suppose I have a dataframe defined as
>
>     L3 <- LETTERS[1:3]
>     (d <- data.frame(cbind(x = 1, y = 1:10), fac = sample(L3, 10, replace = TRUE)))
>
>         x  y fac
>     1   1  1   C
>     2   1  2   A
>     3   1  3   B
>     4   1  4   C
>     5   1  5   B
>     6   1  6   B
>     7   1  7   A
>     8   1  8   A
>     9   1  9   B
>     10  1 10   A
>
> I want to extract those rows that are the first occurrences for each level
> of factor 'fac', which are basically the first three rows above. How can I
> achieve that?
>
> The real dataframe is more complicated than the example above, and I can't
> simply select the rows by exhaustively listing all the levels of factor
> 'fac' like the following:
>
>     d[d$fac=='A' | d$fac=='B' | d$fac=='C', ]
>
> Thanks,
> Gang

--
Sarah Goslee
http://www.functionaldiversity.org
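Sarah's lapply call returns a list of one-row data frames, one per factor level. A minimal sketch of how that list might be stacked into a single data frame follows. The explicit factor() call is an addition made here, on the assumption that the code may be run on R 4.0 or later, where data.frame() no longer converts strings to factors by default and levels(d$fac) would otherwise be NULL. The names first_rows and first_df are illustrative.

```r
set.seed(123)  # makes sample() reproducible, as suggested in the message
L3 <- LETTERS[1:3]
d <- data.frame(x = 1, y = 1:10,
                fac = factor(sample(L3, 10, replace = TRUE), levels = L3))

# One single-row data frame per level, in level order (A, B, C)
first_rows <- lapply(levels(d$fac), function(x) head(d[d$fac == x, ], 1))

# Stack the pieces into one data frame
first_df <- do.call(rbind, first_rows)
print(first_df)
```

Note that the rows come out in level order rather than in order of first appearance, which may or may not be what is wanted.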
Re: [R] dataframe manipulation
Another neat solution! Thanks a lot, Sarah!

On Fri, Dec 13, 2013 at 4:35 PM, Sarah Goslee sarah.gos...@gmail.com wrote:

> What about:
>
>     lapply(levels(d$fac), function(x) head(d[d$fac == x, ], 1))
>
> Thanks for the reproducible example. If you put set.seed(123) before the
> call to sample, then everyone who tries it will get the same data frame d.
>
> Sarah
Re: [R] dataframe manipulation
>     d[match(unique(d$fac), d$fac), ]

The following does the same thing a little more directly (and quickly):

    d[!duplicated(d$fac), ]

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Gang Chen
> Sent: Friday, December 13, 2013 1:35 PM
> To: arun
> Cc: R help
> Subject: Re: [R] dataframe manipulation
>
> Perfect! Thanks a lot, A.K.!
>
> On Fri, Dec 13, 2013 at 4:21 PM, arun smartpink...@yahoo.com wrote:
>
>> Hi,
>> Try:
>>
>>     d[match(unique(d$fac), d$fac), ]
>>
>> A.K.
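The two one-liners in this message can be compared side by side. The sketch below rebuilds a data frame like the one in the question (with set.seed added for reproducibility, and data.frame() called directly rather than through cbind, both of which are simplifications made here) and checks that the two approaches select the same rows.

```r
set.seed(123)
d <- data.frame(x = 1, y = 1:10,
                fac = sample(LETTERS[1:3], 10, replace = TRUE))

# First occurrence of each value, located via match()
by_match <- d[match(unique(d$fac), d$fac), ]

# First occurrence of each value, by dropping later duplicates
by_dup <- d[!duplicated(d$fac), ]

# Both keep the rows in order of first appearance
print(by_match)
print(all.equal(by_match, by_dup))
```

The duplicated() form scans the column once, whereas the match() form first builds the vector of unique values and then looks each one up.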
Re: [R] dataframe manipulation
Hi,
Try:

    d[match(unique(d$fac), d$fac), ]

A.K.

On Friday, December 13, 2013 4:17 PM, Gang Chen gangch...@gmail.com wrote:

> Suppose I have a dataframe defined as
>
>     L3 <- LETTERS[1:3]
>     (d <- data.frame(cbind(x = 1, y = 1:10), fac = sample(L3, 10, replace = TRUE)))
>
>         x  y fac
>     1   1  1   C
>     2   1  2   A
>     3   1  3   B
>     4   1  4   C
>     5   1  5   B
>     6   1  6   B
>     7   1  7   A
>     8   1  8   A
>     9   1  9   B
>     10  1 10   A
>
> I want to extract those rows that are the first occurrences for each level
> of factor 'fac', which are basically the first three rows above. How can I
> achieve that?
>
> The real dataframe is more complicated than the example above, and I can't
> simply select the rows by exhaustively listing all the levels of factor
> 'fac' like the following:
>
>     d[d$fac=='A' | d$fac=='B' | d$fac=='C', ]
>
> Thanks,
> Gang
Re: [R] Dataframe manipulation
Hi Adam,

I hope this is what you wanted:

    dat1 <- read.csv("example.csv", sep = "\t", stringsAsFactors = FALSE)
    str(dat1)
    #'data.frame': 102 obs. of 5 variables:
    # $ species  : chr "B. barbastrellus" "E. nilssonii" "H. savii" "M. alcathoe" ...
    # $ period   : chr "dusk" "dusk" "dusk" "dusk" ...
    # $ treatment: chr "control" "control" "control" "control" ...
    # $ no.files : int 16 1 9 13 1 49 6 3 4 0 ...
    # $ expected : logi NA NA NA NA NA NA ...

    # Fill 'expected' with the mean of no.files within each species/treatment group
    dat2 <- within(dat1, {expected <- ave(no.files, species, treatment, FUN = mean)})
    head(dat2)
    #           species period treatment no.files expected
    #1 B. barbastrellus   dusk   control       16    14.33
    #2     E. nilssonii   dusk   control        1     1.00
    #3         H. savii   dusk   control        9     4.67
    #4      M. alcathoe   dusk   control       13    13.33
    #5   M. bechsteinii   dusk   control        1     5.67
    #6      M. brandtii   dusk   control       49    63.00

A.K.

> From: adam bosworth english.fel...@hotmail.com
> To: smartpink...@yahoo.com
> Sent: Friday, March 29, 2013 2:29 PM
> Subject: RE: Dataframe manipulation
>
> Hey,
>
> Thanks for the response. I'm not sure if you messaged me on the forum or
> emailed me, but the only way I could get back to you was email, so I hope
> that's alright.
>
> I've attached part of the dataset as a csv file for you to look at. In cell
> E2 I've given an example of the output I'd like in the dataframe, obtained by
> summing the values '16', '25' and '2' from cells D2, D12 and D22 respectively
> and then dividing this by 3.
>
> Thanks for the help, much appreciated!
>
>> Date: Fri, 29 Mar 2013 10:24:31 -0700
>> From: smartpink...@yahoo.com
>> To: english.fel...@hotmail.com
>> Subject: Dataframe manipulation
>>
>> Hi,
>> Is it possible to post a small example dataset and also the output dataset
>> you wanted? That way, it will be much easier to understand what you meant.
>>
>>> englishfellow wrote:
>>> New to R and struggling with dataframe commands, any help will be much
>>> appreciated.
>>>
>>> I have an existing dataframe, call it 'df', with 4 columns, and I have
>>> added a 5th column which I need to fill. The conditions are as follows:
>>> for each row in column 5, I need it to look at column 1 and find all rows
>>> whose value equals the value in that row. I then need to look through
>>> those rows in column 3 and, again, find the rows where the value equals
>>> that of the row in question. After it has found these rows, I need it to
>>> look in column 4 for the values, add them up and divide the sum by 3.
>>>
>>> I tried to explain that as best I could, and I can go into more detail if
>>> it helps clarify what I'm after. But like I said, any help would be great.
>>> Cheers.
>>
>> Quoted from:
>> http://r.789695.n4.nabble.com/Dataframe-manipulation-tp4662844.html
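Since the attached csv is not part of the archive, the use of ave() here can be tried on a small stand-in. The data frame below is hypothetical (species names and counts are invented, apart from the 16, 25 and 2 mentioned in the question); it only mimics the column layout shown by str(dat1). ave() returns a vector as long as its input in which each element is replaced by the summary of its group, which is why it can fill a column directly.

```r
# Hypothetical miniature of the data: 2 species x 3 periods x 2 treatments
dat <- data.frame(
  species   = rep(c("sp1", "sp2"), times = 6),
  period    = rep(c("dusk", "night", "dawn"), each = 2, times = 2),
  treatment = rep(c("control", "lit"), each = 6),
  no.files  = c(16, 1, 25, 4, 2, 7, 3, 0, 6, 9, 12, 3),
  stringsAsFactors = FALSE
)

# Group mean of no.files for each species/treatment combination,
# repeated on every row belonging to that group
dat$expected <- ave(dat$no.files, dat$species, dat$treatment, FUN = mean)
print(dat)
```

The group mean equals "sum divided by 3" only when every species/treatment group has exactly three rows, as in this stand-in; with unequal group sizes the two would differ.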
Re: [R] Dataframe manipulation
Fantastic, thanks a lot for that! Take care.

Adam.

> Date: Sat, 30 Mar 2013 01:14:49 -0700
> From: ml-node+s789695n466289...@n4.nabble.com
> To: english.fel...@hotmail.com
> Subject: Re: Dataframe manipulation
>
> Hi Adam,
>
> I hope this is what you wanted:
>
>     dat1 <- read.csv("example.csv", sep = "\t", stringsAsFactors = FALSE)
>     str(dat1)
>     #'data.frame': 102 obs. of 5 variables:
>     # $ species  : chr "B. barbastrellus" "E. nilssonii" "H. savii" "M. alcathoe" ...
>     # $ period   : chr "dusk" "dusk" "dusk" "dusk" ...
>     # $ treatment: chr "control" "control" "control" "control" ...
>     # $ no.files : int 16 1 9 13 1 49 6 3 4 0 ...
>     # $ expected : logi NA NA NA NA NA NA ...
>
>     dat2 <- within(dat1, {expected <- ave(no.files, species, treatment, FUN = mean)})
>     head(dat2)
>     #           species period treatment no.files expected
>     #1 B. barbastrellus   dusk   control       16    14.33
>     #2     E. nilssonii   dusk   control        1     1.00
>     #3         H. savii   dusk   control        9     4.67
>     #4      M. alcathoe   dusk   control       13    13.33
>     #5   M. bechsteinii   dusk   control        1     5.67
>     #6      M. brandtii   dusk   control       49    63.00
>
> A.K.
Re: [R] Dataframe manipulation
Try this (also look at R-FAQ 7.10):

    sapply(df, function (x) as.numeric(levels(x))[as.integer(x)])

I hope it helps.

Best,
Dimitris

Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm

----- Original Message -----
From: Antje [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Tuesday, December 04, 2007 11:46 AM
Subject: [R] Dataframe manipulation

> Hello,
>
> can anybody help me with this problem? I have a dataframe whose values are
> stored as factors: they are numbers, but they were read in as factors with
> scan. Now I would like to convert these (multiple) columns to a numeric
> format.
>
>     # this example creates a similar situation
>     testdata <- as.factor(c(1.1,NA,2.3,5.5))
>     testdata2 <- as.factor(c(1.7,4.3,8.5,10.0))
>     df <- data.frame(testdata, testdata2)
>
> What do I have to do to get the same dataframe but with numeric values?
>
> Antje

Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
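The reason for the levels(x)[...] idiom is that calling as.numeric() directly on a factor returns the internal integer codes rather than the values the labels spell out. A short sketch on a factor like the poster's (the variable names f, codes and values are illustrative):

```r
f <- factor(c("1.1", NA, "2.3", "5.5"))

# Pitfall: this gives the integer codes of the levels
codes <- as.numeric(f)
print(codes)    # 1 NA 2 3

# Convert the (few) levels once, then index them by the codes
values <- as.numeric(levels(f))[as.integer(f)]
print(values)   # 1.1 NA 2.3 5.5

# Equivalent and easier to remember, but converts every element
values2 <- as.numeric(as.character(f))
print(identical(values, values2))
```

The mistake is easy to miss because the codes are themselves plausible-looking numbers.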
Re: [R] Dataframe manipulation
See R-FAQ # 7-11 for the solution.

Have a look at http://finzi.psych.upenn.edu/R/Rhelp02a/archive/98227.html for a discussion of this type of problem and ways to get around the issue.

--- Antje [EMAIL PROTECTED] wrote:

> Hello,
>
> can anybody help me with this problem? I have a dataframe whose values are
> stored as factors: they are numbers, but they were read in as factors with
> scan. Now I would like to convert these (multiple) columns to a numeric
> format.
>
>     # this example creates a similar situation
>     testdata <- as.factor(c(1.1,NA,2.3,5.5))
>     testdata2 <- as.factor(c(1.7,4.3,8.5,10.0))
>     df <- data.frame(testdata, testdata2)
>
> What do I have to do to get the same dataframe but with numeric values?
>
> Antje
Re: [R] Dataframe manipulation
My original reply was intended for the original version of 'df', in which both columns were factors. In your example you have added a numeric column, so it is not exactly the case I replied to. For your example you can use the following:

    testdata <- as.factor(c(1.1,NA,2.3,5.5))
    testdata2 <- as.factor(c(1.7,4.3,8.5,10.0))
    df <- data.frame(testdata, testdata2)
    df$testdata1 <- as.numeric(levels(df$testdata))[as.integer(df$testdata)]

    # convert only the columns that are factors, and assign the result back
    fcts <- sapply(df, is.factor)
    df[fcts] <- lapply(df[fcts], function (x) as.numeric(levels(x))[as.integer(x)])
    df
    str(df)

Best,
Dimitris

Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm

----- Original Message -----
From: David Winsemius [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Tuesday, December 04, 2007 4:47 PM
Subject: Re: [R] Dataframe manipulation

> Dimitris Rizopoulos [EMAIL PROTECTED] wrote in news:[EMAIL PROTECTED]:
>
>> try this (also look at R-FAQ 7.10):
>>
>>     sapply(df, function (x) as.numeric(levels(x))[as.integer(x)])
>
> That looks rather dangerous. By the time I saw your suggestion, I had
> already added an extra variable with:
>
>     df$testdata1 <- as.numeric(levels(df$testdata))[as.integer(df$testdata)]
>
> When I tried your suggestion I got no error, but there was also no effect.
> When I tried:
>
>     df2 <- sapply(df, function (x) as.numeric(levels(x))[as.integer(x)])
>
> I discovered that the numeric variable, testdata1, had been entirely
> converted to NAs, and str(df2) did not look data.frame-like.
>
>     is.data.frame(df2)
>     [1] FALSE
>
> --
> David Winsemius
>
>> ----- Original Message -----
>> From: Antje [EMAIL PROTECTED]
>> To: [EMAIL PROTECTED]
>> Sent: Tuesday, December 04, 2007 11:46 AM
>> Subject: [R] Dataframe manipulation
>>
>> Hello,
>>
>> can anybody help me with this problem? I have a dataframe whose values are
>> stored as factors: they are numbers, but they were read in as factors with
>> scan. Now I would like to convert these (multiple) columns to a numeric
>> format.
>>
>>     # this example creates a similar situation
>>     testdata <- as.factor(c(1.1,NA,2.3,5.5))
>>     testdata2 <- as.factor(c(1.7,4.3,8.5,10.0))
>>     df <- data.frame(testdata, testdata2)
>>
>> What do I have to do to get the same dataframe but with numeric values?
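The two behaviours discussed in this message (sapply returning a matrix, and numeric columns turning into NA) can both be avoided by converting only the factor columns and assigning back into the data frame. The sketch below wraps that idea in a helper. The function name factors_to_numeric and the column names a, b and c are hypothetical, chosen here for illustration, and the guard for "no factor columns" is an addition not present in the thread.

```r
# Convert every factor column of a data frame to numeric, leaving
# all other columns untouched (hypothetical helper)
factors_to_numeric <- function(df) {
  is_fct <- vapply(df, is.factor, logical(1))
  if (any(is_fct)) {
    df[is_fct] <- lapply(df[is_fct],
                         function(x) as.numeric(levels(x))[as.integer(x)])
  }
  df
}

df <- data.frame(a = factor(c("1.1", NA, "2.3", "5.5")),
                 b = factor(c("1.7", "4.3", "8.5", "10")),
                 c = c(10, 20, 30, 40))

out <- factors_to_numeric(df)
str(out)

# For comparison: sapply() simplifies its result to a matrix
m <- sapply(df[1:2], function(x) as.numeric(levels(x))[as.integer(x)])
print(is.data.frame(m))
print(is.matrix(m))
```

Assigning into df[is_fct] keeps the data frame class and column names, which is what makes the lapply form safer than the bare sapply call.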