Re: [R] Dataframe Manipulation

2017-09-05 Thread Ulrik Stervbo
Hi Hemant,

data_help <- data_help %>%
  # Add a dummy index for each purchase to keep a record of the purchase,
  # since it will disappear later on. You could also use the row number
  mutate(Purchase_ID = 1:n()) %>%
  # For each purchase ID
  group_by(Purchase_ID) %>%
  # Call the split_items function, which returns a data.frame
  do(split_items(.))

cat_help %>%
  # Make the data.frame long: the column names are gathered in a dummy
  # column (Foo) and the items (the content of each column) in another
  # column called Item
  gather("Foo", "Item") %>%
  filter(!is.na(Item)) %>%
  left_join(data_help, by = "Item") %>%
  group_by(Foo, Purchase_ID) %>%
  # Combine the items for each purchase and item type and make a wide
  # data.frame
  summarise(Item = paste(Item, collapse = ", ")) %>%
  spread(key = "Foo", value = "Item")
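
To make the two steps concrete, here is a minimal sketch on toy data. The two small data frames are made-up stand-ins for data_help.csv and cat_help.csv, and the sketch swaps in newer tidyr functions (separate_rows, pivot_longer, pivot_wider) for do(split_items(.)), gather and spread. It also uses inner_join instead of left_join so that items nobody bought are dropped. Treat it as an illustration of the logic, not as the code above.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-ins for data_help.csv and cat_help.csv
data_help <- data.frame(
  Items_purchased_on_Receipts = c("banana,milk", "Soap,banana"),
  stringsAsFactors = FALSE
)
cat_help <- data.frame(
  Fruits     = c("banana", "Mango"),
  Dairy      = c("milk", NA),
  Toiletries = c("Soap", NA),
  stringsAsFactors = FALSE
)

# Step 1: one row per purchased item, remembering the purchase it came from
items_long <- data_help %>%
  mutate(Purchase_ID = row_number()) %>%
  separate_rows(Items_purchased_on_Receipts, sep = ",") %>%
  rename(Item = Items_purchased_on_Receipts)

# Step 2: make the category table long, attach the purchases, then
# collapse to one cell per purchase and category and make it wide again
result <- cat_help %>%
  pivot_longer(everything(), names_to = "Category", values_to = "Item") %>%
  filter(!is.na(Item)) %>%
  inner_join(items_long, by = "Item") %>%
  group_by(Purchase_ID, Category) %>%
  summarise(Item = paste(Item, collapse = ", "), .groups = "drop") %>%
  pivot_wider(names_from = Category, values_from = Item)

print(result)
```

Each row of the result is one purchase and each category is a column. A category with no purchased item is left as NA.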

I suggest that you read the book [R for Data Science](http://r4ds.had.co.nz/)
by Garrett Grolemund and Hadley Wickham

Best wishes,
Ulrik

On Mon, 4 Sep 2017, 09:31 Hemant Sain wrote:

> Hello Ulrik,
> Can you please explain what this code means and what it is doing? I'm not
> able to understand it. If you can explain it, I can reuse it in future
> with a little bit of modification.
>
> Thanks
>
>
> data_help <-
>   data_help %>%
>   mutate(Purchase_ID = 1:n()) %>%
>   group_by(Purchase_ID) %>%
> do(split_items(.))
>
> cat_help %>% gather("Foo", "Item") %>%
>   filter(!is.na(Item)) %>%
> left_join(data_help, by = "Item") %>%
>   group_by(Foo, Purchase_ID) %>%
>   summarise(Item = paste(Item, collapse = ", ")) %>%
>   spread(key = "Foo", value = "Item")
>
> On 31 August 2017 at 13:17, Ulrik Stervbo  wrote:
>
>> Hi Hemant,
>>
>> the solution is really quite similar, and the logic is identical:
>>
>> library(readr)
>> library(dplyr)
>> library(stringr)
>> library(tidyr)
>>
>> data_help <- read_csv("data_help.csv")
>> cat_help <- read_csv("cat_help.csv")
>>
>> # Helper function to split the Items and create a data_frame
>> split_items <- function(items){
>>   x <- items$Items_purchased_on_Receipts %>%
>> str_split(pattern = ",") %>%
>> unlist(use.names = FALSE)
>>
>>   data_frame(Item = x, Purchase_ID = items$Purchase_ID)
>> }
>>
>> data_help <-
>>   data_help %>%
>>   mutate(Purchase_ID = 1:n()) %>%
>>   group_by(Purchase_ID) %>%
>> do(split_items(.))
>>
>> cat_help %>% gather("Foo", "Item") %>%
>>   filter(!is.na(Item)) %>%
>> left_join(data_help, by = "Item") %>%
>>   group_by(Foo, Purchase_ID) %>%
>>   summarise(Item = paste(Item, collapse = ", ")) %>%
>>   spread(key = "Foo", value = "Item")
>>
>> HTH
>> Ulrik
>>
>> On Wed, 30 Aug 2017 at 13:22 Hemant Sain  wrote:
>>
>>> by using these two tables we have to create third table in this format
>>> where categories will be on the top and transaction will be in the rows,
>>>

Re: [R] Dataframe Manipulation

2017-09-04 Thread Hemant Sain
Hello Ulrik,
Can you please explain what this code means and what it is doing? I'm not
able to understand it. If you can explain it, I can reuse it in future
with a little bit of modification.

Thanks


data_help <-
  data_help %>%
  mutate(Purchase_ID = 1:n()) %>%
  group_by(Purchase_ID) %>%
do(split_items(.))

cat_help %>% gather("Foo", "Item") %>%
  filter(!is.na(Item)) %>%
left_join(data_help, by = "Item") %>%
  group_by(Foo, Purchase_ID) %>%
  summarise(Item = paste(Item, collapse = ", ")) %>%
  spread(key = "Foo", value = "Item")

On 31 August 2017 at 13:17, Ulrik Stervbo  wrote:

> Hi Hemant,
>
> the solution is really quite similar, and the logic is identical:
>
> library(readr)
> library(dplyr)
> library(stringr)
> library(tidyr)
>
> data_help <- read_csv("data_help.csv")
> cat_help <- read_csv("cat_help.csv")
>
> # Helper function to split the Items and create a data_frame
> split_items <- function(items){
>   x <- items$Items_purchased_on_Receipts %>%
> str_split(pattern = ",") %>%
> unlist(use.names = FALSE)
>
>   data_frame(Item = x, Purchase_ID = items$Purchase_ID)
> }
>
> data_help <-
>   data_help %>%
>   mutate(Purchase_ID = 1:n()) %>%
>   group_by(Purchase_ID) %>%
> do(split_items(.))
>
> cat_help %>% gather("Foo", "Item") %>%
>   filter(!is.na(Item)) %>%
> left_join(data_help, by = "Item") %>%
>   group_by(Foo, Purchase_ID) %>%
>   summarise(Item = paste(Item, collapse = ", ")) %>%
>   spread(key = "Foo", value = "Item")
>
> HTH
> Ulrik
>
> On Wed, 30 Aug 2017 at 13:22 Hemant Sain  wrote:
>
>> by using these two tables we have to create third table in this format
>> where categories will be on the top and transaction will be in the rows,
>>
>> On 30 August 2017 at 16:42, Hemant Sain  wrote:
>>
>>> Hello Ulrik,
>>> Can you please once check this code again on the following data set
>>> because it doesn't giving same output to me due to absence of quantity,a
>>> compare to previous demo data set becaue spiting is getting done on the
>>> basis of quantity and in real data set quantity is missing. so please use
>>> following data set and help me out please consider this mail is my final
>>> email i won't bother you again but its about my job please help me
>>> .
>>>
>>> Note* the file I'm attaching is very confidential
>>>

Re: [R] Dataframe Manipulation

2017-08-31 Thread Ulrik Stervbo
Hi Hemant,

the solution is really quite similar, and the logic is identical:

library(readr)
library(dplyr)
library(stringr)
library(tidyr)

data_help <- read_csv("data_help.csv")
cat_help <- read_csv("cat_help.csv")

# Helper function to split the Items and create a data_frame
split_items <- function(items){
  x <- items$Items_purchased_on_Receipts %>%
    str_split(pattern = ",") %>%
    unlist(use.names = FALSE)

  data_frame(Item = x, Purchase_ID = items$Purchase_ID)
}

data_help <-
  data_help %>%
  mutate(Purchase_ID = 1:n()) %>%
  group_by(Purchase_ID) %>%
  do(split_items(.))

cat_help %>% gather("Foo", "Item") %>%
  filter(!is.na(Item)) %>%
  left_join(data_help, by = "Item") %>%
  group_by(Foo, Purchase_ID) %>%
  summarise(Item = paste(Item, collapse = ", ")) %>%
  spread(key = "Foo", value = "Item")

HTH
Ulrik

On Wed, 30 Aug 2017 at 13:22 Hemant Sain  wrote:

> by using these two tables we have to create third table in this format
> where categories will be on the top and transaction will be in the rows,
>
> On 30 August 2017 at 16:42, Hemant Sain  wrote:
>
>> Hello Ulrik,
>> Can you please once check this code again on the following data set
>> because it doesn't giving same output to me due to absence of quantity,a
>> compare to previous demo data set becaue spiting is getting done on the
>> basis of quantity and in real data set quantity is missing. so please use
>> following data set and help me out please consider this mail is my final
>> email i won't bother you again but its about my job please help me
>> .
>>
>> Note* the file I'm attaching is very confidential
>>
>> On 30 August 2017 at 15:02, Ulrik Stervbo 
>> wrote:
>>
>>> Hi Hemant,
>>>
>>> Does this help you along?
>>>
>>> table_1 <- textConnection("Item_1;Item_2;Item_3
>>> 1KG banana;300ML milk;1kg sugar
>>> 2Large Corona_Beer;2pack Fries;
>>> 2 Lux_Soap;1kg sugar;")
>>>
>>> table_1 <- read.csv(table_1, sep = ";", na.strings = "",
>>> stringsAsFactors = FALSE, check.names = FALSE)
>>>
>>> table_2 <-
>>> textConnection("Toiletries;Fruits;Beverages;Snacks;Vegetables;Clothings;Dairy
>>> Products
>>> Soap;banana;Corona_Beer;King Burger;Pumpkin;Adidas Sport Tshirt XL;milk
>>> Shampoo;Mango;Red Label Whisky;Fries;Potato;Nike Shorts Black L;Butter
>>> Showergel;Oranges;grey Cocktail;cheese pizza;Tomato;Puma Jersy red
>>> M;sugar
>>> Lux_Soap;;2 Large corona Beer;;Cheese;Toothpaste")
>>>
>>> table_2 <- read.csv(table_2, sep = ";", na.strings = "",
>>> stringsAsFactors = FALSE, check.names = FALSE)
>>>
>>> library(tidyr)
>>> library(dplyr)
>>>
>>> table_2 <- gather(table_2, "Category", "Item")
>>>
>>> table_1 <- gather(table_1, "Foo", "Item") %>%
>>>   filter(!is.na(Item))
>>>
>>> table_1 <- separate(table_1, col = "Item", into = c("Quantity", "Item"),
>>> sep = " ")
>>>
>>> table_3 <- left_join(table_1, table_2, by = "Item") %>%
>>>   mutate(Item = paste(Quantity, Item)) %>%
>>>   select(-Quantity)
>>>
>>> table_3 %>%
>>>   group_by(Foo, Category) %>%
>>>   summarise(Item = paste(Item, collapse = ", ")) %>%
>>>   spread(key = "Category", value = "Item")
>>>
>>> You need to figure out how to handle words written with different cases
>>> and how to get the quantity in an universal way. For the code above, I
>>> corrected these things by hand in the example data.
>>>
>>> HTH
>>> Ulrik
>>>

Re: [R] Dataframe Manipulation

2017-08-30 Thread Hemant Sain
Using these two tables, we have to create a third table in this format,
where the categories are the column headers and each transaction is a row.

On 30 August 2017 at 16:42, Hemant Sain  wrote:

> Hello Ulrik,
> Could you please check this code once more on the following data set? It
> doesn't give me the same output because the quantity is absent. In the
> previous demo data set the splitting was done on the basis of quantity,
> but in the real data set the quantity is missing. So please use the
> following data set and help me out. Please consider this my final email;
> I won't bother you again, but it's about my job, so please help me.
>
> Note* the file I'm attaching is very confidential
>
> On 30 August 2017 at 15:02, Ulrik Stervbo  wrote:
>
>> Hi Hemant,
>>
>> Does this help you along?
>>
>> table_1 <- textConnection("Item_1;Item_2;Item_3
>> 1KG banana;300ML milk;1kg sugar
>> 2Large Corona_Beer;2pack Fries;
>> 2 Lux_Soap;1kg sugar;")
>>
>> table_1 <- read.csv(table_1, sep = ";", na.strings = "", stringsAsFactors
>> = FALSE, check.names = FALSE)
>>
>> table_2 <- 
>> textConnection("Toiletries;Fruits;Beverages;Snacks;Vegetables;Clothings;Dairy
>> Products
>> Soap;banana;Corona_Beer;King Burger;Pumpkin;Adidas Sport Tshirt XL;milk
>> Shampoo;Mango;Red Label Whisky;Fries;Potato;Nike Shorts Black L;Butter
>> Showergel;Oranges;grey Cocktail;cheese pizza;Tomato;Puma Jersy red M;sugar
>> Lux_Soap;;2 Large corona Beer;;Cheese;Toothpaste")
>>
>> table_2 <- read.csv(table_2, sep = ";", na.strings = "", stringsAsFactors
>> = FALSE, check.names = FALSE)
>>
>> library(tidyr)
>> library(dplyr)
>>
>> table_2 <- gather(table_2, "Category", "Item")
>>
>> table_1 <- gather(table_1, "Foo", "Item") %>%
>>   filter(!is.na(Item))
>>
>> table_1 <- separate(table_1, col = "Item", into = c("Quantity", "Item"),
>> sep = " ")
>>
>> table_3 <- left_join(table_1, table_2, by = "Item") %>%
>>   mutate(Item = paste(Quantity, Item)) %>%
>>   select(-Quantity)
>>
>> table_3 %>%
>>   group_by(Foo, Category) %>%
>>   summarise(Item = paste(Item, collapse = ", ")) %>%
>>   spread(key = "Category", value = "Item")
>>
>> You need to figure out how to handle words written with different cases
>> and how to get the quantity in an universal way. For the code above, I
>> corrected these things by hand in the example data.
>>
>> HTH
>> Ulrik
>>
>> On Wed, 30 Aug 2017 at 10:16 Hemant Sain  wrote:
>>
>>> Hey PIKAL,
>>> It's not a homework neithe that is the real dataset i have signer NDA for
>>> my company so that i can share the original data file, Actually I'm
>>> working
>>> on a market basket analysis task but not able to convert my existing data
>>> table to appropriate format so that i can apply Apriori algorithm using
>>> R,
>>> and this is very important me to get it done because I'm an intern and
>>> if i
>>> won't get it done they will not  going to hire me as a full-time
>>> employee.
>>> i tried everything by myself but not able to get it done.
>>> your precious 10-15 can save my upcoming years. so please if you can
>>> please
>>> help me through this.
>>> i want another dataset based on first two dataset i have mentioned .
>>>
>>> Thanks
>>>
>>> On 30 August 2017 at 12:49, PIKAL Petr  wrote:
>>>
>>> > Hi
>>> >
>>> > It seems to me like homework, there is no homework policy on this help
>>> > list.
>>> >
>>> > What do you want to do with your table 3? It seems to me futile.
>>> >
>>> > Anyway, some combination of melt, merge, cast and regular expressions
>>> > could be employed in such task, but it could be rather tricky.
>>> >
>>> > But be aware that
>>> >
>>> > Suger does not match sugar (I wonder that sugar is dairy product)
>>> >
>>> > and you mix uppercase and lowercase letters which could be also
>>> > problematic, when matching words.
>>> >
>>> > Cheers
>>> > Petr
>>> >
>>> > > -Original Message-
>>> > > From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of
>>> Hemant
>>> > Sain
>>> > > Sent: Wednesday, August 30, 2017 8:28 AM
>>> > > To: r-help@r-project.org
>>> > > Subject: [R] Dataframe Manipulation
>>> > >
>>> > > i want to do a market basket analysis and I’m trying to create a
>>> dataset
>>> > for that
>>> > > i have two tables, one table contains daily transaction of products
>>> in
>>> > which
>>> > > each row of table shows item purchased by the customer, The second
>>> table
>>> > > contains parent group under those products are fallen, for example
>>> under
>>> > fruit
>>> > > category there are several fruits like mango, banana, apple etc.
>>> > > i want to create a third table in which parent group are mentioned as
>>> > header
>>> > > which can be extracted from Table 2, and all the rows represent
>>> > transaction of
>>> > > products
>>> > >
>>> > > with their names, and if there is no transaction for any parent
>>> category
>>> > then
>>> > > the cell supposed to fill as NA. please help me 

Re: [R] Dataframe Manipulation

2017-08-30 Thread Ulrik Stervbo
Hi Hemant,

Does this help you along?

table_1 <- textConnection("Item_1;Item_2;Item_3
1KG banana;300ML milk;1kg sugar
2Large Corona_Beer;2pack Fries;
2 Lux_Soap;1kg sugar;")

table_1 <- read.csv(table_1, sep = ";", na.strings = "", stringsAsFactors =
FALSE, check.names = FALSE)

table_2 <- textConnection("Toiletries;Fruits;Beverages;Snacks;Vegetables;Clothings;Dairy Products
Soap;banana;Corona_Beer;King Burger;Pumpkin;Adidas Sport Tshirt XL;milk
Shampoo;Mango;Red Label Whisky;Fries;Potato;Nike Shorts Black L;Butter
Showergel;Oranges;grey Cocktail;cheese pizza;Tomato;Puma Jersy red M;sugar
Lux_Soap;;2 Large corona Beer;;Cheese;Toothpaste")

table_2 <- read.csv(table_2, sep = ";", na.strings = "", stringsAsFactors =
FALSE, check.names = FALSE)

library(tidyr)
library(dplyr)

table_2 <- gather(table_2, "Category", "Item")

table_1 <- gather(table_1, "Foo", "Item") %>%
  filter(!is.na(Item))

table_1 <- separate(table_1, col = "Item", into = c("Quantity", "Item"),
sep = " ")

table_3 <- left_join(table_1, table_2, by = "Item") %>%
  mutate(Item = paste(Quantity, Item)) %>%
  select(-Quantity)

table_3 %>%
  group_by(Foo, Category) %>%
  summarise(Item = paste(Item, collapse = ", ")) %>%
  spread(key = "Category", value = "Item")

You need to figure out how to handle words written with different cases and
how to get the quantity in a universal way. For the code above, I
corrected these things by hand in the example data.
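
One possible way to do both automatically is sketched below in base R. It assumes that a quantity, when present, is the first whitespace-separated token and starts with a digit; that rule and the raw vector are made up for illustration and will not fit every receipt format (a quantity such as "2 Large" is only partly captured).

```r
# Hypothetical raw items, mixing case and spacing as in the example data
raw <- c("1KG banana", "300ML milk", "2Large Corona_Beer", "2 Lux_Soap", "Fries")

# Assumed rule: a quantity is a leading token that starts with a digit,
# followed by at least one space and then the item name
pattern <- "^([0-9][^ ]*) +(.+)$"
has_qty <- grepl(pattern, raw)

# Split into quantity and item; entries without a quantity get NA
quantity <- ifelse(has_qty, sub(pattern, "\\1", raw), NA_character_)
item     <- ifelse(has_qty, sub(pattern, "\\2", raw), raw)

# Normalised key for joining: trimmed and lower-cased
item_key <- tolower(trimws(item))

parsed <- data.frame(raw, quantity, item_key, stringsAsFactors = FALSE)
print(parsed)
```

If the Item column of table_2 is lower-cased and trimmed in the same way, the join can be done on item_key, and the original spelling stays available in raw for display.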

HTH
Ulrik

On Wed, 30 Aug 2017 at 10:16 Hemant Sain  wrote:

> Hey PIKAL,
> It's not a homework neithe that is the real dataset i have signer NDA for
> my company so that i can share the original data file, Actually I'm working
> on a market basket analysis task but not able to convert my existing data
> table to appropriate format so that i can apply Apriori algorithm using R,
> and this is very important me to get it done because I'm an intern and if i
> won't get it done they will not  going to hire me as a full-time employee.
> i tried everything by myself but not able to get it done.
> your precious 10-15 can save my upcoming years. so please if you can please
> help me through this.
> i want another dataset based on first two dataset i have mentioned .
>
> Thanks
>
> On 30 August 2017 at 12:49, PIKAL Petr  wrote:
>
> > Hi
> >
> > It seems to me like homework, there is no homework policy on this help
> > list.
> >
> > What do you want to do with your table 3? It seems to me futile.
> >
> > Anyway, some combination of melt, merge, cast and regular expressions
> > could be employed in such task, but it could be rather tricky.
> >
> > But be aware that
> >
> > Suger does not match sugar (I wonder that sugar is dairy product)
> >
> > and you mix uppercase and lowercase letters which could be also
> > problematic, when matching words.
> >
> > Cheers
> > Petr
> >
> > > -Original Message-
> > > From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Hemant
> > Sain
> > > Sent: Wednesday, August 30, 2017 8:28 AM
> > > To: r-help@r-project.org
> > > Subject: [R] Dataframe Manipulation
> > >
> > > i want to do a market basket analysis and I’m trying to create a
> dataset
> > for that
> > > i have two tables, one table contains daily transaction of products in
> > which
> > > each row of table shows item purchased by the customer, The second
> table
> > > contains parent group under those products are fallen, for example
> under
> > fruit
> > > category there are several fruits like mango, banana, apple etc.
> > > i want to create a third table in which parent group are mentioned as
> > header
> > > which can be extracted from Table 2, and all the rows represent
> > transaction of
> > > products
> > >
> > > with their names, and if there is no transaction for any parent
> category
> > then
> > > the cell supposed to fill as NA. please help me with R or C/c++ code( R
> > would be
> > >
> > > preferred) here I’m attaching you all three tables for better reference
> > i have
> > > first two tables and i want to get a table like table 3
> > >
> > > Tables are explained in the attached doc.
> > >
> > > --
> > > hemantsain.com
> >

Re: [R] Dataframe Manipulation

2017-08-30 Thread Hemant Sain
Hey PIKAL,
It's not homework, and neither is that the real dataset: I have signed an NDA
with my company, so I can't share the original data file. I'm actually working
on a market basket analysis task, but I'm not able to convert my existing data
table to an appropriate format so that I can apply the Apriori algorithm using
R. It is very important for me to get this done, because I'm an intern and if I
don't get it done they are not going to hire me as a full-time employee.
I tried everything by myself but was not able to get it done.
Your precious 10-15 can save my upcoming years, so please, if you can, help me
through this.
I want another dataset based on the first two datasets I have mentioned.

Thanks

On 30 August 2017 at 12:49, PIKAL Petr  wrote:

> Hi
>
> It seems to me like homework, there is no homework policy on this help
> list.
>
> What do you want to do with your table 3? It seems to me futile.
>
> Anyway, some combination of melt, merge, cast and regular expressions
> could be employed in such task, but it could be rather tricky.
>
> But be aware that
>
> Suger does not match sugar (I wonder that sugar is dairy product)
>
> and you mix uppercase and lowercase letters which could be also
> problematic, when matching words.
>
> Cheers
> Petr
>
> > -Original Message-
> > From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Hemant
> Sain
> > Sent: Wednesday, August 30, 2017 8:28 AM
> > To: r-help@r-project.org
> > Subject: [R] Dataframe Manipulation
> >
> > i want to do a market basket analysis and I’m trying to create a dataset
> for that
> > i have two tables, one table contains daily transaction of products in
> which
> > each row of table shows item purchased by the customer, The second table
> > contains parent group under those products are fallen, for example under
> fruit
> > category there are several fruits like mango, banana, apple etc.
> > i want to create a third table in which parent group are mentioned as
> header
> > which can be extracted from Table 2, and all the rows represent
> transaction of
> > products
> >
> > with their names, and if there is no transaction for any parent category
> then
> > the cell supposed to fill as NA. please help me with R or C/c++ code( R
> would be
> >
> > preferred) here I’m attaching you all three tables for better reference
> i have
> > first two tables and i want to get a table like table 3
> >
> > Tables are explained in the attached doc.
> >
> > --
> > hemantsain.com
>

Re: [R] Dataframe Manipulation

2017-08-30 Thread PIKAL Petr
Hi

It seems to me like homework; there is a no-homework policy on this help list.

What do you want to do with your table 3? It seems futile to me.

Anyway, some combination of melt, merge, cast and regular expressions could be
employed in such a task, but it could be rather tricky.

But be aware that

Suger does not match sugar (and I wonder why sugar is a dairy product)

and that you mix uppercase and lowercase letters, which could also be
problematic when matching words.
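
Both problems can at least be detected mechanically. The following is a rough sketch using adist() from the utils package, which computes edit distances between strings. The two vectors and the cut-off of one edit are assumptions chosen for illustration; a real item list would need a cut-off picked by inspection.

```r
# Hypothetical purchased items and known category items
purchased <- c("Suger", "MILK", "banana")
known     <- c("sugar", "milk", "banana", "butter")

# Edit distance between every purchased item and every known item,
# ignoring upper and lower case
d <- adist(purchased, known, ignore.case = TRUE)

# For each purchased item, take the closest known item
best <- apply(d, 1, which.min)
matches <- data.frame(
  purchased = purchased,
  matched   = known[best],
  distance  = d[cbind(seq_along(best), best)],
  stringsAsFactors = FALSE
)

# Assumed cut-off: accept at most one edit, otherwise report no match
matches$matched[matches$distance > 1] <- NA
print(matches)
```

A distance of 0 means the words differ only in case; a distance of 1 flags a likely typo such as Suger, which is worth checking by hand before it is merged.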

Cheers
Petr

> -----Original Message-----
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Hemant Sain
> Sent: Wednesday, August 30, 2017 8:28 AM
> To: r-help@r-project.org
> Subject: [R] Dataframe Manipulation
>
> I want to do a market basket analysis and I'm trying to create a dataset for
> that. I have two tables. One table contains the daily transactions of
> products, in which each row shows the items purchased by a customer. The
> second table contains the parent groups under which those products fall; for
> example, under the fruit category there are several fruits like mango,
> banana, apple etc.
>
> I want to create a third table in which the parent groups, which can be
> extracted from Table 2, are the headers, and all the rows represent
> transactions of products with their names. If there is no transaction for a
> parent category, the cell should be filled with NA. Please help me with R or
> C/C++ code (R would be preferred).
>
> I'm attaching all three tables for better reference: I have the first two
> tables and I want to get a table like Table 3.
>
> Tables are explained in the attached doc.
>
> --
> hemantsain.com


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] dataframe manipulation

2013-12-13 Thread Gang Chen
Perfect! Thanks a lot, A.K!


On Fri, Dec 13, 2013 at 4:21 PM, arun smartpink...@yahoo.com wrote:



 Hi,
 Try:
  d[match(unique(d$fac),d$fac),]
 A.K.


 On Friday, December 13, 2013 4:17 PM, Gang Chen gangch...@gmail.com
 wrote:
 Suppose I have a dataframe defined as

  L3 <- LETTERS[1:3]
  (d <- data.frame(cbind(x = 1, y = 1:10), fac = sample(L3, 10, replace = TRUE)))

x  y fac
 1  1  1   C
 2  1  2   A
 3  1  3   B
 4  1  4   C
 5  1  5   B
 6  1  6   B
 7  1  7   A
 8  1  8   A
 9  1  9   B
 10 1 10   A

 I want to extract those rows that are the first occurrences for each level
 of factor 'fac', which are basically the first three rows above. How can I
 achieve that? The real dataframe is more complicated than the example
 above, and I can't simply enumerate all the levels of factor 'fac'
 by hand, as in the following:

 d[d$fac=='A' | d$fac=='B' | d$fac=='C', ]

 Thanks,
 Gang



Re: [R] dataframe manipulation

2013-12-13 Thread Sarah Goslee
What about:

lapply(levels(d$fac), function(x)head(d[d$fac == x,], 1))


Thanks for the reproducible example. If you put set.seed(123) before
the call to sample, then everyone who tries it will get the same data
frame d.
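
A minimal sketch of both suggestions together, for anyone following along. The seed value 123 is the one proposed above; the factor() wrapper and the names first_rows and first_df are additions for illustration. In R 4.0 and later, data.frame() no longer converts strings to factors by default, so without factor() the call to levels() would return NULL.

```r
set.seed(123)  # fix the random number generator so sample() is repeatable
L3 <- LETTERS[1:3]
d <- data.frame(x = 1, y = 1:10,
                fac = factor(sample(L3, 10, replace = TRUE)))

# One single-row data frame per factor level, in level order
first_rows <- lapply(levels(d$fac), function(x) head(d[d$fac == x, ], 1))

# Optionally stack the list back into one data frame
first_df <- do.call(rbind, first_rows)
first_df
```

Note that lapply() returns a list, and that the rows come back in the order of the factor levels rather than in the order of first appearance.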

Sarah



-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] dataframe manipulation

2013-12-13 Thread Gang Chen
Another neat solution! Thanks a lot, Sarah!




Re: [R] dataframe manipulation

2013-12-13 Thread William Dunlap
>   d[match(unique(d$fac),d$fac),]

The following does the same thing a little more directly (and quickly):
   d[ !duplicated(d$fac), ]
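
To check that the two expressions pick the same rows, here is a small sketch. The data frame is retyped from the printed example in the question (the original was generated with sample() and no seed, so it cannot be regenerated exactly); the names first_by_match and first_by_dup are only illustrative.

```r
# Data frame retyped from the printed example in the question
d <- data.frame(
  x   = 1,
  y   = 1:10,
  fac = factor(c("C", "A", "B", "C", "B", "B", "A", "A", "B", "A"))
)

# match() returns the position of the first occurrence of each unique value
first_by_match <- d[match(unique(d$fac), d$fac), ]

# duplicated() flags every repeat, so negating it keeps the first occurrences
first_by_dup <- d[!duplicated(d$fac), ]

print(first_by_dup)
all.equal(first_by_match, first_by_dup)
```

Both versions keep the rows in their original order of appearance, which is what the question asked for.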

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com




Re: [R] dataframe manipulation

2013-12-13 Thread arun


Hi,
Try:
 d[match(unique(d$fac),d$fac),]
A.K.




Re: [R] Dataframe manipulation

2013-03-30 Thread arun
Hi Adam,

I hope this is what you wanted:
dat1 <- read.csv("example.csv", sep="\t", stringsAsFactors=FALSE)
str(dat1)
#'data.frame':    102 obs. of  5 variables:
# $ species  : chr  "B. barbastrellus" "E. nilssonii" "H. savii" "M. alcathoe" ...
# $ period   : chr  "dusk" "dusk" "dusk" "dusk" ...
# $ treatment: chr  "control" "control" "control" "control" ...
# $ no.files : int  16 1 9 13 1 49 6 3 4 0 ...
# $ expected : logi  NA NA NA NA NA NA ...
dat2 <- within(dat1, {expected <- ave(no.files, species, treatment, FUN=mean)})

head(dat2)
#           species period treatment no.files expected
#1 B. barbastrellus   dusk   control       16    14.33
#2     E. nilssonii   dusk   control        1     1.00
#3         H. savii   dusk   control        9     4.67
#4      M. alcathoe   dusk   control       13    13.33
#5   M. bechsteinii   dusk   control        1     5.67
#6      M. brandtii   dusk   control       49    63.00
A.K.
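
For readers without the attached csv, the same idea can be sketched on a toy data frame. Everything below is hypothetical apart from the values 16, 25 and 2, which are the ones mentioned for cells D2, D12 and D22; the species labels and the second group are invented.

```r
# Hypothetical toy data; species labels and the second group are invented
dat <- data.frame(
  species   = c("sp1", "sp1", "sp1", "sp2", "sp2", "sp2"),
  period    = c("dusk", "night", "dawn", "dusk", "night", "dawn"),
  treatment = "control",
  no.files  = c(16, 25, 2, 1, 1, 1),
  stringsAsFactors = FALSE
)

# ave() returns a vector as long as no.files, holding each row's group mean,
# where a group is one combination of species and treatment
dat$expected <- ave(dat$no.files, dat$species, dat$treatment, FUN = mean)
dat
```

The group mean equals "sum divided by 3" only when every species and treatment combination has exactly three rows, which appears to be the case in the posted data.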







 From: adam bosworth english.fel...@hotmail.com
To: smartpink...@yahoo.com smartpink...@yahoo.com 
Sent: Friday, March 29, 2013 2:29 PM
Subject: RE: Dataframe manipulation
 

 
Hey,

Thanks for the response. I'm not sure if you messaged me on the forum or 
emailed me, but the only way I could get back to you was email, so I hope that's 
alright.

I've attached part of the dataset as a csv file for you to look at.

In cell E2 I've given an example of the output I'd like in the dataframe, obtained by summing 
the values '16', '25' and '2' from cells D2, D12 and D22 respectively and then 
dividing this by 3.

Thanks for the help, much appreciated!

 

 Date: Fri, 29 Mar 2013 10:24:31 -0700
 From: smartpink...@yahoo.com
 To: english.fel...@hotmail.com
 Subject: Dataframe manipulation
 
 Hi,
 Is it possible to post a small example dataset and also the output dataset
 you want? That way, it will be much easier to understand what you
 meant.
 <quote author='englishfellow'>
 New to R and struggling with dataframe commands, any help will be much
 appreciated.

 I have an existing dataframe, call it 'df', with 4 columns, and I have added a
 5th column which I need to fill, the conditions of which are as follows:

 For each row in column 5, I need it to look at column 1 and find all rows
 whose value equals the one in that row. I then need to look through
 column 3 of those rows and again find the rows where the value equals that of the
 row in question. Once it has found these rows, I need it to look in column 4
 for their values, add them up and divide the sum by 3.

 I tried to explain that as best I could, and I can go into more detail if it
 helps clarify what I'm after.

 But like I said, any help would be great.

 Cheers.

 </quote>
 Quoted from:
 http://r.789695.n4.nabble.com/Dataframe-manipulation-tp4662844.html



Re: [R] Dataframe manipulation

2013-03-30 Thread englishfellow
Fantastic, thanks a lot for that!

Take care.

Adam.


Re: [R] Dataframe manipulation

2007-12-04 Thread Dimitris Rizopoulos
try this (also look at R-FAQ 7.10):

sapply(df, function (x) as.numeric(levels(x))[as.integer(x)])
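
A short sketch of why the levels() indexing is needed, using the first column of the posted example. The variable name f is only illustrative.

```r
f <- factor(c(1.1, NA, 2.3, 5.5))

# as.numeric() on a factor gives the internal integer codes, not the values
codes <- as.numeric(f)

# Indexing the numeric levels by the codes recovers the original numbers
values <- as.numeric(levels(f))[as.integer(f)]

# Going through as.character() gives the same result
values2 <- as.numeric(as.character(f))

codes
values
```

Here codes holds the positions 1, 2 and 3 (with NA kept as NA), while values holds 1.1, 2.3 and 5.5.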


I hope it helps.

Best,
Dimitris


Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
 http://www.student.kuleuven.be/~m0390867/dimitris.htm


- Original Message - 
From: Antje [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Tuesday, December 04, 2007 11:46 AM
Subject: [R] Dataframe manipulation


 Hello,

 can anybody help me with this problem?
 I have a dataframe whose values are stored as factors. They are numbers,
 but they were read as factors with scan. Now I would like to convert
 these (multiple) columns to a numeric format.


 # this example creates a similar situation

 testdata <- as.factor(c(1.1,NA,2.3,5.5))
 testdata2 <- as.factor(c(1.7,4.3,8.5,10.0))

 df <- data.frame(testdata, testdata2)

 What do I have to do to get the same dataframe but with numeric
 values?

 Antje



Re: [R] Dataframe manipulation

2007-12-04 Thread John Kane
See  R-FAQ # 7-11 for the solution.


Have a look at
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/98227.html
for a discussion of this type of problem and ways to
get around the issue.




Re: [R] Dataframe manipulation

2007-12-04 Thread Dimitris Rizopoulos
My original reply was intended for the original version of 'df', in 
which both columns were factors. In your example you have added a 
numeric column, which is not the case I replied for. For your 
example you can use the following:

testdata <- as.factor(c(1.1,NA,2.3,5.5))
testdata2 <- as.factor(c(1.7,4.3,8.5,10.0))
df <- data.frame(testdata, testdata2)

df$testdata1 <- as.numeric(levels(df$testdata))[as.integer(df$testdata)]

fcts <- sapply(df, is.factor)
df[fcts] <- lapply(df[fcts], function (x) as.numeric(levels(x))[as.integer(x)])
df
str(df)
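
The difference between the two approaches can be seen side by side in the sketch below. The column already_numeric is a hypothetical stand-in for the extra numeric column, and to_num is just a named version of the anonymous function used above.

```r
df <- data.frame(
  testdata  = factor(c(1.1, NA, 2.3, 5.5)),
  testdata2 = factor(c(1.7, 4.3, 8.5, 10.0)),
  already_numeric = c(1, 2, 3, 4)   # hypothetical extra column
)

to_num <- function(x) as.numeric(levels(x))[as.integer(x)]

# sapply() simplifies its result to a matrix, and a non-factor column
# has no levels, so its entries all become NA
m <- sapply(df, to_num)
is.data.frame(m)
all(is.na(m[, "already_numeric"]))

# Converting only the factor columns in place keeps the data frame intact
fcts <- sapply(df, is.factor)
df[fcts] <- lapply(df[fcts], to_num)
str(df)
```

Assigning into df[fcts] replaces the selected columns one by one, so the object stays a data frame and the numeric column is left untouched.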


Best,
Dimitris


Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
 http://www.student.kuleuven.be/~m0390867/dimitris.htm


- Original Message - 
From: David Winsemius [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Tuesday, December 04, 2007 4:47 PM
Subject: Re: [R] Dataframe manipulation


 Dimitris Rizopoulos [EMAIL PROTECTED] wrote in
 news:[EMAIL PROTECTED]:

 try this (also look at R-FAQ 7.10):

 sapply(df, function (x) as.numeric(levels(x))[as.integer(x)])

 That looks rather dangerous. By the time I saw your suggestion, I had
 already added an extra variable with:

 df$testdata1 <- as.numeric(levels(df$testdata))[as.integer(df$testdata)]

 When I tried your suggestion I got no error, but there was also no
 effect. When I tried:

 df2 <- sapply(df, function (x) as.numeric(levels(x))[as.integer(x)])

 I discovered that the numeric variable, testdata1, had been entirely
 converted to NAs and str(df2) did not look data.frame-like.

 is.data.frame(df2)
 [1] FALSE

 -- 
 David Winsemius

