Re: [R] Combining data.frames

2022-03-20 Thread Rui Barradas

Hello,

The two merges below give identical results.
Maybe there was something in your R session?


df3 <- merge(df1, df2, by = c("date", "geo_hash" ), all = TRUE)
df3b <- merge(df1, df2, all = TRUE)
identical(df3, df3b)
#[1] TRUE
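For reference, when by is omitted, merge() defaults to by = intersect(names(x), names(y)), i.e. every column name the two data frames share, so both calls end up using the same keys. A minimal sketch on a trimmed copy of the thread's test data (the trimming is mine, not a reproduction of Rui's session):

```r
# Trimmed copies of the thread's test data (a few rows only)
df1 <- data.frame(date     = c("2021-1-1", "2021-1-1", "2021-1-1", "2021-1-2"),
                  geo_hash = c("abc123", "abc123", "abc456", "asd123"),
                  ad_id    = c("a12345", "b12345", "a12345", "b12345"))
df2 <- data.frame(date     = c("2021-1-1", "2021-1-1", "2021-1-2"),
                  geo_hash = c("abc123", "abc456", "abc123"),
                  event    = c("shoting", "ied", "protest"))

# With 'by' omitted, merge() joins on the column names both data frames share
common <- intersect(names(df1), names(df2))
print(common)

explicit <- merge(df1, df2, by = c("date", "geo_hash"), all = TRUE)
implicit <- merge(df1, df2, all = TRUE)
print(identical(explicit, implicit))
```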

Hope this helps,

Rui Barradas

At 02:05 on 20/03/2022, Jeff Reichman wrote:

OK, this seems to work correctly:

df1 <- data.frame(date = as.factor(c("2021-1-1","2021-1-1","2021-1-1","2021-1-1","2021-1-1",
                                     "2021-1-2","2021-1-2","2021-1-3","2021-1-3","2021-1-3",
                                     "2021-1-4")),
                  geo_hash = as.factor(c("abc123","abc123","abc456","abc789","abc246","abc123",
                                         "asd123","abc789","abc890","abc123","z12345")),
                  ad_id = as.factor(c("a12345","b12345","a12345","a12345","c12345",
                                      "b12345","b12345","a12345","b12345","a12345","a12345")))
df2 <- data.frame(date = as.factor(c("2021-1-1","2021-1-1","2021-1-2","2021-1-3","2021-1-3","2021-1-4")),
                  geo_hash = as.factor(c("abc123","abc456","abc123","abc789","abc890","w12345")),
                  event = as.factor(c("shoting","ied","protest","riot","protest","killing")))

df1
df2

#df3 <- merge(df1, df2, all = TRUE)
df3 <- merge(df1, df2, by = c("date", "geo_hash" ), all = TRUE)
df3

-----Original Message-----
From: Jeff Newmiller 
Sent: Saturday, March 19, 2022 8:55 PM
To: reichm...@sbcglobal.net; Jeff Reichman ; 'Tom Woolman' 
Cc: r-help@r-project.org
Subject: Re: [R] Combining data.frames

by = c("date", "geo_hash" )

On March 19, 2022 6:31:19 PM PDT, Jeff Reichman  wrote:

Yes, I'm reading that presently.

The closest I've gotten has been

df3 <- merge(df1, df2, all = TRUE)

-----Original Message-----
From: Tom Woolman 
Sent: Saturday, March 19, 2022 8:27 PM
To: reichm...@sbcglobal.net
Cc: r-help@r-project.org
Subject: Re: [R] Combining data.frames

You can also do "SQL-like" joins in the tidyverse with dplyr.
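A minimal sketch of the dplyr route, assuming the dplyr package is installed (the requireNamespace() guard simply skips the block otherwise). full_join() keeps every row from both tables, like merge(all = TRUE); left_join() keeps every df1 row, like merge(all.x = TRUE). The data below are a small, hypothetical subset of the thread's example:

```r
df1 <- data.frame(date     = c("2021-1-1", "2021-1-1", "2021-1-2"),
                  geo_hash = c("abc123", "abc456", "asd123"),
                  ad_id    = c("a12345", "a12345", "b12345"))
df2 <- data.frame(date     = c("2021-1-1", "2021-1-2"),
                  geo_hash = c("abc123", "abc123"),
                  event    = c("shoting", "protest"))

if (requireNamespace("dplyr", quietly = TRUE)) {
  # Every row from both tables (SQL FULL OUTER JOIN)
  full <- dplyr::full_join(df1, df2, by = c("date", "geo_hash"))
  # Every df1 row, with event filled in where the keys match (SQL LEFT JOIN)
  left <- dplyr::left_join(df1, df2, by = c("date", "geo_hash"))
  print(full)
  print(left)
}
```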


On 2022-03-19 21:23, Jeff Reichman wrote:

Evening Tom

Yes, I've been playing with the merge function, but haven't been
able to achieve what I need. It may be the way to go, and the problem
might be my syntax.

-----Original Message-----
From: Tom Woolman 
Sent: Saturday, March 19, 2022 8:20 PM
To: reichm...@sbcglobal.net
Cc: r-help@r-project.org
Subject: Re: [R] Combining data.frames

Have you looked at the merge function in base R?

https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/merge


On 2022-03-19 21:15, Jeff Reichman wrote:

R-Help Community

I'm trying to combine two data.frames, each containing 10 columns, that
share two common fields. Here are two small test datasets.

df1 <- data.frame(date = c("2021-1-1","2021-1-1","2021-1-1","2021-1-1","2021-1-1",
                           "2021-1-2","2021-1-2","2021-1-3","2021-1-3","2021-1-3"),
                  geo_hash = c("abc123","abc123","abc456","abc789","abc246","abc123",
                               "asd123","abc789","abc890","abc123"),
                  ad_id = c("a12345","b12345","a12345","a12345","c12345",
                            "b12345","b12345","a12345","b12345","a12345"))
df2 <- data.frame(date = c("2021-1-1","2021-1-1","2021-1-2","2021-1-3","2021-1-3"),
                  geo_hash = c("abc123","abc456","abc123","abc789","abc890"),
                  event = c("shoting","ied","protest","riot","protest"))

I'm trying to combine them so that I get a combined data.frame
such as

date      geo_hash  ad_id   event
1/1/2021  abc123    a12345  shoting
1/1/2021  abc123    b12345
1/1/2021  abc456    a12345  ied
1/1/2021  abc789    a12345
1/1/2021  abc246    c12345

Jeff

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Sent from my phone. Please excuse my brevity.



Re: [R] Combining data.frames

2022-03-19 Thread Bert Gunter
Merging by the common keys/column names is the default. The question is likely
what to do with rows that don't match. That's determined by the 'all'
settings, which the OP may already have figured out.
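To make the 'all' settings concrete, here is a sketch on the extended test data Jeff posted later in the thread (character columns instead of factors). The default inner join keeps only matching keys; all.x = TRUE keeps every df1 row, which appears to be the shape of the table Jeff asked for; all = TRUE additionally keeps df2 rows with no partner in df1, such as w12345:

```r
# Extended test data from Jeff's later message, as character columns
df1 <- data.frame(
  date     = c("2021-1-1", "2021-1-1", "2021-1-1", "2021-1-1", "2021-1-1",
               "2021-1-2", "2021-1-2", "2021-1-3", "2021-1-3", "2021-1-3",
               "2021-1-4"),
  geo_hash = c("abc123", "abc123", "abc456", "abc789", "abc246", "abc123",
               "asd123", "abc789", "abc890", "abc123", "z12345"),
  ad_id    = c("a12345", "b12345", "a12345", "a12345", "c12345", "b12345",
               "b12345", "a12345", "b12345", "a12345", "a12345"))
df2 <- data.frame(
  date     = c("2021-1-1", "2021-1-1", "2021-1-2", "2021-1-3", "2021-1-3",
               "2021-1-4"),
  geo_hash = c("abc123", "abc456", "abc123", "abc789", "abc890", "w12345"),
  event    = c("shoting", "ied", "protest", "riot", "protest", "killing"))

inner <- merge(df1, df2)                # default all = FALSE: matched keys only
left  <- merge(df1, df2, all.x = TRUE)  # every df1 row; event is NA if unmatched
outer <- merge(df1, df2, all = TRUE)    # every row of both; NA fills the gaps

# inner 6, left 11, outer 12
print(sapply(list(inner = inner, left = left, outer = outer), nrow))
# The key only df2 has (w12345) appears in 'outer' with no ad_id
print(outer[outer$geo_hash == "w12345", ])
```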

On Sat, Mar 19, 2022, 7:16 PM Tom Woolman  wrote:

> I'm trying hard to take tonight off and avoid booting up the laptop and
> launching R... :) But you need to merge by the primary key(s), i.e.
> the common columns (common IVs) shared between the two data frames.
>
>
> On 2022-03-19 21:38, Jeff Reichman wrote:
> > Tom
> >
> > Looks like I figured it out. Syntax issue - wrong "all" argument  (I
> > think)
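On the primary-key point: merge() does not require the key columns to be unique, but if a (date, geo_hash) pair repeats in df2, each matching df1 row comes back once per repeat. A minimal sketch of checking for that with anyDuplicated(); the extra 'riot' row is hypothetical, added only to show the effect:

```r
keys <- c("date", "geo_hash")
df1 <- data.frame(date = "2021-1-1", geo_hash = "abc123", ad_id = "a12345")
df2 <- data.frame(date     = c("2021-1-1", "2021-1-1", "2021-1-2"),
                  geo_hash = c("abc123", "abc456", "abc123"),
                  event    = c("shoting", "ied", "protest"))

# 0 means no (date, geo_hash) pair repeats in df2, so the keys act as a primary key
dup_ok <- anyDuplicated(df2[keys])

# Hypothetical second event for the same date and cell breaks that property
df2_dup <- rbind(df2, data.frame(date = "2021-1-1", geo_hash = "abc123",
                                 event = "riot"))
dup_bad <- anyDuplicated(df2_dup[keys])   # index of the first repeated row

# The single df1 row now comes back twice, once per matching event
n_rows <- nrow(merge(df1, df2_dup, by = keys))
print(c(dup_ok = dup_ok, dup_bad = dup_bad, n_rows = n_rows))
```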


Re: [R] Combining data.frames

2022-03-19 Thread Jeff Newmiller
Then show your code so we can focus on what you haven't yet figured out. Have 
you read the examples in the merge help page?



Re: [R] Combining data.frames

2022-03-19 Thread Jeff Reichman
Jeff

This seems to work:

df3 <- merge(df1, df2, all = TRUE)

When I use any of the by.x, by.y or all.x, all.y arguments, I get really
weird results. Simply using the code above appears to work so far.
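On the arguments mentioned here: by.x and by.y are only needed when the key columns are named differently in the two data frames (they are paired by position), while all.x and all.y decide which unmatched rows are kept. A sketch in which the column names day and cell are hypothetical stand-ins for df2's keys:

```r
df1 <- data.frame(date     = c("2021-1-1", "2021-1-1", "2021-1-2"),
                  geo_hash = c("abc123", "abc456", "asd123"),
                  ad_id    = c("a12345", "a12345", "b12345"))
# Hypothetical: the same keys stored under different column names
df2 <- data.frame(day   = c("2021-1-1", "2021-1-2"),
                  cell  = c("abc123", "abc123"),
                  event = c("shoting", "protest"))

# Pair the keys by position: date <-> day, geo_hash <-> cell
res <- merge(df1, df2,
             by.x = c("date", "geo_hash"), by.y = c("day", "cell"),
             all.x = TRUE)   # keep every df1 row, matched or not
print(res)
```

The key columns in the result take their names from by.x, so res has the columns date, geo_hash, ad_id and event.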

-----Original Message-----
From: Jeff Newmiller  
Sent: Saturday, March 19, 2022 8:51 PM
To: reichm...@sbcglobal.net; Jeff Reichman ; 'Tom Woolman' 
Cc: r-help@r-project.org
Subject: Re: [R] Combining data.frames

Then show your code so we can focus on what you haven't yet figured out. Have
you read the examples in the merge help page?



Re: [R] Combining data.frames

2022-03-19 Thread Jeff Reichman
Yes, I'm reading that presently.

The closest I've gotten has been

df3 <- merge(df1, df2, all = TRUE)

-----Original Message-----
From: Tom Woolman  
Sent: Saturday, March 19, 2022 8:27 PM
To: reichm...@sbcglobal.net
Cc: r-help@r-project.org
Subject: Re: [R] Combining data.frames

You can also do "SQL-like" joins in the tidyverse with dplyr.




Re: [R] Combining data.frames

2022-03-19 Thread Tom Woolman

Have you looked at the merge function in base R?

https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/merge




Re: [R] combining data.frames with is.na & match (), two questions

2019-04-23 Thread PIKAL Petr
Hi

Please keep r-help in copy; others could give you different or better solutions.

Regarding ordering, see ?order or ?sort. However, reordering is mainly needed
only for plotting or exporting data.
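A minimal sketch of the reordering Drake asks about, using order() on a hand-typed copy of the merged Calories column (assumed, not taken from a live session): decreasing = TRUE puts 200 first, the two 100-calorie rows end up adjacent, and na.last = TRUE (the default) moves the NA rows to the bottom.

```r
# Hand-typed copy of the Fruit/Calories columns after merge(fr2, fr1, all = TRUE)
merged <- data.frame(Fruit    = c("apple", "banana", "kiwi", "orange", "pear", "mango"),
                     Calories = c(NA, 100, NA, NA, 100, 200))

# Largest first; na.last = TRUE sends rows with NA calories to the bottom
by_cal <- merged[order(merged$Calories, decreasing = TRUE, na.last = TRUE), ]
print(by_cal)
```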

Cheers
Petr

From: Drake Gossi 
Sent: Thursday, April 18, 2019 9:27 PM
To: PIKAL Petr 
Subject: Re: [R] combining data.frames with is.na & match (), two questions

Thanks Pikal,

Your answer was super helpful. I just learned a lot from you. The only thing I
have to figure out now is how to rearrange the numbers, say, so that 200 is on
top and NA is at the bottom, or so that the two 100-calorie rows are together.
Something like that. Perhaps I'll try an ascending/descending function.

Thank you again.

D


Re: [R] combining data.frames with is.na & match (), two questions

2019-04-18 Thread Eric Berger
Hi Drake,
Petr's suggestion to use the merge() function is good.
Another (possibly overkill) approach is to use functions from the dplyr
package, which is a fantastic package to get familiar with.
For example, the last alternative that Petr suggests is an example of what
is called a "left join" (meaning: when joining structures x and y, keep
all the x rows, even if there is no corresponding row in y).
You can do this via dplyr as follows:

dplyr::left_join( fr2, fr1, by="Fruit")
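For comparison, the same join vocabulary maps onto base merge() arguments. A sketch on an abbreviated copy of Petr's fruit data (Color and Shape dropped for brevity): no 'all' argument gives an inner join, all.x = TRUE a left join, all.y = TRUE a right join and all = TRUE a full outer join.

```r
fr1 <- data.frame(Fruit    = c("banana", "pear", "mango"),
                  Calories = c(100, 100, 200))
fr2 <- data.frame(Fruit = c("apple", "banana", "pear", "orange", "kiwi"),
                  Juice = c(1, 0, 0.5, 1, 0))

inner <- merge(fr2, fr1)                # inner join: banana, pear
left  <- merge(fr2, fr1, all.x = TRUE)  # left join: all five fr2 fruits
right <- merge(fr2, fr1, all.y = TRUE)  # right join: all three fr1 fruits
full  <- merge(fr2, fr1, all = TRUE)    # full outer join: all six fruits
print(sapply(list(inner = inner, left = left, right = right, full = full), nrow))
```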

HTH,
Eric



Re: [R] combining data.frames with is.na & match (), two questions

2019-04-18 Thread PIKAL Petr
Hi

I wonder why such a combination is so complicated in your textbook.

Given data frames fr1 and fr2

> dput(fr1)
structure(list(Fruit = structure(c(1L, 3L, 2L), .Label = c("banana",
"mango", "pear"), class = "factor"), Calories = c(100L, 100L,
200L)), class = "data.frame", row.names = c("1", "2", "3"))
> dput(fr2)
structure(list(Fruit = structure(c(1L, 2L, 5L, 4L, 3L), .Label = c("apple",
"banana", "kiwi", "orange", "pear"), class = "factor"), Color = structure(c(3L,
4L, 1L, 2L, 1L), .Label = c("green", "orange", "red", "yellow"
), class = "factor"), Shape = structure(c(3L, 1L, 2L, 3L, 3L), .Label = 
c("oblong",
"pear", "round"), class = "factor"), Juice = c(1, 0, 0.5, 1,
0)), class = "data.frame", row.names = c("1", "2", "3", "4",
"5"))
>

> fr1
   Fruit Calories
1 banana  100
2   pear  100
3  mango  200
>

You can use merge to combine those two data frames to get either all values
from both

> merge(fr2, fr1, all=T)
   Fruit  Color  Shape Juice Calories
1  apple    red  round   1.0       NA
2 banana yellow oblong   0.0      100
3   kiwi  green  round   0.0       NA
4 orange orange  round   1.0       NA
5   pear  green   pear   0.5      100
6  mango   <NA>   <NA>    NA      200

just the rows from the data frame with calories

> merge(fr2, fr1, all.y=T)
   Fruit  Color  Shape Juice Calories
1 banana yellow oblong   0.0      100
2   pear  green   pear   0.5      100
3  mango   <NA>   <NA>    NA      200

or just the rows from the data frame with colours

> merge(fr2, fr1, all.x=T)
   Fruit  Color  Shape Juice Calories
1  apple    red  round   1.0       NA
2 banana yellow oblong   0.0      100
3   kiwi  green  round   0.0       NA
4 orange orange  round   1.0       NA
5   pear  green   pear   0.5      100

Cheers
Petr


> -Original Message-
> From: R-help  On Behalf Of Drake Gossi
> Sent: Thursday, April 18, 2019 1:24 AM
> To: r-help@r-project.org
> Subject: [R] combining data.frames with is.na & match (), two questions
>
> Hello everyone,
>
> I'm working through this book, *Humanities Data in R* (Arnold & Tilton), and
> I'm just having trouble understanding this maneuver.
>
> In sum, I'm trying to combine data in two different data.frames.
>
> This data.frame is called fruitNutr
>
>    Fruit Calories
> 1 banana      100
> 2   pear      100
> 3  mango      200
>
> And this data.frame is called fruitData
>
>    Fruit  Color  Shape Juice
> 1  apple    red  round     1
> 2 banana yellow oblong     0
> 3   pear  green   pear   0.5
> 4 orange orange  round     1
> 5   kiwi  green  round     0
>
> So, as you can see, these two data.frames overlap insofar as they both have
> banana and pear. What happens next is that the book suggests this:
>
> fruitData$Calories <- NA
>
>
> As a result, I've created a new column for the fruitData data.frame:
>
>    Fruit  Color  Shape Juice Calories
> 1  apple    red  round     1       NA
> 2 banana yellow oblong     0       NA
> 3   pear  green   pear   0.5       NA
> 4 orange orange  round     1       NA
> 5   kiwi  green  round     0       NA
>
> Then:
>
> > index <- match(x = fruitData$Fruit, table = fruitNutr$Fruit)
> > index
> [1] NA  1  2 NA NA
> > is.na(index)
> [1]  TRUE FALSE FALSE  TRUE  TRUE
> > fruitData$Calories[!is.na(index)] <- fruitNutr$Calories[index[!is.na(index)]]
> > fruitData
>
>    Fruit  Color  Shape Juice Calories
> 1  apple    red  round     1       NA
> 2 banana yellow oblong     0      100
> 3   pear  green   pear   0.5      100
> 4 orange orange  round     1       NA
> 5   kiwi  green  round     0       NA
>
> I get what the first part means, that first part being
> fruitData$Calories[!is.na(index)]: go into the fruitData data.frame,
> specifically into the Calories column, and only for the rows where
> !is.na(index) is TRUE. But I just literally can't understand this last
> part: fruitNutr$Calories[index[!is.na(index)]]
>
> Two questions.
>
>   1. I just literally don't understand how this code works. It does work,
>      of course, but I don't know what it's doing, specifically the
>      [index[!is.na(index)]] part. Could someone explain it to me like I'm
>      five? I'm new at this...
>   2. And then: is there any other way to combine these two data.frames so
>      that we get this same result? Maybe an easier-to-understand method?
>
> That same result, again, is
>
>    Fruit  Color  Shape Juice Calories
> 1  apple    red  round     1       NA
> 2 banana yellow oblong     0      100
> 3   pear  green   pear   0.5      100
> 4 orange orange  round     1       NA
> 5   kiwi  green  round     0       NA
>
>
> Drake
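Regarding Drake's first question, the book's one-liner can be split into named steps. A sketch in which the intermediate names found, rows and vals are hypothetical, introduced only for illustration; it ends with a check that merge(all.x = TRUE) fills in the same Calories once its key-sorted rows are put back in fruitData's order:

```r
fruitNutr <- data.frame(Fruit    = c("banana", "pear", "mango"),
                        Calories = c(100, 100, 200))
fruitData <- data.frame(Fruit = c("apple", "banana", "pear", "orange", "kiwi"),
                        Juice = c(1, 0, 0.5, 1, 0))

# Step 1: for each fruitData fruit, its row number in fruitNutr (NA if absent)
index <- match(fruitData$Fruit, fruitNutr$Fruit)   # NA 1 2 NA NA
# Step 2: TRUE where fruitData has a partner row in fruitNutr
found <- !is.na(index)                             # FALSE TRUE TRUE FALSE FALSE
# Step 3: keep only the real row numbers, dropping the NAs
rows <- index[found]                               # 1 2
# Step 4: fetch the calories stored in those fruitNutr rows
vals <- fruitNutr$Calories[rows]                   # 100 100
# Step 5: write them into the matching fruitData rows; the others stay NA
fruitData$Calories <- NA
fruitData$Calories[found] <- vals
print(fruitData)

# Same Calories via merge(); reorder its key-sorted rows to fruitData's order
m <- merge(fruitData[c("Fruit", "Juice")], fruitNutr, all.x = TRUE)
m <- m[match(fruitData$Fruit, m$Fruit), ]
print(identical(m$Calories, fruitData$Calories))
```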

Re: [R] combining data.frames with is.na & match (), two questions

2019-04-18 Thread peter dalgaard
The whole thing is a merge operation, i.e.

> FruitNutr <- read.table(text="
+ Fruit  Calories
+ 1 banana 100
+ 2 pear 100
+ 3 mango 200
+ ")
> FruitData <- read.table(text="
+ Fruit Color Shape Juice
+ 1 apple red round 1
+ 2 banana yellow oblong 0
+ 3 pear green pear 0.5
+ 4 orange orange round 1
+ 5 kiwi green round 0
+ ")
> merge(FruitData, FruitNutr)
   Fruit  Color  Shape Juice Calories
1 banana yellow oblong   0.0      100
2   pear  green   pear   0.5      100
> merge(FruitData, FruitNutr, all.x=TRUE)
   Fruit  Color  Shape Juice Calories
1  apple    red  round   1.0       NA
2 banana yellow oblong   0.0      100
3   kiwi  green  round   0.0       NA
4 orange orange  round   1.0       NA
5   pear  green   pear   0.5      100

Mind you, merge() comes with its own set of confusing options in the more 
complex cases, which may be why the authors have chosen a more elementary 
approach.
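
One of those wrinkles is row order: merge() sorts its result on the `by` columns, so the all.x=TRUE result above comes back alphabetical (apple, banana, kiwi, orange, pear) rather than in fruitData's original order. The following is a minimal sketch, not taken from the replies, of one way to reproduce the book's result in the original order. It assumes the data are built with data.frame() as shown, and `row_id` is a hypothetical helper column introduced only for this illustration.

```r
# Example data, typed in from the thread.
fruitNutr <- data.frame(
  Fruit    = c("banana", "pear", "mango"),
  Calories = c(100, 100, 200)
)
fruitData <- data.frame(
  Fruit = c("apple", "banana", "pear", "orange", "kiwi"),
  Color = c("red", "yellow", "green", "orange", "green"),
  Shape = c("round", "oblong", "pear", "round", "round"),
  Juice = c(1, 0, 0.5, 1, 0)
)

# Work on a copy tagged with each row's original position
# (row_id is a helper column made up for this sketch).
tagged <- fruitData
tagged$row_id <- seq_len(nrow(tagged))

# Left join: keep every fruitData row; fruits missing from fruitNutr get NA.
joined <- merge(tagged, fruitNutr, by = "Fruit", all.x = TRUE)

# merge() returns rows sorted by Fruit, so restore the original order
# and drop the helper column.
joined <- joined[order(joined$row_id), setdiff(names(joined), "row_id")]
rownames(joined) <- NULL
print(joined)
```

Passing sort = FALSE to merge() skips the sorting, but ?merge says the rows are then in an unspecified order, so the explicit position column is the more predictable choice.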

-pd

> On 18 Apr 2019, at 01:24 , Drake Gossi  wrote:
> 
> Hello everyone,
> 
> I'm working through this book, *Humanities Data in R* (Arnold & Tilton),
> and I'm just having trouble understanding this maneuver.
> 
> In sum, I'm trying to combine data in two different data.frames.
> 
> This data.frame is called fruitNutr
> 
> Fruit  Calories
> 1 banana 100
> 2 pear 100
> 3 mango 200
> 
> And this data.frame is called fruitData
> 
> Fruit Color Shape Juice
> 1 apple red round 1
> 2 banana yellow oblong 0
> 3 pear green pear 0.5
> 4 orange orange round 1
> 5 kiwi green round 0
> 
> So, as you can see, these two data.frames overlap insofar as they both have
> banana and pear. So, what happens next is the book suggests this:
> 
> fruitData$Calories <- NA
> 
> 
> As a result, I've created a new column for the fruitData data.frame:
> 
>    Fruit  Color  Shape Juice Calories
> 1  apple    red  round   1.0       NA
> 2 banana yellow oblong   0.0       NA
> 3   pear  green   pear   0.5       NA
> 4 orange orange  round   1.0       NA
> 5   kiwi  green  round   0.0       NA
> 
> Then:
> 
>> index <- match(x = fruitData$Fruit, table = fruitNutr$Fruit)
>> index
> [1] NA  1  2 NA NA
>> is.na(index)
> [1]  TRUE FALSE FALSE  TRUE  TRUE
>> fruitData$Calories[!is.na(index)] <- fruitNutr$Calories[index[!is.na(index)]]
>> fruitData
> 
>    Fruit  Color  Shape Juice Calories
> 1  apple    red  round   1.0       NA
> 2 banana yellow oblong   0.0      100
> 3   pear  green   pear   0.5      100
> 4 orange orange  round   1.0       NA
> 5   kiwi  green  round   0.0       NA
> 
> I get what the first part means, that first part being this:
> fruitData$Calories[!is.na(index)]
> go into the fruitData data.frame, specifically into the Calories column,
> and only for what's true according to !is.na(index). But I just literally
> can't understand this last part: fruitNutr$Calories[index[!is.na(index)]]
> 
> Two questions.
> 
> 
>   1. I just literally don't understand how this code works. It does work,
>   of course, but I don't know what it's doing, specifically this
>   [index[!is.na(index)]] part. Could someone explain it to me like I'm
>   five? I'm new at this...
>   2. And then: is there any other way to combine these two data.frames so
>   that we get this same result? Maybe an easier-to-understand method?
> 
> That same result, again, is
> 
>    Fruit  Color  Shape Juice Calories
> 1  apple    red  round   1.0       NA
> 2 banana yellow oblong   0.0      100
> 3   pear  green   pear   0.5      100
> 4 orange orange  round   1.0       NA
> 5   kiwi  green  round   0.0       NA
> 
> 
> Drake
> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] combining data.frames with is.na & match (), two questions

2019-04-18 Thread Michael Dewey

Dear Drake

See in-line comments

On 18/04/2019 00:24, Drake Gossi wrote:

> Hello everyone,
>
> I'm working through this book, *Humanities Data in R* (Arnold & Tilton),
> and I'm just having trouble understanding this maneuver.
>
> In sum, I'm trying to combine data in two different data.frames.
>
> This data.frame is called fruitNutr
>
> Fruit  Calories
> 1 banana 100
> 2 pear 100
> 3 mango 200
>
> And this data.frame is called fruitData
>
> Fruit Color Shape Juice
> 1 apple red round 1
> 2 banana yellow oblong 0
> 3 pear green pear 0.5
> 4 orange orange round 1
> 5 kiwi green round 0
>
> So, as you can see, these two data.frames overlap insofar as they both have
> banana and pear. So, what happens next is the book suggests this:
>
> fruitData$Calories <- NA
>
> As a result, I've created a new column for the fruitData data.frame:
>
>    Fruit  Color  Shape Juice Calories
> 1  apple    red  round   1.0       NA
> 2 banana yellow oblong   0.0       NA
> 3   pear  green   pear   0.5       NA
> 4 orange orange  round   1.0       NA
> 5   kiwi  green  round   0.0       NA
>
> Then:
>
>> index <- match(x = fruitData$Fruit, table = fruitNutr$Fruit)
>> index
> [1] NA  1  2 NA NA
>> is.na(index)
> [1]  TRUE FALSE FALSE  TRUE  TRUE
>> fruitData$Calories[!is.na(index)] <- fruitNutr$Calories[index[!is.na(index)]]
>> fruitData
>
>    Fruit  Color  Shape Juice Calories
> 1  apple    red  round   1.0       NA
> 2 banana yellow oblong   0.0      100
> 3   pear  green   pear   0.5      100
> 4 orange orange  round   1.0       NA
> 5   kiwi  green  round   0.0       NA
>
> I get what the first part means, that first part being this:
> fruitData$Calories[!is.na(index)]
> go into the fruitData data.frame, specifically into the Calories column,
> and only for what's true according to !is.na(index). But I just literally
> can't understand this last part: fruitNutr$Calories[index[!is.na(index)]]
>
> Two questions.
>
>   1. I just literally don't understand how this code works. It does work,
>   of course, but I don't know what it's doing, specifically this
>   [index[!is.na(index)]] part. Could someone explain it to me like I'm
>   five? I'm new at this...


Decompose it from the inside out. So

!is.na(index)

gives you a logical vector, the same length as index, which is TRUE where
index has a value and FALSE where it is NA


index[ something ]

gives you a vector of the values of index at the positions where something
is TRUE (in this case). Note that this vector will be shorter than
something if something contains any FALSE values.


That should help you get started. My personal opinion is that it is much 
clearer with these things to do it in separate stages.


keep <- !is.na(index)
index[keep]

and check the value of keep if it seems to have gone wrong
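
Michael's inside-out decomposition can be run one stage at a time. The sketch below is an illustration rather than code from the thread: it rebuilds only the columns the lookup needs (a simplification) and prints each intermediate value, with comments giving what each step returns for this data. The final `shortcut` line is an added observation, not the book's method.

```r
# Minimal example data: only the columns needed for the lookup.
fruitNutr <- data.frame(Fruit    = c("banana", "pear", "mango"),
                        Calories = c(100, 100, 200))
fruitData <- data.frame(Fruit = c("apple", "banana", "pear", "orange", "kiwi"))
fruitData$Calories <- NA

# 1. For each fruit in fruitData, its row number in fruitNutr (NA if absent).
index <- match(fruitData$Fruit, fruitNutr$Fruit)
print(index)                              # NA  1  2 NA NA

# 2. TRUE for the fruitData rows that found a match.
keep <- !is.na(index)
print(keep)                               # FALSE  TRUE  TRUE FALSE FALSE

# 3. Only the matched row numbers; this is shorter than keep.
print(index[keep])                        # 1 2

# 4. Use those row numbers to look up the calories in fruitNutr.
print(fruitNutr$Calories[index[keep]])    # 100 100

# 5. Write them into the matched rows of fruitData (the book's line).
fruitData$Calories[keep] <- fruitNutr$Calories[index[keep]]
print(fruitData)

# Shortcut: indexing with NA returns NA, so this fills every row at once.
shortcut <- fruitNutr$Calories[index]
print(identical(shortcut, fruitData$Calories))  # TRUE
```

The two versions differ only if Calories already holds values for unmatched rows: the book's version leaves those rows alone, while the shortcut resets them to NA.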

>   2. And then: is there any other way to combine these two data.frames so
>   that we get this same result? Maybe an easier-to-understand method?
>
> That same result, again, is
>
>    Fruit  Color  Shape Juice Calories
> 1  apple    red  round   1.0       NA
> 2 banana yellow oblong   0.0      100
> 3   pear  green   pear   0.5      100
> 4 orange orange  round   1.0       NA
> 5   kiwi  green  round   0.0       NA
>
> Drake





--
Michael
http://www.dewey.myzen.co.uk/home.html
