[R] Data Structure to Unnest_tokens in tidytext package

2019-12-10 Thread Sarah Payne
Hi--I'm fairly new to R and trying to do a text mining project on a novel
using the tidytext package. The novel is saved as a plain text document and
I can import it into RStudio just fine. For reference I'm trying to do
something similar to section 1.3 of this tidy text tutorial
, except I'm working with one
novel instead of many. So I import the novel and then run:

"tidy_novel <- quicksandr %>%
unnest_tokens (word, text)"

I get the following error:

Error in check_input(x) :
  Input must be a character vector of any length or a list of character
  vectors, each of which has a length of 1.

typeof(novel) returns "list" and str(novel) returns

Classes ‘spec_tbl_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 955 obs. of  1
variable:
 $ FOR E. S. I.: chr  "FOR E. S. I." "My old man died in a fine big house.
My ma died in a shack. I wonder where I'm gonna die, Being neither white
nor black?'" "LANGSTON HUGHES" "ONE" ...
 - attr(*, "problems")=Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 8 obs. of
 5 variables:
  ..$ row : int  530 726 733 836 853 886 889 942
  ..$ col : chr  NA NA NA NA ...
  ..$ expected: chr  "1 columns" "1 columns" "1 columns" "1 columns" ...
  ..$ actual  : chr  "2 columns" "2 columns" "2 columns" "2 columns" ...
  ..$ file: chr  "'quicksandr.txt'" "'quicksandr.txt'"
"'quicksandr.txt'" "'quicksandr.txt'" ...
 - attr(*, "spec")=
  .. cols(
  ..   `FOR E. S. I.` = col_character()
  .. )
>

I'm just importing the text file and then trying to run the unnest_tokens
function, so maybe I'm missing a step in between? I seem to need my text
file in a different format, so would appreciate answers on how to do that.
Thanks, and let me know if I need to provide more info!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] table and unique seems to behave differently

2019-12-10 Thread William Dunlap via R-help
You can use save(ascii=TRUE,...) to make an ascii-only RData file that you
can include in the mail message.  E.g.,

> x <- c(3.4, 3.4 + 1e-15)
> save(x, ascii=TRUE, file=stdout())
RDA3
A
3
198145
197888
5
UTF-8
1026
1
262153
1
x
14
2
3.4
3.401
254

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Tue, Dec 10, 2019 at 8:37 AM Alain Guillet 
wrote:

> Another finding for me today: dput doesn't write exactly the vector that
> creates the problem. I could use an RData file but I think it is forbidden
> in this mailing list...
>
>
> Alain
> 
> De : Chris Evans 
> Envoyé : mardi 10 décembre 2019 15:41
> À : Alain Guillet 
> Cc : r-help@r-project.org 
> Objet : Re: [R] table and unique seems to behave differently
>
> This doesn't answer your question but I get exactly the same vector of
> length 210 with unique(toto) and names(table(toto)) using the same version
> of R that you are and I can't see any obvious reason why you wouldn't but
> when I hit things like that it tends to be that one version is string with
> initial or trailing spaces or a character set issue.  I can't see that
> those apply here but it's all I could imagine without racking my poor old
> brains much more.
>
> Good luck finding the answer!
>
> Chris
>
> - Original Message -
> > From: "Alain Guillet" 
> > To: r-help@r-project.org
> > Sent: Tuesday, 10 December, 2019 09:53:29
> > Subject: [R] table and unique seems to behave differently
>
> > Hi,
> >
> > I have a vector (see below the dput) and I use unique on it to get unique
> > values. If I then sort the result of the vector obtained by unique, I
> see some
> > elements that look like identical. I suspect it could be a matter of
> rounded
> > values but table gives a different result: unlike unique output which
> contains
> > "3.4  3.4", table has only one cell for 3.4.
> >
> > Can anybody know why I get results that look like incoherent between the
> two
> > functions?
> >
> >
> > Best regards,
> > Alain Guillet
> >
> > --
> > platform   x86_64-pc-linux-gnu
> > arch   x86_64
> > os linux-gnu
> > system x86_64, linux-gnu
> > status
> > major  3
> > minor  6.1
> > year   2019
> > month  07
> > day05
> > svn rev76782
> > language   R
> > version.string R version 3.6.1 (2019-07-05)
> > nickname   Action of the Toes
> > --
> >> dput(toto)
> > c(2.5, 2.6, 2.6, 2.6, 2.6, 2.7, 2.7, 2.7, 2.7, 2.7, 2.7, 2.8,
> > 2.8, 2.8, 2.8, 2.8, 2.8, 2.8, 2.8, 2.8, 2.9, 2.9, 2.9, 2.9, 2.9,
> > 2.9, 2.9, 2.9, 3, 3, 3, 3, 3, 3, 3, 3, 3.1, 3.1, 3.1, 3.1, 3.1,
> > 3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2,
> > 3.2, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2, 3.3, 3.3, 3.3, 3.3, 3.3, 3.3,
> > 3.3, 3.3, 3.3, 3.3, 3.3, 3.3, 3.3, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4,
> > 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.5, 3.5, 3.5, 3.5, 3.5,
> > 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.6, 3.6, 3.6,
> > 3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.6,
> > 3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7,
> > 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7,
> > 3.7, 3.7, 3.7, 3.7, 3.7, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8,
> > 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8,
> > 3.8, 3.8, 3.8, 3.8, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9,
> > 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9,
> > 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 4, 4, 4, 4, 4, 4, 4, 4, 4,
> > 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
> > 4, 4, 4, 4, 4, 4, 4, 4, 4, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1,
> > 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1,
> > 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1,
> > 4.1, 4.1, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2,
> > 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2,
> > 4.2, 4.2, 4.2, 4.2, 4.2, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3,
> > 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3,
> > 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.4, 4.4, 4.4, 4.4,
> > 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4,
> > 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4,
> > 4.4, 4.4, 4.4, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5,
> > 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5,
> > 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5,
> > 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.6,
> > 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6,
> > 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6,
> > 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6,
> > 4.6, 4.6, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7,
> > 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7,
> 

Re: [R] table and unique seems to behave differently

2019-12-10 Thread Alain Guillet
Thanks a lot, it answers my question.


Alain

De : Jeff Newmiller 
Envoy� : mardi 10 d�cembre 2019 16:31
� : r-help@r-project.org ; Duncan Murdoch 
; Alain Guillet ; 
r-help@r-project.org 
Objet : Re: [R] table and unique seems to behave differently

I think the question was about table vs unique. Table groups by character 
representation, unique groups by the underlying representation.

On December 10, 2019 7:03:34 AM PST, Duncan Murdoch  
wrote:
>On 10/12/2019 3:53 a.m., Alain Guillet wrote:
>> Hi,
>>
>> I have a vector (see below the dput) and I use unique on it to get
>unique values. If I then sort the result of the vector obtained by
>unique, I see some elements that look like identical. I suspect it
>could be a matter of rounded values but table gives a different result:
>unlike unique output which contains "3.4  3.4", table has only one cell
>for 3.4.
>>
>> Can anybody know why I get results that look like incoherent between
>the two functions?
>
>dput() does some rounding, so it doesn't necessarily reproduce values
>exactly.  For example,
>
>x <- c(3.4, 3.4 + 1e-15)
>unique(x)
>#> [1] 3.4 3.4
>dput(x)
>#> c(3.4, 3.4)
>identical(x, c(3.4, 3.4))
>#> [1] FALSE
>
>If you really want to see exact values, you can use the "hexNumeric"
>option to dput():
>
>dput(x, control = "hexNumeric")
>#> c(0x1.bp+1, 0x1.b3335p+1)
>identical(x, c(0x1.bp+1, 0x1.b3335p+1))
>#> [1] TRUE
>
>Duncan Murdoch
>
>__
>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-helpdata=02%7C01%7C%7C74bb1eeaeb444a6499e508d77d85fba3%7C7ab090d4fa2e4ecfbc7c4127b4d582ec%7C0%7C0%7C637115886725989409sdata=mU3K2kH%2FAwxdEQ%2BOWVYBhqNLbWGkWzmtfgx92D1DNF8%3Dreserved=0
>PLEASE do read the posting guide
>https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.R-project.org%2Fposting-guide.htmldata=02%7C01%7C%7C74bb1eeaeb444a6499e508d77d85fba3%7C7ab090d4fa2e4ecfbc7c4127b4d582ec%7C0%7C0%7C637115886725989409sdata=hTxOssdYb%2FcvvSFQyQZ5GBWkpHzsIrbtqJbgCgW2LPw%3Dreserved=0
>and provide commented, minimal, self-contained, reproducible code.

--
Sent from my phone. Please excuse my brevity.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] table and unique seems to behave differently

2019-12-10 Thread Duncan Murdoch

On 10/12/2019 10:32 a.m., Sarah Goslee wrote:

Back to the table part of the question, but using Duncan's example.


x <- c(3.4, 3.4 + 1e-15)
unique(x)

[1] 3.4 3.4

table(x)

x
3.4
   2

The question was, why are these different.

table() only works on factors, so it converts the numeric vector to a
factor before tabulation.
factor() tries to do something sensible, and implicitly rounds the numeric data.


factor(x)

[1] 3.4 3.4
Levels: 3.4

Whether you think that is actually sensible or not is up to you, but
if it isn't then you shouldn't use table.

That table uses factors is documented in ?table. A quick read of
?factor didn't find any explicit discussion, other than the
acknowledgement that factor() is lossy in:

  To transform a factor ‘f’ to approximately its
  original numeric values, ‘as.numeric(levels(f))[f]’ is recommended
  and slightly more efficient than ‘as.numeric(as.character(f))’.

You can't even get table() to do what you want by being explicit:


table(factor(x, levels = unique(x)))

Error in `levels<-`(`*tmp*`, value = as.character(levels)) :
   factor level [2] is duplicated


You could get it to agree with unique if you do the string conversion 
yourself.  The first result is ugly:


x <- c(3.4, 3.4 + 1e-15)
tab <- table(sprintf("%a", x))
tab
#>
#> 0x1.bp+1 0x1.b3335p+1
#>11

But if you really want you can make it readable:

names(tab) <- as.numeric(names(tab))
tab
#> 3.4 3.4
#>   1   1

Duncan Murdoch

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] table and unique seems to behave differently

2019-12-10 Thread Alain Guillet
Another finding for me today: dput doesn't write exactly the vector that 
creates the problem. I could use an RData file but I think it is forbidden in 
this mailing list...


Alain

De : Chris Evans 
Envoy� : mardi 10 d�cembre 2019 15:41
� : Alain Guillet 
Cc : r-help@r-project.org 
Objet : Re: [R] table and unique seems to behave differently

This doesn't answer your question but I get exactly the same vector of length 
210 with unique(toto) and names(table(toto)) using the same version of R that 
you are and I can't see any obvious reason why you wouldn't but when I hit 
things like that it tends to be that one version is string with initial or 
trailing spaces or a character set issue.  I can't see that those apply here 
but it's all I could imagine without racking my poor old brains much more.

Good luck finding the answer!

Chris

- Original Message -
> From: "Alain Guillet" 
> To: r-help@r-project.org
> Sent: Tuesday, 10 December, 2019 09:53:29
> Subject: [R] table and unique seems to behave differently

> Hi,
>
> I have a vector (see below the dput) and I use unique on it to get unique
> values. If I then sort the result of the vector obtained by unique, I see some
> elements that look like identical. I suspect it could be a matter of rounded
> values but table gives a different result: unlike unique output which contains
> "3.4  3.4", table has only one cell for 3.4.
>
> Can anybody know why I get results that look like incoherent between the two
> functions?
>
>
> Best regards,
> Alain Guillet
>
> --
> platform   x86_64-pc-linux-gnu
> arch   x86_64
> os linux-gnu
> system x86_64, linux-gnu
> status
> major  3
> minor  6.1
> year   2019
> month  07
> day05
> svn rev76782
> language   R
> version.string R version 3.6.1 (2019-07-05)
> nickname   Action of the Toes
> --
>> dput(toto)
> c(2.5, 2.6, 2.6, 2.6, 2.6, 2.7, 2.7, 2.7, 2.7, 2.7, 2.7, 2.8,
> 2.8, 2.8, 2.8, 2.8, 2.8, 2.8, 2.8, 2.8, 2.9, 2.9, 2.9, 2.9, 2.9,
> 2.9, 2.9, 2.9, 3, 3, 3, 3, 3, 3, 3, 3, 3.1, 3.1, 3.1, 3.1, 3.1,
> 3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2,
> 3.2, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2, 3.3, 3.3, 3.3, 3.3, 3.3, 3.3,
> 3.3, 3.3, 3.3, 3.3, 3.3, 3.3, 3.3, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4,
> 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.5, 3.5, 3.5, 3.5, 3.5,
> 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.6, 3.6, 3.6,
> 3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.6,
> 3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7,
> 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7,
> 3.7, 3.7, 3.7, 3.7, 3.7, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8,
> 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8,
> 3.8, 3.8, 3.8, 3.8, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9,
> 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9,
> 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 4, 4, 4, 4, 4, 4, 4, 4, 4,
> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1,
> 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1,
> 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1,
> 4.1, 4.1, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2,
> 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2,
> 4.2, 4.2, 4.2, 4.2, 4.2, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3,
> 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3,
> 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.4, 4.4, 4.4, 4.4,
> 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4,
> 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4,
> 4.4, 4.4, 4.4, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5,
> 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5,
> 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5,
> 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.6,
> 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6,
> 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6,
> 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6,
> 4.6, 4.6, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7,
> 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7,
> 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7,
> 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.8, 4.8, 4.8, 4.8, 4.8,
> 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8,
> 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8,
> 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.9, 4.9, 4.9, 4.9, 4.9,
> 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9,
> 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9,
> 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9,
> 4.9, 4.9, 4.9, 4.9, 

Re: [R] table and unique seems to behave differently

2019-12-10 Thread Sarah Goslee
Back to the table part of the question, but using Duncan's example.

> x <- c(3.4, 3.4 + 1e-15)
> unique(x)
[1] 3.4 3.4
> table(x)
x
3.4
  2

The question was, why are these different.

table() only works on factors, so it converts the numeric vector to a
factor before tabulation.
factor() tries to do something sensible, and implicitly rounds the numeric data.

> factor(x)
[1] 3.4 3.4
Levels: 3.4

Whether you think that is actually sensible or not is up to you, but
if it isn't then you shouldn't use table.

That table uses factors is documented in ?table. A quick read of
?factor didn't find any explicit discussion, other than the
acknowledgement that factor() is lossy in:

 To transform a factor ‘f’ to approximately its
 original numeric values, ‘as.numeric(levels(f))[f]’ is recommended
 and slightly more efficient than ‘as.numeric(as.character(f))’.

You can't even get table() to do what you want by being explicit:

> table(factor(x, levels = unique(x)))
Error in `levels<-`(`*tmp*`, value = as.character(levels)) :
  factor level [2] is duplicated


Sarah

On Tue, Dec 10, 2019 at 9:18 AM Alain Guillet
 wrote:
>
> Hi,
>
> I have a vector (see below the dput) and I use unique on it to get unique 
> values. If I then sort the result of the vector obtained by unique, I see 
> some elements that look like identical. I suspect it could be a matter of 
> rounded values but table gives a different result: unlike unique output which 
> contains "3.4  3.4", table has only one cell for 3.4.
>
> Can anybody know why I get results that look like incoherent between the two 
> functions?
>
>
> Best regards,
> Alain Guillet
>

-- 
Sarah Goslee (she/her)
http://www.numberwright.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] table and unique seems to behave differently

2019-12-10 Thread Jeff Newmiller
I think the question was about table vs unique. Table groups by character 
representation, unique groups by the underlying representation.

On December 10, 2019 7:03:34 AM PST, Duncan Murdoch  
wrote:
>On 10/12/2019 3:53 a.m., Alain Guillet wrote:
>> Hi,
>> 
>> I have a vector (see below the dput) and I use unique on it to get
>unique values. If I then sort the result of the vector obtained by
>unique, I see some elements that look like identical. I suspect it
>could be a matter of rounded values but table gives a different result:
>unlike unique output which contains "3.4  3.4", table has only one cell
>for 3.4.
>> 
>> Can anybody know why I get results that look like incoherent between
>the two functions?
>
>dput() does some rounding, so it doesn't necessarily reproduce values 
>exactly.  For example,
>
>x <- c(3.4, 3.4 + 1e-15)
>unique(x)
>#> [1] 3.4 3.4
>dput(x)
>#> c(3.4, 3.4)
>identical(x, c(3.4, 3.4))
>#> [1] FALSE
>
>If you really want to see exact values, you can use the "hexNumeric" 
>option to dput():
>
>dput(x, control = "hexNumeric")
>#> c(0x1.bp+1, 0x1.b3335p+1)
>identical(x, c(0x1.bp+1, 0x1.b3335p+1))
>#> [1] TRUE
>
>Duncan Murdoch
>
>__
>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

-- 
Sent from my phone. Please excuse my brevity.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] table and unique seems to behave differently

2019-12-10 Thread Duncan Murdoch

On 10/12/2019 3:53 a.m., Alain Guillet wrote:

Hi,

I have a vector (see below the dput) and I use unique on it to get unique values. If I 
then sort the result of the vector obtained by unique, I see some elements that look like 
identical. I suspect it could be a matter of rounded values but table gives a different 
result: unlike unique output which contains "3.4  3.4", table has only one cell 
for 3.4.

Can anybody know why I get results that look like incoherent between the two 
functions?


dput() does some rounding, so it doesn't necessarily reproduce values 
exactly.  For example,


x <- c(3.4, 3.4 + 1e-15)
unique(x)
#> [1] 3.4 3.4
dput(x)
#> c(3.4, 3.4)
identical(x, c(3.4, 3.4))
#> [1] FALSE

If you really want to see exact values, you can use the "hexNumeric" 
option to dput():


dput(x, control = "hexNumeric")
#> c(0x1.bp+1, 0x1.b3335p+1)
identical(x, c(0x1.bp+1, 0x1.b3335p+1))
#> [1] TRUE

Duncan Murdoch

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] table and unique seems to behave differently

2019-12-10 Thread Chris Evans
This doesn't answer your question but I get exactly the same vector of length 
210 with unique(toto) and names(table(toto)) using the same version of R that 
you are and I can't see any obvious reason why you wouldn't but when I hit 
things like that it tends to be that one version is string with initial or 
trailing spaces or a character set issue.  I can't see that those apply here 
but it's all I could imagine without racking my poor old brains much more.

Good luck finding the answer!

Chris

- Original Message -
> From: "Alain Guillet" 
> To: r-help@r-project.org
> Sent: Tuesday, 10 December, 2019 09:53:29
> Subject: [R] table and unique seems to behave differently

> Hi,
> 
> I have a vector (see below the dput) and I use unique on it to get unique
> values. If I then sort the result of the vector obtained by unique, I see some
> elements that look like identical. I suspect it could be a matter of rounded
> values but table gives a different result: unlike unique output which contains
> "3.4  3.4", table has only one cell for 3.4.
> 
> Can anybody know why I get results that look like incoherent between the two
> functions?
> 
> 
> Best regards,
> Alain Guillet
> 
> --
> platform   x86_64-pc-linux-gnu
> arch   x86_64
> os linux-gnu
> system x86_64, linux-gnu
> status
> major  3
> minor  6.1
> year   2019
> month  07
> day05
> svn rev76782
> language   R
> version.string R version 3.6.1 (2019-07-05)
> nickname   Action of the Toes
> --
>> dput(toto)
> c(2.5, 2.6, 2.6, 2.6, 2.6, 2.7, 2.7, 2.7, 2.7, 2.7, 2.7, 2.8,
> 2.8, 2.8, 2.8, 2.8, 2.8, 2.8, 2.8, 2.8, 2.9, 2.9, 2.9, 2.9, 2.9,
> 2.9, 2.9, 2.9, 3, 3, 3, 3, 3, 3, 3, 3, 3.1, 3.1, 3.1, 3.1, 3.1,
> 3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2,
> 3.2, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2, 3.3, 3.3, 3.3, 3.3, 3.3, 3.3,
> 3.3, 3.3, 3.3, 3.3, 3.3, 3.3, 3.3, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4,
> 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.5, 3.5, 3.5, 3.5, 3.5,
> 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.6, 3.6, 3.6,
> 3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.6,
> 3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7,
> 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7,
> 3.7, 3.7, 3.7, 3.7, 3.7, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8,
> 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8,
> 3.8, 3.8, 3.8, 3.8, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9,
> 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9,
> 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 4, 4, 4, 4, 4, 4, 4, 4, 4,
> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1,
> 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1,
> 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1,
> 4.1, 4.1, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2,
> 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2,
> 4.2, 4.2, 4.2, 4.2, 4.2, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3,
> 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3,
> 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.4, 4.4, 4.4, 4.4,
> 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4,
> 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4,
> 4.4, 4.4, 4.4, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5,
> 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5,
> 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5,
> 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.6,
> 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6,
> 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6,
> 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6,
> 4.6, 4.6, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7,
> 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7,
> 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7,
> 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.8, 4.8, 4.8, 4.8, 4.8,
> 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8,
> 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8,
> 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.9, 4.9, 4.9, 4.9, 4.9,
> 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9,
> 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9,
> 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9,
> 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9,
> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5.1, 5.1, 5.1,
> 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1,
> 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1,
> 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 

[R] table and unique seems to behave differently

2019-12-10 Thread Alain Guillet
Hi,

I have a vector (see below the dput) and I use unique on it to get unique 
values. If I then sort the result of the vector obtained by unique, I see some 
elements that look like identical. I suspect it could be a matter of rounded 
values but table gives a different result: unlike unique output which contains 
"3.4  3.4", table has only one cell for 3.4.

Can anybody know why I get results that look like incoherent between the two 
functions?


Best regards,
Alain Guillet

--
platform   x86_64-pc-linux-gnu
arch   x86_64
os linux-gnu
system x86_64, linux-gnu
status
major  3
minor  6.1
year   2019
month  07
day05
svn rev76782
language   R
version.string R version 3.6.1 (2019-07-05)
nickname   Action of the Toes
--
> dput(toto)
c(2.5, 2.6, 2.6, 2.6, 2.6, 2.7, 2.7, 2.7, 2.7, 2.7, 2.7, 2.8,
2.8, 2.8, 2.8, 2.8, 2.8, 2.8, 2.8, 2.8, 2.9, 2.9, 2.9, 2.9, 2.9,
2.9, 2.9, 2.9, 3, 3, 3, 3, 3, 3, 3, 3, 3.1, 3.1, 3.1, 3.1, 3.1,
3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 3.1, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2,
3.2, 3.2, 3.2, 3.2, 3.2, 3.2, 3.2, 3.3, 3.3, 3.3, 3.3, 3.3, 3.3,
3.3, 3.3, 3.3, 3.3, 3.3, 3.3, 3.3, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4,
3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.4, 3.5, 3.5, 3.5, 3.5, 3.5,
3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 3.6, 3.6, 3.6,
3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.6,
3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.6, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7,
3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7, 3.7,
3.7, 3.7, 3.7, 3.7, 3.7, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8,
3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8, 3.8,
3.8, 3.8, 3.8, 3.8, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9,
3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9,
3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1,
4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1,
4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1,
4.1, 4.1, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2,
4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2,
4.2, 4.2, 4.2, 4.2, 4.2, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3,
4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3,
4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.4, 4.4, 4.4, 4.4,
4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4,
4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4,
4.4, 4.4, 4.4, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5,
4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5,
4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5,
4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.6,
4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6,
4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6,
4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6,
4.6, 4.6, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7,
4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7,
4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7,
4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.8, 4.8, 4.8, 4.8, 4.8,
4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8,
4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8,
4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.9, 4.9, 4.9, 4.9, 4.9,
4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9,
4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9,
4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9,
4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5.1, 5.1, 5.1,
5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1,
5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1,
5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1, 5.1,
5.2, 5.2, 5.2, 5.2, 5.2, 5.2, 5.2, 5.2, 5.2, 5.2, 5.2, 5.2, 5.2,
5.2, 5.2, 5.2, 5.2, 5.2, 5.2, 5.2, 5.2, 5.2, 5.2, 5.2, 5.2, 5.2,
5.2, 5.2, 5.2, 5.2, 5.2, 5.2, 5.2, 5.2, 5.3, 5.3, 5.3, 5.3, 5.3,
5.3, 5.3, 5.3, 5.3, 5.3, 5.3, 5.3, 5.3, 5.3, 5.3, 5.3, 5.3, 5.3,
5.3, 5.3, 5.3, 5.3, 5.3, 5.3, 5.3, 5.3, 5.3, 5.3, 5.3, 5.3, 5.3,
5.3, 5.3, 5.3, 5.3, 5.3, 5.3, 5.3, 5.3, 5.3, 5.3, 5.3, 5.3, 5.3,
5.3, 5.3, 5.3, 5.3, 5.4, 5.4, 5.4, 5.4, 5.4, 5.4, 5.4, 5.4, 5.4,
5.4, 5.4, 5.4, 5.4, 5.4, 5.4, 5.4, 5.4, 5.4, 5.4, 5.4, 5.4, 5.4,
5.4, 5.4, 5.4, 5.4, 5.4, 5.4, 5.4, 5.4, 5.4, 5.4, 5.4, 5.4, 5.4,
5.4, 5.4, 5.4, 5.4, 5.4, 5.4, 5.4, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5,
5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5,
5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5,
5.5, 5.5, 5.5, 5.5, 5.6, 5.6, 5.6, 5.6, 5.6, 5.6, 5.6, 5.6, 

Re: [R] Having problems with the ifelse and negative numbers

2019-12-10 Thread Marc Girondot via R-help
Here is a test of the different proposed solutions and a new one faster. 
In conclusion, ifelse much be used with caution:


Aini =  runif(100, min=-1,max=1)
library(microbenchmark)
A <- Aini
microbenchmark({B1 <- ifelse( A < 0, sqrt(-A), A )})
# mean = 77.1
A <- Aini
microbenchmark({B2 <- ifelse( A < 0, suppressWarnings(sqrt(-A)), A )})
# mean = 76.53762
A <- Aini
microbenchmark({B3 <- ifelse( A < 0, sqrt(abs(A)), A )})
# mean = 75.26712
A <- Aini
microbenchmark({A[A < 0] <- sqrt(-A[A < 0]);B4 <- A})
# mean = 17.71883

Marc

Le 09/12/2019 à 19:25, Richard M. Heiberger a écrit :

or even simpler
sqrt(abs(A))

On Mon, Dec 9, 2019 at 8:45 AM Eric Berger  wrote:

Hi Bob,
You wrote "the following error message" -
when in fact it is a Warning and not an error message. I think your
code does what you hoped it would do, in the sense it successfully
calculates the sqrt(abs(negativeNumber)), where appropriate.

If you want to run the code without seeing this warning message you can run

ifelse( A < 0, suppressWarnings(sqrt(-A)), A )

and you should be fine.

HTH,
Eric

On Mon, Dec 9, 2019 at 3:18 PM Kevin Thorpe  wrote:

The sqrt(-A) is evaluated for all A. The result returned is conditional on the 
first argument but the other two arguments are evaluated on the entire vector.

Kevin

--
Kevin E. Thorpe
Head of Biostatistics,  Applied Health Research Centre (AHRC)
Li Ka Shing Knowledge Institute of St. Michael's
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.tho...@utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016


On 2019-12-09, 7:58 AM, "R-help on behalf of rsherry8" 
 wrote:

 Please consider the following two R statements:
  A =  runif(20, min=-1,max=1)
  ifelse( A < 0, sqrt(-A), A )

 The second statement produces the following error message:
  rt(-A) : NaNs produced

 I understand that you cannot take the square root of a negative number
 but I thought the condition A < 0
 would take care of that issue. It appears not to be.

 What am I missing?

 Thanks,
 Bob

 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.