Re: [R] How to remove all rows that have a numeric in the first (or any) column

2021-09-14 Thread Jeff Newmiller
FWIW I use them quite frequently, but not for the purpose of storing 
heterogeneous data... rather for holding complex objects of the same class.

On September 14, 2021 10:25:54 PM PDT, Avi Gross via R-help 
 wrote:
>My apologies. My reply was to Andrew, not Gregg.
>
>Enough damage for one night. Here is hoping we finally understood a question 
>that could have been better phrased. list columns are not normally considered 
>common data structures but quite possibly will be more as time goes on and the 
>tools to handle them become better or at least better understood.
>
>
>-Original Message-
>From: R-help  On Behalf Of Avi Gross via R-help
>Sent: Wednesday, September 15, 2021 1:23 AM
>To: R-help@r-project.org
>Subject: Re: [R] How to remove all rows that have a numeric in the first (or 
>any) column
>
>You are correct, Gregg, I am aware of that trick of asking something to not be 
>evaluated in certain ways.
>
> 
>
>And you can indeed use base R to play with contents of beta as defined above.  
>Here is a sort of incremental demo:
>
> 
>
>> sapply(mydf$beta, is.numeric)
>
>[1] FALSE  TRUE  TRUE FALSE
>
>> !sapply(mydf$beta, is.numeric)
>
>[1]  TRUE FALSE FALSE  TRUE
>
>> keeping <- !sapply(mydf$beta, is.numeric)
>
>> mydf[keeping, ]
>
># A tibble: 2 x 2
>
>alpha beta 
>
>
>
>  1 1 
>
>  2 4 
>
>  > str(mydf[keeping, ])
>
>tibble [2 x 2] (S3: tbl_df/tbl/data.frame)
>
>$ alpha: int [1:2] 1 4
>
>$ beta :List of 2
>
>..$ : chr "Hello"
>
>..$ : chr "bye"
>
> 
>
>Now for the bad news. The original request was for ANY column. But presumably 
>one way to do it, neither efficiently nor the best, would be to loop on the 
>names of all the columns and starting with the original data.frame, whittle 
>away at it column by column and adjust which column you search each time until 
>what is left had nothing numeric anywhere. 
>
> 
>
>Now if I was using dplyr, I wonder if there is a nice way to use rowwise() to 
>evaluate across a row.
>
> 
>
>Using your technique I made the following data.frame:
>
> 
>
>mydf <- data.frame(alpha=I(list("first", 2, 3.3, "Last")), 
>
>   beta=I(list(1, "second", 3.3, "Lasting")))
>
> 
>
>> mydf
>
>alphabeta
>
>1 first   1
>
>2 2  second
>
>3   3.3 3.3
>
>4  Last Lasting
>
> 
>
>Do we agree only the fourth row should be kept as the others have one or two 
>numeric values?
>
> 
>
>Here is some code I cobbled together that seems to work:
>
> 
>
> 
>
>rowwise(mydf) %>% 
>
>  mutate(alphazoid=!is.numeric(unlist(alpha)), 
>
> betazoid=!is.numeric(unlist(beta))) %>%
>
>  filter(alphazoid & betazoid) -> result
>
> 
>
>str(result)  
>
>print(result)
>
>result[[1,1]]
>
>result[[1,2]]
>
> 
>
>as.data.frame(result)
>
> 
>
>The results are shown below that only the fourth row was kept:
>
> 
>
>> rowwise(mydf) %>%
>
>  +   mutate(alphazoid=!is.numeric(unlist(alpha)), 
>
> +  betazoid=!is.numeric(unlist(beta))) %>%
>
>  +   filter(alphazoid & betazoid) -> result
>
>> 
>
>  > str(result)  
>
>rowwise_df [1 x 4] (S3: rowwise_df/tbl_df/tbl/data.frame)
>
>$ alpha:List of 1
>
>..$ : chr "Last"
>
>..- attr(*, "class")= chr "AsIs"
>
>$ beta :List of 1
>
>..$ : chr "Lasting"
>
>..- attr(*, "class")= chr "AsIs"
>
>$ alphazoid: logi TRUE
>
>$ betazoid : logi TRUE
>
>- attr(*, "groups")= tibble [1 x 1] (S3: tbl_df/tbl/data.frame)
>
>..$ .rows: list [1:1] 
>
>.. ..$ : int 1
>
>.. ..@ ptype: int(0) 
>
>> print(result)
>
># A tibble: 1 x 4
>
># Rowwise: 
>
>alpha beta  alphazoid betazoid
>
>> > 
>
>  1   TRUE  TRUE
>
>> result[[1,1]]
>
>[[1]]
>
>[1] "Last"
>
> 
>
>> result[[1,2]]
>
>[[1]]
>
>[1] "Lasting"
>
> 
>
>> as.data.frame(result)
>
>alphabeta alphazoid betazoid
>
>1  Last Lasting  TRUE TRUE
>
> 
>
>Of course, the temporary columns for alphazoid and betazoid can trivially be 
>removed.
>
> 
>
> 
>
> 
>
> 
>
>From: Andrew Simmons 
>Sent: Wednesday, September 15, 2021 12:44 AM
>To: Avi Gross 
>Cc: Gregg Powell via R-help 
>Subject: Re: [R] How to remove all rows that have a numeric in the first (or 
>any) column
>
> 
>
>I'd like to point out that base R can handle a list as a data frame column, 
>it's just that you have to make the list of class "AsIs". So in your example
>
> 
>
>temp <- list("Hello", 1, 1.1, "bye")
>
> 
>
>data.frame(alpha = 1:4, beta = I(temp)) 
>
> 
>
>means that column "beta" will still be a list.
>
> 
>
> 
>
>On Wed, Sep 15, 2021, 00:40 Avi Gross via R-help  > wrote:
>
>Calling something a data.frame does not make it a data.frame.
>
>The abbreviated object shown below is a list of singletons. If it is a column 
>in a larger object that is a data.frame, then it is a list column which is 
>valid but can be ticklish to handle within base R but less so in the tidyverse.
>
>For example, if I try to make a data.frame the normal way, the list gets made 
>into multiple columns and copied to each row. Not what was expected. I think 
>some 

Re: [R] How to remove all rows that have a numeric in the first (or any) column

2021-09-14 Thread Avi Gross via R-help
My apologies. My reply was to Andrew, not Gregg.

Enough damage for one night. Here is hoping we finally understood a question 
that could have been better phrased. List columns are not normally considered 
common data structures, but they may well become more common as time goes on and 
the tools for handling them become better, or at least better understood.


-Original Message-
From: R-help  On Behalf Of Avi Gross via R-help
Sent: Wednesday, September 15, 2021 1:23 AM
To: R-help@r-project.org
Subject: Re: [R] How to remove all rows that have a numeric in the first (or 
any) column

You are correct, Gregg, I am aware of that trick of asking something to not be 
evaluated in certain ways.

 

And you can indeed use base R to play with contents of beta as defined above.  
Here is a sort of incremental demo:

 

> sapply(mydf$beta, is.numeric)

[1] FALSE  TRUE  TRUE FALSE

> !sapply(mydf$beta, is.numeric)

[1]  TRUE FALSE FALSE  TRUE

> keeping <- !sapply(mydf$beta, is.numeric)

> mydf[keeping, ]

# A tibble: 2 x 2

alpha beta 



  1 1 

  2 4 

  > str(mydf[keeping, ])

tibble [2 x 2] (S3: tbl_df/tbl/data.frame)

$ alpha: int [1:2] 1 4

$ beta :List of 2

..$ : chr "Hello"

..$ : chr "bye"

 

Now for the bad news. The original request was for ANY column. But presumably 
one way to do it, neither efficiently nor the best, would be to loop on the 
names of all the columns and starting with the original data.frame, whittle 
away at it column by column and adjust which column you search each time until 
what is left had nothing numeric anywhere. 

 

Now if I was using dplyr, I wonder if there is a nice way to use rowwise() to 
evaluate across a row.

 

Using your technique I made the following data.frame:

 

mydf <- data.frame(alpha=I(list("first", 2, 3.3, "Last")), 

   beta=I(list(1, "second", 3.3, "Lasting")))

 

> mydf

alphabeta

1 first   1

2 2  second

3   3.3 3.3

4  Last Lasting

 

Do we agree only the fourth row should be kept as the others have one or two 
numeric values?

 

Here is some code I cobbled together that seems to work:

 

 

rowwise(mydf) %>% 

  mutate(alphazoid=!is.numeric(unlist(alpha)), 

 betazoid=!is.numeric(unlist(beta))) %>%

  filter(alphazoid & betazoid) -> result

 

str(result)  

print(result)

result[[1,1]]

result[[1,2]]

 

as.data.frame(result)

 

The results are shown below that only the fourth row was kept:

 

> rowwise(mydf) %>%

  +   mutate(alphazoid=!is.numeric(unlist(alpha)), 

 +  betazoid=!is.numeric(unlist(beta))) %>%

  +   filter(alphazoid & betazoid) -> result

> 

  > str(result)  

rowwise_df [1 x 4] (S3: rowwise_df/tbl_df/tbl/data.frame)

$ alpha:List of 1

..$ : chr "Last"

..- attr(*, "class")= chr "AsIs"

$ beta :List of 1

..$ : chr "Lasting"

..- attr(*, "class")= chr "AsIs"

$ alphazoid: logi TRUE

$ betazoid : logi TRUE

- attr(*, "groups")= tibble [1 x 1] (S3: tbl_df/tbl/data.frame)

..$ .rows: list [1:1] 

.. ..$ : int 1

.. ..@ ptype: int(0) 

> print(result)

# A tibble: 1 x 4

# Rowwise: 

alpha beta  alphazoid betazoid

> > 

  1   TRUE  TRUE

> result[[1,1]]

[[1]]

[1] "Last"

 

> result[[1,2]]

[[1]]

[1] "Lasting"

 

> as.data.frame(result)

alphabeta alphazoid betazoid

1  Last Lasting  TRUE TRUE

 

Of course, the temporary columns for alphazoid and betazoid can trivially be 
removed.

 

 

 

 

From: Andrew Simmons 
Sent: Wednesday, September 15, 2021 12:44 AM
To: Avi Gross 
Cc: Gregg Powell via R-help 
Subject: Re: [R] How to remove all rows that have a numeric in the first (or 
any) column

 

I'd like to point out that base R can handle a list as a data frame column, 
it's just that you have to make the list of class "AsIs". So in your example

 

temp <- list("Hello", 1, 1.1, "bye")

 

data.frame(alpha = 1:4, beta = I(temp)) 

 

means that column "beta" will still be a list.

 

 

On Wed, Sep 15, 2021, 00:40 Avi Gross via R-help mailto:r-help@r-project.org> > wrote:

Calling something a data.frame does not make it a data.frame.

The abbreviated object shown below is a list of singletons. If it is a column 
in a larger object that is a data.frame, then it is a list column which is 
valid but can be ticklish to handle within base R but less so in the tidyverse.

For example, if I try to make a data.frame the normal way, the list gets made 
into multiple columns and copied to each row. Not what was expected. I think 
some tidyverse functionality does better.

Like this:

library(tidyverse)
temp=list("Hello", 1, 1.1, "bye")

Now making a data.frame has an odd result:

> mydf=data.frame(alpha=1:4, beta=temp)
> mydf
alpha beta..Hello. beta.1 beta.1.1 beta..bye.
1 1Hello  1  1.1bye
2 2Hello  1  1.1bye
3 3Hello  1  1.1bye
4 4Hello  1  1.1bye

But a tibble handles it:

> mydf=tibble(alpha=1:4, 

Re: [R] How to remove all rows that have a numeric in the first (or any) column

2021-09-14 Thread Avi Gross via R-help
You are correct, Gregg, I am aware of that trick of asking something to not be 
evaluated in certain ways.

 

And you can indeed use base R to play with contents of beta as defined above.  
Here is a sort of incremental demo:

 

> sapply(mydf$beta, is.numeric)

[1] FALSE  TRUE  TRUE FALSE

> !sapply(mydf$beta, is.numeric)

[1]  TRUE FALSE FALSE  TRUE

> keeping <- !sapply(mydf$beta, is.numeric)

> mydf[keeping, ]

# A tibble: 2 x 2
  alpha beta     
  <int> <list>   
1     1 <chr [1]>
2     4 <chr [1]>

> str(mydf[keeping, ])

tibble [2 x 2] (S3: tbl_df/tbl/data.frame)

$ alpha: int [1:2] 1 4

$ beta :List of 2

..$ : chr "Hello"

..$ : chr "bye"

 

Now for the bad news. The original request was for ANY column. Presumably one 
way to do it, neither efficient nor the best, would be to loop over the names of 
all the columns and, starting with the original data.frame, whittle away at it 
column by column, adjusting which column you search each time, until what is 
left has nothing numeric anywhere. 
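
A minimal base-R sketch of that idea, skipping the explicit whittling loop. 
Assume df is a data.frame whose columns are all list columns, as in the examples 
in this thread; an ordinary numeric column would flag every one of its rows:

# For each column, test each element: TRUE where that element holds a numeric.
numeric_flags <- lapply(df, function(col) vapply(col, is.numeric, logical(1)))

# A row is dropped if ANY column holds a numeric in that row.
drop_row <- Reduce(`|`, numeric_flags)

df_clean <- df[!drop_row, ]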

 

Now, if I were using dplyr, I wonder whether there is a nice way to use rowwise() 
to evaluate across a row.

 

Using your technique I made the following data.frame:

 

mydf <- data.frame(alpha=I(list("first", 2, 3.3, "Last")), 

   beta=I(list(1, "second", 3.3, "Lasting")))

 

> mydf

  alpha    beta
1 first       1
2     2  second
3   3.3     3.3
4  Last Lasting

 

Do we agree only the fourth row should be kept as the others have one or two 
numeric values?

 

Here is some code I cobbled together that seems to work:

 

 

rowwise(mydf) %>% 

  mutate(alphazoid=!is.numeric(unlist(alpha)), 

 betazoid=!is.numeric(unlist(beta))) %>%

  filter(alphazoid & betazoid) -> result

 

str(result)  

print(result)

result[[1,1]]

result[[1,2]]

 

as.data.frame(result)

 

The results shown below confirm that only the fourth row was kept:

 

> rowwise(mydf) %>% 

  +   mutate(alphazoid=!is.numeric(unlist(alpha)), 

 +  betazoid=!is.numeric(unlist(beta))) %>%

  +   filter(alphazoid & betazoid) -> result

> 

  > str(result)  

rowwise_df [1 x 4] (S3: rowwise_df/tbl_df/tbl/data.frame)

$ alpha:List of 1

..$ : chr "Last"

..- attr(*, "class")= chr "AsIs"

$ beta :List of 1

..$ : chr "Lasting"

..- attr(*, "class")= chr "AsIs"

$ alphazoid: logi TRUE

$ betazoid : logi TRUE

- attr(*, "groups")= tibble [1 x 1] (S3: tbl_df/tbl/data.frame)

..$ .rows: list [1:1] 

.. ..$ : int 1

.. ..@ ptype: int(0) 

> print(result)

# A tibble: 1 x 4
# Rowwise: 
  alpha     beta      alphazoid betazoid
  <list>    <list>    <lgl>     <lgl>   
1 <chr [1]> <chr [1]> TRUE      TRUE    

> result[[1,1]]

[[1]]

[1] "Last"

 

> result[[1,2]]

[[1]]

[1] "Lasting"

 

> as.data.frame(result)

  alpha    beta alphazoid betazoid
1  Last Lasting      TRUE     TRUE

 

Of course, the temporary columns for alphazoid and betazoid can trivially be 
removed.
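
For what it is worth, the same filter can be written without the helper columns 
by mixing base sapply() with dplyr::filter(); this is only a sketch, still 
assuming the mydf built above:

library(dplyr)

# Keep a row only when neither list-column element is numeric.
result <- mydf %>%
  filter(!(sapply(alpha, is.numeric) | sapply(beta, is.numeric)))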

 

 

 

 

From: Andrew Simmons  
Sent: Wednesday, September 15, 2021 12:44 AM
To: Avi Gross 
Cc: Gregg Powell via R-help 
Subject: Re: [R] How to remove all rows that have a numeric in the first (or 
any) column

 

I'd like to point out that base R can handle a list as a data frame column, 
it's just that you have to make the list of class "AsIs". So in your example

 

temp <- list("Hello", 1, 1.1, "bye")

 

data.frame(alpha = 1:4, beta = I(temp)) 

 

means that column "beta" will still be a list.

 

 

On Wed, Sep 15, 2021, 00:40 Avi Gross via R-help mailto:r-help@r-project.org> > wrote:

Calling something a data.frame does not make it a data.frame.

The abbreviated object shown below is a list of singletons. If it is a column 
in a larger object that is a data.frame, then it is a list column which is 
valid but can be ticklish to handle within base R but less so in the tidyverse.

For example, if I try to make a data.frame the normal way, the list gets made 
into multiple columns and copied to each row. Not what was expected. I think 
some tidyverse functionality does better.

Like this:

library(tidyverse)
temp=list("Hello", 1, 1.1, "bye")

Now making a data.frame has an odd result:

> mydf=data.frame(alpha=1:4, beta=temp)
> mydf
alpha beta..Hello. beta.1 beta.1.1 beta..bye.
1 1Hello  1  1.1bye
2 2Hello  1  1.1bye
3 3Hello  1  1.1bye
4 4Hello  1  1.1bye

But a tibble handles it:

> mydf=tibble(alpha=1:4, beta=temp)
> mydf
# A tibble: 4 x 2
alpha beta 

  1 1 
  2 2 
  3 3 
  4 4 

So if the data does look like this, with a list column, but access can be 
tricky as subsetting a list with [] returns a list and you need [[]].

I found a somehwhat odd solution like this:

mydf %>%
   filter(!map_lgl(beta, is.numeric)) -> mydf2
# A tibble: 2 x 2
alpha beta 

  1 1 
  2 4 

When I saved that result into mydf2, I got this.

Original:

  > str(mydf)
tibble [4 x 2] (S3: tbl_df/tbl/data.frame)
$ alpha: int [1:4] 1 2 3 4
$ beta :List of 4
..$ : chr 

Re: [R] How to remove all rows that have a numeric in the first (or any) column

2021-09-14 Thread Andrew Simmons
I'd like to point out that base R can handle a list as a data frame column,
it's just that you have to make the list of class "AsIs". So in your example

temp <- list("Hello", 1, 1.1, "bye")

data.frame(alpha = 1:4, beta = I(temp))

means that column "beta" will still be a list.
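
A quick check of that claim (just a sketch; the object names are mine):

temp <- list("Hello", 1, 1.1, "bye")
d <- data.frame(alpha = 1:4, beta = I(temp))

class(d$beta)          # "AsIs" -- still a plain list underneath
sapply(d$beta, class)  # "character" "numeric" "numeric" "character"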



On Wed, Sep 15, 2021, 00:40 Avi Gross via R-help 
wrote:

> Calling something a data.frame does not make it a data.frame.
>
> The abbreviated object shown below is a list of singletons. If it is a
> column in a larger object that is a data.frame, then it is a list column
> which is valid but can be ticklish to handle within base R but less so in
> the tidyverse.
>
> For example, if I try to make a data.frame the normal way, the list gets
> made into multiple columns and copied to each row. Not what was expected. I
> think some tidyverse functionality does better.
>
> Like this:
>
> library(tidyverse)
> temp=list("Hello", 1, 1.1, "bye")
>
> Now making a data.frame has an odd result:
>
> > mydf=data.frame(alpha=1:4, beta=temp)
> > mydf
> alpha beta..Hello. beta.1 beta.1.1 beta..bye.
> 1 1Hello  1  1.1bye
> 2 2Hello  1  1.1bye
> 3 3Hello  1  1.1bye
> 4 4Hello  1  1.1bye
>
> But a tibble handles it:
>
> > mydf=tibble(alpha=1:4, beta=temp)
> > mydf
> # A tibble: 4 x 2
> alpha beta
>  
>   1 1 
>   2 2 
>   3 3 
>   4 4 
>
> So if the data does look like this, with a list column, but access can be
> tricky as subsetting a list with [] returns a list and you need [[]].
>
> I found a somehwhat odd solution like this:
>
> mydf %>%
>filter(!map_lgl(beta, is.numeric)) -> mydf2
> # A tibble: 2 x 2
> alpha beta
>  
>   1 1 
>   2 4 
>
> When I saved that result into mydf2, I got this.
>
> Original:
>
>   > str(mydf)
> tibble [4 x 2] (S3: tbl_df/tbl/data.frame)
> $ alpha: int [1:4] 1 2 3 4
> $ beta :List of 4
> ..$ : chr "Hello"
> ..$ : num 1
> ..$ : num 1.1
> ..$ : chr "bye"
>
> Output when any row with a numeric is removed:
>
> > str(mydf2)
> tibble [2 x 2] (S3: tbl_df/tbl/data.frame)
> $ alpha: int [1:2] 1 4
> $ beta :List of 2
> ..$ : chr "Hello"
> ..$ : chr "bye"
>
> So if you try variations on your code motivated by what I show, good luck.
> I am sure there are many better ways but I repeat, it can be tricky.
>
> -Original Message-
> From: R-help  On Behalf Of Jeff Newmiller
> Sent: Tuesday, September 14, 2021 11:54 PM
> To: Gregg Powell 
> Cc: Gregg Powell via R-help 
> Subject: Re: [R] How to remove all rows that have a numeric in the first
> (or any) column
>
> You cannot apply vectorized operators to list columns... you have to use a
> map function like sapply or purrr::map_lgl to obtain a logical vector by
> running the function once for each list element:
>
> sapply( VPN_Sheet1$HVA, is.numeric )
>
> On September 14, 2021 8:38:35 PM PDT, Gregg Powell <
> g.a.pow...@protonmail.com> wrote:
> >Here is the output:
> >
> >> str(VPN_Sheet1$HVA)
> >List of 2174
> > $ : chr "Email: f...@fff.com"
> > $ : num 1
> > $ : chr "Eloisa Libas"
> > $ : chr "Percival Esquejo"
> > $ : chr "Louchelle Singh"
> > $ : num 2
> > $ : chr "Charisse Anne Tabarno, RN"
> > $ : chr "Sol Amor Mucoy"
> > $ : chr "Josan Moira Paler"
> > $ : num 3
> > $ : chr "Anna Katrina V. Alberto"
> > $ : chr "Nenita Velarde"
> > $ : chr "Eunice Arrances"
> > $ : num 4
> > $ : chr "Catherine Henson"
> > $ : chr "Maria Carla Daya"
> > $ : chr "Renee Ireine Alit"
> > $ : num 5
> > $ : chr "Marol Joseph Domingo - PS"
> > $ : chr "Kissy Andrea Arriesgado"
> > $ : chr "Pia B Baluyut, RN"
> > $ : num 6
> > $ : chr "Gladys Joy Tan"
> > $ : chr "Frances Zarzua"
> > $ : chr "Fairy Jane Nery"
> > $ : num 7
> > $ : chr "Gladys Tijam, RMT"
> > $ : chr "Sarah Jane Aramburo"
> > $ : chr "Eve Mendoza"
> > $ : num 8
> > $ : chr "Gloria Padolino"
> > $ : chr "Joyce Pearl Javier"
> > $ : chr "Ayza Padilla"
> > $ : num 9
> > $ : chr "Walfredson Calderon"
> > $ : chr "Stephanie Anne Militante"
> > $ : chr "Rennua Oquilan"
> > $ : num 10
> > $ : chr "Neil John Nery"
> > $ : chr "Maria Reyna Reyes"
> > $ : chr "Rowella Villegas"
> > $ : num 11
> > $ : chr "Katelyn Mendiola"
> > $ : chr "Maria Riza Mariano"
> > $ : chr "Marie Vallianne Carantes"
> > $ : num 12
> >
> >‐‐‐ Original Message ‐‐‐
> >
> >On Tuesday, September 14th, 2021 at 8:32 PM, Jeff Newmiller <
> jdnew...@dcn.davis.ca.us> wrote:
> >
> >> An atomic column of data by design has exactly one mode, so if any
> >> values are non-numeric then the entire column will be non-numeric.
> >> What does
> >>
> >
> >> str(VPN_Sheet1$HVA)
> >>
> >
> >> tell you? It is likely either a factor or character data.
> >>
> >
> >> On September 14, 2021 7:01:53 PM PDT, Gregg Powell via R-help
> r-help@r-project.org wrote:
> >>
> >
> >> > > Stuck on this problem - How does one remove all rows in a dataframe
> that have a numeric in the first (or any) column?
> >> >
> >
> >> > > Seems straight forward 

Re: [R] How to remove all rows that have a numeric in the first (or any) column

2021-09-14 Thread Avi Gross via R-help
Calling something a data.frame does not make it a data.frame.

The abbreviated object shown below is a list of singletons. If it is a column 
in a larger object that is a data.frame, then it is a list column, which is 
valid but can be ticklish to handle in base R, less so in the tidyverse.

For example, if I try to make a data.frame the normal way, the list gets made 
into multiple columns and copied to each row. Not what was expected. I think 
some tidyverse functionality does better.

Like this:

library(tidyverse)
temp=list("Hello", 1, 1.1, "bye")

Now making a data.frame has an odd result:

> mydf=data.frame(alpha=1:4, beta=temp)
> mydf
  alpha beta..Hello. beta.1 beta.1.1 beta..bye.
1     1        Hello      1      1.1        bye
2     2        Hello      1      1.1        bye
3     3        Hello      1      1.1        bye
4     4        Hello      1      1.1        bye

But a tibble handles it:

> mydf=tibble(alpha=1:4, beta=temp)
> mydf
# A tibble: 4 x 2
  alpha beta     
  <int> <list>   
1     1 <chr [1]>
2     2 <dbl [1]>
3     3 <dbl [1]>
4     4 <chr [1]>

So the data may well look like this, with a list column, but access can be 
tricky, as subsetting a list with [] returns a list and you need [[]] to get at 
the elements.

I found a somewhat odd solution like this:

mydf %>%
   filter(!map_lgl(beta, is.numeric)) -> mydf2
# A tibble: 2 x 2
  alpha beta     
  <int> <list>   
1     1 <chr [1]>
2     4 <chr [1]>

When I saved that result into mydf2, I got this.

Original:
  
  > str(mydf)
tibble [4 x 2] (S3: tbl_df/tbl/data.frame)
$ alpha: int [1:4] 1 2 3 4
$ beta :List of 4
..$ : chr "Hello"
..$ : num 1
..$ : num 1.1
..$ : chr "bye"

Output when any row with a numeric is removed:

> str(mydf2)
tibble [2 x 2] (S3: tbl_df/tbl/data.frame)
$ alpha: int [1:2] 1 4
$ beta :List of 2
..$ : chr "Hello"
..$ : chr "bye"

So if you try variations on your code motivated by what I show, good luck. I am 
sure there are many better ways but I repeat, it can be tricky.

-Original Message-
From: R-help  On Behalf Of Jeff Newmiller
Sent: Tuesday, September 14, 2021 11:54 PM
To: Gregg Powell 
Cc: Gregg Powell via R-help 
Subject: Re: [R] How to remove all rows that have a numeric in the first (or 
any) column

You cannot apply vectorized operators to list columns... you have to use a map 
function like sapply or purrr::map_lgl to obtain a logical vector by running 
the function once for each list element:

sapply( VPN_Sheet1$HVA, is.numeric )

On September 14, 2021 8:38:35 PM PDT, Gregg Powell  
wrote:
>Here is the output:
>
>> str(VPN_Sheet1$HVA)
>List of 2174
> $ : chr "Email: f...@fff.com"
> $ : num 1
> $ : chr "Eloisa Libas"
> $ : chr "Percival Esquejo"
> $ : chr "Louchelle Singh"
> $ : num 2
> $ : chr "Charisse Anne Tabarno, RN"
> $ : chr "Sol Amor Mucoy"
> $ : chr "Josan Moira Paler"
> $ : num 3
> $ : chr "Anna Katrina V. Alberto"
> $ : chr "Nenita Velarde"
> $ : chr "Eunice Arrances"
> $ : num 4
> $ : chr "Catherine Henson"
> $ : chr "Maria Carla Daya"
> $ : chr "Renee Ireine Alit"
> $ : num 5
> $ : chr "Marol Joseph Domingo - PS"
> $ : chr "Kissy Andrea Arriesgado"
> $ : chr "Pia B Baluyut, RN"
> $ : num 6
> $ : chr "Gladys Joy Tan"
> $ : chr "Frances Zarzua"
> $ : chr "Fairy Jane Nery"
> $ : num 7
> $ : chr "Gladys Tijam, RMT"
> $ : chr "Sarah Jane Aramburo"
> $ : chr "Eve Mendoza"
> $ : num 8
> $ : chr "Gloria Padolino"
> $ : chr "Joyce Pearl Javier"
> $ : chr "Ayza Padilla"
> $ : num 9
> $ : chr "Walfredson Calderon"
> $ : chr "Stephanie Anne Militante"
> $ : chr "Rennua Oquilan"
> $ : num 10
> $ : chr "Neil John Nery"
> $ : chr "Maria Reyna Reyes"
> $ : chr "Rowella Villegas"
> $ : num 11
> $ : chr "Katelyn Mendiola"
> $ : chr "Maria Riza Mariano"
> $ : chr "Marie Vallianne Carantes"
> $ : num 12
>
>‐‐‐ Original Message ‐‐‐
>
>On Tuesday, September 14th, 2021 at 8:32 PM, Jeff Newmiller 
> wrote:
>
>> An atomic column of data by design has exactly one mode, so if any 
>> values are non-numeric then the entire column will be non-numeric. 
>> What does
>> 
>
>> str(VPN_Sheet1$HVA)
>> 
>
>> tell you? It is likely either a factor or character data.
>> 
>
>> On September 14, 2021 7:01:53 PM PDT, Gregg Powell via R-help 
>> r-help@r-project.org wrote:
>> 
>
>> > > Stuck on this problem - How does one remove all rows in a dataframe that 
>> > > have a numeric in the first (or any) column?
>> > 
>
>> > > Seems straight forward - but I'm having trouble.
>> > 
>
>> > I've attempted to used:
>> > 
>
>> > VPN_Sheet1 <- VPN_Sheet1[!is.numeric(VPN_Sheet1$HVA),]
>> > 
>
>> > and
>> > 
>
>> > VPN_Sheet1 <- VPN_Sheet1[!is.integer(VPN_Sheet1$HVA),]
>> > 
>
>> > Neither work - Neither throw an error.
>> > 
>
>> > class(VPN_Sheet1$HVA) returns:
>> > 
>
>> > [1] "list"
>> > 
>
>> > So, the HVA column returns a list.
>> > 
>
>> > > Data looks like the attached screen grab -
>> > 
>
>> > > The ONLY rows I need to delete are the rows where there is a numeric in 
>> > > the HVA column.
>> > 
>
>> > > There are some 5000+ rows in the actual data.
>> > 
>
>> > > Would be grateful for a solution 

Re: [R] How to remove all rows that have a numeric in the first (or any) column

2021-09-14 Thread Jeff Newmiller
You cannot apply vectorized operators to list columns... you have to use a map 
function like sapply or purrr::map_lgl to obtain a logical vector by running 
the function once for each list element:

sapply( VPN_Sheet1$HVA, is.numeric )
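
As a sketch of how that logical vector then gets used (assuming VPN_Sheet1 is a 
data frame whose HVA column is the list shown below):

keep <- !sapply(VPN_Sheet1$HVA, is.numeric)  # TRUE where the entry is not a number
VPN_Sheet1 <- VPN_Sheet1[keep, ]             # drop the rows that hold numerics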

On September 14, 2021 8:38:35 PM PDT, Gregg Powell  
wrote:
>Here is the output:
>
>> str(VPN_Sheet1$HVA)
>List of 2174
> $ : chr "Email: f...@fff.com"
> $ : num 1
> $ : chr "Eloisa Libas"
> $ : chr "Percival Esquejo"
> $ : chr "Louchelle Singh"
> $ : num 2
> $ : chr "Charisse Anne Tabarno, RN"
> $ : chr "Sol Amor Mucoy"
> $ : chr "Josan Moira Paler"
> $ : num 3
> $ : chr "Anna Katrina V. Alberto"
> $ : chr "Nenita Velarde"
> $ : chr "Eunice Arrances"
> $ : num 4
> $ : chr "Catherine Henson"
> $ : chr "Maria Carla Daya"
> $ : chr "Renee Ireine Alit"
> $ : num 5
> $ : chr "Marol Joseph Domingo - PS"
> $ : chr "Kissy Andrea Arriesgado"
> $ : chr "Pia B Baluyut, RN"
> $ : num 6
> $ : chr "Gladys Joy Tan"
> $ : chr "Frances Zarzua"
> $ : chr "Fairy Jane Nery"
> $ : num 7
> $ : chr "Gladys Tijam, RMT"
> $ : chr "Sarah Jane Aramburo"
> $ : chr "Eve Mendoza"
> $ : num 8
> $ : chr "Gloria Padolino"
> $ : chr "Joyce Pearl Javier"
> $ : chr "Ayza Padilla"
> $ : num 9
> $ : chr "Walfredson Calderon"
> $ : chr "Stephanie Anne Militante"
> $ : chr "Rennua Oquilan"
> $ : num 10
> $ : chr "Neil John Nery"
> $ : chr "Maria Reyna Reyes"
> $ : chr "Rowella Villegas"
> $ : num 11
> $ : chr "Katelyn Mendiola"
> $ : chr "Maria Riza Mariano"
> $ : chr "Marie Vallianne Carantes"
> $ : num 12
>
>‐‐‐ Original Message ‐‐‐
>
>On Tuesday, September 14th, 2021 at 8:32 PM, Jeff Newmiller 
> wrote:
>
>> An atomic column of data by design has exactly one mode, so if any values 
>> are non-numeric then the entire column will be non-numeric. What does
>> 
>
>> str(VPN_Sheet1$HVA)
>> 
>
>> tell you? It is likely either a factor or character data.
>> 
>
>> On September 14, 2021 7:01:53 PM PDT, Gregg Powell via R-help 
>> r-help@r-project.org wrote:
>> 
>
>> > > Stuck on this problem - How does one remove all rows in a dataframe that 
>> > > have a numeric in the first (or any) column?
>> > 
>
>> > > Seems straight forward - but I'm having trouble.
>> > 
>
>> > I've attempted to used:
>> > 
>
>> > VPN_Sheet1 <- VPN_Sheet1[!is.numeric(VPN_Sheet1$HVA),]
>> > 
>
>> > and
>> > 
>
>> > VPN_Sheet1 <- VPN_Sheet1[!is.integer(VPN_Sheet1$HVA),]
>> > 
>
>> > Neither work - Neither throw an error.
>> > 
>
>> > class(VPN_Sheet1$HVA) returns:
>> > 
>
>> > [1] "list"
>> > 
>
>> > So, the HVA column returns a list.
>> > 
>
>> > > Data looks like the attached screen grab -
>> > 
>
>> > > The ONLY rows I need to delete are the rows where there is a numeric in 
>> > > the HVA column.
>> > 
>
>> > > There are some 5000+ rows in the actual data.
>> > 
>
>> > > Would be grateful for a solution to this problem.
>> > 
>
>> > How to get R to detect whether the value in column 1 is a number so the 
>> > rows with the number values can be deleted?
>> > 
>
>> > > Thanks in advance to any and all willing to help on this problem.
>> > 
>
>> > > Gregg Powell
>> > 
>
>> > > Sierra Vista, AZ
>> 
>
>> --
>> 
>
>> Sent from my phone. Please excuse my brevity.
-- 
Sent from my phone. Please excuse my brevity.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Evaluating lazily 'f<-' ?

2021-09-14 Thread Andrew Simmons
names(x) <- c("some names")

is different from

`names<-`(x, value = c("some names"))

because the second piece of code does not ever call `<-`. The first piece
of code is (approximately) equivalent to

`*tmp*` <- x
`*tmp*` <- `names<-`(`*tmp*`, value = c("some names"))
x <- `*tmp*`

Another example,

y <- `names<-`(x, value = c("some names"))

now y will be equivalent to what x would become if we did

names(x) <- c("some names")

except that the first will not update x, it will still have its old names.
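
A small runnable illustration of that last point (a sketch; x and y are
arbitrary names):

x <- 1:2
y <- `names<-`(x, value = c("a", "b"))
names(y)  # "a" "b"
names(x)  # NULL -- x itself was never updated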

On Mon, Sep 13, 2021 at 4:33 PM Leonard Mada  wrote:

>
> On 9/13/2021 11:28 PM, Andrew Simmons wrote:
>
> In the example you gave : r(x) <- 1
> r(x) is never evaluated, the above calls `r<-`,
> in fact r does not even have to be an existing function.
>
>
> I meant:
>
> '*tmp*' <- x; # "x" is evaluated here;
>
> 'r<-' is called after this step, which makes sense in the case of
> subsetting;
>
>
> But I am wondering if changing this behaviour, when NO subsetting is
> performed, would have any impact.
>
> e.g. names(x) = c("some names");
>
> # would it have any impact to skip the evaluation of "x" and call directly:
>
> 'names<-'(x, value);
>
>
> Leonard
>
>
>
> On Mon, Sep 13, 2021, 16:18 Leonard Mada  wrote:
>
>> Hello,
>>
>>
>> I have found the evaluation: it is described in the section on
>> subsetting. The forced evaluation makes sense for subsetting.
>>
>>
>> On 9/13/2021 9:42 PM, Leonard Mada wrote:
>>
>> Hello Andrew,
>>
>>
>> I try now to understand the evaluation of the expression:
>>
>> e = expression(r(x) <- 1)
>>
>> # parameter named "value" seems to be required;
>> 'r<-' = function(x, value) {print("R");}
>> eval(e, list(x=2))
>> # [1] "R"
>>
>> # both versions work
>> 'r<-' = function(value, x) {print("R");}
>> eval(e, list(x=2))
>> # [1] "R"
>>
>>
>> ### the Expression
>> e[[1]][[1]] # "<-", not "r<-"
>> e[[1]][[2]] # "r(x)"
>>
>>
>> The evaluation of "e" somehow calls "r<-", but evaluates also the
>> argument of r(...). I am still investigating what is actually happening.
>>
>>
>> The forced evaluation is relevant for subsetting, e.g.:
>> expression(r(x)[3] <- 1)
>> expression(r(x)[3] <- 1)[[1]][[2]]
>> # r(x)[3] # the evaluation details are NOT visible in the expression per
>> se;
>> # Note: indeed, it makes sens to first evaluate r(x) and then to perform
>> the subsetting;
>>
>>
>> However, in the case of a non-subsetted expression:
>> r(x) <- 1;
>> It would make sense to evaluate lazily r(x) if no subsetting is involved
>> (more precisely "r<-"(x, value) ).
>>
>> Would this have any impact on the current code?
>>
>>
>> Sincerely,
>>
>>
>> Leonard
>>
>>
>>
>> Sincerely,
>>
>>
>> Leonard
>>
>>
>> On 9/13/2021 9:15 PM, Andrew Simmons wrote:
>>
>> R's parser doesn't work the way you're expecting it to. When doing an
>> assignment like:
>>
>>
>> padding(right(df)) <- 1
>>
>>
>> it is broken into small stages. The guide "R Language Definition" claims
>> that the above would be equivalent to:
>>
>>
>> `<-`(df, `padding<-`(df, value = `right<-`(padding(df), value = 1)))
>>
>>
>> but that is not correct, and you can tell by using `substitute` as you
>> were above. There isn't a way to do what you want with the syntax you
>> provided, you'll have to do something different. You could add a `which`
>> argument to each style function, and maybe put the code for `match.arg` in
>> a separate function:
>>
>>
>> match.which <- function (which)
>> match.arg(which, c("bottom", "left", "top", "right"), several.ok = TRUE)
>>
>>
>> padding <- function (x, which)
>> {
>> which <- match.which(which)
>> # more code
>> }
>>
>>
>> border <- function (x, which)
>> {
>> which <- match.which(which)
>> # more code
>> }
>>
>>
>> some_other_style <- function (x, which)
>> {
>> which <- match.which(which)
>> # more code
>> }
>>
>>
>> I hope this helps.
>>
>> On Mon, Sep 13, 2021 at 12:17 PM Leonard Mada  wrote:
>>
>>> Hello Andrew,
>>>
>>>
>>> this could work. I will think about it.
>>>
>>> But I was thinking more generically. Suppose we have a series of
>>> functions:
>>> padding(), border(), some_other_style();
>>> Each of these functions has the parameter "right" (or the group of
>>> parameters c("right", ...)).
>>>
>>>
>>> Then I could design a function right(FUN) that assigns the value to this
>>> parameter and evaluates the function FUN().
>>>
>>>
>>> There are a few ways to do this:
>>> 1.) Other parameters as ...
>>> right(FUN, value, ...) = value; and then pass "..." to FUN.
>>> right(value, FUN, ...) = value; # or is this the syntax? (TODO: explore)
>>>
>>> 2.) Another way:
>>> right(FUN(...other parameters already specified...)) = value;
>>> I wanted to explore this 2nd option: but avoid evaluating FUN, unless
>>> the parameter "right" is injected into the call.
>>>
>>> 3.) Option 3:
>>> The option you mentioned.
>>>
>>>
>>> Independent of the method: there are still weird/unexplained behaviours
>>> when I try the initial code (see the latest mail with the improved code).
>>>
>>>
>>> Sincerely,
>>>
>>>

Re: [R] How to remove all rows that have a numeric in the first (or any) column

2021-09-14 Thread Andrew Simmons
'is.numeric' is a function that returns whether its input is a numeric
vector. It looks like what you want to do is

VPN_Sheet1 <- VPN_Sheet1[!vapply(VPN_Sheet1$HVA, "is.numeric", NA), ]

instead of

VPN_Sheet1 <- VPN_Sheet1[!is.numeric(VPN_Sheet1$HVA), ]

I hope this helps, and see ?vapply if necessary.
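
To see why the original attempt silently did nothing useful, compare the two
calls on the kind of list involved here (a sketch using a toy list):

temp <- list("Hello", 1, 1.1, "bye")

is.numeric(temp)              # FALSE -- one answer for the list as a whole
vapply(temp, is.numeric, NA)  # FALSE TRUE TRUE FALSE -- one answer per element

That single FALSE is why VPN_Sheet1[!is.numeric(VPN_Sheet1$HVA), ] quietly keeps
every row: !FALSE is TRUE, and a length-one TRUE index selects all rows.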

On Tue, Sep 14, 2021 at 11:42 PM Gregg Powell via R-help <
r-help@r-project.org> wrote:

> Here is the output:
>
> > str(VPN_Sheet1$HVA)
> List of 2174
>  $ : chr "Email: f...@fff.com"
>  $ : num 1
>  $ : chr "Eloisa Libas"
>  $ : chr "Percival Esquejo"
>  $ : chr "Louchelle Singh"
>  $ : num 2
>  $ : chr "Charisse Anne Tabarno, RN"
>  $ : chr "Sol Amor Mucoy"
>  $ : chr "Josan Moira Paler"
>  $ : num 3
>  $ : chr "Anna Katrina V. Alberto"
>  $ : chr "Nenita Velarde"
>  $ : chr "Eunice Arrances"
>  $ : num 4
>  $ : chr "Catherine Henson"
>  $ : chr "Maria Carla Daya"
>  $ : chr "Renee Ireine Alit"
>  $ : num 5
>  $ : chr "Marol Joseph Domingo - PS"
>  $ : chr "Kissy Andrea Arriesgado"
>  $ : chr "Pia B Baluyut, RN"
>  $ : num 6
>  $ : chr "Gladys Joy Tan"
>  $ : chr "Frances Zarzua"
>  $ : chr "Fairy Jane Nery"
>  $ : num 7
>  $ : chr "Gladys Tijam, RMT"
>  $ : chr "Sarah Jane Aramburo"
>  $ : chr "Eve Mendoza"
>  $ : num 8
>  $ : chr "Gloria Padolino"
>  $ : chr "Joyce Pearl Javier"
>  $ : chr "Ayza Padilla"
>  $ : num 9
>  $ : chr "Walfredson Calderon"
>  $ : chr "Stephanie Anne Militante"
>  $ : chr "Rennua Oquilan"
>  $ : num 10
>  $ : chr "Neil John Nery"
>  $ : chr "Maria Reyna Reyes"
>  $ : chr "Rowella Villegas"
>  $ : num 11
>  $ : chr "Katelyn Mendiola"
>  $ : chr "Maria Riza Mariano"
>  $ : chr "Marie Vallianne Carantes"
>  $ : num 12
>
> ‐‐‐ Original Message ‐‐‐
>
> On Tuesday, September 14th, 2021 at 8:32 PM, Jeff Newmiller <
> jdnew...@dcn.davis.ca.us> wrote:
>
> > An atomic column of data by design has exactly one mode, so if any
> values are non-numeric then the entire column will be non-numeric. What does
> >
>
> > str(VPN_Sheet1$HVA)
> >
>
> > tell you? It is likely either a factor or character data.
> >
>
> > On September 14, 2021 7:01:53 PM PDT, Gregg Powell via R-help
> r-help@r-project.org wrote:
> >
>
> > > > Stuck on this problem - How does one remove all rows in a dataframe
> that have a numeric in the first (or any) column?
> > >
>
> > > > Seems straight forward - but I'm having trouble.
> > >
>
> > > I've attempted to used:
> > >
>
> > > VPN_Sheet1 <- VPN_Sheet1[!is.numeric(VPN_Sheet1$HVA),]
> > >
>
> > > and
> > >
>
> > > VPN_Sheet1 <- VPN_Sheet1[!is.integer(VPN_Sheet1$HVA),]
> > >
>
> > > Neither work - Neither throw an error.
> > >
>
> > > class(VPN_Sheet1$HVA) returns:
> > >
>
> > > [1] "list"
> > >
>
> > > So, the HVA column returns a list.
> > >
>
> > > > Data looks like the attached screen grab -
> > >
>
> > > > The ONLY rows I need to delete are the rows where there is a numeric
> in the HVA column.
> > >
>
> > > > There are some 5000+ rows in the actual data.
> > >
>
> > > > Would be grateful for a solution to this problem.
> > >
>
> > > How to get R to detect whether the value in column 1 is a number so
> the rows with the number values can be deleted?
> > >
>
> > > > Thanks in advance to any and all willing to help on this problem.
> > >
>
> > > > Gregg Powell
> > >
>
> > > > Sierra Vista, AZ
> >
>
> > --
> >
>
> > Sent from my phone. Please excuse my
> brevity.__
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to remove all rows that have a numeric in the first (or any) column

2021-09-14 Thread Rolf Turner


On Wed, 15 Sep 2021 02:01:53 +
Gregg Powell via R-help  wrote:

> > Stuck on this problem - How does one remove all rows in a dataframe
> > that have a numeric in the first (or any) column?
> > 
> 
> > Seems straight forward - but I'm having trouble.
> > 
> 
> 
> I've attempted to used:
> 
> VPN_Sheet1 <- VPN_Sheet1[!is.numeric(VPN_Sheet1$HVA),]
> 
> and
> 
> VPN_Sheet1 <- VPN_Sheet1[!is.integer(VPN_Sheet1$HVA),]
> 
> Neither work - Neither throw an error.
> 
> class(VPN_Sheet1$HVA)  returns:
> [1] "list"
> 
> So, the HVA column returns a list.

Do you mean that the HVA column *is* a list? It probably shouldn't be.
It seems very likely that your data are all screwed up.  The first
thing to do is get your data properly organised.  That could be
difficult since you have apparently read them in from an Excel file,
and Excel is a recipe for disaster.

> 
> >
> > Data looks like the attached screen grab -

No attachment came through.  Do read the posting guide.  Most
attachments are stripped by the mail handler.

> > The ONLY rows I need to delete are the rows where there is a
> > numeric in the HVA column.
> > 
> 
> > There are some 5000+ rows in the actual data.
> > 
> 
> > Would be grateful for a solution to this problem.
> 
> How to get R to detect whether the value in column 1 is a number so
> the rows with the number values can be deleted?
> > 
> 
> > Thanks in advance to any and all willing to help on this problem.
> > 
> 
> > Gregg Powell
> > 
> 
> > Sierra Vista, AZ

If there are any non-numeric entries in a column then they *all* have
to be non-numeric.  Some of them *may* be interpretable as being
numeric.

If you apply as.numeric() to a column you'll get NA's for all entries
that *cannot* be interpreted as numeric. So you may want to do something
like (untested, of course):

ok <- is.na(as.numeric(X[,"HVA"]))
X  <- X[ok,]

where "X" is the data frame that you are dealing with.
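
One hedged refinement of that sketch: as.numeric() will warn about every entry
it cannot convert, and if HVA really is a list column it needs to be flattened
first (assuming each HVA entry holds a single value, as Gregg's str() output
elsewhere in the thread suggests). Equally untested:

vals <- unlist(X[["HVA"]])                         # flatten the list column to character
ok   <- is.na(suppressWarnings(as.numeric(vals)))  # TRUE where the entry is not number-like
X    <- X[ok, ]

As with the original, any text entry that merely looks like a number would also
be dropped.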

Good luck.

cheers,

Rolf Turner

-- 
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to remove all rows that have a numeric in the first (or any) column

2021-09-14 Thread Gregg Powell via R-help
Here is the output:

> str(VPN_Sheet1$HVA)
List of 2174
 $ : chr "Email: f...@fff.com"
 $ : num 1
 $ : chr "Eloisa Libas"
 $ : chr "Percival Esquejo"
 $ : chr "Louchelle Singh"
 $ : num 2
 $ : chr "Charisse Anne Tabarno, RN"
 $ : chr "Sol Amor Mucoy"
 $ : chr "Josan Moira Paler"
 $ : num 3
 $ : chr "Anna Katrina V. Alberto"
 $ : chr "Nenita Velarde"
 $ : chr "Eunice Arrances"
 $ : num 4
 $ : chr "Catherine Henson"
 $ : chr "Maria Carla Daya"
 $ : chr "Renee Ireine Alit"
 $ : num 5
 $ : chr "Marol Joseph Domingo - PS"
 $ : chr "Kissy Andrea Arriesgado"
 $ : chr "Pia B Baluyut, RN"
 $ : num 6
 $ : chr "Gladys Joy Tan"
 $ : chr "Frances Zarzua"
 $ : chr "Fairy Jane Nery"
 $ : num 7
 $ : chr "Gladys Tijam, RMT"
 $ : chr "Sarah Jane Aramburo"
 $ : chr "Eve Mendoza"
 $ : num 8
 $ : chr "Gloria Padolino"
 $ : chr "Joyce Pearl Javier"
 $ : chr "Ayza Padilla"
 $ : num 9
 $ : chr "Walfredson Calderon"
 $ : chr "Stephanie Anne Militante"
 $ : chr "Rennua Oquilan"
 $ : num 10
 $ : chr "Neil John Nery"
 $ : chr "Maria Reyna Reyes"
 $ : chr "Rowella Villegas"
 $ : num 11
 $ : chr "Katelyn Mendiola"
 $ : chr "Maria Riza Mariano"
 $ : chr "Marie Vallianne Carantes"
 $ : num 12

‐‐‐ Original Message ‐‐‐

On Tuesday, September 14th, 2021 at 8:32 PM, Jeff Newmiller 
 wrote:

> An atomic column of data by design has exactly one mode, so if any values are 
> non-numeric then the entire column will be non-numeric. What does
> 

> str(VPN_Sheet1$HVA)
> 

> tell you? It is likely either a factor or character data.
> 

> On September 14, 2021 7:01:53 PM PDT, Gregg Powell via R-help 
> r-help@r-project.org wrote:
> 

> > > Stuck on this problem - How does one remove all rows in a dataframe that 
> > > have a numeric in the first (or any) column?
> > 

> > > Seems straight forward - but I'm having trouble.
> > 

> > I've attempted to used:
> > 

> > VPN_Sheet1 <- VPN_Sheet1[!is.numeric(VPN_Sheet1$HVA),]
> > 

> > and
> > 

> > VPN_Sheet1 <- VPN_Sheet1[!is.integer(VPN_Sheet1$HVA),]
> > 

> > Neither work - Neither throw an error.
> > 

> > class(VPN_Sheet1$HVA) returns:
> > 

> > [1] "list"
> > 

> > So, the HVA column returns a list.
> > 

> > > Data looks like the attached screen grab -
> > 

> > > The ONLY rows I need to delete are the rows where there is a numeric in 
> > > the HVA column.
> > 

> > > There are some 5000+ rows in the actual data.
> > 

> > > Would be grateful for a solution to this problem.
> > 

> > How to get R to detect whether the value in column 1 is a number so the 
> > rows with the number values can be deleted?
> > 

> > > Thanks in advance to any and all willing to help on this problem.
> > 

> > > Gregg Powell
> > 

> > > Sierra Vista, AZ
> 

> --
> 

> Sent from my phone. Please excuse my brevity.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to remove all rows that have a numeric in the first (or any) column

2021-09-14 Thread Jeff Newmiller
An atomic column of data by design has exactly one mode, so if _any_ values are 
non-numeric then the entire column will be non-numeric. What does

str(VPN_Sheet1$HVA)

tell you? It is likely either a factor or character data.
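
A tiny illustration of that single-mode coercion (a sketch):

x <- c(1, 2, "three")  # mixing numbers and text in one atomic vector
mode(x)                # "character" -- every element is now a string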

On September 14, 2021 7:01:53 PM PDT, Gregg Powell via R-help 
 wrote:
>
>
>
>> Stuck on this problem - How does one remove all rows in a dataframe that 
>> have a numeric in the first (or any) column?
>> 
>
>> Seems straight forward - but I'm having trouble.
>> 
>
>
>I've attempted to used:
>
>VPN_Sheet1 <- VPN_Sheet1[!is.numeric(VPN_Sheet1$HVA),]
>
>and
>
>VPN_Sheet1 <- VPN_Sheet1[!is.integer(VPN_Sheet1$HVA),]
>
>Neither work - Neither throw an error.
>
>class(VPN_Sheet1$HVA)  returns:
>[1] "list"
>
>So, the HVA column returns a list.
>
>>
>> Data looks like the attached screen grab -
>> 
>
>> The ONLY rows I need to delete are the rows where there is a numeric in the 
>> HVA column.
>> 
>
>> There are some 5000+ rows in the actual data.
>> 
>
>> Would be grateful for a solution to this problem.
>
>How to get R to detect whether the value in column 1 is a number so the rows 
>with the number values can be deleted?
>> 
>
>> Thanks in advance to any and all willing to help on this problem.
>> 
>
>> Gregg Powell
>> 
>
>> Sierra Vista, AZ
-- 
Sent from my phone. Please excuse my brevity.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to remove all rows that have a numeric in the first (or any) column

2021-09-14 Thread Gregg Powell via R-help



> Stuck on this problem - How does one remove all rows in a dataframe that have 
> a numeric in the first (or any) column?
> 

> Seems straight forward - but I'm having trouble.
> 


I've attempted to use:

VPN_Sheet1 <- VPN_Sheet1[!is.numeric(VPN_Sheet1$HVA),]

and

VPN_Sheet1 <- VPN_Sheet1[!is.integer(VPN_Sheet1$HVA),]

Neither work - Neither throw an error.

class(VPN_Sheet1$HVA)  returns:
[1] "list"

So, the HVA column returns a list.

>
> Data looks like the attached screen grab -
> 

> The ONLY rows I need to delete are the rows where there is a numeric in the 
> HVA column.
> 

> There are some 5000+ rows in the actual data.
> 

> Would be grateful for a solution to this problem.

How to get R to detect whether the value in column 1 is a number so the rows 
with the number values can be deleted?
> 

> Thanks in advance to any and all willing to help on this problem.
> 

> Gregg Powell
> 

> Sierra Vista, AZ

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] pseudoreplication

2021-09-14 Thread Bert Gunter
This should be posted on r-sig-mixed-models, not here. But you should
realize that "equivalent analysis" presumes knowledge of what ASReml
does, so that perhaps the best target of your query is the package
maintainer, not a list concerned with other methods.

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Tue, Sep 14, 2021 at 10:52 AM James Henson  wrote:
>
> Greetings R Community
> The ASReml-R package will analyze data from experiments with
> pseudoreplications.
>
> Dealing with Pseudo-Replication in Linear Mixed Models
> https://www.vsni.co.uk/case-studies/dealing-with-pseudo-replication-in-linear-mixed-models
>
> Will the ‘lme4’ package return an equivalent analysis of data from
> experiments with pseudoreplications?
> Thank you for your assistance.
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need fresh eyes to see what I'm missing

2021-09-14 Thread Avi Gross via R-help
Rich,

You have helped us understand, and at this point let us suppose we are now sure
about the way missing info is supplied. What you show is not the same as the
CSV sample earlier, but let us assume that "Eqp" is the one and only way they
signal bad data.

One choice is to fix the original data before reading them into R. Chances are
that placing exactly NA in those places, perhaps using a global substitution of
some sort, would do it.

But as Bert noted, R is a very powerful environment and you can use it.

One argument you can use with read.csv(), na.strings, is to tell it that "Eqp"
is to be treated as NA. The substitution is then made as the data are read in,
and you may then notice the column comes in properly as a column of doubles.
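
A minimal sketch of that read step (the file name and column name are
placeholders for whatever the actual data use):

vel <- read.csv("velocities.csv", na.strings = c("NA", "Eqp"))
# read.delim() takes the same na.strings argument if the file is tab-separated.
str(vel$fps)  # should now come back numeric, with NA where the equipment was down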

Suppose you read in this data and make sure the column involved is read as
character strings, instead. You can use any number of tools in base R or
dplyr to replace Eqp with NA such as in a pipeline ... %>%
mutate(fps=ifelse(fps=="Eqp", NA, fps)) %>% ...

The above is one of many ways and of course afterward, you may want to
reconvert the character column back to floating point. I note dplyr can do
both in the same function as it applies them in order:

mutate(fps=ifelse(fps=="Eqp", NA, fps), fps=as.double(fps))

The point is that in many cases the data must be carefully examined, cleaned,
and set up. In some cases it may also be useful to treat some columns as
factors, as with the hours and minutes. If you continue down this road and
reach ggplot() to make graphs, factors can be useful for various kinds of fine
tuning.

-Original Message-
From: R-help  On Behalf Of Rich Shepard
Sent: Tuesday, September 14, 2021 1:59 PM
To: r-help@r-project.org
Subject: Re: [R] Need fresh eyes to see what I'm missing

On Tue, 14 Sep 2021, Bert Gunter wrote:

> **Don't do this.*** You will make errors. Use fit-for-purpose tools.
> That's what R is for. Also, be careful **how** you "download", as that 
> already may bake in problems.

Bert,

Haven't had downloading errors saving displayed files.

The problem with the velocities data is shown here:
2020-11-24 11:00	PST	Eqp
2020-11-24 11:05	PST	Eqp
2020-11-24 11:10	PST	Eqp
2020-11-24 11:15	PST	Eqp
2020-11-24 11:20	PST	Eqp
2020-11-24 11:25	PST	Eqp
2020-11-24 11:30	PST	Eqp
2020-11-24 11:35	PST	Eqp
2020-11-24 11:40	PST	Eqp
2020-11-24 11:45	PST	Eqp
2020-11-24 11:50	PST	Eqp
2021-01-08 16:26	PST	Eqp

Equipment failure during the period shown.

What's the best way to replace these lines? Just remove them or change them
to NA?

Regards,

Rich

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need fresh eyes to see what I'm missing

2021-09-14 Thread Rich Shepard

On Tue, 14 Sep 2021, Bert Gunter wrote:


**Don't do this.*** You will make errors. Use fit-for-purpose tools.
That's what R is for. Also, be careful **how** you "download", as that
already may bake in problems.


Bert,

Haven't had downloading errors saving displayed files.

The problem with the velocities data is shown here:
2020-11-24 11:00	PST	Eqp 
2020-11-24 11:05	PST	Eqp 
2020-11-24 11:10	PST	Eqp 
2020-11-24 11:15	PST	Eqp 
2020-11-24 11:20	PST	Eqp 
2020-11-24 11:25	PST	Eqp 
2020-11-24 11:30	PST	Eqp 
2020-11-24 11:35	PST	Eqp 
2020-11-24 11:40	PST	Eqp 
2020-11-24 11:45	PST	Eqp 
2020-11-24 11:50	PST	Eqp 
2021-01-08 16:26	PST	Eqp


Equipment failure during the period shown.

What's the best way to replace these lines? Just remove them or change them
to NA?

Regards,

Rich

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need fresh eyes to see what I'm missing

2021-09-14 Thread Bert Gunter
Inline.


On Tue, Sep 14, 2021 at 10:42 AM Rich Shepard  wrote:
>
> On Tue, 14 Sep 2021, Eric Berger wrote:
>
> > My suggestion was not 'to make a difference'. It was to determine whether
> > the NAs or NaNs appear before the dplyr commands. You confirmed that they
> > do. There are 2321 NAs in vel. Bert suggested some ways that an NA might
> > appear.
>
> Eric,
>
> Yes, you're all correct. I've just downloaded the raw data again for mean
> velocities and suspended sediments. I'll go through them line-by-line and
> look for discrepancies.

**Don't do this.*** You will make errors. Use fit-for-purpose tools.
That's what R is for. Also, be careful **how** you "download", as that
already may bake in problems.

-- Bert
>
> Thanks again,
>
> Rich
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need fresh eyes to see what I'm missing

2021-09-14 Thread Rich Shepard

On Tue, 14 Sep 2021, Bert Gunter wrote:


Input problems of this sort are often caused by stray or extra characters
(commas, dashes, etc.) in the input files, which then can trigger
automatic conversion to character. Excel files are somewhat notorious for
this.


Bert,

Large volume of missing data at the end of last year. See attached plot.

I'll go through the raw data file to see how those missing data are
presented.

Regards,

Rich

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] pseudoreplication

2021-09-14 Thread James Henson
Greetings R Community
The ASReml-R package will analyze data from experiments with
pseudoreplications.

Dealing with Pseudo-Replication in Linear Mixed Models
https://www.vsni.co.uk/case-studies/dealing-with-pseudo-replication-in-linear-mixed-models

Will the ‘lme4’ package return an equivalent analysis of data from
experiments with pseudoreplications?
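
For concreteness, the kind of model I have in mind treats the pseudoreplicated
unit as a random effect. A minimal sketch with lme4 (the data frame and column
names below are invented purely for illustration):

library(lme4)

# invented example: treatment applied to whole plots, five measurements
# (pseudoreplicates) taken within each plot
dat <- data.frame(
  plot      = rep(paste0("p", 1:12), each = 5),
  treatment = rep(c("A", "B", "C"), each = 20),
  y         = rnorm(60)
)

# the random intercept for plot accounts for the pseudoreplication
fit <- lmer(y ~ treatment + (1 | plot), data = dat)
summary(fit)
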
Thank you for your assistance.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need fresh eyes to see what I'm missing

2021-09-14 Thread Rich Shepard

On Tue, 14 Sep 2021, Eric Berger wrote:


My suggestion was not 'to make a difference'. It was to determine whether
the NAs or NaNs appear before the dplyr commands. You confirmed that they
do. There are 2321 NAs in vel. Bert suggested some ways that an NA might
appear.


Eric,

Yes, you're all correct. I've just downloaded the raw data again for mean
velocities and suspended sediments. I'll go through them line-by-line and
look for discrepancies.

Thanks again,

Rich

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need fresh eyes to see what I'm missing

2021-09-14 Thread Eric Berger
Hi Rich,
My suggestion was not 'to make a difference'. It was to determine
whether the NAs or NaNs appear before the dplyr commands. You
confirmed that they do. There are 2321 NAs in vel. Bert suggested some
ways that an NA might appear.

Best,
Eric

On Tue, Sep 14, 2021 at 6:42 PM Rich Shepard  wrote:
>
> On Tue, 14 Sep 2021, Eric Berger wrote:
>
> > Before you create vel_by_month you can check vel for NAs and NaNs by
> >
> > sum(is.na(vel))
> > sum(unlist(lapply(vel,is.nan)))
>
> Eric,
>
> There should not be any missing values in the data file. Regardless, I added
> those lines to the script and it made no difference.
>
> Running those commands on the R command line showed these results:
> > sum(is.na(vel))
> [1] 2321
> > sum(unlist(lapply(vel,is.nan)))
> [1] 0
>
> Yet the monthly summaries retain the initial line:
> > vel_by_month
> # A tibble: 67 × 3
> # Groups:   year [8]
>    year month  flow
>   <int> <int> <dbl>
> 1     0    NA   NaN
>
> I've another data set with the same issue (that's 2 out of 5) and I assume
> the source of the problem is the same with both.
>
> The data sets have no NAs or missing values at the end of a line.
>
> Thanks for the ideas,
>
> Rich
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [R Code] Split long names in format.ftable

2021-09-14 Thread Leonard Mada via R-help

Dear List members,


I wrote some code to split long names in format.ftable. I hope it will 
be useful to others as well.



Ideally, this code should be implemented natively in R. In the 2nd part of
the mail I will provide a concept of how to actually implement the code in R.
This may be interesting to R-devel as well.



### Helper function

# Split the actual names

split.names = function(names, extend=0, justify="Right", blank.rm=FALSE, 
split.ch = "\n", detailed=TRUE) {
    justify = if(is.null(justify)) 0 else pmatch(justify, c("Left", 
"Right"));

    str = strsplit(names, split.ch);
    if(blank.rm) str = lapply(str, function(s) s[nchar(s) > 0]);
    nr  = max(sapply(str, function(s) length(s)));
    nch = lapply(str, function(s) max(nchar(s)));
    chf = function(nch) paste0(rep(" ", nch), collapse="");
    ch0 = sapply(nch, chf);
    mx  = matrix(rep(ch0, each=nr), nrow=nr, ncol=length(names));
    for(nc in seq(length(names))) {
        n = length(str[[nc]]);
        # Justifying
        s = sapply(seq(n), function(nr) paste0(rep(" ", nch[[nc]] - 
nchar(str[[nc]][nr])), collapse=""));
        s = if(justify == 2) paste0(s, str[[nc]]) else 
paste0(str[[nc]], s);

        mx[seq(nr + 1 - length(str[[nc]]), nr) , nc] = s;
    }
    if(extend > 0) {
        mx = cbind(mx, matrix("", nr=nr, ncol=extend));
    }
    if(detailed) attr(mx, "nchar") = unlist(nch);
    return(mx);
}

### ftable with name splitting
# - this code should be ideally integrated inside format.ftable;
ftable2 = function(ftbl, print=TRUE, quote=FALSE, ...) {
    ftbl2 = format(ftbl, quote=quote, ...);
    row.vars = names(attr(ftbl, "row.vars"))
    nr = length(row.vars);
    nms = split.names(row.vars, extend = ncol(ftbl2) - nr);
    ftbl2 = rbind(ftbl2[1,], nms, ftbl2[-c(1,2),]);
    # TODO: update width of factor labels;
    # - new width available in attr(nms, "nchar");
    if(print) {
        cat(t(ftbl2), sep = c(rep(" ", ncol(ftbl2) - 1), "\n"))
    }
    invisible(ftbl2);
}

I have uploaded this code also on Github:

https://github.com/discoleo/R/blob/master/Stat/Tools.Data.R
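
As a quick illustration of the helper on its own (the names below are
invented):

nms <- c("Passenger\nClass", "Sex", "Survived\n(yes/no)")
split.names(nms)

This returns a 2 x 3 character matrix: "Passenger" and "Survived" sit in the
top row, with "Class", "Sex" and "(yes/no)" below them, each column padded to
the width of its longest part and right-justified by default.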


B.) Detailed Concept
# - I am ignoring any variants;
# - the splitting is actually done in format.ftable;
# - we set only an attribute in ftable;
ftable = function(..., split.ch="\n") {
   [...]
   attr(ftbl, "split.ch") = split.ch; # set an attribute "split.ch"
   return(ftbl);
}

format.ftable = function(ftbl, ..., split.ch) {
    if(missing(split.ch)) {
       # check if the split.ch attribute is set and use it;
    } else {
       # use the explicitly provided split.ch: if( ! is.null(split.ch))
    }
    [...]
}


C.) split.names Function

This function may be useful in other locations as well, particularly to 
split names/labels used in axes and legends in various plots. But I do 
not have much knowledge of the graphics engine in R.



Sincerely,


Leonard

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need fresh eyes to see what I'm missing

2021-09-14 Thread Rich Shepard

On Tue, 14 Sep 2021, Bert Gunter wrote:


Input problems of this sort are often caused by stray or extra characters
(commas, dashes, etc.) in the input files, which then can trigger
automatic conversion to character. Excel files are somewhat notorious for
this.


Bert,

Yes, I'm going to closely review the original data file and work forward
from there. Thanks for your comments; I do appreciate them.

Back when I have more information ... perhaps even a fix.

Best regards,

Rich

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need fresh eyes to see what I'm missing

2021-09-14 Thread Bert Gunter
Input problems of this sort are often caused by stray or extra
characters (commas, dashes, etc.) in the input files, which then can
trigger automatic conversion to character. Excel files are somewhat
notorious for this.

A couple of comments, and then I'll quit, as others should have
greater insight (and may correct any of my errors).

1.
> as.numeric("1,")
[1] NA
Warning message:
NAs introduced by coercion

So if a stray character caused your "numeric" input to be read in as
character and you then converted it with as.numeric() (use that rather than
as.integer() or as.double()), you would get that warning.

2. So I would say that you need to check those columns in your data
frame that were read in as character instead of numeric.  I'd also
check the others with unique() or some such just to make sure they
have the handful of right values.

One way of doing this would be to look for NAs produced by as.numeric(), as
above. But I thought you said you did this already and found none, so I don't
get it. Other approaches would be to examine your .csv file with ?count.fields
or to try reading it with ?read.delim. Any discrepancies or errors you get
from these may help you pinpoint problems like stray characters, too many
fields in a line, etc.
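
For instance, a small sketch along those lines (assuming the vel data frame
and the file path used elsewhere in this thread):

## values in fps that fail numeric conversion (and are not already NA)
bad <- which(is.na(suppressWarnings(as.numeric(vel$fps))) & !is.na(vel$fps))
unique(vel$fps[bad])   # the offending values
head(vel[bad, ])       # the offending rows

## every line of the file should have the same number of fields
table(count.fields('../data/water/vel.dat', sep = ','))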

3. As for your "fps as factors" question, note that:
> as.numeric(factor("3"))
[1] 1

So it depends on how you read stuff in. The answer should be "no" with
read.csv(..., stringsAsFactors = FALSE), but I'm not sure what all you
did or what kind of junk in your .csv file may be causing R to misread
the numeric data as character.
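
If factors do sneak in, the usual safe route back to numbers goes through
as.character() first, for example:

f <- factor("3")
as.numeric(f)                 # 1 -- the internal level code, not the value
as.numeric(as.character(f))   # 3 -- the value you actually want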

As I said, others may be wiser and correct any errors in my "advice."
This is as far as I can go -- and it may already be too far.

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )



On Tue, Sep 14, 2021 at 9:01 AM Rich Shepard  wrote:
>
> On Tue, 14 Sep 2021, Bert Gunter wrote:
>
> > Remove all your as.integer() and as.double() coercions. They are
> > unnecessary (unless you are preparing input for C code; also, all R
> > non-integers are double precision) and may be the source of your problems.
>
> Bert,
>
> When I remove coercions the script produces warnings like this:
> 1: In mean.default(fps, na.rm = TRUE) :
>argument is not numeric or logical: returning NA
>
> and str(vel) displays this:
> 'data.frame':   565675 obs. of  6 variables:
>   $ year : chr  "2016" "2016" "2016" "2016" ...
>   $ month: int  3 3 3 3 3 3 3 3 3 3 ...
>   $ day  : int  3 3 3 3 3 3 3 3 3 3 ...
>   $ hour : chr  "12" "12" "12" "12" ...
>   $ min  : int  0 10 20 30 40 50 0 10 20 30 ...
>   $ fps  : chr  "1.74" "1.75" "1.76" "1.81" ...
>
> so month, day, and min are recognized as integers but year, hour, and fps
> are seen as characters. I don't understand why.
>
> Regards,
>
> Rich
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need fresh eyes to see what I'm missing

2021-09-14 Thread Avi Gross via R-help
Rich,

I have to wonder about how your data was placed in the CSV file based on
what you report.

functions like read.table() (which is called by read.csv()) ultimately make
guesses about what number of columns to expect and what the contents are
likely to be. They may just examine the first N entries and make the most
compatible choice. The fact that it shows this:

'data.frame':   565675 obs. of  6 variables:
  $ year : chr  "2016" "2016" "2016" "2016" ...
  $ month: int  3 3 3 3 3 3 3 3 3 3 ...
  $ day  : int  3 3 3 3 3 3 3 3 3 3 ...
  $ hour : chr  "12" "12" "12" "12" ...
  $ min  : int  0 10 20 30 40 50 0 10 20 30 ...
  $ fps  : chr  "1.74" "1.75" "1.76" "1.81" ...

is odd. It suggests that somewhere early in the data an entry was not a plain
integer such as 2016 but rather something like "2016" (with quotes) or an
unquoted word like `missing`.

Something similar seems to have happened with hour and fps but not the rest.

Nonetheless, you did convert back to what you wanted, BUT if a single
anomalous entry remains, then as.integer("missing") would return an NA and
as.double("missing") would as well. So it is wise to check for any unexpected
values. If the source cannot be changed, then the R program can filter such
cases out of your data.frame in various ways.
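
One possible sketch of that kind of filtering (it assumes the column names
from your str() output, and note that it also drops rows whose values are
genuinely missing):

## keep only rows where every intended-numeric column converts cleanly
num_cols <- c("year", "month", "day", "hour", "min", "fps")
ok <- Reduce(`&`, lapply(vel[num_cols],
             function(x) !is.na(suppressWarnings(as.numeric(x)))))
vel_clean <- vel[ok, ]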

Your way of reading the CSV in was this:

vel <- read.csv('../data/water/vel.dat', header = TRUE, sep = ',',
stringsAsFactors = FALSE)

The options you added, header=TRUE and sep=",", are already the defaults, so
they are harmless, and the current default is not to read strings in as
factors. But the arguments you did not include may be worth a look, given that
your data seem to be a bit off.

Without the underlying file, we cannot trivially diagnose what may be wrong
in it. Do you get any error messages when reading in the file?  You can
specify additional arguments to read.csv() about what, if any, quoting
characters are used, what sequences should be recognized as an NA,
suggestions of what type each column should be assumed to be, what to do
with blank lines, what a comment looks like  and so on. 
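
For instance, a sketch of what such a call could look like for your file (the
colClasses values assume the six columns you showed; with colClasses set,
read.csv() stops with an error at the first value that does not convert, which
is itself useful information):

vel <- read.csv('../data/water/vel.dat',
                colClasses = c('integer', 'integer', 'integer',
                               'integer', 'integer', 'numeric'),
                na.strings = c('NA', ''),   # add whatever marks missing data
                comment.char = '',
                blank.lines.skip = TRUE)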

One thing I have sometimes had to do is open the original CSV file in Excel
and examine it in various ways, or even change it and save it again. That is
beyond the scope of this mailing list, so if needed, ask me in private. You
have been working on this kind of data for a while, but I assume often with
other tools outside R and dplyr.






-Original Message-
From: R-help  On Behalf Of Rich Shepard
Sent: Tuesday, September 14, 2021 11:49 AM
To: R mailing list 
Subject: Re: [R] Need fresh eyes to see what I'm missing

On Tue, 14 Sep 2021, Bert Gunter wrote:

> Remove all your as.integer() and as.double() coercions. They are 
> unnecessary (unless you are preparing input for C code; also, all R 
> non-integers are double precision) and may be the source of your problems.

Bert,

When I remove coercions the script produces warnings like this:
1: In mean.default(fps, na.rm = TRUE) :
   argument is not numeric or logical: returning NA

and str(vel) displays this:
'data.frame':   565675 obs. of  6 variables:
  $ year : chr  "2016" "2016" "2016" "2016" ...
  $ month: int  3 3 3 3 3 3 3 3 3 3 ...
  $ day  : int  3 3 3 3 3 3 3 3 3 3 ...
  $ hour : chr  "12" "12" "12" "12" ...
  $ min  : int  0 10 20 30 40 50 0 10 20 30 ...
  $ fps  : chr  "1.74" "1.75" "1.76" "1.81" ...

so month, day, and min are recognized as integers but year, hour, and fps
are seen as characters. I don't understand why.

Regards,

Rich

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need fresh eyes to see what I'm missing

2021-09-14 Thread Avi Gross via R-help
Rich,

I reproduced your problem after re-arranging the code the mailer mangled. I
tried variations such as not using pipes or changing what it is grouped by,
and they all reproduce your result on the abbreviated data, with the message:

`summarise()` has grouped output by 'year'. You can override using the 
`.groups` argument.

I think I fixed summarise(), but it makes me wonder whether an inconsistency
was introduced along the way, as what you used is supposed to work and has
worked for me in the past.

I note the man page for summarise() mentions that the .groups="..." argument
is experimental and a tad confusing.

I changed your code to the following, telling it to keep the grouping of the
output the same:

vel_by_month = vel %>%
  group_by(year, month) %>%
  summarise(flow = mean(fps, na.rm = TRUE), .groups="keep")

The change from your code is the addition of the .groups="keep" argument at
the very end.

Since I used your limited data, this is all I get:

> vel_by_month
# A tibble: 1 x 3
# Groups:   year, month [1]
   year month  flow
  <int> <int> <dbl>
1  2016     3  1.77

For now, all I did was shut summarise() up.

Not having the rest of your data, the question is where your NA and NaN are
introduced. If the change I made above does not resolve it, then, as others
suggested, you should look at your data more carefully, perhaps starting with
the .csv file and then the data structures in R, along the lines of what you
were shown. I find the table() function useful for categorical data with
limited choices, as it would show the anomaly occurring only once.
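
For example, a quick sketch (assuming the vel data frame from your script;
useNA makes any missing values visible as well):

table(vel$year,  useNA = "ifany")
table(vel$month, useNA = "ifany")

A single stray year such as 0 would stand out with a count of 1.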

I see your point about needing fresh eyes. My eyes do not see what you did
wrong either, but I am just following clues you may be overlooking.


-Original Message-
From: R-help  On Behalf Of Rich Shepard
Sent: Tuesday, September 14, 2021 11:21 AM
To: r-help@r-project.org
Subject: [R] Need fresh eyes to see what I'm missing

The data file begins this way:
year,month,day,hour,min,fps
2016,03,03,12,00,1.74
2016,03,03,12,10,1.75
2016,03,03,12,20,1.76
2016,03,03,12,30,1.81
2016,03,03,12,40,1.79
2016,03,03,12,50,1.75
2016,03,03,13,00,1.78
2016,03,03,13,10,1.81

The script to process it:
library('tidyverse')
vel <- read.csv('../data/water/vel.dat', header = TRUE, sep = ',',
stringsAsFactors = FALSE)
vel$year <- as.integer(vel$year)
vel$month <- as.integer(vel$month)
vel$day <- as.integer(vel$day)
vel$hour <- as.integer(vel$hour)
vel$min <- as.integer(vel$min)
vel$fps <- as.double(vel$fps, length = 6)

# use dplyr to filter() by year, month, day; summarize() to get monthly
# means
vel_by_month = vel %>%
 group_by(year, month) %>%
 summarize(flow = mean(fps, na.rm = TRUE))

R's display after running the script:
> source('vel.R')
`summarise()` has grouped output by 'year'. You can override using the 
`.groups` argument.
Warning messages:
1: In eval(ei, envir) : NAs introduced by coercion
2: In eval(ei, envir) : NAs introduced by coercion
3: In eval(ei, envir) : NAs introduced by coercion

The dataframe created by the read.csv() command:
> head(vel)
   year month day hour min  fps
1 2016 3   3   12   0 1.74
2 2016 3   3   12  10 1.75
3 2016 3   3   12  20 1.76
4 2016 3   3   12  30 1.81
5 2016 3   3   12  40 1.79
6 2016 3   3   12  50 1.75

and the resulting grouping:
> vel_by_month
# A tibble: 67 × 3
# Groups:   year [8]
    year month  flow
   <int> <int> <dbl>
 1     0    NA   NaN
 2  2016     3  2.40
 3  2016     4  3.00
 4  2016     5  2.86
 5  2016     6  2.51
 6  2016     7  2.18
 7  2016     8  1.89
 8  2016     9  1.38
 9  2016    10  1.73
10  2016    11  2.01
# … with 57 more rows

I cannot find why line 1 is there. Other data sets don't produce this result.

TIA,

Rich

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need fresh eyes to see what I'm missing

2021-09-14 Thread Rich Shepard

On Tue, 14 Sep 2021, Bert Gunter wrote:


Remove all your as.integer() and as.double() coercions. They are
unnecessary (unless you are preparing input for C code; also, all R
non-integers are double precision) and may be the source of your
problems.


Bert,

Are all columns but the fps factors?

Rich

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need fresh eyes to see what I'm missing

2021-09-14 Thread Rich Shepard

On Tue, 14 Sep 2021, Bert Gunter wrote:


Remove all your as.integer() and as.double() coercions. They are
unnecessary (unless you are preparing input for C code; also, all R
non-integers are double precision) and may be the source of your problems.


Bert,

When I remove coercions the script produces warnings like this:
1: In mean.default(fps, na.rm = TRUE) :
  argument is not numeric or logical: returning NA

and str(vel) displays this:
'data.frame':   565675 obs. of  6 variables:
 $ year : chr  "2016" "2016" "2016" "2016" ...
 $ month: int  3 3 3 3 3 3 3 3 3 3 ...
 $ day  : int  3 3 3 3 3 3 3 3 3 3 ...
 $ hour : chr  "12" "12" "12" "12" ...
 $ min  : int  0 10 20 30 40 50 0 10 20 30 ...
 $ fps  : chr  "1.74" "1.75" "1.76" "1.81" ...

so month, day, and min are recognized as integers but year, hour, and fps
are seen as characters. I don't understand why.

Regards,

Rich

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need fresh eyes to see what I'm missing

2021-09-14 Thread Rich Shepard

On Tue, 14 Sep 2021, Eric Berger wrote:


Before you create vel_by_month you can check vel for NAs and NaNs by

sum(is.na(vel))
sum(unlist(lapply(vel,is.nan)))


Eric,

There should not be any missing values in the data file. Regardless, I added
those lines to the script and it made no difference.

Running those commands on the R command line showed these results:

> sum(is.na(vel))
[1] 2321
> sum(unlist(lapply(vel,is.nan)))
[1] 0

Yet the monthly summaries retain the initial line:

> vel_by_month
# A tibble: 67 × 3
# Groups:   year [8]
   year month  flow
  <int> <int> <dbl>
1     0    NA   NaN

I've another data set with the same issue (that's 2 out of 5) and I assume
the source of the problem is the same with both.

The data sets have no NAs or missing values at the end of a line.

Thanks for the ideas,

Rich

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need fresh eyes to see what I'm missing

2021-09-14 Thread Bert Gunter
Remove all your as.integer() and as.double() coercions. They are
unnecessary (unless you are preparing input for C code; also, all R
non-integers are double precision) and may be the source of your
problems.

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Tue, Sep 14, 2021 at 8:31 AM Eric Berger  wrote:
>
> Before you create vel_by_month you can check vel for NAs and NaNs by
>
> sum(is.na(vel))
> sum(unlist(lapply(vel,is.nan)))
>
> HTH,
> Eric
>
>
> On Tue, Sep 14, 2021 at 6:21 PM Rich Shepard 
> wrote:
>
> > The data file begins this way:
> > year,month,day,hour,min,fps
> > 2016,03,03,12,00,1.74
> > 2016,03,03,12,10,1.75
> > 2016,03,03,12,20,1.76
> > 2016,03,03,12,30,1.81
> > 2016,03,03,12,40,1.79
> > 2016,03,03,12,50,1.75
> > 2016,03,03,13,00,1.78
> > 2016,03,03,13,10,1.81
> >
> > The script to process it:
> > library('tidyverse')
> > vel <- read.csv('../data/water/vel.dat', header = TRUE, sep = ',',
> > stringsAsFactors = FALSE)
> > vel$year <- as.integer(vel$year)
> > vel$month <- as.integer(vel$month)
> > vel$day <- as.integer(vel$day)
> > vel$hour <- as.integer(vel$hour)
> > vel$min <- as.integer(vel$min)
> > vel$fps <- as.double(vel$fps, length = 6)
> >
> > # use dplyr to filter() by year, month, day; summarize() to get monthly
> > # means
> > vel_by_month = vel %>%
> >  group_by(year, month) %>%
> >  summarize(flow = mean(fps, na.rm = TRUE))
> >
> > R's display after running the script:
> > > source('vel.R')
> > `summarise()` has grouped output by 'year'. You can override using the
> > `.groups` argument.
> > Warning messages:
> > 1: In eval(ei, envir) : NAs introduced by coercion
> > 2: In eval(ei, envir) : NAs introduced by coercion
> > 3: In eval(ei, envir) : NAs introduced by coercion
> >
> > The dataframe created by the read.csv() command:
> > > head(vel)
> >year month day hour min  fps
> > 1 2016 3   3   12   0 1.74
> > 2 2016 3   3   12  10 1.75
> > 3 2016 3   3   12  20 1.76
> > 4 2016 3   3   12  30 1.81
> > 5 2016 3   3   12  40 1.79
> > 6 2016 3   3   12  50 1.75
> >
> > and the resulting grouping:
> > > vel_by_month
> > # A tibble: 67 × 3
> > # Groups:   year [8]
> >     year month  flow
> >    <int> <int> <dbl>
> >  1     0    NA   NaN
> >  2  2016     3  2.40
> >  3  2016     4  3.00
> >  4  2016     5  2.86
> >  5  2016     6  2.51
> >  6  2016     7  2.18
> >  7  2016     8  1.89
> >  8  2016     9  1.38
> >  9  2016    10  1.73
> > 10  2016    11  2.01
> > # … with 57 more rows
> >
> > I cannot find why line 1 is there. Other data sets don't produce this
> > result.
> >
> > TIA,
> >
> > Rich
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need fresh eyes to see what I'm missing

2021-09-14 Thread Eric Berger
Before you create vel_by_month you can check vel for NAs and NaNs by

sum(is.na(vel))
sum(unlist(lapply(vel,is.nan)))

HTH,
Eric


On Tue, Sep 14, 2021 at 6:21 PM Rich Shepard 
wrote:

> The data file begins this way:
> year,month,day,hour,min,fps
> 2016,03,03,12,00,1.74
> 2016,03,03,12,10,1.75
> 2016,03,03,12,20,1.76
> 2016,03,03,12,30,1.81
> 2016,03,03,12,40,1.79
> 2016,03,03,12,50,1.75
> 2016,03,03,13,00,1.78
> 2016,03,03,13,10,1.81
>
> The script to process it:
> library('tidyverse')
> vel <- read.csv('../data/water/vel.dat', header = TRUE, sep = ',',
> stringsAsFactors = FALSE)
> vel$year <- as.integer(vel$year)
> vel$month <- as.integer(vel$month)
> vel$day <- as.integer(vel$day)
> vel$hour <- as.integer(vel$hour)
> vel$min <- as.integer(vel$min)
> vel$fps <- as.double(vel$fps, length = 6)
>
> # use dplyr to filter() by year, month, day; summarize() to get monthly
> # means
> vel_by_month = vel %>%
>  group_by(year, month) %>%
>  summarize(flow = mean(fps, na.rm = TRUE))
>
> R's display after running the script:
> > source('vel.R')
> `summarise()` has grouped output by 'year'. You can override using the
> `.groups` argument.
> Warning messages:
> 1: In eval(ei, envir) : NAs introduced by coercion
> 2: In eval(ei, envir) : NAs introduced by coercion
> 3: In eval(ei, envir) : NAs introduced by coercion
>
> The dataframe created by the read.csv() command:
> > head(vel)
>year month day hour min  fps
> 1 2016 3   3   12   0 1.74
> 2 2016 3   3   12  10 1.75
> 3 2016 3   3   12  20 1.76
> 4 2016 3   3   12  30 1.81
> 5 2016 3   3   12  40 1.79
> 6 2016 3   3   12  50 1.75
>
> and the resulting grouping:
> > vel_by_month
> # A tibble: 67 × 3
> # Groups:   year [8]
>     year month  flow
>    <int> <int> <dbl>
>  1     0    NA   NaN
>  2  2016     3  2.40
>  3  2016     4  3.00
>  4  2016     5  2.86
>  5  2016     6  2.51
>  6  2016     7  2.18
>  7  2016     8  1.89
>  8  2016     9  1.38
>  9  2016    10  1.73
> 10  2016    11  2.01
> # … with 57 more rows
>
> I cannot find why line 1 is there. Other data sets don't produce this
> result.
>
> TIA,
>
> Rich
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Need fresh eyes to see what I'm missing

2021-09-14 Thread Rich Shepard

The data file begins this way:
year,month,day,hour,min,fps
2016,03,03,12,00,1.74
2016,03,03,12,10,1.75
2016,03,03,12,20,1.76
2016,03,03,12,30,1.81
2016,03,03,12,40,1.79
2016,03,03,12,50,1.75
2016,03,03,13,00,1.78
2016,03,03,13,10,1.81

The script to process it:
library('tidyverse')
vel <- read.csv('../data/water/vel.dat', header = TRUE, sep = ',', 
stringsAsFactors = FALSE)
vel$year <- as.integer(vel$year)
vel$month <- as.integer(vel$month)
vel$day <- as.integer(vel$day)
vel$hour <- as.integer(vel$hour)
vel$min <- as.integer(vel$min)
vel$fps <- as.double(vel$fps, length = 6)

# use dplyr to filter() by year, month, day; summarize() to get monthly
# means
vel_by_month = vel %>%
group_by(year, month) %>%
summarize(flow = mean(fps, na.rm = TRUE))

R's display after running the script:

> source('vel.R')
`summarise()` has grouped output by 'year'. You can override using the 
`.groups` argument.
Warning messages:
1: In eval(ei, envir) : NAs introduced by coercion
2: In eval(ei, envir) : NAs introduced by coercion
3: In eval(ei, envir) : NAs introduced by coercion

The dataframe created by the read.csv() command:

> head(vel)
  year month day hour min  fps
1 2016 3   3   12   0 1.74
2 2016 3   3   12  10 1.75
3 2016 3   3   12  20 1.76
4 2016 3   3   12  30 1.81
5 2016 3   3   12  40 1.79
6 2016 3   3   12  50 1.75

and the resulting grouping:

> vel_by_month
# A tibble: 67 × 3
# Groups:   year [8]
    year month  flow
   <int> <int> <dbl>
 1     0    NA   NaN
 2  2016     3  2.40
 3  2016     4  3.00
 4  2016     5  2.86
 5  2016     6  2.51
 6  2016     7  2.18
 7  2016     8  1.89
 8  2016     9  1.38
 9  2016    10  1.73
10  2016    11  2.01
# … with 57 more rows

I cannot find why line 1 is there. Other data sets don't produce this
result.

TIA,

Rich

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fastest way to extract rows of smaller matrix many times by index to make larger matrix? and multiply columns of matrix by vector

2021-09-14 Thread Leonard Mada via R-help

Hello Nevil,


you could test something like:


# the Matrix
m = matrix(1:1000, ncol=10)
m = t(m)

# Extract Data
idcol = sample(seq(100), 100, TRUE); # now columns
for(i in 1:100) {
    m2 = m[ , idcol];
}
m2 = t(m2); # transpose back


It may be faster, although I did not benchmark it.
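
A quick way to check would be something like the following sketch (it assumes
the matrix dimensions from the original post and that the microbenchmark
package is installed):

library(microbenchmark)

M  <- matrix(runif(50 * 10000, 0, 100), nrow = 10000, ncol = 50)
Mt <- t(M)                       # 50 x 10000: rows of M become columns
V  <- sample(1:10000, 10^6, replace = TRUE)

microbenchmark(
    rows = M[V, ],               # extract rows directly
    cols = t(Mt[, V]),           # extract columns, then transpose back
    times = 5, unit = "ms"
)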


There may be more complex variants. Maybe it is warranted to try for 
10^7 extractions:


- e.g. extracting one row and replacing all occurrences of that row;


Sincerely,


Leonard





It seems I cannot extract digested mail anymore. I hope though that the 
message is processed properly.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fastest way to extract rows of smaller matrix many times by index to make larger matrix? and multiply columns of matrix by vector

2021-09-14 Thread nevil amos
OK thanks, I thought it probably was, but always worth asking. The
multiplication of the columns of M2 by V2 is as intended - not matrix
multiplication.



On Tue, 14 Sept 2021 at 17:49, Jeff Newmiller 
wrote:

> That is about as fast as it can be done. However you may be able to avoid
> doing it at all if you fold V2 into a matrix instead. Did you mean to use
> matrix multiplication in your calculation of M3?
>
> On September 13, 2021 11:48:48 PM PDT, nevil amos 
> wrote:
> >Hi is there a faster way to "extract" rows of a matrix many times to for a
> >longer matrix based in a vector or for indices than M[ V, ]
> >
> >I need to "expand" ( rather than subset)  a matrix M of 10-100,000 rows x
> >~50 columns to produce a matrix with a greater number (10^6-10^8) of rows
> >using a vector V containing the 10^6 -10^8 values that are the indices of
> >the rows of M. the output matrix M2 is then multiplied by another vector
> V2
> >With the same length as V.
> >
> >Is there a faster way to achieve these calculations (which are by far the
> >slowest portion of a function looped 1000s of times? than the standard  M2
> ><- M[ V, ] and  M3<-M2*V2, the two calculations are taking a similar time,
> >Matrix M also changes for each loop.
> >
> >
> >M<-matrix(runif(50*10000,0,100),nrow=10000,ncol=50)
> >x = 10^7
> >V<-sample(1:10000,x,replace=T)
> >V2<-(sample(c(1,NA),x,replace=T))
> >print(microbenchmark(
> >M2<-M[V,],
> >M3<-M2*V2,
> >times=5,unit = "ms"))
> >
> >
> >
> >thanks for any suggestions
> >
> >Nevil Amos
> >
> >   [[alternative HTML version deleted]]
> >
> >__
> >R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >https://stat.ethz.ch/mailman/listinfo/r-help
> >PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> >and provide commented, minimal, self-contained, reproducible code.
>
> --
> Sent from my phone. Please excuse my brevity.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fastest way to extract rows of smaller matrix many times by index to make larger matrix? and multiply columns of matrix by vector

2021-09-14 Thread Jeff Newmiller
That is about as fast as it can be done. However you may be able to avoid doing 
it at all if you fold V2 into a matrix instead. Did you mean to use matrix 
multiplication in your calculation of M3?

On September 13, 2021 11:48:48 PM PDT, nevil amos  wrote:
>Hi is there a faster way to "extract" rows of a matrix many times to for a
>longer matrix based in a vector or for indices than M[ V, ]
>
>I need to "expand" ( rather than subset)  a matrix M of 10-100,000 rows x
>~50 columns to produce a matrix with a greater number (10^6-10^8) of rows
>using a vector V containing the 10^6 -10^8 values that are the indices of
>the rows of M. the output matrix M2 is then multiplied by another vector V2
>With the same length as V.
>
>Is there a faster way to achieve these calculations (which are by far the
>slowest portion of a function looped 1000s of times? than the standard  M2
><- M[ V, ] and  M3<-M2*V2, the two calculations are taking a similar time,
>Matrix M also changes for each loop.
>
>
>M<-matrix(runif(50*10000,0,100),nrow=10000,ncol=50)
>x = 10^7
>V<-sample(1:10000,x,replace=T)
>V2<-(sample(c(1,NA),x,replace=T))
>print(microbenchmark(
>M2<-M[V,],
>M3<-M2*V2,
>times=5,unit = "ms"))
>
>
>
>thanks for any suggestions
>
>Nevil Amos
>
>   [[alternative HTML version deleted]]
>
>__
>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

-- 
Sent from my phone. Please excuse my brevity.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ggsave() with width only

2021-09-14 Thread Ivan Calandra

Thank you Adam!

I'm a bit surprised that an extra package is needed for this, but why not!

Best,
Ivan

--
Dr. Ivan Calandra
Imaging lab
RGZM - MONREPOS Archaeological Research Centre
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra

On 13/09/2021 15:40, Adam Wysokiński wrote:

Hi,
Instead of ggsave(), use save_plot() from the "cowplot" package:

library(ggplot2)
library(cowplot)
x <- 1:10
y <- x^2
df <- data.frame(x, y)
p <- ggplot(df, aes(x, y)) + geom_point()
save_plot("/tmp/plot.png", p, base_aspect_ratio = 1, base_width = 5, 
base_height = NULL)




__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Fastest way to extract rows of smaller matrix many times by index to make larger matrix? and multiply columns of matrix by vector

2021-09-14 Thread nevil amos
Hi, is there a faster way to "extract" rows of a matrix many times, to form a
longer matrix based on a vector of row indices, than M[V, ]?

I need to "expand" (rather than subset) a matrix M of 10-100,000 rows x
~50 columns to produce a matrix with a greater number (10^6-10^8) of rows,
using a vector V containing the 10^6-10^8 values that are the indices of
the rows of M. The output matrix M2 is then multiplied by another vector V2
with the same length as V.

Is there a faster way to achieve these calculations (which are by far the
slowest portion of a function looped 1000s of times) than the standard
M2 <- M[V, ] and M3 <- M2*V2? The two calculations take a similar amount of
time, and the matrix M also changes on each loop.


library(microbenchmark)

M  <- matrix(runif(50*10000, 0, 100), nrow = 10000, ncol = 50)  # 10,000 x 50
x  <- 10^7
V  <- sample(1:10000, x, replace = TRUE)    # row indices used to expand M
V2 <- sample(c(1, NA), x, replace = TRUE)   # per-row multiplier
print(microbenchmark(
    M2 <- M[V, ],
    M3 <- M2 * V2,
    times = 5, unit = "ms"))



thanks for any suggestions

Nevil Amos

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.