Re: [R] Find tibble row with maximum recorded value

2021-12-03 Thread Rui Barradas

Hello,

You're right, I carelessly coded this.

which.max returns the index of the first maximum of a vector, while
comparing a vector with its max() gives a logical index that selects every
element equal to the maximum.
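
A small illustration of the difference in terms of row positions, using made-up data (the data frame `df` and its values are assumptions for the example, not taken from pdx_disc):

```r
# Toy data: the maximum (30) occurs in rows 2 and 4.
df <- data.frame(id = 1:5, cfs = c(10, 30, 20, 30, 5))

first_max <- which.max(df$cfs)             # position of the first maximum only
all_max   <- which(df$cfs == max(df$cfs))  # positions of every maximum

first_max
all_max

df[first_max, ]  # a single row
df[all_max, ]    # every tied row
```

Which of the two is appropriate depends on whether ties at the maximum matter for the question being asked.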


At 23:27 on 03/12/21, Bert Gunter wrote:

Perhaps you meant to point this out, but the cfs[which.max(cfs)] and
cfs == ... are not the same:


x <- rep(1:2,3)
x

[1] 1 2 1 2 1 2

x[which.max(x)]

[1] 2

x[x==max(x)]

[1] 2 2 2

So maybe your point is: which does the OP want (in case there are
repeated maxes)? I suspect the == forms, but ...?

Bert Gunter

On Fri, Dec 3, 2021 at 2:56 PM Rui Barradas  wrote:


Hello,

Inline.

At 22:08 on 03/12/21, Rich Shepard wrote:

On Fri, 3 Dec 2021, Rich Shepard wrote:


I find solutions when the data_frame is grouped, but none when it's not.


Thanks, Bert. ?which.max confirmed that's all I need to find the maximum
value.

Now I need to read more than ?filter to learn why I'm not getting the
relevant row with:

which.max(pdx_disc$cfs)

[1] 8054


This is the *index* for which cfs is the first maximum, not the maximum
value itself.




filter(pdx_disc, cfs == 8054)


Therefore, you probably want any of


Should be "one of", not "any of"

Rui Barradas




filter(pdx_disc, cfs == cfs[8054])

filter(pdx_disc, cfs == cfs[which.max(cfs)])

filter(pdx_disc, cfs == max(cfs))  # I find this one better, simpler


Hope this helps,

Rui Barradas



# A tibble: 0 × 9
# … with 9 variables: site_nbr, year, mon, day,
#   hr, min, tz, cfs, sampdt

Regards,

Rich

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.






Re: [R] Find tibble row with maximum recorded value

2021-12-03 Thread Rich Shepard

On Fri, 3 Dec 2021, Rich Shepard wrote:


they apparently do. For example, 99.9000 cubic feet per second is reached

   99,900

Rich



Re: [R] Find tibble row with maximum recorded value

2021-12-03 Thread Rich Shepard

On Fri, 3 Dec 2021, Bert Gunter wrote:


Perhaps you meant to point this out, but the cfs[which.max(cfs)] and
cfs == ... are not the same:


x <- rep(1:2,3)
x

[1] 1 2 1 2 1 2

x[which.max(x)]

[1] 2

x[x==max(x)]

[1] 2 2 2

So maybe your point is: which does the OP want (in case there are
repeated maxes)? I suspect the == forms, but ...?


Bert,

When I looked at the results I saw there were many rows with the maximum
(and minimum) values. I thought those represented instrument limits, and
they apparently do. For example, 99.9000 cubic feet per second is reached
multiple times over the past 32 years.


max(pdx_disc$cfs)

[1] 99900

and


filter(pdx_disc, cfs == max(cfs))

# A tibble: 74 × 9
   site_nbr  year   mon   day    hr   min tz      cfs sampdt
 1 14211720  1988    11    28     0     0 PST   99900 1988-11-28 00:00:00
 2 14211720  1988    11    28     5     0 PST   99900 1988-11-28 05:00:00
 3 14211720  1988    11    28     5    10 PST   99900 1988-11-28 05:10:00
 4 14211720  1988    11    28     5    20 PST   99900 1988-11-28 05:20:00
 5 14211720  1988    11    29     6    20 PST   99900 1988-11-29 06:20:00
 6 14211720  1988    11    29    13     0 PST   99900 1988-11-29 13:00:00
 7 14211720  1988    11    29    13    10 PST   99900 1988-11-29 13:10:00
 8 14211720  1988    11    29    15    20 PST   99900 1988-11-29 15:20:00
 9 14211720  1989     1    11     0     0 PST   99900 1989-01-11 00:00:00
10 14211720  1989     1    11     0    10 PST   99900 1989-01-11 00:10:00
# … with 64 more rows

So the gauge was pegged at its top end quite a few times over the years.
Makes me wonder just how much higher it really was.
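
For anyone wanting to quantify how often a gauge sits at its ceiling, a rough sketch in base R follows. The data frame `flow` and its values are made up for illustration; only the 99900 ceiling comes from the output above.

```r
# Hypothetical stand-in for pdx_disc; 99900 is the ceiling value seen in the thread.
flow <- data.frame(
  year = c(1988, 1988, 1988, 1989, 1989, 1990),
  cfs  = c(99900, 99900, 18400, 99900, 17300, 16800)
)

at_ceiling <- flow$cfs == max(flow$cfs)  # TRUE where the reading equals the maximum

sum(at_ceiling)               # number of readings at the ceiling
mean(at_ceiling)              # share of all readings at the ceiling
table(flow$year[at_ceiling])  # ceiling readings per year
```

Readings flagged this way are probably best treated as lower bounds on the true discharge rather than as measured values.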

Carpe weekend,

Rich



Re: [R] Find tibble row with maximum recorded value

2021-12-03 Thread Bert Gunter
Perhaps you meant to point this out, but the cfs[which.max(cfs)] and
cfs == ... are not the same:

> x <- rep(1:2,3)
> x
[1] 1 2 1 2 1 2
> x[which.max(x)]
[1] 2
> x[x==max(x)]
[1] 2 2 2

So maybe your point is: which does the OP want (in case there are
repeated maxes)? I suspect the == forms, but ...?

Bert Gunter

On Fri, Dec 3, 2021 at 2:56 PM Rui Barradas  wrote:
>
> Hello,
>
> Inline.
>
> At 22:08 on 03/12/21, Rich Shepard wrote:
> > On Fri, 3 Dec 2021, Rich Shepard wrote:
> >
> >> I find solutions when the data_frame is grouped, but none when it's not.
> >
> > Thanks, Bert. ?which.max confirmed that's all I need to find the maximum
> > value.
> >
> > Now I need to read more than ?filter to learn why I'm not getting the
> > relevant row with:
> >> which.max(pdx_disc$cfs)
> > [1] 8054
>
> This is the *index* for which cfs is the first maximum, not the maximum
> value itself.
>
> >
> >> filter(pdx_disc, cfs == 8054)
>
> Therefore, you probably want any of
>
>
> filter(pdx_disc, cfs == cfs[8054])
>
> filter(pdx_disc, cfs == cfs[which.max(cfs)])
>
> filter(pdx_disc, cfs == max(cfs))  # I find this one better, simpler
>
>
> Hope this helps,
>
> Rui Barradas
>
>
> > # A tibble: 0 × 9
> > # … with 9 variables: site_nbr, year, mon, day,
> > #   hr, min, tz, cfs, sampdt
> >
> > Regards,
> >
> > Rich
> >
>



Re: [R] Find tibble row with maximum recorded value

2021-12-03 Thread Rich Shepard

On Fri, 3 Dec 2021, Rui Barradas wrote:


which.max(pdx_disc$cfs)
[1] 8054



This is the *index* for which cfs is the first maximum, not the maximum
value itself.


Rui,

Mea culpa! I completely forgot this.


Therefore, you probably want any of
filter(pdx_disc, cfs == cfs[8054])
filter(pdx_disc, cfs == cfs[which.max(cfs)])
filter(pdx_disc, cfs == max(cfs))  # I find this one better, simpler



Hope this helps,


Yes, it does.

Thank you very much,

Rich



Re: [R] Find tibble row with maximum recorded value

2021-12-03 Thread Rui Barradas

Hello,

Inline.

At 22:08 on 03/12/21, Rich Shepard wrote:

On Fri, 3 Dec 2021, Rich Shepard wrote:


I find solutions when the data_frame is grouped, but none when it's not.


Thanks, Bert. ?which.max confirmed that's all I need to find the maximum
value.

Now I need to read more than ?filter to learn why I'm not getting the
relevant row with:

which.max(pdx_disc$cfs)

[1] 8054


This is the *index* for which cfs is the first maximum, not the maximum 
value itself.





filter(pdx_disc, cfs == 8054)


Therefore, you probably want any of


filter(pdx_disc, cfs == cfs[8054])

filter(pdx_disc, cfs == cfs[which.max(cfs)])

filter(pdx_disc, cfs == max(cfs))  # I find this one better, simpler
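
To make the index-versus-value point concrete, here is a minimal sketch on a toy tibble. The object `flow` and its values are assumptions for the example; only the column names follow pdx_disc.

```r
library(dplyr)

# Toy tibble: the maximum, 19200, first appears in row 3 and again in row 5.
flow <- tibble(
  sampdt = as.POSIXct("1988-10-01 00:00:00", tz = "UTC") + 600 * (0:4),
  cfs    = c(16800, 17300, 19200, 18200, 19200)
)

idx <- which.max(flow$cfs)     # 3: a row position, not a discharge value

filter(flow, cfs == idx)       # zero rows, because no reading equals 3
filter(flow, cfs == max(cfs))  # rows 3 and 5: every maximum
slice(flow, idx)               # row 3 only: the first maximum
```

The first filter() call mirrors the empty result in the original post: it compares discharge values with a row number.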


Hope this helps,

Rui Barradas



# A tibble: 0 × 9
# … with 9 variables: site_nbr, year, mon, day,
#   hr, min, tz, cfs, sampdt

Regards,

Rich





Re: [R] Find tibble row with maximum recorded value

2021-12-03 Thread Rich Shepard

On Fri, 3 Dec 2021, Rich Shepard wrote:


I find solutions when the data_frame is grouped, but none when it's not.


Thanks, Bert. ?which.max confirmed that's all I need to find the maximum
value.

Now I need to read more than ?filter to learn why I'm not getting the
relevant row with:

which.max(pdx_disc$cfs)

[1] 8054


filter(pdx_disc, cfs == 8054)

# A tibble: 0 × 9
# … with 9 variables: site_nbr, year, mon, day,
#   hr, min, tz, cfs, sampdt

Regards,

Rich



Re: [R] Find tibble row with maximum recorded value

2021-12-03 Thread Rich Shepard

On Fri, 3 Dec 2021, Jeff Newmiller wrote:


cfs is not a function. Don't put parentheses next to it. Use square
brackets for indexing.


Jeff,

Thanks.

Rich



Re: [R] Find tibble row with maximum recorded value

2021-12-03 Thread Jeff Newmiller
cfs is not a function. Don't put parentheses next to it. Use square brackets 
for indexing.
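
A sketch of what the corrected summarize() call might look like, assuming the intent was to return the timestamp (sampdt) at which the maximum occurs. The tibble `flow` is a made-up stand-in for pdx_disc.

```r
library(dplyr)

# Toy stand-in for pdx_disc with only the two columns used here.
flow <- tibble(
  sampdt = as.POSIXct("1988-10-01 00:00:00", tz = "UTC") + 600 * (0:3),
  cfs    = c(16800, 19200, 18200, 17300)
)

max_flow <- flow %>%
  summarize(
    max_cfs        = max(cfs),
    max_cfs_sampdt = sampdt[which.max(cfs)]  # square brackets index the column
  )

max_flow
```

Because which.max() returns a single position, this gives one row even when the maximum is repeated.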

On December 3, 2021 12:55:34 PM PST, Rich Shepard  
wrote:
>I find solutions when the data_frame is grouped, but none when it's not.
>
>The data:
># A tibble: 813,693 × 9
>   site_nbr  year   mon   day    hr   min tz      cfs sampdt
> 1 14211720  1988    10     1     0    10 PDT   16800 1988-10-01 00:10:00
> 2 14211720  1988    10     1     0    20 PDT   16800 1988-10-01 00:20:00
> 3 14211720  1988    10     1     0    30 PDT   17300 1988-10-01 00:30:00
> 4 14211720  1988    10     1     0    40 PDT   18200 1988-10-01 00:40:00
> 5 14211720  1988    10     1     0    50 PDT   18100 1988-10-01 00:50:00
> 6 14211720  1988    10     1     1     0 PDT   18400 1988-10-01 01:00:00
> 7 14211720  1988    10     1     1    10 PDT   18700 1988-10-01 01:10:00
> 8 14211720  1988    10     1     1    20 PDT   19200 1988-10-01 01:20:00
> 9 14211720  1988    10     1     1    30 PDT   19200 1988-10-01 01:30:00
>10 14211720  1988    10     1     1    40 PDT   18900 1988-10-01 01:40:00
># … with 813,683 more rows
>
>The script:
>library(tidyverse)
>
>max_pdx_disc <- pdx_disc %>%
> summarize(max_cfs = max(cfs), max_cfs_sampdt = cfs(which.max(cfs)))
>
>The error:
>> source('../scripts/filter_by_column_max.r') 
>Error: Problem with `summarise()` column `max_cfs_sampdt`.
>  `max_cfs_sampdt = sampt(which.max(cfs))`.
>  could not find function "sampt"
>> Run `rlang::last_error()` to see where the error occurred.
>
>I looked at the last_error and the traceback without understanding how
>to filter the row with the maximum cfs value.
>
>What do I read to learn how to return this row?
>
>TIA,
>
>Rich
>

-- 
Sent from my phone. Please excuse my brevity.



Re: [R] Find tibble row with maximum recorded value

2021-12-03 Thread Bert Gunter
which.max(dat$cfs), I presume.
see ?which.max

(as usual, true tidyverse questions belong on RStudio's help site, not here).

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Fri, Dec 3, 2021 at 12:56 PM Rich Shepard  wrote:
>
> I find solutions when the data_frame is grouped, but none when it's not.
>
> The data:
> # A tibble: 813,693 × 9
>    site_nbr  year   mon   day    hr   min tz      cfs sampdt
>  1 14211720  1988    10     1     0    10 PDT   16800 1988-10-01 00:10:00
>  2 14211720  1988    10     1     0    20 PDT   16800 1988-10-01 00:20:00
>  3 14211720  1988    10     1     0    30 PDT   17300 1988-10-01 00:30:00
>  4 14211720  1988    10     1     0    40 PDT   18200 1988-10-01 00:40:00
>  5 14211720  1988    10     1     0    50 PDT   18100 1988-10-01 00:50:00
>  6 14211720  1988    10     1     1     0 PDT   18400 1988-10-01 01:00:00
>  7 14211720  1988    10     1     1    10 PDT   18700 1988-10-01 01:10:00
>  8 14211720  1988    10     1     1    20 PDT   19200 1988-10-01 01:20:00
>  9 14211720  1988    10     1     1    30 PDT   19200 1988-10-01 01:30:00
> 10 14211720  1988    10     1     1    40 PDT   18900 1988-10-01 01:40:00
> # … with 813,683 more rows
>
> The script:
> library(tidyverse)
>
> max_pdx_disc <- pdx_disc %>%
>  summarize(max_cfs = max(cfs), max_cfs_sampdt = cfs(which.max(cfs)))
>
> The error:
> > source('../scripts/filter_by_column_max.r')
> Error: Problem with `summarise()` column `max_cfs_sampdt`.
>   `max_cfs_sampdt = sampt(which.max(cfs))`.
>   could not find function "sampt"
> > Run `rlang::last_error()` to see where the error occurred.
>
> I looked at the last_error and the traceback without understanding how
> to filter the row with the maximum cfs value.
>
> What do I read to learn how to return this row?
>
> TIA,
>
> Rich
>



[R] Find tibble row with maximum recorded value

2021-12-03 Thread Rich Shepard

I find solutions when the data_frame is grouped, but none when it's not.

The data:
# A tibble: 813,693 × 9
   site_nbr  year   mon   day    hr   min tz      cfs sampdt
 1 14211720  1988    10     1     0    10 PDT   16800 1988-10-01 00:10:00
 2 14211720  1988    10     1     0    20 PDT   16800 1988-10-01 00:20:00
 3 14211720  1988    10     1     0    30 PDT   17300 1988-10-01 00:30:00
 4 14211720  1988    10     1     0    40 PDT   18200 1988-10-01 00:40:00
 5 14211720  1988    10     1     0    50 PDT   18100 1988-10-01 00:50:00
 6 14211720  1988    10     1     1     0 PDT   18400 1988-10-01 01:00:00
 7 14211720  1988    10     1     1    10 PDT   18700 1988-10-01 01:10:00
 8 14211720  1988    10     1     1    20 PDT   19200 1988-10-01 01:20:00
 9 14211720  1988    10     1     1    30 PDT   19200 1988-10-01 01:30:00
10 14211720  1988    10     1     1    40 PDT   18900 1988-10-01 01:40:00
# … with 813,683 more rows

The script:
library(tidyverse)

max_pdx_disc <- pdx_disc %>%
summarize(max_cfs = max(cfs), max_cfs_sampdt = cfs(which.max(cfs)))

The error:
source('../scripts/filter_by_column_max.r') 

Error: Problem with `summarise()` column `max_cfs_sampdt`.
 `max_cfs_sampdt = sampt(which.max(cfs))`.
 could not find function "sampt"

Run `rlang::last_error()` to see where the error occurred.


I looked at the last_error and the traceback without understanding how
to filter the row with the maximum cfs value.

What do I read to learn how to return this row?

TIA,

Rich



Re: [R] Problem with lm Giving Wrong Results

2021-12-03 Thread Labone, Thomas
Two of the machines having the problem are AVX-512 capable (e.g., i7-7820X) but 
another one is an old Samsung Series 5 with an i5-3317U. I guess I will start 
with the folks at Linux Mint.

Tom


Thomas R. LaBone
PhD student
Department of Epidemiology and Biostatistics
Arnold School of Public Health
University of South Carolina
Columbia, South Carolina USA




From: Sarah Goslee 
Sent: Friday, December 3, 2021 11:00 AM
To: Labone, Thomas 
Cc: Bill Dunlap ; r-help@r-project.org 

Subject: Re: [R] Problem with lm Giving Wrong Results

It might also be a BLAS+processor problem - I got bit pretty hard by
that, with an example here:

https://stat.ethz.ch/pipermail/r-help/2019-July/463477.html

With a key excerpt here:

On Thu, Jul 18, 2019 at 1:59 PM Ivan Krylov  wrote:
> Yes, this might be bad. I have heard about OpenBLAS (specifically, the
> matrix product routine) misbehaving on certain AVX-512 capable
> processors, so much that they had to disable some optimizations in
> 0.3.6 [*], which you already have installed. Still, would `env
> OPENBLAS_CORETYPE=Haswell R --vanilla` give a better result?
>

On Fri, Dec 3, 2021 at 10:29 AM Labone, Thomas  wrote:
>
> Thanks for the feedback everyone. If you go to 
> https://github.com/csantill/RPerformanceWBLAS/blob/master/RPerformanceBLAS.md
>  you will find the Linux commands to change the default math library. When I 
> switch the BLAS library from MKL to the system default (see sessionInfo 
> below), everything works as expected. I installed version 2020.0-166-1 of 
> "Intel-MKL" from the Linux Mint Software Manager. I may be coming to a hasty 
> conclusion, but there appears to be something wrong with that package or how 
> it interacts with other system software. Any suggestions on who I should 
> notify about the problem (e.g., Intel, Mint, Ubuntu)?
>
> R version 4.1.2 (2021-11-01)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Linux Mint 20.2
>
> Matrix products: default
> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
> LAPACK: /usr/lib/x86_64-linux-gnu/libmkl_rt.so
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C   LC_TIME=en_US.UTF-8
>  [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
> LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C  LC_ADDRESS=C
> [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base
>
> loaded via a namespace (and not attached):
> [1] compiler_4.1.2 tools_4.1.2
>
>
>
> Thomas R. LaBone
> PhD student
> Department of Epidemiology and Biostatistics
> Arnold School of Public Health
> University of South Carolina
> Columbia, South Carolina USA
>
>
>
> 
> From: Labone, Thomas 
> Sent: Thursday, December 2, 2021 11:53 AM
> To: Bill Dunlap 
> Cc: r-help@r-project.org 
> Subject: Re: [R] Problem with lm Giving Wrong Results
>
> > summary(fit)
>
> Call:
> lm(formula = log(k) ~ Z)
>
> Residuals:
>     Min      1Q  Median      3Q     Max
> -21.241   1.327   1.776   2.245   4.418
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept) -0.03465    0.01916  -1.809   0.0705 .
> Z           -0.24207    0.01916 -12.634   <2e-16 ***
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 1.914 on 9998 degrees of freedom
> Multiple R-squared:  0.01467, Adjusted R-squared:  0.01457
> F-statistic: 148.8 on 1 and 9998 DF,  p-value: < 2.2e-16
>
> > summary(k)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>  0.2735  3.7658  5.9052  7.5113  9.4399 82.9531
> > summary(Z)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
> -3.8906 -0.6744  0.0000  0.0000  0.6744  3.8906
> > summary(gm*gsd^Z)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>  0.3767  0.8204  0.9659  0.9947  1.1372  2.4772
> >
>
>
> Thomas R. LaBone
> PhD student
> Department of Epidemiology and Biostatistics
> Arnold School of Public Health
> University of South Carolina
> Columbia, South Carolina USA
>
>
> 
> From: Bill Dunlap 
> Sent: Thursday, December 2, 2021 10:31 AM
> To: Labone, Thomas 
> Cc: r-help@r-project.org 
> Subject: Re: [R] Problem with lm Giving Wrong Results
>
> On the 'bad' machines, what did you get for
>summary(fit)
>summary(k)
>summary(Z)
>summary(gm*gsd^Z)
> ?
>
> -Bill
>
> On Thu, Dec 2, 2021 at 6:18 AM Labone, Thomas 
> mailto:lab...@email.sc.edu>> wrote:
> In the code below the first and second plots should look pretty much the 
> same, the only difference being that the first has n=1000 points and the 
> second n=10000 points. On two of my Linux machines (info below) the second 
> plot is a horizontal line (incorrect answer from 

Re: [R] Question about Rfast colMins and colMaxs

2021-12-03 Thread Stephen H. Dawson, DSL via R-help

Thanks, Richard.

I am researching other library options for data inspection. I have many 
csv files I am reviewing with different column names and data types. 
Flexibility of a quick review of max and min is quite valuable at this 
juncture.


I will implement your code recommendation next week and see how it performs.


Kindest Regards,
*Stephen Dawson, DSL*
/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com 


On 12/2/21 11:06 PM, Richard O'Keefe wrote:

What puzzles me is why you are not just using
lapply(some.data.frame, min)
lapply(some.data.frame, max)
or as.vector(lapply(...))
Why go to another package for this?
Is it the indices you want?

col.min.indices <- function (some.data.frame) {
    # For each column, the index of the first element equal to the column minimum.
    v <- sapply(some.data.frame, function (column)
        which(column == min(column))[1])
    names(v) <- colnames(some.data.frame)
    v
}
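
For reference, a short usage sketch of the suggestions above on a made-up data frame. The object `toy` is an assumption for illustration, and the helper is restated so the block runs on its own.

```r
# The helper from the message above, restated so this block is self-contained.
col.min.indices <- function(some.data.frame) {
  v <- sapply(some.data.frame, function(column) which(column == min(column))[1])
  names(v) <- colnames(some.data.frame)
  v
}

# Hypothetical toy data; column b reaches its minimum twice.
toy <- data.frame(a = c(3, 1, 2), b = c(5, 4, 4))

sapply(toy, min)        # per-column minima as a named vector
sapply(toy, max)        # per-column maxima
col.min.indices(toy)    # row of the first minimum in each column
sapply(toy, which.min)  # same positions here, since no column contains NA
```

The two index approaches differ once a column contains NA: min() then returns NA and the helper yields NA for that column, whereas which.min() skips missing values.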


On Wed, 1 Dec 2021 at 07:55, Stephen H. Dawson, DSL via R-help
<r-help@r-project.org> wrote:


Hi,


I am working to understand the Rfast functions colMins and colMaxs. I
worked through the example listed on page 54 of the PDF.

https://cran.r-project.org/web/packages/Rfast/index.html


https://cran.r-project.org/web/packages/Rfast/Rfast.pdf


My data is in a CSV file, so I bring it into RStudio using:
Data <- read.csv("./input/DataSet05.csv", header=T)

However, the instructions on page 54 of the PDF say I need to bring the
data into R as a matrix. I think read.csv brings the data in as a data
frame, and that colMins is failing because it is looking for a matrix but
finds a data frame.

 > colMaxs(Data)
Error in colMaxs(Data) :
   Not compatible with requested type: [type=list; target=double].
 > colMins(Data, na.rm = TRUE)
Error in colMins(Data, na.rm = TRUE) :
   unused argument (na.rm = TRUE)
 > colMins(Data, value = FALSE, parallel = FALSE)
Error in colMins(Data, value = FALSE, parallel = FALSE) :
   Not compatible with requested type: [type=list; target=double].

QUESTION
What is the best practice to bring a csv file into RStudio so it can be
accessed by colMaxs and colMins, please?
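
One possible approach, sketched under the assumption that only the numeric columns are of interest: keep read.csv() as it is and convert the numeric columns to a matrix afterwards. The data frame below is a made-up stand-in for the CSV contents.

```r
# Hypothetical stand-in for read.csv("./input/DataSet05.csv"): mixed column types.
Data <- data.frame(
  id    = c("a", "b", "c"),
  temp  = c(20.5, 18.0, 22.1),
  count = c(3L, 7L, 5L),
  stringsAsFactors = FALSE
)

is_num <- vapply(Data, is.numeric, logical(1))  # flag the numeric columns
m <- as.matrix(Data[, is_num, drop = FALSE])    # numeric matrix, no longer a list

is.matrix(m)
apply(m, 2, min)  # base-R column minima
apply(m, 2, max)  # base-R column maxima
# With Rfast attached, m (rather than Data) would be the object to pass on,
# e.g. colMins(m, value = TRUE); check ?colMins for the installed version.
```

Dropping the non-numeric columns first matters: as.matrix() on a data frame with a character column returns a character matrix.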


Thanks,
-- 
*Stephen Dawson, DSL*

/Executive Strategy Consultant/
Business & Technology
+1 (865) 804-3454
http://www.shdawson.com 
>






Re: [R] Problem with lm Giving Wrong Results

2021-12-03 Thread Sarah Goslee
It might also be a BLAS+processor problem - I got bit pretty hard by
that, with an example here:

https://stat.ethz.ch/pipermail/r-help/2019-July/463477.html

With a key excerpt here:

On Thu, Jul 18, 2019 at 1:59 PM Ivan Krylov  wrote:
> Yes, this might be bad. I have heard about OpenBLAS (specifically, the
> matrix product routine) misbehaving on certain AVX-512 capable
> processors, so much that they had to disable some optimizations in
> 0.3.6 [*], which you already have installed. Still, would `env
> OPENBLAS_CORETYPE=Haswell R --vanilla` give a better result?
>

On Fri, Dec 3, 2021 at 10:29 AM Labone, Thomas  wrote:
>
> Thanks for the feedback everyone. If you go to 
> https://github.com/csantill/RPerformanceWBLAS/blob/master/RPerformanceBLAS.md 
> you will find the Linux commands to change the default math library. When I 
> switch the BLAS library from MKL to the system default (see sessionInfo 
> below), everything works as expected. I installed version 2020.0-166-1 of 
> "Intel-MKL" from the Linux Mint Software Manager. I may be coming to a hasty 
> conclusion, but there appears to be something wrong with that package or how 
> it interacts with other system software. Any suggestions on who I should 
> notify about the problem (e.g., Intel, Mint, Ubuntu)?
>
> R version 4.1.2 (2021-11-01)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Linux Mint 20.2
>
> Matrix products: default
> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
> LAPACK: /usr/lib/x86_64-linux-gnu/libmkl_rt.so
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C   LC_TIME=en_US.UTF-8
>  [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
> LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C  LC_ADDRESS=C
> [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base
>
> loaded via a namespace (and not attached):
> [1] compiler_4.1.2 tools_4.1.2
>
>
>
> Thomas R. LaBone
> PhD student
> Department of Epidemiology and Biostatistics
> Arnold School of Public Health
> University of South Carolina
> Columbia, South Carolina USA
>
>
>
> 
> From: Labone, Thomas 
> Sent: Thursday, December 2, 2021 11:53 AM
> To: Bill Dunlap 
> Cc: r-help@r-project.org 
> Subject: Re: [R] Problem with lm Giving Wrong Results
>
> > summary(fit)
>
> Call:
> lm(formula = log(k) ~ Z)
>
> Residuals:
>     Min      1Q  Median      3Q     Max
> -21.241   1.327   1.776   2.245   4.418
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept) -0.03465    0.01916  -1.809   0.0705 .
> Z           -0.24207    0.01916 -12.634   <2e-16 ***
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 1.914 on 9998 degrees of freedom
> Multiple R-squared:  0.01467, Adjusted R-squared:  0.01457
> F-statistic: 148.8 on 1 and 9998 DF,  p-value: < 2.2e-16
>
> > summary(k)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>  0.2735  3.7658  5.9052  7.5113  9.4399 82.9531
> > summary(Z)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
> -3.8906 -0.6744  0.0000  0.0000  0.6744  3.8906
> > summary(gm*gsd^Z)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>  0.3767  0.8204  0.9659  0.9947  1.1372  2.4772
> >
>
>
> Thomas R. LaBone
> PhD student
> Department of Epidemiology and Biostatistics
> Arnold School of Public Health
> University of South Carolina
> Columbia, South Carolina USA
>
>
> 
> From: Bill Dunlap 
> Sent: Thursday, December 2, 2021 10:31 AM
> To: Labone, Thomas 
> Cc: r-help@r-project.org 
> Subject: Re: [R] Problem with lm Giving Wrong Results
>
> On the 'bad' machines, what did you get for
>summary(fit)
>summary(k)
>summary(Z)
>summary(gm*gsd^Z)
> ?
>
> -Bill
>
> On Thu, Dec 2, 2021 at 6:18 AM Labone, Thomas 
> mailto:lab...@email.sc.edu>> wrote:
> In the code below the first and second plots should look pretty much the 
> same, the only difference being that the first has n=1000 points and the 
> second n=10000 points. On two of my Linux machines (info below) the second 
> plot is a horizontal line (incorrect answer from lm), but on my Windows 10 
> machine and a third Linux machine it works as expected. The interesting thing 
> is that the code works as expected for n <= 4095 but fails for n>=4096 (which 
> equals 2^12). Can anyone else reproduce this problem? Any ideas on how to fix 
> it?
>
> set.seed(132)
>
> #~~~
> # This works
> n <- 1000  # OK <= 4095
> Z <- qnorm(ppoints(n))
>
> k <- sort(rlnorm(n,log(2131),log(1.61)) / rlnorm(n,log(355),log(1.61)))
>
> quantile(k,probs=c(0.025,0.5,0.975))
> summary(k)
>
> fit <- lm(log(k) ~ Z)
> summary(fit)
>
> gm <- exp(coef(fit)[1])
> gsd <- exp(coef(fit)[2])
> gm
> gsd
>
> 

Re: [R] Problem with lm Giving Wrong Results

2021-12-03 Thread Labone, Thomas
Thanks for the feedback everyone. If you go to 
https://github.com/csantill/RPerformanceWBLAS/blob/master/RPerformanceBLAS.md 
you will find the Linux commands to change the default math library. When I 
switch the BLAS library from MKL to the system default (see sessionInfo below), 
everything works as expected. I installed version 2020.0-166-1 of "Intel-MKL" 
from the Linux Mint Software Manager. I may be coming to a hasty conclusion, 
but there appears to be something wrong with that package or how it interacts 
with other system software. Any suggestions on who I should notify about the 
problem (e.g., Intel, Mint, Ubuntu)?
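
For anyone comparing machines, the libraries an R session is linked against can also be queried directly from R. This is only a sketch; the paths returned depend entirely on the local installation.

```r
si <- sessionInfo()
si$BLAS    # path of the BLAS shared library in use
si$LAPACK  # path of the LAPACK shared library in use

La_library()              # LAPACK library path, as R reports it
extSoftVersion()["BLAS"]  # BLAS library path (may be empty on some builds)
```

Running this before and after switching the system alternative shows whether R actually picked up the change.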

R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 20.2

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/libmkl_rt.so

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C   LC_TIME=en_US.UTF-8
 [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C  LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.1.2 tools_4.1.2



Thomas R. LaBone
PhD student
Department of Epidemiology and Biostatistics
Arnold School of Public Health
University of South Carolina
Columbia, South Carolina USA




From: Labone, Thomas 
Sent: Thursday, December 2, 2021 11:53 AM
To: Bill Dunlap 
Cc: r-help@r-project.org 
Subject: Re: [R] Problem with lm Giving Wrong Results

> summary(fit)

Call:
lm(formula = log(k) ~ Z)

Residuals:
    Min      1Q  Median      3Q     Max
-21.241   1.327   1.776   2.245   4.418

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.03465    0.01916  -1.809   0.0705 .
Z           -0.24207    0.01916 -12.634   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.914 on 9998 degrees of freedom
Multiple R-squared:  0.01467, Adjusted R-squared:  0.01457
F-statistic: 148.8 on 1 and 9998 DF,  p-value: < 2.2e-16

> summary(k)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.2735  3.7658  5.9052  7.5113  9.4399 82.9531
> summary(Z)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-3.8906 -0.6744  0.0000  0.0000  0.6744  3.8906
> summary(gm*gsd^Z)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.3767  0.8204  0.9659  0.9947  1.1372  2.4772
>


Thomas R. LaBone
PhD student
Department of Epidemiology and Biostatistics
Arnold School of Public Health
University of South Carolina
Columbia, South Carolina USA



From: Bill Dunlap 
Sent: Thursday, December 2, 2021 10:31 AM
To: Labone, Thomas 
Cc: r-help@r-project.org 
Subject: Re: [R] Problem with lm Giving Wrong Results

On the 'bad' machines, what did you get for
   summary(fit)
   summary(k)
   summary(Z)
   summary(gm*gsd^Z)
?

-Bill

On Thu, Dec 2, 2021 at 6:18 AM Labone, Thomas 
mailto:lab...@email.sc.edu>> wrote:
In the code below the first and second plots should look pretty much the same, 
the only difference being that the first has n=1000 points and the second 
n=10000 points. On two of my Linux machines (info below) the second plot is a 
horizontal line (incorrect answer from lm), but on my Windows 10 machine and a 
third Linux machine it works as expected. The interesting thing is that the 
code works as expected for n <= 4095 but fails for n>=4096 (which equals 2^12). 
Can anyone else reproduce this problem? Any ideas on how to fix it?

set.seed(132)

#~~~
# This works
n <- 1000  # OK <= 4095
Z <- qnorm(ppoints(n))

k <- sort(rlnorm(n,log(2131),log(1.61)) / rlnorm(n,log(355),log(1.61)))

quantile(k,probs=c(0.025,0.5,0.975))
summary(k)

fit <- lm(log(k) ~ Z)
summary(fit)

gm <- exp(coef(fit)[1])
gsd <- exp(coef(fit)[2])
gm
gsd

plot(Z,k,log="y",xlim=c(-4,4),ylim=c(0.1,100))
lines(Z,gm*gsd^Z,col="red")

#~~~
#this does not
n <- 10000  # fails >= 4096 = 2^12
Z <- qnorm(ppoints(n))

k <- sort(rlnorm(n,log(2131),log(1.61)) / rlnorm(n,log(355),log(1.61)))

quantile(k,probs=c(0.025,0.5,0.975))
summary(k)

fit <- lm(log(k) ~ Z)
summary(fit)

gm <- exp(coef(fit)[1])
gsd <- exp(coef(fit)[2])
gm
gsd

plot(Z,k,log="y",xlim=c(-4,4),ylim=c(0.1,100))
lines(Z,gm*gsd^Z,col="red")
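
A plot-free way to check whether lm() itself is returning wrong coefficients is to compare them with the textbook simple-regression formulas. The sketch below assumes n = 10000 (consistent with the 9998 residual degrees of freedom reported in this thread) and introduces the names `y`, `slope` and `intercept` purely for illustration.

```r
set.seed(132)
n <- 10000  # assumed sample size for the failing case
Z <- qnorm(ppoints(n))
k <- sort(rlnorm(n, log(2131), log(1.61)) / rlnorm(n, log(355), log(1.61)))
y <- log(k)

fit <- lm(y ~ Z)

# Textbook simple-regression estimates, computed without a QR decomposition.
slope     <- cov(Z, y) / var(Z)
intercept <- mean(y) - slope * mean(Z)

rbind(lm = coef(fit), closed_form = c(intercept, slope))

# TRUE on a healthy setup; a mismatch points at the linear algebra backend.
isTRUE(all.equal(unname(coef(fit)), c(intercept, slope), tolerance = 1e-8))
```

If the two rows disagree on one machine and agree on another, that supports the suspicion that the numerical library, rather than the script, is at fault.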


#~~~
> sessionInfo() #for two Linux machines having problem
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 20.2

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/libmkl_rt.so

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C   

Re: [R] SOMAscan data analysis

2021-12-03 Thread Eric Berger
Hi Kai,
Check out https://www.bioconductor.org
or the help there at
https://www.bioconductor.org/help/

You can also post your question there.

Best,
Eric


On Fri, Dec 3, 2021 at 2:22 AM Kai Yang via R-help 
wrote:

> Hello R team, we have a huge SOMAscan data set. This is an aptamer-based
> proteomics assay capable of measuring 1305 human protein analytes. Does
> anyone know which package can load the data and do the analysis? I
> appreciate any suggestions, experience, web pages, or papers.
> Thank you, Kai
>
