Re: [R] add only the 1st of May with POSIXct

2024-05-29 Thread Rui Barradas

On 29/05/2024 at 07:01, Stefano Sofia wrote:

Thank you Rui for your code.

I basically understood all your suggestions.

I am using an old version of R (version 3.6.3, installed on a server I am not 
allowed to manage), and the new pipe operator does not work.

I tried to run your code without the "|>" operator, but I get an error when I 
use apply.

Could you please rewrite your code without the pipe operator?


Thank you again for your help

Stefano



  (oo)
--oOO--( )--OOo--
Stefano Sofia PhD
Civil Protection - Marche Region - Italy
Meteo Section
Snow Section
Via del Colle Ameno 5
60126 Torrette di Ancona, Ancona (AN)
Uff: +39 071 806 7743
E-mail: stefano.so...@regione.marche.it
---Oo-oO


____
From: Rui Barradas 
Sent: Tuesday, 28 May 2024 18:19
To: Stefano Sofia; r-help@R-project.org
Subject: Re: [R] add only the 1st of May with POSIXct


On 28/05/2024 at 16:23, Stefano Sofia wrote:

Dear R-list users,

Starting from an initial and a final date, I create a sequence of days using POSIXct.

If this interval covers the months from May to October, fully or only in part, I 
need to get rid of the days from the 2nd of May to the 31st of October:


a <- as.POSIXct("2002-11-01", format = "%Y-%m-%d", tz="Etc/GMT-1")

b <- as.POSIXct("2004-06-01", format = "%Y-%m-%d", tz="Etc/GMT-1")

mydf <- data.frame(data_POSIX = seq(
  as.POSIXct(paste(format(a, "%Y-%m-%d"), "09:00:00"), format = "%Y-%m-%d %H:%M:%S", tz = "Etc/GMT-1"),
  as.POSIXct(paste(format(b, "%Y-%m-%d"), "09:00:00"), format = "%Y-%m-%d %H:%M:%S", tz = "Etc/GMT-1"),
  by = "1 day"))


If I execute

as.data.frame(mydf[format(mydf$data_POSIX, "%m") %in% c("11", "12", "01", "02", "03", "04"), ])

the interval will be

from 2002-11-01 09:00:00 to 2003-04-30 09:00:00

and from 2003-11-01 09:00:00 to 2004-04-30 09:00:00


but I also need 2003-05-01 09:00:00 and 2004-05-01 09:00:00.


How can I solve this problem?


Thank you for your attention and your help

Stefano






IMPORTANT NOTICE: This e-mail message is intended to be received only by 
persons entitled to receive the confidential information it may contain. E-mail 
messages to clients of Regione Marche may contain information that is 
confidential and legally privileged. Please do not read, copy, forward, or 
store this message unless you are an intended recipient of it. If you have 
received this message in error, please forward it to the sender and delete it 
completely from your computer system. Pursuant to art. 6 of DGR no. 1394/2008, 
please note that, in case of necessity and urgency, the reply to this e-mail 
message may be viewed by persons other than the addressee.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] add only the 1st of May with POSIXct

2024-05-28 Thread Rui Barradas


Hello,

First of all, 'a' and 'b' are already objects of class "POSIXct"; you 
don't need to repeat the code creating them when creating mydf.


As for the question, see the code below.


a <- as.POSIXct("2002-11-01", format = "%Y-%m-%d", tz="Etc/GMT-1")
b <- as.POSIXct("2004-06-01", format = "%Y-%m-%d", tz="Etc/GMT-1")
mydf <- data.frame(data_POSIX = seq(a, b, by = "1 day"))

# get the years from the data
years <- format(c(a, b), "%Y") |> as.integer()
# this creates a sequence with all the years
years <- Reduce(`:`, years)

# create "POSIXct" date-times (ISOdate's default hour is 12:00)
from <- ISOdate(years, 5L, 2L, tz = "Etc/GMT-1")
to <- ISOdate(years, 10L, 30L, tz = "Etc/GMT-1")

# this logical index keeps only the dates between May, 2nd and Nov 1st.
keep <- data.frame(from, to) |>
  apply(1L, \(x) x[1L] <= mydf$data_POSIX & mydf$data_POSIX <= x[2L]) |>
  rowSums() > 0L

mydf[keep, , drop = FALSE]



Hope this helps,

Rui Barradas
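For R 3.6.3, here is a sketch that avoids the native pipe |> and the \(x) lambda shorthand (both were introduced in R 4.1.0), and avoids apply() as well. It assumes the goal stated in the question: drop the days from 2 May to 31 October and keep 1 May. Instead of building yearly intervals, it compares zero-padded "MM-DD" strings, which sort in calendar order. The names first_day, last_day, md, summer and mydf_winter are illustrative only.

```r
a <- as.POSIXct("2002-11-01", format = "%Y-%m-%d", tz = "Etc/GMT-1")
b <- as.POSIXct("2004-06-01", format = "%Y-%m-%d", tz = "Etc/GMT-1")

# daily sequence at 09:00, as in the original question
first_day <- as.POSIXct(paste(format(a, "%Y-%m-%d"), "09:00:00"),
                        format = "%Y-%m-%d %H:%M:%S", tz = "Etc/GMT-1")
last_day <- as.POSIXct(paste(format(b, "%Y-%m-%d"), "09:00:00"),
                       format = "%Y-%m-%d %H:%M:%S", tz = "Etc/GMT-1")
mydf <- data.frame(data_POSIX = seq(first_day, last_day, by = "1 day"))

# month and day of each date as zero-padded "MM-DD" strings, e.g. "05-01";
# format() uses the column's own time zone (Etc/GMT-1)
md <- format(mydf$data_POSIX, "%m-%d")

# TRUE from 2 May to 31 October; zero-padding keeps the string order calendar-like
summer <- md >= "05-02" & md <= "10-31"

# keep 1 November to 1 May of each season
mydf_winter <- mydf[!summer, , drop = FALSE]
head(mydf_winter)
```

Because the test works on month and day only, the same two lines (md and summer) apply unchanged to any range of years.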


--
This e-mail has been checked for viruses by AVG antivirus software.
www.avg.com



Re: [R] Print date on y axis with month, day, and year

2024-05-10 Thread Rui Barradas

On 10/05/2024 at 00:58, Sorkin, John wrote:

I am trying to use ggplot to plot the data below with the R code below. The dates 
(jdate) print as Mar 01, Mar 15, etc. I want the date printed as 
MMM DD  (or any other way that shows month, date, and year, e.g. 
mm/dd/yy). How can I accomplish this?

library(ggplot2)
yyy <- structure(list(
  jdate = structure(c(19052, 19053, 19054, 19055, 19058, 19059, 19060,
                      19061, 19062, 19063, 19065, 19066, 19067, 19068,
                      19069, 19072, 19073, 19074, 19075, 19076, 19077,
                      19083, 19086, 19087, 19088, 19089, 19090, 19093,
                      19094, 19095), class = "Date"),
  Sum = c(1, 3, 9, 11, 13, 16, 18, 22, 26, 27, 30, 32, 35, 39, 41,
          43, 48, 51, 56, 58, 59, 63, 73, 79, 81, 88, 91, 93, 96, 103)),
  row.names = c(NA, 30L), class = "data.frame")
yyy
class(yyy$jdate)
ggplot(data = yyy[1:30, ], aes(as.Date(jdate, format = "%m-%d-%Y"), Sum)) + geom_point()


Thank you
John



John David Sorkin M.D., Ph.D.
Professor of Medicine, University of Maryland School of Medicine;
Associate Director for Biostatistics and Informatics, Baltimore VA Medical 
Center Geriatrics Research, Education, and Clinical Center;
PI Biostatistics and Informatics Core, University of Maryland School of 
Medicine Claude D. Pepper Older Americans Independence Center;
Senior Statistician University of Maryland Center for Vascular Research;

Division of Gerontology and Palliative Care,
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
Cell phone 443-418-5382




Hello,

Since class(yyy$jdate) returns "Date", jdate is already a real date and 
scale_x_date can handle its printed format, so there is no need for an 
extra as.Date in aes(). Also get rid of the format = "%m-%d-%Y" argument.


Let scale_x_date take care of formatting the date the way you want it 
displayed. Either of the two formats below is valid.




ggplot(data = yyy[1:30,], aes(jdate, Sum)) +
   geom_point() +
   # scale_x_date(date_labels = "%b %d, %Y")
   scale_x_date(date_labels = "%m/%d/%Y")



Hope this helps,

Rui Barradas
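The date_labels strings use the same strftime-style codes as base R's format(), so they can be checked before plotting. A small check, using the first jdate value from the data above (19052 days after 1970-01-01); the %b month abbreviation depends on the locale, so only the English form is shown as an example:

```r
# first jdate value from the data above, stored as days since 1970-01-01
d <- as.Date(19052, origin = "1970-01-01")

# the two date_labels formats suggested above
us_style <- format(d, "%m/%d/%Y")     # "03/01/2022"
long_style <- format(d, "%b %d, %Y")  # e.g. "Mar 01, 2022" in an English locale
us_style
long_style
```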





Re: [R] x[0]: Can '0' be made an allowed index in R?

2024-04-21 Thread Rui Barradas

On 21/04/2024 at 09:08, Rui Barradas wrote:

On 21/04/2024 at 08:55, Hans W wrote:

As we all know, in R indices for vectors start with 1, i.e., x[0] does not
select any element. Some algorithms, e.g. in graph theory or combinatorics,
are much easier to formulate and code if 0 is an allowed index pointing to
the first element of the vector.

Some programming languages, for instance Julia (where the index for normal
vectors also starts with 1), provide libraries/packages that allow the user
to define an index range for its vectors, say 0:9 or 10:20 or even negative
indices.

Thanks, Hans W.


Hello,

I find what you are asking awkward, but it can be done with S3 classes.
Write an extraction method for the new class; it works in the use case 
below. The method increments the index before calling NextMethod, the 
usual extraction function.



`[.zerobased` <- function(x, i, ...) {
   i <- i + 1L
   NextMethod()
}
as_zerobased <- function(x) {
   class(x) <- c("zerobased", class(x))
   x
}

x <- 1:10
y <- as_zerobased(x)

y[0]
#> [1] 1
y[1]
#> [1] 2
y[9]
#> [1] 10
y[10]
#> [1] NA


Hope this helps,

Rui Barradas



Sorry, forgot to also define a `[[.zerobased` method. It's probably safer.


`[[.zerobased` <- function(x, i, ...) {
  i <- i + 1L
  NextMethod()
}


Hope this helps,

Rui Barradas
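The two methods above only cover extraction. A sketch of a matching replacement method, assuming the same one-step index shift is wanted for assignment, so that y[0] <- value changes the first element. `[<-.zerobased` is a hypothetical name that simply follows the S3 naming pattern used above, and the block repeats the thread's methods so it runs on its own. It relies on NextMethod() passing on the modified value of i, as the extraction methods already do. Negative, logical, character or missing indices are not handled, because i + 1L would change their meaning or fail.

```r
# extraction methods from the thread: shift the zero-based index by one
`[.zerobased` <- function(x, i, ...) {
  i <- i + 1L
  NextMethod()
}
`[[.zerobased` <- function(x, i, ...) {
  i <- i + 1L
  NextMethod()
}

# hypothetical replacement method: y[i] <- value with a zero-based i
`[<-.zerobased` <- function(x, i, value) {
  i <- i + 1L
  NextMethod()
}

as_zerobased <- function(x) {
  class(x) <- c("zerobased", class(x))
  x
}

y <- as_zerobased(1:10)
y[0] <- 100L  # replaces the first element
y[0]          # 100
y[[0]]        # 100
y[1]          # 2, the second element
```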






Re: [R] Exceptional slowness with read.csv

2024-04-10 Thread Rui Barradas

On 08/04/2024 at 06:47, Dave Dixon wrote:

Greetings,

I have a csv file of 76 fields and about 4 million records. I know that 
some of the records have errors - unmatched quotes, specifically. 
Reading the file with readLines and parsing the lines with read.csv(text 
= ...) is really slow. I know that the first 2459465 records are good. 
So I try this:


 > startTime <- Sys.time()
 > first_records <- read.csv(file_name, nrows = 2459465)
 > endTime <- Sys.time()
 > cat("elapsed time = ", endTime - startTime, "\n")

elapsed time =   24.12598

 > startTime <- Sys.time()
 > second_records <- read.csv(file_name, skip = 2459465, nrows = 5)
 > endTime <- Sys.time()
 > cat("elapsed time = ", endTime - startTime, "\n")

This appears to never finish. I have been waiting over 20 minutes.

So why would (skip = 2459465, nrows = 5) take orders of magnitude longer 
than (nrows = 2459465) ?


Thanks!

-dave

PS: readLines(n=2459470) takes 10.42731 seconds.


Hello,

Can the following function be of help?
After reading the data with argument quote = "", call a function that 
applies gregexpr to the character columns and then transforms the output 
into a two-column data.frame with columns


 Col - the column processed;
 Unbalanced - the rows with unbalanced double quotes.

I am assuming the quotes are double quotes. It shouldn't be difficult to 
adapt it to other cases: single quotes, or both.





unbalanced_dquotes <- function(x) {
  char_cols <- sapply(x, is.character) |> which()
  lapply(char_cols, \(i) {
    y <- x[[i]]
    # gregexpr returns -1 for a string with no match,
    # so count only the positive match positions
    Unbalanced <- gregexpr('"', y) |>
      sapply(\(m) sum(m > 0L) %% 2L == 1L) |>
      which()
    # rep() avoids an error when no row of this column is unbalanced
    data.frame(Col = rep(i, length(Unbalanced)), Unbalanced = Unbalanced)
  }) |>
    do.call(rbind, args = _)
}

# read the data disregarding quotes ('fl' is the path to the CSV file)
df1 <- read.csv(fl, quote = "")
# determine which strings have unbalanced quotes and
# where
unbalanced_dquotes(df1)


Hope this helps,

Rui Barradas
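A line-level alternative, sketched under the assumption that every record sits on a single physical line (a quoted field that legitimately contains a newline would be flagged falsely): count the double quotes on each line returned by readLines() and report the lines with an odd count. It uses only base R and no |>, so it also runs on R versions before 4.1.0. The four lines below are made-up stand-ins for readLines(file_name).

```r
# made-up example lines standing in for readLines(file_name)
lines <- c('id,name,comment',
           '1,"Smith","ok"',
           '2,"Jones,missing close',
           '3,plain,text')

# number of double-quote characters on each line
n_quotes <- lengths(regmatches(lines, gregexpr('"', lines, fixed = TRUE)))

# an odd count means at least one quote on that line is unmatched
bad_lines <- which(n_quotes %% 2L == 1L)
bad_lines
lines[bad_lines]
```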




Re: [R] Exceptional slowness with read.csv

2024-04-08 Thread Rui Barradas

On 08/04/2024 at 19:42, Ivan Krylov via R-help wrote:

On Sun, 7 Apr 2024 23:47:52 -0600, Dave Dixon wrote:


  > second_records <- read.csv(file_name, skip = 2459465, nrows = 5)


It may or may not be important that read.csv defaults to header =
TRUE. Having skipped 2459465 lines, it may attempt to parse the next
one as a header, so the second read.csv() call should probably include
header = FALSE.



This will throw an error; call read.table with sep = "," instead.




Bert's advice to try scan() is on point, though. It's likely that the
default-enabled header is not the most serious problem here.



Hope this helps,

Rui Barradas
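A sketch of the scan() route, assuming the goal is to pull a few raw lines after a large skip and parse them afterwards. With sep = "\n", scan() returns whole lines, and its default quote for that separator is "", so an unmatched quote cannot make it keep reading past the requested lines; read.csv(text = ..., header = FALSE) then parses only those lines. Whether this is faster on the 4-million-record file is untested here; the small temporary file is a made-up stand-in for it.

```r
# made-up small CSV standing in for the large file
file_name <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1,x", "2,y", "3,z", "4,w"), file_name)

# skip the header and the first two records, then read two raw lines;
# with sep = "\n" each line is one field and quotes are not interpreted
raw <- scan(file_name, what = character(), sep = "\n",
            skip = 3L, nlines = 2L, quiet = TRUE)

# parse just those lines, without treating the first one as a header
second_records <- read.csv(text = raw, header = FALSE)
second_records
```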




Re: [R] Question regarding reservoir volume and water level

2024-04-07 Thread Rui Barradas

On 07/04/2024 at 13:27, javad bayat wrote:

Dear all,
I have a question about the water level of a reservoir when the volume
changes or doubles.
There is a DEM file with the highest elevation 1267 m. The lowest elevation
is 1230 m. The current volume of the reservoir is 7,000,000 m3 at 1240 m.
Now I want to know: what would the volume be if the water level rises to
1250 m? Or what would the water level be if the volume doubled (14,000,000
m3)?

Is there any way to write code to do this in R?
I would be more than happy if anyone could help me.
Sincerely









Hello,

This is a simple rule of three.
If you know the level l, the argument doesn't need to be named, but if you 
know the volume v, then it must be named.



water_level <- function(l, v, level = 1240, volume = 7e6) {
  if (missing(v)) {
    volume * l / level
  } else level * v / volume
}

lev <- 1250
vol <- 14e6

water_level(l = lev)
#> [1] 7056452
water_level(v = vol)
#> [1] 2480


Hope this helps,

Rui Barradas
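The rule of three above treats volume as proportional to absolute elevation, so doubling 7,000,000 m3 gives 2480 m, above the 1267 m top of the DEM. Since the question mentions a DEM, another route is to compute the stored volume from the grid itself: each cell with ground elevation z below a water surface at elevation h holds (h - z) times the cell area, so V(h) = A * sum(max(h - z, 0)), and the level for a target volume is the root of V(h) - target. This is a sketch under stated assumptions: the bowl-shaped DEM below is made up (a 50 x 50 grid of 10 m cells rising from about 1230 m to 1267 m), every cell below h is assumed to belong to the reservoir, and a real DEM would have to be loaded as a numeric matrix with its true cell size.

```r
# made-up DEM: ground elevations (m) on a 50 x 50 grid, a bowl from about
# 1230 m at the centre to 1267 m at the corners; replace with real values
n <- 50L
s <- seq(-1, 1, length.out = n)
dem <- 1230 + 18.5 * outer(s^2, s^2, "+")  # 18.5 * 2 = 37 m of relief
cell_area <- 10 * 10                        # m^2 per cell (assumed 10 m cells)

# stored volume (m^3) for a water surface at elevation h
volume_at <- function(h, z = dem, area = cell_area) {
  sum(pmax(h - z, 0)) * area
}

# water surface elevation (m) that stores volume v, found by root finding
level_for <- function(v, z = dem, area = cell_area) {
  uniroot(function(h) volume_at(h, z, area) - v,
          lower = min(z), upper = max(z))$root
}

v_1240 <- volume_at(1240)          # volume at the current level
v_1250 <- volume_at(1250)          # volume if the level rises to 1250 m
h_double <- level_for(2 * v_1240)  # level if the current volume doubles
c(v_1240 = v_1240, v_1250 = v_1250, h_double = h_double)
```

Because the volume grows faster than linearly with depth in a basin that widens upwards, doubling the volume raises the level by less than double the depth.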




Re: [R] Output of tapply function as data frame: Problem Fixed

2024-03-28 Thread Rui Barradas

On 29/03/2024 at 01:43, Ogbos Okike wrote:

Dear Rui,
Thanks again for resolving this. I have already started using the version
that works for me.

But to clarify the second part, please let me paste what I did and the
error message:


set.seed(2024)
data <- data.frame(

+Date = sample(seq(Sys.Date() - 5, Sys.Date(), by = "1 days"), 100L,
+ TRUE),
+count = sample(10L, 100L, TRUE)
+ )


# coerce tapply's result to class "data.frame"
res <- with(data, tapply(count, Date, mean)) |> as.data.frame()

Error: unexpected '>' in "res <- with(data, tapply(count, Date, mean)) |>"

# assign a dates column from the row names
res$Date <- row.names(res)

Error in row.names(res) : object 'res' not found

# cosmetics
names(res)[2:1] <- names(data)

Error in names(res)[2:1] <- names(data) : object 'res' not found

# note that the row names are still tapply's names vector
# and that the columns order is not Date/count. Both are fixed
# after the calculations.
res


You can see that the error message points at the pipe. Please let me know
what I am missing.
Thanks.

On Wed, Mar 27, 2024 at 10:45 PM Rui Barradas  wrote:


On 27/03/2024 at 08:58, Ogbos Okike wrote:

Dear Rui,
Nice to hear from you!

I am sorry for the omission and I have taken note.

Many thanks for responding. The second solution looks elegant as it quickly
resolved the problem.

Please, take a second look at the first solution. It refused to run. It looks
as if the pipe is not properly positioned. Efforts to correct it and get it to
run failed. If you can look further, it would be great. If time does not
permit, I am fine too.

But having the two solutions will certainly make the subject more
interesting.
Thank you so much.
With warmest regards from
Ogbos

On Wed, Mar 27, 2024 at 8:44 AM Rui Barradas wrote:



On 27/03/2024 at 04:30, Ogbos Okike wrote:

Warm greetings to you all.

Using the tapply function below:
data<-read.table("FD1month",col.names = c("Dates","count"))
x=data$count
f<-factor(data$Dates)
AB<- tapply(x,f,mean)


I made a simple calculation. The result, stored in AB, is of the form
below. But an effort to write AB to a file as a data frame fails. When I
use write.table, it only produces the count column and strips off the
first column (date).

2005-11-01 2005-12-01 2006-01-01 2006-02-01 2006-03-01 2006-04-01 2006-05-01
 -4.106887  -4.259154  -5.836090  -4.756757  -4.118011  -4.487942  -4.430705
2006-06-01 2006-07-01 2006-08-01 2006-09-01 2006-10-01 2006-11-01 2006-12-01
 -3.856727  -6.067103  -6.418767  -4.383031  -3.985805  -4.768196 -10.072579
2007-01-01 2007-02-01 2007-03-01 2007-04-01 2007-05-01 2007-06-01 2007-07-01
 -5.342338  -4.653128  -4.325094  -4.525373  -4.574783  -3.915600  -4.127980
2007-08-01 2007-09-01 2007-10-01 2007-11-01 2007-12-01 2008-01-01 2008-02-01
 -3.952150  -4.033518  -4.532878  -4.522941  -4.485693  -3.922155  -4.183578
2008-03-01 2008-04-01 2008-05-01 2008-06-01 2008-07-01 2008-08-01 2008-09-01
 -4.336969  -3.813306  -4.296579  -4.575095  -4.036036  -4.727994  -4.347428
2008-10-01 2008-11-01 2008-12-01
 -4.029918  -4.260326  -4.454224

But the format I want only appears on the terminal, which leads me to copy
it and paste it into a text file. That is, when I enter AB on the terminal,
it returns output in the form:

2008-02-01  -4.183578
2008-03-01  -4.336969
2008-04-01  -3.813306
2008-05-01  -4.296579
2008-06-01  -4.575095
2008-07-01  -4.036036
2008-08-01  -4.727994
2008-09-01  -4.347428
2008-10-01  -4.029918
2008-11-01  -4.260326
2008-12-01  -4.454224

Now, my question: how do I write the two columns that AB displays on the
terminal to a file?

I have tried using AB<-data.frame(AB) but it doesn't work either.

Many thanks for your time.
Ogbos


Hello,

The main trick is to pipe tapply's result to as.data.frame. The result will
have only one column, so you must create the dates column from its row names.
I also include an aggregate solution.



# create a test data set
set.seed(2024)
data <- data.frame(
 Date = sample(seq(Sys.Date() - 5, Sys.Date(), by = "1 days"), 100L, TRUE),
 count = sample(10L, 100L, TRUE)
)

# coerce tapply's result to class "data.frame"
res <- with(data, tapply(count, Date, mean)) |> as.data.frame()
# assign a dates column from the row names
res$Date <- row.names(res)
# cosmetics
names(res)[2:1] <- names(data)
# note that the row names are still tapply's names vector
# and that the columns order is not Date/count. Both are fixed
# after the calculations.

Re: [R] Output of tapply function as data frame: Problem Fixed

2024-03-27 Thread Rui Barradas


Hello,

The main trick is to pipe tapply's result to as.data.frame. The result will
have only one column, so you must create the dates column from its row names.
I also include an aggregate solution.



# create a test data set
set.seed(2024)
data <- data.frame(
Date = sample(seq(Sys.Date() - 5, Sys.Date(), by = "1 days"), 100L, TRUE),
count = sample(10L, 100L, TRUE)
)

# coerce tapply's result to class "data.frame"
res <- with(data, tapply(count, Date, mean)) |> as.data.frame()
# assign a dates column from the row names
res$Date <- row.names(res)
# cosmetics
names(res)[2:1] <- names(data)
# note that the row names are still tapply's names vector
# and that the columns order is not Date/count. Both are fixed
# after the calculations.
res
#>               count       Date
#> 2024-03-22 5.416667 2024-03-22
#> 2024-03-23 5.500000 2024-03-23
#> 2024-03-24 6.000000 2024-03-24
#> 2024-03-25 4.476190 2024-03-25
#> 2024-03-26 6.538462 2024-03-26
#> 2024-03-27 5.200000 2024-03-27

# fix the columns' order
res <- res[2:1]



# better all in one instruction
aggregate(count ~ Date, data, mean)
#>         Date    count
#> 1 2024-03-22 5.416667
#> 2 2024-03-23 5.500000
#> 3 2024-03-24 6.000000
#> 4 2024-03-25 4.476190
#> 5 2024-03-26 6.538462
#> 6 2024-03-27 5.200000



Also,
I'm glad to help as always, but, Ogbos, you have been an R-Help
contributor for quite a while; please post data in dput format. Given
the problem, the output of the following is more than enough.


dput(head(data, 20L))


Hope this helps,

Rui Barradas






Hello,

This pipe?


with(data, tapply(count, Date, mean)) |> as.data.frame()


I am not seeing anything wrong with it. I have tried it again just now 
and it runs with no problems, as it did before.

A solution is not to pipe: separate the instructions.
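The native pipe |> was added in R 4.1.0, and an older R stops parsing at the second character with an "unexpected '>'" error like the one reported above. A pipe-free sketch of the same calculation, which builds the two-column data frame directly from tapply's names and values and writes it to a file; the temporary output path is only a stand-in for a real file name:

```r
# test data as in the thread
set.seed(2024)
data <- data.frame(
  Date = sample(seq(Sys.Date() - 5, Sys.Date(), by = "1 days"), 100L, TRUE),
  count = sample(10L, 100L, TRUE)
)

# same calculation, stored in an intermediate object instead of piped
means <- with(data, tapply(count, Date, mean))

# names(means) holds the dates, the values hold the means
res <- data.frame(Date = names(means), count = as.vector(means))

# write both columns to a file; the temporary path is only for illustration
out_file <- tempfile(fileext = ".txt")
write.table(res, out_file, row.names = FALSE, quote = FALSE)
res
```

Writing with row.names = FALSE keeps the dates as an ordinary first column, which is the two-column layout asked for in the original question.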


Re: [R] Output of tapply function as data frame

2024-03-27 Thread Rui Barradas

Às 04:30 de 27/03/2024, Ogbos Okike escreveu:

Warm greetings to you all.

Using the tapply function below:
data<-read.table("FD1month",col.names = c("Dates","count"))
x=data$count
  f<-factor(data$Dates)
AB<- tapply(x,f,mean)


I made a simple calculation. The result, stored in AB, is of the form
below. But an effort to write AB to a file as a data frame fails. When I
use the write table, it only produces the count column and strip of the
first column (date).

2005-11-01 2005-12-01 2006-01-01 2006-02-01 2006-03-01 2006-04-01 2006-05-01
 -4.106887  -4.259154  -5.836090  -4.756757  -4.118011  -4.487942  -4.430705
2006-06-01 2006-07-01 2006-08-01 2006-09-01 2006-10-01 2006-11-01 2006-12-01
 -3.856727  -6.067103  -6.418767  -4.383031  -3.985805  -4.768196 -10.072579
2007-01-01 2007-02-01 2007-03-01 2007-04-01 2007-05-01 2007-06-01 2007-07-01
 -5.342338  -4.653128  -4.325094  -4.525373  -4.574783  -3.915600  -4.127980
2007-08-01 2007-09-01 2007-10-01 2007-11-01 2007-12-01 2008-01-01 2008-02-01
 -3.952150  -4.033518  -4.532878  -4.522941  -4.485693  -3.922155  -4.183578
2008-03-01 2008-04-01 2008-05-01 2008-06-01 2008-07-01 2008-08-01 2008-09-01
 -4.336969  -3.813306  -4.296579  -4.575095  -4.036036  -4.727994  -4.347428
2008-10-01 2008-11-01 2008-12-01
 -4.029918  -4.260326  -4.454224

But the format I want only appears on the terminal, so I have to copy
it and paste it into a text file. That is, when I enter AB
on the terminal, it prints output in the form:

2008-02-01  -4.183578
2008-03-01  -4.336969
2008-04-01  -3.813306
2008-05-01  -4.296579
2008-06-01  -4.575095
2008-07-01  -4.036036
2008-08-01  -4.727994
2008-09-01  -4.347428
2008-10-01  -4.029918
2008-11-01  -4.260326
2008-12-01  -4.454224

Now, my question: how do I write the two columns displayed by AB on the
terminal to a file?

I have tried AB <- data.frame(AB), but that doesn't work either.

Many thanks for your time.
Ogbos


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

The main trick is to pipe the result to as.data.frame. But the result will have only one 
column, so you must create the dates column from the data frame's row names.

I also include an aggregate solution.



# create a test data set
set.seed(2024)
data <- data.frame(
  Date = sample(seq(Sys.Date() - 5, Sys.Date(), by = "1 days"), 100L, TRUE),
  count = sample(10L, 100L, TRUE)
)

# coerce tapply's result to class "data.frame"
res <- with(data, tapply(count, Date, mean)) |> as.data.frame()
# assign a dates column from the row names
res$Date <- row.names(res)
# cosmetics
names(res)[2:1] <- names(data)
# note that the row names are still tapply's names vector
# and that the columns order is not Date/count. Both are fixed
# after the calculations.
res
#>   count   Date
#> 2024-03-22 5.416667 2024-03-22
#> 2024-03-23 5.50 2024-03-23
#> 2024-03-24 6.00 2024-03-24
#> 2024-03-25 4.476190 2024-03-25
#> 2024-03-26 6.538462 2024-03-26
#> 2024-03-27 5.20 2024-03-27

# fix the columns' order and reset the row names
res <- res[2:1]
row.names(res) <- NULL



# better all in one instruction
aggregate(count ~ Date, data, mean)
#>         Date    count
#> 1 2024-03-22 5.416667
#> 2 2024-03-23 5.50
#> 3 2024-03-24 6.00
#> 4 2024-03-25 4.476190
#> 5 2024-03-26 6.538462
#> 6 2024-03-27 5.20
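The original question was how to write the two columns to a file. A minimal sketch, assuming a tab-separated text file is wanted; the file name FD1month_means.txt is hypothetical and is written to a temporary directory here.

```r
# same kind of test data as in this reply
set.seed(2024)
data <- data.frame(
  Date = sample(seq(Sys.Date() - 5, Sys.Date(), by = "1 days"), 100L, TRUE),
  count = sample(10L, 100L, TRUE)
)

res <- aggregate(count ~ Date, data, mean)

# hypothetical output file; replace with the real path
out_file <- file.path(tempdir(), "FD1month_means.txt")

# the dates are already a column, so the row names (1, 2, ...) are not written
write.table(res, out_file, sep = "\t", quote = FALSE, row.names = FALSE)

# read it back to check: two columns, Date and count
read.table(out_file, header = TRUE, sep = "\t")
```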



Also,
I'm glad to help as always, but Ogbos, you have been an R-Help 
contributor for quite a while; please post data in dput format. Given 
the problem, the output of the following is more than enough.



dput(head(data, 20L))


Hope this helps,

Rui Barradas




Re: [R] Problem with R coding

2024-03-12 Thread Rui Barradas

On 12/03/2024 at 07:43, Maria Del Mar García Zamora wrote:

Hello,

This is the error that appears when I try to load library(Rcmdr). I am using R 
version 4.3.3. I have tried updating the packages, uninstalling them and 
installing them again, with no success.
Loading required package: splines
Loading required package: RcmdrMisc
Loading required package: car
Loading required package: carData
Loading required package: sandwich
Loading required package: effects
lattice theme set by effectsTheme()
See ?effectsTheme for details.
Error: package or namespace load failed for ‘Rcmdr’:
  .onLoad failed in loadNamespace() for 'tcltk2', details:
   call: file.exists("~/.Rtk2theme")
   error: file name conversion problem -- name too long?

Once this appears I use path.expand('~') and this is R's answer:
[1] "C:\\Users\\marga\\OneDrive - Fundaci\xf3n Universitaria San Pablo CEU\\Documentos"

The thing is that in Spanish we use accents, so this word (Fundaci\xf3n) is really 
Fundación, but I can't change it.

I have tried to start R from CMD using:
C:\Users\marga>set R_USER=C:\Users\marga\R_USER

C:\Users\marga>"C:\Users\marga\Desktop\R-4.3.3\bin\R.exe" CMD Rgui

At first this worked, but now a message appears saying that this app 
cannot be used and that I have to contact the software company (photo attached).

What should I do?

Thanks,

Mar



Maria Del Mar García Zamora
Student, Universidad CEU Cardenal Herrera (UCHCEU)
www.uchceu.es


Hello,

First of all, try running Rgui only, no R.exe CMD. Just Rgui.exe or
C:\Users\marga\Desktop\R-4.3.3\bin\Rgui.exe
Then, in Rgui, try loading Rcmdr

library(Rcmdr)


Also, do you have R in your Windows PATH variable? The directory to put 
in PATH should be


C:\Users\marga\Desktop\R-4.3.3\bin

so that Windows can find R.exe and Rgui.exe without the full path name.
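To check from within R whether that bin directory is already on PATH, something like the following sketch could be used. It assumes the directory is the bin folder under R.home() of the running R; comparing normalized paths is a simplification (on Windows, differences in letter case are not handled).

```r
# the bin directory of the running R (R_HOME/bin)
r_bin <- normalizePath(file.path(R.home(), "bin"), mustWork = FALSE)

# split PATH into its directories (";" on Windows, ":" elsewhere)
path_dirs <- strsplit(Sys.getenv("PATH"), .Platform$path.sep, fixed = TRUE)[[1]]
path_dirs <- normalizePath(path_dirs[nzchar(path_dirs)], mustWork = FALSE)

# TRUE if R's bin directory is already on PATH
on_path <- r_bin %in% path_dirs
on_path
```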

Hope this helps,

Rui Barradas




Re: [R] help - Package: stats - function ar.ols

2024-02-23 Thread Rui Barradas

On 22/02/2024 at 16:34, Pedro Gavronski. wrote:

Hello,

My name is Pedro and it is nice to meet you all. I am having trouble
understanding a message that I receive when I use the function ar.ols from
package stats. It says:

Warning message:
In ar.ols(x = dtb[2:6966, ], demean = FALSE, intercept = TRUE, prewhite = TRUE) :
  model order:  2 singularities in the computation of the projection matrix results are only valid up to model order 1

I do not know what this means; if someone could clarify it, I would really
appreciate it.

Attached to this email you will find the code and data I used to run
this function.

Thanks in advance.

Best regards,  Pedro.



Hello,

Thanks for the data but the code is missing from the attachment.
Can you please post your code? In an attachment or directly in the 
e-mail body.


Rui Barradas




Re: [R] Looping

2024-02-18 Thread Rui Barradas

On 19/02/2024 at 03:27, Steven Yen wrote:

I need to read csv files repeatedly, named data1.csv, data2.csv, …, data24.csv, 
24 altogether. That is,

data <- read.csv("data1.csv")
…
data <- read.csv("data24.csv")
…

Is there a way to do this in a loop? Thank you.

Steven from iPhone

Hello,

Here is a way of reading the files in a *apply loop. The file names can 
either be listed from the directory (list.files) or created with a string 
formatting function (sprintf).



# file_names_vec <- list.files(pattern = "data\\d+\\.csv")
file_names_vec <- sprintf("data%d.csv", 1:24)
data_list <- sapply(file_names_vec, read.csv, simplify = FALSE)

# access the 1st data.frame
data_list[[1L]]
# same as above
data_list[["data1.csv"]]
# same as above
data_list$data1.csv
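Since the question asks for a loop, here is the same idea as an explicit for loop, sketched under the assumption that all files sit in one directory (data_dir below). To keep the example self-contained, it first writes three small demo files to a temporary directory; with the real files, set data_dir to their folder and n_files to 24.

```r
# demo only: write three small files to a temporary directory
data_dir <- tempdir()
n_files <- 3L
for (i in seq_len(n_files)) {
  write.csv(data.frame(id = i, value = i * 10),
            file.path(data_dir, sprintf("data%d.csv", i)), row.names = FALSE)
}

# the loop itself: one list element per file, named after the file
file_names_vec <- sprintf("data%d.csv", seq_len(n_files))
data_list <- vector("list", length(file_names_vec))  # preallocate
names(data_list) <- file_names_vec
for (f in file_names_vec) {
  data_list[[f]] <- read.csv(file.path(data_dir, f))
}

data_list[["data2.csv"]]
```

Storing the data frames in a list, rather than overwriting one variable named data, keeps all 24 available after the loop.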


Hope this helps,

Rui Barradas





Re: [R] Packages sometimes don't update, but no error or warning is thrown

2024-02-14 Thread Rui Barradas

On 14/02/2024 at 10:50, Martin Maechler wrote:

Berwin A Turlach
 on Wed, 14 Feb 2024 11:47:41 +0800 writes:


 > G'day Philipp,

 > On Tue, 13 Feb 2024 09:59:17 +0100 gernophil--- via R-help
 >  wrote:

 >> this question is related to this
 >> (https://community.rstudio.com/t/packages-are-not-updating/166214/3),
 >> [...]

 >> To sum it up: If I am updating packages (be it via
 >> Bioconductor or CRAN) some packages simply don’t update,
 >> [...]

 >> I would expect any kind of message that the package will
 >> not be updated, since no newer binary is available or a
 >> prompt, if I want to compile from source.

 > RStudio is doing its own thing for some task, including
 > 'install.packages()' (and for some reasons, at least on
 > the platforms on which I use RStudio, RStudio calls
 > 'install.packages()' and not 'update.packages()' when an
 > update is requested via the GUI). See:

 RStudio> install.packages
 > function (...)  .rs.callAs(name, hook, original, ...)
 > 

 > compared to:

 R> install.packages
 > function (pkgs, lib, repos = getOption("repos"),
 > contriburl = contrib.url(repos, type), method, available =
 > NULL, destdir = NULL, dependencies = NA, type =
 > getOption("pkgType"), configure.args =
 > getOption("configure.args"), configure.vars =
 > getOption("configure.vars"), clean = FALSE, Ncpus =
 > getOption("Ncpus", 1L), verbose = getOption("verbose"),
 > libs_only = FALSE, INSTALL_opts, quiet = FALSE,
 > keep_outputs = FALSE, ...)  { [...]


 > So if you use Install/Update in the Packages tab of
 > RStudio and do not experience the behaviour you are
 > expecting, it is something that you need to discuss with
 > Posit, not with R. :)

 >> However, the only message I get is: ``` trying URL
 >> ''

 > The package name has the version number encoded in it, so
 > theoretically you should be able to tell at this point
 > whether the package that is downloaded is the version that
 > is already installed, in which case no update will happen.

 > Best wishes,

 >   Berwin


Yes, thanks a lot, Berwin.

Indeed, I've raised the fact that RStudio
hides R's own install.packages() from the user and uses its
own, undocumented one; this has been the case for quite a few years.
I found out during teaching (one of the few times I use
RStudio to run R), in another case where RStudio's
install.packages() behaved differently from R's.

I'm pretty sure this is the reason for quite a bit of confusion...

Martin


Hello,

From within RStudio you can always call R's own functions by their qualified names

utils::install.packages()
utils::update.packages()

or run R from the command line.
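Berwin's point that the version number is encoded in the downloaded file name can also be checked by hand. A minimal sketch: file_version() and is_newer() are hypothetical helpers, not functions from utils, and they assume CRAN-style file names such as ggplot2_3.4.4.tar.gz, .tgz or .zip.

```r
# hypothetical helper: extract the version from a package file name,
# e.g. "ggplot2_3.4.4.tgz" gives version 3.4.4
file_version <- function(file) {
  base <- sub("\\.(tar\\.gz|tgz|zip)$", "", basename(file))
  package_version(sub("^[^_]+_", "", base))
}

# hypothetical helper: TRUE if the file holds a newer version than the installed one
is_newer <- function(file, pkg) {
  file_version(file) > packageVersion(pkg)
}

file_version("ggplot2_3.4.4.tgz")
# a file with the installed version of stats is not an update
is_newer(paste0("stats_", packageVersion("stats"), ".tar.gz"), "stats")
```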

Hope this helps,

Rui Barradas




Re: [R] Packages sometimes don't update, but no error or warning is thrown

2024-02-13 Thread Rui Barradas
Hello,

Not exactly an answer, just a thought:
Whenever I have problems updating or installing packages from within 
RStudio, I close RStudio, write a script with the install.packages() call 
and run it from a command window.



R -q -f "instscript.R"


This often works better, and it also works with Bioconductor's 
BiocManager::install() or with remotes'/devtools' install_github().


Hope this helps,

Rui Barradas




Re: [R] gathering denominator under frac

2024-02-02 Thread Rui Barradas

On 02/02/2024 at 10:01, Troels Ring wrote:

Hi friends - I'm plotting a ratio of bicarbonates in ggplot2, and

ylab(expression(paste(frac("additive BIC", "true BIC")))) worked OK - but 
now I have been asked to show the chemistry instead, so I wrote


  ylab(expression(paste(frac("additive", HCO[3]^"-", "true", HCO[3]^"-"))))
- and frac took additive as the numerator and HCO3- as the denominator, and 
the rest was ignored.


So how do I make frac ignore the first "," and print the fraction as I 
want?



All best wishes
Troels


Hello,

This seems to work. Instead of separating the two numerator strings with 
a comma, separate them with a tilde. The same goes for the denominator.

And there is no need for double quotes around "additive" and "true".


library(ggplot2)

g <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point()

g + ylab(expression(paste(frac(
  additive~HCO[3]^"-",
  true~HCO[3]^"-"
))))
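Why the tilde works: in plotmath, ~ draws a space (and * juxtaposes without one), so each side of frac() becomes a single expression and frac() receives exactly two arguments. A small sketch showing this, plus the same label in base graphics; paste() is not needed when the label is a single frac().

```r
# `~` in plotmath means "insert a space"; it joins the word and the
# chemical formula into ONE argument for each side of frac()
lab <- expression(frac(additive ~ HCO[3]^"-", true ~ HCO[3]^"-"))

# the frac() call holds the function name plus exactly two arguments
length(lab[[1]])

# the same label works in base graphics, without paste()
plot(1:10, ylab = lab)
```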



Hope this helps,

Rui Barradas




Re: [R] Need help testing a problem

2024-02-01 Thread Rui Barradas
other attached packages:
[1] rerddap_1.1.0

loaded via a namespace (and not attached):
 [1] vctrs_0.6.3       cli_3.6.1         rlang_1.1.1       ncdf4_1.22
 [5] crul_1.4.0        generics_0.1.3    jsonlite_1.8.7    data.table_1.14.8
 [9] glue_1.6.2        httpcode_0.3.0    triebeard_0.4.1   fansi_1.0.5
[13] rappdirs_0.3.3    tibble_3.2.1      hoardr_0.5.4      lifecycle_1.0.4
[17] compiler_4.3.2    dplyr_1.1.3       Rcpp_1.0.12       pkgconfig_2.0.3
[21] digest_0.6.33     R6_2.5.1          tidyselect_1.2.0  utf8_1.2.4
[25] pillar_1.9.0      curl_5.2.0        magrittr_2.0.3    urltools_1.7.3
[29] xml2_1.3.5



So there was an unspecified error: an error without a condition message 
and with no call expression. I find this strange; a result like the following 
is expected.



tryCatch(stop("error"), error = function(e) e) |> str()
List of 2
 $ message: chr "error"
 $ call   : language doTryCatch(return(expr), name, parentenv, handler)
 - attr(*, "class")= chr [1:3] "simpleError" "error" "condition"


Function tabledap doesn't seem to be handling errors properly.
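When a function signals an odd error like this, a small wrapper can show what the condition object actually contains. A sketch: inspect_error() is a hypothetical helper that uses only base R condition accessors (conditionMessage(), conditionCall()).

```r
# hypothetical helper: evaluate expr and, if it fails, return the parts of
# the error condition; returns NULL (invisibly) when there is no error
inspect_error <- function(expr) {
  res <- tryCatch(expr, error = function(e) e)
  if (!inherits(res, "error")) return(invisible(NULL))
  list(
    class   = class(res),
    message = conditionMessage(res),  # may be empty for a badly formed error
    call    = conditionCall(res)      # NULL if no call was recorded
  )
}

inspect_error(stop("error"))
inspect_error(sqrt(4))  # no error, so NULL
```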

Hope this helps,

Rui Barradas




Re: [R] ggplot 3-dimensions

2023-12-17 Thread Rui Barradas

On 17/12/2023 at 09:13, SIBYLLE STÖCKLI via R-help wrote:

Dear R community

In the meantime I made some progress:
   ggplot(data = Fig2b, aes(x = BFF, y = Wert, fill = Effekt))+theme_bw()+
 geom_bar(stat = "identity", width = 0.95) +
 scale_y_continuous(limits=c(0,13), expand=c(0,0))+
 facet_wrap(~Aspekt, strip.position = "bottom", scales = "free_x") +
 theme(panel.spacing = unit(0, "lines"),
   strip.background = element_blank(),
   strip.placement = "outside")+
 theme(axis.title.x=element_blank())+
 scale_fill_manual("Effekt", values = c("Neg" = "red", "Neu" =
"darkgrey", "Pos" = "blue"), labels=c("Negativ", "Nicht sign.", "Positiv"))
   
   
Questions

- Is it possible to present all the subplots in one graph (not in several "lines")?

- I tried to change the angle of the x-axis labels. However, I was only able to change
the first x-axis (BB...), not the second one (Voegel). Maybe this
would solve the problem.
- If not, is there another way to fix the number of subplots per
line?

Kind regards
Sibylle

-Original Message-
From: R-help  On Behalf Of SIBYLLE STÖCKLI via
R-help
Sent: Saturday, December 16, 2023 12:16 PM
To: R-help@r-project.org
Subject: [R] ggplot 3-dimensions

Dear R-user

Does anybody know if ggplot allows two x-axes representing two
dimensions (similar to the Excel plot, picture 1 in the PDF attachment)? If yes,
how should I adapt my code? The parameters are presented in the input file
(attachment: Input).

Fig2b = read.delim("BFF_Fig-2b.txt", na.strings="NA")
names(Fig2b)
head(Fig2b)
summary(Fig2b)
str(Fig2b)
Fig2b$Aspekt<-factor(Fig2b$Aspekt, levels=(c("Voegel", "Kleinsaeuger",
"Schnecken", "Regenwuermer_Asseln", "Pilze")))

### Figure 2b
   ggplot(Fig2b,aes(Aspekt,Wert,fill=Effekt))+
 geom_bar(stat="identity",position='fill')+
 scale_y_continuous(limits=c(0,14), expand=c(0,0))+
 labs(x="", y="Anzahl Studien pro Effekt")

Kind regards
Sibylle



Hello,

You are posting the data as an image once again; please don't do this.
Paste the output of

dput(Fig2b)# if small data
dput(head(Fig2b, 20))  # if too big to fit in an e-mail


in your e-mails. Here is the data:



Aspekt <- c("Flora", "Flora", "Flora", "Tagfalter", "Tagfalter", "Tagfalter",
            "Heuschre", "Heuschre", "Heuschre", "Kaefer_Sp", "Kaefer_Sp", "Kaefer_Sp",
            "Schwebfli", "Schwebfli", "Schwebfli", "Bienen_F", "Bienen_F", "Bienen_F")

Aspekt <- c(Aspekt, Aspekt)
BFF <- rep(c("BB", "SA", "NE"), times = 12)
Effekt <- c(rep("Neg", times = 18), rep("Pos", times = 18))
Wert <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0,
  2, 1, 0, 0, 1, 0, 9, 4, 6, 0, 0, 3, 0, 0, 4)
Fig2b <- data.frame(Aspekt, BFF, Effekt, Wert)



As for the question, you can use the facet_wrap argument nrow to have all 
plots in one row only; see the comment before facet_wrap. I don't know 
if this solves the problem.

Also, I define a custom theme to make the code that follows clearer.



library(ggplot2)

theme_sibylle <- function() {
  theme_bw(base_size = 10) %+replace%
theme(
  panel.spacing = unit(0, "lines"),
  strip.background = element_blank(),
  strip.placement = "outside",
  # this line was added by me, remove if not wanted
  strip.text.x.bottom = element_text(face = "bold", size = 10),
  axis.title.x = element_blank()
)
}

ggplot(data = Fig2b, aes(x = BFF, y = Wert, fill = Effekt)) +
  geom_bar(stat = "identity", width = 0.95) +
  scale_y_continuous(limits=c(0,13), expand=c(0,0)) +
  # here I use nrow = 1L to put everything in one row only
  facet_wrap(~ Aspekt, nrow = 1L, strip.position = "bottom", scales = "free_x") +
  scale_fill_manual(
    name = "Effekt",
    values = c("Neg" = "red", "Neu" = "darkgrey", "Pos" = "blue"),
    labels = c("Negativ", "Nicht sign.", "Positiv")) +
  theme_sibylle()
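On the question about the angle of the second x-axis: with strip.position = "bottom" and strip.placement = "outside", the second row of labels is drawn by the facet strips, not by an axis, so axis.text.x does not reach it. A sketch under that reading, rotating both rows through theme(). It uses a small subset of the Fig2b values above (Effekt "Pos" for Aspekt Flora and Kaefer_Sp) so it runs on its own.

```r
library(ggplot2)

# subset of the Fig2b data above: Effekt "Pos", two Aspekt levels
toy <- data.frame(
  Aspekt = rep(c("Flora", "Kaefer_Sp"), each = 3),
  BFF    = rep(c("BB", "SA", "NE"), times = 2),
  Effekt = "Pos",
  Wert   = c(1, 0, 0, 9, 4, 6)
)

p <- ggplot(toy, aes(x = BFF, y = Wert, fill = Effekt)) +
  geom_col(width = 0.95) +
  facet_wrap(~ Aspekt, nrow = 1L, strip.position = "bottom", scales = "free_x") +
  theme_bw() +
  theme(
    strip.placement = "outside",
    strip.background = element_blank(),
    axis.title.x = element_blank(),
    # first row of labels: the x-axis tick labels (BB, SA, NE)
    axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1),
    # second row: the facet strip labels placed below the panels
    strip.text.x.bottom = element_text(angle = 90, hjust = 1)
  )
p
```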



Hope this helps,

Rui Barradas




Re: [R] ggplot2: Get the regression line with 95% confidence bands

2023-12-12 Thread Rui Barradas

On 13/12/2023 at 00:36, Robert Baer wrote:
coord_cartesian also seems to work for y, together with the breaks argument 
of scale_x_continuous. How about:


df=data.frame(year= c(2012,2015,2018,2022),
   score=c(495,493, 495, 474))

ggplot(df, aes(x = year, y = score)) +
   geom_point() +
   geom_smooth(method = "lm", formula = y ~ x) +
   labs(title = "Standard linear regression for France", x = "Year", y = 
"PISA score in mathematics") +

   coord_cartesian(ylim=c(470,500)) +
   scale_x_continuous(breaks = 2012:2022)

On 12/12/2023 3:19 PM, varin sacha via R-help wrote:

Dear Ben,
Dear Daniel,
Dear Rui,
Dear Bert,

Here below is my R code.
I really appreciate all your comments. My R code works perfectly, 
but there is still something I would like to improve. The X-axis 
shows   2012.5 ;   2015.0   ;   2017.5   ;  2020.0
I would like to see only the years on the X-axis (2012 ; 2015 ; 2017 ; 
2020). How can I do this?



#
library(ggplot2)
df=data.frame(year= c(2012,2015,2018,2022), score=c(495,493, 495, 474))

ggplot(df, aes(x = year, y = score)) + geom_point() + 
geom_smooth(method = "lm", formula = y ~ x) +
  labs(title = "Standard linear regression for France", x = "Year", y 
= "PISA score in mathematics") + 
scale_y_continuous(limits=c(470,500),oob=scales::squish)

#

On Monday, 11 December 2023 at 23:38:06 UTC+1, Ben Bolker wrote:

On 2023-12-11 5:27 p.m., Daniel Nordlund wrote:

On 12/10/2023 2:50 PM, Rui Barradas wrote:

On 10/12/2023 at 22:35, varin sacha via R-help wrote:

Dear R-experts,

Here below my R code, as my X-axis is "year", I must be missing one
or more steps! I am trying to get the regression line with the 95%
confidence bands around the regression line. Any help would be
appreciated.

Best,
S.


#
library(ggplot2)
   df=data.frame(year=factor(c("2012","2015","2018","2022")),
score=c(495,493, 495, 474))
   ggplot(df, aes(x=year, y=score)) + geom_point( ) +
geom_smooth(method="lm", formula = score ~ factor(year), data = df) +
labs(title="Standard linear regression for France", y="PISA score in
mathematics") + ylim(470, 500)
#


Hello,

I don't see a reason why year should be a factor, and the formula in
geom_smooth is wrong: it should be y ~ x, in terms of the aesthetics involved.
It still doesn't plot the CIs, though. There's a warning and I don't
understand where it comes from. But the regression line is plotted.



ggplot(df, aes(x = as.numeric(as.character(year)), y = score)) +
   geom_point() +
   geom_smooth(method = "lm", formula = y ~ x) +
   labs(
 title = "Standard linear regression for France",
 x = "Year",
 y = "PISA score in mathematics"
   ) +
   ylim(470, 500)
#> Warning message:
#> In max(ids, na.rm = TRUE) : no non-missing arguments to max; returning -Inf



Hope this helps,

Rui Barradas




After playing with this for a little while, I realized that the problem
with plotting the confidence limits is the addition of ylim(470, 500).
The confidence values are outside the ylim values.  Remove the limits,
or increase the range, and the confidence curves will plot.

Hope this is helpful,

Dan

   Or use + scale_y_continuous(limits = c(470, 500), oob = 
scales::squish)




Hello,

In the code below I don't use coord_cartesian, because setting ylim there 
would cut off part of the confidence intervals.


To have labels only at the years present in the data set, take the 
breaks from the data.




library(ggplot2)

df <- data.frame(year= c(2012,2015,2018,2022),
 score=c(495,493, 495, 47

Re: [R] ggplot2: Get the regression line with 95% confidence bands

2023-12-10 Thread Rui Barradas

On 10/12/2023 at 22:35, varin sacha via R-help wrote:


Dear R-experts,

Here below my R code, as my X-axis is "year", I must be missing one or more 
steps! I am trying to get the regression line with the 95% confidence bands around the 
regression line. Any help would be appreciated.

Best,
S.


#
library(ggplot2)
  
df=data.frame(year=factor(c("2012","2015","2018","2022")), score=c(495,493, 495, 474))
  
ggplot(df, aes(x=year, y=score)) + geom_point( ) + geom_smooth(method="lm", formula = score ~ factor(year), data = df) + labs(title="Standard linear regression for France", y="PISA score in mathematics") + ylim(470, 500)

#


Hello,

I don't see a reason why year should be a factor, and the formula in 
geom_smooth is wrong: it should be y ~ x, in terms of the aesthetics involved.
It still doesn't plot the CIs, though. There's a warning and I don't 
understand where it comes from. But the regression line is plotted.




ggplot(df, aes(x = as.numeric(as.character(year)), y = score)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(
title = "Standard linear regression for France",
x = "Year",
y = "PISA score in mathematics"
  ) +
  ylim(470, 500)
#> Warning message:
#> In max(ids, na.rm = TRUE) : no non-missing arguments to max; returning -Inf
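As Dan Nordlund explains in his reply in this thread, the band disappears because ylim(470, 500) sets the scale limits and the confidence values fall outside them. A sketch comparing that with the two alternatives mentioned in the thread, coord_cartesian() and Ben Bolker's oob = scales::squish; it assumes year is stored as a number, not as a factor.

```r
library(ggplot2)

df <- data.frame(year = c(2012, 2015, 2018, 2022),
                 score = c(495, 493, 495, 474))

base <- ggplot(df, aes(x = year, y = score)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# ylim() sets the scale limits: band values outside 470-500 become NA,
# so the ribbon is not drawn (the cause of the warning above)
p_limits <- base + ylim(470, 500)

# coord_cartesian() only zooms the view; all data are kept and the
# band is drawn, clipped at the panel edge
p_zoom <- base + coord_cartesian(ylim = c(470, 500))

# squish moves out-of-range values onto the limits instead of dropping them
p_squish <- base + scale_y_continuous(limits = c(470, 500), oob = scales::squish)
```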




Hope this helps,

Rui Barradas





Re: [R] Convert character date time to R date-time variable.

2023-12-07 Thread Rui Barradas

On 07/12/2023 at 16:30, Rui Barradas wrote:

On 07/12/2023 at 16:21, Sorkin, John wrote:

Colleagues,

I have a matrix of character data that represents date and time. The 
format of each element of the matrix is

"2020-09-17_00:00:00"
How can I convert the elements into a valid R date-time constant?

Thank you,
John



John David Sorkin M.D., Ph.D.
Professor of Medicine, University of Maryland School of Medicine;

Associate Director for Biostatistics and Informatics, Baltimore VA 
Medical Center Geriatrics Research, Education, and Clinical Center;


PI Biostatistics and Informatics Core, University of Maryland School 
of Medicine Claude D. Pepper Older Americans Independence Center;


Senior Statistician University of Maryland Center for Vascular Research;

Division of Gerontology and Paliative Care,
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
Cell phone 443-418-5382




Hello,

Coerce with ?as.POSIXct
Don't forget the underscore in the format.


as.POSIXct("2020-09-17_00:00:00", format = "%Y-%m-%d_%H:%M:%S")
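Since the question mentions a matrix: as.POSIXct() accepts the whole matrix at once but returns a plain vector (filled column by column), so the matrix layout is lost. A sketch, assuming UTC as the time zone (without tz, the session's local time zone is used) and a small hypothetical matrix in the question's format; converting column by column into a data frame keeps the layout.

```r
# hypothetical 2 x 2 matrix in the format from the question
m <- matrix(c("2020-09-17_00:00:00", "2020-09-17_06:30:00",
              "2020-09-18_12:00:00", "2020-09-18_18:45:15"), nrow = 2)

fmt <- "%Y-%m-%d_%H:%M:%S"   # note the literal underscore

# all elements at once: a POSIXct vector, filled column by column
dt <- as.POSIXct(m, format = fmt, tz = "UTC")

# keep the row/column layout: one POSIXct column per matrix column
dt_df <- data.frame(lapply(as.data.frame(m, stringsAsFactors = FALSE),
                           as.POSIXct, format = fmt, tz = "UTC"))
dt_df
```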


Hope this helps,

Rui Barradas



Sorry, I forgot:


lubridate::ymd_hms("2020-09-17_00:00:00")


Hope this helps,

Rui Barradas




Re: [R] Convert character date time to R date-time variable.

2023-12-07 Thread Rui Barradas

On 07/12/2023 at 16:21, Sorkin, John wrote:

Colleagues,

I have a matrix of character data that represents date and time. The format of 
each element of the matrix is
"2020-09-17_00:00:00"
How can I convert the elements into a valid R date-time constant?

Thank you,
John



John David Sorkin M.D., Ph.D.
Professor of Medicine, University of Maryland School of Medicine;

Associate Director for Biostatistics and Informatics, Baltimore VA Medical 
Center Geriatrics Research, Education, and Clinical Center;

PI Biostatistics and Informatics Core, University of Maryland School of 
Medicine Claude D. Pepper Older Americans Independence Center;

Senior Statistician University of Maryland Center for Vascular Research;

Division of Gerontology and Palliative Care,
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
Cell phone 443-418-5382




Hello,

Coerce with ?as.POSIXct
Don't forget the underscore in the format.


as.POSIXct("2020-09-17_00:00:00", format = "%Y-%m-%d_%H:%M:%S")
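Since the question mentions a whole matrix of such strings, here is a small sketch showing that as.POSIXct() accepts the matrix directly but returns a vector, plus one way to keep a column layout. The matrix `m`, the column names V1, V2 and the choice tz = "UTC" are assumptions for illustration, not taken from the question.

```r
# Hypothetical 2 x 2 character matrix in the format from the question
m <- matrix(c("2020-09-17_00:00:00", "2020-09-17_06:30:00",
              "2020-09-18_12:00:00", "2020-09-19_23:59:59"),
            nrow = 2)

# The literal underscore must appear in the format string
fmt <- "%Y-%m-%d_%H:%M:%S"

# as.POSIXct() accepts the whole matrix; the result is a POSIXct vector of length 4
v <- as.POSIXct(m, format = fmt, tz = "UTC")
print(v)

# One way to keep a tabular shape: convert each matrix column separately
cols <- lapply(seq_len(ncol(m)), function(j) as.POSIXct(m[, j], format = fmt, tz = "UTC"))
dt_df <- as.data.frame(setNames(cols, paste0("V", seq_len(ncol(m)))))
str(dt_df)
```

If the format string does not match, as.POSIXct() returns NA rather than an error, so checking anyNA() on the result is a cheap sanity test.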


Hope this helps,

Rui Barradas




Re: [R] Mann Kendall mutation package?

2023-12-01 Thread Rui Barradas

On 01/12/2023 at 11:58, Nick Wray wrote:

Hello - does anyone know whether there are any packages for Mann-Kendall
mutation tests in R available?  The only one I could find online is this
MK_mut_test: Mann-Kendall mutation test in Sibada/sibadaR: Sibada's
accumulated R scripts for next probably use to avoid reinventing the wheel.
(rdrr.io) <https://rdrr.io/github/Sibada/sibadaR/man/MK_mut_test.html> but
there doesn't seem to be a package corresponding to this.  I've tried
installing various permutations of the apparent name Sibada/sibadaR but
nothing comes up, so I'm not sure whether it even exists...

Thanks Nick Wray

[[alternative HTML version deleted]]


Hello,

Your link points to a GitHub repository; the package can be installed with


devtools::install_github(repo = "Sibada/sibadaR")



Hope this helps

Rui Barradas




Re: [R] back tick names with predict function

2023-11-30 Thread Rui Barradas

On 30/11/2023 at 17:57, Rui Barradas wrote:

On 30/11/2023 at 17:38, Robert Baer wrote:
I am having trouble using back ticks with the R extractor function 
'predict' and an lm() model.  I'm trying to construct some nice 
vectors that can be used for plotting the two types of regression 
intervals.  I think it works with normal column heading names but it 
fails when I have "special" back-tick names.  Can anyone help with how 
I would reference these?  Short of renaming my columns, is there a way 
to accomplish this?


Reprex

# dataframe with dashes in column headings
cob =
  structure(list(`cob-wt` = c(212, 241, 215, 225, 250, 241, 237,
                              282, 206, 246, 194, 241, 196, 193, 224,
                              257, 200, 190, 208, 224),
                 `plant-density` = c(137, 107, 132, 135, 115, 103, 102, 65,
                                     149, 85, 173, 124, 157, 184, 112, 80,
                                     165, 160, 157, 119)),
            class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L))

# regression model works
mod2 = lm(`cob-wt` ~ `plant-density`, data = cob)

# x sequence for plotting CI's
# Set up x points
x = seq(min(cob$`plant-density`), max(cob$`plant-density`), length = 1000)


# Use predict to get CIs for a plot
# Add CI for regression line (y-hat uses 'c')
# usual trick is to assign x to actual x-var name in middle dataframe argument
CI.c = predict(mod2, data.frame( `plant-density` = x), interval = 'c')  # fail


# Add CI for prediction value (y-tilde uses 'p')
# usual trick is to assign x to actual x-var name in middle dataframe argument
CI.p = predict(mod2, data.frame(`plant-density`  = x), interval = 'p')    # fail


Hello,

When creating the new data frame, the default check.names = TRUE changes 
the column name: it is repaired and the hyphen is replaced by a legal dot.



# check.names defaults to TRUE
newd <- data.frame(`plant-density` = x)
# `plant-density` is not a column name
head(newd)

# check.names set to FALSE
newd <- data.frame(`plant-density` = x, check.names = FALSE)
# `plant-density` becomes a column name
head(newd)


# Use predict to get CIs for a plot
# Add CI for regression line (y-hat uses 'c')
# usual trick is to assign x to actual x-var name in middle dataframe argument

CI.c = predict(mod2, newdata = newd, interval = 'confidence')  # fail

# Add CI for prediction value (y-tilde uses 'p')
# usual trick is to assign x to actual x-var name in middle dataframe argument

CI.p = predict(mod2, newdata = newd, interval = 'prediction')    # fail



Hope this helps,

Rui Barradas



Hello,

Sorry for the comments '# fail' in the last two instructions, I should 
have changed them.



CI.c <- predict(mod2, newdata = newd, interval = 'confidence')  # works
CI.p <- predict(mod2, newdata = newd, interval = 'prediction')  # works
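A self-contained version of the whole fix, using the data from the question. Building it as a plain data.frame with check.names = FALSE instead of the original tibble structure is an assumption for the sketch; predict() should behave the same way.

```r
# Data from the question, as a plain data.frame that keeps the hyphenated names
cob <- data.frame(
  `cob-wt` = c(212, 241, 215, 225, 250, 241, 237, 282, 206, 246,
               194, 241, 196, 193, 224, 257, 200, 190, 208, 224),
  `plant-density` = c(137, 107, 132, 135, 115, 103, 102, 65, 149, 85,
                      173, 124, 157, 184, 112, 80, 165, 160, 157, 119),
  check.names = FALSE
)
mod2 <- lm(`cob-wt` ~ `plant-density`, data = cob)

x <- seq(min(cob$`plant-density`), max(cob$`plant-density`), length.out = 1000)

# With the default check.names = TRUE the hyphen is repaired to a dot
print(names(data.frame(`plant-density` = x)))   # "plant.density"

# Keeping the original name lets predict() find the model's variable
newd <- data.frame(`plant-density` = x, check.names = FALSE)
CI.c <- predict(mod2, newdata = newd, interval = "confidence")
CI.p <- predict(mod2, newdata = newd, interval = "prediction")
head(CI.c)
```

An equivalent way to build the new data is setNames(data.frame(x), "plant-density"), which renames the column after creation and so never goes through the name repair.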


Hope this helps,

Rui Barradas






Re: [R] ggplot with two x-axis and two dimensions

2023-11-25 Thread Rui Barradas

On 24/11/2023 at 10:29, sibylle.stoec...@gmx.ch wrote:

Dear R-user

Does anybody know if ggplot allows two x-axes with two dimensions
(similar to an Excel plot, picture 1 in the PDF attachment)? If yes,
how should I adapt my code? The parameters are presented in the input file
(attachment: Input).

Fig2b = read.delim("BFF_Fig-2b.txt", na.strings="NA")
names(Fig2b)
head(Fig2b)
summary(Fig2b)
str(Fig2b)
Fig2b$Aspekt<-factor(Fig2b$Aspekt, levels=(c("Voegel", "Kleinsaeuger",
"Schnecken", "Regenwuermer_Asseln", "Pilze")))

### Figure 2b
   ggplot(Fig2b,aes(Aspekt,Wert,fill=Effekt))+
 geom_bar(stat="identity",position='fill')+
 scale_y_continuous(limits=c(0,14), expand=c(0,0))+
 labs(x="", y="Anzahl Studien pro Effekt")

Kind regards
Sibylle



Hello,

The first attached file does not match the data in the second file, but 
here is an answer to both this question and your other question [1].


The trick to have a secondary axis is to compute a ratio of axis 
lengths. The lengths of the main and secondary axes can be computed with 
functions range() and diff(), like in the code below. Then use the ratio 
to scale the secondary axis.




Fig2b <-
  structure(list(
    Aspekt = c("Flora", "Flora", "Flora", "Tagfalter", "Tagfalter", "Tagfalter",
               "Heuschre", "Heuschre", "Heuschre", "Kaefer_Sp", "Kaefer_Sp", "Kaefer_Sp",
               "Schwebfli", "Schwebfli", "Schwebfli", "Bienen_F", "Bienen_F", "Bienen_F",
               "Flora", "Flora", "Flora", "Tagfalter", "Tagfalter", "Tagfalter",
               "Heuschre", "Heuschre", "Heuschre", "Kaefer_Sp", "Kaefer_Sp", "Kaefer_Sp",
               "Schwebfli", "Schwebfli", "Schwebfli", "Bienen_F", "Bienen_F", "Bienen_F"),
    BFF = c("BB", "SA", "NE", "BB", "SA", "NE", "BB", "SA", "NE",
            "BB", "SA", "NE", "BB", "SA", "NE", "BB", "SA", "NE",
            "BB", "SA", "NE", "BB", "SA", "NE", "BB", "SA", "NE",
            "BB", "SA", "NE", "BB", "SA", "NE", "BB", "SA", "NE"),
    Effekt = c("Neu", "Neu", "Neu", "Neu", "Neu", "Neu", "Neu", "Neu", "Neu",
               "Neu", "Neu", "Neu", "Neu", "Neu", "Neu", "Neu", "Neu", "Neu",
               "Pos", "Pos", "Pos", "Pos", "Pos", "Pos", "Pos", "Pos", "Pos",
               "Pos", "Pos", "Pos", "Pos", "Pos", "Pos", "Pos", "Pos", "Pos"),
    Wert = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 3L, 1L, 1L,
             0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 2L, 1L, 0L, 0L, 1L, 0L,
             9L, 4L, 6L, 0L, 0L, 3L, 0L, 0L, 4L)),
    row.names = c(NA, -36L), class = "data.frame")


library(ggplot2)

# First y axis (0-9)
# Second y axis (0-2500)
# fac <- diff(range( sec axis ))/diff(range( 1st axis ))
fac <- diff(range(0, 2500))/diff(range(0, 9))

ggplot(Fig2b, aes(Aspekt, Wert, fill = Effekt)) +
  geom_col(position = position_dodge()) +
  scale_y_continuous(
    breaks = seq(0, 12, 2L),
    sec.axis = sec_axis(~ . * fac)
  ) +
  labs(x = "", y = "Anzahl Studien pro Effekt")
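The ratio above works because both axes start at 0. If they do not, a pure ratio is not enough and an offset is needed as well. A base R sketch of the general linear map (the helper axis_map() and the second set of ranges are hypothetical, not part of ggplot2 or of the question):

```r
# Map a primary-axis value y1 to the secondary axis: y2 = intercept + slope * y1
axis_map <- function(from, to) {
  slope <- diff(range(to)) / diff(range(from))
  intercept <- min(to) - slope * min(from)
  c(intercept = intercept, slope = slope)
}

# The case above: both axes start at 0, so the intercept is 0 and the slope equals fac
m1 <- axis_map(c(0, 9), c(0, 2500))
print(m1)

# Hypothetical case where the secondary axis starts at 100
m2 <- axis_map(c(0, 10), c(100, 600))
secondary_at_top <- m2[["intercept"]] + m2[["slope"]] * 10   # 600
```

In ggplot2 the result could then be passed as sec.axis = sec_axis(~ m2[["intercept"]] + m2[["slope"]] * .), since sec_axis() takes a formula transform of the primary values.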




[1] https://stat.ethz.ch/pipermail/r-help/2023-November/478605.html

Hope this helps,

Rui Barradas




Re: [R] Fast way to draw mean values and 95% confidence intervals of groups with ggplot2

2023-11-16 Thread Rui Barradas

On 16/11/2023 at 11:59, Luigi Marongiu wrote:

Hello,
I have triplicate (column A) readings (column D) of samples exposed to
different concentrations (column C) over time (column B).
Is it possible to draw a line plot of the mean values for each
concentration (C)? At the moment, I get a single line.
Also, is there a simple way to draw the 95% CI around these data? I
know I need to use ribbon with the lower and upper limit, but is there
a simple way for ggplot2 to calculate directly these values?
Here is a working example:

```
A = c(rep(1, 28), rep(2, 28), rep(3, 28))
B = rep(c(0, 15, 30, 45, 60, 75, 90), 12)
C = rep(c(rep(0, 7), rep(0.6, 7), rep(1.2, 7),
   rep(2.5,7)),3)
D = c(731.33,761.67,730,761.67,741.67,788.67,784.33,
   686.67,685.33,680,693.67,684,704,709.67,739,
   731,719,767,760.67,776.67,768.67,675,671.67,
   668.67,677.33,673.67,687,696.67,727,750.67,
   752.67,786.67,794.67,843.33,946,732.67,737.33,
   775.33,828,918,1063,1270,752.67,742.33,
   735.67,
   747.67,777.33,803.67,865.67,700,700.67,705.67,
   722.67,744,779,837,748,742,754,747.67,
   775.67,808.67,869,705.67,714.33,702.33,730,
   710.67,731,744,686.33,687.33,670,702.33,
   669.33,707.33,708.33,724,747,761.33,715,
   697.67,728,728)

df = data.frame(A, B, C, D)
library(ggplot2)
ggplot(data=df, aes(x=B, y=D, z=C, color =C)) +
   geom_line(stat = "summary", fun = "mean") +
   geom_ribbon()
```

Thank you


Hello,

I am not sure that the code below is what you want.
The first 3 instructions create a named vector of colors.
The pipeline is the part that tries to solve the problem. It computes means 
and standard errors by groups of time and concentration, then plots the 
ribbon below the lines.


It is important not to set color = C in the initial call to ggplot(), 
since it would then apply to all the subsequent layers (try it).

To have one line per concentration I use group = C instead.



suppressPackageStartupMessages({
  library(ggplot2)
  library(dplyr)
})

n_colors <- df$C |> unique() |> length()
names_colors <- df$C |> unique() |> as.character()
clrs <- setNames(palette.colors(n_colors), names_colors)

df %>%
  mutate(C = factor(C)) %>%
  group_by(B, C) %>%
  # standard error of the mean: sd divided by the square root of the group size
  mutate(mean_D = mean(D), se_D = sd(D) / sqrt(n())) %>%
  ungroup() %>%
  ggplot(aes(x = B, group = C)) +
  geom_ribbon(aes(ymin = mean_D - se_D, ymax = mean_D + se_D), fill = "grey", alpha = 0.5) +
  geom_line(aes(y = mean_D, color = C)) +
  geom_point(aes(y = D, color = C)) +
  scale_color_manual(name = "Concentration", values = clrs)
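The question asked for 95% confidence intervals rather than mean plus or minus one standard error. A base R sketch that computes a t-based 95% CI of the mean for each time/concentration group, using the data from the question (the helper ci95() is hypothetical, and it assumes the three replicates are independent and roughly normal):

```r
# Data from the question
A <- c(rep(1, 28), rep(2, 28), rep(3, 28))
B <- rep(c(0, 15, 30, 45, 60, 75, 90), 12)
C <- rep(c(rep(0, 7), rep(0.6, 7), rep(1.2, 7), rep(2.5, 7)), 3)
D <- c(731.33,761.67,730,761.67,741.67,788.67,784.33,
       686.67,685.33,680,693.67,684,704,709.67,739,
       731,719,767,760.67,776.67,768.67,675,671.67,
       668.67,677.33,673.67,687,696.67,727,750.67,
       752.67,786.67,794.67,843.33,946,732.67,737.33,
       775.33,828,918,1063,1270,752.67,742.33,
       735.67,
       747.67,777.33,803.67,865.67,700,700.67,705.67,
       722.67,744,779,837,748,742,754,747.67,
       775.67,808.67,869,705.67,714.33,702.33,730,
       710.67,731,744,686.33,687.33,670,702.33,
       669.33,707.33,708.33,724,747,761.33,715,
       697.67,728,728)
df <- data.frame(A, B, C, D)

# t-based 95% confidence interval of the mean
ci95 <- function(v) {
  n <- length(v)
  m <- mean(v)
  half <- qt(0.975, df = n - 1) * sd(v) / sqrt(n)
  c(mean = m, lwr = m - half, upr = m + half)
}

# aggregate() stores the three statistics as a matrix column named D ...
agg <- aggregate(D ~ B + C, data = df, FUN = ci95)
# ... so split it into ordinary columns
res <- cbind(agg[c("B", "C")], as.data.frame(agg$D))
head(res)
```

The columns mean, lwr and upr could then replace mean_D and the ribbon limits in the ggplot code above, for instance geom_ribbon(aes(ymin = lwr, ymax = upr)).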


Hope this helps,

Rui Barradas




Re: [R] anyone having trouble accesing CRAN?

2023-11-15 Thread Rui Barradas

On 15/11/2023 at 19:13, Christopher W. Ryan via R-help wrote:

at https://cran.r-project.org/ I get this error message:

=
Secure Connection Failed

An error occurred during a connection to cran.r-project.org.
PR_END_OF_FILE_ERROR

Error code: PR_END_OF_FILE_ERROR

 The page you are trying to view cannot be shown because the
authenticity of the received data could not be verified.
===

Three different browsers, two different devices, two different networks.
(The text of the error messages varies.)

Anyone seeing similar?

Thanks.

--Chris Ryan


Hello,

Yes, CRAN is down.

I know there was an announcement last week about scheduled maintenance, 
but I cannot find that e-mail right now and don't remember the exact date, 
so I cannot say for sure that this is what is happening.


But it is probably the scheduled maintenance.

Rui Barradas




Re: [R] Cryptic error for mscmt function

2023-11-06 Thread Rui Barradas

On 05/11/2023 at 13:35, Leu Thierry wrote:

Hi everyone,


I am trying to conduct a synthetic control analysis using the MSCMT package. However, when 
trying to run it I get a very cryptic error message saying  "Error in 
lst[[nam]][intersect(tim, rownames(lst[[nam]])), cols, drop = FALSE]: subscript out of 
bounds". Does anyone know what this means and why I receive this error? I attached the 
code & dataset used in the attachment. Thanks a lot!


Best regards

Thierry


Hello,

No attachment came through the filters. Can you resend it in plain text or, 
if it was a .R file, rename it to .txt?


See [1], section General Instructions, for more on this.

[1] https://www.r-project.org/mail.html#instructions

Hope this helps,

Rui Barradas



Re: [R] Sum data according to date in sequence

2023-11-04 Thread Rui Barradas
Hello,

Here are two solutions.

1. Base R

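Though I don't coerce the date column to class "Date", it seems to work. 
Note, however, that the groups are then sorted as character strings, so with 
dates spanning several months they may not come out in chronological order.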


aggregate(EnergykWh ~ date, dt1, sum)
#>date EnergykWh
#> 1 1/14/2016  11.98569
#> 2 1/15/2016  32.56938
#> 3 1/16/2016  21.29181
#> 4 1/17/2016  22.88083
#> 5 1/18/2016   9.05750
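To illustrate why coercing to class "Date" can matter for the base R solution, a sketch with made-up data (dt1 below is hypothetical, since the original data are not reproduced here):

```r
# Hypothetical stand-in for dt1, with dates spread over several months
dt1 <- data.frame(
  date = c("1/14/2016", "1/14/2016", "2/1/2016", "10/3/2016", "10/3/2016"),
  EnergykWh = c(1.5, 2.5, 2, 3, 1)
)

# Grouping on the raw strings orders the groups as text, not by date
agg_chr <- aggregate(EnergykWh ~ date, dt1, sum)
agg_chr

# Coercing to class "Date" first gives chronologically ordered groups
dt1$date <- as.Date(dt1$date, format = "%m/%d/%Y")
agg_date <- aggregate(EnergykWh ~ date, dt1, sum)
agg_date
```

The sums are the same either way; only the order of the groups changes.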


2. Package dplyr.
First column date is coerced from class "character" to class "Date".
Then the grouped sums are computed (the .by argument of summarise() 
requires dplyr 1.1.0 or later).


suppressPackageStartupMessages(
  library(dplyr)
)

dt1 %>%
  mutate(date = as.Date(date, "%m/%d/%Y")) %>%
  summarise(EnergykWh = sum(EnergykWh), .by = date)
#> date EnergykWh
#> 1 2016-01-14  11.98569
#> 2 2016-01-15  32.56938
#> 3 2016-01-16  21.29181
#> 4 2016-01-17  22.88083
#> 5 2016-01-18   9.05750


As you can see, the results are the same.

Also, this exact problem is one of the most asked on StackOverflow. 
Maybe you could try searching there for a solution. My code above is 
also exactly the code in [1], though I had already written this answer; 
I only checked afterwards :(.



[1] https://stackoverflow.com/questions/61548758/r-how-sum-values-by-group-by-date



Hope this helps,

Rui Barradas





Re: [R] Missing shapes in legend with scale_shape_manual

2023-10-31 Thread Rui Barradas

On 30/10/2023 at 20:55, Kevin Zembower via R-help wrote:

Hello,

I'm trying to plot a graph of blood glucose versus date. I also record
conditions, such as missing the previous night's medications, and
missing exercise on the previous day. My data looks like:


b2[68:74,]

# A tibble: 7 × 5
  Date       Time     bg missed_meds no_exercise
1 2023-10-17 08:50   128 TRUE        FALSE
2 2023-10-16 06:58   144 FALSE       FALSE
3 2023-10-15 09:17   137 FALSE       TRUE
4 2023-10-14 09:04   115 FALSE       FALSE
5 2023-10-13 08:44   136 FALSE       TRUE
6 2023-10-12 08:55   122 FALSE       TRUE
7 2023-10-11 07:55   150 TRUE        TRUE




This gets me most of the way to what I want:

ggplot(data = b2, aes(x = Date, y = bg)) +
 geom_line() +
 geom_point(data = filter(b2, missed_meds),
shape = 20,
size = 3) +
 geom_point(data = filter(b2, no_exercise),
shape = 4,
size = 3) +
 geom_point(aes(x = Date, y = bg, shape = missed_meds),
alpha = 0) + #Invisible point layer for shape mapping
 scale_y_continuous(name = "Blood glucose (mg/dL)",
breaks = seq(100, 230, by = 20)
) +
 geom_hline(yintercept = 130) +
 scale_shape_manual(name = "Conditions",
labels = c("Missed meds",
   "Missed exercise"),
values = c(20, 4),
## size = 3
)

However, the legend just prints an empty square in front of the labels.
What I want is a filled circle (shape 20) in front of "Missed meds" and
a cross (shape 4) in front of "Missed exercise."

My questions are:
  1. How can I fix my plot to show the shapes in the legend?
  2. Can my overall plotting method be improved? Would you do it this
way?

Thanks so much for your advice and guidance.

-Kevin




Hello,

In ggplot2 graphics, when you have more than one call to the same layer 
function, you can probably simplify the code.


In this case you make several calls to geom_point. This can probably be 
avoided.


Create a new column named Conditions.
Assign to it the column names wherever the values of those columns are 
TRUE. The simplest way of doing this is to use columns missed_meds and 
no_exercise as logical index vectors, see code below. Where both are TRUE, 
the second assignment overwrites the first, so such rows get only the 
no_exercise shape.


This way the values are mapped to shapes in just one call to geom_point().
That is what aes() is meant for: telling ggplot which variables define 
what in the plot.




b2$Date <- as.Date(b2$Date)
# this new column will be mapped to the shape aesthetic
b2$Conditions <- NA_character_
b2$Conditions[b2$missed_meds] <- names(b2)[4]
b2$Conditions[b2$no_exercise] <- names(b2)[5]

ggplot(data = b2, aes(x = Date, y = bg)) +
  geom_line() +
  geom_point(aes(shape = Conditions), size = 3) +
  geom_hline(yintercept = 130) +
  scale_y_continuous(
    name = "Blood glucose (mg/dL)",
    breaks = seq(100, 230, by = 20)
  ) +
  scale_shape_manual(
    #name = "Conditions",
    labels = c("Missed meds", "Missed exercise"),
    values = c(20, 4),
    na.translate = FALSE
  )
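A quick check of the logical-index assignment with the seven rows shown in the question (only the bg and logical columns are reproduced). Note what happens to row 7, where both conditions are TRUE. The combined label "both" is just one hypothetical way to handle it, and would need a third value in scale_shape_manual():

```r
# The seven rows from the question (Date and Time omitted)
b2 <- data.frame(
  bg          = c(128, 144, 137, 115, 136, 122, 150),
  missed_meds = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE),
  no_exercise = c(FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE)
)

b2$Conditions <- NA_character_
b2$Conditions[b2$missed_meds] <- "missed_meds"
b2$Conditions[b2$no_exercise] <- "no_exercise"   # overwrites row 7

# One way to keep both conditions visible: a separate label for the overlap
b2$Conditions2 <- ifelse(b2$missed_meds & b2$no_exercise, "both", b2$Conditions)
b2
```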



Hope this helps,

Rui Barradas




Re: [R] How to Reformat a dataframe

2023-10-28 Thread Rui Barradas
want to do is, instead of having 12 observations  by row, I want to
have one observation by row. I want to have a single column with 1509
observations instead of 126 rows with 12 columns per row.

I tried the following:
df = data.frame(matrix(nrow = Length, ncol = 1))
colnames(df) = c("aportes_alajuela")



for (row in 1:nrow(alajuela_df)){
   for (col in 1:ncol(alajuela_df)){
 df[i,1]=alajuela_df[i,j]
   }
}

But I am not getting the data in the structure I want.

Any help will be greatly appreciated.

Best regards,
Paul

[[alternative HTML version deleted]]


Hello,

Here are two base R ways, with ?stack and with ?reshape.


# 1. With stack()
df_long <- stack(alajuela_df)[1]
df_long <- df_long[complete.cases(df_long), , drop = FALSE]
head(df_long)



# 2. With reshape
df_long <- reshape(
  alajuela_df, direction = "long",
  varying = names(alajuela_df),
  v.names = "x"
)[2]

# 1512 rows, only one column
dim(df_long)
# [1] 1512    1

# there are NA's in the data
df_long[complete.cases(df_long), , drop = FALSE] |> dim()
# [1] 1509    1

# keep the rows with values not NA
df_long <- df_long[complete.cases(df_long), , drop = FALSE]

# check the dimensions again
dim(df_long)
# [1] 1509    1
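One caveat, assuming the rows are years and the 12 columns are months (the full question is not shown here): stack() and reshape() both output the values column by column. If the original row order matters, transposing first reads the data row by row. A sketch with a small hypothetical data frame:

```r
# Hypothetical stand-in: 3 rows (e.g. years) x 4 columns (e.g. months), one trailing NA
alajuela_df <- data.frame(
  m1 = c(1, 5, 9),
  m2 = c(2, 6, 10),
  m3 = c(3, 7, 11),
  m4 = c(4, 8, NA)
)

# stack() goes column by column: 1, 5, 9, 2, 6, 10, ...
by_col <- stack(alajuela_df)$values

# transposing first reads the data row by row: 1, 2, 3, 4, 5, ...
by_row <- as.vector(t(as.matrix(alajuela_df)))

# one column, NA's dropped, in row-wise order
df_long <- data.frame(aportes_alajuela = by_row[!is.na(by_row)])
str(df_long)
```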



Hope this helps,

Rui Barradas




Re: [R] Plot for 10 years extrapolation

2023-10-27 Thread Rui Barradas

On 26/10/2023 at 19:23, varin sacha via R-help wrote:

Dear R-Experts,

Below is my working R code, but I don't know how to finish it 
to get the final plot with the extrapolation for the next 10 years.

Indeed, I try to extrapolate my data with a linear fit over the next 10 years. 
So I create a date sequence for the next 10 years and store as a dataframe to 
make the prediction possible.
Now, I am trying to get the plot with the actual data (from year 2004 to 2018) 
and with the 10 more years extrapolation.

Thanks for your help.


date <-as.Date(c("2018-12-31", "2017-12-31", "2016-12-31", "2015-12-31", "2014-12-31", "2013-12-31", "2012-12-31", "2011-12-31", 
"2010-12-31", "2009-12-31", "2008-12-31", "2007-12-31", "2006-12-31", "2005-12-31", "2004-12-31"))
  
value <-c(15348, 13136, 11733, 10737, 15674, 11098, 13721, 13209, 11099, 10087, 14987, 11098, 13421, 9023, 12098)
  
model <- lm(value~date)
  
plot(value~date ,col="grey",pch=20,cex=1.5,main="Plot")

abline(model,col="darkorange",lwd=2)
  
dfuture <- data.frame(date=seq(as.Date("2019-12-31"), by="1 year", length.out=10))
  
predict(model,dfuture,interval="prediction")




Hello,

Here is a way with base R graphics. Explained in the code comments.




date <-as.Date(c("2018-12-31", "2017-12-31", "2016-12-31",
 "2015-12-31", "2014-12-31", "2013-12-31",
 "2012-12-31", "2011-12-31", "2010-12-31",
 "2009-12-31", "2008-12-31", "2007-12-31",
 "2006-12-31", "2005-12-31", "2004-12-31"))

value <-c(15348, 13136, 11733, 10737, 15674, 11098, 13721, 13209,
  11099, 10087, 14987, 11098, 13421, 9023, 12098)

model <- lm(value ~ date)

dfuture <- data.frame(date = seq(as.Date("2019-12-31"), by="1 year", 
length.out=10))




predfuture <- predict(model, dfuture, interval="prediction")
dfuture <- cbind(dfuture, predfuture)

# start the plot with the required x and y limits
xlim <- range(c(date, dfuture$date))
ylim <- range(c(value, dfuture$fit))

plot(value ~ date, col="grey", pch=20, cex=1.5, main="Plot"
 , xlim = xlim, ylim = ylim)

# abline extends the fitted line past the x value (date)
# limit making the next ten years line ugly and not even
# completely overplotting the abline drawn line
abline(model, col="darkorange", lwd=2)
lines(fit ~ date, dfuture
  # , lty = "dashed"
  , lwd=2
  , col = "black")

# if lines() is used for both the interpolated and extrapolated
# values you will have a gap between both fitted and predicted lines
# but it is closer to what you want

# get the fitted values first (interpolated values)
ypred <- predict(model)

plot(value ~ date, col="grey", pch=20, cex=1.5, main="Plot"
 , xlim = xlim, ylim = ylim)

# plot the interpolated values
lines(ypred ~ date, col="darkorange", lwd = 2)
# and now the extrapolated values
# I use normal orange to make the difference more obvious
lines(fit ~ date, dfuture, lty = "dashed", lwd=2, col = "orange")
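The prediction intervals computed by predict(..., interval = "prediction") are not drawn above. A sketch that also plots them as dotted lines, with the y limits widened to include them so they are not clipped (line types and colours are arbitrary choices):

```r
date <- as.Date(c("2018-12-31", "2017-12-31", "2016-12-31", "2015-12-31",
                  "2014-12-31", "2013-12-31", "2012-12-31", "2011-12-31",
                  "2010-12-31", "2009-12-31", "2008-12-31", "2007-12-31",
                  "2006-12-31", "2005-12-31", "2004-12-31"))
value <- c(15348, 13136, 11733, 10737, 15674, 11098, 13721, 13209,
           11099, 10087, 14987, 11098, 13421, 9023, 12098)
model <- lm(value ~ date)

dfuture <- data.frame(date = seq(as.Date("2019-12-31"), by = "1 year", length.out = 10))
dfuture <- cbind(dfuture, predict(model, dfuture, interval = "prediction"))

# Use the interval bounds, not only the fit, when setting the y limits
xlim <- range(c(date, dfuture$date))
ylim <- range(c(value, dfuture$lwr, dfuture$upr))

ypred <- predict(model)
plot(value ~ date, col = "grey", pch = 20, cex = 1.5, main = "Plot",
     xlim = xlim, ylim = ylim)
lines(ypred ~ date, col = "darkorange", lwd = 2)                     # fitted
lines(fit ~ date, dfuture, lty = "dashed", lwd = 2, col = "orange")  # extrapolated
lines(lwr ~ date, dfuture, lty = "dotted", col = "orange")           # 95% prediction limits
lines(upr ~ date, dfuture, lty = "dotted", col = "orange")
```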



Hope this helps,

Rui Barradas




Re: [R] Bug in print for data frames?

2023-10-26 Thread Rui Barradas

Hello,

Inline.

On 26/10/2023 at 13:32, Ebert,Timothy Aaron wrote:

The "problem" goes away if you use

x$C <- y[1,]


Actually, if I understand correctly, the OP wants the column:


x$C <- y[,1]


In this case it will produce the same output because y is a df with only 
one row. But that is a very special case; the general case would be to 
extract the column.
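A minimal sketch of the forms discussed in this thread, using the original example. Only y[1] produces a data.frame column; note that for tibbles, y[, 1] does not drop to a vector.

```r
x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1)

x1 <- x; x1$C <- y[1]     # y[1] is a one-column data.frame, so C becomes a df-column
x2 <- x; x2$C <- y[[1]]   # y[[1]] is the numeric vector itself
x3 <- x; x3$C <- y[, 1]   # for a base data.frame, drop = TRUE also gives the vector

str(x1)
str(x2)
```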


Hope this helps,

Rui Barradas



If you have another row in your x, say:
x <- data.frame(A=c(1,4), B=c(2,5), C=c(3,6))

then your code
x$C <- y[1]
returns an error.

If y has the same number of rows as x$C then R has the same outcome as in your 
example.

It looks like your code tells R to replace all of column C (including the name) 
with all of vector y.

Maybe unexpected, but not a bug. It is consistent.


-Original Message-
From: R-help  On Behalf Of Rui Barradas
Sent: Thursday, October 26, 2023 6:43 AM
To: Christian Asseburg ; r-help@r-project.org
Subject: Re: [R] Bug in print for data frames?

[External Email]

On 25/10/2023 at 07:18, Christian Asseburg wrote:

Hi! I came across this unexpected behaviour in R. First I thought it was a bug in 
the assignment operator <- but now I think it's maybe a bug in the way data 
frames are being printed. What do you think?

Using R 4.3.1:


x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1)
x

A B C
1 1 2 3

x$B <- y$A # works as expected
x

A B C
1 1 1 3

x$C <- y[1] # makes C disappear
x

A B A
1 1 1 1

str(x)

'data.frame':   1 obs. of  3 variables:
   $ A: num 1
   $ B: num 1
   $ C:'data.frame':  1 obs. of  1 variable:
..$ A: num 1

Why does the print(x) not show "C" as the name of the third element? I did mess 
up the data frame (and this was a mistake on my part), but finding the bug was harder 
because print(x) didn't show the C any longer.

Thanks. With best wishes -

. . . Christian



Re: [R] Bug in print for data frames?

2023-10-26 Thread Rui Barradas

On 25/10/2023 at 07:18, Christian Asseburg wrote:

Hi! I came across this unexpected behaviour in R. First I thought it was a bug in 
the assignment operator <- but now I think it's maybe a bug in the way data 
frames are being printed. What do you think?

Using R 4.3.1:


x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1)
x

   A B C
1 1 2 3

x$B <- y$A # works as expected
x

   A B C
1 1 1 3

x$C <- y[1] # makes C disappear
x

   A B A
1 1 1 1

str(x)

'data.frame':   1 obs. of  3 variables:
  $ A: num 1
  $ B: num 1
  $ C:'data.frame':  1 obs. of  1 variable:
   ..$ A: num 1

Why does the print(x) not show "C" as the name of the third element? I did mess 
up the data frame (and this was a mistake on my part), but finding the bug was harder 
because print(x) didn't show the C any longer.

Thanks. With best wishes -

. . . Christian


Hello,

To expand on the good answers already given, I will present two other 
example data sets.


Example 1. Imagine that instead of assigning just one column from y to 
x$C you assign two columns. The result is a data.frame column. See what 
is displayed as the column names.
And unlike what happens with `[`, when assigning columns 1:2, the 
operator `[[` doesn't work. You will have to extract the columns y$A and 
y$B one by one.
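A possible alternative to extracting y$A and y$B one by one, sketched here under the assumption that the second column should become a new column named D (a hypothetical name, not from the thread): assigning through `[<-` with a data frame on the right-hand side copies its columns one at a time, so no data-frame column is created.

```r
x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1, B = 4)

# `[<-` with a data.frame value assigns its columns individually:
# y$A goes into C, y$B into the new (hypothetical) column D
x[c("C", "D")] <- y[1:2]
str(x)
```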




x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1, B = 4)
str(y)
#> 'data.frame':1 obs. of  2 variables:
#>  $ A: num 1
#>  $ B: num 4

x$C <- y[1:2]
x
#>   A B C.A C.B
#> 1 1 2   1   4

str(x)
#> 'data.frame':1 obs. of  3 variables:
#>  $ A: num 1
#>  $ B: num 2
#>  $ C:'data.frame':   1 obs. of  2 variables:
#>   ..$ A: num 1
#>   ..$ B: num 4

x[[1:2]]  # doesn't work
#> Error in .subset2(x, i, exact = exact): subscript out of bounds



Example 2. Sometimes it is useful to get a result like this first and 
then correct the resulting df. For instance, when computing more than 
one summary statistic.


str(agg) below shows that the column of summary stats is a matrix, so you 
have a matrix column. And once again the displayed names reflect that.


The trick to make the result a df is to extract all but the last column 
as a sub-df, extract the last column's values as a matrix (which it is) 
and then cbind the two together.


cbind is a generic function. Since the first argument to cbind is a 
sub-df, the method called is cbind.data.frame and the result is a df.




df1 <- data.frame(A = rep(c("a", "b", "c"), 5L), X = 1:30)

# the anonymous function computes more than one summary statistic
# note that it returns a named vector
agg <- aggregate(X ~ A, df1, \(x) c(Mean = mean(x), S = sd(x)))
agg
#>   A X.Mean   X.S
#> 1 a 14.50  9.082951
#> 2 b 15.50  9.082951
#> 3 c 16.50  9.082951

# similar effect as in the OP. The difference is that the last
# column is a matrix, not a data.frame
str(agg)
#> 'data.frame':3 obs. of  2 variables:
#>  $ A: chr  "a" "b" "c"
#>  $ X: num [1:3, 1:2] 14.5 15.5 16.5 9.08 9.08 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:2] "Mean" "S"

# nc is just a convenience, avoids repeated calls to ncol
nc <- ncol(agg)
cbind(agg[-nc], agg[[nc]])
#>   A Mean        S
#> 1 a 14.5 9.082951
#> 2 b 15.5 9.082951
#> 3 c 16.5 9.082951

# all is well
cbind(agg[-nc], agg[[nc]]) |> str()
#> 'data.frame':3 obs. of  3 variables:
#>  $ A   : chr  "a" "b" "c"
#>  $ Mean: num  14.5 15.5 16.5
#>  $ S   : num  9.08 9.08 9.08



If the anonymous function hadn't returned a named vector, the new column 
names would have been "1", "2"; try it.
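Another way to flatten the matrix column, not part of the original reply: data.frame() splits a matrix argument into separate columns, so calling it on the columns of agg via do.call() gives one ordinary column per statistic, named X.Mean and X.S. The sketch uses function(x) instead of \(x) so that it also runs on R versions older than 4.1.

```r
df1 <- data.frame(A = rep(c("a", "b", "c"), 5L), X = 1:30)
agg <- aggregate(X ~ A, df1, function(x) c(Mean = mean(x), S = sd(x)))

# data.frame() turns the matrix column X into columns X.Mean and X.S
flat <- do.call(data.frame, agg)
str(flat)
```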



Hope this helps,

Rui Barradas



--
This email has been checked for viruses by AVG antivirus software.
www.avg.com



Re: [R] by function does not separate output from function with mulliple parts

2023-10-25 Thread Rui Barradas
---
#> mydata$StepType: Second
#> lm model parameter contrast
#>
#>   Contrast     S.E.     Lower    Upper     t df Pr(>|t|)
#> 1   -2.435 1.819421 -6.198759 1.328759 -1.34 23   0.1939


Hope this helps,

Rui Barradas




Re: [R] running crossvalidation many times MSE for Lasso regression

2023-10-24 Thread Rui Barradas
      >> >> }
       >> >> mean(unlist(lst))
       >> >> ##


       > --
       > Jin
       > --
       > Jin Li, PhD
       > Founder, Data2action, Australia
       > https://www.researchgate.net/profile/Jin_Li32
       > https://scholar.google.com/citations?user=Jeot53EJ=en


Hello,

In your OP, the following two code lines are where that error comes from.


predictLasso=predict(cv_model, newx=test1)

ypred=predict(predictLasso,newdata=test1)



predictLasso already holds predictions: it is the output of predict(). So 
when you run the 2nd line above you are passing it a matrix, not a 
fitted model, and the error is thrown.
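For the prediction step itself, a minimal sketch of the intended usage, assuming simulated data in place of the OP's TT and test1 (all object names below are made up): predict() is called once, on the fitted cv.glmnet object, with newx and the chosen lambda, and the test MSE is computed directly from its output. It needs the glmnet package.

```r
library(glmnet)

set.seed(2023)
# simulated stand-in for the OP's training and test data (not the real TT / test1)
make_x <- function(n) matrix(rnorm(n * 2), ncol = 2, dimnames = list(NULL, c("x1", "x2")))
train_x <- make_x(100)
test_x  <- make_x(20)
train_y <- 1 + 2 * train_x[, "x1"] - train_x[, "x2"] + rnorm(100)
test_y  <- 1 + 2 * test_x[, "x1"] - test_x[, "x2"] + rnorm(20)

# lasso (alpha = 1), lambda chosen by cross-validated MSE
cv_model <- cv.glmnet(train_x, train_y, alpha = 1, type.measure = "mse")

# one call to predict(), on the fitted model; the result already is the predictions
pred <- predict(cv_model, newx = test_x, s = "lambda.min")

# test-set mean squared error
test_mse <- mean((test_y - pred)^2)
test_mse
```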


After the several suggestions in this thread, don't you want something 
like this instead of your for loop?



library(glmnet)

# make the results reproducible
set.seed(2023)
# this is better than what you had
z <- TT[c("x1", "x2")] |> as.matrix()
y <- TT[["y"]]
cv_model <- cv.glmnet(z, y, alpha = 1, type.measure = "mse")
best_lambda <- cv_model$lambda.min
best_lambda

# these two values should be the same, and they are
# index to minimum mse
(i <- cv_model$index[1])
which(cv_model$lambda == cv_model$lambda.min)

# these two values should be the same, and they are
# value of minimum mse
cv_model$cvm[i]
min(cv_model$cvm)

plot(cv_model)



Hope this helps,

Rui Barradas




Re: [R] Best way to test for numeric digits?

2023-10-18 Thread Rui Barradas

On 18/10/2023 at 19:35, Leonard Mada wrote:

Dear Rui,

On 10/18/2023 8:45 PM, Rui Barradas wrote:

split_chem_elements <- function(x, rm.digits = TRUE) {
  regex <- "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])"
  if(rm.digits) {
    stringr::str_replace_all(mol, regex, "#") |>
  strsplit("#|[[:digit:]]") |>
  lapply(\(x) x[nchar(x) > 0L])
  } else {
    strsplit(x, regex, perl = TRUE)
  }
}

split.symbol.character = function(x, rm.digits = TRUE) {
  # Perl is partly broken in R 4.3, but this works:
  regex <- "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])"
  s <- strsplit(x, regex, perl = TRUE)
  if(rm.digits) {
    s <- lapply(s, \(x) x[grep("[[:digit:]]+", x, invert = TRUE)])
  }
  s
}


You have a glitch (mol is hardcoded) in the code of the first function. 
The times are similar, after correcting for that glitch.


Note:
- grep("[[:digit:]]", ...) is almost twice as slow as grep("[0-9]", ...)!

- corrected results below;

Sincerely,

Leonard
###

split_chem_elements <- function(x, rm.digits = TRUE) {
   regex <- "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])"
   if(rm.digits) {
     stringr::str_replace_all(x, regex, "#") |>
   strsplit("#|[[:digit:]]") |>
   lapply(\(x) x[nchar(x) > 0L])
   } else {
     strsplit(x, regex, perl = TRUE)
   }
}

split.symbol.character = function(x, rm.digits = TRUE) {
   # Perl is partly broken in R 4.3, but this works:
   regex <- "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])"
   s <- strsplit(x, regex, perl = TRUE)
   if(rm.digits) {
     s <- lapply(s, \(x) x[grep("[0-9]", x, invert = TRUE)])
   }
   s
}

mol <- c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl")
mol1 <- rep(mol, 1)

system.time(
   split_chem_elements(mol1)
)
#   user  system elapsed
#   0.58    0.00    0.58

system.time(
   split.symbol.character(mol1)
)
#   user  system elapsed
#   0.67    0.00    0.67


Hello,

You are right, sorry for the blunder :(.
In the code below I have replaced stringr::str_replace_all by the 
package stringi function stri_replace_all_regex and the improvement is 
significant.



split_chem_elements <- function(x, rm.digits = TRUE) {
  regex <- "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])"
  if(rm.digits) {
    # stri_replace_all_regex(str, pattern, replacement): the regex is the pattern
    stringi::stri_replace_all_regex(x, regex, "#") |>
      strsplit("#|[0-9]") |>
      lapply(\(x) x[nchar(x) > 0L])
  } else {
    strsplit(x, regex, perl = TRUE)
  }
}

# system.time(
#   split_chem_elements(mol1)
# )
#   user  system elapsed
#   0.06    0.00    0.09
# system.time(
#   split.symbol.character(mol1)
# )
#   user  system elapsed
#   0.25    0.00    0.28



Hope this helps,

Rui Barradas






Re: [R] Best way to test for numeric digits?

2023-10-18 Thread Rui Barradas

On 18/10/2023 at 17:24, Leonard Mada wrote:

Dear Rui,

Thank you for your reply.

I do have actually access to the chemical symbols: I have started to 
refactor and enhance the Rpdb package, see Rpdb::elements:

https://github.com/discoleo/Rpdb

However, the regex that you have constructed is quite heavy, as it needs 
to iterate through all chemical symbols (in decreasing nchar). Elements 
like C, and especially O, P or S, appear late in the regex expression - 
but are quite common in chemistry.


The alternative regex is (in this respect) simpler. It actually works 
(once you know about the workaround).


Q: My question was whether there is anything like is.numeric that tests 
each element of a vector.
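One base R answer to the element-wise question (a sketch, not from the thread): grepl() with an anchored regular expression returns one logical per element and involves no numeric coercion, so there are no warnings to suppress. As written it only recognises strings made entirely of the digits 0-9, so signs, decimals or exponents would need a broader pattern.

```r
s <- c("Li", "Na", "K", "2", "Rb", "Ca", "3")

# TRUE where the whole string consists of digits; no as.numeric(), hence no warnings
is_digits <- grepl("^[0-9]+$", s)
is_digits
#> [1] FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE

s[!is_digits]
#> [1] "Li" "Na" "K"  "Rb" "Ca"
```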


Sincerely,


Leonard


On 10/18/2023 6:53 PM, Rui Barradas wrote:

On 18/10/2023 at 15:59, Leonard Mada via R-help wrote:

Dear List members,

What is the best way to test for numeric digits?

suppressWarnings(as.double(c("Li", "Na", "K",  "2", "Rb", "Ca", "3")))
# [1] NA NA NA  2 NA NA  3
The above requires the use of the suppressWarnings function. Are there
any better ways?

I was working to extract chemical elements from a formula, something
like this:
split.symbol.character = function(x, rm.digits = TRUE) {
      # Perl is partly broken in R 4.3, but this works:
      regex = 
"(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])";

      # stringi::stri_split(x, regex = regex);
      s = strsplit(x, regex, perl = TRUE);
      if(rm.digits) {
      s = lapply(s, function(s) {
          isNotD = is.na(suppressWarnings(as.numeric(s)));
          s = s[isNotD];
      });
      }
      return(s);
}

split.symbol.character(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"))


Sincerely,


Leonard


Note:
# works:
regex = "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
strsplit(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"), regex, perl = T)


# broken in R 4.3.1
# only slightly "erroneous" with stringi::stri_split
regex = "(?<=[A-Z])(?![a-z]|$)|(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
strsplit(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"), regex, perl = T)


Hello,

If you want to extract chemical elements symbols, the following might 
work.

It uses the periodic table in GitHub package chemr and a package stringr
function.


devtools::install_github("paleolimbot/chemr")



split_chem_elements <- function(x) {
    data(pt, package = "chemr", envir = environment())
    el <- pt$symbol[order(nchar(pt$symbol), decreasing = TRUE)]
    pat <- paste(el, collapse = "|")
    stringr::str_extract_all(x, pat)
}

mol <- c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl")
split_chem_elements(mol)
#> [[1]]
#> [1] "C"  "Cl" "F"
#>
#> [[2]]
#> [1] "Li" "Al" "H"
#>
#> [[3]]
#>  [1] "C"  "Cl" "C"  "O"  "Al" "P"  "O"  "Si" "O"  "Cl"


It is also possible to rewrite the function without calls to non base
packages but that will take some more work.

Hope this helps,

Rui Barradas



Hello,

You and Avi are right, my function's performance is terrible. The 
following is much faster.


As for how to test for digits without warnings, the lapply in the version 
of your function below solves it by setting the grep argument invert = TRUE. 
This keeps all strings where digits do not occur.




split_chem_elements <- function(x, rm.digits = TRUE) {
  regex <- "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])"
  if(rm.digits) {
stringr::str_replace_all(x, regex, "#") |>
  strsplit("#|[[:digit:]]") |>
  lapply(\(x) x[nchar(x) > 0L])
  } else {
strsplit(x, regex, perl = TRUE)
  }
}

split.symbol.character = function(x, rm.digits = TRUE) {
  # Perl is partly broken in R 4.3, but this works:
  regex <- "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])"
  s <- strsplit(x, regex, perl = TRUE)
  if(rm.digits) {
s <- l

Re: [R] Best way to test for numeric digits?

2023-10-18 Thread Rui Barradas

On 18/10/2023 at 15:59, Leonard Mada via R-help wrote:

Dear List members,

What is the best way to test for numeric digits?

suppressWarnings(as.double(c("Li", "Na", "K",  "2", "Rb", "Ca", "3")))
# [1] NA NA NA  2 NA NA  3
The above requires the use of the suppressWarnings function. Are there 
any better ways?


I was working to extract chemical elements from a formula, something 
like this:

split.symbol.character = function(x, rm.digits = TRUE) {
     # Perl is partly broken in R 4.3, but this works:
     regex = "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
     # stringi::stri_split(x, regex = regex);
     s = strsplit(x, regex, perl = TRUE);
     if(rm.digits) {
     s = lapply(s, function(s) {
         isNotD = is.na(suppressWarnings(as.numeric(s)));
         s = s[isNotD];
     });
     }
     return(s);
}

split.symbol.character(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"))


Sincerely,


Leonard


Note:
# works:
regex = "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
strsplit(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"), regex, perl = T)


# broken in R 4.3.1
# only slightly "erroneous" with stringi::stri_split
regex = "(?<=[A-Z])(?![a-z]|$)|(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
strsplit(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"), regex, perl = T)


Hello,

If you want to extract chemical elements symbols, the following might work.
It uses the periodic table in GitHub package chemr and a package stringr 
function.



devtools::install_github("paleolimbot/chemr")



split_chem_elements <- function(x) {
  data(pt, package = "chemr", envir = environment())
  el <- pt$symbol[order(nchar(pt$symbol), decreasing = TRUE)]
  pat <- paste(el, collapse = "|")
  stringr::str_extract_all(x, pat)
}

mol <- c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl")
split_chem_elements(mol)
#> [[1]]
#> [1] "C"  "Cl" "F"
#>
#> [[2]]
#> [1] "Li" "Al" "H"
#>
#> [[3]]
#>  [1] "C"  "Cl" "C"  "O"  "Al" "P"  "O"  "Si" "O"  "Cl"


It is also possible to rewrite the function without calls to non base 
packages but that will take some more work.
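A base-R sketch of that rewrite, not from the thread, under the assumption that every element symbol is one upper-case letter optionally followed by one lower-case letter: regmatches() with gregexpr() extracts the symbols directly. Unlike the chemr-based version, it does not check the symbols against the periodic table, so a string such as "Xx" would be accepted.

```r
# assumes each symbol is [A-Z] optionally followed by one [a-z]; digits are skipped
split_elements_base <- function(x) {
  regmatches(x, gregexpr("[A-Z][a-z]?", x))
}

mol <- c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl")
split_elements_base(mol)
```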


Hope this helps,

Rui Barradas




Re: [R] creating a time series

2023-10-16 Thread Rui Barradas

On 16/10/2023 at 11:12, ahmet varlı wrote:


Hello everyone,

I had 15-minute data from 2017-11-02 13:30:00 to 2022-11-26 23:45:00, and the 
number of data points is 177647.

I would like to ask why my time series is shorter than I expected.


baslangic <- as.POSIXct("2017-11-02 13:30:00", tz = "CET")
bitis <- as.POSIXct("2022-11-26 23:45:00", tz = "CET")  #
zaman_seti <- seq.POSIXt(from = baslangic, to = bitis, by = 60 * 15)


length(zaman_seti)
[1] 177642

but it has to be  177647



And secondly, I have times in this format ( 2.11.2017 13:30/DD-MM- HH:MM:SS)

su_seviyeleri_data <- as.POSIXct(su_seviyeleri_data$kayit_zaman, format = "%Y-%m-%d %H:%M:%S")

I am using this code to change the format, but it gives NA as the result.

How can I solve this problem?

Bests,






Hello,

Given your date format, try


format = "%d.%m.%Y %H:%M"


Test with your date time:



x <- "2.11.2017 13:30"
as.POSIXct(x, format = "%d.%m.%Y %H:%M")
#> [1] "2017-11-02 13:30:00 WET"

as.POSIXct(su_seviyeleri_data$kayit_zaman, format = "%d.%m.%Y %H:%M")
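On the first question (177642 versus 177647 time stamps), a quick check in base R suggests that 177642 is the number of 15-minute steps between the two instants, counting both ends. If that holds, the data would contain five rows more than the regular grid, and one way to look for them is to count repeated and off-grid time stamps; the obs vector below is a made-up stand-in for the parsed kayit_zaman column.

```r
start_time <- as.POSIXct("2017-11-02 13:30:00", tz = "CET")
end_time   <- as.POSIXct("2022-11-26 23:45:00", tz = "CET")

# both instants are in winter time (UTC+1), so elapsed minutes equal clock minutes;
# count the 15-minute steps and add one for the starting point itself
n_expected <- as.numeric(difftime(end_time, start_time, units = "mins")) / 15 + 1
grid <- seq(start_time, end_time, by = "15 min")
c(n_expected, length(grid))

# toy stand-in for the parsed data: one repeated and one off-grid time stamp
obs <- c(grid[1:3], grid[3], grid[3] + 60)
sum(duplicated(obs))                                          # repeated time stamps
sum((as.numeric(obs) - as.numeric(start_time)) %% 900 != 0)  # not on the 15-minute grid
```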


Hope this helps,

Rui Barradas




Re: [R] if-else that returns vector

2023-10-12 Thread Rui Barradas

On 12/10/2023 at 21:22, Christofer Bogaso wrote:

Hi,

Following expression returns only the first element

ifelse(T, c(1,2,3), c(5,6))

However I am looking for some one-liner expression like above which
will return the entire vector.

Is there any way to achieve this?


Hello,

I don't like it but


ifelse(rep(T, length(c(1,2,3))), c(1,2,3), c(5,6))


maybe you should use


rep(T, max(length(c(1, 2, 3)), length(c(5, 6))))


as the test instead, but it's still ugly.
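For a single TRUE/FALSE condition, a common base R idiom (not mentioned in the reply) is that `if` is itself an expression that returns a value, so it works as the one-liner and returns the whole vector. It requires a condition of length one; since R 4.2.0 a longer condition is an error.

```r
cond <- TRUE

# `if` returns the value of whichever branch is taken, whole vector included
res <- if (cond) c(1, 2, 3) else c(5, 6)
res
#> [1] 1 2 3
```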

Hope this helps,

Rui Barradas




Re: [R] Text showing when R is launched

2023-10-11 Thread Rui Barradas

On 11/10/2023 at 19:21, George Loftus wrote:

Hi,

Thank you for your response

<https://1drv.ms/i/s!AkfoLX--ikbqkweYckSQiXYKXJuR> (Screenshot 2023-10-11 at 19.19.48.png)

However this is all that exists in Users/Admin

There were a couple of R files in there which I have since deleted but I am 
still getting the same issue

Thank you,
George


From: Rui Barradas 
Sent: 10 October 2023 12:06
To: George Loftus ; r-help@r-project.org 

Subject: Re: [R] Text showing when R is launched

On 09/10/2023 at 23:56, George Loftus wrote:

Good Evening,

I was wondering if you were able to help. I am running R on macOS; it is the 
2020 model Mac, so I have installed the Intel version of R, which I believe is correct.

However, when I launch R, or return to the R window after switching to a different 
programme, the following text is printed.

I have also copied and pasted for ease

1   HIToolbox           0x7ff82142e0c2 _ZN15MenuBarInstance22RemoveAutoShowObserverEv + 30
2   HIToolbox           0x7ff82146a638 _ZL17BroadcastInternaljPvh + 167
3   SkyLight            0x7ff81c70f23d _ZN12_GLOBAL__N_123notify_datagram_handlerEj15CGSDatagramTypePvmS1_ + 1030
4   SkyLight            0x7ff81ca2205a _ZN21CGSDatagramReadStream26dispatchMainQueueDatagramsEv + 202
5   SkyLight            0x7ff81ca21f81 ___ZN21CGSDatagramReadStream15mainQueueWakeupEv_block_invoke + 18
6   libdispatch.dylib   0x7ff8178867fb _dispatch_call_block_and_release + 12
7   libdispatch.dylib   0x7ff817887a44 _dispatch_client_callout + 8
8   libdispatch.dylib   0x7ff8178947b9 _dispatch_main_queue_drain + 952
9   libdispatch.dylib   0x7ff8178943f3 _dispatch_main_queue_callback_4CF + 31
10  CoreFoundation      0x7ff817b215f0 __CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE__ + 9
11  CoreFoundation      0x7ff817ae1b70 __CFRunLoopRun + 2454
12  CoreFoundation      0x7ff817ae0b60 CFRunLoopRunSpecific + 560
13  HIToolbox           0x7ff82142e766 RunCurrentEventLoopInMode + 292
14  HIToolbox           0x7ff82142e576 ReceiveNextEventCommon + 679
15  HIToolbox           0x7ff82142e2b3 _BlockUntilNextEventMatchingListInModeWithFilter + 70
16  AppKit              0x7ff81ac31293 _DPSNextEvent + 909
17  AppKit              0x7ff81ac30114 -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 1219
18  R                   0x000103d60c76 -[RController doProcessEvents:] + 166
19  R                   0x000103d5b295 -[RController handleReadConsole:] + 149
20  R                   0x000103d6466f Re_ReadConsole + 175
21  libR.dylib          0x000104442154 R_ReplDLLdo1 + 148
22  R                   0x000103d71c47 run_REngineRmainloop + 263
23  R                   0x000103d66d5f -[REngine runREPL] + 143
24  R                   0x000103d56718 main + 792
25  dyld                0x7ff8176d4310 start + 2432
1   HIToolbox           0x7ff8214a1726 _ZN15MenuBarInstance22EnsureAutoShowObserverEv + 102
2   HIToolbox           0x7ff82146a638 _ZL17BroadcastInternaljPvh + 167
3   SkyLight            0x7ff81c70f23d _ZN12_GLOBAL__N_123notify_datagram_handlerEj15CGSDatagramTypePvmS1_ + 1030
4   SkyLight            0x7ff81ca2205a _ZN21CGSDatagramReadStream26dispatchMainQueueDatagramsEv + 202
5   SkyLight            0x7ff81ca21f81 ___ZN21CGSDatagramReadStream15mainQueueWakeupEv_block_invoke + 18
6   libdispatch.dylib   0x7ff8178867fb _dispatch_call_block_and_release + 12
7   libdispatch.dylib   0x7ff817887a44 _dispatch_client_callout + 8
8   libdispatch.dylib   0x7ff8178947b9 _dispatch_main_queue_drain + 952
9   libdispatch.dylib   0x7ff8178943f3 _dispatch_main_queue_callback_4CF + 31
10  CoreFoundation      0x7ff817b215f0 __CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE__ + 9
11  CoreFoundation      0x7ff817ae1b70 __CFRunLoopRun + 2454
12  CoreFoundation      0x7ff817ae0b60 CFRunLoopRunSpecific + 560
13  HIT

Re: [R] Text showing when R is launched

2023-10-10 Thread Rui Barradas
   0x000103d71c47 run_REngineRmainloop + 263
23  R                   0x000103d66d5f -[REngine runREPL] + 143
24  R                   0x000103d56718 main + 792
25  dyld                0x7ff8176d4310 start + 2432

Are you able to inform me what is causing this? I can't seem to find any online 
help regarding this

Thank you in advance,
George Loftus



Hello,

Try deleting file

/Users/admin/.RData


R is restoring the previous session from that file, and this is often a 
source of problems.
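A sketch of doing the deletion from within R, assuming the workspace file sits in the home folder as in the path above (remove_saved_workspace() is a hypothetical helper, not a base R function). Quitting with q(save = "no") afterwards keeps the file from being recreated.

```r
# hypothetical helper: delete a saved workspace (.RData) in `dir` if one exists;
# returns TRUE if a file was removed, FALSE otherwise
remove_saved_workspace <- function(dir = path.expand("~")) {
  f <- file.path(dir, ".RData")
  if (file.exists(f)) file.remove(f) else FALSE
}

# remove_saved_workspace()   # acts on the home folder, e.g. /Users/admin
# q(save = "no")             # quit without writing a new .RData
```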


Hope this helps,

Rui Barradas




Re: [R] Is it possible to get a downward pointing solid triangle plotting symbol in R?

2023-10-06 Thread Rui Barradas

On 06/10/2023 at 10:09, Chris Evans via R-help wrote:
The reason I am asking is that I would like to mark areas on a plot 
using geom_polygon() and aes(fill = variable) to fill various polygons 
forming the background of a plot with different colours. Then I would 
like to overlay that with points representing direction of change: 
improved, no reliable change, deteriorated. The obvious symbols to use 
for those three directions are an upward arrow, a circle or square and a 
downward pointing arrow. There is a solid upward-pointing triangle symbol 
in R (pch = 17) and there are both upward and downward pointing open 
triangle symbols (pch 24 and 25), but to fill those with a solid colour 
so they will be visible over the background requires that I use a fill 
aesthetic, and that gets me a mess with the legend as I will have used a 
different fill mapping to fill the polygons. This silly reprex shows 
the issue I think.


library(tidyverse)
tibble(x = 2:9, y = 2:9, c = c(rep("A", 5), rep("B", 3))) -> tmpTibPoints
tibble(x = c(1, 5, 5, 1), y = c(1, 1, 5, 5), a = rep("a", 4)) -> 
tmpTibArea1
tibble(x = c(5, 10, 10, 5), y = c(1, 1, 5, 5), a = rep("b", 4)) -> 
tmpTibArea2
tibble(x = c(1, 5, 5, 1), y = c(5, 5, 10, 10), a = rep("c", 4)) -> 
tmpTibArea3
tibble(x = c(5, 10, 10, 5), y = c(5, 5, 10, 10), a = rep("d", 4)) -> 
tmpTibArea4

bind_rows(tmpTibArea1,
   tmpTibArea2,
   tmpTibArea3,
   tmpTibArea4) -> tmpTibAreas
ggplot(data = tmpTibAreas,
    aes(x = x, y = y)) +
   geom_polygon(data = tmpTibAreas,
    aes(x = x, y = y, fill = a)) +
   geom_point(data = tmpTibPoints,
  aes(x = x, y = y, fill = c),
  pch = 24,
  size = 6)

Does anyone know a way to create a solid downward pointing symbol?  Or 
another workaround?


TIA,

Chris


Hello,

Maybe you can solve the problem with Unicode characters.
See the two scale_*_manual calls at the end of the plot.



# Unicode characters for black up- and down-pointing triangles
pts_shapes <- c("\U25B2", "\U25BC") |> setNames(c("A", "B"))
pts_colors <- c("blue", "red") |> setNames(c("A", "B"))

ggplot(data = tmpTibAreas,
   aes(x = x, y = y)) +
  geom_polygon(data = tmpTibAreas,
   aes(x = x, y = y, fill = a)) +
  geom_point(data = tmpTibPoints,
 aes(x = x, y = y, color = c, shape = c),
 size = 6) +
  scale_shape_manual(values = pts_shapes) +
  scale_color_manual(values = pts_colors)
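Another possible workaround, sketched here assuming ggplot2 3.3.0 or later (the version that added after_scale()): keep the fillable triangles, shape 24 (up) and shape 25 (down), and set each point's fill to its own scaled colour. Because the points' fill is then not mapped to a scale, it should not compete with the polygons' fill legend. The data frames pts and areas are simplified, hypothetical stand-ins for the tibbles in the question.

```r
library(ggplot2)

# Hypothetical stand-ins for tmpTibPoints and two of the tmpTibArea* tibbles
pts <- data.frame(x = 2:9, y = 2:9, c = c(rep("A", 5), rep("B", 3)))
areas <- data.frame(
  x = c(1, 5, 5, 1, 5, 10, 10, 5),
  y = c(1, 1, 5, 5, 1, 1, 5, 5),
  a = rep(c("a", "b"), each = 4)
)

p <- ggplot(pts, aes(x = x, y = y)) +
  # the polygons' fill is the only *mapped* fill, so it owns the fill legend
  geom_polygon(data = areas, aes(fill = a)) +
  # shapes 24/25 are fillable triangles; their fill copies the scaled colour
  geom_point(aes(colour = c, shape = c, fill = after_scale(colour)), size = 6) +
  scale_shape_manual(values = c(A = 24, B = 25)) +
  scale_colour_manual(values = c(A = "blue", B = "red"))
p
```

How the legend keys look may depend on the ggplot2 version; the Unicode approach above does not need after_scale() at all.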




--
This e-mail was checked for viruses by AVG antivirus software.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R issue / No buffer space available

2023-10-05 Thread Rui Barradas

On 04/10/2023 at 21:28, Ohad Oren, MD wrote:

Hello,

I keep getting the following message about 'no buffer space available'. I
am using RStudio via a connection to a server. I verified that the connection
to the server is good.

2023-10-04T20:26:25.698193Z [rsession-oo968] ERROR system error 105
(No buffer space available) [host: localhost, uri: /log_message, path:
/var/run/rstudio-server/rstudio-rserver/rserver-monitor.socket];
OCCURRED AT void
rstudio::core::http::LocalStreamAsyncClient::handleConnect(const
rstudio_boost::system::error_code&)
src/cpp/session/SessionModuleContext.cpp:124


Will appreciate your help!

Ohad



Hello,

RStudio is an IDE for R, not R itself.
That is an RStudio error, and RStudio technical support [1] is better 
suited to solve your problem.


[1] https://community.rstudio.com/

Hope this helps,

Rui Barradas





Re: [R] annotate

2023-10-05 Thread Rui Barradas

On 04/10/2023 at 20:34, Subia Thomas OI-US-LIV5 wrote:

Colleagues,

I wish to label only the data points whose y values meet a criterion.

Here is my reproducible code.
library(dplyr)
library(ggplot2)
library(cowplot)

above_92 <- filter(faithful,waiting>92)

ggplot(faithful,aes(x=eruptions,y=waiting))+
   geom_point(shape=21,size=3,fill="orange")+
   theme_cowplot()+
   geom_hline(yintercept = 92)+
   annotate(geom="text",x=above_92$eruptions,y=above_92$waiting+2,label=above_92$waiting)

A bit of trial and error is required to figure out what number to add to or 
subtract from above_92$waiting.

Is there a more efficient way to do this?


Thomas Subia
Lean Six Sigma Senior Practitioner

DRÄXLMAIER Group
DAA Draexlmaier Automotive of America LLC

mailto:thomas.su...@draexlmaier.com
http://www.draexlmaier.com

"In God we trust.
All others must bring data."
W. Edwards Deming


Public: All rights reserved. Distribution to third parties allowed.


Hello,

Yes, there is an automatic way of doing this.
Use a new data set in geom_text or annotate. Below I use geom_text.
Then vjust takes care of the label placement.




library(dplyr)
library(ggplot2)
library(cowplot)

above_92 <- filter(faithful, waiting > 92)

ggplot(faithful, aes(x = eruptions, y = waiting)) +
  geom_point(shape=21,size=3,fill="orange") +
  geom_hline(yintercept = 92) +
  # use a new data argument here
  geom_text(
    data = above_92,
    mapping = aes(x = eruptions, y = waiting, label = waiting),
    vjust = -1
  ) +
  theme_cowplot()
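For comparison, a base-graphics sketch (not part of the original answer): text() with pos = 3 places each label just above its point, at a fixed offset of half a character width by default, so no number has to be found by trial and error. pdf(NULL) only keeps the example from opening a window.

```r
# Rows to label: waiting times above the threshold
above_92 <- subset(faithful, waiting > 92)

pdf(NULL)  # draw off-screen; remove this line to plot on screen
plot(waiting ~ eruptions, data = faithful, pch = 21, bg = "orange", cex = 1.5)
abline(h = 92)
# pos = 3 puts each label above its point
with(above_92, text(eruptions, waiting, labels = waiting, pos = 3, cex = 0.8))
invisible(dev.off())
```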




Hope this helps,

Rui Barradas





Re: [R] Jim Lemon RIP

2023-10-04 Thread Rui Barradas



My sympathies for your loss.
Jim Lemon was a dedicated contributor to the R community and his answers 
were always welcome.

Jim will be missed.

Rui Barradas

On 04/10/2023 at 23:36, Jim Lemon wrote:

Hello,
I am very sad to let you know that my husband Jim died on 18th September. I
apologise for not letting you know earlier but I had trouble finding the
password for his phone.
Kind regards,
Juel









Re: [R] Grouping by Date and showing count of failures by date

2023-09-30 Thread Rui Barradas

On 29/09/2023 at 21:29, Paul Bernal wrote:

Dear friends,

Hope you are doing great. I am attaching the dataset I am working with
because, when I tried to dput() it, I was not able to copy the entire
result from dput(), so I apologize in advance for that.

I am interested in creating a column named Failure_Date_Period that has the
FAILDATE but formatted as _MM. Then I want to count the number of
failures (given by column WONUM) and just have a dataframe that has the
FAILDATE and the count of WONUM.

I tried this:
pt <- PivotTable$new()
pt$addData(failuredf)
pt$addColumnDataGroups("FAILDATE")
pt$defineCalculation(calculationName = "FailCounts",
summariseExpression="n()")
pt$renderPivot()

but I was not successful. Bottom line, I need to create a new dataframe
that has the number of failures by FAILDATE, but in -MM format.

Any help and/or guidance will be greatly appreciated.

Kind regards,
Paul

Hello,

No data is attached. Maybe try

dput(head(failuredf, 30))

?

And where can we find non-base PivotTable? Please start the scripts with 
calls to library() when using non-base functionality.
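Since no data were attached, here is a hedged base R sketch on a small made-up data frame. It assumes that FAILDATE can be converted with as.Date(), that each row (one WONUM) is one failure, and that a year-month label such as "2023-01" is the wanted period format. It needs no PivotTable.

```r
# Hypothetical stand-in for failuredf; the real data were not attached
failuredf <- data.frame(
  WONUM    = c("WO1", "WO2", "WO3", "WO4", "WO5"),
  FAILDATE = as.Date(c("2023-01-05", "2023-01-20", "2023-02-03",
                       "2023-03-15", "2023-03-28"))
)

# Year-month period of each failure, e.g. "2023-01"
failuredf$Failure_Date_Period <- format(failuredf$FAILDATE, "%Y-%m")

# Number of work orders (failures) per period
fail_counts <- aggregate(WONUM ~ Failure_Date_Period, data = failuredf,
                         FUN = length)
fail_counts
```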


Hope this helps,

Rui Barradas





Re: [R] predict function type class vs. prob

2023-09-23 Thread Rui Barradas

On 22/09/2023 at 11:12, Milbert, Sabine (LGL) wrote:

Dear R Help Team,

My research group and I use R scripts for our multivariate data screening 
routines. During routine use, we encountered some inconsistencies within the 
predict() function of the R Stats Package. Through internal research, we were 
unable to find the reason for this and have decided to contact your help team 
with the following issue:

The predict() function is used once to predict the class membership of a new sample (type = 
"class") on a trained linear SVM model for distinguishing two classes (using the caret 
package). It is then used to also examine the probability of class membership (type = 
"prob"). Both are then presented in an R shiny output. Within the routine, we noticed two 
samples (out of 100+) where the class prediction and probability prediction did not match. The 
prediction probabilities of one class (52%) did not match the class membership within the predict 
function. We use the same seed and the discrepancy is reproducible in this sample. The same problem 
did not occur in other trained models (lda, random forest, radial SVM...).

Is there a weighting of classes within the prediction function, or is the 
classification limit not at 50%/a majority vote? If you have another 
explanation for this discrepancy, please let us know.

PS: If this is an issue based on the model training function of the caret 
package and therefore not your responsibility, please let us know.

Thank you in advance for your support!

Yours sincerely,
Sabine Milbert



Hello,

I cannot tell what is going on but I would like to make a correction to 
your post.


predict() is a generic function with methods for objects of several 
classes in many packages. In base package stats you will find methods 
for objects (fits) of class lm, glm and others, see ?predict.


The method you are asking about is predict.train, defined in package 
caret, not in package stats.

To see which predict method is being called, check


class(your_fit)
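A small base R illustration of the dispatch mechanism, using lm() on the built-in cars data because the SVM model is not available: getS3method() retrieves the method that predict() will call for a given class. For a model fitted with caret::train(), class() is expected to report "train", which points to caret's predict.train() (an expectation, not run here).

```r
# predict() is an S3 generic; the method is chosen from class(fit)
fit <- lm(dist ~ speed, data = cars)
class(fit)
#> [1] "lm"

# The function that predict(fit, ...) dispatches to
m <- getS3method("predict", class(fit)[1])
is.function(m)
#> [1] TRUE
```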


Hope this helps,

Rui Barradas



Re: [R] Hadamard transformation

2023-09-18 Thread Rui Barradas

On 18/09/2023 at 18:45, mohan radhakrishnan wrote:

Hello,

I am attempting to port the R code which is an answer to
https://codegolf.stackexchange.com/questions/194229/implement-the-2d-hadamard-transform


function(M){for(i in 1:log2(nrow(M)))T=T%x%matrix(1-2*!3:0,2)/2; print(T);
T%*%M%*%T}

The code, 3 inputs and the corresponding outputs are shown in
https://tio.run/##PYyxCsIwFEX3fkUcAu@VV7WvcSl2dOwi8QNqNSXQJhAqrYjfHoOIwz3D4XBDNOJYiGgerp@td9Diy/gAVlgnynr0A4MLfkkeUTdarnLq5mBXKAvON1W9J8YdZ1rmsk3T72jgV/TAVBHTAROYrs/00@jz5YSY/aOSFKmvGP1yD9sk4Wa7ARSSRowf

These are the inputs.

f(matrix(c(2,3,2,5),2,2,byrow=TRUE))
f(matrix(1,4,4))
f(lower.tri(diag(4),T))

My attempt to port this R code to another framework (TensorFlow) was only
partially successful because I didn't fully understand the cryptic R code.
The second input shown above works after hacking TensorFlow for a long time.

My question is this: can anyone code this in a clear way so that I can
understand it? I understand the Kronecker product and matrix multiplication
and can port that code, but I am missing something, as the same ported code
does not work for all inputs.

Thanks,
Mohan



Hello,

Is this what you want?
(I have changed the notation a bit.)


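H <- function(M){
  H0 <- 1                                # start from the 1 x 1 matrix [1]
  Transf <- matrix(c(1, 1, 1, -1), 2L)   # 2 x 2 Hadamard matrix
  # Kronecker power with log2(n) factors, halved at each step
  for(i in 1:log2(nrow(M))) {
    H0 <- H0 %x% Transf/2
  }
  H0 %*% M %*% H0                        # two-sided transform
}

x <- matrix(c(2, 3, 2, 5), 2, 2, byrow = TRUE)
y <- matrix(1, 4, 4)
z <- lower.tri(diag(4), TRUE)
z[] <- apply(z, 2, as.integer)           # TRUE/FALSE -> 1/0
H(x)
H(y)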
H(z)
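As a further aid for the port, the golfed one-liner can be unpacked step by step in base R. The sketch uses the 4 x 4 all-ones input from the question as M; the names H2, Tn and res are only illustrative.

```r
# 1. `!3:0` negates c(3, 2, 1, 0): only the 0 becomes TRUE
!3:0
#> [1] FALSE FALSE FALSE  TRUE

# 2. 1 - 2 * c(0, 0, 0, 1) is c(1, 1, 1, -1); filled column-wise into a
#    2 x 2 matrix, this is the unnormalised 2 x 2 Hadamard matrix
H2 <- matrix(1 - 2 * !3:0, 2)

# 3. In the golfed loop `T` starts as TRUE (i.e. 1), and %x% binds tighter
#    than `/`, so each pass computes (T %x% H2) / 2
n <- 4
Tn <- 1
for (i in seq_len(log2(n))) Tn <- (Tn %x% H2) / 2

# 4. The returned value is the two-sided product Tn %*% M %*% Tn
M <- matrix(1, n, n)
res <- Tn %*% M %*% Tn
res
```

For the all-ones input only the top-left entry of res is non-zero (it equals 1), which can serve as a quick check of a ported version.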



Hope this helps,

Rui Barradas



Re: [R] Help with plotting and date-times for climate data

2023-09-12 Thread Rui Barradas

On 12/09/2023 at 21:50, Kevin Zembower via R-help wrote:

Hello,

I'm trying to calculate the mean temperature max from a file of climate
data, and plot it over a range of days in the year. I've downloaded the
data, and cleaned it up the way I think it should be. However, when I
plot it, the geom_smooth line doesn't show up. I think that's because
my x axis is characters or factors. Here's what I have so far:

library(tidyverse)

data <- read_csv("Ely_MN_Weather.csv")

start_day = yday(as_date("2023-09-22"))
end_day = yday(as_date("2023-10-15"))

d <- as_tibble(data) %>%
  select(DATE,TMAX,TMIN) %>%
  mutate(DATE = as_date(DATE),
         yday = yday(DATE),
         md = sprintf("%02d-%02d", month(DATE), mday(DATE))
  ) %>%
  filter(yday >= start_day & yday <= end_day) %>%
  mutate(md = as.factor(md))

d_sum <- d %>%
 group_by(md) %>%
 summarize(tmax_mean = mean(TMAX, na.rm=TRUE))

## Here's the filtered data:
dput(d_sum)


structure(list(md = structure(1:25, levels = c("09-21", "09-22",
"09-23", "09-24", "09-25", "09-26", "09-27", "09-28", "09-29",
"09-30", "10-01", "10-02", "10-03", "10-04", "10-05", "10-06",
"10-07", "10-08", "10-09", "10-10", "10-11", "10-12", "10-13",
"10-14", "10-15"), class = "factor"), tmax_mean = c(65, 62.2,
61.3, 63.9, 64.3, 60.1, 62.3, 60.5, 61.9, 61.2, 63.7, 59.5, 59.6,
61.6, 59.4, 58.8, 55.9, 58.125, 58, 55.7, 57, 55.4, 49.8, 48.75,
43.7)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -25L))



ggplot(data = d_sum, aes(x = md)) +
 geom_point(aes(y = tmax_mean, color = "blue")) +
 geom_smooth(aes(y = tmax_mean, color = "blue"))
My questions are:
1. Why isn't my geom_smooth plotting? How can I fix it?
2. I don't think I'm handling the month and day combination correctly.
Is there a way to encode month and day (but not year) as a date?
3. (Minor point) Why does my graph of tmax_mean come out red when I
specify "blue"?

Thanks for any advice or guidance you can offer. I really appreciate
the expertise of this group.

-Kevin


Hello,

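The problem is that the dates are factors, not real dates, and 
geom_smooth does not interpolate along a discrete axis (here, the x axis).


Paste a fake year with md, coerce to date and plot.
I have simplified the aes() calls and added a date scale in order to 
make the x axis more readable. As for the colour, color = "blue" inside 
aes() is mapped like a data value, so it gets the first colour of the 
default discrete palette, a shade of red; set outside aes(), as below, 
it is used literally.


Without the formula and method arguments, geom_smooth prints a 
message, so they are now given explicitly.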




suppressPackageStartupMessages({
  library(dplyr)
  library(ggplot2)
})

d_sum %>%
  mutate(md = paste("2023", md, sep = "-"),
         md = as.Date(md)) %>%
  ggplot(aes(x = md, y = tmax_mean)) +
  geom_point(color = "blue") +
  geom_smooth(
    formula = y ~ x,
    method = loess,
    color = "blue"
  ) +
  scale_x_date(date_breaks = "7 days", date_labels = "%m-%d")
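A minimal base R sketch of the fake-year idea, using a few of the md labels from d_sum. The year itself is arbitrary; choosing a leap year such as 2024 instead of 2023 would also keep a "02-29" label valid, should one ever occur.

```r
# A few month-day labels like those in d_sum$md
md <- c("09-21", "09-22", "10-01", "10-15")

# Prefix an arbitrary (leap) year to get real Date values
md_date <- as.Date(paste("2024", md, sep = "-"))

# Dates are continuous: they can be differenced and smoothed over
as.numeric(md_date[4] - md_date[1])   # days from 21 Sep to 15 Oct
#> [1] 24

# ...and formatted back to the original month-day labels
format(md_date, "%m-%d")
```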



Hope this helps,

Rui Barradas



Re: [R] graph in R with grouping letters from the Tukey test with agricolae package

2023-09-12 Thread Rui Barradas

On 12/09/2023 at 16:24, Loop Vinyl wrote:

I would like to produce the attached graph (graph1) with the R package
agricolae. Could someone give me an example with the attached data (data)?

I expect an adapted graph (graph2) with the data (data).

Best regards



Hello,

There are no attached graphs, only data.
Can you post the code you have tried?
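While the graphs are missing, here is a hedged starting point that uses only base R, with the built-in PlantGrowth data as a stand-in for the attached data. The grouping letters that agricolae produces summarise Tukey's pairwise comparisons, and TukeyHSD() computes those comparisons directly. The agricolae call in the last comment is an assumption about that package's interface and is not run here.

```r
# One-way ANOVA on a built-in data set (stand-in for the poster's data)
fit <- aov(weight ~ group, data = PlantGrowth)

# Tukey's honest significant differences for every pair of groups
tk <- TukeyHSD(fit, "group")
tk

# Group means, e.g. bar heights for the plot
means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)
means

# With agricolae installed, something like this is expected to return
# the grouping letters (not run):
# agricolae::HSD.test(fit, "group", group = TRUE)$groups
```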

Hope this helps,

Rui Barradas



Re: [R] prop.trend.test

2023-09-08 Thread Rui Barradas

On 08/09/2023 at 10:06, peter dalgaard wrote:

Yes, this was written a bit bone-headed (as I am allowed to say...)

If you look at the code, you will see inside:

 a <- anova(lm(freq ~ score, data = list(freq = x/n, score = 
as.vector(score)),
 weights = w))

and the lm() inside should give you the direction via the sign of the regression 
coefficient on "score".
  
So, at least for now, you could just doctor a copy of the code for your own purposes, as in


  fit <- lm(freq ~ score, data = list(freq = x/n, score = as.vector(score)),
 weights = w)
  a <- anova(fit)
  
and arrange to return coef(fit)["score"] at the end. Something like structure(... estimate=c(lpm.slope=coef(fit)["score"]) )


(I expect that you might also extract the t-statistic from coef(summary(fit)) 
and find that it is the signed square root of the Chi-square, but I won't have 
time to test that just now.)

-pd
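Following the suggestion above, here is a minimal sketch that refits the weighted regression from inside prop.trend.test() and reads off the sign of the slope. The weights reproduce w <- n/p/(1 - p) from the function's code; the names p, w, fit, slope and chisq are only illustrative.

```r
smokers  <- c(83, 90, 129, 70)
patients <- c(86, 93, 136, 82)
score    <- seq_along(smokers)

# Same weighted linear probability model as inside prop.trend.test()
p   <- sum(smokers) / sum(patients)
w   <- patients / p / (1 - p)
fit <- lm(freq ~ score, data = list(freq = smokers / patients, score = score),
          weights = w)

# The sign of the slope gives the direction of the trend
slope <- coef(fit)[["score"]]
slope

# The test statistic is the regression sum of squares for score
chisq <- anova(fit)["score", "Sum Sq"]
chisq
```

With these data the slope should come out negative, in line with the decreasing proportions seen in the plot of smokers/patients.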


On 8 Sep 2023, at 07:22, Thomas Subia via R-help wrote:

Colleagues,

Thanks all for the responses.

I am monitoring the daily total number of defects per sample unit.
I need to know whether this daily defect proportion is trending upward (a bad 
thing for a manufacturing process).

My first thought was to use either a u or a u' control chart for this.
As far as I know, u and u' charts are poor at detecting drifts.

This is why I chose to use prop.trend.test to detect trends in proportions.

While prop.trend.test can confirm the existence of a trend, as far as I know, 
it is left to the user
to determine the direction of that trend.

One way to illustrate the trend is, of course, to plot the data and use 
geom_smooth with method = lm.
For the non-statisticians in my group, I've found that using this method along 
with the p-value of prop.trend.test makes it easier for users to determine 
the existence of a trend and its direction.

If there are any other ways to do this, please let me know.

Thomas Subia

On Thursday, September 7, 2023 at 10:31:27 AM PDT, Rui Barradas wrote:

On 07/09/2023 at 14:23, Thomas Subia via R-help wrote:


Colleagues

Consider
smokers  <- c( 83, 90, 129, 70 )
patients <- c( 86, 93, 136, 82 )

prop.trend.test(smokers, patients)

Output:

Chi-squared Test for Trend in Proportions

data:  smokers out of patients ,

using scores: 1 2 3 4

X-squared = 8.2249, df = 1, p-value = 0.004132

# trend test for proportions indicates proportions are trending.

How does one identify the direction of trending?
# prop.test indicates that the proportions are unequal but does little to 
indicate trend direction.
All the best,
Thomas Subia




Hello,

By visual inspection it seems that there is a decreasing trend.
Note that the sample estimates of prop.test and smokers/patients are equal.


smokers  <- c( 83, 90, 129, 70 )
patients <- c( 86, 93, 136, 82 )

prop.test(smokers, patients)$estimate
#>    prop 1    prop 2    prop 3    prop 4
#> 0.9651163 0.9677419 0.9485294 0.8536585

smokers/patients

#> [1] 0.9651163 0.9677419 0.9485294 0.8536585

plot(smokers/patients, type = "b")



Hope this helps,

Rui Barradas




Hello,

Actually, the t-statistic is not the signed square root of the X-squared 
test statistic. I have edited the function, assigned the lm fit and 
returned it as is. (print.htest won't print this new list member so the 
output is not cluttered with irrelevant noise.)



smokers  <- c( 83, 90, 129, 70 )
patients <- c( 86, 93, 136, 82 )

edit(prop.trend.test, file = "ptt.R")
source("ptt.R")

# stats::prop.trend.test edited to include the results
# of the lm fit and saved under a new name
ptt <- function (x, n, score = seq_along(x))
{
  method <- "Chi-squared Test for Trend in Proportions"
  dname <- paste(deparse1(substitute(x)), "out of", 
deparse1(substitute(n)),

 ",\n using scores:", paste(score, collapse = " "))
  x <- as.vector(x)
  n <- as.vector(n)
  p <- sum(x)/sum(n)
  w <- n/p/(1 - p)
  a <- anova(fit <- lm(freq ~ score, data = list(freq = x/n, score = 
as.vector(score)),

   weights = w))
  chisq <- c(`X-squared` = a["score", "Sum Sq"])
  structure(list(statistic =

Re: [R] prop.trend.test

2023-09-07 Thread Rui Barradas

On 07/09/2023 at 14:23, Thomas Subia via R-help wrote:


Colleagues

  Consider
smokers  <- c( 83, 90, 129, 70 )
patients <- c( 86, 93, 136, 82 )

  prop.trend.test(smokers, patients)

  Output:

  Chi-squared Test for Trend in Proportions

  data:  smokers out of patients ,

using scores: 1 2 3 4

X-squared = 8.2249, df = 1, p-value = 0.004132

  # trend test for proportions indicates proportions are trending.

  How does one identify the direction of trending?
  # prop.test indicates that the proportions are unequal but does little to 
indicate trend direction.
All the best,
Thomas Subia




Hello,

By visual inspection it seems that there is a decreasing trend.
Note that the sample estimates of prop.test and smokers/patients are equal.


smokers  <- c( 83, 90, 129, 70 )
patients <- c( 86, 93, 136, 82 )

prop.test(smokers, patients)$estimate
#>    prop 1    prop 2    prop 3    prop 4
#> 0.9651163 0.9677419 0.9485294 0.8536585
smokers/patients
#> [1] 0.9651163 0.9677419 0.9485294 0.8536585

plot(smokers/patients, type = "b")



Hope this helps,

Rui Barradas



Re: [R] Regarding error in RStudio

2023-09-05 Thread Rui Barradas

On 05/09/2023 at 17:59, Sukriti Sood wrote:

Hi,

I am Sukriti Sood, a research analyst at Woodstock Institute 
<https://woodstockinst.org/> . I use RStudio extensively for our analysis. I 
have been facing two issues for a while:


   1.  I am unable to copy from RStudio and paste into any other program, 
or vice versa.
   2.  I am facing some kind of a conversion error (screenshot attached).

I tried looking online but could not find a resolution to these issues. 
Could I please get some help with this urgently?

Thanks!

Best,
Sukriti Sood

Sukriti Sood | Research Analyst
Woodstock Institute
Pronouns: She/Her/Hers
67 East Madison, Suite 2108 | Chicago, Illinois 60603
O (312) 368-0310 x2029 | C (610) 604-6708
www.woodstockinst.org<http://www.woodstockinst.org/> | 
ss...@woodstockinst.org<mailto:ss...@woodstockinst.org>





Hello,

You should post RStudio questions to the RStudio support service, they 
answer quickly and the answers are generally good.


According to the bottom of the attached image, the workspace was 
loaded from the file

C:/WSI/.RData

Close RStudio, remove this file and restart. See whether that solves it.

Hope this helps,

Rui Barradas



Re: [R] Merge and replace data

2023-09-05 Thread Rui Barradas

On 05/09/2023 at 09:55, roslinazairimah zakaria wrote:

Hi all,

I have these data

x1 <- c(116,0,115,137,127,0,0)
x2 <- c(0,159,0,0,0,159,127)

I want: xx <- c(116,115,137,127,159, 127)

I would like to merge these data into one column. Whenever a value is 0,
it should be replaced by the non-zero value in the other column.
I tried append and merge but failed to get what I want.


Hello,

That's a case for ?pmax:


x1 <- c(116,0,115,137,127,0,0)
x2 <- c(0,159,0,0,0,159,127)
pmax(x1, x2)
#> [1] 116 159 115 137 127 159 127
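The same result, with the replacement rule spelled out explicitly (a sketch, not from the original answer):

```r
x1 <- c(116, 0, 115, 137, 127, 0, 0)
x2 <- c(0, 159, 0, 0, 0, 159, 127)

# Take x1 unless it is 0, in which case take x2
xx <- ifelse(x1 == 0, x2, x1)
xx
#> [1] 116 159 115 137 127 159 127
```

pmax() works here because, at each position, one value is 0 and the other is positive; ifelse() states the rule directly and would still pick x1 if x1 held a negative value.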


Hope this helps,

Rui Barradas



Re: [R] aggregate formula - differing results

2023-09-04 Thread Rui Barradas

On 04/09/2023 at 12:51, Ivan Calandra wrote:

Thanks Rui for your help; that would be one possibility indeed.

But am I the only one who finds that behavior of aggregate() completely 
unexpected and confusing? Especially considering that dplyr::summarise() 
and doBy::summaryBy() deal with NAs differently, even though they all 
use mean(na.rm = TRUE) to calculate the group stats.


Best wishes,
Ivan

On 04/09/2023 13:46, Rui Barradas wrote:

On 04/09/2023 at 10:44, Ivan Calandra wrote:

Dear useRs,

I have just stumbled across a behavior in aggregate() that I cannot 
explain. Any help would be appreciated!


Sample data:
my_data <- structure(list(ID = c("FLINT-1", "FLINT-10", "FLINT-100", 
"FLINT-101", "FLINT-102", "HORN-10", "HORN-100", "HORN-102", 
"HORN-103", "HORN-104"), EdgeLength = c(130.75, 168.77, 142.79, 
130.1, 140.41, 121.37, 70.52, 122.3, 71.01, 104.5), SurfaceArea = 
c(1736.87, 1571.83, 1656.46, 1247.18, 1177.47, 1169.26, 444.61, 
1791.48, 461.15, 1127.2), Length = c(44.384, 29.831, 43.869, 48.011, 
54.109, 41.742, 23.854, 32.075, 21.337, 35.459), Width = c(45.982, 
67.303, 52.679, 26.42, 25.149, 33.427, 20.683, 62.783, 26.417, 
35.297), PLATWIDTH = c(38.84, NA, 15.33, 30.37, 11.44, 14.88, 13.86, 
NA, NA, 26.71), PLATTHICK = c(8.67, NA, 7.99, 11.69, 3.3, 16.52, 
4.58, NA, NA, 9.35), EPA = c(78, NA, 78, 54, 72, 49, 56, NA, NA, 56), 
THICKNESS = c(10.97, NA, 9.36, 6.4, 5.89, 11.05, 4.9, NA, NA, 10.08), 
WEIGHT = c(34.3, NA, 25.5, 18.6, 14.9, 29.5, 4.5, NA, NA, 23), RAWMAT 
= c("FLINT", "FLINT", "FLINT", "FLINT", "FLINT", "HORNFELS", 
"HORNFELS", "HORNFELS", "HORNFELS", "HORNFELS")), row.names = c(1L, 
2L, 3L, 4L, 5L, 111L, 112L, 113L, 114L, 115L), class = "data.frame")


1) Simple aggregation with 2 variables:
aggregate(cbind(Length, Width) ~ RAWMAT, data = my_data, FUN = mean, 
na.rm = TRUE)


2) Using the dot notation - different results:
aggregate(. ~ RAWMAT, data = my_data[-1], FUN = mean, na.rm = TRUE)

3) Using dplyr, I get the same results as #1:
group_by(my_data, RAWMAT) %>%
   summarise(across(c("Length", "Width"), ~ mean(.x, na.rm = TRUE)))

4) It gets weirder: using all columns in #1 give the same results as 
in #2 but different from #1 and #3
aggregate(cbind(EdgeLength, SurfaceArea, Length, Width, PLATWIDTH, 
PLATTHICK, EPA, THICKNESS, WEIGHT) ~ RAWMAT, data = my_data, FUN = 
mean, na.rm = TRUE)


So it seems it is not only due to the notation (cbind() vs. dot). Is 
it a bug? A peculiar thing in my dataset? I tend to think this could 
be due to some variables (or their names) as all notations seem to 
agree when I remove some variables (although I haven't found out 
which variable(s) is (are) at fault), e.g.:


my_data2 <- structure(list(ID = c("FLINT-1", "FLINT-10", "FLINT-100", 
"FLINT-101", "FLINT-102", "HORN-10", "HORN-100", "HORN-102", 
"HORN-103", "HORN-104"), EdgeLength = c(130.75, 168.77, 142.79, 
130.1, 140.41, 121.37, 70.52, 122.3, 71.01, 104.5), SurfaceArea = 
c(1736.87, 1571.83, 1656.46, 1247.18, 1177.47, 1169.26, 444.61, 
1791.48, 461.15, 1127.2), Length = c(44.384, 29.831, 43.869, 48.011, 
54.109, 41.742, 23.854, 32.075, 21.337, 35.459), Width = c(45.982, 
67.303, 52.679, 26.42, 25.149, 33.427, 20.683, 62.783, 26.417, 
35.297), RAWMAT = c("FLINT", "FLINT", "FLINT", "FLINT", "FLINT", 
"HORNFELS", "HORNFELS", "HORNFELS", "HORNFELS", "HORNFELS")), 
row.names = c(1L, 2L, 3L, 4L, 5L, 111L, 112L, 113L, 114L, 115L), 
class = "data.frame")


aggregate(cbind(EdgeLength, SurfaceArea, Length, Width) ~ RAWMAT, 
data = my_data2, FUN = mean, na.rm = TRUE)


aggregate(. ~ RAWMAT, data = my_data2[-1], FUN = mean, na.rm = TRUE)

group_by(my_data2, RAWMAT) %>%
   summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))


Thank you in advance for any hint.
Best wishes,
Ivan




 *LEIBNIZ-ZENTRUM*
*FÜR ARCHÄOLOGIE*

*Dr. Ivan CALANDRA*
Head of IMPALA (IMaging Platform At LeizA)

*MONREPOS* Archaeological Research Centre, Schloss Monrepos
56567 Neuwied, Germany

T: +49 2631 9772 243
T: +49 6131 8885 543
ivan.calan...@leiza.de

leiza.de <http://www.leiza.de/>
<http://www.leiza.de/>
ORCID <https://orcid.org/-0003-3816-6359>
ResearchGate
<https://www.researchgate.net/profile/Ivan_Calandra>

LEIZA is a foundation under public law of the State of 
Rhineland-Palatinate and the City of Mainz. Its headquarters are in 
Mainz. Supervision is carried out by the Ministry of Science and 
Health of the State of Rhineland-Palatinate. LEIZA is a research 
museum of the Leibniz Association.


Re: [R] aggregate formula - differing results

2023-09-04 Thread Rui Barradas
s in at least one column and the results are the same.


However, this will not give the mean values of the other numeric 
columns, just of those two.




# define a vector of columns of interest
cols <- c("Length", "Width", "RAWMAT")

# 1) Simple aggregation with 2 variables, select cols:
aggregate(cbind(Length, Width) ~ RAWMAT, data = my_data[cols], FUN = 
mean, na.rm = TRUE)


# 2) Using the dot notation - if cols are selected, equal results:
aggregate(. ~ RAWMAT, data = my_data[cols], FUN = mean, na.rm = TRUE)

# 3) Using dplyr, the results are now the same results as #1 and #2:
my_data %>%
  select(all_of(cols)) %>%
  group_by(RAWMAT) %>%
  summarise(across(c("Length", "Width"), ~ mean(.x, na.rm = TRUE)))
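A likely explanation, shown on a small made-up data frame: the formula method of aggregate() uses na.action = na.omit by default. A row with an NA in any variable of the formula (with the dot, that is every column) is therefore removed before FUN is called, and na.rm = TRUE never sees it. Passing na.action = na.pass keeps those rows and leaves the NA handling to mean().

```r
# Made-up data: one NA in y, none in x
df <- data.frame(
  g = c("a", "a", "b", "b"),
  x = c(1, 2, 3, 4),
  y = c(10, NA, 30, 40)
)

# Default na.action = na.omit: row 2 is dropped for *all* columns,
# so the mean of x in group "a" is 1 instead of 1.5
r_omit <- aggregate(. ~ g, data = df, FUN = mean, na.rm = TRUE)
r_omit

# na.pass keeps every row; mean(na.rm = TRUE) then skips NAs column by column
r_pass <- aggregate(. ~ g, data = df, FUN = mean, na.rm = TRUE,
                    na.action = na.pass)
r_pass
```

Applied to the original data, aggregate(. ~ RAWMAT, data = my_data[-1], FUN = mean, na.rm = TRUE, na.action = na.pass) should then agree with the dplyr results.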


Hope this helps,

Rui Barradas



Re: [R] Error in analysis of Rasch using eRm package.

2023-08-22 Thread Rui Barradas

On 21/08/2023 at 16:49, nor azila wrote:

Dear R users,

I am using eRm package in analysing my polytomous data as below

Respondents = 277 people
Item = 30 questions

The data consists of 0,1,2,3 responses/answers.

I'm having a problem writing the code below because I do not know what
to put in each of the arguments.

data.frame(..., row.names = NULL, check.rows = FALSE,
check.names = TRUE, fix.empty.names = TRUE,
stringsAsFactors = FALSE)



Thank you very much for any help given.

Azi.



Hello,

It seems that your data, in tabular form, has one column per answer 
(item), so you would end up with 30 columns, maybe plus an id column.


Can you post sample data? If not, make up the answers and post those 
of the first 6 individuals or so.
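
To make that layout concrete, here is a sketch with randomly generated, hypothetical answers (not the poster's data): 277 rows, one column per item scored 0 to 3, plus an id column. dput(head(...)) produces text that can be pasted straight into a reply. The eRm call at the end is only indicative and is left commented out.

```r
set.seed(1)      # hypothetical random answers, NOT real data
n_resp  <- 277   # respondents
n_items <- 30    # items, each scored 0, 1, 2 or 3

resp <- matrix(sample(0:3, n_resp * n_items, replace = TRUE),
               nrow = n_resp, ncol = n_items,
               dimnames = list(NULL, paste0("Q", seq_len(n_items))))

# One row per respondent, one column per item, plus an id column
resp_df <- data.frame(id = seq_len(n_resp), resp)

# What to paste into a reply: the first 6 respondents
dput(head(resp_df))

# The item columns alone are what a polytomous Rasch model needs, e.g.
# eRm::PCM(resp_df[, -1])   # partial credit model (not run here)
```
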


Hope this helps,

Rui Barradas



Re: [R] Questions about R

2023-08-17 Thread Rui Barradas

Às 12:10 de 17/08/2023, Shaun Parr escreveu:





Hi there,

My name is Shaun and I work in an organisation where one of our users wishes to 
install the R software, and our process is to assess the safety of any 
software prior to authorisation. I can’t seem to locate all the information 
that we require on the webpage, so could someone kindly advise me of the 
following information please?

1. Please can you confirm what user information the software collects (E.g. 
Name, password, e-mail address, any Personally Identifiable Information etc)?
2. If any is collected, please can you confirm if the information collected by 
the software stays locally on the device or if it is transferred anywhere. If 
it is transferred, could you please advise where it is transferred to (E.g. 
your own servers, or a third party data centre such as Amazon Web Services or 
Azure)?
3. Are there any third-party components installed within the software and, if 
so, are these also kept up-to-date?

If you could kindly advise this information, it would be really appreciated, 
thank you 


Shaun



Hello,

1. R itself? None. Download it from CRAN and install it. There are OS-related 
installation issues, namely authorization, but that information is neither 
asked for nor recorded by R.

2. The answer to "If any is collected" is already given above.
3. I am not sure I understand this point. R comes with third-party 
components and their developers try to keep them up-to-date. This has 
nothing to do with PII.
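
On point 3, one way to see which packages ship with R itself is to ask installed.packages() for the "base" and "recommended" priorities (priority = "high" selects both). This is a sketch; the exact list, versions and licenses depend on the installation.

```r
# Packages installed as part of R: "base" plus the "recommended" set
ip <- installed.packages(priority = "high")

# Keep one row per package with its version, priority and license
bundled <- unique(ip[, c("Package", "Version", "Priority", "License")])
bundled <- bundled[order(bundled[, "Priority"], bundled[, "Package"]), ]

head(bundled)
table(bundled[, "Priority"])   # how many base vs recommended packages
```
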

CRAN is the main official repository for contributed packages.  From [1]:

Available Packages

Currently, the CRAN package repository features 19955 available packages.


and the R instruction

available.packages() |> nrow()
# [1] 19931

says a number close to that one.


Those packages are developed and contributed by volunteers, and it's 
impossible for the CRAN maintainers to check exactly what those 
packages do, but the packages' source code must be submitted and anyone 
willing to check it can.


[1] https://cran.r-project.org/web/packages/

Hope this helps,

Rui Barradas



Re: [R] geom_smooth

2023-08-11 Thread Rui Barradas

Às 05:17 de 12/08/2023, Thomas Subia via R-help escreveu:

Colleagues,

Here is my reproducible code for a graph using geom_smooth
library(tibble)
library(ggplot2)
library(cowplot)

set.seed(55)
scatter_data <- tibble(x_var = runif(100, min = 0, max = 25),
                       y_var = log2(x_var) + rnorm(100))

ggplot(scatter_data,aes(x=x_var,y=y_var))+
   geom_point()+
   geom_smooth(se=TRUE,fill="blue",color="black",linetype="dashed")+
   theme_cowplot()

I'd like to add a black boundary around the shaded area. I suspect this can be 
done with geom_ribbon but I cannot figure this out. Some advice would be 
welcome.

Thanks!

Thomas Subia


Hello,

Here is a solution. You must access the variables computed by 
geom_smooth, which you can do with ggplot_build() (see ?ggplot_build).

Then pass them in the data argument of new layers.



p <- ggplot(scatter_data,aes(x=x_var,y=y_var)) +
  geom_point()+
  geom_smooth(se=TRUE,fill="blue",color="black",linetype="dashed")+
  theme_cowplot()

# layer 2 is geom_smooth; this is a data.frame,
# relevant columns are x, ymin and ymax
fit <- ggplot_build(p)$data[[2]]

p +
  geom_line(data = fit, aes(x, ymin), linetype = "dashed", linewidth = 1) +
  geom_line(data = fit, aes(x, ymax), linetype = "dashed", linewidth = 1)
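
To see where ymin and ymax come from, here is a base-R sketch that reproduces the band geom_smooth() draws by default for fewer than 1000 points: a loess fit (span 0.75) evaluated at 80 x values, with a pointwise 95% t-based interval. polygon(border = "black") then outlines the band. The data are regenerated with the question's seed; the match to ggplot2's layer data should be close, but that is an assumption, not something checked here.

```r
set.seed(55)
x_var <- runif(100, min = 0, max = 25)
y_var <- log2(x_var) + rnorm(100)

fit  <- loess(y_var ~ x_var)    # default span = 0.75, degree = 2
xseq <- seq(min(x_var), max(x_var), length.out = 80)
pred <- predict(fit, newdata = data.frame(x_var = xseq), se = TRUE)

# half-width of the pointwise 95% interval
half <- qt(0.975, pred$df) * pred$se.fit
band <- data.frame(x = xseq, y = pred$fit,
                   ymin = pred$fit - half, ymax = pred$fit + half)

plot(x_var, y_var, pch = 16)
polygon(c(band$x, rev(band$x)), c(band$ymin, rev(band$ymax)),
        col = adjustcolor("blue", alpha.f = 0.3), border = "black")
lines(band$x, band$y, lty = "dashed")
```
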


Hope this helps,

Rui Barradas



Re: [R] Use generic functions, e.g. print, without UseMethod?

2023-08-11 Thread Rui Barradas

Às 08:20 de 11/08/2023, Sigbert Klinke escreveu:

Hello,

I have defined a function 'equations(...)' which returns an object with 
class 'equations'. I also defined a function 'print.equations' which 
prints the object. But I did not use 'equations <- function(x, ...) 
UseMethod("equations")'. Two questions:


1.) Is this a sensible approach?
2.) If yes, are there any pitfalls I could run in later?

Thanks

Sigbert


Hello,

You have to ask yourself what kind of objects you are passing to 
'equations(...)'.

Do you need to have

'equations.double(...)'
'equations.character(...)'
'equations.formula(...)'
'equations.matrix(...)'
[...]

specifically written for objects of class

numeric
character
formula
matrix
[...]

respectively?
These methods would act on the respective class, process those objects 
somewhat differently because they are of different classes, and output an 
object of class "equations".

(If so, it is recommended to write an 'equations.default(...)' too.)
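
For comparison, here is a minimal sketch of the generic route described above. The method bodies are hypothetical: they only wrap formulas so that dispatch on the input's class is visible, and the default method stops with an informative error.

```r
# Generic constructor: dispatches on the class of x
equations <- function(x, ...) UseMethod("equations")

# Method for a single formula
equations.formula <- function(x, ...) {
  structure(list(eqs = list(x)), class = "equations")
}

# Method for character input such as "y ~ x": parse each string
equations.character <- function(x, ...) {
  structure(list(eqs = lapply(x, as.formula)), class = "equations")
}

# Fallback for every other class
equations.default <- function(x, ...) {
  stop("don't know how to make equations from an object of class ",
       class(x)[1L])
}

e1 <- equations(y ~ x)
e2 <- equations(c("y ~ x", "z ~ x^2"))
```
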

Methods such as print.equations or summary.equations are written when you 
want objects of your new class to work with functions its users are 
already familiar with.


If, for instance, autoprint is on, as it frequently is, users can see 
their "equations" object by typing its name at a prompt. print.equations 
would display it in a way relevant to that new class.


But this does not mean that the function that *creates* the object needs 
to be generic; you only need a new generic if you want methods that process 
inputs of different classes in ways specific to those classes.
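
And here is a sketch of what the question describes: a plain, non-generic constructor plus a print method. Only print() needs to be generic, and it already is. Storing the formulas in a list element named eqs is a hypothetical internal choice.

```r
# Plain constructor: NOT a generic
equations <- function(...) {
  structure(list(eqs = list(...)), class = "equations")
}

# print method: dispatched by the existing print() generic
print.equations <- function(x, ...) {
  for (i in seq_along(x$eqs)) {
    cat(i, ": ", deparse(x$eqs[[i]]), "\n", sep = "")
  }
  invisible(x)   # print methods conventionally return x invisibly
}

e <- equations(y ~ a + b * x, z ~ x^2)
e   # autoprint calls print.equations
```
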


Hope this helps,

Rui Barradas



Re: [R] unused argument(s) (Header = 1) help!

2023-08-09 Thread Rui Barradas

Às 05:30 de 09/08/2023, Andreas Noviyanto escreveu:

Dear Daniel,

I used this script to calculate replicateBE with R software, and it
worked. When I use the same script with similar data (xlsx), I get the error
messages below. Do you have any suggestions? Thanks anyway.
my script:
library(replicateBE)
path.in  <- "Z:/Personil Omega"
path.out <-  path.in
method.A(path.in=path.in, path.out=path.out, file="lans",
   set="01", ext="xlsx", header=1, ola=TRUE)
method.A(path.in=path.in, path.out=path.out, file="lans",
  set="02", ext="xlsx", header=1)
ABE(path.in=path.in, path.out=path.out, file="lans",
  set="01", ext="xlsx", header=1)

ABE(path.in=path.in, path.out=path.out, file="lans",
  set="02", ext="xlsx", header=1)

result:
  > library(replicateBE)
  > path.in  <- "Z:/Personil Omega"
  > path.out <- path.in
  > method.A(path.in=path.in, path.out=path.out, file="lans",
+  set="01", ext="xlsx", header=1, ola=TRUE)
Error in method.A(path.in = path.in, path.out = path.out, file = "lans",
   :
unused argument (header = 1)
  > method.A(path.in=path.in, path.out=path.out, file="lans",
+ set="02", ext="xlsx", header=1)
Error in method.A(path.in = path.in, path.out = path.out, file = "lans",
   :
unused argument (header = 1)
  > ABE(path.in=path.in, path.out=path.out, file="lans",
+ set="01", ext="xlsx", header=1)
Error in ABE(path.in = path.in, path.out = path.out, file = "lans",  :
unused argument (header = 1)
  > ABE(path.in=path.in, path.out=path.out, file="lans",
+ set="02", ext="xlsx", header=1)
Error in ABE(path.in = path.in, path.out = path.out, file = "lans",  :
unused argument (header = 1)



Warm Regards,

Andreas


Hello,

That error message means that the functions method.A and ABE have no 
argument named 'header'.

Simply remove it from the calls and you should be fine.
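
To check which arguments a function accepts before calling it, you can use formals() (or args()). The sketch below uses a hypothetical stand-in, read_bioeq(), that, like method.A, has no 'header' argument. With replicateBE loaded, names(formals(method.A)) does the same check for the real function.

```r
# Hypothetical stand-in for a function that has no 'header' argument
read_bioeq <- function(path.in, path.out, file, set, ext) invisible(NULL)

names(formals(read_bioeq))               # the arguments it accepts
"header" %in% names(formals(read_bioeq)) # FALSE

# An argument the function does not have fails before its body runs
msg <- tryCatch(read_bioeq(path.in = ".", header = 1),
                error = conditionMessage)
msg
```
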

Hope this helps,

Rui Barradas



Re: [R] Stacking matrix columns

2023-08-06 Thread Rui Barradas

Às 01:15 de 06/08/2023, Iris Simmons escreveu:

You could also do

dim(x) <- c(length(x), 1)

On Sat, Aug 5, 2023, 20:12 Steven Yen  wrote:


I wish to stack columns of a matrix into one column. The following
matrix command does it. Any other ways? Thanks.

  > x<-matrix(1:20,5,4)
  > x
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20

  > matrix(x,ncol=1)
      [,1]
 [1,]    1
 [2,]    2
 [3,]    3
 [4,]    4
 [5,]    5
 [6,]    6
 [7,]    7
 [8,]    8
 [9,]    9
[10,]   10
[11,]   11
[12,]   12
[13,]   13
[14,]   14
[15,]   15
[16,]   16
[17,]   17
[18,]   18
[19,]   19
[20,]   20
  >


Hello,

Yet another solution.


t(t(c(x)))

or

x |> c() |> t() |> t()


At first I liked it, but it's the slowest of the three (the OP's, Iris' 
and this one); Iris' is the fastest.
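
Here is a quick base-R check that the three approaches give the same matrix. The timing comparison above is not reproduced here.

```r
x <- matrix(1:20, 5, 4)

a <- matrix(x, ncol = 1)     # OP: build a new one-column matrix

b <- x
dim(b) <- c(length(b), 1)    # Iris: reset the dim attribute in place

d <- t(t(c(x)))              # c() drops dim, t() twice gives a column

identical(a, b)
identical(a, d)
```
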


Hope this helps,

Rui Barradas



Re: [R] Multiply

2023-08-04 Thread Rui Barradas

Às 19:03 de 04/08/2023, Val escreveu:

Thank you,  Avi and Ivan.  Worked for this particular Example.

Yes, I am looking for something with a more general purpose.
I think Ivan's suggestion works for this.

multiplication=as.matrix(dat1[,-1]) %*% as.matrix(dat2[match(dat1[,1],
dat2[,1]),-1])
Res=data.frame(ID = dat1[,1], Index = multiplication)

On Fri, Aug 4, 2023 at 10:59 AM  wrote:


Val,

A data.frame is not quite the same thing as a matrix.

But as long as everything is numeric, you can convert both data.frames to
matrices, perform the computations needed and, if you want, convert it back
into a data.frame.

BUT it must be all numeric and you violate that requirement by having a
character column for ID. You need to eliminate that temporarily:

dat1 <- read.table(text="ID, x, y, z
  A, 10,  34, 12
  B, 25,  42, 18
  C, 14,  20,  8 ",sep=",",header=TRUE,stringsAsFactors=F)

mat1 <- as.matrix(dat1[,2:4])

The result is:


mat1

   x  y  z
[1,] 10 34 12
[2,] 25 42 18
[3,] 14 20  8

Now do the second matrix, perhaps in one step:

mat2 <- as.matrix(read.table(text="ID, weight, weiht2
  A,  0.25, 0.35
  B,  0.42, 0.52
  C,  0.65, 0.75",sep=",",header=TRUE,stringsAsFactors=F)[,2:3])


Do note that some people use read.csv() instead of read.table(), although it
simply calls read.table() after setting some parameters, like sep = ",".

The result is what you asked for, including spelling weight wrong once:


mat2

  weight weiht2
[1,]   0.25   0.35
[2,]   0.42   0.52
[3,]   0.65   0.75

Now you wanted to multiply as in matrix multiplication.


mat1 %*% mat2

  weight weiht2
[1,]  24.58  30.18
[2,]  35.59  44.09
[3,]  17.10  21.30

Of course, you wanted different names for the columns and you can do that
easily enough:

result <- mat1 %*% mat2

colnames(result) <- c("index1", "index2")


But this is missing something:


result

  index1 index2
[1,]  24.58  30.18
[2,]  35.59  44.09
[3,]  17.10  21.30

Do you want a column of ID numbers on the left? If numeric, you can keep it
in a matrix in one of many ways but if you want to go back to the data.frame
format and re-use the ID numbers, there are again MANY ways. But note mixing
characters and numbers can inadvertently convert everything to characters.

Here is one solution. Not the only one nor the best one but reasonable:

recombined <- data.frame(index=dat1$ID,
  index1=result[,1],
  index2=result[,2])



recombined

   index index1 index2
1 A  24.58  30.18
2 B  35.59  44.09
3 C  17.10  21.30

If for some reason you need a more general purpose way to do this for
arbitrary conformant matrices, you can write a function that does this in a
more general way but perhaps a better idea might be a way to store your
matrices in files in a way that can be read back in directly or to not
include indices as character columns but as row names.






-Original Message-
From: R-help  On Behalf Of Val
Sent: Friday, August 4, 2023 10:54 AM
To: r-help@R-project.org (r-help@r-project.org) 
Subject: [R] Multiply

Hi all,

I want to multiply two  data frames as shown below,

dat1 <-read.table(text="ID, x, y, z
  A, 10,  34, 12
  B, 25,  42, 18
  C, 14,  20,  8 ",sep=",",header=TRUE,stringsAsFactors=F)

dat2 <-read.table(text="ID, weight, weiht2
  A,  0.25, 0.35
  B,  0.42, 0.52
  C,  0.65, 0.75",sep=",",header=TRUE,stringsAsFactors=F)

Desired result

ID  Index1 Index2
1  A 24.58 30.18
2  B 35.59 44.09
3  C 17.10 21.30

Here is my attempt, but it did not work:

dat3 <- data.frame(ID = dat1[,1], Index = apply(dat1[,-1], 1, FUN=
function(x) {sum(x*dat2[,2:ncol(dat2)])} ), stringsAsFactors=F)


Any help?

Thank you,


Hello,

Slightly simpler:



multiplication <- as.matrix(dat1[,-1]) %*% 
as.matrix(dat2[match(dat1[,1], dat2[,1]),-1])

Res <- data.frame(ID = dat1[,1], Index = multiplication)

# this is what I find simpler
# the method being called is cbind.data.frame
Res2 <- cbind(dat1[1], Index = multiplication)

identical(Res, Res2)
#> [1] TRUE
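
Following the suggestion in the quoted reply to keep the IDs as row names rather than as a character column, here is a sketch in which read.csv(row.names = "ID") stores them that way. The data frames are then all-numeric, and the product keeps the IDs. The data are the question's, retyped without the extra spaces.

```r
# Same data as in the question, with ID stored as row names
dat1 <- read.csv(text = "ID,x,y,z
A,10,34,12
B,25,42,18
C,14,20,8", row.names = "ID")

dat2 <- read.csv(text = "ID,weight,weiht2
A,0.25,0.35
B,0.42,0.52
C,0.65,0.75", row.names = "ID")

# All remaining columns are numeric, so nothing has to be dropped first
res <- as.matrix(dat1) %*% as.matrix(dat2)
colnames(res) <- c("Index1", "Index2")
res
```
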


Hope this helps,

Rui Barradas



Re: [R] Problems with facets in ggplot2

2023-08-04 Thread Rui Barradas

Às 11:08 de 04/08/2023, Nick Wray escreveu:

Hello, I am wrestling with ggplot. I have produced a facetted plot of
flows under various metrics, but I can’t find info on the net that tells me
how to do three things:


I have created some simplified mock data to illustrate (and using a
colour-blind palette):


library(ggplot2)

library(forcats)

cb8<- c("#000000", "#E69F00", "#56B4E9", "#009E73","#F0E442", "#0072B2",
"#D55E00", "#CC79A7")

set.seed(040823)


mock<-as.data.frame(cbind(rep((1990:1995),8),round(rnorm(48,50,10),3),rep(c(rep("Tweed",6),rep("Tay",6)),4),rep(c("AMAX","Mean","AMIN","Median"),each=12)))

   colnames(mock)<-c("Year","Flow","Stat","Metric")

   mock



ggplot(mock, aes(Year,Flow, group = factor(Stat), colour = factor(Stat)))+
  coord_cartesian(ylim = c(0, 100)) +
  geom_line(size=1)+
  scale_color_manual(name = "Stat", values = cb8[4:7])+
  scale_y_discrete(breaks=c(0,25,50,75,100),labels=c(0,25,50,75,100))+
  facet_wrap(vars(Metric),nrow=2,ncol=2)+
  ylab("Flow")



1)This gives me a facetted plot but I can’t work out why I’m not getting a
labelled y scale



2)Why are plots down at the bottom of the facets rather than in the middle?



3)And also I’d like the plots to be in the order (top left to bottom right)
of

AMAX MEAN AMIN MEDIAN

but if I add in the line
facet_grid(~fct_relevel(Metric,"AMAX","Mean","AMIN","Median")) before the
line ylab it disrupts the 2x2 layout



Can anyone tell me how to resolve these problems?  Thanks Nick Wray


Hello,

The main problem is the way you create the data set. cbind defaults to 
creating a matrix, and since some of the vectors are of class "character", 
all others will be coerced to character too. So Year and Flow will no 
longer be numeric.


You can coerce those two columns to numeric manually or you can use 
data.frame(), not as.data.frame(), to create the data set.
And you were, therefore, using scale_y_discrete when it should be 
scale_y_continuous. Corrected below.


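As for the reordering, I'm not getting any errors: calling fct_relevel() 
inside facet_wrap(), rather than adding facet_grid(), keeps the 2x2 layout.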



library(ggplot2)
library(forcats)

cb8<- c("#000000", "#E69F00", "#56B4E9", "#009E73",
"#F0E442", "#0072B2", "#D55E00", "#CC79A7")

set.seed(040823)

# the right way of creating the data set
mock <- data.frame(
  Year = rep((1990:1995),8),
  Flow = round(rnorm(48,50,10),3),
  Stat = rep(c(rep("Tweed",6), rep("Tay",6)),4),
  Metric = rep(c("AMAX","Mean","AMIN","Median"),each=12)
)

ggplot(mock, aes(Year,Flow, group = factor(Stat), colour = factor(Stat))) +
  coord_cartesian(ylim = c(0, 100)) +
  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  geom_line(linewidth = 1) +
  scale_color_manual(name = "Stat", values = cb8[4:5]) +
  scale_y_continuous(breaks = c(0, 25, 50, 75, 100), labels = c(0, 25, 50, 75, 100)) +
  facet_wrap(~ fct_relevel(Metric, "AMAX", "Mean", "AMIN", "Median"), nrow = 2, ncol = 2) +
  ylab("Flow")
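
Here is why the cbind() route breaks, in a minimal base-R sketch with a couple of made-up rows. cbind() of numbers and strings gives a character matrix, so as.data.frame() then yields non-numeric columns. Building the data with data.frame(), or converting afterwards with type.convert(), gives numeric Year and Flow again.

```r
# cbind() of numbers and strings yields a character matrix
m <- cbind(1990:1991, c(50.1, 49.7), c("Tweed", "Tay"))
is.character(m)                 # every column became text

df_bad <- as.data.frame(m)      # columns V1, V2, V3, none numeric
sapply(df_bad, is.numeric)

# Either build with data.frame() so each column keeps its own type ...
df_good <- data.frame(Year = 1990:1991, Flow = c(50.1, 49.7),
                      Stat = c("Tweed", "Tay"))
sapply(df_good, is.numeric)

# ... or convert the text columns back after the fact
df_fixed <- type.convert(df_bad, as.is = TRUE)
sapply(df_fixed, is.numeric)
```
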


Hope this helps,

Rui Barradas



Re: [R] Choosing colours for lines in ggplot2

2023-08-02 Thread Rui Barradas

Às 18:10 de 02/08/2023, Nick Wray escreveu:

Hello - I am trying to plot flows in a number of rivers within the same
plot, and need to colour the lines differently, using a colour-blind
palette.


The code below works, but the colours are assigned by the program. I have
made some simple dummy data:
## code 1
library(ggplot2)
cb8<- c("#000000", "#E69F00", "#56B4E9", "#009E73","#F0E442", "#0072B2",
"#D55E00", "#CC79A7")  ## this is the colour-blind palette
set.seed(020823)
df<-as.data.frame(cbind(rep(1980:1991,2),c(10*runif(12),10*runif(12)),c(rep(1,12),rep(2,12))))
colnames(df)<-c("Year","Flow","Stat")
df
ggplot(df,aes(Year,Flow,group=Stat,colour=Stat))+
  coord_cartesian(ylim = c(0, 10)) +
  geom_line()+
  geom_point()
## this works

## BUT:
## code 2
col.2<-cb8[4:5]
ggplot(df,aes(Year,Flow,group=Stat,colour=Stat))+
  coord_cartesian(ylim = c(0, 10)) +
  geom_line()+
  geom_point()+
  scale_color_manual(values =cb8[4:5])+
  theme_bw()
## this gives the error message: Error: Continuous value supplied to discrete scale

## However, this example using code from the net does work, so I don't
## understand why my second code doesn't work.
## code 3
  df.1 <- data.frame(store=c('A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'),
   week=c(1, 2, 3, 1, 2, 3, 1, 2, 3),
   sales=c(9, 12, 15, 7, 9, 14, 10, 16, 19))
  ggplot(df.1, aes(x=week, y=sales, group=store, color=store)) +
geom_line(size=2) +
#scale_color_manual(values=c('orange', 'pink', 'red'))
scale_color_manual(values=cb8[4:6])

Can anyone help? Thanks Nick Wray


Hello,

Your Stat column is numeric, therefore, ggplot sees it as continuous.
To make it work, coerce to factor. Here are two ways.


## 1st way, Stat coerced to factor in the ggplot code;
## this means you will have to set the legend name
## manually in scale_color_manual
## code 2, now it works
col.2<-cb8[4:5]
ggplot(df, aes(Year,Flow, group = factor(Stat), colour = factor(Stat)))+
  coord_cartesian(ylim = c(0, 10)) +
  geom_line()+
  geom_point()+
  scale_color_manual(name = "Stat", values = cb8[4:5])+
  theme_bw()


## 2nd way, since you are using ggplot2, a tidyverse package,
## coerce to factor in a pipe before the ggplot call;
## this is done with dplyr::mutate and R's native pipe operator
## (could also be magrittr's pipe).
## I have left name = "Stat" like above, though it's no
## longer needed
df |>
  dplyr::mutate(Stat = factor(Stat)) |>
  ggplot(aes(Year, Flow, group = Stat, colour = Stat))+
  coord_cartesian(ylim = c(0, 10)) +
  geom_line()+
  geom_point()+
  scale_color_manual(name = "Stat", values = cb8[4:5])+
  theme_bw()
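
A small addition: scale_color_manual() also accepts a named vector of values, in which case colours are matched to levels by name rather than by position. This sketch only builds such a vector in base R, with the Okabe-Ito colour-blind palette used in the question; stat is a hypothetical stand-in for factor(df$Stat).

```r
# Okabe-Ito colour-blind palette
cb8 <- c("#000000", "#E69F00", "#56B4E9", "#009E73",
         "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

stat <- factor(c(1, 1, 2, 2))     # a discrete grouping variable

# Name the colours by level so each level keeps its colour
# even if levels are dropped or reordered
pal <- setNames(cb8[4:5], levels(stat))
pal
# then, e.g.: scale_color_manual(name = "Stat", values = pal)
```
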



Hope this helps,

Rui Barradas



Re: [R] Plotting Fitted vs Observed Values in Logistic Regression Model

2023-08-02 Thread Rui Barradas

Às 14:57 de 01/08/2023, Paul Bernal escreveu:

Dear friends,

I hope  this email finds you all well. This is the dataset I am working
with:
dput(random_mod12_data2)
structure(list(Index = c(1L, 5L, 11L, 3L, 2L, 8L, 9L, 4L), x = c(5,
13, 25, 9, 7, 19, 21, 11), n = c(500, 500, 500, 500, 500, 500,
500, 500), r = c(100, 211, 391, 147, 122, 310, 343, 176), ratio = c(0.2,
0.422, 0.782, 0.294, 0.244, 0.62, 0.686, 0.352)), row.names = c(NA,
-8L), class = "data.frame")

A brief description of the dataset:
Index: is just a column that shows the ID of each observation (row)
x: is a column which gives information on the discount rate of the coupon
n: is the sample or number of observations
r: is the count of redeemed coupons
ratio: is just the ratio of redeemed coupons to n (total number of
observations)

#Fitting a logistic regression model to response variable y for problem 13.4
logistic_regmod2 <- glm(formula = ratio~x, family = binomial(logit), data =
random_mod12_data2)

I would like to plot the value of r (in the y-axis) vs x (the different
discount rates) and then superimpose the logistic regression fitted values
all in the same plot.

How could I accomplish this?

Any help and/or guidance will be greatly appreciated.

Kind regards,
Paul


Hello,

Here is another way with ggplot2.
It doesn't give you the fitted values but it plots the fitted line.


library(ggplot2)

ggplot(random_mod12_data2, aes(x, ratio)) +
  geom_point() +
  stat_smooth(
    formula = y ~ x,
    method = glm,
    method.args = list(family = binomial),
    se = FALSE
  )
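
Here is a base-R sketch of the plot that was asked for: r on the y-axis, with the fitted values superimposed. It fits the model on counts with cbind(r, n - r), the usual binomial form. The question's ratio ~ x without weights gives the same coefficients here because every n is 500, but R warns about non-integer successes and the standard errors are not correct. The data frame is the question's dput() under the shorter name coupons.

```r
coupons <- data.frame(
  x = c(5, 13, 25, 9, 7, 19, 21, 11),
  n = c(500, 500, 500, 500, 500, 500, 500, 500),
  r = c(100, 211, 391, 147, 122, 310, 343, 176)
)

# Binomial GLM on counts: successes r, failures n - r
fit <- glm(cbind(r, n - r) ~ x, family = binomial(link = "logit"),
           data = coupons)

# Fitted counts = fitted redemption probability times n
coupons$r_hat <- fitted(fit) * coupons$n

# Smooth fitted curve on the count scale (n is 500 for every row here)
xs    <- seq(min(coupons$x), max(coupons$x), length.out = 100)
p_hat <- predict(fit, newdata = data.frame(x = xs), type = "response")

plot(r ~ x, data = coupons, pch = 16,
     xlab = "discount rate (x)", ylab = "redeemed coupons (r)")
lines(xs, 500 * p_hat)
points(r_hat ~ x, data = coupons, pch = 4, col = "red")
```
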


Hope this helps,

Rui Barradas



Re: [R] Downloading a directory of text files into R

2023-07-25 Thread Rui Barradas

Às 23:06 de 25/07/2023, Bob Green escreveu:

Hello,

I am seeking advice as to how I can download the 833 files from this 
site: http://home.brisnet.org.au/~bgreen/Data/


I want to be able to download them to perform a textual analysis.

If the 833 files, which are in a directory with two subfolders, were on 
my computer, I could read them with readtext. Using readtext, I get the 
error:

 > x = readtext("http://home.brisnet.org.au/~bgreen/Data/*")
Error in download_remote(file, ignore_missing, cache, verbosity) :
   Remote URL does not end in known extension. Please download the file 
manually.


 > x = readtext("http://home.brisnet.org.au/~bgreen/Data/Dir/()")
Error in download_remote(file, ignore_missing, cache, verbosity) :
   Remote URL does not end in known extension. Please download the file 
manually.


Any suggestions are appreciated.

Bob


Hello,

The following code downloads all files in the posted link.



suppressPackageStartupMessages({
  library(rvest)
})

# destination directory, change this at will
dest_dir <- "~/Temp"

# first get the two subfolders from the Data webpage
link <- "http://home.brisnet.org.au/~bgreen/Data/"
page <- read_html(link)
page %>%
  html_elements("a") %>%
  html_text() %>%
  grep("/$", ., value = TRUE) -> sub_folder

# create relevant disk sub-directories, if
# they do not exist yet
for(subf in sub_folder) {
  d <- file.path(dest_dir, subf)
  if(!dir.exists(d)) {
    # recursive = TRUE also creates dest_dir itself if needed
    success <- dir.create(d, recursive = TRUE)
    msg <- paste("created directory", d, "-", success)
    message(msg)
  }
}

# prepare to download the files
dest_dir <- file.path(dest_dir, sub_folder)
source_url <- paste0(link, sub_folder)

# \(x) is the R >= 4.1 shorthand for function(x)
success <- mapply(\(src, dest) {
  # read each Data subfolder
  # and get the file names therein
  # then lapply 'download.file' to each filename
  pg <- read_html(src)
  pg %>%
    html_elements("a") %>%
    html_text() %>%
    grep("\\.txt$", ., value = TRUE) %>%
    lapply(\(x) {
      s <- paste0(src, x)
      d <- file.path(dest, x)
      tryCatch(
        download.file(url = s, destfile = d),
        warning = function(w) w,
        error = function(e) e
      )
    })
}, source_url, dest_dir)

lengths(success)
# http://home.brisnet.org.au/~bgreen/Data/Hanson1/
#   84
# http://home.brisnet.org.au/~bgreen/Data/Hanson2/
#  749

# matches the question's number
sum(lengths(success))
# [1] 833
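
Here is a base-R alternative to the link-scraping step, without rvest. This sketch pulls the href targets out of a simple directory-listing page with regular expressions. The rvest code reads the link text instead. Regex parsing of HTML is fragile and only reasonable for plain auto-generated listings like this one. The page lines are a hypothetical stand-in; on the real site they would come from readLines(link).

```r
# Extract href targets from HTML lines (base R only)
list_links <- function(html) {
  m <- regmatches(html, gregexpr('href="[^"]*"', html))
  sub('^href="([^"]*)"$', "\\1", unlist(m))
}

# Hypothetical listing page, stands in for readLines(link)
page_lines <- c('<a href="?C=N;O=D">Name</a>',
                '<a href="/~bgreen/">Parent Directory</a>',
                '<a href="Hanson1/">Hanson1/</a>',
                '<a href="Hanson2/">Hanson2/</a>')

links <- list_links(page_lines)

# Keep relative links ending in "/" (sub-folders), skip sort links and parents
sub_folders <- grep("^[^/?].*/$", links, value = TRUE)
sub_folders
```
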



Hope this helps,

Rui Barradas



Re: [R] Off-topic: ChatGPT Code Interpreter

2023-07-17 Thread Rui Barradas

Às 00:23 de 18/07/2023, Jim Lemon escreveu:

I haven't really focused on the statistical capabilities of AI, that
marriage of massive memory and associative learning. I am impressed by
its ability to perform text-to-image conversion, something I have
recently needed. My artistic ability is that of the average three year
old, yet I can employ AI to translate my mental images into realistic
pictures. Perhaps we really are learning about how we think. As far as
I am aware, it just does what we tell it to do. Like other tools, it
is as good or bad as the user.

Jim


Hello,

Also off-topic but the date is fun:


A system is not a head.
Furniture is not people.
All processes and all devices,
will be useless for organizations,
if the heads of the individuals who employ them,
are not properly organized.
And these heads will be organized,
if the same part of the boss's body that directs them
is properly organized.
Just like you can write nonsense
with a latest model typewriter,
nonsense can also be done
with the most perfect systems and devices
meant to help you not to.
Systems, processes, furniture, machines,
are purely auxiliary elements.
The real process is to think.
The fundamental machine is intelligence.

Fernando Pessoa, 1926

Revista de Comércio e Contabilidade, nº 4. Lisboa, 25-4-1926.
(Magazine of Commerce and Accounting, nº 4. Lisbon, 25-4-1926)


Rui Barradas



Re: [R] nlmixr2 installation problems

2023-07-16 Thread Rui Barradas

Às 12:55 de 16/07/2023, Troels Ring escreveu:

Hi friends - Trying to install nlmixr2 caused problems. I'm on Windows
with R 4.3.1, so I made sure to have Rtools 4.3, and I also reinstalled R.
Then I ran

install.packages("nlmixr2",dependencies = TRUE)

and got the response:

Installing package into ‘C:/Users/Admin/AppData/Local/R/win-library/4.3’
(as ‘lib’ is unspecified)
also installing the dependencies ‘fs’, ‘rappdirs’, ‘bit’,
‘prettyunits’, ‘rematch’, ‘askpass’, ‘sass’, ‘commonmark’, ‘proxy’,
‘bit64’, ‘progress’, ‘rootSolve’, ‘lmom’, ‘cellranger’, ‘jsonlite’,
‘mime’, ‘openssl’, ‘htmlwidgets’, ‘ellipsis’, ‘bslib’, ‘fontawesome’,
‘jquerylib’, ‘tinytex’, ‘curl’, ‘markdown’, ‘jpeg’, ‘xml2’, ‘fastmap’,
‘e1071’, ‘generics’, ‘tidyselect’, ‘clipr’, ‘hms’, ‘vroom’, ‘cpp11’,
‘tzdb’, ‘stringi’, ‘purrr’, ‘mvtnorm’, ‘expm’, ‘rstudioapi’, ‘Exact’,
‘gld’, ‘readxl’, ‘httr’, ‘gridExtra’, ‘htmlTable’, ‘viridis’,
‘htmltools’, ‘base64enc’, ‘rmarkdown’, ‘Formula’, ‘bitops’, ‘evaluate’,
‘highr’, ‘xfun’, ‘yaml’, ‘numDeriv’, ‘lazyeval’, ‘optextras’, ‘dparser’,
‘RcppEigen’, ‘StanHeaders’, ‘sitmo’, ‘gridtext’, ‘cachem’,
‘RcppParallel’, ‘RApiSerialize’, ‘stringfish’, ‘classInt’, ‘dplyr’,
‘readr’, ‘stringr’, ‘tidyr’, ‘assertthat’, ‘binom’, ‘Deriv’,
‘DescTools’, ‘Hmisc’, ‘minpack.lm’, ‘pander’, ‘png’, ‘RCurl’,
‘backports’, ‘checkmate’, ‘knitr’, ‘lbfgsb3c’, ‘minqa’, ‘n1qn1’, ‘Rcpp’,
‘rex’, ‘Rvmmin’, ‘symengine’, ‘BH’, ‘RcppArmadillo’, ‘rxode2parse’,
‘rxode2random’, ‘data.table’, ‘digest’, ‘ggtext’, ‘PreciseSums’,
‘inline’, ‘memoise’, ‘sys’, ‘rxode2ll’, ‘rxode2et’, ‘qs’, ‘vpc’, ‘xgxr’,
‘nlmixr2data’, ‘nlmixr2est’, ‘nlmixr2extra’, ‘rxode2’, ‘lotri’,
‘nlmixr2plot’, ‘crayon’

  There are binary versions available but the source versions are later:
        binary source needs_compilation
sass     0.4.6  0.4.7              TRUE
openssl  2.0.6  2.1.0              TRUE

- and nothing more happened. But when I tried to install sass and openssl
individually, the same announcement of later source versions appeared,
without making it possible to ask for recompilation.

All best wishes
Troels


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

Maybe this [1] is relevant.


[1] 
https://community.rstudio.com/t/meaning-of-common-message-when-install-a-package-there-are-binary-versions-available-but-the-source-versions-are-later/2431
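In short, the message says CRAN already has newer source versions of sass and openssl than the Windows binaries, and installing those sources needs compilation. A possible workaround, assuming the slightly older binaries are good enough, is to tell R never to compile from source when a binary exists. The option name is the one documented in ?options (package utils); the installation call is wrapped in interactive() only so that this sketch does nothing when run in batch.

```r
# Prefer the available (older) binaries over compiling newer sources.
# Documented values: "never", "interactive" (the usual default) and "always".
old <- options(install.packages.compile.from.source = "never")
getOption("install.packages.compile.from.source")

if (interactive()) {
  # retry the original installation (needs internet access)
  install.packages("nlmixr2", dependencies = TRUE)
}

# restore the previous setting when done:
# options(old)
```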


Hope this helps,

Rui Barradas



Re: [R] How to add error bars to a line plot with ggplot2?

2023-07-14 Thread Rui Barradas

At 17:33 on 14/07/2023, Luigi Marongiu wrote:

Hello,
I am measuring a certain variable at given time intervals and
different concentrations of a reagent. I would like to make a scatter
plot of the values, joined by a line to highlight the temporal
measure.
I can plot this all right. Now, since I have more than one replicate,
I would like to add the error bars.
I prepared a dataframe with the mean measures and a column with the
standard deviations, but when I run the code, I get the error:
```
Error in `check_aesthetics()`:
! Aesthetics must be either length 1 or the same as the data (20): colour
Run `rlang::last_trace()` to see where the error occurred.
```
I am missing something, but what?
Thank you


WORKING EXAMPLE
```
measTime= c(1,2,4,24,48,1,2,4,24
  ,48,1,2,4,24,48,1,2,4,24,48)
conc= c(0.25,0.25,0.25,0.25,0.25,1.12,1.12
,1.12,1.12,1.12,2.5,2.5,2.5,2.5,2.5
,25,25,25,25,25)
varbl= c(0.0329,0.27,0.0785,0.1015
,-0.193,0.048,0.113,0.1695,-0.775,0.464,-0.257
,-0.154,-0.3835,-1.23,-0.513,1.3465,1.276
,1.128,-2.56,-1.813)
stdDev=c(0.646632301492381,0,1.77997087991162
,0.247683265482349,0,0.282901631902917,0
,0.273086677326693,1.03807578400295,0,0.912213425319609
,0,1.64371621638287,2.23203614068709,0,0.2615396719429
,0,0.54039985196149,2.15236180353893,0)
df = data.frame(Time=measTime, mM=conc, ddC=varbl, SD=stdDev)
library(ggplot2)
COLS = c("green", "red", "blue", "yellow")
   ggplot(df,
  aes(x=Time, y=ddC, colour=mM, group=mM)) +
   geom_line(aes(x=Time, y=ddC, colour=mM, group=mM)) +
   geom_errorbar(aes(x=Time, ymin=ddC-SD, ymax=ddC+SD, colour=mM, group=mM),
 width=.1, colour=COLS) +
   geom_point(size=6) +
   scale_colour_manual(values = COLS) +
   ggtitle("Exposure") +
   xlab(expression(bold("Time (h)"))) +
   ylab(expression(bold("Value"))) +
   geom_hline(aes(yintercept=0)) +
   theme_classic()
   ```


Hello,

Two notes:

1. If you want to use a discrete colours vector, your 'colour' aesthetic
must be mapped to a discrete variable. The most frequent cases are
character or factor columns.


2. If you start the plot with certain aesthetics set, you don't have to
repeat them in subsequent layers: geom_line can be called with no aes()
and geom_errorbar doesn't need x = Time again.



As for the main error, the colours vector COLS should be removed from
geom_errorbar. Set outside aes(), colour is a fixed parameter and must
have length 1 or the same length as the data (20 rows); COLS has 4
elements, which is what the error message complains about.




df <- data.frame(Time = measTime,
 mM = factor(conc),  # this must be a factor
 ddC = varbl,
 SD = stdDev)

library(ggplot2)

COLS = c("green", "red", "blue", "yellow")

ggplot(df, aes(x = Time, y = ddC, colour = mM, group = mM)) +
  geom_line() +
  geom_errorbar(aes(ymin = ddC - SD, ymax = ddC + SD), width = 0.1) +
  geom_point(size = 6) +
  geom_hline(aes(yintercept = 0)) +
  scale_colour_manual(values = COLS) +
  ggtitle("Exposure") +
  xlab(expression(bold("Time (h)"))) +
  ylab(expression(bold("Value"))) +
  theme_classic()
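A quick base-R check before plotting: a discrete colour scale needs one colour per level, and scale_colour_manual() also accepts a named vector, which ties each colour to a specific level instead of relying on level order. The vectors below mirror conc and COLS from the question; col_map is a hypothetical name.

```r
# Same concentrations and colours as in the question
conc <- c(rep(0.25, 5), rep(1.12, 5), rep(2.5, 5), rep(25, 5))
COLS <- c("green", "red", "blue", "yellow")

mM <- factor(conc)
levels(mM)   # "0.25" "1.12" "2.5" "25"

# one colour per level; with fewer colours than levels the manual scale fails
stopifnot(nlevels(mM) == length(COLS))

# named vector: each colour is tied to one level by name,
# e.g. for use in scale_colour_manual(values = col_map)
col_map <- setNames(COLS, levels(mM))
col_map
```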


Hope this helps,

Rui Barradas



Re: [R] textual analysis - transforming several pdf to txt - naming the files

2023-07-05 Thread Rui Barradas

At 11:12 on 05/07/2023, Cecília Carmo wrote:

convertpdf2txt <- function(dirpath){

files <- list.files(dirpath, pattern = "Consoli.*\\.pdf$", full.names
= TRUE)
files <- chartr("\\", "/", files)

x <- lapply(files, function(x){
  pdftools::pdf_text(x) %>%
paste0(collapse = " ") %>%
stringr::str_squish()
})
new_names <- tools::file_path_sans_ext(files)
new_names <- paste(new_names, "txt", sep = ".")
setNames(x, new_names)
}

# apply function
# note that my test files are in "~/Temp"
txts <- convertpdf2txt(here::here("~", "Temp"))
names(txts)


Thank you very much, but the following error appeared:

Error: unexpected '}' in "}"




Cecília Carmo

Universidade de Aveiro


Hello,

I had tested the code with a couple of PDFs and it ran with no errors
or warnings.
That error means that a "}" is not balanced, but in my code they all
are; RStudio checks this automatically.


Can you check the code in an editor with syntax highlighting?
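Another way to locate an unbalanced brace, without an editor, is to let R's own parser report it: parse() fails with the same "unexpected" message and also gives the line and column. The snippet uses a deliberately broken, hypothetical piece of code; for a script saved to disk, parse(file = "script.R") works the same way (the file name is an assumption).

```r
# A hypothetical snippet with one closing brace too many
code <- "f <- function(x) {
  x + 1
}
}"

# parse() checks the syntax without running anything
msg <- tryCatch({
  parse(text = code)
  "no syntax errors found"
}, error = function(e) conditionMessage(e))

cat(msg, "\n")
```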


Hope this helps,

Rui Barradas



Re: [R] textual analysis - transforming several pdf to txt - naming the files

2023-07-05 Thread Rui Barradas

At 10:14 on 05/07/2023, Cecília Carmo wrote:

I am taking my first steps in textual analysis with R.
I have PDF files consisting of company reports for several years (1 file
corresponds to 1 company and 1 year).
My idea is to start by transforming all my PDF files into txt files for further
treatment and analysis (this will allow me to group the files by company or by
year, depending on the future analysis to be performed).
I do not have in-depth knowledge of programming in R. I just adapt code that I
find to my needs. Here is my first question about a script I'm adapting:

My PDF files are in one directory named "pdfs". The names of my files are, for
example, SONAE2020FS.pdf, EDP2021GS.pdf
I want to convert them to txt and give them the same names as the PDF files:
SONAE2020FS.txt, EDP2021GS.txt
I'm running the following script, but the names of the txt files that I obtain are:
pdftext1, pdftext2, pdftext3...
What do I need to change?
Thank you very much,

Cecília Carmo
Universidade de Aveiro - Portugal


dirpath <- ("/Users/ceciliacarmo/documents/RTextualAnalysis/data/pdfs")

library(pdftools)
library(dplyr)

convertpdf2txt <- function(dirpath){
  files <- list.files(dirpath, full.names = T)
  x <- sapply(files, function(x){
    x <- pdftools::pdf_text(x) %>%
      paste0(collapse = " ") %>%
      stringr::str_squish()
    return(x)
  })
}

# apply function
txts <- convertpdf2txt(here::here("data", "pdf/"))

# add names to txt files
names(txts) <- paste0(here::here("data","pdftext"), 1:length(txts), sep = "")





Hello,

Try the following.
The corrected function convertpdf2txt assigns names based on the files
variable.
It uses tools::file_path_sans_ext to keep the filename without extension
and pastes the new extension to it. In the end there is no need to
call here::here again; the list is already a named list.




convertpdf2txt <- function(dirpath){
  # all PDF files in the directory
  files <- list.files(dirpath, pattern = "\\.pdf$", full.names = TRUE)
  # use forward slashes in the paths
  files <- chartr("\\", "/", files)

  # the pipe %>% comes from magrittr, attached by library(dplyr)
  x <- lapply(files, function(x){
    pdftools::pdf_text(x) %>%
      paste0(collapse = " ") %>%
      stringr::str_squish()
  })
  # same file name as the PDF, with extension .txt instead of .pdf
  new_names <- tools::file_path_sans_ext(files)
  new_names <- paste(new_names, "txt", sep = ".")
  setNames(x, new_names)
}

# apply function
# note that my test files are in "~/Temp"; list.files() expands the "~"
txts <- convertpdf2txt("~/Temp")
names(txts)
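convertpdf2txt() only returns the text in memory. To actually create the .txt files, each element can be written to the file named after it. A minimal sketch with writeLines(), using hypothetical file names in tempdir() and placeholder text instead of real PDFs, so that it runs without pdftools:

```r
# Hypothetical input paths, standing in for the result of list.files()
pdf_files <- file.path(tempdir(), c("SONAE2020FS.pdf", "EDP2021GS.pdf"))

# Same renaming step as in convertpdf2txt(): drop ".pdf", add ".txt"
txt_files <- paste(tools::file_path_sans_ext(pdf_files), "txt", sep = ".")

# Placeholder text; in practice this is the list returned by convertpdf2txt()
txts <- setNames(list("text of SONAE report", "text of EDP report"), txt_files)

# Write each element to the file named after it
for (path in names(txts)) {
  writeLines(txts[[path]], con = path)
}

basename(names(txts))
# [1] "SONAE2020FS.txt" "EDP2021GS.txt"
```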



Hope this helps,

Rui Barradas



Re: [R] Create matrix with column names wiht the same prefix xxxx and that end in 1, 2

2023-07-03 Thread Rui Barradas

At 20:55 on 03/07/2023, Rui Barradas wrote:

At 20:26 on 03/07/2023, Sorkin, John wrote:

Jeff,
Again my thanks for your guidance.
I replaced dimnames(myvalues)<-list(NULL,c(zzz))
with
colnames(myvalues)<-zzz
and get the same error,
Error in dimnames(x) <- dn :
   length of 'dimnames' [2] not equal to array extent
It appears that I am creating the string zzz in a manner that is not
compatible with either

dimnames(myvalues)<-list(NULL,c(zzz))
or
colnames(myvalues)<-zzz

I think I need to modify the way I create the string zzz.

# create variable names xxx1 and xxx2.
string=""
for (j in 1:2){
   name <- paste("xxx",j,sep="")
   string <- paste(string,name)
   print(string)
}
# Creation of xxx1 and xxx2 works
string

# Create matrix
myvalues <- matrix(nrow=2,ncol=4)
head(myvalues,1)
# Add "j" and "k" to the string of column names
zzz <- paste("j","k",string)
zzz
# assign column names, j, k, xxx1, xxx2 to the matrix
# create column names, j, k, xxx1, xxx2.
dimnames(myvalues)<-list(NULL,c(zzz))
colnames(myvalues)<-zzz

From: Jeff Newmiller 
Sent: Monday, July 3, 2023 2:45 PM
To: Sorkin, John
Cc: r-help@r-project.org
Subject: Re: [R]  Create matrix with column names wiht the same prefix 
 and that end in 1, 2


I really think you should read that help page.  colnames() accesses 
the second element of dimnames() directly.


On July 3, 2023 11:39:37 AM PDT, "Sorkin, John" 
 wrote:

Jeff,
Thank you for your reply.
I should have said with dim names, not column names. I want the matrix
to have dim names, no row names, dim names j, k, xxx1, xxx2.


John

John David Sorkin M.D., Ph.D.
Professor of Medicine
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology and 
Geriatric Medicine

Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above 
prior to faxing)


On Jul 3, 2023, at 2:11 PM, Jeff Newmiller  
wrote:


?colnames

On July 3, 2023 11:00:32 AM PDT, "Sorkin, John" 
 wrote:
I am trying to create an array, myvalues, having 2 rows and 4 
columns, where the column names are j,k,xxx1,xxx2. The code below 
fails, with the following error, "Error in dimnames(myvalues) <- 
list(NULL, zzz) :

length of 'dimnames' [2] not equal to array extent"

Please help me get the code to work.

Thank you,
John

# create variable names xxx1 and xxx2.
string=""
for (j in 1:2){
name <- paste("xxx",j,sep="")
string <- paste(string,name)
print(string)
}
# Creation of xxx1 and xxx2 works
string

# Create matrix
myvalues <- matrix(nrow=2,ncol=4)
head(myvalues,1)
# Add "j" and "k" to the string of column names
zzz <- paste("j","k",string)
zzz
# assign column names, j, k, xxx1, xxx2 to the matrix
# create column names, j, k, xxx1, xxx2.
dimnames(myvalues)<-list(NULL,zzz)



--
Sent from my phone. Please excuse my brevity.



--
Sent from my phone. Please excuse my brevity.

Hello,

I should have pointed out in my answer that you are indeed creating the
names vector in a (very) wrong way. When, in the loop, you paste string
and name, you create one vector of length 1. When the loop ends, you have
the single string " xxx1 xxx2", not two names.



string=""
for (j in 1:2){
   name <- paste("xxx",j,sep="")
   string <- paste(string,name)
   print(string)
}
#> [1] " xxx1"
#> [1] " xxx1 xxx2"
# Creation of xxx1 and xxx2 works
string
#> [1] " xxx1 xxx2"



Quoting the comment above,

   Creation of xxx1 and xxx2 works

No, it does not!
And then you paste again, adding two extra letters to one string

zzz <- paste("j","k",string)


This zzz also is of length 1, check it.
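The check can be done directly. The snippet rebuilds zzz exactly as in the original code and shows that the assignment fails because a single name is supplied for four columns.

```r
string <- ""
for (j in 1:2) {
  string <- paste(string, paste("xxx", j, sep = ""))
}
zzz <- paste("j", "k", string)

length(zzz)   # 1: a single string, not four names
zzz           # "j k  xxx1 xxx2"

myvalues <- matrix(nrow = 2, ncol = 4)
err <- tryCatch(dimnames(myvalues) <- list(NULL, zzz),
                error = function(e) conditionMessage(e))
err           # "length of 'dimnames' [2] not equal to array extent"
```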


With a loop, the right way would be either of the following.

# 1. concatenate the current vector with the new name, a frequent way
string <- NULL  # start from an empty vector, not from ""
for (j in 1:2){
  name <- paste("xxx",j,sep="")
  string <- c(string, name)
  print(string)
}
#> [1] "xxx1"
#> [1] "xxx1" "xxx2"
# Now creation of xxx1 and xxx2 does work
string
#> [1] "xxx1" "xxx2"



# 2. create a vector of the appropriate length beforehand, my preferred way
string <- character(2)
for (j in 1:2){
  string[j] <- paste0("xxx", j)
  print(string)
}
#> [1] "xxx1" ""
#> [1] "xxx1" "xxx2"
# Creation of xxx1 and xxx2 works
string
#> [1] "xxx1" "xxx2"



But the vectorized way, paste0("xxx", 1:2), is still the better one.
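For completeness, a sketch of the whole task done the vectorized way, keeping the dimnames() call from the original code so that the matrix has no row names and the column names j, k, xxx1, xxx2:

```r
myvalues <- matrix(nrow = 2, ncol = 4)

# one call to paste0() builds both names at once
zzz <- c("j", "k", paste0("xxx", 1:2))
length(zzz)   # 4, one name per column

# no row names, column names j, k, xxx1, xxx2
dimnames(myvalues) <- list(NULL, zzz)
myvalues
```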

Hope this helps,

Rui Barradas



Re: [R] Create matrix with column names wiht the same prefix xxxx and that end in 1, 2

2023-07-03 Thread Rui Barradas

At 19:00 on 03/07/2023, Sorkin, John wrote:

I am trying to create an array, myvalues, having 2 rows and 4 columns, where the column 
names are j,k,xxx1,xxx2. The code below fails, with the following error, "Error in 
dimnames(myvalues) <- list(NULL, zzz) :
   length of 'dimnames' [2] not equal to array extent"

Please help me get the code to work.

Thank you,
John

# create variable names xxx1 and xxx2.
string=""
for (j in 1:2){
   name <- paste("xxx",j,sep="")
   string <- paste(string,name)
   print(string)
}
# Creation of xxx1 and xxx2 works
string

# Create matrix
myvalues <- matrix(nrow=2,ncol=4)
head(myvalues,1)
# Add "j" and "k" to the string of column names
zzz <- paste("j","k",string)
zzz
# assign column names, j, k, xxx1, xxx2 to the matrix
# create column names, j, k, xxx1, xxx2.
dimnames(myvalues)<-list(NULL,zzz)



Hello,

You don't need so many calls to paste, one is enough.
And you don't need the for loop at all, paste and paste0 are vectorized.



myvalues <- matrix(nrow=2,ncol=4)

cnames <- paste0("xxx", 1:2)
cnames
# [1] "xxx1" "xxx2"

colnames(myvalues) <- c("j", "k", cnames)



Hope this helps,

Rui Barradas



Re: [R] How to plot both lines and points by group on ggplot2

2023-07-01 Thread Rui Barradas

At 19:20 on 01/07/2023, Luigi Marongiu wrote:

Hello,
I have a dataframe with measurements stratified by the concentration
of a certain substance. I would like to plot the points of the
measures and connect the points within each series of concentrations.
When I launch ggplot2 I get the error
```
geom_path: Each group consists of only one observation. Do you need to
adjust the
group aesthetic?
```
and no lines are drawn.
Where am I going wrong?
Thank you
Luigi

```
df = data.frame(Conc = c(rep(1, 3), rep(2, 3), rep(5, 3)),
 Time = rep(1:3, 3),
 Value = c(0.91, 0.67, 0.71, 0.91, 0.65, 0.74, 0.95, 0.67, 
0.67))
df$Time <- as.factor(df$Time)
levels(df$Time) = c(1, 4, 24)
df$Conc <- as.factor(df$Conc)
levels(df$Conc) = c(1, 2, 5)
library(ggplot2)
ggplot(df, aes(x=Time, y=Value, colour=Conc)) +
   geom_point(size=6) +
   geom_line(aes(x=Time, y=Value, colour=Conc)) +
   scale_colour_manual(values = c("darkslategray3", "darkslategray4",
  "deepskyblue4")) +
   ggtitle("Working example") +
   xlab(expression(bold("Time (h)"))) +
   ylab(expression(bold("Concentration (mM)")))
```

Hello,

Here are two solutions. I have removed the redundant aes() from 
geom_line in both plots.


1. If you do not coerce Time to factor, the x axis will be continuous.
   The plot will be as expected, but you will have to include a
scale_x_continuous to get the wanted labels.




df = data.frame(Conc = c(rep(1, 3), rep(2, 3), rep(5, 3)),
                Time = rep(1:3, 3),
                Value = c(0.91, 0.67, 0.71, 0.91, 0.65, 0.74, 0.95,
                          0.67, 0.67))


library(ggplot2)

df$Conc <- factor(df$Conc, levels = c(1, 2, 5))

ggplot(df, aes(x=Time, y=Value, colour=Conc)) +
  geom_point(size=6) +
  geom_line() +
  scale_colour_manual(values = c("darkslategray3", "darkslategray4",
                                 "deepskyblue4")) +
  # Time is numeric 1:3 here; label the breaks with the real hours
  scale_x_continuous(breaks = 1:3, labels = c(1, 4, 24)) +
  ggtitle("Working example") +
  xlab(expression(bold("Time (h)"))) +
  ylab(expression(bold("Concentration (mM)")))




2. Time is coerced to factor. Then, tell geom_line the data is grouped 
by Conc. This is probably the solution you should use.




df$Time <- factor(df$Time, labels = c(1, 4, 24))

ggplot(df, aes(x=Time, y=Value, colour=Conc)) +
  geom_point(size=6) +
  geom_line(aes(group = Conc)) +
  scale_colour_manual(values = c("darkslategray3", "darkslategray4", 
"deepskyblue4")) +

  ggtitle("Working example") +
  xlab(expression(bold("Time (h)"))) +
  ylab(expression(bold("Concentration (mM)")))
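Why the lines disappear: by default ggplot2 groups observations by the interaction of all discrete variables in the plot. With Time coerced to factor, that means one group per Time/Conc pair, each holding a single point, hence the geom_path message. The base-R tables below sketch the groups that would be formed with and without group = Conc.

```r
df <- data.frame(Conc = factor(c(rep(1, 3), rep(2, 3), rep(5, 3))),
                 Time = factor(rep(1:3, 3), labels = c(1, 4, 24)),
                 Value = c(0.91, 0.67, 0.71, 0.91, 0.65, 0.74, 0.95, 0.67, 0.67))

# Default grouping (no group aesthetic): every combination of the
# discrete variables Time and Conc forms its own group
default_groups <- interaction(df$Time, df$Conc, drop = TRUE)
table(default_groups)   # 9 groups, one row each: nothing to connect

# With aes(group = Conc) in geom_line(): 3 groups of 3 points each
table(df$Conc)
```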



Hope this helps,

Rui Barradas



Re: [R] Issue with crammed Y axis

2023-06-18 Thread Rui Barradas

At 00:07 on 17/06/2023, Ana Marija wrote:

Hi,

I have a data frame like this:


dput(df)

structure(list(ID = 1:8, Type = c("gmx mdrun -ntmpi 8 -ntomp 1 -s benchPEP.tpr -nsteps 1 -resethway",
"gmx mdrun -ntmpi 8 -ntomp 1 -s benchPEP.tpr -nsteps 1 -resethway",
"gmx mdrun -ntmpi 8 -s benchPEP.tpr -nsteps 4000 -resetstep 3000",
"gmx mdrun -ntmpi 8 -s benchPEP.tpr -nsteps 4000 -resetstep 3000",
"gmx mdrun -ntmpi 8 -s benchPEP.tpr -nsteps -1 -maxh 1.0 -resethway",
"gmx mdrun -ntmpi 8 -s benchPEP.tpr -nsteps -1 -maxh 1.0 -resethway",
"gmx mdrun -ntmpi 8 -ntomp 1 -s benchPEP.tpr -nsteps -1 -maxh 1.0 -resethway -noconfout",
"gmx mdrun -ntmpi 8 -ntomp 1 -s benchPEP.tpr -nsteps -1 -maxh 1.0 -resethway -noconfout"
), Annee = c("SYCL", "CUDA", "SYCL", "CUDA", "SYCL", "CUDA",
"SYCL", "CUDA"), Domain.decomp. = c("2. 1", "2", "2. 1", "2. 1",
"2.1", "2", "2. 1", "2"), DD.com..load = c(0, 0, 0, 0, 3.7, 3,
0, 0), Neighbor.search = c("3.7", "3. 1", "3.7", "3.9", "0. 1",
"O. 1", "3.5", "3. 1"), Launch.PP.GPU.ops. = c("0. 1", "0", "0.2",
"0", "1 .6", "1 . 5", "0.2", "0. 1"), Comm..coord. = c("1 .6",
"1 .0", "1 .5", "1 .3", "1 .5", "1 .3", "1 . 5", "1 .6"), Force = c("1 . 5",
"1 .2", "1 .4", "1 .2", "1 .3", "1 . 1", "1 .5", "1 .2"), Wait...Comm..F =
c("1 .3",
"1 .7", "1 .2", "1 .0", "66.7", "68.8", "1 .2", "1 .2"), PIE.mesh =
c("65.6",
"70.9", "61 .0", "61 .4", "0", "0", "67.6", "69.2"), Wait.Bonded.GPU =
c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L), wait.GPU.NB.nonloc. = c(0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L), Wait.GPU.NB.local = c(0, 0, 0, 0, 7.4,
5.7, 0, 0), NB.X.F.buffer.ops. = c("7.3", "4.4", "6. 7", "5",
"0. 1", "0. 1", "7.2", "5.5"), Write.traje = c("0.3", "0.3",
"1 .2", "1 .3", "6.4", "6. 1", "O. 1", "0. 1"), Update = c(6.3,
4.3, 5.7, 4.9, 8.2, 9.5, 6.2, 5.6), Constraints = c("8.9", "9.7",
"1 1 .6", "13.3", "0.3", "0.4", "8. 1", "9.5"), Comm..energies = c("0.9",
"0.9", "3.3", "3.9", "8.4", "8. 5", "0.3", "0.4"), PIE.redist..X.F = c("8. 1",
"8.7", "7.9", "7.4", "29.9", "30.1", "8. 1", "8. 1"), PIE.spread =
c("29.7",
"30.6", "27.2", "29.6", "20.3", "20.2", "30. 1", "30.4"), PIE.gather =
c("19.9",
"21 .3", "18.7", "19", "6.4", "8.4", "20", "20.6"), PIE.3D.FFT = c("6",
"8.6", "5.7", "4.3", "1 .0", "1 .1", "7.6", "8.4"), PIE.3D.FFT.comm. = c("1 .2",
"1 .0", "0.9", "0.7", "1 .2", "0. 5", "1 .0", "1 .1"), PIE.solve.Elec =
c(0.7,
0.5, 0.6, 0.3, 0.7, 0.5, 0.7, 0.5)), class = "data.frame", row.names =
c(NA,
-8L))

I am plotting this data with:

library(reshape2)
library(ggplot2)

df <- read.csv("/Users/anamaria/Downloads/B5.csv", stringsAsFactors=FALSE,
header=TRUE)

df.long<-melt(df,id.vars=c("ID","Type","Annee"))

myplot =ggplot(df.long,aes(variable,value,fill=as.factor(Annee)))+
geom_bar(position="dodge",stat="identity")+
ylab("Simulation Progress (%)") +
facet_wrap(~Type,nrow=3)

myplot + theme(panel.grid.major = element_blank(),
legend.title=element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.title.x = element_blank(),
axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
axis.line = element_line(colour = "black"))

My issue is that the Y axis is crammed. How can it be cleaned up to show
only these values: 0, 10, 20, 30, ..., 80?

I tried using: scale_y_continuous(breaks = breaks_width(10))+
But I got this error:
Error in breaks_width(10) : could not find function "breaks_width"

Also, can anything be done about the subtitle (facet strip label) of the
top-left plot, which does not quite fit in its gray box:
"gmx mdrun -ntmpi 8 -ntomp 1 -s benchPEP.tpr -nsteps 1 -resethway"

Thanks
Ana


Hello,

The problem seems to be that df.long$value is character and that it has 
spaces and "O" (upper case letter O) in it. Try, before plotting



df.long$value <- gsub(" ", "", df.long$value)
df.long$value <- sub("O", "0", df.long$value)
df.long$value <- as.numeric(df.long$value)


With me it solved the problem.
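The same three cleaning steps, applied to a few of the raw strings from the question, show what each one does. anyNA() is added as a hedge: any string that still fails to convert becomes NA (with a warning), and the check makes that visible before plotting.

```r
# A few of the raw strings from the question
value <- c("1 .6", "O. 1", "1 1 .6", "0", "66.7")

value <- gsub(" ", "", value)   # drop every space: "1.6" "O.1" "11.6" "0" "66.7"
value <- sub("O", "0", value)   # upper-case letter O -> digit zero
value <- as.numeric(value)

value
# TRUE would mean some string did not convert to a number
anyNA(value)
```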

As for breaks_width, that's a function in package scales, so if the 
above doesn't solve it, qualify the function name:



scale_y_continuous(breaks = scales::breaks_width(10)) +


Hope this helps,

Rui Barradas



Re: [R] Problem with filling dataframe's column

2023-06-13 Thread Rui Barradas

At 17:18 on 13/06/2023, javad bayat wrote:

Dear Rui;
Hi. I used your codes, but it seems it didn't work for me.


pat <- c("_esmdes|_Des Section|0")
dim(data2)

 [1]  281549  9

grep(pat, data2$Layer)
dim(data2)

 [1]  281549  9

What does grep function do? I expected the function to remove 3 rows of the
dataframe.
I do not know the reason.






On Mon, Jun 12, 2023 at 5:16 PM Rui Barradas  wrote:



Re: [R] Problem with filling dataframe's column

2023-06-12 Thread Rui Barradas

At 23:13 on 12/06/2023, javad bayat wrote:

Dear Rui;
Many thanks for the email. I tried your codes and found that the length of
the "Values" and "Names" vectors must be equal, otherwise the results will
not be useful.
For some of the characters in the Layer column that I do not need to be
filled in the LU column, I used "NA".
But I need to delete some of the rows from the table as they are useless
for me. I tried this code to delete entire rows of the dataframe which
contained these three values in the Layer column. It gave me the following
warnings.


data3 = data2[-grep(c("_esmdes","_Des Section","0"), data2$Layer),]

  Warning message:
   In grep(c("_esmdes", "_Des Section", "0"), data2$Layer) :
   argument 'pattern' has length > 1 and only the first element will be
used


data3 = data2[!grepl(c("_esmdes","_Des Section","0"), data2$Layer),]

 Warning message:
 In grepl(c("_esmdes", "_Des Section", "0"), data2$Layer) :
     argument 'pattern' has length > 1 and only the first element will be
used

How can I do this?
Sincerely










On Sun, Jun 11, 2023 at 5:03 PM Rui Barradas  wrote:


At 13:18 on 11/06/2023, Rui Barradas wrote:

At 22:54 on 11/06/2023, javad bayat wrote:

Dear Rui;
Many thanks for your email. I used one of your codes,
"data2$LU[which(data2$Layer == "Level 12")] <- "Park"", and it works
correctly for me.
Actually I need to expand the codes so as to consider all "Levels" in the
"Layer" column. There are more than a hundred levels in the Layer column.
If I use your provided code, I have to write it hundreds of times, as below:

data2$LU[which(data2$Layer == "Level 1")] <- "Park";
data2$LU[which(data2$Layer == "Level 2")] <- "Agri";
...
...
...
.
Is there any other way to expand the code in order to consider all of the
levels simultaneously? Like the below code:
data2$LU[which(data2$Layer == c("Level 1","Level 2", "Level 3", ...))] <-
c("Park", "Agri", "GS", ...)


Sincerely




On Sun, Jun 11, 2023 at 1:43 PM Rui Barradas 
wrote:


At 21:05 on 11/06/2023, javad bayat wrote:

Dear R users;
I am trying to fill a column based on a specific value in another column of
a dataframe, but it seems there is a problem with the codes!
The "Layer" and the "LU" are two different columns of the dataframe.
How can I fix this?
Sincerely


for (i in 1:nrow(data2$Layer)){
 if (data2$Layer == "Level 12") {
 data2$LU == "Park"
 }
 }





Hello,

There are two bugs in your code,

1) the index i is not used in the loop
2) the assignment operator is `<-`, not `==`


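Here is the loop corrected. It also loops over seq_len(nrow(data2)):
data2$Layer is a vector, so nrow(data2$Layer) is NULL and
1:nrow(data2$Layer) is an error.

for (i in seq_len(nrow(data2))){
 if (data2$Layer[i] == "Level 12") {
   data2$LU[i] <- "Park"
 }
}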



But R is a vectorized language; the following two ways are the idiomatic
ways of doing what you want to do.



i <- data2$Layer == "Level 12"
data2$LU[i] <- "Park"

# equivalent one-liner
data2$LU[data2$Layer == "Level 12"] <- "Park"



If there are NAs in data2$Layer, it's probably safer to wrap the logical
index in which() (see ?which), to get a numeric one.



i <- which(data2$Layer == "Level 12")
data2$LU[i] <- "Park"

# equivalent one-liner
data2$LU[which(data2$Layer == "Level 12")] <- "Park"
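A small illustration of the which() advice, on hypothetical vectors: comparing NA with a string gives NA, so the logical index contains an NA, while which() keeps only the positions that are TRUE.

```r
# Hypothetical Layer values, one of them missing
Layer <- c("Level 12", NA, "Level 3", "Level 12")
LU <- rep(NA_character_, length(Layer))

Layer == "Level 12"             # TRUE NA FALSE TRUE: the logical index has an NA
i <- which(Layer == "Level 12")
i                               # 1 4: which() drops the NA position
LU[i] <- "Park"
LU
```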


Hope this helps,

Rui Barradas





Hello,

You don't need to repeat the same instruction 100+ times, there is a way
of assigning all new LU values at the same time with match().
This assumes that you have the new values in a vector.


Sorry, this is not clear. I mean


This assumes that you have the new values in a vector, the vector Names
below. The vector of values to be matched is created from the data.


Rui Barradas




Values <- sort(unique(data2$Layer))
Names <- c("Park", "Agri", "GS")

i <- match(data2$Layer, Values)
data2$LU <- Names[i]
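One caveat with the match() approach: Names has to be written in the same order as sort(unique(data2$Layer)), and character sorting is alphabetical, so "Level 10" sorts before "Level 2". A sketch of a named lookup vector, which ties each new value to its layer explicitly; the layer names and the mapping here are hypothetical.

```r
# Hypothetical layers; "Level 10" shows the sorting pitfall
data2 <- data.frame(Layer = c("Level 2", "Level 10", "Level 1", "Level 2"))

# Alphabetical order: "Level 1", "Level 10", "Level 2"
sort(unique(as.character(data2$Layer)))

# Explicit mapping: names are Layer values, elements are the new LU values
lookup <- c("Level 1" = "Park", "Level 2" = "Agri", "Level 10" = "GS")

# as.character() guards against Layer being a factor (which would index by codes)
data2$LU <- unname(lookup[as.character(data2$Layer)])
data2$LU
```

Layers missing from lookup get NA, which makes gaps in the mapping easy to spot.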


Hope this helps,

Rui Barradas

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.






Hello,

Please cc the r-help list; R-help is threaded, and this can be helpful to
others in the future.


You can combine several patterns like this:


pat <- c("_esmdes|_Des Section|0")
grep(pat, data2$Layer)

or, programmatically,


pat <- paste(c("_esmdes","_Des Section","0"), collapse = "|")
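A small sketch of how the combined pattern behaves, on made-up layer names (the real data2$Layer values are not shown in the thread). grep() returns the matching positions and grepl() a logical vector. The pieces are joined with the regex alternation `|`, so a piece containing regex metacharacters such as `.` or `(` would need escaping.

```r
# Hypothetical layer names
layers <- c("A_esmdes", "B_Des Section", "Level 10", "Park")
pat <- paste(c("_esmdes", "_Des Section", "0"), collapse = "|")

grep(pat, layers)   # positions 1 2 3 ("Level 10" matches "0")
grepl(pat, layers)  # TRUE TRUE TRUE FALSE, handy for subsetting rows
```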


Hope this helps,

Rui Barradas



Re: [R] Problem with filling dataframe's column

2023-06-11 Thread Rui Barradas

On 11/06/2023 at 13:18, Rui Barradas wrote:

On 11/06/2023 at 22:54, javad bayat wrote:

Dear Rui;
Many thanks for your email. I used one of your codes,
"data2$LU[which(data2$Layer == "Level 12")] <- "Park"", and it works
correctly for me.
Actually, I need to expand the code to consider all "Levels" in the
"Layer" column. There are more than a hundred levels in the Layer column.
If I use your provided code, I have to write it hundreds of times, as below:
data2$LU[which(data2$Layer == "Level 1")] <- "Park";
data2$LU[which(data2$Layer == "Level 2")] <- "Agri";
...
...
...
.
Is there any other way to expand the code in order to consider all of the
levels simultaneously? Like the below code:
data2$LU[which(data2$Layer == c("Level 1","Level 2", "Level 3", ...))] <-
c("Park", "Agri", "GS", ...)


Sincerely




On Sun, Jun 11, 2023 at 1:43 PM Rui Barradas wrote:



On 11/06/2023 at 21:05, javad bayat wrote:

Dear R users;
I am trying to fill a column based on a specific value in another column of
a dataframe, but it seems there is a problem with the code!
The "Layer" and the "LU" are two different columns of the dataframe.
How can I fix this?
Sincerely


for (i in 1:nrow(data2$Layer)){
    if (data2$Layer == "Level 12") {
    data2$LU == "Park"
    }
    }





Hello,

There are three bugs in your code:

1) the index i is not used in the loop;
2) the assignment operator is `<-`, not `==`;
3) data2$Layer is a vector, so nrow(data2$Layer) is NULL and
   1:nrow(data2$Layer) fails; loop over seq_len(nrow(data2)) instead.


Here is the loop corrected.

for (i in seq_len(nrow(data2))) {
  if (data2$Layer[i] == "Level 12") {
    data2$LU[i] <- "Park"
  }
}



But R is a vectorized language; the following two ways are the idiomatic
ways of doing what you want to do.



i <- data2$Layer == "Level 12"
data2$LU[i] <- "Park"

# equivalent one-liner
data2$LU[data2$Layer == "Level 12"] <- "Park"



If there are NAs in data2$Layer, it is probably safer to wrap the logical
index in ?which() to get a numeric one.



i <- which(data2$Layer == "Level 12")
data2$LU[i] <- "Park"

# equivalent one-liner
data2$LU[which(data2$Layer == "Level 12")] <- "Park"


Hope this helps,

Rui Barradas





Hello,

You don't need to repeat the same instruction 100+ times; there is a way
of assigning all new LU values at the same time with match().

This assumes that you have the new values in a vector.


Sorry, this is not clear. I mean


This assumes that you have the new values in a vector, the vector Names 
below. The vector of values to be matched is created from the data.



Rui Barradas




Values <- sort(unique(data2$Layer))
Names <- c("Park", "Agri", "GS")

i <- match(data2$Layer, Values)
data2$LU <- Names[i]


Hope this helps,

Rui Barradas





Re: [R] Problem with filling dataframe's column

2023-06-11 Thread Rui Barradas

On 11/06/2023 at 22:54, javad bayat wrote:

Dear Rui;
Many thanks for your email. I used one of your codes,
"data2$LU[which(data2$Layer == "Level 12")] <- "Park"", and it works
correctly for me.
Actually, I need to expand the code to consider all "Levels" in the
"Layer" column. There are more than a hundred levels in the Layer column.
If I use your provided code, I have to write it hundreds of times, as below:
data2$LU[which(data2$Layer == "Level 1")] <- "Park";
data2$LU[which(data2$Layer == "Level 2")] <- "Agri";
...
...
...
.
Is there any other way to expand the code in order to consider all of the
levels simultaneously? Like the below code:
data2$LU[which(data2$Layer == c("Level 1","Level 2", "Level 3", ...))] <-
c("Park", "Agri", "GS", ...)


Sincerely




On Sun, Jun 11, 2023 at 1:43 PM Rui Barradas  wrote:


On 11/06/2023 at 21:05, javad bayat wrote:

Dear R users;
I am trying to fill a column based on a specific value in another column of
a dataframe, but it seems there is a problem with the code!
The "Layer" and the "LU" are two different columns of the dataframe.
How can I fix this?
Sincerely


for (i in 1:nrow(data2$Layer)){
if (data2$Layer == "Level 12") {
data2$LU == "Park"
}
}





Hello,

There are three bugs in your code:

1) the index i is not used in the loop;
2) the assignment operator is `<-`, not `==`;
3) data2$Layer is a vector, so nrow(data2$Layer) is NULL and
   1:nrow(data2$Layer) fails; loop over seq_len(nrow(data2)) instead.


Here is the loop corrected.

for (i in seq_len(nrow(data2))) {
  if (data2$Layer[i] == "Level 12") {
    data2$LU[i] <- "Park"
  }
}



But R is a vectorized language; the following two ways are the idiomatic
ways of doing what you want to do.



i <- data2$Layer == "Level 12"
data2$LU[i] <- "Park"

# equivalent one-liner
data2$LU[data2$Layer == "Level 12"] <- "Park"



If there are NAs in data2$Layer, it is probably safer to wrap the logical
index in ?which() to get a numeric one.



i <- which(data2$Layer == "Level 12")
data2$LU[i] <- "Park"

# equivalent one-liner
data2$LU[which(data2$Layer == "Level 12")] <- "Park"


Hope this helps,

Rui Barradas





Hello,

You don't need to repeat the same instruction 100+ times; there is a way
of assigning all new LU values at the same time with match().

This assumes that you have the new values in a vector.


Values <- sort(unique(data2$Layer))
Names <- c("Park", "Agri", "GS")

i <- match(data2$Layer, Values)
data2$LU <- Names[i]


Hope this helps,

Rui Barradas



Re: [R] Problem with filling dataframe's column

2023-06-11 Thread Rui Barradas

On 11/06/2023 at 21:05, javad bayat wrote:

Dear R users;
I am trying to fill a column based on a specific value in another column of
a dataframe, but it seems there is a problem with the code!
The "Layer" and the "LU" are two different columns of the dataframe.
How can I fix this?
Sincerely


for (i in 1:nrow(data2$Layer)){
   if (data2$Layer == "Level 12") {
   data2$LU == "Park"
   }
   }





Hello,

There are three bugs in your code:

1) the index i is not used in the loop;
2) the assignment operator is `<-`, not `==`;
3) data2$Layer is a vector, so nrow(data2$Layer) is NULL and
   1:nrow(data2$Layer) fails; loop over seq_len(nrow(data2)) instead.


Here is the loop corrected.

for (i in seq_len(nrow(data2))) {
  if (data2$Layer[i] == "Level 12") {
    data2$LU[i] <- "Park"
  }
}



But R is a vectorized language; the following two ways are the idiomatic
ways of doing what you want to do.




i <- data2$Layer == "Level 12"
data2$LU[i] <- "Park"

# equivalent one-liner
data2$LU[data2$Layer == "Level 12"] <- "Park"



If there are NAs in data2$Layer, it is probably safer to wrap the logical
index in ?which() to get a numeric one.




i <- which(data2$Layer == "Level 12")
data2$LU[i] <- "Park"

# equivalent one-liner
data2$LU[which(data2$Layer == "Level 12")] <- "Park"


Hope this helps,

Rui Barradas



Re: [R] Recombining Mon and Year values

2023-05-16 Thread Rui Barradas

On 16/05/2023 at 21:29, Jeff Reichman wrote:

R Help

I have a data.frame where I've broken out the year and the ordered
month values. But I need to recombine them so I can graph mon-year in
order, and when I recombine them I lose the month order and the results are
plotted alphabetically.

Year  month  mon_year
2021  Mar    Mar-2021
2021  Jan    Jan-2021
2021  Apr    Apr-2021

So do I need to convert the months back to integers and then recombine them
to plot?

Jeff Reichman



Hello,

You can use the function as.yearmon from package zoo to get the correct
month/year order.




df1 <- data.frame(Year = c(2021, 2021, 2021),
  Mon = c("Mar", "Jan", "Apr"))

df1$mon_year <- zoo::as.yearmon(paste(df1$Mon, df1$Year))

sort(df1$mon_year)
#> [1] "Jan 2021" "Mar 2021" "Apr 2021"
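If you prefer to avoid the zoo dependency, a base-R sketch is to turn the month abbreviations into numbers with match() against the built-in month.abb, and make mon_year a factor whose levels follow the chronological order; ggplot2 orders a discrete axis by the factor levels. This assumes English month abbreviations, as in the question.

```r
df1 <- data.frame(Year = c(2021, 2021, 2021),
                  Mon = c("Mar", "Jan", "Apr"))

# month number from the English abbreviation (month.abb is built into R)
mon_num <- match(df1$Mon, month.abb)

# chronological order: by year first, then by month number
ord <- order(df1$Year, mon_num)

# factor levels in that order, so plots keep Jan, Mar, Apr
labels <- paste(df1$Mon, df1$Year, sep = "-")
df1$mon_year <- factor(labels, levels = unique(labels[ord]))
levels(df1$mon_year)
```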



Hope this helps,

Rui Barradas



Re: [R] Newbie: Drawing fitted lines on subset of data

2023-05-16 Thread Rui Barradas

On 16/05/2023 at 15:29, Kevin Zembower via R-help wrote:

Hello,

I'm still working with my tsibble of weight data for the last 20 years.
In addition to drawing an overall trend line, using lm, for the whole
data set, I'd like to draw short lines that would recompute lm and draw
it, say, just for the years from 2010:2015.

Here's a short example that I think illustrates what I'm trying to do.
The commented-out sections show what I've tried so far:

## Short example to test segments:

w <- tsibble(
  date = as.Date("2022-01-01") + 0:99,
  value = rnorm(100)
)

ggplot(data = w, mapping = aes(date, value)) +
  geom_smooth(method = "lm", se = FALSE) +
  geom_point()
  ## Below gives error about ignoring data
  ## geom_abline( data = w$date[25:75] )
  ## Gives error ''data' must be in '
  ## geom_smooth(data = w$date[25:35],
  ## method = lm,
  ## color = "black",
  ## se = FALSE)

I'm thinking that this is probably easily done, but I'm struggling with
how to subset the data in the middle of the pipeline.

Thanks for any advice and help.

-Kevin


Hello,

Try the following.
In the 2nd geom_smooth you need a subset of the data, not just one of
its columns.




suppressPackageStartupMessages({
  library(tsibble)
  library(dplyr)
  library(ggplot2)
  library(lubridate)
})

ggplot(data = w, mapping = aes(date, value)) +
  geom_smooth(formula = y ~ x, method = "lm", se = FALSE) +
  geom_point() +
  geom_smooth(
    data = w %>% filter(year(date) >= 2010, year(date) <= 2015),
    mapping = aes(date, value),
    formula = y ~ x,
    method = lm,
    color = "black",
    se = FALSE
  )


Other ways to subset the data are


# dplyr
data = w %>% filter(year(date) %in% 2010:2015)
# base R
data = subset(w, year(date) %in% 2010:2015)
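Note that the toy w in the question only covers 100 days of 2022, so a 2010-2015 filter on it returns zero rows. A base-R sketch of the same idea, one fit on all the data and a second fit drawn only over a subset, using made-up weekly data spanning 2005-2020 (not Kevin's real series) and plot()/abline()/lines() in place of ggplot2:

```r
set.seed(1)
# Made-up weekly weights spanning 2005-2020, with a slow upward drift
w <- data.frame(date = seq(as.Date("2005-01-02"), as.Date("2020-12-27"), by = "week"))
w$value <- 180 + 0.001 * as.numeric(w$date - w$date[1]) + rnorm(nrow(w))

# Keep only the rows from 2010 to 2015
yr <- as.integer(format(w$date, "%Y"))
w_sub <- w[yr >= 2010 & yr <= 2015, ]

fit_all <- lm(value ~ date, data = w)      # trend for the whole series
fit_sub <- lm(value ~ date, data = w_sub)  # trend for the subset only

plot(value ~ date, data = w, pch = 20, col = "grey")
abline(fit_all, col = "blue")                # full-range trend line
lines(w_sub$date, fitted(fit_sub), lwd = 2)  # short line over 2010-2015 only
```

The key point is the same as in the ggplot2 answer: the second fit gets a subset of the whole data frame, not a subset of one column.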


Hope this helps,

Rui Barradas



Re: [R] Error message when using 'optim' for numerical maximum likelihood

2023-05-14 Thread Rui Barradas

On 14/05/2023 at 06:28, iguodala edwin via R-help wrote:

Good morning. How can I resolve the error message New_X with convergence 1? Thanks

Hello,

Please include data and the code you tried in your questions to R-Help.
We'll be glad to help but like this it is not possible to do so.

Rui Barradas
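For reference, optim() reports convergence = 1 when the iteration limit maxit was reached before the method converged (see ?optim; the default maxit is 500 for the default Nelder-Mead method). A minimal sketch on a made-up quadratic objective, not the poster's likelihood, shows the code and how a larger budget changes it:

```r
# Hypothetical objective with its minimum at (3, -1); not the poster's model
f <- function(p) (p[1] - 3)^2 + (p[2] + 1)^2

# Default method is Nelder-Mead; a tiny maxit stops it early
res_short <- optim(c(0, 0), f, control = list(maxit = 5))
res_short$convergence  # 1: iteration limit reached

# With a larger budget the same problem converges
res_long <- optim(c(0, 0), f, control = list(maxit = 500))
res_long$convergence   # 0: successful completion
```

With a real likelihood, raising maxit, rescaling parameters, or trying another method are the usual things to check, but without the data and code that is only a guess.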



Re: [R] aggregate wind direction data with wind speed required

2023-05-13 Thread Rui Barradas

On 13/05/2023 at 15:51, Stefano Sofia wrote:

Dear list users,

I have to aggregate wind direction data (wd) using a function that requires 
also a second input variable, wind speed (ws).

This is the function that I need to use:


my_fun <- function(wd1, ws1){

   u_component <- -ws1*sin(2*pi*wd1/360)
   v_component <- -ws1*cos(2*pi*wd1/360)
   mean_u <- mean(u_component, na.rm=T)
   mean_v <- mean(v_component, na.rm=T)
   mean_wd <- (atan2(mean_u, mean_v) * 360/2/pi) + 180
   result <- mean_wd
   result
}

Does the aggregate function work only with functions with a single input
variable (the one that I want to aggregate), or can its use be extended to
functions with two input variables?

Here is a simple example (which is meaningless; the important thing is the concept
behind it):
df <- data.frame(day=c(1, 1, 1, 2, 2, 2, 3, 3), month=c(1, 1, 2, 2, 2, 2, 2, 
2), wd=c(45, 90, 90, 135, 180, 270, 270, 315), ws=c(7, 7, 8, 3, 2, 7, 14, 13))

aggregate(wd ~ day + month, data=df, FUN = my_fun)

cannot work, because ws is not taken into consideration.

I got lost. Any hint, any help?
I hope to have been able to explain my problem.
Thank you for your attention,
Stefano


  (oo)
--oOO--( )--OOo--
Stefano Sofia PhD
Civil Protection - Marche Region - Italy
Meteo Section
Snow Section
Via del Colle Ameno 5
60126 Torrette di Ancona, Ancona (AN)
Uff: +39 071 806 7743
E-mail: stefano.so...@regione.marche.it
---Oo-oO




Hello,

Use the dots argument to pass any number of named arguments to your 
aggregation function.

In this case, ws1 = ws at the end of the aggregate call.


aggregate(wd ~ day + month, data=df, FUN = my_fun, ws1 = ws)


You can also give the user the option to remove NAs or not by adding an
na.rm argument:



my_fun <- function(wd1, ws1, na.rm = FALSE) {
  [...]
  mean_u <- mean(u_component, na.rm = na.rm)
  mean_v <- mean(v_component, na.rm = na.rm)
  [...]
}

aggregate(wd ~ day + month, data=df, FUN = my_fun, ws1 = ws, na.rm = TRUE)


Hope this helps,

Rui Barradas
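A caveat on the call above: arguments in aggregate()'s dots are evaluated in the calling environment, not inside df, so ws1 = ws is likely to fail with "object 'ws' not found"; and even ws1 = df$ws would pass the whole column to every group instead of that group's rows. A minimal base-R sketch that keeps wd and ws together is to split the rows by day and month and call my_fun() on each piece. It uses the toy df from the question and the na.rm version of my_fun suggested above; split() and lapply() replace aggregate() here.

```r
# my_fun with the optional na.rm argument
my_fun <- function(wd1, ws1, na.rm = FALSE) {
  u_component <- -ws1 * sin(2 * pi * wd1 / 360)
  v_component <- -ws1 * cos(2 * pi * wd1 / 360)
  mean_u <- mean(u_component, na.rm = na.rm)
  mean_v <- mean(v_component, na.rm = na.rm)
  (atan2(mean_u, mean_v) * 360 / 2 / pi) + 180
}

# toy data from the question
df <- data.frame(day = c(1, 1, 1, 2, 2, 2, 3, 3),
                 month = c(1, 1, 2, 2, 2, 2, 2, 2),
                 wd = c(45, 90, 90, 135, 180, 270, 270, 315),
                 ws = c(7, 7, 8, 3, 2, 7, 14, 13))

# one data frame per (day, month) combination; drop = TRUE skips empty ones
groups <- split(df, df[c("day", "month")], drop = TRUE)

# call my_fun on the matching wd and ws values of each group
res <- do.call(rbind, lapply(groups, function(d) {
  data.frame(day = d$day[1], month = d$month[1],
             wd = my_fun(d$wd, d$ws, na.rm = TRUE))
}))
rownames(res) <- NULL
res
```

For day 1, month 1 (45 and 90 degrees at equal speed) the vector mean direction is 67.5 degrees, which is a quick sanity check of the result.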



Re: [R] Newbie: Controlling legends in graphs

2023-05-12 Thread Rui Barradas

On 12/05/2023 at 14:24, Kevin Zembower via R-help wrote:

Hello, I'm trying to create a line graph with a legend, but have no
success controlling the legend. Since nothing I've tried seems to work,
I must be doing something systematically wrong. Can anyone point this
out to me?

Here's my data:
> weights
# A tibble: 1,246 × 3
   Date           J     K
 1 2000-02-13   133  188
 2 2000-02-20   134  185
 3 2000-02-27   135  187
 4 2000-03-05   135  185
 5 2000-03-12    NA  184
 6 2000-03-19    NA  184.
 7 2000-03-26   136  184.
 8 2000-04-02   134  185
 9 2000-04-09   133  186
10 2000-04-16    NA  186
# ℹ 1,236 more rows
# ℹ Use `print(n = ...)` to see more rows
>

Here's my attempts. You can see some of the things I've tried in the
commented out sections:
weights %>%
  group_by(year(Date)) %>%
  summarize(
  m_K = mean(K, na.rm = TRUE),
  m_J = mean(J, na.rm = TRUE),
  ) %>%
  ggplot(aes(x = `year(Date)`)) +
  geom_point(aes(y = m_K, color = "red")) +
  geom_smooth(aes(y = m_K, color = "red")) +
  geom_point(aes(y = m_J, color = "blue")) +
  geom_smooth(aes(y = m_J, color = "blue")) +
  guides(size = "legend",
 shape = "legend")
  ## scale_shape_discrete(name="Person",
  ##  breaks=c("m_K", "m_J"),
  ##  labels=c("K", "J"))
  ## theme(legend.title=element_blank())

When this runs, the blue line for "K" is above the red line for "J", as
I expect, but in the legend, the red is shown first, and labeled "blue."

I'd like to be able to create a legend where the first entry shows a
blue line and is labeled "K" and the second is red and labeled "J".

On a different but related topic, I'd welcome any advice or suggestions
on my methodology in this example. Is this the correct way to summarize
with a mean? Do I need the two sets of geom_point and geom_line clauses
to create this graph, or is there a better way?

Thanks for all your advice and guidance.

-Kevin



Hello,

This is mainly a data reshaping problem. Instead of plotting two
variables, J and K, if the data is in the long format you will map the
column with these variables' names to the color aesthetic and call each
geom_* only once. Then, assign the colors you want.


As for placing K above J, note that ggplot places them in alphabetical
order unless you coerce to factor with the levels in the order you want.


Also, if you want to compute aggregate statistics for several columns, 
use ?across. See the code below.


Here is a complete example. I have augmented your data set in order to 
have more years to plot.




# augment the data set
weights <- "      Date   J     K
 1 2000-02-13   133  188
 2 2000-02-20   134  185
 3 2000-02-27   135  187
 4 2000-03-05   135  185
 5 2000-03-12    NA  184
 6 2000-03-19    NA  184.
 7 2000-03-26   136  184.
 8 2000-04-02   134  185
 9 2000-04-09   133  186
10 2000-04-16    NA  186"
weights <- read.table(text = weights, header = TRUE)
weights$Date <- as.Date(weights$Date)
tmp <- weights
# shift the dates by 1 to 10 years and jitter the weights;
# lubridate:: is needed because the package is only attached further down
tmp <- lapply(1:10, \(y) {
  tmp$Date <- lubridate::years(y) + tmp$Date
  tmp$J <- tmp$J + sample(-10:10, nrow(weights), TRUE)
  tmp$K <- tmp$K + sample(-10:10, nrow(weights), TRUE)
  tmp
})
weights <- do.call(rbind, tmp)

#---

# plot code
library(ggplot2)
library(dplyr)
library(tidyr)
library(lubridate)

weights %>%
  mutate(Year = year(Date)) %>%
  group_by(Year) %>%
  # an anonymous function instead of passing na.rm through across()'s dots,
  # which is deprecated in recent dplyr versions
  summarize(across(J:K, \(x) mean(x, na.rm = TRUE))) %>%
  # now reshape the data
  pivot_longer(-Year) %>%
  # uncomment the next line if you want K
  # to show up on top in the legend
  # mutate(name = factor(name, levels = c("K", "J"))) %>%
  ggplot(aes(Year, value, color = name)) +
  geom_smooth(
    formula = y ~ x,
    method = lm,
    se = FALSE
  ) +
  geom_point() +
  scale_color_manual(values = c(J = "red", K = "blue"))



Hope this helps,

Rui Barradas
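To see what the long format looks like, here is a base-R sketch with made-up yearly means (not Kevin's data). tidyr::pivot_longer(-Year) produces the same rows, ordered by Year rather than by name. Setting the factor levels is what controls the legend order.

```r
# Hypothetical yearly means in wide format (one column per person)
yearly <- data.frame(Year = 2000:2002, J = c(134, 130, 138), K = c(186, 190, 183))

# Long format: one row per (Year, person) with the person's name in a column
long <- data.frame(
  Year  = rep(yearly$Year, times = 2),
  name  = rep(c("J", "K"), each = nrow(yearly)),
  value = c(yearly$J, yearly$K)
)

# Factor levels decide the legend order: K first, then J
long$name <- factor(long$name, levels = c("K", "J"))
long
```

Mapping color = name on this table gives one legend entry per level, so each geom_* is only needed once.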



Re: [R] data.frame with a column containing an array

2023-05-09 Thread Rui Barradas

On 08/05/2023 at 11:52, Georg Kindermann wrote:

Dear list members,

when I create a data.frame containing an array, I expected to get a result
similar to that of a matrix in a data.frame when subsetting it. Instead, I
get only the first element and not all values of the remaining dimensions.
Differences already appear when creating the data.frame: I can use `I` for
a matrix, but I can only insert an array in a second step.

DFA <- data.frame(id = 1:2)
DFA[["ar"]] <- array(1:8, c(2,2,2))

DFA[1,]
#  id ar
#1  1  1

DFM <- data.frame(id = 1:2, M = I(matrix(1:4, 2)))

DFM[1,]
#  id M.1 M.2
#1  1   1   3

The same happens when using merge, where only the first value is kept.

merge(DFA, data.frame(id = 1))
#  id ar
#1  1  1

merge(DFM, data.frame(id = 1))
#  id M.1 M.2
#1  1   1   3

Is there a way to use an array in a data.frame like I can use a matrix in a 
data.frame?

I am using R version 4.3.0.

Kind regards,
Georg


Hello,

Are you looking for something like this?


DFA <- data.frame(id = 1:2)
DFA[["ar"]] <- array(1:8, c(2,2,2))

DFA$ar[1, , ]
#>      [,1] [,2]
#> [1,]    1    5
#> [2,]    3    7


Hope this helps,

Rui Barradas
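If the goal is to keep one whole slice of the array with each row, one option, sketched here as an assumption rather than a change to how [.data.frame treats arrays, is to store the slices in a list column; row subsetting then keeps each 2 x 2 slice intact.

```r
ar <- array(1:8, c(2, 2, 2))

DFA <- data.frame(id = 1:2)
# one list element per row: the matrix ar[i, , ]
DFA$ar <- lapply(seq_len(nrow(DFA)), function(i) ar[i, , ])

# the first row now carries the full slice ar[1, , ]
DFA[1, ]$ar[[1]]
```

The price is that the column is a list, so operations that expect an atomic column need an explicit lapply() or sapply().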



Re: [R] grubbs test to detect all outliers

2023-04-29 Thread Rui Barradas

On 29/04/2023 at 14:01, AbouEl-Makarim Aboueissa wrote:

Hi Rui:


How about this dataset, please see below. I included a few outliers in each
column, as you can see in the printed dataset; please see below.


Once again, thank you very much, and sorry if I bothered you all.

abou




dput(datafortest)

structure(list(factor1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, NA, NA, NA, NA), levels = c("1", "2", "3"), class = "factor"),
 X = c(994455.077, 4348.031, .789, 3813.139, 12.65, 5642.667,
 876684.386, 5165.731, NA, 3259.241, 8.383, 1997.878, 0.608,
 2655.977, 9.49, 1826.851, 4386.002, 883295.091, 2120.902,
 NA, 2056.123, 5.088, NA, 92539.873, NA, NA, NA, NA), Y = c(76888L,
 333L, 618L, 10L, 344L, NA, 3L, 86999L, 265L, 557L, 7L,
 383L, NA, NA, 8L, 287L, 352L, 308L, 999526L, 489L, 2L,
 444L, 9L, 333L, NA, NA, NA, NA), factor2 = structure(c(1L,
 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), levels = c("1",
 "2", "3"), class = "factor"), Z = c(54999L, 475L, 15L, 603L,
 442L, 79486L, 927L, 971L, 388L, 888L, 514L, 409L, 546L, 523L,
 313L, 296L, 320L, 388L, 7L, 677L, 555L, NA, 479L, 257L,
 313L, 21L, 320L, 4L), U = c(NA, NA, 1.5, 332, 216, 217, 1000,
 10, , 444, NA, 5, 327, 5, 456, 412, 251, 6, 398,
 438, 428, 15, NA, 406, 334, 465, 180, 88999), V = c(12, 240,
 9000, 265, NA, 9, 1, 562, 13, 777, 322, NA, 99988, 653,
 450, 576, NA, 396.5, 91888, 5, 219, NA, 321, 417, 409, 99,
 523, 10)), row.names = c(NA, -28L), class = "data.frame")







datafortest

   factor1          X      Y factor2     Z       U       V
 1       1 994455.077  76888       1 54999      NA    12.0
 2       1   4348.031    333       1   475      NA   240.0
 3       1       .789    618       1    15     1.5  9000.0
 4       1   3813.139     10       1   603   332.0   265.0
 5       1     12.650    344       1   442   216.0      NA
 6       1   5642.667     NA       1 79486   217.0     9.0
 7       1 876684.386      3       1   927  1000.0     1.0
 8       2   5165.731  86999       1   971    10.0   562.0
 9       2         NA    265       1   388      .0    13.0
10       2   3259.241    557       2   888   444.0   777.0
11       2      8.383      7       2   514      NA   322.0
12       2   1997.878    383       2   409     5.0      NA
13       2      0.608     NA       2   546   327.0 99988.0
14       2   2655.977     NA       2   523     5.0   653.0
15       3      9.490      8       2   313   456.0   450.0
16       3   1826.851    287       2   296   412.0   576.0
17       3   4386.002    352       2   320   251.0      NA
18       3 883295.091    308       2   388     6.0   396.5
19       3   2120.902 999526       3     7   398.0 91888.0
20       3         NA    489       3   677   438.0     5.0
21       3   2056.123      2       3   555   428.0   219.0
22       3      5.088    444       3    NA    15.0      NA
23       3         NA      9       3   479      NA   321.0
24       3  92539.873    333       3   257   406.0   417.0
25    <NA>         NA     NA       3   313   334.0   409.0
26    <NA>         NA     NA       3    21   465.0    99.0
27    <NA>         NA     NA       3   320   180.0   523.0
28    <NA>         NA     NA       3     4 88999.0    10.0






with many thanks
abou

__


*AbouEl-Makarim Aboueissa, PhD*

*Professor, Mathematics and Statistics*
*Graduate Coordinator*

*Department of Mathematics and Statistics*
*University of Southern Maine*



On Sat, Apr 29, 2023 at 8:05 AM Rui Barradas  wrote:


On 28/04/2023 at 14:09, AbouEl-Makarim Aboueissa wrote:

*R:* Grubbs test to detect all outliers per group for all columns in a data frame



Dear All: good morning

I have a dataset (as an example) with two factor columns (factor1 and
factor2) and 5 numerical columns (X,Y,Z,U,V). The X and Y columns have the same
length as factor1, and Z, U, and V have the same length as factor2. Please see
the dataset copied below. Please note that all dataset columns have NA
values.

*Need help on this:*


Can we use the grubbs.test() function to detect all outliers and replace them
with NA in the X and Y datasets per group in factor1, and in the Z, U, and V
datasets per group in factor2? Columns in the dataframe have different lengths,
but when I read the .csv file, R added NA values for the shorter columns.

If you need the .csv data file, please let me know.


Thank you very much for your help in advance.




install.packages("outliers")
library(outliers)

datafortest<-read.csv("G:/data_for_test.csv", header=TRUE)
datafortest

datafortest<-data.frame(datafortest)

datafortest$factor1<-as.factor(datafortest$factor1)
datafortest$factor2<-as.fact

Re: [R] grubbs test to detect all outliers

2023-04-29 Thread Rui Barradas

On 28/04/2023 at 14:09, AbouEl-Makarim Aboueissa wrote:

*R:* Grubbs test to detect all outliers per group for all columns in a data frame



Dear All: good morning

I have a dataset (as an example) with two factor columns (factor1 and
factor2) and 5 numerical columns (X,Y,Z,U,V). The X and Y columns have the same
length as factor1, and Z, U, and V have the same length as factor2. Please see
the dataset copied below. Please note that all dataset columns have NA
values.

*Need help on this:*


Can we use the grubbs.test() function to detect all outliers and replace them
with NA in the X and Y datasets per group in factor1, and in the Z, U, and V
datasets per group in factor2? Columns in the dataframe have different lengths,
but when I read the .csv file, R added NA values for the shorter columns.

If you need the .csv data file, please let me know.


Thank you very much for your help in advance.




install.packages("outliers")
library(outliers)

datafortest<-read.csv("G:/data_for_test.csv", header=TRUE)
datafortest

datafortest<-data.frame(datafortest)

datafortest$factor1<-as.factor(datafortest$factor1)
datafortest$factor2<-as.factor(datafortest$factor2)

str(datafortest)

# tried to use grubbs.test() on a single column of the dataframe, but
# still not working
tests.for.outliers.X<- grubbs.test(datafortest$X, na.rm = TRUE, type=11)




*grubbs.test() on a single dataset: but this can only detect if the min and
the max are outliers.*


xx999<-c(0.088,1,2,3,4,5,6,7,8,9,88,98,99)
grubbs.test(xx999, type=11)




With many thanks

Abou



factor1         X    Y factor2    Z    U     V
      1  4455.077  888       1  999   NA   999
      1  4348.031  333       1  475   NA   240
      1      .789  618       1  507  252   394
      1  3813.139  417       1  603  332   265
      1   7512.65  344       1  442  216    NA
      1  5642.667   NA       1  486  217   275
      1  6684.386  341       1  927  698   479
      2  5165.731  999       1  971  311   562
      2        NA  265       1  388  999   512
      2  3259.241  557       2  888  444   777
      2  3288.383  234       2  514   NA   322
      2  1997.878  383       2  409  311    NA
      2      0.61   NA       2  546  327   728
      2  2655.977   NA       2  523  228   653
      3   3189.49            2  313  456   450
      3  1826.851  287       2  296  412   576
      3  4386.002  352       2  320  251    NA
      3  3295.091  308       2  388  888 396.5
      3  2120.902  526       3       398   888
      3        NA  489       3  677  438   307
      3  2056.123  291       3  555  428   219
      3  1995.088  444       3   NA  319    NA
      3        NA  349       3  479   NA   321
      3  2539.873  333       3  257  406   417
                             3  313  334   409
                             3  296  465   546
                             3  320  180   523
                             3  388  999   313



__


*AbouEl-Makarim Aboueissa, PhD*

*Professor, Mathematics and Statistics*
*Graduate Coordinator*

*Department of Mathematics and Statistics*
*University of Southern Maine*


Hello,

With the data file you have attached I cannot reproduce any errors;
everything worked on the first try.



library(outliers)

fl <- "~/data_for_test.csv"
datafortest <- read.csv(fl)

# these are not needed to run the test
datafortest$factor1 <- as.factor(datafortest$factor1)
datafortest$factor2 <- as.factor(datafortest$factor2)
str(datafortest)
#> 'data.frame':28 obs. of  7 variables:
#>  $ factor1: Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 2 2 2 ...
#>  $ X  : num  4455 4348 1 3813 7513 ...
#>  $ Y  : int  888 333 618 417 344 NA 341 999 265 557 ...
#>  $ factor2: Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 2 ...
#>  $ Z  : int  999 475 507 603 442 486 927 971 388 888 ...
#>  $ U  : int  NA NA 252 332 216 217 698 311 999 444 ...
#>  $ V  : num  999 240 394 265 NA 275 479 562 512 777 ...
head(datafortest)
#>   factor1X   Y factor2   Z   U   V
#> 1   1 4455.077 888   1 999  NA 999
#> 2   1 4348.031 333   1 475  NA 240
#> 3   1 .789 618   1 507 252 394
#> 4   1 3813.139 417   1 603 332 265
#> 5   1 7512.650 344   1 442 216  NA
#> 6   1 5642.667  NA   1 486 217 275

# tried to use grubbs.test() on a single column of the dataframe, but
# still not working
grubbs.test(datafortest$X, type = 11)
#>
#>  Grubbs test for two opposite outliers
#>
#> data:  datafortest$X
#> G = 4.6640014, U = 0.0091756, p-value = 0.02867
#> alternative hypothesis: 1826.851 and 0.608 are outliers



Hope this helps,

Rui Barradas
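grubbs.test() tests only the most extreme value (or the min/max pair) at a time. To flag all outliers per group and set them to NA, one common approach is to repeat a two-sided Grubbs test, removing the flagged value each time, until nothing is significant. Below is a base-R sketch of that loop, not a function of the outliers package, assuming alpha = 0.05, at least 3 non-missing values, and the textbook critical value computed from the t distribution. Repeated Grubbs tests can suffer from masking when several outliers sit close together, so Rosner's generalized ESD test may be a better fit. Also, as far as I can tell, grubbs.test() has no na.rm argument, which may be what broke the call in the question. The helper name grubbs_to_na and the usage data are hypothetical.

```r
# Iterated two-sided Grubbs test: set flagged values to NA (base-R sketch)
grubbs_to_na <- function(x, alpha = 0.05) {
  repeat {
    ok <- which(!is.na(x))
    n <- length(ok)
    if (n < 3) break                      # the test needs at least 3 values
    v <- x[ok]
    s <- sd(v)
    if (s == 0) break                     # all values equal: nothing to flag
    dev <- abs(v - mean(v))
    G <- max(dev) / s                     # Grubbs statistic
    t2 <- qt(alpha / (2 * n), df = n - 2, lower.tail = FALSE)^2
    G_crit <- (n - 1) / sqrt(n) * sqrt(t2 / (n - 2 + t2))
    if (G <= G_crit) break                # most extreme value is not an outlier
    x[ok[which.max(dev)]] <- NA           # drop it and test the rest again
  }
  x
}

# Usage per group with ave(); hypothetical data, not the poster's file
d <- data.frame(g = rep(c("a", "b"), each = 8),
                X = c(1, 2, 3, 2, 1, 3, 2, 100, 5, 6, 5, 7, 6, 5, 6, 7))
d$X <- ave(d$X, d$g, FUN = grubbs_to_na)
d$X
```

With the poster's data the per-group call would look like datafortest$X <- ave(datafortest$X, datafortest$factor1, FUN = grubbs_to_na), repeated for Y with factor1 and for Z, U and V with factor2; ave() leaves rows whose group is NA unchanged.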


Re: [R] grubbs test to detect all outliers

2023-04-28 Thread Rui Barradas

At 14:09 on 28/04/2023, AbouEl-Makarim Aboueissa wrote:

*Re:* Grubbs test to detect all outliers per group for all columns in a data
frame



Dear All: good morning

I have a dataset (as an example) with two factor columns (factor1 and
factor2) and 5 numerical columns (X, Y, Z, U, V). The X and Y columns have the
same length as factor1, and Z, U and V have the same length as factor2. The
dataset is copied below. Please note that all dataset columns have NA
values.

*Need help on this:*


Can we use the grubbs.test() function to detect all outliers and replace them
by NA in X and Y per group of factor1, and in Z, U and V per group of
factor2? The columns in the data frame have different lengths, but when I
read the .csv file, R added NA values to the shorter columns.

If you need the .csv data file, please let me know.


Thank you very much for your help in advance.




install.packages("outliers")
library(outliers)

datafortest<-read.csv("G:/data_for_test.csv", header=TRUE)
datafortest

datafortest<-data.frame(datafortest)

datafortest$factor1<-as.factor(datafortest$factor1)
datafortest$factor2<-as.factor(datafortest$factor2)

str(datafortest)

# tried to use grubbs.test() on a single column of the dataframe, but
# still not working
tests.for.outliers.X<- grubbs.test(datafortest$X, na.rm = TRUE, type=11)




*grubbs.test() on a single dataset: but this can only detect if the min and
the max are outliers.*


xx999<-c(0.088,1,2,3,4,5,6,7,8,9,88,98,99)
grubbs.test(xx999, type=11)




With many thanks

Abou



factor1  XY factor2  Z   U
   V
1 4455.077 888 1 999   NA 999
1 4348.031 333 1 475NA 240
1.789 618 1 507 252 394
13813.139 417 1 603 332 265
1  7512.65 344 1 442 216   NA
1 5642.667NA 1 486 217 275
1 6684.386 341 1 927 698 479
2 5165.731 999 1 971 311 562
2 NA 265 1 388 999 512
2 3259.241 557 2 888 444 777
2 3288.383 234 2 514NA 322
2  1997.878 383 2 409 311   NA
2   0.61   NA 2 546 327 728
2   2655.977  NA 2 523 228 653
3  3189.49  2 313 456 450
3  1826.851 287 2 296 412 576
3  4386.002 352 2 320 251 NA
3  3295.091 308 2 388 888 396.5
3  2120.902 526 3  398 888
3 NA 489 3 677 438 307
3  2056.123 291 3 555 428 219
3  1995.088 444 3  NA 319   NA
3 NA 349 3 479   NA 321
3  2539.873 333 3 257 406 417
   3 313 334 409
   3 296 465 546
   3 320 180 523
   3 388 999 313



__


*AbouEl-Makarim Aboueissa, PhD*

*Professor, Mathematics and Statistics*
*Graduate Coordinator*

*Department of Mathematics and Statistics*
*University of Southern Maine*



Hello,

Please post the output of

dput(datafortest)

Your data is difficult to read into an R session.


Hope this helps,

Rui Barradas



Re: [R] grDevices::hcl.colors using two colours: Bug or Feature?

2023-04-28 Thread Rui Barradas

At 11:07 on 28/04/2023, Achim Zeileis wrote:

This was introduced in 4.3.0 (hence Rui cannot reproduce it in 4.2.3).

It's a bug and was introduced when fixing this other bug:

https://bugs.R-project.org/show_bug.cgi?id=18476
https://hypatia.math.ethz.ch/pipermail/r-help/2023-February/476960.html

Apparently, it only affects the case with n = 2 for diverging and 
divergingx palettes. The culprit is this line:


i <- if(n2 == 1L) 0 else seq.int(1, by = -2/(n - 1), length.out = n2)

I think n2 == 1L is not the right condition and we need to distinguish
n = 1 and n = 2.
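To see why only n = 2 breaks, the quoted index line can be run on its own. This is a sketch outside grDevices: idx_buggy() and idx_fixed() are hypothetical names, and n2 = ceiling(n / 2) is an assumption about how hcl.colors() sets the number of colours per arm. It also assumes an index of 0 maps to the neutral centre of the palette (the shared "#F1F1F1" in the report) and 1 to the end of an arm.

```r
# Hypothetical re-creation of the quoted index logic, outside grDevices.
# Assumption: n2 = ceiling(n / 2) colours are taken from each arm.
idx_buggy <- function(n) {
  n2 <- ceiling(n / 2)
  if (n2 == 1L) 0 else seq.int(1, by = -2 / (n - 1), length.out = n2)
}

# Testing n itself separates n = 1 (one neutral colour) from n = 2.
idx_fixed <- function(n) {
  n2 <- ceiling(n / 2)
  if (n == 1L) 0 else seq.int(1, by = -2 / (n - 1), length.out = n2)
}

idx_buggy(2)  # 0: both arms collapse onto the neutral centre
idx_fixed(2)  # 1: both arms at their ends
```

Checking n == 1L is only one way to separate the two cases; the fix that goes into R may look different.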


Will have a closer look...

Thanks for reporting this!
Achim

On Fri, 28 Apr 2023, Rui Barradas wrote:


At 06:01 on 28/04/2023, Stevie Pederson wrote:

Hi,

I'm not sure if this is a bug or a feature, but after updating to Rv4.3, if
requesting two colours from hcl.colors() you now get the same colour twice.
This occurs for all palettes I've tried. My reprex:

hcl.colors(2, "Vik")
[1] "#F1F1F1" "#F1F1F1"

As I have multiple workflows I run repeatedly with A vs B comparisons, this
has just broken the visualisations in many of them. Obviously a
workaround is hcl.colors(3, "Vik")[c(1, 3)] but this seems rather
unintuitive.

Thanks in advance,

Stevie

sessionInfo()
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
  [1] LC_CTYPE=en_AU.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_AU.UTF-8    LC_COLLATE=en_AU.UTF-8
  [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8
  [7] LC_PAPER=en_AU.UTF-8   LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C

time zone: Australia/Adelaide
tzcode source: system (glibc)

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.3.0 tools_4.3.0



Hello,

I cannot reproduce this on Windows.


hcl.colors(2, "Vik")
# [1] "#002E60" "#3E2000"

clrs <- sapply(hcl.pals(), \(p) hcl.colors(2, p))
any(apply(clrs, 2, \(x) x[1] == x[2]))
# [1] FALSE

sessionInfo()
# R version 4.2.3 (2023-03-15 ucrt)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 10 x64 (build 22621)
#
# Matrix products: default
#
# locale:
# [1] LC_COLLATE=Portuguese_Portugal.utf8  LC_CTYPE=Portuguese_Portugal.utf8
# [3] LC_MONETARY=Portuguese_Portugal.utf8 LC_NUMERIC=C
# [5] LC_TIME=Portuguese_Portugal.utf8
#
# attached base packages:
# [1] stats graphics  grDevices utils datasets  methods   base
#
# loaded via a namespace (and not attached):
# [1] compiler_4.2.3


Hope this helps,

Rui Barradas




Hello,

Right! I ran the wrong R version, here it is with R 4.3.0.
The bug is now reproducible on Windows 11.


hcl.colors(2, "Vik")
# [1] "#F1F1F1" "#F1F1F1"

clrs <- sapply(hcl.pals(), \(p) hcl.colors(2, p))
any(apply(clrs, 2, \(x) x[1] == x[2]))
# [1] TRUE

sum(apply(clrs, 2, \(x) x[1] == x[2]))
# [1] 35

which(apply(clrs, 2, \(x) x[1] == x[2]))
#      Blue-Red    Blue-Red 2    Blue-Red 3     Red-Green  Purple-Green
#            80            81            82            83            84
#  Purple-Brown   Green-Brown Blue-Yellow 2 Blue-Yellow 3  Green-Orange
#            85            86            87            88            89
#  Cyan-Magenta        Tropic          Broc          Cork           Vik
#            90            91            92            93            94
#        Berlin        Lisbon        Tofino         Earth          Fall
#            95            96            97            99           100
#        Geyser      TealRose         Temps          PuOr          RdBu
#           101           102           103           104           105
#          RdGy          PiYG          PRGn          BrBG        RdYlBu
#           106           107           108           109           110
#        RdYlGn      Spectral      Zissou 1       Cividis          Roma
#           111           112           113           114           115

sessionInfo()
# R version 4.3.0 (2023-04-21 ucrt)
# Platform: x86_64-w64

Re: [R] grDevices::hcl.colors using two colours: Bug or Feature?

2023-04-28 Thread Rui Barradas

At 06:01 on 28/04/2023, Stevie Pederson wrote:

Hi,

I'm not sure if this is a bug or a feature, but after updating to Rv4.3, if
requesting two colours from hcl.colors() you now get the same colour twice.
This occurs for all palettes I've tried. My reprex:

hcl.colors(2, "Vik")
[1] "#F1F1F1" "#F1F1F1"

As I have multiple workflows I run repeatedly with A vs B comparisons, this
has just broken the visualisations in many of them. Obviously a
workaround is hcl.colors(3, "Vik")[c(1, 3)] but this seems rather
unintuitive.
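Until the bug is fixed, the workaround above can be wrapped in a small helper so that A vs B scripts need only one change. two_colours() is a hypothetical name; it simply keeps the first and last of three colours, which for diverging palettes are the two ends.

```r
# Hypothetical helper wrapping the hcl.colors(3, p)[c(1, 3)] workaround:
# keep the two end colours of a three-colour palette.
two_colours <- function(palette) {
  grDevices::hcl.colors(3, palette)[c(1, 3)]
}

two_colours("Vik")
```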

Thanks in advance,

Stevie

sessionInfo()
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
  [1] LC_CTYPE=en_AU.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_AU.UTF-8    LC_COLLATE=en_AU.UTF-8
  [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8
  [7] LC_PAPER=en_AU.UTF-8   LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C

time zone: Australia/Adelaide
tzcode source: system (glibc)

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.3.0 tools_4.3.0



Hello,

I cannot reproduce this on Windows.


hcl.colors(2, "Vik")
# [1] "#002E60" "#3E2000"

clrs <- sapply(hcl.pals(), \(p) hcl.colors(2, p))
any(apply(clrs, 2, \(x) x[1] == x[2]))
# [1] FALSE

sessionInfo()
# R version 4.2.3 (2023-03-15 ucrt)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 10 x64 (build 22621)
#
# Matrix products: default
#
# locale:
# [1] LC_COLLATE=Portuguese_Portugal.utf8  LC_CTYPE=Portuguese_Portugal.utf8
# [3] LC_MONETARY=Portuguese_Portugal.utf8 LC_NUMERIC=C
# [5] LC_TIME=Portuguese_Portugal.utf8
#
# attached base packages:
# [1] stats graphics  grDevices utils datasets  methods   base
#
# loaded via a namespace (and not attached):
# [1] compiler_4.2.3


Hope this helps,

Rui Barradas



Re: [R] detect and replace outliers by the averaged

2023-04-21 Thread Rui Barradas

Hello,


At 09:42 on 21/04/2023, Jeff Newmiller wrote:

  0


Somewhat cryptic...

Rui Barradas


On April 21, 2023 4:08:08 AM GMT+09:00, Dr Eberhard W Lisse wrote:

There is at least one outliers package on CRAN.

el
On 20/04/2023 20:43, AbouEl-Makarim Aboueissa wrote:

Dear All:  *please discard my previous email*



*Re:* detect and replace outliers by the average



The dataset, please see attached, contains a grouping column “factor” and two
data columns, “x1” and “x2”, with some NA values. I need some help to detect
the outliers and replace them, and the NAs, with the average within each
level (0, 1, 2) of each variable “x1” and “x2”.



I tried the below code, but it did not accomplish what I want to do.





data<-read.csv("G:/20-Spring_2023/Outliers/data.csv", header=TRUE)

data

replace_outlier_with_mean <- function(x) {

   replace(x, x %in% boxplot.stats(x)$out, mean(x, na.rm=TRUE))   # na.rm=TRUE NOT working

}

data[] <- lapply(data, replace_outlier_with_mean)





Thank you all very much for your help in advance.





with many thanks

abou


__


*AbouEl-Makarim Aboueissa, PhD*

*Professor, Mathematics and Statistics*
*Graduate Coordinator*

*Department of Mathematics and Statistics*
*University of Southern Maine*








Re: [R] detect and replace outliers by the average

2023-04-20 Thread Rui Barradas

At 19:58 on 20/04/2023, Rui Barradas wrote:

At 19:46 on 20/04/2023, AbouEl-Makarim Aboueissa wrote:

Hi Rui:


here is the dataset

factor x1 x2
0 700 700
0 700 500
0 470 470
0 710 560
0 5555 520
0 610 720
0 710 670
0 610 9999
1 690 620
1 580 540
1 690 690
1 NA 401
1 450 580
1 700 700
1 400 8888
1 6666 600
1 500 400
1 680 650
2 117 63
2 120 68
2 130 73
2 120 69
2 125 54
2 999 70
2 165 62
2 130 987
2 123 70
2 78
2 98
2 5
2 321 NA

with many thanks
abou
__


*AbouEl-Makarim Aboueissa, PhD*

*Professor, Mathematics and Statistics*
*Graduate Coordinator*

*Department of Mathematics and Statistics*
*University of Southern Maine*



On Thu, Apr 20, 2023 at 2:44 PM Rui Barradas wrote:



At 19:36 on 20/04/2023, AbouEl-Makarim Aboueissa wrote:

Dear All:



*Re:* detect and replace outliers by the average



The dataset, please see attached, contains a grouping column “factor” and two
data columns, “x1” and “x2”, with some NA values. I need some help to detect
the outliers and replace them, and the NAs, with the average within each
level (0, 1, 2) of each variable “x1” and “x2”.



I tried the below code, but it did not accomplish what I want to do.





data<-read.csv("G:/20-Spring_2023/Outliers/data.csv", header=TRUE)

data

replace_outlier_with_mean <- function(x) {

    replace(x, x %in% boxplot.stats(x)$out, mean(x, na.rm=TRUE))   # na.rm=TRUE NOT working

}

data[] <- lapply(data, replace_outlier_with_mean)





Thank you all very much for your help in advance.





with many thanks

abou


__


*AbouEl-Makarim Aboueissa, PhD*

*Professor, Mathematics and Statistics*
*Graduate Coordinator*

*Department of Mathematics and Statistics*
*University of Southern Maine*

Hello,

There is no data set attached, see the posting guide on what file
extensions are allowed as attachments.

As for the question, try to compute mean(x, na.rm = TRUE) first, then
use this value in the replace instruction. Without data I'm just guessing.


Hope this helps,

Rui Barradas





Hello,

Here is a way. It uses ave in the function to group the data by the factor.


df1 <- "factor x1 x2
0 700 700
0 700 500
0 470 470
0 710 560
0 5555 520
0 610 720
0 710 670
0 610 9999
1 690 620
1 580 540
1 690 690
1 NA 401
1 450 580
1 700 700
1 400 8888
1 6666 600
1 500 400
1 680 650
2 117 63
2 120 68
2 130 73
2 120 69
2 125 54
2 999 70
2 165 62
2 130 987
2 123 70
2 78 NA
2 98 NA
2 5 NA
2 321 NA"
df1 <- read.table(text = df1, header = TRUE,
   colClasses = c("factor", "numeric", "numeric"))


replace_outlier_with_mean <- function(x, f) {
   ave(x, f, FUN = \(y) {
     i <- is.na(y) | y %in% boxplot.stats(y, do.conf = FALSE)$out
     y[i] <- mean(y, na.rm = TRUE)
     y
   })
}

lapply(df1[-1], replace_outlier_with_mean, f = df1$factor)
#> $x1
#>  [1]  700.0000  700.0000  470.0000  710.0000 1258.1250  610.0000  710.0000
#>  [8]  610.0000  690.0000  580.0000  690.0000 1261.7778  450.0000  700.0000
#> [15]  400.0000 1261.7778  500.0000  680.0000  117.0000  120.0000  130.0000
#> [22]  120.0000  125.0000  194.6923  194.6923  130.0000  123.0000  194.6923
#> [29]   98.0000  194.6923  194.6923
#>
#> $x2
#>  [1]  700.0000  500.0000  470.0000  560.0000  520.0000  720.0000  670.0000
#>  [8] 1767.3750  620.0000  540.0000  690.0000  401.0000  580.0000  700.0000
#> [15] 1406.9000  600.0000  400.0000  650.0000   63.0000   68.0000   73.0000
#> [22]   69.0000   54.0000   70.0000   62.0000  168.4444   70.0000  168.4444
#> [29]  168.4444  168.4444  168.4444


Hope this helps,

Rui Barradas


Hello,

A simpler version of the same function, this time with replace(), like 
the OP. The results are identical().



replace_outlier_with_mean <- function(x, f) {
  ave(x, f, FUN = \(y) {
    i <- is.na(y) | y %in% boxplot.stats(y, do.conf = FALSE)$out
    replace(y, i, mean(y, na.rm = TRUE))
  })
}


Also, my copy of the data from a previous mail is wrong: there are 3
NA's in the wrong column. The following is better.


df1 <- read.table("data.txt", header = TRUE, sep = "\t",
  colClasses = c("factor", "numeric", "numeric"))


Hope this helps,

Rui Barradas
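To check what replace_outlier_with_mean() does group by group, here is the replace() version run on a tiny made-up data set (x and g below are hypothetical, not the OP's data). boxplot.stats() flags values beyond 1.5 times the hinge spread (its default coef = 1.5), and ave() applies the function to each group and returns the results in the original order. Note that mean(y, na.rm = TRUE) is computed with the outliers still included; replace_outlier_with_clean_mean(), a hypothetical variant, averages only the values that are kept. Using function(y) instead of \(y) also makes it work on R versions older than 4.1.

```r
# The replace() version, with function(y) instead of the \(y) shorthand
replace_outlier_with_mean <- function(x, f) {
  ave(x, f, FUN = function(y) {
    i <- is.na(y) | y %in% boxplot.stats(y, do.conf = FALSE)$out
    replace(y, i, mean(y, na.rm = TRUE))
  })
}

# Hypothetical variant: the replacement mean excludes the flagged values
replace_outlier_with_clean_mean <- function(x, f) {
  ave(x, f, FUN = function(y) {
    i <- is.na(y) | y %in% boxplot.stats(y, do.conf = FALSE)$out
    replace(y, i, mean(y[!i]))
  })
}

# Made-up example: group "a" has one NA and one extreme value (100)
x <- c(10, 11, 12, 11, 10, NA, 100, 5, 6, 7)
g <- factor(rep(c("a", "b"), times = c(7, 3)))

replace_outlier_with_mean(x, g)
# group "a": the NA and 100 become 154/6 (about 25.67); group "b" unchanged
replace_outlier_with_clean_mean(x, g)
# group "a": the NA and 100 become 54/5 = 10.8
```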


Re: [R] detect and replace outliers by the average

2023-04-20 Thread Rui Barradas

At 19:46 on 20/04/2023, AbouEl-Makarim Aboueissa wrote:

Hi Rui:


here is the dataset

factor x1 x2
0 700 700
0 700 500
0 470 470
0 710 560
0 5555 520
0 610 720
0 710 670
0 610 9999
1 690 620
1 580 540
1 690 690
1 NA 401
1 450 580
1 700 700
1 400 8888
1 6666 600
1 500 400
1 680 650
2 117 63
2 120 68
2 130 73
2 120 69
2 125 54
2 999 70
2 165 62
2 130 987
2 123 70
2 78
2 98
2 5
2 321 NA

with many thanks
abou
__


*AbouEl-Makarim Aboueissa, PhD*

*Professor, Mathematics and Statistics*
*Graduate Coordinator*

*Department of Mathematics and Statistics*
*University of Southern Maine*



On Thu, Apr 20, 2023 at 2:44 PM Rui Barradas  wrote:


At 19:36 on 20/04/2023, AbouEl-Makarim Aboueissa wrote:

Dear All:



*Re:* detect and replace outliers by the average



The dataset, please see attached, contains a grouping column “factor” and two
data columns, “x1” and “x2”, with some NA values. I need some help to detect
the outliers and replace them, and the NAs, with the average within each
level (0, 1, 2) of each variable “x1” and “x2”.



I tried the below code, but it did not accomplish what I want to do.





data<-read.csv("G:/20-Spring_2023/Outliers/data.csv", header=TRUE)

data

replace_outlier_with_mean <- function(x) {

replace(x, x %in% boxplot.stats(x)$out, mean(x, na.rm=TRUE))   # na.rm=TRUE NOT working

}

data[] <- lapply(data, replace_outlier_with_mean)





Thank you all very much for your help in advance.





with many thanks

abou


__


*AbouEl-Makarim Aboueissa, PhD*

*Professor, Mathematics and Statistics*
*Graduate Coordinator*

*Department of Mathematics and Statistics*
*University of Southern Maine*

Hello,

There is no data set attached, see the posting guide on what file
extensions are allowed as attachments.

As for the question, try to compute mean(x, na.rm = TRUE)  first, then
use this value in the replace instruction. Without data I'm just guessing.

Hope this helps,

Rui Barradas





Hello,

Here is a way. It uses ave in the function to group the data by the factor.


df1 <- "factor x1 x2
0 700 700
0 700 500
0 470 470
0 710 560
0 5555 520
0 610 720
0 710 670
0 610 9999
1 690 620
1 580 540
1 690 690
1 NA 401
1 450 580
1 700 700
1 400 8888
1 6666 600
1 500 400
1 680 650
2 117 63
2 120 68
2 130 73
2 120 69
2 125 54
2 999 70
2 165 62
2 130 987
2 123 70
2 78 NA
2 98 NA
2 5 NA
2 321 NA"
df1 <- read.table(text = df1, header = TRUE,
  colClasses = c("factor", "numeric", "numeric"))


replace_outlier_with_mean <- function(x, f) {
  ave(x, f, FUN = \(y) {
    i <- is.na(y) | y %in% boxplot.stats(y, do.conf = FALSE)$out
    y[i] <- mean(y, na.rm = TRUE)
    y
  })
}

lapply(df1[-1], replace_outlier_with_mean, f = df1$factor)
#> $x1
#>  [1]  700.0000  700.0000  470.0000  710.0000 1258.1250  610.0000  710.0000
#>  [8]  610.0000  690.0000  580.0000  690.0000 1261.7778  450.0000  700.0000
#> [15]  400.0000 1261.7778  500.0000  680.0000  117.0000  120.0000  130.0000
#> [22]  120.0000  125.0000  194.6923  194.6923  130.0000  123.0000  194.6923
#> [29]   98.0000  194.6923  194.6923
#>
#> $x2
#>  [1]  700.0000  500.0000  470.0000  560.0000  520.0000  720.0000  670.0000
#>  [8] 1767.3750  620.0000  540.0000  690.0000  401.0000  580.0000  700.0000
#> [15] 1406.9000  600.0000  400.0000  650.0000   63.0000   68.0000   73.0000
#> [22]   69.0000   54.0000   70.0000   62.0000  168.4444   70.0000  168.4444
#> [29]  168.4444  168.4444  168.4444


Hope this helps,

Rui Barradas


