Re: [R] Inquiry About R Packages for Specific Research Areas

2024-09-19 Thread Rui Barradas

Hello,

There is a CRAN Task View: Epidemiology that should be or have what you 
are looking for.


[1] https://CRAN.R-project.org/view=Epidemiology

Hope this helps,

Rui Barradas

Às 06:29 de 19/09/2024, Aleena Shaji escreveu:

Dear R Support Team,

I hope this email finds you well.

I am writing to inquire about the specific R packages that would best suit
our academic research project, which involves analyses in various fields.
We are particularly interested in the following areas:

Epidemiology Analysis: We are aware that packages like epiR, survival, and
epitools exist for epidemiological analysis. Could you please confirm which
of these (or others) would be most suitable for our needs?
Dietary Intake/Analysis: We are considering packages like foodfreq and
Dietary for dietary intake analysis. Are these the best options, or do you
recommend other packages for this purpose?
Pedigree Analysis: We are exploring the kinship2 and pedigree packages for
pedigree data analysis. Is there a package you would suggest for more
comprehensive analysis?
Migration-Related Study: We are interested in migration-related studies and
have identified the migrant and spatstat packages. Would these be the most
appropriate, or are there others we should consider?
We would appreciate your guidance in selecting the best packages that align
with our research interests. Additionally, are there any resources or
documentation that you recommend for getting started with these packages?

Thank you for your support, and we look forward to your response.

Best regards,
Aleena

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] (no subject)

2024-09-16 Thread Rui Barradas

Às 15:23 de 16/09/2024, Francesca escreveu:

Sorry for posting a non understandable code. In my screen the dataset
looked correctly.


I recreated my dataset, folllowing your example:

test<-data.frame(matrix(c( 8,  8,  5 , 5 ,NA ,NA , 1, 15, 20,  5, NA, 17,
  2 , 5 , 5,  2 , 5 ,NA,  5 ,10, 10,  5 ,12, NA),
 c( 18,  5,  5,  5, NA,  9,  2,  2, 10,  7 , 5, 19,
NA, 10, NA, 4, NA,  8, NA,  5, 10,  3, 17, NA),
 c( 4, 3, 3, 2, 2, 4, 3, 3, 2, 4, 4 ,3, 4, 4, 4, 2,
2, 3, 2, 3, 3, 2, 2 ,4),
 c(3, 8, 1, 2, 4, 2, 7, 6, 3, 5, 1, 3, 8, 4, 7, 5,
8, 5, 1, 2, 4, 7, 6, 6)))
colnames(test)<-c("cp1","cp2","role","groupid")

What I have done so far is the following, that works:
  test %>%
   group_by(groupid) %>%
   mutate(across(starts_with("cp"), list(mean = mean)))

But the problem is with NA: everytime the mean encounters a NA, it creates
NA for all group members.
I need the software to calculate the mean ignoring NA. So when the group is
made of three people, mean of the three.
If the group is two values and an NA, calculate the mean of two.

My code works , creates a mean at each position for three subjects,
replacing instead of the value of the single, the group mean.
But when NA appears, all the group gets NA.

Perhaps there is a different way to obtain the same result.



On Mon, 16 Sept 2024 at 11:35, Rui Barradas  wrote:


Às 08:28 de 16/09/2024, Francesca escreveu:

Dear Contributors,
I hope someone has found a similar issue.

I have this data set,



cp1
cp2
role
groupid
1
10
13
4
5
2
5
10
3
1
3
7
7
4
6
4
10
4
2
7
5
5
8
3
2
6
8
7
4
4
7
8
8
4
7
8
10
15
3
3
9
15
10
2
2
10
5
5
2
4
11
20
20
2
5
12
9
11
3
6
13
10
13
4
3
14
12
6
4
2
15
7
4
4
1
16
10
0
3
7
17
20
15
3
8
18
10
7
3
4
19
8
13
3
5
20
10
9
2
6



I need to to average of groups, using the values of column groupid, and
create a twin dataset in which the mean of the group is replaced instead

of

individual values.
So for example, groupid 3, I calculate the mean (12+18)/2 and then I
replace in the new dataframe, but in the same positions, instead of 12

and

18, the values of the corresponding mean.
I found this solution, where db10_means is the output dataset, db10 is my
initial data.

db10_means<-db10 %>%
group_by(groupid) %>%
mutate(across(starts_with("cp"), list(mean = mean)))

It works perfectly, except that for NA values, where it replaces to all
group members the NA, while in some cases, the group is made of some NA

and

some values.
So, when I have a group of two values and one NA, I would like that for
those with a value, the mean is replaced, for those with NA, the NA is
replaced.
Here the mean function has not the na.rm=T option associated, but it
appears that this solution cannot be implemented in this case. I am not
even sure that this would be enough to solve my problem.
Thanks for any help provided.


Hello,

Your data is a mess, please don't post html, this is plain text only
list. Anyway, I managed to create a data frame by copying the data to a
file named "rhelp.txt" and then running



db10 <- scan(file = "rhelp.txt", what = character())
header <- db10[1:4]
db10 <- db10[-(1:4)] |> as.numeric()
db10 <- matrix(db10, ncol = 4L, byrow = TRUE) |>
as.data.frame() |>
setNames(header)

str(db10)
#> 'data.frame':25 obs. of  4 variables:
#>  $ cp1: num  1 5 3 7 10 5 2 4 8 10 ...
#>  $ cp2: num  10 2 1 4 4 5 6 4 4 15 ...
#>  $ role   : num  13 5 3 6 2 8 8 7 7 3 ...
#>  $ groupid: num  4 10 7 4 7 3 7 8 8 3 ...


And here is the data in dput format.



db10 <-
structure(list(
  cp1 = c(1, 5, 3, 7, 10, 5, 2, 4, 8, 10, 9, 2,
  2, 20, 9, 13, 3, 4, 4, 10, 17, 8, 3, 13, 10),
  cp2 = c(10, 2, 1, 4, 4, 5, 6, 4, 4, 15, 15, 10,
  4, 2, 11, 10, 14, 2, 4, 0, 20, 18, 4, 3, 9),
  role = c(13, 5, 3, 6, 2, 8, 8, 7, 7, 3, 10, 5,
   11, 5, 3, 13, 12, 15, 1, 3, 15, 10, 19, 5, 2),
  groupid = c(4, 10, 7, 4, 7, 3, 7, 8, 8, 3, 2, 5,
  20, 12, 6, 4, 6, 7, 16, 7, 3, 7, 8, 20, 6)),
  class = "data.frame", row.names = c(NA, -25L))



As for the problem, I am not sure if you want summarise instead of
mutate but here is a summarise solution.



library(dplyr)

db10 %>%
group_by(groupid) %>%
summarise(across(starts_with("cp"), ~ mean(.x, na.rm = TRUE)))

# same result, summarise's new argument .by avoids the need to group_by
db10 %>%
summarise(across(starts_with("cp"), ~ mean(.x, na.rm = TRUE)), .by =
groupid)



Can you post the expected output too?

Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a
presença de vírus.
www.avg.com





Hello,

Something like this?


test <-
  structure(list(
cp1 = c(1, 5, 3, 7, 10, 5, 2, 4, 8, 10, 9, 2

Re: [R] (no subject)

2024-09-16 Thread Rui Barradas

Às 08:28 de 16/09/2024, Francesca escreveu:

Dear Contributors,
I hope someone has found a similar issue.

I have this data set,



cp1
cp2
role
groupid
1
10
13
4
5
2
5
10
3
1
3
7
7
4
6
4
10
4
2
7
5
5
8
3
2
6
8
7
4
4
7
8
8
4
7
8
10
15
3
3
9
15
10
2
2
10
5
5
2
4
11
20
20
2
5
12
9
11
3
6
13
10
13
4
3
14
12
6
4
2
15
7
4
4
1
16
10
0
3
7
17
20
15
3
8
18
10
7
3
4
19
8
13
3
5
20
10
9
2
6



I need to to average of groups, using the values of column groupid, and
create a twin dataset in which the mean of the group is replaced instead of
individual values.
So for example, groupid 3, I calculate the mean (12+18)/2 and then I
replace in the new dataframe, but in the same positions, instead of 12 and
18, the values of the corresponding mean.
I found this solution, where db10_means is the output dataset, db10 is my
initial data.

db10_means<-db10 %>%
   group_by(groupid) %>%
   mutate(across(starts_with("cp"), list(mean = mean)))

It works perfectly, except that for NA values, where it replaces to all
group members the NA, while in some cases, the group is made of some NA and
some values.
So, when I have a group of two values and one NA, I would like that for
those with a value, the mean is replaced, for those with NA, the NA is
replaced.
Here the mean function has not the na.rm=T option associated, but it
appears that this solution cannot be implemented in this case. I am not
even sure that this would be enough to solve my problem.
Thanks for any help provided.


Hello,

Your data is a mess, please don't post html, this is plain text only 
list. Anyway, I managed to create a data frame by copying the data to a 
file named "rhelp.txt" and then running




db10 <- scan(file = "rhelp.txt", what = character())
header <- db10[1:4]
db10 <- db10[-(1:4)] |> as.numeric()
db10 <- matrix(db10, ncol = 4L, byrow = TRUE) |>
  as.data.frame() |>
  setNames(header)

str(db10)
#> 'data.frame':25 obs. of  4 variables:
#>  $ cp1: num  1 5 3 7 10 5 2 4 8 10 ...
#>  $ cp2: num  10 2 1 4 4 5 6 4 4 15 ...
#>  $ role   : num  13 5 3 6 2 8 8 7 7 3 ...
#>  $ groupid: num  4 10 7 4 7 3 7 8 8 3 ...


And here is the data in dput format.



db10 <-
  structure(list(
cp1 = c(1, 5, 3, 7, 10, 5, 2, 4, 8, 10, 9, 2,
2, 20, 9, 13, 3, 4, 4, 10, 17, 8, 3, 13, 10),
cp2 = c(10, 2, 1, 4, 4, 5, 6, 4, 4, 15, 15, 10,
4, 2, 11, 10, 14, 2, 4, 0, 20, 18, 4, 3, 9),
role = c(13, 5, 3, 6, 2, 8, 8, 7, 7, 3, 10, 5,
 11, 5, 3, 13, 12, 15, 1, 3, 15, 10, 19, 5, 2),
groupid = c(4, 10, 7, 4, 7, 3, 7, 8, 8, 3, 2, 5,
20, 12, 6, 4, 6, 7, 16, 7, 3, 7, 8, 20, 6)),
class = "data.frame", row.names = c(NA, -25L))



As for the problem, I am not sure if you want summarise instead of 
mutate but here is a summarise solution.




library(dplyr)

db10 %>%
  group_by(groupid) %>%
  summarise(across(starts_with("cp"), ~ mean(.x, na.rm = TRUE)))

# same result, summarise's new argument .by avoids the need to group_by
db10 %>%
  summarise(across(starts_with("cp"), ~ mean(.x, na.rm = TRUE)), .by = 
groupid)




Can you post the expected output too?

Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] "And" condition spanning over multiple columns in data frame

2024-09-12 Thread Rui Barradas

Às 08:42 de 12/09/2024, Francesca escreveu:

Dear contributors,
I need to create a set of columns, based on conditions of a dataframe as
follows.
I have managed to do the trick for one column, but I do not seem to find
any good example where the condition is extended to all the dataframe.

I have these dataframe called c10Dt:



id cp1 cp2 cp3 cp4 cp5 cp6 cp7 cp8 cp9 cp10 cp11 cp12
1  1  NA  NA  NA  NA  NA  NA  NA  NA  NA   NA   NA   NA
2  4   8  18  15  10  12  11   9  18   8   16   15   NA
3  3   8   5   5   4  NA   5  NA   6  NA   10   10   10
4  3   5   5   4   4   3   2   1   3   2112
5  1  NA  NA  NA  NA  NA  NA  NA  NA  NA   NA   NA   NA
6  2   5   5  10  10   9  10  10  10  NA   109   10
-- Columns are id, cp1, cp2.. and so on. What I need to do is the 
following, made on just one column: c10Dt <- mutate(c10Dt, exit1= 
ifelse(is.na(cp1) & id!=1, 1, 0)) So, I create a new variable, called 
exit1, in which the program selects cp1, checks if it is NA, and if it 
is NA but also the value of the column "id" is not 1, then it gives back 
a 1, otherwise 0. So, what I want is that it selects all the cases in 
which the id=2,3, or 4 is not NA in the corresponding values of the 
matrix. I managed to do it manually column by column, but I feel there 
should be something smarter here. The problem is that I need to 
replicate this over all the columns from cp2, to cp12, but keeping fixed 
the id column instead. I have tried with c10Dt %>% 
mutate(x=across(starts_with("cp"), ~ifelse(. == NA)) & id!=1,1,0 ) but 
the problem with across is that it will implement the condition only on 
cp_ columns. How do I tell R to use the column id with all the other 
columns? Thanks for any help provided. Francesca 
--


Hello,

Something like this?

1. If an ifelse instruction is meant to create a binary result, coerce 
the logical condition to integer instead. You can make it more clear by 
substituting as.integer for the plus sign below;
2. the .names argument is used to create new columns and keeping the 
original ones.




df1 <- read.table(text = "id cp1 cp2 cp3 cp4 cp5 cp6 cp7 cp8 cp9 cp10 
cp11 cp12

1  1  NA  NA  NA  NA  NA  NA  NA  NA  NA   NA   NA   NA
2  4   8  18  15  10  12  11   9  18   8   16   15   NA
3  3   8   5   5   4  NA   5  NA   6  NA   10   10   10
4  3   5   5   4   4   3   2   1   3   2112
5  1  NA  NA  NA  NA  NA  NA  NA  NA  NA   NA   NA   NA
6  2   5   5  10  10   9  10  10  10  NA   109   10", header = TRUE)
df1

library(dplyr)

df1 %>%
  mutate(across(starts_with("cp"),  ~ +(is.na(.) & id != 1), .names = 
"{col}_new"))




Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Prediction from Arima model

2024-08-31 Thread Rui Barradas

Às 18:54 de 31/08/2024, Christofer Bogaso escreveu:

Hi,

I have run following code to obtain one step ahead confidence interval
from am arima model

library(forecast)

set.seed(100)

forecast(Arima(rnorm(100), order = c(1,0,1), xreg = rt(100, 1)), h =
1, xreg = 10)

However this appear to provide the Prediction interval, however I
wanted to get the confidence interval for the new value.

Is there any way to get the confidence interval for the new value?

I also wanted to get the estimate of SE for the new value which is
used to obtain the confidence interval of the new value. Is there any
method available to obtain that?

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

To get the se use ?predict.Arima instead.


library(forecast)

set.seed(100)

model <- Arima(rnorm(100), order = c(1,0,1), xreg = rt(100, 1))
# in predict.Arima, se.fit defaults to TRUE
pred <- predict(model, n.ahead = 1, newxreg = 10)

pred$se
c(pred$se)

# with more points ahead
predict(model, n.ahead = 2, newxreg = c(10, 12))



Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] aggregating data with quality control

2024-08-31 Thread Rui Barradas
m = na.rm)

status_with_D <- sample(c('C', 'D'), 45, TRUE, c(.9, .1))
mydf$status <- c(rep("C", 50), "S", status_with_D)

subset_condition <- if(any(mydf$status == "D")) mydf$status == "D" else TRUE

aggregate(hs ~ format(data_POSIX, "%Y-%m-%d") + status, mydf, my.mean, 
subset = subset_condition)

#>   format(data_POSIX, "%Y-%m-%d") status   hs
#> 1 2024-01-02  D 51.2

# the formats in the OP but extracted from the date/time and used in the 
formula that follows.

year <- format(mydf$data_POSIX, "%Y")
month <- format(mydf$data_POSIX, "%m")
day <- format(mydf$data_POSIX, "%d")

aggregate(hs ~ year + month + day, mydf, my.mean)
#>   year month day   hs
#> 1 202401  01 52.37500
#> 2 202401  02 45.64583
aggregate(hs ~ year + month + day + status, mydf, my.mean, subset = 
subset_condition)

#>   year month day status   hs
#> 1 202401  02  D 51.2



Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fill NA values in columns with values of another column

2024-08-28 Thread Rui Barradas

Às 16:24 de 28/08/2024, Ebert,Timothy Aaron escreveu:

Why not use na.omit() and then go from there? Unless one handles NA differently 
in different groups there is no point in processing the data by groups to 
remove NA even if later analysis steps do require group information.

Tim

-Original Message-
From: R-help  On Behalf Of Rui Barradas
Sent: Wednesday, August 28, 2024 4:19 AM
To: Francesca PANCOTTO ; r-help@r-project.org
Subject: Re: [R] Fill NA values in columns with values of another column

[External Email]

Às 11:23 de 27/08/2024, Francesca PANCOTTO via R-help escreveu:

Dear Contributors,
I have a problem with a database composed of many individuals for many
periods, for which I need to perform a manipulation of data as follows.
Here I report the procedure I need to do for the first 32 observations
of the first period.


cbind(VB1d[,1],s1id[,1])
[,1] [,2]
   [1,]68
   [2,]95
   [3,]   NA1
   [4,]56
   [5,]   NA7
   [6,]   NA2
   [7,]44
   [8,]27
   [9,]27
[10,]   NA3
[11,]   NA2
[12,]   NA4
[13,]56
[14,]95
[15,]   NA5
[16,]   NA6
[17,]   103
[18,]72
[19,]21
[20,]   NA7
[21,]72
[22,]   NA8
[23,]   NA4
[24,]   NA5
[25,]   NA6
[26,]21
[27,]44
[28,]68
[29,]   103
[30,]   NA3
[31,]   NA8
[32,]   NA1


In column s1id, I have numbers from 1 to 8, which are the id of 8
groups , randomly mixed in the larger group of 32.
For each group, I want the value that is reported for only to group
members, to all the four group members.

For example, value 8 in first row , second column, is group 8. The
value for group 8 of the variable VB1d is 6. At row 28, again for s1id
equal to 8, I have 6.
But in row 22, the value 8 of the second variable, reports a value NA.
in each group is the same, only two values have the correct number,
the other two are NA.
I need that each group, identified by the values of the variable S1id,
correctly report the number of variable VB1d that is present for just
two group members.

I hope my explanation is acceptable.
The task appears complex to me right now, especially because I will
need to multiply this procedure for x12x14 similar databases.

Anyone has ever encountered a similar problem?
Thanks in advance for any help provided.

--

Francesca Pancotto

Associate Professor Political Economy

University of Modena, Largo Santa Eufemia, 19, Modena

Office Phone: +39 0522 523264

Web:
*https://sit/
es.google.com%2Fview%2Ffrancescapancotto%2Fhome&data=05%7C02%7Ctebert%
40ufl.edu%7C0ca2745d1f2142a0723608dcc73a15e3%7C0d4da0f84a314d76ace60a6
2331e1b84%7C0%7C0%7C638604299508876897%7CUnknown%7CTWFpbGZsb3d8eyJWIjo
iMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%
7C&sdata=yHdkL%2BmnsHgL1O3nE%2B0r4Wf5nvRgJp66VWJHHiYJVGA%3D&reserved=0
<https://sit/
es.google.com%2Fview%2Ffrancescapancotto%2Fhome&data=05%7C02%7Ctebert%
40ufl.edu%7C0ca2745d1f2142a0723608dcc73a15e3%7C0d4da0f84a314d76ace60a6
2331e1b84%7C0%7C0%7C638604299508887226%7CUnknown%7CTWFpbGZsb3d8eyJWIjo
iMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%
7C&sdata=XsB7jdjGD5S7YKiyPhY5DSR%2F1yhPrTuFxdA5qz3KEBY%3D&reserved=0>*

   --

   [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat/
.ethz.ch%2Fmailman%2Flistinfo%2Fr-help&data=05%7C02%7Ctebert%40ufl.edu
%7C0ca2745d1f2142a0723608dcc73a15e3%7C0d4da0f84a314d76ace60a62331e1b84
%7C0%7C0%7C638604299508890269%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAw
MDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=
BLTZvAFGtdZUoKefcgEtEsrw5pm4UHRUZJCGLXx5QFE%3D&reserved=0
PLEASE do read the posting guide
https://www/.
r-project.org%2Fposting-guide.html&data=05%7C02%7Ctebert%40ufl.edu%7C0
ca2745d1f2142a0723608dcc73a15e3%7C0d4da0f84a314d76ace60a62331e1b84%7C0
%7C0%7C638604299508893127%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAi
LCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=q4Mj
%2BjSL2ZG0%2Fi0%2FrBUR3Z2B%2BbV6eH35to2Rt6kHUZ8%3D&reserved=0
and provide commented, minimal, self-contained, reproducible code.

Hello,

Here is a solution.
Split the 1st column by the 2nd, keep only the not-NA values and unlist, to 
have a named vector.
Then put the names and the values together with cbind.



mat <- structure(
c(6L, 9L, NA, 5L, NA, NA, 4L, 2L, 2L, NA, NA, NA, 5L,
  9L, NA, NA, 10L, 7L, 2L, NA, 7L, NA, NA, NA, NA, 2L, 4L, 6L,
  10L, NA, NA, NA, 8L, 5L, 1L, 6L, 7L, 2L, 4L, 7L, 7L, 3L, 2L,
  4L, 6L, 5L, 5L, 6L, 3L, 2L, 1L, 7L, 2L, 8L, 4L, 5L, 6L, 1L, 4L,
  8L, 3L, 3L, 8L, 1L), dim = c(32L, 2L))


res <- split(mat[, 1L], mat[, 2L]) |> lapply(\(x) x[!is.na(x)]) |> unlist() nms <

Re: [R] Fill NA values in columns with values of another column

2024-08-28 Thread Rui Barradas

Às 11:23 de 27/08/2024, Francesca PANCOTTO via R-help escreveu:

Dear Contributors,
I have a problem with a database composed of many individuals for many
periods, for which I need to perform a manipulation of data as follows.
Here I report the procedure I need to do for the first 32 observations of
the first period.


cbind(VB1d[,1],s1id[,1])
   [,1] [,2]
  [1,]68
  [2,]95
  [3,]   NA1
  [4,]56
  [5,]   NA7
  [6,]   NA2
  [7,]44
  [8,]27
  [9,]27
[10,]   NA3
[11,]   NA2
[12,]   NA4
[13,]56
[14,]95
[15,]   NA5
[16,]   NA6
[17,]   103
[18,]72
[19,]21
[20,]   NA7
[21,]72
[22,]   NA8
[23,]   NA4
[24,]   NA5
[25,]   NA6
[26,]21
[27,]44
[28,]68
[29,]   103
[30,]   NA3
[31,]   NA8
[32,]   NA1


In column s1id, I have numbers from 1 to 8, which are the id of 8 groups ,
randomly mixed in the larger group of 32.
For each group, I want the value that is reported for only to group
members, to all the four group members.

For example, value 8 in first row , second column, is group 8. The value
for group 8 of the variable VB1d is 6. At row 28, again for s1id equal to
8, I have 6.
But in row 22, the value 8 of the second variable, reports a value NA.
in each group is the same, only two values have the correct number, the
other two are NA.
I need that each group, identified by the values of the variable S1id,
correctly report the number of variable VB1d that is present for just two
group members.

I hope my explanation is acceptable.
The task appears complex to me right now, especially because I will need to
multiply this procedure for x12x14 similar databases.

Anyone has ever encountered a similar problem?
Thanks in advance for any help provided.

--

Francesca Pancotto

Associate Professor Political Economy

University of Modena, Largo Santa Eufemia, 19, Modena

Office Phone: +39 0522 523264

Web: *https://sites.google.com/view/francescapancotto/home
<https://sites.google.com/view/francescapancotto/home>*

  --

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

Here is a solution.
Split the 1st column by the 2nd, keep only the not-NA values and unlist, 
to have a named vector.

Then put the names and the values together with cbind.



mat <- structure(
  c(6L, 9L, NA, 5L, NA, NA, 4L, 2L, 2L, NA, NA, NA, 5L,
9L, NA, NA, 10L, 7L, 2L, NA, 7L, NA, NA, NA, NA, 2L, 4L, 6L,
10L, NA, NA, NA, 8L, 5L, 1L, 6L, 7L, 2L, 4L, 7L, 7L, 3L, 2L,
4L, 6L, 5L, 5L, 6L, 3L, 2L, 1L, 7L, 2L, 8L, 4L, 5L, 6L, 1L, 4L,
8L, 3L, 3L, 8L, 1L), dim = c(32L, 2L))


res <- split(mat[, 1L], mat[, 2L]) |> lapply(\(x) x[!is.na(x)]) |> unlist()
nms <- names(res)
res <- cbind(
  VB1d = res,
  s1id = substr(nms, 1, nchar(nms) - 1L) |> as.integer()
)
res
#>VB1d s1id
#> 1121
#> 1221
#> 2172
#> 2272
#> 31   103
#> 32   103
#> 4144
#> 4244
#> 5195
#> 5295
#> 6156
#> 6256
#> 7127
#> 7227
#> 8168
#> 8268



Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Very strange behavior of 'rep'

2024-08-15 Thread Rui Barradas

Às 19:39 de 15/08/2024, Izmirlian, Grant (NIH/NCI) [E] via R-help escreveu:

\n<>\n\n \n<<
This is very weird. I was running a swarm job on the cluster and it bombed
only for n.per.grp=108, not for the other values. Even though
n.per.grp*n.tt is 540, so that the length of the call to 'rep'
should be 1080, I'm getting a vector of length 1078.
 n.per.grp <- 108
 n.tt <- 5
 n.per.grp*n.tt
 length(rep(0:1, each=n.per.grp*n.tt))
 length(rep(0:1, each=108*5))

\n<>\n\n\n\n

--please do not edit the information below--

R Version:
  platform = x86_64-pc-linux-gnu
  arch = x86_64
  os = linux-gnu
  system = x86_64, linux-gnu
  status =
  major = 4
  minor = 4.1
  year = 2024
  month = 06
  day = 14
  svn rev = 86737
  language = R
  version.string = R version 4.4.1 (2024-06-14)
  nickname = Race for Your Life

Locale:
  
LC_CTYPE=C.UTF-8;LC_NUMERIC=C;LC_TIME=C.UTF-8;LC_COLLATE=C.UTF-8;LC_MONETARY=C.UTF-8;LC_MESSAGES=C.UTF-8;LC_PAPER=C.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=C.UTF-8;LC_IDENTIFICATION=C

Search Path:
  .GlobalEnv, package:lme4, package:Matrix, package:stats,
  package:graphics, package:grDevices, package:utils, package:datasets,
  package:showtext, package:showtextdb, package:sysfonts,
  package:methods, Autoloads, package:base

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

I cannot reproduce this behavior.



n.per.grp <- 108
n.tt <- 5
n.per.grp*n.tt
#> [1] 540
length(rep(0:1, each = n.per.grp*n.tt))
#> [1] 1080
length(rep(0:1, each = 108*5))
#> [1] 1080



But my version of R and my OS are different.
(I don't see how the error in the OP can be related to R version or OS.)



R.version
#>_
#> platform   x86_64-w64-mingw32
#> arch   x86_64
#> os mingw32
#> crtucrt
#> system x86_64, mingw32
#> status
#> major  4
#> minor  4.1
#> year   2024
#> month  06
#> day14
#> svn rev86737
#> language   R
#> version.string R version 4.4.1 (2024-06-14 ucrt)
#> nickname   Race for Your Life



Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Printing

2024-08-11 Thread Rui Barradas

Às 15:36 de 11/08/2024, Steven Yen escreveu:

Thanks. Will try it.

Have not tried it but I think the following may work:

out$results<-NULL

out$results$ei<-ap

out$results$vi<-vap

All I need is printing by returning out (unless I turn it off). And, 
retrieve ap and vap as needed as shown above. Guess I need to read more 
about invisible.


On 8/11/2024 10:09 PM, Rui Barradas wrote:

Às 09:51 de 11/08/2024, Steven Yen escreveu:

Hi

In the following codes, I had to choose between printing (= TRUE) or 
deliver something for grab (ei, vi). Is there a way to get both--that 
is, to print and also have ei and vi for grab? Thanks.


Steven

...

out<-round(as.data.frame(cbind(ap,se,t,p)),digits)
out<-cbind(out,sig)
out<-out[!grepl(colnames(zx)[1],rownames(out)),]
if(printing){
cat("\nAPPs of bivariate ordered probit probabilities",
 "\nWritten by Steven T. Yen (Last update: 08.11.24)",
 "\ny1.level=", y1.level,
 "  y2.level=", y2.level,
 "\njoint12 =", joint12,
 "\nmarg1 =",   marg1,
 "\nmarg2 =",   marg2,
 "\ncond12 =",  cond12,
 "\ncond21 =",  cond21,
 "\nCovariance matrix:",vb.method,
 "\nWeighted =",    weighted,
 "\nAt means =",    mean,
 "\nProb x 100 =",  times100,
 "\ntesting ="  ,   testing,
 "\nuse_bb_and_vbb = ",use_bb_and_vbb,
 "\nsample size =", length(y1),"\n")
if (!resampling) cat("\nSEs by delta method","\n")
if  (resampling) cat("\nSEs K-R resampling with",ndraws,"draws\n")
return(out)
} else {
invisible(list("ei"=ap,"vi"=vap))

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.


Hello,

Maybe change the end of the code to return a bigger list.


ll <- list(out = out, ei = ap, vi = vap)
return(ll)
} else {
invisible(list("ei"=ap,"vi"=vap))


Hope this helps,

Rui Barradas




Hello,

Use descriptive names, print the data frame in the function and return a 
list invisibly.


Also,
1) Why create a data.frame with rounded numbers? I never do this. To 
round numbers is a matter of results display and in the case of df's 
should be left to the print.data.frame method. Always return the numbers 
as they are. See comment below.

2) And never, ever code the creation of a df as

as.data.frame(cbind(.))

If the vectors are a mix of numeric and character they will all be 
coerced to the least common denominator, all vectors will become of 
class character.




# don't do this
df <- round(as.data.frame(cbind(ap, se, t, p)), digits)
df <- cbind(df, sig)

# do this instead
df <- data.frame(ap, se, t, p, sig)

df <- df[!grepl(colnames(zx)[1],rownames(df)), ]

out <- NULL
if(printing){
  #
  [...rest of code...]
  #
  out$data <- df
  cat("data:\n")
  print(out$data)
}
out$results <- list(ei = ap, vi = vap)
invisible(out)



Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Printing

2024-08-11 Thread Rui Barradas

Às 09:51 de 11/08/2024, Steven Yen escreveu:

Hi

In the following codes, I had to choose between printing (= TRUE) or 
deliver something for grab (ei, vi). Is there a way to get both--that 
is, to print and also have ei and vi for grab? Thanks.


Steven

...

out<-round(as.data.frame(cbind(ap,se,t,p)),digits)
out<-cbind(out,sig)
out<-out[!grepl(colnames(zx)[1],rownames(out)),]
if(printing){
cat("\nAPPs of bivariate ordered probit probabilities",
     "\nWritten by Steven T. Yen (Last update: 08.11.24)",
     "\ny1.level=", y1.level,
     "  y2.level=", y2.level,
     "\njoint12 =", joint12,
     "\nmarg1 =",   marg1,
     "\nmarg2 =",   marg2,
     "\ncond12 =",  cond12,
     "\ncond21 =",  cond21,
     "\nCovariance matrix:",vb.method,
     "\nWeighted =",    weighted,
     "\nAt means =",    mean,
     "\nProb x 100 =",  times100,
     "\ntesting ="  ,   testing,
     "\nuse_bb_and_vbb = ",use_bb_and_vbb,
     "\nsample size =", length(y1),"\n")
if (!resampling) cat("\nSEs by delta method","\n")
if  (resampling) cat("\nSEs K-R resampling with",ndraws,"draws\n")
return(out)
} else {
invisible(list("ei"=ap,"vi"=vap))

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.


Hello,

Maybe change the end of the code to return a bigger list.


ll <- list(out = out, ei = ap, vi = vap)
return(ll)
} else {
invisible(list("ei"=ap,"vi"=vap))


Hope this helps,

Rui Barradas



--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] a fast way to do my job

2024-08-10 Thread Rui Barradas

Hello,

.lm.fit is an order of magnitude faster than lm.fit but the Description 
section warns on its use, see the examples in help("lm.fit").


Hope this helps,

Rui Barradas

Às 21:08 de 10/08/2024, Yuan Chun Ding via R-help escreveu:

You are right.  I also just thought about that, no intercept is not applicable 
to my case.

Ding

From: Bert Gunter 
Sent: Saturday, August 10, 2024 1:06 PM
To: Yuan Chun Ding 
Cc: Ben Bolker ; r-help@r-project.org
Subject: Re: [R] a fast way to do my job

Ah, messages crossed. A no-intercept model **assumes** the straight line fit 
must pass through the origin. Unless there is a strong justification for such 
an assumption, you should include an intercept. -- Bert On Sat, Aug 10, 2024 at 
1: 02 PM


Ah, messages crossed.

A no-intercept model **assumes** the straight line fit must pass

through the origin. Unless there is a strong justification for such an

assumption, you should include an intercept.



-- Bert



On Sat, Aug 10, 2024 at 1:02 PM Bert Gunter 
mailto:bgunter.4...@gmail.com>> wrote:






Is it because I failed to to add a column of ones for an intercept to



the x matrix? TRhat would be my bad.







-- Bert











On Sat, Aug 10, 2024 at 12:59 PM Bert Gunter 
mailto:bgunter.4...@gmail.com>> wrote:







Probably because you inadvertently ran different models. Without your code, I 
haven't a clue.











On Sat, Aug 10, 2024, 12:29 Yuan Chun Ding 
mailto:ycd...@coh.org>> wrote:







HI Bert and Ben,















Yes, running lm.fit using the matrix format is much faster. I read a couple of 
online comments why it is faster.















However, the residual values for three tested variables or genes from lm 
function and lm.fit function are different, with Pearson correlation of 0.55, 
0.89, and 0.99.















I have not found the reason.















Thanks,











Ding















From: Bert Gunter mailto:bgunter.4...@gmail.com>>



Sent: Friday, August 9, 2024 7:11 PM



To: Ben Bolker mailto:bbol...@gmail.com>>



Cc: Yuan Chun Ding mailto:ycd...@coh.org>>; 
r-help@r-project.org<mailto:r-help@r-project.org>



Subject: Re: [R] a fast way to do my job















Better idea, Ben! It would work as you might expect it to to produce the same results 
as the above: ##first make sure your regressor is a matrix: pur2 <- 
matrix(purity2, ncol =1) ## convert the data frame variables into a matrix dat <-







Better idea, Ben!















It would work as you might expect it to to produce the same results as







the above:















##first make sure your regressor is a matrix:







pur2 <- matrix(purity2, ncol =1)







## convert the data frame variables into a matrix







dat <- as.matrix(gem751be.rpkm[ , 74:35164])







##then







result <- residuals(lm.fit( x= pur2, y = dat))















Cheers,







Bert















On Fri, Aug 9, 2024 at 6:38 PM Ben Bolker 
mailto:bbol...@gmail.com>> wrote:















You can also fit a linear model with a matrix-valued response







variable, which should be even faster (not sure off the top of my head







how to get the residuals and reshape them to the dimensions you want)















On Fri, Aug 9, 2024 at 9:31 PM Bert Gunter 
mailto:bgunter.4...@gmail.com>> wrote:















See ?lm.fit.







I must be missing something, because:















results <- sapply(74:35164, \(i) residuals(lm.fit(purity2,







gem751be.rpkm[, i] )))















would give you a 751 x 35091 matrix of the residuals from each of the







regressions.







I assume it will be considerably faster than all the overhead you are







carrying in your current code, but of course you'll have to try it and







see. ... Assuming that I have interpreted your request correctly.







Ignore if not.















Cheers,







Bert















On Fri, Aug 9, 2024 at 4:50 PM Yuan Chun Ding via R-help







mailto:r-help@r-project.org>> wrote:















Dear R users,















I am running the following code below,  the gem751be.rpkm is a dataframe with 
dim of 751 samples by 35164 variables,  73 phenotypic variables in the furst to 
73rd column and 35091 genomic variables or genes in the 74th to 35164th 
columns.  What I need to do is to calculate the residuals for each gene using 
the simple linear regression model of genelist[i] ~ purity2;















The following code is running,  it takes long time, but I have an expensive 
ThinkStation window computer.







Can you provide a fast way to do it?















Thank you,















Ding















-























gem751be.rpkm <-merge(gem751be10, as.data.frame(t(rna849.fpkm2)),







+   by.x="id2",by.y=0)







   row.names(gem751be.rp

Re: [R] If loop

2024-08-08 Thread Rui Barradas

Às 05:33 de 09/08/2024, Steven Yen escreveu:
The following (using if else) did not help. Seemed like joint12 always 
kicked in.


     me1<-me0<-NULL.
     if(joint12){
   {me1<-cbind(me1,v1$p12);  me0<-cbind(me0,v0$p12)}
     } else if(marg1) {
   {me1<-cbind(me1,v1$p1);   me0<-cbind(me0,v0$p1)}
     } else if(marg2) {
   {me1<-cbind(me1,v1$p2);   me0<-cbind(me0,v0$p2)}
     } else if(cond12){
   {me1<-cbind(me1,v1$pc12); me0<-cbind(me0,v0$pc12)}
     } else {
   {me1<-cbind(me1,v1$pc21); me0<-cbind(me0,v0$pc21)}
     }

...

   labels<-NULL
   if(joint12){
     labels<-c(labels,lab.p12)
   } else if(marg1) {
     labels<-c(labels,lab.p1)
   } else if(marg2) {
     labels<-c(labels,lab.p2)
   } else if(cond12){
     labels<-c(labels,lab.pc12)
   } else {
     labels<-c(labels,lab.pc21)
   }


On 8/9/2024 11:44 AM, Steven Yen wrote:
Can someone help me with the if loop below? In the subroutine, I 
initialize all of (joint12,marg1,marg2,cond12,cond21) as FALSE, and 
call with only one of them being TRUE:


,...,joint12=FALSE,marg1=FALSE,marg2=FALSE,cond12=FALSE,cond21=FALSE

joint12 seems to always kick in, even though I call with, e.g., marg1 
being TRUE and everything else being FALSE. My attempts with if... 
else if were not useful. Please help. Thanks.


v1<-cprob(z1,x1,a,b,mu1,mu2,rho,j+1,k+1)
    v0<-cprob(z0,x0,a,b,mu1,mu2,rho,j+1,k+1)

   ...

    me1<-me0<-NULL
    if(joint12) {me1<-cbind(me1,v1$p12); me0<-cbind(me0,v0$p12)}
    if(marg1)   {me1<-cbind(me1,v1$p1); me0<-cbind(me0,v0$p1)}
    if(marg2)   {me1<-cbind(me1,v1$p2); me0<-cbind(me0,v0$p2)}
    if(cond12)  {me1<-cbind(me1,v1$pc12); me0<-cbind(me0,v0$pc12)}
    if(cond21)  {me1<-cbind(me1,v1$pc21); me0<-cbind(me0,v0$pc21)}
    ...

  labels<-NULL
  if(joint12) labels<-c(labels,lab.p12)
  if(marg1)   labels<-c(labels,lab.p1)
  if(marg2)   labels<-c(labels,lab.p2)
  if(cond12)  labels<-c(labels,lab.pc12)
  if(cond21)  labels<-c(labels,lab.pc21)



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.

Hello,

What you are saying is hardly (not) possible.

If you ever call that code with joint12 set to TRUE, do you reset to 
FALSE afterwards?


Can you give a small working example with code and data showing this 
behavior?


Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grep

2024-08-01 Thread Rui Barradas
;). To match a literal period you must 
escape it. The correct regex is '\\.r'.




x <- c("age", "sleep", "primary", "middle", "high", "somewhath", "veryh",
   "somewhatm", "verym", "somewhatc", "veryc", "somewhatl", "veryl",
   "village", "married", "social", "agricultural", "communist",
   "minority", "religious")
colnms <- c("depression", "sleep", "female", "village", "agricultural",
"married", "communist", "minority", "religious", "social", 
"no",

"primary", "middle", "high", "veryh", "somewhath", "notveryh",
"verym", "somewhatm", "notverym", "veryc", "somewhatc", 
"notveryc",
"veryl", "somewhatl", "notveryl", "age", "village.r", 
"married.r",
"social.r", "agricultural.r", "communist.r", "minority.r", 
"religious.r",

"male.r", "education.r")

grep("\\.r\\b", colnms, value = TRUE)
#> [1] "village.r"  "married.r"  "social.r"   "agricultural.r"
#> [5] "communist.r""minority.r" "religious.r""male.r"
#> [9] "education.r"
# the same as above
# \\> matches the empty string at the end of a word,
# \\b matches the empty string at both ends of a word
grep("\\.r\\>", colnms, value = TRUE)
#> [1] "village.r"  "married.r"  "social.r"   "agricultural.r"
#> [5] "communist.r""minority.r" "religious.r""male.r"
#> [9] "education.r"

# 4 col names have a 'm' and end in '.r' therefore 4 matches
grep("m.*\\.r\\>", colnms, value = TRUE)
#> [1] "married.r"   "communist.r" "minority.r"  "male.r"
# only the strings starting with 'm'
grep("\\bm.*\\.r\\b", colnms, value = TRUE)
#> [1] "married.r"  "minority.r" "male.r"
grep("\\", colnms, value = TRUE)
#> [1] "married.r"  "minority.r" "male.r"


Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R facets including two kinds of charts

2024-08-01 Thread Rui Barradas

Hello,

I hadn't understood the problem, sorry.
The problem are the bar plots, ggplot is plotting one in the "A" facet. 
And since there is nothing to plot, the bars start at 0.


A hack is to plot facet "A" separately and then combine the plots with 
one of several ways to combine ggplot plots. Below is an example with 
cowplot::plot_grid



library(ggplot2)
library(dplyr)
library(cowplot)

p1 <- df %>%
  filter(nm == "A") %>%
  ggplot(aes(x = date)) +
  geom_line(aes(y = val2)) +
  facet_wrap(~ nm, scales = "free_y") +
  theme(plot.margin = unit(c(0.2, 0, 0.1, 0), "cm"))

p2 <- df %>%
  filter(nm != "A") %>%
  ggplot(aes(x = date)) +
  geom_col(aes(y = val0), na.rm = TRUE, fill = "white") +
  geom_line(aes(y = val1)) +
  ylab("") +
  facet_wrap(~ nm, scales = "free_y")

plot_grid(p1, p2, rel_widths = c(1, 2))


Hope this helps,

Rui Barradas



Às 20:10 de 01/08/2024, p...@philipsmith.ca escreveu:
Thanks for the suggestion, but this does not give me what I want. Each 
chart needs its own unique scale on the y-axis.


Philip


On 2024-08-01 15:08, Rui Barradas wrote:

Às 19:01 de 01/08/2024, p...@philipsmith.ca escreveu:
I am asking for help with a ggplot2 program that has facets. There 
are actually 100 facets in my program, but in the example below I 
have limited the number to 3. There are two kinds of charts among the 
facets. One kind is a simple line plot with all of the y-values 
greater than zero. The facet for "A" in my example below is this 
kind. The other kind is a line plot combined with a bar chart with 
some of the y-values being positive and others negative. The facets 
for "B" and "C" in my example are this kind.


The facets for "B" and "C" look the way I want them to. However the 
facet for "A" has a scale on the y-axis that starts at zero, whereas 
I would like the minimum value on this scale to be non-zero, chosen 
by ggplot2 to be closer to the minimum value of y for that particular 
facet.


My example may not be the most efficient way to achieve this, but it 
works except for one aspect. Chart A, for which I do not wish to show 
a zero line, does indeed not show a zero line but it nevertheless 
chooses a scale for the y-axis that has a minimum value of zero. How 
can I adjust the code so that it chooses a minimum value on the 
y-axis that is non-zero and closer to the minimum actual y-value (as 
would be the case for a simple line chart alone, without any facets)?


library(ggplot2)
library(dplyr)

df <- data.frame(
   date=c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6),
nm=c("A","B","C","A","B","C","A","B","C","A","B","C","A","B","C","A","B","C"),
   val0=c(NA,-5,4,NA,-3,3,NA,2,4,NA,3,3,NA,3,1,NA,-3,-4),
   val1=c(NA,-3,6,NA,-1,4,NA,5,5,NA,7,2,NA,4,3,NA,-2,-2),
   val2=c(50,NA,NA,53,NA,NA,62,NA,NA,56,NA,NA,54,NA,NA,61,NA,NA),
   zline=c(NA,0,0,NA,0,0,NA,0,0,NA,0,0,NA,0,0,NA,0,0)
)

ggplot(df)+
   geom_col(aes(x=date,y=val0),na.rm=TRUE,fill="white")+
   geom_line(aes(x=date,y=val1))+
   geom_line(aes(x=date,y=val2))+
   geom_hline(aes(yintercept=zline),na.rm=TRUE)+
   facet_wrap(~nm,scales="free_y")

Thank you for your assistance.

Philip

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.

Hello,

Try to remove

scales="free_y"

from facet_wrap(). With scales="free_y" each facet will have its own y 
limits, given by the data plotted in each of them. If you want a 
global y limits, don't use it.


Hope this helps,

Rui Barradas



--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R facets including two kinds of charts

2024-08-01 Thread Rui Barradas

Às 19:01 de 01/08/2024, p...@philipsmith.ca escreveu:
I am asking for help with a ggplot2 program that has facets. There are 
actually 100 facets in my program, but in the example below I have 
limited the number to 3. There are two kinds of charts among the facets. 
One kind is a simple line plot with all of the y-values greater than 
zero. The facet for "A" in my example below is this kind. The other kind 
is a line plot combined with a bar chart with some of the y-values being 
positive and others negative. The facets for "B" and "C" in my example 
are this kind.


The facets for "B" and "C" look the way I want them to. However the 
facet for "A" has a scale on the y-axis that starts at zero, whereas I 
would like the minimum value on this scale to be non-zero, chosen by 
ggplot2 to be closer to the minimum value of y for that particular facet.


My example may not be the most efficient way to achieve this, but it 
works except for one aspect. Chart A, for which I do not wish to show a 
zero line, does indeed not show a zero line but it nevertheless chooses 
a scale for the y-axis that has a minimum value of zero. How can I 
adjust the code so that it chooses a minimum value on the y-axis that is 
non-zero and closer to the minimum actual y-value (as would be the case 
for a simple line chart alone, without any facets)?


library(ggplot2)
library(dplyr)

df <- data.frame(
   date=c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6),
nm=c("A","B","C","A","B","C","A","B","C","A","B","C","A","B","C","A","B","C"),
   val0=c(NA,-5,4,NA,-3,3,NA,2,4,NA,3,3,NA,3,1,NA,-3,-4),
   val1=c(NA,-3,6,NA,-1,4,NA,5,5,NA,7,2,NA,4,3,NA,-2,-2),
   val2=c(50,NA,NA,53,NA,NA,62,NA,NA,56,NA,NA,54,NA,NA,61,NA,NA),
   zline=c(NA,0,0,NA,0,0,NA,0,0,NA,0,0,NA,0,0,NA,0,0)
)

ggplot(df)+
   geom_col(aes(x=date,y=val0),na.rm=TRUE,fill="white")+
   geom_line(aes(x=date,y=val1))+
   geom_line(aes(x=date,y=val2))+
   geom_hline(aes(yintercept=zline),na.rm=TRUE)+
   facet_wrap(~nm,scales="free_y")

Thank you for your assistance.

Philip

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.

Hello,

Try to remove

scales="free_y"

from facet_wrap(). With scales="free_y" each facet will have its own y 
limits, given by the data plotted in each of them. If you want a global 
y limits, don't use it.


Hope this helps,

Rui Barradas




--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] help on date objects...

2024-07-28 Thread Rui Barradas

Às 05:23 de 28/07/2024, akshay kulkarni escreveu:

Dear members,
  WHy is the following code returning NA instead of 
the date?



as.Date("2022-01-02", origin = "1900-01-01",  format = "%y%d%m")

[1] NA


Thanking you,
Yours sincerely,
AKSHAY M KULKARNI

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

There are several reasons for your result.

1. You have 4 digits year but format %y (lower case = 2 digits year) It 
should be %Y

2. Your date has '-' as separator but your format doesn't have a separator.

Also, though less important:

1. You don't need argument origin. This is only needed with numeric to 
date coercion.

2. Are you sure the format is -DD-MM, year-day-month?


as.Date("2022-01-02", format = "%Y-%d-%m")
#> [1] "2022-02-01"

# note the origin is not your posted origin date,
# see the examples on Windows and Excel
# dates in help("as.Date")
as.Date(19024, origin = "1970-01-01")
#> [1] "2022-02-01"


Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] please help generate a square correlation matrix

2024-07-25 Thread Rui Barradas

Às 20:47 de 25/07/2024, Yuan Chun Ding escreveu:

Hi Rui,

You are always very helpful!! Thank you,

I just modified your R codes to remove a row with zero values in both column 
pair as below for my real data.

Ding

dat<-gene22mut.coded
r <- P <- matrix(NA, nrow = 22L, ncol = 22L,
  dimnames = list(names(dat), names(dat)))

for(i in 1:22) {
   #i=1
   x <- dat[[i]]
   for(j in (1:22)) {
 #j=2
 if(i == j) {
   # there's nothing to test, assign correlation 1
   r[i, j] <- 1
 } else {
   tmp <-cbind(x,dat[[j]])
   row0 <-rowSums(tmp)
   tem2 <-tmp[row0!=0,]
   tmp3 <- cor.test(tem2[,1],tem2[,2])
   r[i, j] <- tmp3$estimate
   P[i, j] <- tmp3$p.value
 }
   }
}
r<-as.data.frame(r)
P<-as.data.frame(P)

From: R-help  On Behalf Of Yuan Chun Ding via 
R-help
Sent: Thursday, July 25, 2024 11:26 AM
To: Rui Barradas ; r-help@r-project.org
Subject: Re: [R] please help generate a square correlation matrix

HI Rui, Thank you for the help! You did not remove a row if zero values exist in both 
column pair, right? Ding From: Rui Barradas  Sent: Thursday, 
July 25, 2024 11: 15 AM To: Yuan Chun Ding ;


HI Rui,



Thank you for the  help!



You did not remove a row if zero values exist in both column pair, right?



Ding



From: Rui Barradas mailto:ruipbarra...@sapo.pt>>

Sent: Thursday, July 25, 2024 11:15 AM

To: Yuan Chun Ding mailto:ycd...@coh.org>>; 
r-help@r-project.org<mailto:r-help@r-project.org>

Subject: Re: [R] please help generate a square correlation matrix



Às 17: 39 de 25/07/2024, Yuan Chun Ding via R-help escreveu: > Hi R users, > > I generated 
a square correlation matrix for the dat dataframe below; > dat<-data. 
frame(g1=c(1,0,0,1,1,1,0,0,0), > g2=c(0,1,0,1,0,1,1,0,0), > g3=c(1,1,0,0,0,1,0,0,0),





Às 17:39 de 25/07/2024, Yuan Chun Ding via R-help escreveu:




Hi R users,











I generated a square correlation matrix for the dat dataframe below;





dat<-data.frame(g1=c(1,0,0,1,1,1,0,0,0),





  g2=c(0,1,0,1,0,1,1,0,0),





  g3=c(1,1,0,0,0,1,0,0,0),





  g4=c(0,1,0,1,1,1,1,1,0))





library("Hmisc")





dat.rcorr = rcorr(as.matrix(dat))





dat.r <-round(dat.rcorr$r,2)











however, I want to modify this correlation calculation;





my dat has more than 1000 rows and 22 columns;





in each column, less than 10% values are 1, most of them are 0;





so I want to remove a  row with value of zero in both columns when calculate 
correlation between two columns.





I just want to check whether those values of 1 are correlated between two 
columns.





Please look at my code in the following;











cor.4gene <-matrix(0,nrow=4*4, ncol=4)





for (i in 1:4){





#i=1





for (j in 1:4) {





  #j=1





  d <-dat[,c(i,j)]%>%





filter(eval(as.symbol(colnames(dat)[i]))!=0 |





 eval(as.symbol(colnames(dat)[j]))!=0)





  c <-cor.test(d[,1],d[,2])





  cor.4gene[i*j,]<-c(colnames(dat)[i],colnames(dat)[j],





  c$estimate,c$p.value)





}





}





cor.4gene<-as.data.frame(cor.4gene)%>%filter(V1 !=0)





colnames(cor.4gene)<-c("gene1","gene2","cor","P")











Can you tell me what mistakes I made?





first, why cor is NA when calculation of correlation for g1 and g1, I though it 
should be 1.











cor.4gene$cor[is.na(cor.4gene$cor)]<-1





cor.4gene$cor[is.na(cor.4gene$P)]<-0





cor.4gene.sq <-pivot_wider(cor.4gene, names_from = gene1, values_from = cor)











Then this line of code above did not generate a square matrix as what the HMisc 
library did.





How to fix my code?











Thank you,











Ding

















--











-SECURITY/CONFIDENTIALITY WARNING-











This message and any attachments are intended solely for the individual or 
entity to which they are addressed. This communication may contain information 
that is privileged, confidential, or exempt from disclosure under applicable 
law (e.g., personal health information, research data, financial information). 
Because this e-mail has been sent without encryption, individuals other than 
the intended recipient may be able to view the information, forward it to 
others or tamper with the information without the knowledge or consent of the 
sender. If you are not the intended recipient, or the employee or person 
responsible for delivering the message to the intended recipient, any 
dissemination, distribution or copying of the communication is strictly 
prohibited. If you received the communication in error, please notify the 
sender immediately by replying to this message 

Re: [R] please help generate a square correlation matrix

2024-07-25 Thread Rui Barradas
00 NA
P
#>   g1 g2g3 g4
#> g1NA 0.79797170 0.4070838 0.68452834
#> g2 0.7979717 NA 0.4070838 0.06758329
#> g3 0.4070838 0.40708382NA 1.0000
#> g4 0.6845283 0.06758329 1.000 NA


You can put these two results in a list, like Hmisc::rcorr does.

lst_rcorr <- list(r = r, P = P)


Hope this helps,

Rui Barradas




--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using the pipe, |>, syntax with "names<-"

2024-07-20 Thread Rui Barradas

Às 21:46 de 20/07/2024, Bert Gunter escreveu:

With further fooling around, I realized that explicitly assigning my
last "solution" 'works'; i.e.

names(z)[2] <- "foo"

can be piped as:

  z <- z |>(\(x) "names<-"(x,value = "[<-"(names(x),2,'foo')))()

z

   a foo
1 1   a
2 2   b
3 3   c

This is even awfuller than before. So my query still stands.

-- Bert

On Sat, Jul 20, 2024 at 1:14 PM Bert Gunter  wrote:


Nope, I still got it wrong: None of my approaches work.  :(

So my query remains: how to do it via piping with |> ?

Bert


On Sat, Jul 20, 2024 at 1:06 PM Bert Gunter  wrote:


This post is likely pretty useless;  it is motivated by a recent post
from "Val" that was elegantly answered using Tidyverse constructs, but
I wondered how to do it using base R only. Along the way, I ran into
the following question to which I think my answer (below) is pretty
awful. I would be interested in more elegant base R approaches. So...

z <- data.frame(a = 1:3, b = letters[1:3])

z

   a h
1 1 a
2 2 b
3 3 c

Suppose I want to change the name of the second column of z from 'b'
to 'foo' . This is very easy using nested function syntax by:

names(z)[2] <- "foo"

z

   a foo
1 1   a
2 2   b
3 3   c

Now suppose I wanted to do this using |> syntax, along the lines of:

z |> names()[2] <- "foo"  ## throws an error

Slightly fancier is:

z |> (\(x)names(x)[2] <- "b")()
## does nothing, but does not throw an error.

However, the following, which resulted from a more careful read of
?names works (after changing the name of the second column back to "b"
of course):

z |>(\(x) "names<-"(x,value = "[<-"(names(x),2,'foo')))()

z

   a foo
1 1   a
2 2   b
3 3   c

This qualifies to me as "pretty awful." I'm sure there are better ways
to do this using pipe syntax, so I would appreciate any better
approaches.

Best,
Bert


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

This is not exactly the same but in one of your attempts all you have to 
do is to return x.

The following works and does something.


z |> (\(x){names(x)[2] <- "foo";x})()
#   a foo
# 1 1   a
# 2 2   b
# 3 3   c


Hope this helps,

Rui Barradas



--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ggplot two-factor legend

2024-07-18 Thread Rui Barradas

Às 17:43 de 18/07/2024, Rui Barradas escreveu:

Às 16:27 de 18/07/2024, SIBYLLE STÖCKLI via R-help escreveu:

Hi

I am using ggplot to visualise y for a two-factorial group (Bio: 0 and 
1) x
= 6 years. I was able to adapt the colour of the lines (green and red) 
and

the linetype (solid and dashed).
Challenge: my code produces now two legends. One with the colors for the
group and one with the linetype for the group. Does somebody have a 
hint how
to adapt the code to produce one legend? Group 0 = red and dashed, 
Group 1 =

green and solid?


MS1<- MS %>% filter(QI_A!="NA") %>% droplevels()
dev.new(width=4, height=2.75)
par(mar = c(0,6,0,0))
p1<-ggplot(data = MS1, aes(x= Jahr, y= QI_A,group=Bio,color=Bio,
linetype=Bio)) +
 geom_smooth(aes(fill=Bio) , method = "lm" , formula = y ~ x +
I(x^2),linewidth=1) +
theme(panel.background = element_blank())+
theme(axis.line = element_line(colour = "black"))+
   theme(axis.text=element_text(size=18))+
   theme(axis.title=element_text(size=20))+
ylab("Anteil BFF an LN [%]") +xlab("Jahr")+
scale_color_manual(values=c("red","dark green"), labels=c("ÖLN",
"BIO"))+
scale_fill_manual(values=c("red","dark green"), labels= c("ÖLN",
"BIO"))+
theme(legend.title = element_blank())+
   theme(legend.text=element_text(size=20))+
   scale_linetype_manual(values=c("dashed", "solid"))
p1<-p1 + expand_limits(y=c(0, 30))

kind regards
Sibylle

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.

Hello,

To have one legend only, the labels must be the same. Try using

labels=c("ÖLN", "BIO")

in

scale_linetype_manual(values=c("dashed", "solid"), labels=c("ÖLN", "BIO"))


Hope this helps,

Rui Barradas



Hello,

Here is a more complete an answer with the built-in data set mtcars.
Note that the group aesthetic is not used. This is because linetype is 
categorical (after mutate) and there's no need to group again by the 
same variable (am).


Remove labels from scale_linetype_manual and there are two legends but 
with the same labels the legends merge.



library(ggplot2)
library(dplyr)

mtcars %>%
  # linetype must be categorical
  mutate(am = factor(am)) %>%
  ggplot(aes(hp, disp, color = am, linetype = am)) +
  geom_line() +
  scale_color_manual(
values = c("red","dark green"),
labels = c("ÖLN", "BIO")
  ) +
  scale_linetype_manual(
values = c("dashed", "solid"),
labels = c("ÖLN", "BIO")
  ) +
  theme_bw()


Hope this helps,

Rui Barradas



--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ggplot two-factor legend

2024-07-18 Thread Rui Barradas

Às 16:27 de 18/07/2024, SIBYLLE STÖCKLI via R-help escreveu:

Hi

I am using ggplot to visualise y for a two-factorial group (Bio: 0 and 1) x
= 6 years. I was able to adapt the colour of the lines (green and red) and
the linetype (solid and dashed).
Challenge: my code produces now two legends. One with the colors for the
group and one with the linetype for the group. Does somebody have a hint how
to adapt the code to produce one legend? Group 0 = red and dashed, Group 1 =
green and solid?


MS1<- MS %>% filter(QI_A!="NA") %>% droplevels()
dev.new(width=4, height=2.75)
par(mar = c(0,6,0,0))
p1<-ggplot(data = MS1, aes(x= Jahr, y= QI_A,group=Bio,color=Bio,
linetype=Bio)) +
geom_smooth(aes(fill=Bio) , method = "lm" , formula = y ~ x +
I(x^2),linewidth=1) +
theme(panel.background = element_blank())+
theme(axis.line = element_line(colour = "black"))+
   theme(axis.text=element_text(size=18))+
   theme(axis.title=element_text(size=20))+
ylab("Anteil BFF an LN [%]") +xlab("Jahr")+
scale_color_manual(values=c("red","dark green"), labels=c("ÖLN",
"BIO"))+
scale_fill_manual(values=c("red","dark green"), labels= c("ÖLN",
"BIO"))+
theme(legend.title = element_blank())+
   theme(legend.text=element_text(size=20))+
   scale_linetype_manual(values=c("dashed", "solid"))
p1<-p1 + expand_limits(y=c(0, 30))

kind regards
Sibylle

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

To have one legend only, the labels must be the same. Try using

labels=c("ÖLN", "BIO")

in

scale_linetype_manual(values=c("dashed", "solid"), labels=c("ÖLN", "BIO"))


Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Obtaining predicted probabilities for Logistic regression

2024-07-13 Thread Rui Barradas

Às 12:13 de 13/07/2024, Christofer Bogaso escreveu:

Hi,

I ran below code

Dat = 
read.csv('https://raw.githubusercontent.com/sam16tyagi/Machine-Learning-techniques-in-python/master/logistic%20regression%20dataset-Social_Network_Ads.csv')
head(Dat)
Model = glm(Purchased ~ Gender, data = Dat, family = binomial())
head(predict(Model, type="response"))
My_Predict = 1/(1+exp(-1 * (as.vector(coef(Model))[1] *
as.vector(coef(Model))[2] * ifelse(Dat['Gender'] == "Male", 1, 0
head(My_Predict)

However, My_Predict and predict(Model, type="response")) are differing
when I tried to manually calculate prediction.

Could you please help to identify what was the mistake I made?

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

Sometimes when there is an error, the best way to correct it is to 
rewrite the offending part of the code.

In your case, after as.vector(coef(Model))[1] you should have a plus sign.



Dat = 
read.csv('https://raw.githubusercontent.com/sam16tyagi/Machine-Learning-techniques-in-python/master/logistic%20regression%20dataset-Social_Network_Ads.csv')

head(Dat)
Model = glm(Purchased ~ Gender, data = Dat, family = binomial())

# use matrix algebra
x <- cbind(1, (Dat$Gender == "Male")) %*% coef(Model)
pred1 <- exp(x)/(1 + exp(x))

# use the fitted line equation
y <- coef(Model)[1L] + coef(Model)[2L] * (Dat$Gender == "Male")
pred2 <- exp(y)/(1 + exp(y))

head(predict(Model, type="response"))
head(pred1) |> c()
head(pred2)


Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grep

2024-07-12 Thread Rui Barradas



Hello,l

Though the question is already answered, here is another answer to what 
is 'x'.
The output in the OP is not a lm or glm output but if your regression 
model was programmed according to recommended practices, there must be a 
'coefficients' member in the list or object it returns and the following 
should work.



# this is 'x', a named character vector
coef(fit)
#
fit |> coef() |> names() |> grep("somewhat|very", x = _)


Hope this helps,

Rui Barradas

Às 10:26 de 12/07/2024, Steven Yen escreveu:
Thanks. In this case below, what is "x"? I tried rownames(out) which did 
not work.


Sorry. Does this sound like homework to you?

On 7/12/2024 5:09 PM, Uwe Ligges wrote:



On 12.07.2024 10:54, Steven Yen wrote:

Below is part a regression printout. How can I use "grep" to identify
rows headed by variables (first column) with a certain label. In this
case, I like to find variables containing "somewhath",
"veryh", "somewhatm", "verym", "somewhatc", "veryc","somewhatl",
"veryl". The result should be an index 6:13 or 6,7,8,9,10,11,12,13. Note
that they all contain "somewhat" and "very". Thanks.


Sounds like homework?

which(grep("very|somewhat", x))

Best,
Uwe Ligges



est se t p g sig x.1.age 0.0341 0.0138 2.4766 0.0133 -3.8835e-04 **
x.1.sleep -0.1108 0.0059 -18.6277 0. -4.4572e-04 *** x.1.primary
-0.0694 0.0289 -2.4002 0.0164 -9.9638e-06 ** x.1.middle -0.2909 0.0356
-8.1657 0. -1.4913e-05 *** x.1.high -0.4267 0.0463 -9.2118 0.
-3.6246e-05 *** x.1.somewhath -0.6188 0.0256 -24.1971 0. -3.1337e-05
*** x.1.veryh -0.7580 0.0331 -22.8695 0. -2.9558e-05 ***
x.1.somewhatm -0.3413 0.0426 -8.0112 0. -1.8920e-05 *** x.1.verym
-0.3813 0.0446 -8.5413 0. -4.4029e-05 *** x.1.somewhatc -0.3101
0.0649 -4.7783 0. -1.4353e-05 *** x.1.veryc -0.2977 0.0648 -4.5910
0. -4.8986e-05 *** x.1.somewhatl -0.6310 0.0424 -14.8846 0.
-1.9543e-05 *** x.1.veryl -0.9132 0.0462 -19.7525 0. -4.4603e-05 ***
...

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.



--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] simple problem with unquoting argument

2024-07-03 Thread Rui Barradas

Às 09:13 de 03/07/2024, Troels Ring escreveu:
Hi  friends - I'm in problems finding out how to unquote - I have a 
series of vectors named adds1adds11 and need to e.g. find the sum of 
each of them


So I try

SS <- c()

for (i in 1:11) {

e <- paste("adds",i,sep="")

SS[i]  <- sum(xx(e)) }

Now e looks right - but I have been unable to find out how to get the 
string e converted to the proper argument for sum()  - i.e. what  is 
function xx?


All best wishes
Troels Ring, Aalborg, Denmark

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.

Hello,

Function xx is ?get or mget (same help page).
You can get the vectors adds all in one instruction with mget or one at 
a time with get.



adds1 <- 1:10
adds2 <- 2:10
adds3 <- 3:10
adds4 <- 4:10
adds5 <- 5:10

# create SS with the required length beforehand
SS <- numeric(5L)
for (i in 1:5) {
  e <- paste("adds",i,sep="")
  SS[i]  <- sum(get(e))
}
SS
#> [1] 55 54 52 49 45


Or all in one instruction with the assistance of ?ls.



# ls(pattern = "^adds") |> mget() |> lapply(sum)
ls(pattern = "^adds") |> mget() |> sapply(sum)
#> adds1 adds2 adds3 adds4 adds5
#>5554524945


Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Create matrix with variable number of columns AND CREATE NAMES FOR THE COLUMNS

2024-07-01 Thread Rui Barradas

Às 16:54 de 01/07/2024, Sorkin, John escreveu:

#I am trying to write code that will create a matrix with a variable number of 
columns where the #number of columns is 1+Grps
#I can do this:
NSims <- 4
Grps <- 5
DiffMeans <- matrix(nrow=NSims,ncol=1+Grps)
DiffMeans

#I have a problem when I try to name the columns of the matrix. I want the 
first column to be NSims, #and the other columns to be something like Value1, 
Value2, . . . Valuen where N=Grps

# I wrote a function to build a list of length Grps
createValuelist <- function(num_elements) {
   for (i in 1:num_elements) {
 cat("Item", i, "\n", sep = "")
   }
}
createValuelist(Grps)

# When I try to assign column names I receive an error:
#Error in dimnames(DiffMeans) <- list(NULL, c("NSim", createValuelist(Grps))) :
# length of 'dimnames' [2] not equal to array extent
dimnames(DiffMeans) <- list(NULL,c("NSim",createValuelist(Grps)))
DiffMeans

# Thank you for your help!


John David Sorkin M.D., Ph.D.
Professor of Medicine, University of Maryland School of Medicine;
Associate Director for Biostatistics and Informatics, Baltimore VA Medical 
Center Geriatrics Research, Education, and Clinical Center;
PI Biostatistics and Informatics Core, University of Maryland School of 
Medicine Claude D. Pepper Older Americans Independence Center;
Senior Statistician University of Maryland Center for Vascular Research;

Division of Gerontology and Paliative Care,
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
Cell phone 443-418-5382



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

Sorry for my first answer, I thought you only wanted to name the matrix 
columns. After reading the OP again, this time actually reading it, I 
realized you also want to create the matrix. This is even in the 
question title line :(.




create_matrix <- function(nsims, ngrps, First = "NSims", Prefix = "Value") {
  # could also be paste0(Prefix, seq_len(ngrps))
  grp_names <- sprintf("%s%d", Prefix, seq_len(ngrps))
  nms <- c(First, grp_names)
  matrix(nrow = nsims, ncol = 1L + ngrps, dimnames = list(NULL, nms))
}

NSims <- 4
Grps <- 5
create_matrix(NSims, Grps)
#>  NSims Value1 Value2 Value3 Value4 Value5
#> [1,]NA NA NA NA NA NA
#> [2,]NA NA NA NA     NA NA
#> [3,]NA NA NA NA NA NA
#> [4,]NA NA NA NA NA NA



Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Create matrix with variable number of columns AND CREATE NAMES FOR THE COLUMNS

2024-07-01 Thread Rui Barradas

Às 16:54 de 01/07/2024, Sorkin, John escreveu:

#I am trying to write code that will create a matrix with a variable number of 
columns where the #number of columns is 1+Grps
#I can do this:
NSims <- 4
Grps <- 5
DiffMeans <- matrix(nrow=NSims,ncol=1+Grps)
DiffMeans

#I have a problem when I try to name the columns of the matrix. I want the 
first column to be NSims, #and the other columns to be something like Value1, 
Value2, . . . Valuen where N=Grps

# I wrote a function to build a list of length Grps
createValuelist <- function(num_elements) {
   for (i in 1:num_elements) {
 cat("Item", i, "\n", sep = "")
   }
}
createValuelist(Grps)

# When I try to assign column names I receive an error:
#Error in dimnames(DiffMeans) <- list(NULL, c("NSim", createValuelist(Grps))) :
# length of 'dimnames' [2] not equal to array extent
dimnames(DiffMeans) <- list(NULL,c("NSim",createValuelist(Grps)))
DiffMeans

# Thank you for your help!


John David Sorkin M.D., Ph.D.
Professor of Medicine, University of Maryland School of Medicine;
Associate Director for Biostatistics and Informatics, Baltimore VA Medical 
Center Geriatrics Research, Education, and Clinical Center;
PI Biostatistics and Informatics Core, University of Maryland School of 
Medicine Claude D. Pepper Older Americans Independence Center;
Senior Statistician University of Maryland Center for Vascular Research;

Division of Gerontology and Paliative Care,
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
Cell phone 443-418-5382



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

Something like this?



names_cols <- function(x, First = "NSims", Prefix = "Value") {
  nms <- c(First, sprintf("%s%d", Prefix, seq_len(ncol(x) - 1L)))
  colnames(x) <- nms
  x
}

NSims <- 4
Grps <- 5
DiffMeans <- matrix(nrow=NSims,ncol=1+Grps)
names_cols(DiffMeans)
#>  NSims Value1 Value2 Value3 Value4 Value5
#> [1,]NA NA NA NA NA NA
#> [2,]NA NA NA NA     NA NA
#> [3,]NA NA NA NA NA NA
#> [4,]NA NA NA NA NA NA



Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Naming output file

2024-06-24 Thread Rui Barradas

Às 12:41 de 24/06/2024, Steven Yen escreveu:

I would like a loop to

(1) read data files 2010midata1,2010midata2,2010midata3; and

(2)  name OUTPUT bop1,bop2,bop3.

I succeeded in line 3 of the code below,

BUT not line 4. The error message says:

Error in paste0("bop", im) <- boprobit(eqs, mydata, wt = weight, method
= "NR", : target of assignment expands to non-language object Please
help. Thanks.

m<-3
for (im in 1:m) {
mydata<-read.csv(paste0("2010midata",im,".csv"))
paste0("bop",im)<-boprobit(eqs,mydata,wt=weight,method="BHHH",tol=0,reltol=0,gradtol=1e-5,Fisher=TRUE)
}



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

Here are two ways, with a for loop and with a lapply loop.


# for loop
m <- 3
# create the input filenames in one instruction
INPUT <- paste0("2010midata", seq.int(m), ".csv")
# create a named list with m elements to store the output
OUTPUT <- vector("list", length = m) |> setNames(paste0("bop", seq.int(m)))
for(i in seq.int(m)) {
  mydata <- read.csv(INPUT[[i]])
  OUTPUT[[i]] <- boprobit(eqs, mydata, wt=weight, method="BHHH",
  tol=0, reltol=0, gradtol=1e-5, Fisher=TRUE)
}



# lapply loop
m <- 3
# create the input filenames in one instruction
INPUT <- paste0("2010midata", seq.int(m), ".csv")
# no need to create the output list, it will be the
# return value of lapply
OUTPUT <- lapply(INPUT, \(f) {
  mydata <- read.csv(f)
  boprobit(eqs, mydata, wt=weight, method="BHHH",
   tol=0, reltol=0, gradtol=1e-5, Fisher=TRUE)
})
# assign the output list's names
names(OUTPUT) <- paste0("bop", seq.int(m))


Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Bug with writeClipboard in {utils}

2024-06-20 Thread Rui Barradas

Hello,

Inline.

Às 14:15 de 20/06/2024, Barthelemy Tanguy escreveu:

Hello,

Thank you for your different tests.

You have that you didn't find any errors with Rscript or with R but I have the 
impression that your test with R (second test) showed additional and unwanted 
characters (second line of the output)?


You are right, in the case I posted there were unwanted characters.
Most of the tests I ran there were no additional, unwanted charcters, 
though.

This is definitely unstable, that's all I can say.

Hope this helps,

Rui Barradas


Thank you again

Tanguy BARTHELEMY


____
De : Rui Barradas 
Envoyé : mercredi 19 juin 2024 19:26
À : Barthelemy Tanguy; r-help@r-project.org
Objet : Re: [R] Bug with writeClipboard in {utils}

« Ce courriel provient d’un expéditeur extérieur à l’Insee. Compte tenu du 
contexte de menace cyber actuel il convient d’être extrêmement vigilant sur 
l’émetteur et son contenu avant d’ouvrir une pièce jointe, de cliquer sur un 
lien internet présent dans ce message ou d'y répondre. »


Às 11:12 de 18/06/2024, Barthelemy Tanguy via R-help escreveu:

Hello,

I'm encountering what seems to be a bug when using the `writeClipboard()` 
function in the R {utils} package.
When I try to copy text to the clipboard, I notice that I get extra characters 
when I try to paste it (by hand with CTRL+V or with the `readClipboard()` 
function from R packages {utils}).

Here's my example:

``` r
utils::writeClipboard("plot(AirPassengers)")
for (k in 1:10) {
  print(utils::readClipboard())
}
#> [1] "plot(AirPassengers)" "⤀攀"
#> [1] "plot(AirPassengers)" "\u0a00"
#> [1] "plot(AirPassengers)" "\xed\xb0\x80ư"
#> [1] "plot(AirPassengers)"
#> [1] "plot(AirPassengers)"
#> [1] "plot(AirPassengers)"
#> [1] "plot(AirPassengers)"
#> [1] "plot(AirPassengers)"
#> [1] "plot(AirPassengers)" "⤀"
#> [1] "plot(AirPassengers)"
Message d'avis :
Dans utils::readClipboard() : unpaired surrogate Unicode point dc00
```

So I don't always get the same result.
I opened a problem in the {clipr} GitHub repository before realizing it's a 
{tools} problem: https://github.com/mdlincoln/clipr/issues/68

Is this a bug or something I haven't configured properly?


Thank you very much


Tanguy BARTHELEMY


   [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

I have reproduced part of the behavior in the OP but it will depend on
the GUI or command line used.

With Rscript or with R I haven't found any errors.
With Rgui or with RStudio, yes, the output was not the expected output.

All code run in R 4.4.0 on Windows 11.

The script rscript.R is


utils::capture.output({
utils::writeClipboard("plot(AirPassengers)")
for (k in 1:10) {
print(utils::readClipboard())
}
sessionInfo()
}, file = "rhelp.txt")


---

Here are the results I got.

1) Command:

Rscript rscript.R

Output:

[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
R version 4.4.0 (2024-04-24 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)

Matrix products: default


locale:
[1] LC_COLLATE=Portuguese_Portugal.utf8
LC_CTYPE=Portuguese_Portugal.utf8
[3] LC_MONETARY=Portuguese_Portugal.utf8 LC_NUMERIC=C

[5] LC_TIME=Portuguese_Portugal.utf8

time zone: Europe/Lisbon
tzcode source: internal

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.4.0

---

2) Command:
R -q -f rscript.R

Output:

  > utils::writeClipboard("plot(AirPassengers)")
  > for (k in 1:10) {
+ print(utils::readClipboard())
+ }
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)" "㨀Ǐ\005"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
  > sessionInfo()
R version 4.4.0 (2024-04-24 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)

Matrix products: default


locale:

Re: [R] Bug with writeClipboard in {utils}

2024-06-19 Thread Rui Barradas
8 
LC_CTYPE=Portuguese_Portugal.utf8
[3] LC_MONETARY=Portuguese_Portugal.utf8 LC_NUMERIC=C 


[5] LC_TIME=Portuguese_Portugal.utf8

time zone: Europe/Lisbon
tzcode source: internal

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
 [1] gtable_0.3.4 tensorA_0.36.2.1 ggplot2_3.5.0
 [4] QuickJSR_1.1.3   processx_3.8.3   inline_0.3.19
 [7] lattice_0.22-5   tzdb_0.4.0   callr_3.7.5
[10] vctrs_0.6.5  tools_4.4.0  ps_1.7.6
[13] generics_0.1.3   stats4_4.4.0 curl_5.2.1
[16] parallel_4.4.0   sandwich_3.1-0   tibble_3.2.1
[19] fansi_1.0.6  chron_2.3-61 pkgconfig_2.0.3
[22] brms_2.21.0  Matrix_1.6-5 checkmate_2.3.1
[25] distributional_0.4.0 RcppParallel_5.1.7   lifecycle_1.0.4
[28] compiler_4.4.0   stringr_1.5.1Brobdingnag_1.2-9
[31] munsell_0.5.0codetools_0.2-19 bayesplot_1.11.1
[34] pillar_1.9.0 crayon_1.5.2 MASS_7.3-60.0.1
[37] StanHeaders_2.32.6   bridgesampling_1.1-2 abind_1.4-5
[40] multcomp_1.4-25  nlme_3.1-164 posterior_1.5.0
[43] rstan_2.32.5 tidyselect_1.2.0 mvtnorm_1.2-3
[46] stringi_1.7.12   dplyr_1.1.4  splines_4.4.0
[49] grid_4.4.0   colorspace_2.1-0 cli_3.6.2
[52] magrittr_2.0.3   loo_2.6.0survival_3.5-8
[55] pkgbuild_1.4.2   utf8_1.2.4   TH.data_1.1-2
[58] readr_2.1.4  prettyunits_1.2.0scales_1.3.0
[61] backports_1.4.1  estimability_1.5 httr_1.4.7
[64] matrixStats_1.0.0emmeans_1.10.0   gridExtra_2.3
[67] hms_1.1.3zoo_1.8-12   coda_0.19-4.1
[70] V8_4.4.2 rstantools_2.3.1.1   rlang_1.1.3
[73] Rcpp_1.0.12  xtable_1.8-4 glue_1.7.0
[76] ppcor_1.1rstudioapi_0.15.0jsonlite_1.8.8
[79] R6_2.5.1

---

4) GUI: Rgui
Output:

[1] "plot(AirPassengers)" "က \005ⷀǏǭ"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
R version 4.4.0 (2024-04-24 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)

Matrix products: default


locale:
[1] LC_COLLATE=Portuguese_Portugal.utf8 
LC_CTYPE=Portuguese_Portugal.utf8
[3] LC_MONETARY=Portuguese_Portugal.utf8 LC_NUMERIC=C 


[5] LC_TIME=Portuguese_Portugal.utf8

time zone: Europe/Lisbon
tzcode source: internal

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.4.0



Hope this helps,

Rui Barradas




--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] code for year month day hr format

2024-06-17 Thread Rui Barradas
-07-11  9  6.2  523 110  -34 167.1
4619  2012-07-11 10  5.5  527 110  -25 167.1
4620  2012-07-11 11  6.0  527 110  -22 167.1
4621  2012-07-11 12  5.8  518 110  -22 167.1
4622  2012-07-11 13  5.4  515 110  -19 167.1
4623  2012-07-11 14  5.3  513 110  -21 167.1
4624  2012-07-11 15  5.5  512 110  -21 167.1
4625  2012-07-11 16  5.2  505 110  -21 167.1
4626  2012-07-11 17  4.9  512 110  -18 167.1
4627  2012-07-11 18  5.1  514 110  -17 167.1
4628  2012-07-11 19  6.2  520 110  -13 167.1
4629  2012-07-11 20  6.6  510 110  -17 167.1
4630  2012-07-11 21  6.2  516 110  -18 167.1
4631  2012-07-11 22  5.8  512 110  -24 167.1
4632  2012-07-11 23  5.9  509 110  -31 167.1
4633  2012-07-12  0  6.1  502 125  -34 170.9
4634  2012-07-12  1  6.6  506 125  -34 170.9
4635  2012-07-12  2  6.1  502 125  -22 170.9
4636  2012-07-12  3  5.8  480 125  -18 170.9
4637  2012-07-12  4  5.7  474 125  -15 170.9
4638  2012-07-12  5  5.4  474 125  -23 170.9
4639  2012-07-12  6  6.1  466 125  -28 170.9
4640  2012-07-12  7  5.4  460 125  -32 170.9
4641  2012-07-12  8  4.8  453 125  -32 170.9
4642  2012-07-12  9  4.7  445 125  -28 170.9
4643  2012-07-12 10  4.9  436 125  -29 170.9
4644  2012-07-12 11  4.9  441 125  -23 170.9
4645  2012-07-12 12  4.9  440 125  -18 170.9
4646  2012-07-12 13  4.2  417 125  -15 170.9
4647  2012-07-12 14  3.5  414 125  -16 170.9
4648  2012-07-12 15  3.9  418 125  -14 170.9
4649  2012-07-12 16  4.2  419 125  -11 170.9
4650  2012-07-12 17  3.9  416 125  -11 170.9
4651  2012-07-12 18  4.0  416 125  -12 170.9
4652  2012-07-12 19  3.8  415 125  -13 170.9
4653  2012-07-12 20  3.9  410 125  -16 170.9
4654  2012-07-12 21  3.8  402 125  -20 170.9
4655  2012-07-12 22  3.8  395 125  -19 170.9
4656  2012-07-12 23  3.9  394 125  -19 170.9
4657  2012-07-13  0  3.9  395 129  -20 152.1
4658  2012-07-13  1  3.8  395 129  -19 152.1
4659  2012-07-13  2  3.8  391 129  -17 152.1
4660  2012-07-13  3  3.8  385 129  -16 152.1
4661  2012-07-13  4  3.7  376 129  -15 152.1
4662  2012-07-13  5  3.8  371 129  -15 152.1
4663  2012-07-13  6  3.8  365 129  -14 152.1
4664  2012-07-13  7  3.9  357 129  -15 152.1
4665  2012-07-13  8  4.0  354 129  -18 152.1
4666  2012-07-13  9  3.9  355 129  -20 152.1
4667  2012-07-13 10  3.9  353 129  -19 152.1
4668  2012-07-13 11  3.7  357 129  -18 152.1
4669  2012-07-13 12  3.8  357 129  -18 152.1
4670  2012-07-13 13  3.8  355 129  -18 152.1
4671  2012-07-13 14  3.7  347 129  -17 152.1
4672  2012-07-13 15  3.7  350 129  -15 152.1
4673  2012-07-13 16  3.7  346 129  -13 152.1
4674  2012-07-13 17  3.7  341 129  -10 152.1
4675  2012-07-13 18  3.3  340 129   -8 152.1
4676  2012-07-13 19  3.2  338 129   -9 152.1
4677  2012-07-13 20  3.3  333 129  -10 152.1
4678  2012-07-13 21  3.4  329 129   -9 152.1
4679  2012-07-13 22  3.9  326 129   -7 152.1
4680  2012-07-13 23  4.0  324 129   -8 152.1
4681  2012-07-14  0  4.0  324 125   -9 152.8
4682  2012-07-14  1  4.0  325 125   -9 152.8
4683  2012-07-14  2  3.9  329 125   -7 152.8
4684  2012-07-14  3  4.1  326 125   -5 152.8
4685  2012-07-14  4  4.4  325 125   -6 152.8
4686  2012-07-14  5  4.5  323 125   -5 152.8
4687  2012-07-14  6  5.0  319 125   -5 152.8
4688  2012-07-14  7  5.2  317 125   -8 152.8
4689  2012-07-14  8  5.4  323 125   -7 152.8
4690  2012-07-14  9  5.4  318 125   -6 152.8
4691  2012-07-14 10  5.2  316 125   -8 152.8
4692  2012-07-14 11  5.2  326 125   -5 152.8
4693  2012-07-14 12  4.6  335 125   -5 152.8
4694  2012-07-14 13  4.2  340 125   -5 152.8
4695  2012-07-14 14  5.0  350 125   -5 152.8
4696  2012-07-14 15  4.9  366 125   -1 152.8
4697  2012-07-14 16  3.9  355 125   -5 152.8
4698  2012-07-14 17  5.1  369 125   -5 152.8
4699  2012-07-14 18 11.0  419 125   15 152.8
4700  2012-07-14 19 14.6  574 1254 152.8
4701  2012-07-14 20 11.2  569 125   -7 152.8
4702  2012-07-14 21 13.9  568 125   -5 152.8
4703  2012-07-14 22 15.3  574 1251 152.8
4704  2012-07-14 23 19.2  644 125   -2 152.8
4705  2012-07-15  0 11.4  665 1179 145.1
4706  2012-07-15  1  9.7  657 1170 145.1
*Jibrin Adejoh Alhassan (Ph.D)*
Department of Physics and Astronomy,
University of Nigeria, Nsukka


On Mon, Jun 17, 2024 at 9:23 AM Rui Barradas  wrote:


Às 09:12 de 17/06/2024, Jibrin Alhassan escreveu:

Hello Rui,
Here is the head(df1) output
Date HR IMF SWS SSN Dst f10.7
1 2012-01-01  0 4.0 379  71  -8 999.9
2 2012-01-01  1 4.4 386  71  -3 999.9
3 2012-01-01  2 4.8 380  71  -4 999.9
4 2012-01-01  3 5.4 374  71  -5 999.9
5 2012-01-01  4 4.5 369  71  -9 999.9
6 2012-01-01  5 4.2 368  71  -7 999.9
Many thanks.
*Jibrin Adejoh Alhassan (Ph.D)*
Department of Physics and Astronomy,
University of Nigeria, Nsukka


On Mon, Jun 17, 2024 at 8:14 AM Rui Barradas 

wrote:



Às 07:53 de 17/06/2024, Jibrin Alhassan escreveu:

Part of it is pasted below
YEAR DOY HRIMF SWS   SSN   Dst f10.7
2012   1  0   4.0  379.  71-8 999.9
2012   1  1   4.4  386.  71-3 999.9
2012   1  2   4.8  380.  71-4 999.9
2012   1  3   5.4  374.  71-5 999.9
2012   1  4   4.5  369.  71-9 999.9
2012   1  5

Re: [R] code for year month day hr format

2024-06-17 Thread Rui Barradas

Às 09:12 de 17/06/2024, Jibrin Alhassan escreveu:

Hello Rui,
Here is the head(df1) output
Date HR IMF SWS SSN Dst f10.7
1 2012-01-01  0 4.0 379  71  -8 999.9
2 2012-01-01  1 4.4 386  71  -3 999.9
3 2012-01-01  2 4.8 380  71  -4 999.9
4 2012-01-01  3 5.4 374  71  -5 999.9
5 2012-01-01  4 4.5 369  71  -9 999.9
6 2012-01-01  5 4.2 368  71  -7 999.9
Many thanks.
*Jibrin Adejoh Alhassan (Ph.D)*
Department of Physics and Astronomy,
University of Nigeria, Nsukka


On Mon, Jun 17, 2024 at 8:14 AM Rui Barradas  wrote:


Às 07:53 de 17/06/2024, Jibrin Alhassan escreveu:

Part of it is pasted below
YEAR DOY HRIMF SWS   SSN   Dst f10.7
2012   1  0   4.0  379.  71-8 999.9
2012   1  1   4.4  386.  71-3 999.9
2012   1  2   4.8  380.  71-4 999.9
2012   1  3   5.4  374.  71-5 999.9
2012   1  4   4.5  369.  71-9 999.9
2012   1  5   4.2  368.  71-7 999.9
2012   1  6   4.7  367.  71-6 999.9
2012   1  7   4.1  361.  71   -10 999.9
2012   1  8   3.2  362.  71-7 999.9
2012   1  9   4.3  367.  71-3 999.9
2012   1 10   4.5  365.  71-6 999.9
2012   1 11   5.6  369.  71-8 999.9
2012   1 12   5.2  366.  71-8 999.9
2012   1 13   4.4  370.  71-7 999.9
2012   1 14   4.8  357.  71-5 999.9
2012   1 15   4.6  354.  71-8 999.9
2012   1 16   3.7  382.  71-7 999.9
2012   1 17   3.2  376.  71-2 999.9
2012   1 18   2.8  368.  71 2 999.9
2012   1 19   3.2  361.  71 2 999.9
2012   1 20   3.2  361.  71-3 999.9
2012   1 21   3.5  365.  71-5 999.9
2012   1 22   3.6  364.  71-3 999.9
2012   1 23   3.0  362.  71-3 999.9
2012   2  0   3.2  359.  92-5 130.3
2012   2  1   3.0  361.  92-4 130.3
2012   2  2   4.5  374.  92 3 130.3
2012   2  3   4.5  364.  92 5 130.3
2012   2  4   5.1  352.  92 3 130.3
2012   2  5   4.9  358.  92 3 130.3
2012   2  6   4.4  346.  92 4 130.3
2012   2  7   4.2  349.  92 7 130.3
2012   2  8   4.5  346.  92 8 130.3
2012   2  9   5.2  345.  92 7 130.3
2012   2 10   5.0  349.  92 5 130.3
2012   2 11   4.8  345.  92 0 130.3
2012   2 12   5.3  347.  92 0 130.3
2012   2 13   5.5  342.  92 0 130.3
2012   2 14   6.1  359.  92 1 130.3
2012   2 15   6.2  393.  92 8 130.3
2012   2 16   6.7  390.  9210 130.3
2012   2 17   7.7  369.  9210 130.3
2012   2 18   9.4  380.  9214 130.3
2012   2 19  10.6  386.  9212 130.3
2012   2 20  10.2  378.  9211 130.3
2012   2 21  11.6  369.  92 7 130.3
2012   2 22  12.0  369.  92 8 130.3
2012   2 23  10.5  361.  92 1 130.3
2012   3  0  11.3  403. 120-7 130.2
2012   3  1  10.3  412. 120   -14 130.2
2012   3  2   8.8  419. 120   -18 130.2
2012   3  3   8.3  412. 120   -23 130.2
2012   3  4   8.0  408. 120   -25 130.2
2012   3  5   7.0  380. 120   -28 130.2
2012   3  6   6.9  374. 120   -29 130.2
2012   3  7   6.9  372. 120   -30 130.2
2012   3  8   7.1  365. 120   -32 130.2
2012   3  9   6.8  376. 120   -35 130.2
2012   3 10   6.7  380. 120   -35 130.2
2012   3 11   6.4  381. 120   -30 130.2
2012   3 12   5.9  401. 120   -26 130.2
2012   3 13   5.9  405. 120   -23 130.2
2012   3 14   5.9  413. 120   -20 130.2
2012   3 15   5.9  406. 120   -20 130.2
2012   3 16   6.3  427. 120   -20 130.2
2012   3 17   5.9  424. 120   -19 130.2
2012   3 18   4.8  390. 120   -16 130.2
2012   3 19   4.8  374. 120   -15 130.2
2012   3 20   4.8  374. 120   -15 130.2
2012   3 21   5.1  378. 120   -18 130.2
2012   3 22   4.9  375. 120   -19 130.2
2012   3 23   4.7  364. 120   -17 130.2
2012   4  0   4.3  359. 126   -17 131.6
2012   4  1   4.3  359. 126   -15 131.6
2012   4  2   4.2  358. 126   -13 131.6
2012   4  3   3.8  359. 126   -13 131.6
2012   4  4   3.8  358. 126   -13 131.6
2012   4  5   3.7  359. 126   -14 131.6
2012   4  6   3.9  361. 126   -13 131.6
2012   4  7   3.7  364. 126   -13 131.6
2012   4  8   3.7  366. 126   -12 131.6
2012   4  9   3.8  363. 126   -10 131.6
2012   4 10   3.5  363. 126-8 131.6
2012   4 11   3.0  352. 126   -10 131.6
2012   4 12   3.1  348. 126   -12 131.6
2012   4 13   3.3  340. 126-9 131.6
2012   4 14   4.0  343. 126-8 131.6
2012   4 15   4.2  343. 126-7 131.6
2012   4 16   3.8  336. 126-5 131.6
2012   4 17   3.9  334. 126-6 131.6
2012   4 18   3.8  329. 126-5 131.6
2012   4 19   3.8  326. 126-4 131.6
2012   4 20   4.3  337. 126-3 131.6
2012   4 21   3.9  331. 126 0 131.6
2012   4 22   3.8  322. 126-1 131.6
2012   4 23   3.5  331. 126-1 131.6
2012   5  0   3.9  312. 109-3 136.6
2012   5  1   3.6  311. 109-1 136.6
2012   5  2   3.7  312. 109 0 136.6
2012   5  3   3.8  308. 109 0 136.6
2012   5  4   4.0  305. 109 2 136.6
2012   5  5   4.5  309. 109 2 136.6
2012   5  6   3.5  314. 109 3 136.6
2012   5  7   3.6  305. 109 2 136.6
2012   5  8   4.3  307. 109 2 136.6
2012   5  9   4.6  316. 109 1 136.6
2012   5 10   5.0  321. 109-4 136.6
2012   5 11   5.1  321. 109-6 136.6
2012   5 12   4.6  326. 109-4 136.6

Re: [R] code for year month day hr format

2024-06-17 Thread Rui Barradas
 135.1
2012  16 12  13.2  424. 154   -10 135.1
2012  16 13  12.9  433. 154-8 135.1
2012  16 14   9.3  461. 154-7 135.1
2012  16 15   6.6  466. 154   -14 135.1
2012  16 16   6.6  493. 154   -11 135.1
2012  16 17   7.4  496. 154-7 135.1
2012  16 18   6.2  493. 154-7 135.1
2012  16 19   6.9  492. 154   -13 135.1
2012  16 20   6.8  486. 154   -19 135.1
2012  16 21   5.6  488. 154   -14 135.1
2012  16 22   6.4  464. 154   -11 135.1
2012  16 23   6.0  459. 154   -10 135.1
2012  17  0   4.9  476. 141   -14 134.5
2012  17  1   4.6  460. 141   -20 134.5
2012  17  2   4.1  467. 141   -17 134.5
2012  17  3   3.7  469. 141   -13 134.5
2012  17  4   3.3  472. 141   -12 134.5
2012  17  5   2.7  472. 141-8 134.5
2012  17  6   3.5  459. 141-6 134.5
2012  17  7   3.9  459. 141-6 134.5
2012  17  8   4.1  463. 141-7 134.5
2012  17  9   4.1  443. 141   -10 134.5
2012  17 10   4.1  446. 141   -14 134.5
2012  17 11   4.1  442. 141   -13 134.5
2012  17 12   3.6  436. 141   -10 134.5
2012  17 13   3.6  433. 141-6 134.5
2012  17 14   4.2  421. 141-1 134.5
2012  17 15   3.7  416. 141-2 134.5
2012  17 16   4.2  410. 141-1 134.5
2012  17 17   4.6  396. 141-1 134.5
2012  17 18   4.5  398. 141-2 134.5
2012  17 19   4.4  397. 141-6 134.5
2012  17 20   4.5  396. 141-8 134.5
2012  17 21   3.5  411. 141-5 134.5
2012  17 22   3.9  425. 141-5 134.5
2012  17 23   4.7  418. 141-6 134.5
2012  18  0   4.6  400. 126-7 143.4
2012  18  1   4.5  413. 126-3 143.4
2012  18  2   4.4  418. 126 2 143.4
2012  18  3   4.2  420. 126 2 143.4
2012  18  4   4.0  401. 126-2 143.4
2012  18  5   3.8  399. 126-1 143.4
2012  18  6   3.5  388. 126-1 143.4
2012  18  7   4.4  393. 126-2 143.4
2012  18  8   4.7  405. 126-3 143.4
2012  18  9   4.8  409. 126-4 143.4
2012  18 10   4.9  409. 126-3 143.4
2012  18 11   5.0  411. 126-5 143.4
2012  18 12   5.1  405. 126-5 143.4
2012  18 13   5.2  403. 126-6 143.4
2012  18 14   5.1  394. 126-4 143.4
2012  18 15   5.0  391. 126-5 143.4
2012  18 16   4.6  387. 126-4 143.4
2012  18 17   4.7  376. 126-2 143.4
2012  18 18   4.7  381. 126-1 143.4
2012  18 19   4.5  382. 126-2 143.4
2012  18 20   4.9  386. 126-5 143.4
2012  18 21   4.8  375. 126-5 143.4
2012  18 22   4.7  385. 126-6 143.4
2012  18 23   4.7  381. 126-5 143.4
2012  19  0   4.3  372. 105-3 152.0
2012  19  1   4.2  361. 105-4 152.0
2012  19  2   4.0  360. 105-5 152.0
2012  19  3   3.9  362. 105-4 152.0
*Jibrin Adejoh Alhassan (Ph.D)*
Department of Physics and Astronomy,
University of Nigeria, Nsukka


On Mon, Jun 17, 2024 at 7:50 AM Jibrin Alhassan 
wrote:


Hello Rui,
Your patience is indeed amazing. Your script tested as shown below worked
perfectly well.
df1 <- read.table(text = "YEAR DOY HR   IMF  SW   SSNDst f10.7
2012 215  4   5.1  371. 143-4 138.6 ", header = TRUE)
with(df1, paste(YEAR, DOY)) |> as.Date(format = "%Y %j")
df1$Date <- with(df1, paste(YEAR, DOY)) |> as.Date(format = "%Y %j")
df1 <- df1[-(1:2)]
df1 <- df1[c(ncol(df1), 1:(ncol(df1) - 1L))]
head(df1).
But  I have 43,849 data points. Your script only generated one. Help me
with a script that can handle the whole data points. I have tried following
your tested solution but was unsuccessful. My regards.
*Jibrin Adejoh Alhassan (Ph.D)*
Department of Physics and Astronomy,
University of Nigeria, Nsukka


On Sun, Jun 16, 2024 at 8:33 AM Rui Barradas  wrote:


Às 21:42 de 15/06/2024, Jibrin Alhassan escreveu:

Thank you Rui. I ran the following script
df1 <- read.table("solar_hour", header = TRUE)
df1$date <- as.Date(paste(df1$year, df1$hour),
   format = "%Y %j",
origin = "2012-08-01-0")
df2 <- df1[c("date", "IMF", "SWS", "SSN", "Dst", "f10")]
head(df1)
#To display all the rows
   print(df2).
It gave me this error message

source ("script.R")

Error in `$<-.data.frame`(`*tmp*`, date, value = numeric(0)) :
replacement has 0 rows, data has 38735

print(df2)

Error: object 'df2' not found

My data is an hourly data but desire to have the date as

yearmonthday   hour
2012   08 01 01
2012   08 01 02
2012   08    01  03 etc
Thanks.

*Jibrin Adejoh Alhassan (Ph.D)*
Department of Physics and Astronomy,
University of Nigeria, Nsukka


On Sat, Jun 15, 2024 at 8:34 PM Rui Barradas 

wrote:



Às 20:00 de 15/06/2024, Jibrin Alhassan escreveu:

I have solar-geophysical data e.g as blow:
YEAR DOY HR   IMF  SW   SSNDst f10.7
2012 214  0   3.4  403. 132-9 154.6
2012 214  1   3.7  388. 132   -10 154.6
2012 214  2   3.7  383. 132   -10 154.6
2012 214  3   3.7  391. 132-9 154.6
2012 214  4   4.2  399. 132-7 154.6
2012 214  5   4.1  411. 132-6 154.6
2012 214  6   4.0  407. 132-6 154.6
2012 214  7   4.2  404.

Re: [R] code for year month day hr format

2024-06-16 Thread Rui Barradas

Às 21:42 de 15/06/2024, Jibrin Alhassan escreveu:

Thank you Rui. I ran the following script
df1 <- read.table("solar_hour", header = TRUE)
df1$date <- as.Date(paste(df1$year, df1$hour),
  format = "%Y %j",
origin = "2012-08-01-0")
df2 <- df1[c("date", "IMF", "SWS", "SSN", "Dst", "f10")]
head(df1)
#To display all the rows
  print(df2).
It gave me this error message

source ("script.R")

Error in `$<-.data.frame`(`*tmp*`, date, value = numeric(0)) :
   replacement has 0 rows, data has 38735

print(df2)

Error: object 'df2' not found

My data is an hourly data but desire to have the date as

yearmonthday   hour
2012   08 01 01
2012   08 01 02
2012   0801  03 etc
Thanks.

*Jibrin Adejoh Alhassan (Ph.D)*
Department of Physics and Astronomy,
University of Nigeria, Nsukka


On Sat, Jun 15, 2024 at 8:34 PM Rui Barradas  wrote:


Às 20:00 de 15/06/2024, Jibrin Alhassan escreveu:

I have solar-geophysical data e.g as blow:
YEAR DOY HR   IMF  SW   SSNDst f10.7
2012 214  0   3.4  403. 132-9 154.6
2012 214  1   3.7  388. 132   -10 154.6
2012 214  2   3.7  383. 132   -10 154.6
2012 214  3   3.7  391. 132-9 154.6
2012 214  4   4.2  399. 132-7 154.6
2012 214  5   4.1  411. 132-6 154.6
2012 214  6   4.0  407. 132-6 154.6
2012 214  7   4.2  404. 132-4 154.6
2012 214  8   4.3  405. 132-6 154.6
2012 214  9   4.4  409. 132-6 154.6
2012 214 10   4.4  401. 132-6 154.6
2012 214 11   4.5  385. 132-7 154.6
2012 214 12   4.7  377. 132-8 154.6
2012 214 13   4.7  382. 132-6 154.6
2012 214 14   4.3  396. 132-4 154.6
2012 214 15   4.1  384. 132-2 154.6
2012 214 16   4.0  382. 132-1 154.6
2012 214 17   3.9  397. 132 0 154.6
2012 214 18   3.8  390. 132 1 154.6
2012 214 19   4.2  400. 132 2 154.6
2012 214 20   4.6  408. 132 1 154.6
2012 214 21   4.8  401. 132-3 154.6
2012 214 22   4.9  395. 132-5 154.6
2012 214 23   5.0  386. 132-1 154.6
2012 215  0   5.0  377. 143-1 138.6
2012 215  1   4.9  384. 143-2 138.6
2012 215  2   4.9  390. 143-4 138.6
2012 215  3   4.9  372. 143-6 138.6
2012 215  4   5.1  371. 143-4 138.6
I want to process it to be of the format as shown below
   y   m  d  hr imf  sws  ssnDst f10.7
2012-08-01 10 3.4  403. 132-9 154.6
2012-08-01 12 3.7  388. 132   -10 154.6
2012-08-01 15 3.7  383. 132   -10 154.6
2012-08-01 17 3.7  391. 132-9 154.6
I want to request an R code to accomplish this task. Thanks for your

time.

*Jibrin Adejoh Alhassan (Ph.D)*
Department of Physics and Astronomy,
University of Nigeria, Nsukka

   [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide

http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.

Hello,

To create a date column, paste the first two columns and coerce to class
"Date" with conversion specifications %Y for the 4 digit year and %j for
the day of year. See

help("strptime")



df1 <- read.table(text = "YEAR DOY HR   IMF  SW   SSNDst f10.7
2012 214  0   3.4  403. 132-9 154.6
2012 214  1   3.7  388. 132   -10 154.6
2012 214  2   3.7  383. 132   -10 154.6
2012 214  3   3.7  391. 132-9 154.6
2012 214  4   4.2  399. 132-7 154.6
2012 214  5   4.1  411. 132-6 154.6
2012 214  6   4.0  407. 132-6 154.6
2012 214  7   4.2  404. 132-4 154.6
2012 214  8   4.3  405. 132-6 154.6
2012 214  9   4.4  409. 132-6 154.6
2012 214 10   4.4  401. 132-6 154.6
2012 214 11   4.5  385. 132-7 154.6
2012 214 12   4.7  377. 132-8 154.6
2012 214 13   4.7  382. 132-6 154.6
2012 214 14   4.3  396. 132-4 154.6
2012 214 15   4.1  384. 132-2 154.6
2012 214 16   4.0  382. 132-1 154.6
2012 214 17   3.9  397. 132 0 154.6
2012 214 18   3.8  390. 132 1 154.6
2012 214 19   4.2  400. 132 2 154.6
2012 214 20   4.6  408. 132 1 154.6
2012 214 21   4.8  401. 132-3 154.6
2012 214 22   4.9  395. 132-5 154.6
2012 214 23   5.0  386. 132-1 154.6
2012 215  0   5.0  377. 143-1 138.6
2012 215  1   4.9  384. 143-2 138.6
2012 215  2   4.9  390. 143-4 138.6
2012 215  3   4.9  372. 143-6 138.6
2012 215  4   5.1  371. 143-4 138.6", header = TRUE)


with(df1, paste(YEAR, DOY)) |> as.Date(format = "%Y %j")
#>  [1] "2012-08-01" "2012-08-01" "2012-08-01" "2012-08-01" "2012-08-01"
#>  [6] "2012-08-01" "2012-08-01" "2012-08-01" "2012-08-01" "2012-08-01"
#> [11] "2012-08-01" "2012-08-01" "2012-08-01" "2012-08-01" "2012-08-01

Re: [R] code for year month day hr format

2024-06-15 Thread Rui Barradas

Às 20:00 de 15/06/2024, Jibrin Alhassan escreveu:

I have solar-geophysical data e.g as blow:
YEAR DOY HR   IMF  SW   SSNDst f10.7
2012 214  0   3.4  403. 132-9 154.6
2012 214  1   3.7  388. 132   -10 154.6
2012 214  2   3.7  383. 132   -10 154.6
2012 214  3   3.7  391. 132-9 154.6
2012 214  4   4.2  399. 132-7 154.6
2012 214  5   4.1  411. 132-6 154.6
2012 214  6   4.0  407. 132-6 154.6
2012 214  7   4.2  404. 132-4 154.6
2012 214  8   4.3  405. 132-6 154.6
2012 214  9   4.4  409. 132-6 154.6
2012 214 10   4.4  401. 132-6 154.6
2012 214 11   4.5  385. 132-7 154.6
2012 214 12   4.7  377. 132-8 154.6
2012 214 13   4.7  382. 132-6 154.6
2012 214 14   4.3  396. 132-4 154.6
2012 214 15   4.1  384. 132-2 154.6
2012 214 16   4.0  382. 132-1 154.6
2012 214 17   3.9  397. 132 0 154.6
2012 214 18   3.8  390. 132 1 154.6
2012 214 19   4.2  400. 132 2 154.6
2012 214 20   4.6  408. 132 1 154.6
2012 214 21   4.8  401. 132-3 154.6
2012 214 22   4.9  395. 132-5 154.6
2012 214 23   5.0  386. 132-1 154.6
2012 215  0   5.0  377. 143-1 138.6
2012 215  1   4.9  384. 143-2 138.6
2012 215  2   4.9  390. 143-4 138.6
2012 215  3   4.9  372. 143-6 138.6
2012 215  4   5.1  371. 143-4 138.6
I want to process it to be of the format as shown below
  y   m  d  hr imf  sws  ssnDst f10.7
2012-08-01 10 3.4  403. 132-9 154.6
2012-08-01 12 3.7  388. 132   -10 154.6
2012-08-01 15 3.7  383. 132   -10 154.6
2012-08-01 17 3.7  391. 132-9 154.6
I want to request an R code to accomplish this task. Thanks for your time.
*Jibrin Adejoh Alhassan (Ph.D)*
Department of Physics and Astronomy,
University of Nigeria, Nsukka

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

To create a date column, paste the first two columns and coerce to class 
"Date" with conversion specifications %Y for the 4 digit year and %j for 
the day of year. See


help("strptime")



df1 <- read.table(text = "YEAR DOY HR   IMF  SW   SSNDst f10.7
2012 214  0   3.4  403. 132-9 154.6
2012 214  1   3.7  388. 132   -10 154.6
2012 214  2   3.7  383. 132   -10 154.6
2012 214  3   3.7  391. 132-9 154.6
2012 214  4   4.2  399. 132-7 154.6
2012 214  5   4.1  411. 132-6 154.6
2012 214  6   4.0  407. 132-6 154.6
2012 214  7   4.2  404. 132-4 154.6
2012 214  8   4.3  405. 132-6 154.6
2012 214  9   4.4  409. 132-6 154.6
2012 214 10   4.4  401. 132-6 154.6
2012 214 11   4.5  385. 132-7 154.6
2012 214 12   4.7  377. 132-8 154.6
2012 214 13   4.7  382. 132-6 154.6
2012 214 14   4.3  396. 132-4 154.6
2012 214 15   4.1  384. 132-2 154.6
2012 214 16   4.0  382. 132-1 154.6
2012 214 17   3.9  397. 132 0 154.6
2012 214 18   3.8  390. 132 1 154.6
2012 214 19   4.2  400. 132 2 154.6
2012 214 20   4.6  408. 132 1 154.6
2012 214 21   4.8  401. 132-3 154.6
2012 214 22   4.9  395. 132-5 154.6
2012 214 23   5.0  386. 132-1 154.6
2012 215  0   5.0  377. 143-1 138.6
2012 215  1   4.9  384. 143-2 138.6
2012 215  2   4.9  390. 143-4 138.6
2012 215  3   4.9  372. 143-6 138.6
2012 215  4   5.1  371. 143-4 138.6", header = TRUE)


with(df1, paste(YEAR, DOY)) |> as.Date(format = "%Y %j")
#>  [1] "2012-08-01" "2012-08-01" "2012-08-01" "2012-08-01" "2012-08-01"
#>  [6] "2012-08-01" "2012-08-01" "2012-08-01" "2012-08-01" "2012-08-01"
#> [11] "2012-08-01" "2012-08-01" "2012-08-01" "2012-08-01" "2012-08-01"
#> [16] "2012-08-01" "2012-08-01" "2012-08-01" "2012-08-01" "2012-08-01"
#> [21] "2012-08-01" "2012-08-01" "2012-08-01" "2012-08-01" "2012-08-02"
#> [26] "2012-08-02" "2012-08-02" "2012-08-02" "2012-08-02"

# now create the column
df1$Date <- with(df1, paste(YEAR, DOY)) |> as.Date(format = "%Y %j")
# remove the columns no longer needed
df1 <- df1[-(1:2)]
# relocate the new date column
df1 <- df1[c(ncol(df1), 1:(ncol(df1) - 1L))]
head(df1)
#> Date HR IMF  SW SSN Dst f10.7
#> 1 2012-08-01  0 3.4 403 132  -9 154.6
#> 2 2012-08-01  1 3.7 388 132 -10 154.6
#> 3 2012-08-01  2 3.7 383 132 -10 154.6
#> 4 2012-08-01  3 3.7 391 132  -9 154.6
#> 5 2012-08-01  4 4.2 399 132  -7 154.6
#> 6 2012-08-01  5 4.1 411 132  -6 154.6


Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] my R code worked well when running the first 1000 lines of R code

2024-06-12 Thread Rui Barradas

Às 20:44 de 12/06/2024, Yuan Chun Ding escreveu:

Hi Rui,

Thank you very much!


Yes, I verified using real data, it worked correctly as expected after adding 
tidyr:: to the pivot_longer function and dplyr:: to the group_by and summarize
Function.

I did not know how to assign the tidyr and dplyr to the three functions because 
I do not really understand well the three functions and just got the code from 
a google search.

I also tried your simplified code, but got the following error
Error in `dplyr::summarize()`:
! Can't supply both `.by` and `.groups`.
Run `rlang::last_trace()` to see where the error occurred.

Ding

From: Rui Barradas 
Sent: Wednesday, June 12, 2024 11:29 AM
To: Yuan Chun Ding ; CALUM POLWART 
Cc: r-help@r-project.org
Subject: Re: [R] my R code worked well when running the first 1000 lines of R 
code

Hello, Inline. Às 19: 03 de 12/06/2024, Yuan Chun Ding via R-help escreveu: > I am sorry 
that I know I should provide a dataset that allows to replicate my problem. > > It is 
a research dataset and quite large, so I can not share. >


Hello,



Inline.



Às 19:03 de 12/06/2024, Yuan Chun Ding via R-help escreveu:


I am sorry that I know I should provide a dataset that allows to replicate my 
problem.







It is a research dataset and quite large, so I can not share.







Both Bert and Tim guessed my problem correctly.  I also thought about the 
conflicting issue between different packages and function masking.



I just hope to that someone has similar experience, so providing me suggestion.







For conflicting issue,







What I tried  was to add dplyr::pivot_longer or tidyr:: pivot_longer,






Do that to all functions comming from contributed packages. At least to

those.





summary_anno1148ft <- anno1148ft %>%

tidyr::pivot_longer(c(t_depth, t_alt_count, t_alt_ratio), names_to =

"measure") %>%

dplyr::group_by(dat, measure) %>%

dplyr::summarize(minimum = min(value,na.rm=T),

 q25 = quantile(value, probs = 0.25,na.rm=T),

 med = median(value,na.rm=T),

 q75 = quantile(value, probs = 0.75,na.rm=T),

 maximum = max(value,na.rm=T),

 average = mean(value,na.rm=T),

 #standard_deviation = sd(value),

 .groups = "drop"

)





Or, simpler, no need to group_by anymore. It can be done in summarise.





summary_anno1148ft <- anno1148ft %>%

tidyr::pivot_longer(c(t_depth, t_alt_count, t_alt_ratio), names_to =

"measure") %>%

dplyr::summarize(minimum = min(value,na.rm=T),

 q25 = quantile(value, probs = 0.25,na.rm=T),

 med = median(value,na.rm=T),

 q75 = quantile(value, probs = 0.75,na.rm=T),

 maximum = max(value,na.rm=T),

 average = mean(value,na.rm=T),

 #standard_deviation = sd(value),

 .by = c(dat, measure),

 .groups = "drop"

)







This is only a guess, the question cannot really be answered.





Hope this helps,



Rui Barradas



but still not resolved the problem.














I will restart from the first line my code, it will work again and then I will 
track down.















Thank you,







Ding











From: CALUM POLWART mailto:polc1...@gmail.com>>



Sent: Wednesday, June 12, 2024 10:52 AM



To: Yuan Chun Ding mailto:ycd...@coh.org>>



Cc: r-help@r-project.org<mailto:r-help@r-project.org>



Subject: Re: [R] my R code worked well when running the first 1000 lines of R 
code







I sometimes think people on this list are quite rude to posters. I'm afraid I'm likely to 
join in with some rudeness? 1. "Here is some code that works but also doesn't" 
is probably not going to get you an answer 2. I provide











I sometimes think people on this list are quite rude to posters.







I'm afraid I'm likely to join in with some rudeness?







1. "Here is some code that works but also doesn't" is probably not going to get 
you an answer



2. I provide no information about the data it works on or doesn't



3. I tell you I'm using a load of dependencies, but don't tell you what



4. I refer to 2000 lines of code but probably means 2000 lines of data?







So. Please post a question someone can actually answer.







If the question is "why might code fail on a 2000 line dataset when it works on 1000 
line dataset" then here are some thoughts:







* Is the 1000 lines being run as dataset[1:1000,] or is it dataset1 and 
dataset2 ?



* Is there a structural difference in the datasets - i.e. numbers, characters 
or factors as columns. Often import functions guess a column type by reading 
the first 500/1000 lines. If the data has numbers

Re: [R] my R code worked well when running the first 1000 lines of R code

2024-06-12 Thread Rui Barradas

Hello,

Inline.

Às 19:03 de 12/06/2024, Yuan Chun Ding via R-help escreveu:

I am sorry that I know I should provide a dataset that allows to replicate my 
problem.

It is a research dataset and quite large, so I can not share.

Both Bert and Tim guessed my problem correctly.  I also thought about the 
conflicting issue between different packages and function masking.
I just hope to that someone has similar experience, so providing me suggestion.

For conflicting issue,

What I tried  was to add dplyr::pivot_longer or tidyr:: pivot_longer, 



Do that to all functions comming from contributed packages. At least to 
those.



summary_anno1148ft <- anno1148ft %>%
  tidyr::pivot_longer(c(t_depth, t_alt_count, t_alt_ratio), names_to = 
"measure") %>%

  dplyr::group_by(dat, measure) %>%
  dplyr::summarize(minimum = min(value,na.rm=T),
   q25 = quantile(value, probs = 0.25,na.rm=T),
   med = median(value,na.rm=T),
   q75 = quantile(value, probs = 0.75,na.rm=T),
   maximum = max(value,na.rm=T),
   average = mean(value,na.rm=T),
   #standard_deviation = sd(value),
   .groups = "drop"
  )


Or, simpler, no need to group_by anymore. It can be done in summarise.


summary_anno1148ft <- anno1148ft %>%
  tidyr::pivot_longer(c(t_depth, t_alt_count, t_alt_ratio), names_to = 
"measure") %>%

  dplyr::summarize(minimum = min(value,na.rm=T),
   q25 = quantile(value, probs = 0.25,na.rm=T),
   med = median(value,na.rm=T),
   q75 = quantile(value, probs = 0.75,na.rm=T),
   maximum = max(value,na.rm=T),
   average = mean(value,na.rm=T),
   #standard_deviation = sd(value),
   .by = c(dat, measure),
   .groups = "drop"
  )



This is only a guess, the question cannot really be answered.


Hope this helps,

Rui Barradas

but still not resolved the problem.




I will restart from the first line my code, it will work again and then I will 
track down.



Thank you,

Ding


From: CALUM POLWART 
Sent: Wednesday, June 12, 2024 10:52 AM
To: Yuan Chun Ding 
Cc: r-help@r-project.org
Subject: Re: [R] my R code worked well when running the first 1000 lines of R 
code

I sometimes think people on this list are quite rude to posters. I'm afraid I'm likely to 
join in with some rudeness? 1. "Here is some code that works but also doesn't" 
is probably not going to get you an answer 2. I provide


I sometimes think people on this list are quite rude to posters.

I'm afraid I'm likely to join in with some rudeness?

1. "Here is some code that works but also doesn't" is probably not going to get 
you an answer
2. I provide no information about the data it works on or doesn't
3. I tell you I'm using a load of dependencies, but don't tell you what
4. I refer to 2000 lines of code but probably means 2000 lines of data?

So. Please post a question someone can actually answer.

If the question is "why might code fail on a 2000 line dataset when it works on 1000 
line dataset" then here are some thoughts:

* Is the 1000 lines being run as dataset[1:1000,] or is it dataset1 and 
dataset2 ?
* Is there a structural difference in the datasets - i.e. numbers, characters 
or factors as columns. Often import functions guess a column type by reading 
the first 500/1000 lines. If the data has numbers in column 1 for 1-1000 but on 
line 1999 has a letter... The data type may vary.

On Wed, 12 Jun 2024, 17:28 Yuan Chun Ding via R-help, 
mailto:r-help@r-project.org>> wrote:
Hi R users,

The following code worked well to summarize four data groups in a dataframe for 
three variables (t_depth, t_alt_count, t_alt_ratio), 12 columns of summary, see 
attached.
However, after running another 2000 lines of R codes using functions from more 
than 10 other R  libraries, then it only generated one column of summary.
Do you know why?

Thank you,

Yuan Chun Ding

summary_anno1148ft <- anno1148ft %>%
   pivot_longer(c(t_depth, t_alt_count, t_alt_ratio), names_to = "measure") %>%
   group_by(dat, measure) %>%
   summarize(minimum = min(value,na.rm=T),
 q25 = quantile(value, probs = 0.25,na.rm=T),
 med = median(value,na.rm=T),
 q75 = quantile(value, probs = 0.75,na.rm=T),
 maximum = max(value,na.rm=T),
 average = mean(value,na.rm=T),
 #standard_deviation = sd(value),
 .groups = "drop"
   )
summary_anno1148ft <-t(summary_anno1148ft)



--

-SECURITY/CONFIDENTIALITY WARNING-

This message and any attachments are intended solely for the individual or 
entity 

Re: [R] Format

2024-06-09 Thread Rui Barradas

Às 21:39 de 09/06/2024, Val escreveu:

HI all,

My
I am trying to convert character date (mm/dd/yy)  to -mm-dd date
format in one of the columns of my data file.

The first few lines of the data file looks like as follow

  head(Atest,10);dim(Atest)
   ddate
1  19/08/21
2  30/04/18
3  28/08/21
4  11/10/21
5  07/09/21
6  15/08/21
7  03/09/21
8  23/07/18
9  17/08/20
10 23/09/20
[1] 1270076   1

I am using the following different scenarios but none of them resulted
the desired result.

library(data.table)
library(stringr)
library(lubridate)
 Atest$ddate1 <- as.Date((Atest$ddate), format = "%m/%d/%y")
 Atest$ddate2 <- mdy((Atest$ddate))
 Atest$ddate3 <= as.Date(as.character(Atest$ddate),format="%m/%d/%y")
 Atest$ddate4 <- as.Date(as.character(Atest$ddate),"%m/%d/%y")
 Atest$ddate5 <- lubridate::mdy(Atest$ddate)


head(Atest,3)

  ddate ddate1 ddate2 ddate4 ddate5
1 19/08/21
2 30/04/18
3 28/08/21


Any help why I am not getting the desired result.
Thank you,

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

Day is clearly first, format "%m/%d/%y" assumes a month 19 in 19/08/21.
Try

as.Date(Atest$ddate, format = "%d/%m/%y")


Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R code for overlapping variables -- count

2024-06-02 Thread Rui Barradas

Às 18:40 de 02/06/2024, Rui Barradas escreveu:

Às 18:34 de 02/06/2024, Leo Mada via R-help escreveu:

Dear Shadee,

If you have a data.frame with the following columns:

n = 100; # population size
x = data.frame(
  Sex = sample(c("M","F"), n, T),
  Country = sample(c("AA", "BB", "US"), n, T),
  Income  = as.factor(sample(1:3, n, T))
)

# Dummy variable
ONE = rep(1, nrow(x))

r = aggregate(ONE ~ Sex + Income + Country, length, data = x)
r = r[, c("Country", "Income", "Sex")]
print(r)

It is possible to write more simple code, if you need only the 
particular combination of variables (which you specified in your 
mail). But this is the more general approach.


Note: you may want to use "sum" instead of "length", e.g. if you have 
a column specifying the number of individuals in that category.



Hope this helps,

Leonard


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.

Hello,

The following is simpler.


r2 <- xtabs(~ ., x) |> as.data.frame()
r2[-4L] # or r2[names(r2) != "Freq"]


Hope this helps,

Rui Barradas



Hello,

This is the same solution but the code to keep only the columns in the 
original data set is better. And it's a MRE.



n <- 100; # population size
x <- data.frame(
  Sex = sample(c("M","F"), n, T),
  Country = sample(c("AA", "BB", "US"), n, T),
  Income  = as.factor(sample(1:3, n, T))
)

r2 <- xtabs(~ ., x) |> as.data.frame()
# no need for constants, find the columns
# to keep from the data
r2[names(r2) %in% names(x)]


Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R code for overlapping variables -- count

2024-06-02 Thread Rui Barradas

Às 18:34 de 02/06/2024, Leo Mada via R-help escreveu:

Dear Shadee,

If you have a data.frame with the following columns:

n = 100; # population size
x = data.frame(
  Sex = sample(c("M","F"), n, T),
  Country = sample(c("AA", "BB", "US"), n, T),
  Income  = as.factor(sample(1:3, n, T))
)

# Dummy variable
ONE = rep(1, nrow(x))

r = aggregate(ONE ~ Sex + Income + Country, length, data = x)
r = r[, c("Country", "Income", "Sex")]
print(r)

It is possible to write more simple code, if you need only the particular 
combination of variables (which you specified in your mail). But this is the 
more general approach.

Note: you may want to use "sum" instead of "length", e.g. if you have a column 
specifying the number of individuals in that category.


Hope this helps,

Leonard


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

The following is simpler.


r2 <- xtabs(~ ., x) |> as.data.frame()
r2[-4L] # or r2[names(r2) != "Freq"]


Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] add only the 1st of May with POSIXct

2024-05-29 Thread Rui Barradas

Às 07:01 de 29/05/2024, Stefano Sofia escreveu:

Thank you Rui for your code.

I basically understood all your suggestions.

I am using an old version of R (version 3.6.3, installed in a server I am not 
allowed to control), and the new pipe operator does not work.

I tried to run your code without the "|>" operator, but I get an error when I 
use apply.

Could you please expand your code without the pipe operator?


Thank you again for your help

Stefano



  (oo)
--oOO--( )--OOo--
Stefano Sofia PhD
Civil Protection - Marche Region - Italy
Meteo Section
Snow Section
Via del Colle Ameno 5
60126 Torrette di Ancona, Ancona (AN)
Uff: +39 071 806 7743
E-mail: stefano.so...@regione.marche.it
---Oo-oO


____
Da: Rui Barradas 
Inviato: martedì 28 maggio 2024 18:19
A: Stefano Sofia; r-help@R-project.org
Oggetto: Re: [R] add only the 1st of May with POSIXct

[Non ricevi spesso messaggi di posta elettronica da ruipbarra...@sapo.pt. Per 
informazioni sull'importanza di questo fatto, visita 
https://aka.ms/LearnAboutSenderIdentification.]

Às 16:23 de 28/05/2024, Stefano Sofia escreveu:

Dear R-list users,

  From an initial and a final date I create a sequence of days using POSIXct.

If this interval covers all or only in part the months from May to October, I 
need to get rid of the days from the 2nd of May to the 31st of October:


a <- as.POSIXct("2002-11-01", format = "%Y-%m-%d", tz="Etc/GMT-1")

b <- as.POSIXct("2004-06-01", format = "%Y-%m-%d", tz="Etc/GMT-1")

mydf <- data.frame(data_POSIX=seq(as.POSIXct(paste(format(a, "%Y-%m-%d"), "09:00:00", sep=""), format="%Y-%m-%d %H:%M:%S", tz="Etc/GMT-1"), 
as.POSIXct(paste(format(b, "%Y-%m-%d"), "09:00:00", sep=""), format="%Y-%m-%d %H:%M:%S", tz="Etc/GMT-1"), by="1 day"))


If I execute

as.data.frame(mydf[format(mydf$data_POSIX,"%m") %in% c("11", "12", "01", "02", "03", 
"04"), ])

the interval will be

from 2002-11-01 09:00:00 to 2003-04-30 09:00:00

and from 2003-11-01 09:00:00 to 2004-04-30 09:00:00


but I need also 2003-05-01 09:00:00 and 2004-05-01 09:00:00


How can I solve this problem?


Thank you for your attention and your help

Stefano



   (oo)
--oOO--( )--OOo--
Stefano Sofia PhD
Civil Protection - Marche Region - Italy
Meteo Section
Snow Section
Via del Colle Ameno 5
60126 Torrette di Ancona, Ancona (AN)
Uff: +39 071 806 7743
E-mail: stefano.so...@regione.marche.it
---Oo-oO



AVVISO IMPORTANTE: Questo messaggio di posta elettronica pu� contenere 
informazioni confidenziali, pertanto � destinato solo a persone autorizzate 
alla ricezione. I messaggi di posta elettronica per i client di Regione Marche 
possono contenere informazioni confidenziali e con privilegi legali. Se non si 
� il destinatario specificato, non leggere, copiare, inoltrare o archiviare 
questo messaggio. Se si � ricevuto questo messaggio per errore, inoltrarlo al 
mittente ed eliminarlo completamente dal sistema del proprio computer. Ai sensi 
dell'art. 6 della DGR n. 1394/2008 si segnala che, in caso di necessit� ed 
urgenza, la risposta al presente messaggio di posta elettronica pu� essere 
visionata da persone estranee al destinatario.
IMPORTANT NOTICE: This e-mail message is intended to be received only by 
persons entitled to receive the confidential information it may contain. E-mail 
messages to clients of Regione Marche may contain information that is 
confidential and legally privileged. Please do not read, copy, forward, or 
store this message unless you are an intended recipient of it. If you have 
received this message in error, please forward it to the sender and delete it 
completely from your computer system.

   [[alternative HTML version deleted]]


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-help&data=05%7C02%7Cstefano.sofia%40regione.marche.it%7C0d812d3223344a1508d408dc7f31f657%7C295eaa1431a14b09bfe65a338b679f60%7C0%7C0%7C638525100275684754%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C6%7C%7C%7C&sdata=ac0Hx9auMSeXgsllDaaimZDFBpSLZ%2B3OeOGQoVvcjxQ%3D&reserved=0
PLEASE do read the posting guide 
https://eur02.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.r-project.org%2Fposting-guide.html&data=05%7C02%7Cstefano.sofia%40regione.marche.it%7C0d812d3223344a1508d408dc7f31f657%7C295eaa1431a14b09bfe65a338b679f60%7

Re: [R] add only the 1st of May with POSIXct

2024-05-28 Thread Rui Barradas

Às 16:23 de 28/05/2024, Stefano Sofia escreveu:

Dear R-list users,

 From an initial and a final date I create a sequence of days using POSIXct.

If this interval covers all or only in part the months from May to October, I 
need to get rid of the days from the 2nd of May to the 31st of October:


a <- as.POSIXct("2002-11-01", format = "%Y-%m-%d", tz="Etc/GMT-1")

b <- as.POSIXct("2004-06-01", format = "%Y-%m-%d", tz="Etc/GMT-1")

mydf <- data.frame(data_POSIX=seq(as.POSIXct(paste(format(a, "%Y-%m-%d"), "09:00:00", sep=""), format="%Y-%m-%d %H:%M:%S", tz="Etc/GMT-1"), 
as.POSIXct(paste(format(b, "%Y-%m-%d"), "09:00:00", sep=""), format="%Y-%m-%d %H:%M:%S", tz="Etc/GMT-1"), by="1 day"))


If I execute

as.data.frame(mydf[format(mydf$data_POSIX,"%m") %in% c("11", "12", "01", "02", "03", 
"04"), ])

the interval will be

from 2002-11-01 09:00:00 to 2003-04-30 09:00:00

and from 2003-11-01 09:00:00 to 2004-04-30 09:00:00


but I need also 2003-05-01 09:00:00 and 2004-05-01 09:00:00


How can I solve this problem?


Thank you for your attention and your help

Stefano



  (oo)
--oOO--( )--OOo--
Stefano Sofia PhD
Civil Protection - Marche Region - Italy
Meteo Section
Snow Section
Via del Colle Ameno 5
60126 Torrette di Ancona, Ancona (AN)
Uff: +39 071 806 7743
E-mail: stefano.so...@regione.marche.it
---Oo-oO



AVVISO IMPORTANTE: Questo messaggio di posta elettronica pu� contenere 
informazioni confidenziali, pertanto � destinato solo a persone autorizzate 
alla ricezione. I messaggi di posta elettronica per i client di Regione Marche 
possono contenere informazioni confidenziali e con privilegi legali. Se non si 
� il destinatario specificato, non leggere, copiare, inoltrare o archiviare 
questo messaggio. Se si � ricevuto questo messaggio per errore, inoltrarlo al 
mittente ed eliminarlo completamente dal sistema del proprio computer. Ai sensi 
dell'art. 6 della DGR n. 1394/2008 si segnala che, in caso di necessit� ed 
urgenza, la risposta al presente messaggio di posta elettronica pu� essere 
visionata da persone estranee al destinatario.
IMPORTANT NOTICE: This e-mail message is intended to be received only by 
persons entitled to receive the confidential information it may contain. E-mail 
messages to clients of Regione Marche may contain information that is 
confidential and legally privileged. Please do not read, copy, forward, or 
store this message unless you are an intended recipient of it. If you have 
received this message in error, please forward it to the sender and delete it 
completely from your computer system.

[[alternative HTML version deleted]]


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

First of all, 'a' and 'b' are already objects of class "POSIXct", you 
don't need to repeat the code creating them when creating mydf.


As for the question, see the code below.


a <- as.POSIXct("2002-11-01", format = "%Y-%m-%d", tz="Etc/GMT-1")
b <- as.POSIXct("2004-06-01", format = "%Y-%m-%d", tz="Etc/GMT-1")
mydf <- data.frame(data_POSIX = seq(a, b, by = "1 day"))

# get the years from the data
years <- format(c(a, b), "%Y") |> as.integer()
# this creates a sequence with all the years
years <- Reduce(`:`, years)

# coerce to "Date"
from <- ISOdate(years, 5L, 2L, tz = "Etc/GMT-1")
to <- ISOdate(years, 10L, 30L, tz = "Etc/GMT-1")

# this logical index keeps only the dates between May, 2nd and Nov 1st.
keep <- data.frame(from, to) |>
  apply(1L, \(x) x[1L] <= mydf$data_POSIX & mydf$data_POSIX <= x[2L]) |>
  rowSums() > 0L

mydf[keep, , drop = FALSE]



Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Print date on y axis with month, day, and year

2024-05-10 Thread Rui Barradas

Às 00:58 de 10/05/2024, Sorkin, John escreveu:

I am trying to use ggplot to plot the data, and R code, below. The dates 
(jdate) are printing as Mar 01, Mar 15, etc. I want to have the date printed as 
MMM DD  (or any other way that will show month, date, and year, e.g. 
mm/dd/yy). How can I accomplish this?

yyy  <- structure(list(
   jdate = structure(c(19052, 19053, 19054, 19055,
   19058, 19059, 19060, 19061, 19062, 19063, 19065, 19066, 
19067,
   19068, 19069, 19072, 19073, 19074, 19075, 19076, 19077, 
19083,
   19086, 19087, 19088, 19089, 19090, 19093, 19094, 19095), class = 
"Date"),
 Sum = c ( 1,  3,  9, 11, 13, 16, 18, 22, 26, 27, 30, 32, 35, 39,  41,
  43, 48, 51, 56, 58, 59, 63, 73, 79, 81, 88, 91, 93, 96, 103)),
 row.names = c(NA, 30L), class = "data.frame")
yyy
class(yyy$jdate)
ggplot(data=yyy[1:30,],aes(as.Date(jdate,format="%m-%d-%Y"),Sum)) +geom_point()


Thank you
John



John David Sorkin M.D., Ph.D.
Professor of Medicine, University of Maryland School of Medicine;
Associate Director for Biostatistics and Informatics, Baltimore VA Medical 
Center Geriatrics Research, Education, and Clinical Center;
PI Biostatistics and Informatics Core, University of Maryland School of 
Medicine Claude D. Pepper Older Americans Independence Center;
Senior Statistician University of Maryland Center for Vascular Research;

Division of Gerontology and Paliative Care,
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
Cell phone 443-418-5382



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

Since class(yyy$jdate) returns "Date", you have a real date and 
scale_x_date can handle the printed formats, there is no need for an 
extra as.Date in aes(). And get rid of the format = "%m-%d-%Y" argument.


Let scale_x_date take care of formating the date as you want it 
displayed. Any of the two below is a valid date format.




ggplot(data = yyy[1:30,], aes(jdate, Sum)) +
   geom_point() +
   # scale_x_date(date_labels = "%b %d, %Y")
   scale_x_date(date_labels = "%m/%d/%Y")



Hope this helps,

Rui Barradas



--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] x[0]: Can '0' be made an allowed index in R?

2024-04-21 Thread Rui Barradas

Às 09:08 de 21/04/2024, Rui Barradas escreveu:

Às 08:55 de 21/04/2024, Hans W escreveu:

As we all know, in R indices for vectors start with 1, i.e, x[0] is not a
correct expression. Some algorithms, e.g. in graph theory or 
combinatorics,
are much easier to formulate and code if 0 is an allowed index 
pointing to

the first element of the vector.

Some programming languages, for instance Julia (where the index for 
normal
vectors also starts with 1), provide libraries/packages that allow the 
user
to define an index range for its vectors, say 0:9 or 10:20 or even 
negative

indices.

Of course, this notation would only be feasible for certain specially
defined vectors. Is there a library that provides this functionality?
Or is there a simple trick to do this in R? The expression 'x[0]' must
be possible, does this mean the syntax of R has to be twisted somehow?

Thanks, Hans W.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.

Hello,

I find what you are asking awkward but it can be done with S3 classes.
Write an extraction method for the new class and in the use case below 
it works. The method increments the ndex before calling NextMethod, the 
usual extraction function.



`[.zerobased` <- function(x, i, ...) {
   i <- i + 1L
   NextMethod()
}
as_zerobased <- function(x) {
   class(x) <- c("zerobased", class(x))
   x
}

x <- 1:10
y <- as_zerobased(x)

y[0]
#> [1] 1
y[1]
#> [1] 2
y[9]
#> [1] 10
y[10]
#> [1] NA


Hope this helps,

Rui Barradas



Sorry, forgot to also define a `[[zerobased` method. It's probably safer.


`[[.zerobased` <- function(x, i, ...) {
  i <- i + 1L
  NextMethod()
}


Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] x[0]: Can '0' be made an allowed index in R?

2024-04-21 Thread Rui Barradas

Às 08:55 de 21/04/2024, Hans W escreveu:

As we all know, in R indices for vectors start with 1, i.e, x[0] is not a
correct expression. Some algorithms, e.g. in graph theory or combinatorics,
are much easier to formulate and code if 0 is an allowed index pointing to
the first element of the vector.

Some programming languages, for instance Julia (where the index for normal
vectors also starts with 1), provide libraries/packages that allow the user
to define an index range for its vectors, say 0:9 or 10:20 or even negative
indices.

Of course, this notation would only be feasible for certain specially
defined vectors. Is there a library that provides this functionality?
Or is there a simple trick to do this in R? The expression 'x[0]' must
be possible, does this mean the syntax of R has to be twisted somehow?

Thanks, Hans W.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

I find what you are asking awkward but it can be done with S3 classes.
Write an extraction method for the new class and in the use case below 
it works. The method increments the ndex before calling NextMethod, the 
usual extraction function.



`[.zerobased` <- function(x, i, ...) {
  i <- i + 1L
  NextMethod()
}
as_zerobased <- function(x) {
  class(x) <- c("zerobased", class(x))
  x
}

x <- 1:10
y <- as_zerobased(x)

y[0]
#> [1] 1
y[1]
#> [1] 2
y[9]
#> [1] 10
y[10]
#> [1] NA


Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Exceptional slowness with read.csv

2024-04-10 Thread Rui Barradas

Às 06:47 de 08/04/2024, Dave Dixon escreveu:

Greetings,

I have a csv file of 76 fields and about 4 million records. I know that 
some of the records have errors - unmatched quotes, specifically. 
Reading the file with readLines and parsing the lines with read.csv(text 
= ...) is really slow. I know that the first 2459465 records are good. 
So I try this:


 > startTime <- Sys.time()
 > first_records <- read.csv(file_name, nrows = 2459465)
 > endTime <- Sys.time()
 > cat("elapsed time = ", endTime - startTime, "\n")

elapsed time =   24.12598

 > startTime <- Sys.time()
 > second_records <- read.csv(file_name, skip = 2459465, nrows = 5)
 > endTime <- Sys.time()
 > cat("elapsed time = ", endTime - startTime, "\n")

This appears to never finish. I have been waiting over 20 minutes.

So why would (skip = 2459465, nrows = 5) take orders of magnitude longer 
than (nrows = 2459465) ?


Thanks!

-dave

PS: readLines(n=2459470) takes 10.42731 seconds.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.

Hello,

Can the following function be of help?
After reading the data setting argument quote=FALSE, call a function 
applying gregexpr to its character columns, then transforming the output 
in a two column data.frame with columns


 Col - the column processed;
 Unbalanced - the rows with unbalanced double quotes.

I am assuming the quotes are double quotes. It shouldn't be difficult to 
adapt it to other cas, single quotes, both cases.





unbalanced_dquotes <- function(x) {
  char_cols <- sapply(x, is.character) |> which()
  lapply(char_cols, \(i) {
y <- x[[i]]
Unbalanced <- gregexpr('"', y) |>
  sapply(\(x) attr(x, "match.length") |> length()) |>
  {\(x) (x %% 2L) == 1L}() |>
  which()
data.frame(Col = i, Unbalanced = Unbalanced)
  }) |>
  do.call(rbind, args = _)
}

# read the data disregardin g quoted strings
df1 <- read.csv(fl, quote = "")
# determine which strings have unbalanced quotes and
# where
unbalanced_dquotes(df1)


Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Exceptional slowness with read.csv

2024-04-08 Thread Rui Barradas

Às 19:42 de 08/04/2024, Ivan Krylov via R-help escreveu:

В Sun, 7 Apr 2024 23:47:52 -0600
Dave Dixon  пишет:


  > second_records <- read.csv(file_name, skip = 2459465, nrows = 5)


It may or may not be important that read.csv defaults to header =
TRUE. Having skipped 2459465 lines, it may attempt to parse the next
one as a header, so the second call read.csv() should probably include
header = FALSE.



This will throw an error, call read.table with sep="," instead.




Bert's advice to try scan() is on point, though. It's likely that the
default-enabled header is not the most serious problem here.



Hoep this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question regarding reservoir volume and water level

2024-04-07 Thread Rui Barradas

Às 13:27 de 07/04/2024, javad bayat escreveu:

Dear all;
I have a question about the water level of a reservoir, when the volume
changed or doubled.
There is a DEM file with the highest elevation 1267 m. The lowest elevation
is 1230 m. The current volume of the reservoir is 7,000,000 m3 at 1240 m.
Now I want to know what would be the water level if the volume rises to
1250 m? or what would be the water level if the volume doubled (14,000,000
m3)?

Is there any way to write codes to do this in R?
I would be more than happy if anyone could help me.
Sincerely









Hello,

This is a simple rule of three.
If you know the level l the argument doesn't need to be named but if you 
know the volume v then it must be named.



water_level <- function(l, v, level = 1240, volume = 7e6) {
  if(missing(v)) {
volume * l / level
  } else level * v / volume
}

lev <- 1250
vol <- 14e6

water_level(l = lev)
#> [1] 7056452
water_level(v = vol)
#> [1] 2480


Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Output of tapply function as data frame: Problem Fixed

2024-03-28 Thread Rui Barradas

Às 01:43 de 29/03/2024, Ogbos Okike escreveu:

Dear Rui,
Thanks again for resolving this. I have already started using the version
that works for me.

But to clarify the second part, please let me paste the what I did and the
error message:


set.seed(2024)
data <- data.frame(

+Date = sample(seq(Sys.Date() - 5, Sys.Date(), by = "1 days"), 100L,
+ TRUE),
+count = sample(10L, 100L, TRUE)
+ )


# coerce tapply's result to class "data.frame"
res <- with(data, tapply(count, Date, mean)) |> as.data.frame()

Error: unexpected '>' in "res <- with(data, tapply(count, Date, mean)) |>"

# assign a dates column from the row names
res$Date <- row.names(res)

Error in row.names(res) : object 'res' not found

# cosmetics
names(res)[2:1] <- names(data)

Error in names(res)[2:1] <- names(data) : object 'res' not found

# note that the row names are still tapply's names vector
# and that the columns order is not Date/count. Both are fixed
# after the calculations.
res


You can see that the error message is on the pipe. Please, let me know
where I am missing it.
Thanks.

On Wed, Mar 27, 2024 at 10:45 PM Rui Barradas  wrote:


Às 08:58 de 27/03/2024, Ogbos Okike escreveu:

Dear Rui,
Nice to hear from you!

I am sorry for the omission and I have taken note.

Many thanks for responding. The second solution looks elegant as it

quickly

resolved the problem.

Please, take a second look at the first solution. It refused to run.

Looks

as if the pipe is not properly positioned. Efforts to correct it and get

it

run failed. If you can look further, it would be great. If time does not
permit, I am fine too.

But having the too solutions will certainly make the subject more
interesting.
Thank you so much.
With warmest regards from
Ogbos

On Wed, Mar 27, 2024 at 8:44 AM Rui Barradas 

wrote:



Às 04:30 de 27/03/2024, Ogbos Okike escreveu:

Warm greetings to you all.

Using the tapply function below:
data<-read.table("FD1month",col.names = c("Dates","count"))
x=data$count
f<-factor(data$Dates)
AB<- tapply(x,f,mean)


I made a simple calculation. The result, stored in AB, is of the form
below. But an effort to write AB to a file as a data frame fails. When

I

use the write table, it only produces the count column and strip of the
first column (date).

2005-11-01 2005-12-01 2006-01-01 2006-02-01 2006-03-01 2006-04-01
2006-05-01
-4.106887  -4.259154  -5.836090  -4.756757  -4.118011  -4.487942
-4.430705
2006-06-01 2006-07-01 2006-08-01 2006-09-01 2006-10-01 2006-11-01
2006-12-01
-3.856727  -6.067103  -6.418767  -4.383031  -3.985805  -4.768196
-10.072579
2007-01-01 2007-02-01 2007-03-01 2007-04-01 2007-05-01 2007-06-01
2007-07-01
-5.342338  -4.653128  -4.325094  -4.525373  -4.574783  -3.915600
-4.127980
2007-08-01 2007-09-01 2007-10-01 2007-11-01 2007-12-01 2008-01-01
2008-02-01
-3.952150  -4.033518  -4.532878  -4.522941  -4.485693  -3.922155
-4.183578
2008-03-01 2008-04-01 2008-05-01 2008-06-01 2008-07-01 2008-08-01
2008-09-01
-4.336969  -3.813306  -4.296579  -4.575095  -4.036036  -4.727994
-4.347428
2008-10-01 2008-11-01 2008-12-01
-4.029918  -4.260326  -4.454224

But the normal format I wish to display only appears on the terminal,
leading me to copy it and paste into a text file. That is, when I enter

AB

on the terminal, it returns a format in the form:

008-02-01  -4.183578
2008-03-01  -4.336969
2008-04-01  -3.813306
2008-05-01  -4.296579
2008-06-01  -4.575095
2008-07-01  -4.036036
2008-08-01  -4.727994
2008-09-01  -4.347428
2008-10-01  -4.029918
2008-11-01  -4.260326
2008-12-01  -4.454224

Now, my question: How do I write out two columns displayed by AB on the
terminal to a file?

I have tried using AB<-data.frame(AB) but it doesn't work either.

Many thanks for your time.
Ogbos

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide

http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.

Hello,

The main trick is to pipe to as.data.frame. But the result will have one
column only, you must assign the dates from the df's row names.
I also include an aggregate solution.



# create a test data set
set.seed(2024)
data <- data.frame(
 Date = sample(seq(Sys.Date() - 5, Sys.Date(), by = "1 days"), 100L,
TRUE),
 count = sample(10L, 100L, TRUE)
)

# coerce tapply's result to class "data.frame"
res <- with(data, tapply(count, Date, mean)) |> as.data.frame()
# assign a dates column from the row names
res$Date <- row.names(res)
# cosmetics
names(res)[2:1] <- names(data)
# note that the row names are still tapply's names vector
# and that the columns 

Re: [R] Output of tapply function as data frame: Problem Fixed

2024-03-27 Thread Rui Barradas

Às 08:58 de 27/03/2024, Ogbos Okike escreveu:

Dear Rui,
Nice to hear from you!

I am sorry for the omission and I have taken note.

Many thanks for responding. The second solution looks elegant as it quickly
resolved the problem.

Please, take a second look at the first solution. It refused to run. Looks
as if the pipe is not properly positioned. Efforts to correct it and get it
run failed. If you can look further, it would be great. If time does not
permit, I am fine too.

But having the too solutions will certainly make the subject more
interesting.
Thank you so much.
With warmest regards from
Ogbos

On Wed, Mar 27, 2024 at 8:44 AM Rui Barradas  wrote:


Às 04:30 de 27/03/2024, Ogbos Okike escreveu:

Warm greetings to you all.

Using the tapply function below:
data<-read.table("FD1month",col.names = c("Dates","count"))
x=data$count
   f<-factor(data$Dates)
AB<- tapply(x,f,mean)


I made a simple calculation. The result, stored in AB, is of the form
below. But an effort to write AB to a file as a data frame fails. When I
use the write table, it only produces the count column and strip of the
first column (date).

2005-11-01 2005-12-01 2006-01-01 2006-02-01 2006-03-01 2006-04-01
2006-05-01
   -4.106887  -4.259154  -5.836090  -4.756757  -4.118011  -4.487942
   -4.430705
2006-06-01 2006-07-01 2006-08-01 2006-09-01 2006-10-01 2006-11-01
2006-12-01
   -3.856727  -6.067103  -6.418767  -4.383031  -3.985805  -4.768196
-10.072579
2007-01-01 2007-02-01 2007-03-01 2007-04-01 2007-05-01 2007-06-01
2007-07-01
   -5.342338  -4.653128  -4.325094  -4.525373  -4.574783  -3.915600
   -4.127980
2007-08-01 2007-09-01 2007-10-01 2007-11-01 2007-12-01 2008-01-01
2008-02-01
   -3.952150  -4.033518  -4.532878  -4.522941  -4.485693  -3.922155
   -4.183578
2008-03-01 2008-04-01 2008-05-01 2008-06-01 2008-07-01 2008-08-01
2008-09-01
   -4.336969  -3.813306  -4.296579  -4.575095  -4.036036  -4.727994
   -4.347428
2008-10-01 2008-11-01 2008-12-01
   -4.029918  -4.260326  -4.454224

But the normal format I wish to display only appears on the terminal,
leading me to copy it and paste into a text file. That is, when I enter

AB

on the terminal, it returns a format in the form:

008-02-01  -4.183578
2008-03-01  -4.336969
2008-04-01  -3.813306
2008-05-01  -4.296579
2008-06-01  -4.575095
2008-07-01  -4.036036
2008-08-01  -4.727994
2008-09-01  -4.347428
2008-10-01  -4.029918
2008-11-01  -4.260326
2008-12-01  -4.454224

Now, my question: How do I write out two columns displayed by AB on the
terminal to a file?

I have tried using AB<-data.frame(AB) but it doesn't work either.

Many thanks for your time.
Ogbos

   [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide

http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.

Hello,

The main trick is to pipe to as.data.frame. But the result will have one
column only, you must assign the dates from the df's row names.
I also include an aggregate solution.



# create a test data set
set.seed(2024)
data <- data.frame(
Date = sample(seq(Sys.Date() - 5, Sys.Date(), by = "1 days"), 100L,
TRUE),
count = sample(10L, 100L, TRUE)
)

# coerce tapply's result to class "data.frame"
res <- with(data, tapply(count, Date, mean)) |> as.data.frame()
# assign a dates column from the row names
res$Date <- row.names(res)
# cosmetics
names(res)[2:1] <- names(data)
# note that the row names are still tapply's names vector
# and that the columns order is not Date/count. Both are fixed
# after the calculations.
res
#>   count   Date
#> 2024-03-22 5.416667 2024-03-22
#> 2024-03-23 5.50 2024-03-23
#> 2024-03-24 6.00 2024-03-24
#> 2024-03-25 4.476190 2024-03-25
#> 2024-03-26 6.538462 2024-03-26
#> 2024-03-27 5.20 2024-03-27

# fix the columns' order
res <- res[2:1]



# better all in one instruction
aggregate(count ~ Date, data, mean)
#> Datecount
#> 1 2024-03-22 5.416667
#> 2 2024-03-23 5.50
#> 3 2024-03-24 6.00
#> 4 2024-03-25 4.476190
#> 5 2024-03-26 6.538462
#> 6 2024-03-27 5.20



Also,
I'm glad to help as always but Ogbos, you have been an R-Help
contributor for quite a while, please post data in dput format. Given
the problem the output of the following is more than enough.


dput(head(data, 20L))


Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a
presença de vírus.
www.avg.com




Hello,

This pipe?


with(data, tapply(count, Date, mean)) |> as.data.frame()


I am not seeing anything wrong with it. I have tried it again just now 
and it runs with no problems, like it had before.

A solution is not to pipe

Re: [R] Output of tapply function as data frame

2024-03-27 Thread Rui Barradas

Às 04:30 de 27/03/2024, Ogbos Okike escreveu:

Warm greetings to you all.

Using the tapply function below:
data<-read.table("FD1month",col.names = c("Dates","count"))
x=data$count
  f<-factor(data$Dates)
AB<- tapply(x,f,mean)


I made a simple calculation. The result, stored in AB, is of the form
below. But an effort to write AB to a file as a data frame fails. When I
use the write table, it only produces the count column and strip of the
first column (date).

2005-11-01 2005-12-01 2006-01-01 2006-02-01 2006-03-01 2006-04-01
2006-05-01
  -4.106887  -4.259154  -5.836090  -4.756757  -4.118011  -4.487942
  -4.430705
2006-06-01 2006-07-01 2006-08-01 2006-09-01 2006-10-01 2006-11-01
2006-12-01
  -3.856727  -6.067103  -6.418767  -4.383031  -3.985805  -4.768196
-10.072579
2007-01-01 2007-02-01 2007-03-01 2007-04-01 2007-05-01 2007-06-01
2007-07-01
  -5.342338  -4.653128  -4.325094  -4.525373  -4.574783  -3.915600
  -4.127980
2007-08-01 2007-09-01 2007-10-01 2007-11-01 2007-12-01 2008-01-01
2008-02-01
  -3.952150  -4.033518  -4.532878  -4.522941  -4.485693  -3.922155
  -4.183578
2008-03-01 2008-04-01 2008-05-01 2008-06-01 2008-07-01 2008-08-01
2008-09-01
  -4.336969  -3.813306  -4.296579  -4.575095  -4.036036  -4.727994
  -4.347428
2008-10-01 2008-11-01 2008-12-01
  -4.029918  -4.260326  -4.454224

But the normal format I wish to display only appears on the terminal,
leading me to copy it and paste into a text file. That is, when I enter AB
on the terminal, it returns a format in the form:

008-02-01  -4.183578
2008-03-01  -4.336969
2008-04-01  -3.813306
2008-05-01  -4.296579
2008-06-01  -4.575095
2008-07-01  -4.036036
2008-08-01  -4.727994
2008-09-01  -4.347428
2008-10-01  -4.029918
2008-11-01  -4.260326
2008-12-01  -4.454224

Now, my question: How do I write out two columns displayed by AB on the
terminal to a file?

I have tried using AB<-data.frame(AB) but it doesn't work either.

Many thanks for your time.
Ogbos

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

The main trick is to pipe to as.data.frame. But the result will have one 
column only, you must assign the dates from the df's row names.

I also include an aggregate solution.



# create a test data set
set.seed(2024)
data <- data.frame(
  Date = sample(seq(Sys.Date() - 5, Sys.Date(), by = "1 days"), 100L, 
TRUE),

  count = sample(10L, 100L, TRUE)
)

# coerce tapply's result to class "data.frame"
res <- with(data, tapply(count, Date, mean)) |> as.data.frame()
# assign a dates column from the row names
res$Date <- row.names(res)
# cosmetics
names(res)[2:1] <- names(data)
# note that the row names are still tapply's names vector
# and that the columns order is not Date/count. Both are fixed
# after the calculations.
res
#>   count   Date
#> 2024-03-22 5.416667 2024-03-22
#> 2024-03-23 5.50 2024-03-23
#> 2024-03-24 6.00 2024-03-24
#> 2024-03-25 4.476190 2024-03-25
#> 2024-03-26 6.538462 2024-03-26
#> 2024-03-27 5.20 2024-03-27

# fix the columns' order
res <- res[2:1]



# better all in one instruction
aggregate(count ~ Date, data, mean)
#> Datecount
#> 1 2024-03-22 5.416667
#> 2 2024-03-23 5.50
#> 3 2024-03-24 6.00
#> 4 2024-03-25 4.476190
#> 5 2024-03-26 6.538462
#> 6 2024-03-27 5.20



Also,
I'm glad to help as always but Ogbos, you have been an R-Help 
contributor for quite a while, please post data in dput format. Given 
the problem the output of the following is more than enough.



dput(head(data, 20L))


Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with R coding

2024-03-12 Thread Rui Barradas

Às 07:43 de 12/03/2024, Maria Del Mar García Zamora escreveu:

Hello,

This is the error that appears when I try to load library(Rcmdr). I am using R 
version 4.3.3. I have tried to upload the packages, uninstall them and 
intalling them again and nothing.
Loading required package: splines
Loading required package: RcmdrMisc
Loading required package: car
Loading required package: carData
Loading required package: sandwich
Loading required package: effects
lattice theme set by effectsTheme()
See ?effectsTheme for details.
Error: package or namespace load failed for ‘Rcmdr’:
  .onLoad failed in loadNamespace() for 'tcltk2', details:
   call: file.exists("~/.Rtk2theme")
   error: file name conversion problem -- name too long?

Once this appears I use path.expand('~') and this is R's answer:
[1] "C:\\Users\\marga\\OneDrive - Fundaci\xf3n Universitaria San Pablo 
CEU\\Documentos"

The thing is that in spanish we use accents, so this word (Fundaci\xf3n) really 
is Fundación, but I can't change it.

I have tried to start R from CDM using: C:\Users\marga>set 
R_USER=C:\Users\marga\R_USER

C:\Users\marga>"C:\Users\marga\Desktop\R-4.3.3\bin\R.exe" CMD Rgui

At the beginning this worked but right now a message saying that this app 
cannot be used and that I have to ask the software company (photo attached)

What should I do?

Thanks,

Mar


[https://www.uchceu.es/img/externos/correo/ceu_uch.gif]<https://www.uchceu.es/>

Maria Del Mar García Zamora
Alumno UCHCEU -
Universidad CEU Cardenal Herrera
-
Tel.
www.uchceu.es<https://www.uchceu.es/>

[https://www.uchceu.es/img/logos/wur.jpg]
[https://www.uchceu.es/img/externos/correo/medio_ambiente.gif] Por favor, 
piensa en el medio ambiente antes de imprimir este contenido



[http://www.uchceu.es/img/externos/correo/ceu_uch.gif]<http://www.uchceu.es/>

Maria Del Mar García Zamora

www.uchceu.es<http://www.uchceu.es/>

[http://www.uchceu.es/img/externos/correo/medio_ambiente.gif] Por favor, piensa 
en el medio ambiente antes de imprimir este contenido




Este mensaje y sus archivos adjuntos, enviados desde FUNDACIÓN UNIVERSITARIA 
SAN PABLO-CEU, pueden contener información confidencial y está destinado a ser 
leído sólo por la persona a la que va dirigido, por lo que queda prohibida la 
difusión, copia o utilización de dicha información por terceros. Si usted lo 
recibiera por error, por favor, notifíquelo al remitente y destruya el mensaje 
y cualquier documento adjunto que pudiera contener. Cualquier información, 
opinión, conclusión, recomendación, etc. contenida en el presente mensaje no 
relacionada con la actividad de FUNDACIÓN UNIVERSITARIA SAN PABLO-CEU, y/o 
emitida por persona no autorizada para ello, deberá considerarse como no 
proporcionada ni aprobada por FUNDACIÓN UNIVERSITARIA SAN PABLO-CEU, que pone 
los medios a su alcance para garantizar la seguridad y ausencia de errores en 
la correspondencia electrónica, pero no puede asegurar la inexistencia de virus 
o la no alteración de los documentos transmitidos electrónicamente, por lo que 
declina cualquier responsabilidad a este respecto.

This message and its attachments, sent from FUNDACIÓN UNIVERSITARIA SAN 
PABLO-CEU, may contain confidential information and is intended to be read only 
by the person it is directed. Therefore any disclosure, copying or use by third 
parties of this information is prohibited. If you receive this in error, please 
notify the sender and destroy the message and any attachments may contain. Any 
information, opinion, conclusion, recommendation,... contained in this message 
and which is unrelated to the business activity of FUNDACIÓN UNIVERSITARIA SAN 
PABLO-CEU and/or issued by unauthorized personnel, shall be considered 
unapproved by FUNDACIÓN UNIVERSITARIA SAN PABLO-CEU. FUNDACIÓN UNIVERSITARIA 
SAN PABLO-CEU implements control measures to ensure, as far as possible, the 
security and reliability of all its electronic correspondence. However, 
FUNDACIÓN UNIVERSITARIA SAN PABLO-CEU does not guarantee that emails are 
virus-free or that documents have not be altered, and does not take 
responsibility in this respect.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

First of all, try running Rgui only, no R.exe CMD. Just Rgui.exe or
C:\Users\marga\Desktop\R-4.3.3\bin\Rgui.exe
Then, in Rgui, try loading Rcmdr

library(Rcmdr)


Also, do you have R in your Windows PATH variable? The directory to put 
in PATH should be


C:\Users\marga\Desktop\R-4.3.3\bin

so that Windows can find R.exe and Rgui.exe without the full path name.

Hope this helps,

Rui Barradas



--
Este e-mail foi analisado pelo software antivírus AVG para verificar

Re: [R] help - Package: stats - function ar.ols

2024-02-23 Thread Rui Barradas

Às 16:34 de 22/02/2024, Pedro Gavronski. escreveu:

Hello,

My name is Pedro and it is nice to meet you all. I am having trouble
understanding a message that I receive when use function ar.ols from
package stats, it says that "Warning message:
In ar.ols(x = dtb[2:6966, ], demean = FALSE, intercept = TRUE,
prewhite = TRUE) :
   model order:  2 singularities in the computation of the projection
matrix results are only valid up to model order 1, which I do not know
what it means, if someone could clarify it, I would really appreciate
it.

Attached to this email you will find my code and data I used to run
this formula.

Thanks in advance.

Best regards,  Pedro.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

Thanks for the data but the code is missing from the attachment.
Can you please post your code? In an attachment or directly in the 
e-mail body.


Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Looping

2024-02-18 Thread Rui Barradas

Às 03:27 de 19/02/2024, Steven Yen escreveu:

I need to read csv files repeatedly, named data1.csv, data2.csv,… data24.csv, 
24 altogether. That is,

data<-read.csv(“data1.csv”)
…
data<-read.csv(“data24.csv”)
…

Is there a way to do this in a loop? Thank you.

Steven from iPhone
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

Here is a way of reading the files in a *apply loop. The file names are 
created by getting them from file (list.files) or by a string editing 
function (sprintf).



# file_names_vec <- list.files(pattern = "data\\d+\\.csv")
file_names_vec <- sprintf("data%d.csv", 1:24)
data_list <- sapply(file_names_vec, read.csv, simplify = FALSE)

# access the 1st data.frame
data_list[[1L]]
# same as above
data_list[["data1.csv"]]
# same as above
data_list$data1.csv


Hope this helps,

Rui Barradas



--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Packages sometimes don't update, but no error or warning is thrown

2024-02-14 Thread Rui Barradas

Às 10:50 de 14/02/2024, Martin Maechler escreveu:

Berwin A Turlach
 on Wed, 14 Feb 2024 11:47:41 +0800 writes:
Berwin A Turlach
 on Wed, 14 Feb 2024 11:47:41 +0800 writes:


 > G'day Philipp,

 > On Tue, 13 Feb 2024 09:59:17 +0100 gernophil--- via R-help
 >  wrote:

 >> this question is related to this
 >> (https://community.rstudio.com/t/packages-are-not-updating/166214/3),
 >> [...]

 >> To sum it up: If I am updating packages (be it via
 >> Bioconductor or CRAN) some packages simply don’t update,
 >> [...]

 >> I would expect any kind of message that the package will
 >> not be updated, since no newer binary is available or a
 >> prompt, if I want to compile from source.

 > RStudio is doing its own thing for some task, including
 > 'install.packages()' (and for some reasons, at least on
 > the platforms on which I use RStudio, RStudio calls
 > 'install.packages()' and not 'update.packages()' when an
 > update is requested via the GUI). See:

 RStudio> install.packages
 > function (...)  .rs.callAs(name, hook, original, ...)
 > 

 > compared to:

 R> install.packages
 > function (pkgs, lib, repos = getOption("repos"),
 > contriburl = contrib.url(repos, type), method, available =
 > NULL, destdir = NULL, dependencies = NA, type =
 > getOption("pkgType"), configure.args =
 > getOption("configure.args"), configure.vars =
 > getOption("configure.vars"), clean = FALSE, Ncpus =
 > getOption("Ncpus", 1L), verbose = getOption("verbose"),
 > libs_only = FALSE, INSTALL_opts, quiet = FALSE,
 > keep_outputs = FALSE, ...)  { [...]


 > So if you use Install/Update in the Packages tab of
 > RStudio and do not experience the behaviour you are
 > expecting, it is something that you need to discuss with
 > Posit, not with R. :)

 >> However, the only message I get is: ``` trying URL
 >> ''

 > The package name has the version number encoded in it, so
 > theoretical you should be able to tell at this point
 > whether the package that is downloaded is the version that
 > is already installed, hence no update will happen.

 > Best wishes,

 >   Berwin


Yes, thank's a lot, Berwin.

Indeed I've raised the fact that RStudio
hides R's own install.packages() from the user  and uses its
own, undocumented one ... this has been the case for quite a few years.
I found out during teaching --- one of the few times, I use
RStudio to use R... in another case where RStudio's
install.packages() behaved differently than R's.

I'm pretty sure this is reason for quite a bit of confusion...

Martin

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

From within RStudio you can always run the qualified names

utils::install.packages()
utils::update.packages()

or run from the command line.

Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Packages sometimes don't update, but no error or warning is thrown

2024-02-13 Thread Rui Barradas
 commented, minimal, self-contained, reproducible code.

Hello,

Not exactly an answer, just a thought:
Whenever I have problems updating or installing packages from whithin 
RStudio I close RStudio, write a script with the install.packages() call 
and run it from a command window.



R -q -f "instscript.R"


This many times works better and it also works with Bioconductor's 
BiocManager::install or with remotes'/devtools's install_github.


Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] gathering denominator under frac

2024-02-02 Thread Rui Barradas

Às 10:01 de 02/02/2024, Troels Ring escreveu:

Hi friends - I'm plotting a ratio of bicarbonates i ggplot2 and

ylab(expression(paste(frac("additive BIC","true BIC" worked OK - but 
now I have been asked to put the chemistry instead - so I wrote


  ylab(expression(paste(frac("additive",HCO[3]^"-","true",HCO[3]^"-" 
- and frac saw that as additive = numerator and HCO3- = denominator and 
the rest was ignored-


So how do I make frac ignore the first ","  and print the fraction as I 
want?



All best wishes
Troels

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.

Hello,

This seems to work. Instead of separating the two numerator strings with 
a comma, separate them with a tilde. The same goes for the denominator.

And there is no need for double quotes around "additive" and "true".


library(ggplot2)

g <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point()

g + ylab(expression(paste(frac(
  additive~HCO[3]^"-",
  true~HCO[3]^"-"




Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need help testing a problem

2024-02-01 Thread Rui Barradas
  grDevices utils datasets  methods   base

other attached packages:
[1] rerddap_1.1.0

loaded via a namespace (and not attached):
 [1] vctrs_0.6.3   cli_3.6.1 rlang_1.1.1   ncdf4_1.22 

 [5] crul_1.4.0generics_0.1.3jsonlite_1.8.7 
data.table_1.14.8
 [9] glue_1.6.2httpcode_0.3.0triebeard_0.4.1   fansi_1.0.5 


[13] rappdirs_0.3.3tibble_3.2.1  hoardr_0.5.4  lifecycle_1.0.4
[17] compiler_4.3.2dplyr_1.1.3   Rcpp_1.0.12   pkgconfig_2.0.3
[21] digest_0.6.33 R6_2.5.1  tidyselect_1.2.0  utf8_1.2.4
[25] pillar_1.9.0  curl_5.2.0magrittr_2.0.3urltools_1.7.3
[29] xml2_1.3.5
>



So there was an unspecified error, an error without a condition message 
and no call expression. I find this stranger, a call like the following 
is expected.



tryCatch(stop("error"), error = function(e) e) |> str()
List of 2
 $ message: chr "error"
 $ call   : language doTryCatch(return(expr), name, parentenv, handler)
 - attr(*, "class")= chr [1:3] "simpleError" "error" "condition"


Function tabledap doesn't seem to be handling errors properly.

Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ggplot 3-dimensions

2023-12-17 Thread Rui Barradas

Às 09:13 de 17/12/2023, SIBYLLE STÖCKLI via R-help escreveu:

Dear R community

In the meantime I made some progress:
   ggplot(data = Fig2b, aes(x = BFF, y = Wert, fill = Effekt))+theme_bw()+
 geom_bar(stat = "identity", width = 0.95) +
 scale_y_continuous(limits=c(0,13), expand=c(0,0))+
 facet_wrap(~Aspekt, strip.position = "bottom", scales = "free_x") +
 theme(panel.spacing = unit(0, "lines"),
   strip.background = element_blank(),
   strip.placement = "outside")+
 theme(axis.title.x=element_blank())+
 scale_fill_manual("Effekt", values = c("Neg" = "red", "Neu" =
"darkgrey", "Pos" = "blue"), labels=c("Negativ", "Nicht sign.", "Positiv"))
   
   
Question

- Is it possible to present all the subpolots in one graph (not to "lines")?

- I tried to change the angel of the x-axis. However, I was able to change
the first x-axis (BB...), but not the second one (Voegel). Maybe this
would solve the problem.
- If not, is there another possibility to fix the number of subplots per
line?

Kind regards
Sibylle

-Original Message-
From: R-help  On Behalf Of SIBYLLE STÖCKLI via
R-help
Sent: Saturday, December 16, 2023 12:16 PM
To: R-help@r-project.org
Subject: [R] ggplot 3-dimensions

Dear R-user

Does anybody now, if ggplot allows to use two x-axis including two
dimensions (similar to excel plot (picture 1 in the pdf attachmet). If yes,
how should I adapt my code? The parameters are presented in the input file
(attachment: Input).

Fig2b = read.delim("BFF_Fig-2b.txt", na.strings="NA")
names(Fig2b)
head(Fig2b)
summary(Fig2b)
str(Fig2b)
Fig2b$Aspekt<-factor(Fig2b$Aspekt, levels=(c("Voegel", "Kleinsaeuger",
"Schnecken", "Regenwuermer_Asseln", "Pilze")))

### Figure 2b
   ggplot(Fig2b,aes(Aspekt,Wert,fill=Effekt))+
 geom_bar(stat="identity",position='fill')+
 scale_y_continuous(limits=c(0,14), expand=c(0,0))+
 labs(x="", y="Anzahl Studien pro Effekt")

Kind regards
Sibylle


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

You are posting the data as image once again, please don't do this.
Paste the output of

dput(Fig2b)# if small data
dput(head(Fig2b, 20))  # if too big to fit in an e-mail


in your mails. Here it is.



Aspekt <- c("Flora", "Flora", "Flora", "Tagfalter", "Tagfalter", 
"Tagfalter",
"Heuschre", "Heuschre", "Heuschre", "Kaefer_Sp", 
"Kaefer_Sp", "Kaefer_Sp",
"Schwebfli", "Schwebfli", "Schwebfli", "Bienen_F", 
"Bienen_F", "Bienen_F")

Aspekt <- c(Aspekt, Aspekt)
BFF <- rep(c("BB", "SA", "NE"), times = 12)
Effekt <- c(rep("Neg", times = 18), rep("Pos", times = 18))
Wert <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0,
  2, 1, 0, 0, 1, 0, 9, 4, 6, 0, 0, 3, 0, 0, 4)
Fig2b <- data.frame(Aspekt, BFF, Effekt, Wert)



As for the question, you can use facet_wrap argument nrow to have all 
plots in one row only, see the comment before facet_wrap. I don't know 
if this solves the problem.

Also, I define a custom theme to make the code clearer later.



library(ggplot2)

theme_sibylle <- function() {
  theme_bw(base_size = 10) %+replace%
theme(
  panel.spacing = unit(0, "lines"),
  strip.background = element_blank(),
  strip.placement = "outside",
  # this line was added by me, remove if not wanted
  strip.text.x.bottom = element_text(face = "bold", size = 10),
  axis.title.x = element_blank()
)
}

ggplot(data = Fig2b, aes(x = BFF, y = Wert, fill = Effekt)) +
  geom_bar(stat = "identity", width = 0.95) +
  scale_y_continuous(limits=c(0,13), expand=c(0,0)) +
  # here I use nrow = 1L to put everything in one row only
  facet_wrap(~ Aspekt, nrow = 1L, strip.position = "bottom", scales = 
"free_x") +

  scale_fill_manual(
name = "Effekt",
values = c("Neg" = "red", "Neu" = "darkgrey", "Pos" = "blue"),
labels = c("Negativ", "Nicht sign.", "Positiv")) +
  theme_sibylle()



Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ggplot2: Get the regression line with 95% confidence bands

2023-12-12 Thread Rui Barradas

Às 00:36 de 13/12/2023, Robert Baer escreveu:
coord_cartesian also seems to work for y, and including the breaks = . 
How about:


df=data.frame(year= c(2012,2015,2018,2022),
   score=c(495,493, 495, 474))

ggplot(df, aes(x = year, y = score)) +
   geom_point() +
   geom_smooth(method = "lm", formula = y ~ x) +
   labs(title = "Standard linear regression for France", x = "Year", y = 
"PISA score in mathematics") +

   coord_cartesian(ylim=c(470,500)) +
   scale_x_continuous(breaks = 2012:2022)

On 12/12/2023 3:19 PM, varin sacha via R-help wrote:

Dear Ben,
Dear Daniel,
Dear Rui,
Dear Bert,

Here below my R code.
I really appreciate all your comments. My R code is perfectly working 
but there is still something I would like to improve. The X-axis is 
showing   2012.5 ;   2015.0   ;   2017.5   ;  2020.0
I would like to see on X-axis only the year (2012 ; 2015 ; 2017 ; 
2020). How to do?



#
library(ggplot2)
df=data.frame(year= c(2012,2015,2018,2022), score=c(495,493, 495, 474))

ggplot(df, aes(x = year, y = score)) + geom_point() + 
geom_smooth(method = "lm", formula = y ~ x) +
  labs(title = "Standard linear regression for France", x = "Year", y 
= "PISA score in mathematics") + 
scale_y_continuous(limits=c(470,500),oob=scales::squish)

#









Le lundi 11 décembre 2023 à 23:38:06 UTC+1, Ben Bolker 
 a écrit :








On 2023-12-11 5:27 p.m., Daniel Nordlund wrote:

On 12/10/2023 2:50 PM, Rui Barradas wrote:

Às 22:35 de 10/12/2023, varin sacha via R-help escreveu:

Dear R-experts,

Here below my R code, as my X-axis is "year", I must be missing one
or more steps! I am trying to get the regression line with the 95%
confidence bands around the regression line. Any help would be
appreciated.

Best,
S.


#
library(ggplot2)
   df=data.frame(year=factor(c("2012","2015","2018","2022")),
score=c(495,493, 495, 474))
   ggplot(df, aes(x=year, y=score)) + geom_point( ) +
geom_smooth(method="lm", formula = score ~ factor(year), data = df) +
labs(title="Standard linear regression for France", y="PISA score in
mathematics") + ylim(470, 500)
#

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

I don't see a reason why year should be a factor and the formula in
geom_smooth is wrong, it should be y ~ x, the aesthetics envolved.
It still doesn't plot the CI's though. There's a warning and I am not
understanding where it comes from. But the regression line is plotted.



ggplot(df, aes(x = as.numeric(year), y = score)) +
   geom_point() +
   geom_smooth(method = "lm", formula = y ~ x) +
   labs(
 title = "Standard linear regression for France",
     x = "Year",
 y = "PISA score in mathematics"
   ) +
   ylim(470, 500)
#> Warning message:
#> In max(ids, na.rm = TRUE) : no non-missing arguments to max;
returning -Inf



Hope this helps,

Rui Barradas




After playing with this for a little while, I realized that the problem
with plotting the confidence limits is the addition of ylim(470, 500).
The confidence values are outside the ylim values.  Remove the limits,
or increase the range, and the confidence curves will plot.

Hope this is helpful,

Dan

   Or use + scale_y_continuous(limits = c(470, 500), oob = 
scales::squish)



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.

Hello,

In the code below I don't use coord_cartesian because to set ylim will 
cut part of the confidence intervals.


To have labels only in the years present in the data set, get them from 
the data.




library(ggplot2)

df <- data.frame(year= c(2012,2015,2018,2022),

Re: [R] ggplot2: Get the regression line with 95% confidence bands

2023-12-10 Thread Rui Barradas

Às 22:35 de 10/12/2023, varin sacha via R-help escreveu:


Dear R-experts,

Here below my R code, as my X-axis is "year", I must be missing one or more 
steps! I am trying to get the regression line with the 95% confidence bands around the 
regression line. Any help would be appreciated.

Best,
S.


#
library(ggplot2)
  
df=data.frame(year=factor(c("2012","2015","2018","2022")), score=c(495,493, 495, 474))
  
ggplot(df, aes(x=year, y=score)) + geom_point( ) + geom_smooth(method="lm", formula = score ~ factor(year), data = df) + labs(title="Standard linear regression for France", y="PISA score in mathematics") + ylim(470, 500)

#

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

I don't see a reason why year should be a factor and the formula in 
geom_smooth is wrong, it should be y ~ x, the aesthetics envolved.
It still doesn't plot the CI's though. There's a warning and I am not 
understanding where it comes from. But the regression line is plotted.




ggplot(df, aes(x = as.numeric(year), y = score)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(
title = "Standard linear regression for France",
x = "Year",
y = "PISA score in mathematics"
  ) +
  ylim(470, 500)
#> Warning message:
#> In max(ids, na.rm = TRUE) : no non-missing arguments to max; 
returning -Inf




Hope this helps,

Rui Barradas



--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Convert character date time to R date-time variable.

2023-12-07 Thread Rui Barradas

Às 16:30 de 07/12/2023, Rui Barradas escreveu:

Às 16:21 de 07/12/2023, Sorkin, John escreveu:

Colleagues,

I have a matrix of character data that represents date and time. The 
format of each element of the matrix is

"2020-09-17_00:00:00"
How can I convert the elements into a valid R date-time constant?

Thank you,
John



John David Sorkin M.D., Ph.D.
Professor of Medicine, University of Maryland School of Medicine;

Associate Director for Biostatistics and Informatics, Baltimore VA 
Medical Center Geriatrics Research, Education, and Clinical Center;


PI Biostatistics and Informatics Core, University of Maryland School 
of Medicine Claude D. Pepper Older Americans Independence Center;


Senior Statistician University of Maryland Center for Vascular Research;

Division of Gerontology and Paliative Care,
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
Cell phone 443-418-5382



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.

Hello,

Coerce with ?as.POSIXct
Don't forget the underscore in the format.


as.POSIXct("2020-09-17_00:00:00", format = "%Y-%m-%d_%H:%M:%S")


Hope this helps,

Rui Barradas



Sorry, I forgot:


lubridate::ymd_hms("2020-09-17_00:00:00")


Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Convert character date time to R date-time variable.

2023-12-07 Thread Rui Barradas

Às 16:21 de 07/12/2023, Sorkin, John escreveu:

Colleagues,

I have a matrix of character data that represents date and time. The format of 
each element of the matrix is
"2020-09-17_00:00:00"
How can I convert the elements into a valid R date-time constant?

Thank you,
John



John David Sorkin M.D., Ph.D.
Professor of Medicine, University of Maryland School of Medicine;

Associate Director for Biostatistics and Informatics, Baltimore VA Medical 
Center Geriatrics Research, Education, and Clinical Center;

PI Biostatistics and Informatics Core, University of Maryland School of 
Medicine Claude D. Pepper Older Americans Independence Center;

Senior Statistician University of Maryland Center for Vascular Research;

Division of Gerontology and Paliative Care,
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
Cell phone 443-418-5382



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

Coerce with ?as.POSIXct
Don't forget the underscore in the format.


as.POSIXct("2020-09-17_00:00:00", format = "%Y-%m-%d_%H:%M:%S")


Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Mann Kendall mutation package?

2023-12-01 Thread Rui Barradas

Às 11:58 de 01/12/2023, Nick Wray escreveu:

Hello - does anyone know whether there are any packages for Mann-Kendall
mutation tests in R available?  The only one I could find online is this
MK_mut_test: Mann-Kendall mutation test in Sibada/sibadaR: Sibada's
accumulated R scripts for next probably use to avoid reinventing the wheel.
(rdrr.io) <https://rdrr.io/github/Sibada/sibadaR/man/MK_mut_test.html> but
there doesn't seem to be a package corresponding to this.  I've tried
installing various permutations of the apparent name Sibada/sibadaR but
nothing comes up, so I'm not sure whether it even exists...

Thanks Nick Wray

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

Your link points to a GitHub repository, the package can be installed with


devtools::install_github(repo = "Sibada/sibadaR")



Hope this helps

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] back tick names with predict function

2023-11-30 Thread Rui Barradas

Às 17:57 de 30/11/2023, Rui Barradas escreveu:

Às 17:38 de 30/11/2023, Robert Baer escreveu:
I am having trouble using back ticks with the R extractor function 
'predict' and an lm() model.  I'm trying too construct some nice 
vectors that can be used for plotting the two types of regression 
intervals.  I think it works with normal column heading names but it 
fails when I have "special" back-tick names.  Can anyone help with how 
I would reference these?  Short of renaming my columns, is there a way 
to accomplish this?


Repex

*# dataframe with dashes in column headings
cob =
   structure(list(`cob-wt` = c(212, 241, 215, 225, 250, 241, 237,
 282, 206, 246, 194, 241, 196, 193, 224, 
257, 200, 190, 208, 224

), `plant-density` = c(137, 107, 132, 135, 115, 103, 102, 65,
    149, 85, 173, 124, 157, 184, 112, 80, 165, 
160, 157, 119)),

class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L))

# regression model works
mod2 = lm(`cob-wt` ~ `plant-density`, data = cob)

# x sequence for plotting CI's
# Set up x points
x = seq(min(cob$`plant-density`), max(cob$`plant-density`), length = 
1000)


# Use predict to get CIs for a plot
# Add CI for regression line (y-hat uses 'c')
# usual trick is to assign x to actual x-var name in middle dataframe 
arguement
CI.c = predict(mod2, data.frame( `plant-density` = x), interval = 'c') 
# fail


# Add CI for prediction value (y-tilde uses 'p')
# usual trick is to assign x to actual x-var name in middle dataframe 
arguement
CI.p = predict(mod2, data.frame(`plant-density`  = x), interval = 
'p')    # fail

*

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.

Hello,

When creating the new data df, the default check.names = TRUE changes 
the column name, it is repaired and the hyphen is replaced by a legal dot.



# check.names defaults to TRUE
newd <- data.frame(`plant-density` = x)
# `plant-density` is not a column name
head(newd)

# check.names set to FALSE
newd <- data.frame(`plant-density` = x, check.names = FALSE)
# `plant-density` is becomes a column name
head(newd)


# Use predict to get CIs for a plot
# Add CI for regression line (y-hat uses 'c')
# usual trick is to assign x to actual x-var name in middle dataframe 
arguement

CI.c = predict(mod2, newdata = newd, interval = 'confidence')  # fail

# Add CI for prediction value (y-tilde uses 'p')
# usual trick is to assign x to actual x-var name in middle dataframe 
arguement

CI.p = predict(mod2, newdata = newd, interval = 'prediction')    # fail



Hope this helps,

Rui Barradas



Hello,

Sorry for the comments '# fail' in the last two instructions, I should 
have changed them.



CI.c <- predict(mod2, newdata = newd, interval = 'confidence')  # works
CI.p <- predict(mod2, newdata = newd, interval = 'prediction')  # works


Hoep this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] back tick names with predict function

2023-11-30 Thread Rui Barradas

Às 17:38 de 30/11/2023, Robert Baer escreveu:
I am having trouble using back ticks with the R extractor function 
'predict' and an lm() model.  I'm trying too construct some nice vectors 
that can be used for plotting the two types of regression intervals.  I 
think it works with normal column heading names but it fails when I have 
"special" back-tick names.  Can anyone help with how I would reference 
these?  Short of renaming my columns, is there a way to accomplish this?


Repex

*# dataframe with dashes in column headings
cob =
   structure(list(`cob-wt` = c(212, 241, 215, 225, 250, 241, 237,
     282, 206, 246, 194, 241, 196, 193, 224, 
257, 200, 190, 208, 224

), `plant-density` = c(137, 107, 132, 135, 115, 103, 102, 65,
    149, 85, 173, 124, 157, 184, 112, 80, 165, 160, 
157, 119)),

class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L))

# regression model works
mod2 = lm(`cob-wt` ~ `plant-density`, data = cob)

# x sequence for plotting CI's
# Set up x points
x = seq(min(cob$`plant-density`), max(cob$`plant-density`), length = 1000)

# Use predict to get CIs for a plot
# Add CI for regression line (y-hat uses 'c')
# usual trick is to assign x to actual x-var name in middle dataframe 
arguement
CI.c = predict(mod2, data.frame( `plant-density` = x), interval = 'c') # 
fail


# Add CI for prediction value (y-tilde uses 'p')
# usual trick is to assign x to actual x-var name in middle dataframe 
arguement
CI.p = predict(mod2, data.frame(`plant-density`  = x), interval = 
'p')    # fail

*

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.

Hello,

When creating the new data df, the default check.names = TRUE changes 
the column name, it is repaired and the hyphen is replaced by a legal dot.



# check.names defaults to TRUE
newd <- data.frame(`plant-density` = x)
# `plant-density` is not a column name
head(newd)

# check.names set to FALSE
newd <- data.frame(`plant-density` = x, check.names = FALSE)
# `plant-density` is becomes a column name
head(newd)


# Use predict to get CIs for a plot
# Add CI for regression line (y-hat uses 'c')
# usual trick is to assign x to actual x-var name in middle dataframe 
arguement

CI.c = predict(mod2, newdata = newd, interval = 'confidence')  # fail

# Add CI for prediction value (y-tilde uses 'p')
# usual trick is to assign x to actual x-var name in middle dataframe 
arguement

CI.p = predict(mod2, newdata = newd, interval = 'prediction')# fail



Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ggplot with two x-axis and two dimensions

2023-11-25 Thread Rui Barradas

Às 10:29 de 24/11/2023, sibylle.stoec...@gmx.ch escreveu:

Dear R-user

Does anybody now, if ggplot allows to use two x-axis including two
dimensions (similar to excel plot (picture 1 in the pdf attachmet). If yes,
how should I adapt my code? The parameters are presented in the input file
(attachment: Input).

Fig2b = read.delim("BFF_Fig-2b.txt", na.strings="NA")
names(Fig2b)
head(Fig2b)
summary(Fig2b)
str(Fig2b)
Fig2b$Aspekt<-factor(Fig2b$Aspekt, levels=(c("Voegel", "Kleinsaeuger",
"Schnecken", "Regenwuermer_Asseln", "Pilze")))

### Figure 2b
   ggplot(Fig2b,aes(Aspekt,Wert,fill=Effekt))+
 geom_bar(stat="identity",position='fill')+
 scale_y_continuous(limits=c(0,14), expand=c(0,0))+
 labs(x="", y="Anzahl Studien pro Effekt")

Kind regards
Sibylle


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

The first attached file does not match the data in the second file but 
here is an answer to both this question and to your other question [1].


The trick to have a secondary axis is to compute a ratio of axis 
lenghts. The lengths of the main and secondary axis can be computed by 
functions range() and diff(), like in the code below. Then use it to 
scale the secondary axis.




Fig2b <-
  structure(list(
Aspekt = c("Flora", "Flora", "Flora", "Tagfalter",
   "Tagfalter", "Tagfalter", "Heuschre", "Heuschre", 
"Heuschre",
   "Kaefer_Sp", "Kaefer_Sp", "Kaefer_Sp", "Schwebfli", 
"Schwebfli",
   "Schwebfli", "Bienen_F", "Bienen_F", "Bienen_F", 
"Flora", "Flora",
   "Flora", "Tagfalter", "Tagfalter", "Tagfalter", 
"Heuschre", "Heuschre",
   "Heuschre", "Kaefer_Sp", "Kaefer_Sp", "Kaefer_Sp", 
"Schwebfli",
   "Schwebfli", "Schwebfli", "Bienen_F", "Bienen_F", 
"Bienen_F"),

BFF = c("BB", "SA", "NE", "BB", "SA", "NE", "BB", "SA", "NE",
"BB", "SA", "NE", "BB", "SA", "NE", "BB", "SA", "NE", "BB",
"SA", "NE", "BB", "SA", "NE", "BB", "SA", "NE", "BB", "SA",
"NE", "BB", "SA", "NE", "BB", "SA", "NE"),
Effekt = c("Neu",
   "Neu", "Neu", "Neu", "Neu", "Neu", "Neu", "Neu", "Neu", 
"Neu",
   "Neu", "Neu", "Neu", "Neu", "Neu", "Neu", "Neu", "Neu", 
"Pos",
   "Pos", "Pos", "Pos", "Pos", "Pos", "Pos", "Pos", "Pos", 
"Pos",

   "Pos", "Pos", "Pos", "Pos", "Pos", "Pos", "Pos", "Pos"),
Wert = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 3L, 1L, 1L,
 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 2L, 1L, 0L, 0L, 1L, 0L,
 9L, 4L, 6L, 0L, 0L, 3L, 0L, 0L, 4L)),
row.names = c(NA, -36L), class = "data.frame")


library(ggplot2)

# First y axis (0-9)
# Second y axis (0-2500)
# fac <- diff(range( sec axis ))/diff(range( 1st axis ))
fac <- diff(range(0, 2500))/diff(range(0, 9))

ggplot(Fig2b, aes(Aspekt, Wert, fill = Effekt)) +
  geom_col(position = position_dodge()) +
  scale_y_continuous(
breaks = seq(0, 12, 2L),
sec.axis = sec_axis(~ . * fac)
  ) +
  labs(x = "", y = "Anzahl Studien pro Effekt")




[1] https://stat.ethz.ch/pipermail/r-help/2023-November/478605.html

Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fast way to draw mean values and 95% confidence intervals of groups with ggplot2

2023-11-16 Thread Rui Barradas

Às 11:59 de 16/11/2023, Luigi Marongiu escreveu:

Hello,
I have triplicate (column A) readings (column D) of samples exposed to
different concentrations (column C) over time (column B).
Is it possible to draw a line plot of the mean values for each
concentration (C)? At the moment, I get a single line.
Also, is there a simple way to draw the 95% CI around these data? I
know I need to use ribbon with the lower and upper limit, but is there
a simple way for ggplot2 to calculate directly these values?
Here is a working example:

```
A = c(rep(1, 28), rep(2, 28), rep(3, 28))
B = rep(c(0, 15, 30, 45, 60, 75, 90), 12)
C = rep(c(rep(0, 7), rep(0.6, 7), rep(1.2, 7),
   rep(2.5,7)),3)
D = c(731.33,761.67,730,761.67,741.67,788.67,784.33,
   686.67,685.33,680,693.67,684,704,709.67,739,
   731,719,767,760.67,776.67,768.67,675,671.67,
   668.67,677.33,673.67,687,696.67,727,750.67,
   752.67,786.67,794.67,843.33,946,732.67,737.33,
   775.33,828,918,1063,1270,752.67,742.33,
   735.67,
   747.67,777.33,803.67,865.67,700,700.67,705.67,
   722.67,744,779,837,748,742,754,747.67,
   775.67,808.67,869,705.67,714.33,702.33,730,
   710.67,731,744,686.33,687.33,670,702.33,
   669.33,707.33,708.33,724,747,761.33,715,
   697.67,728,728)

df = data.frame(A, B, C, D)
library(ggplot2)
ggplot(data=df, aes(x=B, y=D, z=C, color =C)) +
   geom_line(stat = "summary", fun = "mean") +
   geom_ribbon()
```

Thank you

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

I am not sure that the code below is what you want.
The first 3 instructions are to create a named vector of colors.
The pipe is what tries to solve the problem. It computes means and se's 
by groups of time and concentration, then plots the ribbon below the lines.


It is important to not set color = C in the initial call to ggplot, 
since it would be effective in all the subsequent layers (try it).

To have one line per concentration I use group = C instead.



suppressPackageStartupMessages({
  library(ggplot2)
  library(dplyr)
})

n_colors <- df$C |> unique() |> length()
names_colors <- df$C |> unique() |> as.character()
clrs <- setNames(palette.colors(n_colors), names_colors)

df %>%
  mutate(C = factor(C)) %>%
  group_by(B, C) %>%
  mutate(mean_D = mean(D), se_D = sd(D)) %>%
  ungroup() %>%
  ggplot(aes(x = B, group = C)) +
  geom_ribbon(aes(ymin = mean_D - se_D, ymax = mean_D + se_D), fill = 
"grey", alpha = 0.5) +

  geom_line(aes(y = mean_D, color = C)) +
  geom_point(aes(y = D, color = C)) +
  scale_color_manual(name = "Concentration", values = clrs)


Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] anyone having trouble accesing CRAN?

2023-11-15 Thread Rui Barradas

Às 19:13 de 15/11/2023, Christopher W. Ryan via R-help escreveu:

at https://cran.r-project.org/ I get this error message:

=
Secure Connection Failed

An error occurred during a connection to cran.r-project.org.
PR_END_OF_FILE_ERROR

Error code: PR_END_OF_FILE_ERROR

 The page you are trying to view cannot be shown because the
authenticity of the received data could not be verified.
===

Three different browsers, two different devices, two different networks.
(The text of the error messages varies.)

Anyone seeing similar?

Thanks.

--Chris Ryan

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

Yes, CRAN is down.

I know last week there was an anouncement about a maintenance scheduled 
but I cannot place that e-mail right now and don't remember the date 
exactly so I cannot say for sure this is what is happening.


But it is probably a scheduled maintenance.

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Cryptic error for mscmt function

2023-11-06 Thread Rui Barradas

Às 13:35 de 05/11/2023, Leu Thierry escreveu:

Hi everyone,


I am trying to conduct a synthetic control analysis using the MSCMT package. However, when 
trying to run it I get a very cryptic error message saying  "Error in 
lst[[nam]][intersect(tim, rownames(lst[[nam]])), cols, drop = FALSE]: subscript out of 
bounds". Does anyone know what this means and why I receive this error? I attached the 
code & dataset used in the attachment. Thanks a lot!


Best regards

Thierry

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

No attachment came through the filters, can you resend in plain text or 
if it was a .R file, rename it .txt?


See [1], section General Instructions for more on this

[1] https://www.r-project.org/mail.html#instructions

Hope this helps,

Rui Barradas

--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sum data according to date in sequence

2023-11-04 Thread Rui Barradas
___
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.






Hello,

Here are two solutions.

1. Base R

Though I don't coerce the date column to class "Date", it seems to work.


aggregate(EnergykWh ~ date, dt1, sum)
#>date EnergykWh
#> 1 1/14/2016  11.98569
#> 2 1/15/2016  32.56938
#> 3 1/16/2016  21.29181
#> 4 1/17/2016  22.88083
#> 5 1/18/2016   9.05750


2. Package dplyr.
First column date is coerced from class "character" to class "Date".
Then the grouped sums are computed.


suppressPackageStartupMessages(
  library(dplyr)
)

dt1 %>%
  mutate(date = as.Date(date, "%m/%d/%Y")) %>%
  summarise(EnergykWh = sum(EnergykWh), .by = date)
#> date EnergykWh
#> 1 2016-01-14  11.98569
#> 2 2016-01-15  32.56938
#> 3 2016-01-16  21.29181
#> 4 2016-01-17  22.88083
#> 5 2016-01-18   9.05750


As you can see, the results are the same.

Also, this exact problem is one of the most asked on StackOverflow. 
Maybe you could try searching there for a solution. My code above is 
also exactly the code in [1], though I had already this answer written. 
I only checked after :(.



[1] 
https://stackoverflow.com/questions/61548758/r-how-sum-values-by-group-by-date



Hope this helps,

Rui Barradas



--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Missing shapes in legend with scale_shape_manual

2023-10-31 Thread Rui Barradas

Às 20:55 de 30/10/2023, Kevin Zembower via R-help escreveu:

Hello,

I'm trying to plot a graph of blood glucose versus date. I also record
conditions, such as missing the previous night's medications, and
missing exercise on the previous day. My data looks like:


b2[68:74,]

# A tibble: 7 × 5
   Date   Time  bg missed_meds no_exercise
 
1 2023-10-17 08:50128 TRUEFALSE
2 2023-10-16 06:58144 FALSE   FALSE
3 2023-10-15 09:17137 FALSE   TRUE
4 2023-10-14 09:04115 FALSE   FALSE
5 2023-10-13 08:44136 FALSE   TRUE
6 2023-10-12 08:55122 FALSE   TRUE
7 2023-10-11 07:55150 TRUETRUE




This gets me most of the way to what I want:

ggplot(data = b2, aes(x = Date, y = bg)) +
 geom_line() +
 geom_point(data = filter(b2, missed_meds),
shape = 20,
size = 3) +
 geom_point(data = filter(b2, no_exercise),
shape = 4,
size = 3) +
 geom_point(aes(x = Date, y = bg, shape = missed_meds),
alpha = 0) + #Invisible point layer for shape mapping
 scale_y_continuous(name = "Blood glucose (mg/dL)",
breaks = seq(100, 230, by = 20)
) +
 geom_hline(yintercept = 130) +
 scale_shape_manual(name = "Conditions",
labels = c("Missed meds",
   "Missed exercise"),
values = c(20, 4),
## size = 3
)

However, the legend just prints an empty square in front of the labels.
What I want is a filled circle (shape 20) in front of "Missed meds" and
a filled circle (shape 4) in front of "Missed exercise."

My questions are:
  1. How can I fix my plot to show the shapes in the legend?
  2. Can my overall plotting method be improved? Would you do it this
way?

Thanks so much for your advice and guidance.

-Kevin



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

In ggplot2 graphics when you have more than one call to the same layer 
function, then you can probably simplify the code.


In this case you make several calls to geom_point. This can probably be 
avoided.


Create a new column named Condition.
Assign to it the column names wherever the values of those columns are 
TRUE. The simplest way of doing this is to use colus missed_meds and 
no_exercise as logical index columns, see code below.


Like this the values are mapped to shapes in just one call to geom_point.
That's what function aes() is meant for, to tell what variables define 
what in the plot.




b2$Date <- as.Date(b2$Date)
# this new column will be mapped to the shape aesthetic
b2$Conditions <- NA_character_
b2$Conditions[b2$missed_meds] <- names(b2)[4]
b2$Conditions[b2$no_exercise] <- names(b2)[5]

ggplot(data = b2, aes(x = Date, y = bg)) +
  geom_line() +
  geom_point(aes(shape = Conditions), size = 3) +
  geom_hline(yintercept = 130) +
  scale_y_continuous(
name = "Blood glucose (mg/dL)",
breaks = seq(100, 230, by = 20)
  ) +
  scale_shape_manual(
#name = "Conditions",
labels = c("Missed meds", "Missed exercise"),
values = c(20, 4),
na.translate = FALSE
  )



Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to Reformat a dataframe

2023-10-28 Thread Rui Barradas
t to do is, instead of having 12 observations  by row, I want to
have one observation by row. I want to have a single column with 1509
observations instead of 126 rows with 12 columns per row.

I tried the following:
df = data.frame(matrix(nrow = Length, ncol = 1))
colnames(df) = c("aportes_alajuela")



for (row in 1:nrow(alajuela_df)){
   for (col in 1:ncol(alajuela_df)){
 df[i,1]=alajuela_df[i,j]
   }
}

But I am not getting the data in the structure I want.

Any help will be greatly appreciated.

Best regards,
Paul

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

Here are two base R way with ?stack and with ?reshape.


# 1. With stack()
df_long <- stack(alajuela_df)[1]
df_long <- df_long[complete.cases(df_long), , drop = FALSE]
head(df_long)



# 2. With reshape
df_long <- reshape(
  alajuela_df, direction = "long",
  varying = names(alajuela_df),
  v.names = "x"
)[2]

# 1512 rows, only one column
dim(df_long)
# [1] 15121

# there are NA's in the data
df_long[complete.cases(df_long), , drop = FALSE] |> dim()
# [1] 15091

# keep the rows with values not NA
df_long <- df_long[complete.cases(df_long), , drop = FALSE]

# check the dimensions again
dim(df_long)
# [1] 15091



Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Plot for 10 years extrapolation

2023-10-27 Thread Rui Barradas

Às 19:23 de 26/10/2023, varin sacha via R-help escreveu:

Dear R-Experts,

Here below my R code working but I don't know how to complete/finish my R code 
to get the final plot with the extrapolation for the10 more years.

Indeed, I try to extrapolate my data with a linear fit over the next 10 years. 
So I create a date sequence for the next 10 years and store as a dataframe to 
make the prediction possible.
Now, I am trying to get the plot with the actual data (from year 2004 to 2018) 
and with the 10 more years extrapolation.

Thanks for your help.


date <-as.Date(c("2018-12-31", "2017-12-31", "2016-12-31", "2015-12-31", "2014-12-31", "2013-12-31", "2012-12-31", "2011-12-31", 
"2010-12-31", "2009-12-31", "2008-12-31", "2007-12-31", "2006-12-31", "2005-12-31", "2004-12-31"))
  
value <-c(15348, 13136, 11733, 10737, 15674, 11098, 13721, 13209, 11099, 10087, 14987, 11098, 13421, 9023, 12098)
  
model <- lm(value~date)
  
plot(value~date ,col="grey",pch=20,cex=1.5,main="Plot")

abline(model,col="darkorange",lwd=2)
  
dfuture <- data.frame(date=seq(as.Date("2019-12-31"), by="1 year", length.out=10))
  
predict(model,dfuture,interval="prediction")



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

Here is a way with base R graphics. Explained in the code comments.




date <-as.Date(c("2018-12-31", "2017-12-31", "2016-12-31",
 "2015-12-31", "2014-12-31", "2013-12-31",
 "2012-12-31", "2011-12-31", "2010-12-31",
 "2009-12-31", "2008-12-31", "2007-12-31",
 "2006-12-31", "2005-12-31", "2004-12-31"))

value <-c(15348, 13136, 11733, 10737, 15674, 11098, 13721, 13209,
  11099, 10087, 14987, 11098, 13421, 9023, 12098)

model <- lm(value ~ date)

dfuture <- data.frame(date = seq(as.Date("2019-12-31"), by="1 year", 
length.out=10))




predfuture <- predict(model, dfuture, interval="prediction")
dfuture <- cbind(dfuture, predfuture)

# start the plot with the required x and y limits
xlim <- range(c(date, dfuture$date))
ylim <- range(c(value, dfuture$fit))

plot(value ~ date, col="grey", pch=20, cex=1.5, main="Plot"
 , xlim = xlim, ylim = ylim)

# abline extends the fitted line past the x value (date)
# limit making the next ten years line ugly and not even
# completely overplotting the abline drawn line
abline(model, col="darkorange", lwd=2)
lines(fit ~ date, dfuture
  # , lty = "dashed"
  , lwd=2
  , col = "black")

# if lines() is used for both the interpolated and extrapolated
# values you will have a gap between both fitted and predicted lines
# but it is closer to what you want

# get the fitted values first (interpolated values)
ypred <- predict(model)

plot(value ~ date, col="grey", pch=20, cex=1.5, main="Plot"
 , xlim = xlim, ylim = ylim)

# plot the interpolated values
lines(ypred ~ date, col="darkorange", lwd = 2)
# and now the extrapolated values
# I use normal orange to make the difference more obvious
lines(fit ~ date, dfuture, lty = "dashed", lwd=2, col = "orange")



Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Bug in print for data frames?

2023-10-26 Thread Rui Barradas

Hello,

Inline.

Às 13:32 de 26/10/2023, Ebert,Timothy Aaron escreveu:

The "problem" goes away if you use

x$C <- y[1,]


Actually, if I understand correctly, the OP wants the column:


x$C <- y[,1]


In this case it will produce the same output because y is a df with only 
one row. But that is a very special case, the general case would be to 
extract the column.


Hope this helps,

Rui Barradas



If you have another row in your x, say:
x <- data.frame(A=c(1,4), B=c(2,5), C=c(3,6))

then your code
x$C <- y[1]
returns an error.

If y has the same number of rows as x$C then R has the same outcome as in your 
example.

It looks like your code tells R to replace all of column C (including the name) 
with all of vector y.

Maybe unexpected, but not a bug. It is consistent.


-Original Message-
From: R-help  On Behalf Of Rui Barradas
Sent: Thursday, October 26, 2023 6:43 AM
To: Christian Asseburg ; r-help@r-project.org
Subject: Re: [R] Bug in print for data frames?

[External Email]

Às 07:18 de 25/10/2023, Christian Asseburg escreveu:

Hi! I came across this unexpected behaviour in R. First I thought it was a bug in 
the assignment operator <- but now I think it's maybe a bug in the way data 
frames are being printed. What do you think?

Using R 4.3.1:


x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1)
x

A B C
1 1 2 3

x$B <- y$A # works as expected
x

A B C
1 1 1 3

x$C <- y[1] # makes C disappear
x

A B A
1 1 1 1

str(x)

'data.frame':   1 obs. of  3 variables:
   $ A: num 1
   $ B: num 1
   $ C:'data.frame':  1 obs. of  1 variable:
..$ A: num 1

Why does the print(x) not show "C" as the name of the third element? I did mess 
up the data frame (and this was a mistake on my part), but finding the bug was harder 
because print(x) didn't show the C any longer.

Thanks. With best wishes -

. . . Christian

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat/
.ethz.ch%2Fmailman%2Flistinfo%2Fr-help&data=05%7C01%7Ctebert%40ufl.edu
%7C237aa7be3de54af710be08dbd61056a4%7C0d4da0f84a314d76ace60a62331e1b84
%7C0%7C0%7C638339137898359565%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAw
MDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sda
ta=fgR6iFifXQpRCv0WqIu4S%2Bnctg%2F0v6j7AXftxrfQGPk%3D&reserved=0
PLEASE do read the posting guide
http://www.r/
-project.org%2Fposting-guide.html&data=05%7C01%7Ctebert%40ufl.edu%7C23
7aa7be3de54af710be08dbd61056a4%7C0d4da0f84a314d76ace60a62331e1b84%7C0%
7C0%7C638339137898359565%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiL
CJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=FN
CYM6%2FbpqThk76Zug%2Bm5x8o1Y2S1Z1S0ajAzPePIms%3D&reserved=0
and provide commented, minimal, self-contained, reproducible code.

Hello,

To expand on the good answers already given, I will present two other example 
data sets.

Example 1. Imagine that instead of assigning just one column from y to x$C you 
assign two columns. The result is a data.frame column. See what is displayed as 
the columns names.
And unlike what happens with `[`, when asssigning columns 1:2, the operator 
`[[` doesn't work. You will have to extract the columns y$A and y$B one by one.



x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1, B = 4)
str(y)
#> 'data.frame':1 obs. of  2 variables:
#>  $ A: num 1
#>  $ B: num 4

x$C <- y[1:2]
x
#>   A B C.A C.B
#> 1 1 2   1   4

str(x)
#> 'data.frame':1 obs. of  3 variables:
#>  $ A: num 1
#>  $ B: num 2
#>  $ C:'data.frame':   1 obs. of  2 variables:
#>   ..$ A: num 1
#>   ..$ B: num 4

x[[1:2]]  # doesn't work
#> Error in .subset2(x, i, exact = exact): subscript out of bounds



Example 2. Sometimes it is usefull to get a result like this first and then 
correct the resulting df. For instance, when computing more than one summary 
statistics.

str(agg)  below shows that the result summary stats is a matrix, so you have a 
column-matrix. And once again the displayed names reflect that.

The trick to make the result a df is to extract all but the last column as a 
sub-df, extract the last column's values as a matrix (which it is) and then 
cbind the two together.

cbind is a generic function. Since the first argument to cbind is a sub-df, the 
method called is cbind.data.frame and the result is a df.



df1 <- data.frame(A = rep(c("a", "b", "c"), 5L), X = 1:30)

# the anonymous function computes more than one summary statistics # note that it 
returns a named vector agg <- aggregate(X ~ A, df1, \(x) c(Mean = mean(x), S = 
sd(x))) agg
#>   AX.Mean   X.S
#> 1 a 14.50  9.082951
#> 2 b 15.50  9.082951
#> 3 c 16.50  9.082951

# similar effect as in the OP, The difference is that

Re: [R] Bug in print for data frames?

2023-10-26 Thread Rui Barradas

Às 07:18 de 25/10/2023, Christian Asseburg escreveu:

Hi! I came across this unexpected behaviour in R. First I thought it was a bug in 
the assignment operator <- but now I think it's maybe a bug in the way data 
frames are being printed. What do you think?

Using R 4.3.1:


x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1)
x

   A B C
1 1 2 3

x$B <- y$A # works as expected
x

   A B C
1 1 1 3

x$C <- y[1] # makes C disappear
x

   A B A
1 1 1 1

str(x)

'data.frame':   1 obs. of  3 variables:
  $ A: num 1
  $ B: num 1
  $ C:'data.frame':  1 obs. of  1 variable:
   ..$ A: num 1

Why does the print(x) not show "C" as the name of the third element? I did mess 
up the data frame (and this was a mistake on my part), but finding the bug was harder 
because print(x) didn't show the C any longer.

Thanks. With best wishes -

. . . Christian

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

To expand on the good answers already given, I will present two other 
example data sets.


Example 1. Imagine that instead of assigning just one column from y to 
x$C you assign two columns. The result is a data.frame column. See what 
is displayed as the columns names.
And unlike what happens with `[`, when asssigning columns 1:2, the 
operator `[[` doesn't work. You will have to extract the columns y$A and 
y$B one by one.




x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1, B = 4)
str(y)
#> 'data.frame':1 obs. of  2 variables:
#>  $ A: num 1
#>  $ B: num 4

x$C <- y[1:2]
x
#>   A B C.A C.B
#> 1 1 2   1   4

str(x)
#> 'data.frame':1 obs. of  3 variables:
#>  $ A: num 1
#>  $ B: num 2
#>  $ C:'data.frame':   1 obs. of  2 variables:
#>   ..$ A: num 1
#>   ..$ B: num 4

x[[1:2]]  # doesn't work
#> Error in .subset2(x, i, exact = exact): subscript out of bounds



Example 2. Sometimes it is usefull to get a result like this first and 
then correct the resulting df. For instance, when computing more than 
one summary statistics.


str(agg)  below shows that the result summary stats is a matrix, so you 
have a column-matrix. And once again the displayed names reflect that.


The trick to make the result a df is to extract all but the last column 
as a sub-df, extract the last column's values as a matrix (which it is) 
and then cbind the two together.


cbind is a generic function. Since the first argument to cbind is a 
sub-df, the method called is cbind.data.frame and the result is a df.




df1 <- data.frame(A = rep(c("a", "b", "c"), 5L), X = 1:30)

# the anonymous function computes more than one summary statistics
# note that it returns a named vector
agg <- aggregate(X ~ A, df1, \(x) c(Mean = mean(x), S = sd(x)))
agg
#>   AX.Mean   X.S
#> 1 a 14.50  9.082951
#> 2 b 15.50  9.082951
#> 3 c 16.50  9.082951

# similar effect as in the OP, The difference is that the last
# column is a matrix, not a data.frame
str(agg)
#> 'data.frame':3 obs. of  2 variables:
#>  $ A: chr  "a" "b" "c"
#>  $ X: num [1:3, 1:2] 14.5 15.5 16.5 9.08 9.08 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:2] "Mean" "S"

# nc is just a convenience, avoids repeated calls to ncol
nc <- ncol(agg)
cbind(agg[-nc], agg[[nc]])
#>   A MeanS
#> 1 a 14.5 9.082951
#> 2 b 15.5 9.082951
#> 3 c 16.5 9.082951

# all is well
cbind(agg[-nc], agg[[nc]]) |> str()
#> 'data.frame':3 obs. of  3 variables:
#>  $ A   : chr  "a" "b" "c"
#>  $ Mean: num  14.5 15.5 16.5
#>  $ S   : num  9.08 9.08 9.08



If the anonymous function hadn't returned a named vetor, the new column 
names would have been "1". "2", try it.



Hope this helps,

Rui Barradas



--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] by function does not separate output from function with mulliple parts

2023-10-25 Thread Rui Barradas
----
#> mydata$StepType: Second
#> lm model parameter contrast
#>
#>   Contrast S.E. LowerUpper t df Pr(>|t|)
#> 1   -2.435 1.819421 -6.198759 1.328759 -1.34 23   0.1939


Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] running crossvalidation many times MSE for Lasso regression

2023-10-24 Thread Rui Barradas
  >> >> MSE
       >> >> lst[i]<-MSE
       >> >> }
       >> >> mean(unlist(lst))
       >> >> ##
       >> >>
       >> >>
       >> >>
       >> >>
       >> >> __
       >> >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
       >> >> https://stat.ethz.ch/mailman/listinfo/r-help
       >> >> PLEASE do read the posting guide
       >> http://www.R-project.org/posting-guide.html
       >> >> and provide commented, minimal, self-contained, reproducible code.
       >> >
       >> > __
       >> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
       >> > https://stat.ethz.ch/mailman/listinfo/r-help
       >> > PLEASE do read the posting guide
       >> http://www.R-project.org/posting-guide.html
       >> > and provide commented, minimal, self-contained, reproducible code.
       >>
       >> __
       >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
       >> https://stat.ethz.ch/mailman/listinfo/r-help
       >> PLEASE do read the posting guide
       >> http://www.R-project.org/posting-guide.html
       >> and provide commented, minimal, self-contained, reproducible code.
       >>


       > --
       > Jin
       > --
       > Jin Li, PhD
       > Founder, Data2action, Australia
       > https://www.researchgate.net/profile/Jin_Li32
       > https://scholar.google.com/citations?user=Jeot53EJ&hl=en

       > [[alternative HTML version deleted]]




       > __
       > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
       > https://stat.ethz.ch/mailman/listinfo/r-help
       > PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
       > and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

In your OP, the following two code lines are where that error comes from.


predictLasso=predict(cv_model, newx=test1)

ypred=predict(predictLasso,newdata=test1)



predictLasso already are predictions, it's the output of predict. So 
when you run the 2nd line above you are passing it a matrix, not a 
fitted model, and the error is thrown.


After the several suggestion in this thread, don't you want something 
like this instead of your for loop?



# make the results reproducible
set.seed(2023)
# this is better than what you had
z <- TT[c("x1", "x2")] |> as.matrix()
y <- TT[["y"]]
cv_model <- cv.glmnet(z, y, alpha = 1, type.measure = "mse")
best_lambda <- cv_model$lambda.min
best_lambda

# these two values should be the same, and they are
# index to minimum mse
(i <- cv_model$index[1])
which(cv_model$lambda == cv_model$lambda.min)

# these two values should be the same, and they are
# value of minimum mse
cv_model$cvm[i]
min(cv_model$cvm)

plot(cv_model)



Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Best way to test for numeric digits?

2023-10-18 Thread Rui Barradas

Às 19:35 de 18/10/2023, Leonard Mada escreveu:

Dear Rui,

On 10/18/2023 8:45 PM, Rui Barradas wrote:

split_chem_elements <- function(x, rm.digits = TRUE) {
  regex <- "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])"
  if(rm.digits) {
    stringr::str_replace_all(mol, regex, "#") |>
  strsplit("#|[[:digit:]]") |>
  lapply(\(x) x[nchar(x) > 0L])
  } else {
    strsplit(x, regex, perl = TRUE)
  }
}

split.symbol.character = function(x, rm.digits = TRUE) {
  # Perl is partly broken in R 4.3, but this works:
  regex <- "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])"
  s <- strsplit(x, regex, perl = TRUE)
  if(rm.digits) {
    s <- lapply(s, \(x) x[grep("[[:digit:]]+", x, invert = TRUE)])
  }
  s
}


You have a glitch (mol is hardcoded) in the code of the first function. 
The times are similar, after correcting for that glitch.


Note:
- grep("[[:digit:]]", ...) behaves almost twice as slow as grep("[0-9]", 
...)!

- corrected results below;

Sincerely,

Leonard
###

split_chem_elements <- function(x, rm.digits = TRUE) {
   regex <- "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])"
   if(rm.digits) {
     stringr::str_replace_all(x, regex, "#") |>
   strsplit("#|[[:digit:]]") |>
   lapply(\(x) x[nchar(x) > 0L])
   } else {
     strsplit(x, regex, perl = TRUE)
   }
}

split.symbol.character = function(x, rm.digits = TRUE) {
   # Perl is partly broken in R 4.3, but this works:
   regex <- "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])"
   s <- strsplit(x, regex, perl = TRUE)
   if(rm.digits) {
     s <- lapply(s, \(x) x[grep("[0-9]", x, invert = TRUE)])
   }
   s
}

mol <- c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl")
mol1 <- rep(mol, 1)

system.time(
   split_chem_elements(mol1)
)
#   user  system elapsed
#   0.58    0.00    0.58

system.time(
   split.symbol.character(mol1)
)
#   user  system elapsed
#   0.67    0.00    0.67


Hello,

You are right, sorry for the blunder :(.
In the code below I have replaced stringr::str_replace_all by the 
package stringi function stri_replace_all_regex and the improvement is 
significant.



split_chem_elements <- function(x, rm.digits = TRUE) {
  regex <- "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])"
  if(rm.digits) {
stringi::stri_replace_all_regex(x, "#", regex) |>
  strsplit("#|[0-9]") |>
  lapply(\(x) x[nchar(x) > 0L])
  } else {
strsplit(x, regex, perl = TRUE)
  }
}

# system.time(
#   split_chem_elements(mol1)
# )
#  user  system elapsed
#  0.060.000.09
# system.time(
#   split.symbol.character(mol1)
# )
#  user  system elapsed
#  0.250.000.28



Hope this helps,

Rui Barradas




--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Best way to test for numeric digits?

2023-10-18 Thread Rui Barradas

Às 17:24 de 18/10/2023, Leonard Mada escreveu:

Dear Rui,

Thank you for your reply.

I do have actually access to the chemical symbols: I have started to 
refactor and enhance the Rpdb package, see Rpdb::elements:

https://github.com/discoleo/Rpdb

However, the regex that you have constructed is quite heavy, as it needs 
to iterate through all chemical symbols (in decreasing nchar). Elements 
like C, and especially O, P or S, appear late in the regex expression - 
but are quite common in chemistry.


The alternative regex is (in this respect) simpler. It actually works 
(once you know about the workaround).


Q: My question focused if there is anything like is.numeric, but to 
parse each element of a vector.


Sincerely,


Leonard


On 10/18/2023 6:53 PM, Rui Barradas wrote:

Às 15:59 de 18/10/2023, Leonard Mada via R-help escreveu:

Dear List members,

What is the best way to test for numeric digits?

suppressWarnings(as.double(c("Li", "Na", "K",  "2", "Rb", "Ca", "3")))
# [1] NA NA NA  2 NA NA  3
The above requires the use of the suppressWarnings function. Are there
any better ways?

I was working to extract chemical elements from a formula, something
like this:
split.symbol.character = function(x, rm.digits = TRUE) {
      # Perl is partly broken in R 4.3, but this works:
      regex = 
"(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])";

      # stringi::stri_split(x, regex = regex);
      s = strsplit(x, regex, perl = TRUE);
      if(rm.digits) {
      s = lapply(s, function(s) {
          isNotD = is.na(suppressWarnings(as.numeric(s)));
          s = s[isNotD];
      });
      }
      return(s);
}

split.symbol.character(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"))


Sincerely,


Leonard


Note:
# works:
regex = "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
strsplit(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"), regex, perl = T)


# broken in R 4.3.1
# only slightly "erroneous" with stringi::stri_split
regex = "(?<=[A-Z])(?![a-z]|$)|(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
strsplit(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"), regex, perl = T)

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://eu01.z.antigena.com/l/boS9jwics77ZHEe0yO-Lt8AIDZm9-s6afEH4ulMO3sMyE9mLHNAR603_eeHQG2-_t0N2KsFVQRcldL-XDy~dLMhLtJWX69QR9Y0E8BCSopItW8RqG76PPj7ejTkm7UOsLQcy9PUV0-uTjKs2zeC_oxUOrjaFUWIhk8xuDJWb
PLEASE do read the posting guide
https://eu01.z.antigena.com/l/rUSt2cEKjOO0HrIFcEgHH_NROfU9g5sZ8MaK28fnBl9G6CrCrrQyqd~_vNxLYzQ7Ruvlxfq~P_77QvT1BngSg~NLk7joNyC4dSEagQsiroWozpyhR~tbGOGCRg5cGlOszZLsmq2~w6qHO5T~8b5z8ZBTJkCZ8CBDi5KYD33-OK
and provide commented, minimal, self-contained, reproducible code.

Hello,

If you want to extract chemical elements symbols, the following might 
work.

It uses the periodic table in GitHub package chemr and a package stringr
function.


devtools::install_github("paleolimbot/chemr")



split_chem_elements <- function(x) {
    data(pt, package = "chemr", envir = environment())
    el <- pt$symbol[order(nchar(pt$symbol), decreasing = TRUE)]
    pat <- paste(el, collapse = "|")
    stringr::str_extract_all(x, pat)
}

mol <- c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl")
split_chem_elements(mol)
#> [[1]]
#> [1] "C"  "Cl" "F"
#>
#> [[2]]
#> [1] "Li" "Al" "H"
#>
#> [[3]]
#>  [1] "C"  "Cl" "C"  "O"  "Al" "P"  "O"  "Si" "O"  "Cl"


It is also possible to rewrite the function without calls to non base
packages but that will take some more work.

Hope this helps,

Rui Barradas



Hello,

You and Avi are right, my function's performance is terrible. The 
following is much faster.


As for how to not have digits throw warnings, the lapply in the version 
of your function below solves it by setting grep argument invert = TRUE. 
This will get all strings where digits do not occur.




split_chem_elements <- function(x, rm.digits = TRUE) {
  regex <- "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])"
  if(rm.digits) {
stringr::str_replace_all(mol, regex, "#") |>
  strsplit("#|[[:digit:]]") |>
  lapply(\(x) x[nchar(x) > 0L])
  } else {
strsplit(x, regex, perl = TRUE)
  }
}

split.symbol.character = function(x, rm.digits = TRUE) {
  # Perl is partly broken in R 4.3, but this works:
  regex <- "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])"
  s <- strsplit(x, regex, perl = TRUE)
  if(rm.digits) {
s &l

Re: [R] Best way to test for numeric digits?

2023-10-18 Thread Rui Barradas

Às 15:59 de 18/10/2023, Leonard Mada via R-help escreveu:

Dear List members,

What is the best way to test for numeric digits?

suppressWarnings(as.double(c("Li", "Na", "K",  "2", "Rb", "Ca", "3")))
# [1] NA NA NA  2 NA NA  3
The above requires the use of the suppressWarnings function. Are there 
any better ways?


I was working to extract chemical elements from a formula, something 
like this:

split.symbol.character = function(x, rm.digits = TRUE) {
     # Perl is partly broken in R 4.3, but this works:
     regex = "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
     # stringi::stri_split(x, regex = regex);
     s = strsplit(x, regex, perl = TRUE);
     if(rm.digits) {
     s = lapply(s, function(s) {
         isNotD = is.na(suppressWarnings(as.numeric(s)));
         s = s[isNotD];
     });
     }
     return(s);
}

split.symbol.character(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"))


Sincerely,


Leonard


Note:
# works:
regex = "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
strsplit(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"), regex, perl = T)


# broken in R 4.3.1
# only slightly "erroneous" with stringi::stri_split
regex = "(?<=[A-Z])(?![a-z]|$)|(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
strsplit(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"), regex, perl = T)

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.

Hello,

If you want to extract chemical elements symbols, the following might work.
It uses the periodic table in GitHub package chemr and a package stringr 
function.



devtools::install_github("paleolimbot/chemr")



split_chem_elements <- function(x) {
  data(pt, package = "chemr", envir = environment())
  el <- pt$symbol[order(nchar(pt$symbol), decreasing = TRUE)]
  pat <- paste(el, collapse = "|")
  stringr::str_extract_all(x, pat)
}

mol <- c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl")
split_chem_elements(mol)
#> [[1]]
#> [1] "C"  "Cl" "F"
#>
#> [[2]]
#> [1] "Li" "Al" "H"
#>
#> [[3]]
#>  [1] "C"  "Cl" "C"  "O"  "Al" "P"  "O"  "Si" "O"  "Cl"


It is also possible to rewrite the function without calls to non base 
packages but that will take some more work.


Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] creating a time series

2023-10-16 Thread Rui Barradas

Às 11:12 de 16/10/2023, ahmet varlı escreveu:


Hello everyone,

� had 15 minutes of data from 2017-11-02 13:30:00 to  2022-11-26 23:45:00 and 
number of data is 177647

� would like to ask why my time series are less then my expectation.


baslangic <- as.POSIXct("2017-11-02 13:30:00", tz = "CET")
bitis <- as.POSIXct("2022-11-26 23:45:00", tz = "CET")  #
zaman_seti <- seq.POSIXt(from = baslangic, to = bitis, by = 60 * 15)


length(zaman_seti)
[1] 177642

but it has to be  177647



and secondly � have times in this format ( 2.11.2017 13:30/DD-MM- HH:MM:SS)

su_seviyeleri_data <- as.POSIXct(su_seviyeleri_data$kayit_zaman, format = "%Y-%m-%d 
%H:%M:%S")

I am using this code to change the format but it gives result as Na

How can � solve this problem?

Bests,





[[alternative HTML version deleted]]


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

Given your date format, try


format = "%d.%m.%Y %H:%M"


Test with your date time:



x <- "2.11.2017 13:30"
as.POSIXct(x, format = "%d.%m.%Y %H:%M")
#> [1] "2017-11-02 13:30:00 WET"

as.POSIXct(su_seviyeleri_data$kayit_zaman, format = "%d.%m.%Y %H:%M")


Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] if-else that returns vector

2023-10-12 Thread Rui Barradas

Às 21:22 de 12/10/2023, Christofer Bogaso escreveu:

Hi,

Following expression returns only the first element

ifelse(T, c(1,2,3), c(5,6))

However I am looking for some one-liner expression like above which
will return the entire vector.

Is there any way to achieve this?

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

I don't like it but


ifelse(rep(T, length(c(1,2,3))), c(1,2,3), c(5,6))


maybe you should use


max(length(c(1, 2, 3)), length(5, 6)))


instead, but it's still ugly.

Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Text showing when R is launched

2023-10-11 Thread Rui Barradas

Às 19:21 de 11/10/2023, George Loftus escreveu:

Hi,

Thankyou for your response

<https://1drv.ms/i/s!AkfoLX--ikbqkweYckSQiXYKXJuR>
[https://9c11xq.db.files.1drv.com/y4m7xqt5yVu7b5IG1jFuopunwB7Oa9Eij0WeZ7p1lSSmBECcSIB3XjcKjXIUhdMrJwaJdjZnBRhMeAxY0_Kko06Nq1fm5IhqaHlT6aFeI3R7gicXCteRPkzqNwmCdVxZu5DhNq66IrpwDyQ1lr8E5OFdm_xL86pMgNSLAx5HRRKLPOmFdUFWdv1ID-D1PC6LvNvAB-rT87JiQonSHRJIHouLg?width=200&height=150&cropmode=center]
[https://res-h3.public.cdn.office.net/assets/mail/file-icon/png/cloud_blue_16x16.png]Screenshot
 2023-10-11 at 19.19.48.png
?

However this is all that exists in Users/Admin

There were a couple of R files in there which I have since deleted but I am 
still getting the same issue

Thankyou,
George

________
From: Rui Barradas 
Sent: 10 October 2023 12:06
To: George Loftus ; r-help@r-project.org 

Subject: Re: [R] Text showing when R is launched

Às 23:56 de 09/10/2023, George Loftus escreveu:

Good Evening,

I was wondering if you were able to help, I am running R on MacOS, it is the 
2020 model mac so have install the Intel arm of R which I believe is correct

However when I launch R or resume the R window after going on a different 
programme the following text is running

I have also copied and pasted for ease

1   HIToolbox   0x7ff82142e0c2 
_ZN15MenuBarInstance22RemoveAutoShowObserverEv + 30
2   HIToolbox   0x7ff82146a638 
_ZL17BroadcastInternaljPvh + 167
3   SkyLight0x7ff81c70f23d 
_ZN12_GLOBAL__N_123notify_datagram_handlerEj15CGSDatagramTypePvmS1_ + 1030
4   SkyLight0x7ff81ca2205a 
_ZN21CGSDatagramReadStream26dispatchMainQueueDatagramsEv + 202
5   SkyLight0x7ff81ca21f81 
___ZN21CGSDatagramReadStream15mainQueueWakeupEv_block_invoke + 18
6   libdispatch.dylib   0x7ff8178867fb 
_dispatch_call_block_and_release + 12
7   libdispatch.dylib   0x7ff817887a44 
_dispatch_client_callout + 8
8   libdispatch.dylib   0x7ff8178947b9 
_dispatch_main_queue_drain + 952
9   libdispatch.dylib   0x7ff8178943f3 
_dispatch_main_queue_callback_4CF + 31
10  CoreFoundation  0x7ff817b215f0 
__CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE__ + 9
11  CoreFoundation  0x7ff817ae1b70 __CFRunLoopRun + 2454
12  CoreFoundation  0x7ff817ae0b60 CFRunLoopRunSpecific 
+ 560
13  HIToolbox   0x7ff82142e766 
RunCurrentEventLoopInMode + 292
14  HIToolbox   0x7ff82142e576 
ReceiveNextEventCommon + 679
15  HIToolbox   0x7ff82142e2b3 
_BlockUntilNextEventMatchingListInModeWithFilter + 70
16  AppKit  0x7ff81ac31293 _DPSNextEvent + 909
17  AppKit  0x7ff81ac30114 
-[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] 
+ 1219
18  R   0x000103d60c76 -[RController 
doProcessEvents:] + 166
19  R   0x000103d5b295 -[RController 
handleReadConsole:] + 149
20  R   0x000103d6466f Re_ReadConsole + 175
21  libR.dylib  0x000104442154 R_ReplDLLdo1 + 148
22  R   0x000103d71c47 run_REngineRmainloop 
+ 263
23  R   0x000103d66d5f -[REngine runREPL] + 
143
24  R   0x000103d56718 main + 792
25  dyld0x7ff8176d4310 start + 2432
1   HIToolbox   0x7ff8214a1726 
_ZN15MenuBarInstance22EnsureAutoShowObserverEv + 102
2   HIToolbox   0x7ff82146a638 
_ZL17BroadcastInternaljPvh + 167
3   SkyLight0x7ff81c70f23d 
_ZN12_GLOBAL__N_123notify_datagram_handlerEj15CGSDatagramTypePvmS1_ + 1030
4   SkyLight0x7ff81ca2205a 
_ZN21CGSDatagramReadStream26dispatchMainQueueDatagramsEv + 202
5   SkyLight0x7ff81ca21f81 
___ZN21CGSDatagramReadStream15mainQueueWakeupEv_block_invoke + 18
6   libdispatch.dylib   0x7ff8178867fb 
_dispatch_call_block_and_release + 12
7   libdispatch.dylib   0x7ff817887a44 
_dispatch_client_callout + 8
8   libdispatch.dylib   0x7ff8178947b9 
_dispatch_main_queue_drain + 952
9   libdispatch.dylib   0x7ff8178943f3 
_dispatch_main_queue_callback_4CF + 31
10  CoreFoundation  0x7ff817b215f0 
__CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE__ + 9
11  CoreFoundation  0x7ff817ae1b70 __CFRunLoopRun + 2454
12  CoreFoundation  0x7ff817ae0b60 CFRunLoopRunSpecific 
+ 560

Re: [R] Text showing when R is launched

2023-10-10 Thread Rui Barradas
   0x000103d71c47 run_REngineRmainloop 
+ 263
23  R   0x000103d66d5f -[REngine runREPL] + 
143
24  R   0x000103d56718 main + 792
25  dyld0x7ff8176d4310 start + 2432

Are you able to inform me what is causing this? I can't seem to find any online 
help regarding this

Thankyou in advance,
George Loftus


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

Try deleting file

/Users/admin/.RData


It is restoring the previous session and this is many times a source for 
problems.


Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Is it possible to get a downward pointing solid triangle plotting symbol in R?

2023-10-06 Thread Rui Barradas

Às 10:09 de 06/10/2023, Chris Evans via R-help escreveu:
The reason I am asking is that I would like to mark areas on a plot 
using geom_polygon() and aes(fill = variable) to fill various polygons 
forming the background of a plot with different colours. Then I would 
like to overlay that with points representing direction of change: 
improved, no reliable change, deteriorated. The obvious symbols to use 
for those three directions are an upward arrow, a circle or square and a 
downward pointing arrow.  There is a solid upward point triangle symbol 
in R (ph = 17) and there are both upward and downward pointing open 
triangle symbols (pch 21 and 25) but to fill those with a solid colour 
so they will be visible over the background requires that I use a fill 
aesthetic and that gets me a mess with the legend as I will have used a 
different fill mapping to fill the polygons.  This silly reprex shows 
the issue I think.


library(tidyverse)
tibble(x = 2:9, y = 2:9, c = c(rep("A", 5), rep("B", 3))) -> tmpTibPoints
tibble(x = c(1, 5, 5, 1), y = c(1, 1, 5, 5), a = rep("a", 4)) -> 
tmpTibArea1
tibble(x = c(5, 10, 10, 5), y = c(1, 1, 5, 5), a = rep("b", 4)) -> 
tmpTibArea2
tibble(x = c(1, 5, 5, 1), y = c(5, 5, 10, 10), a = rep("c", 4)) -> 
tmpTibArea3
tibble(x = c(5, 10, 10, 5), y = c(5, 5, 10, 10), a = rep("d", 4)) -> 
tmpTibArea4

bind_rows(tmpTibArea1,
   tmpTibArea2,
   tmpTibArea3,
   tmpTibArea4) -> tmpTibAreas
ggplot(data = tmpTib,
    aes(x = x, y = y)) +
   geom_polygon(data = tmpTibAreas,
    aes(x = x, y = y, fill = a)) +
   geom_point(data = tmpTibPoints,
  aes(x = x, y = y, fill = c),
  pch = 24,
  size = 6)

Does anyone know a way to create a solid downward pointing symbol?  Or 
another workaround?


TIA,

Chris


Hello,

Maybe you can solve the problem with unicode characters.
See the two scale_*_manual at the end of the plot.



# Unicode characters for black up- and down-pointing characters
pts_shapes <- c("\U25B2", "\U25BC") |> setNames(c("A", "B"))
pts_colors <- c("blue", "red") |> setNames(c("A", "B"))

ggplot(data = tmpTibAreas,
   aes(x = x, y = y)) +
  geom_polygon(data = tmpTibAreas,
   aes(x = x, y = y, fill = a)) +
  geom_point(data = tmpTibPoints,
 aes(x = x, y = y, color = c, shape = c),
 size = 6) +
  scale_shape_manual(values = pts_shapes) +
  scale_color_manual(values = pts_colors)




--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R issue / No buffer space available

2023-10-05 Thread Rui Barradas

Às 21:28 de 04/10/2023, Ohad Oren, MD escreveu:

Hello,

I keep getting the following message about 'no buffer space available'. I
am using R studio via connection to server. I verified that the connection
to the server is good.

2023-10-04T20:26:25.698193Z [rsession-oo968] ERROR system error 105
(No buffer space available) [host: localhost, uri: /log_message, path:
/var/run/rstudio-server/rstudio-rserver/rserver-monitor.socket];
OCCURRED AT void
rstudio::core::http::LocalStreamAsyncClient::handleConnect(const
rstudio_boost::system::error_code&)
src/cpp/session/SessionModuleContext.cpp:124


Will appreciate your help!

Ohad

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

RStudio is an IDE for R, not R itself.
That is a RStudio error and RStudio technical support [1] is better 
suited to solve your problem.


[1] https://community.rstudio.com/

Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] annotate

2023-10-05 Thread Rui Barradas

Às 20:34 de 04/10/2023, Subia Thomas OI-US-LIV5 escreveu:

Colleagues,

I wish to create y-data labels which meet a criterion.

Here is my reproducible code.
library(dplyr)
library(ggplot2)
library(cowplot)

above_92 <- filter(faithful,waiting>92)

ggplot(faithful,aes(x=eruptions,y=waiting))+
   geom_point(shape=21,size=3,fill="orange")+
   theme_cowplot()+
   geom_hline(yintercept = 92)+
   
annotate(geom="text",x=above_92$eruptions,y=above_92$waiting+2,label=above_92$waiting)

A bit of trial and error is required to figure out what number to add or 
subtract to above_92$waiting.

Is there a more efficient way to do this?


Thomas Subia
Lean Six Sigma Senior Practitioner

DRÄXLMAIER Group
DAA Draexlmaier Automotive of America LLC

mailto:thomas.su...@draexlmaier.com
http://www.draexlmaier.com

"Nous croyons en Dieu.
Tous les autres doivent apporter des données.
Edward Deming


Public: All rights reserved. Distribution to third parties allowed.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hdello,

Yes, there is an automatic way of doing this.
Use a new data set in geom_text or annotate. Below I use geom_text.
Then vjust will take care of the labels placement.




library(dplyr)
library(ggplot2)
library(cowplot)

above_92 <- filter(faithful, waiting > 92)

ggplot(faithful, aes(x = eruptions, y = waiting)) +
  geom_point(shape=21,size=3,fill="orange") +
  geom_hline(yintercept = 92) +
  # use a new data argument here
  geom_text(
data = above_92,
mapping = aes(x = eruptions, y = waiting, label = waiting),
vjust = -1
  ) +
  theme_cowplot()




Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Jim Lemon RIP

2023-10-04 Thread Rui Barradas



My sympathies for your loss.
Jim Lemon was a dedicated contributor to the R community and his answers 
were always welcome.

Jim will be missed.

Rui Barradas

Às 23:36 de 04/10/2023, Jim Lemon escreveu:

Hello,
I am very sad to let you know that my husband Jim died on 18th September. I
apologise for not letting you know earlier but I had trouble finding the
password for his phone.
Kind regards,
Juel






--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Grouping by Date and showing count of failures by date

2023-09-30 Thread Rui Barradas

Às 21:29 de 29/09/2023, Paul Bernal escreveu:

Dear friends,

Hope you are doing great. I am attaching the dataset I am working with
because, when I tried to dput() it, I was not able to copy the entire
result from dput(), so I apologize in advance for that.

I am interested in creating a column named Failure_Date_Period that has the
FAILDATE but formatted as _MM. Then I want to count the number of
failures (given by column WONUM) and just have a dataframe that has the
FAILDATE and the count of WONUM.

I tried this:
pt <- PivotTable$new()
pt$addData(failuredf)
pt$addColumnDataGroups("FAILDATE")
pt <- PivotTable$new()
pt$addData(failuredf)
pt$addColumnDataGroups("FAILDATE")
pt$defineCalculation(calculationName = "FailCounts",
summariseExpression="n()")
pt$renderPivot()

but I was not successful. Bottom line, I need to create a new dataframe
that has the number of failures by FAILDATE, but in -MM format.

Any help and/or guidance will be greatly appreciated.

Kind regards,
Paul
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

No data is attached. Maybe try

dput(head(failuredf, 30))

?

And where can we find non-base PivotTable? Please start the scripts with 
calls to library() when using non-base functionality.


Hope this helps,

Rui Barradas


--
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] predict function type class vs. prob

2023-09-23 Thread Rui Barradas

Às 11:12 de 22/09/2023, Milbert, Sabine (LGL) escreveu:

Dear R Help Team,

My research group and I use R scripts for our multivariate data screening 
routines. During routine use, we encountered some inconsistencies within the 
predict() function of the R Stats Package. Through internal research, we were 
unable to find the reason for this and have decided to contact your help team 
with the following issue:

The predict() function is used once to predict the class membership of a new sample (type = 
"class") on a trained linear SVM model for distinguishing two classes (using the caret 
package). It is then used to also examine the probability of class membership (type = 
"prob"). Both are then presented in an R shiny output. Within the routine, we noticed two 
samples (out of 100+) where the class prediction and probability prediction did not match. The 
prediction probabilities of one class (52%) did not match the class membership within the predict 
function. We use the same seed and the discrepancy is reproducible in this sample. The same problem 
did not occur in other trained models (lda, random forest, radial SVM...).

Is there a weighing of classes within the prediction function or is the 
classification limit not at 50%/a majority vote? Or do you have another 
explanation for this discrepancy, please let us know.

PS: If this is an issue based on the model training function of the caret 
package and therefore not your responsibility, please let us know.

Thank you in advance for your support!

Yours sincerely,
Sabine Milbert

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

I cannot tell what is going on but I would like to make a correction to 
your post.


predict() is a generic function with methods for objects of several 
classes in many packages. In base package stats you will find methods 
for objects (fits) of class lm, glm and others, see ?predict.


The method you are asking about is predict.train, defined in package 
caret, not in package stats.

to see what predict method is being called, check


class(your_fit)


Hope this helps,

Rui Barradas

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Hadamard transformation

2023-09-18 Thread Rui Barradas

Às 18:45 de 18/09/2023, mohan radhakrishnan escreveu:

Hello,

I am attempting to port the R code which is an answer to
https://codegolf.stackexchange.com/questions/194229/implement-the-2d-hadamard-transform


function(M){for(i in 1:log2(nrow(M)))T=T%x%matrix(1-2*!3:0,2)/2; print(T);
T%*%M%*%T}

The code, 3 inputs and the corresponding outputs are shown in
https://tio.run/##PYyxCsIwFEX3fkUcAu@VV7WvcSl2dOwi8QNqNSXQJhAqrYjfHoOIwz3D4XBDNOJYiGgerp@td9Diy/gAVlgnynr0A4MLfkkeUTdarnLq5mBXKAvON1W9J8YdZ1rmsk3T72jgV/TAVBHTAROYrs/00@jz5YSY/aOSFKmvGP1yD9sk4Wa7ARSSRowf

These are the inputs.

f(matrix(c(2,3,2,5),2,2,byrow=TRUE))
f(matrix(1,4,4))
f(lower.tri(diag(4),T))

My attempt to port this R code to another framework(Tensorflow) was only
partially successful
because I didn't fully understand the cryptic R code. The second input
shown above works after
hacking Tensorflow for a long time.

My question is this. Can anyone code this in a clear way so that I can
understand ? I understand
Kronecker Product and matrix multiplication and can port that code but I am
missing something as the same ported code does not work for all inputs.

Thanks,
Mohan

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

Is this what you want?
(I have changed the notation a bit.)


H <- function(M){
  H0 <- 1
  Transf <- matrix(c(1, 1, 1, -1), 2L)
  for(i in 1:log2(nrow(M))) {
H0 <- H0 %x% Transf/2
  }
  H0 %*% M %*% H0
}

x <- matrix(c(2, 3, 2, 5), 2, 2, byrow = TRUE)
y <- matrix(1, 4, 4)
z <- lower.tri(diag(4), TRUE)
z[] <- apply(z, 2, as.integer)
H(x)
H(y)
H(z)



Hope this helps,

Rui Barradas

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help with plotting and date-times for climate data

2023-09-12 Thread Rui Barradas

Às 21:50 de 12/09/2023, Kevin Zembower via R-help escreveu:

Hello,

I'm trying to calculate the mean temperature max from a file of climate
date, and plot it over a range of days in the year. I've downloaded the
data, and cleaned it up the way I think it should be. However, when I
plot it, the geom_smooth line doesn't show up. I think that's because
my x axis is characters or factors. Here's what I have so far:

library(tidyverse)

data <- read_csv("Ely_MN_Weather.csv")

start_day = yday(as_date("2023-09-22"))
end_day = yday(as_date("2023-10-15"))

d <- as_tibble(data) %>%

 select(DATE,TMAX,TMIN) %>%
 mutate(DATE = as_date(DATE),
yday = yday(DATE),
md = sprintf("%02d-%02d", month(DATE), mday(DATE))
) %>%
 filter(yday >= start_day & yday <= end_day) %>%
 mutate(md = as.factor(md))

d_sum <- d %>%
 group_by(md) %>%
 summarize(tmax_mean = mean(TMAX, na.rm=TRUE))

## Here's the filtered data:
dput(d_sum)


structure(list(md = structure(1:25, levels = c("09-21", "09-22",

"09-23", "09-24", "09-25", "09-26", "09-27", "09-28", "09-29",
"09-30", "10-01", "10-02", "10-03", "10-04", "10-05", "10-06",
"10-07", "10-08", "10-09", "10-10", "10-11", "10-12", "10-13",
"10-14", "10-15"), class = "factor"), tmax_mean = c(65,
62.2,
61.3, 63.9, 64.3, 60.1, 62.3, 60.5, 61.9,
61.2, 63.7, 59.5, 59.6, 61.6,
59.4, 58.8, 55.9, 58.125,
58, 55.7, 57, 55.4, 49.8,
48.75, 43.7)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -25L))



ggplot(data = d_sum, aes(x = md)) +
 geom_point(aes(y = tmax_mean, color = "blue")) +
 geom_smooth(aes(y = tmax_mean, color = "blue"))
=
My questions are:
1. Why isn't my geom_smooth plotting? How can I fix it?
2. I don't think I'm handling the month and day combination correctly.
Is there a way to encode month and day (but not year) as a date?
3. (Minor point) Why does my graph of tmax_mean come out red when I
specify "blue"?

Thanks for any advice or guidance you can offer. I really appreciate
the expertise of this group.

-Kevin

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

The problem is that the dates are factors, not real dates. And 
geom_smooth is not interpolating along a discrete axis (the x axis).


Paste a fake year with md, coerce to date and plot.
I have simplified the aes() calls and added a date scale in order to 
make the x axis more readable.


Without the formula and method arguments, geom_smooth will print a 
message, they are now made explicit.




suppressPackageStartupMessages({
  library(dplyr)
  library(ggplot2)
})

d_sum %>%
  mutate(md = paste("2023", md, sep = "-"),
 md = as.Date(md)) %>%
  ggplot(aes(x = md, y = tmax_mean)) +
  geom_point(color = "blue") +
  geom_smooth(
formula = y ~ x,
method = loess,
color = "blue"
  ) +
  scale_x_date(date_breaks = "7 days", date_labels = "%m-%d")



Hope this helps,

Rui Barradas

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] graph in R with grouping letters from the turkey test with agricolae package

2023-09-12 Thread Rui Barradas

Às 16:24 de 12/09/2023, Loop Vinyl escreveu:

I would like to produce the attached graph (graph1) with the R package
agricolae, could someone give me an example with the attached data (data)?

I expect an adapted graph (graph2) with the data (data)

Best regards


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

There are no attached graphs, only data.
Can you post the code have you tried?

Hope this helps,

Rui Barradas

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] prop.trend.test

2023-09-08 Thread Rui Barradas

Às 10:06 de 08/09/2023, peter dalgaard escreveu:

Yes, this was written a bit bone-headed (as I am allowed to say...)

If you look at the code, you will see inside:

 a <- anova(lm(freq ~ score, data = list(freq = x/n, score = 
as.vector(score)),
 weights = w))

and the lm() inside should give you the direction via the sign of the regression 
coefficient on "score".
  
So, at least for now, you could just doctor a copy of the code for your own purposes, as in


  fit <- lm(freq ~ score, data = list(freq = x/n, score = as.vector(score)),
 weights = w)
  a <- anova(fit)
  
and arrange to return coef(fit)["score"] at the end. Something like structure(... estimate=c(lpm.slope=coef(fit)["score"]) )


(I expect that you might also extract the t-statistic from coef(summary(fit)) 
and find that it is the signed square root of the Chi-square, but I won't have 
time to test that just now.)

-pd


On 8 Sep 2023, at 07:22 , Thomas Subia via R-help  wrote:

Colleagues,

Thanks all for the responses.

I am monitoring the daily total number of defects per sample unit.
I need to know whether this daily defect proportion is trending upward (a bad 
thing for a manufacturing process).

My first thought was to use either a u or a u' control chart for this.
As far as I know, u or u' charts are poor to detect drifts.

This is why I chose to use prop.trend.test to detect trends in proportions.

While prop.trend.test can confirm the existence of a trend, as far as I know, 
it is left to the user
to determine what direction that trend is.

One way to illustrate trending is of course to plot the data and use 
geom_smooth and method lm
For the non-statisticians in my group, I've found that using this method along 
with the p-value of prop.trend.test, makes it easier for the users to determine 
the existence of trending and its direction.

If there are any other ways to do this, please let me know.

Thomas Subia












On Thursday, September 7, 2023 at 10:31:27 AM PDT, Rui Barradas 
 wrote:





Às 14:23 de 07/09/2023, Thomas Subia via R-help escreveu:


Colleagues

Consider
smokers  <- c( 83, 90, 129, 70 )
patients <- c( 86, 93, 136, 82 )

prop.trend.test(smokers, patients)

Output:

Chi-squared Test for Trend inProportions

data:  smokers out of patients ,

using scores: 1 2 3 4

X-squared = 8.2249, df = 1, p-value = 0.004132

# trend test for proportions indicates proportions aretrending.

How does one identify the direction of trending?
# prop.test indicates that the proportions are unequal but doeslittle to 
indicate trend direction.
All the best,
Thomas Subia


 [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

By visual inspection it seems that there is a decreasing trend.
Note that the sample estimates of prop.test and smokers/patients are equal.


smokers  <- c( 83, 90, 129, 70 )
patients <- c( 86, 93, 136, 82 )

prop.test(smokers, patients)$estimate
#>prop 1prop 2prop 3prop 4
#> 0.9651163 0.9677419 0.9485294 0.8536585

smokers/patients

#> [1] 0.9651163 0.9677419 0.9485294 0.8536585

plot(smokers/patients, type = "b")



Hope this helps,

Rui Barradas

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Hello,

Actually, the t-statistic is not the signed square root of the X-squared 
test statistic. I have edited the function, assigned the lm fit and 
returned it as is. (print.htest won't print this new list member so the 
output is not cluttered with irrelevant noise.)



smokers  <- c( 83, 90, 129, 70 )
patients <- c( 86, 93, 136, 82 )

edit(prop.trend.test, file = "ptt.R")
source("ptt.R")

# stats::prop.trend.test edited to include the results
# of the lm fit and saved under a new name
ptt <- function (x, n, score = seq_along(x))
{
  method <- "Chi-squared Test for Trend in Proportions"
  dname <- paste(deparse1(substitute(x)), "out of", 
deparse1(substitute(n)),

 ",\n using scores:", paste(score, collapse = " "))
  x <- as.vector(x)
  n <- as.vector(n)
  p <- sum(x)/sum(n)
  w <- n/p/(1 - p)
  a <- anova(fit <- lm(freq ~ score, data = list(freq = x/n, score = 
as.vector(score)),

   weights = w))
  chisq <- c(`X-squared` = a["score", "Sum Sq"])
  s

Re: [R] prop.trend.test

2023-09-07 Thread Rui Barradas

Às 14:23 de 07/09/2023, Thomas Subia via R-help escreveu:


Colleagues

  Consider
smokers  <- c( 83, 90, 129, 70 )
patients <- c( 86, 93, 136, 82 )

  prop.trend.test(smokers, patients)

  Output:

  Chi-squared Test for Trend inProportions

  data:  smokers out of patients ,

using scores: 1 2 3 4

X-squared = 8.2249, df = 1, p-value = 0.004132

  # trend test for proportions indicates proportions aretrending.

  How does one identify the direction of trending?
  # prop.test indicates that the proportions are unequal but doeslittle to 
indicate trend direction.
All the best,
Thomas Subia


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

By visual inspection it seems that there is a decreasing trend.
Note that the sample estimates of prop.test and smokers/patients are equal.


smokers  <- c( 83, 90, 129, 70 )
patients <- c( 86, 93, 136, 82 )

prop.test(smokers, patients)$estimate
#>prop 1prop 2prop 3prop 4
#> 0.9651163 0.9677419 0.9485294 0.8536585
smokers/patients
#> [1] 0.9651163 0.9677419 0.9485294 0.8536585

plot(smokers/patients, type = "b")



Hope this helps,

Rui Barradas

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Regarding error in RStudio

2023-09-05 Thread Rui Barradas

Às 17:59 de 05/09/2023, Sukriti Sood escreveu:

Hi,

I am Sukriti Sood, a research analyst at Woodstock Institute 
<https://woodstockinst.org/> . I use RStudio extensively for our analysis. I 
have been facing two issues for a while:


   1.  I am unable to copy from RStudio and paste into or vice versa to any 
other programs.
   2.  I am facing some kind of a conversion error (screenshot attached).

I tried looking up online however could not find a resolution to these issues. 
Could I please get some help with this urgently.

Thanks!

Best,
Sukriti Sood

Sukriti Sood | Research Analyst
Woodstock Institute
Pronouns: She/Her/Hers
67 East Madison, Suite 2108 | Chicago, Illinois 60603
O (312) 368-0310 x2029 | C (610) 604-6708
www.woodstockinst.org<http://www.woodstockinst.org/> | 
ss...@woodstockinst.org<mailto:ss...@woodstockinst.org>




__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

You should post RStudio questions to the RStudio support service, they 
answer quickly and the answers are generally good.


It's written at the bottom of the attached image that the workspace was 
loaded from file



C:/WSI/.RData


Close RStudio, remove this file and restart. See if it solved it.

Hope this helps,

Rui Barradas

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Merge and replace data

2023-09-05 Thread Rui Barradas

Às 09:55 de 05/09/2023, roslinazairimah zakaria escreveu:

Hi all,

I have these data

x1 <- c(116,0,115,137,127,0,0)
x2 <- c(0,159,0,0,0,159,127)

I want : xx <- c(116,115,137,127,159, 127)

I would like to merge these data into one column. Whenever the data is '0'
it will be replaced by the value in the column which is non zero..
I tried append and merge but fail to get what I want.


Hello,

That's a case for ?pmax:


x1 <- c(116,0,115,137,127,0,0)
x2 <- c(0,159,0,0,0,159,127)
pmax(x1, x2)
#> [1] 116 159 115 137 127 159 127


Hope this helps,

Rui Barradas

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] aggregate formula - differing results

2023-09-04 Thread Rui Barradas

Às 12:51 de 04/09/2023, Ivan Calandra escreveu:

Thanks Rui for your help; that would be one possibility indeed.

But am I the only one who finds that behavior of aggregate() completely 
unexpected and confusing? Especially considering that dplyr::summarise() 
and doBy::summaryBy() deal with NAs differently, even though they all 
use mean(na.rm = TRUE) to calculate the group stats.


Best wishes,
Ivan

On 04/09/2023 13:46, Rui Barradas wrote:

Às 10:44 de 04/09/2023, Ivan Calandra escreveu:

Dear useRs,

I have just stumbled across a behavior in aggregate() that I cannot 
explain. Any help would be appreciated!


Sample data:
my_data <- structure(list(ID = c("FLINT-1", "FLINT-10", "FLINT-100", 
"FLINT-101", "FLINT-102", "HORN-10", "HORN-100", "HORN-102", 
"HORN-103", "HORN-104"), EdgeLength = c(130.75, 168.77, 142.79, 
130.1, 140.41, 121.37, 70.52, 122.3, 71.01, 104.5), SurfaceArea = 
c(1736.87, 1571.83, 1656.46, 1247.18, 1177.47, 1169.26, 444.61, 
1791.48, 461.15, 1127.2), Length = c(44.384, 29.831, 43.869, 48.011, 
54.109, 41.742, 23.854, 32.075, 21.337, 35.459), Width = c(45.982, 
67.303, 52.679, 26.42, 25.149, 33.427, 20.683, 62.783, 26.417, 
35.297), PLATWIDTH = c(38.84, NA, 15.33, 30.37, 11.44, 14.88, 13.86, 
NA, NA, 26.71), PLATTHICK = c(8.67, NA, 7.99, 11.69, 3.3, 16.52, 
4.58, NA, NA, 9.35), EPA = c(78, NA, 78, 54, 72, 49, 56, NA, NA, 56), 
THICKNESS = c(10.97, NA, 9.36, 6.4, 5.89, 11.05, 4.9, NA, NA, 10.08), 
WEIGHT = c(34.3, NA, 25.5, 18.6, 14.9, 29.5, 4.5, NA, NA, 23), RAWMAT 
= c("FLINT", "FLINT", "FLINT", "FLINT", "FLINT", "HORNFELS", 
"HORNFELS", "HORNFELS", "HORNFELS", "HORNFELS")), row.names = c(1L, 
2L, 3L, 4L, 5L, 111L, 112L, 113L, 114L, 115L), class = "data.frame")


1) Simple aggregation with 2 variables:
aggregate(cbind(Length, Width) ~ RAWMAT, data = my_data, FUN = mean, 
na.rm = TRUE)


2) Using the dot notation - different results:
aggregate(. ~ RAWMAT, data = my_data[-1], FUN = mean, na.rm = TRUE)

3) Using dplyr, I get the same results as #1:
group_by(my_data, RAWMAT) %>%
   summarise(across(c("Length", "Width"), ~ mean(.x, na.rm = TRUE)))

4) It gets weirder: using all columns in #1 give the same results as 
in #2 but different from #1 and #3
aggregate(cbind(EdgeLength, SurfaceArea, Length, Width, PLATWIDTH, 
PLATTHICK, EPA, THICKNESS, WEIGHT) ~ RAWMAT, data = my_data, FUN = 
mean, na.rm = TRUE)


So it seems it is not only due to the notation (cbind() vs. dot). Is 
it a bug? A peculiar thing in my dataset? I tend to think this could 
be due to some variables (or their names) as all notations seem to 
agree when I remove some variables (although I haven't found out 
which variable(s) is (are) at fault), e.g.:


my_data2 <- structure(list(ID = c("FLINT-1", "FLINT-10", "FLINT-100", 
"FLINT-101", "FLINT-102", "HORN-10", "HORN-100", "HORN-102", 
"HORN-103", "HORN-104"), EdgeLength = c(130.75, 168.77, 142.79, 
130.1, 140.41, 121.37, 70.52, 122.3, 71.01, 104.5), SurfaceArea = 
c(1736.87, 1571.83, 1656.46, 1247.18, 1177.47, 1169.26, 444.61, 
1791.48, 461.15, 1127.2), Length = c(44.384, 29.831, 43.869, 48.011, 
54.109, 41.742, 23.854, 32.075, 21.337, 35.459), Width = c(45.982, 
67.303, 52.679, 26.42, 25.149, 33.427, 20.683, 62.783, 26.417, 
35.297), RAWMAT = c("FLINT", "FLINT", "FLINT", "FLINT", "FLINT", 
"HORNFELS", "HORNFELS", "HORNFELS", "HORNFELS", "HORNFELS")), 
row.names = c(1L, 2L, 3L, 4L, 5L, 111L, 112L, 113L, 114L, 115L), 
class = "data.frame")


aggregate(cbind(EdgeLength, SurfaceArea, Length, Width) ~ RAWMAT, 
data = my_data2, FUN = mean, na.rm = TRUE)


aggregate(. ~ RAWMAT, data = my_data2[-1], FUN = mean, na.rm = TRUE)

group_by(my_data2, RAWMAT) %>%
   summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))


Thank you in advance for any hint.
Best wishes,
Ivan




 *LEIBNIZ-ZENTRUM*
*FÜR ARCHÄOLOGIE*

*Dr. Ivan CALANDRA*
**Head of IMPALA (IMaging Platform At LeizA)

*MONREPOS* Archaeological Research Centre, Schloss Monrepos
56567 Neuwied, Germany

T: +49 2631 9772 243
T: +49 6131 8885 543
ivan.calan...@leiza.de

leiza.de <http://www.leiza.de/>
<http://www.leiza.de/>
ORCID <https://orcid.org/-0003-3816-6359>
ResearchGate
<https://www.researchgate.net/profile/Ivan_Calandra>

LEIZA is a foundation under public law of the State of 
Rhineland-Palatinate and the City of Mainz. Its headquarters are in 
Mainz. Supervision is carried out by the Ministry of Science and 
Health of the State of Rhineland-Palatinate. LEIZA is a research 
museum of the Leibniz Association.

_

Re: [R] aggregate formula - differing results

2023-09-04 Thread Rui Barradas
A vals in at least one column and the results are the same.


However, this will not give the mean values of the other numeric 
columns, just of those two.




# define a vector of columns of interest
cols <- c("Length", "Width", "RAWMAT")

# 1) Simple aggregation with 2 variables, select cols:
aggregate(cbind(Length, Width) ~ RAWMAT, data = my_data[cols], FUN = 
mean, na.rm = TRUE)


# 2) Using the dot notation - if cols are selected, equal results:
aggregate(. ~ RAWMAT, data = my_data[cols], FUN = mean, na.rm = TRUE)

# 3) Using dplyr, the results are now the same results as #1 and #2:
my_data %>%
  select(all_of(cols)) %>%
  group_by(RAWMAT) %>%
  summarise(across(c("Length", "Width"), ~ mean(.x, na.rm = TRUE)))


Hope this helps,

Rui Barradas

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


  1   2   3   4   5   6   7   8   9   10   >