Re: [R] question

2023-01-30 Thread PIKAL Petr
Hello Carolyn,

From what you describe you cannot calculate correlations.

You stated that you have two sets of data, one for December and one for
March, that rows in one set are not related to rows in the other set, and
that even the persons tested in both months do not have their values on the
same row. In that case cor is not appropriate. You should first rearrange
your data so that the results of those 3 persons are on the same row, but
even after that only those 3 pairs of values could be evaluated by "cor".
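
A minimal sketch of that rearrangement, assuming a hypothetical data frame
with columns id (individual), season and cort. These names and all values
are invented for illustration and are not taken from Carolyn's file.

```r
# Hypothetical long-format data: one row per sample, not per individual.
samples <- data.frame(
  id     = c("A", "B", "C", "D", "A", "B", "C", "E"),
  season = c(rep("Dec", 4), rep("Mar", 4)),
  cort   = c(1.2, 2.5, 3.1, 0.9, 1.5, 2.9, 3.0, 2.2)
)

# Split by season, then merge on the individual so that the two values
# of the same person end up on the same row (inner join by default).
dec <- subset(samples, season == "Dec", select = c(id, cort))
mar <- subset(samples, season == "Mar", select = c(id, cort))
paired <- merge(dec, mar, by = "id", suffixes = c("_dec", "_mar"))

# Only individuals measured in both seasons remain (A, B and C here),
# so the correlation rests on nrow(paired) pairs and nothing else.
n_pairs <- nrow(paired)
r <- cor(paired$cort_dec, paired$cort_mar)
```

With only 3 pairs the resulting coefficient is very unstable, so it says
little about the population.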

From what you wrote I think that t.test or a similar beast is the way you
should go.
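
A hedged sketch of what such a comparison could look like, assuming the
March and December values are treated as two independent groups. The
vectors below are made up; they only stand in for the two cortisol columns
with their missing values removed.

```r
# Hypothetical, unpaired cortisol values for the two months.
dec_cort <- c(1.2, 2.5, 3.1, 0.9, 1.8)
mar_cort <- c(1.5, 2.9, 3.0, 2.2, 2.7, 3.4)

# Welch two-sample t-test (the default of t.test): compares the means of
# two independent groups and does not need values matched row by row.
res <- t.test(mar_cort, dec_cort)
res$p.value
```

Whether a t-test is adequate depends on the distribution of the data; a
rank-based alternative such as wilcox.test takes the same two vectors.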

But without some sample data I may be wrong.

Cheers
Petr

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question

2023-01-30 Thread Ebert,Timothy Aaron
Can you please show us a small sample of your data? The first 5 or 10 lines 
should be good enough.
Tim



[R] question

2023-01-30 Thread Carolyn J Miller via R-help
Hi guys,

I am using the cor() function to see if there are correlations between March 
cortisol levels and December cortisol levels and I'm trying to figure out if 
the function is doing what I want it to do.

Each sample has its own separate row in the CSV file that I'm working out of. 
March Cort and December Cort are different columns and they come from separate 
samples, therefore their values would not be on the same row. There are only 3 
individuals that have both December cort values and March cortisol values but 
they still have different sample ID values (from different seasons) so they are 
also not on the same row.

I ran the function twice: first as cor(cortphcor, use = "complete.obs")

and then as cor(cortphcor, use = "pairwise.complete.obs", method = "pearson").

I received the same output both times. I guess what I'm asking is: is the 
output simply the correlation for just those 3 samples, or is the second, 
pairwise.complete.obs version giving me the correlation for all of the cort 
samples for March against all of the samples for December, despite their not 
being on the same row? I'm trying to figure out how many sample values are 
contributing to the correlation results I'm getting.
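
One way to check this directly is to count the rows in which both columns
are non-missing, since those are the only rows cor() can use for that pair
of variables. The sketch below uses a hypothetical data frame with made-up
column names and values that mimic the layout described above; it is not
the real cortphcor object.

```r
# Hypothetical layout: each row is one sample, so a row holds a March
# value or a December value, never both.
cort <- data.frame(
  march    = c(1.5, 2.9, 3.0, NA, NA, NA),
  december = c(NA, NA, NA, 1.2, 2.5, 3.1)
)

# Rows where both values are present: the observations that would
# contribute to the March/December correlation.
n_both <- sum(complete.cases(cort$march, cort$december))
n_both

# Per ?cor, "pairwise.complete.obs" yields an NA entry for a pair of
# variables that has no complete pairs at all.
r <- cor(cort$march, cort$december, use = "pairwise.complete.obs")
```

In this layout no row has both values, so nothing is paired; cor() never
matches values across different rows.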

Thanks,

Carolyn




[R] Covid Mutations: Cumulative?

2023-01-30 Thread Leonard Mada via R-help

Dear R-Users,

Has anyone been following the SARS-CoV-2 lineages more closely?

I have done a quick check of the SARS-CoV-2 mutations in the list downloaded 
from NCBI (see GitHub page below); it seems that the list contains the 
cumulative mutations only for B.1 => B.1.1, but not beyond the B.1.1 branch:

# B.1 => B.1.1 seems cumulative
diff.lineage("B.1.1", "B.1", data=z)
# but B.1.1 => B.1.1.529 is NOT cumulative anymore;
diff.lineage("B.1.1.529", "B.1.1", data=z)
diff.lineage("B.1.1.529", "BA.2", data=z)
diff.lineage("B.1.1.529", "BA.5", data=z)

# Column id: B(oth) = present in both lineages:
        V   Mutation    P    AA Pos AAi AAm Polymorphism id
899 B.1.1 nsp3:F106F nsp3 F106F 106   F   F         TRUE  B
900 B.1.1 RdRp:P323L RdRp P323L 323   P   L        FALSE  B
901 B.1.1    S:D614G    S D614G 614   D   G        FALSE  B
902 B.1.1    N:R203K    N R203K 203   R   K        FALSE  1
903 B.1.1    N:R203R    N R203R 203   R   R         TRUE  1
904 B.1.1    N:G204R    N G204R 204   G   R        FALSE  1
896   B.1 nsp3:F106F nsp3 F106F 106   F   F         TRUE  B
897   B.1 RdRp:P323L RdRp P323L 323   P   L        FALSE  B
898   B.1    S:D614G    S D614G 614   D   G        FALSE  B
# B.1.1.529 and branches do not have any of the defining mutations of B.1.1;

I have uploaded the code on GitHub:
https://github.com/discoleo/R/blob/master/Stat/Infx/Cov2.Variants.R

1.) Does anyone have a better picture of what is going on?
The sub-variants should have cumulative mutations. That should be the 
logic of the sub-lineages, and I also infer it from the data/post on the 
Pango GitHub page:

https://github.com/cov-lineages/pango-designation/issues/361


2.) Cumulative List

It may be that NCBI kept only the new mutations as the number of 
mutations increased.
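
If the list is indeed incremental, a cumulative list could in principle be
rebuilt by taking the union of the mutations along the ancestry of each
lineage. The following is only a sketch under that assumption: the helper
names ancestors() and cumulative() are hypothetical (they are not part of
the code on GitHub), the small table reuses the V and Mutation columns and
the rows shown above, and Pango aliases such as BA.2 are not resolved.

```r
# Hypothetical incremental table: each lineage lists only its NEW mutations.
muts <- data.frame(
  V        = c("B.1", "B.1", "B.1", "B.1.1", "B.1.1", "B.1.1"),
  Mutation = c("nsp3:F106F", "RdRp:P323L", "S:D614G",
               "N:R203K", "N:R203R", "N:G204R")
)

# Ancestors of a dotted lineage name, from the root to the lineage itself,
# e.g. "B.1.1" -> "B", "B.1", "B.1.1". Aliases would first have to be
# expanded with an alias table, which is not done here.
ancestors <- function(lineage) {
  parts <- strsplit(lineage, ".", fixed = TRUE)[[1]]
  vapply(seq_along(parts),
         function(i) paste(parts[seq_len(i)], collapse = "."),
         character(1))
}

# Union of the mutations of a lineage and of all its ancestors.
cumulative <- function(lineage, data) {
  unique(data$Mutation[data$V %in% ancestors(lineage)])
}

cumulative("B.1.1", muts)
```

This relies purely on the dotted naming scheme, so it cannot account for
reversions or for lineages whose names hide their ancestry behind an alias.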



Does anyone know if there is a full cumulative list?

Alternatively, there might be a list or package with the full lineage 
encoding. There is a list on the Pango GitHub project, but I hope to 
skip at least this step; the synonyms in the NCBI file seem uglier to 
process.



Note:

This question may be more oriented towards Bioconductor; but I haven't 
found any real Covid packages on Bioconductor.



Thank you very much for any help.


Sincerely,


Leonard
