Re: [R] How to represent the effect of one covariate on regression results?
Hi David,

Thanks for the useful insight. I did of course write to the PLINK user group, but got no answer there. I guess they are more concerned with how to run commands in PLINK as opposed to how to interpret results.

What I can tell about my cohort is that about 80% of cases had Type 2 diabetes while about 8% had Type 1 (my TD covariate refers to the type of diabetes). Attached is the description of the data.

Cheers,
Ana

On Tue, Sep 15, 2020 at 7:59 PM David Winsemius wrote:
> I still do not see why this question is not being
> addressed to the users of the software from which you are deriving your
> "data".
Re: [R] How to represent the effect of one covariate on regression results?
On 9/15/20 8:57 AM, Ana Marija wrote:
> I can see the p values are very correlated in those two instances. Can
> I conclude that my covariate then doesn't have a huge effect or what
> kind of conclusion I can draw from that?

I do not think it follows from the correlation of p-values that your covariate "does not have a huge effect". P-values are not really data, although they are random values. A simulation study of this would require a much better description of the original dataset. Again, that is something that the users of Plink are more likely to be able to intuit than are we. I still do not see why this question is not being addressed to the users of the software from which you are deriving your "data".

--
David.
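To make the point about correlated p-values concrete, here is a minimal simulation sketch on entirely made-up data; it is not based on the poster's cohort. It assumes a continuous outcome, a binary covariate `td` with a strong effect, and SNP genotypes that are independent of `td` and have no effect of their own. All names and settings (`n`, `n_snp`, the effect size of 1.0, the allele frequency of 0.3) are hypothetical choices for illustration. In a setup like this, the SNP p-values from the models with and without the covariate tend to stay strongly correlated even though the covariate itself is highly significant, so a high correlation of p-values does not by itself show that the covariate has a small effect.

```r
# Simulated data only; nothing here comes from the poster's cohort.
set.seed(1)
n     <- 1000  # individuals (hypothetical)
n_snp <- 300   # SNPs tested one at a time (hypothetical)

td   <- rbinom(n, 1, 0.5)    # binary stand-in for the "TD" covariate
y    <- 1.0 * td + rnorm(n)  # td shifts the outcome by one residual SD; no SNP effect
geno <- matrix(rbinom(n * n_snp, 2, 0.3), nrow = n)  # 0/1/2 genotypes, independent of td

p_with <- p_without <- p_td <- numeric(n_snp)
for (j in seq_len(n_snp)) {
  g <- geno[, j]
  co_with    <- summary(lm(y ~ g + td))$coefficients  # model including the covariate
  co_without <- summary(lm(y ~ g))$coefficients       # model omitting the covariate
  p_with[j]    <- co_with["g", "Pr(>|t|)"]
  p_without[j] <- co_without["g", "Pr(>|t|)"]
  p_td[j]      <- co_with["td", "Pr(>|t|)"]
}

r_p      <- cor(p_with, p_without)  # Pearson correlation of the SNP p-values
max_p_td <- max(p_td)               # largest (weakest) p-value observed for td
print(c(cor_of_p = r_p, max_p_for_td = max_p_td))
```

Comparing the SNP effect estimates and their standard errors between the two fits would probably be more informative than comparing p-values, but that is a design choice beyond what this sketch covers.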
Re: [R] How to represent the effect of one covariate on regression results?
> My question is how do I present/plot the effect of covariate "TD" in
> the example it has "P" equal to 3.32228e-12 for all IDs in the
> resulting file so that I show how much effect covariate "TD" has on
> the analysis. Should I run another regression without covariate "TD"

I'll take a second shot in the dark:

There is R^2, and a number of generalizations (the most common of which is probably adjusted R^2). And there are various other goodness-of-fit tests.

https://en.wikipedia.org/wiki/Goodness_of_fit
https://en.wikipedia.org/wiki/Coefficient_of_determination

You could fit two models (one with a particular variable included, and one without), and compare how the statistic changes.

However, I'm probably going to get told off for going off-topic. So, unless any further questions are specific to R programming, I don't think I'm going to contribute further.

Also, I'd recommend you read some notes on statistical modelling, or consult an expert, or both. And I suspect there are additional considerations when modelling genetic data.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
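A minimal sketch of the two-model comparison suggested in this message, using base R's `lm()` on simulated data. The variables `age` and `td`, the effect sizes, and the sample size are hypothetical stand-ins, and the sketch assumes a continuous outcome fitted by ordinary least squares, which may not match the PLINK analysis in the thread.

```r
# Simulated data; variable names and effect sizes are illustrative assumptions.
set.seed(42)
n   <- 500
td  <- rbinom(n, 1, 0.5)    # binary covariate of interest
age <- rnorm(n, 55, 10)     # another covariate kept in both models
y   <- 0.02 * age - 0.7 * td + rnorm(n)

fit_full    <- lm(y ~ age + td)  # model with the covariate
fit_reduced <- lm(y ~ age)       # same model without it

adj_r2_full    <- summary(fit_full)$adj.r.squared
adj_r2_reduced <- summary(fit_reduced)$adj.r.squared
delta_adj_r2   <- adj_r2_full - adj_r2_reduced  # change in adjusted R^2 due to td

cmp <- anova(fit_reduced, fit_full)  # F test comparing the nested models
print(delta_adj_r2)
print(cmp)
```

The change in adjusted R^2 describes how much additional outcome variance the covariate accounts for, while the F test from `anova()` asks whether that improvement is larger than expected by chance; they answer related but different questions.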
Re: [R] How to represent the effect of one covariate on regression results?
Hi Abby and David,

Thanks for the useful tips! I will check those.

I completed the regression analysis in plink (as R would be very slow for my sample size), but as I mentioned, I need to determine the influence of a specific covariate on my results, and Plink is of no help there.

I ran a Pearson correlation analysis on the P values that I got from the regression with and without my covariate of interest, and I got this:

> cor.test(tt$P_TD, tt$P_noTD, method = "pearson", conf.level = 0.95)

        Pearson's product-moment correlation

data:  tt$P_TD and tt$P_noTD
t = 20.17, df = 283, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.7156134 0.8117108
sample estimates:
      cor
0.7679493

I can see the p values are very correlated in those two instances. Can I conclude that my covariate then doesn't have a huge effect, or what kind of conclusion can I draw from that?

Thanks for all your help
Ana

On Tue, Sep 15, 2020 at 1:26 AM David Winsemius wrote:
> There is a user-group for PLINK, easily found by looking at the page you
> cited. This is not the correct place to submit such questions.
Re: [R] How to represent the effect of one covariate on regression results?
There is a user-group for PLINK, easily found by looking at the page you cited. This is not the correct place to submit such questions.

https://groups.google.com/g/plink2-users?pli=1

--
David.

On 9/14/20 6:29 AM, Ana Marija wrote:
> I was running association analysis using --glm genotypic from:
> https://www.cog-genomics.org/plink/2.0/assoc with these covariates:
> sex,age,PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10,TD,array,HBA1C.
Re: [R] How to represent the effect of one covariate on regression results?
I'm wondering if you want one of these:

(1) Plots of "Main Effects".
(2) "Partial Residual Plots".

Search for them, and you should be able to tell if they're what you want.

But a word of warning: Many people (including many senior statisticians) misinterpret this kind of information, because it's always the effect of xj on Y while holding the other variables *constant*. That's not as simple as it sounds, and people have a tendency to disregard the importance of the second half of that sentence in their final interpretations.

P.S. John Fox announced a package with support for Regression Diagnostics about 11 days ago:
https://stat.ethz.ch/pipermail/r-help/2020-September/468609.html
I'm not sure how relevant it is to your question, but I just glanced at the vignette, and it's pretty slick...

On Tue, Sep 15, 2020 at 1:30 AM Ana Marija wrote:
> My question is how do I present/plot the effect of covariate "TD" in
> the example it has "P" equal to 3.32228e-12 for all IDs in the
> resulting file so that I show how much effect covariate "TD" has on
> the analysis.
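As a rough illustration of the partial residual plots mentioned in this message, the sketch below uses only base R (`residuals(..., type = "partial")` and `termplot()`) on simulated data. The variables `age` and `td` and their effect sizes are hypothetical, and the model is an ordinary linear model chosen for illustration. It also checks the property behind the warning above: the slope seen in a partial residual plot for `td` is the coefficient of `td` from the full model, that is, its effect with the other variables held constant.

```r
# Simulated data; names and effect sizes are illustrative assumptions.
set.seed(7)
n   <- 300
age <- rnorm(n, 55, 10)
td  <- rbinom(n, 1, 0.5)
y   <- 0.02 * age - 0.7 * td + rnorm(n)

fit <- lm(y ~ age + td)

# Partial residuals: model residuals plus each term's (centered) fitted contribution.
# The result is a matrix with one column per model term ("age", "td").
pr <- residuals(fit, type = "partial")

# Regressing the partial residuals for td on td recovers the full-model coefficient.
slope_td <- coef(lm(pr[, "td"] ~ td))[["td"]]
print(c(slope_from_partial_residuals = slope_td,
        coefficient_in_full_model    = coef(fit)[["td"]]))

# Draw the plots only in an interactive session.
if (interactive()) {
  termplot(fit, partial.resid = TRUE, se = TRUE)
}
```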
[R] How to represent the effect of one covariate on regression results?
Hello,

I was running association analysis using --glm genotypic from:
https://www.cog-genomics.org/plink/2.0/assoc with these covariates:
sex,age,PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10,TD,array,HBA1C. The result looks like this:

#CHROM  POS        ID          REF  ALT  A1  TEST           OBS_CT  BETA         SE          Z_OR_F_STAT  P            ERRCODE
10      135434303  rs11101905  G    A    A   ADD            11863   -0.110733    0.0986981   -1.12193     0.261891     .
10      135434303  rs11101905  G    A    A   DOMDEV         11863   0.079797     0.111004    0.718868     0.47         .
10      135434303  rs11101905  G    A    A   sex=Female     11863   -0.120404    0.0536069   -2.24605     0.0247006    .
10      135434303  rs11101905  G    A    A   age            11863   0.00524501   0.00391528  1.33963      0.180367     .
10      135434303  rs11101905  G    A    A   PC1            11863   -0.0191779   0.0166868   -1.14928     0.25044      .
10      135434303  rs11101905  G    A    A   PC2            11863   -0.0269939   0.0173086   -1.55957     0.118863     .
10      135434303  rs11101905  G    A    A   PC3            11863   0.0115207    0.0168076   0.685448     0.493061     .
10      135434303  rs11101905  G    A    A   PC4            11863   9.57832e-05  0.0124607   0.0076868    0.993867     .
10      135434303  rs11101905  G    A    A   PC5            11863   -0.00191047  0.00543937  -0.35123     0.725416     .
10      135434303  rs11101905  G    A    A   PC6            11863   -0.0103309   0.0159879   -0.646172    0.518168     .
10      135434303  rs11101905  G    A    A   PC7            11863   0.00790997   0.0144025   0.549207     0.582863     .
10      135434303  rs11101905  G    A    A   PC8            11863   -0.00205639  0.0142709   -0.144096    0.885424     .
10      135434303  rs11101905  G    A    A   PC9            11863   -0.00873771  0.0057239   -1.52653     0.126878     .
10      135434303  rs11101905  G    A    A   PC10           11863   0.0116197    0.0123826   0.938388     0.348045     .
10      135434303  rs11101905  G    A    A   TD             11863   -0.670026    0.0962216   -6.96337     3.32228e-12  .
10      135434303  rs11101905  G    A    A   array=Biobank  11863   0.160666     0.073631    2.18205      0.0291062    .
10      135434303  rs11101905  G    A    A   HBA1C          11863   0.0265933    0.00168758  15.7583      6.0236e-56   .
10      135434303  rs11101905  G    A    A   GENO_2DF       11863   NA           NA          0.726514     0.483613     .

These results are shown for just one ID (rs11101905); there are about 2 million of those in the resulting file.

My question is: how do I present/plot the effect of covariate "TD" (in the example it has "P" equal to 3.32228e-12) for all IDs in the resulting file, so that I show how much effect covariate "TD" has on the analysis? Should I run another regression without covariate "TD" and then do a scatter plot of P values with and without the "TD" covariate, or is there a better way to do this from the data I already have?

Thanks
Ana
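One hedged starting point for working "from the data I already have" is to pull out the rows whose TEST column equals "TD" and look at their estimates directly. The sketch below reads a four-row excerpt of the table from this message with base R's `read.table()`; the inline `txt` string and the object names `res` and `td_rows` are illustrative, and a real results file would be read from disk instead (for about 2 million IDs a faster reader may be preferable). Note that `read.table()` renames the `#CHROM` column to `X.CHROM` under its default `check.names = TRUE`.

```r
# Excerpt of the posted table, retyped with whitespace-separated columns.
txt <- "#CHROM POS ID REF ALT A1 TEST OBS_CT BETA SE Z_OR_F_STAT P ERRCODE
10 135434303 rs11101905 G A A ADD 11863 -0.110733 0.0986981 -1.12193 0.261891 .
10 135434303 rs11101905 G A A TD 11863 -0.670026 0.0962216 -6.96337 3.32228e-12 .
10 135434303 rs11101905 G A A HBA1C 11863 0.0265933 0.00168758 15.7583 6.0236e-56 .
10 135434303 rs11101905 G A A GENO_2DF 11863 NA NA 0.726514 0.483613 ."

# comment.char = "" keeps the header line, which starts with '#'.
res <- read.table(text = txt, header = TRUE, comment.char = "",
                  stringsAsFactors = FALSE)

# Keep only the rows that report the TD covariate (one per ID in a full file).
td_rows <- res[res$TEST == "TD", ]
td_rows$neg_log10_p <- -log10(td_rows$P)
print(td_rows[, c("ID", "BETA", "SE", "P", "neg_log10_p")])

# With many IDs, the distribution of the TD estimates could be inspected, e.g.:
if (interactive() && nrow(td_rows) > 1) {
  hist(td_rows$BETA, main = "TD coefficient across IDs", xlab = "BETA")
}
```

This only summarizes the TD coefficient itself; it does not show how the SNP results would change if TD were left out of the model, which still requires a second fit.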