Re: [R] [External] Re: Error in setwd(dir) when initializing R
Thank you all for your help. I was better with Rgui. I have the feeling that it was me, doing something wrong when I uninstalled Onedrive from both computers... I will talk with the IT in the university. Again, thank you for your interest. I have learnt a lot. El mar, 21 nov 2023 a las 19:41, Michael Dewey () escribió: > Dear Ana > > When you installed R it gave you an option to put a shortcut on the > desktop. If you did that then you can start the R GUI by clicking on it. > You may need to customise it for your preferences. Specifically it > starts R with an option cd-user-docs which I delete and I then alter the > starting option to start in the directory I want. You may have different > preferences of course. > > Try Rgui as Ivan suggests and report back what happened. > > You may have to ask your local IT support as if this is something the > Complutense is setting up it is surprising that all your collegues are > free of problems. Have you asked around? > > Michael > > On 21/11/2023 12:41, Ana de las Heras Molina wrote: > > Hello, > > > > I am sorry for my ignorance, but what is Rgui.exe? > > > > El mar, 21 nov 2023 a las 12:06, Ivan Krylov () > > escribió: > > > >> В Tue, 21 Nov 2023 10:51:59 +0100 > >> Ana de las Heras Molina пишет: > >> > >>> I uninstalled onedrive, I eliminated all the folders and then > >>> reinstalled R and RStudio... but it is RStudio the one creating a > >>> folder called C:\Users\Ana\OneDrive - Universidad Complutense de > >>> Madrid (UCM)\Documentos. > >> > >> Does Rgui.exe use a different home directory from RStudio? Perhaps you > >> need not to reinstall RStudio (which deletes and recreates the program > >> files which aren't damaged) but to wipe the profile (the settings > >> somewhere under %APPDATA% or %LOCALAPPDATA%), but it pays to be extra > >> careful about these directories. People at > >> <https://community.rstudio.com/> may know more about that. > >> > >> I'm afraid I don't know how to disable OneDrive on a Windows 10 > >> installation. If your university has a Microsoft support contract, it > >> could be worth asking the Microsoft representative to fix OneDrive so > >> that temporary files would work on it and telling them the steps to > >> reproduce the problem. > >> > >>> Browse[1]> dir.exists(dir) > >>> [1] TRUE > >>> Browse[1]> setwd(dir) > >>> Error durante el wrapup: no es posible cambiar el directorio de > >>> trabajo > >> > >> What is the value of `dir` at this point? (What does print(dir) say?) > >> Can you open this directory in Windows Explorer? > >> > >> If possible, could you please set your mailer to compose messages in > >> plain text instead of HTML? The plain text versions of your messages > >> that your mailer generates from the HTML messages are somewhat mangled: > >> https://stat.ethz.ch/pipermail/r-help/2023-November/478593.html > >> > >> -- > >> Best regards, > >> Ivan > >> > > > > > > -- > Michael > -- *Ana de las Heras Molina* Nutrición Animal Departamento de Producción Animal Facultad de Veterinaria Universidad Complutense de Madrid *Contacto*: 913943855/anaher...@ucm.es [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] [External] Re: Error in setwd(dir) when initializing R
Hello, I am sorry for my ignorance, but what is Rgui.exe? El mar, 21 nov 2023 a las 12:06, Ivan Krylov () escribió: > В Tue, 21 Nov 2023 10:51:59 +0100 > Ana de las Heras Molina пишет: > > > I uninstalled onedrive, I eliminated all the folders and then > > reinstalled R and RStudio... but it is RStudio the one creating a > > folder called C:\Users\Ana\OneDrive - Universidad Complutense de > > Madrid (UCM)\Documentos. > > Does Rgui.exe use a different home directory from RStudio? Perhaps you > need not to reinstall RStudio (which deletes and recreates the program > files which aren't damaged) but to wipe the profile (the settings > somewhere under %APPDATA% or %LOCALAPPDATA%), but it pays to be extra > careful about these directories. People at > <https://community.rstudio.com/> may know more about that. > > I'm afraid I don't know how to disable OneDrive on a Windows 10 > installation. If your university has a Microsoft support contract, it > could be worth asking the Microsoft representative to fix OneDrive so > that temporary files would work on it and telling them the steps to > reproduce the problem. > > > Browse[1]> dir.exists(dir) > > [1] TRUE > > Browse[1]> setwd(dir) > > Error durante el wrapup: no es posible cambiar el directorio de > > trabajo > > What is the value of `dir` at this point? (What does print(dir) say?) > Can you open this directory in Windows Explorer? > > If possible, could you please set your mailer to compose messages in > plain text instead of HTML? The plain text versions of your messages > that your mailer generates from the HTML messages are somewhat mangled: > https://stat.ethz.ch/pipermail/r-help/2023-November/478593.html > > -- > Best regards, > Ivan > -- *Ana de las Heras Molina* Nutrición Animal Departamento de Producción Animal Facultad de Veterinaria Universidad Complutense de Madrid *Contacto*: 913943855/anaher...@ucm.es [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] [External] Re: Error in setwd(dir) when initializing R
Hi, I uninstalled onedrive, I eliminated all the folders and then reinstalled R and RStudio... but it is RStudio the one creating a folder called C:\Users\Ana\OneDrive - Universidad Complutense de Madrid (UCM)\Documentos. This is what I obtained with the debugging Error in setwd(dir) : no es posible cambiar el directorio de trabajo Enter a frame number, or 0 to exit 1: .rs.availablePackages() 2: .rs.onAvailablePackagesStale(reposString) 3: setwd(dir) Selection: 3Called from: .rs.onAvailablePackagesStale(reposString)Browse[1]> dir.exists(dir)[1] TRUEBrowse[1]> setwd(dir)Error durante el wrapup: no es posible cambiar el directorio de trabajo Error: no more error handlers available (recursive errors?); invoking 'abort' restart El mar, 21 nov 2023 a las 9:16, Ivan Krylov () escribió: > On Tue, 21 Nov 2023 08:46:25 +0100 > Ana de las Heras Molina wrote: > > > > traceback() > > 4: setwd(dir) > > 3: .rs.onAvailablePackagesStale(reposString) > > 2: .rs.availablePackages() > > 1: .rs.rpc.discover_package_dependencies("3C994FEC", ".R") > > This is something that RStudio (not R itself!) does, but it shouldn't > be failing. Here's what seems to be failing for you [*]: > ># prepare directory for discovery of available packages >dir <- tempfile("rstudio-available-packages-") >dir.create(dir, showWarnings = FALSE) > ># create a file in that directory >saveRDS(Sys.time(), file = file.path(dir, "time.rds")) ># (This doesn't fail because we keep executing the function, so the ># directory must exist!) > ># move there >owd <- setwd(dir) # this somehow fails > > Is it an option to disable OneDrive for the home directory? It's > clearly doing something terrible to your temporary files. If not, it > should be possible to create a separate temporary directory on your > computer (the path must not contain spaces) and list it in the > .Renviron file in the home directory as the TMPDIR variable: > > TMPDIR=C:/my/R/temp/directory > > See help(Startup) for more information on .Renviron. > > Alternatively, we may try to perform some debugging. If you set > options(error = recover) and run .rs.availablePackages() again, it > should fail and give you a debugger prompt. Choose the deepest option > in the call stack and try to find out the value of the `dir` variable. > Does dir.exists(dir) still return TRUE? Does setwd(dir) still fail? Can > you open the directory in the Windows Explorer? > > The last but not the least, do you see some of the same problems if you > launch Rgui.exe instead of RStudio? > > -- > Best regards, > Ivan > > [*] > > https://github.com/rstudio/rstudio/blob/45af283a7a5853399904ddc438f6cd89d9b5c137/src/cpp/session/modules/SessionPackages.R#L1625-L1636 > -- *Ana de las Heras Molina* Nutrición Animal Departamento de Producción Animal Facultad de Veterinaria Universidad Complutense de Madrid *Contacto*: 913943855/anaher...@ucm.es [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] [External] Re: Error in setwd(dir) when initializing R
Hello, Thank you all for your responses. When I initialize R, the folder in which it starts is: getwd() [1] "C:/Users/Ana/OneDrive - Universidad Complutense de Madrid (UCM)/Documentos" So it seems a problem with OneDrive -Kevin: I do not really understand your question... You mean if I set my working directory to my C:// or D:// folder in my computer? It doesn't work either. -Ivan: this is the message I got when running traceback(). So can I move the.Rdata file without affecting the work? -Luke: thank you for the idea, I will try it. Error in setwd(dir) : no es posible cambiar el directorio de trabajo> traceback()4: setwd(dir) 3: .rs.onAvailablePackagesStale(reposString) 2: .rs.availablePackages() 1: .rs.rpc.discover_package_dependencies("3C994FEC", ".R") Best, Ana El mar, 21 nov 2023 a las 0:43, escribió: > On Mon, 20 Nov 2023, Ivan Krylov wrote: > > > On Mon, 20 Nov 2023 12:18:11 +0100 > > Ana de las Heras Molina wrote: > > > >> Error in setwd(dir) : no es posible cambiar el directorio de trabajo > > > > If you run traceback() first thing after getting this error, does it > > say anything useful? (Anything besides "No traceback available" would > > count as useful.) > > > > Do you have a file named .RData in your home directory? If yes, it may > > help to move it away (or remove it if you don't use the saved session). > > Also check for .Rprofile or other profile that might contain a setwd() > call. Starting R with --vanilla should work if the issue is a startup > file. > > Best, > > luke > > > > > > > -- > Luke Tierney > Ralph E. Wareham Professor of Mathematical Sciences > University of Iowa Phone: 319-335-3386 > Department of Statistics andFax: 319-335-3017 > Actuarial Science > 241 Schaeffer Hall email: luke-tier...@uiowa.edu > Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu > -- *Ana de las Heras Molina* Nutrición Animal Departamento de Producción Animal Facultad de Veterinaria Universidad Complutense de Madrid *Contacto*: 913943855/anaher...@ucm.es [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Error in setwd(dir) when initializing R
Hello, I am Ana de las Heras, and I write to you because every time I open RStudio or R directly I have the following message, before I can do anything at all: Error in setwd(dir) : no es posible cambiar el directorio de trabajo At first I didn't pay much attention to it, but I am having lots of troubles with different programs, including LinDa, ANCOMBC or Genome InfoDbdata (and thus, phyloseq). After asking in the different forus of each program, they have told me that the issue is more related to my R installation and that first message I obtain when I start the program. I would be very grateful if someone could help me. This issue started when I installed the latest version of R (4.3.1. and now 4.3.2.). I have already uninstalled and reinstalled both programs, as well as Rtools. I am not sure if the issue could be related to ONe Drive, since the HOME folder is in OneDrive. I am currently working on Windows 10, 64 bits. Yours faithfully, Ana -- *Ana de las Heras Molina* Nutrición Animal Departamento de Producción Animal Facultad de Veterinaria Universidad Complutense de Madrid *Contacto*: 913943855/anaher...@ucm.es [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] horizontal grouped stacked plots and removing space between bars
I have code like this: data <- read.csv("test1.csv", stringsAsFactors=FALSE, header=TRUE) # Graph myplot=ggplot(data, aes(fill=condition, y=value, x=condition)) + geom_bar(position="dodge", stat="identity", width=0.5) + scale_fill_manual(values=c("#7b3294", "#c2a5cf", "#a6dba0", "#008837"))+ ylab("Performance (ns/day)") + facet_wrap(~specie,nrow=3, labeller = label_wrap_gen(width = 85), strip.position="bottom") + theme_bw() + theme(panel.grid = element_blank(), panel.spacing = unit(0, "mm"), legend.title=element_blank(), axis.title.x = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank(), axis.ticks.x=element_blank(), axis.title.y = element_text(angle = 90, hjust = 0.5)) myplot + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), legend.title=element_blank(), panel.background = element_blank(), axis.title.x = element_blank(), axis.text.x=element_blank(), axis.ticks.x=element_blank(), axis.title.y = element_text(angle = 90, hjust = 0.5)) And my data is this: > dput(data) structure(list(specie = c("gmx mdrun -gpu_id 1 -ntomp 16 -s benchMEM.tpr -nsteps 1", "gmx mdrun -gpu_id 1 -ntomp 16 -s benchMEM.tpr -nsteps 1", "gmx mdrun -gpu_id 1 -ntomp 16 -s benchMEM.tpr -nsteps 1", "gmx mdrun -gpu_id 1 -ntomp 16 -s benchMEM.tpr -nsteps 1", "gmx mdrun -gpu_id 1 -ntomp 16 -s MD_15NM_WATER.tpr -nsteps 1", "gmx mdrun -gpu_id 1 -ntomp 16 -s MD_15NM_WATER.tpr -nsteps 1", "gmx mdrun -gpu_id 1 -ntomp 16 -s MD_15NM_WATER.tpr -nsteps 1", "gmx mdrun -gpu_id 1 -ntomp 16 -s MD_15NM_WATER.tpr -nsteps 1"), condition = c("Tesla P100-SYCL", "Tesla V100-SYCL", "Tesla P100-CUDA", "Tesla V100-CUDA", "Tesla P100-SYCL", "Tesla V100-SYCL", "Tesla P100-CUDA", "Tesla V100-CUDA"), value = c(75.8, 77.771, 63.297, 78.046, 34.666, 50.052, 32.07, 59.815)), class = "data.frame", row.names = c(NA, -8L)) How do I: 1. have these two plots next to each other, not on the top of each other 2. How do I remove spaces between those bars? [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Issue with crammed Y axis
Hi, I have a data frame like this: > dput(df) structure(list(ID = 1:8, Type = c("gmx mdrun -ntmpi 8 -ntomp 1 -s benchPEP.tpr -nsteps 1 -resethway", "gmx mdrun -ntmpi 8 -ntomp 1 -s benchPEP.tpr -nsteps 1 -resethway", "gmx mdrun -ntmpi 8 -s benchPEP.tpr -nsteps 4000 -resetstep 3000", "gmx mdrun -ntmpi 8 -s benchPEP.tpr -nsteps 4000 -resetstep 3000", "gmx mdrun -ntmpi 8 -s benchPEP.tpr -nsteps -1 -maxh 1.0 -resethway", "gmx mdrun -ntmpi 8 -s benchPEP.tpr -nsteps -1 -maxh 1.0 -resethway", "gmx mdrun -ntmpi 8 -ntomp 1 -s benchPEP.tpr -nsteps -1 -maxh 1.0 -resethway -noconfout", "gmx mdrun -ntmpi 8 -ntomp 1 -s benchPEP.tpr -nsteps -1 -maxh 1.0 -resethway -noconfout" ), Annee = c("SYCL", "CUDA", "SYCL", "CUDA", "SYCL", "CUDA", "SYCL", "CUDA"), Domain.decomp. = c("2. 1", "2", "2. 1", "2. 1", "2.1", "2", "2. 1", "2"), DD.com..load = c(0, 0, 0, 0, 3.7, 3, 0, 0), Neighbor.search = c("3.7", "3. 1", "3.7", "3.9", "0. 1", "O. 1", "3.5", "3. 1"), Launch.PP.GPU.ops. = c("0. 1", "0", "0.2", "0", "1 .6", "1 . 5", "0.2", "0. 1"), Comm..coord. = c("1 .6", "1 .0", "1 .5", "1 .3", "1 .5", "1 .3", "1 . 5", "1 .6"), Force = c("1 . 5", "1 .2", "1 .4", "1 .2", "1 .3", "1 . 1", "1 .5", "1 .2"), Wait...Comm..F = c("1 .3", "1 .7", "1 .2", "1 .0", "66.7", "68.8", "1 .2", "1 .2"), PIE.mesh = c("65.6", "70.9", "61 .0", "61 .4", "0", "0", "67.6", "69.2"), Wait.Bonded.GPU = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), wait.GPU.NB.nonloc. = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Wait.GPU.NB.local = c(0, 0, 0, 0, 7.4, 5.7, 0, 0), NB.X.F.buffer.ops. = c("7.3", "4.4", "6. 7", "5", "0. 1", "0. 1", "7.2", "5.5"), Write.traje = c("0.3", "0.3", "1 .2", "1 .3", "6.4", "6. 1", "O. 1", "0. 1"), Update = c(6.3, 4.3, 5.7, 4.9, 8.2, 9.5, 6.2, 5.6), Constraints = c("8.9", "9.7", "1 1 .6", "13.3", "0.3", "0.4", "8. 1", "9.5"), Comm..energies = c("0.9", "0.9", "3.3", "3.9", "8.4", "8. 5", "0.3", "0.4"), PIE.redist..X.F = c("8. 1", "8.7", "7.9", "7.4", "29.9", "30.1", "8. 1", "8. 1"), PIE.spread = c("29.7", "30.6", "27.2", "29.6", "20.3", "20.2", "30. 1", "30.4"), PIE.gather = c("19.9", "21 .3", "18.7", "19", "6.4", "8.4", "20", "20.6"), PIE.3D.FFT = c("6", "8.6", "5.7", "4.3", "1 .0", "1 .1", "7.6", "8.4"), PIE.3D.FFT.comm. = c("1 .2", "1 .0", "0.9", "0.7", "1 .2", "0. 5", "1 .0", "1 .1"), PIE.solve.Elec = c(0.7, 0.5, 0.6, 0.3, 0.7, 0.5, 0.7, 0.5)), class = "data.frame", row.names = c(NA, -8L)) I am plotting this data with: library(reshape2) library(ggplot2) df <- read.csv("/Users/anamaria/Downloads/B5.csv", stringsAsFactors=FALSE, header=TRUE) df.long<-melt(df,id.vars=c("ID","Type","Annee")) myplot =ggplot(df.long,aes(variable,value,fill=as.factor(Annee)))+ geom_bar(position="dodge",stat="identity")+ ylab("Simulation Progress (%)") + facet_wrap(~Type,nrow=3) myplot + theme(panel.grid.major = element_blank(), legend.title=element_blank(), panel.grid.minor = element_blank(), panel.background = element_blank(), axis.title.x = element_blank(), axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), axis.line = element_line(colour = "black")) My issue is that Y axis is crammed. How it can be cleaned up and say feature only say these values: 0, 10, 20,30, ...80. I tried using: scale_y_continuous(breaks = breaks_width(10))+ But I got this error: Error in breaks_width(10) : could not find function "breaks_width" Also can anything be done about the subtitle of the top left plot, which is not quite fitting in that gray box: " gmx mdrun -ntmpi 8 -ntomp 1 -s benchPEP.tpr -nsteps 1 -resethway" Thanks Ana [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] log transform a data frame
Thank you so much David, here is correction: d1=suppressWarnings(read.csv("/Users/anamaria/Downloads/B1.csv", stringsAsFactors=FALSE, header=TRUE)) d1$X <- NULL d2=as.matrix(sapply(d1, as.numeric)) pdf("~/graph.pdf") b<-barplot(d2, legend= c("SYCL", "CUDA"), beside= TRUE,las=2,cex.axis=0.7,cex.names=0.7,ylim=c(0,80), col=c("#9e9ac8", "#6a51a3")) dev.off() > dput(head(d1)) structure(list(Domain.decomp. = c("2. 1", "2"), DD.com..load = c(0L, 0L), Neighbor.search = c("3.7", "3. 1"), Launch.PP.GPU.ops. = c("0. 1", "0"), Comm..coord. = c("1 .6", "1 .0"), Force = c("1 . 5", "1 .2" ), Wait...Comm..F = c("1 .3", "1 .7"), PIE.mesh = c(65.6, 70.9 ), Wait.Bonded.GPU = c(0L, 0L), wait.GPU.NB.nonloc. = c(0L, 0L ), Wait.GPU.NB.local = c(0L, 0L), NB.X.F.buffer.ops. = c(7.3, 4.4), Write.traje = c(0.3, 0.3), Update = c(6.3, 4.3), Constraints = c(8.9, 9.7), Comm..energies = c(0.9, 0.9), PIE.redist..X.F = c("8. 1", "8.7"), PIE.spread = c(29.7, 30.6), PIE.gather = c("19.9", "21 .3" ), PIE.3D.FFT = c(6, 8.6), PIE.3D.FFT.comm. = c("1 .2", "1 .0" ), PIE.solve.Elec = c(0.7, 0.5)), row.names = 1:2, class = "data.frame") Now my problem is that when I save my plot as PDF my labels on X axis are cut off. Any advice about that? On Tue, Jun 13, 2023 at 5:14 PM David Carlson wrote: > Your first data column appears to contain character data (e.g. SYCL) which > cannot be converted to numeric. You also appear to have 0's in the numeric > columns which will cause problems since log(0) is -Inf. Barplots are useful > for categorical data, but not continuous, numeric data which are better > handled with box plots or strip charts. > > Do not use printouts of your data since it hides important information. > Use str(a11) and dput(a11) or dput(head(a11)) to provide useful information > about your data. > > David L Carlson > Texas A&M University > > > On Tue, Jun 13, 2023 at 4:08 PM Ana Marija > wrote: > >> Hello, I have a data frame like this: d11=suppressWarnings(read. >> csv("/Users/anamaria/Downloads/B1. csv", stringsAsFactors=FALSE, >> header=TRUE)) > d11 X Domain. decomp. DD. com. . load Neighbor. search >> Launch. PP. GPU. ops. Comm. . coord. 1 SYCL 2. 1 >> ZjQcmQRYFpfptBannerStart >> This Message Is From an External Sender >> This message came from outside your organization. >> >> ZjQcmQRYFpfptBannerEnd >> >> Hello, >> >> I have a data frame like this: >> >> d11=suppressWarnings(read.csv("/Users/anamaria/Downloads/B1.csv", >> stringsAsFactors=FALSE, header=TRUE)) >> >> > d11 >> X Domain.decomp. DD.com..load Neighbor.search Launch.PP.GPU.ops. >> Comm..coord. >> 1 SYCL 2. 10 3.7 0. 1 >> 1 .6 >> 2 CUDA 203. 1 0 >> 1 .0 >> Force Wait...Comm..F PIE.mesh Wait.Bonded.GPU wait.GPU.NB.nonloc. >> 1 1 . 5 1 .3 65.6 0 0 >> 2 1 .2 1 .7 70.9 0 0 >> Wait.GPU.NB.local NB.X.F.buffer.ops. Write.traje Update Constraints >> Comm..energies >> 1 07.3 0.36.3 8.9 >> 0.9 >> 2 04.4 0.34.3 9.7 >> 0.9 >> PIE.redist..X.F PIE.spread PIE.gather PIE.3D.FFT PIE.3D.FFT.comm. >> PIE.solve.Elec >> 18. 1 29.7 19.96.0 1 .2 >>0.7 >> 2 8.7 30.6 21 .38.6 1 .0 >>0.5 >> >> I am trying to log transform the whole data frame, but I get this error: >> >> > d1=log(d11) >> Error in Math.data.frame(d11) : >> non-numeric variable(s) in data frame: X, Domain.decomp., >> Neighbor.search, Launch.PP.GPU.ops., Comm..coord., Force, Wait...Comm..F, >> PIE.redist..X.F, PIE.gather, PIE.3D.FFT.comm >> >> >> My goal is to make a stacked barplot like this: >> d2=as.matrix(sapply(d1, as.numeric)) >> b<-barplot(d2, legend= rownames(data2), beside= >> TRUE,las=2,cex.axis=0.7,cex.names=0.7,ylim=c(0,80), col=c("#9e9ac8", >> "#6a51a3")) >> >> If I don't log transform my code runs. >> >> Please advise, >> Ana >> >> [[alternative HTML version deleted]] >> >> __r-h...@r-project.org mailing >
[R] log transform a data frame
Hello, I have a data frame like this: d11=suppressWarnings(read.csv("/Users/anamaria/Downloads/B1.csv", stringsAsFactors=FALSE, header=TRUE)) > d11 X Domain.decomp. DD.com..load Neighbor.search Launch.PP.GPU.ops. Comm..coord. 1 SYCL 2. 10 3.7 0. 1 1 .6 2 CUDA 203. 1 0 1 .0 Force Wait...Comm..F PIE.mesh Wait.Bonded.GPU wait.GPU.NB.nonloc. 1 1 . 5 1 .3 65.6 0 0 2 1 .2 1 .7 70.9 0 0 Wait.GPU.NB.local NB.X.F.buffer.ops. Write.traje Update Constraints Comm..energies 1 07.3 0.36.3 8.9 0.9 2 04.4 0.34.3 9.7 0.9 PIE.redist..X.F PIE.spread PIE.gather PIE.3D.FFT PIE.3D.FFT.comm. PIE.solve.Elec 18. 1 29.7 19.96.0 1 .2 0.7 2 8.7 30.6 21 .38.6 1 .0 0.5 I am trying to log transform the whole data frame, but I get this error: > d1=log(d11) Error in Math.data.frame(d11) : non-numeric variable(s) in data frame: X, Domain.decomp., Neighbor.search, Launch.PP.GPU.ops., Comm..coord., Force, Wait...Comm..F, PIE.redist..X.F, PIE.gather, PIE.3D.FFT.comm My goal is to make a stacked barplot like this: d2=as.matrix(sapply(d1, as.numeric)) b<-barplot(d2, legend= rownames(data2), beside= TRUE,las=2,cex.axis=0.7,cex.names=0.7,ylim=c(0,80), col=c("#9e9ac8", "#6a51a3")) If I don't log transform my code runs. Please advise, Ana [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to do inverse log of every value in every column in data frame
Thank you so much! On Thu, Oct 14, 2021 at 12:17 PM Bert Gunter wrote: > As all of your columns are numeric, you should probably convert your df to > a matrix. Then use exp() on that, of course: > exp(as.matrix(b)) > > see ?exp > > Bert Gunter > > "The trouble with having an open mind is that people keep coming along and > sticking things into it." > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > > On Thu, Oct 14, 2021 at 10:10 AM Ana Marija > wrote: > >> Hi All, >> >> I have a data frame like this: >> >> > head(b) >> LRET02LRET04LRET06LRET08LRET10LRET12LRET14 >> 1 0 0.6931472 . 1.0986123 1.0986123 1.0986123 0.6931472 >> 2 2.1972246 2.4849066 2.4849066 . 2.5649494 2.6390573 2.6390573 >> 3 1.6094379 1.7917595 1.6094379 1.7917595 2.0794415 1.9459101 2.0794415 >> 4 0 0 0 0 0 0 0 >> 5 0.6931472 0 1.0986123 1.0986123 0.6931472 0.6931472 0.6931472 >> 6 1.0986123 1.0986123 1.0986123 0.6931472 1.0986123 1.3862944 1.0986123 >> >> All values in this data frame are product of natural log. I have to do >> inverse of it. >> So for example do do inverse of 0.6931472 I would do: >> > 2.718281828^0.6931472 >> [1] 2 >> >> How do I perform this operation for every single value in this data frame? >> >> The original data frame is this dimension: >> > dim(b) >> [1] 1441 18 >> >> Thanks >> Ana >> >> [[alternative HTML version deleted]] >> >> __ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] how to do inverse log of every value in every column in data frame
Hi All, I have a data frame like this: > head(b) LRET02LRET04LRET06LRET08LRET10LRET12LRET14 1 0 0.6931472 . 1.0986123 1.0986123 1.0986123 0.6931472 2 2.1972246 2.4849066 2.4849066 . 2.5649494 2.6390573 2.6390573 3 1.6094379 1.7917595 1.6094379 1.7917595 2.0794415 1.9459101 2.0794415 4 0 0 0 0 0 0 0 5 0.6931472 0 1.0986123 1.0986123 0.6931472 0.6931472 0.6931472 6 1.0986123 1.0986123 1.0986123 0.6931472 1.0986123 1.3862944 1.0986123 All values in this data frame are product of natural log. I have to do inverse of it. So for example do do inverse of 0.6931472 I would do: > 2.718281828^0.6931472 [1] 2 How do I perform this operation for every single value in this data frame? The original data frame is this dimension: > dim(b) [1] 1441 18 Thanks Ana [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] calculate power-linear mixed effect model
Thank you so much for that info! On Fri, Sep 17, 2021 at 3:06 PM Bert Gunter wrote: > > Wrong list! Post on r-sig-mixed-models, not here. > > Bert Gunter > > "The trouble with having an open mind is that people keep coming along > and sticking things into it." > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > On Fri, Sep 17, 2021 at 12:22 PM Ana Marija > wrote: > > > > Hi All, > > > > I plan to identify metabolite levels that differ between individuals > > with various retinopathy outcomes (DR or noDR). I plan to model > > metabolite levels using linear mixed models ref as implemented in > > lmm2met software. The model covariates will include: age, sex, SV1, > > SV, and disease_condition. > > > > The random effect is subject variation (ID) > > > > Disease condition is the fixed effect because I am interested in > > metabolite differences between those disease conditions. > > > > This command will build a model for each metabolite: > > fitMet = fitLmm(fix=c('Sex','Age','SV1,'SV2','disease_condition'), > > random='(1|ID)', data=df, start=10) > > > > SV1 and SV2 are surrogate variables (numerical values) > > > > Next I need to calculate the power of my study. Let's say that I have > > 1,172 individuals total in the study, from which 431 are DR. Let's say > > that I would like to determine the power of this study given the > > effect size of 0.337. > > > > I know about SIMR software in R but I am not sure how to apply it to > > my study design. > > > > I looked at this paper: > > https://besjournals.onlinelibrary.wiley.com/doi/10./2041-210X.12504 > > > > But I am not sure how to adapt the code given in the tutorial so that > > it is matching to mine design. > > > > Can you please help, > > > > Thanks > > Ana > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] calculate power-linear mixed effect model
Hi All, I plan to identify metabolite levels that differ between individuals with various retinopathy outcomes (DR or noDR). I plan to model metabolite levels using linear mixed models ref as implemented in lmm2met software. The model covariates will include: age, sex, SV1, SV, and disease_condition. The random effect is subject variation (ID) Disease condition is the fixed effect because I am interested in metabolite differences between those disease conditions. This command will build a model for each metabolite: fitMet = fitLmm(fix=c('Sex','Age','SV1,'SV2','disease_condition'), random='(1|ID)', data=df, start=10) SV1 and SV2 are surrogate variables (numerical values) Next I need to calculate the power of my study. Let's say that I have 1,172 individuals total in the study, from which 431 are DR. Let's say that I would like to determine the power of this study given the effect size of 0.337. I know about SIMR software in R but I am not sure how to apply it to my study design. I looked at this paper: https://besjournals.onlinelibrary.wiley.com/doi/10./2041-210X.12504 But I am not sure how to adapt the code given in the tutorial so that it is matching to mine design. Can you please help, Thanks Ana __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] unable to remove NAs from a data frame
Completely true. Thank you for your help On Thu, Sep 16, 2021 at 12:37 PM Rui Barradas wrote: > Hello, > > You are trying to access elements that do not exist, see the example below: > > > x <- 1:3 > x[5] # beyond the last element > #[1] NA > > dim(df) > #[1] 145092258 > > df[14509227,] # beyond nrow(df) by 2 > > > Hope this helps, > > Rui Barradas > > > Às 15:12 de 16/09/21, Ana Marija escreveu: > > Hi All, > > > > I have lines in file that look like this: > > > >> df[14509227,] > > SNP A1 A2 freq b se p N > > 1: NA NA NA NA NA > > > > data looks like this: > >> head(df) > > SNP A1 A2 freq b se p N > > 1: rs74337086 G A 0.0024460 0.1627 0.1231 0.1865 218792 > > 2: rs76388980 G A 0.0034150 0.1451 0.1047 0.1660 218792 > > ... > >> sapply(df,class) > > SNP A1 A2freq b se > > "character" "character" "character" "numeric" "numeric" "numeric" > >p N > >"numeric" "integer" > > > >> dim(df) > > [1] 145092258 > > > > Tried: > >> df=na.omit(df) > >> dim(df) > > [1] 145092258 > > > > and: > >> library(tidyr) > >> d=df %>% drop_na() > >> dim(d) > > [1] 145092258 > > > > > > Please advise, > > > > Thanks > > Ana > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] unable to remove NAs from a data frame
Hi All, I have lines in file that look like this: > df[14509227,] SNP A1 A2 freq b se p N 1: NA NA NA NA NA data looks like this: > head(df) SNP A1 A2 freq b se p N 1: rs74337086 G A 0.0024460 0.1627 0.1231 0.1865 218792 2: rs76388980 G A 0.0034150 0.1451 0.1047 0.1660 218792 ... > sapply(df,class) SNP A1 A2freq b se "character" "character" "character" "numeric" "numeric" "numeric" p N "numeric" "integer" > dim(df) [1] 145092258 Tried: > df=na.omit(df) > dim(df) [1] 145092258 and: > library(tidyr) > d=df %>% drop_na() > dim(d) [1] 145092258 Please advise, Thanks Ana __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Error in n * rvec : non-numeric argument to binary operator
Hello, I have this code, and when I run it: > kbpowerf() Error in n * rvec : non-numeric argument to binary operator this is the code: function (){ #USER SPECIFICATION PORTION alpha=0.05 #DESIGNATED ALPHA g=3 #NUMBER OF GROUPS nvec=c(25,10,15) #GROUP SIZES beta1vec=c(789.93,122.87,1871.99) #SLOPE COEFFICIENTS sigsq=209460.57 #ERROR VARIANCE tausqvec=c(0.1596,0.1602,0.1360) #PREDICTOR VARIANCES #END OF SPECIFICATION df1<-g-1 numint<-200 l<-numint+1 dd<-1e-6 coevec<-c(1,rep(c(4,2),numint/2-1),4,1) set.seed(2017) repn<-2 kbpowerf<-function(){ df<-sum(nvec)-2*g fcrit<-qf(1-alpha,df1,df) dfkvec<-nvec-1 dfk<-sum(dfkvec) cl<-dd cu<-qchisq(1-dd,dfk) intc<-(cu-cl)/numint cvec<-cl+intc*(0:numint) wcpdf<-(intc/3)*coevec*dchisq(cvec,dfk) dfb1<-cumsum(dfkvec[1:g-1]) dfb2<-dfkvec[2:g] ep<-0 for (i in seq(repn)) { bvec<-rbeta(df1,dfb1/2,dfb2/2) avec<-rep(0,g) avec[1]<-exp(sum(log(bvec))) for (ig in (2:df1)) { avec[ig]<-(1-bvec[ig-1])*exp(sum(log(bvec[ig:df1]))) } avec[g]<-1-bvec[df1] oavec<-tausqvec*avec buw<-sum(oavec*beta1vec)/sum(oavec) lamvec<-cvec*sum(oavec*(beta1vec-buw)^2)/sigsq ep<-ep+sum(wcpdf*pf(fcrit,df1,df,lamvec,lower.tail=FALSE)) } kbpower<-ep/repn } kbpower<-kbpowerf() return(list(sample_size=nvec,attained_power=kbpower)) } Can you please advise, Thanks Ana __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] how to quantify outliers on multi-dimensional scaling plot?
Hello, I am in process of writing a grant where I am explaining my planned methylation analysis using R software "minfi". In the text of the grant I am mentioning looking for samples containing outliers in the multi-dimensional scaling (MDS) plot https://rdrr.io/bioc/minfi/man/mdsPlot.html . My question is: how to more precisely define what is an outlier on a MDS plot? I know it is not logFC like on volcano plot. Is there is a way to quantify distance on MDA plot or it is just sufficient to say that samples that cluster separately are outliers? Here you can find a full workflow for minfi package including making MDS plots: https://www.bioconductor.org/packages/release/workflows/vignettes/methylationArrayAnalysis/inst/doc/methylationArrayAnalysis.html Thanks Ana __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] making code (loop) more efficient
Indeed it was the issue with data.table. I converted it to data.frame and it worked like a charm. Thank you so much for your insight! This is the code that worked: library(parallel) library(data.table) library(doSNOW) n <- parallel::detectCores() cl <- parallel::makeCluster(n, type = "SOCK") doSNOW::registerDoSNOW(cl) files <- list.files("/WEIGHTS1/Retina", pattern=".RDat", ignore.case=T) lst_out <- foreach::foreach(i = seq_along(files), .packages = c("data.table") ) %dopar% { tmp <- get(load(files[i])) a <- data.table::copy(tmp) a=as.data.frame(a) rm(tmp) gc() names <- rownames(a) if("blup" %in% colnames(a)) { data <- data.table(names, a["blup"]) nm1 <- c("rsid", "ref_allele", "eff_allele") data[, (nm1) := tstrsplit(names, ":")[-2]] out <- data[, .(rsid, weight = blup, ref_allele, eff_allele)][, WGT := files[i]][] } else { data <- data.table(names) nm1 <- c("rsid", "ref_allele", "eff_allele") data[, (nm1) := tstrsplit(names, ":")[-2]] out <- data[, .(rsid, ref_allele, eff_allele)][, WGT := files[i]][] } return(out) rm(data) gc() } parallel::stopCluster(cl) big_data <- rbindlist(lst_out, fill = TRUE) On Wed, Dec 16, 2020 at 9:31 AM Ana Marija wrote: > > HI Jim, > > this is what I as running: > > library(parallel) > library(data.table) > library(foreach) > library(doSNOW) > > n <- parallel::detectCores() > cl <- parallel::makeCluster(n, type = "SOCK") > doSNOW::registerDoSNOW(cl) > files <- list.files("/WEIGHTS1/Retina", pattern=".RDat", ignore.case=T) > > lst_out <- foreach::foreach(i = seq_along(files), > .packages = c("data.table") ) %dopar% { > >a <- get(load(files[i])) > namesplit<-strsplit(rownames(a),":") > rsid<-unlist(lapply(namesplit,"[",1)) > ref_allele<-unlist(lapply(namesplit,"[",3)) > eff_allele<-unlist(lapply(namesplit,"[",4)) > WGT<-rep(files[i],length(rsid)) > data<-data.frame(rsid=rsid,weight=a$blup, #weight is "blup" column > ref_allele=ref_allele,eff_allele,WGT=WGT) > >return(data) > } > parallel::stopCluster(cl) > > big_data <- rbindlist(lst_out) > > and i got: > Error in { : task 4 failed - "$ operator is invalid for atomic vectors" > > parallel::stopCluster(cl) > > I uploaded 3 RDat file here if you want to try it > https://github.com/montenegrina/sample > > Thank you for looking into this > Ana > > On Tue, Dec 15, 2020 at 11:45 PM Jim Lemon wrote: > > > > Hi Ana, > > Back on the job. I'm not sure how this will work in your setup, but > > here is a try: > > > > a<-read.table(text="top1 blup lasso enet > > rs4980905:184404:C:A 0.07692622 -1.881795e-04 00 > > rs7978751:187541:G:C 0.62411425 9.934994e-04 00 > > rs2368831:188285:C:T 0.69529158 1.211028e-03 00 > > rs12830904:188335:T:A 0.92793158 -9.143555e-05 00 > > rs1500098:189093:G:C 0.42032471 9.001814e-04 00 > > rs79410690:190097:G:A 0.26194244 5.019037e-04 00", > > header=TRUE,stringsAsFactors=FALSE) > > namesplit<-strsplit(rownames(a),":") > > rsid<-unlist(lapply(namesplit,"[",1)) > > ref_allele<-unlist(lapply(namesplit,"[",3)) > > eff_allele<-unlist(lapply(namesplit,"[",4)) > > # here I'm assuming that the filename > > # is stored in files[i] > > files<-"retina.ENSG0135776.wgt.RDat" > > i<-1 > > WGT<-rep(files[i],length(rsid)) > > data<-data.frame(rsid=rsid,weight=a$top1, > > ref_allele=ref_allele,eff_allele,WGT=WGT) > > data > > > > Note that the output is a data frame, not a data table. I hope that > > the function for creating a data table is close enough to that for a > > data frame for you to work it out. If not I can probably have a look > > at it a bit later. > > > > Jim > > > > On Wed, Dec 16, 2020 at 1:45 PM Ana Marija > > wrote: > > > > > > Hi Jim, > > > > > > as always you're completely right, this is what is happening: > > > > > > > head(a) > > > top1 blup lasso enet > > > rs4980905:184404:C:A 0.07692622 -1.881795e-04 00 > > > rs7978751:187541:G:C 0.62411425 9.934994e-04 00 > > &
Re: [R] making code (loop) more efficient
HI Jim, this is what I as running: library(parallel) library(data.table) library(foreach) library(doSNOW) n <- parallel::detectCores() cl <- parallel::makeCluster(n, type = "SOCK") doSNOW::registerDoSNOW(cl) files <- list.files("/WEIGHTS1/Retina", pattern=".RDat", ignore.case=T) lst_out <- foreach::foreach(i = seq_along(files), .packages = c("data.table") ) %dopar% { a <- get(load(files[i])) namesplit<-strsplit(rownames(a),":") rsid<-unlist(lapply(namesplit,"[",1)) ref_allele<-unlist(lapply(namesplit,"[",3)) eff_allele<-unlist(lapply(namesplit,"[",4)) WGT<-rep(files[i],length(rsid)) data<-data.frame(rsid=rsid,weight=a$blup, #weight is "blup" column ref_allele=ref_allele,eff_allele,WGT=WGT) return(data) } parallel::stopCluster(cl) big_data <- rbindlist(lst_out) and i got: Error in { : task 4 failed - "$ operator is invalid for atomic vectors" > parallel::stopCluster(cl) I uploaded 3 RDat file here if you want to try it https://github.com/montenegrina/sample Thank you for looking into this Ana On Tue, Dec 15, 2020 at 11:45 PM Jim Lemon wrote: > > Hi Ana, > Back on the job. I'm not sure how this will work in your setup, but > here is a try: > > a<-read.table(text="top1 blup lasso enet > rs4980905:184404:C:A 0.07692622 -1.881795e-04 00 > rs7978751:187541:G:C 0.62411425 9.934994e-04 00 > rs2368831:188285:C:T 0.69529158 1.211028e-03 00 > rs12830904:188335:T:A 0.92793158 -9.143555e-05 00 > rs1500098:189093:G:C 0.42032471 9.001814e-04 00 > rs79410690:190097:G:A 0.26194244 5.019037e-04 00", > header=TRUE,stringsAsFactors=FALSE) > namesplit<-strsplit(rownames(a),":") > rsid<-unlist(lapply(namesplit,"[",1)) > ref_allele<-unlist(lapply(namesplit,"[",3)) > eff_allele<-unlist(lapply(namesplit,"[",4)) > # here I'm assuming that the filename > # is stored in files[i] > files<-"retina.ENSG0135776.wgt.RDat" > i<-1 > WGT<-rep(files[i],length(rsid)) > data<-data.frame(rsid=rsid,weight=a$top1, > ref_allele=ref_allele,eff_allele,WGT=WGT) > data > > Note that the output is a data frame, not a data table. I hope that > the function for creating a data table is close enough to that for a > data frame for you to work it out. If not I can probably have a look > at it a bit later. > > Jim > > On Wed, Dec 16, 2020 at 1:45 PM Ana Marija > wrote: > > > > Hi Jim, > > > > as always you're completely right, this is what is happening: > > > > > head(a) > > top1 blup lasso enet > > rs4980905:184404:C:A 0.07692622 -1.881795e-04 00 > > rs7978751:187541:G:C 0.62411425 9.934994e-04 00 > > rs2368831:188285:C:T 0.69529158 1.211028e-03 00 > > rs12830904:188335:T:A 0.92793158 -9.143555e-05 00 > > rs1500098:189093:G:C 0.42032471 9.001814e-04 00 > > rs79410690:190097:G:A 0.26194244 5.019037e-04 00 > > >names <- rownames(a) > > >data <- data.table(names, a["blup"]) > > > head(data) > >names V2 > > 1: rs4980905:184404:C:A NA > > 2: rs7978751:187541:G:C NA > > 3: rs2368831:188285:C:T NA > > 4: rs12830904:188335:T:A NA > > 5: rs1500098:189093:G:C NA > > 6: rs79410690:190097:G:A NA > > > > So my goal is to transform what is in "a" to this for every RDat file: > > > > rsidweight ref_allele eff_allele > > 1: rs72763981 9.376766e-09 C G > > 2: rs144383755 -2.093346e-09 A G > > 3: rs1925717 1.511376e-08 T C > > 4: rs61827307 -1.625302e-08 C A > > 5: rs61827308 -1.625302e-08 G C > > 6: rs199623136 -9.128354e-10 GC G > >WGT > > 1: retina.ENSG0135776.wgt.RDat > > 2: retina.ENSG0135776.wgt.RDat > > 3: retina.ENSG0135776.wgt.RDat > > 4: retina.ENSG0135776.wgt.RDat > > 5: retina.ENSG0135776.wgt.RDat > > 6: retina.ENSG0135776.wgt.RDat > > > > so from rs4980905:184404:C:A I would take rs4980905 to be in column > > "rsid", C in column "ref_allele" and A to be in column "eff_allele", > > WGT column would just be filled with a name of the particular RDat > > file. > > > > So the issue is in these lines: > > > > a <- get(load(files[i])) > > names <- rowname
Re: [R] making code (loop) more efficient
Hi Jim, as always you're completely right, this is what is happening: > head(a) top1 blup lasso enet rs4980905:184404:C:A 0.07692622 -1.881795e-04 00 rs7978751:187541:G:C 0.62411425 9.934994e-04 00 rs2368831:188285:C:T 0.69529158 1.211028e-03 00 rs12830904:188335:T:A 0.92793158 -9.143555e-05 00 rs1500098:189093:G:C 0.42032471 9.001814e-04 00 rs79410690:190097:G:A 0.26194244 5.019037e-04 00 >names <- rownames(a) >data <- data.table(names, a["blup"]) > head(data) names V2 1: rs4980905:184404:C:A NA 2: rs7978751:187541:G:C NA 3: rs2368831:188285:C:T NA 4: rs12830904:188335:T:A NA 5: rs1500098:189093:G:C NA 6: rs79410690:190097:G:A NA So my goal is to transform what is in "a" to this for every RDat file: rsidweight ref_allele eff_allele 1: rs72763981 9.376766e-09 C G 2: rs144383755 -2.093346e-09 A G 3: rs1925717 1.511376e-08 T C 4: rs61827307 -1.625302e-08 C A 5: rs61827308 -1.625302e-08 G C 6: rs199623136 -9.128354e-10 GC G WGT 1: retina.ENSG0135776.wgt.RDat 2: retina.ENSG0135776.wgt.RDat 3: retina.ENSG0135776.wgt.RDat 4: retina.ENSG0135776.wgt.RDat 5: retina.ENSG0135776.wgt.RDat 6: retina.ENSG0135776.wgt.RDat so from rs4980905:184404:C:A I would take rs4980905 to be in column "rsid", C in column "ref_allele" and A to be in column "eff_allele", WGT column would just be filled with a name of the particular RDat file. So the issue is in these lines: a <- get(load(files[i])) names <- rownames(a) data <- data.table(names, a["blup"]) nm1 <- c("rsid", "ref_allele", "eff_allele") any idea how I can rewrite this? On Tue, Dec 15, 2020 at 8:30 PM Jim Lemon wrote: > > Hi Ana, > I would look at "data" in your second example and see if it contains a > column named "blup" or just the values that were extracted from > a$blup. Also, I assume that weight=blup looks for an object named > "blup", which may not be there. > > Jim > > On Wed, Dec 16, 2020 at 1:20 PM Ana Marija > wrote: > > > > Hi Jim, > > > > Maybe my post is confusing. > > > > so "dd" came from my slow code and I don't use it again in parallelized > > code. > > > > So for example for one of my files: > > > > if > > i="retina.ENSG0120647.wgt.RDat" > > > a <- get(load(i)) > > > head(a) > > top1 blup lasso enet > > rs4980905:184404:C:A 0.07692622 -1.881795e-04 00 > > rs7978751:187541:G:C 0.62411425 9.934994e-04 00 > > rs2368831:188285:C:T 0.69529158 1.211028e-03 00 > > ... > > > > Slow code was posted just to show what was running very slow and it > > was running. I really need help fixing parallelized version. > > > > On Tue, Dec 15, 2020 at 7:35 PM Jim Lemon wrote: > > > > > > Hi Ana, > > > My guess is that in your second code fragment you are assigning the > > > rownames of "a" and the _values_ contained in a$blup to the data.table > > > "data". As I don't have much experience with data tables I may be > > > wrong, but I suspect that the column name "blup" may not be visible or > > > even present in "data". I don't see it in "dd" above this code > > > fragment. > > > > > > Jim > > > > > > On Wed, Dec 16, 2020 at 11:12 AM Ana Marija > > > wrote: > > > > > > > > Hello, > > > > > > > > I made a terribly inefficient code which runs forever but it does run. > > > > > > > > library(dplyr) > > > > library(splitstackshape) > > > > > > > > datalist = list() > > > > files <- list.files("/WEIGHTS1/Retina", pattern=".RDat", ignore.case=T) > > > > > > > > for(i in files) > > > > { > > > > a<-get(load(i)) > > > > names <- rownames(a) > > > > data <- as.data.frame(cbind(names,a)) > > > > rownames(data) <- NULL > > > > dd=na.omit(concat.split.multiple(data = data, split.cols = c("names"), > > > > seps = ":")) > > > > dd=select(dd,names_1,blup,names_3,names_4) > > > > colnames(dd)=c("rsid","weight","ref_allele","
Re: [R] making code (loop) more efficient
Hi Jim, Maybe my post is confusing. so "dd" came from my slow code and I don't use it again in parallelized code. So for example for one of my files: if i="retina.ENSG0120647.wgt.RDat" > a <- get(load(i)) > head(a) top1 blup lasso enet rs4980905:184404:C:A 0.07692622 -1.881795e-04 00 rs7978751:187541:G:C 0.62411425 9.934994e-04 00 rs2368831:188285:C:T 0.69529158 1.211028e-03 00 ... Slow code was posted just to show what was running very slow and it was running. I really need help fixing parallelized version. On Tue, Dec 15, 2020 at 7:35 PM Jim Lemon wrote: > > Hi Ana, > My guess is that in your second code fragment you are assigning the > rownames of "a" and the _values_ contained in a$blup to the data.table > "data". As I don't have much experience with data tables I may be > wrong, but I suspect that the column name "blup" may not be visible or > even present in "data". I don't see it in "dd" above this code > fragment. > > Jim > > On Wed, Dec 16, 2020 at 11:12 AM Ana Marija > wrote: > > > > Hello, > > > > I made a terribly inefficient code which runs forever but it does run. > > > > library(dplyr) > > library(splitstackshape) > > > > datalist = list() > > files <- list.files("/WEIGHTS1/Retina", pattern=".RDat", ignore.case=T) > > > > for(i in files) > > { > > a<-get(load(i)) > > names <- rownames(a) > > data <- as.data.frame(cbind(names,a)) > > rownames(data) <- NULL > > dd=na.omit(concat.split.multiple(data = data, split.cols = c("names"), > > seps = ":")) > > dd=select(dd,names_1,blup,names_3,names_4) > > colnames(dd)=c("rsid","weight","ref_allele","eff_allele") > > dd$WGT<-i > > datalist[[i]] <- dd # add it to your list > > } > > > > big_data = do.call(rbind, datalist) > > > > There is 17345 RDat files this loop has to go through. And each file > > has approximately 10,000 lines. All RDat files can be downloaded from > > here: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE115828 and > > they are compressed in this file: GSE115828_retina_TWAS_wgts.tar.gz . > > And subset of 3 of those .RDat files is here: > > https://github.com/montenegrina/sample > > > > For one of those files, say i="retina.ENSG0135776.wgt.RDat" > > dd looks like this: > > > > > head(dd) > > rsidweight ref_allele eff_allele > > 1: rs72763981 9.376766e-09 C G > > 2: rs144383755 -2.093346e-09 A G > > 3: rs1925717 1.511376e-08 T C > > 4: rs61827307 -1.625302e-08 C A > > 5: rs61827308 -1.625302e-08 G C > > 6: rs199623136 -9.128354e-10 GC G > >WGT > > 1: retina.ENSG0135776.wgt.RDat > > 2: retina.ENSG0135776.wgt.RDat > > 3: retina.ENSG0135776.wgt.RDat > > 4: retina.ENSG0135776.wgt.RDat > > 5: retina.ENSG0135776.wgt.RDat > > 6: retina.ENSG0135776.wgt.RDat > > > > so on attempt to parallelize this I did this: > > > > library(parallel) > > library(data.table) > > library(foreach) > > library(doSNOW) > > > > n <- parallel::detectCores() > > cl <- parallel::makeCluster(n, type = "SOCK") > > doSNOW::registerDoSNOW(cl) > > files <- list.files("/WEIGHTS1/Retina", pattern=".RDat", ignore.case=T) > > > > lst_out <- foreach::foreach(i = seq_along(files), > > .packages = c("data.table") ) %dopar% { > > > >a <- get(load(files[i])) > >names <- rownames(a) > >data <- data.table(names, a["blup"]) > >nm1 <- c("rsid", "ref_allele", "eff_allele") > >data[, (nm1) := tstrsplit(names, ":")[-2]] > >return(data[, .(rsid, weight = blup, ref_allele, eff_allele)][, > >WGT := files[i]][]) > > } > > parallel::stopCluster(cl) > > > > big_data <- rbindlist(lst_out) > > > > I am getting this Error: > > > > Error in { : task 7 failed - "object 'blup' not found" > > > parallel::stopCluster(cl) > > > > Can you please advise, > > Ana > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] making code (loop) more efficient
Hello, I made a terribly inefficient code which runs forever but it does run. library(dplyr) library(splitstackshape) datalist = list() files <- list.files("/WEIGHTS1/Retina", pattern=".RDat", ignore.case=T) for(i in files) { a<-get(load(i)) names <- rownames(a) data <- as.data.frame(cbind(names,a)) rownames(data) <- NULL dd=na.omit(concat.split.multiple(data = data, split.cols = c("names"), seps = ":")) dd=select(dd,names_1,blup,names_3,names_4) colnames(dd)=c("rsid","weight","ref_allele","eff_allele") dd$WGT<-i datalist[[i]] <- dd # add it to your list } big_data = do.call(rbind, datalist) There is 17345 RDat files this loop has to go through. And each file has approximately 10,000 lines. All RDat files can be downloaded from here: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE115828 and they are compressed in this file: GSE115828_retina_TWAS_wgts.tar.gz . And subset of 3 of those .RDat files is here: https://github.com/montenegrina/sample For one of those files, say i="retina.ENSG0135776.wgt.RDat" dd looks like this: > head(dd) rsidweight ref_allele eff_allele 1: rs72763981 9.376766e-09 C G 2: rs144383755 -2.093346e-09 A G 3: rs1925717 1.511376e-08 T C 4: rs61827307 -1.625302e-08 C A 5: rs61827308 -1.625302e-08 G C 6: rs199623136 -9.128354e-10 GC G WGT 1: retina.ENSG0135776.wgt.RDat 2: retina.ENSG0135776.wgt.RDat 3: retina.ENSG0135776.wgt.RDat 4: retina.ENSG0135776.wgt.RDat 5: retina.ENSG0135776.wgt.RDat 6: retina.ENSG0135776.wgt.RDat so on attempt to parallelize this I did this: library(parallel) library(data.table) library(foreach) library(doSNOW) n <- parallel::detectCores() cl <- parallel::makeCluster(n, type = "SOCK") doSNOW::registerDoSNOW(cl) files <- list.files("/WEIGHTS1/Retina", pattern=".RDat", ignore.case=T) lst_out <- foreach::foreach(i = seq_along(files), .packages = c("data.table") ) %dopar% { a <- get(load(files[i])) names <- rownames(a) data <- data.table(names, a["blup"]) nm1 <- c("rsid", "ref_allele", "eff_allele") data[, (nm1) := tstrsplit(names, ":")[-2]] return(data[, .(rsid, weight = blup, ref_allele, eff_allele)][, WGT := files[i]][]) } parallel::stopCluster(cl) big_data <- rbindlist(lst_out) I am getting this Error: Error in { : task 7 failed - "object 'blup' not found" > parallel::stopCluster(cl) Can you please advise, Ana __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to order variables on correlation plot
Thank you! On Fri, Nov 6, 2020 at 1:45 PM David Winsemius wrote: > > Why did you specify a different order parameter if that is not what you > wanted? > > Suggest you look more carefully at the parameters of the code you are copying > and pasting and also at the help page, ?corrplot . > > -- > > David. > > On 11/6/20 6:08 AM, Ana Marija wrote: > > sorry forgot to attach the plot. > > On Fri, Nov 6, 2020 at 8:07 AM Ana Marija wrote: > > Hello > > I have data like this: > > head(my_data) > > subjects DIABDUR HBA1C ESRD SEX AGE PHENO C1 C2 > 1 fam0110_G110 38 9.41 2 51 2 -0.01144980 0.002661140 > 2 fam0113_G113 30 12.51 2 40 2 -0.00502052 -0.000929061 > 3 fam0114_G114 23 8.42 2 45 2 -0.00251578 -0.003450950 > 4 fam0117_G117 37 9.02 2 46 2 -0.00704917 -0.000573325 > 5 fam0119_G119 22 9.41 1 46 1 0.00263433 0.001002370 > 6 fam0119_G120 NANA1 1 71 1 -0.00354795 -0.002045940 > C3 C4 C5 C6 C7 C8 > 1 0.006028150 -0.00176795 -0.00148375 0.004543550 -0.006272170 -0.00535077 > 2 -0.000453402 -0.00192162 0.00416229 0.007868230 -0.001957670 -0.00473148 > 3 -0.001680860 -0.00620438 -0.00235092 0.000672831 -0.000278318 0.00647337 > 4 0.001436740 0.00155568 -0.00556147 -0.000386401 -0.006885350 0.00135539 > 5 -0.007396920 0.00326229 0.00355575 -0.011149400 0.009156510 0.00120833 > 6 0.004532050 0.00869862 -0.00113207 0.002244520 -0.002119220 0.00657587 >C9 C10 > 1 0.00328111 -0.00113515 > 2 -0.00495790 0.00320201 > 3 0.00208591 -0.00874752 > 4 -0.00967934 0.00607760 > 5 0.00611030 0.00876190 > 6 -0.00990661 0.00635349 > > I am plotting it with: > > library(dplyr) > library(magrittr) > library(corrplot) > d=my_data %>% data.frame %>% set_rownames(.$subjects) %>% select(-subjects) > res <- cor(d, use = "complete.obs") > pdf("correlation.pdf") > corrplot(res, type = "upper", order = "hclust", > tl.col = "black", tl.srt = 45) > dev.off() > > and I am getting the plot in attach. How to make it so that my > variables are shown on the plot in the order they are in my_data data > frame? > > Thanks > Ana > > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to order variables on correlation plot
sorry forgot to attach the plot. On Fri, Nov 6, 2020 at 8:07 AM Ana Marija wrote: > > Hello > > I have data like this: > > > head(my_data) > subjects DIABDUR HBA1C ESRD SEX AGE PHENO C1 C2 > 1 fam0110_G110 38 9.41 2 51 2 -0.01144980 0.002661140 > 2 fam0113_G113 30 12.51 2 40 2 -0.00502052 -0.000929061 > 3 fam0114_G114 23 8.42 2 45 2 -0.00251578 -0.003450950 > 4 fam0117_G117 37 9.02 2 46 2 -0.00704917 -0.000573325 > 5 fam0119_G119 22 9.41 1 46 1 0.00263433 0.001002370 > 6 fam0119_G120 NANA1 1 71 1 -0.00354795 -0.002045940 > C3 C4 C5 C6 C7 C8 > 1 0.006028150 -0.00176795 -0.00148375 0.004543550 -0.006272170 -0.00535077 > 2 -0.000453402 -0.00192162 0.00416229 0.007868230 -0.001957670 -0.00473148 > 3 -0.001680860 -0.00620438 -0.00235092 0.000672831 -0.000278318 0.00647337 > 4 0.001436740 0.00155568 -0.00556147 -0.000386401 -0.006885350 0.00135539 > 5 -0.007396920 0.00326229 0.00355575 -0.011149400 0.009156510 0.00120833 > 6 0.004532050 0.00869862 -0.00113207 0.002244520 -0.002119220 0.00657587 >C9 C10 > 1 0.00328111 -0.00113515 > 2 -0.00495790 0.00320201 > 3 0.00208591 -0.00874752 > 4 -0.00967934 0.00607760 > 5 0.00611030 0.00876190 > 6 -0.00990661 0.00635349 > > I am plotting it with: > > library(dplyr) > library(magrittr) > library(corrplot) > d=my_data %>% data.frame %>% set_rownames(.$subjects) %>% select(-subjects) > res <- cor(d, use = "complete.obs") > pdf("correlation.pdf") > corrplot(res, type = "upper", order = "hclust", > tl.col = "black", tl.srt = 45) > dev.off() > > and I am getting the plot in attach. How to make it so that my > variables are shown on the plot in the order they are in my_data data > frame? > > Thanks > Ana correlation.pdf Description: Adobe PDF document __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] how to order variables on correlation plot
Hello I have data like this: > head(my_data) subjects DIABDUR HBA1C ESRD SEX AGE PHENO C1 C2 1 fam0110_G110 38 9.41 2 51 2 -0.01144980 0.002661140 2 fam0113_G113 30 12.51 2 40 2 -0.00502052 -0.000929061 3 fam0114_G114 23 8.42 2 45 2 -0.00251578 -0.003450950 4 fam0117_G117 37 9.02 2 46 2 -0.00704917 -0.000573325 5 fam0119_G119 22 9.41 1 46 1 0.00263433 0.001002370 6 fam0119_G120 NANA1 1 71 1 -0.00354795 -0.002045940 C3 C4 C5 C6 C7 C8 1 0.006028150 -0.00176795 -0.00148375 0.004543550 -0.006272170 -0.00535077 2 -0.000453402 -0.00192162 0.00416229 0.007868230 -0.001957670 -0.00473148 3 -0.001680860 -0.00620438 -0.00235092 0.000672831 -0.000278318 0.00647337 4 0.001436740 0.00155568 -0.00556147 -0.000386401 -0.006885350 0.00135539 5 -0.007396920 0.00326229 0.00355575 -0.011149400 0.009156510 0.00120833 6 0.004532050 0.00869862 -0.00113207 0.002244520 -0.002119220 0.00657587 C9 C10 1 0.00328111 -0.00113515 2 -0.00495790 0.00320201 3 0.00208591 -0.00874752 4 -0.00967934 0.00607760 5 0.00611030 0.00876190 6 -0.00990661 0.00635349 I am plotting it with: library(dplyr) library(magrittr) library(corrplot) d=my_data %>% data.frame %>% set_rownames(.$subjects) %>% select(-subjects) res <- cor(d, use = "complete.obs") pdf("correlation.pdf") corrplot(res, type = "upper", order = "hclust", tl.col = "black", tl.srt = 45) dev.off() and I am getting the plot in attach. How to make it so that my variables are shown on the plot in the order they are in my_data data frame? Thanks Ana __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how do I remove entries in data frame from a vector
Makes sense, thank you! On Wed, 21 Oct 2020 at 17:46, Rolf Turner wrote: > > On Wed, 21 Oct 2020 16:15:22 -0500 > Ana Marija wrote: > > > Hello, > > > > I have a data frame with one column: > > > > > remove > > > > V1 > > > > 1 ABAFT_g_4RWG569_BI_SNP_A10_35096 > > 2 ABAFT_g_4RWG569_BI_SNP_B12_35130 > > 3 ABAFT_g_4RWG569_BI_SNP_E09_35088 > > 4 ABAFT_g_4RWG569_BI_SNP_E12_35136 > > 5 ABAFT_g_4RWG569_BI_SNP_F11_35122 > > 6 ABAFT_g_4RWG569_BI_SNP_F12_35138 > > 7 ABAFT_g_4RWG569_BI_SNP_G07_35060 > > 8 ABAFT_g_4RWG569_BI_SNP_G12_35140 > > > > I want to remove these 8 entries from remove data frame from this > > vector that looks like this: > > > > > head(celFiles) > > > > [1] > > > "/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs18.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A01_34952.CEL" > > [2] > > > "/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs18.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A02_34968.CEL" > > > > [3] > > > "/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs18.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A03_34984.CEL" > > > > [4] > > > "GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs18.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A04_35000.CEL" > > > > [5] > > > "/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs18.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A05_35016.CEL" > > > > [6] > > > "/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs18.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A06_35032.CEL" > > ... > > > > I tried doing this: > > > > b= celFiles[!basename(celFiles) %in% as.character(remove$V1)] > > > > but none of the 8th entries in "remove" data frame have been removed. > > > > Please advise, > > Ana > > I would advise you to *look* at basename(celFiles)!!! > > The entries end in ".CEL"; the names in remove$V1 do not. So %in% > finds no matches. Perhaps: > > b <- celFiles[!basename(celFiles) %in% > paste0(as.character(remove$V1),".CEL")] > > Note that, for the data that you have presented, none of the entries of > celFiles "match up" with "remove" so it is *still* the case that (for > the data shown) none of the entries will be removed. So your example > was bad. > > cheers, > > Rolf Turner > > -- > Honorary Research Fellow > Department of Statistics > University of Auckland > Phone: +64-9-373-7599 ext. 88276 > > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how do I remove entries in data frame from a vector
Thank you so much! On Wed, Oct 21, 2020 at 4:47 PM Rui Barradas wrote: > > Hello, > > To remove the file extension it's much easier to use base R > > > filename <- tools::file_path_sans_ext(basename(celFiles)) > > > Hope this helps, > > Rui Barradas > > Às 22:41 de 21/10/20, Rui Barradas escreveu: > > Hello, > > > > This is probably because basename keeps the file extension, try instead > > > > > > filename <- sub("(^[^\\.]*)\\..+$", "\\1", basename(celFiles)) > > celFiles[!filename %in% as.character(remove$V1)] > > > > > > Hope this helps, > > > > Rui Barradas > > > > Às 22:15 de 21/10/20, Ana Marija escreveu: > >> Hello, > >> > >> I have a data frame with one column: > >> > >>> remove > >> > >> V1 > >> > >> 1 ABAFT_g_4RWG569_BI_SNP_A10_35096 > >> 2 ABAFT_g_4RWG569_BI_SNP_B12_35130 > >> 3 ABAFT_g_4RWG569_BI_SNP_E09_35088 > >> 4 ABAFT_g_4RWG569_BI_SNP_E12_35136 > >> 5 ABAFT_g_4RWG569_BI_SNP_F11_35122 > >> 6 ABAFT_g_4RWG569_BI_SNP_F12_35138 > >> 7 ABAFT_g_4RWG569_BI_SNP_G07_35060 > >> 8 ABAFT_g_4RWG569_BI_SNP_G12_35140 > >> > >> I want to remove these 8 entries from remove data frame from this > >> vector that looks like this: > >> > >>> head(celFiles) > >> > >> [1] > >> "/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs18.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A01_34952.CEL" > >> > >> [2] > >> "/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs18.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A02_34968.CEL" > >> > >> > >> [3] > >> "/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs18.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A03_34984.CEL" > >> > >> > >> [4] > >> "GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs18.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A04_35000.CEL" > >> > >> > >> [5] > >> "/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs18.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A05_35016.CEL" > >> > >> > >> [6] > >> "/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs18.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A06_35032.CEL" > >> > >> ... > >> > >> I tried doing this: > >> > >> b= celFiles[!basename(celFiles) %in% as.character(remove$V1)] > >> > >> but none of the 8th entries in "remove" data frame have been removed. > >> > >> Please advise, > >> Ana > >> > >> __ > >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > >> https://stat.ethz.ch/mailman/listinfo/r-help > >> PLEASE do read the posting guide > >> http://www.R-project.org/posting-guide.html > >> and provide commented, minimal, self-contained, reproducible code. > >> > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] how do I remove entries in data frame from a vector
Hello, I have a data frame with one column: > remove V1 1 ABAFT_g_4RWG569_BI_SNP_A10_35096 2 ABAFT_g_4RWG569_BI_SNP_B12_35130 3 ABAFT_g_4RWG569_BI_SNP_E09_35088 4 ABAFT_g_4RWG569_BI_SNP_E12_35136 5 ABAFT_g_4RWG569_BI_SNP_F11_35122 6 ABAFT_g_4RWG569_BI_SNP_F12_35138 7 ABAFT_g_4RWG569_BI_SNP_G07_35060 8 ABAFT_g_4RWG569_BI_SNP_G12_35140 I want to remove these 8 entries from remove data frame from this vector that looks like this: > head(celFiles) [1] "/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs18.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A01_34952.CEL" [2] "/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs18.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A02_34968.CEL" [3] "/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs18.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A03_34984.CEL" [4] "GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs18.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A04_35000.CEL" [5] "/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs18.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A05_35016.CEL" [6] "/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs18.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A06_35032.CEL" ... I tried doing this: b= celFiles[!basename(celFiles) %in% as.character(remove$V1)] but none of the 8th entries in "remove" data frame have been removed. Please advise, Ana __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] 2 D density plot interpretation and manipulating the data
Hi Abby, Thanks for getting back to me, yes I believe I did that by doing this: SNP$density <- get_density(SNP$mean, SNP$var) > summary(SNP$density) Min. 1st Qu. MedianMean 3rd Qu.Max. 0 383 696 73811701789 where get_density() is function from here: https://slowkow.com/notes/ggplot2-color-by-density/ and keep only entries with density > 400 a=SNP[SNP$density>400,] and plot it again: p <- ggplot(a, mapping = aes(x = mean, y = var)) p <- p + geom_density_2d() + geom_point() + my.theme + ggtitle("SNPS_red") and probably I can increase that threshold... Any idea how do I interpret data points that are left contained within the ellipses? On Fri, Oct 9, 2020 at 6:09 PM Abby Spurdle wrote: > > You could assign a density value to each point. > Maybe you've done that already...? > > Then trim the lowest n (number of) data points > Or trim the lowest p (proportion of) data points. > > e.g. > Remove the data points with the 20 lowest density values. > Or remove the data points with the lowest 5% of density values. > > I'll let you decide whether that is a good idea or a bad idea. > And if it's a good idea, then how much to trim. > > > On Sat, Oct 10, 2020 at 5:47 AM Ana Marija > wrote: > > > > Hi Bert, > > > > Another confrontational response from you... > > > > You might have noticed that I use the word "outlier" carefully in this > > post and only in relation to the plotted ellipses. I do not know the > > underlying algorithm of geom_density_2d() and therefore I am having an > > issue of how to interpret the plot. I was hoping someone here knows > > that and can help me. > > > > Ana > > > > On Fri, Oct 9, 2020 at 11:31 AM Bert Gunter wrote: > > > > > > I recommend that you consult with a local statistical expert. Much of > > > what you say (outliers?!?) seems to make little sense, and your > > > statistical knowledge seems minimal. Perhaps more to the point, none of > > > your questions can be properly answered without subject matter context, > > > which this list is not designed to provide. That's why I believe you need > > > local expertise. > > > > > > Bert Gunter > > > > > > "The trouble with having an open mind is that people keep coming along > > > and sticking things into it." > > > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > > > > > > > > On Fri, Oct 9, 2020 at 8:25 AM Ana Marija > > > wrote: > > >> > > >> Hi Abby, > > >> > > >> thank you for getting back to me and for this useful information. > > >> > > >> I'm trying to detect the outliers in my distribution based of mean and > > >> variance. Can I see that from the plot I provided? Would outliers be > > >> outside of ellipses? If so how do I extract those from my data frame, > > >> based on which parameter? > > >> > > >> So I am trying to connect outliers based on what the plot is showing: > > >> s <- ggplot(SNP, mapping = aes(x = mean, y = var)) > > >> s <- s + geom_density_2d() + geom_point() + my.theme + ggtitle("SNPs") > > >> > > >> versus what is in the data: > > >> > > >> > head(SNP) > > >>mean var sd > > >> FQC.10090295 0.0327 0.002678 0.0517 > > >> FQC.10119363 0.0220 0.000978 0.0313 > > >> FQC.10132112 0.0275 0.002088 0.0457 > > >> FQC.10201128 0.0169 0.000289 0.0170 > > >> FQC.10208432 0.0443 0.004081 0.0639 > > >> FQC.10218466 0.0116 0.000131 0.0115 > > >> ... > > >> > > >> the distribution is not normal, it is right-skewed. > > >> > > >> Cheers, > > >> Ana > > >> > > >> On Fri, Oct 9, 2020 at 2:13 AM Abby Spurdle wrote: > > >> > > > >> > > My understanding is that this represents bivariate normal > > >> > > approximation of the data which uses the kernel density function to > > >> > > test for inclusion within a level set. (please correct me) > > >> > > > >> > You can fit a bivariate normal distribution by computing five > > >> > parameters. > > >> > Two means, two standard deviations (or two variances) and one > > >> > correlation (or covariance) coefficient. > > >> > The bivariate normal *has* elliptical contour
Re: [R] 2 D density plot interpretation and manipulating the data
Hi Bert, Another confrontational response from you... You might have noticed that I use the word "outlier" carefully in this post and only in relation to the plotted ellipses. I do not know the underlying algorithm of geom_density_2d() and therefore I am having an issue of how to interpret the plot. I was hoping someone here knows that and can help me. Ana On Fri, Oct 9, 2020 at 11:31 AM Bert Gunter wrote: > > I recommend that you consult with a local statistical expert. Much of what > you say (outliers?!?) seems to make little sense, and your statistical > knowledge seems minimal. Perhaps more to the point, none of your questions > can be properly answered without subject matter context, which this list is > not designed to provide. That's why I believe you need local expertise. > > Bert Gunter > > "The trouble with having an open mind is that people keep coming along and > sticking things into it." > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > > On Fri, Oct 9, 2020 at 8:25 AM Ana Marija wrote: >> >> Hi Abby, >> >> thank you for getting back to me and for this useful information. >> >> I'm trying to detect the outliers in my distribution based of mean and >> variance. Can I see that from the plot I provided? Would outliers be >> outside of ellipses? If so how do I extract those from my data frame, >> based on which parameter? >> >> So I am trying to connect outliers based on what the plot is showing: >> s <- ggplot(SNP, mapping = aes(x = mean, y = var)) >> s <- s + geom_density_2d() + geom_point() + my.theme + ggtitle("SNPs") >> >> versus what is in the data: >> >> > head(SNP) >>mean var sd >> FQC.10090295 0.0327 0.002678 0.0517 >> FQC.10119363 0.0220 0.000978 0.0313 >> FQC.10132112 0.0275 0.002088 0.0457 >> FQC.10201128 0.0169 0.000289 0.0170 >> FQC.10208432 0.0443 0.004081 0.0639 >> FQC.10218466 0.0116 0.000131 0.0115 >> ... >> >> the distribution is not normal, it is right-skewed. >> >> Cheers, >> Ana >> >> On Fri, Oct 9, 2020 at 2:13 AM Abby Spurdle wrote: >> > >> > > My understanding is that this represents bivariate normal >> > > approximation of the data which uses the kernel density function to >> > > test for inclusion within a level set. (please correct me) >> > >> > You can fit a bivariate normal distribution by computing five parameters. >> > Two means, two standard deviations (or two variances) and one >> > correlation (or covariance) coefficient. >> > The bivariate normal *has* elliptical contours. >> > >> > A kernel density estimate is usually regarded as an estimate of an >> > unknown density function. >> > Often they use a normal (or Gaussian) kernel, but I wouldn't describe >> > them as normal approximations. >> > In general, bivariate kernel density estimates do *not* have >> > elliptical contours. >> > But in saying that, if the data is close to normality, then contours >> > will be close to elliptical. >> > >> > Kernel density estimates do not test for inclusion, as such. >> > (But technically, there are some exceptions to that). >> > >> > I'm not sure what you're trying to achieve here. >> >> __ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] 2 D density plot interpretation and manipulating the data
Hi Abby, thank you for getting back to me and for this useful information. I'm trying to detect the outliers in my distribution based of mean and variance. Can I see that from the plot I provided? Would outliers be outside of ellipses? If so how do I extract those from my data frame, based on which parameter? So I am trying to connect outliers based on what the plot is showing: s <- ggplot(SNP, mapping = aes(x = mean, y = var)) s <- s + geom_density_2d() + geom_point() + my.theme + ggtitle("SNPs") versus what is in the data: > head(SNP) mean var sd FQC.10090295 0.0327 0.002678 0.0517 FQC.10119363 0.0220 0.000978 0.0313 FQC.10132112 0.0275 0.002088 0.0457 FQC.10201128 0.0169 0.000289 0.0170 FQC.10208432 0.0443 0.004081 0.0639 FQC.10218466 0.0116 0.000131 0.0115 ... the distribution is not normal, it is right-skewed. Cheers, Ana On Fri, Oct 9, 2020 at 2:13 AM Abby Spurdle wrote: > > > My understanding is that this represents bivariate normal > > approximation of the data which uses the kernel density function to > > test for inclusion within a level set. (please correct me) > > You can fit a bivariate normal distribution by computing five parameters. > Two means, two standard deviations (or two variances) and one > correlation (or covariance) coefficient. > The bivariate normal *has* elliptical contours. > > A kernel density estimate is usually regarded as an estimate of an > unknown density function. > Often they use a normal (or Gaussian) kernel, but I wouldn't describe > them as normal approximations. > In general, bivariate kernel density estimates do *not* have > elliptical contours. > But in saying that, if the data is close to normality, then contours > will be close to elliptical. > > Kernel density estimates do not test for inclusion, as such. > (But technically, there are some exceptions to that). > > I'm not sure what you're trying to achieve here. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] 2 D density plot interpretation and manipulating the data
My understanding is that this represents bivariate normal approximation of the data which uses the kernel density function to test for inclusion within a level set. (please correct me) In order to exclude the outlier to these ellipses/contours is it advisable to do something like this: SNP$density <- get_density(SNP$mean, SNP$var) > summary(SNP$density) Min. 1st Qu. MedianMean 3rd Qu.Max. 0 383 696 73811701789 where get_density() is function from here: https://slowkow.com/notes/ggplot2-color-by-density/ and then do something like this: a=SNP[SNP$density>400,] and plot it again: p <- ggplot(a, mapping = aes(x = mean, y = var)) p <- p + geom_density_2d() + geom_point() + my.theme + ggtitle("SNPS_red") On Thu, Oct 8, 2020 at 3:52 PM Ana Marija wrote: > > Hello, > > I have a data frame like this: > > > head(SNP) >mean var sd > FQC.10090295 0.0327 0.002678 0.0517 > FQC.10119363 0.0220 0.000978 0.0313 > FQC.10132112 0.0275 0.002088 0.0457 > FQC.10201128 0.0169 0.000289 0.0170 > FQC.10208432 0.0443 0.004081 0.0639 > FQC.10218466 0.0116 0.000131 0.0115 > ... > > and I am creating plot like this: > > s <- ggplot(SNP, mapping = aes(x = mean, y = var)) > s <- s + geom_density_2d() + geom_point() + my.theme + ggtitle("SNPs") > s > > I am getting plot in attach. > > My question is how do I: > 1.interpret the inclusion versus exclusion within the ellipses-contours? > > 2. how do I extract from my data frame the points which are outside of > ellipses? > > Thanks > Ana __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] 2 D density plot interpretation and manipulating the data
Hello, I have a data frame like this: > head(SNP) mean var sd FQC.10090295 0.0327 0.002678 0.0517 FQC.10119363 0.0220 0.000978 0.0313 FQC.10132112 0.0275 0.002088 0.0457 FQC.10201128 0.0169 0.000289 0.0170 FQC.10208432 0.0443 0.004081 0.0639 FQC.10218466 0.0116 0.000131 0.0115 ... and I am creating plot like this: s <- ggplot(SNP, mapping = aes(x = mean, y = var)) s <- s + geom_density_2d() + geom_point() + my.theme + ggtitle("SNPs") s I am getting plot in attach. My question is how do I: 1.interpret the inclusion versus exclusion within the ellipses-contours? 2. how do I extract from my data frame the points which are outside of ellipses? Thanks Ana snps.pdf Description: Adobe PDF document __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to turn column into column names and fill it with values
oh it seems that I can just use your last line of code and solve my problem: m2=tapply(mc$IID, list(FID=mc$FID, PLATE=mc$PLATE), mean) m2=as.data.frame(m2) library(data.table) m3=setDT(m2, keep.rownames = TRUE)[] colnames(m3)[1] <- "FID" mt=merge(mc,m3,by="FID" for(i in 4:ncol(mt)) mt[,i] <- 1 + (names(mt)[i]== mt$PLATE) Thanks! On Tue, Sep 29, 2020 at 12:08 PM Ana Marija wrote: > > HI Bert, > > thank you for getting back to me. > I tried this: > > > dat <- cbind(mc, matrix(0,ncol = 34)) > > head(dat) > FID IID PLATE 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 > 22 > 1 fam0110 G110 4RWG569 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > 0 > 2 fam0113 G113 cherry 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > 0 > 3 fam0114 G114 cherry 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > 0 > 4 fam0117 G117 4RWG569 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > 0 > 5 fam0118 G118 5XAV049 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > 0 > 6 fam0119 G119 cherry 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > 0 > 23 24 25 26 27 28 29 30 31 32 33 34 > 1 0 0 0 0 0 0 0 0 0 0 0 0 > 2 0 0 0 0 0 0 0 0 0 0 0 0 > 3 0 0 0 0 0 0 0 0 0 0 0 0 > 4 0 0 0 0 0 0 0 0 0 0 0 0 > 5 0 0 0 0 0 0 0 0 0 0 0 0 > 6 0 0 0 0 0 0 0 0 0 0 0 0 > > names(dat) <- c(names(dat)[1:34], unique(dat$PLATE)) > Error in names(dat) <- c(names(dat)[1:34], unique(dat$PLATE)) : > 'names' attribute [68] must be the same length as the vector [37] > > so names should include FID,IID,PLATE plus unique dat$PLATE > how do I fix that so the code works? > > Also I tried a bit on my own: > > > head(mc) > FID IID PLATE > 1 fam0110 G110 4RWG569 > 2 fam0113 G113 cherry > 3 fam0114 G114 cherry > 4 fam0117 G117 4RWG569 > 5 fam0118 G118 5XAV049 > 6 fam0119 G119 cherry > ... > > m2=tapply(mc$IID, list(FID=mc$FID, PLATE=mc$PLATE), mean) > m2=as.data.frame(m2) > library(data.table) > m3=setDT(m2, keep.rownames = TRUE)[] > colnames(m3)[1] <- "FID" > mt=merge(mc,m3,by="FID") > > > head(mt) > FID IID PLATE 0VXC556 1CNF297 1CWO500 1DXJ626 1LTX827 1SHK635 1TNP840 > 1 fam0110 G110 4RWG569 NA NA NA NA NA NA NA > 2 fam0113 G113 cherry NA NA NA NA NA NA NA > 3 fam0114 G114 cherry NA NA NA NA NA NA NA > 4 fam0117 G117 4RWG569 NA NA NA NA NA NA NA > 5 fam0118 G118 5XAV049 NA NA NA NA NA NA NA > 6 fam0119 G119 cherry NA NA NA NA NA NA NA > 1URP242 2BKX529 2PAG415 3DEF425 3ECO791 3FQM386 3KYJ479 3XHK903 4RWG569 > 1 NA NA NA NA NA NA NA NA NA > 2 NA NA NA NA NA NA NA NA NA > 3 NA NA NA NA NA NA NA NA NA > 4 NA NA NA NA NA NA NA NA NA > 5 NA NA NA NA NA NA NA NA NA > 6 NA NA NA NA NA NA NA NA NA > ... > > so this gives me the correct columns. Now is the question of how to > replace NA with 2 id column name matches the rownname in PLATE column > with 2 otherwise it is 1. > > Cheers, > Ana > > On Tue, Sep 29, 2020 at 11:46 AM Bert Gunter wrote: > > > > I am not sure reshape2 is appropriate for this task, but, assuming I > > understand correctly, it's quite easy without it. The following is one way, > > which probably can be done more elegantly and efficiently, but I think it > > does what you want. > > > > "dat" is your example data frame, in which the columns were read in with > > "stringsAsFactors" = FALSE (this is important!) > > > > dat <- cbind(dat, matrix(0,ncol = 3)) ## change 3 to 34 for your full data > > names(dat) <- c(names(dat)[1:3], unique(dat$PLATE)) > > for(i in 4:ncol(dat)) dat[,i] <- 1 + (names(dat)[i]== dat$PLATE) > > dat > > > > Result: > > > > FID IID PLATE 4RWG569 cherry 5XAV049 > > 1 fam0110 G110 4RWG569 2 1 1 > > 2 fam0113 G113 cherry 1 2 1 > > 3 fam0114 G114 cherry 1 2 1 > > 4 fam0117 G117 4RWG569 2 1 1 > > 5 fam0118 G118 5XAV049 1 1 2 > > 6 fam0119 G119 cherry 1 2 1 > > > > > > Bert Gunter > &
Re: [R] how to turn column into column names and fill it with values
HI Bert, thank you for getting back to me. I tried this: > dat <- cbind(mc, matrix(0,ncol = 34)) > head(dat) FID IID PLATE 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 1 fam0110 G110 4RWG569 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 fam0113 G113 cherry 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 fam0114 G114 cherry 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 fam0117 G117 4RWG569 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 fam0118 G118 5XAV049 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 fam0119 G119 cherry 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 23 24 25 26 27 28 29 30 31 32 33 34 1 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 0 > names(dat) <- c(names(dat)[1:34], unique(dat$PLATE)) Error in names(dat) <- c(names(dat)[1:34], unique(dat$PLATE)) : 'names' attribute [68] must be the same length as the vector [37] so names should include FID,IID,PLATE plus unique dat$PLATE how do I fix that so the code works? Also I tried a bit on my own: > head(mc) FID IID PLATE 1 fam0110 G110 4RWG569 2 fam0113 G113 cherry 3 fam0114 G114 cherry 4 fam0117 G117 4RWG569 5 fam0118 G118 5XAV049 6 fam0119 G119 cherry ... m2=tapply(mc$IID, list(FID=mc$FID, PLATE=mc$PLATE), mean) m2=as.data.frame(m2) library(data.table) m3=setDT(m2, keep.rownames = TRUE)[] colnames(m3)[1] <- "FID" mt=merge(mc,m3,by="FID") > head(mt) FID IID PLATE 0VXC556 1CNF297 1CWO500 1DXJ626 1LTX827 1SHK635 1TNP840 1 fam0110 G110 4RWG569 NA NA NA NA NA NA NA 2 fam0113 G113 cherry NA NA NA NA NA NA NA 3 fam0114 G114 cherry NA NA NA NA NA NA NA 4 fam0117 G117 4RWG569 NA NA NA NA NA NA NA 5 fam0118 G118 5XAV049 NA NA NA NA NA NA NA 6 fam0119 G119 cherry NA NA NA NA NA NA NA 1URP242 2BKX529 2PAG415 3DEF425 3ECO791 3FQM386 3KYJ479 3XHK903 4RWG569 1 NA NA NA NA NA NA NA NA NA 2 NA NA NA NA NA NA NA NA NA 3 NA NA NA NA NA NA NA NA NA 4 NA NA NA NA NA NA NA NA NA 5 NA NA NA NA NA NA NA NA NA 6 NA NA NA NA NA NA NA NA NA ... so this gives me the correct columns. Now is the question of how to replace NA with 2 id column name matches the rownname in PLATE column with 2 otherwise it is 1. Cheers, Ana On Tue, Sep 29, 2020 at 11:46 AM Bert Gunter wrote: > > I am not sure reshape2 is appropriate for this task, but, assuming I > understand correctly, it's quite easy without it. The following is one way, > which probably can be done more elegantly and efficiently, but I think it > does what you want. > > "dat" is your example data frame, in which the columns were read in with > "stringsAsFactors" = FALSE (this is important!) > > dat <- cbind(dat, matrix(0,ncol = 3)) ## change 3 to 34 for your full data > names(dat) <- c(names(dat)[1:3], unique(dat$PLATE)) > for(i in 4:ncol(dat)) dat[,i] <- 1 + (names(dat)[i]== dat$PLATE) > dat > > Result: > > FID IID PLATE 4RWG569 cherry 5XAV049 > 1 fam0110 G110 4RWG569 2 1 1 > 2 fam0113 G113 cherry 1 2 1 > 3 fam0114 G114 cherry 1 2 1 > 4 fam0117 G117 4RWG569 2 1 1 > 5 fam0118 G118 5XAV049 1 1 2 > 6 fam0119 G119 cherry 1 2 1 > > > Bert Gunter > > "The trouble with having an open mind is that people keep coming along and > sticking things into it." > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > > On Tue, Sep 29, 2020 at 9:19 AM Ana Marija > wrote: >> >> Hello, >> >> I have a data frame like this: >> >> > head(mc) >> FID IID PLATE >> 1 fam0110 G110 4RWG569 >> 2 fam0113 G113 cherry >> 3 fam0114 G114 cherry >> 4 fam0117 G117 4RWG569 >> 5 fam0118 G118 5XAV049 >> 6 fam0119 G119 cherry >> ... >> > dim(mc) >> [1] 16254 >> > length(unique(mc$PLATE)) >> [1] 34 >> >> I am trying to make a new data frame which would look like this: >> FID IID PLATE 4RWG569 cherry 5XAV049 ... >> 1 fam0110 G110 4RWG569 2 1 1 >> 2 fam0113 G113
[R] how to turn column into column names and fill it with values
Hello, I have a data frame like this: > head(mc) FID IID PLATE 1 fam0110 G110 4RWG569 2 fam0113 G113 cherry 3 fam0114 G114 cherry 4 fam0117 G117 4RWG569 5 fam0118 G118 5XAV049 6 fam0119 G119 cherry ... > dim(mc) [1] 16254 > length(unique(mc$PLATE)) [1] 34 I am trying to make a new data frame which would look like this: FID IID PLATE 4RWG569 cherry 5XAV049 ... 1 fam0110 G110 4RWG569 2 1 1 2 fam0113 G113 cherry 1 2 1 3 fam0114 G114 cherry 1 2 1 4 fam0117 G117 4RWG569 2 1 1 5 fam0118 G118 5XAV049 2 1 1 6 fam0119 G119 cherry 1 2 1 ... so the new data frame would have an additional 34 columns (for every unique mc$PLATE) and if in the row of PLATE column the value is ==to that column name I would have 2 otherwise 1 I tried to do this with: library(reshape2) > m2=dcast(mc, IID ~ PLATE) Using PLATE as value column: use value.var to override. Please advise, Ana __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] help with nesting if else statements
Hi Jeremie, when I try to reproduce your code this is what I get: > a=setDT(a) > head(a) FID IID CURRELIG PLASER RTNPTHY 1: fam0110 G1102 2 2 2: fam0113 G1132 2 2 3: fam0114 G1142 2 2 4: fam0117 G1172 2 2 5: fam0118 G1182 NA 2 6: fam0119 G1192 1 2 > a=a[,PHENO:=NA] > head(a) FID IID CURRELIG PLASER RTNPTHY PHENO 1: fam0110 G1102 2 2NA 2: fam0113 G1132 2 2NA 3: fam0114 G1142 2 2NA 4: fam0117 G1172 2 2NA 5: fam0118 G1182 NA 2NA 6: fam0119 G1192 1 2NA > a=a[PLASER==2|RTNPTHY==2,PHENO:=2] Warning message: In `[.data.table`(a, PLASER == 2 | RTNPTHY == 2, `:=`(PHENO, 2)) : 2.00 (type 'double') at RHS position 1 taken as TRUE when assigning to type 'logical' (column 6 named 'PHENO') Please advise, Ana On Wed, Sep 23, 2020 at 2:48 PM Jeremie Juste wrote: > > > Hello Ana Marija, > > I cannot reproduce your error, > > with a$PHENO=ifelse(a$PLASER==2 |a$RTNPTHY==2, 2, ifelse(a$CURRELIG==1 | > a$RTNPTHY==1,1,NA)) > For instance I have the expected PHENO=2 > > > FID IID CURRELIG PLASER RTNPTHY PHENO > > 39: fam5706 G57061 1 2 2 > > In general I find nested ifelse to be difficult to work with especially > when I am tired :-). I would suggest this alternative way instead. It uses > data.table and you can investigate each step if you need to. > > library(data.table) > setDT(a) > a[,PHENO:=NA] > a[PLASER==2|RTNPTHY==2,PHENO:=2] > a[is.na(PHENO)&(CURRELIG==1|RTNPTHY==1),PHENO:=1] > > > HTH, > Jeremie > > a <- read.table(text="FID,IID,CURRELIG,PLASER,RTNPTHY > fam5610,G5610,1,1,1 > fam5614,G5614,1,2,2 > fam5615,G5615,1,1,1 > fam5618,G5618,1,1,2 > fam5621,G5621,1,1,1 > fam5624,G5624,1,1,2 > fam5625,G5625,1,1,1 > fam5628,G5628,1,2,2 > fam5633,G5633,1,2,2 > fam5634,G5634,1,1,1 > fam5635,G5635,2,2,2 > fam5636,G5636,1,1,1 > fam5641,G5641,1,1,1 > fam5645,G5645,2,1,2 > fam5646,G5646,2,2,2 > fam5654,G5654,1,2,2 > fam5655,G5655,1,2,2 > fam5656,G5656,2,2,2 > fam5658,G5658,1,1,1 > fam5659,G5659,2,2,2 > fam5660,G5660,1,1,1 > fam5661,G5661,2,2,2 > fam5664,G5664,1,1,1 > fam5666,G5666,1,1,1 > fam5667,G5667,1,1,2 > fam5670,G5670,1,1,1 > fam5671,G5671,1,1,2 > fam5672,G5672,1,1,2 > fam5673,G5673,1,1,1 > fam5680,G5680,1,2,2 > fam5686,G5686,1,2,2 > fam5687,G5687,1,2,2 > fam5688,G5688,1,1,2 > fam5693,G5693,2,1,1 > fam5695,G5695,1,1,1 > fam5697,G5697,1,1,1 > fam5700,G5700,1,2,2 > fam5701,G5701,1,1,1 > fam5706,G5706,1,1,2 > fam5709,G5709,1,1,1 > fam5713,G5713,1,1,1 > fam5715,G5715,1,1,1 > fam5718,G5718,1,1,1",sep=",", header=TRUE) > > > > > > __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] help with nesting if else statements
I tried doing this: a$PHENO=ifelse(a$PLASER==2 | a$RTNPTHY==2,2,ifelse(a$CURRELIG==1 | a$RTNPTHY==1,1,NA)) which brought be closer to the solution, but now I have lines like this: FID IID CURRELIG PLASER RTNPTHY PHENO fam3151 G31511 1 NANA fam3149 G31492 1 NANA fam3151 G31511 1 NANA fam0637 G6372 NA NANA fam4483 G44831 NA NANA I would like these lines to look like this: FID IID CURRELIG PLASER RTNPTHY PHENO fam3151 G31511 1 NA1 fam3149 G31492 1 NA2 fam3151 G31511 1 NA 1 fam0637 G6372 NA NA2 fam4483 G44831 NA NA1 in addition to what this command does a$PHENO=ifelse(a$PLASER==2 | a$RTNPTHY==2,2,ifelse(a$CURRELIG==1 | a$RTNPTHY==1,1,NA)) On Wed, Sep 23, 2020 at 11:43 AM Ana Marija wrote: > > Hello, > > I have a data frame as shown bellow. > I want to create a new column PHENO which will be defined as follows: > if CURRELIG==1 -> PHENO==1 > in the above subset those that have: > PLASER==2 -> PHENO==2 > and > those where RTNPTHY==1 -> PHENO==1 > > I tried doing this: > a$PHENO=ifelse(a$CURRELIG==1 | a$RTNPTHY==1 ,1,ifelse(a$PLASER==2 | > a$RTNPTHY==2,2,NA)) > > but this give me some lines where I am not seeing results that I want, > for example: > FID IID CURRELIG PLASER RTNPTHY PHENO > fam5628 G56281 2 21 > > here the PHENO should be =2 because RTNPTHY==2 and PLASER==2 > PHENO should be ==2 when either RTNPTHY==2 or PLASER==2 > > another wrong line is this: > FID IID CURRELIG PLASER RTNPTHY PHENO > fam5706G570611 2 1 > > again RTNPTHY ==2 and PHENO==1 instead of 2. > > My data looks like this: > FID IID CURRELIG PLASER RTNPTHY > fam5610 G56101 1 1 > fam5614 G56141 2 2 > fam5615 G56151 1 1 > fam5618 G56181 1 2 > fam5621 G56211 1 1 > fam5624 G56241 1 2 > fam5625 G56251 1 1 > fam5628 G56281 2 2 > fam5633 G56331 2 2 > fam5634 G56341 1 1 > fam5635 G56352 2 2 > fam5636 G56361 1 1 > fam5641 G56411 1 1 > fam5645 G56452 1 2 > fam5646 G56462 2 2 > fam5654 G56541 2 2 > fam5655 G56551 2 2 > fam5656 G56562 2 2 > fam5658 G56581 1 1 > fam5659 G56592 2 2 > fam5660 G56601 1 1 > fam5661 G56612 2 2 > fam5664 G56641 1 1 > fam5666 G56661 1 1 > fam5667 G56671 1 2 > fam5670 G56701 1 1 > fam5671 G56711 1 2 > fam5672 G56721 1 2 > fam5673 G56731 1 1 > fam5680 G56801 2 2 > fam5686 G56861 2 2 > fam5687 G56871 2 2 > fam5688 G56881 1 2 > fam5693 G56932 1 1 > fam5695 G56951 1 1 > fam5697 G56971 1 1 > fam5700 G57001 2 2 > fam5701 G57011 1 1 > fam5706 G57061 1 2 > fam5709 G57091 1 1 > fam5713 G57131 1 1 > fam5715 G57151 1 1 > fam5718 G57181 1 1 > > Please advise, > Ana __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] help with nesting if else statements
Hello, I have a data frame as shown bellow. I want to create a new column PHENO which will be defined as follows: if CURRELIG==1 -> PHENO==1 in the above subset those that have: PLASER==2 -> PHENO==2 and those where RTNPTHY==1 -> PHENO==1 I tried doing this: a$PHENO=ifelse(a$CURRELIG==1 | a$RTNPTHY==1 ,1,ifelse(a$PLASER==2 | a$RTNPTHY==2,2,NA)) but this give me some lines where I am not seeing results that I want, for example: FID IID CURRELIG PLASER RTNPTHY PHENO fam5628 G56281 2 21 here the PHENO should be =2 because RTNPTHY==2 and PLASER==2 PHENO should be ==2 when either RTNPTHY==2 or PLASER==2 another wrong line is this: FID IID CURRELIG PLASER RTNPTHY PHENO fam5706G570611 2 1 again RTNPTHY ==2 and PHENO==1 instead of 2. My data looks like this: FID IID CURRELIG PLASER RTNPTHY fam5610 G56101 1 1 fam5614 G56141 2 2 fam5615 G56151 1 1 fam5618 G56181 1 2 fam5621 G56211 1 1 fam5624 G56241 1 2 fam5625 G56251 1 1 fam5628 G56281 2 2 fam5633 G56331 2 2 fam5634 G56341 1 1 fam5635 G56352 2 2 fam5636 G56361 1 1 fam5641 G56411 1 1 fam5645 G56452 1 2 fam5646 G56462 2 2 fam5654 G56541 2 2 fam5655 G56551 2 2 fam5656 G56562 2 2 fam5658 G56581 1 1 fam5659 G56592 2 2 fam5660 G56601 1 1 fam5661 G56612 2 2 fam5664 G56641 1 1 fam5666 G56661 1 1 fam5667 G56671 1 2 fam5670 G56701 1 1 fam5671 G56711 1 2 fam5672 G56721 1 2 fam5673 G56731 1 1 fam5680 G56801 2 2 fam5686 G56861 2 2 fam5687 G56871 2 2 fam5688 G56881 1 2 fam5693 G56932 1 1 fam5695 G56951 1 1 fam5697 G56971 1 1 fam5700 G57001 2 2 fam5701 G57011 1 1 fam5706 G57061 1 2 fam5709 G57091 1 1 fam5713 G57131 1 1 fam5715 G57151 1 1 fam5718 G57181 1 1 Please advise, Ana __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to overlay two histograms
HI Jim, fantastic solution! Thank you so much!!! Ana On Thu, Sep 17, 2020 at 6:01 PM Jim Lemon wrote: > > Hi Ana, > Sorry it's not in ggplot, but it may help: > > d<-read.table(text="CHR counts name > 1 193554 old > 2 220816 old > 3 174350 old > 4 163112 old > 5 168125 old > 6 182366 old > 7 143023 old > 8 147410 old > 9 122112 old > 10 138394 old > 11 130069 old > 12 124850 old > 13 104119 old > 14 83931 old > 15 72287 old > 16 71550 old > 17 58380 old > 18 76812 old > 19 37040 old > 20 63407 old > 21 33863 old > 22 33812 old > 1 202783 new > 2 252124 new > 3 213337 new > 4 201001 new > 5 207606 new > 6 228133 new > 7 147218 new > 8 177518 new > 9 121276 new > 10 163447 new > 11 158724 new > 12 142183 new > 13 89 new > 14 83043 new > 15 61063 new > 16 55439 new > 17 32883 new > 18 69135 new > 19 16624 new > 20 48541 new > 21 25479 new > 22 19698 new", > header=TRUE,stingsAsFactors=FALSE) > barpos<-barplot(counts~name+CHR,data=d,beside=TRUE,names.arg=rep("",22)) > legend(40,22,c("new","old"),fill=c("gray20","gray80")) > library(plotrix) > staxlab(1,at=colMeans(barpos),labels=1:22) > > Jim > > On Fri, Sep 18, 2020 at 8:05 AM Ana Marija > wrote: > > > > Hello, > > > > I am trying to overlay two histograms with this: > > > > p <- ggplot(d, aes(CHR, counts, fill = name)) + geom_bar(position = "dodge") > > p > > > > but I am getting this error: > > Error: stat_count() can only have an x or y aesthetic. > > Run `rlang::last_error()` to see where the error occurred. > > > > my data is this: > > > > > d > >CHR counts name > > 11 193554 old > > 22 220816 old > > 33 174350 old > > 44 163112 old > > 55 168125 old > > 66 182366 old > > 77 143023 old > > 88 147410 old > > 99 122112 old > > 10 10 138394 old > > 11 11 130069 old > > 12 12 124850 old > > 13 13 104119 old > > 14 14 83931 old > > 15 15 72287 old > > 16 16 71550 old > > 17 17 58380 old > > 18 18 76812 old > > 19 19 37040 old > > 20 20 63407 old > > 21 21 33863 old > > 22 22 33812 old > > 23 1 202783 new > > 24 2 252124 new > > 25 3 213337 new > > 26 4 201001 new > > 27 5 207606 new > > 28 6 228133 new > > 29 7 147218 new > > 30 8 177518 new > > 31 9 121276 new > > 32 10 163447 new > > 33 11 158724 new > > 34 12 142183 new > > 35 13 89 new > > 36 14 83043 new > > 37 15 61063 new > > 38 16 55439 new > > 39 17 32883 new > > 40 18 69135 new > > 41 19 16624 new > > 42 20 48541 new > > 43 21 25479 new > > 44 22 19698 new > > > > Basically I need to show counts per CHR in "old" and "new" side by side. > > > > Please advise, > > Ana > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] how to overlay two histograms
Hello, I am trying to overlay two histograms with this: p <- ggplot(d, aes(CHR, counts, fill = name)) + geom_bar(position = "dodge") p but I am getting this error: Error: stat_count() can only have an x or y aesthetic. Run `rlang::last_error()` to see where the error occurred. my data is this: > d CHR counts name 11 193554 old 22 220816 old 33 174350 old 44 163112 old 55 168125 old 66 182366 old 77 143023 old 88 147410 old 99 122112 old 10 10 138394 old 11 11 130069 old 12 12 124850 old 13 13 104119 old 14 14 83931 old 15 15 72287 old 16 16 71550 old 17 17 58380 old 18 18 76812 old 19 19 37040 old 20 20 63407 old 21 21 33863 old 22 22 33812 old 23 1 202783 new 24 2 252124 new 25 3 213337 new 26 4 201001 new 27 5 207606 new 28 6 228133 new 29 7 147218 new 30 8 177518 new 31 9 121276 new 32 10 163447 new 33 11 158724 new 34 12 142183 new 35 13 89 new 36 14 83043 new 37 15 61063 new 38 16 55439 new 39 17 32883 new 40 18 69135 new 41 19 16624 new 42 20 48541 new 43 21 25479 new 44 22 19698 new Basically I need to show counts per CHR in "old" and "new" side by side. Please advise, Ana __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to represent the effect of one covariate on regression results?
Hi David, thanks for the useful insight I did of course wrote to plink user group but no answer there. I guess they are more concerned about how to run commands with plink as oppose to interpret results. What I can tell about my cohort is that about 80% of cases had Type 2 diabetes while about 8% had Type 1. (my TD covariate is reference for the type of diabetes) In the attach is the description of the data. Cheers, Ana On Tue, Sep 15, 2020 at 7:59 PM David Winsemius wrote: > > > On 9/15/20 8:57 AM, Ana Marija wrote: > > Hi Abby and David, > > > > Thanks for the useful tips! I will check those. > > > > I completed the regression analysis in plink (as R would be very slow > > for my sample size) but as I mentioned I need to determine the > > influence of a specific covariate in my results and Plink is of no > > help there. > > > > I did Pearson correlation analysis for P values which I got in > > regression with and without my covariate of interest and I got this: > > > >> cor.test(tt$P_TD, tt$P_noTD, method = "pearson", conf.level = 0.95) > > Pearson's product-moment correlation > > > > data: tt$P_TD and tt$P_noTD > > t = 20.17, df = 283, p-value < 2.2e-16 > > alternative hypothesis: true correlation is not equal to 0 > > 95 percent confidence interval: > > 0.7156134 0.8117108 > > sample estimates: > >cor > > 0.7679493 > > > > I can see the p values are very correlated in those two instances. Can > > I conclude that my covariate then doesn't have a huge effect or what > > kind of conclusion I can draw from that? > > > I do not think it follows from the correlation of p-values that your > covariate "does not have a huge effect". P-values are not really data, > although they are random values. A simulation study of this would > require a much better description of the original dataset. Again, that > is something that the users of Plink are more likely to be able to > intuit than are we. I still do not see why this question is not being > addressed to the users of the software from which you are deriving your > "data". > > > -- > > David. > > > > > Thanks for all your help > > Ana > > > > > > > > On Tue, Sep 15, 2020 at 1:26 AM David Winsemius > > wrote: > >> There is a user-group for PLINK, easily found by looking at the page you > >> cited. This is not the correct place to submit such questions. > >> > >> > >> https://groups.google.com/g/plink2-users?pli=1 > >> > >> > >> -- > >> > >> David. > >> > >> On 9/14/20 6:29 AM, Ana Marija wrote: > >>> Hello, > >>> > >>> I was running association analysis using --glm genotypic from: > >>> https://www.cog-genomics.org/plink/2.0/assoc with these covariates: > >>> sex,age,PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10,TD,array,HBA1C. The > >>> result looks like this: > >>> > >>> #CHROMPOSIDREFALTA1TESTOBS_CTBETA > >>> SEZ_OR_F_STATPERRCODE > >>> 10135434303rs11101905GAAADD11863 > >>> -0.1107330.0986981-1.121930.261891. > >>> 10135434303rs11101905GAADOMDEV11863 > >>> 0.0797970.1110040.7188680.47. > >>> 10135434303rs11101905GAAsex=Female > >>> 11863-0.1204040.0536069-2.246050.0247006. > >>> 10135434303rs11101905GAAage11863 > >>> 0.005245010.003915281.339630.180367. > >>> 10135434303rs11101905GAAPC111863 > >>> -0.01917790.0166868-1.149280.25044. > >>> 10135434303rs11101905GAAPC211863 > >>> -0.02699390.0173086-1.559570.118863. > >>> 10135434303rs11101905GAAPC311863 > >>> 0.01152070.01680760.6854480.493061. > >>> 10135434303rs11101905GAAPC411863 > >>> 9.57832e-050.01246070.00768680.993867. > >>> 10135434303rs11101905GAAPC511863 > >>> -0.001910470.00543937-0.351230.725416. > >>> 10135434303rs11101905GAAPC611863 > >>> -0.01033090.0159879-0.6461720.5181
Re: [R] How to represent the effect of one covariate on regression results?
Hi Abby and David, Thanks for the useful tips! I will check those. I completed the regression analysis in plink (as R would be very slow for my sample size) but as I mentioned I need to determine the influence of a specific covariate in my results and Plink is of no help there. I did Pearson correlation analysis for P values which I got in regression with and without my covariate of interest and I got this: > cor.test(tt$P_TD, tt$P_noTD, method = "pearson", conf.level = 0.95) Pearson's product-moment correlation data: tt$P_TD and tt$P_noTD t = 20.17, df = 283, p-value < 2.2e-16 alternative hypothesis: true correlation is not equal to 0 95 percent confidence interval: 0.7156134 0.8117108 sample estimates: cor 0.7679493 I can see the p values are very correlated in those two instances. Can I conclude that my covariate then doesn't have a huge effect or what kind of conclusion I can draw from that? Thanks for all your help Ana On Tue, Sep 15, 2020 at 1:26 AM David Winsemius wrote: > > There is a user-group for PLINK, easily found by looking at the page you > cited. This is not the correct place to submit such questions. > > > https://groups.google.com/g/plink2-users?pli=1 > > > -- > > David. > > On 9/14/20 6:29 AM, Ana Marija wrote: > > Hello, > > > > I was running association analysis using --glm genotypic from: > > https://www.cog-genomics.org/plink/2.0/assoc with these covariates: > > sex,age,PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10,TD,array,HBA1C. The > > result looks like this: > > > > #CHROMPOSIDREFALTA1TESTOBS_CTBETA > >SEZ_OR_F_STATPERRCODE > > 10135434303rs11101905GAAADD11863 > > -0.1107330.0986981-1.121930.261891. > > 10135434303rs11101905GAADOMDEV11863 > > 0.0797970.1110040.7188680.47. > > 10135434303rs11101905GAAsex=Female > > 11863-0.1204040.0536069-2.246050.0247006. > > 10135434303rs11101905GAAage11863 > > 0.005245010.003915281.339630.180367. > > 10135434303rs11101905GAAPC111863 > > -0.01917790.0166868-1.149280.25044. > > 10135434303rs11101905GAAPC211863 > > -0.02699390.0173086-1.559570.118863. > > 10135434303rs11101905GAAPC311863 > > 0.01152070.01680760.6854480.493061. > > 10135434303rs11101905GAAPC411863 > > 9.57832e-050.01246070.00768680.993867. > > 10135434303rs11101905GAAPC511863 > > -0.001910470.00543937-0.351230.725416. > > 10135434303rs11101905GAAPC611863 > > -0.01033090.0159879-0.6461720.518168. > > 10135434303rs11101905GAAPC711863 > > 0.007909970.01440250.5492070.582863. > > 10135434303rs11101905GAAPC811863 > > -0.002056390.0142709-0.1440960.885424. > > 10135434303rs11101905GAAPC911863 > > -0.008737710.0057239-1.526530.126878. > > 10135434303rs11101905GAAPC1011863 > > 0.01161970.01238260.9383880.348045. > > 10135434303rs11101905GAATD11863 > > -0.6700260.0962216-6.963373.32228e-12. > > 10135434303rs11101905GAAarray=Biobank > > 118630.1606660.0736312.182050.0291062. > > 10135434303rs11101905GAAHBA1C11863 > > 0.02659330.0016875815.75836.0236e-56. > > 10135434303rs11101905GAAGENO_2DF11863 > >NANA0.7265140.483613. > > > > This results is shown just for one ID (rs11101905) there is about 2 > > million of those in the resulting file. > > > > My question is how do I present/plot the effect of covariate "TD" in > > the example it has "P" equal to 3.32228e-12 for all IDs in the > > resulting file so that I show how much effect covariate "TD" has on > > the analysis. Should I run another regression without covariate "TD" > > and than do scatter plot of P values with and without "TD" covariate > > or there is a better way to do this from the data I already have? > > > > Thanks > > Ana >
Re: [R] how to replace values in a named vector
sorry not replace with NA but with empty string for a name, for example for example this: > geneSymbol["Ku8QhfS0n_hIOABXuE"] Ku8QhfS0n_hIOABXuE "MACC1" would go when I subject it to > geneSymbol["Ku8QhfS0n_hIOABXuE"] Ku8QhfS0n_hIOABXuE On Mon, Sep 14, 2020 at 11:35 AM Ana Marija wrote: > > Hello, > > I have a vector like this: > > > head(geneSymbol) > Ku8QhfS0n_hIOABXuE Bx496XsFXiAlj.Eaeo W38p0ogk.wIBVRXllY > QIBkqIS9LR5DfTlTS8 BZKiEvS0eQ305U0v34 6TheVd.HiE1UF3lX6g >"MACC1""GGACT" "A4GALT" > "NPSR1-AS1""NPSR1-AS1" "AAAS" > > it has around 15000 entries. How do I replace all values with NA > expect these that are named like this: > > geneSymbol[c("0lQ1XozriVZTn.PezY","uaeFiCdegrnWFijF_s","ZOluqaxSe3ndekoNng","912ny6eCHjnlY2XSCU","odF3XHR8CVl4SAUaUQ")] > > > > geneSymbol[c("0lQ1XozriVZTn.PezY","uaeFiCdegrnWFijF_s","ZOluqaxSe3ndekoNng","912ny6eCHjnlY2XSCU","odF3XHR8CVl4SAUaUQ")] > 0lQ1XozriVZTn.PezY uaeFiCdegrnWFijF_s ZOluqaxSe3ndekoNng > 912ny6eCHjnlY2XSCU odF3XHR8CVl4SAUaUQ > "FLCN" "FLCN" "FLCN" > "UCA1" "IL1B" > > Thanks > Ana __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] how to replace values in a named vector
Hello, I have a vector like this: > head(geneSymbol) Ku8QhfS0n_hIOABXuE Bx496XsFXiAlj.Eaeo W38p0ogk.wIBVRXllY QIBkqIS9LR5DfTlTS8 BZKiEvS0eQ305U0v34 6TheVd.HiE1UF3lX6g "MACC1""GGACT" "A4GALT" "NPSR1-AS1""NPSR1-AS1" "AAAS" it has around 15000 entries. How do I replace all values with NA expect these that are named like this: geneSymbol[c("0lQ1XozriVZTn.PezY","uaeFiCdegrnWFijF_s","ZOluqaxSe3ndekoNng","912ny6eCHjnlY2XSCU","odF3XHR8CVl4SAUaUQ")] > geneSymbol[c("0lQ1XozriVZTn.PezY","uaeFiCdegrnWFijF_s","ZOluqaxSe3ndekoNng","912ny6eCHjnlY2XSCU","odF3XHR8CVl4SAUaUQ")] 0lQ1XozriVZTn.PezY uaeFiCdegrnWFijF_s ZOluqaxSe3ndekoNng 912ny6eCHjnlY2XSCU odF3XHR8CVl4SAUaUQ "FLCN" "FLCN" "FLCN" "UCA1" "IL1B" Thanks Ana __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How to represent the effect of one covariate on regression results?
Hello, I was running association analysis using --glm genotypic from: https://www.cog-genomics.org/plink/2.0/assoc with these covariates: sex,age,PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10,TD,array,HBA1C. The result looks like this: #CHROMPOSIDREFALTA1TESTOBS_CTBETA SEZ_OR_F_STATPERRCODE 10135434303rs11101905GAAADD11863 -0.1107330.0986981-1.121930.261891. 10135434303rs11101905GAADOMDEV11863 0.0797970.1110040.7188680.47. 10135434303rs11101905GAAsex=Female 11863-0.1204040.0536069-2.246050.0247006. 10135434303rs11101905GAAage11863 0.005245010.003915281.339630.180367. 10135434303rs11101905GAAPC111863 -0.01917790.0166868-1.149280.25044. 10135434303rs11101905GAAPC211863 -0.02699390.0173086-1.559570.118863. 10135434303rs11101905GAAPC311863 0.01152070.01680760.6854480.493061. 10135434303rs11101905GAAPC411863 9.57832e-050.01246070.00768680.993867. 10135434303rs11101905GAAPC511863 -0.001910470.00543937-0.351230.725416. 10135434303rs11101905GAAPC611863 -0.01033090.0159879-0.6461720.518168. 10135434303rs11101905GAAPC711863 0.007909970.01440250.5492070.582863. 10135434303rs11101905GAAPC811863 -0.002056390.0142709-0.1440960.885424. 10135434303rs11101905GAAPC911863 -0.008737710.0057239-1.526530.126878. 10135434303rs11101905GAAPC1011863 0.01161970.01238260.9383880.348045. 10135434303rs11101905GAATD11863 -0.6700260.0962216-6.963373.32228e-12. 10135434303rs11101905GAAarray=Biobank 118630.1606660.0736312.182050.0291062. 10135434303rs11101905GAAHBA1C11863 0.02659330.0016875815.75836.0236e-56. 10135434303rs11101905GAAGENO_2DF11863 NANA0.7265140.483613. This results is shown just for one ID (rs11101905) there is about 2 million of those in the resulting file. My question is how do I present/plot the effect of covariate "TD" in the example it has "P" equal to 3.32228e-12 for all IDs in the resulting file so that I show how much effect covariate "TD" has on the analysis. Should I run another regression without covariate "TD" and than do scatter plot of P values with and without "TD" covariate or there is a better way to do this from the data I already have? Thanks Ana __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to extract information from .Rdata format
do you think that this is useful output from Basics of R? > load("paired_example.Rdata") > str(rawdata) num [1:4482, 1:10] 46 4 3 48 1 4 0 60 0 12 ... - attr(*, "dimnames")=List of 2 ..$ : chr [1:4482] "gene1" "gene2" "gene3" "gene4" ... ..$ : chr [1:10] "a.cancer" "b.cancer" "c.cancer" "d.cancer" .. I think this one is better: > dat<-local(get(load("paired_example.Rdata"))) > head(dat) a.cancer b.cancer c.cancer d.cancer e.cancer a.normal b.normal c.normal gene1 464 3358 615 42 gene240215 241 30 gene334421300 gene4 482 1006943 gene515236103 gene640011407 On Fri, Jul 31, 2020 at 11:05 PM Bert Gunter wrote: > > Sarah has explained all. > > I agree with her about the need for tutorials also. This list cannot > substitute for such homework on your own. > > Bert Gunter > > "The trouble with having an open mind is that people keep coming along and > sticking things into it." > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > > On Fri, Jul 31, 2020 at 7:55 PM Ana Marija > wrote: >> >> Hi Bert, >> >> it gives me this: >> >> > a=load("paired_example.Rdata") >> > str(a) >> chr [1:3] "rawdata" "treatment" "patient" >> >> I don't know how to extract "treatment" for example in a data frame. >> >> I tried this but of no help. >> > b=a[[2]] >> > b >> [1] "treatment" >> >> > str(treatment) >> chr [1:10] "treat" "treat" "treat" "treat" "treat" "control" "control" ... >> >> but this is not the format I need. >> >> On Fri, Jul 31, 2020 at 9:18 PM Ana Marija >> wrote: >> > >> > Hello, >> > >> > I have this file: >> > > a=load("paired_example.Rdata") >> > > a >> > [1] "rawdata" "treatment" "patient" >> > >> > I can extract "rawdata" with: >> > dat<-local(get(load("paired_example.Rdata"))) >> > >> > Can you please advise how would I extract in data frame "treatment" >> > and "patient"? >> > >> > Thanks >> > Ana >> >> __ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to extract information from .Rdata format
It seems that "treatment" and "patient" are just vectors. > treatment [1] "treat" "treat" "treat" "treat" "treat" "control" "control" [8] "control" "control" "control" > patient [1] "a" "b" "c" "d" "e" "a" "b" "c" "d" "e" On Fri, Jul 31, 2020 at 9:53 PM Ana Marija wrote: > > Hi Bert, > > it gives me this: > > > a=load("paired_example.Rdata") > > str(a) > chr [1:3] "rawdata" "treatment" "patient" > > I don't know how to extract "treatment" for example in a data frame. > > I tried this but of no help. > > b=a[[2]] > > b > [1] "treatment" > > > str(treatment) > chr [1:10] "treat" "treat" "treat" "treat" "treat" "control" "control" ... > > but this is not the format I need. > > On Fri, Jul 31, 2020 at 9:18 PM Ana Marija > wrote: > > > > Hello, > > > > I have this file: > > > a=load("paired_example.Rdata") > > > a > > [1] "rawdata" "treatment" "patient" > > > > I can extract "rawdata" with: > > dat<-local(get(load("paired_example.Rdata"))) > > > > Can you please advise how would I extract in data frame "treatment" > > and "patient"? > > > > Thanks > > Ana __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to extract information from .Rdata format
Hi Bert, it gives me this: > a=load("paired_example.Rdata") > str(a) chr [1:3] "rawdata" "treatment" "patient" I don't know how to extract "treatment" for example in a data frame. I tried this but of no help. > b=a[[2]] > b [1] "treatment" > str(treatment) chr [1:10] "treat" "treat" "treat" "treat" "treat" "control" "control" ... but this is not the format I need. On Fri, Jul 31, 2020 at 9:18 PM Ana Marija wrote: > > Hello, > > I have this file: > > a=load("paired_example.Rdata") > > a > [1] "rawdata" "treatment" "patient" > > I can extract "rawdata" with: > dat<-local(get(load("paired_example.Rdata"))) > > Can you please advise how would I extract in data frame "treatment" > and "patient"? > > Thanks > Ana __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How to extract information from .Rdata format
Hello, I have this file: > a=load("paired_example.Rdata") > a [1] "rawdata" "treatment" "patient" I can extract "rawdata" with: dat<-local(get(load("paired_example.Rdata"))) Can you please advise how would I extract in data frame "treatment" and "patient"? Thanks Ana __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to loop over two files ...
Thank you so much as.character(r) indeed resolved the issue! On Sat, Jun 20, 2020 at 3:47 AM Ivan Krylov wrote: > > On Fri, 19 Jun 2020 19:36:41 -0500 > Ana Marija wrote: > > > Error in cat(x, file = file, sep = c(rep.int(sep, ncolumns - 1), > > "\n"), : argument 1 (type 'list') cannot be handled by 'cat' > > It might be a good idea to try to solve problems like this yourself > instead of waiting for hours for someone to reply. All the required > information is there in the error message: write() fails because r is a > list. Why is r a list? It's returned from GET(), so let's read its > documentation. > > httr::GET() returns a response object, not a string [1]. Try passing > as.character(r) or content(r,'text') instead of just r to write(...) or > use a different way of extracting the actual response from the response > object. > > -- > Best regards, > Ivan > > [1] https://httr.r-lib.org/reference/GET.html __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to loop over two files ...
unfortunately it complains again: > f1 <- read_tsv("1g", col_names=F) Parsed with column specification: cols( X1 = col_character() ) > f2 <- read_tsv("1n", col_names=F) Parsed with column specification: cols( X1 = col_character() ) > for ( a in rownames(f1) ) { + +for ( b in rownames(f2) ) { + + ext <- paste0( "/ld/human/pairwise/", + f1[a,1], + "/", + f2[b,1], + "?population_name=1000GENOMES:phase_3:KHV") + + r <- GET(paste(server, ext, sep = ""), + content_type("application/json")) + + write(r,file="list.txt",append=TRUE) + + +} + + } Error in cat(x, file = file, sep = c(rep.int(sep, ncolumns - 1), "\n"), : argument 1 (type 'list') cannot be handled by 'cat' > traceback() 2: cat(x, file = file, sep = c(rep.int(sep, ncolumns - 1), "\n"), append = append) 1: write(r, file = "list.txt", append = TRUE) On Fri, Jun 19, 2020 at 5:19 PM wrote: > > Sorry - its been a long week! > > there is a foreach package but I try to avoid extras > > make your for statements: > > for ( a in rownames(f1) ) { > > # a will now be a row number rather than the value, so replace ' a ' in > the paste0 with: f1[ a, 1] > > so > > ext <- paste0( "/ld/human/pairwise/", > f1[a,1], > "/", > f2[b,1], > "?population_name=1000GENOMES:phase_3:KHV") > > On 2020-06-19 22:54, Ana Marija wrote: > > I tried it: > > > > > library(httr) > >> library(jsonlite) > >> library(xml2) > >> library(readr) > >> server <- "http://rest.ensembl.org"; > >> f1 <- read_tsv("1g", col_names=F) > > Parsed with column specification: > > cols( > > X1 = col_character() > > ) > >> f2 <- read_tsv("1n", col_names=F) > > Parsed with column specification: > > cols( > > X1 = col_character() > > ) > >> > >> for ( a in as.list(f1[,1]) ) { > > + > > +for ( b in as.list(f2[,1]) ) { > > + > > + ext <- paste0( "/ld/human/pairwise/", > > + a, > > + "/", > > + b, > > + "?population_name=1000GENOMES:phase_3:KHV") > > + > > + r <- GET(paste(server, ext, sep = ""), > > + content_type("application/json")) > > + > > + write(r,file="list.txt",append=TRUE) > > + > > + > > +} > > + > > + } > > Error in parse_url(url) : length(url) == 1 is not TRUE > > > >> traceback() > > 10: stop(simpleError(msg, call = if (p <- sys.parent(1L)) sys.call(p))) > > 9: stopifnot(length(url) == 1) > > 8: parse_url(url) > > 7: is.url(url) > > 6: stopifnot(is.url(url)) > > 5: build_url(parse_url(url)[c("scheme", "hostname", "port")]) > > 4: handle_name(url) > > 3: handle_find(url) > > 2: handle_url(handle, url, ...) > > 1: GET(paste(server, ext, sep = ""), content_type("application/json")) > > > > On Fri, Jun 19, 2020 at 4:41 PM wrote: > >> > >> Oh - read.text isn't in base! Not sure where is came from (my head > >> mostly!) You may have something that adds it but better to use > >> something that works. So try using: > >> > >> library(readr) > >> f1 <- read_tsv("1g.txt", col.names=F) > >> > >> This will give you a tibble with f1$X1 with the file in it > >> > >> then loop it with (a in as.list(f1[,1]) > >> > >> Others will have much slicker code than me! > >> > >> On 2020-06-19 22:02, Ana Marija wrote: > >> > Hi, > >> > > >> > thanks for getting back to me, it is just for my job :) > >> > > >> > so I tried it: > >> > > >> > library(httr) > >> > library(jsonlite) > >> > library(xml2) > >> > library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", > >> > "lib"))) > >> > sparkR.session(master = "local[*]", sparkConfig = > >> > list(spark.driver.memory = "2g")) > >> > > >> > server <- "http://rest.ensembl.org"; > >> > > >>
Re: [R] How to loop over two files ...
Hi Rasmus, I got those SNPs from two GWAS-es which I run with different phenotypes and I would like to compare weather the top SNPs in both of them are in LD. So 1n.txt and 1g.txt are just top SNPs from those two GWAS-es. Unfortunately https://ldlink.nci.nih.gov/?tab=ldpair works for only two SNPs at the time and I need to do that for 300 pairs On Fri, Jun 19, 2020 at 6:42 PM Rasmus Liland wrote: > > On 2020-06-19 14:34 -0500, Ana Marija wrote: > > > > I have two files (each has 300 lines)like this: > > The example looks quite similar to the R example in > https://rest.ensembl.org/documentation/info/ld_pairwise_get#ra > ... > > The question becomes: how did you query the > 600 variant names in 1g.txt and 1n.txt? > > curl 'https://rest.ensembl.org/ld/human/pairwise/rs6792369/rs1042779?' -H > 'Content-type:application/json' > > shows the 26 population_names for the > rs6792369/rs1042779 combination ... __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to loop over two files ...
I tried it: > library(httr) > library(jsonlite) > library(xml2) > library(readr) > server <- "http://rest.ensembl.org"; > f1 <- read_tsv("1g", col_names=F) Parsed with column specification: cols( X1 = col_character() ) > f2 <- read_tsv("1n", col_names=F) Parsed with column specification: cols( X1 = col_character() ) > > for ( a in as.list(f1[,1]) ) { + +for ( b in as.list(f2[,1]) ) { + + ext <- paste0( "/ld/human/pairwise/", + a, + "/", + b, + "?population_name=1000GENOMES:phase_3:KHV") + + r <- GET(paste(server, ext, sep = ""), + content_type("application/json")) + + write(r,file="list.txt",append=TRUE) + + +} + + } Error in parse_url(url) : length(url) == 1 is not TRUE > traceback() 10: stop(simpleError(msg, call = if (p <- sys.parent(1L)) sys.call(p))) 9: stopifnot(length(url) == 1) 8: parse_url(url) 7: is.url(url) 6: stopifnot(is.url(url)) 5: build_url(parse_url(url)[c("scheme", "hostname", "port")]) 4: handle_name(url) 3: handle_find(url) 2: handle_url(handle, url, ...) 1: GET(paste(server, ext, sep = ""), content_type("application/json")) On Fri, Jun 19, 2020 at 4:41 PM wrote: > > Oh - read.text isn't in base! Not sure where is came from (my head > mostly!) You may have something that adds it but better to use > something that works. So try using: > > library(readr) > f1 <- read_tsv("1g.txt", col.names=F) > > This will give you a tibble with f1$X1 with the file in it > > then loop it with (a in as.list(f1[,1]) > > Others will have much slicker code than me! > > On 2020-06-19 22:02, Ana Marija wrote: > > Hi, > > > > thanks for getting back to me, it is just for my job :) > > > > so I tried it: > > > > library(httr) > > library(jsonlite) > > library(xml2) > > library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", > > "lib"))) > > sparkR.session(master = "local[*]", sparkConfig = > > list(spark.driver.memory = "2g")) > > > > server <- "http://rest.ensembl.org"; > > > > f1 <- read.text("1g.txt") > > f2 <- read.text("1n.txt") > > > > for ( a in as.list(f1) ) { > > > >for ( b in as.list(f2) ) { > > > > ext <- paste0( "/ld/human/pairwise/", > > a, > > "/", > > b, > > "?population_name=1000GENOMES:phase_3:KHV") > > > > r <- GET(paste(server, ext, sep = ""), > > content_type("application/json")) > > > > write(r,file="list.txt",append=TRUE) > > > > > >} > > > > } > > > > and I got this error: > > Error in as.list.default(f1) : > > no method for coercing this S4 class to a vector > > > > Please advise > > > > On Fri, Jun 19, 2020 at 3:28 PM wrote: > >> > >> so (untested) if you did something like > >> > >> f1 <- read.text("1g.txt") > >> f2 <- read.text("1n.txt") > >> > >> for ( a in as.list(f1) ) { > >> > >>for ( b in as.list(f2) ) { > >> > >> ext <- paste0( "/ld/human/pairwise/", > >> a, > >> "/", > >> b, > >> "?population_name=1000GENOMES:phase_3:KHV") > >> > >> r <- GET(paste(server, ext, sep = ""), > >> content_type("application/json")) > >> > >> # You presumably need to do something with 'r' at the > >> moment its over written by the next loop.. were > >> # you appending it to list.txt? Possibly its just a > >> bit > >> of the R output you want.? > >> > >> write(r,file="list.txt",append=TRUE) > >> > >> > >>} > >> > >> } > >> > >> > >> Are we doing your PhD for you ;-) Do we get to share ;-) > >> > >> > >> On 2020-06-19 20:34, Ana Marija wrote: > >> > Hello, > >> > > >> > I have two files (each has 300 lines)like this: > >> > > >> >
Re: [R] How to loop over two files ...
HI Rasmus, I tried it: library(base) files <- c("1g.txt", "1n.txt") files <- lapply(files, readLines) server <- "http://rest.ensembl.org"; population.name <- "1000GENOMES:phase_3:KHV" ext <- apply(expand.grid(files), 1, function(x) { return(paste0(server, "/ld/human/pairwise/", x[1], "/", x[2], "?population_name=", population.name)) }) r <- readRDS(paste0(population.name, ".rds")) lapply(r[1:4], function(x) { jsonlite::fromJSON(jsonlite::toJSON(httr::content(x))) }) and I got this error: > r <- readRDS(paste0(population.name, ".rds")) Error in gzfile(file, "rb") : cannot open the connection In addition: Warning message: In gzfile(file, "rb") : cannot open compressed file '1000GENOMES:phase_3:KHV.rds', probable reason 'No such file or directory' > lapply(r[1:4], function(x) { + jsonlite::fromJSON(jsonlite::toJSON(httr::content(x))) + }) Error in lapply(r[1:4], function(x) { : object 'r' not found Am I am doing here something wrong? Do I need any other libraries loaded? Thanks Ana On Fri, Jun 19, 2020 at 3:49 PM Rasmus Liland wrote: > > On 2020-06-19 14:34 -0500, Ana Marija wrote: > > > > server <- "http://rest.ensembl.org"; > > ext <- > > "/ld/human/pairwise/rs6792369/rs1042779?population_name=1000GENOMES:phase_3:KHV" > > > > r <- GET(paste(server, ext, sep = ""), content_type("application/json")) > > > > stop_for_status(r) > > head(fromJSON(toJSON(content(r > >d_prime r2 variation1 variation2 population_name > > 1 0.975513 0.951626 rs6792369 rs1042779 1000GENOMES:phase_3:KHV > > > > What I would like to do is to do is to run this command for every SNP > > in one list (1g.txt) to each SNP in another list (1n.txt). Where SNP# > > is rs# and output every line of result in list.txt > > Dear Ana, > > I tried, but for some reason I get only a > response for the first URL you supplied. > > I wrote this: > > files <- c("1g.txt", "1n.txt") > files <- lapply(files, readLines) > server <- "http://rest.ensembl.org"; > population.name <- "1000GENOMES:phase_3:KHV" > ext <- apply(expand.grid(files), 1, function(x) { > return(paste0(server, "/ld/human/pairwise/", > x[1], "/", x[2], > "?population_name=", population.name)) > }) > > # r <- lapply(ext, function(x) { > # httr::GET(x, httr::content_type("application/json")) > # }) > # names(r) <- ext > # file <- paste0(population.name, ".rds") > # saveRDS(object=r, compress="xz", file=file) > > r <- readRDS(paste0(population.name, ".rds")) > lapply(r[1:4], function(x) { > jsonlite::fromJSON(jsonlite::toJSON(httr::content(x))) > }) > > > Which if you are able to run it (saving the > output in that rds file), yields this: > > > $`http://rest.ensembl.org/ld/human/pairwise/rs6792369/rs1042779?population_name=1000GENOMES:phase_3:KHV` > variation2 population_name d_prime r2 variation1 > 1 rs1042779 1000GENOMES:phase_3:KHV 0.975513 0.951626 rs6792369 > > > $`http://rest.ensembl.org/ld/human/pairwise/rs1414517/rs1042779?population_name=1000GENOMES:phase_3:KHV` > list() > > > $`http://rest.ensembl.org/ld/human/pairwise/rs16857712/rs1042779?population_name=1000GENOMES:phase_3:KHV` > list() > > > $`http://rest.ensembl.org/ld/human/pairwise/rs16857703/rs1042779?population_name=1000GENOMES:phase_3:KHV` > list() > > For some reason, only the first url works ... > > I am a bit unfamiliar working with REST > API's. Or web scraping in general. Daniel > Cegiełka knows something in this thread some > days ago, where it might be similar to the > API of borsaitaliana.it, where you can supply > headers with curl like he quickly did [2]. > > You might be able to supply the list of SNPs > in a header to Ensemble in httr::GET somehow > if you read some docs on their API? > > Best, > Rasmus > > [1] https://marc.info/?t=15924924612&r=1&w=2 > [2] https://marc.info/?l=r-sig-finance&m=159249894208684&w=2 __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to loop over two files ...
Hi, thanks for getting back to me, it is just for my job :) so I tried it: library(httr) library(jsonlite) library(xml2) library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"))) sparkR.session(master = "local[*]", sparkConfig = list(spark.driver.memory = "2g")) server <- "http://rest.ensembl.org"; f1 <- read.text("1g.txt") f2 <- read.text("1n.txt") for ( a in as.list(f1) ) { for ( b in as.list(f2) ) { ext <- paste0( "/ld/human/pairwise/", a, "/", b, "?population_name=1000GENOMES:phase_3:KHV") r <- GET(paste(server, ext, sep = ""), content_type("application/json")) write(r,file="list.txt",append=TRUE) } } and I got this error: Error in as.list.default(f1) : no method for coercing this S4 class to a vector Please advise On Fri, Jun 19, 2020 at 3:28 PM wrote: > > so (untested) if you did something like > > f1 <- read.text("1g.txt") > f2 <- read.text("1n.txt") > > for ( a in as.list(f1) ) { > >for ( b in as.list(f2) ) { > > ext <- paste0( "/ld/human/pairwise/", > a, > "/", > b, > "?population_name=1000GENOMES:phase_3:KHV") > > r <- GET(paste(server, ext, sep = ""), > content_type("application/json")) > > # You presumably need to do something with 'r' at the > moment its over written by the next loop.. were > # you appending it to list.txt? Possibly its just a bit > of the R output you want.? > > write(r,file="list.txt",append=TRUE) > > >} > > } > > > Are we doing your PhD for you ;-) Do we get to share ;-) > > > On 2020-06-19 20:34, Ana Marija wrote: > > Hello, > > > > I have two files (each has 300 lines)like this: > > > > head 1g.txt > > rs6792369 > > rs1414517 > > rs16857712 > > rs16857703 > > rs12239392 > > ... > > > > head 1n.txt > > rs1042779 > > rs2360630 > > rs10753597 > > rs7549096 > > rs2343491 > > ... > > > > For each pair of rs# from those two files I can run this command in R > > > > library(httr) > > library(jsonlite) > > library(xml2) > > > > server <- "http://rest.ensembl.org"; > > ext <- > > "/ld/human/pairwise/rs6792369/rs1042779?population_name=1000GENOMES:phase_3:KHV" > > > > r <- GET(paste(server, ext, sep = ""), > > content_type("application/json")) > > > > stop_for_status(r) > > head(fromJSON(toJSON(content(r > >d_prime r2 variation1 variation2 population_name > > 1 0.975513 0.951626 rs6792369 rs1042779 1000GENOMES:phase_3:KHV > > > > What I would like to do is to do is to run this command for every SNP > > in one list (1g.txt) to each SNP in another list (1n.txt). Where SNP# > > is rs# and output every line of result in list.txt > > > > The process is illustrated in the attachment. > > > > Please help, > > Ana > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How to loop over two files ...
Hello, I have two files (each has 300 lines)like this: head 1g.txt rs6792369 rs1414517 rs16857712 rs16857703 rs12239392 ... head 1n.txt rs1042779 rs2360630 rs10753597 rs7549096 rs2343491 ... For each pair of rs# from those two files I can run this command in R library(httr) library(jsonlite) library(xml2) server <- "http://rest.ensembl.org"; ext <- "/ld/human/pairwise/rs6792369/rs1042779?population_name=1000GENOMES:phase_3:KHV" r <- GET(paste(server, ext, sep = ""), content_type("application/json")) stop_for_status(r) head(fromJSON(toJSON(content(r d_prime r2 variation1 variation2 population_name 1 0.975513 0.951626 rs6792369 rs1042779 1000GENOMES:phase_3:KHV What I would like to do is to do is to run this command for every SNP in one list (1g.txt) to each SNP in another list (1n.txt). Where SNP# is rs# and output every line of result in list.txt The process is illustrated in the attachment. Please help, Ana lists.pdf Description: Adobe PDF document __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] comparing variances/distributions
would using Kolmogorov-Smirnov test make more sense here? > x=m$Pold > y=m$Pnew > ks.test(x,y) Two-sample Kolmogorov-Smirnov test data: x and y D = 0.0049066, p-value = 1.221e-15 alternative hypothesis: two-sided Warning message: In ks.test(x, y) : p-value will be approximate in the presence of ties as I understand high p-values here say I cannot claim statistical support for a difference, but low p-values are not evidence of sameness? D should be the maximum difference between the two probability distributions? On Wed, Jun 17, 2020 at 3:06 PM Ana Marija wrote: > > Hello, > > I have p values from two distributions, Pold and Pnew > > head(m) >CHR POS MARKER Pnew Pold > 1: 1 785989 rs2980300 0.1419 0.9521 > 2: 1 1130727 rs10907175 0.1022 0.4750 > 3: 1 1156131 rs2887286 0.3698 0.5289 > 4: 1 1158631 rs6603781 0.1929 0.2554 > 5: 1 1211292 rs6685064 0.6054 0.2954 > 6: 1 1478153 rs3766180 0.6511 0.5542 > ... > > In order to compare those two distributions (QQ plots shown in attach) > does it make sense to use: > > var.test(m$Pold, m$Pnew, alternative = "two.sided") > > F test to compare two variances > > data: m$Pold and m$Pnew > F = 0.99937, num df = 1454159, denom df = 1454159, p-value = 0.7057 > alternative hypothesis: true ratio of variances is not equal to 1 > 95 percent confidence interval: > 0.9970808 1.0016750 > sample estimates: > ratio of variances > 0.9993739 > > > Or some other test makes more sense? > > Thanks > Ana __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] comparing variances/distributions
Hello, I have p values from two distributions, Pold and Pnew > head(m) CHR POS MARKER Pnew Pold 1: 1 785989 rs2980300 0.1419 0.9521 2: 1 1130727 rs10907175 0.1022 0.4750 3: 1 1156131 rs2887286 0.3698 0.5289 4: 1 1158631 rs6603781 0.1929 0.2554 5: 1 1211292 rs6685064 0.6054 0.2954 6: 1 1478153 rs3766180 0.6511 0.5542 ... In order to compare those two distributions (QQ plots shown in attach) does it make sense to use: var.test(m$Pold, m$Pnew, alternative = "two.sided") F test to compare two variances data: m$Pold and m$Pnew F = 0.99937, num df = 1454159, denom df = 1454159, p-value = 0.7057 alternative hypothesis: true ratio of variances is not equal to 1 95 percent confidence interval: 0.9970808 1.0016750 sample estimates: ratio of variances 0.9993739 Or some other test makes more sense? Thanks Ana qq-plots-compressed.pdf Description: Adobe PDF document __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to change manhattan plot code to get a different color per chromosome
Hi Jim, thank you so much for this elaborate code I will definitely use some hacks from it in the future. For now, for the purpose of compactness of the code I just set manually colors which I want to be there: manhattan(results_log,chr="CHR",bp="POS",p="META_pval",snp="MARKER",suggestiveline = F, genomewideline = F,main = "Gokind old",col = c("red2", "orange", "gold1","springgreen4", "dodgerblue", "darkorchid1","orchid1"),ylim=c(0,9),cex = 0.6) in this qqman function. Cheers, Ana On Tue, Jun 16, 2020 at 4:00 AM Jim Lemon wrote: > > Hi Ana, > Your attached image seems to have bailed out before landing in the > list. Here's how to do it using a simple manhattan plot with the data > from: > > https://reneshbedre.github.io//assets/posts/mhat/gwas_res_sim.csv > > gwas<-read.csv("gwas_res_sim.csv",stringsAsFactors=FALSE) > # get the data into chromosome order > gwas<-gwas[order(gwas$chr,gwas$SNP),] > gwas$BP<-rep(NA,dim(gwas)[1] > # fake some base positions > for(chromosome in 1:20) { > snporder<-order(as.numeric(gsub("rs","",gwas[gwas$chr == chromosome,"SNP"]))) > gwas$BP[gwas$chr == chromosome]<- > as.numeric(paste(chromosome,snporder,sep=".")) > } > library(plotrix) > # set the chromosome colors here - be more creative than me > chrcol<-color.scale(1:10,extremes=c("red","blue")) > > # simple manhattan plot function > manhattan<-function(x,CHR="CHR",BP="BP",SNP="SNP",p="p", > main="Manhattan Plot",xlab="Chromosome",ylab="-log10(p)", > pch=".",cex=1,siglevel=0.1,sigcol="green",sigcex=2, > chrlab=NULL,chrcol=NULL,annotate=FALSE) { > > par(xaxs="i",yaxs="i") > nchr<-length(unique(x[,CHR])) > if(is.null(chrlab)) chrlab<-1:nchr > ypos<--log10(x[,p]) > plot(x[,BP],ypos,ylim=c(0,max(ypos,na.rm=TRUE)*1.05), > main=main,xlab=xlab,ylab=ylab, > pch=pch,col=chrcol[x[,CHR]],cex=4,xaxt="n") > abline(h=-log10(siglevel),lty=2) > staxlab(1,at=(1:nchr) + 0.5,labels=chrlab) > sigindx<-which(ypos >= -log10(siglevel)) > sigSNP<-unique(x[sigindx,SNP]) > cat(length(sigindx),"sig\n") > points(x[sigindx,BP],ypos[sigindx], > pch=pch,col=sigcol,cex=sigcex) > if(annotate) text(x[sigindx,BP],ypos[sigindx]*1.03,x[sigindx,SNP]) > return(sigSNP=sigSNP) > } > > manhattan(gwas,CHR="chr",p="pvalue",cex=4,sigcex=6, > chrcol=chrcol,annotate=TRUE) > > Jim > > On Tue, Jun 16, 2020 at 8:06 AM Ana Marija > wrote: > > > > Hello, > > > > Is there is a way to set colors in this plot to look like this one in > > attach (different color for each CHR-there is 22 of them)? > > > > > > library(qqman) > > results_log <- read.table("meta_p_pos_chr.F", > > head=TRUE,stringsAsFactors=FALSE) > > png("META.png") > > manhattan(results_log,chr="CHR",bp="POS",p="META_pval",snp="MARKER",ylim > > = c(0, 10)) > > dev.off() > > > > Thanks > > Ana > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] how to change manhattan plot code to get a different color per chromosome
Hello, Is there is a way to set colors in this plot to look like this one in attach (different color for each CHR-there is 22 of them)? library(qqman) results_log <- read.table("meta_p_pos_chr.F", head=TRUE,stringsAsFactors=FALSE) png("META.png") manhattan(results_log,chr="CHR",bp="POS",p="META_pval",snp="MARKER",ylim = c(0, 10)) dev.off() Thanks Ana __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] if else statement adjustemtn
HI Jim thank you so much! This is amazing answer!!! Ana On Sat, Jun 13, 2020 at 4:09 AM Jim Lemon wrote: > > Right, back from shopping. Since you have fourteen rows containing NAs > and you only want seven, we can infer that half of them must go. As > they are neatly divided into seven rows in which only one NA appears > and seven in which two stare meaninglessly out at us. I will assume > that the latter are the ones to be discarded. As your condition for > calculating "pheno" stated that a 2 in either FLASER or PLASER should > result in a 2 in pheno, the following statement closely conforms to > that: > > b<-read.table(text="FID IID FLASER PLASER > fam1837 G1837 1 NA > fam2410 G2410 NA NA > fam2838 G2838 NA 2 > fam3367 G3367 1 NA > fam3410 G3410 1 NA > fam3492 G3492 1 NA > fam0911 G911 NA NA > fam3834 G3834 2 NA > fam4708 G4708 NA 2 > fam5162 G5162 NA NA > fam5274 G5274 NA NA > fam0637 G637 NA NA > fam0640 G640 NA NA > fam0743 G743 NA NA > fam0911 G911 NA NA", > header=TRUE,stringsAsFactors=FALSE) > > b$pheno<-ifelse(b$FLASER == 2 | b$PLASER == 2,2,1) > # use the valid FLASER values when PLASER is NA > b[is.na(b$pheno),]$pheno<-ifelse(!is.na(b[is.na(b$pheno),]$FLASER), > b[is.na(b$pheno),]$FLASER,NA) > # use the valid PLASER values when FLASER if NA > b[is.na(b$pheno),]$pheno<-ifelse(!is.na(b[is.na(b$pheno),]$PLASER), > b[is.na(b$pheno),]$PLASER,NA) > b > > I could write that mess in one straitjacket of conditional statements > but my brain hurts enough. > > Jim > > > On Sat, Jun 13, 2020 at 1:59 PM Ana Marija > wrote: > > > > Great idea! > > Here it is: > > > b[is.na(b$FLASER) | is.na(b$PLASER),] > > FID IID FLASER PLASER pheno > > 1: fam1837 G1837 1 NA 2 > > 2: fam2410 G2410 NA NA 2 > > 3: fam2838 G2838 NA 2 2 > > 4: fam3367 G3367 1 NA 2 > > 5: fam3410 G3410 1 NA 2 > > 6: fam3492 G3492 1 NA 2 > > 7: fam3834 G3834 2 NA 2 > > 8: fam4708 G4708 NA 2 2 > > 9: fam5162 G5162 NA NA 2 > > 10: fam5274 G5274 NA NA 2 > > 11: fam0637 G637 NA NA 2 > > 12: fam0640 G640 NA NA 2 > > 13: fam0743 G743 NA NA 2 > > 14: fam0911 G911 NA NA 2 > > > > On Fri, Jun 12, 2020 at 10:29 PM Jim Lemon wrote: > > > > > > Since you have only a few troublesome NA values, if you look at them, > > > or even better, post them: > > > > > > b[is.na(b$FLASER) | is.na(b$PLASER),] > > > > > > perhaps we can work out the appropriate logic to get rid of only the > > > ones you don't want. > > > > > > Jim > > > > > > On Sat, Jun 13, 2020 at 12:50 PM Ana Marija > > > wrote: > > > > > > > > Hi Rasmus, > > > > > > > > thank you for getting back to be, the command your provided seems to > > > > add all 11 NAs to 2s > > > > > b$pheno <- > > > > + ifelse(b$PLASER==2 | > > > > + b$FLASER==2 | > > > > + is.na(b$PLASER) | > > > > + is.na(b$PLASER) & b$FLASER %in% 1:2 | > > > > + is.na(b$FLASER) & b$PLASER == 2, > > > > + 2, 1) > > > > > table(b$pheno, exclude = NULL) > > > > > > > > 1 2 > > > > 859 839 > > > > > > > > Once again my desired results is to keep these 7 NAs as NAs > > > > > table(b$PLASER,b$FLASER, exclude = NULL) > > > > > > > > 1 2 3 > > > > 1836 14 00 > > > > 2691 70 432 > > > > 3 2 7 210 > > > > 4 1 07 > > > > > > > > and have > > > > 825 2s (825=691+14+70+7+43) > > > > and the rest would be 1s (866=1698-7-825) > > > > > > > > On Fri, Jun 12, 2020 at 9:29 PM Rasmus Liland wrote: > > > > > > > > > > On 2020-06-13 11:30 +1000, Jim Lemon wrote: > > > > > > On Fri, Jun 12, 2020 at 8:06 PM Jim Lemon wrote: > > > > > > > On Sat, Jun 13, 2020 at 10:46 AM Ana Marija wrote: > > > > > > > > > > > > > >
Re: [R] if else statement adjustemtn
Great idea! Here it is: > b[is.na(b$FLASER) | is.na(b$PLASER),] FID IID FLASER PLASER pheno 1: fam1837 G1837 1 NA 2 2: fam2410 G2410 NA NA 2 3: fam2838 G2838 NA 2 2 4: fam3367 G3367 1 NA 2 5: fam3410 G3410 1 NA 2 6: fam3492 G3492 1 NA 2 7: fam3834 G3834 2 NA 2 8: fam4708 G4708 NA 2 2 9: fam5162 G5162 NA NA 2 10: fam5274 G5274 NA NA 2 11: fam0637 G637 NA NA 2 12: fam0640 G640 NA NA 2 13: fam0743 G743 NA NA 2 14: fam0911 G911 NA NA 2 On Fri, Jun 12, 2020 at 10:29 PM Jim Lemon wrote: > > Since you have only a few troublesome NA values, if you look at them, > or even better, post them: > > b[is.na(b$FLASER) | is.na(b$PLASER),] > > perhaps we can work out the appropriate logic to get rid of only the > ones you don't want. > > Jim > > On Sat, Jun 13, 2020 at 12:50 PM Ana Marija > wrote: > > > > Hi Rasmus, > > > > thank you for getting back to be, the command your provided seems to > > add all 11 NAs to 2s > > > b$pheno <- > > + ifelse(b$PLASER==2 | > > + b$FLASER==2 | > > + is.na(b$PLASER) | > > + is.na(b$PLASER) & b$FLASER %in% 1:2 | > > + is.na(b$FLASER) & b$PLASER == 2, > > + 2, 1) > > > table(b$pheno, exclude = NULL) > > > > 1 2 > > 859 839 > > > > Once again my desired results is to keep these 7 NAs as NAs > > > table(b$PLASER,b$FLASER, exclude = NULL) > > > > 1 2 3 > > 1836 14 00 > > 2691 70 432 > > 3 2 7 210 > > 4 1 07 > > > > and have > > 825 2s (825=691+14+70+7+43) > > and the rest would be 1s (866=1698-7-825) > > > > On Fri, Jun 12, 2020 at 9:29 PM Rasmus Liland wrote: > > > > > > On 2020-06-13 11:30 +1000, Jim Lemon wrote: > > > > On Fri, Jun 12, 2020 at 8:06 PM Jim Lemon wrote: > > > > > On Sat, Jun 13, 2020 at 10:46 AM Ana Marija wrote: > > > > > > > > > > > > I am trying to make a new column > > > > > > "pheno" so that I reduce the number > > > > > > of NAs > > > > > > > > > > it looks like those two NA values in > > > > > PLASER are the ones you want to drop. > > > > > > > > From just your summary table, it's hard to > > > > guess the distribution of NA values. > > > > > > Dear Ana, > > > > > > This small sample > > > > > > b <- read.table(text="FLASER;PLASER > > > 1;2 > > > ;2 > > > ; > > > 1; > > > 2; > > > 2;2 > > > 3;2 > > > 3;3 > > > 1;1", sep=";", header=TRUE) > > > > > > table(b$PLASER,b$FLASER, exclude = NULL) > > > > > > yields the same combinations you showed > > > earlier: > > > > > >1 2 3 > > > 11 0 00 > > > 21 1 11 > > > 30 0 10 > > >1 1 01 > > > > > > If you want to eliminate the four -based > > > combinations completely, this line > > > > > > b$pheno <- > > > ifelse(b$PLASER==2 | > > > b$FLASER==2 | > > > is.na(b$PLASER) | > > > is.na(b$PLASER) & b$FLASER %in% 1:2 | > > > is.na(b$FLASER) & b$PLASER == 2, > > > 2, 1) > > > table(b$pheno, exclude = NULL) > > > > > > will do it: > > > > > > 1 2 > > > 2 7 > > > > > > Best, > > > Rasmus > > > __ > > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > > https://stat.ethz.ch/mailman/listinfo/r-help > > > PLEASE do read the posting guide > > > http://www.R-project.org/posting-guide.html > > > and provide commented, minimal, self-contained, reproducible code. > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] if else statement adjustemtn
Hi Rasmus, thank you for getting back to be, the command your provided seems to add all 11 NAs to 2s > b$pheno <- + ifelse(b$PLASER==2 | + b$FLASER==2 | + is.na(b$PLASER) | + is.na(b$PLASER) & b$FLASER %in% 1:2 | + is.na(b$FLASER) & b$PLASER == 2, + 2, 1) > table(b$pheno, exclude = NULL) 1 2 859 839 Once again my desired results is to keep these 7 NAs as NAs > table(b$PLASER,b$FLASER, exclude = NULL) 1 2 3 1836 14 00 2691 70 432 3 2 7 210 4 1 07 and have 825 2s (825=691+14+70+7+43) and the rest would be 1s (866=1698-7-825) On Fri, Jun 12, 2020 at 9:29 PM Rasmus Liland wrote: > > On 2020-06-13 11:30 +1000, Jim Lemon wrote: > > On Fri, Jun 12, 2020 at 8:06 PM Jim Lemon wrote: > > > On Sat, Jun 13, 2020 at 10:46 AM Ana Marija wrote: > > > > > > > > I am trying to make a new column > > > > "pheno" so that I reduce the number > > > > of NAs > > > > > > it looks like those two NA values in > > > PLASER are the ones you want to drop. > > > > From just your summary table, it's hard to > > guess the distribution of NA values. > > Dear Ana, > > This small sample > > b <- read.table(text="FLASER;PLASER > 1;2 > ;2 > ; > 1; > 2; > 2;2 > 3;2 > 3;3 > 1;1", sep=";", header=TRUE) > > table(b$PLASER,b$FLASER, exclude = NULL) > > yields the same combinations you showed > earlier: > >1 2 3 > 11 0 00 > 21 1 11 > 30 0 10 >1 1 01 > > If you want to eliminate the four -based > combinations completely, this line > > b$pheno <- > ifelse(b$PLASER==2 | > b$FLASER==2 | > is.na(b$PLASER) | > is.na(b$PLASER) & b$FLASER %in% 1:2 | > is.na(b$FLASER) & b$PLASER == 2, > 2, 1) > table(b$pheno, exclude = NULL) > > will do it: > > 1 2 > 2 7 > > Best, > Rasmus > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] if else statement adjustemtn
Hi Jim, I tried it: > b$pheno<-ifelse(b$PLASER==2 | b$FLASER==2 |is.na(b$PLASER) & b$FLASER == > 2,2,1) > table(b$pheno,exclude = NULL) 12 859 828 11 > b$pheno<-ifelse(b$PLASER==2 | b$FLASER==2 |is.na(b$FLASER) & b$PLASER == > 2,2,1) > table(b$pheno,exclude = NULL) 12 859 828 11 Am I am doing something wrong? Thanks Ana On Fri, Jun 12, 2020 at 8:06 PM Jim Lemon wrote: > > Hi Ana, > From your desired result, it looks like those two NA values in PLASER > are the ones you want to drop. > If so, try this: > > b$pheno<-ifelse(b$PLASER==2 | b$FLASER==2 | > is.na(b$PLASER) & b$FLASER == 2,2,1) > > and if I have it the wrong way round, swap FLASER and PLASER in the > bit I have added. > > Jim > > On Sat, Jun 13, 2020 at 10:46 AM Ana Marija > wrote: > > > > Hello > > > > I have a data frame like this: > > > > > head(b) > >FID IID FLASER PLASER > > 1: fam1000 G1000 1 1 > > 2: fam1001 G1001 1 1 > > 3: fam1003 G1003 1 2 > > 4: fam1005 G1005 1 1 > > 5: fam1009 G1009 1 1 > > 6: fam1052 G1052 1 1 > > ... > > > > > table(b$PLASER,b$FLASER, exclude = NULL) > > > > 1 2 3 > > 1836 14 00 > > 2691 70 432 > > 3 2 7 210 > > 4 1 07 > > > > I am trying to make a new column "pheno" so that I reduce the number of NAs > > > > right now I am doing: > > > > > b$pheno=ifelse(b$PLASER==2 | b$FLASER==2,2,1) > > > table(b$pheno, exclude = NULL) > > > >12 > > 859 828 11 > > > > I would like to reduce this number of NAs to be 7 > > so I would like to have in "pheno column" > > 7 NAs > > 825 2s (825=691+14+70+7+43) > > and the rest would be 1s (866=1698-7-825) > > > > How can I change the above command to get these numbers? > > > > Thanks > > Ana > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] if else statement adjustemtn
Hello I have a data frame like this: > head(b) FID IID FLASER PLASER 1: fam1000 G1000 1 1 2: fam1001 G1001 1 1 3: fam1003 G1003 1 2 4: fam1005 G1005 1 1 5: fam1009 G1009 1 1 6: fam1052 G1052 1 1 ... > table(b$PLASER,b$FLASER, exclude = NULL) 1 2 3 1836 14 00 2691 70 432 3 2 7 210 4 1 07 I am trying to make a new column "pheno" so that I reduce the number of NAs right now I am doing: > b$pheno=ifelse(b$PLASER==2 | b$FLASER==2,2,1) > table(b$pheno, exclude = NULL) 12 859 828 11 I would like to reduce this number of NAs to be 7 so I would like to have in "pheno column" 7 NAs 825 2s (825=691+14+70+7+43) and the rest would be 1s (866=1698-7-825) How can I change the above command to get these numbers? Thanks Ana __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to stack two Stack manhattan plots?
Thank you so much it is getting better (see attach) when I do: ggplot( data = tmp.tidy) +geom_point( aes(y = -log10(value),x = CHR,color=key) ,position = "jitter", size=0.5 ) is there is a way to have separation between every chromosome shown better and also every number of chromosome shown on the x-axis? On Thu, Jun 11, 2020 at 4:39 PM wrote: > > Your dots are too big! > > Add > > geom_points(... , size = 1 > > May need to play... 0.5 or 0.1? > > On 11 Jun 2020 22:26, Ana Marija wrote: > > I tried it, > ggplot( data = tmp.tidy) +geom_point( aes(y = BP,x = CHR,color=key) > ,position = "jitter" ) > I got the attached > > On Thu, Jun 11, 2020 at 4:18 PM wrote: > > > > Try adding > > position = "jitter" to the geom_point(... > > > > > > > > On 11 Jun 2020 21:41, Ana Marija wrote: > > > > Hello, > > > > I tried your code and this is what I got > > > > I really need two groups side by side shown per chromosome as it is here: > > https://imgur.com/a/pj40c > > on the image there are 4 groups I do have only two > > > > On Thu, Jun 11, 2020 at 11:52 AM wrote: > > > > > > On 2020-06-11 15:59, Ana Marija wrote: > > > > yes all in one plot. > > > > So I want key (and therefore color)to be "Pold" and "Pnew" as those I > > > > am comparing per CHR > > > > so I used facet_wrap(~CHR) to create a graph per chromosome (on x-axis) > > > > On the end x-axis would have two strikes of Pold and Pnew (different > > > > colors) per one chromosome, and CHR would go from 1 to 22 > > > > > > > > > > > > > ggplot( data = tmp.tidy) + > > > geom_point( aes( > > > y = BP, > > > x = CHR, > > > color=key) ) > > > > > > ? > > > > > > __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to stack two Stack manhattan plots?
I tried it, ggplot( data = tmp.tidy) +geom_point( aes(y = BP,x = CHR,color=key) ,position = "jitter" ) I got the attached On Thu, Jun 11, 2020 at 4:18 PM wrote: > > Try adding > position = "jitter" to the geom_point(... > > > > On 11 Jun 2020 21:41, Ana Marija wrote: > > Hello, > > I tried your code and this is what I got > > I really need two groups side by side shown per chromosome as it is here: > https://imgur.com/a/pj40c > on the image there are 4 groups I do have only two > > On Thu, Jun 11, 2020 at 11:52 AM wrote: > > > > On 2020-06-11 15:59, Ana Marija wrote: > > > yes all in one plot. > > > So I want key (and therefore color)to be "Pold" and "Pnew" as those I > > > am comparing per CHR > > > so I used facet_wrap(~CHR) to create a graph per chromosome (on x-axis) > > > On the end x-axis would have two strikes of Pold and Pnew (different > > > colors) per one chromosome, and CHR would go from 1 to 22 > > > > > > > > > ggplot( data = tmp.tidy) + > > geom_point( aes( > > y = BP, > > x = CHR, > > color=key) ) > > > > ? > > __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to stack two Stack manhattan plots?
Hello, I tried your code and this is what I got I really need two groups side by side shown per chromosome as it is here: https://imgur.com/a/pj40c on the image there are 4 groups I do have only two On Thu, Jun 11, 2020 at 11:52 AM wrote: > > On 2020-06-11 15:59, Ana Marija wrote: > > yes all in one plot. > > So I want key (and therefore color)to be "Pold" and "Pnew" as those I > > am comparing per CHR > > so I used facet_wrap(~CHR) to create a graph per chromosome (on x-axis) > > On the end x-axis would have two strikes of Pold and Pnew (different > > colors) per one chromosome, and CHR would go from 1 to 22 > > > > > ggplot( data = tmp.tidy) + > geom_point( aes( > y = BP, > x = CHR, > color=key) ) > > ? __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to stack two Stack manhattan plots?
yes all in one plot. So I want key (and therefore color)to be "Pold" and "Pnew" as those I am comparing per CHR so I used facet_wrap(~CHR) to create a graph per chromosome (on x-axis) On the end x-axis would have two strikes of Pold and Pnew (different colors) per one chromosome, and CHR would go from 1 to 22 On Thu, Jun 11, 2020 at 9:26 AM wrote: > > On 2020-06-11 14:54, Ana Marija wrote: > > Hello, > > > > I expected it to look like this: > > https://imgur.com/a/pj40c > > > > Ah - so all on the one plot? - so you don't want a facet. It puts two > plots side by side (or 22) > > > where x-axis would be CHR, there is 22 of them > >> unique(tmp.tidy$CHR) > > [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 > > > > You have 22 plots all appearing side by size as part of the facet > > I think you want color=CHR - but I don't know what your current color > setting is doing. > > > I also have two phenotypes (keys) which I would like to compare > >> unique(tmp.tidy$key) > > [1] "Pold" "Pnew" > > > > So did you want to Facet those? __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to stack two Stack manhattan plots?
Hello, I expected it to look like this: https://imgur.com/a/pj40c where x-axis would be CHR, there is 22 of them > unique(tmp.tidy$CHR) [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 I also have two phenotypes (keys) which I would like to compare > unique(tmp.tidy$key) [1] "Pold" "Pnew" > dim(tmp.tidy) [1] 2600184 4 I would like the x-axis to be separated by CHR > sapply(tmp.tidy,class) CHR BP key value "integer" "integer" "character" "numeric" > str(tmp.tidy) 'data.frame':2600184 obs. of 4 variables: $ CHR : int 1 1 1 1 1 1 1 1 1 1 ... $ BP : int 785989 1130727 1156131 1158631 1211292 1478153 1500941 1506035 1510801 1721479 ... $ key : chr "Pold" "Pold" "Pold" "Pold" ... $ value: num 0.952 0.475 0.529 0.255 0.295 ... Unfortunately qqman doesn't do this kind of overlay of two plots On Wed, Jun 10, 2020 at 11:24 PM John wrote: > > On Wed, 10 Jun 2020 15:36:11 -0500 > Ana Marija wrote: > > > Hello, > > > > I have a data frame like this: > > > > > head(tmp1) > > CHR BP PoldPnew > > 1 1 785989 0.9521 0.09278 > > 2 1 1130727 0.4750 0.19010 > > 3 1 1156131 0.5289 0.48520 > > 4 1 1158631 0.2554 0.18140 > > 5 1 1211292 0.2954 0.48590 > > 6 1 1478153 0.5542 0.68790 > > ... > > > > I did: > > tmp.tidy <- tmp1 %>% gather(key, value, -BP, -CHR) > > jpeg("over.jpeg") > > ggplot(tmp.tidy, aes(BP, value, color=key)) + geom_point() + > > facet_wrap(~CHR, nrow=1) > > dev.off() > > > > but I got this plot in attach which doesn't make sense. Can you please > > advise how to make this plot? > > > > thanks > > Ana > > If you would, the str() output might help people understand what is > happening, and also how many records you're looking at. The head() > output is a bit thin on information. There are various manhattan plot > packages for R including a specialized package for ggplot2. > > JWDougherty __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to stack two Stack manhattan plots?
HI Jim I run it like this: Rscript --no-save Manhattan_plot.R and just in case I added: stringsAsFactors=FALSE so the script looks like this: library(qqman) results_log <- read.table("logistic_results_M3.assoc.logistic.C", head=TRUE,stringsAsFactors=FALSE) jpeg("M3.assoc.logistic.jpeg") manhattan(results_log,chr="CHR",bp="BP",p="P",snp="SNP", main = "Manhattan plot:logistic") dev.off() this code does work with another data set, the jpeg() does work. I will try now with PNG On Wed, Jun 10, 2020 at 5:17 PM Jim Lemon wrote: > > Hi Ana, > The problem may be that the JPEG device doesn't handle transparency. > Perhaps PNG? > > Jim > > On Thu, Jun 11, 2020 at 6:48 AM Ana Marija > wrote: > > > > Hello, > > > > I have a data frame like this: > > > > > head(tmp1) > > CHR BP PoldPnew > > 1 1 785989 0.9521 0.09278 > > 2 1 1130727 0.4750 0.19010 > > 3 1 1156131 0.5289 0.48520 > > 4 1 1158631 0.2554 0.18140 > > 5 1 1211292 0.2954 0.48590 > > 6 1 1478153 0.5542 0.68790 > > ... > > > > I did: > > tmp.tidy <- tmp1 %>% gather(key, value, -BP, -CHR) > > jpeg("over.jpeg") > > ggplot(tmp.tidy, aes(BP, value, color=key)) + geom_point() + > > facet_wrap(~CHR, nrow=1) > > dev.off() > > > > but I got this plot in attach which doesn't make sense. Can you please > > advise how to make this plot? > > > > thanks > > Ana > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Error in plot.window(...) : need finite 'ylim' values
Hello, I do have a file like this: head M3.assoc.logistic.C CHR SNP BP P 1 1:785989:T:C 785989 0.4544 1 1:785989:T:C 785989 0.689 1 1:1130727:A:C 1130727 0.05068 1 1:1130727:A:C 1130727 0.07381 1 1:1156131:T:C 1156131 0.6008 1 1:1156131:T:C 1156131 0.8685 ... And I don't have any "NA" or "inf" values in it and I am plotting it in R via: library(qqman) results_log <- read.table("M3.assoc.logistic.C", head=TRUE) jpeg("Logistic_manhattan_retinopathy_M3.jpeg") manhattan(results_log,chr="CHR",bp="BP",p="P",snp="SNP", main = "Manhattan plot: logistic") dev.off() but I am getting: Error in plot.window(...) : need finite 'ylim' values Calls: manhattan ... do.call -> plot -> plot.default -> localWindow -> plot.window Execution halted Please advise, Thanks Ana __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to filter variables which appear in any row but do not include
Hi Rui, thank you so much, that is exactly what I needed! Cheers, Ana On Wed, Jun 3, 2020 at 11:50 AM Rui Barradas wrote: > > Hello, > > If you want to filter out rows with any of the values in a 'unwanted' > vector, try the following. > > First, create a test data set. > > x <- scan(what = character(), text = ' > "E10" "E103" "E104" "E109" "E101" "E108" "E105" "E100" "E106" "E102" > "E107" "E11" "E119" "E113" "E115" "E111" "E114" "E110" "E118" "E116" "E112" > "E117" > ') > > set.seed(2020) > dat <- replicate(5, sample(x, 20, TRUE)) > dat <- as.data.frame(dat) > > > Now, remove all rows that have at least one of "E102" or "E112" > > > unwanted <- c("E102", "E112") > no <- sapply(dat, function(x){ >grepl(paste(unwanted, collapse = "|"), x) > }) > no <- apply(no, 1, any) > dat[!no, ] > > > That's it, if I understood the problem. > > > Hope this helps, > > Rui Barradas > > > Às 15:55 de 03/06/20, Ana Marija escreveu: > > Hello. > > > > I am trying to filter only rows that have ANY of these variables: > > E109, E119, E149 > > > > so I did: > > controls=t %>% filter_all(any_vars(. %in% c("E109", "E119","E149"))) > > > > than I checked what I got: > >> s0 <- sapply(controls, function(x) grep('^E10', x, value = TRUE)) > >> d0=unlist(s0) > >> d10=unique(d0) > >> d10 > > [1] "E10" "E103" "E104" "E109" "E101" "E108" "E105" "E100" "E106" "E102" > > [11] "E107" > > s1 <- sapply(controls, function(x) grep('^E11', x, value = TRUE)) > > d1=unlist(s1) > > d11=unique(d1) > >> d11 > > [1] "E11" "E119" "E113" "E115" "E111" "E114" "E110" "E118" "E116" "E112" > > [11] "E117" > > > > I need help with changing this command > > controls=t %>% filter_all(any_vars(. %in% c("E109", "E119","E149"))) > > > > so that in the output I do not have any rows that include E102 or E112? > > > > Thanks > > Ana > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to filter variables which appear in any row but do not include
Hi Bert The issue is that I have around 2000 columns so I can not be checking if those two are not present in each column of any row “by hand” so to speakAnd I need my output to be a data frame where neither E102 nor E112 are present. Basically from the data frame columns that I already created just remove any row that contains any of those variables. Thanks Ana On Wed, 3 Jun 2020 at 11:00, Bert Gunter wrote: > I suggest that you forget all that fancy stuff (and this is not a use > case for regular expressions). > Use %in% with logical subscripting instead -- basic R functionality that > can be found in any good R tutorial. > > > x <- c("ab","bc","cd") > > x[x %in% c("ab","cd")] > [1] "ab" "cd" > > x[!x %in% c("ab","cd")] > [1] "bc" > > > Bert Gunter > > "The trouble with having an open mind is that people keep coming along and > sticking things into it." > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > > On Wed, Jun 3, 2020 at 7:56 AM Ana Marija > wrote: > >> Hello. >> >> I am trying to filter only rows that have ANY of these variables: >> E109, E119, E149 >> >> so I did: >> controls=t %>% filter_all(any_vars(. %in% c("E109", "E119","E149"))) >> >> than I checked what I got: >> > s0 <- sapply(controls, function(x) grep('^E10', x, value = TRUE)) >> > d0=unlist(s0) >> > d10=unique(d0) >> > d10 >> [1] "E10" "E103" "E104" "E109" "E101" "E108" "E105" "E100" "E106" "E102" >> [11] "E107" >> s1 <- sapply(controls, function(x) grep('^E11', x, value = TRUE)) >> d1=unlist(s1) >> d11=unique(d1) >> > d11 >> [1] "E11" "E119" "E113" "E115" "E111" "E114" "E110" "E118" "E116" "E112" >> [11] "E117" >> >> I need help with changing this command >> controls=t %>% filter_all(any_vars(. %in% c("E109", "E119","E149"))) >> >> so that in the output I do not have any rows that include E102 or E112? >> >> Thanks >> Ana >> >> __ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] how to filter variables which appear in any row but do not include
Hello. I am trying to filter only rows that have ANY of these variables: E109, E119, E149 so I did: controls=t %>% filter_all(any_vars(. %in% c("E109", "E119","E149"))) than I checked what I got: > s0 <- sapply(controls, function(x) grep('^E10', x, value = TRUE)) > d0=unlist(s0) > d10=unique(d0) > d10 [1] "E10" "E103" "E104" "E109" "E101" "E108" "E105" "E100" "E106" "E102" [11] "E107" s1 <- sapply(controls, function(x) grep('^E11', x, value = TRUE)) d1=unlist(s1) d11=unique(d1) > d11 [1] "E11" "E119" "E113" "E115" "E111" "E114" "E110" "E118" "E116" "E112" [11] "E117" I need help with changing this command controls=t %>% filter_all(any_vars(. %in% c("E109", "E119","E149"))) so that in the output I do not have any rows that include E102 or E112? Thanks Ana __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] is there is a way to extract lines in between 3 files that are in common based on one column?
Hi Jim, not in this case, but thanks for asking! Ana On Mon, Jun 1, 2020 at 10:04 PM Jim Lemon wrote: > > So recombination sticks out its foot before us. Do you want to account > for gene linkage? > > JIm > > On Tue, Jun 2, 2020 at 11:55 AM Ana Marija > wrote: > > > > Hi Jim > > > > > neu3<-neu1[!(neu1$Marker %in% Marker3),] > > > dim(neu3) > > [1] 18579 > > > nep3<-nep1[!(nep1$Marker %in% Marker3),] > > > dim(nep3) > > [1] 55629 > > > ret3<-ret1[!(ret1$Marker %in% Marker3),] > > > dim(ret3) > > [1] 34939 > > > > > > If I do: > > > > nn1<-merge(neu1,nep1,by=c("Marker","Chr")) > > nn2<-merge(nn1,ret1,by=c("Marker","Chr")) > > > Marker3<-nn2$Marker > > > length(Marker3) > > [1] 3742962 > > > Marker4<-nn1$Marker > > > length(Marker4) > > [1] 373 > > > > On Mon, Jun 1, 2020 at 8:50 PM Ana Marija > > wrote: > > > > > > Hi David, > > > > > > that is a great point! > > > Yes indeed some are non unique: > > > > > > > dim(neu1) > > > [1] 3742845 9 > > > > length(unique(neu1$Marker)) > > > [1] 3741858 > > > > length(unique(nep1$Marker)) > > > [1] 3745560 > > > > dim(nep1) > > > [1] 3746550 9 > > > > length(unique(ret1$Marker)) > > > [1] 3743494 > > > > dim(ret1) > > > [1] 3743494 9 > > > > > > How would I rewrite this code so that is merging by Chr and Marker > > > column? It seems that a Marker can be under a few Chr. > > > > > > > > > > > > > > > > > > On Mon, Jun 1, 2020 at 8:41 PM David Winsemius > > > wrote: > > > > > > > > > > > > On 6/1/20 5:40 PM, Ana Marija wrote: > > > > > Hi Jim, > > > > > > > > > > thank you so much for getting back to me. I tried your code and this > > > > > is > > > > > what I get: > > > > >> dim(neu2) > > > > > [1] 3740988 9 > > > > >> dim(nep2) > > > > > [1] 3740988 9 > > > > >> dim(ret2) > > > > > [1] 3740001 9 > > > > > > > > > > I think I would need to have the same number of lines in all 3 data > > > > > frames. > > > > > > > > > > Can you please advise. > > > > > > > > > > > > You should check for duplicated Marker values. > > > > > > > > > > > > -- > > > > > > > > David > > > > > > > > > > > > > > Cheers > > > > > Ana > > > > > > > > > > On Mon, Jun 1, 2020 at 7:31 PM Jim Lemon wrote: > > > > > > > > > >> Hi Ana, > > > > >> Not too hard, but your example has all the "marker" fields in common. > > > > >> So using a sample that will show the expected result: > > > > >> > > > > >> neu1<-read.table(text="Chr BP Marker MAF A1 A2 Direction pValue N > > > > >> 1 10012 1:10012:G:T 0.229925 T G + 0.650403 1594 > > > > >> 1 10827 1:10827:C:T 0.287014 T C + 0.955449 1594 > > > > >> 1 12713 1:12713:C:T 0.097867 T C - 0.290455 1594 > > > > >> 1 12882 1:12882:T:G 0.287014 G T + 0.955449 1594 > > > > >> 1 12991 1:12991:G:A 0.097867 A G - 0.290455 1594 > > > > >> 1 14726 1:14726:G:A 0.132058 A G + 0.115005 1594", > > > > >> header=TRUE,stringsAsFactors=FALSE) > > > > >> > > > > >> nep1<-read.table(text="Chr BP Marker MAF A1 A2 DirectionpValue N > > > > >> 1 10012 1:10012:G:T 0.2300430 T G - 0.1420030 1641 > > > > >> 1 10827 1:10827:C:T 0.2867150 T C - 0.2045580 1641 > > > > >> 1 12713 1:12713:C:T 0.0975015 T C - 0.007 1641 > > > > >> 1 12882 1:12882:T:G 0.2867150 G T - 0.2045580 1641 > > > > >> 1 12991 1:12991:G:A 0.0975015 A G - 0.007 1641 > > > > >> 1 14726 1:14727:G:A 0.1325410 A G - 0.8725660 1641", > > > > >> header=TRUE,stringsAsFac
Re: [R] is there is a way to extract lines in between 3 files that are in common based on one column?
Hi Jim > neu3<-neu1[!(neu1$Marker %in% Marker3),] > dim(neu3) [1] 18579 > nep3<-nep1[!(nep1$Marker %in% Marker3),] > dim(nep3) [1] 55629 > ret3<-ret1[!(ret1$Marker %in% Marker3),] > dim(ret3) [1] 34939 If I do: nn1<-merge(neu1,nep1,by=c("Marker","Chr")) nn2<-merge(nn1,ret1,by=c("Marker","Chr")) > Marker3<-nn2$Marker > length(Marker3) [1] 3742962 > Marker4<-nn1$Marker > length(Marker4) [1] 373 On Mon, Jun 1, 2020 at 8:50 PM Ana Marija wrote: > > Hi David, > > that is a great point! > Yes indeed some are non unique: > > > dim(neu1) > [1] 3742845 9 > > length(unique(neu1$Marker)) > [1] 3741858 > > length(unique(nep1$Marker)) > [1] 3745560 > > dim(nep1) > [1] 3746550 9 > > length(unique(ret1$Marker)) > [1] 3743494 > > dim(ret1) > [1] 3743494 9 > > How would I rewrite this code so that is merging by Chr and Marker > column? It seems that a Marker can be under a few Chr. > > > > > > On Mon, Jun 1, 2020 at 8:41 PM David Winsemius wrote: > > > > > > On 6/1/20 5:40 PM, Ana Marija wrote: > > > Hi Jim, > > > > > > thank you so much for getting back to me. I tried your code and this is > > > what I get: > > >> dim(neu2) > > > [1] 3740988 9 > > >> dim(nep2) > > > [1] 3740988 9 > > >> dim(ret2) > > > [1] 3740001 9 > > > > > > I think I would need to have the same number of lines in all 3 data > > > frames. > > > > > > Can you please advise. > > > > > > You should check for duplicated Marker values. > > > > > > -- > > > > David > > > > > > > > Cheers > > > Ana > > > > > > On Mon, Jun 1, 2020 at 7:31 PM Jim Lemon wrote: > > > > > >> Hi Ana, > > >> Not too hard, but your example has all the "marker" fields in common. > > >> So using a sample that will show the expected result: > > >> > > >> neu1<-read.table(text="Chr BP Marker MAF A1 A2 Direction pValue N > > >> 1 10012 1:10012:G:T 0.229925 T G + 0.650403 1594 > > >> 1 10827 1:10827:C:T 0.287014 T C + 0.955449 1594 > > >> 1 12713 1:12713:C:T 0.097867 T C - 0.290455 1594 > > >> 1 12882 1:12882:T:G 0.287014 G T + 0.955449 1594 > > >> 1 12991 1:12991:G:A 0.097867 A G - 0.290455 1594 > > >> 1 14726 1:14726:G:A 0.132058 A G + 0.115005 1594", > > >> header=TRUE,stringsAsFactors=FALSE) > > >> > > >> nep1<-read.table(text="Chr BP Marker MAF A1 A2 DirectionpValue N > > >> 1 10012 1:10012:G:T 0.2300430 T G - 0.1420030 1641 > > >> 1 10827 1:10827:C:T 0.2867150 T C - 0.2045580 1641 > > >> 1 12713 1:12713:C:T 0.0975015 T C - 0.007 1641 > > >> 1 12882 1:12882:T:G 0.2867150 G T - 0.2045580 1641 > > >> 1 12991 1:12991:G:A 0.0975015 A G - 0.007 1641 > > >> 1 14726 1:14727:G:A 0.1325410 A G - 0.8725660 1641", > > >> header=TRUE,stringsAsFactors=FALSE) > > >> > > >> ret1<-read.table(text="Chr BP Marker MAF A1 A2 Direction pValue N > > >> 1 10012 1:10012:G:T 0.2322760 T G - 0.230383 1608 > > >> 1 10827 1:10827:C:T 0.2882460 T C - 0.120356 1608 > > >> 1 12713 1:12713:C:T 0.0982587 T C - 0.272936 1608 > > >> 1 12882 1:12882:T:G 0.2882460 G T - 0.120356 1608 > > >> 1 12991 1:12992:G:A 0.0982587 A G - 0.272936 1608 > > >> 1 14726 1:14727:G:A 0.1340170 A G - 0.594538 1608", > > >> header=TRUE,stringsAsFactors=FALSE) > > >> > > >> # merge the three data frames on "Marker" > > >> nn1<-merge(neu1,nep1,by="Marker") > > >> nn2<-merge(nn1,ret1,by="Marker") > > >> # get the common "Marker" strings > > >> Marker3<-nn2$Marker > > >> # subset all three data frames on Marker3 > > >> neu2<-neu1[neu1$Marker %in% Marker3,] > > >> nep2<-nep1[nep1$Marker %in% Marker3,] > > >> ret2<-ret1[ret1$Marker %in% Marker3,] > > >> > > >> Jim > > >> > > >> On Tue, Jun 2, 2020 at 7:50 AM Ana Marija > > >> wrote: > > >>> Hello, > >
Re: [R] is there is a way to extract lines in between 3 files that are in common based on one column?
Hi David, that is a great point! Yes indeed some are non unique: > dim(neu1) [1] 3742845 9 > length(unique(neu1$Marker)) [1] 3741858 > length(unique(nep1$Marker)) [1] 3745560 > dim(nep1) [1] 3746550 9 > length(unique(ret1$Marker)) [1] 3743494 > dim(ret1) [1] 3743494 9 How would I rewrite this code so that is merging by Chr and Marker column? It seems that a Marker can be under a few Chr. On Mon, Jun 1, 2020 at 8:41 PM David Winsemius wrote: > > > On 6/1/20 5:40 PM, Ana Marija wrote: > > Hi Jim, > > > > thank you so much for getting back to me. I tried your code and this is > > what I get: > >> dim(neu2) > > [1] 3740988 9 > >> dim(nep2) > > [1] 3740988 9 > >> dim(ret2) > > [1] 3740001 9 > > > > I think I would need to have the same number of lines in all 3 data frames. > > > > Can you please advise. > > > You should check for duplicated Marker values. > > > -- > > David > > > > > Cheers > > Ana > > > > On Mon, Jun 1, 2020 at 7:31 PM Jim Lemon wrote: > > > >> Hi Ana, > >> Not too hard, but your example has all the "marker" fields in common. > >> So using a sample that will show the expected result: > >> > >> neu1<-read.table(text="Chr BP Marker MAF A1 A2 Direction pValue N > >> 1 10012 1:10012:G:T 0.229925 T G + 0.650403 1594 > >> 1 10827 1:10827:C:T 0.287014 T C + 0.955449 1594 > >> 1 12713 1:12713:C:T 0.097867 T C - 0.290455 1594 > >> 1 12882 1:12882:T:G 0.287014 G T + 0.955449 1594 > >> 1 12991 1:12991:G:A 0.097867 A G - 0.290455 1594 > >> 1 14726 1:14726:G:A 0.132058 A G + 0.115005 1594", > >> header=TRUE,stringsAsFactors=FALSE) > >> > >> nep1<-read.table(text="Chr BP Marker MAF A1 A2 DirectionpValue N > >> 1 10012 1:10012:G:T 0.2300430 T G - 0.1420030 1641 > >> 1 10827 1:10827:C:T 0.2867150 T C - 0.2045580 1641 > >> 1 12713 1:12713:C:T 0.0975015 T C - 0.007 1641 > >> 1 12882 1:12882:T:G 0.2867150 G T - 0.2045580 1641 > >> 1 12991 1:12991:G:A 0.0975015 A G - 0.007 1641 > >> 1 14726 1:14727:G:A 0.1325410 A G - 0.8725660 1641", > >> header=TRUE,stringsAsFactors=FALSE) > >> > >> ret1<-read.table(text="Chr BP Marker MAF A1 A2 Direction pValue N > >> 1 10012 1:10012:G:T 0.2322760 T G - 0.230383 1608 > >> 1 10827 1:10827:C:T 0.2882460 T C - 0.120356 1608 > >> 1 12713 1:12713:C:T 0.0982587 T C - 0.272936 1608 > >> 1 12882 1:12882:T:G 0.2882460 G T - 0.120356 1608 > >> 1 12991 1:12992:G:A 0.0982587 A G - 0.272936 1608 > >> 1 14726 1:14727:G:A 0.1340170 A G - 0.594538 1608", > >> header=TRUE,stringsAsFactors=FALSE) > >> > >> # merge the three data frames on "Marker" > >> nn1<-merge(neu1,nep1,by="Marker") > >> nn2<-merge(nn1,ret1,by="Marker") > >> # get the common "Marker" strings > >> Marker3<-nn2$Marker > >> # subset all three data frames on Marker3 > >> neu2<-neu1[neu1$Marker %in% Marker3,] > >> nep2<-nep1[nep1$Marker %in% Marker3,] > >> ret2<-ret1[ret1$Marker %in% Marker3,] > >> > >> Jim > >> > >> On Tue, Jun 2, 2020 at 7:50 AM Ana Marija > >> wrote: > >>> Hello, > >>> > >>> I have 3 data frames which have about 3.4 mill lines (but they don't have > >>> exactly the same number of lines)...they look like this: > >>> ... > >>> Is there is a way to create another 3 data frames, say neu2, nep2, ret2 > >>> which would only contain lines that have the same entries in Marker > >> column > >>> for all 3 data frames? > >>> > >>> Thanks > >>> Ana > >>> > >>> [[alternative HTML version deleted]] > >>> > >>> __ > >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > >>> https://stat.ethz.ch/mailman/listinfo/r-help > >>> PLEASE do read the posting guide > >> http://www.R-project.org/posting-guide.html > >>> and provide commented, minimal, self-contained, reproducible code. > > [[alternative HTML version deleted]] > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] is there is a way to extract lines in between 3 files that are in common based on one column?
Hi Jim, thank you so much for getting back to me. I tried your code and this is what I get: > dim(neu2) [1] 3740988 9 > dim(nep2) [1] 3740988 9 > dim(ret2) [1] 3740001 9 I think I would need to have the same number of lines in all 3 data frames. Can you please advise. Cheers Ana On Mon, Jun 1, 2020 at 7:31 PM Jim Lemon wrote: > Hi Ana, > Not too hard, but your example has all the "marker" fields in common. > So using a sample that will show the expected result: > > neu1<-read.table(text="Chr BP Marker MAF A1 A2 Direction pValue N > 1 10012 1:10012:G:T 0.229925 T G + 0.650403 1594 > 1 10827 1:10827:C:T 0.287014 T C + 0.955449 1594 > 1 12713 1:12713:C:T 0.097867 T C - 0.290455 1594 > 1 12882 1:12882:T:G 0.287014 G T + 0.955449 1594 > 1 12991 1:12991:G:A 0.097867 A G - 0.290455 1594 > 1 14726 1:14726:G:A 0.132058 A G + 0.115005 1594", > header=TRUE,stringsAsFactors=FALSE) > > nep1<-read.table(text="Chr BP Marker MAF A1 A2 DirectionpValue N > 1 10012 1:10012:G:T 0.2300430 T G - 0.1420030 1641 > 1 10827 1:10827:C:T 0.2867150 T C - 0.2045580 1641 > 1 12713 1:12713:C:T 0.0975015 T C - 0.007 1641 > 1 12882 1:12882:T:G 0.2867150 G T - 0.2045580 1641 > 1 12991 1:12991:G:A 0.0975015 A G - 0.007 1641 > 1 14726 1:14727:G:A 0.1325410 A G - 0.8725660 1641", > header=TRUE,stringsAsFactors=FALSE) > > ret1<-read.table(text="Chr BP Marker MAF A1 A2 Direction pValue N > 1 10012 1:10012:G:T 0.2322760 T G - 0.230383 1608 > 1 10827 1:10827:C:T 0.2882460 T C - 0.120356 1608 > 1 12713 1:12713:C:T 0.0982587 T C - 0.272936 1608 > 1 12882 1:12882:T:G 0.2882460 G T - 0.120356 1608 > 1 12991 1:12992:G:A 0.0982587 A G - 0.272936 1608 > 1 14726 1:14727:G:A 0.1340170 A G - 0.594538 1608", > header=TRUE,stringsAsFactors=FALSE) > > # merge the three data frames on "Marker" > nn1<-merge(neu1,nep1,by="Marker") > nn2<-merge(nn1,ret1,by="Marker") > # get the common "Marker" strings > Marker3<-nn2$Marker > # subset all three data frames on Marker3 > neu2<-neu1[neu1$Marker %in% Marker3,] > nep2<-nep1[nep1$Marker %in% Marker3,] > ret2<-ret1[ret1$Marker %in% Marker3,] > > Jim > > On Tue, Jun 2, 2020 at 7:50 AM Ana Marija > wrote: > > > > Hello, > > > > I have 3 data frames which have about 3.4 mill lines (but they don't have > > exactly the same number of lines)...they look like this: > > ... > > Is there is a way to create another 3 data frames, say neu2, nep2, ret2 > > which would only contain lines that have the same entries in Marker > column > > for all 3 data frames? > > > > Thanks > > Ana > > > > [[alternative HTML version deleted]] > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] is there is a way to extract lines in between 3 files that are in common based on one column?
Hello, I have 3 data frames which have about 3.4 mill lines (but they don't have exactly the same number of lines)...they look like this: > neu1=neu[order(neu$Marker),] > head(neu1) ChrBP Marker MAF A1 A2 Direction pValueN 209565 1 10012 1:10012:G:T 0.229925 T G + 0.650403 1594 209566 1 10827 1:10827:C:T 0.287014 T C + 0.955449 1594 209567 1 12713 1:12713:C:T 0.097867 T C - 0.290455 1594 209568 1 12882 1:12882:T:G 0.287014 G T + 0.955449 1594 209569 1 12991 1:12991:G:A 0.097867 A G - 0.290455 1594 209570 1 14726 1:14726:G:A 0.132058 A G + 0.115005 1594 > nep1=nep[order(nep$Marker),] > head(nep1) ChrBP Marker MAF A1 A2 DirectionpValue N 209642 1 10012 1:10012:G:T 0.2300430 T G - 0.1420030 1641 209643 1 10827 1:10827:C:T 0.2867150 T C - 0.2045580 1641 209644 1 12713 1:12713:C:T 0.0975015 T C - 0.007 1641 209645 1 12882 1:12882:T:G 0.2867150 G T - 0.2045580 1641 209646 1 12991 1:12991:G:A 0.0975015 A G - 0.007 1641 209647 1 14726 1:14726:G:A 0.1325410 A G - 0.8725660 1641 > ret1=ret[order(ret$Marker),] > head(ret1) ChrBP Marker MAF A1 A2 Direction pValue N 8654531 10012 1:10012:G:T 0.2322760 T G - 0.230383 1608 4515961 10827 1:10827:C:T 0.2882460 T C - 0.120356 1608 1026046 1 12713 1:12713:C:T 0.0982587 T C - 0.272936 1608 4515971 12882 1:12882:T:G 0.2882460 G T - 0.120356 1608 1026047 1 12991 1:12991:G:A 0.0982587 A G - 0.272936 1608 2234642 1 14726 1:14726:G:A 0.1340170 A G - 0.594538 1608 Is there is a way to create another 3 data frames, say neu2, nep2, ret2 which would only contain lines that have the same entries in Marker column for all 3 data frames? Thanks Ana [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to load data frame where numeric will be numeric instead of character
HI David, this is the problem: > NEP <- read.table("gokind.nephropathy.fin", header=T,stringsAsFactors=FALSE) > sapply(NEP,class) Chr BP Marker MAF A1 A2 "character" "character" "character" "character" "character" "character" Direction pValue N So even entries like Chr, BP, MAFare characters while they should be numeric > head(NEP) ChrBP Marker MAF A1 A2 Direction pValueN 1 10 10625 10:10625:A:G 0.4156 G A + 0.484813 1641 2 10 10645 10:10645:A:C 0.216027 C A + 0.73597 1641 Can you please tell me what colClasses=colClassvec suppose to do? Thanks Ana On Mon, Jun 1, 2020 at 4:13 PM David Winsemius wrote: > > On 6/1/20 1:37 PM, Ana Marija wrote: > > Hello, > > > > I have a dataframe like this: > > > >ChrBP Marker MAF A1 A2 Direction pValueN > > 1 10 10625 10:10625:A:G 0.416562 G A - 0.558228 1594 > > 2 10 10645 10:10645:A:C 0.215182 C A - 0.880622 1594 > > ... > > > > which I load with: > > NEU <- read.table("gokind.neuropathy.fin", > header=T,stringsAsFactors=FALSE) > > > > and every column is numeric. How to say have all numeric ones stay > numeric > > like: Chr, BP, MAF, pValue, N > > > I cannot figure out what the problem is. You say every column is > numeric. It's not possible to have a column that contains the value > "10:10625:A:G" be numeric. > > > If you meant to say the every column was character, then the answer > might be: > > > colClassvec <- rep("numeric",9) > colClassvec[ c(3,5:7)] <- "character" > > NEU <- read.table("gokind.neuropathy.fin", > header=T,stringsAsFactors=FALSE, colClasses=colClassvec) > > -- > David. > > > > > Thanks > > Ana > > > > [[alternative HTML version deleted]] > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to load data frame where numeric will be numeric instead of character
7th fileld, Direction contains only "+" and "-" On Mon, Jun 1, 2020 at 3:46 PM Bert Gunter wrote: > I count 8 fields in your data and 9 names in your heading ?? > > > Bert Gunter > > "The trouble with having an open mind is that people keep coming along and > sticking things into it." > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > > On Mon, Jun 1, 2020 at 1:38 PM Ana Marija > wrote: > >> Hello, >> >> I have a dataframe like this: >> >> ChrBP Marker MAF A1 A2 Direction pValueN >> 1 10 10625 10:10625:A:G 0.416562 G A - 0.558228 1594 >> 2 10 10645 10:10645:A:C 0.215182 C A - 0.880622 1594 >> ... >> >> which I load with: >> NEU <- read.table("gokind.neuropathy.fin", >> header=T,stringsAsFactors=FALSE) >> >> and every column is numeric. How to say have all numeric ones stay numeric >> like: Chr, BP, MAF, pValue, N >> >> Thanks >> Ana >> >> [[alternative HTML version deleted]] >> >> __ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] how to load data frame where numeric will be numeric instead of character
Hello, I have a dataframe like this: ChrBP Marker MAF A1 A2 Direction pValueN 1 10 10625 10:10625:A:G 0.416562 G A - 0.558228 1594 2 10 10645 10:10645:A:C 0.215182 C A - 0.880622 1594 ... which I load with: NEU <- read.table("gokind.neuropathy.fin", header=T,stringsAsFactors=FALSE) and every column is numeric. How to say have all numeric ones stay numeric like: Chr, BP, MAF, pValue, N Thanks Ana [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to only show two numbers on bar_plot with ggplot
I resolved it not elegantly with: d=as.numeric(as.character(e$pheno)) ed<-ggplot(e) + geom_bar(aes(x = ESRD, fill = factor(pheno,labels=c("control","case"+scale_fill_manual(values=c("#56B4E9","#E7B800"))+labs(fill="pheno")+scale_x_continuous(breaks = unique(d)) ed where: > head(e) ESRD pheno 11 1 21 1 31 2 41 1 51 1 > sapply(e,class) ESRD pheno "integer" "factor" On Fri, May 22, 2020 at 1:52 PM Ana Marija wrote: > > Hello, > > I made the plot in attach via: > > ed<-ggplot(e) + > geom_bar(aes(x = ESRD, fill = > factor(pheno,labels=c("control","case"+scale_fill_manual(values=c("#56B4E9","#E7B800"))+labs(fill="pheno") > > ed > > How do I show only 1 and 2 on x axis? > > Thanks > Ana __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How to only show two numbers on bar_plot with ggplot
Hello, I made the plot in attach via: ed<-ggplot(e) + geom_bar(aes(x = ESRD, fill = factor(pheno,labels=c("control","case"+scale_fill_manual(values=c("#56B4E9","#E7B800"))+labs(fill="pheno") ed How do I show only 1 and 2 on x axis? Thanks Ana __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to show percentage of individuals for two groups on histogram?
Hi Eric, Thank you for getting back to me, I tried those solutions but they don't do percentage per groups, so if I do ggplot(data=subset(a, !is.na(pheno)), aes(x=HBA1C, fill=pheno)) + geom_histogram(aes(y = stat(density)), binwidth = 0.5) + scale_y_continuous(labels = scales::percent_format()) I am getting the plot in attach, while my results should be more in this range like on the plot here: https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/variable.cgi?study_id=phs18.v2.p1&phv=19980&phd=154&pha=2864&pht=62&phvf=&phdf=&phaf=&phtf=&dssp=1&consent=&temp=1 On Fri, May 22, 2020 at 12:18 AM Eric Berger wrote: > > Hi Ana, > This is a very common question about ggplot. > A quick search turns up lots of hits that answer your question. Here > are a couple > https://community.rstudio.com/t/trouble-scaling-y-axis-to-percentages-from-counts/42999 > https://stackoverflow.com/questions/3695497/show-instead-of-counts-in-charts-of-categorical-variables > > From reading those discussions, the following should work (untested) > > ggplot(a, aes(x = HBA1C, fill=pheno)) + geom_histogram(aes(y = > stat(density)), binwidth = 0.5) + > scale_y_continuous(labels = scales::percent_format()) > > HTH, > Eric > > > On Fri, May 22, 2020 at 7:18 AM Jim Lemon wrote: > > > > Hi Ana, > > Just noticed a typo from a hasty cut-paste. Two lines should read: > > > > casehist<-table(cut(aafd$HBAIC[aafd$pheno=="case"],breaks=0:15)) > > controlhist<-table(cut(aafd$HBAIC[aafd$pheno=="control"],breaks=0:15)) > > > > Jim > > > > On Fri, May 22, 2020 at 2:08 PM Jim Lemon wrote: > > > > > > Hi Ana, > > > My apologies for the pedestrian graphics, but it may help. > > > > > > # a bit of fake data > > > aafd<-data.frame(FID=paste0("fam",1000:2739), > > > IID=paste0("G",1000,2739),FLASER=rep(1,1740), > > > PLASER=c(rep(1,892),rep(2,848)), > > > DIABDUR=sample(10:50,1740,TRUE), > > > HBAIC=rnorm(1740,mean=7.45,sd=2),ESRD=rep(1,1740), > > > pheno=c(rep("control",892),rep("case",848))) > > > par(mfrow=c(2,1)) > > > casepct<-table(cut(aafd$HBAIC[aafd$pheno=="case"],breaks=0:15)) > > > controlpct<-table(cut(aafd$HBAIC[aafd$pheno=="control"],breaks=0:15)) > > > par(mar=c(0,4,1,2)) > > > barpos=barplot(100*casehist,names.arg=names(casepct),col="orange", > > > space=0,ylab="Percentage",xaxt="n",ylim=c(0,25)) > > > text(mean(barpos),23, > > > "Cases: n=848, nulls=26, median=7.3, mean=7.45, sd=1.96") > > > box() > > > par(mar=c(3,4,0,2)) > > > barplot(100*controlhist,names.arg=names(controlpct), > > > space=0,ylab="Percentage",col="orange",ylim=c(0,25)) > > > text(mean(barpos),23, > > > "Controls: n=892, nulls=7, median=7.3, mean=7.45, sd=1.12") > > > box() > > > > > > Jim > > > > > > On Fri, May 22, 2020 at 9:08 AM Ana Marija > > > wrote: > > > > > > > > the result would basically look something like this on in attach or > > > > the overlay of those two plots > > > > > > > > > > > > On Thu, May 21, 2020 at 5:23 PM Ana Marija > > > > wrote: > > > > > > > > > > Hello, > > > > > > > > > > I have a data frame like this: > > > > > > head(a) > > > > > FID IID FLASER PLASER DIABDUR HBA1C ESRD pheno > > > > > 1 fam1000-03 G1000 1 1 38 10.21 control > > > > > 2 fam1001-03 G1001 1 1 15 7.31 control > > > > > 3 fam1003-03 G1003 1 2 17 7.01case > > > > > 4 fam1005-03 G1005 1 1 36 7.71 control > > > > > 5 fam1009-03 G1009 1 1 23 7.61 control > > > > > 6 fam1052-03 G1052 1 1 32 7.31 control > > > > > > > > > > > dim(a) > > > > > [1] 16988 > > > > > > > > > > I am doing histogram plot via: > > > > > ggplot(a, aes(x=HBA1C, fill=pheno)) + geom_histogram(binwidth=.5, > > > > > position="dodge") > > > > > > > > > > there is 848 who have "case" in pheno column and 892 who have > > > > > "control" in pheno column. > > > > > > > > > &g
Re: [R] how to show percentage of individuals for two groups on histogram?
HI Jim, Thank you so much for getting back to me I tried your codes and I got this in attach, I think the issue is in calculating percentage per groups (cases or controls) par(mfrow=c(2,1)) casehist<-table(cut(a$HBA1C[a$pheno=="case"],breaks=0:15)) controlhist<-table(cut(a$HBA1C[a$pheno=="control"],breaks=0:15)) par(mar=c(0,4,1,2)) barpos=barplot(100*casehist,names.arg=names(casehist),col="orange", space=0,ylab="Percentage",xaxt="n",ylim=c(0,25)) text(mean(barpos),23, "Cases: n=848, nulls=26, median=7.3, mean=7.45, sd=1.96") box() par(mar=c(3,4,0,2)) barplot(100*controlhist,names.arg=names(controlhist), space=0,ylab="Percentage",col="orange",ylim=c(0,25)) text(mean(barpos),23, "Controls: n=892, nulls=7, median=7.3, mean=7.45, sd=1.12") box() I can send you the whole dataset if you would like to try with it On Thu, May 21, 2020 at 11:14 PM Jim Lemon wrote: > > Hi Ana, > Just noticed a typo from a hasty cut-paste. Two lines should read: > > casehist<-table(cut(aafd$HBAIC[aafd$pheno=="case"],breaks=0:15)) > controlhist<-table(cut(aafd$HBAIC[aafd$pheno=="control"],breaks=0:15)) > > Jim > > On Fri, May 22, 2020 at 2:08 PM Jim Lemon wrote: > > > > Hi Ana, > > My apologies for the pedestrian graphics, but it may help. > > > > # a bit of fake data > > aafd<-data.frame(FID=paste0("fam",1000:2739), > > IID=paste0("G",1000,2739),FLASER=rep(1,1740), > > PLASER=c(rep(1,892),rep(2,848)), > > DIABDUR=sample(10:50,1740,TRUE), > > HBAIC=rnorm(1740,mean=7.45,sd=2),ESRD=rep(1,1740), > > pheno=c(rep("control",892),rep("case",848))) > > par(mfrow=c(2,1)) > > casepct<-table(cut(aafd$HBAIC[aafd$pheno=="case"],breaks=0:15)) > > controlpct<-table(cut(aafd$HBAIC[aafd$pheno=="control"],breaks=0:15)) > > par(mar=c(0,4,1,2)) > > barpos=barplot(100*casehist,names.arg=names(casepct),col="orange", > > space=0,ylab="Percentage",xaxt="n",ylim=c(0,25)) > > text(mean(barpos),23, > > "Cases: n=848, nulls=26, median=7.3, mean=7.45, sd=1.96") > > box() > > par(mar=c(3,4,0,2)) > > barplot(100*controlhist,names.arg=names(controlpct), > > space=0,ylab="Percentage",col="orange",ylim=c(0,25)) > > text(mean(barpos),23, > > "Controls: n=892, nulls=7, median=7.3, mean=7.45, sd=1.12") > > box() > > > > Jim > > > > On Fri, May 22, 2020 at 9:08 AM Ana Marija > > wrote: > > > > > > the result would basically look something like this on in attach or > > > the overlay of those two plots > > > > > > > > > On Thu, May 21, 2020 at 5:23 PM Ana Marija > > > wrote: > > > > > > > > Hello, > > > > > > > > I have a data frame like this: > > > > > head(a) > > > > FID IID FLASER PLASER DIABDUR HBA1C ESRD pheno > > > > 1 fam1000-03 G1000 1 1 38 10.21 control > > > > 2 fam1001-03 G1001 1 1 15 7.31 control > > > > 3 fam1003-03 G1003 1 2 17 7.01case > > > > 4 fam1005-03 G1005 1 1 36 7.71 control > > > > 5 fam1009-03 G1009 1 1 23 7.61 control > > > > 6 fam1052-03 G1052 1 1 32 7.31 control > > > > > > > > > dim(a) > > > > [1] 16988 > > > > > > > > I am doing histogram plot via: > > > > ggplot(a, aes(x=HBA1C, fill=pheno)) + geom_histogram(binwidth=.5, > > > > position="dodge") > > > > > > > > there is 848 who have "case" in pheno column and 892 who have > > > > "control" in pheno column. > > > > > > > > I would like to have on y-axis shown percentage of individuals which > > > > have either "case" or "control" in pheno instead of count. > > > > > > > > Please advise, > > > > Ana > > > __ > > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > > https://stat.ethz.ch/mailman/listinfo/r-help > > > PLEASE do read the posting guide > > > http://www.R-project.org/posting-guide.html > > > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to show percentage of individuals for two groups on histogram?
the result would basically look something like this on in attach or the overlay of those two plots On Thu, May 21, 2020 at 5:23 PM Ana Marija wrote: > > Hello, > > I have a data frame like this: > > head(a) > FID IID FLASER PLASER DIABDUR HBA1C ESRD pheno > 1 fam1000-03 G1000 1 1 38 10.21 control > 2 fam1001-03 G1001 1 1 15 7.31 control > 3 fam1003-03 G1003 1 2 17 7.01case > 4 fam1005-03 G1005 1 1 36 7.71 control > 5 fam1009-03 G1009 1 1 23 7.61 control > 6 fam1052-03 G1052 1 1 32 7.31 control > > > dim(a) > [1] 16988 > > I am doing histogram plot via: > ggplot(a, aes(x=HBA1C, fill=pheno)) + geom_histogram(binwidth=.5, > position="dodge") > > there is 848 who have "case" in pheno column and 892 who have > "control" in pheno column. > > I would like to have on y-axis shown percentage of individuals which > have either "case" or "control" in pheno instead of count. > > Please advise, > Ana __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] how to show percentage of individuals for two groups on histogram?
Hello, I have a data frame like this: > head(a) FID IID FLASER PLASER DIABDUR HBA1C ESRD pheno 1 fam1000-03 G1000 1 1 38 10.21 control 2 fam1001-03 G1001 1 1 15 7.31 control 3 fam1003-03 G1003 1 2 17 7.01case 4 fam1005-03 G1005 1 1 36 7.71 control 5 fam1009-03 G1009 1 1 23 7.61 control 6 fam1052-03 G1052 1 1 32 7.31 control > dim(a) [1] 16988 I am doing histogram plot via: ggplot(a, aes(x=HBA1C, fill=pheno)) + geom_histogram(binwidth=.5, position="dodge") there is 848 who have "case" in pheno column and 892 who have "control" in pheno column. I would like to have on y-axis shown percentage of individuals which have either "case" or "control" in pheno instead of count. Please advise, Ana __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] text annotation on Manhattn plot in qqman
HI Michael, Thank you so much! That worked!!! Now I am just trying to increase the size of text of SNP and GENE on plot I tried this: a$newname <- paste(a$SNP,"\n", a$GENE) manhattan(a, chr="CHR", bp="BP", snp="newname", p="P",cex = 0.5,annotatePval = 0.0001) but I am getting this error: Error in textxy(topSNPs$pos, -log10(topSNPs$P), offset = 0.625, labs = topSNPs$SNP, : formal argument "cex" matched by multiple actual arguments Do you by any chance know how to do this? Cheers Ana On Wed, May 20, 2020 at 4:12 AM Michael Dewey wrote: > > a$newname <- paste(a$SNP, a$GENE) > manhattan(a, chr="CHR", bp="BP", snp="newname", p="P",annotatePval = 0.0001) > > However note that I do not use manhattan() so you may need to alter the > parameters as I am assuming a is where it finds the remaining parameters. > > You may also need to play with the sep =, and collapse = parameters to > paste() to get the precise layout you want. > > Michael > > On 19/05/2020 17:21, Ana Marija wrote: > > Hi Michael, > > > > can you please send me code how that would be done? > > > > Thanks > > Ana > > > > On Tue, May 19, 2020 at 11:18 AM Michael Dewey > > wrote: > >> > >> Dear Ana > >> > >> Perhaps paste together SNP and GENE using paste() and then supply that > >> as the snp parameter. > >> > >> Michael > >> > >> On 19/05/2020 17:12, Ana Marija wrote: > >>> Hello, > >>> > >>> I am making manhattan plot with: > >>> library(qqman) > >>> manhattan(a, chr="CHR", bp="BP", snp="SNP", p="P",annotatePval = 0.0001) > >>> > >>> and I would like to annotate these two SNPs which are above the > >>> threshold so that they have GENE name beside them: > >>> > >>>> a[a$SNP=="rs4081570",] > >>> SNPP CHR BP GENE > >>> 1 rs4081570 6.564447e-05 19 15918647 UCA1 > >>>> a[a$SNP=="rs11867934",] > >>> SNPP CHR BP GENE > >>> 1021 rs11867934 6.738066e-06 17 16933404 FLCN > >>> > >>> Right now my plot only has SNP name for those 2, how to add GENE names > >>> (FLCN and UCA1 as well) > >>> > >>> Please advise > >>> Ana > >>> > >>> > >>> > >>> __ > >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > >>> https://stat.ethz.ch/mailman/listinfo/r-help > >>> PLEASE do read the posting guide > >>> http://www.R-project.org/posting-guide.html > >>> and provide commented, minimal, self-contained, reproducible code. > >>> > >> > >> -- > >> Michael > >> http://www.dewey.myzen.co.uk/home.html > > > > -- > Michael > http://www.dewey.myzen.co.uk/home.html __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] text annotation on Manhattn plot in qqman
Hi Michael, can you please send me code how that would be done? Thanks Ana On Tue, May 19, 2020 at 11:18 AM Michael Dewey wrote: > > Dear Ana > > Perhaps paste together SNP and GENE using paste() and then supply that > as the snp parameter. > > Michael > > On 19/05/2020 17:12, Ana Marija wrote: > > Hello, > > > > I am making manhattan plot with: > > library(qqman) > > manhattan(a, chr="CHR", bp="BP", snp="SNP", p="P",annotatePval = 0.0001) > > > > and I would like to annotate these two SNPs which are above the > > threshold so that they have GENE name beside them: > > > >> a[a$SNP=="rs4081570",] > > SNPP CHR BP GENE > > 1 rs4081570 6.564447e-05 19 15918647 UCA1 > >> a[a$SNP=="rs11867934",] > > SNPP CHR BP GENE > > 1021 rs11867934 6.738066e-06 17 16933404 FLCN > > > > Right now my plot only has SNP name for those 2, how to add GENE names > > (FLCN and UCA1 as well) > > > > Please advise > > Ana > > > > > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > -- > Michael > http://www.dewey.myzen.co.uk/home.html __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] text annotation on Manhattn plot in qqman
Hello, I am making manhattan plot with: library(qqman) manhattan(a, chr="CHR", bp="BP", snp="SNP", p="P",annotatePval = 0.0001) and I would like to annotate these two SNPs which are above the threshold so that they have GENE name beside them: > a[a$SNP=="rs4081570",] SNPP CHR BP GENE 1 rs4081570 6.564447e-05 19 15918647 UCA1 > a[a$SNP=="rs11867934",] SNPP CHR BP GENE 1021 rs11867934 6.738066e-06 17 16933404 FLCN Right now my plot only has SNP name for those 2, how to add GENE names (FLCN and UCA1 as well) Please advise Ana __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to extract strings in any column and in any row that start with
Hi Rui, thank you so much that is exactly what I needed! Cheers, Ana On Fri, May 15, 2020 at 5:12 PM Rui Barradas wrote: > > Hello, > > I have tried several options and with large dataframes this one was the > fastest (in my tests, of the ones I have tried). > > > s1 <- sapply(tot, function(x) grep('^E10', x, value = TRUE)) > > > Then unlist(s1). > A close second (15% slower) was > > > s2 <- tot[sapply(tot, function(x) grepl('^E10', x))] > > > grep/unlist was 3.7 times slower: > > > grep("^E10", unlist(tot), value = TRUE) > > > Hope this helps, > > Rui Barradas > > Às 20:24 de 15/05/20, Ana Marija escreveu: > > Hello, > > > > this command was running for more than 2 hours > > grep("E10",tot,value=T) > > and no output > > > > and this command > > df1 <- tot %>% filter_all(any_vars(grepl( '^E10', .))) > > > > gave me a subset (a data frame) of tot where ^E10 > > > > what I need is just a vector or all values in tot which start with E10. > > > > Thanks > > Ana > > > > On Fri, May 15, 2020 at 12:13 PM Jeff Newmiller > > wrote: > >> > >> Read about regular expressions... they are extremely useful. > >> > >> df1 <- tot %>% filter_all(any_vars(grepl( '^E10', .))) > >> > >> It is bad form not to put spaces around the <- assignment. > >> > >> > >> On May 15, 2020 10:00:04 AM PDT, Ana Marija > >> wrote: > >>> Hello, > >>> > >>> I have a data frame: > >>> > >>>> dim(tot) > >>> [1] 502536 1093 > >>> > >>> How would I extract from it all strings that start with E10? > >>> > >>> I know how to extract all rows that contain with E10 > >>> df0<-tot %>% filter_all(any_vars(. %in% c('E10'))) > >>>> dim(df0) > >>> [1] 5105 1093 > >>> > >>> but I just need a vector of strings that start with E10... > >>> it would look something like this: > >>> > >>> [1] "E102" "E109" "E108" "E103" "E104" "E105" "E101" "E106" "E107" > >>> > >>> Thanks > >>> Ana > >>> > >>> __ > >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > >>> https://stat.ethz.ch/mailman/listinfo/r-help > >>> PLEASE do read the posting guide > >>> http://www.R-project.org/posting-guide.html > >>> and provide commented, minimal, self-contained, reproducible code. > >> > >> -- > >> Sent from my phone. Please excuse my brevity. > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to extract strings in any column and in any row that start with
Hello, this command was running for more than 2 hours grep("E10",tot,value=T) and no output and this command df1 <- tot %>% filter_all(any_vars(grepl( '^E10', .))) gave me a subset (a data frame) of tot where ^E10 what I need is just a vector or all values in tot which start with E10. Thanks Ana On Fri, May 15, 2020 at 12:13 PM Jeff Newmiller wrote: > > Read about regular expressions... they are extremely useful. > > df1 <- tot %>% filter_all(any_vars(grepl( '^E10', .))) > > It is bad form not to put spaces around the <- assignment. > > > On May 15, 2020 10:00:04 AM PDT, Ana Marija > wrote: > >Hello, > > > >I have a data frame: > > > >> dim(tot) > >[1] 502536 1093 > > > >How would I extract from it all strings that start with E10? > > > >I know how to extract all rows that contain with E10 > >df0<-tot %>% filter_all(any_vars(. %in% c('E10'))) > >> dim(df0) > >[1] 5105 1093 > > > >but I just need a vector of strings that start with E10... > >it would look something like this: > > > >[1] "E102" "E109" "E108" "E103" "E104" "E105" "E101" "E106" "E107" > > > >Thanks > >Ana > > > >__ > >R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > >https://stat.ethz.ch/mailman/listinfo/r-help > >PLEASE do read the posting guide > >http://www.R-project.org/posting-guide.html > >and provide commented, minimal, self-contained, reproducible code. > > -- > Sent from my phone. Please excuse my brevity. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] how to extract strings in any column and in any row that start with
Hello, I have a data frame: > dim(tot) [1] 502536 1093 How would I extract from it all strings that start with E10? I know how to extract all rows that contain with E10 df0<-tot %>% filter_all(any_vars(. %in% c('E10'))) > dim(df0) [1] 5105 1093 but I just need a vector of strings that start with E10... it would look something like this: [1] "E102" "E109" "E108" "E103" "E104" "E105" "E101" "E106" "E107" Thanks Ana __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Error: Cannot use `+.gg()` with a single argument.
Hello, I got this error: Error: Cannot use `+.gg()` with a single argument. Did you accidentally put + on a new line? After running this: data(murders) library(ggplot2) library(dplyr) library(ggplot2) ggplot(data=murders) #define the slope of the line r<-murders %>% summarize(rate=sum(total)/sum(population)*10^6) %>%.$rate #mamke the plot murders %>% ggplot(aes(population/10^6,total,label=abb))+ +geom_abline(intercept = log10(r),lty=2,color="darkgrey")+ +geom_point(aes(col=region), size=3)+ +geom_text_repel()+ +scale_x_log10()+ +scale_y_log10()+ +xlab("Populations in millions (log scale)")+ +ylab("Total number of murders (log scale)")+ +ggtitle("US Gun Murders in US 2010")+ +scale_color_discrete(name="Region")+ +theme_economist() Is this an issue with my dplyr? Or how I can fix this code in order to work? Thanks Ana __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] calculating t-score/t-stats as my zscores
Thank you so much! I mostly worry which of those procedures is the closest to the z-score On Wed, May 6, 2020 at 10:05 AM Rui Barradas wrote: > > Hello, > > Another option is to use stats::t.test > > t.test(x)$statistic > > Or, if you want to test against a beta0 != 0, > > t.test(x, mu = beta0)$statistic > > > But in this case the estimator is the estimator for the mean value. > > Hope this helps, > > Rui Barradas > > Às 15:54 de 06/05/20, Ana Marija escreveu: > > Thanks Patrick, so in conclusion this is fine? > > z-score=Beta/StdErr > > > > On Wed, May 6, 2020 at 9:52 AM Patrick (Malone Quantitative) > > wrote: > >> > >> Guessing for Ana, but no, that's a different meaning. Beta/StdErr is a > >> z statistic--a test statistic against (usually) the tails of the unit > >> normal distribution. So like a t-test with infinite df. > >> > >> > >> On Wed, May 6, 2020 at 10:41 AM Rui Barradas wrote: > >>> > >>> Hello, > >>> > >>> By z-scores do you mean function help('scale')? > >>> > >>> Hope this helps, > >>> > >>> Rui Barradas > >>> > >>> Às 15:31 de 06/05/20, Ana Marija escreveu: > >>>> Hi Rui, > >>>> > >>>> Thank you for getting back to me. Is there is a better way to > >>>> calculate Z scores if I have p values, SE and Beta? > >>>> > >>>> Thanks > >>>> Ana > >>>> > >>>> On Wed, May 6, 2020 at 9:27 AM Rui Barradas wrote: > >>>>> > >>>>> Hello, > >>>>> > >>>>> That gives the *absolute* t-scores. If it's all you need/want, then the > >>>>> answer is yes, you can. > >>>>> > >>>>> Hope this helps, > >>>>> > >>>>> Rui Barradas > >>>>> > >>>>> Às 14:28 de 06/05/20, Ana Marija escreveu: > >>>>>> Hello, > >>>>>> > >>>>>> Can I apply the quantile function qt() this way? > >>>>>> qt(pvals/2, 406-34, lower.tail = F) > >>>>>> to get the T-scores? > >>>>>> > >>>>>> Thanks > >>>>>> Ama > >>>>>> > >>>>>> __ > >>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > >>>>>> https://stat.ethz.ch/mailman/listinfo/r-help > >>>>>> PLEASE do read the posting guide > >>>>>> http://www.R-project.org/posting-guide.html > >>>>>> and provide commented, minimal, self-contained, reproducible code. > >>>>>> > >>> > >>> __ > >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > >>> https://stat.ethz.ch/mailman/listinfo/r-help > >>> PLEASE do read the posting guide > >>> http://www.R-project.org/posting-guide.html > >>> and provide commented, minimal, self-contained, reproducible code. > >> > >> > >> > >> -- > >> Patrick S. Malone, Ph.D., Malone Quantitative > >> NEW Service Models: http://malonequantitative.com > >> > >> He/Him/His __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] calculating t-score/t-stats as my zscores
Thanks Patrick, so in conclusion this is fine? z-score=Beta/StdErr On Wed, May 6, 2020 at 9:52 AM Patrick (Malone Quantitative) wrote: > > Guessing for Ana, but no, that's a different meaning. Beta/StdErr is a > z statistic--a test statistic against (usually) the tails of the unit > normal distribution. So like a t-test with infinite df. > > > On Wed, May 6, 2020 at 10:41 AM Rui Barradas wrote: > > > > Hello, > > > > By z-scores do you mean function help('scale')? > > > > Hope this helps, > > > > Rui Barradas > > > > Às 15:31 de 06/05/20, Ana Marija escreveu: > > > Hi Rui, > > > > > > Thank you for getting back to me. Is there is a better way to > > > calculate Z scores if I have p values, SE and Beta? > > > > > > Thanks > > > Ana > > > > > > On Wed, May 6, 2020 at 9:27 AM Rui Barradas wrote: > > >> > > >> Hello, > > >> > > >> That gives the *absolute* t-scores. If it's all you need/want, then the > > >> answer is yes, you can. > > >> > > >> Hope this helps, > > >> > > >> Rui Barradas > > >> > > >> Às 14:28 de 06/05/20, Ana Marija escreveu: > > >>> Hello, > > >>> > > >>> Can I apply the quantile function qt() this way? > > >>> qt(pvals/2, 406-34, lower.tail = F) > > >>> to get the T-scores? > > >>> > > >>> Thanks > > >>> Ama > > >>> > > >>> __ > > >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > >>> https://stat.ethz.ch/mailman/listinfo/r-help > > >>> PLEASE do read the posting guide > > >>> http://www.R-project.org/posting-guide.html > > >>> and provide commented, minimal, self-contained, reproducible code. > > >>> > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > -- > Patrick S. Malone, Ph.D., Malone Quantitative > NEW Service Models: http://malonequantitative.com > > He/Him/His __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] calculating t-score/t-stats as my zscores
as defined here: https://huwenboshi.github.io/data%20management/2017/11/23/tips-for-formatting-gwas-summary-stats.html where Effect size is Beta On Wed, May 6, 2020 at 9:41 AM Rui Barradas wrote: > > Hello, > > By z-scores do you mean function help('scale')? > > Hope this helps, > > Rui Barradas > > Às 15:31 de 06/05/20, Ana Marija escreveu: > > Hi Rui, > > > > Thank you for getting back to me. Is there is a better way to > > calculate Z scores if I have p values, SE and Beta? > > > > Thanks > > Ana > > > > On Wed, May 6, 2020 at 9:27 AM Rui Barradas wrote: > >> > >> Hello, > >> > >> That gives the *absolute* t-scores. If it's all you need/want, then the > >> answer is yes, you can. > >> > >> Hope this helps, > >> > >> Rui Barradas > >> > >> Às 14:28 de 06/05/20, Ana Marija escreveu: > >>> Hello, > >>> > >>> Can I apply the quantile function qt() this way? > >>> qt(pvals/2, 406-34, lower.tail = F) > >>> to get the T-scores? > >>> > >>> Thanks > >>> Ama > >>> > >>> __ > >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > >>> https://stat.ethz.ch/mailman/listinfo/r-help > >>> PLEASE do read the posting guide > >>> http://www.R-project.org/posting-guide.html > >>> and provide commented, minimal, self-contained, reproducible code. > >>> __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.