from:"Ana"

Re: [R] [External] Re: Error in setwd(dir) when initializing R

2023-11-22 Thread Ana de las Heras Molina

Thank you all for your help. I was better with Rgui. I have the feeling
that it was me, doing something wrong when I uninstalled Onedrive from both
computers... I will talk with the IT in the university.

Again, thank you for your interest. I have learnt a lot.

El mar, 21 nov 2023 a las 19:41, Michael Dewey ()
escribió:

> Dear Ana
>
> When you installed R it gave you an option to put a shortcut on the
> desktop. If you did that then you can start the R GUI by clicking on it.
> You may need to customise it for your preferences. Specifically it
> starts R with an option cd-user-docs which I delete and I then alter the
> starting option to start in the directory I want. You may have different
> preferences of course.
>
> Try Rgui as Ivan suggests and report back what happened.
>
> You may have to ask your local IT support as if this is something the
> Complutense is setting up it is surprising that all your collegues are
> free of problems. Have you asked around?
>
> Michael
>
> On 21/11/2023 12:41, Ana de las Heras Molina wrote:
> > Hello,
> >
> > I am sorry for my ignorance, but what is Rgui.exe?
> >
> > El mar, 21 nov 2023 a las 12:06, Ivan Krylov ()
> > escribió:
> >
> >> В Tue, 21 Nov 2023 10:51:59 +0100
> >> Ana de las Heras Molina  пишет:
> >>
> >>> I uninstalled onedrive, I eliminated all the folders and then
> >>> reinstalled R and RStudio... but it is RStudio the one creating a
> >>> folder called C:\Users\Ana\OneDrive - Universidad Complutense de
> >>> Madrid (UCM)\Documentos.
> >>
> >> Does Rgui.exe use a different home directory from RStudio? Perhaps you
> >> need not to reinstall RStudio (which deletes and recreates the program
> >> files which aren't damaged) but to wipe the profile (the settings
> >> somewhere under %APPDATA% or %LOCALAPPDATA%), but it pays to be extra
> >> careful about these directories. People at
> >> <https://community.rstudio.com/> may know more about that.
> >>
> >> I'm afraid I don't know how to disable OneDrive on a Windows 10
> >> installation. If your university has a Microsoft support contract, it
> >> could be worth asking the Microsoft representative to fix OneDrive so
> >> that temporary files would work on it and telling them the steps to
> >> reproduce the problem.
> >>
> >>> Browse[1]> dir.exists(dir)
> >>> [1] TRUE
> >>> Browse[1]> setwd(dir)
> >>> Error durante el wrapup: no es posible cambiar el directorio de
> >>> trabajo
> >>
> >> What is the value of `dir` at this point? (What does print(dir) say?)
> >> Can you open this directory in Windows Explorer?
> >>
> >> If possible, could you please set your mailer to compose messages in
> >> plain text instead of HTML? The plain text versions of your messages
> >> that your mailer generates from the HTML messages are somewhat mangled:
> >> https://stat.ethz.ch/pipermail/r-help/2023-November/478593.html
> >>
> >> --
> >> Best regards,
> >> Ivan
> >>
> >
> >
>
> --
> Michael
>


-- 
*Ana de las Heras Molina*


Nutrición Animal
Departamento de  Producción Animal
Facultad de Veterinaria
Universidad Complutense de Madrid

*Contacto*: 913943855/anaher...@ucm.es

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] [External] Re: Error in setwd(dir) when initializing R

2023-11-21 Thread Ana de las Heras Molina

Hello,

I am sorry for my ignorance, but what is Rgui.exe?

El mar, 21 nov 2023 a las 12:06, Ivan Krylov ()
escribió:

> В Tue, 21 Nov 2023 10:51:59 +0100
> Ana de las Heras Molina  пишет:
>
> > I uninstalled onedrive, I eliminated all the folders and then
> > reinstalled R and RStudio... but it is RStudio the one creating a
> > folder called C:\Users\Ana\OneDrive - Universidad Complutense de
> > Madrid (UCM)\Documentos.
>
> Does Rgui.exe use a different home directory from RStudio? Perhaps you
> need not to reinstall RStudio (which deletes and recreates the program
> files which aren't damaged) but to wipe the profile (the settings
> somewhere under %APPDATA% or %LOCALAPPDATA%), but it pays to be extra
> careful about these directories. People at
> <https://community.rstudio.com/> may know more about that.
>
> I'm afraid I don't know how to disable OneDrive on a Windows 10
> installation. If your university has a Microsoft support contract, it
> could be worth asking the Microsoft representative to fix OneDrive so
> that temporary files would work on it and telling them the steps to
> reproduce the problem.
>
> > Browse[1]> dir.exists(dir)
> > [1] TRUE
> > Browse[1]> setwd(dir)
> > Error durante el wrapup: no es posible cambiar el directorio de
> > trabajo
>
> What is the value of `dir` at this point? (What does print(dir) say?)
> Can you open this directory in Windows Explorer?
>
> If possible, could you please set your mailer to compose messages in
> plain text instead of HTML? The plain text versions of your messages
> that your mailer generates from the HTML messages are somewhat mangled:
> https://stat.ethz.ch/pipermail/r-help/2023-November/478593.html
>
> --
> Best regards,
> Ivan
>


-- 
*Ana de las Heras Molina*


Nutrición Animal
Departamento de  Producción Animal
Facultad de Veterinaria
Universidad Complutense de Madrid

*Contacto*: 913943855/anaher...@ucm.es

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] [External] Re: Error in setwd(dir) when initializing R

2023-11-21 Thread Ana de las Heras Molina

Hi,

I uninstalled onedrive, I eliminated all the folders and then reinstalled R
and RStudio... but it is RStudio the one creating a folder
called C:\Users\Ana\OneDrive - Universidad Complutense de Madrid
(UCM)\Documentos.

This is what I obtained with the debugging

Error in setwd(dir) : no es posible cambiar el directorio de trabajo
Enter a frame number, or 0 to exit

1: .rs.availablePackages()
2: .rs.onAvailablePackagesStale(reposString)
3: setwd(dir)
Selection: 3Called from:
.rs.onAvailablePackagesStale(reposString)Browse[1]> dir.exists(dir)[1]
TRUEBrowse[1]> setwd(dir)Error durante el wrapup: no es posible
cambiar el directorio de trabajo
Error: no more error handlers available (recursive errors?); invoking
'abort' restart


El mar, 21 nov 2023 a las 9:16, Ivan Krylov ()
escribió:

> On Tue, 21 Nov 2023 08:46:25 +0100
> Ana de las Heras Molina  wrote:
>
> > > traceback()
> > 4: setwd(dir)
> > 3: .rs.onAvailablePackagesStale(reposString)
> > 2: .rs.availablePackages()
> > 1: .rs.rpc.discover_package_dependencies("3C994FEC", ".R")
>
> This is something that RStudio (not R itself!) does, but it shouldn't
> be failing. Here's what seems to be failing for you [*]:
>
># prepare directory for discovery of available packages
>dir <- tempfile("rstudio-available-packages-")
>dir.create(dir, showWarnings = FALSE)
>
># create a file in that directory
>saveRDS(Sys.time(), file = file.path(dir, "time.rds"))
># (This doesn't fail because we keep executing the function, so the
># directory must exist!)
>
># move there
>owd <- setwd(dir) # this somehow fails
>
> Is it an option to disable OneDrive for the home directory? It's
> clearly doing something terrible to your temporary files. If not, it
> should be possible to create a separate temporary directory on your
> computer (the path must not contain spaces) and list it in the
> .Renviron file in the home directory as the TMPDIR variable:
>
> TMPDIR=C:/my/R/temp/directory
>
> See help(Startup) for more information on .Renviron.
>
> Alternatively, we may try to perform some debugging. If you set
> options(error = recover) and run .rs.availablePackages() again, it
> should fail and give you a debugger prompt. Choose the deepest option
> in the call stack and try to find out the value of the `dir` variable.
> Does dir.exists(dir) still return TRUE? Does setwd(dir) still fail? Can
> you open the directory in the Windows Explorer?
>
> The last but not the least, do you see some of the same problems if you
> launch Rgui.exe instead of RStudio?
>
> --
> Best regards,
> Ivan
>
> [*]
>
> https://github.com/rstudio/rstudio/blob/45af283a7a5853399904ddc438f6cd89d9b5c137/src/cpp/session/modules/SessionPackages.R#L1625-L1636
>


-- 
*Ana de las Heras Molina*


Nutrición Animal
Departamento de  Producción Animal
Facultad de Veterinaria
Universidad Complutense de Madrid

*Contacto*: 913943855/anaher...@ucm.es

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] [External] Re: Error in setwd(dir) when initializing R

2023-11-20 Thread Ana de las Heras Molina

Hello,

Thank you all for your responses. When I initialize R, the folder in which
it starts is:

getwd()
[1] "C:/Users/Ana/OneDrive - Universidad Complutense de Madrid
(UCM)/Documentos"

So it seems a problem with OneDrive
-Kevin: I do not really understand your question... You mean if I set my
working directory to my C:// or D:// folder in my computer? It doesn't work
either.

-Ivan: this is the message I got when running traceback(). So can I move
the.Rdata file without affecting the work?

-Luke: thank you for the idea, I will try it.

Error in setwd(dir) : no es posible cambiar el directorio de trabajo>
traceback()4: setwd(dir)
3: .rs.onAvailablePackagesStale(reposString)
2: .rs.availablePackages()
1: .rs.rpc.discover_package_dependencies("3C994FEC", ".R")


Best,

Ana


El mar, 21 nov 2023 a las 0:43,  escribió:

> On Mon, 20 Nov 2023, Ivan Krylov wrote:
>
> > On Mon, 20 Nov 2023 12:18:11 +0100
> > Ana de las Heras Molina  wrote:
> >
> >> Error in setwd(dir) : no es posible cambiar el directorio de trabajo
> >
> > If you run traceback() first thing after getting this error, does it
> > say anything useful? (Anything besides "No traceback available" would
> > count as useful.)
> >
> > Do you have a file named .RData in your home directory? If yes, it may
> > help to move it away (or remove it if you don't use the saved session).
>
> Also check for .Rprofile or other profile that might contain a setwd()
> call.  Starting R with --vanilla should work if the issue is a startup
> file.
>
> Best,
>
> luke
>
> >
> >
>
> --
> Luke Tierney
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa  Phone: 319-335-3386
> Department of Statistics andFax:   319-335-3017
> Actuarial Science
> 241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
> Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
>


-- 
*Ana de las Heras Molina*


Nutrición Animal
Departamento de  Producción Animal
Facultad de Veterinaria
Universidad Complutense de Madrid

*Contacto*: 913943855/anaher...@ucm.es

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Error in setwd(dir) when initializing R

2023-11-20 Thread Ana de las Heras Molina

Hello,
I am Ana de las Heras, and I write to you because every time I open RStudio
or R directly I have the following message, before I can do anything at
all:

Error in setwd(dir) : no es posible cambiar el directorio de trabajo


At first I didn't pay much attention to it, but I am having lots of
troubles with different programs, including LinDa, ANCOMBC or Genome
InfoDbdata (and thus, phyloseq). After asking in the different forus of
each program, they have told me that the issue is more related to my R
installation and that first message I obtain when I start the program.

I would be very grateful if someone could help me. This issue started when
I installed the latest version of R (4.3.1. and now 4.3.2.). I have already
uninstalled and reinstalled both programs, as well as Rtools. I am not sure
if the issue could be related to ONe Drive, since the HOME folder is in
OneDrive. I am currently working on Windows 10, 64 bits.

Yours faithfully,
Ana
-- 
*Ana de las Heras Molina*


Nutrición Animal
Departamento de  Producción Animal
Facultad de Veterinaria
Universidad Complutense de Madrid

*Contacto*: 913943855/anaher...@ucm.es

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] horizontal grouped stacked plots and removing space between bars

2023-06-28 Thread Ana Marija

I have code like this:

data <- read.csv("test1.csv", stringsAsFactors=FALSE, header=TRUE)
# Graph
myplot=ggplot(data, aes(fill=condition, y=value, x=condition)) +
geom_bar(position="dodge", stat="identity", width=0.5) +
scale_fill_manual(values=c("#7b3294", "#c2a5cf", "#a6dba0", "#008837"))+
ylab("Performance (ns/day)") +
facet_wrap(~specie,nrow=3, labeller = label_wrap_gen(width = 85),
strip.position="bottom") +
   theme_bw() +
   theme(panel.grid = element_blank(),
panel.spacing = unit(0, "mm"),
legend.title=element_blank(),
axis.title.x = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.ticks.x=element_blank(),
axis.title.y = element_text(angle = 90, hjust = 0.5))



myplot + theme(panel.grid.major = element_blank(), panel.grid.minor =
element_blank(), legend.title=element_blank(),
panel.background = element_blank(), axis.title.x = element_blank(),
axis.text.x=element_blank(), axis.ticks.x=element_blank(),
axis.title.y = element_text(angle = 90, hjust = 0.5))

And my data is this:

> dput(data)
structure(list(specie = c("gmx mdrun -gpu_id 1 -ntomp 16 -s
benchMEM.tpr -nsteps 1", "gmx mdrun -gpu_id 1 -ntomp 16 -s
benchMEM.tpr -nsteps 1", "gmx mdrun -gpu_id 1 -ntomp 16 -s
benchMEM.tpr -nsteps 1", "gmx mdrun -gpu_id 1 -ntomp 16 -s
benchMEM.tpr -nsteps 1", "gmx mdrun -gpu_id 1 -ntomp 16 -s
MD_15NM_WATER.tpr -nsteps 1", "gmx mdrun -gpu_id 1 -ntomp 16 -s
MD_15NM_WATER.tpr -nsteps 1", "gmx mdrun -gpu_id 1 -ntomp 16 -s
MD_15NM_WATER.tpr -nsteps 1", "gmx mdrun -gpu_id 1 -ntomp 16 -s
MD_15NM_WATER.tpr -nsteps 1"), condition = c("Tesla P100-SYCL",
"Tesla V100-SYCL", "Tesla P100-CUDA", "Tesla V100-CUDA", "Tesla
P100-SYCL", "Tesla V100-SYCL", "Tesla P100-CUDA", "Tesla V100-CUDA"),
value = c(75.8, 77.771, 63.297, 78.046, 34.666, 50.052, 32.07,
59.815)), class = "data.frame", row.names = c(NA, -8L))

How do I:

   1. have these two plots next to each other, not on the top of each other
   2. How do I remove spaces between those bars?

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Issue with crammed Y axis

2023-06-16 Thread Ana Marija

Hi,

I have a data frame like this:

> dput(df)
structure(list(ID = 1:8, Type = c("gmx mdrun -ntmpi 8 -ntomp 1 -s
benchPEP.tpr -nsteps 1 -resethway",
"gmx mdrun -ntmpi 8 -ntomp 1 -s benchPEP.tpr -nsteps 1 -resethway",
"gmx mdrun -ntmpi 8 -s benchPEP.tpr -nsteps 4000 -resetstep 3000",
"gmx mdrun -ntmpi 8 -s benchPEP.tpr -nsteps 4000 -resetstep 3000",
"gmx mdrun -ntmpi 8 -s benchPEP.tpr -nsteps -1 -maxh 1.0 -resethway",
"gmx mdrun -ntmpi 8 -s benchPEP.tpr -nsteps -1 -maxh 1.0 -resethway",
"gmx mdrun -ntmpi 8 -ntomp 1 -s benchPEP.tpr -nsteps -1 -maxh 1.0
-resethway -noconfout",
"gmx mdrun -ntmpi 8 -ntomp 1 -s benchPEP.tpr -nsteps -1 -maxh 1.0
-resethway -noconfout"
), Annee = c("SYCL", "CUDA", "SYCL", "CUDA", "SYCL", "CUDA",
"SYCL", "CUDA"), Domain.decomp. = c("2. 1", "2", "2. 1", "2. 1",
"2.1", "2", "2. 1", "2"), DD.com..load = c(0, 0, 0, 0, 3.7, 3,
0, 0), Neighbor.search = c("3.7", "3. 1", "3.7", "3.9", "0. 1",
"O. 1", "3.5", "3. 1"), Launch.PP.GPU.ops. = c("0. 1", "0", "0.2",
"0", "1 .6", "1 . 5", "0.2", "0. 1"), Comm..coord. = c("1 .6",
"1 .0", "1 .5", "1 .3", "1 .5", "1 .3", "1 . 5", "1 .6"), Force = c("1 .
5",
"1 .2", "1 .4", "1 .2", "1 .3", "1 . 1", "1 .5", "1 .2"), Wait...Comm..F =
c("1 .3",
"1 .7", "1 .2", "1 .0", "66.7", "68.8", "1 .2", "1 .2"), PIE.mesh =
c("65.6",
"70.9", "61 .0", "61 .4", "0", "0", "67.6", "69.2"), Wait.Bonded.GPU =
c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L), wait.GPU.NB.nonloc. = c(0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L), Wait.GPU.NB.local = c(0, 0, 0, 0, 7.4,
5.7, 0, 0), NB.X.F.buffer.ops. = c("7.3", "4.4", "6. 7", "5",
"0. 1", "0. 1", "7.2", "5.5"), Write.traje = c("0.3", "0.3",
"1 .2", "1 .3", "6.4", "6. 1", "O. 1", "0. 1"), Update = c(6.3,
4.3, 5.7, 4.9, 8.2, 9.5, 6.2, 5.6), Constraints = c("8.9", "9.7",
"1 1 .6", "13.3", "0.3", "0.4", "8. 1", "9.5"), Comm..energies = c("0.9",
"0.9", "3.3", "3.9", "8.4", "8. 5", "0.3", "0.4"), PIE.redist..X.F = c("8.
1",
"8.7", "7.9", "7.4", "29.9", "30.1", "8. 1", "8. 1"), PIE.spread =
c("29.7",
"30.6", "27.2", "29.6", "20.3", "20.2", "30. 1", "30.4"), PIE.gather =
c("19.9",
"21 .3", "18.7", "19", "6.4", "8.4", "20", "20.6"), PIE.3D.FFT = c("6",
"8.6", "5.7", "4.3", "1 .0", "1 .1", "7.6", "8.4"), PIE.3D.FFT.comm. = c("1
.2",
"1 .0", "0.9", "0.7", "1 .2", "0. 5", "1 .0", "1 .1"), PIE.solve.Elec =
c(0.7,
0.5, 0.6, 0.3, 0.7, 0.5, 0.7, 0.5)), class = "data.frame", row.names =
c(NA,
-8L))

I am plotting this data with:

library(reshape2)
library(ggplot2)

df <- read.csv("/Users/anamaria/Downloads/B5.csv", stringsAsFactors=FALSE,
header=TRUE)

df.long<-melt(df,id.vars=c("ID","Type","Annee"))

myplot =ggplot(df.long,aes(variable,value,fill=as.factor(Annee)))+
   geom_bar(position="dodge",stat="identity")+
   ylab("Simulation Progress (%)") +
   facet_wrap(~Type,nrow=3)

myplot + theme(panel.grid.major = element_blank(),
legend.title=element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.title.x = element_blank(),
axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
axis.line = element_line(colour = "black"))

My issue is that Y axis is crammed. How it can be cleaned up and say
feature only say these values: 0, 10, 20,30, ...80.

I tried using: scale_y_continuous(breaks = breaks_width(10))+
But I got this error:
Error in breaks_width(10) : could not find function "breaks_width"

Also can anything be done about the subtitle of the top left plot, which is
not quite fitting in that gray box: "
gmx mdrun -ntmpi 8 -ntomp 1 -s benchPEP.tpr -nsteps 1 -resethway"

Thanks
Ana

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] log transform a data frame

2023-06-13 Thread Ana Marija

Thank you so much David, here is correction:

d1=suppressWarnings(read.csv("/Users/anamaria/Downloads/B1.csv",
stringsAsFactors=FALSE, header=TRUE))
d1$X <- NULL
d2=as.matrix(sapply(d1, as.numeric))
pdf("~/graph.pdf")
b<-barplot(d2, legend= c("SYCL", "CUDA"), beside=
TRUE,las=2,cex.axis=0.7,cex.names=0.7,ylim=c(0,80), col=c("#9e9ac8",
"#6a51a3"))
dev.off()

 > dput(head(d1))
structure(list(Domain.decomp. = c("2. 1", "2"), DD.com..load = c(0L,
0L), Neighbor.search = c("3.7", "3. 1"), Launch.PP.GPU.ops. = c("0. 1",
"0"), Comm..coord. = c("1 .6", "1 .0"), Force = c("1 . 5", "1 .2"
), Wait...Comm..F = c("1 .3", "1 .7"), PIE.mesh = c(65.6, 70.9
), Wait.Bonded.GPU = c(0L, 0L), wait.GPU.NB.nonloc. = c(0L, 0L
), Wait.GPU.NB.local = c(0L, 0L), NB.X.F.buffer.ops. = c(7.3,
4.4), Write.traje = c(0.3, 0.3), Update = c(6.3, 4.3), Constraints = c(8.9,
9.7), Comm..energies = c(0.9, 0.9), PIE.redist..X.F = c("8. 1",
"8.7"), PIE.spread = c(29.7, 30.6), PIE.gather = c("19.9", "21 .3"
), PIE.3D.FFT = c(6, 8.6), PIE.3D.FFT.comm. = c("1 .2", "1 .0"
), PIE.solve.Elec = c(0.7, 0.5)), row.names = 1:2, class = "data.frame")

Now my problem is that when I save my plot as PDF my labels on X axis are
cut off. Any advice about that?



On Tue, Jun 13, 2023 at 5:14 PM David Carlson  wrote:

> Your first data column appears to contain character data (e.g. SYCL) which
> cannot be converted to numeric. You also appear to have 0's in the numeric
> columns which will cause problems since log(0) is -Inf. Barplots are useful
> for categorical data, but not continuous, numeric data which are better
> handled with box plots or strip charts.
>
> Do not use printouts of your data since it hides important information.
> Use str(a11) and dput(a11) or dput(head(a11)) to provide useful information
> about your data.
>
> David L Carlson
> Texas A&M University
>
>
> On Tue, Jun 13, 2023 at 4:08 PM Ana Marija 
> wrote:
>
>> Hello, I have a data frame like this: d11=suppressWarnings(read.
>> csv("/Users/anamaria/Downloads/B1. csv", stringsAsFactors=FALSE,
>> header=TRUE)) > d11 X Domain. decomp. DD. com. . load Neighbor. search
>> Launch. PP. GPU. ops. Comm. . coord. 1 SYCL 2. 1
>> ZjQcmQRYFpfptBannerStart
>> This Message Is From an External Sender
>> This message came from outside your organization.
>>
>> ZjQcmQRYFpfptBannerEnd
>>
>> Hello,
>>
>> I have a data frame like this:
>>
>> d11=suppressWarnings(read.csv("/Users/anamaria/Downloads/B1.csv",
>> stringsAsFactors=FALSE, header=TRUE))
>>
>> > d11
>>  X Domain.decomp. DD.com..load Neighbor.search Launch.PP.GPU.ops.
>> Comm..coord.
>> 1 SYCL   2. 10 3.7   0. 1
>>   1 .6
>> 2 CUDA  203. 1  0
>>   1 .0
>>   Force Wait...Comm..F PIE.mesh Wait.Bonded.GPU wait.GPU.NB.nonloc.
>> 1 1 . 5   1 .3 65.6   0   0
>> 2  1 .2   1 .7 70.9   0   0
>>   Wait.GPU.NB.local NB.X.F.buffer.ops. Write.traje Update Constraints
>> Comm..energies
>> 1 07.3 0.36.3 8.9
>>  0.9
>> 2 04.4 0.34.3 9.7
>>  0.9
>>   PIE.redist..X.F PIE.spread PIE.gather PIE.3D.FFT PIE.3D.FFT.comm.
>> PIE.solve.Elec
>> 18. 1   29.7   19.96.0 1 .2
>>0.7
>> 2 8.7   30.6  21 .38.6 1 .0
>>0.5
>>
>> I am trying to log transform the whole data frame, but I get this error:
>>
>> > d1=log(d11)
>> Error in Math.data.frame(d11) :
>>   non-numeric variable(s) in data frame: X, Domain.decomp.,
>> Neighbor.search, Launch.PP.GPU.ops., Comm..coord., Force, Wait...Comm..F,
>> PIE.redist..X.F, PIE.gather, PIE.3D.FFT.comm
>>
>>
>> My goal is to make a stacked barplot like this:
>> d2=as.matrix(sapply(d1, as.numeric))
>> b<-barplot(d2, legend= rownames(data2), beside=
>> TRUE,las=2,cex.axis=0.7,cex.names=0.7,ylim=c(0,80), col=c("#9e9ac8",
>> "#6a51a3"))
>>
>> If I don't log transform  my code runs.
>>
>> Please advise,
>> Ana
>>
>>  [[alternative HTML version deleted]]
>>
>> __r-h...@r-project.org mailing 
>

[R] log transform a data frame

2023-06-13 Thread Ana Marija

Hello,

I have a data frame like this:

d11=suppressWarnings(read.csv("/Users/anamaria/Downloads/B1.csv",
stringsAsFactors=FALSE, header=TRUE))

> d11
 X Domain.decomp. DD.com..load Neighbor.search Launch.PP.GPU.ops.
Comm..coord.
1 SYCL   2. 10 3.7   0. 1
  1 .6
2 CUDA  203. 1  0
  1 .0
  Force Wait...Comm..F PIE.mesh Wait.Bonded.GPU wait.GPU.NB.nonloc.
1 1 . 5   1 .3 65.6   0   0
2  1 .2   1 .7 70.9   0   0
  Wait.GPU.NB.local NB.X.F.buffer.ops. Write.traje Update Constraints
Comm..energies
1 07.3 0.36.3 8.9
 0.9
2 04.4 0.34.3 9.7
 0.9
  PIE.redist..X.F PIE.spread PIE.gather PIE.3D.FFT PIE.3D.FFT.comm.
PIE.solve.Elec
18. 1   29.7   19.96.0 1 .2
   0.7
2 8.7   30.6  21 .38.6 1 .0
   0.5

I am trying to log transform the whole data frame, but I get this error:

> d1=log(d11)
Error in Math.data.frame(d11) :
  non-numeric variable(s) in data frame: X, Domain.decomp.,
Neighbor.search, Launch.PP.GPU.ops., Comm..coord., Force, Wait...Comm..F,
PIE.redist..X.F, PIE.gather, PIE.3D.FFT.comm


My goal is to make a stacked barplot like this:
d2=as.matrix(sapply(d1, as.numeric))
b<-barplot(d2, legend= rownames(data2), beside=
TRUE,las=2,cex.axis=0.7,cex.names=0.7,ylim=c(0,80), col=c("#9e9ac8",
"#6a51a3"))

If I don't log transform  my code runs.

Please advise,
Ana

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] how to do inverse log of every value in every column in data frame

2021-10-14 Thread Ana Marija

Thank you so much!

On Thu, Oct 14, 2021 at 12:17 PM Bert Gunter  wrote:

> As all of your columns are numeric, you should probably convert your df to
> a matrix. Then use exp() on that, of course:
> exp(as.matrix(b))
>
> see ?exp
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Thu, Oct 14, 2021 at 10:10 AM Ana Marija 
> wrote:
>
>> Hi All,
>>
>> I have a data frame like this:
>>
>> > head(b)
>>  LRET02LRET04LRET06LRET08LRET10LRET12LRET14
>> 1 0 0.6931472 . 1.0986123 1.0986123 1.0986123 0.6931472
>> 2 2.1972246 2.4849066 2.4849066 . 2.5649494 2.6390573 2.6390573
>> 3 1.6094379 1.7917595 1.6094379 1.7917595 2.0794415 1.9459101 2.0794415
>> 4 0 0 0 0 0 0 0
>> 5 0.6931472 0 1.0986123 1.0986123 0.6931472 0.6931472 0.6931472
>> 6 1.0986123 1.0986123 1.0986123 0.6931472 1.0986123 1.3862944 1.0986123
>>
>> All values in this data frame are product of natural log. I have to do
>> inverse of it.
>> So for example do do inverse of 0.6931472 I would do:
>> > 2.718281828^0.6931472
>> [1] 2
>>
>> How do I perform this operation for every single value in this data frame?
>>
>> The original data frame is this dimension:
>> > dim(b)
>> [1] 1441   18
>>
>> Thanks
>> Ana
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] how to do inverse log of every value in every column in data frame

2021-10-14 Thread Ana Marija

Hi All,

I have a data frame like this:

> head(b)
 LRET02LRET04LRET06LRET08LRET10LRET12LRET14
1 0 0.6931472 . 1.0986123 1.0986123 1.0986123 0.6931472
2 2.1972246 2.4849066 2.4849066 . 2.5649494 2.6390573 2.6390573
3 1.6094379 1.7917595 1.6094379 1.7917595 2.0794415 1.9459101 2.0794415
4 0 0 0 0 0 0 0
5 0.6931472 0 1.0986123 1.0986123 0.6931472 0.6931472 0.6931472
6 1.0986123 1.0986123 1.0986123 0.6931472 1.0986123 1.3862944 1.0986123

All values in this data frame are product of natural log. I have to do
inverse of it.
So for example do do inverse of 0.6931472 I would do:
> 2.718281828^0.6931472
[1] 2

How do I perform this operation for every single value in this data frame?

The original data frame is this dimension:
> dim(b)
[1] 1441   18

Thanks
Ana

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] calculate power-linear mixed effect model

2021-09-17 Thread Ana Marija

Thank you so much for that info!

On Fri, Sep 17, 2021 at 3:06 PM Bert Gunter  wrote:
>
> Wrong list! Post on r-sig-mixed-models, not here.
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
> On Fri, Sep 17, 2021 at 12:22 PM Ana Marija  
> wrote:
> >
> > Hi All,
> >
> > I plan to identify metabolite levels that differ between individuals
> > with various retinopathy outcomes (DR or noDR). I plan to model
> > metabolite levels using linear mixed models ref as implemented in
> > lmm2met software. The model covariates will include: age, sex, SV1,
> > SV, and disease_condition.
> >
> > The random effect is subject variation (ID)
> >
> > Disease condition is the fixed effect because I am interested in
> > metabolite differences between those disease conditions.
> >
> > This command  will build a model for each metabolite:
> > fitMet = fitLmm(fix=c('Sex','Age','SV1,'SV2','disease_condition'),
> > random='(1|ID)', data=df, start=10)
> >
> > SV1 and SV2 are surrogate variables (numerical values)
> >
> > Next I need to calculate the power of my study. Let's say that I have
> > 1,172 individuals total in the study, from which 431 are DR. Let's say
> > that I would like to determine the power of this study given the
> > effect size of 0.337.
> >
> > I know about SIMR software in R but I am not sure how to apply it to
> > my study design.
> >
> > I looked at this paper:
> > https://besjournals.onlinelibrary.wiley.com/doi/10./2041-210X.12504
> >
> > But I am not sure how to adapt the code given in the tutorial so that
> > it is matching to mine design.
> >
> > Can you please help,
> >
> > Thanks
> > Ana
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] calculate power-linear mixed effect model

2021-09-17 Thread Ana Marija

Hi All,

I plan to identify metabolite levels that differ between individuals
with various retinopathy outcomes (DR or noDR). I plan to model
metabolite levels using linear mixed models ref as implemented in
lmm2met software. The model covariates will include: age, sex, SV1,
SV, and disease_condition.

The random effect is subject variation (ID)

Disease condition is the fixed effect because I am interested in
metabolite differences between those disease conditions.

This command  will build a model for each metabolite:
fitMet = fitLmm(fix=c('Sex','Age','SV1,'SV2','disease_condition'),
random='(1|ID)', data=df, start=10)

SV1 and SV2 are surrogate variables (numerical values)

Next I need to calculate the power of my study. Let's say that I have
1,172 individuals total in the study, from which 431 are DR. Let's say
that I would like to determine the power of this study given the
effect size of 0.337.

I know about SIMR software in R but I am not sure how to apply it to
my study design.

I looked at this paper:
https://besjournals.onlinelibrary.wiley.com/doi/10./2041-210X.12504

But I am not sure how to adapt the code given in the tutorial so that
it is matching to mine design.

Can you please help,

Thanks
Ana

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] unable to remove NAs from a data frame

2021-09-16 Thread Ana Marija

Completely true. Thank you for your help

On Thu, Sep 16, 2021 at 12:37 PM Rui Barradas  wrote:

> Hello,
>
> You are trying to access elements that do not exist, see the example below:
>
>
> x <- 1:3
> x[5]  # beyond the last element
> #[1] NA
>
> dim(df)
> #[1] 145092258
>
> df[14509227,] # beyond nrow(df) by 2
>
>
> Hope this helps,
>
> Rui Barradas
>
>
> Às 15:12 de 16/09/21, Ana Marija escreveu:
> > Hi All,
> >
> > I have lines in file that look like this:
> >
> >> df[14509227,]
> >  SNP   A1   A2 freq  b se  p  N
> > 1:  NA NA NA NA NA
> >
> > data looks like this:
> >> head(df)
> > SNP A1 A2  freq   b se  p  N
> > 1:  rs74337086  G  A 0.0024460  0.1627 0.1231 0.1865 218792
> > 2:  rs76388980  G  A 0.0034150  0.1451 0.1047 0.1660 218792
> > ...
> >> sapply(df,class)
> >  SNP  A1  A2freq   b  se
> > "character" "character" "character"   "numeric"   "numeric"   "numeric"
> >p   N
> >"numeric"   "integer"
> >
> >> dim(df)
> > [1] 145092258
> >
> > Tried:
> >> df=na.omit(df)
> >> dim(df)
> > [1] 145092258
> >
> > and:
> >> library(tidyr)
> >> d=df %>% drop_na()
> >> dim(d)
> > [1] 145092258
> >
> >
> > Please advise,
> >
> > Thanks
> > Ana
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] unable to remove NAs from a data frame

2021-09-16 Thread Ana Marija

Hi All,

I have lines in file that look like this:

> df[14509227,]
SNP   A1   A2 freq  b se  p  N
1:  NA NA NA NA NA

data looks like this:
> head(df)
   SNP A1 A2  freq   b se  p  N
1:  rs74337086  G  A 0.0024460  0.1627 0.1231 0.1865 218792
2:  rs76388980  G  A 0.0034150  0.1451 0.1047 0.1660 218792
...
> sapply(df,class)
SNP  A1  A2freq   b  se
"character" "character" "character"   "numeric"   "numeric"   "numeric"
  p   N
  "numeric"   "integer"

> dim(df)
[1] 145092258

Tried:
> df=na.omit(df)
> dim(df)
[1] 145092258

and:
> library(tidyr)
> d=df %>% drop_na()
> dim(d)
[1] 145092258


Please advise,

Thanks
Ana

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Error in n * rvec : non-numeric argument to binary operator

2021-04-19 Thread Ana Marija

Hello,

I have this code, and when I run it:
> kbpowerf()
Error in n * rvec : non-numeric argument to binary operator

this is the code:

function (){
  #USER SPECIFICATION PORTION
  alpha=0.05 #DESIGNATED ALPHA
  g=3 #NUMBER OF GROUPS
  nvec=c(25,10,15) #GROUP SIZES
  beta1vec=c(789.93,122.87,1871.99) #SLOPE COEFFICIENTS
  sigsq=209460.57 #ERROR VARIANCE
  tausqvec=c(0.1596,0.1602,0.1360) #PREDICTOR VARIANCES
  #END OF SPECIFICATION
  df1<-g-1
  numint<-200
  l<-numint+1
  dd<-1e-6
  coevec<-c(1,rep(c(4,2),numint/2-1),4,1)
  set.seed(2017)
  repn<-2
  kbpowerf<-function(){
df<-sum(nvec)-2*g
fcrit<-qf(1-alpha,df1,df)
dfkvec<-nvec-1
dfk<-sum(dfkvec)
cl<-dd
cu<-qchisq(1-dd,dfk)
intc<-(cu-cl)/numint
cvec<-cl+intc*(0:numint)
wcpdf<-(intc/3)*coevec*dchisq(cvec,dfk)
dfb1<-cumsum(dfkvec[1:g-1])
dfb2<-dfkvec[2:g]
ep<-0
for (i in seq(repn)) {
  bvec<-rbeta(df1,dfb1/2,dfb2/2)
  avec<-rep(0,g)
  avec[1]<-exp(sum(log(bvec)))
  for (ig in (2:df1)) {
avec[ig]<-(1-bvec[ig-1])*exp(sum(log(bvec[ig:df1])))
  }
  avec[g]<-1-bvec[df1]
  oavec<-tausqvec*avec
  buw<-sum(oavec*beta1vec)/sum(oavec)
  lamvec<-cvec*sum(oavec*(beta1vec-buw)^2)/sigsq
  ep<-ep+sum(wcpdf*pf(fcrit,df1,df,lamvec,lower.tail=FALSE))
}
kbpower<-ep/repn
  }
  kbpower<-kbpowerf()
  return(list(sample_size=nvec,attained_power=kbpower))
}

Can you please advise,
Thanks
Ana

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] how to quantify outliers on multi-dimensional scaling plot?

2021-03-31 Thread Ana Marija

Hello,

I am in process of writing a grant where I am explaining my planned
methylation analysis using R software "minfi". In the text of the
grant I am mentioning looking for samples containing outliers in the
multi-dimensional scaling (MDS) plot
https://rdrr.io/bioc/minfi/man/mdsPlot.html . My question is: how to
more precisely define what is an outlier on a MDS plot? I know it is
not logFC like on volcano plot. Is there is a way to quantify distance
on MDA plot or it is just sufficient to say that samples that cluster
separately are outliers?

Here you can find a full workflow for minfi package including making MDS plots:
https://www.bioconductor.org/packages/release/workflows/vignettes/methylationArrayAnalysis/inst/doc/methylationArrayAnalysis.html

Thanks
Ana

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] making code (loop) more efficient

2020-12-16 Thread Ana Marija

Indeed it was the issue with data.table. I converted it to data.frame
and it worked like a charm.
Thank you so much for your insight!

This is the code that worked:

library(parallel)
library(data.table)
library(doSNOW)

n <-  parallel::detectCores()
cl <- parallel::makeCluster(n, type = "SOCK")
doSNOW::registerDoSNOW(cl)
files <- list.files("/WEIGHTS1/Retina", pattern=".RDat", ignore.case=T)

lst_out <- foreach::foreach(i = seq_along(files),
  .packages = c("data.table") ) %dopar% {

   tmp <- get(load(files[i]))
   a <- data.table::copy(tmp)
   a=as.data.frame(a)
   rm(tmp)
   gc()

   names <- rownames(a)
   if("blup" %in% colnames(a)) {
 data <- data.table(names, a["blup"])
 nm1 <- c("rsid", "ref_allele", "eff_allele")
 data[,  (nm1) := tstrsplit(names, ":")[-2]]
 out <- data[, .(rsid, weight = blup, ref_allele, eff_allele)][,
   WGT := files[i]][]
} else {

 data <- data.table(names)
 nm1 <- c("rsid", "ref_allele", "eff_allele")
 data[,  (nm1) := tstrsplit(names, ":")[-2]]
 out <- data[, .(rsid,  ref_allele, eff_allele)][,
   WGT := files[i]][]
   }

return(out)
   rm(data)
   gc()
 }
parallel::stopCluster(cl)

big_data <- rbindlist(lst_out, fill = TRUE)

On Wed, Dec 16, 2020 at 9:31 AM Ana Marija  wrote:
>
> HI Jim,
>
> this is what I as running:
>
> library(parallel)
> library(data.table)
> library(foreach)
> library(doSNOW)
>
> n <-  parallel::detectCores()
> cl <- parallel::makeCluster(n, type = "SOCK")
> doSNOW::registerDoSNOW(cl)
> files <- list.files("/WEIGHTS1/Retina", pattern=".RDat", ignore.case=T)
>
> lst_out <- foreach::foreach(i = seq_along(files),
>   .packages = c("data.table") ) %dopar% {
>
>a <- get(load(files[i]))
> namesplit<-strsplit(rownames(a),":")
> rsid<-unlist(lapply(namesplit,"[",1))
> ref_allele<-unlist(lapply(namesplit,"[",3))
> eff_allele<-unlist(lapply(namesplit,"[",4))
> WGT<-rep(files[i],length(rsid))
> data<-data.frame(rsid=rsid,weight=a$blup, #weight is "blup" column
>  ref_allele=ref_allele,eff_allele,WGT=WGT)
>
>return(data)
>  }
> parallel::stopCluster(cl)
>
> big_data <- rbindlist(lst_out)
>
> and i got:
> Error in { : task 4 failed - "$ operator is invalid for atomic vectors"
> > parallel::stopCluster(cl)
>
> I uploaded 3 RDat file here if you want to try it
> https://github.com/montenegrina/sample
>
> Thank you for looking into this
> Ana
>
> On Tue, Dec 15, 2020 at 11:45 PM Jim Lemon  wrote:
> >
> > Hi Ana,
> > Back on the job. I'm not sure how this will work in your setup, but
> > here is a try:
> >
> > a<-read.table(text="top1 blup lasso enet
> > rs4980905:184404:C:A  0.07692622 -1.881795e-04 00
> > rs7978751:187541:G:C  0.62411425  9.934994e-04 00
> > rs2368831:188285:C:T  0.69529158  1.211028e-03 00
> > rs12830904:188335:T:A 0.92793158 -9.143555e-05 00
> > rs1500098:189093:G:C  0.42032471  9.001814e-04 00
> > rs79410690:190097:G:A 0.26194244  5.019037e-04 00",
> > header=TRUE,stringsAsFactors=FALSE)
> > namesplit<-strsplit(rownames(a),":")
> > rsid<-unlist(lapply(namesplit,"[",1))
> > ref_allele<-unlist(lapply(namesplit,"[",3))
> > eff_allele<-unlist(lapply(namesplit,"[",4))
> > # here I'm assuming that the filename
> > # is stored in files[i]
> > files<-"retina.ENSG0135776.wgt.RDat"
> > i<-1
> > WGT<-rep(files[i],length(rsid))
> > data<-data.frame(rsid=rsid,weight=a$top1,
> >  ref_allele=ref_allele,eff_allele,WGT=WGT)
> > data
> >
> > Note that the output is a data frame, not a data table. I hope that
> > the function for creating a data table is close enough to that for a
> > data frame for you to work it out. If not I can probably have a look
> > at it a bit later.
> >
> > Jim
> >
> > On Wed, Dec 16, 2020 at 1:45 PM Ana Marija  
> > wrote:
> > >
> > > Hi Jim,
> > >
> > > as always you're completely right, this is what is happening:
> > >
> > > > head(a)
> > > top1  blup lasso enet
> > > rs4980905:184404:C:A  0.07692622 -1.881795e-04 00
> > > rs7978751:187541:G:C  0.62411425  9.934994e-04 00
> > &

Re: [R] making code (loop) more efficient

2020-12-16 Thread Ana Marija

HI Jim,

this is what I as running:

library(parallel)
library(data.table)
library(foreach)
library(doSNOW)

n <-  parallel::detectCores()
cl <- parallel::makeCluster(n, type = "SOCK")
doSNOW::registerDoSNOW(cl)
files <- list.files("/WEIGHTS1/Retina", pattern=".RDat", ignore.case=T)

lst_out <- foreach::foreach(i = seq_along(files),
  .packages = c("data.table") ) %dopar% {

   a <- get(load(files[i]))
namesplit<-strsplit(rownames(a),":")
rsid<-unlist(lapply(namesplit,"[",1))
ref_allele<-unlist(lapply(namesplit,"[",3))
eff_allele<-unlist(lapply(namesplit,"[",4))
WGT<-rep(files[i],length(rsid))
data<-data.frame(rsid=rsid,weight=a$blup, #weight is "blup" column
 ref_allele=ref_allele,eff_allele,WGT=WGT)

   return(data)
 }
parallel::stopCluster(cl)

big_data <- rbindlist(lst_out)

and i got:
Error in { : task 4 failed - "$ operator is invalid for atomic vectors"
> parallel::stopCluster(cl)

I uploaded 3 RDat file here if you want to try it
https://github.com/montenegrina/sample

Thank you for looking into this
Ana

On Tue, Dec 15, 2020 at 11:45 PM Jim Lemon  wrote:
>
> Hi Ana,
> Back on the job. I'm not sure how this will work in your setup, but
> here is a try:
>
> a<-read.table(text="top1 blup lasso enet
> rs4980905:184404:C:A  0.07692622 -1.881795e-04 00
> rs7978751:187541:G:C  0.62411425  9.934994e-04 00
> rs2368831:188285:C:T  0.69529158  1.211028e-03 00
> rs12830904:188335:T:A 0.92793158 -9.143555e-05 00
> rs1500098:189093:G:C  0.42032471  9.001814e-04 00
> rs79410690:190097:G:A 0.26194244  5.019037e-04 00",
> header=TRUE,stringsAsFactors=FALSE)
> namesplit<-strsplit(rownames(a),":")
> rsid<-unlist(lapply(namesplit,"[",1))
> ref_allele<-unlist(lapply(namesplit,"[",3))
> eff_allele<-unlist(lapply(namesplit,"[",4))
> # here I'm assuming that the filename
> # is stored in files[i]
> files<-"retina.ENSG0135776.wgt.RDat"
> i<-1
> WGT<-rep(files[i],length(rsid))
> data<-data.frame(rsid=rsid,weight=a$top1,
>  ref_allele=ref_allele,eff_allele,WGT=WGT)
> data
>
> Note that the output is a data frame, not a data table. I hope that
> the function for creating a data table is close enough to that for a
> data frame for you to work it out. If not I can probably have a look
> at it a bit later.
>
> Jim
>
> On Wed, Dec 16, 2020 at 1:45 PM Ana Marija  
> wrote:
> >
> > Hi Jim,
> >
> > as always you're completely right, this is what is happening:
> >
> > > head(a)
> > top1  blup lasso enet
> > rs4980905:184404:C:A  0.07692622 -1.881795e-04 00
> > rs7978751:187541:G:C  0.62411425  9.934994e-04 00
> > rs2368831:188285:C:T  0.69529158  1.211028e-03 00
> > rs12830904:188335:T:A 0.92793158 -9.143555e-05 00
> > rs1500098:189093:G:C  0.42032471  9.001814e-04 00
> > rs79410690:190097:G:A 0.26194244  5.019037e-04 00
> > >names <- rownames(a)
> > >data <- data.table(names, a["blup"])
> > > head(data)
> >names V2
> > 1:  rs4980905:184404:C:A NA
> > 2:  rs7978751:187541:G:C NA
> > 3:  rs2368831:188285:C:T NA
> > 4: rs12830904:188335:T:A NA
> > 5:  rs1500098:189093:G:C NA
> > 6: rs79410690:190097:G:A NA
> >
> > So my goal is to transform what is in "a" to this for every RDat file:
> >
> >   rsidweight ref_allele eff_allele
> > 1:  rs72763981  9.376766e-09  C  G
> > 2: rs144383755 -2.093346e-09  A  G
> > 3:   rs1925717  1.511376e-08  T  C
> > 4:  rs61827307 -1.625302e-08  C  A
> > 5:  rs61827308 -1.625302e-08  G  C
> > 6: rs199623136 -9.128354e-10 GC  G
> >WGT
> > 1: retina.ENSG0135776.wgt.RDat
> > 2: retina.ENSG0135776.wgt.RDat
> > 3: retina.ENSG0135776.wgt.RDat
> > 4: retina.ENSG0135776.wgt.RDat
> > 5: retina.ENSG0135776.wgt.RDat
> > 6: retina.ENSG0135776.wgt.RDat
> >
> > so from rs4980905:184404:C:A I would take rs4980905 to be in column
> > "rsid", C in column "ref_allele" and A to be in column "eff_allele",
> > WGT column would just be filled with a name of the particular RDat
> > file.
> >
> > So the issue is in these lines:
> >
> > a <- get(load(files[i]))
> > names <- rowname

Re: [R] making code (loop) more efficient

2020-12-15 Thread Ana Marija

Hi Jim,

as always you're completely right, this is what is happening:

> head(a)
top1  blup lasso enet
rs4980905:184404:C:A  0.07692622 -1.881795e-04 00
rs7978751:187541:G:C  0.62411425  9.934994e-04 00
rs2368831:188285:C:T  0.69529158  1.211028e-03 00
rs12830904:188335:T:A 0.92793158 -9.143555e-05 00
rs1500098:189093:G:C  0.42032471  9.001814e-04 00
rs79410690:190097:G:A 0.26194244  5.019037e-04 00
>names <- rownames(a)
>data <- data.table(names, a["blup"])
> head(data)
   names V2
1:  rs4980905:184404:C:A NA
2:  rs7978751:187541:G:C NA
3:  rs2368831:188285:C:T NA
4: rs12830904:188335:T:A NA
5:  rs1500098:189093:G:C NA
6: rs79410690:190097:G:A NA

So my goal is to transform what is in "a" to this for every RDat file:

  rsidweight ref_allele eff_allele
1:  rs72763981  9.376766e-09  C  G
2: rs144383755 -2.093346e-09  A  G
3:   rs1925717  1.511376e-08  T  C
4:  rs61827307 -1.625302e-08  C  A
5:  rs61827308 -1.625302e-08  G  C
6: rs199623136 -9.128354e-10 GC  G
   WGT
1: retina.ENSG0135776.wgt.RDat
2: retina.ENSG0135776.wgt.RDat
3: retina.ENSG0135776.wgt.RDat
4: retina.ENSG0135776.wgt.RDat
5: retina.ENSG0135776.wgt.RDat
6: retina.ENSG0135776.wgt.RDat

so from rs4980905:184404:C:A I would take rs4980905 to be in column
"rsid", C in column "ref_allele" and A to be in column "eff_allele",
WGT column would just be filled with a name of the particular RDat
file.

So the issue is in these lines:

a <- get(load(files[i]))
names <- rownames(a)
data <- data.table(names, a["blup"])
nm1 <- c("rsid", "ref_allele", "eff_allele")

any idea how I can rewrite this?



On Tue, Dec 15, 2020 at 8:30 PM Jim Lemon  wrote:
>
> Hi Ana,
> I would look at "data" in your second example and see if it contains a
> column named "blup" or just the values that were extracted from
> a$blup. Also, I assume that weight=blup looks for an object named
> "blup", which may not be there.
>
> Jim
>
> On Wed, Dec 16, 2020 at 1:20 PM Ana Marija  
> wrote:
> >
> > Hi Jim,
> >
> > Maybe my post is confusing.
> >
> > so "dd" came from my slow code and I don't use it again in parallelized 
> > code.
> >
> > So for example for one of my files:
> >
> > if
> > i="retina.ENSG0120647.wgt.RDat"
> > > a <- get(load(i))
> > > head(a)
> > top1  blup lasso enet
> > rs4980905:184404:C:A  0.07692622 -1.881795e-04 00
> > rs7978751:187541:G:C  0.62411425  9.934994e-04 00
> > rs2368831:188285:C:T  0.69529158  1.211028e-03 00
> > ...
> >
> > Slow code was posted just to show what was running very slow and it
> > was running. I really need help fixing parallelized version.
> >
> > On Tue, Dec 15, 2020 at 7:35 PM Jim Lemon  wrote:
> > >
> > > Hi Ana,
> > > My guess is that in your second code fragment you are assigning the
> > > rownames of "a" and the _values_ contained in a$blup to the data.table
> > > "data". As I don't have much experience with data tables I may be
> > > wrong, but I suspect that the column name "blup" may not be visible or
> > > even present in "data". I don't see it in "dd" above this code
> > > fragment.
> > >
> > > Jim
> > >
> > > On Wed, Dec 16, 2020 at 11:12 AM Ana Marija  
> > > wrote:
> > > >
> > > > Hello,
> > > >
> > > > I made a terribly inefficient code which runs forever but it does run.
> > > >
> > > > library(dplyr)
> > > > library(splitstackshape)
> > > >
> > > > datalist = list()
> > > > files <- list.files("/WEIGHTS1/Retina", pattern=".RDat", ignore.case=T)
> > > >
> > > > for(i in files)
> > > > {
> > > > a<-get(load(i))
> > > > names <- rownames(a)
> > > > data <- as.data.frame(cbind(names,a))
> > > > rownames(data) <- NULL
> > > > dd=na.omit(concat.split.multiple(data = data, split.cols = c("names"),
> > > > seps = ":"))
> > > > dd=select(dd,names_1,blup,names_3,names_4)
> > > > colnames(dd)=c("rsid","weight","ref_allele","

Re: [R] making code (loop) more efficient

2020-12-15 Thread Ana Marija

Hi Jim,

Maybe my post is confusing.

so "dd" came from my slow code and I don't use it again in parallelized code.

So for example for one of my files:

if
i="retina.ENSG0120647.wgt.RDat"
> a <- get(load(i))
> head(a)
top1  blup lasso enet
rs4980905:184404:C:A  0.07692622 -1.881795e-04 00
rs7978751:187541:G:C  0.62411425  9.934994e-04 00
rs2368831:188285:C:T  0.69529158  1.211028e-03 00
...

Slow code was posted just to show what was running very slow and it
was running. I really need help fixing parallelized version.

On Tue, Dec 15, 2020 at 7:35 PM Jim Lemon  wrote:
>
> Hi Ana,
> My guess is that in your second code fragment you are assigning the
> rownames of "a" and the _values_ contained in a$blup to the data.table
> "data". As I don't have much experience with data tables I may be
> wrong, but I suspect that the column name "blup" may not be visible or
> even present in "data". I don't see it in "dd" above this code
> fragment.
>
> Jim
>
> On Wed, Dec 16, 2020 at 11:12 AM Ana Marija  
> wrote:
> >
> > Hello,
> >
> > I made a terribly inefficient code which runs forever but it does run.
> >
> > library(dplyr)
> > library(splitstackshape)
> >
> > datalist = list()
> > files <- list.files("/WEIGHTS1/Retina", pattern=".RDat", ignore.case=T)
> >
> > for(i in files)
> > {
> > a<-get(load(i))
> > names <- rownames(a)
> > data <- as.data.frame(cbind(names,a))
> > rownames(data) <- NULL
> > dd=na.omit(concat.split.multiple(data = data, split.cols = c("names"),
> > seps = ":"))
> > dd=select(dd,names_1,blup,names_3,names_4)
> > colnames(dd)=c("rsid","weight","ref_allele","eff_allele")
> > dd$WGT<-i
> > datalist[[i]] <- dd # add it to your list
> > }
> >
> > big_data = do.call(rbind, datalist)
> >
> > There is 17345 RDat files this loop has to go through. And each file
> > has approximately 10,000 lines. All RDat files can be downloaded from
> > here: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE115828 and
> > they are compressed in this file: GSE115828_retina_TWAS_wgts.tar.gz .
> > And subset of 3 of those .RDat files is here:
> > https://github.com/montenegrina/sample
> >
> > For one of those files, say i="retina.ENSG0135776.wgt.RDat"
> > dd looks like this:
> >
> > > head(dd)
> >   rsidweight ref_allele eff_allele
> > 1:  rs72763981  9.376766e-09  C  G
> > 2: rs144383755 -2.093346e-09  A  G
> > 3:   rs1925717  1.511376e-08  T  C
> > 4:  rs61827307 -1.625302e-08  C  A
> > 5:  rs61827308 -1.625302e-08  G  C
> > 6: rs199623136 -9.128354e-10 GC  G
> >WGT
> > 1: retina.ENSG0135776.wgt.RDat
> > 2: retina.ENSG0135776.wgt.RDat
> > 3: retina.ENSG0135776.wgt.RDat
> > 4: retina.ENSG0135776.wgt.RDat
> > 5: retina.ENSG0135776.wgt.RDat
> > 6: retina.ENSG0135776.wgt.RDat
> >
> > so on attempt to parallelize this I did this:
> >
> > library(parallel)
> > library(data.table)
> > library(foreach)
> > library(doSNOW)
> >
> > n <-  parallel::detectCores()
> > cl <- parallel::makeCluster(n, type = "SOCK")
> > doSNOW::registerDoSNOW(cl)
> > files <- list.files("/WEIGHTS1/Retina", pattern=".RDat", ignore.case=T)
> >
> > lst_out <- foreach::foreach(i = seq_along(files),
> >   .packages = c("data.table") ) %dopar% {
> >
> >a <- get(load(files[i]))
> >names <- rownames(a)
> >data <- data.table(names, a["blup"])
> >nm1 <- c("rsid", "ref_allele", "eff_allele")
> >data[,  (nm1) := tstrsplit(names, ":")[-2]]
> >return(data[, .(rsid, weight = blup, ref_allele, eff_allele)][,
> >WGT := files[i]][])
> >  }
> > parallel::stopCluster(cl)
> >
> > big_data <- rbindlist(lst_out)
> >
> > I am getting this Error:
> >
> > Error in { : task 7 failed - "object 'blup' not found"
> > > parallel::stopCluster(cl)
> >
> > Can you please advise,
> > Ana
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] making code (loop) more efficient

2020-12-15 Thread Ana Marija

Hello,

I made a terribly inefficient code which runs forever but it does run.

library(dplyr)
library(splitstackshape)

datalist = list()
files <- list.files("/WEIGHTS1/Retina", pattern=".RDat", ignore.case=T)

for(i in files)
{
a<-get(load(i))
names <- rownames(a)
data <- as.data.frame(cbind(names,a))
rownames(data) <- NULL
dd=na.omit(concat.split.multiple(data = data, split.cols = c("names"),
seps = ":"))
dd=select(dd,names_1,blup,names_3,names_4)
colnames(dd)=c("rsid","weight","ref_allele","eff_allele")
dd$WGT<-i
datalist[[i]] <- dd # add it to your list
}

big_data = do.call(rbind, datalist)

There is 17345 RDat files this loop has to go through. And each file
has approximately 10,000 lines. All RDat files can be downloaded from
here: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE115828 and
they are compressed in this file: GSE115828_retina_TWAS_wgts.tar.gz .
And subset of 3 of those .RDat files is here:
https://github.com/montenegrina/sample

For one of those files, say i="retina.ENSG0135776.wgt.RDat"
dd looks like this:

> head(dd)
  rsidweight ref_allele eff_allele
1:  rs72763981  9.376766e-09  C  G
2: rs144383755 -2.093346e-09  A  G
3:   rs1925717  1.511376e-08  T  C
4:  rs61827307 -1.625302e-08  C  A
5:  rs61827308 -1.625302e-08  G  C
6: rs199623136 -9.128354e-10 GC  G
   WGT
1: retina.ENSG0135776.wgt.RDat
2: retina.ENSG0135776.wgt.RDat
3: retina.ENSG0135776.wgt.RDat
4: retina.ENSG0135776.wgt.RDat
5: retina.ENSG0135776.wgt.RDat
6: retina.ENSG0135776.wgt.RDat

so on attempt to parallelize this I did this:

library(parallel)
library(data.table)
library(foreach)
library(doSNOW)

n <-  parallel::detectCores()
cl <- parallel::makeCluster(n, type = "SOCK")
doSNOW::registerDoSNOW(cl)
files <- list.files("/WEIGHTS1/Retina", pattern=".RDat", ignore.case=T)

lst_out <- foreach::foreach(i = seq_along(files),
  .packages = c("data.table") ) %dopar% {

   a <- get(load(files[i]))
   names <- rownames(a)
   data <- data.table(names, a["blup"])
   nm1 <- c("rsid", "ref_allele", "eff_allele")
   data[,  (nm1) := tstrsplit(names, ":")[-2]]
   return(data[, .(rsid, weight = blup, ref_allele, eff_allele)][,
       WGT := files[i]][])
 }
parallel::stopCluster(cl)

big_data <- rbindlist(lst_out)

I am getting this Error:

Error in { : task 7 failed - "object 'blup' not found"
> parallel::stopCluster(cl)

Can you please advise,
Ana

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] how to order variables on correlation plot

2020-11-06 Thread Ana Marija

Thank you!

On Fri, Nov 6, 2020 at 1:45 PM David Winsemius  wrote:
>
> Why did you specify a different order parameter if that is not what you 
> wanted?
>
> Suggest you look more carefully at the parameters of the code you are copying 
> and pasting and also at the help page, ?corrplot .
>
> --
>
> David.
>
> On 11/6/20 6:08 AM, Ana Marija wrote:
>
> sorry forgot to attach the plot.
>
> On Fri, Nov 6, 2020 at 8:07 AM Ana Marija  wrote:
>
> Hello
>
> I have data like this:
>
> head(my_data)
>
>   subjects DIABDUR HBA1C ESRD SEX AGE PHENO  C1   C2
> 1 fam0110_G110  38   9.41   2  51 2 -0.01144980  0.002661140
> 2 fam0113_G113  30  12.51   2  40 2 -0.00502052 -0.000929061
> 3 fam0114_G114  23   8.42   2  45 2 -0.00251578 -0.003450950
> 4 fam0117_G117  37   9.02   2  46 2 -0.00704917 -0.000573325
> 5 fam0119_G119  22   9.41   1  46 1  0.00263433  0.001002370
> 6 fam0119_G120  NANA1   1  71 1 -0.00354795 -0.002045940
> C3  C4  C5   C6   C7  C8
> 1  0.006028150 -0.00176795 -0.00148375  0.004543550 -0.006272170 -0.00535077
> 2 -0.000453402 -0.00192162  0.00416229  0.007868230 -0.001957670 -0.00473148
> 3 -0.001680860 -0.00620438 -0.00235092  0.000672831 -0.000278318  0.00647337
> 4  0.001436740  0.00155568 -0.00556147 -0.000386401 -0.006885350  0.00135539
> 5 -0.007396920  0.00326229  0.00355575 -0.011149400  0.009156510  0.00120833
> 6  0.004532050  0.00869862 -0.00113207  0.002244520 -0.002119220  0.00657587
>C9 C10
> 1  0.00328111 -0.00113515
> 2 -0.00495790  0.00320201
> 3  0.00208591 -0.00874752
> 4 -0.00967934  0.00607760
> 5  0.00611030  0.00876190
> 6 -0.00990661  0.00635349
>
> I am plotting it with:
>
> library(dplyr)
> library(magrittr)
> library(corrplot)
> d=my_data %>% data.frame %>% set_rownames(.$subjects) %>% select(-subjects)
> res <- cor(d, use = "complete.obs")
> pdf("correlation.pdf")
> corrplot(res, type = "upper", order = "hclust",
>  tl.col = "black", tl.srt = 45)
> dev.off()
>
> and I am getting the plot in attach. How to make it so that my
> variables are shown on the plot in the order they are in my_data data
> frame?
>
> Thanks
> Ana
>
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] how to order variables on correlation plot

2020-11-06 Thread Ana Marija

sorry forgot to attach the plot.

On Fri, Nov 6, 2020 at 8:07 AM Ana Marija  wrote:
>
> Hello
>
> I have data like this:
>
> > head(my_data)
>   subjects DIABDUR HBA1C ESRD SEX AGE PHENO  C1   C2
> 1 fam0110_G110  38   9.41   2  51 2 -0.01144980  0.002661140
> 2 fam0113_G113  30  12.51   2  40 2 -0.00502052 -0.000929061
> 3 fam0114_G114  23   8.42   2  45 2 -0.00251578 -0.003450950
> 4 fam0117_G117  37   9.02   2  46 2 -0.00704917 -0.000573325
> 5 fam0119_G119  22   9.41   1  46 1  0.00263433  0.001002370
> 6 fam0119_G120  NANA1   1  71 1 -0.00354795 -0.002045940
> C3  C4  C5   C6   C7  C8
> 1  0.006028150 -0.00176795 -0.00148375  0.004543550 -0.006272170 -0.00535077
> 2 -0.000453402 -0.00192162  0.00416229  0.007868230 -0.001957670 -0.00473148
> 3 -0.001680860 -0.00620438 -0.00235092  0.000672831 -0.000278318  0.00647337
> 4  0.001436740  0.00155568 -0.00556147 -0.000386401 -0.006885350  0.00135539
> 5 -0.007396920  0.00326229  0.00355575 -0.011149400  0.009156510  0.00120833
> 6  0.004532050  0.00869862 -0.00113207  0.002244520 -0.002119220  0.00657587
>C9 C10
> 1  0.00328111 -0.00113515
> 2 -0.00495790  0.00320201
> 3  0.00208591 -0.00874752
> 4 -0.00967934  0.00607760
> 5  0.00611030  0.00876190
> 6 -0.00990661  0.00635349
>
> I am plotting it with:
>
> library(dplyr)
> library(magrittr)
> library(corrplot)
> d=my_data %>% data.frame %>% set_rownames(.$subjects) %>% select(-subjects)
> res <- cor(d, use = "complete.obs")
> pdf("correlation.pdf")
> corrplot(res, type = "upper", order = "hclust",
>  tl.col = "black", tl.srt = 45)
> dev.off()
>
> and I am getting the plot in attach. How to make it so that my
> variables are shown on the plot in the order they are in my_data data
> frame?
>
> Thanks
> Ana


correlation.pdf
Description: Adobe PDF document
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] how to order variables on correlation plot

2020-11-06 Thread Ana Marija

Hello

I have data like this:

> head(my_data)
  subjects DIABDUR HBA1C ESRD SEX AGE PHENO  C1   C2
1 fam0110_G110  38   9.41   2  51 2 -0.01144980  0.002661140
2 fam0113_G113  30  12.51   2  40 2 -0.00502052 -0.000929061
3 fam0114_G114  23   8.42   2  45 2 -0.00251578 -0.003450950
4 fam0117_G117  37   9.02   2  46 2 -0.00704917 -0.000573325
5 fam0119_G119  22   9.41   1  46 1  0.00263433  0.001002370
6 fam0119_G120  NANA1   1  71 1 -0.00354795 -0.002045940
C3  C4  C5   C6   C7  C8
1  0.006028150 -0.00176795 -0.00148375  0.004543550 -0.006272170 -0.00535077
2 -0.000453402 -0.00192162  0.00416229  0.007868230 -0.001957670 -0.00473148
3 -0.001680860 -0.00620438 -0.00235092  0.000672831 -0.000278318  0.00647337
4  0.001436740  0.00155568 -0.00556147 -0.000386401 -0.006885350  0.00135539
5 -0.007396920  0.00326229  0.00355575 -0.011149400  0.009156510  0.00120833
6  0.004532050  0.00869862 -0.00113207  0.002244520 -0.002119220  0.00657587
   C9 C10
1  0.00328111 -0.00113515
2 -0.00495790  0.00320201
3  0.00208591 -0.00874752
4 -0.00967934  0.00607760
5  0.00611030  0.00876190
6 -0.00990661  0.00635349

I am plotting it with:

library(dplyr)
library(magrittr)
library(corrplot)
d=my_data %>% data.frame %>% set_rownames(.$subjects) %>% select(-subjects)
res <- cor(d, use = "complete.obs")
pdf("correlation.pdf")
corrplot(res, type = "upper", order = "hclust",
 tl.col = "black", tl.srt = 45)
dev.off()

and I am getting the plot in attach. How to make it so that my
variables are shown on the plot in the order they are in my_data data
frame?

Thanks
Ana

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] how do I remove entries in data frame from a vector

2020-10-21 Thread Ana Marija

Makes sense, thank you!

On Wed, 21 Oct 2020 at 17:46, Rolf Turner  wrote:

>
> On Wed, 21 Oct 2020 16:15:22 -0500
> Ana Marija  wrote:
>
> > Hello,
> >
> > I have a data frame with one column:
> >
> > > remove
> >
> > V1
> >
> > 1 ABAFT_g_4RWG569_BI_SNP_A10_35096
> > 2 ABAFT_g_4RWG569_BI_SNP_B12_35130
> > 3 ABAFT_g_4RWG569_BI_SNP_E09_35088
> > 4 ABAFT_g_4RWG569_BI_SNP_E12_35136
> > 5 ABAFT_g_4RWG569_BI_SNP_F11_35122
> > 6 ABAFT_g_4RWG569_BI_SNP_F12_35138
> > 7 ABAFT_g_4RWG569_BI_SNP_G07_35060
> > 8 ABAFT_g_4RWG569_BI_SNP_G12_35140
> >
> > I want to remove these 8 entries from remove data frame from this
> > vector that looks like this:
> >
> > > head(celFiles)
> >
> > [1]
> >
> "/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs18.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A01_34952.CEL"
> > [2]
> >
> "/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs18.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A02_34968.CEL"
> >
> > [3]
> >
> "/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs18.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A03_34984.CEL"
> >
> > [4]
> >
> "GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs18.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A04_35000.CEL"
> >
> > [5]
> >
> "/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs18.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A05_35016.CEL"
> >
> > [6]
> >
> "/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs18.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A06_35032.CEL"
> > ...
> >
> > I tried doing this:
> >
> > b= celFiles[!basename(celFiles) %in% as.character(remove$V1)]
> >
> > but none of the 8th entries in "remove" data frame have been removed.
> >
> > Please advise,
> > Ana
>
> I would advise you to *look* at basename(celFiles)!!!
>
> The entries end in ".CEL"; the names in remove$V1 do not.  So %in%
> finds no matches.  Perhaps:
>
> b <- celFiles[!basename(celFiles) %in%
>  paste0(as.character(remove$V1),".CEL")]
>
> Note that, for the data that you have presented, none of the entries of
> celFiles "match up" with "remove" so it is *still* the case that (for
> the data shown) none of the entries will be removed.  So your example
> was bad.
>
> cheers,
>
> Rolf Turner
>
> --
> Honorary Research Fellow
> Department of Statistics
> University of Auckland
> Phone: +64-9-373-7599 ext. 88276
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] how do I remove entries in data frame from a vector

2020-10-21 Thread Ana Marija

Thank you so much!

On Wed, Oct 21, 2020 at 4:47 PM Rui Barradas  wrote:
>
> Hello,
>
> To remove the file extension it's much easier to use base R
>
>
> filename <- tools::file_path_sans_ext(basename(celFiles))
>
>
> Hope this helps,
>
> Rui Barradas
>
> Às 22:41 de 21/10/20, Rui Barradas escreveu:
> > Hello,
> >
> > This is probably because basename keeps the file extension, try instead
> >
> >
> > filename <- sub("(^[^\\.]*)\\..+$", "\\1", basename(celFiles))
> > celFiles[!filename %in% as.character(remove$V1)]
> >
> >
> > Hope this helps,
> >
> > Rui Barradas
> >
> > Às 22:15 de 21/10/20, Ana Marija escreveu:
> >> Hello,
> >>
> >> I have a data frame with one column:
> >>
> >>> remove
> >>
> >>  V1
> >>
> >> 1 ABAFT_g_4RWG569_BI_SNP_A10_35096
> >> 2 ABAFT_g_4RWG569_BI_SNP_B12_35130
> >> 3 ABAFT_g_4RWG569_BI_SNP_E09_35088
> >> 4 ABAFT_g_4RWG569_BI_SNP_E12_35136
> >> 5 ABAFT_g_4RWG569_BI_SNP_F11_35122
> >> 6 ABAFT_g_4RWG569_BI_SNP_F12_35138
> >> 7 ABAFT_g_4RWG569_BI_SNP_G07_35060
> >> 8 ABAFT_g_4RWG569_BI_SNP_G12_35140
> >>
> >> I want to remove these 8 entries from remove data frame from this
> >> vector that looks like this:
> >>
> >>> head(celFiles)
> >>
> >> [1]
> >> "/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs18.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A01_34952.CEL"
> >>
> >> [2]
> >> "/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs18.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A02_34968.CEL"
> >>
> >>
> >> [3]
> >> "/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs18.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A03_34984.CEL"
> >>
> >>
> >> [4]
> >> "GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs18.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A04_35000.CEL"
> >>
> >>
> >> [5]
> >> "/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs18.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A05_35016.CEL"
> >>
> >>
> >> [6]
> >> "/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs18.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A06_35032.CEL"
> >>
> >> ...
> >>
> >> I tried doing this:
> >>
> >> b= celFiles[!basename(celFiles) %in% as.character(remove$V1)]
> >>
> >> but none of the 8th entries in "remove" data frame have been removed.
> >>
> >> Please advise,
> >> Ana
> >>
> >> __
> >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] how do I remove entries in data frame from a vector

2020-10-21 Thread Ana Marija

Hello,

I have a data frame with one column:

> remove

V1

1 ABAFT_g_4RWG569_BI_SNP_A10_35096
2 ABAFT_g_4RWG569_BI_SNP_B12_35130
3 ABAFT_g_4RWG569_BI_SNP_E09_35088
4 ABAFT_g_4RWG569_BI_SNP_E12_35136
5 ABAFT_g_4RWG569_BI_SNP_F11_35122
6 ABAFT_g_4RWG569_BI_SNP_F12_35138
7 ABAFT_g_4RWG569_BI_SNP_G07_35060
8 ABAFT_g_4RWG569_BI_SNP_G12_35140

I want to remove these 8 entries from remove data frame from this
vector that looks like this:

> head(celFiles)

[1] 
"/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs18.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A01_34952.CEL"
[2] 
"/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs18.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A02_34968.CEL"

[3] 
"/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs18.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A03_34984.CEL"

[4] 
"GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs18.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A04_35000.CEL"

[5] 
"/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs18.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A05_35016.CEL"

[6] 
"/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs18.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A06_35032.CEL"
...

I tried doing this:

b= celFiles[!basename(celFiles) %in% as.character(remove$V1)]

but none of the 8th entries in "remove" data frame have been removed.

Please advise,
Ana

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] 2 D density plot interpretation and manipulating the data

2020-10-09 Thread Ana Marija

Hi Abby,

Thanks for getting back to me, yes I believe I did that by doing this:

SNP$density <- get_density(SNP$mean, SNP$var)
> summary(SNP$density)
   Min. 1st Qu.  MedianMean 3rd Qu.Max.
  0 383 696 73811701789

where get_density() is function from here:
https://slowkow.com/notes/ggplot2-color-by-density/

and keep only entries with density > 400

a=SNP[SNP$density>400,]

and plot it again:

p <- ggplot(a, mapping = aes(x = mean, y = var))
p <- p +  geom_density_2d() + geom_point() + my.theme + ggtitle("SNPS_red")

and probably I can increase that threshold...

Any idea how do I interpret data points that are left contained within
the ellipses?

On Fri, Oct 9, 2020 at 6:09 PM Abby Spurdle  wrote:
>
> You could assign a density value to each point.
> Maybe you've done that already...?
>
> Then trim the lowest n (number of) data points
> Or trim the lowest p (proportion of) data points.
>
> e.g.
> Remove the data points with the 20 lowest density values.
> Or remove the data points with the lowest 5% of density values.
>
> I'll let you decide whether that is a good idea or a bad idea.
> And if it's a good idea, then how much to trim.
>
>
> On Sat, Oct 10, 2020 at 5:47 AM Ana Marija  
> wrote:
> >
> > Hi Bert,
> >
> > Another confrontational response from you...
> >
> > You might have noticed that I use the word "outlier" carefully in this
> > post and only in relation to the plotted ellipses. I do not know the
> > underlying algorithm of geom_density_2d() and therefore I am having an
> > issue of how to interpret the plot. I was hoping someone here knows
> > that and can help me.
> >
> > Ana
> >
> > On Fri, Oct 9, 2020 at 11:31 AM Bert Gunter  wrote:
> > >
> > > I recommend that you consult with a local statistical expert. Much of 
> > > what you say (outliers?!?) seems to make little sense, and your 
> > > statistical knowledge seems minimal. Perhaps more to the point, none of 
> > > your questions can be properly answered without subject matter context, 
> > > which this list is not designed to provide. That's why I believe you need 
> > > local expertise.
> > >
> > > Bert Gunter
> > >
> > > "The trouble with having an open mind is that people keep coming along 
> > > and sticking things into it."
> > > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> > >
> > >
> > > On Fri, Oct 9, 2020 at 8:25 AM Ana Marija  
> > > wrote:
> > >>
> > >> Hi Abby,
> > >>
> > >> thank you for getting back to me and for this useful information.
> > >>
> > >> I'm trying to detect the outliers in my distribution based of mean and
> > >> variance. Can I see that from the plot I provided? Would outliers be
> > >> outside of ellipses? If so how do I extract those from my data frame,
> > >> based on which parameter?
> > >>
> > >> So I am trying to connect outliers based on what the plot is showing:
> > >> s <- ggplot(SNP, mapping = aes(x = mean, y = var))
> > >> s <- s +  geom_density_2d() + geom_point() + my.theme + ggtitle("SNPs")
> > >>
> > >> versus what is in the data:
> > >>
> > >> > head(SNP)
> > >>mean  var sd
> > >> FQC.10090295 0.0327 0.002678 0.0517
> > >> FQC.10119363 0.0220 0.000978 0.0313
> > >> FQC.10132112 0.0275 0.002088 0.0457
> > >> FQC.10201128 0.0169 0.000289 0.0170
> > >> FQC.10208432 0.0443 0.004081 0.0639
> > >> FQC.10218466 0.0116 0.000131 0.0115
> > >> ...
> > >>
> > >> the distribution is not normal, it is right-skewed.
> > >>
> > >> Cheers,
> > >> Ana
> > >>
> > >> On Fri, Oct 9, 2020 at 2:13 AM Abby Spurdle  wrote:
> > >> >
> > >> > > My understanding is that this represents bivariate normal
> > >> > > approximation of the data which uses the kernel density function to
> > >> > > test for inclusion within a level set. (please correct me)
> > >> >
> > >> > You can fit a bivariate normal distribution by computing five 
> > >> > parameters.
> > >> > Two means, two standard deviations (or two variances) and one
> > >> > correlation (or covariance) coefficient.
> > >> > The bivariate normal *has* elliptical contour

Re: [R] 2 D density plot interpretation and manipulating the data

2020-10-09 Thread Ana Marija

Hi Bert,

Another confrontational response from you...

You might have noticed that I use the word "outlier" carefully in this
post and only in relation to the plotted ellipses. I do not know the
underlying algorithm of geom_density_2d() and therefore I am having an
issue of how to interpret the plot. I was hoping someone here knows
that and can help me.

Ana

On Fri, Oct 9, 2020 at 11:31 AM Bert Gunter  wrote:
>
> I recommend that you consult with a local statistical expert. Much of what 
> you say (outliers?!?) seems to make little sense, and your statistical 
> knowledge seems minimal. Perhaps more to the point, none of your questions 
> can be properly answered without subject matter context, which this list is 
> not designed to provide. That's why I believe you need local expertise.
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and 
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Fri, Oct 9, 2020 at 8:25 AM Ana Marija  wrote:
>>
>> Hi Abby,
>>
>> thank you for getting back to me and for this useful information.
>>
>> I'm trying to detect the outliers in my distribution based of mean and
>> variance. Can I see that from the plot I provided? Would outliers be
>> outside of ellipses? If so how do I extract those from my data frame,
>> based on which parameter?
>>
>> So I am trying to connect outliers based on what the plot is showing:
>> s <- ggplot(SNP, mapping = aes(x = mean, y = var))
>> s <- s +  geom_density_2d() + geom_point() + my.theme + ggtitle("SNPs")
>>
>> versus what is in the data:
>>
>> > head(SNP)
>>mean  var sd
>> FQC.10090295 0.0327 0.002678 0.0517
>> FQC.10119363 0.0220 0.000978 0.0313
>> FQC.10132112 0.0275 0.002088 0.0457
>> FQC.10201128 0.0169 0.000289 0.0170
>> FQC.10208432 0.0443 0.004081 0.0639
>> FQC.10218466 0.0116 0.000131 0.0115
>> ...
>>
>> the distribution is not normal, it is right-skewed.
>>
>> Cheers,
>> Ana
>>
>> On Fri, Oct 9, 2020 at 2:13 AM Abby Spurdle  wrote:
>> >
>> > > My understanding is that this represents bivariate normal
>> > > approximation of the data which uses the kernel density function to
>> > > test for inclusion within a level set. (please correct me)
>> >
>> > You can fit a bivariate normal distribution by computing five parameters.
>> > Two means, two standard deviations (or two variances) and one
>> > correlation (or covariance) coefficient.
>> > The bivariate normal *has* elliptical contours.
>> >
>> > A kernel density estimate is usually regarded as an estimate of an
>> > unknown density function.
>> > Often they use a normal (or Gaussian) kernel, but I wouldn't describe
>> > them as normal approximations.
>> > In general, bivariate kernel density estimates do *not* have
>> > elliptical contours.
>> > But in saying that, if the data is close to normality, then contours
>> > will be close to elliptical.
>> >
>> > Kernel density estimates do not test for inclusion, as such.
>> > (But technically, there are some exceptions to that).
>> >
>> > I'm not sure what you're trying to achieve here.
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] 2 D density plot interpretation and manipulating the data

2020-10-09 Thread Ana Marija

Hi Abby,

thank you for getting back to me and for this useful information.

I'm trying to detect the outliers in my distribution based of mean and
variance. Can I see that from the plot I provided? Would outliers be
outside of ellipses? If so how do I extract those from my data frame,
based on which parameter?

So I am trying to connect outliers based on what the plot is showing:
s <- ggplot(SNP, mapping = aes(x = mean, y = var))
s <- s +  geom_density_2d() + geom_point() + my.theme + ggtitle("SNPs")

versus what is in the data:

> head(SNP)
   mean  var sd
FQC.10090295 0.0327 0.002678 0.0517
FQC.10119363 0.0220 0.000978 0.0313
FQC.10132112 0.0275 0.002088 0.0457
FQC.10201128 0.0169 0.000289 0.0170
FQC.10208432 0.0443 0.004081 0.0639
FQC.10218466 0.0116 0.000131 0.0115
...

the distribution is not normal, it is right-skewed.

Cheers,
Ana

On Fri, Oct 9, 2020 at 2:13 AM Abby Spurdle  wrote:
>
> > My understanding is that this represents bivariate normal
> > approximation of the data which uses the kernel density function to
> > test for inclusion within a level set. (please correct me)
>
> You can fit a bivariate normal distribution by computing five parameters.
> Two means, two standard deviations (or two variances) and one
> correlation (or covariance) coefficient.
> The bivariate normal *has* elliptical contours.
>
> A kernel density estimate is usually regarded as an estimate of an
> unknown density function.
> Often they use a normal (or Gaussian) kernel, but I wouldn't describe
> them as normal approximations.
> In general, bivariate kernel density estimates do *not* have
> elliptical contours.
> But in saying that, if the data is close to normality, then contours
> will be close to elliptical.
>
> Kernel density estimates do not test for inclusion, as such.
> (But technically, there are some exceptions to that).
>
> I'm not sure what you're trying to achieve here.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] 2 D density plot interpretation and manipulating the data

2020-10-08 Thread Ana Marija

My understanding is that this represents bivariate normal
approximation of the data which uses the kernel density function to
test for inclusion within a level set. (please correct me)

In order to exclude the outlier to these ellipses/contours is it
advisable to do something like this:

SNP$density <- get_density(SNP$mean, SNP$var)
> summary(SNP$density)
   Min. 1st Qu.  MedianMean 3rd Qu.Max.
  0 383 696 73811701789

where get_density() is function from here:
https://slowkow.com/notes/ggplot2-color-by-density/

and then do something like this:

a=SNP[SNP$density>400,]

and plot it again:

p <- ggplot(a, mapping = aes(x = mean, y = var))
p <- p +  geom_density_2d() + geom_point() + my.theme + ggtitle("SNPS_red")

On Thu, Oct 8, 2020 at 3:52 PM Ana Marija  wrote:
>
> Hello,
>
> I have a data frame like this:
>
> > head(SNP)
>mean  var sd
> FQC.10090295 0.0327 0.002678 0.0517
> FQC.10119363 0.0220 0.000978 0.0313
> FQC.10132112 0.0275 0.002088 0.0457
> FQC.10201128 0.0169 0.000289 0.0170
> FQC.10208432 0.0443 0.004081 0.0639
> FQC.10218466 0.0116 0.000131 0.0115
> ...
>
> and I am creating plot like this:
>
> s <- ggplot(SNP, mapping = aes(x = mean, y = var))
> s <- s +  geom_density_2d() + geom_point() + my.theme + ggtitle("SNPs")
> s
>
> I am getting plot in attach.
>
> My question is how do I:
> 1.interpret the inclusion versus exclusion within the ellipses-contours?
>
> 2. how do I extract from my data frame the points which are outside of 
> ellipses?
>
> Thanks
> Ana

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] 2 D density plot interpretation and manipulating the data

2020-10-08 Thread Ana Marija

Hello,

I have a data frame like this:

> head(SNP)
   mean  var sd
FQC.10090295 0.0327 0.002678 0.0517
FQC.10119363 0.0220 0.000978 0.0313
FQC.10132112 0.0275 0.002088 0.0457
FQC.10201128 0.0169 0.000289 0.0170
FQC.10208432 0.0443 0.004081 0.0639
FQC.10218466 0.0116 0.000131 0.0115
...

and I am creating plot like this:

s <- ggplot(SNP, mapping = aes(x = mean, y = var))
s <- s +  geom_density_2d() + geom_point() + my.theme + ggtitle("SNPs")
s

I am getting plot in attach.

My question is how do I:
1.interpret the inclusion versus exclusion within the ellipses-contours?

2. how do I extract from my data frame the points which are outside of ellipses?

Thanks
Ana


snps.pdf
Description: Adobe PDF document
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] how to turn column into column names and fill it with values

2020-09-29 Thread Ana Marija

oh it seems that I can just use your last line of code and solve my problem:
m2=tapply(mc$IID, list(FID=mc$FID, PLATE=mc$PLATE), mean)
m2=as.data.frame(m2)
library(data.table)
m3=setDT(m2, keep.rownames = TRUE)[]
colnames(m3)[1] <- "FID"
mt=merge(mc,m3,by="FID"
for(i in 4:ncol(mt)) mt[,i] <- 1 + (names(mt)[i]== mt$PLATE)

Thanks!

On Tue, Sep 29, 2020 at 12:08 PM Ana Marija  wrote:
>
> HI Bert,
>
> thank you for getting back to me.
> I tried this:
>
> > dat <- cbind(mc, matrix(0,ncol = 34))
> > head(dat)
>   FID  IID   PLATE 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 
> 22
> 1 fam0110 G110 4RWG569 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  > 0
> 2 fam0113 G113  cherry 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  > 0
> 3 fam0114 G114  cherry 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  > 0
> 4 fam0117 G117 4RWG569 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  > 0
> 5 fam0118 G118 5XAV049 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  > 0
> 6 fam0119 G119  cherry 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  > 0
>   23 24 25 26 27 28 29 30 31 32 33 34
> 1  0  0  0  0  0  0  0  0  0  0  0  0
> 2  0  0  0  0  0  0  0  0  0  0  0  0
> 3  0  0  0  0  0  0  0  0  0  0  0  0
> 4  0  0  0  0  0  0  0  0  0  0  0  0
> 5  0  0  0  0  0  0  0  0  0  0  0  0
> 6  0  0  0  0  0  0  0  0  0  0  0  0
> > names(dat) <- c(names(dat)[1:34], unique(dat$PLATE))
> Error in names(dat) <- c(names(dat)[1:34], unique(dat$PLATE)) :
>   'names' attribute [68] must be the same length as the vector [37]
>
> so names should include FID,IID,PLATE  plus unique dat$PLATE
> how do I fix that so the code works?
>
> Also I tried a bit on my own:
>
> > head(mc)
>   FID  IID   PLATE
> 1 fam0110 G110 4RWG569
> 2 fam0113 G113  cherry
> 3 fam0114 G114  cherry
> 4 fam0117 G117 4RWG569
> 5 fam0118 G118 5XAV049
> 6 fam0119 G119  cherry
> ...
>
> m2=tapply(mc$IID, list(FID=mc$FID, PLATE=mc$PLATE), mean)
> m2=as.data.frame(m2)
> library(data.table)
> m3=setDT(m2, keep.rownames = TRUE)[]
> colnames(m3)[1] <- "FID"
> mt=merge(mc,m3,by="FID")
>
> > head(mt)
>   FID  IID   PLATE 0VXC556 1CNF297 1CWO500 1DXJ626 1LTX827 1SHK635 1TNP840
> 1 fam0110 G110 4RWG569  NA  NA  NA  NA  NA  NA  NA
> 2 fam0113 G113  cherry  NA  NA  NA  NA  NA  NA  NA
> 3 fam0114 G114  cherry  NA  NA  NA  NA  NA  NA  NA
> 4 fam0117 G117 4RWG569  NA  NA  NA  NA  NA  NA  NA
> 5 fam0118 G118 5XAV049  NA  NA  NA  NA  NA  NA  NA
> 6 fam0119 G119  cherry  NA  NA  NA  NA  NA  NA  NA
>   1URP242 2BKX529 2PAG415 3DEF425 3ECO791 3FQM386 3KYJ479 3XHK903 4RWG569
> 1  NA  NA  NA  NA  NA  NA  NA  NA  NA
> 2  NA  NA  NA  NA  NA  NA  NA  NA  NA
> 3  NA  NA  NA  NA  NA  NA  NA  NA  NA
> 4  NA  NA  NA  NA  NA  NA  NA  NA  NA
> 5  NA  NA  NA  NA  NA  NA      NA  NA  NA
> 6  NA  NA  NA  NA  NA  NA  NA  NA  NA
> ...
>
> so this gives me the correct columns. Now is the question of how to
> replace NA with 2 id column name matches the rownname in PLATE column
> with 2 otherwise it is 1.
>
> Cheers,
> Ana
>
> On Tue, Sep 29, 2020 at 11:46 AM Bert Gunter  wrote:
> >
> > I am not sure reshape2 is appropriate for this task, but, assuming I 
> > understand correctly, it's quite easy without it. The following is one way, 
> > which probably can be done more elegantly and efficiently, but I think it 
> > does what you want.
> >
> > "dat" is your example data frame, in which the columns were read in with 
> > "stringsAsFactors" = FALSE (this is important!)
> >
> > dat <- cbind(dat, matrix(0,ncol = 3)) ## change 3 to 34 for your full data
> > names(dat) <- c(names(dat)[1:3], unique(dat$PLATE))
> > for(i in 4:ncol(dat)) dat[,i] <- 1 + (names(dat)[i]== dat$PLATE)
> > dat
> >
> > Result:
> >
> >   FID  IID   PLATE 4RWG569 cherry 5XAV049
> > 1 fam0110 G110 4RWG569   2  1   1
> > 2 fam0113 G113  cherry   1  2   1
> > 3 fam0114 G114  cherry   1  2   1
> > 4 fam0117 G117 4RWG569   2  1   1
> > 5 fam0118 G118 5XAV049   1  1   2
> > 6 fam0119 G119  cherry   1  2   1
> >
> >
> > Bert Gunter
> &

Re: [R] how to turn column into column names and fill it with values

2020-09-29 Thread Ana Marija

HI Bert,

thank you for getting back to me.
I tried this:

> dat <- cbind(mc, matrix(0,ncol = 34))
> head(dat)
  FID  IID   PLATE 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
1 fam0110 G110 4RWG569 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0
2 fam0113 G113  cherry 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0
3 fam0114 G114  cherry 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0
4 fam0117 G117 4RWG569 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0
5 fam0118 G118 5XAV049 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0
6 fam0119 G119  cherry 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0
  23 24 25 26 27 28 29 30 31 32 33 34
1  0  0  0  0  0  0  0  0  0  0  0  0
2  0  0  0  0  0  0  0  0  0  0  0  0
3  0  0  0  0  0  0  0  0  0  0  0  0
4  0  0  0  0  0  0  0  0  0  0  0  0
5  0  0  0  0  0  0  0  0  0  0  0  0
6  0  0  0  0  0  0  0  0  0  0  0  0
> names(dat) <- c(names(dat)[1:34], unique(dat$PLATE))
Error in names(dat) <- c(names(dat)[1:34], unique(dat$PLATE)) :
  'names' attribute [68] must be the same length as the vector [37]

so names should include FID,IID,PLATE  plus unique dat$PLATE
how do I fix that so the code works?

Also I tried a bit on my own:

> head(mc)
  FID  IID   PLATE
1 fam0110 G110 4RWG569
2 fam0113 G113  cherry
3 fam0114 G114  cherry
4 fam0117 G117 4RWG569
5 fam0118 G118 5XAV049
6 fam0119 G119  cherry
...

m2=tapply(mc$IID, list(FID=mc$FID, PLATE=mc$PLATE), mean)
m2=as.data.frame(m2)
library(data.table)
m3=setDT(m2, keep.rownames = TRUE)[]
colnames(m3)[1] <- "FID"
mt=merge(mc,m3,by="FID")

> head(mt)
  FID  IID   PLATE 0VXC556 1CNF297 1CWO500 1DXJ626 1LTX827 1SHK635 1TNP840
1 fam0110 G110 4RWG569  NA  NA  NA  NA  NA  NA  NA
2 fam0113 G113  cherry  NA  NA  NA  NA  NA  NA  NA
3 fam0114 G114  cherry  NA  NA  NA  NA  NA  NA  NA
4 fam0117 G117 4RWG569  NA  NA  NA  NA  NA  NA  NA
5 fam0118 G118 5XAV049  NA  NA  NA  NA  NA  NA  NA
6 fam0119 G119  cherry  NA  NA  NA  NA  NA  NA  NA
  1URP242 2BKX529 2PAG415 3DEF425 3ECO791 3FQM386 3KYJ479 3XHK903 4RWG569
1  NA  NA  NA  NA  NA  NA  NA  NA  NA
2  NA  NA  NA  NA  NA  NA  NA  NA  NA
3  NA  NA  NA  NA  NA  NA  NA  NA  NA
4  NA  NA  NA  NA  NA  NA  NA  NA  NA
5  NA  NA  NA  NA  NA  NA  NA  NA  NA
6  NA  NA  NA  NA  NA  NA  NA  NA  NA
...

so this gives me the correct columns. Now is the question of how to
replace NA with 2 id column name matches the rownname in PLATE column
with 2 otherwise it is 1.

Cheers,
Ana

On Tue, Sep 29, 2020 at 11:46 AM Bert Gunter  wrote:
>
> I am not sure reshape2 is appropriate for this task, but, assuming I 
> understand correctly, it's quite easy without it. The following is one way, 
> which probably can be done more elegantly and efficiently, but I think it 
> does what you want.
>
> "dat" is your example data frame, in which the columns were read in with 
> "stringsAsFactors" = FALSE (this is important!)
>
> dat <- cbind(dat, matrix(0,ncol = 3)) ## change 3 to 34 for your full data
> names(dat) <- c(names(dat)[1:3], unique(dat$PLATE))
> for(i in 4:ncol(dat)) dat[,i] <- 1 + (names(dat)[i]== dat$PLATE)
> dat
>
> Result:
>
>   FID  IID   PLATE 4RWG569 cherry 5XAV049
> 1 fam0110 G110 4RWG569   2  1   1
> 2 fam0113 G113  cherry   1  2   1
> 3 fam0114 G114  cherry   1  2   1
> 4 fam0117 G117 4RWG569   2  1   1
> 5 fam0118 G118 5XAV049   1  1   2
> 6 fam0119 G119  cherry   1  2   1
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and 
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Tue, Sep 29, 2020 at 9:19 AM Ana Marija  
> wrote:
>>
>> Hello,
>>
>> I have a data frame like this:
>>
>> > head(mc)
>>   FID  IID   PLATE
>> 1 fam0110 G110 4RWG569
>> 2 fam0113 G113  cherry
>> 3 fam0114 G114  cherry
>> 4 fam0117 G117 4RWG569
>> 5 fam0118 G118 5XAV049
>> 6 fam0119 G119  cherry
>> ...
>> > dim(mc)
>> [1] 16254
>> > length(unique(mc$PLATE))
>> [1] 34
>>
>> I am trying to make a new data frame which would look like this:
>>   FID  IID   PLATE   4RWG569  cherry 5XAV049 ...
>> 1 fam0110 G110 4RWG569  2  1  1
>> 2 fam0113 G113

[R] how to turn column into column names and fill it with values

2020-09-29 Thread Ana Marija

Hello,

I have a data frame like this:

> head(mc)
  FID  IID   PLATE
1 fam0110 G110 4RWG569
2 fam0113 G113  cherry
3 fam0114 G114  cherry
4 fam0117 G117 4RWG569
5 fam0118 G118 5XAV049
6 fam0119 G119  cherry
...
> dim(mc)
[1] 16254
> length(unique(mc$PLATE))
[1] 34

I am trying to make a new data frame which would look like this:
  FID  IID   PLATE   4RWG569  cherry 5XAV049 ...
1 fam0110 G110 4RWG569  2  1  1
2 fam0113 G113  cherry   1  2  1
3 fam0114 G114  cherry   1  2  1
4 fam0117 G117 4RWG569  2  1  1
5 fam0118 G118 5XAV049   2  1  1
6 fam0119 G119  cherry   1  2  1
...

so the new data frame would have an additional 34 columns (for every
unique mc$PLATE) and if in the row of PLATE column the value is ==to
that column name I would have 2 otherwise 1

I tried to do this with:

library(reshape2)
>  m2=dcast(mc, IID ~ PLATE)
Using PLATE as value column: use value.var to override.

Please advise,
Ana

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] help with nesting if else statements

2020-09-23 Thread Ana Marija

Hi Jeremie,

when I try to reproduce your code this is what I get:

> a=setDT(a)
> head(a)
   FID  IID CURRELIG PLASER RTNPTHY
1: fam0110 G1102  2   2
2: fam0113 G1132  2   2
3: fam0114 G1142  2   2
4: fam0117 G1172  2   2
5: fam0118 G1182 NA   2
6: fam0119 G1192  1   2
> a=a[,PHENO:=NA]
> head(a)
   FID  IID CURRELIG PLASER RTNPTHY PHENO
1: fam0110 G1102  2   2NA
2: fam0113 G1132  2   2NA
3: fam0114 G1142  2   2NA
4: fam0117 G1172  2   2NA
5: fam0118 G1182 NA   2NA
6: fam0119 G1192  1   2NA
> a=a[PLASER==2|RTNPTHY==2,PHENO:=2]
Warning message:
In `[.data.table`(a, PLASER == 2 | RTNPTHY == 2, `:=`(PHENO, 2)) :
  2.00 (type 'double') at RHS position 1 taken as TRUE when
assigning to type 'logical' (column 6 named 'PHENO')

Please advise,
Ana

On Wed, Sep 23, 2020 at 2:48 PM Jeremie Juste  wrote:
>
>
> Hello Ana Marija,
>
> I cannot reproduce your error,
>
> with a$PHENO=ifelse(a$PLASER==2 |a$RTNPTHY==2, 2, ifelse(a$CURRELIG==1 | 
> a$RTNPTHY==1,1,NA))
> For instance I have the expected PHENO=2
>
> > FID  IID   CURRELIG  PLASER  RTNPTHY PHENO
> > 39: fam5706 G57061  1   2 2
>
> In general I find nested ifelse to be difficult to work with especially
> when I am tired :-). I would suggest this alternative way instead. It uses
> data.table and you can investigate each step if you need to.
>
> library(data.table)
> setDT(a)
> a[,PHENO:=NA]
> a[PLASER==2|RTNPTHY==2,PHENO:=2]
> a[is.na(PHENO)&(CURRELIG==1|RTNPTHY==1),PHENO:=1]
>
>
> HTH,
> Jeremie
>
> a <- read.table(text="FID,IID,CURRELIG,PLASER,RTNPTHY
> fam5610,G5610,1,1,1
> fam5614,G5614,1,2,2
> fam5615,G5615,1,1,1
> fam5618,G5618,1,1,2
> fam5621,G5621,1,1,1
> fam5624,G5624,1,1,2
> fam5625,G5625,1,1,1
> fam5628,G5628,1,2,2
> fam5633,G5633,1,2,2
> fam5634,G5634,1,1,1
> fam5635,G5635,2,2,2
> fam5636,G5636,1,1,1
> fam5641,G5641,1,1,1
> fam5645,G5645,2,1,2
> fam5646,G5646,2,2,2
> fam5654,G5654,1,2,2
> fam5655,G5655,1,2,2
> fam5656,G5656,2,2,2
> fam5658,G5658,1,1,1
> fam5659,G5659,2,2,2
> fam5660,G5660,1,1,1
> fam5661,G5661,2,2,2
> fam5664,G5664,1,1,1
> fam5666,G5666,1,1,1
> fam5667,G5667,1,1,2
> fam5670,G5670,1,1,1
> fam5671,G5671,1,1,2
> fam5672,G5672,1,1,2
> fam5673,G5673,1,1,1
> fam5680,G5680,1,2,2
> fam5686,G5686,1,2,2
> fam5687,G5687,1,2,2
> fam5688,G5688,1,1,2
> fam5693,G5693,2,1,1
> fam5695,G5695,1,1,1
> fam5697,G5697,1,1,1
> fam5700,G5700,1,2,2
> fam5701,G5701,1,1,1
> fam5706,G5706,1,1,2
> fam5709,G5709,1,1,1
> fam5713,G5713,1,1,1
> fam5715,G5715,1,1,1
> fam5718,G5718,1,1,1",sep=",", header=TRUE)
>
>
>
>
>
>

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] help with nesting if else statements

2020-09-23 Thread Ana Marija

I tried doing this:
a$PHENO=ifelse(a$PLASER==2 | a$RTNPTHY==2,2,ifelse(a$CURRELIG==1 |
a$RTNPTHY==1,1,NA))

which brought be closer to the solution, but now I have lines like this:
FID   IID CURRELIG PLASER RTNPTHY PHENO
fam3151 G31511  1  NANA
fam3149 G31492  1  NANA
fam3151 G31511  1  NANA
fam0637  G6372 NA  NANA
fam4483 G44831 NA  NANA

I would like these lines to look like this:

FID   IID CURRELIG PLASER RTNPTHY PHENO
fam3151 G31511  1  NA1
fam3149 G31492  1  NA2
fam3151 G31511  1  NA   1
fam0637  G6372 NA  NA2
fam4483 G44831 NA  NA1

in addition to what this command does
a$PHENO=ifelse(a$PLASER==2 | a$RTNPTHY==2,2,ifelse(a$CURRELIG==1 |
a$RTNPTHY==1,1,NA))

On Wed, Sep 23, 2020 at 11:43 AM Ana Marija  wrote:
>
> Hello,
>
> I have a data frame as shown bellow.
> I want to create a new column PHENO which will be defined as follows:
> if CURRELIG==1 -> PHENO==1
> in the above subset those that have:
> PLASER==2 -> PHENO==2
> and
> those where RTNPTHY==1 -> PHENO==1
>
> I tried doing this:
> a$PHENO=ifelse(a$CURRELIG==1 | a$RTNPTHY==1  ,1,ifelse(a$PLASER==2 |
> a$RTNPTHY==2,2,NA))
>
> but this give me some lines where I am not seeing results that I want,
> for example:
> FID   IID CURRELIG PLASER RTNPTHY PHENO
> fam5628 G56281 2   21
>
> here the PHENO should be =2 because RTNPTHY==2 and PLASER==2
> PHENO should be ==2 when either RTNPTHY==2 or PLASER==2
>
> another wrong line is this:
> FID  IID CURRELIG PLASER RTNPTHY PHENO
> fam5706G570611 2 1
>
> again RTNPTHY ==2 and PHENO==1 instead of 2.
>
> My data looks like this:
> FID  IID CURRELIG PLASER RTNPTHY
> fam5610 G56101  1   1
> fam5614 G56141  2   2
> fam5615 G56151  1   1
> fam5618 G56181  1   2
> fam5621 G56211  1   1
> fam5624 G56241  1   2
> fam5625 G56251  1   1
> fam5628 G56281  2   2
> fam5633 G56331  2   2
> fam5634 G56341  1   1
> fam5635 G56352  2   2
> fam5636 G56361  1   1
> fam5641 G56411  1   1
> fam5645 G56452  1   2
> fam5646 G56462  2   2
> fam5654 G56541  2   2
> fam5655 G56551  2   2
> fam5656 G56562  2   2
> fam5658 G56581  1   1
> fam5659 G56592  2   2
> fam5660 G56601  1   1
> fam5661 G56612  2   2
> fam5664 G56641  1   1
> fam5666 G56661  1   1
> fam5667 G56671  1   2
> fam5670 G56701  1   1
> fam5671 G56711  1   2
> fam5672 G56721  1   2
> fam5673 G56731  1   1
> fam5680 G56801  2   2
> fam5686 G56861  2   2
> fam5687 G56871  2   2
> fam5688 G56881  1   2
> fam5693 G56932  1   1
> fam5695 G56951  1   1
> fam5697 G56971  1   1
> fam5700 G57001  2   2
> fam5701 G57011  1   1
> fam5706 G57061  1   2
> fam5709 G57091  1   1
> fam5713 G57131  1   1
> fam5715 G57151  1   1
> fam5718 G57181  1   1
>
> Please advise,
> Ana

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] help with nesting if else statements

2020-09-23 Thread Ana Marija

Hello,

I have a data frame as shown bellow.
I want to create a new column PHENO which will be defined as follows:
if CURRELIG==1 -> PHENO==1
in the above subset those that have:
PLASER==2 -> PHENO==2
and
those where RTNPTHY==1 -> PHENO==1

I tried doing this:
a$PHENO=ifelse(a$CURRELIG==1 | a$RTNPTHY==1  ,1,ifelse(a$PLASER==2 |
a$RTNPTHY==2,2,NA))

but this give me some lines where I am not seeing results that I want,
for example:
FID   IID CURRELIG PLASER RTNPTHY PHENO
fam5628 G56281 2   21

here the PHENO should be =2 because RTNPTHY==2 and PLASER==2
PHENO should be ==2 when either RTNPTHY==2 or PLASER==2

another wrong line is this:
FID  IID CURRELIG PLASER RTNPTHY PHENO
fam5706G570611 2 1

again RTNPTHY ==2 and PHENO==1 instead of 2.

My data looks like this:
FID  IID CURRELIG PLASER RTNPTHY
fam5610 G56101  1   1
fam5614 G56141  2   2
fam5615 G56151  1   1
fam5618 G56181  1   2
fam5621 G56211  1   1
fam5624 G56241  1   2
fam5625 G56251  1   1
fam5628 G56281  2   2
fam5633 G56331  2   2
fam5634 G56341  1   1
fam5635 G56352  2   2
fam5636 G56361  1   1
fam5641 G56411  1   1
fam5645 G56452  1   2
fam5646 G56462  2   2
fam5654 G56541  2   2
fam5655 G56551  2   2
fam5656 G56562  2   2
fam5658 G56581  1   1
fam5659 G56592  2   2
fam5660 G56601  1   1
fam5661 G56612  2   2
fam5664 G56641  1   1
fam5666 G56661  1   1
fam5667 G56671  1   2
fam5670 G56701  1   1
fam5671 G56711  1   2
fam5672 G56721  1   2
fam5673 G56731  1   1
fam5680 G56801  2   2
fam5686 G56861  2   2
fam5687 G56871  2   2
fam5688 G56881  1   2
fam5693 G56932  1   1
fam5695 G56951  1   1
fam5697 G56971  1   1
fam5700 G57001  2   2
fam5701 G57011  1   1
fam5706 G57061  1   2
fam5709 G57091  1   1
fam5713 G57131  1   1
fam5715 G57151  1   1
fam5718 G57181  1   1

Please advise,
Ana

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] how to overlay two histograms

2020-09-17 Thread Ana Marija

HI Jim,

fantastic solution!
Thank you so much!!!

Ana

On Thu, Sep 17, 2020 at 6:01 PM Jim Lemon  wrote:
>
> Hi Ana,
> Sorry it's not in ggplot, but it may help:
>
> d<-read.table(text="CHR counts name
>   1 193554  old
>   2 220816  old
>   3 174350  old
>   4 163112  old
>   5 168125  old
>   6 182366  old
>   7 143023  old
>   8 147410  old
>   9 122112  old
>  10 138394  old
>  11 130069  old
>  12 124850  old
>  13 104119  old
>  14  83931  old
>  15  72287  old
>  16  71550  old
>  17  58380  old
>  18  76812  old
>  19  37040  old
>  20  63407  old
>  21  33863  old
>  22  33812  old
>   1 202783  new
>   2 252124  new
>   3 213337  new
>   4 201001  new
>   5 207606  new
>   6 228133  new
>   7 147218  new
>   8 177518  new
>   9 121276  new
>  10 163447  new
>  11 158724  new
>  12 142183  new
>  13 89  new
>  14  83043  new
>  15  61063  new
>  16  55439  new
>  17  32883  new
>  18  69135  new
>  19  16624  new
>  20  48541  new
>  21  25479  new
>  22  19698  new",
> header=TRUE,stingsAsFactors=FALSE)
> barpos<-barplot(counts~name+CHR,data=d,beside=TRUE,names.arg=rep("",22))
> legend(40,22,c("new","old"),fill=c("gray20","gray80"))
> library(plotrix)
> staxlab(1,at=colMeans(barpos),labels=1:22)
>
> Jim
>
> On Fri, Sep 18, 2020 at 8:05 AM Ana Marija  
> wrote:
> >
> > Hello,
> >
> > I am trying to overlay two histograms with this:
> >
> > p <- ggplot(d, aes(CHR, counts, fill = name)) + geom_bar(position = "dodge")
> > p
> >
> > but I am getting this error:
> > Error: stat_count() can only have an x or y aesthetic.
> > Run `rlang::last_error()` to see where the error occurred.
> >
> > my data is this:
> >
> > > d
> >CHR counts name
> > 11 193554  old
> > 22 220816  old
> > 33 174350  old
> > 44 163112  old
> > 55 168125  old
> > 66 182366  old
> > 77 143023  old
> > 88 147410  old
> > 99 122112  old
> > 10  10 138394  old
> > 11  11 130069  old
> > 12  12 124850  old
> > 13  13 104119  old
> > 14  14  83931  old
> > 15  15  72287  old
> > 16  16  71550  old
> > 17  17  58380  old
> > 18  18  76812  old
> > 19  19  37040  old
> > 20  20  63407  old
> > 21  21  33863  old
> > 22  22  33812  old
> > 23   1 202783  new
> > 24   2 252124  new
> > 25   3 213337  new
> > 26   4 201001  new
> > 27   5 207606  new
> > 28   6 228133  new
> > 29   7 147218  new
> > 30   8 177518  new
> > 31   9 121276  new
> > 32  10 163447  new
> > 33  11 158724  new
> > 34  12 142183  new
> > 35  13 89  new
> > 36  14  83043  new
> > 37  15  61063  new
> > 38  16  55439  new
> > 39  17  32883  new
> > 40  18  69135  new
> > 41  19  16624  new
> > 42  20  48541  new
> > 43  21  25479  new
> > 44  22  19698  new
> >
> > Basically I need to show counts per CHR in "old" and "new" side by side.
> >
> > Please advise,
> > Ana
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] how to overlay two histograms

2020-09-17 Thread Ana Marija

Hello,

I am trying to overlay two histograms with this:

p <- ggplot(d, aes(CHR, counts, fill = name)) + geom_bar(position = "dodge")
p

but I am getting this error:
Error: stat_count() can only have an x or y aesthetic.
Run `rlang::last_error()` to see where the error occurred.

my data is this:

> d
   CHR counts name
11 193554  old
22 220816  old
33 174350  old
44 163112  old
55 168125  old
66 182366  old
77 143023  old
88 147410  old
99 122112  old
10  10 138394  old
11  11 130069  old
12  12 124850  old
13  13 104119  old
14  14  83931  old
15  15  72287  old
16  16  71550  old
17  17  58380  old
18  18  76812  old
19  19  37040  old
20  20  63407  old
21  21  33863  old
22  22  33812  old
23   1 202783  new
24   2 252124  new
25   3 213337  new
26   4 201001  new
27   5 207606  new
28   6 228133  new
29   7 147218  new
30   8 177518  new
31   9 121276  new
32  10 163447  new
33  11 158724  new
34  12 142183  new
35  13 89  new
36  14  83043  new
37  15  61063  new
38  16  55439  new
39  17  32883  new
40  18  69135  new
41  19  16624  new
42  20  48541  new
43  21  25479  new
44  22  19698  new

Basically I need to show counts per CHR in "old" and "new" side by side.

Please advise,
Ana

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to represent the effect of one covariate on regression results?

2020-09-15 Thread Ana Marija

Hi David,

thanks for the useful insight I did of course wrote to plink user
group but no answer there. I guess they are more concerned about how
to run commands with plink as oppose to interpret results.

What I can tell about my cohort is that about 80% of cases had Type 2
diabetes while about 8% had Type 1. (my TD covariate is reference for
the type of diabetes) In the attach is the description of the data.

Cheers,
Ana

On Tue, Sep 15, 2020 at 7:59 PM David Winsemius  wrote:
>
>
> On 9/15/20 8:57 AM, Ana Marija wrote:
> > Hi Abby and David,
> >
> > Thanks for the useful tips! I will check those.
> >
> > I completed the regression analysis in plink (as R would be very slow
> > for my sample size) but as I mentioned I need to determine the
> > influence of a specific covariate in my results and Plink is of no
> > help there.
> >
> > I did Pearson correlation analysis for P values which I got in
> > regression with and without my covariate of interest and I got this:
> >
> >> cor.test(tt$P_TD, tt$P_noTD, method = "pearson", conf.level = 0.95)
> >  Pearson's product-moment correlation
> >
> > data:  tt$P_TD and tt$P_noTD
> > t = 20.17, df = 283, p-value < 2.2e-16
> > alternative hypothesis: true correlation is not equal to 0
> > 95 percent confidence interval:
> >   0.7156134 0.8117108
> > sample estimates:
> >cor
> > 0.7679493
> >
> > I can see the p values are very correlated in those two instances. Can
> > I conclude that my covariate then doesn't have a huge effect or what
> > kind of conclusion I can draw from that?
>
>
> I do not think it follows from the correlation of p-values that your
> covariate "does not have a huge effect". P-values are not really data,
> although they are random values. A simulation study of this would
> require a much better description of the original dataset. Again, that
> is something that the users of Plink are more likely to be able to
> intuit than are we. I still do not see why this question is not being
> addressed to the users of the software from which you are deriving your
> "data".
>
>
> --
>
> David.
>
> >
> > Thanks for all your help
> > Ana
> >
> >
> >
> > On Tue, Sep 15, 2020 at 1:26 AM David Winsemius  
> > wrote:
> >> There is a user-group for PLINK, easily found by looking at the page you
> >> cited. This is not the correct place to submit such questions.
> >>
> >>
> >> https://groups.google.com/g/plink2-users?pli=1
> >>
> >>
> >> --
> >>
> >> David.
> >>
> >> On 9/14/20 6:29 AM, Ana Marija wrote:
> >>> Hello,
> >>>
> >>> I was running association analysis using --glm genotypic from:
> >>> https://www.cog-genomics.org/plink/2.0/assoc with these covariates:
> >>> sex,age,PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10,TD,array,HBA1C. The
> >>> result looks like this:
> >>>
> >>>   #CHROMPOSIDREFALTA1TESTOBS_CTBETA
> >>> SEZ_OR_F_STATPERRCODE
> >>>   10135434303rs11101905GAAADD11863
> >>> -0.1107330.0986981-1.121930.261891.
> >>>   10135434303rs11101905GAADOMDEV11863
> >>> 0.0797970.1110040.7188680.47.
> >>>   10135434303rs11101905GAAsex=Female
> >>> 11863-0.1204040.0536069-2.246050.0247006.
> >>>   10135434303rs11101905GAAage11863
> >>> 0.005245010.003915281.339630.180367.
> >>>   10135434303rs11101905GAAPC111863
> >>> -0.01917790.0166868-1.149280.25044.
> >>>   10135434303rs11101905GAAPC211863
> >>> -0.02699390.0173086-1.559570.118863.
> >>>   10135434303rs11101905GAAPC311863
> >>> 0.01152070.01680760.6854480.493061.
> >>>   10135434303rs11101905GAAPC411863
> >>> 9.57832e-050.01246070.00768680.993867.
> >>>   10135434303rs11101905GAAPC511863
> >>> -0.001910470.00543937-0.351230.725416.
> >>>   10135434303rs11101905GAAPC611863
> >>> -0.01033090.0159879-0.6461720.5181

Re: [R] How to represent the effect of one covariate on regression results?

2020-09-15 Thread Ana Marija

Hi Abby and David,

Thanks for the useful tips! I will check those.

I completed the regression analysis in plink (as R would be very slow
for my sample size) but as I mentioned I need to determine the
influence of a specific covariate in my results and Plink is of no
help there.

I did Pearson correlation analysis for P values which I got in
regression with and without my covariate of interest and I got this:

> cor.test(tt$P_TD, tt$P_noTD, method = "pearson", conf.level = 0.95)

Pearson's product-moment correlation

data:  tt$P_TD and tt$P_noTD
t = 20.17, df = 283, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.7156134 0.8117108
sample estimates:
  cor
0.7679493

I can see the p values are very correlated in those two instances. Can
I conclude that my covariate then doesn't have a huge effect or what
kind of conclusion I can draw from that?

Thanks for all your help
Ana



On Tue, Sep 15, 2020 at 1:26 AM David Winsemius  wrote:
>
> There is a user-group for PLINK, easily found by looking at the page you
> cited. This is not the correct place to submit such questions.
>
>
> https://groups.google.com/g/plink2-users?pli=1
>
>
> --
>
> David.
>
> On 9/14/20 6:29 AM, Ana Marija wrote:
> > Hello,
> >
> > I was running association analysis using --glm genotypic from:
> > https://www.cog-genomics.org/plink/2.0/assoc with these covariates:
> > sex,age,PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10,TD,array,HBA1C. The
> > result looks like this:
> >
> >  #CHROMPOSIDREFALTA1TESTOBS_CTBETA
> >SEZ_OR_F_STATPERRCODE
> >  10135434303rs11101905GAAADD11863
> > -0.1107330.0986981-1.121930.261891.
> >  10135434303rs11101905GAADOMDEV11863
> > 0.0797970.1110040.7188680.47.
> >  10135434303rs11101905GAAsex=Female
> > 11863-0.1204040.0536069-2.246050.0247006.
> >  10135434303rs11101905GAAage11863
> > 0.005245010.003915281.339630.180367.
> >  10135434303rs11101905GAAPC111863
> > -0.01917790.0166868-1.149280.25044.
> >  10135434303rs11101905GAAPC211863
> > -0.02699390.0173086-1.559570.118863.
> >  10135434303rs11101905GAAPC311863
> > 0.01152070.01680760.6854480.493061.
> >  10135434303rs11101905GAAPC411863
> > 9.57832e-050.01246070.00768680.993867.
> >  10135434303rs11101905GAAPC511863
> > -0.001910470.00543937-0.351230.725416.
> >  10135434303rs11101905GAAPC611863
> > -0.01033090.0159879-0.6461720.518168.
> >  10135434303rs11101905GAAPC711863
> > 0.007909970.01440250.5492070.582863.
> >  10135434303rs11101905GAAPC811863
> > -0.002056390.0142709-0.1440960.885424.
> >  10135434303rs11101905GAAPC911863
> > -0.008737710.0057239-1.526530.126878.
> >  10135434303rs11101905GAAPC1011863
> > 0.01161970.01238260.9383880.348045.
> >  10135434303rs11101905GAATD11863
> > -0.6700260.0962216-6.963373.32228e-12.
> >  10135434303rs11101905GAAarray=Biobank
> > 118630.1606660.0736312.182050.0291062.
> >  10135434303rs11101905GAAHBA1C11863
> > 0.02659330.0016875815.75836.0236e-56.
> >  10135434303rs11101905GAAGENO_2DF11863
> >NANA0.7265140.483613.
> >
> > This results is shown just for one ID (rs11101905) there is about 2
> > million of those in the resulting file.
> >
> > My question is how do I present/plot the effect of covariate "TD" in
> > the example it has "P" equal to 3.32228e-12 for all IDs in the
> > resulting file so that I show how much effect covariate "TD" has on
> > the analysis. Should I run another regression without covariate "TD"
> > and than do scatter plot of P values with and without "TD" covariate
> > or there is a better way to do this from the data I already have?
> >
> > Thanks
> > Ana
>

Re: [R] how to replace values in a named vector

2020-09-14 Thread Ana Marija

sorry not replace with NA but with empty string for a name, for example

for example this:

> geneSymbol["Ku8QhfS0n_hIOABXuE"]
Ku8QhfS0n_hIOABXuE
   "MACC1"

would go when I subject it to

> geneSymbol["Ku8QhfS0n_hIOABXuE"]

Ku8QhfS0n_hIOABXuE

On Mon, Sep 14, 2020 at 11:35 AM Ana Marija  wrote:
>
> Hello,
>
> I have a vector like this:
>
> > head(geneSymbol)
> Ku8QhfS0n_hIOABXuE Bx496XsFXiAlj.Eaeo W38p0ogk.wIBVRXllY
> QIBkqIS9LR5DfTlTS8 BZKiEvS0eQ305U0v34 6TheVd.HiE1UF3lX6g
>"MACC1""GGACT"   "A4GALT"
> "NPSR1-AS1""NPSR1-AS1" "AAAS"
>
> it has around 15000 entries. How do I replace all values with NA
> expect these that are named like this:
>
> geneSymbol[c("0lQ1XozriVZTn.PezY","uaeFiCdegrnWFijF_s","ZOluqaxSe3ndekoNng","912ny6eCHjnlY2XSCU","odF3XHR8CVl4SAUaUQ")]
>
>
> > geneSymbol[c("0lQ1XozriVZTn.PezY","uaeFiCdegrnWFijF_s","ZOluqaxSe3ndekoNng","912ny6eCHjnlY2XSCU","odF3XHR8CVl4SAUaUQ")]
> 0lQ1XozriVZTn.PezY uaeFiCdegrnWFijF_s ZOluqaxSe3ndekoNng
> 912ny6eCHjnlY2XSCU odF3XHR8CVl4SAUaUQ
> "FLCN" "FLCN" "FLCN"
> "UCA1" "IL1B"
>
> Thanks
> Ana

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] how to replace values in a named vector

2020-09-14 Thread Ana Marija

Hello,

I have a vector like this:

> head(geneSymbol)
Ku8QhfS0n_hIOABXuE Bx496XsFXiAlj.Eaeo W38p0ogk.wIBVRXllY
QIBkqIS9LR5DfTlTS8 BZKiEvS0eQ305U0v34 6TheVd.HiE1UF3lX6g
   "MACC1""GGACT"   "A4GALT"
"NPSR1-AS1""NPSR1-AS1" "AAAS"

it has around 15000 entries. How do I replace all values with NA
expect these that are named like this:

geneSymbol[c("0lQ1XozriVZTn.PezY","uaeFiCdegrnWFijF_s","ZOluqaxSe3ndekoNng","912ny6eCHjnlY2XSCU","odF3XHR8CVl4SAUaUQ")]


> geneSymbol[c("0lQ1XozriVZTn.PezY","uaeFiCdegrnWFijF_s","ZOluqaxSe3ndekoNng","912ny6eCHjnlY2XSCU","odF3XHR8CVl4SAUaUQ")]
0lQ1XozriVZTn.PezY uaeFiCdegrnWFijF_s ZOluqaxSe3ndekoNng
912ny6eCHjnlY2XSCU odF3XHR8CVl4SAUaUQ
"FLCN" "FLCN" "FLCN"
"UCA1" "IL1B"

Thanks
Ana

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] How to represent the effect of one covariate on regression results?

2020-09-14 Thread Ana Marija

Hello,

I was running association analysis using --glm genotypic from:
https://www.cog-genomics.org/plink/2.0/assoc with these covariates:
sex,age,PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10,TD,array,HBA1C. The
result looks like this:

#CHROMPOSIDREFALTA1TESTOBS_CTBETA
  SEZ_OR_F_STATPERRCODE
10135434303rs11101905GAAADD11863
-0.1107330.0986981-1.121930.261891.
10135434303rs11101905GAADOMDEV11863
0.0797970.1110040.7188680.47.
10135434303rs11101905GAAsex=Female
11863-0.1204040.0536069-2.246050.0247006.
10135434303rs11101905GAAage11863
0.005245010.003915281.339630.180367.
10135434303rs11101905GAAPC111863
-0.01917790.0166868-1.149280.25044.
10135434303rs11101905GAAPC211863
-0.02699390.0173086-1.559570.118863.
10135434303rs11101905GAAPC311863
0.01152070.01680760.6854480.493061.
10135434303rs11101905GAAPC411863
9.57832e-050.01246070.00768680.993867.
10135434303rs11101905GAAPC511863
-0.001910470.00543937-0.351230.725416.
10135434303rs11101905GAAPC611863
-0.01033090.0159879-0.6461720.518168.
10135434303rs11101905GAAPC711863
0.007909970.01440250.5492070.582863.
10135434303rs11101905GAAPC811863
-0.002056390.0142709-0.1440960.885424.
10135434303rs11101905GAAPC911863
-0.008737710.0057239-1.526530.126878.
10135434303rs11101905GAAPC1011863
0.01161970.01238260.9383880.348045.
10135434303rs11101905GAATD11863
-0.6700260.0962216-6.963373.32228e-12.
10135434303rs11101905GAAarray=Biobank
118630.1606660.0736312.182050.0291062.
10135434303rs11101905GAAHBA1C11863
0.02659330.0016875815.75836.0236e-56.
10135434303rs11101905GAAGENO_2DF11863
  NANA0.7265140.483613.

This results is shown just for one ID (rs11101905) there is about 2
million of those in the resulting file.

My question is how do I present/plot the effect of covariate "TD" in
the example it has "P" equal to 3.32228e-12 for all IDs in the
resulting file so that I show how much effect covariate "TD" has on
the analysis. Should I run another regression without covariate "TD"
and than do scatter plot of P values with and without "TD" covariate
or there is a better way to do this from the data I already have?

Thanks
Ana

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to extract information from .Rdata format

2020-07-31 Thread Ana Marija

do you think that this is useful output from Basics of R?
> load("paired_example.Rdata")
> str(rawdata)
 num [1:4482, 1:10] 46 4 3 48 1 4 0 60 0 12 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:4482] "gene1" "gene2" "gene3" "gene4" ...
  ..$ : chr [1:10] "a.cancer" "b.cancer" "c.cancer" "d.cancer" ..

I think this one is better:
> dat<-local(get(load("paired_example.Rdata")))
> head(dat)
  a.cancer b.cancer c.cancer d.cancer e.cancer a.normal b.normal c.normal
gene1   464   3358   615   42
gene240215   241   30
gene334421300
gene4   482   1006943
gene515236103
gene640011407

On Fri, Jul 31, 2020 at 11:05 PM Bert Gunter  wrote:
>
> Sarah has explained all.
>
> I agree with her about the need for tutorials also. This list cannot 
> substitute for such homework on your own.
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and 
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Fri, Jul 31, 2020 at 7:55 PM Ana Marija  
> wrote:
>>
>> Hi Bert,
>>
>> it gives me this:
>>
>> > a=load("paired_example.Rdata")
>> > str(a)
>>  chr [1:3] "rawdata" "treatment" "patient"
>>
>> I don't know how to extract "treatment" for example in a data frame.
>>
>> I tried this but of no help.
>> > b=a[[2]]
>> > b
>> [1] "treatment"
>>
>> > str(treatment)
>>  chr [1:10] "treat" "treat" "treat" "treat" "treat" "control" "control" ...
>>
>> but this is not the format I need.
>>
>> On Fri, Jul 31, 2020 at 9:18 PM Ana Marija  
>> wrote:
>> >
>> > Hello,
>> >
>> > I have this file:
>> > > a=load("paired_example.Rdata")
>> > > a
>> > [1] "rawdata"   "treatment" "patient"
>> >
>> > I can extract "rawdata" with:
>> >  dat<-local(get(load("paired_example.Rdata")))
>> >
>> > Can you please advise how would I extract in data frame "treatment"
>> > and "patient"?
>> >
>> > Thanks
>> > Ana
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to extract information from .Rdata format

2020-07-31 Thread Ana Marija

It seems that "treatment" and "patient" are just vectors.

> treatment
 [1] "treat"   "treat"   "treat"   "treat"   "treat"   "control" "control"
 [8] "control" "control" "control"
> patient
 [1] "a" "b" "c" "d" "e" "a" "b" "c" "d" "e"

On Fri, Jul 31, 2020 at 9:53 PM Ana Marija  wrote:
>
> Hi Bert,
>
> it gives me this:
>
> > a=load("paired_example.Rdata")
> > str(a)
>  chr [1:3] "rawdata" "treatment" "patient"
>
> I don't know how to extract "treatment" for example in a data frame.
>
> I tried this but of no help.
> > b=a[[2]]
> > b
> [1] "treatment"
>
> > str(treatment)
>  chr [1:10] "treat" "treat" "treat" "treat" "treat" "control" "control" ...
>
> but this is not the format I need.
>
> On Fri, Jul 31, 2020 at 9:18 PM Ana Marija  
> wrote:
> >
> > Hello,
> >
> > I have this file:
> > > a=load("paired_example.Rdata")
> > > a
> > [1] "rawdata"   "treatment" "patient"
> >
> > I can extract "rawdata" with:
> >  dat<-local(get(load("paired_example.Rdata")))
> >
> > Can you please advise how would I extract in data frame "treatment"
> > and "patient"?
> >
> > Thanks
> > Ana

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to extract information from .Rdata format

2020-07-31 Thread Ana Marija

Hi Bert,

it gives me this:

> a=load("paired_example.Rdata")
> str(a)
 chr [1:3] "rawdata" "treatment" "patient"

I don't know how to extract "treatment" for example in a data frame.

I tried this but of no help.
> b=a[[2]]
> b
[1] "treatment"

> str(treatment)
 chr [1:10] "treat" "treat" "treat" "treat" "treat" "control" "control" ...

but this is not the format I need.

On Fri, Jul 31, 2020 at 9:18 PM Ana Marija  wrote:
>
> Hello,
>
> I have this file:
> > a=load("paired_example.Rdata")
> > a
> [1] "rawdata"   "treatment" "patient"
>
> I can extract "rawdata" with:
>  dat<-local(get(load("paired_example.Rdata")))
>
> Can you please advise how would I extract in data frame "treatment"
> and "patient"?
>
> Thanks
> Ana

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] How to extract information from .Rdata format

2020-07-31 Thread Ana Marija

Hello,

I have this file:
> a=load("paired_example.Rdata")
> a
[1] "rawdata"   "treatment" "patient"

I can extract "rawdata" with:
 dat<-local(get(load("paired_example.Rdata")))

Can you please advise how would I extract in data frame "treatment"
and "patient"?

Thanks
Ana

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to loop over two files ...

2020-06-22 Thread Ana Marija

Thank you so much as.character(r) indeed resolved the issue!

On Sat, Jun 20, 2020 at 3:47 AM Ivan Krylov  wrote:
>
> On Fri, 19 Jun 2020 19:36:41 -0500
> Ana Marija  wrote:
>
> > Error in cat(x, file = file, sep = c(rep.int(sep, ncolumns - 1),
> > "\n"),  : argument 1 (type 'list') cannot be handled by 'cat'
>
> It might be a good idea to try to solve problems like this yourself
> instead of waiting for hours for someone to reply. All the required
> information is there in the error message: write() fails because r is a
> list. Why is r a list? It's returned from GET(), so let's read its
> documentation.
>
> httr::GET() returns a response object, not a string [1]. Try passing
> as.character(r) or content(r,'text') instead of just r to write(...) or
> use a different way of extracting the actual response from the response
> object.
>
> --
> Best regards,
> Ivan
>
> [1] https://httr.r-lib.org/reference/GET.html

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to loop over two files ...

2020-06-19 Thread Ana Marija

unfortunately it complains again:

> f1 <- read_tsv("1g", col_names=F)
Parsed with column specification:
cols(
  X1 = col_character()
)
> f2 <- read_tsv("1n", col_names=F)
Parsed with column specification:
cols(
  X1 = col_character()
)
> for ( a in rownames(f1) ) {
+
+for ( b in rownames(f2) ) {
+
+ ext <- paste0( "/ld/human/pairwise/",
+   f1[a,1],
+   "/",
+   f2[b,1],
+   "?population_name=1000GENOMES:phase_3:KHV")
+
+ r <- GET(paste(server, ext, sep = ""),
+ content_type("application/json"))
+
+ write(r,file="list.txt",append=TRUE)
+
+
+}
+
+ }
Error in cat(x, file = file, sep = c(rep.int(sep, ncolumns - 1), "\n"),  :
  argument 1 (type 'list') cannot be handled by 'cat'

> traceback()
2: cat(x, file = file, sep = c(rep.int(sep, ncolumns - 1), "\n"),
   append = append)
1: write(r, file = "list.txt", append = TRUE)

On Fri, Jun 19, 2020 at 5:19 PM  wrote:
>
> Sorry - its been a long week!
>
> there is a foreach package but I try to avoid extras
>
> make your for statements:
>
> for ( a in rownames(f1) ) {
>
> # a will now be a row number rather than the value, so replace ' a ' in
> the paste0 with: f1[ a, 1]
>
> so
>
> ext <- paste0( "/ld/human/pairwise/",
>   f1[a,1],
>   "/",
>   f2[b,1],
>   "?population_name=1000GENOMES:phase_3:KHV")
>
> On 2020-06-19 22:54, Ana Marija wrote:
> > I tried it:
> >
> >  > library(httr)
> >> library(jsonlite)
> >> library(xml2)
> >> library(readr)
> >> server <- "http://rest.ensembl.org";
> >> f1 <- read_tsv("1g", col_names=F)
> > Parsed with column specification:
> > cols(
> >   X1 = col_character()
> > )
> >> f2 <- read_tsv("1n", col_names=F)
> > Parsed with column specification:
> > cols(
> >   X1 = col_character()
> > )
> >>
> >> for ( a in as.list(f1[,1]) ) {
> > +
> > +for ( b in as.list(f2[,1]) ) {
> > +
> > + ext <- paste0( "/ld/human/pairwise/",
> > + a,
> > + "/",
> > + b,
> > + "?population_name=1000GENOMES:phase_3:KHV")
> > +
> > + r <- GET(paste(server, ext, sep = ""),
> > + content_type("application/json"))
> > +
> > + write(r,file="list.txt",append=TRUE)
> > +
> > +
> > +}
> > +
> > + }
> > Error in parse_url(url) : length(url) == 1 is not TRUE
> >
> >> traceback()
> > 10: stop(simpleError(msg, call = if (p <- sys.parent(1L)) sys.call(p)))
> > 9: stopifnot(length(url) == 1)
> > 8: parse_url(url)
> > 7: is.url(url)
> > 6: stopifnot(is.url(url))
> > 5: build_url(parse_url(url)[c("scheme", "hostname", "port")])
> > 4: handle_name(url)
> > 3: handle_find(url)
> > 2: handle_url(handle, url, ...)
> > 1: GET(paste(server, ext, sep = ""), content_type("application/json"))
> >
> > On Fri, Jun 19, 2020 at 4:41 PM  wrote:
> >>
> >> Oh - read.text isn't in base!  Not sure where is came from (my head
> >> mostly!)  You may have something that adds it but better to use
> >> something that works.  So try using:
> >>
> >> library(readr)
> >> f1 <- read_tsv("1g.txt", col.names=F)
> >>
> >> This will give you a tibble with f1$X1 with the file in it
> >>
> >> then loop it with (a in as.list(f1[,1])
> >>
> >> Others will have much slicker code than me!
> >>
> >> On 2020-06-19 22:02, Ana Marija wrote:
> >> > Hi,
> >> >
> >> > thanks for getting back to me, it is just for my job :)
> >> >
> >> > so I tried it:
> >> >
> >> > library(httr)
> >> > library(jsonlite)
> >> > library(xml2)
> >> > library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R",
> >> > "lib")))
> >> > sparkR.session(master = "local[*]", sparkConfig =
> >> > list(spark.driver.memory = "2g"))
> >> >
> >> > server <- "http://rest.ensembl.org";
> >> >
> >>

Re: [R] How to loop over two files ...

2020-06-19 Thread Ana Marija

Hi Rasmus,

I got those SNPs from two GWAS-es which I run with different
phenotypes and I would like to compare weather the top SNPs in both of
them are in LD.
So 1n.txt and 1g.txt are just top SNPs from those two GWAS-es.
Unfortunately https://ldlink.nci.nih.gov/?tab=ldpair works for only
two SNPs at the time and I need to do that for 300 pairs

On Fri, Jun 19, 2020 at 6:42 PM Rasmus Liland  wrote:
>
> On 2020-06-19 14:34 -0500, Ana Marija wrote:
> >
> > I have two files (each has 300 lines)like this:
>
> The example looks quite similar to the R example in
> https://rest.ensembl.org/documentation/info/ld_pairwise_get#ra
> ...
>
> The question becomes: how did you query the
> 600 variant names in 1g.txt and 1n.txt?
>
>   curl 'https://rest.ensembl.org/ld/human/pairwise/rs6792369/rs1042779?' -H 
> 'Content-type:application/json'
>
> shows the 26 population_names for the
> rs6792369/rs1042779 combination ...

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to loop over two files ...

2020-06-19 Thread Ana Marija

I tried it:

 > library(httr)
> library(jsonlite)
> library(xml2)
> library(readr)
> server <- "http://rest.ensembl.org";
> f1 <- read_tsv("1g", col_names=F)
Parsed with column specification:
cols(
  X1 = col_character()
)
> f2 <- read_tsv("1n", col_names=F)
Parsed with column specification:
cols(
  X1 = col_character()
)
>
> for ( a in as.list(f1[,1]) ) {
+
+for ( b in as.list(f2[,1]) ) {
+
+ ext <- paste0( "/ld/human/pairwise/",
+ a,
+ "/",
+ b,
+ "?population_name=1000GENOMES:phase_3:KHV")
+
+ r <- GET(paste(server, ext, sep = ""),
+ content_type("application/json"))
+
+ write(r,file="list.txt",append=TRUE)
+
+
+}
+
+ }
Error in parse_url(url) : length(url) == 1 is not TRUE

> traceback()
10: stop(simpleError(msg, call = if (p <- sys.parent(1L)) sys.call(p)))
9: stopifnot(length(url) == 1)
8: parse_url(url)
7: is.url(url)
6: stopifnot(is.url(url))
5: build_url(parse_url(url)[c("scheme", "hostname", "port")])
4: handle_name(url)
3: handle_find(url)
2: handle_url(handle, url, ...)
1: GET(paste(server, ext, sep = ""), content_type("application/json"))

On Fri, Jun 19, 2020 at 4:41 PM  wrote:
>
> Oh - read.text isn't in base!  Not sure where is came from (my head
> mostly!)  You may have something that adds it but better to use
> something that works.  So try using:
>
> library(readr)
> f1 <- read_tsv("1g.txt", col.names=F)
>
> This will give you a tibble with f1$X1 with the file in it
>
> then loop it with (a in as.list(f1[,1])
>
> Others will have much slicker code than me!
>
> On 2020-06-19 22:02, Ana Marija wrote:
> > Hi,
> >
> > thanks for getting back to me, it is just for my job :)
> >
> > so I tried it:
> >
> > library(httr)
> > library(jsonlite)
> > library(xml2)
> > library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R",
> > "lib")))
> > sparkR.session(master = "local[*]", sparkConfig =
> > list(spark.driver.memory = "2g"))
> >
> > server <- "http://rest.ensembl.org";
> >
> > f1 <- read.text("1g.txt")
> > f2 <- read.text("1n.txt")
> >
> > for ( a in as.list(f1) ) {
> >
> >for ( b in as.list(f2) ) {
> >
> > ext <- paste0( "/ld/human/pairwise/",
> > a,
> > "/",
> > b,
> > "?population_name=1000GENOMES:phase_3:KHV")
> >
> > r <- GET(paste(server, ext, sep = ""),
> > content_type("application/json"))
> >
> > write(r,file="list.txt",append=TRUE)
> >
> >
> >}
> >
> > }
> >
> > and I got this error:
> > Error in as.list.default(f1) :
> >   no method for coercing this S4 class to a vector
> >
> > Please advise
> >
> > On Fri, Jun 19, 2020 at 3:28 PM  wrote:
> >>
> >> so (untested) if you did something like
> >>
> >> f1 <- read.text("1g.txt")
> >> f2 <- read.text("1n.txt")
> >>
> >> for ( a in as.list(f1) ) {
> >>
> >>for ( b in as.list(f2) ) {
> >>
> >> ext <- paste0( "/ld/human/pairwise/",
> >> a,
> >> "/",
> >> b,
> >> "?population_name=1000GENOMES:phase_3:KHV")
> >>
> >> r <- GET(paste(server, ext, sep = ""),
> >> content_type("application/json"))
> >>
> >> # You presumably need to do something with 'r' at the
> >> moment its over written by the next loop..  were
> >> # you appending it to list.txt?  Possibly its just a
> >> bit
> >> of the R output you want.?
> >>
> >> write(r,file="list.txt",append=TRUE)
> >>
> >>
> >>}
> >>
> >> }
> >>
> >>
> >> Are we doing your PhD for you ;-)  Do we get to share ;-)
> >>
> >>
> >> On 2020-06-19 20:34, Ana Marija wrote:
> >> > Hello,
> >> >
> >> > I have two files (each has 300 lines)like this:
> >> >
> >> >

Re: [R] How to loop over two files ...

2020-06-19 Thread Ana Marija

HI Rasmus,

I tried it:

library(base)

files <- c("1g.txt", "1n.txt")
files <- lapply(files, readLines)
server <- "http://rest.ensembl.org";
population.name <- "1000GENOMES:phase_3:KHV"
ext <- apply(expand.grid(files), 1, function(x) {
  return(paste0(server, "/ld/human/pairwise/",
x[1], "/", x[2],
"?population_name=", population.name))
})

r <- readRDS(paste0(population.name, ".rds"))
lapply(r[1:4], function(x) {
  jsonlite::fromJSON(jsonlite::toJSON(httr::content(x)))
})

and I got this error:
> r <- readRDS(paste0(population.name, ".rds"))
Error in gzfile(file, "rb") : cannot open the connection
In addition: Warning message:
In gzfile(file, "rb") :
  cannot open compressed file '1000GENOMES:phase_3:KHV.rds', probable
reason 'No such file or directory'
> lapply(r[1:4], function(x) {
+   jsonlite::fromJSON(jsonlite::toJSON(httr::content(x)))
+     })
Error in lapply(r[1:4], function(x) { : object 'r' not found

Am I am doing here something wrong?
Do I need any other libraries loaded?

Thanks
Ana

On Fri, Jun 19, 2020 at 3:49 PM Rasmus Liland  wrote:
>
> On 2020-06-19 14:34 -0500, Ana Marija wrote:
> >
> > server <- "http://rest.ensembl.org";
> > ext <- 
> > "/ld/human/pairwise/rs6792369/rs1042779?population_name=1000GENOMES:phase_3:KHV"
> >
> > r <- GET(paste(server, ext, sep = ""), content_type("application/json"))
> >
> > stop_for_status(r)
> > head(fromJSON(toJSON(content(r
> >d_prime   r2 variation1 variation2 population_name
> > 1 0.975513 0.951626  rs6792369  rs1042779 1000GENOMES:phase_3:KHV
> >
> > What I would like to do is to do is to run this command for every SNP
> > in one list (1g.txt) to each SNP in another list (1n.txt). Where SNP#
> > is rs# and output every line of result in list.txt
>
> Dear Ana,
>
> I tried, but for some reason I get only a
> response for the first URL you supplied.
>
> I wrote this:
>
> files <- c("1g.txt", "1n.txt")
> files <- lapply(files, readLines)
> server <- "http://rest.ensembl.org";
> population.name <- "1000GENOMES:phase_3:KHV"
> ext <- apply(expand.grid(files), 1, function(x) {
>   return(paste0(server, "/ld/human/pairwise/",
> x[1], "/", x[2],
> "?population_name=", population.name))
> })
>
> # r <- lapply(ext, function(x) {
> #   httr::GET(x, httr::content_type("application/json"))
> # })
> # names(r) <- ext
> # file <- paste0(population.name, ".rds")
> # saveRDS(object=r, compress="xz", file=file)
>
> r <- readRDS(paste0(population.name, ".rds"))
> lapply(r[1:4], function(x) {
>   jsonlite::fromJSON(jsonlite::toJSON(httr::content(x)))
> })
>
>
> Which if you are able to run it (saving the
> output in that rds file), yields this:
>
> 
> $`http://rest.ensembl.org/ld/human/pairwise/rs6792369/rs1042779?population_name=1000GENOMES:phase_3:KHV`
>   variation2 population_name  d_prime   r2 variation1
> 1  rs1042779 1000GENOMES:phase_3:KHV 0.975513 0.951626  rs6792369
>
> 
> $`http://rest.ensembl.org/ld/human/pairwise/rs1414517/rs1042779?population_name=1000GENOMES:phase_3:KHV`
> list()
>
> 
> $`http://rest.ensembl.org/ld/human/pairwise/rs16857712/rs1042779?population_name=1000GENOMES:phase_3:KHV`
> list()
>
> 
> $`http://rest.ensembl.org/ld/human/pairwise/rs16857703/rs1042779?population_name=1000GENOMES:phase_3:KHV`
> list()
>
> For some reason, only the first url works ...
>
> I am a bit unfamiliar working with REST
> API's.  Or web scraping in general.  Daniel
> Cegiełka knows something in this thread some
> days ago, where it might be similar to the
> API of borsaitaliana.it, where you can supply
> headers with curl like he quickly did [2].
>
> You might be able to supply the list of SNPs
> in a header to Ensemble in httr::GET somehow
> if you read some docs on their API?
>
> Best,
> Rasmus
>
> [1] https://marc.info/?t=15924924612&r=1&w=2
> [2] https://marc.info/?l=r-sig-finance&m=159249894208684&w=2

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to loop over two files ...

2020-06-19 Thread Ana Marija

Hi,

thanks for getting back to me, it is just for my job :)

so I tried it:

library(httr)
library(jsonlite)
library(xml2)
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
sparkR.session(master = "local[*]", sparkConfig =
list(spark.driver.memory = "2g"))

server <- "http://rest.ensembl.org";

f1 <- read.text("1g.txt")
f2 <- read.text("1n.txt")

for ( a in as.list(f1) ) {

   for ( b in as.list(f2) ) {

ext <- paste0( "/ld/human/pairwise/",
a,
"/",
b,
"?population_name=1000GENOMES:phase_3:KHV")

r <- GET(paste(server, ext, sep = ""),
content_type("application/json"))

write(r,file="list.txt",append=TRUE)


   }

}

and I got this error:
Error in as.list.default(f1) :
  no method for coercing this S4 class to a vector

Please advise

On Fri, Jun 19, 2020 at 3:28 PM  wrote:
>
> so (untested) if you did something like
>
> f1 <- read.text("1g.txt")
> f2 <- read.text("1n.txt")
>
> for ( a in as.list(f1) ) {
>
>for ( b in as.list(f2) ) {
>
> ext <- paste0( "/ld/human/pairwise/",
> a,
> "/",
> b,
> "?population_name=1000GENOMES:phase_3:KHV")
>
> r <- GET(paste(server, ext, sep = ""),
> content_type("application/json"))
>
> # You presumably need to do something with 'r' at the
> moment its over written by the next loop..  were
> # you appending it to list.txt?  Possibly its just a bit
> of the R output you want.?
>
> write(r,file="list.txt",append=TRUE)
>
>
>}
>
> }
>
>
> Are we doing your PhD for you ;-)  Do we get to share ;-)
>
>
> On 2020-06-19 20:34, Ana Marija wrote:
> > Hello,
> >
> > I have two files (each has 300 lines)like this:
> >
> > head 1g.txt
> > rs6792369
> > rs1414517
> > rs16857712
> > rs16857703
> > rs12239392
> > ...
> >
> > head 1n.txt
> > rs1042779
> > rs2360630
> > rs10753597
> > rs7549096
> > rs2343491
> > ...
> >
> > For each pair of rs# from those two files I can run this command in R
> >
> > library(httr)
> > library(jsonlite)
> > library(xml2)
> >
> > server <- "http://rest.ensembl.org";
> > ext <-
> > "/ld/human/pairwise/rs6792369/rs1042779?population_name=1000GENOMES:phase_3:KHV"
> >
> > r <- GET(paste(server, ext, sep = ""),
> > content_type("application/json"))
> >
> > stop_for_status(r)
> > head(fromJSON(toJSON(content(r
> >d_prime   r2 variation1 variation2 population_name
> > 1 0.975513 0.951626  rs6792369  rs1042779 1000GENOMES:phase_3:KHV
> >
> > What I would like to do is to do is to run this command for every SNP
> > in one list (1g.txt) to each SNP in another list (1n.txt). Where SNP#
> > is rs# and output every line of result in list.txt
> >
> > The process is illustrated in the attachment.
> >
> > Please help,
> > Ana
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] How to loop over two files ...

2020-06-19 Thread Ana Marija

Hello,

I have two files (each has 300 lines)like this:

head 1g.txt
rs6792369
rs1414517
rs16857712
rs16857703
rs12239392
...

head 1n.txt
rs1042779
rs2360630
rs10753597
rs7549096
rs2343491
...

For each pair of rs# from those two files I can run this command in R

library(httr)
library(jsonlite)
library(xml2)

server <- "http://rest.ensembl.org";
ext <- 
"/ld/human/pairwise/rs6792369/rs1042779?population_name=1000GENOMES:phase_3:KHV"

r <- GET(paste(server, ext, sep = ""), content_type("application/json"))

stop_for_status(r)
head(fromJSON(toJSON(content(r
   d_prime   r2 variation1 variation2 population_name
1 0.975513 0.951626  rs6792369  rs1042779 1000GENOMES:phase_3:KHV

What I would like to do is to do is to run this command for every SNP
in one list (1g.txt) to each SNP in another list (1n.txt). Where SNP#
is rs# and output every line of result in list.txt

The process is illustrated in the attachment.

Please help,
Ana


lists.pdf
Description: Adobe PDF document
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] comparing variances/distributions

2020-06-17 Thread Ana Marija

would using Kolmogorov-Smirnov test make more sense here?

> x=m$Pold
> y=m$Pnew
> ks.test(x,y)

Two-sample Kolmogorov-Smirnov test

data:  x and y
D = 0.0049066, p-value = 1.221e-15
alternative hypothesis: two-sided

Warning message:
In ks.test(x, y) : p-value will be approximate in the presence of ties

as I understand high p-values here say I cannot claim statistical
support for a difference, but low p-values are not evidence of
sameness?
D should be the maximum difference between the two probability distributions?

On Wed, Jun 17, 2020 at 3:06 PM Ana Marija  wrote:
>
> Hello,
>
> I have p values from two distributions, Pold and Pnew
> > head(m)
>CHR POS MARKER   Pnew   Pold
> 1:   1  785989  rs2980300 0.1419 0.9521
> 2:   1 1130727 rs10907175 0.1022 0.4750
> 3:   1 1156131  rs2887286 0.3698 0.5289
> 4:   1 1158631  rs6603781 0.1929 0.2554
> 5:   1 1211292  rs6685064 0.6054 0.2954
> 6:   1 1478153  rs3766180 0.6511 0.5542
> ...
>
> In order to compare those two distributions (QQ plots shown in attach)
> does it make sense to use:
>
> var.test(m$Pold, m$Pnew, alternative = "two.sided")
>
> F test to compare two variances
>
> data:  m$Pold and m$Pnew
> F = 0.99937, num df = 1454159, denom df = 1454159, p-value = 0.7057
> alternative hypothesis: true ratio of variances is not equal to 1
> 95 percent confidence interval:
>  0.9970808 1.0016750
> sample estimates:
> ratio of variances
>  0.9993739
>
>
> Or some other test makes more sense?
>
> Thanks
> Ana

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] comparing variances/distributions

2020-06-17 Thread Ana Marija

Hello,

I have p values from two distributions, Pold and Pnew
> head(m)
   CHR POS MARKER   Pnew   Pold
1:   1  785989  rs2980300 0.1419 0.9521
2:   1 1130727 rs10907175 0.1022 0.4750
3:   1 1156131  rs2887286 0.3698 0.5289
4:   1 1158631  rs6603781 0.1929 0.2554
5:   1 1211292  rs6685064 0.6054 0.2954
6:   1 1478153  rs3766180 0.6511 0.5542
...

In order to compare those two distributions (QQ plots shown in attach)
does it make sense to use:

var.test(m$Pold, m$Pnew, alternative = "two.sided")

F test to compare two variances

data:  m$Pold and m$Pnew
F = 0.99937, num df = 1454159, denom df = 1454159, p-value = 0.7057
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.9970808 1.0016750
sample estimates:
ratio of variances
 0.9993739


Or some other test makes more sense?

Thanks
Ana


qq-plots-compressed.pdf
Description: Adobe PDF document
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] how to change manhattan plot code to get a different color per chromosome

2020-06-16 Thread Ana Marija

Hi Jim,

thank you so much for this elaborate code I will definitely use some
hacks from it in the future.
For now, for the purpose of compactness of the code I just set
manually colors which I want to be there:
manhattan(results_log,chr="CHR",bp="POS",p="META_pval",snp="MARKER",suggestiveline
= F, genomewideline = F,main = "Gokind old",col = c("red2", "orange",
"gold1","springgreen4", "dodgerblue",
"darkorchid1","orchid1"),ylim=c(0,9),cex = 0.6)
in this qqman function.

Cheers,
Ana

On Tue, Jun 16, 2020 at 4:00 AM Jim Lemon  wrote:
>
> Hi Ana,
> Your attached image seems to have bailed out before landing in the
> list. Here's how to do it using a simple manhattan plot with the data
> from:
>
> https://reneshbedre.github.io//assets/posts/mhat/gwas_res_sim.csv
>
> gwas<-read.csv("gwas_res_sim.csv",stringsAsFactors=FALSE)
> # get the data into chromosome order
> gwas<-gwas[order(gwas$chr,gwas$SNP),]
> gwas$BP<-rep(NA,dim(gwas)[1]
> # fake some base positions
> for(chromosome in 1:20) {
>  snporder<-order(as.numeric(gsub("rs","",gwas[gwas$chr == chromosome,"SNP"])))
>  gwas$BP[gwas$chr == chromosome]<-
>   as.numeric(paste(chromosome,snporder,sep="."))
> }
> library(plotrix)
> # set the chromosome colors here - be more creative than me
> chrcol<-color.scale(1:10,extremes=c("red","blue"))
>
> # simple manhattan plot function
> manhattan<-function(x,CHR="CHR",BP="BP",SNP="SNP",p="p",
>  main="Manhattan Plot",xlab="Chromosome",ylab="-log10(p)",
>  pch=".",cex=1,siglevel=0.1,sigcol="green",sigcex=2,
>  chrlab=NULL,chrcol=NULL,annotate=FALSE) {
>
>  par(xaxs="i",yaxs="i")
>  nchr<-length(unique(x[,CHR]))
>  if(is.null(chrlab)) chrlab<-1:nchr
>  ypos<--log10(x[,p])
>  plot(x[,BP],ypos,ylim=c(0,max(ypos,na.rm=TRUE)*1.05),
>   main=main,xlab=xlab,ylab=ylab,
>   pch=pch,col=chrcol[x[,CHR]],cex=4,xaxt="n")
>  abline(h=-log10(siglevel),lty=2)
>  staxlab(1,at=(1:nchr) + 0.5,labels=chrlab)
>  sigindx<-which(ypos >= -log10(siglevel))
>  sigSNP<-unique(x[sigindx,SNP])
>  cat(length(sigindx),"sig\n")
>  points(x[sigindx,BP],ypos[sigindx],
>   pch=pch,col=sigcol,cex=sigcex)
>  if(annotate) text(x[sigindx,BP],ypos[sigindx]*1.03,x[sigindx,SNP])
>  return(sigSNP=sigSNP)
> }
>
> manhattan(gwas,CHR="chr",p="pvalue",cex=4,sigcex=6,
>  chrcol=chrcol,annotate=TRUE)
>
> Jim
>
> On Tue, Jun 16, 2020 at 8:06 AM Ana Marija  
> wrote:
> >
> > Hello,
> >
> > Is there is a way to set colors in this plot to look like this one in
> > attach (different color for each CHR-there is 22 of them)?
> >
> >
> > library(qqman)
> > results_log <- read.table("meta_p_pos_chr.F", 
> > head=TRUE,stringsAsFactors=FALSE)
> > png("META.png")
> > manhattan(results_log,chr="CHR",bp="POS",p="META_pval",snp="MARKER",ylim
> > = c(0, 10))
> > dev.off()
> >
> > Thanks
> > Ana
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] how to change manhattan plot code to get a different color per chromosome

2020-06-15 Thread Ana Marija

Hello,

Is there is a way to set colors in this plot to look like this one in
attach (different color for each CHR-there is 22 of them)?


library(qqman)
results_log <- read.table("meta_p_pos_chr.F", head=TRUE,stringsAsFactors=FALSE)
png("META.png")
manhattan(results_log,chr="CHR",bp="POS",p="META_pval",snp="MARKER",ylim
= c(0, 10))
dev.off()

Thanks
Ana
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] if else statement adjustemtn

2020-06-15 Thread Ana Marija

HI Jim

thank you so much! This is amazing answer!!!

Ana

On Sat, Jun 13, 2020 at 4:09 AM Jim Lemon  wrote:
>
> Right, back from shopping. Since you have fourteen rows containing NAs
> and you only want seven, we can infer that half of them must go. As
> they are neatly divided into seven rows in which only one NA appears
> and seven in which two stare meaninglessly out at us. I will assume
> that the latter are the ones to be discarded. As your condition for
> calculating "pheno" stated that a 2 in either FLASER or PLASER should
> result in a 2 in pheno, the following statement closely conforms to
> that:
>
> b<-read.table(text="FID   IID FLASER PLASER
>   fam1837 G1837  1 NA
>   fam2410 G2410 NA NA
>   fam2838 G2838 NA  2
>   fam3367 G3367  1 NA
>   fam3410 G3410  1 NA
>   fam3492 G3492  1 NA
>   fam0911  G911 NA NA
>   fam3834 G3834  2 NA
>   fam4708 G4708 NA  2
>   fam5162 G5162 NA NA
>   fam5274 G5274 NA NA
>   fam0637  G637 NA NA
>   fam0640  G640 NA NA
>   fam0743  G743 NA NA
>   fam0911  G911 NA NA",
>   header=TRUE,stringsAsFactors=FALSE)
>
> b$pheno<-ifelse(b$FLASER == 2 | b$PLASER == 2,2,1)
> # use the valid FLASER values when PLASER is NA
> b[is.na(b$pheno),]$pheno<-ifelse(!is.na(b[is.na(b$pheno),]$FLASER),
>  b[is.na(b$pheno),]$FLASER,NA)
> # use the valid PLASER values when FLASER if NA
> b[is.na(b$pheno),]$pheno<-ifelse(!is.na(b[is.na(b$pheno),]$PLASER),
>  b[is.na(b$pheno),]$PLASER,NA)
> b
>
> I could write that mess in one straitjacket of conditional statements
> but my brain hurts enough.
>
> Jim
>
>
> On Sat, Jun 13, 2020 at 1:59 PM Ana Marija  
> wrote:
> >
> > Great idea!
> > Here it is:
> > > b[is.na(b$FLASER) | is.na(b$PLASER),]
> > FID   IID FLASER PLASER pheno
> >  1: fam1837 G1837  1 NA 2
> >  2: fam2410 G2410 NA NA 2
> >  3: fam2838 G2838 NA  2 2
> >  4: fam3367 G3367  1 NA 2
> >  5: fam3410 G3410  1 NA 2
> >  6: fam3492 G3492  1 NA 2
> >  7: fam3834 G3834  2 NA 2
> >  8: fam4708 G4708 NA  2 2
> >  9: fam5162 G5162 NA NA 2
> > 10: fam5274 G5274 NA NA 2
> > 11: fam0637  G637 NA NA 2
> > 12: fam0640  G640 NA NA 2
> > 13: fam0743  G743 NA NA 2
> > 14: fam0911  G911 NA NA 2
> >
> > On Fri, Jun 12, 2020 at 10:29 PM Jim Lemon  wrote:
> > >
> > > Since you have only a few troublesome NA values, if you look at them,
> > > or even better, post them:
> > >
> > > b[is.na(b$FLASER) | is.na(b$PLASER),]
> > >
> > > perhaps we can work out the appropriate logic to get rid of only the
> > > ones you don't want.
> > >
> > > Jim
> > >
> > > On Sat, Jun 13, 2020 at 12:50 PM Ana Marija  
> > > wrote:
> > > >
> > > > Hi Rasmus,
> > > >
> > > > thank you for getting back to be, the command your provided seems to
> > > > add all 11 NAs to 2s
> > > > > b$pheno <-
> > > > +   ifelse(b$PLASER==2 |
> > > > +  b$FLASER==2 |
> > > > +  is.na(b$PLASER) |
> > > > +  is.na(b$PLASER) & b$FLASER %in% 1:2 |
> > > > +  is.na(b$FLASER) & b$PLASER == 2,
> > > > +  2, 1)
> > > > > table(b$pheno, exclude = NULL)
> > > >
> > > >   1   2
> > > > 859 839
> > > >
> > > > Once again my desired results is to keep these 7 NAs as NAs
> > > > > table(b$PLASER,b$FLASER, exclude = NULL)
> > > >
> > > >  1   2   3 
> > > >   1836  14   00
> > > >   2691  70  432
> > > >   3  2   7  210
> > > >  4   1   07
> > > >
> > > > and have
> > > > 825 2s (825=691+14+70+7+43)
> > > > and the rest would be 1s (866=1698-7-825)
> > > >
> > > > On Fri, Jun 12, 2020 at 9:29 PM Rasmus Liland  wrote:
> > > > >
> > > > > On 2020-06-13 11:30 +1000, Jim Lemon wrote:
> > > > > > On Fri, Jun 12, 2020 at 8:06 PM Jim Lemon wrote:
> > > > > > > On Sat, Jun 13, 2020 at 10:46 AM Ana Marija wrote:
> > > > > > > >
> > > > > >

Re: [R] if else statement adjustemtn

2020-06-12 Thread Ana Marija

Great idea!
Here it is:
> b[is.na(b$FLASER) | is.na(b$PLASER),]
FID   IID FLASER PLASER pheno
 1: fam1837 G1837  1 NA 2
 2: fam2410 G2410 NA NA 2
 3: fam2838 G2838 NA  2 2
 4: fam3367 G3367  1 NA 2
 5: fam3410 G3410  1 NA 2
 6: fam3492 G3492  1 NA 2
 7: fam3834 G3834  2 NA 2
 8: fam4708 G4708 NA  2 2
 9: fam5162 G5162 NA NA 2
10: fam5274 G5274 NA NA 2
11: fam0637  G637 NA NA 2
12: fam0640  G640 NA NA 2
13: fam0743  G743 NA NA 2
14: fam0911  G911 NA NA 2

On Fri, Jun 12, 2020 at 10:29 PM Jim Lemon  wrote:
>
> Since you have only a few troublesome NA values, if you look at them,
> or even better, post them:
>
> b[is.na(b$FLASER) | is.na(b$PLASER),]
>
> perhaps we can work out the appropriate logic to get rid of only the
> ones you don't want.
>
> Jim
>
> On Sat, Jun 13, 2020 at 12:50 PM Ana Marija  
> wrote:
> >
> > Hi Rasmus,
> >
> > thank you for getting back to be, the command your provided seems to
> > add all 11 NAs to 2s
> > > b$pheno <-
> > +   ifelse(b$PLASER==2 |
> > +  b$FLASER==2 |
> > +  is.na(b$PLASER) |
> > +  is.na(b$PLASER) & b$FLASER %in% 1:2 |
> > +  is.na(b$FLASER) & b$PLASER == 2,
> > +  2, 1)
> > > table(b$pheno, exclude = NULL)
> >
> >   1   2
> > 859 839
> >
> > Once again my desired results is to keep these 7 NAs as NAs
> > > table(b$PLASER,b$FLASER, exclude = NULL)
> >
> >  1   2   3 
> >   1836  14   00
> >   2691  70  432
> >   3  2   7  210
> >  4   1   07
> >
> > and have
> > 825 2s (825=691+14+70+7+43)
> > and the rest would be 1s (866=1698-7-825)
> >
> > On Fri, Jun 12, 2020 at 9:29 PM Rasmus Liland  wrote:
> > >
> > > On 2020-06-13 11:30 +1000, Jim Lemon wrote:
> > > > On Fri, Jun 12, 2020 at 8:06 PM Jim Lemon wrote:
> > > > > On Sat, Jun 13, 2020 at 10:46 AM Ana Marija wrote:
> > > > > >
> > > > > > I am trying to make a new column
> > > > > > "pheno" so that I reduce the number
> > > > > > of NAs
> > > > >
> > > > > it looks like those two NA values in
> > > > > PLASER are the ones you want to drop.
> > > >
> > > > From just your summary table, it's hard to
> > > > guess the distribution of NA values.
> > >
> > > Dear Ana,
> > >
> > > This small sample
> > >
> > > b <- read.table(text="FLASER;PLASER
> > > 1;2
> > > ;2
> > > ;
> > > 1;
> > > 2;
> > > 2;2
> > > 3;2
> > > 3;3
> > > 1;1", sep=";", header=TRUE)
> > >
> > > table(b$PLASER,b$FLASER, exclude = NULL)
> > >
> > > yields the same combinations you showed
> > > earlier:
> > >
> > >1 2 3 
> > >   11 0 00
> > >   21 1 11
> > >   30 0 10
> > >1 1 01
> > >
> > > If you want to eliminate the four -based
> > > combinations completely, this line
> > >
> > > b$pheno <-
> > >   ifelse(b$PLASER==2 |
> > >  b$FLASER==2 |
> > >  is.na(b$PLASER) |
> > >  is.na(b$PLASER) & b$FLASER %in% 1:2 |
> > >  is.na(b$FLASER) & b$PLASER == 2,
> > >  2, 1)
> > > table(b$pheno, exclude = NULL)
> > >
> > > will do it:
> > >
> > > 1 2
> > > 2 7
> > >
> > > Best,
> > > Rasmus
> > > __
> > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide 
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] if else statement adjustemtn

2020-06-12 Thread Ana Marija

Hi Rasmus,

thank you for getting back to be, the command your provided seems to
add all 11 NAs to 2s
> b$pheno <-
+   ifelse(b$PLASER==2 |
+  b$FLASER==2 |
+  is.na(b$PLASER) |
+  is.na(b$PLASER) & b$FLASER %in% 1:2 |
+  is.na(b$FLASER) & b$PLASER == 2,
+  2, 1)
> table(b$pheno, exclude = NULL)

  1   2
859 839

Once again my desired results is to keep these 7 NAs as NAs
> table(b$PLASER,b$FLASER, exclude = NULL)

 1   2   3 
  1836  14   00
  2691  70  432
  3  2   7  210
 4   1   07

and have
825 2s (825=691+14+70+7+43)
and the rest would be 1s (866=1698-7-825)

On Fri, Jun 12, 2020 at 9:29 PM Rasmus Liland  wrote:
>
> On 2020-06-13 11:30 +1000, Jim Lemon wrote:
> > On Fri, Jun 12, 2020 at 8:06 PM Jim Lemon wrote:
> > > On Sat, Jun 13, 2020 at 10:46 AM Ana Marija wrote:
> > > >
> > > > I am trying to make a new column
> > > > "pheno" so that I reduce the number
> > > > of NAs
> > >
> > > it looks like those two NA values in
> > > PLASER are the ones you want to drop.
> >
> > From just your summary table, it's hard to
> > guess the distribution of NA values.
>
> Dear Ana,
>
> This small sample
>
> b <- read.table(text="FLASER;PLASER
> 1;2
> ;2
> ;
> 1;
> 2;
> 2;2
> 3;2
> 3;3
> 1;1", sep=";", header=TRUE)
>
> table(b$PLASER,b$FLASER, exclude = NULL)
>
> yields the same combinations you showed
> earlier:
>
>1 2 3 
>   11 0 00
>   21 1 11
>   30 0 10
>1 1 01
>
> If you want to eliminate the four -based
> combinations completely, this line
>
> b$pheno <-
>   ifelse(b$PLASER==2 |
>  b$FLASER==2 |
>  is.na(b$PLASER) |
>  is.na(b$PLASER) & b$FLASER %in% 1:2 |
>  is.na(b$FLASER) & b$PLASER == 2,
>  2, 1)
> table(b$pheno, exclude = NULL)
>
> will do it:
>
> 1 2
> 2 7
>
> Best,
> Rasmus
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] if else statement adjustemtn

2020-06-12 Thread Ana Marija

Hi Jim,

I tried it:
> b$pheno<-ifelse(b$PLASER==2 | b$FLASER==2 |is.na(b$PLASER) & b$FLASER == 
> 2,2,1)
> table(b$pheno,exclude = NULL)

   12 
 859  828   11
> b$pheno<-ifelse(b$PLASER==2 | b$FLASER==2 |is.na(b$FLASER) & b$PLASER == 
> 2,2,1)
> table(b$pheno,exclude = NULL)

   12 
 859  828   11

Am I am doing something wrong?

Thanks
Ana

On Fri, Jun 12, 2020 at 8:06 PM Jim Lemon  wrote:
>
> Hi Ana,
> From your desired result, it looks like those two NA values in PLASER
> are the ones you want to drop.
> If so, try this:
>
> b$pheno<-ifelse(b$PLASER==2 | b$FLASER==2 |
>  is.na(b$PLASER) & b$FLASER == 2,2,1)
>
> and if I have it the wrong way round, swap FLASER and PLASER in the
> bit I have added.
>
> Jim
>
> On Sat, Jun 13, 2020 at 10:46 AM Ana Marija  
> wrote:
> >
> > Hello
> >
> > I have a data frame like this:
> >
> > > head(b)
> >FID   IID FLASER PLASER
> > 1: fam1000 G1000  1  1
> > 2: fam1001 G1001  1  1
> > 3: fam1003 G1003  1  2
> > 4: fam1005 G1005  1  1
> > 5: fam1009 G1009  1  1
> > 6: fam1052 G1052  1  1
> > ...
> >
> > > table(b$PLASER,b$FLASER, exclude = NULL)
> >
> >  1   2   3 
> >   1836  14   00
> >   2691  70  432
> >   3  2   7  210
> >  4   1   07
> >
> > I am trying to make a new column "pheno" so that I reduce the number of NAs
> >
> > right now I am doing:
> >
> > > b$pheno=ifelse(b$PLASER==2 | b$FLASER==2,2,1)
> > > table(b$pheno, exclude = NULL)
> >
> >12 
> >  859  828   11
> >
> > I would like to reduce this number of NAs to be 7
> > so I would like to have in "pheno column"
> > 7 NAs
> > 825 2s (825=691+14+70+7+43)
> > and the rest would be 1s (866=1698-7-825)
> >
> > How can I change the above command to get these numbers?
> >
> > Thanks
> > Ana
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] if else statement adjustemtn

2020-06-12 Thread Ana Marija

Hello

I have a data frame like this:

> head(b)
   FID   IID FLASER PLASER
1: fam1000 G1000  1  1
2: fam1001 G1001  1  1
3: fam1003 G1003  1  2
4: fam1005 G1005  1  1
5: fam1009 G1009  1  1
6: fam1052 G1052  1  1
...

> table(b$PLASER,b$FLASER, exclude = NULL)

 1   2   3 
  1836  14   00
  2691  70  432
  3  2   7  210
 4   1   07

I am trying to make a new column "pheno" so that I reduce the number of NAs

right now I am doing:

> b$pheno=ifelse(b$PLASER==2 | b$FLASER==2,2,1)
> table(b$pheno, exclude = NULL)

   12 
 859  828   11

I would like to reduce this number of NAs to be 7
so I would like to have in "pheno column"
7 NAs
825 2s (825=691+14+70+7+43)
and the rest would be 1s (866=1698-7-825)

How can I change the above command to get these numbers?

Thanks
Ana

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to stack two Stack manhattan plots?

2020-06-11 Thread Ana Marija

Thank you so much it is getting better (see attach) when I do:
ggplot( data = tmp.tidy) +geom_point( aes(y = -log10(value),x =
CHR,color=key) ,position = "jitter", size=0.5 )

is there is a way to have separation between every chromosome shown
better and also every number of chromosome shown on the x-axis?

On Thu, Jun 11, 2020 at 4:39 PM  wrote:
>
> Your dots are too big!
>
> Add
>
> geom_points(... , size = 1
>
> May need to play... 0.5 or 0.1?
>
> On 11 Jun 2020 22:26, Ana Marija  wrote:
>
> I tried it,
> ggplot( data = tmp.tidy) +geom_point( aes(y = BP,x = CHR,color=key)
> ,position = "jitter" )
> I got the attached
>
> On Thu, Jun 11, 2020 at 4:18 PM  wrote:
> >
> > Try adding
> > position = "jitter" to the geom_point(...
> >
> >
> >
> > On 11 Jun 2020 21:41, Ana Marija  wrote:
> >
> > Hello,
> >
> > I tried your code and this is what I got
> >
> > I really need two groups side by side shown per chromosome as it is here:
> > https://imgur.com/a/pj40c
> > on the image there are 4 groups I do have only two
> >
> > On Thu, Jun 11, 2020 at 11:52 AM  wrote:
> > >
> > > On 2020-06-11 15:59, Ana Marija wrote:
> > > > yes all in one plot.
> > > > So I want key (and therefore color)to be "Pold" and "Pnew" as those I
> > > > am comparing per CHR
> > > > so I used facet_wrap(~CHR) to create a graph per chromosome (on x-axis)
> > > > On the end x-axis would have two strikes of Pold and Pnew (different
> > > > colors) per one chromosome, and CHR would go from 1 to 22
> > > >
> > >
> > >
> > > ggplot( data = tmp.tidy) +
> > > geom_point( aes(
> > >  y = BP,
> > >  x = CHR,
> > >  color=key) )
> > >
> > > ?
> >
> >
>
>
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to stack two Stack manhattan plots?

2020-06-11 Thread Ana Marija

I tried it,
ggplot( data = tmp.tidy) +geom_point( aes(y = BP,x = CHR,color=key)
,position = "jitter" )
I got the attached

On Thu, Jun 11, 2020 at 4:18 PM  wrote:
>
> Try adding
> position = "jitter" to the geom_point(...
>
>
>
> On 11 Jun 2020 21:41, Ana Marija  wrote:
>
> Hello,
>
> I tried your code and this is what I got
>
> I really need two groups side by side shown per chromosome as it is here:
> https://imgur.com/a/pj40c
> on the image there are 4 groups I do have only two
>
> On Thu, Jun 11, 2020 at 11:52 AM  wrote:
> >
> > On 2020-06-11 15:59, Ana Marija wrote:
> > > yes all in one plot.
> > > So I want key (and therefore color)to be "Pold" and "Pnew" as those I
> > > am comparing per CHR
> > > so I used facet_wrap(~CHR) to create a graph per chromosome (on x-axis)
> > > On the end x-axis would have two strikes of Pold and Pnew (different
> > > colors) per one chromosome, and CHR would go from 1 to 22
> > >
> >
> >
> > ggplot( data = tmp.tidy) +
> > geom_point( aes(
> >  y = BP,
> >  x = CHR,
> >  color=key) )
> >
> > ?
>
>
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to stack two Stack manhattan plots?

2020-06-11 Thread Ana Marija

Hello,

I tried your code and this is what I got

I really need two groups side by side shown per chromosome as it is here:
https://imgur.com/a/pj40c
on the image there are 4 groups I do have only two

On Thu, Jun 11, 2020 at 11:52 AM  wrote:
>
> On 2020-06-11 15:59, Ana Marija wrote:
> > yes all in one plot.
> > So I want key (and therefore color)to be "Pold" and "Pnew" as those I
> > am comparing per CHR
> > so I used facet_wrap(~CHR) to create a graph per chromosome (on x-axis)
> > On the end x-axis would have two strikes of Pold and Pnew (different
> > colors) per one chromosome, and CHR would go from 1 to 22
> >
>
>
> ggplot( data = tmp.tidy) +
> geom_point( aes(
>  y = BP,
>  x = CHR,
>  color=key) )
>
> ?
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to stack two Stack manhattan plots?

2020-06-11 Thread Ana Marija

yes all in one plot.
So I want key (and therefore color)to be "Pold" and "Pnew" as those I
am comparing per CHR
so I used facet_wrap(~CHR) to create a graph per chromosome (on x-axis)
On the end x-axis would have two strikes of Pold and Pnew (different
colors) per one chromosome, and CHR would go from 1 to 22

On Thu, Jun 11, 2020 at 9:26 AM  wrote:
>
> On 2020-06-11 14:54, Ana Marija wrote:
> > Hello,
> >
> > I expected it to look like this:
> > https://imgur.com/a/pj40c
> >
>
> Ah - so all on the one plot? - so you don't want a facet. It puts two
> plots side by side (or 22)
>
> > where x-axis would be CHR, there is 22 of them
> >> unique(tmp.tidy$CHR)
> >  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
> >
>
> You have 22 plots all appearing side by size as part of the facet
>
> I think you want color=CHR - but I don't know what your current color
> setting is doing.
>
> > I also have two phenotypes (keys) which I would like to compare
> >> unique(tmp.tidy$key)
> > [1] "Pold" "Pnew"
> >
>
> So did you want to Facet those?

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to stack two Stack manhattan plots?

2020-06-11 Thread Ana Marija

Hello,

I expected it to look like this:
https://imgur.com/a/pj40c

where x-axis would be CHR, there is 22 of them
> unique(tmp.tidy$CHR)
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22

I also have two phenotypes (keys) which I would like to compare
> unique(tmp.tidy$key)
[1] "Pold" "Pnew"

> dim(tmp.tidy)
[1] 2600184   4

I would like the x-axis to be separated by CHR

> sapply(tmp.tidy,class)
CHR  BP key   value
  "integer"   "integer" "character"   "numeric"

> str(tmp.tidy)
'data.frame':2600184 obs. of  4 variables:
 $ CHR  : int  1 1 1 1 1 1 1 1 1 1 ...
 $ BP   : int  785989 1130727 1156131 1158631 1211292 1478153 1500941
1506035 1510801 1721479 ...
 $ key  : chr  "Pold" "Pold" "Pold" "Pold" ...
 $ value: num  0.952 0.475 0.529 0.255 0.295 ...

Unfortunately qqman doesn't do this kind of overlay of two plots

On Wed, Jun 10, 2020 at 11:24 PM John  wrote:
>
> On Wed, 10 Jun 2020 15:36:11 -0500
> Ana Marija  wrote:
>
> > Hello,
> >
> > I have a data frame like this:
> >
> > > head(tmp1)
> >   CHR  BP   PoldPnew
> > 1   1  785989 0.9521 0.09278
> > 2   1 1130727 0.4750 0.19010
> > 3   1 1156131 0.5289 0.48520
> > 4   1 1158631 0.2554 0.18140
> > 5   1 1211292 0.2954 0.48590
> > 6   1 1478153 0.5542 0.68790
> > ...
> >
> > I did:
> > tmp.tidy <- tmp1 %>% gather(key, value, -BP, -CHR)
> > jpeg("over.jpeg")
> > ggplot(tmp.tidy, aes(BP, value, color=key)) + geom_point() +
> > facet_wrap(~CHR, nrow=1)
> > dev.off()
> >
> > but I got this plot in attach which doesn't make sense. Can you please
> > advise how to make this plot?
> >
> > thanks
> > Ana
>
> If you would, the str() output might help people understand what is
> happening, and also how many records you're looking at.  The head()
> output is a bit thin on information.  There are various manhattan plot
> packages for R including a specialized package for ggplot2.
>
> JWDougherty

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to stack two Stack manhattan plots?

2020-06-10 Thread Ana Marija

HI Jim

I run it like this:
Rscript --no-save Manhattan_plot.R

and just in case I added: stringsAsFactors=FALSE

so the script looks like this:
library(qqman)
results_log <- read.table("logistic_results_M3.assoc.logistic.C",
head=TRUE,stringsAsFactors=FALSE)
jpeg("M3.assoc.logistic.jpeg")
manhattan(results_log,chr="CHR",bp="BP",p="P",snp="SNP", main =
"Manhattan plot:logistic")
dev.off()

this code does work with another data set, the jpeg() does work. I
will try now with PNG

On Wed, Jun 10, 2020 at 5:17 PM Jim Lemon  wrote:
>
> Hi Ana,
> The problem may be that the JPEG device doesn't handle transparency.
> Perhaps PNG?
>
> Jim
>
> On Thu, Jun 11, 2020 at 6:48 AM Ana Marija  
> wrote:
> >
> > Hello,
> >
> > I have a data frame like this:
> >
> > > head(tmp1)
> >   CHR  BP   PoldPnew
> > 1   1  785989 0.9521 0.09278
> > 2   1 1130727 0.4750 0.19010
> > 3   1 1156131 0.5289 0.48520
> > 4   1 1158631 0.2554 0.18140
> > 5   1 1211292 0.2954 0.48590
> > 6   1 1478153 0.5542 0.68790
> > ...
> >
> > I did:
> > tmp.tidy <- tmp1 %>% gather(key, value, -BP, -CHR)
> > jpeg("over.jpeg")
> > ggplot(tmp.tidy, aes(BP, value, color=key)) + geom_point() +
> > facet_wrap(~CHR, nrow=1)
> > dev.off()
> >
> > but I got this plot in attach which doesn't make sense. Can you please
> > advise how to make this plot?
> >
> > thanks
> > Ana
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Error in plot.window(...) : need finite 'ylim' values

2020-06-10 Thread Ana Marija

Hello,

I do have a file like this:
head M3.assoc.logistic.C
CHR SNP BP P
1 1:785989:T:C 785989 0.4544
1 1:785989:T:C 785989 0.689
1 1:1130727:A:C 1130727 0.05068
1 1:1130727:A:C 1130727 0.07381
1 1:1156131:T:C 1156131 0.6008
1 1:1156131:T:C 1156131 0.8685
...

And I don't have any "NA" or "inf" values in it

and I am plotting it in R via:

library(qqman)
results_log <- read.table("M3.assoc.logistic.C", head=TRUE)
jpeg("Logistic_manhattan_retinopathy_M3.jpeg")
manhattan(results_log,chr="CHR",bp="BP",p="P",snp="SNP", main =
"Manhattan plot: logistic")
dev.off()

but I am getting:

Error in plot.window(...) : need finite 'ylim' values
Calls: manhattan ... do.call -> plot -> plot.default -> localWindow ->
plot.window
Execution halted

Please advise,

Thanks
Ana

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] how to filter variables which appear in any row but do not include

2020-06-03 Thread Ana Marija

Hi Rui,

thank you so much, that is exactly what I needed!

Cheers,
Ana

On Wed, Jun 3, 2020 at 11:50 AM Rui Barradas  wrote:
>
> Hello,
>
> If you want to filter out rows with any of the values in a 'unwanted'
> vector, try the following.
>
> First, create a test data set.
>
> x <- scan(what = character(), text = '
> "E10"  "E103" "E104" "E109" "E101" "E108" "E105" "E100" "E106" "E102"
> "E107" "E11"  "E119" "E113" "E115" "E111" "E114" "E110" "E118" "E116" "E112"
> "E117"
> ')
>
> set.seed(2020)
> dat <- replicate(5, sample(x, 20, TRUE))
> dat <- as.data.frame(dat)
>
>
> Now, remove all rows that have at least one of "E102" or "E112"
>
>
> unwanted <- c("E102", "E112")
> no <- sapply(dat, function(x){
>grepl(paste(unwanted, collapse = "|"), x)
> })
> no <- apply(no, 1, any)
> dat[!no, ]
>
>
> That's it, if I understood the problem.
>
>
> Hope this helps,
>
> Rui Barradas
>
>
> Às 15:55 de 03/06/20, Ana Marija escreveu:
> > Hello.
> >
> > I am trying to filter only rows that have ANY of these variables:
> > E109, E119, E149
> >
> > so I did:
> > controls=t %>% filter_all(any_vars(. %in% c("E109", "E119","E149")))
> >
> > than I checked what I got:
> >> s0 <- sapply(controls, function(x) grep('^E10', x, value = TRUE))
> >> d0=unlist(s0)
> >> d10=unique(d0)
> >> d10
> >   [1] "E10"  "E103" "E104" "E109" "E101" "E108" "E105" "E100" "E106" "E102"
> > [11] "E107"
> > s1 <- sapply(controls, function(x) grep('^E11', x, value = TRUE))
> > d1=unlist(s1)
> > d11=unique(d1)
> >> d11
> >   [1] "E11"  "E119" "E113" "E115" "E111" "E114" "E110" "E118" "E116" "E112"
> > [11] "E117"
> >
> > I need help with changing this command
> > controls=t %>% filter_all(any_vars(. %in% c("E109", "E119","E149")))
> >
> > so that in the output I do not have any rows that include E102 or E112?
> >
> > Thanks
> > Ana
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] how to filter variables which appear in any row but do not include

2020-06-03 Thread Ana Marija

Hi Bert

The issue is that I have around 2000 columns so I can not be checking if
those two are not present in each column of any row “by hand” so to
speakAnd I need my output to be a data frame where neither E102 nor
E112 are present. Basically from the data frame columns that I already
created just remove any row that contains any of those variables.

Thanks
Ana

On Wed, 3 Jun 2020 at 11:00, Bert Gunter  wrote:

> I suggest that you forget all that fancy stuff  (and this is not a use
> case for regular expressions).
> Use %in%  with logical subscripting instead -- basic R functionality that
> can be found in any good R tutorial.
>
> > x <- c("ab","bc","cd")
> > x[x %in% c("ab","cd")]
> [1] "ab" "cd"
> > x[!x %in% c("ab","cd")]
> [1] "bc"
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Wed, Jun 3, 2020 at 7:56 AM Ana Marija 
> wrote:
>
>> Hello.
>>
>> I am trying to filter only rows that have ANY of these variables:
>> E109, E119, E149
>>
>> so I did:
>> controls=t %>% filter_all(any_vars(. %in% c("E109", "E119","E149")))
>>
>> than I checked what I got:
>> > s0 <- sapply(controls, function(x) grep('^E10', x, value = TRUE))
>> > d0=unlist(s0)
>> > d10=unique(d0)
>> > d10
>>  [1] "E10"  "E103" "E104" "E109" "E101" "E108" "E105" "E100" "E106" "E102"
>> [11] "E107"
>> s1 <- sapply(controls, function(x) grep('^E11', x, value = TRUE))
>> d1=unlist(s1)
>> d11=unique(d1)
>> > d11
>>  [1] "E11"  "E119" "E113" "E115" "E111" "E114" "E110" "E118" "E116" "E112"
>> [11] "E117"
>>
>> I need help with changing this command
>> controls=t %>% filter_all(any_vars(. %in% c("E109", "E119","E149")))
>>
>> so that in the output I do not have any rows that include E102 or E112?
>>
>> Thanks
>> Ana
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] how to filter variables which appear in any row but do not include

2020-06-03 Thread Ana Marija

Hello.

I am trying to filter only rows that have ANY of these variables:
E109, E119, E149

so I did:
controls=t %>% filter_all(any_vars(. %in% c("E109", "E119","E149")))

than I checked what I got:
> s0 <- sapply(controls, function(x) grep('^E10', x, value = TRUE))
> d0=unlist(s0)
> d10=unique(d0)
> d10
 [1] "E10"  "E103" "E104" "E109" "E101" "E108" "E105" "E100" "E106" "E102"
[11] "E107"
s1 <- sapply(controls, function(x) grep('^E11', x, value = TRUE))
d1=unlist(s1)
d11=unique(d1)
> d11
 [1] "E11"  "E119" "E113" "E115" "E111" "E114" "E110" "E118" "E116" "E112"
[11] "E117"

I need help with changing this command
controls=t %>% filter_all(any_vars(. %in% c("E109", "E119","E149")))

so that in the output I do not have any rows that include E102 or E112?

Thanks
Ana

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] is there is a way to extract lines in between 3 files that are in common based on one column?

2020-06-01 Thread Ana Marija

Hi Jim,

not in this case, but thanks for asking!

Ana

On Mon, Jun 1, 2020 at 10:04 PM Jim Lemon  wrote:
>
> So recombination sticks out its foot before us. Do you want to account
> for gene linkage?
>
> JIm
>
> On Tue, Jun 2, 2020 at 11:55 AM Ana Marija  
> wrote:
> >
> > Hi Jim
> >
> > > neu3<-neu1[!(neu1$Marker %in% Marker3),]
> > > dim(neu3)
> > [1] 18579
> > > nep3<-nep1[!(nep1$Marker %in% Marker3),]
> > > dim(nep3)
> > [1] 55629
> > > ret3<-ret1[!(ret1$Marker %in% Marker3),]
> > > dim(ret3)
> > [1] 34939
> >
> >
> > If I do:
> >
> >  nn1<-merge(neu1,nep1,by=c("Marker","Chr"))
> > nn2<-merge(nn1,ret1,by=c("Marker","Chr"))
> > > Marker3<-nn2$Marker
> > > length(Marker3)
> > [1] 3742962
> > > Marker4<-nn1$Marker
> > > length(Marker4)
> > [1] 373
> >
> > On Mon, Jun 1, 2020 at 8:50 PM Ana Marija  
> > wrote:
> > >
> > > Hi David,
> > >
> > > that is a great point!
> > > Yes indeed some are non unique:
> > >
> > > > dim(neu1)
> > > [1] 3742845   9
> > > > length(unique(neu1$Marker))
> > > [1] 3741858
> > > > length(unique(nep1$Marker))
> > > [1] 3745560
> > > > dim(nep1)
> > > [1] 3746550   9
> > > > length(unique(ret1$Marker))
> > > [1] 3743494
> > > > dim(ret1)
> > > [1] 3743494   9
> > >
> > > How would I rewrite this code so that is merging by Chr and Marker
> > > column? It seems that a Marker can be under a few Chr.
> > >
> > >
> > >
> > >
> > >
> > > On Mon, Jun 1, 2020 at 8:41 PM David Winsemius  
> > > wrote:
> > > >
> > > >
> > > > On 6/1/20 5:40 PM, Ana Marija wrote:
> > > > > Hi Jim,
> > > > >
> > > > > thank you so much for getting back to me. I tried your code and this 
> > > > > is
> > > > > what I get:
> > > > >> dim(neu2)
> > > > > [1] 3740988   9
> > > > >> dim(nep2)
> > > > > [1] 3740988   9
> > > > >> dim(ret2)
> > > > > [1] 3740001   9
> > > > >
> > > > > I think I would need to have the same number of lines in all 3 data 
> > > > > frames.
> > > > >
> > > > > Can you please advise.
> > > >
> > > >
> > > > You should check for duplicated Marker values.
> > > >
> > > >
> > > > --
> > > >
> > > > David
> > > >
> > > > >
> > > > > Cheers
> > > > > Ana
> > > > >
> > > > > On Mon, Jun 1, 2020 at 7:31 PM Jim Lemon  wrote:
> > > > >
> > > > >> Hi Ana,
> > > > >> Not too hard, but your example has all the "marker" fields in common.
> > > > >> So using a sample that will show the expected result:
> > > > >>
> > > > >> neu1<-read.table(text="Chr BP Marker  MAF A1 A2 Direction  pValue N
> > > > >>   1 10012 1:10012:G:T 0.229925  T  G  + 0.650403 1594
> > > > >>   1 10827 1:10827:C:T 0.287014  T  C  + 0.955449 1594
> > > > >>   1 12713 1:12713:C:T 0.097867  T  C  - 0.290455 1594
> > > > >>   1 12882 1:12882:T:G 0.287014  G  T  + 0.955449 1594
> > > > >>   1 12991 1:12991:G:A 0.097867  A  G  - 0.290455 1594
> > > > >>   1 14726 1:14726:G:A 0.132058  A  G  + 0.115005 1594",
> > > > >>   header=TRUE,stringsAsFactors=FALSE)
> > > > >>
> > > > >> nep1<-read.table(text="Chr BP Marker MAF A1 A2 DirectionpValue N
> > > > >>   1 10012 1:10012:G:T 0.2300430 T  G - 0.1420030 1641
> > > > >>   1 10827 1:10827:C:T 0.2867150 T  C - 0.2045580 1641
> > > > >>   1 12713 1:12713:C:T 0.0975015 T  C - 0.007 1641
> > > > >>   1 12882 1:12882:T:G 0.2867150 G  T - 0.2045580 1641
> > > > >>   1 12991 1:12991:G:A 0.0975015 A  G - 0.007 1641
> > > > >>   1 14726 1:14727:G:A 0.1325410 A  G - 0.8725660 1641",
> > > > >>   header=TRUE,stringsAsFac

Re: [R] is there is a way to extract lines in between 3 files that are in common based on one column?

2020-06-01 Thread Ana Marija

Hi Jim

> neu3<-neu1[!(neu1$Marker %in% Marker3),]
> dim(neu3)
[1] 18579
> nep3<-nep1[!(nep1$Marker %in% Marker3),]
> dim(nep3)
[1] 55629
> ret3<-ret1[!(ret1$Marker %in% Marker3),]
> dim(ret3)
[1] 34939


If I do:

 nn1<-merge(neu1,nep1,by=c("Marker","Chr"))
nn2<-merge(nn1,ret1,by=c("Marker","Chr"))
> Marker3<-nn2$Marker
> length(Marker3)
[1] 3742962
> Marker4<-nn1$Marker
> length(Marker4)
[1] 373

On Mon, Jun 1, 2020 at 8:50 PM Ana Marija  wrote:
>
> Hi David,
>
> that is a great point!
> Yes indeed some are non unique:
>
> > dim(neu1)
> [1] 3742845   9
> > length(unique(neu1$Marker))
> [1] 3741858
> > length(unique(nep1$Marker))
> [1] 3745560
> > dim(nep1)
> [1] 3746550   9
> > length(unique(ret1$Marker))
> [1] 3743494
> > dim(ret1)
> [1] 3743494   9
>
> How would I rewrite this code so that is merging by Chr and Marker
> column? It seems that a Marker can be under a few Chr.
>
>
>
>
>
> On Mon, Jun 1, 2020 at 8:41 PM David Winsemius  wrote:
> >
> >
> > On 6/1/20 5:40 PM, Ana Marija wrote:
> > > Hi Jim,
> > >
> > > thank you so much for getting back to me. I tried your code and this is
> > > what I get:
> > >> dim(neu2)
> > > [1] 3740988   9
> > >> dim(nep2)
> > > [1] 3740988   9
> > >> dim(ret2)
> > > [1] 3740001   9
> > >
> > > I think I would need to have the same number of lines in all 3 data 
> > > frames.
> > >
> > > Can you please advise.
> >
> >
> > You should check for duplicated Marker values.
> >
> >
> > --
> >
> > David
> >
> > >
> > > Cheers
> > > Ana
> > >
> > > On Mon, Jun 1, 2020 at 7:31 PM Jim Lemon  wrote:
> > >
> > >> Hi Ana,
> > >> Not too hard, but your example has all the "marker" fields in common.
> > >> So using a sample that will show the expected result:
> > >>
> > >> neu1<-read.table(text="Chr BP Marker  MAF A1 A2 Direction  pValue N
> > >>   1 10012 1:10012:G:T 0.229925  T  G  + 0.650403 1594
> > >>   1 10827 1:10827:C:T 0.287014  T  C  + 0.955449 1594
> > >>   1 12713 1:12713:C:T 0.097867  T  C  - 0.290455 1594
> > >>   1 12882 1:12882:T:G 0.287014  G  T  + 0.955449 1594
> > >>   1 12991 1:12991:G:A 0.097867  A  G  - 0.290455 1594
> > >>   1 14726 1:14726:G:A 0.132058  A  G  + 0.115005 1594",
> > >>   header=TRUE,stringsAsFactors=FALSE)
> > >>
> > >> nep1<-read.table(text="Chr BP Marker MAF A1 A2 DirectionpValue N
> > >>   1 10012 1:10012:G:T 0.2300430 T  G - 0.1420030 1641
> > >>   1 10827 1:10827:C:T 0.2867150 T  C - 0.2045580 1641
> > >>   1 12713 1:12713:C:T 0.0975015 T  C - 0.007 1641
> > >>   1 12882 1:12882:T:G 0.2867150 G  T - 0.2045580 1641
> > >>   1 12991 1:12991:G:A 0.0975015 A  G - 0.007 1641
> > >>   1 14726 1:14727:G:A 0.1325410 A  G - 0.8725660 1641",
> > >>   header=TRUE,stringsAsFactors=FALSE)
> > >>
> > >> ret1<-read.table(text="Chr BP Marker MAF A1 A2 Direction   pValue N
> > >>   1 10012 1:10012:G:T 0.2322760 T  G - 0.230383 1608
> > >>   1 10827 1:10827:C:T 0.2882460 T  C - 0.120356 1608
> > >>   1 12713 1:12713:C:T 0.0982587 T  C - 0.272936 1608
> > >>   1 12882 1:12882:T:G 0.2882460 G  T - 0.120356 1608
> > >>   1 12991 1:12992:G:A 0.0982587 A  G - 0.272936 1608
> > >>   1 14726 1:14727:G:A 0.1340170 A  G - 0.594538 1608",
> > >> header=TRUE,stringsAsFactors=FALSE)
> > >>
> > >> # merge the three data frames on "Marker"
> > >> nn1<-merge(neu1,nep1,by="Marker")
> > >> nn2<-merge(nn1,ret1,by="Marker")
> > >> # get the common "Marker" strings
> > >> Marker3<-nn2$Marker
> > >> # subset all three data frames on Marker3
> > >> neu2<-neu1[neu1$Marker %in% Marker3,]
> > >> nep2<-nep1[nep1$Marker %in% Marker3,]
> > >> ret2<-ret1[ret1$Marker %in% Marker3,]
> > >>
> > >> Jim
> > >>
> > >> On Tue, Jun 2, 2020 at 7:50 AM Ana Marija 
> > >> wrote:
> > >>> Hello,
> >

Re: [R] is there is a way to extract lines in between 3 files that are in common based on one column?

2020-06-01 Thread Ana Marija

Hi David,

that is a great point!
Yes indeed some are non unique:

> dim(neu1)
[1] 3742845   9
> length(unique(neu1$Marker))
[1] 3741858
> length(unique(nep1$Marker))
[1] 3745560
> dim(nep1)
[1] 3746550   9
> length(unique(ret1$Marker))
[1] 3743494
> dim(ret1)
[1] 3743494   9

How would I rewrite this code so that is merging by Chr and Marker
column? It seems that a Marker can be under a few Chr.





On Mon, Jun 1, 2020 at 8:41 PM David Winsemius  wrote:
>
>
> On 6/1/20 5:40 PM, Ana Marija wrote:
> > Hi Jim,
> >
> > thank you so much for getting back to me. I tried your code and this is
> > what I get:
> >> dim(neu2)
> > [1] 3740988   9
> >> dim(nep2)
> > [1] 3740988   9
> >> dim(ret2)
> > [1] 3740001   9
> >
> > I think I would need to have the same number of lines in all 3 data frames.
> >
> > Can you please advise.
>
>
> You should check for duplicated Marker values.
>
>
> --
>
> David
>
> >
> > Cheers
> > Ana
> >
> > On Mon, Jun 1, 2020 at 7:31 PM Jim Lemon  wrote:
> >
> >> Hi Ana,
> >> Not too hard, but your example has all the "marker" fields in common.
> >> So using a sample that will show the expected result:
> >>
> >> neu1<-read.table(text="Chr BP Marker  MAF A1 A2 Direction  pValue N
> >>   1 10012 1:10012:G:T 0.229925  T  G  + 0.650403 1594
> >>   1 10827 1:10827:C:T 0.287014  T  C  + 0.955449 1594
> >>   1 12713 1:12713:C:T 0.097867  T  C  - 0.290455 1594
> >>   1 12882 1:12882:T:G 0.287014  G  T  + 0.955449 1594
> >>   1 12991 1:12991:G:A 0.097867  A  G  - 0.290455 1594
> >>   1 14726 1:14726:G:A 0.132058  A  G  + 0.115005 1594",
> >>   header=TRUE,stringsAsFactors=FALSE)
> >>
> >> nep1<-read.table(text="Chr BP Marker MAF A1 A2 DirectionpValue N
> >>   1 10012 1:10012:G:T 0.2300430 T  G - 0.1420030 1641
> >>   1 10827 1:10827:C:T 0.2867150 T  C - 0.2045580 1641
> >>   1 12713 1:12713:C:T 0.0975015 T  C - 0.007 1641
> >>   1 12882 1:12882:T:G 0.2867150 G  T - 0.2045580 1641
> >>   1 12991 1:12991:G:A 0.0975015 A  G - 0.007 1641
> >>   1 14726 1:14727:G:A 0.1325410 A  G - 0.8725660 1641",
> >>   header=TRUE,stringsAsFactors=FALSE)
> >>
> >> ret1<-read.table(text="Chr BP Marker MAF A1 A2 Direction   pValue N
> >>   1 10012 1:10012:G:T 0.2322760 T  G - 0.230383 1608
> >>   1 10827 1:10827:C:T 0.2882460 T  C - 0.120356 1608
> >>   1 12713 1:12713:C:T 0.0982587 T  C - 0.272936 1608
> >>   1 12882 1:12882:T:G 0.2882460 G  T - 0.120356 1608
> >>   1 12991 1:12992:G:A 0.0982587 A  G - 0.272936 1608
> >>   1 14726 1:14727:G:A 0.1340170 A  G - 0.594538 1608",
> >> header=TRUE,stringsAsFactors=FALSE)
> >>
> >> # merge the three data frames on "Marker"
> >> nn1<-merge(neu1,nep1,by="Marker")
> >> nn2<-merge(nn1,ret1,by="Marker")
> >> # get the common "Marker" strings
> >> Marker3<-nn2$Marker
> >> # subset all three data frames on Marker3
> >> neu2<-neu1[neu1$Marker %in% Marker3,]
> >> nep2<-nep1[nep1$Marker %in% Marker3,]
> >> ret2<-ret1[ret1$Marker %in% Marker3,]
> >>
> >> Jim
> >>
> >> On Tue, Jun 2, 2020 at 7:50 AM Ana Marija 
> >> wrote:
> >>> Hello,
> >>>
> >>> I have 3 data frames which have about 3.4 mill lines (but they don't have
> >>> exactly the same number of lines)...they look like this:
> >>> ...
> >>> Is there is a way to create another 3 data frames, say neu2, nep2, ret2
> >>> which would only contain lines that have the same entries in Marker
> >> column
> >>> for all 3 data frames?
> >>>
> >>> Thanks
> >>> Ana
> >>>
> >>>  [[alternative HTML version deleted]]
> >>>
> >>> __
> >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >   [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] is there is a way to extract lines in between 3 files that are in common based on one column?

2020-06-01 Thread Ana Marija

Hi Jim,

thank you so much for getting back to me. I tried your code and this is
what I get:
> dim(neu2)
[1] 3740988   9
> dim(nep2)
[1] 3740988   9
> dim(ret2)
[1] 3740001   9

I think I would need to have the same number of lines in all 3 data frames.

Can you please advise.

Cheers
Ana

On Mon, Jun 1, 2020 at 7:31 PM Jim Lemon  wrote:

> Hi Ana,
> Not too hard, but your example has all the "marker" fields in common.
> So using a sample that will show the expected result:
>
> neu1<-read.table(text="Chr BP Marker  MAF A1 A2 Direction  pValue N
>  1 10012 1:10012:G:T 0.229925  T  G  + 0.650403 1594
>  1 10827 1:10827:C:T 0.287014  T  C  + 0.955449 1594
>  1 12713 1:12713:C:T 0.097867  T  C  - 0.290455 1594
>  1 12882 1:12882:T:G 0.287014  G  T  + 0.955449 1594
>  1 12991 1:12991:G:A 0.097867  A  G  - 0.290455 1594
>  1 14726 1:14726:G:A 0.132058  A  G  + 0.115005 1594",
>  header=TRUE,stringsAsFactors=FALSE)
>
> nep1<-read.table(text="Chr BP Marker MAF A1 A2 DirectionpValue N
>  1 10012 1:10012:G:T 0.2300430 T  G - 0.1420030 1641
>  1 10827 1:10827:C:T 0.2867150 T  C - 0.2045580 1641
>  1 12713 1:12713:C:T 0.0975015 T  C - 0.007 1641
>  1 12882 1:12882:T:G 0.2867150 G  T - 0.2045580 1641
>  1 12991 1:12991:G:A 0.0975015 A  G - 0.007 1641
>  1 14726 1:14727:G:A 0.1325410 A  G - 0.8725660 1641",
>  header=TRUE,stringsAsFactors=FALSE)
>
> ret1<-read.table(text="Chr BP Marker MAF A1 A2 Direction   pValue N
>  1 10012 1:10012:G:T 0.2322760 T  G - 0.230383 1608
>  1 10827 1:10827:C:T 0.2882460 T  C - 0.120356 1608
>  1 12713 1:12713:C:T 0.0982587 T  C - 0.272936 1608
>  1 12882 1:12882:T:G 0.2882460 G  T - 0.120356 1608
>  1 12991 1:12992:G:A 0.0982587 A  G - 0.272936 1608
>  1 14726 1:14727:G:A 0.1340170 A  G - 0.594538 1608",
> header=TRUE,stringsAsFactors=FALSE)
>
> # merge the three data frames on "Marker"
> nn1<-merge(neu1,nep1,by="Marker")
> nn2<-merge(nn1,ret1,by="Marker")
> # get the common "Marker" strings
> Marker3<-nn2$Marker
> # subset all three data frames on Marker3
> neu2<-neu1[neu1$Marker %in% Marker3,]
> nep2<-nep1[nep1$Marker %in% Marker3,]
> ret2<-ret1[ret1$Marker %in% Marker3,]
>
> Jim
>
> On Tue, Jun 2, 2020 at 7:50 AM Ana Marija 
> wrote:
> >
> > Hello,
> >
> > I have 3 data frames which have about 3.4 mill lines (but they don't have
> > exactly the same number of lines)...they look like this:
> > ...
> > Is there is a way to create another 3 data frames, say neu2, nep2, ret2
> > which would only contain lines that have the same entries in Marker
> column
> > for all 3 data frames?
> >
> > Thanks
> > Ana
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] is there is a way to extract lines in between 3 files that are in common based on one column?

2020-06-01 Thread Ana Marija

Hello,

I have 3 data frames which have about 3.4 mill lines (but they don't have
exactly the same number of lines)...they look like this:

> neu1=neu[order(neu$Marker),]
> head(neu1)
   ChrBP  Marker  MAF A1 A2 Direction   pValueN
209565   1 10012 1:10012:G:T 0.229925  T  G + 0.650403 1594
209566   1 10827 1:10827:C:T 0.287014  T  C + 0.955449 1594
209567   1 12713 1:12713:C:T 0.097867  T  C - 0.290455 1594
209568   1 12882 1:12882:T:G 0.287014  G  T + 0.955449 1594
209569   1 12991 1:12991:G:A 0.097867  A  G - 0.290455 1594
209570   1 14726 1:14726:G:A 0.132058  A  G + 0.115005 1594
> nep1=nep[order(nep$Marker),]
> head(nep1)
   ChrBP  Marker   MAF A1 A2 DirectionpValue
 N
209642   1 10012 1:10012:G:T 0.2300430  T  G - 0.1420030
1641
209643   1 10827 1:10827:C:T 0.2867150  T  C - 0.2045580
1641
209644   1 12713 1:12713:C:T 0.0975015  T  C - 0.007
1641
209645   1 12882 1:12882:T:G 0.2867150  G  T - 0.2045580
1641
209646   1 12991 1:12991:G:A 0.0975015  A  G - 0.007
1641
209647   1 14726 1:14726:G:A 0.1325410  A  G - 0.8725660
1641
> ret1=ret[order(ret$Marker),]
> head(ret1)
ChrBP  Marker   MAF A1 A2 Direction   pValue
 N
8654531 10012 1:10012:G:T 0.2322760  T  G - 0.230383
1608
4515961 10827 1:10827:C:T 0.2882460  T  C - 0.120356
1608
1026046   1 12713 1:12713:C:T 0.0982587  T  C - 0.272936
1608
4515971 12882 1:12882:T:G 0.2882460  G  T - 0.120356
1608
1026047   1 12991 1:12991:G:A 0.0982587  A  G - 0.272936
1608
2234642   1 14726 1:14726:G:A 0.1340170  A  G - 0.594538
1608

Is there is a way to create another 3 data frames, say neu2, nep2, ret2
which would only contain lines that have the same entries in Marker column
for all 3 data frames?

Thanks
Ana

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] how to load data frame where numeric will be numeric instead of character

2020-06-01 Thread Ana Marija

HI David,

this is the problem:

> NEP <- read.table("gokind.nephropathy.fin",
header=T,stringsAsFactors=FALSE)
> sapply(NEP,class)
Chr  BP  Marker MAF  A1  A2
"character" "character" "character" "character" "character" "character"
  Direction  pValue   N

So even entries like Chr, BP, MAFare characters while they should be
numeric
> head(NEP)
  ChrBP   Marker  MAF A1 A2 Direction   pValueN
1  10 10625 10:10625:A:G   0.4156  G  A + 0.484813 1641
2  10 10645 10:10645:A:C 0.216027  C  A +  0.73597 1641


Can you please tell me what colClasses=colClassvec suppose to do?

Thanks
Ana

On Mon, Jun 1, 2020 at 4:13 PM David Winsemius 
wrote:

>
> On 6/1/20 1:37 PM, Ana Marija wrote:
> > Hello,
> >
> > I have a dataframe like this:
> >
> >ChrBP   Marker  MAF A1 A2 Direction   pValueN
> > 1  10 10625 10:10625:A:G 0.416562  G  A - 0.558228 1594
> > 2  10 10645 10:10645:A:C 0.215182  C  A - 0.880622 1594
> > ...
> >
> > which I load with:
> > NEU <- read.table("gokind.neuropathy.fin",
> header=T,stringsAsFactors=FALSE)
> >
> > and every column is numeric. How to say have all numeric ones stay
> numeric
> > like: Chr, BP, MAF, pValue, N
>
>
> I cannot figure out what the problem is. You say every column is
> numeric. It's not possible to have a column that contains the value
> "10:10625:A:G" be numeric.
>
>
> If you meant to say the every column was character, then the answer
> might be:
>
>
> colClassvec <- rep("numeric",9)
> colClassvec[ c(3,5:7)] <- "character"
>
> NEU <- read.table("gokind.neuropathy.fin",
> header=T,stringsAsFactors=FALSE, colClasses=colClassvec)
>
> --
> David.
>
> >
> > Thanks
> > Ana
> >
> >   [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] how to load data frame where numeric will be numeric instead of character

2020-06-01 Thread Ana Marija

7th fileld, Direction contains only "+" and "-"


On Mon, Jun 1, 2020 at 3:46 PM Bert Gunter  wrote:

> I count 8 fields in your data and 9 names in your heading ??
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Mon, Jun 1, 2020 at 1:38 PM Ana Marija 
> wrote:
>
>> Hello,
>>
>> I have a dataframe like this:
>>
>>   ChrBP   Marker  MAF A1 A2 Direction   pValueN
>> 1  10 10625 10:10625:A:G 0.416562  G  A - 0.558228 1594
>> 2  10 10645 10:10645:A:C 0.215182  C  A - 0.880622 1594
>> ...
>>
>> which I load with:
>> NEU <- read.table("gokind.neuropathy.fin",
>> header=T,stringsAsFactors=FALSE)
>>
>> and every column is numeric. How to say have all numeric ones stay numeric
>> like: Chr, BP, MAF, pValue, N
>>
>> Thanks
>> Ana
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] how to load data frame where numeric will be numeric instead of character

2020-06-01 Thread Ana Marija

Hello,

I have a dataframe like this:

  ChrBP   Marker  MAF A1 A2 Direction   pValueN
1  10 10625 10:10625:A:G 0.416562  G  A - 0.558228 1594
2  10 10645 10:10645:A:C 0.215182  C  A - 0.880622 1594
...

which I load with:
NEU <- read.table("gokind.neuropathy.fin", header=T,stringsAsFactors=FALSE)

and every column is numeric. How to say have all numeric ones stay numeric
like: Chr, BP, MAF, pValue, N

Thanks
Ana

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to only show two numbers on bar_plot with ggplot

2020-05-22 Thread Ana Marija

I resolved it not elegantly with:

d=as.numeric(as.character(e$pheno))
ed<-ggplot(e) +
  geom_bar(aes(x = ESRD, fill =
factor(pheno,labels=c("control","case"+scale_fill_manual(values=c("#56B4E9","#E7B800"))+labs(fill="pheno")+scale_x_continuous(breaks
= unique(d))

ed

where:

> head(e)
  ESRD pheno
11 1
21 1
31 2
41 1
51 1

> sapply(e,class)
 ESRD pheno
"integer"  "factor"

On Fri, May 22, 2020 at 1:52 PM Ana Marija  wrote:
>
> Hello,
>
> I made the plot in attach via:
>
> ed<-ggplot(e) +
>   geom_bar(aes(x = ESRD, fill =
> factor(pheno,labels=c("control","case"+scale_fill_manual(values=c("#56B4E9","#E7B800"))+labs(fill="pheno")
>
> ed
>
> How do I show only 1 and 2 on x axis?
>
> Thanks
> Ana

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] How to only show two numbers on bar_plot with ggplot

2020-05-22 Thread Ana Marija

Hello,

I made the plot in attach via:

ed<-ggplot(e) +
  geom_bar(aes(x = ESRD, fill =
factor(pheno,labels=c("control","case"+scale_fill_manual(values=c("#56B4E9","#E7B800"))+labs(fill="pheno")

ed

How do I show only 1 and 2 on x axis?

Thanks
Ana
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] how to show percentage of individuals for two groups on histogram?

2020-05-22 Thread Ana Marija

Hi Eric,

Thank you for getting back to me, I tried those solutions but they
don't do percentage per groups, so if I do
ggplot(data=subset(a, !is.na(pheno)), aes(x=HBA1C, fill=pheno)) +
geom_histogram(aes(y =

stat(density)), binwidth = 0.5) +
  scale_y_continuous(labels = scales::percent_format())

I am getting the plot in attach, while my results should be more in
this range like on the plot here:
https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/variable.cgi?study_id=phs18.v2.p1&phv=19980&phd=154&pha=2864&pht=62&phvf=&phdf=&phaf=&phtf=&dssp=1&consent=&temp=1


On Fri, May 22, 2020 at 12:18 AM Eric Berger  wrote:
>
> Hi Ana,
> This is a very common question about ggplot.
> A quick search turns up lots of hits that answer your question. Here
> are a couple
> https://community.rstudio.com/t/trouble-scaling-y-axis-to-percentages-from-counts/42999
> https://stackoverflow.com/questions/3695497/show-instead-of-counts-in-charts-of-categorical-variables
>
> From reading those discussions, the following should work (untested)
>
> ggplot(a, aes(x = HBA1C, fill=pheno)) + geom_histogram(aes(y =
> stat(density)), binwidth = 0.5) +
>   scale_y_continuous(labels = scales::percent_format())
>
> HTH,
> Eric
>
>
> On Fri, May 22, 2020 at 7:18 AM Jim Lemon  wrote:
> >
> > Hi Ana,
> > Just noticed a typo from a hasty cut-paste. Two lines should read:
> >
> > casehist<-table(cut(aafd$HBAIC[aafd$pheno=="case"],breaks=0:15))
> > controlhist<-table(cut(aafd$HBAIC[aafd$pheno=="control"],breaks=0:15))
> >
> > Jim
> >
> > On Fri, May 22, 2020 at 2:08 PM Jim Lemon  wrote:
> > >
> > > Hi Ana,
> > > My apologies for the pedestrian graphics, but it may help.
> > >
> > > # a bit of fake data
> > > aafd<-data.frame(FID=paste0("fam",1000:2739),
> > >  IID=paste0("G",1000,2739),FLASER=rep(1,1740),
> > >  PLASER=c(rep(1,892),rep(2,848)),
> > >  DIABDUR=sample(10:50,1740,TRUE),
> > >  HBAIC=rnorm(1740,mean=7.45,sd=2),ESRD=rep(1,1740),
> > >  pheno=c(rep("control",892),rep("case",848)))
> > > par(mfrow=c(2,1))
> > > casepct<-table(cut(aafd$HBAIC[aafd$pheno=="case"],breaks=0:15))
> > > controlpct<-table(cut(aafd$HBAIC[aafd$pheno=="control"],breaks=0:15))
> > > par(mar=c(0,4,1,2))
> > > barpos=barplot(100*casehist,names.arg=names(casepct),col="orange",
> > >  space=0,ylab="Percentage",xaxt="n",ylim=c(0,25))
> > > text(mean(barpos),23,
> > >  "Cases: n=848, nulls=26, median=7.3, mean=7.45, sd=1.96")
> > > box()
> > > par(mar=c(3,4,0,2))
> > > barplot(100*controlhist,names.arg=names(controlpct),
> > >  space=0,ylab="Percentage",col="orange",ylim=c(0,25))
> > > text(mean(barpos),23,
> > >  "Controls: n=892, nulls=7, median=7.3, mean=7.45, sd=1.12")
> > > box()
> > >
> > > Jim
> > >
> > > On Fri, May 22, 2020 at 9:08 AM Ana Marija  
> > > wrote:
> > > >
> > > > the result would basically look something like this on in attach or
> > > > the overlay of those two plots
> > > >
> > > >
> > > > On Thu, May 21, 2020 at 5:23 PM Ana Marija 
> > > >  wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > I have a data frame like this:
> > > > > > head(a)
> > > > >  FID   IID FLASER PLASER DIABDUR HBA1C ESRD   pheno
> > > > > 1 fam1000-03 G1000  1  1  38  10.21 control
> > > > > 2 fam1001-03 G1001  1  1  15   7.31 control
> > > > > 3 fam1003-03 G1003  1  2  17   7.01case
> > > > > 4 fam1005-03 G1005  1  1  36   7.71 control
> > > > > 5 fam1009-03 G1009  1  1  23   7.61 control
> > > > > 6 fam1052-03 G1052  1  1  32   7.31 control
> > > > >
> > > > > > dim(a)
> > > > > [1] 16988
> > > > >
> > > > > I am doing histogram plot via:
> > > > > ggplot(a, aes(x=HBA1C, fill=pheno)) + geom_histogram(binwidth=.5,
> > > > > position="dodge")
> > > > >
> > > > > there is 848 who have "case" in pheno column and 892 who have
> > > > > "control" in pheno column.
> > > > >
> > > > &g

Re: [R] how to show percentage of individuals for two groups on histogram?

2020-05-22 Thread Ana Marija

HI Jim,

Thank you so much for getting back to me I tried your codes and I got
this in attach,
I think the issue is in calculating percentage per groups (cases or controls)

par(mfrow=c(2,1))
casehist<-table(cut(a$HBA1C[a$pheno=="case"],breaks=0:15))
controlhist<-table(cut(a$HBA1C[a$pheno=="control"],breaks=0:15))

par(mar=c(0,4,1,2))
barpos=barplot(100*casehist,names.arg=names(casehist),col="orange",
   space=0,ylab="Percentage",xaxt="n",ylim=c(0,25))
text(mean(barpos),23,
 "Cases: n=848, nulls=26, median=7.3, mean=7.45, sd=1.96")
box()
par(mar=c(3,4,0,2))
barplot(100*controlhist,names.arg=names(controlhist),
space=0,ylab="Percentage",col="orange",ylim=c(0,25))
text(mean(barpos),23,
 "Controls: n=892, nulls=7, median=7.3, mean=7.45, sd=1.12")
box()

I can send you the whole dataset if you would like to try with it
On Thu, May 21, 2020 at 11:14 PM Jim Lemon  wrote:
>
> Hi Ana,
> Just noticed a typo from a hasty cut-paste. Two lines should read:
>
> casehist<-table(cut(aafd$HBAIC[aafd$pheno=="case"],breaks=0:15))
> controlhist<-table(cut(aafd$HBAIC[aafd$pheno=="control"],breaks=0:15))
>
> Jim
>
> On Fri, May 22, 2020 at 2:08 PM Jim Lemon  wrote:
> >
> > Hi Ana,
> > My apologies for the pedestrian graphics, but it may help.
> >
> > # a bit of fake data
> > aafd<-data.frame(FID=paste0("fam",1000:2739),
> >  IID=paste0("G",1000,2739),FLASER=rep(1,1740),
> >  PLASER=c(rep(1,892),rep(2,848)),
> >  DIABDUR=sample(10:50,1740,TRUE),
> >  HBAIC=rnorm(1740,mean=7.45,sd=2),ESRD=rep(1,1740),
> >  pheno=c(rep("control",892),rep("case",848)))
> > par(mfrow=c(2,1))
> > casepct<-table(cut(aafd$HBAIC[aafd$pheno=="case"],breaks=0:15))
> > controlpct<-table(cut(aafd$HBAIC[aafd$pheno=="control"],breaks=0:15))
> > par(mar=c(0,4,1,2))
> > barpos=barplot(100*casehist,names.arg=names(casepct),col="orange",
> >  space=0,ylab="Percentage",xaxt="n",ylim=c(0,25))
> > text(mean(barpos),23,
> >  "Cases: n=848, nulls=26, median=7.3, mean=7.45, sd=1.96")
> > box()
> > par(mar=c(3,4,0,2))
> > barplot(100*controlhist,names.arg=names(controlpct),
> >  space=0,ylab="Percentage",col="orange",ylim=c(0,25))
> > text(mean(barpos),23,
> >  "Controls: n=892, nulls=7, median=7.3, mean=7.45, sd=1.12")
> > box()
> >
> > Jim
> >
> > On Fri, May 22, 2020 at 9:08 AM Ana Marija  
> > wrote:
> > >
> > > the result would basically look something like this on in attach or
> > > the overlay of those two plots
> > >
> > >
> > > On Thu, May 21, 2020 at 5:23 PM Ana Marija  
> > > wrote:
> > > >
> > > > Hello,
> > > >
> > > > I have a data frame like this:
> > > > > head(a)
> > > >  FID   IID FLASER PLASER DIABDUR HBA1C ESRD   pheno
> > > > 1 fam1000-03 G1000  1  1  38  10.21 control
> > > > 2 fam1001-03 G1001  1  1  15   7.31 control
> > > > 3 fam1003-03 G1003  1  2  17   7.01case
> > > > 4 fam1005-03 G1005  1  1  36   7.71 control
> > > > 5 fam1009-03 G1009  1  1  23   7.61 control
> > > > 6 fam1052-03 G1052  1  1  32   7.31 control
> > > >
> > > > > dim(a)
> > > > [1] 16988
> > > >
> > > > I am doing histogram plot via:
> > > > ggplot(a, aes(x=HBA1C, fill=pheno)) + geom_histogram(binwidth=.5,
> > > > position="dodge")
> > > >
> > > > there is 848 who have "case" in pheno column and 892 who have
> > > > "control" in pheno column.
> > > >
> > > > I would like to have on y-axis shown percentage of individuals which
> > > > have either "case" or "control" in pheno instead of count.
> > > >
> > > > Please advise,
> > > > Ana
> > > __
> > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide 
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] how to show percentage of individuals for two groups on histogram?

2020-05-21 Thread Ana Marija

the result would basically look something like this on in attach or
the overlay of those two plots


On Thu, May 21, 2020 at 5:23 PM Ana Marija  wrote:
>
> Hello,
>
> I have a data frame like this:
> > head(a)
>  FID   IID FLASER PLASER DIABDUR HBA1C ESRD   pheno
> 1 fam1000-03 G1000  1  1  38  10.21 control
> 2 fam1001-03 G1001  1  1  15   7.31 control
> 3 fam1003-03 G1003  1  2  17   7.01case
> 4 fam1005-03 G1005  1  1  36   7.71 control
> 5 fam1009-03 G1009  1  1  23   7.61 control
> 6 fam1052-03 G1052  1  1  32   7.31 control
>
> > dim(a)
> [1] 16988
>
> I am doing histogram plot via:
> ggplot(a, aes(x=HBA1C, fill=pheno)) + geom_histogram(binwidth=.5,
> position="dodge")
>
> there is 848 who have "case" in pheno column and 892 who have
> "control" in pheno column.
>
> I would like to have on y-axis shown percentage of individuals which
> have either "case" or "control" in pheno instead of count.
>
> Please advise,
> Ana
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] how to show percentage of individuals for two groups on histogram?

2020-05-21 Thread Ana Marija

Hello,

I have a data frame like this:
> head(a)
 FID   IID FLASER PLASER DIABDUR HBA1C ESRD   pheno
1 fam1000-03 G1000  1  1  38  10.21 control
2 fam1001-03 G1001  1  1  15   7.31 control
3 fam1003-03 G1003  1  2  17   7.01case
4 fam1005-03 G1005  1  1  36   7.71 control
5 fam1009-03 G1009  1  1  23   7.61 control
6 fam1052-03 G1052  1  1  32   7.31 control

> dim(a)
[1] 16988

I am doing histogram plot via:
ggplot(a, aes(x=HBA1C, fill=pheno)) + geom_histogram(binwidth=.5,
position="dodge")

there is 848 who have "case" in pheno column and 892 who have
"control" in pheno column.

I would like to have on y-axis shown percentage of individuals which
have either "case" or "control" in pheno instead of count.

Please advise,
Ana

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] text annotation on Manhattn plot in qqman

2020-05-20 Thread Ana Marija

HI Michael,

Thank you so much!
That worked!!! Now I am just trying to increase the size of text of
SNP and GENE on plot

I tried this:

a$newname <- paste(a$SNP,"\n", a$GENE)
manhattan(a, chr="CHR", bp="BP", snp="newname", p="P",cex =
0.5,annotatePval = 0.0001)

but I am getting this error:

Error in textxy(topSNPs$pos, -log10(topSNPs$P), offset = 0.625, labs =
topSNPs$SNP,  :
  formal argument "cex" matched by multiple actual arguments

Do you by any chance know how to do this?

Cheers
Ana

On Wed, May 20, 2020 at 4:12 AM Michael Dewey  wrote:
>
> a$newname <- paste(a$SNP, a$GENE)
> manhattan(a, chr="CHR", bp="BP", snp="newname", p="P",annotatePval = 0.0001)
>
> However note that I do not use manhattan() so you may need to alter the
> parameters as I am assuming a is where it finds the remaining parameters.
>
> You may also need to play with the sep =, and collapse = parameters to
> paste() to get the precise layout you want.
>
> Michael
>
> On 19/05/2020 17:21, Ana Marija wrote:
> > Hi Michael,
> >
> > can you please send me code how that would be done?
> >
> > Thanks
> > Ana
> >
> > On Tue, May 19, 2020 at 11:18 AM Michael Dewey  
> > wrote:
> >>
> >> Dear Ana
> >>
> >> Perhaps paste together SNP and GENE using paste() and then supply that
> >> as the snp parameter.
> >>
> >> Michael
> >>
> >> On 19/05/2020 17:12, Ana Marija wrote:
> >>> Hello,
> >>>
> >>> I am making manhattan plot with:
> >>> library(qqman)
> >>> manhattan(a, chr="CHR", bp="BP", snp="SNP", p="P",annotatePval = 0.0001)
> >>>
> >>> and I would like to annotate these two SNPs which are above the
> >>> threshold so that they have GENE name beside them:
> >>>
> >>>> a[a$SNP=="rs4081570",]
> >>>   SNPP CHR   BP GENE
> >>> 1 rs4081570 6.564447e-05  19 15918647 UCA1
> >>>> a[a$SNP=="rs11867934",]
> >>>   SNPP CHR   BP GENE
> >>> 1021 rs11867934 6.738066e-06  17 16933404 FLCN
> >>>
> >>> Right now my plot only has SNP name for those 2, how to add GENE names
> >>> (FLCN and UCA1 as well)
> >>>
> >>> Please advise
> >>> Ana
> >>>
> >>>
> >>>
> >>> __
> >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide 
> >>> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>>
> >>
> >> --
> >> Michael
> >> http://www.dewey.myzen.co.uk/home.html
> >
>
> --
> Michael
> http://www.dewey.myzen.co.uk/home.html

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] text annotation on Manhattn plot in qqman

2020-05-19 Thread Ana Marija

Hi Michael,

can you please send me code how that would be done?

Thanks
Ana

On Tue, May 19, 2020 at 11:18 AM Michael Dewey  wrote:
>
> Dear Ana
>
> Perhaps paste together SNP and GENE using paste() and then supply that
> as the snp parameter.
>
> Michael
>
> On 19/05/2020 17:12, Ana Marija wrote:
> > Hello,
> >
> > I am making manhattan plot with:
> > library(qqman)
> > manhattan(a, chr="CHR", bp="BP", snp="SNP", p="P",annotatePval = 0.0001)
> >
> > and I would like to annotate these two SNPs which are above the
> > threshold so that they have GENE name beside them:
> >
> >> a[a$SNP=="rs4081570",]
> >  SNPP CHR   BP GENE
> > 1 rs4081570 6.564447e-05  19 15918647 UCA1
> >> a[a$SNP=="rs11867934",]
> >  SNPP CHR   BP GENE
> > 1021 rs11867934 6.738066e-06  17 16933404 FLCN
> >
> > Right now my plot only has SNP name for those 2, how to add GENE names
> > (FLCN and UCA1 as well)
> >
> > Please advise
> > Ana
> >
> >
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> --
> Michael
> http://www.dewey.myzen.co.uk/home.html

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] text annotation on Manhattn plot in qqman

2020-05-19 Thread Ana Marija

Hello,

I am making manhattan plot with:
library(qqman)
manhattan(a, chr="CHR", bp="BP", snp="SNP", p="P",annotatePval = 0.0001)

and I would like to annotate these two SNPs which are above the
threshold so that they have GENE name beside them:

> a[a$SNP=="rs4081570",]
SNPP CHR   BP GENE
1 rs4081570 6.564447e-05  19 15918647 UCA1
> a[a$SNP=="rs11867934",]
SNPP CHR   BP GENE
1021 rs11867934 6.738066e-06  17 16933404 FLCN

Right now my plot only has SNP name for those 2, how to add GENE names
(FLCN and UCA1 as well)

Please advise
Ana
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] how to extract strings in any column and in any row that start with

2020-05-15 Thread Ana Marija

Hi Rui,

thank you so much that is exactly what I needed!

Cheers,
Ana

On Fri, May 15, 2020 at 5:12 PM Rui Barradas  wrote:
>
> Hello,
>
> I have tried several options and with large dataframes this one was the
> fastest (in my tests, of the ones I have tried).
>
>
> s1 <- sapply(tot, function(x) grep('^E10', x, value = TRUE))
>
>
> Then unlist(s1).
> A close second (15% slower) was
>
>
> s2 <- tot[sapply(tot, function(x) grepl('^E10', x))]
>
>
> grep/unlist was 3.7 times slower:
>
>
> grep("^E10", unlist(tot), value = TRUE)
>
>
> Hope this helps,
>
> Rui Barradas
>
> Às 20:24 de 15/05/20, Ana Marija escreveu:
> > Hello,
> >
> > this command was running for more than 2 hours
> > grep("E10",tot,value=T)
> > and no output
> >
> > and this command
> > df1 <- tot %>% filter_all(any_vars(grepl( '^E10', .)))
> >
> > gave me a subset (a data frame) of tot where ^E10
> >
> > what I need is just a vector or all values in tot which start with E10.
> >
> > Thanks
> > Ana
> >
> > On Fri, May 15, 2020 at 12:13 PM Jeff Newmiller
> >  wrote:
> >>
> >> Read about regular expressions... they are extremely useful.
> >>
> >> df1 <- tot %>% filter_all(any_vars(grepl( '^E10', .)))
> >>
> >> It is bad form not to put spaces around the <- assignment.
> >>
> >>
> >> On May 15, 2020 10:00:04 AM PDT, Ana Marija  
> >> wrote:
> >>> Hello,
> >>>
> >>> I have a data frame:
> >>>
> >>>> dim(tot)
> >>> [1] 502536   1093
> >>>
> >>> How would I extract from it all strings that start with E10?
> >>>
> >>> I know how to extract all rows that contain with E10
> >>> df0<-tot %>% filter_all(any_vars(. %in% c('E10')))
> >>>> dim(df0)
> >>> [1] 5105 1093
> >>>
> >>> but I just need a vector of strings that start with E10...
> >>> it would look something like this:
> >>>
> >>> [1] "E102" "E109" "E108" "E103" "E104" "E105" "E101" "E106" "E107"
> >>>
> >>> Thanks
> >>> Ana
> >>>
> >>> __
> >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide
> >>> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>
> >> --
> >> Sent from my phone. Please excuse my brevity.
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] how to extract strings in any column and in any row that start with

2020-05-15 Thread Ana Marija

Hello,

this command was running for more than 2 hours
grep("E10",tot,value=T)
and no output

and this command
df1 <- tot %>% filter_all(any_vars(grepl( '^E10', .)))

gave me a subset (a data frame) of tot where ^E10

what I need is just a vector or all values in tot which start with E10.

Thanks
Ana

On Fri, May 15, 2020 at 12:13 PM Jeff Newmiller
 wrote:
>
> Read about regular expressions... they are extremely useful.
>
> df1 <- tot %>% filter_all(any_vars(grepl( '^E10', .)))
>
> It is bad form not to put spaces around the <- assignment.
>
>
> On May 15, 2020 10:00:04 AM PDT, Ana Marija  
> wrote:
> >Hello,
> >
> >I have a data frame:
> >
> >> dim(tot)
> >[1] 502536   1093
> >
> >How would I extract from it all strings that start with E10?
> >
> >I know how to extract all rows that contain with E10
> >df0<-tot %>% filter_all(any_vars(. %in% c('E10')))
> >> dim(df0)
> >[1] 5105 1093
> >
> >but I just need a vector of strings that start with E10...
> >it would look something like this:
> >
> >[1] "E102" "E109" "E108" "E103" "E104" "E105" "E101" "E106" "E107"
> >
> >Thanks
> >Ana
> >
> >__
> >R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >https://stat.ethz.ch/mailman/listinfo/r-help
> >PLEASE do read the posting guide
> >http://www.R-project.org/posting-guide.html
> >and provide commented, minimal, self-contained, reproducible code.
>
> --
> Sent from my phone. Please excuse my brevity.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] how to extract strings in any column and in any row that start with

2020-05-15 Thread Ana Marija

Hello,

I have a data frame:

> dim(tot)
[1] 502536   1093

How would I extract from it all strings that start with E10?

I know how to extract all rows that contain with E10
df0<-tot %>% filter_all(any_vars(. %in% c('E10')))
> dim(df0)
[1] 5105 1093

but I just need a vector of strings that start with E10...
it would look something like this:

[1] "E102" "E109" "E108" "E103" "E104" "E105" "E101" "E106" "E107"

Thanks
Ana

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Error: Cannot use `+.gg()` with a single argument.

2020-05-07 Thread Ana Marija

Hello,

I got this error:
Error: Cannot use `+.gg()` with a single argument. Did you
accidentally put + on a new line?

After running this:
data(murders)
library(ggplot2)
library(dplyr)
library(ggplot2)
ggplot(data=murders)

#define the slope of the line
r<-murders %>% summarize(rate=sum(total)/sum(population)*10^6) %>%.$rate
#mamke the plot
murders %>% ggplot(aes(population/10^6,total,label=abb))+
  +geom_abline(intercept = log10(r),lty=2,color="darkgrey")+
  +geom_point(aes(col=region), size=3)+
  +geom_text_repel()+
  +scale_x_log10()+
  +scale_y_log10()+
  +xlab("Populations in millions (log scale)")+
  +ylab("Total number of murders (log scale)")+
  +ggtitle("US Gun Murders in US 2010")+
  +scale_color_discrete(name="Region")+
  +theme_economist()

Is this an issue with my dplyr? Or how I can fix this code in order to work?

Thanks
Ana

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] calculating t-score/t-stats as my zscores

2020-05-06 Thread Ana Marija

Thank you so much! I mostly worry which of those procedures is the
closest to the z-score

On Wed, May 6, 2020 at 10:05 AM Rui Barradas  wrote:
>
> Hello,
>
> Another option is to use stats::t.test
>
> t.test(x)$statistic
>
> Or, if you want to test against a beta0 != 0,
>
> t.test(x, mu = beta0)$statistic
>
>
> But in this case the estimator is the estimator for the mean value.
>
> Hope this helps,
>
> Rui Barradas
>
> Às 15:54 de 06/05/20, Ana Marija escreveu:
> > Thanks Patrick, so in conclusion this is fine?
> > z-score=Beta/StdErr
> >
> > On Wed, May 6, 2020 at 9:52 AM Patrick (Malone Quantitative)
> >  wrote:
> >>
> >> Guessing for Ana, but no, that's a different meaning. Beta/StdErr is a
> >> z statistic--a test statistic against (usually) the tails of the unit
> >> normal distribution. So like a t-test with infinite df.
> >>
> >>
> >> On Wed, May 6, 2020 at 10:41 AM Rui Barradas  wrote:
> >>>
> >>> Hello,
> >>>
> >>> By z-scores do you mean function help('scale')?
> >>>
> >>> Hope this helps,
> >>>
> >>> Rui Barradas
> >>>
> >>> Às 15:31 de 06/05/20, Ana Marija escreveu:
> >>>> Hi Rui,
> >>>>
> >>>> Thank you for getting back to me. Is there is a better way to
> >>>> calculate Z scores if I have p values, SE and Beta?
> >>>>
> >>>> Thanks
> >>>> Ana
> >>>>
> >>>> On Wed, May 6, 2020 at 9:27 AM Rui Barradas  wrote:
> >>>>>
> >>>>> Hello,
> >>>>>
> >>>>> That gives the *absolute* t-scores. If it's all you need/want, then the
> >>>>> answer is yes, you can.
> >>>>>
> >>>>> Hope this helps,
> >>>>>
> >>>>> Rui Barradas
> >>>>>
> >>>>> Às 14:28 de 06/05/20, Ana Marija escreveu:
> >>>>>> Hello,
> >>>>>>
> >>>>>> Can I apply the quantile function qt() this way?
> >>>>>> qt(pvals/2, 406-34, lower.tail = F)
> >>>>>> to get the T-scores?
> >>>>>>
> >>>>>> Thanks
> >>>>>> Ama
> >>>>>>
> >>>>>> __
> >>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>>> PLEASE do read the posting guide 
> >>>>>> http://www.R-project.org/posting-guide.html
> >>>>>> and provide commented, minimal, self-contained, reproducible code.
> >>>>>>
> >>>
> >>> __
> >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide 
> >>> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>
> >>
> >>
> >> --
> >> Patrick S. Malone, Ph.D., Malone Quantitative
> >> NEW Service Models: http://malonequantitative.com
> >>
> >> He/Him/His

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] calculating t-score/t-stats as my zscores

2020-05-06 Thread Ana Marija

Thanks Patrick, so in conclusion this is fine?
z-score=Beta/StdErr

On Wed, May 6, 2020 at 9:52 AM Patrick (Malone Quantitative)
 wrote:
>
> Guessing for Ana, but no, that's a different meaning. Beta/StdErr is a
> z statistic--a test statistic against (usually) the tails of the unit
> normal distribution. So like a t-test with infinite df.
>
>
> On Wed, May 6, 2020 at 10:41 AM Rui Barradas  wrote:
> >
> > Hello,
> >
> > By z-scores do you mean function help('scale')?
> >
> > Hope this helps,
> >
> > Rui Barradas
> >
> > Às 15:31 de 06/05/20, Ana Marija escreveu:
> > > Hi Rui,
> > >
> > > Thank you for getting back to me. Is there is a better way to
> > > calculate Z scores if I have p values, SE and Beta?
> > >
> > > Thanks
> > > Ana
> > >
> > > On Wed, May 6, 2020 at 9:27 AM Rui Barradas  wrote:
> > >>
> > >> Hello,
> > >>
> > >> That gives the *absolute* t-scores. If it's all you need/want, then the
> > >> answer is yes, you can.
> > >>
> > >> Hope this helps,
> > >>
> > >> Rui Barradas
> > >>
> > >> Às 14:28 de 06/05/20, Ana Marija escreveu:
> > >>> Hello,
> > >>>
> > >>> Can I apply the quantile function qt() this way?
> > >>> qt(pvals/2, 406-34, lower.tail = F)
> > >>> to get the T-scores?
> > >>>
> > >>> Thanks
> > >>> Ama
> > >>>
> > >>> __
> > >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > >>> https://stat.ethz.ch/mailman/listinfo/r-help
> > >>> PLEASE do read the posting guide 
> > >>> http://www.R-project.org/posting-guide.html
> > >>> and provide commented, minimal, self-contained, reproducible code.
> > >>>
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Patrick S. Malone, Ph.D., Malone Quantitative
> NEW Service Models: http://malonequantitative.com
>
> He/Him/His

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] calculating t-score/t-stats as my zscores

2020-05-06 Thread Ana Marija

as defined here:
https://huwenboshi.github.io/data%20management/2017/11/23/tips-for-formatting-gwas-summary-stats.html

where Effect size is Beta

On Wed, May 6, 2020 at 9:41 AM Rui Barradas  wrote:
>
> Hello,
>
> By z-scores do you mean function help('scale')?
>
> Hope this helps,
>
> Rui Barradas
>
> Às 15:31 de 06/05/20, Ana Marija escreveu:
> > Hi Rui,
> >
> > Thank you for getting back to me. Is there is a better way to
> > calculate Z scores if I have p values, SE and Beta?
> >
> > Thanks
> > Ana
> >
> > On Wed, May 6, 2020 at 9:27 AM Rui Barradas  wrote:
> >>
> >> Hello,
> >>
> >> That gives the *absolute* t-scores. If it's all you need/want, then the
> >> answer is yes, you can.
> >>
> >> Hope this helps,
> >>
> >> Rui Barradas
> >>
> >> Às 14:28 de 06/05/20, Ana Marija escreveu:
> >>> Hello,
> >>>
> >>> Can I apply the quantile function qt() this way?
> >>> qt(pvals/2, 406-34, lower.tail = F)
> >>> to get the T-scores?
> >>>
> >>> Thanks
> >>> Ama
> >>>
> >>> __
> >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide 
> >>> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>>

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

1 2 3 4 >

1 - 100 of 342 matches

Mail list logo