[R] R: estimating genotyping error rate

2019-01-17 Thread N Meriam
Hello,
I have SNP data from genotyping.
I would like to estimate the error rate between replicated samples using R.
How can I proceed?

Thanks
Meriam
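
One simple way to approach this, sketched under stated assumptions: if the genotypes are coded 0/1/2 with NA for missing calls, and each replicate of a sample is a column of a SNP-by-sample matrix, the error rate for a replicate pair can be estimated as the proportion of discordant calls among the SNPs called in both replicates. The matrix `geno`, its column names, and the helper `replicate_error_rate()` are hypothetical and only serve the illustration.

```r
# Hypothetical genotype matrix: rows = SNPs, columns = samples,
# coded 0/1/2 with NA for missing calls.
geno <- cbind(
  S1_rep1 = c(0, 1, 2, 0, NA, 2),
  S1_rep2 = c(0, 1, 2, 1, 2, 2),
  S2_rep1 = c(2, 2, 0, 0, 1, 1),
  S2_rep2 = c(2, 2, 0, 0, 1, NA)
)

# Proportion of discordant calls between two replicate columns,
# using only the SNPs that were called in both replicates.
replicate_error_rate <- function(g, a, b) {
  both <- !is.na(g[, a]) & !is.na(g[, b])
  sum(g[both, a] != g[both, b]) / sum(both)
}

# One entry per replicated sample (hypothetical pairing).
pairs <- list(S1 = c("S1_rep1", "S1_rep2"), S2 = c("S2_rep1", "S2_rep2"))
rates <- vapply(pairs, function(p) replicate_error_rate(geno, p[1], p[2]),
                numeric(1))
rates
```

Averaging `rates` would give an overall estimate. Whether missing calls should count as errors is a design choice; here they are excluded.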

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R help: ‘fviz_nbclust’ is not available (for R version 3.5.2)

2019-01-16 Thread N Meriam
Thanks for your valuable clarifications.
I tried all the steps again, but the problem remains.
"fviz_nbclust" is indeed a function in the package "factoextra", and I
ran each step very carefully. It doesn't make sense, because I have
installed factoextra.

This error appears:
could not find function "fviz_nbclust"

On Wed, Jan 16, 2019 at 1:22 PM Jeff Newmiller  wrote:
>
> Concept 1: You don't install functions... you install packages that have 
> functions in them. There is a function fviz_nbclust in factoextra.
>
> Concept 2: Once a package is installed, you do NOT have to install it again, 
> e.g. every time you want to do that analysis. Making the installation part of 
> your script is not advised.
>
> Concept 3: Typically we do use the library function with a package name at 
> the beginning of every session where we want to use functions from that 
> package. However, that is optional... you could also just invoke the function 
> directly using factoextra::fviz_nbclust(...blahblah...). Having the library 
> function shortens this and if the package is not installed it provides a 
> clear error message that can be a reminder to the user to install the package.
>
> Execute your code line by line and solve the first error you encounter by 
> examining the error message and reviewing what that line of code is designed 
> to do.
>
> On January 16, 2019 11:00:07 AM PST, N Meriam  wrote:
> >Hello,
> >I'm struggling to install a function called "fviz_nbclust".
> >
> >My code is the following:
> >pkgs <- c("factoextra",  "NbClust")
> >install.packages(pkgs)
> >library(factoextra)
> >library(NbClust)
> ># Standardize the data
> >load("df4.rda")
> >library(FunCluster)
> >
> >install.packages("fviz_nbclust")
> >#fviz_nbclust(df4, FUNcluster, method = c("silhouette", "wss",
> >"gap_stat"))
> >
> >Installing package into ‘C:/Users/DELL/Documents/R/win-library/3.5’
> >(as ‘lib’ is unspecified)
> >Warning in install.packages :
> >  package ‘fviz_nbclust’ is not available (for R version 3.5.2)
> >
> >Best,
> >Meriam
> >
>
> --
> Sent from my phone. Please excuse my brevity.
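
The three concepts quoted above can be sketched in a few lines. This is an illustration under assumptions: it uses the built-in USArrests data and kmeans as the clustering function rather than the df4.rda file from the thread, and it only calls the function if factoextra is already installed.

```r
# Install once, interactively, not as part of the analysis script:
# install.packages("factoextra")

# fviz_nbclust is a function inside the factoextra package, so it is
# the package that gets installed, never the function itself.
have_factoextra <- requireNamespace("factoextra", quietly = TRUE)

df <- scale(USArrests)   # stand-in data, standardized

if (have_factoextra) {
  # pkg::fun() works without library(); library(factoextra) at the top
  # of the session would let you write fviz_nbclust(...) directly.
  p <- factoextra::fviz_nbclust(df, kmeans, method = "silhouette")
  print(p)
} else {
  message("factoextra is not installed; run install.packages('factoextra') once")
}
```

If the namespaced call works while the bare fviz_nbclust(...) call does not, the library(factoextra) line most likely did not run successfully in that session.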



-- 
Meriam Nefzaoui
MSc. in Plant Breeding and Genetics
Universidade Federal Rural de Pernambuco (UFRPE) - Recife, Brazil



[R] R help: ‘fviz_nbclust’ is not available (for R version 3.5.2)

2019-01-16 Thread N Meriam
Hello,
I'm struggling to install a function called "fviz_nbclust".

My code is the following:
pkgs <- c("factoextra",  "NbClust")
install.packages(pkgs)
library(factoextra)
library(NbClust)
# Standardize the data
load("df4.rda")
library(FunCluster)

install.packages("fviz_nbclust")
#fviz_nbclust(df4, FUNcluster, method = c("silhouette", "wss", "gap_stat"))

Installing package into ‘C:/Users/DELL/Documents/R/win-library/3.5’
(as ‘lib’ is unspecified)
Warning in install.packages :
  package ‘fviz_nbclust’ is not available (for R version 3.5.2)

Best,
Meriam



Re: [R] Fwd: Overlapping legend in a circular dendrogram

2019-01-11 Thread N Meriam
Yes, I know. Sorry for reposting. I received an email saying that the
file was too big, so I modified my question and posted it again.
I don't want to oblige anyone to respond. I really thought the issue
was my file (too big, so nobody received it).

Thanks for your understanding,
Best Myriam

On Fri, Jan 11, 2019 at 3:03 PM Bert Gunter  wrote:
>
> This is the 3rd time you've posted this. Please stop re-posting!
>
> Your question is specialized and involved, and you have failed to provide a 
> reproducible example/data. We are not obliged to respond.
>
> You may do better contacting the maintainer, found by ?maintainer, as 
> recommended by the posting guide for specialized queries such as this.
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and 
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Fri, Jan 11, 2019 at 12:47 PM N Meriam  wrote:
>>
>> Hi, I'm facing some issues when generating a circular dendrogram.
>> The labels on the left which are my countries are overlapping with the
>> circular dendrogram (middle). Same happens with the labels (regions)
>> located on the right.
>> I run the following code and I'd like to know what should be changed
>> in my code in order to avoid that.
>>
>> load("hc1.rda")
>> library(cluster)
>> library(ape)
>> library(dendextend)
>> library(circlize)
>> library(RColorBrewer)
>>
>> labels = hc1$labels
>> n = length(labels)
>> dend = as.dendrogram(hc1)
>> markcountry=as.data.frame(markcountry1)
>> #Country colors
>> groupCodes=as.character(as.factor(markcountry[,2]))
>> colorCodes=rainbow(length(unique(groupCodes))) #c("blue","red")
>> names(colorCodes)=unique(groupCodes)
>> labels_colors(dend) <- colorCodes[groupCodes][order.dendrogram(dend)]
>>
>> #Region colors
>> groupCodesR=as.character(as.factor(markcountry[,3]))
>> colorCodesR=rainbow(length(unique(groupCodesR))) #c("blue","red")
>> names(colorCodesR)=unique(groupCodesR)
>>
>> circos.par(cell.padding = c(0, 0, 0, 0))
>> circos.initialize(factors = "foo", xlim = c(1, n)) # only one sector
>> max_height = attr(dend, "height")  # maximum height of the trees
>>
>> #Region graphics
>> circos.trackPlotRegion(ylim = c(0, 1.5), panel.fun = function(x, y) {
>>   circos.rect(1:361-0.5, rep(0.5, 361), 1:361-0.1, rep(0.8,361), col =
>> colorCodesR[groupCodesR][order.dendrogram(dend)], border = NA)
>> }, bg.border = NA)
>>
>> #labels graphics
>> circos.trackPlotRegion(ylim = c(0, 0.5), bg.border = NA,
>>    panel.fun = function(x, y) {
>>      circos.text(1:361-0.5, rep(0.5,361), labels(dend), adj = c(0, 0.5),
>>                  facing = "clockwise", niceFacing = TRUE,
>>                  col = labels_colors(dend), cex = 0.45)
>>    })
>> dend = color_branches(dend, k = 6, col = 1:6)
>>
>> #Dendrogram graphics
>> circos.trackPlotRegion(ylim = c(0, max_height), bg.border = NA,
>>    track.height = 0.4, panel.fun = function(x, y) {
>>      circos.dendrogram(dend, max_height = 0.55)
>>    })
>> legend("left",names(colorCodes),col=colorCodes,text.col=colorCodes,bty="n",pch=15,cex=0.8)
>> legend("right",names(colorCodesR),col=colorCodesR,text.col=colorCodesR,bty="n",pch=15,cex=0.35)
>>
>> Cheers,
>> Myriam
>>



-- 
Meriam Nefzaoui
MSc. in Plant Breeding and Genetics
Universidade Federal Rural de Pernambuco (UFRPE) - Recife, Brazil



[R] Fwd: Overlapping legend in a circular dendrogram

2019-01-11 Thread N Meriam
Hi, I'm facing some issues when generating a circular dendrogram.
The labels on the left which are my countries are overlapping with the
circular dendrogram (middle). Same happens with the labels (regions)
located on the right.
I run the following code and I'd like to know what should be changed
in my code in order to avoid that.

load("hc1.rda")
library(cluster)
library(ape)
library(dendextend)
library(circlize)
library(RColorBrewer)

labels = hc1$labels
n = length(labels)
dend = as.dendrogram(hc1)
markcountry=as.data.frame(markcountry1)
#Country colors
groupCodes=as.character(as.factor(markcountry[,2]))
colorCodes=rainbow(length(unique(groupCodes))) #c("blue","red")
names(colorCodes)=unique(groupCodes)
labels_colors(dend) <- colorCodes[groupCodes][order.dendrogram(dend)]

#Region colors
groupCodesR=as.character(as.factor(markcountry[,3]))
colorCodesR=rainbow(length(unique(groupCodesR))) #c("blue","red")
names(colorCodesR)=unique(groupCodesR)

circos.par(cell.padding = c(0, 0, 0, 0))
circos.initialize(factors = "foo", xlim = c(1, n)) # only one sector
max_height = attr(dend, "height")  # maximum height of the trees

#Region graphics
circos.trackPlotRegion(ylim = c(0, 1.5), panel.fun = function(x, y) {
  circos.rect(1:361-0.5, rep(0.5, 361), 1:361-0.1, rep(0.8,361), col =
colorCodesR[groupCodesR][order.dendrogram(dend)], border = NA)
}, bg.border = NA)

#labels graphics
circos.trackPlotRegion(ylim = c(0, 0.5), bg.border = NA,
   panel.fun = function(x, y) {

   circos.text(1:361-0.5,
rep(0.5,361),labels(dend), adj = c(0, 0.5),
   facing = "clockwise", niceFacing = TRUE,
   col = labels_colors(dend), cex = 0.45)

   })
dend = color_branches(dend, k = 6, col = 1:6)

#Dendrogram graphics
circos.trackPlotRegion(ylim = c(0, max_height), bg.border = NA,
   track.height = 0.4, panel.fun = function(x, y) {
 circos.dendrogram(dend, max_height = 0.55)
   })
legend("left",names(colorCodes),col=colorCodes,text.col=colorCodes,bty="n",pch=15,cex=0.8)
legend("right",names(colorCodesR),col=colorCodesR,text.col=colorCodesR,bty="n",pch=15,cex=0.35)

Cheers,
Myriam
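
One thing that may help with the overlap, offered as a sketch rather than a confirmed fix: circlize draws on a canvas that defaults to c(-1, 1) in both directions, so widening canvas.xlim through circos.par() shrinks the circle relative to the plot and leaves the side margins free for legend(). The example below uses USArrests as stand-in data, made-up legend entries, and an assumed canvas width of c(-1.6, 1.6) that would need tuning by eye. It also uses n instead of a hard-coded 361 so the coordinates always match the number of leaves.

```r
# Stand-in data: USArrests replaces the hc1.rda object from the post.
hc         <- hclust(dist(scale(USArrests)))
dend       <- as.dendrogram(hc)
n          <- length(hc$labels)        # number of leaves
max_height <- attr(dend, "height")     # maximum height of the tree

if (requireNamespace("circlize", quietly = TRUE)) {
  library(circlize)
  circos.clear()
  # Assumption: a wider canvas than the default c(-1, 1) leaves room
  # on the left and right for the legends.
  circos.par(cell.padding = c(0, 0, 0, 0), canvas.xlim = c(-1.6, 1.6))
  circos.initialize(factors = "foo", xlim = c(0, n))  # only one sector

  # Label track: leaf i sits at x = i - 0.5.
  circos.trackPlotRegion(ylim = c(0, 1), bg.border = NA, track.height = 0.2,
    panel.fun = function(x, y) {
      circos.text(seq_len(n) - 0.5, rep(0, n), labels(dend),
                  adj = c(0, 0.5), facing = "clockwise",
                  niceFacing = TRUE, cex = 0.45)
    })

  # Dendrogram track.
  circos.trackPlotRegion(ylim = c(0, max_height), bg.border = NA,
    track.height = 0.4, panel.fun = function(x, y) {
      circos.dendrogram(dend, max_height = max_height)
    })

  # Made-up legend entries, only to show the placement.
  legend("left", legend = c("group A", "group B"), col = c("blue", "red"),
         pch = 15, bty = "n", cex = 0.8)
  circos.clear()
}
```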



[R] Overlapping legend in a circular dendrogram

2019-01-11 Thread N Meriam
Dear all,

I run the following code and I get this graphic (image attached). What
should I change in my code in order to fix the overlapping objects?

load("hc1.rda")
library(cluster)
library(ape)
library(dendextend)
library(circlize)
library(RColorBrewer)

labels = hc1$labels
n = length(labels)
dend = as.dendrogram(hc1)
markcountry=as.data.frame(markcountry1)
#Country colors
groupCodes=as.character(as.factor(markcountry[,2]))
colorCodes=rainbow(length(unique(groupCodes))) #c("blue","red")
names(colorCodes)=unique(groupCodes)
labels_colors(dend) <- colorCodes[groupCodes][order.dendrogram(dend)]

#Region colors
groupCodesR=as.character(as.factor(markcountry[,3]))
colorCodesR=rainbow(length(unique(groupCodesR))) #c("blue","red")
names(colorCodesR)=unique(groupCodesR)

circos.par(cell.padding = c(0, 0, 0, 0))
circos.initialize(factors = "foo", xlim = c(1, n)) # only one sector
max_height = attr(dend, "height")  # maximum height of the trees

#Region graphics
circos.trackPlotRegion(ylim = c(0, 1.5), panel.fun = function(x, y) {
  circos.rect(1:361-0.5, rep(0.5, 361), 1:361-0.1, rep(0.8,361), col =
colorCodesR[groupCodesR][order.dendrogram(dend)], border = NA)
}, bg.border = NA)

#labels graphics
circos.trackPlotRegion(ylim = c(0, 0.5), bg.border = NA,
   panel.fun = function(x, y) {
     circos.text(1:361-0.5, rep(0.5,361), labels(dend), adj = c(0, 0.5),
                 facing = "clockwise", niceFacing = TRUE,
                 col = labels_colors(dend), cex = 0.45)
   })
dend = color_branches(dend, k = 6, col = 1:6)

#Dendrogram graphics
circos.trackPlotRegion(ylim = c(0, max_height), bg.border = NA,
   track.height = 0.4, panel.fun = function(x, y) {
 circos.dendrogram(dend, max_height = 0.55)
   })
legend("left",names(colorCodes),col=colorCodes,text.col=colorCodes,bty="n",pch=15,cex=0.8)
legend("right",names(colorCodesR),col=colorCodesR,text.col=colorCodesR,bty="n",pch=15,cex=0.35)

Thanks,
Meriam


[R] R help: circular dendrogram

2019-01-08 Thread N Meriam
Dear all,

I generated a circular dendrogram with R (see attached). I have a
total of 360 landraces.
What I want to do next is generate a different color for each cluster
and also generate colors to show the country/region.
I don't know if it's also possible to put a code number (associated
with each landrace) in front of each ramification.
I want to have an explicit dendrogram.


Rplot01.pdf
Description: Adobe PDF document


Re: [R] Warning message: NAs introduced by coercion

2019-01-08 Thread N Meriam
Yes, sorry. I attached the file once again.
Well, still getting the same warning.

> class(genod) <- "numeric"
Warning message:
In class(genod) <- "numeric" : NAs introduced by coercion
> class(genod)
[1] "matrix"

Then, I run the following code and it gives this:

> filn <-"simTunesian.gds"
> snpgdsCreateGeno(filn, genmat = genod,
+  sample.id = sample.id, snp.id = snp.id,
+  snp.chromosome = snp.chromosome,
+  snp.position = snp.position,
+  snp.allele = snp.allele, snpfirstdim=TRUE)
> # calculate similarity matrix
> # Open the GDS file
> (genofile <- snpgdsOpen(filn))
File: C:\Users\DELL\Documents\TEST\simTunesian.gds (1.4M)
+[  ] *
|--+ sample.id   { Str8 363 ZIP_ra(42.5%), 755B }
|--+ snp.id   { Int32 15752 ZIP_ra(35.1%), 21.6K }
|--+ snp.position   { Int32 15752 ZIP_ra(34.7%), 21.3K }
|--+ snp.chromosome   { Float64 15752 ZIP_ra(0.18%), 230B }
|--+ snp.allele   { Str8 15752 ZIP_ra(0.16%), 108B }
\--+ genotype   { Bit2 15752x363, 1.4M } *
> ibs <- snpgdsIBS(genofile, remove.monosnp = FALSE, num.thread=1)
Identity-By-State (IBS) analysis on genotypes:
Excluding 0 SNP on non-autosomes
Working space: 363 samples, 15,752 SNPs
using 1 (CPU) core
IBS:    the sum of all selected genotypes (0,1,2) = 3658952
Tue Jan 08 15:38:00 2019    (internal increment: 42880)
[==] 100%, completed in 0s
Tue Jan 08 15:38:00 2019    Done.
> # maximum similarity value
> max(ibs$ibs)
[1] NaN
> # minimum similarity value
> min(ibs$ibs)
[1] NaN

As you can see, I can't continue my analysis (heat map plot,
clustering with hclust) because values are NaN.


On Tue, Jan 8, 2019 at 2:01 PM David L Carlson  wrote:
>
> Your attached file is not a .csv file since the fields are not separated by 
> commas (just rename the mydata.csv to mydata.txt).
>
> The command "genod2 <- as.matrix(genod)" created a character matrix from the 
> data frame genod.  When you try to force genod2 to numeric, the marker column 
> becomes NAs which is probably not what you want.
>
> The error message is because you passed genod (a data frame) to the 
> snpgdsCreateGeno() function not genod2 (the matrix you created from genod).
>
> 
> David L. Carlson
> Department of Anthropology
> Texas A&M University
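
The point made in the reply above can be sketched as follows. This is a toy reconstruction, not the actual data: the data frame mimics the structure visible in the thread (a row-index column, a character marker column, then one column per sample), and the names `myd`, `genod` and `genmat` are reused for illustration only. The idea is to keep the marker names aside as SNP identifiers, so that only numeric sample columns reach as.matrix().

```r
# Toy data frame mimicking the structure shown in the thread:
# column 1 is a row index, column 2 the marker name, the rest are samples.
myd <- data.frame(
  X      = c(3, 4, 5),
  marker = c("100023173|F|0-47:G>A-47:G>A",
             "1043336|F|0-7:A>G-7:A>G",
             "1212218|F|0-49:A>G-49:A>G"),
  X88    = c(0, 1, 0),
  X9     = c(NA, 0, 0),
  stringsAsFactors = FALSE
)

snp.id    <- myd$marker        # marker names become the SNP identifiers
genod     <- myd[, -(1:2)]     # drop the index and marker columns
sample.id <- names(genod)      # sample names now match the genotype columns

genod[is.na(genod)] <- 3       # 3 = missing, as in the original script
genod[genod == 1]   <- 2       # recode 1 -> 2, as in the original script

# All remaining columns are numeric, so as.matrix() gives a numeric
# matrix and no class(x) <- "numeric" coercion is needed.
genmat <- as.matrix(genod)
stopifnot(is.matrix(genmat), is.numeric(genmat))
genmat
```

It is this matrix, not the data frame, that would then be passed as genmat = to snpgdsCreateGeno().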
>

Re: [R] Warning message: NAs introduced by coercion

2019-01-08 Thread N Meriam
Here's a portion of what my data looks like (text file format attached).
When running in R, it gives me this:

> df4 <- read.csv(file = "mydata.csv", header = TRUE)
> require(SNPRelate)
> library(gdsfmt)
> myd <- df4
> myd <- df4
> names(myd)[-1]
[1] "marker" "X88"    "X9"     "X17"    "X25"
> myd[,1]
[1]  3  4  5  6  8 10
# the data must be 0,1,2 with 3 as missing so you have r
> sample.id <- names(myd)[-1]
> snp.id <- myd[,1]
> snp.position <- 1:length(snp.id) # not needed for ibs
> snp.chromosome <- rep(1, each=length(snp.id)) # not needed for ibs
> snp.allele <- rep("A/G", length(snp.id)) # not needed for ibs
# genotype data must have - in 3
> genod <- myd[,-1]
> genod[is.na(genod)] <- 3
> genod[genod=="0"] <- 0
> genod[genod=="1"] <- 2
> genod2 <- as.matrix(genod)
> head(genod2)
     marker                          X88 X9  X17 X25
[1,] "100023173|F|0-47:G>A-47:G>A" "0" "3" "3" "3"
[2,] "1043336|F|0-7:A>G-7:A>G"     "2" "0" "3" "0"
[3,] "1212218|F|0-49:A>G-49:A>G"   "0" "0" "0" "0"
[4,] "1019554|F|0-14:T>C-14:T>C"   "0" "0" "3" "0"
[5,] "100024550|F|0-16:G>A-16:G>A" "3" "3" "3" "3"
[6,] "1106702|F|0-8:C>A-8:C>A"     "0" "0" "0" "0"
> class(genod2) <- "numeric"
Warning message:
In class(genod2) <- "numeric" : NAs introduced by coercion
> head(genod2)
     marker X88 X9 X17 X25
[1,]     NA   0  3   3   3
[2,]     NA   2  0   3   0
[3,]     NA   0  0   0   0
[4,]     NA   0  0   3   0
[5,]     NA   3  3   3   3
[6,]     NA   0  0   0   0
> class(genod2) <- "numeric"
> class(genod2)
[1] "matrix"
# read data
> filn <-"simTunesian.gds"
> snpgdsCreateGeno(filn, genmat = genod,
+  sample.id = sample.id, snp.id = snp.id,
+  snp.chromosome = snp.chromosome,
+  snp.position = snp.position,
+  snp.allele = snp.allele, snpfirstdim=TRUE)
Error in snpgdsCreateGeno(filn, genmat = genod, sample.id = sample.id,
 :   is.matrix(genmat) is not TRUE

I can't find a solution to my problem. My guess is that it comes from
converting the 'marker' column from factor to numeric.

Best,
Meriam

On Tue, Jan 8, 2019 at 11:28 AM Michael Dewey  wrote:
>
> Dear Meriam
>
> Your csv file did not come through, as attachments are stripped unless of
> certain types, and your post is very hard to read since you are posting in
> HTML. Try renaming the file to .txt and set your mailer to send
> plain text; then people may be able to help you better.
>
> Michael
>

Re: [R] Warning message: NAs introduced by coercion

2019-01-08 Thread N Meriam
I see...
Here's a portion of what my data looks like (csv file attached).
I run again and here are the results:

> df4 <- read.csv(file = "mydata.csv", header = TRUE)
> require(SNPRelate)
> library(gdsfmt)
> myd <- df4
> myd <- df4
> names(myd)[-1]
[1] "marker" "X88"    "X9"     "X17"    "X25"
> myd[,1]
[1]  3  4  5  6  8 10
> # the data must be 0,1,2 with 3 as missing so you have r
> sample.id <- names(myd)[-1]
> snp.id <- myd[,1]
> snp.position <- 1:length(snp.id) # not needed for ibs
> snp.chromosome <- rep(1, each=length(snp.id)) # not needed for ibs
> snp.allele <- rep("A/G", length(snp.id)) # not needed for ibs
> # genotype data must have - in 3
> genod <- myd[,-1]
> genod[is.na(genod)] <- 3
> genod[genod=="0"] <- 0
> genod[genod=="1"] <- 2
> genod2 <- as.matrix(genod)
> head(genod2)
     marker                          X88 X9  X17 X25
[1,] "100023173|F|0-47:G>A-47:G>A" "0" "3" "3" "3"
[2,] "1043336|F|0-7:A>G-7:A>G"     "2" "0" "3" "0"
[3,] "1212218|F|0-49:A>G-49:A>G"   "0" "0" "0" "0"
[4,] "1019554|F|0-14:T>C-14:T>C"   "0" "0" "3" "0"
[5,] "100024550|F|0-16:G>A-16:G>A" "3" "3" "3" "3"
[6,] "1106702|F|0-8:C>A-8:C>A"     "0" "0" "0" "0"
> class(genod2) <- "numeric"
Warning message:
In class(genod2) <- "numeric" : NAs introduced by coercion
> head(genod2)
     marker X88 X9 X17 X25
[1,]     NA   0  3   3   3
[2,]     NA   2  0   3   0
[3,]     NA   0  0   0   0
[4,]     NA   0  0   3   0
[5,]     NA   3  3   3   3
[6,]     NA   0  0   0   0
> class(genod2) <- "numeric"
> class(genod2)
[1] "matrix"
> # read data
> filn <-"simTunesian.gds"
> snpgdsCreateGeno(filn, genmat = genod,
+  sample.id = sample.id, snp.id = snp.id,
+  snp.chromosome = snp.chromosome,
+  snp.position = snp.position,
+  snp.allele = snp.allele, snpfirstdim=TRUE)
Error in snpgdsCreateGeno(filn, genmat = genod, sample.id = sample.id,  :
  is.matrix(genmat) is not TRUE

Thanks,
Meriam

On Tue, Jan 8, 2019 at 9:02 AM PIKAL Petr  wrote:

> Hi
>
> see in line
>
> > -Original Message-
> > From: R-help  On Behalf Of N Meriam
> > Sent: Tuesday, January 8, 2019 3:08 PM
> > To: r-help@r-project.org
> > Subject: [R] Warning message: NAs introduced by coercion
> >
> > Dear all,
> >
> > I have a .csv file called df4. (15752 obs. of 264 variables).
> > I apply this code but couldn't continue further other analyses, a warning
> > message keeps coming up. Then, I want to determine max and min
> > similarity values,
> > heat map plot, cluster...etc
> >
> > > require(SNPRelate)
> > > library(gdsfmt)
> > > myd <- read.csv(file = "df4.csv", header = TRUE)
> > > names(myd)[-1]
> > myd[,1]
> > > myd[1:10, 1:10]
> >  # the data must be 0,1,2 with 3 as missing so you have r
> > > sample.id <- names(myd)[-1]
> > > snp.id <- myd[,1]
> > > snp.position <- 1:length(snp.id) # not needed for ibs
> > > snp.chromosome <- rep(1, each=length(snp.id)) # not needed for ibs
> > > snp.allele <- rep("A/G", length(snp.id)) # not needed for ibs
> > # genotype data must have - in 3
> > > genod <- myd[,-1]
> > > genod[is.na(genod)] <- 3
> > > genod[genod=="0"] <- 0
> > > genod[genod=="1"] <- 2
> > > genod[1:10,1:10]
> > > genod <- as.matrix(genod)
>
> matrix can have only one type of data so you probably changed it to
> character by such construction.
>
> > > class(genod) <- "numeric"
>
> This tries to change all "numeric" values to numbers but if it cannot it
> sets it to NA.
>
> something like
>
> > head(iris)
>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
> 1          5.1         3.5          1.4         0.2  setosa
> 2          4.9         3.0          1.4         0.2  setosa
> 3          4.7         3.2          1.3         0.2  setosa
> 4          4.6         3.1          1.5         0.2  setosa
> 5          5.0         3.6          1.4         0.2  setosa
> 6          5.4         3.9          1.7         0.4  setosa
> >
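
The quoted example is cut off in the archive, so the following is not a reconstruction of it, only a small sketch of the behaviour being described, using the same built-in iris data: one non-numeric column is enough to turn the whole matrix into character, and forcing such text to numeric yields NA.

```r
# iris has four numeric columns and one factor column (Species).
m <- as.matrix(iris)
typeof(m)                      # the whole matrix is now character

# Converting non-numeric text to numbers gives NA (with a warning,
# suppressed here to keep the output short).
num <- suppressWarnings(as.numeric(m[, "Species"]))
all(is.na(num))

# Keeping only the numeric columns avoids the problem entirely.
m_num <- as.matrix(iris[, sapply(iris, is.numeric)])
typeof(m_num)
```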

[R] Warning message: NAs introduced by coercion

2019-01-08 Thread N Meriam
Dear all,

I have a .csv file called df4 (15752 obs. of 264 variables).
I apply the code below, but a warning message keeps coming up and I cannot
continue with further analyses. Afterwards, I want to determine the max and
min similarity values, plot a heat map, run a cluster analysis, etc.

> require(SNPRelate)
> library(gdsfmt)
> myd <- read.csv(file = "df4.csv", header = TRUE)
> names(myd)[-1]
myd[,1]
> myd[1:10, 1:10]
 # the data must be 0,1,2 with 3 as missing so you have r
> sample.id <- names(myd)[-1]
> snp.id <- myd[,1]
> snp.position <- 1:length(snp.id) # not needed for ibs
> snp.chromosome <- rep(1, each=length(snp.id)) # not needed for ibs
> snp.allele <- rep("A/G", length(snp.id)) # not needed for ibs
# genotype data must have - in 3
> genod <- myd[,-1]
> genod[is.na(genod)] <- 3
> genod[genod=="0"] <- 0
> genod[genod=="1"] <- 2
> genod[1:10,1:10]
> genod <- as.matrix(genod)
> class(genod) <- "numeric"


Warning message:
In class(genod) <- "numeric" : NAs introduced by coercion

I can provide more details if that would help me be more specific.
Please let me know.

I would appreciate your help.
Thanks,
Meriam
