Re: [R] error in plotting model from kernlab

2019-01-08 Thread PIKAL Petr
Hi

As I said, I have no experience with kernlab, but the manual says:

"probabilities matrix of class probabilities (one column for each class and one 
row for each input)"

From this I understand that pred is a matrix, which should have the same number 
of rows as df$cons but several columns. A matrix is a vector with dimensions, 
which means that its length is the number of columns times the length of 
df$cons; hence it is longer than df$cons.

The plotting error suggests that the ksvm object has to be a classification 
object, and
> type = "C-bsvc",
is not the same as "C-svc".
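
The length mismatch can be reproduced without kernlab. The following is a
minimal base-R sketch, assuming a made-up six-row 0/1 outcome and a
hypothetical probability matrix in the shape the manual describes; none of the
values come from the poster's data. It also recodes the outcome with explicit
levels, because factor(c("negative", "positive")) on its own appears to recycle
two labels down the column rather than translate the 0/1 values.

```r
# Toy stand-in for the poster's data: a 0/1 outcome column (values are made up).
cons <- c(0, 1, 0, 0, 1, 0)

# Recode 0/1 into labelled classes with explicit levels and labels.
truth <- factor(cons, levels = c(0, 1), labels = c("negative", "positive"))

# Hypothetical class-probability matrix: one row per input, one column per class.
pred <- cbind(negative = c(0.9, 0.2, 0.7, 0.6, 0.4, 0.8),
              positive = c(0.1, 0.8, 0.3, 0.4, 0.6, 0.2))

length(truth)  # 6
length(pred)   # 12: rows times columns, so table(pred, truth) would fail

# Reduce each row to its most probable class before tabulating.
pred_class <- factor(colnames(pred)[max.col(pred, ties.method = "first")],
                     levels = levels(truth))
acc <- table(predicted = pred_class, observed = truth)
acc
```

With kernlab itself, predict() with the default type = "response" should
return class labels directly, which can be tabulated against df$cons without
this extra step.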

Cheers
Petr

> -Original Message-
> From: Luigi Marongiu 
> Sent: Tuesday, January 8, 2019 4:40 PM
> To: PIKAL Petr 
> Cc: r-help 
> Subject: Re: [R] error in plotting model from kernlab
>
> Hi,
> the maintainer hasn't answered yet. The problem with 'acc' is that yes the
> objects are not of the same length but they should be: according to the
> manual, ' table(pred, df$cons)' would return a 2x2 matrix of the results.
> This is not the case, so there is a problem with the model -- that is why
> there is no plotting either -- even if an object of class ksvm had been
> created.
>
> On Tue, Jan 8, 2019 at 4:12 PM PIKAL Petr  wrote:
> >
> > Hi
> >
> > I cannot help you with kernlab
> >
> > > > pred = predict(mod, df, type = "probabilities")
> > > > acc = table(pred, df$cons)
> > > Error in table(pred, df$cons) : all arguments must have the same length
> > > which again is weird since mod, df and df$cons are made from
> > > the same dataframe.
> >
> > Why not check length of those objects?
> >
> > length(pred)
> > length(df$cons)
> >
> > > > plot(mod, data = df)
> > > > kernlab::plot(mod, data = df)
> > > but I get this error:
> > >
> > > Error in .local(x, ...) :
> > >   Only plots of classification ksvm objects supported
> > >
> >
> > This seems self-explanatory to me. What did the maintainer say about it?
> >
> > Cheers
> > Petr
> >
> >
> > > -Original Message-
> > > From: R-help  On Behalf Of Luigi
> > > Marongiu
> > > Sent: Monday, January 7, 2019 1:26 PM
> > > To: r-help 
> > > Subject: [R] error in plotting model from kernlab
> > >
> > > Dear all,
> > > I have a set of data in this form:
> > > > str 
> > > 'data.frame': 1574 obs. of  14 variables:
> > >  $ serial: int  12751 14157 7226 15663 11088 10464 1003 10427 11934
> > > 3999 ...
> > >  $ plate : int  43 46 22 50 38 37 3 37 41 11 ...
> > >  $ well  : int  79 333 314 303 336 96 235 59 30 159 ...
> > >  $ sample: int  266 295 151 327 231 218 21 218 249 84 ...
> > >  $ target: chr  "HEV 2-AI5IQWR" "Dientamoeba fragilis-AIHSPMK"
> > > "Astro
> > > 2 Liu-AI20UKB" "C difficile GDH-AIS086J" ...
> > >  $ ori.ct: num  0 33.5 0 0 0 ...
> > >  $ ct.out: int  0 1 0 0 0 0 0 1 0 0 ...
> > >  $ mr: num  -0.002 0.109 0.002 0 0.001 0.006 0.015 0.119 0.003 0.004 
> > > ...
> > >  $ fcn   : num  44.54 36.74 6.78 43.09 44.87 ...
> > >  $ mr.out: int  0 1 0 0 0 0 0 1 0 0 ...
> > >  $ oper.a: int  0 1 0 0 0 0 0 1 0 0 ...
> > >  $ oper.b: int  0 1 0 0 0 0 0 1 0 0 ...
> > >  $ oper.c: int  0 1 0 0 0 0 0 1 0 0 ...
> > >  $ cons  : int  0 1 0 0 0 0 0 1 0 0 ...
> > > from which I have selected two numerical variables corresponding to x
> > > and y in a Cartesian plane and one outcome variable (z):
> > > > df = subset(t.data, select = c(mr, fcn, cons))
> > > > df$cons = factor(c("negative", "positive"))
> > > > head(df)
> > >   mr   fcn cons
> > > 1 -0.002 44.54 negative
> > > 2  0.109 36.74 positive
> > > 3  0.002  6.78 negative
> > > 4  0.000 43.09 positive
> > > 5  0.001 44.87 negative
> > > 6  0.006  2.82 positive
> > >
> > > I created an SVM model with the kernlab package:
> > > > mod = ksvm(cons ~ mr+fcn, # I prefer it to the more canonical "." but the outcome is the same
> > > data = df,
> > > type = "C-bsvc",
> > > kernel = "rbfdot",
> > > kpar = "automatic",
> > > C = 10,
> > > prob.model = TRUE)
> > >
> > > > mod
> > > Support Vector Machine object of class "ksvm"
> > >
> > > SV type: C-bsvc  (classification)
> > >  parameter : cost C = 10
> > >
> > > Gaussian Radial Basis kernel function.
> > >  Hyperparameter : sigma =  42.0923201429106
> > >
> > > Number of Support Vectors : 1439
> > >
> > > Objective Function Value : -12873.45
> > > Training error : 0.39263
> > > Probability model included.
> > >
> > > First of all, I am not sure if the model worked because 1439 support
> > > vectors out of 1574 data points means that over 90% of the data is
> > > required to fix the hyperplane. This does not look like a model but
> > > a patch. Secondly, the prediction is rubbish -- but this is another
> > > story -- and when I try to create a confusion table of the processed
> > > data I get:
> > > > pred = predict(mod, df, type = "probabilities")
> > > > acc = table(pred, df$cons)
> > > Error in table(pred, df$cons) : all arguments must have the same length
> > > which again is weird since mod, df and 

[R] R help: circular dendrogram

2019-01-08 Thread N Meriam
Dear all,

I generated a circular dendrogram with R (see attached). I have a
total of 360 landraces.
What I want to do next is generate a different color for each cluster
and also generate colors to show the country/region.
I don't know if it's also possible to put a code number (associated
with each landrace) in front of each ramification.
I want to have an explicit dendrogram.


Rplot01.pdf
Description: Adobe PDF document
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Warning message: NAs introduced by coercion

2019-01-08 Thread N Meriam
Yes, sorry. I attached the file once again.
Well, still getting the same warning.

> class(genod) <- "numeric"
Warning message:
In class(genod) <- "numeric" : NAs introduced by coercion
> class(genod)
[1] "matrix"

Then, I run the following code and it gives this:

> filn <-"simTunesian.gds"
> snpgdsCreateGeno(filn, genmat = genod,
+  sample.id = sample.id, snp.id = snp.id,
+  snp.chromosome = snp.chromosome,
+  snp.position = snp.position,
+  snp.allele = snp.allele, snpfirstdim=TRUE)
> # calculate similarity matrix
> # Open the GDS file
> (genofile <- snpgdsOpen(filn))
File: C:\Users\DELL\Documents\TEST\simTunesian.gds (1.4M)
+[  ] *
|--+ sample.id   { Str8 363 ZIP_ra(42.5%), 755B }
|--+ snp.id   { Int32 15752 ZIP_ra(35.1%), 21.6K }
|--+ snp.position   { Int32 15752 ZIP_ra(34.7%), 21.3K }
|--+ snp.chromosome   { Float64 15752 ZIP_ra(0.18%), 230B }
|--+ snp.allele   { Str8 15752 ZIP_ra(0.16%), 108B }
\--+ genotype   { Bit2 15752x363, 1.4M } *
> ibs <- snpgdsIBS(genofile, remove.monosnp = FALSE, num.thread=1)
Identity-By-State (IBS) analysis on genotypes:
Excluding 0 SNP on non-autosomes
Working space: 363 samples, 15,752 SNPs
using 1 (CPU) core
IBS:the sum of all selected genotypes (0,1,2) = 3658952
Tue Jan 08 15:38:00 2019(internal increment: 42880)
[==] 100%, completed in 0s
Tue Jan 08 15:38:00 2019Done.
> # maximum similarity value
> max(ibs$ibs)
[1] NaN
> # minimum similarity value
> min(ibs$ibs)
[1] NaN

As you can see, I can't continue my analysis (heat map plot,
clustering with hclust) because values are NaN.
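
One way to see where the coercion NAs sit is to count them per column. The
sketch below is base R only and uses a small hypothetical character matrix
shaped like the posted data (a text marker column plus genotype columns); it
does not call SNPRelate. If a whole column such as 'marker' turns into NA and
is still passed along as if it were a sample, that sample has no genotypes at
all, which may be what produces the NaN values from snpgdsIBS(). That link is
a guess and is not confirmed by the output above.

```r
# Hypothetical character matrix resembling the posted data: a marker column
# of text IDs plus genotype columns stored as strings.
genod_chr <- cbind(marker = c("100023173|F|0-47:G>A", "1043336|F|0-7:A>G"),
                   X88 = c("0", "2"),
                   X9  = c("3", "0"))

# Convert to numeric, silencing the expected coercion warning.
genod_num <- suppressWarnings(
  matrix(as.numeric(genod_chr), nrow = nrow(genod_chr),
         dimnames = dimnames(genod_chr))
)

# Count NAs per column to see which columns failed to convert.
na_per_col <- colSums(is.na(genod_num))
na_per_col

# Columns that are not really numeric and should be dropped beforehand.
names(na_per_col)[na_per_col > 0]
```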


On Tue, Jan 8, 2019 at 2:01 PM David L Carlson  wrote:
>
> Your attached file is not a .csv file since the fields are not separated by 
> commas (just rename the mydata.csv to mydata.txt).
>
> The command "genod2 <- as.matrix(genod)" created a character matrix from the 
> data frame genod.  When you try to force genod2 to numeric, the marker column 
> becomes NAs which is probably not what you want.
>
> The error message is because you passed genod (a data frame) to the 
> snpgdsCreateGeno() function not genod2 (the matrix you created from genod).
>
> 
> David L. Carlson
> Department of Anthropology
> Texas A&M University
>

Re: [R] objects are masked _by_ '.GlobalEnv'

2019-01-08 Thread Jeff Newmiller
There is a mailing list for questions about packages... see the Posting Guide.

On January 8, 2019 11:48:37 AM PST, Troels Ring  wrote:
>Dear friends - this is really a question I'm sorry about since it
>doesn't
>follow the requirements. I have made a R package via RStudio and it
>causes
>problems when I try to load some data from within the package. I'm on
>windows, R version 3.5.1 (2018-07-02). 
>
> 
>
>When I am in the directory with the package project (also with plain R)
>
> 
>
>> data(Schell)
>
>> library(chaRBAL)
>
> 
>
>Attaches package: 'chaRBAL'my translation from Danish
>
> 
>
>The following objects are masked _by_ '.GlobalEnv':
>
> 
>
>Na, TOTAL, WA
>
> 
>
>#  BUT: the  values are correct from data(Schell):
>
> 
>
>> Na
>
>[1] 0.008 0.024 0.044 0.064 0.082 0.098 0.114 0.128 0.142 0.154 0.166
>0.176
>0.188 0.198 0.206 0.214 0.224 0.232
>
>[19] 0.242 0.252 0.264 0.278 0.292 0.310 0.330 0.348 0.364 0.374 0.384
>0.390
>
>> TOTAL
>
>   [,1]  [,2]
>
>[1,] 0.004 0.098
>
>[2,] 0.012 0.094
>
>[3,] 0.022 0.089
>
>[4,] 0.032 0.084
>
>[5,] 0.041 0.079
>
>25 more so
>
>> WA
>
>$`buffs`
>
>$`buffs`[[1]]
>
>[1] "Phos"
>
> 
>
>$`buffs`[[2]]
>
>[1] "Cit"
>
> 
>
> 
>
>$KA
>
>$KA[[1]]
>
>[1] 6.918310e-03 6.165950e-08 4.786301e-13
>
> 
>
>$KA[[2]]
>
>[1] 7.413102e-04 1.737801e-05 3.981072e-07
>
> 
>
># Which is all OK
>
># But when now I make the same call again
>
> 
>
> 
>
>> data(Schell)
>
>ls()
>
># [1] "Alb"   "Ca""Cl""K" "Lact"  "Mg""Na""PCO2" 
>"S1"
>
>
>#[10] "TOTAL" "WA"   
>
> 
>
>TOTAL
>
>#  [,1]   [,2]   [,3]
>
>#   [1,] 0.0267 0.0267 0.0267
>
>#   [2,] 0.0200 0.0200 0.0200
>
> 
>
># which is wrong and belongs to another included dataset. How did that
>happen to be caught in globalenvironment, how can I avoid that and get
>rid
># of it?
>
> 
>
>I can see I need to know more about environments. What do you think
>happens?
>
> 
>
>All best wishes
>
>Troels Ring, MD
>
>Aalborg
>
>

-- 
Sent from my phone. Please excuse my brevity.



Re: [R] External validation for a hurdle model (pscl)

2019-01-08 Thread Jeff Newmiller
That said, the gist of the OP's outline is correct, and the main reason to look 
elsewhere is to get more thorough advice on what statistical concerns should be 
addressed than would be on topic here.

One comment: reviewing plots of differences versus various independent 
variables for systematic biases is a task R is particularly well suited for, 
but discovering which plots highlight issues with your model or data takes 
familiarity with your data (explore) and with theory (which you learn 
elsewhere) and with R (which we can help with if you have more specific 
questions).
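
As a rough illustration of reviewing differences against an independent
variable, here is a base-R sketch on simulated numbers. Everything in it is
made up: the covariate 'temp', the predicted means and the observed counts. In
the poster's setting, 'predicted' would presumably come from something like
predict(final, newdata = datosnuevos, type = "response"), and 'observed' would
be the counts at the 38 new sites.

```r
set.seed(1)

# Simulated stand-ins: 38 new sites, one environmental covariate,
# model-predicted means and newly observed counts (all made up).
n         <- 38
temp      <- runif(n, 10, 30)
predicted <- exp(0.05 * temp)
observed  <- rpois(n, lambda = predicted)

# Difference between new observations and predictions.
diffs <- observed - predicted

# Numeric check for systematic bias across the covariate range.
bins        <- cut(temp, breaks = c(10, 20, 30), include.lowest = TRUE)
mean_by_bin <- tapply(diffs, bins, mean)
mean_by_bin

# Visual check: points should scatter around zero with no trend.
if (interactive()) {
  plot(temp, diffs, xlab = "temp", ylab = "observed - predicted")
  abline(h = 0, lty = 2)
}
```

Which covariates to plot against, and what counts as an acceptable difference,
remain statistical questions that this sketch does not answer.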

On January 8, 2019 10:50:14 AM PST, Bert Gunter  wrote:
>This list is (mostly) about R programming. Your query is (mostly) about
>statistics. So you should post on a statistics site like
>stats.stackexchange.com
>not here; I am pretty sure you'll receive lots of answers there.
>
>Cheers,
>Bert
>
>
>Bert Gunter
>
>"The trouble with having an open mind is that people keep coming along
>and
>sticking things into it."
>-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
>On Tue, Jan 8, 2019 at 10:18 AM Maria Eugenia Utgés
>
>wrote:
>
>> Hi R-list,
>> We have constructed a hurdle model some time ago.
>> Now we were able to gather new data in the same city (38 new sites),
>and
>> want to do an external validation to see if the model still performs
>ok.
>> All the books and lectures I have read say its the best validation
>option
>> but...
>> I have made a (simple) search, but it seems that as having new data
>for a
>> model is rare, have not found anything with the depth enough so as to
>> reproduce it/adapt it to hurdle models.
>>
>> I have predicted the probability for non-zero counts
>> nonzero <- 1 - predict(final, newdata = datosnuevos, type = "prob")[,
>1]
>>
>> and the predicted mean from the count component
>> countmean <- predict(final, newdata = datosnuevos, type = "count")
>>
>> I understand that "newdata" is taking into account the new values for
>the
>> independent variables (environmental variables), is it?
>>
>> So, I have to compare the predicted values of y (calculated with the
>new
>> values of the environmental variables) with the new observed values.
>>
>> That would be using the model (constructed with the old values),
>having as
>> input the new variables, and having as output a "new" prediction, to
>be
>> contrasted with the "new" observed y.
>>
>> These comparison would be by means of AUC, correct classification,
>and/or
>> what other options? Results of the external validation would just be
>a % of
>> correct predicted values? plots?
>>
>> Need some guidance, sorry if the explanation was "basic" but needed
>to
>> write it in my own words so as not to miss any detail.
>>
>> Thank you very much in advance,
>>
>> María Eugenia Utgés
>>
>> CeNDIE-ANLIS
>> Buenos Aires
>> Argentina
>> a
>>

-- 
Sent from my phone. Please excuse my brevity.



Re: [R] Warning message: NAs introduced by coercion

2019-01-08 Thread David L Carlson
Your attached file is not a .csv file since the fields are not separated by 
commas (just rename the mydata.csv to mydata.txt).

The command "genod2 <- as.matrix(genod)" created a character matrix from the 
data frame genod.  When you try to force genod2 to numeric, the marker column 
becomes NAs which is probably not what you want.

The error message is because you passed genod (a data frame) to the 
snpgdsCreateGeno() function not genod2 (the matrix you created from genod).
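
The two points above can be sketched in base R. The data frame below is a
hypothetical two-row stand-in for the posted data, and the names marker_id and
geno_mat are illustrative only. The idea is to keep the marker IDs in their own
vector and convert only the genotype columns, so that the result is a numeric
matrix with no NAs from coercion.

```r
# Hypothetical data frame mimicking the posted layout.
genod <- data.frame(marker = c("100023173|F|0-47:G>A", "1043336|F|0-7:A>G"),
                    X88 = c(0, 2),
                    X9  = c(3, 0),
                    stringsAsFactors = FALSE)

is.matrix(genod)   # FALSE: a data frame is not a matrix

# Keep the marker IDs separately and convert only the genotype columns.
marker_id <- genod$marker
geno_mat  <- as.matrix(genod[, setdiff(names(genod), "marker")])

is.matrix(geno_mat)    # TRUE
is.numeric(geno_mat)   # TRUE, because no character column was mixed in
```

A matrix built this way is what would be passed as genmat. The sample IDs
would then need to match colnames(geno_mat), that is, without "marker".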


David L. Carlson
Department of Anthropology
Texas A&M University

Re: [R] Question

2019-01-08 Thread Jeff Newmiller
Er, just keep it simple, Marc... give one option:

library(lattice)

If you _ever_ use require() without acting upon the return value then you are 
setting yourself or someone else up for confusing missing objects errors 
someday for no good reason. This _isn't_ just personal preference... by 
choosing to use the require function you are taking responsibility for the case 
where that package is missing, and by ignoring the return value you are 
immediately abdicating that responsibility. Let the error appear where it makes 
sense by using the library function in the first place.
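
A small sketch of the difference, using a package name that is assumed not to
be installed (the name is hypothetical):

```r
# A package name assumed not to be installed (hypothetical).
pkg <- "notARealPackage123"

# require() only returns FALSE (with a warning); the script keeps running.
ok <- suppressWarnings(require(pkg, character.only = TRUE, quietly = TRUE))
ok

# If require() is used at all, its return value should be acted upon.
if (!ok) message("Package '", pkg, "' is missing; install it first.")

# library() stops with an error at the point of loading.
res <- tryCatch(
  { library(pkg, character.only = TRUE); "loaded" },
  error = function(e) "error raised by library()"
)
res
```

Without the if (!ok) check, the first sign of trouble would be a "could not
find function" error much later in the script.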

On January 8, 2019 10:56:57 AM PST, Marc Schwartz via R-help 
 wrote:
>Guys,
>
>lattice is a "recommended" package, which means that it is installed by
>default with any standard R installation.
>
>Thus, all that is required, as Sarah noted in an earlier reply, is
>either:
>
>  library(lattice)
>
>or 
>
>  require(lattice)
>
>depending upon preference.
>
>latticeExtra, on the other hand, is a third party package that would
>need to be installed separately, if desired.
>
>Regards,
>
>Marc Schwartz
>
>
>> On Jan 8, 2019, at 1:46 PM, Bert Gunter 
>wrote:
>> 
>> I think it's ?install.packages
>> 
>> Bert Gunter
>> 
>> "The trouble with having an open mind is that people keep coming
>along and
>> sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>> 
>> 
>> On Tue, Jan 8, 2019 at 9:50 AM Rich Shepard
>
>> wrote:
>> 
>>> On Tue, 8 Jan 2019, S. Mahmoud Nasrollahi wrote:
>>> 
>>>> I have got a problem during working with some package in R and in spite of
>>>> trying with R help, internet and any other resources I could not succeed.
>>>> Indeed when I want to install some function like bwplot, boxplot, xyplot I
>>>> receive this sort of messages: Warning in install.packages : package
>>>> ‘xyplot’ is not available (for R version 3.5.2) Do you know how I can
>>>> solve that?
>>> 
>>>   Yep. Those plots are part of the lattice package. You can install lattice
>>> (and latticeExtra if you want) with
>>> 
>>>> installpkg("lattice")
>>> 
>>> Happy plotting,
>>> 
>>> Rich
>

-- 
Sent from my phone. Please excuse my brevity.



[R] objects are masked _by_ '.GlobalEnv'

2019-01-08 Thread Troels Ring
Dear friends - this is really a question I'm sorry about since it doesn't
follow the requirements. I have made an R package via RStudio and it causes
problems when I try to load some data from within the package. I'm on
Windows, R version 3.5.1 (2018-07-02). 

 

When I am in the directory with the package project (also with plain R)

 

> data(Schell)

> library(chaRBAL)

 

Attaches package: 'chaRBAL'   (my translation from Danish)

 

The following objects are masked _by_ '.GlobalEnv':

 

Na, TOTAL, WA

 

#  BUT: the  values are correct from data(Schell):

 

> Na

[1] 0.008 0.024 0.044 0.064 0.082 0.098 0.114 0.128 0.142 0.154 0.166 0.176
0.188 0.198 0.206 0.214 0.224 0.232

[19] 0.242 0.252 0.264 0.278 0.292 0.310 0.330 0.348 0.364 0.374 0.384 0.390

> TOTAL

   [,1]  [,2]

[1,] 0.004 0.098

[2,] 0.012 0.094

[3,] 0.022 0.089

[4,] 0.032 0.084

[5,] 0.041 0.079

25 more so

> WA

$`buffs`

$`buffs`[[1]]

[1] "Phos"

 

$`buffs`[[2]]

[1] "Cit"

 

 

$KA

$KA[[1]]

[1] 6.918310e-03 6.165950e-08 4.786301e-13

 

$KA[[2]]

[1] 7.413102e-04 1.737801e-05 3.981072e-07

 

# Which is all OK

# But when now I make the same call again

 

 

> data(Schell)

ls()

# [1] "Alb"   "Ca""Cl""K" "Lact"  "Mg""Na""PCO2"  "S1"


#[10] "TOTAL" "WA"   

 

TOTAL

#  [,1]   [,2]   [,3]

#   [1,] 0.0267 0.0267 0.0267

#   [2,] 0.0200 0.0200 0.0200

 

# which is wrong and belongs to another included dataset. How did that
# happen to be caught in the global environment, and how can I avoid that
# and get rid of it?

 

I can see I need to know more about environments. What do you think happens?
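
As background on the "masked _by_ '.GlobalEnv'" message, here is a minimal
sketch. It uses 'cars' from the base datasets package as a stand-in for
Schell, and two scratch environments (e and ws, both hypothetical names)
instead of the real workspace. data() copies objects into an environment,
which at the prompt is the global workspace, and any workspace object with the
same name as a package object is found first.

```r
# data() copies a dataset into an environment; by default that is the
# calling environment, which at the prompt is the global workspace.
e <- new.env()
data("cars", package = "datasets", envir = e)

ls(e)                                        # the copy lives in e
exists("cars", envir = e, inherits = FALSE)  # TRUE

# An object with the same name as a package object masks it.
# Here a stale 'cars' in a stand-in workspace hides the real one.
ws <- new.env(parent = as.environment("package:datasets"))
assign("cars", "stale copy", envir = ws)
get("cars", envir = ws)                      # finds the masking copy first

# Removing the copy makes lookup fall through to the package again.
rm("cars", envir = ws)
class(get("cars", envir = ws))               # "data.frame"
```

In the session above, calling data(Schell) before library(chaRBAL) would
explain the masking message for Na, TOTAL and WA, and rm(Na, TOTAL, WA) should
remove the workspace copies. Why the second data(Schell) call returns objects
from another dataset cannot be determined from the post alone.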

 

All best wishes

Troels Ring, MD

Aalborg




Re: [R] Warning message: NAs introduced by coercion

2019-01-08 Thread N Meriam
Here's a portion of what my data looks like (text file format attached).
When running in R, it gives me this:

> df4 <- read.csv(file = "mydata.csv", header = TRUE)
> require(SNPRelate)
> library(gdsfmt)
> myd <- df4
> myd <- df4
> names(myd)[-1]
[1] "marker" "X88"    "X9"     "X17"    "X25"
> myd[,1]
[1]  3  4  5  6  8 10
# the data must be 0,1,2 with 3 as missing so you have r
> sample.id <- names(myd)[-1]
> snp.id <- myd[,1]
> snp.position <- 1:length(snp.id) # not needed for ibs
> snp.chromosome <- rep(1, each=length(snp.id)) # not needed for ibs
> snp.allele <- rep("A/G", length(snp.id)) # not needed for ibs
# genotype data must have - in 3
> genod <- myd[,-1]
> genod[is.na(genod)] <- 3
> genod[genod=="0"] <- 0
> genod[genod=="1"] <- 2
> genod2 <- as.matrix(genod)
> head(genod2)
     marker                        X88 X9  X17 X25
[1,] "100023173|F|0-47:G>A-47:G>A" "0" "3" "3" "3"
[2,] "1043336|F|0-7:A>G-7:A>G"     "2" "0" "3" "0"
[3,] "1212218|F|0-49:A>G-49:A>G"   "0" "0" "0" "0"
[4,] "1019554|F|0-14:T>C-14:T>C"   "0" "0" "3" "0"
[5,] "100024550|F|0-16:G>A-16:G>A" "3" "3" "3" "3"
[6,] "1106702|F|0-8:C>A-8:C>A"     "0" "0" "0" "0"
> class(genod2) <- "numeric"
Warning message: In class(genod2) <- "numeric" : NAs introduced by coercion
> head(genod2)
     marker X88 X9 X17 X25
[1,]     NA   0  3   3   3
[2,]     NA   2  0   3   0
[3,]     NA   0  0   0   0
[4,]     NA   0  0   3   0
[5,]     NA   3  3   3   3
[6,]     NA   0  0   0   0
> class(genod2) <- "numeric"
> class(genod2)
[1] "matrix"
# read data
> filn <-"simTunesian.gds"
> snpgdsCreateGeno(filn, genmat = genod,
+  sample.id = sample.id, snp.id = snp.id,
+  snp.chromosome = snp.chromosome,
+  snp.position = snp.position,
+  snp.allele = snp.allele, snpfirstdim=TRUE)
Error in snpgdsCreateGeno(filn, genmat = genod, sample.id = sample.id,
 :   is.matrix(genmat) is not TRUE

Can't find a solution to my problem...my guess is that the problem
comes from converting the column 'marker' factor to numerical.

Best,
Meriam

On Tue, Jan 8, 2019 at 11:28 AM Michael Dewey  wrote:
>
> Dear Meriam
>
> Your csv file did not come through, as attachments are stripped unless of
> certain types, and your post is very hard to read since you are posting in
> HTML. Try renaming the file to .txt and set your mailer to send
> plain text; then people may be able to help you better.
>
> Michael
>
> On 08/01/2019 15:35, N Meriam wrote:
> > I see...
> > Here's a portion of what my data looks like (csv file attached).
> > I run again and here are the results:
> >
> > df4 <- read.csv(file = "mydata.csv", header = TRUE)
> >
> >> require(SNPRelate)> library(gdsfmt)> myd <- df4> myd <- df4> 
> >> names(myd)[-1][1] "marker" "X88""X9" "X17""X25"
> >
> >> myd[,1][1]  3  4  5  6  8 10
> >
> >
> >> # the data must be 0,1,2 with 3 as missing so you have r> sample.id <- 
> >> names(myd)[-1]> snp.id <- myd[,1]> snp.position <- 1:length(snp.id) # not 
> >> needed for ibs> snp.chromosome <- rep(1, each=length(snp.id)) # not needed 
> >> for ibs> snp.allele <- rep("A/G", length(snp.id)) # not needed for ibs> # 
> >> genotype data must have - in 3> genod <- myd[,-1]> genod[is.na(genod)] <- 
> >> 3> genod[genod=="0"] <- 0> genod[genod=="1"] <- 2
> >
> >> genod2 <- as.matrix(genod)> head(genod2) marker
> >> X88 X9  X17 X25
> > [1,] "100023173|F|0-47:G>A-47:G>A" "0" "3" "3" "3"
> > [2,] "1043336|F|0-7:A>G-7:A>G" "2" "0" "3" "0"
> > [3,] "1212218|F|0-49:A>G-49:A>G"   "0" "0" "0" "0"
> > [4,] "1019554|F|0-14:T>C-14:T>C"   "0" "0" "3" "0"
> > [5,] "100024550|F|0-16:G>A-16:G>A" "3" "3" "3" "3"
> > [6,] "1106702|F|0-8:C>A-8:C>A" "0" "0" "0" "0"
> >
> >> class(genod2) <- "numeric"Warning message:In class(genod2) <- "numeric" : 
> >> NAs introduced by coercion> head(genod2)
> >
> >   marker X88 X9 X17 X25
> > [1,] NA   0  3   3   3
> > [2,] NA   2  0   3   0
> > [3,] NA   0  0   0   0
> > [4,] NA   0  0   3   0
> > [5,] NA   3  3   3   3
> > [6,] NA   0  0   0   0
> >
> >> class(genod2) <- "numeric"> class(genod2)[1] "matrix"
> >
> >> # read data > filn <-"simTunesian.gds"> snpgdsCreateGeno(filn, genmat = 
> >> genod,+  sample.id = sample.id, snp.id = snp.id,+  
> >> snp.chromosome = snp.chromosome,+  snp.position = 
> >> snp.position,+  snp.allele = snp.allele, 
> >> snpfirstdim=TRUE)Error in snpgdsCreateGeno(filn, genmat = genod, sample.id 
> >> = sample.id,  :
> >is.matrix(genmat) is not TRUE
> >
> > Thanks,
> > Meriam
> >
> > On Tue, Jan 8, 2019 at 9:02 AM PIKAL Petr  wrote:
> >
> >> Hi
> >>
> >> see in line
> >>
> >>> -Original Message-
> >>> From: R-help 

Re: [R] Question

2019-01-08 Thread Rich Shepard

On Tue, 8 Jan 2019, Bert Gunter wrote:


I think it's ?install.packages


Bert,

  Of course it is. My apologies to the original poster.

Rich



Re: [R] Question

2019-01-08 Thread Rich Shepard

On Tue, 8 Jan 2019, Marc Schwartz wrote:


lattice is a "recommended" package, which means that it is installed by
default with any standard R installation.


Marc,

  Thanks for the reminder.

Regards,

Rich



Re: [R] Question

2019-01-08 Thread Marc Schwartz via R-help
Guys,

lattice is a "recommended" package, which means that it is installed by default 
with any standard R installation.

Thus, all that is required, as Sarah noted in an earlier reply, is either:

  library(lattice)

or 

  require(lattice)

depending upon preference.

latticeExtra, on the other hand, is a third party package that would need to be 
installed separately, if desired.
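
For illustration, a short sketch of using the lattice functions once the
package is loaded. The built-in mtcars dataset and the object names p1 and p2
are chosen only for the example. Note that xyplot() and bwplot() are functions
in lattice, not packages, which is why install.packages() reports that
‘xyplot’ is not available; boxplot() belongs to the base graphics package and
needs no loading at all.

```r
# lattice ships with standard R installations, so it only needs loading.
library(lattice)

# xyplot() and bwplot() are functions inside lattice, not packages.
# 'mtcars' is a built-in dataset used purely for illustration.
p1 <- xyplot(mpg ~ wt, data = mtcars)
p2 <- bwplot(mpg ~ factor(cyl), data = mtcars)

class(p1)  # "trellis"

# lattice plots are drawn when the object is printed.
if (interactive()) {
  print(p1)
  print(p2)
}
```

A package that is not part of the standard installation, such as
latticeExtra, would first need install.packages("latticeExtra").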

Regards,

Marc Schwartz


> On Jan 8, 2019, at 1:46 PM, Bert Gunter  wrote:
> 
> I think it's ?install.packages
> 
> Bert Gunter
> 
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> 
> 
> On Tue, Jan 8, 2019 at 9:50 AM Rich Shepard 
> wrote:
> 
>> On Tue, 8 Jan 2019, S. Mahmoud Nasrollahi wrote:
>> 
>>> I have run into a problem while working with some packages in R and, in
>>> spite of trying R help, the internet and other resources, I could not
>>> solve it. When I want to install some functions like bwplot, boxplot,
>>> xyplot, I receive this sort of message: Warning in install.packages :
>>> package ‘xyplot’ is not available (for R version 3.5.2) Do you know how
>>> I can solve that?
>> 
>>   Yep. Those plots are part of the lattice package. You can install
>> lattice
>> (and latticeExtra if you want) with
>> 
>>> installpkg("lattice")
>> 
>> Happy plotting,
>> 
>> Rich



Re: [ESS] ess-noweb-font-lock-mode: emacs hangs using uncomment-region with math environments

2019-01-08 Thread Braun, Michael via ESS-help
Thanks.  polymode does appear to resolve the font-lock problem with my .Rnw 
files.

But it is not clear to me the best way to set up the various poly-packages.  I 
have not found a good “migration guide” online.  Could you offer some 
additional suggestions on what to add to my Preferences.el file? (I am using 
Aquamacs 3.5, based on Emacs 25.3.50.1). Or perhaps others could share their 
own configurations?

Below are the relevant lines in my current Preferences.el file. Other than the 
first three, which I added today to try polymode, the rest were cobbled 
together from various online sources over the years.  Can you help me 
understand which variables would still apply to ESS and polymode, which are no 
longer necessary, and others that I should add?  I am particularly interested 
in enabling preview-latex support for .Rnw files, like I have for .tex. I was 
never able to get that to work right before.

Again, thanks for the support, and the continued development.  It’s much 
appreciated.

Michael

—— My Preferences.el file — — 

(load "ess-autoloads")
(require 'poly-R)
(poly-noweb+R-mode)
;;(setq inferior-R-program "/Library/Frameworks/R.framework/Resources/bin/R")

(add-hook 'ess-mode-hook
  (lambda ()
(ess-set-style 'C++ 'quiet)
(add-hook 'local-write-file-hooks
  (lambda () (ess-nuke-trailing-whitespace))
  (setq ess-nuke-trailing-whitespace-p 't)
  )
(setq ess-first-continued-statement-offset 2
  ess-continued-statement-offset 0
  ess-arg-function-offset t)
))

 (setq ess-swv-plug-into-AUCTeX-p t)

(defun ess-swv-add-TeX-commands ()
;;   various Knit and LaTex commands here
)

;; Presumably these next settings enable AUCTeX and Preview support for .Rnw files,
;;   but I’ve never been able to get that to work.
(setq ess-swv-processor 'knitr)
(setq ess-swv-pdflatex-commands 'pdflatex)

;; ESS Markdown.  Again, is this function redundant to a simple (require 'poly-markdown) ?
(defun rmd-mode ()
  "ESS Markdown mode for rmd files"
  (interactive)
  (R-mode)
  (require 'poly-R)
  (require 'poly-markdown)
  (poly-markdown+r-mode))

(setq auto-mode-alist (cons '("\\.Rmd\\'" . rmd-mode)
auto-mode-alist))
(setq auto-mode-alist (cons '("\\.rmd\\'" . rmd-mode)
auto-mode-alist))



--
Michael Braun, Ph.D.
Associate Professor of Marketing, and
  Corrigan Research Professor
Cox School of Business
Southern Methodist University
Dallas, TX 75275
braunm _at_ smu.edu






> On Jan 7, 2019, at 10:43 AM, Alex Branham  wrote:
> 
> Hi Michael -
> 
> ESS's noweb implementation has many bugs like this that are hard to fix.
> We're encouraging users of mixed major-mode buffers to move to polymode,
> which is more actively developed.
> 
> https://polymode.github.io/
> 
> Hope that helps,
> Alex
> 
> On Sun 06 Jan 2019 at 21:50, Braun, Michael via ESS-help 
>  wrote:
> 
>> In a .Rnw file, when calling uncomment-region on a region that contains a 
>> LaTeX math environment (such as align), Emacs will hang, requiring a Force 
>> Quit.  I am using Aquamacs Emacs 3.5, ESS 18.10.2, and MacOS 10.14.2, but 
>> this problem has persisted since at least Aquamacs 3.2 and ESS 16.10.   I’ve 
>> been wrestling with this issue since at least 2016, but now it’s time to get 
>> some help.
>> 
>> To replicate, save the following content in a file with a .Rnw extension. 
>> Then, select a region that includes the equation, and run comment-region, 
>> and then uncomment-region.  I have these functions bound to M-; or C-c ;.
>> 
>> --
>> \documentclass{article}
>> \usepackage{amsmath}
>> 
>> \begin{document}
>> 
>> On the following equation, try comment-region, and then uncomment-region.
>> The uncomment-region call is what hangs the process.
>> 
>> \begin{align}
>>  1+1=2
>> \end{align}
>> 
>> \end{document}
>> ---
>> 
>> My longstanding workaround is to either toggle ess-noweb-font-lock-mode off, 
>> or toggle font-lock-mode on, right after opening the file.  This makes the 
>> problem with uncomment-region disappear. But then I lose the ESS font-lock 
>> features.
>> 
>> Interestingly, this does not crash Emacs  if the file is saved with a .tex 
>> extension.  Also, in a .tex file, comment-region prefixes lines with %, but 
>> in a .Rnw file, the comment prefix is %%. I’m not sure if that’s relevant, 
>> but it might be, and I’d like to find a way to change that behavior as well.
>> 
>> Any thoughts?
>> 
>> Thanks,
>> 
>> Michael Braun
>> braunm _at_ smu.edu

__
ESS-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/ess-help


[R] Recursive feature elimination keeping the weights constant

2019-01-08 Thread Priyanka Purkayastha
Dear All,

I am trying to build a model by doing recursive elimination of weights one
by one.

This is the example matrix

ID        885038   885039   885040  885041   885042   885043   Label
weights   0.000236 0.004591 0.00017 0.018113 0.000238 0.006537 N/A
1267359   2        0        0       0        0        1        1
1295720   0        0        0       0        0        1        1
1295721   0        0        0       0        0        1        1
1295723   0        0        0       0        0        1        0
1295724   0        0        0       1        0        1        0
1296724   0        0        0       1        0        1        0
12957243  0        0        0       0        0        1        0
12957424  0        0        0       1        0        1        0
12967244  0        0        0       1        0        1        0
12673529  2        0        0       0        0        1        1
1295720   0        0        0       0        0        1        1
12957221  0        0        0       0        0        1        1

Below is the code I have written to eliminate the feature with the minimum
weight, one at a time, and build an SVM model at each step.

library(e1071)
library(caret)
library(gplots)
library(ROCR)

data <- read.csv("data.csv", header = TRUE)
rownames(data) <- data[,1]
data<-data[,-1]

for (k in 1:ncol(data))
  {
  rowMin = which.min(data[1,])
  data = data[-rowMin,]
  data = data[-1,]

  inTraining <- createDataPartition(data$Class, p = .70, list = FALSE)
  training <- data[ inTraining,]
  testing  <- data[-inTraining,]

  ## Building the model 
  svm.model <- svm(Label ~ ., data = training,
                   cross = 10, metric = "ROC", type = "eps-regression",
                   kernel = "linear", na.action = na.omit,
                   probability = TRUE)

  #prediction and ROC
  svm.model$index
  svm.pred <- predict(svm.model, testing, probability = TRUE)

  #calculating auc
  c <- as.numeric(svm.pred)
  c = c - 1
  pred <- prediction(c, testing$Label)
  perf <- performance(pred,"tpr","fpr")
  plot(perf,fpr.stop=0.1)
  auc <- performance(pred, measure = "auc")
  auc <- auc@y.values[[1]]
  print(paste(ncol(data),colnames(data)[rowMin],auc))

  }

I want my output, like

number of columns, colname with minimum weight, AUC
5 , 885039, 0.67

But I get the following error
Error in svm.default(x, y, scale = scale, ..., na.action = na.action) :
  ‘cross’ cannot exceed the number of observations!
In addition: Warning message:
In svm.default(x, y, scale = scale, ..., na.action = na.action) :
  Variable(s) ‘X885039’ and ‘X885040’ and ‘X885042’ and ‘X885043’ constant.
Cannot scale data.
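
One plausible reading of the error: inside the loop, data[-rowMin,] and data[-1,] each delete a row (an observation) on every pass rather than a column, so the data set soon has fewer rows than cross = 10 allows. The following is only a sketch of the elimination loop itself, in base R, under the assumption that the weights are kept in a separate named vector and that features, not observations, are what should be dropped. The toy values are taken from the example matrix; the names w, obs, dropped and n_rows are made up for illustration, and the model fit is left as a comment.

```r
# Weights kept apart from the observations (values from the example matrix).
weights <- c(X885038 = 0.000236, X885039 = 0.004591, X885040 = 0.00017,
             X885041 = 0.018113, X885042 = 0.000238, X885043 = 0.006537)

# A few observation rows; Label is the outcome column.
obs <- data.frame(
  X885038 = c(2, 0, 0, 0, 0, 0),
  X885039 = c(0, 0, 0, 0, 0, 0),
  X885040 = c(0, 0, 0, 0, 0, 0),
  X885041 = c(0, 0, 0, 0, 1, 1),
  X885042 = c(0, 0, 0, 0, 0, 0),
  X885043 = c(1, 1, 1, 1, 1, 1),
  Label   = c(1, 1, 1, 0, 0, 0)
)

w <- weights
dropped <- character(0)
n_rows <- integer(0)

while (length(w) > 1) {
  # Name of the feature with the smallest remaining weight.
  min_name <- names(which.min(w))
  w <- w[names(w) != min_name]

  # Drop that feature as a COLUMN; all observation rows are kept.
  features <- obs[, c(names(w), "Label"), drop = FALSE]

  # A model would be fitted and evaluated on `features` here.
  dropped <- c(dropped, min_name)
  n_rows <- c(n_rows, nrow(features))
  print(paste(ncol(features) - 1, min_name))
}
```

The number of observations stays constant across iterations in this layout, so a cross-validation setting that is valid on the first pass remains valid on the later ones. The warning about constant variables is a separate matter: columns that are all 0 or all 1 cannot be scaled.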

I



Re: [R] External validation for a hurdle model (pscl)

2019-01-08 Thread Bert Gunter
This list is (mostly) about R programming. Your query is (mostly) about
statistics. So you should post on a statistics site like
stats.stackexchange.com
not here; I am pretty sure you'll receive lots of answers there.

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Jan 8, 2019 at 10:18 AM Maria Eugenia Utgés 
wrote:

> Hi R-list,
> We have constructed a hurdle model some time ago.
> Now we were able to gather new data in the same city (38 new sites), and
> want to do an external validation to see if the model still performs ok.
> All the books and lectures I have read say it's the best validation option
> but...
> I have done a (simple) search but, since having new data for a model seems
> to be rare, I have not found anything detailed enough to reproduce or adapt
> to hurdle models.
>
> I have predicted the probability for non-zero counts
> nonzero <- 1 - predict(final, newdata = datosnuevos, type = "prob")[, 1]
>
> and the predicted mean from the count component
> countmean <- predict(final, newdata = datosnuevos, type = "count")
>
> I understand that "newdata" is taking into account the new values for the
> independent variables (environmental variables), is it?
>
> So, I have to compare the predicted values of y (calculated with the new
> values of the environmental variables) with the new observed values.
>
> That would be using the model (constructed with the old values), having as
> input the new variables, and having as output a "new" prediction, to be
> contrasted with the "new" observed y.
>
> This comparison would be by means of AUC, correct classification, and/or
> what other options? Would the results of the external validation just be
> a % of correctly predicted values? Plots?
>
> Need some guidance, sorry if the explanation was "basic" but needed to
> write it in my own words so as not to miss any detail.
>
> Thank you very much in advance,
>
> María Eugenia Utgés
>
> CeNDIE-ANLIS
> Buenos Aires
> Argentina


Re: [R] Question

2019-01-08 Thread Bert Gunter
I think it's ?install.packages

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Jan 8, 2019 at 9:50 AM Rich Shepard 
wrote:

> On Tue, 8 Jan 2019, S. Mahmoud Nasrollahi wrote:
>
> > I have run into a problem while working with some packages in R and, in
> > spite of trying R help, the internet and other resources, I could not
> > solve it. When I want to install some functions like bwplot, boxplot,
> > xyplot, I receive this sort of message: Warning in install.packages :
> > package ‘xyplot’ is not available (for R version 3.5.2) Do you know how
> > I can solve that?
>
>Yep. Those plots are part of the lattice package. You can install
> lattice
> (and latticeExtra if you want) with
>
> > installpkg("lattice")
>
> Happy plotting,
>
> Rich


[R] External validation for a hurdle model (pscl)

2019-01-08 Thread Maria Eugenia Utgés
Hi R-list,
We have constructed a hurdle model some time ago.
Now we were able to gather new data in the same city (38 new sites), and
want to do an external validation to see if the model still performs ok.
All the books and lectures I have read say it's the best validation option
but...
I have done a (simple) search but, since having new data for a model seems
to be rare, I have not found anything detailed enough to reproduce or adapt
to hurdle models.

I have predicted the probability for non-zero counts
nonzero <- 1 - predict(final, newdata = datosnuevos, type = "prob")[, 1]

and the predicted mean from the count component
countmean <- predict(final, newdata = datosnuevos, type = "count")

I understand that "newdata" is taking into account the new values for the
independent variables (environmental variables), is it?

So, I have to compare the predicted values of y (calculated with the new
values of the environmental variables) with the new observed values.

That would be using the model (constructed with the old values), having as
input the new variables, and having as output a "new" prediction, to be
contrasted with the "new" observed y.

This comparison would be by means of AUC, correct classification, and/or
what other options? Would the results of the external validation just be
a % of correctly predicted values? Plots?
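
To make the comparison step concrete, here is a minimal base-R sketch of what contrasting predictions with the new observations could look like. It is not a recommendation of which measures are appropriate for a hurdle model. The vectors observed, nonzero and expected are made-up stand-ins: nonzero plays the role of the predicted probability of a non-zero count, and expected the role of a predicted mean count (with pscl this would presumably come from predict(final, newdata = datosnuevos, type = "response"), if I read the documentation correctly). The helper auc_rank is hypothetical and computes the AUC through the rank-sum (Mann-Whitney) identity.

```r
# Made-up stand-ins for the 38 new sites (only 8 shown here).
observed <- c(0, 0, 3, 1, 0, 5, 2, 0)                   # new observed counts
nonzero  <- c(0.2, 0.4, 0.9, 0.6, 0.1, 0.8, 0.7, 0.3)   # predicted P(y > 0)
expected <- c(0.3, 0.8, 2.9, 1.2, 0.1, 4.0, 1.8, 0.5)   # predicted mean count

# AUC via the rank-sum identity: the probability that a randomly chosen
# non-zero site gets a higher score than a randomly chosen zero site.
auc_rank <- function(score, positive) {
  r  <- rank(score)
  n1 <- sum(positive)
  n0 <- sum(!positive)
  (sum(r[positive]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Zero part: how well do the probabilities separate zero from non-zero sites?
auc <- auc_rank(nonzero, observed > 0)

# Count part: distance between predicted means and observed counts.
mae  <- mean(abs(observed - expected))
rmse <- sqrt(mean((observed - expected)^2))

print(c(auc = auc, mae = mae, rmse = rmse))

# A simple visual check: observed against predicted with the identity line.
# plot(expected, observed); abline(0, 1)
```

In this toy data every non-zero site has a higher predicted probability than every zero site, so the AUC comes out as 1; real data will not be that clean.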

Need some guidance, sorry if the explanation was "basic" but needed to
write it in my own words so as not to miss any detail.

Thank you very much in advance,

María Eugenia Utgés

CeNDIE-ANLIS
Buenos Aires
Argentina


Re: [R] Question

2019-01-08 Thread Rich Shepard

On Tue, 8 Jan 2019, S. Mahmoud Nasrollahi wrote:


I have run into a problem while working with some packages in R and, in spite
of trying R help, the internet and other resources, I could not solve it.
When I want to install some functions like bwplot, boxplot, xyplot, I
receive this sort of message: Warning in install.packages : package
‘xyplot’ is not available (for R version 3.5.2) Do you know how I can
solve that?


  Yep. Those plots are part of the lattice package. You can install lattice
(and latticeExtra if you want) with


installpkg("lattice")


Happy plotting,

Rich



Re: [R] Warning message: NAs introduced by coercion

2019-01-08 Thread Michael Dewey

Dear Meriam

Your csv file did not come through, as attachments are stripped unless they 
are of certain types, and your post is very hard to read since you are posting 
in HTML. Try renaming the file to .txt and set your mailer to send plain 
text; then people may be able to help you better.


Michael

On 08/01/2019 15:35, N Meriam wrote:

I see...
Here's a portion of what my data looks like (csv file attached).
I run again and here are the results:

df4 <- read.csv(file = "mydata.csv", header = TRUE)


> require(SNPRelate)
> library(gdsfmt)
> myd <- df4
> myd <- df4
> names(myd)[-1]
[1] "marker" "X88"    "X9"     "X17"    "X25"

> myd[,1]
[1]  3  4  5  6  8 10

> # the data must be 0,1,2 with 3 as missing so you have r
> sample.id <- names(myd)[-1]
> snp.id <- myd[,1]
> snp.position <- 1:length(snp.id) # not needed for ibs
> snp.chromosome <- rep(1, each=length(snp.id)) # not needed for ibs
> snp.allele <- rep("A/G", length(snp.id)) # not needed for ibs
> # genotype data must have - in 3
> genod <- myd[,-1]
> genod[is.na(genod)] <- 3
> genod[genod=="0"] <- 0
> genod[genod=="1"] <- 2

> genod2 <- as.matrix(genod)
> head(genod2)
     marker                         X88 X9  X17 X25
[1,] "100023173|F|0-47:G>A-47:G>A" "0" "3" "3" "3"
[2,] "1043336|F|0-7:A>G-7:A>G"     "2" "0" "3" "0"
[3,] "1212218|F|0-49:A>G-49:A>G"   "0" "0" "0" "0"
[4,] "1019554|F|0-14:T>C-14:T>C"   "0" "0" "3" "0"
[5,] "100024550|F|0-16:G>A-16:G>A" "3" "3" "3" "3"
[6,] "1106702|F|0-8:C>A-8:C>A"     "0" "0" "0" "0"

> class(genod2) <- "numeric"
Warning message:
In class(genod2) <- "numeric" : NAs introduced by coercion
> head(genod2)
     marker X88 X9 X17 X25
[1,]     NA   0  3   3   3
[2,]     NA   2  0   3   0
[3,]     NA   0  0   0   0
[4,]     NA   0  0   3   0
[5,]     NA   3  3   3   3
[6,]     NA   0  0   0   0

> class(genod2) <- "numeric"
> class(genod2)
[1] "matrix"

> # read data
> filn <- "simTunesian.gds"
> snpgdsCreateGeno(filn, genmat = genod,
+  sample.id = sample.id, snp.id = snp.id,
+  snp.chromosome = snp.chromosome,
+  snp.position = snp.position,
+  snp.allele = snp.allele, snpfirstdim=TRUE)
Error in snpgdsCreateGeno(filn, genmat = genod, sample.id = sample.id,  :
  is.matrix(genmat) is not TRUE

Thanks,
Meriam

On Tue, Jan 8, 2019 at 9:02 AM PIKAL Petr  wrote:


Hi

see in line


-Original Message-
From: R-help  On Behalf Of N Meriam
Sent: Tuesday, January 8, 2019 3:08 PM
To: r-help@r-project.org
Subject: [R] Warning message: NAs introduced by coercion

Dear all,

I have a .csv file called df4. (15752 obs. of 264 variables).
I apply this code but couldn't continue further other analyses, a warning
message keeps coming up. Then, I want to determine max and min
similarity values,
heat map plot, cluster...etc


require(SNPRelate)
library(gdsfmt)
myd <- read.csv(file = "df4.csv", header = TRUE)
names(myd)[-1]

myd[,1]

myd[1:10, 1:10]

  # the data must be 0,1,2 with 3 as missing so you have r

sample.id <- names(myd)[-1]
snp.id <- myd[,1]
snp.position <- 1:length(snp.id) # not needed for ibs
snp.chromosome <- rep(1, each=length(snp.id)) # not needed for ibs
snp.allele <- rep("A/G", length(snp.id)) # not needed for ibs

# genotype data must have - in 3

genod <- myd[,-1]
genod[is.na(genod)] <- 3
genod[genod=="0"] <- 0
genod[genod=="1"] <- 2
genod[1:10,1:10]
genod <- as.matrix(genod)


A matrix can hold only one type of data, so you probably changed it to
character by such a construction.


class(genod) <- "numeric"


This tries to change all "numeric" values to numbers but if it cannot it
sets it to NA.

something like


head(iris)

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1  5.1 3.5  1.4 0.2  setosa
2  4.9 3.0  1.4 0.2  setosa
3  4.7 3.2  1.3 0.2  setosa
4  4.6 3.1  1.5 0.2  setosa
5  5.0 3.6  1.4 0.2  setosa
6  5.4 3.9  1.7 0.4  setosa

ir <-head(iris)
irm <- as.matrix(ir)
head(irm)

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 "5.1"        "3.5"       "1.4"        "0.2"       "setosa"
2 "4.9"        "3.0"       "1.4"        "0.2"       "setosa"
3 "4.7"        "3.2"       "1.3"        "0.2"       "setosa"
4 "4.6"        "3.1"       "1.5"        "0.2"       "setosa"
5 "5.0"        "3.6"       "1.4"        "0.2"       "setosa"
6 "5.4"        "3.9"       "1.7"        "0.4"       "setosa"

class(irm) <- "numeric"

Warning message:
In class(irm) <- "numeric" : NAs introduced by coercion

head(irm)

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1  5.1 3.5  1.4 0.2  NA
2  4.9 3.0  1.4 0.2  NA
3  4.7 3.2  1.3 0.2  NA
4  4.6 3.1  1.5 0.2  NA
5  5.0 3.6  1.4 0.2  NA
6  

Re: [R] Mailinglist

2019-01-08 Thread Richard M. Heiberger
please post a sample dataset based on
mydata <- rbind(participant1[1:3,], participant2[1:4,], participant3[1:5,])

I think mydata at this point is anonymized enough that you can post
it.  Verify with whomever, though.
You might want to change those random U* names of the participants to
c("A","B","C")

post dput(mydata) to the list, in the body of the email.
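
A minimal sketch of what that could look like, with made-up stand-ins for the three participant subsets (the real ones would come from your own subset() calls). The column names follow the ones in your message; the values and the shortened probe names are invented for illustration.

```r
# Made-up stand-ins for three per-participant subsets.
make_participant <- function(id, n) {
  data.frame(
    participants  = rep(id, n),
    probetype     = rep(c("ScreenProbe", "ActivityProbe", "CallLogProbe"),
                        length.out = n),
    valuedetailed = rep(c("True", "none", NA), length.out = n),
    stringsAsFactors = FALSE
  )
}
participant1 <- make_participant("U_0001", 6)
participant2 <- make_participant("U_0002", 6)
participant3 <- make_participant("U_0003", 6)

# A few rows from each participant, stacked into one small data frame.
mydata <- rbind(participant1[1:3, ], participant2[1:4, ], participant3[1:5, ])

# Replace the long U_* identifiers with anonymous labels A, B, C.
ids <- unique(mydata$participants)
mydata$participants <- c("A", "B", "C")[match(mydata$participants, ids)]
rownames(mydata) <- NULL

# dput() prints R code that recreates the object exactly; paste that
# output into the body of the email.
dput(mydata)
```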

I think this and the details you posted today will enable one of us to
give you a much simpler script, or maybe even function, to process
your full dataset.
Please confirm that participant1 and p1 are the same object.
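
In the meantime, here is a sketch of the direction a simpler script might take, assuming the columns are named as in your message (participants, probetype, valuedetailed). The data frame d below is invented for illustration. The idea is that a two-way table() gives the counts for every participant at once, so no per-participant subset() is needed.

```r
# Invented example data with the column names from the message.
d <- data.frame(
  participants  = c("A", "A", "A", "A", "B", "B", "B", "C", "C"),
  probetype     = c("ScreenProbe", "ScreenProbe", "ActivityProbe",
                    "CallLogProbe", "ScreenProbe", "ActivityProbe",
                    "ActivityProbe", "CallLogProbe", "CallLogProbe"),
  valuedetailed = c("True", "False", "none", NA, "True", "low", "high",
                    NA, NA),
  stringsAsFactors = FALSE
)

# One row per participant, one column per probe type.
probe_counts <- table(d$participants, d$probetype)
print(probe_counts)

# One row per participant, one column per detailed value
# (True/False for the screen probe, none/low/high for the activity probe).
# Rows with a missing valuedetailed are not counted.
detail_counts <- table(d$participants, d$valuedetailed)
print(detail_counts)
```

A participant with no SMS probe rows would simply show 0 in that column, provided at least one participant has such rows.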

Rich

That

On Tue, Jan 8, 2019 at 10:23 AM Rachel Thompson
 wrote:
>
> Hi
>
> Thank you for your help and suggestions!
> I have tried a few things and ask help from lots of people online!
>
> My problem is that I am not able to share the database! I tried to recreate 
> one but I wasn't successful.
> So I found a way to analyze each subject individually, but I do not know how 
> to perform the same steps for all of the subjects at once.
> But I just wanted to share what I did, since you tried to help me!
>
> This is what I did.
>
> I stored all the column names in a vector named "Names"
>
> names=c("participants","id","participantid","key","probetype","time","timespecific","value","valuespecified","valuedetailed","period","periodspecified")
>
>
>
> colnames(gmoji_passivedata)=names
>
>
>
> I used this code to find the number of participants in the dataset
>
>
>
> length(unique(gmoji_passivedata$participants))
>
> The number of participants is 44
>
>
>
> I used this code to find the unique ID for every participant
>
>
>
> library(plyr)
>
> > count(gmoji_passivedata, "participants")
>
> From the dataset, I selected one participant, "U_...".
>
> I used subset data
>
>
>
> participant1=subset(gmoji_passivedata,participants=="U_0139cf62_e615_41f7_a4cc_878c0490c510")
>
>
>
>
>
> With the table code
>
> table(p1$probetype) I found the counts of all the different values of the 
> probe type column
>
>
>
> edu.mit.media.funf.probe.builtin.ActivityProbe   16167
> edu.mit.media.funf.probe.builtin.BluetoothProbe    405
> edu.mit.media.funf.probe.builtin.CallLogProbe     1427
> edu.mit.media.funf.probe.builtin.ScreenProbe      1791
> edu.mit.media.funf.probe.builtin.WifiProbe        5386
>
>
>
> The count for the call log probe for the selected participant is 1427
>
>
>
> There was only one participant with sms probe for the rest of the participant 
> the count of sms probe is 0
>
>
>
> For the screen probe and activity probe I found the total count (1791 and 
> 16167)
>
>
>
> For screen probe, I used a subset code and set the value detailed column to 
> false and true
>
>
>
> screenon_false=subset(p1,valuedetailed=="False") (this participant 875)
>
> screenon_true=subset(p1,valuedetailed=="True")   (this participant 916)
>
>
>
> and for activity probe to none, low and high to find the required values
>
>
>
> activity_none=subset(p1,valuedetailed=="none")   (this participant 12900)
>
> activity_low=subset(p1,valuedetailed=="low") (this participant 1050)
>
> activity_high=subset(p1,valuedetailed=="high")   (this participant 2217)
>
>
>
>
>
> I did this for each participant
>
>
> Best,
>
>
> Rachel
>
>
> On Sun, Jan 6, 2019 at 2:48 PM Richard M. Heiberger  wrote:
>>
>> Questions like this
>> 1. I want to have a summary of how many times a specific subject got called
>> (CallLogProbe)
>>
>> suggest that you should look at the table function.  See
>> ?table
>> and run the examples.
>> They show how to get one-way frequency tables and two-way contingency tables.
>>
>> If you have followup questions for the list, you can use the examples in 
>> ?table as your starting point.
>> That way you don't need to worry about sharing your own data.
>>
>>
>> On Sun, Jan 6, 2019 at 1:59 PM Rachel Thompson 
>>  wrote:
>>>
>>> Hi Rich,
>>>
>>> I really feel lost at this point.
>>> I need a code that helps me count the phone activity level(high/low/none),
>>> the screen activity (on/off) and the amount calls and SMS of each subject.
>>>
>>> 1. I want to have a summary of how many times a specific subject got called
>>> (CallLogProbe)
>>> 2. I want to have a summary of how many times a specific subject got a text
>>> message (SMS probe)
>>> 3. I want to have a summary of how many times a specific subject
>>> - Turned their screen on - True  (ScreenProbe)
>>> - Or did not turn their screen on - False (ScreenProbe)
>>> 4.  I want to have a summary of the activity level of a specific subject
>>> - Activity level - none (ActivityProbe)
>>> - Activity level- low (ActivityProbe)
>>> - Activity level - High  

Re: [R] error in plotting model from kernlab

2019-01-08 Thread Luigi Marongiu
Hi,
the maintainer hasn't answered yet. The problem with 'acc' is that yes
the objects are not of the same length but they should be: according
to the manual, ' table(pred, df$cons)' would return a 2x2 matrix of
the results. This is not the case, so there is a problem with the
model -- that is why there is no plotting either -- even if an object
of class ksvm had been created.
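
For what it is worth, the length mismatch can be reproduced in base R without kernlab. The sketch below uses a made-up probability matrix prob in place of the predict() output and shows two things: a class-probability matrix has one column per class, so it holds more values than there are observations, and a 2x2 table only appears once the probabilities are turned into one predicted label per row. It also maps 0/1 codes to labels with levels and labels, since assigning factor(c("negative", "positive")) to a whole column recycles the two values alternately down the rows. If I read the kernlab documentation correctly, predict() with its default type returns class labels directly, which would be the simpler route.

```r
# Outcome coded 0/1, as in the cons column.
cons <- c(0, 1, 0, 0, 1, 0)

# Map the codes to labels; this keeps each row's own outcome.
truth <- factor(cons, levels = c(0, 1), labels = c("negative", "positive"))

# Made-up class probabilities: one row per observation, one column per class.
prob <- matrix(c(0.9, 0.1,
                 0.2, 0.8,
                 0.7, 0.3,
                 0.6, 0.4,
                 0.4, 0.6,
                 0.3, 0.7),
               ncol = 2, byrow = TRUE,
               dimnames = list(NULL, levels(truth)))

# The matrix holds rows x columns values, hence the table() error.
print(c(length_prob = length(prob), length_truth = length(truth)))

# Pick the most probable class in each row to get one label per observation.
pred_class <- factor(colnames(prob)[max.col(prob, ties.method = "first")],
                     levels = levels(truth))

# Now both arguments have the same length and the result is 2 x 2.
acc <- table(predicted = pred_class, observed = truth)
print(acc)
```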

On Tue, Jan 8, 2019 at 4:12 PM PIKAL Petr  wrote:
>
> Hi
>
> I cannot help you with kernlab
>
> > >  pred = predict(mod, df, type = "probabilities")
> > >  acc = table(pred, df$cons)
> > Error in table(pred, df$cons) : all arguments must have the same length
> > which again is weird since mod, df and df$cons are made from the same
> > dataframe.
>
> Why not check length of those objects?
>
> length(pred)
> length(df$cons)
>
> > > plot(mod, data = df)
> > > kernlab::plot(mod, data = df)
> > but I get this error:
> >
> > Error in .local(x, ...) :
> >   Only plots of classification ksvm objects supported
> >
>
> Seems self-explanatory to me. What did the maintainer say about it?
>
> Cheers
> Petr
>
>
> > -Original Message-
> > From: R-help  On Behalf Of Luigi Marongiu
> > Sent: Monday, January 7, 2019 1:26 PM
> > To: r-help 
> > Subject: [R] error in plotting model from kernlab
> >
> > Dear all,
> > I have a set of data in this form:
> > > str 
> > 'data.frame': 1574 obs. of  14 variables:
> >  $ serial: int  12751 14157 7226 15663 11088 10464 1003 10427 11934 3999
> > ...
> >  $ plate : int  43 46 22 50 38 37 3 37 41 11 ...
> >  $ well  : int  79 333 314 303 336 96 235 59 30 159 ...
> >  $ sample: int  266 295 151 327 231 218 21 218 249 84 ...
> >  $ target: chr  "HEV 2-AI5IQWR" "Dientamoeba fragilis-AIHSPMK" "Astro
> > 2 Liu-AI20UKB" "C difficile GDH-AIS086J" ...
> >  $ ori.ct: num  0 33.5 0 0 0 ...
> >  $ ct.out: int  0 1 0 0 0 0 0 1 0 0 ...
> >  $ mr: num  -0.002 0.109 0.002 0 0.001 0.006 0.015 0.119 0.003 0.004 ...
> >  $ fcn   : num  44.54 36.74 6.78 43.09 44.87 ...
> >  $ mr.out: int  0 1 0 0 0 0 0 1 0 0 ...
> >  $ oper.a: int  0 1 0 0 0 0 0 1 0 0 ...
> >  $ oper.b: int  0 1 0 0 0 0 0 1 0 0 ...
> >  $ oper.c: int  0 1 0 0 0 0 0 1 0 0 ...
> >  $ cons  : int  0 1 0 0 0 0 0 1 0 0 ...
> > from which I have selected two numerical variables corresponding to x
> > and y in a Cartesian plane and one outcome variable (z):
> > > df = subset(t.data, select = c(mr, fcn, cons))
> > >  df$cons = factor(c("negative", "positive"))
> > > head(df)
> >   mr   fcn cons
> > 1 -0.002 44.54 negative
> > 2  0.109 36.74 positive
> > 3  0.002  6.78 negative
> > 4  0.000 43.09 positive
> > 5  0.001 44.87 negative
> > 6  0.006  2.82 positive
> >
> > I created an SVM model with the kernlab package:
> > > mod = ksvm(cons ~ mr+fcn, # I prefer this to the more canonical "." but the outcome is the same
> > data = df,
> > type = "C-bsvc",
> > kernel = "rbfdot",
> > kpar = "automatic",
> > C = 10,
> > prob.model = TRUE)
> >
> > > mod
> > Support Vector Machine object of class "ksvm"
> >
> > SV type: C-bsvc  (classification)
> >  parameter : cost C = 10
> >
> > Gaussian Radial Basis kernel function.
> >  Hyperparameter : sigma =  42.0923201429106
> >
> > Number of Support Vectors : 1439
> >
> > Objective Function Value : -12873.45
> > Training error : 0.39263
> > Probability model included.
> >
> > First of all, I am not sure if the model worked because 1439 support
> > vectors out of 1574 data points means that over 90% of the data is
> > required to fix the hyperplane. this does not look like a model but a
> > patch. Secondly, the prediction is rubbish -- but this is another
> > story -- and when I try to create a confusion table of the processed
> > data I get:
> > >  pred = predict(mod, df, type = "probabilities")
> > >  acc = table(pred, df$cons)
> > Error in table(pred, df$cons) : all arguments must have the same length
> > which again is weird since mod, df and df$cons are made from the same
> > dataframe.
> >
> > Coming to the actual error, I tried to plot the model with:
> > > plot(mod, data = df)
> > > kernlab::plot(mod, data = df)
> > but I get this error:
> >
> > Error in .local(x, ...) :
> >   Only plots of classification ksvm objects supported
> >
> > Would you know what I am missing?
> > Thank you
> > --
> > Best regards,
> > Luigi
> >

Re: [R] Warning message: NAs introduced by coercion

2019-01-08 Thread N Meriam
I see...
Here's a portion of what my data looks like (csv file attached).
I run again and here are the results:

df4 <- read.csv(file = "mydata.csv", header = TRUE)

> require(SNPRelate)
> library(gdsfmt)
> myd <- df4
> myd <- df4
> names(myd)[-1]
[1] "marker" "X88"    "X9"     "X17"    "X25"

> myd[,1]
[1]  3  4  5  6  8 10

> # the data must be 0,1,2 with 3 as missing so you have r
> sample.id <- names(myd)[-1]
> snp.id <- myd[,1]
> snp.position <- 1:length(snp.id) # not needed for ibs
> snp.chromosome <- rep(1, each=length(snp.id)) # not needed for ibs
> snp.allele <- rep("A/G", length(snp.id)) # not needed for ibs
> # genotype data must have - in 3
> genod <- myd[,-1]
> genod[is.na(genod)] <- 3
> genod[genod=="0"] <- 0
> genod[genod=="1"] <- 2

> genod2 <- as.matrix(genod)
> head(genod2)
     marker                         X88 X9  X17 X25
[1,] "100023173|F|0-47:G>A-47:G>A" "0" "3" "3" "3"
[2,] "1043336|F|0-7:A>G-7:A>G"     "2" "0" "3" "0"
[3,] "1212218|F|0-49:A>G-49:A>G"   "0" "0" "0" "0"
[4,] "1019554|F|0-14:T>C-14:T>C"   "0" "0" "3" "0"
[5,] "100024550|F|0-16:G>A-16:G>A" "3" "3" "3" "3"
[6,] "1106702|F|0-8:C>A-8:C>A"     "0" "0" "0" "0"

> class(genod2) <- "numeric"
Warning message:
In class(genod2) <- "numeric" : NAs introduced by coercion
> head(genod2)
     marker X88 X9 X17 X25
[1,]     NA   0  3   3   3
[2,]     NA   2  0   3   0
[3,]     NA   0  0   0   0
[4,]     NA   0  0   3   0
[5,]     NA   3  3   3   3
[6,]     NA   0  0   0   0

> class(genod2) <- "numeric"
> class(genod2)
[1] "matrix"

> # read data
> filn <- "simTunesian.gds"
> snpgdsCreateGeno(filn, genmat = genod,
+  sample.id = sample.id, snp.id = snp.id,
+  snp.chromosome = snp.chromosome,
+  snp.position = snp.position,
+  snp.allele = snp.allele, snpfirstdim=TRUE)
Error in snpgdsCreateGeno(filn, genmat = genod, sample.id = sample.id,  :
  is.matrix(genmat) is not TRUE
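
Looking at the transcript again, two things may be worth checking; the sketch below illustrates both in base R with a made-up three-marker data frame (it does not call SNPRelate). First, the call passes genod, which is still a data frame, while the matrix is genod2; a data frame is not a matrix, so is.matrix() is FALSE for it. Second, names(myd)[-1] still lists "marker", which suggests the marker column is carried into genod and is what turns the whole matrix into character.

```r
# Made-up data in the same layout: a marker column plus genotype columns.
myd <- data.frame(
  marker = c("100023173|F|0-47:G>A-47:G>A",
             "1043336|F|0-7:A>G-7:A>G",
             "1212218|F|0-49:A>G-49:A>G"),
  X88 = c(0, 1, 0),
  X9  = c(NA, 0, 0),
  X17 = c(NA, NA, 0),
  stringsAsFactors = FALSE
)

# Keep the marker names as SNP ids and leave them OUT of the genotype data.
snp.id <- myd$marker
genod  <- myd[, names(myd) != "marker"]

# Recode: missing -> 3, 1 -> 2 (as in the original script).
genod[is.na(genod)] <- 3
genod[genod == 1]   <- 2

# A data frame is not a matrix, even if every column is numeric.
print(is.matrix(genod))

# Converting an all-numeric data frame gives a numeric matrix, no NAs.
genmat <- as.matrix(genod)
rownames(genmat) <- snp.id
print(is.matrix(genmat))
print(storage.mode(genmat))
```

With the real data, genmat (and a sample.id vector that matches its columns) would be the objects to hand to snpgdsCreateGeno.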

Thanks,
Meriam

On Tue, Jan 8, 2019 at 9:02 AM PIKAL Petr  wrote:

> Hi
>
> see in line
>
> > -Original Message-
> > From: R-help  On Behalf Of N Meriam
> > Sent: Tuesday, January 8, 2019 3:08 PM
> > To: r-help@r-project.org
> > Subject: [R] Warning message: NAs introduced by coercion
> >
> > Dear all,
> >
> > I have a .csv file called df4. (15752 obs. of 264 variables).
> > I apply this code but couldn't continue further other analyses, a warning
> > message keeps coming up. Then, I want to determine max and min
> > similarity values,
> > heat map plot, cluster...etc
> >
> > > require(SNPRelate)
> > > library(gdsfmt)
> > > myd <- read.csv(file = "df4.csv", header = TRUE)
> > > names(myd)[-1]
> > myd[,1]
> > > myd[1:10, 1:10]
> >  # the data must be 0,1,2 with 3 as missing so you have r
> > > sample.id <- names(myd)[-1]
> > > snp.id <- myd[,1]
> > > snp.position <- 1:length(snp.id) # not needed for ibs
> > > snp.chromosome <- rep(1, each=length(snp.id)) # not needed for ibs
> > > snp.allele <- rep("A/G", length(snp.id)) # not needed for ibs
> > # genotype data must have - in 3
> > > genod <- myd[,-1]
> > > genod[is.na(genod)] <- 3
> > > genod[genod=="0"] <- 0
> > > genod[genod=="1"] <- 2
> > > genod[1:10,1:10]
> > > genod <- as.matrix(genod)
>
> A matrix can hold only one type of data, so you probably changed it to
> character by such a construction.
>
> > > class(genod) <- "numeric"
>
> This tries to convert every value to a number; any value that cannot be
> converted is set to NA.
>
> Something like
>
> > head(iris)
>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
> 1          5.1         3.5          1.4         0.2  setosa
> 2          4.9         3.0          1.4         0.2  setosa
> 3          4.7         3.2          1.3         0.2  setosa
> 4          4.6         3.1          1.5         0.2  setosa
> 5          5.0         3.6          1.4         0.2  setosa
> 6          5.4         3.9          1.7         0.4  setosa
> > ir <- head(iris)
> > irm <- as.matrix(ir)
> > head(irm)
>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
> 1 "5.1"        "3.5"       "1.4"        "0.2"       "setosa"
> 2 "4.9"        "3.0"       "1.4"        "0.2"       "setosa"
> 3 "4.7"        "3.2"       "1.3"        "0.2"       "setosa"
> 4 "4.6"        "3.1"       "1.5"        "0.2"       "setosa"
> 5 "5.0"        "3.6"       "1.4"        "0.2"       "setosa"
> 6 "5.4"        "3.9"       "1.7"        "0.4"       "setosa"
> > class(irm) <- "numeric"
> Warning message:
> In class(irm) <- "numeric" : NAs introduced by coercion
> > head(irm)
>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
> 1          5.1         3.5          1.4         0.2      NA
> 2          4.9         3.0          1.4         0.2      NA
> 3          4.7         3.2          1.3         0.2      NA
> 4          4.6         3.1          1.5         0.2      NA
> 5          5.0         3.6          1.4         0.2      NA
> 6          5.4         3.9          1.7         0.4      NA
> >
>
> Cheers
> Petr
>
>

Re: [R] Mailinglist

2019-01-08 Thread Rachel Thompson
Hi

Thank you for your help and suggestions!
I have tried a few things and asked lots of people online for help.

My problem is that I am not able to share the database. I tried to recreate
one but I wasn't successful.
So I found a way to analyze each subject individually, but I do not know
how to perform the same steps for all of the subjects at once.
I just wanted to share what I did, since you tried to help me.

This is what I did.

I stored all the column names in a vector called "names":

names=c("participants","id","participantid","key","probetype","time","timespecific","value","valuespecified","valuedetailed","period","periodspecified")

colnames(gmoji_passivedata)=names

I used this code to find the number of participants in the dataset:

length(unique(gmoji_passivedata$participants))

The number of participants is 44.

I used this code to find the unique ID of every participant:

library(plyr)

count(gmoji_passivedata,"participants")

From the dataset, I selected one participant, "U_...", using subset:

p1=subset(gmoji_passivedata,participants=="U_0139cf62_e615_41f7_a4cc_878c0490c510")

With table(p1$probetype) I found the counts of all the different values of the
probe type column:

edu.mit.media.funf.probe.builtin.ActivityProbe    16167
edu.mit.media.funf.probe.builtin.BluetoothProbe     405
edu.mit.media.funf.probe.builtin.CallLogProbe      1427
edu.mit.media.funf.probe.builtin.ScreenProbe       1791
edu.mit.media.funf.probe.builtin.WifiProbe         5386

The count for the call log probe for the selected participant is 1427.

Only one participant had any SMS probe records; for the rest of the
participants the SMS probe count is 0.

For the screen probe and activity probe I found the total counts (1791 and
16167).

For the screen probe, I used subset and set the valuedetailed column to
"False" and "True":

screenon_false=subset(p1,valuedetailed=="False") (this participant 875)
screenon_true=subset(p1,valuedetailed=="True")   (this participant 916)

and for the activity probe to "none", "low" and "high" to find the required values:

activity_none=subset(p1,valuedetailed=="none")   (this participant 12900)
activity_low=subset(p1,valuedetailed=="low")     (this participant 1050)
activity_high=subset(p1,valuedetailed=="high")   (this participant 2217)

I did this for each participant.
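
The per-participant steps above can probably be done for everyone at once with a two-way table, as the ?table suggestions in this thread hint. The sketch below uses a small made-up data frame: the column names follow the message, but the participant IDs and the shortened probe names are assumptions for illustration (the real probetype values are the long edu.mit.media.funf... strings).

```r
# Hypothetical toy data with the same column names as in the message.
d <- data.frame(
  participants  = c("U_a", "U_a", "U_a", "U_b", "U_b", "U_b"),
  probetype     = c("ScreenProbe", "ScreenProbe", "ActivityProbe",
                    "ScreenProbe", "ActivityProbe", "ActivityProbe"),
  valuedetailed = c("True", "False", "low", "True", "none", "high"),
  stringsAsFactors = FALSE
)

# One row per participant, one column per probe type.
by_probe <- table(d$participants, d$probetype)

# Screen on/off counts per participant.
screen <- subset(d, probetype == "ScreenProbe")
screen_counts <- table(screen$participants, screen$valuedetailed)

# Activity level counts per participant.
activity <- subset(d, probetype == "ActivityProbe")
activity_counts <- table(activity$participants, activity$valuedetailed)

by_probe
screen_counts
activity_counts
```

Filtering on probetype before tabulating valuedetailed keeps values from other probes out of the counts; prop.table(activity_counts, 1) would turn the counts into per-participant proportions if percentages are needed later.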


Best,


Rachel

On Mon, Jan 7, 2019 at 4:07 AM Hasan Diwan  wrote:

> dput(mydata[sample(nrow(mydata), 25), ]) is probably going to be more
> representative. -- H
>
> On Mon, 7 Jan 2019 at 00:56, PIKAL Petr  wrote:
>
> > Hi Rachel.
> >
> > You have already got several suggestions, but results depend on the
> > structure of your data. The best way from your side would be to copy a
> > part of your data directly into the email, and the preferable way is to
> > use "dput".
> >
> > Assuming your data, already transferred to R, are called "mydata",
> > you can just copy the output of
> >
> > dput(mydata[1:30,])
> >
> > to your next mail.
> >
> > Cheers
> > Petr
> >
> >
> > > -Original Message-
> > > From: R-help  On Behalf Of Rachel Thompson
> > > Sent: Sunday, January 6, 2019 7:49 PM
> > > To: Rich Shepard 
> > > Cc: r-help mailing list 
> > > Subject: Re: [R] Mailinglist
> > >
> > > Hi Rich,
> > >
> > > I really feel lost at this point.
> > > I need a code that helps me count the phone activity level(high/low/none),
> > > the screen activity (on/off) and the amount calls and SMS of each subject.
> > >
> > > 1. I want to have a summary of how many times a specific subject got called
> > > (CallLogProbe)
> > > 2. I want to have a summary of how many times a specific subject got a text
> > > message (SMS probe)
> > > 3. I want to have a summary of how many times a specific subject
> > > - Turned their screen on - True  (ScreenProbe)
> > > - Or did not turn their screen on - False (ScreenProbe)
> > > 4.  I want to have a summary of the activity level of a specific subject
> > > - Activity level - none (ActivityProbe)
> > > - Activity level- low (ActivityProbe)
> > > - Activity level - High  (ActivityProbe)
> > >
> > > I want to do this for all the 36 subjects(Participants).
> > > In the end, I have to define the percentages and cutoff points of what is
> > > considered low-medium-high, based on what the results of all the subjects
> > > are. So I am able to see if a specific subject has low social interaction
> > > etc.
> > >
> > > I have tried a lot, with the help of youtube etc. But I feel as if I am
> > > trying a lot of things but without clearly knowing if it is the right step.
> > > I have a csv file, but I need to 

Re: [R] Mailinglist

2019-01-08 Thread Rachel Thompson
On Mon, Jan 7, 2019 at 3:56 AM PIKAL Petr  wrote:

> Hi Rachel.
>
> You have already got several suggestions, but results depend on the
> structure of your data. The best way from your side would be to copy a
> part of your data directly into the email, and the preferable way is to
> use "dput".
>
> Assuming your data, already transferred to R, are called "mydata",
> you can just copy the output of
>
> dput(mydata[1:30,])
>
> to your next mail.
>
> Cheers
> Petr
>
>
> > -Original Message-
> > From: R-help  On Behalf Of Rachel Thompson
> > Sent: Sunday, January 6, 2019 7:49 PM
> > To: Rich Shepard 
> > Cc: r-help mailing list 
> > Subject: Re: [R] Mailinglist
> >
> > Hi Rich,
> >
> > I really feel lost at this point.
> > I need a code that helps me count the phone activity level(high/low/none),
> > the screen activity (on/off) and the amount calls and SMS of each subject.
> >
> > 1. I want to have a summary of how many times a specific subject got called
> > (CallLogProbe)
> > 2. I want to have a summary of how many times a specific subject got a text
> > message (SMS probe)
> > 3. I want to have a summary of how many times a specific subject
> > - Turned their screen on - True  (ScreenProbe)
> > - Or did not turn their screen on - False (ScreenProbe)
> > 4.  I want to have a summary of the activity level of a specific subject
> > - Activity level - none (ActivityProbe)
> > - Activity level- low (ActivityProbe)
> > - Activity level - High  (ActivityProbe)
> >
> > I want to do this for all the 36 subjects(Participants).
> > In the end, I have to define the percentages and cutoff points of what is
> > considered low-medium-high, based on what the results of all the subjects
> > are. So I am able to see if a specific subject has low social interaction
> > etc.
> >
> > I have tried a lot, with the help of youtube etc. But I feel as if I am
> > trying a lot of things but without clearly knowing if it is the right step.
> > I have a csv file, but I need to look into what Jeff said about the guides.
> > So I am able to share it.
> >
> > Best.
> >
> >
> > On Sun, Jan 6, 2019 at 11:51 AM Rich Shepard 
> > wrote:
> >
> > > On Sun, 6 Jan 2019, Rachel Thompson wrote:
> > >
> > > > I am an intern from Amsterdam and I 

Re: [R] Mailinglist

2019-01-08 Thread Rachel Thompson
On Mon, Jan 7, 2019 at 1:28 AM K. Elo  wrote:

> Hi!
>
> Not having a data chunk prevents me from testing abit, but maybe you
> should take a look on:
>
> ?table
> ?xtabs
>
> to start with.
>
> But as already suggested by other users, a small data set would be of
> great help :)
>
> HTH,
> Kimmo
>
> su, 2019-01-06 kello 13:49 -0500, Rachel Thompson kirjoitti:
> > Hi Rich,
> >
> > I really feel lost at this point.
> > I need a code that helps me count the phone activity
> > level(high/low/none),
> > the screen activity (on/off) and the amount calls and SMS of each
> > subject.
> >
> > 1. I want to have a summary of how many times a specific subject got
> > called
> > (CallLogProbe)
> > 2. I want to have a summary of how many times a specific subject got
> > a text
> > message (SMS probe)
> > 3. I want to have a summary of how many times a specific subject
> > - Turned their screen on - True  (ScreenProbe)
> > - Or did not turn their screen on - False (ScreenProbe)
> > 4.  I want to have a summary of the activity level of a specific
> > subject
> > - Activity level - none (ActivityProbe)
> > - Activity level- low (ActivityProbe)
> > - Activity level - High  (ActivityProbe)
> >
> > I want to do this for all the 36 subjects(Participants).
> > In the end, I have to define the percentages and cutoff points of
> > what is
> > considered low-medium-high, based on what the results of all the
> > subjects
> > are. So I am able to see if a specific subject has low social
> > interaction
> > etc.
> >
> > I have tried a lot, with the help of youtube etc. But I feel as if I
> > am
> > trying a lot of things but without clearly knowing if it is the right
> > step.
> > I have a csv file, but I need to look into what Jeff said about the
> > guides.
> > So I am able to share it.
> >
> > Best.
> >
> >
> > On Sun, Jan 6, 2019 at 11:51 AM Rich Shepard <
> > rshep...@appl-ecosys.com>
> > wrote:
> >
> > > On Sun, 6 Jan 2019, Rachel Thompson wrote:
> > >
> > > > I am an intern from Amsterdam and I have to do an analysis in R.
> > > > I spoke
> > > > to my professor in Amsterdam and my supervisor's here in Boston.
> > > > But they
> > > > are to busy to help. I informed them from the start that I am not
> > >
> > > familiar
> > > 

Re: [R] Mailinglist

2019-01-08 Thread Rachel Thompson
On Sun, Jan 6, 2019 at 2:48 PM Richard M. Heiberger  wrote:

> Questions like this
> 1. I want to have a summary of how many times a specific subject got called
> (CallLogProbe)
>
> suggest that you should look at the table function.  See
> ?table
> and run the examples.
> They show how to get one-way frequency tables and two-way contingency
> tables.
>
> If you have followup questions for the list, you can use the examples in
> ?table as your starting point.
> That way you don't need to worry about sharing your own data.
>
>
> On Sun, Jan 6, 2019 at 1:59 PM Rachel Thompson <
> rachel.thomp...@student.uva.nl> wrote:
>
>> Hi Rich,
>>
>> I really feel lost at this point.
>> I need a code that helps me count the phone activity level(high/low/none),
>> the screen activity (on/off) and the amount calls and SMS of each subject.
>>
>> 1. I want to have a summary of how many times a specific subject got
>> called
>> (CallLogProbe)
>> 2. I want to have a summary of how many times a specific subject got a
>> text
>> message (SMS probe)
>> 3. I want to have a summary of how many times a specific subject
>> - Turned their screen on - True  (ScreenProbe)
>> - Or did not turn their screen on - False (ScreenProbe)
>> 4.  I want to have a summary of the activity level of a specific subject
>> - Activity level - none (ActivityProbe)
>> - Activity level- low (ActivityProbe)
>> - Activity level - High  (ActivityProbe)
>>
>> I want to do this for all the 36 subjects(Participants).
>> In the end, I have to define the percentages and cutoff points of what is
>> considered low-medium-high, based on what the results of all the subjects
>> are. So I am able to see if a specific subject has low social interaction
>> etc.
>>
>> I have tried a lot, with the help of youtube etc. But I feel as if I am
>> trying a lot of things but without clearly knowing if it is the right
>> step.
>> I have a csv file, but I need to look into what Jeff said about the
>> guides.
>> So I am able to share it.
>>
>> Best.
>>
>>
>> On Sun, Jan 6, 2019 at 11:51 AM Rich Shepard 
>> wrote:
>>
>> > On Sun, 6 Jan 2019, Rachel Thompson wrote:
>> >
>> > > I am an intern from Amsterdam and I have to do an analysis in R. I
>> spoke
>> > > to my professor in 

Re: [R] error in plotting model from kernlab

2019-01-08 Thread PIKAL Petr
Hi

I cannot help you with kernlab

> >  pred = predict(mod, df, type = "probabilities")
> >  acc = table(pred, df$cons)
> Error in table(pred, df$cons) : all arguments must have the same length
> which again is weird since mod, df and df$cons are made from the same
> dataframe.

Why not check the lengths of those objects?

length(pred)
length(df$cons)
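
To illustrate why the lengths can differ, here is a small base-R sketch that does not use kernlab at all. The matrix below is a made-up stand-in for a class-probability matrix (one row per case, one column per class); its values and the object names are assumptions for illustration.

```r
truth <- factor(c("negative", "positive", "negative", "positive"))

# Stand-in for a probability matrix: one row per case, one column per class.
pred <- matrix(c(0.9, 0.1,
                 0.2, 0.8,
                 0.6, 0.4,
                 0.3, 0.7),
               ncol = 2, byrow = TRUE,
               dimnames = list(NULL, c("negative", "positive")))

length(pred)    # rows times columns, so twice the number of cases here
length(truth)   # one entry per case

# Reduce each row to its most probable class before tabulating.
pred_class <- factor(colnames(pred)[max.col(pred)], levels = levels(truth))
acc <- table(pred_class, truth)
acc
```

As far as I can tell from the kernlab documentation, predict() with type = "response" returns class labels directly, which would avoid the extra step, but please check ?predict.ksvm for your version.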

> > plot(mod, data = df)
> > kernlab::plot(mod, data = df)
> but I get this error:
>
> Error in .local(x, ...) :
>   Only plots of classification ksvm objects supported
>

This seems self-explanatory to me. What did the maintainer say about it?

Cheers
Petr
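
One more thing that may be worth checking in the quoted code below: df$cons = factor(c("negative", "positive")) does not translate the 0/1 values, it recycles the two labels down the column, which is consistent with the alternating negative/positive pattern in the head(df) output. A minimal sketch of the difference, using a short made-up vector in place of the real cons column:

```r
cons <- c(0, 1, 0, 0, 0, 0)   # toy stand-in for the original 0/1 outcome

# What recycling a two-element factor produces: labels simply alternate.
recycled <- rep(factor(c("negative", "positive")), length.out = length(cons))

# Mapping the stored values to labels instead.
mapped <- factor(cons, levels = c(0, 1), labels = c("negative", "positive"))

data.frame(cons, recycled, mapped)
```

If the labels in the real data were assigned by recycling, the outcome would be unrelated to mr and fcn, which might also explain the high training error and the large number of support vectors.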


> -Original Message-
> From: R-help  On Behalf Of Luigi Marongiu
> Sent: Monday, January 7, 2019 1:26 PM
> To: r-help 
> Subject: [R] error in plotting model from kernlab
>
> Dear all,
> I have a set of data in this form:
> > str 
> 'data.frame': 1574 obs. of  14 variables:
>  $ serial: int  12751 14157 7226 15663 11088 10464 1003 10427 11934 3999
> ...
>  $ plate : int  43 46 22 50 38 37 3 37 41 11 ...
>  $ well  : int  79 333 314 303 336 96 235 59 30 159 ...
>  $ sample: int  266 295 151 327 231 218 21 218 249 84 ...
>  $ target: chr  "HEV 2-AI5IQWR" "Dientamoeba fragilis-AIHSPMK" "Astro
> 2 Liu-AI20UKB" "C difficile GDH-AIS086J" ...
>  $ ori.ct: num  0 33.5 0 0 0 ...
>  $ ct.out: int  0 1 0 0 0 0 0 1 0 0 ...
>  $ mr: num  -0.002 0.109 0.002 0 0.001 0.006 0.015 0.119 0.003 0.004 ...
>  $ fcn   : num  44.54 36.74 6.78 43.09 44.87 ...
>  $ mr.out: int  0 1 0 0 0 0 0 1 0 0 ...
>  $ oper.a: int  0 1 0 0 0 0 0 1 0 0 ...
>  $ oper.b: int  0 1 0 0 0 0 0 1 0 0 ...
>  $ oper.c: int  0 1 0 0 0 0 0 1 0 0 ...
>  $ cons  : int  0 1 0 0 0 0 0 1 0 0 ...
> from which I have selected two numerical variables corresponding to x
> and y in a Cartesian plane and one outcome variable (z):
> > df = subset(t.data, select = c(mr, fcn, cons))
> >  df$cons = factor(c("negative", "positive"))
> > head(df)
>   mr   fcn cons
> 1 -0.002 44.54 negative
> 2  0.109 36.74 positive
> 3  0.002  6.78 negative
> 4  0.000 43.09 positive
> 5  0.001 44.87 negative
> 6  0.006  2.82 positive
>
> I created an SVM model with the kernlab package with:
> > mod = ksvm(cons ~ mr+fcn, # i prefer it to the more canonical "." but the outcome is the same
> data = df,
> type = "C-bsvc",
> kernel = "rbfdot",
> kpar = "automatic",
> C = 10,
> prob.model = TRUE)
>
> > mod
> Support Vector Machine object of class "ksvm"
>
> SV type: C-bsvc  (classification)
>  parameter : cost C = 10
>
> Gaussian Radial Basis kernel function.
>  Hyperparameter : sigma =  42.0923201429106
>
> Number of Support Vectors : 1439
>
> Objective Function Value : -12873.45
> Training error : 0.39263
> Probability model included.
>
> First of all, I am not sure if the model worked because 1439 support
> vectors out of 1574 data points means that over 90% of the data is
> required to fix the hyperplane. this does not look like a model but a
> patch. Secondly, the prediction is rubbish -- but this is another
> story -- and when I try to create a confusion table of the processed
> data I get:
> >  pred = predict(mod, df, type = "probabilities")
> >  acc = table(pred, df$cons)
> Error in table(pred, df$cons) : all arguments must have the same length
> which again is weird since mod, df and df$cons are made from the same
> dataframe.
>
> Coming to the actual error, I tried to plot the model with:
> > plot(mod, data = df)
> > kernlab::plot(mod, data = df)
> but I get this error:
>
> Error in .local(x, ...) :
>   Only plots of classification ksvm objects supported
>
> Would you know what I am missing?
> Thank you
> --
> Best regards,
> Luigi
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Re: [R] Warning message: NAs introduced by coercion

2019-01-08 Thread PIKAL Petr
Hi

see in line

> -Original Message-
> From: R-help  On Behalf Of N Meriam
> Sent: Tuesday, January 8, 2019 3:08 PM
> To: r-help@r-project.org
> Subject: [R] Warning message: NAs introduced by coercion
>
> Dear all,
>
> I have a .csv file called df4. (15752 obs. of 264 variables).
> I apply this code but couldn't continue further other analyses, a warning
> message keeps coming up. Then, I want to determine max and min
> similarity values,
> heat map plot, cluster...etc
>
> > require(SNPRelate)
> > library(gdsfmt)
> > myd <- read.csv(file = "df4.csv", header = TRUE)
> > names(myd)[-1]
> myd[,1]
> > myd[1:10, 1:10]
>  # the data must be 0,1,2 with 3 as missing so you have r
> > sample.id <- names(myd)[-1]
> > snp.id <- myd[,1]
> > snp.position <- 1:length(snp.id) # not needed for ibs
> > snp.chromosome <- rep(1, each=length(snp.id)) # not needed for ibs
> > snp.allele <- rep("A/G", length(snp.id)) # not needed for ibs
> # genotype data must have - in 3
> > genod <- myd[,-1]
> > genod[is.na(genod)] <- 3
> > genod[genod=="0"] <- 0
> > genod[genod=="1"] <- 2
> > genod[1:10,1:10]
> > genod <- as.matrix(genod)

A matrix can hold only one type of data, so you probably changed it to
character by such a construction.

> > class(genod) <- "numeric"

This tries to convert every value to a number; any value that cannot be
converted is set to NA.

Something like

> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
> ir <- head(iris)
> irm <- as.matrix(ir)
> head(irm)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 "5.1"        "3.5"       "1.4"        "0.2"       "setosa"
2 "4.9"        "3.0"       "1.4"        "0.2"       "setosa"
3 "4.7"        "3.2"       "1.3"        "0.2"       "setosa"
4 "4.6"        "3.1"       "1.5"        "0.2"       "setosa"
5 "5.0"        "3.6"       "1.4"        "0.2"       "setosa"
6 "5.4"        "3.9"       "1.7"        "0.4"       "setosa"
> class(irm) <- "numeric"
Warning message:
In class(irm) <- "numeric" : NAs introduced by coercion
> head(irm)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2      NA
2          4.9         3.0          1.4         0.2      NA
3          4.7         3.2          1.3         0.2      NA
4          4.6         3.1          1.5         0.2      NA
5          5.0         3.6          1.4         0.2      NA
6          5.4         3.9          1.7         0.4      NA
>
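
When the warning appears on a large matrix, it can help to find out which entries failed to convert. The following is a minimal sketch on a made-up character vector (the values are assumptions for illustration); the same test works on a character matrix.

```r
x <- c("0", "2", "1043336|F|0-7:A>G-7:A>G", NA, "3")

# Convert quietly, then flag entries that became NA although they were not NA before.
num <- suppressWarnings(as.numeric(x))
bad <- is.na(num) & !is.na(x)

which(bad)   # positions of the offending entries
x[bad]       # the text that could not be turned into a number
```

Entries that were already missing are deliberately not flagged, so only genuinely non-numeric text is reported.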

Cheers
Petr


>
>
> Warning message:
> In class(genod) <- "numeric" : NAs introduced by coercion
>
> Maybe I could illustrate more with details so I can be more specific?
> Please, let me know.
>
> I would appreciate your help.
> Thanks,
> Meriam
>


Re: [R] Question

2019-01-08 Thread Sarah Goslee
xyplot is not a package, it is a function within the lattice package, which
should already be installed.

library(lattice) # load the package from the R library
?xyplot # look at the help for the function

The others are also functions, not packages.
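
If it is unclear which package a function belongs to, R can report it. A small sketch, assuming lattice is installed (it normally ships with R as a recommended package):

```r
library(lattice)

# Ask each function which namespace it was defined in.
pkg_xyplot  <- environmentName(environment(xyplot))
pkg_bwplot  <- environmentName(environment(bwplot))
pkg_boxplot <- environmentName(environment(boxplot))

c(xyplot = pkg_xyplot, bwplot = pkg_bwplot, boxplot = pkg_boxplot)
```

The package name is what install.packages() and library() expect; the function name alone is not something that can be installed.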

Sarah

On Tue, Jan 8, 2019 at 9:15 AM S. Mahmoud Nasrollahi 
wrote:

> Dear colleague
> I have a problem working with some packages in R and, in
> spite of trying R help, the internet and other resources, I could
> not succeed. When I want to install some functions like bwplot,
> boxplot, xyplot I receive this sort of message:
>  Warning in install.packages :
>   package ‘xyplot’ is not available (for R version 3.5.2)
> Do you know how I can solve that?
>
>
> --
> S. M. Nasrollahi
> Postdoctoral Researcher
> French National Institute for Agricultural Research
>  Unité Mixte de Recherches sur les Herbivores,
>  63122 St Genès Champanelle, France
> sayyed-mahmoud.nasroll...@inra.fr
> Tel: +9826132248082
> Fax: +9826132246752
>
-- 
Sarah Goslee (she/her)
http://www.sarahgoslee.com



[R] Question

2019-01-08 Thread S. Mahmoud Nasrollahi
Dear colleague
I have a problem working with some packages in R and, in
spite of trying R help, the internet and other resources, I could
not succeed. When I want to install some functions like bwplot,
boxplot, xyplot I receive this sort of message:
 Warning in install.packages :
  package ‘xyplot’ is not available (for R version 3.5.2)
Do you know how I can solve that?


-- 
S. M. Nasrollahi
Postdoctoral Researcher
French National Institute for Agricultural Research
 Unité Mixte de Recherches sur les Herbivores,
 63122 St Genès Champanelle, France
sayyed-mahmoud.nasroll...@inra.fr
Tel: +9826132248082
Fax: +9826132246752



[R] Warning message: NAs introduced by coercion

2019-01-08 Thread N Meriam
Dear all,

I have a .csv file called df4 (15752 obs. of 264 variables).
I ran the code below but cannot continue with further analyses because a
warning message keeps coming up. After that, I want to determine the max
and min similarity values, plot a heat map, cluster, etc.

> require(SNPRelate)
> library(gdsfmt)
> myd <- read.csv(file = "df4.csv", header = TRUE)
> names(myd)[-1]
> myd[,1]
> myd[1:10, 1:10]
 # the data must be 0,1,2 with 3 as missing so you have r
> sample.id <- names(myd)[-1]
> snp.id <- myd[,1]
> snp.position <- 1:length(snp.id) # not needed for ibs
> snp.chromosome <- rep(1, each=length(snp.id)) # not needed for ibs
> snp.allele <- rep("A/G", length(snp.id)) # not needed for ibs
# genotype data must have - in 3
> genod <- myd[,-1]
> genod[is.na(genod)] <- 3
> genod[genod=="0"] <- 0
> genod[genod=="1"] <- 2
> genod[1:10,1:10]
> genod <- as.matrix(genod)
> class(genod) <- "numeric"


Warning message:
In class(genod) <- "numeric" : NAs introduced by coercion

Maybe I could illustrate more with details so I can be more specific?
Please, let me know.

I would appreciate your help.
Thanks,
Meriam
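
The warning suggests that at least one cell in the character matrix cannot be parsed as a number. One way to locate such cells is sketched below on a small made-up table; the stray values "N" and "-" are invented for illustration, since the actual contents of df4.csv are not shown.

```r
# Toy genotype table; the "N" and "-" entries are hypothetical stand-ins
# for whatever non-numeric values the real file may contain.
genod <- data.frame(s1 = c("0", "1", "N"),
                    s2 = c("2", "-", NA),
                    stringsAsFactors = FALSE)

m <- as.matrix(genod)                     # character matrix
num <- suppressWarnings(as.numeric(m))    # non-numeric strings become NA
dim(num) <- dim(m)                        # restore the matrix shape

# Cells that were not missing before conversion but are NA afterwards.
bad <- is.na(num) & !is.na(m)

offenders <- unique(m[bad])               # "N" "-"
positions <- which(bad, arr.ind = TRUE)   # row/column of each offending cell
```

Once the offending values are known, they can be recoded (for example to 3, the missing code used above) before the conversion to numeric.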



Re: [R] Why does R do this?

2019-01-08 Thread Duncan Murdoch

On 08/01/2019 4:28 a.m., Nick Wray via R-help wrote:
> y<-c(1,2,3)
> z<-which(y>3)

At this point z is a vector with no entries in it.

> z
> y<-y[-z]

-z is the same vector.  So y[z] and y[-z] are the same.

> y
>
> In the work I'm doing I often have this situation and have to make sure that I
> condition on z being non-zero as y is now numeric(0) rather than the set
> c(1,2,3).  Why does R do this?  Wouldn't it be more sensible for R to simply
> leave the host set unchanged if there are no elements to take out?


No, it wouldn't.  You asked for no entries, so you get no entries.

Follow Thierry's advice, and don't use which() unless you really need a 
vector of indices, and are prepared for an empty one.


Duncan Murdoch
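
The point about -z can be checked directly; this is only a short sketch of the example from the question:

```r
y <- c(1, 2, 3)
z <- which(y > 3)            # integer(0): no element exceeds 3

same_vector <- identical(-z, z)   # TRUE: negating an empty vector leaves it empty
n_pos <- length(y[z])             # 0
n_neg <- length(y[-z])            # 0, not 3
```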



Re: [R] Why does R do this?

2019-01-08 Thread PIKAL Petr
Hi

It is documented behaviour.

"An empty index selects all values: this is most often used to replace all the 
entries but keep the attributes."

so I presume that changing it could break a huge amount of code. The only 
workaround could be to check "z" before using it for indexing.

e.g.
> if(length(z)==0) z <- length(y) + 1
> y[-z]
[1] 1 2 3
>
Cheers
Petr
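
As far as I can tell, the quoted sentence describes y[] (no index supplied at all), which is a different case from indexing with a zero-length vector. The sketch below contrasts the two and shows an alternative to the length check; drop_at() is a hypothetical helper name, not something from the thread or from base R.

```r
y <- c(1, 2, 3)
z <- which(y > 3)                  # integer(0)

n_missing <- length(y[])           # 3: a missing index selects everything
n_empty   <- length(y[z])          # 0: a zero-length index selects nothing

# Hypothetical helper: drop the positions in z, tolerating an empty z,
# by keeping every position that is not listed in z.
drop_at <- function(x, z) x[setdiff(seq_along(x), z)]

kept_all  <- drop_at(y, z)         # 1 2 3
kept_some <- drop_at(y, 2L)        # 1 3
```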

> -Original Message-
> From: R-help  On Behalf Of Nick Wray via R-
> help
> Sent: Tuesday, January 8, 2019 10:29 AM
> To: r-help 
> Subject: [R] Why does R do this?
>
> y<-c(1,2,3)
> z<-which(y>3)
> z
> y<-y[-z]
> y
>
> In the work I'm doing I often have this situation and have to make sure that I
> condition on z being non-zero as y is now numeric(0) rather than the set
> c(1,2,3).  Why does R do this?  Wouldn't it be more sensible for R to simply
> leave the host set unchanged if there are no elements to take out?
>
> Any thoughts?
>
> Thanks, Nick Wray


Re: [R] Why does R do this?

2019-01-08 Thread Thierry Onkelinx via R-help
Dear Nick,

The best solution is not to use which() but to use the logical test directly.
This also works when the condition is always FALSE, in which case which()
returns an integer(0). And it is much faster too.
z <- y > 3
y[!z]

library(microbenchmark)
microbenchmark(
  y[!y > 3],
  y[-which(y > 3)]
)
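
One caveat worth keeping in mind with the logical test: it behaves differently from which() when y contains NA, because which() silently drops NA while a logical index passes it through. A small sketch with made-up values:

```r
# All-FALSE condition: the logical test leaves y unchanged.
y0 <- c(1, 2, 3)
unchanged <- y0[!(y0 > 3)]                  # 1 2 3

# With a missing value the two approaches differ.
y <- c(1, NA, 5)
via_logical <- y[!(y > 3)]                  # 1 NA: the NA comparison yields NA
via_which   <- y[which(!(y > 3))]           # 1:    which() drops the NA
keep_na     <- y[!(y > 3 & !is.na(y))]      # 1 NA: removes only known values > 3
```

Which result is wanted depends on how missing values should be treated in the analysis.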

Best regards,




ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkel...@inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
///




On Tue, 8 Jan 2019 at 10:29, Nick Wray via R-help <
r-help@r-project.org> wrote:

> y<-c(1,2,3)
> z<-which(y>3)
> z
> y<-y[-z]
> y
>
> In the work I'm doing I often have this situation and have to make sure
> that I condition on z being non-zero as y is now numeric(0) rather than the
> set c(1,2,3).  Why does R do this?  Wouldn't it be more sensible for R to
> simply leave the host set unchanged if there are no elements to take out?
>
> Any thoughts?
>
> Thanks, Nick Wray


[R] Why does R do this?

2019-01-08 Thread Nick Wray via R-help
y<-c(1,2,3)
z<-which(y>3)
z
y<-y[-z]
y

In the work I'm doing I often have this situation and have to make sure that I 
condition on z being non-zero as y is now numeric(0) rather than the set 
c(1,2,3).  Why does R do this?  Wouldn't it be more sensible for R to simply 
leave the host set unchanged if there are no elements to take out?

Any thoughts?

Thanks, Nick Wray