Re: [datatable-help] Replacing NA values for a categorical variable results in numerical replacements

2016-11-23 Thread affableambler
This may not be the most elegant solution, but one way to do it would be to
convert the factor to a character vector, replace the NAs, and then convert
it back to a factor:

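library(car)  # provides the Prestige data set (via the carData package in recent versions)
Prestige.bc <- Prestige
# Convert the factor to character so any string can be assigned
Prestige.bc$type <- as.character(Prestige.bc$type)
# Replace the missing values with "bc"
Prestige.bc$type[is.na(Prestige.bc$type)] <- "bc"
# Convert back; levels are rebuilt from the values present, sorted alphabetically
Prestige.bc$type <- as.factor(Prestige.bc$type)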
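A hedged side note, not part of the original reply: one common way to end up with numbers instead of labels is calling `ifelse()` on a factor, which returns the factor's underlying integer codes rather than its labels. If the replacement value is already a level (as `bc` is for `Prestige$type`), it can also be assigned straight into the factor without the round trip through character; a value that is not yet a level has to be added to the levels first. The toy factor `type` and the level `unknown` below are made up for illustration and are not the Prestige data.

```r
# Toy factor standing in for a column like Prestige$type (made-up values)
type <- factor(c("prof", NA, "wc", "bc", NA))

# Pitfall: ifelse() on a factor returns its integer codes (here coerced to strings), not its labels
bad <- ifelse(is.na(type), "bc", type)
print(bad)

# "bc" is already a level, so it can be assigned directly into the factor
fixed <- type
fixed[is.na(fixed)] <- "bc"
print(fixed)

# A value that is not yet a level must be added to the levels first;
# otherwise the assignment warns "invalid factor level" and leaves NA
other <- type
levels(other) <- c(levels(other), "unknown")
other[is.na(other)] <- "unknown"
print(other)
```

Assigning into the factor keeps the existing level order, whereas converting via `as.character()` and back rebuilds the levels from the values present.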




--
View this message in context: 
http://r.789695.n4.nabble.com/Replacing-NA-values-for-a-categorical-variable-results-in-numerical-replacements-tp4726789p4726790.html
Sent from the datatable-help mailing list archive at Nabble.com.
___
datatable-help mailing list
datatable-help@lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help


Re: [datatable-help] Create matrix of columns from grouped rows

2016-11-23 Thread Eric Archer
Thanks! That's about 2-3x faster. I wasn't familiar with stack/unstack. Now
I have another tool.

Cheers,
e.




*Eric Archer, Ph.D.*

Southwest Fisheries Science Center (NMFS/NOAA)
8901 La Jolla Shores Drive
La Jolla, CA 92037 USA
858-546-7121 (work)
858-546-7003 (FAX)

Marine Mammal Genetics Group: swfsc.noaa.gov/mmtd-mmgenetics
GitHub: github/ericarcher

&

Adjunct Professor, Marine Biology
Scripps Institution of Oceanography
University of California, San Diego
http://profiles.ucsd.edu/frederick.archer


"*The universe doesn't care what you believe. The wonderful thing about
science is that it doesn't ask for your faith, it just asks for your
eyes.*" - Randall Munroe

"*Lighthouses are more helpful than churches.*"
   - Benjamin Franklin

   "*...but I'll take a GPS over either one.*"
   - John C. "Craig" George

On Wed, Nov 23, 2016 at 11:11 AM, nachti [via R] <
ml-node+s789695n472677...@n4.nabble.com> wrote:

> Try this:
>
> t1 <- dt[, unlist(.SD), by = id]
> t(unstack(t1, form = V1 ~ id))
>
> (I think you can take care of the column names yourself.)
>
> Cheers,
> ~g
>




--
View this message in context: 
http://r.789695.n4.nabble.com/Create-matrix-of-columns-from-grouped-rows-tp4726777p4726779.html

Re: [datatable-help] Create matrix of columns from grouped rows

2016-11-23 Thread nachti
Try this:

t1 <- dt[, unlist(.SD), by = id]
t(unstack(t1, form = V1 ~ id))

(I think you can take care of the column names yourself.)
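A minimal sketch of finishing off the column names, assuming the toy `dt` from the original question (regenerated here with `set.seed()`, so the allele values are random). Inside `unstack()` the ids become syntactic column names such as `X1`, so the sketch restores them as row names and builds `<locus>.<allele>` column names the same way the loop-based code does.

```r
library(data.table)

# Toy data in the layout of the original question (random alleles)
set.seed(1)
dt <- data.table(
  id = as.character(rep(1:3, each = 4)),
  loc1 = factor(sample(c("C", "T"), 12, replace = TRUE)),
  loc2 = factor(sample(c("C", "T"), 12, replace = TRUE)),
  loc3 = factor(sample(c("C", "T"), 12, replace = TRUE)),
  key = "id"
)

# The suggestion above: one long allele column per id, then one row per id
t1 <- dt[, unlist(.SD), by = id]
mat <- t(unstack(t1, form = V1 ~ id))

# Column names of the form <locus>.<allele index>
loci <- setdiff(names(dt), "id")
num.alleles <- ncol(mat) / length(loci)
colnames(mat) <- paste(rep(loci, each = num.alleles),
                       seq_len(num.alleles), sep = ".")

# unstack() groups by the sorted ids and renames them (e.g. "X1");
# put the original ids back as row names, in the same sorted order
rownames(mat) <- levels(factor(t1$id))
mat
```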

Cheers,
~g



--
View this message in context: 
http://r.789695.n4.nabble.com/Create-matrix-of-columns-from-grouped-rows-tp4726777p4726778.html


[datatable-help] Create matrix of columns from grouped rows

2016-11-23 Thread Eric Archer
I am trying to find the most efficient (fastest) way of manipulating a
data.table object that contains genetic data. The format is shown in the following
toy example, where rows represent alleles for individuals and columns are
separate loci. Each individual is represented by one or more rows,
depending on the ploidy of the loci. In this example, there are
three tetraploid loci (4 alleles per locus) genotyped for three individuals
(1:3). In my real data, all loci will always have the same ploidy.

> library(data.table)
> dt <- data.table(
+ id = as.character(rep(1:3, each = 4)),
+ loc1 = factor(sample(c("C", "T"), 12, replace = TRUE)),
+ loc2 = factor(sample(c("C", "T"), 12, replace = TRUE)),
+ loc3 = factor(sample(c("C", "T"), 12, replace = TRUE)),
+ key = "id"
+ )
> dt
    id loc1 loc2 loc3
 1:  1    T    T    T
 2:  1    C    T    C
 3:  1    T    C    T
 4:  1    C    C    T
 5:  2    T    C    C
 6:  2    T    T    T
 7:  2    C    T    T
 8:  2    C    T    C
 9:  3    C    T    T
10:  3    T    T    T
11:  3    T    C    T
12:  3    T    T    C

What I'm looking for is the fastest way to convert this data.table to a
matrix where each row has the entire genotypes for one individual with the
alleles for a locus in sequential columns. The code I currently have for
this follows. 

> ids <- dt[, unique(id)]
> .cbindColFunc <- function(x) {
+ do.call(cbind, as.list(as.character(x)))
+ }
> mat <- do.call(rbind, lapply(ids, function(i) {
+ dt[i, do.call(cbind, lapply(.SD, .cbindColFunc)), .SDcols = !"id"]
+ }))
> num.alleles <- ncol(mat) / (ncol(dt) - 1)
> colnames(mat) <- paste(rep(colnames(dt)[-1], each = num.alleles),
+ 1:num.alleles, sep = ".")
> mat <- cbind(id = ids, mat)
> mat
     id  loc1.1 loc1.2 loc1.3 loc1.4 loc2.1 loc2.2 loc2.3 loc2.4 loc3.1 loc3.2 loc3.3 loc3.4
[1,] "1" "T"    "C"    "T"    "C"    "T"    "T"    "C"    "C"    "T"    "C"    "T"    "T"
[2,] "2" "T"    "T"    "C"    "C"    "C"    "T"    "T"    "T"    "C"    "T"    "T"    "C"
[3,] "3" "C"    "T"    "T"    "T"    "T"    "T"    "C"    "T"    "T"    "T"    "T"    "C"


Is there a faster, more data.table-friendly way to do it?
Thanks in advance!
Eric
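Another possible route, sketched here and not benchmarked against the approaches in this thread, stays within data.table's own reshaping tools: `melt()` to long format, number the allele copies with `rowid()`, then `dcast()` back to one row per individual. It assumes a data.table version that provides `rowid()` and the `sep` argument of `dcast()`; the toy data are regenerated with `set.seed()`, so the values are random, and the names `long`, `copy`, `wide` and `mat2` are purely illustrative.

```r
library(data.table)

# Toy data in the layout of the question (random alleles)
set.seed(1)
dt <- data.table(
  id = as.character(rep(1:3, each = 4)),
  loc1 = factor(sample(c("C", "T"), 12, replace = TRUE)),
  loc2 = factor(sample(c("C", "T"), 12, replace = TRUE)),
  loc3 = factor(sample(c("C", "T"), 12, replace = TRUE)),
  key = "id"
)

# 1. Long format, one row per (id, locus, allele copy); converting the
#    factor columns to character first keeps melt() from combining factors
long <- melt(dt[, lapply(.SD, as.character)], id.vars = "id",
             variable.name = "locus", value.name = "allele")

# 2. Number the allele copies within each id/locus in their original row order
long[, copy := rowid(id, locus)]

# 3. Wide format: one row per id, columns named <locus>.<copy>
wide <- dcast(long, id ~ locus + copy, value.var = "allele", sep = ".")

# Character matrix with the id in the first column, like the loop-based result
mat2 <- as.matrix(wide)
mat2
```

`dcast()` orders the new columns by locus and then by copy number, so they come out as `loc1.1 ... loc3.4`, matching the loop-based version; whether this is faster on real genotype data would need to be timed.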



--
View this message in context: 
http://r.789695.n4.nabble.com/Create-matrix-of-columns-from-grouped-rows-tp4726777.html