[R] Change the order of stacked bar

2020-06-05 Thread Aimin Yan
I want to use the code below this message to make a stacked bar plot. My
question is:


I want the stacked bars and their legend to follow the order of trt from
left to right, like the following:

"100.0.250ng_CellLine_0" "75.25.250ng_CellLine_0"
"50.50.250ng_CellLine_0" "10.90.250ng_CellLine_0"
"1.99.250ng_CellLine_0" "0.100.250ng_CellLine_0"
"100.0.500ng_CellLine_0" "75.25.500ng_CellLine_0"
"50.50.500ng_CellLine_0" "10.90.500ng_CellLine_0"
"1.99.500ng_CellLine_0" "0.100.500ng_CellLine_0"

However, it seems the following code does not generate the stacked bars
in this order.

In addition, for '0.100.500ng_CellLine_0' in df, the order of genes
and colors in the stacked bar is not the same as the order in df. How
can I change this?

Another question:

trt has 12 treatments, and I have to add new_scale_fill() for each
treatment, so the code gets long. Is there a way to simplify this?

Thank you

Aimin


library(ggplot2)

library(dplyr)

library(tidyverse)

library(ggnewscale)

df <- read.csv(text='"trt","gene","freq","cols"
 "100.0.250ng_CellLine_0","ALDH16A1",100,"red"
 "100.0.250ng_CellLine_0","Others",0,"lightgrey"
 "75.25.250ng_CellLine_0","ALDH16A1",64.6638014695688,"red"
 "75.25.250ng_CellLine_0","GBE1",2.0074864827395,"#4C00FF"
 "75.25.250ng_CellLine_0","ZNF598",1.5832524608346,"#004CFF"
 "75.25.250ng_CellLine_0","CHMP6",1.35033966449466,"#00E5FF"
 "75.25.250ng_CellLine_0","C20orf27",1.2033827810897,"#00FF4D"
 "75.25.250ng_CellLine_0","NEGR1",0.967697213364758,"#4DFF00"
 "75.25.250ng_CellLine_0","TNFAIP6",0.912241785664772,"#E6FF00"
 "75.25.250ng_CellLine_0","ZSCAN25",0.737557188409816,"#00"
 "75.25.250ng_CellLine_0","BCL2",0.684874532094829,"#FFDE59"
 "75.25.250ng_CellLine_0","CBL",0.676556217939831,"#FFE0B3"
 "75.25.250ng_CellLine_0","Others",25.2128102037987,"lightgrey"
 "50.50.250ng_CellLine_0","ALDH16A1",42.4503581203051,"red"
 "50.50.250ng_CellLine_0","ATF2",2.2360682428,"#4C00FF"
 "50.50.250ng_CellLine_0","DIAPH1",1.52565073079835,"#004CFF"
 "50.50.250ng_CellLine_0","SESTD1",1.20538053921854,"#00E5FF"
 "50.50.250ng_CellLine_0","TFCP2",1.15879578407966,"#00FF4D"
 "50.50.250ng_CellLine_0","SCAPER",1.1180341214,"#4DFF00"
 "50.50.250ng_CellLine_0","CUX1",1.03068770744774,"#E6FF00"
 "50.50.250ng_CellLine_0","TEX10",0.984102952308857,"#00"
 "50.50.250ng_CellLine_0","C6orf89",0.966633669131777,"#FFDE59"
 "50.50.250ng_CellLine_0","PTTG1IP",0.925872008385256,"#FFE0B3"
 "50.50.250ng_CellLine_0","Others",46.3984161183253,"lightgrey"
 "10.90.250ng_CellLine_0","ALDH16A1",4.68952007835455,"red"
 "10.90.250ng_CellLine_0","STK11",1.93143976493634,"#4C00FF"
 "10.90.250ng_CellLine_0","ERGIC2",1.46523016650343,"#004CFF"
 "10.90.250ng_CellLine_0","EFR3A",1.1126346718903,"#00E5FF"
 "10.90.250ng_CellLine_0","TMEM235",1.03819784524976,"#00FF4D"
 "10.90.250ng_CellLine_0","NGLY1",1.01469147894221,"#4DFF00"
 "10.90.250ng_CellLine_0","CNOT10",0.991185112634672,"#E6FF00"
 "10.90.250ng_CellLine_0","NPLOC4",0.983349657198825,"#00"
 "10.90.250ng_CellLine_0","GZMB",0.928501469147894,"#FFDE59"
 "10.90.250ng_CellLine_0","KIF2C",0.924583741429971,"#FFE0B3"
 "10.90.250ng_CellLine_0","Others",84.9206660137121,"lightgrey"
 "1.99.250ng_CellLine_0","DNAH1",2.36284289276808,"red"
 "1.99.250ng_CellLine_0","ALOX5AP",2.29426433915212,"#4C00FF"
 "1.99.250ng_CellLine_0","SEPT7",1.78304239401496,"#004CFF"
 "1.99.250ng_CellLine_0","TCF20",1.35910224438903,"#00E5FF"
 "1.99.250ng_CellLine_0","USP32",1.27805486284289,"#00FF4D"
 "1.99.250ng_CellLine_0","MUS81",1.24688279301746,"#4DFF00"
 "1.99.250ng_CellLine_0","CEP44",1.22817955112219,"#E6FF00"
 "1.99.250ng_CellLine_0","TMEM164",1.20324189526185,"#00"
 "1.99.250ng_CellLine_0","RAP1B",1.18453865336658,"#FFDE59"
 "1.99.250ng_CellLine_0","GSN",1.14713216957606,"#FFE0B3"
 "1.99.250ng_CellLine_0","Others",84.9127182044888,"lightgrey"
 "0.100.250ng_CellLine_0","RTN3",2.3050199437531,"red"
 "0.100.250ng_CellLine_0","CHTF18",1.67637814091135,"#4C00FF"
 "0.100.250ng_CellLine_0","RNPS1",1.41168685550429,"#004CFF"
 "0.100.250ng_CellLine_0","RBKS",1.05325073984891,"#00E5FF"
 "0.100.250ng_CellLine_0","ZNF805",0.987077918497142,"#00FF4D"
 "0.100.250ng_CellLine_0","TMBIM6",0.865761079352242,"#4

[R] (no subject)

2020-06-05 Thread Aimin Yan
I want the stacked bars and their legend to follow the order of trt from
left to right, like the following:

"100.0.250ng_CellLine_0" "75.25.250ng_CellLine_0"
"50.50.250ng_CellLine_0" "10.90.250ng_CellLine_0"
"1.99.250ng_CellLine_0" "0.100.250ng_CellLine_0"
"100.0.500ng_CellLine_0" "75.25.500ng_CellLine_0"
"50.50.500ng_CellLine_0" "10.90.500ng_CellLine_0"
"1.99.500ng_CellLine_0" "0.100.500ng_CellLine_0"

However, it seems the above code does not generate the stacked bars in this order.

In addition, for '0.100.500ng_CellLine_0' in df, the order of genes
and colors in the stacked bar is not the same as the order in df:



0.100.500ng_CellLine_0   ALYREF      1.5326986   red
0.100.500ng_CellLine_0   HCG18       1.5108475   #4C00FF
0.100.500ng_CellLine_0   RNU7-146P   0.9224286   #004CFF
0.100.500ng_CellLine_0   ST3GAL3     0.8849696   #00E5FF
0.100.500ng_CellLine_0   HSF1        0.8116123   #00FF4D
0.100.500ng_CellLine_0   HP1BP3      0.7928828   #4DFF00
0.100.500ng_CellLine_0   DAOA        0.7366942   #E6FF00
0.100.500ng_CellLine_0   CDK13       0.6898705   #00
0.100.500ng_CellLine_0   PDXDC1      0.6805057   #FFDE59
0.100.500ng_CellLine_0   CKAP5       0.6477290   #FFE0B3
0.100.500ng_CellLine_0   Others     90.7897612   lightgrey'

library(dplyr)
library(tidyverse)
library(ggnewscale)

df <- read.csv(text='"trt","gene","freq","cols"
 "100.0.250ng_CellLine_0","ALDH16A1",100,"red"
 "100.0.250ng_CellLine_0","Others",0,"lightgrey"
 "75.25.250ng_CellLine_0","ALDH16A1",64.6638014695688,"red"
 "75.25.250ng_CellLine_0","GBE1",2.0074864827395,"#4C00FF"
 "75.25.250ng_CellLine_0","ZNF598",1.5832524608346,"#004CFF"
 "75.25.250ng_CellLine_0","CHMP6",1.35033966449466,"#00E5FF"
 "75.25.250ng_CellLine_0","C20orf27",1.2033827810897,"#00FF4D"
 "75.25.250ng_CellLine_0","NEGR1",0.967697213364758,"#4DFF00"
 "75.25.250ng_CellLine_0","TNFAIP6",0.912241785664772,"#E6FF00"
 "75.25.250ng_CellLine_0","ZSCAN25",0.737557188409816,"#00"
 "75.25.250ng_CellLine_0","BCL2",0.684874532094829,"#FFDE59"
 "75.25.250ng_CellLine_0","CBL",0.676556217939831,"#FFE0B3"
 "75.25.250ng_CellLine_0","Others",25.2128102037987,"lightgrey"
 "50.50.250ng_CellLine_0","ALDH16A1",42.4503581203051,"red"
 "50.50.250ng_CellLine_0","ATF2",2.2360682428,"#4C00FF"
 "50.50.250ng_CellLine_0","DIAPH1",1.52565073079835,"#004CFF"
 "50.50.250ng_CellLine_0","SESTD1",1.20538053921854,"#00E5FF"
 "50.50.250ng_CellLine_0","TFCP2",1.15879578407966,"#00FF4D"
 "50.50.250ng_CellLine_0","SCAPER",1.1180341214,"#4DFF00"
 "50.50.250ng_CellLine_0","CUX1",1.03068770744774,"#E6FF00"
 "50.50.250ng_CellLine_0","TEX10",0.984102952308857,"#00"
 "50.50.250ng_CellLine_0","C6orf89",0.966633669131777,"#FFDE59"
 "50.50.250ng_CellLine_0","PTTG1IP",0.925872008385256,"#FFE0B3"
 "50.50.250ng_CellLine_0","Others",46.3984161183253,"lightgrey"
 "10.90.250ng_CellLine_0","ALDH16A1",4.68952007835455,"red"
 "10.90.250ng_CellLine_0","STK11",1.93143976493634,"#4C00FF"
 "10.90.250ng_CellLine_0","ERGIC2",1.46523016650343,"#004CFF"
 "10.90.250ng_CellLine_0","EFR3A",1.1126346718903,"#00E5FF"
 "10.90.250ng_CellLine_0","TMEM235",1.03819784524976,"#00FF4D"
 "10.90.250ng_CellLine_0","NGLY1",1.01469147894221,"#4DFF00"
 "10.90.250ng_CellLine_0","CNOT10",0.991185112634672,"#E6FF00"
 "10.90.250ng_CellLine_0","NPLOC4",0.983349657198825,"#00"
 "10.90.250ng_CellLine_0","GZMB",0.928501469147894,"#FFDE59"
 "10.90.250ng_CellLine_0","KIF2C",0.924583741429971,"#FFE0B3"
 "10.90.250ng_CellLine_0","Others",84.9206660137121,"lightgrey"
 "1.99.250ng_CellLine_0","DNAH1",2.36284289276808,"red"
 "1.99.250ng_CellLine_0","ALOX5AP",2.29426433915212,"#4C00FF"
 "1.99.250ng_CellLine_0","SEPT7",1.78304239401496,"#004CFF"
 "1.99.250ng_CellLine_0","TCF20",1.35910224438903,"#00E5FF"
 "1.99.250ng_CellLine_0","USP32",1.27805486284289,"#00FF4D"
 "1.99.250ng_CellLine_0","MUS81",1.24688279301746,"#4DFF00"
 "1.99.250ng_CellLine_0","CEP44",1.22817955112219,"#E6FF00"
 "1.99.250ng_CellLine_0","TMEM164",1.20324189526185,"#00"
 "1.99.250ng_CellLine_0","RAP1B",1.18453865336658,"#FFDE59"
 "1.99.250ng_CellLine_0","GSN",1.14713216957606,"#FFE0B3"
 

Re: [R] ask help for ggplot

2020-06-05 Thread Aimin Yan
Thank you, it is very helpful.

I tried the following to generate stacked bar plots for trt 'M6' and
'M12'.

However, the label positions in the 'M12' legend are not what I want.
In the legend I also want to keep "Others" at the bottom (like the
gene order in the stacked bar).

In addition, how can I make one stacked bar plot for 'M6', 'M12' and 'M18'
together, with a separate legend for each ('M6', 'M12', 'M18')?

Thank you,

Aimin

df.1 <- df[df$trt=='M6',]

g <- unique(as.character(df.1$gene))
i <- which(g == "Others")
g <- c(g[-i], g[i])

df.1$trt <- factor(df.1$trt,levels=unique(as.character(df$trt)))
df.1$gene <- factor(df.1$gene,levels = g)

df.1 %>% ggplot(aes(x=trt,y=freq, fill = gene, group = gene)) +
  geom_bar(stat = "identity", width = 0.5) +
  scale_fill_manual(breaks = df$gene, values = df$cols) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1,size = 4)) +
  theme(legend.position="bottom") +
  guides(fill=guide_legend(title=df.1$trt, title.position = "top", ncol=1,
                           keyheight=0.35, default.unit="inch"))

df.2 <- df[df$trt=='M12',]

g <- unique(as.character(df.2$gene))
i <- which(g == "Others")
g <- c(g[-i], g[i])

df.2$trt <- factor(df.2$trt,levels=unique(as.character(df$trt)))
df.2$gene <- factor(df.2$gene,levels = g)

df.2 %>% ggplot(aes(x=trt,y=freq, fill = gene, group = gene)) +
  geom_bar(stat = "identity", width = 0.5) +
  scale_fill_manual(breaks = df$gene, values = df$cols) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1,size = 4)) +
  theme(legend.position="bottom") +
  guides(fill=guide_legend(title=df.2$trt, title.position = "top", ncol=1,
                           keyheight=0.35, default.unit="inch"))




On Fri, Jun 5, 2020 at 5:36 AM Rui Barradas  wrote:

> Hello,
>
> Something like this?
>
>
> g <- unique(as.character(df$gene))
> i <- which(g == "Others")
> g <- c(g[i], g[-i])
> df$trt <- factor(df$trt,levels=unique(as.character(df$trt)))
> df$gene <- factor(df$gene,levels = g)
>
> ggplot(df, aes(x=trt,y=freq, fill = gene, group = gene)) +
>   geom_bar(stat = "identity", width = 0.5,
>            position = position_fill()) +
>   scale_fill_manual(breaks = df$gene, values = df$cols) +
>   theme(axis.text.x = element_text(angle = 45, hjust = 1,size = 4))
>
>
> But this places "Others" at the top of each bar.
> To move it to the bottom, instead of the code that creates 'g' run
>
> g <- unique(as.character(df$gene))
> i <- which(g == "Others")
> g <- c(g[-i], g[i])
>
>
> Hope this helps,
>
> Rui Barradas
>
>
> Às 05:14 de 05/06/20, Aimin Yan escreveu:
> > Is it possible to generate a barplot like the one at the following link
> > using ggplot?
> >
> > https://photos.app.goo.gl/E3MC461dKaTZfHza9
> >
> > here is what I did
> >
> > library(ggplot2)
> >
> > df <- read.csv(text=
> > "trt,gene,freq,cols
> > M6,ALDH16A1,100.000,red
> > M6,Others,0.000,lightgrey
> > M12,ALDH16A1,64.6638015,red
> > M12,GBE1,2.0074865,#4C00FF
> > M12,ZNF598,1.5832525,#004CFF
> > M12,CHMP6,1.3503397,#00E5FF
> > M12,C20orf27,1.2033828,#00FF4D
> > M12,NEGR1,0.9676972,#4DFF00
> > M12,TNFAIP6,0.9122418,#E6FF00
> > M12,ZSCAN25,0.7375572,#00
> > M12,BCL2,0.6848745,#FFDE59
> > M12,CBL,0.6765562,#FFE0B3
> > M12,Others,25.2128102,lightgrey
> > M18,ALDH16A1,42.4503581,red
> > M18,ATF2,2.2360682,#4C00FF
> > M18,DIAPH1,1.5256507,#004CFF
> > M18,SESTD1,1.2053805,#00E5FF
> > M18,TFCP2,1.1587958,#00FF4D
> > M18,SCAPER,1.1180341,#4DFF00
> > M18,CUX1,1.0306877,#E6FF00
> > M18,TEX10,0.9841030,#00
> > M18,C6orf89,0.9666337,#FFDE59
> > M18,PTTG1IP,0.9258720,#FFE0B3
> > M18,Others,46.3984161,lightgrey")
> >
> > df$trt <- factor(df$trt,levels=unique(as.character(df$trt)))
> > df$gene <- factor(df$gene,levels = unique(as.character(df$gene)))
> >
> > ggplot(df, aes(x=trt,y=freq, fill = gene))+geom_bar(stat = "identity",
> > width = 0.5,color="black") + theme(axis.text.x = element_text(angle = 45,
> > hjust = 1,size = 4))
> >
> > df$cols holds the colors I want to use for the different genes in M6, M12
> > and M18, as shown in the figure. In each bar, the 'Others' level of df$gene
> > should always be at the bottom.
> >
> > Thank you
> >
> > Aimin
> >
> >   [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>



Re: [R] Cumulative split of value in data frame column

2020-06-05 Thread Bert Gunter
This is a **plain text list **. In future please post in plain text so that
your post does not get mangled.

Anyway,...

I don't know about "efficient, optimized", but here's one simple way to do
it using ?strsplit to split and then ?paste to recombine:

df <- data.frame(ID=1:3, FOO=c('A_B','A_B_C','A_B_C_D_E'))

cumsplit <- function(x, split = "_"){
   w <- x[1]
   for(i in seq_along(x)[-1])  w <- c(w, paste(w[i-1], x[i], sep = split))
   w
}

> lapply(strsplit(df$FOO, split = "_"), cumsplit)
[[1]]
[1] "A"   "A_B"

[[2]]
[1] "A"     "A_B"   "A_B_C"

[[3]]
[1] "A"         "A_B"       "A_B_C"     "A_B_C_D"   "A_B_C_D_E"

I wouldn't be surprised if clever use of regex's would be faster, but as I
said, this is simple.
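
The same idea can be carried through to the wide layout asked for in the
question. What follows is only a sketch in base R: cum_prefixes() is a
hypothetical helper name, the FOO_SPLIT* column names are taken from the
question, and Reduce() with accumulate = TRUE stands in for the explicit
loop. Padding short rows with NA is an assumption about the desired result.

```r
df <- data.frame(ID = 1:3, FOO = c("A_B", "A_B_C", "A_B_C_D_E"),
                 stringsAsFactors = FALSE)

# Cumulative prefixes of one split string, e.g. "A", "A_B", "A_B_C".
# (cum_prefixes is an illustrative name, not an existing function.)
cum_prefixes <- function(parts, sep = "_") {
  Reduce(function(acc, p) paste(acc, p, sep = sep), parts, accumulate = TRUE)
}

pieces <- lapply(strsplit(df$FOO, "_", fixed = TRUE), cum_prefixes)

# Pad every row with NA up to the longest row, then bind as columns.
n <- max(lengths(pieces))
mat <- t(vapply(pieces,
                function(p) c(p, rep(NA_character_, n - length(p))),
                character(n)))
colnames(mat) <- paste0("FOO_SPLIT", seq_len(n))

result <- data.frame(df, mat, stringsAsFactors = FALSE)
print(result)
```

Whether this is faster than the loop has not been measured here; it mainly
avoids growing a vector by hand.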

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Fri, Jun 5, 2020 at 9:33 AM Ravi Jeyaraman  wrote:

> Assuming, I have a data frame like this ..
>
> df <- data.frame(ID=1:3, FOO=c('A_B','A_B_C','A_B_C_D_E'))
>
> I want to do a 'cumulative split' of the values in column FOO based on the
> delimiter '_'.  The end result should be like this ..
>
> ID  FOO        FOO_SPLIT1  FOO_SPLIT2  FOO_SPLIT3  FOO_SPLIT4  FOO_SPLIT5
> 1   A_B        A           A_B
> 2   A_B_C      A           A_B         A_B_C
> 3   A_B_C_D_E  A           A_B         A_B_C       A_B_C_D     A_B_C_D_E
>
> Any efficient, optimized way to do this?
>
>
> --
> This email has been checked for viruses by AVG.
> https://www.avg.com
>


Re: [R] Error in gee.fit$working.correlation[1, 2] : subscript out of bounds

2020-06-05 Thread William Dunlap via R-help
The usual reason for the 'subscript out of bounds' error is that an array's
subscripts exceed the dimensions of the array.  In this case
gee.fit$working.correlation is a 1 by 1 matrix, so subscripting with [1,2]
will cause the error.
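
The mechanism can be reproduced without geesmv at all. In this small
sketch, m is only a stand-in for gee.fit$working.correlation, and the
guarded access at the end is one possible defensive pattern, not what the
package does.

```r
# 'm' is an illustrative 1 by 1 matrix standing in for
# gee.fit$working.correlation.
m <- matrix(1, nrow = 1, ncol = 1)
dim(m)

# Asking for column 2 of a 1-column matrix raises the same kind of error;
# tryCatch() captures the message instead of stopping the script.
msg <- tryCatch(m[1, 2], error = function(e) conditionMessage(e))
msg

# A guarded access checks the dimensions first (assumed fallback: NA).
rho <- if (ncol(m) >= 2) m[1, 2] else NA_real_
rho
```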

Here is a self-contained example that you can send the package's maintainer.

> maintainer("geesmv")
[1] "Zheng Li "

> dx <- cbind(id=1:18, y=sin(1:18), expand.grid(period=c(1.1,1.2,1.3),
Ijt=c("i","ii","iii"))[c(1:9,1:9),])
> options(error=recover)
> test <- GEE.var.fg(y ~ factor(period) +
factor(Ijt),id="id",family=gaussian, dx,corstr="exchangeable")
Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
running glm to get initial regression estimate
      (Intercept) factor(period)1.2 factor(period)1.3     factor(Ijt)ii
       0.02712257       -0.06015777       -0.11555784        0.04243596
   factor(Ijt)iii
       0.04114518
Error in gee.fit$working.correlation[1, 2] : subscript out of bounds

Enter a frame number, or 0 to exit

1: GEE.var.fg(y ~ factor(period) + factor(Ijt), id = "id", family =
gaussian,

Selection: 1
Called from: top level
Browse[1]> str(gee.fit$working.correlation)
 num [1, 1] 1

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Fri, Jun 5, 2020 at 12:28 AM Phat Chau 
wrote:

> Hello,
>
> I have a dataframe in R that looks like the following
>
>    cluster id period   u_3 timeID startTrt Ijt   error      y
> 1:       1  1      0 -1.26      1        1   0  1.2015 17.809
> 2:       1  2      0 -1.26      1        1   0 -1.6577 14.950
> 3:       1  3      0 -1.26      1        1   0 -3.8639 12.744
> 4:       1  4      0 -1.26      1        1   0  1.4978 18.105
> 5:       1  5      0 -1.26      1        1   0 -5.3182 11.289
> When I try to run a gee model on it using the geesmv package which adjusts
> the variance covariance matrix for small sample sizes as follows
>
> test <- GEE.var.fg(y ~ factor(period) +
> factor(Ijt),id="id",family=gaussian, dx,corstr="exchangeable")
>
> I get this error message:
>
> Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
> running glm to get initial regression estimate
>     (Intercept) factor(period)1 factor(period)2 factor(period)3
>           17.25           -8.27           -6.47           -9.13
> factor(period)4 factor(period)5    factor(Ijt)1
>           -8.17          -11.89            8.96
>
> I think the usual culprit for this kind of error message is that the
> variable being referred to (id in this case, I assume) is non-existent. That
> is clearly not the case here, and I checked to make sure it is not a typo.
>
> Does anyone know why this is? How would I troubleshoot this?
>
> Thank you,
> Edward
>
>


Re: [R] how to add a calculated column into a data frame

2020-06-05 Thread Ravi Jeyaraman
How about something like this?

library(dplyr)  # provides %>% and mutate()

df <- data.frame(ID=1:3, DTVAL=c("2009-03-21","2010-05-11","2020-05-05"))

df <- df %>% mutate(YEAR = as.numeric(format(as.Date(DTVAL,'%Y-%m-%d'),
'%Y')))
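
If dplyr is not available, the same column can be added in base R. This is
a sketch under the same assumptions as above: the dates are character
strings in year-month-day form, and the column names ID, DTVAL and YEAR are
the illustrative ones from this reply, not from the original poster's data.

```r
df <- data.frame(ID = 1:3,
                 DTVAL = c("2009-03-21", "2010-05-11", "2020-05-05"),
                 stringsAsFactors = FALSE)

# Parse the text as Date, format the year part, and store it as integer.
df$YEAR <- as.integer(format(as.Date(df$DTVAL, format = "%Y-%m-%d"), "%Y"))
df
```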



-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Charles Thuo
Sent: Friday, June 05, 2020 12:18 AM
To: r-help@r-project.org
Subject: [R] how to add a calculated column into a data frame

Dear Sirs,

I have a data frame with a column that holds the transaction date.

How do I add another column that extracts the year of the transaction from
the transaction date?

Charles



[R] Cumulative split of value in data frame column

2020-06-05 Thread Ravi Jeyaraman
Assuming, I have a data frame like this ..

df <- data.frame(ID=1:3, FOO=c('A_B','A_B_C','A_B_C_D_E'))

I want to do a 'cumulative split' of the values in column FOO based on the
delimiter '_'.  The end result should be like this ..

ID  FOO        FOO_SPLIT1  FOO_SPLIT2  FOO_SPLIT3  FOO_SPLIT4  FOO_SPLIT5
1   A_B        A           A_B
2   A_B_C      A           A_B         A_B_C
3   A_B_C_D_E  A           A_B         A_B_C       A_B_C_D     A_B_C_D_E

Any efficient, optimized way to do this?




Re: [R] ask help for ggplot

2020-06-05 Thread Rui Barradas

Hello,

Something like this?


g <- unique(as.character(df$gene))
i <- which(g == "Others")
g <- c(g[i], g[-i])
df$trt <- factor(df$trt,levels=unique(as.character(df$trt)))
df$gene <- factor(df$gene,levels = g)

ggplot(df, aes(x=trt,y=freq, fill = gene, group = gene)) +
  geom_bar(stat = "identity", width = 0.5,
           position = position_fill()) +
  scale_fill_manual(breaks = df$gene, values = df$cols) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1,size = 4))


But this places "Others" at the top of each bar.
To move it to the bottom, instead of the code that creates 'g' run

g <- unique(as.character(df$gene))
i <- which(g == "Others")
g <- c(g[-i], g[i])
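
The reordering step above can also be wrapped in a small helper so it can
be reused per treatment. This is a sketch only: move_level_last() is a
hypothetical name, not a ggplot2 function, and the short gene vector is
made up for illustration.

```r
# Hypothetical helper: build a factor whose levels follow first appearance,
# with one chosen level (default "Others") moved to the end.
move_level_last <- function(x, level = "Others") {
  lv <- unique(as.character(x))
  factor(x, levels = c(setdiff(lv, level), intersect(lv, level)))
}

gene <- c("ALDH16A1", "Others", "ALDH16A1", "GBE1", "Others")
gene_f <- move_level_last(gene)
levels(gene_f)
```

With df$gene <- move_level_last(df$gene), the factor has the same levels as
the three lines above produce.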


Hope this helps,

Rui Barradas


Às 05:14 de 05/06/20, Aimin Yan escreveu:

Is it possible to generate a barplot like the one at the following link using ggplot?

https://photos.app.goo.gl/E3MC461dKaTZfHza9

here is what I did

library(ggplot2)

df <- read.csv(text=
"trt,gene,freq,cols
M6,ALDH16A1,100.000,red
M6,Others,0.000,lightgrey
M12,ALDH16A1,64.6638015,red
M12,GBE1,2.0074865,#4C00FF
M12,ZNF598,1.5832525,#004CFF
M12,CHMP6,1.3503397,#00E5FF
M12,C20orf27,1.2033828,#00FF4D
M12,NEGR1,0.9676972,#4DFF00
M12,TNFAIP6,0.9122418,#E6FF00
M12,ZSCAN25,0.7375572,#00
M12,BCL2,0.6848745,#FFDE59
M12,CBL,0.6765562,#FFE0B3
M12,Others,25.2128102,lightgrey
M18,ALDH16A1,42.4503581,red
M18,ATF2,2.2360682,#4C00FF
M18,DIAPH1,1.5256507,#004CFF
M18,SESTD1,1.2053805,#00E5FF
M18,TFCP2,1.1587958,#00FF4D
M18,SCAPER,1.1180341,#4DFF00
M18,CUX1,1.0306877,#E6FF00
M18,TEX10,0.9841030,#00
M18,C6orf89,0.9666337,#FFDE59
M18,PTTG1IP,0.9258720,#FFE0B3
M18,Others,46.3984161,lightgrey")

df$trt <- factor(df$trt,levels=unique(as.character(df$trt)))
df$gene <- factor(df$gene,levels = unique(as.character(df$gene)))

ggplot(df, aes(x=trt,y=freq, fill = gene))+geom_bar(stat = "identity",
width = 0.5,color="black") + theme(axis.text.x = element_text(angle = 45,
hjust = 1,size = 4))

df$cols holds the colors I want to use for the different genes in M6, M12
and M18, as shown in the figure. In each bar, the 'Others' level of df$gene
should always be at the bottom.

Thank you

Aimin



[R] Spatial model for categorical data

2020-06-05 Thread Lena Fehlhaber
I did a regression analysis with categorical data with a glm model
approach, which worked fine. I have longitude and latitude coordinates for
each observation and I want to add their geographic spillover effect to the
model.

My sample data is structured:

Index  DV  IVI  IVII  IVIII  IVIV   Long   Lat
    1   0    2     1      3   -12  -17.8    12
    2   0    1     1      6   112     11  -122
    3   1    3     6      1    91     57    53

with regression eq. DV ~ IVI + IVII + IVIII + IVIV

That said, I assume that the nearer two regions are, the more they may
influence my dependent variable. I found several approaches for spatial
regression models, but none for categorical data. I tried existing
libraries and functions such as spdep's lagsarlm, glmmfields, spatialreg,
gstat, geoRglm and many more (I used this list as a reference:
https://cran.r-project.org/web/views/Spatial.html ). For numeric values, I
am able to do spatial regression, but for categorical values, I struggle.
The data structure is the following:

library(dplyr)
data <- data %>%
  mutate(
    DV = as.factor(DV),
    IVI = as.factor(IVI),
    IVII = as.factor(IVII),
    IVIII = as.factor(IVIII),
    IVIV = as.numeric(IVIV),
    longitude = as.numeric(longitude),
    latitude = as.numeric(latitude)
  )

My dependent variable (0|1) as well as my independent variables are
categorical, and it would be no use to transform them, of course. I want to
end up with another glm model, but with spatial spillover effects
included. The libraries I tested so far can't handle categorical data. Any
leads/ideas would be greatly appreciated.

Thanks a lot.



[R] Error in gee.fit$working.correlation[1, 2] : subscript out of bounds

2020-06-05 Thread Phat Chau
Hello,

I have a dataframe in R that looks like the following

   cluster id period   u_3 timeID startTrt Ijt   error      y
1:       1  1      0 -1.26      1        1   0  1.2015 17.809
2:       1  2      0 -1.26      1        1   0 -1.6577 14.950
3:       1  3      0 -1.26      1        1   0 -3.8639 12.744
4:       1  4      0 -1.26      1        1   0  1.4978 18.105
5:       1  5      0 -1.26      1        1   0 -5.3182 11.289

When I try to run a gee model on it using the geesmv package which adjusts the 
variance covariance matrix for small sample sizes as follows

test <- GEE.var.fg(y ~ factor(period) + factor(Ijt),id="id",family=gaussian, 
dx,corstr="exchangeable")

I get this error message:

Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
running glm to get initial regression estimate
    (Intercept) factor(period)1 factor(period)2 factor(period)3
          17.25           -8.27           -6.47           -9.13
factor(period)4 factor(period)5    factor(Ijt)1
          -8.17          -11.89            8.96
Error in gee.fit$working.correlation[1, 2] : subscript out of bounds

I think the usual culprit for this kind of error message is that the variable
being referred to (id in this case, I assume) is non-existent. That is clearly
not the case here, and I checked to make sure it is not a typo.

Does anyone know why this is? How would I troubleshoot this?

Thank you,
Edward




Re: [R] Applying a function to dataframe column where the function value depends on the value of another column

2020-06-05 Thread Jeff Newmiller
Pressed send too soon? This is not actually a question.

Do read the Posting Guide... for one thing you need to post in plain text 
because the automatic text conversion tends to mess up what you send if it is 
HTML.

On June 5, 2020 12:02:44 AM PDT, TJUN KIAT TEO  wrote:
>Suppose I have a dataframe in this form
>
>a b c
>g 2 3
>h 4 5
>i 6 7
>
>I want to apply a function to individual elements of column C where the
>function value depends on the value of column A
>

-- 
Sent from my phone. Please excuse my brevity.



[R] Applying a function to dataframe column where the function value depends on the value of another column

2020-06-05 Thread TJUN KIAT TEO
Suppose I have a dataframe in this form

a b c
g 2 3
h 4 5
i 6 7

I want to apply a function to individual elements of column C where the 
function value depends on the value of column A
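
One possible reading of this request, sketched in base R. Everything beyond
the three columns a, b, c is an assumption made for illustration: the list
funs, the rules inside it, and the new column name c_new are hypothetical,
since the post does not say what the function should be.

```r
df <- data.frame(a = c("g", "h", "i"), b = c(2, 4, 6), c = c(3, 5, 7),
                 stringsAsFactors = FALSE)

# Hypothetical rules: one function per value of column a.
funs <- list(g = function(x) x * 10,
             h = function(x) x + 100,
             i = sqrt)

# For each row, look up the function named by 'a' and apply it to 'c'.
df$c_new <- mapply(function(key, value) funs[[key]](value),
                   df$a, df$c, USE.NAMES = FALSE)
df
```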

and provide commented, minimal, self-contained, reproducible code.