[R] vignette for package splines2 in Journal of Data Science

2021-09-27 Thread Yan, Jun
Dear R-help-listers,

Users of splines2 may find the following paper in the Journal of Data Science 
(https://jds-online.org) interesting:

Wang, W. and Yan, J. (2021): Shape-restricted regression splines with R package 
splines2. Journal of Data Science. 19(3):498–517.
https://doi.org/10.6339/21-JDS1020

As Editor of the Journal of Data Science, I hope the paper sets an example for
the Computing for Data Science section of the journal. The section could be an
outlet for companion papers to your software packages.

Jun Yan, Professor
Department of Statistics, University of Connecticut
215 Glenbrook Rd. Unit 4120  Storrs, CT 06269
Voice: 860-486-3416  Fax: 860-486-4113
Web: http://www.stat.uconn.edu/~jyan/
http://scholar.google.com/citations?user=4jVhnnEJ=en
http://www.ams.org/mathscinet/search/publications.html?pg1=IID=743600

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Change the order of stacked bar

2020-06-05 Thread Aimin Yan
I want to use the code below this message to make a stacked bar plot. My
question is:


I want the stacked bars and their legend to follow the order of tr from
left to right, as follows:

"100.0.250ng_CellLine_0" "75.25.250ng_CellLine_0"
"50.50.250ng_CellLine_0" "10.90.250ng_CellLine_0"
"1.99.250ng_CellLine_0" "0.100.250ng_CellLine_0"
"100.0.500ng_CellLine_0" "75.25.500ng_CellLine_0"
"50.50.500ng_CellLine_0" "10.90.500ng_CellLine_0"
"1.99.500ng_CellLine_0" "0.100.500ng_CellLine_0"

However, the following code does not seem to generate the stacked bars
in this order.

In addition, for '0.100.500ng_CellLine_0' in df, the order of genes
and colors in the stacked bar is not the same as the order in df. How can
I change this?

Another question:

tr has 12 treatments, and I have to add new_scale_fill() for each
treatment, which makes the code long. Is there a way to simplify this?

Thank you

Aimin


library(ggplot2)

library(dplyr)

library(tidyverse)

library(ggnewscale)

df <- read.csv(text='"trt","gene","freq","cols"
 "100.0.250ng_CellLine_0","ALDH16A1",100,"red"
 "100.0.250ng_CellLine_0","Others",0,"lightgrey"
 "75.25.250ng_CellLine_0","ALDH16A1",64.6638014695688,"red"
 "75.25.250ng_CellLine_0","GBE1",2.0074864827395,"#4C00FF"
 "75.25.250ng_CellLine_0","ZNF598",1.5832524608346,"#004CFF"
 "75.25.250ng_CellLine_0","CHMP6",1.35033966449466,"#00E5FF"
 "75.25.250ng_CellLine_0","C20orf27",1.2033827810897,"#00FF4D"
 "75.25.250ng_CellLine_0","NEGR1",0.967697213364758,"#4DFF00"
 "75.25.250ng_CellLine_0","TNFAIP6",0.912241785664772,"#E6FF00"
 "75.25.250ng_CellLine_0","ZSCAN25",0.737557188409816,"#00"
 "75.25.250ng_CellLine_0","BCL2",0.684874532094829,"#FFDE59"
 "75.25.250ng_CellLine_0","CBL",0.676556217939831,"#FFE0B3"
 "75.25.250ng_CellLine_0","Others",25.2128102037987,"lightgrey"
 "50.50.250ng_CellLine_0","ALDH16A1",42.4503581203051,"red"
 "50.50.250ng_CellLine_0","ATF2",2.2360682428,"#4C00FF"
 "50.50.250ng_CellLine_0","DIAPH1",1.52565073079835,"#004CFF"
 "50.50.250ng_CellLine_0","SESTD1",1.20538053921854,"#00E5FF"
 "50.50.250ng_CellLine_0","TFCP2",1.15879578407966,"#00FF4D"
 "50.50.250ng_CellLine_0","SCAPER",1.1180341214,"#4DFF00"
 "50.50.250ng_CellLine_0","CUX1",1.03068770744774,"#E6FF00"
 "50.50.250ng_CellLine_0","TEX10",0.984102952308857,"#00"
 "50.50.250ng_CellLine_0","C6orf89",0.966633669131777,"#FFDE59"
 "50.50.250ng_CellLine_0","PTTG1IP",0.925872008385256,"#FFE0B3"
 "50.50.250ng_CellLine_0","Others",46.3984161183253,"lightgrey"
 "10.90.250ng_CellLine_0","ALDH16A1",4.68952007835455,"red"
 "10.90.250ng_CellLine_0","STK11",1.93143976493634,"#4C00FF"
 "10.90.250ng_CellLine_0","ERGIC2",1.46523016650343,"#004CFF"
 "10.90.250ng_CellLine_0","EFR3A",1.1126346718903,"#00E5FF"
 "10.90.250ng_CellLine_0","TMEM235",1.03819784524976,"#00FF4D"
 "10.90.250ng_CellLine_0","NGLY1",1.01469147894221,"#4DFF00"
 "10.90.250ng_CellLine_0","CNOT10",0.991185112634672,"#E6FF00"
 "10.90.250ng_CellLine_0","NPLOC4",0.983349657198825,"#00"
 "10.90.250ng_CellLine_0","GZMB",0.928501469147894,"#FFDE59"
 "10.90.250ng_CellLine_0","KIF2C",0.924583741429971,"#FFE0B3"
 "10.90.250ng_CellLine_0","Others",84.9206660137121,"lightgrey"
 "1.99.250ng_CellLine_0","DNAH1",2.36284289276808,"red"
 "1.99.250ng_CellLine_0","ALOX5AP",2.29426433915212,"#4C00FF"
 "1.99.250ng_CellLine_0","SEPT7",1.78304239401496,"#004CFF"
 "1.99.250ng_CellLine_0","TCF20",1.35910224438903,"#00E5FF"
 "1.99.250ng_CellLine_0","USP32",1.27805486284289,"#00FF4D"
 "1.99.250ng_CellLine_0","MUS81",1.24688279301746,"#4DFF00"
 "1.99.250ng_CellLine_0","CEP44",1.22817955112219,"#E6FF00"
 "1.99.250ng_CellLine_0","TMEM164",1.20324189526185,"#00"
 "1.99.250ng_CellLine_0","RAP1B",1.18453865336658,"#FFDE59"
 "1.99.250ng_CellLine_0","GSN",1.14713216957606,"#FFE0B3"
 "1.99.250ng_CellLine_0","Others",84.9127182044888,"lightgrey"
 "0.100.250ng_CellLine_0","RTN3",2.3050199437531,"red"
 "0.100.250ng_CellLine_0","CHTF18",1.67637814091135,"#4C00FF"
 "0.100.250ng_CellLine_0","RNPS1",1.41168685550429,"#004CFF"
 "0.100.250ng_CellLine_0","RBKS",1.05325073984891,"#00E5FF"
 "0.100.250ng_CellLine_0","ZNF805",0.987077918497142,"#00FF4D"
 

[R] (no subject)

2020-06-05 Thread Aimin Yan
I want the stacked bars and their legend to follow the order of tr from
left to right, as follows:

"100.0.250ng_CellLine_0" "75.25.250ng_CellLine_0"
"50.50.250ng_CellLine_0" "10.90.250ng_CellLine_0"
"1.99.250ng_CellLine_0" "0.100.250ng_CellLine_0"
"100.0.500ng_CellLine_0" "75.25.500ng_CellLine_0"
"50.50.500ng_CellLine_0" "10.90.500ng_CellLine_0"
"1.99.500ng_CellLine_0" "0.100.500ng_CellLine_0"

However, the above code does not seem to generate the stacked bars in this order.

In addition, for '0.100.500ng_CellLine_0' in df, the order of genes
and colors in the stacked bar is not the same as the order in df:



  0.100.500ng_CellLine_0   ALYREF       1.5326986   red
  0.100.500ng_CellLine_0   HCG18        1.5108475   #4C00FF
  0.100.500ng_CellLine_0   RNU7-146P    0.9224286   #004CFF
  0.100.500ng_CellLine_0   ST3GAL3      0.8849696   #00E5FF
  0.100.500ng_CellLine_0   HSF1         0.8116123   #00FF4D
  0.100.500ng_CellLine_0   HP1BP3       0.7928828   #4DFF00
  0.100.500ng_CellLine_0   DAOA         0.7366942   #E6FF00
  0.100.500ng_CellLine_0   CDK13        0.6898705   #00
  0.100.500ng_CellLine_0   PDXDC1       0.6805057   #FFDE59
  0.100.500ng_CellLine_0   CKAP5        0.6477290   #FFE0B3
  0.100.500ng_CellLine_0   Others      90.7897612   lightgrey'

library(dplyr)
library(tidyverse)
library(ggnewscale)

df <- read.csv(text='"trt","gene","freq","cols"
 "100.0.250ng_CellLine_0","ALDH16A1",100,"red"
 "100.0.250ng_CellLine_0","Others",0,"lightgrey"
 "75.25.250ng_CellLine_0","ALDH16A1",64.6638014695688,"red"
 "75.25.250ng_CellLine_0","GBE1",2.0074864827395,"#4C00FF"
 "75.25.250ng_CellLine_0","ZNF598",1.5832524608346,"#004CFF"
 "75.25.250ng_CellLine_0","CHMP6",1.35033966449466,"#00E5FF"
 "75.25.250ng_CellLine_0","C20orf27",1.2033827810897,"#00FF4D"
 "75.25.250ng_CellLine_0","NEGR1",0.967697213364758,"#4DFF00"
 "75.25.250ng_CellLine_0","TNFAIP6",0.912241785664772,"#E6FF00"
 "75.25.250ng_CellLine_0","ZSCAN25",0.737557188409816,"#00"
 "75.25.250ng_CellLine_0","BCL2",0.684874532094829,"#FFDE59"
 "75.25.250ng_CellLine_0","CBL",0.676556217939831,"#FFE0B3"
 "75.25.250ng_CellLine_0","Others",25.2128102037987,"lightgrey"
 "50.50.250ng_CellLine_0","ALDH16A1",42.4503581203051,"red"
 "50.50.250ng_CellLine_0","ATF2",2.2360682428,"#4C00FF"
 "50.50.250ng_CellLine_0","DIAPH1",1.52565073079835,"#004CFF"
 "50.50.250ng_CellLine_0","SESTD1",1.20538053921854,"#00E5FF"
 "50.50.250ng_CellLine_0","TFCP2",1.15879578407966,"#00FF4D"
 "50.50.250ng_CellLine_0","SCAPER",1.1180341214,"#4DFF00"
 "50.50.250ng_CellLine_0","CUX1",1.03068770744774,"#E6FF00"
 "50.50.250ng_CellLine_0","TEX10",0.984102952308857,"#00"
 "50.50.250ng_CellLine_0","C6orf89",0.966633669131777,"#FFDE59"
 "50.50.250ng_CellLine_0","PTTG1IP",0.925872008385256,"#FFE0B3"
 "50.50.250ng_CellLine_0","Others",46.3984161183253,"lightgrey"
 "10.90.250ng_CellLine_0","ALDH16A1",4.68952007835455,"red"
 "10.90.250ng_CellLine_0","STK11",1.93143976493634,"#4C00FF"
 "10.90.250ng_CellLine_0","ERGIC2",1.46523016650343,"#004CFF"
 "10.90.250ng_CellLine_0","EFR3A",1.1126346718903,"#00E5FF"
 "10.90.250ng_CellLine_0","TMEM235",1.03819784524976,"#00FF4D"
 "10.90.250ng_CellLine_0","NGLY1",1.01469147894221,"#4DFF00"
 "10.90.250ng_CellLine_0","CNOT10",0.991185112634672,"#E6FF00"
 "10.90.250ng_CellLine_0","NPLOC4",0.983349657198825,"#00"
 "10.90.250ng_CellLine_0","GZMB",0.928501469147894,"#FFDE59"
 "10.90.250ng_CellLine_0","KIF2C",0.924583741429971,"#FFE0B3"
 "10.90.250ng_CellLine_0","Others",84.9206660137121,"lightgrey"
 "1.99.250ng_CellLine_0","DNAH1",2.36284289276808,"red"
 "1.99.250ng_CellLine_0","ALOX5AP",2.29426433915212,"#4C00FF"
 "1.99.250ng_CellLine_0","SEPT7",1.78304239401496,"#004CFF"
 "1.99.250ng_CellLine_0","TCF20",1.35910224438903,"#00E5FF"
 "1.99.250ng_CellLine_0","USP32",1.27805486284289,"#00FF4D"
 "1.99.250ng_CellLine_0","MUS81",1.24688279301746,"#4DFF00"
 "1.99.250ng_CellLine_0","CEP44",1.22817955112219,"#E6FF00"
 "1.99.250ng_CellLine_0","TMEM164",1.20324189526185,"#00"
 "1.99.250ng_CellLine_0","RAP1B",1.18453865336658,"#FFDE59"
 "1.99.250ng_CellLine_0","GSN",1.14713216957606,"#FFE0B3"
 

Re: [R] ask help for ggplot

2020-06-05 Thread Aimin Yan
Thank you, it is very helpful.

I tried the following way to generate stacked bar plots for trt 'M6' and
'M12'.

However, the legend labels for 'M12' are not positioned as I want. In the
legend I also want to keep "Others" at the bottom (matching the gene order
in the stacked bar).

In addition, how can I make a stacked bar plot for 'M6', 'M12' and 'M18'
together, with a separate legend for each ('M6', 'M12', 'M18')?

Thank you,

Aimin

df.1 <- df[df$trt=='M6',]

g <- unique(as.character(df.1$gene))
i <- which(g == "Others")
g <- c(g[-i], g[i])

df.1$trt <- factor(df.1$trt,levels=unique(as.character(df$trt)))
df.1$gene <- factor(df.1$gene,levels = g)

df.1 %>% ggplot(aes(x=trt,y=freq, fill = gene, group = gene)) +
  geom_bar(stat = "identity", width = 0.5) +
  scale_fill_manual(breaks = df$gene, values = df$cols) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1,size = 4)) +
  theme(legend.position = "bottom") +
  guides(fill = guide_legend(title = as.character(df.1$trt[1]),
                             title.position = "top", ncol = 1,
                             keyheight = 0.35, default.unit = "inch"))

df.2 <- df[df$trt=='M12',]

g <- unique(as.character(df.2$gene))
i <- which(g == "Others")
g <- c(g[-i], g[i])

df.2$trt <- factor(df.2$trt,levels=unique(as.character(df$trt)))
df.2$gene <- factor(df.2$gene,levels = g)

df.2 %>% ggplot(aes(x=trt,y=freq, fill = gene, group = gene)) +
  geom_bar(stat = "identity", width = 0.5) +
  scale_fill_manual(breaks = df$gene, values = df$cols) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1,size = 4)) +
  theme(legend.position = "bottom") +
  guides(fill = guide_legend(title = as.character(df.2$trt[1]),
                             title.position = "top", ncol = 1,
                             keyheight = 0.35, default.unit = "inch"))




On Fri, Jun 5, 2020 at 5:36 AM Rui Barradas  wrote:

> Hello,
>
> Something like this?
>
>
> g <- unique(as.character(df$gene))
> i <- which(g == "Others")
> g <- c(g[i], g[-i])
> df$trt <- factor(df$trt,levels=unique(as.character(df$trt)))
> df$gene <- factor(df$gene,levels = g)
>
> ggplot(df, aes(x=trt,y=freq, fill = gene, group = gene)) +
>   geom_bar(stat = "identity", width = 0.5,
>            position = position_fill()) +
>   scale_fill_manual(breaks = df$gene, values = df$cols) +
>   theme(axis.text.x = element_text(angle = 45, hjust = 1,size = 4))
>
>
> But this places "Others" at the top of each bar.
> To move it to the bottom, instead of the code that creates 'g' run
>
> g <- unique(as.character(df$gene))
> i <- which(g == "Others")
> g <- c(g[-i], g[i])
>
>
> Hope this helps,
>
> Rui Barradas
>
>
> At 05:14 on 05/06/20, Aimin Yan wrote:
> > Is it possible to generate a barplot like the one in the following link
> > using ggplot?
> >
> > https://photos.app.goo.gl/E3MC461dKaTZfHza9
> >
> > Here is what I did:
> >
> > library(ggplot2)
> >
> > df <- read.csv(text=
> > "trt,gene,freq,cols
> > M6,ALDH16A1,100.000,red
> > M6,Others,0.000,lightgrey
> > M12,ALDH16A1,64.6638015,red
> > M12,GBE1,2.0074865,#4C00FF
> > M12,ZNF598,1.5832525,#004CFF
> > M12,CHMP6,1.3503397,#00E5FF
> > M12,C20orf27,1.2033828,#00FF4D
> > M12,NEGR1,0.9676972,#4DFF00
> > M12,TNFAIP6,0.9122418,#E6FF00
> > M12,ZSCAN25,0.7375572,#00
> > M12,BCL2,0.6848745,#FFDE59
> > M12,CBL,0.6765562,#FFE0B3
> > M12,Others,25.2128102,lightgrey
> > M18,ALDH16A1,42.4503581,red
> > M18,ATF2,2.2360682,#4C00FF
> > M18,DIAPH1,1.5256507,#004CFF
> > M18,SESTD1,1.2053805,#00E5FF
> > M18,TFCP2,1.1587958,#00FF4D
> > M18,SCAPER,1.1180341,#4DFF00
> > M18,CUX1,1.0306877,#E6FF00
> > M18,TEX10,0.9841030,#00
> > M18,C6orf89,0.9666337,#FFDE59
> > M18,PTTG1IP,0.9258720,#FFE0B3
> > M18,Others,46.3984161,lightgrey")
> >
> > df$trt <- factor(df$trt,levels=unique(as.character(df$trt)))
> > df$gene <- factor(df$gene,levels = unique(as.character(df$gene)))
> >
> > ggplot(df, aes(x=trt,y=freq, fill = gene))+geom_bar(stat = "identity",
> > width = 0.5,color="black") + theme(axis.text.x = element_text(angle = 45,
> > hjust = 1,size = 4))
> >
> > df$cols gives the color I want to use for each gene in M6, M12 and M18,
> > as shown in the figure. In each bar, the 'Others' level of df$gene should
> > always be at the bottom.
> >
> > Thank you
> >
> > Aimin


Re: [R] ask help for ggplot

2020-06-04 Thread Aimin Yan
Is it possible to generate a barplot like the one in the following link using ggplot?

https://photos.app.goo.gl/E3MC461dKaTZfHza9

Here is what I did:

library(ggplot2)

df <- read.csv(text=
"trt,gene,freq,cols
M6,ALDH16A1,100.000,red
M6,Others,0.000,lightgrey
M12,ALDH16A1,64.6638015,red
M12,GBE1,2.0074865,#4C00FF
M12,ZNF598,1.5832525,#004CFF
M12,CHMP6,1.3503397,#00E5FF
M12,C20orf27,1.2033828,#00FF4D
M12,NEGR1,0.9676972,#4DFF00
M12,TNFAIP6,0.9122418,#E6FF00
M12,ZSCAN25,0.7375572,#00
M12,BCL2,0.6848745,#FFDE59
M12,CBL,0.6765562,#FFE0B3
M12,Others,25.2128102,lightgrey
M18,ALDH16A1,42.4503581,red
M18,ATF2,2.2360682,#4C00FF
M18,DIAPH1,1.5256507,#004CFF
M18,SESTD1,1.2053805,#00E5FF
M18,TFCP2,1.1587958,#00FF4D
M18,SCAPER,1.1180341,#4DFF00
M18,CUX1,1.0306877,#E6FF00
M18,TEX10,0.9841030,#00
M18,C6orf89,0.9666337,#FFDE59
M18,PTTG1IP,0.9258720,#FFE0B3
M18,Others,46.3984161,lightgrey")

df$trt <- factor(df$trt,levels=unique(as.character(df$trt)))
df$gene <- factor(df$gene,levels = unique(as.character(df$gene)))

ggplot(df, aes(x=trt,y=freq, fill = gene))+geom_bar(stat = "identity",
width = 0.5,color="black") + theme(axis.text.x = element_text(angle = 45,
hjust = 1,size = 4))

df$cols gives the color I want to use for each gene in M6, M12 and M18,
as shown in the figure. In each bar, the 'Others' level of df$gene should
always be at the bottom.

Thank you

Aimin
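
A minimal sketch of the level reordering this question calls for, using a small subset of the rows above. It assumes, as Rui Barradas's reply in this thread indicates, that ggplot2 stacks the fill groups in factor-level order with the last level at the bottom of the bar. The names `demo`, `g`, `pal` and `p` are made up for illustration, and the plot object is only built when ggplot2 is installed.

```r
# Hypothetical subset of the data from the message above
demo <- data.frame(
  trt  = c("M6", "M6", "M12", "M12", "M12"),
  gene = c("ALDH16A1", "Others", "ALDH16A1", "GBE1", "Others"),
  freq = c(100, 0, 64.6638015, 2.0074865, 25.2128102),
  cols = c("red", "lightgrey", "red", "#4C00FF", "lightgrey"),
  stringsAsFactors = FALSE
)

# Keep treatments in order of first appearance
demo$trt <- factor(demo$trt, levels = unique(demo$trt))

# Put "Others" last in the gene levels so it is stacked at the bottom
g <- unique(demo$gene)
g <- c(setdiff(g, "Others"), "Others")

# One colour per gene level, looked up by name (first match per gene)
pal <- setNames(demo$cols[match(g, as.character(demo$gene))], g)

demo$gene <- factor(demo$gene, levels = g)

# Build the plot only if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(demo, ggplot2::aes(x = trt, y = freq, fill = gene)) +
    ggplot2::geom_col(width = 0.5) +
    ggplot2::scale_fill_manual(values = pal)
}
```

Using a named colour vector ties each colour to a gene by name rather than by position, which may be easier to keep consistent once the levels are reordered.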



[R] ask help for ggplot

2020-06-04 Thread Aimin Yan
I have a question about using ggplot.

Is it possible to generate a barplot like the attached file using ggplot?

Thank you,

Aimin


[R] Nominations sought for the 2021 ASA Statistical Computing and Graphics Award

2019-12-19 Thread Yan, Jun
(Apologies for cross-posting)

Dear Colleagues,

The ASA Section on Statistical Computing and Section on Statistical Graphics 
are inviting nominations of deserving individuals or teams for the 2021 ASA 
Statistical Computing and Graphics Award 
(https://community.amstat.org/jointscsg-section/awards/computing-graphics-award).
 

The Statistical Computing and Graphics Award recognizes an individual or team 
for innovation in computing, software, or graphics that has had a significant 
impact on statistical practice or research. The past awardees are Luke Tierney 
(2019), Bill Cleveland (2016), and Robert Gentleman and Ross Ihaka (2010). The 
prize carries with it a cash award of $5,000 plus an allowance of up to $1,000 
for travel to the Joint Statistical Meetings (JSM) where the award will be 
presented. Nomination packets must be submitted by email to Dr. Jun Yan,
the Awards Chair of the two sections, at jun@uconn.edu by May 31, 2020. Dr.
Yan will be pleased to answer any questions about the submission process and
the preparation of the nomination materials.

Qualifications

The prize-winning contribution will have had significant and lasting impacts on 
statistical computing, software or graphics.

The nominee should be a member of the ASA. The Statistical Computing and 
Graphics Award Committee will review the nominations and make the final 
determination of who, if any, should receive the award. The award may not be 
given to a sitting member of the Awards Committee or a sitting member of the 
Executive Committee of the Section of Statistical Computing or the Section of 
Statistical Graphics.

Nomination and Award Dates

Nominations are due by May 31, 2020 for an award to be presented at the 2021 
JSM.

Nominations should be submitted as a complete packet, consisting of:

+ a nomination letter, no longer than four pages, addressing points in the 
selection criteria
+ nominee’s curriculum vita(e)
+ a minimum of 3 (and no more than 4) supporting letters, each no longer than 
two pages

Selection Process

The Committee will consist of the Chairs and Past Chairs of the Section of 
Statistical Computing and the Section of Statistical Graphics. The committee 
will meet at the 2020 JSM to select the recipient(s) of the award.

Jun Yan, Awards Chair
ASA Section on Statistical Computing and
Section on Statistical Graphics


[R] 2020 Chambers Statistical Software Award submission deadline extended to December 31, 2019

2019-12-18 Thread Yan, Jun
Dear R-help listers,

The deadline for the 2020 Chambers Statistical Software Award submission has 
been extended to December 31, 2019.

The Statistical Computing Section of the American Statistical Association 
announces the competition for the John M. Chambers Statistical Software Award. 
In 1998 the Association for Computing Machinery (ACM) presented the ACM 
Software System Award to John Chambers for the design and development of S. Dr. 
Chambers generously donated his award to the Statistical Computing Section to 
endow an annual prize for statistical software written by, or in collaboration 
with, an undergraduate or graduate student.

Please see http://asa.stat.uconn.edu/#chambers-2020 for detailed instructions.

Best regards

Jun Yan, Awards Chair
ASA Section on Statistical Computing
Professor, Department of Statistics
University of Connecticut




[R] [R-pkgs] Introducing teamr Package

2019-07-24 Thread Tanbing Yan
Today I am pleased to introduce my first CRAN package, teamr, for sending
formatted messages to Microsoft Teams.

The motivation is simple. For years I have been using Slack and have built many
slash commands and apps using incoming webhooks with R. Since I started using
Teams, I have found the same need to communicate with it from R. So, with some
inspiration from the Python package pymsteams, I created the teamr package
in the hope that it will provide a simple and clean way to talk
to Teams from R.

Installation

You can install the released version of teamr from CRAN with:

install.packages("teamr")

And the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("wwwjk366/teamr")

Example

This is a basic example of sending a simple titled message to MS Teams:

library(teamr)
# initiate new connector card object
cc <- connector_card$new(hookurl = "https://outlook.office.com/webhook/...")
# add text
cc$text("This is text of main body.")
# add title
cc$title("This is message title")
# add hyperlink button
cc$add_link_button("Read more", "https://www.google.com")
# change theme color
cc$color("#008000")

We can print the payload that will be sent to the given webhook using the
print method:

# print out the payload for checking
cc$print()

Card:
  hookurl: https://outlook.office.com/webhook/...
  payload:  {"text":"This is text of main body.","title":"This is message title","potentialAction":[{"@context":"http://schema.org","@type":"ViewAction","name":"Read more","target":["https://www.google.com"]}],"themeColor":"#008000"}

Our JSON payload looks good, time to send it out :)

# send to Teams
cc$send()

[1] TRUE

The send method returns TRUE if the send was successful (status code 200). If
it fails, it returns the response object for further investigation. Our
message with a link button will look like this:

[[alternative HTML version deleted]]

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages



Re: [R] color question

2019-02-27 Thread Aimin Yan
The heatmap is attached.

Thank you,

Aimin

On Wed, Feb 27, 2019 at 10:51 PM Aimin Yan  wrote:

> I have a question about assigning color based on the value of a matrix
>
> The following is my matrix.
>
> d
>         lateRT      earlyRT      NAD         ciLAD       LAD
> lateRT  1.0         0.00         0.006224017 0.001260241 0.0069699285
> earlyRT 0.0         1.00         0.001425649 0.007418436 0.0007096344
> NAD     0.006224017 0.0014256488 1.0         0.064653780 0.3935566356
> ciLAD   0.001260241 0.0074184361 0.064653780 1.0         0.0024839407
> LAD     0.006969928 0.0007096344 0.393556636 0.002483941 1.00
>
> I want to use the following function to get a heatmap and dendrogram:
>
> > heatmap.2(d,trace="none",margin=c(8, 10))
>
> but it is hard to choose colors so that 0.001260241 and 0.0074184361 are
> visually distinguishable.
>
> Does anyone know how to adjust color based on these values in this matrix?
>
> Thank you,
>
> Aimin
>
>
>


[R] color question

2019-02-27 Thread Aimin Yan
I have a question about assigning color based on the value of a matrix

The following is my matrix.

d
        lateRT      earlyRT      NAD         ciLAD       LAD
lateRT  1.0         0.00         0.006224017 0.001260241 0.0069699285
earlyRT 0.0         1.00         0.001425649 0.007418436 0.0007096344
NAD     0.006224017 0.0014256488 1.0         0.064653780 0.3935566356
ciLAD   0.001260241 0.0074184361 0.064653780 1.0         0.0024839407
LAD     0.006969928 0.0007096344 0.393556636 0.002483941 1.00

I want to use the following function to get a heatmap and dendrogram:

> heatmap.2(d,trace="none",margin=c(8, 10))

but it is hard to choose colors so that 0.001260241 and 0.0074184361 are
visually distinguishable.

Does anyone know how to adjust color based on these values in this matrix?

Thank you,

Aimin
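
One possible approach, sketched under assumptions rather than taken from the thread: when most values sit within a few orders of magnitude above zero, log-spaced colour breaks can separate them. The matrix below is retyped from the message, with the diagonal 1s and the single 0 assumed from the truncated printout. The palette, the number of bins and the names `lo`, `n_col` and `bins` are illustrative. `breaks` and `col` are existing arguments of gplots::heatmap.2, and the plot is drawn only in an interactive session with gplots installed.

```r
# Matrix from the message; the diagonal 1s and the single 0 are assumed
# from the truncated printout
nm <- c("lateRT", "earlyRT", "NAD", "ciLAD", "LAD")
d <- matrix(c(
  1,           0,            0.006224017, 0.001260241, 0.0069699285,
  0,           1,            0.001425649, 0.007418436, 0.0007096344,
  0.006224017, 0.001425649,  1,           0.064653780, 0.3935566356,
  0.001260241, 0.007418436,  0.064653780, 1,           0.0024839407,
  0.006969928, 0.0007096344, 0.393556636, 0.002483941, 1
), nrow = 5, byrow = TRUE, dimnames = list(nm, nm))

# Log-spaced breaks between the smallest positive value and 1,
# plus a lower break at 0 so exact zeros fall into the first bin
lo <- min(d[d > 0])
n_col <- 20
breaks <- c(0, 10^seq(log10(lo), 0, length.out = n_col))
cols <- colorRampPalette(c("white", "steelblue", "darkred"))(n_col)

# Bin index of the two values that were hard to tell apart
bins <- findInterval(c(0.001260241, 0.0074184361), breaks)

# Draw only in an interactive session with gplots installed
if (interactive() && requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(d, trace = "none", margins = c(8, 10),
                    breaks = breaks, col = cols)
}
```

With these breaks the two values land in different bins, so they are drawn in different colours. The dendrogram is still computed from the untransformed matrix.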



Re: [R] Seeking nomination for the Statistical Computing and Graphics Award

2018-10-18 Thread Yan, Jun
This is a reminder that the deadline is about 4 weeks ahead. We'd appreciate 
your spreading the word or nominating a candidate.


Best regards.

Jun Yan


From: Yan, Jun
Sent: Tuesday, August 21, 2018 6:39:46 AM
To: r-help@R-project.org
Subject: Seeking nomination for the Statistical Computing and Graphics Award

(apologies for cross posting)

The Statistical Computing and Graphics Award of the ASA Sections of Statistical 
Computing and Statistical Graphics recognizes an individual or team for 
innovation in computing, software, or graphics that has had a great impact on 
statistical practice or research. The past awardees include Bill Cleveland 
(2016) and Robert Gentleman and Ross Ihaka (2010). The prize carries with it a 
cash award of $5,000 plus an allowance of up to $1,000 for travel to the next 
Joint Statistical Meetings (JSM) where the award will be presented.

Qualifications
The prize-winning contribution will have had significant and lasting impacts on 
statistical computing, software or graphics.

The Awards Committee depends on the American Statistical Association membership 
to submit nominations. Committee members will review the nominations and make 
the final determination of who, if any, should receive the award. The award may 
not be given to a sitting member of the Awards Committee or a sitting member of 
the Executive Committee of the Section of Statistical Computing or the Section 
of Statistical Graphics.

Nomination and Award Dates
Nominations are due by November 15, 2018 for an award to be presented at the 
JSM in the following year. Nominations should be submitted as a complete 
packet, consisting of:
- a nomination letter, no longer than four pages, addressing points in the 
selection criteria
- nominee's curriculum vita(e)
- a minimum of 3 (and no more than 4) supporting letters, each no longer than 
two pages

Selection Process
The Awards Committee will consist of the Chairs and Past Chairs of the Sections 
on Statistical Computing and Statistical Graphics. The selection process will 
be handled by the Awards Chair of the Statistical Computing Section and the 
Statistical Graphics Section. Nominations and questions are to be sent to the 
e-mail address below.

Jun Yan
Professor
University of Connecticut
jun@uconn.edu








Re: [R] [FORGED] Re: Change the position of label when using R package eulerr

2018-09-17 Thread Aimin Yan
Yes, it does. Thank you.

Aimin

On Sun, Sep 16, 2018 at 11:00 PM Paul Murrell 
wrote:

> Hi
>
> The 'x' component of the 't' grob that you get back from grid.get() is a
> unit object, which you can subset and assign to a subset, for example
> this code nudges the fourth label up and to the right 1mm in each
> direction ...
>
>
> x <- t$x
> y <- t$y
>
> x[4] <- t$x[4] + unit(1, "mm")
> y[4] <- t$y[4] + unit(1, "mm")
>
> grid.edit("quantities.grob", x=x, y=y)
>
>
> ... is that the sort of thing you were looking for ?
>
> Paul
>
> On 15/09/18 09:03, Aimin Yan wrote:
> > Thank you,
> >
> > I figure out a way like this:
> >
> > fit1 <- euler(c("ciLAD" = 785, "LAD" = 565, "nonXL_MEF" = 167,
> >  "ciLAD" = 3, "ciLAD_MEF" = 101,
> > "LAD_MEF" = 541,
> >  "ciLAD_MEF" = 2),shape = "ellipse")
> >
> > plot(fit1,quantities = TRUE,fill = rainbow(7),lty = 1:2,
> >      labels = list(font = 1),alpha=0.7)
> >
> > grid.ls()
> > t <- grid.get("quantities.grob")
> > names(t)
> >
> > # Change these value will change the location of label.
> >
> > grid.edit("quantities.grob",x=unit.c(unit(-14.9884684724791, "native"),
> >   unit(-14.883684319653, "native"),
> >   unit(13.9805892820006, "native"),
> >   unit(-12.8808987356981, "native"),
> >   unit(-11.488226371243, "native"),
> >   unit(-9.51474016085318, "native"),
> >   unit(-1.00436055190216, "native")))
> >
> > grid.edit("quantities.grob",y=unit.c(unit(-8.07672595120493, "native"),
> >   unit(4.78718651828883, "native"),
> >   unit(0.25941593099694, "native"),
> >   unit(-4.32200781461293, "native"),
> >   unit(25.7349463488991, "native"),
> >   unit(-22.7610031110325, "native"),
> >   unit(14.5001560838519, "native")))
> >
> > However, I only want to change the x and y values of the 4th label. Does
> > anyone know how to set them?
> >
> > Aimin
> >
> > On Thu, Sep 13, 2018 at 9:56 PM David Winsemius 
> > wrote:
> >
> >>
> >>> On Sep 13, 2018, at 2:31 PM, Aimin Yan 
> wrote:
> >>>
> >>> I am using eulerr to get venn.
> >>> My code is like:
> >>>
> >>> fit1 <- euler(c("ciLAD" = 785, "LAD" = 565, "nonXL_MEF" = 167,
> >>> "ciLAD" = 3, "ciLAD_MEF" = 101,
> >>> "LAD_MEF" = 541,
> >>> "ciLAD_MEF" = 2),shape = "ellipse")
> >>>
> >>> plot(fit1,quantities = TRUE,fill = rainbow(7),lty = 1:2,
> >>>      labels = list(font = 1),alpha=0.7)
> >>>
> >>> After I get the figure, I find that the positions of some labels need to be
> >>> adjusted.
> >>>
> >>> Does anyone have an idea of how to do this?
> >>
> >> Looking at the code of plot.euler we see that the plotting paradigm is
> >> grid. So you could assign the output to a data.object name, search for list
> >> items that match the names of the labels you want to reposition, and modify
> >> the position values. You would need to be more specific, if you want a
> >> worked example.
> >>
> >> As far as I can see the labels and positions are fairly deep inside a list
> >> structure:
> >>
> >>   $ children :List of 1
> >>..$ GRID.gTree.12:List of 5
> >>.. ..$ children
> >>   $ diagram.grob.1
> >>  $children
> >> .. .. .. .. ..$ labels.grob:List of 11
> >>.. .. .. .. .. ..$ label: chr [1:3] "ciLAD" "LAD" "nonXL_MEF"
> >>.. .. .. .. .. ..$ x: 'unit' num [1:3] -1

Re: [R] Change the position of label when using R package eulerr

2018-09-15 Thread Aimin Yan
Thank you, I tried your code, but I still got an error:

  fit1 <- euler(c("ciLAD" = 785, "LAD" = 565, "nonXL_MEF" = 167,
"ciLAD" = 3, "ciLAD_MEF" = 101,
"LAD_MEF" = 541,
"ciLAD_MEF" = 2),shape = "ellipse")

fit1.plot <- plot(fit1,quantities = TRUE,fill = rainbow(7),lty =
1:2,labels = list(font = 1),alpha=0.7)
fit1.plot
grid.ls(fit1.plot)
t <- grid.get("quantities.grob")
names(t)
t$label
t$x
t$y

# Try to change the x and y value of the 4th label "2"
grid.edit("quantities.grob",x[[4]]=unit(-11.8262244206465, "native"))
grid.edit("quantities.grob",y[[4]]=unit(-5.19720701058398, "native"))

Error: unexpected '=' in "grid.edit("quantities.grob",x[[4]]="

Aimin
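
The error above arises because `x[[4]]=` is not a valid argument name in a function call; only plain names such as `x` and `y` can be used. A minimal sketch of one way around it, using only the grid package: modify the fourth element of the unit vectors first, then pass the whole vectors to grid.edit(). The text grob drawn here is a stand-in for the labels produced by the eulerr plot, not the actual eulerr output, and its positions are invented for illustration.

```r
library(grid)

pdf(NULL)  # off-screen device so the sketch runs without opening a window
grid.newpage()

# Stand-in for the labels drawn by plot(fit1, quantities = TRUE)
grid.text(label = as.character(1:7),
          x = unit(seq(0.1, 0.9, length.out = 7), "npc"),
          y = unit(rep(0.5, 7), "npc"),
          name = "quantities.grob")

t <- grid.get("quantities.grob")

# Rebuild the full x and y vectors with only the 4th element shifted by 1mm
x <- unit.c(t$x[1:3], t$x[4] + unit(1, "mm"), t$x[5:7])
y <- unit.c(t$y[1:3], t$y[4] + unit(1, "mm"), t$y[5:7])

# Pass the whole vectors under the plain argument names x and y
grid.edit("quantities.grob", x = x, y = y)

# Retrieve the edited grob to inspect the new positions
edited <- grid.get("quantities.grob")
dev.off()
```

Rebuilding the vectors with unit.c() avoids relying on subset assignment for unit objects, at the cost of spelling out the index ranges.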

On Fri, Sep 14, 2018 at 6:47 PM David Winsemius 
wrote:

>
> > On Sep 14, 2018, at 2:03 PM, Aimin Yan  wrote:
> >
> > Thank you,
> >
> > I figure out a way like this:
> >
> > fit1 <- euler(c("ciLAD" = 785, "LAD" = 565, "nonXL_MEF" = 167,
> > "ciLAD&LAD" = 3, "ciLAD&nonXL_MEF" = 101,
> > "LAD&nonXL_MEF" = 541,
> > "ciLAD&LAD&nonXL_MEF" = 2),shape = "ellipse")
> >
> > plot(fit1,quantities = TRUE,fill = rainbow(7),lty = 1:2,labels =
> list(font = 1),alpha=0.7)
> >
> > grid.ls()
> > t <- grid.get("quantities.grob")
> > names(t)
> >
> > # Change these value will change the location of label.
> >
> > grid.edit("quantities.grob",x=unit.c(unit(-14.9884684724791, "native"),
> >  unit(-14.883684319653,
> "native"),
> >  unit(13.9805892820006,
> "native"),
> >  unit(-12.8808987356981,
> "native"),
> >  unit(-11.488226371243,
> "native"),
> >  unit(-9.51474016085318,
> "native"),
> >  unit(-1.00436055190216,
> "native")))
> >
> > grid.edit("quantities.grob",y=unit.c(unit(-8.07672595120493, "native"),
> >  unit(4.78718651828883,
> "native"),
> >  unit(0.25941593099694,
> "native"),
> >  unit(-4.32200781461293,
> "native"),
> >  unit(25.7349463488991,
> "native"),
> >  unit(-22.7610031110325,
> "native"),
> >  unit(14.5001560838519,
> "native")))
> >
> > However, here I just want to change the x and y  value of 4th label,
> does anyone know how to set it?
>
> If the t object were a complete grid object, it might have been:
>
> grid.edit("quantities.grob", x[[4]]= unit(-12.8808987356981, "native")
>)
> grid.edit("quantities.grob", y[[4]]= unit(-4.32200781461293, "native"),
>)
>
>
> But I don't think that will succeed since you never assigned the value of
> the plot operation to a name. Instead you pulled out part of the grid
> object that was sitting "free" and unassigned to a name. If you assign that
> value of plot() to `my.plot` you get:
>
>  grid.ls(my.plot)
> euler.diagram
>   GRID.gTree.11
> diagram.grob.1
>   fills.grob.1
>   fills.grob.2
>   fills.grob.3
>   fills.grob.4
>   fills.grob.5
>   fills.grob.6
>   fills.grob.7
>   edges.grob
>   labels.grob
>   quantities.grob
>
> I think you need to work with the tutorials in the grid package.
>
> Look at:
>
> help("grid-package")
>
> --
> David.
>
> >
> > Aimin
> >
> > On Thu, Sep 13, 2018 at 9:56 PM David Winsemius 
> wrote:
> >
> > > On Sep 13, 2018, at 2:31 PM, Aimin Yan 
> wrote:
> > >
> > > I am using eulerr to get venn.
> > > My code is like:
> > >
> > > > fit1 <- euler(c("ciLAD" = 785, "LAD" = 565, "nonXL_MEF" = 167,
> > > >"ciLAD&LAD" = 3, "ciLAD&nonXL_MEF" = 101,
> > > > "LAD&nonXL_MEF" = 541,
> > > >"ciLAD&LAD&nonXL_MEF" = 2),shape 

Re: [R] Change the position of label when using R package eulerr

2018-09-14 Thread Aimin Yan
Thank you,

I figured out a way like this:

fit1 <- euler(c("ciLAD" = 785, "LAD" = 565, "nonXL_MEF" = 167,
"ciLAD&LAD" = 3, "ciLAD&nonXL_MEF" = 101,
"LAD&nonXL_MEF" = 541,
"ciLAD&LAD&nonXL_MEF" = 2),shape = "ellipse")

plot(fit1,quantities = TRUE,fill = rainbow(7),lty = 1:2,labels = list(font
= 1),alpha=0.7)

grid.ls()
t <- grid.get("quantities.grob")
names(t)

# Changing these values changes the location of the labels.

grid.edit("quantities.grob",x=unit.c(unit(-14.9884684724791, "native"),
 unit(-14.883684319653, "native"),
 unit(13.9805892820006, "native"),
 unit(-12.8808987356981, "native"),
 unit(-11.488226371243, "native"),
 unit(-9.51474016085318, "native"),
 unit(-1.00436055190216, "native")))

grid.edit("quantities.grob",y=unit.c(unit(-8.07672595120493, "native"),
 unit(4.78718651828883, "native"),
 unit(0.25941593099694, "native"),
 unit(-4.32200781461293, "native"),
 unit(25.7349463488991, "native"),
 unit(-22.7610031110325, "native"),
     unit(14.5001560838519, "native")))

However, here I just want to change the x and y values of the 4th label. Does
anyone know how to set only those?

Aimin

On Thu, Sep 13, 2018 at 9:56 PM David Winsemius 
wrote:

>
> > On Sep 13, 2018, at 2:31 PM, Aimin Yan  wrote:
> >
> > I am using eulerr to get venn.
> > My code is like:
> >
> > fit1 <- euler(c("ciLAD" = 785, "LAD" = 565, "nonXL_MEF" = 167,
> >"ciLAD&LAD" = 3, "ciLAD&nonXL_MEF" = 101,
> > "LAD&nonXL_MEF" = 541,
> >"ciLAD&LAD&nonXL_MEF" = 2),shape = "ellipse")
> >
> > plot(fit1,quantities = TRUE,fill = rainbow(7),lty = 1:2,labels =
> list(font
> > = 1),alpha=0.7)
> >
> > After I get the figure, I find the position of some  labels need to be
> > adjusted.
> >
> > Does anyone has some idea about how to process this?
>
> Looking at the code of plot.euler we see that the plotting paradigm is
> grid. So you could assign the output to a data.object name, search for list
> items that match the names of the labels you want to reposition, and modify
> the position values. You would need to be more specific, if you want a
> worked example.
>
> As far as I can see the labels and positions are fairly deep inside a list
> structure:
>
>  $ children :List of 1
>   ..$ GRID.gTree.12:List of 5
>   .. ..$ children
>  $ diagram.grob.1
> $children
> .. .. .. .. ..$ labels.grob:List of 11
>   .. .. .. .. .. ..$ label: chr [1:3] "ciLAD" "LAD" "nonXL_MEF"
>   .. .. .. .. .. ..$ x: 'unit' num [1:3] -18.1native
> 69.2native 11.9native
>   .. .. .. .. .. .. ..- attr(*, "valid.unit")= int 4
>   .. .. .. .. .. .. ..- attr(*, "unit")= chr "native"
>   .. .. .. .. .. ..$ y: 'unit' num [1:3] -17.86native
> 5.24native 27.86native
>   .. .. .. .. .. .. ..- attr(*, "valid.unit")= int 4
>   .. .. .. .. .. .. ..- attr(*, "unit")= chr "native"
>
> --
> David.
> >
> >
> > Thank you,
> >
> > Aimin
> >
> >   [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
>
> 'Any technology distinguishable from magic is insufficiently advanced.'
>  -Gehm's Corollary to Clarke's Third Law
>
>
>
>
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Change the position of label when using R package eulerr

2018-09-13 Thread Aimin Yan
I am using eulerr to get venn.
My code is like:

fit1 <- euler(c("ciLAD" = 785, "LAD" = 565, "nonXL_MEF" = 167,
"ciLAD&LAD" = 3, "ciLAD&nonXL_MEF" = 101,
"LAD&nonXL_MEF" = 541,
"ciLAD&LAD&nonXL_MEF" = 2),shape = "ellipse")

plot(fit1,quantities = TRUE,fill = rainbow(7),lty = 1:2,labels = list(font
= 1),alpha=0.7)

After I get the figure, I find that the positions of some labels need to be
adjusted.

Does anyone have an idea how to do this?


Thank you,

Aimin




[R] Seeking nomination for the Statistical Computing and Graphics Award

2018-08-21 Thread Yan, Jun
(apologies for cross posting)

The Statistical Computing and Graphics Award of the ASA Sections of Statistical 
Computing and Statistical Graphics recognizes an individual or team for 
innovation in computing, software, or graphics that has had a great impact on 
statistical practice or research. The past awardees include Bill Cleveland 
(2016) and Robert Gentleman and Ross Ihaka (2010). The prize carries with it a 
cash award of $5,000 plus an allowance of up to $1,000 for travel to the next 
Joint Statistical Meetings (JSM) where the award will be presented.

Qualifications
The prize-winning contribution will have had significant and lasting impacts on 
statistical computing, software or graphics.

The Awards Committee depends on the American Statistical Association membership 
to submit nominations. Committee members will review the nominations and make 
the final determination of who, if any, should receive the award. The award may 
not be given to a sitting member of the Awards Committee or a sitting member of 
the Executive Committee of the Section of Statistical Computing or the Section 
of Statistical Graphics.

Nomination and Award Dates
Nominations are due by November 15, 2018 for an award to be presented at the 
JSM in the following year. Nominations should be submitted as a complete 
packet, consisting of:
- a nomination letter, no longer than four pages, addressing points in the 
selection criteria
- nominee's curriculum vita(e)
- a minimum of 3 (and no more than 4) supporting letters, each no longer than 
two pages

Selection Process
The Awards Committee will consist of the Chairs and Past Chairs of the Sections 
on Statistical Computing and Statistical Graphics. The selection process will 
be handled by the Awards Chair of the Statistical Computing Section and the 
Statistical Graphics Section. Nominations and questions are to be sent to the 
e-mail address below.

Jun Yan
Professor
University of Connecticut
jun@uconn.edu



 
   


Re: [R] 2018 ASA Computing/Graphics: Chambers Software Award and Student Paper Competition

2017-12-01 Thread Yan, Jun
Dear R-help Listers,

The submission site for the two awards (Chambers Student Software;
Computing/Graphics Student Paper) is open at 
http://asa.stat.uconn.edu
until the deadline, 5:00 pm EST, December 15, 2017. The results will
be announced by January 15, 2018. The award recipients are expected to
present in a topic-contributed session at the 2018 JSM in Vancouver.

I would appreciate your spreading the word.


Jun Yan


From: Yan, Jun
Sent: Wednesday, October 18, 2017 6:37:19 AM
To: r-help@R-project.org
Subject: 2018 ASA Computing/Graphics: Chambers Software Award and Student Paper 
Competition



[R] 2018 ASA Computing/Graphics: Chambers Software Award and Student Paper Competition

2017-10-18 Thread Yan, Jun
Dear R-help Listers,


The following two student competitions are of interest to the many student 
R package developers. I'd appreciate your help in spreading the word.


#1. John M. Chambers Statistical Software Award 2018

The Statistical Computing Section of the American Statistical
Association announces the competition for the John M. Chambers
Statistical Software Award. In 1998 the Association for Computing
Machinery (ACM) presented the ACM Software System Award to John
Chambers for the design and development of S. Dr. Chambers generously
donated his award to the Statistical Computing Section to endow an
annual prize for statistical software written by, or in collaboration
with, an undergraduate or graduate student. The prize carries with it
a cash award, which has been increased from $1,000 to $2,000 starting
from 2018. See http://stat-computing.org/awards/jmc/history.html for
the history of the award.

Both individuals and teams are eligible to participate in the
competition. To be eligible, at least one individual within the team
must have begun the development while a student and must either
currently be a student, or have completed all requirements for her/his
last degree after January 1, 2017. The award will be given to the
student, or split between student team members if the team consists of
multiple students, up to a maximum of three students. If the software
was created by a team, the contribution of the student(s) must be
substantial.

To apply for the award, teams must provide the following materials:

Current CVs of all team members.

A letter from a faculty mentor at the academic institution of one of
the students. The letter should confirm that the student had
substantial participation in the development of the software, certify
her/his student status when the software began to be developed,
confirm that he/she is still a student (or provide a date of degree
completion), and briefly discuss the importance of the software to
statistical practice.

A brief, one to two page description of the software, summarizing what
it does, how it does it, and why it is an important contribution. If
any student team member has continued developing the software after
finishing her/his studies, the description should indicate what was
developed when the individual was a student and what has been added
since.

An installable software package with its source code for use by the
award committee. It should be accompanied by enough information to
allow the judges to effectively use and evaluate the software
(including its design considerations). This information can be
provided in a variety of ways, including but not limited to: a user
manual, a manuscript, a URL, and online help to the system.

All materials must be in English. We prefer that electronic text be
submitted as PDF files. The entries will be judged on a variety of
dimensions, including the importance and relevance for statistical
practice of the tasks performed by the software, ease of use, clarity
of description, elegance and availability for use by the statistical
community. Preference will be given to those entries that are grounded
in software design rather than calculation. The decision of the award
committee is final.

All application materials MUST BE RECEIVED by 5:00pm EST, Friday,
December 15, 2017. The submission window will be open at
http://asa.stat.uconn.edu on December 1, 2017. Questions are to be
emailed to Professor Jun Yan.

#2. Student Paper Competition 2018

The Statistical Computing and Statistical Graphics Sections of the ASA
are co-sponsoring a student paper competition on the topics of
Statistical Computing and Statistical Graphics. Students are
encouraged to submit a paper in one of these areas, which might be
original methodological research, some novel computing or graphical
application in statistics, or any other suitable contribution (for
example, a software-related project). The selected winners will
present their papers in a topic-contributed session at the 2018 Joint
Statistical Meetings. The prize carries with it a cash award of
$1,000.

Anyone who is a student (graduate or undergraduate) on or after
September 1, 2017 is eligible to participate. An entry must include an
abstract, a six page manuscript (including figures, tables and
references; a two-column format is acceptable), blinded versions of
the abstract and manuscript (with no author names or other information
that easily identifies the authors), a CV, and a letter from a faculty
member familiar with the student's work. The applicant must be the
first author of the paper. The faculty letter must include a
verification of the applicant's student status and, in the case of
joint authorship, should indicate what fraction of the contribution is
attributable to the applicant. We prefer that electronic submissions
of papers consist of PDF files. All materials must be in English.

Students may submit papers to no more than two sections and may accept
only

[R] How to decide weight in WLS model in R ?

2015-03-02 Thread Yan Wu
Hi,

I would like to know how to choose the weights in a WLS model in R.

For example, for the pipeline data from the faraway package, I am trying to
fit a regression model Lab ~ Field, which shows non-constant variance. I wish
to use weights to account for the non-constant variance. So how should the
weights in the WLS model be chosen?

For the pipeline data, the range of Field is split into 12 groups of
size 9. Within each group, the variance of Lab is computed as varlab
and the mean of Field as meanfield. In addition, the variance in the
response is assumed to be linked to the predictor in the following way:
var(Lab)=a*(Field^b).

So we can get estimates of a and b by regressing log(varlab) on
log(meanfield). But how do I determine the weights for a WLS fit of Lab on
Field in R?

I guess that the example above may require the function 'varConstPower'.
So could you please explain how to use 'varConstPower' in R?

I will appreciate it if you could please answer the two questions above.
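
The procedure described above can be sketched in base R. This is only an illustration under assumptions: the data are simulated so the block runs without the faraway package, and only the column names Lab and Field come from the post. With var(Lab) = a*Field^b, the natural weights are proportional to 1/Field^b. The estimate of a is not needed, because WLS weights matter only up to a constant factor.

```r
# Simulated stand-in for the pipeline data (assumption: 108 rows so that
# 12 groups of size 9 divide evenly; the real data may differ).
set.seed(1)
Field <- runif(108, 5, 85)
Lab <- 2 + 1.2 * Field + rnorm(108, sd = 0.5 * Field^0.75)
dat <- data.frame(Field, Lab)

# Sort by Field and cut into 12 consecutive groups of 9 observations.
dat <- dat[order(dat$Field), ]
grp <- rep(1:12, each = 9)

# Group-wise variance of Lab and mean of Field.
varlab <- tapply(dat$Lab, grp, var)
meanfield <- tapply(dat$Field, grp, mean)

# log(varlab) = log(a) + b * log(meanfield): the slope estimates b.
vfit <- lm(log(varlab) ~ log(meanfield))
b <- unname(coef(vfit)[2])

# WLS fit with weights inversely proportional to the modelled variance.
wfit <- lm(Lab ~ Field, data = dat, weights = 1 / Field^b)
print(b)
print(coef(wfit))
```

The variance functions in the nlme package (varPower, varConstPower) are meant for the weights argument of gls() and lme(), not lm(). Something like gls(Lab ~ Field, weights = varPower(form = ~ Field)) would estimate the power jointly with the regression, although it parameterises the standard deviation rather than the variance.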

Thanks!
Angela
-



Re: [R] 32bit R303 calls external C functions

2015-02-12 Thread Li, Yan
I meant R3.0.3. I need this package working in R3.0.3.

I have 32bit and 64bit dlls both included in the package.


-Original Message-
From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com] 
Sent: Thursday, February 12, 2015 1:01 PM
To: Li, Yan; r-help@r-project.org
Subject: Re: [R] 32bit R303 calls external C functions

On 12/02/2015 11:08 AM, Li, Yan wrote:
> Dear All,
>
> I build a R package which will need to call external C functions. I 
> registered the C functions in the NAMESPACE file and include 32bit and 64bit 
> dlls in the packages. If I load the package in 64bit R and calls the external 
> C functions, it works fine. However if I load the package in 32bit R and call 
> the external C functions, it either does not work properly or gives back 
> error message saying cannot find the external C functions.
>
> When I built the same package in R2.15.1, there is no such issue.
>
> I checked the update news for R303 and found most .C is replaced by .Call. I 
> modified the code but the package cannot be loaded and R ended abnormally.
>
> Does anyone know if there any difference of 32bit R303 calling external C 
> from 64bitR303? Thank you!

There's no R303; the current version is 3.1.2.  So if you meant 3.0.3, I'd 
suggest upgrading.  If you meant something else, you're probably in the wrong 
place.

For 3.1.2, there are big differences:  32 bit R can't call 64 bit .dlls 
and vice versa.   You can either install the package from source in both 
versions, or arrange to compile both 32 bit and 64 bit dlls, in which case both 
versions can use the binary of the package.

If you've already done all that, then you'll need to give more details, e.g. 
access to the source for the package, to get more specific help.
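
As a starting point for gathering such details, a few base R calls can show which architecture the running session uses and which DLLs it has loaded. This is a diagnostic sketch only; "my_c_function" is a hypothetical symbol name standing in for one of the package's C routines.

```r
# Pointer size tells whether this R session is a 32-bit or 64-bit build.
arch_bits <- 8 * .Machine$sizeof.pointer
cat("R", as.character(getRversion()), "is running as a",
    arch_bits, "bit process\n")

# On multi-architecture Windows builds this is typically "i386" or "x64";
# it can be an empty string on other platforms.
cat("Sub-architecture:", .Platform$r_arch, "\n")

# Names of the DLLs/shared objects currently loaded into the session.
dlls <- getLoadedDLLs()
print(names(dlls))

# is.loaded() reports whether a native symbol can be found.
# "my_c_function" is a hypothetical name; it is expected to be absent here.
sym_found <- is.loaded("my_c_function")
print(sym_found)
```

Running this in both the 32-bit and the 64-bit session, after loading the package, shows whether the expected DLL appears in each and whether the symbol is visible.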

Duncan Murdoch



[R] 32bit R303 calls external C functions

2015-02-12 Thread Li, Yan
Dear All,

I built an R package that needs to call external C functions. I registered 
the C functions in the NAMESPACE file and included 32bit and 64bit dlls in the 
package. If I load the package in 64bit R and call the external C functions, 
it works fine. However, if I load the package in 32bit R and call the external C 
functions, it either does not work properly or gives back an error message saying 
it cannot find the external C functions.

When I built the same package in R2.15.1, there was no such issue.

I checked the update news for R303 and found that most .C calls are replaced by 
.Call. I modified the code, but the package cannot be loaded and R ends abnormally.

Does anyone know if there is any difference between 32bit R303 and 64bit R303 in 
calling external C functions? Thank you!

Regards,
Yan



[R] How to tell if there is overdispersion in a GLMM?

2015-02-03 Thread maggy yan
I read something on http://glmm.wikidot.com/faq, under "How can I deal with
overdispersion in GLMMs?":

library(lme4)  ## 1.0-4
set.seed(101)
d <- data.frame(y=rpois(1000,lambda=3),x=runif(1000),
f=factor(sample(1:10,size=1000,replace=TRUE)))
m1 <- glmer(y~x+(1|f),data=d,family=poisson)
overdisp_fun(m1)
##        chisq        ratio          rdf            p
## 1026.7780815    1.0298677  997.0000000    0.2497659
library(glmmADMB)  ## 0.7.7
m2 <- glmmadmb(y~x+(1|f),data=d,family=poisson)
overdisp_fun(m2)
##        chisq        ratio          rdf            p
## 1026.7585031    1.0298480  997.0000000    0.2499024

In both cases, the chisq is > rdf; does it mean there is overdispersion?

thanks for any help
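
The post does not define overdisp_fun. The sketch below reconstructs what such a function is generally understood to compute, a Pearson chi-square statistic compared with the residual degrees of freedom; it is an assumption, not a copy of the FAQ's code. A plain Poisson glm is used so that the example runs without lme4 or glmmADMB.

```r
# Assumed reconstruction: sum of squared Pearson residuals, its ratio to the
# residual degrees of freedom, and an upper-tail chi-square p-value.
overdisp_fun <- function(model) {
  rdf <- df.residual(model)
  rp <- residuals(model, type = "pearson")
  chisq <- sum(rp^2)
  c(chisq = chisq, ratio = chisq / rdf, rdf = rdf,
    p = pchisq(chisq, df = rdf, lower.tail = FALSE))
}

# Stand-in model: Poisson glm without the random effect.
set.seed(101)
d <- data.frame(y = rpois(1000, lambda = 3), x = runif(1000))
m0 <- glm(y ~ x, data = d, family = poisson)
res <- overdisp_fun(m0)
print(res)

# The p-value in the post can be recomputed from its chisq and rdf.
p_post <- pchisq(1026.7780815, df = 997, lower.tail = FALSE)
print(p_post)
```

Under this reading, chisq being slightly larger than rdf is not evidence of overdispersion by itself. The ratio is what gets compared with 1, and the p-value says whether the excess is larger than chance would explain. A ratio of about 1.03 with p of about 0.25, as in the output above, would not usually be taken as a sign of overdispersion.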



[R] measure of goodness of fit for the model without an intercept

2015-01-29 Thread Yan Wu
Hi,

When I fit the regression model without an intercept term, R-squared tends
to be much larger than the R-squared in the model with an intercept. So in this
case, what's a more reasonable measure of the goodness of fit for the model
without an intercept?

Thanks a lot!! 
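
The difference comes from the baseline: for a model without an intercept, summary.lm() measures the residual sum of squares against the uncentered total sum(y^2) rather than against sum((y - mean(y))^2). The sketch below, on simulated data (an assumption, not data from the post), shows both versions side by side. The centered version and the residual standard error are two quantities that remain comparable between the two models.

```r
set.seed(42)
x <- runif(50, 10, 20)
y <- 5 + 0.3 * x + rnorm(50)

fit1 <- lm(y ~ x)       # with intercept
fit0 <- lm(y ~ x - 1)   # without intercept

rss0 <- sum(residuals(fit0)^2)

# What summary() reports for the no-intercept model (uncentered baseline).
r2_reported <- summary(fit0)$r.squared
r2_uncentered <- 1 - rss0 / sum(y^2)

# The same residuals measured against the usual centered baseline.
# This version can be negative if the no-intercept model fits badly.
r2_centered <- 1 - rss0 / sum((y - mean(y))^2)

print(c(with_intercept = summary(fit1)$r.squared,
        no_intercept_reported = r2_reported,
        no_intercept_centered = r2_centered))

# Residual standard errors are on the scale of y for both models.
print(c(sigma_with = summary(fit1)$sigma, sigma_without = summary(fit0)$sigma))
```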



Yan 





[R] Help with a for complex for loop that looks into another data frame

2014-11-07 Thread Yan Wu
Hello everyone!

I am working on a stock trading algorithm. I have created a data frame with
the stocks, and a summary data frame, master_df_ex, and master_df_ex_sum,
respectively.

The goal is to create something each day that has equal longs and shorts,
and thus for each day, the sum of the ls_flag = 0.

For master_df_ex, there is a rank by strength of magnitude column, an up or
down column, then a long or short column.

The ls_flag is set by the updn_flag for the stocks that are ranked in the
top half (in this case, since we have 4 per day, rank = 1 and rank = 2 are
set by updn_flag).

*I need to create a process to fill in the NA's in the ls_flag.*

My thinking is to create a summary table, where I run a summary, replace the
contents, and repeat until both the NA count is 0 and the sum of the ls_flag = 0.

For example, here is the current data:

library(dplyr)  # provides arrange(), group_by() and summarise()

asof_dt <- rep(seq(as.Date("2014-10-01"), as.Date("2014-10-03"), "days"), 4)
rank_mag <- rep(seq(1, 4), 3)
updn_flag <- c(-1,-1,-1,1,-1,-1,1,1,-1,1,-1,-1)
ls_flag <- c(-1,-1,NA,NA,-1,-1,NA,NA,-1,1,NA,NA)
master_df_ex <- data.frame(asof_dt,rank_mag,updn_flag,ls_flag)
master_df_ex <- arrange(master_df_ex,asof_dt,rank_mag)
# One summary row per day: sum of the assigned flags and number of NAs left.
master_df_ex_sum <- summarise(group_by(master_df_ex, asof_dt),
                              tot_flag = sum(ls_flag, na.rm = TRUE),
                              tot_NA = sum(is.na(ls_flag)))

> master_df_ex

      asof_dt rank_mag updn_flag ls_flag
1  2014-10-01        1        -1      -1
2  2014-10-01        2         1       1
3  2014-10-01        3         1      NA
4  2014-10-01        4         1      NA
5  2014-10-02        1        -1      -1
6  2014-10-02        2        -1      -1
7  2014-10-02        3        -1      NA
8  2014-10-02        4         1      NA
9  2014-10-03        1         1       1
10 2014-10-03        2         1       1
11 2014-10-03        3         1      NA
12 2014-10-03        4        -1      NA

> master_df_ex_sum

Source: local data frame [3 x 3]

     asof_dt tot_flag tot_NA
1 2014-10-01        0      2
2 2014-10-02       -2      2
3 2014-10-03        2      2

For 2014-10-02, since the tot_flag = -2, the NA's for this date should
both be 1.

For 2014-10-03, since the tot_flag = 2, the NA's should both be -1.

For 2014-10-01 (hardest one):

The logic should look at 2014-10-01 in master_df_ex_sum, since tot_NA is
not 0, go into master_df_ex and find the lowest rank by rank_mag where
ls_flag is NA. Assign ls_flag = 1.

Then run the summary again, the NAs will be 1, and the sum of ls_flag will
be 1.

Then it should go into master_df_ex again, and assign a -1 to line 4. Then
the summary will have 0 and 0 and this date should be done.

I hope that makes sense! Any help is appreciated! Thank you very much.
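
The repeated summarise-and-replace loop described above can be collapsed into one pass per day. If a day has n NAs and its assigned flags sum to s, then (n - s)/2 of the NAs must become 1 and the rest -1 for the day to sum to 0. The sketch below is one possible reading of the rules in this post, not a tested trading rule. `fill_ls_flag()` is a hypothetical helper, and the data frame is typed in from the printed table above. Following the 2014-10-01 example, the 1s go to the best-ranked NA rows first.

```r
# Data as printed in the post.
master_df_ex <- data.frame(
  asof_dt   = rep(as.Date(c("2014-10-01", "2014-10-02", "2014-10-03")), each = 4),
  rank_mag  = rep(1:4, times = 3),
  updn_flag = c(-1, 1, 1, 1,  -1, -1, -1, 1,  1, 1, 1, -1),
  ls_flag   = c(-1, 1, NA, NA,  -1, -1, NA, NA,  1, 1, NA, NA)
)

# Hypothetical helper: fill NA flags so that each day's ls_flag sums to 0.
fill_ls_flag <- function(df) {
  df <- df[order(df$asof_dt, df$rank_mag), ]
  for (rows in split(seq_len(nrow(df)), df$asof_dt)) {
    na_rows <- rows[is.na(df$ls_flag[rows])]
    n_na <- length(na_rows)
    if (n_na == 0) next
    # Amount the NA rows must add up to for the day to balance.
    deficit <- -sum(df$ls_flag[rows], na.rm = TRUE)
    n_long <- (n_na + deficit) / 2
    if (n_long < 0 || n_long > n_na || n_long != round(n_long)) {
      stop("cannot balance ", format(df$asof_dt[rows[1]]))
    }
    # Best-ranked NA rows become longs (1), the remainder shorts (-1).
    df$ls_flag[na_rows] <- rep(c(1, -1), times = c(n_long, n_na - n_long))
  }
  df
}

filled <- fill_ls_flag(master_df_ex)
print(filled)
print(tapply(filled$ls_flag, filled$asof_dt, sum))
```

The helper stops with an error when a day cannot be balanced, for example when the number of NAs is too small to offset the flags already assigned.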

-- 
Yan Wu
510-333-3188 http://bigkidsbighearts.org
yanni...@gmail.com



[R] random forest application

2014-06-20 Thread Li, Yan
Hi All,

Is anyone using random forest for prediction? Some people claim that it 
gives more accurate results than a decision tree. But considering it builds 500 (by 
default) full trees, is it worth using random forest for prediction instead of a 
decision tree? What are typical applications of this algorithm?

Thank you!

Regards,
Yan



Re: [R] random forest application

2014-06-20 Thread Li, Yan
Thanks for the reply... Actually you answered my question... I just want to know 
how people use it...

-Original Message-
From: Sarah Goslee [mailto:sarah.gos...@gmail.com] 
Sent: Friday, June 20, 2014 11:31 AM
To: Li, Yan
Cc: r-help@r-project.org
Subject: Re: [R] random forest application

Hi,

This is not an R question, so really not appropriate for the list.

The answer depends on what "worth it" means to you.

There are many applications:
http://scholar.google.com/scholar?hl=en&q=%22random+forest%22&btnG=&as_sdt=1%2C39&as_sdtp=

Sarah

On Fri, Jun 20, 2014 at 10:12 AM, Li, Yan yan...@ibi.com wrote:
> Hi All,
>
> Is anyone using random forest for predicting? Some people claimed that it 
> will give more accurate result than decision tree. But considering it builds 
> 500(by default) full trees, is it worth to use random forest to predict 
> instead of decision tree? What typical applications of this algorithm?
>
> Thank you!
>
> Regards,
> Yan
-- 
Sarah Goslee
http://www.functionaldiversity.org


[R] Send html body email in R

2014-02-07 Thread Bo Yan
Hi,

 

I have managed to send emails with plain text message in the mail body using
sendmailR. However, this package currently doesn't allow you to send html
body email unless the source code is hacked.
http://stackoverflow.com/questions/19844762/how-to-send-html-email-using-r 

 

Could anyone share a solution that allows sending out html body emails in R?
Thanks!

 

Bo 




Re: [R] Send html body email in R

2014-02-07 Thread Bo Yan
Would be my last resort if a cleaner solution cannot be found. 

-Original Message-
From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com] 
Sent: Friday, February 07, 2014 11:57 AM
To: Bo Yan; r-help@r-project.org
Subject: Re: [R] Send html body email in R

On 07/02/2014 12:47 PM, Bo Yan wrote:
 Hi,

 I have managed to send emails with plain text message in the mail body
 using sendmailR. However, this package currently doesn't allow you to
 send html body email unless the source code is hacked.
 http://stackoverflow.com/questions/19844762/how-to-send-html-email-using-r

 Could anyone share a solution that allows sending out html body emails in R?
 Thanks!

Why not use the solution posted on that page, i.e. modify the source of
sendmailR?

Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] how to melt variable to one variable

2013-11-23 Thread maggy yan
I want to make a stacked bar plot with one bar for two variables from my
data frame data_chir. The two variables have about 100 values each, such as
no, yes and na. I want to show, in one stacked bar, how many no, yes and na
values they have together. I tried to melt these two variables first, like this:

melt1=melt(data_chir, measure.vars=c("N1_re", "N2_re"), var="zpd")

but it says: arguments imply differing number of rows: 98, 196


maybe there is another way to make the plot without melt?

thanks in advance!
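
A minimal sketch of one way to get the single stacked bar without melt, using only base R. The small data_chir below is a made-up stand-in (the real data is not shown in the post), and recoding missing values as the string "na" is an assumption about how they should be displayed.

```r
# Hypothetical stand-in for data_chir: two answer columns with no / yes / missing.
data_chir <- data.frame(
  N1_re = c("no", "yes", NA, "no"),
  N2_re = c("yes", "yes", "no", NA),
  stringsAsFactors = FALSE
)

# Pool the answers of both columns into one vector and label missing values.
answers <- c(data_chir$N1_re, data_chir$N2_re)
answers[is.na(answers)] <- "na"

# Count each answer; fixing the levels keeps the stacking order stable.
counts <- table(factor(answers, levels = c("no", "yes", "na")))

# A one-column matrix makes barplot() draw a single stacked bar.
barplot(matrix(counts, ncol = 1,
               dimnames = list(names(counts), "N1_re + N2_re")),
        legend.text = TRUE, ylab = "count")
```

Because the counts are pooled before plotting, the two columns do not need to be reshaped into long format at all.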

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Sample size for clustering analysis?

2013-11-05 Thread Li, Yan
Hi All,

What formula can I use to determine the right sample size for clustering 
analysis with 100-300 variables?

What sampling methodology can be used for k-means or hierarchical clustering on 
categorical fields so that all values of the categorical fields are included in 
the sample?

Thanks a lot!

Regards,
Yan
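
A rough sketch of one possible sampling scheme for the second question, not an answer taken from the thread: first pick one random row for every level of every categorical column, then top up with a simple random sample. The data frame, the column names and target_n are all made up for illustration, and the result is no longer a simple random sample, which may matter for later inference.

```r
set.seed(42)
# Made-up data: two categorical fields and one numeric field.
df <- data.frame(
  colour = factor(sample(c("red", "green", "blue"), 200, replace = TRUE)),
  size   = factor(sample(c("S", "M", "L", "XL"), 200, replace = TRUE)),
  x      = rnorm(200)
)
cat_cols <- c("colour", "size")
target_n <- 30

# Step 1: for every level of every categorical column, keep one random row index.
must_have <- unique(unlist(lapply(cat_cols, function(col) {
  tapply(seq_len(nrow(df)), df[[col]],
         function(idx) idx[sample.int(length(idx), 1)])
})))

# Step 2: top up with a simple random sample of the remaining rows.
rest  <- setdiff(seq_len(nrow(df)), must_have)
extra <- rest[sample.int(length(rest), target_n - length(must_have))]
smp   <- df[c(must_have, extra), ]
```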

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Can ODBC or JDBC calls R?

2013-08-09 Thread Li, Yan
Hi R helpers,

I know there are packages RODBC and RJDBC to enable R to access ODBC and JDBC 
data.  Is there a package or software that does the reverse: calling R from 
ODBC or JDBC? Thank you.

Regards,
Yan

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] algorithm for clustering categorical data

2013-08-06 Thread Li, Yan
Hi David and other R helpers,

Suppose I rescale the numerical fields to [0,1] and code the categorical fields 
as 1:k, which is the same starting point as Gower's measure, but then use 
Euclidean distance instead of Gower's distance for k-means clustering. How 
large is the difference? What is the drawback? 

Thank you,
Yan

-Original Message-
From: David Carlson [mailto:dcarl...@tamu.edu] 
Sent: Thursday, August 01, 2013 12:08 PM
To: Li, Yan; r-help@r-project.org
Subject: RE: [R] algorithm for clustering categorical data

Read up on Gower's Distance measures (available in the ecodist
package) which can combine numeric and categorical data. You didn't give us any 
information about how you numerically transformed the categorical variables, 
but the usual approach is to create indicator variables that code 
presence/absence for each category within a categorical variable. Different 
variances between variables can be reduced by standardizing the variables.

-
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org] On Behalf Of Li, Yan
Sent: Thursday, August 1, 2013 11:00 AM
To: r-help@r-project.org
Subject: [R] algorithm for clustering categorical data

Hi All,

Does anyone know of algorithms for clustering categorical variables, and R 
packages that implement them? Which is the best?

If a data set has both numeric and categorical fields, what is the best 
clustering algorithm and R package to use?

I tried a numeric transformation of all categorical fields and clustered 
afterwards. But the transformed fields have values from 1...10, while my other 
fields are on a bigger scale:
1-... This makes the categorical fields have less effect on the distance 
calculation...

Thank you!
Yan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] algorithm for clustering categorical data

2013-08-06 Thread Li, Yan
Thanks David, This is very useful!

-Original Message-
From: David Carlson [mailto:dcarl...@tamu.edu] 
Sent: Tuesday, August 06, 2013 11:27 AM
To: Li, Yan; r-help@r-project.org
Subject: RE: [R] algorithm for clustering categorical data

What do you mean by representing the categorical fields by 1:k?

a <- c("red", "green", "blue", "orange", "yellow")

becomes

a <- c(1, 2, 3, 4, 5)

That guarantees your results are worthless unless your categories have an 
inherent order (e.g. tiny, small, medium, big, giant).
Otherwise it should be four (k-1) indicator/dummy variables (e.g.):

a.red <- c(1, 0, 0, 0, 0)
a.green <- c(0, 1, 0, 0, 0)
a.blue <- c(0, 0, 1, 0, 0)
a.orange <- c(0, 0, 0, 1, 0)

Then you can use Euclidean distance.

-
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352
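
To make the point about 1:k coding concrete, here is a small sketch comparing the two codings under Euclidean distance. It uses model.matrix() to build the indicators and, as an assumption for illustration, keeps all k indicator columns rather than the k-1 mentioned above, so that every pair of categories ends up equally far apart.

```r
a <- factor(c("red", "green", "blue", "orange", "yellow"))

# Integer coding: imposes an arbitrary order (alphabetical factor levels here).
d_int <- dist(as.integer(a))

# Indicator coding: one 0/1 column per category (full set, not k-1).
X <- model.matrix(~ a - 1)
d_dummy <- dist(X)

range(d_int)     # distances differ between pairs of colours
range(d_dummy)   # every pair of distinct colours is sqrt(2) apart
```

With integer codes, "blue" and "yellow" look four times as far apart as "blue" and "green", which is purely an artefact of the coding.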


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] algorithm for clustering categorical data

2013-08-06 Thread Li, Yan
Thanks for the reply...

For some reason, I need to keep Euclidean distance in the process...

-Original Message-
From: Martin Maechler [mailto:maech...@stat.math.ethz.ch] 
Sent: Tuesday, August 06, 2013 12:04 PM
To: dcarl...@tamu.edu
Cc: Li, Yan; r-help@r-project.org
Subject: Re: [R] algorithm for clustering categorical data

 "DC" == David Carlson <dcarl...@tamu.edu>
 on Tue, 6 Aug 2013 10:26:56 -0500 writes:

 What do you mean by representing the categorical fields by 1:k?
 a <- c("red", "green", "blue", "orange", "yellow")

 becomes

 a <- c(1, 2, 3, 4, 5)

 That guarantees your results are worthless

worthless indeed!

 unless your categories
 have an inherent order (e.g. tiny, small, medium, big, giant).
 Otherwise it should be four (k-1) indicator/dummy variables (e.g.):

 a.red <- c(1, 0, 0, 0, 0)
 a.green <- c(0, 1, 0, 0, 0)
 a.blue <- c(0, 0, 1, 0, 0)
 a.orange <- c(0, 0, 0, 1, 0)

 Then you can use Euclidean distance.

Yes, ... or use Gower's or other similarly sophisticated distances, as you 
(David) mentioned earlier in this thread.

Do also note that a generalized Gower's distance (+ weighting of
variables) is available from the ('recommended' hence always
installed) package 'cluster' :

  require(cluster)
  ?daisy
  ## notably  daisy(*,  metric="gower")

Note that daisy() is more sophisticated than most users know: the 'type = *' 
specification allows, notably for binary variables (such as your a.<col> 
dummies above), asymmetric behavior, which may be quite important in 
rare-event and similar cases.

Martin
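
A short sketch of what the daisy() suggestion can look like in practice. The data frame and its column names (income, colour, rare) are made up for illustration; clustering with pam() on the resulting dissimilarities is one possible follow-up, not something prescribed in the thread.

```r
library(cluster)   # 'recommended' package, shipped with standard R installations

set.seed(1)
# Made-up mixed data: a large-scale numeric field, a category, a rare binary flag.
df <- data.frame(
  income = c(rnorm(10, 30000, 5000), rnorm(10, 80000, 5000)),
  colour = factor(sample(c("red", "green", "blue"), 20, replace = TRUE)),
  rare   = c(1, rep(0, 18), 1)
)

# Gower dissimilarity; 'rare' is declared an asymmetric binary variable,
# so two observations that both lack the flag do not count as similar.
d <- daisy(df, metric = "gower", type = list(asymm = "rare"))

# Partitioning around medoids works directly on the dissimilarity object.
fit <- pam(d, k = 2)
table(fit$clustering)
```

Gower's coefficient rescales each numeric variable by its range, so the large scale of income does not swamp the categorical field.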


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] algorithm for clustering categorical data

2013-08-01 Thread Li, Yan
Hi All,

Does anyone know of algorithms for clustering categorical variables, and R
packages that implement them? Which is the best?

If a data set has both numeric and categorical fields, what is the best 
clustering algorithm and R package to use?

I tried a numeric transformation of all categorical fields and clustered 
afterwards. But the transformed fields have values from 1...10, while my other 
fields are on a bigger scale: 1-... This makes the categorical fields have 
less effect on the distance calculation...

Thank you!
Yan

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] algorithm for clustering categorical data

2013-08-01 Thread Li, Yan
Great! Thanks!

Yeah, I just used the usual way, as.numeric(..), for the numeric 
transformation... it seems standardization is needed. Thank you.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] algorithm for clustering categorical data

2013-08-01 Thread Li, Yan
Thanks for the reply

From the link you provided, only two packages mention categorical fields: 
depmix and depmixS4. I'll look at them.



-Original Message-
From: David Winsemius [mailto:dwinsem...@comcast.net] 
Sent: Thursday, August 01, 2013 12:15 PM
To: Li, Yan
Cc: r-help@r-project.org
Subject: Re: [R] algorithm for clustering categorical data


On Aug 1, 2013, at 9:00 AM, Li, Yan wrote:

 Hi All,
 
 Does anyone know what algorithm for clustering categorical variables? 
 R packages?

Many.

http://cran.r-project.org/web/views/Cluster.html

 Which is the best?

For what purpose?

 
 If a data has both numeric and categorical data, what is the best 
 clustering algorithm to use and R package?
 
 I tried numeric transformation of all categorical fields  and doing 
 clustering afterwards. But the transformed fields have values from 1...10, 
 and my other fields is in a bigger scale: 1-...This will make the 
 categorical fields has less effect on the distance calculation...
 

This seems impossibly vague and confused. You are asked in the Posting Guide to 
provide a working example if you want help with code.

-- 

David.


David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] R Package License

2013-07-17 Thread Li, Yan
Hi Helpers,

How can we use R and GPL-licensed R packages in commercial products? 
Is it allowed to load a library, get results from it, and use those 
results commercially? Thank you so much!

Regards,
Yan

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] loops for matrices

2013-06-12 Thread maggy yan
I have to use a loop (while or for) to return the result of the Hadamard
product. Now it returns a matrix, but when I use is.matrix() to check, it
returns FALSE. What's wrong?

Matrix.mul <- function(A, B)
{
  while(is.matrix(A) == FALSE | is.matrix(B) == FALSE )
  {
    print("error")
    break
  }
  while(is.matrix(A) == T & is.matrix(B) == T)
  {
    n <- dim(A)[1]; m <- dim(A)[2];
    p <- dim(B)[1]; q <- dim(B)[2];
    while(m == p)
    {
      C <- matrix(0, nrow = n , ncol = q)
      for(s in 1:n)
      {
        for(t in 1:q)
        {
          c <- array(0, dim = m )
          for(k in 1:m)
          {
            c[k] <- A[s,k] * B[k, t]
          }
          C[s, t] <- sum(c)
        }
      }
      print(C)
      break
    }
    while(m != p)
    {
      print("error")
      break
    }
    break
  }
}
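
A sketch of what is probably going on, under the assumption that the check was is.matrix(Matrix.mul(A, B)): the function only prints C, and its last evaluated expression is a while loop, whose value is NULL, so the caller never receives the matrix. Note also that summing A[s,k] * B[k,t] over k is the ordinary matrix product; the Hadamard product is element-wise. The function name hadamard_loop below is made up for illustration.

```r
# Loop-based Hadamard (element-wise) product that returns its result.
hadamard_loop <- function(A, B) {
  if (!is.matrix(A) || !is.matrix(B)) stop("both arguments must be matrices")
  if (!identical(dim(A), dim(B))) stop("A and B must have the same dimensions")
  C <- matrix(0, nrow = nrow(A), ncol = ncol(A))
  for (i in seq_len(nrow(A))) {
    for (j in seq_len(ncol(A))) {
      C[i, j] <- A[i, j] * B[i, j]
    }
  }
  C  # the last evaluated expression is the function's return value
}

A <- matrix(1:6, nrow = 2)
B <- matrix(6:1, nrow = 2)
H <- hadamard_loop(A, B)
is.matrix(H)       # the returned object can now be checked
all(H == A * B)    # compare with R's built-in element-wise product
```

Using if/stop instead of while/break for the argument checks also makes the error cases return control to the caller with a proper error.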

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] keep the centre fixed in K-means clustering

2013-05-22 Thread HJ YAN
Dear Uwe

Just wanted to say thank you so much for this. Whilst waiting for a reply
from r-help I had been writing a piece of ugly code (below) to do the job;
yours looks MUCH smarter, and I especially like the use of the '-apply()'
bit, as there is no 'min.col()' function.

p.s. my data has 144 values for each data point (e.g. collected at 10
minute intervals for a day)


Also, I wondered if there is a quick manual or reference to all those
useful functions so we can learn R in a more efficient way, e.g. I wrote my
function below because I did not know colSums/rowSums etc. exist, so I could
only think of a combination of a loop + 'which.min()'!


Massive thanks again!
HJ




center<-matrix(rnorm(1440,sd=0.5),nrow=10)
centre1<-cbind(center,c(1:10))

NewData<-matrix(rnorm(1440*5),nrow=50)
NewData1<-cbind(NewData,rep(NA,nrow(NewData)))

clust_HJ<-function(NewData=NewData1, Centre=centre1){

for(i in 1:nrow(NewData)){
tmp1<-rbind(Centre[,c(1:144)],NewData[i,c(1:144)])
dist.matrix<-as.matrix(dist(tmp1, method ="euclidean"))
Ind.dist.min<-which.min(dist.matrix[11,c(1:10)])
NewData[i,145]<-Ind.dist.min

} # end i loop

output.file=NewData
write.csv(output.file,"NewData2.csv",row.names=F)
} # end function

clust_HJ(NewData1,centre1)


On Wed, May 22, 2013 at 10:55 AM, Uwe Ligges
<lig...@statistik.tu-dortmund.de> wrote:

 So you just want to compare the distances from each point of your new data
 to each of the Centres and assign the corresponding number of the centre as
 in:

 clust <- apply(NewData, 1, function(x) which.min(colSums((x - tCentre)^2)))


 but since the apply loop is rather long here for lots of new data, one may
 want to optimize the runtime for huge data and get:

 tNewData <- t(NewData)
 clust <- max.col(-apply(Centre, 1, function(x) colSums((x - tNewData)^2)))


 Best,
 Uwe Ligges
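
A self-contained sketch of the fixed-centre assignment on the small two-dimensional example from the original question. It follows the same idea as the max.col(-...) line, but uses sweep() and rowSums() instead of transposing the data; that substitution, the seed and the variable names d2 and cluster are choices made for this illustration.

```r
set.seed(1)
Centre <- matrix(c(0, 1, 0, 1), nrow = 2)   # rows are the centres (0,0) and (1,1)
NewData <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
                 matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))

# Squared Euclidean distance from every observation to every centre:
# one row per observation, one column per centre.
d2 <- apply(Centre, 1, function(ctr) rowSums(sweep(NewData, 2, ctr)^2))

# Index of the nearest centre for each observation (row-wise minimum).
cluster <- max.col(-d2, ties.method = "first")

head(cbind(ID = seq_len(nrow(NewData)), NewData, Cluster = cluster))
```

The centres are never updated, so this is only the assignment step of k-means, applied once.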








[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] keep the centre fixed in K-means clustering

2013-05-21 Thread HJ YAN
Dear R users


I have the matrix of the centres of some clusters, e.g. 20 clusters each
with 100 dimensions, so this matrix contains 20 rows * 100 columns of numeric
values.

I have collected new data (each observation with 100 numeric values) and would
like to keep the above 20 centres fixed/'unmoved' whilst just seeing how my new
data fit into this grouping system, e.g. if an observation is closest to
cluster 1, then label it 'cluster 1'.

If the above matrix of centres is called 'Centre' (a 20*100 matrix) and my
new data 'NewData' has 500 observations, using kmeans() will update the
centres:

kmeans(NewData, Centre)


I wondered if there are other R packages out there that can keep the centres
fixed and label each observation of my new data? Or do I have to write my own
function?

To illustrate my task using a simpler example:

I have

Centre <- matrix(c(0,1,0,1), nrow=2)

# the two created centres in a two dimensional case are
Centre
     [,1] [,2]
[1,]    0    0
[2,]    1    1

NewData <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
                 matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))

NewData1 <- cbind(c(1:100), NewData)
colnames(NewData1) <- c("ID","x","y")

# my data
 head(NewData1)
 ID  x  y
[1,]  1 -0.3974660  0.1541685
[2,]  2  0.5321347  0.2497867
[3,]  3  0.2550276  0.1691720
[4,]  4 -0.1162162  0.6754874
[5,]  5  0.1570996  0.1175119
[6,]  6  0.4816195 -0.6836226

## I'd like to have the outcome as below (whilst keeping the two centres fixed):

      ID          x         y Cluster
 [1,]  1 -0.3974660 0.1541685       1
 [2,]  2  0.5321347 0.2497867       1
 [3,]  3  0.2550276 0.1691720       1
 [4,]  4 -0.1162162 0.6754874       1

...
[55,] 55  1.1570996 1.1175119       2
[56,] 56  1.4816195 1.6836226       2


p.s. I use Euclidean distance to calculate the distance matrix.


Many thanks in advance

HJ

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] how to show a dataset in 3d?

2013-05-15 Thread maggy yan
my dataset looks like this in the beginning:

  Subject  Time1  Time2  Time3
1 1  0.385  0.103 -0.488
2 2 -1.939  0.569  1.370
3 3 -1.196 -0.051  1.247
4 4  0.174 -1.163  0.989
5 5  1.246  0.558 -1.804
6 6 -1.108 -0.057  1.165
7 7 -0.609 -0.344  0.953

so each subject has three observations. Now I need to show the dataset in
3D with scatterplot3d and plot3d together, and I don't know how that
works.

I tried:
scatterplot3d(data, type="p", highlight.3d=T, pch=16)

The graph looks 3D but is not really 3D; I mean I can't, e.g., rotate it.
How can I combine both functions?

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] dotcharts next to boxplots

2013-05-13 Thread maggy yan
my dataset looks like this in the beginning:

   Dosis weight sex
1      0   6.62   m
2      0   6.65   m
3      0   5.78   m
4      0   5.63   m
5      0   6.05   m
6      0   6.48   m
7      0   5.50   m
8      0   5.37   m
9      1   6.25   m
10     1   6.95   m
11     1   5.61   m

I've got all the box plots:
boxplot(Daten$weight~interaction(Daten$Dosis,Daten$sex, drop=TRUE))

and the points:
points(Daten$weight~interaction(Daten$Dosis,Daten$sex, drop=TRUE))

but the points are overlapping their box plots, so I tried to move all the
points next to the box plots with this:
 points(Daten$weight~interaction(Daten$Dosis,Daten$sex, drop=TRUE) + 0.2)
 but it did not work. Is there any other way to do this?
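
One possible approach, sketched with simulated data because the full Daten is not shown: boxplot() draws the i-th box at x = i, so the integer codes of the grouping factor can be shifted and passed to points() as explicit x coordinates. The narrower boxes (boxwex) and the 0.3 offset are arbitrary choices for this illustration.

```r
set.seed(1)
# Simulated stand-in for Daten (values are made up for illustration).
Daten <- data.frame(
  Dosis  = rep(c(0, 1), times = 2, each = 8),
  sex    = rep(c("m", "w"), each = 16),
  weight = round(rnorm(32, mean = 6, sd = 0.5), 2)
)

grp <- interaction(Daten$Dosis, Daten$sex, drop = TRUE)

# Narrow boxes leave room for the points beside them.
boxplot(Daten$weight ~ grp, boxwex = 0.3, xlab = "Dosis.sex", ylab = "weight")

# x = group code + offset places each point to the right of its box.
points(as.numeric(grp) + 0.3, Daten$weight, pch = 16)
```

Adding 0.2 to the factor inside a formula cannot work because arithmetic is not defined for factors; converting to numeric codes first avoids that.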

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] boxplot with grouped variables

2013-05-11 Thread maggy yan
my dataset looked like this in the beginning:

Daten
     V1      V2         V3
1 Dosis Gewicht Geschlecht
2     0    6.62          m
3     0    6.65          m
4     0    5.78          m
5     0    5.63          m

I need box plots for V2 with all combinations of V1 and V3, so I deleted the
first row, and tried this:
boxplot(Daten$V2[Daten$V3=="m"])
but it does not work and I have no clue what I did wrong.
I'm thankful for any help!
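
A sketch of one likely cause, assuming the file was read without header = TRUE: the header line then becomes a data row and every column is stored as text (or as a factor), so V2 is not numeric even after the first row is deleted. The rows beyond the four shown in the post are made up.

```r
# Stand-in for Daten as it looks when the header line was read as data.
Daten <- data.frame(
  V1 = c("Dosis", "0", "0", "1", "1", "0", "0", "1", "1"),
  V2 = c("Gewicht", "6.62", "6.65", "6.25", "6.95", "5.10", "5.40", "5.90", "6.00"),
  V3 = c("Geschlecht", "m", "m", "m", "m", "w", "w", "w", "w"),
  stringsAsFactors = FALSE
)

# Promote the first row to column names, then drop it.
names(Daten) <- as.character(unlist(Daten[1, ]))
Daten <- Daten[-1, ]

# The weights are still text; convert them before plotting.
# (For a factor column use as.numeric(as.character(x)) instead.)
Daten$Gewicht <- as.numeric(Daten$Gewicht)

# One box per combination of dose and sex.
boxplot(Gewicht ~ interaction(Dosis, Geschlecht, drop = TRUE), data = Daten)
```

Reading the file with read.table(..., header = TRUE) in the first place would make the renaming and conversion steps unnecessary.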

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] need means on all boxplots, but only half of them got that

2013-05-11 Thread maggy yan
I tried to draw a point for the mean on each boxplot. I did:

boxplot( Daten$weight~interaction(Daten$Dosis,Daten$sex, drop=TRUE))
means<-tapply( Daten$weight, Daten$Dosis, mean)
points(means, pch=5, col="red", lwd=5)

but only the boxplots for males got that point on them. It's really weird
because I don't think that I separated the sexes in the code above
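
A sketch of the probable explanation: tapply() is grouped by Dosis only, so it returns one mean per dose, and points() places those few values at x = 1, 2, ..., which happen to be the first boxes. Grouping the means by the same interaction as the boxplot gives one mean per box. The data below are simulated, since the full Daten is not shown.

```r
set.seed(2)
# Simulated stand-in for Daten (values are made up for illustration).
Daten <- data.frame(
  Dosis  = rep(c(0, 1), times = 2, each = 8),
  sex    = rep(c("m", "w"), each = 16),
  weight = round(rnorm(32, mean = 6, sd = 0.5), 2)
)

# Use one grouping factor for both the boxes and the means.
grp <- interaction(Daten$Dosis, Daten$sex, drop = TRUE)
boxplot(Daten$weight ~ grp)

means <- tapply(Daten$weight, grp, mean)   # one mean per box, in level order
points(seq_along(means), means, pch = 5, col = "red", lwd = 2)
```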

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Calculating distance matrix for large dataset

2013-05-02 Thread HJ YAN
Dear R users


I wondered if any of you have ever tried to calculate a distance matrix for a
very large data set, and if anyone out there can confirm that this error
message actually means that my data is too large for this task:

negative length vectors are not allowed


My data size and code used

dim(mydata_nor)
[1] 365000    144
d <- dist(mydata_nor, method = "euclidean")



Here my data has 1000 samples, each with a year of data observed at 10 minute
intervals daily, so the size is (365 * 1000) * 144.


I checked the manual of function 'dist' but cannot see the upper limit on
the size allowed, and I bet there should be one, so any hints are appreciated.


I would also be grateful if any other method for calculating a distance
matrix for a large dataset could be advised.


I appreciate that reproducible code should be provided, so try
the code below if needed:

A <- matrix(1:365000*144,nrow=365000,ncol=144)
dim(A)
[1] 365000    144
d1 <- dist(A,method="euclidean")
Error in dist(A, method = "euclidean") :
  negative length vectors are not allowed




Many thanks in advance!

HJ
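
A back-of-the-envelope check of the size involved, offered as a likely explanation rather than a confirmed diagnosis: dist() stores one double for each of the n(n-1)/2 pairs of rows, and for n = 365000 that count is far beyond the largest R integer, which is consistent with a length computation overflowing into a negative value.

```r
n <- 365000

# Number of pairwise distances a 'dist' object must hold.
n_pairs <- n * (n - 1) / 2     # computed in doubles, so no overflow here
n_pairs

# Larger than the maximum R integer?
n_pairs > .Machine$integer.max

# Approximate memory needed at 8 bytes per distance, in GiB.
n_pairs * 8 / 2^30
```

At several hundred GiB the full distance matrix is impractical regardless of the error; methods that need only point-to-centre distances, such as k-means, avoid building it.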

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] ksvm

2013-03-18 Thread Li, Yan
Hi All,

I'm developing a ksvm application. I need to use the results obtained from ksvm 
in R to reconstruct the ksvm decision formula and then use this formula to 
predict new data.

The ksvm class contains xmatrix, ymatrix, coefficients, alpha, b and fitted 
values. Taking a linear SVM as an example (vanilladot), the formula is like:

 F(x) = sign(sum(w*x*y*z)+b)

where x = xmatrix, y = ymatrix, b is the intercept and z is the input matrix? Is 
w the alpha or the coefficient from the ksvm class? I did some calculations but 
the result is not the same as the fitted values.

Can you give me some idea of how to reproduce the ksvm prediction from the 
formula?

Thanks a lot!!!

Regards,
Yan
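
For reference, the textbook decision function of a two-class SVM with a linear kernel is written out below. How this maps onto the ksvm slots is an assumption to be checked against ?ksvm: in particular whether the stored coefficients already include the factor y_i, whether the offset enters with a plus or a minus sign, and whether the inputs were scaled before fitting.

```latex
% x_i : support vectors, y_i \in \{-1, +1\} : their class labels,
% \alpha_i \ge 0 : dual coefficients, b : offset, z : new input vector
f(z) = \operatorname{sign}\!\Big( \sum_{i \in \mathrm{SV}} \alpha_i \, y_i \, \langle x_i, z \rangle + b \Big)
```

A mismatch with the fitted values is often explained by one of those three points (product alpha_i * y_i stored as a single coefficient, the sign convention of b, or scaling of the inputs), so each is worth verifying on a tiny example.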


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to combine conditional argument and logical argument in R to create subset of data...

2013-03-06 Thread HJ YAN
Hi Arun


Thank you so much for the help, that's really helpful!!

Also I have a quick question about the code below where I can not see why
it doesn't work...

I know the I shou

V1<-c(rep(111,4),rep(222,4),rep(333,4),rep(111,4),rep(222,4),rep(333,3))
V2<-c(1:23)
Tem1<-cbind(V1,V2)


So Tem 1 looks like...
 Tem1
   V1 V2
 [1,] 111  1
 [2,] 111  2
 [3,] 111  3
 [4,] 111  4
 [5,] 222  5
 [6,] 222  6
 [7,] 222  7
 [8,] 222  8
 [9,] 333  9
[10,] 333 10
[11,] 333 11
[12,] 333 12
[13,] 111 13
[14,] 111 14
[15,] 111 15
[16,] 111 16
[17,] 222 17
[18,] 222 18
[19,] 222 19
[20,] 222 20
[21,] 333 21
[22,] 333 22
[23,] 333 23

I would like the outcome to be...

  V1 V2

 111  1
 111  2
 111  3
 111  4
 111 13
 111 14
 111 15
 111 16
 222  5
 222  6
 222  7
 222  8
 222 17
 222 18
 222 19
 222 20
 333  9
 333 10
 333 11
 333 12
 333 21
 333 22
 333 23


So I tried code as below
--
Tem3<-c(NA,NA)
for(i in length(unique(Tem1[,1]))){
Tem2<-subset(Tem1,Tem1[,1]==unique(Tem1[,1])[i])
Tem3<-rbind(Tem3,Tem2)
Tem3
}
Tem4<-Tem3[-1,]
---

And only get this...


 V1 V2
 333  9
 333 10
 333 11
 333 12
 333 21
 333 22
 333 23


When I ran the code step by step, e.g. setting i=1, then i=2, then i=3, and
updating Tem3 each time, I did get what I wanted, so I wonder why the loop
above did not work...??
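
A short sketch of what seems to go wrong in the loop above:
length(unique(Tem1[,1])) is the single number 3, so "for (i in 3)" runs
exactly once, with i = 3, and only the 333 rows are collected. Iterating over
seq_along() visits every group. The names ids, pieces and Tem4_sorted are
introduced here for illustration only.

```r
V1 <- c(rep(111, 4), rep(222, 4), rep(333, 4), rep(111, 4), rep(222, 4), rep(333, 3))
V2 <- 1:23
Tem1 <- cbind(V1, V2)

ids <- unique(Tem1[, 1])
n_ids <- length(ids)   # one number, so for (i in length(ids)) loops once

# Visit every group index and collect the matching rows.
pieces <- lapply(seq_along(ids), function(i) Tem1[Tem1[, 1] == ids[i], , drop = FALSE])
Tem4 <- do.call(rbind, pieces)

# Same grouping without an explicit loop (ties keep their original order).
Tem4_sorted <- Tem1[order(Tem1[, 1]), ]
```

Collecting the pieces in a list and binding once also avoids growing a matrix
with rbind() inside the loop, which matters for large data.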


Many thanks in advance!

HJ

On Wed, Mar 6, 2013 at 4:36 AM, arun smartpink...@yahoo.com wrote:

 Hi,

  b[b[,4]>15 & (b[,1]>4|is.na(b[,1])) & (b[,2]>4|is.na(b[,2])),]
  #     [,1] [,2] [,3] [,4] [,5]
  #[1,]    6   NA   NA   16   20
  #[2,]   NA    5   NA   17   21
 A.K.

Re: [R] How to combine conditional argument and logical argument in R to create subset of data...

2013-03-06 Thread HJ YAN
Dear Arun

Thanks a million for your prompt reply and I love all four ways in your
reply.

I tried the code and just realised an issue: in my real work, my data is
about 4GB and I'm sure that there are many duplicated values in V2, so my V1
and V2 should be something like


V1 <- rep(c(rep(111,5),rep(222,5),rep(333,5)),2)  # V1 is a data index with lots of repeated numeric values
V2 <- c(1:23, 1:7)  # there are also duplicated values in V2
Tem1 <- cbind(V1,V2)
Tem2 <- Tem1[c(1:10,12:15,18:19),] # I know that Tem2 is a subset of Tem1...


So how do I get the difference between Tem1 and Tem2 if the values in V2
have duplicates?

  V1 V2
 333 11
 111 16
 111 17
 111 20
 222 21
 222 22
 222 23
 222  1
 222  2
 333  3
 333  4
 333  5
 333  6
 333  7


Massive thanks
HJ




On Wed, Mar 6, 2013 at 4:12 PM, arun smartpink...@yahoo.com wrote:



 Just to add:

 Tem1[Tem1[,2]%in%setdiff(Tem1[,2],Tem2[,2]),]
 A.K.

 - Original Message -
 From: arun smartpink...@yahoo.com
 To: HJ YAN yhj...@googlemail.com
 Cc: R help r-help@r-project.org
 Sent: Wednesday, March 6, 2013 11:06 AM
 Subject: Re: [R] How to combine conditional argument and logical argument
 in R to create subset of data...

 Hi,
 No problem.
 V1 <- rep(c(rep(111,5),rep(222,5),rep(333,5)),2)
  length(V1)
 #[1] 30

  V2 <- c(1:30) #should be the same length as V1
 Tem1 <- cbind(V1,V2)
 Tem2 <- Tem1[1:20,]

 Tem1[!Tem1[,2]%in%Tem2[,2],]
  #  V1 V2
  #[1,] 222 21
  #[2,] 222 22
  #[3,] 222 23
  #[4,] 222 24
  #[5,] 222 25
  #[6,] 333 26
  #[7,] 333 27
  #[8,] 333 28
  #[9,] 333 29
 #[10,] 333 30

 #or
 subset(Tem1,!V2%in% Tem2[,2])
 #or
  Tem1[is.na(match(Tem1[,2],Tem2[,2])),]
  #  V1 V2
  #[1,] 222 21
  #[2,] 222 22
  #[3,] 222 23
  #[4,] 222 24
  #[5,] 222 25
  #[6,] 333 26
  #[7,] 333 27
  #[8,] 333 28
  #[9,] 333 29
 #[10,] 333 30
 A.K.




 
 From: HJ YAN yhj...@googlemail.com
 To: arun smartpink...@yahoo.com
 Sent: Wednesday, March 6, 2013 10:33 AM
 Subject: Re: [R] How to combine conditional argument and logical argument
 in R to create subset of data...


 Thank you SO MUCH Arun!!!

 That's brilliant-- I've learnt some very useful new R commands now, e.g.
 'do.call' and 'split'. And I see where my code went wrong now.

  I greatly appreciate your prompt reply.

 Also, I wonder if there is a package that can find the difference between
 two data frames, e.g. when one is a subset of the other? e.g.

  V1 <- rep(c(rep(111,5),rep(222,5),rep(333,5)),2)
  V2 <- c(1:23)
 Tem1 <- cbind(V1,V2)

 Tem2 <- Tem1[1:20,]


 How do I get an outcome like

 [21,] 333 21
 [22,] 333 22
 [23,] 333 23


 P.S. I used 'setdiff' before, but it seems to work only for vectors and not
 for data frames??


 Sorry for so many questions today, as I'm coding for a work deadline
 tonight.


 Many thanks!
 Cheers
 HJ







 On Wed, Mar 6, 2013 at 1:55 PM, arun smartpink...@yahoo.com wrote:

 Hi,
 You can also try this:
  Tem3 <- list()
  for(i in unique(Tem1[,1])) {
  Tem3[[i]] <- subset(Tem1,Tem1[,1]==i)
  Tem4 <- do.call(rbind,Tem3)
  }
 head(Tem4)
 #  V1 V2
 #[1,] 111  1
 #[2,] 111  2
 #[3,] 111  3
 #[4,] 111  4
 #[5,] 111 13
 #[6,] 111 14
 
 
 #or
 Tem3 <- c(NA,NA)
  for(i in unique(Tem1[,1])) {
  Tem2 <- subset(Tem1, Tem1[,1]==i)
  Tem3 <- rbind(Tem3,Tem2)
  Tem5 <- Tem3[-1,]
  }
 head(Tem5)
 #  V1 V2
 # 111  1
 # 111  2
 # 111  3
 # 111  4
 # 111 13
 # 111 14
 
 A.K.

Re: [R] How to combine conditional argument and logical argument in R to create subset of data...

2013-03-06 Thread HJ YAN
Hi Arun

Massive thanks for the hint about using 'paste0'!

But coincidentally no pair of values was exactly duplicated in indxTem1 and
indxTem2 in the previous example. I changed the data as below, which is very
likely to be the case in my real data...


V1 <- rep(c(rep(111,5),rep(222,5),rep(333,5)),2)  # V1 is a data index with lots of repeated numeric values
V2 <- c(1:23, 6,7,11,4,5,6,7)  # there are also duplicated values in V2
Tem1 <- cbind(V1,V2)
Tem2 <- Tem1[c(1:11,13:15,18:19),] # I know that Tem2 is a subset of Tem1...


And my target outcome is the difference between Tem1 and Tem2 as below:


  V1 V2

 333 12
 111 16
 111 17
 111 20
 222 21
 222 22
 222 23
 222  6
 222  7
 333 11
 333  4
 333  5
 333  6
 333  7
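
One possible way to get this target, sketched in base R: treat the rows as a
multiset, so that a row occurring twice in Tem1 and once in Tem2 is removed
only once. Each row gets a key made of its values plus a running occurrence
number. The helper row_keys() and the "_" and "#" separators are assumptions
made for this sketch; a separator also avoids collisions such as (11, 1)
versus (1, 11), which paste0() without a separator cannot tell apart.

```r
V1 <- rep(c(rep(111, 5), rep(222, 5), rep(333, 5)), 2)
V2 <- c(1:23, 6, 7, 11, 4, 5, 6, 7)
Tem1 <- cbind(V1, V2)
Tem2 <- Tem1[c(1:11, 13:15, 18:19), ]

# Key = row values plus the occurrence number of that row within the matrix.
row_keys <- function(m) {
  key <- paste(m[, 1], m[, 2], sep = "_")
  occ <- ave(seq_along(key), key, FUN = seq_along)   # 1, 2, ... within each key
  paste(key, occ, sep = "#")
}

# Rows of Tem1 whose (values, occurrence) key does not appear in Tem2.
diff_rows <- Tem1[!row_keys(Tem1) %in% row_keys(Tem2), ]
diff_rows
```

For identical rows it does not matter which copy is dropped; this sketch drops
the earliest copies and keeps the later ones, which matches the row order of
the target above.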

Many thanks
HJ



On Wed, Mar 6, 2013 at 9:29 PM, arun smartpink...@yahoo.com wrote:



 Hi,
 How about this:

 indxTem1 <- paste0(Tem1[,1],Tem1[,2])
  indxTem2 <- paste0(Tem2[,1],Tem2[,2])
 Tem1[!indxTem1%in%indxTem2,]
 #   V1 V2
  #[1,] 333 11
  #[2,] 111 16
  #[3,] 111 17
  #[4,] 111 20
  #[5,] 222 21
  #[6,] 222 22
  #[7,] 222 23
  #[8,] 222  1
  #[9,] 222  2
 #[10,] 333  3
 #[11,] 333  4
 #[12,] 333  5
 #[13,] 333  6
 #[14,] 333  7


 A.K.

[R] How to combine conditional argument and logical argument in R to create subset of data...

2013-03-05 Thread HJ YAN
Dear R user

I have data created using the code below

b <- matrix(2:21,nrow=4)
b[,1:3]=NA
b[4,2]=5
b[3,1]=6

Now the data is

> b
     [,1] [,2] [,3] [,4] [,5]
[1,]   NA   NA   NA   14   18
[2,]   NA   NA   NA   15   19
[3,]    6   NA   NA   16   20
[4,]   NA    5   NA   17   21


I want to keep the rows where column 4 is greater than 15 and the values in
columns 1 and 2 are either greater than 4 or 'NA'. So I would like to have
my outcome as below...

[3,]    6   NA   NA   16   20
[4,]   NA    5   NA   17   21

I thought something like the code below would work, but it only returns
the last row, i.e. NA 5 NA 17 21...

bb <- b[which( (b[,2]>4 | b[,2]==NA) & (b[,1]>4 | b[,1]==NA) & b[,4]>15) ,]
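
A small sketch of why the comparison with NA misbehaves: x == NA never yields
TRUE, it yields NA, so the "or is NA" branch of the condition can never match.
Testing for missing values with is.na() gives the two wanted rows. The names
na_compare and keep are introduced here for illustration.

```r
b <- matrix(2:21, nrow = 4)
b[, 1:3] <- NA
b[4, 2] <- 5
b[3, 1] <- 6

# Comparing anything with NA gives NA, never TRUE.
na_compare <- NA == NA

# Column 4 above 15, and columns 1 and 2 either above 4 or missing.
keep <- b[, 4] > 15 &
  (b[, 1] > 4 | is.na(b[, 1])) &
  (b[, 2] > 4 | is.na(b[, 2]))

bb <- b[keep, ]
bb
```

Because is.na() turns every missing value into a definite TRUE or FALSE, keep
contains no NA here and which() is not needed.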


Please could anyone help?

Many thanks in advance

HJ


[R] asCairoDevice issue

2013-01-10 Thread Li, Yan
Hi All,

I found this issue when using asCairoDevice to render a splom scatter plot
matrix in my RGtk2 GUI:

If I run the code in the R GUI, or use CairoPNG or Cairo_pdf() to draw the
scatter plot, I get the correct result.

The code is (you can copy and paste it into your R GUI):

library(lattice)
super.sym <- trellis.par.get("superpose.symbol")
plot.call <- splom(~iris[1:4], groups = Species, data = iris,
  panel = panel.superpose,
  key = list(title = "Three Varieties of Iris",
             columns = 3,
             points = list(pch = super.sym$pch[1:3],
                           col = super.sym$col[1:3]),
             text = list(c("Setosa", "Versicolor", "Virginica"))))

plot.call

However, if I draw the same plot in the drawing area via asCairoDevice, I
lose all the colourful dots in the upper left and lower right and only the
diagonal panels remain:

The code is (you can copy and paste it into your R GUI):

library(RGtk2)
library(cairoDevice)
win <- gtkWindowNew(show = FALSE)
DA <- gtkDrawingArea()
asCairoDevice(DA)
win$add(DA)
win$show()
plot.call

Did I miss anything here? Thanks a lot!!!

Regards,
Yan



[R] Can't remember which package I used. Anyone can help please?

2012-11-13 Thread HJ YAN
Dear R users

I tried an example earlier to compare the results of two different
clustering methods, say method A and method B, on the same data set. I also
chose the same number of groups/clusters for both methods (here 8).

I found a good graphical tool in R to compare how the individuals are
distributed across the groups/clusters created by method A and method B,
e.g. how many individuals are assigned to 'Group 1' by both method A and
method B, how many are assigned to 'Group 2' by both, and so on. In this
case we have 30 and 28 respectively (please see the attached table and plot).

I lost my code and cannot remember which packages/functions I used. Could
anyone recognize it and give me a clue? I only made a note saying
'crosstable', if that rings a bell...
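
Not necessarily the tool used originally, but the counts described above can
be reproduced with a plain cross-tabulation in base R. The two assignment
vectors below are made-up labels for eight individuals, used only to show the
idea; the diagonal of the table counts the individuals that both methods put
in the same-numbered group.

```r
# Hypothetical cluster labels from two methods for the same 8 individuals.
method_a <- c(1, 1, 2, 2, 3, 3, 1, 2)
method_b <- c(1, 1, 2, 3, 3, 3, 1, 2)

# Rows: groups from method A; columns: groups from method B.
agreement <- table(A = method_a, B = method_b)
agreement

# Individuals given the same group number by both methods.
same_group <- diag(agreement)
same_group
```

Cluster numbers are arbitrary labels, so in practice the columns may need
reordering before the diagonal is meaningful.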

Many thanks in advance

HJ

[R] how are the values in ksvm related to the svm formula ( the parameter for kernal function)

2012-11-06 Thread Li, Yan
Hi All,

Take the simplest, linear SVM formula: w*x+b=y, where w is the normal vector
of the hyperplane and can be calculated as sum(alpha(i)*y(i)*x(i)).

The ksvm object in kernlab has alpha, coef, ymatrix, xmatrix and b. b is
surely the offset. How are the others related? Is alpha = alpha, ymatrix = y,
and xmatrix = x? What is coef?
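
The weight vector in the linear case can be worked out by hand as
w = sum_i alpha_i * y_i * x_i. The sketch below uses hypothetical numbers, not
values taken from a ksvm fit. As I understand the kernlab documentation, coef
holds the products alpha_i * y_i for the support vectors, which is why
coef_like is formed that way here; treat that mapping as an assumption to
verify on a real model.

```r
# Hypothetical support vectors (rows of x), labels and multipliers.
x     <- rbind(c(1, 2), c(2, 0), c(0, 0))
y     <- c(1, 1, -1)
alpha <- c(0.5, 0.25, 0.75)

# Product of multiplier and label for each support vector.
coef_like <- alpha * y

# Recycling multiplies row i of x by coef_like[i]; colSums adds the rows up.
w <- colSums(coef_like * x)
w
```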

Thanks a lot!

Regards,
Yan



Re: [R] Summarizing data containing data/time information (as factor)

2012-09-06 Thread HJ YAN
, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
0.04, 0.04, 0.04, 0.04, 0.04, 0.03, 0.04, 0.04, 0.04, 0.04, 0.04,
0.03, 0.04, 0.04, 0.04, 0.03, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
0.04, 0.03, 0.03, 0.03, 0.04, 0.04, 0.03, 0.04, 0.04, 0.04, 0.04,
0.04, 0.04, 0.04, 0.03, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
0.04, 0.04, 0.04, 0.03, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.03, 0.04, 0.04,
0.04, 0.03, 0.04, 0.03, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
0.04, 0.04, 0.04, 0.04, 0.04, 0.03, 0.03, 0.04, 0.04, 0.04, 0.04,
0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
0.04, 0.04, 0.04, 0.04, 0.04, 0.03, 0.04, 0.04, 0.04, 0.04, 0.04,
0.04, 0.03, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
0.04, 0.04, 0.03, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
0.04, 0.04, 0.03, 0.03, 0.04, 0.04, 0.03, 0.04, 0.04, 0.03, 0.04,
0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.03, 0.04,
0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
0.04, 0.04, 0.04, 0.04, 0.03, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
0.04, 0.04, 0.04, 0.03, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.03, 0.04, 0.04,
0.04, 0.04, 0.04, 0.03, 0.04, 0.04, 0.04, 0.04, 0.04, 0.03, 0.03,
0.04, 0.03, 0.04, 0.04, 0.04, 0.04, 0.03, 0.04, 0.04, 0.04, 0.03,
0.03, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
0.04, 0.04, 0.04, 0.04, 0.03, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
0.04, 0.04, 0.03, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
0.04, 0.04, 0.04, 0.04)), .Names = c("V1", "V2", "V3"), class = "data.frame",
row.names = c(NA, -912L))


Let me make my question clearer:

I want to summarize the data above in a table like the one below, showing
the days for which any data are available, i.e. value=1 if there are any
data for that day, otherwise value=0. There are three ids in my data:
532703, 532704 and 532705. Data are collected at 10-minute intervals each
day (i.e. 144 observations per day if no data are missing).


       28/04 29/04 30/04 01/05 02/05
532703     0     1     1     1     0
532704     1     1     1     1     1
532705     0     0     1     1     0


Sorry for the confusion to David, and hope this is clear now.

Many thanks again.

Best wishes,

HJ

On Thu, Sep 6, 2012 at 2:08 AM, arun smartpink...@yahoo.com wrote:

 Hi,

 I couldn't find any attached data.  Could you dput() the data?
 A.K.

[R] Summarizing data containing data/time information (as factor)

2012-09-05 Thread HJ YAN
Dear R user

I want to create a table (as below) to summarize the attached data
(Test.csv, which can be read into R using 'read.csv("Test.csv", header=F)'),
indicating the days for which any data are available, i.e. value=1 if
there are any data available for that day, otherwise value=0.


       28/04 29/04 30/04 01/05 02/05
532703     0     1     1     1     0
532704     1     1     1     1     1
532705     0     0     1     1     0

Only Column A (names: automatically stored as integer when read into R)
and Column B (date/time: automatically stored as factor when read into
R) are useful for this task.

Could anyone kindly give me some hints/ideas about how to write code to do
this job, please?

Many thanks in advance!

Best wishes
HJ

[R] How to convert data to 'normal' if they are in the form of standard scientific notations?

2012-08-06 Thread HJ YAN
Dear R users

I read two csv data files into R and  called them Tem1 and Tem5.

For the first column, the data in Tem1 have 13 digits whereas in Tem5 there
are 14 digits for each observation.

Originally they are 'numerical', as can be seen in my code below. But how
can I display/convert them in a form other than scientific notation, which
seems to be the standard/default?

I want them to be in a form like '20110911001084', but I'm very confused
why, when I used the 'as.factor' call, it works for my 'Tem1' but not for
'Tem5'...??
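
A short sketch of two base-R ways to get the full digits as text, using one
value of each kind from the output below. As far as I can tell, as.factor()
builds its labels via as.character(), which picks whichever of fixed or
scientific notation is shorter, and that would explain why the two columns
come out differently; the explicit conversions here do not depend on that
choice.

```r
x <- c(20110911001084, 2.10004e+12)

# Fixed notation with no decimals, one string per value.
as_sprintf <- sprintf("%.0f", x)

# Same idea via format(); trim = TRUE drops the padding to a common width.
as_format <- format(x, scientific = FALSE, trim = TRUE)

as_sprintf
as_format
```

If the column is really an identifier rather than a quantity, reading it as
text in the first place, e.g. through the colClasses argument of read.csv(),
avoids the conversion altogether.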


Many thanks!

HJ

> Tem1[1:5,1]
[1] 2.10004e+12 2.10004e+12 2.10004e+12 2.10004e+12 2.10004e+12
> Tem5[1:5,1]
[1] 2.011091e+13 2.011091e+13 2.011091e+13 2.011091e+13 2.011091e+13
> class(Tem1[1:5,1])
[1] "numeric"
> class(Tem5[1:5,1])
[1] "numeric"
> as.factor(Tem1[1:5,1])
[1] 2.10004e+12 2.10004e+12 2.10004e+12 2.10004e+12 2.10004e+12
Levels: 2.10004e+12
> as.factor(Tem5[1:5,1])
[1] 20110911001084 20110911001084 20110911001084 20110911001084 20110911001084
Levels: 20110911001084


Re: [R] How to convert data to 'normal' if they are in the form of standard scientific notations?

2012-08-06 Thread HJ YAN
Dear Jean

Thanks a lot for your help.

The reason I did not provide reproducible code is that my work started with
reading in some large csv files, i.e. the data were not created by me.
But the data come from the same provider, so I would expect to receive
them in exactly the same format.


I use read.csv to read the data in. What puzzles me most is that, using
exactly the same code as in my email, i.e. 'as.factor', one of them works
(i.e. the numerical data are converted to a factor) while the other remains
numerical in scientific notation. So, in R, how do I check whether the data
formats differ between the two original csv files, which might cause the
different results..?

Also I tried your code and created some reproducible examples, but I still
cannot make it work as in your example


> a <- c(2.0e+9,2.1e+9)
> print(a,digits=4)
[1] 20 21  # I expected to see 2.0e+9 here...?
> print(a,digits=7)
[1] 20 21  # Think here I should expect same 2.0e+9?
> getOption("digits")  # Checking my default number of digits now..
[1] 7
> b <- c(3000,3100)
> print(b)
[1] 3000 3100   # This is what I expected to see
> print(b,digits=5)
[1] 3000 3100   # I'm so confused why it is not working, e.g. printing 3.0e+9!
> getOption("digits")   # checking again, but now I would expect it has being changed to 5
[1] 7


Any thoughts please...?
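
A small sketch of the two points that seem to be mixed up above. The digits
argument of print() applies to that one call only, so getOption("digits")
stays at its previous value. Whether fixed or scientific notation is printed
is a separate decision that R makes from the printed widths, and it can be
forced either way with format(). The variable names are illustrative.

```r
a <- c(2.0e9, 2.1e9)

old_digits <- getOption("digits")
print(a, digits = 4)                       # affects this call only
unchanged <- identical(getOption("digits"), old_digits)

# Force one notation or the other, regardless of the default choice.
fixed <- format(a, scientific = FALSE)
sci   <- format(a, scientific = TRUE)

fixed
sci
```

To change the default for a whole session, options(digits = ...) sets the
number of significant digits and options(scipen = ...) biases printing
towards fixed (positive values) or scientific (negative values) notation.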

Thanks
HJ


On Mon, Aug 6, 2012 at 7:04 PM, Jean V Adams jvad...@usgs.gov wrote:

 HJ,

 You don't provide any reproducible code, so I had to make up my own.

 dat <- data.frame(a=letters[1:5], x=c(20110911001084, 20110911001084,
 20110911001084, 20110911001084, 20110911001084),
 y=c(2.10004e+12, 2.10004e+12, 2.10004e+12, 2.10004e+12,
 2.10004e+12))

 In my example, the long numbers print out without scientific notation.

 dat
   a  x y
 1 a 20110911001084 210004000
 2 b 20110911001084 210004000
 3 c 20110911001084 210004000
 4 d 20110911001084 210004000
 5 e 20110911001084 210004000

 I can make it print with scientific notation using the digits argument to
 the print() function.

 print(dat, digits=3)
   a        x       y
 1 a 2.01e+13 2.1e+12
 2 b 2.01e+13 2.1e+12
 3 c 2.01e+13 2.1e+12
 4 d 2.01e+13 2.1e+12
 5 e 2.01e+13 2.1e+12

 What is your default number of digits?
 getOption("digits")

 Jean


[R] using glmnet for the dataset with numerical and categorical

2012-07-12 Thread yan
Dear R users,

If all the numerical variables in my dataset have the same units, may I
leave them unnormalized and just run cv.glmnet directly
(cv.glmnet(data,standardize=FALSE))?
I know that normally, if there is a mixture of numerical and categorical
variables, one has to standardize the numerical part before applying
cv.glmnet with standardize=FALSE, but that's due to the different units in
the numerical part, right? So if all the units are the same, could I skip
the pre-standardization step?

I tried both ways (standardizing the numerical part and not standardizing
it) and the results are very similar. What I don't understand is why the
coefficients of the categorical variables are so different in the two cases.
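
A minimal base-R sketch of the pre-standardization step described above:
scale only the numeric columns, then expand the factors into dummy columns
with model.matrix(). The data frame is made up for illustration, and the
cv.glmnet() call is shown only as a comment because it needs the glmnet
package and a response vector y that are not part of this sketch.

```r
# Hypothetical data: two numeric predictors and one categorical predictor.
df <- data.frame(
  x1 = c(1, 2, 3, 4),
  x2 = c(10, 20, 30, 40),
  g  = factor(c("a", "b", "a", "c"))
)

# Standardize the numeric columns only (mean 0, sd 1).
num <- sapply(df, is.numeric)
df[num] <- scale(df[num])

# Dummy-code the factor and drop the intercept column.
X <- model.matrix(~ ., data = df)[, -1]
X

# With glmnet installed and a response y available, the fit would then be:
# cv.glmnet(X, y, standardize = FALSE)
```

Penalized coefficients depend on the scale of every column, so rescaling the
numeric columns changes how the penalty is shared with the 0/1 dummy columns,
which is one plausible reason the categorical coefficients move.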

Many thanks

Yan


[R] Read vector as multi-dimensional data in R by row

2012-07-09 Thread HJ YAN
Dear R users


Say I want to read a vector into R as a multi-dimensional array by row,
e.g.

> a <- c(1:20)
> b <- array(a,dim=c(2,5,2))
> b
, , 1

     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10

, , 2

     [,1] [,2] [,3] [,4] [,5]
[1,]   11   13   15   17   19
[2,]   12   14   16   18   20


But actually I wanted...

, , 1

     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    6    7    8    9   10

, , 2

     [,1] [,2] [,3] [,4] [,5]
[1,]   11   12   13   14   15
[2,]   16   17   18   19   20


I checked '?array' but there is no argument like 'byrow=T' as in 'matrix'.
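
One way to get the row-wise layout, sketched with base R only: fill an array
whose first two dimensions are swapped, then transpose each slice with
aperm(). The name b_byrow is introduced here for illustration.

```r
a <- 1:20

# Fill a 5 x 2 x 2 array column by column, so each column holds one target row.
tmp <- array(a, dim = c(5, 2, 2))

# Swap the first two dimensions; the third (the slices) stays in place.
b_byrow <- aperm(tmp, c(2, 1, 3))
b_byrow
```

The same pattern works for other shapes: build the array with dim = c(ncol,
nrow, nslices) and permute with c(2, 1, 3).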

Could anyone help please?

Many thanks in advance!

HJ


[R] Resolution issue with exporting plots from R and write tables in Latex code for producing pdf document

2012-06-08 Thread HJ YAN
Dear R users

I am trying to export plots from R to an external folder, or to the
working directory, but the resolution of the exported plots (pdf files) is
much lower than on screen. Is there any way to get the same quality as my
original plots? E.g. I tested the plotting part on one example and obtained
pretty good (readable) quality for each plot in the (4*4) multi-panel
figure. But when I ran the loop and tried to export the plots using
'dev.copy', the quality was not the same. I do need the loop, as there are
400 datasets and I cannot handle them manually.
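
A minimal sketch of an alternative to dev.copy(): open the pdf device
directly with an explicit page size, draw the panels, and close it. The file
name, the 12 x 12 inch page and the random series are assumptions for
illustration only. The idea is that a copy inherits the layout of the screen
window, while a device opened with its own width and height gives each of the
16 panels predictable room.

```r
set.seed(1)

# Hypothetical output file; in a loop this would be built from the station name.
out_file <- file.path(tempdir(), "station_plots.pdf")

# Open the pdf device with an explicit page size (in inches).
pdf(out_file, width = 12, height = 12)
par(mfrow = c(4, 4), oma = c(4, 0, 2, 0))

# Sixteen placeholder panels standing in for the real measurement columns.
for (k in 1:16) {
  plot(seq_len(50), cumsum(rnorm(50)), type = "p", cex = 0.3,
       xlab = "Date/Time index", ylab = paste("Series", k))
}

# Closing the device writes the file.
dev.off()
```

Inside a loop over datasets, one pdf() and dev.off() pair per dataset writes
one file each without any plot being shown on screen.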

Also I am using 'xtable' to write LaTeX code for the summary tables of my
data (again there are 400 datasets). I saved the LaTeX code in a list and
used 'sink()' to write it out after the loop, which works OK. But my table
is wide, and the LaTeX code produced by xtable does not fit my purpose
well, e.g. I have set the size to 'tiny' but the table is still too wide.
Is there an alternative package that is handier for this job? Also, is
there a good way to write some hundred tables from R to LaTeX for producing
pdf documents more easily? I just realised that LaTeX does not like
compiling a large number of tables/figures in one go!

Any advice/ideas are greatly appreciated!


Best wishes
HJ




Below is my code...
=
library(chron)   # chron(), times()
library(xtable)  # xtable()

SumTab<-function(Data=SortedDataInList,StartDate="30/1/12",EndDate="31/05/12",StartTime="22:50:00",EndTime="23:00:00"){
  Start= chron(StartDate,StartTime , format=c(dates="d/m/y", times="h:m:s"))
  End= chron(EndDate,EndTime , format=c(dates="d/m/y", times="h:m:s"))
  deltat <- times("00:10:00")
  TT <- seq(Start,End, by = times("00:10:00"))
  TT1 = substr(TT, 2, 18)


  Data1=Data
  for (i in 1:length(Data1)){
SumTab1[[i]]= matrix(NA, nrow=5, ncol=ncol(SortedDataInList[[i]])-4)
SortedDataInList1[[i]]= matrix(NA, nrow=length(TT1),
ncol=ncol(Data1[[i]]))

SortedDataInList1[[i]]=Data1[[i]][match(as.character(TT1),as.character(Data1[[i]][,1])),]


# Rows of the summary table: min, mean, median, sd, max
SumTab1[[i]][1,]<-apply(SortedDataInList1[[i]][,4:16],2,min)
SumTab1[[i]][2,]<-apply(SortedDataInList1[[i]][,4:16],2,mean)
SumTab1[[i]][3,]<-apply(SortedDataInList1[[i]][,4:16],2,median)
SumTab1[[i]][4,]<-apply(SortedDataInList1[[i]][,4:16],2,sd)
SumTab1[[i]][5,]<-apply(SortedDataInList1[[i]][,4:16],2,max)



colnames(SumTab1[[i]])=c("vOL1","VOL2","VOL3","CUR1","CUR2","CUR3","THD1","THD2","THD3","RPD","RPR","RAPD","RAPR")
# Row names follow the order in which the rows are filled above
rownames(SumTab1[[i]])=c("Min","Mean","Median","Standard Deviation","Max")


SumLax[[i]]<-xtable(SumTab1[[i]],label=as.character(StationsInDir[i]),caption=as.character(StationsInDir[i]))

par(mfrow=c(4,4),oma=c(4,0,2,0))

plot(1:length(SortedDataInList1[[i]][,1]),SortedDataInList1[[i]][,4],xlim=c(1,length(SortedDataInList1[[i]][,1])),ylab="Voltage 1 (v)",xlab="Date/Time ind.(10 min. int.)",type="p",cex=.001)

plot(1:length(SortedDataInList1[[i]][,1]),SortedDataInList1[[i]][,5],xlim=c(1,length(SortedDataInList1[[i]][,1])),ylab="Voltage 2 (v)",xlab="Date/Time ind.(10 min. int.)",type="p",cex=.001)

plot(1:length(SortedDataInList1[[i]][,1]),SortedDataInList1[[i]][,6],xlim=c(1,length(SortedDataInList1[[i]][,1])),ylab="Voltage 3 (v)",xlab="Date/Time ind.(10 min. int.)",type="p",cex=.001)

plot(1:length(SortedDataInList1[[i]][,1]),SortedDataInList1[[i]][,7],xlim=c(1,length(SortedDataInList1[[i]][,1])),ylab="Current 1 (A)",xlab="Date/Time ind.(10 min. int.)",type="p",cex=.001)

plot(1:length(SortedDataInList1[[i]][,1]),SortedDataInList1[[i]][,8],xlim=c(1,length(SortedDataInList1[[i]][,1])),ylab="Current 2 (A)",xlab="Date/Time ind.(10 min. int.)",type="p",cex=.001)

plot(1:length(SortedDataInList1[[i]][,1]),SortedDataInList1[[i]][,9],xlim=c(1,length(SortedDataInList1[[i]][,1])),ylab="Current 3 (A)",xlab="Date/Time ind.(10 min. int.)",type="p",cex=.001)

plot(1:length(SortedDataInList1[[i]][,1]),SortedDataInList1[[i]][,10],xlim=c(1,length(SortedDataInList1[[i]][,1])),ylab="Total harmonic distortion 1 (%)",xlab="Date/Time ind.(10 min. int.)",type="p",cex=.001)

plot(1:length(SortedDataInList1[[i]][,1]),SortedDataInList1[[i]][,11],xlim=c(1,length(SortedDataInList1[[i]][,1])),ylab="Total harmonic distortion 2 (%)",xlab="Date/Time ind.(10 min. int.)",type="p",cex=.001)

plot(1:length(SortedDataInList1[[i]][,1]),SortedDataInList1[[i]][,12],xlim=c(1,length(SortedDataInList1[[i]][,1])),ylab="Total harmonic distortion 3 (%)",xlab="Date/Time ind.(10 min. int.)",type="p",cex=.001)

plot(1:length(SortedDataInList1[[i]][,1]),SortedDataInList1[[i]][,13],xlim=c(1,length(SortedDataInList1[[i]][,1])),ylab="Real power delivered (mw)",xlab="Date/Time ind.(10 min. int.)",type="p",cex=.001)

plot(1:length(SortedDataInList1[[i]][,1]),SortedDataInList1[[i]][,14],xlim=c(1,length(SortedDataInList1[[i]][,1])),ylab="Real power received (mw)",xlab="Date/Time ind.(10 min. int.)",type="p",cex=.001)

plot(1:length(SortedDataInList1[[i]][,1]),SortedDataInList1[[i]][,15],xlim=c(1,length(SortedDataInList1[[i]][,1])),ylab="Reactive power delivered (MVAr)",xlab="Date/Time ind.(10 min. int.)",type="p",cex=.001)


Re: [R] Resolution issue with exporting plots from R and write tables in Latex code for producing pdf document

2012-06-08 Thread HJ YAN

 Dear Duncan

 Many thanks again for your help!

 Now I've mended my code in the following simplified form:


 pdf("filename.pdf",paper="a4",width=8,height=12,encoding="default")
 par(mfrow=c(2,2))

 plot(1:length(SortedDataInList1[[3]][,1]),SortedDataInList1[[3]][,4],xlim=c(1,length(SortedDataInList1[[3]][,1])),ylab="Voltage 1 (v)",xlab="Date/Time ind.(10 min. int.)",type="p",cex=.0001)
 plot(...)
 plot(...)
 plot(...)
 dev.off()

 I think that includes all the features I need, but the output only
 shows part of each figure: the frame, the values on the x-axes
 (e.g. 1:length(SortedDataInList1[[3]][,1])), xlab, ylab, titles and
 subtitles are all shown, but not the values on the y-axes,
 e.g. 'SortedDataInList1[[3]][,i]' for i=4,5,6... So I just get 4 blank frames
 with no data in the figure!

 Have I missed some important arguments in this line?

  pdf("filename.pdf",paper="a4",width=8,height=12,encoding="default")


 I have tested that the 'middle part' of my code (as shown below) works
 fine and I can get the plots as I wanted.
 ===
  par(mfrow=c(2,2))
 plot(1:length(SortedDataInList1[[3]][,1]),SortedDataInList1[[3]][,4],xlim=c(1,length(SortedDataInList1[[3]][,1])),ylab="Voltage 1 (v)",xlab="Date/Time ind.(10 min. int.)",type="p",cex=.0001)
 plot(...)
 plot(...)
 plot(...)
 

 Many many thanks!

 HJ




 On Fri, Jun 8, 2012 at 11:53 AM, Duncan Murdoch
 murdoch.dun...@gmail.com wrote:

 On 12-06-08 6:46 AM, yhj...@googlemail.com wrote:

 Dear Duncan



 Thanks a lot for your hints.

 As you can see from my code (just one line above the command using
 dev.copy), I have tried using pdf but got the same problem, so I commented it out.


 That's not the right place to put the pdf() call, it should appear before
 any of the graphing calls.  To keep the pdf resolution, you need to plot to
 the pdf device.

 Duncan Murdoch
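
 The advice above (open the pdf device before any plotting call, close it
 afterwards) can be sketched as follows. This is only a minimal illustration
 under stated assumptions: the list 'datasets', its column names and the output
 directory are hypothetical stand-ins, not the poster's data. The point size is
 set to a visible value here; whether the very small 'cex' used in the thread is
 what produced the blank frames is a guess, not something the thread confirms.

```r
# Hypothetical stand-in data: one small data frame per station
set.seed(1)
datasets <- list(
  stationA = data.frame(v1 = rnorm(50), v2 = rnorm(50)),
  stationB = data.frame(v1 = rnorm(50), v2 = rnorm(50))
)

out_dir   <- tempdir()                      # assumption: any writable folder
pdf_files <- character(length(datasets))

for (i in seq_along(datasets)) {
  pdf_files[i] <- file.path(out_dir, paste0(names(datasets)[i], ".pdf"))

  pdf(pdf_files[i], width = 8, height = 12) # open the device BEFORE plotting
  par(mfrow = c(2, 1))                      # layout is set on the pdf device
  for (j in seq_along(datasets[[i]])) {
    plot(datasets[[i]][[j]], type = "p", cex = 0.3,
         xlab = "Index", ylab = names(datasets[[i]])[j])
  }
  dev.off()                                 # close the device to write the file
}
```

 Each pass through the loop writes one vector pdf, so nothing is copied from the
 screen device and no resolution is lost.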



 Any ideas??

 Many thanks!

 HJ

 -Original Message-
 From: Duncan Murdoch murdoch.dun...@gmail.com
 Date: Fri, 08 Jun 2012 05:21:52
 To: HJ YAN yhj...@googlemail.com
 Cc: r-help@r-project.org
 Subject: Re: [R] Resolution issue with exporting plots from R and write
 tables in Latex code for producing pdf document

 On 12-06-07 10:08 PM, HJ YAN wrote:

 Dear R users

 I am trying to export plots from R to an external folder, or to the
 working directory, but the resolution of the plots (pdf file) is much reduced.
 Is there any way I can get the same quality as my original plots? E.g. I tested
 the plotting part using one example and obtained pretty good (readable)
 quality for each plot in the (4*4) multiple graph. But when I ran the loop
 and tried to export those plots using 'dev.copy', the quality was not
 the same. I do need this loop as there are 400 datasets, so I cannot
 handle them manually.


 Don't use dev.copy.  Use pdf() then dev.off() to produce the plots in
 the first place.



 Also, I am using 'xtable' to write LaTeX code for my summary tables of
 the data (again, there are 400 datasets). I saved the LaTeX code in a list
 and used 'sink()' to write it out outside the loop, which works OK.
 But my table is wide, and the LaTeX code produced by xtable does not
 fit my purpose well, e.g. I have set the size to 'tiny' but the table is still
 too wide. Is there an alternative package that is handier for this job? Also, is
 there a good way to write a few hundred tables from R to LaTeX so that
 producing pdf documents is easier? I have just realised that LaTeX does
 not like compiling a large number of tables/figures in one go!


 latex has a lot of packages for handling large tables, but it may be
 better to redesign your table to not be so wide.  The tables package
 might help with this, but it doesn't have any particular support for
 wide tables.

 Duncan Murdoch




 Any advices/ideas are greatly appreciated!


 Best wishes
 HJ





Re: [R] Reading a bunch of csv files into R

2012-05-28 Thread HJ YAN
  Dear Rui, Kevin, Bryan and Nutter


Thank you so much for your very helpful hints!

Now I have extracted all the file names and managed to edit them using the
code (1)-(4) below and obtained the name format as I wanted

(1) files <- list.files(path = "myworking directory", pattern = NULL,
all.files = FALSE,
   full.names = FALSE, recursive = FALSE,ignore.case = FALSE,
include.dirs = FALSE)

(2) filenames <- files[grep("[.]csv", files)]

[1] "512180_20120523150757.csv"
"513687_20120523181947.csv"
"513690_20120524112111.csv"
 "521858_20120524091428.csv"
 "523215_20120523123419.csv"
...(a few hundred more...)


(3) data_names <- gsub("[.]csv", "", filenames)

(4) NAME <- paste("Data",data_names, sep=".")


Up to here I have NAME containing all the names I'm going to use..

> NAME
[1] "Data.512180_20120523150757"
"Data.513687_20120523181947"
"Data.513690_20120524112111"
 "Data.521858_20120524091428"
 "Data.523215_20120523123419"



But I still haven't successfully read the whole bunch of csv files into R
and named them as expected, e.g. I want to read 512180_20120523150757.csv
into R and name it Data.512180_20120523150757 and so on...
For a single file we can just write

Data.512180_20120523150757 <- read.csv("512180_20120523150757.csv")

If any of the following commands (as you suggested) worked, my question
would be sorted out. But I got error messages for every attempt...
(i)
> df.list <- lapply(seq_len(filenames), read.csv)

Error in seq_len(filenames) :
  argument must be coercible to non-negative integer
In addition: Warning message:
In is.vector(X) : NAs introduced by coercion

> filenames
[1] "512180_20120523150757.csv" "513687_20120523181947.csv"
"513690_20120524112111.csv" "521858_20120524091428.csv"
[5] "523215_20120523123419.csv"...


(ii) None of the following code works...

myDir="myworking directory"

#for(i in 1:length(filenames)){assign(NAME[i], read.csv(file.path(myDir,
#filenames[i])))}
#for(i in 1:5){assign(NAME[i], read.csv(file.path=myDir, filenames[i]))}

setwd("myworking directory")
#for(i in 1:5){assign(NAME[i], read.csv( filenames[i]))}



Warning messages:
1: In N[i] <- read.csv(filenames[i]) :
  number of items to replace is not a multiple of replacement length
2: In N[i] <- read.csv(filenames[i]) :
  number of items to replace is not a multiple of replacement length
3: In N[i] <- read.csv(filenames[i]) :
  number of items to replace is not a multiple of replacement length
4: In N[i] <- read.csv(filenames[i]) :
  number of items to replace is not a multiple of replacement length
5: In N[i] <- read.csv(filenames[i]) :
  number of items to replace is not a multiple of replacement length


Seems I am getting there, but could you spot where my code went wrong
please??

Many thanks again!

HJ
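
The error in attempt (i) comes from passing the character vector of file names
to seq_len(), which expects a single non-negative number. A minimal sketch of
the list-based approach, assuming the files sit in one folder: the two csv
files and the directory 'my_dir' below are hypothetical and are created only so
that the example runs on its own.

```r
# Hypothetical setup: two small csv files in a temporary folder
my_dir <- file.path(tempdir(), "csv_demo")
dir.create(my_dir, showWarnings = FALSE)
write.csv(data.frame(a = 1:3), file.path(my_dir, "512180_x.csv"),
          row.names = FALSE)
write.csv(data.frame(a = 4:6), file.path(my_dir, "London_1234.csv"),
          row.names = FALSE)

# Collect the file names and build the object names from them
filenames  <- list.files(my_dir, pattern = "[.]csv$")
data_names <- paste("Data", sub("[.]csv$", "", filenames), sep = ".")

# Hand the file paths themselves to lapply(), not seq_len(filenames)
df.list <- lapply(file.path(my_dir, filenames), read.csv)
names(df.list) <- data_names

# Access by name or by position
df.list[["Data.London_1234"]]
df.list[[1]]
```

Keeping the data frames in one named list avoids creating hundreds of separate
objects with assign(), and later steps can loop over the list with lapply().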





On Fri, May 25, 2012 at 8:36 PM, Rui barradas rui1...@sapo.pt wrote:

 Hello,

 Or maybe put the data frames in a list

 df.list <- lapply(seq_len(filenames), read.csv, ...) # '...' are other
 # options you might want to pass (like header=TRUE)
 names(df.list) <- data_names

 Now access the data frames by number in the list or by name in data_names.

 Hope this helps,

 Rui Barradas
 On 25-05-2012 20:08, Nutter, Benjamin wrote:

  For example:

 myDir <- "some file path"
 filenames <- list.files(myDir)
 filenames <- filenames[grep("[.]csv", filenames)]

 data_names <- gsub("[.]csv", "", filenames)

 for(i in 1:length(filenames)) assign(data_names[i],
 read.csv(file.path(myDir, filenames[i])))


  Benjamin Nutter |  Biostatistician |  Quantitative Health Sciences
   Cleveland Clinic|  9500 Euclid Ave.  |  Cleveland, OH 44195  | (216)
 445-1365


 -Original Message-
 From: r-help-boun...@r-project.org
 [mailto:r-help-boun...@r-project.org]
 On Behalf Of Kevin Wright
 Sent: Friday, May 25, 2012 2:55 PM
 To: HJ YAN
 Cc: r-help@r-project.org
 Subject: Re: [R] Reading a bunch of csv files into R

 See ?dir

 Assign the value to a vector and loop over the elements of the vector.

 Kevin


 On Fri, May 25, 2012 at 12:16 PM, HJ YAN yhj...@googlemail.com wrote:

 Dear R users


 I am struggling with a data importing issue:

 I have some hundreds of csv files that need to be read into R for further
 analysis. All those csv files are named in one of three formats:

 (1) strings: e.g. London_Oxford street
 (2) Integer: e.g. 1234_5678
 (3) combined: e.g. London_1234

 I intend to use read.csv("_xxx.csv"), but I have only dealt with single
 documents before, and when there are no more than 20 files I do not
 bother to look for a more efficient way.


 Is there any clever way to avoid typing in all these
 hundreds of names by hand, maybe using an R package, or by writing some code in
 some other language if it is not too difficult to learn?

 Any thoughts/hints please??

 Many thanks in advance!

 HJ


Re: [R] Reading a bunch of csv files into R

2012-05-28 Thread HJ YAN
Dear Bryan

Thank you so much for your prompt reply!

Please see my responses below, under the '=' lines in your reply...

Many thanks again!

HJ

On Mon, May 28, 2012 at 4:45 PM, Bryan Hanson han...@depauw.edu wrote:

 OK, a couple of things (I only looked through quickly):

 1.  R doesn't allow variable names to begin with a number.  Be sure you
 don't try that.


Yes, I understand this. Some of my csv file names begin with a
number, so I put 'Data' in front of them using 'NAME <-
paste("Data",data_names, sep=".")' as shown in my last email.


  2.  What's the overall goal here?  Read them in, change the name, then
 write them out?  Let us know and it will be easier to help you.

=
The overall goal here: for my current study I receive hundreds of csv
files every two weeks, and I need to read them into R for further analysis.
The data are recorded at 10-minute intervals and are collected
every two weeks from a few hundred monitors.

 So I want to know how to do these jobs more efficiently:

(i) Read them into R; put the data from the same monitors together, check for
missing values, and manipulate the data in the way we need, e.g. according to
region or monitoring type, which involves aggregating the whole group (or a
subgroup) of the data etc.;

(ii) Edit the names, because sometimes we want to match names in one format
to another, e.g. 512180_20120523150757==London_2012_May_23rd_15:07:57
(e.g. Location name_Year_Month_Day_Hour_Minute_Second)

(iii) If (i) and (ii) can be done, I would think 'writing them out' to csv
would not be too difficult. We mainly do the analysis in R and have not needed
output in csv format so far...




  3.  Regardless of your goal, I think you are over thinking the
 solution.  Let us know what you want to accomplish and we can shorten it up
 I'm sure.

=
I am trying to input the data as a list, which might be easier, but I am
not sure if another data type has an advantage over that...


Data1 <- list( NAME)

[1] NAME
 Data.512180_20120523150757 Data.513687_20120523181947
Data.513690_20120524112111 Data.521858_20120524091428
Data.523215_20120523123419

for(i in 1:length(filenames)) {Data1[[i]] <- read.csv(filenames[i])}

But when I tried to access the components of this list 'Data1', only the
first of the three methods shown below works, and I think the other two
would be more useful for me. Any ideas??

(1) Data1[[1]]
 *** this one works
(2) Data1[["Data.512180_20120523150757"]]
 *** this one doesn't work
(3)  Data1$Data.512180_20120523150757
  *** this one doesn't work

Hope I have made myself clear here.

Thanks!
HJ
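
Access by name, as in methods (2) and (3), only works when the list elements
carry names; list(NAME) stores the whole character vector as a single unnamed
element instead. A small sketch of one way to set the names, using two made-up
names and dummy data frames in place of the real csv contents (both are
assumptions for illustration):

```r
# Hypothetical names and contents standing in for the real files
NAME <- c("Data.A", "Data.B")

Data1 <- vector("list", length(NAME))  # empty list with one slot per file
names(Data1) <- NAME                   # name the slots

for (i in seq_along(NAME)) {
  # in the real case this would be read.csv(filenames[i])
  Data1[[i]] <- data.frame(id = i)
}

Data1[[1]]          # by position
Data1[["Data.A"]]   # by name
Data1$Data.B        # by name with $
```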



 Bryan


[R] Reading a bunch of csv files into R

2012-05-25 Thread HJ YAN
Dear R users


I am struggling with a data importing issue:

I have some hundreds of csv files that need to be read into R for further
analysis. All those csv files are named in one of three formats:

(1) strings: e.g. London_Oxford street
(2) Integer: e.g. 1234_5678
(3) combined: e.g. London_1234

I intend to use read.csv("_xxx.csv"), but I have only dealt with
single documents before, and when there are no more than 20 files I do
not bother to look for a more efficient way.


Is there any clever way to avoid typing in all these hundreds of
names by hand, maybe using an R package, or by writing some code in some other
language if it is not too difficult to learn?

Any thoughts/hints please??

Many thanks in advance!

HJ



[R] glmnet object to pmml

2012-05-18 Thread yan
Dear R users

I generated a model using glmnet and need to convert it to PMML, but the R pmml
package doesn't support glmnet. Has anyone come across a similar problem? Any
idea how to solve it?

Many thanks

YAn

--
View this message in context: 
http://r.789695.n4.nabble.com/glmnet-object-to-pmml-tp4630493.html


[R] glmnet object to pmml

2012-05-18 Thread Yan Jiao
Dear R users

I used glmnet to generate a regression model, and now I need to convert it to PMML 
format, but I noticed that the pmml R package doesn't support glmnet objects. Has anyone 
found a way to solve this problem? I was thinking of converting the glmnet object to a glm 
object; has anyone tried it?

Many thanks

Yan



[R] cross validation in glmnet

2012-05-14 Thread yan
I am using cv.glmnet from the glmnet package for logistic regression.
My dataset is very imbalanced: 5% of the samples are from one group, the rest
from the other. I'm wondering, when cv.glmnet chooses lambda, does every fold
have the same ratio of the two groups (5% of the samples from one
group, the rest from the other, in my case), or is the split just random?

many thanks 

yan
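
One way to control the class ratio per fold is to build the fold assignment by
hand and pass it through the 'foldid' argument of cv.glmnet. The sketch below
makes several assumptions: the response 'y' and the predictor matrix 'x' are
simulated, 5 folds are used, and the folds are balanced with plain base R. As
far as I know the default fold assignment in cv.glmnet is random and not
stratified, but please check ?cv.glmnet for the installed version.

```r
set.seed(42)

# Simulated imbalanced response: 5% ones, 95% zeros (hypothetical data)
y <- rep(c(1, 0), times = c(20, 380))
nfolds <- 5

# Assign fold labels within each class so every fold keeps the 5% / 95% ratio
foldid <- integer(length(y))
for (cls in unique(y)) {
  idx <- which(y == cls)
  foldid[idx] <- sample(rep(seq_len(nfolds), length.out = length(idx)))
}

table(foldid, y)   # each fold holds the same number of cases from each class

# Only runs if glmnet is installed; x is random noise used for illustration
if (requireNamespace("glmnet", quietly = TRUE)) {
  x   <- matrix(rnorm(length(y) * 20), nrow = length(y))
  fit <- glmnet::cv.glmnet(x, y, family = "binomial", foldid = foldid)
}
```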

--
View this message in context: 
http://r.789695.n4.nabble.com/cross-validation-in-glmnet-tp4629919.html


[R] glmnet speed

2012-05-14 Thread yan
I'm using glmnet for logistic regression on a fairly sparse dataset:
20,000 samples (very imbalanced too, 5% from one group) and 1500 variables.

The code has been running for 2 hours and I am still waiting for the result. I am
doing lasso here (alpha=1), and my computer has a Core 2 Duo CPU @ 3GHz with 4GB
RAM. Why is it so much slower than the speed reported in the paper by Tibshirani et al.?

Has anyone had the same problem?

Many thanks

yan

--
View this message in context: 
http://r.789695.n4.nabble.com/glmnet-speed-tp4629953.html


[R] Data indexing issue...

2012-03-27 Thread HJ YAN
Dear R-help,

My dataset (a data frame, called 'Calender' here) includes 365
rows representing the 365 days of a year.  One column ('Season') contains
factor data representing seasons, e.g. spring, summer, autumn and winter.
Another column (called 'Day') contains data representing whether the day is
a working day ('Wd' for short here) or a weekend day ('Wkend' for
short here).


I want to separate the indices of the working days and weekends for each
season. I have used the R command 'which' before with a single criterion, for
example, if I use...


WdIndex <- which(Calender$Day=='Wd')

that gives the set of indices of working days in the year.

I wonder whether in R I could use something such as 'AND', 'OR'
(as in MySQL) to set multiple criteria when selecting data. So for
example...

WinterWdIndex <- which(Calender$Day=='Wd' AND Calender$Season=="Winter")


I know the above syntax is wrong. I checked '?which', which did not give
me an answer, and also tried '?AND', but it seems it doesn't exist at all...


Many thanks!
HJ
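
In R the element-wise logical operators are '&' (and) and '|' (or), and they
can be combined inside which(). A minimal sketch on a made-up 8-row version of
the 'Calender' data frame (the rows are hypothetical; only the column names
follow the description above):

```r
# Hypothetical miniature calendar: 4 winter days, 4 spring days
Calender <- data.frame(
  Season = factor(rep(c("Winter", "Spring"), each = 4)),
  Day    = rep(c("Wd", "Wd", "Wkend", "Wd"), times = 2)
)

# Both conditions must hold: use & between the two comparisons
WinterWdIndex <- which(Calender$Day == "Wd" & Calender$Season == "Winter")

# Either condition may hold: use |
WinterOrWkend <- which(Calender$Season == "Winter" | Calender$Day == "Wkend")

# All Day/Season combinations at once: split the row numbers by both columns
idx_by_group <- split(seq_len(nrow(Calender)),
                      list(Calender$Day, Calender$Season), drop = TRUE)
idx_by_group[["Wd.Winter"]]
```

The help page for these operators is reached with ?"&" (the quotes are needed).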



[R] Supperscript, subscript and double lines in the main/sub title and using greek letters

2012-03-27 Thread HJ YAN
Dear R-help,

 I am trying to express myself as best I can here. If you also use LaTeX
to edit math reports, or other languages with a similar editing method,
you'll see what I'm talking about. My sincere apologies if my question is
not clear enough; I'm also not able to provide my code
here because I don't know which one I can use...

When editing the title in R plots, such as with 'plot', or 'xyplot' in
'lattice', what method do you use to write Greek letters and make use of
superscripts and subscripts, e.g. to write mathematical expressions like
these in LaTeX:

\sigma^2
\tau^{2s}
\mu_i
\pi_{2s}

I would also like to learn how to make two lines in the main title or sub
title if the text is too long to fit on a single line, e.g. is
there some R code/syntax that allows me to do something like in LaTeX to make
two lines in the title, for example using '//' or '\\' to separate the two
parts of the text I want to put on two lines?

I heard about using something like

plot(x,y, main=expression())

but from neither '?plot' nor '?expression' could I find comprehensive
information about what I need...

Many thanks!
HJ



Re: [R] Data indexing issue...

2012-03-27 Thread HJ YAN
Hi Jim!

Thank you so much for the very helpful hints!!
I am learning 'split' now and it seems very useful..

HJ

On Tue, Mar 27, 2012 at 12:58 PM, jim holtman jholt...@gmail.com wrote:

 Why not use 'split' and get all the groups at once:

 result <- split(Calender, list(Calender$Day, Calender$Season), drop = TRUE)

 On Tue, Mar 27, 2012 at 7:43 AM, Ivan Calandra
 ivan.calan...@u-bourgogne.fr wrote:
  Hi HJ,
 
  Take a look at ?"&"; this is probably what you're looking for.
 
  What you could also do is:
  Calender[Calender$Day=='Wd' & Calender$Season=="Winter", ]  # notice the
  # last comma
 
  This will subset directly without using which(); it might be helpful to
 you.
 
  HTH,
  Ivan
 
  --
  Ivan CALANDRA
  Université de Bourgogne
  UMR CNRS/uB 6282 Biogéosciences
  6 Boulevard Gabriel
  21000 Dijon, FRANCE
  +33(0)3.80.39.63.06
  ivan.calan...@u-bourgogne.fr
  http://biogeosciences.u-bourgogne.fr/calandra
 
 



 --
 Jim Holtman
 Data Munger Guru

 What is the problem that you are trying to solve?
 Tell me what you want to do, not how you want to do it.



Re: [R] Supperscript, subscript and double lines in the main/subtitle and using greekletters

2012-03-27 Thread HJ YAN
Sorry, my last message was sent before it was complete.
Please see below.

On Tue, Mar 27, 2012 at 5:36 PM, HJ YAN yhj...@googlemail.com wrote:

 Thank you very much Gerrit, for the nice hints!

 I have just done some more googling and research on this and am trying to
 answer it myself...

 Below is the code that works for double lines (adapted from Gerrit's
 hints) and for some of the formats listed below (e.g. 1 and 3, but not 2 and 4):

 (1) \sigma^2
 (2) \tau^{2s}
 (3) \mu_i
 (4) \pi_{2s}

 plot(1:3, ylab = expression("Superscript in greek letters (" * mu^2 ~ "m)")
      , xlab = expression("Subscript in greek letters" ~ mu[2] * "" ~ pi)
      , main = expression(atop("Happy Easter", "to all R-Helpers")))


 For Greek letters, I am still a bit confused about when a
 * is needed, though... It seems a * is needed in front of a Greek letter
 when it follows a quoted string, as with 'expression("...")'. And a * seems
 not to be required when a Greek letter stands outside the double quotes, e.g.
 when applying just 'expression(...)'.  Again, a * is needed when making the
 subscript as shown above...

 It seems ~ is reserved for making spaces before/between Greek letters.
 What if we need a literal ~ in the title, since ~ is the standard notation in
 statistics for 'is distributed as' when writing down a distribution, e.g.
 'X~N(0,1)'...

HJ
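
 A short sketch of how the four formats, a two-line title and a literal
 'distributed as' sign might be written with plotmath. The title text and the
 output file are made up for illustration. In plotmath, '*' juxtaposes two
 pieces without a space, '~' juxtaposes them with a space, braces group an
 exponent, and '%~%' draws the 'is distributed as' symbol.

```r
out_file <- file.path(tempdir(), "plotmath_demo.pdf")  # hypothetical file name
pdf(out_file)

plot(1:3,
     # two-line title: atop() stacks its two arguments
     main = expression(atop("First line of the title",
                            sigma^2 ~ "and" ~ tau^{2*s})),
     # subscripts use [ ]; 2*s puts "2" and "s" side by side
     xlab = expression(mu[i] ~ "and" ~ pi[2*s]),
     # %~% is the plotmath operator for "is distributed as"
     ylab = expression(X %~% N(0, 1)))

dev.off()
```

 For a plain text title with no maths, a newline inside an ordinary string,
 e.g. main = "First line\nSecond line", also gives two lines.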

 On Tue, Mar 27, 2012 at 2:39 PM, Gerrit Eichner 
 gerrit.eich...@math.uni-giessen.de wrote:

 Hi, HJ,

 see

 ?plotmath

  Hth  --  Gerrit

 ---------------------------------------------------------------------
 Dr. Gerrit Eichner   Mathematical Institute, Room 212
 gerrit.eich...@math.uni-giessen.de   Justus-Liebig-University Giessen
 Tel: +49-(0)641-99-32104  Arndtstr. 2, 35392 Giessen, Germany
 Fax: +49-(0)641-99-32109
 http://www.uni-giessen.de/cms/eichner
 ---------------------------------------------------------------------





Re: [R] Error : package is not installed for 'arch=x64'

2012-03-22 Thread Li, Yan
Hi Uwe,

Thanks very much for your reply! 

It is on Windows. I read the manual and installed the latest version of 
RTools. I also have the MinGW64 compiler installed. I still keep getting this 
error. 

Regards,
Yan

-Original Message-
From: Uwe Ligges [mailto:lig...@statistik.tu-dortmund.de] 
Sent: Sunday, March 18, 2012 1:29 PM
To: Li, Yan
Cc: r-help@r-project.org
Subject: Re: [R] Error : package is not installed for 'arch=x64'



On 13.03.2012 18:18, Li, Yan wrote:
> HI All,
>
> I got the error : package  is not installed for 'arch=x64' when building my
> own package for 64bit R. How can I configure the arch ? The 'R CMD config'
> does not work. Thank you very much!

Which OS?

If Linux: run R with the desired architecture and use install.packages().

If Windows: see the R Installation and Administration manual, which explains the
steps to set the environment up correctly. And if you are starting out, use R >=
2.14.2 so that you install the new toolchain and stay compatible with future
releases.

Uwe Ligges

> The detailed error is :
>
> ** testing if installed package can be loaded
> Error : package 'xxx' is not installed for 'arch=x64'
> Error: loading failed
> Execution halted
> ERROR: loading failed
>
> Best,
> Yan



[R] Check results between two data.frame

2012-03-21 Thread HJ YAN
Dear R-user,

I'm trying to compare two sets of results and want to find out which
elements of the two data frames/matrices differ.

I wrote the following function and it works OK, giving me a long list of
"good" as output.


CHECK <- function(x = file1, y = file2)
{
    for (i in 1:nrow(x)) {
        for (j in 1:ncol(x)) {
            if (x[i, j] == y[i, j]) {
                print("good")
            }
            else {
                print("check")
            }
        }
    }
}


However, as the two datasets I am comparing are large (roughly 400*100),
I would like to create a matrix that identifies which elements differ
between the two data frames.

So I added 'CHECK_XY' to my code, but when I run it I get 'Error in
CHECK_XY[i, j] = c("good") : subscript out of bounds'.

Could anyone help please??

CHECK_1 <- function(x = file1, y = file2)
{
    NROW <- nrow(x)
    NCOL <- ncol(x)
    CHECK_XY <- as.matrix(NA, NROW, NCOL)
    for (i in 1:nrow(x)) {
        for (j in 1:ncol(x)) {
            if (x[i, j] == y[i, j]) {
                CHECK_XY[i, j] = c("good")
            }
            else {
                CHECK_XY[i, j] = c("check")
            }
        }
    }
    print(CHECK_XY)
}

Thanks!
HJ
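
A possible way around the error, offered as a sketch rather than a definitive answer: as.matrix(NA, NROW, NCOL) ignores the two dimensions and returns a 1 x 1 matrix, so any index beyond [1, 1] is out of bounds, whereas matrix() honours them. The comparison can also be done without loops. The toy data frames below are made up for illustration.

```r
# Toy data standing in for the two result sets (hypothetical values).
file1 <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
file2 <- data.frame(a = c(1, 9, 3), b = c(4, 5, 7))

# matrix() builds a container with the requested dimensions,
# unlike as.matrix(NA, NROW, NCOL), which yields a 1 x 1 matrix.
check_xy <- matrix(NA_character_, nrow(file1), ncol(file1))

# Element-wise comparison of the whole tables at once.
differs <- as.matrix(file1) != as.matrix(file2)
check_xy[] <- ifelse(differs, "check", "good")

# Row and column positions of the cells that differ.
mismatches <- which(differs, arr.ind = TRUE)
print(check_xy)
print(mismatches)
```

For a 400 x 100 table, the 'mismatches' index matrix is usually easier to read than the full grid of labels.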



[R] 'Unexpected numeric constant'

2012-03-19 Thread HJ YAN
Dear R-help,

I am trying to rename the variables in a data frame, called 'T1A' here.
The renaming seems to succeed, but when I access one of the variables I get
an error message, and I would like to know why.


The data frame contains 365 rows and 49 columns. I would like to name the
first column `DATE` and the others T0.5, T1, T1.5,...,T24 (as this is a set
of data collected every half hour for a whole year).

The original data is saved as a csv file and columns 2-49 are named in the format
'00:30,01:00,01:30,...,23:30,00:00'. When I read them into R using
read.csv, the column names are changed automatically to 'X0.30.00,
X1.00.00,...,X23.30.00,X0.00.00', which don't look great (I mean I would
prefer a format like 'hh:mm', NOT dots between the numbers that
indicate time, but I have not found a solution...). So I decided to use
a simplified version as above, e.g. T0.5, T1, T1.5,...,T24, and my code is:


TIME <- paste(rep("T",48),as.character(seq(0.5,24,by=0.5)))
names(T1A) <- c("DATE",TIME)

> class(T1A$T0.5)  ## without a space between 'T' and '0.5'
[1] "NULL"
> class(T1A$T 0.5)  ## with a space between 'T' and '0.5'
Error: unexpected numeric constant in "class(T1A$T 0.5"


I also tried the code below, but got the same error message...

TIME <- paste(rep("T",48),seq(0.5,24,by=0.5))
names(T1A) <- c("DATE",TIME)


However, if I do not change the column names then everything works
fine, e.g. I can access the variables with no problem.

> class(T1A$X00.30.00)
[1] "numeric"

Any thoughts??


Many thanks!!!
HJ
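
A hedged sketch of what is probably going on: paste() joins its arguments with a space by default, so the columns end up named "T 0.5", "T 1", and so on. Such names are not syntactically valid after $, which triggers the parse error, while T1A$T0.5 simply does not exist and returns NULL. The small data frame below is a made-up stand-in for T1A.

```r
# Stand-in for the 365 x 49 data frame T1A (hypothetical toy values).
T1A <- as.data.frame(matrix(0, nrow = 3, ncol = 49))

# paste0() (or paste(..., sep = "")) joins without a space: "T0.5", "T1", ...
TIME <- paste0("T", seq(0.5, 24, by = 0.5))
names(T1A) <- c("DATE", TIME)

# A syntactic name works directly with $.
class(T1A$T0.5)

# A name containing a space needs backticks or [[ ]].
names(T1A)[2] <- "T 0.5"
class(T1A$`T 0.5`)
class(T1A[["T 0.5"]])
```

On the 'hh:mm' headers: read.csv(..., check.names = FALSE) keeps column names such as 00:30 unchanged, at the cost of needing backticks or [[ ]] to access them.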



[R] Importing multiple worksheets from one Excle/ csv file into R

2012-03-15 Thread HJ YAN
Dear R experts,

I am trying to import some data from some Excel files into R. My Excel file
contains about 50 sheets.

One solution I can think of is to convert my Excel file into a csv file
first and then load the data into R using 'read.csv'.

But it seems to me that 'read.csv' only supports reading one sheet (or 'one
file') at a time, so it seems I would have to create 50 csv files and do
'copy and paste' work 50 times, which is not ideal!

Alternatively, I heard about a package 'xlsReadWrite' and created a 3-sheet
example (i.e. 3 sheets in one Excel file, saved at 'Z:/WORK_2012/Data/' on
my PC and called 'test.xls'). But my code failed to work.

-----
library(xlsReadWrite)
data1 <- read.xls("Z:/WORK_2012/Data/test.xls")

Error in .Call(ReadXls, file, colNames, sheet, type, from, rowNames,  :
  Incorrect number of arguments (11), expecting 10 for 'ReadXls'
-----

Reading the error message, I thought it was trying to tell me
that I need to set some arguments, so I looked up all the arguments at

http://127.0.0.1:12275/library/xlsReadWrite/html/read.xls.html

and put them in the following code...

-----
data1 <- read.xls("Z:/WORK_2012/Data/test.xls",colNames=TRUE,sheet=1,
type="data.frame",from=1,rowNames=TRUE,checkNames=TRUE,dateTime="isodate",
naStrings=NA,stringsAsFactors=TRUE)

Error in .Call(ReadXls, file, colNames, sheet, type, from, rowNames,  :
  Incorrect number of arguments (11), expecting 10 for 'ReadXls'

It would be great if anyone could let me know where the code went wrong, and
offer any suggestions on how to load multiple sheets into R.

If 'read.xls' worked, I would think setting 'sheet=c(1,2,3)' might do the
job, i.e. read sheet1, sheet2 and sheet3, assuming the three sheets have
the same data structure (same number of columns and same column
names). But as there is no argument telling 'read.xls' how to
combine data from multiple sheets, e.g. 'by row' or
'by column', I still cannot see how to read multiple sheets from one Excel
file or one csv file and put them into one R data.frame.

Or has anyone used any of the packages in part 8 of the following link
that can help with the job I describe here?

http://cran.r-project.org/doc/manuals/R-data.html#Spreadsheet_002dlike-data


Many thanks in advance!

HJ
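
A minimal base-R sketch of the "one csv per sheet" route without any copy and paste, assuming each sheet has been exported to its own csv file with identical columns. The temporary directory, file names and values are made up for illustration.

```r
# Write three small csv files to a temporary directory to stand in for
# sheets exported from the workbook (hypothetical data).
sheet_dir <- tempfile("sheets")
dir.create(sheet_dir)
for (k in 1:3) {
  write.csv(data.frame(id = 1:2, value = k * c(10, 20)),
            file.path(sheet_dir, sprintf("sheet%d.csv", k)),
            row.names = FALSE)
}

# Read every csv in the directory into a list, one data frame per sheet.
files  <- list.files(sheet_dir, pattern = "\\.csv$", full.names = TRUE)
sheets <- lapply(files, read.csv)
names(sheets) <- basename(files)

# Stack the sheets by row into a single data frame.
combined <- do.call(rbind, sheets)
print(combined)
```

For reading .xls/.xlsx files directly, the later readxl package offers excel_sheets() to list sheet names and read_excel() to read one sheet, and the same lapply/rbind pattern applies; that package is an assumption here and not something mentioned in the thread.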









I know how to import one single worksheet from one file but would like to
know how to import data from a file containing multiple worksheets.



[R] Converting factor data into Date-time format

2012-03-13 Thread Haojie Yan
Dear R-user,

I have read a dataset from a .csv file into R. This dataset includes one
column containing data in 'date and time' format, e.g. 'dd/mm/
hh:mm'.

These data were automatically read and stored as a 'factor' in R. When I
tried to produce some plots (such as time series) with the above 'date and
time' on the x-axis, it caused an ordering problem, e.g. 1st of March 2012
comes before 10th of Feb. 2012 (if the data runs from 10th Feb. 2012 to 1st
of March 2012). I understand that I might have to convert them from
'factor' to 'date' first, so I tried using 'as.date'. But this method seems
to work only for data in the format 'd/m/y', with no further option that
allows me to add hours and minutes.

I checked online for other methods such as 'as.POSIX' and 'strptime' but
none of them seems to offer me a quick solution.

Please note that the data I received is recorded every 10 minutes, so it
is saved in the form 'dd/mm/ hh:mm', i.e. I only have data
measured to the 'minute', NOT to the 'second'. Is there any direct solution
to this issue?


Many thanks in advance!
HJ
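
A minimal sketch of one way to do the conversion, assuming the timestamps look like 'dd/mm/yyyy hh:mm' (the exact year format is not visible in the post, so the format string is an assumption to adapt). The sample values are made up.

```r
# Hypothetical sample, assuming the format 'dd/mm/yyyy hh:mm'.
stamp <- factor(c("10/02/2012 00:10", "01/03/2012 12:30", "10/02/2012 00:00"))

# Convert the factor to character first, then parse with an explicit format.
# Seconds are simply absent from the format string.
when <- as.POSIXct(as.character(stamp), format = "%d/%m/%Y %H:%M", tz = "UTC")

# The values now sort chronologically rather than alphabetically.
print(sort(when))
```

Plotting against 'when' instead of the factor should then give a correctly ordered time axis.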



[R] Error : package is not installed for 'arch=x64'

2012-03-13 Thread Li, Yan
HI All,

I got the error : package  is not installed for 'arch=x64' when building my own 
package for 64bit R. How can I configure the arch ? The 'R CMD config' does not 
work. Thank you very much!

The detailed error is :

** testing if installed package can be loaded
Error : package 'xxx' is not installed for 'arch=x64'
Error: loading failed
Execution halted
ERROR: loading failed

Best,
Yan



[R] using jit

2012-02-29 Thread yan jiao

Dear R gurus,

I'm trying to use the jit package to parallelize my computation.
Do you put jit(2)/jit(1) in front of every loop? I have 8 nested loops
in my code.


many thanks

yan



[R] discrete simulated annealing

2012-01-30 Thread yan jiao

Dear All,

I need to use simulated annealing for optimization.
Is there a way to limit the search space to discrete values only, and
also to exclude certain solutions, e.g. the solutions where all the
variables are the same?


many thanks

Yan
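
One possible approach, sketched under assumptions: with optim(method = "SANN") the 'gr' argument is used as a function that proposes the next candidate point, so a generator that only moves in integer steps keeps the search on a discrete grid. Unwanted solutions can be discouraged with a large penalty in the objective. The objective, the target vector, the 1..10 range and the penalty value below are all made up for illustration.

```r
set.seed(1)

target <- c(3, 7, 5)   # hypothetical optimum, for illustration only

# Objective: squared distance to the target, with a large penalty for
# solutions in which all variables are equal.
fn <- function(x) {
  if (length(unique(x)) == 1) return(1e6)
  sum((x - target)^2)
}

# Candidate generator: move one randomly chosen coordinate by +1 or -1,
# staying inside the integer range 1..10.
propose <- function(x) {
  i <- sample(length(x), 1)
  x[i] <- min(10, max(1, x[i] + sample(c(-1, 1), 1)))
  x
}

res <- optim(c(1, 2, 1), fn, propose, method = "SANN",
             control = list(maxit = 5000))
print(res$par)
print(res$value)
```

Because every proposal is an integer step from an integer start, the returned parameters stay integer-valued.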



Re: [R] function for grouping

2012-01-26 Thread yan
What if I don't need to store all the combinations and just want the
combination for a given row?
E.g. for 5 elements divided into 3 groups, I would give another
parameter, the row number of the result. In petr's example, if
I set the row number to 1, I would get 1,2,3,1,1.

So I need a systematic way of generating the combinations, but I don't need
all of them in one go.

Many thanks


yan



[R] function restrictedparts

2012-01-25 Thread yan jiao

I am using function restrictedparts, but got error:


> restrictedparts(281,10)
Error in integer(len) : vector size specified is too large
Calls: restrictedparts -> integer
In addition: Warning message:
In restrictedparts(281, 10) : NAs introduced by coercion
Error in integer(len) : vector size specified is too large
Calls: restrictedparts -> integer


Is there a similar function that can deal with long vectors?

I'm using R version 2.14.1 (2011-12-22), x86_64, linux-gnu

many thanks

yan



Re: [R] function for grouping

2012-01-25 Thread yan
thanks petr,
what if I have 200 elements? Do I have to write expand.grid(x1=1, x2=1:2,
x3=1:3, x4=1:3, x5=1:3, ..., x200=1:3)?

Many thanks

yan
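
A sketch of how the argument list could be built programmatically instead of typed out, assuming the pattern in the call above (element i may take values 1..min(i, 3)). The sizes n and k are illustrative; with 200 elements the full grid would be far too large to enumerate, so this only shows the construction.

```r
n <- 5   # number of elements (kept small on purpose)
k <- 3   # maximum number of groups

# One range per element: 1, 1:2, 1:3, 1:3, ...
ranges <- lapply(seq_len(n), function(i) seq_len(min(i, k)))
names(ranges) <- paste0("x", seq_len(n))

# do.call() passes the list entries as the named arguments of expand.grid().
grid <- do.call(expand.grid, ranges)
print(dim(grid))
print(head(grid))
```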



[R] function for grouping

2012-01-24 Thread yan jiao

Dear All,

I'm wondering if there is an R function that could give me all the
combinations of the grouping/clustering result, given the number of groups.

e.g.
3 objects: x1 x2 x3, number of groups is 2
so the result will be
group1:x1,x2; group2: x3
group1: x1;group2: x2,x3
group1: x1,x3;group2: x2

many thanks

yan



[R] how to remove the redundant rows of a matrix

2011-10-31 Thread yan liu
Hi,

I have a matrix like this, with two columns, a and b:

  a   b
 c1  21
 c2  19
 c2  20
 c2  20
 c4  25
 c5  18
 c5  18

How do I prepare a new matrix that removes the rows with repeated values in
column b? There are two 20s and two 18s in column b (the others are unique in
column b). I want a clean matrix with one 20 and one 18.

Thanks a lot!

Yan
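
A minimal sketch using duplicated(), which flags every repeat of a value after its first occurrence. The data frame reproduces the values from the post; storing them as a data frame rather than a matrix is an assumption.

```r
m <- data.frame(a = c("c1", "c2", "c2", "c2", "c4", "c5", "c5"),
                b = c(21, 19, 20, 20, 25, 18, 18))

# Keep only the first row for each distinct value of b.
unique_b <- m[!duplicated(m$b), ]
print(unique_b)
```

For a character matrix the same idea would be m[!duplicated(m[, "b"]), ].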



Re: [R] cut data into sevral group and assign calculated values individually

2011-10-18 Thread Li, Yan
Thanks for your reply.

Let me make an example then:

m <- c(150, 400, 500, 750, 800, NA)

How can I use cut to generate m_group as c(0, 0.4755, 1, 0.2270, 0, 0)?

Breaks        331.04    476.07    608.66    791.5      NA
m_group    0         x         1         x         0    0

Thank you very much!

Regards,
Yan
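
A hedged sketch for the example vector. Instead of cut(), it uses pmin()/pmax(), a different technique from the one discussed in the thread: the two ramps are computed for every value and then clipped to the range 0..1, which reproduces the piecewise rule. Setting NA to 0 follows the rule stated in the post. With these breaks the fourth value works out to about 0.227.

```r
m <- c(150, 400, 500, 750, 800, NA)
b <- c(331.04, 476.07, 608.66, 791.5)

# Rising ramp between the first two breaks, falling ramp between the last two.
up   <- (m - b[1]) / (b[2] - b[1])
down <- (b[4] - m) / (b[4] - b[3])

# Clip to [0, 1]: 0 outside the outer breaks, 1 on the plateau in the middle.
m_group <- pmax(0, pmin(1, up, down))

# NA inputs propagate as NA; the rule in the post assigns them 0.
m_group[is.na(m_group)] <- 0
print(round(m_group, 4))
```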

-----Original Message-----
From: Uwe Ligges [mailto:lig...@statistik.tu-dortmund.de] 
Sent: Tuesday, October 18, 2011 5:23 AM
To: Li, Yan
Cc: r-help@r-project.org
Subject: Re: [R] cut data into sevral group and assign calculated values 
individually



On 17.10.2011 20:53, Li, Yan wrote:
> Hi All,
>
> I have some data from which I set four points to be breaks. Based on these
> points,  I cut the dataset into four groups and assign a number to it:
>
> <= 331.04               assign 0
> > 331.04 and <= 476.07  assign (data - 331.04)/(476.07 - 331.04)
> > 476.07 and <= 608.66  assign 1
> > 608.66 and <= 791.5   assign (791.5 - data)/(791.5 - 608.66)
> > 791.5 and NA          assign 0
>
>
> Breaks        331.04    476.07    608.66    791.5      NA
> m_group    0         x         1         x         0    0
>
>
> I can use cut() to group the data according to the breaks but having
> difficulty to assign the two calculated interval values.

Based on the levels of the factor returned by cut(), you can calculate a
new vector easily. But since you have not provided a reproducible
example, I cannot quickly adapt it to show how it works.

Best,
Uwe Ligges

> Thank you very much!
>
> Regards,
> Yan



[R] cut data into sevral group and assign calculated values individually

2011-10-17 Thread Li, Yan
Hi All,

I have some data for which I set four break points. Based on these
points, I cut the data into groups and assign a number to each:

<= 331.04               assign 0
> 331.04 and <= 476.07  assign (data - 331.04)/(476.07 - 331.04)
> 476.07 and <= 608.66  assign 1
> 608.66 and <= 791.5   assign (791.5 - data)/(791.5 - 608.66)
> 791.5 and NA          assign 0


Breaks        331.04    476.07    608.66    791.5      NA
m_group    0         x         1         x         0    0


I can use cut() to group the data according to the breaks, but I am having
difficulty assigning the two calculated interval values.

Thank you very much!

Regards,
Yan



[R] qcc package

2011-10-14 Thread Li, Yan
Hi All,

I installed the qcc package and its dependencies. The first time, I could
use the function process.capability.sixpack(). But later, when I ran the code
again, I always got the following error: Error: could not find function
"process.capability.sixpack". I tried reinstalling the qcc package but it didn't
help. Has anyone had this kind of experience? Thank you!

Regards,
Yan




[R] iSeries and R

2011-09-01 Thread Li, Yan
Hi All,

Does anyone have experience installing R on iSeries? Does R support iSeries?
Is there any documentation on this topic? Thank you very much!

Regards,
Yan



[R] Central limit theorem

2011-08-14 Thread maggy yan
my data looks like this:

        PM10       Ref   UZ     JZ         WT   RH   FT   WR
1  10.973195  4.338874 nein Winter   Dienstag   ja nein West
2   6.381684  2.250446 nein Sommer    Sonntag nein   ja  Süd
3  62.586512 66.304869   ja Sommer    Sonntag nein nein  Ost
4   5.590101  8.526152   ja Sommer Donnerstag nein nein Nord
5  30.925054 16.073091 nein Winter    Sonntag nein nein  Ost
6  10.750567  2.285075 nein Winter   Mittwoch nein nein  Süd
7  39.118316 17.128691   ja Sommer    Sonntag nein nein  Ost
8   9.327564  7.038572   ja Sommer     Montag nein nein Nord
9  52.271744 15.021977 nein Winter     Montag nein nein  Ost
10 27.388416 22.449102   ja Sommer     Montag nein nein  Ost
11  6.460829  4.486329   ja Winter    Samstag nein nein  Süd
12  5.937690 10.247768   ja Sommer    Sonntag nein nein Nord
13 14.004685  5.155790 nein Winter    Sonntag nein nein Nord
14 12.244333  7.063825   ja Sommer   Mittwoch nein   ja Nord
15 35.195541 12.148438 nein Winter     Montag nein nein  Ost
...
up to row 200

Now I am supposed to illustrate the central limit theorem with my data. I need
to compute, 80 times, the arithmetic mean of 100 Poisson-distributed
random numbers with expected value 7.
The hint says I first need a matrix that contains all 8000 values,
but I have no idea where the 8000 values come from or how to build the matrix.
Please help me.
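
A hedged reading of the hint, sketched in code: the 8000 values are not taken from the data set but simulated with rpois(), then arranged as a 100 x 80 matrix so that each column holds one sample of 100 draws. The seed and the plot settings are arbitrary choices for illustration.

```r
set.seed(42)

# 8000 Poisson(7) draws arranged as 100 rows x 80 columns.
draws <- matrix(rpois(80 * 100, lambda = 7), nrow = 100, ncol = 80)

# One arithmetic mean per column: 80 means, each over 100 values.
means <- colMeans(draws)

# The means should look roughly normal around 7 with sd sqrt(7/100).
hist(means, freq = FALSE, main = "Means of 100 Poisson(7) draws")
curve(dnorm(x, mean = 7, sd = sqrt(7 / 100)), add = TRUE)
```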



[R] linear regression

2011-08-13 Thread maggy yan
dear R users,
my data looks like this

        PM10       Ref   UZ     JZ         WT   RH   FT   WR
1  10.973195  4.338874 nein Winter   Dienstag   ja nein West
2   6.381684  2.250446 nein Sommer    Sonntag nein   ja  Süd
3  62.586512 66.304869   ja Sommer    Sonntag nein nein  Ost
4   5.590101  8.526152   ja Sommer Donnerstag nein nein Nord
5  30.925054 16.073091 nein Winter    Sonntag nein nein  Ost
6  10.750567  2.285075 nein Winter   Mittwoch nein nein  Süd
7  39.118316 17.128691   ja Sommer    Sonntag nein nein  Ost
8   9.327564  7.038572   ja Sommer     Montag nein nein Nord
9  52.271744 15.021977 nein Winter     Montag nein nein  Ost
10 27.388416 22.449102   ja Sommer     Montag nein nein  Ost
...
up to row 200


I'm trying to fit a linear regression of PM10 on Ref for each of the
four values of WR. I've tried this:
plot(Nord$PM10 ~ Nord$Ref, main="Nord", xlab="Ref", ylab="PM10")
but it does not work, because Nord cannot be found.
What is wrong? How can I do it? Please help me.
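
A sketch of one way to proceed, under the assumption that the data sit in a single data frame (called 'dat' here, a made-up name): 'Nord' is a value of the column WR, not an object, so the data frame has to be subset or split by WR first. The stand-in data use a few of the Ost and Nord rows quoted above.

```r
# Hypothetical stand-in for the 200-row data set.
dat <- data.frame(
  PM10 = c(62.586512, 5.590101, 30.925054, 39.118316,
           9.327564, 52.271744, 27.388416),
  Ref  = c(66.304869, 8.526152, 16.073091, 17.128691,
           7.038572, 15.021977, 22.449102),
  WR   = c("Ost", "Nord", "Ost", "Ost", "Nord", "Ost", "Ost")
)

# One data frame per wind direction, then one regression per piece.
by_wr <- split(dat, dat$WR)
fits  <- lapply(by_wr, function(d) lm(PM10 ~ Ref, data = d))

# Scatter plot and fitted line for one direction.
plot(PM10 ~ Ref, data = by_wr$Nord, main = "Nord", xlab = "Ref", ylab = "PM10")
abline(fits$Nord)
print(coef(fits$Ost))
```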



[R] submit R package

2011-05-17 Thread Yan Jiao
Dear all,

I'm just wondering how to submit a package to CRAN.
I followed the instructions, using anonymous as the username and my email address as
the password, but it didn't connect.

Any hints?

Yan



[R] question of VECM restricted regression

2011-04-29 Thread Meilan Yan
Dear Colleague

I am trying to figure out how to use R to run an OLS restricted VECM regression.
However, there is some notation I cannot understand.

Please tell me what 'ect', 'sd' and 'LRM.dl1' mean in the following example:

# OLS restricted VECM regression
library(urca)
data(denmark)
sjd <- denmark[, c("LRM", "LRY", "IBO", "IDE")]
sjd.vecm <- ca.jo(sjd, ecdet = "const", type = "eigen", K = 2, spec = "longrun",
                  season = 4)
sjd.vecm.rls <- cajorls(sjd.vecm, r = 1)
summary(sjd.vecm.rls$rlm)
sjd.vecm.rls$beta

Response LRM.d :
Call:
lm(formula = substitute(LRM.d), data = data.mat)

Residuals:
      Min        1Q    Median        3Q       Max
-0.027598 -0.012836 -0.003395  0.015523  0.056034

Coefficients:
         Estimate Std. Error t value Pr(>|t|)
ect1    -0.212955   0.064354  -3.309  0.00185 **
sd1     -0.057653   0.010269  -5.614 1.16e-06 ***
sd2     -0.016305   0.009177  -1.777  0.08238 .
sd3     -0.040859   0.008767  -4.660 2.82e-05 ***
LRM.dl1  0.049816   0.191992   0.259  0.79646
LRY.dl1  0.075717   0.157902   0.480  0.63389
IBO.dl1 -1.148954   0.372745  -3.082  0.00350 **
IDE.dl1  0.227094   0.546271   0.416  0.67959

> sjd.vecm.rls$beta
            ect1
LRM.l2  1.000000
LRY.l2 -1.032949
IBO.l2  5.206919
IDE.l2 -4.215879

Many thanks
Meilan







[R] How to remove the double or single quote from a string (unquote?)?

2011-04-18 Thread JingJiang Yan

Is there a function to get a string without the pair of quotes around it?

I have several expressions like:
glm(V12 ~ V3, family=binomial, data=df1)
glm(V12 ~ V4, family=binomial, data=df1)
...
glm(V12 ~ V8, family=binomial, data=df1)

As you can see, the only difference among them is V3 ... V8.
Because several of these expressions are sometimes run many times,
I want to use a variable i to vary V3 ... V8. I did this with:

> i <- 3:8
> glm(V12 ~ paste("V", i, sep=""), family=binomial, data=df1)

However, it seems that paste always returns the variable name wrapped in a pair
of quotes, which is wrong in this situation.
I could only find the function sQuote, which adds quotes to a string, and it looks
like I am after the opposite of it.

Any advice will be appreciated.
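
A hedged sketch of a common way around this: rather than stripping quotes, build the whole formula from strings and fit one model per predictor. reformulate() (or as.formula(paste(...))) turns the character name into a proper formula term. The data frame below is simulated purely for illustration.

```r
set.seed(1)

# Hypothetical data: binary response V12 and six numeric predictors V3..V8.
df1 <- data.frame(V12 = rbinom(50, 1, 0.5), matrix(rnorm(50 * 6), 50, 6))
names(df1)[-1] <- paste0("V", 3:8)

# One logistic regression per predictor, with the formula built from its name.
fits <- lapply(3:8, function(i) {
  f <- reformulate(paste0("V", i), response = "V12")   # e.g. V12 ~ V3
  glm(f, family = binomial, data = df1)
})
names(fits) <- paste0("V", 3:8)

print(formula(fits$V5))
```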



[R] function order

2011-04-06 Thread Yan Jiao
Dear All

I'm trying to sort a matrix using the function order, and
something really odd happens:

e.g.
abc <- cbind(c(1,6,2),c(2,5,3),c(3,2,1))  ## matrix I want to sort

If I do
abc[ order(abc[,3]), increasing = TRUE]

the result is correct:
     [,1] [,2] [,3]
[1,]    2    3    1
[2,]    6    5    2
[3,]    1    2    3

But if I want to sort in decreasing order:
abc[ order(abc[,3]), decreasing = TRUE]

the result is wrong:
     [,1] [,2] [,3]
[1,]    2    3    1
[2,]    6    5    2
[3,]    1    2    3

Also, if I use
abc[ order(abc[,3]), increasing = FALSE]
it returns nothing:
[1,]
[2,]
[3,]

Why is that?


Many thanks

Yan
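
A short sketch of the likely explanation: 'decreasing' is an argument of order(), not of the subsetting brackets. Inside [ ] a named argument is matched by position, so TRUE acted as the column index (all columns) and FALSE selected no columns. Moving the argument inside order() gives the intended result.

```r
abc <- cbind(c(1, 6, 2), c(2, 5, 3), c(3, 2, 1))

# Ascending by the third column (the default of order()).
asc <- abc[order(abc[, 3]), ]

# Descending: 'decreasing' goes inside order(); the empty slot after the
# comma means "all columns".
desc <- abc[order(abc[, 3], decreasing = TRUE), ]

print(asc)
print(desc)
```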



[R] add zero in front of numbers

2011-04-04 Thread Yan Jiao
Dear R users,

I need to add 0s in front of a series of numbers, e.g. 1 -> 001, 19 -> 019.
Is there a fast way of doing that?

Many thanks

yan
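
Two base-R options, sketched with a made-up input vector; the width of 3 is taken from the examples in the post.

```r
x <- c(1, 19, 123)

# sprintf() with a zero-padded integer format of width 3.
padded <- sprintf("%03d", as.integer(x))

# formatC() gives the same result via width and flag.
padded2 <- formatC(x, width = 3, flag = "0", format = "d")

print(padded)
print(padded2)
```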



[R] regression line on boxplots

2011-04-01 Thread Yan Jiao
Dear R users,

I'm trying to add a regression line to my boxplots (something
like: boxplot(c(1:3),c(4:6),c(5:8))),
but I can't see it.
Please help!!!
It's not an April fool's joke!!!

Yan
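
A hedged sketch of one likely cause and fix: boxplot() draws the boxes at x = 1, 2, 3, ..., so a line fitted against some other x scale can fall outside the plotting region. Regressing the values on the group index puts the line on the same axis as the boxes. The variable names are made up for illustration.

```r
vals <- list(1:3, 4:6, 5:8)
boxplot(vals)

# Stack the values and record which box (1, 2, 3) each one belongs to.
y <- unlist(vals)
g <- rep(seq_along(vals), sapply(vals, length))

# Fit on the group index and overlay the line on the boxplot.
fit <- lm(y ~ g)
abline(fit, col = "red")
print(coef(fit))
```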


