Re: [R] working on a data frame
Hi

I like to use logical values directly in computations if possible:

yourData[,10] <- yourData[,9]/(yourData[,8]+(yourData[,8]==0))

Logical values are automagically treated as FALSE=0 and TRUE=1 and can be used in computations. If you really want to change 0 to 1 in column 8 you can use

yourData[,8] <- yourData[,8]+(yourData[,8]==0)

without ifelse stuff.

Regards
Petr

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of William Dunlap Sent: Friday, July 25, 2014 8:07 PM To: Matthew Cc: r-help@r-project.org Subject: Re: [R] working on a data frame

> if yourData[,8]==0, then yourData[,8]==1, yourData[,10] <- yourData[,9]/yourData[,8]

You could express this in R as

is8Zero <- yourData[,8] == 0
yourData[is8Zero, 8] <- 1
yourData[is8Zero, 10] <- yourData[is8Zero, 9] / yourData[is8Zero, 8]

Note how logical (Boolean) values are used as subscripts - read the '[' as 'such that' when using logical subscripts. There are many more ways to express the same thing. (I am tempted to change the algorithm to avoid the divide-by-zero problem by making the quotient (numerator + epsilon)/(denominator + epsilon), where epsilon is a very small number. I am assuming that the raw numbers are counts or at least cannot be negative.)

Bill Dunlap TIBCO Software wdunlap tibco.com

On Fri, Jul 25, 2014 at 10:44 AM, Matthew mccorm...@molbio.mgh.harvard.edu wrote: Thank you for your comments, Peter. A couple of questions. Can I do something like the following?

if yourData[,8]==0, then yourData[,8]==1, yourData[,10] <- yourData[,9]/yourData[,8]

I think I am just going to have to learn more about R. I thought getting into R would be like going from Perl to Python or Java etc., but it seems like R programming works differently. Matthew

On 7/25/2014 12:06 AM, Peter Alspach wrote: Tena koe Matthew Column 10 contains the result of the value in column 9 divided by the value in column 8.
If the value in column 8==0, then the division cannot be done, so I want to change the zero to a one in order to do the division. That being the case, think in terms of vectors, as Sarah says. Try:

yourData[,10] <- yourData[,9]/yourData[,8]
yourData[yourData[,8]==0, 10] <- yourData[yourData[,8]==0, 9]

This doesn't change the 0 to 1 in column 8, but it doesn't appear you actually need to do that.

HTH
Peter Alspach

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Matthew McCormack Sent: Friday, 25 July 2014 3:16 p.m. To: Sarah Goslee Cc: r-help@r-project.org Subject: Re: [R] working on a data frame

On 7/24/2014 8:52 PM, Sarah Goslee wrote: Hi, Your description isn't clear:

On Thursday, July 24, 2014, Matthew mccorm...@molbio.mgh.harvard.edu wrote: I am coming from the perspective of Excel and VBA scripts, but I would like to do the following in R. I have a data frame with 14 columns and 32,795 rows. I want to check the value in column 8 (row 1) to see if it is a 0. If it is not a zero, proceed to the next row and check the value for column 8. If it is a zero, then a) change the zero to a 1, b) divide the value in column 9 (row 1) by 1,

> Row 1, or the row in which column 8 == 0?

All rows in which the value in column 8==0.

> Why do you want to divide by 1?

Column 10 contains the result of the value in column 9 divided by the value in column 8. If the value in column 8==0, then the division cannot be done, so I want to change the zero to a one in order to do the division. This is a fairly standard thing to do with this data. (The data are measurements of amounts at two time points. Sometimes a thing will not be present in the beginning (0), but very present at the later time. Column 10 is the log2 of the change. Infinite is not an easy number to work with, so it is common to change the 0 to a 1. On the other hand, something may be present at time 1, but not at the later time.
In this case column 10 would be taking the log2 of a number divided by 0, so again the zero is commonly changed to a one in order to get a usable value in column 10. In both the preceding cases there was a real change, but Inf and NaN are not helpful.)

c) place the result in column 10 (row 1) and

> Ditto on the row 1 question.

I want to work on all rows where column 8 (and column 9) contain a zero. Column 10 contains the result of the value in column 9 divided by the value in column 8. So, for row 1, column 10 contains the ratio of column 9 row 1 divided by column 8 row 1, and so on through the whole 32,000 or so rows. Most rows do not have a zero in columns 8 or 9. Some rows have zero in column 8 only, and some
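The zero-to-one convention discussed in this thread can be sketched in a few lines. The vectors below are made-up counts (t1 = earlier time point, t2 = later time point), not the poster's data:

```r
# Made-up counts for two time points (t1 earlier, t2 later).
t1 <- c(0, 4, 8, 16)
t2 <- c(8, 8, 0, 16)

# x + (x == 0) bumps only the zeros up to 1, because logicals are
# coerced to 0/1 in arithmetic -- the trick from Petr's reply.
log2fc <- log2((t2 + (t2 == 0)) / (t1 + (t1 == 0)))
log2fc  # 3 1 -3 0: finite fold changes even where a count was 0
```

Applying the replacement to both columns keeps the quotient finite in both directions (0 at the start or 0 at the end).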
[R] lattice, latticeExtra: Adding moving averages to double y plot
Hi lattice users, I would like to add 5-year moving averages to my double-y plot. I have three factors that need to be plotted with moving averages in the same plot. One of these reads off y-axis 1 and two off y-axis 2. I have tried to use the rollmean function from the zoo package, but I fail in inserting this into lattice (I am not an experienced lattice user). I want to keep the data points in the plot. Find below dummy data and the script, as well as annotations further describing my question. Thank you in advance! Anna Zakrisson

mydata <- data.frame(
  Year = 1980:2009,
  Type = factor(rep(c("stuff1", "stuff2", "stuff3"), each = 10*3)),
  Value = rnorm(90, mean = seq(90), sd = rep(c(6, 7, 3), each = 10)))

library(lattice)
library(latticeExtra)

stuff1data <- mydata[mydata$Type %in% c("stuff1"), ]
stuff12_3data <- mydata[mydata$Type %in% c("stuff2", "stuff3"), ]

# make moving averages function using zoo and rollmean:
library(zoo)
library(plyr)
f <- function(d) {
  require(zoo)
  data.frame(Year = d$Year[5:length(d$Year)], mavg = rollmean(d$Value, 5))
}

# Apply the function to each group as well as both data frames:
madfStuff1 <- ddply(stuff1data, "Type", f)
madfStuff2_3 <- ddply(stuff12_3data, "Type", f)

# Some styles:
myStripStyle <- function(which.panel, factor.levels, ...) {
  panel.rect(0, 0, 1, 1, col = bgColors[which.panel], border = 1)
  panel.text(x = 0.5, y = 0.5, font = 2,
             lab = factor.levels[which.panel],
             col = txtColors[which.panel])
}

myplot1 <- xyplot(Value ~ Year, data = stuff1data,
                  col = "black", lty = 1, pch = 1,
                  ylab = "sweets", strip.left = FALSE, strip = myStripStyle,
                  xlab = "Year",
                  panel = function(x, y, ..., subscripts) {
                    panel.xyplot(x, y, pch = 1, col = "black")
                    panel.lmline(x, y, col = "black", data = madfStuff1)
                    # here I presume that panel.lmline is wrong.
                    # I would like to have my 5 year moving average here, not a straight line.
})
myplot1

myplot2 <- xyplot(Value ~ Year, data = stuff12_3data,
                  col = "black", lty = 1, pch = 1,
                  ylab = "hours", strip.left = FALSE, strip = myStripStyle,
                  xlab = "Year",
                  panel = function(x, y, ..., subscripts) {
                    panel.xyplot(x, y, pch = c(2:3), col = "black")
                    ## what is this pch defining? Types?
                    # I would like to have different symbols and line types for stuff2 and stuff3
                    panel.lmline(x, y, col = "black", data = madfStuff2_3)
                    # wrong! Need my moving averages here!
                  })
myplot2

doubleYScale(myplot1, myplot2, style1 = 0, style2 = 0, add.ylab2 = TRUE,
             text = c("stuff1", "stuff2", "stuff3"), columns = 2, col = "black")
# problem here is that I end up with two lines. I need a double-y plot with one moving-average plot that reads off y-axis 1
# and two that read off y-axis 2. I need to keep the data points in the plot.

update(trellis.last.object(),
       par.settings = simpleTheme(col = c("black", "black"),
                                  lty = c(1:3), pch = c(1:3)))
# how come I only get lines in my legend text and not the symbols too? I thought pch would add symbols?!?

Anna Zakrisson Braeunlich
PhD student
Department of Ecology, Environment and Plant Sciences
Stockholm University
Svante Arrheniusv. 21A
SE-106 91 Stockholm
Sweden/Sverige

Lives in Berlin. For paper mail:
Katzbachstr. 21
D-10965 Berlin
Germany/Deutschland

E-mail: anna.zakris...@su.se
Tel work: +49-(0)3091541281
Mobile: +49-(0)15777374888
LinkedIn: http://se.linkedin.com/pub/anna-zakrisson-braeunlich/33/5a2/51b

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
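For the "moving average instead of panel.lmline" part of the question, one approach is to compute the smoother inside panel.groups and draw it with panel.lines. The sketch below is a stand-alone example, not the poster's full double-y setup: it uses stats::filter as a base-R stand-in for zoo::rollmean (so only lattice is needed) and invented data of the same shape as mydata.

```r
library(lattice)

# Centered 5-point moving average; the two values at each end are NA.
mavg5 <- function(v) as.numeric(stats::filter(v, rep(1/5, 5), sides = 2))

set.seed(1)
mydata <- data.frame(
  Year  = rep(1980:2009, 3),
  Type  = factor(rep(c("stuff1", "stuff2", "stuff3"), each = 30)),
  Value = rnorm(90, mean = seq_len(90), sd = rep(c(6, 7, 3), each = 30)))

# One panel: raw points plus one smoothed line per Type.
# panel.superpose replicates pch/lty per group, so each Type gets its own
# symbol and line type; print(p) draws the plot.
p <- xyplot(Value ~ Year, data = mydata, groups = Type,
            pch = 1:3, lty = 1:3, col = "black",
            panel = panel.superpose,
            panel.groups = function(x, y, ...) {
              panel.xyplot(x, y, ...)
              panel.lines(x, mavg5(y), ...)
            })
```

The same panel.groups idea drops into myplot1 and myplot2 before they are combined with doubleYScale; pch/lty given to xyplot (rather than only in update()) is also what makes the symbols appear per group.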
[R] Differencing between 2 previous values
Hello All, I am trying to do a simple thing of calculating the absolute difference between 2 previous values. Since my original data consists of 30 rows, this column where I am storing my absolute difference values (called 'differ') only consists of 29 rows! And I am having trouble cbind-ing the 2 columns. Is there any way I can make the first row of the 'differ' column NA? So my data looks like the following:

dput(data)
structure(list(week = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
29, 30), value = c(9.45, 7.99, 9.29, 11.66, 12.16, 10.18, 8.04,
11.46, 9.2, 10.34, 9.03, 11.47, 10.51, 9.4, 10.08, 9.37, 10.62,
10.31, 10, 13, 10.9, 9.33, 12.29, 11.5, 10.6, 11.08, 10.38, 11.62,
11.31, 10.52)), .Names = c("week", "value"), row.names = c(NA, -30L),
class = "data.frame")

This is how I calculate my 'differ' column:

differ <- abs(diff(data$value))

Which gives me the following results:

[1] 1.46 1.30 2.37 0.50 1.98 2.14 3.42 2.26 1.14 1.31 2.44 0.96
[13] 1.11 0.68 0.71 1.25 0.31 0.31 3.00 2.10 1.57 2.96 0.79 0.90
[25] 0.48 0.70 1.24 0.31 0.79

As you can see this only contains 29 rows, so when I try to cbind it to my current data, I have an error:

cbind(differ, data)
Error in data.frame(..., check.names = FALSE) :
  arguments imply differing number of rows: 29, 30

What I ideally want is my new dataset to look like:

Week  Value  Differ
1     9.45   NA
2     7.99   1.46
3     9.29   1.30

And so on...

***
MORE TH>N is a trading style of Royal & Sun Alliance Insurance plc (No. 93792). Registered in England and Wales at St. Mark's Court, Chart Way, Horsham, West Sussex, RH12 1XL. Authorised by the Prudential Regulation Authority and regulated by the Financial Conduct Authority and the Prudential Regulation Authority.
Re: [R] Differencing between 2 previous values
Hi, see in line.

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Pavneet Arora Sent: Monday, July 28, 2014 11:08 AM To: r-help@r-project.org Subject: [R] Differencing between 2 previous values

> This is how I calculate my 'differ' column:
> differ <- abs(diff(data$value))

differ <- c(NA, abs(diff(data$value)))

Regards
Petr

This e-mail and any documents attached to it may be confidential and are intended only for its intended recipients. If you received this e-mail by mistake, please immediately inform its sender. Delete the contents of this e-mail with all attachments and its copies from your system. If you are not the intended recipient of this e-mail, you are not authorized to use, disseminate, copy or disclose this e-mail in any manner. The sender of this e-mail shall not be liable for any possible damage caused by modifications of the e-mail or by delay with transfer of the email. In case that this e-mail forms part of business dealings: - the sender reserves the right to end negotiations about entering into a contract in any time, for any reason, and without stating any reasoning. - if the e-mail contains an offer, the recipient is entitled to immediately accept such offer; The sender of
Re: [R] Differencing between 2 previous values
On 28 Jul 2014, at 11:08, Pavneet Arora pavneet.ar...@uk.rsagroup.com wrote:

> Is there any way I can make the first row of the 'differ' column NA? What I ideally want is my new dataset to look like:
>
> Week  Value  Differ
> 1     9.45   NA
> 2     7.99   1.46
> 3     9.29   1.30

The straightforward way is

data$Differ <- c(NA, abs(diff(data$value)))

-- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com
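As a stand-alone check of the one-liner, here it is applied to the first few weeks of the posted data (values re-typed here, not the full 30 rows):

```r
dat <- data.frame(week = 1:5,
                  value = c(9.45, 7.99, 9.29, 11.66, 12.16))

# Pad with a leading NA so the difference column lines up row-for-row
# with the original data and no cbind length mismatch occurs.
dat$Differ <- c(NA, abs(diff(dat$value)))
dat$Differ  # NA 1.46 1.30 2.37 0.50
```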
Re: [R] Differencing between 2 previous values
Thank you for that simple answer. Really appreciate it.

From: PIKAL Petr petr.pi...@precheza.cz To: Pavneet Arora/UK/RoyalSun@RoyalSun, r-help@r-project.org Date: 28/07/2014 10:26 Subject: RE: [R] Differencing between 2 previous values
Re: [R] Determine all specific same dates between two given dates
Many thanks for your help, Uwe!
Re: [R] R and external C library cannot open shared object file while LD_LIBRARY_PATH is set
Thanks, but that doesn't work: R cannot load a simple external library even if the full path to the directory is specified in LD_LIBRARY_PATH. I posted a minimal example on gist.github: https://gist.github.com/lindenb/7cd766cbb37de01f6cce The simple C file is compiled, but I'm not able to load the library. Pierre

(...) (cross-posted on SO: http://stackoverflow.com/questions/24955829/ )

> I'm building a C extension for R; this library also uses the HDF5 library. I compiled a dynamic library (gcc flags: -fPIC -shared -Wl,-soname,libmy.so); the library seems to be loaded but R is still missing the symbols from the hdf5 library.

I am not any kind of expert, so take this as just a vague possibility from someone who just wants to try to help. But I am concerned about the options in the above. In particular, you have -Wl,-soname,libmy.so, which looks slightly wrong to me, based on the man page for ld. I think that, perhaps, this should be -Wl,-soname=libmy.so. Notice the = instead of the ,
Re: [R] Function assignment
Thank you very much! The idea was to use it with an external reference, and kind of to write a constructor for an object which points at this external reference and behaves like an R object. So it would be the same as if I just do

name <- function("name", someValuesForObjectConstruction)

but I don't have to provide the name twice. Here function returns an object which stores the string "name" and knows get, set, ... and since with

setReplaceMethod(f = "[", signature = "myExternList",
                 definition = function(x, i = character, j = missing, y) {
                   setExternReference(i, y)
                   return(x)
                 })

I can get nicely the name inside the bracket (x["name"] <- ...), it seemed possible to get the name somehow out of a name <- assignment. Again thank you very much for hints and advice. Florian Ryan florian.r...@aim.com

-Original Message- From: peter dalgaard pda...@gmail.com To: Jeff Newmiller jdnew...@dcn.davis.ca.us Cc: Florian Ryan florian.r...@aim.com; r-help r-help@r-project.org Sent: Sat, Jul 26, 2014 10:04 pm Subject: Re: [R] Function assignment

On 26 Jul 2014, at 17:01, Jeff Newmiller jdnew...@dcn.davis.ca.us wrote:

> What an awful idea... that would lead to incredibly hard-to-debug programs. No, you cannot do that. What kind of problem has led you to want such a capability? Perhaps we can suggest a simpler way to think about your problem.

I agree that this is a silly idea, but I actually thought that it could be done by clever manipulation of the call stack. It can if you do the assignment with assign():

foo <- function() sys.calls()[[1]][[2]]
assign("z", foo())
z
[1] "z"
assign("bah", foo())
bah
[1] "bah"

but if you do x <- foo(), there is no mention of x or "x" in sys.calls(). Anyway, functions that assume being called in a specific way are asking for trouble in all cases where they get called differently. -pd

--- Jeff Newmiller DCN: jdnew...@dcn.davis.ca.us --- Sent from my phone. Please excuse my brevity.

On July 26, 2014 5:29:59 AM PDT, Florian Ryan florian.r...@aim.com wrote:

> Hello, I would like to use, inside a function, the variable name to which I assign the return value of that function. Is that possible? e.g.
>
> foo <- function(){ some not-to-me-known R magic }
> myVariableName <- foo()
> myVariableName
> [1] "myVariableName"
>
> Hope someone can help me. Thanks Florian

-- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com
Re: [R] Is there a package for EFA with multiple groups?
Hi Elizabeth, In confirmatory factor analysis with multiple groups, the reason one needs to estimate the models simultaneously is that, typically, one is interested in applying constraints (e.g., forcing all or some of the factor loadings to be equal across groups). In exploratory factor analysis, constraints are uncommon (they are somewhat un-exploratory). I would suggest simply using the psych package and subsetting your data to the particular group, as in:

efa( data = subset(data, Group == "Group1") )
efa( data = subset(data, Group == "Group2") )

etc. As you noted, lavaan will allow you to test multiple-group CFAs, so if/when you are ready to see whether the same configural factor structure or any other level of invariance holds across your groups, you can use it. Sincerely, Josh

On Mon, Jul 28, 2014 at 2:46 PM, Elizabeth Barrett-Cheetham ebarrettcheet...@gmail.com wrote:

> Hello R users, I'm hoping to run an exploratory and confirmatory factor analysis on a psychology survey instrument. The data has been collected from multiple groups, and it's likely that the data is hierarchical/has 2nd-order factors. It appears that the lavaan package allows me to run a multiple-group hierarchical confirmatory factor analysis. Yet, I can't locate a package that can run the equivalent exploratory analysis. Could anyone please direct me to an appropriate package? Many thanks, Elizabeth

-- Joshua F. Wiley Ph.D. Student, UCLA Department of Psychology http://joshuawiley.com/ Senior Analyst, Elkhart Group Ltd.
http://elkhartgroup.com Office: 260.673.5518
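To make the subset-per-group idea concrete: the efa() call above is shorthand, so a real function has to be substituted (psych::fa() is one common choice). The self-contained sketch below uses base R's factanal() instead, so no extra package is needed; the data, group labels, and column names are all invented for illustration.

```r
set.seed(42)
n <- 200
latent <- rnorm(n)                                   # one common factor
items  <- sapply(1:6, function(i) 0.7 * latent + rnorm(n, sd = 0.5))
colnames(items) <- paste0("item", 1:6)
dat <- data.frame(items,
                  Group = rep(c("Group1", "Group2"), each = n / 2))

# One exploratory factor model per group, no cross-group constraints.
itemCols <- paste0("item", 1:6)
fit1 <- factanal(subset(dat, Group == "Group1")[, itemCols], factors = 1)
fit2 <- factanal(subset(dat, Group == "Group2")[, itemCols], factors = 1)
```

The two sets of loadings (loadings(fit1), loadings(fit2)) can then be compared informally; formal invariance testing is where a multiple-group CFA in lavaan takes over.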
[R] using foumula to calculate a column in dataframe
Hello All, I need to calculate a column (Vupper) using a formula, but I am not sure how. It will be easier to explain with an example. Again this is my dataset:

dput(nd)
structure(list(week = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
29, 30), value = c(9.45, 7.99, 9.29, 11.66, 12.16, 10.18, 8.04,
11.46, 9.2, 10.34, 9.03, 11.47, 10.51, 9.4, 10.08, 9.37, 10.62,
10.31, 10, 13, 10.9, 9.33, 12.29, 11.5, 10.6, 11.08, 10.38, 11.62,
11.31, 10.52), cusum = c(-0.551, -2.56, -3.27, -1.61, 0.549, 0.729,
-1.23, 0.229, -0.572, -0.232, -1.2, 0.268, 0.778, 0.178, 0.258,
-0.373, 0.246, 0.557, 0.557, 3.56, 4.46, 3.79, 6.08, 7.58, 8.18,
9.26, 9.64, 11.26, 12.57, 13.09)), .Names = c("week", "value",
"cusum"), row.names = c(NA, -30L), class = "data.frame")

I have some constants in my data: sigma = 1, h = 5, k = 0.5.

The formula requires me to start from the bottom row (the 30th in this case). The formula for the last row is: row 30's cusum value (13.09) + h (5) * sigma (1), giving the value 18.1. Then the formula for the 29th row's Vupper uses the 30th row's Vupper (18.1) + k (0.5) * sigma (1), giving the value 18.6. Similarly, the formula for the 28th row's Vupper uses the 29th row's Vupper (18.6) + k (0.5) * sigma (1), giving the value 19.1. And so on...

Also, is there any way to make the formula generalised using a loop or functions? Because I really don't want to have to re-write the program if my number of rows increases or decreases, or if I use another dataset. So far my function looks like the following (without the Vupper formula in there):

vmask2 <- function(data, target, sigma, h, k) {
  data$deviation <- data$value - target
  data$cusums <- cumsum(data$deviation)
  data$ma <- c(NA, abs(diff(data$value)))
  data$Vupper <- # not sure what to put here
  data
}
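Under the recursion described in the post (seed the bottom row with cusum + h*sigma, then add k*sigma for each step upward), the missing piece can be sketched as below. The function body and the tiny example data are my reading of the post, not a reply from the thread; the loop is length-agnostic, so it works for any number of rows or any other dataset with a value column.

```r
vmask2 <- function(data, target, sigma = 1, h = 5, k = 0.5) {
  n <- nrow(data)
  data$deviation <- data$value - target
  data$cusum <- cumsum(data$deviation)

  # Seed Vupper at the bottom row, then walk upward adding k * sigma.
  Vupper <- numeric(n)
  Vupper[n] <- data$cusum[n] + h * sigma
  for (i in rev(seq_len(n - 1))) Vupper[i] <- Vupper[i + 1] + k * sigma
  data$Vupper <- Vupper
  data
}

# Tiny check with target = 10: deviations 0, 1, 2 give cusum 0, 1, 3,
# so Vupper should come out 9, 8.5, 8 (bottom row = 3 + 5).
vmask2(data.frame(value = c(10, 11, 12)), target = 10)
```

On the posted 30-row data with target such that the cusum column is reproduced, the bottom three Vupper values come out as the 18.1 / 18.6 / 19.1 sequence described above (up to rounding).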
[R] Calculate depth from regular xyz grid for any coordinate within the grid
Dear R-experts, I have a regular grid dataframe (here: the first 50 rows):

# data frame (regular grid) with x, y (UTM coordinates) and z (depth)
# x = UTM coordinates (easting, zone 32)
# y = UTM coordinates (northing, zone 32)
# z = river depth (meters)
df <- data.frame(
  x = c(3454240, 3454240, 3454240, 3454240, 3454240, 3454250, 3454250,
        3454250, 3454250, 3454250, 3454250, 3454250, 3454250, 3454250,
        3454250, 3454250, 3454250, 3454250, 3454260, 3454260, 3454260,
        3454260, 3454260, 3454260, 3454260, 3454260, 3454260, 3454260,
        3454260, 3454260, 3454260, 3454260, 3454260, 3454260, 3454260,
        3454260, 3454260, 3454260, 3454260, 3454260, 3454270, 3454270,
        3454270, 3454270, 3454270, 3454270, 3454270, 3454270, 3454270,
        3454270),
  y = c(5970610, 5970620, 5970630, 5970640, 5970650, 5970610, 5970620,
        5970630, 5970640, 5970650, 5970660, 5970670, 5970680, 5970690,
        5970700, 5970710, 5970720, 5970730, 5970610, 5970620, 5970630,
        5970640, 5970650, 5970660, 5970670, 5970680, 5970690, 5970700,
        5970710, 5970720, 5970730, 5970740, 5970750, 5970760, 5970770,
        5970780, 5970790, 5970800, 5970810, 5970820, 5970610, 5970620,
        5970630, 5970640, 5970650, 5970660, 5970670, 5970680, 5970690,
        5970700),
  z = c(-1.5621, -1.5758, -1.5911, -1.6079, -1.6247, -1.5704, -1.5840,
        -1.5976, -1.6113, -1.6249, -1.6385, -1.6521, -1.6658, -1.6794,
        -1.6930, -1.7067, -1.7216, -1.7384, -1.5786, -1.5922, -1.6059,
        -1.6195, -1.6331, -1.6468, -1.6604, -1.6740, -1.6877, -1.7013,
        -1.7149, -1.7285, -1.7422, -1.7558, -1.7694, -1.7831, -1.7967,
        -1.8103, -1.8239, -1.8376, -1.8522, -1.8690, -1.5869, -1.6005,
        -1.6141, -1.6278, -1.6414, -1.6550, -1.6686, -1.6823, -1.6959,
        -1.7095))

head(df)
plot(df[, 1:2], las = 3)  # to show that it's a regular grid

My question: is there a function to calculate the depth for any coordinate pair (e.g. x=3454263, y=5970687) within the grid, e.g. by bilinear interpolation or any other meaningful method?
Thanks a lot for your help in anticipation. Best wishes, Thomas
Re: [R] Calculate depth from regular xyz grid for any coordinate within the grid
Hi, The area of statistics you're looking for is called geostatistics. There are many R packages to conduct such analyses. See the Spatial task view for some good starting points: http://cran.r-project.org/web/views/Spatial.html You'll need to do some homework to understand the various options and which are best for your data. You might start with Inverse Distance Weighting. Sarah

On Mon, Jul 28, 2014 at 9:07 AM, Kulupp kul...@online.de wrote: [original post quoted above; snipped]

-- Sarah Goslee http://www.functionaldiversity.org
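For a strictly regular grid like this one, bilinear interpolation can also be done in base R with no spatial package at all. A minimal sketch (the function name is mine, and it assumes the query point lies inside the grid so all four surrounding corners exist in the data):

```r
# Bilinear interpolation on a regular grid stored long-format as (x, y, z) rows.
bilinear_xyz <- function(d, x0, y0) {
  xs <- sort(unique(d$x)); ys <- sort(unique(d$y))
  x1 <- max(xs[xs <= x0]); x2 <- min(xs[xs >= x0])   # bracketing grid lines
  y1 <- max(ys[ys <= y0]); y2 <- min(ys[ys >= y0])
  z  <- function(xi, yi) d$z[d$x == xi & d$y == yi]  # corner lookup
  tx <- if (x1 == x2) 0 else (x0 - x1) / (x2 - x1)   # relative position in cell
  ty <- if (y1 == y2) 0 else (y0 - y1) / (y2 - y1)
  (1 - tx) * (1 - ty) * z(x1, y1) + tx * (1 - ty) * z(x2, y1) +
  (1 - tx) * ty       * z(x1, y2) + tx * ty       * z(x2, y2)
}

# One grid cell from the posted data, around the example query point:
cell <- data.frame(x = rep(c(3454260, 3454270), each = 2),
                   y = rep(c(5970680, 5970690), 2),
                   z = c(-1.6740, -1.6877, -1.6823, -1.6959))
bilinear_xyz(cell, 3454263, 5970687)
# [1] -1.686059
```

This agrees with the raster-based answer later in the thread; for anything beyond one-off queries, a dedicated package (akima, raster) is the better tool.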
[R] Is dataset headsize from MVA or HSAUR2 packages missing or am I missing something ?
Dear R-helpers, I've started the study of An Introduction to Applied Multivariate Analysis with R (Everitt and Hothorn). After loading the library, which depends on HSAUR2, it seems that the dataset headsize is not available (nor are measure and exam). Datasets from the same book are nevertheless available, such as heptathlon, pottery and USairpollution. Am I missing something obvious here? Thanks in advance -- Ottorino-Luca Pantani, Università di Firenze Dip.to di Scienze delle Produzioni Agroalimentari e dell'Ambiente (DISPAA) P.zle Cascine 28 50144 Firenze Italia Debian 7.0 wheezy -- GNOME 3.4.2 GNU Emacs 24.4.1 (i486-pc-linux-gnu, GTK+ Version 2.24.10) ESS version 12.04-4 -- R 3.1.0
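A quick way to check which datasets a package actually ships is data(package = ...), sketched here against the always-available datasets package (substitute "HSAUR2" or "MVA" once installed). If headsize is absent from both listings, it may be an object the book constructs in its chapter code rather than a shipped dataset — though that is a guess worth verifying:

```r
# List the datasets a package provides (swap "datasets" for "HSAUR2" or "MVA"):
ds <- data(package = "datasets")$results[, "Item"]
head(ds)
"headsize" %in% ds   # FALSE for base datasets, of course
```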
Re: [R] Function assignment
I am sorry but I don't follow your description beyond "use it with an external reference", because external references are external, while the destination of an assignment is an internal reference. If you want the destination to be an object which uses special knowledge (an external reference) to store the value (e.g. by an assignment function) into an external object, then the internal object should have PREVIOUSLY been constructed with that special knowledge about that external reference. Thus I don't follow why you want the external reference to be the destination of an assignment. The replace method seems much more suitable if you want to supply an external key in the course of the assignment.

--- Jeff Newmiller DCN: jdnew...@dcn.davis.ca.us Research Engineer (Solar/Batteries /Software/Embedded Controllers) Sent from my phone. Please excuse my brevity.

On July 28, 2014 1:59:18 AM PDT, Florian Ryan florian.r...@aim.com wrote: Thank you very much! The idea was to use it with an external reference, and to write a constructor for an object which points at this external reference and behaves like an R object. So it would be the same as if I just do

name <- constructor("name", someValuesForObjectConstruction)

but I don't have to provide the name twice. Here constructor returns an object which stores the string "name" and knows get, set, ... And since with

setReplaceMethod(f = "[", signature = "myExternList",
                 definition = function(x, i = "character", j = "missing", value) {
                   setExternReference(i, value)
                   return(x)
                 })

I can nicely get the name inside the bracket (x["name"] <- ...), it seemed possible to get the name somehow out of a name <- ... assignment. Again thank you very much for hints and advice.
Florian Ryan florian.r...@aim.com

-----Original Message----- From: peter dalgaard pda...@gmail.com To: Jeff Newmiller jdnew...@dcn.davis.ca.us Cc: Florian Ryan florian.r...@aim.com; r-help r-help@r-project.org Sent: Sat, Jul 26, 2014 10:04 pm Subject: Re: [R] Function assignment

On 26 Jul 2014, at 17:01, Jeff Newmiller jdnew...@dcn.davis.ca.us wrote: What an awful idea... that would lead to incredibly hard-to-debug programs. No, you cannot do that. What kind of problem has led you to want such a capability? Perhaps we can suggest a simpler way to think about your problem.

I agree that this is a silly idea, but I actually thought that it could be done by clever manipulation of the call stack. It can, if you do the assignment with assign():

foo <- function() sys.calls()[[1]][[2]]
assign("z", foo())
z
[1] "z"
assign("bah", foo())
bah
[1] "bah"

but if you do x <- foo(), there is no mention of x or "x" in sys.calls(). Anyway, functions that assume being called in a specific way are asking for trouble in all cases where they get called differently. -pd

--- Jeff Newmiller DCN: jdnew...@dcn.davis.ca.us Research Engineer (Solar/Batteries /Software/Embedded Controllers) Sent from my phone. Please excuse my brevity.

On July 26, 2014 5:29:59 AM PDT, Florian Ryan florian.r...@aim.com wrote: Hello, I would like to use, inside a function, the variable name to which the return value of that function is assigned. Is that possible? e.g.

foo <- function() {
  # some not-to-me-known R magic
}
myVariableName <- foo()
myVariableName
[1] "myVariableName"

Hope someone can help me. Thanks Florian
-- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com
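Jeff's "replace method" route can be sketched with plain S3 instead of S4. The environment-backed store below is a stand-in assumption for the poster's external reference (his setExternReference):

```r
store <- new.env()  # stand-in for the external store behind setExternReference()

x <- structure(list(), class = "myExternList")

# `[<-` method: the key arrives as an ordinary argument, so no
# call-stack inspection is needed to recover the name.
`[<-.myExternList` <- function(obj, i, value) {
  assign(i, value, envir = store)  # would be setExternReference(i, value)
  obj                              # return the object so x survives the assignment
}

x["speed"] <- 42
get("speed", envir = store)  # 42
```

The key point is that in a replacement call the name travels inside the call itself, which is exactly what a bare `name <- value` cannot provide.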
Re: [R] lattice, latticeExtra: Adding moving averages to double y plot
Hi Anna

Not sure what you want exactly as I do not use themes. Here is one way to get your averages and points:

# combine averages into mydata
mydata$mavg <- c(rep(NA, 4), madfStuff1[, 3],
                 rep(NA, 4), subset(madfStuff2_3, Type == "stuff2", 3, drop = TRUE),
                 rep(NA, 4), subset(madfStuff2_3, Type == "stuff3", 3, drop = TRUE))

xyplot(Value ~ Year, mydata, groups = Type,
       allow.multiple = TRUE, distribute.type = TRUE,
       col = c("red", "blue", "cyan"), subscripts = TRUE,
       panel = panel.superpose,
       panel.groups = function(x, y, subscripts, ..., group.number) {
         panel.xyplot(x, y, ...)
         panel.xyplot(x, mydata[subscripts, "mavg"],
                      col = c("red", "blue", "cyan")[group.number], type = "l")
       })

HTH And now some sleep Duncan BTW package names are case sensitive, like everything in R. Duncan Mackay Department of Agronomy and Soil Science University of New England Armidale NSW 2351 Email: home: mac...@northnet.com.au

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Anna Zakrisson Braeunlich Sent: Monday, 28 July 2014 16:38 To: r-help@r-project.org Subject: [R] lattice, latticeExtra: Adding moving averages to double y plot

Hi lattice users, I would like to add 5-year moving averages to my double-y plot. I have three factors that need to be plotted with moving averages in the same plot. One of these reads off y-axis 1 and two read off y-axis 2. I have tried to use the rollmean function from the zoo package, but I fail in inserting this into lattice (I am not an experienced lattice user). I want to keep the data points in the plot. Find below dummy data and the script, as well as annotations further describing my question. Thank you in advance!
Anna Zakrisson

mydata <- data.frame(
  Year  = 1980:2009,
  Type  = factor(rep(c("stuff1", "stuff2", "stuff3"), each = 10 * 3)),
  Value = rnorm(90, mean = seq(90), sd = rep(c(6, 7, 3), each = 10)))

library(lattice)
library(latticeExtra)

stuff1data    <- mydata[(mydata$Type) %in% c("stuff1"), ]
stuff12_3data <- mydata[(mydata$Type) %in% c("stuff2", "stuff3"), ]

# make moving averages function using zoo and rollmean:
library(zoo)
library(plyr)
f <- function(d) {
  require(zoo)
  data.frame(Year = d$Year[5:length(d$Year)], mavg = rollmean(d$Value, 5))
}

# Apply the function to each group as well as both data frames:
madfStuff1   <- ddply(stuff1data, "Type", f)
madfStuff2_3 <- ddply(stuff12_3data, "Type", f)

# Some styles:
myStripStyle <- function(which.panel, factor.levels, ...) {
  panel.rect(0, 0, 1, 1, col = bgColors[which.panel], border = 1)
  panel.text(x = 0.5, y = 0.5, font = 2,
             lab = factor.levels[which.panel],
             col = txtColors[which.panel])
}

myplot1 <- xyplot(Value ~ Year, data = stuff1data, col = "black", lty = 1, pch = 1,
                  ylab = "sweets", strip.left = FALSE, strip = myStripStyle,
                  xlab = "Year",
                  panel = function(x, y, ..., subscripts) {
                    panel.xyplot(x, y, pch = 1, col = "black")
                    panel.lmline(x, y, col = "black", data = madfStuff1)
                    # here I presume that panel.lmline is wrong.
                    # I would like to have my 5 year moving average here,
                    # not a straight line.
                  })
myplot1

myplot2 <- xyplot(Value ~ Year, data = stuff12_3data, col = "black", lty = 1, pch = 1,
                  ylab = "hours", strip.left = FALSE, strip = myStripStyle,
                  xlab = "Year",
                  panel = function(x, y, ..., subscripts) {
                    panel.xyplot(x, y, pch = c(2:3), col = "black")
                    ## what is this pch defining? Types?
                    # I would like to have different symbols and line types
                    # for stuff2 and stuff3
                    panel.lmline(x, y, col = "black", data = madfStuff2_3)
                    # wrong! Need my moving averages here!
                  })
myplot2

doubleYScale(myplot1, myplot2, style1 = 0, style2 = 0, add.ylab2 = TRUE,
             text = c("stuff1", "stuff2", "stuff3"), columns = 2, col = "black")
# problem here is that I end up with two lines.
I need a double-y plot with one moving-average plot that is read off y-axis 1 and two that are read off y-axis 2. I need to keep the data points in the plot.

update(trellis.last.object(),
       par.settings = simpleTheme(col = c("black", "black"),
                                  lty = c(1:3), pch = c(1:3)))
# how come that I only get lines in my legend text and not the symbols too?
# I thought pch would add symbols?!?

Anna Zakrisson Braeunlich PhD student Department of Ecology, Environment and Plant Sciences Stockholm University Svante Arrheniusv. 21A SE-106 91 Stockholm Sweden/Sverige Lives in Berlin. For paper mail: Katzbachstr. 21 D-10965, Berlin Germany/Deutschland E-mail: anna.zakris...@su.se Tel work: +49-(0)3091541281 Mobile: +49-(0)15777374888 LinkedIn: http://se.linkedin.com/pub/anna-zakrisson-braeunlich/33/5a2/51b
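For the moving average itself no extra package is strictly required: base R's stats::filter() reproduces a centred 5-point rollmean(), NA-padded to the original length, which makes it easy to attach as a column like mydata$mavg (note the posted code instead pairs each average with the window's last year, so the alignment differs by two positions):

```r
# Centred 5-point moving average, NA-padded to the input length.
mavg5 <- function(v) as.numeric(stats::filter(v, rep(1/5, 5), sides = 2))

mavg5(1:10)
# positions 3..8 hold the mean of the surrounding 5 values; the ends are NA
```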
Re: [R] lattice, latticeExtra: Adding moving averages to double y plot
An utterly perfect example of why one shouldn't send HTML mail to this list.

On Mon, Jul 28, 2014 at 11:18 AM, Duncan Mackay dulca...@bigpond.com wrote: [Duncan's reply and the quoted original post, mangled by the HTML mail, snipped; see the previous messages in this thread]
Re: [R] working on a data frame
Thank you very much Peter, Bill and Petr for some great and quite elegant solutions. There is a lot I can learn from these. Yes to your question, Bill, about the raw numbers: they are counts and they cannot be negative. The data is RNA sequencing data where there are approximately 32,000 genes being measured for changes between two conditions. There are some genes that are not present (cannot be measured) initially, but are present in the second condition, and the reverse is also true: some genes are present initially and then not present in the second condition (these are often the most interesting genes). This makes it difficult to compare mathematically the changes of all genes, so it is common practice to change the 0's to 1's and then redo the log2. 1 is considered sufficiently small; actually anything up to 3 or 5 could be just due to 'background noise' in the measurement process, but it is somewhat arbitrary. Matthew

On 7/28/2014 2:43 AM, PIKAL Petr wrote: Hi I like to use logical values directly in computations if possible.

yourData[,10] <- yourData[,9] / (yourData[,8] + (yourData[,8] == 0))

Logical values are automagically considered FALSE=0 and TRUE=1 and can be used in computations. If you really want to change 0 to 1 in column 8 you can use

yourData[,8] <- yourData[,8] + (yourData[,8] == 0)

without ifelse stuff. Regards Petr

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of William Dunlap Sent: Friday, July 25, 2014 8:07 PM To: Matthew Cc: r-help@r-project.org Subject: Re: [R] working on a data frame

"if yourData[,8]==0, then yourData[,8]==1, yourData[,10] <- yourData[,9]/yourData[,8]"

You could express this in R as

is8Zero <- yourData[,8] == 0
yourData[is8Zero, 8]  <- 1
yourData[is8Zero, 10] <- yourData[is8Zero, 9] / yourData[is8Zero, 8]

Note how logical (Boolean) values are used as subscripts - read the '[' as 'such that' when using logical subscripts. There are many more ways to express the same thing.
(I am tempted to change the algorithm to avoid the divide-by-zero problem by making the quotient (numerator + epsilon)/(denominator + epsilon) where epsilon is a very small number. I am assuming that the raw numbers are counts or at least cannot be negative.) Bill Dunlap TIBCO Software wdunlap tibco.com

On Fri, Jul 25, 2014 at 10:44 AM, Matthew mccorm...@molbio.mgh.harvard.edu wrote: Thank you for your comments, Peter. A couple of questions. Can I do something like the following?

if yourData[,8]==0, then yourData[,8]==1, yourData[,10] <- yourData[,9]/yourData[,8]

I think I am just going to have to learn more about R. I thought getting into R would be like going from Perl to Python or Java etc., but it seems like R programming works differently. Matthew

On 7/25/2014 12:06 AM, Peter Alspach wrote: Tena koe Matthew "Column 10 contains the result of the value in column 9 divided by the value in column 8. If the value in column 8==0, then the division can not be done, so I want to change the zero to a one in order to do the division." That being the case, think in terms of vectors, as Sarah says. Try:

yourData[,10] <- yourData[,9]/yourData[,8]
yourData[yourData[,8]==0, 10] <- yourData[yourData[,8]==0, 9]

This doesn't change the 0 to 1 in column 8, but it doesn't appear you actually need to do that. HTH Peter Alspach

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Matthew McCormack Sent: Friday, 25 July 2014 3:16 p.m. To: Sarah Goslee Cc: r-help@r-project.org Subject: Re: [R] working on a data frame

On 7/24/2014 8:52 PM, Sarah Goslee wrote: Hi, Your description isn't clear: On Thursday, July 24, 2014, Matthew mccorm...@molbio.mgh.harvard.edu wrote: I am coming from the perspective of Excel and VBA scripts, but I would like to do the following in R. I have a data frame with 14 columns and 32,795 rows. I want to check the value in column 8 (row 1) to see if it is a 0.
If it is not a zero, proceed to the next row and check the value for column 8. If it is a zero, then a) change the zero to a 1, b) divide the value in column 9 (row 1) by 1. "Row 1, or the row in which column 8 == 0?" All rows in which the value in column 8==0. "Why do you want to divide by 1?" Column 10 contains the result of the value in column 9 divided by the value in column 8. If the value in column 8==0, then the division can not be done, so I want to change the zero to a one in order to do the division. This is a fairly standard thing to do with this data. (The data are measurements of amounts at two time points. Sometimes a thing will not be present in the beginning (0), but very present at the later time. Column 10 is the log2 of the change. Infinite is not an easy number to work with, so it is common to change the 0 to a 1.
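Putting Petr's logical-arithmetic trick together with the log2 step described in this thread, on a made-up two-column count table (the column names t1/t2 are mine, just for the sketch):

```r
counts <- data.frame(t1 = c(0, 4, 16),   # counts at the first time point
                     t2 = c(8, 4, 0))    # counts at the second time point

# (x == 0) is TRUE/FALSE, which arithmetic treats as 1/0, so this
# turns every 0 into a 1 and leaves the other counts alone:
t1 <- counts$t1 + (counts$t1 == 0)
t2 <- counts$t2 + (counts$t2 == 0)

counts$log2fc <- log2(t2 / t1)  # now finite for every row
counts$log2fc
# [1]  3  0 -4
```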
Re: [R] Calculate depth from regular xyz grid for any coordinate within the grid
I believe the interpp() function from the akima package will do what you want. -Don -- Don MacQueen Lawrence Livermore National Laboratory 7000 East Ave., L-627 Livermore, CA 94550 925-423-1062

On 7/28/14, 6:07 AM, Kulupp kul...@online.de wrote: [original post quoted above; snipped]
Re: [R] Calculate depth from regular xyz grid for any coordinate within the grid
The raster package can readily provide bilinear interpolation:

library(raster)
r <- rasterFromXYZ(df)
## due diligence, just a guess here - you should check:
## projection(r) <- "+proj=utm +zone=32 +datum=WGS84"
## coordinates to extract
m <- matrix(c(3454263, 5970687), ncol = 2)
extract(r, m, method = "bilinear")
[1] -1.686059
## compare with
extract(r, m, method = "simple")
[1] -1.6877

See ?extract - the simplest usage is a query matrix of XY coordinates in the projection used by your raster. It will helpfully transform queries such as a Spatial*DataFrame if needed, as long as both raster x and query y have sufficient projection metadata (and it's up to you to make sure that's set right). (Generally building a raster from XYZ data is sub-optimal since there's so much redundancy in the XY coordinates, and so much room for things to go wrong in between. But sometimes there's no better option.) Cheers, Mike.

On Mon, Jul 28, 2014 at 11:07 PM, Kulupp kul...@online.de wrote: [original post quoted above; snipped]

-- Michael Sumner Software and Database Engineer Australian Antarctic Division Hobart, Australia e-mail: mdsum...@gmail.com
[R] rgl.postscript doesn't show the colors correctly
I wrote this code in R, in order to plot a kernel-smoothed density function and then save the plot as an eps file:

library(ks)
library(rgl)
kern <- read.table(file.choose(), sep = ",")
hat <- kde(kern)
plot(hat, drawpoints = TRUE, xlab = "x", ylab = "y", zlab = "z")
rgl.postscript("plot1.eps", "eps", drawText = TRUE)

The problem is that when I insert that eps file in LaTeX, the colors of the plot are not the same as in the plot generated in R: it shows the plot in one color (yellow) instead of a range of colors (yellow, orange, red...) showing the different densities...
[R] Split PVClust plot
Dear All I'm using pvclust to perform hierarchical clustering. For the output plot I can control most of the graphics I need; however, the plot is large and I would like to split it vertically into two panels, one above the other. Is there a way to plot only part of a pvclust plot? I tried to convert it to a dendrogram with result2 <- as.dendrogram(result); however, I get the error message "no applicable method for 'as.dendrogram' applied to an object of class \"pvclust\"". I also wondered whether it would be possible to convert to a phylogenetic tree and use the functions in the 'ape' package? Any suggestion on how to split up a pvclust plot would be greatly appreciated (code for the plot below). Thanks Tom

result <- pvclust(df.1, method.dist = "uncentered",
                  method.hclust = "average", nboot = 10)
par(mar = c(0,0,0,0))
par(oma = c(0,0,0,0))
plot(result, print.pv = FALSE, col.pv = c("red", "", ""), print.num = FALSE,
     float = 0.02, font = 1, axes = TRUE, cex = 0.85, main = "", sub = "",
     xlab = "", ylab = "", labels = NULL, hang = -1)
pvrect(result, alpha = 0.95)
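One possible route, sketched here on a plain hclust tree: a pvclust object is not itself a dendrogram, but it carries the underlying hclust tree in result$hclust, which as.dendrogram() does accept. The p-value annotations are lost once you leave the pvclust class, so this only helps with the split plotting itself:

```r
# cut() a dendrogram at a height into an upper tree plus lower subtrees,
# then draw them in two stacked panels. For pvclust, start from
# as.dendrogram(result$hclust) instead of hc.
hc    <- hclust(dist(USArrests[1:20, ]), method = "average")
dend  <- as.dendrogram(hc)
parts <- cut(dend, h = 60)    # $upper and a list of $lower subtrees

par(mfrow = c(2, 1))
plot(parts$upper)             # top of the tree
plot(parts$lower[[1]])        # first lower subtree
```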
Re: [R] using foumula to calculate a column in dataframe
On Mon, 28 Jul 2014, Pavneet Arora wrote: Hello All, I need to calculate a column (Vupper) using a formula, but I am not sure how to. It will be easier to explain with an example. Again this is my dataset:

dput(nd)
structure(list(week = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
29, 30), value = c(9.45, 7.99, 9.29, 11.66, 12.16, 10.18, 8.04,
11.46, 9.2, 10.34, 9.03, 11.47, 10.51, 9.4, 10.08, 9.37, 10.62,
10.31, 10, 13, 10.9, 9.33, 12.29, 11.5, 10.6, 11.08, 10.38, 11.62,
11.31, 10.52), cusum = c(-0.551, -2.56, -3.27, -1.61, 0.549,
0.729, -1.23, 0.229, -0.572, -0.232, -1.2, 0.268, 0.778, 0.178,
0.258, -0.373, 0.246, 0.557, 0.557, 3.56, 4.46, 3.79, 6.08, 7.58,
8.18, 9.26, 9.64, 11.26, 12.57, 13.09)), .Names = c("week",
"value", "cusum"), row.names = c(NA, -30L), class = "data.frame")

I have some constants in my data. These are: sigma = 1, h = 5, k = 0.5. The formula requires me to start from the bottom row (30th in this case). The formula for the last row will be: row 30's cusum value (13.09) + h(5) * sigma(1), giving me the value 18.1. Then the formula for the 29th row's Vupper uses the value of the 30th Vupper (18.1) + k(0.5) * sigma(1), giving me the value 18.6. Similarly, the formula for the 28th row's Vupper will use the value of the 29th Vupper (18.6) + k(0.5) * sigma(1), giving me the value 19.1. And so on...

This is a recurrence formula... each value depends on the previous value in the sequence. In general these can be computationally expensive in R, but there are certain very common cases that have built-in functions with which you can build many of the real-world cases you might encounter (such as this one).

Also, is there any way to make the formula generalised using loops or functions? Because I really don't want to have to re-write the program if my number of rows increases or decreases, or if I use another dataset.
So far my function looks like the following (without the Vupper formula in there):

vmask2 <- function(data, target, sigma, h, k){
  data$deviation <- data$value - target
  data$cusums <- cumsum(data$deviation)
  data$ma <- c(NA, abs(diff(data$value)))
  data$Vupper <- *not sure what to put here*
  data
}

I avoid using the variable name "data" because there is a base function of that name.

sigma <- 1
h <- 5
k <- 0.5
dta$Vupper <- rev( cumsum( c( dta[ nrow(dta), "cusum" ] + h * sigma,
                              rep( 0, nrow(dta) - 1 ) ) +
                           seq( 0, by = k * sigma, length.out = 30L ) ) )

Note how the terms in your algorithm are re-grouped into vectors that c and seq and rep can generate, how cumsum is used to implement the recurrence, and how the rev function is used to reverse the vector. If you are going to apply this to long sequences of data, you might want to fix the accumulation of floating-point error in the seq call by using integers:

dta$Vupper <- rev( cumsum( c( dta[ nrow(dta), "cusum" ] + h * sigma,
                              rep( 0, nrow(dta) - 1 ) ) +
                           k * sigma * seq( 0L, by = 1L, length.out = 30L ) ) )

Please send your emails in plain text, as the Posting Guide requests. HTML often corrupts what you send to the list.

--- Jeff Newmiller
Re: [R] using formula to calculate a column in dataframe
On Mon, 28 Jul 2014, Jeff Newmiller wrote:
Oops... I accounted for the accumulation twice, once with cumsum and once with seq. The corrected version is:

dta$Vupper <- rev( rep( dta[ nrow(dta), "cusum" ] + h * sigma, nrow(dta) ) +
                   k * sigma * seq( 0L, by = 1L, length.out = 30L ) )

--- Jeff Newmiller
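Putting the thread's pieces together, here is a small self-contained sketch of the corrected Vupper computation, using the constants from the thread (sigma = 1, h = 5, k = 0.5) and a short made-up cusum vector standing in for the full posted data:

```r
# Last row gets cusum + h*sigma; each earlier row adds k*sigma more,
# so the increments can be generated with seq() and reversed with rev().
dta <- data.frame(cusum = c(0.5, 1.2, 2.0, 3.1))  # illustrative data
sigma <- 1; h <- 5; k <- 0.5
n <- nrow(dta)

dta$Vupper <- rev(dta$cusum[n] + h * sigma +
                  k * sigma * seq(0L, length.out = n))
dta$Vupper
# -> 9.6 9.1 8.6 8.1
```

This generalises to any number of rows because n is taken from nrow(dta) rather than hard-coded.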
Re: [R] lattice, latticeExtra: Adding moving averages to double y plot
I do not know what happened to my last email, as these are set up as plain text, so I am sending the code again and hope this works. I am not sure what you wanted exactly, but this will plot the points and lines of the average. I have not worried about the 2nd axis. Here is one way of doing things, by combining the averages into the data frame. It makes things easier that way, as you do not have to match up the x values.

# combine averages into mydata
mydata$mavg <- c(rep(NA, 4), madfStuff1[, 3],
                 rep(NA, 4), subset(madfStuff2_3, Type == "stuff2", 3, drop = TRUE),
                 rep(NA, 4), subset(madfStuff2_3, Type == "stuff3", 3, drop = TRUE))

xyplot(Value ~ Year, mydata, groups = Type,
       allow.multiple = TRUE, distribute.type = TRUE,
       col = c("red", "blue", "cyan"),
       subscripts = TRUE,
       panel = panel.superpose,
       panel.groups = function(x, y, subscripts, ..., group.number) {
         panel.xyplot(x, y, ...)
         panel.xyplot(x, mydata[subscripts, "mavg"],
                      col = c("red", "blue", "cyan")[group.number], type = "l")
       })

Duncan

BTW, library names are case sensitive as well. Is it your editor putting in the capitals?

Duncan Mackay
Department of Agronomy and Soil Science
University of New England
Armidale NSW 2351
Email: home: mac...@northnet.com.au

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Anna Zakrisson Braeunlich
Sent: Monday, 28 July 2014 16:38
To: r-help@r-project.org
Subject: [R] lattice, latticeExtra: Adding moving averages to double y plot

Hi lattice users, I would like to add 5-year moving averages to my double-y plot. I have three factors that need to be plotted with moving averages in the same plot. One of these reads off y-axis 1 and two read off y-axis 2. I have tried to use the rollmean function from the zoo package, but I fail at inserting this into lattice (I am not an experienced lattice user). I want to keep the data points in the plot. Find below dummy data and the script, as well as annotations further describing my question. Thank you in advance!
Anna Zakrisson

mydata <- data.frame(
  Year = 1980:2009,
  Type = factor(rep(c("stuff1", "stuff2", "stuff3"), each = 10*3)),
  Value = rnorm(90, mean = seq(90), sd = rep(c(6, 7, 3), each = 10)))

library(lattice)
library(latticeExtra)

stuff1data <- mydata[(mydata$Type) %in% c("stuff1"), ]
stuff12_3data <- mydata[(mydata$Type) %in% c("stuff2", "stuff3"), ]

# make moving averages function using zoo and rollmean:
library(zoo)
library(plyr)
f <- function(d) {
  require(zoo)
  data.frame(Year = d$Year[5:length(d$Year)], mavg = rollmean(d$Value, 5))
}

# Apply the function to each group as well as both data frames:
madfStuff1 <- ddply(stuff1data, "Type", f)
madfStuff2_3 <- ddply(stuff12_3data, "Type", f)

# Some styles:
myStripStyle <- function(which.panel, factor.levels, ...) {
  panel.rect(0, 0, 1, 1, col = bgColors[which.panel], border = 1)
  panel.text(x = 0.5, y = 0.5, font = 2,
             lab = factor.levels[which.panel],
             col = txtColors[which.panel])
}

myplot1 <- xyplot(Value ~ Year, data = stuff1data, col = "black", lty = 1, pch = 1,
                  ylab = "sweets", strip.left = FALSE, strip = myStripStyle,
                  xlab = "Year",
                  panel = function(x, y, ..., subscripts){
                    panel.xyplot(x, y, pch = 1, col = "black")
                    panel.lmline(x, y, col = "black", data = madfStuff1)
                    # here I presume that panel.lmline is wrong.
                    # I would like to have my 5 year moving average here,
                    # not a straight line.
                  })
myplot1

myplot2 <- xyplot(Value ~ Year, data = stuff12_3data, col = "black", lty = 1, pch = 1,
                  ylab = "hours", strip.left = FALSE, strip = myStripStyle,
                  xlab = "Year",
                  panel = function(x, y, ..., subscripts){
                    panel.xyplot(x, y, pch = c(2:3), col = "black")
                    ## what is this pch defining? Types?
                    # I would like different symbols and line types for stuff2 and stuff3
                    panel.lmline(x, y, col = "black", data = madfStuff2_3)
                    # wrong! Need my moving averages here!
                  })
myplot2

doubleYScale(myplot1, myplot2, style1 = 0, style2 = 0, add.ylab2 = TRUE,
             text = c("stuff1", "stuff2", "stuff3"), columns = 2, col = "black")
# problem here is that I end up with two lines.
I need a double y-plot with one moving-average curve that reads off y-axis 1 and two that read off y-axis 2. I need to keep the data points in the plot.

update(trellis.last.object(),
       par.settings = simpleTheme(col = c("black", "black"),
                                  lty = c(1:3), pch = c(1:3)))
# how come I only get lines in my legend text and not the symbols too?
# I thought pch would add symbols?!?

Anna Zakrisson Braeunlich
PhD student
Department of Ecology, Environment and Plant Sciences
Stockholm University
Svante Arrheniusv.
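One way to address the recurring "panel.lmline is wrong" comments above is to draw the moving average inside the panel function itself, rather than passing a separate data frame. A minimal sketch (variable names are illustrative, and it assumes the zoo package as in the poster's code):

```r
library(lattice)
library(zoo)

set.seed(1)
d <- data.frame(Year = 1980:2009, Value = rnorm(30, mean = seq(30)))

xyplot(Value ~ Year, data = d,
       panel = function(x, y, ...) {
         panel.xyplot(x, y, pch = 1, col = "black")
         # rollmean() shortens the series by (k - 1) points, so the
         # x positions are smoothed the same way to keep them aligned
         # with the centres of the 5-year windows.
         panel.lines(rollmean(x, 5), rollmean(y, 5), col = "black")
       })
```

The same panel function can be reused in both plots passed to doubleYScale(), replacing the panel.lmline() calls.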
[R] interactive labeling/highlighting on multiple xy scatter plots
hi list, I'm comparing the changes of ~100 analytes in multiple treatment conditions. I plotted them in several different xy scatter plots. It would be nice if, when I mouse over one point on one scatter plot, the label of the analyte on that scatter plot AS WELL AS on all other scatter plots were automatically shown. I know brushing in rggobi does this, but its interface is not good and it needs R or ggobi to run (I want to send the results to the collaborators and let them play with them without needing to install R or ggobi on their machines). rCharts is nice, but so far it can only create one scatter plot at a time. Any good suggestions? Many thanks! Tao
[R] Error in validObject(.Object) : while running rqpd package
Hi, I have installed rqpd on R 2.15.1, Windows 7 OS. After loading the rqpd package I get the following output:

Loading required package: quantreg
Loading required package: SparseM

Attaching package: 'SparseM'

The following object(s) are masked from 'package:base':

    backsolve

Loading required package: MatrixModels
Loading required package: Matrix
Loading required package: lattice

Attaching package: 'Matrix'

The following object(s) are masked from 'package:SparseM':

    det

Loading required package: Formula
Warning messages:
1: package 'quantreg' was built under R version 2.15.3
2: package 'SparseM' was built under R version 2.15.3
3: package 'MatrixModels' was built under R version 2.15.3
4: package 'lattice' was built under R version 2.15.3
5: package 'Formula' was built under R version 2.15.3
6: In rm(.First.lib, envir = myEnv) : object '.First.lib' not found

which I fix by running the following command:

as.environment(match("package:rqpd", search()))
<environment: package:rqpd>
attr(,"name")
[1] "package:rqpd"
attr(,"path")
[1] "C:/Users/fossil/Documents/R/win-library/2.15/rqpd"

I tried running the example file:

data(bwd)
cre.form <- dbirwt ~ smoke + dmage + agesq + novisit + pretri2 + pretri3 | momid3 | smoke + dmage + agesq
crem.fit <- rqpd(cre.form, panel(method="cre"), data=bwd)

and I get the following error:

Error in validObject(.Object) :
  invalid class "dsparseModelMatrix" object: superclass "mMatrix" not defined in the environment of the object's class

Do I need to install 2.15.3? How can I solve this problem? Please help. Thanks in advance, vishal
[R] outputting R loop to a csv file
Hello, My name is Jenny Jiang and I am a Finance Honours research student from the University of New South Wales. Currently my research project involves calculating some network centrality measures in R using a loop; however, I am having some trouble outputting my loop results in a desired CSV format. Basically, for each firm year I need to calculate four different measures based on director id and connected director id and output these to the CSV file. I have provided in the attachment the code that I used for the R loop and CSV outputting (main-6.R). Using an example CSV file (data example 2), the output I get is as shown in measure1.csv. As shown in the output file, the results are really messy: for each firm year, all director ids and each type of measure for all directors are displayed in one cell. The desired output format is as shown in output data template.xlsx. I was wondering if you could help me get the desired format, which would make it much easier for me to do further research on this. I could not be more appreciative. Best regards, Jenny
Re: [R] outputting R loop to a csv file
It will be difficult to help since all of the attached files were stripped out of your message. R-help accepts very few formats as attached files; they do not include .R, .csv, or .xlsx, but they do include .txt (so you could rename your R and csv files). It will be easier to help if we have enough data to test alternate approaches. The function dput() will convert a sample of your data to text format so that you can paste it into your email or provide it as a .txt file.

David L. Carlson
Department of Anthropology
Texas A&M University

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Jenny Jiang
Sent: Monday, July 28, 2014 8:48 PM
To: r-help@R-project.org
Subject: [R] outputting R loop to a csv file
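As a sketch of what dput() produces (using a tiny made-up data frame rather than the poster's data):

```r
# dput() prints a text representation of an object that can be pasted
# straight back into R, which is why the list asks for it instead of
# attachments.
x <- data.frame(firm = c("A", "B"), year = c(2013L, 2014L))
dput(x)
```

The printed structure(...) call, pasted into an email, lets anyone on the list recreate the object exactly.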
Re: [R] outputting R loop to a csv file
Hi, beyond what David said, there is a chance that you do not need a loop for your computation. From what you describe about your csv files, there seems to be some mismatch in your write.csv statement. Make a small example code together with a data set, preferably as an output from dput, and try asking again.

Regards, Petr

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of David L Carlson
Sent: Tuesday, July 29, 2014 5:25 AM
To: Jenny Jiang; r-help@R-project.org
Subject: Re: [R] outputting R loop to a csv file
Re: [R-es] wordcloud and word table
Good afternoon, group. Thank you, Carlos, for your guidance, and Eduardo. Indeed, I followed the wordcloud example, and as before I managed to produce the word cloud, but only for each of the texts considered separately. I have the two cleaned corpora, one for each of the reports I am considering: year 2005 and year 2013.

tdm05 <- TermDocumentMatrix(cor.05.cl)
tdm13 <- TermDocumentMatrix(cor.13.cl)
m05 <- as.matrix(tdm05)
m13 <- as.matrix(tdm13)
v05 <- sort(rowSums(m05), decreasing=TRUE)
v13 <- sort(rowSums(m13), decreasing=TRUE)
df05 <- data.frame(word = names(v05), freq = v05)
df13 <- data.frame(word = names(v13), freq = v13)
wordcloud(df05$word, df05$freq)
There were 50 or more warnings (use warnings() to see the first 50)
head(df05)
                     word freq
seguridad       seguridad   56
ciudadana       ciudadana   40
funcionarios funcionarios   33
policiales     policiales   32
nacional         nacional   28
policial         policial   28
wordcloud(df13$word, df13$freq)
There were 34 warnings (use warnings() to see them)
head(df13)
                   word freq
seguridad     seguridad   33
homicidios   homicidios   29
año                 año   27
país               país   21
inseguridad inseguridad   20
violencia     violencia   20

As you can see, I can follow the procedure up to obtaining the wordcloud for each of the reports, but I cannot find a way to join the two documents so that I can show them comparatively in two wordclouds. From what I have read, as I understand it, the two documents are joined into a single corpus, which should then contain both documents. I did that with the reports, and I could indeed render a single window with the wordcloud of both reports. However, when I try to apply colnames, the error message is still "length of 'dimnames' [2] not equal to array extent", as if I could not assign the columns because it is treated as a single document. So, I ask once more for your valuable help with the following: after having both data frames (year 2005 and year 2013), the data must be joined.
Now, should this be done with the Corpus command? As I said, I tried joining them with that command, and it gave me the message "dimnames [2] not equal to array extent" at the step of assigning column names. I joined them beforehand, as in the following example http://www.webmining.cl/2014/05/text-mining-comparacion-de-2-discursos-presidenciales-del-21-de-mayo-usando-r/ and I also failed to apply colnames, or to get the matrix form that is required to place the years in the columns and the counted words in the rows. I have really been studying R, have read several articles, and have reviewed related material to find a way, but I cannot work out how to visualize this. Thanks again for your attention and your willingness to help. Cordially and sincerely,

On 25 July 2014 at 0:16, Alfredo David Alvarado Ríos <david.alvarad...@gmail.com> wrote: Good evening, group. Warm regards. I have kept searching for a way that would let me compare two documents, belonging to the years 2005 and 2013, and finally represent them with wordcloud and with a table in which the columns are the years of each report, 2005 and 2013, and the rows are the words with the frequency of each of them in each report:

           | 2005 | 2013 |
  terminos |      |      |
  terminos |      |      |

So, by searching, experimenting, and adapting from other experiences, I arrived at the following:

##
informes <- c("2013", "2005")
pathname <- "C:/Users/d_2/Documents/Comision/PLAN de INSPECCIONES/Informes/"
TDM <- function(informes, pathname) {
  info.dir <- sprintf("%s/%s", pathname, informes)
  info.cor <- Corpus(DirSource(directory=info.dir, encoding="UTF-8"))
  info.cor.cl <- tm_map(info.cor, content_transformer(tolower))
  info.cor.cl <- tm_map(info.cor.cl, stripWhitespace)
  info.cor.cl <- tm_map(info.cor.cl, removePunctuation)
  sw <- readLines("C:/Users/d_2/Documents/StopWords.txt", encoding="UTF-8")
  sw <- iconv(enc2utf8(sw), sub = "byte")
  info.cor.cl <- tm_map(info.cor.cl, removeWords,
stopwords("spanish"))
  info.tdm <- TermDocumentMatrix(info.cor.cl)
  result <- list(name = informes, tdm = info.tdm)
}
tdm <- lapply(informes, TDM, path = pathname)

Result:

tdm
[[1]]
[[1]]$name
[1] "2013"

[[1]]$tdm
TermDocumentMatrix (terms: 1540, documents: 1)
Non-/sparse entries: 1540/0
Sparsity           : 0%
Maximal term length: 18
Weighting          : term frequency (tf)

[[2]]
[[2]]$name
[1] "2005"

[[2]]$tdm
TermDocumentMatrix (terms: 1849, documents: 1)
Non-/sparse entries: 1849/0
Sparsity           : 0%
Maximal term length: 19
Weighting          : term frequency (tf)

str(tdm)
List of 2
 $ :List of 2
  ..$ name: "2013"
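A sketch of the kind of comparison the poster is after, using comparison.cloud() from the wordcloud package (the two short stand-in documents are made up; the real reports would come from the corpora above):

```r
library(tm)
library(wordcloud)

# Two tiny stand-in documents for the 2005 and 2013 reports.
docs <- c("seguridad ciudadana policial nacional seguridad",
          "seguridad homicidios violencia inseguridad homicidios")
tdm <- TermDocumentMatrix(Corpus(VectorSource(docs)))
m <- as.matrix(tdm)

# With both documents in one TermDocumentMatrix there is one column
# per document, so colnames can be assigned without the 'dimnames'
# error, and comparison.cloud() draws one panel per column.
colnames(m) <- c("2005", "2013")
comparison.cloud(m)
```

The key point is that both documents must go into a single TermDocumentMatrix (two columns), rather than two separate single-document matrices.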
Re: [R-es] wordcloud and word table
Hello, The reference you included (thanks for providing it) is quite clear and can be followed. Have you been able to apply the same logic to your two speeches? The way to clear up doubts, to begin with, is for you to attach the code you are using, to see whether there is any obvious error. Although the proper way for us to be able to help you is with a reproducible example: code + data. Regards, Carlos Ortega www.qualityexcellence.es

On 28 July 2014 at 21:24, Alfredo David Alvarado Ríos <david.alvarad...@gmail.com> wrote: