Re: [R] working on a data frame

2014-07-28 Thread PIKAL Petr
Hi

I like to use logical values directly in computations if possible.

yourData[,10] <- yourData[,9]/(yourData[,8]+(yourData[,8]==0))

Logical values are automagically considered FALSE=0 and TRUE=1 and can be used 
in computations. If you really want to change 0 to 1 in column 8 you can use

yourData[,8] <- yourData[,8]+(yourData[,8]==0)

without ifelse stuff.
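
To see what that does on a tiny made-up vector (values invented here purely for
illustration):

x <- c(4, 0, 2, 0)   # pretend this is yourData[,8]
x == 0               # FALSE  TRUE FALSE  TRUE
x + (x == 0)         # 4 1 2 1  -- zeros become ones, other values unchanged

so yourData[,8] + (yourData[,8]==0) never contains a zero and the division is safe.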

Regards
Petr


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of William Dunlap
 Sent: Friday, July 25, 2014 8:07 PM
 To: Matthew
 Cc: r-help@r-project.org
 Subject: Re: [R] working on a data frame

  if
  yourData[,8]==0,
  then
  yourData[,8]==1, yourData[,10] <- yourData[,9]/yourData[,8]

 You could express this in R as
    is8Zero <- yourData[,8] == 0
    yourData[is8Zero, 8] <- 1
    yourData[is8Zero, 10] <- yourData[is8Zero, 9] / yourData[is8Zero, 8]
 Note how logical (Boolean) values are used as subscripts - read the '['
 as 'such that' when using logical subscripts.

 There are many more ways to express the same thing.

 (I am tempted to change the algorithm to avoid the divide by zero
 problem by making the quotient (numerator + epsilon)/(denominator +
 epsilon) where epsilon is a very small number.  I am assuming that the
 raw numbers are counts or at least cannot be negative.)
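
 A rough sketch of that variant (epsilon here is an arbitrary small value you
 would choose yourself, e.g. 0.5 for count data):

 eps <- 0.5
 yourData[,10] <- (yourData[,9] + eps) / (yourData[,8] + eps)

 which never divides by zero and leaves column 8 untouched.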

 Bill Dunlap
 TIBCO Software
 wdunlap tibco.com


 On Fri, Jul 25, 2014 at 10:44 AM, Matthew
 mccorm...@molbio.mgh.harvard.edu wrote:
  Thank you for your comments, Peter.
 
  A couple of questions.  Can I do something like the following ?
 
  if
  yourData[,8]==0,
  then
  yourData[,8]==1, yourData[,10] <- yourData[,9]/yourData[,8]
 
 
  I think I am just going to have to learn more about R. I thought
  getting into R would be like going from Perl to Python or Java etc.,
  but it seems like R programming works differently.
 
  Matthew
 
 
  On 7/25/2014 12:06 AM, Peter Alspach wrote:
 
  Tena koe Matthew
 
   Column 10 contains the result of the value in column 9 divided by
  the value in column 8. If the value in column 8==0, then the division
  can not be done, so I want to change the zero to a one in order to
  do the division.
  That being the case, think in terms of vectors, as Sarah says.  Try:
 
  yourData[,10] <- yourData[,9]/yourData[,8]
  yourData[yourData[,8]==0, 10] <- yourData[yourData[,8]==0, 9]
 
  This doesn't change the 0 to 1 in column 8, but it doesn't appear you
  actually need to do that.
 
  HTH 
 
  Peter Alspach
 
  -Original Message-
  From: r-help-boun...@r-project.org
  [mailto:r-help-boun...@r-project.org]
  On Behalf Of Matthew McCormack
  Sent: Friday, 25 July 2014 3:16 p.m.
  To: Sarah Goslee
  Cc: r-help@r-project.org
  Subject: Re: [R] working on a data frame
 
 
  On 7/24/2014 8:52 PM, Sarah Goslee wrote:
 
  Hi,
 
  Your description isn't clear:
 
  On Thursday, July 24, 2014, Matthew
  mccorm...@molbio.mgh.harvard.edu
 mailto:mccorm...@molbio.mgh.harvard.edu wrote:
 
   I am coming from the perspective of Excel and VBA scripts, but I
   would like to do the following in R.
 
   I have a data frame with 14 columns and 32,795 rows.
 
   I want to check the value in column 8 (row 1) to see if it is a 0.
   If it is not a zero, proceed to the next row and check the value
   for column 8.
   If it is a zero, then
   a) change the zero to a 1,
   b) divide the value in column 9 (row 1) by 1,
 
 
  Row 1, or the row in which column 8 == 0?
 
  All rows in which the value in column 8==0.
 
  Why do you want to divide by 1?
 
  Column 10 contains the result of the value in column 9 divided by the
  value in column 8. If the value in column 8==0, then the division can
  not be done, so I want to change the zero to a one in order to do the
  division.
  This is a fairly standard thing to do with this data. (The data are
  measurements of amounts at two time points. Sometimes a thing will
  not be present in the beginning (0), but very present at the later
  time. Column 10 is the log2 of the change. Infinite is not an easy
  number to work with, so it is common to change the 0 to a 1. On the
  other hand, something may be present at time 1, but not at the later
  time. In this case column 10 would be taking the log2 of a number
  divided by 0, so again the zero is commonly changed to a one in order
  to get a useable value in column 10. In both the preceding cases
  there was a real change, but Inf and NaN are not helpful.)
 
   c) place the result in column 10 (row 1) and
 
 
  Ditto on the row 1 question.
 
  I want to work on all rows where column 8 (and column 9) contain a zero.
  Column 10 contains the result of the value in column 9 divided by the
  value in column 8. So, for row 1, column 10 row 1 contains the ratio
  column 9 row 1 divided by column 8 row 1, and so on through the whole
  32,000 or so rows.
 
  Most rows do not have a zero in columns 8 or 9. Some rows have zero
  in column 8 only, and some 

[R] lattice, latticeExtra: Adding moving averages to double y plot

2014-07-28 Thread Anna Zakrisson Braeunlich
Hi lattice users,

I would like to add 5-year moving averages to my double y-plot. I have three 
factors that need to be plotted with moving averages in the same plot. One of 
these reads off y-axis 1 and two read off y-axis 2. I have tried to use the 
rollmean function from the zoo package, but I fail at inserting this into 
lattice (I am not an experienced lattice user). I want to keep the data points 
in the plot.
Find below dummy data and the script, as well as annotations further describing 
my question.

thank you in advance!
Anna Zakrisson

mydata <- data.frame(
  Year = 1980:2009,
  Type = factor(rep(c("stuff1", "stuff2", "stuff3"), each = 10*3)),
  Value = rnorm(90, mean = seq(90),
                sd = rep(c(6, 7, 3), each = 10)))

library(Lattice)
library(LatticeExtra)

stuff1data <- mydata[(mydata$Type) %in% c("stuff1"), ]
stuff12_3data <- mydata[(mydata$Type) %in% c("stuff2", "stuff3"), ]


# make moving averages function using zoo and rollmean:
library(zoo)
library(plyr)

f <- function(d)
{
  require(zoo)
  data.frame(Year = d$Year[5:length(d$Year)],
             mavg = rollmean(d$Value, 5))
}

# Apply the function to each group as well as both data frames:
madfStuff1 <- ddply(stuff1data, "Type", f)
madfStuff2_3 <- ddply(stuff12_3data, "Type", f)

# Some styles:
myStripStyle - function(which.panel, factor.levels, ...) {
  panel.rect(0, 0, 1, 1,
 col = bgColors[which.panel],
 border = 1)
  panel.text(x = 0.5, y = 0.5,
 font=2,
 lab = factor.levels[which.panel],
 col = txtColors[which.panel])
}


myplot1 <- xyplot(Value ~ Year, data = stuff1data, col = "black",
                  lty = 1, pch = 1,
                  ylab = "sweets", strip.left = FALSE,
                  strip = myStripStyle,
                  xlab = "Year",
                  panel = function(x, y, ..., subscripts){
                    panel.xyplot(x, y, pch = 1, col = "black")
                    panel.lmline(x, y, col = "black", data = madfStuff1)
                    # here I presume that panel.lmline is wrong.
                    # I would like to have my 5 year moving average here,
                    # not a straight line.
                  })
myplot1


myplot2 <- xyplot(Value ~ Year, data = stuff12_3data, col = "black",
                  lty = 1, pch = 1,
                  ylab = "hours", strip.left = FALSE,
                  strip = myStripStyle,
                  xlab = "Year",
                  panel = function(x, y, ..., subscripts){
                    panel.xyplot(x, y, pch = c(2:3), col = "black")
                    ## what is this pch defining? Types?
                    # I would like to have different symbols and line types
                    # for stuff2 and stuff3
                    panel.lmline(x, y, col = "black", data = madfStuff2_3)
                    # wrong! Need my moving averages here!
                  })
myplot2

doubleYScale(myplot1, myplot2, style1 = 0, style2 = 0, add.ylab2 = TRUE,
             text = c("stuff1", "stuff2", "stuff3"), columns = 2, col = "black")

# problem here is that I end up with two lines. I need a double y-plot with
# one moving-average line that is read off y-axis 1 and two that are read
# off y-axis 2. I need to keep the data points in the plot.

update(trellis.last.object(),
       par.settings = simpleTheme(col = c("black", "black"), lty = c(1:3),
                                  pch = c(1:3)))
# how come that I only get lines in my legend text and not the symbols too?
# I thought pch would add symbols?!?
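
# What I imagine I need (a rough, untested sketch for the single-series plot,
# before worrying about the groups and the double y-axis) is to compute the
# rolling mean inside the panel and draw it with panel.lines() instead of
# panel.lmline():

myplot1b <- xyplot(Value ~ Year, data = stuff1data,
                   ylab = "sweets", xlab = "Year",
                   panel = function(x, y, ...){
                     panel.xyplot(x, y, pch = 1, col = "black")
                     ord <- order(x)
                     panel.lines(x[ord],
                                 rollmean(y[ord], 5, fill = NA, align = "center"),
                                 col = "black")
                   })

# but I do not know how to combine this with the groups and doubleYScale above.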


Anna Zakrisson Braeunlich
PhD student

Department of Ecology, Environment and Plant Sciences
Stockholm University
Svante Arrheniusv. 21A
SE-106 91 Stockholm
Sweden/Sverige

Lives in Berlin.
For paper mail:
Katzbachstr. 21
D-10965, Berlin
Germany/Deutschland

E-mail: anna.zakris...@su.se
Tel work: +49-(0)3091541281
Mobile: +49-(0)15777374888
LinkedIn: http://se.linkedin.com/pub/anna-zakrisson-braeunlich/33/5a2/51b





[R] Differencing between 2 previous values

2014-07-28 Thread Pavneet Arora
Hello All,

I am trying to do a simple thing of calculating the absolute difference 
between 2 previous values. Since my original data consists of 30 rows, 
the column where I am storing my absolute difference values only consists 
of 29 rows (called "differ")! And I am having trouble cbind-ing the 
2 columns. Is there any way I can make the first row of the "differ" column 
NA?

So my data looks like following
dput(data)
structure(list(week = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 
29, 30), value = c(9.45, 7.99, 9.29, 11.66, 12.16, 10.18, 8.04, 
11.46, 9.2, 10.34, 9.03, 11.47, 10.51, 9.4, 10.08, 9.37, 10.62, 
10.31, 10, 13, 10.9, 9.33, 12.29, 11.5, 10.6, 11.08, 10.38, 11.62, 
11.31, 10.52)), .Names = c("week", "value"), row.names = c(NA, 
-30L), class = "data.frame")

This is how I calculate my "differ" column:
differ <- abs(diff(data$value))
Which gives me the following results:
 [1] 1.46 1.30 2.37 0.50 1.98 2.14 3.42 2.26 1.14 1.31 2.44 0.96
[13] 1.11 0.68 0.71 1.25 0.31 0.31 3.00 2.10 1.57 2.96 0.79 0.90
[25] 0.48 0.70 1.24 0.31 0.79

As you can see this only contains 29 rows, so when I try to cbind it to my 
current data, I have an error. 
cbind(differ,data)
Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 29, 30

What I ideally want is my new dataset to look as:
 Week  Value  Differ
    1   9.45      NA
    2   7.99    1.46
    3   9.29    1.30

And so on...




Re: [R] Differencing between 2 previous values

2014-07-28 Thread PIKAL Petr
Hi

see in line

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Pavneet Arora
 Sent: Monday, July 28, 2014 11:08 AM
 To: r-help@r-project.org
 Subject: [R] Differencing between 2 previous values

 Hello All,

 I am trying to do a simple thing of calculating the absolute difference
 between 2 previous values. Since my original data consists of 30 rows,
 the column where I am storing my absolute difference values only
 consists of 29 rows (called "differ")! And I am having trouble
 cbind-ing the 2 columns. Is there any way I can make the first row of
 the "differ" column NA?

 So my data looks like following
 dput(data)
 structure(list(week = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30), value
 = c(9.45, 7.99, 9.29, 11.66, 12.16, 10.18, 8.04, 11.46, 9.2, 10.34,
 9.03, 11.47, 10.51, 9.4, 10.08, 9.37, 10.62, 10.31, 10, 13, 10.9, 9.33,
 12.29, 11.5, 10.6, 11.08, 10.38, 11.62, 11.31, 10.52)), .Names =
 c("week", "value"), row.names = c(NA, -30L), class = "data.frame")

 This is how I calculate my "differ" column:
 differ <- abs(diff(data$value))

differ <- c(NA, abs(diff(data$value)))

Regards
Petr

 Which gives me the following results:
 [1] 1.46 1.30 2.37 0.50 1.98 2.14 3.42 2.26 1.14 1.31 2.44 0.96 [13]
 1.11 0.68 0.71 1.25 0.31 0.31 3.00 2.10 1.57 2.96 0.79 0.90 [25] 0.48
 0.70 1.24 0.31 0.79

 As you can see this only contains 29 rows, so when I try to cbind it to
 my current data, I have an error.
 cbind(differ,data)
 Error in data.frame(..., check.names = FALSE) :
   arguments imply differing number of rows: 29, 30

 What I ideally want is my new dataset to look as:
  Week  Value  Differ
     1   9.45      NA
     2   7.99    1.46
     3   9.29    1.30

 And so on...





Re: [R] Differencing between 2 previous values

2014-07-28 Thread peter dalgaard

On 28 Jul 2014, at 11:08 , Pavneet Arora pavneet.ar...@uk.rsagroup.com wrote:

 Hello All,
 
 I am trying to do a simple thing of calculating the absolute difference 
 between 2 previous values. Since my original data consists of 30 rows, 
 the column where I am storing my absolute difference values only consists 
 of 29 rows (called "differ")! And I am having trouble cbind-ing the 
 2 columns. Is there any way I can make the first row of the "differ" column 
 NA?
 
 So my data looks like following
 dput(data)
 structure(list(week = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 
 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 
 29, 30), value = c(9.45, 7.99, 9.29, 11.66, 12.16, 10.18, 8.04, 
 11.46, 9.2, 10.34, 9.03, 11.47, 10.51, 9.4, 10.08, 9.37, 10.62, 
 10.31, 10, 13, 10.9, 9.33, 12.29, 11.5, 10.6, 11.08, 10.38, 11.62, 
 11.31, 10.52)), .Names = c("week", "value"), row.names = c(NA, 
 -30L), class = "data.frame")
 
 This is how I calculate my "differ" column:
 differ <- abs(diff(data$value))
 Which gives me the following results:
 [1] 1.46 1.30 2.37 0.50 1.98 2.14 3.42 2.26 1.14 1.31 2.44 0.96
 [13] 1.11 0.68 0.71 1.25 0.31 0.31 3.00 2.10 1.57 2.96 0.79 0.90
 [25] 0.48 0.70 1.24 0.31 0.79
 
 As you can see this only contains 29 rows, so when I try to cbind it to my 
 current data, I have an error. 
 cbind(differ,data)
 Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 29, 30
 
 What I ideally want is my new dataset to look as:
  Week  Value  Differ
     1   9.45      NA
     2   7.99    1.46
     3   9.29    1.30

 And so on...

The straightforward way is 

data$Differ <- c(NA, abs(diff(data$value)))
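
or, equivalently, with transform() (an untested variant of the same idea,
written without indexing data twice):

data <- transform(data, Differ = c(NA, abs(diff(value))))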

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] Differencing between 2 previous values

2014-07-28 Thread Pavneet Arora
Thank you for that simple answer. Really appreciate it.



From:   PIKAL Petr petr.pi...@precheza.cz
To: Pavneet Arora/UK/RoyalSun@RoyalSun, r-help@r-project.org 
r-help@r-project.org
Date:   28/07/2014 10:26
Subject:RE: [R] Differencing between 2 previous values



Hi

see in line

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Pavneet Arora
 Sent: Monday, July 28, 2014 11:08 AM
 To: r-help@r-project.org
 Subject: [R] Differencing between 2 previous values

 Hello All,

 I am trying to do a simple thing of calculating the absolute difference
 between 2 previous values. Since my original data consists of 30 rows,
 the column where I am storing my absolute difference values only
 consists of 29 rows (called "differ")! And I am having trouble
 cbind-ing the 2 columns. Is there any way I can make the first row of
 the "differ" column NA?

 So my data looks like following
 dput(data)
 structure(list(week = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30), value
 = c(9.45, 7.99, 9.29, 11.66, 12.16, 10.18, 8.04, 11.46, 9.2, 10.34,
 9.03, 11.47, 10.51, 9.4, 10.08, 9.37, 10.62, 10.31, 10, 13, 10.9, 9.33,
 12.29, 11.5, 10.6, 11.08, 10.38, 11.62, 11.31, 10.52)), .Names =
 c("week", "value"), row.names = c(NA, -30L), class = "data.frame")

 This is how I calculate my "differ" column:
 differ <- abs(diff(data$value))

differ <- c(NA, abs(diff(data$value)))

Regards
Petr

 Which gives me the following results:
 [1] 1.46 1.30 2.37 0.50 1.98 2.14 3.42 2.26 1.14 1.31 2.44 0.96 [13]
 1.11 0.68 0.71 1.25 0.31 0.31 3.00 2.10 1.57 2.96 0.79 0.90 [25] 0.48
 0.70 1.24 0.31 0.79

 As you can see this only contains 29 rows, so when I try to cbind it to
 my current data, I have an error.
 cbind(differ,data)
 Error in data.frame(..., check.names = FALSE) :
   arguments imply differing number of rows: 29, 30

 What I ideally want is my new dataset to look as:
  Week  Value  Differ
     1   9.45      NA
     2   7.99    1.46
     3   9.29    1.30

 And so on...





Re: [R] Determine all specific same dates between two given dates

2014-07-28 Thread Frank S.
Many thanks for your help Uwe! 



Re: [R] R and external C library cannot open shared object file while LD_LIBRARY_PATH is set

2014-07-28 Thread Pierre Lindenbaum
Thanks, but that doesn't work: R cannot load a simple external library 
even if the full path to the directory is specified in LD_LIBRARY_PATH.


I posted a minimal example on gist.github:

https://gist.github.com/lindenb/7cd766cbb37de01f6cce

The simple C file is compiled but I'm not able to load the library.

Pierre


(...)

( cross-posted on SO: http://stackoverflow.com/questions/24955829/ )
I'm building a C extension for R, this library also uses the HDF5 library.

I compiled a dynamic library (gcc flags: -fPIC -shared -Wl,-soname,libmy.so);
the library seems to be loaded but R is still missing the symbols from the
hdf5 library:


I am not any kind of expert, so take this as just a vague possibility
from someone who just wants to try to help. But I am concerned about
the options in the above. In particular, you have
-Wl,-soname,libmy.so, which looks slightly wrong to me, based on the
man page for ld. I think that, perhaps, this should be:
-Wl,-soname=libmy.so. Notice the = instead of the ,






Re: [R] Function assignment

2014-07-28 Thread Florian Ryan

Thank you very much!

The idea was to use it with an external reference and kind of to write a 
constructor 
for an object which points at this external reference and behaves like an R 
object.

So it would be the same as if I just do
name <- function("name", someValuesForObjectConstruction)
but I don't have to provide the name twice.
Here "function" returns an object which stores the string "name"
and knows get, set, ...

and since with 
setReplaceMethod(f = "[", signature = "myExternList",
    definition = function(x, i = "character", j = "missing", y){
        setExternReference(i, y)
        return(x)
    })

I can get nicely the name inside the bracket (x["name"] <- ...)
it seemed possible to get "name" somehow out of a  name <-  assignment.
assignment.

Again thank you very much for hints and advice.

 

Florian Ryan
florian.r...@aim.com

 

 

-Original Message-
From: peter dalgaard pda...@gmail.com
To: Jeff Newmiller jdnew...@dcn.davis.ca.us
Cc: Florian Ryan florian.r...@aim.com; r-help r-help@r-project.org
Sent: Sat, Jul 26, 2014 10:04 pm
Subject: Re: [R] Function assignment



On 26 Jul 2014, at 17:01 , Jeff Newmiller jdnew...@dcn.davis.ca.us wrote:

 What an awful idea... that would lead to incredibly hard-to-debug programs. 
No, you cannot do that. What kind of problem has led you to want such a 
capability? Perhaps we can suggest a simpler way to think about your problem.

I agree that this is a silly idea, but I actually thought that it could be done 
by clever manipulation of the call stack. It can if you do the assignment with 
assign():

 foo <- function() sys.calls()[[1]][[2]]
 assign("z", foo())
 z
[1] "z"
 assign("bah", foo())
 bah
[1] "bah"

but if you do x <- foo(), there is no mention of x or "x" in sys.calls().

Anyways, functions that assume being called in a specific way are asking for 
trouble in all cases where they get called differently.

-pd



 ---
 Jeff NewmillerThe .   .  Go Live...
 DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
 Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
 /Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
 --- 
 Sent from my phone. Please excuse my brevity.
 
 On July 26, 2014 5:29:59 AM PDT, Florian Ryan florian.r...@aim.com wrote:
 Hello,
 
 I would like to use, inside a function, the name of the variable to
 which I assign the function's return value. Is that possible?
 e.g.
 
 foo <- function(){
   # some R magic not known to me
 }
 
 myVariableName <- foo()
 myVariableName
 [1] "myVariableName"
 
 Hope someone can help me.
 
 Thanks
 Florian
 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com









 





Re: [R] Is there a package for EFA with multiple groups?

2014-07-28 Thread Joshua Wiley
Hi Elizabeth,

In confirmatory factor analysis with multiple groups, the reason one needs
to estimate the models simultaneously is that, typically, one is interested
in applying constraints (e.g., forcing all or some of the factor loadings
to be equal across groups).  In exploratory factor analysis, constraints
are uncommon (they are somewhat un-exploratory).

I would suggest simply using the psych package and subsetting your data to
the particular group, as in:

efa( data = subset(data, Group == "Group1") )

efa( data = subset(data, Group == "Group2") )

etc.
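
For example, with the fa() function in psych (a sketch only -- the number of
factors, the rotation, and the idea that "Group" is a column of your data frame
are placeholders you would replace with your own choices):

library(psych)
fa(subset(data, Group == "Group1", select = -Group), nfactors = 2, rotate = "oblimin")
fa(subset(data, Group == "Group2", select = -Group), nfactors = 2, rotate = "oblimin")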

As you noted, lavaan will allow you to test multiple group CFAs, so if/when
you are ready to see whether the same configural factor structure or any
other level of invariance holds across your groups, you can use it.

Sincerely,

Josh




On Mon, Jul 28, 2014 at 2:46 PM, Elizabeth Barrett-Cheetham 
ebarrettcheet...@gmail.com wrote:

 Hello R users,

 I’m hoping to run an exploratory and confirmatory factor analysis on a
 psychology survey instrument. The data has been collected from
 multiple groups, and it’s likely that the data is hierarchical/has 2nd
 order factors.

 It appears that the lavaan package allows me to run a multiple group
 hierarchical confirmatory factor analysis. Yet, I can’t locate a
 package that can run the equivalent exploratory analysis.

 Could anyone please direct me to an appropriate package?

 Many thanks,

 Elizabeth





-- 
Joshua F. Wiley
Ph.D. Student, UCLA Department of Psychology
http://joshuawiley.com/
Senior Analyst, Elkhart Group Ltd.
http://elkhartgroup.com
Office: 260.673.5518




[R] using foumula to calculate a column in dataframe

2014-07-28 Thread Pavneet Arora
Hello All,
I need to calculate a column (Vupper) using a formula, but I am not sure 
how to. It will be easier to explain with an example. 

Again this is my dataset:
dput(nd)
structure(list(week = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 
29, 30), value = c(9.45, 7.99, 9.29, 11.66, 12.16, 10.18, 8.04, 
11.46, 9.2, 10.34, 9.03, 11.47, 10.51, 9.4, 10.08, 9.37, 10.62, 
10.31, 10, 13, 10.9, 9.33, 12.29, 11.5, 10.6, 11.08, 10.38, 11.62, 
11.31, 10.52), cusum = c(-0.551, -2.56, -3.27, -1.61, 0.549, 0.729, 
-1.23, 0.229, -0.572, -0.232, -1.2, 0.268, 0.778, 0.178, 0.258, 
-0.373, 0.246, 0.557, 0.557, 3.56, 4.46, 3.79, 6.08, 7.58, 8.18, 
9.26, 9.64, 11.26, 12.57, 13.09)), .Names = c("week", "value", 
"cusum"), row.names = c(NA, -30L), class = "data.frame")

I have some constants in my data. These are:
sigma =1, h = 5, k = 0.5

The formula requires me to start from the bottom row (the 30th in this case). 
The formula for the last row is: row 30's cusum value (13.09) + h (5) * 
sigma (1), giving the value 18.1.

Then the formula for the 29th row's Vupper uses the value of the 30th Vupper 
(18.1) + k (0.5) * sigma (1), giving the value 18.6.

Similarly the formula for the 28th row's Vupper uses the value of the 29th 
Vupper (18.6) + k (0.5) * sigma (1), giving the value 19.1.

And so on...

Also, is there any way to make the formula generalised using a loop or 
functions? Because I really don't want to have to re-write the program if 
my number of rows increases or decreases, or if I use another dataset.

So far my function looks like the following (without the Vupper formula in 
there):
vmask2 <- function(data, target, sigma, h, k){
  data$deviation <- data$value - target
  data$cusums <- cumsum(data$deviation)
  data$ma <- c(NA, abs(diff(data$value)))
  data$Vupper <- NA   # *not sure what to put here*

  data
}
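
My best guess so far (not verified beyond the three values worked out above) is
that, because each step back from the last row just adds k * sigma, the whole
column can be filled with one vectorised line inside vmask2, e.g.

  n <- nrow(data)
  data$Vupper <- data$cusums[n] + h * sigma + k * sigma * (n - seq_len(n))

Is that right, and is there a more general way to write this kind of bottom-up 
recursion?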




[R] Calculate depth from regular xyz grid for any coordinate within the grid

2014-07-28 Thread Kulupp

Dear R-experts,

I have a regular grid dataframe (here: the first 50 rows) :

# data frame (regular grid) with x, y (UTM-coordinates) and z (depth)
# x=UTM coordinates (easting, zone 32)
# y=UTM coordinates (northing, zone 32)
# z=river-depth (meters)
df <- data.frame(x=c(3454240, 3454240, 3454240, 3454240, 3454240, 
3454250, 3454250, 3454250, 3454250, 3454250, 3454250, 3454250, 3454250, 
3454250, 3454250, 3454250,
 3454250, 3454250, 3454260, 3454260, 3454260, 
3454260, 3454260, 3454260, 3454260, 3454260, 3454260, 3454260, 3454260, 
3454260, 3454260, 3454260,
 3454260, 3454260, 3454260, 3454260, 3454260, 
3454260, 3454260, 3454260, 3454270, 3454270, 3454270, 3454270, 3454270, 
3454270, 3454270, 3454270,

 3454270, 3454270),
 y=c(5970610, 5970620, 5970630, 5970640, 5970650, 
5970610, 5970620, 5970630, 5970640, 5970650, 5970660, 5970670, 5970680, 
5970690, 5970700, 5970710,
 5970720, 5970730, 5970610, 5970620, 5970630, 
5970640, 5970650, 5970660, 5970670, 5970680, 5970690, 5970700, 5970710, 
5970720, 5970730, 5970740,
 5970750, 5970760, 5970770, 5970780, 5970790, 
5970800, 5970810, 5970820, 5970610, 5970620, 5970630, 5970640, 5970650, 
5970660, 5970670, 5970680,

 5970690, 5970700),
 z= c(-1.5621, -1.5758, -1.5911, -1.6079, -1.6247, 
-1.5704, -1.5840, -1.5976, -1.6113, -1.6249, -1.6385, -1.6521, -1.6658, 
-1.6794, -1.6930, -1.7067,
  -1.7216, -1.7384, -1.5786, -1.5922, -1.6059, 
-1.6195, -1.6331, -1.6468, -1.6604, -1.6740, -1.6877, -1.7013, -1.7149, 
-1.7285, -1.7422, -1.7558,
  -1.7694, -1.7831, -1.7967, -1.8103, -1.8239, 
-1.8376, -1.8522, -1.8690, -1.5869, -1.6005, -1.6141, -1.6278, -1.6414, 
-1.6550, -1.6686, -1.6823,

  -1.6959, -1.7095))
head(df)
plot(df[,1:2], las=3)   # to show that it's a regular grid

My question: is there a function to calculate the depth of any 
coordinate pair (e.g. x=3454263, y=5970687) within the grid, e.g. by 
bilinear interpolation or any other meaningful method?


Thanks a lot for your help in anticipation

Best wishes

Thomas



Re: [R] Calculate depth from regular xyz grid for any coordinate within the grid

2014-07-28 Thread Sarah Goslee
Hi,

The area of statistics you're looking for is called geostatistics.
There are many R packages to conduct such analyses. See the Spatial
task view for some good starting points:

http://cran.r-project.org/web/views/Spatial.html

You'll need to do some homework to understand the various options and
which are best for your data. You might start with Inverse Distance
Weighting.

Sarah

On Mon, Jul 28, 2014 at 9:07 AM, Kulupp kul...@online.de wrote:
 Dear R-experts,

 I have a regular grid dataframe (here: the first 50 rows) :

 # data frame (regular grid) with x, y (UTM-coordinates) and z (depth)
 # x=UTM coordinates (easting, zone 32)
 # y=UTM coordinates (northing, zone 32)
 # z=river-depth (meters)
 df <- data.frame(x=c(3454240, 3454240, 3454240, 3454240, 3454240, 3454250,
 3454250, 3454250, 3454250, 3454250, 3454250, 3454250, 3454250, 3454250,
 3454250, 3454250,
  3454250, 3454250, 3454260, 3454260, 3454260, 3454260,
 3454260, 3454260, 3454260, 3454260, 3454260, 3454260, 3454260, 3454260,
 3454260, 3454260,
  3454260, 3454260, 3454260, 3454260, 3454260, 3454260,
 3454260, 3454260, 3454270, 3454270, 3454270, 3454270, 3454270, 3454270,
 3454270, 3454270,
  3454270, 3454270),
  y=c(5970610, 5970620, 5970630, 5970640, 5970650, 5970610,
 5970620, 5970630, 5970640, 5970650, 5970660, 5970670, 5970680, 5970690,
 5970700, 5970710,
  5970720, 5970730, 5970610, 5970620, 5970630, 5970640,
 5970650, 5970660, 5970670, 5970680, 5970690, 5970700, 5970710, 5970720,
 5970730, 5970740,
  5970750, 5970760, 5970770, 5970780, 5970790, 5970800,
 5970810, 5970820, 5970610, 5970620, 5970630, 5970640, 5970650, 5970660,
 5970670, 5970680,
  5970690, 5970700),
  z= c(-1.5621, -1.5758, -1.5911, -1.6079, -1.6247, -1.5704,
 -1.5840, -1.5976, -1.6113, -1.6249, -1.6385, -1.6521, -1.6658, -1.6794,
 -1.6930, -1.7067,
   -1.7216, -1.7384, -1.5786, -1.5922, -1.6059, -1.6195,
 -1.6331, -1.6468, -1.6604, -1.6740, -1.6877, -1.7013, -1.7149, -1.7285,
 -1.7422, -1.7558,
   -1.7694, -1.7831, -1.7967, -1.8103, -1.8239, -1.8376,
 -1.8522, -1.8690, -1.5869, -1.6005, -1.6141, -1.6278, -1.6414, -1.6550,
 -1.6686, -1.6823,
   -1.6959, -1.7095))
 head(df)
 plot(df[,1:2], las=3)   # to show that it's a regular grid

 My question: is there a function to calculate the depth of any coordinate
 pair (e.g. x=3454263, y=5970687) within the grid, e.g. by bilinear
 interpolation or any other meaningful method?

 Thanks a lot for your help in anticipation

 Best wishes

 Thomas

-- 
Sarah Goslee
http://www.functionaldiversity.org



[R] Is dataset headsize from MVA or HSAUR2 packages missing or am I missing something ?

2014-07-28 Thread ottorino
Dear R-helpers,
I've started the study of 

"An Introduction to Applied Multivariate Analysis with R" (Everitt and
Hothorn).

After loading the library, which depends on HSAUR2, it seems that the
dataset headsize is not available (nor are measure and exam).
Datasets from the same book are nevertheless available, such as
heptathlon, pottery, USairpollution

Am I missing something obvious here ?
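
(A quick way to see what a package actually ships as data is

data(package = "MVA")
data(package = "HSAUR2")

so perhaps headsize, measure and exam are created in the book's chapter
code/demos rather than stored as datasets?)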

Thanks in advance

-- 
Ottorino-Luca Pantani, Università di Firenze
Dip.to di Scienze delle Produzioni Agroalimentari e  
dell'Ambiente (DISPAA)
P.zle Cascine 28 50144 Firenze Italia
Debian 7.0 wheezy -- GNOME 3.4.2
GNU Emacs 24.4.1 (i486-pc-linux-gnu, GTK+ Version 2.24.10)
ESS version 12.04-4 -- R 3.1.0



Re: [R] Function assignment

2014-07-28 Thread Jeff Newmiller
I am sorry, but I don't follow your description beyond "use it with an external 
reference", because external references are external, while the destination of 
an assignment is an internal reference. If you want the destination to be an 
object which uses special knowledge (an external reference) to store the value 
(e.g. by an assignment function) into an external object, then the internal 
object should have PREVIOUSLY been constructed with that special knowledge 
about that external reference. Thus I don't follow why you want the external 
reference to be the destination of an assignment. The replace method seems much 
more suitable if you want to supply an external key in the course of the 
assignment.
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

On July 28, 2014 1:59:18 AM PDT, Florian Ryan florian.r...@aim.com wrote:

Thank you very much!

The idea was to use it with an external reference and kind of to write
a constructor 
for an object which points at this external reference and behaves like
an R object.

So it would be the same as if I just do
name <- function("name", someValuesForObjectConstruction)
but I don't have to provide the name twice.
Here "function" returns an object which stores the string "name"
and knows get, set, ...

and since with 
setReplaceMethod(f = "[", signature = "myExternList",
    definition = function(x, i = "character", j = "missing", y){
        setExternReference(i, y)
        return(x)
    })

I can get nicely the name inside the bracket (x["name"] <- ...)
it seemed possible to get "name" somehow out of a  name <-
assignment.

Again thank you very much for hints and advice.

 

Florian Ryan
florian.r...@aim.com

 

 

-Original Message-
From: peter dalgaard pda...@gmail.com
To: Jeff Newmiller jdnew...@dcn.davis.ca.us
Cc: Florian Ryan florian.r...@aim.com; r-help r-help@r-project.org
Sent: Sat, Jul 26, 2014 10:04 pm
Subject: Re: [R] Function assignment



On 26 Jul 2014, at 17:01 , Jeff Newmiller jdnew...@dcn.davis.ca.us
wrote:

 What an awful idea... that would lead to incredibly hard-to-debug
programs. 
No, you cannot do that. What kind of problem has led you to want such a

capability? Perhaps we can suggest a simpler way to think about your
problem.

I agree that this is a silly idea, but I actually thought that it could
be done 
by clever manipulation of the call stack. It can if you do the
assignment with 
assign():

 foo <- function() sys.calls()[[1]][[2]]
 assign("z", foo())
 z
[1] "z"
 assign("bah", foo())
 bah
[1] "bah"

but if you do x <- foo(), there is no mention of x or "x" in
sys.calls().

Anyways, functions that assume being called in a specific way are asking
for trouble 
in all cases where they get called differently.

-pd




---
 Jeff NewmillerThe .   .  Go
Live...
 DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live
Go...
  Live:   OO#.. Dead: OO#.. 
Playing
 Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
 /Software/Embedded Controllers)   .OO#.   .OO#. 
rocks...1k

---

 Sent from my phone. Please excuse my brevity.
 
 On July 26, 2014 5:29:59 AM PDT, Florian Ryan florian.r...@aim.com
wrote:
 Hello,
 
 I would like to use, inside a function, the name of the variable to
 which I assign the function's return value. Is that possible?
 e.g.
 
 foo <- function(){
   # some R magic not known to me
 }
 
 myVariableName <- foo()
 myVariableName
 [1] "myVariableName"
 
 Hope someone can help me.
 
 Thanks
 Florian
 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com












Re: [R] lattice, latticeExtra: Adding moving averages to double y plot

2014-07-28 Thread Duncan Mackay
Hi Anna

Not sure what you want exactly as I do not use themes.

Here is one way  to get your averages and points 

# combine averages into mydata
 mydata$mavg <-
   c(rep(NA, 4), madfStuff1[, 3],
     rep(NA, 4), subset(madfStuff2_3, Type == "stuff2", 3, drop = TRUE),
     rep(NA, 4), subset(madfStuff2_3, Type == "stuff3", 3, drop = TRUE))

 xyplot(Value ~ Year, mydata, groups = Type,
        allow.multiple = TRUE,
        distribute.type = TRUE,
        col = c("red", "blue", "cyan"),
        subscripts = TRUE,
        panel = panel.superpose,
        panel.groups = function(x, y, subscripts, ..., group.number) {
          panel.xyplot(x, y, ...)
          panel.xyplot(x, mydata[subscripts, "mavg"],
                       col = c("red", "blue", "cyan")[group.number], type = "l")
        }
 )

HTH
And now some sleep

Duncan

BTW package names are case sensitive like R

Duncan Mackay
Department of Agronomy and Soil Science
University of New England
Armidale NSW 2351
Email: home: mac...@northnet.com.au

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Anna Zakrisson Braeunlich
Sent: Monday, 28 July 2014 16:38
To: r-help@r-project.org
Subject: [R] lattice, latticeExtra: Adding moving averages to double y plot

Hi lattice users,

I would like to add 5-year moving averages to my double y-plot. I have three
factors that need to be plotted with moving averages in the same plot. One of
these reads off y-axis 1 and two read off y-axis 2. I have tried to use the
rollmean function from the zoo package, but I fail at inserting this into
lattice (I am not an experienced lattice user). I want to keep the data
points in the plot.
Find below dummy data and the script, as well as annotations further
describing my question.

thank you in advance!
Anna Zakrisson

mydata <- data.frame(
  Year = 1980:2009,
  Type = factor(rep(c("stuff1", "stuff2", "stuff3"), each = 10*3)),
  Value = rnorm(90, mean = seq(90),
                sd = rep(c(6, 7, 3), each = 10)))

library(Lattice)
library(LatticeExtra)

stuff1data <- mydata[(mydata$Type) %in% c("stuff1"), ]
stuff12_3data <- mydata[(mydata$Type) %in% c("stuff2", "stuff3"), ]


# make moving averages function using zoo and rollmean:
library(zoo)
library(plyr)

f <- function(d)
{
  require(zoo)
  data.frame(Year = d$Year[5:length(d$Year)],
             mavg = rollmean(d$Value, 5))
}

# Apply the function to each group as well as both data frames:
madfStuff1 <- ddply(stuff1data, "Type", f)
madfStuff2_3 <- ddply(stuff12_3data, "Type", f)

# Some styles:
myStripStyle - function(which.panel, factor.levels, ...) {
  panel.rect(0, 0, 1, 1,
 col = bgColors[which.panel],
 border = 1)
  panel.text(x = 0.5, y = 0.5,
 font=2,
 lab = factor.levels[which.panel],
 col = txtColors[which.panel])
}


myplot1 <- xyplot(Value ~ Year, data = stuff1data, col = "black",
                  lty = 1, pch = 1,
                  ylab = "sweets", strip.left = FALSE,
                  strip = myStripStyle,
                  xlab = "Year",
                  panel = function(x, y, ..., subscripts){
                    panel.xyplot(x, y, pch = 1, col = "black")
                    panel.lmline(x, y, col = "black", data = madfStuff1)
                    # here I presume that panel.lmline is wrong.
                    # I would like to have my 5 year moving average here,
                    # not a straight line.
                  })
myplot1


myplot2 <- xyplot(Value ~ Year, data = stuff12_3data, col = "black",
                  lty = 1, pch = 1,
                  ylab = "hours", strip.left = FALSE,
                  strip = myStripStyle,
                  xlab = "Year",
                  panel = function(x, y, ..., subscripts){
                    panel.xyplot(x, y, pch = c(2:3), col = "black")
                    ## what is this pch defining? Types?
                    # I would like to have different symbols and line types
                    # for stuff2 and stuff3
                    panel.lmline(x, y, col = "black", data = madfStuff2_3)
                    # wrong! Need my moving averages here!
                  })
myplot2

doubleYScale(myplot1, myplot2, style1 = 0, style2 = 0, add.ylab2 = TRUE,
             text = c("stuff1", "stuff2", "stuff3"), columns = 2,
             col = "black")

# problem here is that I end up with two lines. I need a double y-plot with
# one moving-average line that is read off y-axis 1 and two that are read
# off y-axis 2. I need to keep the data points in the plot.

update(trellis.last.object(),
       par.settings = simpleTheme(col = c("black", "black"), lty = c(1:3),
                                  pch = c(1:3)))
# how come that I only get lines in my legend text and not the symbols too?
# I thought pch would add symbols?!?


Anna Zakrisson Braeunlich
PhD student

Department of Ecology, Environment and Plant Sciences
Stockholm University
Svante Arrheniusv. 21A
SE-106 91 Stockholm
Sweden/Sverige

Lives in Berlin.
For paper mail:
Katzbachstr. 21
D-10965, Berlin
Germany/Deutschland

E-mail: anna.zakris...@su.se
Tel work: +49-(0)3091541281
Mobile: +49-(0)15777374888
LinkedIn: http://se.linkedin.com/pub/anna-zakrisson-braeunlich/33/5a2/51b


Re: [R] lattice, latticeExtra: Adding moving averages to double y plot

2014-07-28 Thread Sarah Goslee
An utterly perfect example of why one shouldn't send HTML mail to this list.

On Mon, Jul 28, 2014 at 11:18 AM, Duncan Mackay dulca...@bigpond.com wrote:
 [quoted message snipped -- mangled by the HTML-to-text conversion]

Re: [R] working on a data frame

2014-07-28 Thread Matthew
Thank you very much Peter, Bill and Petr for some great and quite 
elegant solutions. There is a lot I can learn from these.


Yes to your question, Bill, about the raw numbers: they are counts 
and they cannot be negative. The data is RNA sequencing data where 
there are approximately 32,000 genes being measured for changes between 
two conditions. There are some genes that are not present (cannot be 
measured) initially, but are present in the second condition, and the 
reverse is true also of some genes that are present initially and then 
not present in the second condition (these are often the most 
interesting genes). This makes it difficult to compare mathematically 
the changes of all genes, so it is common practice to change the 0's to 
1's and then redo the log2. 1 is considered sufficiently small; actually 
anything up to 3 or 5 could be just due to 'background noise' in the 
measurement process, but it is somewhat arbitrary.
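
So in practice, using Petr's trick on both count columns, I suppose the whole 
adjustment collapses to something like this (a sketch, not yet tested on the 
real data, with columns 8 and 9 as the two conditions and column 10 the log2 
ratio):

yourData[,10] <- log2( (yourData[,9] + (yourData[,9] == 0)) /
                       (yourData[,8] + (yourData[,8] == 0)) )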


Matthew

On 7/28/2014 2:43 AM, PIKAL Petr wrote:

Hi

I like to use logical values directly in computations if possible.

yourData[,10] <- yourData[,9]/(yourData[,8]+(yourData[,8]==0))

Logical values are automagically considered FALSE=0 and TRUE=1 and can be used 
in computations. If you really want to change 0 to 1 in column 8 you can use

yourData[,8] <- yourData[,8]+(yourData[,8]==0)

without ifelse stuff.

Regards
Petr



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
project.org] On Behalf Of William Dunlap
Sent: Friday, July 25, 2014 8:07 PM
To: Matthew
Cc: r-help@r-project.org
Subject: Re: [R] working on a data frame


if
yourData[,8]==0,
then
yourData[,8]==1, yourData[,10] <- yourData[,9]/yourData[,8]

You could express this in R as
is8Zero <- yourData[,8] == 0
yourData[is8Zero, 8] <- 1
yourData[is8Zero, 10] <- yourData[is8Zero, 9] / yourData[is8Zero, 8]
Note how logical (Boolean) values are used as subscripts - read the '['
as 'such that' when using logical subscripts.

There are many more ways to express the same thing.

(I am tempted to change the algorithm to avoid the divide by zero
problem by making the quotient (numerator + epsilon)/(denominator +
epsilon) where epsilon is a very small number.  I am assuming that the
raw numbers are counts or at least cannot be negative.)

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Fri, Jul 25, 2014 at 10:44 AM, Matthew
mccorm...@molbio.mgh.harvard.edu wrote:

Thank you for your comments, Peter.

A couple of questions.  Can I do something like the following ?

if
yourData[,8]==0,
then
yourData[,8]==1, yourData[,10] <- yourData[,9]/yourData[,8]


I think I am just going to have to learn more about R. I thought
getting into R would be like going from Perl to Python or Java etc.,
but it seems like R programming works differently.

Matthew


On 7/25/2014 12:06 AM, Peter Alspach wrote:

Tena koe Matthew

 Column 10 contains the result of the value in column 9 divided by
the value in column 8. If the value in column 8==0, then the division
can not be done, so I want to change the zero to a one in order to
do the division.

That being the case, think in terms of vectors, as Sarah says.  Try:

yourData[,10] <- yourData[,9]/yourData[,8]
yourData[yourData[,8]==0, 10] <- yourData[yourData[,8]==0, 9]

This doesn't change the 0 to 1 in column 8, but it doesn't appear you
actually need to do that.

HTH 

Peter Alspach

-Original Message-
From: r-help-boun...@r-project.org
[mailto:r-help-boun...@r-project.org]
On Behalf Of Matthew McCormack
Sent: Friday, 25 July 2014 3:16 p.m.
To: Sarah Goslee
Cc: r-help@r-project.org
Subject: Re: [R] working on a data frame


On 7/24/2014 8:52 PM, Sarah Goslee wrote:

Hi,

Your description isn't clear:

On Thursday, July 24, 2014, Matthew
mccorm...@molbio.mgh.harvard.edu

mailto:mccorm...@molbio.mgh.harvard.edu wrote:

  I am coming from the perspective of Excel and VBA scripts, but I
  would like to do the following in R.

  I have a data frame with 14 columns and 32,795 rows.

  I want to check the value in column 8 (row 1) to see if it is a 0.
  If it is not a zero, proceed to the next row and check the value
  for column 8.
  If it is a zero, then
  a) change the zero to a 1,
  b) divide the value in column 9 (row 1) by 1,


Row 1, or the row in which column 8 == 0?

All rows in which the value in column 8==0.

Why do you want to divide by 1?

Column 10 contains the result of the value in column 9 divided by
the value in column 8. If the value in column 8==0, then the division
can not be done, so I want to change the zero to a one in order to do
the division.

This is a fairly standard thing to do with this data. (The data are
measurements of amounts at two time points. Sometimes a thing will
not be present in the beginning (0), but very present at the later
time. Column 10 is the log2 of the change. Infinite is not an easy
number to work with, so it is common to change the 

Re: [R] Calculate depth from regular xyz grid for any coordinate within the grid

2014-07-28 Thread MacQueen, Don
I believe the interpp() function from the akima package will do what you
want.
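
For example, a minimal sketch of that approach (an illustration only, assuming
the akima package is installed and using the df from the post quoted below):

library(akima)
## pointwise bilinear interpolation at the query location
est <- interpp(df$x, df$y, df$z, xo = 3454263, yo = 5970687)
est$z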

-Don

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 7/28/14, 6:07 AM, Kulupp kul...@online.de wrote:

Dear R-experts,

I have a regular grid dataframe (here: the first 50 rows) :

# data frame (regular grid) with x, y (UTM-coordinates) and z (depth)
# x=UTM coordinates (easting, zone 32)
# y=UTM coordinates (northing, zone 32)
# z=river-depth (meters)
df <- data.frame(x=c(3454240, 3454240, 3454240, 3454240, 3454240,
3454250, 3454250, 3454250, 3454250, 3454250, 3454250, 3454250, 3454250,
3454250, 3454250, 3454250,
  3454250, 3454250, 3454260, 3454260, 3454260,
3454260, 3454260, 3454260, 3454260, 3454260, 3454260, 3454260, 3454260,
3454260, 3454260, 3454260,
  3454260, 3454260, 3454260, 3454260, 3454260,
3454260, 3454260, 3454260, 3454270, 3454270, 3454270, 3454270, 3454270,
3454270, 3454270, 3454270,
  3454270, 3454270),
  y=c(5970610, 5970620, 5970630, 5970640, 5970650,
5970610, 5970620, 5970630, 5970640, 5970650, 5970660, 5970670, 5970680,
5970690, 5970700, 5970710,
  5970720, 5970730, 5970610, 5970620, 5970630,
5970640, 5970650, 5970660, 5970670, 5970680, 5970690, 5970700, 5970710,
5970720, 5970730, 5970740,
  5970750, 5970760, 5970770, 5970780, 5970790,
5970800, 5970810, 5970820, 5970610, 5970620, 5970630, 5970640, 5970650,
5970660, 5970670, 5970680,
  5970690, 5970700),
  z= c(-1.5621, -1.5758, -1.5911, -1.6079, -1.6247,
-1.5704, -1.5840, -1.5976, -1.6113, -1.6249, -1.6385, -1.6521, -1.6658,
-1.6794, -1.6930, -1.7067,
   -1.7216, -1.7384, -1.5786, -1.5922, -1.6059,
-1.6195, -1.6331, -1.6468, -1.6604, -1.6740, -1.6877, -1.7013, -1.7149,
-1.7285, -1.7422, -1.7558,
   -1.7694, -1.7831, -1.7967, -1.8103, -1.8239,
-1.8376, -1.8522, -1.8690, -1.5869, -1.6005, -1.6141, -1.6278, -1.6414,
-1.6550, -1.6686, -1.6823,
   -1.6959, -1.7095))
head(df)
plot(df[,1:2], las=3)   # to show that it's a regular grid

My question: is there a function to calculate the depth of any
coordinate pair (e.g. x=3454263, y=5970687) within the grid, e.g. by
bilinear interpolation or any other meaningful method?

Thanks a lot for your help in anticipation

Best wishes

Thomas



Re: [R] Calculate depth from regular xyz grid for any coordinate within the grid

2014-07-28 Thread Michael Sumner
The raster package can readily provide bilinear interpolation:

library(raster)
r <- rasterFromXYZ(df)
## due diligence, just a guess here - you should check
## projection(r) <- "+proj=utm +zone=32 +datum=WGS84"

## coordinates to extract
m <- matrix(c(3454263, 5970687), ncol = 2)

extract(r, m, method = "bilinear")
[1] -1.686059

## compare with
extract(r, m, method = "simple")
-1.6877

See ?extract - the simplest usage is a query matrix of XY coordinates in
the projection used by your raster. It will helpfully transform queries
such as a Spatial*DataFrame if needed, as long as both the raster 'x' and
the query 'y' have sufficient projection metadata (and it's up to you to
make sure that's set right).
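
For instance, a sketch of that Spatial* query route (my assumption being that
the UTM zone 32 / WGS84 guess above is actually right for these data):

library(sp)
pts <- SpatialPoints(m, proj4string = CRS("+proj=utm +zone=32 +datum=WGS84"))
extract(r, pts, method = "bilinear")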

(Generally building a raster from XYZ data is sub-optimal since
there's so much redundancy in the XY coordinates, and so much room for
things to go wrong in between. But sometimes there's no better option.
)

Cheers, Mike.

On Mon, Jul 28, 2014 at 11:07 PM, Kulupp kul...@online.de wrote:
 Dear R-experts,

 I have a regular grid dataframe (here: the first 50 rows) :

 # data frame (regular grid) with x, y (UTM-coordinates) and z (depth)
 # x=UTM coordinates (easting, zone 32)
 # y=UTM coordinates (northing, zone 32)
 # z=river-depth (meters)
 df <- data.frame(x=c(3454240, 3454240, 3454240, 3454240, 3454240, 3454250,
 3454250, 3454250, 3454250, 3454250, 3454250, 3454250, 3454250, 3454250,
 3454250, 3454250,
  3454250, 3454250, 3454260, 3454260, 3454260, 3454260,
 3454260, 3454260, 3454260, 3454260, 3454260, 3454260, 3454260, 3454260,
 3454260, 3454260,
  3454260, 3454260, 3454260, 3454260, 3454260, 3454260,
 3454260, 3454260, 3454270, 3454270, 3454270, 3454270, 3454270, 3454270,
 3454270, 3454270,
  3454270, 3454270),
  y=c(5970610, 5970620, 5970630, 5970640, 5970650, 5970610,
 5970620, 5970630, 5970640, 5970650, 5970660, 5970670, 5970680, 5970690,
 5970700, 5970710,
  5970720, 5970730, 5970610, 5970620, 5970630, 5970640,
 5970650, 5970660, 5970670, 5970680, 5970690, 5970700, 5970710, 5970720,
 5970730, 5970740,
  5970750, 5970760, 5970770, 5970780, 5970790, 5970800,
 5970810, 5970820, 5970610, 5970620, 5970630, 5970640, 5970650, 5970660,
 5970670, 5970680,
  5970690, 5970700),
  z= c(-1.5621, -1.5758, -1.5911, -1.6079, -1.6247, -1.5704,
 -1.5840, -1.5976, -1.6113, -1.6249, -1.6385, -1.6521, -1.6658, -1.6794,
 -1.6930, -1.7067,
   -1.7216, -1.7384, -1.5786, -1.5922, -1.6059, -1.6195,
 -1.6331, -1.6468, -1.6604, -1.6740, -1.6877, -1.7013, -1.7149, -1.7285,
 -1.7422, -1.7558,
   -1.7694, -1.7831, -1.7967, -1.8103, -1.8239, -1.8376,
 -1.8522, -1.8690, -1.5869, -1.6005, -1.6141, -1.6278, -1.6414, -1.6550,
 -1.6686, -1.6823,
   -1.6959, -1.7095))
 head(df)
 plot(df[,1:2], las=3)   # to show that it's a regular grid

 My question: is there a function to calculate the depth of any coordinate
 pair (e.g. x=3454263, y=5970687) within the grid, e.g. by bilinear
 interpolation or any other meaningful method?

 Thanks a lot for your help in anticipation

 Best wishes

 Thomas




-- 
Michael Sumner
Software and Database Engineer
Australian Antarctic Division
Hobart, Australia
e-mail: mdsum...@gmail.com



[R] rgl.postscript doesn't show the colors correctly

2014-07-28 Thread Ferra Xu
I wrote this code in R, in order to plot a kernel smoothing density estimate
and then save the plot as an EPS file:



library(ks)
library(rgl)
kern <- read.table(file.choose(), sep=",")
hat <- kde(kern)
plot(hat, drawpoints=TRUE, xlab="x", ylab="y", zlab="z")
rgl.postscript("plot1.eps", "eps", drawText=TRUE)

The problem is that when I insert that EPS file in LaTeX, the colors of the
plot are not the same as in the plot generated in R: it just shows the plot in
one color (yellow) instead of a range of colors (yellow, orange, red...)
showing the different densities...


[R] Split PVClust plot

2014-07-28 Thread Worthington, Thomas A
Dear All 

I'm using PVClust to perform hierarchical clustering. For the output plot I can
control most of the graphical parameters I need; however, the plot is large and I
would like to split it vertically into two panels, one above the other. Is there a
way to plot only part of a PVClust plot? I tried to convert it to a dendrogram with
result2  = as.dendrogram(result)

however I get the error message no applicable method for 'as.dendrogram' 
applied to an object of class pvclust. I also wondered whether it would be 
possible to convert to a phylogenetic tree and use the functions in the 'ape' 
package?

Any suggestion on how to split up a PVclust plot would be greatly appreciated  
(code for the plot below)

Thanks
Tom 


result <- pvclust(df.1, method.dist="uncentered",
method.hclust="average", nboot=10)
par(mar=c(0,0,0,0))
par(oma=c(0,0,0,0))
plot(result, print.pv=FALSE, col.pv=c("red","",""), print.num=FALSE, float =
0.02, font=1,
axes=T, cex=0.85, main="", sub="", xlab="", ylab="", labels=NULL,
hang=-1)
pvrect(result, alpha=0.95)
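
One possibility (a sketch, not something confirmed in the thread): the pvclust
object stores the underlying hclust tree in result$hclust, so the coercions can
start from that component rather than from the pvclust object itself:

result2 <- as.dendrogram(result$hclust)   # hclust objects have an as.dendrogram method
library(ape)
phy <- as.phylo(result$hclust)            # ape can convert an hclust tree to a phylo tree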



Re: [R] using foumula to calculate a column in dataframe

2014-07-28 Thread Jeff Newmiller

On Mon, 28 Jul 2014, Pavneet Arora wrote:


Hello All,
I need to calculate a column (Vupper) using a formula, but I am not sure
how to. It will be easier to explain with an example.

Again this is my dataset:
dput(nd)
structure(list(week = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
29, 30), value = c(9.45, 7.99, 9.29, 11.66, 12.16, 10.18, 8.04,
11.46, 9.2, 10.34, 9.03, 11.47, 10.51, 9.4, 10.08, 9.37, 10.62,
10.31, 10, 13, 10.9, 9.33, 12.29, 11.5, 10.6, 11.08, 10.38, 11.62,
11.31, 10.52), cusum = c(-0.551, -2.56, -3.27, -1.61,
0.549, 0.729, -1.23, 0.229,
-0.572, -0.232, -1.2, 0.268,
0.778, 0.178, 0.258,
-0.373,
0.246, 0.557, 0.557, 3.56,
4.46, 3.79, 6.08, 7.58, 8.18, 9.26, 9.64, 11.26, 12.57, 13.09
)), .Names = c("week", "value", "cusum"), row.names = c(NA, -30L
), class = "data.frame")

I have some constants in my data. These are:
sigma =1, h = 5, k = 0.5

The formula requires me to start from the bottom row (30th in this case).
The formula for the last row will be: row 30's cusum value (13.09) + h(5) *
sigma(1), giving me the value of 18.1

Then the formula for the 29th row's Vupper uses the value of the 30th Vupper
(18.1) + k(0.5) * sigma(1), giving me the value of 18.6

Similarly the formula for the 28th row's Vupper will use the value of the 29th
Vupper (18.6) + k(0.5) * sigma(1), giving me the value of 19.1

And so on...


This is a recurrence formula... each value depends on the previous value 
in the sequence. In general these can be computationally expensive in R, 
but there are certain very common cases that have built-in functions with 
which you can build many of the real-world cases you might encounter 
(such as this one).




Also, is there any way to make the formula generalised using a loop or
functions? Because I really don't want to have to re-write the program if
my number of rows increases or decreases or if I use another dataset.

So far my function looks like following (Without the Vupper formula in
there):
vmask2 <- function(data,target,sigma,h,k){
 data$deviation <- data$value - target
 data$cusums <- cumsum(data$deviation)
 data$ma <- c(NA,abs(diff(data$value)))
 data$Vupper <- *not sure what to put here*

 data
}


I avoid using the variable name data because there is a base function of 
that name.


sigma - 1
h - 5
k - 0.5

dta$Vupper <- rev( cumsum( c( dta[ nrow(dta), "cusum" ] + h * sigma
, rep( 0, nrow(dta) - 1 )
)
+ seq( 0, by=k * sigma, length.out=30L )
 )
 )

Note how the terms in your algorithm are re-grouped into vectors that c 
and seq and rep can generate, and cumsum is used to implement the 
recurrence, and the rev function is used to reverse the vector.
If you are going to apply this to long sequences of data, you might want 
to fix the accumulation of floating-point error in the seq call by using 
integers:


dta$Vupper <- rev( cumsum( c( dta[ nrow(dta), "cusum" ] + h * sigma
, rep( 0, nrow(dta) - 1 )
)
+ k * sigma * seq( 0L, by=1L, length.out=30L )
 )
 )




Please send your emails in plain text, as the Posting Guide requests. HTML 
often corrupts what you send to the list.


---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k


Re: [R] using foumula to calculate a column in dataframe

2014-07-28 Thread Jeff Newmiller

On Mon, 28 Jul 2014, Jeff Newmiller wrote:


On Mon, 28 Jul 2014, Pavneet Arora wrote:


Hello All,
I need to calculate a column (Vupper) using a formula, but I am not sure
how to. It will be easier to explain with an example.

Again this is my dataset:
dput(nd)
structure(list(week = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
29, 30), value = c(9.45, 7.99, 9.29, 11.66, 12.16, 10.18, 8.04,
11.46, 9.2, 10.34, 9.03, 11.47, 10.51, 9.4, 10.08, 9.37, 10.62,
10.31, 10, 13, 10.9, 9.33, 12.29, 11.5, 10.6, 11.08, 10.38, 11.62,
11.31, 10.52), cusum = c(-0.551, -2.56, -3.27, -1.61,
0.549, 0.729, -1.23, 0.229,
-0.572, -0.232, -1.2, 0.268,
0.778, 0.178, 0.258,
-0.373,
0.246, 0.557, 0.557, 3.56,
4.46, 3.79, 6.08, 7.58, 8.18, 9.26, 9.64, 11.26, 12.57, 13.09
)), .Names = c("week", "value", "cusum"), row.names = c(NA, -30L
), class = "data.frame")

I have some constants in my data. These are:
sigma =1, h = 5, k = 0.5

The formula requires me to start from the bottom row (30th in this case).
The formula for the last row will be: row 30's cusum value (13.09) + h(5) *
sigma(1), giving me the value of 18.1

Then the formula for the 29th row's Vupper uses the value of the 30th Vupper
(18.1) + k(0.5) * sigma(1), giving me the value of 18.6

Similarly the formula for the 28th row's Vupper will use the value of the 29th
Vupper (18.6) + k(0.5) * sigma(1), giving me the value of 19.1

And so on...


This is a recurrence formula... each value depends on the previous value in 
the sequence. In general these can be computationally expensive in R, but 
there are certain very common cases that have built-in functions with which 
you can build many of the real-world cases you might encounter (such as this 
one).




Also, is there any way to make the formula generalised using a loop or
functions? Because I really don't want to have to re-write the program if
my number of rows increases or decreases or if I use another dataset.

So far my function looks like following (Without the Vupper formula in
there):
vmask2 <- function(data,target,sigma,h,k){
 data$deviation <- data$value - target
 data$cusums <- cumsum(data$deviation)
 data$ma <- c(NA,abs(diff(data$value)))
 data$Vupper <- *not sure what to put here*

 data
}


I avoid using the variable name data because there is a base function of 
that name.


sigma - 1
h - 5
k - 0.5

dta$Vupper <- rev( cumsum( c( dta[ nrow(dta), "cusum" ] + h * sigma
   , rep( 0, nrow(dta) - 1 )
   )
   + seq( 0, by=k * sigma, length.out=30L )
)
)


Oops... accounted for accumulation twice, once with cumsum and once with 
seq.


dta$Vupper <- rev( rep( dta[ nrow(dta), "cusum" ] + h * sigma, nrow(dta) )
 + k * sigma * seq( 0L, by=1L, length.out=30L )
 )
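
For comparison, a plain loop version of the same recurrence (a sketch only;
slower, but easy to check line by line against the description above):

Vupper <- numeric(nrow(dta))
Vupper[nrow(dta)] <- dta[nrow(dta), "cusum"] + h * sigma   # last row
for (i in (nrow(dta) - 1):1) {                             # walk upwards
  Vupper[i] <- Vupper[i + 1] + k * sigma
}
dta$Vupper <- Vupper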


Note how the terms in your algorithm are re-grouped into vectors that c and 
seq and rep can generate, and cumsum is used to implement the recurrence, and 
the rev function is used to reverse the vector.
If you are going to apply this to long sequences of data, you might want to 
fix the accumulation of floating-point error in the seq call by using 
integers:


dta$Vupper <- rev( cumsum( c( dta[ nrow(dta), "cusum" ] + h * sigma
   , rep( 0, nrow(dta) - 1 )
   )
   + k * sigma * seq( 0L, by=1L, length.out=30L )
)
)




Please send your emails in plain text, as the Posting Guide requests. HTML 
often corrupts what you send to the list.


---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
 Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   

Re: [R] lattice, latticeExtra: Adding moving averages to double y plot

2014-07-28 Thread Duncan Mackay
I do not know what happened to my last email, as this is set up as plain
text, so I am sending the code again and hope this works.

I am not sure what you wanted exactly but this will plot the points and
lines of the average.
 
I have not worried about the 2nd axis

Here is one way of doing things, by combining the averages into the
dataframe.
It makes it easier that way as you do not have to match up the x values.

# combine averages into mydata
mydata$mavg <-
  c(rep(NA,4), madfStuff1[,3],
    rep(NA,4), subset(madfStuff2_3, Type == "stuff2", 3, drop = T),
    rep(NA,4), subset(madfStuff2_3, Type == "stuff3", 3, drop = T))

xyplot(Value ~ Year, mydata, groups = Type,
       allow.multiple = T,
       distribute.type = TRUE,
       col = c("red","blue","cyan"),
       subscripts = TRUE,
       panel = panel.superpose,
       panel.groups = function(x, y, subscripts, ..., group.number) {
         panel.xyplot(x, y, ...)
         panel.xyplot(x, mydata[subscripts, "mavg"],
                      col = c("red","blue","cyan")[group.number], type = "l")
       })

Duncan

BTW, library names are case sensitive as well. Is it your editor putting in capitals?

Duncan Mackay
Department of Agronomy and Soil Science
University of New England
Armidale NSW 2351
Email: home: mac...@northnet.com.au

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Anna Zakrisson Braeunlich
Sent: Monday, 28 July 2014 16:38
To: r-help@r-project.org
Subject: [R] lattice, latticeExtra: Adding moving averages to double y plot

Hi lattice users,

I would like to add 5-year moving averages to my double y-plot. I have three
factors that need to be plotted with moving averages in the same plot. One of
these reads off y-axis 1 and two off y-axis 2. I have tried to use the
rollmean function from the zoo package, but I fail at inserting this into
lattice (I am not an experienced lattice user). I want to keep the data
points in the plot.
Find below dummy data and the script, as well as annotations further
describing my question.

thank you in advance!
Anna Zakrisson

mydata <- data.frame(
  Year = 1980:2009,
  Type = factor(rep(c("stuff1", "stuff2", "stuff3"), each = 10*3)),
  Value = rnorm(90, mean = seq(90),
                sd = rep(c(6, 7, 3), each = 10)))

library(Lattice)
library(LatticeExtra)

stuff1data <- mydata[(mydata$Type) %in% c("stuff1"), ]
stuff12_3data <- mydata[(mydata$Type) %in% c("stuff2", "stuff3"), ]


# make moving averages function using zoo and rollmean:
library(zoo)
library(plyr)

f <- function(d)
{
  require(zoo)
  data.frame(Year = d$Year[5:length(d$Year)],
             mavg = rollmean(d$Value, 5))
}

# Apply the function to each group as well as both data frames:
madfStuff1 <- ddply(stuff1data, "Type", f)
madfStuff2_3 <- ddply(stuff12_3data, "Type", f)

# Some styles:
myStripStyle - function(which.panel, factor.levels, ...) {
  panel.rect(0, 0, 1, 1,
 col = bgColors[which.panel],
 border = 1)
  panel.text(x = 0.5, y = 0.5,
 font=2,
 lab = factor.levels[which.panel],
 col = txtColors[which.panel])
}


myplot1 <- xyplot(Value ~ Year, data = stuff1data, col="black",
                  lty=1, pch=1,
                  ylab = "sweets", strip.left = F,
                  strip=myStripStyle,
                  xlab = ("Year"),
                  panel = function(x,y,...,subscripts){
                    panel.xyplot(x, y, pch = 1, col = "black")
                    panel.lmline(x, y, col = "black", data=madfStuff1)
                    # here I presume that panel.lmline is wrong.
                    # I would like to have my 5 year moving average here,
                    # not a straight line.
                  })
myplot1


myplot2 <- xyplot(Value ~ Year, data = stuff12_3data, col="black",
                  lty=1, pch=1,
                  ylab = "hours", strip.left = F,
                  strip=myStripStyle,
                  xlab = ("Year"),
                  panel = function(x,y,...,subscripts){
                    panel.xyplot(x, y, pch = c(2:3), col = "black")
                    ## what is this pch defining? Types?
                    # I would like to have different symbols and line types
                    # for stuff2 and stuff3
                    panel.lmline(x, y, col = "black", data=madfStuff2_3)
                    # wrong! Need my moving averages here!
                  })
myplot2

doubleYScale(myplot1, myplot2, style1 = 0, style2=0, add.ylab2 = TRUE,
             text = c("stuff1", "stuff2", "stuff3"), columns = 2,
             col="black")

# problem here is that I end up with two lines. I need a double y-plot with
# one moving average plot that is read off y-axis 1 and two that are read off
# y-axis 2. I need to keep the data points in the plot.

update(trellis.last.object(),
       par.settings = simpleTheme(col = c("black", "black"), lty=c(1:3),
                                  pch=c(1:3)))
# how come I only get lines in my legend text and not the symbols too?
# I thought pch would add symbols?!?


Anna Zakrisson Braeunlich
PhD student

Department of Ecology, Environment and Plant Sciences
Stockholm University
Svante Arrheniusv. 

[R] interactive labeling/highlighting on multiple xy scatter plots

2014-07-28 Thread Shi, Tao
hi list,

I'm comparing the changes of ~100 analytes in multiple treatment conditions.  I 
plotted them in several different xy scatter plots.  It would be nice if, when I 
mouse over one point on one scatter plot, the label of the analyte on that 
scatter plot AS WELL AS on all other scatter plots were automatically shown. 
 I know brushing in rggobi does this, but its interface is not good and it 
needs R or ggobi to run (I want to send the results to the collaborators and let 
them play with it without the need to install R or ggobi on their 
machine).  rCharts is nice but so far it can only create one scatter plot at a 
time. 

Any good suggestions?

Many thanks!

Tao



[R] Error in validObject(.Object) : while running rqpd package

2014-07-28 Thread Vishal Chari
Hi 

I have installed rqpd on R 2.15.1, Windows 7 OS.
After loading the rqpd package I get the following output:

Loading required package: quantreg
Loading required package: SparseM

Attaching package: ‘SparseM’

The following object(s) are masked from ‘package:base’:

    backsolve

Loading required package: MatrixModels
Loading required package: Matrix
Loading required package: lattice

Attaching package: ‘Matrix’

The following object(s) are masked from ‘package:SparseM’:

    det

Loading required package: Formula
Warning messages:
1: package ‘quantreg’ was built under R version 2.15.3 
2: package ‘SparseM’ was built under R version 2.15.3 
3: package ‘MatrixModels’ was built under R version 2.15.3 
4: package ‘lattice’ was built under R version 2.15.3 
5: package ‘Formula’ was built under R version 2.15.3 
6: In rm(.First.lib, envir = myEnv) : object '.First.lib' not found

which I fix by running the following command:

 as.environment(match("package:rqpd", search()))
<environment: package:rqpd>
attr(,"name")
[1] "package:rqpd"
attr(,"path")
[1] "C:/Users/fossil/Documents/R/win-library/2.15/rqpd"


I tried running the example file:
 data(bwd)
 cre.form <- dbirwt ~ smoke + dmage + agesq + novisit + pretri2 + pretri3 | 
 momid3 | smoke + dmage + agesq 
 crem.fit <- rqpd(cre.form, panel(method="cre"), data=bwd)

and I get the following error:


Error in validObject(.Object) : 
  invalid class “dsparseModelMatrix” object: superclass mMatrix not 
defined in the environment of the object's class


Do I need to install 2.15.3?
How can I solve this problem?

Please help.
Thanks in advance.

vishal


[R] outputting R loop to a csv file

2014-07-28 Thread Jenny Jiang
Hello,


My name is Jenny Jiang and I am a Finance Honours research student from the 
University of New South Wales. Currently my research project involves the 
calculating of some network centrality measures in R by using a loop, however I 
am having some trouble outputting my loop results to a desired CSV format.


Basically what I am doing is that for each firm year, I will need to calculate 
four different measures based on director id and connected director id and 
output these to the CSV file. I have provided in the attachment the code that I 
used for the R loop and CSV outputting (main-6.R). Using an example CSV file 
(data example 2), the output result I get is as shown in measure1.csv. As shown 
in the output file, the results are really messy, where for each firm year, all 
director ids and each type of measure for all directors are displayed in one 
cell. However, the desired format of output that I would like is as shown in 
output data template.xlsx.


As a result, I was just wondering if you would be able to help me get the 
desired format I would like, which would make it much easier for me to do 
further research on this.


I could not be more appreciative.




Best regards


Jenny


Re: [R] outputting R loop to a csv file

2014-07-28 Thread David L Carlson
It will be difficult to help since all of the attached files were stripped out 
of your message. R-help accepts very few formats as attached files and they do 
not include .R or .csv or .xlsx, but they do include .txt (so you could rename 
your R and csv files). It will be easier to help if we have enough data to test 
alternate approaches. The function dput() will convert a sample of your data to 
text format so that you can paste it into your email or provide it as a .txt 
file.
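
For example (a sketch; "yourData" is just a placeholder name for your data frame):

dput(head(yourData, 20))   # paste the printed output into your reply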

David L. Carlson
Department of Anthropology
Texas AM University

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Jenny Jiang
Sent: Monday, July 28, 2014 8:48 PM
To: r-help@R-project.org
Subject: [R] outputting R loop to a csv file

Hello,


My name is Jenny Jiang and I am a Finance Honours research student from the 
University of New South Wales. Currently my research project involves the 
calculating of some network centrality measures in R by using a loop, however I 
am having some trouble outputting my loop results to a desired CSV format.


Basically what I am doing is that for each firm year, I will need to calculate 
four different measures based on director id and connected director id and 
output these to the CSV file. I have provided in the attachment the code that I 
used for the R loop and CSV outputting (main-6.R). Using an example CSV file 
(data example 2), the output result I get is as shown in measure1.csv. As shown 
in the output file, the results are really messy, where for each firm year, all 
director ids and each type of measure for all directors are displayed in one 
cell. However, the desired format of output that I would like is as shown in 
output data template.xlsx.


As a result, I was just wondering if you would be able to help me get the 
desired format I would like, which would make it much easier for me to do 
further research on this.


I could not be more appreciative.




Best regards


Jenny



Re: [R] outputting R loop to a csv file

2014-07-28 Thread PIKAL Petr
Hi

Beyond what David said, there is a chance that you do not need a loop for your 
computation at all.

From what you describe about your csv files there seems to be some mismatch in 
your write.csv statement.

Make a small code example together with a data set, preferably as output from 
dput, and try asking again.

Regards
Petr

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of David L Carlson
 Sent: Tuesday, July 29, 2014 5:25 AM
 To: Jenny Jiang; r-help@R-project.org
 Subject: Re: [R] outputting R loop to a csv file

 It will be difficult to help since all of the attached files were
 stripped out of your message. R-help accepts very few formats as
 attached files and they do not include .R or .csv or .xlsx, but they do
 include .txt (so you could rename your R and csv files). It will be
 easier to help if we have enough data to test alternate approaches. The
 function dput() will convert a sample of your data to text format so
 that you can paste it into your email or provide it as a .txt file.

 David L. Carlson
 Department of Anthropology
 Texas AM University

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Jenny Jiang
 Sent: Monday, July 28, 2014 8:48 PM
 To: r-help@R-project.org
 Subject: [R] outputting R loop to a csv file

 Hello,


 My name is Jenny Jiang and I am a Finance Honours research student from
 the University of New South Wales. Currently my research project
 involves the calculating of some network centrality measures in R by
 using a loop, however I am having some trouble outputting my loop
 results to a desired CSV format.


 Basically what I am doing is that for each firm year, I will need to
 calculate four different measures based on director id and connected
 director id and output these to the CSV file. I have provided in the
 attachment the code that I used for the R loop and CSV outputting
 (main-6.R). Using an example CSV file (data example 2), the output
 result I get is as shown in measure1.csv. As shown in the output file,
 the results are really messy, where for each firm year, all director
 ids and each type of measure for all directors are displayed in one
 cell. However, the desired format of output that I would like is as
 shown in output data template.xlsx.


 As a result, I was just wondering if you would be able to help me get
 the desired format I would like, which would make it much easier for me
 to do further research on this.


 I could not be more appreciative.




 Best regards


 Jenny


Re: [R-es] wordcloud and word table

2014-07-28 Thread Alfredo David Alvarado Ríos
Good afternoon, group. Thank you, Carlos, for your guidance, and Eduardo too.
Indeed, I followed the wordcloud example and, as before, I managed to
build the word cloud, but only for each of the texts separately.
I have the two cleaned corpora, one for each of the reports I am
considering: year 2005 and year 2013.

tdm05<-TermDocumentMatrix(cor.05.cl)
tdm13<-TermDocumentMatrix(cor.13.cl)
 m05<-as.matrix(tdm05)
 m13<-as.matrix(tdm13)
 v05 <- sort(rowSums(m05),decreasing=TRUE)
 v13 <- sort(rowSums(m13),decreasing=TRUE)
 df05<-data.frame(word = names(v05), freq=v05)
 df13<-data.frame(word = names(v13), freq=v13)
 wordcloud(df05$word,df05$freq)
There were 50 or more warnings (use warnings() to see the first 50)
 head(df05)
 word freq
seguridad   seguridad   56
ciudadana   ciudadana   40
funcionarios funcionarios   33
policiales policiales   32
nacional nacional   28
policial policial   28

 wordcloud(df13$word,df13$freq)
There were 34 warnings (use warnings() to see them)

 head(df13)
   word freq
seguridad seguridad   33
homicidios   homicidios   29
año año   27
país   país   21
inseguridad inseguridad   20
violencia violencia   20

As you can see, I can follow the procedure up to getting the wordcloud
for each report, but I cannot find a way to join the two documents so
that I can show them comparatively in two wordclouds.
On that point, from what I have read, the two documents are merged into
a single corpus, which should then contain both documents. I did that
with the reports, and indeed I could draw a single window with the
wordcloud of both reports.
However, when I try to apply colnames, the error message is still
"length of 'dimnames' [2] not equal to array extent", that is, as if
the columns could not be applied because there is only a single document.
So I ask once more for your valuable help with the following:
Is it after having both data frames (year 2005 and year 2013) that the
data should be joined? And should that be done with the Corpus command?
As I said, I tried joining them that way and got the message
"dimnames [2] not equal to array extent" at the step where the column
names are applied.
I also joined them earlier, as in the following example
http://www.webmining.cl/2014/05/text-mining-comparacion-de-2-discursos-presidenciales-del-21-de-mayo-usando-r/
and again I could not apply colnames, nor get the matrix shape required
to put the years in the columns and the word counts in the rows.
I really have been studying R, reading several articles and reviewing
related material, but I cannot work out how to get this visualization.
Thanks again for your attention, and for your willingness to help.
Kind regards,





On 25 July 2014 at 00:16, Alfredo David Alvarado Ríos
david.alvarad...@gmail.com wrote:
 Good evening, group. Kind regards.

 I have kept looking for a way that would let me compare two documents,
 from the years 2005 and 2013, and finally represent them with a
 wordcloud and with a table in which the columns are the years of each
 report, 2005 and 2013, and the rows are the words with the frequency
 of each of them in each report:


 --
 ||  2005 | 2013  |
 --
 | terms    |   |   |
 --
 | terms    |   |   |
 --


 So, by searching and experimenting, and adapting from other people's
 examples, I managed to get to the following:

 ##

informes <- c("2013", "2005")
pathname <- "C:/Users/d_2/Documents/Comision/PLAN de INSPECCIONES/Informes/"

TDM <- function(informes, pathname) {
  info.dir <- sprintf("%s/%s", pathname, informes)
  info.cor <- Corpus(DirSource(directory=info.dir, encoding="UTF-8"))
  info.cor.cl <- tm_map(info.cor, content_transformer(tolower))
  info.cor.cl <- tm_map(info.cor.cl, stripWhitespace)
  info.cor.cl <- tm_map(info.cor.cl, removePunctuation)
  sw <- readLines("C:/Users/d_2/Documents/StopWords.txt", encoding="UTF-8")
  sw <- iconv(enc2utf8(sw), sub = "byte")
  info.cor.cl <- tm_map(info.cor.cl, removeWords, stopwords("spanish"))
  info.tdm <- TermDocumentMatrix(info.cor.cl)
  result <- list(name = informes, tdm = info.tdm)
  }
tdm <- lapply(informes, TDM, path = pathname)

 Result:

 tdm
 [[1]]
 [[1]]$name
 [1] 2013
 [[1]]$tdm
 TermDocumentMatrix (terms: 1540, documents: 1)
 Non-/sparse entries: 1540/0
 Sparsity   : 0%
 Maximal term length: 18
 Weighting  : term frequency (tf)

 [[2]]
 [[2]]$name
 [1] 2005
 [[2]]$tdm
 TermDocumentMatrix (terms: 1849, documents: 1)
 Non-/sparse entries: 1849/0
 Sparsity   : 0%
 Maximal term length: 19
 Weighting  : term frequency (tf)

 str(tdm)
 List of 2
  $ :List of 2
   ..$ name:  2013
 

Re: [R-es] wordcloud and word table

2014-07-28 Thread Carlos Ortega
Hello,

The reference you included (thanks for providing it) is quite clear and
can be followed.
Have you been able to apply the same logic to your two speeches?

The way to clear up doubts, to start with, would be for you to attach the
code you are using, so we can see whether there is an obvious error. The
proper way for us to be able to help you, though, is with a reproducible
example: code + data.
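
For what it's worth, a minimal sketch (my assumption, building on the df05 and
df13 frequency tables from your message) of one way to get the words-by-years
matrix and draw both years at once with comparison.cloud() from the wordcloud
package:

both <- merge(df05, df13, by = "word", all = TRUE)   # align words across years
names(both) <- c("word", "2005", "2013")
both[is.na(both)] <- 0                               # missing words get frequency 0
m <- as.matrix(both[, c("2005", "2013")])
rownames(m) <- as.character(both$word)
library(wordcloud)
comparison.cloud(m, max.words = 100)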

Regards,
Carlos Ortega
www.qualityexcellence.es

