Re: [R] installing the lubricate package

2016-07-21 Thread Ismail SEZEN
You don't have to download and install from GitHub. You can easily install
the lubridate package from the CRAN repository. If you really intend to
install from GitHub, I advise you to install the devtools package first and
use its install_github function.

http://www.inside-r.org/packages/cran/devtools/docs/install_github
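
A minimal sketch of the two routes mentioned above. The helper name
install_if_missing and the mirror URL are illustrative assumptions; the calls
it wraps (install.packages, requireNamespace) are base R, and
devtools::install_github is the function named in the reply.

```r
# Hypothetical helper: install a package from CRAN only if it is not already there.
install_if_missing <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  # TRUE if the package can be loaded afterwards
  invisible(requireNamespace(pkg, quietly = TRUE))
}

if (interactive()) {
  # Released version from CRAN; the same call works on macOS, Windows and Linux
  install_if_missing("lubridate")

  # Development version from GitHub instead (needs devtools):
  # install_if_missing("devtools")
  # devtools::install_github("hadley/lubridate")
}
```

Installing from CRAN lets R pick the build that matches the platform, so there
is no need to download a Windows or Mac file by hand.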

On Fri, Jul 22, 2016, 03:36 lily li  wrote:

> Hi R users,
>
> I'm trying to download lubridate from this website, and then install it on
> my Mac.
> https://github.com/hadley/lubridate
>
> but it says the Windows version does not apply to Mac. How do I install the
> package on a Mac? Thanks.
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Re: [R] C/C++/Fortran Rolling Window Regressions

2016-07-21 Thread Mark Leeds
Hi Jeremiah: I think I wasn't that clear. I'm not suggesting the Kalman
filter to deal with time-varying coefficients. As Roy pointed out, one can
use the Kalman filter to do regular regression where one "sees" a new data
point as each time unit passes. It can be assumed that the coefficients do
not vary (basically by having no variance in the system equation).

The problem as I see it is that Duncan and Horn's approach (and Roy alluded
to this problem also) only deals with adding one point at a time to the
front of the data set. It doesn't handle the fact that you want to drop
the nth observation and everything older than that observation also.

I don't know how easy it would be to modify their approach to deal with the
fact that you are using a moving window rather than just adding one point
at a time.

The main point I wanted to get across here is that I was not suggesting the
KF as a way to handle varying coefficients. You can assume that they're
fixed and still use it. See the reference I pointed out for more on this
approach, and my apologies for the confusion.
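
A minimal sketch of the fixed-coefficient idea, written as recursive least
squares (the form the Kalman filter update takes when the system equation has
no variance). The simulated data and the names X, y, P, b and gain are
illustrative assumptions, not taken from the Duncan and Horn paper. It only
adds points one at a time; it does not drop old ones, which is exactly the
limitation described above.

```r
set.seed(1)
n <- 50
X <- cbind(1, rnorm(n))                       # intercept plus one regressor
y <- drop(X %*% c(2, -1)) + rnorm(n, sd = 0.1)

# Start from an exact fit on the first k rows.
k <- 5
P <- solve(crossprod(X[1:k, ]))               # (X'X)^{-1} for the first k rows
b <- drop(P %*% crossprod(X[1:k, ], y[1:k]))  # coefficients for the first k rows

# Fold in one new observation per step, without refitting from scratch.
for (t in (k + 1):n) {
  x    <- X[t, ]
  Px   <- drop(P %*% x)
  gain <- Px / (1 + sum(x * Px))              # gain vector for the new point
  b    <- b + gain * (y[t] - sum(x * b))      # correct by the prediction error
  P    <- P - tcrossprod(gain, Px)            # rank-one update of (X'X)^{-1}
}

# The recursive estimate should agree with a single batch fit on all rows.
b_batch <- unname(coef(lm(y ~ X[, 2])))
all.equal(b, b_batch)
```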

On Thu, Jul 21, 2016 at 5:43 PM, jeremiah rounds 
wrote:

> I agree that, when appropriate, Kalman filtering/smoothing is the
> higher-quality way to go about estimating a time-varying coefficient (given
> that is what they do), and I have noted that both the R package "dlm" and
> the function "StructTS" handle these problems quickly.  I am working on that
> in parallel.
>
> One of the things I am unsure about with Kalman filters is how to estimate
> variance parameters when the process is unusual in some way that isn't in
> the model and it is not feasible to adjust the model by hand.  dlm's dlmMLE
> seems to produce nonsense (not because of the author's work but because of
> assumptions).  At least with moving window regressions, after the unusual
> event is past your window the influence of that event is gone.  That
> isn't really a question for this group; it is more about me reading more.
> When I get that "how to handle all the strange things big data throws at
> you" worked out for Kalman filters, I will go back to those because I
> certainly like what I see when everything is right.  There is a plethora of
> related topics, right?  Bayesian model averaging, GARCH models for
> heteroscedasticity, etc.
>
> Anyway... roll::roll_lm, cheers!
>
> Thanks,
> Jeremiah
>
>
>
> On Thu, Jul 21, 2016 at 2:08 PM, Mark Leeds  wrote:
>
>> Hi Jeremiah: another possibly faster way would be to use a Kalman
>> filtering framework. I forget the details, but Duncan and Horn have a paper
>> which shows how a regression can be re-computed each time a new data point
>> is added. I forget if they handle taking one off of the back also, which is
>> what you need.
>>
>> The paper at the link below isn't the paper I'm talking about but it's
>> reference[1] in that paper. Note that this suggestion might not be a better
>> approach  than the various approaches already suggested so I wouldn't go
>> this route unless you're very interested.
>>
>>
>> Mark
>>
>> https://www.le.ac.uk/users/dsgp1/COURSES/MESOMET/ECMETXT/recurse.pdf
>>
>>
>>
>>
>>
>>
>> On Thu, Jul 21, 2016 at 4:28 PM, Gabor Grothendieck <
>> ggrothendi...@gmail.com> wrote:
>>
>>> I would be careful about making assumptions regarding what is faster.
>>> Performance tends to be nonintuitive.
>>>
>>> When I ran rollapply/lm, rollapply/fastLm and roll_lm on the example
>>> you provided rollapply/fastLm was three times faster than roll_lm.  Of
>>> course this could change with data of different dimensions but it
>>> would be worthwhile to do actual benchmarks before making assumptions.
>>>
>>> I also noticed that roll_lm did not give the same coefficients as the
>>> other two.
>>>
>>> set.seed(1)
>>> library(zoo)
>>> library(RcppArmadillo)
>>> library(roll)
>>> z <- zoo(matrix(rnorm(10), ncol = 2))
>>> colnames(z) <- c("y", "x")
>>>
>>> ## rolling regression of width 4
>>> library(rbenchmark)
>>> benchmark(
>>>   fastLm = rollapplyr(z, width = 4,
>>>     function(x) coef(fastLm(cbind(1, x[, 2]), x[, 1])),
>>>     by.column = FALSE),
>>>   lm = rollapplyr(z, width = 4,
>>>     function(x) coef(lm(y ~ x, data = as.data.frame(x))),
>>>     by.column = FALSE),
>>>   roll_lm = roll_lm(coredata(z[, 1, drop = F]),
>>>     coredata(z[, 2, drop = F]), 4,
>>>     center = FALSE))[1:4]
>>>
>>>
>>>      test replications elapsed relative
>>> 1  fastLm          100    0.22    1.000
>>> 2      lm          100    0.72    3.273
>>> 3 roll_lm          100    0.64    2.909
>>>
>>> On Thu, Jul 21, 2016 at 3:45 PM, jeremiah rounds
>>>  wrote:
>>> >  Thanks all.  roll::roll_lm was essentially what I wanted.  I think maybe
>>> > I would prefer it to have options to return a few more things, but it is
>>> > the coefficients, and the remaining statistics you might want can be
>>> > calculated fast enough from there.
>>> >
>>> >
>>> > On 

[R] adding new values to dataframe based on existing values in two columns

2016-07-21 Thread Alexander.Herr
Hiya,

I am trying to assign minimum values to a dataframe based on existing columns.  
I can do this via loops, but surely R has much more elegant solutions...

Here is my code:
set.seed(666)
xyz <- as.data.frame(cbind(x = rep(rpois(50, 10), 2) + 1,
                           y = rep(rpois(50, 10), 2) + 1,
                           z = runif(100, min = -3, max = 40)))
xyz <- xyz[order(xyz[, 1], xyz[, 2]), ]
unique(xyz[, 1:2])
dim(xyz)
# minimum of z for each x/y combination
mins <- aggregate(xyz[, 3], by = list(x = xyz[, 1], y = xyz[, 2]), min)

xyz$mins <- rep(NA, nrow(xyz))

# now assign min values to each xy combination
for (i in unique(xyz[, 1])) {
  mm <- mins[mins[, 1] == i, ]
  for (j in unique(mm[, 2])) {
    xyz[xyz[, 1] == i & xyz[, 2] == j, 4] <- mins[mins[, 1] == i & mins[, 2] == j, 3]
  }
}

Thanks and cheers
Herry
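
One possible loop-free version of the assignment above, as a sketch rather
than the only way to do it. It uses base R's ave(), which applies a function
within groups and returns a vector aligned with the original rows. The
grouping is passed as a single interaction(..., drop = TRUE) factor so that
min() is never called on an x/y combination that does not occur in the data.

```r
set.seed(666)
xyz <- data.frame(x = rep(rpois(50, 10), 2) + 1,
                  y = rep(rpois(50, 10), 2) + 1,
                  z = runif(100, min = -3, max = 40))
xyz <- xyz[order(xyz$x, xyz$y), ]

# One group per observed x/y pair; unused combinations are dropped.
grp <- interaction(xyz$x, xyz$y, drop = TRUE)

# Group-wise minimum of z, repeated for every row of its group.
xyz$mins <- ave(xyz$z, grp, FUN = min)

head(xyz)
```

The same result can also be reached by merging the aggregate() output back
onto xyz by x and y, which may read more naturally to SQL users.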




[R] Why the order of parameters in a logistic regression affects results significantly?

2016-07-21 Thread Qinghua He via R-help
Using the same data, if I run

fit2 <- glm(formula = AR ~ Age + LumA + LumB + HER2 + Basal + Normal,
            family = binomial, data = RacComp1)
summary(fit2)
exp(coef(fit2))

I obtain:

> exp(coef(fit2))
(Intercept)         Age        LumA        LumB        HER2       Basal      Normal
 0.24866935  1.00433781  0.10639937  0.31614001  0.08220685 20.25180956          NA

while if I run

fit2 <- glm(formula = AR ~ Age + LumA + LumB + Basal + Normal + HER2,
            family = binomial, data = RacComp1)
summary(fit2)
exp(coef(fit2))

I obtain:

> exp(coef(fit2))
 (Intercept)          Age         LumA         LumB        Basal       Normal         HER2
  0.02044232   1.00433781   1.29428846   3.84566516 246.35185956  12.16443690           NA

Essentially they're the same model; I just moved HER2 to the end. But the ORs
changed significantly. Can someone explain?
For the latter result, I don't even know how to interpret it, as all factors
have OR > 1 (except the intercept). How could that be possible? Can I
eliminate the effect of the intercept?
Also, I cannot obtain an OR for the last factor due to collinearity. However,
I know others obtained ORs for all factors for the same dataset. Can someone
tell me how to obtain ORs for all factors? All factors are categorical
variables (i.e., 0 or 1).
Thanks!
Peter
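
A small simulated illustration of what the two outputs above suggest, offered
as a sketch under assumptions: the data below are made up, and it is assumed
(not confirmed by the post) that the five 0/1 indicators are mutually
exclusive and sum to 1 for every row. In that case the last indicator in the
formula is aliased with the intercept, glm() reports NA for it, and it becomes
the reference level, so every other OR is relative to whichever indicator was
listed last. The posted numbers are consistent with this: 0.10639937 /
0.08220685 is about 1.294, the LumA value in the second fit.

```r
set.seed(42)
n <- 400
subtype <- factor(sample(c("LumA", "LumB", "HER2", "Basal", "Normal"),
                         n, replace = TRUE))

# One 0/1 indicator column per subtype; the five columns sum to 1 in each row.
d <- as.data.frame(model.matrix(~ subtype - 1))
names(d) <- sub("^subtype", "", names(d))
d$AR <- rbinom(n, 1, ifelse(subtype == "Basal", 0.6, 0.3))

fit_a <- glm(AR ~ LumA + LumB + HER2 + Basal + Normal, family = binomial, data = d)
fit_b <- glm(AR ~ LumA + LumB + Basal + Normal + HER2, family = binomial, data = d)

exp(coef(fit_a))   # Normal is NA: it is the reference level here
exp(coef(fit_b))   # HER2 is NA: now HER2 is the reference level

# Same model, different parameterization: the fitted probabilities agree.
all.equal(fitted(fit_a), fitted(fit_b))

# ORs in fit_b are the fit_a ORs divided by the fit_a OR for HER2.
exp(coef(fit_a)["LumA"] - coef(fit_a)["HER2"])
exp(coef(fit_b)["LumA"])
```

Under this assumption an OR for every level at once is not identifiable while
an intercept is in the model; one level always has to serve as the baseline.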

[R] installing the lubricate package

2016-07-21 Thread lily li
Hi R users,

I'm trying to download lubridate from this website, and then install it on
my Mac.
https://github.com/hadley/lubridate

but it says the Windows version does not apply to Mac. How do I install the
package on a Mac? Thanks.



Re: [R] C/C++/Fortran Rolling Window Regressions

2016-07-21 Thread Jean-Claude Arbaut
This may be useful:

Sven Hammarling and Craig Lucas
"Updating the QR factorization and the least squares problem"
http://eprints.ma.man.ac.uk/1192/01/covered/MIMS_ep2008_111.pdf
http://www.maths.manchester.ac.uk/~clucas/updating/

2016-07-21 20:02 GMT+02:00 jeremiah rounds :
> Hi,
>
> A not unusual task is performing a multiple regression in a rolling window
> on a time series.  A standard piece of advice for doing this in R is
> something like the code that follows at the end of the email.  I am currently
> using an "embed" variant of that code, and that piece of advice is out there
> too.
>
> But it occurs to me that for such an easily specified matrix operation,
> standard R code is really slow.  rollapply returns to the R
> interpreter at each window step for a new lm.  All lm is at its heart is
> (X^T X)^(-1) X^T y, and if you think about doing that with Rcpp in a rolling
> window you are just incrementing a counter and peeling off rows (or columns)
> of X and y of a particular window size, and following that up with some
> matrix multiplication in a loop.  The pseudo-code for that Rcpp
> practically writes itself, and you might want a wrapper of something like:
> rolling_lm(y=y, x=x, width=4).
>
> My question is this: has any of the thousands of R packages out there
> published anything like that?  Rolling window multiple regressions that
> stay in C/C++ until the rolling window completes?  No sense in writing it
> if it exists.
>
>
> Thanks,
> Jeremiah
>
> Standard (slow) advice for "rolling window regression" follows:
>
>
> library(zoo)
> set.seed(1)
> z <- zoo(matrix(rnorm(10), ncol = 2))
> colnames(z) <- c("y", "x")
>
> ## rolling regression of width 4
> rollapply(z, width = 4,
>           function(x) coef(lm(y ~ x, data = as.data.frame(x))),
>           by.column = FALSE, align = "right")
>
> ## result is identical to
> coef(lm(y ~ x, data = z[1:4,]))
> coef(lm(y ~ x, data = z[2:5,]))
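
For comparison, a rough base-R sketch of the window-by-window normal-equations
idea from the question. The name rolling_lm comes from the question's
hypothetical wrapper; the body is an assumption written for illustration. It
still loops in R, so it shows the arithmetic only and is not the code of any
package.

```r
# Hypothetical helper: OLS coefficients for every window of `width` rows.
rolling_lm <- function(y, x, width) {
  X <- cbind(1, x)                      # add an intercept column
  n <- length(y)
  out <- matrix(NA_real_, nrow = n - width + 1, ncol = ncol(X))
  for (i in seq_len(n - width + 1)) {
    idx <- i:(i + width - 1)            # rows of the current window
    Xw  <- X[idx, , drop = FALSE]
    # Solve (X'X) b = X'y for this window
    out[i, ] <- solve(crossprod(Xw), crossprod(Xw, y[idx]))
  }
  out
}

set.seed(1)
m <- matrix(rnorm(10), ncol = 2, dimnames = list(NULL, c("y", "x")))
coefs <- rolling_lm(m[, "y"], m[, "x"], width = 4)

# First window should match an ordinary lm() fit on rows 1 to 4.
first_lm <- unname(coef(lm(y ~ x, data = as.data.frame(m[1:4, ]))))
all.equal(coefs[1, ], first_lm)
```

Solving the normal equations directly is less numerically stable than the QR
approach lm() uses, which is one reason the QR-updating reference above may be
worth reading.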


[R] PDF extraction with tm package

2016-07-21 Thread Steven Kang
Hi R users,

I’m having some issues trying to extract text from a PDF file using the tm
package.

Here are the steps that were carried out:

1. Downloaded and installed the following programs:

- Xpdf (copied the ‘bin32’, ‘bin64’, and ‘doc’ folders into the ‘C:\Program
Files\Xpdf’ directory; also added C:\Program Files\Xpdf\bin64\pdfinfo.exe &
C:\Program Files\Xpdf\bin64\pdftotext.exe to the existing PATH)

- Tesseract

- Imagemagick

2. Used the following scripts and the corresponding error messages:

# Directory where PDF files are stored

>cname <- getwd()

>Corpus(DirSource(cname), readerControl=list(reader = readPDF))

Error in system2("pdftotext", c(control$text, shQuote(x), "-"), stdout =
TRUE) :
'"pdftotext"' not found

 In addition: Warning message:

running command '"pdfinfo" "C:\Users\R_Files\XXX.pdf"' had status 127

>file.exists(Sys.which(c("pdfinfo","pdftotext")))
[1] FALSE FALSE

It seems like R can’t find the pdfinfo & pdftotext exe files, but I'm not
sure why this would be the case, given that the Xpdf files were copied into
‘C:\Program Files’ (I’m using 64-bit Windows 7).
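
One thing worth checking, offered as an assumption rather than a confirmed
diagnosis: PATH entries are directories, not paths to individual .exe files,
so the folder containing pdfinfo.exe and pdftotext.exe is what needs to be
listed. A minimal sketch that appends that folder for the current R session
only; the location in xpdf_dir is taken from the description above and may
need adjusting.

```r
# Assumed install location of the Xpdf binaries (adjust as needed).
xpdf_dir <- "C:/Program Files/Xpdf/bin64"

# Append the directory (not the .exe files) to PATH for this session.
Sys.setenv(PATH = paste(Sys.getenv("PATH"), xpdf_dir,
                        sep = .Platform$path.sep))

# Empty strings here mean the tools still cannot be found.
Sys.which(c("pdfinfo", "pdftotext"))
```

R reads PATH when it starts, so a change made in the Windows settings is only
seen after R (or the IDE that launched it) is restarted.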

I’m aware that the ‘pdf_text’ function from the pdftools package can extract
text from a PDF file and output it as a string. But I was after something
that is able to convert a PDF (i.e. transaction data) into a dataframe without
regular expressions. Is the tm package capable of doing this conversion? Are
there any other alternatives to these methods?

Your expertise in resolving this problem would be highly appreciated.


Steve


Re: [R] about interpolating data in r

2016-07-21 Thread Ismail SEZEN

> On 22 Jul 2016, at 01:54, lily li  wrote:
> 
> Thanks, I meant if there are missing data at the beginning and end of a
> dataframe, how to interpolate according to available data?
> 
> For example, the A column has missing values at the beginning and end, how
> to interpolate linearly between 10 and 12 for the missing values?
> 
> df <- data.frame(A=c(NA, NA,10,11,12, NA),B=c(5,5,4,3,4,5),
>   C=c(3.3,4,3,1.5,2.2,4),
>   time=as.Date(c("1990-01-01","1990-02-07","1990-02-14","1990-02-28","1990-03-01","1990-03-20")))
> 

As William answered,

with(df, approx(x = time, y = A,
                xout = seq(min(time, na.rm = TRUE), max(time, na.rm = TRUE),
                           by = "days")))

will interpolate linearly between the known values even when the column
contains NAs.
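
A short sketch of how that call behaves at the two ends, using the example
data from the question. approx() drops the NA pairs before interpolating; with
the default rule = 1 it returns NA for dates outside the range of known
values, while rule = 2 repeats the nearest known value. Whether holding the
end values constant is acceptable depends on the data, so treat rule = 2 as
one option, not a recommendation.

```r
df <- data.frame(A = c(NA, NA, 10, 11, 12, NA),
                 B = c(5, 5, 4, 3, 4, 5),
                 C = c(3.3, 4, 3, 1.5, 2.2, 4),
                 time = as.Date(c("1990-01-01", "1990-02-07", "1990-02-14",
                                  "1990-02-28", "1990-03-01", "1990-03-20")))

days <- seq(min(df$time), max(df$time), by = "days")

# Default rule = 1: values only between 1990-02-14 and 1990-03-01, NA elsewhere.
inside <- approx(x = df$time, y = df$A, xout = days)

# rule = 2: the first and last known values are carried outward.
held <- approx(x = df$time, y = df$A, xout = days, rule = 2)

data.frame(time = days, rule1 = inside$y, rule2 = held$y)[c(1, 45, 60, 79), ]
```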


> 
> On Thu, Jul 21, 2016 at 4:48 PM, William Dunlap  wrote:
> 
>> Try approx(), as in:
>> 
>> df <-
>> data.frame(A=c(10,11,12),B=c(5,5,4),C=c(3.3,4,3),time=as.Date(c("1990-01-01","1990-02-07","1990-02-14")))
>> with(df, approx(x=time, y=C, xout=seq(min(time), max(time), by="days")))
>> 
>> Do you notice how one can copy and paste that example out of the
>> mail and into R to see how it works?  It would help if your questions
>> had that same property - show how the example data could be created.
>> 
>> 
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>> 
>> On Thu, Jul 21, 2016 at 3:34 PM, lily li  wrote:
>> 
>>> I have a question about interpolating missing values in a dataframe. The
>>> dataframe is in the following, Column C has no data before 2009-01-05 and
>>> after 2009-12-31, how to interpolate data for the blanks? That is to say,
>>> interpolate linearly between these two gaps using 5.4 and 6.1? Thanks.
>>> 
>>> 
>>> df
>>> time         A     B     C
>>> 2009-01-01   3     4.5
>>> 2009-01-02   4     5
>>> 2009-01-03   3.3   6
>>> 2009-01-04   4.1   7
>>> 2009-01-05   4.4   6.2   5.4
>>> ...
>>>
>>> 2009-11-20   5.1   5.5   6.1
>>> 2009-11-21   5.4   4
>>> ...
>>> 2009-12-31   4.5   6
>>> 

Re: [R] about interpolating data in r

2016-07-21 Thread Ismail SEZEN

> On 22 Jul 2016, at 01:34, lily li  wrote:
> 
> I have a question about interpolating missing values in a dataframe.

First of all, filling in missing values must be done very carefully. You
need to know the nature of the data being filled in, and most of the time
leaving the values as NA is the most appropriate action.

> The
> dataframe is in the following, Column C has no data before 2009-01-05 and
> after 2009-12-31, how to interpolate data for the blanks?

Why a dataframe? Is there any relationship between columns A, B and C? If
there is, you might want to consider filling the missing values with a linear
model instead of interpolation. You said that there is no data before
2009-01-05 and after 2009-12-31, but according to the dataframe there is no
data after 2009-11-20?

> That is to say,
> interpolate linearly between these two gaps using 5.4 and 6.1? Thanks.

Also, you mention interpolating blanks, but you want interpolation between
two gaps? Do you want to fill the missing values before 2009-01-05 and after
2009-11-20, or do you want to find intermediate values between 2009-01-05 and
2009-11-20? This is a bit unclear.

> 
> 
> df
> time         A     B     C
> 2009-01-01   3     4.5
> 2009-01-02   4     5
> 2009-01-03   3.3   6
> 2009-01-04   4.1   7
> 2009-01-05   4.4   6.2   5.4
> ...
>
> 2009-11-20   5.1   5.5   6.1
> 2009-11-21   5.4   4
> ...
> 2009-12-31   4.5   6


If you want to fill missing values at the end-points of column C (before
2009-01-05 and after 2009-11-20), and all the data you have is between
2009-01-05 and 2009-11-20, then what you want is extrapolation (guessing
unknown values that lie outside the range of known values). So, you can use
only the values in column C to guess the missing end-point values. You can
use the splinefun (or spline) function for this purpose. But let me note that
this kind of approach might help only for a few missing values close to the
end-points. Otherwise, you might make a huge mistake.
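
A minimal sketch of that splinefun() idea. The numbers in t_known and c_known
are made up for illustration and are not the poster's data. With method =
"natural" the spline continues as a straight line beyond the known range,
which is one reason the result should only be trusted very close to the
end-points.

```r
# Made-up example: known values of column C at time steps 3..8,
# with steps 1, 2, 9 and 10 missing.
t_known <- 3:8
c_known <- c(5.4, 5.6, 5.9, 6.0, 6.1, 6.1)

# splinefun() returns a function that can be evaluated at any time step.
f <- splinefun(t_known, c_known, method = "natural")

f(t_known)          # reproduces the known values
f(c(1, 2, 9, 10))   # extrapolated guesses outside the known range
```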

As I mentioned above, if you have a relationship between all the columns, or
you have data for column C for other years (for instance, assume that you
have data for column C for 2007, 2008, and 2010 but not 2009), you may want to
try a statistical approach to fill the missing values.



Re: [R] about interpolating data in r

2016-07-21 Thread lily li
Thanks, I meant if there are missing data at the beginning and end of a
dataframe, how to interpolate according to available data?

For example, the A column has missing values at the beginning and end, how
to interpolate linearly between 10 and 12 for the missing values?

df <- data.frame(A=c(NA, NA,10,11,12, NA),B=c(5,5,4,3,4,5),
  C=c(3.3,4,3,1.5,2.2,4),
  time=as.Date(c("1990-01-01","1990-02-07","1990-02-14","1990-02-28","1990-03-01","1990-03-20")))


On Thu, Jul 21, 2016 at 4:48 PM, William Dunlap  wrote:

> Try approx(), as in:
>
> df <-
> data.frame(A=c(10,11,12),B=c(5,5,4),C=c(3.3,4,3),time=as.Date(c("1990-01-01","1990-02-07","1990-02-14")))
> with(df, approx(x=time, y=C, xout=seq(min(time), max(time), by="days")))
>
> Do you notice how one can copy and paste that example out of the
> mail and into R to see how it works?  It would help if your questions
> had that same property - show how the example data could be created.
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Thu, Jul 21, 2016 at 3:34 PM, lily li  wrote:
>
>> I have a question about interpolating missing values in a dataframe. The
>> dataframe is in the following, Column C has no data before 2009-01-05 and
>> after 2009-12-31, how to interpolate data for the blanks? That is to say,
>> interpolate linearly between these two gaps using 5.4 and 6.1? Thanks.
>>
>>
>> df
>> time         A     B     C
>> 2009-01-01   3     4.5
>> 2009-01-02   4     5
>> 2009-01-03   3.3   6
>> 2009-01-04   4.1   7
>> 2009-01-05   4.4   6.2   5.4
>> ...
>>
>> 2009-11-20   5.1   5.5   6.1
>> 2009-11-21   5.4   4
>> ...
>> 2009-12-31   4.5   6
>>


Re: [R] about interpolating data in r

2016-07-21 Thread William Dunlap via R-help
Try approx(), as in:

df <-
data.frame(A=c(10,11,12),B=c(5,5,4),C=c(3.3,4,3),time=as.Date(c("1990-01-01","1990-02-07","1990-02-14")))
with(df, approx(x=time, y=C, xout=seq(min(time), max(time), by="days")))

Do you notice how one can copy and paste that example out of the
mail and into R to see how it works?  It would help if your questions
had that same property - show how the example data could be created.


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, Jul 21, 2016 at 3:34 PM, lily li  wrote:

> I have a question about interpolating missing values in a dataframe. The
> dataframe is in the following, Column C has no data before 2009-01-05 and
> after 2009-12-31, how to interpolate data for the blanks? That is to say,
> interpolate linearly between these two gaps using 5.4 and 6.1? Thanks.
>
>
> df
> time         A     B     C
> 2009-01-01   3     4.5
> 2009-01-02   4     5
> 2009-01-03   3.3   6
> 2009-01-04   4.1   7
> 2009-01-05   4.4   6.2   5.4
> ...
>
> 2009-11-20   5.1   5.5   6.1
> 2009-11-21   5.4   4
> ...
> 2009-12-31   4.5   6
>


Re: [R] how to expand the dataframe

2016-07-21 Thread William Dunlap via R-help
Depending on your situation you may want all=TRUE, all.x=TRUE, or
all.y=TRUE.
I think the SQL people call these full outer joins, left outer joins, and
right outer joins.
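
A small sketch of the difference, reusing the three-row example from the
earlier interpolation thread and the date range from the quoted reply below.
The object names full, expanded and inner are illustrative.

```r
df <- data.frame(A = c(10, 11, 12), B = c(5, 5, 4), C = c(3.3, 4, 3),
                 time = as.Date(c("1990-01-01", "1990-02-07", "1990-02-14")))

# Complete daily calendar for 1990.
full <- data.frame(time = seq(as.Date("1990-01-01"),
                              to = as.Date("1990-12-31"), by = "days"))

# Default merge keeps only dates present in both: the short data frame "wins".
inner <- merge(df, full)

# all = TRUE keeps every date; A, B and C are NA where df has no row.
expanded <- merge(df, full, all = TRUE)

nrow(inner)      # rows that matched in both
nrow(expanded)   # one row per calendar day
```

Here all.y = TRUE would give the same result as all = TRUE, because every
date in df also appears in the calendar.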

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, Jul 21, 2016 at 2:35 PM, lily li  wrote:

> I used this code, and it works. So 'all=TRUE' has to be set.
>
> merge(df, data.frame(time=seq(as.Date("1990-01-01"),
> to=as.Date("1990-12-31"), by="days")), all=TRUE)
>
>
> On Thu, Jul 21, 2016 at 9:22 AM, Daniel Nordlund 
> wrote:
>
> > On 7/20/2016 8:26 PM, lily li wrote:
> >
> >> Yes, I tried to create a dataframe and merge it with the shortened
> >> dataframe. The resulting dataframe goes with the short one and truncates
> >> the complete date column, so it does not work.
> >>
> >> On Wed, Jul 20, 2016 at 6:38 PM, David Winsemius <
> dwinsem...@comcast.net>
> >> wrote:
> >>
> >>
> >>> On Jul 20, 2016, at 1:31 PM, lily li  wrote:
> 
>  Hi R users,
> 
>  I have a dataframe, where there is a column 'time' represents time
>  series
>  but is not complete. How to expand the dataframe so this column will
> 
> >>> become
> >>>
>  complete, where other columns with the newly added rows have NA
> values?
>  Thanks.
> 
>  df
>  A   B  C    time
>  10  5  3.3  1990-01-01
>  11  5  4    1990-02-07
>  12  4  3    1990-02-14
>  ...
> 
> >>>
> >>> Make a dataframe with a 'time' column using seq.Date and merge that
> >>> dataframe with your df dataframe.
> >>>
> >>>
> 
> >>>
> >>> Really  isn't it time you learned how to send plain text. You've
> >>> posted many questions on Rhelp.  It's really not that difficult on
> >>> gmail. I
> >>> also have a gmail account and have had no difficulty finding
> instructions
> >>> on how to do it.
> >>>
> >>>
> >>>
> >>> David Winsemius
> >>> Alameda, CA, USA
> >>>
> >>>
> >>>
> >>
> > Don't just say you tried to do the merge and it doesn't work.  At a
> > minimum show us the ACTUAL code you used and give us any error messages
> you
> > got or show us a portion of the results and explain why it is not what
> you
> > expected.  If possible, give us a small amount of data using dput() so
> that
> > we can "play along at home" (i.e. give us a reproducible example).
> >
> > Dan
> >
> > Daniel Nordlund
> > Port Townsend, WA
> >
> >
> >
> > --
> > Daniel Noredlund
> > Bothell, WA USA
> >
> >


Re: [R] plotCI linetypes

2016-07-21 Thread Jim Lemon
Hi Florian,
As I suspected, you will have to use something other than the "arrows"
function, which most people use to draw "error bars". This is where
the "lty" argument gets used for all of the lines. You could
substitute a function like this:

foobars<-function(x0,y0,x1,y1,length=0.02,col=par("fg"),
 lty=par("lty"),lwd=par("lwd")) {

 segments(x0,y0,x1,y1,col=col,lty=lty,lwd=lwd)
 capx<-diff(par("usr"))[1]*length
 segments(x1-capx,y1,x1+capx,y1,col=col,lty=1,lwd=lwd)
}

for arrows and then replace calls to "arrows" with calls to "foobars".
Obviously "foobars" will only work for vertical bars, but could easily
be modified to handle horizontal bars. I think that should be all you
need.
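
A possible way to try this out with base graphics, using the sample data from
the original question. The foobars definition is repeated from above so the
snippet is self-contained; the plot() call and its ylim are illustrative
assumptions. This draws the bars directly and does not patch plotCI itself.

```r
foobars <- function(x0, y0, x1, y1, length = 0.02, col = par("fg"),
                    lty = par("lty"), lwd = par("lwd")) {
  # shaft of the error bar, drawn with the requested line type
  segments(x0, y0, x1, y1, col = col, lty = lty, lwd = lwd)
  # cap half-width as a fraction of the x-axis range
  capx <- diff(par("usr"))[1] * length
  # cap is always drawn solid (lty = 1)
  segments(x1 - capx, y1, x1 + capx, y1, col = col, lty = 1, lwd = lwd)
}

## sample data from ?plotCI
set.seed(10)
y <- runif(10)
err <- runif(10)

plot(1:10, y, ylim = range(c(y - err, y + err)))
foobars(1:10, y, 1:10, y + err, lty = 2, lwd = 2)   # upper bars, dashed shafts
foobars(1:10, y, 1:10, y - err, lty = 2, lwd = 2)   # lower bars, dashed shafts
```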

Jim


On Thu, Jul 21, 2016 at 10:05 PM, Jim Lemon  wrote:
> Hi Florian,
> I'll have to think about this one. Neither plotCI (which Ben Bolker
> wrote) nor dispersion (which I wrote) does this at the moment, but
> perhaps I can work out something that won't be too hard to program.
> I'll get back to you.
>
> Jim
>
>
> On Thu, Jul 21, 2016 at 8:07 PM, Florian Detsch
>  wrote:
>> Dear Jim,
>>
>> first of all, thanks for your wonderful work on the plotrix package.
>> In plotCI, I wonder if it was possible to control the line types of lines
>> (i.e., the part connecting point symbols with caps) and caps separately?
>> Please consider the following code snippet to clarify my point.
>>
>> ## sample data from ?plotCI
>> set.seed(10)
>> y <- runif(10)
>> err <- runif(10)
>>
>> ## visualize data
>> par(mfrow = c(2, 1))
>> for (lwd in c(2, 1.25))
>>   plotCI(1:10, y, err, slty = 2, lwd = lwd)
>>
>> Using slty = 2 alongside with lwd = 2 results in only half sides of the line
>> caps being drawn. That means, if I wanted to stick with dashed lines, I'd be
>> forced to downregulate lwd e.g. to a value of 1.25 in order to achieve fully
>> drawn caps. Would it be possible to implement something like llty (for line
>> type of lines; which I would then set to 2) and clty (for line type of caps;
>> which I would then set to 1) or is there any other, probably more convenient
>> solution?
>>
>> Best,
>> Florian
>>
>> --
>> Florian Detsch (M.Sc. Physical Geography)
>> Environmental Informatics
>> Department of Geography
>> Philipps-Universität Marburg
>> Deutschhausstraße 12
>> 35032 (parcel post: 35037) Marburg, Germany
>>
>> Phone: +49 (0) 6421 28-25323
>> Web: http://umweltinformatik-marburg.de/en/staff/florian-detsch/
>>


[R] about interpolating data in r

2016-07-21 Thread lily li
I have a question about interpolating missing values in a dataframe. The
dataframe is in the following, Column C has no data before 2009-01-05 and
after 2009-12-31, how to interpolate data for the blanks? That is to say,
interpolate linearly between these two gaps using 5.4 and 6.1? Thanks.


df
time         A     B     C
2009-01-01   3     4.5
2009-01-02   4     5
2009-01-03   3.3   6
2009-01-04   4.1   7
2009-01-05   4.4   6.2   5.4
...

2009-11-20   5.1   5.5   6.1
2009-11-21   5.4   4
...
2009-12-31   4.5   6



Re: [R-es] Time series forecasting using neural networks

2016-07-21 Thread Carlos Ortega
Hello,

Look here:
https://cran.r-project.org/web/views/TimeSeries.html

Under the heading "Nonlinear Time Series Analysis".

Regards,
Carlos Ortega
www.qualityexcellence.es

On 21 July 2016, 23:58, Elkin Tabares
wrote:

> Hello to the whole R community,
>
> I would like to ask whether there is an R package that lets me forecast
> time series with an autoregressive neural network with exogenous inputs.
> So far I have only found the nnetar function in the forecast package, but
> I am looking for something similar to the neural network toolbox in MATLAB.
> Many thanks for your help.
>
> Regards,
>
> [[alternative HTML version deleted]]
>
> ___
> R-help-es mailing list
> R-help-es@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-help-es
>





Re: [R] C/C++/Fortran Rolling Window Regressions

2016-07-21 Thread jeremiah rounds
I agree that, when appropriate, Kalman filtering/smoothing is the
higher-quality way to go about estimating a time-varying coefficient (given
that is what they do), and I have noted that both the R package "dlm" and
the function "StructTS" handle these problems quickly.  I am working on that
in parallel.

One of the things I am unsure about with Kalman filters is how to estimate
variance parameters when the process is unusual in some way that isn't in
the model and it is not feasible to adjust the model by hand.  dlm's dlmMLE
seems to produce nonsense (not because of the author's work but because of
assumptions).  At least with moving window regressions, after the unusual
event is past your window the influence of that event is gone.  That
isn't really a question for this group; it is more about me reading more.
When I get that "how to handle all the strange things big data throws at
you" worked out for Kalman filters, I will go back to those because I
certainly like what I see when everything is right.  There is a plethora of
related topics, right?  Bayesian model averaging, GARCH models for
heteroscedasticity, etc.

Anyway... roll::roll_lm, cheers!

Thanks,
Jeremiah




Re: [R] how to expand the dataframe

2016-07-21 Thread lily li
I used this code and it works; the key is that 'all = TRUE' has to be set.

merge(df, data.frame(time=seq(as.Date("1990-01-01"),
to=as.Date("1990-12-31"), by="days")), all=TRUE)
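
A minimal sketch of why 'all = TRUE' matters, assuming the 'time' column is of class Date. The three rows of 'df' are illustrative values, and the names 'full', 'inner' and 'expanded' are hypothetical. By default merge() keeps only the rows whose key appears in both data frames, which is why the complete date column appeared to be truncated.

```r
# Illustrative data frame with an incomplete 'time' column (class Date)
df <- data.frame(
  A = c(10, 11, 12),
  B = c(5, 5, 4),
  C = c(3.3, 4, 3),
  time = as.Date(c("1990-01-01", "1990-02-07", "1990-02-14"))
)

# One row per calendar day of 1990
full <- data.frame(time = seq(as.Date("1990-01-01"),
                              as.Date("1990-12-31"), by = "day"))

# Default merge is an inner join: only the dates present in both survive
inner <- merge(df, full, by = "time")

# all = TRUE is a full outer join: every date is kept, and A, B, C are NA
# on the dates that were missing from df
expanded <- merge(df, full, by = "time", all = TRUE)

nrow(inner)     # 3
nrow(expanded)  # 365
```

If 'time' is stored as character or factor instead, it would need converting with as.Date() first so that the keys compare as dates.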


On Thu, Jul 21, 2016 at 9:22 AM, Daniel Nordlund 
wrote:

> On 7/20/2016 8:26 PM, lily li wrote:
>
>> Yes, I tried to create a dataframe and merge it with the shortened
>> dataframe. The resulting dataframe goes with the short one and truncates
>> the complete date column, so it does not work.
>>
>> On Wed, Jul 20, 2016 at 6:38 PM, David Winsemius 
>> wrote:
>>
>>
>>> On Jul 20, 2016, at 1:31 PM, lily li  wrote:
>>>
>>>> Hi R users,
>>>>
>>>> I have a dataframe, where there is a column 'time' represents time
>>>> series but is not complete. How to expand the dataframe so this column
>>>> will become complete, where other columns with the newly added rows
>>>> have NA values? Thanks.
>>>>
>>>> df
>>>> A   B  C    time
>>>> 10  5  3.3  1990-01-01
>>>> 11  5  4    1990-02-07
>>>> 12  4  3    1990-02-14
>>>> ...
>>>
>>> Make a dataframe with a 'time' column using seq.Date and merge that
>>> dataframe with your df dataframe.
>>>
>>> Really  isn't it time you learned how to send plain text. You've
>>> posted many questions on Rhelp.  It's really not that difficult on
>>> gmail. I also have a gmail account and have had no difficulty finding
>>> instructions on how to do it.
>>>
>>> David Winsemius
>>> Alameda, CA, USA
>>>
>>>
>>>
>>
> Don't just say you tried to do the merge and it doesn't work.  At a
> minimum show us the ACTUAL code you used and give us any error messages you
> got or show us a portion of the results and explain why it is not what you
> expected.  If possible, give us a small amount of data using dput() so that
> we can "play along at home" (i.e. give us a reproducible example).
>
> Dan
>
> Daniel Nordlund
> Port Townsend, WA
>
>
>
> --
> Daniel Noredlund
> Bothell, WA USA
>
>


Re: [R] C/C++/Fortran Rolling Window Regressions

2016-07-21 Thread Roy Mendelssohn - NOAA Federal
I have no idea which method produces the fastest results, but the package KFAS
has a function to do recursive regressions using the Kalman filter.  One
difference is that it is not, as far as I can tell, a moving window (in which
past data are dropped), just a recursively computed regression.

HTH,

-Roy
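
To make "recursively computed regression" concrete, here is a minimal base-R sketch of recursive least squares. It is not the KFAS implementation; the function name 'rls' and the diffuse starting value 'delta' are assumptions made for illustration. Each new observation updates the coefficients without refitting, and no old observation is ever dropped, so this is an expanding window rather than a moving one.

```r
# Recursive least squares: update the coefficient estimate one row at a time.
# P plays the role of the (scaled) coefficient covariance; a large starting
# value approximates a diffuse prior, so the result is close to plain OLS.
rls <- function(X, y, delta = 1e6) {
  p <- ncol(X)
  beta <- rep(0, p)
  P <- diag(delta, p)
  for (i in seq_len(nrow(X))) {
    x <- X[i, ]
    Px <- P %*% x
    k <- Px / as.numeric(1 + crossprod(x, Px))   # gain vector
    beta <- beta + as.numeric(k) * (y[i] - sum(x * beta))
    P <- P - k %*% t(Px)                         # P is symmetric, so x'P = t(Px)
  }
  beta
}

set.seed(1)
X <- cbind(1, rnorm(50))                         # intercept plus one predictor
y <- drop(X %*% c(2, -1) + rnorm(50, sd = 0.1))

beta_rls <- rls(X, y)
beta_ols <- unname(lm.fit(X, y)$coefficients)    # batch fit for comparison
```

After the last row, 'beta_rls' and 'beta_ols' should agree to several decimal places. Dropping the oldest observation would need a separate "downdate" step, which this sketch does not attempt.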


[R] How to plot marginal effects (MEM) in R?

2016-07-21 Thread Faradj Koliev
Dear all, 

I have two logistic regression models:

model <- glm(Y ~ X1+X2+X3+X4, data = data, family = "binomial")

modelInteraction <- glm(Y ~ X1+X2+X3+X4+X1*X4, data = data, family = "binomial")

To calculate the marginal effects (MEM approach) for these models, I used the
`mfx` package:

a <- logitmfx(model, data=data, atmean=TRUE)

b <- logitmfx(modelInteraction, data=data, atmean=TRUE)

What I want to do now is:
1) plot all the results for "model";
2) show the result for just two variables, X1 and X2;
3) plot the interaction term in "modelInteraction".


I have been looking around for the solutions but haven't been able to find any. 
I would appreciate any suggestions. 

A reproducible sample: 

> dput(data)
structure(list(Y = c(0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 
0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 
0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 
0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), X1 = c(1L, 0L, 1L, 
0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 
1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 
0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 0L), X2 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), X3 = c(0L, 0L, 0L, 0L, 0L, 
0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 2L, 2L, 3L, 4L, 5L, 0L, 0L, 
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L
), X4 = c(6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 6L, 6L, 6L, 6L, 6L, 6L, 
6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 
7L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L)), .Names = c("Y", "X1", "X2", 
"X3", "X4"), row.names = c(NA, -69L), class = "data.frame")





Re: [R] C/C++/Fortran Rolling Window Regressions

2016-07-21 Thread Mark Leeds
Hi Jeremiah: another possibly faster way would be to use a Kalman filtering
framework. I forget the details, but Duncan and Horn have a paper which
shows how a regression can be re-computed each time a new data point is
added. I forget if they also handle taking one off of the back, which is
what you need.

The paper at the link below isn't the paper I'm talking about, but it's
reference [1] in that paper. Note that this suggestion might not be a better
approach than the various approaches already suggested, so I wouldn't go
this route unless you're very interested.


Mark

https://www.le.ac.uk/users/dsgp1/COURSES/MESOMET/ECMETXT/recurse.pdf







[R] Aggregate data to lower resolution

2016-07-21 Thread Miluji Sb
Dear all,

I have the following GDP data by latitude and longitude at 0.5 degree by
0.5 degree.

temp <- dput(head(ptsDF,10))
structure(list(longitude = c(-68.25, -67.75, -67.25, -68.25,
-67.75, -67.25, -71.25, -70.75, -69.25, -68.75), latitude = c(-54.75,
-54.75, -54.75, -54.25, -54.25, -54.25, -53.75, -53.75, -53.75,
-53.75), GDP = c(1.683046, 0.3212307, 0.0486207, 0.1223268, 0.0171909,
0.0062104, 0.22379, 0.1406729, 0.0030038, 0.0057422)), .Names =
c("longitude",
"latitude", "GDP"), row.names = c(4L, 17L, 30L, 43L, 56L, 69L,
82L, 95L, 108L, 121L), class = "data.frame")

I would like to aggregate the data to 1 degree by 1 degree. I understand that
the first step is to convert to raster. I have tried:

library(raster)
rasterDF <- rasterFromXYZ(temp)
r <- aggregate(rasterDF, fact=2, fun=sum)

But this does not seem to work. Could anyone help me out please? Thank you
in advance.

Sincerely,

Milu
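
One way to cross-check the result, sketched in base R without the raster package: map each 0.5-degree cell centre to the 1-degree cell that contains it and sum GDP within each cell. The assumption here is that the 1-degree cells are aligned to whole degrees; the column names 'lon1' and 'lat1' and the object 'gdp_1deg' are made up for this example.

```r
# The ten rows posted above
temp <- data.frame(
  longitude = c(-68.25, -67.75, -67.25, -68.25, -67.75, -67.25,
                -71.25, -70.75, -69.25, -68.75),
  latitude  = c(-54.75, -54.75, -54.75, -54.25, -54.25, -54.25,
                -53.75, -53.75, -53.75, -53.75),
  GDP = c(1.683046, 0.3212307, 0.0486207, 0.1223268, 0.0171909,
          0.0062104, 0.22379, 0.1406729, 0.0030038, 0.0057422)
)

# Centre of the whole-degree cell containing each 0.5-degree cell centre
temp$lon1 <- floor(temp$longitude) + 0.5
temp$lat1 <- floor(temp$latitude) + 0.5

# Sum GDP within each 1-degree cell; stats:: is spelled out because the
# raster package also provides an aggregate() generic
gdp_1deg <- stats::aggregate(GDP ~ lon1 + lat1, data = temp, FUN = sum)
```

Because the values are summed rather than averaged, the total GDP over all cells is unchanged by the aggregation, which gives a quick sanity check.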



Re: [R] C/C++/Fortran Rolling Window Regressions

2016-07-21 Thread jeremiah rounds
I appreciate the timing, so much so I changed the code to show the issue.
 It is a problem of scale.

 roll_lm probably has a heavy start-up cost but otherwise completely
out-performs those other versions at scale.  I suspect you are timing the
nearly  constant time start-up cost in small data.  I did give code to
paint a picture, but it was just cartoon code lifted from stackexchange.
If you want to characterize the real problem it is closer to:
30 day rolling windows on 24 daily (by hour) measurements for 5 years with
24+7 -1 dummy predictor variables and finally you need to do this for 300
sets of data.

Pseudo-code is closer to what follows and roll_lm can handle that input in
a timely manner.  You can do it with lm.fit, but you need to spend a lot of
time waiting.  The issue of accuracy needs a follow-up check.  Not sure why
it would be different.  Worth a check on that.

Thanks,
Jeremiah


library(rbenchmark)
N = 30*24*12*5
window = 30*24
npred = 15  #15 chosen arbitrarily...
set.seed(1)
library(zoo)
library(RcppArmadillo)
library(roll)
x = matrix(rnorm(N*(npred+1)), ncol = npred+1)
colnames(x) <- c("y",  paste0("x", 1:npred))
z <- zoo(x)


benchmark(
   roll_lm = roll_lm(coredata(z[, 1, drop = FALSE]),
                     coredata(z[, -1, drop = FALSE]), window,
                     center = FALSE),
   replications = 3)

Which comes out as:
     test replications elapsed relative user.self sys.self user.child sys.child
1 roll_lm            3   6.273        1    38.312    0.654          0         0

## You aren't going to get that with the versions below...

benchmark(fastLm = rollapplyr(z, width = window,
 function(x) coef(fastLm(cbind(1, x[, -1]), x[, 1])),
 by.column = FALSE),
   lm = rollapplyr(z, width = window,
 function(x) coef(lm(y ~ ., data = as.data.frame(x))),
 by.column = FALSE), replications=3)
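
For reference, the lm.fit route mentioned above can be sketched in base R as follows. The function name 'roll_lm_fit' and the simulated data are assumptions for illustration, and this is not how the roll package is implemented. It skips lm()'s formula and model-frame overhead while keeping the QR-based solver, but it still loops in R once per window.

```r
# Rolling OLS with lm.fit(): one QR-based fit per right-aligned window
roll_lm_fit <- function(y, X, width) {
  n <- length(y)
  stopifnot(width <= n)
  X1 <- cbind(1, X)                          # add the intercept column
  out <- matrix(NA_real_, nrow = n, ncol = ncol(X1))
  for (end in width:n) {
    idx <- (end - width + 1):end             # rows in the current window
    out[end, ] <- lm.fit(X1[idx, , drop = FALSE], y[idx])$coefficients
  }
  out                                        # rows before 'width' stay NA
}

set.seed(1)
x <- matrix(rnorm(200 * 3), ncol = 3)
y <- drop(1 + x %*% c(0.5, -1, 2) + rnorm(200))

coefs <- roll_lm_fit(y, x, width = 30)

# The last window should match an ordinary lm() fit on the same rows
ref <- unname(coef(lm(y[171:200] ~ x[171:200, ])))
```

Comparing the last row of 'coefs' with 'ref' is also a cheap way to do the accuracy follow-up on any faster implementation.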




Re: [R] C/C++/Fortran Rolling Window Regressions

2016-07-21 Thread Gabor Grothendieck
I would be careful about making assumptions regarding what is faster.
Performance tends to be nonintuitive.

When I ran rollapply/lm, rollapply/fastLm and roll_lm on the example
you provided rollapply/fastLm was three times faster than roll_lm.  Of
course this could change with data of different dimensions but it
would be worthwhile to do actual benchmarks before making assumptions.

I also noticed that roll_lm did not give the same coefficients as the other two.

set.seed(1)
library(zoo)
library(RcppArmadillo)
library(roll)
z <- zoo(matrix(rnorm(10), ncol = 2))
colnames(z) <- c("y", "x")

## rolling regression of width 4
library(rbenchmark)
benchmark(fastLm = rollapplyr(z, width = 4,
 function(x) coef(fastLm(cbind(1, x[, 2]), x[, 1])),
 by.column = FALSE),
   lm = rollapplyr(z, width = 4,
 function(x) coef(lm(y ~ x, data = as.data.frame(x))),
 by.column = FALSE),
   roll_lm =  roll_lm(coredata(z[, 1, drop = F]), coredata(z[, 2, drop = F]), 4,
 center = FALSE))[1:4]


     test replications elapsed relative
1  fastLm          100    0.22    1.000
2      lm          100    0.72    3.273
3 roll_lm          100    0.64    2.909

On Thu, Jul 21, 2016 at 3:45 PM, jeremiah rounds
 wrote:
>  Thanks all.  roll::roll_lm was essentially what I wanted.   I think maybe
> I would prefer it to have options to return a few more things, but it is
> the coefficients, and the remaining statistics you might want can be
> calculated fast enough from there.
>
>
> On Thu, Jul 21, 2016 at 12:36 PM, Achim Zeileis 
> wrote:
>
>> Jeremiah,
>>
>> for this purpose there are the "roll" and "RcppRoll" packages. Both use
>> Rcpp and the former also provides rolling lm models. The latter has a
>> generic interface that lets you define your own function.
>>
>> One thing to pay attention to, though, is the numerical reliability.
>> Especially on large time series with relatively short windows there is a
>> good chance of encountering numerically challenging situations. The QR
>> decomposition used by lm is fairly robust while other more straightforward
>> matrix multiplications may not be. This should be kept in mind when writing
>> your own Rcpp code for plugging it into RcppRoll.
>>
>> But I haven't checked what the roll package does and how reliable that is...
>>
>> hth,
>> Z
>>
>>
>> On Thu, 21 Jul 2016, jeremiah rounds wrote:
>>
>> Hi,
>>>
>>> A not unusual task is performing a multiple regression in a rolling window
>>> on a time series.  A standard piece of advice for doing this in R is
>>> something like the code that follows at the end of the email.  I am
>>> currently using an "embed" variant of that code, and that piece of advice
>>> is out there too.
>>>
>>> But it occurs to me that for such an easily specified matrix operation
>>> standard R code is really slow.   rollapply constantly returns to the R
>>> interpreter at each window step for a new lm.   All lm is at its heart is
>>> (X^t X)^(-1) X^t y,  and if you think about doing that with Rcpp in a
>>> rolling window you are just incrementing a counter and peeling off rows
>>> (or columns of X and y) of a particular window size, and following that up
>>> with some matrix multiplication in a loop.   The pseudo-code for that Rcpp
>>> practically writes itself and you might want a wrapper of something like:
>>> rolling_lm (y=y, x=x, width=4).
>>>
>>> My question is this: has any of the thousands of R packages out there
>>> published anything like that?  Rolling window multiple regressions that
>>> stay in C/C++ until the rolling window completes?  No sense in writing it
>>> if it exists.
>>>
>>>
>>> Thanks,
>>> Jeremiah
>>>
>>> Standard (slow) advice for "rolling window regression" follows:
>>>
>>>
>>> library(zoo)
>>> set.seed(1)
>>> z <- zoo(matrix(rnorm(10), ncol = 2))
>>> colnames(z) <- c("y", "x")
>>>
>>> ## rolling regression of width 4
>>> rollapply(z, width = 4,
>>>   function(x) coef(lm(y ~ x, data = as.data.frame(x))),
>>>   by.column = FALSE, align = "right")
>>>
>>> ## result is identical to
>>> coef(lm(y ~ x, data = z[1:4,]))
>>> coef(lm(y ~ x, data = z[2:5,]))
>>>



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at 

Re: [R] how to stop interpretation in system()

2016-07-21 Thread Michael Peng
Yes. I forgot to  escape the escape. Thank you so much.



Re: [R] how to stop interpretation in system()

2016-07-21 Thread Sarah Goslee
You could escape the backslash.

system("cmd 'a\\tb'")
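
A small sketch of what is going on: the substitution happens when R parses the string literal, before system() ever sees it. The variable names 's1' and 's2' are only for illustration.

```r
s1 <- "a\tb"    # the parser turns \t into a single tab character
s2 <- "a\\tb"   # an escaped backslash: the characters a, \, t, b

nchar(s1)       # 3
nchar(s2)       # 4

cat(s2, "\n")   # cat() shows the actual characters: a\tb
print(s2)       # print() re-escapes for display: "a\\tb"

# In R >= 4.0.0 a raw string, r"(a\tb)", gives the same four characters
# as s2 without doubling the backslash.
```

So system("cmd 'a\\tb'") hands the shell the literal text a\tb. What the shell or the command then does with that backslash is a separate question.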



-- 
Sarah Goslee
http://www.functionaldiversity.org



[R] how to stop interpretation in system()

2016-07-21 Thread Michael Peng
Hi,

I am trying to use system() to run a command in the OS, such as

system("cmd 'a\tb'")

However, it always runs
cmd 'ab'
instead of
cmd 'a\tb'

How can I prevent system() from interpreting 'a\tb' as 'ab'?


Thanks



Re: [R] C/C++/Fortran Rolling Window Regressions

2016-07-21 Thread jeremiah rounds
Thanks all.  roll::roll_lm was essentially what I wanted.  I think I would
prefer it to have options to return a few more things, but it returns
the coefficients, and the remaining statistics you might want can be
calculated fast enough from there.


On Thu, Jul 21, 2016 at 12:36 PM, Achim Zeileis 
wrote:

> Jeremiah,
>
> for this purpose there are the "roll" and "RcppRoll" packages. Both use
> Rcpp and the former also provides rolling lm models. The latter has a
> generic interface that let's you define your own function.
>
> One thing to pay attention to, though, is the numerical reliability.
> Especially on large time series with relatively short windows there is a
> good chance of encountering numerically challenging situations. The QR
> decomposition used by lm is fairly robust while other more straightforward
> matrix multiplications may not be. This should be kept in mind when writing
> your own Rcpp code for plugging it into RcppRoll.
>
> But I haven't check what the roll package does and how reliable that is...
>
> hth,
> Z
>
>
> On Thu, 21 Jul 2016, jeremiah rounds wrote:
>
> Hi,
>>
>> A not unusual task is performing a multiple regression in a rolling window
>> on a time-series.A standard piece of advice for doing in R is
>> something
>> like the code that follows at the end of the email.  I am currently using
>> an "embed" variant of that code and that piece of advice is out there too.
>>
>> But, it occurs to me that for such an easily specified matrix operation
>> standard R code is really slow.   rollapply constantly returns to R
>> interpreter at each window step for a new lm.   All lm is at its heart is
>> (X^t X)^(-1) * Xy,  and if you think about doing that with Rcpp in rolling
>> window you are just incrementing a counter and peeling off rows (or
>> columns
>> of X and y) of a particular window size, and following that up with some
>> matrix multiplication in a loop.   The psuedo-code for that Rcpp
>> practically writes itself and you might want a wrapper of something like:
>> rolling_lm (y=y, x=x, width=4).
>>
>> My question is this: has any of the thousands of R packages out there
>> published anything like that.  Rolling window multiple regressions that
>> stay in C/C++ until the rolling window completes?  No sense and writing it
>> if it exist.
>>
>>
>> Thanks,
>> Jeremiah
>>
>> Standard (slow) advice for "rolling window regression" follows:
>>
>>
>> set.seed(1)
>> z <- zoo(matrix(rnorm(10), ncol = 2))
>> colnames(z) <- c("y", "x")
>>
>> ## rolling regression of width 4
>> rollapply(z, width = 4,
>>   function(x) coef(lm(y ~ x, data = as.data.frame(x))),
>>   by.column = FALSE, align = "right")
>>
>> ## result is identical to
>> coef(lm(y ~ x, data = z[1:4,]))
>> coef(lm(y ~ x, data = z[2:5,]))
>>



Re: [R] C/C++/Fortran Rolling Window Regressions

2016-07-21 Thread Achim Zeileis

Jeremiah,

for this purpose there are the "roll" and "RcppRoll" packages. Both use
Rcpp and the former also provides rolling lm models. The latter has a
generic interface that lets you define your own function.


One thing to pay attention to, though, is the numerical reliability. 
Especially on large time series with relatively short windows there is a 
good chance of encountering numerically challenging situations. The QR 
decomposition used by lm is fairly robust while other more straightforward 
matrix multiplications may not be. This should be kept in mind when 
writing your own Rcpp code for plugging it into RcppRoll.
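
As a rough illustration of the reliability point, the following base R sketch fits the same regression twice: once through the normal equations (the "straightforward" matrix route) and once through a QR decomposition, which is what lm uses internally. The data are simulated and all variable names are made up for this example. On a well-conditioned problem like this one the two agree, but the condition number of X'X is the square of that of X, which is why the normal equations lose accuracy first when a window is nearly collinear.

```r
set.seed(42)
n <- 20
X <- cbind(1, rnorm(n), rnorm(n))   # design matrix with an intercept column
y <- rnorm(n)

# Normal equations: solve (X'X) b = X'y
beta_normal <- solve(crossprod(X), crossprod(X, y))

# QR route, as used by lm() and lm.fit()
beta_qr <- qr.coef(qr(X), y)

# On this well-conditioned example both routes agree
all.equal(as.vector(beta_normal), as.vector(beta_qr))

# Forming X'X squares the condition number
kappa(X, exact = TRUE)
kappa(crossprod(X), exact = TRUE)
```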


But I haven't checked what the roll package does and how reliable that is...

hth,
Z

On Thu, 21 Jul 2016, jeremiah rounds wrote:


Hi,

A not unusual task is performing a multiple regression in a rolling window
on a time-series.A standard piece of advice for doing in R is something
like the code that follows at the end of the email.  I am currently using
an "embed" variant of that code and that piece of advice is out there too.

But, it occurs to me that for such an easily specified matrix operation
standard R code is really slow.   rollapply constantly returns to R
interpreter at each window step for a new lm.   All lm is at its heart is
(X^t X)^(-1) * Xy,  and if you think about doing that with Rcpp in rolling
window you are just incrementing a counter and peeling off rows (or columns
of X and y) of a particular window size, and following that up with some
matrix multiplication in a loop.   The psuedo-code for that Rcpp
practically writes itself and you might want a wrapper of something like:
rolling_lm (y=y, x=x, width=4).

My question is this: has any of the thousands of R packages out there
published anything like that.  Rolling window multiple regressions that
stay in C/C++ until the rolling window completes?  No sense and writing it
if it exist.


Thanks,
Jeremiah

Standard (slow) advice for "rolling window regression" follows:


set.seed(1)
z <- zoo(matrix(rnorm(10), ncol = 2))
colnames(z) <- c("y", "x")

## rolling regression of width 4
rollapply(z, width = 4,
  function(x) coef(lm(y ~ x, data = as.data.frame(x))),
  by.column = FALSE, align = "right")

## result is identical to
coef(lm(y ~ x, data = z[1:4,]))
coef(lm(y ~ x, data = z[2:5,]))



Re: [R] C/C++/Fortran Rolling Window Regressions

2016-07-21 Thread Gabor Grothendieck
Just replacing lm with a faster version would speed it up.  Try lm.fit
or, even faster, fastLm in the RcppArmadillo package.
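
A minimal sketch of the lm.fit idea, using base R only. The data are simulated, and the loop over window end points and the names x, y, width and roll_coef are assumptions made for illustration. lm.fit skips the formula and model-frame processing that lm does on every call, so it only needs a ready-made design matrix.

```r
set.seed(1)
n     <- 10
width <- 4
x <- rnorm(n)
y <- rnorm(n)

# Build the design matrix once, with an explicit intercept column.
X <- cbind(intercept = 1, x = x)

# One lm.fit call per window; each row of the result is one window.
roll_coef <- t(sapply(width:n, function(end) {
  idx <- (end - width + 1):end
  lm.fit(X[idx, , drop = FALSE], y[idx])$coefficients
}))
roll_coef

# The first window matches a plain lm() on observations 1 to 4.
all.equal(unname(roll_coef[1, ]), unname(coef(lm(y[1:4] ~ x[1:4]))))
```

This still returns to the interpreter at every step, so it is a speed-up of the existing approach rather than the all-in-C++ loop asked about.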

On Thu, Jul 21, 2016 at 2:02 PM, jeremiah rounds
 wrote:
> Hi,
>
> A not unusual task is performing a multiple regression in a rolling window
> on a time-series.A standard piece of advice for doing in R is something
> like the code that follows at the end of the email.  I am currently using
> an "embed" variant of that code and that piece of advice is out there too.
>
> But, it occurs to me that for such an easily specified matrix operation
> standard R code is really slow.   rollapply constantly returns to R
> interpreter at each window step for a new lm.   All lm is at its heart is
> (X^t X)^(-1) * Xy,  and if you think about doing that with Rcpp in rolling
> window you are just incrementing a counter and peeling off rows (or columns
> of X and y) of a particular window size, and following that up with some
> matrix multiplication in a loop.   The psuedo-code for that Rcpp
> practically writes itself and you might want a wrapper of something like:
> rolling_lm (y=y, x=x, width=4).
>
> My question is this: has any of the thousands of R packages out there
> published anything like that.  Rolling window multiple regressions that
> stay in C/C++ until the rolling window completes?  No sense and writing it
> if it exist.
>
>
> Thanks,
> Jeremiah
>
> Standard (slow) advice for "rolling window regression" follows:
>
>
> set.seed(1)
> z <- zoo(matrix(rnorm(10), ncol = 2))
> colnames(z) <- c("y", "x")
>
> ## rolling regression of width 4
> rollapply(z, width = 4,
>function(x) coef(lm(y ~ x, data = as.data.frame(x))),
>by.column = FALSE, align = "right")
>
> ## result is identical to
> coef(lm(y ~ x, data = z[1:4,]))
> coef(lm(y ~ x, data = z[2:5,]))
>



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



[R] C/C++/Fortran Rolling Window Regressions

2016-07-21 Thread jeremiah rounds
Hi,

A not unusual task is performing a multiple regression in a rolling window
on a time series.  A standard piece of advice for doing this in R is something
like the code that follows at the end of the email.  I am currently using
an "embed" variant of that code, and that piece of advice is out there too.

But it occurs to me that for such an easily specified matrix operation,
standard R code is really slow.  rollapply returns to the R interpreter
at each window step for a new lm.  All lm is at its heart is
(X^t X)^(-1) X^t y, and if you think about doing that with Rcpp in a rolling
window, you are just incrementing a counter and peeling off rows (or columns)
of X and y of a particular window size, and following that up with some
matrix multiplication in a loop.  The pseudo-code for that Rcpp
practically writes itself, and you might want a wrapper of something like:
rolling_lm(y=y, x=x, width=4).

My question is this: has any of the thousands of R packages out there
published anything like that?  Rolling window multiple regressions that
stay in C/C++ until the rolling window completes?  No sense in writing it
if it exists.


Thanks,
Jeremiah

Standard (slow) advice for "rolling window regression" follows:


library(zoo)  # provides zoo() and rollapply()

set.seed(1)
z <- zoo(matrix(rnorm(10), ncol = 2))
colnames(z) <- c("y", "x")

## rolling regression of width 4
rollapply(z, width = 4,
          function(x) coef(lm(y ~ x, data = as.data.frame(x))),
          by.column = FALSE, align = "right")

## result is identical to
coef(lm(y ~ x, data = z[1:4,]))
coef(lm(y ~ x, data = z[2:5,]))



Re: [R] readline issue with 3.3.1

2016-07-21 Thread Martin Maechler
> Ralf Goertz 
> on Wed, 20 Jul 2016 16:37:53 +0200 writes:

> Am Wed, 20 Jul 2016 11:35:31 +0200
> schrieb Ralf Goertz :

>> Hi,
>> 
>> after a recent update to version 3.3.1 on Opensuse Leap I have
>> problems with command lines longer than the terminal width. E.g. when
>> I do this

> I installed readline version 6.3 and the issue is gone. So probably some
> of the recent changes in R's readline code are incompatible with
> readline version 6.2.

Yes, it seems so, unfortunately.

Thank you for reporting !

Our plan had been different: the NEWS entry for 3.3.1 (among 'BUG FIXES') says

• Use of Ctrl-C to terminate a reverse incremental search started
  by Ctrl-R in the readline-based Unix terminal interface is now
  supported when R was compiled against readline >= 6.0 (Ctrl-G
  always worked).  (PR#16603)

So we had hoped that change (fixing a bug you *should* have been
able to see with readline 6.2, as well) would work correctly in all
versions of readline >= 6.0,
but evidently it did not in yours.
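
For anyone wanting to check which readline their R was built against, a small sketch follows. It assumes an R version in which extSoftVersion() reports a "readline" entry; on builds where it does not, the fallback message (which is made up for this example) is shown instead.

```r
# extSoftVersion() lists versions of external libraries used by this R build.
versions <- extSoftVersion()

readline_version <- if ("readline" %in% names(versions)) {
  versions[["readline"]]
} else {
  "readline version not reported by this build"
}
readline_version
```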

Martin Maechler
ETH Zurich / R Core Team


Re: [R] how to expand the dataframe

2016-07-21 Thread Daniel Nordlund

On 7/20/2016 8:26 PM, lily li wrote:

Yes, I tried to create a dataframe and merge it with the shortened
dataframe. The resulting dataframe goes with the short one and truncates
the complete date column, so it does not work.

On Wed, Jul 20, 2016 at 6:38 PM, David Winsemius 
wrote:




On Jul 20, 2016, at 1:31 PM, lily li  wrote:

Hi R users,

I have a dataframe in which a column 'time' represents a time series
but is not complete. How can I expand the dataframe so that this column
becomes complete, with NA values in the other columns of the newly added rows?
Thanks.

df
A B C time
105 3.3 1990-01-01
115  4 1990-02-07
124 3  1990-02-14
...


Make a dataframe with a 'time' column using seq.Date and merge that
dataframe with your df dataframe.
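
A minimal sketch of that suggestion in base R. The values in df are invented for illustration (the table in the original post did not survive formatting), and a daily step is an assumption. Note that merge() keeps only matching rows by default, which would explain a result that "goes with the short one"; all.x = TRUE keeps every row of the complete date sequence.

```r
# Incomplete data: only three dates are present (illustrative values).
df <- data.frame(
  A = c(10, 11, 12),
  B = c(5, 5, 4),
  C = c(3.3, 4, 3),
  time = as.Date(c("1990-01-01", "1990-02-07", "1990-02-14"))
)

# Complete daily sequence covering the observed range.
full <- data.frame(time = seq(min(df$time), max(df$time), by = "day"))

# Keep all rows of 'full'; columns A, B, C become NA for the added dates.
expanded <- merge(full, df, by = "time", all.x = TRUE)
head(expanded)

# For comparison: the default (all = FALSE) keeps only the matching dates.
nrow(merge(full, df, by = "time"))
```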





Really, isn't it time you learned how to send plain text? You've
posted many questions on R-help.  It's really not that difficult in Gmail. I
also have a Gmail account and have had no difficulty finding instructions
on how to do it.





David Winsemius
Alameda, CA, USA






Don't just say you tried to do the merge and it doesn't work.  At a 
minimum show us the ACTUAL code you used and give us any error messages 
you got or show us a portion of the results and explain why it is not 
what you expected.  If possible, give us a small amount of data using 
dput() so that we can "play along at home" (i.e. give us a reproducible 
example).


Dan

Daniel Nordlund
Port Townsend, WA



--
Daniel Noredlund
Bothell, WA USA



[R] Reading PCorpus created using tm package

2016-07-21 Thread Paul Johnston
Dear All

I have created a permanent corpus and I can see that the file exists:

library(tm)
setwd("E:/textmining/texts")
(data_mined_permanent <- PCorpus(DirSource("E:/textmining/texts"),
                                 readerControl = list(languages = "eng"),
                                 dbControl = list(dbName = "E:/textmining/db_one",
                                                  dbType = "DB1")))
print(list.files(path = "E:/textmining/"))


## [1] "corpora" "db_one"
## [3] "part_one.pdf" "part_one.rmd"
## [5] "processing_corpra.pdf" "processing_corpra.rmd"
## [7] "textmining.pdf" "textmining.rmd"
## [9] "textmining2.pdf" "textmining2.Rmd"
## [11] "textmining3.pdf" "textmining3.Rmd"

However, I cannot see how to reload the said corpus after restarting an R
session.

I've seen this:
http://stackoverflow.com/questions/28377646/how-to-reconnect-to-the-pcorpus-in-the-r-tm-package

So can I create a corpus and later just reload the persistent copy from the file, or not?

Cheers Paul

Paul Johnston
Research Infrastructure
Room B39
Sackville Street Building

 http://bit.ly/ResearchITFeedback
We would run with all of our might 
Push the king off to take the hill 
And to learn who was king 



Re: [R] Issue with Transform function in R

2016-07-21 Thread tom
I'm not sure I can interpret the format of your Date column correctly; however,
the command
DF1$Date <- as.Date(DF1$Date, format = "formatstr")

will convert the dates into a class that R handles correctly (assuming the
column is character).
?strptime

should give you an idea of what formatstr should look like.
E.g., if
date <- "160721"
as.Date(date, format = "%y%m%d")

Note that %m is the month; %M would be minutes.
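
Once the column is of class Date, range comparisons behave as expected. A small sketch with made-up values in a yymmdd layout (this layout is an assumption and is not necessarily the format of the data in this thread):

```r
# Character dates in an assumed yymmdd layout.
date_chr <- c("160719", "160721", "160801")
d <- as.Date(date_chr, format = "%y%m%d")
format(d)

# Vectorised comparison: use & (not &&) to test every element.
in_range <- d >= as.Date("2016-07-20") & d <= as.Date("2016-07-31")
in_range
```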



From: Bhaskar Mitra

Re: [R] txtProgressBar()

2016-07-21 Thread tom
You may like to look at 
?suppressMessages

P.S. sorry for posting in HTML, new laptop and it’s next on my list of things 
to fix.


From: Ivan Calandra

Re: [R] Issue with Transform function in R

2016-07-21 Thread Ivan Calandra

Oops, missed that one!

--
Ivan Calandra, PhD
Scientific Mediator
University of Reims Champagne-Ardenne
GEGENAA - EA 3795
CREA - 2 esplanade Roland Garros
51100 Reims, France
+33(0)3 26 77 36 89
ivan.calan...@univ-reims.fr
--
https://www.researchgate.net/profile/Ivan_Calandra
https://publons.com/author/705639/

Le 21/07/2016 à 14:33, ruipbarra...@sapo.pt a écrit :


Hello,

Another thing to consider is to use Variable1 = NA, not '=='.
With '==' it will probably return TRUE/FALSE/NA.

Hope this helps,

Rui Barradas

Citando Ivan Calandra >:


This might not be the whole story, but part of the problem is that 
you want to select a character string greater/smaller than 
another. That doesn't make much sense!


I am not sure how to best compare two dates, but if you convert the 
Date values into numeric, then that would work. The problem is that 
it seems your Date values are character, and the comparison in your 
ifelse is also a character.


So something like this might work (untested because no reproducible 
example):
transform(Df1, ifelse(as.numeric(Date) > numeric.value1 && 
as.numeric(Date) < numeric.value2, Variable1 ==NA, Variable1))


HTH,
Ivan

--
Ivan Calandra, PhD
Scientific Mediator
University of Reims Champagne-Ardenne
GEGENAA - EA 3795
CREA - 2 esplanade Roland Garros
51100 Reims, France
+33(0)3 26 77 36 89
ivan.calan...@univ-reims.fr 
--
https://www.researchgate.net/profile/Ivan_Calandra
https://publons.com/author/705639/

Le 21/07/2016 à 03:00, Bhaskar Mitra a écrit :


Hello Everyone,


I am trying to replace the values in the 2nd column (Variable 1)
corresponding to certain dates  (Date)


with NAs as shown below. Both Date and Variable1 are numeric vectors 
. I am

trying to use the transform function


as shown below but it doesn’t seem to work even though if I am not 
getting

any error


Any suggestions/help in this regard?

regards,

-



Df1 <- data.frame(Date, Variable1)




a1 <- transform(Df1, ifelse(Date  > "010301000300 " && Date <
"010501000300", Variable1 ==NA, Variable1))


Original Data frame

    Date        Variable1
010101000300        1
010201000300        2
010301000300        3
010401000300        4
010501000300        5
010601000300        6
010701000300        7
...

Transformed data frame (i hope to transform)

    Date        Variable1
010101000300        1
010201000300        2
010301000300       NA
010401000300       NA
010501000300       NA
010601000300        6
010701000300        7
...







Re: [R] Help installing Rcmdr on Mac

2016-07-21 Thread Duncan Murdoch

On 20/07/2016 11:47 PM, Judy Munday wrote:

I am having trouble installing R Commander on a Mac (OS X 10.7.5).  I would 
really appreciate some help!


I receive the following message after following the instructions to install 
Rcmdr via Package Installer:


Warning: dependency 'rglwidget' is not available


This is an indication that the CRAN mirror you're using is incomplete. 
I recommend cloud.r-project.org as a reliable mirror; the folks at 
RStudio run it, and they do a good job.
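
For reference, a sketch of switching the mirror for the current session from the R console rather than through the Package Installer menu. The install call is left commented out so the snippet has no side effects beyond setting the option; whether Rcmdr and its dependencies have binaries for a given OS X and R version is a separate question that this does not answer.

```r
# Point this session at the cloud mirror.
options(repos = c(CRAN = "https://cloud.r-project.org"))
getOption("repos")

# Then retry the installation, pulling in dependencies as well:
# install.packages("Rcmdr", dependencies = TRUE)
```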



trying URL 
'http://cran.ms.unimelb.edu.au/bin/macosx/contrib/3.2/Rcmdr_2.2-5.tgz'
Content type 'application/x-gzip' length 5456133 bytes (5.2 MB)
==
downloaded 5.2 MB


That part was fine.

Duncan Murdoch


? If someone could please help me, I would be very grateful.   It is quite 
urgent!

Many thanks in advance

Judy Munday




Re: [R] Help installing Rcmdr on Mac

2016-07-21 Thread Fox, John
Dear Judy,

I see that the rglwidget package *is* available on CRAN but also that you're 
using a very old version of Mac OS X. You may have to upgrade OS X and install 
the current version of R (which is not available for OS X 10.7) in order to use 
the Rcmdr package.

You may also be able to get a more definitive answer from others more familiar 
with R package availability on Mac OS X.

I hope this helps,
 John

-
John Fox, Professor
McMaster University
Hamilton, Ontario
Canada L8S 4M4
Web: socserv.mcmaster.ca/jfox



> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Judy Munday
> Sent: July 20, 2016 11:47 PM
> To: r-help@R-project.org
> Subject: [R] Help installing Rcmdr on Mac
> 
> I am having trouble installing R Commander on a Mac (OS X 10.7.5).  I would
> really appreciate some help!
> 
> 
> I receive the following message after following the instructions to install 
> Rcmdr
> via Package Installer:
> 
> 
> Warning: dependency 'rglwidget' is not available trying URL
> 'http://cran.ms.unimelb.edu.au/bin/macosx/contrib/3.2/Rcmdr_2.2-5.tgz'
> Content type 'application/x-gzip' length 5456133 bytes (5.2 MB)
> ==
> downloaded 5.2 MB
> 
> ? If someone could please help me, I would be very grateful.   It is quite 
> urgent!
> 
> Many thanks in advance
> 
> Judy Munday
> 
> 



Re: [R] Issue with Transform function in R

2016-07-21 Thread ruipbarradas
Hello,

Another thing to consider is to use Variable1 = NA, not '=='.
With '==' it will probably return TRUE/FALSE/NA.
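
Putting the points from this thread together, a minimal sketch under some assumptions: the Date values are stored as character (the leading zeros suggest so), the bounds are inclusive (as in the desired output), and the names lower and upper are made up for illustration. Three things differ from the earlier attempt: the result is assigned back to Variable1 inside transform(), the vectorised & replaces &&, and the replacement value is a plain NA.

```r
Df1 <- data.frame(
  Date = c("010101000300", "010201000300", "010301000300", "010401000300",
           "010501000300", "010601000300", "010701000300"),
  Variable1 = 1:7,
  stringsAsFactors = FALSE
)

# Inclusive numeric bounds corresponding to 010301000300 and 010501000300.
lower <- 10301000300
upper <- 10501000300

a1 <- transform(
  Df1,
  Variable1 = ifelse(as.numeric(Date) >= lower & as.numeric(Date) <= upper,
                     NA, Variable1)
)
a1
```

Without the "Variable1 =" part, transform() has nothing to assign the ifelse() result to, which is why the earlier call ran without error but did not change the column.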

Hope this helps,

Rui Barradas
 

Citando Ivan Calandra :

> This might not be the whole story, but part of the problem is that  
> you want to select a _*character string*_ greater/smaller than  
> another. That doesn't make much sense!
>
> I am not sure how to best compare two dates, but if you convert the  
> Date values into numeric, then that would work. The problem is that  
> it seems your Date values are character, and the comparison in your  
> ifelse is also a character.
>
> So something like this might work (untested because no reproducible example):
> transform(Df1, ifelse(as.numeric(Date) > numeric.value1 &&  
> as.numeric(Date) < numeric.value2, Variable1 ==NA, Variable1))
>
> HTH,
> Ivan
>
> --
> Ivan Calandra, PhD
> Scientific Mediator
> University of Reims Champagne-Ardenne
> GEGENAA - EA 3795
> CREA - 2 esplanade Roland Garros
> 51100 Reims, France
> +33(0)3 26 77 36 89
> ivan.calan...@univ-reims.fr
> --
> https://www.researchgate.net/profile/Ivan_Calandra
> https://publons.com/author/705639/
>
> Le 21/07/2016 à 03:00, Bhaskar Mitra a écrit :
>> Hello Everyone,
>>
>> I am trying to replace the values in the 2nd column (Variable 1)
>> corresponding to certain dates  (Date)
>>
>> with NAs as shown below. Both Date and Variable1 are numeric vectors . I am
>> trying to use the transform function
>>
>> as shown below but it doesn’t seem to work even though if I am not getting
>> any error
>>
>> Any suggestions/help in this regard?
>>
>> regards,
>>
>> -
>>
>> Df1 <- data.frame(Date, Variable1)
>>
>> a1 <- transform(Df1, ifelse(Date  > "010301000300 " && Date <
>> "010501000300", Variable1 ==NA, Variable1))
>>
>> Original Data frame
>>
>>      Date                          Variable1
>>
>> 010101000300                     1
>>
>> 010201000300                     2
>>
>> 010301000300                     3
>>
>> 010401000300                     4
>>
>> 010501000300                     5
>>
>> 010601000300                     6
>>
>> 010701000300                     7
>>
>> .
>>
>> .
>>
>> .
>>
>> ……….
>>
>> Transformed data frame (i hope to transform)
>>
>>       Date                       Variable1
>>
>> 010101000300                    1
>>
>> 010201000300                    2
>>
>> 010301000300                   NA
>>
>> 010401000300                   NA
>>
>> 010501000300                    NA
>>
>> 010601000300                    6
>>
>> 010701000300                    7
>>
>> ……….
>>

 


Re: [R] splitting a vector of strings

2016-07-21 Thread Michael Dewey

Dear Eric

I think you are looking for sub or gsub.

Without an example set of input and output I am not quite sure, but you 
would need to define an expression which matches your separator (;) 
followed by any characters up to the end of the line. If you have trouble 
with that then someone here will no doubt write the pattern for you, but 
learning about regular expressions is well worthwhile.
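
A minimal sketch of such a pattern, using a made-up input vector: ";.*$" matches the first semicolon and everything after it up to the end of the string, and sub() replaces that match with nothing, leaving the first piece.

```r
x <- c("a;b;c", "d;e", "foo;g;h;i")

# Drop the first ";" and everything that follows it.
first_piece <- sub(";.*$", "", x)
first_piece
```

This returns a character vector directly and does not depend on every string having the same number of pieces; a string without any ";" is returned unchanged.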


On 21/07/2016 12:54, Eric Elguero wrote:

Hi everybody,

I have a vector of character strings.
Each string has the same pattern and I want
to split them in pieces and get a vector made
of the first pieces of each string.

The problem is that strsplit returns a list.

All I found is

uu<- matrix(unlist(strsplit(x,";")),ncol=3,byrow=T)[,1]

where x is the vector ";" is the delimiting character
and I know that each string will be cut in 3 pieces.

That works for my problem but I would prefer a
more elegant solution. Besides, it would not
work if all the string didn't have the same
number of pieces.

does someone have a better solution?

sorry if that topic was discussed recently.
There is too much traffic on the r-help list,
I cannot catch up.



--
Michael
http://www.dewey.myzen.co.uk/home.html



Re: [R] Issue with Transform function in R

2016-07-21 Thread Ivan Calandra
This might not be the whole story, but part of the problem is that you 
want to select a character string greater/smaller than another. That 
doesn't make much sense!


I am not sure how to best compare two dates, but if you convert the Date 
values into numeric, then that would work. The problem is that it seems 
your Date values are character, and the comparison in your ifelse is 
also a character.


So something like this might work (untested because no reproducible 
example):
transform(Df1, ifelse(as.numeric(Date) > numeric.value1 && 
as.numeric(Date) < numeric.value2, Variable1 ==NA, Variable1))


HTH,
Ivan

--
Ivan Calandra, PhD
Scientific Mediator
University of Reims Champagne-Ardenne
GEGENAA - EA 3795
CREA - 2 esplanade Roland Garros
51100 Reims, France
+33(0)3 26 77 36 89
ivan.calan...@univ-reims.fr
--
https://www.researchgate.net/profile/Ivan_Calandra
https://publons.com/author/705639/

Le 21/07/2016 à 03:00, Bhaskar Mitra a écrit :

Hello Everyone,


I am trying to replace the values in the 2nd column (Variable 1)
corresponding to certain dates  (Date)


with NAs as shown below. Both Date and Variable1 are numeric vectors . I am
trying to use the transform function


as shown below but it doesn’t seem to work even though if I am not getting
any error


Any suggestions/help in this regard?

regards,

-



Df1 <- data.frame(Date, Variable1)




a1 <- transform(Df1, ifelse(Date  > "010301000300 " && Date <
"010501000300", Variable1 ==NA, Variable1))


Original Data frame

    Date        Variable1
010101000300        1
010201000300        2
010301000300        3
010401000300        4
010501000300        5
010601000300        6
010701000300        7
...

Transformed data frame (i hope to transform)

    Date        Variable1
010101000300        1
010201000300        2
010301000300       NA
010401000300       NA
010501000300       NA
010601000300        6
010701000300        7
...


Re: [R] splitting a vector of strings

2016-07-21 Thread Ben Tupper
Hi,

I'm not sure about the more generalized solution, but how about this for a 
start.


x <- c("a;b;c", "d;e", "foo;g;h;i")
x
#[1] "a;b;c" "d;e"   "foo;g;h;i"

sapply(strsplit(x, ";",fixed = TRUE), '[',1)
#[1] "a"   "d"   "foo"

If you want elegance then I suggest you take a look at the stringr package. 

https://cran.r-project.org/web/packages/stringr/index.html
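
As a hedged aside, two base-R variants of the same idea that also cope with strings having different numbers of pieces. This is only a sketch; the object names are invented for the example.

```r
x <- c("a;b;c", "d;e", "foo;g;h;i")

# vapply() guarantees a character vector with one element per input string
first_vapply <- vapply(strsplit(x, ";", fixed = TRUE),
                       function(pieces) pieces[1],
                       character(1))

# sub() drops everything from the first ";" onwards, so no list is created
first_sub <- sub(";.*$", "", x)

print(first_vapply)
print(first_sub)
```

Both give the first piece of each string; the sub() version avoids splitting altogether when only the first piece is needed.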

Cheers,
Ben



Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org

Report Gulf of Maine jellyfish sightings to jellyf...@bigelow.org or tweet them 
to #MaineJellies -- include date, time, and location, as well as any 
descriptive information such as size or type.  Learn more at 
https://www.bigelow.org/research/srs/nick-record/nick-record-laboratory/mainejellies/


[R] splitting a vector of strings

2016-07-21 Thread Eric Elguero

Hi everybody,

I have a vector of character strings.
Each string has the same pattern and I want
to split them in pieces and get a vector made
of the first pieces of each string.

The problem is that strsplit returns a list.

All I found is

uu<- matrix(unlist(strsplit(x,";")),ncol=3,byrow=T)[,1]

where x is the vector, ";" is the delimiting character,
and I know that each string will be cut in 3 pieces.

That works for my problem but I would prefer a
more elegant solution. Besides, it would not
work if the strings didn't all have the same
number of pieces.

does someone have a better solution?

sorry if that topic was discussed recently.
There is too much traffic on the r-help list,
I cannot catch up.

--
Eric Elguero

MIVEGEC. - UMR (CNRS/IRD/UM) 5290
Maladies Infectieuses et Vecteurs, Génétique, Evolution et Contrôle
Institut de Recherche pour le Développement (IRD)
911, Avenue Agropolis
BP 64501
34394 Montpellier Cedex 5, France


Re: [R-es] Error with R Commander

2016-07-21 Thread martin_jose


Many thanks, it worked.

The problem is that the Mac is already rather old and does not accept any 
more updates. So every time something gets updated on it... I start trembling!!!


Best regards

___
R-help-es mailing list
R-help-es@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-help-es


[R] Help installing Rcmdr on Mac

2016-07-21 Thread Judy Munday
I am having trouble installing R Commander on a Mac (OS X 10.7.5).  I would 
really appreciate some help!


I receive the following message after following the instructions to install 
Rcmdr via Package Installer:


Warning: dependency 'rglwidget' is not available
trying URL 
'http://cran.ms.unimelb.edu.au/bin/macosx/contrib/3.2/Rcmdr_2.2-5.tgz'
Content type 'application/x-gzip' length 5456133 bytes (5.2 MB)
==
downloaded 5.2 MB

If someone could please help me, I would be very grateful. It is quite 
urgent!

Many thanks in advance

Judy Munday




[R] Issue with Transform function in R

2016-07-21 Thread Bhaskar Mitra
Hello Everyone,


I am trying to replace the values in the 2nd column (Variable1)
corresponding to certain dates (Date) with NAs, as shown below. Both Date
and Variable1 are numeric vectors. I am trying to use the transform
function as shown below, but it doesn't seem to work even though I am not
getting any error.

Any suggestions/help in this regard?

regards,



Df1 <- data.frame(Date, Variable1)




a1 <- transform(Df1, ifelse(Date  > "010301000300 " && Date <
"010501000300", Variable1 ==NA, Variable1))


Original Data frame

        Date  Variable1
010101000300          1
010201000300          2
010301000300          3
010401000300          4
010501000300          5
010601000300          6
010701000300          7
...

Transformed data frame (I hope to transform)

        Date  Variable1
010101000300          1
010201000300          2
010301000300         NA
010401000300         NA
010501000300         NA
010601000300          6
010701000300          7
...


Re: [R-es] Error with R Commander

2016-07-21 Thread Isidro Hidalgo Arellano
Have you updated the 'pbkrtest' package to the new version (it is now at
0.4-6)?
The error is telling you that yours (0.4-2) is too old.
Regards

Isidro Hidalgo Arellano
Observatorio del Mercado de Trabajo
Consejería de Economía, Empresas y Empleo
http://www.castillalamancha.es/





Re: [R-es] Error with R Commander

2016-07-21 Thread miguel.angel.rodriguez.muinos
Hello Jose.

You need to install a more recent version of
https://cran.r-project.org/web/packages/pbkrtest/index.html

The error says that you have 0.4-2 and need 0.4-4 or 0.4-6 (which
is the latest)
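
A small sketch of how that version comparison can be checked from the R console. The two version strings are copied from the error message; the commented-out lines at the end are an assumption about a typical fix and need a reachable CRAN mirror.

```r
# Versions taken from the error message; R treats "0.4-4" and "0.4.4" alike
installed <- package_version("0.4-2")
required  <- package_version("0.4.4")

needs_update <- installed < required
print(needs_update)

# On the affected machine (assumption: pbkrtest is installed there):
# if (packageVersion("pbkrtest") < "0.4.4") install.packages("pbkrtest")
```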

Regards,

--
Miguel Ángel Rodríguez Muíños
Dirección Xeral de Saúde Pública
Consellería de Sanidade
Xunta de Galicia
http://dxsp.sergas.es





[R-es] Error with R Commander

2016-07-21 Thread JOSE MARTIN AREVALO
Hello, a few days ago I updated the packages I had installed in R, and R 
Commander was updated with them. Now, when I try to open the package, I get 
the following error:

Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = 
vI[[j]]) : 
  namespace ‘pbkrtest’ 0.4-2 is being loaded, but >= 0.4.4 is required

The version of R I have installed is 3.2.1 for Mac. I have looked at the 
requirements of R Commander and it seems I have all the packages it needs 
installed.

How can I fix it?

Any help will be welcome.

Many thanks


[R] R Toolbox (Release 2 of 2016-07-21)

2016-07-21 Thread G . Maubach
Hi All,

I have uploaded a new release of the R Toolbox.

R Toolbox is a collection of simple but useful functions which I developed 
for myself to shorten the development process. Currently all functions use 
base R. No other packages are needed. The one exception is "t_openxlsx", 
because this module deals explicitly with the openxlsx package.

It is simple to install the functions. Just copy them to an appropriate 
place on your hard disk and set the variable "t_toolbox_location" to 
the place you stored the toolbox in. Running "r_toolbox.R" from that 
location will load all modules.
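
To illustrate the loading step in general terms, here is a minimal sketch of sourcing every module file from one folder. It is not the actual content of "r_toolbox.R"; the temporary folder and the toy module t_demo_n_valid are invented so that the example is self-contained.

```r
# Hypothetical stand-in for the folder the toolbox was copied to
t_toolbox_location <- file.path(tempdir(), "r_toolbox_demo")
dir.create(t_toolbox_location, showWarnings = FALSE)

# Write one toy module so the sketch runs without the real toolbox
writeLines("t_demo_n_valid <- function(x) sum(!is.na(x))",
           file.path(t_toolbox_location, "t_demo_n_valid.R"))

# Source every module file (t_*.R or t_*.r) found in that folder
modules <- list.files(t_toolbox_location, pattern = "^t_.*\\.[Rr]$",
                      full.names = TRUE)
for (module in modules) source(module)

# The function defined by the toy module is now available
print(t_demo_n_valid(c(1, NA, 3)))
```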

In addition to new functions (see Release Comparison below), some functions 
were improved. They are now called with their package names, e.g. 
openxlsx::read.xlsx() instead of read.xlsx(). This avoids confusion with 
functions that have the same name but come from other packages.
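
A short sketch of why the package prefix helps, using only base R. The local median() below is deliberately made up to mask the one from the stats package.

```r
# A local function that happens to share its name with stats::median()
median <- function(x) "masked"

masked_result    <- median(c(1, 5, 9))         # calls the local definition
qualified_result <- stats::median(c(1, 5, 9))  # always the stats version

print(masked_result)
print(qualified_result)
```

The qualified call keeps working no matter what else is defined or attached with the same name.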

Please be aware that I have included some untested functions in this 
release. All modules now have a variable "t_status" stating the 
development status, e.g. "development", "testing", "release". 

Here is a Release Comparison:

-- cut --

release_comparison <-
  structure(list(
    Module = c("r_toolbox.R", "t_adjust_packages.R", "t_conventions.r",
               "t_create_variable.R", "t_definitions.R",
               "t_find_originals_and_duplicates.R", "t_get_factor_levels.R",
               "t_merge_variables.R", "t_n_miss.R", "t_n_valid.R",
               "t_openxlsx_shortcuts.r", "t_rename_variables.R",
               "t_replace_na.R", "t_report_memory.R",
               "t_select_vars_by_type.R"),
    Release1 = c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE,
                 FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE),
    Release2 = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
                 TRUE, TRUE, TRUE, TRUE, TRUE, TRUE)),
    .Names = c("Module", "Release1", "Release2"),
    row.names = c(NA, 15L), class = "data.frame")
edit(release_comparison)

-- cut ---

Release 1 is of 2016-05-31, Release 2 of 2016-06-21.

You can download the toolbox from

https://sourceforge.net/projects/r-project-utilities/

Kind regards

Georg Maubach



Re: [R-es] just one favor

2016-07-21 Thread Francisco Rodriguez Sanchez

Hello Mauricio,

As others have already pointed out, spTransform and CRS are only for 
reprojecting (from geographic coordinates to UTM in this case).


To crop to a particular area you can use, for example, the crop function 
from the raster package (among other options).


Something like this works:

library(sp)   # provides spTransform() and CRS()

chile <- readRDS("CHL_adm0.rds")

# project to UTM 19S
chile.UTM <- spTransform(chile,
  CRS("+proj=utm +zone=19 +south +ellps=WGS84 +datum=WGS84 +units=m +no_defs"))

bbox(chile.UTM)   # bounding box (extent) of the entire country

# Now crop to selected region
library(raster)
area.extent <- extent(c(10, 50, 450, 500))  # define limits (in UTM coordinates)

study.area <- crop(chile.UTM, area.extent)   # crop
plot(study.area)


Good luck

Paco



On 20/07/2016 at 18:43, Mauricio Mardones Inostroza wrote:

Hello everyone,
This is my first question to the group, and it is a simple one, but it has
me stuck. I am trying to crop my map (set limits in UTM) to a place defined
as my study area (in this case southern Chile). But I think I am not using
the CRS function correctly to set the required limits.

study_area <- readRDS("CHL_adm0.rds")
study_area_UTM <- spTransform(study_area, CRS("+proj=utm +zone=19 +datum=WGS84"))
study_area_UTM <- spTransform(study_area_UTM, CRS(
  paste("+x_0=-200.0 +y_0=-50.0 +ellps=GRS80 +units=us-ft +no_defs")))
Error in spTransform(study_area_UTM, CRS(paste("+x_0=-200.0
+y_0=-50.0 +ellps=GRS80 +units=us-ft +no_defs"))) :
   error in evaluating the argument 'CRSobj' in selecting a method for
function 'spTransform': Error in CRS(paste("+x_0=-200.0 +y_0=-50.0
+ellps=GRS80 +units=us-ft +no_defs")) :
   projection not named

I hope you can help me.

Regards


--
Dr Francisco Rodriguez-Sanchez
Integrative Ecology Group
Estacion Biologica de Doñana - CSIC
Avda. Americo Vespucio s/n
41092 Sevilla (Spain)
http://bit.ly/frod_san



Re: [R-es] just one favor

2016-07-21 Thread rubenfcasal

Hello Mauricio,

I am not quite sure what you are trying to do. First, the 
projection/CRS would not serve to crop the data (it simply makes it 
possible to interpret the coordinates); for that I would in principle 
use the 'over' function from the sp package (run 'vignette("over")').


Second, your object "CHL_adm0.rds" appears to be a spatial polygon 
corresponding to the outline of Chile (downloaded from 
http://www.gadm.org/country). If you are only going to use it for 
plotting, it would be enough to set the limits when drawing the 
plot (or paint over it...). If you need something more, cropping it 
may be a problem. What you may want is to increase the 
resolution (using "CHL_admX.rds", i.e. administrative areas of 
level X) and select the areas that interest you. In any case, I 
recommend that you read up on how to manipulate this kind of object 
(the recommended starting reference would be the book Applied Spatial 
Data Analysis With R).


Regards, Rubén.




Re: [R] txtProgressBar()

2016-07-21 Thread Ivan Calandra

Thank you Greg,

This is what I figured out... The problem with txtProgressBar() is that 
many packages display some information during installation (even with 
quiet=TRUE), especially the installation of dependencies, so that the 
progress bar is not very useful. So I have tried with tkProgressBar() 
and it seems to work, although it takes some time to initialize.


Ivan

--
Ivan Calandra, PhD
Scientific Mediator
University of Reims Champagne-Ardenne
GEGENAA - EA 3795
CREA - 2 esplanade Roland Garros
51100 Reims, France
+33(0)3 26 77 36 89
ivan.calan...@univ-reims.fr
--
https://www.researchgate.net/profile/Ivan_Calandra
https://publons.com/author/705639/

On 20/07/2016 at 17:39, Greg Snow wrote:

You need to figure out how to tell txtProgressBar what the progress is.

One simple option would be that if you are installing 10 packages,
then create the bar with a range of values from 0 to 10 and initialize
it at 0, then after the first package installs update it to show 1,
after the 2nd installs update it to show 2, etc. until all 10 are
installed.

This is the simplest from the programming side, but the packages may
take different amounts of time to install.  If you have a feel for how
long they take to install (relative to each other) then you can
incorporate this with a percentage, e.g. after the 1st package
installs you may set the bar to 28%, after the second installs you may
then update it to 31%, etc. with the jumps proportional to expected
time to install.
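
A rough sketch of both options described above. The package names and expected install times are invented for illustration, and Sys.sleep() stands in for the real install.packages() call.

```r
# Hypothetical package names; Sys.sleep() replaces install.packages() here
pkgs <- c("pkgA", "pkgB", "pkgC")

# Option 1: one step per installed package (bar runs from 0 to 3)
pb <- txtProgressBar(min = 0, max = length(pkgs), initial = 0, style = 3)
for (i in seq_along(pkgs)) {
  Sys.sleep(0.1)             # placeholder for install.packages(pkgs[i])
  setTxtProgressBar(pb, i)   # update once the i-th package is done
}
final_steps <- getTxtProgressBar(pb)
close(pb)

# Option 2: jumps proportional to assumed relative install times
expected_time <- c(28, 3, 69)            # made-up relative durations
progress_after <- cumsum(expected_time)  # 28, 31, 100
pb2 <- txtProgressBar(min = 0, max = sum(expected_time), style = 3)
for (i in seq_along(pkgs)) {
  Sys.sleep(0.1)
  setTxtProgressBar(pb2, progress_after[i])
}
final_weighted <- getTxtProgressBar(pb2)
close(pb2)
```

With these made-up weights the second bar jumps to 28%, then 31%, then 100%.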


On Wed, Jul 20, 2016 at 2:00 AM, Ivan Calandra
 wrote:

Dear useRs,

In a script that will be source()d, I want to install the uninstalled
packages and follow the progression with a bar. So I looked at
txtProgressBar() but I cannot figure out how to use it to show the
progression of the installation.

All the examples I have found just display the progress of... the progress
bar itself ?

Any idea?

Thanks in advance,
Ivan

--
Ivan Calandra, PhD
Scientific Mediator
University of Reims Champagne-Ardenne
GEGENAA - EA 3795
CREA - 2 esplanade Roland Garros
51100 Reims, France
+33(0)3 26 77 36 89
ivan.calan...@univ-reims.fr
--
https://www.researchgate.net/profile/Ivan_Calandra
https://publons.com/author/705639/
