Hello Peter, judging from your code snippet:

|> ts_Y <- ts(log_residuals[1:104]); # detrended sales data
|> ts_XGG <- ts(salesmodeldata$gtrends_global[1:104]);
|> ts_XGL <- ts(salesmodeldata$gtrends_local[1:104]);
|> training_matrix <- data.frame(ts_Y, ts_XGG, ts_XGL);
|>
|> ### Try VAR(3)
|> var_model <- VAR (y=training_matrix, p=3,
|>                   type="both", season=NULL,
|>                   exogen=NULL, lag.max=NULL);

you have one endogenous variable, namely ts_Y, and two exogenous
variables, namely ts_XGG and ts_XGL. However, the way you have set up
'training_matrix', all three variables are treated as endogenous (see
?VAR for more information). What you really want to estimate and
predict is a **univariate** AR(3) model with two exogenous variables.
For this type of model VAR() is not the right function; you could use
lm() and/or dynlm() instead. The forecasts should then be computed
recursively.

Best,
Bernhard

|> -----Original Message-----
|> From: r-help-boun...@r-project.org
|> [mailto:r-help-boun...@r-project.org] On Behalf Of pe...@linelink.nl
|> Sent: Sunday, February 07, 2010 11:37 PM
|> To: r-help@r-project.org
|> Subject: [R] Out-of-sample prediction with VAR
|>
|> Good day,
|>
|> I'm using a VAR model to forecast sales with some extra variables
|> (Google Trends data). I have divided my dataset into a training set
|> (weekly sales + vars in 2006 and 2007) and a holdout set (2008).
|> It is unclear to me how I should predict the out-of-sample data,
|> because the predict() function in the vars package seems to estimate
|> my Google Trends vars as well. However, I want to forecast the sales
|> figures with knowledge of the actual Google Trends data.
|>
|> My questions:
|> 1. How should I do this? I currently extract the linear model
|>    generated by the VAR(3) function to predict the holdout set, but
|>    that seems inappropriate?
|> 2. If I am doing it right, how is it possible that an automatically
|>    fitted model with more variables actually performs worse (in terms
|>    of MAPE)?
|>    Shouldn't it at least predict as well as the simple AR(3), by
|>    finding that the extra variables have no added value?
|>
|> My code:
|>
|> ts_Y <- ts(log_residuals[1:104]); # detrended sales data
|> ts_XGG <- ts(salesmodeldata$gtrends_global[1:104]);
|> ts_XGL <- ts(salesmodeldata$gtrends_local[1:104]);
|> training_matrix <- data.frame(ts_Y, ts_XGG, ts_XGL);
|>
|> ### Try VAR(3)
|> var_model <- VAR (y=training_matrix, p=3,
|>                   type="both", season=NULL,
|>                   exogen=NULL, lag.max=NULL);
|>
|> ## Out of sample forecasting
|> var.lm = lm(var_model$varresult$ts_Y); # the generated LM
|>
|> ts_Y <- ts(log_residuals[105:155]);
|> ts_XGG <- ts(salesmodeldata$gtrends_global[105:155]);
|> ts_XGL <- ts(salesmodeldata$gtrends_local[105:155]);
|>
|> # Notice how I manually create the lagged values to be used in the
|> # Linear Model
|> holdout_matrix <- na.omit(data.frame(ts.union(ts_Y, ts_XGG, ts_XGL,
|>     ts_Y.l1 = lag(ts_Y,-1), ts_Y.l2 = lag(ts_Y,-2),
|>     ts_Y.l3 = lag(ts_Y,-3),
|>     ts_XGG.l1 = lag(ts_XGG,-1), ts_XGG.l2 = lag(ts_XGG,-2),
|>     ts_XGG.l3 = lag(ts_XGG,-3),
|>     ts_XGL.l1 = lag(ts_XGL,-1), ts_XGL.l2 = lag(ts_XGL,-2),
|>     ts_XGL.l3 = lag(ts_XGL,-3),
|>     const=1, trend=0.0001514194 )));
|>
|> var.predict = predict(object=var_model, n.ahead=52,
|>                       dumvar=holdout_matrix);
|>
|> ## Assess accuracy
|> calc_mape (holdout_matrix$ts_Y, var.predict, islog=T, print=T)
|>
|> Some context:
|> For my Master's thesis I'm using R to test the predictive power of web
|> metrics (such as Google Trends data and pageviews) in sales
|> forecasting. To assess this properly, I employ a simple AR model (for
|> the time series without the extra variables) and a VAR model for the
|> predictions with the extra variables. I also develop a random forest
|> with and without the buzz variables and check whether MAPE improves.
|>
|> Many thanks in advance!
|>
|> ______________________________________________
|> R-help@r-project.org mailing list
|> https://stat.ethz.ch/mailman/listinfo/r-help
|> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
|> and provide commented, minimal, self-contained, reproducible code.
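The lm()-based approach with recursive forecasts suggested in the reply can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not Bernhard's or Peter's code: it assumes the two Google Trends series enter contemporaneously (their lags could be added the same way), uses only an intercept as the deterministic term, and replaces log_residuals and salesmodeldata with simulated, hypothetical data so that it runs on its own. The helper make_design() is likewise hypothetical. The model is fitted on weeks 1 to 104, and weeks 105 to 155 are forecast recursively: each step uses the actual Google Trends values for that week, but earlier forecasts for any sales lag that falls inside the holdout period.

```r
## Minimal sketch (base R only): univariate AR(3) with two exogenous
## regressors, fitted by lm() on weeks 1-104, forecast recursively for
## weeks 105-155. The simulated series are HYPOTHETICAL stand-ins for
## log_residuals and salesmodeldata$gtrends_global / gtrends_local.
set.seed(1)
n   <- 155
xgg <- as.numeric(arima.sim(list(ar = 0.5), n = n))  # stand-in: global trends
xgl <- as.numeric(arima.sim(list(ar = 0.3), n = n))  # stand-in: local trends
y   <- numeric(n)                                    # stand-in: detrended sales
for (t in 4:n) {
  y[t] <- 0.4 * y[t - 1] - 0.2 * y[t - 2] + 0.1 * y[t - 3] +
          0.5 * xgg[t] + 0.3 * xgl[t] + rnorm(1, sd = 0.5)
}

## Hypothetical helper: three lags of y plus the contemporaneous
## exogenous values for the time points in idx.
make_design <- function(y, xgg, xgl, idx) {
  data.frame(y    = y[idx],
             y.l1 = y[idx - 1], y.l2 = y[idx - 2], y.l3 = y[idx - 3],
             xgg  = xgg[idx], xgl = xgl[idx])
}

## Fit on the training weeks (the first three are lost to lagging).
train_idx <- 4:104
fit <- lm(y ~ y.l1 + y.l2 + y.l3 + xgg + xgl,
          data = make_design(y, xgg, xgl, train_idx))

## Recursive forecasts: actual exogenous values at each step, while any
## sales lag inside the holdout period is taken from earlier forecasts.
h_idx  <- 105:n
y_path <- y
y_path[h_idx] <- NA
for (t in h_idx) {
  nd <- data.frame(y.l1 = y_path[t - 1], y.l2 = y_path[t - 2],
                   y.l3 = y_path[t - 3], xgg = xgg[t], xgl = xgl[t])
  y_path[t] <- predict(fit, newdata = nd)
}
y_hat <- y_path[h_idx]

## Simple accuracy check against the held-out values.
rmse <- sqrt(mean((y[h_idx] - y_hat)^2))
print(rmse)
```

A plain AR(3) benchmark, as in Peter's comparison, is the same code with xgg and xgl dropped from both the formula and the newdata frame. dynlm() offers a more compact formula notation for the lags, but the forecasts would still have to be computed recursively in a loop like this one.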