Re: Robust regression
If, for example, the normality assumption holds, then by doing robust regression instead of OLS you lose efficiency. So it's not the same result after all. But you can do both, compare, and decide: if robust regression produces results that are not really different from OLS, stay with OLS.

On Fri, 1 Mar 2002, Rich Ulrich wrote:

On 1 Mar 2002 00:36:01 -0800, [EMAIL PROTECTED] (Alex Yu) wrote:

I know that robust regression can downweight outliers. Should someone apply robust regression when the data have skewed distributions but do not have outliers? Regression assumptions require normality of the residuals, not normality of the raw scores. So does it help at all to use robust regression in this situation? Any help will be appreciated.

Go ahead and do it if you want. If someone asks (or even if they don't), you can tell them that robust regression gives exactly the same result.

-- Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
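To "do both, compare and decide" as suggested above, here is a minimal sketch in Python (statsmodels; the Huber norm and the simulated skewed-but-outlier-free data are just illustrative assumptions, not the original poster's setup):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 200
    x = rng.normal(size=n)
    u = rng.exponential(scale=1.0, size=n) - 1.0   # skewed errors, no outliers
    y = 1.0 + 2.0 * x + u

    X = sm.add_constant(x)
    ols = sm.OLS(y, X).fit()
    rlm = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()  # Huber M-estimator

    print("OLS:   ", ols.params)
    print("Robust:", rlm.params)
    # If the two sets of coefficients are close, stay with OLS.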
Re: How to test whether f(X,Y)=f(X)f(Y) is true??
You can start by checking whether they are correlated; it's simpler to do. If you find that they are correlated, then you have the answer to your question. If you find that they are uncorrelated but have a reason to believe they may not be independent anyway, then you can look for more advanced tests.

On 20 Feb 2002, Linda wrote:

Hi! I have some experimental data collected that can be grouped into two variables, X and Y. One is the dependent variable (Y) and the other is an independent variable (X). What test should I use to check whether they can be treated as independent or not? Thanks.

Linda
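A sketch of the simple first step in Python (scipy; the simulated x and y are stand-ins for Linda's data, and Spearman is added as a cheap extra check for monotone but nonlinear dependence):

    import numpy as np
    from scipy.stats import pearsonr, spearmanr

    rng = np.random.default_rng(1)
    x = rng.normal(size=500)
    y = 0.3 * x + rng.normal(size=500)

    r, p = pearsonr(x, y)
    print("Pearson r = %.3f, p = %.4f" % (r, p))
    rho, p2 = spearmanr(x, y)
    print("Spearman rho = %.3f, p = %.4f" % (rho, p2))
    # A small p-value -> correlated, hence not independent.
    # No correlation does NOT prove independence; in that case look for
    # more advanced tests (e.g. a chi-square test on a two-way binning
    # of X and Y, or distance correlation).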
Re: garch residuals
Stock market returns usually satisfy the martingale-difference property, and so are uncorrelated. I think you should check your calculations again for errors. Are you sure that you are working with returns and not prices? I guess that by heavy correlation you mean that the estimated autoregressive coefficient is close to 1, which holds for prices. Just a suggestion; hope it helps.

On Tue, 19 Feb 2002, Daan Taks wrote:

I have a question about my residuals. When testing for autocorrelation I come to the conclusion that the models (GARCH, EGARCH, GJR a.k.a. TARCH) remove the correlation from the squared standardized residuals but not from the standardized residuals. Are my models misspecified? I use returns from the FTSE, the DAX, and the S&P. These returns are (heavily) correlated; should a GARCH model remove the correlation of the returns? Or should it only remove the correlation of the squared returns? Thanks.
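One way to check the returns-versus-prices suspicion, sketched in Python (recent statsmodels; the simulated geometric random walk is a placeholder for the actual index series):

    import numpy as np
    from statsmodels.stats.diagnostic import acorr_ljungbox

    rng = np.random.default_rng(2)
    prices = 100 * np.exp(np.cumsum(0.01 * rng.normal(size=1000)))
    returns = np.diff(np.log(prices))   # log returns, not levels

    # Ljung-Box test for autocorrelation up to lag 10
    print(acorr_ljungbox(returns, lags=[10], return_df=True))
    print(acorr_ljungbox(returns**2, lags=[10], return_df=True))
    # Price levels look heavily autocorrelated (AR coefficient near 1);
    # genuine returns usually do not, though squared returns often do
    # (ARCH effects), which is what GARCH is meant to capture.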
Re: Non Parametric Unit Root Test
Uniform distribution of what? Unit-root testing theory uses asymptotic results, so the underlying distribution does not really matter as long as it satisfies some conditions. Check out Davidson, Econometric Theory; you can find a good intro to unit-root tests there. A more advanced treatment is in Tanaka, Time Series Analysis.

On 22 Jan 2002, Maand M wrote:

Hi: I would like to know where I can read more about nonparametric unit root tests for the uniform distribution. Any book or paper on it? Any comment is welcome. Maand
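For reference, the standard (parametric) augmented Dickey-Fuller test is one line in Python; a sketch with statsmodels (the simulated random walk is just for illustration):

    import numpy as np
    from statsmodels.tsa.stattools import adfuller

    rng = np.random.default_rng(3)
    x = np.cumsum(rng.normal(size=500))   # a random walk: has a unit root

    stat, pvalue, usedlag, nobs, crit, icbest = adfuller(x)
    print("ADF statistic = %.3f, p-value = %.3f" % (stat, pvalue))
    # A large p-value: cannot reject the unit-root null, as expected here.
    # The asymptotics do not require a particular innovation distribution,
    # only regularity conditions, as noted above.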
Re: Buy Book on Probability and statistical inference
Casella and Berger, Statistical Inference, is a very popular graduate-level textbook on the topic. It's not related to your field directly, but it gives an introduction to the concepts used in statistics: likelihood, sufficiency, completeness, statistical decision theory. You may also want to get a graduate-level probability textbook; I recommend trying Shiryayev, Probability. Again, these books are general rather than applied, but you have to know this material if you are serious about statistical analysis.

On Mon, 14 Jan 2002, Chia C Chong wrote:

Vadim and Oxana Marmer [EMAIL PROTECTED] wrote:

On Sat, 12 Jan 2002 14:37:10 -, Chia C Chong [EMAIL PROTECTED] wrote:

Hi! I wish to get a book on probability and statistical inference, and I wish to get some advice first. Any good suggestions?

It depends on your background and your interests. If you can give more details, then you can get more helpful suggestions.

I am currently doing a PhD in Wireless Communications. My research area is to develop a statistical wireless channel model for 4th-generation systems. I would prefer a book that deals with a lot of practical examples, especially how to fit measurement data to theoretical distributions and perform goodness-of-fit tests of the fits. Thanks.

CCC
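Since the question was specifically about fitting measurement data to theoretical distributions and testing the fit, a minimal sketch in Python (scipy; the gamma family and the simulated data are illustrative assumptions, and note the caveat in the comments):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    data = rng.gamma(shape=2.0, scale=3.0, size=1000)  # stand-in measurements

    params = stats.gamma.fit(data)            # ML fit: (shape, loc, scale)
    D, p = stats.kstest(data, 'gamma', args=params)
    print("fitted params:", params)
    print("KS D = %.4f, p = %.3f" % (D, p))
    # Caveat: estimating the parameters and testing on the same data makes
    # this KS p-value optimistic; a parametric bootstrap gives an honest test.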
Re: Buy Book on Probability and statistical inference
On Sat, 12 Jan 2002 14:37:10 -, Chia C Chong [EMAIL PROTECTED] wrote:

Hi! I wish to get a book on probability and statistical inference, and I wish to get some advice first. Any good suggestions?

It depends on your background and your interests. If you can give more details, then you can get more helpful suggestions.
Re: Excel vs Quattro Pro
There are a lot of packages that are halfway between spreadsheets and formal programming languages: SAS, SPSS, Stata. Anything is better than spreadsheets.

On 8 Jan 2002, Kenmlin wrote:

I don't know the answer to this, but I have a general question with regard to using spreadsheets for stat analysis. Many students are computer illiterate and it might be easier to teach them how to use a spreadsheet than a formal programming language.
Re: When to Use t and When to Use z Revisited
Besides, who needs those tables? We have computers now, don't we? I was told that there were tables for logarithms once; I have never seen one in my life. Isn't this the same kind of thing?

3. Outdated, on the grounds that when sigma is unknown, the proper distribution is t (unless N is small and the parent population is screwy) regardless of how large the sample size may be.

The main (if not the only) reason for the apparent logical bifurcation at N = 30 or thereabouts was that, when one's only sources of information about critical values were printed tables, 30 lines was about what fit on one page (plus maybe a few extra lines for 40, 60, 120 d.f.), and one could not (or at any rate did not) expect one's business students to have convenient access to more extensive tables of the t distribution. And, one suspects latterly, authors were skeptical that students would pay attention to (or perhaps be able to master?) the technique of interpolating by reciprocals between 30 df and larger numbers of df (particularly including infinity). But currently, _I_ would not expect business students to carry out the calculations for hypothesis tests, or confidence intervals, by hand, except maybe half a dozen times in class for the good of their souls: I'd expect them to learn to invoke a statistical package, or else something like Excel that pretends to supply adequate statistical routines. And for all the packages I know of, there is a built-in function for calculating, or approximating, the cumulative distribution of t for ANY number of df. The advice in any _current_ business-statistics text ought to be, therefore, to use t _whenever_ sigma is not known. And if the textbook isn't up to that standard, the instructor jolly well should be.
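The built-in-function point, as a sketch in Python (scipy): critical values of t for any df, with the normal as the df-to-infinity limit.

    from scipy.stats import t, norm

    for df in (5, 30, 120, 1000):
        print("df=%5d  t_crit = %.4f" % (df, t.ppf(0.975, df)))
    print("normal   z_crit = %.4f" % norm.ppf(0.975))
    # df=30 gives about 2.042 versus z = 1.960 -- the "n > 30" rule was
    # about table pages, not about the two distributions coinciding.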
Re: When to Use t and When to Use z Revisited
3) When n is greater than 30 and we do not know sigma, we must estimate sigma using s, so we really should be using t rather than z.

You are wrong. You use the t distribution not because you don't know sigma, but because your statistic has the EXACT t distribution under certain conditions. I know the textbook says that if we knew sigma the distribution would be normal, but because we used s instead the distribution turned out to be t. It does not say how exactly it becomes t, so you draw the conclusion: use t instead of normal whenever you use s instead of sigma. But that's wrong; it does not go like this.

When you don't know the underlying distribution of the sample, you may use the normal distribution (under certain regularity conditions) as an APPROXIMATION to the actual distribution of your statistic. The approximate distribution in most cases is not parameter-free; it may depend, for example, on the unknown sigma. In such a situation you may replace the unknown parameter by a consistent estimator; the approximate distribution is still normal. Think about it as an iterated approximation: first you approximate the actual distribution by N(0, sigma^2), then you approximate that by N(0, S^2), where S^2 is a consistent estimator of sigma^2. There are formal theorems that allow you to do this kind of thing.

The essential difference between the two approaches is that the first tries to derive the EXACT distribution, while the second says: I will use an APPROXIMATION. The number 30 has no importance at all; throw away all the tables you have. I cannot believe they still teach you this stuff. I wish it were that simple: 30!

Your confusion is the result of the oversimplification and the desire to provide students with simple strategies found in basic statistics textbooks. I guess it makes teaching very simple, but it misleads students; your confusion is an example. The problem is that there are no simple strategies, and things are much, much more complicated than they appear in basic textbooks. Basic textbooks don't tell you the whole story, and they don't even try, because you simply cannot do that at their level. Don't draw any strong conclusions after reading only basic textbooks.

In practice, in business and economics statistics, nobody uses t-tests, but normal and chi-square approximations are used a lot. The assumptions that you have to make for the t-test are too strong.
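The distinction between the EXACT t distribution and the normal APPROXIMATION can be seen in a small simulation; a sketch in Python (numpy/scipy; the sample size and the normal data are illustrative):

    import numpy as np
    from scipy.stats import t, norm

    rng = np.random.default_rng(5)
    n, reps = 10, 20000
    cover_t = cover_z = 0
    for _ in range(reps):
        x = rng.normal(loc=0.0, size=n)        # normal data: t is EXACT here
        m, s = x.mean(), x.std(ddof=1)
        half_t = t.ppf(0.975, n - 1) * s / np.sqrt(n)
        half_z = norm.ppf(0.975) * s / np.sqrt(n)
        cover_t += abs(m) <= half_t
        cover_z += abs(m) <= half_z
    print("t coverage: %.3f, z coverage: %.3f" % (cover_t / reps, cover_z / reps))
    # t covers ~0.950 by construction; z under-covers (roughly 0.92) at n=10.
    # As n grows the two agree; with non-normal data neither is exact,
    # only asymptotically valid -- the approximation argument above.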
Re: When to Use t and When to Use z Revisited
Sigma is hardly ever known, so you must use t. Then why not simply tell the students: use the t table as far as it goes (usually around n = 120), and after that use the n = infinity line (which corresponds to the normal distribution)? Then there is no need for a rule for when to use z and when to use t.

But the data is not normal either, in 99.9(9)% of the cases. Furthermore, the data that you see in economics/business very often is not an iid sample either. So, one way or another, you end up with the normal or chi-square approximation.

Actually, there is an alternative to both approaches: the bootstrap. But it does not always work and should not be used blindly.
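The bootstrap alternative mentioned above, as a minimal sketch in Python (a percentile interval for a mean; the skewed sample and the numbers are illustrative, and, as said, it should not be used blindly):

    import numpy as np

    rng = np.random.default_rng(6)
    x = rng.lognormal(size=50)                 # skewed, non-normal iid sample

    B = 10000
    boot_means = np.array([rng.choice(x, size=x.size, replace=True).mean()
                           for _ in range(B)])
    lo, hi = np.percentile(boot_means, [2.5, 97.5])
    print("95%% bootstrap CI for the mean: (%.3f, %.3f)" % (lo, hi))
    # Valid for iid samples; it fails without modification for dependent
    # data, which is exactly the "not blindly" caveat above.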
Re: Web programmer for you...
Is it for you!? (I hope there are Seinfeld fans here.)

On 8 Dec 2001, Alexander wrote:

Hello, I am a professional web-programmer (php/perl/mySQL/javascript/HTML). I want to work with you... If you are interested in my help, please write me: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] Our work will be the next: You will send me an order, I'll do it and show it to you. You will pay me, only if you like my work... It's very easy. P.S.: My work is rather cheap... See you later... Alex
Re: When Can We Really Use CLT Student t
On 21 Nov 2001, Ronny Richardson wrote:

As I understand it, the Central Limit Theorem (CLT) guarantees that the distribution of sample means is normally distributed regardless of the distribution of the underlying data, as long as the sample size is large enough and the population standard deviation is known.

The CLT does not guarantee anything. It's just an approximation that sometimes works and sometimes does not. The underlying distribution does actually matter; or, more correctly, the data has to satisfy some regularity conditions for the CLT to apply. The population standard deviation does not need to be known.

It seems to me that most statistics books I see over-optimistically invoke the CLT not when n is over 30 and the population standard deviation is known, but anytime n is over 30. This seems inappropriate to me, or am I overlooking something?

Sometimes the CLT is a good approximation for small data sets, and sometimes it's not good even if n is very large. It all depends on the model, the data, and so on. Often an asymptotic argument and the CLT are your only choice.

When the population standard deviation is not known (which is almost all the time), it seems to me that the Student t distribution is more appropriate.

Not at all. Again, you do not need to know the standard deviation to apply the CLT; you can replace unknown parameters by their consistent estimators. I do not know which textbooks you are referring to, but I suggest you try something more advanced, like Estimation and Inference in Econometrics by Davidson and MacKinnon, or Econometric Theory by Davidson.
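How well the approximation works depends on the underlying distribution, not on an "n over 30" rule; a sketch in Python comparing z-interval coverage for two parent distributions at the same n (all settings illustrative):

    import numpy as np

    rng = np.random.default_rng(7)
    n, reps, z = 30, 20000, 1.959964

    for name, draw, true_mean in [
            ("uniform",   lambda: rng.uniform(size=n),              0.5),
            ("lognormal", lambda: rng.lognormal(sigma=2.0, size=n), np.exp(2.0))]:
        cover = 0
        for _ in range(reps):
            x = draw()
            half = z * x.std(ddof=1) / np.sqrt(n)
            cover += abs(x.mean() - true_mean) <= half
        print("%-9s n=%d coverage: %.3f" % (name, n, cover / reps))
    # Uniform data: close to the nominal 0.95 already at n=30.
    # Heavy-tailed lognormal: well below 0.95 at the same n -- the
    # regularity of the data matters, not the number 30.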
Re: regression ? cointegration ?
There is a new book by Davidson, Econometric Theory. His treatment of unit-root/cointegration econometrics is very clear and easy to follow.

On Sun, 28 Oct 2001, David B wrote:

The regression equation with iid errors implies cointegration of the two series.

Yes, but not vice versa. So the quoted passage may be referring to a more general case.

Yes, you are right; I unhappily figured it out just after having posted it. I tried to reread this book but it is really too harsh for me (and besides is not very well written, imho).

You could look at section 8.2, entitled "Ordinary least squares under more general conditions", of Time Series Analysis, by J. D. Hamilton. Section 8.3 might be of interest too.

I don't yet have this (well-known) book, but I note the reference; thanks for it.

David B
Re: regression ? cointegration ?
First of all, it's a question of the interpretation of what is being estimated.

1. Fixed regressors: implies that the data come from a well-controlled experiment. The regressors (X's) are control variables, and you estimate the response of the variable under investigation (Y) to changes in the control variables; i.e., you (or the person who collected the data) were able to change the values of the X's and observe the response of Y.

2. Random regressors. Several cases here.

(a) Cross-section data. Often you estimate the conditional expectation function, E(Y|X). With the assumption E(U|X)=0, the statistics of this case are not really different from the fixed-regressors case, since you can always condition on the observed data.

(b) The time-series case. You estimate a data generating process. In most cases you cannot condition on the observed data anymore (because then nothing random is left in the model). You do not have E(U|X)=0, but E(UX)=0, which is weaker and makes the statistics a little more complicated.

(c) Cointegration. You estimate a long-run (steady-state) relationship between variables. You do not have dependent and independent variables anymore, and you do not have E(UX')=0 anymore (actually, you do not need it; see Fully Modified Least Squares in Davidson's book). The statistics of this case are completely different.

In my opinion, you first have to decide which interpretation fits your data best. Do you have experimental data, or are you going to estimate a long-run relationship between the variables? After you have decided what you are going to estimate, you can choose the appropriate technique.

On Sun, 28 Oct 2001, David B wrote:

Vadim and Oxana Marmer [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]

If you regress log of aggregate consumption on log of GDP, would you like to treat log GDP as a fixed regressor? I guess not. Fixed regressors imply a lot of strong properties which are not reasonable to assume in this case.

What kind of properties, if I may ask (I personally never heard of such criteria)? That could be an element of an answer for me.

David B
Re: regression ? cointegration ?
In the case of Y and X being two independent random walks, the mean of (X'X)^(-1)X'Y can nevertheless be calculated using Wiener distribution theory, and it is not zero (it looks very bad). The t-stat for the slope is not zero either. The variances of both the slope estimator and the t-stat are much higher than standard theory predicts and, what is even worse, do not decrease as the sample size increases.

If they are independent random walks with mean 0, or even if E(Y|X)=0, the mean of this will have to be 0.

The problem is that you cannot test for this using standard regression diagnostics, because the t-statistic diverges to infinity as the sample size increases, so you have to adjust your methods.
Re: regression ? cointegration ?
The mean of (X'X)^(-1)X'Y can be calculated using Wiener distribution theory, however, and it is not zero (it looks very bad). The t-stat for the slope is not zero either. The variances of both the slope estimator and the t-stat are much higher than standard theory predicts and, what is even worse, do not decrease as the sample size increases.

If Y = a + b*X + iid noise, X and Y can't be independent random walks. If the noise is not independent, then you need to account for that when computing the standard error.

Radford Neal

y = a + b*x + ... is the equation that the researcher is trying to fit, but the true model is b=0.
Re: regression ? cointegration ?
You will easily be able to see that the residuals from this regression are not independent. So this isn't a counterexample to my claim that there is certainly nothing wrong with using standard regression when an explanatory variable is randomly generated, from whatever sort of stochastic process you please, as long as the regression residuals are independent.

You do not need independent residuals for regression.

If you account for this dependence in your test, I don't think you will reject the null hypothesis that b=0.

Yes you will, if you use standard regression diagnostics. Now the intuition. Consider two time series: 1) US GDP, 2) cumulative amount of rain in Brazil. You would think these series are independent, but try to regress 2 on 1 and you will get very significant coefficients.

The two time series may be independent, but if you fit a regression model, it will be obvious that the residuals are autocorrelated, and you need to adjust for this in doing your significance test.

A simple adjustment for autocorrelation won't help.
Re: regression ? cointegration ?
... You can treat regressors as non-stochastic if you have control over them. So, it seems to me that the only case where you can treat regressors as fixed is when your data come from some designed experiment. I do not know what your field of study is, but if it's social science then you have a problem. In social science most of the data are measurements of processes uncontrolled by the researcher and cannot be treated as fixed.

What do you mean by "cannot"? What is it that goes wrong? Are you saying that the model will not make good predictions for new data from the same source? If so, I think you are wrong. Or are you saying that you won't be able to make conclusions about causal influences? That might well be, but for that it's not really just a matter of fixed versus stochastic.

When I say that you cannot treat regressors as fixed, I mean the following. Suppose Y = consumption and X = GDP. Then E((X'X)^(-1)X'Y) is not equal to (X'X)^(-1)X'E(Y), since both X and Y are random variables, and you need a somewhat different treatment of the regression. So the mechanics of OLS change a little and, of course, the interpretation of the regression is different.
Re: regression ? cointegration ?
Neither is it in general. For consistency of the estimator, the inverse of X'X needs to converge to 0. But this is not generally a problem caused by using integrated processes.

And speaking about random regressors: if the regressors are not fixed, then you need almost sure convergence or convergence in probability of the inverse of X'X, which are more complicated concepts. I mean, you have to be careful when you say that your regressors are not random variables.
Re: regression ? cointegration ?
If you regress log of aggregate consumption on log of GDP, would you like to treat log GDP as a fixed regressor? I guess not. Fixed regressors imply a lot of strong properties which are not reasonable to assume in this case.

On Wed, 24 Oct 2001, David B wrote:

This seems to be a rather strong statement. You can treat regressors as non-stochastic if you have control over them. So, it seems to me that the only case where you can treat regressors as fixed is when your data come from some designed experiment.

That is precisely what I wanted opinions about. It seems to me it is a philosophy-of-probability problem (to be pompous), which is overlooked in basic econometrics/statistics textbooks (or even in more advanced ones, I would say). Why would one be obliged to carefully and systematically test for unit roots, since integrated processes do not really exist? Why couldn't we always treat the regressors as fixed, just keeping in mind that when they look like they are generated by an I(1) process, standard inference *could* be wrong? Of course, I am aware that the theory of cointegration is *very* important, and this simple question does question the importance it has taken.

David B
Re: regression ? cointegration ?
In any case, the original poster explicitly claimed that regression with an explanatory variable that was generated by a non-stationary process was invalid even if the residuals of the regression are independent. I claim that this is not true.

If both the dependent and independent variables are I(1) and the residuals are iid, then you have cointegration. The standard tools (Wald, likelihood ratio, and score tests) are invalid because the limiting distribution of the estimators is not normal. Anyway, to use these standard tools, some moment conditions on the variables appearing in the regression have to be satisfied. For example, sup E(Xt^2) has to be finite, which is not true if Xt is integrated.

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 15.51024    0.62466   24.83   <2e-16 ***
x            0.40863    0.01898   21.52   <2e-16 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

It speaks for itself: how often do you see a t-stat of 22? Actually, I would recommend that you repeat this experiment, for example, 100 times, and check how many times you cannot reject b=0.

Adjusting for autocorrelation, you will conclude that you effectively have about five data points' worth of information. I don't think you will reject the null hypothesis.

The only adjustment that is going to work here is to difference the data.

Why are you interested in E((X'X)^(-1)X'Y)? I think you may be trying to find standard errors by finding the unconditional variance of the estimators. You shouldn't do this, however. You should be finding the variance conditional on the observed X, since X in itself is not informative regarding the regression coefficients.

That's right: if you can condition on X and E(U|X)=0, then it's not very different from the fixed-regressors case. But sometimes you cannot condition on X (in time series models). Also, sometimes you cannot, or do not want to, assume that E(U|X)=0. So there are cases where you have to deal with unconditional moments.
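The suggested experiment ("repeat ... 100 times and check how many times you cannot reject b=0"), sketched in Python with statsmodels (sample size and seed are illustrative):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(8)
    n, reps = 200, 100
    rejections = 0
    for _ in range(reps):
        y = np.cumsum(rng.normal(size=n))      # two independent random walks
        x = np.cumsum(rng.normal(size=n))
        res = sm.OLS(y, sm.add_constant(x)).fit()
        rejections += abs(res.tvalues[1]) > 1.96
    print("rejected b=0 in %d of %d runs" % (rejections, reps))
    # Under standard theory this should be about 5 of 100; for independent
    # random walks it is typically well over half -- spurious regression.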
Re: regression ? cointegration ?
...since you can always consider the explanatory variable as non-stochastic...

This seems to be a rather strong statement. You can treat regressors as non-stochastic if you have control over them. So, it seems to me that the only case where you can treat regressors as fixed is when your data come from some designed experiment. I do not know what your field of study is, but if it's social science then you have a problem. In social science most of the data are measurements of processes uncontrolled by the researcher and cannot be treated as fixed.
Re: regression ? cointegration ?
It won't work in the following sense. Suppose that you run a regression of y on x, trying to estimate a relationship of the form y = a + bx + u. Further suppose that y(t) = y(t-1) + e1(t) and x(t) = x(t-1) + e2(t), so both processes are integrated. Further, suppose that e1 and e2 are independent, and thus there is no relationship between y and x. You have estimated your coefficient b and are trying to test that b=0. Now the main part: you will discover that the coefficient value is very small but the t-statistic is very large, implying that b is not zero. The problem with integrated regressors is that the t-statistic diverges to infinity as the sample size increases when y and x are independent. Further, in the case of integrated regressors and dependent variables, the asymptotic distribution of the coefficients is no longer normal. So "won't work" means that you cannot test the relationship between variables using standard tools (F tests) when you have integrated variables.

Now the intuition. Consider two time series: 1) US GDP, 2) cumulative amount of rain in Brazil. You would think these series are independent, but try to regress 2 on 1 and you will get very significant coefficients.

Now, what to read. You can try any modern textbook on time series. My recommendations: White, Asymptotic Theory for Econometricians, or Davidson, Econometric Theory.

On 23 Oct 2001, Radford Neal wrote:

In article 9r4ao0$l07$[EMAIL PROTECTED], David B [EMAIL PROTECTED] wrote:

There is certainly nothing wrong with using standard regression when an explanatory variable is randomly generated, from whatever sort of stochastic process you please, as long as the regression residuals are independent.

If the explanatory variable is generated by an integrated process, it won't work, even if the error term is an iid process.

This is what I am disputing. What basis do you have for claiming that it won't work? And in what sense do you mean that it won't work? I suspect that you've encountered a claim that is somewhat like this in some reference book, and have misinterpreted it.

Radford Neal

Radford M. Neal, Dept. of Statistics and Dept. of Computer Science, University of Toronto
http://www.cs.utoronto.ca/~radford
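The divergence of the t-statistic with sample size can be seen directly; a sketch in Python (one pair of independent random walks per n, exactly the y(t) = y(t-1) + e1(t), x(t) = x(t-1) + e2(t) setup above; all settings illustrative):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(9)
    for n in (100, 1000, 10000):
        e1, e2 = rng.normal(size=n), rng.normal(size=n)
        y, x = np.cumsum(e1), np.cumsum(e2)
        res = sm.OLS(y, sm.add_constant(x)).fit()
        print("n=%6d  |t-stat on b| = %8.2f" % (n, abs(res.tvalues[1])))
    # Any single run is noisy, but the |t-stat| tends to grow like sqrt(n)
    # even though b=0 is true, so standard tests reject ever more often.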
Re: Stochastic processes time series
Yale, UCSD.

On 13 Oct 2001, Cengiz wrote:

Which US graduate universities are considered to be the strongest in the area of stochastic processes and time series analysis? Thank you in advance.
Re: not significant
You need to check (maybe by simulations) whether your test has any power to reject the null. If the power is low, then get more subjects.

On 12 Sep 2001, sylvie perera wrote:

Hi, if a result is not significant, I realise this may be because it is due to chance. Is there a way of telling whether more subjects are needed or there actually is no difference between the groups? Thanks in advance, Sylvie.
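A sketch of checking power by simulation in Python (scipy; a two-sample t-test, and the assumed effect size, SD, and group size are the things you must replace with values plausible for your own study):

    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(10)
    n_per_group, effect, sd, reps = 20, 0.5, 1.0, 5000

    rejections = 0
    for _ in range(reps):
        a = rng.normal(0.0, sd, n_per_group)
        b = rng.normal(effect, sd, n_per_group)
        rejections += ttest_ind(a, b).pvalue < 0.05
    print("power ~ %.2f" % (rejections / reps))
    # Roughly 0.33 here: with 20 per group and a medium effect, a null
    # result says little. Increase n_per_group until power reaches,
    # say, 0.8 before concluding there is no difference.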
Re: simple linear regression
The assumption of fixed regressors (X) is usually the first to be relaxed. There is no sense in assuming fixed regressors unless your data come from a controlled experiment. The model and estimation methods may stay unchanged; only the interpretation of the model changes. Now you can speak about the conditional expectation of Y (if the regressors are fixed there is nothing to condition on; remember, a conditional expectation is still a random variable, even if you call it an expectation). Regarding normality, you can keep it or drop it; it has no relation to the randomness of the regressors. The same is true regarding independence of the residuals. If you are seriously interested in this topic, then get a good econometrics book. For example, A Course in Econometrics by Goldberger is a nice place to start.

Now to the second part. The prediction error, Y(future) - Y_hat, is a random variable; the problem is that the prediction error is an unobservable random variable, so you cannot treat it in the usual manner. Also, you have more than just a single r.v.; you have a series of such r.v.'s, depending on how far into the future you want to go (+1 period, +2, ... etc.). For each period there is an r.v. which has a distribution and, hopefully, finite mean and variance.

On 12 Sep 2001, James Ankeny wrote:

I have two questions regarding simple linear regression that I was hoping someone could help me with.

1) According to what I have learned so far, the levels of X are fixed, so that only Y is the random variable (the error is random as well). My question is, what if X is a random variable as well? It seems like this could be the case with some of my textbook examples. Does the simple model y = a + bx + e still hold? Are the assumptions the same, such as conditional distributions of Y are normal with the same variance, E(Y) is a straight-line function of X, and independence/normality of the error terms? Also, in repeated sampling the sample slope is normal because Y is normal. However, if X also varies from sample to sample, is the sample slope still normally distributed (sampling distribution)?

2) My second question regards the prediction interval. I can perform this on a computer, but it is difficult for me to conceptualize. If you are using Y-hat (the mean of the estimated regression function) to estimate a future response, does this mean that the difference, Y(future response) - Y-hat, is a statistic that has a sampling distribution, from which you can derive the standard error? It seems like this might be the case, but there is no parameter. I don't even know if what I just said makes any sense. I understand that my questions are long, and perhaps not in any logical order, but I would greatly appreciate any help with these conceptual matters. Thank you.
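On the second question, the prediction interval is computable directly; a sketch in Python with recent statsmodels (simulated data; the wider obs_ci interval is how the software handles the unobservable future error, by adding its variance to the estimation variance):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(11)
    x = rng.normal(size=100)
    y = 1.0 + 2.0 * x + rng.normal(size=100)

    res = sm.OLS(y, sm.add_constant(x)).fit()
    new_X = sm.add_constant(np.array([-1.0, 0.0, 1.0]))
    frame = res.get_prediction(new_X).summary_frame(alpha=0.05)
    print(frame[["mean", "mean_ci_lower", "mean_ci_upper",
                 "obs_ci_lower", "obs_ci_upper"]])
    # mean_ci_*: confidence interval for E(Y|X), estimation error only.
    # obs_ci_*:  prediction interval for a new Y; it adds the future
    # error's variance, i.e. Var(Y_future - Y_hat) = Var(Y_hat) + sigma^2.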
Re: SD is Useful to Normal Distribution Only ?
On 21 Aug 2001, Donald Burrill wrote:

On 21 Aug 2001, RFerreira wrote [edited]:

The formula for the standard deviation, SD = (sum((x - mean)^2)/(n-1))^0.5, can be applied to any data set. With that value we know two things about the set: mean and SD. With these two values we have one powerful intuitive use: the centre of the set is the mean, and 68% of the values are in the interval [mean-SD, mean+SD], IF the set has a normal distribution. If the set's distribution is NOT normal, what intuitive use do the values have?

How about the limiting distribution (CLT)?
Re: Multivariate time series analysis
Try Matlab. There are a number of different time-series packages available for it, and you can download a trial version for free from the Mathworks website. Then get the Econometrics Toolbox from http://www.spatial-econometrics.com/ and the Time-Series toolbox from http://www.physik3.gwdg.de/tstool/ Besides, Matlab is very popular for scientific programming; you can only benefit from learning it. It's much more flexible than black-box packages like SAS, Stata, SPSS. In my opinion SAS is great when you need to make complicated manipulations of your data, but for statistical analysis there are better alternatives. Finally, if you do not want to spend too much time learning a new program, and you want something very user-friendly, easy to learn, and simple to use, then you should get EViews.

On Thu, 17 May 2001, Alaa Ali wrote:

I am looking for good software that has multivariate time series capabilities: things such as multivariate ARMA, ARMAX, state-space models, transfer function modeling, etc. I have tried to use ASTSA, a freeware package, but was much less than satisfied. I am trying to use the statespace function within the SAS/ETS package and am not sure it will have what I want either (it only deals with square matrices). The problem, briefly, for those interested: given m time series of independent variables and n time series of dependent variables, I would like to predict the dependent variables at time step t+1 based on linear contributions from the m independent variables and n dependent variables over several lags in the past. Thanks, aa
Re: errors in journal articles
I think it's a normal situation. Journals have articles with errors; textbooks have errors. There is nothing that can be done, because it's only natural to make mistakes. You should feel good that you can see these things, but be ready for the day someone finds an error in your paper.

Vadim

On 27 Apr 2001, Lise DeShea wrote:

List Members: I teach statistics and experimental design at the University of Kentucky, and I occasionally give journal articles to my students with instructions to identify what kind of research was conducted, what the independent and dependent variables were, etc. For my advanced class, I ask them to identify anything that the researcher did incorrectly. As an example, there was an article in a recent issue of an APA journal where the researchers randomly assigned participants to one of six conditions in a 2x3 factorial design. The N wouldn't allow equal cell sizes, and the reported df exceeded N. Yet the article said the researchers ran a two-way fixed-effects ANOVA. One of my students wrote on her homework, "It is especially hard to know when you are doing something wrong when journals allow bad examples of research to be published on a regular basis." I'd like to hear what other list members think about this problem and whether there are solutions that would not alienate journal editors. (As a relatively new assistant professor, I can't do that or I'll never get published, I'll be denied tenure, and I'll have to go out on the street corners with a sign that says, "Will Analyze Data For Food.") Cheers, Lise.

~~~
Lise DeShea, Ph.D.
Assistant Professor, Educational and Counseling Psychology Department
University of Kentucky, 245 Dickey Hall, Lexington KY 40506
Email: [EMAIL PROTECTED]
Phone: (859) 257-9884
convergence of probability measures
Is anyone familiar with this book by P. Billingsley? There are two editions, and it seems they are different. What are the differences? Which edition should I get? I know that the first edition is going out of print, but it is still available at some on-line stores. Thanks.
Re: ARIMA forecasting using EViews
Without any relation to the type of your data (stock market): ARMA is a way to model data with no long-range dependence. Correlation among observations dies out really fast (at an exponential rate), so when you try to forecast out of sample, you realise very soon that the past data contain no information about the future, and the best possible predictor is the unconditional mean. So if you want a "non-flat" predictor, you should move to statistical models which allow for long-range dependence. An example is fractional ARIMA, or ARFIMA. Jan Beran, Statistics for Long-Memory Processes, is the best (and very expensive) book on that subject (and the only one?). You can also find some information (but not very much) in Hamilton, Time Series Analysis, and Gourieroux, Time Series and Dynamic Models.

In your case, it turns out that the expected price change is zero, and the best predictor of the future price is its value today. That sounds like the efficient market hypothesis, and if you believe in it you should not have been trying to forecast in the first place. If you do not believe in the efficient market hypothesis, and think that today's financial data contain some information about the future that can be extracted using statistical methods, you should use something more advanced than a simple ARIMA. I am sure that any possible correlation of that type has been exploited already. I am also sure that there is no good univariate statistical model for financial data, and if somebody has one, I am sure he would not tell anyone :) But you can try multivariate models, for example multifactor pricing models (see, for example, The Econometrics of Financial Markets by Campbell). If you can specify all the factors affecting prices, and if you have a good idea about those factors' future values, then you can make a good prediction. But again, it's very hard to come up with a predictor of future prices that is better than today's price; you know why :) It is much easier, though, to model and predict second moments, i.e. volatility. Check out GARCH models (Campbell's book again is one source of references).

Vadim

On 10 Apr 2001, Matt Kaar wrote:

I have a question that probably applies to ARIMA forecasting in general, but the specific piece of econometrics software I'm using is EViews. When I use an ARIMA(1,1,0) model to model ~150 pieces of stock market data and then use the EViews software to forecast the next 100 values, every forecast after about the sixth forecasted value is the same to around 10 significant figures. My question is: why is this happening? My professor said that ARIMA(1,1,0) should be able to forecast varying values way past the sixth value. Thanks, Matt

--
Matt Kaar, Georgia Tech, CS Major
Email: [EMAIL PROTECTED]
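The flattening-out the original poster saw can be reproduced in a few lines; a sketch in Python with statsmodels (a simulated geometric random walk stands in for the stock series, and the seed and sizes are illustrative):

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(12)
    prices = 100 * np.exp(np.cumsum(0.01 * rng.normal(size=150)))

    res = ARIMA(prices, order=(1, 1, 0)).fit()
    fc = res.forecast(steps=100)
    print(fc[:8])       # moves for the first few steps...
    print(fc[-3:])      # ...then is essentially constant
    # The AR(1) part of the differenced series decays at an exponential
    # rate, so the level forecast collapses to a flat line within a
    # handful of steps -- exactly the behaviour described above.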