Re: Robust regression

2002-03-01 Thread Vadim and Oxana Marmer

If, for example, the normality assumption holds, then by doing robust
regression instead of OLS you lose efficiency. So it's not the same
result after all. But you can do both, compare, and decide: if robust
regression produces results that are not really different from the OLS
results, then stay with OLS.
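
A minimal sketch of that compare-both approach in Python (assuming numpy
and statsmodels are available; the Huber norm is just one illustrative
choice of robust fit):

import numpy as np
import statsmodels.api as sm

# Simulated data with normal errors and no outliers, where OLS is efficient.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()
rlm = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()   # robust regression

print("OLS   coefficients:", ols.params)
print("Huber coefficients:", rlm.params)
# If the two sets of estimates are close, there is little reason to
# prefer the robust fit over the (more efficient) OLS fit.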

On Fri, 1 Mar 2002, Rich Ulrich wrote:

 On 1 Mar 2002 00:36:01 -0800, [EMAIL PROTECTED] (Alex Yu)
 wrote:

 
  I know that robust regression can downweight outliers. Should someone
  apply robust regression when the data have skewed distributions but do not
  have outliers? Regression assumptions require normality of residuals, but
  not normality of the raw scores. So does it help at all to use robust
  regression in this situation? Any help will be appreciated.

 Go ahead and do it if you want.

 If someone asks (or even if they don't), you can tell
 them that robust regression gives exactly the same result.


 --
 Rich Ulrich, [EMAIL PROTECTED]
 http://www.pitt.edu/~wpilib/index.html







Re: How to test whether f(X,Y)=f(X)f(Y) is true??

2002-02-20 Thread Vadim and Oxana Marmer

You can start by checking whether they are correlated; that's simpler to
do. If you find that they are correlated, then you have the answer to your
question.
If you find that they are uncorrelated but you have reason to believe
that they may still be dependent, then you can look for more
advanced tests.
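
A minimal sketch of that first step in Python (assuming numpy and scipy;
the data here are made up to show that uncorrelated does not mean
independent):

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = x**2 + 0.1 * rng.normal(size=500)   # dependent on x, yet uncorrelated

r, p = stats.pearsonr(x, y)
print(f"Pearson r = {r:.3f}, p-value = {p:.3f}")
# A small p-value would rule out independence.  A large one does NOT
# establish it -- this pair is dependent by construction -- which is why
# more advanced tests may be needed after a non-rejection.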

On 20 Feb 2002, Linda wrote:

 Hi!

 I have some experimental data that can be grouped into two
 variables, X and Y. One is the dependent variable (Y) and the other is
 an independent variable (X). What test should I use to check whether
 they can be regarded as independent or not?

 Thanks..

 Linda







Re: garch residuals

2002-02-19 Thread Vadim and Oxana Marmer

Stock market returns usually satisfy the martingale-difference property, and
are uncorrelated. I think you should check your calculations again for
errors. Are you sure that you are working with returns and not prices? I
guess that by heavy correlation you mean that the estimated autoregressive
coefficient is close to 1, which holds for prices, not returns. Just a
suggestion; hope it helps.
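
A minimal sketch of the distinction in Python (assuming numpy; the price
series is a simulated random walk, not real data):

import numpy as np

rng = np.random.default_rng(2)
prices = 100 * np.exp(np.cumsum(0.01 * rng.normal(size=1000)))  # random walk
returns = np.diff(np.log(prices))

def lag1_autocorr(z):
    z = z - z.mean()
    return float(z[:-1] @ z[1:] / (z @ z))

print("lag-1 autocorrelation of prices :", lag1_autocorr(prices))  # near 1
print("lag-1 autocorrelation of returns:", lag1_autocorr(returns)) # near 0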




On Tue, 19 Feb 2002, Daan Taks wrote:

 I have a question about my residuals. When testing for autocorrelation
 I come to the conclusion that the models (GARCH, EGARCH, GJR a.k.a.
 TARCH) remove the correlation from the squared standardized residuals
 but not from the standardized residuals. Are my models misspecified?
 I use returns from the FTSE, the DAX, and the S&P. These returns are
 (heavily) correlated; should a GARCH model remove the correlation of
 the returns, or should it only remove the correlation of the squared
 returns?
 Thanks.









Re: Non Parametric Unit Root Test

2002-01-24 Thread Vadim and Oxana Marmer

Uniform distribution of what?

Unit-root testing theory uses asymptotic results, so the underlying
distribution does not really matter as long as it satisfies some conditions.

Check out Davidson's "Econometric Theory"; you can find a good intro
to unit-root tests there. A more advanced treatment is in Tanaka's
"Time Series Analysis".
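
A minimal sketch in Python of why the innovation distribution is not the
issue (assuming numpy and statsmodels; the augmented Dickey-Fuller test
stands in for whatever unit-root test you settle on):

import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(3)
# A unit-root process driven by UNIFORM (not normal) innovations:
x = np.cumsum(rng.uniform(-1.0, 1.0, size=500))

stat, pvalue, *_ = adfuller(x)
print(f"ADF statistic = {stat:.3f}, p-value = {pvalue:.3f}")
# The asymptotics behind the ADF critical values cover these uniform
# innovations too, under the usual regularity conditions.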

On 22 Jan 2002, Maand M wrote:

 Hi:

 I would like to know where I can read more about non-parametric
 unit-root tests for the uniform distribution.
 Any book or paper on it?

 Any comment is welcome.

 Maand










Re: Buy Book on Probability and statistical inference

2002-01-15 Thread Vadim and Oxana Marmer

Casella and Berger's "Statistical Inference" is a very popular graduate-
level textbook on the topic. It's not related to your field directly, but
it gives an introduction to the concepts used in statistics: likelihood,
sufficiency, completeness, statistical decision theory. You may also want
to get a graduate-level probability textbook; I recommend trying Shiryayev's
"Probability". Again, these books are not applied and are rather general,
but you have to know this material if you are serious about statistical
analysis.

On Mon, 14 Jan 2002, Chia C Chong wrote:


 Vadim and Oxana Marmer [EMAIL PROTECTED] wrote in message
 news:[EMAIL PROTECTED]...
   On Sat, 12 Jan 2002 14:37:10 -, Chia C Chong
   [EMAIL PROTECTED] wrote:
  
    Hi!
   
    I wish to get a book on probability and statistical inference. I wish
    to get some advice first. Any good suggestions?
  
  it depends on your background and your interests. If you can give more
  details about this, then you can get more helpful suggestions.
 

 I am currently doing a PhD in Wireless Communications. My research area is
 to develop a statistical wireless channel model for 4th-generation systems.
 I would prefer books that deal with a lot of practical examples, especially
 how to fit measurement data to theoretical distributions and perform
 goodness-of-fit tests on those fits.

 Thanks..

 CCC
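
A minimal sketch of the fit-then-test workflow CCC describes, in Python
(assuming numpy and scipy; the Rayleigh model and the simulated
"measurements" are purely illustrative):

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
data = rng.rayleigh(scale=2.0, size=400)   # stand-in for channel amplitudes

params = stats.rayleigh.fit(data)          # maximum-likelihood fit
stat, p = stats.kstest(data, "rayleigh", args=params)
print(f"KS statistic = {stat:.3f}, p-value = {p:.3f}")
# Caveat: using the same data to fit and to test biases the KS p-value;
# a parametric bootstrap gives a more honest goodness-of-fit test.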









Re: Buy Book on Probability and statistical inference

2002-01-13 Thread Vadim and Oxana Marmer

 On Sat, 12 Jan 2002 14:37:10 -, Chia C Chong
 [EMAIL PROTECTED] wrote:

  Hi!
 
  I wish to get a book on probability and statistical inference. I wish to
  get some advice first. Any good suggestions?


it depends on your background and your interests. If you can give more
details about this, then you can get more helpful suggestions.






Re: Excel vs Quattro Pro

2002-01-07 Thread Vadim and Oxana Marmer

There are a lot of packages that are halfway between spreadsheets and
formal programming languages: SAS, SPSS, Stata. Anything is better than
spreadsheets.


On 8 Jan 2002, Kenmlin wrote:

 i don't know the answer to this, but I have a general question with
 regard to using spreadsheets for stat analysis:

 Many students are computer illiterate and it might be easier to teach them
 how to use a spreadsheet than a formal programming language.









Re: When to Use t and When to Use z Revisited

2001-12-10 Thread Vadim and Oxana Marmer

Besides, who needs those tables? We have computers now, don't we?
I was told that there were tables for logarithms once; I have never seen
one in my life. Isn't this the same kind of thing?


   3.  Outdated.

 on the grounds that when sigma is unknown, the proper distribution is t
 (unless N is small and the parent population is screwy) regardless how
 large the sample size may be.  The main (if not the only) reason for the
 apparent logical bifurcation at N = 30 or thereabouts was that, when
 one's only sources of information about critical values were printed
 tables, 30 lines was about what fit on one page (plus maybe a few extra
 lines for 40, 60, 120 d.f.) and one could not (or at any rate did not)
 expect one's business students to have convenient access to more
 extensive tables of the t distribution.  And, one suspects latterly,
 authors were skeptical that students would pay attention to (or perhaps
 be able to master?) the technique of interpolating by reciprocals between
 30 df and larger numbers of df (particularly including infinity).

 But currently, _I_ would not expect business students to carry out the
 calculations for hypothesis tests, or confidence intervals, by hand,
 except maybe half a dozen times in class for the good of their souls:
 I'd expect them to learn to invoke a statistical package, or else
 something like Excel that pretends to supply adequate statistical
 routines.  And for all the packages I know of, there is a built-in
 function for calculating, or approximating, the cumulative distribution
 of t for ANY number of df.  The advice in any _current_ business-
 statistics text ought to be, therefore, to use t _whenever_ sigma is not
 known.  And if the textbook isn't up to that standard, the instructor
 jolly well should be.
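
A minimal sketch of the built-in function the quoted post mentions, in
Python (assuming scipy; any statistical package has an equivalent):

from scipy import stats

# Two-sided 95% critical values of t for ANY df, no printed table needed:
for df in (5, 30, 120, 10_000):
    print(f"df = {df:>6}: t_.975 = {stats.t.ppf(0.975, df):.4f}")
print(f"normal     : z_.975 = {stats.norm.ppf(0.975):.4f}")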







Re: When to Use t and When to Use z Revisited

2001-12-10 Thread Vadim and Oxana Marmer

 3) When n is greater than 30 and we do not know sigma, we must estimate
 sigma using s so we really should be using t rather than z.


You are wrong. You use the t-distribution not because you don't know sigma,
but because your statistic has an EXACT t-distribution under certain
conditions (an iid normal sample). I know that the textbook says: if we knew
sigma then the distribution would be normal, but because we used s instead,
the distribution turned out to be t. It does not say how exactly it becomes
t, so you draw the conclusion: use t instead of normal whenever you use s
instead of sigma. But that's wrong; it does not go like that.

When you don't know the underlying distribution of the sample, you may use
the normal distribution (under certain regularity conditions)
as an APPROXIMATION to the actual distribution of your statistic.
The approximate distribution in most cases is not parameter-free; it may
depend, for example, on the unknown sigma. In such a situation you may
replace the unknown parameter by a consistent estimator, and the approximate
distribution is still normal. Think about it as an iterated approximation:
first you approximate the actual distribution by N(0, sigma^2), then you
approximate that by N(0, s^2), where s^2 is a consistent estimator of
sigma^2. There are formal theorems (Slutsky-type results) that allow you to
do this kind of thing.

The essential difference between the two approaches is that the first one
tries to derive the EXACT distribution, while the second says: I will use an
APPROXIMATION.

The number 30 has no importance at all; throw away all the tables you have.
I cannot believe they still teach you this stuff. I wish it were that
simple: 30!

Your confusion is the result of the oversimplification, and the desire to
provide students with simple strategies, present in basic statistics
textbooks. I guess it makes teaching very simple, but it misleads students;
your confusion is an example. The problem is that there are no simple
strategies, and things are much, much more complicated than they appear in
basic textbooks. Basic textbooks don't tell you the whole story, and they
don't even try, because you simply cannot do that at their level. Don't draw
strong conclusions after reading only basic textbooks.

In practice, in business and economics statistics, nobody uses exact
t-tests, but normal and chi-square approximations are used a lot. The
assumptions that you have to make for the exact t-test are too strong.
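
A minimal sketch of the iterated approximation in Python (assuming numpy
and scipy; exponential data stand in for "underlying distribution
unknown"):

import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, reps = 50, 20_000
mu = 1.0                                   # true mean of the Exp(1) data

x = rng.exponential(scale=mu, size=(reps, n))
tstat = (x.mean(axis=1) - mu) / (x.std(axis=1, ddof=1) / np.sqrt(n))

# The statistic has no exact t-distribution here, but the normal
# approximation with estimated sigma still works:
z = stats.norm.ppf(0.975)
print("coverage of nominal 95% normal interval:",
      np.mean(np.abs(tstat) <= z))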










Re: When to Use t and When to Use z Revisited

2001-12-10 Thread Vadim and Oxana Marmer


 Sigma is hardly ever known, so you must use t. Then why not simply tell
 the students: use the t table as far as it goes, (usually around
 n=120), and after that, use the n=\infty line (which corresponds to the
 normal distribution). Then there is no need for a rule for when to use
 z, when to use t.


But the data is not normal either in 99.9(9)% of the cases. Furthermore,
the data that you see in economics/business is very often not an iid
sample either. So, one way or another, you end up with normal or
chi-square approximations.

Actually, there is an alternative to both approaches: the bootstrap. But
it does not always work and should not be used blindly.
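
A minimal sketch of the bootstrap alternative in Python (assuming numpy;
percentile intervals are just one of several bootstrap schemes):

import numpy as np

rng = np.random.default_rng(6)
x = rng.lognormal(size=80)                 # skewed sample, sigma unknown

# Resample the data 5000 times and take the middle 95% of the means:
boot = rng.choice(x, size=(5000, x.size), replace=True).mean(axis=1)
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"95% bootstrap CI for the mean: ({lo:.3f}, {hi:.3f})")
# As the post warns, this is not a cure-all: it fails, e.g., for
# infinite-variance data and for unit-root statistics.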






Re: Web programmer for you...

2001-12-08 Thread Vadim and Oxana Marmer

Is it for you!? (I hope there are Seinfeld fans here.)

On 8 Dec 2001, Alexander wrote:

 Hello, I am a professional web-programmer.
 (php/perl/mySQL/javascript/HTML).
 I want to work with you... If you are interested in my help, please
 write me:
 [EMAIL PROTECTED]
 [EMAIL PROTECTED]
 [EMAIL PROTECTED]

 Our work will be the next: You will send me an order, I'll do it and
 show it to you. You will pay me, only if you like my work...
 It's very easy.

 P.S.: My work is rather cheap...

 See you later...
 Alex







Re: When Can We Really Use CLT Student t

2001-11-21 Thread Vadim and Oxana Marmer

On 21 Nov 2001, Ronny Richardson wrote:

 As I understand it, the Central Limit Theorem (CLT) guarantees that the
 distribution of sample means is normally distributed regardless of the
 distribution of the underlying data as long as the sample size is large
 enough and the population standard deviation is known.

The CLT does not guarantee anything. It's just an approximation that
sometimes works and sometimes does not. The underlying distribution does
actually matter, or, more correctly, the data have to satisfy some
regularity conditions for the CLT to apply. The population standard
deviation does not need to be known.



 It seems to me that most statistics books I see over-optimistically invoke
 the CLT not when n is over 30 and the population standard deviation is
 known, but anytime n is over 30. This seems inappropriate to me, or am I
 overlooking something?

Sometimes the CLT is a good approximation for small data sets too, and
sometimes it's not good even if n is very large. It all depends on the
model, the data, and so on. Often an asymptotic argument and the CLT are
your only choice.



 When the population standard deviation is not known (which is almost all
 the time), it seems to me that the Student t distribution is more
 appropriate.

Not at all. Again, you do not need to know the standard deviation to apply
the CLT; you can replace unknown parameters by their consistent estimators.

I do not know which textbooks you are referring to, but I suggest you
try something more advanced, like "Estimation and Inference in
Econometrics" by Davidson and MacKinnon or "Econometric Theory" by
Davidson.
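
A minimal sketch of those regularity conditions mattering, in Python
(assuming numpy; Cauchy data violate the finite-variance condition):

import numpy as np

rng = np.random.default_rng(7)

def iqr(z):
    return float(np.subtract(*np.percentile(z, [75, 25])))

for n in (10, 100, 1000):
    exp_means = rng.exponential(size=(5000, n)).mean(axis=1)
    cau_means = rng.standard_cauchy(size=(5000, n)).mean(axis=1)
    print(f"n={n:5d}  IQR of exponential means={iqr(exp_means):.4f}  "
          f"IQR of Cauchy means={iqr(cau_means):.4f}")
# Exponential means concentrate at the 1/sqrt(n) rate the CLT predicts;
# Cauchy means do not concentrate at all, no matter how large n is.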






Re: regression ? cointegration ?

2001-10-28 Thread Vadim and Oxana Marmer


There is a new book by Davidson, "Econometric Theory". His treatment of
unit-root/cointegration econometrics is very clear and easy to follow.


On Sun, 28 Oct 2001, David B wrote:

  The regression equation with iid errors implies cointegration of the two
  series.
 
  Yes, but not vice versa.  So the quoted passage may be referring to a more
  general case.

 Yes, you are right. I unhappily figured it out just after having posted it.
 I tried to reread this book, but it is really too harsh for me (and besides
 it is not very well written imho).

  You could look at section 8.2, entitled Ordinary least squares under
  more general conditions, of Time Series Analysis, by J. D. Hamilton.
  Section 8.3 might be of interest too.

 I don't have this (well-known) book yet, but I note the reference; thanks
 for it.

 David B










Re: regression ? cointegration ?

2001-10-28 Thread Vadim and Oxana Marmer

First of all, the interpretation of what is being estimated.
1. Fixed regressors:
implies that the data come from a well-controlled experiment. The regressors
(X's) are control variables, and you estimate the response of the variable
under investigation (Y) to changes in the control variables; i.e., you (or
the person who collected the data) were able to change the values of the X's
and observe the response of Y.
2. Random regressors:
several cases here. (a) Cross-section data. Often you estimate the
Conditional Expectation Function, E(Y|X). Under the assumption E(U|X)=0, the
statistics of this case are not really different from the fixed-regressors
case, since you can always condition on the observed data. (b) Time-series
case. You estimate the Data Generating Process. In most cases you cannot
condition on the observed data anymore (because then nothing random is left
in the model). You do not have E(U|X)=0 but E(UX)=0, which is weaker and
makes the statistics a little more complicated. (c) Cointegration. You
estimate the long-run (steady-state) relationship between the variables. You
do not have dependent and independent variables anymore, and you do not have
E(UX')=0 anymore (actually, you do not need it; see Fully Modified Least
Squares in Davidson's book). The statistics of this case are completely
different.

In my opinion, you first have to decide which interpretation fits your
data best. Do you have experimental data, or are you going to estimate a
long-run relationship between the variables? After you have decided what
you are going to estimate, you can choose an appropriate technique.




On Sun, 28 Oct 2001, David B wrote:


 Vadim and Oxana Marmer [EMAIL PROTECTED] wrote in the message
 news: [EMAIL PROTECTED]
 
  if you regress the log of aggregate consumption on the log of GDP, would
  you like to treat log GDP as a fixed regressor? I guess not. A fixed
  regressor implies a lot of strong properties which are not reasonable to
  assume in this case.
 
 

 What kind of properties, if I may ask (I personally never heard of such
 criteria)? That could be an element of an answer for me.

 David B









Re: regression ? cointegration ?

2001-10-24 Thread Vadim and Oxana Marmer


 In the case of Y and X being two independent random walks, the mean of
 (X'X)^(-1)X'Y can be calculated using Wiener distribution theory, however,
 and it is not zero (it looks very bad). The t-stat for the slope is not
 zero either. The variances of both the slope estimator and the t-stat are
 much higher than standard theory forecasts and, what is even worse, do not
 decrease as the sample size increases.

 If they are independent random walks with mean 0, or even if
 E(Y|X)=0, the mean of this will have to be 0.


The problem is that you cannot test for this using standard regression
diagnostics, because the t-statistic diverges to infinity as the sample size
increases, so you have to adjust your methods.






Re: regression ? cointegration ?

2001-10-24 Thread Vadim and Oxana Marmer

 (X'X)^(-1)X'Y can be calculated using Wiener distribution theory, however,
 and it is not zero (it looks very bad). The t-stat for the slope is not
 zero either. The variances of both the slope estimator and the t-stat are
 much higher than standard theory forecasts and, what is even worse, do not
 decrease as the sample size increases.

 If Y = a + b*X + i.i.d. noise, X and Y can't be independent random walks.
 If the noise is not independent, then you need to account for that when
 computing the standard error.

   Radford Neal

y = a + bx + ... is the equation that the researcher is trying to fit, but
in the true model b = 0.






Re: regression ? cointegration ?

2001-10-24 Thread Vadim and Oxana Marmer


 You will easily be able to see that that residuals from this
 regression are not independent.  So this isn't a counterexample to my
 claim that There is certainly nothing wrong with using standard
 regression when an explanatory variable is randomly generated, from
 whatever sort of stochastic process you please, as long as the
 regression residuals are independent.


You do not need independent residuals for regression.



 If you account for this dependence in your test, I don't think you
 will reject the null hypothesis that b=0.



Yes, you will, if you use standard regression diagnostics.


 Now the intuition. Consider two time series: 1) US GDP,
 2) cummulative amount of rain in Brazil. You can think that these series
 are independent, but try to run 2 on 1 and you will have very
 significant coefficients.

 The two time series may be independent, but if you fit a regression
 model, it will be obvious that the residuals are autocorrelated, and
 you need to adjust for this in doing your significance test.


A simple adjustment for autocorrelation won't help.







Re: regression ? cointegration ?

2001-10-24 Thread Vadim and Oxana Marmer


  ... You can treat
 regressors as non-stochastic if you have control over it. So, it seems to
 me that the only case when you can treat regressors as fixed is when your
 data is coming from some designed experiment. I do not know what is your
 field of study, but if it's social science then you have a problem. In
 social science most of the data is measurement of uncontrolled (by
 researcher) processes and cannot be treated as fixed.

 What do you mean by cannot?  What is it that goes wrong?  Are you
 saying that the model will not make good predictions for new data from
 the same source?  If so, I think you are wrong.  Or are you saying
 that you won't be able to make conclusions about causal influences?
 That might well be, but for that, it's not really just a matter of
 fixed versus stochastic.


When I say that you cannot treat the regressors as fixed, I mean the
following. Suppose Y = consumption and X = GDP; then E(inv(X'X)X'Y) is not
equal to inv(X'X)X'E(Y), since both X and Y are random variables, and you
need a somewhat different treatment of the regression. So the mechanics of
OLS change a little, and, of course, the interpretation of the regression is
different.







Re: regression ? cointegration ?

2001-10-24 Thread Vadim and Oxana Marmer


 Neither is it in general.  For consistency of the estimator,
 the inverse of X'X needs to converge to 0.  But this is not
 generally a problem because of using integrated processes..

And speaking about random regressors: if the regressors are not fixed, then
you need almost-sure convergence or convergence in probability of the
inverse of X'X, which are more complicated concepts. I mean, you have to be
careful when you say that your regressors are not random variables.





Re: regression ? cointegration ?

2001-10-24 Thread Vadim and Oxana Marmer


If you regress the log of aggregate consumption on the log of GDP, would you
like to treat log GDP as a fixed regressor? I guess not. A fixed regressor
implies a lot of strong properties which are not reasonable to assume in
this case.



On Wed, 24 Oct 2001, David B wrote:


  this seems to be rather strong statement. You can treat
  regressors as non-stochastic if you have control over it. So, it seems to
  me that the only case when you can treat regressors as fixed is when your
  data is coming from some designed experiment.

 That is precisely what I wanted opinions about. It seems to me it is a
 philosophy-of-probability problem (to be pompous), which is overlooked in
 basic econometrics/statistics textbooks (or even more advanced ones, I
 would say).
 Why would one be obliged to carefully and systematically test for unit
 roots, since integrated processes do not really exist?
 Why couldn't we always treat the regressors as fixed, just keeping in mind
 that when they look like they are generated by an I(1) process, standard
 inference *could* be wrong?
 Of course, I am aware that the theory of cointegration is *very* important,
 and that this simple question does question the importance it has taken.

 David B









Re: regression ? cointegration ?

2001-10-24 Thread Vadim and Oxana Marmer


 In any case, the original poster explicitly claimed that regression
 with an explanatory variable that was generated by a non-stationary
 process was invalid even if the residuals of the regression are
 independent.  I claim that this is not true.


If both the dependent and independent variables are I(1) and the residuals
are iid, then you have cointegration. Standard tools (Wald, Likelihood
Ratio, and Score tests) are
invalid because the limiting distribution of the estimators is not normal.

Anyway, to use these standard tools, some moment conditions on the variables
appearing in the regression have to be satisfied. For example, sup E(Xt^2)
has to be finite, which is not true if Xt is integrated.



 Coefficients:
              Estimate Std. Error t value Pr(>|t|)
 (Intercept)  15.51024    0.62466   24.83   <2e-16 ***
 x             0.40863    0.01898   21.52   <2e-16 ***
 ---
 Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1


It speaks for itself: how often do you see a t-stat of 22? Actually, I would
recommend repeating this experiment, say, 100 times and checking
how many times you fail to reject b = 0.
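
A minimal sketch of that repeated experiment in Python (assuming numpy;
plain OLS with the usual nominal 5% t-test):

import numpy as np

rng = np.random.default_rng(8)
n, rejections = 500, 0
for _ in range(100):
    y = np.cumsum(rng.normal(size=n))          # two INDEPENDENT random walks
    x = np.cumsum(rng.normal(size=n))
    X = np.column_stack([np.ones(n), x])
    beta = np.linalg.solve(X.T @ X, X.T @ y)   # OLS fit of y = a + b*x
    resid = y - X @ beta
    s2 = resid @ resid / (n - 2)
    se_b = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    if abs(beta[1] / se_b) > 1.96:             # nominal 5% test of b = 0
        rejections += 1
print(f"spurious rejections of b = 0: {rejections}/100")  # far above 5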

 adjusting for autocorrelation you will conclude that you effectively
 have about five data points' worth of information.  I don't think you
 will reject the null hypothesis.


The only adjustment that is going to work here is to difference the data.




 Why are you interested in E(inv(X'X)X'Y)?  I think you may be trying
 to find standard errors by finding the unconditional variance of the
 estimators.  You shouldn't do this, however.  You should be finding
 the variance conditional on the observed X, since X in itself is not
 informative regarding the regression coefficients.

That's right: if you can condition on X and E(U|X)=0, then it's not very
different from the fixed-regressors case. But
sometimes you cannot condition on X (in time-series models). Also,
sometimes you cannot, or do not want to, assume that E(U|X)=0. So there are
cases when you have to deal with unconditional moments.








Re: regression ? cointegration ?

2001-10-23 Thread Vadim and Oxana Marmer

  since you
 can always consider the explanatory variable as non-stochastic


This seems to be a rather strong statement. You can treat
regressors as non-stochastic if you have control over them. So, it seems to
me that the only case when you can treat the regressors as fixed is when
your data come from some designed experiment. I do not know what your
field of study is, but if it's social science, then you have a problem. In
social science, most of the data are measurements of processes uncontrolled
by the researcher and cannot be treated as fixed.








Re: regression ? cointegration ?

2001-10-23 Thread Vadim and Oxana Marmer

It won't work in the following sense. Suppose that you run a regression of
y on x, trying to estimate a relationship of the form y = a + bx + u.
Further suppose that y(t) = y(t-1) + e1(t) and x(t) = x(t-1) + e2(t), so
both processes are integrated. Further, suppose that e1 and e2 are
independent, and thus there is no relationship between y and x. You have
estimated your coefficient b and are trying to test that b = 0. Now the main
part: you will discover that the coefficient value is very small,
but the t-statistic is very large, suggesting that b is
not zero. The problem with integrated regressors is that the t-statistic
diverges to infinity as the sample size increases even when y and x are
independent.
Furthermore, in the case of integrated regressors and dependent variables,
the asymptotic distribution of the coefficients is no longer normal.

So, "won't work" means that you cannot test the relationship between the
variables using standard tools (t and F tests) when you have integrated
variables.
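
A minimal sketch of the diverging t-statistic in Python (assuming numpy;
one long pair of independent random walks, refit on growing samples):

import numpy as np

rng = np.random.default_rng(9)
y = np.cumsum(rng.normal(size=10_000))     # independent I(1) processes
x = np.cumsum(rng.normal(size=10_000))

for n in (100, 1000, 10_000):
    X = np.column_stack([np.ones(n), x[:n]])
    beta = np.linalg.solve(X.T @ X, X.T @ y[:n])
    resid = y[:n] - X @ beta
    se = np.sqrt(resid @ resid / (n - 2) * np.linalg.inv(X.T @ X)[1, 1])
    print(f"n={n:6d}  t-stat on b = {beta[1] / se:9.2f}")
# |t| typically grows like sqrt(n) here instead of settling down, so the
# usual +/-1.96 cutoff rejects b = 0 almost surely in large samples.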

Now the intuition. Consider two time series: 1) US GDP, and
2) the cumulative amount of rain in Brazil. You would think that these
series are independent, but try regressing 2 on 1 and you will get very
significant coefficients.

Now, what to read: you can try any modern textbook on time series. My
recommendations are White, "Asymptotic Theory for Econometricians", or
Davidson, "Econometric Theory".


On 23 Oct 2001, Radford Neal wrote:

 In article 9r4ao0$l07$[EMAIL PROTECTED],
 David B [EMAIL PROTECTED] wrote:
  There is certainly nothing wrong with using standard regression when
  an explanatory variable is randomly generated, from whatever sort of
  stochastic process you please, as long as the regression residuals are
  independent
 
 If the explanatory variable is generated by an integrated process, it won't
 work, even if the error term is an iid process.

 This is what I am disputing.  What basis do you have for claiming that
 it won't work?  And in what sense do you mean that it won't work?

 I suspect that you've encountered a claim that is somewhat like this
 in some reference book, and have mis-interpreted it.

Radford Neal

 
 Radford M. Neal   [EMAIL PROTECTED]
 Dept. of Statistics and Dept. of Computer Science [EMAIL PROTECTED]
 University of Toronto http://www.cs.utoronto.ca/~radford
 







Re: Stochastic processes time series

2001-10-19 Thread Vadim and Oxana Marmer


Yale, UCSD

On 13 Oct 2001, Cengiz wrote:

 Which US graduate universities are considered to be the strongest in
 the area of stochastic processes and time series analysis? Thank you
 in advance.







Re: not significant

2001-09-12 Thread Vadim and Oxana Marmer

You need to check (maybe by simulation) whether your test has any power to
reject the null. If the power is low, then get more subjects.
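
A minimal sketch of such a power check in Python (assuming numpy and scipy;
the effect size and group sizes are placeholders for your own design):

import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
effect = 0.4                               # assumed difference, in SD units

for n in (20, 100):                        # per-group sample sizes to compare
    rejections = 0
    for _ in range(2000):
        a = rng.normal(0.0, 1.0, n)
        b = rng.normal(effect, 1.0, n)
        if stats.ttest_ind(a, b).pvalue < 0.05:
            rejections += 1
    print(f"n = {n:3d} per group: estimated power = {rejections / 2000:.2f}")
# If the power at your current n is low, a non-significant result says
# little, and more subjects are needed.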

On 12 Sep 2001, sylvie perera wrote:

 Hi,

 If a result is not significant, I realise this may be because it is due
 to chance.

 Is there a way of telling whether more subjects are needed or there
 actually is no difference between the groups?

 Thanks in advance
 Sylvie.

 







Re: simple linear regression

2001-09-12 Thread Vadim and Oxana Marmer

The assumption of fixed regressors (X) is usually the first to be
relaxed. There is no sense in assuming fixed regressors unless your data
come from a controlled experiment. The model and estimation methods may
stay unchanged; only the interpretation of the model changes. Now
you can speak about the conditional expectation of Y (if the regressors are
fixed, there is nothing to condition on; remember, a conditional expectation
is still a random variable, even if you call it an expectation). Regarding
normality, you can keep it or drop it; it has no relation to the
randomness of the regressors. The same is true regarding independence of the
residuals. If you are seriously interested in this topic, then get a good
econometrics book. For example, "A Course in Econometrics" by Goldberger is
a nice place to start.

Now to the second part. The prediction error, Y(future) - Y-hat, is a random
variable. The problem is that the prediction error is an unobservable random
variable, so you cannot treat it in the usual manner. Also, you have more
than just a single r.v.; you have a series of such r.v.'s, depending on how
far into the future you want to go (+1 period, +2, ... etc.). For each
period there is an r.v. which has a distribution and, one hopes, a finite
mean and variance.
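
A minimal sketch of the prediction interval this implies, in Python
(assuming numpy and statsmodels; the data and the two new X values are
made up):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
x = rng.uniform(0, 10, 60)                 # X random, but we condition on it
y = 2.0 + 0.5 * x + rng.normal(size=60)

fit = sm.OLS(y, sm.add_constant(x)).fit()
new = sm.add_constant(np.array([5.0, 12.0]), has_constant="add")
frame = fit.get_prediction(new).summary_frame(alpha=0.05)
print(frame[["mean", "obs_ci_lower", "obs_ci_upper"]])
# obs_ci_* is the interval for a NEW observation Y(future); it is wider
# than the confidence interval for the mean response, because it also
# carries the variance of the unobservable prediction error.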

On 12 Sep 2001, James Ankeny wrote:

   I have two questions regarding simple linear regression that I was hoping
 someone could help me with.

 1) According to what I have learned so far, the levels of X are fixed, so
 that only Y is a random variable (the error is random as well). My question
 is, what if X is a random variable as well? It seems like this could be the
 case with some of my textbook examples. Does the simple model y=a+bx+e
 still hold? Are the assumptions the same, such as: conditional
 distributions of Y are normal with the same variance, E(Y) is a
 straight-line function of X, and independence/normality of the error terms?
 Also, in repeated sampling the sample slope is normal because Y is normal.
 However, if X also varies from sample to sample, is the sample slope still
 normally distributed (sampling distribution)?

 2) My second question regards the prediction interval. I can perform this
 on a computer, but it is difficult for me to conceptualize. If you are
 using Y-hat (the mean of the estimated regression function) to estimate a
 future response, does this mean that the difference,
 (Y(future response) - Y-hat), is a statistic that has a sampling
 distribution, from which you can derive the standard error? It seems like
 this might be the case, but there is no parameter. I don't even know if
 what I just said makes any sense.

  I understand that my questions are long, and perhaps not in any logical
 order, but I would greatly appreciate any help with these conceptual
 matters.

  Thank you
















Re: SD is Useful to Normal Distribution Only ?

2001-08-29 Thread Vadim and Oxana Marmer

On 21 Aug 2001, Donald Burrill wrote:

 On 21 Aug 2001, RFerreira wrote [edited]:

  The formula [for] the standard deviation,
  SD = (sum((x - mean)^2)/(n-1))^0.5,
  can be applied to any data set.  [With] that value we know two things
  about the set:  mean and SD.  With these two values we can make one
  powerful intuitive use of them:  the centre of the set is the mean,
  and 68% of the values are in the interval [mean-SD, mean+SD], IF the set
  has a Normal distribution.  If the set's distribution is NOT Normal, what
  intuitive use do the values have?
How about the limiting distribution (CLT)?






Re: Multivariate time series analysis

2001-05-17 Thread Vadim and Oxana Marmer

Try Matlab. There are a number of different time-series packages available
for it. You can download a trial version for free from the Mathworks
website. Then get the Econometrics Toolbox from
http://www.spatial-econometrics.com/
and the Time-Series toolbox from http://www.physik3.gwdg.de/tstool/

Besides, Matlab is very popular for scientific programming; you can only
benefit from learning it. It's much more flexible than black-box
packages like SAS, Stata, or SPSS. In my opinion, SAS is great when you need
to make complicated manipulations of your data, but for statistical
analysis there are better alternatives.

Finally, if you do not want to spend too much time learning a new
program, and you want something very user-friendly, easy to learn, and
simple to use, then get EViews.

 On Thu, 17 May 2001, Alaa Ali wrote:

 I am looking for good software that has multivariate time series
 capabilities: things such as multivariate ARMA, ARMAX, state-space
 models, transfer function modeling, etc. I have tried to use ASTSA, a
 freeware package, but was much less than satisfied. I am trying to use the
 statespace function within the SAS/ETS package and am not sure it will
 have what I want either (it only deals with square matrices).

 The problem, briefly, for those interested:

 Given m time series of independent variables and n time series of
 dependent variables, I would like to predict the dependent variables at
 time step t+1 based on linear contributions from the m independent
 variables and n dependent variables over several lags in the past.


 Thanks
 aa







Re: errors in journal articles

2001-04-28 Thread Vadim and Oxana Marmer

I think it's a normal situation: journals have articles with errors, and
textbooks have errors. There is nothing that can be done, because it's only
natural to make mistakes. You should feel good that you can spot these
things, but be ready for the day somebody finds an error in your paper.
Vadim


On 27 Apr 2001, Lise DeShea wrote:

 List Members:

 I teach statistics and experimental design at the University of Kentucky,
 and I give  journal articles to my students occasionally with instructions
 to identify what kind of research was conducted, what the independent and
 dependent variables were, etc.  For my advanced class, I ask them to
 identify anything that the researcher did incorrectly.

 As an example, there was an article in a recent issue of an APA journal
 where the researchers randomly assigned participants to one of six
 conditions in a 2x3 factorial design.  The N wouldn't allow equal cell
 sizes, and the reported df exceeded N.  Yet the article said the
 researchers ran a two-way fixed-effects ANOVA.

 One of my students wrote on her homework, It is especially hard to know
 when you are doing something wrong when journals allow bad examples of
 research to be published on a regular basis.

 I'd like to hear what other list members think about this problem and
 whether there are solutions that would not alienate journal editors.  (As a
 relative new assistant professor, I can't do that or I'll never get
 published, I'll be denied tenure, and I'll have to go out on the street
 corners with a sign that says, Will Analyze Data For Food.)

 Cheers.
 Lise
 ~~~
 Lise DeShea, Ph.D.
 Assistant Professor
 Educational and Counseling Psychology Department
 University of Kentucky
 245 Dickey Hall
 Lexington KY 40506
 Email:  [EMAIL PROTECTED]
 Phone:  (859) 257-9884










convergence of probability measures

2001-04-12 Thread Vadim and Oxana Marmer


Is anyone familiar with the book "Convergence of Probability Measures" by
P. Billingsley? There are two editions of this book, and it seems they are
different. What are the differences? Which edition should I get? I know that
the first edition is going out of print, but it is still available at some
online stores.

Thanks







Re: ARIMA forecasting using EViews

2001-04-10 Thread Vadim and Oxana Marmer

Without any relation to the type of your data (stock market): ARMA is a
way to model data with no long-range dependence. Correlation among
observations dies out really fast (at an exponential rate), so when you
try to forecast out of sample, you realise very soon that the past data
contain no information about the future, and the best possible predictor is
the unconditional mean. So if you want a "non-flat" predictor, you should
move to statistical models which allow for long-range dependence. An example
is fractional ARIMA, or ARFIMA. Jan Beran's "Statistics for Long-Memory
Processes" is the best book (very expensive) on that subject (and the
only one?). You can also find some information (but not very much) in
Hamilton's "Time Series Analysis" and Gourieroux's "Time Series and Dynamic
Models".

In your case, it turns out that the expected price change (?) is zero, and
the best predictor of the future price is its value today. This sounds like
the efficient market hypothesis, and if you believe in it, you should not
have been trying to forecast prices in the first place.

If you do not believe in the efficient market hypothesis, and think that
today's financial data contain some information about the future that can
be extracted using statistical methods, you should use something more
advanced than a simple ARIMA. I am sure that any exploitable correlation of
that type has been exploited already. I am also sure that there is no good
univariate statistical model for financial data, and if somebody had one,
I am sure he would not tell anyone :)

But you can try multivariate models, for example Multifactor Pricing
Models (see, for example, "The Econometrics of Financial Markets" by
Campbell, Lo, and MacKinlay).
If you can specify all the factors affecting prices, and if you have a good
idea about those factors' future values, then you can make a good
prediction.

But again, it's very hard to come up with a predictor of future prices
that is better than today's price, and you know why :) But it is much
easier to model/predict second moments, or volatility. Check out GARCH
models (Campbell's book again is one source of references).
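
A minimal sketch of the flat-forecast behaviour in Python (assuming numpy
and statsmodels; an AR(1) with coefficient 0.5 stands in for the stationary
part of the fitted ARIMA(1,1,0)):

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(12)
x = np.zeros(300)
for t in range(1, 300):                    # simulate AR(1), phi = 0.5
    x[t] = 0.5 * x[t - 1] + rng.normal()

fit = ARIMA(x, order=(1, 0, 0)).fit()
print(fit.forecast(steps=10))
# The forecast decays geometrically toward the unconditional mean:
# 0.5**6 is about 0.016, so past step 6 or so all forecasts agree to
# several significant figures -- exactly the behaviour EViews shows.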

Vadim




On 10 Apr 2001, Matt Kaar wrote:

 I have a question that probably applies to ARIMA forecasting in general,
 but the specific piece of econometrics software I'm using is EViews.

 When I use an ARIMA(1,1,0) model to model ~150 pieces of stock market
 data and then use the EViews software to forecast the next 100 values,
 every forecast after about the sixth forecasted value is the same to
 around 10 significant figures.

 My question is: why is this happening? My professor said that ARIMA(1,1,0)
 should be able to forecast varying values way past the sixth value.

 Thanks,
 Matt

 --
 Matt Kaar
 Georgia Tech, CS Major
 Email: [EMAIL PROTECTED]



