On 15-Apr-08 12:28:53, Linn wrote:
> Hi
> Could anyone please explain to me the difference between
> the = and the ==?
> I'm quite new to R and I've tried to find out but didn't
> get any wiser...
>
> Thanks

While these are indeed documented in ?"=" and ?"==", as Gabor Csardi has pointed out, these particular help pages (especially ?"=") devote so much attention to deep issues in the implementation of R that they are unlikely to give much to a newcomer to R. (Though ?"==" is not too bad.)

Putting it simply: "==" is a comparison operator. If 'x' and 'y' are two variables of the same type, then "x==y" has value TRUE if 'x' and 'y' have the same value, and FALSE if they do not.

There are a couple of traps here, which even beginners should take care to be aware of.

One is that "NA" is not a value. Its logical status is, in effect, "value not known". Therefore, when 'y' is "NA", "x==y" cannot have a definite resolution, since it is possible for the unknown value of 'y' to be equal to the value of 'x', and equally possible that it is not. Hence the value of "x==y" is itself "NA". Similarly, the value of "x==y" is "NA" when both 'x' and 'y' are "NA". The function to use for testing whether (say) 'x' is "NA" is is.na(x).

The other is that the comparison of two floating-point numbers which (mathematically) should be equal may be FALSE, since their internal binary representations may be different. Floating-point arithmetic in fixed-precision computers is almost always approximate (though, in R, to a very close degree of approximation). Thus, for instance,

  x <- sqrt(2)
  x^2 == 2
  # [1] FALSE

and the reason for this is

  2 - sqrt(2)^2
  # [1] -4.440892e-16

But, as pointed out in ?"==", a better test for this kind of "equality" is the function all.equal():

  all.equal(x^2,2)
  # [1] TRUE

since all.equal(x,y) considers x and y to be "equal" if the numerical values corresponding to their representations do not differ by more than a certain "tolerance", which has a default value but can be changed by the user.

So much for "==".

Where "=" is concerned, it functions rather like an assignment, but with complications. All that incomprehensible stuff in ?"=" has to do with the complications.

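The two traps with "==", and one of the complications with "=", can be tried at the R prompt. What follows is only a minimal sketch: the function double_it() and the variable names are made up for illustration and appear nowhere else in this thread, and the single "=" case shown is just one of the complications, not the whole story.

```r
# Trap 1: a comparison involving NA yields NA, not TRUE or FALSE.
x <- 3
y <- NA
cmp_xy <- x == y        # NA
cmp_yy <- y == y        # NA, even though both sides are NA
y_missing <- is.na(y)   # TRUE: is.na() is the way to test for NA

# Trap 2: all.equal() returns TRUE or a character description of the
# difference, so wrap it in isTRUE() when a plain TRUE/FALSE is needed.
z <- sqrt(2)
near   <- isTRUE(all.equal(z^2, 2))                 # TRUE (default tolerance)
strict <- isTRUE(all.equal(z^2, 2, tolerance = 0))  # FALSE (no slack allowed)

# A complication with "=": inside a function call, "=" names an argument
# instead of assigning to a variable.
double_it <- function(v) 2 * v   # hypothetical helper, for illustration only

res <- double_it(u <- 5)   # "<-" assigns u, then passes its value; res is 10

# With "=", R looks for a parameter called u, finds none, and signals an
# error; u itself is left untouched.
msg <- tryCatch(double_it(u = 7), error = function(e) conditionMessage(e))
```

After running this, 'u' is still 5 and 'msg' holds the text of the error. The isTRUE() wrapper matters because, when the values differ by more than the tolerance, all.equal() returns a message rather than FALSE, which would trip up an if() test.
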
In R, use "<-" rather than "=" for assigning a value to a variable. Using "=" may often work, but sometimes it won't, for deeply tangled reasons! So write "x <- sqrt(2)" as above, rather than "x = sqrt(2)", though in this case the latter works as you would expect:

  y=sqrt(2)
  x==y
  # [1] TRUE

In programming in R, it is a useful rule of thumb to think "use something I know will work" rather than "use something which will work unless ..."; unless, of course, you know all about those "..."!

Where you will routinely use "=" is in naming elements of lists and dataframes, and in assigning values to named parameters in function calls. Thus, if you already have vectors X and Y, and you want to make a dataframe in which X plays the role of the "independent variable" in a subsequent regression and Y the role of the "dependent variable", then you could write

  MyData <- data.frame(Indep=X, Depend=Y)

and then, later, execute the linear modelling function lm() in the form:

  lm(Depend ~ Indep, data=MyData)

This executes lm() using what it finds in "MyData" with name "Depend" as the dependent variable, and what it finds in "MyData" with name "Indep" as the independent variable.

This lm() call, in turn, illustrates the other typical use of "=", namely assigning a value to a parameter in a function call: the lm() function has a parameter called "data", and "data=MyData" tells it which dataframe to use as the parameter "data" in this call of lm().

Not that you necessarily *have* to do it that way, of course, since often you may simply write

  lm(Y ~ X)

without reference to a dataframe, just referring to variables you happen to have around at the time. But the lm(...,data=...) form is useful in two kinds of context: one, where the data come to you as a dataframe in the first place, so that it saves you explicitly extracting the variables from the dataframe; the other, where the call (e.g., as above)

  lm(Depend ~ Indep, data=MyData)

is in some "generic" part of your code, and you do not want to change it. Then it makes sense to change the contents of "MyData" while keeping the names "Depend" and "Indep", so that whatever you actually put in as X and Y will be used in the same way.

Hoping this helps!
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 15-Apr-08                                       Time: 14:36:30
------------------------------ XFMail ------------------------------

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.