RE: Factor analysis - which package is best for Windows?
MVA comes with base R. However, it is a separate library. Libraries that are not shipped with base are available as Windows binaries on CRAN, but you do not have to worry about that for MVA. Type: library() and you will get a list of the available packages. To make MVA available (i.e. load it), type: library(mva). Then you can ask for, for example: help(factanal)

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, September 05, 2001 5:42 PM
To: [EMAIL PROTECTED]
Subject: Re: Factor analysis - which package is best for Windows?

[EMAIL PROTECTED] (Magill, Brett) wrote in message news:[EMAIL PROTECTED]...
> Also check out R, a GNU implementation of the S language, most prominently known through its use in S-Plus. R is a fully featured statistical programming environment. In its MVA (Multivariate) package, it includes routines for factor analysis using maximum likelihood estimation with varimax and promax rotations.

I have installed R1.3.0 on my Windows system and have noted that MVA is an add-on. The FAQ tells how to obtain these add-ons but only for UNIX. Is this add-on actually available for Windows? If so, how do I obtain it?

Thanks, Peter

= Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
RE: Factor analysis - which package is best for Windows?
Also check out R, a GNU implementation of the S language, most prominently known through its use in S-Plus. R is a fully featured statistical programming environment. In its MVA (Multivariate) package, it includes routines for factor analysis using maximum likelihood estimation with varimax and promax rotations. R is open-source, which means that it is frequently updated and, most importantly, it can be downloaded free of charge. The only downside (to some) is that at this stage of its development R is completely command-prompt driven. However, I find the R language intuitive and easy to learn. http://www.r-project.org

-----Original Message-----
From: Aron Landy [mailto:[EMAIL PROTECTED]]
Sent: Thursday, August 30, 2001 6:33 AM
To: [EMAIL PROTECTED]
Subject: Re: Factor analysis - which package is best for Windows?

I have tried it and it is amazing. A bargain ;)

Richard Wright [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]...
> KyPlot runs under Windows, is freeware and gives you several factor analysis algorithms to choose from. http://www.rocketdownload.com/Details/Math/kyplot.htm
RE: book on elaboration and regression
Rosenberg, Morris. 1968. The Logic of Survey Analysis. New York: Basic Books.

An older book, but a nice treatment of the elaboration model using tables. It might be hard to find now, however. I think it is in the process of being updated by another author.

-----Original Message-----
From: John Hendrickx [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, May 29, 2001 10:12 AM
To: [EMAIL PROTECTED]
Subject: book on elaboration and regression

I'll be giving a course next fall on the basics of multivariate analysis and I'm looking for some texts. The students will be familiar with basic descriptive statistics, correlation and regression. I want to teach them how other variables can affect such bivariate relationships, e.g. spurious, interpreted, suppressor, interaction effects. That's basically it: how a third variable can affect an observed bivariate relationship and how this should be taken into account in the conceptual model and the analysis. This principle will be illustrated using crosstables, percentages and association measures on the one hand and regression models on the other.

Now comes the question: can anyone suggest some good texts on these subjects? I'm looking for a text on elaboration using tables on the one hand and on elementary linear regression on the other. The course is for students of International Management at the University of Nijmegen in the Netherlands. It'll be taught in English, and books related to business and economics are preferred.

Advance thanks for any help,
John Hendrickx
RE: Question
Don and Dennis,

Thanks for your comments. I have some points and further questions on the issue below.

For both Dennis and Don: I think the option of aggregating the information is a viable one. Yet I cannot help but think there is some way to do this that takes into account the variation within organizations. I mean, an organizational salary mean of .70 (70%) with a very tiny s.d. is different from a mean of .70 with a large s.d. There should be some way to account for this. In addition, the problems with aggregation are well documented, and I believe in general they suggest that aggregated results overestimate relationships.

Don: I suggested that the problem was not a traditional multilevel problem. Perhaps I am wrong, but here is where I thought the difference was. Typically, say in a classroom problem, I want to assess the effect of classroom characteristics (student/teacher ratio, teacher experience, etc.), which are constant within classrooms, on, say, student performance, which varies within classrooms across individuals. The difference between this and the problem I presented is that the OUTCOME is a contextual variable. That is, rather than showing individual-level variation, the outcome varies only at the organizational level. Perhaps this can be modeled with MLMs, but it is certainly different from the typical problem.

With regard to independence, I am talking about the independence of the X2's. That is, X2-1 is not independent of X2-2, and X2-4 is not independent of X2-5, because these cases come from the same organization. So if we simply regressed Y ~ X2, not accounting for X1 in the model, this would cause problems for ANOVA and regression, and for the GLM family more generally. The lack of independence here is exactly the reason for repeated measures and MLM more generally, no?
Perhaps I am making too much of the issue, but the data structure is one that I have not encountered before, and I found it an interesting and challenging problem. I am just hoping I might learn something along the way. I would appreciate any comments on my comments above.

Oh, and just so there is no confusion, I constructed the data below. It reflects the structure of the data and the nature of the relationship, but I generated this data set. In addition, the real thing does include variables such as tenure, previous experience, etc. that are also used as covariates at the individual level. Of course, this also means that these would need to be aggregated as well if that approach is taken.

Best

ID  X1  X2    Y
1   1   0.70  0.40
2   1   0.80  0.40
3   1   0.65  0.40
4   2   1.20  0.25
5   2   1.10  0.25
6   3   0.90  0.30
7   4   0.50  0.50
8   4   0.60  0.50
9   4   0.70  0.50
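Editor's sketch: the aggregation option discussed above can be made concrete with the constructed data from this message. The Python below is only an illustration, not the analysis anyone in the thread ran. It collapses X2 to one mean and one standard deviation per organization, so the within-organization spread stays visible, and then correlates the organization means with Y. The function names are hypothetical, and an organization with a single employee has no computable s.d., which is reported as None.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

# Constructed data from the message: (ID, X1 = organization,
# X2 = share of market salary, Y = organizational turnover rate).
rows = [
    (1, 1, 0.70, 0.40), (2, 1, 0.80, 0.40), (3, 1, 0.65, 0.40),
    (4, 2, 1.20, 0.25), (5, 2, 1.10, 0.25),
    (6, 3, 0.90, 0.30),
    (7, 4, 0.50, 0.50), (8, 4, 0.60, 0.50), (9, 4, 0.70, 0.50),
]


def aggregate_by_org(rows):
    """Return {org: (n, mean of X2, s.d. of X2 or None, Y)}."""
    salaries = defaultdict(list)
    turnover = {}
    for _id, org, x2, y in rows:
        salaries[org].append(x2)
        turnover[org] = y  # Y is constant within an organization
    summary = {}
    for org, values in sorted(salaries.items()):
        # The sample s.d. needs at least two observations.
        sd = stdev(values) if len(values) > 1 else None
        summary[org] = (len(values), mean(values), sd, turnover[org])
    return summary


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


summary = aggregate_by_org(rows)
org_means = [stats[1] for stats in summary.values()]
org_turnover = [stats[3] for stats in summary.values()]
r_org = pearson_r(org_means, org_turnover)

for org, (n, m, sd, y) in summary.items():
    print(org, n, round(m, 3), None if sd is None else round(sd, 3), y)
print("organization-level r:", round(r_org, 3))
```

With only four organizations the aggregated correlation rests on four points, which is one way to see the concern raised above that aggregated results can overstate a relationship.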
Question
A colleague has a data set with a structure like the one below:

ID  X1  X2    Y
1   1   0.70  0.40
2   1   0.80  0.40
3   1   0.65  0.40
4   2   1.20  0.25
5   2   1.10  0.25
6   3   0.90  0.30
7   4   0.50  0.50
8   4   0.60  0.50
9   4   0.70  0.50

X1 is the organization. X2 is the percent of market salary an employee within the organization is paid, i.e. ID 1 makes 70% of the market salary for their position and the local economy. Y is the annual overall turnover rate in the organization, so it is constant across individuals within the organization. There are different numbers of employee salaries measured within each organization. The goal is to assess the relationship between employee salary (as percent of market salary for their position and location) and overall organizational turnover rates.

How should these data be analyzed? The difficulty is that the data are cross-level, but this is not the traditional multi-level model. That there is no variance across individuals within an organization on the outcome is problematic. Of course, so is aggregating the individual results. How can this be modeled while preserving the fact that there is variance both within organizations and between organizations? I suggested that this was a repeated measures problem, with repeated measurements within the organization; my colleague argued it was not.

Can this be modeled appropriately with traditional regression models at the individual level? That is, ignoring X1 and regressing Y ~ X2. It seems to me that this violates the assumption of independence. Certainly, the percent of market salary that an employee is paid is correlated between employees within an organization (taking into account things like tenure, previous experience, etc.).

Thanks
Statistical Notation
Does anyone know of a resource that lists symbols often used in statistics and probability? What I am looking for is something with the symbol, its name, and some common uses. In particular, I would like web sources, but I would be grateful for any suggestions.

Best, Brett
FW: Regression with repeated measures
These both sound to me as if multi-level models would be appropriate to handle the type of data to which you are referring. Look at this site for some basic info on multi-level models (MLM): http://www.ioe.ac.uk/multilevel/

Interested in learning more? Then download this classic text on MLM for free: http://www.arnoldpublishers.com/support/goldstein.htm

Finally, if you decide this method is what you are looking for, then have a look at the following text that describes linear MLM or, as they call it, Hierarchical Linear Models (HLM)--the multilevel equivalent of linear regression:

Bryk, A.S., and Raudenbush, S.W. (1992). Hierarchical Linear Models. Newbury Park: Sage.

-----Original Message-----
From: Rich Strauss [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, February 28, 2001 2:40 PM
To: [EMAIL PROTECTED]
Subject: Re: Regression with repeated measures

I don't have an answer, but I'm very glad this question was asked because I'm having a similar problem. I have 14 grids, values from which are to be used as the dependent variable in a regression. Each 6x6 grid consists of 36 observation points. There are some fairly strong spatial correlations among the values at each grid, so I certainly can't treat them as if they were independent, yet reducing each grid to a single mean value (the other extreme) seems like a foolish waste of power. I'm trying to figure out how to use all of the observations, but also use the estimated spatial autocorrelations to weight them in the regression. (The design was originally created to answer a very different question, which is how I got into this mess.) I hope that there's a single answer to both of our questions.

Rich Strauss

At 10:54 AM 2/28/01 -0600, Michael M. Granaas wrote:
> I have a student coming in later to talk about a regression problem. Based on what he's told me so far he is going to be using inter-response intervals to predict inter-stimulus intervals (or vice versa).
> What bothers me is that he will be collecting data from multiple trials for each subject and then treating the trials as independent replicates. That is, assuming 10 trials/S and 10 S he will act as if he has 100 independent data points for calculating a bivariate regression. Obviously these are not independent data points. Is the non-independence likely to be severe enough to warrant concern? If yes, is there some method that will allow him to get the prediction equation he wants?
>
> Thanks
> Michael

Dr Richard E Strauss
Biological Sciences
Texas Tech University
Lubbock TX 79409-3131
Email: [EMAIL PROTECTED] (formerly [EMAIL PROTECTED])
Phone: 806-742-2719
Fax: 806-742-2963
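Editor's sketch: one rough way to gauge whether the non-independence in the 10 trials x 10 subjects design matters is the standard design-effect approximation, deff = 1 + (m - 1) * ICC, where m is the number of trials per subject and ICC is the intraclass correlation. This is a swapped-in back-of-the-envelope device, not the multilevel model recommended in the reply, and the ICC of 0.3 used below is purely hypothetical.

```python
def design_effect(cluster_size, icc):
    """Variance inflation from clustering: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n_total, cluster_size, icc):
    """Number of independent observations the clustered data are worth."""
    return n_total / design_effect(cluster_size, icc)


# 10 subjects x 10 trials = 100 observations; ICC = 0.3 is an assumption.
n_eff = effective_sample_size(n_total=100, cluster_size=10, icc=0.3)
print(round(n_eff, 1))
```

Under that assumed ICC, the 100 trials carry roughly the information of 27 independent observations, which is why treating them as 100 replicates understates the standard errors.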
RE: Sample size question
G*Power is a power analysis package that is freely available. You can download it at: http://www.psychologie.uni-trier.de:8000/projects/gpower.html

You can calculate a sample size for a given effect size, alpha level, and power value.

-----Original Message-----
From: Scheltema, Karen [mailto:[EMAIL PROTECTED]]
Sent: Friday, February 23, 2001 10:07 AM
To: [EMAIL PROTECTED]
Subject: Sample size question

Can anyone point me to software for estimating ANCOVA or regression sample sizes based on effect size?

Karen Scheltema
Statistician
HealthEast Research and Education
1700 University Ave W
St. Paul, MN 55104
(651) 232-5212
fax (651) 641-0683
[EMAIL PROTECTED]
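Editor's sketch: to show what "a sample size for a given effect size, alpha level, and power value" involves, the Python below uses the textbook normal approximation for comparing two group means, n per group = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2, with d a standardized effect size. This is not G*Power's algorithm and does not cover ANCOVA or regression designs. The function name and default values are assumptions made for illustration.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample comparison
    of means, using the normal (z) approximation."""
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


for d in (0.2, 0.5, 0.8):
    print(d, n_per_group(d))
```

Exact calculations based on the t distribution give slightly larger numbers, so treat the approximation as a lower bound when planning.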
RE: NY Times on statisticians' view of election
It has created controversy, as witnessed by the replies it has generated; therefore it is controversial.

I am not sure why the results that were presented need to be terribly controversial. Democratic supporters tend to be minority, older, poorer, and less educated than their Republican counterparts. I would suggest (perhaps revealing my bias) that this is because the Democratic party has done more to protect the interests of disenfranchised groups. On the other hand, the Republican party leans toward favoring the wealthy and business interests, who are normatively better educated, white, and, by definition, of higher economic status. Even if these broad generalizations do not hold generally, it is certainly true that the stage has been set in this manner during this presidential campaign. If this influenced voters, it makes sense that we would find these demographic differences in the presidential vote.

It would be interesting to see the means presented previously with these demographic characteristics controlled. I cannot imagine that there would be differences. I do not believe anyone truly believes that party affiliation somehow affects literacy. On the other hand, other characteristics associated with literacy (education, economic status, age, race) tend to influence party selection. Thus, I suggest that literacy problems that manifest in ballots inherently favor Republicans and in doing so wholly disenfranchise a large number of already disadvantaged voters (my partisan statement).
RE: Polls: Errors on Prime Time - NOT AN ERROR
I believe the issue is that the questionable balloting method was used in predominantly Democratic districts and therefore disproportionately affected Democratic voters, i.e. Gore supporters. Furthermore, some have argued that they did in fact ask for a new ballot, which was denied.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Monday, November 13, 2000 11:58 AM
To: [EMAIL PROTECTED]
Subject: Re: Polls: Errors on Prime Time - NOT AN ERROR

Are you saying that only Gore supporters could not figure out the ballot? Plus, only Gore voters were too timid to ask for assistance or for a new ballot? :-)) Could it be that they are complaining ex post facto when confronted with an unpopular result? :-) Apparently, upon leaving the polling place, they at first must have "misled" the exit pollsters! Only later, after the polls closed, they remembered their "error." Hm.

On Mon, 13 Nov 2000 08:42:36 -0500, SSCHEINE [EMAIL PROTECTED] wrote:
> Actually, the exit polls got it right!!! Remember, a lot of people left the polling booth thinking that they had voted for Gore, when they had actually messed up their ballot. Based on who they thought that they had voted for, they informed the exit pollsters, who called it for Gore.
>
> Sam
>
> Samuel M. Scheiner
> Div. Envir. Biol. (Rm 635)
> National Science Foundation
> 4201 Wilson Blvd.
> Arlington, VA 22230
> Tel: 703-292-7189
> Fax: 703-292-9065
> Email: [EMAIL PROTECTED]
RE: 120 subjects on 120 occassion: a model ?
I don't really know enough about time series to provide much advice. However, I have seen methods in which a slope was calculated across time for each subject, with the first measurement as the intercept (within subjects). Subsequently, the individual slopes were regressed on other factors. This answers the question of which factors (X) influence the rate and direction of change in Y across time.

-----Original Message-----
From: MJ Ray [mailto:[EMAIL PROTECTED]]
Sent: Friday, October 13, 2000 8:15 AM
To: [EMAIL PROTECTED]
Subject: Re: 120 subjects on 120 occassion: a model ?

"Gaj Vidmar" [EMAIL PROTECTED] writes:
> there seems to be no word from professional statisticians yet, so here's an addendum.

This message was posted in many places, so presumably we will get a summary of responses if we care? My own suggestion (mangled by a bad emailer) was to use vector time series methods, but this could lead to a fairly large computation problem without extra information. I wasn't able to recommend a very good specialist reference off the top of my head, though.

MJR
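Editor's sketch: the two-stage "slopes as outcomes" approach described in the reply can be outlined as follows. Stage 1 fits an ordinary least-squares line to each subject's series, with time coded so that the first measurement is t = 0. Stage 2 regresses the resulting slopes on a subject-level factor X. The data are invented and noise-free so the arithmetic is easy to follow, the function names are hypothetical, and a real analysis would also need to carry the uncertainty of the stage-1 slopes into stage 2.

```python
from statistics import mean


def ols_fit(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


# Invented data: each subject has a covariate X and a series of Y over time.
times = [0, 1, 2, 3, 4]          # first measurement coded as t = 0
covariate = [0.0, 1.0, 2.0, 3.0]  # one X value per subject
series = [[10 + (1 + 0.5 * x) * t for t in times] for x in covariate]

# Stage 1: one within-subject slope (rate of change in Y) per subject.
subject_slopes = [ols_fit(times, ys)[1] for ys in series]

# Stage 2: regress the subject slopes on the subject-level factor X.
stage2_intercept, stage2_slope = ols_fit(covariate, subject_slopes)

print(subject_slopes)
print(stage2_intercept, stage2_slope)
```

The stage-2 slope answers the question posed in the reply: how much the rate of change in Y shifts per unit of X.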
Nested Models and HLM
Is there a difference between a Nested Model in general and what is referred to as a hierarchical linear model? Thanks, Brett Magill
Regression and Correlation (Was Correlation)
I am no statistician, so let me make sure I am understanding what you are saying. Your point is that you may have an identical regression equation despite the fact that the correlation may vary depending on the amount of variation in X. If this is your point, I agree and recognize this--r is a measure of the fit about the regression line. Nonetheless, regression and correlation are the same in the bivariate case with the exception of scale. In a bivariate regression, the standardized Beta coefficient is equal to the Pearson r. As with any standardization, it removes the scale of the variation, and the result is that the slope describes the relationship, or B = r.

Brett

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Friday, May 19, 2000 11:43 AM
To: [EMAIL PROTECTED]
Subject: Re: Correlation

Magill, Brett [EMAIL PROTECTED] wrote:
> Mike, In the bivariate case, regression and correlation are identical.

This is false. Correlation is the measure of the proportion of the variance of one variable explained by a linear function of the other in a joint distribution, while linear regression is the linear relation itself. One can have non-linear versions as well. If in fact E(Y|X) = aX + b, this will also be the case no matter how selection is made on X, whereas the correlation can vary greatly.

=== This list is open to everyone. Occasionally, less thoughtful people send inappropriate messages. Please DO NOT COMPLAIN TO THE POSTMASTER about these messages because the postmaster has no way of controlling them, and excessive complaints will result in termination of the list. For information about this list, including information about the problem of inappropriate messages and information about how to unsubscribe, please see the web page at http://jse.stat.ncsu.edu/ ===
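Editor's sketch: the claim in the reply above, that in a bivariate regression the standardized slope equals Pearson's r, can be checked numerically. The Python below uses a small invented data set and hypothetical function names. It also checks the companion identity b = r * (s_y / s_x), which shows that the raw slope and r differ only by the ratio of the two standard deviations.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ols_slope(xs, ys):
    """Least-squares slope of y regressed on x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def standardize(values):
    """Convert to z-scores using the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Invented illustrative data.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.3, 6.9]

r = pearson_r(x, y)
b_raw = ols_slope(x, y)
b_std = ols_slope(standardize(x), standardize(y))

print("r =", r)
print("raw slope =", b_raw)
print("standardized slope =", b_std)
```

Because the raw slope carries the factor s_y / s_x, selecting on X changes s_x and hence r while leaving the regression line itself unchanged, which is consistent with the point made in the quoted reply.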
RE: Correlation
Mike,

In the bivariate case, regression and correlation are identical. Assuming you want to select one of your proxy measures to use in place of the expensive "true" measure, run the regression models--the "true" measure regressed on each of the "different techniques". The r's that you will get can be interpreted the same as the correlation coefficient you would calculate. Of course, r^2 is the coefficient of determination--the amount of total variance in the dependent variable attributable to variation in the independent variable. Compare these across your proxy measures of the "true" score and pick the best one. You also then have your prediction model. Model fit and the like can be assessed in the typical manner for regression.

-----Original Message-----
From: mbattagl [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, May 17, 2000 12:02 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Correlation

I have data that measures light intensity with a number of different techniques. One of the measurements (a direct measurement and "true" measurement of light intensity) involves lots of time, labor, and expense. The other techniques are more practical in the sense of time and labor, but are indirect measurements (based on canopy structure: density, location of holes in the canopy, etc.). My goal is to determine if the indirect measurements are valid estimates of the direct measurements. However, I would also like to predict light intensity based on the indirect methods.

I can see two methods of analysis for this situation: correlation and regression. It seems that correlation would be the best option to validate the measurements against each other. If the measurements are correlated, then the use of regression analysis would yield a prediction equation. For the correlation analysis, I can use Pearson or Spearman analysis. To use Pearson, the variables should be normally distributed. However, I have read that the distribution for correlation should be bivariate normal.
I understand how to test for normality with a univariate normal distribution but have no idea how to test for a bivariate normal distribution. I am using the SAS program to do my analysis. Does anyone know how to test for a bivariate normal distribution? If the variables are bivariate normally distributed then I use Pearson, but if they are not normally distributed I use Spearman. Is this correct?

The regression analysis is also somewhat confusing. Regression analysis is based on the assumption that Y (the dependent variable) is random and X (the independent variable) is fixed with no error. In my case, both X and Y are random and have some measurement error. Is it correct to use simple linear regression for this analysis, or is there another type of analysis to obtain predictions?

I apologize for such a long post, but I have been struggling with this analysis for some time, and the more information I obtain from statistics books, the more confused I get.

Thanks in advance,
Mike
RE: Blackjack problem
Clip from earlier message...

"The Player may choose to play exactly the same rules as the Dealer is REQUIRED to play; or the Player may choose some of the other options. Since the Player has more choices or options in play than does the Dealer, why does the Dealer have the statistical advantage? It seems to me the Player would have the advantage."

Doesn't the law of large numbers figure in here somewhere too:

1. The probability of winning with the house strategy is known a priori and it is optimal (as someone else pointed out).
2. An individual playing with this same strategy may win or lose more or less in the short run.
3. With the volume of games the house plays, the empirical probability will approach the a priori probability in the long run--to the house's advantage.

Simplistic and poorly articulated I am sure, but I think it captures the essence of the mechanism at work here.
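Editor's sketch: the three points above can be illustrated with a small simulation. The per-hand house win probability of 0.51 is a made-up number chosen only for illustration (it is not a real blackjack figure), ties are ignored, and the function name is hypothetical. The simulation shows the empirical win rate wandering over a short run of hands and settling near the a priori probability over a long run.

```python
import random


def empirical_win_rate(p_house, n_hands, seed=0):
    """Fraction of n_hands won by the house when each hand is an
    independent win with probability p_house."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    wins = sum(1 for _ in range(n_hands) if rng.random() < p_house)
    return wins / n_hands


P_HOUSE = 0.51  # hypothetical a priori probability, not a real casino edge

for n in (20, 200, 2_000, 200_000):
    print(n, empirical_win_rate(P_HOUSE, n))
```

An individual player sees something like the first line or two; the house, with its volume of games, sees the last.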
RE: Power for Pilot Studies
Seems to me that the notion of power in a pilot study is moot. Typically, a pilot study is a test of the research methodology and instruments. As such, your sample size is a pragmatic decision and should consist of enough observations to test your design and research instruments. For instance, if you are planning to assess the psychometric properties of an instrument that you are using, you need enough observations to meet the requirements of the statistical procedures you choose, such as Cronbach's alpha or a principal components analysis. Power has nothing to do with this decision.
FW: could someone help me with this intro to stat. problem
Mike,

With random assignment beforehand, it is not necessary to take a pre-intervention measurement. Test the difference in confidence following the training. If it is significant, there is a difference. Decide what direction it is in and attribute the difference to the training. You can make this attribution because of random assignment, even without a pre-measure.

-----Original Message-----
From: Mike Wogan [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, December 08, 1999 2:16 PM
To: Luv 2 muah 143
Cc: [EMAIL PROTECTED]
Subject: Re: could someone help me with this intro to stat. problem

On 8 Dec 1999, Luv 2 muah 143 wrote:
> 5 of 10 volunteers are randomly selected to receive self-defense training. The other 5 receive no training. At the end of the training period, all subjects complete a self-confidence questionnaire.
> a.) Is there a difference in self-confidence between the 2 groups (p < .01)?
> b.) What are the effects of self-defense training on self-confidence (I'm assuming a two-tailed test?). Explain analysis
> Please help, I can't figure it out...my mind has gone blank

Without a pre-test measure of self-confidence, taken prior to the training, even if there is a significant difference post-training, it's not possible to tell whether the difference is the result of the training or was there to begin with. If there is a pre-post measurement of self-confidence, then you need a mixed model ANOVA, with Training vs. No Training as the between-groups factor and Pre-Post as the within-groups factor.

Mike
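Editor's sketch: the reply at the top of this message argues that random assignment alone justifies a post-test-only comparison. One way to make that argument concrete is an exact randomization (permutation) test, which uses nothing but the random assignment itself. This is a swapped-in technique, not necessarily the test the original exercise expects (an independent-samples t test is the usual intro-course answer). The questionnaire scores below are invented, and the function name is hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(treated, control):
    """Exact two-sided randomization test on the difference in means.

    Enumerates every way of assigning len(treated) of the pooled scores
    to the treated group and counts assignments whose absolute mean
    difference is at least as large as the one observed."""
    pooled = list(treated) + list(control)
    observed = abs(mean(treated) - mean(control))
    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), len(treated)):
        chosen = set(idx)
        group1 = [pooled[i] for i in chosen]
        group2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        if abs(mean(group1) - mean(group2)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Invented self-confidence scores for 5 trained and 5 untrained volunteers.
trained = [8, 9, 10, 11, 12]
untrained = [3, 4, 5, 6, 7]

p_value = permutation_p_value(trained, untrained)
print(p_value)
```

With 5 of 10 volunteers assigned to training there are only 252 possible assignments, so the smallest two-sided p-value this design can produce is 2/252, which is just under the .01 threshold in the exercise.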