[R] Statistical significance in robust estimation rlm()
Hi there,

I used the rlm() function to do a robust estimation based on M-estimates. The output of rlm() gives only the estimate, standard error and t-value for each coefficient. So how can I tell whether a coefficient is statistically significant without a p-value?

Thanks in advance!

--
View this message in context: http://r.789695.n4.nabble.com/Statistical-significance-in-robust-estimation-rlm-tp4650275.html
Sent from the R help mailing list archive at Nabble.com.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
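One possible way to get approximate p-values from the rlm() output is to treat the reported t-values as asymptotically standard normal. The following is only a minimal sketch under that assumption: the stackloss data set and the object names are chosen for illustration, and the p-values are an approximation, not something rlm() itself reports.

```r
library(MASS)  # provides rlm()

# Illustrative model on the built-in stackloss data (not the poster's data)
fit <- rlm(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc., data = stackloss)

# Coefficient table from summary(): estimate, standard error, t-value
ctab <- summary(fit)$coefficients

# Approximate two-sided p-values from the t-values (third column),
# assuming they are approximately standard normal in large samples
p_approx <- 2 * pnorm(abs(ctab[, 3]), lower.tail = FALSE)

print(cbind(ctab, "approx. p" = p_approx))
```

With small samples the normal approximation can be optimistic, so the resulting values are best read as a rough guide.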
Re: [R] Hausman test in R
Given my acknowledged statistical ignorance, I tried to find a *solution* in this forum... And this is not primarily a statistical issue; it is an issue about the Hausman test in the R environment. I cannot imagine that no one in this forum has ever done a Hausman test on OLS regressions.

I read the systemfit package documentation and found only this example, which refers to 2SLS and 3SLS regressions:

    data("Kmenta")
    eqDemand <- consump ~ price + income
    eqSupply <- consump ~ price + farmPrice + trend
    inst <- ~ income + farmPrice + trend
    system <- list(demand = eqDemand, supply = eqSupply)

    ## perform the estimations
    fit2sls <- systemfit(system, "2SLS", inst = inst, data = Kmenta)
    fit3sls <- systemfit(system, "3SLS", inst = inst, data = Kmenta)

    ## perform the Hausman test
    h <- hausman.systemfit(fit2sls, fit3sls)
    print(h)

--
View this message in context: http://r.789695.n4.nabble.com/Hausman-test-in-R-tp4647716p4647774.html
Re: [R] Hausman test in R
Thanks for your answer, John!

Having read Wooldridge, Verbeek and Hausman himself, I tried to figure out how this whole Hausman test works and whether endogeneity exists in my particular case. So I did this:

    Y ~ X + Z + Rest + error term
    [# this is the original regression, with Z = instrumental variable for X, X = potentially endogenous variable and Rest = more independent variables]

    Regression 1: X ~ Z + Rest + error term
    Regression 2: Y ~ X + Rest + residuals(Reg1) + error
    [# I took the residuals from Regression 1 by Reg1_resid <- cbind(Reg1$resid)]

Finally, if the coefficient for the residuals is statistically significant, there is endogeneity. Is this approach correct?

P.S.: My p-value is 0.1138...

Thanks for your help

--
View this message in context: http://r.789695.n4.nabble.com/Hausman-test-in-R-tp4647716p4647800.html
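The two-step procedure described in this message (a regression-based endogeneity check) can be sketched with simulated data. Everything in the block is an assumption for illustration: the variable names, the coefficients and the sample size are made up, and x is constructed to be endogenous so that the residual term should come out significant.

```r
set.seed(1)
n <- 500

z    <- rnorm(n)  # hypothetical instrument for x
rest <- rnorm(n)  # further exogenous regressor
u    <- rnorm(n)  # structural error term

# x depends on u, so it is endogenous by construction
x <- 0.8 * z + 0.5 * rest + 0.7 * u + rnorm(n)
y <- 1 + 2 * x + 1.5 * rest + u

# Regression 1: potentially endogenous variable on instrument and the rest
reg1  <- lm(x ~ z + rest)
v_hat <- residuals(reg1)

# Regression 2: original equation plus the first-stage residuals
reg2 <- lm(y ~ x + rest + v_hat)

# p-value of the residual term; a small value points to endogeneity
p_endog <- coef(summary(reg2))["v_hat", "Pr(>|t|)"]
print(p_endog)
```

In this simulated setup the p-value should be very small. A value such as 0.1138 would mean the check does not reject exogeneity at the usual levels, which is not the same as showing that endogeneity is absent.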
[R] Hausman test in R
Hi there,

I am really new to statistics in R and to statistics itself as well.

My situation: I ran a lot of OLS regressions with different independent variables (using the lm() function). Having done that, I know there is endogeneity due to omitted variables (or perhaps for other reasons). And here comes the Hausman test. I know this test is used to identify endogeneity. But what I am not sure about is: can I use the Hausman test on a simple OLS regression, or is it only possible in a 2SLS regression model? And if it is possible, how can I do it?

Info about the data:

    data = lots of data :)
    x1 <- data$x1
    x2 <- data$x2
    x3 <- data$x3
    x4 <- data$x4
    y1 <- data$y1
    reg1 <- summary(lm(y1 ~ x1 + x2 + x3 + x4))

Thanks in advance for any support!

--
View this message in context: http://r.789695.n4.nabble.com/Hausman-test-in-R-tp4647716.html
[R] Export summary from regression output
Hi there,

I tried many times but couldn't get it to work. I just want to export the summary of an OLS regression (lm() function) to a CSV file, including the call formula, coefficients, R-squared, adjusted R-squared and F statistic.

I know I can export the coefficients:

    write.csv2(Regression_60d_ann$coefficients, "Regression_60d_ann.csv")

But then I only get the coefficients and none of the other output... I tried creating a matrix to hold Regression_60d_ann$coefficients, Regression_60d_ann$adj.r.squared, Regression_60d_ann$r.squared, etc., but it didn't work because the pieces have different numbers of rows.

Can anyone help, or does anyone have a better solution?

Thanks in advance,
Felix

--
View this message in context: http://r.789695.n4.nabble.com/Export-summary-from-regression-output-tp4647109.html
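One way around the "different numbers of rows" problem is to start from the coefficient table and repeat the model-level statistics in extra columns. This is only a sketch: the mtcars model, the column names and the output file name are assumptions for illustration, and the file is written to a temporary directory.

```r
# Illustrative model on a built-in data set (not the poster's regression)
fit <- lm(mpg ~ wt + hp, data = mtcars)
s   <- summary(fit)

# Coefficient table as a data frame, with the term names as a column
out <- as.data.frame(s$coefficients)
out <- cbind(term = rownames(out), out)
rownames(out) <- NULL

# Model-level statistics are single values; assigning them to a column
# repeats them on every row, so all pieces have the same length
out$r.squared     <- s$r.squared
out$adj.r.squared <- s$adj.r.squared
out$f.statistic   <- unname(s$fstatistic["value"])
out$formula       <- paste(deparse(formula(fit)), collapse = " ")

# write.csv2() uses ";" as separator and "," as decimal mark
out_file <- file.path(tempdir(), "regression_summary.csv")
write.csv2(out, out_file, row.names = FALSE)
```

The repeated values make the file slightly redundant, but it stays a single rectangular table that opens cleanly in a spreadsheet.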
[R] How to count rows with a condition
Hi,

I have a dataset called data. There is one column called ac_name. Some names in this column appear very often, some less. What I want is to filter this dataset with the following condition: exclude the names which appear more than five times (example: House A appears 8 times ==> exclude it; House B appears 5 times ==> include it, etc.).

In the end, I want to have the old data dataset without the rows matching the above condition, and another list with all the names which have been excluded.

I think for one of the professionals amongst you this is pretty easy to solve. ;-)

Thanks dudes!

Cheerio,
Felix

--
View this message in context: http://r.789695.n4.nabble.com/How-to-count-rows-with-a-condition-tp4646454.html
Re: [R] How to count rows with a condition
Thanks for the first reply. Unfortunately, my list of different ac_names is pretty long (about 1,000 different names). Is there a way to sort them, count how often each name occurs and exclude the rows whose name exceeds a particular limit?

--
View this message in context: http://r.789695.n4.nabble.com/How-to-count-rows-with-a-condition-tp4646454p4646465.html
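Counting and filtering without listing the names by hand could look roughly like this, using table(). The small data frame below is made up for illustration; only the column name ac_name and the limit of five come from the question.

```r
# Made-up example data: House A appears 8 times, House B 5, House C 2
data <- data.frame(
  ac_name = c(rep("House A", 8), rep("House B", 5), rep("House C", 2)),
  value   = seq_len(15)
)

limit <- 5

# Count how often each name occurs
counts <- table(data$ac_name)

# Names that appear more than 'limit' times
excluded_names <- names(counts)[as.vector(counts) > limit]

# Keep only the rows whose name is not in the excluded list
data_kept <- data[!(data$ac_name %in% excluded_names), ]

# Counts sorted from most to least frequent, for inspection
print(sort(counts, decreasing = TRUE))
print(excluded_names)
```

The same code works unchanged with about 1,000 distinct names, since table() counts every distinct value in the column.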
[R] Creating a correlation matrix with significance levels
Hi there,

I tried this code from the following page:
http://myowelt.blogspot.de/2008/04/beautiful-correlation-tables-in-r.html

    corstarsl <- function(x){
      require(Hmisc)
      x <- as.matrix(x)
      R <- rcorr(x)$r
      p <- rcorr(x)$P

      ## define notions for significance levels; spacing is important.
      mystars <- ifelse(p < .001, "***", ifelse(p < .01, "** ", ifelse(p < .05, "*  ", "   ")))

      ## truncate the matrix that holds the correlations to two decimals
      R <- format(round(cbind(rep(-1.11, ncol(x)), R), 2))[,-1]

      ## build a new matrix that includes the correlations with their appropriate stars
      Rnew <- matrix(paste(R, mystars, sep=""), ncol=ncol(x))
      diag(Rnew) <- paste(diag(R), " ", sep="")
      rownames(Rnew) <- colnames(x)
      colnames(Rnew) <- paste(colnames(x), "", sep="")

      ## remove upper triangle
      Rnew <- as.matrix(Rnew)
      Rnew[upper.tri(Rnew, diag = TRUE)] <- ""
      Rnew <- as.data.frame(Rnew)

      ## remove last column and return the matrix (which is now a data frame)
      Rnew <- cbind(Rnew[1:length(Rnew)-1])
      return(Rnew)
    }

    library(xtable)  # needed for xtable() and print.xtable()
    Output_cor <- xtable(corstarsl(swiss[,1:4]))
    setwd(paste(path, "Output/Correlation/", sep=""))
    print.xtable(Output_cor, type="html", file="correlation.html")

This example shows the output for the example data (swiss[,1:4]). I want to use this code for my own matrix, called:

    Corr_Matrix <- cbind(MA_data_raw$1, MA_data_raw$2, MA_data_raw$3, MA_data_raw$4, MA_data_raw$5, MA_data_raw$6, MA_data_raw$7, MA_data_raw$8, I(MA_data_raw$21/MA_data_raw$20), MA_data_raw$9)

How can I do this?

Thanks! I appreciate all helpful answers! ;-)

--
View this message in context: http://r.789695.n4.nabble.com/Creating-a-correlation-matrix-with-significance-levels-tp4645984.html
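A function like the one above takes any numeric matrix, so the main step is to build the matrix with named columns, including derived columns such as a ratio. The sketch below shows that step with a simplified base-R stand-in: cor_stars() is a hypothetical helper written for this note (it is not the blog's function and does not use Hmisc), and the swiss columns stand in for the poster's own variables.

```r
# Hypothetical base-R helper: lower-triangle correlations with stars
cor_stars <- function(x) {
  x <- as.matrix(x)
  k <- ncol(x)
  r <- cor(x, use = "pairwise.complete.obs")

  # Pairwise p-values from cor.test(); the diagonal stays NA
  p <- matrix(NA_real_, k, k)
  for (i in seq_len(k)) {
    for (j in seq_len(k)) {
      if (i != j) p[i, j] <- cor.test(x[, i], x[, j])$p.value
    }
  }

  stars <- ifelse(is.na(p), "",
           ifelse(p < .001, "***",
           ifelse(p < .01, "**",
           ifelse(p < .05, "*", ""))))

  out <- matrix(paste0(formatC(r, format = "f", digits = 2), stars),
                nrow = k, ncol = k,
                dimnames = list(colnames(x), colnames(x)))
  out[upper.tri(out, diag = TRUE)] <- ""
  as.data.frame(out)
}

# Build the input with named columns; 'ratio' mirrors a derived column
# such as one variable divided by another
my_matrix <- cbind(
  fertility   = swiss$Fertility,
  agriculture = swiss$Agriculture,
  ratio       = swiss$Education / swiss$Examination
)

res <- cor_stars(my_matrix)
print(res)
```

Naming the columns inside cbind() matters because the row and column labels of the resulting table are taken from colnames(x).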
[R] How to create a column in dependence of another column
Hi there,

I'm sorry for the poorly chosen subject. I couldn't describe it better...

In my dataset called dataSet I want to create a new column called deal_category which depends on another column called trans_value. The column trans_value holds values in USDm. What I want to do is assign each of these values a category called low, medium or high. The classification depends on the size of the value:

    low,    if the value in trans_value is < 200 USDm
    medium, if the value x in trans_value is: 200 USDm <= x < 500 USDm
    high,   if the value in trans_value is >= 500 USDm

Having classified these deals as low, medium or high, I want to run an lm() with these categories as an independent variable:

    deal_category2 <- factor(deal_category)
    levels(deal_category2) <- c("low", "medium", "high")
    reg_1 <- lm(dep_var1 ~ indep_1 + indep_2 + deal_category2)
    summary(reg_1)

Is this correct? Does R recognize my categories as variables?

Thanks for all your support!
Felix

--
View this message in context: http://r.789695.n4.nabble.com/How-to-create-a-column-in-dependence-of-another-column-tp4645548.html
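The classification described above could be sketched with cut(). The example values in trans_value are made up; the thresholds of 200 and 500 USDm and the column names follow the question.

```r
# Made-up deal values in USDm, including the boundary cases 200 and 500
dataSet <- data.frame(trans_value = c(50, 199.9, 200, 350, 500, 1200))

# right = FALSE makes the intervals closed on the left:
# [-Inf, 200) -> low, [200, 500) -> medium, [500, Inf) -> high
dataSet$deal_category <- cut(
  dataSet$trans_value,
  breaks = c(-Inf, 200, 500, Inf),
  labels = c("low", "medium", "high"),
  right  = FALSE
)

print(dataSet)
print(table(dataSet$deal_category))
```

cut() returns a factor, so the column can go straight into an lm() formula, where R expands it into dummy variables with the first level (here low) as the reference category. One caution about the code in the question: levels(f) <- c(...) renames the existing levels in their current order rather than reordering them, so if deal_category already holds the strings, factor(deal_category, levels = c("low", "medium", "high")) sets the order without relabelling.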
Re: [R] Calculating the mean in one column with empty cells
Hi,

in the first command I entered the numbers into R directly:

    testdata <- c(0.2006160108532920, 0.1321167173880490, 0.0563941428921262, 0.0264198664609803, 0.0200581303857603, -0.2971754213679500, -0.2353086361784190, 0.0667195538296534, 0.1755852636926560)
    mean(testdata)
    [1] 0.0161584

Here I tried to calculate the mean of the same numbers as given above, but taken from my dataset:

    str(dataSet2$ac_bhar_60d_4d_after_ann[1:9])
    num [1:9] 0.2 0.13 0.06 0.03 0.02 -0.3 -0.24 0.07 0.18
    mean(dataSet2$ac_bhar_60d_4d_after_ann[1:9])
    [1] 0.0167

It seems that in the second case R calculates the mean from rounded numbers (0.2 and not 0.20061601085...). Could it be that R imports only the rounded numbers? How can I build a CSV file with numbers showing all decimal places? I think my current CSV file only contains numbers with 2 decimal places.

Kind regards,
Felix

--
View this message in context: http://r.789695.n4.nabble.com/Calculating-the-mean-in-one-column-with-empty-cells-tp4645135p4645252.html
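The difference between the two means can be reproduced by rounding the full-precision numbers to two decimal places, which supports the suspicion that the CSV file holds rounded values. A short sketch using the nine numbers from this message:

```r
# The nine values at full precision, as entered directly in the message
testdata <- c(0.2006160108532920, 0.1321167173880490, 0.0563941428921262,
              0.0264198664609803, 0.0200581303857603, -0.2971754213679500,
              -0.2353086361784190, 0.0667195538296534, 0.1755852636926560)

# The same values as they would look after being stored with 2 decimals
rounded <- round(testdata, 2)

mean_full    <- mean(testdata)
mean_rounded <- mean(rounded)

print(rounded)
print(mean_full,    digits = 12)
print(mean_rounded, digits = 12)
```

The rounded vector matches the str() output shown above, and its mean is about 0.0167, while the full-precision mean is about 0.0161584.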
Re: [R] Calculating the mean in one column with empty cells
I created a Microsoft Excel spreadsheet. As you said, the exported file only contained the numbers as displayed. I just solved the problem by showing 25 decimal places in Excel and then exporting the data to a CSV file. Is there a better way to solve this?

Regards,
Felix

--
View this message in context: http://r.789695.n4.nabble.com/Calculating-the-mean-in-one-column-with-empty-cells-tp4645135p4645278.html
[R] Calculating the mean in one column with empty cells
Hi all,

I recently tried to calculate the mean and the median for just one column. This column contains numbers, with some empty cells due to missing data. So how can I calculate the mean for just the filled cells?

I tried:

    mean(dataSet2$ac_60d_4d_after_ann[!is.na(master$ac_60d_4d_after_ann)], na.rm=TRUE)

But the output was different from the calculation I did in Microsoft Excel.

Thanks in advance,
Felix

--
View this message in context: http://r.789695.n4.nabble.com/Calculating-the-mean-in-one-column-with-empty-cells-tp4645135.html
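For a column with missing values, na.rm = TRUE on its own is enough; no extra subsetting is needed. The sketch below uses a made-up column under the name from the question. Note also that the command above subsets dataSet2 with a mask computed from a different object (master); if the two objects differ, that could select the wrong rows.

```r
# Made-up column with two missing cells
dataSet2 <- data.frame(ac_60d_4d_after_ann = c(0.12, NA, -0.05, 0.30, NA))

col <- dataSet2$ac_60d_4d_after_ann

# na.rm = TRUE drops the NA entries before computing
col_mean   <- mean(col, na.rm = TRUE)
col_median <- median(col, na.rm = TRUE)

# Number of non-missing values that went into the calculation
n_used <- sum(!is.na(col))

print(c(mean = col_mean, median = col_median, n = n_used))
```

Comparing n_used with the number of filled cells in the spreadsheet is a quick way to check that both programs work on the same set of values.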
Re: [R] Calculating the mean in one column with empty cells
I imported the whole dataset with read.csv2() and it works fine. (2 for German is correct ;) ) I already checked the numbers, and I also tried to calculate the mean of a range of numbers where there is no NA (as mentioned in my last post above).

--
View this message in context: http://r.789695.n4.nabble.com/Calculating-the-mean-in-one-column-with-empty-cells-tp4645135p4645166.html
Re: [R] Calculating the mean in one column with empty cells
I'm sorry! Now I tried it again with just 10 numbers (just random numbers), and Excel gives a different output than R.

Here are the numbers I used:

    0,2006160108532920
    0,1321167173880490
    0,0563941428921262
    0,0264198664609803
    0,0200581303857603
    -0,2971754213679500
    -0,2353086361784190
    0,0667195538296534
    0,1755852636926560

And this is the command in R:

    nums <- as.numeric(as.character(dataSet2$ac_bhar_60d_4d_after_ann[2:10]))
    m <- mean(nums, na.rm = T)
    m

The output of R is:

    print(m, digits = 12)
    [1] 0.01667

The output in Excel is: 0,0161584031062386

The numbers are imported correctly. Or does R cut the imported numbers off at some decimal place? (I don't think so ;-) )

Best regards,
Felix

--
View this message in context: http://r.789695.n4.nabble.com/Calculating-the-mean-in-one-column-with-empty-cells-tp4645135p4645165.html
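To check whether the import itself loses decimals, a few of the decimal-comma numbers above can be read with read.csv2() from an in-memory string. This is a sketch only: the column name value and the three-row sample are assumptions for illustration.

```r
# Three of the numbers above, written with decimal commas as in the CSV
csv_text <- paste(
  "value",
  "0,2006160108532920",
  "0,1321167173880490",
  "-0,2971754213679500",
  sep = "\n"
)

# read.csv2() expects ";" as field separator and "," as decimal mark
dat <- read.csv2(text = csv_text)

# The column is numeric and keeps the full precision
str(dat)
print(dat$value, digits = 16)

# By contrast, as.numeric() does not understand a decimal comma
comma_parsed <- suppressWarnings(as.numeric("0,2006160108532920"))
print(comma_parsed)
```

If the values printed with digits = 16 show only two decimals, the rounding happened before R read the file, for example when the spreadsheet was exported with its display format.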