[R] Statistical significance in robust estimation rlm()

2012-11-21 Thread fxen3k
Hi there,

I used the rlm() function to do a robust estimation based on
M-estimates.
The summary of an rlm() fit only reports the estimate, standard error and
t-value.
So how can I tell whether a coefficient is statistically significant without
a p-value?
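
One common way to get approximate p-values from the reported t-values is a normal approximation. The sketch below assumes the MASS package is installed and uses the built-in stackloss data purely as a stand-in for the real data; the resulting p-values are asymptotic approximations, not something rlm() itself reports.

```r
library(MASS)

# Stand-in example: robust regression on the built-in stackloss data
fit <- rlm(stack.loss ~ ., data = stackloss)

# summary() returns a matrix with columns "Value", "Std. Error", "t value"
coefs <- summary(fit)$coefficients

# Two-sided p-values from a standard normal approximation (an assumption:
# this relies on the asymptotic normality of the M-estimates)
p_values <- 2 * pnorm(abs(coefs[, "t value"]), lower.tail = FALSE)

print(cbind(coefs, "p value" = p_values))
```

With small samples this approximation can be optimistic, so a bootstrap of the coefficients is a reasonable cross-check.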

Thanks in advance!





--
View this message in context: 
http://r.789695.n4.nabble.com/Statistical-significance-in-robust-estimation-rlm-tp4650275.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Hausman test in R

2012-10-29 Thread fxen3k
Given my acknowledged statistical ignorance, I tried to find a solution
in this forum...
And this is not primarily a statistical issue; it is an issue about the
Hausman test in the R environment.

I cannot imagine that no one in this forum has ever done a Hausman test on OLS
regressions.
I read the systemfit package documentation and found only this example, which
refers to 2SLS and 3SLS regressions:

library( systemfit )
data( "Kmenta" )
eqDemand <- consump ~ price + income
eqSupply <- consump ~ price + farmPrice + trend
inst <- ~ income + farmPrice + trend
system <- list( demand = eqDemand, supply = eqSupply )
## perform the estimations
fit2sls <- systemfit( system, "2SLS", inst = inst, data = Kmenta )
fit3sls <- systemfit( system, "3SLS", inst = inst, data = Kmenta )
## perform the Hausman test
h <- hausman.systemfit( fit2sls, fit3sls )
print( h )




--
View this message in context: 
http://r.789695.n4.nabble.com/Hausman-test-in-R-tp4647716p4647774.html


Re: [R] Hausman test in R

2012-10-29 Thread fxen3k
Thanks for your answer, John!

Having read in Wooldridge, Verbeek and Hausman himself, I tried to figure
out how this whole Hausman test works.

I tried to figure out whether endogeneity exists in my particular case. So I
did this:

Y ~ X + Z + Rest + error term   [# this is the original regression, with Z
= instrumental variable for X, X = potentially endogenous variable and Rest
= more independent variables]
Regression 1:
X ~ Z + Rest + error term
Regression 2:
Y ~ X + Rest + residuals(Reg1) + error   [# I took the residuals from
Regression 1 with Reg1_resid <- cbind(Reg1$resid)]

Finally, if the coefficient for the residuals is statistically significant,
there is endogeneity.
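
The two-step procedure described above can be sketched on simulated data. Everything here is hypothetical: the variable names (z, rest, x, y), the coefficients, and the way endogeneity is built in by letting x share the error term u. It only illustrates the mechanics of the residual-inclusion regression, not the poster's actual data.

```r
set.seed(1)
n <- 500

z    <- rnorm(n)   # instrument: affects x, not part of the error in y
rest <- rnorm(n)   # additional exogenous regressor
u    <- rnorm(n)   # structural error of the y equation

# x is endogenous by construction because it depends on u
x <- 0.8 * z + 0.5 * rest + 0.6 * u + rnorm(n)
y <- 1 + 2 * x + 1 * rest + u

# Regression 1: potentially endogenous variable on instrument and the rest
reg1  <- lm(x ~ z + rest)
v_hat <- residuals(reg1)

# Regression 2: original regressors plus the first-stage residuals
reg2 <- lm(y ~ x + rest + v_hat)

# The t-test on v_hat is the regression-based endogeneity check
resid_test <- summary(reg2)$coefficients["v_hat", ]
print(resid_test)
```

A small p-value on v_hat points towards endogeneity of x; a large one (such as 0.11) means the test does not reject exogeneity, which is not the same as proving it.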

Is this approach correct?

p.s: My p-value is 0.1138...

Thanks for your help





--
View this message in context: 
http://r.789695.n4.nabble.com/Hausman-test-in-R-tp4647716p4647800.html


[R] Hausman test in R

2012-10-28 Thread fxen3k
Hi there,

I am really new to statistics in R and statistics itself as well.
My situation: I ran a lot of OLS regressions with different independent
variables. (using the lm() function).
After having done that, I know there is endogeneity due to omitted
variables. (or perhaps due to any other reasons).
And here comes the Hausman test. I know this test is used to identify
endogeneity. 
But what I am not sure about is: Can I use the Hausman test in a simple OLS
regression or is it only possible in a 2SLS regression model? And if it is
possible to use it, how can I do it?

Info about the data:

data = lots of data :)

x1 <- data$x1
x2 <- data$x2
x3 <- data$x3
x4 <- data$x4
y1 <- data$y1

reg1 <- summary(lm(y1 ~ x1 + x2 + x3 + x4))

Thanks in advance for any support!



--
View this message in context: 
http://r.789695.n4.nabble.com/Hausman-test-in-R-tp4647716.html


[R] Export summary from regression output

2012-10-23 Thread fxen3k
Hi there,

I tried many times but couldn't get it to work.

I just want to export the summary of an OLS regression (lm() function) into a
CSV file, including the call formula, coefficients, R-squared,
adjusted R-squared and F statistic.

I know I can export:
write.csv2(Regression_60d_ann$coefficients, "Regression_60d_ann.csv")
But then I only get the coefficients, not all the other output...

I tried creating a matrix and I wanted to put in
Regression_60d_ann$coefficients, Regression_60d_ann$adj.r.squared,
Regression_60d_ann$r.squared, etc. but it didn't work due to different
length of rows.
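
One way around the different lengths is to write two blocks into the same file: a two-column table of model-level statistics, followed by the coefficient table. This is only a sketch; the mtcars model, the output file name and the labels in the statistic column are placeholders, not the poster's objects.

```r
# Placeholder model; replace with your own lm() fit
fit <- lm(mpg ~ wt + hp, data = mtcars)
s   <- summary(fit)

# Coefficient table with the term names as a regular column
coef_table <- data.frame(term = rownames(s$coefficients),
                         s$coefficients,
                         row.names = NULL, check.names = FALSE)

# Model-level statistics in long format (one row per statistic)
f <- s$fstatistic
model_stats <- data.frame(
  statistic = c("call", "r.squared", "adj.r.squared",
                "f.statistic", "f.df1", "f.df2", "f.p.value"),
  value = c(paste(deparse(s$call), collapse = " "),
            s$r.squared, s$adj.r.squared,
            f[["value"]], f[["numdf"]], f[["dendf"]],
            pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE))
)

out_file <- file.path(tempdir(), "regression_summary.csv")

# First block: model statistics (semicolon separated, decimal comma)
write.csv2(model_stats, out_file, row.names = FALSE)

# Second block: coefficients appended below; write.table() is used because
# write.csv2() does not support append = TRUE
suppressWarnings(
  write.table(coef_table, out_file, append = TRUE, sep = ";", dec = ",",
              row.names = FALSE, col.names = TRUE)
)
```

Keeping the two blocks in separate files is simpler to read back into R if the CSV is not only meant for viewing in a spreadsheet.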


Can anyone help or has a better solution?

Thanks in advance
Felix



--
View this message in context: 
http://r.789695.n4.nabble.com/Export-summary-from-regression-output-tp4647109.html


[R] How to count rows with a condition

2012-10-17 Thread fxen3k
Hi,

I have a dataset called "data". There is one column called "ac_name". Some
names in this column appear very often, some less.
What I want is to filter this dataset with the following condition:

Exclude the names which appear more than five times. (Example: House A
appears 8 times => exclude it; House B appears 5 times => include it, etc.)

In the end, I want to have the old "data" dataset without the rows that meet
the above-mentioned condition, and another list with all the names which have
been excluded.
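
A minimal sketch of this filter using table(); the toy data frame and the names excluded_names and data_filtered are made up for illustration.

```r
# Toy data standing in for the real dataset
data <- data.frame(
  ac_name = c(rep("House A", 8), rep("House B", 5), rep("House C", 2)),
  value   = seq_len(15),
  stringsAsFactors = FALSE
)

# Count how often each name occurs
name_counts <- table(data$ac_name)

# Names that appear more than five times
excluded_names <- names(name_counts)[name_counts > 5]

# Keep only the rows whose name is not in the excluded list
data_filtered <- data[!(data$ac_name %in% excluded_names), ]

print(excluded_names)
print(table(data_filtered$ac_name))
```

Because table() counts every distinct name in one pass, the same code works unchanged for about 1,000 different names.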


I think for one of the professionals amongst you this is pretty easy to
solve. ;-)

Thanks dudes!

Cheerio,
Felix



--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-count-rows-with-a-condition-tp4646454.html


Re: [R] How to count rows with a condition

2012-10-17 Thread fxen3k
Thanks for the first reply. 

Unfortunately, my list of different ac_names is pretty long (about 1,000
different names). Is there a way to sort them, count how often each
name occurs, and exclude the rows whose names exceed a particular limit?



--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-count-rows-with-a-condition-tp4646454p4646465.html


[R] Creating a correlation matrix with significance levels

2012-10-12 Thread fxen3k
Hi there,

I tried this code from the following page:
http://myowelt.blogspot.de/2008/04/beautiful-correlation-tables-in-r.html

corstarsl <- function(x){
  require(Hmisc)
  x <- as.matrix(x)
  R <- rcorr(x)$r
  p <- rcorr(x)$P

  ## define notation for significance levels; spacing is important.
  mystars <- ifelse(p < .001, "***", ifelse(p < .01, "** ", ifelse(p < .05, "* ", " ")))

  ## truncate the matrix that holds the correlations to two decimals
  R <- format(round(cbind(rep(-1.11, ncol(x)), R), 2))[,-1]

  ## build a new matrix that includes the correlations with their appropriate stars
  Rnew <- matrix(paste(R, mystars, sep=""), ncol=ncol(x))
  diag(Rnew) <- paste(diag(R), " ", sep="")
  rownames(Rnew) <- colnames(x)
  colnames(Rnew) <- paste(colnames(x), "", sep="")

  ## remove upper triangle
  Rnew <- as.matrix(Rnew)
  Rnew[upper.tri(Rnew, diag = TRUE)] <- ""
  Rnew <- as.data.frame(Rnew)

  ## remove last column and return the matrix (which is now a data frame)
  Rnew <- cbind(Rnew[1:(length(Rnew)-1)])
  return(Rnew)
}

library(xtable)
Output_cor <- xtable(corstarsl(swiss[,1:4]))
setwd(paste(path, "Output/Correlation/", sep=""))
print(Output_cor, type="html", file="correlation.html")

In this example the function is applied to the example dataset swiss. I want
to use this code for my own matrix, called:

Corr_Matrix <- cbind(MA_data_raw$1, MA_data_raw$2, MA_data_raw$3,
MA_data_raw$4, MA_data_raw$5, MA_data_raw$6, MA_data_raw$7, MA_data_raw$8,
I(MA_data_raw$21/MA_data_raw$20), MA_data_raw$9)

How can I do this? 
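
Since corstarsl() labels its output with colnames(x), the matrix passed in needs column names, and cbind() on bare vectors does not create any. The sketch below shows one way to build a named matrix, including a ratio column. The data frame contents and the column names (deal_value, premium, assets, equity) are invented for illustration; the call to corstarsl() only runs if that function and the Hmisc package are available.

```r
set.seed(42)

# Invented stand-in for MA_data_raw with hypothetical column names
MA_data_raw <- data.frame(
  deal_value = rnorm(50),
  premium    = rnorm(50),
  assets     = runif(50, 1, 10),
  equity     = runif(50, 1, 10)
)

# Name each column explicitly so the correlation table gets readable labels
Corr_Matrix <- cbind(
  deal_value   = MA_data_raw$deal_value,
  premium      = MA_data_raw$premium,
  equity_ratio = MA_data_raw$equity / MA_data_raw$assets
)

print(colnames(Corr_Matrix))

# Only attempted when corstarsl() has been defined and Hmisc is installed
if (exists("corstarsl") && requireNamespace("Hmisc", quietly = TRUE)) {
  print(corstarsl(Corr_Matrix))
}
```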

Thanks! 
I appreciate all helpful answers! ;-)



--
View this message in context: 
http://r.789695.n4.nabble.com/Creating-a-correlation-matrix-with-significance-levels-tp4645984.html


[R] How to create a column in dependence of another column

2012-10-09 Thread fxen3k
Hi there,

I'm sorry for the bad subject decision. Couldn't describe it better...

In my dataset called "dataSet" I want to create a new variable column called
"deal_category", which depends on another column called "trans_value".
In the column "trans_value" I have values in USDm. Now what I want to do is to
give these values a category called "low", "medium" or "high". The
classification depends on the size of the values.

"low", if the value in trans_value is < 200 USDm
"medium", if the value x in trans_value is: 200 USDm <= x < 500 USDm
"high", if the value in trans_value is: >= 500 USDm
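
A minimal sketch of this classification with cut(); the example values in trans_value are made up. Setting right = FALSE makes each interval closed on the left, which matches the three rules above.

```r
# Made-up transaction values in USDm
dataSet <- data.frame(trans_value = c(50, 199.9, 200, 350, 500, 1200))

# Intervals are [-Inf, 200), [200, 500), [500, Inf) because right = FALSE
dataSet$deal_category <- cut(
  dataSet$trans_value,
  breaks = c(-Inf, 200, 500, Inf),
  labels = c("low", "medium", "high"),
  right  = FALSE
)

print(dataSet)
print(levels(dataSet$deal_category))
```

cut() returns a factor, so lm() will expand deal_category into dummy variables on its own, with the first level ("low") as the reference category.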

Having defined these deals with low, medium, high I want to run a lm() with
these categories as independent variable.

deal_category2 <- factor(deal_category)
levels(deal_category2) <- c("low", "medium", "high")
reg_1 <- lm(dep_var1 ~ indep_1 + indep_2 + deal_category2)
summary(reg_1)

Is this correct? Does R recognize my categories as variables?

Thanks for all your support!

Felix



--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-create-a-column-in-dependence-of-another-column-tp4645548.html


Re: [R] Calculating the mean in one column with empty cells

2012-10-06 Thread fxen3k
Hi, 

The first command entered the numbers into R directly:

> testdata <- c(0.2006160108532920, 0.1321167173880490, 0.0563941428921262,
0.0264198664609803, 0.0200581303857603, -0.2971754213679500,
-0.2353086361784190, 0.0667195538296534, 0.1755852636926560)
> mean(testdata)
[1] 0.0161584

Here I tried to calculate the mean with the same numbers as given above, but
taken from my dataset:

> str(dataSet2$ac_bhar_60d_4d_after_ann[1:9])
 num [1:9] 0.2 0.13 0.06 0.03 0.02 -0.3 -0.24 0.07 0.18
> mean(dataSet2$ac_bhar_60d_4d_after_ann[1:9])
[1] 0.0167

It seems that in the second case R calculates the mean with rounded numbers
(0.2 and not 0.20061601085...).
Could it be that R imports only the rounded numbers?
How can I build a CSV-file with numbers showing all decimal places? Because
I think my current CSV-file only has numbers with 2 decimal places.
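
The difference can be reproduced in R alone, which supports the suspicion that the CSV file holds values rounded to two decimals. The sketch below also shows, as one possible route, that a CSV written by write.csv2() keeps the full precision; the temporary file name is arbitrary.

```r
testdata <- c(0.2006160108532920, 0.1321167173880490, 0.0563941428921262,
              0.0264198664609803, 0.0200581303857603, -0.2971754213679500,
              -0.2353086361784190, 0.0667195538296534, 0.1755852636926560)

# Mean of the full-precision values versus the values rounded to 2 decimals
mean_full    <- mean(testdata)
mean_rounded <- mean(round(testdata, 2))
print(c(full = mean_full, rounded = mean_rounded), digits = 6)

# Round trip through a semicolon-separated CSV with decimal commas
csv_file <- tempfile(fileext = ".csv")
write.csv2(data.frame(value = testdata), csv_file, row.names = FALSE)
back <- read.csv2(csv_file)
print(all.equal(back$value, testdata))
```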


Kind Regards,
Felix




--
View this message in context: 
http://r.789695.n4.nabble.com/Calculating-the-mean-in-one-column-with-empty-cells-tp4645135p4645252.html


Re: [R] Calculating the mean in one column with empty cells

2012-10-06 Thread fxen3k
I created a Microsoft Excel spreadsheet. As you said, the CSV file only
contains the numbers as they are displayed. I just solved the problem by
showing 25 decimal places in Excel and then exporting the data into a CSV file.

Is there a better way to solve this?

Regards,
Felix



--
View this message in context: 
http://r.789695.n4.nabble.com/Calculating-the-mean-in-one-column-with-empty-cells-tp4645135p4645278.html


[R] Calculating the mean in one column with empty cells

2012-10-05 Thread fxen3k
Hi all,

I recently tried to calculate the mean and the median just for one column.
In this column I have numbers with some empty cells due to missing data. 
So how can I calculate the mean just for the filled cells? 
I tried:
mean(dataSet2$ac_60d_4d_after_ann[!is.na(master$ac_60d_4d_after_ann)],
na.rm=TRUE)
But the output was different from the calculation I did in Microsoft Excel.
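
For reference, na.rm = TRUE alone is enough to skip missing values, so no extra is.na() subsetting is needed. A minimal sketch with made-up numbers:

```r
# Made-up column with missing values
x <- c(0.2, NA, 0.1, NA, 0.3)

# na.rm = TRUE drops the NA entries before computing the statistic
x_mean   <- mean(x, na.rm = TRUE)
x_median <- median(x, na.rm = TRUE)

print(c(mean = x_mean, median = x_median))
```

Subsetting one data frame with an is.na() index built from another one only works as intended if both have the same rows in the same order.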

Thanks in advance,
Felix



--
View this message in context: 
http://r.789695.n4.nabble.com/Calculating-the-mean-in-one-column-with-empty-cells-tp4645135.html


Re: [R] Calculating the mean in one column with empty cells

2012-10-05 Thread fxen3k
I imported the whole dataset with read.csv2() and it works fine. (2 for
German is correct ;) )

I already checked the numbers and I also tried to calculate the mean of a
range of numbers where there is no NA given. (as mentioned in my last post
above).




--
View this message in context: 
http://r.789695.n4.nabble.com/Calculating-the-mean-in-one-column-with-empty-cells-tp4645135p4645166.html


Re: [R] Calculating the mean in one column with empty cells

2012-10-05 Thread fxen3k
I'm sorry! 

Now I tried it again with just 10 numbers (just random numbers) and Excel
gives a different output than R.

Here are the numbers I used:

0,2006160108532920
0,1321167173880490
0,0563941428921262
0,0264198664609803
0,0200581303857603
-0,2971754213679500
-0,2353086361784190
0,0667195538296534
0,1755852636926560

And this is the command in R:

> nums <- as.numeric(as.character(dataSet2$ac_bhar_60d_4d_after_ann[2:10]))
> m <- mean(nums, na.rm = TRUE)
> m

The output of R is:
> print(m, digits = 12)
[1] 0.01667

The output in Excel is:
0,0161584031062386

The numbers are imported correctly. Or does R round the imported numbers to
a certain number of decimal places? (I don't think so ;-) )
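
A quick way to check what was actually imported is to print the values with more digits. The sketch below reads the nine numbers listed above from an inline string with read.csv2(); the column name value is an assumption made for the example.

```r
# The nine numbers from the post, with decimal commas, as inline CSV text
csv_text <- paste(
  "value",
  "0,2006160108532920", "0,1321167173880490", "0,0563941428921262",
  "0,0264198664609803", "0,0200581303857603", "-0,2971754213679500",
  "-0,2353086361784190", "0,0667195538296534", "0,1755852636926560",
  sep = "\n"
)

# read.csv2() expects ';' as separator and ',' as decimal mark
d <- read.csv2(text = csv_text)

# Print with 15 significant digits to see the stored precision
print(d$value, digits = 15)

m <- mean(d$value)
print(m, digits = 12)
```

If the same print() call on the imported column shows only two decimals, the rounding happened in the file, not in R.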

Best Regards,
Felix




--
View this message in context: 
http://r.789695.n4.nabble.com/Calculating-the-mean-in-one-column-with-empty-cells-tp4645135p4645165.html