[R] cannot calculate standard estimate with predict on loess

2012-05-04 Thread Saurav Pathak

Hi,

For some reason I have been unable to use the predict function when I 
desire the standard error to be calculated too.  For example, when I try 
the following:


l <- loess(d~x+y, span=span, se=TRUE)
p <- predict(l, se=TRUE)


I get the following error message:

Error in vector("double", length) : vector size cannot be NA
In addition: Warning message:
In N * M1 : NAs produced by integer overflow


But when I try the following:

l <- loess(d~x+y, span=span, se=TRUE)
p <- predict(l, se=FALSE)


I have no problem, and p$fit gives me the desired fitted values.  Note 
that the only difference in this piece of code is se=FALSE.


My data d is a vector, and x and y are vectors too of the same length.

Any help will be greatly appreciated.

Thanks,
Saurav

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
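
One possible reading of the warning: with se=TRUE, predict() seems to need a working array with one entry per (observation, prediction point) pair, so the integer product N * M1 can exceed R's integer limit (about 2.1e9) once the data has more than roughly 46,000 rows. That is an interpretation of the message, not something confirmed in this thread. Under that assumption, a minimal sketch of a workaround is to ask for standard errors in chunks of newdata, so that each call handles only a few prediction rows. The simulated data, the helper name predict_se_chunked and the chunk size are made up for illustration.

```r
# Simulated stand-in for the poster's d, x and y (hypothetical data).
set.seed(1)
n <- 600
dat <- data.frame(x = runif(n), y = runif(n))
dat$d <- sin(4 * dat$x) + cos(3 * dat$y) + rnorm(n, sd = 0.1)

fit <- loess(d ~ x + y, data = dat, span = 0.3)

# Hypothetical helper: predict with standard errors a few rows at a time,
# so that (number of observations) * (rows per call) stays small.
predict_se_chunked <- function(fit, newdata, chunk_size = 200) {
  rows <- seq_len(nrow(newdata))
  idx <- split(rows, ceiling(rows / chunk_size))
  parts <- lapply(idx, function(i) {
    p <- predict(fit, newdata = newdata[i, , drop = FALSE], se = TRUE)
    data.frame(fit = as.numeric(p$fit), se = as.numeric(p$se.fit))
  })
  out <- do.call(rbind, parts)
  rownames(out) <- NULL
  out
}

res <- predict_se_chunked(fit, dat)
head(res)
```

Each call still needs memory proportional to the number of observations times the chunk size, so the chunk size has to shrink as the data set grows.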


Re: [R] cannot calculate standard estimate with predict on loess

2012-05-04 Thread Saurav Pathak

On 05/04/2012 10:39 AM, David Winsemius wrote:
> On May 3, 2012, at 7:10 PM, Saurav Pathak wrote:
>
>> Hi,
>>
>> For some reason I have been unable to use the predict function when I
>> desire the standard error to be calculated too.  For example, when I
>> try the following:
>>
>> l <- loess(d~x+y, span=span, se=TRUE)
>> p <- predict(l, se=TRUE)
>
> I don't know what effect the se=TRUE will have in the first call. As
> far as I can tell there is no such argument to loess(). Could it be
> that the extraneous argument is having an adverse effect on the effort
> to later use predict (which does have such an argument)?

Thanks for your reply.  I tried the following with the same error message:

l <- loess(d~x+y, span=span)
p <- predict(l, se=TRUE)

Saurav













> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT





[R] loess question

2011-10-07 Thread Saurav Pathak

Hi All,

I am trying to use loess to smooth a 2D image, and also obtain the 
standard error for every pixel.  I see that the standard error does not 
make sense.  For example, running the following:


library(stats)
x <- array(c(1:100), dim=c(100,100))
y <- t(x)
v <- exp(-((x-50)^2+(y-50)^2)/30^2)
s <- v*0.02
# one noise draw per pixel, each with its own standard deviation s
g_noise <- rnorm(length(v), mean = 0, sd = s)
f <- v + g_noise
grid <- data.frame(x = c(x), y = c(y), f = c(f))
f.loess <- loess(f ~ x + y, data = grid, span = 0.1)
# predict() on a loess fit takes newdata (not data) and has no span argument
f.predict <- predict(f.loess, newdata = grid, se = TRUE)

image(1:100, 1:100, matrix(f.predict$se.fit, nrow = 100))

I get an image of the standard error that has peaks at regular grid 
nodes.  Shouldn't I expect to see roughly the same error that I put in 
(in this case g_noise)?  I notice that the noise peaks move apart for 
higher span values.


Thanks for your help!
Saurav
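
A hedged note on what se.fit measures: it is the standard error of the fitted surface, not an estimate of the per-pixel noise, so it would not be expected to reproduce the noise that was put in. The regular grid pattern may come from loess's default surface = "interpolate", which evaluates the fit at the vertices of a k-d tree and blends in between. That explanation is an assumption, not something established in this thread. A minimal sketch, on a smaller simulated 30 x 30 image with made-up values, for comparing the default against loess.control(surface = "direct"), which evaluates the local fit exactly at every point:

```r
# Smaller simulated image (hypothetical values), one row per pixel.
set.seed(123)
grid <- expand.grid(x = 1:30, y = 1:30)
signal <- exp(-((grid$x - 15)^2 + (grid$y - 15)^2) / 10^2)
grid$f <- signal + rnorm(nrow(grid), sd = 0.02)

# Default: fit and standard errors are interpolated from k-d tree vertices.
fit_interp <- loess(f ~ x + y, data = grid, span = 0.1)

# Direct: the local regression is evaluated at every requested point.
fit_direct <- loess(f ~ x + y, data = grid, span = 0.1,
                    control = loess.control(surface = "direct"))

se_interp <- predict(fit_interp, newdata = grid, se = TRUE)$se.fit
se_direct <- predict(fit_direct, newdata = grid, se = TRUE)$se.fit

# Side-by-side images of the two standard-error surfaces.
op <- par(mfrow = c(1, 2))
image(1:30, 1:30, matrix(se_interp, nrow = 30), main = "interpolate")
image(1:30, 1:30, matrix(se_direct, nrow = 30), main = "direct")
par(op)
```

If the grid pattern disappears in the "direct" panel, the interpolation is the likely cause; the direct surface is slower on large images.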



[R] the c implementation of loess

2011-06-17 Thread Saurav Pathak

Hi All,

I am trying to trace the origin of the current loess implementation in
R.  The reference mentions that Prof Ripley based it on the 1998 version
of dloess.  When I look at dloess at http://www.netlib.org/a, the file
"changes" says that dloess was made available in 1992 and that a memory
leak was plugged in 1996; it does not mention 1998.  Is there another
version available?


Thanks,
Saurav



[R] 2d loess question

2011-06-14 Thread Saurav Pathak

Hi,

We have been trying to use loess on 2D data (basically a matrix) in the 
following way:


x <- 1:256
y <- 1:256
z <- data # input from data
z.loess = loess(z ~ x + y)

We get a 256 x 1 vector of fitted values and a 256 x 256 array of
residuals, but not a 256 x 256 array of fitted values.

Why would this be?  I think we are using loess incorrectly but can't 
figure out what is wrong.  I have looked at past messages on this 
mailing list and searched the web, without any more insight.  Any help 
would be much appreciated.


Thanks,
Saurav
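
loess() expects one row per observation, so a matrix usually has to be unrolled into long format (one row per pixel, with its x and y coordinates) before fitting. A minimal sketch with a simulated 20 x 20 matrix standing in for the real data; the names long, z.fitted and z.resid are made up for illustration:

```r
# Simulated stand-in for the input matrix (hypothetical values).
set.seed(1)
nx <- 20
ny <- 20
z <- outer(1:nx, 1:ny, function(i, j) sin(i / 4) + cos(j / 5)) +
  matrix(rnorm(nx * ny, sd = 0.05), nx, ny)

# One row per pixel. expand.grid() varies x fastest, which matches the
# column-major order of as.vector(z) when x indexes the rows of z.
long <- expand.grid(x = 1:nx, y = 1:ny)
long$z <- as.vector(z)

z.loess <- loess(z ~ x + y, data = long, span = 0.3)

# Fold the fitted values and residuals back into the shape of the image.
z.fitted <- matrix(fitted(z.loess), nrow = nx, ncol = ny)
z.resid <- matrix(residuals(z.loess), nrow = nx, ncol = ny)
dim(z.fitted)
```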



Re: [R] How to create MULTILEVELS in a dataset??

2009-10-19 Thread saurav pathak
Hi Ista,
You got that correct: yearctry is a composite created as yearctry =
year*1+country. For example, USA with country code 1 and year 2000 will
be 201; for year 2005 it will be 2005001. The years run from 2000 to
2008 for many countries; for the UK, say, it will be 244, 2005044 and so
on for the various years and countries. I am listing the result of
str(e) here:

'data.frame':   902533 obs. of  18 variables:
 $ yearctry: num  2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07
2e+07 ...
 $ discent : int  0 0 0 NA 0 1 0 0 0 NA ...
 $ age : int  51 46 26 24 19 18 20 19 25 19 ...
 $ gender  : int  1 2 1 1 1 1 1 1 1 1 ...
 $ gemeduc : int  0 0 111 111 111 111 111 111 111 111 ...
 $ gemhhinc: int  33 33 33 33 33 33 33 33 33 33 ...
 $ ref_group   : int  1 2 3 3 3 3 3 3 3 3 ...
 $ fearfail_ref: num  1 NA 0.473 0.473 0.473 ...
 $ knowent_ref : num  0 NA 0.484 0.484 0.484 ...
 $ nbgoodc_ref : num  NA 0 0.84 0.84 0.84 0.84 0.84 0.84 0.84 0.84 ...
 $ nbstatus_ref: num  NA 1 0.846 0.846 0.846 ...
 $ estbbuso_ref: num  0 0 0.0172 0.0172 0.0172 ...
 $ lngdp   : num  8.99 9.08 9.29 9.13 8.99 ...
 $ lngdpsq : num  19.5 19.4 19.2 19.4 19.5 ...
 $ es_gdppcppp : num  7995 8804 10872 9189 7995 ...
 $ sq_gdppcppp : num  3.01e+08 2.74e+08 2.10e+08 2.61e+08 3.01e+08 2.74e+08
2.10e+08 3.01e+08 2.10e+08 2.61e+08 ...
 $ estbbo_m: num  0.1063 0.078 0.049 0.0355 0.1063 ...
 $ es_gdpchg   : num  -10.9 8.837 9.179 -0.789 -10.9 ...

a portion of yearctry is also listed

 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07
2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07
2e+07 2e+07
[65391] 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07
2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07
2e+07 2e+07 2e+07
[65417] 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07
2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07
2e+07 2e+07 2e+07
[65443] 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07
2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07
2e+07 2e+07 2e+07
[65469] 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07
2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07
2e+07 2e+07 2e+07
[65495] 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07
2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07
2e+07 2e+07 2e+07
[65521] 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07
2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07
2e+07 2e+07 2e+07
[65547] 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07
2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07
2e+07 2e+07 2e+07
[65573] 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07
2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07
2e+07 2e+07 2e+07
[65599] 2e+07 2e+07 2e+0

Looking at the above, I don't know whether R recognises them as different
numbers. The exponential format used to display yearctry does not reveal
whether R takes the values as I explained above (i.e. whether it
distinguishes 201 from 2005001; all of them appear as 2e+07). If it does
not, how do I make R recognise the value as the number 201 and so on?
Stata also lists the values in exponential format, but I know that it
treats yearctry values as different for different years and countries.
Please help.

I have to shift to R because Stata is taking days and days to run gllamm.

Kindly help
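
For what it is worth, str() and the default print method only shorten the display; the stored numbers keep their full value. A minimal sketch for checking this, using made-up seven-digit codes (the real coding of yearctry is not recoverable from this post, so these values are assumptions):

```r
# Hypothetical composite codes; the real yearctry coding is not known here.
yearctry <- c(2000001, 2000044, 2005001, 2005044)

# Ask for fixed notation explicitly instead of the shortened display.
full_display <- format(yearctry, scientific = FALSE)
print(full_display)

# Count the distinct values actually stored.
n_distinct <- length(unique(yearctry))
print(n_distinct)

# A large scipen makes printing prefer fixed notation for the whole session.
old <- options(scipen = 100)
print(yearctry)
options(old)
```

If length(unique(e$yearctry)) already returns 1 on the real data, the values were collapsed before or during import rather than by the display.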



On Mon, Oct 19, 2009 at 2:19 PM, Ista Zahn istaz...@gmail.com wrote:

> Hi,
> Please keep r-help copied on the reply -- hopefully someone will pick
> up this thread and help us out.
>
> On Mon, Oct 19, 2009 at 2:17 AM, saurav pathak pathak.sau...@gmail.com
> wrote:
>> Dear Ista
>> Thanks for answering. The previous question was a primer to what I
>> wanted. I did just what you said, with yearctry below as the country
>> code or group variable, i.e. yearctry (data grouped by yearctry) was
>> the variable I was using to pass as the country id.
>
> I suggested using country as the grouping variable. What is yearctry?
> From the name it sounds like a composite of year and country.
>
>> Kindly notice that after running the lmer model, it recognises
>> yearctry as the group, but shows the number of groups as "groups:
>> yearctry, 1". This means it did not recognise yearctry as the variable
>> by which the data is grouped. The number should be 239 and not 1.
>
> That's weird. What does
>
> str(e)
>
> say?
>
>> But please see below:
>>
>> My data set is e
>>
>> names(e)
>>  [1] "yearctry"     "discent"      "age"          "gender"       "gemeduc"
>>  [6] "gemhhinc"     "ref_group"    "fearfail_ref" "knowent_ref"  "nbgoodc_ref"
>> [11] "nbstatus_ref" "estbbuso_ref" "lngdp"        "lngdpsq"      "es_gdppcppp"
>> [16] "sq_gdppcppp"  "estbbo_m"     "es_gdpchg"
>>
>> here I have variables representing two levels, namely

[R] How to create MULTILEVELS in a dataset??

2009-10-18 Thread saurav pathak
Dear R users

I have a data set with five variables: one dependent variable y and four
independent variables (education_level, householdincome, countrygdp and
countrygdpsquare). The first two describe the individual and the next two
correspond to the country to which the individual belongs. My data set
does not make this distinction between individual level and country
level. Is there a way to make R treat countrygdp and countrygdpsquare as
being at a different level than the individual-level data? In other
words, I wish to transform my dataset so that it recognizes the two
individual-level variables as Level-1 and the two country-level variables
as Level-2.

I need to run a multilevel model, but first I must make my dataset
recognise data at Level-1 and Level-2. How can I create this country-level
group (gdp and gdp^2) so that I can fit a multilevel model as follows:

lmer(y ~ education_level + householdincome + countrygdp + countrygdpsquare +
(1 | Level2), family=binomial(link="probit"), data=dataset)

Please kindly help me with the relevant commands for creating this Level2
(having two variables)

Thanks
Saurav
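
As far as I know, lme4 has no separate step for declaring levels: country-level variables are simply repeated on every individual row, and the level structure comes from the grouping factor in the formula. A minimal sketch under that assumption, with simulated data. The data frames people and countries, all values, and the use of glmer() (current lme4 releases use it instead of lmer() for binary outcomes) are illustrative choices, not the poster's setup; the model is only fitted if lme4 is installed.

```r
set.seed(1)

# Level-2 table: one row per country (hypothetical, centred GDP values).
countries <- data.frame(
  country = LETTERS[1:12],
  countrygdp = seq(-1.1, 1.1, length.out = 12)
)
countries$countrygdpsquare <- countries$countrygdp^2

# Level-1 table: one row per individual.
n <- 600
people <- data.frame(
  country = sample(countries$country, n, replace = TRUE),
  education_level = sample(0:3, n, replace = TRUE),
  householdincome = rnorm(n)
)

# Copy the country-level values onto every individual row of that country.
dataset <- merge(people, countries, by = "country")
dataset$country <- factor(dataset$country)
dataset$y <- rbinom(n, 1, pnorm(-0.3 + 0.2 * dataset$education_level +
                                  0.3 * dataset$countrygdp))

# The grouping factor in (1 | country) is what defines Level-2.
if (requireNamespace("lme4", quietly = TRUE)) {
  fit <- lme4::glmer(
    y ~ education_level + householdincome + countrygdp + countrygdpsquare +
      (1 | country),
    family = binomial(link = "probit"), data = dataset
  )
  print(lme4::fixef(fit))
}
```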





Dr.Saurav Pathak
PhD, Univ.of.Florida
Mechanical Engineering
Doctoral Student
Innovation and Entrepreneurship
Imperial College Business School
s.patha...@imperial.ac.uk
0044-7795321121



[R] lmer function and Inverse mills ratio

2009-10-17 Thread saurav pathak
Dear R users,
I have two questions. I have been on this problem for the last 3 months,
please help.

First question:

How can I use the lmer function for a three-level probit (i.e. please
help me with the command syntax)?

The second question is:

How can I then calculate the Inverse Mills Ratio after the above probit
has been fitted using lmer? If lmer does not do this, is there any other
way to get the IMR of a three-level probit?

Kindly help
Thanks
Saurav



[R] how to cluster data for use with lmer

2009-10-17 Thread saurav pathak
Dear R users
My data set is e

> names(e)
 [1] "yearctry"     "discent"      "age"          "gender"       "gemeduc"
 [6] "gemhhinc"     "ref_group"    "fearfail_ref" "knowent_ref"  "nbgoodc_ref"
[11] "nbstatus_ref" "estbbuso_ref" "lngdp"        "lngdpsq"      "es_gdppcppp"
[16] "sq_gdppcppp"  "estbbo_m"     "es_gdpchg"

Here I have variables representing two levels, namely individual level and
country level, so my data is two-level data. The country-level variables
(level-2) are lngdp, lngdpsq, es_gdppcppp, sq_gdppcppp, estbbo_m and
es_gdpchg, grouped by yearctry; the rest of the variables are
individual level (level-1).

The number of individual observations is 655078 and the number of yearctry
groups is 239. However, when I fit a probit to see the influence of four
individual-level variables (age, gender, gemeduc and gemhhinc) and one
country-level variable (es_gdppcppp) using

> prb1 <- lmer(discent ~ age + gender + gemeduc + gemhhinc + es_gdppcppp +
+     (1 | yearctry), family = binomial(link = "probit"), data = e)

I get

Generalized linear mixed model fit by the Laplace approximation
Formula: discent ~ age + gender + gemeduc + gemhhinc + es_gdppcppp + (1 | yearctry)
   Data: e
    AIC    BIC logLik deviance
 194043 194122 -97014   194029
Random effects:
 Groups   Name        Variance   Std.Dev.
 yearctry (Intercept) 4.0708e-06 0.0020176
Number of obs: 655078, groups: yearctry, 1
Fixed effects:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -7.578e-01  1.839e-02  -41.20  < 2e-16 ***
age         -2.441e-03  2.990e-04   -8.16 3.30e-16 ***
gender      -2.886e-01  7.710e-03  -37.43  < 2e-16 ***
gemeduc      9.244e-05  6.930e-06   13.34  < 2e-16 ***
gemhhinc    -8.938e-07  1.359e-07   -6.58 4.75e-11 ***
es_gdppcppp -2.459e-05  2.691e-07  -91.40  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
            (Intr) age    gender gemedc gmhhnc
age         -0.580
gender      -0.563 -0.138
gemeduc     -0.373  0.166  0.011
gemhhinc    -0.009 -0.132 -0.024 -0.201
es_gdppcppp -0.490  0.071  0.314 -0.297  0.256
The model did not recognise the groups in yearctry: the line "Number of
obs: 655078, groups: yearctry, 1" shows 1 group instead of 239. Can
somebody help me make the model recognise es_gdppcppp as a country-level
variable grouped by yearctry (so that the number of yearctry groups is
239)?

Please help
Thanks in advance
Saurav
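
Before refitting, it may help to check what the grouping variable looks like inside R. A minimal sketch on a toy data frame (the codes and GDP values are made up) that counts the distinct groups, converts the variable to a factor, and verifies that the country-level variable is constant within each group:

```r
# Toy stand-in for e: three groups of four individuals (hypothetical values).
e <- data.frame(
  yearctry = rep(c(2000001, 2000044, 2001001), each = 4),
  es_gdppcppp = rep(c(7995, 8804, 10872), each = 4)
)

# How many distinct groups does R actually see?
n_groups <- length(unique(e$yearctry))
print(n_groups)

# Group sizes; on the real data this table should have 239 entries.
print(table(e$yearctry))

# An explicit factor makes the grouping unambiguous in the model formula.
e$yearctry_f <- factor(e$yearctry)

# A level-2 variable should take exactly one value within each group.
values_per_group <- tapply(e$es_gdppcppp, e$yearctry_f,
                           function(v) length(unique(v)))
print(values_per_group)
```

If n_groups comes back as 1 on the real data, the problem lies in how yearctry was built or imported, not in the model formula.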








[R] Inverse Mills in clustered (multilevel) cross-sectional panel data

2009-09-08 Thread saurav pathak
Dear R saviors,
kindly address this problem; I would really appreciate any takers. I have
been trying to resolve this issue of the IMR in clustered (multilevel)
cross-sectional panel data for more than two months now.

The characteristics of my dataset are as follows:
-   some 900 000 individuals
-   total of 60 countries
-   cross-sectional time series at the country level max 10 years, not all
countries included every year

For each country, we have a maximum of 10 cross sectional samples (1 per
year) of at least 2000 adult-age individuals (random selection). But,
individuals are not followed over time. Every year a new random sampling is
carried out.
I am interested in analysing individuals' behaviors in a given economic
activity -- entrepreneurship. To do this, I first need to control for the
fact that some individuals self-select into entrepreneurship. This
self-selection may be influenced by individual-level characteristics (such
as age, gender, education etc.) as well as country-level factors (e.g.,
taxation). Because both individual- and country-level factors may drive
both self-selection and behavior once self-selection has occurred,
multi-level techniques are required for the selection equation. How can I
do this in R? The results of this selection equation would then be used as
a control in the second stage, where an OLS is to be run.

Thank you for any suggestions






[R] PROBIT REGRESSION FOR GROUPED/CLUSTERED DATA

2009-07-16 Thread saurav pathak
Hello all

I have been working to fix this for weeks now. It should be simple to fix.
Please help.

Let me explain what I am doing. I have a data set for 65 countries over a
period of 9 years (2000-2008). Each country has on average, say, 2000
interviews, so the total set has roughly 65*9*2000 data
points/observations (of course there are missing values as well). Now let
me explain how the data are clustered or grouped. I use the variable
yearctry, which is computed as year*1+ international phone code of the
country. For example, USA with calling code 001 for the year 2000 will
have a yearctry value = 201. Under this particular yearctry value of 201
there are roughly 2000 observations. For the same year, the yearctry value
for the UK would be 244 (again roughly 2000 observations), and so on for
the rest of the 63 countries for the year 2000 and all other years from
2000 to 2008. For the year 2001, the values of yearctry for USA and UK
would be 2001001 and 2001044 respectively (again roughly 2000
observations for each country), and so on for the other 63 countries as
well. So the data set is grouped/clustered using yearctry.

I am trying to look into a selection bias, if any, within each yearctry
value (i.e. 2000 observations for one country for 9 years, and so on for
65 countries). Essentially, therefore, I wish to check 65*9 values of
yearctry, each having roughly 2000 observations. Hence I use glm/probit to
look into the selection bias, where all my dependent variables are either
0 or 1. The formula

myProbit <- glm(s ~ age + gender + gemedu + gemhinc + es_gdppc +
imf_pop + estbbo_m, family = binomial(link = "probit"), data =
adpopdata)

is the Heckman selection equation based on all observations, without
taking into account the fact that each yearctry is unique. I want the
selection equation to recognise the uniqueness of each yearctry value:
take one yearctry at a time, estimate the probit, go to the next
yearctry, repeat the probit regression, and then give me the result. At
the moment I do not accomplish that with the above formula, which runs
the regression on a bulk basis. I wish that it distinguished one yearctry
from the other, performed the regression for all yearctry values and
finally produced the result.

Is there any model other than glm that is recommended for this job? If
yes, please explain how.

Let me give you the exact command that Stata uses, so that things become
very clear:

xtprobit s age gender gemeduc gemhinc es_gdppc imf_pop estbbo_m,
i(yearctry)

This does exactly what I wish to accomplish in R, i.e. it runs the Heckman
selection equation for the selection variables (seven in my case) based
upon the uniqueness of yearctry.

I have worked for weeks on this, kindly help me. I think it is a small
issue to fix in the equation, but since I am new to R, I do not know what
exactly will fix my problem, so any help will be highly appreciated.
Thanks
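
Two different things are described above, and they are worth separating. Taken literally, "one yearctry at a time" means a separate probit per group, which base R can do with split() and lapply(); a minimal sketch follows, on simulated data with made-up codes and only two regressors. Stata's xtprobit, as far as I know, fits a random-effects probit by default, which is a single model with a group-level intercept rather than many separate fits; the closer R analogue would probably be a mixed model such as lme4::glmer(s ~ ... + (1 | yearctry), family = binomial(link = "probit")), which is not run here.

```r
# Simulated stand-in for adpopdata: three hypothetical yearctry groups.
set.seed(2)
groups <- c(2000001, 2000044, 2001001)
d <- data.frame(
  yearctry = rep(groups, each = 300),
  age = round(runif(900, 18, 64)),
  gender = rbinom(900, 1, 0.5)
)
d$s <- rbinom(900, 1, pnorm(-0.5 - 0.01 * (d$age - 40) + 0.3 * d$gender))

# Fit one probit per group, one yearctry at a time.
fits <- lapply(split(d, d$yearctry), function(g) {
  glm(s ~ age + gender, family = binomial(link = "probit"), data = g)
})

# One row of coefficients per yearctry value.
coefs <- t(sapply(fits, coef))
print(round(coefs, 3))
```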



[R] DECLARING A PANEL VARIABLE???

2009-07-15 Thread saurav pathak
Hi
I am working on panel data. My data are clustered/grouped by the variable
yearctry. I am running the regression below, but I can't make it
recognise yearctry as the panel variable:

myProbit <- glm(s ~ age + gender + gemedu + gemhinc + es_gdppc +
imf_pop + estbbo_m, family = binomial(link = "probit"), data =
adpopdata)

Can anyone help me do this please???

Thanks
Saurav
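
glm() itself has no notion of a panel variable. One simple option, sketched below on simulated data with made-up codes, is to add the grouping variable as a factor so that every yearctry gets its own intercept. This is a fixed-effects style specification, not what Stata's xtprobit estimates, and with many small groups the dummy-variable probit is known to give biased estimates, so treat it as an illustration rather than a recommendation.

```r
# Simulated stand-in for adpopdata (hypothetical codes and effects).
set.seed(3)
d <- data.frame(
  yearctry = rep(c(2000001, 2000044, 2001001), each = 300),
  age = round(runif(900, 18, 64))
)
group_shift <- c(-0.6, -0.2, 0.1)[as.integer(factor(d$yearctry))]
d$s <- rbinom(900, 1, pnorm(group_shift - 0.01 * (d$age - 40)))

# factor(yearctry) adds one intercept shift per group after the first.
fit_fe <- glm(s ~ age + factor(yearctry),
              family = binomial(link = "probit"), data = d)
print(coef(fit_fe))
```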



[R] ERROR message while using -invMillsRatio()

2009-07-12 Thread saurav pathak
Hi
I have been trying many different things to get the Inverse Mills Ratio
for a two-stage Heckman model. What I have tried so far is listed below,
up to the point where I get an error, which happens at the last command.
If this had been successful, I could have used the IMR as a control in my
OLS (the OLS for the outcome equation). What I see is that the number of
rows of the calculated IMR and the number of rows in the actual data set
do not match, so the IMR could not be added to my original data set. How
do I fix this and then get the correct IMR to use in my outcome equation?
Can someone please help?

> load("C:\\adpopdata1.Rdata")
> ls()
[1] "adpopdata"
> attach(adpopdata)
> names(adpopdata)

 [1] "ctry"       "setid"      "gender"     "age"        "year"
 [6] "ownmge"     "busang"     "futsup"     "discent"    "knowent"
[11] "opport"     "suskill"    "fearfail"   "nbgoodc"    "nbstatus"
[16] "sunowjob"   "suyr5job"   "omnowjob"   "omyr5job"   "gemwork3"
[21] "suboanw"    "babybuso"   "estbbuso"   "suboan_m"   "babybo_m"
[26] "estbbo_m"   "es_gdppc"   "ief_fisc"   "yearctry"   "ief_ipr"
[31] "wb_law"     "wb_corr"    "imf_pop"    "imf_pgro"   "es_gdpch"
[36] "sq_gdppc"   "ln_oy5"     "gemedu"     "gemhinc"    "ipr_edu"
[41] "ipr_hinc"   "age_sq"     "ln_oy5_a"   "ln_oy5_1"   "ln_sy5"
[46] "ln_osy5"    "ln_oy5_e"   "X_est_full" "o_tmitr"    "o_citr"
[51] "o_tdtr"     "ln_oy5_2"   "invmills"   "ln_onj"     "ln_diff"
[56] "omjoagro"   "omjorgro"   "ln_orgro"   "ln_oagro"   "sunowjo1"
[61] "osnowjob"   "ln_osnow"   "osyr5job"   "osjorgro"   "ln_osrgr"
[66] "ln_osagr"   "s_nacnew"   "p"          "phi"        "capphi"
[71] "invmil_4"   "subobaby"   "fearf_m"    "disc_m"     "dis_fear"
[76] "fearf_r"    "fearf_rm"   "dis_ff_r"   "s"

> library(sampleSelection)
> myProbit <- glm(s ~ age + gender + gemedu + gemhinc + es_gdppc +
+ imf_pop + estbbo_m, family = binomial(link = "probit"))
> summary(myProbit)

Call:
glm(formula = s ~ age + gender + gemedu + gemhinc + es_gdppc +
    imf_pop + estbbo_m, family = binomial(link = "probit"))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.7660  -0.3053  -0.2462  -0.1984   3.2166

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.439e+00  2.985e-02 -81.714  < 2e-16 ***
age         -6.436e-03  3.743e-04 -17.193  < 2e-16 ***
genderMALE   1.785e-01  9.424e-03  18.945  < 2e-16 ***
gemedu       2.128e-02  4.698e-03   4.528 5.95e-06 ***
gemhinc      1.062e-01  6.185e-03  17.166  < 2e-16 ***
es_gdppc     9.765e-06  1.400e-06   6.977 3.02e-12 ***
imf_pop      4.131e-06  1.707e-05   0.242    0.809
estbbo_m     5.932e+00  1.088e-01  54.539  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 84763  on 257357  degrees of freedom
Residual deviance: 80515  on 257350  degrees of freedom
  (85893 observations deleted due to missingness)
AIC: 80531

Number of Fisher Scoring iterations: 6

> adpopdata$IMR <- invMillsRatio(myProbit)$IMR1
Error in `$<-.data.frame`(`*tmp*`, "IMR", value = c(2.50039945424535,  :
  replacement has 257358 rows, data has 343251








[R] Heckman Selection Model/Inverse Mills Ratio

2009-07-11 Thread saurav pathak
I have so far used the following command:

glm(formula = s ~ age + gender + gemedu + gemhinc + es_gdppc +
imf_pop + estbbo_m, family = binomial(link = "probit"))

My questions are:
1. How do I discard the non-significant selection variables (one of the
seven variables above is non-significant) and calculate the Inverse Mills
Ratio from the significant variables?

2. I need the Inverse Mills Ratio from the above to run the outcome
equation using OLS, with the Inverse Mills Ratio as the control for
selection bias. How do I get the IMR?

3. How can this be done in R using my approach, or is there another way of
doing what I wish to achieve?
Please help
Thanks
Saurav
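
For reference, the textbook two-step procedure can be written in a few lines of base R: a probit for selection, the Inverse Mills Ratio phi(xb) / Phi(xb) from its linear predictor, then OLS on the selected rows with the IMR as an extra regressor. The sketch below runs on simulated data with made-up variable names (z, x, y); the plain lm() standard errors in the second step are not corrected for the estimated IMR, which dedicated Heckman routines handle.

```r
# Simulated selection problem (all names and values are hypothetical).
set.seed(7)
n <- 2000
z <- rnorm(n)                      # affects selection only
x <- rnorm(n)                      # affects selection and outcome
u <- rnorm(n)                      # selection error
e <- 0.6 * u + rnorm(n, sd = 0.8)  # outcome error, correlated with u
s <- as.integer(0.5 + z + 0.5 * x + u > 0)
y <- ifelse(s == 1, 1 + 2 * x + e, NA)   # outcome observed only if selected

# Step 1: probit selection equation on all observations.
probit <- glm(s ~ z + x, family = binomial(link = "probit"))
xb <- predict(probit, type = "link")
imr <- dnorm(xb) / pnorm(xb)

# Step 2: OLS on the selected rows, with the IMR as a control.
outcome <- lm(y ~ x + imr, subset = s == 1)
print(coef(summary(outcome)))
```

Dropping a non-significant selection variable just means refitting step 1 without it before computing the IMR.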



[R] Modifying Data from a Panel for Probit Regression

2009-07-10 Thread saurav pathak
Hi
I have two questions.

1. I am working on panel data. The variable A has some missing values,
denoted by ".", and the others are non-missing. I wish to create another
variable B from A such that all the missing values are coded as ZERO (0)
and all the non-missing values as ONE (1). This is preparation for running
a probit regression with B as the dependent variable, hence we need
0 <= B <= 1. The expected outcome should look like this:

A          B
.          0
.          0
.          0
.          0
.6931472   1
.          0
.          0
.          0
.6931472   1
.          0
.          0
.          0
.          0
.          0
1.098612   1

2. How do I then check (with what command) whether the desired outcome for
B has occurred, in other words how do we tabulate B?
Kindly provide the commands/operations needed to attain the above
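
Stata's "." missing value normally arrives in R as NA, so both steps are short. A minimal sketch using the fifteen values from the table above, typed in by hand:

```r
# The column A from the example, with Stata's "." written as NA.
A <- c(NA, NA, NA, NA, 0.6931472, NA, NA, NA, 0.6931472,
       NA, NA, NA, NA, NA, 1.098612)

# B is 1 where A is present and 0 where it is missing.
B <- as.integer(!is.na(A))

# Tabulate B, and cross-check it against the missingness of A.
counts <- table(B)
print(counts)
print(table(B, A_missing = is.na(A)))
```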



[R] GLM for Probit for Panel Data

2009-07-10 Thread saurav pathak
Hello
I am working on panel data. My panel variable is yearctry. Let me explain
what I mean: yearctry is calculated from the year and the ISD phone code
of a country. For example, for the year 2000 and the country USA (code =
001), my yearctry variable will be 201. There are 2000 observations (i.e.
2000 individual responses with yearctry = 201). I have 65 different
countries in the survey and a time span from 2000 to 2008, i.e. 9 years.

My question is:

How do I make R recognise yearctry as my panel variable when I am running
a glm like the one below?

glm(formula = s ~ age + gender + gemedu + gemhinc + es_gdppc +
imf_pop + estbbo_m, family = binomial(link = "probit"))

What changes or additions do I make to the probit regression command
above so that the variable yearctry is passed as the panel variable?

Kindly help
Thanks in advance
Saurav



[R] getOptions(max.print) in R

2009-07-01 Thread saurav pathak
I am typing the following on the command prompt:

variab = read.csv(file.choose(), header=T)

variab

It lists 900,000 ( this is the total number of observations in variab )
minus 797124 observations and prompts the following message

[ reached getOption("max.print") -- omitted 797124 entries ]]

Is there a way to see the entire set of data, ie all of 900,000 obs, and how
to then save this variab

Thanks
Saurav
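
The message only concerns printing; the object itself holds every row. A minimal sketch, on a small simulated data frame standing in for variab, of raising the print limit, inspecting the data without printing all of it, and saving it to disk. The file locations are temporary paths chosen for illustration.

```r
# Small stand-in for the 900,000-row data frame (hypothetical values).
variab <- data.frame(id = 1:5000, value = rnorm(5000))

# Raise the print limit if a full listing is really wanted.
old <- options(max.print = 20000)

# Usually more practical: look at the size, structure and first rows.
print(dim(variab))
str(variab)
print(head(variab, 10))

# Save as CSV (portable) or as RDS (exact R object).
csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")
write.csv(variab, csv_path, row.names = FALSE)
saveRDS(variab, rds_path)

# Read the RDS copy back to confirm that nothing was lost.
restored <- readRDS(rds_path)
options(old)
```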



[R] Stata file and R Interaction :File Size Problem in Import

2009-06-30 Thread saurav pathak
Hi

I am using Stata 10 and I need to import a Stata 10 data set into R. I
have also saved the dataset in older Stata formats using the saveold
command in Stata.

My RAM is 4 GB and the Stata file is 600 MB. I am getting an error message
which says:

Error: cannot allocate vector of size 3.4 Mb
In addition: There were 50 or more warnings (use warnings() to see the
first 50)

Thus far I have already tried the following:

1. By right-clicking on the R icon, I have added --max-mem-size=1000M to
the target under the properties of the R icon.

2. I have used library(foreign) at the command prompt.

3. Then I use trialfile <- read.dta("C:/filename.dta")

Here I get the error for a Stata data file that is 600 MB in size.
However, with data sets of about 200 KB in Stata 10 and Stata 8 format, I
have been able to import the Stata file into R successfully.

I am therefore confused about whether there is a problem with the version
of my Stata file (which should not be the case, as the smaller files of
both versions work fine) or whether it is a size issue.

It is pretty important for me, kindly address this question.
 Thanks
 Saurav
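
A rough size estimate may help explain the error. R stores numeric columns as 8-byte doubles, so the in-memory size can be estimated from the number of rows and columns, and an import typically needs a multiple of that while it runs. A 1000M limit is therefore likely to be too small for a 600 MB file. The row and column counts below are assumptions borrowed from other posts by the same author (about 900,000 observations and 79 variables), not figures given in this message.

```r
# Assumed dimensions (not stated in this post).
n_rows <- 900000
n_cols <- 79

# Eight bytes per numeric cell, ignoring the small per-column overhead.
bytes_needed <- n_rows * n_cols * 8
mb_needed <- bytes_needed / 1024^2
cat("approximate in-memory size:", round(mb_needed), "MB\n")

# Cross-check the 8-bytes-per-cell rule on a small numeric data frame.
small <- as.data.frame(matrix(0, nrow = 1000, ncol = n_cols))
bytes_per_cell <- as.numeric(object.size(small)) / (1000 * n_cols)
cat("measured bytes per cell:", round(bytes_per_cell, 2), "\n")
```

Under these assumptions the data alone need more than half of the 1000M limit, before any temporary copies made during the import.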

-- 
Dr.Saurav Pathak
PhD, Univ.of.Florida
Mechanical Engineering
Doctoral Student
Innovation and Entrepreneurship
Imperial College Business School
s.patha...@imperial.ac.uk
0044-7795321121
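
A hedged sketch related to the steps above. The size in a "cannot allocate
vector" error is the size of the single allocation that failed, not the
total memory in use, so the message is consistent with R having already hit
its memory cap while reading the file. The example below round-trips a
small toy data set through the foreign package and compares the size on
disk with the size in memory; the toy data frame and the temporary path are
assumptions standing in for the real 600 MB file.

```r
library(foreign)  # recommended package distributed with R

# Hypothetical toy data standing in for the real data set
set.seed(1)
toy <- data.frame(
  id    = 1:1000,
  score = rnorm(1000),
  group = rep(c("a", "b"), 500)
)

# Write a Stata file to a temporary location, then read it back
dta_path <- tempfile(fileext = ".dta")
write.dta(toy, dta_path)
trialfile <- read.dta(dta_path, convert.factors = FALSE)

# Compare the file size on disk with the object size in memory (in MB)
file_mb <- file.info(dta_path)$size / 1024^2
mem_mb  <- as.numeric(object.size(trialfile)) / 1024^2
cat(sprintf("on disk: %.3f MB, in memory: %.3f MB\n", file_mb, mem_mb))
```

Running the same comparison on one of the small files that does import may
give a rough idea of how much memory the large file needs relative to the
--max-mem-size setting.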



[R] Stata file Import and Analysis in R

2009-06-29 Thread saurav pathak
Hi,
I have a Stata data set (.dta file) of size 600 MB. I need to import it into
R and do a 2SLS multilevel analysis on the data set. I would be grateful for
help with the first part: how to import this big file from Stata into R, and
then how to open the imported Stata file in R.
Kindly help.
Thanks in advance

-- 
Dr.Saurav Pathak
PhD, Univ.of.Florida
Mechanical Engineering
Doctoral Student
Innovation and Entrepreneurship
Imperial College Business School
s.patha...@imperial.ac.uk
0044-7795321121
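
On the "how to open" part of the question, a small sketch: a Stata file
imported into R becomes an ordinary data frame, so there is no separate
opening step, only inspection. The data frame below is made-up toy data
used in place of an imported file, and its column names are hypothetical.

```r
# Hypothetical toy data frame standing in for an imported Stata file
trialfile <- data.frame(
  country = c("UK", "US", "IN"),
  gdp     = c(2.1, 14.2, 1.3)
)

dim(trialfile)          # number of rows and columns
names(trialfile)        # variable names
str(trialfile)          # type and first values of each variable
head(trialfile)         # first few rows
summary(trialfile$gdp)  # basic statistics for one variable
```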



[R] Large Stata file Import in R

2009-06-29 Thread saurav pathak
Hi,

I am using Stata 10 and need to import a Stata 10 data set into R. I have
also saved the data set in older Stata formats using the saveold command in
Stata.

My machine has 4 GB of RAM and the Stata file is 600 MB. I get an error
message which says:

Error: cannot allocate vector of size 3.4 Mb
In addition: There were 50 or more warnings (use warnings() to see the first
50)

Thus far I have tried the following:

1. I right-clicked on the R icon and added --max-mem-size=1000M to the
Target field under Properties.
2. I ran library(foreign) at the command prompt.
3. I then ran trialfile <- read.dta("C:/filename.dta")

This gives the error for the Stata data file that is 600 MB in size.
However, with data sets of about 200 KB in both Stata 10 and Stata 8
format, I have been able to import the Stata file into R successfully.

I am therefore confused whether there is a problem with the version of my
Stata file (which should not be the case, as the smaller files in both
versions work fine) or whether it is a size issue.

It is pretty important for me; kindly address this question.
Thanks,
Saurav


-- 
Dr.Saurav Pathak
PhD, Univ.of.Florida
Mechanical Engineering
Doctoral Student
Innovation and Entrepreneurship
Imperial College Business School
s.patha...@imperial.ac.uk
0044-7795321121



[R] Summing over an index of an array

2008-02-19 Thread Saurav Pathak
Hi,

I cannot seem to figure out how to sum over an index of an array.
For example, let A be a 3-dimensional array.  I want to, say, find
the sum over the first dimension.  That is,

S_jk = Sum_i A_ijk

where S is now a 2-dimensional matrix.  I don't want to use a loop.

Thanks,
-- 
saurav
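
A short sketch of the sum described above, S_jk = Sum_i A_ijk, without an
explicit loop. The array A and its dimensions are made up for illustration.
Two base R routes are shown: apply() over the dimensions to keep, and
colSums(), which sums over the first dimension of an array by default.

```r
# Hypothetical 2 x 3 x 4 array for illustration
A <- array(1:24, dim = c(2, 3, 4))

# Keep dimensions 2 and 3 (j and k), summing over dimension 1 (i)
S_apply <- apply(A, c(2, 3), sum)

# colSums() with its default dims = 1 sums over the first dimension
S_col <- colSums(A)

dim(S_apply)   # 3 4
```

To sum over a different index, change the margins passed to apply(); for
example apply(A, c(1, 3), sum) sums over the second dimension.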
