[R] cannot calculate standard estimate with predict on loess
Hi,

For some reason I have been unable to use the predict function when I also want the standard errors to be calculated. For example, when I try the following:

l <- loess(d ~ x + y, span = span, se = TRUE)
p <- predict(l, se = TRUE)

I get the following error message:

Error in vector("double", length) : vector size cannot be NA
In addition: Warning message:
In N * M1 : NAs produced by integer overflow

But when I try the following:

l <- loess(d ~ x + y, span = span, se = TRUE)
p <- predict(l, se = FALSE)

I have no problem, and p gives me the desired fitted values. Note that the only difference in this piece of code is se = FALSE. My data d is a vector, and x and y are vectors of the same length.

Any help will be greatly appreciated.

Thanks,
Saurav

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
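The warning names a product, N * M1, computed in integer arithmetic. A plausible reading (an assumption, not confirmed in this thread) is that predict() with se = TRUE builds a work array with one entry per pair of observation and prediction point, so predicting at all N observations needs N * N entries, which exceeds .Machine$integer.max (2^31 - 1) once N is above roughly 46,340. The sketch below shows the overflow in isolation and, under that assumption, a workaround: request standard errors only at a modest grid of points through newdata. The simulated data, span and grid size are illustrative choices.

```r
# Integer products in R overflow past .Machine$integer.max (2147483647)
n_obs <- 50000L
overflowed <- suppressWarnings(n_obs * n_obs)  # NA: 2.5e9 does not fit in an integer
as_double <- as.numeric(n_obs) * n_obs         # the same product as a double is fine

# Hypothetical data standing in for d, x and y from the post
set.seed(1)
n <- 2000
x <- runif(n)
y <- runif(n)
d <- sin(2 * pi * x) + cos(2 * pi * y) + rnorm(n, sd = 0.1)
l <- loess(d ~ x + y, span = 0.3)

# Ask for standard errors on a 20 x 20 grid instead of at every observation
grid <- expand.grid(x = seq(0.1, 0.9, length.out = 20),
                    y = seq(0.1, 0.9, length.out = 20))
p <- predict(l, newdata = grid, se = TRUE)
str(p$se.fit)
```

With se = TRUE the result is a list with components fit, se.fit, residual.scale and df; with se = FALSE and no newdata, predict() returns the fitted values directly.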
Re: [R] cannot calculate standard estimate with predict on loess
On 05/04/2012 10:39 AM, David Winsemius wrote:
> On May 3, 2012, at 7:10 PM, Saurav Pathak wrote:
>> For example, when I try the following:
>> l <- loess(d ~ x + y, span = span, se = TRUE)
>> p <- predict(l, se = TRUE)
>
> I don't know what effect the se = TRUE will have in the first call. As far as I can tell there is no such argument to loess(). Could it be that the extraneous argument is having an adverse effect on the later attempt to use predict (which does have such an argument)?
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT

Thanks for your reply. I tried the following and got the same error message:

l <- loess(d ~ x + y, span = span)
p <- predict(l, se = TRUE)

Saurav
[R] loess question
Hi All,

I am trying to use loess to smooth a 2D image and also to obtain the standard error for every pixel, but the standard error I get does not make sense. For example, I run the following:

library(stats)
x <- array(c(1:100), dim = c(100, 100))
y <- t(x)
v <- exp(-((x - 50)^2 + (y - 50)^2) / 30^2)
s <- v * 0.02
g_noise <- rnorm(1, mean = 0, sd = s)
f <- v + g_noise
f.loess <- loess(f ~ x + y, span = 0.1, data.frame(x = c(x), y = c(y), f = c(f)))
f.predict <- predict(f.loess, data = data.frame(x = c(x), y = c(y), f = c(f)), span = 0.1, se = TRUE)
image(1:100, 1:100, matrix(f.predict$se, nrow = 100))

I get an image of the standard error that has peaks at regular grid nodes. Shouldn't I expect to see roughly the same error that I put in (in this case g_noise)? I notice that the peaks move farther apart for higher span values.

Thanks for your help!
Saurav
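Three details in the script above are worth checking. rnorm(1, ...) draws a single value, so the same constant is added to every pixel; one draw per pixel needs n = length(s). predict() for loess objects takes newdata, not data, and has no span argument, so both of those arguments are ignored. Finally, se.fit is the standard error of the fitted surface, not an estimate of the added noise; the noise level is estimated by residual.scale. The regular peaks may come from the default surface = "interpolate", which evaluates the fit at kd-tree vertices and interpolates between them, whereas loess.control(surface = "direct") fits at every point. This is a sketch on a smaller 30 x 30 image (chosen only to keep it fast), not a diagnosis of the original output.

```r
set.seed(42)
n <- 30                                     # smaller image than the 100 x 100 in the post
x <- matrix(1:n, nrow = n, ncol = n)        # row index of each pixel
y <- t(x)                                   # column index of each pixel
v <- exp(-((x - n / 2)^2 + (y - n / 2)^2) / (n / 3)^2)
s <- v * 0.02
noise <- rnorm(length(s), mean = 0, sd = c(s))   # one draw per pixel
f <- v + noise
dat <- data.frame(x = c(x), y = c(y), f = c(f))

# surface = "direct" fits at each point instead of interpolating from kd-tree vertices
fit <- loess(f ~ x + y, data = dat, span = 0.1,
             control = loess.control(surface = "direct"))
pred <- predict(fit, newdata = dat, se = TRUE)

pred$residual.scale        # pooled estimate of the noise standard deviation
summary(pred$se.fit)       # uncertainty of the fitted surface itself
# image(1:n, 1:n, matrix(pred$se.fit, nrow = n))   # view the standard-error map
```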
[R] the c implementation of loess
Hi All,

I am trying to trace the origin of the current loess implementation in R. The reference mentions that Prof. Ripley based it on the 1998 version of dloess. When I look at dloess in http://www.netlib.org/a, the file "changes" mentions that dloess was made available in 1992 and that a memory leak was plugged in 1996, with no mention of 1998. Is there another version available?

Thanks,
Saurav
[R] 2d loess question
Hi,

We have been trying to use loess on 2D data (basically a matrix) in the following way:

x <- 1:256
y <- 1:256
z <- data  # input from data
z.loess = loess(z ~ x + y)

We get a 256 x 1 vector of fitted values and a 256 x 256 array of residuals, but not a 256 x 256 array of fitted values. Why would this be? I think we are using loess incorrectly but can't figure out what is wrong. I have looked at past messages on this mailing list and searched the web, without gaining any more insight.

Any help would be much appreciated.

Thanks,
Saurav
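The 256 fitted values suggest that loess() saw 256 observations, one per element of x and y, rather than one per pixel. One way to fit a surface is to give every pixel its own (x, y) coordinate with expand.grid(), unroll the matrix with c(), and reshape the fitted values afterwards. The sketch uses a small simulated 20 x 20 matrix as a hypothetical stand-in for the poster's data, with x taken as the row index and y as the column index.

```r
set.seed(7)
nr <- 20
nc <- 20
# Hypothetical image standing in for the 256 x 256 'data'
z <- outer(1:nr, 1:nc, function(i, j) sin(i / 4) + cos(j / 5)) +
  matrix(rnorm(nr * nc, sd = 0.1), nrow = nr, ncol = nc)

# One row per pixel; x varies fastest, matching the column-major order of c(z)
pixels <- expand.grid(x = 1:nr, y = 1:nc)
pixels$z <- c(z)

z.loess <- loess(z ~ x + y, data = pixels, span = 0.3)
z.fit <- matrix(fitted(z.loess), nrow = nr, ncol = nc)   # back to image shape
dim(z.fit)   # 20 20
```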
Re: [R] How to create MULTILEVELS in a dataset??
Hi Ista,

You got that correct: yearctry is a composite created as yearctry = year*1000 + country, so that, for example, the USA with country code 1 will be 2000001 in the year 2000 and 2005001 in 2005. The years run from 2000 to 2008 for many countries; for the UK (code 44) it will be 2000044, 2005044 and so on for the years 2000-2008, and likewise for the other countries. Here is the result of str(e):

'data.frame': 902533 obs. of 18 variables:
 $ yearctry    : num 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 ...
 $ discent     : int 0 0 0 NA 0 1 0 0 0 NA ...
 $ age         : int 51 46 26 24 19 18 20 19 25 19 ...
 $ gender      : int 1 2 1 1 1 1 1 1 1 1 ...
 $ gemeduc     : int 0 0 111 111 111 111 111 111 111 111 ...
 $ gemhhinc    : int 33 33 33 33 33 33 33 33 33 33 ...
 $ ref_group   : int 1 2 3 3 3 3 3 3 3 3 ...
 $ fearfail_ref: num 1 NA 0.473 0.473 0.473 ...
 $ knowent_ref : num 0 NA 0.484 0.484 0.484 ...
 $ nbgoodc_ref : num NA 0 0.84 0.84 0.84 0.84 0.84 0.84 0.84 0.84 ...
 $ nbstatus_ref: num NA 1 0.846 0.846 0.846 ...
 $ estbbuso_ref: num 0 0 0.0172 0.0172 0.0172 ...
 $ lngdp       : num 8.99 9.08 9.29 9.13 8.99 ...
 $ lngdpsq     : num 19.5 19.4 19.2 19.4 19.5 ...
 $ es_gdppcppp : num 7995 8804 10872 9189 7995 ...
 $ sq_gdppcppp : num 3.01e+08 2.74e+08 2.10e+08 2.61e+08 3.01e+08 2.74e+08 2.10e+08 3.01e+08 2.10e+08 2.61e+08 ...
 $ estbbo_m    : num 0.1063 0.078 0.049 0.0355 0.1063 ...
 $ es_gdpchg   : num -10.9 8.837 9.179 -0.789 -10.9 ...

A portion of yearctry as printed:

[65391] 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07 2e+07
[65417] 2e+07 2e+07 2e+07 ... (and so on through [65599]; every value is printed as 2e+07)

Looking at the above, I don't know whether R recognises them as different numbers. The exponential format does not reveal whether it treats yearctry as I explained above (i.e. whether it distinguishes 2000001 from 2005001; all of them appear as 2e+07). If it does not, how do I make R recognise it as the number 2000001 and so on? Stata also lists it in exponential format, but I know that it recognises the yearctry values as different for different years and countries. Please help: I have to shift to R because Stata is taking days and days to run gllamm.

Kindly help.

On Mon, Oct 19, 2009 at 2:19 PM, Ista Zahn istaz...@gmail.com wrote:
> Hi,
> Please keep r-help copied on the reply -- hopefully someone will pick up this thread and help us out.
>
> On Mon, Oct 19, 2009 at 2:17 AM, saurav pathak pathak.sau...@gmail.com wrote:
>> Dear Ista,
>> Thanks for answering. The previous question was a primer to what I wanted. I did just what you said, with yearctry below as the country code or group variable, i.e. yearctry (data grouped by yearctry) was the variable I was passing as the country id.
>
> I suggested using country as the grouping variable. What is yearctry? From the name it sounds like a composite of year and country.
>
>> Kindly notice that after running the lmer model, it recognises yearctry as the group but shows the number of groups as "groups: yearctry, 1". This means it did not recognise yearctry as the variable by which the data are grouped. The number should be 239, not 1.
>
> That's weird. What does str(e) say?
>
>> But please see below:
>> My data set is e
>> names(e)
>>  [1] yearctry discent age gender gemeduc gemhhinc ref_group fearfail_ref knowent_ref nbgoodc_ref
>> [11] nbstatus_ref estbbuso_ref lngdp lngdpsq es_gdppcppp sq_gdppcppp estbbo_m es_gdpchg
>> Here I have variables representing two levels, namely
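Whether the stored values really differ can be checked directly instead of being read off the printout, since str() abbreviates numbers to three significant digits and print() follows the digits option. The sketch builds a few hypothetical codes with the year * 1000 + country scheme described in the post, shows them with every digit, and counts the distinct values. On the real data the same checks would be run on e$yearctry; a count of 1 rather than 239 would mean the codes themselves collapsed, not just their display.

```r
# Hypothetical codes built as year * 1000 + country code, as described in the post
yearctry <- c(2000001, 2000044, 2005001, 2005044, 2005001)

# format() with scientific = FALSE shows every stored digit
full_digits <- format(yearctry, scientific = FALSE)
full_digits

# Count the distinct values actually stored, and make the grouping explicit
n_groups <- length(unique(yearctry))
yearctry_f <- factor(yearctry)
nlevels(yearctry_f)   # 4
```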
[R] How to create MULTILEVELS in a dataset??
Dear R users,

I have a data set with five variables: one dependent variable y and four independent variables (education_level, householdincome, countrygdp and countrygdpsquare). The first two are data on the individual and the other two correspond to the country to which the individual belongs. My data set does not make this distinction between individual-level and country-level data. Is there a way to make R treat countrygdp and countrygdpsquare as a different level from the individual-level data? In other words, I wish to transform my data set so that it recognises the two individual-level variables as Level-1 and the two country-level variables as Level-2.

I need to run a multilevel model, but first I must make my data set recognise data at Level-1 and Level-2. How can I create this country-level group (gdp and gdp^2) so that I can fit a multilevel model as follows:

lmer(y ~ education_level + householdincome + countrygdp + countrygdpsquare + (1 | Level2), family = binomial(link = probit), data = dataset)

Please kindly help me with the relevant commands for creating this Level2 (with two variables).

Thanks
Saurav

Dr.Saurav Pathak
PhD, Univ.of.Florida Mechanical Engineering
Doctoral Student
Innovation and Entrepreneurship
Imperial College Business School
s.patha...@imperial.ac.uk
0044-7795321121
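In long-format data of this kind, R does not need a separate Level-2 table. A country-level variable is simply repeated on every individual's row, and the grouping factor in the random-effects term, (1 | country), tells the model which rows share a country. The sketch below simulates such data (all names and values hypothetical), checks that the country-level column is constant within each country, and, if the lme4 package is installed, fits a random-intercept probit. The call in the post passes family to lmer(); current lme4 versions fit binomial models with glmer() instead.

```r
set.seed(123)
n_country <- 30
n_per <- 200
country <- factor(rep(seq_len(n_country), each = n_per))

# Level-2 (country) values: one per country, repeated on every individual's row
gdp_by_country <- rnorm(n_country)                 # hypothetical standardized GDP
countrygdp <- gdp_by_country[as.integer(country)]
countrygdpsquare <- countrygdp^2

# Level-1 (individual) values
education_level <- rnorm(n_country * n_per)
householdincome <- rnorm(n_country * n_per)

# Simulated binary outcome with a country random intercept
u <- rnorm(n_country, sd = 0.5)[as.integer(country)]
y <- as.integer(-0.5 + 0.3 * education_level + 0.2 * householdincome +
                  0.2 * countrygdp + u + rnorm(n_country * n_per) > 0)

dataset <- data.frame(y, education_level, householdincome,
                      countrygdp, countrygdpsquare, country)

# Check: each country-level variable takes exactly one value per country
values_per_country <- tapply(dataset$countrygdp, dataset$country,
                             function(g) length(unique(g)))

if (requireNamespace("lme4", quietly = TRUE)) {
  fit <- lme4::glmer(y ~ education_level + householdincome + countrygdp +
                       countrygdpsquare + (1 | country),
                     family = binomial(link = "probit"), data = dataset)
  print(summary(fit))
}
```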
[R] lmer function and Inverse mills ratio
Dear R users,

I have two questions; I have been stuck on this problem for the last three months, so please help.

First question: how can I use the lmer function for a three-level probit (i.e. please help me with the command syntax)?

Second question: how can I then calculate the inverse Mills ratio (IMR) after the above probit has been estimated with lmer? If lmer does not do this, is there another way to get the IMR of a three-level probit?

Kindly help.

Thanks,
Saurav
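A three-level random-intercept probit can be written in lme4 by nesting grouping factors, for example individuals within years within countries as (1 | country/year), which expands to (1 | country) + (1 | country:year). For the inverse Mills ratio, one common choice (an assumption here; whether the random effects belong in the index is a modelling decision) is to take the probit's linear predictor xb and compute dnorm(xb) / pnorm(xb). The data below are simulated and every name is hypothetical; the model fit runs only if lme4 is installed.

```r
set.seed(2)
n_country <- 20; n_year <- 5; n_per <- 60
d <- expand.grid(id = seq_len(n_per), year = factor(seq_len(n_year)),
                 country = factor(seq_len(n_country)))
d$age <- rnorm(nrow(d))

# Country effects and country-by-year effects (the two upper levels)
u_c  <- rnorm(n_country, sd = 0.4)[as.integer(d$country)]
u_cy <- rnorm(n_country * n_year, sd = 0.2)[as.integer(interaction(d$country, d$year))]
d$s <- as.integer(-0.3 + 0.5 * d$age + u_c + u_cy + rnorm(nrow(d)) > 0)

# Inverse Mills ratio for selected observations: dnorm(xb) / pnorm(xb)
inv_mills <- function(xb) dnorm(xb) / pnorm(xb)

if (requireNamespace("lme4", quietly = TRUE)) {
  fit3 <- lme4::glmer(s ~ age + (1 | country/year),
                      family = binomial(link = "probit"), data = d)
  xb <- predict(fit3, type = "link")   # includes the estimated random effects
  d$imr <- inv_mills(xb)
  print(head(d))
}
```

To leave the random effects out of the index, predict(fit3, type = "link", re.form = NA) returns the fixed-effects part only.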
[R] how to cluster data for use with lmer
Dear R users,

My data set is e:

names(e)
 [1] yearctry discent age gender gemeduc gemhhinc ref_group fearfail_ref knowent_ref nbgoodc_ref
[11] nbstatus_ref estbbuso_ref lngdp lngdpsq es_gdppcppp sq_gdppcppp estbbo_m es_gdpchg

Here I have variables representing two levels, namely individual level and country level, so my data are two-level data. The country-level variables (level 2) are lngdp, lngdpsq, es_gdppcppp, sq_gdppcppp, estbbo_m and es_gdpchg, grouped by yearctry; the rest of the variables are individual level (level 1). The number of individual observations is 655078 and the number of yearctry groups is 239. However, when I model a probit to see the influence of four individual-level variables (age, gender, gemeduc and gemhhinc) and one country-level variable (es_gdppcppp) using

prb1 <- lmer(discent ~ age + gender + gemeduc + gemhhinc + es_gdppcppp + (1 | yearctry), family = binomial(link = probit), data = e)

I get

Generalized linear mixed model fit by the Laplace approximation
Formula: discent ~ age + gender + gemeduc + gemhhinc + es_gdppcppp + (1 | yearctry)
   Data: e
    AIC    BIC logLik deviance
 194043 194122 -97014   194029
Random effects:
 Groups   Name        Variance   Std.Dev.
 yearctry (Intercept) 4.0708e-06 0.0020176
Number of obs: 655078, groups: yearctry, 1

Fixed effects:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -7.578e-01  1.839e-02  -41.20  < 2e-16 ***
age         -2.441e-03  2.990e-04   -8.16 3.30e-16 ***
gender      -2.886e-01  7.710e-03  -37.43  < 2e-16 ***
gemeduc      9.244e-05  6.930e-06   13.34  < 2e-16 ***
gemhhinc    -8.938e-07  1.359e-07   -6.58 4.75e-11 ***
es_gdppcppp -2.459e-05  2.691e-07  -91.40  < 2e-16 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1   1

Correlation of Fixed Effects:
            (Intr) age    gender gemedc gmhhnc
age         -0.580
gender      -0.563 -0.138
gemeduc     -0.373  0.166  0.011
gemhhinc    -0.009 -0.132 -0.024 -0.201
es_gdppcppp -0.490  0.071  0.314 -0.297  0.256

The model did not recognise the groups defined by yearctry: it shows 1 group instead of 239. Can somebody help me make my model recognise es_gdppcppp as a country-level variable grouped by yearctry (so that the number of yearctry groups is 239)?

Please help.
Thanks in advance,
Saurav
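lmer() and glmer() count groups only among the rows that remain after missing values are removed (the output reports 655078 observations), so a useful check is to count the distinct yearctry values in exactly those rows. The sketch below is a diagnostic on toy data standing in for e, with variable names from the post and simulated values. If the count on the real data is 239, the problem lies elsewhere; if it is 1, the grouping variable itself needs fixing.

```r
set.seed(3)
e <- data.frame(
  yearctry = rep(c(2000001, 2000044, 2005001, 2005044), each = 5),
  discent  = rbinom(20, 1, 0.3),
  age      = sample(18:64, 20, replace = TRUE)
)
e$discent[c(2, 7, 12)] <- NA           # some missing responses, as in the post

# Keep only the rows a model with these variables would use
vars <- c("discent", "age", "yearctry")
used <- e[complete.cases(e[, vars]), vars]

n_groups_all  <- length(unique(e$yearctry))
n_groups_used <- nlevels(factor(used$yearctry))
table(used$yearctry)                   # observations per group in the model sample
```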
[R] Inverse Mills in clustered (multilevel) cross-sectional panel data
Dear R saviors,

Kindly look at this problem; I would really appreciate any takers. I have been trying to resolve this issue of the IMR in clustered (multilevel) cross-sectional panel data for more than two months now. The characteristics of my dataset are as follows:

- some 900 000 individuals
- a total of 60 countries
- a cross-sectional time series at the country level, max 10 years; not all countries are included every year

For each country, we have a maximum of 10 cross-sectional samples (1 per year) of at least 2000 adult individuals (random selection). However, individuals are not followed over time: every year a new random sample is drawn.

I am interested in analysing individuals' behaviour in a given economic activity: entrepreneurship. To do this, I first need to control for the fact that some individuals self-select into entrepreneurship. This self-selection may be influenced by individual-level characteristics (such as age, gender and education) as well as by country-level factors (e.g. taxation). Because both individual- and country-level factors may drive both self-selection and behaviour once self-selection has occurred, multilevel techniques are required for the selection equation. How can I do this in R? The results of this selection equation would then be used as a control in the second stage, where an OLS is to be run.

Thank you for any suggestions.
[R] PROBIT REGRESSION FOR GROUPED/CLUSTERED DATA
Hello all,

I have been working on this for weeks now; it should be simple to fix. Please help.

Let me explain what I am doing. I have a data set for 65 countries over a period of 9 years (2000-2008). Each country has on average about 2000 interviews, so the total set has roughly 65*9*2000 observations (of course there are missing values as well).

Now let me explain how the data are clustered or grouped. I use the variable yearctry, which is computed as year*1000 + the international phone code of the country. For example, the USA, with calling code 001, will have yearctry = 2000001 for the year 2000, and under this value of yearctry there are roughly 2000 observations. For the same year the UK would have yearctry = 2000044 (again roughly 2000 observations), and similarly for the rest of the 63 countries in 2000 and in all other years from 2000 to 2008. For the year 2001, the values of yearctry for the USA and the UK would be 2001001 and 2001044 respectively (again roughly 2000 observations each), and so on for the other 63 countries. So the data set is grouped/clustered by yearctry.

I am trying to look into selection bias, if any, within each yearctry value (i.e. about 2000 observations per country per year, for 9 years and 65 countries); essentially, therefore, I wish to check 65*9 values of yearctry, each with roughly 2000 observations. Hence I use glm with a probit link to look into the selection bias, where my dependent variable s is either 0 or 1.
The formula

myProbit <- glm(s ~ age + gender + gemedu + gemhinc + es_gdppc + imf_pop + estbbo_m, family = binomial(link = probit), data = adpopdata)

is the Heckman selection equation based on all observations, without taking into account the fact that each yearctry is unique. I want the selection equation to recognise the uniqueness of each yearctry value: take one yearctry at a time, estimate the probit, move to the next yearctry, repeat the probit regression, and then give me the result. At the moment I do not accomplish that with the above formula, which does the regression in bulk. I want it to distinguish one yearctry from another, perform the regression for all yearctry values, and finally produce the result.

Is there any model other than glm that is recommended for this job? If yes, please explain how. Let me give you the exact command that Stata uses, so that things become very clear:

xtprobit s age gender gemeduc gemhinc es_gdppc imf_pop estbbo_m, i(yearctry)

This does exactly what I wish to accomplish in R, i.e. it estimates the Heckman selection equation for the selection variables (seven in my case) based on the uniqueness of yearctry.

I have worked for weeks on this; kindly help me. I think it is a small issue to fix in the equation, but since I am new to R I do not know exactly what will fix my problem, so any help will be highly appreciated.

Thanks
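The procedure described in words above (estimate a separate probit for each yearctry value and collect the results) can be sketched in base R with split() and lapply(). Note that Stata's xtprobit with i(yearctry) by default fits a single random-effects probit rather than one probit per group; a closer R analogue for that is a random-intercept probit such as lme4::glmer(... + (1 | yearctry), family = binomial(link = "probit")), where nAGQ greater than 1 requests adaptive Gauss-Hermite quadrature. The data and the reduced set of regressors below are hypothetical.

```r
set.seed(4)
adpopdata <- data.frame(
  yearctry = rep(c(2000001, 2000044, 2001001, 2001044), each = 500),
  age      = rnorm(2000, mean = 40, sd = 12),
  gender   = rbinom(2000, 1, 0.5)
)
adpopdata$s <- as.integer(-1 + 0.02 * (adpopdata$age - 40) +
                            0.3 * adpopdata$gender + rnorm(2000) > 0)

# One probit per yearctry value
by_group <- split(adpopdata, adpopdata$yearctry)
fits <- lapply(by_group, function(d)
  glm(s ~ age + gender, family = binomial(link = "probit"), data = d))

# Collect the coefficients into one table: a row per yearctry
coef_table <- t(sapply(fits, coef))
coef_table
```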
[R] DECLARING A PANEL VARIABLE???
Hi,

I am working on panel data; my data are clustered/grouped by the variable yearctry. I am running the regression below, but I can't make it recognise yearctry as the panel variable:

myProbit <- glm(s ~ age + gender + gemedu + gemhinc + es_gdppc + imf_pop + estbbo_m, family = binomial(link = probit), data = adpopdata)

Can anyone help me do this, please?

Thanks,
Saurav
[R] ERROR message while using -invMillsRatio()
Hi,

I have been trying many different things to get the inverse Mills ratio (IMR) for a two-stage Heckman model. The commands I have tried so far are listed below, up to the point where I get an error in the last command. If this had been successful, I could have used the IMR as a control in my OLS (the outcome equation). What I see is that the number of rows of the calculated IMR and the number of rows in the actual data set do not match, so the IMR could not be added to my original data set. How do I fix this and then proceed to get the correct IMR to use in my outcome equation? Can someone please help?

> load("C:\\adpopdata1.Rdata")
> ls()
[1] adpopdata
> attach(adpopdata)
> names(adpopdata)
 [1] ctry setid gender age year
 [6] ownmge busang futsup discent knowent
[11] opport suskill fearfail nbgoodc nbstatus
[16] sunowjob suyr5job omnowjob omyr5job gemwork3
[21] suboanwbabybuso estbbuso suboan_m babybo_m
[26] estbbo_m es_gdppc ief_fisc yearctry ief_ipr
[31] wb_law wb_corr imf_pop imf_pgro es_gdpch
[36] sq_gdppc ln_oy5 gemedu gemhinc ipr_edu
[41] ipr_hinc age_sq ln_oy5_a ln_oy5_1 ln_sy5
[46] ln_osy5 ln_oy5_e X_est_full o_tmitr o_citr
[51] o_tdtr ln_oy5_2 invmills ln_onj ln_diff
[56] omjoagro omjorgro ln_orgro ln_oagro sunowjo1
[61] osnowjob ln_osnow osyr5job osjorgro ln_osrgr
[66] ln_osagr s_nacnew p phi capphi
[71] invmil_4 subobaby fearf_m disc_m dis_fear
[76] fearf_r fearf_rm dis_ff_r s
> library(sampleSelection)
> myProbit <- glm(s ~ age + gender + gemedu + gemhinc + es_gdppc +
+     imf_pop + estbbo_m, family = binomial(link = probit))
> summary(myProbit)

Call:
glm(formula = s ~ age + gender + gemedu + gemhinc + es_gdppc + imf_pop + estbbo_m, family = binomial(link = probit))

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-0.7660 -0.3053 -0.2462 -0.1984  3.2166

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.439e+00  2.985e-02 -81.714  < 2e-16 ***
age         -6.436e-03  3.743e-04 -17.193  < 2e-16 ***
genderMALE   1.785e-01  9.424e-03  18.945  < 2e-16 ***
gemedu       2.128e-02  4.698e-03   4.528 5.95e-06 ***
gemhinc      1.062e-01  6.185e-03  17.166  < 2e-16 ***
es_gdppc     9.765e-06  1.400e-06   6.977 3.02e-12 ***
imf_pop      4.131e-06  1.707e-05   0.242    0.809
estbbo_m     5.932e+00  1.088e-01  54.539  < 2e-16 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1   1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 84763 on 257357 degrees of freedom
Residual deviance: 80515 on 257350 degrees of freedom
  (85893 observations deleted due to missingness)
AIC: 80531

Number of Fisher Scoring iterations: 6

> adpopdata$IMR <- invMillsRatio(myProbit)$IMR1
Error in `$<-.data.frame`(`*tmp*`, IMR, value = c(2.50039945424535, :
  replacement has 257358 rows, data has 343251
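The mismatch is exactly the rows glm() dropped for missing values (257358 used + 85893 deleted = 343251), so the IMR vector covers only the estimation sample. One way to keep it aligned, sketched below in base R, is to fit with na.action = na.exclude, which pads predictions and residuals with NA for the dropped rows, and then compute the IMR from the linear predictor as dnorm(xb) / pnorm(xb). The ratio is computed by hand here rather than with sampleSelection's invMillsRatio(), whose NA handling is not relied on; the data and names are hypothetical.

```r
set.seed(5)
n <- 1000
adpopdata <- data.frame(age = rnorm(n, 40, 12), gemhinc = rnorm(n))
adpopdata$s <- as.integer(-1 + 0.02 * (adpopdata$age - 40) +
                            0.3 * adpopdata$gemhinc + rnorm(n) > 0)
adpopdata$gemhinc[sample(n, 150)] <- NA      # missing covariate values

# na.exclude keeps a slot (NA) for every dropped row in predict() and residuals()
myProbit <- glm(s ~ age + gemhinc, family = binomial(link = "probit"),
                data = adpopdata, na.action = na.exclude)

xb <- predict(myProbit, type = "link")       # length n, NA where rows were dropped
adpopdata$IMR <- dnorm(xb) / pnorm(xb)       # inverse Mills ratio for s = 1
```

In the session above, that would mean passing data = adpopdata and na.action = na.exclude to glm() instead of relying on attach().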
[R] Heckman Selection Model/Inverse Mills Ratio
So far I have used the following command:

glm(formula = s ~ age + gender + gemedu + gemhinc + es_gdppc + imf_pop + estbbo_m, family = binomial(link = probit))

My questions are:

1. How do I discard the non-significant selection variables (one of the seven variables above is non-significant) and calculate the inverse Mills ratio from the significant variables?
2. I need the inverse Mills ratio from the above to run the outcome equation using OLS, with the inverse Mills ratio as the control for selection bias; hence I need to get the IMR.
3. How can this be done in R using my approach, or is there another way of achieving what I want?

Please help.

Thanks,
Saurav
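A manual version of the two-step procedure described above can be sketched in base R: fit the probit, drop a regressor with update(), compute the inverse Mills ratio dnorm(xb) / pnorm(xb) from the probit's linear predictor, and add it to an OLS fitted on the selected observations. The data and variable names are simulated and hypothetical. The second-stage lm() standard errors ignore the fact that the IMR is itself estimated; the sampleSelection package's heckit() (selection() with method = "2step") fits the same two-step model and is intended to report corrected standard errors. Whether to drop a regressor on significance grounds alone is a modelling judgment this sketch does not settle.

```r
set.seed(6)
n <- 3000
d <- data.frame(age = rnorm(n), inc = rnorm(n), pop = rnorm(n), exper = rnorm(n))
e_sel <- rnorm(n)
e_out <- 0.5 * e_sel + sqrt(0.75) * rnorm(n)     # correlated errors create selection bias
d$s <- as.integer(0.3 + 0.6 * d$age + 0.4 * d$inc + e_sel > 0)   # pop has no true effect
d$y <- ifelse(d$s == 1, 1 + 0.8 * d$exper + e_out, NA)

# Stage 1: probit selection equation, then refit without one regressor
sel_full <- glm(s ~ age + inc + pop, family = binomial(link = "probit"), data = d)
sel <- update(sel_full, . ~ . - pop)

# Inverse Mills ratio from the probit index
xb <- predict(sel, type = "link")
d$imr <- dnorm(xb) / pnorm(xb)

# Stage 2: OLS on the selected sample with the IMR as an extra regressor
out <- lm(y ~ exper + imr, data = d, subset = s == 1)
coef(out)
```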
[R] Modifying Data from a Panel for Probit Regression
Hi,

I have two questions.

1. I am working on panel data. The variable A has some missing values, denoted by ".", and some non-missing values. I wish to create another variable B from A such that all the missing values are assigned ZERO (0) and all the non-missing values ONE (1). This is in preparation for running a probit regression with B as the dependent variable, so B must take only the values 0 and 1. The expected outcome should look like this:

A          B
.          0
.          0
.          0
.          0
.6931472   1
.          0
.          0
.          0
.6931472   1
.          0
.          0
.          0
.          0
.          0
1.098612   1

2. How do I then check (with what command) whether B has come out as desired? In other words, how do I tabulate B?

Kindly provide the commands/operations needed to achieve the above.
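In R, Stata's "." missing values become NA when a .dta file is read with read.dta(), and a text file that writes missing values as "." can be read with read.csv(..., na.strings = "."). With A stored that way, B is the complement of is.na(A), and table() tabulates it. The values of A below are the ones from the example above.

```r
# A as it would look in R after import: Stata's "." missing values become NA
A <- c(NA, NA, NA, NA, 0.6931472, NA, NA, NA, 0.6931472,
       NA, NA, NA, NA, NA, 1.098612)

# 1 where A is observed, 0 where it is missing
B <- as.integer(!is.na(A))

# Tabulate B, and cross-check it against the missingness of A
tab <- table(B)
tab
table(missing_A = is.na(A), B)
```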
[R] GLM for Probit for Panel Data
Hello,

I am working on panel data, and my panel variable is yearctry. Let me explain what I mean: yearctry is calculated from the year and the ISD phone code of a country. For example, for the year 2000 and the USA (code 001), my yearctry variable will be 2000001, and there are 2000 observations (i.e. 2000 individual responses) with yearctry = 2000001. I have 65 different countries in the survey and a time span of 2000-2008, i.e. 9 years.

My question is: how do I make R recognise yearctry as my panel variable when I am running a glm like the one below?

glm(formula = s ~ age + gender + gemedu + gemhinc + es_gdppc + imf_pop + estbbo_m, family = binomial(link = probit))

What changes or additions do I make to this probit regression command so that yearctry is passed/assigned as the panel variable?

Kindly help.
Thanks in advance,
Saurav
[R] getOption(max.print) in R
I am typing the following at the command prompt:

variab = read.csv(file.choose(), header = T)
variab

It lists only part of variab (which has 900,000 observations in total) and then prints the following message:

[ reached getOption("max.print") -- omitted 797124 entries ]

Is there a way to see the entire data set, i.e. all 900,000 observations, and how do I then save variab?

Thanks,
Saurav
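The max.print option caps how many entries print() shows, and for a data frame the cap applies to rows times columns, so a large value is needed to print 900,000 rows. Printing that much to the console is rarely useful; head(), tail(), str() and View() are the usual ways to inspect the data, and write.csv() or saveRDS() save all of it regardless of max.print. The sketch uses a small hypothetical stand-in for variab and temporary file paths.

```r
# Hypothetical stand-in for the imported data
variab <- data.frame(id = 1:1000, value = rnorm(1000))

# Printing is capped by the max.print option; raising it lets print() show more
old <- options(max.print = 1e6)
getOption("max.print")
options(old)                       # restore the previous setting

# Usually more practical than printing everything
head(variab); tail(variab); str(variab)
# View(variab)                     # spreadsheet-style viewer in interactive sessions

# Save the whole data set: as CSV, or as an R data file
csv_file <- tempfile(fileext = ".csv")
write.csv(variab, csv_file, row.names = FALSE)
rds_file <- tempfile(fileext = ".rds")
saveRDS(variab, rds_file)
```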
[R] Stata file and R Interaction :File Size Problem in Import
Hi,

I am using Stata 10 and I need to import a Stata 10 data set into R. I have also saved the data set in lower versions of Stata using the saveold command in Stata. My RAM is 4 GB and the Stata file is 600 MB. I am getting an error message which says:

Error: cannot allocate vector of size 3.4 Mb
In addition: There were 50 or more warnings (use warnings() to see the first 50)

So far I have tried the following:

1. By right-clicking on the R icon, I added --max-mem-size=1000M to the target under the properties of the R icon.
2. I loaded the foreign package with library(foreign) at the command prompt.
3. I then used trialfile <- read.dta("C:/filename.dta").

Here I get the error for the Stata data file that is 600 MB in size. However, with Stata 10 and Stata 8 data sets of about 200 KB, I have successfully been able to import the Stata file into R. I am therefore confused whether there is a problem with the version of my Stata file (which should not be the case, as the smaller files of both versions work fine) or whether it is a size issue. It's pretty important for me; kindly address this question.

Thanks,
Saurav
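On Windows builds of R, --max-mem-size sets an upper limit on the memory R may use, so --max-mem-size=1000M restricts R to roughly 1000 MB rather than adding any; for a 600 MB file that limit may well be the binding constraint (an assumption about this setup, not a tested fix), and removing or raising it is worth trying first. Keeping only the needed variables in Stata (keep varlist) before saveold also shrinks the file. To gauge how much memory the imported data frame will need, the sketch round-trips a small hypothetical sample through write.dta() and read.dta() from the foreign package and compares its in-memory size with its size on disk.

```r
library(foreign)   # ships with R; provides read.dta() and write.dta()

# Small hypothetical sample standing in for the 600 MB file
set.seed(8)
sample_df <- data.frame(
  age      = sample(18:64, 5000, replace = TRUE),
  gender   = factor(sample(c("MALE", "FEMALE"), 5000, replace = TRUE)),
  es_gdppc = rnorm(5000, 20000, 8000)
)
dta_file <- tempfile(fileext = ".dta")
write.dta(sample_df, dta_file)

back <- read.dta(dta_file)
bytes_on_disk   <- file.size(dta_file)
bytes_in_memory <- as.numeric(object.size(back))
ratio <- bytes_in_memory / bytes_on_disk   # rough expansion factor for this column mix

# Very rough extrapolation to a 600 MB file (MB for the data frame alone)
600 * ratio
```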
[R] Stata file Import and Analysis in R
Hi,

I have a Stata data set (.dta file) of size 600 MB. I need to import it into R and do a 2SLS multilevel analysis on it. I would be grateful for help with the first part: how to import this big file from Stata into R, and then how to open the imported Stata file in R.

Kindly help.
Thanks in advance
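Reading a .dta file gives an ordinary R data frame, which is how the file is "opened" in R. The sketch below uses read.dta() from the foreign package (documented to read Stata formats 5 to 12, which covers a Stata 10 file), inspects the result, and saves it in R's own format so later sessions can reload it with load() instead of re-importing. To stay self-contained it first writes a tiny hypothetical .dta file to a temporary path; in practice the path would point at the real file.

```r
library(foreign)

# Hypothetical file; a small one is written first so the sketch runs anywhere
dta_file <- tempfile(fileext = ".dta")
write.dta(data.frame(yearctry = c(2000001, 2000044), s = c(0L, 1L)), dta_file)

mydata <- read.dta(dta_file)    # import the Stata file as an R data frame
str(mydata)                     # variable names and types
summary(mydata)

# Save in R's own format so later sessions can skip the Stata import
rdata_file <- tempfile(fileext = ".RData")
save(mydata, file = rdata_file)
rm(mydata)
load(rdata_file)                # restores the object named mydata
```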
[R] Summing over an index of an array
Hi,

I cannot seem to figure out how to sum over one index of an array. For example, let A be a 3-dimensional array. I want to find the sum over the first dimension, that is, S_jk = sum_i A_ijk, where S is now a 2-dimensional matrix. I don't want to use a loop.

Thanks,
--
saurav
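Both apply() and colSums() avoid an explicit loop. apply(A, c(2, 3), sum) keeps dimensions 2 and 3 and sums over the rest, while colSums() on an array sums over its first dims dimensions (dims = 1 by default) and returns an array of the remaining dimensions. The small array below is an arbitrary example.

```r
A <- array(1:24, dim = c(2, 3, 4))      # example 2 x 3 x 4 array

# S[j, k] = sum over i of A[i, j, k]
S_apply <- apply(A, c(2, 3), sum)       # keep dims 2 and 3, sum over dim 1
S_col   <- colSums(A)                   # sums over dims 1:dims, with dims = 1

dim(S_col)      # 3 4
S_col[1, 1]     # A[1,1,1] + A[2,1,1] = 1 + 2 = 3
```

Summing over the last index instead would be rowSums(A, dims = 2), which returns a 2 x 3 matrix.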