Giacomo,

Please include some representative data. It is not clear why your offset of 4 (z$cod[i - 4]) is going to be an accurate surrogate for complete data.
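As a minimal sketch of that concern, here is the offset comparison run on made-up data. The data frame `toy` and its three companies are hypothetical, not taken from your file. The sketch writes the assignment as `<-`, because `d1[i] <= 1` in R is a comparison whose result is discarded, so it never changes d1. Even with that corrected, and with the rows sorted by company, the comparison marks at most the fifth row of a complete company rather than all of its rows.

```r
# Hypothetical toy data: companies A and C have 5 years, B has only 3.
toy <- data.frame(cod = rep(c("A", "B", "C"), times = c(5, 3, 5)),
                  year = c(1:5, 1:3, 1:5),
                  stringsAsFactors = FALSE)

n <- nrow(toy)
d1 <- numeric(n)  # dummy variable, all zeros to start
for (i in 5:n) {
  # `<-` assigns; `d1[i] <= 1` would only compare and leave d1 untouched
  if (toy$cod[i] == toy$cod[i - 4]) d1[i] <- 1
}

# Show the dummy next to the data: only the last row of A and of C is flagged
data.frame(toy, d1)
```

Rows 1 to 4 of each complete company stay at 0, and the result also depends on the rows being sorted by company.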
Since I do not have your data set or its true structure, I am having to guess.

# make 5 rows (one per year) for each of 200 companies
companies <- paste0(rep(LETTERS[1:4], 5, each = 50), rep(1:50, 5))
companies <- companies[order(companies)]
years <- rep(1:5, 200)
z <- data.frame(cod = companies, year = years,
                revenue = round(rnorm(1000, mean = 100000, sd = 10000)))
# trim this down to the 728 rows you have by pulling out records at random
set.seed(1) # makes the row sample repeatable (revenue was drawn before the seed was set)
z <- z[sample.int(1000, 728), ]
z <- z[order(z$cod, z$year), ]

# No matter how you order these data, your offset approach will not
# reliably flag the rows of companies that have full records.
> head(z, 10)
   cod year revenue
1   A1    1  112192
2   A1    2  105840
4   A1    4  112357
5   A1    5   91772
7  A10    2  102601
8  A10    3  105183
11 A11    1  101269
12 A11    2  100719
14 A11    4   86138
15 A11    5  105044

# You can do something like the following.
counts <- table(z$cod)
complete <- names(counts[as.integer(counts) == 5])
# It is probably better to keep the dummy variable inside the dataframe.
z$complete <- z$cod %in% complete # %in% already returns TRUE or FALSE
> head(z, 20)
   cod year revenue complete
1   A1    1  112192    FALSE
2   A1    2  105840    FALSE
4   A1    4  112357    FALSE
5   A1    5   91772    FALSE
7  A10    2  102601    FALSE
8  A10    3  105183    FALSE
11 A11    1  101269    FALSE
12 A11    2  100719    FALSE
14 A11    4   86138    FALSE
15 A11    5  105044    FALSE
20 A12    5   95872    FALSE
21 A13    1   78513     TRUE
22 A13    2   90502     TRUE
23 A13    3  108683     TRUE
24 A13    4  110711     TRUE
25 A13    5   87842     TRUE
28 A14    3   99939    FALSE
30 A14    5  111289    FALSE
31 A15    1  100930    FALSE
32 A15    2   93765    FALSE

Do not use HTML. Use plain text.
The character string "//" is not a comment indicator in R.
Do not use attach(). It does not do anything in your example, but it is poor practice.
Always write out TRUE and FALSE rather than T and F.

R. Mark Sharp, Ph.D.
msh...@txbiomed.org

> On Jun 24, 2015, at 1:26 PM, giacomo begnis <gmbeg...@yahoo.it> wrote:
>
> Hi, I have a dataset (728 obs) containing three variables: code of a company,
> year and revenue.
> Some companies have a complete history of 5 years; others do not have a
> complete history (for instance, observations for three or four years).
> I would like to determine the companies with a complete history using a
> dummy variable. I have written the following program, but there is something
> wrong because the dummy variable that I have created is always equal to
> zero. Can somebody help me? Thanks, gm
>
> z<-read.table(file="c:/Rp/cddat.txt", sep="", header=T)
> attach(z)
> n<-length(z$cod)   // number of obs dataset
>
> d1<-numeric(n)   // dummy variable
>
> for (i in 5:n) {
>   if (z$cod[i]==z$cod[i-4])   // cod is the code of a company
>     { d1[i]<=1} else { d1[i]<=0}   // d1=1 for a company with complete history, d1=0 if the history is not complete
> }
> d1
>
> When I run the program d1 is always equal to zero. Why?
> Once I have created the dummy variable, I use subset to obtain the codes of
> the companies with a complete history, and finally, with a merge, I determine
> a panel of companies with a complete history. But how do I determine d1
> correctly? My best regards, gm
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.