Run Rprof on the script that is updating the data frame. A data frame is a list, and modifying one element of it at a time inside a loop can be expensive, because each assignment goes through the data frame replacement methods, which may copy data. Rprof will probably show that a lot of time is spent in functions such as "[[" and "$<-.data.frame", which access and modify portions of the data frame. Vectors are much faster because they are typically stored contiguously in memory and can be indexed directly. Rprof is always helpful in answering the question "why is something taking so long?": it shows you where the potential bottlenecks are.
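As a rough illustration of the point above, here is a minimal sketch that profiles a loop assigning into a data frame column and then runs the same loop against a plain vector. The size n, the random stand-in values in counts, and the object names df, df2 and nin are made up for the example and are not from Tom's data; only Rprof(), summaryRprof() and system.time() from base R are used. Timings will vary by machine and R version, so no expected numbers are shown.

```r
# Illustrative sizes and values only; not the real 43,000-line dataset.
n <- 20000
counts <- sample(1:20, n, replace = TRUE)  # stand-in for values looked up from the table

# Version 1: assign into a data frame column one element at a time, under Rprof.
df <- data.frame(ID = seq_len(n), NinYear = integer(n))
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)
t_df <- system.time(
  for (i in seq_len(n)) {
    df$NinYear[i] <- counts[i]
  }
)
Rprof(NULL)  # stop profiling

# summaryRprof() raises an error if the run was too short to record any
# samples, so guard the call; increase n if this comes back NULL.
prof <- tryCatch(summaryRprof(prof_file), error = function(e) NULL)
if (!is.null(prof)) print(head(prof$by.self))

# Version 2: fill a plain vector in the loop, then attach it to the data frame once.
nin <- integer(n)
t_vec <- system.time(
  for (i in seq_len(n)) {
    nin[i] <- counts[i]
  }
)
df2 <- data.frame(ID = seq_len(n), NinYear = integer(n))
df2$NinYear <- nin

# Compare elapsed times for the two approaches.
print(rbind(data_frame = t_df, vector = t_vec)[, "elapsed"])
```

Both versions end with the same NinYear column. The only difference is whether the data frame is touched on every iteration or once after the loop.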

On Wed, Oct 15, 2008 at 7:33 AM, Tom La Bone <[EMAIL PROTECTED]> wrote:
>
> I want to thank everyone for the help. I ended up having to use a loop to
> assign values from the table to NinYear. However, as I have played with the
> full datasets I have noticed that R is MUCH faster if I use vectors in the
> loop rather than columns of a dataframe. In the specific case of 43,000
> lines of data, assigning values from the table to the 43,000 elements of a
> vector took 6 seconds whereas assigning values from the table to 43,000
> elements of a dataframe took 21 minutes. Why is there such a huge
> difference?
>
> Tom
>
>
> Tom La Bone wrote:
>>
>> Assume that I have the dataframe "data1", which is listed at the end of
>> this message. I want to count the number of lines that each person has for
>> each year. For example, the person with ID=213 has 15 entries (NinYear)
>> for 1953. The following bit of code calculates NinYear:
>>
>> for (i in 1:length(data1$ID)) {
>>   data1$NinYear[i] <- length(data1[data1$Year==data1$Year[i] &
>>     data1$ID==data1$ID[i],1]) }
>>
>> This seems to work but is horribly slow (some files I am working with have
>> over 500,000 lines). Can anyone suggest a faster way of doing this,
>> perhaps a way that does not use a for loop? Thanks.
>>
>> Tom
>>
>> ID Year NinYear
>> 209 1971 0
>> 209 1971 0
>> 213 1951 0
>> 213 1951 0
>> 213 1953 0
>> 213 1953 0
>> 213 1953 0
>> 213 1953 0
>> 213 1953 0
>> 213 1953 0
>> 213 1953 0
>> 213 1953 0
>> 213 1953 0
>> 213 1953 0
>> 213 1953 0
>> 213 1953 0
>> 213 1953 0
>> 213 1953 0
>> 213 1953 0
>> 213 1954 0
>> 213 1954 0
>> 213 1954 0
>> 213 1954 0
>> 213 1954 0
>> 213 1954 0
>> 213 1954 0
>> 213 1954 0
>> 213 1954 0
>> 213 1954 0
>> 213 1954 0
>> 213 1955 0
>> 213 1955 0
>> 234 1953 0
>> 234 1953 0
>> 234 1953 0
>> 234 1953 0
>> 234 1953 0
>> 234 1958 0
>> 234 1958 0
>> 234 1965 0
>> 234 1965 0
>> 234 1965 0
>> 249 1952 0
>> 249 1952 0
>>
>
> --
> View this message in context:
> http://www.nabble.com/Doing-a-Task-Without-Using-a-For-Loop-tp19974078p19991682.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?