If you look at the profile, about half of the time is spent in the 'findSubsets' function and the other half in determining where the differences are between the sets. Is there a faster way of doing what findSubsets does, since it is the biggest time consumer? The setdiff step might be sped up by using 'match'.
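As a rough sketch of what I mean by using 'match' directly (the helper name drop_matched and the sample vectors are made up for illustration, they are not from your code): if the vector is already known to contain no duplicates, the unique() pass that setdiff() adds can be skipped. Whether this actually saves time on your data is something only another Rprof run can tell.

```r
# Hypothetical helper (not from the thread): drop from x every value found in y.
# Assumption: x has no duplicates, so the unique() call inside setdiff()
# is not needed and only the match() lookup remains.
drop_matched <- function(x, y) {
  x[match(x, y, nomatch = 0L) == 0L]
}

# Made-up example data
x <- c(2L, 5L, 7L, 11L, 13L)
y <- c(5L, 13L, 99L)

remaining <- drop_matched(x, y)
print(remaining)
```

For a duplicate-free x this should return the same values as setdiff(x, y); x[!(x %in% y)] is an equivalent spelling, since %in% is built on match.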

On Wed, Jun 27, 2012 at 12:51 PM, Adrian Duşa wrote:
> Hi Jim,
>
> On Wed, Jun 27, 2012 at 7:27 PM, jim holtman wrote:
>> One place to start is to use Rprof to see where time is being spent.
>> I used the sample you sent and this is what I got:
>>
>>
>>   0   16.7 root
>>   1.   16.2 system.time
>>   2. .   16.1 testfoo
>>   3. . .   16.1 setdiff
>>   4. . . .   8.2 as.vector
>>   5. . . . .   8.2 findSubsets
>>   6. . . . . .   6.4 increment
>>   7. . . . . . .   4.2 as.vector
>>   8. . . . . . . .   3.6 outer
>>   9. . . . . . . . .   0.3 rep.int
>>   7. . . . . . .   1.6 c
>>   7. . . . . . .   0.2 max
>>   4. . . .   7.9 unique
>>   5. . . . .   7.3 match
>>   5. . . . .   0.3 unique.default
>>   1.   0.5 sort
>>   2. .   0.5 standardGeneric
>>   3. . .   0.3 sample
>>   3. . .   0.2 sort
>>   4. . . .   0.2 sort.default
>>   5. . . . .   0.2 sort.int
>>
>> Of the 16.7 seconds to execute the code, 16.1 was taken up in
>> 'setdiff'. Maybe there is some other way you can determine the
>> difference. So if you continue to use 'setdiff', it does not look
>> like there is much that can be done.
>
> One thing to notice is that setdiff() is part of the while() loop.
>
> I could in principle loop over the entire vector and eliminate (all)
> the derived numbers at the end, but I have a hunch it might take even
> longer. The point of setdiff() was to progressively shorten the vector
> in order to minimize the time spent in the loop. On the other hand,
> setdiff() overwrites the vector at each iteration and that of course
> also takes time.
>
> I thought a C program might prove to be faster (because of the faster
> looping over each value in the vector), but although it works just
> fine it seems I am unable to properly use C, given the similar long
> time spent (probably because of toying with the memory too much).
>
> Well, any other quicker alternative would do...
> Thanks,
> Adrian
>
> --
> Adrian Dusa
> Romanian Social Data Archive
> 1, Schitu Magureanu Bd.
> 050025 Bucharest sector 5
> Romania
> Tel.:+40 21 3126618 \
>      +40 21 3120210 / int.101
> Fax: +40 21 3158391

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

