>>>>> Hervé Pagès <hpa...@fredhutch.org>
>>>>>     on Fri, 2 Jun 2017 04:05:15 -0700 writes:

> Hi,
>
> I have a long numeric vector 'xx' and I want to use sum() to count
> the number of elements that satisfy some criterion, such as non-zero
> values or values lower than a certain threshold.
>
> The problem is: sum() returns an NA (with a warning) if the count
> is greater than 2^31 - 1. For example:
>
>   > xx <- runif(3e9)
>   > sum(xx < 0.9)
>   [1] NA
>   Warning message:
>   In sum(xx < 0.9) : integer overflow - use sum(as.numeric(.))
>
> This already takes a long time and doing sum(as.numeric(.)) would
> take even longer and require allocation of 24 GB of memory just to
> store an intermediate numeric vector made of 0s and 1s. Plus, having
> to do sum(as.numeric(.)) every time I need to count things is not
> convenient and is easy to forget.
>
> It seems that sum() on a logical vector could be modified to return
> the count as a double when it cannot be represented as an integer.
> Note that length() already does this so that wouldn't create a
> precedent. Also and FWIW prod() avoids the problem by always
> returning a double, whatever the type of the input is (except on a
> complex vector).
>
> I can provide a patch if this change sounds reasonable.

This sounds very reasonable, thank you Hervé, for the report, and
even more for a (small) patch.

Martin

> Cheers,
> H.
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fredhutch.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel