[R] amount of data R can handle in a single file
Dear Sir/Madam,

I would like to know the maximum number of observations a single file can have when using R. I am asking because I am trying to do research on banking transactions and I have around 49 million records. Can R handle this? Please advise.

Mark Nasila
Quantitative Analyst
CBS Risk Management, Personal Banking
7th Floor, 2 First Place, Cnr Jeppe and Simmonds Street, Johannesburg, 2000
Tel (011) 371-2406, Fax (011) 352-9812, Cell 083 317 0118
e-mail mnas...@fnb.co.za
http://www.fnb.co.za/
http://www.howcanwehelpyou.co.za/

First National Bank - a division of FirstRand Bank Limited. An Authorised Financial Services and Credit Provider (NCRCP20).
To read FirstRand Bank's Disclaimer for this email, see https://www.fnb.co.za/disclaimer.html or send a blank e-mail to firstrandbankdisclai...@fnb.co.za to receive a copy.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] amount of data R can handle in a single file
On 02/17/2011 10:16 AM, Nasila, Mark wrote:
> I would like to know the maximum number of observations a single file
> can have when using R. I am asking because I am trying to do research on
> banking transactions and I have around 49 million records. Can R handle
> this? Please advise.

Dear Mark,

I think R can address up to a length of 2^31 - 1 ≈ 2.1e9 elements per vector. For numeric (double) data at 8 bytes per element, that is about 16 GB per vector (matrix, array).

For me, the available RAM is the more important limit: I work without problems with (numeric) matrices of size 2e5 x 250 = 5e7 elements (380 MB) that were produced from 5e4 x 2500 = 1.25e8 elements (≈ 1 GB) of raw data. The raw data is the practical limit on my 8 GB (64-bit Linux) machine: during processing it becomes complex, thus ≈ 2 GB, and with that I had to be very careful not to copy the matrix too often. This and a bunch of gc() calls let me process the data without swapping. :-)

Note that 2 GB corresponds quite nicely to the rule of thumb that the end of fun is reached with variable sizes of 1/3 of the RAM.

If you are concerned about your data set, I'd recommend reading a fraction of it and having a look at its object.size(), and also at the RAM use during analysis of that partial data set. Then extrapolate to the complete data set.

HTH,
Claudia

--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste
phone: +39 0 40 5 58-37 68
email: cbelei...@units.it
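The advice above (read a fraction of the data, check object.size(), then extrapolate) could be sketched roughly as follows. This is only an illustration under stated assumptions: the column names, the simulated sample, and the file name `transactions.csv` in the comment are hypothetical, and the 8 GB figure is simply the machine size mentioned in the reply, not a recommendation.

```r
# Rough memory estimate by extrapolating from a sample (illustrative only).
n_total  <- 49e6   # approximate number of records in the full data set
n_sample <- 1e5    # number of rows used for the estimate

# Simulated stand-in for a real sample so the sketch runs on its own.
# With real data one might instead use something like:
#   smp <- read.csv("transactions.csv", nrows = n_sample)   # hypothetical file
set.seed(1)
smp <- data.frame(
  account = sample.int(1e6, n_sample, replace = TRUE),          # integer column
  amount  = round(rnorm(n_sample, mean = 100, sd = 50), 2),     # numeric column
  type    = factor(sample(c("debit", "credit"), n_sample, replace = TRUE))
)

# Size of the sample in bytes, scaled linearly to the full record count.
bytes_sample <- as.numeric(object.size(smp))
bytes_total  <- bytes_sample / n_sample * n_total

# Compare against the "1/3 of RAM" rule of thumb (8 GB is an assumed machine size).
ram_bytes  <- 8 * 2^30
within_rule <- bytes_total <= ram_bytes / 3

cat(sprintf("sample: %.1f MB, extrapolated: %.2f GB, within 1/3 of RAM: %s\n",
            bytes_sample / 2^20, bytes_total / 2^30, within_rule))
```

The linear extrapolation ignores fixed per-object overhead and any copies made during analysis, so it is better read as a lower bound on working memory than as a precise figure.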
Re: [R] amount of data R can handle in a single file
On Thu, 17 Feb 2011, Nasila, Mark wrote:
> I would like to know the maximum number of observations a single file
> can have when using R. I am asking because I am trying to do research on
> banking transactions and I have around 49 million records. Can R handle
> this? Please advise.

It depends on the platform and on how many fields there are in a record. (On a 64-bit platform we have handled databases of 70m records and about 30 fields; we did use a DBMS to store them, though: see the 'R Data Import/Export Manual'.)

OTOH, one could ask what extra useful information there is in 49m records over a 1% sample. (In our case it was rare combinations, and we simply extracted those separately from the DBMS.)

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK
Tel: +44 1865 272861 (self), +44 1865 272866 (PA)
Fax: +44 1865 272595
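The idea of working from a 1% sample could be tried without loading the whole file, for instance by reading it in chunks and keeping each record with probability 0.01. The sketch below uses plain base R chunked reading rather than the DBMS route described in the reply; the helper name `sample_csv`, the temporary file, and its two columns are hypothetical and exist only to make the example self-contained.

```r
# Create a small hypothetical CSV so the sketch can run on its own.
path <- tempfile(fileext = ".csv")
set.seed(42)
n <- 20000
write.csv(data.frame(id = seq_len(n), amount = round(runif(n, 1, 1000), 2)),
          path, row.names = FALSE)

# Read a CSV in chunks and keep each data line with probability 'frac'
# (Bernoulli sampling, so the sample size is only approximately frac * n).
sample_csv <- function(path, frac = 0.01, chunk_rows = 5000) {
  con <- file(path, open = "r")
  on.exit(close(con))
  header <- readLines(con, n = 1)
  cols <- gsub('"', "", strsplit(header, ",")[[1]])   # column names from header
  kept <- list()
  repeat {
    lines <- readLines(con, n = chunk_rows)
    if (length(lines) == 0) break                     # end of file
    keep <- lines[runif(length(lines)) < frac]
    if (length(keep) > 0) {
      kept[[length(kept) + 1]] <-
        read.csv(text = keep, header = FALSE, col.names = cols)
    }
  }
  do.call(rbind, kept)
}

smp <- sample_csv(path)
nrow(smp)   # roughly 1% of the 20000 rows
```

Only one chunk of raw lines is held in memory at a time, so peak memory is governed by `chunk_rows` and the size of the retained sample. As the reply notes, a uniform sample like this will under-represent rare combinations, which would need to be extracted separately.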