[R] amount of data R can handle in a single file

2011-02-17 Thread Nasila, Mark
Dear Sir/Madam,

I would like to know the maximum number of observations a single file
can have when using R. I am asking because I am trying to do research on
banking transactions and I have around 49 million records. Can R handle
this? Please advise.

Mark Nasila
Quantitative Analyst
CBS Risk Management

Personal Banking
7th Floor, 2 First Place,
Cnr Jeppe and Simmonds Street,
Johannesburg,
2000
Tel (011) 371-2406, Fax (011) 352-9812, Cell 083 317 0118
e-mail mnas...@fnb.co.za

www.fnb.co.za  www.howcanwehelpyou.co.za

First National Bank - a division of FirstRand Bank Limited.
An Authorised Financial Services and Credit Provider (NCRCP20).


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] amount of data R can handle in a single file

2011-02-17 Thread Claudia Beleites

On 02/17/2011 10:16 AM, Nasila, Mark wrote:

> Dear Sir/Madam,
>
> I would like to know the maximum number of observations a single file
> can have when using R. I am asking because I am trying to do research on
> banking transactions and I have around 49 million records. Can R handle
> this? Please advise.

Dear Mark,

I think R can address vectors up to a length of 2^31 - 1 ≈ 2.1e9 elements.
That many numeric elements (8 bytes each) would be 16 GB per vector (matrix, array).

For me, the available RAM is the more important limit:
I work without problems with numeric matrices of size 2e5 x 250 = 5e7 elements
(380 MB) that were produced from 5e4 x 2500 = 1.25e8 elements (≈ 1 GB) of raw data.
The raw data is the practical limit on my 8 GB (64-bit Linux) machine:
during processing it becomes complex (16 bytes per element instead of 8), thus
≈ 2 GB, and with that I had to be very careful not to copy the matrix too often.
That and a bunch of gc() calls let me process the data without swapping. :-)
Note that 2 GB corresponds quite nicely to the rule of thumb that the end of fun
is reached with variable sizes of 1/3 of the RAM.
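
The sizes quoted above follow from 8 bytes per numeric (double) element and
16 bytes per complex element. A rough sketch of that arithmetic, where est_mib
is a hypothetical helper name and the small per-object header is ignored:

```r
# Estimated size in MiB of a dense matrix.
# Assumption: 8 bytes per numeric element, 16 per complex element;
# the fixed object header (a few dozen bytes) is ignored.
est_mib <- function(nrow, ncol, bytes_per_element = 8) {
  nrow * ncol * bytes_per_element / 2^20
}

est_mib(2e5, 250)        # processed numeric matrix, roughly 380 MiB
est_mib(5e4, 2500)       # raw numeric data, a bit under 1 GiB
est_mib(5e4, 2500, 16)   # the same data stored as complex, a bit under 2 GiB
```

Every copy R makes of such a matrix needs the same amount again, which is why
a single 2 GB object is already uncomfortable on an 8 GB machine.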


If you are concerned about your data set, I'd recommend reading a fraction of
the data set and looking at its object.size(), and also at the RAM use during
data analysis of that partial data set. Then extrapolate to the complete
data set.
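
A minimal sketch of that extrapolation. The three columns below are invented
stand-ins for real transaction fields, and the sample is simulated so the
example runs on its own; with a real file the sample could instead come from
something like read.csv(file, nrows = n_sample).

```r
# Assumptions: n_total is the 49 million records from the question;
# the columns and their types are hypothetical.
n_sample <- 1e4
n_total  <- 49e6

set.seed(1)
sample_df <- data.frame(
  account = sample.int(1e6, n_sample, replace = TRUE),          # integer
  amount  = round(rnorm(n_sample, mean = 500, sd = 200), 2),    # numeric
  type    = factor(sample(c("debit", "credit"), n_sample, replace = TRUE))
)

# Size of the sample in bytes, scaled linearly to the full data set.
bytes_sample <- as.numeric(object.size(sample_df))
est_gib <- bytes_sample / n_sample * n_total / 2^30
est_gib
```

Linear scaling is only an approximation (character columns with many distinct
values behave differently), and the analysis itself usually needs several
times the size of the data. Calling gc() before and after an analysis step on
the sample gives an idea of that working overhead.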


HTH Claudia

--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbelei...@units.it



Re: [R] amount of data R can handle in a single file

2011-02-17 Thread Prof Brian Ripley

On Thu, 17 Feb 2011, Nasila, Mark wrote:

> Dear Sir/Madam,
>
> I would like to know the maximum number of observations a single file
> can have when using R. I am asking because I am trying to do research on
> banking transactions and I have around 49 million records. Can R handle
> this? Please advise.


Depends on the platform and how many fields there are in a record. 
(On a 64-bit platform we have handled databases of 70m records and 
about 30 fields: we did use a DBMS to store them, though: see the 'R 
Data Import/Export Manual'.)


OTOH, one could ask what extra useful information there is in 49m 
records over a 1% sample.  (In our case it was rare combinations, and 
we simply extracted those separately from the DBMS.)
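
If the records live in a flat file rather than a DBMS, a 1% sample can be
drawn without ever loading the whole file. The following is only a sketch
under stated assumptions: sample_csv is a hypothetical helper, the CSV is a
small synthetic file written to a temporary path, and each record is assumed
to occupy exactly one line.

```r
# Write a small synthetic transactions file so the sketch is runnable.
set.seed(42)
path <- tempfile(fileext = ".csv")
n <- 20000
write.csv(data.frame(id = seq_len(n), amount = round(runif(n, 1, 1000), 2)),
          path, row.names = FALSE)

# Read the file chunk by chunk and keep each record with probability frac,
# so that only one chunk plus the retained lines is in memory at a time.
sample_csv <- function(path, frac = 0.01, chunk_size = 5000) {
  con <- file(path, open = "r")
  on.exit(close(con))
  header <- readLines(con, n = 1)
  kept <- character(0)
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break
    kept <- c(kept, lines[runif(length(lines)) < frac])
  }
  read.csv(text = c(header, kept))
}

smp <- sample_csv(path)
nrow(smp)   # close to 1% of n; the exact count varies with the random draw
```

Rare combinations would be under-represented in such a sample, which is the
case where extracting them separately, as described above, is the better route.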



--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
