Re: [R] how to use large data set ?

2006-07-20 Thread bogdan romocea
By far, the cheapest and easiest solution (and the very first to try)
is to add more memory. The cost depends on what kind you need, but
for example, here is 2 GB that you can buy for only $150:
http://www.newegg.com/Product/Product.asp?Item=N82E16820144157

Project constraints?! If they don't want to spend a couple hundred USD
for memory, you're working on the wrong project (and/or for the wrong
organization). Buying more memory (say up to a few GB) is orders of
magnitude cheaper than the licenses for some proprietary software that
can get around memory constraints, and probably (much) cheaper than
the loss of productivity caused by the extra training and setup time
needed to try to implement an alternative solution (such as a
connection to a DBMS). And even if the extra memory needed for R were
as expensive as a license for proprietary software, which choice
would be more reasonable?



[R] how to use large data set ?

2006-07-19 Thread Yohan CHOUKROUN
Hello R users,

Sorry for my English, I'm French.

I want to use a large dataset (3 million rows and 70 variables), but I don't
know how to do it because my computer crashes quickly (P4 2.8 GHz, 1 GB).

I also have a dual Xeon with 2 GB, so I want to do the computation on that
computer and show the results on mine. Both of them run Windows XP...

In short, I have:

1 server with a MySQL database

1 computer

and I want to use them with a large dataset.

I'm trying to use RDCOM to connect to the database and to install Rpad (but
it's hard for me...).

Are there other solutions?

Thanks in advance

 

 

Yohan C.
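
A back-of-the-envelope estimate shows why a dataset of this size strains a
1 GB machine. The sketch below assumes every column is stored as a double
(8 bytes per value), which the message does not state; integer or factor
columns would take about half that.

```r
n_rows <- 3e6
n_cols <- 70
bytes_per_value <- 8  # assumption: all columns are numeric (double)

# size of a single in-memory copy of the data
est_bytes <- n_rows * n_cols * bytes_per_value
est_gib <- est_bytes / 2^30
cat(sprintf("about %.2f GiB for one copy of the data\n", est_gib))
```

R often makes temporary copies during modelling, so the working requirement
is typically larger than a single copy.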



--
This message is confidential. Its content does not constitute a commitment
by Soft Computing Group except where provided for in a written agreement
between you and Soft Computing Group. Any unauthorised disclosure, use or
dissemination, either whole or partial, is prohibited. If you are not the
intended recipient of this message, please notify the sender immediately. 
-- 




__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to use large data set ?

2006-07-19 Thread Greg Snow
You did not say what analysis you want to do, but many common analyses
can be done as special cases of regression models and you can use the
biglm package to do regression models.

Here is an example that worked for me to get the mean and standard
deviation by day from an Oracle database with over 23 million rows (I
had previously set up 'edw' as an ODBC connection to the database under
Windows; any of the database connection packages should work for you,
though):

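library(RODBC)
library(biglm)

# 'edw' is the ODBC data source set up beforehand; 'pass' holds the password
con <- odbcConnect('edw', uid='glsnow', pwd=pass)

# submit the query; the rows are fetched in chunks below
odbcQuery(con, "select ADMSN_WEEKDAY_CD, LOS_DYS from CM.CASEMIX_SMRY")

t1 <- Sys.time()

# first chunk: used to initialise the fit
tmp <- sqlGetResults(con, max=10)

names(tmp) <- c("Day","LoS")
tmp$Day <- factor(tmp$Day, levels=as.character(1:7))
tmp <- na.omit(tmp)
tmp <- subset(tmp, LoS > 0)

ff <- log(LoS) ~ Day

fit <- biglm(ff, tmp)

i <- nrow(tmp)
# remaining chunks: the loop ends when sqlGetResults() no longer
# returns a data frame (nrow() is then NULL)
while( !is.null(nrow( tmp <- sqlGetResults(con, max=10) ) ) ){
  names(tmp) <- c("Day","LoS")
  tmp$Day <- factor(tmp$Day, levels=as.character(1:7))
  tmp <- na.omit(tmp)
  tmp <- subset(tmp, LoS > 0)

  # fold this chunk into the existing fit
  fit <- update(fit,tmp)

  i <- i + nrow(tmp)
  cat(format(i,big.mark=','), "rows processed\n")
}

summary(fit)

t2 <- Sys.time()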

t2-t1
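
The chunk-at-a-time idea in the code above can be tried without a database
or biglm. The following is only a minimal base-R sketch, under the
assumption that all that is wanted is the mean and standard deviation of
log(LoS) per day: it keeps running counts, sums and sums of squares, and
never needs more than one chunk at a time. The simulated data, the chunk
size and the helper name update_stats are made up for illustration.

```r
# Hypothetical helper: fold one chunk into the running per-day totals.
update_stats <- function(stats, chunk) {
  x <- log(chunk$LoS)
  day <- as.character(chunk$Day)
  for (d in unique(day)) {
    xs <- x[day == d]
    stats[d, "n"]   <- stats[d, "n"]   + length(xs)
    stats[d, "sx"]  <- stats[d, "sx"]  + sum(xs)
    stats[d, "sxx"] <- stats[d, "sxx"] + sum(xs^2)
  }
  stats
}

set.seed(1)
# Simulated stand-in for the database table (an assumption, not real data).
dat <- data.frame(Day = sample(1:7, 1000, replace = TRUE),
                  LoS = rexp(1000) + 0.1)

# One row of running totals per day: count, sum, sum of squares.
stats <- matrix(0, nrow = 7, ncol = 3,
                dimnames = list(as.character(1:7), c("n", "sx", "sxx")))

chunk_size <- 100
for (start in seq(1, nrow(dat), by = chunk_size)) {
  rows <- start:min(start + chunk_size - 1, nrow(dat))
  stats <- update_stats(stats, dat[rows, ])
}

chunk_mean <- stats[, "sx"] / stats[, "n"]
chunk_sd <- sqrt((stats[, "sxx"] - stats[, "n"] * chunk_mean^2) /
                 (stats[, "n"] - 1))
```

The sum-of-squares formula can lose precision when the mean is large
relative to the spread; for anything beyond a quick check, the biglm
approach above is the safer route.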
 
Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111
 



Re: [R] how to use large data set ?

2006-07-19 Thread mahesh r
Hi,
I would like to extend the query posted earlier on using large
databases. I am trying to use rgdal to mine remote sensing imagery.
I don't have problems bringing the images into the R environment. But when I
try to convert the images to a data.frame I receive a warning message from
R saying "1: Reached total allocation of 510Mb: see help(memory.size)" and
the process terminates. Due to project constraints I have been given a very
old 2.4 GHz computer with only 512 MB RAM. I think R is currently
trying to store the results in RAM, and since the image is very big
(some 9 million pixels), it runs out of memory.

My questions are:
1. Is there any way to dump the temporary variables to a temp folder
on the hard disk (as many software packages do) instead of letting R store
them in RAM?
2. Could this be done without creating a connection to a back-end
database like Oracle?

Thanks,

Mahesh
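
On the first question, one database-free option is to keep the data in a
file on disk and read it through a connection one block at a time, so that
only one block is in RAM at once. The following is a sketch of that idea in
base R, not something proposed in the thread; the temporary CSV file, the
column name value and the block size are made up, and real imagery would be
read through its own driver rather than from CSV.

```r
# Write example pixel values to a temporary file (stand-in for data on disk).
set.seed(42)
path <- tempfile(fileext = ".csv")
values <- round(runif(10000, 0, 255))
write.csv(data.frame(value = values), path, row.names = FALSE)

con <- file(path, open = "r")
header <- readLines(con, n = 1)      # consume the header line

block_size <- 1000
n <- 0
total <- 0
repeat {
  lines <- readLines(con, n = block_size)  # continues where the last read stopped
  if (length(lines) == 0) break            # end of file
  x <- as.numeric(lines)
  n <- n + length(x)
  total <- total + sum(x)
}
close(con)
unlink(path)

block_mean <- total / n
```

Only summaries that can be accumulated block by block (counts, sums, and
the like) fit this pattern directly.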



Re: [R] how to use large data set ?

2006-07-19 Thread Dylan Beaudette
Hi,

While R is generally flexible enough for just about anything you can throw at 
it, detailed analysis of imagery might be better accomplished in a 
specialized piece of software. One option might be GRASS, which would allow 
you to do further processing on a subset of the original data in R.

Cheers,

Dylan
