Re: [R] Database connection query
On Feb 9, 2015, at 4:33 AM, Lalitha Kristipati lalitha.kristip...@techmahindra.com wrote:

> Hi,
> I would like to know when to use drivers and when to use packages to connect to databases in R.
> Regards,
> Lalitha Kristipati
> Associate Software Engineer

In general, you will need both. There is more information in the R Data Import/Export manual:

  http://cran.r-project.org/doc/manuals/r-release/R-data.html#Relational-databases

and there is a SIG list for R and DB specific subject matter:

  https://stat.ethz.ch/mailman/listinfo/r-sig-db

Regards,

Marc Schwartz

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Database connectivity
RODBC

On Mon, Apr 16, 2012 at 11:40 AM, Partha Sinha pnsinh...@gmail.com wrote:

> Dear All
> Please let me know how to connect to SQL Server from R. Which packages will I require for this?
> Thanks
> Parth
Re: [R] Database
I don't use Access, but my general impression is that the advantages it brings will be similar to those brought by any other database: performance rather than ability -- they are both Turing complete, after all, after some trickery on the SQL end.

Databases allow much larger data sets than R currently does and often allow faster queries -- some would argue the SQL syntax is clearer for some subsetting operations, but that's perhaps a function of familiarity. For the task you describe, it should be elementary on both platforms and I'd just use whichever one the data was already in. For more substantive data analysis, you almost certainly want to use R.

Others with Access experience (or more SQL) can add more.

Michael

On Tue, Feb 28, 2012 at 6:06 PM, Trying To learn again tryingtolearnag...@gmail.com wrote:

> Hi all,
> I'm new to using Access. I see that many things you can do in Access you can also do in R, but not the contrary. My question is: is there any manual with examples comparing how to do database analysis in Access and how to do the same in R?
> Imagine I want to compare the "Name" columns of two different databases. I want to see if there are identical names in both files. Is it better to use Access, or is it better to use R (importing the data and working in R)? This is only an example.
> I know R is more specialized in statistics and data analysis, but I'm trying to avoid learning Access, SQL and so on. I cannot explain it better; I hope you understand me.
Re: [R] Database
On 29/02/2012 12:45, R. Michael Weylandt wrote:

> I don't use Access but my general impression is that the advantages it
> brings will be similar to those brought by any other database:
> performance rather than ability -- they are both Turing complete after
> all, after some trickery on the SQL end.
>
> Databases allow much larger data sets than R currently does and often

But not much larger in Access than 64-bit R allows: Access is a pretty limited system.

> allow faster queries -- some would argue the SQL syntax is clearer for
> some subsetting operations, but that's perhaps a function of
> familiarity. For the task you describe, it should be elementary in both
> platforms and I'd just use whichever one the data was already in. For
> more substantive data analysis, you almost certainly want to use R.
> Others with Access experience (or more SQL) can add more.

Access does not bring the performance benefits of more advanced DBMS engines: on Windows I would certainly recommend using SQL Server Express (or whatever it is currently called) instead. If you want to use a DBMS to supplement R (as per the R Data Import/Export manual) I would use MySQL or SQLite.

> On Tue, Feb 28, 2012 at 6:06 PM, Trying To learn again tryingtolearnag...@gmail.com wrote:
>> Imagine I want to compare the "Name" columns of two different databases.
>> I want to see if there are identical names on both files.

'files'? Where are the data? If you mean different tables in one Access database then I would still do this in R via RODBC.

>> Is it better to use Access? Or is it better to use R (importing data and
>> working in R)? This is only an example. I know R is more specialized in
>> statistics and data analysis, but I'm trying to avoid learning Access,
>> SQL and so on.

In which case you can simply use RODBC to import tables to R and work there.

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford,            Tel: +44 1865 272861 (self)
1 South Parks Road,                   +44 1865 272866 (PA)
Oxford OX1 3TG, UK               Fax: +44 1865 272595
Re: [R] Database
On Tue, Feb 28, 2012 at 6:06 PM, Trying To learn again tryingtolearnag...@gmail.com wrote:

> Hi all,
> I'm new to using Access. I see that many things you can do in Access you can also do in R, but not the contrary. My question is: is there any manual with examples comparing how to do database analysis in Access and how to do the same in R?

The examples section at the bottom of ?sqldf in the sqldf package shows how to do a variety of calculations in both SQL and in R. Also see the home page at http://sqldf.googlecode.com

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Re: [R] Database
It all depends on what you are doing, but R is pretty powerful. I have never used Access so I don't know what it can do, but I have played around with other DBs at a very basic level, and most things I did could be done quite easily in R. Sheer data set size could be a problem, but unless you have millions of data items you are probably okay in R.

For your example, assume nams1 is from dataset 1 and nams2 is from dataset 2:

  nams1 <- letters[1:5]
  nams2 <- letters[3:7]
  nams1 %in% nams2

Done.

John Kane
Kingston ON Canada

-----Original Message-----
From: tryingtolearnag...@gmail.com
Sent: Wed, 29 Feb 2012 00:06:08 +0100
To: r-help@r-project.org
Subject: [R] Database

> Hi all,
> I'm new to using Access. I see that many things you can do in Access you can also do in R, but not the contrary. My question is: is there any manual with examples comparing how to do database analysis in Access and how to do the same in R?
> Imagine I want to compare the "Name" columns of two different databases. I want to see if there are identical names in both files. Is it better to use Access, or is it better to use R (importing the data and working in R)? This is only an example.
> I know R is more specialized in statistics and data analysis, but I'm trying to avoid learning Access, SQL and so on. I cannot explain it better; I hope you understand me.
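John's %in% line reports which names occur in both vectors; base R's intersect() and merge() extend the same idea to an SQL-style join. A minimal sketch with made-up data frames (df1 and df2 are illustrative names, not from the thread):

```r
# Two toy datasets that share a "name" column
df1 <- data.frame(name = c("a", "b", "c"), x = 1:3, stringsAsFactors = FALSE)
df2 <- data.frame(name = c("b", "c", "d"), y = 4:6, stringsAsFactors = FALSE)

# Names present in both datasets (like SQL INTERSECT)
common <- intersect(df1$name, df2$name)

# Inner join on "name" (like SQL INNER JOIN)
joined <- merge(df1, df2, by = "name")
```

Here common is c("b", "c") and joined has one row per matching name, carrying both x and y.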
Re: [R] Database abstraction
I was also wondering about precedents or similar packages in the R ecosystem. I'm familiar with R-ORM and sqldf, which seem closest. At this point any and all feedback and critiques are extremely welcome! I imagine other frequent database users have had problems much the same as mine. I'd like to release this in the near future. I've been using Roxygen for documentation, but support for annotations on R5 methods isn't great...

Thanks all.

----- Original Message -----
From: Rob Tirrell r...@stanford.edu
To: r-help@r-project.org
Sent: Sunday, February 27, 2011 2:38:18 AM
Subject: [R] Database abstraction

Hi all -

I've been working on a lite ORM and database-abstraction package for R. Formatting complex queries by hand has always been an error-prone hassle, so I've tried to do away with that as much as possible, instead using R objects to represent elements of a database system (statements, clauses, operators, functions, fields, tables, etc.). R5 classes have made the development of this package somewhat easier than it would have been in the past (in my view, at least). An expression like:

  query(some_table$some_field)$where(some_table$some_field != NA & some_table$some_other_field %in% c(1, 2, 3))

would generate SQL like:

  SELECT `database`.`some_table`.`some_field` AS `some_field`
  FROM `database`.`some_table`
  WHERE `database`.`some_table`.`some_field` IS NOT NULL
    AND `database`.`some_table`.`some_other_field` IN (1, 2, 3);

The basic features:

- Abstraction of tables and fields to R5 objects, and use of R functions where a clear parallel in SQL exists (e.g. %in% -> IN, unique -> DISTINCT).
- Result proxying (by a data.frame-like object), which allows on-demand fetching and live updates to results (where appropriate).
- Session management.

It's my feeling that most R users care more about using existing tables than creating new ones, so introspection is more of a focus than defining new relations. It's reasonably far along, and I've been using it for my own projects for some time. However, much syntax is certainly not yet supported, and I've only written a compiler for MySQL. At this point I'd welcome anyone interested in contributing or offering advice. The code is (I think) designed to support the addition of more complex logic at any stage (generation, preparation, compilation, formatting) without becoming spaghettized. I've written a reasonably-sized test suite that covers most of the functionality.

The code is hosted on GitHub: https://github.com/rtirrell/databasr. Feel free to fork, critique and/or use. I've taken some pains to ensure that the interface will remain relatively stable.

Thanks!

--
Robert Tirrell | r...@stanford.edu | (607) 437-6532
Program in Biomedical Informatics | Butte Lab | Stanford University
Re: [R] database table merging tips with R
I would load your set of userid's into a temporary table in Oracle, then join that table with the rest of your SQL query to get only the matching rows out.

-Aaron

On Thu, Sep 11, 2008 at 2:33 PM, Avram Aelony [EMAIL PROTECTED] wrote:

> Dear R list,
>
> What is the best way to efficiently marry an R dataset with a very large (Oracle) database table? The goal is to return only the Oracle table rows that match IDs present in the R dataset. I have an R data frame with 2000 user IDs analogous to:
>
>   r <- data.frame(userid = round(runif(2000) * 10, 0))
>
> ...and I need to pull data from an Oracle table only for these 2000 IDs. The Oracle table is quite large. Additionally, the SQL query may need to join to other tables to bring in ancillary fields. I currently connect to Oracle via ODBC:
>
>   library(RODBC)
>   connection <- odbcConnect("", uid = "", pwd = "")
>   d <- sqlQuery(connection, "select userid, x, y, z from largetable where timestamp > sysdate - 7")
>
> ...allowing me to pull data from the database table into the R object d and then use the R merge function. The problem, however, is that if d is too large it may fail due to memory limitations or be inefficient. I would like to push the merge portion to the database, and it would be very convenient if it were possible to request that the query look to the R object for the IDs to which it should restrict the output. Is there a way to do this? Something like the following fictional code:
>
>   d <- sqlQuery(connection, "select t.userid, x, y, z from largetable t where r$userid = t.userid")
>
> Would sqldf (http://code.google.com/p/sqldf/) help me out here? If so, how? This would be convenient and help me avoid needing to create a temporary table to store the R data, join via SQL, then return the data back to R.
>
> I am using R version 2.7.2 (2008-08-25) / i386-pc-mingw32.
>
> Thanks for your comments, ideas, recommendations.
> -Avram
Re: [R] database table merging tips with R
Sorry, I see now you want to avoid this, but you did ask what was "the best way to efficiently ...", and the temp. table solution certainly matches your description. What's wrong with using a temporary table?

-Aaron

On Thu, Sep 11, 2008 at 3:05 PM, Aaron Mackey [EMAIL PROTECTED] wrote:

> I would load your set of userid's into a temporary table in Oracle, then join that table with the rest of your SQL query to get only the matching rows out.
Re: [R] database table merging tips with R
Perhaps I will need to create a temp table, but I am asking if there is a way to avoid it. It would be great if there were a way to tie the R data frame temporarily to the query in a transparent fashion. If not, I will see if I can create/drop the temp table directly from sqlQuery.

-Avram

On Thursday, September 11, 2008, at 12:07 PM, Aaron Mackey [EMAIL PROTECTED] wrote:

> Sorry, I see now you want to avoid this, but you did ask what was "the best way to efficiently ...", and the temp. table solution certainly matches your description. What's wrong with using a temporary table?
> -Aaron
Re: [R] database table merging tips with R
I guess I'd do it something like this:

  dbGetQuery(con, "CREATE TEMPORARY TABLE foo ( etc etc )")

  sapply(userids, function(x) {
    dbGetQuery(con, paste("INSERT INTO foo (userid) VALUES (", x, ")"))
  })

then later:

  dbGetQuery(con, "DROP TABLE foo")

-Aaron

On Thu, Sep 11, 2008 at 3:21 PM, Avram Aelony [EMAIL PROTECTED] wrote:

> Perhaps I will need to create a temp table, but I am asking if there is a way to avoid it. It would be great if there were a way to tie the R data frame temporarily to the query in a transparent fashion. If not, I will see if I can create/drop the temp table directly from sqlQuery.
> -Avram
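The sapply loop above issues one INSERT round trip per id; the statement can instead be built once in R and sent in a single call. A sketch of the string construction only, reusing the hypothetical table foo from that example (no database connection is involved at this stage):

```r
userids <- c(101, 102, 103)  # illustrative ids

# Build "(101), (102), (103)"...
values <- paste0("(", userids, ")", collapse = ", ")

# ...and prepend the INSERT statement
sql <- paste0("INSERT INTO foo (userid) VALUES ", values)
# sql would then be sent once, e.g. dbGetQuery(con, sql)
```

Note that a multi-row VALUES list is accepted by many engines (MySQL, PostgreSQL) but, notably, not by Oracle of that era, so check the target database before relying on it.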
Re: [R] database table merging tips with R
Aaron Mackey writes:

> I guess I'd do it something like this:
>
>   dbGetQuery(con, "CREATE TEMPORARY TABLE foo ( etc etc )")
>   sapply(userids, function(x) {
>     dbGetQuery(con, paste("INSERT INTO foo (userid) VALUES (", x, ")"))
>   })
>
> then later:
>
>   dbGetQuery(con, "DROP TABLE foo")

Actually, based on my reading of the DBI reference, you should be able to do the following to create a table (although possibly not a temporary one):

  dbWriteTable(connection, "r_user_ids", r)

Then you can use the following to drop the table:

  dbRemoveTable(connection, "r_user_ids")

Of course, I don't know whether the ODBC driver implements these functions or not. (Is 'RODBC' built on DBI? Looks like Aaron and I have been assuming that.)

Coey
Re: [R] database table merging tips with R
While the subquery with a temporary table is probably the better option, you could just manually generate the subquery and pass it in with the query. As an example, if you have user_ids 1000-1005, instead of having "... where user_id in (select user_id from r_user_id)", you would have "... where user_id in (1000, 1001, 1002, 1003, 1004, 1005)". With 2000 user_ids, you'd just have R generate the string (the whole query, for that matter) and pass it in. Sure, it's a few thousand characters going into the database as a query, but it sure beats pulling down a few million characters that you're going to just throw away.

I have no idea how the performance of the above method would compare to using a temporary table (on the database side, of course), but I have a hard time believing that it wouldn't be within a factor of two. But I'm just making the suggestion.

Coey

Avram Aelony writes:

> What is the best way to efficiently marry an R dataset with a very large (Oracle) database table? The goal is to return only the Oracle table rows that match IDs present in the R dataset. I have an R data frame with 2000 user IDs analogous to:
>
>   r <- data.frame(userid = round(runif(2000) * 10, 0))
>
> ...and I need to pull data from an Oracle table only for these 2000 IDs. Is there a way to do this?
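The IN list Coey describes is easy to generate from the data frame in the original post. A sketch of the string building alone (largetable and the column names come from Avram's example; no connection is needed to construct the query text):

```r
userids <- c(1000, 1001, 1002, 1003, 1004, 1005)  # stand-in for r$userid

# Collapse the ids into "1000, 1001, ..." and splice them into the query
in_list <- paste(userids, collapse = ", ")
sql <- paste0("select userid, x, y, z from largetable ",
              "where userid in (", in_list, ")")
```

The resulting sql string would then be passed to sqlQuery(connection, sql). Two caveats: character ids would need quoting before collapsing, and Oracle caps an IN list at 1000 literals, so 2000 ids would have to be split across two IN clauses joined with OR.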
Re: [R] database table merging tips with R
I have not devoted time to setting up ROracle, since binaries are not available and it seems to require some effort to compile (see http://cran.r-project.org/web/packages/ROracle/index.html). On the other hand, RODBC worked more or less magically once I set up the data sources. What is your success using ROracle, and why would it be preferable to RODBC? -Avram

On Thursday, September 11, 2008, at 12:47PM, Coey Minear [EMAIL PROTECTED] wrote: Aaron Mackey writes: I guess I'd do it something like this: dbGetQuery(con, "CREATE TEMPORARY TABLE foo ( etc etc )") sapply(userids, function(x) { dbGetQuery(con, paste("INSERT INTO foo (userid) VALUES (", x, ")")) }) then later: dbGetQuery(con, "DROP TABLE foo") Actually, based on my reading of the DBI reference, you should be able to do the following to create a table (although possibly not temporary): dbWriteTable(connection, "r_user_ids", r) Then you can use the following to drop the table: dbRemoveTable(connection, "r_user_ids") Of course, I don't know whether the ODBC driver implements these functions or not. (Is 'RODBC' built on DBI? Looks like Aaron and I have been assuming that.) Coey

-Aaron On Thu, Sep 11, 2008 at 3:21 PM, Avram Aelony [EMAIL PROTECTED] wrote: Perhaps I will need to create a temp table, but I am asking if there is a way to avoid it. It would be great if there were a way to tie the R data frame temporarily to the query in a transparent fashion. If not, I will see if I can create/drop the temp table directly from sqlQuery. -Avram

On Thursday, September 11, 2008, at 12:07PM, Aaron Mackey [EMAIL PROTECTED] wrote: Sorry, I see now you want to avoid this, but you did ask what was "the best way to efficiently ...", and the temp table solution certainly matches your description. What's wrong with using a temporary table? -Aaron

On Thu, Sep 11, 2008 at 3:05 PM, Aaron Mackey [EMAIL PROTECTED] wrote: I would load your set of userid's into a temporary table in Oracle, then join that table with the rest of your SQL query to get only the matching rows out. -Aaron

On Thu, Sep 11, 2008 at 2:33 PM, Avram Aelony [EMAIL PROTECTED] wrote: Dear R list, What is the best way to efficiently marry an R dataset with a very large (Oracle) database table? The goal is to return only Oracle table rows that match IDs present in the R dataset. I have an R data frame with 2000 user IDs, analogous to: r = data.frame(userid = round(runif(2000)*10, 0)) ...and I need to pull data from an Oracle table only for these 2000 IDs. The Oracle table is quite large. Additionally, the SQL query may need to join to other tables to bring in ancillary fields. I currently connect to Oracle via ODBC: library(RODBC) connection <- odbcConnect(, uid=, pwd=) d = sqlQuery(connection, "select userid, x, y, z from largetable where timestamp > sysdate - 7") ...allowing me to pull data from the database table into the R object d and then use the R merge function. The problem, however, is that if d is too large, it may fail due to memory limitations or be inefficient. I would like to push the merge portion to the database, and it would be very convenient if it were possible to request that the query look to the R object for the IDs to which it should restrict the output. Is there a way to do this? Something like the following fictional code: d = sqlQuery(connection, "select t.userid, x, y, z from largetable t where r$userid = t.userid") Would sqldf (http://code.google.com/p/sqldf/) help me out here? If so, how? This would be convenient and help me avoid needing to create a temporary table to store the R data, join via SQL, then return the data back to R. I am using R version 2.7.2 (2008-08-25) / i386-pc-mingw32. Thanks for your comments, ideas, recommendations. -Avram
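For readers following the thread, the temporary-table approach Coey describes can be sketched with DBI. This is only an illustration, not code from the thread: it uses RSQLite as an in-memory stand-in for the Oracle table (RSQLite is what Coey was actually using), and the table and column names (`largetable`, `r_user_ids`, `userid`, `x`) are borrowed from the discussion:

```r
# Sketch of the temp-table join discussed above, using DBI (here with
# RSQLite as a stand-in backend; names follow the thread's examples).
library(DBI)
library(RSQLite)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Pretend this is the large database table
dbWriteTable(con, "largetable",
             data.frame(userid = 1:10, x = rnorm(10)))

# The R-side IDs, as in the original question
r <- data.frame(userid = round(runif(2000) * 10, 0))

# Push the IDs into a small table, join in SQL, then clean up
dbWriteTable(con, "r_user_ids", r)
d <- dbGetQuery(con,
  "select t.userid, t.x
     from largetable t
     join r_user_ids u on u.userid = t.userid")
dbRemoveTable(con, "r_user_ids")
dbDisconnect(con)
```

Whether an Oracle backend honors a truly temporary (session-scoped) table via dbWriteTable is backend-specific, as Coey notes.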
Re: [R] database table merging tips with R
Avram Aelony writes: I have not devoted time to setting up ROracle since binaries are not available and it seems to require some effort to compile (see http://cran.r-project.org/web/packages/ROracle/index.html). On the other hand, RODBC worked more or less magically once I set up the data sources. What is your success using ROracle and why would it be preferable to RODBC? -Avram

Actually, I've only been using RSQLite, so that's where my (limited) knowledge is coming from (and only for less than a week). RSQLite is based on DBI, and I noticed that Aaron and I were answering with DBI functions. I then noticed that you kept referring to sqlQuery. That's why I raised the issue of whether RODBC is based on DBI or not. If RODBC is not based on DBI, then obviously you cannot leverage the use of dbWriteTable and dbRemoveTable, as they may not be present. Whether RODBC offers something similar, you'll have to find out, but that may have driven your initial desire to avoid the temporary table option. Regardless of that, I don't see why you could not consider the pre-built subquery option, unless you find that ODBC, or RODBC, has a limit on what can be sent to the database. Coey
Re: [R] database table merging tips with R
At some point I'd like to try to compile the DBI-based ROracle package as well. For now, though, I'll stick with RODBC, as it seems to do what I need. I believe RODBC is not based on DBI, but that shouldn't preclude using the pre-built subquery option, which is a great idea and should work regardless. I will try that. It might also be a great way to implement the innards of the fictional code (for both RODBC and ROracle) that references a column in the R data frame, as I had suggested previously, or a better variation on that...! Thanks for your responses and insight. Avram
Re: [R] database table merging tips with R
One possibility is as follows: if r$userid is your array of (2000) IDs, then s <- paste(r$userid, sep=",") s <- paste("select t.userid, x, y, z from largetable t where t.userid in (", s, ")", sep="") and finally d <- sqlQuery(connection, s) Regards, Moshe.
Re: [R] database table merging tips with R
Just a small correction: start with s <- paste(r$userid, collapse=",") and not s <- paste(r$userid, sep=",")
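Putting Moshe's suggestion and his correction together gives roughly the following. This is a sketch only: it assumes `connection` is an open RODBC channel as in the original post, and the `unique()` call is an extra touch not in the thread, there just to keep the IN list short:

```r
# Build an IN (...) list from the R-side IDs and push the filtering
# to the database, as Moshe suggests (with collapse=, per his fix).
# Assumes `connection` is an open RODBC channel.
library(RODBC)

r <- data.frame(userid = round(runif(2000) * 10, 0))

s <- paste(unique(r$userid), collapse = ",")   # collapse, not sep
sql <- paste("select t.userid, x, y, z from largetable t",
             "where t.userid in (", s, ")")
d <- sqlQuery(connection, sql)
```

One caveat worth knowing: Oracle limits an IN list to 1000 expressions, so a long ID vector would need to be chunked into several queries, or written to a table after all.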
Re: [R] database table merging tips with R
On Thu, 11 Sep 2008, Coey Minear wrote: Actually, based on my reading of the DBI reference, you should be able to do the following to create a table (although possibly not temporary): dbWriteTable(connection, "r_user_ids", r) Then you can use the following to drop the table: dbRemoveTable(connection, "r_user_ids") Of course, I don't know whether the ODBC driver implements these functions or not. (Is 'RODBC' built on DBI? Looks like Aaron and I have been assuming that.)

RODBC is not built on DBI, but it has analogous functions: sqlSave and sqlDrop. -thomas

Thomas Lumley Assoc. Professor, Biostatistics [EMAIL PROTECTED] University of Washington, Seattle
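Thomas's pointer translates the earlier DBI sketch into RODBC's own vocabulary. A rough sketch, assuming a working ODBC data source (the DSN, credentials, and table names below are placeholders, not from the thread):

```r
# Same temp-table idea, but with RODBC's native helpers (sqlSave /
# sqlDrop) instead of DBI's dbWriteTable / dbRemoveTable.
# "dsn", uid and pwd are placeholders for a real ODBC data source.
library(RODBC)

connection <- odbcConnect("dsn", uid = "user", pwd = "pass")

r <- data.frame(userid = round(runif(2000) * 10, 0))

# Write the R-side IDs to the database, join there, then clean up
sqlSave(connection, r, tablename = "r_user_ids", rownames = FALSE)
d <- sqlQuery(connection,
  "select t.userid, x, y, z
     from largetable t, r_user_ids u
    where u.userid = t.userid")
sqlDrop(connection, "r_user_ids")
odbcClose(connection)
```

This keeps the whole merge on the database side, which was the original goal, at the cost of one round-trip to create and drop the helper table.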