On 10/18/2013 12:44 PM, Hadley Wickham wrote:
Hi all,

The approach that DBI takes to escaping is sub-optimal: it tries to
figure out if an R variable name matches an SQL reserved word, and if
so munge it so that there's no longer a conflict. This creates a
situation where there are some identifiers that are valid in R, and
some that are valid in SQL and we have a complicated and bug prone
approach to converting between them.

Instead, I recommend taking an approach where identifiers (i.e. table
and field names) are always quoted using the appropriate database
syntax. This not only avoids any problems with SQL reserved words, but
it also ensures that every field name in R (even those containing
spaces and other special characters) can be used in SQL.

To achieve this change, I think we should to:

* deprecate `make.db.names()`, `isSQLKeyword()`, and `SQLKeywords()`
* add new generics `sqlQuoteString()` and `sqlQuoteIdentifier()`.

The new generics would be defined on the driver object, and would come
with default methods as follows:

```
setGeneric("sqlQuoteString", function(drv, x, ...) {
   standardGeneric("sqlQuoteString")
})
setMethod("sqlQuoteString", "DBIDriver", function(drv, x, ...) {
   x <- gsub('"', '""', x, fixed = TRUE)
   paste('"', x, '"', sep = "")
})

setGeneric("sqlQuoteIdentifer", function(drv, x, ...) {
   standardGeneric("sqlQuoteIdentifer")
})
setMethod("sqlQuoteString", "DBIDriver", function(drv, x, ...) {
   x <- gsub("'", "''", x, fixed = TRUE)
   paste("'", x, "'", sep = "")
})
```

Individual implementations would be encouraged to provide methods that
use the quoting functions provided by the client library, where
available.

Does anyone see any problems with this approach?

Hadley



Hadley,

Admittedly, I rarely use R to *create* SQL table definitions. But I would like to preserve the possibility of (a) creating db-safe identifiers, (b) not using SQL quoting when I don't want to.

Regarding (a), I would suggest *not* deprecating `make.db.names()`, etc.

Regarding (b), my reasoning is that when working with Postgres, I would want to allow Postgres to do its normal lower casing of unquoted identifiers. That is, `thisField` is a valid identifier, but passed in quoted case will be preserved, which means always having to quote the identifier in the future. Passed in quoted, Postgres will force it to `thisfield`, and if a client requests `thisField` unquoted, the correct field will be returned.

--Lee

--
Lee Hachadoorian
Assistant Professor in Geography, Dartmouth College
http://freecity.commons.gc.cuny.edu

_______________________________________________
R-sig-DB mailing list -- R Special Interest Group
R-sig-DB@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-db

Reply via email to