Re: [R-sig-DB] SQL escaping/quoting proposal

2013-10-22 Thread Hadley Wickham
 I've cleaned up the examples and made them work and put the results in
 https://gist.github.com/hadley/7057387 - that should make discussion a
 bit more concrete.

I made a few more changes to https://gist.github.com/hadley/7057387

* the method is now based on the connection, not the driver, since for
some drivers the escaping may vary between connections (e.g. ODBC)

* I added a sample tableName function (which I don't think would be
included in DBI), showing how you might accept multiple forms of table
names, and do the right thing as much as possible.

Hadley

-- 
Chief Scientist, RStudio
http://had.co.nz/

___
R-sig-DB mailing list -- R Special Interest Group
R-sig-DB@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-db


Re: [R-sig-DB] SQL escaping/quoting proposal

2013-10-19 Thread Paul Gilbert



On 13-10-18 12:44 PM, Hadley Wickham wrote:

Hi all,

The approach that DBI takes to escaping is sub-optimal: it tries to
figure out if an R variable name matches an SQL reserved word, and if
so munge it so that there's no longer a conflict. This creates a
situation where there are some identifiers that are valid in R, and
some that are valid in SQL and we have a complicated and bug prone
approach to converting between them.

Instead, I recommend taking an approach where identifiers (i.e. table
and field names) are always quoted using the appropriate database
syntax.


I think this addresses my longtime wish that DBI would present a 
consistent interface on the R side wrt capitalization even though the db 
engines do different things in this regard. This would make changing 
among engines easier. If it does not achieve this, is it possible?


Paul

This not only avoids any problems with SQL reserved words, but

it also ensures that every field name in R (even those containing
spaces and other special characters) can be used in SQL.

To achieve this change, I think we should to:

* deprecate `make.db.names()`, `isSQLKeyword()`, and `SQLKeywords()`
* add new generics `sqlQuoteString()` and `sqlQuoteIdentifier()`.

The new generics would be defined on the driver object, and would come
with default methods as follows:

```
setGeneric(sqlQuoteString, function(drv, x, ...) {
   standardGeneric(sqlQuoteString)
})
setMethod(sqlQuoteString, DBIDriver, function(drv, x, ...) {
   x - gsub('', '', x, fixed = TRUE)
   paste('', x, '', sep = )
})

setGeneric(sqlQuoteIdentifer, function(drv, x, ...) {
   standardGeneric(sqlQuoteIdentifer)
})
setMethod(sqlQuoteString, DBIDriver, function(drv, x, ...) {
   x - gsub(', '', x, fixed = TRUE)
   paste(', x, ', sep = )
})
```

Individual implementations would be encouraged to provide methods that
use the quoting functions provided by the client library, where
available.

Does anyone see any problems with this approach?

Hadley




___
R-sig-DB mailing list -- R Special Interest Group
R-sig-DB@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-db


Re: [R-sig-DB] SQL escaping/quoting proposal

2013-10-19 Thread Hadley Wickham
I've cleaned up the examples and made them work and put the results in
https://gist.github.com/hadley/7057387 - that should make discussion a
bit more concrete.

 Could you provide a bit more clarification on what the sqlQuoteIdentifier 
 should do?
 How shall we deal when a vector of strings is passed?
   Is it right to assume that sqlQuoteIdentifier(drv, c('a', 'b')) should
   return a vector consisting of quoted results of individual element of the 
 vector?

Yes, I think that makes the most sense.

 How do we construct a reference to table with schema, or column with table?
 eg schema.table or table.column?
   More specifically, is it right to assume that sqlQuoteIdentifier is used 
 for constructing
   individual part of the composite identifier?

I think that would be up to the individual function author: you could
assume that if a vector was passed then you should quote then
concatenate together with ..  Or you could assume that for more
complicated references the user had already flagged that the input
should not be escaped with sql().

 You had a minor mistake in showing the default method (The name is both 
 sqlQuoteString,
 and I am not sure which is intended for sqlQuoteIdentifier).

Fixed.

 Another consideration is for the name of the function. Whether we should
 use sql prefix or use db prefix.  I would like to know what others think for 
 this point.

I'm pretty sure it should be db to be consistent with the rest of the
package. (And I've also added dbFetch as an alias since fetch is the
_only_ function in DBI without the db prefix)

 Here, I think we can avoid problem in most cases, but there are still a bit
 cases where the encoding does not allow proper conversion.
 That's the problem of the database capability, there is not much things that
 the driver can do, though.

Right, we can only support what the db can.

The quoting function could also throw an error if it was not possible
to quote the input in a safe way for the database.

Hadley


-- 
Chief Scientist, RStudio
http://had.co.nz/

___
R-sig-DB mailing list -- R Special Interest Group
R-sig-DB@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-db


Re: [R-sig-DB] SQL escaping/quoting proposal

2013-10-18 Thread Hadley Wickham
 Regarding (b), my reasoning is that when working with Postgres, I would want
 to allow Postgres to do its normal lower casing of unquoted identifiers.
 That is, `thisField` is a valid identifier, but passed in quoted case will
 be preserved, which means always having to quote the identifier in the
 future. Passed in quoted, Postgres will force it to `thisfield`, and if a
 client requests `thisField` unquoted, the correct field will be returned.

It might be reasonable to opt in to this behaviour, but it's dangerous
by default because R is case-sensitive. What happens if you're using
dbWriteTable with this data.frame?

df - data.frame(a = 1, A = 2)

Hadley

-- 
Chief Scientist, RStudio
http://had.co.nz/

___
R-sig-DB mailing list -- R Special Interest Group
R-sig-DB@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-db


Re: [R-sig-DB] SQL escaping/quoting proposal

2013-10-18 Thread NISHIYAMA Tomoaki
Hi Hadley,

Thanks to bring up this issue and 
I agree in the overall direction and have some minor concerns.

Could you provide a bit more clarification on what the sqlQuoteIdentifier 
should do?
How shall we deal when a vector of strings is passed?
  Is it right to assume that sqlQuoteIdentifier(drv, c('a', 'b')) should 
  return a vector consisting of quoted results of individual element of the 
vector?

How do we construct a reference to table with schema, or column with table?
eg schema.table or table.column?
  More specifically, is it right to assume that sqlQuoteIdentifier is used for 
constructing
  individual part of the composite identifier?

You had a minor mistake in showing the default method (The name is both 
sqlQuoteString,
and I am not sure which is intended for sqlQuoteIdentifier).

Another consideration is for the name of the function. Whether we should 
use sql prefix or use db prefix.  I would like to know what others think for 
this point.

 This not only avoids any problems with SQL reserved words, but
 it also ensures that every field name in R (even those containing
 spaces and other special characters) can be used in SQL.

Here, I think we can avoid problem in most cases, but there are still a bit
cases where the encoding does not allow proper conversion.
That's the problem of the database capability, there is not much things that
the driver can do, though.
-- 
Tomoaki NISHIYAMA

Advanced Science Research Center,
Kanazawa University,
13-1 Takara-machi, 
Kanazawa, 920-0934, Japan


On 2013/10/19, at 1:44, Hadley Wickham wrote:

 Hi all,
 
 The approach that DBI takes to escaping is sub-optimal: it tries to
 figure out if an R variable name matches an SQL reserved word, and if
 so munge it so that there's no longer a conflict. This creates a
 situation where there are some identifiers that are valid in R, and
 some that are valid in SQL and we have a complicated and bug prone
 approach to converting between them.
 
 Instead, I recommend taking an approach where identifiers (i.e. table
 and field names) are always quoted using the appropriate database
 syntax. This not only avoids any problems with SQL reserved words, but
 it also ensures that every field name in R (even those containing
 spaces and other special characters) can be used in SQL.
 
 To achieve this change, I think we should to:
 
 * deprecate `make.db.names()`, `isSQLKeyword()`, and `SQLKeywords()`
 * add new generics `sqlQuoteString()` and `sqlQuoteIdentifier()`.
 
 The new generics would be defined on the driver object, and would come
 with default methods as follows:
 
 ```
 setGeneric(sqlQuoteString, function(drv, x, ...) {
  standardGeneric(sqlQuoteString)
 })
 setMethod(sqlQuoteString, DBIDriver, function(drv, x, ...) {
  x - gsub('', '', x, fixed = TRUE)
  paste('', x, '', sep = )
 })
 
 setGeneric(sqlQuoteIdentifer, function(drv, x, ...) {
  standardGeneric(sqlQuoteIdentifer)
 })
 setMethod(sqlQuoteString, DBIDriver, function(drv, x, ...) {
  x - gsub(', '', x, fixed = TRUE)
  paste(', x, ', sep = )
 })
 ```
 
 Individual implementations would be encouraged to provide methods that
 use the quoting functions provided by the client library, where
 available.
 
 Does anyone see any problems with this approach?
 
 Hadley
 
 
 -- 
 Chief Scientist, RStudio
 http://had.co.nz/
 
 ___
 R-sig-DB mailing list -- R Special Interest Group
 R-sig-DB@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-db

___
R-sig-DB mailing list -- R Special Interest Group
R-sig-DB@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-db